Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
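
The derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, assuming the report uses 1 KiB = 1024 bytes and divides by the number of observations (the variable names here are illustrative, not part of the report):

```python
KIB = 1024  # bytes per KiB (assumed unit convention)

total_kib = 488.3    # "Total size in memory" as displayed
n_rows = 10_000      # "Number of observations"
n_cols = 5           # "Number of variables"
missing_cells = 0    # "Missing cells"

# Average record size = total bytes / number of rows.
avg_record_bytes = total_kib * KIB / n_rows

# Missing share = missing cells / all cells.
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(f"{avg_record_bytes:.1f} B per record, {missing_pct:.1f}% missing")
```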

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month-day) has constant value "202309" [Constant]
금액 (amount) has 1331 (13.3%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-11 06:49:17.164046
Analysis finished: 2024-05-11 06:49:19.755225
Duration: 2.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2181
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.4121
Min length: 2

Characters and Unicode

Total characters: 74121
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
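
As an illustration of how such character properties can be read programmatically, here is a minimal sketch using Python's standard `unicodedata` module. It counts the Unicode general category of each character in the five sample values shown for this variable. The helper name `category_counts` is illustrative, and this is not necessarily how the profiling tool computes its tables. Scripts and blocks are not exposed by the standard library, so only categories are covered.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
names = ["양평한솔", "신트리4단지", "당산성원아파트", "잠원동아", "고척월드메르디앙"]


def category_counts(values):
    """Count characters per Unicode general category code (e.g. 'Lo', 'Nd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(names)
for code, n in counts.most_common():
    print(code, n)
```

In this coding, `Lo` corresponds to the report's "Other Letter" category (which includes Hangul syllables) and `Nd` to "Decimal Number".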

Unique

Unique: 118
Unique (%): 1.2%

Sample

1st row: 양평한솔
2nd row: 신트리4단지
3rd row: 당산성원아파트
4th row: 잠원동아
5th row: 고척월드메르디앙

Value | Count | Frequency (%)
아파트 | 220 | 2.0%
래미안 | 43 | 0.4%
아이파크 | 29 | 0.3%
e편한세상 | 25 | 0.2%
북한산 | 20 | 0.2%
용산 | 20 | 0.2%
힐스테이트 | 18 | 0.2%
경남아너스빌 | 17 | 0.2%
sk뷰 | 17 | 0.2%
푸르지오 | 17 | 0.2%
Other values (2268) | 10586 | 96.1%

Most occurring characters

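Value | Count | Frequency (%)
[missing] | 2710 | 3.7%
[missing] | 2638 | 3.6%
[missing] | 2582 | 3.5%
[missing] | 1707 | 2.3%
[missing] | 1677 | 2.3%
[missing] | 1666 | 2.2%
[missing] | 1483 | 2.0%
[missing] | 1470 | 2.0%
[missing] | 1392 | 1.9%
[missing] | 1275 | 1.7%
Other values (419) | 55521 | 74.9%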

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67852 | 91.5%
Decimal Number | 3373 | 4.6%
Space Separator | 1111 | 1.5%
Uppercase Letter | 876 | 1.2%
Lowercase Letter | 331 | 0.4%
Open Punctuation | 170 | 0.2%
Close Punctuation | 170 | 0.2%
Dash Punctuation | 117 | 0.2%
Other Punctuation | 117 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

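Other Letter
Value | Count | Frequency (%)
[missing] | 2710 | 4.0%
[missing] | 2638 | 3.9%
[missing] | 2582 | 3.8%
[missing] | 1707 | 2.5%
[missing] | 1677 | 2.5%
[missing] | 1666 | 2.5%
[missing] | 1483 | 2.2%
[missing] | 1470 | 2.2%
[missing] | 1392 | 2.1%
[missing] | 1275 | 1.9%
Other values (374) | 49252 | 72.6%
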
Uppercase Letter
Value | Count | Frequency (%)
C | 179 | 20.4%
S | 130 | 14.8%
K | 117 | 13.4%
M | 107 | 12.2%
D | 107 | 12.2%
H | 41 | 4.7%
L | 39 | 4.5%
E | 29 | 3.3%
I | 28 | 3.2%
G | 21 | 2.4%
Other values (7) | 78 | 8.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 202 | 61.0%
l | 30 | 9.1%
i | 23 | 6.9%
s | 19 | 5.7%
k | 17 | 5.1%
v | 17 | 5.1%
c | 8 | 2.4%
w | 7 | 2.1%
h | 6 | 1.8%
g | 1 | 0.3%

Decimal Number
Value | Count | Frequency (%)
1 | 1006 | 29.8%
2 | 996 | 29.5%
3 | 448 | 13.3%
4 | 262 | 7.8%
5 | 176 | 5.2%
6 | 150 | 4.4%
7 | 101 | 3.0%
8 | 94 | 2.8%
9 | 79 | 2.3%
0 | 61 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 89 | 76.1%
. | 28 | 23.9%

Space Separator
Value | Count | Frequency (%)
(space) | 1111 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 170 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 170 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 117 | 100.0%

Letter Number
Value | Count | Frequency (%)
[missing] | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67852 | 91.5%
Common | 5058 | 6.8%
Latin | 1211 | 1.6%

Most frequent character per script

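Hangul
Value | Count | Frequency (%)
[missing] | 2710 | 4.0%
[missing] | 2638 | 3.9%
[missing] | 2582 | 3.8%
[missing] | 1707 | 2.5%
[missing] | 1677 | 2.5%
[missing] | 1666 | 2.5%
[missing] | 1483 | 2.2%
[missing] | 1470 | 2.2%
[missing] | 1392 | 2.1%
[missing] | 1275 | 1.9%
Other values (374) | 49252 | 72.6%
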
Latin
Value | Count | Frequency (%)
e | 202 | 16.7%
C | 179 | 14.8%
S | 130 | 10.7%
K | 117 | 9.7%
M | 107 | 8.8%
D | 107 | 8.8%
H | 41 | 3.4%
L | 39 | 3.2%
l | 30 | 2.5%
E | 29 | 2.4%
Other values (19) | 230 | 19.0%

Common
Value | Count | Frequency (%)
(space) | 1111 | 22.0%
1 | 1006 | 19.9%
2 | 996 | 19.7%
3 | 448 | 8.9%
4 | 262 | 5.2%
5 | 176 | 3.5%
( | 170 | 3.4%
) | 170 | 3.4%
6 | 150 | 3.0%
- | 117 | 2.3%
Other values (6) | 452 | 8.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67852 | 91.5%
ASCII | 6265 | 8.5%
Number Forms | 4 | < 0.1%

Most frequent character per block

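Hangul
Value | Count | Frequency (%)
[missing] | 2710 | 4.0%
[missing] | 2638 | 3.9%
[missing] | 2582 | 3.8%
[missing] | 1707 | 2.5%
[missing] | 1677 | 2.5%
[missing] | 1666 | 2.5%
[missing] | 1483 | 2.2%
[missing] | 1470 | 2.2%
[missing] | 1392 | 2.1%
[missing] | 1275 | 1.9%
Other values (374) | 49252 | 72.6%
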
ASCII
Value | Count | Frequency (%)
(space) | 1111 | 17.7%
1 | 1006 | 16.1%
2 | 996 | 15.9%
3 | 448 | 7.2%
4 | 262 | 4.2%
e | 202 | 3.2%
C | 179 | 2.9%
5 | 176 | 2.8%
( | 170 | 2.7%
) | 170 | 2.7%
Other values (34) | 1545 | 24.7%

Number Forms
Value | Count | Frequency (%)
[missing] | 4 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2185
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 119
Unique (%): 1.2%

Sample

1st row: A15010601
2nd row: A15807316
3rd row: A15004501
4th row: A13703027
5th row: A15208007

Value | Count | Frequency (%)
a13872504 | 14 | 0.1%
a13905301 | 12 | 0.1%
a13786803 | 12 | 0.1%
a10027901 | 12 | 0.1%
a12284501 | 12 | 0.1%
a13204302 | 12 | 0.1%
a13380104 | 12 | 0.1%
a13084101 | 12 | 0.1%
a13805002 | 11 | 0.1%
a13676605 | 11 | 0.1%
Other values (2175) | 9880 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18731 | 20.8%
1 | 17346 | 19.3%
A | 10000 | 11.1%
3 | 8971 | 10.0%
2 | 8545 | 9.5%
5 | 6166 | 6.9%
8 | 5356 | 6.0%
7 | 4643 | 5.2%
4 | 4056 | 4.5%
6 | 3414 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18731 | 23.4%
1 | 17346 | 21.7%
3 | 8971 | 11.2%
2 | 8545 | 10.7%
5 | 6166 | 7.7%
8 | 5356 | 6.7%
7 | 4643 | 5.8%
4 | 4056 | 5.1%
6 | 3414 | 4.3%
9 | 2772 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18731 | 23.4%
1 | 17346 | 21.7%
3 | 8971 | 11.2%
2 | 8545 | 10.7%
5 | 6166 | 7.7%
8 | 5356 | 6.7%
7 | 4643 | 5.8%
4 | 4056 | 5.1%
6 | 3414 | 4.3%
9 | 2772 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18731 | 20.8%
1 | 17346 | 19.3%
A | 10000 | 11.1%
3 | 8971 | 10.0%
2 | 8545 | 9.5%
5 | 6166 | 6.9%
8 | 5356 | 6.0%
7 | 4643 | 5.2%
4 | 4056 | 4.5%
6 | 3414 | 3.8%

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8266
Min length: 2

Characters and Unicode

Total characters: 48266
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 잡비용
2nd row: 부과차익
3rd row: 수도광열비
4th row: 음식물처리비
5th row: 세대수도료

Value | Count | Frequency (%)
사무용품비 | 240 | 2.4%
경비비 | 224 | 2.2%
도서인쇄비 | 222 | 2.2%
제수당 | 217 | 2.2%
수선유지비 | 216 | 2.2%
입주자대표회의운영비 | 215 | 2.1%
소독비 | 214 | 2.1%
급여 | 212 | 2.1%
승강기유지비 | 211 | 2.1%
세대전기료 | 209 | 2.1%
Other values (76) | 7820 | 78.2%

Most occurring characters

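Value | Count | Frequency (%)
[missing] | 5388 | 11.2%
[missing] | 3535 | 7.3%
[missing] | 2142 | 4.4%
[missing] | 1968 | 4.1%
[missing] | 1356 | 2.8%
[missing] | 1330 | 2.8%
[missing] | 1068 | 2.2%
[missing] | 851 | 1.8%
[missing] | 828 | 1.7%
[missing] | 807 | 1.7%
Other values (110) | 28993 | 60.1%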

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48266 | 100.0%

Most frequent character per category

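Other Letter
Value | Count | Frequency (%)
[missing] | 5388 | 11.2%
[missing] | 3535 | 7.3%
[missing] | 2142 | 4.4%
[missing] | 1968 | 4.1%
[missing] | 1356 | 2.8%
[missing] | 1330 | 2.8%
[missing] | 1068 | 2.2%
[missing] | 851 | 1.8%
[missing] | 828 | 1.7%
[missing] | 807 | 1.7%
Other values (110) | 28993 | 60.1%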

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48266 | 100.0%

Most frequent character per script

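Hangul
Value | Count | Frequency (%)
[missing] | 5388 | 11.2%
[missing] | 3535 | 7.3%
[missing] | 2142 | 4.4%
[missing] | 1968 | 4.1%
[missing] | 1356 | 2.8%
[missing] | 1330 | 2.8%
[missing] | 1068 | 2.2%
[missing] | 851 | 1.8%
[missing] | 828 | 1.7%
[missing] | 807 | 1.7%
Other values (110) | 28993 | 60.1%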

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48266 | 100.0%

Most frequent character per block

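Hangul
Value | Count | Frequency (%)
[missing] | 5388 | 11.2%
[missing] | 3535 | 7.3%
[missing] | 2142 | 4.4%
[missing] | 1968 | 4.1%
[missing] | 1356 | 2.8%
[missing] | 1330 | 2.8%
[missing] | 1068 | 2.2%
[missing] | 851 | 1.8%
[missing] | 828 | 1.7%
[missing] | 807 | 1.7%
Other values (110) | 28993 | 60.1%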

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202309 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202309
2nd row: 202309
3rd row: 202309
4th row: 202309
5th row: 202309

Common Values

Value | Count | Frequency (%)
202309 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202309 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7023
Distinct (%): 70.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3673597
Minimum: -7847400
Maximum: 5.193063 × 10^8
Zeros: 1331
Zeros (%): 13.3%
Negative: 10
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -7847400
5-th percentile: 0
Q1: 51625
Median: 288800
Q3: 1451175
95-th percentile: 17806809
Maximum: 5.193063 × 10^8
Range: 5.271537 × 10^8
Interquartile range (IQR): 1399550

Descriptive statistics

Standard deviation: 14406545
Coefficient of variation (CV): 3.9216454
Kurtosis: 369.91691
Mean: 3673597
Median Absolute Deviation (MAD): 288800
Skewness: 14.689921
Sum: 3.673597 × 10^10
Variance: 2.0754853 × 10^14
Monotonicity: Not monotonic
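
Several of the figures above are simple functions of the others. The sketch below recomputes them from the values as displayed in the report, so small differences from the report's unrounded internals are possible. The variable names are illustrative.

```python
# Values copied from the report for this variable (rounded as displayed).
n = 10_000
mean = 3_673_597
std = 14_406_545
minimum, maximum = -7_847_400, 519_306_304
q1, q3 = 51_625, 1_451_175

value_range = maximum - minimum   # Range = max - min
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n                  # Sum = mean * number of observations

print(value_range, iqr, round(cv, 4), f"{variance:.4e}", f"{total:.4e}")
```

A coefficient of variation near 3.9, together with the large gap between the median and the mean, is consistent with the strong right skew reported above.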
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
0 | 1331 | 13.3%
200000 | 87 | 0.9%
100000 | 78 | 0.8%
35000 | 64 | 0.6%
300000 | 55 | 0.5%
150000 | 47 | 0.5%
50000 | 33 | 0.3%
250000 | 31 | 0.3%
500000 | 28 | 0.3%
30000 | 26 | 0.3%
Other values (7013) | 8220 | 82.2%

Minimum 10 values
Value | Count | Frequency (%)
-7847400 | 1 | < 0.1%
-2698750 | 1 | < 0.1%
-2355864 | 1 | < 0.1%
-1200000 | 1 | < 0.1%
-903060 | 1 | < 0.1%
-476100 | 1 | < 0.1%
-225200 | 1 | < 0.1%
-75000 | 1 | < 0.1%
-2944 | 1 | < 0.1%
-1000 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
519306304 | 1 | < 0.1%
445116879 | 1 | < 0.1%
418539763 | 1 | < 0.1%
282962224 | 1 | < 0.1%
227525420 | 1 | < 0.1%
187975691 | 1 | < 0.1%
179708239 | 1 | < 0.1%
179223119 | 1 | < 0.1%
174538165 | 1 | < 0.1%
161438200 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.379
금액 | 0.379 | 1.000
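
The report does not state which association measure produced 0.379. One measure often used between a categorical column and a binned numeric column is Cramér's V. The sketch below assumes that measure purely for illustration and uses made-up toy contingency tables, not this dataset. The function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(rows, cols) - 1) so the result lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if n and k else 0.0


# Toy tables (hypothetical data): perfect association vs. independence.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```

On this scale 0 means no association and 1 means one variable fully determines the other, so a value such as 0.379 would indicate a moderate association under that assumption.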

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
77577 | 양평한솔 | A15010601 | 잡비용 | 202309 | 200000
98553 | 신트리4단지 | A15807316 | 부과차익 | 202309 | 3576
75740 | 당산성원아파트 | A15004501 | 수도광열비 | 202309 | 7510
52514 | 잠원동아 | A13703027 | 음식물처리비 | 202309 | 1261330
82534 | 고척월드메르디앙 | A15208007 | 세대수도료 | 202309 | 3827100
91326 | 사당극동 | A15681503 | 경비비 | 202309 | 28080170
22974 | 북한산힐스테이트1차 | A12204003 | 검침수익 | 202309 | 259290
39901 | 명일삼환아파트 | A13407202 | 광고료수익 | 202309 | 60000
4009 | 포레나노원 아파트 | A10024547 | 연체료수익 | 202309 | 68470
42846 | 삼성청구타운 | A13509101 | 승강기유지비 | 202309 | 662200

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
10209 | 당산롯데캐슬프레스티지 아파트 | A10026797 | 광고료수익 | 202309 | 160000
30009 | 묵동신안1차 | A13185507 | 교육비 | 202309 | 0
48877 | 정릉힐스테이트1차아파트 | A13610003 | 기타운영비용 | 202309 | 0
31351 | 도봉삼환 | A13201207 | 연차수당 | 202309 | 1472360
32352 | 창동주공4단지 | A13204104 | 피복비 | 202309 | 103400
35114 | 도봉래미안 | A13293505 | 세대전기료 | 202309 | 24051300
80209 | 삼성산주공3단지 | A15101506 | 복리후생비 | 202309 | 4120000
97055 | 화곡푸르지오 | A15792602 | 지급수수료 | 202309 | 51840
75411 | 영등포삼환 | A15003801 | 선거관리위원회운영비 | 202309 | 250000
8787 | DMC센트럴아이파크 | A10025976 | 공동주택지원금수익 | 202309 | 0