Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
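
The two memory figures above agree with each other. The short check below is a sketch that assumes 1 KiB = 1024 bytes and that the average record size is the total size divided by the number of observations; the report itself does not state how the figure is derived.

```python
# Figures copied from the Dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: average record size = total bytes / number of rows.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # prints "50.0 B"
```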

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202110" (Constant)
금액 has 1506 (15.1%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:53:41.304463
Analysis finished: 2024-05-11 06:53:43.516487
Duration: 2.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
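
The reported duration can be recomputed from the two timestamps above. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:53:41.304463")
finished = datetime.fromisoformat("2024-05-11 06:53:43.516487")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # prints "2.21 seconds"
```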

Variables

아파트명 (apartment name)
Text

Distinct: 2102
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 19.5
Mean length: 7.349
Min length: 2

Characters and Unicode

Total characters: 73490
Distinct characters: 426
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
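
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values listed in this report. The helper name `category_counts` is hypothetical. The standard library exposes the general category (for example `Lo` for Other Letter, `Nd` for Decimal Number) but not the script or block, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample values shown for this variable.
sample = ["길동우성1차", "사당롯데캐슬샤인", "방배아크로리버", "신길우성2차", "방화신동아"]
counts = category_counts(sample)
print(counts["Lo"], counts["Nd"])  # prints "30 2"
```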

Unique

Unique: 104
Unique (%): 1.0%

Sample

1st row: 길동우성1차
2nd row: 사당롯데캐슬샤인
3rd row: 방배아크로리버
4th row: 신길우성2차
5th row: 방화신동아
Value | Count | Frequency (%)
아파트 | 196 | 1.8%
래미안 | 50 | 0.5%
아이파크 | 29 | 0.3%
북한산 | 22 | 0.2%
e편한세상 | 19 | 0.2%
해모로 | 18 | 0.2%
푸르지오 | 17 | 0.2%
고덕 | 15 | 0.1%
래미안밤섬리베뉴 | 15 | 0.1%
팰리스 | 15 | 0.1%
Other values (2176) | 10448 | 96.3%
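
The percentages in this word table are shares of the total word count, not of the 10000 rows. The sketch below first counts words in a few values taken from the Sample section, then rechecks two percentages from the counts in the table. It assumes that words are whitespace-separated and lowercased; that is a guess about the profiler's tokenization, not something the report states.

```python
from collections import Counter

# A few 아파트명 values copied from the Sample section of this report.
names = [
    "사가정 센트럴 아이파크 아파트",
    "힐스테이트 송파위례아파트",
    "DMC센트레빌",
    "래미안에스티움",
]

# Assumption: tokens are whitespace-separated and lowercased.
word_counts = Counter(word for name in names for word in name.lower().split())

# Counts copied from the table: the ten listed words plus "Other values".
listed = [196, 50, 29, 22, 19, 18, 17, 15, 15, 15]
other = 10448
total_words = sum(listed) + other

share_top = 100 * listed[0] / total_words
share_other = 100 * other / total_words
print(total_words, f"{share_top:.1f}%", f"{share_other:.1f}%")  # prints "10844 1.8% 96.3%"
```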

Most occurring characters

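Value | Count | Frequency (%)
(lost) | 2720 | 3.7%
(lost) | 2545 | 3.5%
(lost) | 2463 | 3.4%
(lost) | 1736 | 2.4%
(lost) | 1673 | 2.3%
(lost) | 1620 | 2.2%
(lost) | 1482 | 2.0%
(lost) | 1468 | 2.0%
(lost) | 1378 | 1.9%
(lost) | 1241 | 1.7%
Other values (416) | 55164 | 75.1%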

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67387 | 91.7%
Decimal Number | 3434 | 4.7%
Space Separator | 938 | 1.3%
Uppercase Letter | 752 | 1.0%
Lowercase Letter | 379 | 0.5%
Open Punctuation | 170 | 0.2%
Close Punctuation | 170 | 0.2%
Dash Punctuation | 128 | 0.2%
Other Punctuation | 124 | 0.2%
Letter Number | 8 | < 0.1%

Most frequent character per category

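Other Letter
Value | Count | Frequency (%)
(lost) | 2720 | 4.0%
(lost) | 2545 | 3.8%
(lost) | 2463 | 3.7%
(lost) | 1736 | 2.6%
(lost) | 1673 | 2.5%
(lost) | 1620 | 2.4%
(lost) | 1482 | 2.2%
(lost) | 1468 | 2.2%
(lost) | 1378 | 2.0%
(lost) | 1241 | 1.8%
Other values (371) | 49061 | 72.8%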
Uppercase Letter
Value | Count | Frequency (%)
S | 123 | 16.4%
C | 102 | 13.6%
K | 95 | 12.6%
L | 70 | 9.3%
D | 61 | 8.1%
M | 61 | 8.1%
H | 48 | 6.4%
G | 38 | 5.1%
I | 37 | 4.9%
E | 23 | 3.1%
Other values (7) | 94 | 12.5%
Lowercase Letter
Value | Count | Frequency (%)
e | 218 | 57.5%
i | 36 | 9.5%
l | 30 | 7.9%
v | 20 | 5.3%
k | 19 | 5.0%
s | 17 | 4.5%
w | 16 | 4.2%
c | 10 | 2.6%
a | 5 | 1.3%
g | 5 | 1.3%
Decimal Number
Value | Count | Frequency (%)
1 | 1047 | 30.5%
2 | 990 | 28.8%
3 | 468 | 13.6%
4 | 229 | 6.7%
5 | 205 | 6.0%
6 | 158 | 4.6%
7 | 94 | 2.7%
8 | 87 | 2.5%
9 | 79 | 2.3%
0 | 77 | 2.2%
Other Punctuation
Value | Count | Frequency (%)
, | 108 | 87.1%
. | 16 | 12.9%
Space Separator
Value | Count | Frequency (%)
(space) | 938 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 170 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 170 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 128 | 100.0%
Letter Number
Value | Count | Frequency (%)
(lost) | 8 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67387 | 91.7%
Common | 4964 | 6.8%
Latin | 1139 | 1.5%

Most frequent character per script

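Hangul
Value | Count | Frequency (%)
(lost) | 2720 | 4.0%
(lost) | 2545 | 3.8%
(lost) | 2463 | 3.7%
(lost) | 1736 | 2.6%
(lost) | 1673 | 2.5%
(lost) | 1620 | 2.4%
(lost) | 1482 | 2.2%
(lost) | 1468 | 2.2%
(lost) | 1378 | 2.0%
(lost) | 1241 | 1.8%
Other values (371) | 49061 | 72.8%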
Latin
Value | Count | Frequency (%)
e | 218 | 19.1%
S | 123 | 10.8%
C | 102 | 9.0%
K | 95 | 8.3%
L | 70 | 6.1%
D | 61 | 5.4%
M | 61 | 5.4%
H | 48 | 4.2%
G | 38 | 3.3%
I | 37 | 3.2%
Other values (19) | 286 | 25.1%
Common
Value | Count | Frequency (%)
1 | 1047 | 21.1%
2 | 990 | 19.9%
(space) | 938 | 18.9%
3 | 468 | 9.4%
4 | 229 | 4.6%
5 | 205 | 4.1%
( | 170 | 3.4%
) | 170 | 3.4%
6 | 158 | 3.2%
- | 128 | 2.6%
Other values (6) | 461 | 9.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67387 | 91.7%
ASCII | 6095 | 8.3%
Number Forms | 8 | < 0.1%

Most frequent character per block

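Hangul
Value | Count | Frequency (%)
(lost) | 2720 | 4.0%
(lost) | 2545 | 3.8%
(lost) | 2463 | 3.7%
(lost) | 1736 | 2.6%
(lost) | 1673 | 2.5%
(lost) | 1620 | 2.4%
(lost) | 1482 | 2.2%
(lost) | 1468 | 2.2%
(lost) | 1378 | 2.0%
(lost) | 1241 | 1.8%
Other values (371) | 49061 | 72.8%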
ASCII
Value | Count | Frequency (%)
1 | 1047 | 17.2%
2 | 990 | 16.2%
(space) | 938 | 15.4%
3 | 468 | 7.7%
4 | 229 | 3.8%
e | 218 | 3.6%
5 | 205 | 3.4%
( | 170 | 2.8%
) | 170 | 2.8%
6 | 158 | 2.6%
Other values (34) | 1502 | 24.6%
Number Forms
Value | Count | Frequency (%)
(lost) | 8 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2104
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 104
Unique (%): 1.0%

Sample

1st row: A13481001
2nd row: A15609305
3rd row: A13706001
4th row: A15086007
5th row: A15722203
Value | Count | Frequency (%)
a13672101 | 14 | 0.1%
a13582002 | 13 | 0.1%
a13805002 | 13 | 0.1%
a13501006 | 12 | 0.1%
a15086601 | 12 | 0.1%
a15279402 | 12 | 0.1%
a13002002 | 11 | 0.1%
a10027424 | 11 | 0.1%
a15370301 | 11 | 0.1%
a13983816 | 11 | 0.1%
Other values (2094) | 9880 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18740 | 20.8%
1 | 17552 | 19.5%
A | 10000 | 11.1%
3 | 8912 | 9.9%
2 | 8430 | 9.4%
5 | 6032 | 6.7%
8 | 5418 | 6.0%
7 | 4634 | 5.1%
4 | 4011 | 4.5%
6 | 3531 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18740 | 23.4%
1 | 17552 | 21.9%
3 | 8912 | 11.1%
2 | 8430 | 10.5%
5 | 6032 | 7.5%
8 | 5418 | 6.8%
7 | 4634 | 5.8%
4 | 4011 | 5.0%
6 | 3531 | 4.4%
9 | 2740 | 3.4%
Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18740 | 23.4%
1 | 17552 | 21.9%
3 | 8912 | 11.1%
2 | 8430 | 10.5%
5 | 6032 | 7.5%
8 | 5418 | 6.8%
7 | 4634 | 5.8%
4 | 4011 | 5.0%
6 | 3531 | 4.4%
9 | 2740 | 3.4%
Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18740 | 20.8%
1 | 17552 | 19.5%
A | 10000 | 11.1%
3 | 8912 | 9.9%
2 | 8430 | 9.4%
5 | 6032 | 6.7%
8 | 5418 | 6.0%
7 | 4634 | 5.1%
4 | 4011 | 4.5%
6 | 3531 | 3.9%

비용명 (expense item)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8735
Min length: 2

Characters and Unicode

Total characters: 48735
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 선거관리위원회운영비
2nd row: 교통비
3rd row: 기타부대비
4th row: 상여
5th row: 주차장운영비
Value | Count | Frequency (%)
청소비 | 234 | 2.3%
세대전기료 | 230 | 2.3%
소독비 | 225 | 2.2%
경비비 | 220 | 2.2%
급여 | 216 | 2.2%
세대수도료 | 215 | 2.1%
사무용품비 | 215 | 2.1%
퇴직급여 | 213 | 2.1%
승강기유지비 | 213 | 2.1%
수선유지비 | 208 | 2.1%
Other values (77) | 7811 | 78.1%

Most occurring characters

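Value | Count | Frequency (%)
(lost) | 5400 | 11.1%
(lost) | 3584 | 7.4%
(lost) | 2100 | 4.3%
(lost) | 1933 | 4.0%
(lost) | 1594 | 3.3%
(lost) | 1302 | 2.7%
(lost) | 1068 | 2.2%
(lost) | 889 | 1.8%
(lost) | 782 | 1.6%
(lost) | 724 | 1.5%
Other values (110) | 29359 | 60.2%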

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48735 | 100.0%

Most frequent character per category

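Other Letter
Value | Count | Frequency (%)
(lost) | 5400 | 11.1%
(lost) | 3584 | 7.4%
(lost) | 2100 | 4.3%
(lost) | 1933 | 4.0%
(lost) | 1594 | 3.3%
(lost) | 1302 | 2.7%
(lost) | 1068 | 2.2%
(lost) | 889 | 1.8%
(lost) | 782 | 1.6%
(lost) | 724 | 1.5%
Other values (110) | 29359 | 60.2%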

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48735 | 100.0%

Most frequent character per script

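Hangul
Value | Count | Frequency (%)
(lost) | 5400 | 11.1%
(lost) | 3584 | 7.4%
(lost) | 2100 | 4.3%
(lost) | 1933 | 4.0%
(lost) | 1594 | 3.3%
(lost) | 1302 | 2.7%
(lost) | 1068 | 2.2%
(lost) | 889 | 1.8%
(lost) | 782 | 1.6%
(lost) | 724 | 1.5%
Other values (110) | 29359 | 60.2%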

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48735 | 100.0%

Most frequent character per block

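Hangul
Value | Count | Frequency (%)
(lost) | 5400 | 11.1%
(lost) | 3584 | 7.4%
(lost) | 2100 | 4.3%
(lost) | 1933 | 4.0%
(lost) | 1594 | 3.3%
(lost) | 1302 | 2.7%
(lost) | 1068 | 2.2%
(lost) | 889 | 1.8%
(lost) | 782 | 1.6%
(lost) | 724 | 1.5%
Other values (110) | 29359 | 60.2%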

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202110 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202110
2nd row: 202110
3rd row: 202110
4th row: 202110
5th row: 202110

Common Values

Value | Count | Frequency (%)
202110 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202110 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6664
Distinct (%): 66.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3240505.4
Minimum: -5846250
Maximum: 2.3871034 × 10^8
Zeros: 1506
Zeros (%): 15.1%
Negative: 13
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -5846250
5-th percentile: 0
Q1: 57522.5
Median: 300500
Q3: 1357225.5
95-th percentile: 16845784
Maximum: 2.3871034 × 10^8
Range: 2.4455659 × 10^8
Interquartile range (IQR): 1299703
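
The range and interquartile range follow directly from the other quantile statistics. The sketch below rechecks both, taking the exact maximum (238710337) from the list of largest values later in this section instead of the rounded scientific notation.

```python
# Values copied from the quantile statistics and the list of largest values.
minimum = -5_846_250
maximum = 238_710_337
q1 = 57_522.5
q3 = 1_357_225.5

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(value_range, iqr)  # prints "244556587 1299703.0"
```

244556587 rounds to the reported 2.4455659 × 10^8.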

Descriptive statistics

Standard deviation: 11037362
Coefficient of variation (CV): 3.406062
Kurtosis: 128.90762
Mean: 3240505.4
Median Absolute Deviation (MAD): 300500
Skewness: 9.3414054
Sum: 3.2405054 × 10^10
Variance: 1.2182336 × 10^14
Monotonicity: Not monotonic
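
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below rechecks them from the rounded figures in the report, so agreement is only expected up to rounding.

```python
import math

# Figures copied from the report (already rounded there).
std = 11_037_362
mean = 3_240_505.4
n_observations = 10_000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
total = mean * n_observations  # sum of all values

# Compare against the reported values within rounding tolerance.
print(math.isclose(cv, 3.406062, abs_tol=1e-5))
print(math.isclose(variance, 1.2182336e14, rel_tol=1e-6))
print(math.isclose(total, 3.2405054e10, rel_tol=1e-9))
```

All three comparisons hold for the figures above.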
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1506 | 15.1%
200000 | 76 | 0.8%
300000 | 62 | 0.6%
100000 | 61 | 0.6%
73000 | 57 | 0.6%
400000 | 45 | 0.4%
500000 | 39 | 0.4%
250000 | 38 | 0.4%
50000 | 37 | 0.4%
150000 | 37 | 0.4%
Other values (6654) | 8042 | 80.4%
Minimum 10 values
Value | Count | Frequency (%)
-5846250 | 1 | < 0.1%
-3000000 | 1 | < 0.1%
-1362892 | 1 | < 0.1%
-1120000 | 1 | < 0.1%
-320000 | 1 | < 0.1%
-242800 | 1 | < 0.1%
-209640 | 1 | < 0.1%
-35000 | 1 | < 0.1%
-27999 | 1 | < 0.1%
-12764 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
238710337 | 1 | < 0.1%
217228060 | 1 | < 0.1%
212703090 | 1 | < 0.1%
212240894 | 1 | < 0.1%
206069764 | 1 | < 0.1%
198526630 | 1 | < 0.1%
179093360 | 1 | < 0.1%
167637421 | 1 | < 0.1%
151217610 | 1 | < 0.1%
147391205 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.464
금액 | 0.464 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
40267 | 길동우성1차 | A13481001 | 선거관리위원회운영비 | 202110 | 0
91750 | 사당롯데캐슬샤인 | A15609305 | 교통비 | 202110 | 0
52549 | 방배아크로리버 | A13706001 | 기타부대비 | 202110 | 719570
80146 | 신길우성2차 | A15086007 | 상여 | 202110 | 0
95849 | 방화신동아 | A15722203 | 주차장운영비 | 202110 | 768710
40523 | 명일현대 | A13482801 | 감가상각비 | 202110 | 0
18961 | 마포태영제2 | A12181102 | 입주자대표회의운영비 | 202110 | 21740
15674 | DMC센트레빌 | A12072801 | 교육비 | 202110 | 73000
35992 | 래미안옥수리버젠 | A13375907 | 입주자대표회의운영비 | 202110 | 1543690
9703 | 힐스테이트 송파위례아파트 | A10027461 | 공동전기료 | 202110 | 9316708

Last rows
Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
91282 | 동작금강KCC | A15608002 | 충당부채전입이자비용 | 202110 | 0
64153 | 상계미도 | A13971501 | 알뜰시장수익 | 202110 | 159090
1906 | 사가정 센트럴 아이파크 아파트 | A10024741 | 기타부대비 | 202110 | 335970
87422 | 구로주공 | A15286809 | 잡수익 | 202110 | 363003
8508 | 래미안에스티움 | A10027073 | 홈네트워크설비유지비 | 202110 | 275000
43704 | 수서가람 | A13523003 | 기타사용료 | 202110 | 6600
85124 | 오류금강수목원 | A15210211 | 급여 | 202110 | 17968400
3949 | 충무로엘크루메트로시티2 | A10025223 | 검침비용 | 202110 | 163400
28439 | 묵동신도1차 | A13184804 | 복리후생비 | 202110 | 988180
45760 | 논현동현 | A13582002 | 교육비 | 202110 | 73000