Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
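
The average record size appears to be the total in-memory size divided by the number of observations. A minimal sketch of that arithmetic, using only the two figures reported above (the function name is illustrative and not part of ydata-profiling):

```python
# Figures copied from the dataset statistics above.
TOTAL_SIZE_KIB = 488.3
N_OBSERVATIONS = 10_000


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Convert a total size in KiB into an average per-row size in bytes."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg_bytes:.1f} B")  # 50.0 B
```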

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "202010" (Constant)
금액 (amount) has 1619 (16.2%) zeros (Zeros)
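
The two alerts can be reproduced with very little code. The sketch below is an illustration only, not ydata-profiling's implementation: the function name and the 5% zero-share threshold are assumptions, and the inputs are the ten 금액 values from the sample rows of this report plus a constant 년월일 column.

```python
# Hypothetical threshold: flag a column when more than 5% of its values are zero.
ZERO_SHARE_THRESHOLD = 0.05


def column_alerts(values, zero_threshold=ZERO_SHARE_THRESHOLD):
    """Return alert labels ("Constant", "Zeros") for one column of values."""
    alerts = []
    if not values:
        return alerts
    # A column with a single distinct value is constant.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Share of entries equal to zero (strings never compare equal to 0).
    zero_share = sum(1 for v in values if v == 0) / len(values)
    if zero_share > zero_threshold:
        alerts.append("Zeros")
    return alerts


date_column = ["202010"] * 10
amount_column = [69940, 35500, 136000, 178930, 1100,
                 64170, 1485000, 0, 200000, 1078850]

print(column_alerts(date_column))    # ['Constant']
print(column_alerts(amount_column))  # ['Zeros']
print(f"{1619 / 10000:.1%}")         # 16.2%, the zero share reported for 금액
```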

Reproduction

Analysis started: 2024-05-11 06:55:46.803491
Analysis finished: 2024-05-11 06:55:48.714434
Duration: 1.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2100
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.226
Min length: 2

Characters and Unicode

Total characters: 72260
Distinct characters: 427
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
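
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two apartment names taken from this report. The mapping from two-letter codes to the long names used in the tables is written out by hand and covers only the categories that appear here. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Sm": "Math Symbol",
}


def category_counts(text: str) -> Counter:
    """Count characters in `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


print(category_counts("백련산힐스테이트2차"))
# Counter({'Other Letter': 9, 'Decimal Number': 1})
print(category_counts("래미안 서초에스티지"))
# Counter({'Other Letter': 9, 'Space Separator': 1})
```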

Unique

Unique: 106
Unique (%): 1.1%

Sample

1st row: 신길우창
2nd row: 하월곡아남
3rd row: 구산경남아너스빌
4th row: 백련산힐스테이트2차
5th row: 미아뉴타운두산위브트레지움
Value                Count  Frequency (%)
아파트               165    1.5%
래미안               41     0.4%
아이파크             26     0.2%
e편한세상            24     0.2%
신반포               18     0.2%
sk뷰                 17     0.2%
힐스테이트           17     0.2%
래미안에스티움       15     0.1%
북한산               14     0.1%
한가람아파트         14     0.1%
Other values (2167)  10370  96.7%
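
The frequencies in this table appear to be shares of word tokens rather than of rows: the listed counts sum to 10721, more than the 10000 observations, and 165 / 10721 is about 1.5%. The sketch below checks that arithmetic and shows one way such a word count could be produced. The lowercase-and-split-on-whitespace tokenizer is an assumption, not ydata-profiling's confirmed behaviour.

```python
from collections import Counter

# Counts copied from the table above.
table_counts = {
    "아파트": 165, "래미안": 41, "아이파크": 26, "e편한세상": 24, "신반포": 18,
    "sk뷰": 17, "힐스테이트": 17, "래미안에스티움": 15, "북한산": 14,
    "한가람아파트": 14, "Other values": 10370,
}
total_tokens = sum(table_counts.values())
print(total_tokens)                                      # 10721
print(f"{table_counts['아파트'] / total_tokens:.1%}")        # 1.5%
print(f"{table_counts['Other values'] / total_tokens:.1%}")  # 96.7%


def word_counts(names):
    """Assumed tokenizer: lowercase each name and split on whitespace."""
    return Counter(word for name in names for word in name.lower().split())


# Apartment names taken from the sample rows of this report.
print(word_counts(["래미안 서초에스티지", "신길우창", "두산아파트"]))
```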

Most occurring characters

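Value               Count  Frequency (%)
(not preserved)     2608   3.6%
(not preserved)     2449   3.4%
(not preserved)     2332   3.2%
(not preserved)     1845   2.6%
(not preserved)     1635   2.3%
(not preserved)     1571   2.2%
(not preserved)     1475   2.0%
(not preserved)     1407   1.9%
(not preserved)     1357   1.9%
(not preserved)     1276   1.8%
Other values (417)  54305  75.2%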

Most occurring categories

Value              Count  Frequency (%)
Other Letter       66368  91.8%
Decimal Number     3374   4.7%
Uppercase Letter   853    1.2%
Space Separator    790    1.1%
Lowercase Letter   334    0.5%
Open Punctuation   154    0.2%
Close Punctuation  154    0.2%
Other Punctuation  119    0.2%
Dash Punctuation   107    0.1%
Letter Number      5      < 0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(not preserved)     2608   3.9%
(not preserved)     2449   3.7%
(not preserved)     2332   3.5%
(not preserved)     1845   2.8%
(not preserved)     1635   2.5%
(not preserved)     1571   2.4%
(not preserved)     1475   2.2%
(not preserved)     1407   2.1%
(not preserved)     1357   2.0%
(not preserved)     1276   1.9%
Other values (371)  48413  72.9%

Uppercase Letter
Value             Count  Frequency (%)
S                 127    14.9%
C                 122    14.3%
K                 92     10.8%
D                 82     9.6%
M                 82     9.6%
L                 67     7.9%
E                 49     5.7%
H                 43     5.0%
I                 42     4.9%
A                 40     4.7%
Other values (7)  107    12.5%

Lowercase Letter
Value  Count  Frequency (%)
e      187    56.0%
s      26     7.8%
l      26     7.8%
i      25     7.5%
k      22     6.6%
v      17     5.1%
h      7      2.1%
w      6      1.8%
g      6      1.8%
c      6      1.8%

Decimal Number
Value  Count  Frequency (%)
1      1016   30.1%
2      1010   29.9%
3      473    14.0%
4      227    6.7%
5      186    5.5%
6      150    4.4%
7      89     2.6%
8      87     2.6%
9      82     2.4%
0      54     1.6%

Other Punctuation
Value  Count  Frequency (%)
,      94     79.0%
.      25     21.0%

Space Separator
Value    Count  Frequency (%)
(space)  790    100.0%

Open Punctuation
Value  Count  Frequency (%)
(      154    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      154    100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      107    100.0%

Letter Number
Value            Count  Frequency (%)
(not preserved)  5      100.0%

Math Symbol
Value  Count  Frequency (%)
~      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  66368  91.8%
Common  4700   6.5%
Latin   1192   1.6%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(not preserved)     2608   3.9%
(not preserved)     2449   3.7%
(not preserved)     2332   3.5%
(not preserved)     1845   2.8%
(not preserved)     1635   2.5%
(not preserved)     1571   2.4%
(not preserved)     1475   2.2%
(not preserved)     1407   2.1%
(not preserved)     1357   2.0%
(not preserved)     1276   1.9%
Other values (371)  48413  72.9%

Latin
Value              Count  Frequency (%)
e                  187    15.7%
S                  127    10.7%
C                  122    10.2%
K                  92     7.7%
D                  82     6.9%
M                  82     6.9%
L                  67     5.6%
E                  49     4.1%
H                  43     3.6%
I                  42     3.5%
Other values (19)  299    25.1%

Common
Value             Count  Frequency (%)
1                 1016   21.6%
2                 1010   21.5%
(space)           790    16.8%
3                 473    10.1%
4                 227    4.8%
5                 186    4.0%
(                 154    3.3%
)                 154    3.3%
6                 150    3.2%
-                 107    2.3%
Other values (7)  433    9.2%

Most occurring blocks

Value         Count  Frequency (%)
Hangul        66368  91.8%
ASCII         5887   8.1%
Number Forms  5      < 0.1%

Most frequent character per block

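Hangul
Value               Count  Frequency (%)
(not preserved)     2608   3.9%
(not preserved)     2449   3.7%
(not preserved)     2332   3.5%
(not preserved)     1845   2.8%
(not preserved)     1635   2.5%
(not preserved)     1571   2.4%
(not preserved)     1475   2.2%
(not preserved)     1407   2.1%
(not preserved)     1357   2.0%
(not preserved)     1276   1.9%
Other values (371)  48413  72.9%

ASCII
Value              Count  Frequency (%)
1                  1016   17.3%
2                  1010   17.2%
(space)            790    13.4%
3                  473    8.0%
4                  227    3.9%
e                  187    3.2%
5                  186    3.2%
(                  154    2.6%
)                  154    2.6%
6                  150    2.5%
Other values (35)  1540   26.2%

Number Forms
Value            Count  Frequency (%)
(not preserved)  5      100.0%
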
아파트코드 (apartment code)
Text

Distinct: 2106
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
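
The statistics for this variable (every value 9 characters long, 11 distinct characters, one uppercase letter per value, the rest decimal digits) suggest a fixed code format. A minimal validation sketch, assuming the format is the letter "A" followed by eight ASCII digits, as in the sample rows below. The pattern and function name are illustrative.

```python
import re

# Assumed format: the letter "A" followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")


def is_valid_code(code: str) -> bool:
    """Return True when `code` matches the assumed apartment-code format."""
    return CODE_PATTERN.fullmatch(code) is not None


# Codes taken from the sample rows of this variable.
samples = ["A15005602", "A13613001", "A12282203", "A12201002", "A14272314"]
print(all(is_valid_code(code) for code in samples))  # True
print(is_valid_code("A1500560"))                     # False (too short)
```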

Unique

Unique: 106
Unique (%): 1.1%

Sample

1st row: A15005602
2nd row: A13613001
3rd row: A12282203
4th row: A12201002
5th row: A14272314
Value                Count  Frequency (%)
a10027073            15     0.1%
a14072701            14     0.1%
a14077901            13     0.1%
a13790703            13     0.1%
a10025387            12     0.1%
a13770608            12     0.1%
a10026207            12     0.1%
a13385303            12     0.1%
a13203303            12     0.1%
a13007001            12     0.1%
Other values (2096)  9873   98.7%

Most occurring characters

Value  Count  Frequency (%)
0      18884  21.0%
1      17599  19.6%
A      10000  11.1%
3      8954   9.9%
2      8231   9.1%
5      6075   6.8%
8      5398   6.0%
7      4765   5.3%
4      3887   4.3%
6      3420   3.8%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      18884  23.6%
1      17599  22.0%
3      8954   11.2%
2      8231   10.3%
5      6075   7.6%
8      5398   6.7%
7      4765   6.0%
4      3887   4.9%
6      3420   4.3%
9      2787   3.5%

Uppercase Letter
Value  Count  Frequency (%)
A      10000  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  80000  88.9%
Latin   10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      18884  23.6%
1      17599  22.0%
3      8954   11.2%
2      8231   10.3%
5      6075   7.6%
8      5398   6.7%
7      4765   6.0%
4      3887   4.9%
6      3420   4.3%
9      2787   3.5%

Latin
Value  Count  Frequency (%)
A      10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      18884  21.0%
1      17599  19.6%
A      10000  11.1%
3      8954   9.9%
2      8231   9.1%
5      6075   6.8%
8      5398   6.0%
7      4765   5.3%
4      3887   4.3%
6      3420   3.8%

비용명 (expense name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9032
Min length: 2

Characters and Unicode

Total characters: 49032
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
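
The mean length reported for this variable is consistent with the total character count divided by the number of observations (49032 / 10000 = 4.9032). The sketch below computes a mean length directly from strings, using the five sample rows of this variable, and then repeats the report-level check. Normalizing to NFC first is an assumption, made so that each Hangul syllable counts as one character.

```python
import unicodedata


def mean_length(values):
    """Mean number of characters per value, after NFC normalization."""
    lengths = [len(unicodedata.normalize("NFC", v)) for v in values]
    return sum(lengths) / len(lengths)


# The five sample rows shown for this variable.
sample_rows = ["산재보험료", "사무용품비", "도서인쇄비", "피복비", "세금과공과"]
print(mean_length(sample_rows))  # 4.6

# Report-level check: total characters divided by observations.
report_mean = 49032 / 10000
print(report_mean)  # 4.9032
```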

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 산재보험료
2nd row: 사무용품비
3rd row: 도서인쇄비
4th row: 피복비
5th row: 세금과공과
Value              Count  Frequency (%)
청소비             226    2.3%
통신비             222    2.2%
이자수익           218    2.2%
보험료             217    2.2%
교육비             215    2.1%
세대전기료         214    2.1%
사무용품비         210    2.1%
승강기유지비       210    2.1%
도서인쇄비         207    2.1%
수선유지비         205    2.1%
Other values (77)  7856   78.6%

Most occurring characters

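Value               Count  Frequency (%)
(not preserved)     5438   11.1%
(not preserved)     3532   7.2%
(not preserved)     2090   4.3%
(not preserved)     1997   4.1%
(not preserved)     1722   3.5%
(not preserved)     1324   2.7%
(not preserved)     1060   2.2%
(not preserved)     860    1.8%
(not preserved)     804    1.6%
(not preserved)     757    1.5%
Other values (110)  29448  60.1%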

Most occurring categories

Value         Count  Frequency (%)
Other Letter  49032  100.0%

Most frequent character per category

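Other Letter
Value               Count  Frequency (%)
(not preserved)     5438   11.1%
(not preserved)     3532   7.2%
(not preserved)     2090   4.3%
(not preserved)     1997   4.1%
(not preserved)     1722   3.5%
(not preserved)     1324   2.7%
(not preserved)     1060   2.2%
(not preserved)     860    1.8%
(not preserved)     804    1.6%
(not preserved)     757    1.5%
Other values (110)  29448  60.1%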

Most occurring scripts

Value   Count  Frequency (%)
Hangul  49032  100.0%

Most frequent character per script

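Hangul
Value               Count  Frequency (%)
(not preserved)     5438   11.1%
(not preserved)     3532   7.2%
(not preserved)     2090   4.3%
(not preserved)     1997   4.1%
(not preserved)     1722   3.5%
(not preserved)     1324   2.7%
(not preserved)     1060   2.2%
(not preserved)     860    1.8%
(not preserved)     804    1.6%
(not preserved)     757    1.5%
Other values (110)  29448  60.1%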

Most occurring blocks

Value   Count  Frequency (%)
Hangul  49032  100.0%

Most frequent character per block

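Hangul
Value               Count  Frequency (%)
(not preserved)     5438   11.1%
(not preserved)     3532   7.2%
(not preserved)     2090   4.3%
(not preserved)     1997   4.1%
(not preserved)     1722   3.5%
(not preserved)     1324   2.7%
(not preserved)     1060   2.2%
(not preserved)     860    1.8%
(not preserved)     804    1.6%
(not preserved)     757    1.5%
Other values (110)  29448  60.1%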

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
202010  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202010
2nd row: 202010
3rd row: 202010
4th row: 202010
5th row: 202010

Common Values

Value   Count  Frequency (%)
202010  10000  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
202010  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6661
Distinct (%): 66.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2767713.5
Minimum: -2645086
Maximum: 2.6155759 × 10^8
Zeros: 1619
Zeros (%): 16.2%
Negative: 11
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2645086
5-th percentile: 0
Q1: 46990
Median: 297000
Q3: 1227615
95-th percentile: 14395162
Maximum: 2.6155759 × 10^8
Range: 2.6420268 × 10^8
Interquartile range (IQR): 1180625
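
The range and the interquartile range follow directly from the other quantile statistics. A short sketch of that arithmetic, using the exact maximum (261557590) listed among the largest values of this variable; the variable names are illustrative.

```python
# Quantile statistics copied from this report.
minimum = -2_645_086
maximum = 261_557_590  # exact value behind the rounded 2.6155759 × 10^8
q1 = 46_990
q3 = 1_227_615

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range)  # 264202676, reported as 2.6420268 × 10^8
print(iqr)          # 1180625
```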

Descriptive statistics

Standard deviation: 9524732.3
Coefficient of variation (CV): 3.4413723
Kurtosis: 223.9428
Mean: 2767713.5
Median Absolute Deviation (MAD): 297000
Skewness: 11.68665
Sum: 2.7677135 × 10^10
Variance: 9.0720526 × 10^13
Monotonicity: Not monotonic
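
Several of these statistics are derived from one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The sketch below checks the reported figures against those relations. Small differences are expected because the inputs are already rounded.

```python
import math

# Descriptive statistics copied from this report.
mean = 2767713.5
std = 9524732.3
n_observations = 10_000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance as squared standard deviation
total = mean * n_observations  # sum of all values

print(f"{cv:.5f}")        # 3.44137
print(f"{variance:.4e}")  # 9.0721e+13
print(f"{total:.7e}")     # 2.7677135e+10

# The reported values agree within rounding.
assert math.isclose(cv, 3.4413723, rel_tol=1e-6)
assert math.isclose(variance, 9.0720526e13, rel_tol=1e-6)
```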
[Figure: Histogram with fixed size bins (bins=50)]
Value                Count  Frequency (%)
0                    1619   16.2%
300000               66     0.7%
200000               65     0.7%
100000               58     0.6%
150000               37     0.4%
400000               34     0.3%
250000               33     0.3%
50000                32     0.3%
220000               26     0.3%
600000               26     0.3%
Other values (6651)  8004   80.0%

Minimum 10 values
Value     Count  Frequency (%)
-2645086  1      < 0.1%
-1224727  1      < 0.1%
-882680   1      < 0.1%
-807150   1      < 0.1%
-105000   1      < 0.1%
-80800    1      < 0.1%
-38000    1      < 0.1%
-14000    1      < 0.1%
-4459     1      < 0.1%
-1858     1      < 0.1%

Maximum 10 values
Value      Count  Frequency (%)
261557590  1      < 0.1%
260686776  1      < 0.1%
226800000  1      < 0.1%
210331284  1      < 0.1%
191397070  1      < 0.1%
180034120  1      < 0.1%
158332561  1      < 0.1%
148844336  1      < 0.1%
145865260  1      < 0.1%
125649195  1      < 0.1%

Interactions

[Figure: interaction plot]

Correlations

        비용명  금액
비용명  1.000   0.390
금액    0.390   1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

(index)  아파트명  아파트코드  비용명  년월일  금액
76788  신길우창  A15005602  산재보험료  202010  69940
48815  하월곡아남  A13613001  사무용품비  202010  35500
21338  구산경남아너스빌  A12282203  도서인쇄비  202010  136000
19390  백련산힐스테이트2차  A12201002  피복비  202010  178930
72719  미아뉴타운두산위브트레지움  A14272314  세금과공과  202010  1100
60850  풍납동아한가람  A13887302  안전진단실시비  202010  64170
7554  래미안 서초에스티지  A10027221  위탁관리수수료  202010  1485000
72720  미아뉴타운두산위브트레지움  A14272314  소모품비  202010  0
41597  청담현대3차  A13510102  승강기수익  202010  200000
43692  도곡쌍용예가  A13527019  입주자대표회의운영비  202010  1078850

(index)  아파트명  아파트코드  비용명  년월일  금액
5010  반포래미안아이파크  A10026051  고용보험료  202010  256880
71548  번동금호어울림  A14206002  부과차익  202010  378
28995  도봉삼환  A13201207  세대난방비  202010  15919270
89451  신대방경남교수  A15601102  피복비  202010  34170
55010  양재우성  A13789203  음식물처리비  202010  1421330
66611  두산아파트  A13983713  복리후생비  202010  573330
23666  래미안엘파인  A13075402  충당부채전입이자비용  202010  19357
98194  한숲대림아파트  A15785703  장기수선비  202010  7807960
89084  현대공무원  A15384101  도서인쇄비  202010  77000
83701  고척벽산베스트블루밍  A15208006  통신비  202010  74490