Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
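The overview figures are simple counts over all cells and rows. As a rough illustration, the sketch below recomputes them for a list of Python dictionaries. The function name `overview` is hypothetical, missing cells are assumed to be `None`, and this is not how ydata-profiling itself is implemented. The average record size is derived the same way: 488.3 KiB × 1024 / 10000 ≈ 50.0 B.

```python
from typing import Any


def overview(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Compute overview statistics for records that all share the same keys."""
    n_obs = len(rows)
    columns = list(rows[0].keys()) if rows else []
    n_cells = n_obs * len(columns)
    # Assumption: a cell counts as missing when its value is None.
    missing = sum(1 for row in rows for col in columns if row[col] is None)
    # A row is a duplicate when every column value matches an earlier row.
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```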

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: 파일 다운로드 (file download)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "202306" [Constant]
금액 (amount) has 1073 (10.7%) zeros [Zeros]
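Both alerts come from simple column checks. A column with a single distinct value is constant, and a column whose share of zeros exceeds some cutoff is flagged. Below is a minimal sketch. The 5% cutoff and the function name `column_alerts` are assumptions, not ydata-profiling's configuration.

```python
from typing import Sequence


def column_alerts(name: str, values: Sequence[object], zeros_threshold: float = 0.05) -> list[str]:
    """Return alert messages for a constant column or one with many zeros.

    zeros_threshold is a hypothetical cutoff, not taken from the report.
    """
    alerts: list[str] = []
    # Constant: exactly one distinct value across all rows.
    if len(set(values)) == 1:
        alerts.append(f"{name} has constant value {values[0]!r}")
    # Zeros: the share of values equal to 0 exceeds the cutoff.
    n = len(values)
    zeros = sum(1 for v in values if v == 0)
    if n and zeros / n > zeros_threshold:
        alerts.append(f"{name} has {zeros} ({100 * zeros / n:.1f}%) zeros")
    return alerts
```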

Reproduction

Analysis started: 2024-05-11 06:48:36.389691
Analysis finished: 2024-05-11 06:48:38.083261
Duration: 1.69 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2250
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.4425
Min length: 2
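The length figures summarize the number of characters per value. They are consistent with counting Unicode code points: mean length × rows = 7.4425 × 10000 = 74425, which is the total character count reported below. Here is a small sketch, assuming code-point length (Python's `len`) is what the report measures. The function name is hypothetical.

```python
import statistics


def length_stats(values: list[str]) -> dict[str, float]:
    """Summarize value lengths in code points, mirroring the Length block."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }
```

Applied to the five sample names shown below, it gives a maximum of 9, a minimum of 7, a median of 7 and a mean of 7.6.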

Characters and Unicode

Total characters: 74425
Distinct characters: 433
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
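The category counts come from each character's Unicode general category. Python's standard `unicodedata.category` returns a two-letter code, for example `Lo` for a Hangul syllable. The long names in the sketch below are the standard category names that appear in this report. The function name is hypothetical, and ydata-profiling may use a different library to obtain these properties.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts
```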

Unique

Unique: 128
Unique (%): 1.3%

Sample

1st row: 용산원효루미니
2nd row: 포레나노원 아파트
3rd row: 사당4-3우성
4th row: 흑석한강푸르지오
5th row: 상계동양메이저
Value | Count | Frequency (%)
아파트 (apartment) | 197 | 1.8%
래미안 | 42 | 0.4%
e편한세상 | 41 | 0.4%
아이파크 | 35 | 0.3%
푸르지오 | 19 | 0.2%
고덕 | 18 | 0.2%
해모로 | 16 | 0.1%
이편한세상 | 15 | 0.1%
브라운스톤 | 15 | 0.1%
센트럴 | 14 | 0.1%
Other values (2334) | 10517 | 96.2%
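The table above counts words rather than whole names. Values are split on whitespace, which is why 아파트 (apartment) is counted separately from the name it belongs to. The corresponding table for 아파트코드 lists values such as a15703304, even though the stored codes begin with an uppercase A. This suggests that words are lowercased before counting. Here is a sketch under those two assumptions; the function name is hypothetical.

```python
from collections import Counter


def word_frequencies(values: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Top words by count, with each word's share (%) of all words.

    Assumption: words are split on whitespace and lowercased before counting.
    """
    words = Counter(w.lower() for value in values for w in value.split())
    total = sum(words.values())
    return [(w, c, round(100.0 * c / total, 1)) for w, c in words.most_common(top)]
```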

Most occurring characters

Value | Count | Frequency (%)
| 2655 | 3.6%
| 2606 | 3.5%
| 2498 | 3.4%
| 1907 | 2.6%
| 1659 | 2.2%
| 1597 | 2.1%
| 1493 | 2.0%
| 1452 | 2.0%
| 1448 | 1.9%
| 1415 | 1.9%
Other values (423) | 55695 | 74.8%
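Each character table lists the ten most frequent entries and folds the rest into one "Other values (n)" row, where n is the number of distinct characters folded in. The shown counts plus the folded count add back up to the totals: 18730 + 55695 = 74425 characters, and 10 + 423 = 433 distinct characters. Here is a sketch of that table construction; the function name is hypothetical.

```python
from collections import Counter


def char_table(values: list[str], top: int = 10) -> list[tuple[str, int, str]]:
    """Build 'Value | Count | Frequency (%)' rows over all characters.

    Characters ranked below `top` are folded into a single
    'Other values (n)' row, where n is how many distinct characters it holds.
    """
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    ranked = counts.most_common()
    rows = [(ch, c, f"{100 * c / total:.1f}%") for ch, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / total:.1f}%"))
    return rows
```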

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 68214 | 91.7%
Decimal Number | 3529 | 4.7%
Space Separator | 1014 | 1.4%
Uppercase Letter | 866 | 1.2%
Lowercase Letter | 323 | 0.4%
Open Punctuation | 132 | 0.2%
Close Punctuation | 132 | 0.2%
Dash Punctuation | 113 | 0.2%
Other Punctuation | 98 | 0.1%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 2655 | 3.9%
| 2606 | 3.8%
| 2498 | 3.7%
| 1907 | 2.8%
| 1659 | 2.4%
| 1597 | 2.3%
| 1493 | 2.2%
| 1452 | 2.1%
| 1448 | 2.1%
| 1415 | 2.1%
Other values (378) | 49484 | 72.5%
Uppercase Letter
Value | Count | Frequency (%)
C | 141 | 16.3%
S | 137 | 15.8%
K | 99 | 11.4%
M | 94 | 10.9%
D | 94 | 10.9%
L | 57 | 6.6%
H | 53 | 6.1%
E | 42 | 4.8%
I | 40 | 4.6%
V | 28 | 3.2%
Other values (7) | 81 | 9.4%
Lowercase Letter
Value | Count | Frequency (%)
e | 202 | 62.5%
l | 22 | 6.8%
i | 22 | 6.8%
k | 19 | 5.9%
s | 16 | 5.0%
v | 16 | 5.0%
c | 10 | 3.1%
w | 8 | 2.5%
a | 3 | 0.9%
g | 3 | 0.9%
Decimal Number
Value | Count | Frequency (%)
2 | 1060 | 30.0%
1 | 1042 | 29.5%
3 | 469 | 13.3%
4 | 245 | 6.9%
5 | 211 | 6.0%
6 | 146 | 4.1%
7 | 114 | 3.2%
8 | 95 | 2.7%
9 | 92 | 2.6%
0 | 55 | 1.6%
Other Punctuation
Value | Count | Frequency (%)
, | 73 | 74.5%
. | 25 | 25.5%
Space Separator
Value | Count | Frequency (%)
(space) | 1014 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 132 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 132 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 113 | 100.0%
Letter Number
Value | Count | Frequency (%)
| 4 | 100.0%
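Within each category, percentages are relative to that category's own total, not to all characters. That is why the most frequent Hangul character, with 2655 occurrences, is 3.6% of all 74425 characters but 3.9% of the 68214 Other Letter characters. Below is a sketch of the per-category ranking, keyed by the two-letter category code (for example `Lo` for Other Letter). The function name is hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def top_per_category(values: list[str], top: int = 10) -> dict[str, list[tuple[str, int, float]]]:
    """Group characters by general category, then rank them within each group.

    Percentages are relative to the group's own total, not to all characters.
    """
    groups: dict[str, Counter] = defaultdict(Counter)
    for value in values:
        for ch in value:
            groups[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counts in groups.items():
        total = sum(counts.values())
        result[cat] = [(ch, c, 100.0 * c / total) for ch, c in counts.most_common(top)]
    return result
```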

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 68214 | 91.7%
Common | 5018 | 6.7%
Latin | 1193 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 2655 | 3.9%
| 2606 | 3.8%
| 2498 | 3.7%
| 1907 | 2.8%
| 1659 | 2.4%
| 1597 | 2.3%
| 1493 | 2.2%
| 1452 | 2.1%
| 1448 | 2.1%
| 1415 | 2.1%
Other values (378) | 49484 | 72.5%
Latin
Value | Count | Frequency (%)
e | 202 | 16.9%
C | 141 | 11.8%
S | 137 | 11.5%
K | 99 | 8.3%
M | 94 | 7.9%
D | 94 | 7.9%
L | 57 | 4.8%
H | 53 | 4.4%
E | 42 | 3.5%
I | 40 | 3.4%
Other values (19) | 234 | 19.6%
Common
Value | Count | Frequency (%)
2 | 1060 | 21.1%
1 | 1042 | 20.8%
(space) | 1014 | 20.2%
3 | 469 | 9.3%
4 | 245 | 4.9%
5 | 211 | 4.2%
6 | 146 | 2.9%
( | 132 | 2.6%
) | 132 | 2.6%
7 | 114 | 2.3%
Other values (6) | 453 | 9.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 68214 | 91.7%
ASCII | 6207 | 8.3%
Number Forms | 4 | < 0.1%
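Blocks are contiguous code-point ranges. The sketch below classifies characters with a hand-written table that covers only the three blocks reported here. It uses Unicode's Basic Latin (U+0000 to U+007F, labeled ASCII in the report), Number Forms (U+2150 to U+218F) and Hangul Syllables (U+AC00 to U+D7AF). Treating the report's "Hangul" block as Hangul Syllables alone is an assumption, and the names `BLOCKS` and `block_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical subset of Unicode blocks: (first code point, last code point, label).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),  # Unicode block name: Basic Latin
    (0x2150, 0x218F, "Number Forms"),
    (0xAC00, 0xD7AF, "Hangul"),  # Unicode block name: Hangul Syllables
]


def block_of(ch: str) -> str:
    """Return the label of the block containing ch, or 'Other' if not listed."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values: list[str]) -> Counter:
    """Count characters per block across all values."""
    return Counter(block_of(ch) for value in values for ch in value)
```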

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 2655 | 3.9%
| 2606 | 3.8%
| 2498 | 3.7%
| 1907 | 2.8%
| 1659 | 2.4%
| 1597 | 2.3%
| 1493 | 2.2%
| 1452 | 2.1%
| 1448 | 2.1%
| 1415 | 2.1%
Other values (378) | 49484 | 72.5%
ASCII
Value | Count | Frequency (%)
2 | 1060 | 17.1%
1 | 1042 | 16.8%
(space) | 1014 | 16.3%
3 | 469 | 7.6%
4 | 245 | 3.9%
5 | 211 | 3.4%
e | 202 | 3.3%
6 | 146 | 2.4%
C | 141 | 2.3%
S | 137 | 2.2%
Other values (34) | 1540 | 24.8%
Number Forms
Value | Count | Frequency (%)
| 4 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2254
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 129
Unique (%): 1.3%
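Distinct counts how many different values occur, while Unique counts how many values occur exactly once. For 아파트코드, that gives 2254 distinct codes (22.5% of 10000 rows) but only 129 codes (1.3%) that appear in a single row. Here is a sketch; the function name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (distinct, unique): values seen at least once, and exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique
```

A column where every value repeats, such as 비용명 below, has a nonzero distinct count but a unique count of 0.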

Sample

1st row: A10023798
2nd row: A10024547
3rd row: A15681501
4th row: A15679108
5th row: A13981608
Value | Count | Frequency (%)
a15703304 | 13 | 0.1%
a12187403 | 12 | 0.1%
a15210211 | 12 | 0.1%
a13186305 | 11 | 0.1%
a10027633 | 11 | 0.1%
a15606001 | 11 | 0.1%
a12125202 | 11 | 0.1%
a13606004 | 11 | 0.1%
a12172401 | 11 | 0.1%
a13290404 | 11 | 0.1%
Other values (2244) | 9886 | 98.9%
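The profile constrains the code format. Every value is exactly 9 characters long, the letter A occurs 10000 times (once per row on average), and the remaining 80000 characters are decimal digits. That is consistent with an A followed by eight digits. The lowercase entries in the table above appear to come from lowercasing during counting, since the Uppercase Letter category holds all 10000 letters. The pattern below is inferred from these statistics and is not a documented format; the names are hypothetical.

```python
import re

# Hypothetical pattern inferred from the profile: 'A' followed by eight ASCII digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")


def invalid_codes(codes: list[str]) -> list[str]:
    """Return the codes that do not fully match the inferred pattern."""
    return [c for c in codes if not CODE_PATTERN.fullmatch(c)]
```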

Most occurring characters

ValueCountFrequency (%)
0 18582
20.6%
1 17469
19.4%
A 10000
11.1%
3 8859
9.8%
2 8437
9.4%
5 6159
 
6.8%
8 5487
 
6.1%
7 4601
 
5.1%
4 4018
 
4.5%
6 3453
 
3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18582 | 23.2%
1 | 17469 | 21.8%
3 | 8859 | 11.1%
2 | 8437 | 10.5%
5 | 6159 | 7.7%
8 | 5487 | 6.9%
7 | 4601 | 5.8%
4 | 4018 | 5.0%
6 | 3453 | 4.3%
9 | 2935 | 3.7%
Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18582 | 23.2%
1 | 17469 | 21.8%
3 | 8859 | 11.1%
2 | 8437 | 10.5%
5 | 6159 | 7.7%
8 | 5487 | 6.9%
7 | 4601 | 5.8%
4 | 4018 | 5.0%
6 | 3453 | 4.3%
9 | 2935 | 3.7%
Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18582 | 20.6%
1 | 17469 | 19.4%
A | 10000 | 11.1%
3 | 8859 | 9.8%
2 | 8437 | 9.4%
5 | 6159 | 6.8%
8 | 5487 | 6.1%
7 | 4601 | 5.1%
4 | 4018 | 4.5%
6 | 3453 | 3.8%

비용명 (cost item)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8053
Min length: 2

Characters and Unicode

Total characters: 48053
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

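1st row: 청소비 (cleaning)
2nd row: 주차장수익 (parking income)
3rd row: 세대급탕비 (household hot water)
4th row: 교육비 (education/training)
5th row: 도서인쇄비 (books and printing)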
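Value | Count | Frequency (%)
급여 (salaries) | 236 | 2.4%
연체료수익 (late-payment fee income) | 234 | 2.3%
소독비 (disinfection) | 232 | 2.3%
승강기유지비 (elevator maintenance) | 226 | 2.3%
경비비 (security) | 222 | 2.2%
교육비 (education/training) | 222 | 2.2%
수선유지비 (repair and maintenance) | 221 | 2.2%
고용보험료 (employment insurance) | 220 | 2.2%
통신비 (communications) | 218 | 2.2%
잡수익 (miscellaneous income) | 215 | 2.1%
Other values (77) | 7754 | 77.5%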

Most occurring characters

Value | Count | Frequency (%)
| 5406 | 11.3%
| 3635 | 7.6%
| 2199 | 4.6%
| 2002 | 4.2%
| 1371 | 2.9%
| 1346 | 2.8%
| 1064 | 2.2%
| 930 | 1.9%
| 843 | 1.8%
| 803 | 1.7%
Other values (110) | 28454 | 59.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48053 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 5406 | 11.3%
| 3635 | 7.6%
| 2199 | 4.6%
| 2002 | 4.2%
| 1371 | 2.9%
| 1346 | 2.8%
| 1064 | 2.2%
| 930 | 1.9%
| 843 | 1.8%
| 803 | 1.7%
Other values (110) | 28454 | 59.2%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48053 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 5406 | 11.3%
| 3635 | 7.6%
| 2199 | 4.6%
| 2002 | 4.2%
| 1371 | 2.9%
| 1346 | 2.8%
| 1064 | 2.2%
| 930 | 1.9%
| 843 | 1.8%
| 803 | 1.7%
Other values (110) | 28454 | 59.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48053 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 5406 | 11.3%
| 3635 | 7.6%
| 2199 | 4.6%
| 2002 | 4.2%
| 1371 | 2.9%
| 1346 | 2.8%
| 1064 | 2.2%
| 930 | 1.9%
| 843 | 1.8%
| 803 | 1.7%
Other values (110) | 28454 | 59.2%

년월일 (date, stored as YYYYMM)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202306
2nd row: 202306
3rd row: 202306
4th row: 202306
5th row: 202306

Common Values

Value | Count | Frequency (%)
202306 | 10000 | 100.0%
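The column name 년월일 reads as year-month-day, but every value has six characters. So 202306 is most plausibly year 2023, month 06 (YYYYMM). Here is a sketch that parses the value under that assumption; the function name is hypothetical.

```python
from datetime import date, datetime


def parse_year_month(value: str) -> date:
    """Parse a YYYYMM string such as '202306' into the first day of that month."""
    return datetime.strptime(value, "%Y%m").date()
```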

Length

[Figure: Histogram of lengths of the category]


금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7347
Distinct (%): 73.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3613122.1
Minimum: -9386072
Maximum: 3.2769172 × 10⁸
Zeros: 1073
Zeros (%): 10.7%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -9386072
5th percentile: 0
Q1: 75000
Median: 314313.5
Q3: 1500250
95th percentile: 18421460
Maximum: 3.2769172 × 10⁸
Range: 3.3707779 × 10⁸
Interquartile range (IQR): 1425250
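The quantiles are consistent with one another. IQR = Q3 - Q1 = 1500250 - 75000 = 1425250, and Range = Maximum - Minimum = 327691720 - (-9386072) = 337077792 ≈ 3.3707779 × 10⁸. The sketch below recomputes quartiles with linear interpolation between sorted values, which is the default in NumPy and pandas. Whether the report uses exactly this interpolation is an assumption, and the function name is hypothetical.

```python
import statistics


def quantile_summary(xs: list[float]) -> dict[str, float]:
    """Quartiles, IQR and range using linear interpolation between order statistics.

    method="inclusive" treats the minimum and maximum as the 0th and 100th
    percentiles, which matches NumPy's and pandas' default linear method.
    """
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "range": max(xs) - min(xs),
    }
```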

Descriptive statistics

Standard deviation: 13500400
Coefficient of variation (CV): 3.7364916
Kurtosis: 186.63007
Mean: 3613122.1
Median Absolute Deviation (MAD): 311912
Skewness: 11.337319
Sum: 3.6131221 × 10¹⁰
Variance: 1.8226081 × 10¹⁴
Monotonicity: Not monotonic
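The descriptive statistics are linked by simple identities. CV = standard deviation / mean = 13500400 / 3613122.1 ≈ 3.7365, variance = 13500400² ≈ 1.8226 × 10¹⁴, and sum = mean × 10000 ≈ 3.6131221 × 10¹⁰. The sketch below computes CV, MAD (median absolute deviation from the median), skewness and excess kurtosis. For the last two it uses the bias-corrected estimators that pandas applies in `Series.skew` and `Series.kurt`. That the report relies on these estimators is an assumption, and the function name is hypothetical. The large positive skewness (11.3) and kurtosis (186.6) point to a long right tail.

```python
import math
import statistics


def shape_stats(xs: list[float]) -> dict[str, float]:
    """CV, MAD, bias-corrected skewness and bias-corrected excess kurtosis.

    Requires at least four values, a non-zero mean and a non-zero variance.
    """
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # biased sample skewness
    g2 = m4 / m2 ** 2 - 3.0      # biased sample excess kurtosis
    # Adjusted Fisher-Pearson skewness and unbiased excess kurtosis.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {"cv": sd / mean, "mad": mad, "skewness": skew, "kurtosis": kurt}
```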
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
0 | 1073 | 10.7%
200000 | 84 | 0.8%
100000 | 64 | 0.6%
300000 | 52 | 0.5%
30000 | 42 | 0.4%
150000 | 39 | 0.4%
50000 | 33 | 0.3%
250000 | 32 | 0.3%
400000 | 32 | 0.3%
60000 | 31 | 0.3%
Other values (7337) | 8518 | 85.2%

Minimum 10 values

Value | Count | Frequency (%)
-9386072 | 1 | < 0.1%
-3813700 | 1 | < 0.1%
-2111647 | 1 | < 0.1%
-378835 | 1 | < 0.1%
-359860 | 1 | < 0.1%
-240000 | 1 | < 0.1%
-38500 | 1 | < 0.1%
0 | 1073 | 10.7%
1 | 2 | < 0.1%
2 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
327691720 | 1 | < 0.1%
316444270 | 1 | < 0.1%
308092974 | 1 | < 0.1%
269644910 | 1 | < 0.1%
257741446 | 1 | < 0.1%
246798915 | 1 | < 0.1%
235793698 | 1 | < 0.1%
231980129 | 1 | < 0.1%
207953100 | 1 | < 0.1%
205460450 | 1 | < 0.1%
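The two extreme-value tables list distinct values with their counts, sorted from the smallest and from the largest. For example, 0 appears once in the minimum table with its full count of 1073. Here is a sketch of that ordering; the function name is hypothetical.

```python
from collections import Counter


def extreme_value_counts(xs: list[float], k: int = 10) -> tuple[list[tuple[float, int]], list[tuple[float, int]]]:
    """Return the k smallest and k largest distinct values with their counts."""
    ordered = sorted(Counter(xs).items())  # (value, count) pairs, ascending by value
    smallest = ordered[:k]
    largest = ordered[::-1][:k]  # descending by value
    return smallest, largest
```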

Interactions


Correlations

| 비용명 (cost item) | 금액 (amount)
비용명 | 1.000 | 0.441
금액 | 0.441 | 1.000
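The report does not state which coefficient produced 0.441 for the categorical 비용명 and the numeric 금액. One standard measure of association between a categorical and a numeric variable is the correlation ratio η, whose square is the share of total variance explained by the group means. The sketch below implements η as an illustration only. It is a swapped-in technique and is not necessarily how ydata-profiling computed this value; the function name is hypothetical.

```python
import math
from collections import defaultdict


def correlation_ratio(categories: list[str], values: list[float]) -> float:
    """Correlation ratio (eta) between a categorical and a numeric variable.

    eta^2 = between-group sum of squares / total sum of squares, so eta is in [0, 1].
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for cat, v in zip(categories, values):
        groups[cat].append(v)
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_total = sum((v - grand) ** 2 for v in values)
    # A constant numeric column has no variance to explain.
    return math.sqrt(ss_between / ss_total) if ss_total else 0.0
```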

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

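First rows

| 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (date) | 금액 (amount)
965 | 용산원효루미니 | A10023798 | 청소비 (cleaning) | 202306 | 19518051
3624 | 포레나노원 아파트 | A10024547 | 주차장수익 (parking income) | 202306 | 4941670
89097 | 사당4-3우성 | A15681501 | 세대급탕비 (household hot water) | 202306 | 16609320
88719 | 흑석한강푸르지오 | A15679108 | 교육비 (education/training) | 202306 | 155000
63937 | 상계동양메이저 | A13981608 | 도서인쇄비 (books and printing) | 202306 | 276100
88726 | 흑석한강푸르지오 | A15679108 | 이자수익 (interest income) | 202306 | 274343
46786 | 삼선코오롱아파트 | A13604401 | 소독비 (disinfection) | 202306 | 345000
31693 | 창동쌍용 | A13204406 | 사무용품비 (office supplies) | 202306 | 373520
16857 | 충정리시온 | A12070201 | 검침비용 (meter reading) | 202306 | 110940
95519 | 목동금호어울림 | A15805403 | 피복비 (uniforms) | 202306 | 0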
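
Last rows

| 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (date) | 금액 (amount)
93255 | 강서센트레빌4차 | A15781201 | 경비비 (security) | 202306 | 8281370
31597 | 창동현대4차아이파크 | A13204402 | 검침수익 (meter-reading income) | 202306 | 84000
60737 | 상계주공9단지 | A13921005 | 세대난방비 (household heating) | 202306 | 9837700
46924 | 돈암일신건영휴먼빌 | A13606003 | 교통비 (transportation) | 202306 | 2400
18829 | 상암월드컵파크5단지 | A12127001 | 잡비용 (miscellaneous expenses) | 202306 | 100000
76167 | 래미안당산1차아파트 | A15081001 | 승강기유지비 (elevator maintenance) | 202306 | 880000
37339 | 한양현대아파트 | A13384301 | 위탁관리수수료 (management contract fee) | 202306 | 160000
38417 | 둔촌프라자 | A13406005 | 고용보험료 (employment insurance) | 202306 | 216690
92366 | 화곡초록 | A15770801 | 산재보험료 (industrial accident insurance) | 202306 | 197510
81593 | 오류금강수목원 | A15210211 | 통신비 (communications) | 202306 | 148247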