Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
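
The average record size follows from the total memory size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics in this report.
total_kib = 488.3        # Total size in memory, in KiB
n_observations = 10_000  # Number of observations

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # prints "50.0 B"
```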

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202209" (Constant)
금액 has 1525 (15.2%) zeros (Zeros)
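
The two alerts can be reproduced on a small scale. The sketch below applies a constant-value check and a zero-share calculation to the (년월일, 금액) pairs of the first ten rows in this report's Sample section. The helper names `is_constant` and `zero_share` are hypothetical and are not ydata-profiling functions, and the share computed here describes only those ten rows, not the full 10000 observations.

```python
# (년월일, 금액) pairs copied from the first ten rows of the Sample section.
rows = [
    ("202209", 0), ("202209", 46690), ("202209", 0), ("202209", 0),
    ("202209", 0), ("202209", 0), ("202209", 185400), ("202209", 210400),
    ("202209", 2480520), ("202209", 145000),
]

def is_constant(values):
    """True when every entry holds the same value."""
    return len(set(values)) == 1

def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

dates = [d for d, _ in rows]
amounts = [a for _, a in rows]

print(is_constant(dates))   # True: every row carries 202209
print(zero_share(amounts))  # 0.5 for these ten rows
```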

Reproduction

Analysis started: 2024-05-11 06:51:03.934481
Analysis finished: 2024-05-11 06:51:05.807571
Duration: 1.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
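
A report of this kind could be regenerated roughly as follows. This is a sketch under stated assumptions, not the exact procedure behind this report: the CSV path, the report title, the random seed, and the idea that the 10000 observations are a random sample of a larger file are all guesses, and `build_report` is a hypothetical helper. Only `ProfileReport`, its `dataset` metadata argument, and `to_file` come from ydata-profiling.

```python
# Dataset metadata as listed in the report's Dataset section.
DATASET_META = {
    "description": "파일 다운로드",  # "File download"
    "author": "서울특별시",
    "url": "https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do",
}

def build_report(csv_path: str, html_path: str = "report.html",
                 sample_size: int = 10_000) -> None:
    """Profile a random sample of the CSV and write an HTML report."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # Assumption: the report was built from a random sample of the full file.
    sample = df.sample(n=min(sample_size, len(df)), random_state=0)
    report = ProfileReport(sample, title="Profiling report", dataset=DATASET_META)
    report.to_file(html_path)
```

Calling `build_report("expenses.csv")` with a locally downloaded file (the file name is a placeholder) would write `report.html` next to the script.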

Variables

아파트명 (apartment name)
Text

Distinct: 2130
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.3437
Min length: 2

Characters and Unicode

Total characters: 73437
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
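
As a rough illustration of these character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module for two 아파트명 values taken from this report's Sample section. The long category labels are a hand-written mapping added for readability, and `category_counts` is a hypothetical helper. Script and block lookups are left out because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Hand-written labels for the two-letter general-category codes seen here
# (an assumption for readability; unicodedata only returns the short code).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}

def category_counts(text: str) -> Counter:
    """Count characters in text by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

for name in ["명일삼익가든1,2차", "가양도시개발공사8단지(임대)"]:
    print(name, dict(category_counts(name)))
```

Hangul syllables fall under Other Letter, which is why that category dominates the totals for this variable.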

Unique

Unique: 109
Unique (%): 1.1%

Sample

1st row압구정신현대
2nd row응암신동아
3rd row상계불암대림
4th row방학신동아1단지
5th row염창벽산늘푸른
Value | Count | Frequency (%)
아파트 | 185 | 1.7%
래미안 | 50 | 0.5%
e편한세상 | 33 | 0.3%
아이파크 | 25 | 0.2%
sk뷰 | 20 | 0.2%
송파 | 16 | 0.1%
푸르지오 | 15 | 0.1%
롯데캐슬아파트 | 15 | 0.1%
잠원신화 | 15 | 0.1%
래미안밤섬리베뉴 | 15 | 0.1%
Other values (2212) | 10479 | 96.4%

Most occurring characters

ValueCountFrequency (%)
2763
 
3.8%
2665
 
3.6%
2472
 
3.4%
1730
 
2.4%
1636
 
2.2%
1581
 
2.2%
1476
 
2.0%
1468
 
2.0%
1394
 
1.9%
1279
 
1.7%
Other values (419) 54973
74.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67528 | 92.0%
Decimal Number | 3333 | 4.5%
Space Separator | 944 | 1.3%
Uppercase Letter | 748 | 1.0%
Lowercase Letter | 342 | 0.5%
Open Punctuation | 152 | 0.2%
Close Punctuation | 152 | 0.2%
Dash Punctuation | 118 | 0.2%
Other Punctuation | 113 | 0.2%
Letter Number | 7 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2763
 
4.1%
2665
 
3.9%
2472
 
3.7%
1730
 
2.6%
1636
 
2.4%
1581
 
2.3%
1476
 
2.2%
1468
 
2.2%
1394
 
2.1%
1279
 
1.9%
Other values (374) 49064
72.7%
Uppercase Letter
ValueCountFrequency (%)
S 132
17.6%
C 121
16.2%
K 105
14.0%
M 78
10.4%
D 78
10.4%
H 35
 
4.7%
L 35
 
4.7%
E 34
 
4.5%
I 28
 
3.7%
V 23
 
3.1%
Other values (7) 79
10.6%
Lowercase Letter
ValueCountFrequency (%)
e 205
59.9%
i 31
 
9.1%
l 30
 
8.8%
v 21
 
6.1%
s 17
 
5.0%
k 16
 
4.7%
w 14
 
4.1%
h 2
 
0.6%
g 2
 
0.6%
a 2
 
0.6%
Decimal Number
ValueCountFrequency (%)
2 1031
30.9%
1 1009
30.3%
3 441
13.2%
4 226
 
6.8%
5 174
 
5.2%
6 153
 
4.6%
7 89
 
2.7%
8 80
 
2.4%
9 70
 
2.1%
0 60
 
1.8%
Other Punctuation
ValueCountFrequency (%)
, 89
78.8%
. 24
 
21.2%
Space Separator
ValueCountFrequency (%)
944
100.0%
Open Punctuation
ValueCountFrequency (%)
( 152
100.0%
Close Punctuation
ValueCountFrequency (%)
) 152
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 118
100.0%
Letter Number
ValueCountFrequency (%)
7
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67528 | 92.0%
Common | 4812 | 6.6%
Latin | 1097 | 1.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2763
 
4.1%
2665
 
3.9%
2472
 
3.7%
1730
 
2.6%
1636
 
2.4%
1581
 
2.3%
1476
 
2.2%
1468
 
2.2%
1394
 
2.1%
1279
 
1.9%
Other values (374) 49064
72.7%
Latin
ValueCountFrequency (%)
e 205
18.7%
S 132
12.0%
C 121
11.0%
K 105
9.6%
M 78
 
7.1%
D 78
 
7.1%
H 35
 
3.2%
L 35
 
3.2%
E 34
 
3.1%
i 31
 
2.8%
Other values (19) 243
22.2%
Common
ValueCountFrequency (%)
2 1031
21.4%
1 1009
21.0%
944
19.6%
3 441
9.2%
4 226
 
4.7%
5 174
 
3.6%
6 153
 
3.2%
( 152
 
3.2%
) 152
 
3.2%
- 118
 
2.5%
Other values (6) 412
 
8.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67528 | 92.0%
ASCII | 5902 | 8.0%
Number Forms | 7 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2763
 
4.1%
2665
 
3.9%
2472
 
3.7%
1730
 
2.6%
1636
 
2.4%
1581
 
2.3%
1476
 
2.2%
1468
 
2.2%
1394
 
2.1%
1279
 
1.9%
Other values (374) 49064
72.7%
ASCII
ValueCountFrequency (%)
2 1031
17.5%
1 1009
17.1%
944
16.0%
3 441
 
7.5%
4 226
 
3.8%
e 205
 
3.5%
5 174
 
2.9%
6 153
 
2.6%
( 152
 
2.6%
) 152
 
2.6%
Other values (34) 1415
24.0%
Number Forms
ValueCountFrequency (%)
7
100.0%

아파트코드 (apartment code)
Text

Distinct: 2134
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 109
Unique (%): 1.1%

Sample

1st rowA13511004
2nd rowA12201101
3rd rowA13981006
4th rowA13202312
5th rowA15704009
Value | Count | Frequency (%)
a13790703 | 15 | 0.1%
a13613011 | 13 | 0.1%
a10024131 | 13 | 0.1%
a10027553 | 13 | 0.1%
a13481305 | 12 | 0.1%
a13204402 | 12 | 0.1%
a10026682 | 12 | 0.1%
a13610003 | 11 | 0.1%
a12179505 | 11 | 0.1%
a14278101 | 11 | 0.1%
Other values (2124) | 9877 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18877 | 21.0%
1 | 17511 | 19.5%
A | 10000 | 11.1%
3 | 9001 | 10.0%
2 | 8457 | 9.4%
5 | 5983 | 6.6%
8 | 5282 | 5.9%
7 | 4496 | 5.0%
4 | 4027 | 4.5%
6 | 3502 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 18877
23.6%
1 17511
21.9%
3 9001
11.3%
2 8457
10.6%
5 5983
 
7.5%
8 5282
 
6.6%
7 4496
 
5.6%
4 4027
 
5.0%
6 3502
 
4.4%
9 2864
 
3.6%
Uppercase Letter
ValueCountFrequency (%)
A 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 80000
88.9%
Latin 10000
 
11.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 18877
23.6%
1 17511
21.9%
3 9001
11.3%
2 8457
10.6%
5 5983
 
7.5%
8 5282
 
6.6%
7 4496
 
5.6%
4 4027
 
5.0%
6 3502
 
4.4%
9 2864
 
3.6%
Latin
ValueCountFrequency (%)
A 10000
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 18877
21.0%
1 17511
19.5%
A 10000
11.1%
3 9001
10.0%
2 8457
9.4%
5 5983
 
6.6%
8 5282
 
5.9%
7 4496
 
5.0%
4 4027
 
4.5%
6 3502
 
3.9%

비용명 (expense item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8958
Min length: 2

Characters and Unicode

Total characters: 48958
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%
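
The report separates distinct values (how many different values occur) from unique values (how many occur exactly once). A small sketch of that distinction, using the five sample rows listed for this variable; the variable names are illustrative:

```python
from collections import Counter

# The five sample rows shown for this variable in the report.
sample = ["피복비", "산재보험료", "고용안정사업수익", "고용안정사업비용", "고용안정사업수익"]

counts = Counter(sample)
distinct = len(counts)                              # different values present
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(distinct, unique)  # prints "4 3"
```

In these five rows 고용안정사업수익 appears twice, so it counts as distinct but not unique.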

Sample

1st row피복비
2nd row산재보험료
3rd row고용안정사업수익
4th row고용안정사업비용
5th row고용안정사업수익
Value | Count | Frequency (%)
청소비 | 235 | 2.4%
도서인쇄비 | 232 | 2.3%
퇴직급여 | 226 | 2.3%
수선유지비 | 223 | 2.2%
세대전기료 | 217 | 2.2%
승강기유지비 | 211 | 2.1%
산재보험료 | 211 | 2.1%
급여 | 211 | 2.1%
이자수익 | 210 | 2.1%
입주자대표회의운영비 | 209 | 2.1%
Other values (76) | 7815 | 78.1%

Most occurring characters

ValueCountFrequency (%)
5370
 
11.0%
3538
 
7.2%
2082
 
4.3%
2035
 
4.2%
1696
 
3.5%
1301
 
2.7%
1058
 
2.2%
815
 
1.7%
802
 
1.6%
776
 
1.6%
Other values (110) 29485
60.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 48958
100.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5370
 
11.0%
3538
 
7.2%
2082
 
4.3%
2035
 
4.2%
1696
 
3.5%
1301
 
2.7%
1058
 
2.2%
815
 
1.7%
802
 
1.6%
776
 
1.6%
Other values (110) 29485
60.2%

Most occurring scripts

ValueCountFrequency (%)
Hangul 48958
100.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5370
 
11.0%
3538
 
7.2%
2082
 
4.3%
2035
 
4.2%
1696
 
3.5%
1301
 
2.7%
1058
 
2.2%
815
 
1.7%
802
 
1.6%
776
 
1.6%
Other values (110) 29485
60.2%

Most occurring blocks

ValueCountFrequency (%)
Hangul 48958
100.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
5370
 
11.0%
3538
 
7.2%
2082
 
4.3%
2035
 
4.2%
1696
 
3.5%
1301
 
2.7%
1058
 
2.2%
815
 
1.7%
802
 
1.6%
776
 
1.6%
Other values (110) 29485
60.2%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202209: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row202209
2nd row202209
3rd row202209
4th row202209
5th row202209

Common Values

Value | Count | Frequency (%)
202209 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202209 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6865
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3193262
Minimum: -6300000
Maximum: 5.7724876 × 10^8
Zeros: 1525
Zeros (%): 15.2%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -6300000
5-th percentile: 0
Q1: 37435
Median: 271405
Q3: 1378895
95-th percentile: 15848620
Maximum: 5.7724876 × 10^8
Range: 5.8354876 × 10^8
Interquartile range (IQR): 1341460
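
The range and interquartile range are derived from the other quantile statistics. A short check of that arithmetic, using the exact maximum 577248761 listed among the largest values of 금액 in this report (the variable names are illustrative):

```python
# Quantile statistics copied from the report.
minimum = -6_300_000
maximum = 577_248_761  # exact form of 5.7724876 × 10^8
q1 = 37_435
q3 = 1_378_895

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(value_range)  # 583548761, i.e. about 5.8354876 × 10^8
print(iqr)          # 1341460
```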

Descriptive statistics

Standard deviation: 12523238
Coefficient of variation (CV): 3.9217697
Kurtosis: 602.76303
Mean: 3193262
Median Absolute Deviation (MAD): 271405
Skewness: 18.340876
Sum: 3.193262 × 10^10
Variance: 1.568315 × 10^14
Monotonicity: Not monotonic
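
Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below recomputes them from the rounded values printed in the report, so agreement is only expected to within rounding (the variable names are illustrative):

```python
import math

# Rounded figures copied from the report.
n = 10_000
mean = 3_193_262
std = 12_523_238

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation
total = mean * n      # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.6e}")
print(f"Sum      = {total:.6e}")

# Compare against the reported values, allowing for rounding in the inputs.
print(math.isclose(cv, 3.9217697, rel_tol=1e-5))
print(math.isclose(variance, 1.568315e14, rel_tol=1e-5))
print(math.isclose(total, 3.193262e10, rel_tol=1e-6))
```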
Histogram with fixed size bins (bins=50)
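
With 50 fixed-size bins spanning the minimum to the maximum, the strong right skew pushes almost all observations into the first two bins. The sketch below assumes equal-width bins over [minimum, maximum] with the last bin closed on the right, which is the usual convention but is not confirmed by the report; `bin_index` is a hypothetical helper.

```python
BINS = 50
minimum, maximum = -6_300_000, 577_248_761  # from the quantile statistics
width = (maximum - minimum) / BINS          # about 11.67 million per bin

def bin_index(x: float) -> int:
    """0-based index of the equal-width bin containing x (last bin includes the maximum)."""
    return min(int((x - minimum) // width), BINS - 1)

print(bin_index(0))           # 0: the zeros sit in the first bin
print(bin_index(1_378_895))   # 0: Q3 is still in the first bin
print(bin_index(15_848_620))  # 1: the 95-th percentile is in the second bin
print(bin_index(maximum))     # 49: the maximum lands in the last bin
```

Since Q3 falls in the first bin, at least 75% of the observations do too, and at least 95% fall within the first two.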
Common Values
Value | Count | Frequency (%)
0 | 1525 | 15.2%
200000 | 86 | 0.9%
23000 | 58 | 0.6%
100000 | 55 | 0.5%
300000 | 53 | 0.5%
110000 | 44 | 0.4%
400000 | 40 | 0.4%
50000 | 39 | 0.4%
150000 | 33 | 0.3%
120000 | 32 | 0.3%
Other values (6855) | 8035 | 80.3%
Minimum 10 values
Value | Count | Frequency (%)
-6300000 | 1 | < 0.1%
-1669991 | 1 | < 0.1%
-386700 | 1 | < 0.1%
-240000 | 1 | < 0.1%
-38488 | 1 | < 0.1%
-32300 | 1 | < 0.1%
-23750 | 1 | < 0.1%
-1660 | 1 | < 0.1%
0 | 1525 | 15.2%
3 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
577248761 | 1 | < 0.1%
324714141 | 1 | < 0.1%
322958188 | 1 | < 0.1%
282817737 | 1 | < 0.1%
259474144 | 1 | < 0.1%
234664530 | 1 | < 0.1%
202149380 | 1 | < 0.1%
194184124 | 1 | < 0.1%
153049931 | 1 | < 0.1%
117465712 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.230
금액 | 0.230 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
44108 | 압구정신현대 | A13511004 | 피복비 | 202209 | 0
22203 | 응암신동아 | A12201101 | 산재보험료 | 202209 | 46690
66833 | 상계불암대림 | A13981006 | 고용안정사업수익 | 202209 | 0
31863 | 방학신동아1단지 | A13202312 | 고용안정사업비용 | 202209 | 0
95150 | 염창벽산늘푸른 | A15704009 | 고용안정사업수익 | 202209 | 0
97611 | 가양대림경동 | A15780703 | 고용안정사업비용 | 202209 | 0
31047 | 신내새한아파트 | A13187406 | 소모품비 | 202209 | 185400
40230 | 명일삼익가든1,2차 | A13407002 | 기타부대비 | 202209 | 210400
8228 | 힐스테이트청계 | A10026104 | 퇴직급여 | 202209 | 2480520
1772 | 이수푸르지오더프레티움 | A10024245 | 교육비 | 202209 | 145000

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
6886 | 경희궁 롯데캐슬아파트 | A10025710 | 장기수선비 | 202209 | 2725150
99060 | 염창동아3차 | A15786227 | 제수당 | 202209 | 1619100
61860 | 문정푸르지오2차 | A13882401 | 퇴직급여 | 202209 | 944850
74885 | 한강우성아파트 | A14319010 | 위탁관리수수료 | 202209 | 390000
10836 | 올림픽파크한양수자인 | A10027354 | 기타부대비 | 202209 | 50800
97750 | 가양도시개발공사8단지(임대) | A15780904 | 교육비 | 202209 | 45000
44919 | 수서신동아 | A13522006 | 세대전기료 | 202209 | 30277240
46259 | 대치동부센트레빌 | A13528103 | 급여 | 202209 | 31889770
94329 | 코오롱하늘채아파트 | A15703001 | 승강기수익 | 202209 | 100000
29232 | 용마금호타운 | A13181203 | 부과차익 | 202209 | 9900