Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
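
The average record size appears to be the total memory footprint divided by the number of observations. A minimal check of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the dataset statistics above.
total_kib = 488.3
n_rows = 10_000

# Assumption: KiB means 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B")
```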

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "202310" (Constant)
금액 (amount) has 1580 (15.8%) zeros (Zeros)
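
The two alerts correspond to simple column-level checks: a column is constant when it holds a single distinct value, and the zeros alert reports the share of entries equal to zero. The sketch below shows the idea on toy data. The function names and the sample columns are hypothetical, and it is not the profiling library's own implementation.

```python
def is_constant(values):
    """Return True when every value in the column is identical."""
    return len(set(values)) == 1


def zeros_share(values):
    """Return (count, fraction) of entries equal to zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, zeros / len(values)


# Hypothetical toy columns, not the real dataset.
dates = ["202310"] * 5
amounts = [0, 42360, 1125200, 0, 268980]

print(is_constant(dates))
count, frac = zeros_share(amounts)
print(count, f"{frac:.1%}")
```

Applied to the full 금액 column, the same ratio would be 1580 / 10000 = 15.8%.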

Reproduction

Analysis started: 2024-05-11 06:49:30.459701
Analysis finished: 2024-05-11 06:49:33.714102
Duration: 3.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2178
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.3349
Min length: 2

Characters and Unicode

Total characters: 73349
Distinct characters: 430
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
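
As a rough illustration of this kind of character analysis, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample names from this report as input. It covers categories only, because the standard library does not expose script or block properties, and it is not necessarily how the report computes its figures.

```python
import unicodedata
from collections import Counter


def category_counts(strings):
    """Count Unicode general categories over all characters in the strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample values listed for this variable.
names = ["금호동롯데아파트", "홍제성원아파트", "목동금호1차", "한강", "창동상아1차"]
counts = category_counts(names)
print(counts)
```

Here `Lo` is the short code for "Other Letter" (which includes Hangul syllables) and `Nd` is "Decimal Number".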

Unique

Unique: 106
Unique (%): 1.1%
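
The report distinguishes "Distinct" from "Unique". Assuming the usual reading, distinct is the number of different values and unique is the number of values that occur exactly once. The two can be sketched as follows; the function name and toy data are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column.
codes = ["A1", "A1", "A2", "A3", "A3", "A4"]
print(distinct_and_unique(codes))
```

Under this reading, 2178 different names appear in the column and 106 of them appear only once.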

Sample

1st row: 금호동롯데아파트
2nd row: 홍제성원아파트
3rd row: 목동금호1차
4th row: 한강
5th row: 창동상아1차

Value  Count  Frequency (%)
아파트  216  2.0%
래미안  51  0.5%
e편한세상  31  0.3%
힐스테이트  26  0.2%
아이파크  19  0.2%
sk뷰  18  0.2%
신도림현대  17  0.2%
푸르지오  16  0.1%
송파  16  0.1%
신반포  15  0.1%
Other values (2262)  10526  96.1%
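
The table above looks like a word-frequency count. The entries are lowercased (for example `sk뷰`) and the percentages are shares of all tokens rather than of rows. The sketch below assumes whitespace tokenization and lowercasing, which is an inference from the table and not a documented detail. The function name and input are hypothetical.

```python
from collections import Counter


def word_frequencies(strings):
    """Return (word, count, share of all tokens), most frequent first."""
    words = Counter()
    for s in strings:
        # Assumption: tokens are whitespace-separated and case-folded.
        words.update(s.lower().split())
    total = sum(words.values())
    return [(w, c, c / total) for w, c in words.most_common()]


# Hypothetical toy input, not the real column.
names = ["래미안 아파트", "SK뷰", "신도림현대 아파트", "sk뷰"]
for word, count, share in word_frequencies(names):
    print(word, count, f"{share:.1%}")
```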

Most occurring characters

Value  Count  Frequency (%)
(lost)  2694  3.7%
(lost)  2574  3.5%
(lost)  2488  3.4%
(lost)  1734  2.4%
(lost)  1691  2.3%
(lost)  1640  2.2%
(lost)  1443  2.0%
(lost)  1405  1.9%
(lost)  1392  1.9%
(lost)  1335  1.8%
Other values (420)  54953  74.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  67261  91.7%
Decimal Number  3377  4.6%
Space Separator  1053  1.4%
Uppercase Letter  817  1.1%
Lowercase Letter  334  0.5%
Close Punctuation  144  0.2%
Open Punctuation  144  0.2%
Other Punctuation  109  0.1%
Dash Punctuation  104  0.1%
Letter Number  6  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  2694  4.0%
(lost)  2574  3.8%
(lost)  2488  3.7%
(lost)  1734  2.6%
(lost)  1691  2.5%
(lost)  1640  2.4%
(lost)  1443  2.1%
(lost)  1405  2.1%
(lost)  1392  2.1%
(lost)  1335  2.0%
Other values (375)  48865  72.6%

Uppercase Letter
Value  Count  Frequency (%)
S  128  15.7%
C  128  15.7%
K  99  12.1%
D  91  11.1%
M  91  11.1%
L  43  5.3%
H  40  4.9%
E  40  4.9%
I  36  4.4%
V  25  3.1%
Other values (7)  96  11.8%

Lowercase Letter
Value  Count  Frequency (%)
e  188  56.3%
l  28  8.4%
i  25  7.5%
k  20  6.0%
s  20  6.0%
v  17  5.1%
c  14  4.2%
h  7  2.1%
w  7  2.1%
g  4  1.2%

Decimal Number
Value  Count  Frequency (%)
2  1026  30.4%
1  995  29.5%
3  456  13.5%
4  222  6.6%
5  186  5.5%
6  147  4.4%
7  103  3.1%
8  103  3.1%
9  92  2.7%
0  47  1.4%

Other Punctuation
Value  Count  Frequency (%)
,  94  86.2%
.  15  13.8%

Space Separator
Value  Count  Frequency (%)
(space)  1053  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  144  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  144  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  104  100.0%

Letter Number
Value  Count  Frequency (%)
(lost)  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  67261  91.7%
Common  4931  6.7%
Latin  1157  1.6%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  2694  4.0%
(lost)  2574  3.8%
(lost)  2488  3.7%
(lost)  1734  2.6%
(lost)  1691  2.5%
(lost)  1640  2.4%
(lost)  1443  2.1%
(lost)  1405  2.1%
(lost)  1392  2.1%
(lost)  1335  2.0%
Other values (375)  48865  72.6%

Latin
Value  Count  Frequency (%)
e  188  16.2%
S  128  11.1%
C  128  11.1%
K  99  8.6%
D  91  7.9%
M  91  7.9%
L  43  3.7%
H  40  3.5%
E  40  3.5%
I  36  3.1%
Other values (19)  273  23.6%

Common
Value  Count  Frequency (%)
(space)  1053  21.4%
2  1026  20.8%
1  995  20.2%
3  456  9.2%
4  222  4.5%
5  186  3.8%
6  147  3.0%
)  144  2.9%
(  144  2.9%
-  104  2.1%
Other values (6)  454  9.2%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  67261  91.7%
ASCII  6082  8.3%
Number Forms  6  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  2694  4.0%
(lost)  2574  3.8%
(lost)  2488  3.7%
(lost)  1734  2.6%
(lost)  1691  2.5%
(lost)  1640  2.4%
(lost)  1443  2.1%
(lost)  1405  2.1%
(lost)  1392  2.1%
(lost)  1335  2.0%
Other values (375)  48865  72.6%

ASCII
Value  Count  Frequency (%)
(space)  1053  17.3%
2  1026  16.9%
1  995  16.4%
3  456  7.5%
4  222  3.7%
e  188  3.1%
5  186  3.1%
6  147  2.4%
)  144  2.4%
(  144  2.4%
Other values (34)  1521  25.0%

Number Forms
Value  Count  Frequency (%)
(lost)  6  100.0%

아파트코드 (apartment code)
Text

Distinct: 2181
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 106
Unique (%): 1.1%

Sample

1st row: A13309402
2nd row: A12009201
3rd row: A15882107
4th row: A13790620
5th row: A13204507

Value  Count  Frequency (%)
a13872504  12  0.1%
a15805115  12  0.1%
a15083701  12  0.1%
a15875101  12  0.1%
a15807311  12  0.1%
a12013003  12  0.1%
a13987306  11  0.1%
a12013202  11  0.1%
a14003105  11  0.1%
a14381407  11  0.1%
Other values (2171)  9884  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18859  21.0%
1  17457  19.4%
A  10000  11.1%
3  9000  10.0%
2  8202  9.1%
5  6186  6.9%
8  5499  6.1%
7  4585  5.1%
4  3937  4.4%
6  3392  3.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18859  23.6%
1  17457  21.8%
3  9000  11.2%
2  8202  10.3%
5  6186  7.7%
8  5499  6.9%
7  4585  5.7%
4  3937  4.9%
6  3392  4.2%
9  2883  3.6%

Uppercase Letter
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18859  23.6%
1  17457  21.8%
3  9000  11.2%
2  8202  10.3%
5  6186  7.7%
8  5499  6.9%
7  4585  5.7%
4  3937  4.9%
6  3392  4.2%
9  2883  3.6%

Latin
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18859  21.0%
1  17457  19.4%
A  10000  11.1%
3  9000  10.0%
2  8202  9.1%
5  6186  6.9%
8  5499  6.1%
7  4585  5.1%
4  3937  4.4%
6  3392  3.8%

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8122
Min length: 2

Characters and Unicode

Total characters: 48122
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 선거관리위원회운영비 (election committee operating expense)
2nd row: 고용보험료 (employment insurance premium)
3rd row: 주차장수익 (parking lot income)
4th row: 건강보험료 (health insurance premium)
5th row: 고용보험료 (employment insurance premium)

Value  Count  Frequency (%)
소독비 (disinfection fee)  238  2.4%
교육비 (training fee)  227  2.3%
통신비 (communication fee)  225  2.2%
경비비 (security fee)  216  2.2%
세대전기료 (household electricity charge)  215  2.1%
청소비 (cleaning fee)  214  2.1%
제수당 (allowances)  211  2.1%
연체료수익 (late-fee income)  209  2.1%
승강기유지비 (elevator maintenance fee)  209  2.1%
사무용품비 (office supplies expense)  209  2.1%
Other values (76)  7827  78.3%

Most occurring characters

Value  Count  Frequency (%)
(lost)  5438  11.3%
(lost)  3499  7.3%
(lost)  2140  4.4%
(lost)  1943  4.0%
(lost)  1389  2.9%
(lost)  1309  2.7%
(lost)  1029  2.1%
(lost)  892  1.9%
(lost)  831  1.7%
(lost)  797  1.7%
Other values (110)  28855  60.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  48122  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  5438  11.3%
(lost)  3499  7.3%
(lost)  2140  4.4%
(lost)  1943  4.0%
(lost)  1389  2.9%
(lost)  1309  2.7%
(lost)  1029  2.1%
(lost)  892  1.9%
(lost)  831  1.7%
(lost)  797  1.7%
Other values (110)  28855  60.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  48122  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  5438  11.3%
(lost)  3499  7.3%
(lost)  2140  4.4%
(lost)  1943  4.0%
(lost)  1389  2.9%
(lost)  1309  2.7%
(lost)  1029  2.1%
(lost)  892  1.9%
(lost)  831  1.7%
(lost)  797  1.7%
Other values (110)  28855  60.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  48122  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  5438  11.3%
(lost)  3499  7.3%
(lost)  2140  4.4%
(lost)  1943  4.0%
(lost)  1389  2.9%
(lost)  1309  2.7%
(lost)  1029  2.1%
(lost)  892  1.9%
(lost)  831  1.7%
(lost)  797  1.7%
Other values (110)  28855  60.0%

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
202310  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202310
2nd row: 202310
3rd row: 202310
4th row: 202310
5th row: 202310

Common Values

Value  Count  Frequency (%)
202310  10000  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values; 202310 accounts for 10000 rows (100.0%)]

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6852
Distinct (%): 68.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3551884.4
Minimum: -47373150
Maximum: 3.65629 × 10^8
Zeros: 1580
Zeros (%): 15.8%
Negative: 15
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -47373150
5th percentile: 0
Q1: 50000
Median: 285252.5
Q3: 1416015
95th percentile: 19026952
Maximum: 3.65629 × 10^8
Range: 4.1300215 × 10^8
Interquartile range (IQR): 1366015
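
The range and the interquartile range follow directly from the other quantile figures. A small check, using the exact maximum (365628998) that the report lists among its largest values; the variable names are illustrative:

```python
# Figures copied from the report.
minimum = -47_373_150
maximum = 365_628_998  # exact value behind the rounded 3.65629 × 10^8
q1 = 50_000
q3 = 1_416_015

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(iqr)
print(f"{value_range:.7e}")
```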

Descriptive statistics

Standard deviation: 12414757
Coefficient of variation (CV): 3.4952594
Kurtosis: 193.75661
Mean: 3551884.4
Median Absolute Deviation (MAD): 285252.5
Skewness: 10.713289
Sum: 3.5518844 × 10^10
Variance: 1.541262 × 10^14
Monotonicity: Not monotonic
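
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, and the sum is the mean times the number of observations. The sketch below re-derives them from the rounded figures printed in the report, so small differences in the last digits are expected.

```python
# Rounded figures copied from the report.
mean = 3_551_884.4
std = 12_414_757
n = 10_000

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance as the square of the standard deviation
total = mean * n      # sum of all values

print(f"{cv:.4f}")
print(f"{variance:.4e}")
print(f"{total:.7e}")
```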
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0  1580  15.8%
200000  74  0.7%
100000  58  0.6%
300000  57  0.6%
400000  42  0.4%
150000  39  0.4%
250000  38  0.4%
30000  34  0.3%
220000  32  0.3%
120000  31  0.3%
Other values (6842)  8015  80.2%

Minimum 10 values

Value  Count  Frequency (%)
-47373150  1  < 0.1%
-7791178  1  < 0.1%
-6060606  1  < 0.1%
-5500000  1  < 0.1%
-1778310  1  < 0.1%
-1018000  1  < 0.1%
-892140  1  < 0.1%
-401560  1  < 0.1%
-396666  1  < 0.1%
-118080  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
365628998  1  < 0.1%
299211202  1  < 0.1%
282779662  1  < 0.1%
237491905  1  < 0.1%
225920170  1  < 0.1%
214890218  1  < 0.1%
206721922  1  < 0.1%
182875390  1  < 0.1%
168082438  1  < 0.1%
152575066  1  < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]

        비용명  금액
비용명  1.000  0.309
금액  0.309  1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

Index  아파트명 (apartment name)  아파트코드 (apartment code)  비용명 (cost item name)  년월일 (date)  금액 (amount)
35571  금호동롯데아파트  A13309402  선거관리위원회운영비  202310  0
16241  홍제성원아파트  A12009201  고용보험료  202310  42360
99584  목동금호1차  A15882107  주차장수익  202310  1125200
55221  한강  A13790620  건강보험료  202310  744000
32599  창동상아1차  A13204507  고용보험료  202310  268980
69854  한남힐스테이트  A14077901  국민연금  202310  624010
64441  공릉대동1차  A13980801  잡수익  202310  135
38343  천호태영  A13402002  사무용품비  202310  204720
63346  공릉태릉우성  A13980009  재활용품비용  202310  140000
49870  장위참누리  A13614302  수선유지비  202310  1293950

Index  아파트명 (apartment name)  아파트코드 (apartment code)  비용명 (cost item name)  년월일 (date)  금액 (amount)
2431  자양호반써밋아파트  A10024132  건강보험료  202310  490500
55499  강변아파트  A13790714  복리후생비  202310  324000
35358  응봉금호현대  A13308004  회계감사비  202310  110000
90594  대방현대1차  A15681106  이자수익  202310  0
69337  후암미주  A14019001  세대전기료  202310  5954150
32317  창동동아청솔  A13204409  수선유지비  202310  7721930
21888  신촌금호  A12188201  세대전기료  202310  12062310
7595  래미안블레스티지  A10025675  선거관리위원회운영비  202310  293600
82592  대상  A15209303  소방안전관리비  202310  187000
83419  신도림디큐브시티  A15277302  복리후생비  202310  143000