Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
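
The average record size appears to be the total memory size divided by the number of observations. A minimal Python sketch of that arithmetic, using the figures listed above (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # prints "50.0 B"
```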

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202012" (Constant)
금액 has 1365 (13.7%) zeros (Zeros)
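
The two alerts above can be approximated with simple per-column checks. The sketch below is illustrative only: the function `describe_alerts` and the 5% zero-share threshold are assumptions, not the rules ydata-profiling actually applies. The input is the first ten rows of the sample table at the end of this report.

```python
def describe_alerts(name, values, zero_share_threshold=0.05):
    """Return alert strings for one column (illustrative rules only)."""
    if not values:
        return []
    alerts = []
    # A column with a single distinct value is constant.
    if len(set(values)) == 1:
        alerts.append(f'{name} has constant value "{values[0]}"')
    # Flag columns where the share of zeros exceeds the assumed threshold.
    zeros = sum(1 for v in values if v == 0)
    share = zeros / len(values)
    if share > zero_share_threshold:
        alerts.append(f"{name} has {zeros} ({share:.1%}) zeros")
    return alerts


# 년월일 and 금액 values from the first ten sample rows.
dates = ["202012"] * 10
amounts = [155300, 100000, 1567500, 1311250, 0, 1080, 400000, 343420, 49950, 154000]

print(describe_alerts("년월일", dates))
print(describe_alerts("금액", amounts))
```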

Reproduction

Analysis started: 2024-05-11 06:55:27.028394
Analysis finished: 2024-05-11 06:55:28.690739
Duration: 1.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2077
Distinct (%): 20.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2737
Min length: 2

Characters and Unicode

Total characters: 72737
Distinct characters: 428
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
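
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using three apartment names from the sample rows. The helper `category_counts` and the `LONG_NAMES` lookup are assumptions made for illustration, not ydata-profiling internals. Script and block lookups are not available in the standard library, so they are left out.

```python
import unicodedata
from collections import Counter

# Long names for the two general-category codes that occur in this sample.
LONG_NAMES = {"Lo": "Other Letter", "Nd": "Decimal Number"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["하월곡샹그레빌", "은평뉴타운상림마을13단지", "한강대우"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(LONG_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter" and ASCII digits under "Decimal Number", which matches the dominant categories reported for this variable.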

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: 하월곡샹그레빌
2nd row: 은평뉴타운상림마을13단지
3rd row: 한강대우
4th row: 용두롯데캐슬피렌체
5th row: 방배대우효령

| Value | Count | Frequency (%) |
|---|---|---|
| 아파트 | 179 | 1.7% |
| 래미안 | 40 | 0.4% |
| 아이파크 | 29 | 0.3% |
| 신반포 | 25 | 0.2% |
| 고덕 | 17 | 0.2% |
| 백련산 | 15 | 0.1% |
| e편한세상 | 14 | 0.1% |
| sk뷰 | 14 | 0.1% |
| 힐스테이트 | 14 | 0.1% |
| 신동아아파트 | 13 | 0.1% |
| Other values (2141) | 10391 | 96.7% |

Most occurring characters

| Value | Count | Frequency (%) |
|---|---|---|
| | 2663 | 3.7% |
| | 2597 | 3.6% |
| | 2340 | 3.2% |
| | 1855 | 2.6% |
| | 1621 | 2.2% |
| | 1556 | 2.1% |
| | 1445 | 2.0% |
| | 1393 | 1.9% |
| | 1360 | 1.9% |
| | 1295 | 1.8% |
| Other values (418) | 54612 | 75.1% |

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 66630 | 91.6% |
| Decimal Number | 3362 | 4.6% |
| Uppercase Letter | 898 | 1.2% |
| Space Separator | 841 | 1.2% |
| Lowercase Letter | 393 | 0.5% |
| Open Punctuation | 168 | 0.2% |
| Close Punctuation | 168 | 0.2% |
| Other Punctuation | 134 | 0.2% |
| Dash Punctuation | 132 | 0.2% |
| Letter Number | 6 | < 0.1% |

Most frequent character per category

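Other Letter

| Value | Count | Frequency (%) |
|---|---|---|
| | 2663 | 4.0% |
| | 2597 | 3.9% |
| | 2340 | 3.5% |
| | 1855 | 2.8% |
| | 1621 | 2.4% |
| | 1556 | 2.3% |
| | 1445 | 2.2% |
| | 1393 | 2.1% |
| | 1360 | 2.0% |
| | 1295 | 1.9% |
| Other values (372) | 48505 | 72.8% |

Uppercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| S | 142 | 15.8% |
| C | 114 | 12.7% |
| K | 106 | 11.8% |
| D | 81 | 9.0% |
| M | 81 | 9.0% |
| L | 60 | 6.7% |
| H | 48 | 5.3% |
| E | 46 | 5.1% |
| I | 46 | 5.1% |
| A | 35 | 3.9% |
| Other values (7) | 139 | 15.5% |

Lowercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| e | 198 | 50.4% |
| l | 44 | 11.2% |
| i | 40 | 10.2% |
| v | 24 | 6.1% |
| k | 23 | 5.9% |
| c | 20 | 5.1% |
| s | 16 | 4.1% |
| w | 11 | 2.8% |
| a | 7 | 1.8% |
| g | 7 | 1.8% |

Decimal Number

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1011 | 30.1% |
| 1 | 992 | 29.5% |
| 3 | 430 | 12.8% |
| 4 | 208 | 6.2% |
| 5 | 208 | 6.2% |
| 6 | 160 | 4.8% |
| 7 | 116 | 3.5% |
| 8 | 94 | 2.8% |
| 9 | 90 | 2.7% |
| 0 | 53 | 1.6% |

Other Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| , | 100 | 74.6% |
| . | 34 | 25.4% |

Space Separator

| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 841 | 100.0% |

Open Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| ( | 168 | 100.0% |

Close Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| ) | 168 | 100.0% |

Dash Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| - | 132 | 100.0% |

Letter Number

| Value | Count | Frequency (%) |
|---|---|---|
| | 6 | 100.0% |

Math Symbol

| Value | Count | Frequency (%) |
|---|---|---|
| ~ | 5 | 100.0% |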

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 66630 | 91.6% |
| Common | 4810 | 6.6% |
| Latin | 1297 | 1.8% |

Most frequent character per script

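Hangul

| Value | Count | Frequency (%) |
|---|---|---|
| | 2663 | 4.0% |
| | 2597 | 3.9% |
| | 2340 | 3.5% |
| | 1855 | 2.8% |
| | 1621 | 2.4% |
| | 1556 | 2.3% |
| | 1445 | 2.2% |
| | 1393 | 2.1% |
| | 1360 | 2.0% |
| | 1295 | 1.9% |
| Other values (372) | 48505 | 72.8% |

Latin

| Value | Count | Frequency (%) |
|---|---|---|
| e | 198 | 15.3% |
| S | 142 | 10.9% |
| C | 114 | 8.8% |
| K | 106 | 8.2% |
| D | 81 | 6.2% |
| M | 81 | 6.2% |
| L | 60 | 4.6% |
| H | 48 | 3.7% |
| E | 46 | 3.5% |
| I | 46 | 3.5% |
| Other values (19) | 375 | 28.9% |

Common

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1011 | 21.0% |
| 1 | 992 | 20.6% |
| (space) | 841 | 17.5% |
| 3 | 430 | 8.9% |
| 4 | 208 | 4.3% |
| 5 | 208 | 4.3% |
| ( | 168 | 3.5% |
| ) | 168 | 3.5% |
| 6 | 160 | 3.3% |
| - | 132 | 2.7% |
| Other values (7) | 492 | 10.2% |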

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 66630 | 91.6% |
| ASCII | 6101 | 8.4% |
| Number Forms | 6 | < 0.1% |

Most frequent character per block

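Hangul

| Value | Count | Frequency (%) |
|---|---|---|
| | 2663 | 4.0% |
| | 2597 | 3.9% |
| | 2340 | 3.5% |
| | 1855 | 2.8% |
| | 1621 | 2.4% |
| | 1556 | 2.3% |
| | 1445 | 2.2% |
| | 1393 | 2.1% |
| | 1360 | 2.0% |
| | 1295 | 1.9% |
| Other values (372) | 48505 | 72.8% |

ASCII

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1011 | 16.6% |
| 1 | 992 | 16.3% |
| (space) | 841 | 13.8% |
| 3 | 430 | 7.0% |
| 4 | 208 | 3.4% |
| 5 | 208 | 3.4% |
| e | 198 | 3.2% |
| ( | 168 | 2.8% |
| ) | 168 | 2.8% |
| 6 | 160 | 2.6% |
| Other values (35) | 1717 | 28.1% |

Number Forms

| Value | Count | Frequency (%) |
|---|---|---|
| | 6 | 100.0% |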

아파트코드 (apartment code)
Text

Distinct: 2083
Distinct (%): 20.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: A13613201
2nd row: A12220002
3rd row: A14003105
4th row: A13007002
5th row: A13706303

| Value | Count | Frequency (%) |
|---|---|---|
| a15683402 | 13 | 0.1% |
| a13520002 | 12 | 0.1% |
| a13611008 | 12 | 0.1% |
| a15283709 | 12 | 0.1% |
| a13483803 | 12 | 0.1% |
| a12081602 | 12 | 0.1% |
| a13789201 | 11 | 0.1% |
| a13203002 | 11 | 0.1% |
| a14005001 | 11 | 0.1% |
| a13382106 | 11 | 0.1% |
| Other values (2073) | 9883 | 98.8% |

Most occurring characters

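| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18903 | 21.0% |
| 1 | 17636 | 19.6% |
| A | 10000 | 11.1% |
| 3 | 9071 | 10.1% |
| 2 | 8286 | 9.2% |
| 5 | 6018 | 6.7% |
| 8 | 5483 | 6.1% |
| 7 | 4541 | 5.0% |
| 4 | 3796 | 4.2% |
| 6 | 3550 | 3.9% |

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

Decimal Number

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18903 | 23.6% |
| 1 | 17636 | 22.0% |
| 3 | 9071 | 11.3% |
| 2 | 8286 | 10.4% |
| 5 | 6018 | 7.5% |
| 8 | 5483 | 6.9% |
| 7 | 4541 | 5.7% |
| 4 | 3796 | 4.7% |
| 6 | 3550 | 4.4% |
| 9 | 2716 | 3.4% |

Uppercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| A | 10000 | 100.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

Common

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18903 | 23.6% |
| 1 | 17636 | 22.0% |
| 3 | 9071 | 11.3% |
| 2 | 8286 | 10.4% |
| 5 | 6018 | 7.5% |
| 8 | 5483 | 6.9% |
| 7 | 4541 | 5.7% |
| 4 | 3796 | 4.7% |
| 6 | 3550 | 4.4% |
| 9 | 2716 | 3.4% |

Latin

| Value | Count | Frequency (%) |
|---|---|---|
| A | 10000 | 100.0% |

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18903 | 21.0% |
| 1 | 17636 | 19.6% |
| A | 10000 | 11.1% |
| 3 | 9071 | 10.1% |
| 2 | 8286 | 9.2% |
| 5 | 6018 | 6.7% |
| 8 | 5483 | 6.1% |
| 7 | 4541 | 5.0% |
| 4 | 3796 | 4.2% |
| 6 | 3550 | 3.9% |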

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9146
Min length: 2

Characters and Unicode

Total characters: 49146
Distinct characters: 118
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 사무용품비
2nd row: 승강기수익
3rd row: 승강기유지비
4th row: 퇴직급여
5th row: 교육비

| Value | Count | Frequency (%) |
|---|---|---|
| 경비비 | 227 | 2.3% |
| 사무용품비 | 219 | 2.2% |
| 청소비 | 215 | 2.1% |
| 수선유지비 | 211 | 2.1% |
| 이자수익 | 211 | 2.1% |
| 소모품비 | 209 | 2.1% |
| 소독비 | 205 | 2.1% |
| 세대수도료 | 205 | 2.1% |
| 교육비 | 204 | 2.0% |
| 승강기유지비 | 202 | 2.0% |
| Other values (76) | 7892 | 78.9% |

Most occurring characters

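| Value | Count | Frequency (%) |
|---|---|---|
| | 5490 | 11.2% |
| | 3537 | 7.2% |
| | 2043 | 4.2% |
| | 1991 | 4.1% |
| | 1787 | 3.6% |
| | 1294 | 2.6% |
| | 1003 | 2.0% |
| | 852 | 1.7% |
| | 792 | 1.6% |
| | 746 | 1.5% |
| Other values (108) | 29611 | 60.3% |

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 49146 | 100.0% |

Most frequent character per category

Other Letter

| Value | Count | Frequency (%) |
|---|---|---|
| | 5490 | 11.2% |
| | 3537 | 7.2% |
| | 2043 | 4.2% |
| | 1991 | 4.1% |
| | 1787 | 3.6% |
| | 1294 | 2.6% |
| | 1003 | 2.0% |
| | 852 | 1.7% |
| | 792 | 1.6% |
| | 746 | 1.5% |
| Other values (108) | 29611 | 60.3% |

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 49146 | 100.0% |

Most frequent character per script

Hangul

| Value | Count | Frequency (%) |
|---|---|---|
| | 5490 | 11.2% |
| | 3537 | 7.2% |
| | 2043 | 4.2% |
| | 1991 | 4.1% |
| | 1787 | 3.6% |
| | 1294 | 2.6% |
| | 1003 | 2.0% |
| | 852 | 1.7% |
| | 792 | 1.6% |
| | 746 | 1.5% |
| Other values (108) | 29611 | 60.3% |

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 49146 | 100.0% |

Most frequent character per block

Hangul

| Value | Count | Frequency (%) |
|---|---|---|
| | 5490 | 11.2% |
| | 3537 | 7.2% |
| | 2043 | 4.2% |
| | 1991 | 4.1% |
| | 1787 | 3.6% |
| | 1294 | 2.6% |
| | 1003 | 2.0% |
| | 852 | 1.7% |
| | 792 | 1.6% |
| | 746 | 1.5% |
| Other values (108) | 29611 | 60.3% |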

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202012: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202012
2nd row: 202012
3rd row: 202012
4th row: 202012
5th row: 202012

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 202012 | 10000 | 100.0% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

| Value | Count | Frequency (%) |
|---|---|---|
| 202012 | 10000 | 100.0% |

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6962
Distinct (%): 69.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3442789.5
Minimum: -61560599
Maximum: 5.6566121 × 10^8
Zeros: 1365
Zeros (%): 13.7%
Negative: 20
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -61560599
5-th percentile: 0
Q1: 59397.75
Median: 300085
Q3: 1376025
95-th percentile: 16685321
Maximum: 5.6566121 × 10^8
Range: 6.2722181 × 10^8
Interquartile range (IQR): 1316627.2
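
The range and interquartile range follow directly from the other quantile figures. A short Python sketch of that check, using the exact minimum and maximum listed among the extreme values of this variable (the variable names are illustrative):

```python
# Quantile figures copied from the report.
minimum = -61560599
maximum = 565661214
q1 = 59397.75
q3 = 1376025

value_range = maximum - minimum   # reported as 6.2722181 × 10^8
iqr = q3 - q1                     # reported, rounded, as 1316627.2

print(value_range)  # prints 627221813
print(iqr)          # prints 1316627.25
```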

Descriptive statistics

Standard deviation: 14226391
Coefficient of variation (CV): 4.1322279
Kurtosis: 489.1383
Mean: 3442789.5
Median Absolute Deviation (MAD): 300085
Skewness: 16.789287
Sum: 3.4427895 × 10^10
Variance: 2.023902 × 10^14
Monotonicity: Not monotonic
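
Several of these figures are tied together by standard identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum is the mean times the number of observations. The Python sketch below checks the reported values against those identities. It starts from the rounded figures printed in the report, so agreement is only approximate, and the variable names are illustrative.

```python
import math

# Rounded figures copied from the report.
mean = 3442789.5
std = 14226391
n_observations = 10_000

cv = std / mean                 # reported as 4.1322279
variance = std ** 2             # reported as 2.023902 × 10^14
total = mean * n_observations   # reported as 3.4427895 × 10^10

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.6e}")
print(f"Sum      = {total:.7e}")

# Agreement within rounding of the printed inputs.
print(math.isclose(cv, 4.1322279, rel_tol=1e-6))
```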
Histogram with fixed size bins (bins=50)
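
Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1365 | 13.7% |
| 200000 | 95 | 0.9% |
| 100000 | 79 | 0.8% |
| 300000 | 63 | 0.6% |
| 50000 | 41 | 0.4% |
| 400000 | 39 | 0.4% |
| 150000 | 38 | 0.4% |
| 250000 | 36 | 0.4% |
| 30000 | 31 | 0.3% |
| 110000 | 31 | 0.3% |
| Other values (6952) | 8182 | 81.8% |

Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| -61560599 | 1 | < 0.1% |
| -34806336 | 1 | < 0.1% |
| -10580930 | 1 | < 0.1% |
| -5383483 | 1 | < 0.1% |
| -5011630 | 1 | < 0.1% |
| -3862747 | 1 | < 0.1% |
| -2747840 | 1 | < 0.1% |
| -1518331 | 1 | < 0.1% |
| -894550 | 1 | < 0.1% |
| -729480 | 1 | < 0.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 565661214 | 1 | < 0.1% |
| 521301820 | 1 | < 0.1% |
| 266711630 | 1 | < 0.1% |
| 264485120 | 1 | < 0.1% |
| 259926210 | 1 | < 0.1% |
| 214314354 | 1 | < 0.1% |
| 213377150 | 1 | < 0.1% |
| 188868390 | 1 | < 0.1% |
| 188239920 | 1 | < 0.1% |
| 181575780 | 1 | < 0.1% |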

Interactions


Correlations

| | 비용명 | 금액 |
|---|---|---|
| 비용명 | 1.000 | 0.684 |
| 금액 | 0.684 | 1.000 |

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly pick out patterns in data completion.]

Sample

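| | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 49534 | 하월곡샹그레빌 | A13613201 | 사무용품비 | 202012 | 155300 |
| 20840 | 은평뉴타운상림마을13단지 | A12220002 | 승강기수익 | 202012 | 100000 |
| 70278 | 한강대우 | A14003105 | 승강기유지비 | 202012 | 1567500 |
| 22849 | 용두롯데캐슬피렌체 | A13007002 | 퇴직급여 | 202012 | 1311250 |
| 51922 | 방배대우효령 | A13706303 | 교육비 | 202012 | 0 |
| 5769 | 송파호반베르디움더퍼스트 | A10026362 | 부과차손 | 202012 | 1080 |
| 87746 | 신도림미성 | A15288611 | 소독비 | 202012 | 400000 |
| 96547 | 마곡수명산파크1단지 | A15728008 | 위탁관리수수료 | 202012 | 343420 |
| 26890 | 용마금호타운 | A13181203 | 통신비 | 202012 | 49950 |
| 43424 | 도곡현대그린 | A13527002 | 소방안전관리비 | 202012 | 154000 |

| | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 64833 | 공릉삼익2차 | A13980403 | 수도광열비 | 202012 | 56630 |
| 61650 | 월계흥화브라운빌 | A13905202 | 부과차익 | 202012 | 1600 |
| 78501 | 양평한신 | A15010502 | 승강기수익 | 202012 | 385000 |
| 70529 | 신창세방리버하이빌 | A14006001 | 소모품비 | 202012 | 213780 |
| 2906 | 래미안아트리치 | A10025283 | 주차장수익 | 202012 | 2000 |
| 96698 | 마곡엠밸리15단지 | A15728011 | 자치활동비 | 202012 | 200000 |
| 4348 | 아크로 리버하임 | A10025770 | 경비비 | 202012 | 54214627 |
| 75686 | 자양7차현대홈타운 | A14388204 | 정화조관리비 | 202012 | 386660 |
| 90643 | 상도래미안1차 | A15603204 | 교통비 | 202012 | 3000 |
| 67260 | 월계삼호4차 | A13984003 | 건강보험료 | 202012 | 958210 |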