Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
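
The average record size follows from two of the figures above: total memory divided by the number of observations. A minimal Python check, assuming 1 KiB = 1024 bytes (the variable names are illustrative and do not come from the report):

```python
KIB = 1024  # bytes per KiB (binary kilobyte)

total_size_kib = 488.3   # "Total size in memory" from the report
n_observations = 10_000  # "Number of observations" from the report

# Average bytes per row = total bytes / number of rows.
avg_record_bytes = total_size_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 50.0 B shown in the report.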

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202111" (Constant)
금액 has 1556 (15.6%) zeros (Zeros)
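
Both alerts come down to simple column checks. The following is a minimal Python sketch of such checks, not the profiling tool's own implementation; the helper names `is_constant` and `zero_share` are hypothetical, and the ten rows are copied from the Sample section of this report.

```python
def is_constant(values):
    """True when every entry in the column holds the same value."""
    return len(set(values)) == 1


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Ten rows copied from the Sample section of this report.
dates = ["202111"] * 10
amounts = [0, 1600000, 27192580, 385494, 589760,
           10374670, 1833970, 0, 1210000, 0]

print(is_constant(dates))            # every date in these rows is "202111"
print(f"{zero_share(amounts):.1%}")  # share of zero amounts in these ten rows

# Report-level figure: 1556 zeros out of 10000 observations.
report_zero_share = 1556 / 10_000
print(f"{report_zero_share:.1%}")
```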

Reproduction

Analysis started: 2024-05-11 06:53:15.586436
Analysis finished: 2024-05-11 06:53:17.909570
Duration: 2.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
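
A report of this kind can be regenerated with ydata-profiling along the following lines. This is a hedged sketch rather than the exact command used here: the function name `build_profile`, the file paths, the default encoding, and the choice to read 아파트코드 and 년월일 as text are all assumptions, and the settings stored in config.json are not reproduced.

```python
from pathlib import Path


def build_profile(csv_path: str, html_path: str, encoding: str = "utf-8") -> Path:
    """Profile a CSV file and write the report as HTML (illustrative helper)."""
    # Imported lazily so that defining this function does not require the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: keep the code and date columns as text rather than numbers.
    df = pd.read_csv(csv_path, encoding=encoding,
                     dtype={"아파트코드": str, "년월일": str})

    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(html_path)
    return Path(html_path)


# Example call with hypothetical file names:
# build_profile("apartment_costs.csv", "report.html")
```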

Variables

아파트명 (apartment name)
Text

Distinct: 2101
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.2785
Min length: 2
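
The mean length can be cross-checked against the total character count reported for this variable (72785 characters over 10000 observations). The sketch below also computes a lower bound on the mean implied by a given median and minimum: with an even number of values, at least half are at or above the median and the rest are at or above the minimum. With the reported median of 20 and minimum of 2 that bound is 11, which exceeds the reported mean of 7.2785, so the median length figure should be read with caution. All names in the code are illustrative.

```python
total_characters = 72_785  # "Total characters" reported for this variable
n_observations = 10_000    # "Number of observations" from the overview

# Mean length = total characters / number of values.
mean_length = total_characters / n_observations


def mean_lower_bound(median, minimum):
    """Smallest possible mean for an even-sized sample with this median and minimum."""
    return (median + minimum) / 2


bound = mean_lower_bound(median=20, minimum=2)
print(mean_length, bound, mean_length >= bound)
```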

Characters and Unicode

Total characters: 72785
Distinct characters: 428
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
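
As an illustration of these character properties, the Python standard library exposes the Unicode general category of each code point through `unicodedata.category`. The sketch below counts categories for one apartment name taken from this report; the `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative and are not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this variable.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count the characters of `text` per Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


print(category_counts("가양3단지(강변)"))
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables for this variable.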

Unique

Unique: 86
Unique (%): 0.9%
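
The report lists "Distinct" and "Unique" as separate figures. The sketch below assumes that "Distinct" counts the different values and "Unique" counts the values that occur exactly once; this reading fits 86 unique against 2101 distinct values, but the report text does not spell it out. The helper name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Apartment names taken from this report; one of them is repeated on purpose.
names = ["대림신동아", "유원강변", "대림신동아", "번동신원"]
print(distinct_and_unique(names))
```

Here three different names occur, and two of them appear only once.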

Sample

1st row: 북한산힐스테이트1차
2nd row: 대림신동아
3rd row: 유원강변
4th row: 포스코더샵스타리버
5th row: 답십리청솔우성
Value  Count  Frequency (%)
아파트  189  1.7%
래미안  39  0.4%
e편한세상  26  0.2%
아이파크  22  0.2%
고덕  21  0.2%
sk뷰  19  0.2%
북한산  18  0.2%
백련산  16  0.1%
dmc래미안클라시스  15  0.1%
꿈의숲  15  0.1%
Other values (2178)  10425  96.5%

Most occurring characters

Value  Count  Frequency (%)
(lost)  2598  3.6%
(lost)  2500  3.4%
(lost)  2373  3.3%
(lost)  1776  2.4%
(lost)  1765  2.4%
(lost)  1650  2.3%
(lost)  1455  2.0%
(lost)  1440  2.0%
(lost)  1391  1.9%
(lost)  1290  1.8%
Other values (418)  54547  74.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66837  91.8%
Decimal Number  3333  4.6%
Space Separator  897  1.2%
Uppercase Letter  831  1.1%
Lowercase Letter  303  0.4%
Open Punctuation  186  0.3%
Close Punctuation  186  0.3%
Dash Punctuation  117  0.2%
Other Punctuation  89  0.1%
Letter Number  6  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  2598  3.9%
(lost)  2500  3.7%
(lost)  2373  3.6%
(lost)  1776  2.7%
(lost)  1765  2.6%
(lost)  1650  2.5%
(lost)  1455  2.2%
(lost)  1440  2.2%
(lost)  1391  2.1%
(lost)  1290  1.9%
Other values (373)  48599  72.7%

Uppercase Letter
Value  Count  Frequency (%)
S  140  16.8%
C  107  12.9%
K  104  12.5%
M  82  9.9%
D  82  9.9%
L  71  8.5%
H  51  6.1%
G  42  5.1%
I  37  4.5%
E  25  3.0%
Other values (7)  90  10.8%

Lowercase Letter
Value  Count  Frequency (%)
e  191  63.0%
k  19  6.3%
i  19  6.3%
s  18  5.9%
l  18  5.9%
v  11  3.6%
c  10  3.3%
w  7  2.3%
h  4  1.3%
g  3  1.0%

Decimal Number
Value  Count  Frequency (%)
1  1011  30.3%
2  1000  30.0%
3  450  13.5%
4  212  6.4%
5  188  5.6%
6  131  3.9%
8  98  2.9%
7  96  2.9%
9  81  2.4%
0  66  2.0%

Other Punctuation
Value  Count  Frequency (%)
,  69  77.5%
.  20  22.5%

Space Separator
Value  Count  Frequency (%)
(space)  897  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  186  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  186  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  117  100.0%

Letter Number
Value  Count  Frequency (%)
(lost)  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66837  91.8%
Common  4808  6.6%
Latin  1140  1.6%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  2598  3.9%
(lost)  2500  3.7%
(lost)  2373  3.6%
(lost)  1776  2.7%
(lost)  1765  2.6%
(lost)  1650  2.5%
(lost)  1455  2.2%
(lost)  1440  2.2%
(lost)  1391  2.1%
(lost)  1290  1.9%
Other values (373)  48599  72.7%

Latin
Value  Count  Frequency (%)
e  191  16.8%
S  140  12.3%
C  107  9.4%
K  104  9.1%
M  82  7.2%
D  82  7.2%
L  71  6.2%
H  51  4.5%
G  42  3.7%
I  37  3.2%
Other values (19)  233  20.4%

Common
Value  Count  Frequency (%)
1  1011  21.0%
2  1000  20.8%
(space)  897  18.7%
3  450  9.4%
4  212  4.4%
5  188  3.9%
(  186  3.9%
)  186  3.9%
6  131  2.7%
-  117  2.4%
Other values (6)  430  8.9%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66837  91.8%
ASCII  5942  8.2%
Number Forms  6  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  2598  3.9%
(lost)  2500  3.7%
(lost)  2373  3.6%
(lost)  1776  2.7%
(lost)  1765  2.6%
(lost)  1650  2.5%
(lost)  1455  2.2%
(lost)  1440  2.2%
(lost)  1391  2.1%
(lost)  1290  1.9%
Other values (373)  48599  72.7%

ASCII
Value  Count  Frequency (%)
1  1011  17.0%
2  1000  16.8%
(space)  897  15.1%
3  450  7.6%
4  212  3.6%
e  191  3.2%
5  188  3.2%
(  186  3.1%
)  186  3.1%
S  140  2.4%
Other values (34)  1481  24.9%

Number Forms
Value  Count  Frequency (%)
(lost)  6  100.0%

아파트코드 (apartment code)
Text

Distinct: 2106
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 86
Unique (%): 0.9%

Sample

1st row: A12204003
2nd row: A15081606
3rd row: A15606002
4th row: A13824001
5th row: A13003202
Value  Count  Frequency (%)
a13305007  13  0.1%
a15083701  13  0.1%
a13986302  13  0.1%
a15209207  12  0.1%
a14003101  12  0.1%
a15086601  12  0.1%
a10025410  12  0.1%
a10024938  12  0.1%
a13082704  11  0.1%
a12187703  11  0.1%
Other values (2096)  9879  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18889  21.0%
1  17579  19.5%
A  10000  11.1%
3  8743  9.7%
2  8423  9.4%
5  6277  7.0%
8  5548  6.2%
7  4530  5.0%
4  3905  4.3%
6  3332  3.7%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18889  23.6%
1  17579  22.0%
3  8743  10.9%
2  8423  10.5%
5  6277  7.8%
8  5548  6.9%
7  4530  5.7%
4  3905  4.9%
6  3332  4.2%
9  2774  3.5%

Uppercase Letter
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18889  23.6%
1  17579  22.0%
3  8743  10.9%
2  8423  10.5%
5  6277  7.8%
8  5548  6.9%
7  4530  5.7%
4  3905  4.9%
6  3332  4.2%
9  2774  3.5%

Latin
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18889  21.0%
1  17579  19.5%
A  10000  11.1%
3  8743  9.7%
2  8423  9.4%
5  6277  7.0%
8  5548  6.2%
7  4530  5.0%
4  3905  4.3%
6  3332  3.7%

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8901
Min length: 2

Characters and Unicode

Total characters: 48901
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

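1st row: 승강기수익 (elevator income)
2nd row: 선거관리위원회운영비 (election management committee operating cost)
3rd row: 검침수익 (meter-reading income)
4th row: 승강기유지비 (elevator maintenance cost)
5th row: 고용안정사업수익 (employment stabilization program income)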
Value  Count  Frequency (%)
급여 (salaries)  222  2.2%
세대전기료 (household electricity charges)  222  2.2%
소독비 (disinfection cost)  216  2.2%
경비비 (security cost)  215  2.1%
퇴직급여 (retirement benefits)  214  2.1%
승강기유지비 (elevator maintenance cost)  213  2.1%
청소비 (cleaning cost)  210  2.1%
도서인쇄비 (books and printing cost)  208  2.1%
소모품비 (supplies cost)  206  2.1%
연체료수익 (late-fee income)  205  2.1%
Other values (76)  7869  78.7%

Most occurring characters

Value  Count  Frequency (%)
(lost)  5369  11.0%
(lost)  3498  7.2%
(lost)  2098  4.3%
(lost)  2002  4.1%
(lost)  1693  3.5%
(lost)  1312  2.7%
(lost)  1046  2.1%
(lost)  830  1.7%
(lost)  797  1.6%
(lost)  752  1.5%
Other values (110)  29504  60.3%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  48901  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  5369  11.0%
(lost)  3498  7.2%
(lost)  2098  4.3%
(lost)  2002  4.1%
(lost)  1693  3.5%
(lost)  1312  2.7%
(lost)  1046  2.1%
(lost)  830  1.7%
(lost)  797  1.6%
(lost)  752  1.5%
Other values (110)  29504  60.3%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  48901  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  5369  11.0%
(lost)  3498  7.2%
(lost)  2098  4.3%
(lost)  2002  4.1%
(lost)  1693  3.5%
(lost)  1312  2.7%
(lost)  1046  2.1%
(lost)  830  1.7%
(lost)  797  1.6%
(lost)  752  1.5%
Other values (110)  29504  60.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  48901  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  5369  11.0%
(lost)  3498  7.2%
(lost)  2098  4.3%
(lost)  2002  4.1%
(lost)  1693  3.5%
(lost)  1312  2.7%
(lost)  1046  2.1%
(lost)  830  1.7%
(lost)  797  1.6%
(lost)  752  1.5%
Other values (110)  29504  60.3%

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
202111  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202111
2nd row: 202111
3rd row: 202111
4th row: 202111
5th row: 202111

Common Values

Value  Count  Frequency (%)
202111  10000  100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value  Count  Frequency (%)
202111  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6714
Distinct (%): 67.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3356240.7
Minimum: -7890000
Maximum: 3.5784531 × 10^8
Zeros: 1556
Zeros (%): 15.6%
Negative: 9
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -7890000
5-th percentile: 0
Q1: 53332.5
Median: 297595
Q3: 1254225
95-th percentile: 17543062
Maximum: 3.5784531 × 10^8
Range: 3.6573531 × 10^8
Interquartile range (IQR): 1200892.5
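
Two of the quantile figures are derived from the others: the range is the maximum minus the minimum, and the interquartile range is Q3 minus Q1. A short Python check using the values from this report (the exact maximum, 357845310, is taken from the largest value listed for this variable; variable names are illustrative):

```python
minimum = -7_890_000
maximum = 357_845_310  # exact value behind 3.5784531 × 10^8
q1 = 53_332.5
q3 = 1_254_225

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(value_range)
print(iqr)
```

Both results agree with the reported Range (3.6573531 × 10^8) and IQR (1200892.5).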

Descriptive statistics

Standard deviation: 12381971
Coefficient of variation (CV): 3.6892381
Kurtosis: 196.73041
Mean: 3356240.7
Median Absolute Deviation (MAD): 297595
Skewness: 11.110161
Sum: 3.3562407 × 10^10
Variance: 1.5331321 × 10^14
Monotonicity: Not monotonic
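
Several of these statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The sketch below checks the reported figures against each other, within the rounding of the displayed values; variable names are illustrative.

```python
std_dev = 12_381_971
mean = 3_356_240.7
n_observations = 10_000

cv = std_dev / mean            # coefficient of variation
variance = std_dev ** 2        # variance = (standard deviation)^2
total = mean * n_observations  # sum = mean * number of observations

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```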
Histogram with fixed size bins (bins=50)
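
Fixed-size binning splits the interval from the minimum to the maximum into equally wide bins and counts the values in each. The function below is a minimal sketch of that idea, not the plotting code used for the report; `fixed_width_bin_counts` is a hypothetical name. For this variable, 50 bins over the reported range give a bin width of 365735310 / 50 = 7314706.2.

```python
def fixed_width_bin_counts(values, bins=50):
    """Count values in `bins` equally wide intervals spanning min..max."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # All values are identical: everything lands in the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum itself is assigned to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Small illustration with made-up numbers and 5 bins of width 2.
print(fixed_width_bin_counts([0, 1, 2, 3, 4, 10], bins=5))

# Bin width implied by the report's range and bins=50.
report_bin_width = (357_845_310 - (-7_890_000)) / 50
```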
Common Values
Value  Count  Frequency (%)
0  1556  15.6%
200000  81  0.8%
300000  63  0.6%
100000  59  0.6%
150000  47  0.5%
400000  38  0.4%
30000  34  0.3%
50000  33  0.3%
500000  32  0.3%
250000  32  0.3%
Other values (6704)  8025  80.2%

Minimum 10 values
Value  Count  Frequency (%)
-7890000  1  < 0.1%
-2923460  1  < 0.1%
-2903340  1  < 0.1%
-200000  1  < 0.1%
-135560  1  < 0.1%
-129440  1  < 0.1%
-73000  1  < 0.1%
-48670  1  < 0.1%
-39014  1  < 0.1%
0  1556  15.6%

Maximum 10 values
Value  Count  Frequency (%)
357845310  1  < 0.1%
301629336  1  < 0.1%
283516110  1  < 0.1%
232414558  1  < 0.1%
231461300  1  < 0.1%
215344161  1  < 0.1%
193136223  1  < 0.1%
180422060  1  < 0.1%
162006248  1  < 0.1%
154348511  1  < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]
(variable)  비용명  금액
비용명  1.000  0.409
금액  0.409  1.000
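
The report does not state which coefficient underlies the 0.409 value. One measure often used for a categorical variable against a binned numeric variable is Cramér's V, so the sketch below shows that statistic purely as an assumption about how such a number could arise. It is the plain, uncorrected form, `cramers_v` is a hypothetical helper, and the example labels are made up.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Perfect association gives 1.0, no association gives 0.0.
print(cramers_v(["a", "a", "b", "b"], ["low", "low", "high", "high"]))
print(cramers_v(["a", "a", "b", "b"], ["low", "high", "low", "high"]))
```

To apply this to 금액, the amounts would first have to be grouped into bins; the number and placement of those bins is a further choice that this report does not document.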

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
(index)  아파트명  아파트코드  비용명  년월일  금액
20943  북한산힐스테이트1차  A12204003  승강기수익  202111  320000
78002  대림신동아  A15081606  선거관리위원회운영비  202111  450000
89085  유원강변  A15606002  검침수익  202111  131580
57957  포스코더샵스타리버  A13824001  승강기유지비  202111  2728000
23301  답십리청솔우성  A13003202  고용안정사업수익  202111  1200000
84869  구로성호주상복합  A15284204  장기수선비  202111  2017780
95147  가양3단지(강변)  A15780704  소모품비  202111  253090
71232  번동신원  A14206306  잡비용  202111  367350
95630  등촌코오롱오투빌1차  A15784003  경비비  202111  6523160
13173  광화문풍림스페이스본 아파트  A11005401  위탁관리수수료  202111  1901856

Last rows
(index)  아파트명  아파트코드  비용명  년월일  금액
54020  서초한신플러스타운  A13786704  이자수익  202111  0
22571  대조삼성타운  A12284501  주차장운영비  202111  1600000
77999  대림신동아  A15081606  경비비  202111  27192580
26998  상봉프레미어스엠코  A13122002  국민연금  202111  385494
95016  한강타운아파트  A15780604  국민연금  202111  589760
43047  엘에이치강남브리즈힐  A13520004  청소비  202111  10374670
58845  잠실5단지아파트  A13879102  주차장수익  202111  1833970
17591  상암월드컵파크6단지  A12127002  잡수익  202111  0
35650  래미안옥수리버젠  A13375907  소독비  202111  1210000
47802  보문아남  A13608601  잡수익  202111  0