Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
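Missing cells and duplicate rows are simple counts over the table. The sketch below makes several assumptions. Rows are plain tuples. `None` marks a missing cell, which is a simplification because pandas also treats NaN as missing. A duplicate is any row that repeats an earlier one, which is how pandas' `DataFrame.duplicated()` counts with its default `keep='first'`. The helpers `count_missing` and `count_duplicates` are hypothetical names, and the rows come from this report's Sample section.

```python
from collections import Counter

# Three rows from the Sample section: (아파트명, 아파트코드, 비용명, 년월일, 금액)
ROWS = [
    ("잠실올림픽공원아이파크", "A10025185", "당기순이익", "202012", 100338649),
    ("송천센트레빌", "A14272313", "현금", "202012", 21159),
    ("구로주공", "A15286809", "장기수선충당부채", "202012", 1772552961),
]


def count_missing(rows):
    """Return (missing cells, missing cells as % of all cells), treating None as missing."""
    total = sum(len(r) for r in rows)
    missing = sum(1 for r in rows for cell in r if cell is None)
    return missing, (100.0 * missing / total if total else 0.0)


def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row (first occurrence not counted)."""
    return sum(n - 1 for n in Counter(rows).values() if n > 1)


print(count_missing(ROWS))     # (0, 0.0)
print(count_duplicates(ROWS))  # 0
```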

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 (year-month-day) has constant value "202012" [Constant]
금액 (amount) has 2274 (22.7%) zeros [Zeros]
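Both alerts come down to simple per-column checks. This minimal sketch treats a column as constant when it has exactly one distinct value. It treats a numeric column as zero-heavy when the share of zeros exceeds a threshold. The `alerts` helper and its 10% `zeros_threshold` are hypothetical, chosen for illustration rather than taken from ydata-profiling's configuration.

```python
# Hypothetical threshold (assumption, not ydata-profiling's setting):
# flag a numeric column when more than 10% of its values are zero.
ZEROS_THRESHOLD = 0.10


def alerts(columns, zeros_threshold=ZEROS_THRESHOLD):
    """Return alert strings for constant columns and zero-heavy numeric columns."""
    found = []
    for name, values in columns.items():
        distinct = set(values)
        if len(distinct) == 1:
            found.append(f'{name} has constant value "{next(iter(distinct))}" [Constant]')
        numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric:
            zeros = sum(1 for v in numeric if v == 0)
            share = zeros / len(values)
            if share > zeros_threshold:
                found.append(f"{name} has {zeros} ({share:.1%}) zeros [Zeros]")
    return found


# Synthetic columns with the same shape as this dataset's 년월일 and 금액
columns = {"년월일": ["202012"] * 10000, "금액": [0] * 2274 + [1] * 7726}
for line in alerts(columns):
    print(line)
```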

Reproduction

Analysis started: 2024-05-11 05:59:36.325601
Analysis finished: 2024-05-11 05:59:37.196882
Duration: 0.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
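The reported duration is the difference between the two timestamps above, rounded to two decimals. Here is a quick arithmetic check. `duration_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block
STARTED = datetime.fromisoformat("2024-05-11 05:59:36.325601")
FINISHED = datetime.fromisoformat("2024-05-11 05:59:37.196882")


def duration_seconds(start, end):
    """Elapsed time in seconds between two datetimes."""
    return (end - start).total_seconds()


print(f"Duration: {duration_seconds(STARTED, FINISHED):.2f} seconds")  # Duration: 0.87 seconds
```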

Variables

아파트명 (apartment name)
Text

Distinct: 2191
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2825
Min length: 2
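The length figures are summary statistics of each value's character count. The sketch below assumes lengths are counted in Unicode code points, as Python's `len` does. That assumption holds for precomposed Hangul syllables, but decomposed jamo would count differently. `length_stats` is a hypothetical helper, and the input is the five sample names from this section.

```python
import statistics

# First five sample values of 아파트명 from the Sample block
NAMES = ["잠실올림픽공원아이파크", "송천센트레빌", "구로주공", "대치1차현대아파트", "신길경남"]


def length_stats(values):
    """Max, median, mean and min length, counted in code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


print(length_stats(NAMES))  # {'max': 11, 'median': 6, 'mean': 6.8, 'min': 4}
```

Over the whole column, the mean length is simply total characters divided by rows: 72825 / 10000 = 7.2825.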

Characters and Unicode

Total characters: 72825
Distinct characters: 436
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
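Python's standard `unicodedata` module exposes the general category of a code point, such as `Lo` for a Hangul syllable. It does not expose scripts or blocks, so this sketch covers categories only. The short-code-to-name mapping lists only the categories that appear in this report. `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Nd": "Decimal Number", "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter", "Zs": "Space Separator", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pd": "Dash Punctuation", "Po": "Other Punctuation",
    "Nl": "Letter Number", "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters by Unicode general category (long name when known)."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["대치1차현대아파트", "sk뷰 (A-1)"]))
```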

Unique

Unique: 123
Unique (%): 1.2%
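"Distinct" counts different values, while "Unique" counts values that occur exactly once. For 아파트명 the column has 2191 distinct names but only 123 unique ones, so most names recur across several rows. A minimal sketch follows. The input codes come from the report's last sample rows, where A15375809 appears twice. `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    return len(counts), sum(1 for n in counts.values() if n == 1)


print(distinct_and_unique(["A15375809", "A15807703", "A15375809", "A13377901"]))  # (3, 2)
```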

Sample

1st row: 잠실올림픽공원아이파크
2nd row: 송천센트레빌
3rd row: 구로주공
4th row: 대치1차현대아파트
5th row: 신길경남

Words

Value | Count | Frequency (%)
아파트 (apartment) | 167 | 1.6%
래미안 | 38 | 0.4%
e편한세상 | 19 | 0.2%
래미안밤섬리베뉴 | 15 | 0.1%
힐스테이트 | 13 | 0.1%
아이파크 | 13 | 0.1%
북한산 | 13 | 0.1%
푸르지오 | 13 | 0.1%
sk뷰 | 12 | 0.1%
서울숲2차푸르지오임대 | 12 | 0.1%
Other values (2255) | 10324 | 97.0%
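The table above appears to count lower-cased, whitespace-separated tokens rather than whole values. Two details point that way: `sk뷰` appears in lower case, and there are more distinct words (10 + 2255 = 2265) than distinct names (2191). The sketch below rests on that assumption. ydata-profiling's actual tokenizer may also strip punctuation or stop words, so its counts need not match exactly. `word_counts` and the input strings are hypothetical.

```python
from collections import Counter


def word_counts(values, top=10):
    """Lower-case each value, split on whitespace, return the `top` most common words."""
    counter = Counter()
    for value in values:
        counter.update(value.lower().split())
    return counter.most_common(top)


print(word_counts(["래미안 아파트", "SK뷰 아파트", "e편한세상"]))
```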

Most occurring characters

Value | Count | Frequency (%)
| 2469 | 3.4%
| 2441 | 3.4%
| 2249 | 3.1%
| 1824 | 2.5%
| 1790 | 2.5%
| 1686 | 2.3%
| 1528 | 2.1%
| 1516 | 2.1%
| 1446 | 2.0%
| 1319 | 1.8%
Other values (426) | 54557 | 74.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66671 | 91.5%
Decimal Number | 3732 | 5.1%
Uppercase Letter | 777 | 1.1%
Space Separator | 715 | 1.0%
Lowercase Letter | 379 | 0.5%
Open Punctuation | 146 | 0.2%
Close Punctuation | 146 | 0.2%
Dash Punctuation | 129 | 0.2%
Other Punctuation | 120 | 0.2%
Letter Number | 8 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 2469 | 3.7%
| 2441 | 3.7%
| 2249 | 3.4%
| 1824 | 2.7%
| 1790 | 2.7%
| 1686 | 2.5%
| 1528 | 2.3%
| 1516 | 2.3%
| 1446 | 2.2%
| 1319 | 2.0%
Other values (380) | 48403 | 72.6%

Uppercase Letter
Value | Count | Frequency (%)
S | 116 | 14.9%
C | 104 | 13.4%
K | 100 | 12.9%
M | 66 | 8.5%
D | 66 | 8.5%
L | 57 | 7.3%
I | 45 | 5.8%
H | 37 | 4.8%
E | 36 | 4.6%
G | 35 | 4.5%
Other values (7) | 115 | 14.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 214 | 56.5%
i | 30 | 7.9%
l | 28 | 7.4%
s | 26 | 6.9%
k | 21 | 5.5%
v | 21 | 5.5%
w | 13 | 3.4%
c | 10 | 2.6%
h | 10 | 2.6%
a | 3 | 0.8%

Decimal Number
Value | Count | Frequency (%)
1 | 1138 | 30.5%
2 | 1078 | 28.9%
3 | 490 | 13.1%
4 | 276 | 7.4%
5 | 199 | 5.3%
6 | 165 | 4.4%
7 | 118 | 3.2%
9 | 105 | 2.8%
8 | 83 | 2.2%
0 | 80 | 2.1%

Other Punctuation
Value | Count | Frequency (%)
, | 93 | 77.5%
. | 27 | 22.5%

Space Separator
Value | Count | Frequency (%)
(space) | 715 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 146 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 146 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 129 | 100.0%

Letter Number
Value | Count | Frequency (%)
| 8 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 2 | 100.0%
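The percentages in these per-category tables are relative to each category's own total, not to all characters. For example, the most frequent Hangul character is 2469 / 72825 ≈ 3.4% of all characters but 2469 / 66671 ≈ 3.7% of the Other Letter category. The sketch below groups characters by `unicodedata.category` code and reports within-group shares. `top_chars_per_category` is a hypothetical helper.

```python
import unicodedata
from collections import Counter, defaultdict


def top_chars_per_category(values, top=10):
    """Map category code -> [(char, count, % of that category's total), ...]."""
    groups = defaultdict(Counter)
    for value in values:
        for ch in value:
            groups[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counter in groups.items():
        total = sum(counter.values())
        result[cat] = [(ch, n, 100.0 * n / total) for ch, n in counter.most_common(top)]
    return result


print(top_chars_per_category(["A-1", "B-2", "A"]))
```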

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66671 | 91.5%
Common | 4990 | 6.9%
Latin | 1164 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 2469 | 3.7%
| 2441 | 3.7%
| 2249 | 3.4%
| 1824 | 2.7%
| 1790 | 2.7%
| 1686 | 2.5%
| 1528 | 2.3%
| 1516 | 2.3%
| 1446 | 2.2%
| 1319 | 2.0%
Other values (380) | 48403 | 72.6%

Latin
Value | Count | Frequency (%)
e | 214 | 18.4%
S | 116 | 10.0%
C | 104 | 8.9%
K | 100 | 8.6%
M | 66 | 5.7%
D | 66 | 5.7%
L | 57 | 4.9%
I | 45 | 3.9%
H | 37 | 3.2%
E | 36 | 3.1%
Other values (19) | 323 | 27.7%

Common
Value | Count | Frequency (%)
1 | 1138 | 22.8%
2 | 1078 | 21.6%
(space) | 715 | 14.3%
3 | 490 | 9.8%
4 | 276 | 5.5%
5 | 199 | 4.0%
6 | 165 | 3.3%
( | 146 | 2.9%
) | 146 | 2.9%
- | 129 | 2.6%
Other values (7) | 508 | 10.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66671 | 91.5%
ASCII | 6146 | 8.4%
Number Forms | 8 | < 0.1%

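A Unicode block is a fixed code-point range. The sketch below classifies characters with a hard-coded table limited to the three blocks seen here. It assumes the report's "ASCII" and "Hangul" labels correspond to the Unicode blocks Basic Latin (U+0000 to U+007F) and Hangul Syllables (U+AC00 to U+D7AF). The report's "Hangul" label may also cover the jamo blocks. A complete implementation would load the full block list from the Unicode Character Database. `block_of` and `block_counts` are hypothetical helpers.

```python
from collections import Counter

# Simplified block table: only the blocks observed in this report (assumed label mapping).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Unicode name: Basic Latin
    (0x2150, 0x218F, "Number Forms"),  # includes Roman numerals such as U+2161
    (0xAC00, 0xD7AF, "Hangul"),        # Unicode name: Hangul Syllables
]


def block_of(ch):
    """Return the block label for one character, or "Other" if not in the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values):
    """Count characters per block across all values."""
    return Counter(block_of(ch) for value in values for ch in value)


print(block_counts(["대치1차", "\u2161"]))
```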
Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 2469 | 3.7%
| 2441 | 3.7%
| 2249 | 3.4%
| 1824 | 2.7%
| 1790 | 2.7%
| 1686 | 2.5%
| 1528 | 2.3%
| 1516 | 2.3%
| 1446 | 2.2%
| 1319 | 2.0%
Other values (380) | 48403 | 72.6%

ASCII
Value | Count | Frequency (%)
1 | 1138 | 18.5%
2 | 1078 | 17.5%
(space) | 715 | 11.6%
3 | 490 | 8.0%
4 | 276 | 4.5%
e | 214 | 3.5%
5 | 199 | 3.2%
6 | 165 | 2.7%
( | 146 | 2.4%
) | 146 | 2.4%
Other values (35) | 1579 | 25.7%

Number Forms
Value | Count | Frequency (%)
| 8 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2198
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 1.2%

Sample

1st row: A10025185
2nd row: A14272313
3rd row: A15286809
4th row: A10024799
5th row: A15083703
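The statistics suggest a fixed code format. Every value is 9 characters long, and the 90000 characters split into 80000 digits and 10000 uppercase letters (9984 "A", 16 "B"), one letter per row. The sketch below encodes that as a hypothetical pattern. The letter's leading position is inferred from the sample rows only. Validation should run on the raw column: the lower-case `a...` codes in the frequency table that follows appear to result from lower-casing during word counting. `CODE_PATTERN` and `is_valid_code` are hypothetical names.

```python
import re

# Assumed format inferred from this report: one letter (A or B) then 8 ASCII digits.
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def is_valid_code(code):
    """Return True if `code` matches the assumed apartment-code format."""
    return CODE_PATTERN.fullmatch(code) is not None


codes = ["A10025185", "A14272313", "a15807703", "A1002518"]
print([c for c in codes if not is_valid_code(c)])  # ['a15807703', 'A1002518']
```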

Words

Value | Count | Frequency (%)
a15807703 | 12 | 0.1%
a13204510 | 11 | 0.1%
a15005001 | 11 | 0.1%
a13991017 | 11 | 0.1%
a13519006 | 11 | 0.1%
a15375809 | 11 | 0.1%
a15083703 | 11 | 0.1%
a15807311 | 11 | 0.1%
a13920106 | 11 | 0.1%
a13611202 | 11 | 0.1%
Other values (2188) | 9889 | 98.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 18546 | 20.6%
1 | 17731 | 19.7%
A | 9984 | 11.1%
3 | 8810 | 9.8%
2 | 8177 | 9.1%
5 | 6241 | 6.9%
8 | 5618 | 6.2%
7 | 4788 | 5.3%
4 | 3740 | 4.2%
6 | 3323 | 3.7%
Other values (2) | 3042 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18546 | 23.2%
1 | 17731 | 22.2%
3 | 8810 | 11.0%
2 | 8177 | 10.2%
5 | 6241 | 7.8%
8 | 5618 | 7.0%
7 | 4788 | 6.0%
4 | 3740 | 4.7%
6 | 3323 | 4.2%
9 | 3026 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9984 | 99.8%
B | 16 | 0.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18546 | 23.2%
1 | 17731 | 22.2%
3 | 8810 | 11.0%
2 | 8177 | 10.2%
5 | 6241 | 7.8%
8 | 5618 | 7.0%
7 | 4788 | 6.0%
4 | 3740 | 4.7%
6 | 3323 | 4.2%
9 | 3026 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9984 | 99.8%
B | 16 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18546 | 20.6%
1 | 17731 | 19.7%
A | 9984 | 11.1%
3 | 8810 | 9.8%
2 | 8177 | 9.1%
5 | 6241 | 6.9%
8 | 5618 | 6.2%
7 | 4788 | 5.3%
4 | 3740 | 4.2%
6 | 3323 | 3.7%
Other values (2) | 3042 | 3.4%

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.977
Min length: 2

Characters and Unicode

Total characters: 59770
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

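1st row: 당기순이익 (net income)
2nd row: 현금 (cash)
3rd row: 장기수선충당부채 (long-term repair reserve liability)
4th row: 선급비용 (prepaid expenses)
5th row: 공동체활성화단체지원적립금 (community group support reserve)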
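
Words

Value | Count | Frequency (%)
선급비용 (prepaid expenses) | 341 | 3.4%
당기순이익 (net income) | 337 | 3.4%
예금 (deposits) | 333 | 3.3%
비품 (fixtures and equipment) | 324 | 3.2%
장기수선충당예금 (long-term repair reserve deposits) | 322 | 3.2%
연차수당충당부채 (accrued annual leave allowance) | 320 | 3.2%
미처분이익잉여금 (unappropriated retained earnings) | 315 | 3.1%
퇴직급여충당부채 (provision for retirement benefits) | 313 | 3.1%
예수금 (withholdings) | 312 | 3.1%
관리비미수금 (management fees receivable) | 307 | 3.1%
Other values (67) | 6776 | 67.8%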

Most occurring characters

Value | Count | Frequency (%)
| 4595 | 7.7%
| 3909 | 6.5%
| 3195 | 5.3%
| 3169 | 5.3%
| 3083 | 5.2%
| 3009 | 5.0%
| 2707 | 4.5%
| 2428 | 4.1%
| 1953 | 3.3%
| 1832 | 3.1%
Other values (97) | 29890 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 59770 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 4595 | 7.7%
| 3909 | 6.5%
| 3195 | 5.3%
| 3169 | 5.3%
| 3083 | 5.2%
| 3009 | 5.0%
| 2707 | 4.5%
| 2428 | 4.1%
| 1953 | 3.3%
| 1832 | 3.1%
Other values (97) | 29890 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 59770 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 4595 | 7.7%
| 3909 | 6.5%
| 3195 | 5.3%
| 3169 | 5.3%
| 3083 | 5.2%
| 3009 | 5.0%
| 2707 | 4.5%
| 2428 | 4.1%
| 1953 | 3.3%
| 1832 | 3.1%
Other values (97) | 29890 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 59770 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 4595 | 7.7%
| 3909 | 6.5%
| 3195 | 5.3%
| 3169 | 5.3%
| 3083 | 5.2%
| 3009 | 5.0%
| 2707 | 4.5%
| 2428 | 4.1%
| 1953 | 3.3%
| 1832 | 3.1%
Other values (97) | 29890 | 50.0%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202012 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202012
2nd row: 202012
3rd row: 202012
4th row: 202012
5th row: 202012

Common Values

Value | Count | Frequency (%)
202012 | 10000 | 100.0%
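Although the column is named 년월일 ("year-month-day"), every value has exactly six characters, so it appears to encode only year and month (202012 = December 2020). Converting it to a calendar date means choosing a day. The sketch below uses the first of the month, which is an assumption. `parse_yyyymm` is a hypothetical helper built on the standard `datetime.strptime`.

```python
from datetime import date, datetime


def parse_yyyymm(value):
    """Parse a 6-digit YYYYMM string (e.g. "202012") into the first day of that month."""
    return datetime.strptime(value, "%Y%m").date()


print(parse_yyyymm("202012"))  # 2020-12-01
```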

Length

[Figure: Histogram of lengths of the category]


금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7395
Distinct (%): 74.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72169035
Minimum: -4.09024 × 10⁹
Maximum: 7.376961 × 10⁹
Zeros: 2274
Zeros (%): 22.7%
Negative: 326
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.09024 × 10⁹
5th percentile: 0
Q1: 0
Median: 3378054.5
Q3: 36960243
95th percentile: 3.6455594 × 10⁸
Maximum: 7.376961 × 10⁹
Range: 1.1467201 × 10¹⁰
Interquartile range (IQR): 36960243
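The zeros explain why both the 5th percentile and Q1 are 0. Negative values make up 3.3% of rows and zeros a further 22.7%, so the cumulative share reaches 26.0%, which covers both the 5% and 25% points. The sketch below computes the same quantities on the 20 amounts from this report's Sample section. It assumes linear interpolation between order statistics. That rule is what `statistics.quantiles(..., method="inclusive")` uses, and it is also pandas' default for `Series.quantile`. `quantile_summary` is a hypothetical helper.

```python
import statistics

# 금액 values from the 20 rows of the Sample section
AMOUNTS = [100338649, 21159, 1772552961, 760260, 1676950, 0, 126551339, 0, 0, 131311,
           0, 0, 0, 5632127, 0, 0, 24000, 85023075, 131831409, 4805100]


def quantile_summary(values):
    """Min, quartiles, max, range and IQR using linear interpolation ("inclusive" method)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "Q1": q1, "median": median, "Q3": q3,
            "max": max(values), "range": max(values) - min(values), "IQR": q3 - q1}


print(quantile_summary(AMOUNTS))
```

On this sample, 8 of the 20 amounts are zero, so Q1 is again 0.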

Descriptive statistics

Standard deviation: 2.716446 × 10⁸
Coefficient of variation (CV): 3.7640049
Kurtosis: 174.33614
Mean: 72169035
Median Absolute Deviation (MAD): 3378054.5
Skewness: 9.8846743
Sum: 7.2169035 × 10¹¹
Variance: 7.3790788 × 10¹⁶
Monotonicity: Not monotonic
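Several of these figures follow from one another. CV is the standard deviation divided by the mean, variance is the squared standard deviation, and the sum is the mean times the 10000 rows. The sketch below defines these relations and checks them against the reported values. It assumes a sample standard deviation with an n - 1 denominator, which is pandas' default. `describe` is a hypothetical helper.

```python
import math
import statistics


def describe(values):
    """Mean, sample standard deviation (n - 1), variance, CV and sum."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return {"mean": mean, "std": std, "variance": std ** 2,
            "cv": std / mean, "sum": math.fsum(values)}


# Consistency check against the figures reported above (copied from the report).
REPORTED_MEAN, REPORTED_STD, N = 72169035, 2.716446e8, 10000
assert math.isclose(REPORTED_STD / REPORTED_MEAN, 3.7640049, rel_tol=1e-6)  # CV
assert math.isclose(REPORTED_STD ** 2, 7.3790788e16, rel_tol=1e-6)          # Variance
assert REPORTED_MEAN * N == 721690350000                                    # Sum = 7.2169035 × 10¹¹
```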
[Figure: Histogram with fixed-size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 2274 | 22.7%
250000 | 31 | 0.3%
500000 | 30 | 0.3%
300000 | 15 | 0.1%
242000 | 14 | 0.1%
484000 | 10 | 0.1%
1000000 | 10 | 0.1%
400000 | 10 | 0.1%
10000000 | 9 | 0.1%
30000000 | 9 | 0.1%
Other values (7385) | 7588 | 75.9%

Extreme values

Minimum 10 values
Value | Count | Frequency (%)
-4090240000 | 1 | < 0.1%
-421312336 | 1 | < 0.1%
-243342513 | 1 | < 0.1%
-199831810 | 1 | < 0.1%
-190670210 | 1 | < 0.1%
-138881815 | 1 | < 0.1%
-128141840 | 1 | < 0.1%
-121466611 | 1 | < 0.1%
-110568181 | 1 | < 0.1%
-104892771 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
7376961038 | 1 | < 0.1%
6412048173 | 1 | < 0.1%
5776718393 | 1 | < 0.1%
5325406194 | 1 | < 0.1%
5051111961 | 1 | < 0.1%
4202900346 | 1 | < 0.1%
4028253729 | 1 | < 0.1%
3927368279 | 1 | < 0.1%
3927107289 | 1 | < 0.1%
3270784730 | 1 | < 0.1%

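The extreme-value tables list distinct values with their counts, sorted by value. This matters for a column where 2274 rows share the value 0. The minimal sketch below runs on the 20 sample amounts from this report. `extreme_values` is a hypothetical helper.

```python
from collections import Counter

# 금액 values from the 20 rows of the Sample section
AMOUNTS = [100338649, 21159, 1772552961, 760260, 1676950, 0, 126551339, 0, 0, 131311,
           0, 0, 0, 5632127, 0, 0, 24000, 85023075, 131831409, 4805100]


def extreme_values(values, k=10):
    """Return (value, count) pairs for the k smallest and the k largest distinct values."""
    ascending = sorted(Counter(values).items())
    return ascending[:k], ascending[::-1][:k]


smallest, largest = extreme_values(AMOUNTS, k=3)
print(smallest)  # [(0, 8), (21159, 1), (24000, 1)]
print(largest)   # [(1772552961, 1), (131831409, 1), (126551339, 1)]
```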
Interactions


Correlations

| 비용명 | 금액
비용명 (cost item name) | 1.000 | 0.498
금액 (amount) | 0.498 | 1.000
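The matrix pairs a categorical column (비용명) with a numeric one (금액). A common way to measure such an association is Cramér's V on a contingency table after discretizing the numeric column. We assume, without confirming, that ydata-profiling's default correlation does something similar for this pair. The sketch below shows plain, bias-uncorrected Cramér's V. The three-way `amount_bin` split is a hypothetical discretization, and the toy input uses six Sample rows. It is not expected to reproduce 0.498.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain (bias-uncorrected) Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            chi2 += (observed.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def amount_bin(value):
    """Hypothetical 3-way discretization of 금액: negative, zero or positive."""
    if value < 0:
        return "negative"
    return "zero" if value == 0 else "positive"


# Six Sample rows as (비용명, 금액) pairs; a toy input, not the full column.
items = ["현금", "현금", "선급비용", "선급비용", "공동체활성화단체지원적립금", "공동체활성화단체지원적립금"]
amounts = [21159, 131311, 760260, 4805100, 1676950, 0]
print(round(cramers_v(items, [amount_bin(a) for a in amounts]), 3))  # 0.632
```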

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completeness by eye.]

Sample

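First rows

| 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item name) | 년월일 (year-month-day) | 금액 (amount)
1389 | 잠실올림픽공원아이파크 | A10025185 | 당기순이익 | 202012 | 100338649
47565 | 송천센트레빌 | A14272313 | 현금 | 202012 | 21159
56959 | 구로주공 | A15286809 | 장기수선충당부채 | 202012 | 1772552961
510 | 대치1차현대아파트 | A10024799 | 선급비용 | 202012 | 760260
52085 | 신길경남 | A15083703 | 공동체활성화단체지원적립금 | 202012 | 1676950
62264 | 마곡엠밸리7단지 | A15721007 | 퇴직급여충당예금 | 202012 | 0
15025 | 이문삼성래미안아파트 | A13076801 | 예금 | 202012 | 126551339
18720 | 도봉한신 | A13201209 | 장기수선충당부채적립금 | 202012 | 0
37307 | 가락극동 | A13816202 | 선수수도료 | 202012 | 0
37856 | 송파파인타운9단지 | A13821007 | 현금 | 202012 | 131311

Last rows

| 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
57853 | 독산주공14단지 | A15375809 | 주차장충당부채 | 202012 | 0
66425 | 목동우성2차 | A15807703 | 일반관리비충당부채 | 202012 | 0
22913 | 행당두산위브아파트 | A13377901 | 미수금 | 202012 | 0
9103 | 디엠씨한양 | A12081703 | 가수금 | 202012 | 5632127
56377 | 신도림쌍용플래티넘노블 | A15283801 | 기타의비유동자산 | 202012 | 0
11987 | 백련산힐스테이트2차 | A12201002 | 공동체활성화단체지원적립금 | 202012 | 0
11468 | 신수현대 | A12185603 | 저장품 | 202012 | 24000
43978 | 월계청백3단지 | A13985105 | 미부과관리비 | 202012 | 85023075
57848 | 독산주공14단지 | A15375809 | 미지급금 | 202012 | 131831409
57804 | 독산동한양수자인아파트 | A15370301 | 선급비용 | 202012 | 4805100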