Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
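
The average record size follows from the total memory footprint and the row count. The sketch below reproduces that arithmetic, assuming 1 KiB = 1024 bytes; the function name is illustrative and not part of the profiling tool.

```python
# Figures copied from the dataset statistics above.
TOTAL_SIZE_KIB = 488.3
N_OBSERVATIONS = 10_000


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg_bytes:.1f} B")
```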

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 (date) has constant value "202109" (Constant)
금액 (amount) has 2332 (23.3%) zeros (Zeros)
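
As a rough illustration of how alerts like these could be derived from per-column counts, the sketch below flags a column as constant when it has a single distinct value and flags zeros when their share exceeds a cut-off. The function name and the `zeros_threshold` default are hypothetical and are not taken from the profiling tool's configuration.

```python
def column_alerts(name, n_rows, n_distinct, n_zeros=0, zeros_threshold=0.05):
    """Return alert messages for one column.

    zeros_threshold is a hypothetical cut-off chosen for illustration only.
    """
    found = []
    if n_distinct == 1:
        found.append(f"{name} has constant value")
    if n_rows and n_zeros / n_rows > zeros_threshold:
        found.append(f"{name} has {n_zeros} ({n_zeros / n_rows:.1%}) zeros")
    return found


# Counts copied from the report: 10000 rows, 1 distinct date, 2332 zero amounts.
date_alerts = column_alerts("년월일", n_rows=10_000, n_distinct=1)
amount_alerts = column_alerts("금액", n_rows=10_000, n_distinct=7_333, n_zeros=2_332)
print(date_alerts + amount_alerts)
```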

Reproduction

Analysis started: 2024-05-11 05:58:41.571789
Analysis finished: 2024-05-11 05:58:42.668271
Duration: 1.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
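
The reported duration can be checked from the two timestamps. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-05-11 05:58:41.571789")
finished = datetime.fromisoformat("2024-05-11 05:58:42.668271")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.1f} seconds")
```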

Variables

아파트명 (apartment name)
Text

Distinct: 2211
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3538
Min length: 2
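
The mean length of a text variable equals its total character count divided by the number of observations. The sketch below checks this for the three text variables, using the "Total characters" figures the report lists under "Characters and Unicode"; the dictionary keys are placeholder labels, not names from the report.

```python
N_OBSERVATIONS = 10_000

# "Total characters" as reported for each of the three text variables.
total_characters = {
    "text_variable_1": 73_538,
    "text_variable_2": 90_000,
    "text_variable_3": 59_814,
}

mean_length = {name: total / N_OBSERVATIONS for name, total in total_characters.items()}
for name, value in mean_length.items():
    print(name, value)
```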

Characters and Unicode

Total characters: 73538
Distinct characters: 434
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
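
To make the category counts concrete, the sketch below looks up the Unicode general category of every character in the five sample apartment names shown in this report, using Python's standard `unicodedata` module. The two-letter codes map to the names used in the tables: `Lo` is Other Letter and `Nd` is Decimal Number. Script and block lookups are not available in the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The five sample values listed for the apartment-name variable.
samples = ["도봉현대성우", "천호동아하이빌", "한남힐스테이트", "북한산수자인", "북가좌삼호제2"]

# Count characters by Unicode general category (for example "Lo", "Nd").
category_counts = Counter(
    unicodedata.category(ch) for name in samples for ch in name
)
print(category_counts)
```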

Unique

Unique: 120
Unique (%): 1.2%

Sample

1st row: 도봉현대성우
2nd row: 천호동아하이빌
3rd row: 한남힐스테이트
4th row: 북한산수자인
5th row: 북가좌삼호제2

Value | Count | Frequency (%)
아파트 | 152 | 1.4%
래미안 | 29 | 0.3%
해모로 | 20 | 0.2%
e편한세상 | 19 | 0.2%
아이파크 | 18 | 0.2%
중계그린 | 17 | 0.2%
우리유앤미 | 16 | 0.1%
신반포 | 15 | 0.1%
경남아너스빌 | 15 | 0.1%
푸르지오 | 14 | 0.1%
Other values (2284) | 10366 | 97.1%

Most occurring characters

Value | Count | Frequency (%)
 | 2502 | 3.4%
 | 2417 | 3.3%
 | 2231 | 3.0%
 | 1862 | 2.5%
 | 1828 | 2.5%
 | 1646 | 2.2%
 | 1465 | 2.0%
 | 1462 | 2.0%
 | 1446 | 2.0%
 | 1311 | 1.8%
Other values (424) | 55368 | 75.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67298 | 91.5%
Decimal Number | 3773 | 5.1%
Uppercase Letter | 766 | 1.0%
Space Separator | 759 | 1.0%
Lowercase Letter | 332 | 0.5%
Open Punctuation | 154 | 0.2%
Close Punctuation | 154 | 0.2%
Other Punctuation | 154 | 0.2%
Dash Punctuation | 142 | 0.2%
Letter Number | 6 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2502 | 3.7%
 | 2417 | 3.6%
 | 2231 | 3.3%
 | 1862 | 2.8%
 | 1828 | 2.7%
 | 1646 | 2.4%
 | 1465 | 2.2%
 | 1462 | 2.2%
 | 1446 | 2.1%
 | 1311 | 1.9%
Other values (379) | 49128 | 73.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 114 | 14.9%
K | 97 | 12.7%
C | 96 | 12.5%
D | 71 | 9.3%
M | 71 | 9.3%
L | 71 | 9.3%
I | 47 | 6.1%
H | 44 | 5.7%
G | 38 | 5.0%
E | 26 | 3.4%
Other values (7) | 91 | 11.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 195 | 58.7%
i | 29 | 8.7%
l | 24 | 7.2%
v | 17 | 5.1%
s | 16 | 4.8%
k | 14 | 4.2%
a | 9 | 2.7%
g | 9 | 2.7%
w | 8 | 2.4%
c | 6 | 1.8%

Decimal Number
Value | Count | Frequency (%)
1 | 1126 | 29.8%
2 | 1087 | 28.8%
3 | 466 | 12.4%
4 | 303 | 8.0%
5 | 217 | 5.8%
6 | 164 | 4.3%
7 | 128 | 3.4%
9 | 107 | 2.8%
8 | 98 | 2.6%
0 | 77 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
, | 125 | 81.2%
. | 29 | 18.8%

Space Separator
Value | Count | Frequency (%)
(space) | 759 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 154 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 154 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 142 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67298 | 91.5%
Common | 5136 | 7.0%
Latin | 1104 | 1.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2502 | 3.7%
 | 2417 | 3.6%
 | 2231 | 3.3%
 | 1862 | 2.8%
 | 1828 | 2.7%
 | 1646 | 2.4%
 | 1465 | 2.2%
 | 1462 | 2.2%
 | 1446 | 2.1%
 | 1311 | 1.9%
Other values (379) | 49128 | 73.0%

Latin
Value | Count | Frequency (%)
e | 195 | 17.7%
S | 114 | 10.3%
K | 97 | 8.8%
C | 96 | 8.7%
D | 71 | 6.4%
M | 71 | 6.4%
L | 71 | 6.4%
I | 47 | 4.3%
H | 44 | 4.0%
G | 38 | 3.4%
Other values (19) | 260 | 23.6%

Common
Value | Count | Frequency (%)
1 | 1126 | 21.9%
2 | 1087 | 21.2%
(space) | 759 | 14.8%
3 | 466 | 9.1%
4 | 303 | 5.9%
5 | 217 | 4.2%
6 | 164 | 3.2%
( | 154 | 3.0%
) | 154 | 3.0%
- | 142 | 2.8%
Other values (6) | 564 | 11.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67298 | 91.5%
ASCII | 6234 | 8.5%
Number Forms | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2502 | 3.7%
 | 2417 | 3.6%
 | 2231 | 3.3%
 | 1862 | 2.8%
 | 1828 | 2.7%
 | 1646 | 2.4%
 | 1465 | 2.2%
 | 1462 | 2.2%
 | 1446 | 2.1%
 | 1311 | 1.9%
Other values (379) | 49128 | 73.0%

ASCII
Value | Count | Frequency (%)
1 | 1126 | 18.1%
2 | 1087 | 17.4%
(space) | 759 | 12.2%
3 | 466 | 7.5%
4 | 303 | 4.9%
5 | 217 | 3.5%
e | 195 | 3.1%
6 | 164 | 2.6%
( | 154 | 2.5%
) | 154 | 2.5%
Other values (34) | 1609 | 25.8%

Number Forms
Value | Count | Frequency (%)
 | 6 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2216
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 120
Unique (%): 1.2%

Sample

1st row: A13201201
2nd row: A13486504
3rd row: A14077901
4th row: A12204001
5th row: A12076601

Value | Count | Frequency (%)
a13986306 | 17 | 0.2%
a13611005 | 13 | 0.1%
a13519001 | 12 | 0.1%
a15701007 | 12 | 0.1%
a13671209 | 12 | 0.1%
a15303002 | 11 | 0.1%
a15209002 | 11 | 0.1%
a13583402 | 11 | 0.1%
a15609301 | 10 | 0.1%
a15210212 | 10 | 0.1%
Other values (2206) | 9881 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18491 | 20.5%
1 | 17768 | 19.7%
A | 9992 | 11.1%
3 | 8749 | 9.7%
2 | 8293 | 9.2%
5 | 6293 | 7.0%
8 | 5529 | 6.1%
7 | 4602 | 5.1%
4 | 3917 | 4.4%
6 | 3328 | 3.7%
Other values (2) | 3038 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18491 | 23.1%
1 | 17768 | 22.2%
3 | 8749 | 10.9%
2 | 8293 | 10.4%
5 | 6293 | 7.9%
8 | 5529 | 6.9%
7 | 4602 | 5.8%
4 | 3917 | 4.9%
6 | 3328 | 4.2%
9 | 3030 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9992 | 99.9%
B | 8 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18491 | 23.1%
1 | 17768 | 22.2%
3 | 8749 | 10.9%
2 | 8293 | 10.4%
5 | 6293 | 7.9%
8 | 5529 | 6.9%
7 | 4602 | 5.8%
4 | 3917 | 4.9%
6 | 3328 | 4.2%
9 | 3030 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9992 | 99.9%
B | 8 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18491 | 20.5%
1 | 17768 | 19.7%
A | 9992 | 11.1%
3 | 8749 | 9.7%
2 | 8293 | 9.2%
5 | 6293 | 7.0%
8 | 5529 | 6.1%
7 | 4602 | 5.1%
4 | 3917 | 4.4%
6 | 3328 | 3.7%
Other values (2) | 3038 | 3.4%

비용명 (expense item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 10
Mean length: 5.9814
Min length: 2

Characters and Unicode

Total characters: 59814
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미지급금
2nd row: 기타의비유동부채
3rd row: 전신전화가입권
4th row: 현금
5th row: 미지급금

Value | Count | Frequency (%)
관리비미수금 | 338 | 3.4%
미처분이익잉여금 | 320 | 3.2%
퇴직급여충당부채 | 315 | 3.1%
선급비용 | 312 | 3.1%
예수금 | 311 | 3.1%
예금 | 304 | 3.0%
공동주택적립금 | 304 | 3.0%
연차수당충당부채 | 301 | 3.0%
가수금 | 298 | 3.0%
당기순이익 | 298 | 3.0%
Other values (67) | 6899 | 69.0%

Most occurring characters

Value | Count | Frequency (%)
 | 4621 | 7.7%
 | 3785 | 6.3%
 | 3130 | 5.2%
 | 3073 | 5.1%
 | 3017 | 5.0%
 | 2914 | 4.9%
 | 2611 | 4.4%
 | 2449 | 4.1%
 | 1888 | 3.2%
 | 1750 | 2.9%
Other values (97) | 30576 | 51.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 59814 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 4621 | 7.7%
 | 3785 | 6.3%
 | 3130 | 5.2%
 | 3073 | 5.1%
 | 3017 | 5.0%
 | 2914 | 4.9%
 | 2611 | 4.4%
 | 2449 | 4.1%
 | 1888 | 3.2%
 | 1750 | 2.9%
Other values (97) | 30576 | 51.1%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 59814 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 4621 | 7.7%
 | 3785 | 6.3%
 | 3130 | 5.2%
 | 3073 | 5.1%
 | 3017 | 5.0%
 | 2914 | 4.9%
 | 2611 | 4.4%
 | 2449 | 4.1%
 | 1888 | 3.2%
 | 1750 | 2.9%
Other values (97) | 30576 | 51.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 59814 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 4621 | 7.7%
 | 3785 | 6.3%
 | 3130 | 5.2%
 | 3073 | 5.1%
 | 3017 | 5.0%
 | 2914 | 4.9%
 | 2611 | 4.4%
 | 2449 | 4.1%
 | 1888 | 3.2%
 | 1750 | 2.9%
Other values (97) | 30576 | 51.1%

년월일 (date, YYYYMM)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202109: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202109
2nd row: 202109
3rd row: 202109
4th row: 202109
5th row: 202109

Common Values

Value | Count | Frequency (%)
202109 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202109 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7333
Distinct (%): 73.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72378979
Minimum: -2.896616 × 10^8
Maximum: 6.6857401 × 10^9
Zeros: 2332
Zeros (%): 23.3%
Negative: 328
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2.896616 × 10^8
5th percentile: 0
Q1: 0
Median: 3186790
Q3: 34856706
95th percentile: 3.6775082 × 10^8
Maximum: 6.6857401 × 10^9
Range: 6.9754017 × 10^9
Interquartile range (IQR): 34856706
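
The range and interquartile range are simple differences of the quantiles above. The sketch below recomputes both, using the exact minimum and maximum that appear in the extreme-value tables of this report; the variable names are illustrative.

```python
# Exact extremes from the report's extreme-value tables; quartiles from above.
minimum = -289_661_600
maximum = 6_685_740_092
q1 = 0
q3 = 34_856_706

value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # interquartile range = Q3 - Q1

print(f"Range: {value_range:.7e}")
print(f"IQR: {iqr}")
```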

Descriptive statistics

Standard deviation: 2.7400933 × 10^8
Coefficient of variation (CV): 3.7857584
Kurtosis: 142.73069
Mean: 72378979
Median Absolute Deviation (MAD): 3186790
Skewness: 9.8355887
Sum: 7.2378979 × 10^11
Variance: 7.5081111 × 10^16
Monotonicity: Not monotonic
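
Several of these statistics are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks them against the rounded figures in the report, so small differences in the last digits are expected.

```python
import math

# Rounded figures copied from the report.
n_observations = 10_000
mean = 72_378_979
std = 2.7400933e8

cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the squared standard deviation
total = mean * n_observations

print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum: {total:.7e}")
```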
Histogram with fixed size bins (bins=50)
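
With 50 fixed-size bins spanning the minimum to the maximum, each bin covers roughly 1.4 × 10^8. The sketch below shows one way to map a value to its bin under the assumption of equal-width bins with the last bin closed on the right; the report does not state its exact binning rule, and `bin_index` is a hypothetical helper.

```python
N_BINS = 50
MINIMUM = -289_661_600
MAXIMUM = 6_685_740_092


def bin_index(value, lo=MINIMUM, hi=MAXIMUM, bins=N_BINS):
    """Index of the equal-width bin containing value (last bin includes hi)."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / bins
    # Clamp so that value == hi lands in the final bin instead of overflowing.
    return min(int((value - lo) // width), bins - 1)


bin_width = (MAXIMUM - MINIMUM) / N_BINS
print(f"bin width: {bin_width:.2f}")
print("bin of 0:", bin_index(0))
```

Under this assumption the 2332 zero amounts all fall into the third bin (index 2), which is consistent with the strong right skew reported above.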
Value | Count | Frequency (%)
0 | 2332 | 23.3%
500000 | 21 | 0.2%
250000 | 19 | 0.2%
300000 | 18 | 0.2%
1000000 | 17 | 0.2%
242000 | 13 | 0.1%
200000 | 12 | 0.1%
30000000 | 12 | 0.1%
10000000 | 11 | 0.1%
2000000 | 10 | 0.1%
Other values (7323) | 7535 | 75.3%

Value | Count | Frequency (%)
-289661600 | 1 | < 0.1%
-255090780 | 1 | < 0.1%
-194704230 | 1 | < 0.1%
-176135835 | 1 | < 0.1%
-145205010 | 1 | < 0.1%
-137092890 | 1 | < 0.1%
-132018610 | 1 | < 0.1%
-91590905 | 1 | < 0.1%
-88646820 | 1 | < 0.1%
-84630330 | 1 | < 0.1%

Value | Count | Frequency (%)
6685740092 | 1 | < 0.1%
6066196221 | 1 | < 0.1%
5538891275 | 1 | < 0.1%
5094368716 | 1 | < 0.1%
4873014602 | 1 | < 0.1%
4187706428 | 1 | < 0.1%
3962981329 | 1 | < 0.1%
3640096976 | 1 | < 0.1%
3518561164 | 1 | < 0.1%
3407294651 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.509
금액 | 0.509 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
19523 | 도봉현대성우 | A13201201 | 미지급금 | 202109 | 1077000
27349 | 천호동아하이빌 | A13486504 | 기타의비유동부채 | 202109 | 197854
48062 | 한남힐스테이트 | A14077901 | 전신전화가입권 | 202109 | 0
13197 | 북한산수자인 | A12204001 | 현금 | 202109 | 166510
9870 | 북가좌삼호제2 | A12076601 | 미지급금 | 202109 | 11135290
17874 | 신내6단지 | A13176901 | 기타충당부채 | 202109 | 1400000
35609 | 방배롯데캐슬아르떼 | A13771001 | 미지급금 | 202109 | 96937989
213 | 디에이치포레센트아파트 | A10024258 | 선급비용 | 202109 | 18709110
49364 | 수유극동아파트 | A14278101 | 미지급금 | 202109 | 41844340
52460 | 문래한신 | A15009602 | 예금 | 202109 | 62687121

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
13987 | 갈현베르빌주상복합아파트 | A12271402 | 관리비예치금 | 202109 | 47830000
70068 | 신정푸른마을2단지 | A15886508 | 예수금 | 202109 | 2164690
33270 | 월곡두산위브아파트 | A13613008 | 선수수도료 | 202109 | 0
70242 | 은평뉴타운상림마을7단지 | A41279903 | 임대보증금 | 202109 | 16000000
36724 | 서초진흥 | A13785604 | 공동체활성화단체지원적립금 | 202109 | 0
19722 | 도봉한신 | A13201209 | 수선유지비충당부채 | 202109 | 23972690
70310 | 은평뉴타운상림마을4단지 | A41279905 | 선급금 | 202109 | 1566650
70369 | 은평뉴타운상림마을10단지 | A41279906 | 장기수선충당부채 | 202109 | 301943838
30810 | 도곡대림 | A13586101 | 전신전화가입권 | 202109 | 242000
22426 | 브라운스톤쌍문 | A13295201 | 가수금 | 202109 | 252000