Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
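
The average record size can be checked from the two figures above: total memory divided by the number of observations. This is a quick sketch that assumes 1 KiB = 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
total_kib = 488.3          # total size in memory, in KiB
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```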

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "201908" (Constant)
금액 has 2116 (21.2%) zeros (Zeros)
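
Both alerts reduce to simple counts: a column is constant when it holds one distinct value, and the zeros figure is the share of records equal to zero. The sketch below runs those counts on ten rows copied from the report's Sample section. The helper names are hypothetical, and the thresholds ydata-profiling uses to raise an alert are not stated in this report.

```python
# Ten (년월일, 금액) pairs copied from the report's Sample section.
rows = [
    ("201908", 520353409), ("201908", 0), ("201908", 5861141),
    ("201908", 8249720), ("201908", 27475770), ("201908", 484000),
    ("201908", 350000), ("201908", 1274000), ("201908", 251826),
    ("201908", 287905725),
]
names = ["년월일", "금액"]

def constant_columns(records, column_names):
    """Return the names of columns that hold a single distinct value."""
    return [name for i, name in enumerate(column_names)
            if len({rec[i] for rec in records}) == 1]

def zero_share(records, index):
    """Fraction of records whose value at position `index` equals zero."""
    return sum(1 for rec in records if rec[index] == 0) / len(records)

print(constant_columns(rows, names))  # ['년월일']
print(zero_share(rows, 1))            # 0.1
```

On the full 10000-row dataset the same ratio gives the reported 2116 / 10000 = 21.2%.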

Reproduction

Analysis started: 2024-05-11 06:01:21.344900
Analysis finished: 2024-05-11 06:01:22.548007
Duration: 1.2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2117
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1193
Min length: 2

Characters and Unicode

Total characters: 71193
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
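
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one value from this column; `category_counts` is a hypothetical helper, not part of ydata-profiling. The two-letter codes correspond to the names used in this report (`Lo` is Other Letter, `Nd` is Decimal Number). The standard library does not expose script or block properties, so those counts are not reproduced here.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "은평뉴타운상림마을6단지"  # one value of this variable
counts = category_counts(sample)
print(counts["Lo"], counts["Nd"])  # 11 1
```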

Unique

Unique: 116
Unique (%): 1.2%
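
Distinct counts the different values in a column, while Unique counts the values that occur exactly once, which is why Unique is much smaller than Distinct here. A minimal sketch of the two counts on a made-up list (the helper name and the toy data are illustrative, not taken from the real column):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for count in freq.values() if count == 1)
    return distinct, unique

# Hypothetical toy column.
toy_names = ["송파성지", "장안위더스빌", "송파성지", "길음삼부"]
print(distinct_and_unique(toy_names))  # (3, 2)
```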

Sample

1st row: 송파성지
2nd row: 은평뉴타운상림마을6단지
3rd row: 장안위더스빌
4th row: 정릉힐스테이트3차
5th row: 월곡래미안루나밸리

Value  Count  Frequency (%)
아파트  104  1.0%
래미안  23  0.2%
신동아파밀리에  18  0.2%
힐스테이트  14  0.1%
당산2차삼성  14  0.1%
신반포  13  0.1%
미아경남아너스빌  12  0.1%
방학우성2차  12  0.1%
가락2차쌍용아파트  12  0.1%
신내  12  0.1%
Other values (2170)  10236  97.8%

Most occurring characters

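Value  Count  Frequency (%)
(not preserved)  2162  3.0%
(not preserved)  2105  3.0%
(not preserved)  1895  2.7%
(not preserved)  1868  2.6%
(not preserved)  1842  2.6%
(not preserved)  1656  2.3%
(not preserved)  1531  2.2%
(not preserved)  1501  2.1%
(not preserved)  1475  2.1%
(not preserved)  1348  1.9%
Other values (419)  53810  75.6%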

Most occurring categories

Value  Count  Frequency (%)
Other Letter  65355  91.8%
Decimal Number  3793  5.3%
Uppercase Letter  686  1.0%
Space Separator  530  0.7%
Lowercase Letter  337  0.5%
Dash Punctuation  126  0.2%
Close Punctuation  123  0.2%
Open Punctuation  123  0.2%
Other Punctuation  116  0.2%
Letter Number  4  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not preserved)  2162  3.3%
(not preserved)  2105  3.2%
(not preserved)  1895  2.9%
(not preserved)  1868  2.9%
(not preserved)  1842  2.8%
(not preserved)  1656  2.5%
(not preserved)  1531  2.3%
(not preserved)  1501  2.3%
(not preserved)  1475  2.3%
(not preserved)  1348  2.1%
Other values (374)  47972  73.4%

Uppercase Letter
Value  Count  Frequency (%)
S  118  17.2%
K  101  14.7%
C  73  10.6%
L  65  9.5%
H  55  8.0%
I  39  5.7%
E  35  5.1%
M  30  4.4%
D  30  4.4%
G  29  4.2%
Other values (7)  111  16.2%

Lowercase Letter
Value  Count  Frequency (%)
e  186  55.2%
i  36  10.7%
l  34  10.1%
v  22  6.5%
w  12  3.6%
k  10  3.0%
c  10  3.0%
s  9  2.7%
g  7  2.1%
a  7  2.1%

Decimal Number
Value  Count  Frequency (%)
2  1135  29.9%
1  1116  29.4%
3  514  13.6%
4  248  6.5%
5  221  5.8%
6  150  4.0%
9  122  3.2%
7  113  3.0%
0  94  2.5%
8  80  2.1%

Other Punctuation
Value  Count  Frequency (%)
,  94  81.0%
.  22  19.0%

Space Separator
Value  Count  Frequency (%)
(space)  530  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  126  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  123  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  123  100.0%

Letter Number
Value  Count  Frequency (%)
(not preserved)  4  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  65355  91.8%
Common  4811  6.8%
Latin  1027  1.4%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not preserved)  2162  3.3%
(not preserved)  2105  3.2%
(not preserved)  1895  2.9%
(not preserved)  1868  2.9%
(not preserved)  1842  2.8%
(not preserved)  1656  2.5%
(not preserved)  1531  2.3%
(not preserved)  1501  2.3%
(not preserved)  1475  2.3%
(not preserved)  1348  2.1%
Other values (374)  47972  73.4%

Latin
Value  Count  Frequency (%)
e  186  18.1%
S  118  11.5%
K  101  9.8%
C  73  7.1%
L  65  6.3%
H  55  5.4%
I  39  3.8%
i  36  3.5%
E  35  3.4%
l  34  3.3%
Other values (19)  285  27.8%

Common
Value  Count  Frequency (%)
2  1135  23.6%
1  1116  23.2%
(space)  530  11.0%
3  514  10.7%
4  248  5.2%
5  221  4.6%
6  150  3.1%
-  126  2.6%
)  123  2.6%
(  123  2.6%
Other values (6)  525  10.9%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  65355  91.8%
ASCII  5834  8.2%
Number Forms  4  < 0.1%

Most frequent character per block

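Hangul
Value  Count  Frequency (%)
(not preserved)  2162  3.3%
(not preserved)  2105  3.2%
(not preserved)  1895  2.9%
(not preserved)  1868  2.9%
(not preserved)  1842  2.8%
(not preserved)  1656  2.5%
(not preserved)  1531  2.3%
(not preserved)  1501  2.3%
(not preserved)  1475  2.3%
(not preserved)  1348  2.1%
Other values (374)  47972  73.4%

ASCII
Value  Count  Frequency (%)
2  1135  19.5%
1  1116  19.1%
(space)  530  9.1%
3  514  8.8%
4  248  4.3%
5  221  3.8%
e  186  3.2%
6  150  2.6%
-  126  2.2%
)  123  2.1%
Other values (34)  1485  25.5%

Number Forms
Value  Count  Frequency (%)
(not preserved)  4  100.0%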

아파트코드
Text

Distinct: 2123
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 118
Unique (%): 1.2%

Sample

1st row: A13817202
2nd row: A41279904
3rd row: A13078701
4th row: A13610005
5th row: A13613006

Value  Count  Frequency (%)
a15004405  14  0.1%
a13282510  12  0.1%
a15284302  12  0.1%
a13880105  12  0.1%
a14272306  12  0.1%
a15671801  12  0.1%
a15279101  12  0.1%
a13523001  11  0.1%
a13120403  11  0.1%
a13527003  11  0.1%
Other values (2113)  9881  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18171  20.2%
1  17730  19.7%
A  9993  11.1%
3  8881  9.9%
2  8001  8.9%
5  6293  7.0%
8  5842  6.5%
7  4938  5.5%
4  3763  4.2%
6  3418  3.8%
Other values (2)  2970  3.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18171  22.7%
1  17730  22.2%
3  8881  11.1%
2  8001  10.0%
5  6293  7.9%
8  5842  7.3%
7  4938  6.2%
4  3763  4.7%
6  3418  4.3%
9  2963  3.7%

Uppercase Letter
Value  Count  Frequency (%)
A  9993  99.9%
B  7  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18171  22.7%
1  17730  22.2%
3  8881  11.1%
2  8001  10.0%
5  6293  7.9%
8  5842  7.3%
7  4938  6.2%
4  3763  4.7%
6  3418  4.3%
9  2963  3.7%

Latin
Value  Count  Frequency (%)
A  9993  99.9%
B  7  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18171  20.2%
1  17730  19.7%
A  9993  11.1%
3  8881  9.9%
2  8001  8.9%
5  6293  7.0%
8  5842  6.5%
7  4938  5.5%
4  3763  4.2%
6  3418  3.8%
Other values (2)  2970  3.3%

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 6.0235
Min length: 2

Characters and Unicode

Total characters: 60235
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 장기수선충당부채
2nd row: 미처분이익잉여금
3rd row: 공동주택적립금
4th row: 연차수당충당부채
5th row: 기타시설운영충당부채

Value  Count  Frequency (%)
연차수당충당부채  324  3.2%
당기순이익  323  3.2%
선급비용  321  3.2%
퇴직급여충당부채  320  3.2%
공동주택적립금  315  3.1%
관리비미수금  314  3.1%
예금  311  3.1%
미처분이익잉여금  309  3.1%
수선유지비충당부채  308  3.1%
장기수선충당예금  305  3.0%
Other values (67)  6850  68.5%

Most occurring characters

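Value  Count  Frequency (%)
(not preserved)  4664  7.7%
(not preserved)  3894  6.5%
(not preserved)  3213  5.3%
(not preserved)  3161  5.2%
(not preserved)  2990  5.0%
(not preserved)  2959  4.9%
(not preserved)  2683  4.5%
(not preserved)  2362  3.9%
(not preserved)  2000  3.3%
(not preserved)  1798  3.0%
Other values (97)  30511  50.7%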

Most occurring categories

Value  Count  Frequency (%)
Other Letter  60235  100.0%

Most frequent character per category

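Other Letter
Value  Count  Frequency (%)
(not preserved)  4664  7.7%
(not preserved)  3894  6.5%
(not preserved)  3213  5.3%
(not preserved)  3161  5.2%
(not preserved)  2990  5.0%
(not preserved)  2959  4.9%
(not preserved)  2683  4.5%
(not preserved)  2362  3.9%
(not preserved)  2000  3.3%
(not preserved)  1798  3.0%
Other values (97)  30511  50.7%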

Most occurring scripts

Value  Count  Frequency (%)
Hangul  60235  100.0%

Most frequent character per script

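Hangul
Value  Count  Frequency (%)
(not preserved)  4664  7.7%
(not preserved)  3894  6.5%
(not preserved)  3213  5.3%
(not preserved)  3161  5.2%
(not preserved)  2990  5.0%
(not preserved)  2959  4.9%
(not preserved)  2683  4.5%
(not preserved)  2362  3.9%
(not preserved)  2000  3.3%
(not preserved)  1798  3.0%
Other values (97)  30511  50.7%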

Most occurring blocks

Value  Count  Frequency (%)
Hangul  60235  100.0%

Most frequent character per block

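Hangul
Value  Count  Frequency (%)
(not preserved)  4664  7.7%
(not preserved)  3894  6.5%
(not preserved)  3213  5.3%
(not preserved)  3161  5.2%
(not preserved)  2990  5.0%
(not preserved)  2959  4.9%
(not preserved)  2683  4.5%
(not preserved)  2362  3.9%
(not preserved)  2000  3.3%
(not preserved)  1798  3.0%
Other values (97)  30511  50.7%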

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
201908  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201908
2nd row: 201908
3rd row: 201908
4th row: 201908
5th row: 201908

Common Values

Value  Count  Frequency (%)
201908  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
201908  10000  100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7535
Distinct (%): 75.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71406874
Minimum: -4.09024 × 10^9
Maximum: 8.5542298 × 10^9
Zeros: 2116
Zeros (%): 21.2%
Negative: 347
Negative (%): 3.5%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.09024 × 10^9
5-th percentile: 0
Q1: 2177.5
Median: 3836096.5
Q3: 39072418
95-th percentile: 3.4314286 × 10^8
Maximum: 8.5542298 × 10^9
Range: 1.264447 × 10^10
Interquartile range (IQR): 39070240

Descriptive statistics

Standard deviation: 2.6793038 × 10^8
Coefficient of variation (CV): 3.7521651
Kurtosis: 216.32651
Mean: 71406874
Median Absolute Deviation (MAD): 3836096.5
Skewness: 10.461718
Sum: 7.1406874 × 10^11
Variance: 7.1786688 × 10^16
Monotonicity: Not monotonic
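
Several of these figures are derived from others, so they can be cross-checked with plain arithmetic: CV is the standard deviation over the mean, variance is the standard deviation squared, the sum is the mean times the number of observations, IQR is Q3 minus Q1, and the range is maximum minus minimum. The sketch below recomputes them from the rounded values printed in this report, so agreement is only expected up to that rounding; the variable names are illustrative.

```python
import math

# Values copied from the 금액 statistics in this report (already rounded there).
n = 10_000
mean = 71_406_874
std = 2.6793038e8
q1, q3 = 2177.5, 39_072_418
minimum, maximum = -4_090_240_000, 8_554_229_848

cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n
iqr = q3 - q1
value_range = maximum - minimum

print(f"CV       {cv:.4f}")
print(f"Variance {variance:.4e}")
print(f"Sum      {total:.4e}")
print(f"IQR      {iqr}")
print(f"Range    {value_range:.6e}")

# Compare against the reported figures, allowing for rounding.
assert math.isclose(cv, 3.7521651, rel_tol=1e-5)
assert math.isclose(variance, 7.1786688e16, rel_tol=1e-5)
assert math.isclose(value_range, 1.264447e10, rel_tol=1e-6)
```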

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
0  2116  21.2%
500000  26  0.3%
250000  24  0.2%
300000  17  0.2%
242000  17  0.2%
30000000  15  0.1%
20000000  13  0.1%
1000000  12  0.1%
200000  11  0.1%
100000  10  0.1%
Other values (7525)  7739  77.4%

Minimum 10 values
Value  Count  Frequency (%)
-4090240000  1  < 0.1%
-2103712870  1  < 0.1%
-401540536  1  < 0.1%
-337879488  1  < 0.1%
-302145700  1  < 0.1%
-189656470  1  < 0.1%
-150627170  1  < 0.1%
-145334189  1  < 0.1%
-135615804  1  < 0.1%
-134212500  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
8554229848  1  < 0.1%
6343122933  1  < 0.1%
6149131802  1  < 0.1%
4950393542  1  < 0.1%
4771438214  1  < 0.1%
4030634614  1  < 0.1%
3661585934  1  < 0.1%
3517225852  1  < 0.1%
3444824045  1  < 0.1%
3441331479  1  < 0.1%

Interactions


Correlations

        비용명  금액
비용명  1.000  0.318
금액  0.318  1.000
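
The report does not say which association measure produced the 0.318 between 비용명 and 금액. One measure commonly used for a categorical column paired with a binned numeric column is Cramér's V. The sketch below implements that statistic without bias correction on made-up data. It is an assumption about the kind of measure involved, not a reproduction of the report's number, and `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical data: an expense item paired with a binned amount.
items = ["예금", "예금", "선급비용", "선급비용"]
bins_dependent = ["high", "high", "low", "low"]      # fully determined by item
bins_independent = ["high", "low", "high", "low"]    # unrelated to item

print(cramers_v(items, bins_dependent))    # 1.0
print(cramers_v(items, bins_independent))  # 0.0
```

The statistic ranges from 0 (no association) to 1 (one column fully determines the other), so a value near 0.3 would indicate a weak to moderate association.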

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  아파트명  아파트코드  비용명  년월일  금액
34934  송파성지  A13817202  장기수선충당부채  201908  520353409
65098  은평뉴타운상림마을6단지  A41279904  미처분이익잉여금  201908  0
13052  장안위더스빌  A13078701  공동주택적립금  201908  5861141
28884  정릉힐스테이트3차  A13610005  연차수당충당부채  201908  8249720
29717  월곡래미안루나밸리  A13613006  기타시설운영충당부채  201908  27475770
29206  길음삼부  A13611004  전신전화가입권  201908  484000
39125  공릉청솔9단지  A13980006  전신전화가입권  201908  350000
6757  인왕산현대  A12078201  미수관리비예치금  201908  1274000
30303  래미안라센트  A13671209  기타공동주택관리비충당부채  201908  251826
12023  답십리동아  A13003406  예금  201908  287905725

(index)  아파트명  아파트코드  비용명  년월일  금액
277  래미안길음센터피스  A10025638  당기순이익  201908  37346592
39328  공릉태강  A13980019  관리비미수금  201908  84701430
52697  오류한신플러스타운  A15210106  미지급금  201908  45096020
3348  신촌푸르지오 아파트  A10027851  가지급금  201908  5077216
40636  상계주공7단지  A13982704  퇴직급여충당부채  201908  201204370
7475  염리삼성래미안  A12109002  미수금  201908  3440000
27435  수서한아름  A13588402  선급금  201908  843370
25043  압구정신현대  A13511004  미부과관리비  201908  726536633
53900  구로새솔금호  A15284305  선급비용  201908  18287130
64245  목동금호베스트빌  A15880905  상여충당부채  201908  0