Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
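As a quick consistency check on the two memory figures, the average record size should be roughly the total size divided by the number of observations. The sketch below redoes that arithmetic with the values reported above; the helper name `average_record_bytes` is illustrative and not part of the profiling tool.

```python
KIB = 1024  # bytes per kibibyte

def average_record_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, given a total size in KiB and a row count."""
    return total_kib * KIB / n_rows

# Values reported in the dataset statistics.
avg = average_record_bytes(488.3, 10_000)
print(f"{avg:.1f} B")  # 50.0 B
```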

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "201910" (Constant)
금액 (amount) has 1575 (15.8%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:58:28.421962
Analysis finished: 2024-05-11 06:58:30.771835
Duration: 2.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2088
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1248
Min length: 2
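The four length figures are plain summaries of per-value character counts. A minimal sketch of how such figures can be computed, using only the five sample values listed for this variable in the Sample subsection (so the numbers it prints differ from the full-column figures); `length_stats` is an illustrative helper name.

```python
from statistics import mean, median

def length_stats(values):
    """Return max, median, mean and min of the string lengths."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The five sample rows reported for this variable.
sample = ["신대림자이", "중계라이프신동아청구아파트", "일원목련타운",
          "광장힐스테이트", "밤섬경남아너스빌"]
print(length_stats(sample))
```

Note that `len` counts Unicode code points, so the result depends on how the Hangul text is normalised.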

Characters and Unicode

Total characters: 71248
Distinct characters: 426
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
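As an illustration of what analysing text by character property can look like, the sketch below counts characters per Unicode general category with Python's standard `unicodedata` module. The mapping `CATEGORY_NAMES` and the helper `category_counts` are illustrative names, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unlisted categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Two apartment names taken from the report's sample rows.
print(category_counts(["SK북한산시티아파트", "염창우성2차"]))
```

The standard library exposes general categories but not scripts or blocks, so reproducing the script and block tables would need another data source.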

Unique

Unique: 82
Unique (%): 0.8%
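The report keeps two counts apart: "Distinct" is the number of different values, while "Unique" is the number of values that occur exactly once. A small sketch of the difference, on made-up values used purely for illustration; `distinct_and_unique` is a hypothetical helper name.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Made-up values: "a" and "b" repeat, "c" occurs once.
values = ["a", "a", "b", "b", "b", "c"]
distinct, unique = distinct_and_unique(values)
print(f"Distinct: {distinct} ({distinct / len(values):.1%})")
print(f"Unique: {unique} ({unique / len(values):.1%})")
```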

Sample

1st row: 신대림자이
2nd row: 중계라이프신동아청구아파트
3rd row: 일원목련타운
4th row: 광장힐스테이트
5th row: 밤섬경남아너스빌
Value  Count  Frequency (%)
아파트  150  1.4%
래미안  29  0.3%
아이파크  19  0.2%
브라운스톤  14  0.1%
잠실우성1,2,3차  13  0.1%
신동아파밀리에  13  0.1%
고덕현대  12  0.1%
왕십리  12  0.1%
종암2차sk뷰  12  0.1%
오금우방  11  0.1%
Other values (2148)  10295  97.3%
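The counts in this table add up to 10580, more than the 10000 rows, and the values appear lowercased, which suggests the table counts whitespace-separated tokens rather than whole values. A minimal sketch of that kind of count, assuming simple whitespace splitting (the profiling tool's own tokenisation may differ); the input names are hypothetical.

```python
from collections import Counter

def token_frequencies(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    total = sum(counts.values())
    # (token, count, share of all tokens), most frequent first.
    return [(tok, n, n / total) for tok, n in counts.most_common()]

# Hypothetical apartment names, used only for illustration.
names = ["한강 아파트", "목동 아파트", "종암2차SK뷰"]
for token, count, share in token_frequencies(names):
    print(f"{token}  {count}  {share:.1%}")
```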

Most occurring characters

Value  Count  Frequency (%)
[?]  2303  3.2%
[?]  2172  3.0%
[?]  1980  2.8%
[?]  1864  2.6%
[?]  1757  2.5%
[?]  1685  2.4%
[?]  1555  2.2%
[?]  1503  2.1%
[?]  1388  1.9%
[?]  1378  1.9%
Other values (416)  53663  75.3%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  65345  91.7%
Decimal Number  3689  5.2%
Uppercase Letter  748  1.0%
Space Separator  628  0.9%
Lowercase Letter  311  0.4%
Other Punctuation  133  0.2%
Close Punctuation  132  0.2%
Open Punctuation  132  0.2%
Dash Punctuation  125  0.2%
Letter Number  5  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[?]  2303  3.5%
[?]  2172  3.3%
[?]  1980  3.0%
[?]  1864  2.9%
[?]  1757  2.7%
[?]  1685  2.6%
[?]  1555  2.4%
[?]  1503  2.3%
[?]  1388  2.1%
[?]  1378  2.1%
Other values (371)  47760  73.1%

Uppercase Letter
Value  Count  Frequency (%)
S  132  17.6%
C  99  13.2%
K  91  12.2%
L  57  7.6%
H  56  7.5%
D  55  7.4%
M  55  7.4%
E  42  5.6%
G  33  4.4%
I  33  4.4%
Other values (7)  95  12.7%

Lowercase Letter
Value  Count  Frequency (%)
e  172  55.3%
l  34  10.9%
i  29  9.3%
v  23  7.4%
k  16  5.1%
s  12  3.9%
w  10  3.2%
c  10  3.2%
g  2  0.6%
a  2  0.6%

Decimal Number
Value  Count  Frequency (%)
2  1163  31.5%
1  1098  29.8%
3  491  13.3%
4  230  6.2%
5  203  5.5%
6  168  4.6%
9  94  2.5%
7  81  2.2%
8  81  2.2%
0  80  2.2%

Other Punctuation
Value  Count  Frequency (%)
,  114  85.7%
.  19  14.3%

Space Separator
Value  Count  Frequency (%)
(space)  628  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  132  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  132  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  125  100.0%

Letter Number
Value  Count  Frequency (%)
[?]  5  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  65345  91.7%
Common  4839  6.8%
Latin  1064  1.5%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
[?]  2303  3.5%
[?]  2172  3.3%
[?]  1980  3.0%
[?]  1864  2.9%
[?]  1757  2.7%
[?]  1685  2.6%
[?]  1555  2.4%
[?]  1503  2.3%
[?]  1388  2.1%
[?]  1378  2.1%
Other values (371)  47760  73.1%

Latin
Value  Count  Frequency (%)
e  172  16.2%
S  132  12.4%
C  99  9.3%
K  91  8.6%
L  57  5.4%
H  56  5.3%
D  55  5.2%
M  55  5.2%
E  42  3.9%
l  34  3.2%
Other values (19)  271  25.5%

Common
Value  Count  Frequency (%)
2  1163  24.0%
1  1098  22.7%
(space)  628  13.0%
3  491  10.1%
4  230  4.8%
5  203  4.2%
6  168  3.5%
)  132  2.7%
(  132  2.7%
-  125  2.6%
Other values (6)  469  9.7%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  65345  91.7%
ASCII  5898  8.3%
Number Forms  5  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
[?]  2303  3.5%
[?]  2172  3.3%
[?]  1980  3.0%
[?]  1864  2.9%
[?]  1757  2.7%
[?]  1685  2.6%
[?]  1555  2.4%
[?]  1503  2.3%
[?]  1388  2.1%
[?]  1378  2.1%
Other values (371)  47760  73.1%

ASCII
Value  Count  Frequency (%)
2  1163  19.7%
1  1098  18.6%
(space)  628  10.6%
3  491  8.3%
4  230  3.9%
5  203  3.4%
e  172  2.9%
6  168  2.8%
)  132  2.2%
S  132  2.2%
Other values (34)  1481  25.1%

Number Forms
Value  Count  Frequency (%)
[?]  5  100.0%

아파트코드
Text

Distinct: 2093
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 83
Unique (%): 0.8%

Sample

1st row: A15007201
2nd row: A13986111
3rd row: A13523005
4th row: A14375301
5th row: A12171201
Value  Count  Frequency (%)
a13822702  13  0.1%
a13671213  12  0.1%
a12170601  11  0.1%
a10026947  11  0.1%
a13813002  11  0.1%
a15704023  11  0.1%
a13204508  11  0.1%
a13789203  11  0.1%
a12170401  11  0.1%
a12220002  11  0.1%
Other values (2083)  9887  98.9%

Most occurring characters

Value  Count  Frequency (%)
0  18567  20.6%
1  17664  19.6%
A  10000  11.1%
3  8848  9.8%
2  8098  9.0%
5  6265  7.0%
8  5744  6.4%
7  4759  5.3%
4  3814  4.2%
6  3433  3.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18567  23.2%
1  17664  22.1%
3  8848  11.1%
2  8098  10.1%
5  6265  7.8%
8  5744  7.2%
7  4759  5.9%
4  3814  4.8%
6  3433  4.3%
9  2808  3.5%

Uppercase Letter
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18567  23.2%
1  17664  22.1%
3  8848  11.1%
2  8098  10.1%
5  6265  7.8%
8  5744  7.2%
7  4759  5.9%
4  3814  4.8%
6  3433  4.3%
9  2808  3.5%

Latin
Value  Count  Frequency (%)
A  10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18567  20.6%
1  17664  19.6%
A  10000  11.1%
3  8848  9.8%
2  8098  9.0%
5  6265  7.0%
8  5744  6.4%
7  4759  5.3%
4  3814  4.2%
6  3433  3.8%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.908
Min length: 2

Characters and Unicode

Total characters: 49080
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 승강기수익 (elevator income)
2nd row: 건강보험료 (health insurance premium)
3rd row: 사무용품비 (office supplies expense)
4th row: 입주자대표회의운영비 (residents' representative council operating expense)
5th row: 업무추진비 (business promotion expense)
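Value  Count  Frequency (%)
청소비 (cleaning expense)  235  2.4%
소독비 (disinfection expense)  224  2.2%
도서인쇄비 (books and printing expense)  224  2.2%
이자수익 (interest income)  222  2.2%
세대전기료 (household electricity charge)  208  2.1%
경비비 (security expense)  207  2.1%
사무용품비 (office supplies expense)  206  2.1%
연체료수익 (late-fee income)  205  2.1%
통신비 (communication expense)  203  2.0%
수선유지비 (repair and maintenance expense)  200  2.0%
Other values (77)  7866  78.7%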

Most occurring characters

Value  Count  Frequency (%)
[?]  5425  11.1%
[?]  3540  7.2%
[?]  2083  4.2%
[?]  2013  4.1%
[?]  1757  3.6%
[?]  1274  2.6%
[?]  1005  2.0%
[?]  850  1.7%
[?]  780  1.6%
[?]  739  1.5%
Other values (110)  29614  60.3%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  49080  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[?]  5425  11.1%
[?]  3540  7.2%
[?]  2083  4.2%
[?]  2013  4.1%
[?]  1757  3.6%
[?]  1274  2.6%
[?]  1005  2.0%
[?]  850  1.7%
[?]  780  1.6%
[?]  739  1.5%
Other values (110)  29614  60.3%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  49080  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
[?]  5425  11.1%
[?]  3540  7.2%
[?]  2083  4.2%
[?]  2013  4.1%
[?]  1757  3.6%
[?]  1274  2.6%
[?]  1005  2.0%
[?]  850  1.7%
[?]  780  1.6%
[?]  739  1.5%
Other values (110)  29614  60.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  49080  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
[?]  5425  11.1%
[?]  3540  7.2%
[?]  2083  4.2%
[?]  2013  4.1%
[?]  1757  3.6%
[?]  1274  2.6%
[?]  1005  2.0%
[?]  850  1.7%
[?]  780  1.6%
[?]  739  1.5%
Other values (110)  29614  60.3%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201910
2nd row: 201910
3rd row: 201910
4th row: 201910
5th row: 201910

Common Values

Value  Count  Frequency (%)
201910  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
201910  10000  100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6682
Distinct (%): 66.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2814806.5
Minimum: -3421000
Maximum: 4.4191358 × 10^8
Zeros: 1575
Zeros (%): 15.8%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -3421000
5-th percentile: 0
Q1: 50000
Median: 297445
Q3: 1324390
95-th percentile: 14044091
Maximum: 4.4191358 × 10^8
Range: 4.4533458 × 10^8
Interquartile range (IQR): 1274390
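Two of the quantile figures are derived from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. The short check below redoes that arithmetic, taking the exact maximum (441913580) from the list of largest values later in this section.

```python
# Values copied from the report's quantile statistics and extreme-value list.
minimum, q1, q3, maximum = -3_421_000, 50_000, 1_324_390, 441_913_580

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # full spread of the variable

print(iqr)          # 1274390
print(value_range)  # 445334580
```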

Descriptive statistics

Standard deviation: 10365038
Coefficient of variation (CV): 3.6823271
Kurtosis: 468.29485
Mean: 2814806.5
Median Absolute Deviation (MAD): 297445
Skewness: 15.923266
Sum: 2.8148065 × 10^10
Variance: 1.0743402 × 10^14
Monotonicity: Not monotonic
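Several of these figures follow from the mean, the standard deviation and the row count: CV is the standard deviation divided by the mean, the sum is the mean times the number of observations, and the variance is the squared standard deviation. A small check using the reported (already rounded) values, so the last digits can differ slightly from the report:

```python
# Figures reported above; mean and standard deviation are already rounded.
n = 10_000
mean = 2_814_806.5
std = 10_365_038

cv = std / mean      # coefficient of variation
total = mean * n     # sum of all values
variance = std ** 2  # square of the standard deviation

print(f"CV = {cv:.4f}")
print(f"Sum = {total:.7e}")
print(f"Variance = {variance:.4e}")
```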
Histogram with fixed size bins (bins=50)
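Fixed-size binning splits the interval from the minimum to the maximum into equally wide bins and counts the values in each. If the plot spans the full reported range, each of the 50 bins would be roughly 8.9 million wide. The plain-Python sketch below illustrates the idea on ten amounts from the report's sample rows; `fixed_width_histogram` is a hypothetical helper, not the plotting code behind the report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the last edge, so clamp it into the final bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Ten amounts from the report's sample rows, binned coarsely for illustration.
amounts = [540000, 1214160, 117000, 691000, 150000,
           60020, 0, 4610000, 653000, 5025000]
edges, counts = fixed_width_histogram(amounts, bins=5)
print(counts)
```

With data this skewed, most values land in the first bin, which matches the high skewness and kurtosis reported for this variable.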
Value  Count  Frequency (%)
0  1575  15.8%
200000  77  0.8%
300000  70  0.7%
100000  62  0.6%
150000  41  0.4%
250000  34  0.3%
600000  28  0.3%
500000  27  0.3%
110000  25  0.2%
30000  25  0.2%
Other values (6672)  8036  80.4%

Minimum 10 values
Value  Count  Frequency (%)
-3421000  1  < 0.1%
-1134415  1  < 0.1%
-1100000  1  < 0.1%
-515480  1  < 0.1%
-436660  1  < 0.1%
-350000  1  < 0.1%
-227418  1  < 0.1%
-19990  1  < 0.1%
-15960  1  < 0.1%
-11550  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
441913580  1  < 0.1%
297087528  1  < 0.1%
231378144  1  < 0.1%
211313736  1  < 0.1%
193275270  1  < 0.1%
172888865  1  < 0.1%
169481573  1  < 0.1%
139002470  1  < 0.1%
113953970  1  < 0.1%
108890097  1  < 0.1%

Interactions


Correlations

(variable)  비용명  금액
비용명  1.000  0.258
금액  0.258  1.000
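The text does not say which measure produced the 0.258 between 비용명 and 금액. One measure often used when a pair includes a categorical variable is Cramér's V, with any numeric column binned into labels first. The sketch below shows the plain, uncorrected form under that assumption; `cramers_v` is an illustrative helper and may not match what the profiling tool computes.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical labels: an expense item and a pre-binned amount ("low"/"high").
items = ["급여", "급여", "교통비", "교통비"]
amount_bins = ["high", "high", "low", "low"]
print(cramers_v(items, amount_bins))
```

A value of 0 means no association between the two sets of labels and 1 means that one fully determines the other, so 0.258 would indicate a fairly weak association on that scale.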

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index)  아파트명 (apartment name)  아파트코드 (apartment code)  비용명 (expense item)  년월일 (date)  금액 (amount)
73038  신대림자이  A15007201  승강기수익  201910  540000
64160  중계라이프신동아청구아파트  A13986111  건강보험료  201910  1214160
39804  일원목련타운  A13523005  사무용품비  201910  117000
70264  광장힐스테이트  A14375301  입주자대표회의운영비  201910  691000
13796  밤섬경남아너스빌  A12171201  업무추진비  201910  150000
68376  SK북한산시티아파트  A14272304  기타부대비  201910  60020
33138  성수청구강변  A13383003  교통비  201910  0
44493  정릉산장  A13610004  급여  201910  4610000
19846  용두신동아  A13007004  도서인쇄비  201910  653000
22286  청량리신현대  A13087201  임대료수익  201910  5025000

(index)  아파트명  아파트코드  비용명  년월일  금액
1033  목동파크자이아파트  A10025729  이자수익  201910  0
87450  사당롯데캐슬  A15609301  광고료수익  201910  300000
94566  염창우성2차  A15786405  잡수익  201910  50910
74273  당산반도유보라  A15072201  감가상각비  201910  64500
44700  정릉힐스테이트  A13610103  소모품비  201910  85700
33855  강변건영  A13392307  교육비  201910  0
5304  위례2차아이파크아파트  A10027553  복리후생비  201910  619500
36516  성내현대  A13484003  기타부대비  201910  161300
59475  하계미성  A13923104  잡비용  201910  4425380
92654  가양강나루현대  A15780401  공동전기료  201910  6353170