Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
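
The average record size is consistent with the total size divided by the number of observations. The check below is a minimal sketch that assumes 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Values copied from the dataset statistics above.
total_kib = 488.3
n_rows = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_bytes = total_kib * 1024 / n_rows
print(f"{avg_bytes:.1f} B")
```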

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month) has constant value "201908" (Constant)
금액 (amount) has 1517 (15.2%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:58:58.743319
Analysis finished: 2024-05-11 06:59:00.970334
Duration: 2.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2109
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1683
Min length: 2

Characters and Unicode

Total characters: 71683
Distinct characters: 430
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
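
As a rough illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on the five sample values reported for this variable. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures. The codes `Lo` and `Nd` correspond to the "Other Letter" and "Decimal Number" categories named in this report.

```python
import unicodedata
from collections import Counter

# Sample values copied from this variable's sample rows.
samples = ["녹번대림", "당산2가현대", "압구정한양3단지", "상계주공3단지", "대치포스코더샵"]

def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        text = unicodedata.normalize("NFC", value)
        counts.update(unicodedata.category(ch) for ch in text)
    return counts

counts = category_counts(samples)
print(counts)
```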

Unique

Unique: 93
Unique (%): 0.9%
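
"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts only the values that occur exactly once. The sketch below shows the difference on a small made-up list; the function name and the toy data are illustrative only.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy data, not taken from the dataset.
toy = ["래미안", "래미안", "신반포", "힐스테이트", "힐스테이트", "잠실파크리오"]
result = distinct_and_unique(toy)
print(result)  # (4, 2)
```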

Sample

1st row: 녹번대림
2nd row: 당산2가현대
3rd row: 압구정한양3단지
4th row: 상계주공3단지
5th row: 대치포스코더샵

Value                Count  Frequency (%)
아파트               125    1.2%
래미안               23     0.2%
힐스테이트           18     0.2%
신반포               17     0.2%
코오롱하늘채아파트   16     0.2%
신동아파밀리에       15     0.1%
우리유앤미           15     0.1%
래미안밤섬리베뉴     13     0.1%
e편한세상            13     0.1%
잠실파크리오         13     0.1%
Other values (2163)  10271  97.5%

Most occurring characters

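Value               Count  Frequency (%)
(not extracted)     2298   3.2%
(not extracted)     2223   3.1%
(not extracted)     1991   2.8%
(not extracted)     1866   2.6%
(not extracted)     1790   2.5%
(not extracted)     1645   2.3%
(not extracted)     1560   2.2%
(not extracted)     1493   2.1%
(not extracted)     1440   2.0%
(not extracted)     1387   1.9%
Other values (420)  53990  75.3%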

Most occurring categories

Value              Count  Frequency (%)
Other Letter       65833  91.8%
Decimal Number     3805   5.3%
Uppercase Letter   705    1.0%
Space Separator    588    0.8%
Lowercase Letter   268    0.4%
Open Punctuation   123    0.2%
Close Punctuation  123    0.2%
Dash Punctuation   120    0.2%
Other Punctuation  110    0.2%
Letter Number      8      < 0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(not extracted)     2298   3.5%
(not extracted)     2223   3.4%
(not extracted)     1991   3.0%
(not extracted)     1866   2.8%
(not extracted)     1790   2.7%
(not extracted)     1645   2.5%
(not extracted)     1560   2.4%
(not extracted)     1493   2.3%
(not extracted)     1440   2.2%
(not extracted)     1387   2.1%
Other values (375)  48140  73.1%

Uppercase Letter
Value             Count  Frequency (%)
S                 108    15.3%
C                 103    14.6%
K                 94     13.3%
L                 59     8.4%
D                 50     7.1%
M                 50     7.1%
G                 49     7.0%
I                 33     4.7%
H                 33     4.7%
E                 32     4.5%
Other values (7)  94     13.3%

Lowercase Letter
Value  Count  Frequency (%)
e      167    62.3%
l      28     10.4%
i      24     9.0%
v      18     6.7%
w      7      2.6%
s      6      2.2%
k      6      2.2%
c      4      1.5%
a      3      1.1%
g      3      1.1%

Decimal Number
Value  Count  Frequency (%)
2      1131   29.7%
1      1119   29.4%
3      528    13.9%
4      272    7.1%
5      198    5.2%
6      186    4.9%
9      113    3.0%
7      91     2.4%
8      89     2.3%
0      78     2.0%

Other Punctuation
Value  Count  Frequency (%)
,      90     81.8%
.      20     18.2%

Space Separator
Value    Count  Frequency (%)
(space)  588    100.0%

Open Punctuation
Value  Count  Frequency (%)
(      123    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      123    100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      120    100.0%

Letter Number
Value            Count  Frequency (%)
(not extracted)  8      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  65833  91.8%
Common  4869   6.8%
Latin   981    1.4%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(not extracted)     2298   3.5%
(not extracted)     2223   3.4%
(not extracted)     1991   3.0%
(not extracted)     1866   2.8%
(not extracted)     1790   2.7%
(not extracted)     1645   2.5%
(not extracted)     1560   2.4%
(not extracted)     1493   2.3%
(not extracted)     1440   2.2%
(not extracted)     1387   2.1%
Other values (375)  48140  73.1%

Latin
Value              Count  Frequency (%)
e                  167    17.0%
S                  108    11.0%
C                  103    10.5%
K                  94     9.6%
L                  59     6.0%
D                  50     5.1%
M                  50     5.1%
G                  49     5.0%
I                  33     3.4%
H                  33     3.4%
Other values (19)  235    24.0%

Common
Value             Count  Frequency (%)
2                 1131   23.2%
1                 1119   23.0%
(space)           588    12.1%
3                 528    10.8%
4                 272    5.6%
5                 198    4.1%
6                 186    3.8%
(                 123    2.5%
)                 123    2.5%
-                 120    2.5%
Other values (6)  481    9.9%

Most occurring blocks

Value         Count  Frequency (%)
Hangul        65833  91.8%
ASCII         5842   8.1%
Number Forms  8      < 0.1%

Most frequent character per block

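Hangul
Value               Count  Frequency (%)
(not extracted)     2298   3.5%
(not extracted)     2223   3.4%
(not extracted)     1991   3.0%
(not extracted)     1866   2.8%
(not extracted)     1790   2.7%
(not extracted)     1645   2.5%
(not extracted)     1560   2.4%
(not extracted)     1493   2.3%
(not extracted)     1440   2.2%
(not extracted)     1387   2.1%
Other values (375)  48140  73.1%

ASCII
Value              Count  Frequency (%)
2                  1131   19.4%
1                  1119   19.2%
(space)            588    10.1%
3                  528    9.0%
4                  272    4.7%
5                  198    3.4%
6                  186    3.2%
e                  167    2.9%
(                  123    2.1%
)                  123    2.1%
Other values (34)  1407   24.1%

Number Forms
Value            Count  Frequency (%)
(not extracted)  8      100.0%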

아파트코드 (apartment code)
Text

Distinct: 2115
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
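
The figures for this variable (constant length 9, 12 distinct characters, 10000 uppercase letters and 80000 digits across 10000 values) suggest that each code is one letter, A or B, followed by eight digits. The sketch below encodes that reading as a validation check. The pattern is an inference from these statistics and the sample rows, not a documented format, and `is_valid_code` is a hypothetical helper.

```python
import re

# Assumption inferred from the statistics: one of A/B, then exactly eight digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")

def is_valid_code(code: str) -> bool:
    """Return True if the whole string matches the assumed code format."""
    return CODE_PATTERN.fullmatch(code) is not None

# Sample values copied from this variable's sample rows.
samples = ["A12283603", "A15004202", "A13590602", "A13971502", "A13584101"]
all_valid = all(is_valid_code(c) for c in samples)
print(all_valid)
```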

Unique

Unique: 93
Unique (%): 0.9%

Sample

1st row: A12283603
2nd row: A15004202
3rd row: A13590602
4th row: A13971502
5th row: A13584101

Value                Count  Frequency (%)
a15807705            13     0.1%
a13824006            13     0.1%
a10027920            12     0.1%
a13790004            12     0.1%
a15807101            12     0.1%
a15086007            11     0.1%
a13403101            11     0.1%
a13822902            11     0.1%
a13983712            11     0.1%
a13881701            11     0.1%
Other values (2105)  9883   98.8%

Most occurring characters

ValueCountFrequency (%)
0                 18295  20.3%
1                 17687  19.7%
A                 9993   11.1%
3                 8826   9.8%
2                 8125   9.0%
5                 6244   6.9%
8                 5765   6.4%
7                 4845   5.4%
4                 3730   4.1%
6                 3479   3.9%
Other values (2)  3011   3.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number    80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      18295  22.9%
1      17687  22.1%
3      8826   11.0%
2      8125   10.2%
5      6244   7.8%
8      5765   7.2%
7      4845   6.1%
4      3730   4.7%
6      3479   4.3%
9      3004   3.8%

Uppercase Letter
Value  Count  Frequency (%)
A      9993   99.9%
B      7      0.1%

Most occurring scripts

ValueCountFrequency (%)
Common  80000  88.9%
Latin   10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      18295  22.9%
1      17687  22.1%
3      8826   11.0%
2      8125   10.2%
5      6244   7.8%
8      5765   7.2%
7      4845   6.1%
4      3730   4.7%
6      3479   4.3%
9      3004   3.8%

Latin
Value  Count  Frequency (%)
A      9993   99.9%
B      7      0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 18295  20.3%
1                 17687  19.7%
A                 9993   11.1%
3                 8826   9.8%
2                 8125   9.0%
5                 6244   6.9%
8                 5765   6.4%
7                 4845   5.4%
4                 3730   4.1%
6                 3479   3.9%
Other values (2)  3011   3.3%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9195
Min length: 2

Characters and Unicode

Total characters: 49195
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

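1st row: 재활용품비용 (recyclables expense)
2nd row: 급여 (salary)
3rd row: 알뜰시장수익 (thrift market revenue)
4th row: 재활용품비용 (recyclables expense)
5th row: 연체료수익 (late fee revenue)

Value                                              Count  Frequency (%)
급여 (salary)                                      221    2.2%
사무용품비 (office supplies expense)               220    2.2%
소독비 (disinfection expense)                      216    2.2%
통신비 (communication expense)                     210    2.1%
이자수익 (interest income)                         209    2.1%
산재보험료 (industrial accident insurance premium) 209    2.1%
도서인쇄비 (books and printing expense)            208    2.1%
교육비 (training expense)                          207    2.1%
연체료수익 (late fee revenue)                      205    2.1%
퇴직급여 (retirement benefits)                     205    2.1%
Other values (77)                                  7890   78.9%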

Most occurring characters

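Value               Count  Frequency (%)
(not extracted)     5429   11.0%
(not extracted)     3572   7.3%
(not extracted)     2039   4.1%
(not extracted)     2010   4.1%
(not extracted)     1811   3.7%
(not extracted)     1274   2.6%
(not extracted)     1037   2.1%
(not extracted)     806    1.6%
(not extracted)     763    1.6%
(not extracted)     719    1.5%
Other values (110)  29735  60.4%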

Most occurring categories

Value         Count  Frequency (%)
Other Letter  49195  100.0%

Most frequent character per category

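Other Letter
Value               Count  Frequency (%)
(not extracted)     5429   11.0%
(not extracted)     3572   7.3%
(not extracted)     2039   4.1%
(not extracted)     2010   4.1%
(not extracted)     1811   3.7%
(not extracted)     1274   2.6%
(not extracted)     1037   2.1%
(not extracted)     806    1.6%
(not extracted)     763    1.6%
(not extracted)     719    1.5%
Other values (110)  29735  60.4%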

Most occurring scripts

Value   Count  Frequency (%)
Hangul  49195  100.0%

Most frequent character per script

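Hangul
Value               Count  Frequency (%)
(not extracted)     5429   11.0%
(not extracted)     3572   7.3%
(not extracted)     2039   4.1%
(not extracted)     2010   4.1%
(not extracted)     1811   3.7%
(not extracted)     1274   2.6%
(not extracted)     1037   2.1%
(not extracted)     806    1.6%
(not extracted)     763    1.6%
(not extracted)     719    1.5%
Other values (110)  29735  60.4%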

Most occurring blocks

Value   Count  Frequency (%)
Hangul  49195  100.0%

Most frequent character per block

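Hangul
Value               Count  Frequency (%)
(not extracted)     5429   11.0%
(not extracted)     3572   7.3%
(not extracted)     2039   4.1%
(not extracted)     2010   4.1%
(not extracted)     1811   3.7%
(not extracted)     1274   2.6%
(not extracted)     1037   2.1%
(not extracted)     806    1.6%
(not extracted)     763    1.6%
(not extracted)     719    1.5%
Other values (110)  29735  60.4%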

년월일 (year-month)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
201908: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201908
2nd row: 201908
3rd row: 201908
4th row: 201908
5th row: 201908

Common Values

Value   Count  Frequency (%)
201908  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
201908  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6624
Distinct (%): 66.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2976380.4
Minimum: -32769330
Maximum: 3.430316 × 10^8
Zeros: 1517
Zeros (%): 15.2%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -32769330
5-th percentile: 0
Q1: 53975
median: 296150
Q3: 1319737.5
95-th percentile: 13588412
Maximum: 3.430316 × 10^8
Range: 3.7580093 × 10^8
Interquartile range (IQR): 1265762.5
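
The range and interquartile range follow directly from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. The sketch below reproduces both figures from the reported values. The exact maximum of 343031600 is taken from the list of largest values in this report; the variable names are illustrative.

```python
# Quantile values copied from the report.
q1 = 53975.0
q3 = 1319737.5
minimum = -32769330.0
maximum = 343031600.0  # reported as 3.430316 × 10^8

iqr = q3 - q1
value_range = maximum - minimum

print(iqr)          # 1265762.5
print(value_range)  # 375800930.0
```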

Descriptive statistics

Standard deviation: 12097385
Coefficient of variation (CV): 4.0644619
Kurtosis: 244.24359
Mean: 2976380.4
Median Absolute Deviation (MAD): 296150
Skewness: 12.871718
Sum: 2.9763804 × 10^10
Variance: 1.4634672 × 10^14
Monotonicity: Not monotonic
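
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported figures against those identities. Small discrepancies are expected because the report shows rounded values.

```python
import math

# Values copied from the report (rounded as displayed).
mean = 2976380.4
std = 12097385.0
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance from the standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```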
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    1517   15.2%
200000               91     0.9%
62500                75     0.8%
100000               71     0.7%
300000               60     0.6%
150000               45     0.4%
500000               44     0.4%
400000               44     0.4%
250000               37     0.4%
50000                33     0.3%
Other values (6614)  7983   79.8%

Value      Count  Frequency (%)
-32769330  1      < 0.1%
-8575906   1      < 0.1%
-6310479   1      < 0.1%
-3070930   1      < 0.1%
-1467580   1      < 0.1%
-1021950   1      < 0.1%
-645000    1      < 0.1%
-260000    1      < 0.1%
-183300    1      < 0.1%
-30000     1      < 0.1%

Value      Count  Frequency (%)
343031600  1      < 0.1%
321221537  1      < 0.1%
297604730  1      < 0.1%
282812480  1      < 0.1%
206216679  1      < 0.1%
201351822  1      < 0.1%
197601200  1      < 0.1%
195202772  1      < 0.1%
194225358  1      < 0.1%
179572749  1      < 0.1%

Interactions


Correlations

        비용명  금액
비용명  1.000   0.641
금액    0.641   1.000
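
The report does not state which coefficient produced the 0.641 between 비용명 and 금액. One common measure for a categorical variable paired with a binned numeric variable is Cramér's V, and the sketch below shows the plain, uncorrected form on made-up data. It is an assumption that the profiling tool uses a measure of this kind here, and it may apply a bias correction that this sketch omits. The function name and toy data are hypothetical.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    # Chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical toy data: cost item vs. a binned amount ("high"/"low").
items = ["급여", "급여", "통신비", "통신비"]
perfect = cramers_v(items, ["high", "high", "low", "low"])
independent = cramers_v(items, ["high", "low", "high", "low"])
print(perfect, independent)
```

A value of 1 means the binned amount is fully determined by the cost item, and 0 means no association in the table.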

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  아파트명               아파트코드  비용명                 년월일  금액
17819  녹번대림               A12283603   재활용품비용           201908  303000
71526  당산2가현대            A15004202   급여                   201908  15923960
41949  압구정한양3단지        A13590602   알뜰시장수익           201908  190909
59187  상계주공3단지          A13971502   재활용품비용           201908  1076000
41444  대치포스코더샵         A13584101   연체료수익             201908  56840
66930  번동삼성               A14206001   통신비                 201908  28870
18670  답십리두산             A13003201   제수당                 201908  1829640
74712  문래자이아파트         A15083404   경비비                 201908  58298450
43977  정릉힐스테이트3차      A13610005   검침수익               201908  224455
45324  동일하이빌뉴시티       A13613011   입주자대표회의운영비   201908  1255000

Index  아파트명               아파트코드  비용명                 년월일  금액
49909  방배래미안             A13785301   연차수당               201908  564110
94744  목동현대하이페리온2차  A15805111   소모품비               201908  601500
68837  자양현대홈타운8차      A14319007   급여                   201908  4615000
51599  신반포4차              A13790828   기타운영수익           201908  2576047
30537  성수아이파크           A13312303   주차장운영비           201908  1909700
72776  문래두산위브           A15009505   청소비                 201908  4184270
96355  목동우성2차            A15807703   재활용품수익           201908  852727
12859  상암월드컵8단지        A12127008   세금과공과             201908  67500
60939  상계한신1차            A13981304   이자수익               201908  -9701
33538  성내2차e-편한세상      A13403001   급여                   201908  3550000