Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
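
The average record size appears to be the total in-memory size divided by the number of observations. A minimal Python check, assuming the report uses binary units (1 KiB = 1024 bytes); the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: the report uses binary units, so 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # matches the 50.0 B reported above
```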

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202105" (Constant)
금액 has 1211 (12.1%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:54:35.741239
Analysis finished: 2024-05-11 06:54:37.193689
Duration: 1.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
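
The reported duration is consistent with the difference between the two timestamps. A small sketch using only the Python standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:54:35.741239")
finished = datetime.fromisoformat("2024-05-11 06:54:37.193689")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```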

Variables

아파트명 (apartment name)
Text

Distinct: 2187
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 18
Mean length: 7.2601
Min length: 2

Characters and Unicode

Total characters: 72601
Distinct characters: 432
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
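
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the helper name `category_counts` is hypothetical, and the standard library exposes categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One apartment name taken from the sample rows of this report.
sample = "가락현대6차"
counts = category_counts(sample)

# 'Lo' is "Other Letter" (Hangul syllables), 'Nd' is "Decimal Number".
print(counts)
```

Summing such counters over every value of a column would give totals of the kind listed under the category headings in this section.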

Unique

Unique: 121
Unique (%): 1.2%

Sample

1st row: 가락현대6차
2nd row: 해모로
3rd row: 신정이든채
4th row: 개포우성8차
5th row: 롯데캐슬천지인
Value                Count   Frequency (%)
아파트               209     1.9%
래미안               33      0.3%
e편한세상            26      0.2%
아이파크             21      0.2%
힐스테이트           18      0.2%
북한산               18      0.2%
고덕                 16      0.1%
해모로               16      0.1%
송파                 15      0.1%
보라매               13      0.1%
Other values (2252)  10382   96.4%
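
The percentages in the word-frequency listing above appear to be relative to the total number of word tokens rather than to the 10000 rows. A quick Python check, with counts copied from the listing; the names `word_counts` and `other_values` are illustrative:

```python
# Counts copied from the word-frequency listing above.
word_counts = {
    "아파트": 209, "래미안": 33, "e편한세상": 26, "아이파크": 21,
    "힐스테이트": 18, "북한산": 18, "고덕": 16, "해모로": 16,
    "송파": 15, "보라매": 13,
}
other_values = 10382

# Total number of word tokens across all rows.
total_tokens = sum(word_counts.values()) + other_values

for word, count in word_counts.items():
    print(f"{word}: {100 * count / total_tokens:.1f}%")
print(f"Other values: {100 * other_values / total_tokens:.1f}%")
```

With these figures the total is 10767 tokens, which is why 209 occurrences correspond to 1.9% and not 2.1%.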

Most occurring characters

Value               Count   Frequency (%)
[?]                 2593    3.6%
[?]                 2520    3.5%
[?]                 2302    3.2%
[?]                 1807    2.5%
[?]                 1745    2.4%
[?]                 1705    2.3%
[?]                 1472    2.0%
[?]                 1450    2.0%
[?]                 1409    1.9%
[?]                 1349    1.9%
Other values (422)  54249   74.7%

Most occurring categories

Value              Count   Frequency (%)
Other Letter       66516   91.6%
Decimal Number     3539    4.9%
Space Separator    859     1.2%
Uppercase Letter   795     1.1%
Lowercase Letter   325     0.4%
Close Punctuation  151     0.2%
Open Punctuation   151     0.2%
Dash Punctuation   136     0.2%
Other Punctuation  122     0.2%
Letter Number      7       < 0.1%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
[?]                 2593    3.9%
[?]                 2520    3.8%
[?]                 2302    3.5%
[?]                 1807    2.7%
[?]                 1745    2.6%
[?]                 1705    2.6%
[?]                 1472    2.2%
[?]                 1450    2.2%
[?]                 1409    2.1%
[?]                 1349    2.0%
Other values (377)  48164   72.4%

Uppercase Letter
Value             Count   Frequency (%)
S                 124     15.6%
K                 101     12.7%
C                 100     12.6%
L                 78      9.8%
M                 63      7.9%
D                 63      7.9%
H                 52      6.5%
I                 43      5.4%
E                 41      5.2%
G                 28      3.5%
Other values (7)  102     12.8%

Lowercase Letter
Value  Count   Frequency (%)
e      212     65.2%
i      24      7.4%
l      20      6.2%
v      14      4.3%
k      13      4.0%
s      13      4.0%
w      8       2.5%
c      6       1.8%
g      6       1.8%
a      6       1.8%

Decimal Number
Value  Count   Frequency (%)
2      1084    30.6%
1      1038    29.3%
3      485     13.7%
4      238     6.7%
5      208     5.9%
6      137     3.9%
7      108     3.1%
8      96      2.7%
9      82      2.3%
0      63      1.8%

Other Punctuation
Value  Count   Frequency (%)
,      101     82.8%
.      21      17.2%

Space Separator
Value    Count   Frequency (%)
(space)  859     100.0%

Close Punctuation
Value  Count   Frequency (%)
)      151     100.0%

Open Punctuation
Value  Count   Frequency (%)
(      151     100.0%

Dash Punctuation
Value  Count   Frequency (%)
-      136     100.0%

Letter Number
Value  Count   Frequency (%)
[?]    7       100.0%

Most occurring scripts

Value   Count   Frequency (%)
Hangul  66516   91.6%
Common  4958    6.8%
Latin   1127    1.6%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
[?]                 2593    3.9%
[?]                 2520    3.8%
[?]                 2302    3.5%
[?]                 1807    2.7%
[?]                 1745    2.6%
[?]                 1705    2.6%
[?]                 1472    2.2%
[?]                 1450    2.2%
[?]                 1409    2.1%
[?]                 1349    2.0%
Other values (377)  48164   72.4%

Latin
Value              Count   Frequency (%)
e                  212     18.8%
S                  124     11.0%
K                  101     9.0%
C                  100     8.9%
L                  78      6.9%
M                  63      5.6%
D                  63      5.6%
H                  52      4.6%
I                  43      3.8%
E                  41      3.6%
Other values (19)  250     22.2%

Common
Value             Count   Frequency (%)
2                 1084    21.9%
1                 1038    20.9%
(space)           859     17.3%
3                 485     9.8%
4                 238     4.8%
5                 208     4.2%
)                 151     3.0%
(                 151     3.0%
6                 137     2.8%
-                 136     2.7%
Other values (6)  471     9.5%

Most occurring blocks

Value         Count   Frequency (%)
Hangul        66516   91.6%
ASCII         6078    8.4%
Number Forms  7       < 0.1%

Most frequent character per block

Hangul
Value               Count   Frequency (%)
[?]                 2593    3.9%
[?]                 2520    3.8%
[?]                 2302    3.5%
[?]                 1807    2.7%
[?]                 1745    2.6%
[?]                 1705    2.6%
[?]                 1472    2.2%
[?]                 1450    2.2%
[?]                 1409    2.1%
[?]                 1349    2.0%
Other values (377)  48164   72.4%

ASCII
Value              Count   Frequency (%)
2                  1084    17.8%
1                  1038    17.1%
(space)            859     14.1%
3                  485     8.0%
4                  238     3.9%
e                  212     3.5%
5                  208     3.4%
)                  151     2.5%
(                  151     2.5%
6                  137     2.3%
Other values (34)  1515    24.9%

Number Forms
Value  Count   Frequency (%)
[?]    7       100.0%

아파트코드 (apartment code)
Text

Distinct: 2192
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 122
Unique (%): 1.2%

Sample

1st row: A13880201
2nd row: A14286108
3rd row: A10025649
4th row: A13580002
5th row: A11087601
Value                Count   Frequency (%)
a13821003            13      0.1%
a13887405            13      0.1%
a12220004            12      0.1%
a13672102            12      0.1%
a15603008            12      0.1%
a13480401            11      0.1%
a15807705            11      0.1%
a15683901            11      0.1%
a10024831            11      0.1%
a13981901            11      0.1%
Other values (2182)  9883    98.8%

Most occurring characters

Value  Count   Frequency (%)
0      18618   20.7%
1      17620   19.6%
A      10000   11.1%
3      8918    9.9%
2      8267    9.2%
5      6389    7.1%
8      5795    6.4%
7      4594    5.1%
4      3824    4.2%
6      3262    3.6%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    80000   88.9%
Uppercase Letter  10000   11.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      18618   23.3%
1      17620   22.0%
3      8918    11.1%
2      8267    10.3%
5      6389    8.0%
8      5795    7.2%
7      4594    5.7%
4      3824    4.8%
6      3262    4.1%
9      2713    3.4%

Uppercase Letter
Value  Count   Frequency (%)
A      10000   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  80000   88.9%
Latin   10000   11.1%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      18618   23.3%
1      17620   22.0%
3      8918    11.1%
2      8267    10.3%
5      6389    8.0%
8      5795    7.2%
7      4594    5.7%
4      3824    4.8%
6      3262    4.1%
9      2713    3.4%

Latin
Value  Count   Frequency (%)
A      10000   100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  90000   100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      18618   20.7%
1      17620   19.6%
A      10000   11.1%
3      8918    9.9%
2      8267    9.2%
5      6389    7.1%
8      5795    6.4%
7      4594    5.1%
4      3824    4.2%
6      3262    3.6%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8637
Min length: 2

Characters and Unicode

Total characters: 48637
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 장기수선비
2nd row: 검침비용
3rd row: 고용안정사업수익
4th row: 음식물처리비
5th row: 보험료
Value              Count   Frequency (%)
도서인쇄비         247     2.5%
세대전기료         227     2.3%
청소비             226     2.3%
통신비             223     2.2%
교육비             219     2.2%
건강보험료         214     2.1%
경비비             213     2.1%
소독비             212     2.1%
산재보험료         212     2.1%
급여               209     2.1%
Other values (77)  7798    78.0%

Most occurring characters

Value               Count   Frequency (%)
[?]                 5444    11.2%
[?]                 3598    7.4%
[?]                 2199    4.5%
[?]                 1959    4.0%
[?]                 1695    3.5%
[?]                 1299    2.7%
[?]                 1089    2.2%
[?]                 857     1.8%
[?]                 857     1.8%
[?]                 816     1.7%
Other values (110)  28824   59.3%

Most occurring categories

Value         Count   Frequency (%)
Other Letter  48637   100.0%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
[?]                 5444    11.2%
[?]                 3598    7.4%
[?]                 2199    4.5%
[?]                 1959    4.0%
[?]                 1695    3.5%
[?]                 1299    2.7%
[?]                 1089    2.2%
[?]                 857     1.8%
[?]                 857     1.8%
[?]                 816     1.7%
Other values (110)  28824   59.3%

Most occurring scripts

Value   Count   Frequency (%)
Hangul  48637   100.0%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
[?]                 5444    11.2%
[?]                 3598    7.4%
[?]                 2199    4.5%
[?]                 1959    4.0%
[?]                 1695    3.5%
[?]                 1299    2.7%
[?]                 1089    2.2%
[?]                 857     1.8%
[?]                 857     1.8%
[?]                 816     1.7%
Other values (110)  28824   59.3%

Most occurring blocks

Value   Count   Frequency (%)
Hangul  48637   100.0%

Most frequent character per block

Hangul
Value               Count   Frequency (%)
[?]                 5444    11.2%
[?]                 3598    7.4%
[?]                 2199    4.5%
[?]                 1959    4.0%
[?]                 1695    3.5%
[?]                 1299    2.7%
[?]                 1089    2.2%
[?]                 857     1.8%
[?]                 857     1.8%
[?]                 816     1.7%
Other values (110)  28824   59.3%

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
202105  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202105
2nd row: 202105
3rd row: 202105
4th row: 202105
5th row: 202105

Common Values

Value   Count   Frequency (%)
202105  10000   100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count   Frequency (%)
202105  10000   100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6948
Distinct (%): 69.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2993145.5
Minimum: -3153230
Maximum: 3.0158188 × 10^8
Zeros: 1211
Zeros (%): 12.1%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -3153230
5th percentile: 0
Q1: 74848
Median: 310000
Q3: 1350060
95th percentile: 15157793
Maximum: 3.0158188 × 10^8
Range: 3.047351 × 10^8
Interquartile range (IQR): 1275212
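
The range and interquartile range follow directly from the other quantile statistics. A short Python check, taking the exact maximum (301581875) from the list of largest values later in this section; the variable names are illustrative:

```python
# Quantile statistics copied from the report.
minimum = -3153230
q1 = 74848
q3 = 1350060
maximum = 301581875  # exact value from the list of largest values

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # full spread of the data

print(iqr)
print(f"{value_range:.6e}")
```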

Descriptive statistics

Standard deviation: 10419118
Coefficient of variation (CV): 3.4809928
Kurtosis: 200.39278
Mean: 2993145.5
Median Absolute Deviation (MAD): 308060
Skewness: 11.253243
Sum: 2.9931455 × 10^10
Variance: 1.0855802 × 10^14
Monotonicity: Not monotonic
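
Several of these statistics can be cross-checked against one another. The sketch below uses the rounded figures printed in the report, so agreement is only approximate, and it assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations):

```python
# Rounded figures copied from the report.
n_observations = 10_000
mean = 2993145.5
std = 10419118

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance from the standard deviation
total = mean * n_observations  # sum from the mean

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.7e}")
```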
[Figure: Histogram with fixed size bins (bins=50)]
Value                Count   Frequency (%)
0                    1211    12.1%
200000               91      0.9%
300000               66      0.7%
30000                51      0.5%
100000               51      0.5%
150000               47      0.5%
250000               37      0.4%
400000               33      0.3%
120000               32      0.3%
60000                31      0.3%
Other values (6938)  8350    83.5%

Minimum 10 values
Value     Count   Frequency (%)
-3153230  1       < 0.1%
-2000000  1       < 0.1%
-784790   1       < 0.1%
-516720   1       < 0.1%
-69955    1       < 0.1%
-29100    1       < 0.1%
-1992     1       < 0.1%
0         1211    12.1%
7         1       < 0.1%
27        1       < 0.1%

Maximum 10 values
Value      Count   Frequency (%)
301581875  1       < 0.1%
252802524  1       < 0.1%
213585030  1       < 0.1%
203885721  1       < 0.1%
193126060  1       < 0.1%
183339805  1       < 0.1%
180888320  1       < 0.1%
179301660  1       < 0.1%
174419696  1       < 0.1%
145108830  1       < 0.1%

Interactions


Correlations

        비용명  금액
비용명  1.000   0.317
금액    0.317   1.000

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index)  아파트명                아파트코드  비용명            년월일  금액
57519    가락현대6차             A13880201   장기수선비        202105  3200000
70734    해모로                  A14286108   검침비용          202105  184900
4226     신정이든채              A10025649   고용안정사업수익  202105  240000
43565    개포우성8차             A13580002   음식물처리비      202105  427410
12478    롯데캐슬천지인          A11087601   보험료            202105  323816
22278    제기현대                A13006002   피복비            202105  21800
41817    일원목련타운            A13523005   식대              202105  600000
97582    목동8단지               A15807604   위탁관리수수료    202105  552740
30583    도봉파크빌2단지         A13275303   승강기유지비      202105  227000
20721    북한산래미안            A12275201   회계감사비        202105  0

Last rows

(index)  아파트명                아파트코드  비용명            년월일  금액
20585    은평뉴타운상림마을2단지 A12220003   수도광열비        202105  0
79302    봉천은천1단지           A15106101   주차장수익        202105  2630000
54815    거여1단지               A13811206   장기수선비        202105  14694650
31803    쌍문현대3차             A13287801   광고료수익        202105  30000
60712    중계주공4단지           A13922406   검침수익          202105  296700
17171    신공덕삼성임대          A12179002   잡수익            202105  0
52470    방배래미안              A13785301   복리후생비        202105  546550
37239    고덕아이파크아파트      A13408003   도서인쇄비        202105  249000
3589     신당KCC스위첸아파트     A10025372   고용안정사업비용  202105  120000
79672    라이프아파트            A15177101   복리후생비        202105  248490