Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
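The overview counts can be recomputed from the raw rows. The sketch below is in plain Python. It assumes each row is a tuple and that a missing cell is `None`, which is only an assumption because the source data has no missing cells. The function name `overview` is hypothetical. The average record size is simply the total size divided by the row count: 488.3 KiB × 1024 / 10000 ≈ 50.0 B.

```python
def overview(rows: list[tuple]) -> dict:
    """Dataset-level counts for a list of equal-length rows (tuples).

    Assumption: a missing cell is represented as None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Rows identical to an earlier row count as duplicates.
    duplicates = n_obs - len(set(rows))
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```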

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "201907" [Constant]
금액 (amount) has 1446 (14.5%) zeros [Zeros]
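Both alerts follow from simple column checks. The sketch below assumes a column is a plain Python list. It does not reproduce ydata-profiling's internal rules. The function names and the 10% zeros threshold are hypothetical and chosen only for illustration.

```python
def is_constant(values: list) -> bool:
    """True when every non-missing value in the column is identical."""
    distinct = {v for v in values if v is not None}
    return len(distinct) == 1


def zeros_alert(values: list, threshold_pct: float = 10.0) -> tuple:
    """Return (count, percentage, flag) for exact zeros in a numeric column.

    threshold_pct is an illustrative cut-off, not ydata-profiling's setting.
    """
    zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * zeros / len(values) if values else 0.0
    return zeros, pct, pct > threshold_pct
```

For 금액, this gives 1446 / 10000 = 14.46%, which the report rounds to 14.5%.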

Reproduction

Analysis started: 2024-05-11 06:59:12.012148
Analysis finished: 2024-05-11 06:59:14.176808
Duration: 2.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2110
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1243
Min length: 2
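Length statistics are computed on the character length of each value. The mean length is the total character count divided by the number of values, so 71243 / 10000 = 7.1243. The sketch below runs on the five sample names listed under "Sample". It is meant only to illustrate the calculation. ydata-profiling's own median-length calculation may differ, so this sketch is not expected to reproduce the figures above.

```python
import statistics


def length_summary(values: list[str]) -> dict:
    """Max, median, mean and min character length of a text column."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


sample = ["여의도한양", "천왕이펜하우스2단지", "잠원한신로얄", "보라매아카데미타워아파트", "e편한세상신촌아파트"]
summary = length_summary(sample)
```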

Characters and Unicode

Total characters: 71243
Distinct characters: 430
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
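Python's standard `unicodedata` module returns the General Category of each code point. That property is the grouping used by the "Most occurring categories" tables. The sketch below maps only the two-letter codes that occur in this report to their long names. Script and block lookups are not available in the standard library, so they are not covered here.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters across all values by Unicode General Category name."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts
```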

Unique

Unique: 94
Unique (%): 0.9%
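In this report, "Distinct" counts the number of different values. "Unique" counts the values that occur exactly once. This is why Unique (94) is much smaller than Distinct (2110), and why 비용명 can have 87 distinct values but 0 unique ones. A minimal sketch of both counts with `collections.Counter` (the function name is hypothetical):

```python
from collections import Counter


def distinct_and_unique(values: list) -> tuple[int, int]:
    """Return (distinct, unique): the number of different values, and the
    number of values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique
```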

Sample

1st row: 여의도한양
2nd row: 천왕이펜하우스2단지
3rd row: 잠원한신로얄
4th row: 보라매아카데미타워아파트
5th row: e편한세상신촌아파트
Value | Count | Frequency (%)
아파트 (apartment) | 122 | 1.2%
래미안 | 29 | 0.3%
힐스테이트 | 18 | 0.2%
왕십리 | 15 | 0.1%
브라운스톤 | 15 | 0.1%
신반포 | 14 | 0.1%
목동2단지 | 13 | 0.1%
e편한세상 | 12 | 0.1%
청계벽산 | 12 | 0.1%
공덕자이 | 12 | 0.1%
Other values (2165) | 10280 | 97.5%
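The value table above counts words, not whole names. Its counts add up to 10542 (262 + 10280), which is more than the 10000 rows. The apartment-code table also shows lowercased values such as `a15875102`. The sketch below assumes whitespace tokenization followed by lowercasing. ydata-profiling's exact tokenizer may differ. The function name and the second and third input strings in the usage example are illustrative.

```python
from collections import Counter


def word_frequencies(values: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Top words with count and share of all word tokens (in percent).

    Assumption: words are whitespace-separated and lowercased.
    """
    counts = Counter(word.lower() for value in values for word in value.split())
    total = sum(counts.values())
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common(top)]


top_words = word_frequencies(["도봉숲 아뜨리움", "동부 아파트", "A 아파트"], top=1)
```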

Most occurring characters

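Value | Count | Frequency (%)
(not rendered) | 2225 | 3.1%
(not rendered) | 2216 | 3.1%
(not rendered) | 1970 | 2.8%
(not rendered) | 1885 | 2.6%
(not rendered) | 1766 | 2.5%
(not rendered) | 1644 | 2.3%
(not rendered) | 1517 | 2.1%
(not rendered) | 1516 | 2.1%
(not rendered) | 1393 | 2.0%
(not rendered) | 1359 | 1.9%
Other values (420) | 53752 | 75.4%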

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65416 | 91.8%
Decimal Number | 3686 | 5.2%
Uppercase Letter | 731 | 1.0%
Space Separator | 584 | 0.8%
Lowercase Letter | 344 | 0.5%
Dash Punctuation | 131 | 0.2%
Open Punctuation | 121 | 0.2%
Close Punctuation | 121 | 0.2%
Other Punctuation | 105 | 0.1%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not rendered) | 2225 | 3.4%
(not rendered) | 2216 | 3.4%
(not rendered) | 1970 | 3.0%
(not rendered) | 1885 | 2.9%
(not rendered) | 1766 | 2.7%
(not rendered) | 1644 | 2.5%
(not rendered) | 1517 | 2.3%
(not rendered) | 1516 | 2.3%
(not rendered) | 1393 | 2.1%
(not rendered) | 1359 | 2.1%
Other values (375) | 47925 | 73.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 127 | 17.4%
C | 91 | 12.4%
K | 84 | 11.5%
L | 61 | 8.3%
H | 57 | 7.8%
M | 49 | 6.7%
D | 49 | 6.7%
G | 40 | 5.5%
E | 39 | 5.3%
I | 36 | 4.9%
Other values (7) | 98 | 13.4%

Lowercase Letter
Value | Count | Frequency (%)
e | 189 | 54.9%
l | 40 | 11.6%
i | 35 | 10.2%
v | 25 | 7.3%
c | 14 | 4.1%
k | 12 | 3.5%
w | 12 | 3.5%
s | 8 | 2.3%
h | 3 | 0.9%
a | 3 | 0.9%

Decimal Number
Value | Count | Frequency (%)
1 | 1113 | 30.2%
2 | 1082 | 29.4%
3 | 494 | 13.4%
4 | 250 | 6.8%
5 | 203 | 5.5%
6 | 161 | 4.4%
7 | 113 | 3.1%
9 | 108 | 2.9%
8 | 84 | 2.3%
0 | 78 | 2.1%

Other Punctuation
Value | Count | Frequency (%)
, | 91 | 86.7%
. | 14 | 13.3%

Space Separator
Value | Count | Frequency (%)
(space) | 584 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 131 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 121 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 121 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not rendered) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65416 | 91.8%
Common | 4748 | 6.7%
Latin | 1079 | 1.5%

Most frequent character per script

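Hangul
Value | Count | Frequency (%)
(not rendered) | 2225 | 3.4%
(not rendered) | 2216 | 3.4%
(not rendered) | 1970 | 3.0%
(not rendered) | 1885 | 2.9%
(not rendered) | 1766 | 2.7%
(not rendered) | 1644 | 2.5%
(not rendered) | 1517 | 2.3%
(not rendered) | 1516 | 2.3%
(not rendered) | 1393 | 2.1%
(not rendered) | 1359 | 2.1%
Other values (375) | 47925 | 73.3%

Latin
Value | Count | Frequency (%)
e | 189 | 17.5%
S | 127 | 11.8%
C | 91 | 8.4%
K | 84 | 7.8%
L | 61 | 5.7%
H | 57 | 5.3%
M | 49 | 4.5%
D | 49 | 4.5%
G | 40 | 3.7%
l | 40 | 3.7%
Other values (19) | 292 | 27.1%

Common
Value | Count | Frequency (%)
1 | 1113 | 23.4%
2 | 1082 | 22.8%
(space) | 584 | 12.3%
3 | 494 | 10.4%
4 | 250 | 5.3%
5 | 203 | 4.3%
6 | 161 | 3.4%
- | 131 | 2.8%
( | 121 | 2.5%
) | 121 | 2.5%
Other values (6) | 488 | 10.3%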

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65416 | 91.8%
ASCII | 5823 | 8.2%
Number Forms | 4 | < 0.1%

Most frequent character per block

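Hangul
Value | Count | Frequency (%)
(not rendered) | 2225 | 3.4%
(not rendered) | 2216 | 3.4%
(not rendered) | 1970 | 3.0%
(not rendered) | 1885 | 2.9%
(not rendered) | 1766 | 2.7%
(not rendered) | 1644 | 2.5%
(not rendered) | 1517 | 2.3%
(not rendered) | 1516 | 2.3%
(not rendered) | 1393 | 2.1%
(not rendered) | 1359 | 2.1%
Other values (375) | 47925 | 73.3%

ASCII
Value | Count | Frequency (%)
1 | 1113 | 19.1%
2 | 1082 | 18.6%
(space) | 584 | 10.0%
3 | 494 | 8.5%
4 | 250 | 4.3%
5 | 203 | 3.5%
e | 189 | 3.2%
6 | 161 | 2.8%
- | 131 | 2.2%
S | 127 | 2.2%
Other values (34) | 1489 | 25.6%

Number Forms
Value | Count | Frequency (%)
(not rendered) | 4 | 100.0%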
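A Unicode block is a contiguous code-point range, so block membership reduces to a range lookup. The sketch below is simplified. It covers only the ranges relevant to this report and assumes the report's "Hangul" block groups Hangul Syllables, Hangul Jamo and Hangul Compatibility Jamo. The function name `block_of` is hypothetical. As a consistency check, the ASCII count 5823 equals the sum of the Decimal Number, Uppercase, Lowercase, Space, Dash, Open, Close and Other Punctuation counts: 3686 + 731 + 344 + 584 + 131 + 121 + 121 + 105.

```python
from collections import Counter

# (start, end, name): inclusive code point ranges, limited to this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x1100, 0x11FF, "Hangul"),        # Hangul Jamo
    (0x2150, 0x218F, "Number Forms"),  # includes Roman numerals
    (0x3130, 0x318F, "Hangul"),        # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables
]


def block_of(ch: str) -> str:
    """Return a coarse block name for a single character, or 'Other'."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


block_counts = Counter(block_of(ch) for ch in "e편한세상")
```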
아파트코드 (apartment code)
Text

Distinct: 2116
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 95
Unique (%): 0.9%

Sample

1st row: A15088918
2nd row: A15213003
3rd row: A13790706
4th row: A15601002
5th row: A10027213
Value | Count | Frequency (%)
a15875102 | 13 | 0.1%
a10027906 | 12 | 0.1%
a13302001 | 12 | 0.1%
a13881204 | 12 | 0.1%
a10026207 | 12 | 0.1%
a13470101 | 12 | 0.1%
a14319008 | 11 | 0.1%
a13606004 | 11 | 0.1%
a14003105 | 11 | 0.1%
a10025850 | 11 | 0.1%
Other values (2106) | 9883 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18483 | 20.5%
1 | 17644 | 19.6%
A | 9996 | 11.1%
3 | 8842 | 9.8%
2 | 7930 | 8.8%
5 | 6223 | 6.9%
8 | 5861 | 6.5%
7 | 4785 | 5.3%
4 | 3861 | 4.3%
6 | 3453 | 3.8%
Other values (2) | 2922 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18483 | 23.1%
1 | 17644 | 22.1%
3 | 8842 | 11.1%
2 | 7930 | 9.9%
5 | 6223 | 7.8%
8 | 5861 | 7.3%
7 | 4785 | 6.0%
4 | 3861 | 4.8%
6 | 3453 | 4.3%
9 | 2918 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18483 | 23.1%
1 | 17644 | 22.1%
3 | 8842 | 11.1%
2 | 7930 | 9.9%
5 | 6223 | 7.8%
8 | 5861 | 7.3%
7 | 4785 | 6.0%
4 | 3861 | 4.8%
6 | 3453 | 4.3%
9 | 2918 | 3.6%

Latin
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18483 | 20.5%
1 | 17644 | 19.6%
A | 9996 | 11.1%
3 | 8842 | 9.8%
2 | 7930 | 8.8%
5 | 6223 | 6.9%
8 | 5861 | 6.5%
7 | 4785 | 5.3%
4 | 3861 | 4.3%
6 | 3453 | 3.8%
Other values (2) | 2922 | 3.2%
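Every apartment code is exactly 9 characters long (min = max = 9). Across the 10000 values there are 80000 digits and 10000 uppercase letters (9996 `A` and 4 `B`). Together with the sample rows, this is consistent with a one-letter prefix followed by eight digits. The sketch below checks that pattern. The regular expression is inferred from these statistics, not taken from a published specification. The lowercase entries in the value table (for example `a15875102`) come from the table's tokenization, not from the data, so such values would be flagged here.

```python
import re

# Inferred pattern: one uppercase letter followed by eight digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def invalid_codes(codes: list[str]) -> list[str]:
    """Return the codes that do not match the inferred pattern."""
    return [c for c in codes if not CODE_PATTERN.fullmatch(c)]
```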
비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9016
Min length: 2

Characters and Unicode

Total characters: 49016
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

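1st row: 재활용품비용 (recyclables cost)
2nd row: 부과차익 (billing surplus)
3rd row: 기타부대비 (other incidental costs)
4th row: 보험료 (insurance premium)
5th row: 공동전기료 (common-area electricity charge)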
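Value | Count | Frequency (%)
수선유지비 (repair and maintenance cost) | 232 | 2.3%
연체료수익 (late-payment fee income) | 229 | 2.3%
이자수익 (interest income) | 228 | 2.3%
청소비 (cleaning cost) | 222 | 2.2%
경비비 (security cost) | 218 | 2.2%
세대전기료 (household electricity charge) | 217 | 2.2%
승강기유지비 (elevator maintenance cost) | 213 | 2.1%
교육비 (training cost) | 213 | 2.1%
제수당 (allowances) | 208 | 2.1%
소독비 (disinfection cost) | 207 | 2.1%
Other values (77) | 7813 | 78.1%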

Most occurring characters

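Value | Count | Frequency (%)
(not rendered) | 5488 | 11.2%
(not rendered) | 3590 | 7.3%
(not rendered) | 2081 | 4.2%
(not rendered) | 2016 | 4.1%
(not rendered) | 1723 | 3.5%
(not rendered) | 1325 | 2.7%
(not rendered) | 1063 | 2.2%
(not rendered) | 852 | 1.7%
(not rendered) | 818 | 1.7%
(not rendered) | 765 | 1.6%
Other values (110) | 29295 | 59.8%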

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49016 | 100.0%

Most frequent character per category

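Other Letter
Value | Count | Frequency (%)
(not rendered) | 5488 | 11.2%
(not rendered) | 3590 | 7.3%
(not rendered) | 2081 | 4.2%
(not rendered) | 2016 | 4.1%
(not rendered) | 1723 | 3.5%
(not rendered) | 1325 | 2.7%
(not rendered) | 1063 | 2.2%
(not rendered) | 852 | 1.7%
(not rendered) | 818 | 1.7%
(not rendered) | 765 | 1.6%
Other values (110) | 29295 | 59.8%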

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49016 | 100.0%

Most frequent character per script

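Hangul
Value | Count | Frequency (%)
(not rendered) | 5488 | 11.2%
(not rendered) | 3590 | 7.3%
(not rendered) | 2081 | 4.2%
(not rendered) | 2016 | 4.1%
(not rendered) | 1723 | 3.5%
(not rendered) | 1325 | 2.7%
(not rendered) | 1063 | 2.2%
(not rendered) | 852 | 1.7%
(not rendered) | 818 | 1.7%
(not rendered) | 765 | 1.6%
Other values (110) | 29295 | 59.8%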

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49016 | 100.0%

Most frequent character per block

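Hangul
Value | Count | Frequency (%)
(not rendered) | 5488 | 11.2%
(not rendered) | 3590 | 7.3%
(not rendered) | 2081 | 4.2%
(not rendered) | 2016 | 4.1%
(not rendered) | 1723 | 3.5%
(not rendered) | 1325 | 2.7%
(not rendered) | 1063 | 2.2%
(not rendered) | 852 | 1.7%
(not rendered) | 818 | 1.7%
(not rendered) | 765 | 1.6%
Other values (110) | 29295 | 59.8%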

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201907
2nd row: 201907
3rd row: 201907
4th row: 201907
5th row: 201907

Common Values

Value | Count | Frequency (%)
201907 | 10000 | 100.0%
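The column is named 년월일 (year-month-day), but its values have six digits and read as year and month (YYYYMM). The sketch below parses them with `datetime.strptime`. It assumes the day should default to the first of the month, which is what `strptime` does when no day is given. The function name is hypothetical.

```python
from datetime import date, datetime


def parse_year_month(value: str) -> date:
    """Parse a YYYYMM string such as '201907' into the first day of that month."""
    return datetime.strptime(value, "%Y%m").date()
```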

Length

Figure: Histogram of lengths of the category


금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6823
Distinct (%): 68.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2885624.1
Minimum: -19145800
Maximum: 2.7944257 × 10⁸
Zeros: 1446
Zeros (%): 14.5%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -19145800
5th percentile: 0
Q1: 57860
Median: 323015
Q3: 1426900
95th percentile: 14073855
Maximum: 2.7944257 × 10⁸
Range: 2.9858837 × 10⁸
Interquartile range (IQR): 1369040
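The IQR is Q3 - Q1 (1426900 - 57860 = 1369040). The range is the maximum minus the minimum (279442572 - (-19145800) = 298588372). The sketch below computes these quantile statistics with the standard library's `statistics.quantiles` and `method="inclusive"`, which interpolates linearly between order statistics. That method is assumed, not confirmed, to correspond to the pandas default used for the report. Applied to the 20 amounts in the sample table at the end of the report, it illustrates the calculation only and will not reproduce the full-dataset figures. The function name is hypothetical.

```python
import statistics


def quantile_summary(values: list[float]) -> dict:
    """Report-style quantile statistics using linear interpolation."""
    pct = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "p5": pct[4],    # 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": pct[94],  # 95th percentile
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


amounts = [1345000, 1370, 389040, 569991, 12222932, 5463550, 70700, 200000, 432730, 500000,
           4340000, 21528160, 560000, 278420, 130, 308420, 1433520, 360000, 8636702, 479770]
summary = quantile_summary(amounts)
```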

Descriptive statistics

Standard deviation: 10290204
Coefficient of variation (CV): 3.5660236
Kurtosis: 240.3056
Mean: 2885624.1
Median Absolute Deviation (MAD): 323015
Skewness: 12.362524
Sum: 2.8856241 × 10¹⁰
Variance: 1.058883 × 10¹⁴
Monotonicity: Not monotonic
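Several of these statistics are simple functions of others, which allows a quick consistency check on the printed values. The CV is the standard deviation divided by the mean. The variance is the squared standard deviation. The sum is the mean times the row count. The sketch below recomputes them from the figures printed in this report. The small differences come from rounding in those figures.

```python
# Figures copied from the 금액 (amount) statistics in this report.
n = 10_000
mean = 2_885_624.1
std = 10_290_204
minimum, maximum = -19_145_800, 279_442_572
zeros, negatives = 1_446, 12

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
total = mean * n                # sum of all amounts
value_range = maximum - minimum
zeros_pct = 100 * zeros / n
negatives_pct = 100 * negatives / n

print(f"CV={cv:.7f} variance={variance:.6e} sum={total:.7e} range={value_range}")
print(f"zeros={zeros_pct:.1f}% negatives={negatives_pct:.1f}%")
```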
Figure: Histogram with fixed size bins (bins=50)
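A fixed-size-bin histogram divides [minimum, maximum] into equal-width intervals. With bins=50 and a range of 298588372, each bin is about 5.97 million wide. The bin that starts near -1.23 million therefore contains everything from the 5th percentile (0) up to Q3 (1426900), which is at least 70% of the values. This explains why the histogram is dominated by a single bar. The sketch below implements the binning in plain Python. The function name is hypothetical, and this is not the report's own plotting code.

```python
def fixed_bins(values: list[float], bins: int = 50) -> list[int]:
    """Count values in `bins` equal-width intervals spanning [min, max].

    The last interval is closed on the right so the maximum is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Index of the interval containing v; the maximum goes in the last bin.
        i = int((v - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1
    return counts
```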
Common Values
Value | Count | Frequency (%)
0 | 1446 | 14.5%
200000 | 83 | 0.8%
300000 | 57 | 0.6%
100000 | 52 | 0.5%
150000 | 43 | 0.4%
500000 | 37 | 0.4%
50000 | 34 | 0.3%
400000 | 32 | 0.3%
450000 | 29 | 0.3%
180000 | 26 | 0.3%
Other values (6813) | 8161 | 81.6%
Minimum 10 values
Value | Count | Frequency (%)
-19145800 | 1 | < 0.1%
-6281240 | 1 | < 0.1%
-5369950 | 1 | < 0.1%
-1126430 | 1 | < 0.1%
-695680 | 1 | < 0.1%
-280100 | 1 | < 0.1%
-150000 | 1 | < 0.1%
-73690 | 1 | < 0.1%
-69170 | 1 | < 0.1%
-66000 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
279442572 | 1 | < 0.1%
256949930 | 1 | < 0.1%
256549930 | 1 | < 0.1%
255035760 | 1 | < 0.1%
246583458 | 1 | < 0.1%
205817991 | 1 | < 0.1%
159976420 | 1 | < 0.1%
143732894 | 1 | < 0.1%
117617066 | 1 | < 0.1%
117052312 | 1 | < 0.1%
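The last two tables list the most negative and the largest amounts. Both lists can be obtained without a full sort by using `heapq.nsmallest` and `heapq.nlargest`. The sketch below applies them to the 20 amounts from the sample table at the end of the report. The function name is hypothetical.

```python
import heapq


def extremes(values: list[float], k: int = 10) -> tuple[list[float], list[float]]:
    """Return the k smallest (ascending) and k largest (descending) values."""
    return heapq.nsmallest(k, values), heapq.nlargest(k, values)


amounts = [1345000, 1370, 389040, 569991, 12222932, 5463550, 70700, 200000, 432730, 500000,
           4340000, 21528160, 560000, 278420, 130, 308420, 1433520, 360000, 8636702, 479770]
smallest, largest = extremes(amounts, k=3)
```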

Interactions


Correlations

Variable | 비용명 (cost item name) | 금액 (amount)
비용명 | 1.000 | 0.550
금액 | 0.550 | 1.000

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

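(index) | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (date) | 금액 (amount)
74710 | 여의도한양 | A15088918 | 재활용품비용 (recyclables cost) | 201907 | 1345000
79213 | 천왕이펜하우스2단지 | A15213003 | 부과차익 (billing surplus) | 201907 | 1370
50424 | 잠원한신로얄 | A13790706 | 기타부대비 (other incidental costs) | 201907 | 389040
83410 | 보라매아카데미타워아파트 | A15601002 | 보험료 (insurance premium) | 201907 | 569991
3898 | e편한세상신촌아파트 | A10027213 | 공동전기료 (common-area electricity charge) | 201907 | 12222932
54726 | 가락래미안파크팰리스 | A13881005 | 잡비용 (miscellaneous expenses) | 201907 | 5463550
6286 | 정릉꿈에그린아파트 | A10028000 | 고용보험료 (employment insurance premium) | 201907 | 70700
3562 | 도봉숲 아뜨리움 | A10027136 | 지급수수료 (fees paid) | 201907 | 200000
7651 | 황학아크로타워 | A10086801 | 보험료 (insurance premium) | 201907 | 432730
65353 | 산천리버힐제2 | A14076401 | 고용안정사업수익 (employment stabilization program income) | 201907 | 500000

(index) | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (date) | 금액 (amount)
13047 | 마포자이2차 | A12172401 | 장기수선비 (long-term repair cost) | 201907 | 4340000
94851 | 목동우성2차 | A15807703 | 급여 (salaries) | 201907 | 21528160
93926 | 신트리1단지 | A15807002 | 선거관리위원회운영비 (election committee operating cost) | 201907 | 560000
77281 | 봉천동아제2 | A15192202 | 보험료 (insurance premium) | 201907 | 278420
97434 | 은평뉴타운박석고개1단지 | A41279910 | 부과차익 (billing surplus) | 201907 | 130
23412 | 신내5단지대림두산 | A13184610 | 고용보험료 (employment insurance premium) | 201907 | 308420
42163 | 개포주공5단지 | A13599402 | 주차장수익 (parking income) | 201907 | 1433520
65051 | 동부센트레빌아스테리움 | A14070901 | 음식물처리비 (food waste disposal cost) | 201907 | 360000
6242 | LH강남힐스테이트 | A10027985 | 잡수익 (miscellaneous income) | 201907 | 8636702
88384 | 염창1차보람더하임아파트 | A15704007 | 복리후생비 (employee welfare cost) | 201907 | 479770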
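To recompute any of these statistics, the downloaded file can be read with Python's `csv` module. The sketch below makes several assumptions: the header row is taken to use the five column names from the sample table, the file name and the `cp949` encoding in the commented usage are hypothetical, and an in-memory string stands in for the file. The 금액 (amount) column is converted to `int`.

```python
import csv
import io
from typing import TextIO

COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]


def load_rows(f: TextIO) -> list[dict]:
    """Read rows with the five report columns; 금액 (amount) becomes an int."""
    rows = []
    for record in csv.DictReader(f):
        row = {name: record[name] for name in COLUMNS}
        row["금액"] = int(row["금액"])
        rows.append(row)
    return rows


# Hypothetical usage; the file name and encoding are assumptions.
# with open("apartment_costs.csv", encoding="cp949", newline="") as f:
#     rows = load_rows(f)

demo = io.StringIO("아파트명,아파트코드,비용명,년월일,금액\n"
                   "여의도한양,A15088918,재활용품비용,201907,1345000\n")
rows = load_rows(demo)
```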