Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
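The average record size follows from the total in-memory size and the number of observations. A minimal check in Python, assuming that KiB means 1024 bytes (the variable names are illustrative, not taken from the profiling tool):

```python
# Figures copied from the dataset statistics above.
KIB = 1024                      # assumption: 1 KiB = 1024 bytes
total_size_kib = 488.3
n_observations = 10_000

# Average record size = total bytes / number of rows.
avg_record_bytes = total_size_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, this agrees with the 50.0 B reported above.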

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month) has constant value "202112" (Constant)
금액 (amount) is highly skewed (γ1 = 34.02405411) (Skewed)
금액 (amount) has 1314 (13.1%) zeros (Zeros)
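The Constant and Zeros alerts rest on two simple quantities: the number of distinct values in a column and the share of values equal to zero. The sketch below computes both on the first ten rows of the Sample section at the end of this report. The helper names `is_constant` and `zero_share` are hypothetical, and the thresholds the profiling tool uses to raise an alert are not shown here.

```python
def is_constant(values):
    """True when every entry in the column holds the same value."""
    return len(set(values)) == 1

def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

# First ten sample rows from this report (년월일 and 금액 columns).
year_month = ["202112"] * 10
amounts = [53320, 15000, 0, 4325110, 2219520,
           238000, 400000, 168396, 4270450, 11619880]

print(is_constant(year_month))   # the 년월일 column never varies
print(zero_share(amounts))       # one zero among ten rows

# Full-dataset figure quoted in the alert: 1314 zeros in 10000 rows.
reported_zero_pct = 100 * 1314 / 10_000
print(f"{reported_zero_pct:.1f}%")
```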

Reproduction

Analysis started: 2024-05-11 06:53:01.646577
Analysis finished: 2024-05-11 06:53:03.839022
Duration: 2.19 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
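The duration is the difference between the two timestamps. A short check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:53:01.646577")
finished = datetime.fromisoformat("2024-05-11 06:53:03.839022")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```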

Variables

아파트명 (apartment name)
Text

Distinct: 2092
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3131
Min length: 2
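Length statistics are computed over the character length of each value. As a rough illustration, the sketch below applies the same measures to the five sample rows listed for this variable, and checks the reported mean against the total character count (73131 characters over 10000 rows). It illustrates the definitions only and does not reproduce the profiling tool's own code.

```python
from statistics import mean, median

# The five sample values shown for this variable.
samples = ["마곡금호어울림", "종암SK", "돈암삼성", "신수현대", "화곡푸르지오"]
lengths = [len(s) for s in samples]

print(max(lengths), min(lengths), mean(lengths), median(lengths))

# Mean length over the full column = total characters / number of rows.
reported_mean = 73131 / 10_000
print(reported_mean)
```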

Characters and Unicode

Total characters: 73131
Distinct characters: 426
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
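As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point as a two-letter code such as `Lo` (Other Letter) or `Lu` (Uppercase Letter). The sketch below counts categories over the five sample values of this variable. Script and block lookups are not part of the standard library and are left out.

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable.
samples = ["마곡금호어울림", "종암SK", "돈암삼성", "신수현대", "화곡푸르지오"]

def category_counts(values):
    """Count Unicode general categories over every character in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

counts = category_counts(samples)
print(counts)  # Hangul syllables fall under "Lo", Latin capitals under "Lu"
```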

Unique

Unique: 111
Unique (%): 1.1%
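Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A minimal sketch on a hypothetical toy column (the values are borrowed from the sample rows, but the repetition pattern is invented for illustration):

```python
from collections import Counter

# Hypothetical toy column, not the real data distribution.
column = ["A15721001", "A13671205", "A15721001", "A13606107"]

frequencies = Counter(column)
distinct = len(frequencies)                             # different values
unique = sum(1 for c in frequencies.values() if c == 1) # values seen once

print(distinct, unique)
```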

Sample

1st row: 마곡금호어울림
2nd row: 종암SK
3rd row: 돈암삼성
4th row: 신수현대
5th row: 화곡푸르지오

Value | Count | Frequency (%)
아파트 | 155 | 1.4%
래미안 | 53 | 0.5%
e편한세상 | 26 | 0.2%
북한산 | 21 | 0.2%
아이파크 | 21 | 0.2%
신반포 | 17 | 0.2%
신동아아파트 | 17 | 0.2%
브라운스톤 | 15 | 0.1%
길음뉴타운7단지 | 15 | 0.1%
삼성산주공3단지 | 15 | 0.1%
Other values (2170) | 10425 | 96.7%

Most occurring characters

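Value | Count | Frequency (%)
(not rendered) | 2729 | 3.7%
(not rendered) | 2610 | 3.6%
(not rendered) | 2459 | 3.4%
(not rendered) | 1822 | 2.5%
(not rendered) | 1674 | 2.3%
(not rendered) | 1528 | 2.1%
(not rendered) | 1496 | 2.0%
(not rendered) | 1464 | 2.0%
(not rendered) | 1343 | 1.8%
(not rendered) | 1274 | 1.7%
Other values (416) | 54732 | 74.8%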

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67279 | 92.0%
Decimal Number | 3390 | 4.6%
Space Separator | 869 | 1.2%
Uppercase Letter | 797 | 1.1%
Lowercase Letter | 277 | 0.4%
Close Punctuation | 157 | 0.2%
Open Punctuation | 157 | 0.2%
Other Punctuation | 111 | 0.2%
Dash Punctuation | 93 | 0.1%
Letter Number | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not rendered) | 2729 | 4.1%
(not rendered) | 2610 | 3.9%
(not rendered) | 2459 | 3.7%
(not rendered) | 1822 | 2.7%
(not rendered) | 1674 | 2.5%
(not rendered) | 1528 | 2.3%
(not rendered) | 1496 | 2.2%
(not rendered) | 1464 | 2.2%
(not rendered) | 1343 | 2.0%
(not rendered) | 1274 | 1.9%
Other values (372) | 48880 | 72.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 125 | 15.7%
C | 114 | 14.3%
K | 95 | 11.9%
D | 77 | 9.7%
M | 77 | 9.7%
L | 65 | 8.2%
H | 48 | 6.0%
I | 40 | 5.0%
E | 37 | 4.6%
G | 34 | 4.3%
Other values (7) | 85 | 10.7%

Decimal Number
Value | Count | Frequency (%)
1 | 1006 | 29.7%
2 | 984 | 29.0%
3 | 497 | 14.7%
5 | 216 | 6.4%
4 | 209 | 6.2%
6 | 128 | 3.8%
7 | 117 | 3.5%
9 | 95 | 2.8%
8 | 87 | 2.6%
0 | 51 | 1.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 179 | 64.6%
i | 20 | 7.2%
l | 20 | 7.2%
k | 14 | 5.1%
v | 12 | 4.3%
s | 10 | 3.6%
c | 8 | 2.9%
w | 6 | 2.2%
g | 4 | 1.4%
a | 4 | 1.4%

Other Punctuation
Value | Count | Frequency (%)
, | 89 | 80.2%
. | 22 | 19.8%

Space Separator
Value | Count | Frequency (%)
(space) | 869 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 157 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 157 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 93 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not rendered) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67279 | 92.0%
Common | 4777 | 6.5%
Latin | 1075 | 1.5%

Most frequent character per script

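Hangul
Value | Count | Frequency (%)
(not rendered) | 2729 | 4.1%
(not rendered) | 2610 | 3.9%
(not rendered) | 2459 | 3.7%
(not rendered) | 1822 | 2.7%
(not rendered) | 1674 | 2.5%
(not rendered) | 1528 | 2.3%
(not rendered) | 1496 | 2.2%
(not rendered) | 1464 | 2.2%
(not rendered) | 1343 | 2.0%
(not rendered) | 1274 | 1.9%
Other values (372) | 48880 | 72.7%

Latin
Value | Count | Frequency (%)
e | 179 | 16.7%
S | 125 | 11.6%
C | 114 | 10.6%
K | 95 | 8.8%
D | 77 | 7.2%
M | 77 | 7.2%
L | 65 | 6.0%
H | 48 | 4.5%
I | 40 | 3.7%
E | 37 | 3.4%
Other values (18) | 218 | 20.3%

Common
Value | Count | Frequency (%)
1 | 1006 | 21.1%
2 | 984 | 20.6%
(space) | 869 | 18.2%
3 | 497 | 10.4%
5 | 216 | 4.5%
4 | 209 | 4.4%
) | 157 | 3.3%
( | 157 | 3.3%
6 | 128 | 2.7%
7 | 117 | 2.4%
Other values (6) | 437 | 9.1%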

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67279 | 92.0%
ASCII | 5851 | 8.0%
Number Forms | 1 | < 0.1%

Most frequent character per block

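Hangul
Value | Count | Frequency (%)
(not rendered) | 2729 | 4.1%
(not rendered) | 2610 | 3.9%
(not rendered) | 2459 | 3.7%
(not rendered) | 1822 | 2.7%
(not rendered) | 1674 | 2.5%
(not rendered) | 1528 | 2.3%
(not rendered) | 1496 | 2.2%
(not rendered) | 1464 | 2.2%
(not rendered) | 1343 | 2.0%
(not rendered) | 1274 | 1.9%
Other values (372) | 48880 | 72.7%

ASCII
Value | Count | Frequency (%)
1 | 1006 | 17.2%
2 | 984 | 16.8%
(space) | 869 | 14.9%
3 | 497 | 8.5%
5 | 216 | 3.7%
4 | 209 | 3.6%
e | 179 | 3.1%
) | 157 | 2.7%
( | 157 | 2.7%
6 | 128 | 2.2%
Other values (33) | 1449 | 24.8%

Number Forms
Value | Count | Frequency (%)
(not rendered) | 1 | 100.0%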

아파트코드 (apartment code)
Text

Distinct: 2096
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 111
Unique (%): 1.1%

Sample

1st row: A15721001
2nd row: A13671205
3rd row: A13606107
4th row: A12185603
5th row: A15792602

Value | Count | Frequency (%)
a13679403 | 15 | 0.1%
a15101506 | 15 | 0.1%
a13302001 | 14 | 0.1%
a13790714 | 13 | 0.1%
a13983004 | 13 | 0.1%
a10026051 | 12 | 0.1%
a15701007 | 12 | 0.1%
a10026879 | 12 | 0.1%
a13922002 | 12 | 0.1%
a14320302 | 12 | 0.1%
Other values (2086) | 9870 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 18831 | 20.9%
1 | 17596 | 19.6%
A | 10000 | 11.1%
3 | 8903 | 9.9%
2 | 8509 | 9.5%
5 | 5974 | 6.6%
8 | 5225 | 5.8%
7 | 4545 | 5.1%
4 | 4056 | 4.5%
6 | 3490 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18831 | 23.5%
1 | 17596 | 22.0%
3 | 8903 | 11.1%
2 | 8509 | 10.6%
5 | 5974 | 7.5%
8 | 5225 | 6.5%
7 | 4545 | 5.7%
4 | 4056 | 5.1%
6 | 3490 | 4.4%
9 | 2871 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18831 | 23.5%
1 | 17596 | 22.0%
3 | 8903 | 11.1%
2 | 8509 | 10.6%
5 | 5974 | 7.5%
8 | 5225 | 6.5%
7 | 4545 | 5.7%
4 | 4056 | 5.1%
6 | 3490 | 4.4%
9 | 2871 | 3.6%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18831 | 20.9%
1 | 17596 | 19.6%
A | 10000 | 11.1%
3 | 8903 | 9.9%
2 | 8509 | 9.5%
5 | 5974 | 6.6%
8 | 5225 | 5.8%
7 | 4545 | 5.1%
4 | 4056 | 4.5%
6 | 3490 | 3.9%

비용명 (expense name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8769
Min length: 2

Characters and Unicode

Total characters: 48769
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 고용보험료
2nd row: 지급수수료
3rd row: 재활용품수익
4th row: 세대수도료
5th row: 정화조관리비

Value | Count | Frequency (%)
소독비 | 229 | 2.3%
교육비 | 218 | 2.2%
경비비 | 215 | 2.1%
보험료 | 215 | 2.1%
청소비 | 205 | 2.1%
제수당 | 204 | 2.0%
연체료수익 | 204 | 2.0%
승강기유지비 | 203 | 2.0%
수선유지비 | 202 | 2.0%
통신비 | 201 | 2.0%
Other values (76) | 7904 | 79.0%

Most occurring characters

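Value | Count | Frequency (%)
(not rendered) | 5383 | 11.0%
(not rendered) | 3581 | 7.3%
(not rendered) | 2102 | 4.3%
(not rendered) | 2064 | 4.2%
(not rendered) | 1677 | 3.4%
(not rendered) | 1287 | 2.6%
(not rendered) | 988 | 2.0%
(not rendered) | 839 | 1.7%
(not rendered) | 828 | 1.7%
(not rendered) | 777 | 1.6%
Other values (110) | 29243 | 60.0%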

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48769 | 100.0%

Most frequent character per category

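Other Letter
Value | Count | Frequency (%)
(not rendered) | 5383 | 11.0%
(not rendered) | 3581 | 7.3%
(not rendered) | 2102 | 4.3%
(not rendered) | 2064 | 4.2%
(not rendered) | 1677 | 3.4%
(not rendered) | 1287 | 2.6%
(not rendered) | 988 | 2.0%
(not rendered) | 839 | 1.7%
(not rendered) | 828 | 1.7%
(not rendered) | 777 | 1.6%
Other values (110) | 29243 | 60.0%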

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48769 | 100.0%

Most frequent character per script

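Hangul
Value | Count | Frequency (%)
(not rendered) | 5383 | 11.0%
(not rendered) | 3581 | 7.3%
(not rendered) | 2102 | 4.3%
(not rendered) | 2064 | 4.2%
(not rendered) | 1677 | 3.4%
(not rendered) | 1287 | 2.6%
(not rendered) | 988 | 2.0%
(not rendered) | 839 | 1.7%
(not rendered) | 828 | 1.7%
(not rendered) | 777 | 1.6%
Other values (110) | 29243 | 60.0%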

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48769 | 100.0%

Most frequent character per block

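Hangul
Value | Count | Frequency (%)
(not rendered) | 5383 | 11.0%
(not rendered) | 3581 | 7.3%
(not rendered) | 2102 | 4.3%
(not rendered) | 2064 | 4.2%
(not rendered) | 1677 | 3.4%
(not rendered) | 1287 | 2.6%
(not rendered) | 988 | 2.0%
(not rendered) | 839 | 1.7%
(not rendered) | 828 | 1.7%
(not rendered) | 777 | 1.6%
Other values (110) | 29243 | 60.0%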

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202112 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202112
2nd row: 202112
3rd row: 202112
4th row: 202112
5th row: 202112

Common Values

Value | Count | Frequency (%)
202112 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202112 | 10000 | 100.0%

금액
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 6999
Distinct (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3735806
Minimum: -21299880
Maximum: 1.2693154 × 10^9
Zeros: 1314
Zeros (%): 13.1%
Negative: 23
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -21299880
5-th percentile: 0
Q1: 67385
Median: 300000
Q3: 1321370
95-th percentile: 17068653
Maximum: 1.2693154 × 10^9
Range: 1.2906152 × 10^9
Interquartile range (IQR): 1253985
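The range and the interquartile range are derived from the other quantile statistics. A quick check in Python, using the exact maximum (1269315353) that appears among the largest values listed for this variable:

```python
# Quantile statistics for 금액, copied from this report.
minimum = -21_299_880
maximum = 1_269_315_353     # exact value behind 1.2693154 × 10^9
q1 = 67_385
q3 = 1_321_370

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1

print(value_range, iqr)
```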

Descriptive statistics

Standard deviation: 19335901
Coefficient of variation (CV): 5.1758312
Kurtosis: 1932.3358
Mean: 3735806
Median Absolute Deviation (MAD): 300000
Skewness: 34.024054
Sum: 3.735806 × 10^10
Variance: 3.7387708 × 10^14
Monotonicity: Not monotonic
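Several of these figures can be cross-checked against each other: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below does so with the rounded values printed in this report, so small discrepancies are expected. The `skewness` helper uses the population form of the Fisher-Pearson coefficient; this is an assumption for illustration, since the exact estimator behind the reported 34.024054 is not stated here.

```python
mean = 3_735_806
std = 19_335_901
n = 10_000

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance = standard deviation squared
total = mean * n         # sum = mean * number of observations

print(f"{cv:.4f}", f"{variance:.4e}", f"{total:.6e}")

def skewness(values):
    """Population Fisher-Pearson skewness: m3 / m2**1.5 (assumed form)."""
    count = len(values)
    mu = sum(values) / count
    m2 = sum((v - mu) ** 2 for v in values) / count
    m3 = sum((v - mu) ** 3 for v in values) / count
    return m3 / m2 ** 1.5

# A toy right-skewed sample: one large value among zeros gives a positive result.
print(skewness([0, 0, 0, 10]))
```

A symmetric sample such as `[1, 2, 3]` yields a skewness of zero under this definition, while a long right tail, as in the 금액 column, pushes the value well above zero.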
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1314 | 13.1%
200000 | 79 | 0.8%
300000 | 64 | 0.6%
100000 | 58 | 0.6%
150000 | 47 | 0.5%
400000 | 40 | 0.4%
250000 | 31 | 0.3%
110000 | 29 | 0.3%
120000 | 29 | 0.3%
30000 | 27 | 0.3%
Other values (6989) | 8282 | 82.8%

Minimum 10 values
Value | Count | Frequency (%)
-21299880 | 1 | < 0.1%
-5799780 | 1 | < 0.1%
-3127410 | 1 | < 0.1%
-2181757 | 1 | < 0.1%
-2167500 | 1 | < 0.1%
-1518182 | 1 | < 0.1%
-876250 | 1 | < 0.1%
-430046 | 1 | < 0.1%
-311180 | 1 | < 0.1%
-150000 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1269315353 | 1 | < 0.1%
455449290 | 1 | < 0.1%
424933640 | 1 | < 0.1%
388760071 | 1 | < 0.1%
347307324 | 1 | < 0.1%
264944260 | 1 | < 0.1%
245474261 | 1 | < 0.1%
236899110 | 1 | < 0.1%
235353850 | 1 | < 0.1%
230422927 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.334
금액 | 0.334 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
95663 | 마곡금호어울림 | A15721001 | 고용보험료 | 202112 | 53320
51086 | 종암SK | A13671205 | 지급수수료 | 202112 | 15000
48654 | 돈암삼성 | A13606107 | 재활용품수익 | 202112 | 0
20067 | 신수현대 | A12185603 | 세대수도료 | 202112 | 4325110
99820 | 화곡푸르지오 | A15792602 | 정화조관리비 | 202112 | 2219520
24521 | 브라운스톤휘경 | A13009003 | 도서인쇄비 | 202112 | 238000
7494 | e편한세상신촌아파트 | A10026370 | 광고료수익 | 202112 | 400000
82499 | 관악푸르지오아파트 | A15105302 | 충당부채전입이자비용 | 202112 | 168396
76479 | 여의도금호리첸시아 | A15001005 | 보험료 | 202112 | 4270450
90193 | 보라매e편한세상 | A15601003 | 급여 | 202112 | 11619880

Last rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
56729 | 강변아파트 | A13790714 | 세대전기료 | 202112 | 20719270
7486 | e편한세상신촌아파트 | A10026370 | 잡비용 | 202112 | 3764530
20168 | 대원칸타빌 | A12185605 | 충당부채전입이자비용 | 202112 | 2200
81294 | 여의도한양 | A15088918 | 선거관리위원회운영비 | 202112 | 800000
32814 | 벽산1 | A13276417 | 재활용품비용 | 202112 | 400000
27277 | 면목마젤란 | A13120001 | 도서인쇄비 | 202112 | 157000
50774 | 장위참누리 | A13614302 | 경비비 | 202112 | 13971550
53433 | 서초포레스타5단지 | A13716002 | 광고료수익 | 202112 | 100000
30866 | 쌍문동북한산월드메르디앙 | A13203001 | 보험료 | 202112 | 292490
80888 | 양평신동아아파트 | A15086202 | 입주자대표회의운영비 | 202112 | 490000