Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
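These figures are simple counts over the table. A missing cell is an empty value, a duplicate row is an exact repeat of an earlier row, and the average record size is the total in-memory size divided by the number of observations (488.3 KiB × 1024 / 10000 ≈ 50.0 B). The following minimal stdlib sketch shows those computations. It assumes the table is held as a list of row tuples with `None` marking a missing cell. The helper name `overview` and the toy rows are illustrative and are not part of ydata-profiling.

```python
from __future__ import annotations

from typing import Any, Sequence


def overview(rows: Sequence[tuple[Any, ...]], total_bytes: float) -> dict[str, float]:
    """Recompute the headline dataset statistics for a list of row tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    # Assumed convention: None marks a missing cell (pandas also treats NaN as missing).
    missing = sum(cell is None for row in rows for cell in row)
    # Every row beyond the first occurrence of an identical tuple is a duplicate.
    duplicates = n_obs - len(set(rows))
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "avg_record_bytes": total_bytes / n_obs if n_obs else 0.0,
    }


# Toy rows in the report's column order; the third row repeats the second.
rows = [
    ("신대방한성", "A15601202", "검침비용", "202101", 60000),
    ("신림현대", "A15101508", "연체료수익", "202101", 165800),
    ("신림현대", "A15101508", "연체료수익", "202101", 165800),
]
print(overview(rows, total_bytes=150.0))
```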

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202101" [Constant]
금액 has 249 (2.5%) zeros [Zeros]
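Both alerts come from per-column checks. A column is flagged Constant when it has exactly one distinct value, and Zeros when the share of values equal to 0 exceeds a threshold. The following minimal sketch applies those two checks to a dict of column lists. The function name `column_alerts` and the 1% threshold are hypothetical choices for illustration, because the report does not state the threshold that ydata-profiling itself uses.

```python
from __future__ import annotations

from typing import Any, Sequence


def column_alerts(columns: dict[str, Sequence[Any]], zero_share: float = 0.01) -> list[str]:
    """Return Constant/Zeros alert strings in the report's wording."""
    alerts = []
    for name, values in columns.items():
        n = len(values)
        if n == 0:
            continue
        # Constant: a single distinct value across all rows.
        if len(set(values)) == 1:
            alerts.append(f'{name} has constant value "{values[0]}" [Constant]')
        # Zeros: count numeric zeros only, so a string such as "0" is ignored.
        zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
        # Hypothetical threshold: flag when more than `zero_share` of the values are 0.
        if zeros / n > zero_share:
            alerts.append(f"{name} has {zeros} ({100 * zeros / n:.1f}%) zeros [Zeros]")
    return alerts


print(column_alerts({"년월일": ["202101"] * 4, "금액": [0, 60000, 165800, 92250]}))
```

With 249 zeros among 10,000 values, the Zeros message reads "249 (2.5%) zeros", as in the report.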

Reproduction

Analysis started: 2024-05-11 06:55:16.308692
Analysis finished: 2024-05-11 06:55:18.045577
Duration: 1.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
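The reported duration is the difference between the two timestamps (18.045577 s - 16.308692 s = 1.736885 s, rounded to 1.74). The following short check uses Python's standard `datetime` module. The helper name is illustrative.

```python
from datetime import datetime

# Format of the report's timestamps, e.g. 2024-05-11 06:55:16.308692
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(started: str, finished: str) -> float:
    """Seconds between two timestamps written in the report's format."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


print(round(elapsed_seconds("2024-05-11 06:55:16.308692", "2024-05-11 06:55:18.045577"), 2))  # 1.74
```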

Variables

아파트명 (apartment name)
Text

Distinct: 2150
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.3012
Min length: 2

Characters and Unicode

Total characters: 73012
Distinct characters: 434
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
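The category breakdowns in this report correspond to Unicode general categories. For example, `Lo` is Other Letter, which covers Hangul syllables, and `Nd` is Decimal Number. The following minimal sketch tallies these categories with Python's `unicodedata` module. The mapping lists only the categories that appear in this report. The function name is illustrative and does not reflect ydata-profiling's internal code.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["DMC우방", "신촌숲 아이파크 아파트", "공릉2단지라이프"]))
```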

Unique

Unique: 109
Unique (%): 1.1%
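Distinct and Unique measure different things. Distinct counts how many different values occur, while Unique counts only the values that occur exactly once. For this column, 109 of the 2,150 distinct names each appear in a single row. Both percentages are taken over the 10,000 observations (2150 / 10000 = 21.5%, 109 / 10000 ≈ 1.1%). The following stdlib sketch computes the two counts; the helper and its input are illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): number of different values, and values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Toy column: 검침비용 repeats, so it is distinct but not unique.
print(distinct_and_unique(["검침비용", "국민연금", "검침비용", "보험료"]))  # (3, 2)
```

By the same definition, the 비용명 column further down reports Unique: 0, meaning every expense name occurs at least twice.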

Sample

1st row: 신대방한성
2nd row: 양재리본타워1단지
3rd row: 신림현대
4th row: DMC우방
5th row: 공릉2단지라이프
Value  Count  Frequency (%)
아파트  163  1.5%
래미안  35  0.3%
아이파크  19  0.2%
신반포  18  0.2%
e편한세상  17  0.2%
고덕  17  0.2%
은평뉴타운상림마을6단지  15  0.1%
힐스테이트  15  0.1%
휘경  15  0.1%
목동파크자이아파트  14  0.1%
Other values (2218)  10340  96.9%
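This table counts words rather than whole names. The listed counts plus "Other values" add up to 10,668, which is more than the 10,000 rows. The apartment-code table further down also lists codes in lowercase (for example, a10025729). The following minimal sketch builds that kind of word table, assuming whitespace tokenization and lowercasing. ydata-profiling's actual tokenizer may also drop punctuation or stop words. The helper name and the toy input are illustrative.

```python
from collections import Counter


def word_frequencies(values, top=10):
    """Top words with their count and percentage of all words (lowercased whitespace tokens)."""
    words = Counter()
    for text in values:
        words.update(text.lower().split())
    total = sum(words.values())
    return [(word, count, round(100 * count / total, 1)) for word, count in words.most_common(top)]


# Toy input: two sample names from this report plus one made-up name.
print(word_frequencies(["신촌숲 아이파크 아파트", "DMC우방", "목동 아파트"], top=1))  # [('아파트', 2, 33.3)]
```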

Most occurring characters

Value  Count  Frequency (%)
2532  3.5%
2355  3.2%
2204  3.0%
1814  2.5%
1796  2.5%
1721  2.4%
1565  2.1%
1483  2.0%
1407  1.9%
1406  1.9%
Other values (424)  54729  75.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66872  91.6%
Decimal Number  3745  5.1%
Space Separator  781  1.1%
Uppercase Letter  731  1.0%
Lowercase Letter  337  0.5%
Close Punctuation  143  0.2%
Open Punctuation  143  0.2%
Other Punctuation  143  0.2%
Dash Punctuation  113  0.2%
Math Symbol  4  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
2532  3.8%
2355  3.5%
2204  3.3%
1814  2.7%
1796  2.7%
1721  2.6%
1565  2.3%
1483  2.2%
1407  2.1%
1406  2.1%
Other values (379)  48589  72.7%

Uppercase Letter
Value  Count  Frequency (%)
S  111  15.2%
C  106  14.5%
K  88  12.0%
D  71  9.7%
M  71  9.7%
L  53  7.3%
H  40  5.5%
I  35  4.8%
E  31  4.2%
G  31  4.2%
Other values (7)  94  12.9%

Lowercase Letter
Value  Count  Frequency (%)
e  188  55.8%
i  31  9.2%
l  28  8.3%
s  20  5.9%
v  19  5.6%
k  17  5.0%
w  12  3.6%
h  6  1.8%
c  6  1.8%
a  5  1.5%

Decimal Number
Value  Count  Frequency (%)
1  1155  30.8%
2  1140  30.4%
3  473  12.6%
4  235  6.3%
5  199  5.3%
6  159  4.2%
8  113  3.0%
9  102  2.7%
7  92  2.5%
0  77  2.1%

Other Punctuation
Value  Count  Frequency (%)
,  117  81.8%
.  26  18.2%

Space Separator
Value  Count  Frequency (%)
(space)  781  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  143  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  143  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  113  100.0%

Math Symbol
Value  Count  Frequency (%)
~  4  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66872  91.6%
Common  5072  6.9%
Latin  1068  1.5%
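The script counts are sums of the category counts listed earlier in this section. Latin covers the uppercase and lowercase letters; Common covers the digits, spaces, punctuation and the math symbol; Hangul matches Other Letter. The following consistency check uses only the report's own numbers. Treating every Other Letter character as Hangul is an assumption that holds for this column.

```python
# Category counts copied from the "Most occurring categories" table above.
categories = {
    "Other Letter": 66872,
    "Decimal Number": 3745,
    "Space Separator": 781,
    "Uppercase Letter": 731,
    "Lowercase Letter": 337,
    "Close Punctuation": 143,
    "Open Punctuation": 143,
    "Other Punctuation": 143,
    "Dash Punctuation": 113,
    "Math Symbol": 4,
}

total = sum(categories.values())  # all characters in the column
hangul = categories["Other Letter"]  # assumes every Other Letter character is Hangul here
latin = categories["Uppercase Letter"] + categories["Lowercase Letter"]
common = total - hangul - latin  # digits, space, punctuation and the math symbol

for name, count in [("Hangul", hangul), ("Common", common), ("Latin", latin)]:
    print(f"{name}  {count}  {100 * count / total:.1f}%")
```

Running it prints the three rows of the scripts table: 91.6%, 6.9% and 1.5%.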

Most frequent character per script

Hangul
Value  Count  Frequency (%)
2532  3.8%
2355  3.5%
2204  3.3%
1814  2.7%
1796  2.7%
1721  2.6%
1565  2.3%
1483  2.2%
1407  2.1%
1406  2.1%
Other values (379)  48589  72.7%

Latin
Value  Count  Frequency (%)
e  188  17.6%
S  111  10.4%
C  106  9.9%
K  88  8.2%
D  71  6.6%
M  71  6.6%
L  53  5.0%
H  40  3.7%
I  35  3.3%
E  31  2.9%
Other values (18)  274  25.7%

Common
Value  Count  Frequency (%)
1  1155  22.8%
2  1140  22.5%
(space)  781  15.4%
3  473  9.3%
4  235  4.6%
5  199  3.9%
6  159  3.1%
)  143  2.8%
(  143  2.8%
,  117  2.3%
Other values (7)  527  10.4%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66872  91.6%
ASCII  6140  8.4%
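Unicode blocks are contiguous code-point ranges, so a character's block can be read directly from its code point. The following minimal sketch groups characters by block. The exact grouping ydata-profiling uses for its "Hangul" label is not shown in this report. This sketch lumps three Hangul blocks under one label, which is an assumption; the names are illustrative.

```python
from collections import Counter

# (first code point, last code point, label); three Hangul blocks share one label here.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x1100, 0x11FF, "Hangul"),  # Hangul Jamo
    (0x3130, 0x318F, "Hangul"),  # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF, "Hangul"),  # Hangul Syllables
]


def unicode_block(ch):
    """Return the block label for a single character, or "Other"."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"


def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(unicode_block(ch) for text in values for ch in text)


print(block_counts(["e편한세상", "DMC우방"]))
```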

Most frequent character per block

Hangul
Value  Count  Frequency (%)
2532  3.8%
2355  3.5%
2204  3.3%
1814  2.7%
1796  2.7%
1721  2.6%
1565  2.3%
1483  2.2%
1407  2.1%
1406  2.1%
Other values (379)  48589  72.7%

ASCII
Value  Count  Frequency (%)
1  1155  18.8%
2  1140  18.6%
(space)  781  12.7%
3  473  7.7%
4  235  3.8%
5  199  3.2%
e  188  3.1%
6  159  2.6%
)  143  2.3%
(  143  2.3%
Other values (35)  1524  24.8%

아파트코드 (apartment code)
Text

Distinct: 2157
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 110
Unique (%): 1.1%

Sample

1st row: A15601202
2nd row: A13713001
3rd row: A15101508
4th row: A12294102
5th row: A13980510
Value  Count  Frequency (%)
a10025729  14  0.1%
a13790703  13  0.1%
a13922904  13  0.1%
a15701007  13  0.1%
a15186002  13  0.1%
a13552002  13  0.1%
a41279906  12  0.1%
a41279930  11  0.1%
a41279904  11  0.1%
a13994501  11  0.1%
Other values (2147)  9876  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18562  20.6%
1  17413  19.3%
A  9994  11.1%
3  8604  9.6%
2  8394  9.3%
5  6369  7.1%
8  5575  6.2%
7  4780  5.3%
4  3914  4.3%
6  3390  3.8%
Other values (2)  3005  3.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18562  23.2%
1  17413  21.8%
3  8604  10.8%
2  8394  10.5%
5  6369  8.0%
8  5575  7.0%
7  4780  6.0%
4  3914  4.9%
6  3390  4.2%
9  2999  3.7%

Uppercase Letter
Value  Count  Frequency (%)
A  9994  99.9%
B  6  0.1%
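Every apartment code is exactly 9 characters long. Across the 10,000 rows the codes contain 10,000 letters and 80,000 digits, so each code contributes one letter and eight digits. All five sample codes start with the letter. The following sketch is a validator for that pattern. Treating the letter as a fixed prefix restricted to A or B is an inference from the samples and the counts above, not a documented rule. The names are illustrative.

```python
import re
from collections import Counter

# Inferred pattern: one capital letter (A or B) followed by eight ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def check_codes(codes):
    """Split codes into invalid ones and a prefix-letter tally of the valid ones."""
    invalid = [code for code in codes if not CODE_PATTERN.fullmatch(code)]
    prefixes = Counter(code[0] for code in codes if CODE_PATTERN.fullmatch(code))
    return invalid, prefixes


# The first two codes are report samples; the last two are deliberately malformed toy inputs.
print(check_codes(["A15601202", "A13713001", "B1234567", "a10025729"]))
```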

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18562  23.2%
1  17413  21.8%
3  8604  10.8%
2  8394  10.5%
5  6369  8.0%
8  5575  7.0%
7  4780  6.0%
4  3914  4.9%
6  3390  4.2%
9  2999  3.7%

Latin
Value  Count  Frequency (%)
A  9994  99.9%
B  6  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18562  20.6%
1  17413  19.3%
A  9994  11.1%
3  8604  9.6%
2  8394  9.3%
5  6369  7.1%
8  5575  6.2%
7  4780  5.3%
4  3914  4.3%
6  3390  3.8%
Other values (2)  3005  3.3%

비용명 (expense name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7897
Min length: 2

Characters and Unicode

Total characters: 47897
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 검침비용
2nd row: 세대전기료
3rd row: 연체료수익
4th row: 국민연금
5th row: 산재보험료
Value  Count  Frequency (%)
급여  264  2.6%
도서인쇄비  251  2.5%
통신비  249  2.5%
소독비  246  2.5%
보험료  243  2.4%
승강기유지비  239  2.4%
수선유지비  238  2.4%
경비비  235  2.4%
청소비  235  2.4%
입주자대표회의운영비  231  2.3%
Other values (76)  7569  75.7%

Most occurring characters

Value  Count  Frequency (%)
5425  11.3%
3530  7.4%
2240  4.7%
1742  3.6%
1643  3.4%
1363  2.8%
1111  2.3%
933  1.9%
884  1.8%
844  1.8%
Other values (110)  28182  58.8%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  47897  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
5425  11.3%
3530  7.4%
2240  4.7%
1742  3.6%
1643  3.4%
1363  2.8%
1111  2.3%
933  1.9%
884  1.8%
844  1.8%
Other values (110)  28182  58.8%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  47897  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
5425  11.3%
3530  7.4%
2240  4.7%
1742  3.6%
1643  3.4%
1363  2.8%
1111  2.3%
933  1.9%
884  1.8%
844  1.8%
Other values (110)  28182  58.8%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  47897  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
5425  11.3%
3530  7.4%
2240  4.7%
1742  3.6%
1643  3.4%
1363  2.8%
1111  2.3%
933  1.9%
884  1.8%
844  1.8%
Other values (110)  28182  58.8%

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202101  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202101
2nd row: 202101
3rd row: 202101
4th row: 202101
5th row: 202101

Common Values

Value  Count  Frequency (%)
202101  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
202101  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7658
Distinct (%): 76.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4098677.9
Minimum: -990100
Maximum: 3.3125822 × 10^8
Zeros: 249
Zeros (%): 2.5%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -990100
5th percentile: 3678
Q1: 125125
Median: 390140
Q3: 1624180
95th percentile: 19048561
Maximum: 3.3125822 × 10^8
Range: 3.3224832 × 10^8
Interquartile range (IQR): 1499055
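Range and IQR are differences between the statistics listed above: 331258220 - (-990100) = 332248320 (shown as 3.3224832 × 10^8) and 1624180 - 125125 = 1499055. The following stdlib sketch computes the same summary from raw values. It assumes linear interpolation between order statistics (the `inclusive` method of `statistics.quantiles`), which is taken here to match pandas' default `quantile` behaviour. The helper name and the toy data are illustrative.

```python
import statistics


def quantile_summary(values):
    """Min/max, selected percentiles, range and IQR for a list of numbers."""
    # 19 cut points at 5%, 10%, ..., 95%, using linear interpolation.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    summary = {
        "min": min(values),
        "p5": cuts[0],
        "q1": cuts[4],
        "median": cuts[9],
        "q3": cuts[14],
        "p95": cuts[18],
        "max": max(values),
    }
    summary["range"] = summary["max"] - summary["min"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary


# Cross-check the two derived values against the report's own figures.
assert 331258220 - (-990100) == 332248320
assert 1624180 - 125125 == 1499055
print(quantile_summary(list(range(1, 22))))
```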

Descriptive statistics

Standard deviation: 15536233
Coefficient of variation (CV): 3.7905474
Kurtosis: 155.46354
Mean: 4098677.9
Median Absolute Deviation (MAD): 340140
Skewness: 10.606779
Sum: 4.0986779 × 10^10
Variance: 2.4137454 × 10^14
Monotonicity: Not monotonic
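Several of these statistics are derived from one another. CV is the standard deviation divided by the mean (15536233 / 4098677.9 ≈ 3.79055), the variance is the squared standard deviation, and the sum is the mean times the 10,000 observations. The following stdlib sketch computes them. It assumes the sample standard deviation (n - 1 denominator, as pandas uses by default) and defines MAD as the median of absolute deviations from the median. The helper name and the toy data are illustrative.

```python
import math
import statistics


def describe(values):
    """Mean, sample std, variance, CV, MAD and sum for a list of numbers."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(v - median) for v in values),
        "sum": math.fsum(values),
    }


# Consistency checks that use only figures from the report.
n, mean, std = 10000, 4098677.9, 15536233
assert math.isclose(std / mean, 3.7905474, rel_tol=1e-6)
assert math.isclose(std ** 2, 2.4137454e14, rel_tol=1e-6)
assert math.isclose(mean * n, 4.0986779e10, rel_tol=1e-9)
print(describe([1, 2, 3, 4, 10]))
```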
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  249  2.5%
200000  117  1.2%
100000  70  0.7%
300000  64  0.6%
400000  48  0.5%
500000  46  0.5%
180000  46  0.5%
250000  42  0.4%
150000  41  0.4%
50000  38  0.4%
Other values (7648)  9239  92.4%

Minimum 10 values
Value  Count  Frequency (%)
-990100  1  < 0.1%
-627830  1  < 0.1%
-210000  1  < 0.1%
-51357  1  < 0.1%
-46030  1  < 0.1%
-10000  1  < 0.1%
-2730  1  < 0.1%
-270  1  < 0.1%
0  249  2.5%
1  2  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
331258220  1  < 0.1%
324879200  1  < 0.1%
314204715  1  < 0.1%
302651080  1  < 0.1%
286848090  1  < 0.1%
280610200  1  < 0.1%
270605080  1  < 0.1%
265100535  1  < 0.1%
253122000  1  < 0.1%
246239440  1  < 0.1%

Interactions


Correlations

        비용명  금액
비용명  1.000   0.527
금액    0.527   1.000
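Suppose the report used ydata-profiling's `auto` correlation mode; the downloadable config.json would confirm this. In that case, the 0.527 between 비용명 and 금액 is documented as a Cramér's V association, computed after the numeric 금액 column is discretized into bins. The following minimal, uncorrected Cramér's V works on two already-categorical sequences. It does not reproduce the binning step or any bias correction that ydata-profiling applies. The helper name and the toy data are illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: expense names against hypothetical amount bins.
names = ["급여", "급여", "통신비", "통신비"]
print(cramers_v(names, ["high", "high", "low", "low"]))  # perfectly associated
print(cramers_v(names, ["high", "low", "high", "low"]))  # independent
```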

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
아파트명  아파트코드  비용명  년월일  금액
74948  신대방한성  A15601202  검침비용  202101  60000
43234  양재리본타워1단지  A13713001  세대전기료  202101  14016780
67704  신림현대  A15101508  연체료수익  202101  165800
18436  DMC우방  A12294102  국민연금  202101  92250
53807  공릉2단지라이프  A13980510  산재보험료  202101  119250
12519  북아현두산  A12079501  검침비용  202101  365000
66588  보라매경남아너스빌  A15086006  퇴직급여  202101  2249810
1501  구로항동제일풍경채포레스트  A10024927  세대난방비  202101  23898530
1617  신촌숲 아이파크 아파트  A10024974  사무용품비  202101  737900
48668  신천장미1차2차  A13824005  보험료  202101  1538530

Last rows
아파트명  아파트코드  비용명  년월일  금액
22569  면목늘푸른동아아파트  A13183504  재활용품비용  202101  160000
5374  꿈의숲코오롱하늘채아파트  A10026571  위탁관리수수료  202101  238887
63318  당산금호어울림  A15004403  시설보수비  202101  377560
29257  옥수현대  A13376702  위탁관리수수료  202101  556770
32460  길동삼익파크  A13470101  공동수도료  202101  1540135
17761  갈현한솔아파트  A12281801  정화조관리비  202101  350940
16047  DMC마포청구아파트  A12187904  잡수익  202101  510
87286  은평뉴타운구파발10단지2관리  A41279928  세대전기료  202101  21120790
65442  당산진로  A15080002  복리후생비  202101  577450
80718  가양한보구암  A15780602  고용안정사업비용  202101  340000