Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
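
The average record size can be derived from the two memory figures above. A minimal check, assuming 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Values copied from the dataset statistics above.
total_kib = 488.3   # total size in memory, in KiB
n_rows = 10_000     # number of observations

# Assumption: 1 KiB = 1024 bytes.
bytes_per_row = total_kib * 1024 / n_rows
print(f"{bytes_per_row:.1f} B per record")
```

Rounded to one decimal place, the result agrees with the reported average record size.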

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "201911" (Constant)
금액 has 1759 (17.6%) zeros (Zeros)
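
The two alerts can be illustrated with a small sketch. The rules and the 10% zero threshold below are assumptions for illustration, not the profiler's actual implementation; the input values are the first ten rows of the Sample section of this report.

```python
def is_constant(values):
    """True when every entry in the column is the same value."""
    return len(set(values)) == 1

def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)

def alerts(name, values, zero_threshold=0.10):
    """Return alert labels for one column (hypothetical rules and threshold)."""
    found = []
    if is_constant(values):
        found.append(f"{name}: Constant")
    if zero_share(values) > zero_threshold:
        found.append(f"{name}: Zeros")
    return found

# First ten sample rows of the report.
dates = ["201911"] * 10
amounts = [170000, 83280, 0, 44430080, 180000, 45400, 0, 1796400, 255, 4102120]

print(alerts("년월일", dates))
print(alerts("금액", amounts))

# Share of zeros over the full dataset, from the counts in the alert.
full_zero_pct = 1759 / 10_000 * 100
print(f"{full_zero_pct:.1f}%")
```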

Reproduction

Analysis started: 2024-05-11 06:58:16.070244
Analysis finished: 2024-05-11 06:58:17.906801
Duration: 1.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
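
The reported duration is the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:58:16.070244")
finished = datetime.fromisoformat("2024-05-11 06:58:17.906801")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```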

Variables

아파트명 (apartment name)
Text

Distinct: 2081
Distinct (%): 20.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1121
Min length: 2

Characters and Unicode

Total characters: 71121
Distinct characters: 426
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
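
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name and the partial mapping from short category codes to long names are illustrative. Script and block properties are not exposed by `unicodedata`, so they are left out here.

```python
import unicodedata
from collections import Counter

# Partial mapping from short general-category codes to long names.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
}

def category_counts(text):
    """Count the characters of text per Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

# Fourth sample row of this variable.
sample = "힐스테이트 백련산4차 아파트"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.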

Unique

Unique: 104
Unique (%): 1.0%

Sample

1st row: 은평뉴타운상림마을13단지
2nd row: 등촌라인
3rd row: 가락프라자
4th row: 힐스테이트 백련산4차 아파트
5th row: 신도림롯데

Value | Count | Frequency (%)
아파트 | 132 | 1.2%
래미안 | 39 | 0.4%
신반포 | 17 | 0.2%
힐스테이트 | 17 | 0.2%
신동아파밀리에 | 15 | 0.1%
고덕 | 14 | 0.1%
신내 | 14 | 0.1%
여의도진주 | 14 | 0.1%
가양대림경동 | 13 | 0.1%
잠원신화 | 13 | 0.1%
Other values (2137) | 10304 | 97.3%

Most occurring characters

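Value | Count | Frequency (%)
(not preserved) | 2301 | 3.2%
(not preserved) | 2191 | 3.1%
(not preserved) | 2019 | 2.8%
(not preserved) | 1843 | 2.6%
(not preserved) | 1770 | 2.5%
(not preserved) | 1670 | 2.3%
(not preserved) | 1544 | 2.2%
(not preserved) | 1543 | 2.2%
(not preserved) | 1384 | 1.9%
(not preserved) | 1304 | 1.8%
Other values (416) | 53552 | 75.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65266 | 91.8%
Decimal Number | 3538 | 5.0%
Uppercase Letter | 732 | 1.0%
Space Separator | 648 | 0.9%
Lowercase Letter | 375 | 0.5%
Open Punctuation | 152 | 0.2%
Close Punctuation | 152 | 0.2%
Dash Punctuation | 131 | 0.2%
Other Punctuation | 115 | 0.2%
Letter Number | 6 | < 0.1%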

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2301 | 3.5%
(not preserved) | 2191 | 3.4%
(not preserved) | 2019 | 3.1%
(not preserved) | 1843 | 2.8%
(not preserved) | 1770 | 2.7%
(not preserved) | 1670 | 2.6%
(not preserved) | 1544 | 2.4%
(not preserved) | 1543 | 2.4%
(not preserved) | 1384 | 2.1%
(not preserved) | 1304 | 2.0%
Other values (370) | 47697 | 73.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 143 | 19.5%
K | 104 | 14.2%
C | 87 | 11.9%
L | 49 | 6.7%
H | 47 | 6.4%
G | 41 | 5.6%
D | 40 | 5.5%
M | 40 | 5.5%
I | 39 | 5.3%
E | 38 | 5.2%
Other values (7) | 104 | 14.2%

Lowercase Letter
Value | Count | Frequency (%)
e | 184 | 49.1%
l | 50 | 13.3%
i | 42 | 11.2%
v | 35 | 9.3%
s | 16 | 4.3%
k | 14 | 3.7%
w | 14 | 3.7%
c | 8 | 2.1%
h | 6 | 1.6%
a | 3 | 0.8%

Decimal Number
Value | Count | Frequency (%)
1 | 1091 | 30.8%
2 | 1008 | 28.5%
3 | 494 | 14.0%
4 | 262 | 7.4%
5 | 160 | 4.5%
6 | 157 | 4.4%
8 | 105 | 3.0%
7 | 99 | 2.8%
9 | 84 | 2.4%
0 | 78 | 2.2%

Other Punctuation
Value | Count | Frequency (%)
, | 105 | 91.3%
. | 10 | 8.7%

Space Separator
Value | Count | Frequency (%)
(space) | 648 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 152 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 152 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 131 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 6 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 6 | 100.0%

Most occurring scripts

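Value | Count | Frequency (%)
Hangul | 65266 | 91.8%
Common | 4742 | 6.7%
Latin | 1113 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 2301 | 3.5%
(not preserved) | 2191 | 3.4%
(not preserved) | 2019 | 3.1%
(not preserved) | 1843 | 2.8%
(not preserved) | 1770 | 2.7%
(not preserved) | 1670 | 2.6%
(not preserved) | 1544 | 2.4%
(not preserved) | 1543 | 2.4%
(not preserved) | 1384 | 2.1%
(not preserved) | 1304 | 2.0%
Other values (370) | 47697 | 73.1%

Latin
Value | Count | Frequency (%)
e | 184 | 16.5%
S | 143 | 12.8%
K | 104 | 9.3%
C | 87 | 7.8%
l | 50 | 4.5%
L | 49 | 4.4%
H | 47 | 4.2%
i | 42 | 3.8%
G | 41 | 3.7%
D | 40 | 3.6%
Other values (19) | 326 | 29.3%

Common
Value | Count | Frequency (%)
1 | 1091 | 23.0%
2 | 1008 | 21.3%
(space) | 648 | 13.7%
3 | 494 | 10.4%
4 | 262 | 5.5%
5 | 160 | 3.4%
6 | 157 | 3.3%
( | 152 | 3.2%
) | 152 | 3.2%
- | 131 | 2.8%
Other values (7) | 487 | 10.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65266 | 91.8%
ASCII | 5849 | 8.2%
Number Forms | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 2301 | 3.5%
(not preserved) | 2191 | 3.4%
(not preserved) | 2019 | 3.1%
(not preserved) | 1843 | 2.8%
(not preserved) | 1770 | 2.7%
(not preserved) | 1670 | 2.6%
(not preserved) | 1544 | 2.4%
(not preserved) | 1543 | 2.4%
(not preserved) | 1384 | 2.1%
(not preserved) | 1304 | 2.0%
Other values (370) | 47697 | 73.1%

ASCII
Value | Count | Frequency (%)
1 | 1091 | 18.7%
2 | 1008 | 17.2%
(space) | 648 | 11.1%
3 | 494 | 8.4%
4 | 262 | 4.5%
e | 184 | 3.1%
5 | 160 | 2.7%
6 | 157 | 2.7%
( | 152 | 2.6%
) | 152 | 2.6%
Other values (35) | 1541 | 26.3%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 6 | 100.0%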

아파트코드 (apartment code)
Text

Distinct: 2087
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 104
Unique (%): 1.0%

Sample

1st row: A12220002
2nd row: A15783806
3rd row: A13881204
4th row: A10026834
5th row: A15205511

Value | Count | Frequency (%)
a15089513 | 14 | 0.1%
a15780703 | 13 | 0.1%
a13528103 | 13 | 0.1%
a15375809 | 13 | 0.1%
a13790703 | 13 | 0.1%
a10026734 | 12 | 0.1%
a15288004 | 12 | 0.1%
a15722102 | 12 | 0.1%
a15303401 | 12 | 0.1%
a15807210 | 11 | 0.1%
Other values (2077) | 9875 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18652 | 20.7%
1 | 17705 | 19.7%
A | 10000 | 11.1%
3 | 8883 | 9.9%
2 | 7889 | 8.8%
5 | 6308 | 7.0%
8 | 5785 | 6.4%
7 | 4738 | 5.3%
4 | 3666 | 4.1%
6 | 3530 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%
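
The counts above (10000 uppercase letters and 80000 digits across 10000 values of fixed length 9) are consistent with every code being the letter "A" followed by eight digits, and the five sample rows follow that shape. The pattern below is an inference from those statistics, not a documented format, and the helper name is hypothetical:

```python
import re

# Assumed format: "A" followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")

def looks_like_code(value):
    """True when value matches the assumed apartment-code shape."""
    return CODE_PATTERN.fullmatch(value) is not None

# Sample rows from this variable.
samples = ["A12220002", "A15783806", "A13881204", "A10026834", "A15205511"]
print(all(looks_like_code(s) for s in samples))
```

A check like this could serve as a cheap validation step before joining on the code column.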

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18652 | 23.3%
1 | 17705 | 22.1%
3 | 8883 | 11.1%
2 | 7889 | 9.9%
5 | 6308 | 7.9%
8 | 5785 | 7.2%
7 | 4738 | 5.9%
4 | 3666 | 4.6%
6 | 3530 | 4.4%
9 | 2844 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18652 | 23.3%
1 | 17705 | 22.1%
3 | 8883 | 11.1%
2 | 7889 | 9.9%
5 | 6308 | 7.9%
8 | 5785 | 7.2%
7 | 4738 | 5.9%
4 | 3666 | 4.6%
6 | 3530 | 4.4%
9 | 2844 | 3.6%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18652 | 20.7%
1 | 17705 | 19.7%
A | 10000 | 11.1%
3 | 8883 | 9.9%
2 | 7889 | 8.8%
5 | 6308 | 7.0%
8 | 5785 | 6.4%
7 | 4738 | 5.3%
4 | 3666 | 4.1%
6 | 3530 | 3.9%

비용명 (expense name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.903
Min length: 2

Characters and Unicode

Total characters: 49030
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 소독비
2nd row: 고용보험료
3rd row: 세금과공과
4th row: 세대전기료
5th row: 광고료수익

Value | Count | Frequency (%)
경비비 | 223 | 2.2%
교육비 | 221 | 2.2%
청소비 | 219 | 2.2%
장기수선비 | 214 | 2.1%
세대전기료 | 211 | 2.1%
잡수익 | 209 | 2.1%
이자수익 | 206 | 2.1%
급여 | 204 | 2.0%
통신비 | 203 | 2.0%
소독비 | 203 | 2.0%
Other values (77) | 7887 | 78.9%

Most occurring characters

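Value | Count | Frequency (%)
(not preserved) | 5431 | 11.1%
(not preserved) | 3627 | 7.4%
(not preserved) | 2078 | 4.2%
(not preserved) | 2063 | 4.2%
(not preserved) | 1743 | 3.6%
(not preserved) | 1287 | 2.6%
(not preserved) | 1046 | 2.1%
(not preserved) | 862 | 1.8%
(not preserved) | 779 | 1.6%
(not preserved) | 730 | 1.5%
Other values (110) | 29384 | 59.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49030 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 5431 | 11.1%
(not preserved) | 3627 | 7.4%
(not preserved) | 2078 | 4.2%
(not preserved) | 2063 | 4.2%
(not preserved) | 1743 | 3.6%
(not preserved) | 1287 | 2.6%
(not preserved) | 1046 | 2.1%
(not preserved) | 862 | 1.8%
(not preserved) | 779 | 1.6%
(not preserved) | 730 | 1.5%
Other values (110) | 29384 | 59.9%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49030 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 5431 | 11.1%
(not preserved) | 3627 | 7.4%
(not preserved) | 2078 | 4.2%
(not preserved) | 2063 | 4.2%
(not preserved) | 1743 | 3.6%
(not preserved) | 1287 | 2.6%
(not preserved) | 1046 | 2.1%
(not preserved) | 862 | 1.8%
(not preserved) | 779 | 1.6%
(not preserved) | 730 | 1.5%
Other values (110) | 29384 | 59.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49030 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 5431 | 11.1%
(not preserved) | 3627 | 7.4%
(not preserved) | 2078 | 4.2%
(not preserved) | 2063 | 4.2%
(not preserved) | 1743 | 3.6%
(not preserved) | 1287 | 2.6%
(not preserved) | 1046 | 2.1%
(not preserved) | 862 | 1.8%
(not preserved) | 779 | 1.6%
(not preserved) | 730 | 1.5%
Other values (110) | 29384 | 59.9%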

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
201911 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201911
2nd row: 201911
3rd row: 201911
4th row: 201911
5th row: 201911

Common Values

Value | Count | Frequency (%)
201911 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201911 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6541
Distinct (%): 65.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2990892.6
Minimum: -32769330
Maximum: 2.0096224 × 10^8
Zeros: 1759
Zeros (%): 17.6%
Negative: 11
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -32769330
5-th percentile: 0
Q1: 39807.5
Median: 260410.5
Q3: 1230445
95-th percentile: 14788429
Maximum: 2.0096224 × 10^8
Range: 2.3373158 × 10^8
Interquartile range (IQR): 1190637.5

Descriptive statistics

Standard deviation: 10450711
Coefficient of variation (CV): 3.4941779
Kurtosis: 107.82101
Mean: 2990892.6
Median Absolute Deviation (MAD): 260410.5
Skewness: 8.7461757
Sum: 2.9908926 × 10^10
Variance: 1.0921736 × 10^14
Monotonicity: Not monotonic
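
Several of the derived statistics above can be cross-checked from the primary ones. The sketch below uses the rounded values printed in this report, plus the exact maximum from the extreme-values listing, so comparisons use a tolerance; the variable names are illustrative.

```python
import math

# Primary values copied from the report.
n = 10_000
mean = 2990892.6
std = 10450711
minimum = -32769330
maximum = 200962245      # exact value from the list of largest values
q1, q3 = 39807.5, 1230445

value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range = Q3 - Q1
cv = std / mean                   # coefficient of variation = std / mean
variance = std ** 2               # variance = std squared
total = mean * n                  # sum = mean * number of observations

print(value_range, iqr)
print(f"{cv:.4f} {variance:.4e} {total:.4e}")
```

Each derived quantity agrees with the reported figure to within rounding of the inputs.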
Histogram with fixed size bins (bins=50)
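
A minimal sketch of equal-width binning with 50 bins over the observed range. It assumes the bins span exactly [minimum, maximum] and that the maximum is counted in the last bin, which is how `numpy.histogram` treats its final bin; whether the plot uses exactly these edges is an assumption.

```python
def bin_index(value, lo, hi, bins=50):
    """Index of the equal-width bin containing value, for lo <= value <= hi."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / bins
    # The maximum belongs to the last bin rather than opening a new one.
    return min(int((value - lo) / width), bins - 1)

# Minimum and maximum of 금액 from this report.
LO, HI = -32769330, 200962245
width = (HI - LO) / 50

print(width)
print(bin_index(0, LO, HI))
```

With a bin width in the millions, the zeros and the median land in the same bin, which matches the strong right skew reported above.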

Common values

Value | Count | Frequency (%)
0 | 1759 | 17.6%
200000 | 72 | 0.7%
300000 | 67 | 0.7%
100000 | 66 | 0.7%
150000 | 46 | 0.5%
400000 | 42 | 0.4%
30000 | 36 | 0.4%
250000 | 34 | 0.3%
60000 | 32 | 0.3%
350000 | 28 | 0.3%
Other values (6531) | 7818 | 78.2%

Minimum 10 values

Value | Count | Frequency (%)
-32769330 | 1 | < 0.1%
-11423630 | 1 | < 0.1%
-3090910 | 1 | < 0.1%
-2727330 | 1 | < 0.1%
-1988790 | 1 | < 0.1%
-201100 | 1 | < 0.1%
-147490 | 1 | < 0.1%
-32000 | 2 | < 0.1%
-27000 | 1 | < 0.1%
-510 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
200962245 | 1 | < 0.1%
189102270 | 1 | < 0.1%
187379820 | 1 | < 0.1%
179848890 | 1 | < 0.1%
179574800 | 1 | < 0.1%
176920440 | 1 | < 0.1%
172230900 | 1 | < 0.1%
157076306 | 1 | < 0.1%
137892860 | 1 | < 0.1%
131750720 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.558
금액 | 0.558 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
18026 | 은평뉴타운상림마을13단지 | A12220002 | 소독비 | 201911 | 170000
93855 | 등촌라인 | A15783806 | 고용보험료 | 201911 | 83280
56416 | 가락프라자 | A13881204 | 세금과공과 | 201911 | 0
3518 | 힐스테이트 백련산4차 아파트 | A10026834 | 세대전기료 | 201911 | 44430080
79770 | 신도림롯데 | A15205511 | 광고료수익 | 201911 | 180000
37344 | 천호현대타워 | A13487102 | 사무용품비 | 201911 | 45400
85892 | 신대방한성 | A15601202 | 충당부채전입이자비용 | 201911 | 0
45294 | 길음뉴타운8단지 | A13611008 | 재활용품수익 | 201911 | 1796400
57009 | 송파한양2차 | A13885304 | 부과차손 | 201911 | 255
70017 | 광진트라팰리스 | A14319305 | 세대수도료 | 201911 | 4102120

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
96861 | 신정동아이파크 | A15807210 | 회계감사비 | 201911 | 0
33735 | 옥수극동그린아파트 | A13384403 | 통신비 | 201911 | 74400
69602 | 자양대동 | A14319008 | 고용안정사업수익 | 201911 | 1000000
38984 | 압구정현대8차 | A13511201 | 퇴직급여 | 201911 | 3284222
9720 | 홍제유원하나제2 | A12009001 | 승강기유지비 | 201911 | 75000
39441 | 강남한양수자인 | A13520002 | 연체료수익 | 201911 | 485810
98326 | 신정이펜하우스3단지 | A15879502 | 세대난방비 | 201911 | 38175470
30499 | 마장SH-vill임대 | A13305005 | 잡비용 | 201911 | 5286760
83839 | 신도림우성1,2차 | A15288806 | 보험료 | 201911 | 410065
43628 | 브라운스톤동선 | A13603702 | 정화조관리비 | 201911 | 0