Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
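
The two memory figures are consistent with each other: the average record size is the total in-memory size divided by the number of observations. A quick check, assuming 1 KiB means 1024 bytes (the variable names are illustrative only):

```python
# Figures copied from the Dataset statistics panel of this report.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```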

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202005" (Constant)
금액 has 2199 (22.0%) zeros (Zeros)
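
The two alerts correspond to simple per-column checks. The sketch below is a minimal illustration and does not reproduce ydata-profiling's own alert logic or thresholds; `summarize_alerts` is a hypothetical helper, and the input is the ten rows of the first table in the Sample section of this report rather than the full dataset.

```python
def summarize_alerts(columns):
    """Return (column, alert, detail) tuples for two simple checks."""
    alerts = []
    for name, values in columns.items():
        distinct = set(values)
        # A column with a single distinct value is constant.
        if len(distinct) == 1:
            alerts.append((name, "Constant", f"value {next(iter(distinct))!r}"))
        # Count zeros only in columns that are entirely numeric.
        if all(isinstance(v, (int, float)) for v in values):
            zeros = sum(1 for v in values if v == 0)
            if zeros:
                share = 100 * zeros / len(values)
                alerts.append((name, "Zeros", f"{zeros} ({share:.1f}%)"))
    return alerts


# Ten rows copied from the Sample section (not the full 10000 rows).
sample = {
    "년월일": ["202005"] * 10,
    "금액": [0, 42821000, 12240025, 561499445, 4249480,
             1417640, 5230, 1706170486, 0, 0],
}

alerts = summarize_alerts(sample)
for alert in alerts:
    print(alert)
```

On these ten rows the zero share differs from the 22.0% reported for the full dataset, as expected for such a small subset.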

Reproduction

Analysis started: 2024-05-11 06:00:19.272426
Analysis finished: 2024-05-11 06:00:20.291037
Duration: 1.02 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
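
The reported duration follows directly from the two timestamps. A small check using only the standard library (the format string is an assumption about how the timestamps are written here):

```python
from datetime import datetime

# Timestamps copied from the Reproduction panel.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-11 06:00:19.272426", FMT)
finished = datetime.strptime("2024-05-11 06:00:20.291037", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```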

Variables

아파트명 (apartment name)
Text

Distinct: 2176
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.2214
Min length: 2

Characters and Unicode

Total characters: 72214
Distinct characters: 433
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
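
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point ("Lo" is Other Letter, "Nd" is Decimal Number). The sketch below tallies categories for the five sample values of this variable only, so its counts are not the report's totals; the standard library does not expose script or block, so those are not shown.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
names = ["천호한신", "장안위더스빌", "길음삼부", "목동10단지", "양평경남1차"]

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter) or "Nd" (Decimal Number).
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

print(category_counts)
```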

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: 천호한신
2nd row: 장안위더스빌
3rd row: 길음삼부
4th row: 목동10단지
5th row: 양평경남1차
Value                  Count  Frequency (%)
아파트                 135    1.3%
래미안                 37     0.3%
힐스테이트             19     0.2%
sk뷰                   17     0.2%
아이파크               17     0.2%
서울숲2차푸르지오임대  16     0.2%
북한산                 15     0.1%
신반포                 14     0.1%
해모로                 14     0.1%
고덕                   14     0.1%
Other values (2237)    10318  97.2%
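
The counts in this table exceed what whole-value matching would give and include lowercase forms such as `sk뷰`, which suggests the table counts lowercased, whitespace-separated tokens rather than whole values. That is an inference from the output, not a confirmed description of ydata-profiling's tokenizer. A minimal sketch of that kind of counting, using a hypothetical `word_counts` helper and four names taken from the Sample section:

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counter = Counter()
    for value in values:
        counter.update(value.lower().split())
    return counter


# Illustrative input: four names from the Sample section of this report.
names = ["백련산 sk뷰 아이파크", "역삼2차아이파크", "한강아파트", "한숲대림아파트"]

counts = word_counts(names)
total = sum(counts.values())
for word, count in counts.most_common():
    print(word, count, f"{100 * count / total:.1f}%")
```

Under this scheme a name written without spaces, such as 한강아파트, is a single token, so it does not contribute to the count for 아파트.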

Most occurring characters

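Value               Count  Frequency (%)
(not recovered)     2374   3.3%
(not recovered)     2300   3.2%
(not recovered)     2081   2.9%
(not recovered)     1861   2.6%
(not recovered)     1750   2.4%
(not recovered)     1689   2.3%
(not recovered)     1565   2.2%
(not recovered)     1545   2.1%
(not recovered)     1401   1.9%
(not recovered)     1387   1.9%
Other values (423)  54261  75.1%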

Most occurring categories

Value              Count  Frequency (%)
Other Letter       66074  91.5%
Decimal Number     3724   5.2%
Uppercase Letter   758    1.0%
Space Separator    694    1.0%
Lowercase Letter   363    0.5%
Open Punctuation   165    0.2%
Close Punctuation  165    0.2%
Dash Punctuation   157    0.2%
Other Punctuation  106    0.1%
Math Symbol        4      < 0.1%
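
The category counts can be cross-checked against the reported total of 72214 characters. The table lists ten categories; the eleventh, Letter Number (4 characters), appears only in the per-category breakdown of this report. A short check, with the frequency column recomputed as count divided by total:

```python
# Counts copied from this report; "Letter Number" comes from the
# per-category breakdown, since the table above shows only ten rows.
category_counts = {
    "Other Letter": 66074,
    "Decimal Number": 3724,
    "Uppercase Letter": 758,
    "Space Separator": 694,
    "Lowercase Letter": 363,
    "Open Punctuation": 165,
    "Close Punctuation": 165,
    "Dash Punctuation": 157,
    "Other Punctuation": 106,
    "Math Symbol": 4,
    "Letter Number": 4,
}
total_characters = 72214

# Each frequency is the category count as a share of all characters.
shares = {
    name: f"{100 * count / total_characters:.1f}%"
    for name, count in category_counts.items()
}

print(sum(category_counts.values()) == total_characters)
print(shares["Other Letter"], shares["Decimal Number"])
```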

Most frequent character per category

Other Letter
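Value               Count  Frequency (%)
(not recovered)     2374   3.6%
(not recovered)     2300   3.5%
(not recovered)     2081   3.1%
(not recovered)     1861   2.8%
(not recovered)     1750   2.6%
(not recovered)     1689   2.6%
(not recovered)     1565   2.4%
(not recovered)     1545   2.3%
(not recovered)     1401   2.1%
(not recovered)     1387   2.1%
Other values (377)  48121  72.8%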
Uppercase Letter
Value             Count  Frequency (%)
S                 139    18.3%
K                 105    13.9%
C                 87     11.5%
L                 57     7.5%
H                 54     7.1%
M                 52     6.9%
D                 52     6.9%
E                 43     5.7%
I                 42     5.5%
V                 32     4.2%
Other values (7)  95     12.5%
Lowercase Letter
Value  Count  Frequency (%)
e      211    58.1%
l      38     10.5%
i      29     8.0%
v      22     6.1%
s      21     5.8%
k      18     5.0%
w      8      2.2%
c      6      1.7%
h      6      1.7%
g      2      0.6%
Decimal Number
Value  Count  Frequency (%)
1      1157   31.1%
2      1039   27.9%
3      503    13.5%
4      248    6.7%
5      214    5.7%
6      158    4.2%
7      131    3.5%
8      99     2.7%
0      92     2.5%
9      83     2.2%
Other Punctuation
Value  Count  Frequency (%)
,      79     74.5%
.      27     25.5%
Space Separator
Value    Count  Frequency (%)
(space)  694    100.0%
Open Punctuation
Value  Count  Frequency (%)
(      165    100.0%
Close Punctuation
Value  Count  Frequency (%)
)      165    100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      157    100.0%
Math Symbol
Value  Count  Frequency (%)
~      4      100.0%
Letter Number
Value            Count  Frequency (%)
(not recovered)  4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  66074  91.5%
Common  5015   6.9%
Latin   1125   1.6%

Most frequent character per script

Hangul
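Value               Count  Frequency (%)
(not recovered)     2374   3.6%
(not recovered)     2300   3.5%
(not recovered)     2081   3.1%
(not recovered)     1861   2.8%
(not recovered)     1750   2.6%
(not recovered)     1689   2.6%
(not recovered)     1565   2.4%
(not recovered)     1545   2.3%
(not recovered)     1401   2.1%
(not recovered)     1387   2.1%
Other values (377)  48121  72.8%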
Latin
Value              Count  Frequency (%)
e                  211    18.8%
S                  139    12.4%
K                  105    9.3%
C                  87     7.7%
L                  57     5.1%
H                  54     4.8%
M                  52     4.6%
D                  52     4.6%
E                  43     3.8%
I                  42     3.7%
Other values (19)  283    25.2%
Common
Value             Count  Frequency (%)
1                 1157   23.1%
2                 1039   20.7%
(space)           694    13.8%
3                 503    10.0%
4                 248    4.9%
5                 214    4.3%
(                 165    3.3%
)                 165    3.3%
6                 158    3.2%
-                 157    3.1%
Other values (7)  515    10.3%

Most occurring blocks

Value         Count  Frequency (%)
Hangul        66074  91.5%
ASCII         6136   8.5%
Number Forms  4      < 0.1%

Most frequent character per block

Hangul
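Value               Count  Frequency (%)
(not recovered)     2374   3.6%
(not recovered)     2300   3.5%
(not recovered)     2081   3.1%
(not recovered)     1861   2.8%
(not recovered)     1750   2.6%
(not recovered)     1689   2.6%
(not recovered)     1565   2.4%
(not recovered)     1545   2.3%
(not recovered)     1401   2.1%
(not recovered)     1387   2.1%
Other values (377)  48121  72.8%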
ASCII
Value              Count  Frequency (%)
1                  1157   18.9%
2                  1039   16.9%
(space)            694    11.3%
3                  503    8.2%
4                  248    4.0%
5                  214    3.5%
e                  211    3.4%
(                  165    2.7%
)                  165    2.7%
6                  158    2.6%
Other values (35)  1582   25.8%
Number Forms
Value            Count  Frequency (%)
(not recovered)  4      100.0%

아파트코드 (apartment code)
Text

Distinct: 2182
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: A13486601
2nd row: A13078701
3rd row: A13611004
4th row: A15873701
5th row: A15010302
Value                Count  Frequency (%)
b13380801            13     0.1%
a12071002            12     0.1%
a13881701            12     0.1%
a12071102            12     0.1%
a15685206            12     0.1%
a13816101            11     0.1%
a13003005            11     0.1%
a13684605            11     0.1%
a15205104            11     0.1%
a13407002            11     0.1%
Other values (2172)  9884   98.8%

Most occurring characters

Value             Count  Frequency (%)
0                 18204  20.2%
1                 17663  19.6%
A                 9978   11.1%
3                 9016   10.0%
2                 8118   9.0%
5                 6227   6.9%
8                 5789   6.4%
7                 4802   5.3%
4                 3766   4.2%
6                 3419   3.8%
Other values (2)  3018   3.4%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      18204  22.8%
1      17663  22.1%
3      9016   11.3%
2      8118   10.1%
5      6227   7.8%
8      5789   7.2%
7      4802   6.0%
4      3766   4.7%
6      3419   4.3%
9      2996   3.7%
Uppercase Letter
Value  Count  Frequency (%)
A      9978   99.8%
B      22     0.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  80000  88.9%
Latin   10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      18204  22.8%
1      17663  22.1%
3      9016   11.3%
2      8118   10.1%
5      6227   7.8%
8      5789   7.2%
7      4802   6.0%
4      3766   4.7%
6      3419   4.3%
9      2996   3.7%
Latin
Value  Count  Frequency (%)
A      9978   99.8%
B      22     0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 18204  20.2%
1                 17663  19.6%
A                 9978   11.1%
3                 9016   10.0%
2                 8118   9.0%
5                 6227   6.9%
8                 5789   6.4%
7                 4802   5.3%
4                 3766   4.2%
6                 3419   3.8%
Other values (2)  3018   3.4%

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 6.0151
Min length: 2

Characters and Unicode

Total characters: 60151
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
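
For each of the three text variables, the reported mean length equals the total character count divided by the 10000 observations. The check below uses the totals from this report; the variable names are taken from the column order of the Sample table, which is an assumption about which panel belongs to which column.

```python
import math

n_observations = 10_000

# Total characters per text variable, copied from this report.
total_characters = {"아파트명": 72214, "아파트코드": 90000, "비용명": 60151}

# Mean length = total characters / number of observations.
mean_length = {
    name: total / n_observations for name, total in total_characters.items()
}

for name, value in mean_length.items():
    print(name, value)
```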

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 승강기유지비충당부채
2nd row: 관리비예치금
3rd row: 연차수당충당부채
4th row: 미부과관리비
5th row: 미지급금
Value               Count  Frequency (%)
미처분이익잉여금    324    3.2%
수선유지비충당부채  320    3.2%
연차수당충당부채    313    3.1%
당기순이익          311    3.1%
예수금              310    3.1%
선급비용            304    3.0%
예금                303    3.0%
장기수선충당예금    301    3.0%
퇴직급여충당부채    300    3.0%
장기수선충당부채    299    3.0%
Other values (67)   6915   69.2%

Most occurring characters

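Value              Count  Frequency (%)
(not recovered)    4646   7.7%
(not recovered)    3863   6.4%
(not recovered)    3159   5.3%
(not recovered)    3155   5.2%
(not recovered)    3056   5.1%
(not recovered)    2998   5.0%
(not recovered)    2748   4.6%
(not recovered)    2433   4.0%
(not recovered)    1936   3.2%
(not recovered)    1742   2.9%
Other values (97)  30415  50.6%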

Most occurring categories

Value         Count  Frequency (%)
Other Letter  60151  100.0%

Most frequent character per category

Other Letter
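Value              Count  Frequency (%)
(not recovered)    4646   7.7%
(not recovered)    3863   6.4%
(not recovered)    3159   5.3%
(not recovered)    3155   5.2%
(not recovered)    3056   5.1%
(not recovered)    2998   5.0%
(not recovered)    2748   4.6%
(not recovered)    2433   4.0%
(not recovered)    1936   3.2%
(not recovered)    1742   2.9%
Other values (97)  30415  50.6%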

Most occurring scripts

Value   Count  Frequency (%)
Hangul  60151  100.0%

Most frequent character per script

Hangul
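Value              Count  Frequency (%)
(not recovered)    4646   7.7%
(not recovered)    3863   6.4%
(not recovered)    3159   5.3%
(not recovered)    3155   5.2%
(not recovered)    3056   5.1%
(not recovered)    2998   5.0%
(not recovered)    2748   4.6%
(not recovered)    2433   4.0%
(not recovered)    1936   3.2%
(not recovered)    1742   2.9%
Other values (97)  30415  50.6%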

Most occurring blocks

Value   Count  Frequency (%)
Hangul  60151  100.0%

Most frequent character per block

Hangul
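Value              Count  Frequency (%)
(not recovered)    4646   7.7%
(not recovered)    3863   6.4%
(not recovered)    3159   5.3%
(not recovered)    3155   5.2%
(not recovered)    3056   5.1%
(not recovered)    2998   5.0%
(not recovered)    2748   4.6%
(not recovered)    2433   4.0%
(not recovered)    1936   3.2%
(not recovered)    1742   2.9%
Other values (97)  30415  50.6%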

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
202005  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202005
2nd row: 202005
3rd row: 202005
4th row: 202005
5th row: 202005

Common Values

Value   Count  Frequency (%)
202005  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
202005  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7459
Distinct (%): 74.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75018849
Minimum: -3.7314823 × 10^8
Maximum: 8.9980122 × 10^9
Zeros: 2199
Zeros (%): 22.0%
Negative: 312
Negative (%): 3.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -3.7314823 × 10^8
5-th percentile: 0
Q1: 0
Median: 3150368
Q3: 34787275
95-th percentile: 3.6606834 × 10^8
Maximum: 8.9980122 × 10^9
Range: 9.3711605 × 10^9
Interquartile range (IQR): 34787275
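
The range and interquartile range follow from the other quantiles: range is maximum minus minimum, and IQR is Q3 minus Q1. The check below uses the exact minimum and maximum listed in the extreme-value tables of this report; the variable names are illustrative.

```python
# Exact extremes from the value tables; quartiles from the panel above.
minimum = -373_148_226
maximum = 8_998_012_236
q1, q3 = 0, 34_787_275

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"{value_range:.7e}", iqr)
```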

Descriptive statistics

Standard deviation: 3.0069243 × 10^8
Coefficient of variation (CV): 4.008225
Kurtosis: 258.87072
Mean: 75018849
Median Absolute Deviation (MAD): 3150368
Skewness: 12.693987
Sum: 7.5018849 × 10^11
Variance: 9.0415936 × 10^16
Monotonicity: Not monotonic
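
Several of these statistics are linked by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, and the sum is the mean times the number of observations. A check using the rounded figures shown in this report, so agreement is only up to rounding:

```python
import math

# Rounded figures as printed in the report.
mean = 75_018_849
std = 3.0069243e8
n_observations = 10_000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance
total = mean * n_observations  # sum of all values

print(round(cv, 6), f"{variance:.7e}", f"{total:.7e}")
```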
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    2199   22.0%
500000               31     0.3%
250000               20     0.2%
300000               13     0.1%
1000000              11     0.1%
242000               11     0.1%
400000               10     0.1%
10000                9      0.1%
484000               9      0.1%
100000               9      0.1%
Other values (7449)  7678   76.8%

Value       Count  Frequency (%)
-373148226  1      < 0.1%
-274511234  1      < 0.1%
-243688934  1      < 0.1%
-185960982  1      < 0.1%
-135178250  1      < 0.1%
-133385584  1      < 0.1%
-125211700  1      < 0.1%
-121299737  1      < 0.1%
-116349116  1      < 0.1%
-102207246  1      < 0.1%

Value       Count  Frequency (%)
8998012236  1      < 0.1%
8959412051  1      < 0.1%
7301297897  1      < 0.1%
5987905932  1      < 0.1%
5409417731  1      < 0.1%
5115662860  1      < 0.1%
5084404131  1      < 0.1%
4443367362  1      < 0.1%
4350317710  1      < 0.1%
4034231841  1      < 0.1%

Interactions


Correlations

        비용명  금액
비용명  1.000   0.469
금액    0.469   1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  아파트명                 아파트코드  비용명                      년월일  금액
25778  천호한신                 A13486601   승강기유지비충당부채         202005  0
14758  장안위더스빌             A13078701   관리비예치금                 202005  42821000
31128  길음삼부                 A13611004   연차수당충당부채             202005  12240025
67090  목동10단지               A15873701   미부과관리비                 202005  561499445
50784  양평경남1차              A15010302   미지급금                     202005  4249480
15418  제기안암골벽산           A13086101   공동체활성화단체지원적립금   202005  1417640
66956  신월삼정그린뷰           A15809402   현금                         202005  5230
19355  창동주공17단지           A13204408   장기수선충당부채             202005  1706170486
7939   북가좌삼호               A12013202   주차장충당부채               202005  0
36871  오금대림                 A13813008   주차장충당부채               202005  0

Index  아파트명                 아파트코드  비용명                      년월일  금액
68524  은평뉴타운박석고개1단지  A41279910   미지급금                     202005  194180405
66697  목동우성2차              A15807703   선급비용                     202005  23756280
1139   백련산 sk뷰 아이파크     A10025310   미처분이익잉여금             202005  17670340
28687  역삼2차아이파크          A13579503   연차수당충당부채             202005  0
64607  방화청솔3단지            A15785709   선급금                       202005  3864690
42247  공릉대주파크빌           A13980706   공동주택적립금               202005  22194591
51380  한강아파트               A15080501   상여충당부채                 202005  0
50455  문래현대5차아파트        A15009504   예수금                       202005  1020680
64569  한숲대림아파트           A15785703   기타당좌자산                 202005  0
42394  상계불암대림             A13981006   주차장충당부채               202005  0