Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
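
The average record size follows from the total memory footprint divided by the number of observations. A minimal sketch of that arithmetic, assuming KiB means 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```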

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "" (Constant)
금액 has 2156 (21.6%) zeros (Zeros)
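
The two alerts can be illustrated with a small check. The sketch below is not the profiler's own logic; it assumes a column is "constant" when every row holds the same value and that the zero share is the fraction of values equal to 0. The five rows are copied from this report's sample rows, and the function names are hypothetical.

```python
# Column order as in the report's sample table.
columns = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    ("서빙고신동아", "A14024002", "퇴직급여충당예금", "201907", 857228330),
    ("신림금호타운1차", "A15101901", "승강기유지비충당부채", "201907", 0),
    ("강남한양수자인", "A13520002", "예수금", "201907", 6562795),
    ("가양2단지", "A15780605", "예금", "201907", 292655575),
    ("양평동6차현대아파트", "A15010307", "퇴직급여충당부채", "201907", 16432891),
]


def constant_columns(rows, columns):
    """Return the columns whose values are identical in every row."""
    return [
        name
        for i, name in enumerate(columns)
        if len({row[i] for row in rows}) == 1
    ]


def zero_share(values):
    """Fraction of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


print(constant_columns(rows, columns))       # ['년월일']
print(zero_share([row[4] for row in rows]))  # 0.2 on these five rows

# On the full dataset the report counts 2156 zeros in 10000 rows.
print(f"{2156 / 10000:.1%}")                 # 21.6%
```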

Reproduction

Analysis started: 2024-05-11 06:01:28.113312
Analysis finished: 2024-05-11 06:01:28.868654
Duration: 0.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
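
The reported duration is the difference between the two timestamps. A short sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-05-11 06:01:28.113312")
finished = datetime.fromisoformat("2024-05-11 06:01:28.868654")

duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")  # 0.76 seconds
```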

Variables

아파트명
Text

Distinct: 2111
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1789
Min length: 2

Characters and Unicode

Total characters: 71789
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
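
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in the five sample values of this variable. The code-to-name mapping and the function name are illustrative, and the profiler's own implementation may differ. Script and block lookups are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Readable names for a few two-letter general category codes.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Nl": "Letter Number",
}

# The five sample values of this variable.
samples = ["서빙고신동아", "신림금호타운1차", "강남한양수자인", "가양2단지", "양평동6차현대아파트"]


def category_counts(texts):
    """Count characters per Unicode general category code."""
    return Counter(unicodedata.category(ch) for text in texts for ch in text)


counts = category_counts(samples)
for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

On these samples, Hangul syllables fall under "Other Letter" and the digits under "Decimal Number", which matches the dominance of those two categories in the tables for this variable.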

Unique

Unique: 95
Unique (%): 0.9%

Sample

1st row: 서빙고신동아
2nd row: 신림금호타운1차
3rd row: 강남한양수자인
4th row: 가양2단지
5th row: 양평동6차현대아파트

Value | Count | Frequency (%)
아파트 | 109 | 1.0%
래미안 | 26 | 0.2%
신동아파밀리에 | 18 | 0.2%
당산현대3차 | 14 | 0.1%
서울숲2차푸르지오임대 | 14 | 0.1%
코오롱하늘채아파트 | 14 | 0.1%
은평뉴타운상림마을6단지 | 13 | 0.1%
북한산 | 13 | 0.1%
구로신성미소지움 | 13 | 0.1%
목동10단지 | 12 | 0.1%
Other values (2166) | 10198 | 97.6%

Most occurring characters

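Value | Count | Frequency (%)
(not captured) | 2273 | 3.2%
(not captured) | 2229 | 3.1%
(not captured) | 1921 | 2.7%
(not captured) | 1873 | 2.6%
(not captured) | 1823 | 2.5%
(not captured) | 1697 | 2.4%
(not captured) | 1544 | 2.2%
(not captured) | 1464 | 2.0%
(not captured) | 1423 | 2.0%
(not captured) | 1374 | 1.9%
Other values (419) | 54168 | 75.5%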

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65929 | 91.8%
Decimal Number | 3841 | 5.4%
Uppercase Letter | 672 | 0.9%
Space Separator | 496 | 0.7%
Lowercase Letter | 295 | 0.4%
Open Punctuation | 143 | 0.2%
Close Punctuation | 143 | 0.2%
Other Punctuation | 141 | 0.2%
Dash Punctuation | 125 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not captured) | 2273 | 3.4%
(not captured) | 2229 | 3.4%
(not captured) | 1921 | 2.9%
(not captured) | 1873 | 2.8%
(not captured) | 1823 | 2.8%
(not captured) | 1697 | 2.6%
(not captured) | 1544 | 2.3%
(not captured) | 1464 | 2.2%
(not captured) | 1423 | 2.2%
(not captured) | 1374 | 2.1%
Other values (375) | 48308 | 73.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 111 | 16.5%
K | 95 | 14.1%
C | 74 | 11.0%
L | 53 | 7.9%
D | 47 | 7.0%
M | 47 | 7.0%
I | 37 | 5.5%
G | 36 | 5.4%
E | 34 | 5.1%
H | 34 | 5.1%
Other values (7) | 104 | 15.5%

Decimal Number
Value | Count | Frequency (%)
1 | 1187 | 30.9%
2 | 1153 | 30.0%
3 | 523 | 13.6%
4 | 264 | 6.9%
5 | 203 | 5.3%
6 | 153 | 4.0%
7 | 105 | 2.7%
9 | 89 | 2.3%
0 | 83 | 2.2%
8 | 81 | 2.1%

Lowercase Letter
Value | Count | Frequency (%)
e | 184 | 62.4%
i | 27 | 9.2%
l | 22 | 7.5%
v | 18 | 6.1%
w | 12 | 4.1%
s | 12 | 4.1%
k | 7 | 2.4%
h | 5 | 1.7%
g | 4 | 1.4%
a | 4 | 1.4%

Other Punctuation
Value | Count | Frequency (%)
, | 109 | 77.3%
. | 32 | 22.7%

Space Separator
Value | Count | Frequency (%)
(space) | 496 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 143 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 143 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 125 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not captured) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65929 | 91.8%
Common | 4889 | 6.8%
Latin | 971 | 1.4%

Most frequent character per script

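Hangul
Value | Count | Frequency (%)
(not captured) | 2273 | 3.4%
(not captured) | 2229 | 3.4%
(not captured) | 1921 | 2.9%
(not captured) | 1873 | 2.8%
(not captured) | 1823 | 2.8%
(not captured) | 1697 | 2.6%
(not captured) | 1544 | 2.3%
(not captured) | 1464 | 2.2%
(not captured) | 1423 | 2.2%
(not captured) | 1374 | 2.1%
Other values (375) | 48308 | 73.3%

Latin
Value | Count | Frequency (%)
e | 184 | 18.9%
S | 111 | 11.4%
K | 95 | 9.8%
C | 74 | 7.6%
L | 53 | 5.5%
D | 47 | 4.8%
M | 47 | 4.8%
I | 37 | 3.8%
G | 36 | 3.7%
E | 34 | 3.5%
Other values (18) | 253 | 26.1%

Common
Value | Count | Frequency (%)
1 | 1187 | 24.3%
2 | 1153 | 23.6%
3 | 523 | 10.7%
(space) | 496 | 10.1%
4 | 264 | 5.4%
5 | 203 | 4.2%
6 | 153 | 3.1%
( | 143 | 2.9%
) | 143 | 2.9%
- | 125 | 2.6%
Other values (6) | 499 | 10.2%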

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65929 | 91.8%
ASCII | 5856 | 8.2%
Number Forms | 4 | < 0.1%

Most frequent character per block

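Hangul
Value | Count | Frequency (%)
(not captured) | 2273 | 3.4%
(not captured) | 2229 | 3.4%
(not captured) | 1921 | 2.9%
(not captured) | 1873 | 2.8%
(not captured) | 1823 | 2.8%
(not captured) | 1697 | 2.6%
(not captured) | 1544 | 2.3%
(not captured) | 1464 | 2.2%
(not captured) | 1423 | 2.2%
(not captured) | 1374 | 2.1%
Other values (375) | 48308 | 73.3%

ASCII
Value | Count | Frequency (%)
1 | 1187 | 20.3%
2 | 1153 | 19.7%
3 | 523 | 8.9%
(space) | 496 | 8.5%
4 | 264 | 4.5%
5 | 203 | 3.5%
e | 184 | 3.1%
6 | 153 | 2.6%
( | 143 | 2.4%
) | 143 | 2.4%
Other values (33) | 1407 | 24.0%

Number Forms
Value | Count | Frequency (%)
(not captured) | 4 | 100.0%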

아파트코드
Text

Distinct: 2117
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 95
Unique (%): 0.9%

Sample

1st row: A14024002
2nd row: A15101901
3rd row: A13520002
4th row: A15780605
5th row: A15010307

Value | Count | Frequency (%)
a15004406 | 14 | 0.1%
a15205301 | 13 | 0.1%
a13778201 | 12 | 0.1%
a15873701 | 12 | 0.1%
a15883201 | 12 | 0.1%
a13408003 | 11 | 0.1%
a13771601 | 11 | 0.1%
a15089307 | 11 | 0.1%
a13790726 | 11 | 0.1%
a13508011 | 11 | 0.1%
Other values (2107) | 9882 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18245 | 20.3%
1 | 17690 | 19.7%
A | 9989 | 11.1%
3 | 9079 | 10.1%
2 | 7955 | 8.8%
5 | 6224 | 6.9%
8 | 5860 | 6.5%
7 | 4754 | 5.3%
4 | 3857 | 4.3%
6 | 3440 | 3.8%
Other values (2) | 2907 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18245 | 22.8%
1 | 17690 | 22.1%
3 | 9079 | 11.3%
2 | 7955 | 9.9%
5 | 6224 | 7.8%
8 | 5860 | 7.3%
7 | 4754 | 5.9%
4 | 3857 | 4.8%
6 | 3440 | 4.3%
9 | 2896 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18245 | 22.8%
1 | 17690 | 22.1%
3 | 9079 | 11.3%
2 | 7955 | 9.9%
5 | 6224 | 7.8%
8 | 5860 | 7.3%
7 | 4754 | 5.9%
4 | 3857 | 4.8%
6 | 3440 | 4.3%
9 | 2896 | 3.6%

Latin
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18245 | 20.3%
1 | 17690 | 19.7%
A | 9989 | 11.1%
3 | 9079 | 10.1%
2 | 7955 | 8.8%
5 | 6224 | 6.9%
8 | 5860 | 6.5%
7 | 4754 | 5.3%
4 | 3857 | 4.3%
6 | 3440 | 3.8%
Other values (2) | 2907 | 3.2%

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9769
Min length: 2

Characters and Unicode

Total characters: 59769
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 퇴직급여충당예금
2nd row: 승강기유지비충당부채
3rd row: 예수금
4th row: 예금
5th row: 퇴직급여충당부채

Value | Count | Frequency (%)
당기순이익 | 333 | 3.3%
미처분이익잉여금 | 330 | 3.3%
예금 | 325 | 3.2%
공동주택적립금 | 317 | 3.2%
예수금 | 316 | 3.2%
퇴직급여충당부채 | 316 | 3.2%
선급비용 | 312 | 3.1%
관리비미수금 | 307 | 3.1%
수선유지비충당부채 | 306 | 3.1%
장기수선충당예금 | 304 | 3.0%
Other values (67) | 6834 | 68.3%

Most occurring characters

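Value | Count | Frequency (%)
(not captured) | 4715 | 7.9%
(not captured) | 3778 | 6.3%
(not captured) | 3255 | 5.4%
(not captured) | 3067 | 5.1%
(not captured) | 3022 | 5.1%
(not captured) | 2901 | 4.9%
(not captured) | 2606 | 4.4%
(not captured) | 2313 | 3.9%
(not captured) | 1990 | 3.3%
(not captured) | 1808 | 3.0%
Other values (97) | 30314 | 50.7%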

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 59769 | 100.0%

Most frequent character per category

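Other Letter
Value | Count | Frequency (%)
(not captured) | 4715 | 7.9%
(not captured) | 3778 | 6.3%
(not captured) | 3255 | 5.4%
(not captured) | 3067 | 5.1%
(not captured) | 3022 | 5.1%
(not captured) | 2901 | 4.9%
(not captured) | 2606 | 4.4%
(not captured) | 2313 | 3.9%
(not captured) | 1990 | 3.3%
(not captured) | 1808 | 3.0%
Other values (97) | 30314 | 50.7%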

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 59769 | 100.0%

Most frequent character per script

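Hangul
Value | Count | Frequency (%)
(not captured) | 4715 | 7.9%
(not captured) | 3778 | 6.3%
(not captured) | 3255 | 5.4%
(not captured) | 3067 | 5.1%
(not captured) | 3022 | 5.1%
(not captured) | 2901 | 4.9%
(not captured) | 2606 | 4.4%
(not captured) | 2313 | 3.9%
(not captured) | 1990 | 3.3%
(not captured) | 1808 | 3.0%
Other values (97) | 30314 | 50.7%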

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 59769 | 100.0%

Most frequent character per block

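Hangul
Value | Count | Frequency (%)
(not captured) | 4715 | 7.9%
(not captured) | 3778 | 6.3%
(not captured) | 3255 | 5.4%
(not captured) | 3067 | 5.1%
(not captured) | 3022 | 5.1%
(not captured) | 2901 | 4.9%
(not captured) | 2606 | 4.4%
(not captured) | 2313 | 3.9%
(not captured) | 1990 | 3.3%
(not captured) | 1808 | 3.0%
Other values (97) | 30314 | 50.7%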

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201907
2nd row: 201907
3rd row: 201907
4th row: 201907
5th row: 201907

Common Values

Value | Count | Frequency (%)
201907 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201907 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7502
Distinct (%): 75.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70723662
Minimum: -4.0038614 × 10^8
Maximum: 1.0457935 × 10^10
Zeros: 2156
Zeros (%): 21.6%
Negative: 332
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.0038614 × 10^8
5-th percentile: 0
Q1: 332.5
Median: 3129432.5
Q3: 34132830
95-th percentile: 3.5465614 × 10^8
Maximum: 1.0457935 × 10^10
Range: 1.0858321 × 10^10
Interquartile range (IQR): 34132497
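
The range and the interquartile range are simple differences of the quantiles listed above. A small sketch of that arithmetic, using the rounded values shown in the report; the variable names are illustrative:

```python
# Quantile statistics for 금액, as printed in the report (rounded).
minimum = -4.0038614e8
maximum = 1.0457935e10
q1 = 332.5
q3 = 34132830

value_range = maximum - minimum  # range = maximum - minimum
iqr = q3 - q1                    # interquartile range = Q3 - Q1

print(f"{value_range:.7e}")  # 1.0858321e+10
print(iqr)                   # 34132497.5
```

The computed IQR of 34132497.5 corresponds to the 34132497 shown in the report, which drops the fractional part.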

Descriptive statistics

Standard deviation: 2.7562953 × 10^8
Coefficient of variation (CV): 3.8972746
Kurtosis: 322.31737
Mean: 70723662
Median Absolute Deviation (MAD): 3129432.5
Skewness: 13.521548
Sum: 7.0723662 × 10^11
Variance: 7.5971639 × 10^16
Monotonicity: Not monotonic
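
Several of these figures are linked by standard identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks those relations against the rounded values printed in the report, so small differences in the last digits are expected; the variable names are illustrative.

```python
import math

# Values as printed in the report (rounded).
mean = 70723662
std = 2.7562953e8
n_observations = 10_000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the square of the standard deviation
total = mean * n_observations  # sum = mean * number of observations

print(math.isclose(cv, 3.8972746, rel_tol=1e-5))           # True
print(math.isclose(variance, 7.5971639e16, rel_tol=1e-5))  # True
print(math.isclose(total, 7.0723662e11, rel_tol=1e-5))     # True
```

A CV close to 3.9 together with a skewness of 13.5 points to a heavily right-skewed distribution dominated by a few very large amounts.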
Histogram with fixed size bins (bins=50)
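
To give a feel for what 50 fixed-size bins mean for this variable, the sketch below assumes equal-width bins spanning the observed minimum to the observed maximum; the report's plotting code may choose its edges differently. The extremes are taken from the report's lists of smallest and largest values, and the function name is hypothetical.

```python
# Observed extremes of 금액.
MINIMUM = -400_386_136
MAXIMUM = 10_457_935_355
BINS = 50


def bin_index(value, lo=MINIMUM, hi=MAXIMUM, bins=BINS):
    """Zero-based index of the equal-width bin that contains value."""
    width = (hi - lo) / bins
    index = int((value - lo) // width)
    # The maximum sits on the right edge, so clamp it into the last bin.
    return min(index, bins - 1)


print(bin_index(0))         # 1
print(bin_index(34132830))  # 2 (Q3)
print(bin_index(MAXIMUM))   # 49
```

Under this assumption each bin is roughly 217 million wide, so the zeros and the median share bin 1 and Q3 already falls in bin 2, which leaves most of the 50 bins nearly empty.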

Common Values

Value | Count | Frequency (%)
0 | 2156 | 21.6%
250000 | 27 | 0.3%
500000 | 25 | 0.2%
100000 | 17 | 0.2%
20000000 | 14 | 0.1%
30000000 | 13 | 0.1%
484000 | 13 | 0.1%
1000000 | 10 | 0.1%
242000 | 10 | 0.1%
300000 | 8 | 0.1%
Other values (7492) | 7707 | 77.1%

Minimum 10 values

Value | Count | Frequency (%)
-400386136 | 1 | < 0.1%
-336660978 | 1 | < 0.1%
-243113604 | 1 | < 0.1%
-221882790 | 1 | < 0.1%
-202584070 | 1 | < 0.1%
-136844826 | 1 | < 0.1%
-121853295 | 1 | < 0.1%
-112068285 | 1 | < 0.1%
-82606030 | 1 | < 0.1%
-80456800 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
10457935355 | 1 | < 0.1%
6339254456 | 1 | < 0.1%
6104026672 | 1 | < 0.1%
5441997739 | 1 | < 0.1%
5183905819 | 1 | < 0.1%
4915592874 | 1 | < 0.1%
4815569084 | 1 | < 0.1%
3823122111 | 1 | < 0.1%
3437325530 | 1 | < 0.1%
3394588360 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.435
금액 | 0.435 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
43339 | 서빙고신동아 | A14024002 | 퇴직급여충당예금 | 201907 | 857228330
50180 | 신림금호타운1차 | A15101901 | 승강기유지비충당부채 | 201907 | 0
25293 | 강남한양수자인 | A13520002 | 예수금 | 201907 | 6562795
60507 | 가양2단지 | A15780605 | 예금 | 201907 | 292655575
48155 | 양평동6차현대아파트 | A15010307 | 퇴직급여충당부채 | 201907 | 16432891
12718 | 래미안엘파인 | A13075402 | 수선유지비충당부채 | 201907 | 6981790
42070 | 하계6단지(장미) | A13987304 | 당기순이익 | 201907 | 108305189
1635 | 왕십리 자이 아파트 | A10026900 | 관리비예치금예금 | 201907 | 450000
44981 | 번동한양 | A14286104 | 장기수선충당예금 | 201907 | 126954922
17087 | 창동금용 | A13204201 | 미지급금 | 201907 | 8156520

Last rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
49400 | 양평신동아아파트 | A15086202 | 선급금 | 201907 | 1830950
59568 | 개화산동부센트레빌 | A15722102 | 상여충당부채 | 201907 | 0
35986 | 송파한라비발디아파트 | A13876113 | 가지급금 | 201907 | 1216200
10024 | 응암경남 | A12201301 | 예금 | 201907 | 65306810
29767 | 동일하이빌뉴시티 | A13613011 | 선급비용 | 201907 | 4557360
34286 | 송파파크데일2단지 | A13812005 | 미지급금 | 201907 | 52061884
14357 | 상봉프레미어스엠코 | A13122002 | 퇴직급여충당부채 | 201907 | 39740562
1822 | 상암DMC엘가 | A10027019 | 임차보증금 | 201907 | 100000
55138 | 금천현대 | A15381706 | 수선유지비충당부채 | 201907 | 3551810
26784 | 개나리SKVIEW | A13579506 | 미부과관리비 | 201907 | 85258810