Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
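
The average record size follows from the total size and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the helper name is hypothetical and not part of the report):

```python
def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Convert a total size in KiB to bytes and divide by the number of rows."""
    return total_kib * 1024 / n_rows


# Figures taken from the dataset statistics: 488.3 KiB over 10000 observations.
size = average_record_size_bytes(488.3, 10_000)
print(f"{size:.1f} B")
```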

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 (year/month/day) has constant value "202009" (Constant)
금액 (amount) has 2178 (21.8%) zeros (Zeros)
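
The two alerts can be reproduced in spirit with a small check per column. The sketch below is illustrative only: the function name and the 10% zero threshold are assumptions made here, not the profiler's actual rules.

```python
def column_alerts(values, zero_threshold=0.10):
    """Return alert labels for one column.

    zero_threshold is a hypothetical cutoff chosen for this sketch.
    """
    alerts = []
    # A column with a single distinct value carries no information.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Count zeros among numeric entries only (bool is excluded on purpose).
    numeric = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        zero_share = sum(1 for v in numeric if v == 0) / len(values)
        if zero_share > zero_threshold:
            alerts.append("Zeros")
    return alerts


print(column_alerts(["202009"] * 5))   # constant text column
print(column_alerts([0, 0, 5, 7, 9]))  # numeric column with many zeros
```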

Reproduction

Analysis started: 2024-05-11 05:59:53.357215
Analysis finished: 2024-05-11 05:59:54.280447
Duration: 0.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2201
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2501
Min length: 2

Characters and Unicode

Total characters: 72501
Distinct characters: 436
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
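
As an illustration of how such per-character properties can be tallied, here is a minimal sketch using Python's standard `unicodedata` module. The mapping from category codes to long names is hand-written for this example and covers only the categories that appear in this report; the sample strings are values shown elsewhere in the report.

```python
import unicodedata
from collections import Counter

# Assumption: a partial, hand-written mapping from Unicode general-category
# codes to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["e편한세상", "염창동아3차", "신당남산타운(분양"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter" because the script has no upper or lower case, which is why that category dominates the tables below.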

Unique

Unique: 127
Unique (%): 1.3%

Sample

1st row: 올림픽파크한양수자인
2nd row: 염창동아3차
3rd row: 공릉두산힐스빌
4th row: 은평뉴타운구파발10단지2관리
5th row: 염창한마음삼성

Value  Count  Frequency (%)
아파트  145  1.4%
래미안  24  0.2%
고덕  21  0.2%
아이파크  20  0.2%
래미안밤섬리베뉴  18  0.2%
e편한세상  16  0.2%
코오롱하늘채아파트  15  0.1%
신도림현대  15  0.1%
신동아파밀리에  14  0.1%
신당남산타운(분양  13  0.1%
Other values (2267)  10343  97.2%

Most occurring characters

Value  Count  Frequency (%)
2441  3.4%
2357  3.3%
2135  2.9%
1837  2.5%
1767  2.4%
1651  2.3%
1566  2.2%
1555  2.1%
1377  1.9%
1353  1.9%
Other values (426)  54462  75.1%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66449  91.7%
Decimal Number  3677  5.1%
Uppercase Letter  761  1.0%
Space Separator  722  1.0%
Lowercase Letter  337  0.5%
Close Punctuation  144  0.2%
Open Punctuation  144  0.2%
Dash Punctuation  128  0.2%
Other Punctuation  125  0.2%
Letter Number  11  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
2441  3.7%
2357  3.5%
2135  3.2%
1837  2.8%
1767  2.7%
1651  2.5%
1566  2.4%
1555  2.3%
1377  2.1%
1353  2.0%
Other values (380)  48410  72.9%

Uppercase Letter
Value  Count  Frequency (%)
S  117  15.4%
C  116  15.2%
K  97  12.7%
D  74  9.7%
M  74  9.7%
L  56  7.4%
H  48  6.3%
I  38  5.0%
E  31  4.1%
G  24  3.2%
Other values (7)  86  11.3%

Lowercase Letter
Value  Count  Frequency (%)
e  204  60.5%
l  30  8.9%
i  25  7.4%
v  20  5.9%
k  17  5.0%
s  16  4.7%
c  8  2.4%
w  6  1.8%
g  4  1.2%
a  4  1.2%

Decimal Number
Value  Count  Frequency (%)
1  1140  31.0%
2  1029  28.0%
3  476  12.9%
4  279  7.6%
5  208  5.7%
6  178  4.8%
7  119  3.2%
0  86  2.3%
9  81  2.2%
8  81  2.2%

Other Punctuation
Value  Count  Frequency (%)
,  100  80.0%
.  25  20.0%

Space Separator
Value  Count  Frequency (%)
722  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  144  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  144  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  128  100.0%

Letter Number
Value  Count  Frequency (%)
11  100.0%

Math Symbol
Value  Count  Frequency (%)
~  3  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66449  91.7%
Common  4943  6.8%
Latin  1109  1.5%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
2441  3.7%
2357  3.5%
2135  3.2%
1837  2.8%
1767  2.7%
1651  2.5%
1566  2.4%
1555  2.3%
1377  2.1%
1353  2.0%
Other values (380)  48410  72.9%

Latin
Value  Count  Frequency (%)
e  204  18.4%
S  117  10.6%
C  116  10.5%
K  97  8.7%
D  74  6.7%
M  74  6.7%
L  56  5.0%
H  48  4.3%
I  38  3.4%
E  31  2.8%
Other values (19)  254  22.9%

Common
Value  Count  Frequency (%)
1  1140  23.1%
2  1029  20.8%
722  14.6%
3  476  9.6%
4  279  5.6%
5  208  4.2%
6  178  3.6%
)  144  2.9%
(  144  2.9%
-  128  2.6%
Other values (7)  495  10.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66449  91.7%
ASCII  6041  8.3%
Number Forms  11  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
2441  3.7%
2357  3.5%
2135  3.2%
1837  2.8%
1767  2.7%
1651  2.5%
1566  2.4%
1555  2.3%
1377  2.1%
1353  2.0%
Other values (380)  48410  72.9%

ASCII
Value  Count  Frequency (%)
1  1140  18.9%
2  1029  17.0%
722  12.0%
3  476  7.9%
4  279  4.6%
5  208  3.4%
e  204  3.4%
6  178  2.9%
)  144  2.4%
(  144  2.4%
Other values (35)  1517  25.1%

Number Forms
Value  Count  Frequency (%)
11  100.0%

아파트코드 (apartment code)
Text

Distinct: 2209
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 128
Unique (%): 1.3%

Sample

1st row: A10027354
2nd row: A15786227
3rd row: A13980415
4th row: A41279928
5th row: A15786118

Value  Count  Frequency (%)
a10045302  13  0.1%
a13817202  12  0.1%
a15606003  11  0.1%
a13311101  11  0.1%
a10028177  11  0.1%
a15722102  11  0.1%
a13822004  11  0.1%
a15678101  11  0.1%
a15205108  11  0.1%
a13381701  11  0.1%
Other values (2199)  9887  98.9%

Most occurring characters

Value  Count  Frequency (%)
0  18506  20.6%
1  17643  19.6%
A  9990  11.1%
3  8821  9.8%
2  8145  9.0%
5  6305  7.0%
8  5675  6.3%
7  4770  5.3%
4  3791  4.2%
6  3432  3.8%
Other values (2)  2922  3.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18506  23.1%
1  17643  22.1%
3  8821  11.0%
2  8145  10.2%
5  6305  7.9%
8  5675  7.1%
7  4770  6.0%
4  3791  4.7%
6  3432  4.3%
9  2912  3.6%

Uppercase Letter
Value  Count  Frequency (%)
A  9990  99.9%
B  10  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18506  23.1%
1  17643  22.1%
3  8821  11.0%
2  8145  10.2%
5  6305  7.9%
8  5675  7.1%
7  4770  6.0%
4  3791  4.7%
6  3432  4.3%
9  2912  3.6%

Latin
Value  Count  Frequency (%)
A  9990  99.9%
B  10  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18506  20.6%
1  17643  19.6%
A  9990  11.1%
3  8821  9.8%
2  8145  9.0%
5  6305  7.0%
8  5675  6.3%
7  4770  5.3%
4  3791  4.2%
6  3432  3.8%
Other values (2)  2922  3.2%

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9574
Min length: 2

Characters and Unicode

Total characters: 59574
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 관리비미수금
2nd row: 청소비충당부채
3rd row: 선급비용
4th row: 선수전기료
5th row: 수선유지비충당부채

Value  Count  Frequency (%)
당기순이익  343  3.4%
예수금  324  3.2%
연차수당충당부채  319  3.2%
비품  316  3.2%
퇴직급여충당부채  315  3.1%
관리비미수금  309  3.1%
공동주택적립금  306  3.1%
선급비용  301  3.0%
예금  300  3.0%
미처분이익잉여금  296  3.0%
Other values (67)  6871  68.7%

Most occurring characters

Value  Count  Frequency (%)
4655  7.8%
3787  6.4%
3160  5.3%
3071  5.2%
3033  5.1%
2946  4.9%
2636  4.4%
2390  4.0%
1864  3.1%
1777  3.0%
Other values (97)  30255  50.8%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  59574  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
4655  7.8%
3787  6.4%
3160  5.3%
3071  5.2%
3033  5.1%
2946  4.9%
2636  4.4%
2390  4.0%
1864  3.1%
1777  3.0%
Other values (97)  30255  50.8%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  59574  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
4655  7.8%
3787  6.4%
3160  5.3%
3071  5.2%
3033  5.1%
2946  4.9%
2636  4.4%
2390  4.0%
1864  3.1%
1777  3.0%
Other values (97)  30255  50.8%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  59574  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
4655  7.8%
3787  6.4%
3160  5.3%
3071  5.2%
3033  5.1%
2946  4.9%
2636  4.4%
2390  4.0%
1864  3.1%
1777  3.0%
Other values (97)  30255  50.8%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
202009  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202009
2nd row: 202009
3rd row: 202009
4th row: 202009
5th row: 202009

Common Values

Value  Count  Frequency (%)
202009  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
202009  10000  100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7476
Distinct (%): 74.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71073712
Minimum: -4.09024 × 10^9
Maximum: 1.1736729 × 10^10
Zeros: 2178
Zeros (%): 21.8%
Negative: 333
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.09024 × 10^9
5-th percentile: 0
Q1: 0
Median: 3512020
Q3: 36703212
95-th percentile: 3.3641574 × 10^8
Maximum: 1.1736729 × 10^10
Range: 1.5826969 × 10^10
Interquartile range (IQR): 36703212
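
The range and IQR are differences of the other quantile statistics (maximum minus minimum, Q3 minus Q1). A minimal sketch on toy data, assuming the standard library's "inclusive" quantile method; the profiler's own interpolation rule may differ, and the function name is hypothetical:

```python
import statistics


def range_and_iqr(values):
    """Return (range, IQR) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


# Toy data with many zeros, loosely mimicking the shape of this column.
print(range_and_iqr([0, 0, 2, 4, 10]))

# Cross-check against the reported figures: maximum minus minimum.
reported_range = 1.1736729e10 - (-4.09024e9)
print(f"{reported_range:.7e}")
```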

Descriptive statistics

Standard deviation: 2.9967182 × 10^8
Coefficient of variation (CV): 4.2163524
Kurtosis: 383.32557
Mean: 71073712
Median Absolute Deviation (MAD): 3512020
Skewness: 14.630203
Sum: 7.1073712 × 10^11
Variance: 8.9803197 × 10^16
Monotonicity: Not monotonic
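
Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported values against those identities; the variable names are chosen here for illustration, and small differences are expected because the report rounds to about eight significant digits.

```python
import math

# Values as reported for 금액.
mean = 71073712
std = 2.9967182e8
n_rows = 10_000

cv = std / mean         # coefficient of variation
variance = std ** 2     # variance is the squared standard deviation
total = mean * n_rows   # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```

A CV above 4 together with a skewness near 14.6 points to a heavily right-skewed column dominated by a few very large amounts.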
Histogram with fixed size bins (bins=50)
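
Fixed-size binning splits the span between the minimum and maximum into equally wide intervals and counts the values in each. A minimal pure-Python sketch of the idea (the function name is hypothetical, and the choice to place the maximum in the last bin is an assumption of this sketch):

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equally wide intervals between min and max."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:
        # Degenerate case: every value is identical, so use the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum would land at index `bins`, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


print(fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5))
```

With a column this skewed, most of the 50 bins stay nearly empty and the first few hold almost all observations.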

Common values

Value  Count  Frequency (%)
0  2178  21.8%
250000  26  0.3%
500000  20  0.2%
484000  20  0.2%
100000  14  0.1%
242000  14  0.1%
300000  12  0.1%
3000000  9  0.1%
10000000  9  0.1%
200000  9  0.1%
Other values (7466)  7689  76.9%

Minimum 10 values

Value  Count  Frequency (%)
-4090240000  1  < 0.1%
-389001283  1  < 0.1%
-304675700  1  < 0.1%
-275232430  1  < 0.1%
-242649140  1  < 0.1%
-160448120  1  < 0.1%
-158111490  1  < 0.1%
-133850640  1  < 0.1%
-130567060  1  < 0.1%
-114204475  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
11736728832  1  < 0.1%
8111691181  1  < 0.1%
6810387320  1  < 0.1%
5628446971  1  < 0.1%
5253154921  1  < 0.1%
5244582330  1  < 0.1%
5182339176  1  < 0.1%
4836080065  1  < 0.1%
4376421585  1  < 0.1%
4309876602  1  < 0.1%

Interactions


Correlations

Variable  비용명  금액
비용명  1.000  0.458
금액  0.458  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  아파트명  아파트코드  비용명  년월일  금액
4376  올림픽파크한양수자인  A10027354  관리비미수금  202009  42651720
65026  염창동아3차  A15786227  청소비충당부채  202009  0
42338  공릉두산힐스빌  A13980415  선급비용  202009  12757370
69253  은평뉴타운구파발10단지2관리  A41279928  선수전기료  202009  0
64923  염창한마음삼성  A15786118  수선유지비충당부채  202009  6494760
22229  성수금호3차  A13311101  비품  202009  1827000
13793  래미안아름숲  A13002002  예수금  202009  3403625
66626  목동삼성쉐르빌2차  A15807601  수선유지비충당부채  202009  299375
21915  행당대림  A13307204  기타유형자산감가상각누계액  202009  0
37162  오금현대백조  A13813006  퇴직급여충당예금  202009  0

Index  아파트명  아파트코드  비용명  년월일  금액
14590  휘경동일스위트리버  A13009206  퇴직급여충당예금  202009  57015648
7020  명륜아남1차  A11052201  미수금  202009  2043000
30062  역삼래미안  A13592706  기타충당부채  202009  0
21774  서울숲더샵  A13307003  미부과관리비  202009  222021294
38481  잠실푸르지오월드마크  A13872503  단기보증금  202009  9430000
47291  수유벽산  A14207203  장기수선충당예금  202009  525600555
58175  가산삼익아파트  A15380101  현금  202009  506348
8014  홍은풍림아이원  A12010202  연차수당충당부채  202009  1950330
57172  구로주공  A15286809  기타유형자산감가상각누계액  202009  -1111000
5065  인왕산2차아이파크아파트  A10027708  선수관리비  202009  32962000