Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
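
The derived figures in this block follow from the raw counts. The sketch below recomputes them under the assumption that percentages are taken over all cells (rows × columns) or all rows, and that the average record size is total memory divided by the number of observations. The function name is illustrative and not part of ydata-profiling.

```python
def dataset_summary(n_rows, n_cols, missing_cells, duplicate_rows, total_bytes):
    """Recompute the derived overview figures from the raw counts."""
    n_cells = n_rows * n_cols
    return {
        "missing_pct": 100.0 * missing_cells / n_cells,
        "duplicate_pct": 100.0 * duplicate_rows / n_rows,
        "avg_record_bytes": total_bytes / n_rows,
    }

# Values copied from the overview; 488.3 KiB is converted to bytes.
summary = dataset_summary(
    n_rows=10_000, n_cols=5, missing_cells=0, duplicate_rows=0,
    total_bytes=488.3 * 1024,
)
print(summary)
```

Rounded to one decimal place, the average record size comes out at the 50.0 B reported above.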

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202001" (Constant)
금액 has 2008 (20.1%) zeros (Zeros)
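
As a rough illustration of how such alerts can be derived, the sketch below flags a column as constant when it has exactly one distinct value and reports zeros when their share exceeds a threshold. The helper name and the 10% threshold are assumptions made for this example; they are not taken from ydata-profiling.

```python
def column_alerts(name, values, zeros_threshold=0.10):
    """Return alert strings for one column (hypothetical helper, not library code)."""
    alerts = []
    if not values:
        return alerts
    # Constant: every row holds the same value.
    if len(set(values)) == 1:
        alerts.append(f'{name} has constant value "{values[0]}"')
    # Zeros: share of rows equal to 0 exceeds the assumed threshold.
    zeros = sum(1 for v in values if v == 0)
    share = zeros / len(values)
    if share > zeros_threshold:
        alerts.append(f"{name} has {zeros} ({share:.1%}) zeros")
    return alerts

print(column_alerts("년월일", ["202001"] * 4))
print(column_alerts("금액", [0, 5, 0, 7, 9]))
```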

Reproduction

Analysis started: 2024-05-11 06:00:46.318136
Analysis finished: 2024-05-11 06:00:47.232889
Duration: 0.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
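
The reported duration can be checked from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption that matches the timestamps as printed here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps above

started = datetime.strptime("2024-05-11 06:00:46.318136", FMT)
finished = datetime.strptime("2024-05-11 06:00:47.232889", FMT)

# Elapsed wall-clock time in seconds between start and finish.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```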

Variables

아파트명
Text

Distinct: 2000
Distinct (%): 20.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1868
Min length: 2
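
Length statistics of this kind can be reproduced by measuring each value in Unicode code points. The sketch below does this for the five sample rows of this variable. The helper name is illustrative, and it assumes the strings are stored in precomposed (NFC) form, so each Hangul syllable counts as one character.

```python
from statistics import mean, median

def length_summary(values):
    """Summarise string lengths, counted in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The five sample rows listed for this variable.
sample = ["가락상아1차", "북한산힐스테이트3차", "방학동부센트레빌", "자양현대", "상계주공6단지"]
summary = length_summary(sample)
print(summary)
```

Over the full column, the mean length equals total characters divided by observations: 71868 / 10000 = 7.1868.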

Characters and Unicode

Total characters: 71868
Distinct characters: 428
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
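
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample rows of this variable. In its two-letter codes, `Lo` corresponds to "Other Letter" and `Nd` to "Decimal Number". The standard library does not expose the script or block properties, so those are not covered here. The helper name is illustrative, and the strings are assumed to be in precomposed (NFC) form.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

sample = ["가락상아1차", "북한산힐스테이트3차", "방학동부센트레빌", "자양현대", "상계주공6단지"]
counts = category_counts(sample)
print(counts)
```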

Unique

Unique: 72
Unique (%): 0.7%

Sample

1st row: 가락상아1차
2nd row: 북한산힐스테이트3차
3rd row: 방학동부센트레빌
4th row: 자양현대
5th row: 상계주공6단지

Value | Count | Frequency (%)
아파트 | 98 | 0.9%
래미안 | 32 | 0.3%
신반포 | 15 | 0.1%
서울숲힐스테이트 | 14 | 0.1%
신동아파밀리에 | 14 | 0.1%
신도림현대 | 14 | 0.1%
2단지 | 13 | 0.1%
현대 | 13 | 0.1%
홍제원 | 13 | 0.1%
신당남산타운(분양 | 13 | 0.1%
Other values (2059) | 10272 | 97.7%
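
The counts above appear to be taken over whitespace-separated tokens rather than whole values (the fragment "신당남산타운(분양" suggests a split at a space). A minimal sketch of that kind of count follows. The lowercasing step and the helper name are assumptions, and the input names are made up for illustration.

```python
from collections import Counter

def word_frequencies(values):
    """Count lowercased, whitespace-separated tokens; return (word, count, pct) rows."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    total = sum(counts.values())
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common()]

# Illustrative inputs only; these are not rows from the dataset.
freqs = word_frequencies(["래미안 아파트", "현대 아파트", "래미안 2단지"])
for word, n, pct in freqs:
    print(f"{word} | {n} | {pct:.1f}%")
```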

Most occurring characters

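Value | Count | Frequency (%)
[not captured] | 2207 | 3.1%
[not captured] | 2206 | 3.1%
[not captured] | 1984 | 2.8%
[not captured] | 1928 | 2.7%
[not captured] | 1749 | 2.4%
[not captured] | 1628 | 2.3%
[not captured] | 1566 | 2.2%
[not captured] | 1519 | 2.1%
[not captured] | 1491 | 2.1%
[not captured] | 1289 | 1.8%
Other values (418) | 54301 | 75.6%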

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65999 | 91.8%
Decimal Number | 3802 | 5.3%
Uppercase Letter | 713 | 1.0%
Space Separator | 562 | 0.8%
Lowercase Letter | 321 | 0.4%
Close Punctuation | 131 | 0.2%
Open Punctuation | 131 | 0.2%
Dash Punctuation | 118 | 0.2%
Other Punctuation | 85 | 0.1%
Letter Number | 6 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not captured] | 2207 | 3.3%
[not captured] | 2206 | 3.3%
[not captured] | 1984 | 3.0%
[not captured] | 1928 | 2.9%
[not captured] | 1749 | 2.7%
[not captured] | 1628 | 2.5%
[not captured] | 1566 | 2.4%
[not captured] | 1519 | 2.3%
[not captured] | 1491 | 2.3%
[not captured] | 1289 | 2.0%
Other values (373) | 48432 | 73.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 129 | 18.1%
K | 102 | 14.3%
C | 90 | 12.6%
L | 54 | 7.6%
D | 49 | 6.9%
M | 49 | 6.9%
H | 41 | 5.8%
G | 36 | 5.0%
I | 34 | 4.8%
E | 30 | 4.2%
Other values (7) | 99 | 13.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 183 | 57.0%
i | 30 | 9.3%
l | 28 | 8.7%
v | 21 | 6.5%
s | 13 | 4.0%
k | 12 | 3.7%
w | 11 | 3.4%
c | 8 | 2.5%
a | 5 | 1.6%
g | 5 | 1.6%

Decimal Number
Value | Count | Frequency (%)
1 | 1219 | 32.1%
2 | 1105 | 29.1%
3 | 483 | 12.7%
4 | 257 | 6.8%
5 | 225 | 5.9%
6 | 148 | 3.9%
9 | 108 | 2.8%
7 | 98 | 2.6%
8 | 87 | 2.3%
0 | 72 | 1.9%

Other Punctuation
Value | Count | Frequency (%)
, | 76 | 89.4%
. | 9 | 10.6%

Space Separator
Value | Count | Frequency (%)
(space) | 562 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 131 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 131 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 118 | 100.0%

Letter Number
Value | Count | Frequency (%)
[not captured] | 6 | 100.0%

Most occurring scripts

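Value | Count | Frequency (%)
Hangul | 65999 | 91.8%
Common | 4829 | 6.7%
Latin | 1040 | 1.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not captured] | 2207 | 3.3%
[not captured] | 2206 | 3.3%
[not captured] | 1984 | 3.0%
[not captured] | 1928 | 2.9%
[not captured] | 1749 | 2.7%
[not captured] | 1628 | 2.5%
[not captured] | 1566 | 2.4%
[not captured] | 1519 | 2.3%
[not captured] | 1491 | 2.3%
[not captured] | 1289 | 2.0%
Other values (373) | 48432 | 73.4%

Latin
Value | Count | Frequency (%)
e | 183 | 17.6%
S | 129 | 12.4%
K | 102 | 9.8%
C | 90 | 8.7%
L | 54 | 5.2%
D | 49 | 4.7%
M | 49 | 4.7%
H | 41 | 3.9%
G | 36 | 3.5%
I | 34 | 3.3%
Other values (19) | 273 | 26.2%

Common
Value | Count | Frequency (%)
1 | 1219 | 25.2%
2 | 1105 | 22.9%
(space) | 562 | 11.6%
3 | 483 | 10.0%
4 | 257 | 5.3%
5 | 225 | 4.7%
6 | 148 | 3.1%
) | 131 | 2.7%
( | 131 | 2.7%
- | 118 | 2.4%
Other values (6) | 450 | 9.3%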

Most occurring blocks

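Value | Count | Frequency (%)
Hangul | 65999 | 91.8%
ASCII | 5863 | 8.2%
Number Forms | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[not captured] | 2207 | 3.3%
[not captured] | 2206 | 3.3%
[not captured] | 1984 | 3.0%
[not captured] | 1928 | 2.9%
[not captured] | 1749 | 2.7%
[not captured] | 1628 | 2.5%
[not captured] | 1566 | 2.4%
[not captured] | 1519 | 2.3%
[not captured] | 1491 | 2.3%
[not captured] | 1289 | 2.0%
Other values (373) | 48432 | 73.4%

ASCII
Value | Count | Frequency (%)
1 | 1219 | 20.8%
2 | 1105 | 18.8%
(space) | 562 | 9.6%
3 | 483 | 8.2%
4 | 257 | 4.4%
5 | 225 | 3.8%
e | 183 | 3.1%
6 | 148 | 2.5%
) | 131 | 2.2%
( | 131 | 2.2%
Other values (34) | 1419 | 24.2%

Number Forms
Value | Count | Frequency (%)
[not captured] | 6 | 100.0%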

아파트코드
Text

Distinct: 2006
Distinct (%): 20.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 72
Unique (%): 0.7%

Sample

1st row: A13813004
2nd row: A12204004
3rd row: A13272102
4th row: A14319003
5th row: A13920707

Value | Count | Frequency (%)
a13378001 | 14 | 0.1%
a14377402 | 13 | 0.1%
a12078704 | 13 | 0.1%
a13483002 | 13 | 0.1%
a10045302 | 13 | 0.1%
a13186708 | 12 | 0.1%
a13986306 | 12 | 0.1%
a10027375 | 12 | 0.1%
a13410002 | 12 | 0.1%
a15786222 | 11 | 0.1%
Other values (1996) | 9875 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18167 | 20.2%
1 | 17539 | 19.5%
A | 9990 | 11.1%
3 | 8865 | 9.8%
2 | 8328 | 9.3%
5 | 6236 | 6.9%
8 | 5867 | 6.5%
7 | 4958 | 5.5%
4 | 3830 | 4.3%
6 | 3353 | 3.7%
Other values (2) | 2867 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18167 | 22.7%
1 | 17539 | 21.9%
3 | 8865 | 11.1%
2 | 8328 | 10.4%
5 | 6236 | 7.8%
8 | 5867 | 7.3%
7 | 4958 | 6.2%
4 | 3830 | 4.8%
6 | 3353 | 4.2%
9 | 2857 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 9990 | 99.9%
B | 10 | 0.1%
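
The category totals (10000 uppercase letters and 80000 digits over 10000 values of fixed length 9) are consistent with every code being one uppercase letter followed by eight digits. The sketch below checks that pattern on the five sample rows. The regular expression is an inference from these statistics, not a documented format of the dataset.

```python
import re

# Assumed layout: one ASCII uppercase letter, then exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")

def is_valid_code(code):
    """Return True when the whole string matches the assumed code layout."""
    return CODE_PATTERN.fullmatch(code) is not None

sample = ["A13813004", "A12204004", "A13272102", "A14319003", "A13920707"]
print(all(is_valid_code(code) for code in sample))
```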

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18167 | 22.7%
1 | 17539 | 21.9%
3 | 8865 | 11.1%
2 | 8328 | 10.4%
5 | 6236 | 7.8%
8 | 5867 | 7.3%
7 | 4958 | 6.2%
4 | 3830 | 4.8%
6 | 3353 | 4.2%
9 | 2857 | 3.6%

Latin
Value | Count | Frequency (%)
A | 9990 | 99.9%
B | 10 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18167 | 20.2%
1 | 17539 | 19.5%
A | 9990 | 11.1%
3 | 8865 | 9.8%
2 | 8328 | 9.3%
5 | 6236 | 6.9%
8 | 5867 | 6.5%
7 | 4958 | 5.5%
4 | 3830 | 4.3%
6 | 3353 | 3.7%
Other values (2) | 2867 | 3.2%

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9375
Min length: 2

Characters and Unicode

Total characters: 59375
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 선급비용
2nd row: 단기보증금
3rd row: 기타시설운영충당부채
4th row: 미처분이익잉여금
5th row: 미지급금

Value | Count | Frequency (%)
관리비미수금 | 337 | 3.4%
예금 | 334 | 3.3%
선급비용 | 327 | 3.3%
비품 | 325 | 3.2%
미처분이익잉여금 | 318 | 3.2%
공동주택적립금 | 318 | 3.2%
당기순이익 | 310 | 3.1%
연차수당충당부채 | 305 | 3.0%
장기수선충당부채 | 303 | 3.0%
장기수선충당예금 | 303 | 3.0%
Other values (67) | 6820 | 68.2%

Most occurring characters

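Value | Count | Frequency (%)
[not captured] | 4748 | 8.0%
[not captured] | 3686 | 6.2%
[not captured] | 3181 | 5.4%
[not captured] | 3003 | 5.1%
[not captured] | 3001 | 5.1%
[not captured] | 2883 | 4.9%
[not captured] | 2589 | 4.4%
[not captured] | 2341 | 3.9%
[not captured] | 1927 | 3.2%
[not captured] | 1761 | 3.0%
Other values (97) | 30255 | 51.0%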

Most occurring categories

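Value | Count | Frequency (%)
Other Letter | 59375 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not captured] | 4748 | 8.0%
[not captured] | 3686 | 6.2%
[not captured] | 3181 | 5.4%
[not captured] | 3003 | 5.1%
[not captured] | 3001 | 5.1%
[not captured] | 2883 | 4.9%
[not captured] | 2589 | 4.4%
[not captured] | 2341 | 3.9%
[not captured] | 1927 | 3.2%
[not captured] | 1761 | 3.0%
Other values (97) | 30255 | 51.0%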

Most occurring scripts

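Value | Count | Frequency (%)
Hangul | 59375 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not captured] | 4748 | 8.0%
[not captured] | 3686 | 6.2%
[not captured] | 3181 | 5.4%
[not captured] | 3003 | 5.1%
[not captured] | 3001 | 5.1%
[not captured] | 2883 | 4.9%
[not captured] | 2589 | 4.4%
[not captured] | 2341 | 3.9%
[not captured] | 1927 | 3.2%
[not captured] | 1761 | 3.0%
Other values (97) | 30255 | 51.0%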

Most occurring blocks

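Value | Count | Frequency (%)
Hangul | 59375 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[not captured] | 4748 | 8.0%
[not captured] | 3686 | 6.2%
[not captured] | 3181 | 5.4%
[not captured] | 3003 | 5.1%
[not captured] | 3001 | 5.1%
[not captured] | 2883 | 4.9%
[not captured] | 2589 | 4.4%
[not captured] | 2341 | 3.9%
[not captured] | 1927 | 3.2%
[not captured] | 1761 | 3.0%
Other values (97) | 30255 | 51.0%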

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202001: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202001
2nd row: 202001
3rd row: 202001
4th row: 202001
5th row: 202001

Common Values

Value | Count | Frequency (%)
202001 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202001 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7668
Distinct (%): 76.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75489781
Minimum: -4.7775438 × 10^8
Maximum: 1.1661407 × 10^10
Zeros: 2008
Zeros (%): 20.1%
Negative: 386
Negative (%): 3.9%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.7775438 × 10^8
5-th percentile: 0
Q1: 15768
median: 3388148.5
Q3: 36244005
95-th percentile: 3.83988 × 10^8
Maximum: 1.1661407 × 10^10
Range: 1.2139161 × 10^10
Interquartile range (IQR): 36228237
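
The range and interquartile range are simple differences of the quantiles listed above. A short check, using the unrounded minimum and maximum that appear in this report's lists of extreme values for 금액:

```python
# Quartiles as reported above.
q1, q3 = 15_768, 36_244_005
# Unrounded extremes, taken from the smallest and largest listed values.
minimum, maximum = -477_754_375, 11_661_406_948

iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum  # full range

print(iqr)
print(f"{value_range:.7e}")
```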

Descriptive statistics

Standard deviation: 3.081933 × 10^8
Coefficient of variation (CV): 4.0825831
Kurtosis: 400.26829
Mean: 75489781
Median Absolute Deviation (MAD): 3388148.5
Skewness: 15.749134
Sum: 7.5489781 × 10^11
Variance: 9.4983113 × 10^16
Monotonicity: Not monotonic
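
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks them against the rounded figures printed here, so agreement is only expected up to rounding.

```python
mean = 75_489_781        # reported mean (as displayed)
std = 3.081933e8         # reported standard deviation
n = 10_000               # number of observations

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance
total = mean * n         # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```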
[Figure: Histogram with fixed size bins (bins=50)]
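
A fixed-size-bin histogram divides the span from minimum to maximum into equal-width intervals. The sketch below is a plain-Python illustration, assuming the bins cover exactly [min, max] and that the maximum is counted in the last bin. It is not the plotting code used by the report. Under that assumption, the reported range of about 1.2139161 × 10^10 over 50 bins would give a bin width of roughly 2.43 × 10^8.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # Degenerate case: all values identical, so everything lands in one bin.
        counts[0] = len(values)
        return counts, [lo] * (bins + 1)
    width = (hi - lo) / bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# 금액 values from the first ten rows shown in the Sample section; 5 bins for brevity.
amounts = [598280, 37310250, 1206774, 0, 0, 250000, 22549640, 10127562, 702482, 3376390]
counts, edges = fixed_width_histogram(amounts, bins=5)
print(counts)
```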

Common values
Value | Count | Frequency (%)
0 | 2008 | 20.1%
500000 | 33 | 0.3%
250000 | 31 | 0.3%
300000 | 16 | 0.2%
5000 | 14 | 0.1%
1000000 | 13 | 0.1%
200000 | 12 | 0.1%
484000 | 10 | 0.1%
242000 | 9 | 0.1%
30000000 | 9 | 0.1%
Other values (7658) | 7845 | 78.5%

Minimum 10 values
Value | Count | Frequency (%)
-477754375 | 1 | < 0.1%
-302145700 | 1 | < 0.1%
-282000000 | 1 | < 0.1%
-205956340 | 1 | < 0.1%
-177053510 | 1 | < 0.1%
-161481980 | 1 | < 0.1%
-149282800 | 1 | < 0.1%
-145971370 | 1 | < 0.1%
-136095880 | 1 | < 0.1%
-127648010 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
11661406948 | 1 | < 0.1%
8854326575 | 1 | < 0.1%
7909385769 | 1 | < 0.1%
6901374795 | 1 | < 0.1%
6106284701 | 1 | < 0.1%
5880083757 | 1 | < 0.1%
5653287022 | 1 | < 0.1%
5330459540 | 1 | < 0.1%
4909231012 | 1 | < 0.1%
4260978071 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]
 | 비용명 | 금액
비용명 | 1.000 | 0.374
금액 | 0.374 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

Sample

First rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
32806 | 가락상아1차 | A13813004 | 선급비용 | 202001 | 598280
10452 | 북한산힐스테이트3차 | A12204004 | 단기보증금 | 202001 | 37310250
17506 | 방학동부센트레빌 | A13272102 | 기타시설운영충당부채 | 202001 | 1206774
42683 | 자양현대 | A14319003 | 미처분이익잉여금 | 202001 | 0
35769 | 상계주공6단지 | A13920707 | 미지급금 | 202001 | 0
45472 | 양평경남1차 | A15010302 | 전신전화가입권 | 202001 | 250000
45199 | 문래현대5차아파트 | A15009504 | 퇴직급여충당예금 | 202001 | 22549640
36450 | 중계주공7단지 | A13922910 | 공동주택적립금 | 202001 | 10127562
7248 | 홍제원현대임대 | A12078707 | 당기순이익 | 202001 | 702482
35891 | 상계주공10단지 | A13920804 | 선수전기료 | 202001 | 3376390

Last rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
17535 | 도봉서원제2 | A13275302 | 예금 | 202001 | 131330121
24917 | 도곡1차아이파크 | A13527007 | 세대배부용비품 | 202001 | 593000
37431 | 공릉동신 | A13980411 | 연차수당충당부채 | 202001 | 5691720
23934 | 청담삼환 | A13510201 | 가지급금 | 202001 | 67950
13747 | 청량리미주 | A13086705 | 장기수선충당부채 | 202001 | 1611998025
48141 | 봉천두산3단지 | A15178203 | 주차장충당부채 | 202001 | 0
38607 | 상계현대3차 | A13983712 | 당기순이익 | 202001 | 7564626
21402 | 성내삼성 | A13403101 | 관리비예치금 | 202001 | 192150000
4427 | e편한세상마포리버파크 | A10028006 | 미수금 | 202001 | 0
24906 | 도곡1차아이파크 | A13527007 | 연차수당충당부채 | 202001 | 14075870