Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
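The average record size follows from the total memory size and the number of observations. The short check below is only a sketch that reuses the figures reported above; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the Dataset statistics section.
total_kib = 488.3      # total size in memory, in KiB
n_rows = 10_000        # number of observations
n_cols = 5             # number of variables

BYTES_PER_KIB = 1024   # 1 KiB is 1024 bytes by definition

# Average bytes per record = total bytes / number of rows.
avg_record_bytes = total_kib * BYTES_PER_KIB / n_rows
# Total number of cells, the denominator for "Missing cells (%)".
total_cells = n_rows * n_cols

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"total cells: {total_cells}")
```

Rounded to one decimal place, the result agrees with the reported 50.0 B.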

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202107" (Constant)
금액 has 2400 (24.0%) zeros (Zeros)
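The two alerts boil down to simple column checks. The sketch below applies them to the first ten rows of the Sample section. It is not the profiler's implementation: the helper names `is_constant` and `zero_fraction` are hypothetical, and the report does not state the thresholds the tool uses to raise an alert.

```python
# (년월일, 금액) pairs taken from the first ten rows of the Sample section.
rows = [
    ("202107", 0), ("202107", 0), ("202107", 133115), ("202107", 3465073),
    ("202107", 4224914), ("202107", -60683330), ("202107", 0),
    ("202107", 612000), ("202107", 56280922), ("202107", 1231293),
]

def is_constant(values):
    """Return True when the column holds exactly one distinct value."""
    return len(set(values)) == 1

def zero_fraction(values):
    """Return the share of entries that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)

dates = [d for d, _ in rows]
amounts = [a for _, a in rows]

constant_flag = is_constant(dates)    # 년월일 check
zeros_share = zero_fraction(amounts)  # 금액 check

print(constant_flag, f"{zeros_share:.1%}")
```

On this ten-row subset the zero share differs from the 24.0% reported for the full 10,000 rows, as expected for a small sample.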

Reproduction

Analysis started: 2024-05-11 05:58:55.118802
Analysis finished: 2024-05-11 05:58:56.164657
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2216
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.3128
Min length: 2

Characters and Unicode

Total characters: 73128
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
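As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values of this variable. It is not necessarily how the profiler computes its tables. The standard library exposes general categories but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable.
samples = [
    "상암월드컵파크9단지",
    "신길삼성",
    "래미안베라힐즈아파트",
    "길동우성1차",
    "송파해모로아파트",
]

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter) for Hangul syllables, "Nd" (Decimal Number) for digits.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

"Lo" dominates here, in line with the report's "Other Letter" share of 91.5% for the full column.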

Unique

Unique: 120
Unique (%): 1.2%

Sample

1st row: 상암월드컵파크9단지
2nd row: 신길삼성
3rd row: 래미안베라힐즈아파트
4th row: 길동우성1차
5th row: 송파해모로아파트
Value  Count  Frequency (%)
아파트  166  1.6%
래미안  39  0.4%
e편한세상  29  0.3%
고덕  21  0.2%
아이파크  19  0.2%
은평뉴타운상림마을6단지  14  0.1%
푸르지오  13  0.1%
해모로  13  0.1%
신답극동아파트  13  0.1%
중계그린  13  0.1%
Other values (2283)  10324  96.8%

Most occurring characters

Value  Count  Frequency (%)
(lost)  2459  3.4%
(lost)  2356  3.2%
(lost)  2206  3.0%
(lost)  1892  2.6%
(lost)  1745  2.4%
(lost)  1730  2.4%
(lost)  1504  2.1%
(lost)  1476  2.0%
(lost)  1462  2.0%
(lost)  1339  1.8%
Other values (425)  54959  75.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66918  91.5%
Decimal Number  3793  5.2%
Uppercase Letter  751  1.0%
Space Separator  732  1.0%
Lowercase Letter  359  0.5%
Close Punctuation  151  0.2%
Open Punctuation  151  0.2%
Other Punctuation  136  0.2%
Dash Punctuation  132  0.2%
Letter Number  5  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  2459  3.7%
(lost)  2356  3.5%
(lost)  2206  3.3%
(lost)  1892  2.8%
(lost)  1745  2.6%
(lost)  1730  2.6%
(lost)  1504  2.2%
(lost)  1476  2.2%
(lost)  1462  2.2%
(lost)  1339  2.0%
Other values (380)  48749  72.8%
Uppercase Letter
Value  Count  Frequency (%)
S  123  16.4%
K  98  13.0%
C  96  12.8%
M  63  8.4%
D  63  8.4%
L  58  7.7%
H  44  5.9%
I  40  5.3%
E  34  4.5%
V  31  4.1%
Other values (7)  101  13.4%
Lowercase Letter
Value  Count  Frequency (%)
e  203  56.5%
i  33  9.2%
l  30  8.4%
v  20  5.6%
s  19  5.3%
k  16  4.5%
w  13  3.6%
c  8  2.2%
h  7  1.9%
a  5  1.4%
Decimal Number
Value  Count  Frequency (%)
1  1175  31.0%
2  1117  29.4%
3  486  12.8%
4  265  7.0%
5  215  5.7%
6  162  4.3%
9  107  2.8%
7  102  2.7%
8  89  2.3%
0  75  2.0%
Other Punctuation
Value  Count  Frequency (%)
,  110  80.9%
.  26  19.1%
Space Separator
Value  Count  Frequency (%)
(space)  732  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  151  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  151  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  132  100.0%
Letter Number
Value  Count  Frequency (%)
(lost)  5  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66918  91.5%
Common  5095  7.0%
Latin  1115  1.5%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  2459  3.7%
(lost)  2356  3.5%
(lost)  2206  3.3%
(lost)  1892  2.8%
(lost)  1745  2.6%
(lost)  1730  2.6%
(lost)  1504  2.2%
(lost)  1476  2.2%
(lost)  1462  2.2%
(lost)  1339  2.0%
Other values (380)  48749  72.8%
Latin
Value  Count  Frequency (%)
e  203  18.2%
S  123  11.0%
K  98  8.8%
C  96  8.6%
M  63  5.7%
D  63  5.7%
L  58  5.2%
H  44  3.9%
I  40  3.6%
E  34  3.0%
Other values (19)  293  26.3%
Common
Value  Count  Frequency (%)
1  1175  23.1%
2  1117  21.9%
(space)  732  14.4%
3  486  9.5%
4  265  5.2%
5  215  4.2%
6  162  3.2%
)  151  3.0%
(  151  3.0%
-  132  2.6%
Other values (6)  509  10.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66918  91.5%
ASCII  6205  8.5%
Number Forms  5  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  2459  3.7%
(lost)  2356  3.5%
(lost)  2206  3.3%
(lost)  1892  2.8%
(lost)  1745  2.6%
(lost)  1730  2.6%
(lost)  1504  2.2%
(lost)  1476  2.2%
(lost)  1462  2.2%
(lost)  1339  2.0%
Other values (380)  48749  72.8%
ASCII
Value  Count  Frequency (%)
1  1175  18.9%
2  1117  18.0%
(space)  732  11.8%
3  486  7.8%
4  265  4.3%
5  215  3.5%
e  203  3.3%
6  162  2.6%
)  151  2.4%
(  151  2.4%
Other values (34)  1548  24.9%
Number Forms
Value  Count  Frequency (%)
(lost)  5  100.0%

아파트코드 (apartment code)
Text

Distinct: 2221
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 120
Unique (%): 1.2%

Sample

1st row: A12179504
2nd row: A15005603
3rd row: A10025846
4th row: A13481001
5th row: A10025632
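Every value of this variable is exactly 9 characters long, and the five samples all have one uppercase letter followed by eight digits. A format check along those lines might look like the sketch below. The pattern is an assumption inferred from the samples and the length statistics, not a specification published with the dataset.

```python
import re

# Assumed code shape: one uppercase ASCII letter followed by eight digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")

# The five sample values listed above.
samples = ["A12179504", "A15005603", "A10025846", "A13481001", "A10025632"]

# fullmatch requires the whole string to fit the pattern.
all_valid = all(CODE_PATTERN.fullmatch(code) is not None for code in samples)

print(all_valid)
```

Values that fail such a check would be candidates for manual review rather than automatic rejection, since the pattern is only inferred.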
Value  Count  Frequency (%)
a13080401  13  0.1%
a13986306  13  0.1%
a13285406  12  0.1%
a15080501  12  0.1%
a13607101  12  0.1%
a12127005  11  0.1%
a13671209  11  0.1%
a11052201  11  0.1%
a11081302  11  0.1%
a13606102  11  0.1%
Other values (2211)  9883  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18428  20.5%
1  17563  19.5%
A  9995  11.1%
3  8797  9.8%
2  8215  9.1%
5  6287  7.0%
8  5554  6.2%
7  4734  5.3%
4  3984  4.4%
6  3431  3.8%
Other values (2)  3012  3.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18428  23.0%
1  17563  22.0%
3  8797  11.0%
2  8215  10.3%
5  6287  7.9%
8  5554  6.9%
7  4734  5.9%
4  3984  5.0%
6  3431  4.3%
9  3007  3.8%
Uppercase Letter
Value  Count  Frequency (%)
A  9995  > 99.9%
B  5  < 0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18428  23.0%
1  17563  22.0%
3  8797  11.0%
2  8215  10.3%
5  6287  7.9%
8  5554  6.9%
7  4734  5.9%
4  3984  5.0%
6  3431  4.3%
9  3007  3.8%
Latin
Value  Count  Frequency (%)
A  9995  > 99.9%
B  5  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18428  20.5%
1  17563  19.5%
A  9995  11.1%
3  8797  9.8%
2  8215  9.1%
5  6287  7.0%
8  5554  6.2%
7  4734  5.3%
4  3984  4.4%
6  3431  3.8%
Other values (2)  3012  3.3%

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 10
Mean length: 5.9722
Min length: 2

Characters and Unicode

Total characters: 59722
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 단기보증금
2nd row: 공동주택적립금예금
3rd row: 기타충당부채
4th row: 수선유지비충당부채
5th row: 연차수당충당부채
Value  Count  Frequency (%)
예금  330  3.3%
당기순이익  321  3.2%
퇴직급여충당부채  319  3.2%
관리비미수금  311  3.1%
예수금  309  3.1%
미처분이익잉여금  309  3.1%
연차수당충당부채  306  3.1%
선급비용  304  3.0%
공동주택적립금  292  2.9%
미부과관리비  291  2.9%
Other values (67)  6908  69.1%

Most occurring characters

Value  Count  Frequency (%)
(lost)  4644  7.8%
(lost)  3791  6.3%
(lost)  3122  5.2%
(lost)  3017  5.1%
(lost)  3005  5.0%
(lost)  2911  4.9%
(lost)  2607  4.4%
(lost)  2417  4.0%
(lost)  1852  3.1%
(lost)  1791  3.0%
Other values (97)  30565  51.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  59722  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(lost)  4644  7.8%
(lost)  3791  6.3%
(lost)  3122  5.2%
(lost)  3017  5.1%
(lost)  3005  5.0%
(lost)  2911  4.9%
(lost)  2607  4.4%
(lost)  2417  4.0%
(lost)  1852  3.1%
(lost)  1791  3.0%
Other values (97)  30565  51.2%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  59722  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(lost)  4644  7.8%
(lost)  3791  6.3%
(lost)  3122  5.2%
(lost)  3017  5.1%
(lost)  3005  5.0%
(lost)  2911  4.9%
(lost)  2607  4.4%
(lost)  2417  4.0%
(lost)  1852  3.1%
(lost)  1791  3.0%
Other values (97)  30565  51.2%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  59722  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(lost)  4644  7.8%
(lost)  3791  6.3%
(lost)  3122  5.2%
(lost)  3017  5.1%
(lost)  3005  5.0%
(lost)  2911  4.9%
(lost)  2607  4.4%
(lost)  2417  4.0%
(lost)  1852  3.1%
(lost)  1791  3.0%
Other values (97)  30565  51.2%

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202107: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202107
2nd row: 202107
3rd row: 202107
4th row: 202107
5th row: 202107

Common Values

Value  Count  Frequency (%)
202107  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
202107  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7314
Distinct (%): 73.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70482554
Minimum: -4.09024 × 10^9
Maximum: 1.1733328 × 10^10
Zeros: 2400
Zeros (%): 24.0%
Negative: 321
Negative (%): 3.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.09024 × 10^9
5-th percentile: 0
Q1: 0
Median: 3017719
Q3: 36069804
95-th percentile: 3.3774445 × 10^8
Maximum: 1.1733328 × 10^10
Range: 1.5823568 × 10^10
Interquartile range (IQR): 36069804
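The range and interquartile range are derived from the other quantiles. The sketch below recomputes them from the reported values, using the exact minimum and maximum from the extreme-value listings further down. Variable names are illustrative only.

```python
# Exact extremes as listed among the smallest and largest values of 금액.
minimum = -4_090_240_000
maximum = 11_733_327_782

# Quartiles as reported in the quantile statistics.
q1 = 0
q3 = 36_069_804

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"range: {value_range:.7e}")
print(f"IQR:   {iqr}")
```

Because Q1 is 0, the IQR equals Q3. The bottom quarter of the data sits at or below zero, consistent with the 24.0% zeros and 3.2% negatives.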

Descriptive statistics

Standard deviation: 2.9389584 × 10^8
Coefficient of variation (CV): 4.1697672
Kurtosis: 394.00447
Mean: 70482554
Median Absolute Deviation (MAD): 3017719
Skewness: 14.487815
Sum: 7.0482554 × 10^11
Variance: 8.6374764 × 10^16
Monotonicity: Not monotonic
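Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported figures against each other. The inputs are the rounded values shown in the report, so small discrepancies in the last digits are expected.

```python
# Rounded figures as reported above.
mean = 70_482_554
std = 2.9389584e8
n = 10_000  # number of observations

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the square of the standard deviation
total = mean * n      # sum of all values

print(f"CV:       {cv:.7f}")
print(f"variance: {variance:.7e}")
print(f"sum:      {total:.7e}")
```

A CV above 4, together with skewness of about 14.5 and kurtosis of about 394, points to a heavily right-skewed distribution dominated by a few very large amounts.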
Histogram with fixed size bins (bins=50)
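To see why a 50-bin histogram of this variable is dominated by one or two bars, the sketch below assigns a few reported quantiles to equal-width bins. It assumes the bins span the minimum to the maximum with equal width. The exact bin edges used by the plotting code are not given in the report, and `bin_index` is a hypothetical helper.

```python
# Exact extremes of 금액 as listed in the report.
minimum = -4_090_240_000
maximum = 11_733_327_782
BINS = 50

# Width of each equal-size bin over [minimum, maximum].
bin_width = (maximum - minimum) / BINS

def bin_index(value):
    """Return the 0-based index of the equal-width bin containing value."""
    if value == maximum:
        return BINS - 1  # the right edge belongs to the last bin
    return int((value - minimum) // bin_width)

zero_bin = bin_index(0)             # 24.0% of the rows are exactly zero
median_bin = bin_index(3_017_719)   # reported median
q3_bin = bin_index(36_069_804)      # reported Q3

print(f"bin width: {bin_width:.0f}")
print(zero_bin, median_bin, q3_bin)
```

Each bin is roughly 316 million wide, so zero and the median land in the same bin and Q3 in the next one. Under these assumptions at least three quarters of the observations fall in just two bins; a log-scaled or clipped axis would be needed to see the bulk of the data.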
Value  Count  Frequency (%)
0  2400  24.0%
500000  35  0.4%
250000  17  0.2%
2000000  13  0.1%
300000  11  0.1%
242000  11  0.1%
484000  11  0.1%
30000000  10  0.1%
1000000  9  0.1%
100000  9  0.1%
Other values (7304)  7474  74.7%

Minimum 10 values

Value  Count  Frequency (%)
-4090240000  1  < 0.1%
-2461078406  1  < 0.1%
-365893768  1  < 0.1%
-304675700  1  < 0.1%
-280886952  1  < 0.1%
-251304174  1  < 0.1%
-211068660  1  < 0.1%
-169641277  1  < 0.1%
-148472511  1  < 0.1%
-136094650  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
11733327782  1  < 0.1%
7759211707  1  < 0.1%
6585060177  1  < 0.1%
5976030477  1  < 0.1%
5505081074  1  < 0.1%
5047155569  1  < 0.1%
4964672638  1  < 0.1%
4351681924  1  < 0.1%
3925744107  1  < 0.1%
3733870686  1  < 0.1%

Interactions


Correlations

(variable)  비용명  금액
비용명  1.000  0.437
금액  0.437  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  아파트명  아파트코드  비용명  년월일  금액
11636  상암월드컵파크9단지  A12179504  단기보증금  202107  0
52193  신길삼성  A15005603  공동주택적립금예금  202107  0
3111  래미안베라힐즈아파트  A10025846  기타충당부채  202107  133115
26801  길동우성1차  A13481001  수선유지비충당부채  202107  3465073
2554  송파해모로아파트  A10025632  연차수당충당부채  202107  4224914
41442  월계동현대  A13905105  기타재고자산  202107  -60683330
28730  강남엘에이치1단지  A13519007  기타의비유동부채  202107  0
36592  방배임광3차  A13785001  복리후생비충당부채  202107  612000
55429  봉천은천2단지  A15106103  주차장충당부채  202107  56280922
70985  은평뉴타운제각말5단지제1  A41279923  예수금  202107  1231293

(index)  아파트명  아파트코드  비용명  년월일  금액
530  개포래미안포레스트  A10024564  미수관리비예치금  202107  2460000
59106  궁동우신빌라  A15288301  당기순이익  202107  11292807
62208  동작상떼빌주상복합  A15670001  선수수도료  202107  0
13721  은평뉴타운상림마을6단지 제1아파트  A12220001  비품  202107  21170110
34688  돈암동부센트레빌  A13681303  장기수선충당예금  202107  602383589
5994  공덕파크자이 아파트  A10027748  기타당좌자산  202107  192310
22635  마장신성미소지움  A13305003  퇴직급여충당부채  202107  5791520
64245  염창강변한솔솔파크  A15704025  선급금  202107  11750
29357  우성캐릭터199 아파트  A13527003  기타당좌자산  202107  182080
66869  염창관음삼성  A15786321  관리비미수금  202107  12234540