Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202005" (Constant)
금액 has 1285 (12.8%) zeros (Zeros)
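
The two alerts above come down to simple column-level checks. The following is a minimal sketch, assuming a column counts as constant when it holds a single distinct value and that the zero share is the number of zeros divided by the number of rows. The helper names `is_constant` and `zero_share` are hypothetical and are not part of ydata-profiling's API. The data are the 년월일 and 금액 values of the first ten rows in the Sample section of this report.

```python
# Hypothetical helpers illustrating the two alert checks.
# They are not ydata-profiling's own implementation.


def is_constant(values):
    """Return True when every entry in the column has the same value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# 년월일 and 금액 for the first ten sample rows of this report.
year_month = ["202005"] * 10
amount = [47500, 0, 0, 316190, 0, 1658790, 23620, 40000, 774280, 483380]

print(is_constant(year_month))      # the column holds a single value
print(f"{zero_share(amount):.1%}")  # zero share within this small sample
```

The sample is far smaller than the 10000 profiled rows, so its zero share is not expected to match the 12.8% reported for the full dataset.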

Reproduction

Analysis started: 2024-05-11 06:56:48.977942
Analysis finished: 2024-05-11 06:56:51.308226
Duration: 2.33 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2182
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.2105
Min length: 2

Characters and Unicode

Total characters: 72105
Distinct characters: 432
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
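
As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `category_counts` is a hypothetical helper, and this is not necessarily how the profiler computes its tables. The input is the five sample values listed for this variable.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            # unicodedata.category returns a two-letter code such as
            # "Lo" (Other Letter) or "Nd" (Decimal Number).
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample values of this variable as listed in the report.
names = ["관악드림타운제2", "구로2차순영웰라이빌", "독립문극동", "염창극동", "풍납동아한가람"]
counts = category_counts(names)
print(counts)
```

Hangul syllables fall under "Lo" (Other Letter) and ASCII digits under "Nd" (Decimal Number), which is why those two categories dominate the tables that follow.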

Unique

Unique: 130
Unique (%): 1.3%

Sample

1st row: 관악드림타운제2
2nd row: 구로2차순영웰라이빌
3rd row: 독립문극동
4th row: 염창극동
5th row: 풍납동아한가람
Value | Count | Frequency (%)
아파트 | 138 | 1.3%
래미안 | 27 | 0.3%
아이파크 | 25 | 0.2%
신반포 | 19 | 0.2%
e편한세상 | 15 | 0.1%
은평뉴타운상림마을6단지 | 14 | 0.1%
코오롱하늘채아파트 | 13 | 0.1%
힐스테이트 | 13 | 0.1%
sk뷰 | 13 | 0.1%
홍은현대 | 13 | 0.1%
Other values (2245) | 10304 | 97.3%

Most occurring characters

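Value | Count | Frequency (%)
(not shown) | 2482 | 3.4%
(not shown) | 2378 | 3.3%
(not shown) | 2116 | 2.9%
(not shown) | 1791 | 2.5%
(not shown) | 1777 | 2.5%
(not shown) | 1701 | 2.4%
(not shown) | 1533 | 2.1%
(not shown) | 1480 | 2.1%
(not shown) | 1398 | 1.9%
(not shown) | 1325 | 1.8%
Other values (422) | 54124 | 75.1%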

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66252 | 91.9%
Decimal Number | 3711 | 5.1%
Uppercase Letter | 673 | 0.9%
Space Separator | 651 | 0.9%
Lowercase Letter | 313 | 0.4%
Open Punctuation | 130 | 0.2%
Close Punctuation | 130 | 0.2%
Other Punctuation | 124 | 0.2%
Dash Punctuation | 113 | 0.2%
Letter Number | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 2482 | 3.7%
(not shown) | 2378 | 3.6%
(not shown) | 2116 | 3.2%
(not shown) | 1791 | 2.7%
(not shown) | 1777 | 2.7%
(not shown) | 1701 | 2.6%
(not shown) | 1533 | 2.3%
(not shown) | 1480 | 2.2%
(not shown) | 1398 | 2.1%
(not shown) | 1325 | 2.0%
Other values (376) | 48271 | 72.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 119 | 17.7%
K | 97 | 14.4%
C | 88 | 13.1%
D | 53 | 7.9%
M | 53 | 7.9%
L | 48 | 7.1%
I | 36 | 5.3%
H | 35 | 5.2%
G | 32 | 4.8%
E | 29 | 4.3%
Other values (7) | 83 | 12.3%

Lowercase Letter
Value | Count | Frequency (%)
e | 169 | 54.0%
i | 28 | 8.9%
l | 28 | 8.9%
k | 20 | 6.4%
v | 17 | 5.4%
c | 16 | 5.1%
s | 15 | 4.8%
w | 11 | 3.5%
h | 3 | 1.0%
g | 3 | 1.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1111 | 29.9%
1 | 1105 | 29.8%
3 | 503 | 13.6%
4 | 243 | 6.5%
5 | 209 | 5.6%
6 | 156 | 4.2%
9 | 109 | 2.9%
7 | 97 | 2.6%
8 | 94 | 2.5%
0 | 84 | 2.3%

Other Punctuation
Value | Count | Frequency (%)
, | 97 | 78.2%
. | 27 | 21.8%

Space Separator
Value | Count | Frequency (%)
(space) | 651 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 130 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 130 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 113 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not shown) | 5 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66252 | 91.9%
Common | 4862 | 6.7%
Latin | 991 | 1.4%

Most frequent character per script

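Hangul
Value | Count | Frequency (%)
(not shown) | 2482 | 3.7%
(not shown) | 2378 | 3.6%
(not shown) | 2116 | 3.2%
(not shown) | 1791 | 2.7%
(not shown) | 1777 | 2.7%
(not shown) | 1701 | 2.6%
(not shown) | 1533 | 2.3%
(not shown) | 1480 | 2.2%
(not shown) | 1398 | 2.1%
(not shown) | 1325 | 2.0%
Other values (376) | 48271 | 72.9%

Latin
Value | Count | Frequency (%)
e | 169 | 17.1%
S | 119 | 12.0%
K | 97 | 9.8%
C | 88 | 8.9%
D | 53 | 5.3%
M | 53 | 5.3%
L | 48 | 4.8%
I | 36 | 3.6%
H | 35 | 3.5%
G | 32 | 3.2%
Other values (19) | 261 | 26.3%

Common
Value | Count | Frequency (%)
2 | 1111 | 22.9%
1 | 1105 | 22.7%
(space) | 651 | 13.4%
3 | 503 | 10.3%
4 | 243 | 5.0%
5 | 209 | 4.3%
6 | 156 | 3.2%
( | 130 | 2.7%
) | 130 | 2.7%
- | 113 | 2.3%
Other values (7) | 511 | 10.5%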

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66252 | 91.9%
ASCII | 5848 | 8.1%
Number Forms | 5 | < 0.1%

Most frequent character per block

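Hangul
Value | Count | Frequency (%)
(not shown) | 2482 | 3.7%
(not shown) | 2378 | 3.6%
(not shown) | 2116 | 3.2%
(not shown) | 1791 | 2.7%
(not shown) | 1777 | 2.7%
(not shown) | 1701 | 2.6%
(not shown) | 1533 | 2.3%
(not shown) | 1480 | 2.2%
(not shown) | 1398 | 2.1%
(not shown) | 1325 | 2.0%
Other values (376) | 48271 | 72.9%

ASCII
Value | Count | Frequency (%)
2 | 1111 | 19.0%
1 | 1105 | 18.9%
(space) | 651 | 11.1%
3 | 503 | 8.6%
4 | 243 | 4.2%
5 | 209 | 3.6%
e | 169 | 2.9%
6 | 156 | 2.7%
( | 130 | 2.2%
) | 130 | 2.2%
Other values (35) | 1441 | 24.6%

Number Forms
Value | Count | Frequency (%)
(not shown) | 5 | 100.0%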

아파트코드
Text

Distinct: 2188
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 131
Unique (%): 1.3%

Sample

1st row: A15105503
2nd row: A15284101
3rd row: A12008003
4th row: A15786111
5th row: A13887302
Value | Count | Frequency (%)
a12084504 | 13 | 0.1%
a13881603 | 12 | 0.1%
a14206202 | 12 | 0.1%
a13671208 | 12 | 0.1%
a12008003 | 12 | 0.1%
a10027188 | 12 | 0.1%
a15685702 | 11 | 0.1%
a13707203 | 11 | 0.1%
a15792602 | 11 | 0.1%
a13872502 | 11 | 0.1%
Other values (2178) | 9883 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18416 | 20.5%
1 | 17603 | 19.6%
A | 9989 | 11.1%
3 | 8781 | 9.8%
2 | 8235 | 9.2%
5 | 6348 | 7.1%
8 | 5713 | 6.3%
7 | 4843 | 5.4%
4 | 3710 | 4.1%
6 | 3443 | 3.8%
Other values (2) | 2919 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18416 | 23.0%
1 | 17603 | 22.0%
3 | 8781 | 11.0%
2 | 8235 | 10.3%
5 | 6348 | 7.9%
8 | 5713 | 7.1%
7 | 4843 | 6.1%
4 | 3710 | 4.6%
6 | 3443 | 4.3%
9 | 2908 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18416 | 23.0%
1 | 17603 | 22.0%
3 | 8781 | 11.0%
2 | 8235 | 10.3%
5 | 6348 | 7.9%
8 | 5713 | 7.1%
7 | 4843 | 6.1%
4 | 3710 | 4.6%
6 | 3443 | 4.3%
9 | 2908 | 3.6%

Latin
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18416 | 20.5%
1 | 17603 | 19.6%
A | 9989 | 11.1%
3 | 8781 | 9.8%
2 | 8235 | 9.2%
5 | 6348 | 7.1%
8 | 5713 | 6.3%
7 | 4843 | 5.4%
4 | 3710 | 4.1%
6 | 3443 | 3.8%
Other values (2) | 2919 | 3.2%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8707
Min length: 2

Characters and Unicode

Total characters: 48707
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 광고선전비
2nd row: 시설보수비
3rd row: 자치활동비
4th row: 건강보험료
5th row: 광고료수익
Value | Count | Frequency (%)
퇴직급여 | 240 | 2.4%
수선유지비 | 235 | 2.4%
경비비 | 234 | 2.3%
세대전기료 | 224 | 2.2%
장기수선비 | 221 | 2.2%
입주자대표회의운영비 | 218 | 2.2%
통신비 | 216 | 2.2%
청소비 | 214 | 2.1%
이자수익 | 211 | 2.1%
소독비 | 211 | 2.1%
Other values (76) | 7776 | 77.8%

Most occurring characters

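Value | Count | Frequency (%)
(not shown) | 5414 | 11.1%
(not shown) | 3570 | 7.3%
(not shown) | 2055 | 4.2%
(not shown) | 2012 | 4.1%
(not shown) | 1781 | 3.7%
(not shown) | 1355 | 2.8%
(not shown) | 1052 | 2.2%
(not shown) | 837 | 1.7%
(not shown) | 794 | 1.6%
(not shown) | 761 | 1.6%
Other values (110) | 29076 | 59.7%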

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48707 | 100.0%

Most frequent character per category

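Other Letter
Value | Count | Frequency (%)
(not shown) | 5414 | 11.1%
(not shown) | 3570 | 7.3%
(not shown) | 2055 | 4.2%
(not shown) | 2012 | 4.1%
(not shown) | 1781 | 3.7%
(not shown) | 1355 | 2.8%
(not shown) | 1052 | 2.2%
(not shown) | 837 | 1.7%
(not shown) | 794 | 1.6%
(not shown) | 761 | 1.6%
Other values (110) | 29076 | 59.7%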

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48707 | 100.0%

Most frequent character per script

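Hangul
Value | Count | Frequency (%)
(not shown) | 5414 | 11.1%
(not shown) | 3570 | 7.3%
(not shown) | 2055 | 4.2%
(not shown) | 2012 | 4.1%
(not shown) | 1781 | 3.7%
(not shown) | 1355 | 2.8%
(not shown) | 1052 | 2.2%
(not shown) | 837 | 1.7%
(not shown) | 794 | 1.6%
(not shown) | 761 | 1.6%
Other values (110) | 29076 | 59.7%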

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48707 | 100.0%

Most frequent character per block

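Hangul
Value | Count | Frequency (%)
(not shown) | 5414 | 11.1%
(not shown) | 3570 | 7.3%
(not shown) | 2055 | 4.2%
(not shown) | 2012 | 4.1%
(not shown) | 1781 | 3.7%
(not shown) | 1355 | 2.8%
(not shown) | 1052 | 2.2%
(not shown) | 837 | 1.7%
(not shown) | 794 | 1.6%
(not shown) | 761 | 1.6%
Other values (110) | 29076 | 59.7%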

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202005 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202005
2nd row: 202005
3rd row: 202005
4th row: 202005
5th row: 202005

Common Values

Value | Count | Frequency (%)
202005 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202005 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6935
Distinct (%): 69.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3058755.6
Minimum: -6000000
Maximum: 5.6366563 × 10^8
Zeros: 1285
Zeros (%): 12.8%
Negative: 6
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -6000000
5th percentile: 0
Q1: 71815
Median: 330000
Q3: 1490473
95th percentile: 15499905
Maximum: 5.6366563 × 10^8
Range: 5.6966563 × 10^8
Interquartile range (IQR): 1418658

Descriptive statistics

Standard deviation: 11473506
Coefficient of variation (CV): 3.7510372
Kurtosis: 691.9371
Mean: 3058755.6
Median Absolute Deviation (MAD): 330000
Skewness: 19.019361
Sum: 3.0587556 × 10^10
Variance: 1.3164134 × 10^14
Monotonicity: Not monotonic
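
Several of the figures reported for 금액 are tied together by simple arithmetic. The sketch below recomputes the derived ones from the reported mean, standard deviation, quartiles and extremes. It works from the rounded values printed in this report, so agreement is only expected up to rounding, and the variable names are illustrative.

```python
import math

# Figures copied from this report for the 금액 column.
n = 10000
mean = 3058755.6
std = 11473506
q1, q3 = 71815, 1490473
minimum, maximum = -6000000, 563665630

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum recovered from the mean and row count
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
```

A coefficient of variation near 3.75 together with a skewness of about 19 points to a heavily right-skewed distribution, which matches the gap between the median (330000) and the mean (about 3.06 million).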
Histogram with fixed size bins (bins=50)
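
To make "fixed size bins" concrete, here is a minimal sketch of equal-width binning in plain Python. The function `fixed_width_histogram` is a hypothetical helper, not the plotting code behind the report, and it uses 5 bins on the ten sampled 금액 values from the Sample section rather than the 50 bins over all 10000 rows used above.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: all values are equal, so they share one bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum is placed in the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# 금액 values of the first ten sample rows of this report.
amounts = [47500, 0, 0, 316190, 0, 1658790, 23620, 40000, 774280, 483380]
counts = fixed_width_histogram(amounts, bins=5)
print(counts)
```

With a distribution this skewed, most observations land in the first bin, which is why a fixed-width histogram of 금액 shows one tall bar followed by a long, nearly empty tail.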
Common values
Value | Count | Frequency (%)
0 | 1285 | 12.8%
200000 | 95 | 0.9%
300000 | 62 | 0.6%
100000 | 53 | 0.5%
400000 | 43 | 0.4%
150000 | 34 | 0.3%
350000 | 33 | 0.3%
120000 | 30 | 0.3%
500000 | 27 | 0.3%
600000 | 25 | 0.2%
Other values (6925) | 8313 | 83.1%

Minimum 10 values
Value | Count | Frequency (%)
-6000000 | 1 | < 0.1%
-3410000 | 1 | < 0.1%
-337050 | 1 | < 0.1%
-173530 | 1 | < 0.1%
-156000 | 1 | < 0.1%
-10000 | 1 | < 0.1%
0 | 1285 | 12.8%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
563665630 | 1 | < 0.1%
310456392 | 1 | < 0.1%
240481126 | 1 | < 0.1%
227826890 | 1 | < 0.1%
218609374 | 1 | < 0.1%
186211770 | 1 | < 0.1%
160034000 | 1 | < 0.1%
151082260 | 1 | < 0.1%
136015380 | 1 | < 0.1%
117273446 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]
 | 비용명 | 금액
비용명 | 1.000 | 0.192
금액 | 0.192 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
77127 | 관악드림타운제2 | A15105503 | 광고선전비 | 202005 | 47500
81574 | 구로2차순영웰라이빌 | A15284101 | 시설보수비 | 202005 | 0
10933 | 독립문극동 | A12008003 | 자치활동비 | 202005 | 0
93090 | 염창극동 | A15786111 | 건강보험료 | 202005 | 316190
57033 | 풍납동아한가람 | A13887302 | 광고료수익 | 202005 | 0
76643 | 건영3차아파트 | A15101903 | 음식물처리비 | 202005 | 1658790
8022 | 목동센트럴푸르지오아파트 | A10027849 | 연체료수익 | 202005 | 23620
89665 | 마곡금호어울림 | A15721001 | 기타운영비용 | 202005 | 40000
26048 | 신내8단지두산화성 | A13187201 | 잡비용 | 202005 | 774280
35768 | 강일리버파크10단지 | A13410005 | 국민연금 | 202005 | 483380

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
33127 | 금호1차푸르지오 | A13380602 | 제수당 | 202005 | 603220
13414 | 공덕한화꿈에그린 | A12102002 | 승강기수익 | 202005 | 0
9388 | 신당삼성(분양) | A10045403 | 지급수수료 | 202005 | 8800
54785 | 잠실우성4차 | A13822902 | 세대난방비 | 202005 | 5709220
9954 | 남산롯데캐슬아이리스 | A10088102 | 기타운영비용 | 202005 | 3320250
5184 | 래미안강동팰리스 | A10026852 | 승강기수익 | 202005 | 638000
80286 | 천왕이펜하우스1단지 | A15213006 | 입주자대표회의운영비 | 202005 | 300000
80054 | 오류푸르지오 | A15210209 | 급여 | 202005 | 11163520
18163 | 갈현미미 | A12205001 | 소방안전관리비 | 202005 | 220000
91136 | 화곡2차보람 | A15770101 | 수선유지비 | 202005 | 1211040