Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
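
The average record size appears to be the total in-memory size divided by the number of observations. The sketch below checks that arithmetic using the two figures reported above; the function name `average_record_size_bytes` is illustrative and not part of any library.

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 488.3   # "Total size in memory"
n_observations = 10_000  # "Number of observations"

KIB = 1024  # bytes per kibibyte


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean number of bytes held per row (hypothetical helper)."""
    return total_kib * KIB / n_rows


avg_bytes = average_record_size_bytes(total_size_kib, n_observations)
print(f"{avg_bytes:.1f} B")  # 50.0 B
```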

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "" (Constant)
금액 has 1681 (16.8%) zeros (Zeros)
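
The two alerts correspond to simple column checks: a column is constant when it holds a single distinct value, and the zeros alert reports the share of entries equal to zero. The following is a minimal sketch of such checks, not the profiler's own implementation; `is_constant` and `zero_share` are hypothetical helper names, and the toy columns are made up for illustration.

```python
from typing import Sequence


def is_constant(values: Sequence[object]) -> bool:
    """True when the column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values: Sequence[float]) -> float:
    """Fraction of entries equal to zero (expects a non-empty column)."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy columns (hypothetical data, not the real dataset).
year_month = ["202207"] * 5
amount = [0, 616000, 0, 14600, 100660]

print(is_constant(year_month))       # True
print(f"{zero_share(amount):.1%}")   # 40.0%

# The reported alert for 금액: 1681 zeros out of 10000 observations.
print(f"{1681 / 10000:.1%}")         # 16.8%
```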

Reproduction

Analysis started: 2024-05-11 06:51:51.339466
Analysis finished: 2024-05-11 06:51:53.300714
Duration: 1.96 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2164
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.3976
Min length: 2

Characters and Unicode

Total characters: 73976
Distinct characters: 430
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
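
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code; `category_counts` is a hypothetical helper, and the standard library exposes categories but not scripts or blocks. The two-letter codes map onto the names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Nl` is Letter Number.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values shown for this variable.
sample = "신천장미1차2차"
counts = category_counts(sample)
print(counts)  # Counter({'Lo': 6, 'Nd': 2})

# Punctuation and separators fall into their own categories.
print(unicodedata.category("("))  # Ps (Open Punctuation)
print(unicodedata.category(" "))  # Zs (Space Separator)
```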

Unique

Unique: 113
Unique (%): 1.1%

Sample

1st row: 하계극동건영벽산
2nd row: 중계4단지목화
3rd row: 송천센트레빌
4th row: 상계조합대림
5th row: 신천장미1차2차
Value | Count | Frequency (%)
아파트 | 191 | 1.8%
래미안 | 56 | 0.5%
e편한세상 | 28 | 0.3%
아이파크 | 20 | 0.2%
신반포 | 20 | 0.2%
고덕 | 19 | 0.2%
북한산 | 18 | 0.2%
sk뷰 | 17 | 0.2%
경남아너스빌 | 17 | 0.2%
서초스위트 | 14 | 0.1%
Other values (2245) | 10503 | 96.3%

Most occurring characters

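Value | Count | Frequency (%)
(not recovered) | 2770 | 3.7%
(not recovered) | 2669 | 3.6%
(not recovered) | 2495 | 3.4%
(not recovered) | 1718 | 2.3%
(not recovered) | 1676 | 2.3%
(not recovered) | 1622 | 2.2%
(not recovered) | 1499 | 2.0%
(not recovered) | 1452 | 2.0%
(not recovered) | 1432 | 1.9%
(not recovered) | 1280 | 1.7%
Other values (420) | 55363 | 74.8%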

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67801 | 91.7%
Decimal Number | 3434 | 4.6%
Space Separator | 1006 | 1.4%
Uppercase Letter | 824 | 1.1%
Lowercase Letter | 367 | 0.5%
Open Punctuation | 150 | 0.2%
Close Punctuation | 150 | 0.2%
Dash Punctuation | 125 | 0.2%
Other Punctuation | 115 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

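Other Letter
Value | Count | Frequency (%)
(not recovered) | 2770 | 4.1%
(not recovered) | 2669 | 3.9%
(not recovered) | 2495 | 3.7%
(not recovered) | 1718 | 2.5%
(not recovered) | 1676 | 2.5%
(not recovered) | 1622 | 2.4%
(not recovered) | 1499 | 2.2%
(not recovered) | 1452 | 2.1%
(not recovered) | 1432 | 2.1%
(not recovered) | 1280 | 1.9%
Other values (375) | 49188 | 72.5%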
Uppercase Letter
Value | Count | Frequency (%)
C | 136 | 16.5%
S | 116 | 14.1%
M | 104 | 12.6%
D | 104 | 12.6%
K | 91 | 11.0%
L | 61 | 7.4%
H | 52 | 6.3%
I | 31 | 3.8%
E | 29 | 3.5%
A | 20 | 2.4%
Other values (7) | 80 | 9.7%
Lowercase Letter
Value | Count | Frequency (%)
e | 213 | 58.0%
i | 30 | 8.2%
l | 29 | 7.9%
v | 21 | 5.7%
s | 19 | 5.2%
k | 19 | 5.2%
w | 13 | 3.5%
c | 8 | 2.2%
h | 7 | 1.9%
a | 4 | 1.1%
Decimal Number
Value | Count | Frequency (%)
2 | 1049 | 30.5%
1 | 1009 | 29.4%
3 | 434 | 12.6%
4 | 236 | 6.9%
5 | 218 | 6.3%
6 | 130 | 3.8%
7 | 111 | 3.2%
8 | 88 | 2.6%
9 | 83 | 2.4%
0 | 76 | 2.2%
Other Punctuation
Value | Count | Frequency (%)
, | 96 | 83.5%
. | 19 | 16.5%
Space Separator
Value | Count | Frequency (%)
(space) | 1006 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 150 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 150 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 125 | 100.0%
Letter Number
Value | Count | Frequency (%)
(not recovered) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67801 | 91.7%
Common | 4980 | 6.7%
Latin | 1195 | 1.6%

Most frequent character per script

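Hangul
Value | Count | Frequency (%)
(not recovered) | 2770 | 4.1%
(not recovered) | 2669 | 3.9%
(not recovered) | 2495 | 3.7%
(not recovered) | 1718 | 2.5%
(not recovered) | 1676 | 2.5%
(not recovered) | 1622 | 2.4%
(not recovered) | 1499 | 2.2%
(not recovered) | 1452 | 2.1%
(not recovered) | 1432 | 2.1%
(not recovered) | 1280 | 1.9%
Other values (375) | 49188 | 72.5%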
Latin
Value | Count | Frequency (%)
e | 213 | 17.8%
C | 136 | 11.4%
S | 116 | 9.7%
M | 104 | 8.7%
D | 104 | 8.7%
K | 91 | 7.6%
L | 61 | 5.1%
H | 52 | 4.4%
I | 31 | 2.6%
i | 30 | 2.5%
Other values (19) | 257 | 21.5%
Common
Value | Count | Frequency (%)
2 | 1049 | 21.1%
1 | 1009 | 20.3%
(space) | 1006 | 20.2%
3 | 434 | 8.7%
4 | 236 | 4.7%
5 | 218 | 4.4%
( | 150 | 3.0%
) | 150 | 3.0%
6 | 130 | 2.6%
- | 125 | 2.5%
Other values (6) | 473 | 9.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67801 | 91.7%
ASCII | 6171 | 8.3%
Number Forms | 4 | < 0.1%

Most frequent character per block

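Hangul
Value | Count | Frequency (%)
(not recovered) | 2770 | 4.1%
(not recovered) | 2669 | 3.9%
(not recovered) | 2495 | 3.7%
(not recovered) | 1718 | 2.5%
(not recovered) | 1676 | 2.5%
(not recovered) | 1622 | 2.4%
(not recovered) | 1499 | 2.2%
(not recovered) | 1452 | 2.1%
(not recovered) | 1432 | 2.1%
(not recovered) | 1280 | 1.9%
Other values (375) | 49188 | 72.5%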
ASCII
Value | Count | Frequency (%)
2 | 1049 | 17.0%
1 | 1009 | 16.4%
(space) | 1006 | 16.3%
3 | 434 | 7.0%
4 | 236 | 3.8%
5 | 218 | 3.5%
e | 213 | 3.5%
( | 150 | 2.4%
) | 150 | 2.4%
C | 136 | 2.2%
Other values (34) | 1570 | 25.4%
Number Forms
Value | Count | Frequency (%)
(not recovered) | 4 | 100.0%

아파트코드
Text

Distinct: 2170
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 113
Unique (%): 1.1%

Sample

1st row: A13987306
2nd row: A13972603
3rd row: A14272313
4th row: A13981407
5th row: A13824005
Value | Count | Frequency (%)
a13707009 | 14 | 0.1%
a13876114 | 13 | 0.1%
a14381407 | 12 | 0.1%
a15086601 | 12 | 0.1%
a13312303 | 12 | 0.1%
a12201001 | 12 | 0.1%
a13985909 | 12 | 0.1%
a13550502 | 11 | 0.1%
a13790703 | 11 | 0.1%
a12012202 | 11 | 0.1%
Other values (2160) | 9880 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18890 | 21.0%
1 | 17470 | 19.4%
A | 10000 | 11.1%
3 | 9060 | 10.1%
2 | 8298 | 9.2%
5 | 6006 | 6.7%
8 | 5444 | 6.0%
7 | 4570 | 5.1%
4 | 3968 | 4.4%
6 | 3436 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18890 | 23.6%
1 | 17470 | 21.8%
3 | 9060 | 11.3%
2 | 8298 | 10.4%
5 | 6006 | 7.5%
8 | 5444 | 6.8%
7 | 4570 | 5.7%
4 | 3968 | 5.0%
6 | 3436 | 4.3%
9 | 2858 | 3.6%
Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18890 | 23.6%
1 | 17470 | 21.8%
3 | 9060 | 11.3%
2 | 8298 | 10.4%
5 | 6006 | 7.5%
8 | 5444 | 6.8%
7 | 4570 | 5.7%
4 | 3968 | 5.0%
6 | 3436 | 4.3%
9 | 2858 | 3.6%
Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18890 | 21.0%
1 | 17470 | 19.4%
A | 10000 | 11.1%
3 | 9060 | 10.1%
2 | 8298 | 9.2%
5 | 6006 | 6.7%
8 | 5444 | 6.0%
7 | 4570 | 5.1%
4 | 3968 | 4.4%
6 | 3436 | 3.8%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8866
Min length: 2

Characters and Unicode

Total characters: 48866
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기타운영비용
2nd row: 도서인쇄비
3rd row: 통신비
4th row: 수도광열비
5th row: 피복비
Value | Count | Frequency (%)
경비비 | 231 | 2.3%
이자수익 | 230 | 2.3%
승강기유지비 | 225 | 2.2%
급여 | 224 | 2.2%
세대전기료 | 219 | 2.2%
통신비 | 217 | 2.2%
사무용품비 | 215 | 2.1%
교육비 | 213 | 2.1%
연체료수익 | 212 | 2.1%
잡수익 | 212 | 2.1%
Other values (77) | 7802 | 78.0%

Most occurring characters

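Value | Count | Frequency (%)
(not recovered) | 5481 | 11.2%
(not recovered) | 3520 | 7.2%
(not recovered) | 2084 | 4.3%
(not recovered) | 2008 | 4.1%
(not recovered) | 1760 | 3.6%
(not recovered) | 1278 | 2.6%
(not recovered) | 1073 | 2.2%
(not recovered) | 854 | 1.7%
(not recovered) | 798 | 1.6%
(not recovered) | 752 | 1.5%
Other values (110) | 29258 | 59.9%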

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48866 | 100.0%

Most frequent character per category

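Other Letter
Value | Count | Frequency (%)
(not recovered) | 5481 | 11.2%
(not recovered) | 3520 | 7.2%
(not recovered) | 2084 | 4.3%
(not recovered) | 2008 | 4.1%
(not recovered) | 1760 | 3.6%
(not recovered) | 1278 | 2.6%
(not recovered) | 1073 | 2.2%
(not recovered) | 854 | 1.7%
(not recovered) | 798 | 1.6%
(not recovered) | 752 | 1.5%
Other values (110) | 29258 | 59.9%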

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48866 | 100.0%

Most frequent character per script

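Hangul
Value | Count | Frequency (%)
(not recovered) | 5481 | 11.2%
(not recovered) | 3520 | 7.2%
(not recovered) | 2084 | 4.3%
(not recovered) | 2008 | 4.1%
(not recovered) | 1760 | 3.6%
(not recovered) | 1278 | 2.6%
(not recovered) | 1073 | 2.2%
(not recovered) | 854 | 1.7%
(not recovered) | 798 | 1.6%
(not recovered) | 752 | 1.5%
Other values (110) | 29258 | 59.9%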

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48866 | 100.0%

Most frequent character per block

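Hangul
Value | Count | Frequency (%)
(not recovered) | 5481 | 11.2%
(not recovered) | 3520 | 7.2%
(not recovered) | 2084 | 4.3%
(not recovered) | 2008 | 4.1%
(not recovered) | 1760 | 3.6%
(not recovered) | 1278 | 2.6%
(not recovered) | 1073 | 2.2%
(not recovered) | 854 | 1.7%
(not recovered) | 798 | 1.6%
(not recovered) | 752 | 1.5%
Other values (110) | 29258 | 59.9%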

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count
202207 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202207
2nd row: 202207
3rd row: 202207
4th row: 202207
5th row: 202207

Common Values

Value | Count | Frequency (%)
202207 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202207 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6790
Distinct (%): 67.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3441299
Minimum: -1608970
Maximum: 5.6917407 × 10^8
Zeros: 1681
Zeros (%): 16.8%
Negative: 14
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1608970
5th percentile: 0
Q1: 36950
Median: 280000
Q3: 1361655
95th percentile: 17641865
Maximum: 5.6917407 × 10^8
Range: 5.7078304 × 10^8
Interquartile range (IQR): 1324705

Descriptive statistics

Standard deviation: 13163823
Coefficient of variation (CV): 3.8252483
Kurtosis: 457.7883
Mean: 3441299
Median Absolute Deviation (MAD): 280000
Skewness: 15.564678
Sum: 3.441299 × 10^10
Variance: 1.7328624 × 10^14
Monotonicity: Not monotonic
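
Several of these figures follow from one another by simple arithmetic. The sketch below recomputes the range, IQR, coefficient of variation, sum, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. Because the inputs are the rounded values printed in this report, the derived results are approximate. It also assumes the usual definitions (CV as standard deviation divided by mean, variance as the squared standard deviation), which the report does not state.

```python
# Inputs copied from the statistics reported for 금액.
minimum = -1_608_970
maximum = 569_174_069   # largest value listed in the extreme values
q1 = 36_950
q3 = 1_361_655
mean = 3_441_299        # rounded as printed in the report
std = 13_163_823        # rounded as printed in the report
n = 10_000

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # interquartile range
cv = std / mean                  # coefficient of variation
total = mean * n                 # sum of all values
variance = std ** 2              # squared standard deviation

print(value_range)        # 570783039
print(iqr)                # 1324705
print(round(cv, 4))       # 3.8252
print(f"{total:.6e}")
print(f"{variance:.4e}")
```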
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1681 | 16.8%
200000 | 75 | 0.8%
100000 | 60 | 0.6%
300000 | 59 | 0.6%
50000 | 41 | 0.4%
250000 | 35 | 0.4%
400000 | 34 | 0.3%
150000 | 33 | 0.3%
500000 | 31 | 0.3%
30000 | 30 | 0.3%
Other values (6780) | 7921 | 79.2%

Minimum values

Value | Count | Frequency (%)
-1608970 | 1 | < 0.1%
-494633 | 1 | < 0.1%
-432900 | 1 | < 0.1%
-391091 | 1 | < 0.1%
-237600 | 1 | < 0.1%
-231070 | 1 | < 0.1%
-177090 | 1 | < 0.1%
-96490 | 1 | < 0.1%
-72590 | 1 | < 0.1%
-44000 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
569174069 | 1 | < 0.1%
327478659 | 1 | < 0.1%
312105490 | 1 | < 0.1%
234624605 | 1 | < 0.1%
226735831 | 1 | < 0.1%
187773750 | 1 | < 0.1%
187252118 | 1 | < 0.1%
181510230 | 1 | < 0.1%
178879620 | 1 | < 0.1%
174536149 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.290
금액 | 0.290 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
69357 | 하계극동건영벽산 | A13987306 | 기타운영비용 | 202207 | 4725120
64542 | 중계4단지목화 | A13972603 | 도서인쇄비 | 202207 | 616000
73236 | 송천센트레빌 | A14272313 | 통신비 | 202207 | 100660
66315 | 상계조합대림 | A13981407 | 수도광열비 | 202207 | 14600
59544 | 신천장미1차2차 | A13824005 | 피복비 | 202207 | 0
516 | 롯데캐슬클라시아 | A10023926 | 급여 | 202207 | 48310320
4117 | 구로항동제일풍경채포레스트 | A10024927 | 제수당 | 202207 | 2099660
93238 | 흑석한강현대 | A15685702 | 청소비 | 202207 | 11460226
54295 | LH서초5단지 | A13778210 | 청소비 | 202207 | 4878740
13513 | 신당삼성(분양) | A10045403 | 세대전기료 | 202207 | 59432833

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
60811 | 가락프라자 | A13881204 | 감가상각비 | 202207 | 31523
8477 | 래미안서초에스티지에스아파트 | A10026411 | 고용보험료 | 202207 | 304850
72244 | 오동공원현대홈타운 | A14206201 | 이자수익 | 202207 | 0
77098 | 당산삼성래미안 | A15004507 | 연차수당 | 202207 | 1427870
19121 | 월드컵아이파크1단지 | A12171101 | 기타운영비용 | 202207 | 1929290
72430 | 번동기산그린 | A14206305 | 전기안전관리비 | 202207 | 330000
72721 | 삼각산아이원임대 | A14210001 | 산재보험료 | 202207 | 16730
47738 | 개포주공7단지 | A13599301 | 제수당 | 202207 | 12536510
90981 | 흑석동양 | A15607001 | 승강기유지비 | 202207 | 858000
52813 | 반포미도아파트 | A13704404 | 소모품비 | 202207 | 290940