Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
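
The derived figures above can be cross-checked from the raw counts. The following is a minimal sketch, assuming that 1 KiB is 1024 bytes and that the average record size is simply total memory divided by the number of observations; the variable names are illustrative and not taken from the profiling tool.

```python
# Cross-check of the dataset statistics, using only values from the report.
KIB = 1024                 # bytes per KiB (assumption about the unit)
total_kib = 488.3          # "Total size in memory"
n_obs = 10_000             # "Number of observations"
n_vars = 5                 # "Number of variables"
missing_cells = 0          # "Missing cells"

# Average record size: total bytes spread over all rows.
avg_record_bytes = total_kib * KIB / n_obs

# Missing cells as a percentage of all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_obs * n_vars)

print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells: {missing_pct:.1f}%")
```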

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202006" (Constant)
금액 has 1025 (10.2%) zeros (Zeros)
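
As an illustration of what these two alerts mean, here is a small sketch that applies the same ideas to the first ten rows of the Sample section at the end of this report. The helper names `is_constant` and `zero_share` are hypothetical, and the profiling tool's own thresholds and logic are not shown in this report, so this is not its implementation.

```python
def is_constant(values):
    """Return True when every entry in the column holds the same value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# 년월일 and 금액 values from the first ten sample rows of this report.
sample_dates = ["202006"] * 10
sample_amounts = [8735680, 0, 200000, 10579570, 10200,
                  8923160, 410000, 313790, 4600, 124630]

print(is_constant(sample_dates))   # constant-column check on the sample
print(zero_share(sample_amounts))  # share of zeros in the sample

# Full-dataset share behind the Zeros alert: 1025 zeros in 10000 rows.
report_zero_share = 1025 / 10_000
print(f"{report_zero_share:.2%}")
```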

Reproduction

Analysis started: 2024-05-11 06:56:33.994119
Analysis finished: 2024-05-11 06:56:36.649903
Duration: 2.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2166
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.1689
Min length: 2

Characters and Unicode

Total characters: 71689
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
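
To make the idea of per-character Unicode properties concrete, the sketch below counts general categories for three apartment names taken from the Sample section of this report. It uses Python's standard `unicodedata` module, which exposes the general category (for example `Lo` for Other Letter, `Nd` for Decimal Number) but not the script or block; the profiling tool may rely on a different library, so treat this as an approximation of the category counts only. The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (two-letter codes) in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


# Apartment names copied from the Sample section of this report.
names = ["송파 두산위브아파트", "래미안남가좌2차", "전농SK"]

for name in names:
    # Lo = Other Letter, Nd = Decimal Number,
    # Zs = Space Separator, Lu = Uppercase Letter
    print(name, dict(category_counts(name)))
```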

Unique

Unique: 118
Unique (%): 1.2%

Sample

1st row: 구로현대
2nd row: 마포현대아파트
3rd row: 성북역신도브래뉴
4th row: 예성그린캐슬아파트
5th row: 갈현현대아파트

Value | Count | Frequency (%)
아파트 | 142 | 1.3%
래미안 | 35 | 0.3%
북한산 | 20 | 0.2%
신동아파밀리에 | 18 | 0.2%
힐스테이트 | 17 | 0.2%
신동아아파트 | 17 | 0.2%
신반포 | 16 | 0.2%
아이파크 | 15 | 0.1%
두산아파트 | 15 | 0.1%
신림동부 | 14 | 0.1%
Other values (2231) | 10289 | 97.1%

Most occurring characters

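Value | Count | Frequency (%)
(not preserved) | 2429 | 3.4%
(not preserved) | 2348 | 3.3%
(not preserved) | 2128 | 3.0%
(not preserved) | 1785 | 2.5%
(not preserved) | 1756 | 2.4%
(not preserved) | 1674 | 2.3%
(not preserved) | 1561 | 2.2%
(not preserved) | 1530 | 2.1%
(not preserved) | 1319 | 1.8%
(not preserved) | 1299 | 1.8%
Other values (421) | 53860 | 75.1%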

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66005 | 92.1%
Decimal Number | 3566 | 5.0%
Space Separator | 661 | 0.9%
Uppercase Letter | 599 | 0.8%
Lowercase Letter | 335 | 0.5%
Open Punctuation | 133 | 0.2%
Close Punctuation | 133 | 0.2%
Dash Punctuation | 125 | 0.2%
Other Punctuation | 119 | 0.2%
Letter Number | 7 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2429 | 3.7%
(not preserved) | 2348 | 3.6%
(not preserved) | 2128 | 3.2%
(not preserved) | 1785 | 2.7%
(not preserved) | 1756 | 2.7%
(not preserved) | 1674 | 2.5%
(not preserved) | 1561 | 2.4%
(not preserved) | 1530 | 2.3%
(not preserved) | 1319 | 2.0%
(not preserved) | 1299 | 2.0%
Other values (375) | 48176 | 73.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 92 | 15.4%
C | 86 | 14.4%
K | 66 | 11.0%
L | 58 | 9.7%
M | 51 | 8.5%
D | 51 | 8.5%
H | 50 | 8.3%
E | 33 | 5.5%
I | 28 | 4.7%
V | 25 | 4.2%
Other values (7) | 59 | 9.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 193 | 57.6%
l | 28 | 8.4%
i | 28 | 8.4%
v | 21 | 6.3%
k | 19 | 5.7%
c | 16 | 4.8%
s | 13 | 3.9%
w | 13 | 3.9%
h | 2 | 0.6%
g | 1 | 0.3%

Decimal Number
Value | Count | Frequency (%)
2 | 1078 | 30.2%
1 | 1047 | 29.4%
3 | 506 | 14.2%
4 | 236 | 6.6%
5 | 202 | 5.7%
6 | 143 | 4.0%
8 | 94 | 2.6%
7 | 93 | 2.6%
9 | 89 | 2.5%
0 | 78 | 2.2%

Other Punctuation
Value | Count | Frequency (%)
, | 92 | 77.3%
. | 27 | 22.7%

Space Separator
Value | Count | Frequency (%)
(space) | 661 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 133 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 133 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 125 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 7 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66005 | 92.1%
Common | 4743 | 6.6%
Latin | 941 | 1.3%

Most frequent character per script

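Hangul
Value | Count | Frequency (%)
(not preserved) | 2429 | 3.7%
(not preserved) | 2348 | 3.6%
(not preserved) | 2128 | 3.2%
(not preserved) | 1785 | 2.7%
(not preserved) | 1756 | 2.7%
(not preserved) | 1674 | 2.5%
(not preserved) | 1561 | 2.4%
(not preserved) | 1530 | 2.3%
(not preserved) | 1319 | 2.0%
(not preserved) | 1299 | 2.0%
Other values (375) | 48176 | 73.0%

Latin
Value | Count | Frequency (%)
e | 193 | 20.5%
S | 92 | 9.8%
C | 86 | 9.1%
K | 66 | 7.0%
L | 58 | 6.2%
M | 51 | 5.4%
D | 51 | 5.4%
H | 50 | 5.3%
E | 33 | 3.5%
l | 28 | 3.0%
Other values (19) | 233 | 24.8%

Common
Value | Count | Frequency (%)
2 | 1078 | 22.7%
1 | 1047 | 22.1%
(space) | 661 | 13.9%
3 | 506 | 10.7%
4 | 236 | 5.0%
5 | 202 | 4.3%
6 | 143 | 3.0%
( | 133 | 2.8%
) | 133 | 2.8%
- | 125 | 2.6%
Other values (7) | 479 | 10.1%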

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66005 | 92.1%
ASCII | 5677 | 7.9%
Number Forms | 7 | < 0.1%

Most frequent character per block

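Hangul
Value | Count | Frequency (%)
(not preserved) | 2429 | 3.7%
(not preserved) | 2348 | 3.6%
(not preserved) | 2128 | 3.2%
(not preserved) | 1785 | 2.7%
(not preserved) | 1756 | 2.7%
(not preserved) | 1674 | 2.5%
(not preserved) | 1561 | 2.4%
(not preserved) | 1530 | 2.3%
(not preserved) | 1319 | 2.0%
(not preserved) | 1299 | 2.0%
Other values (375) | 48176 | 73.0%

ASCII
Value | Count | Frequency (%)
2 | 1078 | 19.0%
1 | 1047 | 18.4%
(space) | 661 | 11.6%
3 | 506 | 8.9%
4 | 236 | 4.2%
5 | 202 | 3.6%
e | 193 | 3.4%
6 | 143 | 2.5%
( | 133 | 2.3%
) | 133 | 2.3%
Other values (35) | 1345 | 23.7%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 7 | 100.0%
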
아파트코드 (apartment code)
Text

Distinct: 2172
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 119
Unique (%): 1.2%

Sample

1st row: A15288004
2nd row: A12102005
3rd row: A13987501
4th row: A13123001
5th row: A12205004

Value | Count | Frequency (%)
a13983713 | 15 | 0.1%
a15101101 | 14 | 0.1%
a13380803 | 13 | 0.1%
a13986702 | 12 | 0.1%
a13305003 | 12 | 0.1%
a14082601 | 12 | 0.1%
a13410003 | 11 | 0.1%
a13920207 | 11 | 0.1%
a12078705 | 11 | 0.1%
a15279403 | 11 | 0.1%
Other values (2162) | 9878 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18476 | 20.5%
1 | 17639 | 19.6%
A | 10000 | 11.1%
3 | 9025 | 10.0%
2 | 8219 | 9.1%
5 | 6178 | 6.9%
8 | 5752 | 6.4%
7 | 4730 | 5.3%
4 | 3676 | 4.1%
6 | 3479 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18476 | 23.1%
1 | 17639 | 22.0%
3 | 9025 | 11.3%
2 | 8219 | 10.3%
5 | 6178 | 7.7%
8 | 5752 | 7.2%
7 | 4730 | 5.9%
4 | 3676 | 4.6%
6 | 3479 | 4.3%
9 | 2826 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18476 | 23.1%
1 | 17639 | 22.0%
3 | 9025 | 11.3%
2 | 8219 | 10.3%
5 | 6178 | 7.7%
8 | 5752 | 7.2%
7 | 4730 | 5.9%
4 | 3676 | 4.6%
6 | 3479 | 4.3%
9 | 2826 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18476 | 20.5%
1 | 17639 | 19.6%
A | 10000 | 11.1%
3 | 9025 | 10.0%
2 | 8219 | 9.1%
5 | 6178 | 6.9%
8 | 5752 | 6.4%
7 | 4730 | 5.3%
4 | 3676 | 4.1%
6 | 3479 | 3.9%

비용명 (expense item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8601
Min length: 2

Characters and Unicode

Total characters: 48601
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

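1st row: 세대수도료 (household water charges)
2nd row: 부과차손 (billing shortfall)
3rd row: 복리후생비 (employee welfare expenses)
4th row: 세대전기료 (household electricity charges)
5th row: 사무용품비 (office supplies expenses)

Value | Count | Frequency (%)
승강기유지비 (elevator maintenance fee) | 241 | 2.4%
경비비 (security expenses) | 230 | 2.3%
퇴직급여 (retirement benefits) | 227 | 2.3%
청소비 (cleaning expenses) | 223 | 2.2%
입주자대표회의운영비 (residents' representative council operating expenses) | 220 | 2.2%
이자수익 (interest income) | 220 | 2.2%
통신비 (communication expenses) | 220 | 2.2%
보험료 (insurance premiums) | 218 | 2.2%
교육비 (training expenses) | 216 | 2.2%
복리후생비 (employee welfare expenses) | 211 | 2.1%
Other values (76) | 7774 | 77.7%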

Most occurring characters

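Value | Count | Frequency (%)
(not preserved) | 5480 | 11.3%
(not preserved) | 3528 | 7.3%
(not preserved) | 2090 | 4.3%
(not preserved) | 1988 | 4.1%
(not preserved) | 1682 | 3.5%
(not preserved) | 1296 | 2.7%
(not preserved) | 1053 | 2.2%
(not preserved) | 846 | 1.7%
(not preserved) | 823 | 1.7%
(not preserved) | 773 | 1.6%
Other values (110) | 29042 | 59.8%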

Most occurring categories

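Value | Count | Frequency (%)
Other Letter | 48601 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 5480 | 11.3%
(not preserved) | 3528 | 7.3%
(not preserved) | 2090 | 4.3%
(not preserved) | 1988 | 4.1%
(not preserved) | 1682 | 3.5%
(not preserved) | 1296 | 2.7%
(not preserved) | 1053 | 2.2%
(not preserved) | 846 | 1.7%
(not preserved) | 823 | 1.7%
(not preserved) | 773 | 1.6%
Other values (110) | 29042 | 59.8%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48601 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 5480 | 11.3%
(not preserved) | 3528 | 7.3%
(not preserved) | 2090 | 4.3%
(not preserved) | 1988 | 4.1%
(not preserved) | 1682 | 3.5%
(not preserved) | 1296 | 2.7%
(not preserved) | 1053 | 2.2%
(not preserved) | 846 | 1.7%
(not preserved) | 823 | 1.7%
(not preserved) | 773 | 1.6%
Other values (110) | 29042 | 59.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48601 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 5480 | 11.3%
(not preserved) | 3528 | 7.3%
(not preserved) | 2090 | 4.3%
(not preserved) | 1988 | 4.1%
(not preserved) | 1682 | 3.5%
(not preserved) | 1296 | 2.7%
(not preserved) | 1053 | 2.2%
(not preserved) | 846 | 1.7%
(not preserved) | 823 | 1.7%
(not preserved) | 773 | 1.6%
Other values (110) | 29042 | 59.8%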

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202006 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202006
2nd row: 202006
3rd row: 202006
4th row: 202006
5th row: 202006

Common Values

Value | Count | Frequency (%)
202006 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]
Value | Count | Frequency (%)
202006 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7081
Distinct (%): 70.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2868116.6
Minimum: -2217600
Maximum: 3.5655979 × 10^8
Zeros: 1025
Zeros (%): 10.2%
Negative: 10
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2217600
5-th percentile: 0
Q1: 65097.5
Median: 308000
Q3: 1386115
95-th percentile: 13927893
Maximum: 3.5655979 × 10^8
Range: 3.5877739 × 10^8
Interquartile range (IQR): 1321017.5
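
The range and IQR follow directly from the other quantile statistics. A short sketch, using only values from this report (the exact maximum 356559790 is taken from the list of largest values further down; variable names are illustrative):

```python
# Quantile statistics for 금액 as listed in the report.
minimum = -2217600
q1 = 65097.5
q3 = 1386115
maximum = 356559790  # shown in the report as 3.5655979 x 10^8

# Range: distance between the largest and smallest observation.
value_range = maximum - minimum

# Interquartile range: spread of the middle 50% of observations.
iqr = q3 - q1

print(f"Range: {value_range:.7e}")
print(f"IQR: {iqr}")
```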

Descriptive statistics

Standard deviation: 9911111.4
Coefficient of variation (CV): 3.4556166
Kurtosis: 300.28605
Mean: 2868116.6
Median Absolute Deviation (MAD): 304795
Skewness: 12.85274
Sum: 2.8681166 × 10^10
Variance: 9.8230129 × 10^13
Monotonicity: Not monotonic
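
Several of these statistics are tied together by simple identities, which gives a quick consistency check. The sketch below assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations); because the report rounds its figures, the comparison is only expected to hold approximately.

```python
# Descriptive statistics for 금액, as printed in the report (rounded).
std = 9911111.4
mean = 2868116.6
n_obs = 10_000

# Coefficient of variation: dispersion relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# The sum of all values equals the mean times the number of rows.
total = mean * n_obs

print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum: {total:.7e}")
```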
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 1025 | 10.2%
38000 | 155 | 1.6%
200000 | 92 | 0.9%
300000 | 68 | 0.7%
100000 | 64 | 0.6%
150000 | 51 | 0.5%
400000 | 34 | 0.3%
600000 | 33 | 0.3%
500000 | 32 | 0.3%
50000 | 30 | 0.3%
Other values (7071) | 8416 | 84.2%

Minimum 10 values

Value | Count | Frequency (%)
-2217600 | 1 | < 0.1%
-1970510 | 1 | < 0.1%
-1638000 | 1 | < 0.1%
-730940 | 1 | < 0.1%
-618550 | 1 | < 0.1%
-596909 | 1 | < 0.1%
-394900 | 1 | < 0.1%
-276461 | 1 | < 0.1%
-245850 | 1 | < 0.1%
-119636 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
356559790 | 1 | < 0.1%
256291548 | 1 | < 0.1%
242454582 | 1 | < 0.1%
239504110 | 1 | < 0.1%
137924020 | 1 | < 0.1%
122297498 | 1 | < 0.1%
121096852 | 1 | < 0.1%
119856560 | 1 | < 0.1%
111916670 | 1 | < 0.1%
106768700 | 1 | < 0.1%
< 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.326
금액 | 0.326 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
83737 | 구로현대 | A15288004 | 세대수도료 | 202006 | 8735680
13815 | 마포현대아파트 | A12102005 | 부과차손 | 202006 | 0
65747 | 성북역신도브래뉴 | A13987501 | 복리후생비 | 202006 | 200000
24664 | 예성그린캐슬아파트 | A13123001 | 세대전기료 | 202006 | 10579570
18802 | 갈현현대아파트 | A12205004 | 사무용품비 | 202006 | 10200
12133 | 래미안남가좌2차 | A12012101 | 청소비 | 202006 | 8923160
88121 | 동작상떼빌주상복합 | A15670001 | 소독비 | 202006 | 410000
560 | 구로항동우남퍼스트빌 | A10024849 | 기타부대비 | 202006 | 313790
76392 | 신길우성3차아파트 | A15086004 | 교통비 | 202006 | 4600
2680 | 송파 두산위브아파트 | A10025657 | 검침비용 | 202006 | 124630

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
73125 | 당산효성1차 | A15004506 | 산재보험료 | 202006 | 65660
40295 | 래미안포레 | A13520001 | 세대수도료 | 202006 | 19686530
85786 | 남서울럭키아파트 | A15386506 | 시설보수비 | 202006 | 97350
27355 | 도봉동아에코빌 | A13201206 | 세금과공과 | 202006 | 68580
35870 | 둔촌하이츠 | A13406003 | 잡수익 | 202006 | 0
66161 | 하계삼익선경 | A13993501 | 승강기수익 | 202006 | 150000
11100 | 냉천동부센트레빌 | A12005001 | 세대수도료 | 202006 | 4357200
23294 | 전농SK | A13084804 | 잡비용 | 202006 | 1280000
21699 | 휘경동일하이빌 | A13009202 | 퇴직급여 | 202006 | 1491940
23396 | 전농동아 | A13085901 | 임대료수익 | 202006 | 0