Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 2
Categorical: 2
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15822/S/1/datasetView.do

Alerts

년월일 has constant value "201901" (Constant)
금액 has 310 (3.1%) zeros (Zeros)
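
Both alerts come down to simple column-level checks. The following is a minimal sketch, assuming each column is available as a plain Python list. The helper names `is_constant` and `zero_share` are hypothetical and are not part of ydata-profiling, and the toy columns are illustrative rather than the real data.

```python
def is_constant(values):
    """Return True when every entry in the column has the same value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return (count, fraction) of entries that are exactly zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, zeros / len(values)


# Toy columns standing in for 년월일 and 금액 (not the real dataset).
dates = ["201901"] * 5
amounts = [0, 145770, 181980, 0, 200000]

print(is_constant(dates))
print(zero_share(amounts))
```

Applied to the figures reported here, 310 zeros out of 10000 observations gives the 3.1% shown in the alert.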

Reproduction

Analysis started: 2024-05-11 05:47:49.624976
Analysis finished: 2024-05-11 05:47:50.463047
Duration: 0.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
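
The reported duration can be cross-checked from the two timestamps. This is a small sketch using only the standard library; the format string is an assumption based on how the timestamps are printed in this report.

```python
from datetime import datetime

# Timestamp layout assumed from the values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 05:47:49.624976", FMT)
finished = datetime.strptime("2024-05-11 05:47:50.463047", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```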

Variables

아파트명 (apartment name)
Text

Distinct: 2098
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 19
Mean length: 7.1264
Min length: 2

Characters and Unicode

Total characters: 71264
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
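
As an illustration of how such properties can be read per code point, here is a minimal sketch using Python's standard `unicodedata` module. The helper `category_counts` is hypothetical and is not how the report itself was necessarily computed. The standard library exposes general categories (`Lo` corresponds to "Other Letter", `Nd` to "Decimal Number") but not scripts or blocks, so only categories are counted.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample values listed for this variable.
sample = ["도곡현대", "북가좌삼호제2", "잠실미성", "신성둔촌미소지움1차", "북가좌휴먼빌"]
counts = category_counts(sample)
print(counts)
```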

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: 도곡현대
2nd row: 북가좌삼호제2
3rd row: 잠실미성
4th row: 신성둔촌미소지움1차
5th row: 북가좌휴먼빌
Value | Count | Frequency (%)
아파트 | 100 | 1.0%
래미안 | 32 | 0.3%
암사선사현대 | 17 | 0.2%
신내 | 14 | 0.1%
래미안밤섬리베뉴 | 14 | 0.1%
가양대림경동 | 13 | 0.1%
브라운스톤 | 13 | 0.1%
제기안암골벽산 | 12 | 0.1%
반포리체 | 12 | 0.1%
보문파크뷰자이아파트 | 12 | 0.1%
Other values (2150) | 10228 | 97.7%

Most occurring characters

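Value | Count | Frequency (%)
(not recovered) | 2190 | 3.1%
(not recovered) | 2075 | 2.9%
(not recovered) | 1899 | 2.7%
(not recovered) | 1895 | 2.7%
(not recovered) | 1828 | 2.6%
(not recovered) | 1707 | 2.4%
(not recovered) | 1568 | 2.2%
(not recovered) | 1561 | 2.2%
(not recovered) | 1510 | 2.1%
(not recovered) | 1371 | 1.9%
Other values (421) | 53660 | 75.3%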

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65238 | 91.5%
Decimal Number | 3981 | 5.6%
Uppercase Letter | 604 | 0.8%
Space Separator | 514 | 0.7%
Lowercase Letter | 383 | 0.5%
Dash Punctuation | 148 | 0.2%
Other Punctuation | 130 | 0.2%
Close Punctuation | 127 | 0.2%
Open Punctuation | 127 | 0.2%
Math Symbol | 7 | < 0.1%

Most frequent character per category

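Other Letter
Value | Count | Frequency (%)
(not recovered) | 2190 | 3.4%
(not recovered) | 2075 | 3.2%
(not recovered) | 1899 | 2.9%
(not recovered) | 1895 | 2.9%
(not recovered) | 1828 | 2.8%
(not recovered) | 1707 | 2.6%
(not recovered) | 1568 | 2.4%
(not recovered) | 1561 | 2.4%
(not recovered) | 1510 | 2.3%
(not recovered) | 1371 | 2.1%
Other values (375) | 47634 | 73.0%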
Uppercase Letter
Value | Count | Frequency (%)
S | 116 | 19.2%
K | 75 | 12.4%
C | 61 | 10.1%
L | 50 | 8.3%
H | 44 | 7.3%
M | 42 | 7.0%
D | 42 | 7.0%
G | 41 | 6.8%
I | 29 | 4.8%
E | 24 | 4.0%
Other values (7) | 80 | 13.2%
Lowercase Letter
Value | Count | Frequency (%)
e | 183 | 47.8%
l | 54 | 14.1%
i | 47 | 12.3%
v | 36 | 9.4%
s | 15 | 3.9%
w | 13 | 3.4%
k | 11 | 2.9%
a | 7 | 1.8%
g | 7 | 1.8%
h | 6 | 1.6%
Decimal Number
Value | Count | Frequency (%)
1 | 1226 | 30.8%
2 | 1155 | 29.0%
3 | 506 | 12.7%
4 | 254 | 6.4%
5 | 212 | 5.3%
6 | 170 | 4.3%
8 | 132 | 3.3%
7 | 120 | 3.0%
9 | 114 | 2.9%
0 | 92 | 2.3%
Other Punctuation
Value | Count | Frequency (%)
, | 106 | 81.5%
. | 24 | 18.5%
Space Separator
Value | Count | Frequency (%)
(space) | 514 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 148 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 127 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 127 | 100.0%
Math Symbol
Value | Count | Frequency (%)
~ | 7 | 100.0%
Letter Number
Value | Count | Frequency (%)
(not recovered) | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65238 | 91.5%
Common | 5034 | 7.1%
Latin | 992 | 1.4%

Most frequent character per script

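Hangul
Value | Count | Frequency (%)
(not recovered) | 2190 | 3.4%
(not recovered) | 2075 | 3.2%
(not recovered) | 1899 | 2.9%
(not recovered) | 1895 | 2.9%
(not recovered) | 1828 | 2.8%
(not recovered) | 1707 | 2.6%
(not recovered) | 1568 | 2.4%
(not recovered) | 1561 | 2.4%
(not recovered) | 1510 | 2.3%
(not recovered) | 1371 | 2.1%
Other values (375) | 47634 | 73.0%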
Latin
Value | Count | Frequency (%)
e | 183 | 18.4%
S | 116 | 11.7%
K | 75 | 7.6%
C | 61 | 6.1%
l | 54 | 5.4%
L | 50 | 5.0%
i | 47 | 4.7%
H | 44 | 4.4%
M | 42 | 4.2%
D | 42 | 4.2%
Other values (19) | 278 | 28.0%
Common
Value | Count | Frequency (%)
1 | 1226 | 24.4%
2 | 1155 | 22.9%
(space) | 514 | 10.2%
3 | 506 | 10.1%
4 | 254 | 5.0%
5 | 212 | 4.2%
6 | 170 | 3.4%
- | 148 | 2.9%
8 | 132 | 2.6%
) | 127 | 2.5%
Other values (7) | 590 | 11.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65238 | 91.5%
ASCII | 6021 | 8.4%
Number Forms | 5 | < 0.1%

Most frequent character per block

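Hangul
Value | Count | Frequency (%)
(not recovered) | 2190 | 3.4%
(not recovered) | 2075 | 3.2%
(not recovered) | 1899 | 2.9%
(not recovered) | 1895 | 2.9%
(not recovered) | 1828 | 2.8%
(not recovered) | 1707 | 2.6%
(not recovered) | 1568 | 2.4%
(not recovered) | 1561 | 2.4%
(not recovered) | 1510 | 2.3%
(not recovered) | 1371 | 2.1%
Other values (375) | 47634 | 73.0%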
ASCII
Value | Count | Frequency (%)
1 | 1226 | 20.4%
2 | 1155 | 19.2%
(space) | 514 | 8.5%
3 | 506 | 8.4%
4 | 254 | 4.2%
5 | 212 | 3.5%
e | 183 | 3.0%
6 | 170 | 2.8%
- | 148 | 2.5%
8 | 132 | 2.2%
Other values (35) | 1521 | 25.3%
Number Forms
Value | Count | Frequency (%)
(not recovered) | 5 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2104
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 91
Unique (%): 0.9%

Sample

1st row: A13586102
2nd row: A12076601
3rd row: A13824004
4th row: A13406205
5th row: A12013001
Value | Count | Frequency (%)
a13405201 | 17 | 0.2%
a15780703 | 13 | 0.1%
a15786104 | 12 | 0.1%
a12208102 | 12 | 0.1%
a15703301 | 12 | 0.1%
a12114001 | 12 | 0.1%
a12187906 | 12 | 0.1%
a13086101 | 12 | 0.1%
a10027189 | 12 | 0.1%
a13776301 | 12 | 0.1%
Other values (2094) | 9874 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 18272 | 20.3%
1 | 17577 | 19.5%
A | 9990 | 11.1%
3 | 9136 | 10.2%
2 | 7891 | 8.8%
5 | 6157 | 6.8%
8 | 5741 | 6.4%
7 | 4885 | 5.4%
4 | 3806 | 4.2%
6 | 3414 | 3.8%
Other values (2) | 3131 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18272 | 22.8%
1 | 17577 | 22.0%
3 | 9136 | 11.4%
2 | 7891 | 9.9%
5 | 6157 | 7.7%
8 | 5741 | 7.2%
7 | 4885 | 6.1%
4 | 3806 | 4.8%
6 | 3414 | 4.3%
9 | 3121 | 3.9%
Uppercase Letter
Value | Count | Frequency (%)
A | 9990 | 99.9%
B | 10 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18272 | 22.8%
1 | 17577 | 22.0%
3 | 9136 | 11.4%
2 | 7891 | 9.9%
5 | 6157 | 7.7%
8 | 5741 | 7.2%
7 | 4885 | 6.1%
4 | 3806 | 4.8%
6 | 3414 | 4.3%
9 | 3121 | 3.9%
Latin
Value | Count | Frequency (%)
A | 9990 | 99.9%
B | 10 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18272 | 20.3%
1 | 17577 | 19.5%
A | 9990 | 11.1%
3 | 9136 | 10.2%
2 | 7891 | 8.8%
5 | 6157 | 6.8%
8 | 5741 | 6.4%
7 | 4885 | 5.4%
4 | 3806 | 4.2%
6 | 3414 | 3.8%
Other values (2) | 3131 | 3.5%

비용명 (expense name)
Categorical

Distinct: 44
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
급여: 514
통신비: 487
산재보험료: 467
세대전기료: 455
도서인쇄비: 454
Other values (39): 7623

Length

Max length: 7
Median length: 5
Mean length: 4.3326
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 산재보험료
2nd row: 국민연금
3rd row: 퇴직급여
4th row: 차량유지비
5th row: 도서인쇄비

Common Values

Value | Count | Frequency (%)
급여 | 514 | 5.1%
통신비 | 487 | 4.9%
산재보험료 | 467 | 4.7%
세대전기료 | 455 | 4.5%
도서인쇄비 | 454 | 4.5%
퇴직급여 | 452 | 4.5%
사무용품비 | 445 | 4.5%
제수당 | 439 | 4.4%
국민연금 | 430 | 4.3%
기타부대비 | 418 | 4.2%
Other values (34) | 5439 | 54.4%

Length

Histogram of lengths of the category

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
201901: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201901
2nd row: 201901
3rd row: 201901
4th row: 201901
5th row: 201901

Common Values

Value | Count | Frequency (%)
201901 | 10000 | 100.0%

Length

Histogram of lengths of the category


금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7823
Distinct (%): 78.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4855792
Minimum: -2913440
Maximum: 7.0520912 × 10^8
Zeros: 310
Zeros (%): 3.1%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2913440
5-th percentile: 5335.5
Q1: 100000
Median: 281995
Q3: 1350670
95-th percentile: 24951920
Maximum: 7.0520912 × 10^8
Range: 7.0812256 × 10^8
Interquartile range (IQR): 1250670

Descriptive statistics

Standard deviation: 19265478
Coefficient of variation (CV): 3.9675255
Kurtosis: 337.99309
Mean: 4855792
Median Absolute Deviation (MAD): 239710
Skewness: 13.929104
Sum: 4.855792 × 10^10
Variance: 3.7115866 × 10^14
Monotonicity: Not monotonic
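
Several of these figures are derived from others in the report, which makes them easy to cross-check. The sketch below uses the rounded values printed here (with the maximum written out as 705209120), so small differences in the last digits are expected. The variable names are illustrative only.

```python
# Values copied from the report for 금액 (rounded as printed).
minimum, maximum = -2913440, 705209120
q1, q3 = 100000, 1350670
mean, std = 4855792, 19265478

value_range = maximum - minimum  # reported as 7.0812256 × 10^8
iqr = q3 - q1                    # reported as 1250670
cv = std / mean                  # coefficient of variation, reported as 3.9675255
variance = std ** 2              # reported as 3.7115866 × 10^14

print(value_range, iqr, cv, variance)
```

A coefficient of variation near 4, together with the large skewness and kurtosis, is consistent with a few very large amounts dominating the spread.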
Histogram with fixed size bins (bins=50)
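
To make "fixed size bins" concrete, here is a minimal pure-Python sketch of equal-width binning. The function name `fixed_width_histogram` is hypothetical, and placing the maximum in the last bin is an assumption (it mirrors the common convention of a closed final bin), not something confirmed by the report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts


# Toy data, not the real 금액 column.
toy_counts = fixed_width_histogram([0, 1, 2, 3, 4, 10], bins=5)
print(toy_counts)
```

With the reported range of about 7.08 × 10^8 and 50 bins, each bin is roughly 14.2 million wide, so at least three quarters of the observations (everything up to Q3 = 1350670) fall into the first bin.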
Value | Count | Frequency (%)
0 | 310 | 3.1%
78000 | 122 | 1.2%
200000 | 110 | 1.1%
300000 | 66 | 0.7%
100000 | 59 | 0.6%
110000 | 44 | 0.4%
150000 | 42 | 0.4%
10000 | 23 | 0.2%
50000 | 23 | 0.2%
121000 | 23 | 0.2%
Other values (7813) | 9178 | 91.8%

Value | Count | Frequency (%)
-2913440 | 1 | < 0.1%
-822520 | 1 | < 0.1%
-279620 | 1 | < 0.1%
-132000 | 1 | < 0.1%
-106990 | 1 | < 0.1%
-60760 | 1 | < 0.1%
-50000 | 1 | < 0.1%
0 | 310 | 3.1%
7 | 1 | < 0.1%
400 | 2 | < 0.1%

Value | Count | Frequency (%)
705209120 | 1 | < 0.1%
568227968 | 1 | < 0.1%
480668350 | 1 | < 0.1%
390648530 | 1 | < 0.1%
278879680 | 1 | < 0.1%
265930050 | 1 | < 0.1%
263861480 | 1 | < 0.1%
256988102 | 1 | < 0.1%
242252330 | 1 | < 0.1%
230042320 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.469
금액 | 0.469 | 1.000

 | 금액 | 비용명
금액 | 1.000 | 0.195
비용명 | 0.195 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
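
The quantity behind both displays is simply which cells are missing in each column. Below is a minimal sketch, assuming rows are held as Python dictionaries with `None` marking a missing cell. The helper `nullity_by_column` is hypothetical, and the missing value in the toy rows is invented for illustration (this dataset reports 0 missing cells).

```python
def nullity_by_column(rows, columns):
    """Return {column: number of missing (None) entries} for a list of dict rows."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


rows = [
    {"아파트명": "도곡현대", "금액": 145770},
    {"아파트명": "잠실미성", "금액": None},  # hypothetical missing value
]
missing = nullity_by_column(rows, ["아파트명", "금액"])
print(missing)
```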

Sample

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
17339 | 도곡현대 | A13586102 | 산재보험료 | 201901 | 145770
4069 | 북가좌삼호제2 | A12076601 | 국민연금 | 201901 | 181980
22822 | 잠실미성 | A13824004 | 퇴직급여 | 201901 | 3310320
13802 | 신성둔촌미소지움1차 | A13406205 | 차량유지비 | 201901 | 200000
3737 | 북가좌휴먼빌 | A12013001 | 도서인쇄비 | 201901 | 406500
8886 | 묵동신안3차 | A13114106 | 퇴직급여 | 201901 | 1098110
40063 | 목동부영그린타운3차 | A15805301 | 세대수도료 | 201901 | 3471040
28086 | 산천리버힐제2 | A14076401 | 고용보험료 | 201901 | 26900
3480 | 홍제현대그린 | A12009303 | 고용보험료 | 201901 | 40870
30139 | 여의도광장 | A15001019 | 교육비 | 201901 | 19000

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
32333 | 신길자이 | A15096001 | 급여 | 201901 | 6441430
37168 | 대방대림 | A15681110 | 세대난방비 | 201901 | 149005360
41375 | 신정대림 | A15885303 | 퇴직급여 | 201901 | 750160
27098 | 하계현대우성 | A13987303 | 소모품비 | 201901 | 918200
12186 | 금호삼성래미안 | A13309102 | 산재보험료 | 201901 | 532630
15326 | 아이파크삼성동 | A13509009 | 기타인건비 | 201901 | 5709407
20153 | 반포미도2차 | A13770105 | 국민연금 | 201901 | 401420
38851 | 가양강나루현대 | A15780401 | 국민연금 | 201901 | 229960
1153 | 금천롯데캐슬골드파크1차아파트 | A10027188 | 교육비 | 201901 | -132000
26866 | 중계우성3차 | A13986201 | 건강보험료 | 201901 | 240890