Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 2616
Missing cells (%): 4.4%
Duplicate rows: 240
Duplicate rows (%): 2.4%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
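
The percentages and the average record size follow from the raw counts above. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (observations times variables) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Raw counts copied from the dataset statistics above.
n_variables = 6
n_observations = 10_000
missing_cells = 2_616
duplicate_rows = 240
total_size_kib = 556.6

# Share of missing cells over all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

# Share of duplicate rows over all rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}% missing cells")
print(f"{duplicate_pct:.1f}% duplicate rows")
print(f"{avg_record_bytes:.1f} B per record")
```

Rounded to one decimal place, the three results agree with the reported 4.4%, 2.4% and 57.0 B.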

Variable types

Categorical: 2
Text: 2
DateTime: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-21245/F/1/datasetView.do

Alerts

기준년월 (reference year-month) has constant value "2017-12" [Constant]
Dataset has 240 (2.4%) duplicate rows [Duplicates]
연료 (fuel) is highly imbalanced (77.0%) [Imbalance]
현소유자의출생년도 (current owner's birth year) has 2611 (26.1%) missing values [Missing]
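
The report does not say how the 77.0% imbalance figure is derived. One reading that is consistent with the 연료 category counts listed later in this report is one minus the normalized Shannon entropy of the class distribution. The sketch below works under that assumption; the function name `imbalance_score` is hypothetical and the formula is not confirmed by the report itself:

```python
import math

# Category counts for 연료, copied from its Common Values table in this report.
fuel_counts = [8928, 750, 312, 6, 3, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits.

    0.0 means perfectly balanced classes; values near 1.0 mean that
    one class dominates. This formula is an assumption, see the text.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention: a single class has nothing to balance
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(fuel_counts)
print(f"{100 * score:.1f}%")
```

Under this reading the score measures how far the distribution is from uniform over all six categories, which is why it differs from the 89.3% share of the most frequent category.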

Reproduction

Analysis started: 2024-03-13 07:47:31.189270
Analysis finished: 2024-03-13 07:47:31.814147
Duration: 0.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기준년월
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value               Count
2017-12             10000

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2017-12
2nd row: 2017-12
3rd row: 2017-12
4th row: 2017-12
5th row: 2017-12

Common Values

Value               Count   Frequency (%)
2017-12             10000   100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value               Count   Frequency (%)
2017-12             10000   100.0%

사용본거지시읍면동_행정동기준
Text

Distinct: 424
Distinct (%): 4.2%
Missing: 5
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 14
Mean length: 13.871436
Min length: 11

Characters and Unicode

Total characters: 138645
Distinct characters: 194
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
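
The reported mean length of 13.871436 can be cross-checked against the total character count. A small sketch, assuming the mean is taken over non-missing values only (10000 observations minus the 5 missing values reported for this variable); the variable names are illustrative:

```python
# Figures reported for this text variable.
n_observations = 10_000
n_missing = 5
total_characters = 138_645

# Mean string length over non-missing values (assumption: missing rows are excluded).
mean_length = total_characters / (n_observations - n_missing)

print(f"{mean_length:.6f}")
```

Dividing by all 10000 rows instead would give 13.8645, which does not match the reported value, so the non-missing denominator looks like the one in use.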

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 서울특별시 동대문구 답십리2동
2nd row: 서울특별시 중구 을지로동
3rd row: 서울특별시 강남구 압구정동
4th row: 서울특별시 동작구 노량진1동
5th row: 서울특별시 성동구 성수1가2동

Value               Count   Frequency (%)
서울특별시          9995    33.3%
강남구              1573    5.2%
강서구              861     2.9%
서초구              851     2.8%
송파구              746     2.5%
영등포구            529     1.8%
역삼1동             489     1.6%
강동구              381     1.3%
양천구              378     1.3%
대치4동             370     1.2%
Other values (439)  13812   46.1%

Most occurring characters

Value               Count   Frequency (%)
(space)             19990   14.4%
(not extracted)     12242   8.8%
(not extracted)     11214   8.1%
(not extracted)     10627   7.7%
(not extracted)     10071   7.3%
(not extracted)     9995    7.2%
(not extracted)     9995    7.2%
(not extracted)     9995    7.2%
(not extracted)     3085    2.2%
1                   3018    2.2%
Other values (184)  38413   27.7%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        111442  80.4%
Space Separator     19990   14.4%
Decimal Number      7071    5.1%
Other Punctuation   142     0.1%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
(not extracted)     12242   11.0%
(not extracted)     11214   10.1%
(not extracted)     10627   9.5%
(not extracted)     10071   9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     3085    2.8%
(not extracted)     1670    1.5%
(not extracted)     1201    1.1%
Other values (172)  31347   28.1%

Decimal Number
Value               Count   Frequency (%)
1                   3018    42.7%
2                   1999    28.3%
4                   831     11.8%
3                   751     10.6%
5                   165     2.3%
7                   121     1.7%
6                   113     1.6%
8                   50      0.7%
9                   12      0.2%
0                   11      0.2%

Space Separator
Value               Count   Frequency (%)
(space)             19990   100.0%

Other Punctuation
Value               Count   Frequency (%)
.                   142     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Hangul              111442  80.4%
Common              27203   19.6%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
(not extracted)     12242   11.0%
(not extracted)     11214   10.1%
(not extracted)     10627   9.5%
(not extracted)     10071   9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     3085    2.8%
(not extracted)     1670    1.5%
(not extracted)     1201    1.1%
Other values (172)  31347   28.1%

Common
Value               Count   Frequency (%)
(space)             19990   73.5%
1                   3018    11.1%
2                   1999    7.3%
4                   831     3.1%
3                   751     2.8%
5                   165     0.6%
.                   142     0.5%
7                   121     0.4%
6                   113     0.4%
8                   50      0.2%
Other values (2)    23      0.1%

Most occurring blocks

Value               Count   Frequency (%)
Hangul              111442  80.4%
ASCII               27203   19.6%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             19990   73.5%
1                   3018    11.1%
2                   1999    7.3%
4                   831     3.1%
3                   751     2.8%
5                   165     0.6%
.                   142     0.5%
7                   121     0.4%
6                   113     0.4%
8                   50      0.2%
Other values (2)    23      0.1%

Hangul
Value               Count   Frequency (%)
(not extracted)     12242   11.0%
(not extracted)     11214   10.1%
(not extracted)     10627   9.5%
(not extracted)     10071   9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     9995    9.0%
(not extracted)     3085    2.8%
(not extracted)     1670    1.5%
(not extracted)     1201    1.1%
Other values (172)  31347   28.1%

차명
Text

Distinct: 115
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 19
Mean length: 12.8484
Min length: 2

Characters and Unicode

Total characters: 128484
Distinct characters: 140
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 39
Unique (%): 0.4%

Sample

1st row: 니로 하이브리드
2nd row: 토요타 Prius
3rd row: 토요타 Camry Hybrid
4th row: 쏘나타 (SONATA) 하이브리드
5th row: K5 하이브리드

Value               Count   Frequency (%)
하이브리드          4590    19.8%
렉서스              1918    8.3%
es300h              1201    5.2%
쏘나타              1161    5.0%
토요타              1138    4.9%
니로                1023    4.4%
그랜저              953     4.1%
hyb                 940     4.0%
k5                  866     3.7%
하이브리드(sonata   734     3.2%
Other values (134)  8697    37.5%

Most occurring characters

Value               Count   Frequency (%)
(space)             13680   10.6%
(not extracted)     6681    5.2%
(not extracted)     5945    4.6%
(not extracted)     5942    4.6%
(not extracted)     5922    4.6%
(not extracted)     5919    4.6%
A                   4756    3.7%
N                   3796    3.0%
0                   3750    2.9%
E                   3496    2.7%
Other values (130)  68597   53.4%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        56035   43.6%
Uppercase Letter    39172   30.5%
Space Separator     13680   10.6%
Decimal Number      7557    5.9%
Lowercase Letter    7192    5.6%
Open Punctuation    3405    2.7%
Close Punctuation   1300    1.0%
Other Punctuation   120     0.1%
Dash Punctuation    23      < 0.1%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
(not extracted)     6681    11.9%
(not extracted)     5945    10.6%
(not extracted)     5942    10.6%
(not extracted)     5922    10.6%
(not extracted)     5919    10.6%
(not extracted)     2721    4.9%
(not extracted)     2328    4.2%
(not extracted)     1955    3.5%
(not extracted)     1938    3.5%
(not extracted)     1729    3.1%
Other values (70)   14955   26.7%

Uppercase Letter
Value               Count   Frequency (%)
A                   4756    12.1%
N                   3796    9.7%
E                   3496    8.9%
S                   3467    8.9%
R                   2848    7.3%
O                   2414    6.2%
I                   2287    5.8%
T                   2144    5.5%
H                   1982    5.1%
Y                   1476    3.8%
Other values (15)   10506   26.8%

Lowercase Letter
Value               Count   Frequency (%)
h                   1945    27.0%
r                   1079    15.0%
i                   876     12.2%
y                   859     11.9%
d                   615     8.6%
b                   595     8.3%
a                   272     3.8%
m                   271     3.8%
u                   214     3.0%
s                   212     2.9%
Other values (10)   254     3.5%

Decimal Number
Value               Count   Frequency (%)
0                   3750    49.6%
3                   1541    20.4%
5                   1172    15.5%
4                   402     5.3%
7                   371     4.9%
2                   210     2.8%
6                   41      0.5%
9                   31      0.4%
8                   23      0.3%
1                   16      0.2%

Space Separator
Value               Count   Frequency (%)
(space)             13680   100.0%

Open Punctuation
Value               Count   Frequency (%)
(                   3405    100.0%

Close Punctuation
Value               Count   Frequency (%)
)                   1300    100.0%

Other Punctuation
Value               Count   Frequency (%)
.                   120     100.0%

Dash Punctuation
Value               Count   Frequency (%)
-                   23      100.0%

Most occurring scripts

Value               Count   Frequency (%)
Hangul              56035   43.6%
Latin               46364   36.1%
Common              26085   20.3%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
(not extracted)     6681    11.9%
(not extracted)     5945    10.6%
(not extracted)     5942    10.6%
(not extracted)     5922    10.6%
(not extracted)     5919    10.6%
(not extracted)     2721    4.9%
(not extracted)     2328    4.2%
(not extracted)     1955    3.5%
(not extracted)     1938    3.5%
(not extracted)     1729    3.1%
Other values (70)   14955   26.7%

Latin
Value               Count   Frequency (%)
A                   4756    10.3%
N                   3796    8.2%
E                   3496    7.5%
S                   3467    7.5%
R                   2848    6.1%
O                   2414    5.2%
I                   2287    4.9%
T                   2144    4.6%
H                   1982    4.3%
h                   1945    4.2%
Other values (35)   17229   37.2%

Common
Value               Count   Frequency (%)
(space)             13680   52.4%
0                   3750    14.4%
(                   3405    13.1%
3                   1541    5.9%
)                   1300    5.0%
5                   1172    4.5%
4                   402     1.5%
7                   371     1.4%
2                   210     0.8%
.                   120     0.5%
Other values (5)    134     0.5%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               72449   56.4%
Hangul              56035   43.6%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             13680   18.9%
A                   4756    6.6%
N                   3796    5.2%
0                   3750    5.2%
E                   3496    4.8%
S                   3467    4.8%
(                   3405    4.7%
R                   2848    3.9%
O                   2414    3.3%
I                   2287    3.2%
Other values (50)   28550   39.4%

Hangul
Value               Count   Frequency (%)
(not extracted)     6681    11.9%
(not extracted)     5945    10.6%
(not extracted)     5942    10.6%
(not extracted)     5922    10.6%
(not extracted)     5919    10.6%
(not extracted)     2721    4.9%
(not extracted)     2328    4.2%
(not extracted)     1955    3.5%
(not extracted)     1938    3.5%
(not extracted)     1729    3.1%
Other values (70)   14955   26.7%

연료
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value                      Count
하이브리드(휘발유+전기)    8928
전기                       750
하이브리드(LPG+전기)       312
하이브리드(CNG+전기)       6
하이브리드(경유+전기)      3

Length

Max length: 13
Median length: 13
Mean length: 12.1736
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 하이브리드(휘발유+전기)
2nd row: 하이브리드(휘발유+전기)
3rd row: 하이브리드(휘발유+전기)
4th row: 하이브리드(휘발유+전기)
5th row: 하이브리드(휘발유+전기)

Common Values

Value                                                   Count   Frequency (%)
하이브리드(휘발유+전기) [hybrid, gasoline + electric]   8928    89.3%
전기 [electric]                                         750     7.5%
하이브리드(LPG+전기) [hybrid, LPG + electric]           312     3.1%
하이브리드(CNG+전기) [hybrid, CNG + electric]           6       0.1%
하이브리드(경유+전기) [hybrid, diesel + electric]       3       < 0.1%
수소 [hydrogen]                                         1       < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value                      Count   Frequency (%)
하이브리드(휘발유+전기     8928    89.3%
전기                       750     7.5%
하이브리드(lpg+전기        312     3.1%
하이브리드(cng+전기        6       0.1%
하이브리드(경유+전기       3       < 0.1%
수소                       1       < 0.1%

최초등록일
DateTime

Distinct: 1948
Distinct (%): 19.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2004-06-10 00:00:00
Maximum: 2017-12-29 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

현소유자의출생년도
Real number (ℝ)

MISSING 

Distinct: 75
Distinct (%): 1.0%
Missing: 2611
Missing (%): 26.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1971.3443
Minimum: 1924
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1924
5-th percentile: 1951
Q1: 1963
median: 1973
Q3: 1981
95-th percentile: 1987
Maximum: 2015
Range: 91
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 11.469328
Coefficient of variation (CV): 0.005818024
Kurtosis: -0.16522264
Mean: 1971.3443
Median Absolute Deviation (MAD): 8
Skewness: -0.46163693
Sum: 14566263
Variance: 131.54549
Monotonicity: Not monotonic
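
Several of these figures are derived from one another, which gives a quick consistency check. A minimal sketch using only values reported for 현소유자의출생년도, assuming the sum and mean are computed over the non-missing values (10000 observations minus 2611 missing); the variable names are illustrative:

```python
# Values copied from the statistics reported for this variable.
mean = 1971.3443
std = 11.469328
n_non_missing = 10_000 - 2_611  # observations minus missing values
q1, q3 = 1963, 1981
minimum, maximum = 1924, 2015

# Derived quantities and the formulas assumed for them.
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_sum = mean * n_non_missing # sum over non-missing values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       {cv:.9f}")
print(f"Variance {variance:.5f}")
print(f"Sum      {value_sum:.0f}")
print(f"IQR      {iqr}")
print(f"Range    {value_range}")
```

Small differences in the last digit are expected, because the inputs are themselves rounded values from the report.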
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count   Frequency (%)
1983                270     2.7%
1981                266     2.7%
1971                251     2.5%
1980                250     2.5%
1982                246     2.5%
1970                245     2.5%
1974                241     2.4%
1975                240     2.4%
1973                238     2.4%
1979                236     2.4%
Other values (65)   4906    49.1%
(Missing)           2611    26.1%

Smallest 10 values

Value               Count   Frequency (%)
1924                1       < 0.1%
1929                1       < 0.1%
1930                2       < 0.1%
1931                2       < 0.1%
1932                1       < 0.1%
1933                1       < 0.1%
1934                2       < 0.1%
1935                1       < 0.1%
1936                5       0.1%
1937                7       0.1%

Largest 10 values

Value               Count   Frequency (%)
2015                1       < 0.1%
2012                1       < 0.1%
2011                4       < 0.1%
2006                1       < 0.1%
2001                1       < 0.1%
2000                1       < 0.1%
1996                2       < 0.1%
1995                4       < 0.1%
1994                4       < 0.1%
1993                10      0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

(variable) | 연료 | 현소유자의출생년도
연료 | 1.000 | 0.000
현소유자의출생년도 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도
8693 | 2017-12 | 서울특별시 동대문구 답십리2동 | 니로 하이브리드 | 하이브리드(휘발유+전기) | 2016-11-08 | 2012
63275 | 2017-12 | 서울특별시 중구 을지로동 | 토요타 Prius | 하이브리드(휘발유+전기) | 2016-10-31 | 1972
39118 | 2017-12 | 서울특별시 강남구 압구정동 | 토요타 Camry Hybrid | 하이브리드(휘발유+전기) | 2017-10-27 | 1989
3930 | 2017-12 | 서울특별시 동작구 노량진1동 | 쏘나타 (SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2012-02-29 | 1962
55920 | 2017-12 | 서울특별시 성동구 성수1가2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2013-06-05 | 1984
18629 | 2017-12 | 서울특별시 서초구 방배4동 | 렉서스 LS600hL | 하이브리드(휘발유+전기) | 2014-09-18 | 1941
44874 | 2017-12 | 서울특별시 용산구 원효로1동 | 그랜저(GRANDEUR) 하이브리드 | 하이브리드(휘발유+전기) | 2014-12-17 | 1969
12440 | 2017-12 | 서울특별시 양천구 신정7동 | 그랜저 하이브리드 | 하이브리드(휘발유+전기) | 2017-08-30 | 1959
7945 | 2017-12 | 서울특별시 성동구 응봉동 | 토요타 Camry Hybrid | 하이브리드(휘발유+전기) | 2017-04-21 | 1963
56319 | 2017-12 | 서울특별시 영등포구 여의동 | 그랜저(GRANDEUR) 하이브리드 | 하이브리드(휘발유+전기) | 2014-12-19 | <NA>

(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도
23618 | 2017-12 | 서울특별시 관악구 청룡동 | 아이오닉 일렉트릭(IONIQ ELEC | 전기 | 2017-06-02 | <NA>
29169 | 2017-12 | 서울특별시 은평구 진관동 | 그랜저 하이브리드 | 하이브리드(휘발유+전기) | 2017-09-28 | 1962
19388 | 2017-12 | 서울특별시 서초구 방배1동 | 아이오닉 하이브리드(IONIQ HY | 하이브리드(휘발유+전기) | 2016-12-27 | 1980
30996 | 2017-12 | 서울특별시 중구 소공동 | 그랜저(GRANDEUR) 하이브리드 | 하이브리드(휘발유+전기) | 2015-03-20 | <NA>
54340 | 2017-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2017-07-19 | <NA>
61774 | 2017-12 | 서울특별시 양천구 신정6동 | 쏘나타(SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2013-04-29 | 1971
15900 | 2017-12 | 서울특별시 영등포구 여의동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2017-06-23 | <NA>
14073 | 2017-12 | 서울특별시 도봉구 창2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-05-04 | 1966
44943 | 2017-12 | 서울특별시 영등포구 신길3동 | 토요타 RAV4 Hybrid | 하이브리드(휘발유+전기) | 2016-11-11 | 1982
8321 | 2017-12 | 서울특별시 강남구 압구정동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2014-08-07 | 1937

Duplicate rows

Most frequently occurring

(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도 | # duplicates
203 | 2017-12 | 서울특별시 서초구 양재2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-01-13 | <NA> | 29
21 | 2017-12 | 서울특별시 강남구 대치4동 | 아이오닉 일렉트릭(IONIQ ELEC | 전기 | 2017-06-29 | <NA> | 21
189 | 2017-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-24 | <NA> | 21
202 | 2017-12 | 서울특별시 서초구 양재2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-01-12 | <NA> | 21
20 | 2017-12 | 서울특별시 강남구 대치4동 | 아이오닉 일렉트릭(IONIQ ELEC | 전기 | 2017-06-16 | <NA> | 19
30 | 2017-12 | 서울특별시 강남구 대치4동 | 아이오닉 일렉트릭(IONIQ ELEC | 전기 | 2017-09-21 | <NA> | 19
186 | 2017-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-19 | <NA> | 18
16 | 2017-12 | 서울특별시 강남구 대치4동 | 쏘울 EV | 전기 | 2017-11-22 | <NA> | 17
17 | 2017-12 | 서울특별시 강남구 대치4동 | 쏘울 EV | 전기 | 2017-12-04 | <NA> | 17
32 | 2017-12 | 서울특별시 강남구 대치4동 | 아이오닉 일렉트릭(IONIQ ELEC | 전기 | 2017-09-28 | <NA> | 17
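
A table like the one above can be produced by counting how often each full row occurs. The sketch below uses Python's standard `collections.Counter` on a handful of hypothetical toy rows (not taken from the dataset). It also shows two different ways to summarize duplication, since the report does not state which definition lies behind its figure of 240 duplicate rows:

```python
from collections import Counter

# Hypothetical toy rows standing in for dataset records (not real data).
rows = [
    ("2017-12", "A-dong", "K5", "2016-01-13", None),
    ("2017-12", "A-dong", "K5", "2016-01-13", None),
    ("2017-12", "A-dong", "K5", "2016-01-13", None),
    ("2017-12", "B-dong", "Prius", "2016-10-31", 1972),
    ("2017-12", "C-dong", "EV", "2017-06-29", None),
    ("2017-12", "C-dong", "EV", "2017-06-29", None),
]

# Count occurrences of each distinct row.
row_counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicate_groups = [(row, n) for row, n in row_counts.most_common() if n > 1]

# Summary 1: how many distinct rows are duplicated at all.
n_distinct_duplicated = len(duplicate_groups)

# Summary 2: how many rows would be dropped by keeping one copy of each.
n_surplus_copies = sum(n - 1 for _, n in duplicate_groups)

for row, n in duplicate_groups:
    print(n, row)
print(n_distinct_duplicated, n_surplus_copies)
```

In the table above, the ten listed groups alone account for 199 rows, so the two summaries can differ considerably on this dataset. Which one the 240 figure corresponds to is not stated in the report.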