Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 2585
Missing cells (%): 4.3%
Duplicate rows: 259
Duplicate rows (%): 2.6%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
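
The percentage and per-record figures follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell share is taken over all cells (rows × columns) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Raw counts taken from the dataset statistics.
n_variables = 6
n_observations = 10_000
missing_cells = 2_585
duplicate_rows = 259
total_size_kib = 556.6

# Assumption: the missing-cell share is relative to every cell (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three derived values round to the figures reported in the statistics.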

Variable types

DateTime: 2
Text: 2
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-21245/F/1/datasetView.do

Alerts

[Constant] 기준년월 (reference year-month) has constant value ""
[Duplicates] Dataset has 259 (2.6%) duplicate rows
[Imbalance] 연료 (fuel) is highly imbalanced (81.5%)
[Missing] 현소유자의출생년도 (current owner's birth year) has 2582 (25.8%) missing values
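
The 81.5% imbalance figure can be reproduced from the category counts listed for 연료 under "Common Values". The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible entropy. That definition is an assumption that happens to match the reported number, not something the report itself states, and `imbalance_score` is a hypothetical helper name:

```python
import math

# Category counts for 연료, taken from the report's "Common Values" table.
counts = [9237, 433, 311, 11, 7, 1]


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for the class counts."""
    if len(counts) < 2:
        return 0.0  # a single class has no meaningful imbalance score
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts))  # entropy of a uniform distribution
    return 1 - entropy / max_entropy


score = imbalance_score(counts)
print(f"Imbalance: {score:.1%}")
```

A score near 0 would indicate evenly spread categories, while a score near 1 indicates that one category dominates, as the gasoline-hybrid category does here.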

Reproduction

Analysis started: 2024-03-13 07:47:35.110577
Analysis finished: 2024-03-13 07:47:35.983475
Duration: 0.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
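
The reported duration is the difference between the two timestamps. A small sketch using only the standard library (the format string is an assumption about how the timestamps are written):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, and microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-03-13 07:47:35.110577", FMT)
finished = datetime.strptime("2024-03-13 07:47:35.983475", FMT)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```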

Variables

기준년월
Date

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2016-12-01 00:00:00
Maximum: 2016-12-01 00:00:00
Histogram with fixed size bins (bins=1)

사용본거지시읍면동_행정동기준
Text

Distinct: 424
Distinct (%): 4.2%
Missing: 3
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 14
Mean length: 13.846754
Min length: 11

Characters and Unicode

Total characters: 138426
Distinct characters: 194
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
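
As an illustration of those character properties, the sketch below looks up the Unicode general category of every character in the first sample row of this variable, using Python's standard `unicodedata` module. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Zs` is Space Separator, and `Nd` is Decimal Number.

```python
import unicodedata
from collections import Counter

# First sample row of this variable, as listed in the report.
text = "서울특별시 성북구 정릉1동"

# Count the general category of each code point in the string.
categories = Counter(unicodedata.category(ch) for ch in text)

print("Length:", len(text))
for code, count in categories.most_common():
    print(code, count)
```

Summing such counts over every row would give totals of the kind shown under "Most occurring categories".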

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 서울특별시 성북구 정릉1동
2nd row: 서울특별시 양천구 목2동
3rd row: 서울특별시 강서구 화곡3동
4th row: 서울특별시 금천구 시흥3동
5th row: 서울특별시 강동구 성내2동
Value | Count | Frequency (%)
서울특별시 | 9997 | 33.3%
강남구 | 1357 | 4.5%
서초구 | 966 | 3.2%
강서구 | 847 | 2.8%
송파구 | 726 | 2.4%
역삼1동 | 593 | 2.0%
영등포구 | 563 | 1.9%
마포구 | 445 | 1.5%
강동구 | 383 | 1.3%
양천구 | 380 | 1.3%
Other values (439) | 13734 | 45.8%
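
The counts in this table sum to 29,991, so the frequencies appear to be shares of all whitespace-separated tokens rather than of rows (9,997 / 29,991 ≈ 33.3%). A minimal sketch of that kind of word count, run on the five sample rows only and assuming plain whitespace tokenisation:

```python
from collections import Counter

# The five sample rows listed for this variable.
rows = [
    "서울특별시 성북구 정릉1동",
    "서울특별시 양천구 목2동",
    "서울특별시 강서구 화곡3동",
    "서울특별시 금천구 시흥3동",
    "서울특별시 강동구 성내2동",
]

# Assumption: tokens are separated by whitespace only.
tokens = [token for row in rows for token in row.split()]
freq = Counter(tokens)
total = sum(freq.values())

for value, count in freq.most_common(3):
    print(f"{value} | {count} | {100 * count / total:.1f}%")
```

On this small sample the city name accounts for one token in three, which mirrors the 33.3% reported for the full column.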

Most occurring characters

Value | Count | Frequency (%)
(space) | 19994 | 14.4%
 | 12323 | 8.9%
 | 11237 | 8.1%
 | 10656 | 7.7%
 | 10082 | 7.3%
 | 9997 | 7.2%
 | 9997 | 7.2%
 | 9997 | 7.2%
1 | 3182 | 2.3%
 | 2858 | 2.1%
Other values (184) | 38103 | 27.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 111417 | 80.5%
Space Separator | 19994 | 14.4%
Decimal Number | 6862 | 5.0%
Other Punctuation | 153 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 12323 | 11.1%
 | 11237 | 10.1%
 | 10656 | 9.6%
 | 10082 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 2858 | 2.6%
 | 1449 | 1.3%
 | 1261 | 1.1%
Other values (172) | 31560 | 28.3%

Decimal Number
Value | Count | Frequency (%)
1 | 3182 | 46.4%
2 | 2020 | 29.4%
3 | 754 | 11.0%
4 | 475 | 6.9%
5 | 154 | 2.2%
6 | 122 | 1.8%
7 | 98 | 1.4%
8 | 38 | 0.6%
0 | 10 | 0.1%
9 | 9 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 19994 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 153 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 111417 | 80.5%
Common | 27009 | 19.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 12323 | 11.1%
 | 11237 | 10.1%
 | 10656 | 9.6%
 | 10082 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 2858 | 2.6%
 | 1449 | 1.3%
 | 1261 | 1.1%
Other values (172) | 31560 | 28.3%

Common
Value | Count | Frequency (%)
(space) | 19994 | 74.0%
1 | 3182 | 11.8%
2 | 2020 | 7.5%
3 | 754 | 2.8%
4 | 475 | 1.8%
5 | 154 | 0.6%
. | 153 | 0.6%
6 | 122 | 0.5%
7 | 98 | 0.4%
8 | 38 | 0.1%
Other values (2) | 19 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 111417 | 80.5%
ASCII | 27009 | 19.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 19994 | 74.0%
1 | 3182 | 11.8%
2 | 2020 | 7.5%
3 | 754 | 2.8%
4 | 475 | 1.8%
5 | 154 | 0.6%
. | 153 | 0.6%
6 | 122 | 0.5%
7 | 98 | 0.4%
8 | 38 | 0.1%
Other values (2) | 19 | 0.1%

Hangul
Value | Count | Frequency (%)
 | 12323 | 11.1%
 | 11237 | 10.1%
 | 10656 | 9.6%
 | 10082 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 9997 | 9.0%
 | 2858 | 2.6%
 | 1449 | 1.3%
 | 1261 | 1.1%
Other values (172) | 31560 | 28.3%

차명
Text

Distinct: 103
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 20
Mean length: 13.3635
Min length: 2

Characters and Unicode

Total characters: 133635
Distinct characters: 141
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 0.4%

Sample

1st row: K5 하이브리드
2nd row: 쏘나타 하이브리드(SONATA HYB
3rd row: 쏘나타(SONATA) 하이브리드
4th row: 아반떼 하이브리드(AVANTE HYB
5th row: 쏘나타(SONATA) 하이브리드
Value | Count | Frequency (%)
하이브리드 | 4705 | 20.1%
렉서스 | 1970 | 8.4%
쏘나타 | 1506 | 6.4%
es300h | 1304 | 5.6%
토요타 | 1256 | 5.4%
k5 | 1171 | 5.0%
hyb | 1114 | 4.8%
하이브리드(sonata | 830 | 3.6%
prius | 761 | 3.3%
sonata | 668 | 2.9%
Other values (116) | 8074 | 34.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 13928 | 10.4%
 | 6654 | 5.0%
 | 6247 | 4.7%
 | 6246 | 4.7%
A | 6224 | 4.7%
 | 6219 | 4.7%
 | 6216 | 4.7%
S | 4288 | 3.2%
N | 4285 | 3.2%
( | 3952 | 3.0%
Other values (131) | 69376 | 51.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 55896 | 41.8%
Uppercase Letter | 42899 | 32.1%
Space Separator | 13928 | 10.4%
Decimal Number | 7992 | 6.0%
Lowercase Letter | 6799 | 5.1%
Open Punctuation | 3952 | 3.0%
Close Punctuation | 1918 | 1.4%
Other Punctuation | 230 | 0.2%
Dash Punctuation | 21 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 6654 | 11.9%
 | 6247 | 11.2%
 | 6246 | 11.2%
 | 6219 | 11.1%
 | 6216 | 11.1%
 | 3340 | 6.0%
 | 2137 | 3.8%
 | 2097 | 3.8%
 | 2066 | 3.7%
 | 2020 | 3.6%
Other values (70) | 12654 | 22.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 6224 | 14.5%
S | 4288 | 10.0%
N | 4285 | 10.0%
R | 3581 | 8.3%
E | 3198 | 7.5%
T | 2643 | 6.2%
O | 2444 | 5.7%
H | 2060 | 4.8%
U | 1822 | 4.2%
I | 1672 | 3.9%
Other values (15) | 10682 | 24.9%

Lowercase Letter
Value | Count | Frequency (%)
h | 2002 | 29.4%
r | 975 | 14.3%
y | 828 | 12.2%
i | 817 | 12.0%
b | 598 | 8.8%
d | 592 | 8.7%
a | 242 | 3.6%
m | 235 | 3.5%
u | 146 | 2.1%
s | 140 | 2.1%
Other values (10) | 224 | 3.3%

Decimal Number
Value | Count | Frequency (%)
0 | 3882 | 48.6%
3 | 1666 | 20.8%
5 | 1427 | 17.9%
4 | 361 | 4.5%
7 | 314 | 3.9%
2 | 240 | 3.0%
6 | 55 | 0.7%
8 | 30 | 0.4%
1 | 11 | 0.1%
9 | 6 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 229 | 99.6%
, | 1 | 0.4%

Space Separator
Value | Count | Frequency (%)
(space) | 13928 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3952 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1918 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 21 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 55896 | 41.8%
Latin | 49698 | 37.2%
Common | 28041 | 21.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 6654 | 11.9%
 | 6247 | 11.2%
 | 6246 | 11.2%
 | 6219 | 11.1%
 | 6216 | 11.1%
 | 3340 | 6.0%
 | 2137 | 3.8%
 | 2097 | 3.8%
 | 2066 | 3.7%
 | 2020 | 3.6%
Other values (70) | 12654 | 22.6%

Latin
Value | Count | Frequency (%)
A | 6224 | 12.5%
S | 4288 | 8.6%
N | 4285 | 8.6%
R | 3581 | 7.2%
E | 3198 | 6.4%
T | 2643 | 5.3%
O | 2444 | 4.9%
H | 2060 | 4.1%
h | 2002 | 4.0%
U | 1822 | 3.7%
Other values (35) | 17151 | 34.5%

Common
Value | Count | Frequency (%)
(space) | 13928 | 49.7%
( | 3952 | 14.1%
0 | 3882 | 13.8%
) | 1918 | 6.8%
3 | 1666 | 5.9%
5 | 1427 | 5.1%
4 | 361 | 1.3%
7 | 314 | 1.1%
2 | 240 | 0.9%
. | 229 | 0.8%
Other values (6) | 124 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 77739 | 58.2%
Hangul | 55896 | 41.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 13928 | 17.9%
A | 6224 | 8.0%
S | 4288 | 5.5%
N | 4285 | 5.5%
( | 3952 | 5.1%
0 | 3882 | 5.0%
R | 3581 | 4.6%
E | 3198 | 4.1%
T | 2643 | 3.4%
O | 2444 | 3.1%
Other values (51) | 29314 | 37.7%

Hangul
Value | Count | Frequency (%)
 | 6654 | 11.9%
 | 6247 | 11.2%
 | 6246 | 11.2%
 | 6219 | 11.1%
 | 6216 | 11.1%
 | 3340 | 6.0%
 | 2137 | 3.8%
 | 2097 | 3.8%
 | 2066 | 3.7%
 | 2020 | 3.6%
Other values (70) | 12654 | 22.6%

연료
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
하이브리드(휘발유+전기) | 9237
하이브리드(LPG+전기) | 433
전기 | 311
하이브리드(CNG+전기) | 11
하이브리드(경유+전기) | 7

Length

Max length: 13
Median length: 13
Mean length: 12.6561
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 하이브리드(휘발유+전기)
2nd row: 하이브리드(휘발유+전기)
3rd row: 하이브리드(휘발유+전기)
4th row: 하이브리드(LPG+전기)
5th row: 하이브리드(휘발유+전기)

Common Values

Value | Count | Frequency (%)
하이브리드(휘발유+전기) | 9237 | 92.4%
하이브리드(LPG+전기) | 433 | 4.3%
전기 | 311 | 3.1%
하이브리드(CNG+전기) | 11 | 0.1%
하이브리드(경유+전기) | 7 | 0.1%
수소 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
하이브리드(휘발유+전기 | 9237 | 92.4%
하이브리드(lpg+전기 | 433 | 4.3%
전기 | 311 | 3.1%
하이브리드(cng+전기 | 11 | 0.1%
하이브리드(경유+전기 | 7 | 0.1%
수소 | 1 | < 0.1%

최초등록일
Date

Distinct: 1830
Distinct (%): 18.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2005-10-27 00:00:00
Maximum: 2016-12-30 00:00:00
Histogram with fixed size bins (bins=50)

현소유자의출생년도
Real number (ℝ)

MISSING 

Distinct: 73
Distinct (%): 1.0%
Missing: 2582
Missing (%): 25.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1970.5898
Minimum: 1924
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1924
5-th percentile: 1950
Q1: 1962
Median: 1972
Q3: 1980
95-th percentile: 1987
Maximum: 2015
Range: 91
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 11.547492
Coefficient of variation (CV): 0.0058599168
Kurtosis: -0.28950047
Mean: 1970.5898
Median Absolute Deviation (MAD): 9
Skewness: -0.43416319
Sum: 14617835
Variance: 133.34457
Monotonicity: Not monotonic
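
Several of these statistics are simple functions of one another. The sketch below recomputes the derived ones from the reported values, using the usual textbook definitions (CV as standard deviation over mean, variance as the squared standard deviation, IQR as Q3 minus Q1). It is a consistency check under those assumptions, not a description of how the profiling tool computes them:

```python
# Values copied from the quantile and descriptive statistics of this variable.
mean = 1970.5898
std = 11.547492
total_sum = 14617835
q1, q3 = 1962, 1980
minimum, maximum = 1924, 2015

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance as squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
# Sum divided by mean recovers the number of non-missing observations.
n_non_missing = round(total_sum / mean)

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.5f}")
print(f"IQR: {iqr}, Range: {value_range}")
print(f"Non-missing observations: {n_non_missing}")
```

The recovered count of 7,418 non-missing observations agrees with 10,000 observations minus the 2,582 missing values reported for this variable.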
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1981 | 271 | 2.7%
1983 | 260 | 2.6%
1982 | 253 | 2.5%
1980 | 240 | 2.4%
1971 | 239 | 2.4%
1977 | 238 | 2.4%
1972 | 238 | 2.4%
1979 | 237 | 2.4%
1975 | 235 | 2.4%
1974 | 234 | 2.3%
Other values (63) | 4973 | 49.7%
(Missing) | 2582 | 25.8%

Minimum 10 values
Value | Count | Frequency (%)
1924 | 1 | < 0.1%
1927 | 1 | < 0.1%
1929 | 2 | < 0.1%
1930 | 1 | < 0.1%
1931 | 1 | < 0.1%
1932 | 1 | < 0.1%
1933 | 1 | < 0.1%
1934 | 3 | < 0.1%
1935 | 5 | 0.1%
1936 | 6 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2015 | 1 | < 0.1%
2011 | 4 | < 0.1%
2001 | 3 | < 0.1%
1996 | 1 | < 0.1%
1995 | 2 | < 0.1%
1994 | 4 | < 0.1%
1993 | 6 | 0.1%
1992 | 18 | 0.2%
1991 | 25 | 0.2%
1990 | 36 | 0.4%

Interactions


Correlations

 | 연료 | 현소유자의출생년도
연료 | 1.000 | 0.000
현소유자의출생년도 | 0.000 | 1.000

 | 현소유자의출생년도 | 연료
현소유자의출생년도 | 1.000 | 0.000
연료 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도
7979 | 2016-12 | 서울특별시 성북구 정릉1동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2014-07-11 | <NA>
27645 | 2016-12 | 서울특별시 양천구 목2동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2015-11-18 | 1974
28013 | 2016-12 | 서울특별시 강서구 화곡3동 | 쏘나타(SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2013-12-09 | 1966
18814 | 2016-12 | 서울특별시 금천구 시흥3동 | 아반떼 하이브리드(AVANTE HYB | 하이브리드(LPG+전기) | 2009-08-18 | 1964
35640 | 2016-12 | 서울특별시 강동구 성내2동 | 쏘나타(SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2013-01-02 | 1954
14860 | 2016-12 | 서울특별시 마포구 상암동 | 렉서스 CT200h | 하이브리드(휘발유+전기) | 2015-05-18 | 1969
15266 | 2016-12 | 서울특별시 강서구 가양1동 | 그랜저 하이브리드 (GRANDEUR | 하이브리드(휘발유+전기) | 2015-04-23 | <NA>
35485 | 2016-12 | 서울특별시 강서구 발산1동 | 쏘나타 (SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2012-11-28 | 1985
45792 | 2016-12 | 서울특별시 서초구 반포4동 | 렉서스 NX300h | 하이브리드(휘발유+전기) | 2016-04-29 | 1972
14498 | 2016-12 | 서울특별시 서초구 양재2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-01-13 | <NA>

Last rows
(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도
34079 | 2016-12 | 서울특별시 서대문구 북아현동 | 포르테하이브리드 | 하이브리드(LPG+전기) | 2011-03-15 | <NA>
43598 | 2016-12 | 서울특별시 노원구 중계본동 | 렉서스 ES300h | 하이브리드(휘발유+전기) | 2013-11-18 | 1976
4667 | 2016-12 | 서울특별시 강남구 역삼1동 | 렉서스 ES300h | 하이브리드(휘발유+전기) | 2014-06-26 | <NA>
20788 | 2016-12 | 서울특별시 동대문구 휘경1동 | 렉서스 ES300h | 하이브리드(휘발유+전기) | 2013-06-21 | 1970
41619 | 2016-12 | 서울특별시 송파구 삼전동 | 렉서스 ES300h | 하이브리드(휘발유+전기) | 2016-07-27 | 1974
11459 | 2016-12 | 서울특별시 강서구 가양1동 | 쏘나타(SONATA) 하이브리드 | 하이브리드(휘발유+전기) | 2013-12-16 | <NA>
34857 | 2016-12 | 서울특별시 광진구 화양동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-05-10 | 1988
30535 | 2016-12 | 서울특별시 강서구 화곡본동 | 아이오닉 하이브리드(IONIQ HY | 하이브리드(휘발유+전기) | 2016-09-07 | <NA>
35731 | 2016-12 | 서울특별시 중구 다산동 | 토요타 CAMRY Hybrid | 하이브리드(휘발유+전기) | 2012-05-23 | 1957
12644 | 2016-12 | 서울특별시 강남구 역삼1동 | 렉서스 ES300h | 하이브리드(휘발유+전기) | 2016-10-21 | <NA>

Duplicate rows

Most frequently occurring

(index) | 기준년월 | 사용본거지시읍면동_행정동기준 | 차명 | 연료 | 최초등록일 | 현소유자의출생년도 | # duplicates
208 | 2016-12 | 서울특별시 서초구 양재2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-01-12 | <NA> | 35
209 | 2016-12 | 서울특별시 서초구 양재2동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2016-01-13 | <NA> | 34
196 | 2016-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-24 | <NA> | 31
193 | 2016-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-19 | <NA> | 29
195 | 2016-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-23 | <NA> | 20
120 | 2016-12 | 서울특별시 강서구 가양1동 | 그랜저(GRANDEUR) 하이브리드 | 하이브리드(휘발유+전기) | 2014-01-24 | <NA> | 14
189 | 2016-12 | 서울특별시 서초구 양재1동 | K7 하이브리드 | 하이브리드(휘발유+전기) | 2016-11-24 | <NA> | 14
197 | 2016-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-26 | <NA> | 13
194 | 2016-12 | 서울특별시 서초구 양재1동 | 쏘나타 하이브리드(SONATA HYB | 하이브리드(휘발유+전기) | 2014-12-22 | <NA> | 12
162 | 2016-12 | 서울특별시 구로구 구로1동 | K5 하이브리드 | 하이브리드(휘발유+전기) | 2015-02-26 | <NA> | 10