Overview

Dataset statistics

Number of variables: 16
Number of observations: 562
Missing cells: 4463
Missing cells (%): 49.6%
Duplicate rows: 24
Duplicate rows (%): 4.3%
Total size in memory: 73.7 KiB
Average record size in memory: 134.2 B
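
The two percentages above follow directly from the counts: missing cells are measured against all cells (variables × observations), and duplicate rows against the number of observations. A minimal sketch of that arithmetic, where the helper name `percent` is illustrative and not part of ydata-profiling:

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Counts copied from the dataset statistics above.
n_variables = 16
n_observations = 562
missing_cells = 4463
duplicate_rows = 24

total_cells = n_variables * n_observations  # 16 * 562 = 8992

print(f"Missing cells (%): {percent(missing_cells, total_cells):.1f}%")      # 49.6%
print(f"Duplicate rows (%): {percent(duplicate_rows, n_observations):.1f}%")  # 4.3%
```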

Variable types

Text: 2
Unsupported: 11
Categorical: 3

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15640/F/1/datasetView.do

Alerts

Dataset has 24 (4.3%) duplicate rows [Duplicates]
Unnamed: 3 is highly overall correlated with Unnamed: 6 [High correlation]
Unnamed: 6 is highly overall correlated with Unnamed: 3 [High correlation]
Unnamed: 6 is highly imbalanced (52.9%) [Imbalance]
Unnamed: 14 is highly imbalanced (98.1%) [Imbalance]
Unnamed: 0 has 508 (90.4%) missing values [Missing]
Unnamed: 1 has 562 (100.0%) missing values [Missing]
Unnamed: 2 has 560 (99.6%) missing values [Missing]
연료별 차종별 용도별 등록현황 has 562 (100.0%) missing values [Missing]
Unnamed: 5 has 562 (100.0%) missing values [Missing]
Unnamed: 10 has 562 (100.0%) missing values [Missing]
Unnamed: 11 has 560 (99.6%) missing values [Missing]
Unnamed: 13 has 562 (100.0%) missing values [Missing]
Unnamed: 1 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 2 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
연료별 차종별 용도별 등록현황 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 15 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-05-11 06:29:15.784727
Analysis finished: 2024-05-11 06:29:19.267206
Duration: 3.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Text

MISSING 

Distinct: 30
Distinct (%): 55.6%
Missing: 508
Missing (%): 90.4%
Memory size: 4.5 KiB

Length

Max length: 13
Median length: 3
Mean length: 3.4814815
Min length: 2

Characters and Unicode

Total characters: 188
Distinct characters: 59
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 11.1%

Sample

1st row: 자동차관리 시스템
2nd row: PROG_ID :
3rd row: 통계기준월 :
4th row: 시군구별
5th row: 합 계
Value | Count | Frequency (%)
서대문구 | 2 | 3.4%
마포구 | 2 | 3.4%
(blank) | 2 | 3.4%
강서구 | 2 | 3.4%
구로구 | 2 | 3.4%
금천구 | 2 | 3.4%
영등포구 | 2 | 3.4%
서초구 | 2 | 3.4%
관악구 | 2 | 3.4%
양천구 | 2 | 3.4%
Other values (23) | 38 | 65.5%

Most occurring characters

ValueCountFrequency (%)
52
27.7%
9
 
4.8%
9
 
4.8%
8
 
4.3%
6
 
3.2%
4
 
2.1%
4
 
2.1%
4
 
2.1%
4
 
2.1%
4
 
2.1%
Other values (49) 84
44.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 170 | 90.4%
Space Separator | 9 | 4.8%
Uppercase Letter | 6 | 3.2%
Other Punctuation | 2 | 1.1%
Connector Punctuation | 1 | 0.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
52
30.6%
9
 
5.3%
8
 
4.7%
6
 
3.5%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
Other values (40) 71
41.8%
Uppercase Letter
ValueCountFrequency (%)
P 1
16.7%
R 1
16.7%
O 1
16.7%
G 1
16.7%
I 1
16.7%
D 1
16.7%
Space Separator
ValueCountFrequency (%)
9
100.0%
Other Punctuation
ValueCountFrequency (%)
: 2
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 170 | 90.4%
Common | 12 | 6.4%
Latin | 6 | 3.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
52
30.6%
9
 
5.3%
8
 
4.7%
6
 
3.5%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
Other values (40) 71
41.8%
Latin
ValueCountFrequency (%)
P 1
16.7%
R 1
16.7%
O 1
16.7%
G 1
16.7%
I 1
16.7%
D 1
16.7%
Common
ValueCountFrequency (%)
9
75.0%
: 2
 
16.7%
_ 1
 
8.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 170 | 90.4%
ASCII | 18 | 9.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
52
30.6%
9
 
5.3%
8
 
4.7%
6
 
3.5%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
4
 
2.4%
Other values (40) 71
41.8%
ASCII
ValueCountFrequency (%)
9
50.0%
: 2
 
11.1%
P 1
 
5.6%
_ 1
 
5.6%
R 1
 
5.6%
O 1
 
5.6%
G 1
 
5.6%
I 1
 
5.6%
D 1
 
5.6%

Unnamed: 1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 562
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 560
Missing (%): 99.6%
Memory size: 4.5 KiB

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB
전기: 50
휘발유(무연): 50
CNG: 50
경유: 50
기타연료: 50
Other values (12): 312

Length

Max length: 13
Median length: 12
Mean length: 5.8309609
Min length: 1

Unique

Unique: 3
Unique (%): 0.5%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
전기 | 50 | 8.9%
휘발유(무연) | 50 | 8.9%
CNG | 50 | 8.9%
경유 | 50 | 8.9%
기타연료 | 50 | 8.9%
엘피지 | 50 | 8.9%
하이브리드(휘발유+전기) | 50 | 8.9%
휘발유 | 49 | 8.7%
하이브리드(경유+전기) | 44 | 7.8%
하이브리드(LPG+전기) | 43 | 7.7%
Other values (7) | 76 | 13.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
전기 | 50 | 8.9%
휘발유(무연 | 50 | 8.9%
cng | 50 | 8.9%
경유 | 50 | 8.9%
기타연료 | 50 | 8.9%
엘피지 | 50 | 8.9%
하이브리드(휘발유+전기 | 50 | 8.9%
휘발유 | 49 | 8.7%
하이브리드(경유+전기 | 44 | 7.8%
하이브리드(lpg+전기 | 43 | 7.7%
Other values (7) | 76 | 13.5%

연료별 차종별 용도별 등록현황
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 562
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 562
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 6
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB
비사업용: 300
사업용: 255
<NA>: 5
용도별: 1
(blank): 1

Length

Max length: 4
Median length: 4
Mean length: 3.5391459
Min length: 1

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
비사업용 | 300 | 53.4%
사업용 | 255 | 45.4%
<NA> | 5 | 0.9%
용도별 | 1 | 0.2%
(blank) | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
비사업용 | 300 | 53.4%
사업용 | 255 | 45.4%
na | 5 | 0.9%
용도별 | 1 | 0.2%
(blank) | 1 | 0.2%
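
The imbalance figures in the Alerts section (52.9% for Unnamed: 6 and 98.1% for Unnamed: 14) can be reproduced from the value counts with one minus the normalized Shannon entropy of the category distribution. The sketch below assumes that definition because it matches both reported numbers; the function name `imbalance_score` is illustrative and not taken from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy of the category
    distribution and k the number of categories. 0 means perfectly
    balanced, 1 means all mass on a single category."""
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single category is treated as balanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Value counts copied from the Common Values tables of the two variables.
unnamed_6 = [300, 255, 5, 1, 1]
unnamed_14 = [561, 1]

print(f"Unnamed: 6  -> {imbalance_score(unnamed_6):.1%}")   # 52.9%
print(f"Unnamed: 14 -> {imbalance_score(unnamed_14):.1%}")  # 98.1%
```

Unnamed: 6 has two large, nearly equal categories, but the three rare values raise the category count to five, which is what pushes the score above 50%.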

Unnamed: 7
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 8
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 9
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 562
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 11
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 560
Missing (%): 99.6%
Memory size: 4.5 KiB

Length

Max length: 10
Median length: 8
Mean length: 8
Min length: 6

Characters and Unicode

Total characters: 16
Distinct characters: 13
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: Page No. :
2nd row: 출력일자 :
Value | Count | Frequency (%)
(blank) | 2 | 40.0%
page | 1 | 20.0%
no | 1 | 20.0%
출력일자 | 1 | 20.0%

Most occurring characters

ValueCountFrequency (%)
3
18.8%
: 2
12.5%
P 1
 
6.2%
a 1
 
6.2%
g 1
 
6.2%
e 1
 
6.2%
N 1
 
6.2%
o 1
 
6.2%
. 1
 
6.2%
1
 
6.2%
Other values (3) 3
18.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4 | 25.0%
Other Letter | 4 | 25.0%
Space Separator | 3 | 18.8%
Other Punctuation | 3 | 18.8%
Uppercase Letter | 2 | 12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1
25.0%
g 1
25.0%
e 1
25.0%
o 1
25.0%
Other Letter
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Other Punctuation
ValueCountFrequency (%)
: 2
66.7%
. 1
33.3%
Uppercase Letter
ValueCountFrequency (%)
P 1
50.0%
N 1
50.0%
Space Separator
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 6 | 37.5%
Latin | 6 | 37.5%
Hangul | 4 | 25.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 1
16.7%
a 1
16.7%
g 1
16.7%
e 1
16.7%
N 1
16.7%
o 1
16.7%
Hangul
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Common
ValueCountFrequency (%)
3
50.0%
: 2
33.3%
. 1
 
16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12 | 75.0%
Hangul | 4 | 25.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3
25.0%
: 2
16.7%
P 1
 
8.3%
a 1
 
8.3%
g 1
 
8.3%
e 1
 
8.3%
N 1
 
8.3%
o 1
 
8.3%
. 1
 
8.3%
Hangul
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Unnamed: 12
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 562
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 14
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB
<NA>: 561
1: 1

Length

Max length: 4
Median length: 4
Mean length: 3.9946619
Min length: 1

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 1
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 561 | 99.8%
1 | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 561 | 99.8%
1 | 1 | 0.2%

Unnamed: 15
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Correlations

Variable | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11
Unnamed: 0 | 1.000 | 0.651 | 0.908 | 0.000
Unnamed: 3 | 0.651 | 1.000 | 0.981 | NaN
Unnamed: 6 | 0.908 | 0.981 | 1.000 | NaN
Unnamed: 11 | 0.000 | NaN | NaN | 1.000

Variable | Unnamed: 6 | Unnamed: 3 | Unnamed: 14
Unnamed: 6 | 1.000 | 0.814 | NaN
Unnamed: 3 | 0.814 | 1.000 | NaN
Unnamed: 14 | NaN | NaN | 1.000

Variable | Unnamed: 3 | Unnamed: 6 | Unnamed: 14
Unnamed: 3 | 1.000 | 0.814 | 0.000
Unnamed: 6 | 0.814 | 1.000 | 0.000
Unnamed: 14 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
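
Nullity correlation can be read as the Pearson correlation between the present/absent indicators of two columns. The following is a minimal standard-library sketch of that idea, not the profiler's own implementation; the function name `nullity_correlation` and the use of `None` to mark a missing cell are assumptions made for illustration.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two
    equal-length columns. None marks a missing cell. Returns NaN when
    either column is fully present or fully missing, because its
    indicator then has zero variance."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)

# Toy columns, not taken from the dataset.
label = ["PROG_ID :", None, "x", None]
value = ["STA029Q", None, "y", None]    # missing exactly where label is
inverse = [None, "a", None, "b"]        # present exactly where label is missing

print(nullity_correlation(label, value))    # 1.0
print(nullity_correlation(label, inverse))  # -1.0
```

A value of 1 means the two columns are always present or missing together, and -1 means one is present exactly when the other is missing.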

Sample

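First rows

Index | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15
0 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
1 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
2 | 자동차관리 시스템 | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
3 | PROG_ID : | <NA> | STA029Q | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | Page No. : | NaN | <NA> | 1 | NaN
4 | 통계기준월 : | <NA> | 202403 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | 출력일자 : | NaN | <NA> | <NA> | NaN
5 | 시군구별 | <NA> | NaN | 연료별 | <NA> | <NA> | 용도별 | 승 용 | 승 합 | 화 물 | <NA> | <NA> | 특 수 | <NA> | <NA> | 
6 | 합 계 | <NA> | NaN |  | <NA> | <NA> |  | 2775386 | 89887 | 311049 | <NA> | <NA> | 11454 | <NA> | <NA> | 3187776
7 | 종로구 | <NA> | NaN | CNG | <NA> | <NA> | 비사업용 | 6 | 9 | 16 | <NA> | <NA> | 0 | <NA> | <NA> | 31
8 | <NA> | <NA> | NaN | CNG | <NA> | <NA> | 사업용 | 0 | 80 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 80
9 | <NA> | <NA> | NaN | 경유 | <NA> | <NA> | 비사업용 | 9390 | 2751 | 3465 | <NA> | <NA> | 143 | <NA> | <NA> | 15749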
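
Last rows

Index | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15
552 | <NA> | <NA> | NaN | 하이브리드(LPG+전기) | <NA> | <NA> | 사업용 | 12 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 12
553 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 비사업용 | 204 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 204
554 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 사업용 | 7 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 7
555 | <NA> | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 비사업용 | 10772 | 0 | 1 | <NA> | <NA> | 0 | <NA> | <NA> | 10773
556 | <NA> | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 사업용 | 83 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 83
557 | <NA> | <NA> | NaN | 휘발유 | <NA> | <NA> | 비사업용 | 28108 | 17 | 114 | <NA> | <NA> | 0 | <NA> | <NA> | 28239
558 | 강동구 | <NA> | NaN | 휘발유 | <NA> | <NA> | 사업용 | 84 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 84
559 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 비사업용 | 50139 | 19 | 29 | <NA> | <NA> | 0 | <NA> | <NA> | 50187
560 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 사업용 | 286 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 286
561 | <NA> | <NA> | NaN | 휘발유(유연) | <NA> | <NA> | 비사업용 | 52 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 52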

Duplicate rows

Most frequently occurring

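Index | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11 | Unnamed: 14 | # duplicates
0 | <NA> | CNG | 사업용 | <NA> | <NA> | 25
1 | <NA> | 경유 | 비사업용 | <NA> | <NA> | 25
4 | <NA> | 기타연료 | 사업용 | <NA> | <NA> | 25
9 | <NA> | 전기 | 비사업용 | <NA> | <NA> | 25
17 | <NA> | 하이브리드(휘발유+전기) | 사업용 | <NA> | <NA> | 25
20 | <NA> | 휘발유(무연) | 비사업용 | <NA> | <NA> | 25
21 | <NA> | 휘발유(무연) | 사업용 | <NA> | <NA> | 25
22 | <NA> | 휘발유(유연) | 비사업용 | <NA> | <NA> | 25
2 | <NA> | 경유 | 사업용 | <NA> | <NA> | 24
3 | <NA> | 기타연료 | 비사업용 | <NA> | <NA> | 24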