Overview

Dataset statistics

Number of variables: 16
Number of observations: 555
Missing cells: 4962
Missing cells (%): 55.9%
Duplicate rows: 25
Duplicate rows (%): 4.5%
Total size in memory: 72.2 KiB
Average record size in memory: 133.2 B
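
The percentages and the average record size follow directly from the raw counts. A minimal Python sketch of that arithmetic, using only figures from this table; the variable names are illustrative, and rounding to one decimal place is an assumption about how the report displays values:

```python
# Figures copied from the Dataset statistics table.
n_variables = 16
n_observations = 555
missing_cells = 4962
duplicate_rows = 25
total_size_kib = 72.2

total_cells = n_variables * n_observations  # every cell in the 555 x 16 table
missing_pct = round(100 * missing_cells / total_cells, 1)
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)
avg_record_bytes = round(total_size_kib * 1024 / n_observations, 1)

print(missing_pct, duplicate_pct, avg_record_bytes)  # 55.9 4.5 133.2
```

More than half of all cells are empty, which is consistent with the many fully missing columns listed under Alerts.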

Variable types

Text: 3
Unsupported: 11
Categorical: 2

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15640/F/1/datasetView.do

Alerts

Dataset has 25 (4.5%) duplicate rows [Duplicates]
Unnamed: 3 is highly overall correlated with Unnamed: 6 [High correlation]
Unnamed: 6 is highly overall correlated with Unnamed: 3 [High correlation]
Unnamed: 6 is highly imbalanced (52.9%) [Imbalance]
Unnamed: 0 has 503 (90.6%) missing values [Missing]
Unnamed: 1 has 555 (100.0%) missing values [Missing]
Unnamed: 2 has 553 (99.6%) missing values [Missing]
연료별 차종별 용도별 등록현황 has 555 (100.0%) missing values [Missing]
Unnamed: 5 has 555 (100.0%) missing values [Missing]
Unnamed: 10 has 555 (100.0%) missing values [Missing]
Unnamed: 11 has 553 (99.6%) missing values [Missing]
Unnamed: 13 has 555 (100.0%) missing values [Missing]
Unnamed: 14 has 553 (99.6%) missing values [Missing]
Unnamed: 1 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 2 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
연료별 차종별 용도별 등록현황 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-17 22:38:30.468840
Analysis finished: 2024-04-17 22:38:31.647239
Duration: 1.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Text

MISSING 

Distinct: 30
Distinct (%): 57.7%
Missing: 503
Missing (%): 90.6%
Memory size: 4.5 KiB

Length

Max length: 13
Median length: 3
Mean length: 3.5
Min length: 2

Characters and Unicode

Total characters: 182
Distinct characters: 59
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
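
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one sample value of this column, `PROG_ID :`. The helper name `category_counts` is hypothetical, and the two-letter codes correspond to the category names printed in this report (`Lu` Uppercase Letter, `Pc` Connector Punctuation, `Zs` Space Separator, `Po` Other Punctuation, `Lo` Other Letter).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories over the characters of text."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("PROG_ID :")
print(dict(counts))

# Hangul syllables, such as those in the district names, fall under "Lo".
print(unicodedata.category("구"))
```

For this value the counts are six uppercase letters and one each of connector punctuation, space separator, and other punctuation. Script and block lookups are not part of `unicodedata`, so they are left out of this sketch.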

Unique

Unique: 8
Unique (%): 15.4%

Sample

1st row: 자동차관리 시스템 (Vehicle Management System)
2nd row: PROG_ID :
3rd row: 통계기준월 : (statistics reference month)
4th row: 시군구별 (by district)
5th row: 합 계 (total)

Value | Count | Frequency (%)
은평구 | 2 | 3.6%
서대문구 | 2 | 3.6%
(value not preserved) | 2 | 3.6%
양천구 | 2 | 3.6%
강서구 | 2 | 3.6%
구로구 | 2 | 3.6%
금천구 | 2 | 3.6%
송파구 | 2 | 3.6%
강동구 | 2 | 3.6%
강남구 | 2 | 3.6%
Other values (23) | 36 | 64.3%

Most occurring characters

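Value | Count | Frequency (%)
(character not preserved) | 50 | 27.5%
(space) | 9 | 4.9%
(character not preserved) | 8 | 4.4%
(character not preserved) | 8 | 4.4%
(character not preserved) | 6 | 3.3%
(character not preserved) | 4 | 2.2%
(character not preserved) | 4 | 2.2%
(character not preserved) | 4 | 2.2%
(character not preserved) | 4 | 2.2%
(character not preserved) | 4 | 2.2%
Other values (49) | 81 | 44.5%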

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 164 | 90.1%
Space Separator | 9 | 4.9%
Uppercase Letter | 6 | 3.3%
Other Punctuation | 2 | 1.1%
Connector Punctuation | 1 | 0.5%

Most frequent character per category

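Other Letter
Value | Count | Frequency (%)
(character not preserved) | 50 | 30.5%
(character not preserved) | 8 | 4.9%
(character not preserved) | 8 | 4.9%
(character not preserved) | 6 | 3.7%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
Other values (40) | 68 | 41.5%

Uppercase Letter
Value | Count | Frequency (%)
I | 1 | 16.7%
G | 1 | 16.7%
O | 1 | 16.7%
R | 1 | 16.7%
P | 1 | 16.7%
D | 1 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 9 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 2 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%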

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 164 | 90.1%
Common | 12 | 6.6%
Latin | 6 | 3.3%

Most frequent character per script

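Hangul
Value | Count | Frequency (%)
(character not preserved) | 50 | 30.5%
(character not preserved) | 8 | 4.9%
(character not preserved) | 8 | 4.9%
(character not preserved) | 6 | 3.7%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
Other values (40) | 68 | 41.5%

Latin
Value | Count | Frequency (%)
I | 1 | 16.7%
G | 1 | 16.7%
O | 1 | 16.7%
R | 1 | 16.7%
P | 1 | 16.7%
D | 1 | 16.7%

Common
Value | Count | Frequency (%)
(space) | 9 | 75.0%
: | 2 | 16.7%
_ | 1 | 8.3%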

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 164 | 90.1%
ASCII | 18 | 9.9%

Most frequent character per block

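Hangul
Value | Count | Frequency (%)
(character not preserved) | 50 | 30.5%
(character not preserved) | 8 | 4.9%
(character not preserved) | 8 | 4.9%
(character not preserved) | 6 | 3.7%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
(character not preserved) | 4 | 2.4%
Other values (40) | 68 | 41.5%

ASCII
Value | Count | Frequency (%)
(space) | 9 | 50.0%
: | 2 | 11.1%
I | 1 | 5.6%
_ | 1 | 5.6%
G | 1 | 5.6%
O | 1 | 5.6%
R | 1 | 5.6%
P | 1 | 5.6%
D | 1 | 5.6%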

Unnamed: 1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 555
Missing (%): 100.0%
Memory size: 5.0 KiB

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 553
Missing (%): 99.6%
Memory size: 4.5 KiB

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB

CNG: 50
경유: 50
기타연료: 50
엘피지: 50
전기: 50
Other values (11): 305

Length

Max length: 13
Median length: 12
Mean length: 5.7423423
Min length: 1

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

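Value | Count | Frequency (%)
CNG | 50 | 9.0%
경유 (diesel) | 50 | 9.0%
기타연료 (other fuel) | 50 | 9.0%
엘피지 (LPG) | 50 | 9.0%
전기 (electric) | 50 | 9.0%
휘발유 (gasoline) | 50 | 9.0%
휘발유(무연) (unleaded gasoline) | 50 | 9.0%
하이브리드(휘발유+전기) (hybrid, gasoline + electric) | 49 | 8.8%
하이브리드(경유+전기) (hybrid, diesel + electric) | 42 | 7.6%
수소 (hydrogen) | 37 | 6.7%
Other values (6) | 77 | 13.9%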

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
cng | 50 | 9.0%
경유 | 50 | 9.0%
기타연료 | 50 | 9.0%
엘피지 | 50 | 9.0%
전기 | 50 | 9.0%
휘발유 | 50 | 9.0%
휘발유(무연 | 50 | 9.0%
하이브리드(휘발유+전기 | 49 | 8.8%
하이브리드(경유+전기 | 42 | 7.6%
수소 | 37 | 6.7%
Other values (6) | 77 | 13.9%

연료별 차종별 용도별 등록현황 (registration status by fuel, vehicle type, and use)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 555
Missing (%): 100.0%
Memory size: 5.0 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 555
Missing (%): 100.0%
Memory size: 5.0 KiB

Unnamed: 6
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB

비사업용: 300
사업용: 248
<NA>: 5
용도별: 1
(value not preserved): 1

Length

Max length: 4
Median length: 4
Mean length: 3.5459459
Min length: 1

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
비사업용 (non-commercial) | 300 | 54.1%
사업용 (commercial) | 248 | 44.7%
<NA> | 5 | 0.9%
용도별 (by use) | 1 | 0.2%
(value not preserved) | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
비사업용 | 300 | 54.1%
사업용 | 248 | 44.7%
na | 5 | 0.9%
용도별 | 1 | 0.2%
(value not preserved) | 1 | 0.2%

Unnamed: 7
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 8
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 9
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 555
Missing (%): 100.0%
Memory size: 5.0 KiB

Unnamed: 11
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 553
Missing (%): 99.6%
Memory size: 4.5 KiB

Length

Max length: 10
Median length: 8
Mean length: 8
Min length: 6

Characters and Unicode

Total characters: 16
Distinct characters: 13
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: Page No. :
2nd row: 출력일자 : (print date)

Value | Count | Frequency (%)
(value not preserved) | 2 | 40.0%
page | 1 | 20.0%
no | 1 | 20.0%
출력일자 | 1 | 20.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 3 | 18.8%
: | 2 | 12.5%
P | 1 | 6.2%
a | 1 | 6.2%
g | 1 | 6.2%
e | 1 | 6.2%
N | 1 | 6.2%
o | 1 | 6.2%
. | 1 | 6.2%
(character not preserved) | 1 | 6.2%
Other values (3) | 3 | 18.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4 | 25.0%
Other Letter | 4 | 25.0%
Space Separator | 3 | 18.8%
Other Punctuation | 3 | 18.8%
Uppercase Letter | 2 | 12.5%

Most frequent character per category

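Lowercase Letter
Value | Count | Frequency (%)
a | 1 | 25.0%
g | 1 | 25.0%
e | 1 | 25.0%
o | 1 | 25.0%

Other Letter
Value | Count | Frequency (%)
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%

Other Punctuation
Value | Count | Frequency (%)
: | 2 | 66.7%
. | 1 | 33.3%

Uppercase Letter
Value | Count | Frequency (%)
P | 1 | 50.0%
N | 1 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%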

Most occurring scripts

Value | Count | Frequency (%)
Common | 6 | 37.5%
Latin | 6 | 37.5%
Hangul | 4 | 25.0%

Most frequent character per script

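Latin
Value | Count | Frequency (%)
P | 1 | 16.7%
a | 1 | 16.7%
g | 1 | 16.7%
e | 1 | 16.7%
N | 1 | 16.7%
o | 1 | 16.7%

Hangul
Value | Count | Frequency (%)
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%

Common
Value | Count | Frequency (%)
(space) | 3 | 50.0%
: | 2 | 33.3%
. | 1 | 16.7%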

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12 | 75.0%
Hangul | 4 | 25.0%

Most frequent character per block

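ASCII
Value | Count | Frequency (%)
(space) | 3 | 25.0%
: | 2 | 16.7%
P | 1 | 8.3%
a | 1 | 8.3%
g | 1 | 8.3%
e | 1 | 8.3%
N | 1 | 8.3%
o | 1 | 8.3%
. | 1 | 8.3%

Hangul
Value | Count | Frequency (%)
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%
(character not preserved) | 1 | 25.0%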

Unnamed: 12
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 555
Missing (%): 100.0%
Memory size: 5.0 KiB

Unnamed: 14
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 553
Missing (%): 99.6%
Memory size: 4.5 KiB

Length

Max length: 19
Median length: 10
Mean length: 10
Min length: 1

Characters and Unicode

Total characters: 20
Distinct characters: 9
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: 1
2nd row: 2022-03-16 16:00:40

Value | Count | Frequency (%)
1 | 1 | 33.3%
2022-03-16 | 1 | 33.3%
16:00:40 | 1 | 33.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 5 | 25.0%
1 | 3 | 15.0%
2 | 3 | 15.0%
- | 2 | 10.0%
6 | 2 | 10.0%
: | 2 | 10.0%
3 | 1 | 5.0%
(space) | 1 | 5.0%
4 | 1 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 15 | 75.0%
Dash Punctuation | 2 | 10.0%
Other Punctuation | 2 | 10.0%
Space Separator | 1 | 5.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5 | 33.3%
1 | 3 | 20.0%
2 | 3 | 20.0%
6 | 2 | 13.3%
3 | 1 | 6.7%
4 | 1 | 6.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 20 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 5 | 25.0%
1 | 3 | 15.0%
2 | 3 | 15.0%
- | 2 | 10.0%
6 | 2 | 10.0%
: | 2 | 10.0%
3 | 1 | 5.0%
(space) | 1 | 5.0%
4 | 1 | 5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 5 | 25.0%
1 | 3 | 15.0%
2 | 3 | 15.0%
- | 2 | 10.0%
6 | 2 | 10.0%
: | 2 | 10.0%
3 | 1 | 5.0%
(space) | 1 | 5.0%
4 | 1 | 5.0%

Unnamed: 15
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Correlations

Variable | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11 | Unnamed: 14
Unnamed: 0 | 1.000 | 0.506 | 0.912 | 0.000 | 0.000
Unnamed: 3 | 0.506 | 1.000 | 0.929 | NaN | NaN
Unnamed: 6 | 0.912 | 0.929 | 1.000 | NaN | NaN
Unnamed: 11 | 0.000 | NaN | NaN | 1.000 | 0.000
Unnamed: 14 | 0.000 | NaN | NaN | 0.000 | 1.000

Variable | Unnamed: 3 | Unnamed: 6
Unnamed: 3 | 1.000 | 0.816
Unnamed: 6 | 0.816 | 1.000

Variable | Unnamed: 3 | Unnamed: 6
Unnamed: 3 | 1.000 | 0.816
Unnamed: 6 | 0.816 | 1.000
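
This extract does not preserve which association measure each matrix reports. Purely as an illustration of how association between two categorical columns can be quantified, the sketch below computes uncorrected Cramér's V from the joint counts of two columns. It is an assumption that a measure of this kind is relevant here; the function name `cramers_v` is hypothetical, and the toy inputs are not taken from the dataset.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    row = Counter(xs)              # marginal counts of the first column
    col = Counter(ys)              # marginal counts of the second column
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0                 # a constant column carries no association
    chi2 = 0.0
    for x, row_count in row.items():
        for y, col_count in col.items():
            expected = row_count * col_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

# Toy data: the first pair is perfectly associated, the second is independent.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

Values near 1 indicate that one column almost determines the other; values near 0 indicate little association.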

Missing values

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
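
A minimal sketch of that idea, assuming missing cells are represented as `None`: each column is turned into a 0/1 missingness indicator, and the Pearson correlation of the two indicators is computed. The function name is hypothetical. The short columns are the first ten `Unnamed: 11` and `Unnamed: 14` values shown in the Sample section, plus `Unnamed: 13`, which the report lists as fully missing.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1 if v is None else 0 for v in col_a]
    b = [1 if v is None else 0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is always or never missing
    return cov / sqrt(var_a * var_b)

unnamed_11 = [None, None, None, "Page No. :", "출력일자 :", None, None, None, None, None]
unnamed_14 = [None, None, None, "1", "2022-03-16 16:00:40", None, None, None, None, None]
unnamed_13 = [None] * 10  # fully missing

print(nullity_correlation(unnamed_11, unnamed_14))
print(nullity_correlation(unnamed_11, unnamed_13))
```

The first two columns are missing in exactly the same rows, so their nullity correlation is 1 up to floating-point rounding. A fully missing column has no variance, so its correlation is undefined and the sketch returns `None`.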

Sample

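Row | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15
0 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
1 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
2 | 자동차관리 시스템 | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN
3 | PROG_ID : | <NA> | STA029Q | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | Page No. : | NaN | <NA> | 1 | NaN
4 | 통계기준월 : | <NA> | 202202 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | 출력일자 : | NaN | <NA> | 2022-03-16 16:00:40 | NaN
5 | 시군구별 | <NA> | NaN | 연료별 | <NA> | <NA> | 용도별 | 승 용 | 승 합 | 화 물 | <NA> | <NA> | 특 수 | <NA> | <NA> | (not preserved)
6 | 합 계 | <NA> | NaN | (not preserved) | <NA> | <NA> | (not preserved) | 2743620 | 99542 | 326132 | <NA> | <NA> | 10034 | <NA> | <NA> | 3179328
7 | 종로구 | <NA> | NaN | CNG | <NA> | <NA> | 비사업용 | 8 | 4 | 21 | <NA> | <NA> | 0 | <NA> | <NA> | 33
8 | <NA> | <NA> | NaN | CNG | <NA> | <NA> | 사업용 | 0 | 86 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 86
9 | <NA> | <NA> | NaN | 경유 | <NA> | <NA> | 비사업용 | 10172 | 3017 | 3709 | <NA> | <NA> | 136 | <NA> | <NA> | 17034

Row | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15
545 | <NA> | <NA> | NaN | 하이브리드(LPG+전기) | <NA> | <NA> | 비사업용 | 72 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 72
546 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 비사업용 | 94 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 94
547 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 사업용 | 4 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 4
548 | <NA> | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 비사업용 | 6815 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 6815
549 | <NA> | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 사업용 | 67 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 67
550 | <NA> | <NA> | NaN | 휘발유 | <NA> | <NA> | 비사업용 | 29761 | 19 | 94 | <NA> | <NA> | 0 | <NA> | <NA> | 29874
551 | <NA> | <NA> | NaN | 휘발유 | <NA> | <NA> | 사업용 | 83 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 83
552 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 비사업용 | 47537 | 18 | 30 | <NA> | <NA> | 0 | <NA> | <NA> | 47585
553 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 사업용 | 245 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 245
554 | <NA> | <NA> | NaN | 휘발유(유연) | <NA> | <NA> | 비사업용 | 53 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 53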
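
The sample suggests why `Unnamed: 0` is mostly missing: a district name such as 종로구 appears only on the first row of its group. A minimal sketch of carrying that label forward and checking each row total follows. The field names (`district`, `fuel`, `use`, `passenger`, `van`, `truck`, `special`, `total`) are hypothetical English labels for the header row at index 5, and the three records are sample rows 7 to 9, with the numeric cells separated on the assumption that the last column is the row total.

```python
FIELDS = ("district", "fuel", "use", "passenger", "van", "truck", "special", "total")

# Sample rows 7-9; None stands for a missing district cell.
raw_rows = [
    ("종로구", "CNG", "비사업용", 8, 4, 21, 0, 33),
    (None, "CNG", "사업용", 0, 86, 0, 0, 86),
    (None, "경유", "비사업용", 10172, 3017, 3709, 136, 17034),
]

def forward_fill_district(rows):
    """Copy the most recent non-missing district into rows where it is None."""
    filled, last_district = [], None
    for row in rows:
        record = dict(zip(FIELDS, row))
        if record["district"] is not None:
            last_district = record["district"]
        record["district"] = last_district
        filled.append(record)
    return filled

def total_is_consistent(record):
    """Check that the four vehicle-type counts add up to the total column."""
    parts = ("passenger", "van", "truck", "special")
    return sum(record[p] for p in parts) == record["total"]

records = forward_fill_district(raw_rows)
for record in records:
    print(record["district"], record["fuel"], record["use"], total_is_consistent(record))
```

With pandas, `DataFrame.ffill()` performs the same forward fill. The total check is a cheap guard against misaligned columns after the report header rows are skipped.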

Duplicate rows

Most frequently occurring

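Row | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11 | Unnamed: 14 | # duplicates
0 | <NA> | CNG | 사업용 | <NA> | <NA> | 25
4 | <NA> | 기타연료 | 사업용 | <NA> | <NA> | 25
8 | <NA> | 엘피지 | 사업용 | <NA> | <NA> | 25
14 | <NA> | 하이브리드(경유+전기) | 비사업용 | <NA> | <NA> | 25
19 | <NA> | 휘발유 | 사업용 | <NA> | <NA> | 25
22 | <NA> | 휘발유(유연) | 비사업용 | <NA> | <NA> | 25
1 | <NA> | 경유 | 비사업용 | <NA> | <NA> | 24
7 | <NA> | 엘피지 | 비사업용 | <NA> | <NA> | 24
9 | <NA> | 전기 | 비사업용 | <NA> | <NA> | 24
10 | <NA> | 전기 | 사업용 | <NA> | <NA> | 24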
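
The duplicates table lists only the five columns that were not typed as Unsupported. This suggests (an inference, not something the report states) that rows were compared on those columns alone. Under that assumption, many fuel and use combinations collapse to the same key wherever the district cell is missing. A small sketch of counting such repeated keys, with a hypothetical helper name and toy rows modelled on the table:

```python
from collections import Counter

def duplicate_groups(rows):
    """Return each row key that occurs more than once, with its count."""
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}

# Keys are (Unnamed: 0, Unnamed: 3, Unnamed: 6); None stands for <NA>.
rows = [
    ("종로구", "CNG", "비사업용"),
    (None, "CNG", "사업용"),
    (None, "CNG", "사업용"),
    (None, "경유", "비사업용"),
    (None, "CNG", "사업용"),
]

print(duplicate_groups(rows))
```

If the numeric columns were parsed as numbers and the district label were filled in on every row, these rows would most likely no longer compare as duplicates.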