Overview

Dataset statistics

Number of variables: 16
Number of observations: 565
Missing cells: 5050
Missing cells (%): 55.9%
Duplicate rows: 25
Duplicate rows (%): 4.4%
Total size in memory: 73.5 KiB
Average record size in memory: 133.2 B
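
The percentages and the average record size follow from the raw counts reported above. A short arithmetic check, using only the figures in this block (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 16
n_observations = 565
missing_cells = 5050
duplicate_rows = 25
total_size_kib = 73.5

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells (%): {missing_pct:.1f}")
print(f"duplicate rows (%): {duplicate_pct:.1f}")
print(f"average record size (B): {avg_record_bytes:.1f}")
```

The rounded results agree with the reported 55.9%, 4.4% and 133.2 B.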

Variable types

Text: 3
Unsupported: 11
Categorical: 2

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15640/F/1/datasetView.do

Alerts

Dataset has 25 (4.4%) duplicate rows [Duplicates]
Unnamed: 3 is highly overall correlated with Unnamed: 6 [High correlation]
Unnamed: 6 is highly overall correlated with Unnamed: 3 [High correlation]
Unnamed: 6 is highly imbalanced (52.9%) [Imbalance]
Unnamed: 0 has 511 (90.4%) missing values [Missing]
Unnamed: 1 has 565 (100.0%) missing values [Missing]
Unnamed: 2 has 563 (99.6%) missing values [Missing]
연료별 차종별 용도별 등록현황 has 565 (100.0%) missing values [Missing]
Unnamed: 5 has 565 (100.0%) missing values [Missing]
Unnamed: 10 has 565 (100.0%) missing values [Missing]
Unnamed: 11 has 563 (99.6%) missing values [Missing]
Unnamed: 13 has 565 (100.0%) missing values [Missing]
Unnamed: 14 has 563 (99.6%) missing values [Missing]
Unnamed: 1 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 2 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
연료별 차종별 용도별 등록현황 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
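
Several columns in the alerts are reported as 100.0% missing, so one possible first cleaning step is to drop them. A minimal pandas sketch, assuming the sheet has already been loaded into a DataFrame; the small frame below is a stand-in with invented rows, not the real data:

```python
import pandas as pd

# Stand-in frame: two columns with data, two that are entirely missing.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["종로구", None, None],
        "Unnamed: 1": [None, None, None],
        "Unnamed: 3": ["CNG", "CNG", "경유"],
        "Unnamed: 5": [None, None, None],
    }
)

# Share of missing cells per column (1.0 means the column is empty).
missing_share = df.isna().mean()
empty_columns = missing_share[missing_share == 1.0].index.tolist()

# Keep only columns that hold at least one value.
cleaned = df.drop(columns=empty_columns)

print(empty_columns)
print(list(cleaned.columns))
```

The same result can be obtained with `df.dropna(axis="columns", how="all")`; computing `missing_share` first makes it easy to inspect the columns before dropping them.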

Reproduction

Analysis started: 2024-04-06 11:21:08.708371
Analysis finished: 2024-04-06 11:21:10.484007
Duration: 1.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
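
A report of this kind can be regenerated with ydata-profiling. The sketch below is an assumption about how it might be done, not the configuration actually used: the helper name `build_report` and the default output path are hypothetical, and the caller is expected to load the downloaded file into a DataFrame first.

```python
def build_report(df, output_path="report.html"):
    """Profile `df` and write an HTML report to `output_path` (hypothetical default)."""
    # Imported lazily so the helper can be defined without the package installed.
    from ydata_profiling import ProfileReport

    report = ProfileReport(
        df,
        # Dataset metadata as shown in the "Dataset" block of this report.
        dataset={
            "description": "파일 다운로드",
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15640/F/1/datasetView.do",
        },
    )
    report.to_file(output_path)
```

Timings and memory figures will differ between runs and machines.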

Variables

Unnamed: 0
Text

MISSING 

Distinct: 30
Distinct (%): 55.6%
Missing: 511
Missing (%): 90.4%
Memory size: 4.5 KiB

Length

Max length: 13
Median length: 3
Mean length: 3.4814815
Min length: 2

Characters and Unicode

Total characters: 188
Distinct characters: 59
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
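
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. This is a sketch of the idea, not necessarily how the profiling tool computes its counts; `category_counts` is an illustrative helper name.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (e.g. 'Lo', 'Lu', 'Zs') in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample values taken from this variable.
samples = ["자동차관리 시스템", "PROG_ID :", "통계기준월 :"]
for sample in samples:
    print(sample, dict(category_counts(sample)))
```

Hangul syllables fall under `Lo` (Other Letter), ASCII capitals under `Lu` (Uppercase Letter), the space under `Zs` (Space Separator), the colon under `Po` (Other Punctuation) and the underscore under `Pc` (Connector Punctuation).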

Unique

Unique: 6
Unique (%): 11.1%

Sample

1st row: 자동차관리 시스템 (Vehicle Management System)
2nd row: PROG_ID :
3rd row: 통계기준월 : (statistics reference month)
4th row: 시군구별 (by district)
5th row: 합 계 (total)
| Value | Count | Frequency (%) |
| 서대문구 | 2 | 3.4% |
| 마포구 | 2 | 3.4% |
|  | 2 | 3.4% |
| 강서구 | 2 | 3.4% |
| 구로구 | 2 | 3.4% |
| 금천구 | 2 | 3.4% |
| 영등포구 | 2 | 3.4% |
| 서초구 | 2 | 3.4% |
| 관악구 | 2 | 3.4% |
| 양천구 | 2 | 3.4% |
| Other values (23) | 38 | 65.5% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 52 | 27.7% |
|  | 9 | 4.8% |
|  | 9 | 4.8% |
|  | 8 | 4.3% |
|  | 6 | 3.2% |
|  | 4 | 2.1% |
|  | 4 | 2.1% |
|  | 4 | 2.1% |
|  | 4 | 2.1% |
|  | 4 | 2.1% |
| Other values (49) | 84 | 44.7% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 170 | 90.4% |
| Space Separator | 9 | 4.8% |
| Uppercase Letter | 6 | 3.2% |
| Other Punctuation | 2 | 1.1% |
| Connector Punctuation | 1 | 0.5% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 52 | 30.6% |
|  | 9 | 5.3% |
|  | 8 | 4.7% |
|  | 6 | 3.5% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
| Other values (40) | 71 | 41.8% |

Uppercase Letter
| Value | Count | Frequency (%) |
| P | 1 | 16.7% |
| R | 1 | 16.7% |
| O | 1 | 16.7% |
| G | 1 | 16.7% |
| I | 1 | 16.7% |
| D | 1 | 16.7% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 9 | 100.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| : | 2 | 100.0% |

Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1 | 100.0% |

Most occurring scripts

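| Value | Count | Frequency (%) |
| Hangul | 170 | 90.4% |
| Common | 12 | 6.4% |
| Latin | 6 | 3.2% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 52 | 30.6% |
|  | 9 | 5.3% |
|  | 8 | 4.7% |
|  | 6 | 3.5% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
| Other values (40) | 71 | 41.8% |

Latin
| Value | Count | Frequency (%) |
| P | 1 | 16.7% |
| R | 1 | 16.7% |
| O | 1 | 16.7% |
| G | 1 | 16.7% |
| I | 1 | 16.7% |
| D | 1 | 16.7% |

Common
| Value | Count | Frequency (%) |
| (space) | 9 | 75.0% |
| : | 2 | 16.7% |
| _ | 1 | 8.3% |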

Most occurring blocks

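| Value | Count | Frequency (%) |
| Hangul | 170 | 90.4% |
| ASCII | 18 | 9.6% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 52 | 30.6% |
|  | 9 | 5.3% |
|  | 8 | 4.7% |
|  | 6 | 3.5% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
| Other values (40) | 71 | 41.8% |

ASCII
| Value | Count | Frequency (%) |
| (space) | 9 | 50.0% |
| : | 2 | 11.1% |
| P | 1 | 5.6% |
| _ | 1 | 5.6% |
| R | 1 | 5.6% |
| O | 1 | 5.6% |
| G | 1 | 5.6% |
| I | 1 | 5.6% |
| D | 1 | 5.6% |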

Unnamed: 1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 565
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 563
Missing (%): 99.6%
Memory size: 4.5 KiB

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB

| Value | Count |
| 휘발유(무연) | 50 |
| CNG | 50 |
| 경유 | 50 |
| 기타연료 | 50 |
| 엘피지 | 50 |
| Other values (13) | 315 |

Length

Max length: 13
Median length: 12
Mean length: 5.7911504
Min length: 1

Unique

Unique: 4
Unique (%): 0.7%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

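| Value | Count | Frequency (%) |
| 휘발유(무연) (gasoline, unleaded) | 50 | 8.8% |
| CNG | 50 | 8.8% |
| 경유 (diesel) | 50 | 8.8% |
| 기타연료 (other fuels) | 50 | 8.8% |
| 엘피지 (LPG) | 50 | 8.8% |
| 전기 (electric) | 50 | 8.8% |
| 휘발유 (gasoline) | 50 | 8.8% |
| 하이브리드(휘발유+전기) (hybrid, gasoline + electric) | 49 | 8.7% |
| 하이브리드(경유+전기) (hybrid, diesel + electric) | 45 | 8.0% |
| 하이브리드(LPG+전기) (hybrid, LPG + electric) | 40 | 7.1% |
| Other values (8) | 81 | 14.3% |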

Length

Histogram of lengths of the category

| Value | Count | Frequency (%) |
| 휘발유(무연 | 50 | 8.8% |
| cng | 50 | 8.8% |
| 경유 | 50 | 8.8% |
| 기타연료 | 50 | 8.8% |
| 엘피지 | 50 | 8.8% |
| 전기 | 50 | 8.8% |
| 휘발유 | 50 | 8.8% |
| 하이브리드(휘발유+전기 | 49 | 8.7% |
| 하이브리드(경유+전기 | 45 | 8.0% |
| 하이브리드(lpg+전기 | 40 | 7.1% |
| Other values (8) | 81 | 14.3% |

연료별 차종별 용도별 등록현황 (Registration status by fuel, vehicle type, and use)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 565
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 565
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 6
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 KiB

| Value | Count |
| 비사업용 | 301 |
| 사업용 | 257 |
| <NA> | 5 |
| 용도별 | 1 |
|  | 1 |

Length

Max length: 4
Median length: 4
Mean length: 3.5380531
Min length: 1

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
| 비사업용 (non-commercial) | 301 | 53.3% |
| 사업용 (commercial) | 257 | 45.5% |
| <NA> | 5 | 0.9% |
| 용도별 (by use) | 1 | 0.2% |
|  | 1 | 0.2% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| 비사업용 | 301 | 53.3% |
| 사업용 | 257 | 45.5% |
| na | 5 | 0.9% |
| 용도별 | 1 | 0.2% |
|  | 1 | 0.2% |

Unnamed: 7
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 8
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 9
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 565
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 11
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 563
Missing (%): 99.6%
Memory size: 4.5 KiB

Length

Max length: 10
Median length: 8
Mean length: 8
Min length: 6

Characters and Unicode

Total characters: 16
Distinct characters: 13
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: Page No. :
2nd row: 출력일자 : (print date)
| Value | Count | Frequency (%) |
|  | 2 | 40.0% |
| page | 1 | 20.0% |
| no | 1 | 20.0% |
| 출력일자 | 1 | 20.0% |

Most occurring characters

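| Value | Count | Frequency (%) |
| (space) | 3 | 18.8% |
| : | 2 | 12.5% |
| P | 1 | 6.2% |
| a | 1 | 6.2% |
| g | 1 | 6.2% |
| e | 1 | 6.2% |
| N | 1 | 6.2% |
| o | 1 | 6.2% |
| . | 1 | 6.2% |
|  | 1 | 6.2% |
| Other values (3) | 3 | 18.8% |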

Most occurring categories

| Value | Count | Frequency (%) |
| Lowercase Letter | 4 | 25.0% |
| Other Letter | 4 | 25.0% |
| Space Separator | 3 | 18.8% |
| Other Punctuation | 3 | 18.8% |
| Uppercase Letter | 2 | 12.5% |

Most frequent character per category

Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1 | 25.0% |
| g | 1 | 25.0% |
| e | 1 | 25.0% |
| o | 1 | 25.0% |

Other Letter
| Value | Count | Frequency (%) |
| 출 | 1 | 25.0% |
| 력 | 1 | 25.0% |
| 일 | 1 | 25.0% |
| 자 | 1 | 25.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| : | 2 | 66.7% |
| . | 1 | 33.3% |

Uppercase Letter
| Value | Count | Frequency (%) |
| P | 1 | 50.0% |
| N | 1 | 50.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 3 | 100.0% |

Most occurring scripts

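| Value | Count | Frequency (%) |
| Common | 6 | 37.5% |
| Latin | 6 | 37.5% |
| Hangul | 4 | 25.0% |

Most frequent character per script

Latin
| Value | Count | Frequency (%) |
| P | 1 | 16.7% |
| a | 1 | 16.7% |
| g | 1 | 16.7% |
| e | 1 | 16.7% |
| N | 1 | 16.7% |
| o | 1 | 16.7% |

Hangul
| Value | Count | Frequency (%) |
| 출 | 1 | 25.0% |
| 력 | 1 | 25.0% |
| 일 | 1 | 25.0% |
| 자 | 1 | 25.0% |

Common
| Value | Count | Frequency (%) |
| (space) | 3 | 50.0% |
| : | 2 | 33.3% |
| . | 1 | 16.7% |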

Most occurring blocks

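| Value | Count | Frequency (%) |
| ASCII | 12 | 75.0% |
| Hangul | 4 | 25.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| (space) | 3 | 25.0% |
| : | 2 | 16.7% |
| P | 1 | 8.3% |
| a | 1 | 8.3% |
| g | 1 | 8.3% |
| e | 1 | 8.3% |
| N | 1 | 8.3% |
| o | 1 | 8.3% |
| . | 1 | 8.3% |

Hangul
| Value | Count | Frequency (%) |
| 출 | 1 | 25.0% |
| 력 | 1 | 25.0% |
| 일 | 1 | 25.0% |
| 자 | 1 | 25.0% |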

Unnamed: 12
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 565
Missing (%): 100.0%
Memory size: 5.1 KiB

Unnamed: 14
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 563
Missing (%): 99.6%
Memory size: 4.5 KiB

Length

Max length: 19
Median length: 10
Mean length: 10
Min length: 1

Characters and Unicode

Total characters: 20
Distinct characters: 9
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: 1
2nd row: 2023-03-23 10:28:42
| Value | Count | Frequency (%) |
| 1 | 1 | 33.3% |
| 2023-03-23 | 1 | 33.3% |
| 10:28:42 | 1 | 33.3% |

Most occurring characters

| Value | Count | Frequency (%) |
| 2 | 5 | 25.0% |
| 0 | 3 | 15.0% |
| 3 | 3 | 15.0% |
| 1 | 2 | 10.0% |
| - | 2 | 10.0% |
| : | 2 | 10.0% |
| (space) | 1 | 5.0% |
| 8 | 1 | 5.0% |
| 4 | 1 | 5.0% |

Most occurring categories

| Value | Count | Frequency (%) |
| Decimal Number | 15 | 75.0% |
| Dash Punctuation | 2 | 10.0% |
| Other Punctuation | 2 | 10.0% |
| Space Separator | 1 | 5.0% |

Most frequent character per category

Decimal Number
| Value | Count | Frequency (%) |
| 2 | 5 | 33.3% |
| 0 | 3 | 20.0% |
| 3 | 3 | 20.0% |
| 1 | 2 | 13.3% |
| 8 | 1 | 6.7% |
| 4 | 1 | 6.7% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2 | 100.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| : | 2 | 100.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 1 | 100.0% |

Most occurring scripts

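| Value | Count | Frequency (%) |
| Common | 20 | 100.0% |

Most frequent character per script

Common
| Value | Count | Frequency (%) |
| 2 | 5 | 25.0% |
| 0 | 3 | 15.0% |
| 3 | 3 | 15.0% |
| 1 | 2 | 10.0% |
| - | 2 | 10.0% |
| : | 2 | 10.0% |
| (space) | 1 | 5.0% |
| 8 | 1 | 5.0% |
| 4 | 1 | 5.0% |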

Most occurring blocks

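| Value | Count | Frequency (%) |
| ASCII | 20 | 100.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| 2 | 5 | 25.0% |
| 0 | 3 | 15.0% |
| 3 | 3 | 15.0% |
| 1 | 2 | 10.0% |
| - | 2 | 10.0% |
| : | 2 | 10.0% |
| (space) | 1 | 5.0% |
| 8 | 1 | 5.0% |
| 4 | 1 | 5.0% |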

Unnamed: 15
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 0.9%
Memory size: 4.5 KiB

Correlations

|  | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11 | Unnamed: 14 |
| Unnamed: 0 | 1.000 | 0.676 | 0.908 | 0.000 | 0.000 |
| Unnamed: 3 | 0.676 | 1.000 | 0.933 | NaN | NaN |
| Unnamed: 6 | 0.908 | 0.933 | 1.000 | NaN | NaN |
| Unnamed: 11 | 0.000 | NaN | NaN | 1.000 | 0.000 |
| Unnamed: 14 | 0.000 | NaN | NaN | 0.000 | 1.000 |

|  | Unnamed: 3 | Unnamed: 6 |
| Unnamed: 3 | 1.000 | 0.812 |
| Unnamed: 6 | 0.812 | 1.000 |

|  | Unnamed: 3 | Unnamed: 6 |
| Unnamed: 3 | 1.000 | 0.812 |
| Unnamed: 6 | 0.812 | 1.000 |
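
The report does not state which statistic underlies each matrix. One common measure of association between two categorical columns is Cramér's V; the sketch below implements the plain, uncorrected form using only the standard library. Whether the profiling tool used this form or a bias-corrected variant is an assumption, so the output is not expected to reproduce the figures above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(chi2 / (n * k))


# Invented toy labels: every fuel appears once with each use type.
fuel = ["CNG", "CNG", "경유", "경유"]
use = ["비사업용", "사업용", "비사업용", "사업용"]
print(cramers_v(fuel, use))   # no association in this toy sample
print(cramers_v(fuel, fuel))  # a variable is perfectly associated with itself
```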

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
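
The per-column counts and the nullity correlation described above can be approximated directly in pandas. This is a sketch on a small frame with invented rows; the column names are borrowed from the report, but the arrangement of values is illustrative only.

```python
import pandas as pd

# Invented rows: "Unnamed: 11" and "Unnamed: 14" are filled together,
# and "Unnamed: 0" is filled exactly where they are empty.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["종로구", None, None, "강동구"],
        "Unnamed: 11": [None, "Page No. :", "출력일자 :", None],
        "Unnamed: 14": [None, "1", "2023-03-23 10:28:42", None],
    }
)

# Nullity by column: number of non-missing values in each column.
non_null_counts = df.notna().sum()

# Nullity correlation: correlate the 0/1 "is missing" indicators.
nullity = df.isna().astype(int)
nullity_corr = nullity.corr()

print(non_null_counts)
print(nullity_corr)
```

A value near 1 means two columns tend to be missing in the same rows; a value near -1 means one tends to be present where the other is missing. Columns that are never or always missing have no variance, so their correlation is undefined (NaN).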

Sample

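First rows

|  | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 |
| 0 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN |
| 1 | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN |
| 2 | 자동차관리 시스템 | <NA> | NaN | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | <NA> | NaN | <NA> | <NA> | NaN |
| 3 | PROG_ID : | <NA> | STA029Q | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | Page No. : | NaN | <NA> | 1 | NaN |
| 4 | 통계기준월 : | <NA> | 202301 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | <NA> | 출력일자 : | NaN | <NA> | 2023-03-23 10:28:42 | NaN |
| 5 | 시군구별 | <NA> | NaN | 연료별 | <NA> | <NA> | 용도별 | 승 용 | 승 합 | 화 물 | <NA> | <NA> | 특 수 | <NA> | <NA> |  |
| 6 | 합 계 | <NA> | NaN |  | <NA> | <NA> |  | 2765782 | 95828 | 321896 | <NA> | <NA> | 10845 | <NA> | <NA> | 3194351 |
| 7 | 종로구 | <NA> | NaN | CNG | <NA> | <NA> | 비사업용 | 6 | 4 | 19 | <NA> | <NA> | 0 | <NA> | <NA> | 29 |
| 8 | <NA> | <NA> | NaN | CNG | <NA> | <NA> | 사업용 | 0 | 84 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 84 |
| 9 | <NA> | <NA> | NaN | 경유 | <NA> | <NA> | 비사업용 | 9820 | 2952 | 3578 | <NA> | <NA> | 141 | <NA> | <NA> | 16491 |

Last rows

|  | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | 연료별 차종별 용도별 등록현황 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 |
| 555 | <NA> | <NA> | NaN | 하이브리드(LPG+전기) | <NA> | <NA> | 사업용 | 1 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 1 |
| 556 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 비사업용 | 144 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 144 |
| 557 | <NA> | <NA> | NaN | 하이브리드(경유+전기) | <NA> | <NA> | 사업용 | 6 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 6 |
| 558 | 강동구 | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 비사업용 | 8128 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 8128 |
| 559 | <NA> | <NA> | NaN | 하이브리드(휘발유+전기) | <NA> | <NA> | 사업용 | 67 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 67 |
| 560 | <NA> | <NA> | NaN | 휘발유 | <NA> | <NA> | 비사업용 | 29330 | 18 | 105 | <NA> | <NA> | 0 | <NA> | <NA> | 29453 |
| 561 | <NA> | <NA> | NaN | 휘발유 | <NA> | <NA> | 사업용 | 85 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 85 |
| 562 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 비사업용 | 48648 | 16 | 29 | <NA> | <NA> | 0 | <NA> | <NA> | 48693 |
| 563 | <NA> | <NA> | NaN | 휘발유(무연) | <NA> | <NA> | 사업용 | 267 | 3 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 270 |
| 564 | <NA> | <NA> | NaN | 휘발유(유연) | <NA> | <NA> | 비사업용 | 49 | 0 | 0 | <NA> | <NA> | 0 | <NA> | <NA> | 49 |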
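
The sample suggests that the sheet is a printed report rather than a tidy table: a few banner rows, a header row starting with 시군구별, a grand-total row (합 계), and then data rows in which the district name appears only on some rows. A possible tidying pass is sketched below on a stand-in frame built from sample rows 5 to 9. The English column names are hypothetical, cells that are blank in the sample are set to None, and the forward fill assumes a district label applies to the unlabeled rows that follow it, which should be checked against the full sheet.

```python
import pandas as pd

# Stand-in for sample rows 5-9, restricted to the columns that carry data.
raw = pd.DataFrame(
    [
        ["시군구별", "연료별", "용도별", "승 용", "승 합", "화 물", "특 수", None],
        ["합 계", None, None, 2765782, 95828, 321896, 10845, 3194351],
        ["종로구", "CNG", "비사업용", 6, 4, 19, 0, 29],
        [None, "CNG", "사업용", 0, 84, 0, 0, 84],
        [None, "경유", "비사업용", 9820, 2952, 3578, 141, 16491],
    ],
    columns=["Unnamed: 0", "Unnamed: 3", "Unnamed: 6", "Unnamed: 7",
             "Unnamed: 8", "Unnamed: 9", "Unnamed: 12", "Unnamed: 15"],
)

# Locate the header row and keep everything below it.
header_pos = raw["Unnamed: 0"].eq("시군구별").idxmax()
body = raw.iloc[header_pos + 1:].copy()

# Hypothetical English names for the retained columns.
vehicle_cols = ["passenger", "van", "truck", "special"]
body.columns = ["district", "fuel", "use"] + vehicle_cols + ["total"]

# Drop the grand-total row, then carry each district name downwards.
body = body[body["district"] != "합 계"].copy()
body["district"] = body["district"].ffill()

# The count columns were read as objects because of the header text.
numeric_cols = vehicle_cols + ["total"]
body[numeric_cols] = body[numeric_cols].apply(pd.to_numeric)
body = body.reset_index(drop=True)

# Sanity check: the four vehicle classes should add up to the total.
totals_match = body[vehicle_cols].sum(axis=1).eq(body["total"]).all()
print(body)
print(totals_match)
```

The same sum check (승용 + 승합 + 화물 + 특수 = total) holds for every row shown in the sample, including the grand-total row.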

Duplicate rows

Most frequently occurring

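|  | Unnamed: 0 | Unnamed: 3 | Unnamed: 6 | Unnamed: 11 | Unnamed: 14 | # duplicates |
| 2 | <NA> | 경유 | 사업용 | <NA> | <NA> | 25 |
| 4 | <NA> | 기타연료 | 사업용 | <NA> | <NA> | 25 |
| 9 | <NA> | 전기 | 비사업용 | <NA> | <NA> | 25 |
| 10 | <NA> | 전기 | 사업용 | <NA> | <NA> | 25 |
| 20 | <NA> | 휘발유(무연) | 비사업용 | <NA> | <NA> | 25 |
| 21 | <NA> | 휘발유(무연) | 사업용 | <NA> | <NA> | 25 |
| 22 | <NA> | 휘발유(유연) | 비사업용 | <NA> | <NA> | 25 |
| 0 | <NA> | CNG | 사업용 | <NA> | <NA> | 24 |
| 1 | <NA> | 경유 | 비사업용 | <NA> | <NA> | 24 |
| 5 | <NA> | 수소 | 비사업용 | <NA> | <NA> | 24 |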