Overview

Dataset statistics

Number of variables: 8
Number of observations: 36
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.6 KiB
Average record size in memory: 72.7 B

Variable types

Text: 1
Categorical: 5
Numeric: 2

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-20299/F/1/datasetView.do

Alerts

Constant: 화 물 has constant value "0"
Constant: 특 수 has constant value "0"
High correlation: 승 합 is highly overall correlated with 승 용 and 2 other fields
High correlation: 용도별 is highly overall correlated with 연료별
High correlation: 연료별 is highly overall correlated with 승 용 and 3 other fields
High correlation: 승 용 is highly overall correlated with the unnamed column and 2 other fields
High correlation: the unnamed column is highly overall correlated with 승 용 and 2 other fields
Imbalance: 연료별 is highly imbalanced (81.7%)
Imbalance: 승 합 is highly imbalanced (69.0%)
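
The two imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the class counts. The report does not state its formula, so the definition below is an assumption, and `imbalance_score` is a hypothetical helper name. The counts come from the Common Values tables for 연료별 and 승 합.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0  # a single class leaves nothing to compare
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


fuel_counts = [35, 1]  # 연료별: 수소 x35, <NA> x1
van_counts = [34, 2]   # 승 합: "0" x34, "2" x2

print(f"{imbalance_score(fuel_counts):.1%}")  # 81.7%
print(f"{imbalance_score(van_counts):.1%}")   # 69.0%
```

A score of 0% would mean perfectly balanced classes, and a score near 100% means almost every row falls in one class.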

Reproduction

Analysis started: 2024-03-13 11:09:17.886741
Analysis finished: 2024-03-13 11:09:18.574925
Duration: 0.69 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
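
The reported duration can be checked from the two timestamps. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-03-13 11:09:17.886741")
finished = datetime.fromisoformat("2024-03-13 11:09:18.574925")

elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")  # 0.69 seconds
```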

Variables

시군구별 (district)
Text

Distinct: 26
Distinct (%): 72.2%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.0833333
Min length: 2

Characters and Unicode

Total characters: 111
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 44.4%
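
"Distinct" counts the different values in the column, while "Unique" counts the values that occur exactly once. The sketch below rebuilds both figures from the frequency table for this variable. The ten districts with count 2 are taken from the report; the sixteen `other_*` labels are hypothetical placeholders for the values the report groups as "Other values (16)".

```python
from collections import Counter

# Districts listed with count 2 in the frequency table.
twice = ["강동구", "동대문구", "영등포구", "강남구", "광진구",
         "서초구", "은평구", "금천구", "강서구", "양천구"]
# Placeholder labels (not real data) for the 16 values that occur once.
once = [f"other_{i}" for i in range(16)]

column = twice * 2 + once
counts = Counter(column)

n_rows = len(column)
n_distinct = len(counts)                                  # different values
n_unique = sum(1 for c in counts.values() if c == 1)      # values seen once

print(n_rows, n_distinct, n_unique)                        # 36 26 16
print(f"{n_distinct / n_rows:.1%}", f"{n_unique / n_rows:.1%}")  # 72.2% 44.4%
```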

Sample

1st row: 합계 (total)
2nd row: 종로구
3rd row: 중구
4th row: 용산구
5th row: 성동구

Value              Count  Frequency (%)
강동구             2      5.6%
동대문구           2      5.6%
영등포구           2      5.6%
강남구             2      5.6%
광진구             2      5.6%
서초구             2      5.6%
은평구             2      5.6%
금천구             2      5.6%
강서구             2      5.6%
양천구             2      5.6%
Other values (16)  16     44.4%

Most occurring characters

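Value              Count  Frequency (%)
(missing)          36     32.4%
(missing)          7      6.3%
(missing)          6      5.4%
(missing)          5      4.5%
(missing)          4      3.6%
(missing)          3      2.7%
(missing)          3      2.7%
(missing)          3      2.7%
(missing)          2      1.8%
(missing)          2      1.8%
Other values (28)  40     36.0%

Most occurring categories

Value         Count  Frequency (%)
Other Letter  111    100.0%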

Most frequent character per category

Other Letter
The counts match those listed under "Most occurring characters".

Most occurring scripts

Value   Count  Frequency (%)
Hangul  111    100.0%

Most frequent character per script

Hangul
The counts match those listed under "Most occurring characters".

Most occurring blocks

Value   Count  Frequency (%)
Hangul  111    100.0%

Most frequent character per block

Hangul
The counts match those listed under "Most occurring characters".

연료별 (fuel type)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.0555556
Min length: 2

Unique

Unique: 1
Unique (%): 2.8%

Sample

1st row: <NA>
2nd row: 수소
3rd row: 수소
4th row: 수소
5th row: 수소

Common Values

Value            Count  Frequency (%)
수소 (hydrogen)  35     97.2%
<NA>             1      2.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

용도별 (use type)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.7222222
Min length: 3

Unique

Unique: 1
Unique (%): 2.8%

Sample

1st row: <NA>
2nd row: 비사업용
3rd row: 비사업용
4th row: 비사업용
5th row: 비사업용

Common Values

Value                      Count  Frequency (%)
비사업용 (non-commercial)  25     69.4%
사업용 (commercial)        10     27.8%
<NA>                       1      2.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

승 용 (passenger cars)
Real number (ℝ)

HIGH CORRELATION

Distinct: 30
Distinct (%): 83.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.666667
Minimum: 1
Maximum: 1182
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 456.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 8
Median: 24
Q3: 51.5
95-th percentile: 103
Maximum: 1182
Range: 1181
Interquartile range (IQR): 43.5

Descriptive statistics

Standard deviation: 193.91382
Coefficient of variation (CV): 2.9530024
Kurtosis: 33.984707
Mean: 65.666667
Median Absolute Deviation (MAD): 20
Skewness: 5.7583971
Sum: 2364
Variance: 37602.571
Monotonicity: Not monotonic
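
Several of the 승 용 statistics are derived from the others: range from minimum and maximum, IQR from the quartiles, mean from sum and row count, CV from standard deviation and mean, and variance from the standard deviation. The sketch below recomputes them from the reported inputs as a consistency check. The variable names are illustrative.

```python
import math

# Inputs copied from the 승 용 statistics in this report.
n = 36
minimum, maximum = 1, 1182
q1, q3 = 8, 51.5
mean, std, total = 65.666667, 193.91382, 2364

derived = {
    "range": maximum - minimum,  # report: 1181
    "iqr": q3 - q1,              # report: 43.5
    "mean": total / n,           # report: 65.666667
    "cv": std / mean,            # report: 2.9530024
    "variance": std ** 2,        # report: 37602.571
}

for name, value in derived.items():
    print(f"{name}: {value:.7g}")
```

The recomputed values agree with the report up to rounding of the printed inputs.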
[Figure: Histogram with fixed size bins (bins=30)]

Common values

Value              Count  Frequency (%)
1                  4      11.1%
22                 2      5.6%
29                 2      5.6%
8                  2      5.6%
1182               1      2.8%
35                 1      2.8%
43                 1      2.8%
17                 1      2.8%
73                 1      2.8%
51                 1      2.8%
Other values (20)  20     55.6%

Minimum 10 values

Value  Count  Frequency (%)
1      4      11.1%
2      1      2.8%
3      1      2.8%
5      1      2.8%
7      1      2.8%
8      2      5.6%
10     1      2.8%
15     1      2.8%
17     1      2.8%
18     1      2.8%

Maximum 10 values

Value  Count  Frequency (%)
1182   1      2.8%
127    1      2.8%
95     1      2.8%
92     1      2.8%
85     1      2.8%
73     1      2.8%
66     1      2.8%
61     1      2.8%
53     1      2.8%
51     1      2.8%

승 합 (vans and buses)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      34     94.4%
2      2      5.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

화 물 (freight vehicles)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      36     100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

특 수 (special-purpose vehicles)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 420.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      36     100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]


(unnamed column)
Real number (ℝ)

HIGH CORRELATION

Distinct: 30
Distinct (%): 83.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.777778
Minimum: 1
Maximum: 1184
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 456.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 8
Median: 24
Q3: 51.5
95-th percentile: 103
Maximum: 1184
Range: 1183
Interquartile range (IQR): 43.5

Descriptive statistics

Standard deviation: 194.2361
Coefficient of variation (CV): 2.9529137
Kurtosis: 33.989181
Mean: 65.777778
Median Absolute Deviation (MAD): 20.5
Skewness: 5.7589003
Sum: 2368
Variance: 37727.663
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=30)]

Common values

Value              Count  Frequency (%)
1                  4      11.1%
22                 2      5.6%
29                 2      5.6%
8                  2      5.6%
1184               1      2.8%
35                 1      2.8%
43                 1      2.8%
17                 1      2.8%
73                 1      2.8%
51                 1      2.8%
Other values (20)  20     55.6%

Minimum 10 values

Value  Count  Frequency (%)
1      4      11.1%
2      1      2.8%
3      1      2.8%
5      1      2.8%
7      1      2.8%
8      2      5.6%
10     1      2.8%
15     1      2.8%
17     1      2.8%
18     1      2.8%

Maximum 10 values

Value  Count  Frequency (%)
1184   1      2.8%
127    1      2.8%
95     1      2.8%
92     1      2.8%
85     1      2.8%
73     1      2.8%
66     1      2.8%
61     1      2.8%
53     1      2.8%
51     1      2.8%

Interactions

[Figures: four interaction plots; no captions or axis labels survive]

Correlations

           시군구별  용도별  승 용  승 합  (unnamed)
시군구별   1.000     0.000   0.616  1.000  0.616
용도별     0.000     1.000   0.000  0.000  0.000
승 용      0.616     0.000   1.000  0.433  1.000
승 합      1.000     0.000   0.433  1.000  0.433
(unnamed)  0.616     0.000   1.000  0.433  1.000

        승 합  용도별  연료별
승 합   1.000  0.000   1.000
용도별  0.000  1.000   1.000
연료별  1.000  1.000   1.000

           승 용  (unnamed)  연료별  용도별  승 합
승 용      1.000  1.000      1.000   0.000   0.665
(unnamed)  1.000  1.000      1.000   0.000   0.665
연료별     1.000  1.000      1.000   1.000   1.000
용도별     0.000  0.000      1.000   1.000   0.000
승 합      0.665  0.665      1.000   0.000   1.000
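
The 1.000 entries between 승 용 and the unnamed column reflect two columns that are almost identical row by row. The sketch below illustrates this with a plain Pearson coefficient on the ten head rows from the Sample section. The report does not state which coefficient each matrix uses, so this is an illustration under that assumption rather than a reproduction, and `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Head rows (index 0 to 9) copied from the Sample section.
passenger = [1182, 42, 22, 29, 29, 19, 1, 22, 7, 15]  # 승 용
unnamed = [1184, 44, 22, 29, 29, 19, 1, 22, 7, 15]    # unnamed last column

r = pearson(passenger, unnamed)
print(round(r, 3))
```

The two lists differ only in the rows where 승 합 is non-zero, so the coefficient is very close to 1.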

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

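First 10 rows

Index  시군구별  연료별  용도별    승 용  승 합  화 물  특 수  (unnamed)
0      합계      <NA>    <NA>      1182   2      0      0      1184
1      종로구    수소    비사업용  42     2      0      0      44
2      중구      수소    비사업용  22     0      0      0      22
3      용산구    수소    비사업용  29     0      0      0      29
4      성동구    수소    비사업용  29     0      0      0      29
5      광진구    수소    비사업용  19     0      0      0      19
6      광진구    수소    사업용    1      0      0      0      1
7      동대문구  수소    비사업용  22     0      0      0      22
8      동대문구  수소    사업용    7      0      0      0      7
9      중랑구    수소    비사업용  15     0      0      0      15

Last 10 rows

Index  시군구별  연료별  용도별    승 용  승 합  화 물  특 수  (unnamed)
26     영등포구  수소    사업용    8      0      0      0      8
27     동작구    수소    비사업용  51     0      0      0      51
28     관악구    수소    비사업용  35     0      0      0      35
29     서초구    수소    비사업용  127    0      0      0      127
30     서초구    수소    사업용    8      0      0      0      8
31     강남구    수소    비사업용  95     0      0      0      95
32     강남구    수소    사업용    1      0      0      0      1
33     송파구    수소    비사업용  92     0      0      0      92
34     강동구    수소    비사업용  85     0      0      0      85
35     강동구    수소    사업용    3      0      0      0      3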
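
In the sample rows, the unnamed last column equals the sum of the four vehicle-type columns, and row 0 (합계) is an aggregate row rather than a district. The reported 승 용 sum of 2364 is exactly twice the 합계 value of 1182, which suggests the aggregate row was profiled together with the district rows. The sketch below checks the row totals on the head rows and filters out the aggregate row. The tuple layout and variable names are illustrative.

```python
# Head rows copied from the Sample section, as
# (시군구별, 연료별, 용도별, 승 용, 승 합, 화 물, 특 수, unnamed column).
rows = [
    ("합계", None, None, 1182, 2, 0, 0, 1184),
    ("종로구", "수소", "비사업용", 42, 2, 0, 0, 44),
    ("중구", "수소", "비사업용", 22, 0, 0, 0, 22),
    ("용산구", "수소", "비사업용", 29, 0, 0, 0, 29),
    ("성동구", "수소", "비사업용", 29, 0, 0, 0, 29),
    ("광진구", "수소", "비사업용", 19, 0, 0, 0, 19),
    ("광진구", "수소", "사업용", 1, 0, 0, 0, 1),
    ("동대문구", "수소", "비사업용", 22, 0, 0, 0, 22),
    ("동대문구", "수소", "사업용", 7, 0, 0, 0, 7),
    ("중랑구", "수소", "비사업용", 15, 0, 0, 0, 15),
]

# Each row's last value should equal the sum of the four vehicle-type counts.
row_totals_ok = all(sum(r[3:7]) == r[7] for r in rows)

# Drop the aggregate row so district-level statistics are not inflated by it.
districts = [r for r in rows if r[0] != "합계"]

print(row_totals_ok, len(districts))  # True 9
```

Filtering the 합계 row before profiling would remove the 1182 and 1184 outliers that dominate the mean, skewness, and kurtosis of the two numeric columns.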