gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	372
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	12.5 KiB
Average record size in memory	34.4 B

Variable types

Numeric	2
Text	1
Categorical	1

Dataset

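Description	Provides information on Hi-pass usage rates by toll office on Korea Expressway Corporation expressways. Columns: 구분 (index), 영업소 (toll office), 하이패스 이용비율 (Hi-pass usage rate), 비고 (remarks).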
URL	https://www.data.go.kr/data/15101912/fileData.do

Alerts

`구분` is highly overall correlated with `하이패스 이용비율` and 1 other fields	High correlation
`하이패스 이용비율` is highly overall correlated with `구분`	High correlation
`비고` is highly overall correlated with `구분`	High correlation
`비고` is highly imbalanced (68.7%)	Imbalance
`구분` has unique values	Unique
`영업소` has unique values	Unique

Reproduction

Analysis started	2023-12-12 04:57:41.788404
Analysis finished	2023-12-12 04:57:42.689157
Duration	0.9 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

구분
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	372
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	186.5

Minimum	1
Maximum	372
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	3.4 KiB

Quantile statistics

Minimum	1
5-th percentile	19.55
Q1	93.75
median	186.5
Q3	279.25
95-th percentile	353.45
Maximum	372
Range	371
Interquartile range (IQR)	185.5

Descriptive statistics

Standard deviation	107.53139
Coefficient of variation (CV)	0.57657582
Kurtosis	-1.2
Mean	186.5
Median Absolute Deviation (MAD)	93
Skewness	0
Sum	69378
Variance	11563
Monotonicity	Strictly increasing
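
The descriptive statistics above can be recomputed by hand. The following is a minimal sketch that assumes `구분` is simply the row index 1..372 (suggested by the minimum, maximum, distinct count, and strict monotonicity, but not stated in the report). It also assumes that the variance uses the sample (n - 1) denominator and that the quartiles use linear interpolation; both assumptions reproduce the reported figures.

```python
import statistics

# Assumption: the column holds the integers 1..372 in order.
values = list(range(1, 373))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)
# Quartiles with linear interpolation between order statistics.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("Sum:", total)
print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation:", round(std, 5))
print("CV:", round(cv, 8))
print("MAD:", mad)
print("Q1, Q3, IQR:", q1, q3, q3 - q1)
```

Each printed value can be compared against the corresponding row of the quantile and descriptive statistics tables.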

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
1	1	0.3%
247	1	0.3%
256	1	0.3%
255	1	0.3%
254	1	0.3%
253	1	0.3%
252	1	0.3%
251	1	0.3%
250	1	0.3%
249	1	0.3%
Other values (362)	362	97.3%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.3%
2	1	0.3%
3	1	0.3%
4	1	0.3%
5	1	0.3%
6	1	0.3%
7	1	0.3%
8	1	0.3%
9	1	0.3%
10	1	0.3%

Maximum 10 values

Value	Count	Frequency (%)
372	1	0.3%
371	1	0.3%
370	1	0.3%
369	1	0.3%
368	1	0.3%
367	1	0.3%
366	1	0.3%
365	1	0.3%
364	1	0.3%
363	1	0.3%

영업소
Text

UNIQUE

Distinct	372
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	3.0 KiB

Length

Max length	9
Median length	2
Mean length	2.5295699
Min length	2

Characters and Unicode

Total characters	941
Distinct characters	190
Distinct categories	5
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
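
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to look up the general category of each character in one toll office name from this dataset. The two-letter codes it prints map to the category names used in this report (`Lo` = Other Letter, `Ps` = Open Punctuation, `Pe` = Close Punctuation). Whether the profiler uses this module internally is not stated in the report and is not assumed here.

```python
import unicodedata
from collections import Counter

# A toll office name taken from the sample rows of this report.
name = "가락(개)"

# Count how many characters fall into each Unicode general category.
categories = Counter(unicodedata.category(ch) for ch in name)

for ch in name:
    print(ch, unicodedata.category(ch), unicodedata.name(ch))
print(dict(categories))
```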

Unique

Unique	372
Unique (%)	100.0%

Sample

1st row	기흥동탄
2nd row	수원신갈
3rd row	동수원
4th row	북여주
5th row	북수원

Value	Count	Frequency (%)
기흥동탄	1	0.3%
동고령	1	0.3%
사천	1	0.3%
서순천	1	0.3%
서안동	1	0.3%
상주	1	0.3%
영동	1	0.3%
동순천	1	0.3%
북영천	1	0.3%
목포	1	0.3%
Other values (362)	362	97.3%

Most occurring characters

Value	Count	Frequency (%)
주	42	4.5%
서	40	4.3%
산	38	4.0%
동	37	3.9%
천	37	3.9%
남	34	3.6%
양	28	3.0%
성	24	2.6%
북	24	2.6%
안	19	2.0%
Other values (180)	618	65.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	929	98.7%
Close Punctuation	4	0.4%
Open Punctuation	4	0.4%
Uppercase Letter	3	0.3%
Decimal Number	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
주	42	4.5%
서	40	4.3%
산	38	4.1%
동	37	4.0%
천	37	4.0%
남	34	3.7%
양	28	3.0%
성	24	2.6%
북	24	2.6%
안	19	2.0%
Other values (174)	606	65.2%

Uppercase Letter

Value	Count	Frequency (%)
E	1	33.3%
K	1	33.3%
C	1	33.3%

Close Punctuation

Value	Count	Frequency (%)
)	4	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	4	100.0%

Decimal Number

Value	Count	Frequency (%)
2	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	929	98.7%
Common	9	1.0%
Latin	3	0.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
주	42	4.5%
서	40	4.3%
산	38	4.1%
동	37	4.0%
천	37	4.0%
남	34	3.7%
양	28	3.0%
성	24	2.6%
북	24	2.6%
안	19	2.0%
Other values (174)	606	65.2%

Common

Value	Count	Frequency (%)
)	4	44.4%
(	4	44.4%
2	1	11.1%

Latin

Value	Count	Frequency (%)
E	1	33.3%
K	1	33.3%
C	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	929	98.7%
ASCII	12	1.3%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
주	42	4.5%
서	40	4.3%
산	38	4.1%
동	37	4.0%
천	37	4.0%
남	34	3.7%
양	28	3.0%
성	24	2.6%
북	24	2.6%
안	19	2.0%
Other values (174)	606	65.2%

ASCII

Value	Count	Frequency (%)
)	4	33.3%
(	4	33.3%
E	1	8.3%
K	1	8.3%
C	1	8.3%
2	1	8.3%

하이패스 이용비율
Real number (ℝ)

HIGH CORRELATION

Distinct	96
Distinct (%)	25.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	86.595968

Minimum	78.9
Maximum	92.6
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	3.4 KiB

Quantile statistics

Minimum	78.9
5-th percentile	83.155
Q1	85.1
median	86.4
Q3	87.9
95-th percentile	90.9
Maximum	92.6
Range	13.7
Interquartile range (IQR)	2.8

Descriptive statistics

Standard deviation	2.3041881
Coefficient of variation (CV)	0.026608492
Kurtosis	0.13871039
Mean	86.595968
Median Absolute Deviation (MAD)	1.4
Skewness	0.17994416
Sum	32213.7
Variance	5.3092829
Monotonicity	Not monotonic
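
The summary figures for `하이패스 이용비율` are related by simple identities, so they can be cross-checked without the raw data. The sketch below copies the reported values and tests them against one another. The tolerance is an arbitrary choice made for this illustration to absorb rounding in the published digits.

```python
import math

# Values copied from the report for this column.
n = 372
total = 32213.7
mean = 86.595968
std = 2.3041881
variance = 5.3092829
cv = 0.026608492
minimum, q1, q3, maximum = 78.9, 85.1, 87.9, 92.6

TOL = 1e-6  # relative tolerance for rounding in the reported digits

checks = {
    "mean == sum / n": math.isclose(total / n, mean, rel_tol=TOL),
    "variance == std ** 2": math.isclose(std ** 2, variance, rel_tol=TOL),
    "cv == std / mean": math.isclose(std / mean, cv, rel_tol=TOL),
    "range == max - min": math.isclose(maximum - minimum, 13.7, rel_tol=TOL),
    "iqr == q3 - q1": math.isclose(q3 - q1, 2.8, rel_tol=TOL),
}

for label, ok in checks.items():
    print(f"{label}: {ok}")
```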

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
85.2	11	3.0%
85.3	11	3.0%
85.8	10	2.7%
86.7	9	2.4%
86.4	9	2.4%
86.2	8	2.2%
86.8	8	2.2%
86.6	8	2.2%
85.0	8	2.2%
85.4	8	2.2%
Other values (86)	282	75.8%

Minimum 10 values

Value	Count	Frequency (%)
78.9	1	0.3%
80.3	1	0.3%
81.0	1	0.3%
81.3	1	0.3%
81.4	1	0.3%
81.5	1	0.3%
81.8	2	0.5%
82.0	1	0.3%
82.1	2	0.5%
82.4	1	0.3%

Maximum 10 values

Value	Count	Frequency (%)
92.6	1	0.3%
92.3	1	0.3%
92.0	2	0.5%
91.7	3	0.8%
91.6	2	0.5%
91.5	2	0.5%
91.4	1	0.3%
91.3	1	0.3%
91.2	2	0.5%
91.1	2	0.5%

비고
Categorical

HIGH CORRELATION IMBALANCE

Distinct	2
Distinct (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	3.0 KiB

폐쇄식	351
개방식	21

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	폐쇄식
2nd row	폐쇄식
3rd row	폐쇄식
4th row	폐쇄식
5th row	폐쇄식

Common Values

Value	Count	Frequency (%)
폐쇄식	351	94.4%
개방식	21	5.6%
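
The 68.7% imbalance reported for `비고` is consistent with one minus the normalized Shannon entropy of the class frequencies. The sketch below assumes that definition (the report itself does not state the formula) and applies it to the two counts above. The function name `imbalance_score` is a hypothetical name chosen for this example.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits
    of the class frequencies and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no defined imbalance in this sketch
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts of 폐쇄식 and 개방식 from the table above.
score = imbalance_score([351, 21])
print(f"{score:.1%}")  # 68.7%
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as one class dominates.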

Interactions

[Interaction plots: 구분 vs. 하이패스 이용비율]

Correlations

Phik (φk)

	구분	하이패스 이용비율	비고
구분	1.000	0.954	0.882
하이패스 이용비율	0.954	1.000	0.479
비고	0.882	0.479	1.000

Auto

	구분	하이패스 이용비율	비고
구분	1.000	-0.804	0.709
하이패스 이용비율	-0.804	1.000	0.365
비고	0.709	0.365	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	구분	영업소	하이패스 이용비율	비고
0	1	기흥동탄	92.6	폐쇄식
1	2	수원신갈	92.3	폐쇄식
2	3	동수원	92.0	폐쇄식
3	4	북여주	92.0	폐쇄식
4	5	북수원	91.7	폐쇄식
5	6	마성	91.6	폐쇄식
6	7	서안산	91.5	폐쇄식
7	8	부곡	91.4	폐쇄식
8	9	경기광주	91.2	폐쇄식
9	10	남여주	91.2	폐쇄식

Last rows

	구분	영업소	하이패스 이용비율	비고
362	363	다사	89.1	개방식
363	364	김포	88.5	개방식
364	365	인천	87.8	개방식
365	366	내서	86.4	개방식
366	367	가락(개)	85.5	개방식
367	368	대동(개)	84.2	개방식
368	369	순천만	82.1	개방식
369	370	서영암(개)	81.0	개방식
370	371	일로	80.3	개방식
371	372	동부산	89.2	개방식
