gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	57
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.1 KiB
Average record size in memory	19.3 B

Variable types

Numeric	1
Text	1

Dataset

Description	Gyeonggi-do BMS intercity bus routes (경기도_BMS 시외버스 노선)
Author	Gyeonggi-do (경기도)
URL	https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=CH40CJSQILRV1RO9BSHQ34365008&infSeq=1

Alerts

`노선ID` has unique values	Unique
`노선명` has unique values	Unique

Reproduction

Analysis started	2023-12-10 22:37:30.698731
Analysis finished	2023-12-10 22:37:30.950590
Duration	0.25 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

노선ID
Real number (ℝ)

UNIQUE

Distinct	57
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2.4100602 × 10⁸

Minimum	2.4100204 × 10⁸
Maximum	2.4100716 × 10⁸
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	645.0 B

Quantile statistics

Minimum	2.4100204 × 10⁸
5-th percentile	2.4100306 × 10⁸
Q1	2.4100589 × 10⁸
median	2.4100658 × 10⁸
Q3	2.4100703 × 10⁸
95-th percentile	2.4100714 × 10⁸
Maximum	2.4100716 × 10⁸
Range	5125
Interquartile range (IQR)	1142

Descriptive statistics

Standard deviation	1317.3104
Coefficient of variation (CV)	5.4658818 × 10⁻⁶
Kurtosis	1.2838929
Mean	2.4100602 × 10⁸
Median Absolute Deviation (MAD)	567
Skewness	-1.4649284
Sum	1.3737343 × 10¹⁰
Variance	1735306.7
Monotonicity	Not monotonic
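
The descriptive statistics above are tied together by a few simple identities. The following is a minimal sketch that re-derives them from the rounded figures printed in this report; the variable names are illustrative, and the comparisons use a tolerance because the inputs are rounded.

```python
import math

# Figures copied from this report (rounded as printed).
n = 57
mean = 2.4100602e8
std = 1317.3104
variance = 1735306.7
cv = 5.4658818e-6
total = 1.3737343e10
minimum, maximum = 241002040, 241007165  # from the extreme-value tables

# Coefficient of variation: standard deviation divided by the mean.
cv_check = std / mean
# Variance: square of the standard deviation.
variance_check = std ** 2
# Sum: mean times the number of observations.
sum_check = mean * n
# Range: maximum minus minimum.
range_check = maximum - minimum

print(f"CV       {cv_check:.7e}")
print(f"Variance {variance_check:.1f}")
print(f"Sum      {sum_check:.7e}")
print(f"Range    {range_check}")
```

A very small CV is expected here: 노선ID is an identifier whose values all share the same leading digits, so the spread is tiny relative to the mean.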

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
241006868	1	1.8%
241006889	1	1.8%
241005960	1	1.8%
241005980	1	1.8%
241005890	1	1.8%
241007070	1	1.8%
241005900	1	1.8%
241005970	1	1.8%
241006973	1	1.8%
241007074	1	1.8%
Other values (47)	47	82.5%

Minimum 10 values

Value	Count	Frequency (%)
241002040	1	1.8%
241002680	1	1.8%
241003000	1	1.8%
241003070	1	1.8%
241003550	1	1.8%
241003860	1	1.8%
241003960	1	1.8%
241003980	1	1.8%
241003990	1	1.8%
241004890	1	1.8%

Maximum 10 values

Value	Count	Frequency (%)
241007165	1	1.8%
241007164	1	1.8%
241007147	1	1.8%
241007135	1	1.8%
241007096	1	1.8%
241007077	1	1.8%
241007076	1	1.8%
241007075	1	1.8%
241007074	1	1.8%
241007073	1	1.8%

노선명
Text

UNIQUE

Distinct	57
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	588.0 B

Length

Max length	16
Median length	15
Mean length	9.6315789
Min length	4

Characters and Unicode

Total characters	549
Distinct characters	78
Distinct categories	6
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
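
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module over a few 노선명 values taken from this report. The helper name `category_counts` is illustrative, and this is not necessarily how the profiling tool computes its figures. The standard library exposes general categories only; scripts and blocks would need another source.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


# A few 노선명 values copied from the first and last rows of this report.
sample = ["8877", "9701-1안동-인천공항", "A4300", "9500(인천공항-정읍)"]

counts = category_counts(sample)
# Two-letter codes: Lo = Other Letter, Nd = Decimal Number,
# Pd = Dash Punctuation, Ps = Open Punctuation, Pe = Close Punctuation,
# Lu = Uppercase Letter.
for code, count in counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Other Letter" (Lo), which is why that category dominates the text column.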

Unique

Unique	57
Unique (%)	100.0%

Sample

1st row	8877
2nd row	9701-1안동-인천공항
3rd row	A4300
4th row	8852
5th row	7001

Value	Count	Frequency (%)
8877	1	1.8%
8835안녕동-인천공항	1	1.8%
7300의정부-김포공항	1	1.8%
8829이천-인천공항	1	1.8%
8844진접-인천공항	1	1.8%
4000	1	1.8%
7100전곡-인천공항	1	1.8%
7200의정부-인천공항	1	1.8%
8455고양-안성	1	1.8%
4200	1	1.8%
Other values (47)	47	82.5%

Most occurring characters

Value	Count	Frequency (%)
0	71	12.9%
-	49	8.9%
항	38	6.9%
공	38	6.9%
8	36	6.6%
천	35	6.4%
인	33	6.0%
4	22	4.0%
5	20	3.6%
1	19	3.5%
Other values (68)	188	34.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	256	46.6%
Decimal Number	236	43.0%
Dash Punctuation	49	8.9%
Close Punctuation	3	0.5%
Open Punctuation	3	0.5%
Uppercase Letter	2	0.4%
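
The frequency column is each count divided by the total number of characters. A short sketch, using only the counts printed in this report (the names `table_counts` and `percentages` are illustrative), reproduces the percentages and the mean length reported earlier:

```python
# Counts copied from the "Most occurring categories" table.
table_counts = {
    "Other Letter": 256,
    "Decimal Number": 236,
    "Dash Punctuation": 49,
    "Close Punctuation": 3,
    "Open Punctuation": 3,
    "Uppercase Letter": 2,
}

total_characters = sum(table_counts.values())

# Frequency (%) = 100 * count / total, rounded to one decimal place.
percentages = {
    name: round(100 * count / total_characters, 1)
    for name, count in table_counts.items()
}

# Mean length = total characters / number of observations (57 rows).
mean_length = total_characters / 57

for name, pct in percentages.items():
    print(f"{name}\t{pct}%")
print(f"Mean length\t{mean_length:.7f}")
```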

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
항	38	14.8%
공	38	14.8%
천	35	13.7%
인	33	12.9%
안	6	2.3%
포	6	2.3%
주	5	2.0%
김	5	2.0%
역	5	2.0%
동	5	2.0%
Other values (54)	80	31.2%

Decimal Number

Value	Count	Frequency (%)
0	71	30.1%
8	36	15.3%
4	22	9.3%
5	20	8.5%
1	19	8.1%
2	18	7.6%
7	17	7.2%
3	16	6.8%
9	13	5.5%
6	4	1.7%

Dash Punctuation

Value	Count	Frequency (%)
-	49	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	3	100.0%

Uppercase Letter

Value	Count	Frequency (%)
A	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	291	53.0%
Hangul	256	46.6%
Latin	2	0.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
항	38	14.8%
공	38	14.8%
천	35	13.7%
인	33	12.9%
안	6	2.3%
포	6	2.3%
주	5	2.0%
김	5	2.0%
역	5	2.0%
동	5	2.0%
Other values (54)	80	31.2%

Common

Value	Count	Frequency (%)
0	71	24.4%
-	49	16.8%
8	36	12.4%
4	22	7.6%
5	20	6.9%
1	19	6.5%
2	18	6.2%
7	17	5.8%
3	16	5.5%
9	13	4.5%
Other values (3)	10	3.4%

Latin

Value	Count	Frequency (%)
A	2	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	293	53.4%
Hangul	256	46.6%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	71	24.2%
-	49	16.7%
8	36	12.3%
4	22	7.5%
5	20	6.8%
1	19	6.5%
2	18	6.1%
7	17	5.8%
3	16	5.5%
9	13	4.4%
Other values (4)	12	4.1%

Hangul

Value	Count	Frequency (%)
항	38	14.8%
공	38	14.8%
천	35	13.7%
인	33	12.9%
안	6	2.3%
포	6	2.3%
주	5	2.0%
김	5	2.0%
역	5	2.0%
동	5	2.0%
Other values (54)	80	31.2%

Interaction plot: 노선ID against 노선ID

Phik (φk)

	노선ID	노선명
노선ID	1.000	1.000
노선명	1.000	1.000

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
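
To make the two views concrete, here is a minimal sketch, in plain Python rather than the profiling tool's own plotting code, that builds a nullity matrix and per-column missing counts for the first five rows shown in this report. The function names are illustrative.

```python
# First five rows as shown in this report's sample table.
columns = ["노선ID", "노선명"]
rows = [
    (241006868, "8877"),
    (241007135, "9701-1안동-인천공항"),
    (241007076, "A4300"),
    (241004900, "8852"),
    (241006590, "7001"),
]


def nullity_matrix(rows):
    """Return one list of booleans per row: True where a cell is present."""
    return [[cell is not None for cell in row] for row in rows]


def missing_per_column(rows, columns):
    """Count missing (None) cells in each column."""
    matrix = nullity_matrix(rows)
    return {col: sum(not flags[i] for flags in matrix)
            for i, col in enumerate(columns)}


# Text rendering of the matrix: '#' marks a present cell, '.' a missing one.
for flags in nullity_matrix(rows):
    print("".join("#" if present else "." for present in flags))

missing = missing_per_column(rows, columns)
print(missing)
```

For this dataset every cell is present, which matches the report's count of 0 missing cells.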

First rows

	노선ID	노선명
0	241006868	8877
1	241007135	9701-1안동-인천공항
2	241007076	A4300
3	241004900	8852
4	241006590	7001
5	241006580	7000
6	241007165	7000A
7	241005910	5100신흥동-김포공항
8	241005880	8843-1마석-인천공항
9	241003990	9701안동-인천공항

Last rows

	노선ID	노선명
47	241007075	4200-1
48	241007071	4100
49	241006010	8834안성-인천공항
50	241006894	8864평택-인천공항
51	241004890	8165
52	241006880	3903북부-인천
53	241007147	8848하남-강변역-인천공항
54	241006855	9500(인천공항-정읍)
55	241007096	1357김포공항-연무대(논산)
56	241003000	4800
