gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	269
Missing cells	102
Missing cells (%)	9.5%
Duplicate rows	1
Duplicate rows (%)	0.4%
Total size in memory	9.1 KiB
Average record size in memory	34.5 B
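
The two percentages above can be re-derived from the counts. A minimal sketch of that arithmetic, assuming the report divides missing cells by variables × observations and duplicate rows by observations (the variable names are illustrative and do not come from the report):

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 269
missing_cells = 102
duplicate_rows = 1

# Assumption: the denominator for missing cells is every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

The 102 missing cells match the 34 missing values reported for each of three columns in the Alerts section (3 × 34 = 102).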

Variable types

Text	1
Categorical	1
Numeric	2

Dataset

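Description	Station-level boarding and alighting figures (station name, number of boarding passengers, number of alighting passengers, etc.) from the annually published Railway Statistical Yearbook (철도통계연보). Covers mainline passenger transport only.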
URL	https://www.data.go.kr/data/15029727/fileData.do

Alerts

`단위` has constant value "명"	Constant
Dataset has 1 (0.4%) duplicate rows	Duplicates
`승차인원` is highly overall correlated with `하차인원`	High correlation
`하차인원` is highly overall correlated with `승차인원`	High correlation
`역명` has 34 (12.6%) missing values	Missing
`승차인원` has 34 (12.6%) missing values	Missing
`하차인원` has 34 (12.6%) missing values	Missing

Reproduction

Analysis started	2023-12-11 23:49:14.074742
Analysis finished	2023-12-11 23:49:14.889215
Duration	0.81 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
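
The reported duration is the difference between the two timestamps. A small sketch of that check, assuming both timestamps are in the same time zone (the report does not state one):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-12-11 23:49:14.074742")
finished = datetime.fromisoformat("2023-12-11 23:49:14.889215")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```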

역명
Text

MISSING

Distinct	235
Distinct (%)	100.0%
Missing	34
Missing (%)	12.6%
Memory size	2.2 KiB

Length

Max length	5
Median length	2
Mean length	2.2765957
Min length	2

Characters and Unicode

Total characters	535
Distinct characters	179
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
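
As an illustration of those character properties, the sketch below inspects the five sample station names with Python's standard `unicodedata` module. The standard library exposes the general category but not the script or block, so block membership is approximated here with a code-point range; that range check and the helper name `describe` are assumptions of this sketch, not how the profiling tool works.

```python
import unicodedata

# First five station names from the report's "Sample" rows.
sample_names = ["서울", "용산", "행신", "일산", "도라산"]

# Assumption: the Hangul Syllables block spans U+AC00..U+D7AF.
HANGUL_SYLLABLES = range(0xAC00, 0xD7B0)

def describe(ch):
    """Return (general category, is-in-Hangul-Syllables) for one character."""
    return unicodedata.category(ch), ord(ch) in HANGUL_SYLLABLES

chars = [ch for name in sample_names for ch in name]
categories = {describe(ch)[0] for ch in chars}
all_hangul = all(describe(ch)[1] for ch in chars)

# "Lo" is the abbreviation for the "Other Letter" general category.
print(categories, all_hangul)
```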

Unique

Unique	235
Unique (%)	100.0%

Sample

1st row	서울
2nd row	용산
3rd row	행신
4th row	일산
5th row	도라산

Value	Count	Frequency (%)
미군기지	1	0.4%
북영천	1	0.4%
영덕	1	0.4%
명봉	1	0.4%
김천	1	0.4%
구미	1	0.4%
약목	1	0.4%
왜관	1	0.4%
신동	1	0.4%
대구	1	0.4%
Other values (225)	225	95.7%

Most occurring characters

Value	Count	Frequency (%)
천	26	4.9%
주	20	3.7%
산	20	3.7%
동	14	2.6%
성	13	2.4%
양	13	2.4%
원	12	2.2%
신	10	1.9%
구	10	1.9%
서	10	1.9%
Other values (169)	387	72.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	535	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
천	26	4.9%
주	20	3.7%
산	20	3.7%
동	14	2.6%
성	13	2.4%
양	13	2.4%
원	12	2.2%
신	10	1.9%
구	10	1.9%
서	10	1.9%
Other values (169)	387	72.3%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	535	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
천	26	4.9%
주	20	3.7%
산	20	3.7%
동	14	2.6%
성	13	2.4%
양	13	2.4%
원	12	2.2%
신	10	1.9%
구	10	1.9%
서	10	1.9%
Other values (169)	387	72.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	535	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
천	26	4.9%
주	20	3.7%
산	20	3.7%
동	14	2.6%
성	13	2.4%
양	13	2.4%
원	12	2.2%
신	10	1.9%
구	10	1.9%
서	10	1.9%
Other values (169)	387	72.3%

단위
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	2.2 KiB

명	269

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	명
2nd row	명
3rd row	명
4th row	명
5th row	명

Common Values

Value	Count	Frequency (%)
명	269	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
명	269	100.0%

승차인원
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	235
Distinct (%)	100.0%
Missing	34
Missing (%)	12.6%
Infinite	0
Infinite (%)	0.0%
Mean	516620.87

Minimum	0
Maximum	16701060
Zeros	1
Zeros (%)	0.4%
Negative	0
Negative (%)	0.0%
Memory size	2.5 KiB

Quantile statistics

Minimum	0
5-th percentile	349
Q1	5878
median	43905
Q3	324954.5
95-th percentile	2415907.7
Maximum	16701060
Range	16701060
Interquartile range (IQR)	319076.5
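
The range and interquartile range follow directly from the quantiles listed above. The sketch below recomputes them and, as an addition that is not part of the report, applies Tukey's 1.5 × IQR rule to show how far the maximum sits above the bulk of the stations:

```python
# Quantile statistics copied from the 승차인원 table.
minimum, q1, q3, maximum = 0, 5878, 324954.5, 16701060

value_range = maximum - minimum
iqr = q3 - q1

# Not in the report: Tukey's upper fence, a common outlier heuristic.
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr, upper_fence)
print("maximum above fence:", maximum > upper_fence)
```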

Descriptive statistics

Standard deviation	1563528.9
Coefficient of variation (CV)	3.0264532
Kurtosis	53.86825
Mean	516620.87
Median Absolute Deviation (MAD)	42580
Skewness	6.4076305
Sum	1.214059 × 10⁸
Variance	2.4446226 × 10¹²
Monotonicity	Not monotonic
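
Several of these figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing values. A sketch that checks this, using the rounded values printed in the report (so agreement is only up to rounding):

```python
import math

# Values copied from the 승차인원 tables.
mean = 516620.87
std = 1563528.9
n_present = 269 - 34  # observations minus missing values

cv = std / mean
variance = std ** 2
total = mean * n_present

print(f"CV={cv:.7f} variance={variance:.7e} sum={total:.6e}")
```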

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
345168	1	0.4%
1325	1	0.4%
764427	1	0.4%
1978650	1	0.4%
44916	1	0.4%
651879	1	0.4%
2980	1	0.4%
1834877	1	0.4%
8263464	1	0.4%
958041	1	0.4%
Other values (225)	225	83.6%
(Missing)	34	12.6%

Minimum 10 values

Value	Count	Frequency (%)
0	1	0.4%
2	1	0.4%
3	1	0.4%
10	1	0.4%
49	1	0.4%
58	1	0.4%
66	1	0.4%
82	1	0.4%
112	1	0.4%
115	1	0.4%

Maximum 10 values

Value	Count	Frequency (%)
16701060	1	0.4%
8263464	1	0.4%
7190808	1	0.4%
6397281	1	0.4%
6000682	1	0.4%
5260330	1	0.4%
4941467	1	0.4%
3797768	1	0.4%
3740829	1	0.4%
3440680	1	0.4%

하차인원
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	234
Distinct (%)	99.6%
Missing	34
Missing (%)	12.6%
Infinite	0
Infinite (%)	0.0%
Mean	516620.87

Minimum	0
Maximum	16649827
Zeros	2
Zeros (%)	0.7%
Negative	0
Negative (%)	0.0%
Memory size	2.5 KiB

Quantile statistics

Minimum	0
5-th percentile	180.6
Q1	6282.5
median	46488
Q3	308881.5
95-th percentile	2414734.8
Maximum	16649827
Range	16649827
Interquartile range (IQR)	302599

Descriptive statistics

Standard deviation	1566097
Coefficient of variation (CV)	3.0314241
Kurtosis	53.076
Mean	516620.87
Median Absolute Deviation (MAD)	45133
Skewness	6.3690686
Sum	1.214059 × 10⁸
Variance	2.4526597 × 10¹²
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
0	2	0.7%
343225	1	0.4%
1355	1	0.4%
766632	1	0.4%
1994872	1	0.4%
51498	1	0.4%
644679	1	0.4%
3525	1	0.4%
1836905	1	0.4%
8300091	1	0.4%
Other values (224)	224	83.3%
(Missing)	34	12.6%

Minimum 10 values

Value	Count	Frequency (%)
0	2	0.7%
1	1	0.4%
3	1	0.4%
16	1	0.4%
24	1	0.4%
32	1	0.4%
58	1	0.4%
82	1	0.4%
91	1	0.4%
104	1	0.4%

Maximum 10 values

Value	Count	Frequency (%)
16649827	1	0.4%
8300091	1	0.4%
7274191	1	0.4%
6400902	1	0.4%
6157505	1	0.4%
5323229	1	0.4%
4892871	1	0.4%
3735966	1	0.4%
3715260	1	0.4%
3516441	1	0.4%

Interaction plots (axes: 승차인원, 하차인원)

Phik (φk)
Auto

Heatmap
Table

	승차인원	하차인원
승차인원	1.000	1.000
하차인원	1.000	1.000

Heatmap
Table

	승차인원	하차인원
승차인원	1.000	0.999
하차인원	0.999	1.000
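
To give a feel for why the two columns correlate so strongly, the sketch below computes Pearson and Spearman coefficients on the ten rows listed under "First rows". This is only an illustration: the report does not say which coefficient its "Auto" setting uses, and ten rows will not reproduce the values in the tables above. The helper names `ranks` and `pearson` are hypothetical.

```python
def ranks(values):
    """Rank values 1..n in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# 승차인원 and 하차인원 for the ten stations in the "First rows" table.
boardings = [16701060, 6000682, 914436, 457, 853, 10,
             3440680, 192182, 5260330, 82513]
alightings = [16649827, 6157505, 878286, 464, 853, 0,
              3516441, 166519, 5323229, 90305]

pearson_raw = pearson(boardings, alightings)
# Spearman's rho is Pearson's r computed on the ranks.
spearman = pearson(ranks(boardings), ranks(alightings))
print(pearson_raw, spearman)
```

In this small sample both columns order the stations identically, which is what a rank-based coefficient rewards.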

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

First rows

	역명	단위	승차인원	하차인원
0	서울	명	16701060	16649827
1	용산	명	6000682	6157505
2	행신	명	914436	878286
3	일산	명	457	464
4	도라산	명	853	853
5	서빙고	명	10	0
6	영등포	명	3440680	3516441
7	안양	명	192182	166519
8	수원	명	5260330	5323229
9	오산	명	82513	90305

Last rows

	역명	단위	승차인원	하차인원
259	<NA>	명	<NA>	<NA>
260	<NA>	명	<NA>	<NA>
261	<NA>	명	<NA>	<NA>
262	<NA>	명	<NA>	<NA>
263	<NA>	명	<NA>	<NA>
264	<NA>	명	<NA>	<NA>
265	<NA>	명	<NA>	<NA>
266	<NA>	명	<NA>	<NA>
267	<NA>	명	<NA>	<NA>
268	<NA>	명	<NA>	<NA>

Most frequently occurring

	역명	단위	승차인원	하차인원	# duplicates
0	<NA>	명	<NA>	<NA>	34
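
The summary reports 1 duplicate row while this table reports 34 duplicates. One reading consistent with both numbers is that the summary counts distinct duplicated rows and this table counts their occurrences; that is an assumption about the tool, not something the report states. A sketch of the two counts, using three rows from "First rows" plus 34 all-missing rows as stand-in data:

```python
from collections import Counter

# Stand-in data: a few real rows plus the 34 rows that are missing
# everything except the constant unit column.
rows = [
    ("서울", "명", 16701060, 16649827),
    ("용산", "명", 6000682, 6157505),
    ("행신", "명", 914436, 878286),
] + [(None, "명", None, None)] * 34

counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

distinct_duplicated_rows = len(duplicated)  # distinct rows seen more than once
occurrences = sum(duplicated.values())      # how often those rows appear

print(distinct_duplicated_rows, occurrences)
```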
