gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	669
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	22.3 KiB
Average record size in memory	34.2 B

Variable types

Numeric	2
Text	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-12764/A/1/datasetView.do

Alerts

`SUBWAY_ID` is highly overall correlated with `STATN_ID` and 1 other field	High correlation
`STATN_ID` is highly overall correlated with `SUBWAY_ID` and 1 other field	High correlation
`호선이름` is highly overall correlated with `SUBWAY_ID` and 1 other field	High correlation
`STATN_ID` has unique values	Unique

Reproduction

Analysis started	2024-05-11 07:15:21.416309
Analysis finished	2024-05-11 07:15:24.214630
Duration	2.8 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

SUBWAY_ID
Real number (ℝ)

HIGH CORRELATION

Distinct	17
Distinct (%)	2.5%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1027.0135

Minimum	1001
Maximum	1093
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	6.0 KiB

Quantile statistics

Minimum	1001
5-th percentile	1001
Q1	1003
median	1006
Q3	1063
95-th percentile	1087.6
Maximum	1093
Range	92
Interquartile range (IQR)	60

Descriptive statistics

Standard deviation	33.138994
Coefficient of variation (CV)	0.032267342
Kurtosis	-1.1643279
Mean	1027.0135
Median Absolute Deviation (MAD)	4
Skewness	0.81675923
Sum	687072
Variance	1098.1929
Monotonicity	Increasing
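
Several of the SUBWAY_ID figures above are derived from one another, so they can be cross-checked with simple arithmetic. The following is a minimal sketch using only the numbers printed in this report; the variable names are illustrative, and because the inputs are rounded, the results agree with the report only up to rounding.

```python
# Figures copied from the SUBWAY_ID statistics in this report.
n_observations = 669
total = 687072            # Sum
minimum, maximum = 1001, 1093
q1, q3 = 1003, 1063
std = 33.138994           # Standard deviation (rounded in the report)

mean = total / n_observations      # Mean = Sum / number of observations
value_range = maximum - minimum    # Range = Maximum - Minimum
iqr = q3 - q1                      # IQR = Q3 - Q1
variance = std ** 2                # Variance = standard deviation squared
cv = std / mean                    # Coefficient of variation = std / mean

print(f"mean={mean:.4f} range={value_range} iqr={iqr}")
print(f"variance={variance:.4f} cv={cv:.9f}")
```

The same relationships apply to the STATN_ID statistics, although its mean and sum are printed with fewer significant digits.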

Histogram with fixed size bins (bins=17)

Value	Count	Frequency (%)
1001	102	15.2%
1075	63	9.4%
1063	57	8.5%
1005	56	8.4%
1007	53	7.9%
1002	51	7.6%
1004	48	7.2%
1003	44	6.6%
1006	39	5.8%
1009	38	5.7%
Other values (7)	118	17.6%
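
The "Other values" row and the frequency column can be reproduced from the counts in the table above. This is a small sketch under the assumption that each percentage is the count divided by the 669 observations, rounded to one decimal place; `top_counts` and `pct` are illustrative names, not part of ydata-profiling.

```python
# Counts copied from the table above (the 10 most common of 17 distinct values).
top_counts = {
    1001: 102, 1075: 63, 1063: 57, 1005: 56, 1007: 53,
    1002: 51, 1004: 48, 1003: 44, 1006: 39, 1009: 38,
}
n_observations = 669

# Everything not listed individually is folded into "Other values".
other_count = n_observations - sum(top_counts.values())

def pct(count):
    """Share of all observations, as a percentage rounded to one decimal."""
    return round(100 * count / n_observations, 1)

for value, count in top_counts.items():
    print(value, count, f"{pct(count)}%")
print("Other values", other_count, f"{pct(other_count)}%")
```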

Minimum 10 values

Value	Count	Frequency (%)
1001	102	15.2%
1002	51	7.6%
1003	44	6.6%
1004	48	7.2%
1005	56	8.4%
1006	39	5.8%
1007	53	7.9%
1008	18	2.7%
1009	38	5.7%
1063	57	8.5%

Maximum 10 values

Value	Count	Frequency (%)
1093	21	3.1%
1092	13	1.9%
1081	11	1.6%
1077	16	2.4%
1075	63	9.4%
1067	25	3.7%
1065	14	2.1%
1063	57	8.5%
1009	38	5.7%
1008	18	2.7%

STATN_ID
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	669
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1.027038 × 10⁹

Minimum	1.0010001 × 10⁹
Maximum	1.093004 × 10⁹
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	6.0 KiB

Quantile statistics

Minimum	1.0010001 × 10⁹
5-th percentile	1.0010001 × 10⁹
Q1	1.0030003 × 10⁹
median	1.0060006 × 10⁹
Q3	1.0630753 × 10⁹
95-th percentile	1.0876178 × 10⁹
Maximum	1.093004 × 10⁹
Range	92003922
Interquartile range (IQR)	60075013

Descriptive statistics

Standard deviation	33156862
Coefficient of variation (CV)	0.032283969
Kurtosis	-1.166184
Mean	1.027038 × 10⁹
Median Absolute Deviation (MAD)	4000430
Skewness	0.81629874
Sum	6.870884 × 10¹¹
Variance	1.0993775 × 10¹⁵
Monotonicity	Strictly increasing

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
1001000100	1	0.1%
1009000930	1	0.1%
1009000932	1	0.1%
1009000933	1	0.1%
1009000934	1	0.1%
1009000935	1	0.1%
1009000936	1	0.1%
1009000937	1	0.1%
1009000938	1	0.1%
1063075110	1	0.1%
Other values (659)	659	98.5%

Minimum 10 values

Value	Count	Frequency (%)
1001000100	1	0.1%
1001000101	1	0.1%
1001000102	1	0.1%
1001000103	1	0.1%
1001000104	1	0.1%
1001000105	1	0.1%
1001000106	1	0.1%
1001000107	1	0.1%
1001000108	1	0.1%
1001000109	1	0.1%

Maximum 10 values

Value	Count	Frequency (%)
1093004022	1	0.1%
1093004021	1	0.1%
1093004020	1	0.1%
1093004019	1	0.1%
1093004018	1	0.1%
1093004017	1	0.1%
1093004016	1	0.1%
1093004014	1	0.1%
1093004013	1	0.1%
1093004012	1	0.1%

STATN_NM
Text

Distinct	548
Distinct (%)	81.9%
Missing	0
Missing (%)	0.0%
Memory size	5.4 KiB

Length

Max length	16
Median length	2
Mean length	3.0508221
Min length	2

Characters and Unicode

Total characters	2041
Distinct characters	294
Distinct categories	6
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
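
As an illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories. It is not the code ydata-profiling itself runs; the `CATEGORY_NAMES` mapping and the `category_counts` helper are assumptions made for this example, and the last input string is hypothetical, chosen only to exercise the punctuation, digit, and space categories.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# The first three names are sample rows from this report;
# the last string is hypothetical and not taken from the dataset.
names = ["소요산", "동두천", "보산", "가(1.2) 나"]
counts = category_counts(names)
for name, count in counts.most_common():
    print(name, count)
```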

Unique

Unique	447
Unique (%)	66.8%
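
The summary figures for STATN_NM are consistent with one another, which can be checked with the numbers printed in this report. This is a minimal sketch; it assumes the mean length is the total character count divided by the number of observations, and that the percentages are shares of all 669 rows.

```python
# Figures copied from the STATN_NM section of this report.
n_observations = 669
total_characters = 2041
distinct_values = 548
unique_values = 447   # values that occur exactly once

mean_length = total_characters / n_observations
distinct_pct = round(100 * distinct_values / n_observations, 1)
unique_pct = round(100 * unique_values / n_observations, 1)

print(f"mean length={mean_length:.7f}")
print(f"distinct={distinct_pct}% unique={unique_pct}%")
```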

Sample

1st row	소요산
2nd row	동두천
3rd row	보산
4th row	동두천중앙
5th row	지행

Value	Count	Frequency (%)
청량리	4	0.6%
김포공항	4	0.6%
공덕	4	0.6%
서울	4	0.6%
왕십리	4	0.6%
홍대입구	3	0.4%
디지털미디어시티	3	0.4%
신설동	3	0.4%
대곡	3	0.4%
고속터미널	3	0.4%
Other values (539)	635	94.8%

Most occurring characters

Value	Count	Frequency (%)
대	69	3.4%
산	56	2.7%
신	52	2.5%
구	51	2.5%
동	48	2.4%
천	41	2.0%
원	38	1.9%
정	36	1.8%
청	32	1.6%
)	29	1.4%
Other values (284)	1589	77.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1967	96.4%
Close Punctuation	29	1.4%
Open Punctuation	29	1.4%
Decimal Number	13	0.6%
Other Punctuation	2	0.1%
Space Separator	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
대	69	3.5%
산	56	2.8%
신	52	2.6%
구	51	2.6%
동	48	2.4%
천	41	2.1%
원	38	1.9%
정	36	1.8%
청	32	1.6%
지	29	1.5%
Other values (273)	1515	77.0%

Decimal Number

Value	Count	Frequency (%)
3	5	38.5%
4	3	23.1%
1	2	15.4%
2	1	7.7%
9	1	7.7%
5	1	7.7%

Other Punctuation

Value	Count	Frequency (%)
.	1	50.0%
,	1	50.0%

Close Punctuation

Value	Count	Frequency (%)
)	29	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	29	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1967	96.4%
Common	74	3.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
대	69	3.5%
산	56	2.8%
신	52	2.6%
구	51	2.6%
동	48	2.4%
천	41	2.1%
원	38	1.9%
정	36	1.8%
청	32	1.6%
지	29	1.5%
Other values (273)	1515	77.0%

Common

Value	Count	Frequency (%)
)	29	39.2%
(	29	39.2%
3	5	6.8%
4	3	4.1%
1	2	2.7%
2	1	1.4%
(space)	1	1.4%
.	1	1.4%
9	1	1.4%
,	1	1.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1967	96.4%
ASCII	74	3.6%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
대	69	3.5%
산	56	2.8%
신	52	2.6%
구	51	2.6%
동	48	2.4%
천	41	2.1%
원	38	1.9%
정	36	1.8%
청	32	1.6%
지	29	1.5%
Other values (273)	1515	77.0%

ASCII

Value	Count	Frequency (%)
)	29	39.2%
(	29	39.2%
3	5	6.8%
4	3	4.1%
1	2	2.7%
2	1	1.4%
(space)	1	1.4%
.	1	1.4%
9	1	1.4%
,	1	1.4%

호선이름
Categorical

HIGH CORRELATION

Distinct	17
Distinct (%)	2.5%
Missing	0
Missing (%)	0.0%
Memory size	5.4 KiB

1호선	102
수인분당선	63
경의중앙선	57
5호선	56
7호선	53
Other values (12)	338

Length

Max length	5
Median length	3
Mean length	3.4424514
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1호선
2nd row	1호선
3rd row	1호선
4th row	1호선
5th row	1호선

Common Values

Value	Count	Frequency (%)
1호선	102	15.2%
수인분당선	63	9.4%
경의중앙선	57	8.5%
5호선	56	8.4%
7호선	53	7.9%
2호선	51	7.6%
4호선	48	7.2%
3호선	44	6.6%
6호선	39	5.8%
9호선	38	5.7%
Other values (7)	118	17.6%

Length

Histogram of lengths of the category


Interactions

Scatter plot axes: SUBWAY_ID, STATN_ID

Correlations

Phik (φk)

	SUBWAY_ID	STATN_ID	호선이름
SUBWAY_ID	1.000	1.000	1.000
STATN_ID	1.000	1.000	1.000
호선이름	1.000	1.000	1.000

Auto

	SUBWAY_ID	STATN_ID	호선이름
SUBWAY_ID	1.000	0.996	0.991
STATN_ID	0.996	1.000	0.991
호선이름	0.991	0.991	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	SUBWAY_ID	STATN_ID	STATN_NM	호선이름
0	1001	1001000100	소요산	1호선
1	1001	1001000101	동두천	1호선
2	1001	1001000102	보산	1호선
3	1001	1001000103	동두천중앙	1호선
4	1001	1001000104	지행	1호선
5	1001	1001000105	덕정	1호선
6	1001	1001000106	덕계	1호선
7	1001	1001000107	양주	1호선
8	1001	1001000108	녹양	1호선
9	1001	1001000109	가능	1호선

Last rows

	SUBWAY_ID	STATN_ID	STATN_NM	호선이름
659	1093	1093004012	시흥대야	서해선
660	1093	1093004013	신천	서해선
661	1093	1093004014	신현	서해선
662	1093	1093004016	시흥시청	서해선
663	1093	1093004017	시흥능곡	서해선
664	1093	1093004018	달미	서해선
665	1093	1093004019	선부	서해선
666	1093	1093004020	초지	서해선
667	1093	1093004021	시우	서해선
668	1093	1093004022	원시	서해선
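
One possible reading of the High correlation alert is that, in the rows shown above, the leading four digits of STATN_ID repeat the SUBWAY_ID. The sketch below checks this for a handful of displayed rows only; whether the pattern holds for all 669 rows is an assumption this report does not confirm, and `line_prefix` is a hypothetical helper name.

```python
# (SUBWAY_ID, STATN_ID) pairs copied from the first and last rows shown above.
sample = [
    (1001, 1001000100),
    (1001, 1001000101),
    (1001, 1001000109),
    (1093, 1093004012),
    (1093, 1093004014),
    (1093, 1093004022),
]

def line_prefix(statn_id):
    """Drop the last six digits of a ten-digit station ID."""
    return statn_id // 1_000_000

# Rows where the prefix of STATN_ID differs from SUBWAY_ID.
mismatches = [(s, t) for s, t in sample if line_prefix(t) != s]
print("mismatches:", mismatches)
```

If the pattern does hold across the dataset, STATN_ID would determine SUBWAY_ID exactly, which would be consistent with the correlation values near 1 reported for these fields.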
