gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	1125
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	2
Duplicate rows (%)	0.2%
Total size in memory	44.1 KiB
Average record size in memory	40.1 B
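
The two derived figures in this table can be checked from the raw ones. The sketch below is only an arithmetic check on the numbers printed above, assuming 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 1125
duplicate_rows = 2
total_kib = 44.1

# Share of duplicate rows, in percent.
duplicate_pct = 100 * duplicate_rows / n_rows

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, both values agree with the table.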

Variable types

Categorical	3
Text	2

Dataset

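Description	Data for the urban and metropolitan railway stations on Seoul Metropolitan Subway Line 1 (수도권1호선), including the railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출구번호), major facilities by exit (출구별 주요시설명), and address.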
Author	국가철도공단
URL	https://www.data.go.kr/data/15073464/fileData.do

Alerts

`선명` has constant value "1호선"	Constant
Dataset has 2 (0.2%) duplicate rows	Duplicates
`철도운영기관명` is highly overall correlated with `출구번호`	High correlation
`출구번호` is highly overall correlated with `철도운영기관명`	High correlation

Reproduction

Analysis started	2023-12-12 17:30:23.785858
Analysis finished	2023-12-12 17:30:24.812720
Duration	1.03 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

철도운영기관명
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

코레일	648
서울교통공사	477

Length

Max length	6
Median length	3
Mean length	4.272
Min length	3
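
As a cross-check, the length statistics for this column can be recomputed from its two value counts. This is a minimal sketch that assumes length means the number of Unicode code points per value, which is what Python's `len` returns; the report itself does not say how it measures length.

```python
from statistics import mean, median

# Value counts copied from the 철도운영기관명 table.
value_counts = {"코레일": 648, "서울교통공사": 477}

# Expand to one length per row; each precomposed Hangul syllable counts as 1.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)
```

Under that assumption the sketch gives a maximum of 6, a median of 3, a mean of 4.272 and a minimum of 3, the same as the table above.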

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	코레일
2nd row	코레일
3rd row	코레일
4th row	코레일
5th row	코레일

Common Values

Value	Count	Frequency (%)
코레일	648	57.6%
서울교통공사	477	42.4%

Length

Histogram of lengths of the category

선명
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

1호선	1125

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1호선
2nd row	1호선
3rd row	1호선
4th row	1호선
5th row	1호선

Common Values

Value	Count	Frequency (%)
1호선	1125	100.0%

Length

Histogram of lengths of the category

역명
Text

Distinct	62
Distinct (%)	5.5%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Length

Max length	12
Median length	5
Mean length	2.7111111
Min length	2

Characters and Unicode

Total characters	3050
Distinct characters	88
Distinct categories	4
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
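
To illustrate the idea, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one station name from this column. It is not how the report was computed, and `category_counts` is a hypothetical helper name. The standard library does not expose the script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# "Lo" = Other Letter, "Nd" = Decimal Number,
# "Ps" / "Pe" = Open / Close Punctuation.
counts = category_counts("종로3가")
print(counts)
```

For 종로3가 this yields three Other Letter characters and one Decimal Number, which is why station names with digits contribute to the Decimal Number row below.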

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	소요산
2nd row	소요산
3rd row	소요산
4th row	동두천
5th row	동두천

Value	Count	Frequency (%)
시청	119	10.6%
서울역	59	5.2%
신설동	57	5.1%
신도림	51	4.5%
종로3가	51	4.5%
동묘앞	47	4.2%
종로5가	46	4.1%
제기동	36	3.2%
창동	33	2.9%
의정부	32	2.8%
Other values (52)	594	52.8%

Most occurring characters

Value	Count	Frequency (%)
동	249	8.2%
시	131	4.3%
청	131	4.3%
신	131	4.3%
종	122	4.0%
가	107	3.5%
로	105	3.4%
도	88	2.9%
부	79	2.6%
서	71	2.3%
Other values (78)	1836	60.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	2929	96.0%
Decimal Number	97	3.2%
Close Punctuation	12	0.4%
Open Punctuation	12	0.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	249	8.5%
시	131	4.5%
청	131	4.5%
신	131	4.5%
종	122	4.2%
가	107	3.7%
로	105	3.6%
도	88	3.0%
부	79	2.7%
서	71	2.4%
Other values (74)	1715	58.6%

Decimal Number

Value	Count	Frequency (%)
3	51	52.6%
5	46	47.4%

Close Punctuation

Value	Count	Frequency (%)
)	12	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	12	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	2929	96.0%
Common	121	4.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	249	8.5%
시	131	4.5%
청	131	4.5%
신	131	4.5%
종	122	4.2%
가	107	3.7%
로	105	3.6%
도	88	3.0%
부	79	2.7%
서	71	2.4%
Other values (74)	1715	58.6%

Common

Value	Count	Frequency (%)
3	51	42.1%
5	46	38.0%
)	12	9.9%
(	12	9.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	2929	96.0%
ASCII	121	4.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
동	249	8.5%
시	131	4.5%
청	131	4.5%
신	131	4.5%
종	122	4.2%
가	107	3.7%
로	105	3.6%
도	88	3.0%
부	79	2.7%
서	71	2.4%
Other values (74)	1715	58.6%

ASCII

Value	Count	Frequency (%)
3	51	42.1%
5	46	38.0%
)	12	9.9%
(	12	9.9%

출구번호
Categorical

HIGH CORRELATION

Distinct	18
Distinct (%)	1.6%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

1	293
2	251
3	170
6	77
4	69
Other values (13)	265

Length

Max length	3
Median length	1
Mean length	1.1004444
Min length	1

Unique

Unique	1
Unique (%)	0.1%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
1	293	26.0%
2	251	22.3%
3	170	15.1%
6	77	6.8%
4	69	6.1%
5	68	6.0%
8	36	3.2%
7	34	3.0%
10	33	2.9%
9	23	2.0%
Other values (8)	71	6.3%

Length

Histogram of lengths of the category

출구별 주요시설명
Text

Distinct	938
Distinct (%)	83.4%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Length

Max length	19
Median length	16
Mean length	6.2844444
Min length	2

Characters and Unicode

Total characters	7070
Distinct characters	382
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	793
Unique (%)	70.5%

Sample

1st row	소요산사거리
2nd row	소요소방파출소
3rd row	소요산유원지
4th row	소요파출소
5th row	동안치안센터

Value	Count	Frequency (%)
방면	10	0.8%
동대문	8	0.6%
국민건강보험공단	6	0.5%
신한은행	6	0.5%
우리은행	5	0.4%
서울특별시청	5	0.4%
우체국	5	0.4%
근로복지공단	4	0.3%
고등학교	4	0.3%
창덕궁	4	0.3%
Other values (976)	1178	95.4%

Most occurring characters

Value	Count	Frequency (%)
동	251	3.6%
학	221	3.1%
교	217	3.1%
소	146	2.1%
등	133	1.9%
대	132	1.9%
서	128	1.8%
사	118	1.7%
(space)	110	1.6%
국	108	1.5%
Other values (372)	5506	77.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	6683	94.5%
Decimal Number	147	2.1%
Space Separator	110	1.6%
Uppercase Letter	58	0.8%
Other Punctuation	30	0.4%
Open Punctuation	19	0.3%
Close Punctuation	19	0.3%
Dash Punctuation	2	< 0.1%
Math Symbol	1	< 0.1%
Other Symbol	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	251	3.8%
학	221	3.3%
교	217	3.2%
소	146	2.2%
등	133	2.0%
대	132	2.0%
서	128	1.9%
사	118	1.8%
국	108	1.6%
원	104	1.6%
Other values (336)	5125	76.7%

Uppercase Letter

Value	Count	Frequency (%)
K	8	13.8%
C	7	12.1%
S	6	10.3%
G	5	8.6%
T	5	8.6%
L	4	6.9%
V	4	6.9%
A	4	6.9%
I	3	5.2%
B	3	5.2%
Other values (7)	9	15.5%

Decimal Number

Value	Count	Frequency (%)
1	53	36.1%
2	40	27.2%
3	23	15.6%
4	11	7.5%
5	10	6.8%
9	3	2.0%
6	2	1.4%
0	2	1.4%
7	2	1.4%
8	1	0.7%

Other Punctuation

Value	Count	Frequency (%)
/	26	86.7%
·	3	10.0%
.	1	3.3%

Space Separator

Value	Count	Frequency (%)
(space)	110	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	19	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	19	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Math Symbol

Value	Count	Frequency (%)
~	1	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	6684	94.5%
Common	328	4.6%
Latin	58	0.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	251	3.8%
학	221	3.3%
교	217	3.2%
소	146	2.2%
등	133	2.0%
대	132	2.0%
서	128	1.9%
사	118	1.8%
국	108	1.6%
원	104	1.6%
Other values (337)	5126	76.7%

Common

Value	Count	Frequency (%)
(space)	110	33.5%
1	53	16.2%
2	40	12.2%
/	26	7.9%
3	23	7.0%
(	19	5.8%
)	19	5.8%
4	11	3.4%
5	10	3.0%
9	3	0.9%
Other values (8)	14	4.3%

Latin

Value	Count	Frequency (%)
K	8	13.8%
C	7	12.1%
S	6	10.3%
G	5	8.6%
T	5	8.6%
L	4	6.9%
V	4	6.9%
A	4	6.9%
I	3	5.2%
B	3	5.2%
Other values (7)	9	15.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	6683	94.5%
ASCII	383	5.4%
None	4	0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
동	251	3.8%
학	221	3.3%
교	217	3.2%
소	146	2.2%
등	133	2.0%
대	132	2.0%
서	128	1.9%
사	118	1.8%
국	108	1.6%
원	104	1.6%
Other values (336)	5125	76.7%

ASCII

Value	Count	Frequency (%)
(space)	110	28.7%
1	53	13.8%
2	40	10.4%
/	26	6.8%
3	23	6.0%
(	19	5.0%
)	19	5.0%
4	11	2.9%
5	10	2.6%
K	8	2.1%
Other values (24)	64	16.7%

None

Value	Count	Frequency (%)
·	3	75.0%
㈜	1	25.0%

Correlations

	철도운영기관명	역명	출구번호
철도운영기관명	1.000	1.000	0.635
역명	1.000	1.000	0.602
출구번호	0.635	0.602	1.000

	출구번호	철도운영기관명
출구번호	1.000	0.503
철도운영기관명	0.503	1.000

	철도운영기관명	출구번호
철도운영기관명	1.000	0.503
출구번호	0.503	1.000
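
The report does not state which coefficient each table shows. One common measure of association between two categorical columns is Cramér's V, so the sketch below implements the plain (uncorrected) version purely as an illustration. It is an assumption that anything like it underlies the numbers above, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    row = Counter(xs)             # marginal counts of x
    col = Counter(ys)             # marginal counts of y
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        # Undefined for a constant column; 0.0 is a convention chosen here.
        return 0.0
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Toy labels, not taken from the dataset.
perfect = cramers_v(["A", "A", "B", "B"], ["1", "1", "2", "2"])
independent = cramers_v(["A", "A", "B", "B"], ["1", "2", "1", "2"])
print(perfect, independent)
```

A value of 1 means one column fully determines the other, and 0 means no association in the contingency table. A constant column such as `선명` has no defined value, which is consistent with it being absent from the tables above.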

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명
0	코레일	1호선	소요산	1	소요산사거리
1	코레일	1호선	소요산	1	소요소방파출소
2	코레일	1호선	소요산	1	소요산유원지
3	코레일	1호선	동두천	1	소요파출소
4	코레일	1호선	동두천	1	동안치안센터
5	코레일	1호선	동두천	1	소요동사무소
6	코레일	1호선	동두천	2	동보초등학교
7	코레일	1호선	동두천	2	신창비바페밀리아파트
8	코레일	1호선	보산	1	보산초등학교
9	코레일	1호선	보산	1	보영여자고등학교

Last rows

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명
1115	코레일	1호선	동인천	2	우리은행
1116	코레일	1호선	동인천	3	축현파출소
1117	코레일	1호선	동인천	4	송현동
1118	코레일	1호선	동인천	4	화수동
1119	코레일	1호선	인천	1	중구청
1120	코레일	1호선	인천	1	연안부두
1121	코레일	1호선	인천	1	월미도
1122	코레일	1호선	인천	1	인천광역시종합관광안내소
1123	코레일	1호선	인천	1	자유공원
1124	코레일	1호선	인천	1	화교거리

Duplicate rows

Most frequently occurring

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명	# duplicates
0	서울교통공사	1호선	시청	4	서울글로벌센터	2
1	코레일	1호선	간석	2	인천남고등학교	2
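
A table like this can be reproduced by counting identical rows. The sketch below uses the two duplicated rows listed above plus one non-duplicated row from the sample. The variable names are illustrative, and whether the report's "Duplicate rows" figure counts distinct duplicated rows or surplus copies is not stated, so both are computed.

```python
from collections import Counter

# Each row is a tuple of the five columns, in report order.
rows = [
    ("서울교통공사", "1호선", "시청", "4", "서울글로벌센터"),
    ("서울교통공사", "1호선", "시청", "4", "서울글로벌센터"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "1호선", "인천", "1", "월미도"),
]

counts = Counter(rows)

# Rows that occur more than once, with their occurrence count.
duplicated = {row: n for row, n in counts.items() if n > 1}

# Copies beyond the first occurrence of each duplicated row.
surplus_copies = sum(n - 1 for n in duplicated.values())

for row, n in duplicated.items():
    print(row, n)
```

Here both counts come out to 2, so this data does not distinguish the two definitions.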
