gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	1979
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	3
Duplicate rows (%)	0.2%
Total size in memory	77.4 KiB
Average record size in memory	40.1 B

Variable types

Categorical	3
Text	2

Dataset

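Description	Data on the urban and metropolitan railway stations managed by Korail, including the railway operator name, line name, station name, exit number, major facilities at each exit, and address.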
Author	Korea National Railway (국가철도공단)
URL	https://www.data.go.kr/data/15073465/fileData.do

Alerts

`철도운영기관명` has constant value "코레일"	Constant
Dataset has 3 (0.2%) duplicate rows	Duplicates

Reproduction

Analysis started	2023-12-12 07:18:36.011236
Analysis finished	2023-12-12 07:18:36.724899
Duration	0.71 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
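
A run like the one recorded above could be repeated roughly as follows. This is a minimal sketch, not the configuration that was actually used: `build_report` is a hypothetical helper, and the CSV path, output path, and `encoding` default are assumptions, since this report does not state the source file's name or encoding. Reading every column as text (`dtype=str`) is also an assumption, chosen because the report gives string-length statistics for 출구번호.

```python
def build_report(csv_path, out_html, encoding="utf-8"):
    """Profile a CSV file and write the HTML report to out_html."""
    # Imported lazily so that defining the helper does not require the libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: read all columns as strings so 출구번호 is treated as text.
    df = pd.read_csv(csv_path, encoding=encoding, dtype=str)

    # The title is copied from the first line of this report.
    profile = ProfileReport(df, title="gimi9 Pandas Profiling")
    profile.to_file(out_html)
    return profile
```

If the file turns out to use a legacy Korean encoding, passing a different `encoding` value would be the first thing to try.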

철도운영기관명
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	15.6 KiB

코레일	1979

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	코레일
2nd row	코레일
3rd row	코레일
4th row	코레일
5th row	코레일

Common Values

Value	Count	Frequency (%)
코레일	1979	100.0%

Length

Histogram of lengths of the category

선명
Categorical

Distinct	7
Distinct (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	15.6 KiB

1호선	648
수인분당	482
경의중앙	315
4호선	214
경춘	154
Other values (2)	166

Length

Max length	4
Median length	3
Mean length	3.3016675
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1호선
2nd row	1호선
3rd row	1호선
4th row	1호선
5th row	1호선

Common Values

Value	Count	Frequency (%)
1호선	648	32.7%
수인분당	482	24.4%
경의중앙	315	15.9%
4호선	214	10.8%
경춘	154	7.8%
3호선	120	6.1%
경강	46	2.3%
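
The Frequency (%) column and the mean length reported for 선명 follow directly from the counts in the table above. The sketch below recomputes them; the variable names are illustrative, and measuring length in Unicode code points (Python's `len`) is an assumption about how the report measures it.

```python
# Counts copied from the Common Values table for 선명.
counts = {
    "1호선": 648,
    "수인분당": 482,
    "경의중앙": 315,
    "4호선": 214,
    "경춘": 154,
    "3호선": 120,
    "경강": 46,
}

total = sum(counts.values())  # should equal the number of observations

# Frequency (%) = count / total, rounded to one decimal place.
freq_pct = {value: round(100 * n / total, 1) for value, n in counts.items()}

# Mean length = count-weighted average of each value's length in code points.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(total)
print(freq_pct)
print(round(mean_length, 7))
```

The total matches the 1979 observations, and the weighted mean agrees with the reported 3.3016675.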

Length

Histogram of lengths of the category

역명
Text

Distinct	190
Distinct (%)	9.6%
Missing	0
Missing (%)	0.0%
Memory size	15.6 KiB

Length

Max length	14
Median length	2
Mean length	3.2647802
Min length	2

Characters and Unicode

Total characters	6461
Distinct characters	191
Distinct categories	4
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
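
As an illustration of the sentence above, the general category of each character can be looked up with Python's standard `unicodedata` module. This is a sketch only: `category_counts` and `CATEGORY_NAMES` are hypothetical names, the name mapping covers just a few categories, and the third sample string is taken from the facility column of this dataset rather than from 역명, to show punctuation and a space.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["소요산", "동두천", "한국농어촌공사(여주/ 이천지사)"]
result = category_counts(sample)
print(result)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables below.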

Unique

Unique	2
Unique (%)	0.1%

Sample

1st row	소요산
2nd row	소요산
3rd row	소요산
4th row	동두천
5th row	동두천

Value	Count	Frequency (%)
신도림	51	2.6%
연수	37	1.9%
창동	33	1.7%
의정부	32	1.6%
용문	30	1.5%
망포	28	1.4%
평내호평	28	1.4%
부평	28	1.4%
한티	26	1.3%
녹천	26	1.3%
Other values (180)	1660	83.9%

Most occurring characters

Value	Count	Frequency (%)
(	211	3.3%
)	211	3.3%
대	209	3.2%
천	182	2.8%
산	180	2.8%
원	164	2.5%
도	150	2.3%
정	143	2.2%
수	124	1.9%
평	120	1.9%
Other values (181)	4767	73.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	6034	93.4%
Open Punctuation	211	3.3%
Close Punctuation	211	3.3%
Other Punctuation	5	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
대	209	3.5%
천	182	3.0%
산	180	3.0%
원	164	2.7%
도	150	2.5%
정	143	2.4%
수	124	2.1%
평	120	2.0%
신	116	1.9%
부	116	1.9%
Other values (178)	4530	75.1%

Open Punctuation

Value	Count	Frequency (%)
(	211	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	211	100.0%

Other Punctuation

Value	Count	Frequency (%)
·	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	6034	93.4%
Common	427	6.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
대	209	3.5%
천	182	3.0%
산	180	3.0%
원	164	2.7%
도	150	2.5%
정	143	2.4%
수	124	2.1%
평	120	2.0%
신	116	1.9%
부	116	1.9%
Other values (178)	4530	75.1%

Common

Value	Count	Frequency (%)
(	211	49.4%
)	211	49.4%
·	5	1.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	6034	93.4%
ASCII	422	6.5%
None	5	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(	211	50.0%
)	211	50.0%

Hangul

Value	Count	Frequency (%)
대	209	3.5%
천	182	3.0%
산	180	3.0%
원	164	2.7%
도	150	2.5%
정	143	2.4%
수	124	2.1%
평	120	2.0%
신	116	1.9%
부	116	1.9%
Other values (178)	4530	75.1%

None

Value	Count	Frequency (%)
·	5	100.0%

출구번호
Categorical

Distinct	13
Distinct (%)	0.7%
Missing	0
Missing (%)	0.0%
Memory size	15.6 KiB

1	709
2	512
3	267
4	145
5	101
Other values (8)	245

Length

Max length	3
Median length	1
Mean length	1.0111167
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
1	709	35.8%
2	512	25.9%
3	267	13.5%
4	145	7.3%
5	101	5.1%
6	98	5.0%
7	64	3.2%
8	57	2.9%
9	10	0.5%
10	5	0.3%
Other values (3)	11	0.6%

Length

Histogram of lengths of the category

출구별 주요시설명
Text

Distinct	1711
Distinct (%)	86.5%
Missing	0
Missing (%)	0.0%
Memory size	15.6 KiB

Length

Max length	19
Median length	17
Mean length	6.3036887
Min length	2

Characters and Unicode

Total characters	12475
Distinct characters	446
Distinct categories	9
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1489
Unique (%)	75.2%
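
Distinct and Unique measure different things. Distinct appears to count how many different values occur, while Unique counts values that occur exactly once. That reading is consistent with 철도운영기관명 above, which has 1 distinct and 0 unique values. A minimal sketch of the two counts, using a hypothetical helper `distinct_and_unique` and toy lists built from values that appear in this report:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# A constant column: one distinct value, none of them unique.
constant = ["코레일"] * 5
print(distinct_and_unique(constant))  # (1, 0)

# A mixed column: three distinct values, two of which occur once.
mixed = ["주차장", "주차장", "이마트", "여주경찰서"]
print(distinct_and_unique(mixed))  # (3, 2)
```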

Sample

1st row	소요산사거리
2nd row	소요소방파출소
3rd row	소요산유원지
4th row	동안치안센터
5th row	소요동사무소

Value	Count	Frequency (%)
고등학교	12	0.6%
고교	8	0.4%
주민센터	7	0.3%
주차장	6	0.3%
방향	6	0.3%
d주차장	5	0.2%
이마트	5	0.2%
중	5	0.2%
c주차장	5	0.2%
태장고등학교	5	0.2%
Other values (1766)	2060	97.0%

Most occurring characters

Value	Count	Frequency (%)
교	610	4.9%
학	586	4.7%
등	376	3.0%
동	342	2.7%
초	256	2.1%
소	249	2.0%
원	246	2.0%
중	221	1.8%
사	207	1.7%
고	184	1.5%
Other values (436)	9198	73.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	11845	94.9%
Decimal Number	285	2.3%
Space Separator	145	1.2%
Uppercase Letter	109	0.9%
Other Punctuation	63	0.5%
Open Punctuation	11	0.1%
Close Punctuation	11	0.1%
Dash Punctuation	3	< 0.1%
Math Symbol	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
교	610	5.1%
학	586	4.9%
등	376	3.2%
동	342	2.9%
초	256	2.2%
소	249	2.1%
원	246	2.1%
중	221	1.9%
사	207	1.7%
고	184	1.6%
Other values (399)	8568	72.3%

Uppercase Letter

Value	Count	Frequency (%)
C	16	14.7%
A	11	10.1%
T	10	9.2%
K	9	8.3%
S	8	7.3%
M	6	5.5%
G	6	5.5%
I	6	5.5%
B	6	5.5%
L	5	4.6%
Other values (10)	26	23.9%

Decimal Number

Value	Count	Frequency (%)
1	120	42.1%
2	74	26.0%
3	25	8.8%
4	19	6.7%
9	13	4.6%
5	13	4.6%
7	7	2.5%
8	5	1.8%
6	5	1.8%
0	4	1.4%

Other Punctuation

Value	Count	Frequency (%)
/	62	98.4%
.	1	1.6%

Space Separator

Value	Count	Frequency (%)
(space)	145	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	11	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	11	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Math Symbol

Value	Count	Frequency (%)
~	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	11845	94.9%
Common	521	4.2%
Latin	109	0.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
교	610	5.1%
학	586	4.9%
등	376	3.2%
동	342	2.9%
초	256	2.2%
소	249	2.1%
원	246	2.1%
중	221	1.9%
사	207	1.7%
고	184	1.6%
Other values (399)	8568	72.3%

Latin

Value	Count	Frequency (%)
C	16	14.7%
A	11	10.1%
T	10	9.2%
K	9	8.3%
S	8	7.3%
M	6	5.5%
G	6	5.5%
I	6	5.5%
B	6	5.5%
L	5	4.6%
Other values (10)	26	23.9%

Common

Value	Count	Frequency (%)
(space)	145	27.8%
1	120	23.0%
2	74	14.2%
/	62	11.9%
3	25	4.8%
4	19	3.6%
9	13	2.5%
5	13	2.5%
(	11	2.1%
)	11	2.1%
Other values (7)	28	5.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	11845	94.9%
ASCII	630	5.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
교	610	5.1%
학	586	4.9%
등	376	3.2%
동	342	2.9%
초	256	2.2%
소	249	2.1%
원	246	2.1%
중	221	1.9%
사	207	1.7%
고	184	1.6%
Other values (399)	8568	72.3%

ASCII

Value	Count	Frequency (%)
(space)	145	23.0%
1	120	19.0%
2	74	11.7%
/	62	9.8%
3	25	4.0%
4	19	3.0%
C	16	2.5%
9	13	2.1%
5	13	2.1%
A	11	1.7%
Other values (27)	132	21.0%

Correlations

	선명	출구번호
선명	1.000	0.412
출구번호	0.412	1.000

	선명	출구번호
선명	1.000	0.205
출구번호	0.205	1.000

	선명	출구번호
선명	1.000	0.205
출구번호	0.205	1.000
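
The tables above do not say which association measure produced each value. One common measure for two categorical columns is Cramér's V, sketched below with the standard library only. Whether any of the tables above uses exactly this measure (or a bias-corrected variant) is an assumption, and `cramers_v` is a hypothetical helper name. The input lists are toy data, not rows from this dataset.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))

perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

The value ranges from 0 (no association) to 1 (one column fully determines the other), which matches the 0 to 1 scale of the off-diagonal entries above.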

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
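
A nullity matrix is simply a grid of present/missing flags, one row per record and one column per variable. The sketch below builds one from a list of dictionaries. The helper names are hypothetical, treating an empty string as missing is an assumption, and the second row has its facility name deliberately blanked for illustration; the dataset itself has no missing cells.

```python
COLUMNS = ["철도운영기관명", "선명", "역명", "출구번호", "출구별 주요시설명"]

def nullity_matrix(rows, columns):
    """One list of booleans per row: True where the cell is present."""
    return [[row.get(col) not in (None, "") for col in columns] for row in rows]

def missing_per_column(rows, columns):
    """Number of missing cells in each column."""
    matrix = nullity_matrix(rows, columns)
    return {
        col: sum(1 for flags in matrix if not flags[i])
        for i, col in enumerate(columns)
    }

rows = [
    {"철도운영기관명": "코레일", "선명": "1호선", "역명": "소요산",
     "출구번호": "1", "출구별 주요시설명": "소요산사거리"},
    # Hypothetical incomplete record, not present in the real data.
    {"철도운영기관명": "코레일", "선명": "1호선", "역명": "소요산",
     "출구번호": "1", "출구별 주요시설명": None},
]

matrix = nullity_matrix(rows, COLUMNS)
missing = missing_per_column(rows, COLUMNS)
print(matrix)
print(missing)
```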

Sample

First rows

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명
0	코레일	1호선	소요산	1	소요산사거리
1	코레일	1호선	소요산	1	소요소방파출소
2	코레일	1호선	소요산	1	소요산유원지
3	코레일	1호선	동두천	1	동안치안센터
4	코레일	1호선	동두천	1	소요동사무소
5	코레일	1호선	동두천	2	동보초등학교
6	코레일	1호선	동두천	2	신창비바페밀리아파트
7	코레일	1호선	동두천	1	소요파출소
8	코레일	1호선	보산	1	보산초등학교
9	코레일	1호선	보산	1	보영여자고등학교

Last rows

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명
1969	코레일	경강	부발	1	이천시립효양도서관
1970	코레일	경강	세종대왕릉	1	세종대왕릉
1971	코레일	경강	세종대왕릉	1	효종대왕릉
1972	코레일	경강	세종대왕릉	1	능서면사무소
1973	코레일	경강	세종대왕릉	1	한국농어촌공사(여주/ 이천지사)
1974	코레일	경강	세종대왕릉	1	능서초등학교
1975	코레일	경강	여주	1	여주경찰서
1976	코레일	경강	여주	1	여주교육지원청
1977	코레일	경강	여주	2	주차장
1978	코레일	경강	여주	1	수원지방법원여주지원

Duplicate rows

Most frequently occurring

	철도운영기관명	선명	역명	출구번호	출구별 주요시설명	# duplicates
0	코레일	1호선	간석	2	인천남고등학교	2
1	코레일	경의중앙	운길산	1	진중리생태마을	2
2	코레일	경의중앙	운길산	2	진중리생태마을	2
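
The "# duplicates" column can be reproduced by counting identical rows. The sketch below does this with the standard library; `duplicate_groups` is a hypothetical helper, and the three input rows are a toy subset built from rows shown in this report (the first is repeated on purpose). The last lines check the percentage quoted in the Alerts section.

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "1호선", "소요산", "1", "소요산사거리"),
]
groups = duplicate_groups(rows)
print(groups)

# 3 duplicate rows out of 1979 observations, rounded to one decimal place.
duplicate_pct = round(100 * 3 / 1979, 1)
print(duplicate_pct)
```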
