Overview

Dataset statistics

Number of variables: 5
Number of observations: 1125
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2
Duplicate rows (%): 0.2%
Total size in memory: 44.1 KiB
Average record size in memory: 40.1 B
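
The derived figures in this table follow from the raw counts. As a quick check, a minimal Python sketch (variable names are illustrative, and it assumes 1 KiB = 1024 bytes) recomputes the duplicate percentage and the average record size:

```python
# Recompute the derived dataset statistics from the raw counts in the report.
n_rows = 1125        # Number of observations
n_duplicates = 2     # Duplicate rows
total_kib = 44.1     # Total size in memory, in KiB (assumed 1 KiB = 1024 bytes)

duplicate_pct = n_duplicates / n_rows * 100
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the table.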

Variable types

Categorical: 3
Text: 2

Dataset

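Description: Data on the urban and metropolitan railway stations on Seoul Metropolitan Subway Line 1 (수도권1호선), including the railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출구번호), major facilities by exit (출구별 주요시설명), and address.
Author: Korea National Railway (국가철도공단)
URL: https://www.data.go.kr/data/15073464/fileData.do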

Alerts

선명 has constant value "1호선" (Constant)
Dataset has 2 (0.2%) duplicate rows (Duplicates)
철도운영기관명 is highly overall correlated with 출구번호 (High correlation)
출구번호 is highly overall correlated with 철도운영기관명 (High correlation)

Reproduction

Analysis started: 2023-12-12 17:30:23.785858
Analysis finished: 2023-12-12 17:30:24.812720
Duration: 1.03 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB
코레일  648
서울교통공사  477

Length

Max length: 6
Median length: 3
Mean length: 4.272
Min length: 3
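
The length statistics can be re-derived from the two value counts listed for this variable. The sketch below assumes length is counted in Unicode code points, which is how Python's `len()` behaves for these strings; the variable names are illustrative.

```python
from statistics import mean, median

# Value counts for 철도운영기관명, as listed in the report.
counts = {"코레일": 648, "서울교통공사": 477}

# One length entry per observation (1125 in total).
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

mean_length = mean(lengths)
median_length = median(lengths)

print(max(lengths), median_length, round(mean_length, 3), min(lengths))
```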

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코레일
2nd row: 코레일
3rd row: 코레일
4th row: 코레일
5th row: 코레일

Common Values

Value  Count  Frequency (%)
코레일  648  57.6%
서울교통공사  477  42.4%

Length

Histogram of lengths of the category

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB
1호선  1125

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value  Count  Frequency (%)
1호선  1125  100.0%

Length

Histogram of lengths of the category

역명
Text

Distinct: 62
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Length

Max length: 12
Median length: 5
Mean length: 2.7111111
Min length: 2

Characters and Unicode

Total characters: 3050
Distinct characters: 88
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
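
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to tally general categories for a few station names taken from this report. The helper `category_counts` and the name mapping are illustrative, not part of ydata-profiling, and the report's script and block breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear for this variable.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not mapped above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Three station names that appear in the report's value table.
print(category_counts(["시청", "종로3가", "종로5가"]))
```

Hangul syllables fall under Other Letter and the digits under Decimal Number, which matches the category names used in the tables for this variable.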

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 소요산
2nd row: 소요산
3rd row: 소요산
4th row: 동두천
5th row: 동두천

Value  Count  Frequency (%)
시청  119  10.6%
서울역  59  5.2%
신설동  57  5.1%
신도림  51  4.5%
종로3가  51  4.5%
동묘앞  47  4.2%
종로5가  46  4.1%
제기동  36  3.2%
창동  33  2.9%
의정부  32  2.8%
Other values (52)  594  52.8%

Most occurring characters

ValueCountFrequency (%)
249
 
8.2%
131
 
4.3%
131
 
4.3%
131
 
4.3%
122
 
4.0%
107
 
3.5%
105
 
3.4%
88
 
2.9%
79
 
2.6%
71
 
2.3%
Other values (78) 1836
60.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  2929  96.0%
Decimal Number  97  3.2%
Close Punctuation  12  0.4%
Open Punctuation  12  0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
249
 
8.5%
131
 
4.5%
131
 
4.5%
131
 
4.5%
122
 
4.2%
107
 
3.7%
105
 
3.6%
88
 
3.0%
79
 
2.7%
71
 
2.4%
Other values (74) 1715
58.6%
Decimal Number
Value  Count  Frequency (%)
3  51  52.6%
5  46  47.4%
Close Punctuation
Value  Count  Frequency (%)
)  12  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  12  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  2929  96.0%
Common  121  4.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
249
 
8.5%
131
 
4.5%
131
 
4.5%
131
 
4.5%
122
 
4.2%
107
 
3.7%
105
 
3.6%
88
 
3.0%
79
 
2.7%
71
 
2.4%
Other values (74) 1715
58.6%
Common
Value  Count  Frequency (%)
3  51  42.1%
5  46  38.0%
)  12  9.9%
(  12  9.9%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  2929  96.0%
ASCII  121  4.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
249
 
8.5%
131
 
4.5%
131
 
4.5%
131
 
4.5%
122
 
4.2%
107
 
3.7%
105
 
3.6%
88
 
3.0%
79
 
2.7%
71
 
2.4%
Other values (74) 1715
58.6%
ASCII
Value  Count  Frequency (%)
3  51  42.1%
5  46  38.0%
)  12  9.9%
(  12  9.9%

출구번호
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB
1  293
2  251
3  170
6  77
4  69
Other values (13)  265

Length

Max length: 3
Median length: 1
Mean length: 1.1004444
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1  293  26.0%
2  251  22.3%
3  170  15.1%
6  77  6.8%
4  69  6.1%
5  68  6.0%
8  36  3.2%
7  34  3.0%
10  33  2.9%
9  23  2.0%
Other values (8)  71  6.3%

Length

Histogram of lengths of the category

출구별 주요시설명
Text

Distinct: 938
Distinct (%): 83.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Length

Max length: 19
Median length: 16
Mean length: 6.2844444
Min length: 2

Characters and Unicode

Total characters: 7070
Distinct characters: 382
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 793
Unique (%): 70.5%

Sample

1st row: 소요산사거리
2nd row: 소요소방파출소
3rd row: 소요산유원지
4th row: 소요파출소
5th row: 동안치안센터

Value  Count  Frequency (%)
방면  10  0.8%
동대문  8  0.6%
국민건강보험공단  6  0.5%
신한은행  6  0.5%
우리은행  5  0.4%
서울특별시청  5  0.4%
우체국  5  0.4%
근로복지공단  4  0.3%
고등학교  4  0.3%
창덕궁  4  0.3%
Other values (976)  1178  95.4%

Most occurring characters

ValueCountFrequency (%)
251
 
3.6%
221
 
3.1%
217
 
3.1%
146
 
2.1%
133
 
1.9%
132
 
1.9%
128
 
1.8%
118
 
1.7%
110
 
1.6%
108
 
1.5%
Other values (372) 5506
77.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  6683  94.5%
Decimal Number  147  2.1%
Space Separator  110  1.6%
Uppercase Letter  58  0.8%
Other Punctuation  30  0.4%
Open Punctuation  19  0.3%
Close Punctuation  19  0.3%
Dash Punctuation  2  < 0.1%
Math Symbol  1  < 0.1%
Other Symbol  1  < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
251
 
3.8%
221
 
3.3%
217
 
3.2%
146
 
2.2%
133
 
2.0%
132
 
2.0%
128
 
1.9%
118
 
1.8%
108
 
1.6%
104
 
1.6%
Other values (336) 5125
76.7%
Uppercase Letter
Value  Count  Frequency (%)
K  8  13.8%
C  7  12.1%
S  6  10.3%
G  5  8.6%
T  5  8.6%
L  4  6.9%
V  4  6.9%
A  4  6.9%
I  3  5.2%
B  3  5.2%
Other values (7)  9  15.5%
Decimal Number
Value  Count  Frequency (%)
1  53  36.1%
2  40  27.2%
3  23  15.6%
4  11  7.5%
5  10  6.8%
9  3  2.0%
6  2  1.4%
0  2  1.4%
7  2  1.4%
8  1  0.7%
Other Punctuation
Value  Count  Frequency (%)
/  26  86.7%
·  3  10.0%
.  1  3.3%
Space Separator
Value  Count  Frequency (%)
(space)  110  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  19  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  19  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  2  100.0%
Math Symbol
Value  Count  Frequency (%)
~  1  100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  6684  94.5%
Common  328  4.6%
Latin  58  0.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
251
 
3.8%
221
 
3.3%
217
 
3.2%
146
 
2.2%
133
 
2.0%
132
 
2.0%
128
 
1.9%
118
 
1.8%
108
 
1.6%
104
 
1.6%
Other values (337) 5126
76.7%
Common
Value  Count  Frequency (%)
(space)  110  33.5%
1  53  16.2%
2  40  12.2%
/  26  7.9%
3  23  7.0%
(  19  5.8%
)  19  5.8%
4  11  3.4%
5  10  3.0%
9  3  0.9%
Other values (8)  14  4.3%
Latin
Value  Count  Frequency (%)
K  8  13.8%
C  7  12.1%
S  6  10.3%
G  5  8.6%
T  5  8.6%
L  4  6.9%
V  4  6.9%
A  4  6.9%
I  3  5.2%
B  3  5.2%
Other values (7)  9  15.5%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  6683  94.5%
ASCII  383  5.4%
None  4  0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
251
 
3.8%
221
 
3.3%
217
 
3.2%
146
 
2.2%
133
 
2.0%
132
 
2.0%
128
 
1.9%
118
 
1.8%
108
 
1.6%
104
 
1.6%
Other values (336) 5125
76.7%
ASCII
Value  Count  Frequency (%)
(space)  110  28.7%
1  53  13.8%
2  40  10.4%
/  26  6.8%
3  23  6.0%
(  19  5.0%
)  19  5.0%
4  11  2.9%
5  10  2.6%
K  8  2.1%
Other values (24)  64  16.7%
None
ValueCountFrequency (%)
· 3
75.0%
1
 
25.0%

Correlations

  철도운영기관명  역명  출구번호
철도운영기관명  1.000  1.000  0.635
역명  1.000  1.000  0.602
출구번호  0.635  0.602  1.000

  출구번호  철도운영기관명
출구번호  1.000  0.503
철도운영기관명  0.503  1.000

  철도운영기관명  출구번호
철도운영기관명  1.000  0.503
출구번호  0.503  1.000
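
The matrices above are not labelled with the measure used. For pairs of categorical variables, ydata-profiling can report Cramér's V, so the sketch below assumes that measure and shows its plain, uncorrected form on toy data. The function name `cramers_v` and the sample values are illustrative, and the result is not expected to reproduce the coefficients above, since the library may apply a bias correction or a different measure.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With a single category on either side the statistic is undefined; return 0.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data: each operator maps to exactly one exit number, so V is 1.
operators = ["코레일", "코레일", "서울교통공사", "서울교통공사"]
exits = ["1", "1", "2", "2"]
print(cramers_v(operators, exits))
```

A value of 0 means the two variables are independent in the sample, and 1 means one variable fully determines the other.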

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  철도운영기관명  선명  역명  출구번호  출구별 주요시설명
0  코레일  1호선  소요산  1  소요산사거리
1  코레일  1호선  소요산  1  소요소방파출소
2  코레일  1호선  소요산  1  소요산유원지
3  코레일  1호선  동두천  1  소요파출소
4  코레일  1호선  동두천  1  동안치안센터
5  코레일  1호선  동두천  1  소요동사무소
6  코레일  1호선  동두천  2  동보초등학교
7  코레일  1호선  동두천  2  신창비바페밀리아파트
8  코레일  1호선  보산  1  보산초등학교
9  코레일  1호선  보산  1  보영여자고등학교

Index  철도운영기관명  선명  역명  출구번호  출구별 주요시설명
1115  코레일  1호선  동인천  2  우리은행
1116  코레일  1호선  동인천  3  축현파출소
1117  코레일  1호선  동인천  4  송현동
1118  코레일  1호선  동인천  4  화수동
1119  코레일  1호선  인천  1  중구청
1120  코레일  1호선  인천  1  연안부두
1121  코레일  1호선  인천  1  월미도
1122  코레일  1호선  인천  1  인천광역시종합관광안내소
1123  코레일  1호선  인천  1  자유공원
1124  코레일  1호선  인천  1  화교거리

Duplicate rows

Most frequently occurring

Index  철도운영기관명  선명  역명  출구번호  출구별 주요시설명  # duplicates
0  서울교통공사  1호선  시청  4  서울글로벌센터  2
1  코레일  1호선  간석  2  인천남고등학교  2
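
Duplicate rows like the two above can be found by counting identical records. A minimal sketch, using a handful of rows copied from this report as a stand-in for the full dataset (the variable names are illustrative):

```python
from collections import Counter

# Columns: 철도운영기관명, 선명, 역명, 출구번호, 출구별 주요시설명
rows = [
    ("서울교통공사", "1호선", "시청", "4", "서울글로벌센터"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("서울교통공사", "1호선", "시청", "4", "서울글로벌센터"),
    ("코레일", "1호선", "인천", "1", "월미도"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
]

# Count each full record, then keep only those seen more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Because the comparison is on the whole tuple, two rows only count as duplicates when every column matches.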