Overview

Dataset statistics

Number of variables: 5
Number of observations: 1979
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 3
Duplicate rows (%): 0.2%
Total size in memory: 77.4 KiB
Average record size in memory: 40.1 B
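
The percentage fields in this report follow directly from the counts. As a minimal sketch (the helper name `pct` is hypothetical, and one-decimal rounding is an assumption based on the values shown), they can be re-derived like this:

```python
# Re-derive the report's percentage fields from its counts.
# Assumption: the report shows percentages rounded to one decimal place.

N_ROWS = 1979  # "Number of observations" in the report


def pct(count: int, total: int = N_ROWS) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


print(pct(3))     # duplicate rows
print(pct(7))     # distinct values of 선명
print(pct(190))   # distinct values of 역명
print(pct(1711))  # distinct values of 출구별 주요시설명
```

The same helper applies to any count in the report whose denominator is the number of rows.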

Variable types

Categorical: 3
Text: 2

Dataset

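Description: Data on the urban and metropolitan railway stations managed by Korail (코레일), including the railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출구번호), major facilities by exit (출구별 주요시설명), address, and other fields.
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15073465/fileData.do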

Alerts

철도운영기관명 has constant value "코레일" (Constant)
Dataset has 3 (0.2%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-12 07:18:36.011236
Analysis finished: 2023-12-12 07:18:36.724899
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
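
A report like this one could be regenerated roughly as follows. This is a sketch under stated assumptions, not the exact procedure used here: the function name `build_report`, the file paths, the report title, and the `cp949` encoding are all hypothetical placeholders, and reading every column as a string is a design choice rather than something the report confirms.

```python
# Sketch only: regenerate a profiling report for a CSV export of this dataset.
# Assumptions: paths, title and the cp949 encoding are placeholders.


def build_report(csv_path: str, out_path: str = "report.html", encoding: str = "cp949") -> None:
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so this sketch can be loaded without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read every column as text so exit numbers are kept as labels (an assumption).
    df = pd.read_csv(csv_path, encoding=encoding, dtype=str)
    ProfileReport(df, title="Korail station exits").to_file(out_path)


# Example call with a hypothetical file name:
# build_report("korail_station_exits.csv")
```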

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

코레일: 1979

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코레일
2nd row: 코레일
3rd row: 코레일
4th row: 코레일
5th row: 코레일

Common Values

Value   Count  Frequency (%)
코레일  1979   100.0%

Length

[Figure: Histogram of lengths of the category]


선명
Categorical

Distinct: 7
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

1호선: 648
수인분당: 482
경의중앙: 315
4호선: 214
경춘: 154
Other values (2): 166

Length

Max length: 4
Median length: 3
Mean length: 3.3016675
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value     Count  Frequency (%)
1호선     648    32.7%
수인분당  482    24.4%
경의중앙  315    15.9%
4호선     214    10.8%
경춘      154    7.8%
3호선     120    6.1%
경강      46     2.3%
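
The length statistics reported for 선명 can be cross-checked against these value counts. The sketch below assumes that length is measured in characters (code points); the variable names are illustrative.

```python
from statistics import median

# Value counts for 선명, copied from the Common Values table.
counts = {
    "1호선": 648,
    "수인분당": 482,
    "경의중앙": 315,
    "4호선": 214,
    "경춘": 154,
    "3호선": 120,
    "경강": 46,
}

# Expand to one length per row, measuring length in characters.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_rows = len(lengths)
mean_length = sum(lengths) / total_rows
median_length = median(lengths)

print(total_rows, min(lengths), max(lengths), median_length, round(mean_length, 7))
```

The row total, minimum, maximum, median, and mean computed this way agree with the values listed for this variable (1979 rows, 2, 4, 3, and 3.3016675).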

Length

[Figure: Histogram of lengths of the category]


역명
Text

Distinct: 190
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Length

Max length: 14
Median length: 2
Mean length: 3.2647802
Min length: 2

Characters and Unicode

Total characters: 6461
Distinct characters: 191
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
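
As a rough illustration of that idea, Python's standard `unicodedata` module exposes the general category of each character. The sketch below counts categories for one facility name taken from the sample rows of this report; it is not the profiler's own implementation, and the standard module covers categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter

# One facility name taken from the sample rows of this report.
text = "한국농어촌공사(여주/ 이천지사)"

# unicodedata.category returns the two-letter general category of a character,
# e.g. "Lo" (Other Letter), "Ps" (Open Punctuation), "Zs" (Space Separator).
category_counts = Counter(unicodedata.category(ch) for ch in text)

for category, count in category_counts.most_common():
    print(category, count)
```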

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 소요산
2nd row: 소요산
3rd row: 소요산
4th row: 동두천
5th row: 동두천

Value               Count  Frequency (%)
신도림              51     2.6%
연수                37     1.9%
창동                33     1.7%
의정부              32     1.6%
용문                30     1.5%
망포                28     1.4%
평내호평            28     1.4%
부평                28     1.4%
한티                26     1.3%
녹천                26     1.3%
Other values (180)  1660   83.9%

Most occurring characters

ValueCountFrequency (%)
( 211
 
3.3%
) 211
 
3.3%
209
 
3.2%
182
 
2.8%
180
 
2.8%
164
 
2.5%
150
 
2.3%
143
 
2.2%
124
 
1.9%
120
 
1.9%
Other values (181) 4767
73.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       6034   93.4%
Open Punctuation   211    3.3%
Close Punctuation  211    3.3%
Other Punctuation  5      0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
209
 
3.5%
182
 
3.0%
180
 
3.0%
164
 
2.7%
150
 
2.5%
143
 
2.4%
124
 
2.1%
120
 
2.0%
116
 
1.9%
116
 
1.9%
Other values (178) 4530
75.1%
Open Punctuation
ValueCountFrequency (%)
( 211
100.0%
Close Punctuation
ValueCountFrequency (%)
) 211
100.0%
Other Punctuation
ValueCountFrequency (%)
· 5
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  6034   93.4%
Common  427    6.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
209
 
3.5%
182
 
3.0%
180
 
3.0%
164
 
2.7%
150
 
2.5%
143
 
2.4%
124
 
2.1%
120
 
2.0%
116
 
1.9%
116
 
1.9%
Other values (178) 4530
75.1%
Common
ValueCountFrequency (%)
( 211
49.4%
) 211
49.4%
· 5
 
1.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  6034   93.4%
ASCII   422    6.5%
None    5      0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
( 211
50.0%
) 211
50.0%
Hangul
ValueCountFrequency (%)
209
 
3.5%
182
 
3.0%
180
 
3.0%
164
 
2.7%
150
 
2.5%
143
 
2.4%
124
 
2.1%
120
 
2.0%
116
 
1.9%
116
 
1.9%
Other values (178) 4530
75.1%
None
ValueCountFrequency (%)
· 5
100.0%

출구번호
Categorical

Distinct: 13
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

1: 709
2: 512
3: 267
4: 145
5: 101
Other values (8): 245

Length

Max length: 3
Median length: 1
Mean length: 1.0111167
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value             Count  Frequency (%)
1                 709    35.8%
2                 512    25.9%
3                 267    13.5%
4                 145    7.3%
5                 101    5.1%
6                 98     5.0%
7                 64     3.2%
8                 57     2.9%
9                 10     0.5%
10                5      0.3%
Other values (3)  11     0.6%

Length

[Figure: Histogram of lengths of the category]

출구별 주요시설명
Text

Distinct: 1711
Distinct (%): 86.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Length

Max length: 19
Median length: 17
Mean length: 6.3036887
Min length: 2

Characters and Unicode

Total characters: 12475
Distinct characters: 446
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1489
Unique (%): 75.2%

Sample

1st row: 소요산사거리
2nd row: 소요소방파출소
3rd row: 소요산유원지
4th row: 동안치안센터
5th row: 소요동사무소

Value                Count  Frequency (%)
고등학교             12     0.6%
고교                 8      0.4%
주민센터             7      0.3%
주차장               6      0.3%
방향                 6      0.3%
d주차장              5      0.2%
이마트               5      0.2%
                     5      0.2%
c주차장              5      0.2%
태장고등학교         5      0.2%
Other values (1766)  2060   97.0%

Most occurring characters

ValueCountFrequency (%)
610
 
4.9%
586
 
4.7%
376
 
3.0%
342
 
2.7%
256
 
2.1%
249
 
2.0%
246
 
2.0%
221
 
1.8%
207
 
1.7%
184
 
1.5%
Other values (436) 9198
73.7%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       11845  94.9%
Decimal Number     285    2.3%
Space Separator    145    1.2%
Uppercase Letter   109    0.9%
Other Punctuation  63     0.5%
Open Punctuation   11     0.1%
Close Punctuation  11     0.1%
Dash Punctuation   3      < 0.1%
Math Symbol        3      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
610
 
5.1%
586
 
4.9%
376
 
3.2%
342
 
2.9%
256
 
2.2%
249
 
2.1%
246
 
2.1%
221
 
1.9%
207
 
1.7%
184
 
1.6%
Other values (399) 8568
72.3%
Uppercase Letter
ValueCountFrequency (%)
C 16
14.7%
A 11
10.1%
T 10
 
9.2%
K 9
 
8.3%
S 8
 
7.3%
M 6
 
5.5%
G 6
 
5.5%
I 6
 
5.5%
B 6
 
5.5%
L 5
 
4.6%
Other values (10) 26
23.9%
Decimal Number
ValueCountFrequency (%)
1 120
42.1%
2 74
26.0%
3 25
 
8.8%
4 19
 
6.7%
9 13
 
4.6%
5 13
 
4.6%
7 7
 
2.5%
8 5
 
1.8%
6 5
 
1.8%
0 4
 
1.4%
Other Punctuation
ValueCountFrequency (%)
/ 62
98.4%
. 1
 
1.6%
Space Separator
ValueCountFrequency (%)
145
100.0%
Open Punctuation
ValueCountFrequency (%)
( 11
100.0%
Close Punctuation
ValueCountFrequency (%)
) 11
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Math Symbol
ValueCountFrequency (%)
~ 3
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  11845  94.9%
Common  521    4.2%
Latin   109    0.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
610
 
5.1%
586
 
4.9%
376
 
3.2%
342
 
2.9%
256
 
2.2%
249
 
2.1%
246
 
2.1%
221
 
1.9%
207
 
1.7%
184
 
1.6%
Other values (399) 8568
72.3%
Latin
ValueCountFrequency (%)
C 16
14.7%
A 11
10.1%
T 10
 
9.2%
K 9
 
8.3%
S 8
 
7.3%
M 6
 
5.5%
G 6
 
5.5%
I 6
 
5.5%
B 6
 
5.5%
L 5
 
4.6%
Other values (10) 26
23.9%
Common
ValueCountFrequency (%)
145
27.8%
1 120
23.0%
2 74
14.2%
/ 62
11.9%
3 25
 
4.8%
4 19
 
3.6%
9 13
 
2.5%
5 13
 
2.5%
( 11
 
2.1%
) 11
 
2.1%
Other values (7) 28
 
5.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  11845  94.9%
ASCII   630    5.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
610
 
5.1%
586
 
4.9%
376
 
3.2%
342
 
2.9%
256
 
2.2%
249
 
2.1%
246
 
2.1%
221
 
1.9%
207
 
1.7%
184
 
1.6%
Other values (399) 8568
72.3%
ASCII
ValueCountFrequency (%)
145
23.0%
1 120
19.0%
2 74
11.7%
/ 62
9.8%
3 25
 
4.0%
4 19
 
3.0%
C 16
 
2.5%
9 13
 
2.1%
5 13
 
2.1%
A 11
 
1.7%
Other values (27) 132
21.0%

Correlations

          선명   출구번호
선명      1.000  0.412
출구번호  0.412  1.000

          선명   출구번호
선명      1.000  0.205
출구번호  0.205  1.000

          선명   출구번호
선명      1.000  0.205
출구번호  0.205  1.000
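
The exported text does not record which coefficient each matrix shows. For two categorical columns such as 선명 and 출구번호, one common measure of association is Cramér's V, so the sketch below shows how that statistic is computed from label pairs. The function name `cramers_v` and the toy data are illustrative assumptions, no bias correction is applied, and the result is not claimed to reproduce the figures above.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed count per (x, y) cell
    row = Counter(xs)             # marginal counts of x
    col = Counter(ys)             # marginal counts of y

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy labels, not taken from the dataset.
lines = ["A", "A", "B", "B"]
perfect = cramers_v(lines, ["1", "1", "2", "2"])      # labels move together
independent = cramers_v(lines, ["1", "2", "1", "2"])  # labels are unrelated
print(perfect, independent)
```

Values near 1 indicate that one column largely determines the other, and values near 0 indicate little association.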

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

      철도운영기관명  선명   역명        출구번호  출구별 주요시설명
0     코레일          1호선  소요산      1         소요산사거리
1     코레일          1호선  소요산      1         소요소방파출소
2     코레일          1호선  소요산      1         소요산유원지
3     코레일          1호선  동두천      1         동안치안센터
4     코레일          1호선  동두천      1         소요동사무소
5     코레일          1호선  동두천      2         동보초등학교
6     코레일          1호선  동두천      2         신창비바페밀리아파트
7     코레일          1호선  동두천      1         소요파출소
8     코레일          1호선  보산        1         보산초등학교
9     코레일          1호선  보산        1         보영여자고등학교

      철도운영기관명  선명   역명        출구번호  출구별 주요시설명
1969  코레일          경강   부발        1         이천시립효양도서관
1970  코레일          경강   세종대왕릉  1         세종대왕릉
1971  코레일          경강   세종대왕릉  1         효종대왕릉
1972  코레일          경강   세종대왕릉  1         능서면사무소
1973  코레일          경강   세종대왕릉  1         한국농어촌공사(여주/ 이천지사)
1974  코레일          경강   세종대왕릉  1         능서초등학교
1975  코레일          경강   여주        1         여주경찰서
1976  코레일          경강   여주        1         여주교육지원청
1977  코레일          경강   여주        2         주차장
1978  코레일          경강   여주        1         수원지방법원여주지원

Duplicate rows

Most frequently occurring

   철도운영기관명  선명      역명    출구번호  출구별 주요시설명  # duplicates
0  코레일          1호선     간석    2         인천남고등학교     2
1  코레일          경의중앙  운길산  1         진중리생태마을     2
2  코레일          경의중앙  운길산  2         진중리생태마을     2
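
Exact duplicate rows of this kind can be found by counting identical tuples. The sketch below uses a small hand-made list in the report's column order, not the full dataset; it assumes that a duplicate means a row whose five fields all match another row.

```python
from collections import Counter

# A few rows in the report's column order; only the first two are identical.
rows = [
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "1호선", "간석", "2", "인천남고등학교"),
    ("코레일", "경의중앙", "운길산", "1", "진중리생태마을"),
    ("코레일", "경의중앙", "운길산", "2", "진중리생태마을"),
]

row_counts = Counter(rows)

# Keep only rows that occur more than once, with their occurrence count.
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(*row, n)
```

The two 운길산 rows in this toy list differ in exit number, so they are not counted as duplicates of each other.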