Overview

Dataset statistics

Number of variables: 8
Number of observations: 474
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 26
Duplicate rows (%): 5.5%
Total size in memory: 29.8 KiB
Average record size in memory: 64.3 B

Variable types

Categorical: 6
Text: 2

Dataset

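Description: Data on the urban and metropolitan railway stations of Seoul Metropolitan Area Line 1 (수도권1호선). Each record gives the railway operator (철도운영기관), line name (선명), station name (역명), up/down direction (상하행구분), entrance number (출입구번호), detailed location (상세위치), start floor (시작층), and end floor (종료층).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041367/fileData.do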

Alerts

선명 has constant value "1호선" (Constant)
Dataset has 26 (5.5%) duplicate rows (Duplicates)
상하행구분 is highly overall correlated with 시작층 and 1 other field (High correlation)
시작층 is highly overall correlated with 상하행구분 (High correlation)
종료층 is highly overall correlated with 상하행구분 (High correlation)
철도운영기관 is highly imbalanced (63.5%) (Imbalance)
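
The 63.5% imbalance figure for 철도운영기관 can be reproduced from the value counts reported for that variable (코레일 441, 서울교통공사 33). The sketch below assumes the score is one minus the Shannon entropy divided by its maximum for the number of classes. That formula is an assumption that happens to match the reported number, and the function name `imbalance_score` is hypothetical.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumed formula, not taken from the report itself.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # arbitrary choice: the ratio is undefined for a single class
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(k)

# Value counts reported for 철도운영기관.
score = imbalance_score([441, 33])
print(f"{score:.1%}")  # 63.5%
```

A perfectly balanced two-class column would score 0 under this definition, so 63.5% reflects the 93.0% / 7.0% split between the two operators.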

Reproduction

Analysis started: 2023-12-12 13:13:00.857659
Analysis finished: 2023-12-12 13:13:01.641124
Duration: 0.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.2088608
Min length: 3
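
The length statistics above follow directly from the value counts for this variable (코레일 in 441 rows, 서울교통공사 in 33 rows). A minimal sketch, assuming "length" means the number of characters in each value; the variable names are illustrative only:

```python
from statistics import mean, median

# Value counts as reported for 철도운영기관.
value_counts = {"코레일": 441, "서울교통공사": 33}

# Expand the counts into one character length per row.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

print("max:", max(lengths))
print("median:", median(lengths))
print("mean:", round(mean(lengths), 7))
print("min:", min(lengths))
```

The mean works out to (441 × 3 + 33 × 6) / 474 = 1521 / 474, which agrees with the reported 3.2088608.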

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코레일
2nd row: 코레일
3rd row: 코레일
4th row: 코레일
5th row: 코레일

Common Values

Value | Count | Frequency (%)
코레일 | 441 | 93.0%
서울교통공사 | 33 | 7.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
1호선 | 474 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

역명
Text

Distinct: 77
Distinct (%): 16.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 12
Median length: 2
Mean length: 2.5970464
Min length: 2

Characters and Unicode

Total characters: 1231
Distinct characters: 107
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
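
As a rough illustration of how such per-character properties can be read, Python's standard `unicodedata` module exposes the general category of each code point. The helper name `category_counts` is hypothetical, and the sample string is one of the 상세위치 values shown in this report.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by two-letter Unicode general category (e.g. Lo, Nd)."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("(1층) 1번 출입구 계단옆 > (2층) 맞이방")
for category, n in counts.most_common():
    print(category, n)
```

In the report's terms, `Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number, `Ps` and `Pe` are Open and Close Punctuation, and `Sm` is Math Symbol.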

Unique

Unique: 4
Unique (%): 0.8%

Sample

1st row: 회기
2nd row: 회기
3rd row: 회기
4th row: 회기
5th row: 회기

Value | Count | Frequency (%)
광명 | 28 | 5.9%
아산 | 22 | 4.6%
동묘앞 | 12 | 2.5%
인천 | 12 | 2.5%
영등포 | 12 | 2.5%
도봉산 | 10 | 2.1%
송내 | 10 | 2.1%
의정부 | 10 | 2.1%
대방 | 10 | 2.1%
신도림 | 9 | 1.9%
Other values (67) | 339 | 71.5%

Most occurring characters

Value | Count | Frequency (%)
[unrecovered] | 62 | 5.0%
[unrecovered] | 57 | 4.6%
[unrecovered] | 46 | 3.7%
[unrecovered] | 38 | 3.1%
[unrecovered] | 36 | 2.9%
[unrecovered] | 34 | 2.8%
[unrecovered] | 28 | 2.3%
[unrecovered] | 26 | 2.1%
[unrecovered] | 26 | 2.1%
[unrecovered] | 26 | 2.1%
Other values (97) | 852 | 69.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1199 | 97.4%
Open Punctuation | 14 | 1.1%
Close Punctuation | 14 | 1.1%
Decimal Number | 4 | 0.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unrecovered] | 62 | 5.2%
[unrecovered] | 57 | 4.8%
[unrecovered] | 46 | 3.8%
[unrecovered] | 38 | 3.2%
[unrecovered] | 36 | 3.0%
[unrecovered] | 34 | 2.8%
[unrecovered] | 28 | 2.3%
[unrecovered] | 26 | 2.2%
[unrecovered] | 26 | 2.2%
[unrecovered] | 26 | 2.2%
Other values (94) | 820 | 68.4%

Open Punctuation
Value | Count | Frequency (%)
( | 14 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14 | 100.0%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1199 | 97.4%
Common | 32 | 2.6%

Most frequent character per script

Hangul
Same counts and frequencies as the Other Letter table under "Most frequent character per category".

Common
Value | Count | Frequency (%)
( | 14 | 43.8%
) | 14 | 43.8%
3 | 4 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1199 | 97.4%
ASCII | 32 | 2.6%

Most frequent character per block

Hangul
Same counts and frequencies as the Other Letter table under "Most frequent character per category".

ASCII
Value | Count | Frequency (%)
( | 14 | 43.8%
) | 14 | 43.8%
3 | 4 | 12.5%

상하행구분
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상행
2nd row: 상행
3rd row: 상행
4th row: 상행
5th row: 상행

Common Values

Value | Count | Frequency (%)
상행 | 270 | 57.0%
하행 | 204 | 43.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

출입구번호
Categorical

Distinct: 22
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 7
Median length: 5
Mean length: 2.535865
Min length: 1

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 1
2nd row: 2
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 212 | 44.7%
1 | 98 | 20.7%
2 | 61 | 12.9%
3 | 32 | 6.8%
4 | 15 | 3.2%
1/2 | 11 | 2.3%
2/3 | 5 | 1.1%
5/6 | 4 | 0.8%
5 | 4 | 0.8%
1/3 | 4 | 0.8%
Other values (12) | 28 | 5.9%

Length

[Figure: histogram of lengths of the category]

상세위치
Text

Distinct: 398
Distinct (%): 84.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 79
Median length: 41
Mean length: 17.563291
Min length: 3

Characters and Unicode

Total characters: 8325
Distinct characters: 194
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 342
Unique (%): 72.2%

Sample

1st row: (1층) 1번 출입구 계단옆 > (2층) 맞이방
2nd row: (1층) 2번 출입구 계단옆
3rd row: (1층) 광운대방향 4-3 출입문 앞
4th row: (1층) 광운대방향 7-2 출입문 앞
5th row: (1층) 문산방향 6-2 출입문 앞

Value | Count | Frequency (%)
[unrecovered] | 135 | 6.7%
출입구 | 113 | 5.6%
방향 | 106 | 5.2%
출입문 | 80 | 3.9%
1f | 76 | 3.7%
승강장 | 71 | 3.5%
맞이방 | 54 | 2.7%
1층 | 45 | 2.2%
[unrecovered] | 45 | 2.2%
2f | 44 | 2.2%
Other values (368) | 1258 | 62.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1625 | 19.5%
( | 454 | 5.5%
) | 454 | 5.5%
1 | 417 | 5.0%
F | 292 | 3.5%
[unrecovered] | 287 | 3.4%
[unrecovered] | 272 | 3.3%
2 | 268 | 3.2%
[unrecovered] | 245 | 2.9%
[unrecovered] | 217 | 2.6%
Other values (184) | 3794 | 45.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4185 | 50.3%
Space Separator | 1625 | 19.5%
Decimal Number | 1013 | 12.2%
Open Punctuation | 454 | 5.5%
Close Punctuation | 454 | 5.5%
Uppercase Letter | 359 | 4.3%
Dash Punctuation | 122 | 1.5%
Math Symbol | 81 | 1.0%
Other Punctuation | 26 | 0.3%
Lowercase Letter | 6 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unrecovered] | 287 | 6.9%
[unrecovered] | 272 | 6.5%
[unrecovered] | 245 | 5.9%
[unrecovered] | 217 | 5.2%
[unrecovered] | 196 | 4.7%
[unrecovered] | 147 | 3.5%
[unrecovered] | 143 | 3.4%
[unrecovered] | 138 | 3.3%
[unrecovered] | 135 | 3.2%
[unrecovered] | 126 | 3.0%
Other values (160) | 2279 | 54.5%

Decimal Number
Value | Count | Frequency (%)
1 | 417 | 41.2%
2 | 268 | 26.5%
3 | 123 | 12.1%
4 | 79 | 7.8%
5 | 40 | 3.9%
6 | 37 | 3.7%
7 | 24 | 2.4%
0 | 10 | 1.0%
8 | 8 | 0.8%
9 | 7 | 0.7%

Uppercase Letter
Value | Count | Frequency (%)
F | 292 | 81.3%
B | 62 | 17.3%
A | 3 | 0.8%
C | 2 | 0.6%

Other Punctuation
Value | Count | Frequency (%)
/ | 15 | 57.7%
. | 10 | 38.5%
* | 1 | 3.8%

Math Symbol
Value | Count | Frequency (%)
> | 71 | 87.7%
→ | 10 | 12.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1625 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 454 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 454 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 122 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4185 | 50.3%
Common | 3775 | 45.3%
Latin | 365 | 4.4%

Most frequent character per script

Hangul
Same counts and frequencies as the Other Letter table under "Most frequent character per category".

Common
Value | Count | Frequency (%)
(space) | 1625 | 43.0%
( | 454 | 12.0%
) | 454 | 12.0%
1 | 417 | 11.0%
2 | 268 | 7.1%
3 | 123 | 3.3%
- | 122 | 3.2%
4 | 79 | 2.1%
> | 71 | 1.9%
5 | 40 | 1.1%
Other values (9) | 122 | 3.2%

Latin
Value | Count | Frequency (%)
F | 292 | 80.0%
B | 62 | 17.0%
m | 6 | 1.6%
A | 3 | 0.8%
C | 2 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4185 | 50.3%
ASCII | 4130 | 49.6%
Arrows | 10 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1625 | 39.3%
( | 454 | 11.0%
) | 454 | 11.0%
1 | 417 | 10.1%
F | 292 | 7.1%
2 | 268 | 6.5%
3 | 123 | 3.0%
- | 122 | 3.0%
4 | 79 | 1.9%
> | 71 | 1.7%
Other values (13) | 225 | 5.4%

Hangul
Same counts and frequencies as the Other Letter table under "Most frequent character per category".

Arrows
Value | Count | Frequency (%)
→ | 10 | 100.0%

시작층
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지상1
2nd row: 지상1
3rd row: 지상1
4th row: 지상1
5th row: 지상1

Common Values

Value | Count | Frequency (%)
지상1 | 238 | 50.2%
지상2 | 127 | 26.8%
지하1 | 54 | 11.4%
지상3 | 41 | 8.6%
지하2 | 8 | 1.7%
지상4 | 4 | 0.8%
지하3 | 2 | 0.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

종료층
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 2.9978903
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 지상2
2nd row: 지상2
3rd row: 지상2
4th row: 지상2
5th row: 지상2

Common Values

Value | Count | Frequency (%)
지상1 | 169 | 35.7%
지상2 | 166 | 35.0%
지상3 | 60 | 12.7%
지하1 | 48 | 10.1%
지하2 | 24 | 5.1%
지상4 | 4 | 0.8%
지하3 | 2 | 0.4%
지상 | 1 | 0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Correlations

Variable | 철도운영기관 | 역명 | 상하행구분 | 출입구번호 | 시작층 | 종료층
철도운영기관 | 1.000 | 1.000 | 0.000 | 0.368 | 0.422 | 0.489
역명 | 1.000 | 1.000 | 0.000 | 0.949 | 0.690 | 0.732
상하행구분 | 0.000 | 0.000 | 1.000 | 0.000 | 0.541 | 0.744
출입구번호 | 0.368 | 0.949 | 0.000 | 1.000 | 0.000 | 0.000
시작층 | 0.422 | 0.690 | 0.541 | 0.000 | 1.000 | 0.712
종료층 | 0.489 | 0.732 | 0.744 | 0.000 | 0.712 | 1.000

Variable | 상하행구분 | 종료층 | 철도운영기관 | 출입구번호 | 시작층
상하행구분 | 1.000 | 0.567 | 0.000 | 0.000 | 0.578
종료층 | 0.567 | 1.000 | 0.366 | 0.000 | 0.484
철도운영기관 | 0.000 | 0.366 | 1.000 | 0.311 | 0.449
출입구번호 | 0.000 | 0.000 | 0.311 | 1.000 | 0.000
시작층 | 0.578 | 0.484 | 0.449 | 0.000 | 1.000

Variable | 철도운영기관 | 상하행구분 | 출입구번호 | 시작층 | 종료층
철도운영기관 | 1.000 | 0.000 | 0.311 | 0.449 | 0.366
상하행구분 | 0.000 | 1.000 | 0.000 | 0.578 | 0.567
출입구번호 | 0.311 | 0.000 | 1.000 | 0.000 | 0.000
시작층 | 0.449 | 0.578 | 0.000 | 1.000 | 0.484
종료층 | 0.366 | 0.567 | 0.000 | 0.484 | 1.000
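
The matrices above report pairwise association between categorical columns on a 0 to 1 scale. The report does not say which statistic each matrix uses. One common choice for two categorical variables is Cramér's V, sketched below without bias correction. The function name `cramers_v` and the toy data are illustrative only, and the output is not expected to match the values above.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    row = Counter(xs)              # marginal counts for xs
    col = Counter(ys)              # marginal counts for ys
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association information
    return sqrt(chi2 / (n * k))

# Toy data: direction fully determines the floor in the first pair, not at all in the second.
perfect = cramers_v(["상행", "상행", "하행", "하행"], ["지상1", "지상1", "지상2", "지상2"])
independent = cramers_v(["상행", "상행", "하행", "하행"], ["지상1", "지상2", "지상1", "지상2"])
print(perfect, independent)
```

A constant column such as 선명 yields no usable association under this definition, which would be consistent with its absence from the matrices.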

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]
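
The report counts 0 missing cells, yet 출입구번호 shows the value <NA> in 212 of 474 rows. One possible explanation, which is an assumption here, is that the placeholder is stored as a literal string and is therefore not treated as null. A minimal sketch of counting such placeholders explicitly follows; the helper `placeholder_share` and the toy column are hypothetical.

```python
def placeholder_share(values, placeholders=("<NA>", "")):
    """Return (count, fraction) of entries that are None or a placeholder string."""
    hits = sum(1 for v in values if v is None or str(v).strip() in placeholders)
    return hits, hits / len(values)

# Toy column shaped like the counts reported for 출입구번호;
# "other" stands in for all remaining values.
column = ["<NA>"] * 212 + ["1"] * 98 + ["2"] * 61 + ["3"] * 32 + ["other"] * 71
count, share = placeholder_share(column)
print(count, f"{share:.1%}")  # 212 44.7%
```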

Sample

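Row | 철도운영기관 | 선명 | 역명 | 상하행구분 | 출입구번호 | 상세위치 | 시작층 | 종료층
0 | 코레일 | 1호선 | 회기 | 상행 | 1 | (1층) 1번 출입구 계단옆 > (2층) 맞이방 | 지상1 | 지상2
1 | 코레일 | 1호선 | 회기 | 상행 | 2 | (1층) 2번 출입구 계단옆 | 지상1 | 지상2
2 | 코레일 | 1호선 | 회기 | 상행 | <NA> | (1층) 광운대방향 4-3 출입문 앞 | 지상1 | 지상2
3 | 코레일 | 1호선 | 회기 | 상행 | <NA> | (1층) 광운대방향 7-2 출입문 앞 | 지상1 | 지상2
4 | 코레일 | 1호선 | 회기 | 상행 | <NA> | (1층) 문산방향 6-2 출입문 앞 | 지상1 | 지상2
5 | 코레일 | 1호선 | 회기 | 상행 | <NA> | (1층) 문산방향 3-3 츨입문 앞 | 지상1 | 지상2
6 | 코레일 | 1호선 | 회기 | 상행 | <NA> | (1층) 1번츨입구 계단 옆 | 지상1 | 지상2
7 | 코레일 | 1호선 | 화서 | 상행 | 6 | (1) 6번 출입구 계단 옆 | 지상1 | 지상2
8 | 코레일 | 1호선 | 화서 | 하행 | 4 | (1) 4번 출입구 계단 옆 | 지상1 | 지상2
9 | 코레일 | 1호선 | 덕정 | 하행 | 1 | (1F) 지행역 방향 승강장 7-3 | 지상2 | 지상1

Row | 철도운영기관 | 선명 | 역명 | 상하행구분 | 출입구번호 | 상세위치 | 시작층 | 종료층
464 | 코레일 | 1호선 | 평택지제 | 상행 | <NA> | (1F)서정리역 방향 7-2 출입문 앞 | 지상1 | 지상2
465 | 코레일 | 1호선 | 평택지제 | 상행 | <NA> | (1F)평택역 방향 4-3 출입문 앞 | 지상1 | 지상2
466 | 코레일 | 1호선 | 회룡 | 상행 | <NA> | (1F)망월사역 방향 승강장 5-3 출입문 앞>(3F)맞이방 | 지상1 | 지상3
467 | 코레일 | 1호선 | 회룡 | 하행 | <NA> | (3F)맞이방>(1F)망월사역 방향 승강장 5-3 출입문 앞 | 지상3 | 지상1
468 | 코레일 | 1호선 | 회룡 | 상행 | <NA> | (1F)의정부역 방향 5-3 출입문 앞>(3F)맞이방 | 지상1 | 지상3
469 | 코레일 | 1호선 | 회룡 | 하행 | <NA> | (3F)맞이방>(1F)의정부역 방향 5-3 출입문 앞 | 지상3 | 지상1
470 | 코레일 | 1호선 | 회룡 | 상행 | 3 | (2F)3번출입구>(3F)맞이방 | 지상2 | 지상3
471 | 코레일 | 1호선 | 회룡 | 하행 | 3 | (3F)맞이방>(2F)3번출입구 | 지상3 | 지상2
472 | 코레일 | 1호선 | 회룡 | 상행 | 3 | (1F)3번출입구>(2F)중층 | 지상1 | 지상2
473 | 코레일 | 1호선 | 회룡 | 하행 | 3 | (2F)중층>(1F)3번출입구 | 지상2 | 지상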

Duplicate rows

Most frequently occurring

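Row | 철도운영기관 | 선명 | 역명 | 상하행구분 | 출입구번호 | 상세위치 | 시작층 | 종료층 | # duplicates
22 | 코레일 | 1호선 | 의정부 | 하행 | <NA> | (3F)맞이방 | 지상3 | 지상1 | 3
0 | 코레일 | 1호선 | 개봉 | 하행 | <NA> | (2F) 표내는곳 옆 | 지상2 | 지상1 | 2
1 | 코레일 | 1호선 | 덕정 | 상행 | 1 | (1F) 맞이방(표 내는 곳 안쪽) | 지상1 | 지상2 | 2
2 | 코레일 | 1호선 | 동두천 | 하행 | <NA> | (3F) 맞이방 | 지상3 | 지상1 | 2
3 | 코레일 | 1호선 | 동암 | 상행 | <NA> | 맞이방→승강장 | 지상1 | 지상2 | 2
4 | 코레일 | 1호선 | 동암 | 하행 | <NA> | 승강장→맞이방 | 지상2 | 지상1 | 2
5 | 코레일 | 1호선 | 동인천 | 상행 | 4 | 4번 출입구 개집표구 앞 | 지상1 | 지상1 | 2
6 | 코레일 | 1호선 | 부천 | 상행 | <NA> | (B1) 지하 동부 맞이방 | 지상1 | 지상1 | 2
7 | 코레일 | 1호선 | 서정리 | 상행 | 1 | 서정리역 1층 1번출구 앞 | 지상1 | 지상3 | 2
8 | 코레일 | 1호선 | 서정리 | 하행 | 1 | 서정리역 3층 1번출구 앞 | 지상3 | 지상1 | 2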
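
A table like the one above can be rebuilt by grouping identical rows and keeping the groups that occur more than once. The sketch below uses only the standard library. `duplicate_groups` is a hypothetical helper, and the rows are toy data patterned on this table (three columns only), not the full dataset.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]

# Toy rows: (역명, 상하행구분, 상세위치)
rows = [
    ("의정부", "하행", "(3F)맞이방"),
    ("의정부", "하행", "(3F)맞이방"),
    ("의정부", "하행", "(3F)맞이방"),
    ("개봉", "하행", "(2F) 표내는곳 옆"),
    ("개봉", "하행", "(2F) 표내는곳 옆"),
    ("회기", "상행", "(1층) 2번 출입구 계단옆"),
]
groups = duplicate_groups(rows)
for row, n in groups:
    print(n, row)
```

Before dropping duplicates from data like this, it may be worth checking whether repeated rows are genuine repeats (for example, two identical facilities at the same location) rather than data-entry errors.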