Overview

Dataset statistics

Number of variables: 9
Number of observations: 107
Missing cells: 5
Missing cells (%): 0.5%
Duplicate rows: 1
Duplicate rows (%): 0.9%
Total size in memory: 7.9 KiB
Average record size in memory: 75.2 B
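
The two percentages above can be reproduced from the counts in the same table. The sketch below assumes that missing cells are measured against all cells (rows times columns) and duplicate rows against the number of rows; the helper name `percentage` is illustrative and not part of the report.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


n_variables = 9       # "Number of variables"
n_observations = 107  # "Number of observations"

# Assumption: missing cells are counted against every cell in the table.
missing_cells_pct = percentage(5, n_variables * n_observations)

# Assumption: duplicate rows are counted against the number of rows.
duplicate_rows_pct = percentage(1, n_observations)

print(missing_cells_pct, duplicate_rows_pct)  # 0.5 0.9
```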

Variable types

Categorical: 6
Text: 3

Dataset

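Description: This dataset covers the urban and metropolitan railway stations managed by Korail (코레일). It contains the railway operator name (철도운영기관명), line name (선명), station name (역명), above-ground/underground classification (지상지하구분), station floor (역층), detailed location (상세위치), number of charging units (충전설비수), usage fee (이용요금), and telephone number (전화번호).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041285/fileData.do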

Alerts

철도운영기관명 has constant value "코레일" (Constant)
Dataset has 1 (0.9%) duplicate rows (Duplicates)
충전설비수 is highly overall correlated with 이용요금 (High correlation)
선명 is highly overall correlated with 지상지하구분 and 1 other field (High correlation)
역층 is highly overall correlated with 이용요금 (High correlation)
지상지하구분 is highly overall correlated with 선명 and 1 other field (High correlation)
이용요금 is highly overall correlated with 선명 and 3 other fields (High correlation)
충전설비수 is highly imbalanced (86.6%) (Imbalance)
이용요금 is highly imbalanced (86.6%) (Imbalance)
전화번호 has 5 (4.7%) missing values (Missing)
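
The 86.6% imbalance figure is consistent with an entropy-based score of the form 1 - H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below assumes that definition (the report itself does not state the formula) and applies it to the 105 / 2 split shown for 충전설비수 and 이용요금; the function name `imbalance_score` is illustrative.

```python
from math import log2


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H/log2(k): 0 for a balanced variable, near 1 when one class dominates."""
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single class is treated as not imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(k)


score = imbalance_score([105, 2])  # value counts from the Common Values tables
print(f"{score:.1%}")  # 86.6%
```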

Reproduction

Analysis started: 2023-12-12 13:01:38.135055
Analysis finished: 2023-12-12 13:01:38.869066
Duration: 0.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
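
As a quick check, the duration follows directly from the two timestamps. This is a minimal sketch using only the values printed above; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps above

started = datetime.strptime("2023-12-12 13:01:38.135055", FORMAT)
finished = datetime.strptime("2023-12-12 13:01:38.869066", FORMAT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.73 seconds
```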

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코레일
2nd row: 코레일
3rd row: 코레일
4th row: 코레일
5th row: 코레일

Common Values

Value | Count | Frequency (%)
코레일 | 107 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.

선명
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 5
Median length: 3
Mean length: 3.364486
Min length: 2

Unique

Unique: 1
Unique (%): 0.9%

Sample

1st row: 경의중앙선
2nd row: 분당선
3rd row: 경춘선
4th row: 경춘선
5th row: 동해선

Common Values

Value | Count | Frequency (%)
1호선 | 44 | 41.1%
경의중앙선 | 20 | 18.7%
동해선 | 9 | 8.4%
3호선 | 8 | 7.5%
분당선 | 7 | 6.5%
경춘선 | 7 | 6.5%
4호선 | 6 | 5.6%
경강선 | 3 | 2.8%
수인선 | 2 | 1.9%
역명 | 1 | 0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.
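
The length statistics for 선명 can be recomputed from its value counts. The sketch below assumes that length means the number of Unicode code points per value and that the strings are stored in precomposed form. The single 역명 entry is also the name of another column, which may point to a header row read as data, although the report does not say so.

```python
from statistics import mean, median

# Value counts copied from the Common Values table for 선명.
value_counts = {
    "1호선": 44, "경의중앙선": 20, "동해선": 9, "3호선": 8, "분당선": 7,
    "경춘선": 7, "4호선": 6, "경강선": 3, "수인선": 2, "역명": 1,
}

# Expand the counts into one string length per observation.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

print(len(lengths))               # 107
print(max(lengths), min(lengths)) # 5 2
print(median(lengths))            # 3
print(round(mean(lengths), 6))    # 3.364486
```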

역명
Text

Distinct: 103
Distinct (%): 96.3%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 12
Median length: 3
Mean length: 3.4953271
Min length: 2

Characters and Unicode

Total characters: 374
Distinct characters: 134
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
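
To illustrate the kind of character property the report counts, the sketch below tallies Unicode general categories for one station name from the sample rows. It uses Python's standard `unicodedata` module, which exposes general categories ("Lo" is Other Letter, "Ps" is Open Punctuation, "Pe" is Close Punctuation) but not scripts or blocks; the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the Unicode general category of each character in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# One station name taken from the sample rows of this report.
name = "평택지제(한경국립대)"
counts = category_counts(name)
print(dict(counts))  # {'Lo': 9, 'Ps': 1, 'Pe': 1}
```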

Unique

Unique: 99
Unique (%): 92.5%

Sample

1st row: 가좌역
2nd row: 가천대역
3rd row: 갈매역
4th row: 강촌역
5th row: 거제역

Value | Count | Frequency (%)
동두천중앙 | 2 | 1.9%
용산역 | 2 | 1.9%
광명 | 2 | 1.9%
녹천역 | 2 | 1.9%
일산역 | 1 | 0.9%
왕십리역 | 1 | 0.9%
인하대역 | 1 | 0.9%
이촌 | 1 | 0.9%
이천역 | 1 | 0.9%
이매역 | 1 | 0.9%
Other values (93) | 93 | 86.9%

Most occurring characters

ValueCountFrequency (%)
87
23.3%
14
 
3.7%
12
 
3.2%
10
 
2.7%
10
 
2.7%
7
 
1.9%
6
 
1.6%
6
 
1.6%
6
 
1.6%
) 5
 
1.3%
Other values (124) 211
56.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 364
97.3%
Close Punctuation 5
 
1.3%
Open Punctuation 5
 
1.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
87
23.9%
14
 
3.8%
12
 
3.3%
10
 
2.7%
10
 
2.7%
7
 
1.9%
6
 
1.6%
6
 
1.6%
6
 
1.6%
5
 
1.4%
Other values (122) 201
55.2%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 364
97.3%
Common 10
 
2.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
87
23.9%
14
 
3.8%
12
 
3.3%
10
 
2.7%
10
 
2.7%
7
 
1.9%
6
 
1.6%
6
 
1.6%
6
 
1.6%
5
 
1.4%
Other values (122) 201
55.2%
Common
ValueCountFrequency (%)
) 5
50.0%
( 5
50.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 364
97.3%
ASCII 10
 
2.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
87
23.9%
14
 
3.8%
12
 
3.3%
10
 
2.7%
10
 
2.7%
7
 
1.9%
6
 
1.6%
6
 
1.6%
6
 
1.6%
5
 
1.4%
Other values (122) 201
55.2%
ASCII
ValueCountFrequency (%)
) 5
50.0%
( 5
50.0%

지상지하구분
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지하
2nd row: 지하
3rd row: 지상
4th row: 지상
5th row: 지상

Common Values

Value | Count | Frequency (%)
지상 | 82 | 76.6%
지하 | 25 | 23.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.

역층
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.9%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 3

Common Values

Value | Count | Frequency (%)
1 | 61 | 57.0%
2 | 30 | 28.0%
3 | 15 | 14.0%
4 | 1 | 0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.

상세위치
Text

Distinct: 102
Distinct (%): 95.3%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 37
Median length: 27
Mean length: 14.635514
Min length: 3

Characters and Unicode

Total characters: 1566
Distinct characters: 134
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 97
Unique (%): 90.7%

Sample

1st row: 북쪽개찰구 앞
2nd row: (B1) 고객안내센터 앞 맞이방
3rd row: 1번 출입구 나가는 방향
4th row: (F1) 1번 출입구 옆
5th row: 역무실 옆
ValueCountFrequency (%)
32
 
7.6%
29
 
6.9%
출입구 26
 
6.2%
맞이방 19
 
4.5%
b1 19
 
4.5%
2f 18
 
4.3%
1f 17
 
4.1%
방향 15
 
3.6%
역무실 10
 
2.4%
3f 9
 
2.1%
Other values (123) 225
53.7%

Most occurring characters

ValueCountFrequency (%)
325
20.8%
( 93
 
5.9%
) 93
 
5.9%
1 67
 
4.3%
F 57
 
3.6%
56
 
3.6%
54
 
3.4%
51
 
3.3%
2 44
 
2.8%
44
 
2.8%
Other values (124) 682
43.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 812
51.9%
Space Separator 325
20.8%
Decimal Number 150
 
9.6%
Open Punctuation 93
 
5.9%
Close Punctuation 93
 
5.9%
Uppercase Letter 88
 
5.6%
Other Punctuation 4
 
0.3%
Dash Punctuation 1
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
56
 
6.9%
54
 
6.7%
51
 
6.3%
44
 
5.4%
37
 
4.6%
36
 
4.4%
35
 
4.3%
33
 
4.1%
31
 
3.8%
26
 
3.2%
Other values (104) 409
50.4%
Decimal Number
ValueCountFrequency (%)
1 67
44.7%
2 44
29.3%
3 28
18.7%
4 5
 
3.3%
7 2
 
1.3%
8 2
 
1.3%
5 1
 
0.7%
6 1
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
F 57
64.8%
B 23
26.1%
E 3
 
3.4%
V 2
 
2.3%
L 1
 
1.1%
A 1
 
1.1%
C 1
 
1.1%
Space Separator
ValueCountFrequency (%)
325
100.0%
Open Punctuation
ValueCountFrequency (%)
( 93
100.0%
Close Punctuation
ValueCountFrequency (%)
) 93
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 4
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 812
51.9%
Common 666
42.5%
Latin 88
 
5.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
56
 
6.9%
54
 
6.7%
51
 
6.3%
44
 
5.4%
37
 
4.6%
36
 
4.4%
35
 
4.3%
33
 
4.1%
31
 
3.8%
26
 
3.2%
Other values (104) 409
50.4%
Common
ValueCountFrequency (%)
325
48.8%
( 93
 
14.0%
) 93
 
14.0%
1 67
 
10.1%
2 44
 
6.6%
3 28
 
4.2%
4 5
 
0.8%
/ 4
 
0.6%
7 2
 
0.3%
8 2
 
0.3%
Other values (3) 3
 
0.5%
Latin
ValueCountFrequency (%)
F 57
64.8%
B 23
26.1%
E 3
 
3.4%
V 2
 
2.3%
L 1
 
1.1%
A 1
 
1.1%
C 1
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 812
51.9%
ASCII 754
48.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
325
43.1%
( 93
 
12.3%
) 93
 
12.3%
1 67
 
8.9%
F 57
 
7.6%
2 44
 
5.8%
3 28
 
3.7%
B 23
 
3.1%
4 5
 
0.7%
/ 4
 
0.5%
Other values (10) 15
 
2.0%
Hangul
ValueCountFrequency (%)
56
 
6.9%
54
 
6.7%
51
 
6.3%
44
 
5.4%
37
 
4.6%
36
 
4.4%
35
 
4.3%
33
 
4.1%
31
 
3.8%
26
 
3.2%
Other values (104) 409
50.4%

충전설비수
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 105 | 98.1%
2 | 2 | 1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.

이용요금
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 988.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.0373832
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 무료
2nd row: 무료
3rd row: 무료
4th row: 무료
5th row: 무료

Common Values

Value | Count | Frequency (%)
무료 | 105 | 98.1%
<NA> | 2 | 1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the value counts listed under Common Values.

전화번호
Text

MISSING 

Distinct: 63
Distinct (%): 61.8%
Missing: 5
Missing (%): 4.7%
Memory size: 988.0 B
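
The percentages for 전화번호 use two different denominators. The figures are consistent with the missing share being taken over all 107 observations and the distinct and unique shares over the 102 non-missing values; the sketch below checks that reading, which is an inference from the numbers rather than something the report states.

```python
n_observations = 107
n_missing = 5    # "Missing" for 전화번호
n_distinct = 63  # "Distinct"
n_unique = 49    # "Unique" (values that occur exactly once)

n_present = n_observations - n_missing  # 102 non-missing values

missing_pct = round(100 * n_missing / n_observations, 1)
distinct_pct = round(100 * n_distinct / n_present, 1)
unique_pct = round(100 * n_unique / n_present, 1)

print(missing_pct, distinct_pct, unique_pct)  # 4.7 61.8 48.0
```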

Length

Max length: 13
Median length: 12
Mean length: 11.872549
Min length: 2

Characters and Unicode

Total characters: 1211
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 49
Unique (%): 48.0%

Sample

1st row: 031-732-7725
2nd row: 031-550-8383
3rd row: 033-261-7730
4th row: 051-665-4335
5th row: 051-665-4331

Value | Count | Frequency (%)
031-442-9819 | 12 | 11.8%
031-909-9000 | 7 | 6.9%
02-999-3117 | 5 | 4.9%
031-732-7725 | 5 | 4.9%
031-731-4587 | 4 | 3.9%
070-4236-4587 | 3 | 2.9%
031-861-2345 | 3 | 2.9%
051-550-4882 | 2 | 2.0%
051-709-4371 | 2 | 2.0%
031-8075-3293 | 2 | 2.0%
Other values (53) | 57 | 55.9%

Most occurring characters

ValueCountFrequency (%)
- 201
16.6%
0 181
14.9%
3 135
11.1%
1 133
11.0%
7 98
8.1%
2 96
7.9%
9 89
7.3%
4 85
7.0%
8 78
 
6.4%
5 67
 
5.5%
Other values (3) 48
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1008
83.2%
Dash Punctuation 201
 
16.6%
Other Letter 2
 
0.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 181
18.0%
3 135
13.4%
1 133
13.2%
7 98
9.7%
2 96
9.5%
9 89
8.8%
4 85
8.4%
8 78
7.7%
5 67
 
6.6%
6 46
 
4.6%
Other Letter
ValueCountFrequency (%)
1
50.0%
1
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 201
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1209
99.8%
Hangul 2
 
0.2%

Most frequent character per script

Common
ValueCountFrequency (%)
- 201
16.6%
0 181
15.0%
3 135
11.2%
1 133
11.0%
7 98
8.1%
2 96
7.9%
9 89
7.4%
4 85
7.0%
8 78
 
6.5%
5 67
 
5.5%
Hangul
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1209
99.8%
Hangul 2
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 201
16.6%
0 181
15.0%
3 135
11.2%
1 133
11.0%
7 98
8.1%
2 96
7.9%
9 89
7.4%
4 85
7.0%
8 78
 
6.5%
5 67
 
5.5%
Hangul
ValueCountFrequency (%)
1
50.0%
1
50.0%

Correlations

Variable | 선명 | 지상지하구분 | 역층 | 충전설비수 | 전화번호
선명 | 1.000 | 0.946 | 0.122 | 0.000 | 0.971
지상지하구분 | 0.946 | 1.000 | 0.592 | 0.000 | 0.545
역층 | 0.122 | 0.592 | 1.000 | 0.095 | 0.000
충전설비수 | 0.000 | 0.000 | 0.095 | 1.000 | 1.000
전화번호 | 0.971 | 0.545 | 0.000 | 1.000 | 1.000

Variable | 충전설비수 | 선명 | 역층 | 지상지하구분 | 이용요금
충전설비수 | 1.000 | 0.000 | 0.060 | 0.000 | 1.000
선명 | 0.000 | 1.000 | 0.065 | 0.771 | 1.000
역층 | 0.060 | 0.065 | 1.000 | 0.404 | 1.000
지상지하구분 | 0.000 | 0.771 | 0.404 | 1.000 | 1.000
이용요금 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 선명 | 지상지하구분 | 역층 | 충전설비수 | 이용요금
선명 | 1.000 | 0.771 | 0.065 | 0.000 | 1.000
지상지하구분 | 0.771 | 1.000 | 0.404 | 0.000 | 1.000
역층 | 0.065 | 0.404 | 1.000 | 0.060 | 1.000
충전설비수 | 0.000 | 0.000 | 0.060 | 1.000 | 1.000
이용요금 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
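
The report does not say which coefficient each matrix uses. One common association measure for pairs of categorical variables is Cramér's V, and the sketch below shows the plain (uncorrected) form on two hypothetical contingency tables that are not taken from the dataset. It assumes every row and column total is non-zero and that each variable has at least two categories; whether the matrices above were computed this way is an assumption.

```python
from math import sqrt


def cramers_v(table: list[list[int]]) -> float:
    """Plain (uncorrected) Cramér's V for a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical tables for illustration only.
perfect = [[105, 0], [0, 2]]        # each row value maps to exactly one column value
independent = [[10, 10], [10, 10]]  # row value carries no information about the column

print(round(cramers_v(perfect), 3), round(cramers_v(independent), 3))  # 1.0 0.0
```

A value of 1.000 therefore means that knowing one variable determines the other, which is how the 1.000 entries in the matrices above can be read under this assumption.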

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

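Index | 철도운영기관명 | 선명 | 역명 | 지상지하구분 | 역층 | 상세위치 | 충전설비수 | 이용요금 | 전화번호
0 | 코레일 | 경의중앙선 | 가좌역 | 지하 | 1 | 북쪽개찰구 앞 | 1 | 무료 | <NA>
1 | 코레일 | 분당선 | 가천대역 | 지하 | 1 | (B1) 고객안내센터 앞 맞이방 | 1 | 무료 | 031-732-7725
2 | 코레일 | 경춘선 | 갈매역 | 지상 | 1 | 1번 출입구 나가는 방향 | 1 | 무료 | 031-550-8383
3 | 코레일 | 경춘선 | 강촌역 | 지상 | 1 | (F1) 1번 출입구 옆 | 1 | 무료 | 033-261-7730
4 | 코레일 | 동해선 | 거제역 | 지상 | 3 | 역무실 옆 | 1 | 무료 | 051-665-4335
5 | 코레일 | 동해선 | 거제해맞이역 | 지상 | 1 | (1F) 3번출입구 기계실과 태화강방면 표내는 곳 사이 | 1 | 무료 | 051-665-4331
6 | 코레일 | 4호선 | 과천역 | 지하 | 1 | (B1) 북쪽 맞이방(자동발매기 앞) | 1 | 무료 | 031-442-9819
7 | 코레일 | 1호선 | 광명 | 지상 | 1 | (1F) 1번출구 | 1 | 무료 | 031-442-9819
8 | 코레일 | 1호선 | 광명 | 지상 | 1 | (1F) 6번출구 | 1 | 무료 | 031-442-9819
9 | 코레일 | 경의중앙선 | 구리역 | 지상 | 2 | (2F) 남자화장실 옆 | 1 | 무료 | 031-731-4587

Index | 철도운영기관명 | 선명 | 역명 | 지상지하구분 | 역층 | 상세위치 | 충전설비수 | 이용요금 | 전화번호
97 | 코레일 | 경강선 | 판교 | 지하 | 2 | (B2) 3번 출입구 옆 | 1 | 무료 | 070-4236-4587
98 | 코레일 | 4호선 | 평촌역 | 지하 | 1 | (B1) 역무실 앞 | 1 | 무료 | 031-442-9819
99 | 코레일 | 1호선 | 평택 | 지상 | 3 | (3F) 1번 출입구 엘리베이터 타는 곳 방향 ABC마트 앞 | 1 | 무료 | 031-8024-5000
100 | 코레일 | 1호선 | 평택지제(한경국립대) | 지상 | 2 | (2F) 1번출구 E/V 옆 | 1 | 무료 | 031-8024-5000
101 | 코레일 | 경의중앙선 | 한남역 | 지상 | 1 | (1F) 1번출입구 근처 | 1 | 무료 | 02-2199-7103
102 | 코레일 | 경의중앙선 | 행신역 | 지상 | 2 | (2) 맞이방 게이트 우측 | 1 | 무료 | 031-909-9000
103 | 코레일 | 3호선 | 화정역 | 지하 | 1 | (B1) 4번출구 계단옆 | 1 | 무료 | 031-909-9000
104 | 코레일 | 1호선 | 회기역 | 지상 | 1 | 2층 자유통로 (2번출구방면) | 1 | 무료 | 02-965-1467
105 | 코레일 | 1호선 | 회룡역 | 지상 | 3 | (3F) 3번 출입구 E/V 3호기 앞 | 1 | 무료 | 031-872-7744
106 | 코레일 | 경의중앙선 | 효창공원앞 | 지하 | 1 | 역무실 앞 | 1 | 무료 | 02-701-6518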

Duplicate rows

Most frequently occurring

Index | 철도운영기관명 | 선명 | 역명 | 지상지하구분 | 역층 | 상세위치 | 충전설비수 | 이용요금 | 전화번호 | # duplicates
0 | 코레일 | 1호선 | 동두천중앙 | 지상 | 1 | (1F) 3번 출입구 앞 | 1 | 무료 | 031-861-2345 | 2
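
A duplicate row here is one whose values match another row in every column. The sketch below shows one way to find such rows with the standard library, assuming the "# duplicates" column reports how many times the row occurs. The rows are abbreviated to three columns (선명, 역명, 상세위치) for readability, and the function name `duplicate_rows` is illustrative.

```python
from collections import Counter


def duplicate_rows(rows: list[tuple]) -> dict[tuple, int]:
    """Return each row that occurs more than once, with its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Abbreviated rows; the first two are identical, as in the table above.
rows = [
    ("1호선", "동두천중앙", "(1F) 3번 출입구 앞"),
    ("1호선", "동두천중앙", "(1F) 3번 출입구 앞"),
    ("1호선", "광명", "(1F) 1번출구"),
]

print(duplicate_rows(rows))
```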