Overview

Dataset statistics

Number of variables: 10
Number of observations: 40
Missing cells: 3
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 87.3 B
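
The missing-cell percentage can be cross-checked from the other figures in this block. The sketch below assumes the percentage is simply missing cells divided by total cells (variables times observations); the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 10
n_observations = 40
missing_cells = 3

# Assumption: the percentage is missing cells over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.2f}% missing")
```

Under that assumption the exact value is 0.75%, which the report shows rounded to one decimal place.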

Variable types

Categorical: 8
Numeric: 1
Text: 1

Dataset

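Description: Data on the urban and metropolitan railway stations managed by Seoul Metro (서울메트로): railway operator name (철도운영기관명), line name (선명), station name (역명), wheelchair lift management number (휠체어리프트의 관리번호), entrance number (출입구번호), detailed location (상세위치), length (길이), width (폭), start floor (시작층), and end floor (종료층).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041426/fileData.do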

Alerts

철도운영기관명 has constant value "서울교통공사" (Constant)
길이 is highly overall correlated with 선명 and 2 other fields (High correlation)
선명 is highly overall correlated with 역명 and 3 other fields (High correlation)
폭 is highly overall correlated with 선명 and 2 other fields (High correlation)
역명 is highly overall correlated with 선명 and 3 other fields (High correlation)
시작층 is highly overall correlated with 종료층 (High correlation)
종료층 is highly overall correlated with 선명 and 2 other fields (High correlation)
출입구번호 is highly imbalanced (83.1%) (Imbalance)
상세위치 has 3 (7.5%) missing values (Missing)
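
The 83.1% imbalance figure for 출입구번호 is consistent with a score of one minus the normalized Shannon entropy of the class frequencies. The sketch below is an illustration under that assumption, not a copy of the profiler's code; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the
    class frequencies and k is the number of classes.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# 출입구번호: "<NA>" appears 39 times and "12" once (from its value counts).
score = imbalance_score([39, 1])
print(f"{score:.1%}")
```

A two-class split of 39 to 1 carries very little entropy, so the score lands close to 1.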

Reproduction

Analysis started: 2023-12-12 02:29:53.941533
Analysis finished: 2023-12-12 02:29:54.885901
Duration: 0.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
서울교통공사 | 40 | 100.0%

선명
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 12 | 30.0%
3호선 | 11 | 27.5%
4호선 | 9 | 22.5%
1호선 | 8 | 20.0%

역명
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 40.0%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 12
Median length: 11
Mean length: 4.35
Min length: 2

Unique

Unique: 8
Unique (%): 20.0%

Sample

1st row: 서울역
2nd row: 신설동
3rd row: 신설동
4th row: 신설동
5th row: 신설동

Common Values

Value | Count | Frequency (%)
신설동 | 11 | 27.5%
종로3가 | 7 | 17.5%
창동 | 4 | 10.0%
청량리(서울시립대입구) | 2 | 5.0%
용답 | 2 | 5.0%
고속터미널 | 2 | 5.0%
교대(법원·검찰청) | 2 | 5.0%
상계 | 2 | 5.0%
서울역 | 1 | 2.5%
한양대 | 1 | 2.5%
Other values (6) | 6 | 15.0%

휠체어리프트의 관리번호
Numeric

Distinct: 7
Distinct (%): 17.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.425
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 492.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 3.25
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 2.25

Descriptive statistics

Standard deviation: 1.7080128
Coefficient of variation (CV): 0.70433517
Kurtosis: 0.24115045
Mean: 2.425
Median Absolute Deviation (MAD): 1
Skewness: 1.1097083
Sum: 97
Variance: 2.9173077
Monotonicity: Not monotonic

Common Values

Value | Count | Frequency (%)
1 | 17 | 42.5%
2 | 9 | 22.5%
3 | 4 | 10.0%
4 | 4 | 10.0%
5 | 3 | 7.5%
6 | 2 | 5.0%
7 | 1 | 2.5%
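
Because every distinct value of this variable is listed with its count, the summary statistics can be cross-checked from the frequency table alone. The sketch below uses only the Python standard library and assumes the quartiles are computed with linear interpolation between data points (the `inclusive` method of `statistics.quantiles`); the variable names are illustrative.

```python
import statistics

# Value -> count, copied from the frequency table for 휠체어리프트의 관리번호.
value_counts = {1: 17, 2: 9, 3: 4, 4: 4, 5: 3, 6: 2, 7: 1}

# Expand the table back into the 40 individual observations.
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)

# Assumption: quartiles use linear interpolation over the sorted data.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("n, sum:", len(values), sum(values))
print("mean, variance, std:", mean, round(variance, 7), round(std_dev, 7))
print("Q1, median, Q3, IQR:", q1, median, q3, q3 - q1)
```

If the assumptions hold, these values should agree with the Sum, Mean, Variance, Standard deviation, and quartile figures reported for this variable.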

출입구번호
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.95
Min length: 2

Unique

Unique: 1
Unique (%): 2.5%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 39 | 97.5%
12 | 1 | 2.5%

상세위치
Text

MISSING 

Distinct: 36
Distinct (%): 97.3%
Missing: 3
Missing (%): 7.5%
Memory size: 452.0 B

Length

Max length: 21
Median length: 17
Mean length: 12.486486
Min length: 6

Characters and Unicode

Total characters: 462
Distinct characters: 68
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
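
As a rough illustration of how such character properties can be read, the sketch below tallies Unicode general categories for the five sample values of 상세위치 using Python's standard `unicodedata` module. It is not the profiler's own implementation, and the Hangul test (a Unicode name starting with "HANGUL") is an assumption made for this example.

```python
import unicodedata
from collections import Counter

# The five sample values listed for 상세위치.
samples = [
    "(B2)내부 C 계단",
    "(F1)6번 출입구",
    "(B2)상선승강장 시점측",
    "(B2)하선승강장 시점측",
    "(B1)대합실 연결통로",
]

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter), "Nd" (Decimal Number), "Zs" (Space Separator).
category_counts = Counter(
    unicodedata.category(ch) for s in samples for ch in s
)

# Assumption: a character counts as Hangul if its Unicode name starts with "HANGUL".
hangul = sum(
    unicodedata.name(ch, "").startswith("HANGUL") for s in samples for ch in s
)

print(category_counts.most_common())
print("Hangul characters:", hangul)
```

The counts cover only these five rows, so they will not match the column-wide totals in the report.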

Unique

Unique: 35
Unique (%): 94.6%
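
The Distinct and Unique percentages for 상세위치 appear to be taken over the non-missing rows only, while the Missing percentage is taken over all rows. The sketch below checks that reading against the reported figures; it is an inference from the numbers, not a statement about the profiler's internals.

```python
# Figures copied from the 상세위치 summary.
n_rows = 40
n_missing = 3
distinct = 36
unique = 35

# Assumption: distinct/unique percentages use the non-missing count as denominator.
n_present = n_rows - n_missing
distinct_pct = 100 * distinct / n_present
unique_pct = 100 * unique / n_present
missing_pct = 100 * n_missing / n_rows

print(round(distinct_pct, 1), round(unique_pct, 1), missing_pct)
```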

Sample

1st row: (B2)내부 C 계단
2nd row: (F1)6번 출입구
3rd row: (B2)상선승강장 시점측
4th row: (B2)하선승강장 시점측
5th row: (B1)대합실 연결통로

Words

Value | Count | Frequency (%)
승강장 | 6 | 7.5%
연결통로 | 5 | 6.2%
계단 | 3 | 3.8%
b1)대합실 | 2 | 2.5%
출입구 | 2 | 2.5%
대합실 | 2 | 2.5%
b1)환승통로 | 2 | 2.5%
시점측 | 2 | 2.5%
을지3가측 | 2 | 2.5%
중앙 | 2 | 2.5%
Other values (49) | 52 | 65.0%

Most occurring characters

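Value | Count | Frequency (%)
(space) | 43 | 9.3%
( | 35 | 7.6%
) | 35 | 7.6%
1 | 27 | 5.8%
2 | 24 | 5.2%
B | 19 | 4.1%
(Hangul character not preserved) | 17 | 3.7%
(Hangul character not preserved) | 17 | 3.7%
(Hangul character not preserved) | 15 | 3.2%
(Hangul character not preserved) | 13 | 2.8%
Other values (58) | 217 | 47.0%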

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 243 | 52.6%
Decimal Number | 66 | 14.3%
Space Separator | 43 | 9.3%
Open Punctuation | 35 | 7.6%
Close Punctuation | 35 | 7.6%
Uppercase Letter | 30 | 6.5%
Dash Punctuation | 7 | 1.5%
Other Punctuation | 3 | 0.6%

Most frequent character per category

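Other Letter
Value | Count | Frequency (%)
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 15 | 6.2%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
Other values (42) | 121 | 49.8%

Decimal Number
Value | Count | Frequency (%)
1 | 27 | 40.9%
2 | 24 | 36.4%
3 | 6 | 9.1%
8 | 3 | 4.5%
4 | 3 | 4.5%
7 | 1 | 1.5%
0 | 1 | 1.5%
6 | 1 | 1.5%

Uppercase Letter
Value | Count | Frequency (%)
B | 19 | 63.3%
F | 10 | 33.3%
C | 1 | 3.3%

Space Separator
Value | Count | Frequency (%)
(space) | 43 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 35 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 35 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 3 | 100.0%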

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 243 | 52.6%
Common | 189 | 40.9%
Latin | 30 | 6.5%

Most frequent character per script

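Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 15 | 6.2%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
Other values (42) | 121 | 49.8%

Common
Value | Count | Frequency (%)
(space) | 43 | 22.8%
( | 35 | 18.5%
) | 35 | 18.5%
1 | 27 | 14.3%
2 | 24 | 12.7%
- | 7 | 3.7%
3 | 6 | 3.2%
8 | 3 | 1.6%
/ | 3 | 1.6%
4 | 3 | 1.6%
Other values (3) | 3 | 1.6%

Latin
Value | Count | Frequency (%)
B | 19 | 63.3%
F | 10 | 33.3%
C | 1 | 3.3%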

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 243 | 52.6%
ASCII | 219 | 47.4%

Most frequent character per block

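ASCII
Value | Count | Frequency (%)
(space) | 43 | 19.6%
( | 35 | 16.0%
) | 35 | 16.0%
1 | 27 | 12.3%
2 | 24 | 11.0%
B | 19 | 8.7%
F | 10 | 4.6%
- | 7 | 3.2%
3 | 6 | 2.7%
8 | 3 | 1.4%
Other values (6) | 10 | 4.6%

Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 17 | 7.0%
(Hangul character not preserved) | 15 | 6.2%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 13 | 5.3%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 10 | 4.1%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
(Hangul character not preserved) | 9 | 3.7%
Other values (42) | 121 | 49.8%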

길이
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 7.5%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.275
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 125
2nd row: 125
3rd row: 125
4th row: 125
5th row: 125

Common Values

Value | Count | Frequency (%)
125 | 29 | 72.5%
<NA> | 6 | 15.0%
1250 | 5 | 12.5%

폭
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.55
Min length: 2

Unique

Unique: 1
Unique (%): 2.5%

Sample

1st row: 80
2nd row: 80
3rd row: 80
4th row: 80
5th row: 80

Common Values

Value | Count | Frequency (%)
80 | 23 | 57.5%
<NA> | 6 | 15.0%
110 | 5 | 12.5%
800 | 5 | 12.5%
94 | 1 | 2.5%

시작층
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 17.5%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지하2층
2nd row: 지상1층
3rd row: 지하2층
4th row: 지하2층
5th row: 지하1층

Common Values

Value | Count | Frequency (%)
지하2층 | 13 | 32.5%
지하1층 | 11 | 27.5%
지상1층 | 6 | 15.0%
지상2층 | 4 | 10.0%
지하4층 | 2 | 5.0%
지하3층 | 2 | 5.0%
<NA> | 2 | 5.0%

종료층
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 17.5%
Missing: 0
Missing (%): 0.0%
Memory size: 452.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지하1층
2nd row: 지하1층
3rd row: 지하1층
4th row: 지하1층
5th row: 지하1층

Common Values

Value | Count | Frequency (%)
지하1층 | 21 | 52.5%
지상3층 | 6 | 15.0%
지상2층 | 3 | 7.5%
지하3층 | 3 | 7.5%
<NA> | 3 | 7.5%
지상1층 | 2 | 5.0%
지하2층 | 2 | 5.0%

Correlations

Variable | 선명 | 역명 | 휠체어리프트의 관리번호 | 상세위치 | 길이 | 폭 | 시작층 | 종료층
선명 | 1.000 | 0.984 | 0.000 | 1.000 | 1.000 | 0.880 | 0.634 | 0.739
역명 | 0.984 | 1.000 | 0.000 | 1.000 | 1.000 | 0.993 | 0.705 | 0.849
휠체어리프트의 관리번호 | 0.000 | 0.000 | 1.000 | 0.633 | 0.000 | 0.000 | 0.000 | 0.000
상세위치 | 1.000 | 1.000 | 0.633 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
길이 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.728 | 0.642
폭 | 0.880 | 0.993 | 0.000 | 1.000 | 1.000 | 1.000 | 0.657 | 0.671
시작층 | 0.634 | 0.705 | 0.000 | 1.000 | 0.728 | 0.657 | 1.000 | 0.968
종료층 | 0.739 | 0.849 | 0.000 | 1.000 | 0.642 | 0.671 | 0.968 | 1.000

Variable | 길이 | 선명 | 시작층 | 폭 | 역명 | 종료층 | 출입구번호
길이 | 1.000 | 0.968 | 0.498 | 0.968 | 0.750 | 0.432 | NaN
선명 | 0.968 | 1.000 | 0.446 | 0.549 | 0.679 | 0.554 | NaN
시작층 | 0.498 | 0.446 | 1.000 | 0.321 | 0.373 | 0.733 | NaN
폭 | 0.968 | 0.549 | 0.321 | 1.000 | 0.689 | 0.332 | NaN
역명 | 0.750 | 0.679 | 0.373 | 0.689 | 1.000 | 0.538 | NaN
종료층 | 0.432 | 0.554 | 0.733 | 0.332 | 0.538 | 1.000 | NaN
출입구번호 | NaN | NaN | NaN | NaN | NaN | NaN | 1.000

Variable | 휠체어리프트의 관리번호 | 선명 | 역명 | 출입구번호 | 길이 | 폭 | 시작층 | 종료층
휠체어리프트의 관리번호 | 1.000 | 0.000 | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 0.000
선명 | 0.000 | 1.000 | 0.679 | NaN | 0.968 | 0.549 | 0.446 | 0.554
역명 | 0.000 | 0.679 | 1.000 | NaN | 0.750 | 0.689 | 0.373 | 0.538
출입구번호 | NaN | NaN | NaN | 1.000 | NaN | NaN | NaN | NaN
길이 | 0.000 | 0.968 | 0.750 | NaN | 1.000 | 0.968 | 0.498 | 0.432
폭 | 0.000 | 0.549 | 0.689 | NaN | 0.968 | 1.000 | 0.321 | 0.332
시작층 | 0.000 | 0.446 | 0.373 | NaN | 0.498 | 0.321 | 1.000 | 0.733
종료층 | 0.000 | 0.554 | 0.538 | NaN | 0.432 | 0.332 | 0.733 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

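First rows

Row | 철도운영기관명 | 선명 | 역명 | 휠체어리프트의 관리번호 | 출입구번호 | 상세위치 | 길이 | 폭 | 시작층 | 종료층
0 | 서울교통공사 | 1호선 | 서울역 | 1 | <NA> | (B2)내부 C 계단 | 125 | 80 | 지하2층 | 지하1층
1 | 서울교통공사 | 1호선 | 신설동 | 1 | <NA> | (F1)6번 출입구 | 125 | 80 | 지상1층 | 지하1층
2 | 서울교통공사 | 1호선 | 신설동 | 2 | <NA> | (B2)상선승강장 시점측 | 125 | 80 | 지하2층 | 지하1층
3 | 서울교통공사 | 1호선 | 신설동 | 3 | <NA> | (B2)하선승강장 시점측 | 125 | 80 | 지하2층 | 지하1층
4 | 서울교통공사 | 1호선 | 신설동 | 4 | <NA> | (B1)대합실 연결통로 | 125 | 80 | 지하1층 | 지하1층
5 | 서울교통공사 | 1호선 | 신설동 | 5 | <NA> | (B1)대합실 연결통로 | 125 | 80 | 지하1층 | 지하1층
6 | 서울교통공사 | 1호선 | 청량리(서울시립대입구) | 1 | <NA> | (B2)제기동측 승강장 | 125 | 80 | 지하2층 | 지하1층
7 | 서울교통공사 | 1호선 | 청량리(서울시립대입구) | 2 | <NA> | (B2)섬식(상)8-2 | 125 | 110 | 지하2층 | 지하1층
8 | 서울교통공사 | 2호선 | 한양대 | 1 | <NA> | (F1)섬식(외) 8-2 | 125 | 110 | 지상1층 | 지상2층
9 | 서울교통공사 | 2호선 | 용답 | 1 | <NA> | (F1)섬식(상) 1-4 | 125 | 80 | 지상1층 | 지상2층

Last rows

Row | 철도운영기관명 | 선명 | 역명 | 휠체어리프트의 관리번호 | 출입구번호 | 상세위치 | 길이 | 폭 | 시작층 | 종료층
30 | 서울교통공사 | 3호선 | 종로3가 | 7 | <NA> | 을지3가측 1-2층 | <NA> | <NA> | 지하2층 | 지하1층
31 | 서울교통공사 | 4호선 | 상계 | 1 | <NA> | (F2)상행 3-1 | 125 | 110 | 지상2층 | 지상3층
32 | 서울교통공사 | 4호선 | 상계 | 2 | <NA> | (F2)하행 8-3 | 125 | 110 | 지상2층 | 지상3층
33 | 서울교통공사 | 4호선 | 창동 | 1 | <NA> | (F2)2번출입구 1호선 연결통로 | 125 | 80 | 지상2층 | 지상3층
34 | 서울교통공사 | 4호선 | 창동 | 2 | <NA> | (F1)1호선 상선측 승강장 | 125 | 80 | 지상1층 | 지상3층
35 | 서울교통공사 | 4호선 | 창동 | 3 | <NA> | (F1)1호선 하선측 승강장 | 125 | 80 | 지상1층 | 지상3층
36 | 서울교통공사 | 4호선 | 창동 | 4 | <NA> | (F2)1번출입구 1호선 연결통로 | 125 | 80 | 지상2층 | 지상3층
37 | 서울교통공사 | 4호선 | 동대문 | 1 | <NA> | (환승통로환승통로)1/4호선(환승통로) | 125 | 94 | <NA> | <NA>
38 | 서울교통공사 | 4호선 | 이촌(국립중앙박물관) | 1 | <NA> | (환승통로환승통로)국철 연결통로 | 125 | 80 | <NA> | <NA>
39 | 서울교통공사 | 4호선 | 사당 | 1 | <NA> | (B1)2/4호선대합실연결통로 | 125 | 80 | 지하1층 | 지하1층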