Overview

Dataset statistics

Number of variables: 7
Number of observations: 179
Missing cells: 100
Missing cells (%): 8.0%
Duplicate rows: 25
Duplicate rows (%): 14.0%
Total size in memory: 10.4 KiB
Average record size in memory: 59.7 B
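
The percentages above follow from the raw counts. A minimal sketch of that arithmetic, assuming (the report does not state this) that the missing-cell share is taken over all rows times columns and the duplicate share over all rows:

```python
# Counts copied from the overview of this report.
n_variables = 7
n_observations = 179
missing_cells = 100
duplicate_rows = 25

# Assumption: the missing-cell share is relative to every cell (rows x columns).
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate share is relative to the number of rows.
duplicate_rows_pct = 100 * duplicate_rows / n_observations

# All 100 missing cells belong to the 출입구번호 column, so its per-column
# share is relative to the number of rows only.
exit_number_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells: {missing_cells_pct:.1f}%")
print(f"Duplicate rows: {duplicate_rows_pct:.1f}%")
print(f"출입구번호 missing: {exit_number_missing_pct:.1f}%")
```

Under these assumptions the rounded results agree with the 8.0%, 14.0% and 55.9% figures shown in the report.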

Variable types

Categorical: 3
Text: 2
Numeric: 2

Dataset

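Description: Elevator data for the urban and metropolitan railway stations on Seoul Metropolitan Area Line 7. The fields are railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출입구번호), detailed location (상세위치), rated passenger capacity (정원_인원), and rated load (정원_중량(kg)).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041395/fileData.do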

Alerts

선명 has constant value "7호선" (Constant)
Dataset has 25 (14.0%) duplicate rows (Duplicates)
정원_중량(kg) is highly overall correlated with 철도운영기관명 and 1 other field (High correlation)
철도운영기관명 is highly overall correlated with 정원_중량(kg) and 1 other field (High correlation)
정원_인원 is highly overall correlated with 정원_중량(kg) and 1 other field (High correlation)
출입구번호 has 100 (55.9%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 18:44:08.822906
Analysis finished: 2023-12-12 18:44:10.317133
Duration: 1.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
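
A report of this kind can be regenerated along the following lines. This is only a sketch: the function name `build_profile`, the file paths, the report title and the encoding are assumptions and do not come from this report or its config.json.

```python
def build_profile(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write the result as an HTML report."""
    # Imported lazily so the sketch can be loaded without the heavy dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source data is a CSV file readable with this encoding.
    df = pd.read_csv(csv_path, encoding=encoding)

    # The title is a placeholder, not the title of the original report.
    profile = ProfileReport(df, title="Line 7 elevator data")
    profile.to_file(html_path)
    return profile


# Hypothetical usage (file names and encoding are placeholders):
# build_profile("line7_elevators.csv", "report.html", encoding="cp949")
```

Korean public-data CSV files are often encoded as cp949 rather than UTF-8, which is why the encoding is left as a parameter here.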

Variables

철도운영기관명
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
서울교통공사: 127
인천교통공사: 52

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
서울교통공사 | 127 | 70.9%
인천교통공사 | 52 | 29.1%

Length

Histogram of lengths of the category


선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
7호선: 179

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 7호선
2nd row: 7호선
3rd row: 7호선
4th row: 7호선
5th row: 7호선

Common Values

Value | Count | Frequency (%)
7호선 | 179 | 100.0%

Length

Histogram of lengths of the category


역명
Text

Distinct: 52
Distinct (%): 29.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 11
Median length: 10
Mean length: 4.3910615
Min length: 2

Characters and Unicode

Total characters: 786
Distinct characters: 115
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 1.1%

Sample

1st row: 가산디지털단지
2nd row: 가산디지털단지
3rd row: 가산디지털단지
4th row: 가산디지털단지
5th row: 강남구청
Value | Count | Frequency (%)
석남(거북시장 | 11 | 6.1%
부평구청 | 9 | 5.0%
내방 | 5 | 2.8%
도봉산 | 5 | 2.8%
상봉(시외버스터미널 | 5 | 2.8%
산곡 | 4 | 2.2%
굴포천 | 4 | 2.2%
부천종합운동장 | 4 | 2.2%
신대방삼거리 | 4 | 2.2%
신풍 | 4 | 2.2%
Other values (42) | 124 | 69.3%

Most occurring characters

ValueCountFrequency (%)
( 37
 
4.7%
) 37
 
4.7%
31
 
3.9%
30
 
3.8%
25
 
3.2%
20
 
2.5%
19
 
2.4%
19
 
2.4%
17
 
2.2%
17
 
2.2%
Other values (105) 534
67.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 712
90.6%
Open Punctuation 37
 
4.7%
Close Punctuation 37
 
4.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
31
 
4.4%
30
 
4.2%
25
 
3.5%
20
 
2.8%
19
 
2.7%
19
 
2.7%
17
 
2.4%
17
 
2.4%
16
 
2.2%
16
 
2.2%
Other values (103) 502
70.5%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 712
90.6%
Common 74
 
9.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
31
 
4.4%
30
 
4.2%
25
 
3.5%
20
 
2.8%
19
 
2.7%
19
 
2.7%
17
 
2.4%
17
 
2.4%
16
 
2.2%
16
 
2.2%
Other values (103) 502
70.5%
Common
ValueCountFrequency (%)
( 37
50.0%
) 37
50.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 712
90.6%
ASCII 74
 
9.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
( 37
50.0%
) 37
50.0%
Hangul
ValueCountFrequency (%)
31
 
4.4%
30
 
4.2%
25
 
3.5%
20
 
2.8%
19
 
2.7%
19
 
2.7%
17
 
2.4%
17
 
2.4%
16
 
2.2%
16
 
2.2%
Other values (103) 502
70.5%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 10
Distinct (%): 12.7%
Missing: 100
Missing (%): 55.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6329114
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 7.1
Maximum: 11
Range: 10
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.3214563
Coefficient of variation (CV): 0.63900714
Kurtosis: 0.86230468
Mean: 3.6329114
Median Absolute Deviation (MAD): 2
Skewness: 1.0091471
Sum: 287
Variance: 5.3891594
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
3 | 19 | 10.6%
1 | 16 | 8.9%
2 | 12 | 6.7%
5 | 10 | 5.6%
6 | 7 | 3.9%
4 | 6 | 3.4%
7 | 5 | 2.8%
10 | 2 | 1.1%
8 | 1 | 0.6%
11 | 1 | 0.6%
(Missing) | 100 | 55.9%

Value | Count | Frequency (%)
1 | 16 | 8.9%
2 | 12 | 6.7%
3 | 19 | 10.6%
4 | 6 | 3.4%
5 | 10 | 5.6%
6 | 7 | 3.9%
7 | 5 | 2.8%
8 | 1 | 0.6%
10 | 2 | 1.1%
11 | 1 | 0.6%

Value | Count | Frequency (%)
11 | 1 | 0.6%
10 | 2 | 1.1%
8 | 1 | 0.6%
7 | 5 | 2.8%
6 | 7 | 3.9%
5 | 10 | 5.6%
4 | 6 | 3.4%
3 | 19 | 10.6%
2 | 12 | 6.7%
1 | 16 | 8.9%

상세위치
Text

Distinct: 91
Distinct (%): 50.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 70
Median length: 67
Mean length: 21.821229
Min length: 3

Characters and Unicode

Total characters: 3906
Distinct characters: 132
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 70
Unique (%): 39.1%

Sample

1st row: (B2-B4) 승강장
2nd row: (B2-B4) 승강장
3rd row: (B1-B2)지하2층 대합실
4th row: (F1-B1)6번 출입구
5th row: (B2-B3) 승강장
ValueCountFrequency (%)
출입구 99
 
12.1%
승강장 79
 
9.7%
방향 44
 
5.4%
b1 38
 
4.7%
b2-b3 30
 
3.7%
대합실 26
 
3.2%
1f 26
 
3.2%
엘리베이터 24
 
2.9%
24
 
2.9%
출입문 19
 
2.3%
Other values (143) 406
49.8%

Most occurring characters

ValueCountFrequency (%)
641
 
16.4%
B 260
 
6.7%
( 244
 
6.2%
) 244
 
6.2%
1 223
 
5.7%
- 162
 
4.1%
2 128
 
3.3%
128
 
3.3%
121
 
3.1%
120
 
3.1%
Other values (122) 1635
41.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1627
41.7%
Space Separator 641
 
16.4%
Decimal Number 575
 
14.7%
Uppercase Letter 357
 
9.1%
Open Punctuation 244
 
6.2%
Close Punctuation 244
 
6.2%
Dash Punctuation 162
 
4.1%
Other Punctuation 36
 
0.9%
Math Symbol 20
 
0.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
128
 
7.9%
121
 
7.4%
120
 
7.4%
107
 
6.6%
96
 
5.9%
93
 
5.7%
84
 
5.2%
52
 
3.2%
51
 
3.1%
41
 
2.5%
Other values (100) 734
45.1%
Decimal Number
ValueCountFrequency (%)
1 223
38.8%
2 128
22.3%
3 91
15.8%
4 58
 
10.1%
5 29
 
5.0%
7 24
 
4.2%
6 13
 
2.3%
8 5
 
0.9%
0 3
 
0.5%
9 1
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
B 260
72.8%
F 94
 
26.3%
M 1
 
0.3%
S 1
 
0.3%
G 1
 
0.3%
Math Symbol
ValueCountFrequency (%)
> 10
50.0%
< 10
50.0%
Space Separator
ValueCountFrequency (%)
641
100.0%
Open Punctuation
ValueCountFrequency (%)
( 244
100.0%
Close Punctuation
ValueCountFrequency (%)
) 244
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 162
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 36
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1922
49.2%
Hangul 1627
41.7%
Latin 357
 
9.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
128
 
7.9%
121
 
7.4%
120
 
7.4%
107
 
6.6%
96
 
5.9%
93
 
5.7%
84
 
5.2%
52
 
3.2%
51
 
3.1%
41
 
2.5%
Other values (100) 734
45.1%
Common
ValueCountFrequency (%)
641
33.4%
( 244
 
12.7%
) 244
 
12.7%
1 223
 
11.6%
- 162
 
8.4%
2 128
 
6.7%
3 91
 
4.7%
4 58
 
3.0%
/ 36
 
1.9%
5 29
 
1.5%
Other values (7) 66
 
3.4%
Latin
ValueCountFrequency (%)
B 260
72.8%
F 94
 
26.3%
M 1
 
0.3%
S 1
 
0.3%
G 1
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2279
58.3%
Hangul 1627
41.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
641
28.1%
B 260
11.4%
( 244
 
10.7%
) 244
 
10.7%
1 223
 
9.8%
- 162
 
7.1%
2 128
 
5.6%
F 94
 
4.1%
3 91
 
4.0%
4 58
 
2.5%
Other values (12) 134
 
5.9%
Hangul
ValueCountFrequency (%)
128
 
7.9%
121
 
7.4%
120
 
7.4%
107
 
6.6%
96
 
5.9%
93
 
5.7%
84
 
5.2%
52
 
3.2%
51
 
3.1%
41
 
2.5%
Other values (100) 734
45.1%

정원_인원
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
15: 132
17: 16
24: 15
11: 12
20: 4

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 15
2nd row: 15
3rd row: 15
4th row: 15
5th row: 15

Common Values

Value | Count | Frequency (%)
15 | 132 | 73.7%
17 | 16 | 8.9%
24 | 15 | 8.4%
11 | 12 | 6.7%
20 | 4 | 2.2%

Length

Histogram of lengths of the category


정원_중량(kg)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1064.7486
Minimum: 750
Maximum: 1600
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 750
5-th percentile: 750
Q1: 1000
median: 1000
Q3: 1080
95-th percentile: 1600
Maximum: 1600
Range: 850
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 189.45536
Coefficient of variation (CV): 0.17793436
Kurtosis: 3.1646237
Mean: 1064.7486
Median Absolute Deviation (MAD): 0
Skewness: 1.6711247
Sum: 190590
Variance: 35893.335
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value | Count | Frequency (%)
1000 | 107 | 59.8%
1150 | 16 | 8.9%
1600 | 15 | 8.4%
1080 | 13 | 7.3%
750 | 12 | 6.7%
1050 | 10 | 5.6%
1350 | 4 | 2.2%
1125 | 2 | 1.1%

Value | Count | Frequency (%)
750 | 12 | 6.7%
1000 | 107 | 59.8%
1050 | 10 | 5.6%
1080 | 13 | 7.3%
1125 | 2 | 1.1%
1150 | 16 | 8.9%
1350 | 4 | 2.2%
1600 | 15 | 8.4%

Value | Count | Frequency (%)
1600 | 15 | 8.4%
1350 | 4 | 2.2%
1150 | 16 | 8.9%
1125 | 2 | 1.1%
1080 | 13 | 7.3%
1050 | 10 | 5.6%
1000 | 107 | 59.8%
750 | 12 | 6.7%
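
The descriptive statistics for 정원_중량(kg) can be cross-checked from its value counts alone. A minimal sketch using only the Python standard library; it assumes the reported variance and standard deviation use the sample (n - 1) denominator, which matches the figures in this report:

```python
from statistics import mean, median, stdev, variance

# Value counts copied from the frequency table for 정원_중량(kg).
value_counts = {750: 12, 1000: 107, 1050: 10, 1080: 13,
                1125: 2, 1150: 16, 1350: 4, 1600: 15}

# Expand the counts back into one value per elevator.
weights = [value for value, count in value_counts.items() for _ in range(count)]

n = len(weights)
total = sum(weights)
avg = mean(weights)
var = variance(weights)   # sample variance (n - 1 denominator)
sd = stdev(weights)       # sample standard deviation
cv = sd / avg             # coefficient of variation
med = median(weights)
# Median absolute deviation: median of the distances to the median.
mad = median(abs(w - med) for w in weights)

print(n, total, round(avg, 4), round(var, 3), round(sd, 5), round(cv, 8), med, mad)
```

Because 107 of the 179 elevators share the value 1000, more than half of the absolute deviations from the median are zero, which is why the MAD is 0 even though the range is 850.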

Interactions


Correlations

Variable | 철도운영기관명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
철도운영기관명 | 1.000 | 1.000 | 0.358 | 1.000 | 0.489 | 0.801
역명 | 1.000 | 1.000 | 0.733 | 0.000 | 0.849 | 0.898
출입구번호 | 0.358 | 0.733 | 1.000 | 1.000 | 0.333 | 0.000
상세위치 | 1.000 | 0.000 | 1.000 | 1.000 | 0.965 | 0.993
정원_인원 | 0.489 | 0.849 | 0.333 | 0.965 | 1.000 | 0.988
정원_중량(kg) | 0.801 | 0.898 | 0.000 | 0.993 | 0.988 | 1.000

Variable | 철도운영기관명 | 정원_인원
철도운영기관명 | 1.000 | 0.588
정원_인원 | 0.588 | 1.000

Variable | 출입구번호 | 정원_중량(kg) | 철도운영기관명 | 정원_인원
출입구번호 | 1.000 | 0.233 | 0.197 | 0.167
정원_중량(kg) | 0.233 | 1.000 | 0.922 | 0.958
철도운영기관명 | 0.197 | 0.922 | 1.000 | 0.588
정원_인원 | 0.167 | 0.958 | 0.588 | 1.000
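
The extracted text does not say which coefficient each matrix above uses. ydata-profiling offers Cramér's V among its association measures for pairs of categorical variables, so as an illustration only, here is a plain (not bias-corrected) Cramér's V computed on made-up contingency tables. The function name `cramers_v` and both tables are hypothetical, and the results are not meant to reproduce any value in this report.

```python
from math import sqrt


def cramers_v(table):
    """Plain Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero and that the
    table has at least two rows and two columns.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Made-up tables: one perfectly associated, one with no association.
perfect = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(cramers_v(perfect), cramers_v(independent))
```

A value near 1 means that knowing one variable almost determines the other, while a value near 0 means the two variables are close to independent.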

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Row | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
0 | 서울교통공사 | 7호선 | 가산디지털단지 | <NA> | (B2-B4) 승강장 | 15 | 1000
1 | 서울교통공사 | 7호선 | 가산디지털단지 | <NA> | (B2-B4) 승강장 | 15 | 1000
2 | 서울교통공사 | 7호선 | 가산디지털단지 | <NA> | (B1-B2)지하2층 대합실 | 15 | 1000
3 | 서울교통공사 | 7호선 | 가산디지털단지 | 6 | (F1-B1)6번 출입구 | 15 | 1000
4 | 서울교통공사 | 7호선 | 강남구청 | <NA> | (B2-B3) 승강장 | 15 | 1000
5 | 서울교통공사 | 7호선 | 강남구청 | <NA> | (B2-B3) 승강장 | 15 | 1000
6 | 서울교통공사 | 7호선 | 건대입구 | <NA> | (B2-B3) 승강장 | 15 | 1000
7 | 서울교통공사 | 7호선 | 건대입구 | <NA> | (B2-B3) 승강장 | 15 | 1000
8 | 서울교통공사 | 7호선 | 고속터미널 | <NA> | (B2-B3) 승강장 | 15 | 1000
9 | 서울교통공사 | 7호선 | 고속터미널 | <NA> | (B2-B3) 승강장 | 15 | 1000

Row | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
169 | 인천교통공사 | 7호선 | 석남(거북시장) | 6 | (B1) 4번/7번 출입구 오른방향 안쪽 엘리베이터 (1F) 7번 출입구 뒷편 고가도로 밑(2호선 8번 출입구 앞) | 20 | 1350
170 | 인천교통공사 | 7호선 | 석남(거북시장) | 7 | (B1) 2번/3번 출입구 왼쪽 방향 엘리베이터 (1F) 2번 출입구 뒷편 고가도로 밑(2호선 3번 출입구 뒷편) | 20 | 1350
171 | 인천교통공사 | 7호선 | 석남(거북시장) | <NA> | (B4) 2호선<->7호선 환승통로 (B2) 2호선<->7호선 환승통로 | 15 | 1150
172 | 인천교통공사 | 7호선 | 신중동 | <NA> | (B2) 춘의역 방향 승강장 4-1/ 부천시청역 방향 승강장 5-3의 사이의 중앙 (B1) 표 내는 곳 내 대합실 중앙 | 15 | 1050
173 | 인천교통공사 | 7호선 | 신중동 | 1 | (B1) 대합실 1번/2번 출입구 (1F) 1번/2번 출입구 사이 | 15 | 1050
174 | 인천교통공사 | 7호선 | 신중동 | 7 | (B1) 대합실 7번 출입구 (1F) 7번 출입구 사이 | 15 | 1050
175 | 인천교통공사 | 7호선 | 신중동 | 3 | (B1) 대합실 3번 출입구 (1F) 3번 출입구 인근 | 15 | 1050
176 | 인천교통공사 | 7호선 | 춘의 | <NA> | (B2) 부천종합운동장역 방향 5-1/ 신중동 방향 4-4 (B1) 고객안내센터 앞 대합실 | 15 | 1080
177 | 인천교통공사 | 7호선 | 춘의 | 3 | (B1) 대합실 3번/4번 출입구 (1F) 3번/4번 출입구 사이 | 15 | 1080
178 | 인천교통공사 | 7호선 | 춘의 | 7 | (B1) 대합실 7번/8번 출입구 (1F) 3번/4번 출입구 사이 | 15 | 1080

Duplicate rows

Most frequently occurring

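Row | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg) | # duplicates
0 | 서울교통공사 | 7호선 | 가산디지털단지 | <NA> | (B2-B4) 승강장 | 15 | 1000 | 2
1 | 서울교통공사 | 7호선 | 강남구청 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
2 | 서울교통공사 | 7호선 | 건대입구 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
3 | 서울교통공사 | 7호선 | 고속터미널 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
4 | 서울교통공사 | 7호선 | 남성 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
5 | 서울교통공사 | 7호선 | 내방 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
6 | 서울교통공사 | 7호선 | 논현 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
7 | 서울교통공사 | 7호선 | 대림(구로구청) | <NA> | (B1-B2) 승강장 | 15 | 1000 | 2
8 | 서울교통공사 | 7호선 | 도봉산 | 2 | (F2-F1) 승강장 | 11 | 750 | 2
9 | 서울교통공사 | 7호선 | 도봉산 | <NA> | (F2-F1) 승강장 | 15 | 1000 | 2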
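
Duplicate groups like those listed above can be found by counting identical rows. A minimal sketch with the standard library, using the first four rows of the report's sample and `None` in place of `<NA>`; it assumes the "# duplicates" column counts how many times each row occurs, which is consistent with the sample but not stated in the report:

```python
from collections import Counter

# First four rows of the sample shown in this report; None stands in for <NA>.
rows = [
    ("서울교통공사", "7호선", "가산디지털단지", None, "(B2-B4) 승강장", 15, 1000),
    ("서울교통공사", "7호선", "가산디지털단지", None, "(B2-B4) 승강장", 15, 1000),
    ("서울교통공사", "7호선", "가산디지털단지", None, "(B1-B2)지하2층 대합실", 15, 1000),
    ("서울교통공사", "7호선", "가산디지털단지", 6, "(F1-B1)6번 출입구", 15, 1000),
]

# Count how often each complete row occurs, then keep rows seen more than once.
occurrences = Counter(rows)
duplicates = {row: count for row, count in occurrences.items() if count > 1}

for row, count in duplicates.items():
    print(count, row)
```

Many of the listed duplicates are platform (승강장) elevators with no exit number, so two physically distinct elevators at the same station can produce identical records. Dropping such rows would undercount elevators, so it may be safer to keep them unless the source confirms they are true repeats.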