Overview

Dataset statistics

Number of variables: 7
Number of observations: 304
Missing cells: 159
Missing cells (%): 7.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.6 KiB
Average record size in memory: 59.4 B
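
The missing-cell percentage follows from the other counts in this block, since every observation contributes one cell per variable. A minimal Python sketch of that arithmetic (the variable names are illustrative, not taken from the profiling tool):

```python
# Figures copied from the dataset statistics of this report.
n_variables = 7
n_observations = 304
missing_cells = 159

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```

All 159 missing cells belong to 출입구번호, so the same count divided by 304 rows gives that column's 52.3% missing rate.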

Variable types

Categorical: 4
Text: 2
Numeric: 1

Dataset

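Description: Elevator data for the lines operated by 대구교통공사 (Daegu Transportation Corporation). The fields are railway operator name, line name, station name, entrance number, detailed location, rated capacity (persons), and rated load (kg).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041380/fileData.do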

Alerts

철도운영기관명 has constant value "대구교통공사" (Constant)
정원_인원 is highly overall correlated with 정원_중량(kg) (High correlation)
정원_중량(kg) is highly overall correlated with 정원_인원 (High correlation)
정원_인원 is highly imbalanced (75.9%) (Imbalance)
정원_중량(kg) is highly imbalanced (78.6%) (Imbalance)
출입구번호 has 159 (52.3%) missing values (Missing)
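
The two imbalance percentages are consistent with one minus the normalised Shannon entropy of the class counts reported for each variable. That reading is inferred from the numbers in this report, not from the tool's source code, and `imbalance_score` is a hypothetical helper name. A minimal sketch:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy (in bits) of the
    class frequencies and k is the number of classes. 0 means balanced."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Convention chosen for this sketch: one class has no spread to measure.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts as listed in the Common Values tables of this report.
capacity_persons = [282, 11, 7, 4]    # 정원_인원
rated_load_kg = [282, 10, 7, 4, 1]    # 정원_중량(kg)

score_persons = imbalance_score(capacity_persons)
score_load = imbalance_score(rated_load_kg)

print(f"{score_persons:.1%}")
print(f"{score_load:.1%}")
```

With these counts the sketch gives 75.9% and 78.6%, matching the alerts. Both variables are dominated by a single class with 282 of 304 rows.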

Reproduction

Analysis started: 2023-12-12 16:55:31.128472
Analysis finished: 2023-12-12 16:55:32.123824
Duration: 1 second
Software version: ydata-profiling v4.5.1
Download configuration: config.json
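
A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration that produced this report: the function name `build_report`, the file names, the report title, and the `cp949` encoding are assumptions. Only `pandas.read_csv`, `ProfileReport`, and `ProfileReport.to_file` are actual library calls.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write an HTML report (sketch)."""
    # Imported lazily so that defining the function does not require the
    # third-party packages to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Elevator data profile")  # title is a placeholder
    report.to_file(html_path)
    return report


# Hypothetical usage; both file names and the encoding are placeholders.
# build_report("elevators.csv", "elevator_report.html", encoding="cp949")
```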

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대구교통공사
2nd row: 대구교통공사
3rd row: 대구교통공사
4th row: 대구교통공사
5th row: 대구교통공사

Common Values

Value | Count | Frequency (%)
대구교통공사 | 304 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

선명
Categorical

Distinct: 3
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
1호선 | 117 | 38.5%
3호선 | 103 | 33.9%
2호선 | 84 | 27.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

역명
Text

Distinct: 88
Distinct (%): 28.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.9835526
Min length: 2

Characters and Unicode

Total characters: 1211
Distinct characters: 137
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
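
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one station name that appears in the sample rows of this report; `category_counts` is a hypothetical helper, and the profiling tool may compute its figures differently.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (two-letter codes)."""
    return Counter(unicodedata.category(ch) for ch in text)


# A station name that appears in the sample rows of this report.
station = "대곡(정부대구청사)"
counts = category_counts(station)
print(counts)
```

The two-letter codes map onto the category names used in this report: `Lo` is Other Letter (the Hangul syllables), `Ps` is Open Punctuation, and `Pe` is Close Punctuation.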

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 각산
2nd row: 각산
3rd row: 각산
4th row: 각산
5th row: 교대

Value | Count | Frequency (%)
청라언덕 | 10 | 3.3%
명덕(2.28민주운동기념회관 | 10 | 3.3%
반월당 | 6 | 2.0%
대공원 | 6 | 2.0%
율하 | 5 | 1.6%
용지 | 4 | 1.3%
강창 | 4 | 1.3%
각산 | 4 | 1.3%
수성시장 | 4 | 1.3%
임당 | 4 | 1.3%
Other values (78) | 247 | 81.2%

Most occurring characters

Value | Count | Frequency (%)
[lost] | 63 | 5.2%
( | 42 | 3.5%
) | 42 | 3.5%
[lost] | 34 | 2.8%
[lost] | 31 | 2.6%
[lost] | 29 | 2.4%
[lost] | 27 | 2.2%
[lost] | 24 | 2.0%
[lost] | 22 | 1.8%
[lost] | 21 | 1.7%
Other values (127) | 876 | 72.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1060 | 87.5%
Open Punctuation | 42 | 3.5%
Close Punctuation | 42 | 3.5%
Decimal Number | 30 | 2.5%
Other Punctuation | 22 | 1.8%
Uppercase Letter | 15 | 1.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[lost] | 63 | 5.9%
[lost] | 34 | 3.2%
[lost] | 31 | 2.9%
[lost] | 29 | 2.7%
[lost] | 27 | 2.5%
[lost] | 24 | 2.3%
[lost] | 22 | 2.1%
[lost] | 21 | 2.0%
[lost] | 21 | 2.0%
[lost] | 20 | 1.9%
Other values (116) | 768 | 72.5%

Uppercase Letter
Value | Count | Frequency (%)
B | 5 | 33.3%
C | 3 | 20.0%
T | 3 | 20.0%
S | 2 | 13.3%
K | 2 | 13.3%

Decimal Number
Value | Count | Frequency (%)
2 | 20 | 66.7%
8 | 10 | 33.3%

Other Punctuation
Value | Count | Frequency (%)
· | 12 | 54.5%
. | 10 | 45.5%

Open Punctuation
Value | Count | Frequency (%)
( | 42 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 42 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1060 | 87.5%
Common | 136 | 11.2%
Latin | 15 | 1.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[lost] | 63 | 5.9%
[lost] | 34 | 3.2%
[lost] | 31 | 2.9%
[lost] | 29 | 2.7%
[lost] | 27 | 2.5%
[lost] | 24 | 2.3%
[lost] | 22 | 2.1%
[lost] | 21 | 2.0%
[lost] | 21 | 2.0%
[lost] | 20 | 1.9%
Other values (116) | 768 | 72.5%

Common
Value | Count | Frequency (%)
( | 42 | 30.9%
) | 42 | 30.9%
2 | 20 | 14.7%
· | 12 | 8.8%
8 | 10 | 7.4%
. | 10 | 7.4%

Latin
Value | Count | Frequency (%)
B | 5 | 33.3%
C | 3 | 20.0%
T | 3 | 20.0%
S | 2 | 13.3%
K | 2 | 13.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1060 | 87.5%
ASCII | 139 | 11.5%
None | 12 | 1.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[lost] | 63 | 5.9%
[lost] | 34 | 3.2%
[lost] | 31 | 2.9%
[lost] | 29 | 2.7%
[lost] | 27 | 2.5%
[lost] | 24 | 2.3%
[lost] | 22 | 2.1%
[lost] | 21 | 2.0%
[lost] | 21 | 2.0%
[lost] | 20 | 1.9%
Other values (116) | 768 | 72.5%

ASCII
Value | Count | Frequency (%)
( | 42 | 30.2%
) | 42 | 30.2%
2 | 20 | 14.4%
8 | 10 | 7.2%
. | 10 | 7.2%
B | 5 | 3.6%
C | 3 | 2.2%
T | 3 | 2.2%
S | 2 | 1.4%
K | 2 | 1.4%

None
Value | Count | Frequency (%)
· | 12 | 100.0%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 9
Distinct (%): 6.2%
Missing: 159
Missing (%): 52.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9034483
Minimum: 1
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 3
Q3: 4
95-th percentile: 5.8
Maximum: 23
Range: 22
Interquartile range (IQR): 3
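
These quantiles can be recomputed from the value counts reported for this variable. The sketch below assumes linear interpolation between the two nearest ranks, a common default that happens to match the figures above; the `quantile` helper is hypothetical and is not claimed to be the tool's own routine.

```python
import math

# Value -> count for the 145 non-missing entrance numbers in this report.
value_counts = {1: 40, 2: 26, 3: 32, 4: 33, 5: 6, 6: 3, 7: 2, 8: 2, 23: 1}
values = sorted(v for v, n in value_counts.items() for _ in range(n))


def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1

print(q1, median, q3, round(p95, 1), iqr)
```

The 95th percentile falls between the last 5 and the first 6 in the sorted data, which is why it is the only non-integer quantile.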

Descriptive statistics

Standard deviation: 2.2831634
Coefficient of variation (CV): 0.7863627
Kurtosis: 41.165903
Mean: 2.9034483
Median Absolute Deviation (MAD): 1
Skewness: 4.976626
Sum: 421
Variance: 5.2128352
Monotonicity: Not monotonic
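
Most of these figures can be checked with Python's standard `statistics` module, again starting from the value counts reported for this variable. The sketch assumes the sample (n - 1) variance, which is the convention that reproduces the reported value; skewness and kurtosis are left out.

```python
import statistics

# Value -> count for the 145 non-missing entrance numbers in this report.
value_counts = {1: 40, 2: 26, 3: 32, 4: 33, 5: 6, 6: 3, 7: 2, 8: 2, 23: 1}
values = [v for v, n in value_counts.items() for _ in range(n)]

n = len(values)
total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)  # median absolute deviation

print(n, total, round(mean, 7), round(variance, 7), round(cv, 7), mad)
```

The single entry of 23 is far from the rest of the data (the next largest value is 8), which is consistent with the large skewness and kurtosis reported above.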
[Figure: histogram with fixed size bins (bins=9)]

Common Values

Value | Count | Frequency (%)
1 | 40 | 13.2%
4 | 33 | 10.9%
3 | 32 | 10.5%
2 | 26 | 8.6%
5 | 6 | 2.0%
6 | 3 | 1.0%
7 | 2 | 0.7%
8 | 2 | 0.7%
23 | 1 | 0.3%
(Missing) | 159 | 52.3%

Values in ascending order

Value | Count | Frequency (%)
1 | 40 | 13.2%
2 | 26 | 8.6%
3 | 32 | 10.5%
4 | 33 | 10.9%
5 | 6 | 2.0%
6 | 3 | 1.0%
7 | 2 | 0.7%
8 | 2 | 0.7%
23 | 1 | 0.3%

Values in descending order

Value | Count | Frequency (%)
23 | 1 | 0.3%
8 | 2 | 0.7%
7 | 2 | 0.7%
6 | 3 | 1.0%
5 | 6 | 2.0%
4 | 33 | 10.9%
3 | 32 | 10.5%
2 | 26 | 8.6%
1 | 40 | 13.2%

상세위치
Text

Distinct: 293
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 101
Median length: 62
Mean length: 42.069079
Min length: 19

Characters and Unicode

Total characters: 12789
Distinct characters: 275
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 284
Unique (%): 93.4%

Sample

1st row: (1F) 1번 출입구 앞 (B1F) 발매기실 앞
2nd row: (B1F) 설화명곡역 방향 표 내는 곳 (B2F) 설화명곡역 방향 6-2 출입문 앞
3rd row: (B1F) 안심역 방향 표 내는 곳 (B2F) 안심역 방향 3-3 출입문 앞
4th row: (1F) 2번/3번 출입구 사이 (B1F) 2번/3번 출입구 방향
5th row: (1F) 4번 출입구 앞 (B2F) 2코너 표 내는 곳 옆

Value | Count | Frequency (%)
출입구 | 287 | 9.2%
방향 | 247 | 7.9%
[lost] | 187 | 6.0%
1f | 147 | 4.7%
b1f | 142 | 4.5%
[lost] | 141 | 4.5%
출입문 | 121 | 3.9%
승강장 | 119 | 3.8%
사이 | 87 | 2.8%
b2f | 80 | 2.6%
Other values (446) | 1573 | 50.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2848 | 22.3%
( | 667 | 5.2%
) | 667 | 5.2%
F | 639 | 5.0%
1 | 543 | 4.2%
[lost] | 432 | 3.4%
[lost] | 431 | 3.4%
[lost] | 403 | 3.2%
2 | 363 | 2.8%
B | 340 | 2.7%
Other values (265) | 5456 | 42.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 5743 | 44.9%
Space Separator | 2848 | 22.3%
Decimal Number | 1472 | 11.5%
Uppercase Letter | 1039 | 8.1%
Open Punctuation | 667 | 5.2%
Close Punctuation | 667 | 5.2%
Other Punctuation | 173 | 1.4%
Dash Punctuation | 165 | 1.3%
Lowercase Letter | 14 | 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[lost] | 432 | 7.5%
[lost] | 431 | 7.5%
[lost] | 403 | 7.0%
[lost] | 336 | 5.9%
[lost] | 312 | 5.4%
[lost] | 268 | 4.7%
[lost] | 193 | 3.4%
[lost] | 171 | 3.0%
[lost] | 156 | 2.7%
[lost] | 149 | 2.6%
Other values (231) | 2892 | 50.4%

Uppercase Letter
Value | Count | Frequency (%)
F | 639 | 61.5%
B | 340 | 32.7%
S | 19 | 1.8%
E | 11 | 1.1%
P | 9 | 0.9%
D | 8 | 0.8%
G | 2 | 0.2%
A | 2 | 0.2%
T | 2 | 0.2%
M | 2 | 0.2%
Other values (5) | 5 | 0.5%

Decimal Number
Value | Count | Frequency (%)
1 | 543 | 36.9%
2 | 363 | 24.7%
3 | 293 | 19.9%
4 | 137 | 9.3%
5 | 52 | 3.5%
6 | 40 | 2.7%
8 | 19 | 1.3%
7 | 18 | 1.2%
0 | 6 | 0.4%
9 | 1 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 168 | 97.1%
# | 3 | 1.7%
" | 2 | 1.2%

Space Separator
Value | Count | Frequency (%)
(space) | 2848 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 667 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 667 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 165 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 14 | 100.0%

Math Symbol
Value | Count | Frequency (%)
> | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5993 | 46.9%
Hangul | 5743 | 44.9%
Latin | 1053 | 8.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[lost] | 432 | 7.5%
[lost] | 431 | 7.5%
[lost] | 403 | 7.0%
[lost] | 336 | 5.9%
[lost] | 312 | 5.4%
[lost] | 268 | 4.7%
[lost] | 193 | 3.4%
[lost] | 171 | 3.0%
[lost] | 156 | 2.7%
[lost] | 149 | 2.6%
Other values (231) | 2892 | 50.4%

Common
Value | Count | Frequency (%)
(space) | 2848 | 47.5%
( | 667 | 11.1%
) | 667 | 11.1%
1 | 543 | 9.1%
2 | 363 | 6.1%
3 | 293 | 4.9%
/ | 168 | 2.8%
- | 165 | 2.8%
4 | 137 | 2.3%
5 | 52 | 0.9%
Other values (8) | 90 | 1.5%

Latin
Value | Count | Frequency (%)
F | 639 | 60.7%
B | 340 | 32.3%
S | 19 | 1.8%
m | 14 | 1.3%
E | 11 | 1.0%
P | 9 | 0.9%
D | 8 | 0.8%
G | 2 | 0.2%
A | 2 | 0.2%
T | 2 | 0.2%
Other values (6) | 7 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7046 | 55.1%
Hangul | 5743 | 44.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2848 | 40.4%
( | 667 | 9.5%
) | 667 | 9.5%
F | 639 | 9.1%
1 | 543 | 7.7%
2 | 363 | 5.2%
B | 340 | 4.8%
3 | 293 | 4.2%
/ | 168 | 2.4%
- | 165 | 2.3%
Other values (24) | 353 | 5.0%

Hangul
Value | Count | Frequency (%)
[lost] | 432 | 7.5%
[lost] | 431 | 7.5%
[lost] | 403 | 7.0%
[lost] | 336 | 5.9%
[lost] | 312 | 5.4%
[lost] | 268 | 4.7%
[lost] | 193 | 3.4%
[lost] | 171 | 3.0%
[lost] | 156 | 2.7%
[lost] | 149 | 2.6%
Other values (231) | 2892 | 50.4%

정원_인원
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 13
2nd row: 13
3rd row: 13
4th row: 13
5th row: 13

Common Values

Value | Count | Frequency (%)
13 | 282 | 92.8%
15 | 11 | 3.6%
18 | 7 | 2.3%
21 | 4 | 1.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

정원_중량(kg)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%
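
The difference between Distinct and Unique can be read off the value counts for this variable: distinct counts the different values, while unique counts only the values that occur exactly once. A minimal sketch using the counts reported in this section (the variable names are illustrative):

```python
from collections import Counter

# Value counts of 정원_중량(kg) as reported in this section.
value_counts = Counter({"1000": 282, "1150": 10, "1350": 7, "1600": 4, "1100": 1})

n_rows = sum(value_counts.values())

# Distinct: how many different values occur at all.
distinct = len(value_counts)

# Unique: how many values occur exactly once.
unique = sum(1 for count in value_counts.values() if count == 1)

distinct_pct = 100 * distinct / n_rows
unique_pct = 100 * unique / n_rows

print(distinct, f"{distinct_pct:.1f}%", unique, f"{unique_pct:.1f}%")
```

Here the only value that occurs once is 1100, which accounts for the single unique entry.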

Sample

1st row: 1000
2nd row: 1000
3rd row: 1000
4th row: 1000
5th row: 1000

Common Values

Value | Count | Frequency (%)
1000 | 282 | 92.8%
1150 | 10 | 3.3%
1350 | 7 | 2.3%
1600 | 4 | 1.3%
1100 | 1 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Interactions

[Figure: interactions plot]

Correlations

[Figure: heatmap of the matrix below]

Variable | 선명 | 역명 | 출입구번호 | 정원_인원 | 정원_중량(kg)
선명 | 1.000 | 0.995 | 0.124 | 0.107 | 0.156
역명 | 0.995 | 1.000 | 0.000 | 0.957 | 0.893
출입구번호 | 0.124 | 0.000 | 1.000 | 0.069 | 0.069
정원_인원 | 0.107 | 0.957 | 0.069 | 1.000 | 1.000
정원_중량(kg) | 0.156 | 0.893 | 0.069 | 1.000 | 1.000

[Figure: heatmap of the matrix below]

Variable | 정원_인원 | 선명 | 정원_중량(kg)
정원_인원 | 1.000 | 0.101 | 0.998
선명 | 0.101 | 1.000 | 0.118
정원_중량(kg) | 0.998 | 0.118 | 1.000

[Figure: heatmap of the matrix below]

Variable | 출입구번호 | 선명 | 정원_인원 | 정원_중량(kg)
출입구번호 | 1.000 | 0.092 | 0.054 | 0.054
선명 | 0.092 | 1.000 | 0.101 | 0.118
정원_인원 | 0.054 | 0.101 | 1.000 | 0.998
정원_중량(kg) | 0.054 | 0.118 | 0.998 | 1.000

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
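
As a rough illustration of what these two displays summarise, the sketch below computes the nullity of 출입구번호 for the first ten rows shown in the Sample section and renders it as a one-line text strip. The text rendering is an assumption made for this example; the report itself uses plots.

```python
# 출입구번호 for the first ten rows shown in the Sample section; None marks <NA>.
entrance_numbers = [1, None, None, 3, 4, 3, None, None, 4, 3]

# Per-row nullity flag: True where the value is missing.
missing_flags = [v is None for v in entrance_numbers]

# Column-level summary, as in a nullity-by-column chart.
missing_count = sum(missing_flags)
missing_share = missing_count / len(entrance_numbers)

# Row-level pattern, as in a nullity matrix: '#' = present, '.' = missing.
nullity_strip = "".join("." if m else "#" for m in missing_flags)

print(missing_count, missing_share, nullity_strip)
```

In these ten rows the missing entries come in pairs, and the detailed locations of those rows describe platform-level elevators rather than street entrances. Whether that pattern holds for all 304 rows cannot be told from the sample alone.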

Sample

First rows

Row | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
0 | 대구교통공사 | 1호선 | 각산 | 1 | (1F) 1번 출입구 앞 (B1F) 발매기실 앞 | 13 | 1000
1 | 대구교통공사 | 1호선 | 각산 | <NA> | (B1F) 설화명곡역 방향 표 내는 곳 (B2F) 설화명곡역 방향 6-2 출입문 앞 | 13 | 1000
2 | 대구교통공사 | 1호선 | 각산 | <NA> | (B1F) 안심역 방향 표 내는 곳 (B2F) 안심역 방향 3-3 출입문 앞 | 13 | 1000
3 | 대구교통공사 | 1호선 | 각산 | 3 | (1F) 2번/3번 출입구 사이 (B1F) 2번/3번 출입구 방향 | 13 | 1000
4 | 대구교통공사 | 1호선 | 교대 | 4 | (1F) 4번 출입구 앞 (B2F) 2코너 표 내는 곳 옆 | 13 | 1000
5 | 대구교통공사 | 1호선 | 교대 | 3 | (1F) 3번 출입구 앞 (B2F) 2코너 표 사는 곳 옆 | 13 | 1000
6 | 대구교통공사 | 1호선 | 교대 | <NA> | (B2F) 2코너 표 내는 곳 옆 (B3F) 영대병원역 방향 승강장 6-4근처 비승차구간 | 13 | 1000
7 | 대구교통공사 | 1호선 | 교대 | <NA> | (B2F) 2코너 표 내는 곳 옆 (B3F) 명덕역 방향 승강장 2-3 출입문 앞 | 13 | 1000
8 | 대구교통공사 | 1호선 | 대곡(정부대구청사) | 4 | (1F) 4번 출입구 앞 (B2F) 대합실 고객안내센터 옆 (B3F) 화원역 방향승강장 4-1 출입문 앞 | 13 | 1000
9 | 대구교통공사 | 1호선 | 대곡(정부대구청사) | 3 | (1F) 3번 출입구 앞 (B2F) 대합실 발매기실 옆 (B3F) 화원역 방향 승강장 6-1출입문 앞 | 13 | 1000

Last rows

Row | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
294 | 대구교통공사 | 3호선 | 팔달시장 | 1 | (1F) 1번 출입구 옆 (2F) 1번 출입구 대합실 표 내는 곳 옆 | 18 | 1350
295 | 대구교통공사 | 3호선 | 팔달시장 | <NA> | (2F) 1번 출입구 대합실 표 내는 곳 앞(칠곡경대병원방면) (3F) 만평역 방향 승강장 3-2 | 13 | 1000
296 | 대구교통공사 | 3호선 | 팔달시장 | <NA> | (2F) 1번 출입구 대합실 표 내는 곳 앞(용지방면) (3F) 원대역 방향 승강장 1-1 | 13 | 1000
297 | 대구교통공사 | 3호선 | 학정 | 1 | (1F) 1번/4번 출입구 사이 (2F)1발매기 옆 통로 | 13 | 1000
298 | 대구교통공사 | 3호선 | 학정 | 2 | (1F) 2번/3번 출입구 사이 (2F)2발매 옆 통로 | 13 | 1000
299 | 대구교통공사 | 3호선 | 학정 | <NA> | (2F)표내는 곳 근처 (3F)칠곡경대병원역 승강장 1-2 앞 | 13 | 1000
300 | 대구교통공사 | 3호선 | 학정 | <NA> | (2F)표내는 곳 근처 (3F)용지역 방향 승강장 3-1 앞 | 13 | 1000
301 | 대구교통공사 | 3호선 | 황금 | 1 | (1F) 1번 출입구 옆 (2F)대합실가는 통로 | 13 | 1000
302 | 대구교통공사 | 3호선 | 황금 | <NA> | (2F)대합실 (3F)어린이회관역 방향 3-1 출입문 | 13 | 1000
303 | 대구교통공사 | 3호선 | 황금 | <NA> | (2F)대합실 (3F)황금역 방향 1-1 출입문 | 13 | 1000