Overview

Dataset statistics

Number of variables: 7
Number of observations: 354
Missing cells: 198
Missing cells (%): 8.0%
Duplicate rows: 1
Duplicate rows (%): 0.3%
Total size in memory: 20.5 KiB
Average record size in memory: 59.4 B
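
The percentages above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables times observations) and the duplicate percentage over observations; the helper name `percent` is hypothetical:

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts copied from the dataset statistics above.
n_variables = 7
n_observations = 354
missing_cells = 198
duplicate_rows = 1

# Assumption: the missing-cell share is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = percent(missing_cells, total_cells)
duplicate_pct = percent(duplicate_rows, n_observations)

print(total_cells, missing_pct, duplicate_pct)
```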

Variable types

Categorical: 2
Text: 2
Numeric: 3

Dataset

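Description: Elevator data for the urban and metropolitan railway stations managed by Seoul Metro (서울교통공사). The fields are railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출입구번호), detailed location (상세위치), rated passenger capacity (정원_인원), and rated load in kg (정원_중량(kg)).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041387/fileData.do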

Alerts

철도운영기관명 has constant value "서울교통공사" (Constant)
Dataset has 1 (0.3%) duplicate rows (Duplicates)
정원_인원 is highly overall correlated with 정원_중량(kg) (High correlation)
정원_중량(kg) is highly overall correlated with 정원_인원 (High correlation)
출입구번호 has 198 (55.9%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 04:45:08.650948
Analysis finished: 2023-12-12 04:45:10.870851
Duration: 2.22 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Value | Count
서울교통공사 | 354

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
서울교통공사 | 354 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

선명
Categorical

Distinct: 4
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Value | Count
2호선 | 150
3호선 | 88
4호선 | 80
1호선 | 36

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 150 | 42.4%
3호선 | 88 | 24.9%
4호선 | 80 | 22.6%
1호선 | 36 | 10.2%
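
A frequency table like this one can be rebuilt from the counts alone. A minimal sketch using only the standard library, with the four counts copied from the report; the names `line_counts` and `frequency_table` are illustrative and not part of the profiling tool:

```python
from collections import Counter

# Counts per subway line (선명), copied from the Common Values table.
line_counts = Counter({"2호선": 150, "3호선": 88, "4호선": 80, "1호선": 36})
total = sum(line_counts.values())

# Sort by descending count and express each count as a share of all rows.
frequency_table = [
    (value, count, round(100 * count / total, 1))
    for value, count in line_counts.most_common()
]

for value, count, pct in frequency_table:
    print(f"{value}\t{count}\t{pct}%")
```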

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

역명
Text

Distinct: 108
Distinct (%): 30.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 12
Median length: 11
Mean length: 4.1158192
Min length: 2
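
Length statistics of this kind can be recomputed directly from the column when a figure needs checking. A minimal sketch, assuming length means the number of Unicode code points per value; `length_summary` is a hypothetical helper, and the five station names are a small illustrative subset, not the full column:

```python
import statistics


def length_summary(values):
    """Summarise string lengths measured in Unicode code points."""
    lengths = [len(value) for value in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# Illustrative subset of station names (역명) that appear in the report.
sample_names = ["동대문", "동묘앞", "창동", "서울역", "시청"]
summary = length_summary(sample_names)
print(summary)
```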

Characters and Unicode

Total characters: 1457
Distinct characters: 162
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
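
As an illustration of those character properties, the general category of each code point can be read with Python's `unicodedata` module. This is a sketch for one station name, not the profiler's own implementation; the `CATEGORY_NAMES` mapping covers only the five categories reported for this variable, and the middle dot is assumed to be U+00B7:

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count characters per Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


# Assumption: the separator in this station name is MIDDLE DOT (U+00B7).
station = "교대(법원\u00b7검찰청)"
counts = category_counts(station)
print(counts)
```

The standard library does not expose the script or block of a character, so those two breakdowns would need an additional data source.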

Unique

Unique: 4
Unique (%): 1.1%

Sample

1st row: 동대문
2nd row: 동대문
3rd row: 동대문
4th row: 동묘앞
5th row: 동묘앞

Value | Count | Frequency (%)
신설동 | 7 | 2.0%
동묘앞 | 7 | 2.0%
창동 | 6 | 1.7%
서울역 | 6 | 1.7%
동대문 | 5 | 1.4%
이촌(국립중앙박물관 | 5 | 1.4%
삼각지 | 5 | 1.4%
교대(법원·검찰청 | 5 | 1.4%
삼성(무역센터 | 5 | 1.4%
시청 | 5 | 1.4%
Other values (98) | 298 | 84.2%

Most occurring characters

ValueCountFrequency (%)
76
 
5.2%
67
 
4.6%
) 67
 
4.6%
( 67
 
4.6%
48
 
3.3%
46
 
3.2%
46
 
3.2%
32
 
2.2%
31
 
2.1%
28
 
1.9%
Other values (152) 949
65.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1303 | 89.4%
Close Punctuation | 67 | 4.6%
Open Punctuation | 67 | 4.6%
Decimal Number | 15 | 1.0%
Other Punctuation | 5 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
76
 
5.8%
67
 
5.1%
48
 
3.7%
46
 
3.5%
46
 
3.5%
32
 
2.5%
31
 
2.4%
28
 
2.1%
26
 
2.0%
26
 
2.0%
Other values (146) 877
67.3%
Decimal Number
Value | Count | Frequency (%)
3 | 9 | 60.0%
4 | 3 | 20.0%
5 | 3 | 20.0%
Close Punctuation
Value | Count | Frequency (%)
) | 67 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 67 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
· | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1303 | 89.4%
Common | 154 | 10.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
76
 
5.8%
67
 
5.1%
48
 
3.7%
46
 
3.5%
46
 
3.5%
32
 
2.5%
31
 
2.4%
28
 
2.1%
26
 
2.0%
26
 
2.0%
Other values (146) 877
67.3%
Common
Value | Count | Frequency (%)
) | 67 | 43.5%
( | 67 | 43.5%
3 | 9 | 5.8%
· | 5 | 3.2%
4 | 3 | 1.9%
5 | 3 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1303 | 89.4%
ASCII | 149 | 10.2%
None | 5 | 0.3%

Most frequent character per block

Hangul
ValueCountFrequency (%)
76
 
5.8%
67
 
5.1%
48
 
3.7%
46
 
3.5%
46
 
3.5%
32
 
2.5%
31
 
2.4%
28
 
2.1%
26
 
2.0%
26
 
2.0%
Other values (146) 877
67.3%
ASCII
Value | Count | Frequency (%)
) | 67 | 45.0%
( | 67 | 45.0%
3 | 9 | 6.0%
4 | 3 | 2.0%
5 | 3 | 2.0%
None
Value | Count | Frequency (%)
· | 5 | 100.0%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 14
Distinct (%): 9.0%
Missing: 198
Missing (%): 55.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.9358974
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 3
Q3: 6
95th percentile: 10
Maximum: 14
Range: 13
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.0726252
Coefficient of variation (CV): 0.78066699
Kurtosis: 0.94743475
Mean: 3.9358974
Median Absolute Deviation (MAD): 2
Skewness: 1.1641802
Sum: 614
Variance: 9.4410256
Monotonicity: Not monotonic
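
Several of these figures can be cross-checked from the value counts reported for 출입구번호. A minimal sketch with the standard library, assuming missing values are excluded, the variance is the sample variance (n - 1 denominator), and quartiles use linear interpolation over the observed range; the variable names are illustrative:

```python
import statistics

# Count of each exit number (출입구번호), copied from the value tables of this report.
exit_number_counts = {
    1: 46, 2: 19, 3: 19, 4: 19, 5: 11, 6: 15, 7: 5,
    8: 7, 9: 5, 10: 4, 11: 1, 12: 2, 13: 1, 14: 2,
}

# Expand the counts into the 156 non-missing observations.
values = sorted(v for v, n in exit_number_counts.items() for _ in range(n))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1)
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# "inclusive" interpolates linearly between the minimum and maximum.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(len(values), sum(values), mean, variance, std, cv, q1, median, q3, iqr)
```

Under these assumptions the recomputed mean, variance, standard deviation, CV, and quartiles agree with the reported values to the printed precision.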
Histogram with fixed size bins (bins=14)
Value | Count | Frequency (%)
1 | 46 | 13.0%
3 | 19 | 5.4%
2 | 19 | 5.4%
4 | 19 | 5.4%
6 | 15 | 4.2%
5 | 11 | 3.1%
8 | 7 | 2.0%
7 | 5 | 1.4%
9 | 5 | 1.4%
10 | 4 | 1.1%
Other values (4) | 6 | 1.7%
(Missing) | 198 | 55.9%

Value | Count | Frequency (%)
1 | 46 | 13.0%
2 | 19 | 5.4%
3 | 19 | 5.4%
4 | 19 | 5.4%
5 | 11 | 3.1%
6 | 15 | 4.2%
7 | 5 | 1.4%
8 | 7 | 2.0%
9 | 5 | 1.4%
10 | 4 | 1.1%

Value | Count | Frequency (%)
14 | 2 | 0.6%
13 | 1 | 0.3%
12 | 2 | 0.6%
11 | 1 | 0.3%
10 | 4 | 1.1%
9 | 5 | 1.4%
8 | 7 | 2.0%
7 | 5 | 1.4%
6 | 15 | 4.2%
5 | 11 | 3.1%

상세위치
Text

Distinct: 183
Distinct (%): 51.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 25
Median length: 23
Mean length: 13.20904
Min length: 7

Characters and Unicode

Total characters: 4676
Distinct characters: 58
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 133
Unique (%): 37.6%

Sample

1st row: (B1-B2) 10-4
2nd row: (B1-B2) 2-1
3rd row: (B1-F1)6번 출입구측
4th row: (B1-F4)본관건물(상)6-2
5th row: (B1-B2) 10-3

Value | Count | Frequency (%)
출입구측 | 118 | 16.7%
b1-b2 | 96 | 13.6%
b1-f1)1번 | 28 | 4.0%
출입구 | 24 | 3.4%
3-2 | 18 | 2.6%
f2-f3 | 18 | 2.6%
b3-b2 | 15 | 2.1%
b1-f1)3번 | 15 | 2.1%
8-3 | 13 | 1.8%
b1-f1)2번 | 13 | 1.8%
Other values (130) | 347 | 49.2%

Most occurring characters

ValueCountFrequency (%)
- 567
12.1%
1 538
11.5%
B 466
10.0%
) 399
 
8.5%
( 399
 
8.5%
(space) 352
 
7.5%
2 295
 
6.3%
F 243
 
5.2%
3 187
 
4.0%
156
 
3.3%
Other values (48) 1074
23.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1299 | 27.8%
Other Letter | 941 | 20.1%
Uppercase Letter | 709 | 15.2%
Dash Punctuation | 567 | 12.1%
Close Punctuation | 399 | 8.5%
Open Punctuation | 399 | 8.5%
Space Separator | 352 | 7.5%
Other Punctuation | 6 | 0.1%
Math Symbol | 4 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
156
16.6%
156
16.6%
156
16.6%
156
16.6%
120
12.8%
39
 
4.1%
39
 
4.1%
25
 
2.7%
15
 
1.6%
15
 
1.6%
Other values (30) 64
6.8%
Decimal Number
Value | Count | Frequency (%)
1 | 538 | 41.4%
2 | 295 | 22.7%
3 | 187 | 14.4%
4 | 118 | 9.1%
8 | 50 | 3.8%
6 | 35 | 2.7%
7 | 28 | 2.2%
5 | 25 | 1.9%
9 | 12 | 0.9%
0 | 11 | 0.8%
Uppercase Letter
Value | Count | Frequency (%)
B | 466 | 65.7%
F | 243 | 34.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 567 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 399 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 399 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 352 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 6 | 100.0%
Math Symbol
Value | Count | Frequency (%)
~ | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3026 | 64.7%
Hangul | 941 | 20.1%
Latin | 709 | 15.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
156
16.6%
156
16.6%
156
16.6%
156
16.6%
120
12.8%
39
 
4.1%
39
 
4.1%
25
 
2.7%
15
 
1.6%
15
 
1.6%
Other values (30) 64
6.8%
Common
Value | Count | Frequency (%)
- | 567 | 18.7%
1 | 538 | 17.8%
) | 399 | 13.2%
( | 399 | 13.2%
(space) | 352 | 11.6%
2 | 295 | 9.7%
3 | 187 | 6.2%
4 | 118 | 3.9%
8 | 50 | 1.7%
6 | 35 | 1.2%
Other values (6) | 86 | 2.8%
Latin
Value | Count | Frequency (%)
B | 466 | 65.7%
F | 243 | 34.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3735 | 79.9%
Hangul | 941 | 20.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 567 | 15.2%
1 | 538 | 14.4%
B | 466 | 12.5%
) | 399 | 10.7%
( | 399 | 10.7%
(space) | 352 | 9.4%
2 | 295 | 7.9%
F | 243 | 6.5%
3 | 187 | 5.0%
4 | 118 | 3.2%
Other values (8) | 171 | 4.6%
Hangul
ValueCountFrequency (%)
156
16.6%
156
16.6%
156
16.6%
156
16.6%
120
12.8%
39
 
4.1%
39
 
4.1%
25
 
2.7%
15
 
1.6%
15
 
1.6%
Other values (30) 64
6.8%

정원_인원
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.053672
Minimum: 8
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 8
5th percentile: 11
Q1: 13
Median: 15
Q3: 15
95th percentile: 15
Maximum: 24
Range: 16
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.9613693
Coefficient of variation (CV): 0.13956276
Kurtosis: 2.5938525
Mean: 14.053672
Median Absolute Deviation (MAD): 0
Skewness: -0.41975927
Sum: 4975
Variance: 3.8469695
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
15 | 233 | 65.8%
11 | 54 | 15.3%
13 | 42 | 11.9%
17 | 7 | 2.0%
9 | 6 | 1.7%
10 | 3 | 0.8%
8 | 3 | 0.8%
20 | 2 | 0.6%
24 | 1 | 0.3%
16 | 1 | 0.3%
Other values (2) | 2 | 0.6%

Value | Count | Frequency (%)
8 | 3 | 0.8%
9 | 6 | 1.7%
10 | 3 | 0.8%
11 | 54 | 15.3%
12 | 1 | 0.3%
13 | 42 | 11.9%
15 | 233 | 65.8%
16 | 1 | 0.3%
17 | 7 | 2.0%
20 | 2 | 0.6%

Value | Count | Frequency (%)
24 | 1 | 0.3%
21 | 1 | 0.3%
20 | 2 | 0.6%
17 | 7 | 2.0%
16 | 1 | 0.3%
15 | 233 | 65.8%
13 | 42 | 11.9%
12 | 1 | 0.3%
11 | 54 | 15.3%
10 | 3 | 0.8%

정원_중량(kg)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 956.63842
Minimum: 600
Maximum: 1600
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 600
5th percentile: 750
Q1: 1000
Median: 1000
Q3: 1000
95th percentile: 1000
Maximum: 1600
Range: 1000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 114.85796
Coefficient of variation (CV): 0.12006413
Kurtosis: 4.2355994
Mean: 956.63842
Median Absolute Deviation (MAD): 0
Skewness: -0.62887201
Sum: 338650
Variance: 13192.35
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
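
To show what fixed-size binning does to this variable, here is a minimal sketch that assumes equal-width bins spanning the minimum to the maximum, with the maximum placed in the last bin. The function `fixed_width_histogram` is hypothetical and is not the plotting code used by the report:

```python
def fixed_width_histogram(value_counts, bins):
    """Bin a {value: count} mapping into equal-width bins over [min, max].

    Assumes at least two distinct values, so the range is non-zero.
    """
    lo, hi = min(value_counts), max(value_counts)
    counts = [0] * bins
    for value, n in value_counts.items():
        # The maximum would land on index == bins, so clamp it into the last bin.
        index = min(int((value - lo) * bins / (hi - lo)), bins - 1)
        counts[index] += n
    return counts


# Rated load (정원_중량(kg)) value counts, copied from this report.
load_counts = {1000: 270, 750: 53, 900: 15, 600: 7, 1150: 6, 1350: 2, 1600: 1}
histogram = fixed_width_histogram(load_counts, bins=7)
print(histogram)
```

With seven bins of about 143 kg each, the 900 kg and 1000 kg elevators share one bin and one bin stays empty, so seven distinct values do not map to seven populated bars.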
Value | Count | Frequency (%)
1000 | 270 | 76.3%
750 | 53 | 15.0%
900 | 15 | 4.2%
600 | 7 | 2.0%
1150 | 6 | 1.7%
1350 | 2 | 0.6%
1600 | 1 | 0.3%

Value | Count | Frequency (%)
600 | 7 | 2.0%
750 | 53 | 15.0%
900 | 15 | 4.2%
1000 | 270 | 76.3%
1150 | 6 | 1.7%
1350 | 2 | 0.6%
1600 | 1 | 0.3%

Value | Count | Frequency (%)
1600 | 1 | 0.3%
1350 | 2 | 0.6%
1150 | 6 | 1.7%
1000 | 270 | 76.3%
900 | 15 | 4.2%
750 | 53 | 15.0%
600 | 7 | 2.0%

Interactions

[Nine pairwise interaction plots]

Correlations

Variable | 선명 | 출입구번호 | 정원_인원 | 정원_중량(kg)
선명 | 1.000 | 0.095 | 0.139 | 0.068
출입구번호 | 0.095 | 1.000 | 0.000 | 0.000
정원_인원 | 0.139 | 0.000 | 1.000 | 0.972
정원_중량(kg) | 0.068 | 0.000 | 0.972 | 1.000

Variable | 출입구번호 | 정원_인원 | 정원_중량(kg) | 선명
출입구번호 | 1.000 | -0.080 | -0.082 | 0.052
정원_인원 | -0.080 | 1.000 | 0.808 | 0.076
정원_중량(kg) | -0.082 | 0.808 | 1.000 | 0.041
선명 | 0.052 | 0.076 | 0.041 | 1.000
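
The report does not state which coefficient each matrix uses, so the following is only an illustration of how a correlation between 정원_인원 and 정원_중량(kg) can be computed. It is a plain Pearson coefficient over six (capacity, load) pairs that appear in the report's sample and duplicate rows, and it is not expected to reproduce 0.972 or 0.808; `pearson` is a hypothetical helper:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Six illustrative rows: rated persons and rated load in kg.
capacity = [11, 11, 15, 15, 8, 17]
load = [750, 750, 1000, 1000, 600, 1150]

r = pearson(capacity, load)
print(round(r, 3))
```

Even on this handful of rows the two columns move almost in lockstep, which is consistent with the high-correlation alert for this pair.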

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
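
The per-column nullity counts behind these plots are straightforward to compute. A minimal sketch over four rows taken from the start of the dataset, with `<NA>` represented as `None` and only three of the seven columns kept for brevity; `missing_by_column` is a hypothetical helper:

```python
def missing_by_column(rows):
    """Count None values per column for a list of dict rows."""
    columns = rows[0].keys()
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


# First four rows of the dataset, reduced to three columns.
rows = [
    {"역명": "동대문", "출입구번호": None, "정원_인원": 11},
    {"역명": "동대문", "출입구번호": None, "정원_인원": 11},
    {"역명": "동대문", "출입구번호": 6, "정원_인원": 15},
    {"역명": "동묘앞", "출입구번호": None, "정원_인원": 15},
]

nullity = missing_by_column(rows)
print(nullity)
```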

Sample

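Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
0 | 서울교통공사 | 1호선 | 동대문 | <NA> | (B1-B2) 10-4 | 11 | 750
1 | 서울교통공사 | 1호선 | 동대문 | <NA> | (B1-B2) 2-1 | 11 | 750
2 | 서울교통공사 | 1호선 | 동대문 | 6 | (B1-F1)6번 출입구측 | 15 | 1000
3 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-F4)본관건물(상)6-2 | 15 | 1000
4 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-B2) 10-3 | 15 | 1000
5 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B2-B1) 4-8 | 15 | 1000
6 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-B2) 7-3 | 15 | 1000
7 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B2-B1) 1-2 | 15 | 1000
8 | 서울교통공사 | 1호선 | 동묘앞 | 1 | (B1-F1)1-10번 출입구 사이 | 15 | 1000
9 | 서울교통공사 | 1호선 | 동묘앞 | 3 | (B1-F1)3번 출입구측 | 15 | 1000

Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
344 | 서울교통공사 | 4호선 | 충무로 | 7 | (B1-F1)7번 출입구 | 8 | 600
345 | 서울교통공사 | 4호선 | 한성대입구(삼선교) | <NA> | (B1-B2)섬식(상)6-1 | 15 | 1000
346 | 서울교통공사 | 4호선 | 한성대입구(삼선교) | 3 | (B1-F1)3-4번 출입구사이 | 15 | 1000
347 | 서울교통공사 | 4호선 | 한성대입구(삼선교) | 5 | (B1-F1)5번 출입구측 | 15 | 1000
348 | 서울교통공사 | 4호선 | 혜화 | <NA> | (B1-B2) 7-4 | 11 | 750
349 | 서울교통공사 | 4호선 | 혜화 | <NA> | (B1-B2) 3-4 | 11 | 750
350 | 서울교통공사 | 4호선 | 혜화 | 2 | (B1-F1)2번 출입구측 | 11 | 750
351 | 서울교통공사 | 4호선 | 혜화 | 3 | (B1-F1)3번 출입구측 | 11 | 750
352 | 서울교통공사 | 4호선 | 회현(남대문시장) | <NA> | (B4-B1)섬식(상)10-1 | 15 | 1000
353 | 서울교통공사 | 4호선 | 회현(남대문시장) | 3 | (B1-F1)지상도로 중앙 | 15 | 1000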

Duplicate rows

Most frequently occurring

Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg) | # duplicates
0 | 서울교통공사 | 3호선 | 일원 | 1 | (B2-F1)1번 출입구측 | 17 | 1150 | 2
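
A "# duplicates" value of 2 means the row occurs twice in the data, which the overview counts as one duplicate row. A minimal sketch of finding such rows with the standard library; `find_duplicate_rows` is a hypothetical helper, and the third row is a different, non-duplicated row from the dataset included for contrast:

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


duplicated = ("서울교통공사", "3호선", "일원", 1, "(B2-F1)1번 출입구측", 17, 1150)
rows = [
    duplicated,
    duplicated,
    ("서울교통공사", "4호선", "혜화", 2, "(B1-F1)2번 출입구측", 11, 750),
]

duplicates = find_duplicate_rows(rows)
print(duplicates)
```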