Overview

Dataset statistics

Number of variables: 7
Number of observations: 175
Missing cells: 107
Missing cells (%): 8.7%
Duplicate rows: 37
Duplicate rows (%): 21.1%
Total size in memory: 10.2 KiB
Average record size in memory: 59.8 B
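
The two percentages follow directly from the counts in this table. The short check below is only a sketch of that arithmetic; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 7
n_observations = 175
missing_cells = 107
duplicate_rows = 37

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Missing cells are expressed relative to all cells,
# duplicate rows relative to all observations.
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
```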

Variable types

Categorical: 2
Text: 2
Numeric: 3

Dataset

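Description: Elevator data for the urban and metropolitan railway stations on Seoul Metropolitan Area Line 5 (수도권 5호선). The fields are railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출입구번호), detailed location (상세위치), rated passenger capacity (정원인원), and rated load (정원중량).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041393/fileData.do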

Alerts

철도운영기관명 has constant value "서울교통공사" (Constant)
선명 has constant value "5호선" (Constant)
Dataset has 37 (21.1%) duplicate rows (Duplicates)
정원_인원 is highly correlated overall with 정원_중량(kg) (High correlation)
정원_중량(kg) is highly correlated overall with 정원_인원 (High correlation)
출입구번호 has 107 (61.1%) missing values (Missing)
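
The constant, duplicate, and missing-value alerts correspond to simple checks that can be sketched in plain Python. The example below is an illustration only, not the profiler's implementation. It uses rows 0 to 5 of the report's Sample section, with `None` standing in for `<NA>`, so its results describe those six rows rather than the full dataset.

```python
from collections import Counter

# Column names as they appear in the report.
columns = ["철도운영기관명", "선명", "역명", "출입구번호", "상세위치", "정원_인원", "정원_중량(kg)"]

# Rows 0-5 copied from the report's Sample section (None = <NA>).
rows = [
    ("서울교통공사", "5호선", "강동", None, "(B3-B4) 승강장", 15, 1000),
    ("서울교통공사", "5호선", "강동", 1, "(F1-B3)1번 출입구", 15, 1000),
    ("서울교통공사", "5호선", "강일", None, "(B3-B2) 승강장", 24, 1600),
    ("서울교통공사", "5호선", "강일", None, "(B3-B2) 승강장", 24, 1600),
    ("서울교통공사", "5호선", "강일", None, "(B3-B1) 승강장", 24, 1600),
    ("서울교통공사", "5호선", "강일", None, "(B3-B1) 승강장", 24, 1600),
]

# A column is constant when it holds exactly one distinct value.
constant_columns = [
    name for i, name in enumerate(columns)
    if len({row[i] for row in rows}) == 1
]

# A row is duplicated when an identical row occurs more than once.
row_counts = Counter(rows)
duplicated = {row: n for row, n in row_counts.items() if n > 1}

# Share of missing values per column.
missing_share = {
    name: sum(row[i] is None for row in rows) / len(rows)
    for i, name in enumerate(columns)
}

print(constant_columns)
print(len(duplicated), "distinct rows occur more than once")
print(f"{missing_share['출입구번호']:.1%} of 출입구번호 is missing in these rows")
```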

Reproduction

Analysis started: 2023-12-12 16:24:17.235347
Analysis finished: 2023-12-12 16:24:18.473387
Duration: 1.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
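
A report of this kind could be regenerated along the following lines. This is a minimal sketch, not the configuration actually used: the function name `build_report`, the file paths, the report title, and the `cp949` encoding are assumptions for illustration. `ProfileReport` and `to_file` come from the ydata-profiling package, and the CSV is assumed to have been downloaded from the URL in the Dataset section.

```python
def build_report(csv_path, html_path="report.html", encoding="cp949"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported inside the function so the sketch can be loaded even when
    # the third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # The title is a placeholder, not the title of the original report.
    profile = ProfileReport(df, title="Line 5 elevator data")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("elevators.csv")`, where `elevators.csv` is a hypothetical local file name, would write `report.html` to the working directory.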

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6
[Figure: Histogram of lengths of the category]

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
서울교통공사 | 175 | 100.0%

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3
[Figure: Histogram of lengths of the category]

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5호선
2nd row: 5호선
3rd row: 5호선
4th row: 5호선
5th row: 5호선

Common Values

Value | Count | Frequency (%)
5호선 | 175 | 100.0%

역명
Text

Distinct: 55
Distinct (%): 31.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 13
Median length: 11
Mean length: 4.1714286
Min length: 2

Characters and Unicode

Total characters: 730
Distinct characters: 111
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 1.7%

Sample

1st row: 강동
2nd row: 강동
3rd row: 강일
4th row: 강일
5th row: 강일

Value | Count | Frequency (%)
강일 | 7 | 4.0%
신길 | 6 | 3.4%
마곡 | 6 | 3.4%
애오개 | 5 | 2.9%
오금 | 5 | 2.9%
하남풍산 | 5 | 2.9%
고덕 | 4 | 2.3%
천호(풍납토성 | 4 | 2.3%
미사 | 4 | 2.3%
상일동 | 4 | 2.3%
Other values (45) | 125 | 71.4%

Most occurring characters

ValueCountFrequency (%)
( 34
 
4.7%
) 34
 
4.7%
29
 
4.0%
23
 
3.2%
21
 
2.9%
19
 
2.6%
17
 
2.3%
14
 
1.9%
14
 
1.9%
14
 
1.9%
Other values (101) 511
70.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 656
89.9%
Open Punctuation 34
 
4.7%
Close Punctuation 34
 
4.7%
Other Punctuation 4
 
0.5%
Decimal Number 2
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
29
 
4.4%
23
 
3.5%
21
 
3.2%
19
 
2.9%
17
 
2.6%
14
 
2.1%
14
 
2.1%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (96) 479
73.0%
Decimal Number
ValueCountFrequency (%)
4 1
50.0%
3 1
50.0%
Open Punctuation
ValueCountFrequency (%)
( 34
100.0%
Close Punctuation
ValueCountFrequency (%)
) 34
100.0%
Other Punctuation
ValueCountFrequency (%)
· 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 656
89.9%
Common 74
 
10.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
29
 
4.4%
23
 
3.5%
21
 
3.2%
19
 
2.9%
17
 
2.6%
14
 
2.1%
14
 
2.1%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (96) 479
73.0%
Common
ValueCountFrequency (%)
( 34
45.9%
) 34
45.9%
· 4
 
5.4%
4 1
 
1.4%
3 1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 656
89.9%
ASCII 70
 
9.6%
None 4
 
0.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
( 34
48.6%
) 34
48.6%
4 1
 
1.4%
3 1
 
1.4%
Hangul
ValueCountFrequency (%)
29
 
4.4%
23
 
3.5%
21
 
3.2%
19
 
2.9%
17
 
2.6%
14
 
2.1%
14
 
2.1%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (96) 479
73.0%
None
ValueCountFrequency (%)
· 4
100.0%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 8
Distinct (%): 11.8%
Missing: 107
Missing (%): 61.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1911765
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 4
95-th percentile: 7
Maximum: 8
Range: 7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.9642883
Coefficient of variation (CV): 0.61553734
Kurtosis: -0.22788994
Mean: 3.1911765
Median Absolute Deviation (MAD): 1
Skewness: 0.82022742
Sum: 217
Variance: 3.8584284
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=8)]

Value | Count | Frequency (%)
1 | 15 | 8.6%
3 | 15 | 8.6%
2 | 15 | 8.6%
4 | 7 | 4.0%
6 | 5 | 2.9%
5 | 5 | 2.9%
7 | 4 | 2.3%
8 | 2 | 1.1%
(Missing) | 107 | 61.1%

Value | Count | Frequency (%)
1 | 15 | 8.6%
2 | 15 | 8.6%
3 | 15 | 8.6%
4 | 7 | 4.0%
5 | 5 | 2.9%
6 | 5 | 2.9%
7 | 4 | 2.3%
8 | 2 | 1.1%

Value | Count | Frequency (%)
8 | 2 | 1.1%
7 | 4 | 2.3%
6 | 5 | 2.9%
5 | 5 | 2.9%
4 | 7 | 4.0%
3 | 15 | 8.6%
2 | 15 | 8.6%
1 | 15 | 8.6%
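
The descriptive statistics for 출입구번호 can be recomputed from its value counts alone. The sketch below assumes the reported variance and standard deviation are sample statistics (dividing by n - 1), which is what Python's `statistics.variance` and `statistics.stdev` compute. Missing values are excluded, and the variable names are illustrative.

```python
import statistics

# Value counts for 출입구번호 as listed in the report (missing values excluded).
value_counts = {1: 15, 2: 15, 3: 15, 4: 7, 5: 5, 6: 5, 7: 4, 8: 2}

# Expand the counts back into one entry per non-missing observation.
values = [v for v, count in value_counts.items() for _ in range(count)]

n = len(values)                          # non-missing observations
total = sum(values)                      # "Sum" in the report
mean = statistics.mean(values)           # "Mean"
variance = statistics.variance(values)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)       # sample standard deviation
cv = std_dev / mean                      # coefficient of variation

print(n, total)
print(round(mean, 7), round(variance, 7), round(std_dev, 7), round(cv, 8))
```

The coefficient of variation is simply the standard deviation divided by the mean, so it needs no data beyond the two figures already in the table.
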

상세위치
Text

Distinct: 66
Distinct (%): 37.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 18
Median length: 15
Mean length: 11.914286
Min length: 9

Characters and Unicode

Total characters: 2085
Distinct characters: 35
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 19.4%

Sample

1st row: (B3-B4) 승강장
2nd row: (F1-B3)1번 출입구
3rd row: (B3-B2) 승강장
4th row: (B3-B2) 승강장
5th row: (B3-B1) 승강장

Value | Count | Frequency (%)
승강장 | 78 | 23.8%
출입구 | 67 | 20.4%
b1-b2 | 26 | 7.9%
b2-b3 | 20 | 6.1%
f1-b1)2번 | 8 | 2.4%
대합실 | 7 | 2.1%
b1-b3 | 7 | 2.1%
f1-b1)3번 | 6 | 1.8%
b3-b4 | 5 | 1.5%
b1-b5)승강장 | 5 | 1.5%
Other values (59) | 99 | 30.2%

Most occurring characters

ValueCountFrequency (%)
B 280
13.4%
1 190
 
9.1%
( 175
 
8.4%
- 175
 
8.4%
) 175
 
8.4%
153
 
7.3%
2 101
 
4.8%
95
 
4.6%
94
 
4.5%
94
 
4.5%
Other values (25) 553
26.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 615
29.5%
Decimal Number 433
20.8%
Uppercase Letter 350
16.8%
Open Punctuation 175
 
8.4%
Dash Punctuation 175
 
8.4%
Close Punctuation 175
 
8.4%
Space Separator 153
 
7.3%
Other Punctuation 9
 
0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
95
15.4%
94
15.3%
94
15.3%
67
10.9%
67
10.9%
67
10.9%
67
10.9%
12
 
2.0%
12
 
2.0%
12
 
2.0%
Other values (10) 28
 
4.6%
Decimal Number
ValueCountFrequency (%)
1 190
43.9%
2 101
23.3%
3 64
 
14.8%
4 37
 
8.5%
5 23
 
5.3%
6 7
 
1.6%
7 6
 
1.4%
8 5
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
B 280
80.0%
F 70
 
20.0%
Open Punctuation
ValueCountFrequency (%)
( 175
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 175
100.0%
Close Punctuation
ValueCountFrequency (%)
) 175
100.0%
Space Separator
ValueCountFrequency (%)
153
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1120
53.7%
Hangul 615
29.5%
Latin 350
 
16.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
95
15.4%
94
15.3%
94
15.3%
67
10.9%
67
10.9%
67
10.9%
67
10.9%
12
 
2.0%
12
 
2.0%
12
 
2.0%
Other values (10) 28
 
4.6%
Common
ValueCountFrequency (%)
1 190
17.0%
( 175
15.6%
- 175
15.6%
) 175
15.6%
153
13.7%
2 101
9.0%
3 64
 
5.7%
4 37
 
3.3%
5 23
 
2.1%
/ 9
 
0.8%
Other values (3) 18
 
1.6%
Latin
ValueCountFrequency (%)
B 280
80.0%
F 70
 
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
70.5%
Hangul 615
29.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B 280
19.0%
1 190
12.9%
( 175
11.9%
- 175
11.9%
) 175
11.9%
153
10.4%
2 101
 
6.9%
F 70
 
4.8%
3 64
 
4.4%
4 37
 
2.5%
Other values (5) 50
 
3.4%
Hangul
ValueCountFrequency (%)
95
15.4%
94
15.3%
94
15.3%
67
10.9%
67
10.9%
67
10.9%
67
10.9%
12
 
2.0%
12
 
2.0%
12
 
2.0%
Other values (10) 28
 
4.6%

정원_인원
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.628571
Minimum: 11
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 11
5-th percentile: 12.7
Q1: 15
median: 15
Q3: 15
95-th percentile: 24
Maximum: 24
Range: 13
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.8288508
Coefficient of variation (CV): 0.23025735
Kurtosis: 0.030262387
Mean: 16.628571
Median Absolute Deviation (MAD): 0
Skewness: 1.207672
Sum: 2910
Variance: 14.660099
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Value | Count | Frequency (%)
15 | 125 | 71.4%
24 | 34 | 19.4%
11 | 8 | 4.6%
13 | 3 | 1.7%
21 | 3 | 1.7%
17 | 1 | 0.6%
12 | 1 | 0.6%

Value | Count | Frequency (%)
11 | 8 | 4.6%
12 | 1 | 0.6%
13 | 3 | 1.7%
15 | 125 | 71.4%
17 | 1 | 0.6%
21 | 3 | 1.7%
24 | 34 | 19.4%

Value | Count | Frequency (%)
24 | 34 | 19.4%
21 | 3 | 1.7%
17 | 1 | 0.6%
15 | 125 | 71.4%
13 | 3 | 1.7%
12 | 1 | 0.6%
11 | 8 | 4.6%
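
The quantile statistics for 정원_인원, including the non-integer 5th percentile of 12.7, can be recovered from the value counts. The sketch below assumes linear interpolation between sorted data points, which is what `statistics.quantiles` does with `method="inclusive"`. Whether the profiler uses exactly this rule is an assumption, and the variable names are illustrative.

```python
import statistics

# Value counts for 정원_인원 as listed in the report.
value_counts = {11: 8, 12: 1, 13: 3, 15: 125, 17: 1, 21: 3, 24: 34}

# Expand into a sorted list with one entry per observation.
values = sorted(v for v, count in value_counts.items() for _ in range(count))

# Cut points at 25/50/75% and at every 5%, interpolating linearly
# between neighbouring data points.
quartiles = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")

q1, median, q3 = quartiles
p05, p95 = twentieths[0], twentieths[-1]
iqr = q3 - q1

print(p05, q1, median, q3, p95, iqr)
```

Because 125 of the 175 observations equal 15, both quartiles fall on that value, which is why the interquartile range and the median absolute deviation are both 0.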

정원_중량(kg)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1120.6857
Minimum: 750
Maximum: 1800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 750
5-th percentile: 900
Q1: 1000
median: 1000
Q3: 1000
95-th percentile: 1600
Maximum: 1800
Range: 1050
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 267.55833
Coefficient of variation (CV): 0.23874519
Kurtosis: 0.16990963
Mean: 1120.6857
Median Absolute Deviation (MAD): 0
Skewness: 1.2864009
Sum: 196120
Variance: 71587.458
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Value | Count | Frequency (%)
1000 | 122 | 69.7%
1600 | 33 | 18.9%
750 | 8 | 4.6%
1005 | 4 | 2.3%
1800 | 4 | 2.3%
900 | 2 | 1.1%
1150 | 2 | 1.1%

Value | Count | Frequency (%)
750 | 8 | 4.6%
900 | 2 | 1.1%
1000 | 122 | 69.7%
1005 | 4 | 2.3%
1150 | 2 | 1.1%
1600 | 33 | 18.9%
1800 | 4 | 2.3%

Value | Count | Frequency (%)
1800 | 4 | 2.3%
1600 | 33 | 18.9%
1150 | 2 | 1.1%
1005 | 4 | 2.3%
1000 | 122 | 69.7%
900 | 2 | 1.1%
750 | 8 | 4.6%

Correlations

Variable | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
역명 | 1.000 | 0.000 | 0.899 | 0.772 | 0.847
출입구번호 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000
상세위치 | 0.899 | 1.000 | 1.000 | 0.764 | 0.902
정원_인원 | 0.772 | 0.000 | 0.764 | 1.000 | 0.827
정원_중량(kg) | 0.847 | 0.000 | 0.902 | 0.827 | 1.000

Variable | 출입구번호 | 정원_인원 | 정원_중량(kg)
출입구번호 | 1.000 | 0.009 | 0.025
정원_인원 | 0.009 | 1.000 | 0.949
정원_중량(kg) | 0.025 | 0.949 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

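Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
0 | 서울교통공사 | 5호선 | 강동 | <NA> | (B3-B4) 승강장 | 15 | 1000
1 | 서울교통공사 | 5호선 | 강동 | 1 | (F1-B3)1번 출입구 | 15 | 1000
2 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B2) 승강장 | 24 | 1600
3 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B2) 승강장 | 24 | 1600
4 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B1) 승강장 | 24 | 1600
5 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B1) 승강장 | 24 | 1600
6 | 서울교통공사 | 5호선 | 강일 | 1 | (B2-F1)1번 출입구 | 24 | 1600
7 | 서울교통공사 | 5호선 | 강일 | 3 | (B2-F1)3번 출입구 | 24 | 1600
8 | 서울교통공사 | 5호선 | 강일 | 4 | (B2-F1)4번 출입구 | 24 | 1600
9 | 서울교통공사 | 5호선 | 개롱 | <NA> | (B1-B2) 승강장 | 15 | 1000

Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
165 | 서울교통공사 | 5호선 | 하남풍산 | <NA> | (B2-B1) 승강장 | 24 | 1600
166 | 서울교통공사 | 5호선 | 하남풍산 | <NA> | (B2-B1) 승강장 | 24 | 1600
167 | 서울교통공사 | 5호선 | 하남풍산 | 1 | (F1-F2)1번 출입구 | 24 | 1600
168 | 서울교통공사 | 5호선 | 하남풍산 | 5 | (B1-F1)5번 출입구 | 24 | 1600
169 | 서울교통공사 | 5호선 | 하남풍산 | 7 | (B1-F1)7번 출입구 | 24 | 1600
170 | 서울교통공사 | 5호선 | 행당 | <NA> | (B3-B5) 승강장 | 15 | 1000
171 | 서울교통공사 | 5호선 | 행당 | <NA> | (B4-B5) 승강장 | 15 | 1000
172 | 서울교통공사 | 5호선 | 행당 | 3 | (F1-B3)3번 출입구 | 15 | 1000
173 | 서울교통공사 | 5호선 | 화곡 | <NA> | (B1-B2)승강장 | 11 | 750
174 | 서울교통공사 | 5호선 | 화곡 | 1 | (F1-B1)1/2번 출입구 | 15 | 1000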

Duplicate rows

Most frequently occurring

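Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg) | # duplicates
0 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B1) 승강장 | 24 | 1600 | 2
1 | 서울교통공사 | 5호선 | 강일 | <NA> | (B3-B2) 승강장 | 24 | 1600 | 2
2 | 서울교통공사 | 5호선 | 개롱 | <NA> | (B1-B2) 승강장 | 15 | 1000 | 2
3 | 서울교통공사 | 5호선 | 거여 | <NA> | (B1-B2) 승강장 | 15 | 1000 | 2
4 | 서울교통공사 | 5호선 | 고덕 | <NA> | (B1-B2) 승강장 | 15 | 1000 | 2
5 | 서울교통공사 | 5호선 | 광나루(장신대) | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
6 | 서울교통공사 | 5호선 | 군자(능동) | <NA> | (B1-B3) 승강장 | 15 | 1000 | 2
7 | 서울교통공사 | 5호선 | 굽은다리(강동구민회관앞) | <NA> | (B1-B2) 승강장 | 15 | 1000 | 2
8 | 서울교통공사 | 5호선 | 길동 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2
9 | 서울교통공사 | 5호선 | 김포공항 | <NA> | (B2-B3) 승강장 | 15 | 1000 | 2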