Overview

Dataset statistics

Number of variables: 7
Number of observations: 472
Missing cells: 284
Missing cells (%): 8.6%
Duplicate rows: 95
Duplicate rows (%): 20.1%
Total size in memory: 27.3 KiB
Average record size in memory: 59.3 B
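
The two percentages above follow from the counts. A minimal sketch, assuming that Missing cells (%) is taken over all cells (observations times variables) and Duplicate rows (%) over all observations; the variable names are illustrative only:

```python
# Counts copied from the dataset statistics above.
n_variables = 7
n_observations = 472
missing_cells = 284
duplicate_rows = 95

# Assumption: the missing-cell share is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate share is taken over all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the rounded results agree with the reported 8.6% and 20.1%.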

Variable types

Categorical: 2
Text: 2
Numeric: 3

Dataset

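Description: Elevator data for the urban and metropolitan railway stations managed by 서울교통공사 (Seoul Metro), with the following fields: railway operator name (철도운영기관명), line name (선명), station name (역명), entrance number (출입구번호), detailed location (상세위치), rated capacity in persons (정원인원), and rated load (정원중량).
Author: 국가철도공단
URL: https://www.data.go.kr/data/15041388/fileData.do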

Alerts

철도운영기관명 has constant value "서울교통공사" (Constant)
Dataset has 95 (20.1%) duplicate rows (Duplicates)
정원_인원 is highly overall correlated with 정원_중량(kg) (High correlation)
정원_중량(kg) is highly overall correlated with 정원_인원 (High correlation)
출입구번호 has 284 (60.2%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 18:42:20.719312
Analysis finished: 2023-12-12 18:42:23.000063
Duration: 2.28 seconds
Software version: ydata-profiling v4.5.1
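
The reported duration can be checked from the two timestamps. A small sketch using only the Python standard library, with the timestamp strings copied from this section:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 18:42:20.719312")
finished = datetime.fromisoformat("2023-12-12 18:42:23.000063")

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```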

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB
서울교통공사  472

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value  Count  Frequency (%)
서울교통공사  472  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Plot of the common values]

선명
Categorical

Distinct: 4
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB
5호선  175
7호선  127
6호선  113
8호선  57

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5호선
2nd row: 5호선
3rd row: 5호선
4th row: 5호선
5th row: 5호선

Common Values

Value  Count  Frequency (%)
5호선  175  37.1%
7호선  127  26.9%
6호선  113  23.9%
8호선  57  12.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Plot of the common values]

역명
Text

Distinct: 149
Distinct (%): 31.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.3135593
Min length: 2

Characters and Unicode

Total characters: 2036
Distinct characters: 199
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 1.9%

Sample

1st row: 강동
2nd row: 강동
3rd row: 강일
4th row: 강일
5th row: 강일

Value  Count  Frequency (%)
공덕  7  1.5%
강일  7  1.5%
천호(풍납토성  6  1.3%
마곡  6  1.3%
신길  6  1.3%
가락시장  6  1.3%
태릉입구  6  1.3%
청구  5  1.1%
애오개  5  1.1%
하남풍산  5  1.1%
Other values (139)  413  87.5%

Most occurring characters

ValueCountFrequency (%)
) 105
 
5.2%
( 105
 
5.2%
75
 
3.7%
62
 
3.0%
56
 
2.8%
42
 
2.1%
41
 
2.0%
41
 
2.0%
36
 
1.8%
33
 
1.6%
Other values (189) 1440
70.7%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  1818  89.3%
Close Punctuation  105  5.2%
Open Punctuation  105  5.2%
Other Punctuation  6  0.3%
Decimal Number  2  0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
75
 
4.1%
62
 
3.4%
56
 
3.1%
42
 
2.3%
41
 
2.3%
41
 
2.3%
36
 
2.0%
33
 
1.8%
31
 
1.7%
28
 
1.5%
Other values (184) 1373
75.5%
Decimal Number
Value  Count  Frequency (%)
3  1  50.0%
4  1  50.0%
Close Punctuation
Value  Count  Frequency (%)
)  105  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  105  100.0%
Other Punctuation
Value  Count  Frequency (%)
·  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  1818  89.3%
Common  218  10.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
75
 
4.1%
62
 
3.4%
56
 
3.1%
42
 
2.3%
41
 
2.3%
41
 
2.3%
36
 
2.0%
33
 
1.8%
31
 
1.7%
28
 
1.5%
Other values (184) 1373
75.5%
Common
Value  Count  Frequency (%)
)  105  48.2%
(  105  48.2%
·  6  2.8%
3  1  0.5%
4  1  0.5%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  1818  89.3%
ASCII  212  10.4%
None  6  0.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
)  105  49.5%
(  105  49.5%
3  1  0.5%
4  1  0.5%
Hangul
ValueCountFrequency (%)
75
 
4.1%
62
 
3.4%
56
 
3.1%
42
 
2.3%
41
 
2.3%
41
 
2.3%
36
 
2.0%
33
 
1.8%
31
 
1.7%
28
 
1.5%
Other values (184) 1373
75.5%
None
Value  Count  Frequency (%)
·  6  100.0%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 12
Distinct (%): 6.4%
Missing: 284
Missing (%): 60.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.5691489
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 8
Maximum: 12
Range: 11
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.4691693
Coefficient of variation (CV): 0.69180899
Kurtosis: 1.0478792
Mean: 3.5691489
Median Absolute Deviation (MAD): 1
Skewness: 1.1861738
Sum: 671
Variance: 6.0967971
Monotonicity: Not monotonic
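
The descriptive statistics can be cross-checked against the value counts reported for this variable. A minimal sketch using only the Python standard library; it assumes the reported variance and standard deviation are the sample (n - 1) versions, which is what matches the figures above:

```python
import statistics

# Value counts for 출입구번호 (non-missing rows only), copied from this report.
counts = {1: 41, 2: 36, 3: 37, 4: 22, 5: 14, 6: 15,
          7: 6, 8: 8, 9: 2, 10: 3, 11: 3, 12: 1}

# Expand the frequency table back into one value per row.
values = [value for value, n in counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation

print(len(values), sum(values))
print(round(mean, 7), round(variance, 7), round(std, 7), round(cv, 5))
```

The 188 expanded values equal the 472 observations minus the 284 missing entries, and their sum is the reported 671.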

[Figure: Histogram with fixed size bins (bins=12)]

Value  Count  Frequency (%)
1  41  8.7%
3  37  7.8%
2  36  7.6%
4  22  4.7%
6  15  3.2%
5  14  3.0%
8  8  1.7%
7  6  1.3%
11  3  0.6%
10  3  0.6%
Other values (2)  3  0.6%
(Missing)  284  60.2%

Value  Count  Frequency (%)
1  41  8.7%
2  36  7.6%
3  37  7.8%
4  22  4.7%
5  14  3.0%
6  15  3.2%
7  6  1.3%
8  8  1.7%
9  2  0.4%
10  3  0.6%

Value  Count  Frequency (%)
12  1  0.2%
11  3  0.6%
10  3  0.6%
9  2  0.4%
8  8  1.7%
7  6  1.3%
6  15  3.2%
5  14  3.0%
4  22  4.7%
3  37  7.8%

상세위치
Text

Distinct: 110
Distinct (%): 23.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 18
Median length: 15
Mean length: 11.887712
Min length: 7

Characters and Unicode

Total characters: 5611
Distinct characters: 44
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 12.1%

Sample

1st row: (B3-B4) 승강장
2nd row: (F1-B3)1번 출입구
3rd row: (B3-B2) 승강장
4th row: (B3-B2) 승강장
5th row: (B3-B1) 승강장

Value  Count  Frequency (%)
승강장  218  24.5%
출입구  174  19.6%
b2-b3  72  8.1%
b1-b2  68  7.6%
대합실  20  2.2%
f1-b1)3번  20  2.2%
b1-b3  18  2.0%
f1-b1)2번  18  2.0%
f1-b1)1번  14  1.6%
b3-b4  14  1.6%
Other values (103)  254  28.5%

Most occurring characters

ValueCountFrequency (%)
B 735
13.1%
1 515
 
9.2%
- 474
 
8.4%
) 472
 
8.4%
( 472
 
8.4%
418
 
7.4%
2 290
 
5.2%
262
 
4.7%
258
 
4.6%
258
 
4.6%
Other values (34) 1457
26.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  1655  29.5%
Decimal Number  1162  20.7%
Uppercase Letter  944  16.8%
Dash Punctuation  474  8.4%
Close Punctuation  472  8.4%
Open Punctuation  472  8.4%
Space Separator  418  7.4%
Other Punctuation  14  0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
262
15.8%
258
15.6%
258
15.6%
175
10.6%
175
10.6%
175
10.6%
175
10.6%
29
 
1.8%
26
 
1.6%
26
 
1.6%
Other values (17) 96
 
5.8%
Decimal Number
Value  Count  Frequency (%)
1  515  44.3%
2  290  25.0%
3  193  16.6%
4  87  7.5%
5  35  3.0%
6  18  1.5%
8  11  0.9%
7  8  0.7%
0  3  0.3%
9  2  0.2%
Uppercase Letter
Value  Count  Frequency (%)
B  735  77.9%
F  209  22.1%
Dash Punctuation
Value  Count  Frequency (%)
-  474  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  472  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  472  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  418  100.0%
Other Punctuation
Value  Count  Frequency (%)
/  14  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  3012  53.7%
Hangul  1655  29.5%
Latin  944  16.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
262
15.8%
258
15.6%
258
15.6%
175
10.6%
175
10.6%
175
10.6%
175
10.6%
29
 
1.8%
26
 
1.6%
26
 
1.6%
Other values (17) 96
 
5.8%
Common
Value  Count  Frequency (%)
1  515  17.1%
-  474  15.7%
)  472  15.7%
(  472  15.7%
(space)  418  13.9%
2  290  9.6%
3  193  6.4%
4  87  2.9%
5  35  1.2%
6  18  0.6%
Other values (5)  38  1.3%
Latin
Value  Count  Frequency (%)
B  735  77.9%
F  209  22.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3956  70.5%
Hangul  1655  29.5%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
B  735  18.6%
1  515  13.0%
-  474  12.0%
)  472  11.9%
(  472  11.9%
(space)  418  10.6%
2  290  7.3%
F  209  5.3%
3  193  4.9%
4  87  2.2%
Other values (7)  91  2.3%
Hangul
ValueCountFrequency (%)
262
15.8%
258
15.6%
258
15.6%
175
10.6%
175
10.6%
175
10.6%
175
10.6%
29
 
1.8%
26
 
1.6%
26
 
1.6%
Other values (17) 96
 
5.8%

정원_인원
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.548729
Minimum: 11
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.3 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 15
median: 15
Q3: 15
95-th percentile: 24
Maximum: 24
Range: 13
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.2052502
Coefficient of variation (CV): 0.20614227
Kurtosis: 2.5388406
Mean: 15.548729
Median Absolute Deviation (MAD): 0
Skewness: 1.6120925
Sum: 7339
Variance: 10.273629
Monotonicity: Not monotonic
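
An interquartile range and a median absolute deviation of 0 may look suspicious, but they follow from 75.4% of the rows sharing the value 15. A short check from the value counts in this report, using the default quantile method of the standard library (an assumption; the profiler's interpolation may differ, although the choice does not matter here because the neighbouring values around each quartile are identical):

```python
import statistics

# Value counts for 정원_인원, copied from this report.
counts = {11: 52, 12: 1, 13: 4, 15: 356, 17: 5, 21: 6, 24: 48}
values = sorted(v for v, n in counts.items() for _ in range(n))

median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Median absolute deviation: the median distance from the median.
mad = statistics.median(abs(v - median) for v in values)

print(median, q1, q3, iqr, mad)
```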

[Figure: Histogram with fixed size bins (bins=7)]

Value  Count  Frequency (%)
15  356  75.4%
11  52  11.0%
24  48  10.2%
21  6  1.3%
17  5  1.1%
13  4  0.8%
12  1  0.2%

Value  Count  Frequency (%)
11  52  11.0%
12  1  0.2%
13  4  0.8%
15  356  75.4%
17  5  1.1%
21  6  1.3%
24  48  10.2%

Value  Count  Frequency (%)
24  48  10.2%
21  6  1.3%
17  5  1.1%
15  356  75.4%
13  4  0.8%
12  1  0.2%
11  52  11.0%

정원_중량(kg)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1044.9576
Minimum: 750
Maximum: 1800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.3 KiB

Quantile statistics

Minimum: 750
5-th percentile: 750
Q1: 1000
median: 1000
Q3: 1000
95-th percentile: 1600
Maximum: 1800
Range: 1050
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 221.24958
Coefficient of variation (CV): 0.21173067
Kurtosis: 2.7620745
Mean: 1044.9576
Median Absolute Deviation (MAD): 0
Skewness: 1.7536968
Sum: 493220
Variance: 48951.378
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Value  Count  Frequency (%)
1000  352  74.6%
750  52  11.0%
1600  50  10.6%
1150  8  1.7%
1005  4  0.8%
1800  4  0.8%
900  2  0.4%

Value  Count  Frequency (%)
750  52  11.0%
900  2  0.4%
1000  352  74.6%
1005  4  0.8%
1150  8  1.7%
1600  50  10.6%
1800  4  0.8%

Value  Count  Frequency (%)
1800  4  0.8%
1600  50  10.6%
1150  8  1.7%
1005  4  0.8%
1000  352  74.6%
900  2  0.4%
750  52  11.0%

Interactions

[Figure: interaction plots (9 panels)]

Correlations

[Figure: correlation heatmap]

               선명   출입구번호  정원_인원  정원_중량(kg)
선명           1.000  0.218       0.384      0.155
출입구번호     0.218  1.000       0.123      0.176
정원_인원      0.384  0.123       1.000      0.826
정원_중량(kg)  0.155  0.176       0.826      1.000

[Figure: correlation heatmap]

               출입구번호  정원_인원  정원_중량(kg)  선명
출입구번호     1.000       0.114      0.099          0.129
정원_인원      0.114       1.000      0.960          0.256
정원_중량(kg)  0.099       0.960      1.000          0.243
선명           0.129       0.256      0.243          1.000
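
The High correlation alerts for 정원_인원 and 정원_중량(kg) correspond to the large off-diagonal entries in the matrices above. A minimal sketch that scans the first matrix for strongly correlated pairs; the 0.5 cut-off and the helper name `high_correlation_pairs` are assumptions for illustration, not values taken from this report:

```python
from itertools import combinations

# First correlation matrix from this report (row/column order as shown).
names = ["선명", "출입구번호", "정원_인원", "정원_중량(kg)"]
matrix = [
    [1.000, 0.218, 0.384, 0.155],
    [0.218, 1.000, 0.123, 0.176],
    [0.384, 0.123, 1.000, 0.826],
    [0.155, 0.176, 0.826, 1.000],
]

def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (name_a, name_b, value) for each pair at or above threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

for a, b, value in high_correlation_pairs(names, matrix):
    print(f"{a} ~ {b}: {value:.3f}")
```

With this cut-off only the capacity and load pair is flagged, which is consistent with the two alerts listed in the Alerts section.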

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

     철도운영기관명  선명  역명  출입구번호  상세위치  정원_인원  정원_중량(kg)
0    서울교통공사  5호선  강동  <NA>  (B3-B4) 승강장  15  1000
1    서울교통공사  5호선  강동  1  (F1-B3)1번 출입구  15  1000
2    서울교통공사  5호선  강일  <NA>  (B3-B2) 승강장  24  1600
3    서울교통공사  5호선  강일  <NA>  (B3-B2) 승강장  24  1600
4    서울교통공사  5호선  강일  <NA>  (B3-B1) 승강장  24  1600
5    서울교통공사  5호선  강일  <NA>  (B3-B1) 승강장  24  1600
6    서울교통공사  5호선  강일  1  (B2-F1)1번 출입구  24  1600
7    서울교통공사  5호선  강일  3  (B2-F1)3번 출입구  24  1600
8    서울교통공사  5호선  강일  4  (B2-F1)4번 출입구  24  1600
9    서울교통공사  5호선  개롱  <NA>  (B1-B2) 승강장  15  1000

     철도운영기관명  선명  역명  출입구번호  상세위치  정원_인원  정원_중량(kg)
462  서울교통공사  8호선  잠실(송파구청)  <NA>  (B1-B3) 승강장  15  1000
463  서울교통공사  8호선  잠실(송파구청)  <NA>  (B1-B3) 승강장  15  1000
464  서울교통공사  8호선  잠실(송파구청)  9  (F1-B1)9번 출입구  15  1000
465  서울교통공사  8호선  잠실(송파구청)  10  (B2-F1)10번 출입구  17  1150
466  서울교통공사  8호선  장지  <NA>  (B1-B2) 승강장  11  750
467  서울교통공사  8호선  장지  <NA>  (B1-B2) 승강장  11  750
468  서울교통공사  8호선  장지  1  (F1-B1)1번 출입구  15  1000
469  서울교통공사  8호선  장지  3  (F1-B1)3번 출입구  15  1000
470  서울교통공사  8호선  천호(풍납토성)  <NA>  (B1-B2) 승강장  15  1000
471  서울교통공사  8호선  천호(풍납토성)  <NA>  (B1-B2) 승강장  15  1000

Duplicate rows

Most frequently occurring

   철도운영기관명  선명  역명  출입구번호  상세위치  정원_인원  정원_중량(kg)  # duplicates
0  서울교통공사  5호선  강일  <NA>  (B3-B1) 승강장  24  1600  2
1  서울교통공사  5호선  강일  <NA>  (B3-B2) 승강장  24  1600  2
2  서울교통공사  5호선  개롱  <NA>  (B1-B2) 승강장  15  1000  2
3  서울교통공사  5호선  거여  <NA>  (B1-B2) 승강장  15  1000  2
4  서울교통공사  5호선  고덕  <NA>  (B1-B2) 승강장  15  1000  2
5  서울교통공사  5호선  광나루(장신대)  <NA>  (B2-B3) 승강장  15  1000  2
6  서울교통공사  5호선  군자(능동)  <NA>  (B1-B3) 승강장  15  1000  2
7  서울교통공사  5호선  굽은다리(강동구민회관앞)  <NA>  (B1-B2) 승강장  15  1000  2
8  서울교통공사  5호선  길동  <NA>  (B2-B3) 승강장  15  1000  2
9  서울교통공사  5호선  김포공항  <NA>  (B2-B3) 승강장  15  1000  2
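
Duplicate rows like these can be counted without any third-party library. A minimal sketch using the first ten rows shown in the Sample section; it assumes a row counts as a duplicate when it repeats an earlier row exactly, which may not be precisely how the profiler arrives at its figure of 95:

```python
from collections import Counter

# First ten sample rows: (선명, 역명, 출입구번호, 상세위치, 정원_인원, 정원_중량(kg)).
# None stands for <NA>; the constant 철도운영기관명 column is omitted.
rows = [
    ("5호선", "강동", None, "(B3-B4) 승강장", 15, 1000),
    ("5호선", "강동", 1, "(F1-B3)1번 출입구", 15, 1000),
    ("5호선", "강일", None, "(B3-B2) 승강장", 24, 1600),
    ("5호선", "강일", None, "(B3-B2) 승강장", 24, 1600),
    ("5호선", "강일", None, "(B3-B1) 승강장", 24, 1600),
    ("5호선", "강일", None, "(B3-B1) 승강장", 24, 1600),
    ("5호선", "강일", 1, "(B2-F1)1번 출입구", 24, 1600),
    ("5호선", "강일", 3, "(B2-F1)3번 출입구", 24, 1600),
    ("5호선", "강일", 4, "(B2-F1)4번 출입구", 24, 1600),
    ("5호선", "개롱", None, "(B1-B2) 승강장", 15, 1000),
]

occurrences = Counter(rows)

# Rows that appear more than once, with how often they occur.
repeated = {row: n for row, n in occurrences.items() if n > 1}

# Number of rows that repeat an earlier row.
n_duplicates = sum(n - 1 for n in occurrences.values())

print(len(repeated), n_duplicates)
```

The two repeated rows found in this small sample are the 강일 platform elevators that also head the table above.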