Overview

Dataset statistics

Number of variables: 7
Number of observations: 347
Missing cells: 141
Missing cells (%): 5.8%
Duplicate rows: 7
Duplicate rows (%): 2.0%
Total size in memory: 20.1 KiB
Average record size in memory: 59.4 B
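
The percentage figures above can be rechecked from the raw counts. The sketch below does this in Python, assuming that the missing-cell percentage is taken over all cells (rows times columns). The per-column missing counts are those reported in the variable sections of this report, and the variable names are illustrative.

```python
# Counts copied from this report; variable names are illustrative.
n_rows, n_cols = 347, 7
missing_per_column = {
    "출입구번호": 132,
    "상세위치": 2,
    "정원_인원": 5,
    "정원_중량(kg)": 2,
}
duplicate_rows = 7

missing_cells = sum(missing_per_column.values())
# Assumption: the percentage is taken over all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_rows * n_cols)
duplicate_pct = 100 * duplicate_rows / n_rows

print(missing_cells, round(missing_pct, 1), round(duplicate_pct, 1))
```

The per-column counts sum to the reported 141 missing cells, and both percentages round to the values shown above.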

Variable types

Categorical: 2
Text: 2
Numeric: 3

Dataset

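Description: Elevator data for the urban and metropolitan railway stations on Seoul Metropolitan Subway Line 1 (수도권1호선). The columns are the railway operator name (철도운영기관명), line name (선명), station name (역명), entrance number (출입구번호), detailed location (상세위치), rated capacity in persons (정원_인원), and rated load in kilograms (정원_중량(kg)).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041389/fileData.do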

Alerts

선명 has constant value "1호선" (Constant)
Dataset has 7 (2.0%) duplicate rows (Duplicates)
정원_인원 is highly overall correlated with 정원_중량(kg) (High correlation)
정원_중량(kg) is highly overall correlated with 정원_인원 (High correlation)
철도운영기관명 is highly imbalanced (51.9%) (Imbalance)
출입구번호 has 132 (38.0%) missing values (Missing)
정원_인원 has 5 (1.4%) missing values (Missing)
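
The 51.9% imbalance figure is consistent with a normalized-entropy score, that is, one minus the Shannon entropy of the class frequencies divided by its maximum possible value. The sketch below is an assumption about how such a score could be computed, not a statement of the profiler's implementation. The counts are the 철도운영기관명 value counts from this report.

```python
import math

# Value counts for 철도운영기관명, copied from this report.
counts = {"코레일": 311, "서울교통공사": 36}
total = sum(counts.values())

# Shannon entropy (base 2) of the class frequencies.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

# Assumed score: 0 for perfectly balanced classes, 1 for a single class.
imbalance = 1 - entropy / math.log2(len(counts))

print(f"{imbalance:.1%}")
```

Under this assumption the score rounds to the 51.9% shown in the alert.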

Reproduction

Analysis started: 2023-12-12 18:25:12.661983
Analysis finished: 2023-12-12 18:25:14.235163
Duration: 1.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB
코레일: 311
서울교통공사: 36

Length

Max length: 6
Median length: 3
Mean length: 3.3112392
Min length: 3
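
These length statistics can be rechecked from the two category values and their counts. A minimal sketch, assuming lengths are counted in Unicode code points (which is what Python's len does for str):

```python
import statistics

# Value counts for 철도운영기관명, copied from this report.
counts = {"코레일": 311, "서울교통공사": 36}

# One length per row; len() counts Unicode code points.
lengths = [len(name) for name, n in counts.items() for _ in range(n)]

mean_length = statistics.mean(lengths)
median_length = statistics.median(lengths)

print(min(lengths), max(lengths), median_length, round(mean_length, 7))
```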

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
코레일 | 311 | 89.6%
서울교통공사 | 36 | 10.4%

Length

Histogram of lengths of the category

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB
1호선: 347

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
1호선 | 347 | 100.0%

Length

Histogram of lengths of the category

역명
Text

Distinct: 94
Distinct (%): 27.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 12
Median length: 2
Mean length: 2.5965418
Min length: 2

Characters and Unicode

Total characters: 901
Distinct characters: 117
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 2.0%

Sample

1st row: 동대문
2nd row: 동대문
3rd row: 동대문
4th row: 동묘앞
5th row: 동묘앞

Value | Count | Frequency (%)
영등포 | 12 | 3.5%
광명 | 12 | 3.5%
의정부 | 8 | 2.3%
월계 | 8 | 2.3%
녹천 | 8 | 2.3%
동묘앞 | 7 | 2.0%
구로 | 7 | 2.0%
천안 | 6 | 1.7%
주안 | 6 | 1.7%
평택 | 6 | 1.7%
Other values (84) | 267 | 76.9%

Most occurring characters

ValueCountFrequency (%)
42
 
4.7%
40
 
4.4%
31
 
3.4%
27
 
3.0%
27
 
3.0%
22
 
2.4%
20
 
2.2%
19
 
2.1%
16
 
1.8%
16
 
1.8%
Other values (107) 641
71.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 875 | 97.1%
Close Punctuation | 10 | 1.1%
Open Punctuation | 10 | 1.1%
Decimal Number | 6 | 0.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
42
 
4.8%
40
 
4.6%
31
 
3.5%
27
 
3.1%
27
 
3.1%
22
 
2.5%
20
 
2.3%
19
 
2.2%
16
 
1.8%
16
 
1.8%
Other values (103) 615
70.3%
Decimal Number
Value | Count | Frequency (%)
5 | 3 | 50.0%
3 | 3 | 50.0%
Close Punctuation
Value | Count | Frequency (%)
) | 10 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 10 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 875 | 97.1%
Common | 26 | 2.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
42
 
4.8%
40
 
4.6%
31
 
3.5%
27
 
3.1%
27
 
3.1%
22
 
2.5%
20
 
2.3%
19
 
2.2%
16
 
1.8%
16
 
1.8%
Other values (103) 615
70.3%
Common
Value | Count | Frequency (%)
) | 10 | 38.5%
( | 10 | 38.5%
5 | 3 | 11.5%
3 | 3 | 11.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 875 | 97.1%
ASCII | 26 | 2.9%

Most frequent character per block

Hangul
ValueCountFrequency (%)
42
 
4.8%
40
 
4.6%
31
 
3.5%
27
 
3.1%
27
 
3.1%
22
 
2.5%
20
 
2.3%
19
 
2.2%
16
 
1.8%
16
 
1.8%
Other values (103) 615
70.3%
ASCII
Value | Count | Frequency (%)
) | 10 | 38.5%
( | 10 | 38.5%
5 | 3 | 11.5%
3 | 3 | 11.5%

출입구번호
Real number (ℝ)

MISSING 

Distinct: 10
Distinct (%): 4.7%
Missing: 132
Missing (%): 38.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1906977
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 6
Maximum: 12
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.776247
Coefficient of variation (CV): 0.81081337
Kurtosis: 6.2588892
Mean: 2.1906977
Median Absolute Deviation (MAD): 1
Skewness: 2.278569
Sum: 471
Variance: 3.1550532
Monotonicity: Not monotonic
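
The mean, variance, standard deviation and coefficient of variation above can be reproduced from the value counts of 출입구번호 listed in this section. The sketch below assumes the sample (n - 1) variance, which matches the reported 3.1550532. The 132 missing rows are excluded.

```python
import statistics

# Value counts for 출입구번호, copied from this report (missing rows excluded).
value_counts = {1: 102, 2: 58, 3: 23, 4: 10, 5: 7, 6: 7, 7: 2, 8: 4, 9: 1, 12: 1}

# Expand the counts back into one value per non-missing row.
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean  # coefficient of variation

print(len(values), sum(values))
print(round(mean, 7), round(variance, 7), round(std, 6), round(cv, 8))
```

The expanded list has 215 values (347 rows minus 132 missing) and sums to the reported 471.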
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
1 | 102 | 29.4%
2 | 58 | 16.7%
3 | 23 | 6.6%
4 | 10 | 2.9%
6 | 7 | 2.0%
5 | 7 | 2.0%
8 | 4 | 1.2%
7 | 2 | 0.6%
12 | 1 | 0.3%
9 | 1 | 0.3%
(Missing) | 132 | 38.0%

Minimum 10 values
Value | Count | Frequency (%)
1 | 102 | 29.4%
2 | 58 | 16.7%
3 | 23 | 6.6%
4 | 10 | 2.9%
5 | 7 | 2.0%
6 | 7 | 2.0%
7 | 2 | 0.6%
8 | 4 | 1.2%
9 | 1 | 0.3%
12 | 1 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
12 | 1 | 0.3%
9 | 1 | 0.3%
8 | 4 | 1.2%
7 | 2 | 0.6%
6 | 7 | 2.0%
5 | 7 | 2.0%
4 | 10 | 2.9%
3 | 23 | 6.6%
2 | 58 | 16.7%
1 | 102 | 29.4%

상세위치
Text

Distinct: 327
Distinct (%): 94.8%
Missing: 2
Missing (%): 0.6%
Memory size: 2.8 KiB

Length

Max length: 102
Median length: 52
Mean length: 27.637681
Min length: 3

Characters and Unicode

Total characters: 9535
Distinct characters: 244
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
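
As an illustration of these character properties, the sketch below uses Python's standard unicodedata module to count general categories in one sample value from this column. The two-letter codes it returns correspond to the category names used in this report (for example, Lo is Other Letter and Nd is Decimal Number). This is an illustration only, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# One sample 상세위치 value taken from this report.
text = "(B1-F1)6번 출입구측"

# unicodedata.category returns a two-letter general-category code:
# Lo = Other Letter, Nd = Decimal Number, Lu = Uppercase Letter,
# Ps/Pe = Open/Close Punctuation, Pd = Dash Punctuation, Zs = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in text)

for code, count in categories.most_common():
    print(code, count)
```

For this value the Hangul syllables fall under Other Letter, the digits under Decimal Number, and the brackets, hyphen and space under their respective punctuation and separator categories.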

Unique

Unique: 312
Unique (%): 90.4%

Sample

1st row: (B1-B2) 10-4
2nd row: (B1-B2) 2-1
3rd row: (B1-F1)6번 출입구측
4th row: (B1-F4)본관건물(상)6-2
5th row: (B1-B2) 10-3

Value | Count | Frequency (%)
1f | 181 | 7.7%
승강장 | 139 | 5.9%
출입구 | 113 | 4.8%
방향 | 112 | 4.8%
 | 105 | 4.5%
 | 103 | 4.4%
2f | 89 | 3.8%
출입문 | 63 | 2.7%
맞이방 | 49 | 2.1%
계단 | 48 | 2.0%
Other values (454) | 1348 | 57.4%

Most occurring characters

ValueCountFrequency (%)
2090
21.9%
( 515
 
5.4%
) 514
 
5.4%
1 491
 
5.1%
F 374
 
3.9%
2 309
 
3.2%
252
 
2.6%
251
 
2.6%
243
 
2.5%
223
 
2.3%
Other values (234) 4273
44.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4431 | 46.5%
Space Separator | 2090 | 21.9%
Decimal Number | 1162 | 12.2%
Open Punctuation | 515 | 5.4%
Close Punctuation | 514 | 5.4%
Uppercase Letter | 496 | 5.2%
Dash Punctuation | 192 | 2.0%
Other Punctuation | 118 | 1.2%
Math Symbol | 16 | 0.2%
Lowercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
252
 
5.7%
251
 
5.7%
243
 
5.5%
223
 
5.0%
199
 
4.5%
195
 
4.4%
188
 
4.2%
167
 
3.8%
153
 
3.5%
139
 
3.1%
Other values (203) 2421
54.6%
Decimal Number
Value | Count | Frequency (%)
1 | 491 | 42.3%
2 | 309 | 26.6%
3 | 144 | 12.4%
4 | 94 | 8.1%
5 | 36 | 3.1%
6 | 26 | 2.2%
7 | 23 | 2.0%
8 | 16 | 1.4%
0 | 12 | 1.0%
9 | 11 | 0.9%
Uppercase Letter
Value | Count | Frequency (%)
F | 374 | 75.4%
B | 113 | 22.8%
C | 2 | 0.4%
M | 1 | 0.2%
R | 1 | 0.2%
N | 1 | 0.2%
E | 1 | 0.2%
S | 1 | 0.2%
U | 1 | 0.2%
A | 1 | 0.2%
Other Punctuation
Value | Count | Frequency (%)
/ | 110 | 93.2%
. | 6 | 5.1%
· | 2 | 1.7%
Math Symbol
Value | Count | Frequency (%)
> | 11 | 68.8%
 | 3 | 18.8%
~ | 2 | 12.5%
Space Separator
Value | Count | Frequency (%)
(space) | 2090 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 515 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 514 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 192 | 100.0%
Lowercase Letter
Value | Count | Frequency (%)
m | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4607 | 48.3%
Hangul | 4431 | 46.5%
Latin | 497 | 5.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
252
 
5.7%
251
 
5.7%
243
 
5.5%
223
 
5.0%
199
 
4.5%
195
 
4.4%
188
 
4.2%
167
 
3.8%
153
 
3.5%
139
 
3.1%
Other values (203) 2421
54.6%
Common
Value | Count | Frequency (%)
(space) | 2090 | 45.4%
( | 515 | 11.2%
) | 514 | 11.2%
1 | 491 | 10.7%
2 | 309 | 6.7%
- | 192 | 4.2%
3 | 144 | 3.1%
/ | 110 | 2.4%
4 | 94 | 2.0%
5 | 36 | 0.8%
Other values (10) | 112 | 2.4%
Latin
Value | Count | Frequency (%)
F | 374 | 75.3%
B | 113 | 22.7%
C | 2 | 0.4%
m | 1 | 0.2%
M | 1 | 0.2%
R | 1 | 0.2%
N | 1 | 0.2%
E | 1 | 0.2%
S | 1 | 0.2%
U | 1 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5099 | 53.5%
Hangul | 4431 | 46.5%
Arrows | 3 | < 0.1%
None | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2090 | 41.0%
( | 515 | 10.1%
) | 514 | 10.1%
1 | 491 | 9.6%
F | 374 | 7.3%
2 | 309 | 6.1%
- | 192 | 3.8%
3 | 144 | 2.8%
B | 113 | 2.2%
/ | 110 | 2.2%
Other values (19) | 247 | 4.8%
Hangul
ValueCountFrequency (%)
252
 
5.7%
251
 
5.7%
243
 
5.5%
223
 
5.0%
199
 
4.5%
195
 
4.4%
188
 
4.2%
167
 
3.8%
153
 
3.5%
139
 
3.1%
Other values (203) 2421
54.6%
Arrows
Value | Count | Frequency (%)
 | 3 | 100.0%
None
Value | Count | Frequency (%)
· | 2 | 100.0%

정원_인원
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 9
Distinct (%): 2.6%
Missing: 5
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.669591
Minimum: 11
Maximum: 40
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 11
5th percentile: 11
Q1: 15
Median: 15
Q3: 15
95th percentile: 20
Maximum: 40
Range: 29
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.9406043
Coefficient of variation (CV): 0.31529888
Kurtosis: 18.175964
Mean: 15.669591
Median Absolute Deviation (MAD): 0
Skewness: 4.1947055
Sum: 5359
Variance: 24.409571
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)

Common values
Value | Count | Frequency (%)
15 | 252 | 72.6%
11 | 27 | 7.8%
13 | 26 | 7.5%
17 | 14 | 4.0%
40 | 12 | 3.5%
20 | 6 | 1.7%
24 | 2 | 0.6%
21 | 2 | 0.6%
16 | 1 | 0.3%
(Missing) | 5 | 1.4%

Minimum 10 values
Value | Count | Frequency (%)
11 | 27 | 7.8%
13 | 26 | 7.5%
15 | 252 | 72.6%
16 | 1 | 0.3%
17 | 14 | 4.0%
20 | 6 | 1.7%
21 | 2 | 0.6%
24 | 2 | 0.6%
40 | 12 | 3.5%

Maximum 10 values
Value | Count | Frequency (%)
40 | 12 | 3.5%
24 | 2 | 0.6%
21 | 2 | 0.6%
20 | 6 | 1.7%
17 | 14 | 4.0%
16 | 1 | 0.3%
15 | 252 | 72.6%
13 | 26 | 7.5%
11 | 27 | 7.8%

정원_중량(kg)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 13
Distinct (%): 3.8%
Missing: 2
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1062.2986
Minimum: 500
Maximum: 2600
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 500
5th percentile: 750
Q1: 1000
Median: 1000
Q3: 1000
95th percentile: 1350
Maximum: 2600
Range: 2100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 317.25738
Coefficient of variation (CV): 0.2986518
Kurtosis: 16.799255
Mean: 1062.2986
Median Absolute Deviation (MAD): 0
Skewness: 3.9903604
Sum: 366493
Variance: 100652.24
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=13)

Common values
Value | Count | Frequency (%)
1000 | 249 | 71.8%
750 | 24 | 6.9%
1150 | 22 | 6.3%
900 | 12 | 3.5%
2600 | 12 | 3.5%
1040 | 6 | 1.7%
1275 | 6 | 1.7%
1600 | 5 | 1.4%
1350 | 5 | 1.4%
500 | 1 | 0.3%
Other values (3) | 3 | 0.9%
(Missing) | 2 | 0.6%

Minimum 10 values
Value | Count | Frequency (%)
500 | 1 | 0.3%
750 | 24 | 6.9%
900 | 12 | 3.5%
1000 | 249 | 71.8%
1001 | 1 | 0.3%
1002 | 1 | 0.3%
1040 | 6 | 1.7%
1050 | 1 | 0.3%
1150 | 22 | 6.3%
1275 | 6 | 1.7%

Maximum 10 values
Value | Count | Frequency (%)
2600 | 12 | 3.5%
1600 | 5 | 1.4%
1350 | 5 | 1.4%
1275 | 6 | 1.7%
1150 | 22 | 6.3%
1050 | 1 | 0.3%
1040 | 6 | 1.7%
1002 | 1 | 0.3%
1001 | 1 | 0.3%
1000 | 249 | 71.8%

Correlations

Variable | 철도운영기관명 | 역명 | 출입구번호 | 정원_인원 | 정원_중량(kg)
철도운영기관명 | 1.000 | 1.000 | 0.288 | 0.000 | 0.153
역명 | 1.000 | 1.000 | 0.843 | 0.934 | 0.952
출입구번호 | 0.288 | 0.843 | 1.000 | 0.296 | 0.228
정원_인원 | 0.000 | 0.934 | 0.296 | 1.000 | 0.985
정원_중량(kg) | 0.153 | 0.952 | 0.228 | 0.985 | 1.000

Variable | 출입구번호 | 정원_인원 | 정원_중량(kg) | 철도운영기관명
출입구번호 | 1.000 | 0.175 | 0.206 | 0.282
정원_인원 | 0.175 | 1.000 | 0.781 | 0.000
정원_중량(kg) | 0.206 | 0.781 | 1.000 | 0.097
철도운영기관명 | 0.282 | 0.000 | 0.097 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

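First rows
Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
0 | 서울교통공사 | 1호선 | 동대문 | <NA> | (B1-B2) 10-4 | 11 | 750
1 | 서울교통공사 | 1호선 | 동대문 | <NA> | (B1-B2) 2-1 | 11 | 750
2 | 서울교통공사 | 1호선 | 동대문 | 6 | (B1-F1)6번 출입구측 | 15 | 1000
3 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-F4)본관건물(상)6-2 | 15 | 1000
4 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-B2) 10-3 | 15 | 1000
5 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B2-B1) 4-8 | 15 | 1000
6 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B1-B2) 7-3 | 15 | 1000
7 | 서울교통공사 | 1호선 | 동묘앞 | <NA> | (B2-B1) 1-2 | 15 | 1000
8 | 서울교통공사 | 1호선 | 동묘앞 | 1 | (B1-F1)1-10번 출입구 사이 | 15 | 1000
9 | 서울교통공사 | 1호선 | 동묘앞 | 3 | (B1-F1)3번 출입구측 | 15 | 1000

Last rows
Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg)
337 | 코레일 | 1호선 | 화서 | 5 | (1) 1층 상행맞이방 장애인게이트 근처/ 5번 출입구 옆 (2) 6번 출입구 옆 | 15 | 1000
338 | 코레일 | 1호선 | 화서 | 1 | (1R) 2번 출입구 근처/ (1) 1층 하행맞이방 장애인게이트 앞 (2) 1번 출입구 옆 | 15 | 1000
339 | 코레일 | 1호선 | 회기 | 1 | (1층) 1번 출입구 옆 | 13 | 1000
340 | 코레일 | 1호선 | 회기 | 2 | (1층) 2번 출입구 옆 | 13 | 1000
341 | 코레일 | 1호선 | 회기 | <NA> | (1층) 광운대방면 승강장 5-4 출입문 앞 | 13 | 1000
342 | 코레일 | 1호선 | 회기 | <NA> | (1층) 문산방면 승강장 4-4 출입문 앞 | 13 | 1000
343 | 코레일 | 1호선 | 회룡 | <NA> | (1F) 망월사역 방향 승강장 7-2 출입문 앞 | 15 | 1000
344 | 코레일 | 1호선 | 회룡 | <NA> | (1F) 의정부역 방향 승강장 3-4 출입문 앞 | 15 | 1000
345 | 코레일 | 1호선 | 회룡 | 3 | (1F) 3번 출입구 북쪽 20M 지점 | 15 | 1000
346 | 코레일 | 1호선 | 회룡 | 5 | (1F) 5번 출입구 계단 옆 | 15 | 1000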

Duplicate rows

Most frequently occurring

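Index | 철도운영기관명 | 선명 | 역명 | 출입구번호 | 상세위치 | 정원_인원 | 정원_중량(kg) | # duplicates
0 | 코레일 | 1호선 | 녹천 | 4 | <NA> | <NA> | <NA> | 2
1 | 코레일 | 1호선 | 동두천중앙 | 3 | (1F) 4번 출입구 방향 | 15 | 1000 | 2
2 | 코레일 | 1호선 | 동암 | <NA> | 맞이방 | 15 | 1000 | 2
3 | 코레일 | 1호선 | 부평 | 2 | (2F) 2/3번 출입구 옆 (1F) 상행승강장 | 15 | 1000 | 2
4 | 코레일 | 1호선 | 송탄 | <NA> | (2F) 맞이방 주출입구 옆 | 15 | 1000 | 2
5 | 코레일 | 1호선 | 의정부 | 8 | (B2) 8번 출입구 방향 | 15 | 1150 | 2
6 | 코레일 | 1호선 | 직산 | <NA> | (2F) 1번/2번 출입구 사이 엘리베이터 | 15 | 1000 | 2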
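
The count of 7 duplicate rows can be related to the table above in two ways: as the number of distinct duplicated row patterns, or as the number of surplus copies beyond the first occurrence of each pattern. The report does not say which definition it uses. The sketch below shows that both readings give 7 here, because every pattern occurs exactly twice. The variable names are illustrative.

```python
# "# duplicates" column copied from the table above: one entry per duplicated row pattern.
group_sizes = [2, 2, 2, 2, 2, 2, 2]
n_rows = 347  # number of observations in the dataset

# Reading 1: number of distinct row patterns that occur more than once.
distinct_duplicated_patterns = len(group_sizes)

# Reading 2: number of extra copies beyond the first occurrence of each pattern.
surplus_copies = sum(size - 1 for size in group_sizes)

duplicate_pct = 100 * surplus_copies / n_rows
print(distinct_duplicated_patterns, surplus_copies, round(duplicate_pct, 1))
```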