Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 39
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 664.1 KiB
Average record size in memory: 68.0 B
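
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming "Missing cells (%)" is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 39
total_size_kib = 664.1

# Assumption: the missing-cell percentage is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the unrounded missing share is about 0.056%, which the report displays as 0.1%, and the average record size reproduces the 68.0 B shown.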

Variable types

Categorical: 3
Text: 1
Numeric: 3

Dataset

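Description: Platform gap distance data for Line 1 (1호선), operated by 서울교통공사 (Seoul Metro). The fields are railway operator name (철도운영기관명), line name (선명), station name (역명) and, for each gap measurement, platform number (승강장번호), car order (차량순서), car door number (차량출입문번호), and safety distance (안전거리).
Author: 국가철도공단
URL: https://www.data.go.kr/data/15041514/fileData.do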

Alerts

선명 has constant value "1호선" (Constant)
철도운영기관명 is highly imbalanced (60.7%) (Imbalance)
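
The 60.7% imbalance figure is consistent with an entropy-based score, one minus the Shannon entropy of the class shares divided by its maximum for the number of classes. That the profiler computes the score this way is an assumption inferred from the counts in this report, not something the report states. A minimal sketch, where `imbalance_score` is a hypothetical helper name:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # A single class has no entropy to normalise; returning 0.0 is a choice made here.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts copied from the 철도운영기관명 frequency table.
operator_counts = {"코레일": 9227, "서울교통공사": 773}
score = imbalance_score(list(operator_counts.values()))
print(f"{score:.1%}")  # prints 60.7%
```

A score of 0 would mean perfectly balanced classes and a score near 1 means almost all rows fall in one class.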

Reproduction

Analysis started: 2023-12-12 05:43:21.418064
Analysis finished: 2023-12-12 05:43:23.317770
Duration: 1.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
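
A report of this kind can be regenerated roughly as follows. This is a sketch under stated assumptions: the CSV path, output file name, encoding, and report title are hypothetical, and the settings stored in config.json are not reproduced here. It uses `ProfileReport` and `to_file` from ydata-profiling with default options.

```python
def build_profile(csv_path, output_html="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module can be loaded even where the
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source file is a CSV readable with the given encoding.
    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="승강장 이격거리 (1호선)")  # hypothetical title
    profile.to_file(output_html)
    return output_html
```

Calling `build_profile("platform_gap.csv")` with a local copy of the dataset (file name hypothetical) would write `report.html`. The encoding may need adjusting for the downloaded file.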

Variables

철도운영기관명
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
코레일: 9227
서울교통공사: 773

Length

Max length: 6
Median length: 3
Mean length: 3.2319
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코레일
2nd row: 코레일
3rd row: 코레일
4th row: 코레일
5th row: 코레일

Common Values

Value  Count  Frequency (%)
코레일  9227  92.3%
서울교통공사  773  7.7%

Length

Histogram of lengths of the category

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
1호선: 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value  Count  Frequency (%)
1호선  10000  100.0%

Length

Histogram of lengths of the category

역명
Text

Distinct: 97
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 2
Mean length: 2.5441
Min length: 2

Characters and Unicode

Total characters: 25441
Distinct characters: 117
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 오류동
2nd row: 석수
3rd row: 세마
4th row: 구일
5th row: 평택

Value  Count  Frequency (%)
구로  351  3.5%
광운대  196  2.0%
영등포  194  1.9%
대방  157  1.6%
제물포  156  1.6%
용산  156  1.6%
동인천  156  1.6%
군포  156  1.6%
창동  156  1.6%
동암  155  1.6%
Other values (87)  8167  81.7%

Most occurring characters

Value  Count  Frequency (%)
  1163  4.6%
  972  3.8%
  918  3.6%
  892  3.5%
  660  2.6%
  649  2.6%
  532  2.1%
  506  2.0%
  506  2.0%
  501  2.0%
Other values (107)  18142  71.3%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  24822  97.6%
Open Punctuation  232  0.9%
Close Punctuation  232  0.9%
Decimal Number  155  0.6%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
  1163  4.7%
  972  3.9%
  918  3.7%
  892  3.6%
  660  2.7%
  649  2.6%
  532  2.1%
  506  2.0%
  506  2.0%
  501  2.0%
Other values (103)  17523  70.6%

Decimal Number
Value  Count  Frequency (%)
3  78  50.3%
5  77  49.7%

Open Punctuation
Value  Count  Frequency (%)
(  232  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  232  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  24822  97.6%
Common  619  2.4%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
  1163  4.7%
  972  3.9%
  918  3.7%
  892  3.6%
  660  2.7%
  649  2.6%
  532  2.1%
  506  2.0%
  506  2.0%
  501  2.0%
Other values (103)  17523  70.6%

Common
Value  Count  Frequency (%)
(  232  37.5%
)  232  37.5%
3  78  12.6%
5  77  12.4%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  24822  97.6%
ASCII  619  2.4%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
  1163  4.7%
  972  3.9%
  918  3.7%
  892  3.6%
  660  2.7%
  649  2.6%
  532  2.1%
  506  2.0%
  506  2.0%
  501  2.0%
Other values (103)  17523  70.6%

ASCII
Value  Count  Frequency (%)
(  232  37.5%
)  232  37.5%
3  78  12.6%
5  77  12.4%

승강장번호
Real number (ℝ)

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0768
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 4
Maximum: 9
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.2280279
Coefficient of variation (CV): 0.59130774
Kurtosis: 5.9040504
Mean: 2.0768
Median Absolute Deviation (MAD): 1
Skewness: 1.9037344
Sum: 20768
Variance: 1.5080526
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value  Count  Frequency (%)
2  3747  37.5%
1  3732  37.3%
3  1217  12.2%
4  1028  10.3%
5  117  1.2%
9  40  0.4%
8  40  0.4%
7  40  0.4%
6  39  0.4%

Value  Count  Frequency (%)
1  3732  37.3%
2  3747  37.5%
3  1217  12.2%
4  1028  10.3%
5  117  1.2%
6  39  0.4%
7  40  0.4%
8  40  0.4%
9  40  0.4%

Value  Count  Frequency (%)
9  40  0.4%
8  40  0.4%
7  40  0.4%
6  39  0.4%
5  117  1.2%
4  1028  10.3%
3  1217  12.2%
2  3747  37.5%
1  3732  37.3%
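
The sum, mean, variance, standard deviation, and coefficient of variation reported for 승강장번호 can be recomputed from its frequency table alone. A minimal sketch, assuming the reported variance is the sample variance (divisor n - 1), which is what the figures above are consistent with; the variable names are illustrative:

```python
import math

# Value -> count, copied from the 승강장번호 frequency table.
platform_counts = {1: 3732, 2: 3747, 3: 1217, 4: 1028, 5: 117,
                   6: 39, 7: 40, 8: 40, 9: 40}

n = sum(platform_counts.values())
total = sum(v * c for v, c in platform_counts.items())
mean = total / n

# Assumption: sample variance, i.e. squared deviations divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in platform_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean

print(n, total, mean)
print(round(variance, 7), round(std, 7), round(cv, 8))
```

The recomputed values agree with the Sum, Mean, Variance, Standard deviation, and CV lines above to the precision shown.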

차량순서
Real number (ℝ)

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.4848
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 8
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.8687265
Coefficient of variation (CV): 0.52303211
Kurtosis: -1.2207575
Mean: 5.4848
Median Absolute Deviation (MAD): 2
Skewness: 0.0039800821
Sum: 54848
Variance: 8.2295919
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
1  1010  10.1%
5  1006  10.1%
3  1006  10.1%
7  1005  10.1%
8  1003  10.0%
6  1002  10.0%
2  999  10.0%
4  997  10.0%
9  987  9.9%
10  985  9.8%

Value  Count  Frequency (%)
1  1010  10.1%
2  999  10.0%
3  1006  10.1%
4  997  10.0%
5  1006  10.1%
6  1002  10.0%
7  1005  10.1%
8  1003  10.0%
9  987  9.9%
10  985  9.8%

Value  Count  Frequency (%)
10  985  9.8%
9  987  9.9%
8  1003  10.0%
7  1005  10.1%
6  1002  10.0%
5  1006  10.1%
4  997  10.0%
3  1006  10.1%
2  999  10.0%
1  1010  10.1%
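
The near-equal counts for 차량순서 suggest a roughly uniform spread over car positions 1 to 10. As a plausibility check, the reported moments can be compared with the textbook moments of a discrete uniform distribution on 1..k. This is an illustrative comparison, not something the profiler performs, and `discrete_uniform_moments` is a hypothetical helper:

```python
def discrete_uniform_moments(k):
    """Mean, variance and excess kurtosis of a uniform distribution on 1..k."""
    mean = (k + 1) / 2
    variance = (k * k - 1) / 12
    excess_kurtosis = -6 * (k * k + 1) / (5 * (k * k - 1))
    return mean, variance, excess_kurtosis


# Reported values copied from the 차량순서 statistics.
reported = {"mean": 5.4848, "variance": 8.2295919, "kurtosis": -1.2207575}

mean, variance, kurtosis = discrete_uniform_moments(10)
theoretical = {"mean": mean, "variance": variance, "kurtosis": kurtosis}

for name, value in theoretical.items():
    print(f"{name}: uniform={value:.4f} reported={reported[name]:.4f}")
```

The uniform reference gives a mean of 5.5, a variance of 8.25, and an excess kurtosis of about -1.224, all close to the reported figures.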

차량출입문번호
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
3: 2510
1: 2505
4: 2497
2: 2488

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 1
4th row: 3
5th row: 3

Common Values

Value  Count  Frequency (%)
3  2510  25.1%
1  2505  25.1%
4  2497  25.0%
2  2488  24.9%

Length

Histogram of lengths of the category

안전거리
Real number (ℝ)

Distinct: 212
Distinct (%): 2.1%
Missing: 39
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.912483
Minimum: 0.5
Maximum: 36
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.5
5th percentile: 6
Q1: 8.9
Median: 10.1
Q3: 13
95th percentile: 16.5
Maximum: 36
Range: 35.5
Interquartile range (IQR): 4.1

Descriptive statistics

Standard deviation: 3.4570659
Coefficient of variation (CV): 0.31679921
Kurtosis: 3.1711369
Mean: 10.912483
Median Absolute Deviation (MAD): 2.1
Skewness: 0.99864724
Sum: 108699.24
Variance: 11.951305
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
9.0  903  9.0%
10.0  830  8.3%
8.0  559  5.6%
11.0  543  5.4%
13.0  464  4.6%
14.0  460  4.6%
12.0  412  4.1%
9.5  360  3.6%
15.0  344  3.4%
7.0  260  2.6%
Other values (202)  4826  48.3%

Value  Count  Frequency (%)
0.5  1  < 0.1%
0.7  1  < 0.1%
1.0  1  < 0.1%
1.5  1  < 0.1%
1.9  3  < 0.1%
2.0  12  0.1%
2.1  1  < 0.1%
2.2  2  < 0.1%
2.3  4  < 0.1%
2.4  3  < 0.1%

Value  Count  Frequency (%)
36.0  1  < 0.1%
35.0  1  < 0.1%
34.0  1  < 0.1%
33.0  1  < 0.1%
32.0  1  < 0.1%
30.0  2  < 0.1%
29.0  3  < 0.1%
28.0  7  0.1%
27.0  7  0.1%
26.9  1  < 0.1%
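
안전거리 is the only variable with missing values, so its mean is taken over the 9961 non-missing observations and not over all 10000 rows. A minimal sketch recomputing the derived figures from the reported ones; the variable names are illustrative and the inputs are the rounded values shown above:

```python
# Figures copied from the 안전거리 statistics.
n_observations = 10_000
n_missing = 39
total = 108699.24            # Sum
q1, q3 = 8.9, 13.0
minimum, maximum = 0.5, 36.0
std = 3.4570659

n_valid = n_observations - n_missing
mean = total / n_valid               # mean over non-missing rows only
iqr = q3 - q1
value_range = maximum - minimum
cv = std / mean                      # coefficient of variation
missing_pct = 100 * n_missing / n_observations

print(n_valid, round(mean, 6), round(iqr, 1), value_range)
print(round(cv, 8), f"{missing_pct:.2f}%")
```

Dividing the sum by all 10000 rows would give about 10.87, not the reported 10.912483, which is how the treatment of missing values can be confirmed from the report alone.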

Interactions

Correlations

Variable  철도운영기관명  역명  승강장번호  차량순서  차량출입문번호  안전거리
철도운영기관명  1.000  1.000  0.166  0.000  0.000  0.171
역명  1.000  1.000  0.709  0.000  0.000  0.780
승강장번호  0.166  0.709  1.000  0.000  0.000  0.166
차량순서  0.000  0.000  0.000  1.000  0.000  0.084
차량출입문번호  0.000  0.000  0.000  0.000  1.000  0.000
안전거리  0.171  0.780  0.166  0.084  0.000  1.000

Variable  차량출입문번호  철도운영기관명
차량출입문번호  1.000  0.000
철도운영기관명  0.000  1.000

Variable  승강장번호  차량순서  안전거리  철도운영기관명  차량출입문번호
승강장번호  1.000  -0.004  0.014  0.166  0.000
차량순서  -0.004  1.000  0.005  0.000  0.000
안전거리  0.014  0.005  1.000  0.131  0.000
철도운영기관명  0.166  0.000  0.131  1.000  0.000
차량출입문번호  0.000  0.000  0.000  0.000  1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  철도운영기관명  선명  역명  승강장번호  차량순서  차량출입문번호  안전거리
7879  코레일  1호선  오류동  1  10  3  7.5
5588  코레일  1호선  석수  1  7  1  9.8
5917  코레일  1호선  세마  1  10  1  14.0
1872  코레일  1호선  구일  1  9  3  9.0
9978  코레일  1호선  평택  2  5  3  15.0
6118  코레일  1호선  소요산  2  10  4  <NA>
319  서울교통공사  1호선  시청  2  10  3  3.0
9588  코레일  1호선  직산  2  7  1  8.9
7736  코레일  1호선  영등포  3  4  4  10.5
4326  코레일  1호선  방학  1  2  2  13.0

Index  철도운영기관명  선명  역명  승강장번호  차량순서  차량출입문번호  안전거리
6914  코레일  1호선  신이문  1  9  2  15.0
7371  코레일  1호선  양주  1  3  1  10.0
276  서울교통공사  1호선  시청  1  10  1  3.0
9729  코레일  1호선  창동  2  3  4  12.5
7478  코레일  1호선  양주  3  10  2  11.0
735  서울교통공사  1호선  청량리(서울시립대입구)  1  4  4  12.0
6016  코레일  1호선  소사  2  4  1  8.8
1928  코레일  1호선  구일  3  3  3  9.5
7968  코레일  1호선  오산  2  2  1  13.8
2317  코레일  1호선  금천구청  2  10  3  9.0