Overview

Dataset statistics

Number of variables: 8
Number of observations: 145
Missing cells: 1
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.5 KiB
Average record size in memory: 66.9 B
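The percentage and size figures above follow from the raw counts. The short check below uses only the numbers reported in this block; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics block above.
n_variables = 8
n_observations = 145
missing_cells = 1
avg_record_bytes = 66.9  # reported average record size

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size, converted to KiB.
total_kib = avg_record_bytes * n_observations / 1024

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"total size: {total_kib:.1f} KiB")
```

Rounded to one decimal place, both derived values agree with the reported 0.1% and 9.5 KiB.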

Variable types

Numeric: 2
Text: 2
Categorical: 4

Dataset

Description: File download
Author: 서울 교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13192/F/1/datasetView.do

Alerts

장비 has constant value "W/L" [Constant]
연번 is highly overall correlated with 호선 [High correlation]
호선 is highly overall correlated with 연번 and 2 other fields [High correlation]
운행구간 is highly overall correlated with 호선 and 1 other field [High correlation]
설치위치 is highly overall correlated with 호선 and 1 other field [High correlation]
설치위치 is highly imbalanced (51.2%) [Imbalance]
연번 has unique values [Unique]
승강기번호 has unique values [Unique]

Reproduction

Analysis started: 2023-12-11 06:11:22.101083
Analysis finished: 2023-12-11 06:11:23.383163
Duration: 1.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
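A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used: the function name `build_report`, the CSV path, the CP949 encoding, and the report title are all assumptions, and the `dataset` metadata simply repeats the values shown in the Dataset block.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV export and write the HTML report to out_path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the export is CP949-encoded; adjust if decoding fails.
    df = pd.read_csv(csv_path, encoding="cp949")

    report = ProfileReport(
        df,
        title="Profiling report",  # placeholder title, not taken from the source
        dataset={
            "description": "파일 다운로드",
            "author": "서울 교통공사",
            "url": "https://data.seoul.go.kr/dataList/OA-13192/F/1/datasetView.do",
        },
    )
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("elevators.csv")` with a hypothetical local copy of the dataset would write `report.html` next to the script.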

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 145
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73
Minimum: 1
Maximum: 145
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 8.2
Q1: 37
Median: 73
Q3: 109
95th percentile: 137.8
Maximum: 145
Range: 144
Interquartile range (IQR): 72

Descriptive statistics

Standard deviation: 42.001984
Coefficient of variation (CV): 0.57536964
Kurtosis: -1.2
Mean: 73
Median Absolute Deviation (MAD): 36
Skewness: 0
Sum: 10585
Variance: 1764.1667
Monotonicity: Strictly increasing
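Because 연번 is unique, strictly increasing, and spans 1 to 145 over 145 rows, it must be the plain sequence 1, 2, ..., 145, so the statistics above can be recomputed directly. The sketch below uses Python's standard `statistics` module; it assumes sample (n - 1) variance and linearly interpolated percentiles, both of which agree with the reported figures.

```python
import statistics

values = list(range(1, 146))  # 연번 runs from 1 to 145 with no gaps

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean  # coefficient of variation

# "inclusive" interpolates linearly between the minimum and the maximum.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(mean, variance, std, cv)
print(p5, q1, median, q3, p95, mad)
```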
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 1 | 0.7%
110 | 1 | 0.7%
94 | 1 | 0.7%
95 | 1 | 0.7%
96 | 1 | 0.7%
97 | 1 | 0.7%
98 | 1 | 0.7%
99 | 1 | 0.7%
100 | 1 | 0.7%
101 | 1 | 0.7%
Other values (135) | 135 | 93.1%

Minimum values
Value | Count | Frequency (%)
1 | 1 | 0.7%
2 | 1 | 0.7%
3 | 1 | 0.7%
4 | 1 | 0.7%
5 | 1 | 0.7%
6 | 1 | 0.7%
7 | 1 | 0.7%
8 | 1 | 0.7%
9 | 1 | 0.7%
10 | 1 | 0.7%

Maximum values
Value | Count | Frequency (%)
145 | 1 | 0.7%
144 | 1 | 0.7%
143 | 1 | 0.7%
142 | 1 | 0.7%
141 | 1 | 0.7%
140 | 1 | 0.7%
139 | 1 | 0.7%
138 | 1 | 0.7%
137 | 1 | 0.7%
136 | 1 | 0.7%

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 5.6%
Missing: 1
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.5138889
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 1.15
Q1: 5
Median: 6
Q3: 7
95th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.9432008
Coefficient of variation (CV): 0.3524193
Kurtosis: 0.018416166
Mean: 5.5138889
Median Absolute Deviation (MAD): 1
Skewness: -0.94326672
Sum: 794
Variance: 3.7760295
Monotonicity: Increasing
Histogram with fixed size bins (bins=8)
Common values
Value | Count | Frequency (%)
7 | 41 | 28.3%
6 | 33 | 22.8%
5 | 23 | 15.9%
8 | 14 | 9.7%
2 | 12 | 8.3%
4 | 11 | 7.6%
1 | 8 | 5.5%
3 | 2 | 1.4%
(Missing) | 1 | 0.7%

Minimum values
Value | Count | Frequency (%)
1 | 8 | 5.5%
2 | 12 | 8.3%
3 | 2 | 1.4%
4 | 11 | 7.6%
5 | 23 | 15.9%
6 | 33 | 22.8%
7 | 41 | 28.3%
8 | 14 | 9.7%

Maximum values
Value | Count | Frequency (%)
8 | 14 | 9.7%
7 | 41 | 28.3%
6 | 33 | 22.8%
5 | 23 | 15.9%
4 | 11 | 7.6%
3 | 2 | 1.4%
2 | 12 | 8.3%
1 | 8 | 5.5%
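The frequency table for 호선 is enough to recompute its sum, mean, variance, and standard deviation. The sketch below works from the value counts alone, leaving out the single missing row; it assumes the report uses the sample (n - 1) variance, which matches the figures shown.

```python
import math

# Value -> count, copied from the 호선 frequency table (missing row excluded).
counts = {1: 8, 2: 12, 3: 2, 4: 11, 5: 23, 6: 33, 7: 41, 8: 14}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance from grouped data: weighted squared deviations over n - 1.
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)

print(n, total, mean, variance, std)
```

The non-missing count comes out at 144, one less than the 145 observations, which is consistent with the one missing cell reported for this variable.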

역명
Text

Distinct: 68
Distinct (%): 46.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.5103448
Min length: 2

Characters and Unicode

Total characters: 509
Distinct characters: 108
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
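As an illustration of these character properties, the sketch below counts the Unicode general category of every character in two of the sample values for this variable, using Python's standard `unicodedata` module. Only the two strings are taken from the report; the rest is illustrative.

```python
import unicodedata
from collections import Counter

# Two sample values from the 역명 column.
samples = ["서울(1)", "신설동(1)"]

# Two-letter general category per character:
# "Lo" = Other Letter, "Nd" = Decimal Number,
# "Ps" = Open Punctuation, "Pe" = Close Punctuation.
categories = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in categories.most_common():
    print(code, count)
```

These four codes correspond to the four categories the report lists for this variable.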

Unique

Unique: 33
Unique (%): 22.8%

Sample

1st row: 서울(1)
2nd row: 신설동(1)
3rd row: 신설동(1)
4th row: 신설동(1)
5th row: 신설동(1)

Words
Value | Count | Frequency (%)
남구로 | 11 | 7.6%
신설동(2 | 6 | 4.1%
신당 | 5 | 3.4%
신설동(1 | 5 | 3.4%
잠실 | 5 | 3.4%
고속터미널 | 5 | 3.4%
온수 | 5 | 3.4%
가산디지털단지 | 4 | 2.8%
마포구청 | 4 | 2.8%
이수 | 4 | 2.8%
Other values (58) | 91 | 62.8%

Most occurring characters

Value | Count | Frequency (%)
[?] | 24 | 4.7%
[?] | 24 | 4.7%
( | 19 | 3.7%
[?] | 19 | 3.7%
) | 19 | 3.7%
[?] | 14 | 2.8%
[?] | 14 | 2.8%
[?] | 13 | 2.6%
[?] | 13 | 2.6%
[?] | 13 | 2.6%
Other values (98) | 337 | 66.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 451 | 88.6%
Decimal Number | 20 | 3.9%
Open Punctuation | 19 | 3.7%
Close Punctuation | 19 | 3.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 24 | 5.3%
[?] | 24 | 5.3%
[?] | 19 | 4.2%
[?] | 14 | 3.1%
[?] | 14 | 3.1%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 12 | 2.7%
[?] | 11 | 2.4%
Other values (92) | 294 | 65.2%

Decimal Number
Value | Count | Frequency (%)
1 | 8 | 40.0%
2 | 6 | 30.0%
4 | 4 | 20.0%
3 | 2 | 10.0%

Open Punctuation
Value | Count | Frequency (%)
( | 19 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 19 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 451 | 88.6%
Common | 58 | 11.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 24 | 5.3%
[?] | 24 | 5.3%
[?] | 19 | 4.2%
[?] | 14 | 3.1%
[?] | 14 | 3.1%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 12 | 2.7%
[?] | 11 | 2.4%
Other values (92) | 294 | 65.2%

Common
Value | Count | Frequency (%)
( | 19 | 32.8%
) | 19 | 32.8%
1 | 8 | 13.8%
2 | 6 | 10.3%
4 | 4 | 6.9%
3 | 2 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 451 | 88.6%
ASCII | 58 | 11.4%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 24 | 5.3%
[?] | 24 | 5.3%
[?] | 19 | 4.2%
[?] | 14 | 3.1%
[?] | 14 | 3.1%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 13 | 2.9%
[?] | 12 | 2.7%
[?] | 11 | 2.4%
Other values (92) | 294 | 65.2%

ASCII
Value | Count | Frequency (%)
( | 19 | 32.8%
) | 19 | 32.8%
1 | 8 | 13.8%
2 | 6 | 10.3%
4 | 4 | 6.9%
3 | 2 | 3.4%

장비
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
W/L: 145

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: W/L
2nd row: W/L
3rd row: W/L
4th row: W/L
5th row: W/L

Common Values

Value | Count | Frequency (%)
W/L | 145 | 100.0%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words
Value | Count | Frequency (%)
w/l | 145 | 100.0%

호기
Categorical

Distinct: 19
Distinct (%): 13.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
1: 47
2: 27
3: 17
내부#1: 15
4: 11
Other values (14): 28

Length

Max length: 4
Median length: 1
Mean length: 1.6965517
Min length: 1

Unique

Unique: 8
Unique (%): 5.5%

Sample

1st row: 내부#1
2nd row: 외부#2
3rd row: 내부#1
4th row: 내부#2
5th row: 내부#3

Common Values

Value | Count | Frequency (%)
1 | 47 | 32.4%
2 | 27 | 18.6%
3 | 17 | 11.7%
내부#1 | 15 | 10.3%
4 | 11 | 7.6%
내부#2 | 5 | 3.4%
5 | 4 | 2.8%
외부#1 | 3 | 2.1%
내부#3 | 3 | 2.1%
내부#4 | 3 | 2.1%
Other values (9) | 10 | 6.9%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
1 | 47 | 32.4%
2 | 27 | 18.6%
3 | 17 | 11.7%
내부#1 | 15 | 10.3%
4 | 11 | 7.6%
내부#2 | 5 | 3.4%
5 | 4 | 2.8%
내부#3 | 3 | 2.1%
내부#4 | 3 | 2.1%
외부#1 | 3 | 2.1%
Other values (9) | 10 | 6.9%

승강기번호
Text

UNIQUE 

Distinct: 145
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 1160
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 145
Unique (%): 100.0%

Sample

1st row: 1902-362
2nd row: 1903-467
3rd row: 1903-415
4th row: 1903-417
5th row: 1903-416

Words
Value | Count | Frequency (%)
1902-362 | 1 | 0.7%
1900-246 | 1 | 0.7%
1904-018 | 1 | 0.7%
1901-957 | 1 | 0.7%
1901-956 | 1 | 0.7%
1901-913 | 1 | 0.7%
1901-914 | 1 | 0.7%
1901-555 | 1 | 0.7%
1900-831 | 1 | 0.7%
1901-785 | 1 | 0.7%
Other values (135) | 135 | 93.1%

Most occurring characters

Value | Count | Frequency (%)
1 | 250 | 21.6%
0 | 244 | 21.0%
9 | 176 | 15.2%
- | 145 | 12.5%
3 | 71 | 6.1%
2 | 68 | 5.9%
4 | 67 | 5.8%
7 | 44 | 3.8%
5 | 37 | 3.2%
6 | 29 | 2.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1015 | 87.5%
Dash Punctuation | 145 | 12.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 250 | 24.6%
0 | 244 | 24.0%
9 | 176 | 17.3%
3 | 71 | 7.0%
2 | 68 | 6.7%
4 | 67 | 6.6%
7 | 44 | 4.3%
5 | 37 | 3.6%
6 | 29 | 2.9%
8 | 29 | 2.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 145 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1160 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 250 | 21.6%
0 | 244 | 21.0%
9 | 176 | 15.2%
- | 145 | 12.5%
3 | 71 | 6.1%
2 | 68 | 5.9%
4 | 67 | 5.8%
7 | 44 | 3.8%
5 | 37 | 3.2%
6 | 29 | 2.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1160 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 250 | 21.6%
0 | 244 | 21.0%
9 | 176 | 15.2%
- | 145 | 12.5%
3 | 71 | 6.1%
2 | 68 | 5.9%
4 | 67 | 5.8%
7 | 44 | 3.8%
5 | 37 | 3.2%
6 | 29 | 2.5%

운행구간
Categorical

HIGH CORRELATION 

Distinct: 36
Distinct (%): 24.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
B1-B2: 30
B2-B3: 25
F1-B1: 16
<NA>: 7
B3-B4: 5
Other values (31): 62

Length

Max length: 11
Median length: 5
Mean length: 6.062069
Min length: 4

Unique

Unique: 15
Unique (%): 10.3%

Sample

1st row: B2(승)~B1(대)
2nd row: 지상~B1(대)
3rd row: B2(승)~B1(대)
4th row: B2(승)~B1(대)
5th row: B1(대)~B1(대)

Common Values

Value | Count | Frequency (%)
B1-B2 | 30 | 20.7%
B2-B3 | 25 | 17.2%
F1-B1 | 16 | 11.0%
<NA> | 7 | 4.8%
B3-B4 | 5 | 3.4%
B1(대)⇔B1(대) | 5 | 3.4%
B4-B5 | 5 | 3.4%
BM1-B1 | 4 | 2.8%
B5-B4 | 4 | 2.8%
B2(승)~B1(대) | 3 | 2.1%
Other values (26) | 41 | 28.3%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
b1-b2 | 30 | 20.7%
b2-b3 | 25 | 17.2%
f1-b1 | 16 | 11.0%
na | 7 | 4.8%
b3-b4 | 5 | 3.4%
b1(대)⇔b1(대 | 5 | 3.4%
b4-b5 | 5 | 3.4%
bm1-b1 | 4 | 2.8%
b5-b4 | 4 | 2.8%
b2(승)~b1(대 | 3 | 2.1%
Other values (26) | 41 | 28.3%

설치위치
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 35
Distinct (%): 24.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
내부: 90
외부: 21
대합실 연결통로: 2
상선승강장 시점측: 1
하선승강장 시점측: 1
Other values (30): 30

Length

Max length: 16
Median length: 2
Mean length: 3.6965517
Min length: 2

Unique

Unique: 32
Unique (%): 22.1%

Sample

1st row: 내부 C 계단
2nd row: 6번 출입구
3rd row: 상선승강장 시점측
4th row: 하선승강장 시점측
5th row: 대합실 연결통로

Common Values

Value | Count | Frequency (%)
내부 | 90 | 62.1%
외부 | 21 | 14.5%
대합실 연결통로 | 2 | 1.4%
상선승강장 시점측 | 1 | 0.7%
하선승강장 시점측 | 1 | 0.7%
제기동측 승강장 | 1 | 0.7%
섬식(상)8-2 | 1 | 0.7%
섬식(외) 8-2 | 1 | 0.7%
성수측 승강장 | 1 | 0.7%
6번 출입구 | 1 | 0.7%
Other values (25) | 25 | 17.2%
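The 51.2% imbalance reported for 설치위치 can be reproduced from these counts. The sketch below assumes the score is one minus the Shannon entropy of the category distribution divided by its maximum possible value (the logarithm of the number of categories). That definition is an assumption here, not something stated in the report, but it yields the reported figure. The counts come from the table above together with the distinct and unique counts (35 categories, 32 of them occurring once).

```python
import math

# 설치위치 category counts: 내부, 외부, 대합실 연결통로, then 32 values that occur once.
counts = [90, 21, 2] + [1] * 32

n = sum(counts)
entropy = -sum((c / n) * math.log(c / n) for c in counts)

# Assumed definition: 1 - entropy / maximum entropy for this many categories.
imbalance = 1 - entropy / math.log(len(counts))

print(f"observations: {n}, categories: {len(counts)}")
print(f"imbalance: {imbalance:.3f}")
```

A score of 0 would mean all categories are equally frequent; values near 1 mean a single category dominates.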

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
내부 | 91 | 48.9%
외부 | 21 | 11.3%
연결통로 | 5 | 2.7%
승강장 | 5 | 2.7%
대합실 | 4 | 2.2%
1호선 | 4 | 2.2%
계단 | 3 | 1.6%
시점측 | 2 | 1.1%
연결계단 | 2 | 1.1%
환승통로 | 2 | 1.1%
Other values (47) | 47 | 25.3%

Interactions

[Figure: pairwise interaction plots]

Correlations

Variable | 연번 | 호선 | 역명 | 호기 | 운행구간 | 설치위치
연번 | 1.000 | 0.913 | 0.992 | 0.618 | 0.859 | 0.465
호선 | 0.913 | 1.000 | 0.992 | 0.629 | 0.925 | 0.915
역명 | 0.992 | 0.992 | 1.000 | 0.000 | 0.941 | 0.876
호기 | 0.618 | 0.629 | 0.000 | 1.000 | 0.895 | 0.916
운행구간 | 0.859 | 0.925 | 0.941 | 0.895 | 1.000 | 0.976
설치위치 | 0.465 | 0.915 | 0.876 | 0.916 | 0.976 | 1.000

Variable | 운행구간 | 설치위치 | 호기
운행구간 | 1.000 | 0.652 | 0.442
설치위치 | 0.652 | 1.000 | 0.489
호기 | 0.442 | 0.489 | 1.000

Variable | 연번 | 호선 | 호기 | 운행구간 | 설치위치
연번 | 1.000 | 0.979 | 0.273 | 0.451 | 0.162
호선 | 0.979 | 1.000 | 0.309 | 0.616 | 0.598
호기 | 0.273 | 0.309 | 1.000 | 0.442 | 0.489
운행구간 | 0.451 | 0.616 | 0.442 | 1.000 | 0.652
설치위치 | 0.162 | 0.598 | 0.489 | 0.652 | 1.000
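The High correlation alerts in the overview are consistent with the last matrix above if a pair is flagged whenever its coefficient reaches 0.5. The sketch below applies that rule; the threshold of 0.5 and the choice of matrix are assumptions, picked because they reproduce the alert list exactly.

```python
fields = ["연번", "호선", "호기", "운행구간", "설치위치"]

# Coefficients copied from the last correlation matrix in this section.
matrix = [
    [1.000, 0.979, 0.273, 0.451, 0.162],
    [0.979, 1.000, 0.309, 0.616, 0.598],
    [0.273, 0.309, 1.000, 0.442, 0.489],
    [0.451, 0.616, 0.442, 1.000, 0.652],
    [0.162, 0.598, 0.489, 0.652, 1.000],
]

THRESHOLD = 0.5  # assumed alert threshold, not stated in the report

# For each field, collect the other fields whose coefficient reaches the threshold.
flagged = {}
for field, row in zip(fields, matrix):
    partners = [
        other
        for other, value in zip(fields, row)
        if other != field and abs(value) >= THRESHOLD
    ]
    if partners:
        flagged[field] = partners

for field, partners in flagged.items():
    print(field, "->", ", ".join(partners))
```

Under this rule 호기 is not flagged, since its largest coefficient is 0.489, which agrees with the absence of a correlation alert for that variable.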

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

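First rows
Index | 연번 | 호선 | 역명 | 장비 | 호기 | 승강기번호 | 운행구간 | 설치위치
0 | 1 | 1 | 서울(1) | W/L | 내부#1 | 1902-362 | B2(승)~B1(대) | 내부 C 계단
1 | 2 | 1 | 신설동(1) | W/L | 외부#2 | 1903-467 | 지상~B1(대) | 6번 출입구
2 | 3 | 1 | 신설동(1) | W/L | 내부#1 | 1903-415 | B2(승)~B1(대) | 상선승강장 시점측
3 | 4 | 1 | 신설동(1) | W/L | 내부#2 | 1903-417 | B2(승)~B1(대) | 하선승강장 시점측
4 | 5 | 1 | 신설동(1) | W/L | 내부#3 | 1903-416 | B1(대)~B1(대) | 대합실 연결통로
5 | 6 | 1 | 신설동(1) | W/L | 내부#4 | 1903-414 | B1(대)~B1(대) | 대합실 연결통로
6 | 7 | 1 | 청량리(1) | W/L | 내부#1 | 1903-411 | B2(대)~B1(승) | 제기동측 승강장
7 | 8 | 1 | 청량리(1) | W/L | 내부#2 | 1901-053 | B2(대)~B1(승) | 섬식(상)8-2
8 | 9 | 2 | 한양대 | W/L | 내부#1 | 1900-169 | F1(승)~F2(대) | 섬식(외) 8-2
9 | 10 | 2 | 용답 | W/L | 내부#1 | 1900-276 | F1(대)⇔F2(승) | 성수측 승강장

Last rows
Index | 연번 | 호선 | 역명 | 장비 | 호기 | 승강기번호 | 운행구간 | 설치위치
135 | 136 | 8 | 잠실 | W/L | 2 | 1900-790 | B2-B3 | 내부
136 | 137 | 8 | 잠실 | W/L | 3 | 1900-791 | B1-B2 | 내부
137 | 138 | 8 | 잠실 | W/L | 4 | 3904-584 | <NA> | <NA>
138 | 139 | 8 | 복정 | W/L | 1 | 1900-771 | F1-B1 | 외부
139 | 140 | 8 | 남한산성입구 | W/L | 1 | 3903-032 | B1-B2 | 내부
140 | 141 | 8 | 남한산성입구 | W/L | 2 | 3903-033 | F1-B1 | 외부
141 | 142 | 8 | 수진 | W/L | 1 | 3903-007 | F1-B1 | 외부
142 | 143 | 8 | 모란 | W/L | 1 | 3900-941 | B2-B3 | 내부
143 | 144 | 8 | 모란 | W/L | 2 | 3900-942 | B2-B3 | 내부
144 | 145 | <NA> | 대공원어린이집 | W/L | 1 | 1902-175 | F1-F2 | 외부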