Overview

Dataset statistics

Number of variables: 10
Number of observations: 337
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.1 KiB
Average record size in memory: 82.4 B

Variable types

Categorical: 7
Text: 2
DateTime: 1

Dataset

Description: File download
Author: 서울 교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13241/F/1/datasetView.do

Alerts

High correlation: 운행구간 is highly overall correlated with 기기명 and 3 other fields
High correlation: 형식 is highly overall correlated with 기기명 and 2 other fields
High correlation: 기기명 is highly overall correlated with 형식 and 2 other fields
High correlation: 정지 층수 is highly overall correlated with 운행구간
High correlation: 규격 is highly overall correlated with 기기명 and 2 other fields
Imbalance: 기기명 is highly imbalanced (88.6%)
Imbalance: 형식 is highly imbalanced (88.3%)
Imbalance: 규격 is highly imbalanced (58.4%)
Imbalance: 정지 층수 is highly imbalanced (77.1%)
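
The report does not state how the imbalance percentages are computed. They are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of distinct categories. The sketch below is an assumption rather than the tool's documented implementation; the function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables of this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k) for a list of category counts.

    0.0 means perfectly balanced, values near 1.0 mean one category dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        # Assumption: a single-category column gets a score of 0.
        return 0.0
    # Shannon entropy (base 2) of the observed category frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts taken from the Common Values tables in this report.
columns = {
    "기기명": [329, 6, 2],
    "형식": [327, 6, 2, 2],
    "규격": [252, 54, 13, 8, 5, 3, 1, 1],
    "정지 층수": [311, 21, 4, 1],
}
for name, counts in columns.items():
    print(f"{name}: {imbalance_score(counts):.1%}")
```

Under this assumption the four scores round to the percentages shown in the alerts, which suggests the alerts rank columns by how far their category distribution is from uniform rather than by the share of the largest category alone.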

Reproduction

Analysis started: 2023-12-11 04:00:31.440460
Analysis finished: 2023-12-11 04:00:32.424771
Duration: 0.98 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
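
As a quick consistency check, the reported duration can be recomputed from the two timestamps above. This is only an arithmetic sketch using the standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section of this report.
started = datetime.strptime("2023-12-11 04:00:31.440460", FMT)
finished = datetime.strptime("2023-12-11 04:00:32.424771", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```

The difference is 0.984311 seconds, which agrees with the reported duration once rounded to two decimals.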

Variables

호선 (subway line)
Categorical

Distinct: 4
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 147 | 43.6%
3 | 81 | 24.0%
4 | 72 | 21.4%
1 | 37 | 11.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
2 | 147 | 43.6%
3 | 81 | 24.0%
4 | 72 | 21.4%
1 | 37 | 11.0%

역사명 (station name)
Text

Distinct: 117
Distinct (%): 34.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 12
Median length: 8
Mean length: 3.4183976
Min length: 2

Characters and Unicode

Total characters: 1152
Distinct characters: 145
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 2.1%

Sample

1st row: 서울역(1)
2nd row: 서울역(1)
3rd row: 서울역(1)
4th row: 서울역(1)
5th row: 시청(1)

Words

Value | Count | Frequency (%)
동묘앞 | 7 | 2.0%
신설동(1 | 5 | 1.4%
삼성 | 5 | 1.4%
성신여대입구 | 4 | 1.1%
서울역(1 | 4 | 1.1%
역삼 | 4 | 1.1%
용두 | 4 | 1.1%
경찰병원역 | 4 | 1.1%
수서 | 4 | 1.1%
잠실 | 4 | 1.1%
Other values (111) | 303 | 87.1%

Most occurring characters

ValueCountFrequency (%)
57
 
4.9%
) 47
 
4.1%
47
 
4.1%
( 47
 
4.1%
44
 
3.8%
37
 
3.2%
28
 
2.4%
27
 
2.3%
25
 
2.2%
22
 
1.9%
Other values (135) 771
66.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 987
85.7%
Decimal Number 60
 
5.2%
Close Punctuation 47
 
4.1%
Open Punctuation 47
 
4.1%
Space Separator 11
 
1.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
57
 
5.8%
47
 
4.8%
44
 
4.5%
37
 
3.7%
28
 
2.8%
27
 
2.7%
25
 
2.5%
22
 
2.2%
22
 
2.2%
19
 
1.9%
Other values (127) 659
66.8%
Decimal Number
ValueCountFrequency (%)
1 18
30.0%
2 15
25.0%
4 14
23.3%
3 10
16.7%
5 3
 
5.0%
Close Punctuation
ValueCountFrequency (%)
) 47
100.0%
Open Punctuation
ValueCountFrequency (%)
( 47
100.0%
Space Separator
ValueCountFrequency (%)
11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 987
85.7%
Common 165
 
14.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
57
 
5.8%
47
 
4.8%
44
 
4.5%
37
 
3.7%
28
 
2.8%
27
 
2.7%
25
 
2.5%
22
 
2.2%
22
 
2.2%
19
 
1.9%
Other values (127) 659
66.8%
Common
ValueCountFrequency (%)
) 47
28.5%
( 47
28.5%
1 18
 
10.9%
2 15
 
9.1%
4 14
 
8.5%
11
 
6.7%
3 10
 
6.1%
5 3
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 987
85.7%
ASCII 165
 
14.3%

Most frequent character per block

Hangul
ValueCountFrequency (%)
57
 
5.8%
47
 
4.8%
44
 
4.5%
37
 
3.7%
28
 
2.8%
27
 
2.7%
25
 
2.5%
22
 
2.2%
22
 
2.2%
19
 
1.9%
Other values (127) 659
66.8%
ASCII
ValueCountFrequency (%)
) 47
28.5%
( 47
28.5%
1 18
 
10.9%
2 15
 
9.1%
4 14
 
8.5%
11
 
6.7%
3 10
 
6.1%
5 3
 
1.8%

기기명 (device name)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.0118694
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: E/V
2nd row: E/V
3rd row: E/V
4th row: E/V
5th row: E/V

Common Values

Value | Count | Frequency (%)
E/V | 329 | 97.6%
V/L | 6 | 1.8%
E/V경사 | 2 | 0.6%
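
The mean length reported for 기기명 can be recovered from the value counts alone, as a count-weighted average of string lengths measured in Unicode code points. The helper name `mean_length` is hypothetical; this is a sketch of the arithmetic, not the tool's own code.

```python
def mean_length(value_counts):
    """Count-weighted mean of string lengths (in Unicode code points)."""
    total = sum(value_counts.values())
    return sum(len(value) * count for value, count in value_counts.items()) / total

# Counts from the 기기명 Common Values table.
counts = {"E/V": 329, "V/L": 6, "E/V경사": 2}
print(round(mean_length(counts), 7))
```

Here (329 × 3 + 6 × 3 + 2 × 5) / 337 = 1015 / 337, which matches the reported 3.0118694.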

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
e/v | 329 | 97.6%
v/l | 6 | 1.8%
e/v경사 | 2 | 0.6%

호기 (unit number)
Categorical

Distinct: 12
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 11
Median length: 4
Mean length: 4.0148368
Min length: 3

Unique

Unique: 7
Unique (%): 2.1%

Sample

1st row: 내부#1
2nd row: 외부#1
3rd row: 외부#2
4th row: 외부#3
5th row: 내부#1

Common Values

Value | Count | Frequency (%)
내부#1 | 113 | 33.5%
외부#1 | 101 | 30.0%
내부#2 | 77 | 22.8%
외부#2 | 35 | 10.4%
외부#3 | 4 | 1.2%
내부#3 | 1 | 0.3%
내부#4 | 1 | 0.3%
내부#5 | 1 | 0.3%
외부#1 (내부겸용) | 1 | 0.3%
외부#4 | 1 | 0.3%
Other values (2) | 2 | 0.6%
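
The report distinguishes "Distinct" (number of different values) from "Unique" (values that occur exactly once). The sketch below shows that distinction on the 호기 counts. The function name `distinct_and_unique` is hypothetical, and the two values hidden under "Other values (2)" are not named in the report, so placeholder labels stand in for them.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique, distinct %, unique %) for a sequence of values."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)                                # different values
    unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    return distinct, unique, 100 * distinct / n, 100 * unique / n

# Counts from the 호기 Common Values table. "<other-1>" and "<other-2>" are
# placeholders for the two unnamed values behind "Other values (2)".
table = {
    "내부#1": 113, "외부#1": 101, "내부#2": 77, "외부#2": 35, "외부#3": 4,
    "내부#3": 1, "내부#4": 1, "내부#5": 1, "외부#1 (내부겸용)": 1, "외부#4": 1,
    "<other-1>": 1, "<other-2>": 1,
}
values = [value for value, count in table.items() for _ in range(count)]

distinct, unique, distinct_pct, unique_pct = distinct_and_unique(values)
print(distinct, unique, f"{distinct_pct:.1f}%", f"{unique_pct:.1f}%")
```

With 337 rows this gives 12 distinct values (3.6%) and 7 unique values (2.1%), in line with the figures reported for 호기.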

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
내부#1 | 113 | 33.4%
외부#1 | 102 | 30.2%
내부#2 | 77 | 22.8%
외부#2 | 35 | 10.4%
외부#3 | 4 | 1.2%
내부#3 | 1 | 0.3%
내부#4 | 1 | 0.3%
내부#5 | 1 | 0.3%
내부겸용 | 1 | 0.3%
외부#4 | 1 | 0.3%
Other values (2) | 2 | 0.6%

형식 (type)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 8
Median length: 8
Mean length: 7.9643917
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 로프식(MRL)
2nd row: 로프식(MRL)
3rd row: 로프식(MRL)
4th row: 로프식(MRL)
5th row: 로프식(MRL)

Common Values

Value | Count | Frequency (%)
로프식(MRL) | 327 | 97.0%
유압식(V/L) | 6 | 1.8%
로프식(일반) | 2 | 0.6%
경사형 | 2 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
로프식(mrl | 327 | 97.0%
유압식(v/l | 6 | 1.8%
로프식(일반 | 2 | 0.6%
경사형 | 2 | 0.6%

규격 (specification)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 8
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 5
Median length: 4
Mean length: 3.9643917
Min length: 3

Unique

Unique: 2
Unique (%): 0.6%

Sample

1st row: 11인승
2nd row: 13인승
3rd row: 15인승
4th row: 15인승
5th row: 15인승

Common Values

Value | Count | Frequency (%)
15인승 | 252 | 74.8%
11인승 | 54 | 16.0%
13인승 | 13 | 3.9%
9인승 | 8 | 2.4%
4인승 | 5 | 1.5%
17인승 | 3 | 0.9%
20인승 | 1 | 0.3%
340㎏용 | 1 | 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
15인승 | 252 | 74.8%
11인승 | 54 | 16.0%
13인승 | 13 | 3.9%
9인승 | 8 | 2.4%
4인승 | 5 | 1.5%
17인승 | 3 | 0.9%
20인승 | 1 | 0.3%
340㎏용 | 1 | 0.3%

운행구간 (service section)
Categorical

HIGH CORRELATION

Distinct: 35
Distinct (%): 10.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 20
Median length: 11
Mean length: 9.9970326
Min length: 8

Unique

Unique: 15
Unique (%): 4.5%

Sample

1st row: B2(승)~B1(대)
2nd row: B1(대)~지상
3rd row: B1(대)~지상
4th row: B1(대)~지상
5th row: B2(승)~B1(대)

Common Values

Value | Count | Frequency (%)
B2(승)~B1(대) | 111 | 32.9%
B1(대)~지상 | 101 | 30.0%
F2(대)~F3(승) | 22 | 6.5%
B3(승)~B2(대) | 21 | 6.2%
B2(대)~지상 | 13 | 3.9%
지상~F2(대) | 13 | 3.9%
B3(승)~B1(대) | 5 | 1.5%
F1(대)~F2(승) | 5 | 1.5%
B3(승)~B2(대)~B1(대) | 4 | 1.2%
F1(승)~F2(대) | 4 | 1.2%
Other values (25) | 38 | 11.3%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
b2(승)~b1(대 | 111 | 32.9%
b1(대)~지상 | 101 | 30.0%
f2(대)~f3(승 | 22 | 6.5%
b3(승)~b2(대 | 21 | 6.2%
b2(대)~지상 | 13 | 3.9%
지상~f2(대 | 13 | 3.9%
b3(승)~b1(대 | 5 | 1.5%
f1(대)~f2(승 | 5 | 1.5%
b3(승)~b2(대)~b1(대 | 4 | 1.2%
f1(승)~f2(대 | 4 | 1.2%
Other values (25) | 38 | 11.3%

설치위치 (승차위치기준) (installation location, by boarding position)
Text

Distinct: 155
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 12
Median length: 6
Mean length: 6.5816024
Min length: 4

Characters and Unicode

Total characters: 2218
Distinct characters: 38
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 94
Unique (%): 27.9%

Sample

1st row: 섬식(상) 5-1
2nd row: 2번 출구측
3rd row: 4번 출구측
4th row: 3번 출구측
5th row: 상행 3-2

Words

Value | Count | Frequency (%)
출구측 | 114 | 17.3%
상행 | 42 | 6.4%
하행 | 42 | 6.4%
외선 | 33 | 5.0%
내선 | 33 | 5.0%
1번 | 31 | 4.7%
섬식(상 | 18 | 2.7%
3-2 | 17 | 2.6%
3번 | 16 | 2.4%
2번 | 15 | 2.3%
Other values (86) | 299 | 45.3%

Most occurring characters

ValueCountFrequency (%)
325
14.7%
- 210
 
9.5%
144
 
6.5%
144
 
6.5%
144
 
6.5%
3 122
 
5.5%
118
 
5.3%
1 113
 
5.1%
4 101
 
4.6%
84
 
3.8%
Other values (28) 713
32.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1016
45.8%
Decimal Number 572
25.8%
Space Separator 325
 
14.7%
Dash Punctuation 210
 
9.5%
Open Punctuation 44
 
2.0%
Close Punctuation 44
 
2.0%
Math Symbol 4
 
0.2%
Other Punctuation 3
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
144
14.2%
144
14.2%
144
14.2%
118
11.6%
84
8.3%
68
6.7%
66
6.5%
47
 
4.6%
42
 
4.1%
42
 
4.1%
Other values (12) 117
11.5%
Decimal Number
ValueCountFrequency (%)
3 122
21.3%
1 113
19.8%
4 101
17.7%
2 81
14.2%
8 49
8.6%
6 33
 
5.8%
7 27
 
4.7%
5 22
 
3.8%
0 12
 
2.1%
9 12
 
2.1%
Space Separator
ValueCountFrequency (%)
325
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 210
100.0%
Open Punctuation
ValueCountFrequency (%)
( 44
100.0%
Close Punctuation
ValueCountFrequency (%)
) 44
100.0%
Math Symbol
ValueCountFrequency (%)
~ 4
100.0%
Other Punctuation
ValueCountFrequency (%)
, 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1202
54.2%
Hangul 1016
45.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
144
14.2%
144
14.2%
144
14.2%
118
11.6%
84
8.3%
68
6.7%
66
6.5%
47
 
4.6%
42
 
4.1%
42
 
4.1%
Other values (12) 117
11.5%
Common
ValueCountFrequency (%)
325
27.0%
- 210
17.5%
3 122
 
10.1%
1 113
 
9.4%
4 101
 
8.4%
2 81
 
6.7%
8 49
 
4.1%
( 44
 
3.7%
) 44
 
3.7%
6 33
 
2.7%
Other values (6) 80
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1202
54.2%
Hangul 1016
45.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
325
27.0%
- 210
17.5%
3 122
 
10.1%
1 113
 
9.4%
4 101
 
8.4%
2 81
 
6.7%
8 49
 
4.1%
( 44
 
3.7%
) 44
 
3.7%
6 33
 
2.7%
Other values (6) 80
 
6.7%
Hangul
ValueCountFrequency (%)
144
14.2%
144
14.2%
144
14.2%
118
11.6%
84
8.3%
68
6.7%
66
6.5%
47
 
4.6%
42
 
4.1%
42
 
4.1%
Other values (12) 117
11.5%

정지 층수 (number of stops)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 311 | 92.3%
3 | 21 | 6.2%
4 | 4 | 1.2%
6 | 1 | 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
2 | 311 | 92.3%
3 | 21 | 6.2%
4 | 4 | 1.2%
6 | 1 | 0.3%

설치일자 (installation date)
DateTime

Distinct: 177
Distinct (%): 52.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB
Minimum: 1993-06-01 00:00:00
Maximum: 2016-03-18 00:00:00

Histogram with fixed size bins (bins=50)

Correlations

Variable | 호선 | 기기명 | 호기 | 형식 | 규격 | 운행구간 | 정지 층수
호선 | 1.000 | 0.000 | 0.177 | 0.091 | 0.000 | 0.594 | 0.211
기기명 | 0.000 | 1.000 | 0.000 | 1.000 | 0.791 | 0.836 | 0.000
호기 | 0.177 | 0.000 | 1.000 | 0.000 | 0.674 | 0.876 | 0.000
형식 | 0.091 | 1.000 | 0.000 | 1.000 | 0.885 | 0.935 | 0.807
규격 | 0.000 | 0.791 | 0.674 | 0.885 | 1.000 | 0.850 | 0.192
운행구간 | 0.594 | 0.836 | 0.876 | 0.935 | 0.850 | 1.000 | 0.992
정지 층수 | 0.211 | 0.000 | 0.000 | 0.807 | 0.192 | 0.992 | 1.000

Variable | 호기 | 규격 | 운행구간 | 형식 | 호선 | 기기명 | 정지 층수
호기 | 1.000 | 0.359 | 0.494 | 0.000 | 0.081 | 0.000 | 0.000
규격 | 0.359 | 1.000 | 0.516 | 0.571 | 0.000 | 0.705 | 0.086
운행구간 | 0.494 | 0.516 | 1.000 | 0.740 | 0.330 | 0.605 | 0.915
형식 | 0.000 | 0.571 | 0.740 | 1.000 | 0.036 | 0.999 | 0.448
호선 | 0.081 | 0.000 | 0.330 | 0.036 | 1.000 | 0.000 | 0.084
기기명 | 0.000 | 0.705 | 0.605 | 0.999 | 0.000 | 1.000 | 0.000
정지 층수 | 0.000 | 0.086 | 0.915 | 0.448 | 0.084 | 0.000 | 1.000

Variable | 호선 | 기기명 | 호기 | 형식 | 규격 | 운행구간 | 정지 층수
호선 | 1.000 | 0.000 | 0.081 | 0.036 | 0.000 | 0.330 | 0.084
기기명 | 0.000 | 1.000 | 0.000 | 0.999 | 0.705 | 0.605 | 0.000
호기 | 0.081 | 0.000 | 1.000 | 0.000 | 0.359 | 0.494 | 0.000
형식 | 0.036 | 0.999 | 0.000 | 1.000 | 0.571 | 0.740 | 0.448
규격 | 0.000 | 0.705 | 0.359 | 0.571 | 1.000 | 0.516 | 0.086
운행구간 | 0.330 | 0.605 | 0.494 | 0.740 | 0.516 | 1.000 | 0.915
정지 층수 | 0.084 | 0.000 | 0.000 | 0.448 | 0.086 | 0.915 | 1.000
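
The report does not name the coefficient behind these matrices. Since every compared column is categorical, one plausible reading is a Cramér's V style association measure, and the sketch below shows the plain (uncorrected) form of that statistic using only the standard library. This is an assumption: the function name `cramers_v` and the toy data are illustrative, and the tool may apply a bias correction or a different measure, so the output is not expected to reproduce the values above.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        # Undefined when either variable is constant; 0.0 by convention here.
        return 0.0
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Toy data (hypothetical): the second column is fully determined by the first.
device = ["E/V", "E/V", "V/L", "V/L"]
kind = ["rope", "rope", "hydraulic", "hydraulic"]
print(cramers_v(device, kind))
```

A value near 1 means one column almost determines the other, which is the pattern the alerts flag for pairs such as 기기명 and 형식; a value near 0 means the two columns vary independently.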

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

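First rows

Row | 호선 | 역사명 | 기기명 | 호기 | 형식 | 규격 | 운행구간 | 설치위치 (승차위치기준) | 정지 층수 | 설치일자
0 | 1 | 서울역(1) | E/V | 내부#1 | 로프식(MRL) | 11인승 | B2(승)~B1(대) | 섬식(상) 5-1 | 2 | 2014-12-02
1 | 1 | 서울역(1) | E/V | 외부#1 | 로프식(MRL) | 13인승 | B1(대)~지상 | 2번 출구측 | 2 | 2013-08-02
2 | 1 | 서울역(1) | E/V | 외부#2 | 로프식(MRL) | 15인승 | B1(대)~지상 | 4번 출구측 | 2 | 2000-05-01
3 | 1 | 서울역(1) | E/V | 외부#3 | 로프식(MRL) | 15인승 | B1(대)~지상 | 3번 출구측 | 2 | 2014-11-01
4 | 1 | 시청(1) | E/V | 내부#1 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 상행 3-2 | 2 | 2004-03-13
5 | 1 | 시청(1) | E/V | 내부#2 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 하행 8-3 | 2 | 2004-03-13
6 | 1 | 시청(1) | E/V | 외부#1 | 로프식(MRL) | 15인승 | B1(대)~지상 | 1-2번 출구사이 | 2 | 2005-04-01
7 | 1 | 종각 | E/V | 내부#1 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 상행 6-4 | 2 | 2005-01-25
8 | 1 | 종각 | E/V | 내부#2 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 하행 4-4 | 2 | 2004-04-17
9 | 1 | 종각 | E/V | 외부#1 | 로프식(MRL) | 15인승 | B1(대)~지상 | 3번 출구측 | 2 | 2004-11-05

Last rows

Row | 호선 | 역사명 | 기기명 | 호기 | 형식 | 규격 | 운행구간 | 설치위치 (승차위치기준) | 정지 층수 | 설치일자
327 | 4 | 동 작 | E/V | 내부#2 | 로프식(MRL) | 11인승 | F1(대)~F2(대)~F3(승) | 상행 7-4 | 3 | 2010-03-19
328 | 4 | 총신대입구 | E/V | 내부#1 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 하행 4-4 | 2 | 2015-06-02
329 | 4 | 총신대입구 | E/V | 내부#2 | 로프식(MRL) | 15인승 | B2(승)~B1(대) | 상행 7-3 | 2 | 2015-08-07
330 | 4 | 총신대입구 | E/V | 외부#1 | 로프식(MRL) | 15인승 | B1(대)~지상 | 하행 8-3 | 2 | 2014-08-15
331 | 4 | 총신대입구 | E/V | 외부#2 | 로프식(MRL) | 15인승 | B1(대)~지상 | 1번 출구측 | 2 | 2015-06-02
332 | 4 | 사당(4) | E/V | 내부#1 | 로프식(MRL) | 13인승 | B3(승)~B2(대)~B1(대) | 14번 출구측 | 3 | 2003-05-12
333 | 4 | 사당(4) | E/V | 외부#1 | 로프식(MRL) | 15인승 | B1(대)~지상 | 섬식(상)09-2 | 2 | 2004-07-30
334 | 4 | 남태령 | E/V경사 | 내부#1 | 경사형 | 11인승 | B3(대)~B2(대) | 9-10번출구사이 | 2 | 2005-12-15
335 | 4 | 남태령 | E/V | 내부#2 | 로프식(MRL) | 15인승 | B2(대)~B1(대) | 섬식(상) 3-3 | 2 | 2005-12-15
336 | 4 | 남태령 | E/V | 외부#1 | 로프식(MRL) | 15인승 | B1(대)~지상 | 1번 출구측 | 2 | 2005-12-15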