Overview

Dataset statistics

Number of variables: 6
Number of observations: 1513
Missing cells: 1263
Missing cells (%): 13.9%
Duplicate rows: 12
Duplicate rows (%): 0.8%
Total size in memory: 71.1 KiB
Average record size in memory: 48.1 B
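
The derived figures in this block follow from the raw counts. A short check in Python, where all variable names are illustrative and not part of the report:

```python
# Recompute the derived dataset statistics from the raw counts in the report.
n_variables = 6
n_observations = 1513
missing_cells = 1263
duplicate_rows = 12
total_memory_kib = 71.1

total_cells = n_variables * n_observations               # 6 * 1513 = 9078 cells
missing_pct = 100 * missing_cells / total_cells          # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations    # share of duplicate rows
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

The rounded results agree with the reported 13.9%, 0.8% and 48.1 B.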

Variable types

Categorical: 4
Text: 2

Dataset

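Description: Data on the urban and metropolitan railway stations of Seoul Metropolitan Subway Line 2 (수도권 2호선), including the railway operator name (철도운영기관명), line name (선명), station name (역명), exit number (출구번호), major facilities by exit (출구별 주요시설명), and address (주소).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15073461/fileData.do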

Alerts

철도운영기관명 has constant value "서울교통공사" (Constant)
선명 has constant value "2호선" (Constant)
Dataset has 12 (0.8%) duplicate rows (Duplicates)
주소 has 1263 (83.5%) missing values (Missing)
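
The Constant and Missing alerts can be illustrated on a handful of rows. The sketch below uses five rows copied by hand from the Sample section of this report, with `None` standing in for `<NA>`. The helper names `constant_columns` and `missing_counts` are hypothetical, and this is not how ydata-profiling itself computes its alerts.

```python
# Five rows copied from the report's Sample section; None marks a missing value.
COLUMNS = ["철도운영기관명", "선명", "역명", "출구번호", "출구별 주요시설명", "주소"]
raw_rows = [
    ("서울교통공사", "2호선", "강남", "1", "국민건강보험강남지사", None),
    ("서울교통공사", "2호선", "강남", "1", "삼성세무서", "서울특별시 강남구 테헤란로 114"),
    ("서울교통공사", "2호선", "강남", "2", "메리츠타워", None),
    ("서울교통공사", "2호선", "이대", "3", "서울과학종합대학원대학교", "서울특별시 서대문구 이화여대2길 46"),
    ("서울교통공사", "2호선", "이대", "4", "북성초등학교", None),
]
rows = [dict(zip(COLUMNS, r)) for r in raw_rows]


def constant_columns(rows):
    """Return the columns whose non-missing values are all identical."""
    result = []
    for col in rows[0]:
        values = {r[col] for r in rows if r[col] is not None}
        if len(values) == 1:
            result.append(col)
    return result


def missing_counts(rows):
    """Return the number of missing (None) values per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


print(constant_columns(rows))
print(missing_counts(rows)["주소"])
```

On this excerpt the two constant columns are 철도운영기관명 and 선명, matching the alerts, and three of the five 주소 values are missing.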

Reproduction

Analysis started: 2023-12-12 00:17:07.387220
Analysis finished: 2023-12-12 00:17:07.953229
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
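
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the exact configuration used here (that is stored in config.json). The function name `build_report`, the file paths, the report title, and the `cp949` encoding are all assumptions; check the encoding against the file actually downloaded from the URL above.

```python
def build_report(csv_path, output_path="report.html", encoding="cp949"):
    """Profile a CSV file and write the result as an HTML report.

    The default encoding is an assumption, not something stated in the report.
    """
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Line 2 station exits")  # title is illustrative
    report.to_file(output_path)
    return report
```

A call such as `build_report("line2_exits.csv")` (hypothetical file name) would write `report.html` to the working directory.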

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

서울교통공사: 1513

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value | Count | Frequency (%)
서울교통공사 | 1513 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

선명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

2호선: 1513

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2호선
2nd row: 2호선
3rd row: 2호선
4th row: 2호선
5th row: 2호선

Common Values

Value | Count | Frequency (%)
2호선 | 1513 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

역명
Categorical

Distinct: 41
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

서울대입구(관악구청): 69
사당: 66
을지로입구: 63
성수: 62
을지로3가: 56
Other values (36): 1197

Length

Max length: 11
Median length: 10
Mean length: 4.5459352
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강남
2nd row: 강남
3rd row: 강남
4th row: 강남
5th row: 강남

Common Values

Value | Count | Frequency (%)
서울대입구(관악구청) | 69 | 4.6%
사당 | 66 | 4.4%
을지로입구 | 63 | 4.2%
성수 | 62 | 4.1%
을지로3가 | 56 | 3.7%
대림(구로구청) | 52 | 3.4%
신당 | 49 | 3.2%
강남 | 49 | 3.2%
교대(법원·검찰청) | 48 | 3.2%
선릉 | 48 | 3.2%
Other values (31) | 951 | 62.9%

Length

[Figure: histogram of lengths of the category]

출구번호
Categorical

Distinct: 17
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

1: 223
4: 219
3: 212
2: 179
5: 144
Other values (12): 536

Length

Max length: 3
Median length: 1
Mean length: 1.0938533
Min length: 1

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 223 | 14.7%
4 | 219 | 14.5%
3 | 212 | 14.0%
2 | 179 | 11.8%
5 | 144 | 9.5%
6 | 121 | 8.0%
7 | 116 | 7.7%
8 | 105 | 6.9%
9 | 56 | 3.7%
10 | 45 | 3.0%
Other values (7) | 93 | 6.1%

Length

[Figure: histogram of lengths of the category]

출구별 주요시설명
Text

Distinct: 1236
Distinct (%): 81.7%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 KiB

Length

Max length: 18
Median length: 15
Mean length: 6.0839392
Min length: 2

Characters and Unicode

Total characters: 9205
Distinct characters: 437
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1060
Unique (%): 70.1%

Sample

1st row: 국민건강보험강남지사
2nd row: 국세종합상담센터
3rd row: 삼성세무서
4th row: 서초
5th row: 역삼

Value | Count | Frequency (%)
국민은행 | 13 | 0.8%
우리은행 | 12 | 0.7%
기업은행 | 10 | 0.6%
신한은행 | 9 | 0.6%
우체국 | 8 | 0.5%
농협 | 7 | 0.4%
현대아파트 | 7 | 0.4%
외환은행 | 7 | 0.4%
주민센터 | 7 | 0.4%
우성아파트 | 6 | 0.4%
Other values (1262) | 1519 | 94.6%

Most occurring characters

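Value | Count | Frequency (%)
(unrecovered) | 264 | 2.9%
(unrecovered) | 255 | 2.8%
(unrecovered) | 224 | 2.4%
(unrecovered) | 221 | 2.4%
(unrecovered) | 216 | 2.3%
(unrecovered) | 203 | 2.2%
(unrecovered) | 180 | 2.0%
(unrecovered) | 162 | 1.8%
(unrecovered) | 142 | 1.5%
(unrecovered) | 141 | 1.5%
Other values (427) | 7197 | 78.2%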

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8773 | 95.3%
Decimal Number | 124 | 1.3%
Uppercase Letter | 119 | 1.3%
Space Separator | 92 | 1.0%
Close Punctuation | 47 | 0.5%
Open Punctuation | 47 | 0.5%
Other Punctuation | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unrecovered) | 264 | 3.0%
(unrecovered) | 255 | 2.9%
(unrecovered) | 224 | 2.6%
(unrecovered) | 221 | 2.5%
(unrecovered) | 216 | 2.5%
(unrecovered) | 203 | 2.3%
(unrecovered) | 180 | 2.1%
(unrecovered) | 162 | 1.8%
(unrecovered) | 142 | 1.6%
(unrecovered) | 141 | 1.6%
Other values (394) | 6765 | 77.1%

Uppercase Letter
Value | Count | Frequency (%)
K | 16 | 13.4%
G | 13 | 10.9%
C | 12 | 10.1%
S | 12 | 10.1%
L | 10 | 8.4%
T | 10 | 8.4%
A | 7 | 5.9%
Y | 6 | 5.0%
M | 6 | 5.0%
P | 5 | 4.2%
Other values (9) | 22 | 18.5%

Decimal Number
Value | Count | Frequency (%)
1 | 46 | 37.1%
2 | 21 | 16.9%
3 | 17 | 13.7%
4 | 11 | 8.9%
5 | 9 | 7.3%
6 | 9 | 7.3%
7 | 4 | 3.2%
9 | 3 | 2.4%
8 | 2 | 1.6%
0 | 2 | 1.6%

Space Separator
Value | Count | Frequency (%)
(space) | 92 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 47 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 47 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8773 | 95.3%
Common | 313 | 3.4%
Latin | 119 | 1.3%

Most frequent character per script

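Hangul
Value | Count | Frequency (%)
(unrecovered) | 264 | 3.0%
(unrecovered) | 255 | 2.9%
(unrecovered) | 224 | 2.6%
(unrecovered) | 221 | 2.5%
(unrecovered) | 216 | 2.5%
(unrecovered) | 203 | 2.3%
(unrecovered) | 180 | 2.1%
(unrecovered) | 162 | 1.8%
(unrecovered) | 142 | 1.6%
(unrecovered) | 141 | 1.6%
Other values (394) | 6765 | 77.1%

Latin
Value | Count | Frequency (%)
K | 16 | 13.4%
G | 13 | 10.9%
C | 12 | 10.1%
S | 12 | 10.1%
L | 10 | 8.4%
T | 10 | 8.4%
A | 7 | 5.9%
Y | 6 | 5.0%
M | 6 | 5.0%
P | 5 | 4.2%
Other values (9) | 22 | 18.5%

Common
Value | Count | Frequency (%)
(space) | 92 | 29.4%
) | 47 | 15.0%
( | 47 | 15.0%
1 | 46 | 14.7%
2 | 21 | 6.7%
3 | 17 | 5.4%
4 | 11 | 3.5%
5 | 9 | 2.9%
6 | 9 | 2.9%
7 | 4 | 1.3%
Other values (4) | 10 | 3.2%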

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8773 | 95.3%
ASCII | 432 | 4.7%

Most frequent character per block

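Hangul
Value | Count | Frequency (%)
(unrecovered) | 264 | 3.0%
(unrecovered) | 255 | 2.9%
(unrecovered) | 224 | 2.6%
(unrecovered) | 221 | 2.5%
(unrecovered) | 216 | 2.5%
(unrecovered) | 203 | 2.3%
(unrecovered) | 180 | 2.1%
(unrecovered) | 162 | 1.8%
(unrecovered) | 142 | 1.6%
(unrecovered) | 141 | 1.6%
Other values (394) | 6765 | 77.1%

ASCII
Value | Count | Frequency (%)
(space) | 92 | 21.3%
) | 47 | 10.9%
( | 47 | 10.9%
1 | 46 | 10.6%
2 | 21 | 4.9%
3 | 17 | 3.9%
K | 16 | 3.7%
G | 13 | 3.0%
C | 12 | 2.8%
S | 12 | 2.8%
Other values (23) | 109 | 25.2%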

주소
Text

MISSING 

Distinct: 210
Distinct (%): 84.0%
Missing: 1263
Missing (%): 83.5%
Memory size: 11.9 KiB
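
The two percentages for 주소 appear to use different denominators: Missing (%) is relative to all 1513 rows, while Distinct (%) matches the reported value only when the 210 distinct values are divided by the 250 non-missing rows. A quick check, with illustrative variable names:

```python
# Recompute the 주소 percentages from the counts reported for this variable.
n_observations = 1513
n_missing = 1263
n_distinct = 210

n_present = n_observations - n_missing            # 250 rows carry an address
missing_pct = 100 * n_missing / n_observations    # relative to all rows
distinct_pct = 100 * n_distinct / n_present       # relative to non-missing rows

print(f"Missing (%): {missing_pct:.1f}%")
print(f"Distinct (%): {distinct_pct:.1f}%")
```

Both results round to the reported 83.5% and 84.0%.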

Length

Max length: 28
Median length: 24
Mean length: 18.296
Min length: 13

Characters and Unicode

Total characters: 4574
Distinct characters: 174
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
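
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one 주소 value taken from the Sample rows. The helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical names, and this is not necessarily how the profiling tool derives its tables.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    # Fall back to the two-letter code for categories not listed above.
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


address = "서울특별시 서초구 서초대로65길 13-10"  # a 주소 value from the Sample rows
counts = category_counts(address)
print(counts)
```

For this address the Hangul syllables fall under Other Letter, the digits under Decimal Number, and the hyphen under Dash Punctuation, which are the same category names used in the tables of this section.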

Unique

Unique: 179
Unique (%): 71.6%

Sample

1st row: 서울특별시 강남구 테헤란로 114
2nd row: 서울특별시 서초구 서초대로 308
3rd row: 서울특별시 서초구 서초대로 305
4th row: 서울특별시 서초구 서초대로65길 13-10
5th row: 서울특별시 서초구 서초중앙로24길 43

Value | Count | Frequency (%)
서울특별시 | 250 | 25.1%
영등포구 | 34 | 3.4%
중구 | 34 | 3.4%
서초구 | 31 | 3.1%
강남구 | 29 | 2.9%
송파구 | 27 | 2.7%
성동구 | 26 | 2.6%
관악구 | 23 | 2.3%
광진구 | 15 | 1.5%
마포구 | 13 | 1.3%
Other values (313) | 516 | 51.7%

Most occurring characters

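Value | Count | Frequency (%)
(space) | 750 | 16.4%
(unrecovered) | 304 | 6.6%
(unrecovered) | 258 | 5.6%
(unrecovered) | 253 | 5.5%
(unrecovered) | 251 | 5.5%
(unrecovered) | 250 | 5.5%
(unrecovered) | 250 | 5.5%
(unrecovered) | 227 | 5.0%
1 | 157 | 3.4%
3 | 105 | 2.3%
Other values (164) | 1769 | 38.7%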

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3049 | 66.7%
Space Separator | 750 | 16.4%
Decimal Number | 745 | 16.3%
Dash Punctuation | 17 | 0.4%
Uppercase Letter | 13 | 0.3%

Most frequent character per category

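Other Letter
Value | Count | Frequency (%)
(unrecovered) | 304 | 10.0%
(unrecovered) | 258 | 8.5%
(unrecovered) | 253 | 8.3%
(unrecovered) | 251 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 227 | 7.4%
(unrecovered) | 84 | 2.8%
(unrecovered) | 63 | 2.1%
(unrecovered) | 54 | 1.8%
Other values (144) | 1055 | 34.6%

Decimal Number
Value | Count | Frequency (%)
1 | 157 | 21.1%
3 | 105 | 14.1%
2 | 103 | 13.8%
6 | 68 | 9.1%
4 | 65 | 8.7%
0 | 55 | 7.4%
7 | 54 | 7.2%
9 | 53 | 7.1%
5 | 52 | 7.0%
8 | 33 | 4.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 3 | 23.1%
O | 2 | 15.4%
T | 2 | 15.4%
K | 2 | 15.4%
P | 1 | 7.7%
W | 1 | 7.7%
E | 1 | 7.7%
R | 1 | 7.7%

Space Separator
Value | Count | Frequency (%)
(space) | 750 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 17 | 100.0%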

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3049 | 66.7%
Common | 1512 | 33.1%
Latin | 13 | 0.3%

Most frequent character per script

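Hangul
Value | Count | Frequency (%)
(unrecovered) | 304 | 10.0%
(unrecovered) | 258 | 8.5%
(unrecovered) | 253 | 8.3%
(unrecovered) | 251 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 227 | 7.4%
(unrecovered) | 84 | 2.8%
(unrecovered) | 63 | 2.1%
(unrecovered) | 54 | 1.8%
Other values (144) | 1055 | 34.6%

Common
Value | Count | Frequency (%)
(space) | 750 | 49.6%
1 | 157 | 10.4%
3 | 105 | 6.9%
2 | 103 | 6.8%
6 | 68 | 4.5%
4 | 65 | 4.3%
0 | 55 | 3.6%
7 | 54 | 3.6%
9 | 53 | 3.5%
5 | 52 | 3.4%
Other values (2) | 50 | 3.3%

Latin
Value | Count | Frequency (%)
S | 3 | 23.1%
O | 2 | 15.4%
T | 2 | 15.4%
K | 2 | 15.4%
P | 1 | 7.7%
W | 1 | 7.7%
E | 1 | 7.7%
R | 1 | 7.7%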

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3049 | 66.7%
ASCII | 1525 | 33.3%

Most frequent character per block

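ASCII
Value | Count | Frequency (%)
(space) | 750 | 49.2%
1 | 157 | 10.3%
3 | 105 | 6.9%
2 | 103 | 6.8%
6 | 68 | 4.5%
4 | 65 | 4.3%
0 | 55 | 3.6%
7 | 54 | 3.5%
9 | 53 | 3.5%
5 | 52 | 3.4%
Other values (10) | 63 | 4.1%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 304 | 10.0%
(unrecovered) | 258 | 8.5%
(unrecovered) | 253 | 8.3%
(unrecovered) | 251 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 250 | 8.2%
(unrecovered) | 227 | 7.4%
(unrecovered) | 84 | 2.8%
(unrecovered) | 63 | 2.1%
(unrecovered) | 54 | 1.8%
Other values (144) | 1055 | 34.6%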

Correlations

Variable | 역명 | 출구번호
역명 | 1.000 | 0.614
출구번호 | 0.614 | 1.000

Variable | 출구번호 | 역명
출구번호 | 1.000 | 0.208
역명 | 0.208 | 1.000

Variable | 역명 | 출구번호
역명 | 1.000 | 0.208
출구번호 | 0.208 | 1.000
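
The report does not state which association measure each matrix uses. For two categorical variables such as 역명 and 출구번호, one common choice is Cramér's V, which is built on the chi-squared statistic of the contingency table. The sketch below implements it without bias correction; the function name `cramers_v` is hypothetical, and the code is not claimed to reproduce the 0.614 or 0.208 values above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    k = min(len(x_counts), len(y_counts))
    if n == 0 or k < 2:
        # Undefined when either variable is constant; 0.0 is a convention chosen here.
        return 0.0
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n                 # count expected under independence
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Perfect association gives 1.0; independent variables give 0.0.
print(cramers_v(["a", "a", "b", "b"], ["1", "1", "2", "2"]))
print(cramers_v(["a", "a", "b", "b"], ["1", "2", "1", "2"]))
```

The measure ranges from 0 (no association) to 1 (one variable determines the other), so it is symmetric in its two arguments.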

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

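Index | 철도운영기관명 | 선명 | 역명 | 출구번호 | 출구별 주요시설명 | 주소
0 | 서울교통공사 | 2호선 | 강남 | 1 | 국민건강보험강남지사 | <NA>
1 | 서울교통공사 | 2호선 | 강남 | 1 | 국세종합상담센터 | <NA>
2 | 서울교통공사 | 2호선 | 강남 | 1 | 삼성세무서 | 서울특별시 강남구 테헤란로 114
3 | 서울교통공사 | 2호선 | 강남 | 1 | 서초 | <NA>
4 | 서울교통공사 | 2호선 | 강남 | 1 | 역삼 | <NA>
5 | 서울교통공사 | 2호선 | 강남 | 1 | 테헤란빌딩 | <NA>
6 | 서울교통공사 | 2호선 | 강남 | 1 | 캠브리지빌딩 | <NA>
7 | 서울교통공사 | 2호선 | 강남 | 1 | 특허청서울사무소 | <NA>
8 | 서울교통공사 | 2호선 | 강남 | 2 | 메리츠타워 | <NA>
9 | 서울교통공사 | 2호선 | 강남 | 2 | 푸르덴셜타워 | <NA>

Index | 철도운영기관명 | 선명 | 역명 | 출구번호 | 출구별 주요시설명 | 주소
1503 | 서울교통공사 | 2호선 | 을지로입구 | 8 | 서울플라자호텔 | <NA>
1504 | 서울교통공사 | 2호선 | 을지로입구 | 8 | 프레지던트호텔 | <NA>
1505 | 서울교통공사 | 2호선 | 이대 | 1 | 신촌우체국 | <NA>
1506 | 서울교통공사 | 2호선 | 이대 | 2 | 이화여대 | <NA>
1507 | 서울교통공사 | 2호선 | 이대 | 2 | 이대사회복지관 | <NA>
1508 | 서울교통공사 | 2호선 | 이대 | 3 | 이화여대 | <NA>
1509 | 서울교통공사 | 2호선 | 이대 | 3 | 서부교육청 | <NA>
1510 | 서울교통공사 | 2호선 | 이대 | 3 | 대신초등학교 | <NA>
1511 | 서울교통공사 | 2호선 | 이대 | 3 | 서울과학종합대학원대학교 | 서울특별시 서대문구 이화여대2길 46
1512 | 서울교통공사 | 2호선 | 이대 | 4 | 북성초등학교 | <NA>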

Duplicate rows

Most frequently occurring

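Index | 철도운영기관명 | 선명 | 역명 | 출구번호 | 출구별 주요시설명 | 주소 | # duplicates
0 | 서울교통공사 | 2호선 | 강남 | 10 | 교보타워 | <NA> | 2
1 | 서울교통공사 | 2호선 | 강남 | 10 | 한남대교방면 | <NA> | 2
2 | 서울교통공사 | 2호선 | 강변(동서울터미널) | 4 | 서울광진학교 | <NA> | 2
3 | 서울교통공사 | 2호선 | 건대입구 | 5 | 동자초등학교 | <NA> | 2
4 | 서울교통공사 | 2호선 | 건대입구 | 5 | 신양초등학교 | <NA> | 2
5 | 서울교통공사 | 2호선 | 방배 | 2 | 경남아파트 | <NA> | 2
6 | 서울교통공사 | 2호선 | 사당 | 1 | 예술의전당방면 | <NA> | 2
7 | 서울교통공사 | 2호선 | 잠실나루 | 2 | 진주아파트 | <NA> | 2
8 | 서울교통공사 | 2호선 | 잠실나루 | 3 | 잠동초등학교 | <NA> | 2
9 | 서울교통공사 | 2호선 | 잠실나루 | 3 | 장미아파트 | <NA> | 2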
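
A table like the one above can be produced by counting identical rows. The sketch below uses `collections.Counter` on a three-row excerpt in which the first row of the table appears twice; the function name `duplicate_summary` is hypothetical, and `None` stands in for `<NA>`.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


rows = [
    ("서울교통공사", "2호선", "강남", "10", "교보타워", None),
    ("서울교통공사", "2호선", "강남", "10", "교보타워", None),   # exact repeat of the row above
    ("서울교통공사", "2호선", "강남", "2", "메리츠타워", None),   # occurs only once
]

for row, n in duplicate_summary(rows):
    print(row, n)
```

Rows are converted to tuples so that they are hashable and compare field by field, including the missing address.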