Overview

Dataset statistics

Number of variables: 4
Number of observations: 784
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.6 KiB
Average record size in memory: 32.2 B

Variable types

Text: 3
Categorical: 1

Dataset

Description: 전철역코드 (station code), 전철역명 (station name), 호선 (line), 외부코드 (external code)
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-121/S/1/datasetView.do

Alerts

전철역코드 has unique values (Unique)
외부코드 has unique values (Unique)

Reproduction

Analysis started: 2024-05-18 04:45:40.986060
Analysis finished: 2024-05-18 04:45:41.848125
Duration: 0.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

전철역코드
Text

UNIQUE 

Distinct: 784
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 3136
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
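
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the value `1C01` are hypothetical and not taken from the dataset; the report's labels "Decimal Number", "Uppercase Letter" and "Other Letter" correspond to the category codes `Nd`, `Lu` and `Lo`. Scripts and blocks are not exposed by the standard library, so they are left out here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(
        unicodedata.category(ch)  # e.g. "Nd" for digits, "Lu" for uppercase letters
        for value in values
        for ch in value
    )


# "0008" and "0228" appear in the report's sample rows; "1C01" is a made-up
# code used only to show an uppercase letter next to digits.
code_counts = category_counts(["0008", "0228", "1C01"])

# Hangul syllables fall into the "Lo" (Other Letter) category.
name_counts = category_counts(["수서", "서울대입구"])

print(code_counts)
print(name_counts)
```

Dividing each count by the total number of characters gives the percentages shown in the category tables.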

Unique

Unique: 784
Unique (%): 100.0%

Sample

1st row: 0008
2nd row: 0009
3rd row: 0011
4th row: 0228
5th row: 0318

Value  Count  Frequency (%)
0008  1  0.1%
1268  1  0.1%
1902  1  0.1%
1329  1  0.1%
1501  1  0.1%
1801  1  0.1%
1831  1  0.1%
1850  1  0.1%
1856  1  0.1%
1866  1  0.1%
Other values (774)  774  98.7%

Most occurring characters

Value  Count  Frequency (%)
1  654  20.9%
2  521  16.6%
0  399  12.7%
4  329  10.5%
3  310  9.9%
5  235  7.5%
7  204  6.5%
8  197  6.3%
6  165  5.3%
9  113  3.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  3127  99.7%
Uppercase Letter  9  0.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  654  20.9%
2  521  16.7%
0  399  12.8%
4  329  10.5%
3  310  9.9%
5  235  7.5%
7  204  6.5%
8  197  6.3%
6  165  5.3%
9  113  3.6%

Uppercase Letter
Value  Count  Frequency (%)
C  9  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  3127  99.7%
Latin  9  0.3%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  654  20.9%
2  521  16.7%
0  399  12.8%
4  329  10.5%
3  310  9.9%
5  235  7.5%
7  204  6.5%
8  197  6.3%
6  165  5.3%
9  113  3.6%

Latin
Value  Count  Frequency (%)
C  9  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3136  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  654  20.9%
2  521  16.6%
0  399  12.7%
4  329  10.5%
3  310  9.9%
5  235  7.5%
7  204  6.5%
8  197  6.3%
6  165  5.3%
9  113  3.6%

전철역명
Text

Distinct: 645
Distinct (%): 82.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.8596939
Min length: 2
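
A minimal sketch of how these length statistics can be computed, using only Python's standard library. The helper `length_summary` is a hypothetical name, and the input is limited to the five 전철역명 values listed in this report's sample rows, so its output differs from the full-column figures. It assumes the strings are stored as precomposed Hangul syllables, so that `len` counts one character per syllable.

```python
from statistics import mean, median


def length_summary(values):
    """Return max, median, mean and min string length for `values`."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# First five 전철역명 values from the report's sample rows.
sample = ["수서", "성남", "동탄", "서울대입구", "안국"]
summary = length_summary(sample)
print(summary)

# The mean length equals total characters divided by the number of observations.
# With the report's own figures (2242 characters, 784 rows):
report_mean = round(2242 / 784, 7)
print(report_mean)
```

The last two lines check the reported mean against the total character count given for this variable: 2242 / 784 rounds to 2.8596939.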

Characters and Unicode

Total characters: 2242
Distinct characters: 306
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 528
Unique (%): 67.3%
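
The two counts measure different things: "Distinct" is the number of different values, while "Unique" is the number of values that occur exactly once. The sketch below shows the distinction with a small, made-up list of station names (the list and the helper name `distinct_and_unique` are illustrative assumptions, not the dataset itself).

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Illustrative list: two names repeat, two occur once.
names = ["김포공항", "김포공항", "서울역", "수서", "수서", "안국"]
distinct, unique = distinct_and_unique(names)
print(distinct, unique)

# Percentages are taken relative to the number of observations.
# With the report's figures for this variable (784 rows):
distinct_pct = round(100 * 645 / 784, 1)
unique_pct = round(100 * 528 / 784, 1)
print(distinct_pct, unique_pct)
```

A column triggers the "has unique values" alert when both counts equal the number of rows, as is the case for 전철역코드 and 외부코드.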

Sample

1st row: 수서
2nd row: 성남
3rd row: 동탄
4th row: 서울대입구
5th row: 안국

Value  Count  Frequency (%)
김포공항  5  0.6%
서울역  4  0.5%
청량리  4  0.5%
공덕  4  0.5%
왕십리  4  0.5%
회기  3  0.4%
수서  3  0.4%
디지털미디어시티  3  0.4%
상봉  3  0.4%
고속터미널  3  0.4%
Other values (635)  748  95.4%

Most occurring characters

ValueCountFrequency (%)
66
 
2.9%
60
 
2.7%
53
 
2.4%
51
 
2.3%
49
 
2.2%
48
 
2.1%
44
 
2.0%
41
 
1.8%
40
 
1.8%
33
 
1.5%
Other values (296) 1757
78.4%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  2226  99.3%
Decimal Number  13  0.6%
Other Punctuation  3  0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
66
 
3.0%
60
 
2.7%
53
 
2.4%
51
 
2.3%
49
 
2.2%
48
 
2.2%
44
 
2.0%
41
 
1.8%
40
 
1.8%
33
 
1.5%
Other values (288) 1741
78.2%
Decimal Number
Value  Count  Frequency (%)
3  5  38.5%
4  3  23.1%
1  2  15.4%
9  1  7.7%
2  1  7.7%
5  1  7.7%

Other Punctuation
Value  Count  Frequency (%)
.  2  66.7%
?  1  33.3%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  2226  99.3%
Common  16  0.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
66
 
3.0%
60
 
2.7%
53
 
2.4%
51
 
2.3%
49
 
2.2%
48
 
2.2%
44
 
2.0%
41
 
1.8%
40
 
1.8%
33
 
1.5%
Other values (288) 1741
78.2%
Common
Value  Count  Frequency (%)
3  5  31.2%
4  3  18.8%
.  2  12.5%
1  2  12.5%
9  1  6.2%
?  1  6.2%
2  1  6.2%
5  1  6.2%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  2226  99.3%
ASCII  16  0.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
66
 
3.0%
60
 
2.7%
53
 
2.4%
51
 
2.3%
49
 
2.2%
48
 
2.2%
44
 
2.0%
41
 
1.8%
40
 
1.8%
33
 
1.5%
Other values (288) 1741
78.2%
ASCII
Value  Count  Frequency (%)
3  5  31.2%
4  3  18.8%
.  2  12.5%
1  2  12.5%
9  1  6.2%
?  1  6.2%
2  1  6.2%
5  1  6.2%

호선
Categorical

Distinct: 24
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB

01호선  102
수인분당선  63
경의선  57
05호선  56
07호선  53
Other values (19)  453

Length

Max length: 7
Median length: 4
Mean length: 4.0522959
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GTX-A
2nd row: GTX-A
3rd row: GTX-A
4th row: 02호선
5th row: 03호선

Common Values

Value  Count  Frequency (%)
01호선  102  13.0%
수인분당선  63  8.0%
경의선  57  7.3%
05호선  56  7.1%
07호선  53  6.8%
02호선  51  6.5%
04호선  51  6.5%
03호선  44  5.6%
06호선  39  5.0%
09호선  38  4.8%
Other values (14)  230  29.3%

Length

Histogram of lengths of the category

외부코드
Text

UNIQUE 

Distinct: 784
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.4145408
Min length: 2

Characters and Unicode

Total characters: 2677
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 784
Unique (%): 100.0%

Sample

1st row: X108
2nd row: X109
3rd row: X111
4th row: 228
5th row: 328

Value  Count  Frequency (%)
x108  1  0.1%
k318  1  0.1%
114  1  0.1%
p140  1  0.1%
k409  1  0.1%
143  1  0.1%
k252  1  0.1%
k214  1  0.1%
k229  1  0.1%
k238  1  0.1%
Other values (774)  774  98.7%

Most occurring characters

Value  Count  Frequency (%)
1  513  19.2%
2  392  14.6%
3  282  10.5%
4  251  9.4%
5  203  7.6%
0  155  5.8%
6  149  5.6%
7  141  5.3%
9  134  5.0%
K  130  4.9%
Other values (10)  327  12.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  2311  86.3%
Uppercase Letter  353  13.2%
Dash Punctuation  13  0.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  513  22.2%
2  392  17.0%
3  282  12.2%
4  251  10.9%
5  203  8.8%
0  155  6.7%
6  149  6.4%
7  141  6.1%
9  134  5.8%
8  91  3.9%

Uppercase Letter
Value  Count  Frequency (%)
K  130  36.8%
P  71  20.1%
I  57  16.1%
S  32  9.1%
D  16  4.5%
U  15  4.2%
Y  15  4.2%
A  14  4.0%
X  3  0.8%

Dash Punctuation
Value  Count  Frequency (%)
-  13  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  2324  86.8%
Latin  353  13.2%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  513  22.1%
2  392  16.9%
3  282  12.1%
4  251  10.8%
5  203  8.7%
0  155  6.7%
6  149  6.4%
7  141  6.1%
9  134  5.8%
8  91  3.9%

Latin
Value  Count  Frequency (%)
K  130  36.8%
P  71  20.1%
I  57  16.1%
S  32  9.1%
D  16  4.5%
U  15  4.2%
Y  15  4.2%
A  14  4.0%
X  3  0.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2677  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  513  19.2%
2  392  14.6%
3  282  10.5%
4  251  9.4%
5  203  7.6%
0  155  5.8%
6  149  5.6%
7  141  5.3%
9  134  5.0%
K  130  4.9%
Other values (10)  327  12.2%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
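
Both plots are built from per-column counts of missing cells. A minimal sketch of that count is shown below, using plain Python dictionaries as rows. The helper `missing_per_column` is a hypothetical name, and treating both `None` and the empty string as missing is an assumption made for illustration. The first two rows come from this report's sample; the third is made up to show a non-zero count.

```python
def missing_per_column(rows, columns):
    """Count cells per column that are None or an empty string."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }


columns = ["전철역코드", "전철역명", "호선", "외부코드"]

# Two rows taken from the report's sample table.
rows = [
    {"전철역코드": "0008", "전철역명": "수서", "호선": "GTX-A", "외부코드": "X108"},
    {"전철역코드": "0228", "전철역명": "서울대입구", "호선": "02호선", "외부코드": "228"},
]
complete = missing_per_column(rows, columns)
print(complete)

# Hypothetical row with a missing external code, not present in the dataset.
with_gap = missing_per_column(
    rows + [{"전철역코드": "9999", "전철역명": "예시", "호선": "01호선", "외부코드": None}],
    columns,
)
print(with_gap)
```

For this dataset every column count is zero, which matches the "Missing cells: 0" figure in the overview.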

Sample

Row  전철역코드  전철역명  호선  외부코드
0  0008  수서  GTX-A  X108
1  0009  성남  GTX-A  X109
2  0011  동탄  GTX-A  X111
3  0228  서울대입구  02호선  228
4  0318  안국  03호선  328
5  0321  충무로  03호선  331
6  1209  도심  경의선  K127
7  1307  회기  경춘선  P118
8  1308  중랑  경춘선  P119
9  1404  탕정  01호선  P173

Row  전철역코드  전철역명  호선  외부코드
774  4920  양촌  김포도시철도  690
775  4921  구래  김포도시철도  691
776  4922  마산  김포도시철도  692
777  4923  장기  김포도시철도  693
778  4924  운양  김포도시철도  694
779  4925  걸포북변  김포도시철도  695
780  4926  사우  김포도시철도  696
781  1271  능곡  경의선  K321
782  1326  강촌  경춘선  P137
783  0300  대곡  경의선  K322