Overview

Dataset statistics

Number of variables: 6
Number of observations: 57
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.8 KiB
Average record size in memory: 50.3 B
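
The dataset statistics above can be reproduced in outline without a profiling library. The sketch below is illustrative only: the three rows are copied from the Sample section of this report, the helper name `overview` is hypothetical (it is not part of ydata-profiling), and treating `None` or an empty string as "missing" is an assumption.

```python
from collections import Counter

# Three rows copied from the Sample section
# (columns: 역명, 역명(영문), 역명(로마자)).
rows = [
    ("김제", "Gimje", "Gimje"),
    ("광주송정", "Gwangjusongjeong", "Gwangjusongjeong"),
    ("공주", "Gongju", "Gongju"),
]


def overview(rows):
    """Return basic dataset statistics for a list of equal-length tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    # Assumption: a cell is missing when it is None or an empty string.
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    # A row counts as a duplicate when an identical row appeared earlier.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    total_cells = n_obs * n_vars
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }


stats = overview(rows)
print(stats)
```

The average record size is consistent with the other figures: 2.8 KiB is about 2867 B, and 2867 B divided by 57 observations is about 50.3 B.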

Variable types

Text: 6

Dataset

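Description: Provides the Korean, English, romanized, Japanese, and Chinese (Simplified and Traditional) names of high-speed railway stations nationwide managed by 국가철도공단 (Korea National Railway).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15096780/fileData.do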

Alerts

역명(중국어 간체) (station name, Simplified Chinese) has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 21:59:45.560681
Analysis finished: 2023-12-12 21:59:46.197661
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

역명 (station name)
Text

Distinct: 56
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 7
Median length: 2
Mean length: 2.4035088
Min length: 2
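
The length statistics are simple aggregates over the per-value character counts. As a minimal sketch, the snippet below computes them for the five sample values listed for this variable (not the full 57 rows), so its results differ from the report's figures. For the full column, the reported mean of 2.4035088 matches 137 total characters divided by 57 observations.

```python
from statistics import mean, median

# The five sample values shown for 역명 in this report.
sample = ["김제", "광주송정", "공주", "계룡", "정읍"]

# len() counts Unicode code points, so each Hangul syllable counts as 1.
lengths = [len(value) for value in sample]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```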

Characters and Unicode

Total characters: 137
Distinct characters: 74
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
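
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below is not the profiler's own code. It counts categories over two values taken from this report, and the helper name `category_counts` is hypothetical. The two-letter codes correspond to the labels used in the tables: `Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in values."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Two values that appear in this report's Sample section.
counts = category_counts(["김제", "珍富(五台山)"])
print(counts)
```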

Unique

Unique: 55
Unique (%): 96.5%
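
Distinct and Unique measure different things: Distinct counts the different values present, while Unique counts the values that occur exactly once. With 57 observations and one value (오송) occurring twice, this gives 56 distinct and 55 unique values. A minimal sketch on a shortened, illustrative list:

```python
from collections import Counter

# Shortened illustrative list: one value repeated, two values occurring once.
values = ["오송", "오송", "김제", "평창"]

freq = Counter(values)
distinct = len(freq)                                  # different values present
unique = sum(1 for c in freq.values() if c == 1)      # values seen exactly once
print(distinct, unique)
```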

Sample

1st row: 김제
2nd row: 광주송정
3rd row: 공주
4th row: 계룡
5th row: 정읍

Value | Count | Frequency (%)
오송 | 2 | 3.5%
김제 | 1 | 1.8%
평창 | 1 | 1.8%
횡성 | 1 | 1.8%
창원 | 1 | 1.8%
진영 | 1 | 1.8%
마산 | 1 | 1.8%
순천 | 1 | 1.8%
구례구 | 1 | 1.8%
곡성 | 1 | 1.8%
Other values (46) | 46 | 80.7%

Most occurring characters

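Value | Count | Frequency (%)
(glyph lost) | 9 | 6.6%
(glyph lost) | 6 | 4.4%
(glyph lost) | 5 | 3.6%
(glyph lost) | 5 | 3.6%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 3 | 2.2%
Other values (64) | 89 | 65.0%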

Most occurring categories

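Value | Count | Frequency (%)
Other Letter | 135 | 98.5%
Close Punctuation | 1 | 0.7%
Open Punctuation | 1 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.7%
(glyph lost) | 6 | 4.4%
(glyph lost) | 5 | 3.7%
(glyph lost) | 5 | 3.7%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 3 | 2.2%
Other values (62) | 87 | 64.4%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%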

Most occurring scripts

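Value | Count | Frequency (%)
Hangul | 135 | 98.5%
Common | 2 | 1.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.7%
(glyph lost) | 6 | 4.4%
(glyph lost) | 5 | 3.7%
(glyph lost) | 5 | 3.7%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 3 | 2.2%
Other values (62) | 87 | 64.4%

Common
Value | Count | Frequency (%)
) | 1 | 50.0%
( | 1 | 50.0%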

Most occurring blocks

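Value | Count | Frequency (%)
Hangul | 135 | 98.5%
ASCII | 2 | 1.5%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.7%
(glyph lost) | 6 | 4.4%
(glyph lost) | 5 | 3.7%
(glyph lost) | 5 | 3.7%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 3 | 2.2%
Other values (62) | 87 | 64.4%

ASCII
Value | Count | Frequency (%)
) | 1 | 50.0%
( | 1 | 50.0%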

역명(영문) (station name, English)
Text

Distinct: 56
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 16
Median length: 13
Mean length: 7.9298246
Min length: 4

Characters and Unicode

Total characters: 452
Distinct characters: 42
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 96.5%

Sample

1st row: Gimje
2nd row: Gwangjusongjeong
3rd row: Gongju
4th row: Gyeryong
5th row: Jeongeup

Value | Count | Frequency (%)
osong | 2 | 3.5%
gimje | 1 | 1.8%
pyeongchang | 1 | 1.8%
hoengseong | 1 | 1.8%
changwon | 1 | 1.8%
jinyeong | 1 | 1.8%
masan | 1 | 1.8%
suncheon | 1 | 1.8%
guryegu | 1 | 1.8%
gokseong | 1 | 1.8%
Other values (46) | 46 | 80.7%

Most occurring characters

Value | Count | Frequency (%)
n | 80 | 17.7%
o | 52 | 11.5%
g | 47 | 10.4%
e | 41 | 9.1%
a | 35 | 7.7%
u | 24 | 5.3%
s | 19 | 4.2%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (32) | 116 | 25.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 385 | 85.2%
Uppercase Letter | 60 | 13.3%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%
Dash Punctuation | 2 | 0.4%
Space Separator | 1 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 80 | 20.8%
o | 52 | 13.5%
g | 47 | 12.2%
e | 41 | 10.6%
a | 35 | 9.1%
u | 24 | 6.2%
s | 19 | 4.9%
j | 14 | 3.6%
i | 12 | 3.1%
y | 11 | 2.9%
Other values (12) | 50 | 13.0%

Uppercase Letter
Value | Count | Frequency (%)
G | 12 | 20.0%
J | 8 | 13.3%
S | 6 | 10.0%
D | 5 | 8.3%
M | 5 | 8.3%
Y | 5 | 8.3%
C | 4 | 6.7%
O | 3 | 5.0%
N | 3 | 5.0%
P | 2 | 3.3%
Other values (6) | 7 | 11.7%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 445 | 98.5%
Common | 7 | 1.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 80 | 18.0%
o | 52 | 11.7%
g | 47 | 10.6%
e | 41 | 9.2%
a | 35 | 7.9%
u | 24 | 5.4%
s | 19 | 4.3%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (28) | 109 | 24.5%

Common
Value | Count | Frequency (%)
( | 2 | 28.6%
) | 2 | 28.6%
- | 2 | 28.6%
(space) | 1 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 452 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 80 | 17.7%
o | 52 | 11.5%
g | 47 | 10.4%
e | 41 | 9.1%
a | 35 | 7.7%
u | 24 | 5.3%
s | 19 | 4.2%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (32) | 116 | 25.7%

역명(로마자) (station name, romanized)
Text

Distinct: 56
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 16
Median length: 13
Mean length: 7.9298246
Min length: 4

Characters and Unicode

Total characters: 452
Distinct characters: 42
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 96.5%

Sample

1st row: Gimje
2nd row: Gwangjusongjeong
3rd row: Gongju
4th row: Gyeryong
5th row: Jeongeup

Value | Count | Frequency (%)
osong | 2 | 3.5%
gimje | 1 | 1.8%
pyeongchang | 1 | 1.8%
hoengseong | 1 | 1.8%
changwon | 1 | 1.8%
jinyeong | 1 | 1.8%
masan | 1 | 1.8%
suncheon | 1 | 1.8%
guryegu | 1 | 1.8%
gokseong | 1 | 1.8%
Other values (46) | 46 | 80.7%

Most occurring characters

Value | Count | Frequency (%)
n | 80 | 17.7%
o | 52 | 11.5%
g | 47 | 10.4%
e | 41 | 9.1%
a | 35 | 7.7%
u | 24 | 5.3%
s | 19 | 4.2%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (32) | 116 | 25.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 385 | 85.2%
Uppercase Letter | 60 | 13.3%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%
Dash Punctuation | 2 | 0.4%
Space Separator | 1 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 80 | 20.8%
o | 52 | 13.5%
g | 47 | 12.2%
e | 41 | 10.6%
a | 35 | 9.1%
u | 24 | 6.2%
s | 19 | 4.9%
j | 14 | 3.6%
i | 12 | 3.1%
y | 11 | 2.9%
Other values (12) | 50 | 13.0%

Uppercase Letter
Value | Count | Frequency (%)
G | 12 | 20.0%
J | 8 | 13.3%
S | 6 | 10.0%
D | 5 | 8.3%
M | 5 | 8.3%
Y | 5 | 8.3%
C | 4 | 6.7%
O | 3 | 5.0%
N | 3 | 5.0%
P | 2 | 3.3%
Other values (6) | 7 | 11.7%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 445 | 98.5%
Common | 7 | 1.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 80 | 18.0%
o | 52 | 11.7%
g | 47 | 10.6%
e | 41 | 9.2%
a | 35 | 7.9%
u | 24 | 5.4%
s | 19 | 4.3%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (28) | 109 | 24.5%

Common
Value | Count | Frequency (%)
( | 2 | 28.6%
) | 2 | 28.6%
- | 2 | 28.6%
(space) | 1 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 452 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 80 | 17.7%
o | 52 | 11.5%
g | 47 | 10.4%
e | 41 | 9.1%
a | 35 | 7.7%
u | 24 | 5.3%
s | 19 | 4.2%
j | 14 | 3.1%
G | 12 | 2.7%
i | 12 | 2.7%
Other values (32) | 116 | 25.7%

역명(일본어) (station name, Japanese)
Text

Distinct: 56
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 12
Median length: 11
Mean length: 4.7192982
Min length: 2

Characters and Unicode

Total characters: 269
Distinct characters: 57
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 96.5%

Sample

1st row: キムジェ
2nd row: クァンジュ ソンジョン
3rd row: コンジュ
4th row: ケリョン
5th row: チョンウプ

Value | Count | Frequency (%)
オソン | 2 | 3.4%
チャンウォン | 2 | 3.4%
スソ | 1 | 1.7%
チョンドンジン | 1 | 1.7%
ジュンアン | 1 | 1.7%
ピョンチャン | 1 | 1.7%
フェンソン | 1 | 1.7%
チニョン | 1 | 1.7%
マサン | 1 | 1.7%
スンチョン | 1 | 1.7%
Other values (47) | 47 | 79.7%

Most occurring characters

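Value | Count | Frequency (%)
(glyph lost) | 73 | 27.1%
(glyph lost) | 19 | 7.1%
(glyph lost) | 16 | 5.9%
(glyph lost) | 14 | 5.2%
(glyph lost) | 9 | 3.3%
(glyph lost) | 9 | 3.3%
(glyph lost) | 7 | 2.6%
(glyph lost) | 7 | 2.6%
(glyph lost) | 6 | 2.2%
(glyph lost) | 6 | 2.2%
Other values (47) | 103 | 38.3%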

Most occurring categories

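Value | Count | Frequency (%)
Other Letter | 261 | 97.0%
Space Separator | 4 | 1.5%
Close Punctuation | 2 | 0.7%
Open Punctuation | 2 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 73 | 28.0%
(glyph lost) | 19 | 7.3%
(glyph lost) | 16 | 6.1%
(glyph lost) | 14 | 5.4%
(glyph lost) | 9 | 3.4%
(glyph lost) | 9 | 3.4%
(glyph lost) | 7 | 2.7%
(glyph lost) | 7 | 2.7%
(glyph lost) | 6 | 2.3%
(glyph lost) | 6 | 2.3%
Other values (44) | 95 | 36.4%

Space Separator
Value | Count | Frequency (%)
(space) | 4 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%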

Most occurring scripts

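Value | Count | Frequency (%)
Katakana | 259 | 96.3%
Common | 8 | 3.0%
Han | 2 | 0.7%

Most frequent character per script

Katakana
Value | Count | Frequency (%)
(glyph lost) | 73 | 28.2%
(glyph lost) | 19 | 7.3%
(glyph lost) | 16 | 6.2%
(glyph lost) | 14 | 5.4%
(glyph lost) | 9 | 3.5%
(glyph lost) | 9 | 3.5%
(glyph lost) | 7 | 2.7%
(glyph lost) | 7 | 2.7%
(glyph lost) | 6 | 2.3%
(glyph lost) | 6 | 2.3%
Other values (42) | 93 | 35.9%

Common
Value | Count | Frequency (%)
(space) | 4 | 50.0%
) | 2 | 25.0%
( | 2 | 25.0%

Han
Value | Count | Frequency (%)
(glyph lost) | 1 | 50.0%
(glyph lost) | 1 | 50.0%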

Most occurring blocks

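Value | Count | Frequency (%)
Katakana | 259 | 96.3%
ASCII | 8 | 3.0%
CJK | 2 | 0.7%

Most frequent character per block

Katakana
Value | Count | Frequency (%)
(glyph lost) | 73 | 28.2%
(glyph lost) | 19 | 7.3%
(glyph lost) | 16 | 6.2%
(glyph lost) | 14 | 5.4%
(glyph lost) | 9 | 3.5%
(glyph lost) | 9 | 3.5%
(glyph lost) | 7 | 2.7%
(glyph lost) | 7 | 2.7%
(glyph lost) | 6 | 2.3%
(glyph lost) | 6 | 2.3%
Other values (42) | 93 | 35.9%

ASCII
Value | Count | Frequency (%)
(space) | 4 | 50.0%
) | 2 | 25.0%
( | 2 | 25.0%

CJK
Value | Count | Frequency (%)
(glyph lost) | 1 | 50.0%
(glyph lost) | 1 | 50.0%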

역명(중국어 간체) (station name, Simplified Chinese)
Text

Distinct: 57
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 7
Median length: 2
Mean length: 2.4385965
Min length: 2

Characters and Unicode

Total characters: 139
Distinct characters: 93
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 100.0%

Sample

1st row: 鸡龙
2nd row: 公州
3rd row: 光州松汀
4th row: 金堤
5th row: 罗州

Value | Count | Frequency (%)
鸡龙 | 1 | 1.8%
马山 | 1 | 1.8%
晋州 | 1 | 1.8%
昌原 | 1 | 1.8%
昌原中央 | 1 | 1.8%
谷城 | 1 | 1.8%
求礼口 | 1 | 1.8%
南原 | 1 | 1.8%
顺天 | 1 | 1.8%
丽水世博会 | 1 | 1.8%
Other values (47) | 47 | 82.5%

Most occurring characters

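Value | Count | Frequency (%)
(glyph lost) | 9 | 6.5%
(glyph lost) | 6 | 4.3%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 2 | 1.4%
Other values (83) | 98 | 70.5%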

Most occurring categories

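Value | Count | Frequency (%)
Other Letter | 133 | 95.7%
Open Punctuation | 2 | 1.4%
Close Punctuation | 2 | 1.4%
Other Punctuation | 2 | 1.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.8%
(glyph lost) | 6 | 4.5%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 2 | 1.5%
Other values (80) | 92 | 69.2%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
? | 2 | 100.0%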

Most occurring scripts

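Value | Count | Frequency (%)
Han | 133 | 95.7%
Common | 6 | 4.3%

Most frequent character per script

Han
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.8%
(glyph lost) | 6 | 4.5%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 4 | 3.0%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 2 | 1.5%
Other values (80) | 92 | 69.2%

Common
Value | Count | Frequency (%)
( | 2 | 33.3%
) | 2 | 33.3%
? | 2 | 33.3%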

Most occurring blocks

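Value | Count | Frequency (%)
CJK | 131 | 94.2%
ASCII | 6 | 4.3%
CJK Compat Ideographs | 2 | 1.4%

Most frequent character per block

CJK
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.9%
(glyph lost) | 6 | 4.6%
(glyph lost) | 4 | 3.1%
(glyph lost) | 4 | 3.1%
(glyph lost) | 4 | 3.1%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 2 | 1.5%
Other values (79) | 90 | 68.7%

ASCII
Value | Count | Frequency (%)
( | 2 | 33.3%
) | 2 | 33.3%
? | 2 | 33.3%

CJK Compat Ideographs
Value | Count | Frequency (%)
(glyph lost) | 2 | 100.0%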

역명(중국어 번체) (station name, Traditional Chinese)
Text

Distinct: 56
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 588.0 B

Length

Max length: 10
Median length: 2
Mean length: 2.5438596
Min length: 2

Characters and Unicode

Total characters: 145
Distinct characters: 94
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 96.5%

Sample

1st row: 金堤
2nd row: 光州松汀
3rd row: 公州
4th row: 鷄龍
5th row: 井邑

Value | Count | Frequency (%)
五松 | 2 | 3.5%
金堤 | 1 | 1.8%
平昌 | 1 | 1.8%
橫城 | 1 | 1.8%
昌原 | 1 | 1.8%
進永 | 1 | 1.8%
馬山 | 1 | 1.8%
順天 | 1 | 1.8%
求禮口 | 1 | 1.8%
谷城 | 1 | 1.8%
Other values (46) | 46 | 80.7%

Most occurring characters

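Value | Count | Frequency (%)
(glyph lost) | 9 | 6.2%
(glyph lost) | 6 | 4.1%
(glyph lost) | 4 | 2.8%
(glyph lost) | 4 | 2.8%
(glyph lost) | 4 | 2.8%
(glyph lost) | 3 | 2.1%
( | 3 | 2.1%
(glyph lost) | 3 | 2.1%
(glyph lost) | 3 | 2.1%
(glyph lost) | 3 | 2.1%
Other values (84) | 103 | 71.0%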

Most occurring categories

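Value | Count | Frequency (%)
Other Letter | 139 | 95.9%
Open Punctuation | 3 | 2.1%
Close Punctuation | 3 | 2.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.5%
(glyph lost) | 6 | 4.3%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
Other values (82) | 97 | 69.8%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%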

Most occurring scripts

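Value | Count | Frequency (%)
Han | 136 | 93.8%
Common | 6 | 4.1%
Hangul | 3 | 2.1%

Most frequent character per script

Han
Value | Count | Frequency (%)
(glyph lost) | 9 | 6.6%
(glyph lost) | 6 | 4.4%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 4 | 2.9%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
(glyph lost) | 3 | 2.2%
Other values (79) | 94 | 69.1%

Hangul
Value | Count | Frequency (%)
(glyph lost) | 1 | 33.3%
(glyph lost) | 1 | 33.3%
(glyph lost) | 1 | 33.3%

Common
Value | Count | Frequency (%)
( | 3 | 50.0%
) | 3 | 50.0%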

Most occurring blocks

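Value | Count | Frequency (%)
CJK | 129 | 89.0%
CJK Compat Ideographs | 7 | 4.8%
ASCII | 6 | 4.1%
Hangul | 3 | 2.1%

Most frequent character per block

CJK
Value | Count | Frequency (%)
(glyph lost) | 9 | 7.0%
(glyph lost) | 6 | 4.7%
(glyph lost) | 4 | 3.1%
(glyph lost) | 4 | 3.1%
(glyph lost) | 4 | 3.1%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
(glyph lost) | 3 | 2.3%
Other values (74) | 87 | 67.4%

ASCII
Value | Count | Frequency (%)
( | 3 | 50.0%
) | 3 | 50.0%

CJK Compat Ideographs
Value | Count | Frequency (%)
(glyph lost) | 2 | 28.6%
(glyph lost) | 2 | 28.6%
(glyph lost) | 1 | 14.3%
(glyph lost) | 1 | 14.3%
(glyph lost) | 1 | 14.3%

Hangul
Value | Count | Frequency (%)
(glyph lost) | 1 | 33.3%
(glyph lost) | 1 | 33.3%
(glyph lost) | 1 | 33.3%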
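
The block labels used in the per-variable tables (ASCII, Katakana, CJK, Hangul, CJK Compat Ideographs) can be illustrated by mapping each code point to a Unicode block range. The sketch below is an assumption-laden illustration, not the profiler's implementation: the mapping of the report's short labels to the Basic Latin, Katakana, CJK Unified Ideographs, Hangul Syllables, and CJK Compatibility Ideographs blocks is assumed, and the names `BLOCKS`, `block_of`, and `block_counts` are hypothetical.

```python
from collections import Counter

# Assumed mapping from the report's short labels to Unicode block ranges.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),                  # Basic Latin
    ("Katakana", 0x30A0, 0x30FF),
    ("CJK", 0x4E00, 0x9FFF),                    # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),                 # Hangul Syllables
    ("CJK Compat Ideographs", 0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


def block_of(ch):
    """Return the label of the block containing ch, or 'Other'."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values):
    """Count characters per block over every character in values."""
    return Counter(block_of(ch) for value in values for ch in value)


# First sample row in three scripts, written with escapes to pin the code points:
# "\uae40\uc81c" is 김제 and "\u30ad\u30e0\u30b8\u30a7" is キムジェ.
example = block_counts(["\uae40\uc81c", "Gimje", "\u30ad\u30e0\u30b8\u30a7"])
print(example)
```

A range table like this only needs the block boundaries, whereas script and category lookups need per-character property data.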

Correlations

Variable | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
역명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(영문) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(로마자) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(일본어) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(중국어 간체) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(중국어 번체) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
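
The report does not state which measure produced this matrix. Assuming an association measure for categorical data such as Cramér's V (an assumption, not something the report confirms), a value of 1.000 everywhere is what one would expect when nearly every value in each column is distinct, because each value in one column then determines the value in the other. The sketch below uses the uncorrected form of Cramér's V, and the function name `cramers_v` is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Three paired values taken from the Sample section of this report.
v = cramers_v(["김제", "공주", "계룡"], ["Gimje", "Gongju", "Gyeryong"])
print(round(v, 3))
```

With columns this close to one-to-one, the matrix mostly reflects that every station name is distinct rather than any meaningful relationship between the columns.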

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

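First rows

Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
0 | 김제 | Gimje | Gimje | キムジェ | 鸡龙 | 金堤
1 | 광주송정 | Gwangjusongjeong | Gwangjusongjeong | クァンジュ ソンジョン | 公州 | 光州松汀
2 | 공주 | Gongju | Gongju | コンジュ | 光州松汀 | 公州
3 | 계룡 | Gyeryong | Gyeryong | ケリョン | 金堤 | 鷄龍
4 | 정읍 | Jeongeup | Jeongeup | チョンウプ | 罗州 | 井邑
5 | 나주 | Naju | Naju | ナジュ | 论山 | 羅州
6 | 익산 | Iksan | Iksan | イクサン | 木浦 | 益山
7 | 오송 | Osong | Osong | オソン | 西大田 | 五松
8 | 서대전 | Seodaejeon | Seodaejeon | ソデジョン | 益山 | 西大田
9 | 목포 | Mokpo | Mokpo | モクポ | 长城 | 木浦

Last rows

Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
47 | 청량리 | Cheongnyangni | Cheongnyangni | チョンニャンニ | 万钟 | 淸凉里
48 | 강릉 | Gangneung | Gangneung | カンヌン | 墨湖 | 江陵
49 | 동해 | Donghae | Donghae | トンヘ | 正东津 | 東海
50 | 둔내 | Dunnae | Dunnae | トゥンネ | 珍富(五台山) | 屯內
51 | 만종 | Manjong | Manjong | マンジョン | 平昌 | 萬鍾
52 | 묵호 | Mukho | Mukho | ムコ | 横城 | 墨湖
53 | 정동진 | Jeongdongjin | Jeongdongjin | チョンドンジン | 水西 | 正東津
54 | 수서 | Suseo | Suseo | スソ | 芝制 | 水西
55 | 지제 | Jije | Jije | チジェ | 东滩 | 芝制
56 | 동탄 | Dongtan | Dongtan | トンタン | ?? | 東灘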