Overview

Dataset statistics

Number of variables: 6
Number of observations: 38
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 51.5 B

Variable types

Text: 5
Categorical: 1

Dataset

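Description: Provides station-name fields, including 역명(한글) (Korean), 역명(영문) (English), 역명(로마자) (romanized), 역명(일본어) (Japanese), 역명(중국어간체) (Simplified Chinese), and 역명(중국어번체) (Traditional Chinese).
URL: https://www.data.go.kr/data/15064049/fileData.do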

Alerts

역명(중국어 번체) is highly imbalanced (63.0%) [Imbalance]
역명 has unique values [Unique]
역명(영문) has unique values [Unique]
역명(로마자) has unique values [Unique]
역명(일본어) has unique values [Unique]
역명(중국어 간체) has unique values [Unique]
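
The 63.0% imbalance figure is consistent with one minus the normalized Shannon entropy of the value counts. The sketch below assumes that definition (the report itself does not state how the score is derived) and uses the counts listed for 역명(중국어 번체) in its Common Values table: one value occurring 32 times and six values occurring once each. The function name `imbalance_score` is hypothetical.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (bits) of the counts.

    Assumed definition: 0 means perfectly balanced; values near 1 mean
    a single class dominates.
    """
    k = len(counts)
    if k < 2:
        # A single class is given score 0 in this sketch.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / log2(k)

# Value counts for 역명(중국어 번체): "-" appears 32 times, six names once each.
counts = [32, 1, 1, 1, 1, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")  # prints 63.0%
```

A perfectly balanced column, for example `imbalance_score([5, 5])`, scores 0 under the same assumption.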

Reproduction

Analysis started: 2023-12-12 06:39:40.318417
Analysis finished: 2023-12-12 06:39:40.954607
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

역명
Text

UNIQUE 

Distinct: 38
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 11
Median length: 9
Mean length: 3.6315789
Min length: 2
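
The mean length appears to equal a column's total character count divided by the number of observations. A quick consistency check follows, using only figures reported in this document (38 observations, plus the total characters and mean length listed for each text variable); the variable names are hypothetical.

```python
# Reported figures: (total characters, reported mean length) per text column.
N_OBSERVATIONS = 38
reported = {
    "역명": (138, 3.6315789),
    "역명(영문)": (452, 11.894737),
    "역명(로마자)": (457, 12.026316),
    "역명(일본어)": (235, 6.1842105),
    "역명(중국어 간체)": (142, 3.7368421),
}

for column, (total_chars, mean_length) in reported.items():
    computed = total_chars / N_OBSERVATIONS
    # Allow for the rounding applied to the reported value.
    matches = abs(computed - mean_length) < 1e-6
    print(f"{column}: {computed:.7f} (reported {mean_length}) match={matches}")
```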

Characters and Unicode

Total characters: 138
Distinct characters: 91
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
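
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two station names taken from the sample rows. The helper name `category_counts` is hypothetical, and the two-letter codes correspond to the labels used in this report (`Lo` Other Letter, `Ps` Open Punctuation, `Pe` Close Punctuation, `Lu` and `Ll` Uppercase and Lowercase Letter, `Zs` Space Separator, `Po` Other Punctuation).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (two-letter codes) in a string."""
    # Normalize to NFC so each Hangul syllable is a single code point.
    text = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in text)

korean = category_counts("올림픽공원(한국체대)")
english = category_counts("Gimpo Int'l Airport")

print(dict(korean))   # Hangul syllables are 'Lo'; the parentheses are 'Ps' and 'Pe'
print(dict(english))  # a mix of 'Lu', 'Ll', 'Zs' and 'Po' (the apostrophe)
```

Script and block names are not available from `unicodedata`, so this sketch covers categories only.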

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: 개화
2nd row: 김포공항
3rd row: 공항시장
4th row: 신방화
5th row: 마곡나루

Value | Count | Frequency (%)
개화 | 1 | 2.6%
삼성중앙 | 1 | 2.6%
둔촌오륜 | 1 | 2.6%
신반포 | 1 | 2.6%
고속터미널 | 1 | 2.6%
사평 | 1 | 2.6%
신논현 | 1 | 2.6%
언주 | 1 | 2.6%
선정릉 | 1 | 2.6%
종합운동장 | 1 | 2.6%
Other values (28) | 28 | 73.7%

Most occurring characters

ValueCountFrequency (%)
4
 
2.9%
4
 
2.9%
( 3
 
2.2%
3
 
2.2%
3
 
2.2%
) 3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
Other values (81) 106
76.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 132 | 95.7%
Open Punctuation | 3 | 2.2%
Close Punctuation | 3 | 2.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4
 
3.0%
4
 
3.0%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
Other values (79) 100
75.8%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 132 | 95.7%
Common | 6 | 4.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4
 
3.0%
4
 
3.0%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
Other values (79) 100
75.8%
Common
ValueCountFrequency (%)
( 3
50.0%
) 3
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 132 | 95.7%
ASCII | 6 | 4.3%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4
 
3.0%
4
 
3.0%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
3
 
2.3%
Other values (79) 100
75.8%
ASCII
ValueCountFrequency (%)
( 3
50.0%
) 3
50.0%

역명(영문)
Text

UNIQUE 

Distinct: 38
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 33
Median length: 18.5
Mean length: 11.894737
Min length: 5

Characters and Unicode

Total characters: 452
Distinct characters: 45
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: Gaehwa
2nd row: Gimpo Int'l Airport
3rd row: Airport Market
4th row: Sinbanghwa
5th row: Magongnaru

Value | Count | Frequency (%)
airport | 2 | 3.5%
seokchon | 2 | 3.5%
national | 2 | 3.5%
gaehwa | 1 | 1.8%
complex | 1 | 1.8%
sinbanpo | 1 | 1.8%
express | 1 | 1.8%
bus | 1 | 1.8%
terminal | 1 | 1.8%
sapyeong | 1 | 1.8%
Other values (44) | 44 | 77.2%

Most occurring characters

Value | Count | Frequency (%)
n | 58 | 12.8%
o | 42 | 9.3%
e | 37 | 8.2%
a | 36 | 8.0%
g | 27 | 6.0%
u | 19 | 4.2%
(space) | 19 | 4.2%
i | 17 | 3.8%
S | 16 | 3.5%
r | 16 | 3.5%
Other values (35) | 165 | 36.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 368 | 81.4%
Uppercase Letter | 60 | 13.3%
Space Separator | 19 | 4.2%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%
Other Punctuation | 1 | 0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 58
15.8%
o 42
11.4%
e 37
10.1%
a 36
 
9.8%
g 27
 
7.3%
u 19
 
5.2%
i 17
 
4.6%
r 16
 
4.3%
s 12
 
3.3%
y 12
 
3.3%
Other values (13) 92
25.0%
Uppercase Letter
ValueCountFrequency (%)
S 16
26.7%
G 5
 
8.3%
C 4
 
6.7%
A 4
 
6.7%
H 4
 
6.7%
D 4
 
6.7%
N 4
 
6.7%
B 3
 
5.0%
M 3
 
5.0%
Y 3
 
5.0%
Other values (8) 10
16.7%
Space Separator
ValueCountFrequency (%)
19
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Other Punctuation
ValueCountFrequency (%)
' 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 428 | 94.7%
Common | 24 | 5.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 58
 
13.6%
o 42
 
9.8%
e 37
 
8.6%
a 36
 
8.4%
g 27
 
6.3%
u 19
 
4.4%
i 17
 
4.0%
S 16
 
3.7%
r 16
 
3.7%
s 12
 
2.8%
Other values (31) 148
34.6%
Common
ValueCountFrequency (%)
19
79.2%
( 2
 
8.3%
) 2
 
8.3%
' 1
 
4.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 452 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 58
 
12.8%
o 42
 
9.3%
e 37
 
8.2%
a 36
 
8.0%
g 27
 
6.0%
u 19
 
4.2%
19
 
4.2%
i 17
 
3.8%
S 16
 
3.5%
r 16
 
3.5%
Other values (35) 165
36.5%

역명(로마자)
Text

UNIQUE 

Distinct: 38
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 33
Median length: 20
Mean length: 12.026316
Min length: 5

Characters and Unicode

Total characters: 457
Distinct characters: 43
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: Gaehwa
2nd row: Gimpo Int'l Airport
3rd row: Airport Market
4th row: Sinbanghwa
5th row: Magongnaru

Value | Count | Frequency (%)
airport | 2 | 3.8%
seokchon | 2 | 3.8%
national | 2 | 3.8%
gaehwa | 1 | 1.9%
seonjeongneung | 1 | 1.9%
dunchon | 1 | 1.9%
oryun | 1 | 1.9%
sinbanpo | 1 | 1.9%
express | 1 | 1.9%
bus | 1 | 1.9%
Other values (40) | 40 | 75.5%

Most occurring characters

Value | Count | Frequency (%)
n | 64 | 14.0%
o | 49 | 10.7%
e | 35 | 7.7%
a | 35 | 7.7%
g | 31 | 6.8%
u | 20 | 4.4%
i | 16 | 3.5%
(space) | 15 | 3.3%
S | 15 | 3.3%
r | 14 | 3.1%
Other values (33) | 163 | 35.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 382 | 83.6%
Uppercase Letter | 55 | 12.0%
Space Separator | 15 | 3.3%
Close Punctuation | 2 | 0.4%
Open Punctuation | 2 | 0.4%
Other Punctuation | 1 | 0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 64
16.8%
o 49
12.8%
e 35
 
9.2%
a 35
 
9.2%
g 31
 
8.1%
u 20
 
5.2%
i 16
 
4.2%
r 14
 
3.7%
y 13
 
3.4%
s 12
 
3.1%
Other values (13) 93
24.3%
Uppercase Letter
ValueCountFrequency (%)
S 15
27.3%
G 6
 
10.9%
N 4
 
7.3%
D 4
 
7.3%
A 4
 
7.3%
Y 3
 
5.5%
H 3
 
5.5%
B 3
 
5.5%
C 3
 
5.5%
M 2
 
3.6%
Other values (6) 8
14.5%
Space Separator
ValueCountFrequency (%)
15
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Other Punctuation
ValueCountFrequency (%)
' 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 437 | 95.6%
Common | 20 | 4.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 64
14.6%
o 49
 
11.2%
e 35
 
8.0%
a 35
 
8.0%
g 31
 
7.1%
u 20
 
4.6%
i 16
 
3.7%
S 15
 
3.4%
r 14
 
3.2%
y 13
 
3.0%
Other values (29) 145
33.2%
Common
ValueCountFrequency (%)
15
75.0%
) 2
 
10.0%
( 2
 
10.0%
' 1
 
5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 457 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 64
 
14.0%
o 49
 
10.7%
e 35
 
7.7%
a 35
 
7.7%
g 31
 
6.8%
u 20
 
4.4%
i 16
 
3.5%
15
 
3.3%
S 15
 
3.3%
r 14
 
3.1%
Other values (33) 163
35.7%

역명(일본어)
Text

UNIQUE 

Distinct: 38
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 14
Median length: 11
Mean length: 6.1842105
Min length: 3

Characters and Unicode

Total characters: 235
Distinct characters: 57
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: ケファ
2nd row: キンポゴンハン
3rd row: コンハンシジャン
4th row: シンバンファ
5th row: マゴンナル

Value | Count | Frequency (%)
ケファ | 1 | 2.6%
サムソン·チュンアン | 1 | 2.6%
トゥンチョノリュン | 1 | 2.6%
シンバンポ | 1 | 2.6%
コソクターミナル | 1 | 2.6%
サピョン | 1 | 2.6%
シンノンヒョン | 1 | 2.6%
オンジュ | 1 | 2.6%
ソンジョンヌン | 1 | 2.6%
チョンハブンドンジャン | 1 | 2.6%
Other values (28) | 28 | 73.7%

Most occurring characters

ValueCountFrequency (%)
64
27.2%
12
 
5.1%
11
 
4.7%
10
 
4.3%
9
 
3.8%
7
 
3.0%
6
 
2.6%
6
 
2.6%
6
 
2.6%
6
 
2.6%
Other values (47) 98
41.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 232 | 98.7%
Other Punctuation | 2 | 0.9%
Modifier Letter | 1 | 0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
64
27.6%
12
 
5.2%
11
 
4.7%
10
 
4.3%
9
 
3.9%
7
 
3.0%
6
 
2.6%
6
 
2.6%
6
 
2.6%
6
 
2.6%
Other values (44) 95
40.9%
Other Punctuation
ValueCountFrequency (%)
1
50.0%
· 1
50.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Katakana | 232 | 98.7%
Common | 3 | 1.3%

Most frequent character per script

Katakana
ValueCountFrequency (%)
64
27.6%
12
 
5.2%
11
 
4.7%
10
 
4.3%
9
 
3.9%
7
 
3.0%
6
 
2.6%
6
 
2.6%
6
 
2.6%
6
 
2.6%
Other values (44) 95
40.9%
Common
ValueCountFrequency (%)
1
33.3%
· 1
33.3%
1
33.3%

Most occurring blocks

Value | Count | Frequency (%)
Katakana | 234 | 99.6%
None | 1 | 0.4%

Most frequent character per block

Katakana
ValueCountFrequency (%)
64
27.4%
12
 
5.1%
11
 
4.7%
10
 
4.3%
9
 
3.8%
7
 
3.0%
6
 
2.6%
6
 
2.6%
6
 
2.6%
6
 
2.6%
Other values (46) 97
41.5%
None
ValueCountFrequency (%)
· 1
100.0%

역명(중국어 간체)
Text

UNIQUE

Distinct: 38
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 14
Median length: 8
Mean length: 3.7368421
Min length: 2

Characters and Unicode

Total characters: 142
Distinct characters: 108
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: 开花
2nd row: 金浦机场
3rd row: 机场市场
4th row: 新傍花
5th row: 麻谷渡口

Value | Count | Frequency (%)
开花 | 1 | 2.6%
三成中央 | 1 | 2.6%
遁村五轮 | 1 | 2.6%
新盘浦 | 1 | 2.6%
高速巴士客运站 | 1 | 2.6%
砂平 | 1 | 2.6%
新论岘 | 1 | 2.6%
彦州 | 1 | 2.6%
宣靖陵 | 1 | 2.6%
综合运动场 | 1 | 2.6%
Other values (28) | 28 | 73.7%

Most occurring characters

ValueCountFrequency (%)
4
 
2.8%
4
 
2.8%
4
 
2.8%
) 3
 
2.1%
3
 
2.1%
3
 
2.1%
( 3
 
2.1%
3
 
2.1%
3
 
2.1%
2
 
1.4%
Other values (98) 110
77.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 136 | 95.8%
Close Punctuation | 3 | 2.1%
Open Punctuation | 3 | 2.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4
 
2.9%
4
 
2.9%
4
 
2.9%
3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
2
 
1.5%
2
 
1.5%
2
 
1.5%
Other values (96) 106
77.9%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 136 | 95.8%
Common | 6 | 4.2%

Most frequent character per script

Han
ValueCountFrequency (%)
4
 
2.9%
4
 
2.9%
4
 
2.9%
3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
2
 
1.5%
2
 
1.5%
2
 
1.5%
Other values (96) 106
77.9%
Common
ValueCountFrequency (%)
) 3
50.0%
( 3
50.0%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 135 | 95.1%
ASCII | 6 | 4.2%
CJK Compat Ideographs | 1 | 0.7%

Most frequent character per block

CJK
ValueCountFrequency (%)
4
 
3.0%
4
 
3.0%
4
 
3.0%
3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
2
 
1.5%
2
 
1.5%
2
 
1.5%
Other values (95) 105
77.8%
ASCII
ValueCountFrequency (%)
) 3
50.0%
( 3
50.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

역명(중국어 번체)
Categorical

IMBALANCE 

Distinct: 7
Distinct (%): 18.4%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Value | Count
- | 32
開花 | 1
新芳華站 | 1
麻谷나루 | 1
堂山 | 1
Other values (2) | 2

Length

Max length: 5
Median length: 1
Mean length: 1.3684211
Min length: 1

Unique

Unique: 6
Unique (%): 15.8%

Sample

1st row: 開花
2nd row: -
3rd row: -
4th row: 新芳華站
5th row: 麻谷나루

Common Values

Value | Count | Frequency (%)
- | 32 | 84.2%
開花 | 1 | 2.6%
新芳華站 | 1 | 2.6%
麻谷나루 | 1 | 2.6%
堂山 | 1 | 2.6%
鷺梁津 | 1 | 2.6%
高速터미널 | 1 | 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: frequency of each value of 역명(중국어 번체); "-" accounts for 32 of 38 rows (84.2%) and each of the six other values for 1 row (2.6%).]

Correlations

Variable | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
역명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(영문) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(로마자) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(일본어) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(중국어 간체) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명(중국어 번체) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
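
Every coefficient in this matrix is 1.000. That is what a nominal association measure yields when each value of one column determines the value of the other, which is the case when nearly all station names are unique. The sketch below uses a plain, uncorrected Cramér's V as a stand-in (an assumption; the report does not state which statistic it computed) on the first five sample rows of 역명 and 역명(중국어 번체). `cramers_v` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    rows = Counter(xs)             # row totals
    cols = Counter(ys)             # column totals
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0
    chi2 = 0.0
    for x, row_total in rows.items():
        for y, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))

# First five sample rows of the two columns.
names = ["개화", "김포공항", "공항시장", "신방화", "마곡나루"]
traditional = ["開花", "-", "-", "新芳華站", "麻谷나루"]

v = cramers_v(names, traditional)
print(round(v, 3))
```

With so few repeated values, a coefficient of 1 mostly reflects uniqueness rather than a meaningful relationship between the columns.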

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
0 | 개화 | Gaehwa | Gaehwa | ケファ | 开花 | 開花
1 | 김포공항 | Gimpo Int'l Airport | Gimpo Int'l Airport | キンポゴンハン | 金浦机场 | -
2 | 공항시장 | Airport Market | Airport Market | コンハンシジャン | 机场市场 | -
3 | 신방화 | Sinbanghwa | Sinbanghwa | シンバンファ | 新傍花 | 新芳華站
4 | 마곡나루 | Magongnaru | Magongnaru | マゴンナル | 麻谷渡口 | 麻谷나루
5 | 양천향교 | Yangcheon Hyanggyo | Yangcheon Hyanggyo | ヤンチョンヒャンギョ | 阳川乡校 | -
6 | 가양 | Gayang | Gayang | カヤン | 加阳 | -
7 | 증미 | Jeungmi | Jeungmi | チュンミ | 曾米 | -
8 | 등촌 | Deungchon | Deungchon | ドゥンチョン | 登村 | -
9 | 염창 | Yeomchang | Yeomchang | ヨムチャン | 盐仓 | -

Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
28 | 봉은사 | Bongeunsa | Bongeunsa | ポンウンサ | 奉恩寺 | -
29 | 종합운동장 | Sports Complex | Sports Complex | チョンハブンドンジャン | 综合运动场 | -
30 | 삼전 | Samjeon | Samjeon | サムジョン | 三田 | -
31 | 석촌고분 | Seokchon Gobun | Seokchon Gobun | ソクチョンゴブン | 石村古坟 | -
32 | 석촌 | Seokchon | Seokchon | ソクチョン | 石村 | -
33 | 송파나루 | Songpanaru | Songpanaru | ソンパナル | 松坡渡口 | -
34 | 한성백제 | Hanseong Baekje | HanseongBaekje | ハンソンベクチェ | 汉城百济 | -
35 | 올림픽공원(한국체대) | Olympic Park | OlympicGongwon | オリンピック・コンウォン | 奥林匹克公园(韩国体育大学) | -
36 | 둔촌오륜 | Dunchon Oryun | Dunchon Oryun | トゥンチョノリュン | 遁村五轮 | -
37 | 중앙보훈병원 | VHS Medical Center | joongangbohunbyeongwon | チュンアンボフンビョンウォン | 中央报勋医院 | -