gimi9 Pandas Profiling

Dataset statistics

Number of variables	6
Number of observations	38
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.9 KiB
Average record size in memory	51.5 B

Variable types

Text	5
Categorical	1

Dataset

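Description	Provides information such as station name (Korean), station name (English), station name (romanized), station name (Japanese), station name (Simplified Chinese), and station name (Traditional Chinese)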
URL	https://www.data.go.kr/data/15064049/fileData.do

Alerts

`역명(중국어 번체)` is highly imbalanced (63.0%)	Imbalance
`역명` has unique values	Unique
`역명(영문)` has unique values	Unique
`역명(로마자)` has unique values	Unique
`역명(일본어)` has unique values	Unique
`역명(중국어 간체)` has unique values	Unique

Reproduction

Analysis started	2023-12-12 06:39:40.318417
Analysis finished	2023-12-12 06:39:40.954607
Duration	0.64 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

역명
Text

UNIQUE

Distinct	38
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

Length

Max length	11
Median length	9
Mean length	3.6315789
Min length	2
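The length figures can be cross-checked against the rest of this variable's statistics: the mean length should equal the total character count (138, listed under Characters and Unicode) divided by the number of observations (38). The sketch below does that arithmetic and also computes the smallest character total that the reported median would imply. The helper name `min_total_for_median` is made up for illustration, and the check assumes lengths are whole numbers of characters.

```python
import math

n_rows = 38        # Number of observations
total_chars = 138  # Total characters reported for this variable
min_len = 2        # Min length as reported
median_len = 9     # Median length as reported

# Mean length = total characters / number of rows.
mean_len = total_chars / n_rows
print(round(mean_len, 7))  # 3.6315789


def min_total_for_median(n, median, minimum):
    """Lower bound on the sum of n integer lengths with this median and minimum.

    At least n - n // 2 values must be >= ceil(median); the remaining
    n // 2 values are only bounded below by the minimum.
    """
    upper_count = n - n // 2
    lower_count = n // 2
    return upper_count * math.ceil(median) + lower_count * minimum


lower_bound = min_total_for_median(n_rows, median_len, min_len)
print(lower_bound)                # 209
print(lower_bound > total_chars)  # True
```

The mean agrees with the character total, while a median of 9 would require at least 209 characters in total. The reported median therefore appears inconsistent with the other figures; the report alone does not show how it was computed.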

Characters and Unicode

Total characters	138
Distinct characters	91
Distinct categories	3
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
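As a minimal sketch of what such an analysis looks like, the snippet below counts Unicode general categories with Python's standard `unicodedata` module. The sample values are taken from the rows shown in this report, not the full column, and the function name `category_counts` is an illustrative choice rather than anything ydata-profiling exposes.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


# A handful of station names from the sample and last rows of this report.
sample = ["개화", "김포공항", "공항시장", "신방화", "마곡나루", "올림픽공원(한국체대)"]
counts = category_counts(sample)

# "Lo" = Other Letter, "Ps" = Open Punctuation, "Pe" = Close Punctuation.
for category, count in counts.most_common():
    print(category, count)
```

Hangul syllables fall under Other Letter (`Lo`), which is why that category dominates the tables for this variable, while the ASCII parentheses account for the Open and Close Punctuation rows.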

Unique

Unique	38
Unique (%)	100.0%

Sample

1st row	개화
2nd row	김포공항
3rd row	공항시장
4th row	신방화
5th row	마곡나루

Value	Count	Frequency (%)
개화	1	2.6%
삼성중앙	1	2.6%
둔촌오륜	1	2.6%
신반포	1	2.6%
고속터미널	1	2.6%
사평	1	2.6%
신논현	1	2.6%
언주	1	2.6%
선정릉	1	2.6%
종합운동장	1	2.6%
Other values (28)	28	73.7%

Most occurring characters

Value	Count	Frequency (%)
신	4	2.9%
촌	4	2.9%
(	3	2.2%
포	3	2.2%
원	3	2.2%
)	3	2.2%
앙	3	2.2%
중	3	2.2%
석	3	2.2%
사	3	2.2%
Other values (81)	106	76.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	132	95.7%
Open Punctuation	3	2.2%
Close Punctuation	3	2.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
신	4	3.0%
촌	4	3.0%
포	3	2.3%
원	3	2.3%
앙	3	2.3%
중	3	2.3%
석	3	2.3%
사	3	2.3%
동	3	2.3%
공	3	2.3%
Other values (79)	100	75.8%

Open Punctuation

Value	Count	Frequency (%)
(	3	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	132	95.7%
Common	6	4.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
신	4	3.0%
촌	4	3.0%
포	3	2.3%
원	3	2.3%
앙	3	2.3%
중	3	2.3%
석	3	2.3%
사	3	2.3%
동	3	2.3%
공	3	2.3%
Other values (79)	100	75.8%

Common

Value	Count	Frequency (%)
(	3	50.0%
)	3	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	132	95.7%
ASCII	6	4.3%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
신	4	3.0%
촌	4	3.0%
포	3	2.3%
원	3	2.3%
앙	3	2.3%
중	3	2.3%
석	3	2.3%
사	3	2.3%
동	3	2.3%
공	3	2.3%
Other values (79)	100	75.8%

ASCII

Value	Count	Frequency (%)
(	3	50.0%
)	3	50.0%

역명(영문)
Text

UNIQUE

Distinct	38
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

Length

Max length	33
Median length	18.5
Mean length	11.894737
Min length	5

Characters and Unicode

Total characters	452
Distinct characters	45
Distinct categories	6
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	38
Unique (%)	100.0%

Sample

1st row	Gaehwa
2nd row	Gimpo Int'l Airport
3rd row	Airport Market
4th row	Sinbanghwa
5th row	Magongnaru

Value	Count	Frequency (%)
airport	2	3.5%
seokchon	2	3.5%
national	2	3.5%
gaehwa	1	1.8%
complex	1	1.8%
sinbanpo	1	1.8%
express	1	1.8%
bus	1	1.8%
terminal	1	1.8%
sapyeong	1	1.8%
Other values (44)	44	77.2%

Most occurring characters

Value	Count	Frequency (%)
n	58	12.8%
o	42	9.3%
e	37	8.2%
a	36	8.0%
g	27	6.0%
u	19	4.2%
(space)	19	4.2%
i	17	3.8%
S	16	3.5%
r	16	3.5%
Other values (35)	165	36.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	368	81.4%
Uppercase Letter	60	13.3%
Space Separator	19	4.2%
Open Punctuation	2	0.4%
Close Punctuation	2	0.4%
Other Punctuation	1	0.2%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	58	15.8%
o	42	11.4%
e	37	10.1%
a	36	9.8%
g	27	7.3%
u	19	5.2%
i	17	4.6%
r	16	4.3%
s	12	3.3%
y	12	3.3%
Other values (13)	92	25.0%

Uppercase Letter

Value	Count	Frequency (%)
S	16	26.7%
G	5	8.3%
C	4	6.7%
A	4	6.7%
H	4	6.7%
D	4	6.7%
N	4	6.7%
B	3	5.0%
M	3	5.0%
Y	3	5.0%
Other values (8)	10	16.7%

Space Separator

Value	Count	Frequency (%)
(space)	19	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Other Punctuation

Value	Count	Frequency (%)
'	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	428	94.7%
Common	24	5.3%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	58	13.6%
o	42	9.8%
e	37	8.6%
a	36	8.4%
g	27	6.3%
u	19	4.4%
i	17	4.0%
S	16	3.7%
r	16	3.7%
s	12	2.8%
Other values (31)	148	34.6%

Common

Value	Count	Frequency (%)
(space)	19	79.2%
(	2	8.3%
)	2	8.3%
'	1	4.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	452	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	58	12.8%
o	42	9.3%
e	37	8.2%
a	36	8.0%
g	27	6.0%
u	19	4.2%
(space)	19	4.2%
i	17	3.8%
S	16	3.5%
r	16	3.5%
Other values (35)	165	36.5%

역명(로마자)
Text

UNIQUE

Distinct	38
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

Length

Max length	33
Median length	20
Mean length	12.026316
Min length	5

Characters and Unicode

Total characters	457
Distinct characters	43
Distinct categories	6
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	38
Unique (%)	100.0%

Sample

1st row	Gaehwa
2nd row	Gimpo Int'l Airport
3rd row	Airport Market
4th row	Sinbanghwa
5th row	Magongnaru

Value	Count	Frequency (%)
airport	2	3.8%
seokchon	2	3.8%
national	2	3.8%
gaehwa	1	1.9%
seonjeongneung	1	1.9%
dunchon	1	1.9%
oryun	1	1.9%
sinbanpo	1	1.9%
express	1	1.9%
bus	1	1.9%
Other values (40)	40	75.5%

Most occurring characters

Value	Count	Frequency (%)
n	64	14.0%
o	49	10.7%
e	35	7.7%
a	35	7.7%
g	31	6.8%
u	20	4.4%
i	16	3.5%
(space)	15	3.3%
S	15	3.3%
r	14	3.1%
Other values (33)	163	35.7%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	382	83.6%
Uppercase Letter	55	12.0%
Space Separator	15	3.3%
Close Punctuation	2	0.4%
Open Punctuation	2	0.4%
Other Punctuation	1	0.2%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	64	16.8%
o	49	12.8%
e	35	9.2%
a	35	9.2%
g	31	8.1%
u	20	5.2%
i	16	4.2%
r	14	3.7%
y	13	3.4%
s	12	3.1%
Other values (13)	93	24.3%

Uppercase Letter

Value	Count	Frequency (%)
S	15	27.3%
G	6	10.9%
N	4	7.3%
D	4	7.3%
A	4	7.3%
Y	3	5.5%
H	3	5.5%
B	3	5.5%
C	3	5.5%
M	2	3.6%
Other values (6)	8	14.5%

Space Separator

Value	Count	Frequency (%)
(space)	15	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Other Punctuation

Value	Count	Frequency (%)
'	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	437	95.6%
Common	20	4.4%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	64	14.6%
o	49	11.2%
e	35	8.0%
a	35	8.0%
g	31	7.1%
u	20	4.6%
i	16	3.7%
S	15	3.4%
r	14	3.2%
y	13	3.0%
Other values (29)	145	33.2%

Common

Value	Count	Frequency (%)
(space)	15	75.0%
)	2	10.0%
(	2	10.0%
'	1	5.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	457	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	64	14.0%
o	49	10.7%
e	35	7.7%
a	35	7.7%
g	31	6.8%
u	20	4.4%
i	16	3.5%
(space)	15	3.3%
S	15	3.3%
r	14	3.1%
Other values (33)	163	35.7%

역명(일본어)
Text

UNIQUE

Distinct	38
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

Length

Max length	14
Median length	11
Mean length	6.1842105
Min length	3

Characters and Unicode

Total characters	235
Distinct characters	57
Distinct categories	3
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	38
Unique (%)	100.0%

Sample

1st row	ケファ
2nd row	キンポゴンハン
3rd row	コンハンシジャン
4th row	シンバンファ
5th row	マゴンナル

Value	Count	Frequency (%)
ケファ	1	2.6%
サムソン·チュンアン	1	2.6%
トゥンチョノリュン	1	2.6%
シンバンポ	1	2.6%
コソクターミナル	1	2.6%
サピョン	1	2.6%
シンノンヒョン	1	2.6%
オンジュ	1	2.6%
ソンジョンヌン	1	2.6%
チョンハブンドンジャン	1	2.6%
Other values (28)	28	73.7%

Most occurring characters

Value	Count	Frequency (%)
ン	64	27.2%
ョ	12	5.1%
チ	11	4.7%
ク	10	4.3%
ソ	9	3.8%
ジ	7	3.0%
サ	6	2.6%
ド	6	2.6%
ュ	6	2.6%
ャ	6	2.6%
Other values (47)	98	41.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	232	98.7%
Other Punctuation	2	0.9%
Modifier Letter	1	0.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
ン	64	27.6%
ョ	12	5.2%
チ	11	4.7%
ク	10	4.3%
ソ	9	3.9%
ジ	7	3.0%
サ	6	2.6%
ド	6	2.6%
ュ	6	2.6%
ャ	6	2.6%
Other values (44)	95	40.9%

Other Punctuation

Value	Count	Frequency (%)
・	1	50.0%
·	1	50.0%

Modifier Letter

Value	Count	Frequency (%)
ー	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Katakana	232	98.7%
Common	3	1.3%

Most frequent character per script

Katakana

Value	Count	Frequency (%)
ン	64	27.6%
ョ	12	5.2%
チ	11	4.7%
ク	10	4.3%
ソ	9	3.9%
ジ	7	3.0%
サ	6	2.6%
ド	6	2.6%
ュ	6	2.6%
ャ	6	2.6%
Other values (44)	95	40.9%

Common

Value	Count	Frequency (%)
・	1	33.3%
·	1	33.3%
ー	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
Katakana	234	99.6%
None	1	0.4%

Most frequent character per block

Katakana

Value	Count	Frequency (%)
ン	64	27.4%
ョ	12	5.1%
チ	11	4.7%
ク	10	4.3%
ソ	9	3.8%
ジ	7	3.0%
サ	6	2.6%
ド	6	2.6%
ュ	6	2.6%
ャ	6	2.6%
Other values (46)	97	41.5%

None

Value	Count	Frequency (%)
·	1	100.0%
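The two dots listed under Other Punctuation look alike but land in different blocks. A likely reading, assumed here, is that one is the Katakana middle dot (U+30FB) and the other the generic middle dot (U+00B7), which the Unicode Standard places in the Latin-1 Supplement block and which this report labels `None`. The sketch below prints the code point, name, and general category of each, plus the prolonged sound mark from the Modifier Letter table; `describe` is an illustrative helper.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


# Escapes are used so the two visually similar dots cannot be confused.
# The code points are an assumption about which characters the report saw.
for ch in ("\u30fb", "\u00b7", "\u30fc"):
    print(describe(ch))
```

If a single separator is wanted across the column, replacing U+00B7 with U+30FB before profiling would leave every character of this variable in the Katakana block.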

역명(중국어 간체)
Text

UNIQUE

Distinct	38
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

Length

Max length	14
Median length	8
Mean length	3.7368421
Min length	2

Characters and Unicode

Total characters	142
Distinct characters	108
Distinct categories	3
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	38
Unique (%)	100.0%

Sample

1st row	开花
2nd row	金浦机场
3rd row	机场市场
4th row	新傍花
5th row	麻谷渡口

Value	Count	Frequency (%)
开花	1	2.6%
三成中央	1	2.6%
遁村五轮	1	2.6%
新盘浦	1	2.6%
高速巴士客运站	1	2.6%
砂平	1	2.6%
新论岘	1	2.6%
彦州	1	2.6%
宣靖陵	1	2.6%
综合运动场	1	2.6%
Other values (28)	28	73.7%

Most occurring characters

Value	Count	Frequency (%)
村	4	2.8%
场	4	2.8%
新	4	2.8%
)	3	2.1%
浦	3	2.1%
石	3	2.1%
(	3	2.1%
中	3	2.1%
央	3	2.1%
运	2	1.4%
Other values (98)	110	77.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	136	95.8%
Close Punctuation	3	2.1%
Open Punctuation	3	2.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
村	4	2.9%
场	4	2.9%
新	4	2.9%
浦	3	2.2%
石	3	2.2%
中	3	2.2%
央	3	2.2%
运	2	1.5%
国	2	1.5%
大	2	1.5%
Other values (96)	106	77.9%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	136	95.8%
Common	6	4.2%

Most frequent character per script

Han

Value	Count	Frequency (%)
村	4	2.9%
场	4	2.9%
新	4	2.9%
浦	3	2.2%
石	3	2.2%
中	3	2.2%
央	3	2.2%
运	2	1.5%
国	2	1.5%
大	2	1.5%
Other values (96)	106	77.9%

Common

Value	Count	Frequency (%)
)	3	50.0%
(	3	50.0%

Most occurring blocks

Value	Count	Frequency (%)
CJK	135	95.1%
ASCII	6	4.2%
CJK Compat Ideographs	1	0.7%

Most frequent character per block

CJK

Value	Count	Frequency (%)
村	4	3.0%
场	4	3.0%
新	4	3.0%
浦	3	2.2%
石	3	2.2%
中	3	2.2%
央	3	2.2%
运	2	1.5%
国	2	1.5%
大	2	1.5%
Other values (95)	105	77.8%

ASCII

Value	Count	Frequency (%)
)	3	50.0%
(	3	50.0%

CJK Compat Ideographs

Value	Count	Frequency (%)
鷺	1	100.0%
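A single character in the CJK Compatibility Ideographs block usually means the source file holds a compatibility code point rather than the unified ideograph. The sketch below shows one way to fold such characters back with Unicode normalization from Python's standard library. The code point U+F93A is an assumption about which compatibility form appears in the data, and `fold_compat_ideographs` is a hypothetical helper name.

```python
import unicodedata


def fold_compat_ideographs(text):
    """Map CJK compatibility ideographs to their unified equivalents via NFC."""
    return unicodedata.normalize("NFC", text)


compat = "\uf93a"  # assumed compatibility code point for the character above
unified = fold_compat_ideographs(compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")

# Text that is already in unified form passes through unchanged.
print(fold_compat_ideographs("开花") == "开花")
```

Applying this normalization to the column before profiling should leave only the CJK and ASCII blocks in the table above.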

역명(중국어 번체)
Categorical

IMBALANCE

Distinct	7
Distinct (%)	18.4%
Missing	0
Missing (%)	0.0%
Memory size	436.0 B

-	32
開花	1
新芳華站	1
麻谷나루	1
堂山	1
Other values (2)	2

Length

Max length	5
Median length	1
Mean length	1.3684211
Min length	1

Unique

Unique	6
Unique (%)	15.8%

Sample

1st row	開花
2nd row	-
3rd row	-
4th row	新芳華站
5th row	麻谷나루

Common Values

Value	Count	Frequency (%)
-	32	84.2%
開花	1	2.6%
新芳華站	1	2.6%
麻谷나루	1	2.6%
堂山	1	2.6%
鷺梁津	1	2.6%
高速터미널	1	2.6%
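The 63.0% in the imbalance alert for this variable can be reproduced from the counts above. One formula that yields exactly that figure is one minus the Shannon entropy of the class distribution divided by its maximum possible value, log2 of the number of classes. The sketch below is an assumption about how the score is derived, not a copy of the library's code, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy.

    0 means perfectly balanced classes; the score approaches 1 as a
    single class dominates.
    """
    n_classes = len(counts)
    if n_classes < 2:
        # Convention chosen for this sketch: avoid dividing by log2(1) = 0.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts from the Common Values table: "-" appears 32 times, six values once each.
counts = [32, 1, 1, 1, 1, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")  # 63.0%
```

Here the placeholder "-" stands in for missing Traditional Chinese names, so the imbalance reflects sparse coverage rather than a genuinely skewed category.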

Length

Histogram of lengths of the category

Common Values (Plot)

Phik (φk)

Heatmap
Table

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
역명	1.000	1.000	1.000	1.000	1.000	1.000
역명(영문)	1.000	1.000	1.000	1.000	1.000	1.000
역명(로마자)	1.000	1.000	1.000	1.000	1.000	1.000
역명(일본어)	1.000	1.000	1.000	1.000	1.000	1.000
역명(중국어 간체)	1.000	1.000	1.000	1.000	1.000	1.000
역명(중국어 번체)	1.000	1.000	1.000	1.000	1.000	1.000

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
0	개화	Gaehwa	Gaehwa	ケファ	开花	開花
1	김포공항	Gimpo Int'l Airport	Gimpo Int'l Airport	キンポゴンハン	金浦机场	-
2	공항시장	Airport Market	Airport Market	コンハンシジャン	机场市场	-
3	신방화	Sinbanghwa	Sinbanghwa	シンバンファ	新傍花	新芳華站
4	마곡나루	Magongnaru	Magongnaru	マゴンナル	麻谷渡口	麻谷나루
5	양천향교	Yangcheon Hyanggyo	Yangcheon Hyanggyo	ヤンチョンヒャンギョ	阳川乡校	-
6	가양	Gayang	Gayang	カヤン	加阳	-
7	증미	Jeungmi	Jeungmi	チュンミ	曾米	-
8	등촌	Deungchon	Deungchon	ドゥンチョン	登村	-
9	염창	Yeomchang	Yeomchang	ヨムチャン	盐仓	-

Last rows

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
28	봉은사	Bongeunsa	Bongeunsa	ポンウンサ	奉恩寺	-
29	종합운동장	Sports Complex	Sports Complex	チョンハブンドンジャン	综合运动场	-
30	삼전	Samjeon	Samjeon	サムジョン	三田	-
31	석촌고분	Seokchon Gobun	Seokchon Gobun	ソクチョンゴブン	石村古坟	-
32	석촌	Seokchon	Seokchon	ソクチョン	石村	-
33	송파나루	Songpanaru	Songpanaru	ソンパナル	松坡渡口	-
34	한성백제	Hanseong Baekje	HanseongBaekje	ハンソンベクチェ	汉城百济	-
35	올림픽공원(한국체대)	Olympic Park	OlympicGongwon	オリンピック・コンウォン	奥林匹克公园(韩国体育大学)	-
36	둔촌오륜	Dunchon Oryun	Dunchon Oryun	トゥンチョノリュン	遁村五轮	-
37	중앙보훈병원	VHS Medical Center	joongangbohunbyeongwon	チュンアンボフンビョンウォン	中央报勋医院	-
