gimi9 Pandas Profiling

Dataset statistics

Number of variables	6
Number of observations	30
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.5 KiB
Average record size in memory	52.4 B
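
The two memory figures above are linked by a simple division. The sketch below shows that arithmetic. It assumes a total of 1,572 bytes, a value inferred here from 52.4 B × 30 rows, because the report prints only the rounded 1.5 KiB.

```python
# Relating "Total size in memory" to "Average record size in memory".
# ASSUMPTION: total_bytes = 1572 is inferred from 52.4 B x 30 rows;
# the report itself only prints the rounded value 1.5 KiB.
n_observations = 30
total_bytes = 1572

average_record_bytes = total_bytes / n_observations
total_kib = total_bytes / 1024

print(f"{average_record_bytes:.1f} B per record")
print(f"{total_kib:.1f} KiB in total")
```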

Variable types

Text	6

Dataset

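Description	Information on the urban and metropolitan railway stations of Incheon Line 1 in the Seoul Capital Area, including station names in Korean, English, romanized form, Japanese, and Chinese (Simplified and Traditional).
Author	국가철도공단 (Korea National Railway)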
URL	https://www.data.go.kr/data/15064661/fileData.do

Alerts

`역명` has unique values	Unique
`역명(영문)` has unique values	Unique
`역명(로마자)` has unique values	Unique
`역명(일본어)` has unique values	Unique
`역명(중국어 간체)` has unique values	Unique

Reproduction

Analysis started	2023-12-12 15:17:14.796325
Analysis finished	2023-12-12 15:17:15.313809
Duration	0.52 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

역명
Text

UNIQUE

Distinct	30
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	8
Median length	6
Mean length	3.7333333
Min length	2
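
As a rough illustration of how length statistics like these can be derived, the sketch below measures ten station names taken from this column's value table. It is not the profiler's own code and it covers only part of the column, so its results differ from the figures above. The final line reproduces the reported mean from the report's totals (112 characters over 30 rows).

```python
from statistics import mean, median

# Ten station names from this column's value table (not all 30 rows).
names = ["계양", "귤현", "국제업무지구", "센트럴파크", "인천대입구",
         "지식정보단지", "테크노파크", "캠퍼스타운", "동막", "동춘"]

# len() counts Unicode code points; each precomposed Hangul syllable is one.
lengths = [len(name) for name in names]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)

# Mean length for the full column, from the report's own totals.
full_column_mean = round(112 / 30, 7)
print(full_column_mean)
```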

Characters and Unicode

Total characters	112
Distinct characters	75
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
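
One such property, the General Category, can be looked up with Python's standard `unicodedata` module. The sketch below is an illustration only and is not how the report was produced. It tallies the categories in one English station name from this dataset and checks that Hangul syllables fall under "Lo" (Other Letter).

```python
import unicodedata
from collections import Counter

# One English station name that appears in this report.
name = "Bupyeong-gu Office"

# unicodedata.category() returns the two-letter General Category code, e.g.
# "Lu" (Uppercase Letter), "Ll" (Lowercase Letter), "Pd" (Dash Punctuation),
# "Zs" (Space Separator).
category_counts = Counter(unicodedata.category(ch) for ch in name)
print(category_counts)

# Hangul syllables are classified as "Lo" (Other Letter).
hangul_categories = {unicodedata.category(ch) for ch in "계양"}
print(hangul_categories)
```

The standard library does not expose the script or block properties, so those need a third-party package or a hand-written range table.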

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	계양
2nd row	귤현
3rd row	박촌
4th row	임학
5th row	계산

Value	Count	Frequency (%)
계양	1	3.3%
귤현	1	3.3%
국제업무지구	1	3.3%
센트럴파크	1	3.3%
인천대입구	1	3.3%
지식정보단지	1	3.3%
테크노파크	1	3.3%
캠퍼스타운	1	3.3%
동막	1	3.3%
동춘	1	3.3%
Other values (20)	20	66.7%

Most occurring characters

Value	Count	Frequency (%)
인	5	4.5%
평	4	3.6%
구	4	3.6%
부	4	3.6%
천	3	2.7%
학	3	2.7%
동	3	2.7%
지	3	2.7%
크	3	2.7%
파	2	1.8%
Other values (65)	78	69.6%
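
The Frequency (%) column is each count divided by the total number of characters. The sketch below applies that formula to six station names shown elsewhere in this report, so its counts differ from the full-column table above. The last line reproduces the first row of that table (5 of 112 characters).

```python
from collections import Counter

# Six station names shown in this report (the table above covers all 30).
names = ["계양", "계산", "부평구청", "부평시장", "인천대입구", "경인교대입구"]

# Count every character across all names.
counts = Counter("".join(names))
total = sum(counts.values())

for char, count in counts.most_common(3):
    print(f"{char}\t{count}\t{count / total:.1%}")

# First row of the report's table: 5 occurrences out of 112 characters.
top_row_frequency = f"{5 / 112:.1%}"
print(top_row_frequency)
```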

Most occurring categories

Value	Count	Frequency (%)
Other Letter	112	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
인	5	4.5%
평	4	3.6%
구	4	3.6%
부	4	3.6%
천	3	2.7%
학	3	2.7%
동	3	2.7%
지	3	2.7%
크	3	2.7%
파	2	1.8%
Other values (65)	78	69.6%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	112	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
인	5	4.5%
평	4	3.6%
구	4	3.6%
부	4	3.6%
천	3	2.7%
학	3	2.7%
동	3	2.7%
지	3	2.7%
크	3	2.7%
파	2	1.8%
Other values (65)	78	69.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	112	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
인	5	4.5%
평	4	3.6%
구	4	3.6%
부	4	3.6%
천	3	2.7%
학	3	2.7%
동	3	2.7%
지	3	2.7%
크	3	2.7%
파	2	1.8%
Other values (65)	78	69.6%

역명(영문)
Text

UNIQUE

Distinct	30
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	33
Median length	21
Mean length	13.166667
Min length	5

Characters and Unicode

Total characters	395
Distinct characters	47
Distinct categories	5
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	Gyeyang
2nd row	Gyulhyeon
3rd row	Bakchon
4th row	Imhak
5th row	Gyesan

Value	Count	Frequency (%)
incheon	3	5.5%
bupyeong	2	3.6%
park	2	3.6%
gyeyang	1	1.8%
sports	1	1.8%
complex	1	1.8%
seonhak	1	1.8%
sinyeonsu	1	1.8%
woninjae	1	1.8%
dongchun	1	1.8%
Other values (41)	41	74.5%
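
The table above counts lowercased words rather than whole values. The sketch below shows one way such counts could be produced. The tokenization rule (lowercase, then keep runs of letters and apostrophes) is an assumption made for illustration; the report does not state how the profiler splits words.

```python
import re
from collections import Counter

# Five English station names that appear in this report.
names = [
    "Bupyeong-gu Office",
    "Bupyeong Market",
    "Central Park",
    "Songdo Moonlight Festival Park",
    "Incheon National University",
]

# ASSUMPTION: lowercase each name, then keep runs of letters and apostrophes,
# so "Bupyeong-gu" splits into "bupyeong" and "gu".
words = [w for name in names for w in re.findall(r"[a-z']+", name.lower())]
word_counts = Counter(words)

for word, count in word_counts.most_common(3):
    print(f"{word}\t{count}\t{count / len(words):.1%}")
```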

Most occurring characters

Value	Count	Frequency (%)
n	43	10.9%
o	33	8.4%
e	31	7.8%
a	26	6.6%
(space)	25	6.3%
i	18	4.6%
s	16	4.1%
t	16	4.1%
u	16	4.1%
g	15	3.8%
Other values (37)	156	39.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	310	78.5%
Uppercase Letter	55	13.9%
Space Separator	25	6.3%
Other Punctuation	4	1.0%
Dash Punctuation	1	0.3%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	43	13.9%
o	33	10.6%
e	31	10.0%
a	26	8.4%
i	18	5.8%
s	16	5.2%
t	16	5.2%
u	16	5.2%
g	15	4.8%
r	14	4.5%
Other values (13)	82	26.5%

Uppercase Letter

Value	Count	Frequency (%)
B	8	14.5%
C	6	10.9%
G	6	10.9%
I	6	10.9%
S	4	7.3%
T	4	7.3%
D	4	7.3%
M	3	5.5%
U	2	3.6%
P	2	3.6%
Other values (9)	10	18.2%

Other Punctuation

Value	Count	Frequency (%)
'	2	50.0%
.	1	25.0%
&	1	25.0%

Space Separator

Value	Count	Frequency (%)
(space)	25	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	365	92.4%
Common	30	7.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	43	11.8%
o	33	9.0%
e	31	8.5%
a	26	7.1%
i	18	4.9%
s	16	4.4%
t	16	4.4%
u	16	4.4%
g	15	4.1%
r	14	3.8%
Other values (32)	137	37.5%

Common

Value	Count	Frequency (%)
(space)	25	83.3%
'	2	6.7%
.	1	3.3%
&	1	3.3%
-	1	3.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	395	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	43	10.9%
o	33	8.4%
e	31	7.8%
a	26	6.6%
(space)	25	6.3%
i	18	4.6%
s	16	4.1%
t	16	4.1%
u	16	4.1%
g	15	3.8%
Other values (37)	156	39.5%

역명(로마자)
Text

UNIQUE

Distinct	30
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	33
Median length	15
Mean length	11.5
Min length	1

Characters and Unicode

Total characters	345
Distinct characters	43
Distinct categories	5
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	Gyeyang
2nd row	Gyulhyeon
3rd row	Bakchon
4th row	Imhak
5th row	Gyesan

Value	Count	Frequency (%)
incheon	3	7.0%
bupyeong	2	4.7%
univ	2	4.7%
of	2	4.7%
woninjae	1	2.3%
center	1	2.3%
terminal	1	2.3%
munhak	1	2.3%
stadium	1	2.3%
seonhak	1	2.3%
Other values (28)	28	65.1%

Most occurring characters

Value	Count	Frequency (%)
n	39	11.3%
e	36	10.4%
o	30	8.7%
a	24	7.0%
u	21	6.1%
i	16	4.6%
g	15	4.3%
(space)	13	3.8%
y	12	3.5%
k	12	3.5%
Other values (33)	127	36.8%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	284	82.3%
Uppercase Letter	40	11.6%
Space Separator	13	3.8%
Dash Punctuation	5	1.4%
Other Punctuation	3	0.9%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	39	13.7%
e	36	12.7%
o	30	10.6%
a	24	8.5%
u	21	7.4%
i	16	5.6%
g	15	5.3%
y	12	4.2%
k	12	4.2%
s	9	3.2%
Other values (12)	70	24.6%

Uppercase Letter

Value	Count	Frequency (%)
G	7	17.5%
B	5	12.5%
I	4	10.0%
S	4	10.0%
D	3	7.5%
T	2	5.0%
C	2	5.0%
M	2	5.0%
U	2	5.0%
J	2	5.0%
Other values (7)	7	17.5%

Other Punctuation

Value	Count	Frequency (%)
.	2	66.7%
'	1	33.3%

Space Separator

Value	Count	Frequency (%)
(space)	13	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	324	93.9%
Common	21	6.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	39	12.0%
e	36	11.1%
o	30	9.3%
a	24	7.4%
u	21	6.5%
i	16	4.9%
g	15	4.6%
y	12	3.7%
k	12	3.7%
s	9	2.8%
Other values (29)	110	34.0%

Common

Value	Count	Frequency (%)
(space)	13	61.9%
-	5	23.8%
.	2	9.5%
'	1	4.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	345	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	39	11.3%
e	36	10.4%
o	30	8.7%
a	24	7.0%
u	21	6.1%
i	16	4.6%
g	15	4.3%
(space)	13	3.8%
y	12	3.5%
k	12	3.5%
Other values (33)	127	36.8%

역명(일본어)
Text

UNIQUE

Distinct	30
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	17
Median length	10.5
Mean length	7.0666667
Min length	3

Characters and Unicode

Total characters	212
Distinct characters	50
Distinct categories	3
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	ケヤン
2nd row	キュルヒョン
3rd row	パクチョン
4th row	イムハク
5th row	ケサン

Value	Count	Frequency (%)
ケヤン	1	3.3%
キュルヒョン	1	3.3%
ククチェオンムジグ	1	3.3%
セントラルパ-ク	1	3.3%
インチョンデイック	1	3.3%
チシクチョンボダンジ	1	3.3%
テクノパ-ク	1	3.3%
キャンパスタウン	1	3.3%
トンマク	1	3.3%
トンチュン	1	3.3%
Other values (20)	20	66.7%

Most occurring characters

Value	Count	Frequency (%)
ン	42	19.8%
ョ	17	8.0%
ク	16	7.5%
チ	14	6.6%
イ	8	3.8%
ル	6	2.8%
ェ	5	2.4%
ジ	5	2.4%
ピ	5	2.4%
ャ	4	1.9%
Other values (40)	90	42.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	207	97.6%
Dash Punctuation	3	1.4%
Other Punctuation	2	0.9%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
ン	42	20.3%
ョ	17	8.2%
ク	16	7.7%
チ	14	6.8%
イ	8	3.9%
ル	6	2.9%
ェ	5	2.4%
ジ	5	2.4%
ピ	5	2.4%
ャ	4	1.9%
Other values (38)	85	41.1%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Other Punctuation

Value	Count	Frequency (%)
․	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Katakana	207	97.6%
Common	5	2.4%

Most frequent character per script

Katakana

Value	Count	Frequency (%)
ン	42	20.3%
ョ	17	8.2%
ク	16	7.7%
チ	14	6.8%
イ	8	3.9%
ル	6	2.9%
ェ	5	2.4%
ジ	5	2.4%
ピ	5	2.4%
ャ	4	1.9%
Other values (38)	85	41.1%

Common

Value	Count	Frequency (%)
-	3	60.0%
․	2	40.0%

Most occurring blocks

Value	Count	Frequency (%)
Katakana	207	97.6%
ASCII	3	1.4%
Punctuation	2	0.9%
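
A Unicode block is a contiguous range of code points, so a character's block follows from its code point alone. The sketch below assigns blocks with a small hand-written range table. That table is an assumption made for illustration: it covers only the blocks seen in this report and uses the report's short labels, and it is not the profiler's own lookup.

```python
import unicodedata
from collections import Counter

# ASSUMPTION: a hand-written subset of Unicode block ranges, labelled with the
# short names used in this report. It is not a complete block table.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),                  # Basic Latin
    (0x2000, 0x206F, "Punctuation"),            # General Punctuation
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK"),                    # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),                 # Hangul Syllables
    (0xF900, 0xFAFF, "CJK Compat Ideographs"),  # CJK Compatibility Ideographs
]


def block_of(ch: str) -> str:
    """Return the block label for a single character, or "Unknown"."""
    code = ord(ch)
    for start, end, label in BLOCKS:
        if start <= code <= end:
            return label
    return "Unknown"


# One value from this column; it mixes Katakana with an ASCII hyphen.
# NFC normalization keeps voiced kana such as パ as a single code point.
name = unicodedata.normalize("NFC", "テクノパ-ク")
block_counts = Counter(block_of(ch) for ch in name)
print(block_counts)
```

The ASCII hyphen standing in for the Katakana long-vowel mark is why this column spans more than one block.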

Most frequent character per block

Katakana

Value	Count	Frequency (%)
ン	42	20.3%
ョ	17	8.2%
ク	16	7.7%
チ	14	6.8%
イ	8	3.9%
ル	6	2.9%
ェ	5	2.4%
ジ	5	2.4%
ピ	5	2.4%
ャ	4	1.9%
Other values (38)	85	41.1%

ASCII

Value	Count	Frequency (%)
-	3	100.0%

Punctuation

Value	Count	Frequency (%)
․	2	100.0%

역명(중국어 간체)
Text

UNIQUE

Distinct	30
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	14
Median length	6
Mean length	3.7
Min length	1

Characters and Unicode

Total characters	111
Distinct characters	74
Distinct categories	4
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	桂阳
2nd row	橘岘
3rd row	朴村
4th row	林鹤
5th row	桂山

Value	Count	Frequency (%)
桂阳	1	3.3%
橘岘	1	3.3%
国际业务园区	1	3.3%
中央公园	1	3.3%
仁川大入口	1	3.3%
知识信息园区(知识情报团地	1	3.3%
科技公园	1	3.3%
大学城	1	3.3%
东幕	1	3.3%
东春	1	3.3%
Other values (20)	20	66.7%

Most occurring characters

Value	Count	Frequency (%)
仁	5	4.5%
平	4	3.6%
园	4	3.6%
口	4	3.6%
富	4	3.6%
大	3	2.7%
区	3	2.7%
川	3	2.7%
鹤	3	2.7%
东	3	2.7%
Other values (64)	75	67.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	108	97.3%
Open Punctuation	1	0.9%
Close Punctuation	1	0.9%
Dash Punctuation	1	0.9%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
仁	5	4.6%
平	4	3.7%
园	4	3.7%
口	4	3.7%
富	4	3.7%
大	3	2.8%
区	3	2.8%
川	3	2.8%
鹤	3	2.8%
东	3	2.8%
Other values (61)	72	66.7%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	108	97.3%
Common	3	2.7%

Most frequent character per script

Han

Value	Count	Frequency (%)
仁	5	4.6%
平	4	3.7%
园	4	3.7%
口	4	3.7%
富	4	3.7%
大	3	2.8%
区	3	2.8%
川	3	2.8%
鹤	3	2.8%
东	3	2.8%
Other values (61)	72	66.7%

Common

Value	Count	Frequency (%)
(	1	33.3%
)	1	33.3%
-	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
CJK	108	97.3%
ASCII	3	2.7%

Most frequent character per block

CJK

Value	Count	Frequency (%)
仁	5	4.6%
平	4	3.7%
园	4	3.7%
口	4	3.7%
富	4	3.7%
大	3	2.8%
区	3	2.8%
川	3	2.8%
鹤	3	2.8%
东	3	2.8%
Other values (61)	72	66.7%

ASCII

Value	Count	Frequency (%)
(	1	33.3%
)	1	33.3%
-	1	33.3%

역명(중국어 번체)
Text

Distinct	28
Distinct (%)	93.3%
Missing	0
Missing (%)	0.0%
Memory size	372.0 B

Length

Max length	8
Median length	6
Mean length	3.3333333
Min length	1

Characters and Unicode

Total characters	100
Distinct characters	70
Distinct categories	2
Distinct scripts	3
Distinct blocks	4

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	27
Unique (%)	90.0%
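
"Distinct" and "Unique" differ for this column. Distinct counts every different value, while Unique counts only the values that occur exactly once. The sketch below shows the difference on a toy column of three values from this report plus a repeated "-" placeholder. The reported figures (28 distinct, 27 unique) are consistent with one value being repeated three times across the 30 rows.

```python
from collections import Counter

# Toy column: three values from this report plus a repeated "-" placeholder.
values = ["桂陽", "橘峴", "朴村", "-", "-", "-"]
value_counts = Counter(values)

# Distinct: how many different values appear at all.
n_distinct = len(value_counts)
# Unique: how many values appear exactly once.
n_unique = sum(1 for count in value_counts.values() if count == 1)
print(n_distinct, n_unique)

# The report's percentages for this column, over 30 rows.
distinct_pct = f"{28 / 30:.1%}"
unique_pct = f"{27 / 30:.1%}"
print(distinct_pct, unique_pct)
```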

Sample

1st row	桂陽
2nd row	橘峴
3rd row	朴村
4th row	林鶴
5th row	桂山

Value	Count	Frequency (%)
	3	10.0%
桂陽	1	3.3%
橘峴	1	3.3%
國際業務地區	1	3.3%
仁川大入口	1	3.3%
知識情報團地	1	3.3%
東幕	1	3.3%
東春	1	3.3%
源仁齋	1	3.3%
新延壽	1	3.3%
Other values (18)	18	60.0%

Most occurring characters

Value	Count	Frequency (%)
仁	5	5.0%
富	4	4.0%
平	4	4.0%
-	3	3.0%
川	3	3.0%
鶴	3	3.0%
東	3	3.0%
桂	2	2.0%
地	2	2.0%
리	2	2.0%
Other values (60)	69	69.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	97	97.0%
Dash Punctuation	3	3.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
仁	5	5.2%
富	4	4.1%
平	4	4.1%
川	3	3.1%
鶴	3	3.1%
東	3	3.1%
桂	2	2.1%
地	2	2.1%
리	2	2.1%
거	2	2.1%
Other values (59)	67	69.1%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	88	88.0%
Hangul	9	9.0%
Common	3	3.0%

Most frequent character per script

Han

Value	Count	Frequency (%)
仁	5	5.7%
富	4	4.5%
平	4	4.5%
川	3	3.4%
鶴	3	3.4%
東	3	3.4%
桂	2	2.3%
地	2	2.3%
場	2	2.3%
廳	2	2.3%
Other values (52)	58	65.9%

Hangul

Value	Count	Frequency (%)
리	2	22.2%
거	2	22.2%
빛	1	11.1%
달	1	11.1%
터	1	11.1%
미	1	11.1%
널	1	11.1%

Common

Value	Count	Frequency (%)
-	3	100.0%

Most occurring blocks

Value	Count	Frequency (%)
CJK	87	87.0%
Hangul	9	9.0%
ASCII	3	3.0%
CJK Compat Ideographs	1	1.0%

Most frequent character per block

CJK

Value	Count	Frequency (%)
仁	5	5.7%
富	4	4.6%
平	4	4.6%
川	3	3.4%
鶴	3	3.4%
東	3	3.4%
桂	2	2.3%
地	2	2.3%
場	2	2.3%
廳	2	2.3%
Other values (51)	57	65.5%

ASCII

Value	Count	Frequency (%)
-	3	100.0%

Hangul

Value	Count	Frequency (%)
리	2	22.2%
거	2	22.2%
빛	1	11.1%
달	1	11.1%
터	1	11.1%
미	1	11.1%
널	1	11.1%

CJK Compat Ideographs

Value	Count	Frequency (%)
林	1	100.0%

Phik (φk)

Table

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
역명	1.000	1.000	1.000	1.000	1.000	1.000
역명(영문)	1.000	1.000	1.000	1.000	1.000	1.000
역명(로마자)	1.000	1.000	1.000	1.000	1.000	1.000
역명(일본어)	1.000	1.000	1.000	1.000	1.000	1.000
역명(중국어 간체)	1.000	1.000	1.000	1.000	1.000	1.000
역명(중국어 번체)	1.000	1.000	1.000	1.000	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows (0 to 9) and last rows (20 to 29)

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
0	계양	Gyeyang	Gyeyang	ケヤン	桂阳	桂陽
1	귤현	Gyulhyeon	Gyulhyeon	キュルヒョン	橘岘	橘峴
2	박촌	Bakchon	Bakchon	パクチョン	朴村	朴村
3	임학	Imhak	Imhak	イムハク	林鹤	林鶴
4	계산	Gyesan	Gyesan	ケサン	桂山	桂山
5	경인교대입구	Gyeongin Nat'l Univ. of Education	Gyeongin Nat'l Univ. of Education	キョンインギョデイック	京仁敎大入口	京仁敎大入口
6	작전	Jakjeon	Jakjeon	チャクチョン	鹊田	鵲田
7	갈산	Galsan	Galsan	カルサン	葛山	葛山
8	부평구청	Bupyeong-gu Office	Bupyeong-gu Office	プピョングチョン	富平区厅	富平區廳
9	부평시장	Bupyeong Market	Bupyeong Market	プピョンシジャン	富平市场	富平市場

	역명	역명(영문)	역명(로마자)	역명(일본어)	역명(중국어 간체)	역명(중국어 번체)
20	원인재	Woninjae	Woninjae	ウォニンジェ	源仁斋	源仁齋
21	동춘	Dongchun	Dongchun	トンチュン	东春	東春
22	동막	Dongmak	Dongmak	トンマク	东幕	東幕
23	캠퍼스타운	Campus Town	Kaempeoseutaun	キャンパスタウン	大学城	-
24	테크노파크	Technopark	Tekeunopakeu	テクノパ-ク	科技公园	-
25	지식정보단지	BIT Zone	Jisikjeongbodanji	チシクチョンボダンジ	知识信息园区(知识情报团地)	知識情報團地
26	인천대입구	Incheon National University	Univ. of Incheon	インチョンデイック	仁川大入口	仁川大入口
27	센트럴파크	Central Park	Senteureolpakeu	セントラルパ-ク	中央公园	-
28	국제업무지구	Int'l Business District	Gukjeeommuji-gu	ククチェオンムジグ	国际业务园区	國際業務地區
29	송도달빛축제공원	Songdo Moonlight Festival Park	-	ソンドダルピッチュクチェゴンウォン	-	松島달빛祝祭公園
