gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	57
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	2.0 KiB
Average record size in memory	35.3 B
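The dataset statistics come from simple counts. Missing cells are counted over variables × observations (4 × 57 = 228 cells here), and duplicate rows are rows identical to an earlier row. The sketch below is a minimal plain-Python version that assumes the table is held as a list of row tuples, with `None` for a missing cell. The function name `dataset_stats` is hypothetical. The memory figures depend on pandas internals, so the sketch does not reproduce them.

```python
from typing import Dict, Optional, Sequence, Tuple

Row = Tuple[Optional[str], ...]


def dataset_stats(rows: Sequence[Row]) -> Dict[str, float]:
    """Recompute the overview counts for a table given as row tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    # A cell counts as missing only if it is None (not an empty or blank string).
    missing = sum(cell is None for row in rows for cell in row)
    # A row is a duplicate if an identical row appeared earlier.
    duplicates = n_obs - len(set(rows))
    total_cells = n_obs * n_vars
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# First five rows of the sample shown in this report.
rows = [
    ("1", "계양", "桂陽", "Gyeyang"),
    ("1", "귤현", "橘峴", "Gyulhyeon"),
    ("1", "박촌", "朴村", "Bakchon"),
    ("1", "임학", "林鶴", "Imhak"),
    ("1", "계산", "桂山", "Gyesan"),
]
print(dataset_stats(rows))
```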

Variable types

Categorical	1
Text	3

Dataset

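Description	Foreign-language names of the stations on Incheon Subway Line 1 and Incheon Subway Line 2, operated by Incheon Transit Corporation. The languages covered are Korean, Hanja (Chinese characters), and English. (Fields: line, station name, Hanja, English name.)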
URL	https://www.data.go.kr/data/15043808/fileData.do

Alerts

한 글 has unique values

Reproduction

Analysis started	2023-12-12 14:32:42.367252
Analysis finished	2023-12-12 14:32:42.827637
Duration	0.46 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
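ydata-profiling's `ProfileReport` class can in principle regenerate a report like this one. The sketch below is minimal and rests on several assumptions: the source file from the URL above has been downloaded as a CSV, that CSV uses the cp949 (Korean legacy) encoding, and the saved configuration can be passed through `config_file`. The function name `make_report` and all file names are hypothetical and do not appear in this report.

```python
from typing import Optional


def make_report(csv_path: str, output_html: str,
                config_file: Optional[str] = None,
                encoding: str = "cp949") -> None:
    """Profile a CSV with ydata-profiling and write an HTML report."""
    # Imported lazily so that defining this function does not require the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the data.go.kr export is a CSV in a Korean legacy encoding.
    df = pd.read_csv(csv_path, encoding=encoding)
    if config_file is not None:
        # Reuse saved settings, e.g. the config.json offered with this report.
        report = ProfileReport(df, config_file=config_file)
    else:
        report = ProfileReport(df, title="Pandas Profiling")
    report.to_file(output_html)
```

For example, `make_report("incheon_stations.csv", "report.html", config_file="config.json")`, where both file names are hypothetical.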

호선
Categorical

Distinct	2
Distinct (%)	3.5%
Missing	0
Missing (%)	0.0%
Memory size	588.0 B

1	30
2	27

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
1	30	52.6%
2	27	47.4%
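Frequency (%) is the count divided by the number of observations: 30 / 57 ≈ 52.6% and 27 / 57 ≈ 47.4%. Distinct (%) uses the same denominator: 2 / 57 ≈ 3.5%. The sketch below is a minimal version built on `collections.Counter`. The function name is hypothetical.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, count, percent) rows, most common first, like the Common Values table."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, round(100 * c / n, 1)) for v, c in counts.most_common()]


# 호선 column: 30 stations on Line 1 and 27 on Line 2 (57 observations).
line = ["1"] * 30 + ["2"] * 27
for row in value_frequencies(line):
    print(row)
```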

Length: histogram of lengths of the category (plot)

한 글
Text

UNIQUE

Distinct	57
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	588.0 B

Length

Max length	9
Median length	7
Mean length	4.5087719
Min length	2
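The length statistics use the character length of each value, with spaces counted. The mean is total characters divided by observations: 257 / 57 ≈ 4.5087719 for 한 글. The sketch below is a minimal version built on the standard `statistics` module. The function name is hypothetical. The report may derive its median differently, so that figure is not guaranteed to match this calculation.

```python
import statistics


def length_stats(values):
    """Max/median/mean/min of string lengths, counting every character including spaces."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Mean length of 한 글 from the totals above: 257 characters over 57 values.
print(round(257 / 57, 7))
print(length_stats(["계양", "귤현", "부평구청", "국제업무지구"]))
```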

Characters and Unicode

Total characters	257
Distinct characters	100
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
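The category names in these tables are the long forms of Unicode general-category codes. Python exposes these codes through `unicodedata.category`. Script and block are not available from `unicodedata` and need separate Unicode data. The sketch below is a minimal version that counts characters per category. The mapping covers only the categories that appear in this report, and `category_counts` is a hypothetical name.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pf": "Final Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count the characters of all values by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.most_common()}


print(category_counts(["석바위 시장", "Nat'l Univ."]))
```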

Unique

Unique	57
Unique (%)	100.0%

Sample

1st row	계양
2nd row	귤현
3rd row	박촌
4th row	임학
5th row	계산

Most frequent words

Value	Count	Frequency (%)
인천시청	2	3.5%
계양	1	1.8%
국제업무지구	1	1.8%
왕길	1	1.8%
검단사거리	1	1.8%
마전	1	1.8%
완정	1	1.8%
독정	1	1.8%
검암	1	1.8%
검바위	1	1.8%
Other values (46)	46	80.7%

Most occurring characters

Value	Count	Frequency (%)
[space]	54	21.0%
인	8	3.1%
시	8	3.1%
천	7	2.7%
구	6	2.3%
장	6	2.3%
청	5	1.9%
가	5	1.9%
부	5	1.9%
정	5	1.9%
Other values (90)	148	57.6%
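Each "Most occurring characters" table lists the ten most frequent characters with their share of all characters. The remaining characters are folded into an "Other values (k)" row. For 한 글, that row covers k = 100 − 10 = 90 distinct characters and 257 − 109 = 148 occurrences (57.6%). The sketch below is a minimal version of that layout. The function name is hypothetical, and it may order tied characters differently than the report does.

```python
from collections import Counter


def top_characters(values, n=10):
    """Top-n characters with count and % of all characters, plus an 'Other values' row."""
    counts = Counter(ch for v in values for ch in v)
    total = sum(counts.values())
    rows = [(ch, c, round(100 * c / total, 1)) for ch, c in counts.most_common(n)]
    if len(counts) > n:
        # The remaining distinct characters are folded into one summary row.
        other = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({len(counts) - n})", other, round(100 * other / total, 1)))
    return rows


print(top_characters(["인천", "인천시청"], n=2))
```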

Most occurring categories

Value	Count	Frequency (%)
Other Letter	203	79.0%
Space Separator	54	21.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
인	8	3.9%
시	8	3.9%
천	7	3.4%
구	6	3.0%
장	6	3.0%
청	5	2.5%
가	5	2.5%
부	5	2.5%
정	5	2.5%
리	4	2.0%
Other values (89)	144	70.9%

Space Separator

Value	Count	Frequency (%)
[space]	54	100.0%

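The per-category, per-script and per-block tables use the group's own total as the denominator. That is why 인 is 3.1% of all characters (8/257) but 3.9% of the Other Letter category (8/203). Likewise, 仁 under the Han script of 漢字 is 9/182 ≈ 4.9%. The sketch below is a minimal version that groups characters by Unicode general category using the standard `unicodedata` module. The function name is hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def top_per_category(values, n=10):
    """For each Unicode category, top-n characters with % relative to that category's total."""
    by_cat = defaultdict(Counter)
    for v in values:
        for ch in v:
            by_cat[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counts in by_cat.items():
        total = sum(counts.values())
        result[cat] = [(ch, c, round(100 * c / total, 1)) for ch, c in counts.most_common(n)]
    return result


# The same character has a larger share within its category than overall (인 in 한 글).
print(round(100 * 8 / 257, 1), round(100 * 8 / 203, 1))
print(top_per_category(["인천 시청"]))
```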
Most occurring scripts

Value	Count	Frequency (%)
Hangul	203	79.0%
Common	54	21.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
인	8	3.9%
시	8	3.9%
천	7	3.4%
구	6	3.0%
장	6	3.0%
청	5	2.5%
가	5	2.5%
부	5	2.5%
정	5	2.5%
리	4	2.0%
Other values (89)	144	70.9%

Common

Value	Count	Frequency (%)
[space]	54	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	203	79.0%
ASCII	54	21.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
[space]	54	100.0%

Hangul

Value	Count	Frequency (%)
인	8	3.9%
시	8	3.9%
천	7	3.4%
구	6	3.0%
장	6	3.0%
청	5	2.5%
가	5	2.5%
부	5	2.5%
정	5	2.5%
리	4	2.0%
Other values (89)	144	70.9%

漢字
Text

Distinct	51
Distinct (%)	89.5%
Missing	0
Missing (%)	0.0%
Memory size	588.0 B

Length

Max length	15
Median length	14
Mean length	4.122807
Min length	1

Characters and Unicode

Total characters	235
Distinct characters	115
Distinct categories	5
Distinct scripts	4
Distinct blocks	4

Unique

Unique	49
Unique (%)	86.0%
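Distinct and Unique measure different things. Distinct is the number of different values, and Unique is the number of values that occur exactly once. For 漢字, 51 distinct values with 49 unique ones mean that two values occur more than once, together covering 57 − 49 = 8 rows. For 호선, both values repeat, so Distinct is 2 but Unique is 0. The sketch below is a minimal version of both counts. The function name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# 호선: two values, each repeated many times.
print(distinct_and_unique(["1"] * 30 + ["2"] * 27))
```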

Sample

1st row	桂陽
2nd row	橘峴
3rd row	朴村
4th row	林鶴
5th row	桂山

Most frequent words

Value	Count	Frequency (%)
仁川市廳	2	3.5%
知識情報團地	1	1.8%
거북市場	1	1.8%
黔丹四거리	1	1.8%
麻田	1	1.8%
完井	1	1.8%
篤亭	1	1.8%
黔岩	1	1.8%
아시아드競技場	1	1.8%
公村四거리	1	1.8%
Other values (46)	46	80.7%

Most occurring characters

Value	Count	Frequency (%)
[space]	12	5.1%
仁	9	3.8%
市	8	3.4%
場	7	3.0%
川	7	3.0%
(	7	3.0%
)	6	2.6%
리	6	2.6%
거	6	2.6%
廳	5	2.1%
Other values (105)	162	68.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	209	88.9%
Space Separator	12	5.1%
Open Punctuation	7	3.0%
Close Punctuation	6	2.6%
Uppercase Letter	1	0.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
仁	9	4.3%
市	8	3.8%
場	7	3.3%
川	7	3.3%
리	6	2.9%
거	6	2.9%
廳	5	2.4%
黔	4	1.9%
區	4	1.9%
地	4	1.9%
Other values (101)	149	71.3%

Space Separator

Value	Count	Frequency (%)
[space]	12	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	7	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	6	100.0%

Uppercase Letter

Value	Count	Frequency (%)
J	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	182	77.4%
Hangul	27	11.5%
Common	25	10.6%
Latin	1	0.4%

Most frequent character per script

Han

Value	Count	Frequency (%)
仁	9	4.9%
市	8	4.4%
場	7	3.8%
川	7	3.8%
廳	5	2.7%
黔	4	2.2%
區	4	2.2%
地	4	2.2%
平	4	2.2%
富	4	2.2%
Other values (85)	126	69.2%

Hangul

Value	Count	Frequency (%)
리	6	22.2%
거	6	22.2%
아	2	7.4%
바	1	3.7%
드	1	3.7%
시	1	3.7%
모	1	3.7%
래	1	3.7%
위	1	3.7%
석	1	3.7%
Other values (6)	6	22.2%

Common

Value	Count	Frequency (%)
[space]	12	48.0%
(	7	28.0%
)	6	24.0%

Latin

Value	Count	Frequency (%)
J	1	100.0%

Most occurring blocks

Value	Count	Frequency (%)
CJK	181	77.0%
Hangul	27	11.5%
ASCII	26	11.1%
CJK Compat Ideographs	1	0.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
[space]	12	46.2%
(	7	26.9%
)	6	23.1%
J	1	3.8%

CJK

Value	Count	Frequency (%)
仁	9	5.0%
市	8	4.4%
場	7	3.9%
川	7	3.9%
廳	5	2.8%
黔	4	2.2%
區	4	2.2%
地	4	2.2%
平	4	2.2%
富	4	2.2%
Other values (84)	125	69.1%

Hangul

Value	Count	Frequency (%)
리	6	22.2%
거	6	22.2%
아	2	7.4%
바	1	3.7%
드	1	3.7%
시	1	3.7%
모	1	3.7%
래	1	3.7%
위	1	3.7%
석	1	3.7%
Other values (6)	6	22.2%

CJK Compat Ideographs

Value	Count	Frequency (%)
林	1	100.0%
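The block tables assign each character by its code-point range. The sketch below is a minimal version with a handful of block ranges from the Unicode standard. It assumes that the report's short labels (ASCII, CJK, Hangul, CJK Compat Ideographs, Punctuation) correspond to the blocks named here. The single 林 counted under CJK Compat Ideographs is most likely a compatibility code point that renders like the unified character. NFC normalization maps most such code points to their unified equivalents, as the example shows for U+F900. The function name is hypothetical.

```python
import unicodedata

# A few Unicode block ranges (inclusive) relevant to this dataset.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin (ASCII)"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or 'Other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"


compat = chr(0xF900)  # a CJK compatibility ideograph
unified = unicodedata.normalize("NFC", compat)  # its canonical (unified) equivalent
print(block_of(compat), "->", block_of(unified))
```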

로마字
Text

Distinct	56
Distinct (%)	98.2%
Missing	0
Missing (%)	0.0%
Memory size	588.0 B

Length

Max length	51
Median length	29
Mean length	14.491228
Min length	4

Characters and Unicode

Total characters	826
Distinct characters	54
Distinct categories	9
Distinct scripts	2
Distinct blocks	2

Unique

Unique	55
Unique (%)	96.5%

Sample

1st row	Gyeyang
2nd row	Gyulhyeon
3rd row	Bakchon
4th row	Imhak
5th row	Gyesan

Most frequent words

Value	Count	Frequency (%)
incheon	7	6.0%
market	5	4.3%
office	3	2.6%
sageori	3	2.6%
geomdan	3	2.6%
city	3	2.6%
complex	3	2.6%
park	3	2.6%
gajeong	2	1.7%
univ	2	1.7%
Other values (74)	82	70.7%
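The word table for 로마字 lists lowercased words (for example `incheon` and `univ`) and expresses each count as a share of all words. Its rows sum to 34 + 82 = 116 words, so `incheon` at 7 words is 6.0%. The sketch below is minimal and assumes words are lowercased runs of word characters. The actual tokenization in ydata-profiling may differ, and the function name is hypothetical.

```python
import re
from collections import Counter


def word_counts(values, n=10):
    """Lowercased word frequencies over all values, assuming words are runs of \\w characters."""
    words = [w for v in values for w in re.findall(r"\w+", v.lower())]
    total = len(words)
    counts = Counter(words)
    return [(w, c, round(100 * c / total, 1)) for w, c in counts.most_common(n)]


print(word_counts(["Incheon City Hall", "Incheon Grand Park", "Seokbawi Market"], n=3))
```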

Most occurring characters

Value	Count	Frequency (%)
n	83	10.0%
e	73	8.8%
a	67	8.1%
o	65	7.9%
[space]	59	7.1%
i	33	4.0%
u	32	3.9%
g	30	3.6%
r	29	3.5%
t	29	3.5%
Other values (44)	326	39.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	626	75.8%
Uppercase Letter	116	14.0%
Space Separator	59	7.1%
Close Punctuation	7	0.8%
Open Punctuation	7	0.8%
Other Punctuation	5	0.6%
Dash Punctuation	4	0.5%
Decimal Number	1	0.1%
Final Punctuation	1	0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	83	13.3%
e	73	11.7%
a	67	10.7%
o	65	10.4%
i	33	5.3%
u	32	5.1%
g	30	4.8%
r	29	4.6%
t	29	4.6%
l	23	3.7%
Other values (15)	162	25.9%

Uppercase Letter

Value	Count	Frequency (%)
G	18	15.5%
C	15	12.9%
S	12	10.3%
I	12	10.3%
M	9	7.8%
B	8	6.9%
J	5	4.3%
D	5	4.3%
W	5	4.3%
O	4	3.4%
Other values (10)	23	19.8%

Other Punctuation

Value	Count	Frequency (%)
'	3	60.0%
&	1	20.0%
.	1	20.0%

Space Separator

Value	Count	Frequency (%)
[space]	59	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	7	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	7	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Decimal Number

Value	Count	Frequency (%)
1	1	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	742	89.8%
Common	84	10.2%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	83	11.2%
e	73	9.8%
a	67	9.0%
o	65	8.8%
i	33	4.4%
u	32	4.3%
g	30	4.0%
r	29	3.9%
t	29	3.9%
l	23	3.1%
Other values (35)	278	37.5%

Common

Value	Count	Frequency (%)
[space]	59	70.2%
)	7	8.3%
(	7	8.3%
-	4	4.8%
'	3	3.6%
&	1	1.2%
.	1	1.2%
1	1	1.2%
’	1	1.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	825	99.9%
Punctuation	1	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	83	10.1%
e	73	8.8%
a	67	8.1%
o	65	7.9%
[space]	59	7.2%
i	33	4.0%
u	32	3.9%
g	30	3.6%
r	29	3.5%
t	29	3.5%
Other values (43)	325	39.4%

Punctuation

Value	Count	Frequency (%)
’	1	100.0%

Correlations

Phik (φk)

	호선	한 글	漢字	로마字
호선	1.000	1.000	0.000	0.000
한 글	1.000	1.000	1.000	1.000
漢字	0.000	1.000	1.000	1.000
로마字	0.000	1.000	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

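The report counts zero missing cells, yet the 漢字 cell in row 6 of the sample looks empty. This is consistent with that cell holding whitespace rather than a true null. The 漢字 minimum length of 1 hints at a one-character value, possibly a space, but this is an assumption. The sketch below is a minimal version that counts true nulls (`None`) separately from blank-but-present strings. The function name is hypothetical.

```python
def nullity_counts(rows, columns):
    """Per-column count of missing (None) cells and of blank-but-present strings."""
    result = {}
    for i, col in enumerate(columns):
        cells = [row[i] for row in rows]
        missing = sum(c is None for c in cells)
        blank = sum(c is not None and c.strip() == "" for c in cells)
        result[col] = {"missing": missing, "blank": blank}
    return result


columns = ["호선", "한 글", "漢字", "로마字"]
rows = [
    ("1", "경인교대", "京仁敎大入口", "Gyeong-in Nat'l"),
    ("1", "입구", " ", "Univ. of Education"),  # assumed: the empty-looking 漢字 cell is a space
]
print(nullity_counts(rows, columns))
```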
Sample

First rows (0 to 9) and last rows (47 to 56):

	호선	한 글	漢字	로마字
0	1	계양	桂陽	Gyeyang
1	1	귤현	橘峴	Gyulhyeon
2	1	박촌	朴村	Bakchon
3	1	임학	林鶴	Imhak
4	1	계산	桂山	Gyesan
5	1	경인교대	京仁敎大入口	Gyeong-in Nat'l
6	1	입구		Univ. of Education
7	1	작전	鵲田	Jakjeon
8	1	갈산	葛山	Galsan
9	1	부평구청	富平區廳	Bupyeong-gu Office

	호선	한 글	漢字	로마字
47	2	주안	朱安	Juan
48	2	시민공원	市民公園 (文化創作地帶)	Citizens Park (Culture Creation Zone)
49	2	석바위시장	석바위市場	Seokbawi Market
50	2	인천시청	仁川市廳	Incheon City Hall
51	2	석천사거리	石泉四거리	Seokcheon Sageori
52	2	모래내시장	모래내市場	Moraenae Market
53	2	만수	萬壽	Mansu
54	2	남동구청	南洞區廳	Namdong-gu Office
55	2	인천대공원	仁川大公園	Incheon Grand Park
56	2	운연	云宴 (西昌)	Unyeon (Seochang)
