gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	1395
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	43.7 KiB
Average record size in memory	32.1 B
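
As a quick sanity check, the average record size reported above is roughly the total in-memory size divided by the number of observations. The snippet below is a minimal sketch of that arithmetic, assuming 1 KiB = 1024 B; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 43.7
n_observations = 1395

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 32.1 B shown in the table.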

Variable types

Categorical	1
Text	3

Dataset

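Description	Provides the Korean and English names of major overseas universities, as defined by the Korea Foundation (한국국제교류재단) in its <Guidelines for Standardizing the Korean Names of Overseas Universities> (해외대학 국문명칭 표준화 지침).
Author	Korea Foundation (한국국제교류재단)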
URL	https://www.data.go.kr/data/15060038/fileData.do

Reproduction

Analysis started	2024-03-14 15:28:59.637388
Analysis finished	2024-03-14 15:29:00.934922
Duration	1.3 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
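
A report like this one can probably be regenerated along the following lines. This is only a sketch under stated assumptions: the CSV file name and its encoding are hypothetical (the report does not state them), and the report's own `config.json` settings are not reproduced here. `ProfileReport` and `to_file` are the ydata-profiling entry points used.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so this module loads even if the optional
    # packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="gimi9 Pandas Profiling")
    report.to_file(output_path)
    return report

# Example call; the file name and encoding are assumptions:
# build_report("overseas_universities.csv", encoding="cp949")
```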

지역 (Region)
Categorical

Distinct	12
Distinct (%)	0.9%
Missing	0
Missing (%)	0.0%
Memory size	11.0 KiB

동북아	530
북미	184
서유럽	161
동남아	127
중유럽	91
Other values (7)	302

Length

Max length	5
Median length	3
Mean length	2.9491039
Min length	2

Unique

Unique	0
Unique (%)	0.0%
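
The difference between "Distinct" and "Unique" is easy to miss. As far as can be told from the figures, "Distinct" counts how many different values occur, while "Unique" counts values that occur exactly once. The sketch below illustrates that reading on a small made-up sample; the function name is hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Made-up sample, not taken from the dataset.
sample = ["남미", "남미", "북미", "중동"]
print(distinct_and_unique(sample))  # (3, 2)
```

Under this reading, a column with 12 distinct values and 0 unique values means every region appears at least twice.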

Sample

1st row	남미
2nd row	남미
3rd row	남미
4th row	남미
5th row	남미

Common Values

Value	Count	Frequency (%)
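동북아 (Northeast Asia)	530	38.0%
북미 (North America)	184	13.2%
서유럽 (Western Europe)	161	11.5%
동남아 (Southeast Asia)	127	9.1%
중유럽 (Central Europe)	91	6.5%
유라시아 (Eurasia)	84	6.0%
중미카리브 (Central America and the Caribbean)	50	3.6%
중동 (Middle East)	42	3.0%
남미 (South America)	40	2.9%
서남아 (Southwest Asia)	40	2.9%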
Other values (2)	46	3.3%
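
The Frequency (%) column is each count divided by the total number of observations. A minimal sketch of that calculation, using the counts from the table above (the function name is illustrative):

```python
def frequency_table(counts):
    """Map each value to its share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}

# Counts copied from the "Common Values" table.
region_counts = {
    "동북아": 530, "북미": 184, "서유럽": 161, "동남아": 127, "중유럽": 91,
    "유라시아": 84, "중미카리브": 50, "중동": 42, "남미": 40, "서남아": 40,
    "Other values (2)": 46,
}

for value, pct in frequency_table(region_counts).items():
    print(f"{value}\t{pct}%")
```

The counts sum to 1395, the number of observations, so the percentages are shares of the whole dataset.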

Length

Histogram of lengths of the category


국가 (Country)
Text

Distinct	109
Distinct (%)	7.8%
Missing	0
Missing (%)	0.0%
Memory size	11.0 KiB

Length

Max length	10
Median length	2
Mean length	2.7146953
Min length	2

Characters and Unicode

Total characters	3787
Distinct characters	133
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
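
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one institution name from the dataset; the mapping from two-letter codes to the long names used in this report covers only the categories that appear here, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Zs": "Space Separator", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "Nd": "Decimal Number",
}

def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

for code, n in category_counts("산시몬대학교(UMSS)").items():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter" because Hangul has no upper or lower case.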

Unique

Unique	31
Unique (%)	2.2%

Sample

1st row	볼리비아
2nd row	볼리비아
3rd row	볼리비아
4th row	볼리비아
5th row	볼리비아

Value	Count	Frequency (%)
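일본 (Japan)	347	24.9%
미국 (United States)	156	11.2%
중국 (China)	125	9.0%
영국 (United Kingdom)	55	3.9%
대만 (Taiwan)	43	3.1%
러시아 (Russia)	42	3.0%
베트남 (Vietnam)	34	2.4%
태국 (Thailand)	33	2.4%
독일 (Germany)	29	2.1%
인도 (India)	29	2.1%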
Other values (100)	503	36.0%

Most occurring characters

Value	Count	Frequency (%)
일	381	10.1%
국	372	9.8%
본	347	9.2%
아	198	5.2%
미	167	4.4%
스	144	3.8%
중	125	3.3%
시	100	2.6%
이	69	1.8%
르	69	1.8%
Other values (123)	1815	47.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	3786	> 99.9%
Space Separator	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
일	381	10.1%
국	372	9.8%
본	347	9.2%
아	198	5.2%
미	167	4.4%
스	144	3.8%
중	125	3.3%
시	100	2.6%
이	69	1.8%
르	69	1.8%
Other values (122)	1814	47.9%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	3786	> 99.9%
Common	1	< 0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
일	381	10.1%
국	372	9.8%
본	347	9.2%
아	198	5.2%
미	167	4.4%
스	144	3.8%
중	125	3.3%
시	100	2.6%
이	69	1.8%
르	69	1.8%
Other values (122)	1814	47.9%

Common

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	3786	> 99.9%
ASCII	1	< 0.1%
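
A block is a contiguous range of code points, so a character's block can be found from its code point alone. The sketch below handles only the two blocks seen in this report. It assumes that the report's "Hangul" label corresponds to the Hangul Syllables range (U+AC00 to U+D7AF) and "ASCII" to code points below U+0080; the function name is illustrative.

```python
from collections import Counter

def block_of(ch):
    """Classify a character into the blocks that appear in this report."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables block
        return "Hangul"
    return "Other"

print(Counter(block_of(ch) for ch in "산시몬대학교(UMSS)"))
```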

Most frequent character per block

Hangul

Value	Count	Frequency (%)
일	381	10.1%
국	372	9.8%
본	347	9.2%
아	198	5.2%
미	167	4.4%
스	144	3.8%
중	125	3.3%
시	100	2.6%
이	69	1.8%
르	69	1.8%
Other values (122)	1814	47.9%

ASCII

Value	Count	Frequency (%)
(space)	1	100.0%

기관명(한글) (Institution name, Korean)
Text

Distinct	1393
Distinct (%)	99.9%
Missing	0
Missing (%)	0.0%
Memory size	11.0 KiB

Length

Max length	38
Median length	23
Mean length	8.6193548
Min length	3

Characters and Unicode

Total characters	12024
Distinct characters	573
Distinct categories	9
Distinct scripts	3
Distinct blocks	2


Unique

Unique	1391
Unique (%)	99.7%

Sample

1st row	가브리엘레네모레노자치대학교
2nd row	산시몬대학교(UMSS)
3rd row	산안드레스대학교
4th row	세인트프란시스하비에르대학교
5th row	후안미사엘사라초자치대학교

Value	Count	Frequency (%)
캘리포니아대학교	8	0.5%
인도공과대학교	7	0.5%
대학교	5	0.3%
펜실베니아	4	0.3%
텍사스대학교	4	0.3%
뉴욕주립대학교	4	0.3%
자이드대학교	3	0.2%
송클라대학교	3	0.2%
쓰촨외국어대학교	3	0.2%
아메리칸대학교	3	0.2%
Other values (1430)	1441	97.0%

Most occurring characters

Value	Count	Frequency (%)
학	1429	11.9%
교	1343	11.2%
대	1316	10.9%
스	276	2.3%
이	239	2.0%
리	237	2.0%
국	191	1.6%
립	177	1.5%
아	161	1.3%
시	133	1.1%
Other values (563)	6522	54.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	11573	96.2%
Uppercase Letter	217	1.8%
Space Separator	90	0.7%
Open Punctuation	50	0.4%
Close Punctuation	50	0.4%
Lowercase Letter	24	0.2%
Dash Punctuation	13	0.1%
Decimal Number	4	< 0.1%
Other Punctuation	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
학	1429	12.3%
교	1343	11.6%
대	1316	11.4%
스	276	2.4%
이	239	2.1%
리	237	2.0%
국	191	1.7%
립	177	1.5%
아	161	1.4%
시	133	1.1%
Other values (522)	6071	52.5%

Uppercase Letter

Value	Count	Frequency (%)
U	45	20.7%
S	21	9.7%
N	20	9.2%
I	18	8.3%
A	16	7.4%
C	16	7.4%
L	11	5.1%
M	10	4.6%
T	10	4.6%
E	7	3.2%
Other values (11)	43	19.8%

Lowercase Letter

Value	Count	Frequency (%)
s	4	16.7%
n	4	16.7%
i	3	12.5%
e	3	12.5%
t	2	8.3%
u	2	8.3%
a	2	8.3%
g	1	4.2%
l	1	4.2%
r	1	4.2%

Decimal Number

Value	Count	Frequency (%)
8	1	25.0%
3	1	25.0%
5	1	25.0%
2	1	25.0%

Space Separator

Value	Count	Frequency (%)
(space)	90	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	50	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	50	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	13	100.0%

Other Punctuation

Value	Count	Frequency (%)
.	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	11573	96.2%
Latin	241	2.0%
Common	210	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
학	1429	12.3%
교	1343	11.6%
대	1316	11.4%
스	276	2.4%
이	239	2.1%
리	237	2.0%
국	191	1.7%
립	177	1.5%
아	161	1.4%
시	133	1.1%
Other values (522)	6071	52.5%

Latin

Value	Count	Frequency (%)
U	45	18.7%
S	21	8.7%
N	20	8.3%
I	18	7.5%
A	16	6.6%
C	16	6.6%
L	11	4.6%
M	10	4.1%
T	10	4.1%
E	7	2.9%
Other values (22)	67	27.8%

Common

Value	Count	Frequency (%)
(space)	90	42.9%
(	50	23.8%
)	50	23.8%
-	13	6.2%
.	3	1.4%
8	1	0.5%
3	1	0.5%
5	1	0.5%
2	1	0.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	11573	96.2%
ASCII	451	3.8%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
학	1429	12.3%
교	1343	11.6%
대	1316	11.4%
스	276	2.4%
이	239	2.1%
리	237	2.0%
국	191	1.7%
립	177	1.5%
아	161	1.4%
시	133	1.1%
Other values (522)	6071	52.5%

ASCII

Value	Count	Frequency (%)
(space)	90	20.0%
(	50	11.1%
)	50	11.1%
U	45	10.0%
S	21	4.7%
N	20	4.4%
I	18	4.0%
A	16	3.5%
C	16	3.5%
-	13	2.9%
Other values (31)	112	24.8%

기관명(영문) (Institution name, English)
Text

Distinct	1393
Distinct (%)	99.9%
Missing	0
Missing (%)	0.0%
Memory size	11.0 KiB

Length

Max length	105
Median length	59
Mean length	27.395699
Min length	4

Characters and Unicode

Total characters	38217
Distinct characters	62
Distinct categories	8
Distinct scripts	2
Distinct blocks	1


Unique

Unique	1391
Unique (%)	99.7%

Sample

1st row	Gabriel Rene Moreno Autonomous University
2nd row	University of San Simon (UMSS)
3rd row	Higher University of San Andres
4th row	The Royal and Pontifical Major University of Saint Francis Xavier of Chuquisaca
5th row	Juan Misael Saracho Autonomous University

Value	Count	Frequency (%)
university	1248	25.4%
of	572	11.6%
college	81	1.6%
state	70	1.4%
technology	70	1.4%
national	66	1.3%
and	56	1.1%
institute	48	1.0%
international	47	1.0%
normal	25	0.5%
Other values (1544)	2638	53.6%
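
The table above counts words rather than whole values. A word table of this kind can be approximated by lowercasing each name, splitting on whitespace, and stripping surrounding punctuation. The exact tokenization used by the profiler is an assumption here, and the function name is hypothetical. The input is the five sample rows shown earlier.

```python
import string
from collections import Counter

def word_counts(names):
    """Count lowercase words across names, ignoring surrounding punctuation."""
    counts = Counter()
    for name in names:
        for token in name.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts

names = [
    "Gabriel Rene Moreno Autonomous University",
    "University of San Simon (UMSS)",
    "Higher University of San Andres",
    "The Royal and Pontifical Major University of Saint Francis Xavier of Chuquisaca",
    "Juan Misael Saracho Autonomous University",
]
print(word_counts(names).most_common(2))  # [('university', 5), ('of', 4)]
```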

Most occurring characters

Value	Count	Frequency (%)
i	4117	10.8%
(space)	3527	9.2%
n	3030	7.9%
e	2874	7.5%
t	2407	6.3%
a	2342	6.1%
r	2126	5.6%
o	2108	5.5%
s	2028	5.3%
y	1604	4.2%
Other values (52)	12054	31.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	29802	78.0%
Uppercase Letter	4586	12.0%
Space Separator	3527	9.2%
Other Punctuation	92	0.2%
Open Punctuation	84	0.2%
Close Punctuation	84	0.2%
Dash Punctuation	41	0.1%
Decimal Number	1	< 0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
i	4117	13.8%
n	3030	10.2%
e	2874	9.6%
t	2407	8.1%
a	2342	7.9%
r	2126	7.1%
o	2108	7.1%
s	2028	6.8%
y	1604	5.4%
v	1335	4.5%
Other values (16)	5831	19.6%

Uppercase Letter

Value	Count	Frequency (%)
U	1356	29.6%
S	401	8.7%
C	335	7.3%
T	270	5.9%
N	229	5.0%
I	202	4.4%
A	193	4.2%
M	187	4.1%
P	144	3.1%
K	136	3.0%
Other values (16)	1133	24.7%

Other Punctuation

Value	Count	Frequency (%)
'	34	37.0%
,	26	28.3%
.	21	22.8%
&	10	10.9%
?	1	1.1%

Space Separator

Value	Count	Frequency (%)
(space)	3527	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	84	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	84	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	41	100.0%

Decimal Number

Value	Count	Frequency (%)
3	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	34388	90.0%
Common	3829	10.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
i	4117	12.0%
n	3030	8.8%
e	2874	8.4%
t	2407	7.0%
a	2342	6.8%
r	2126	6.2%
o	2108	6.1%
s	2028	5.9%
y	1604	4.7%
U	1356	3.9%
Other values (42)	10396	30.2%

Common

Value	Count	Frequency (%)
(space)	3527	92.1%
(	84	2.2%
)	84	2.2%
-	41	1.1%
'	34	0.9%
,	26	0.7%
.	21	0.5%
&	10	0.3%
3	1	< 0.1%
?	1	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	38217	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
i	4117	10.8%
(space)	3527	9.2%
n	3030	7.9%
e	2874	7.5%
t	2407	6.3%
a	2342	6.1%
r	2126	5.6%
o	2108	5.5%
s	2028	5.3%
y	1604	4.2%
Other values (52)	12054	31.5%

Missing values

Count	A simple visualization of nullity by column.
Matrix	Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
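
Both nullity views reduce to checking, per column, which cells are missing. A minimal sketch of the per-column count follows, treating `None` and the empty string as missing (an assumption; the profiler's own definition may differ). The function name is illustrative, and the two rows are copied from the sample of this dataset.

```python
def missing_per_column(rows, columns):
    """Count cells per column that are None or an empty string."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

columns = ["지역", "국가", "기관명(한글)", "기관명(영문)"]
rows = [
    {"지역": "남미", "국가": "볼리비아", "기관명(한글)": "산안드레스대학교",
     "기관명(영문)": "Higher University of San Andres"},
    {"지역": "남미", "국가": "브라질", "기관명(한글)": "브라질리아대학교",
     "기관명(영문)": "University of Brasilia"},
]
print(missing_per_column(rows, columns))
```

For this dataset the report lists 0 missing cells, so every column count would be zero.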

Sample

First rows

	지역	국가	기관명(한글)	기관명(영문)
0	남미	볼리비아	가브리엘레네모레노자치대학교	Gabriel Rene Moreno Autonomous University
1	남미	볼리비아	산시몬대학교(UMSS)	University of San Simon (UMSS)
2	남미	볼리비아	산안드레스대학교	Higher University of San Andres
3	남미	볼리비아	세인트프란시스하비에르대학교	The Royal and Pontifical Major University of Saint Francis Xavier of Chuquisaca
4	남미	볼리비아	후안미사엘사라초자치대학교	Juan Misael Saracho Autonomous University
5	남미	브라질	리우데자네이루연방대학교	Federal University of Rio de Janeiro
6	남미	브라질	미나스제라이스연방대학교	Federal University of Minas Gerais
7	남미	브라질	발리두히우두스시누스대학교(UNISINOS)	University of Vale do Rio dos Sinos (UNISINOS)
8	남미	브라질	브라질리아대학교	University of Brasilia
9	남미	브라질	상파울루대학교(USP)	University of Sao Paulo (USP)

Last rows

	지역	국가	기관명(한글)	기관명(영문)
1385	중유럽	튀르키예	이스탄불대학교	Istanbul University
1386	중유럽	튀르키예	이즈미르경제대학교	Izmir University of Economics
1387	중유럽	튀르키예	중동공과대학교	Middle East Technical University
1388	중유럽	튀르키예	하제테페대학교	Hacettepe University
1389	중유럽	폴란드	바르샤바대학교	University of Warsaw
1390	중유럽	폴란드	브로츠와프대학교	University of Wroclaw
1391	중유럽	폴란드	아담미츠키에비츠대학교	Adam Mickiewicz University
1392	중유럽	폴란드	야기엘론스키대학교	Jagiellonian University
1393	중유럽	헝가리	데브레첸서머대학교	Debrecen Summer University
1394	중유럽	헝가리	중앙유럽대학교(CEU)	Central European University(CEU)
