gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	238
Missing cells	1
Missing cells (%)	0.2%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	3.8 KiB
Average record size in memory	16.6 B
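
The missing-cell percentage above can be checked from the other figures in this table. The sketch below is a minimal illustration, not the profiler's own code; the helper name `missing_pct` is hypothetical. It assumes that "cells" means variables times observations.

```python
def missing_pct(missing: int, total: int) -> float:
    """Return the share of missing entries as a percentage."""
    return 100.0 * missing / total

n_variables = 2       # "Number of variables"
n_observations = 238  # "Number of observations"
n_cells = n_variables * n_observations  # 476 cells in total

# Dataset level: 1 missing cell out of 476.
print(f"{missing_pct(1, n_cells):.1f}%")         # 0.2%
# Column level (CODE): 1 missing value out of 238 rows.
print(f"{missing_pct(1, n_observations):.1f}%")  # 0.4%
```

The same single missing value therefore shows up as 0.2% at the dataset level and 0.4% at the column level.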

Variable types

Text	2

Dataset

Description	Country code master data published by the National Cancer Center through its website, current through September 2019
Author	National Cancer Center (국립암센터)
URL	https://www.data.go.kr/data/15049630/fileData.do

Alerts

NAME has unique values (Unique)

Reproduction

Analysis started	2023-12-11 22:46:23.020934
Analysis finished	2023-12-11 22:46:23.283601
Duration	0.26 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

CODE
Text

Distinct	237
Distinct (%)	100.0%
Missing	1
Missing (%)	0.4%
Memory size	2.0 KiB

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Characters and Unicode

Total characters	474
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	237
Unique (%)	100.0%

Sample

1st row	GH
2nd row	GA
3rd row	GY
4th row	GM
5th row	GP

Value	Count	Frequency (%)
gh	1	0.4%
vg	1	0.4%
ky	1	0.4%
ye	1	0.4%
om	1	0.4%
at	1	0.4%
hn	1	0.4%
wf	1	0.4%
jo	1	0.4%
ug	1	0.4%
Other values (227)	227	95.8%

Most occurring characters

Value	Count	Frequency (%)
M	36	7.6%
G	30	6.3%
S	29	6.1%
T	28	5.9%
A	27	5.7%
C	25	5.3%
N	23	4.9%
B	23	4.9%
E	20	4.2%
K	20	4.2%
Other values (16)	213	44.9%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	474	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
M	36	7.6%
G	30	6.3%
S	29	6.1%
T	28	5.9%
A	27	5.7%
C	25	5.3%
N	23	4.9%
B	23	4.9%
E	20	4.2%
K	20	4.2%
Other values (16)	213	44.9%

Most occurring scripts

Value	Count	Frequency (%)
Latin	474	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
M	36	7.6%
G	30	6.3%
S	29	6.1%
T	28	5.9%
A	27	5.7%
C	25	5.3%
N	23	4.9%
B	23	4.9%
E	20	4.2%
K	20	4.2%
Other values (16)	213	44.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	474	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
M	36	7.6%
G	30	6.3%
S	29	6.1%
T	28	5.9%
A	27	5.7%
C	25	5.3%
N	23	4.9%
B	23	4.9%
E	20	4.2%
K	20	4.2%
Other values (16)	213	44.9%

NAME
Text

UNIQUE

Distinct	238
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	15
Median length	14
Mean length	4.3655462
Min length	1
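
The length statistics can be cross-checked against the character totals reported for each variable. The sketch below is only an arithmetic check under stated assumptions, not the profiler's implementation; both helper names are hypothetical. It assumes the mean is total characters divided by the number of non-missing values. It also computes a loose lower bound on the total character count that a given median would imply.

```python
def mean_length(total_characters: int, n_values: int) -> float:
    """Mean string length, assuming mean = total characters / value count."""
    return total_characters / n_values

def min_total_for_median(median: int, n_values: int, min_length: int) -> int:
    """Loose lower bound on total characters implied by a median.

    At least n_values // 2 values must be >= the median; every other
    value is at least min_length.
    """
    upper_half = n_values // 2
    return upper_half * median + (n_values - upper_half) * min_length

print(mean_length(474, 237))             # CODE: 2.0
print(round(mean_length(1039, 238), 7))  # NAME: 4.3655462
print(min_total_for_median(14, 238, 1))  # 1785
```

Under these assumptions, the reported means match the totals. A median length of 14 for NAME would imply at least 1785 characters in total, which exceeds the reported 1039, so that median figure should be treated with caution.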

Characters and Unicode

Total characters	1039
Distinct characters	213
Distinct categories	4
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
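
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories. It is not the profiler's own code. The `CATEGORY_NAMES` mapping and the helper `category_counts` are hypothetical names introduced here. Script and block lookups are left out because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# The five sample NAME values shown in this report.
sample = ["가나", "가봉", "가이아나", "감비아", "과델로프"]
print(dict(category_counts(sample)))  # {'Other Letter': 15}

# The non-letter characters listed in the NAME tables.
print(dict(category_counts([" ", "&", ",", "-"])))
```

Hangul syllables fall under "Other Letter" because they have no upper or lower case, which is why NAME is dominated by that category while CODE is entirely "Uppercase Letter".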

Unique

Unique	238
Unique (%)	100.0%

Sample

1st row	가나
2nd row	가봉
3rd row	가이아나
4th row	감비아
5th row	과델로프

Value	Count	Frequency (%)
군도	11	3.8%
세인트	5	1.7%
불령	4	1.4%
아일랜드	4	1.4%
	3	1.0%
영령	3	1.0%
버진군도	2	0.7%
사모아	2	0.7%
네덜란드	2	0.7%
도미니카	2	0.7%
Other values (251)	253	86.9%

Most occurring characters

Value	Count	Frequency (%)
아	66	6.4%
(space)	53	5.1%
리	37	3.6%
스	33	3.2%
이	30	2.9%
도	30	2.9%
라	25	2.4%
니	24	2.3%
르	22	2.1%
나	20	1.9%
Other values (203)	699	67.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	981	94.4%
Space Separator	53	5.1%
Other Punctuation	4	0.4%
Dash Punctuation	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	66	6.7%
리	37	3.8%
스	33	3.4%
이	30	3.1%
도	30	3.1%
라	25	2.5%
니	24	2.4%
르	22	2.2%
나	20	2.0%
카	20	2.0%
Other values (199)	674	68.7%

Other Punctuation

Value	Count	Frequency (%)
&	3	75.0%
,	1	25.0%

Space Separator

Value	Count	Frequency (%)
(space)	53	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	981	94.4%
Common	58	5.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	66	6.7%
리	37	3.8%
스	33	3.4%
이	30	3.1%
도	30	3.1%
라	25	2.5%
니	24	2.4%
르	22	2.2%
나	20	2.0%
카	20	2.0%
Other values (199)	674	68.7%

Common

Value	Count	Frequency (%)
(space)	53	91.4%
&	3	5.2%
,	1	1.7%
-	1	1.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	981	94.4%
ASCII	58	5.6%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	66	6.7%
리	37	3.8%
스	33	3.4%
이	30	3.1%
도	30	3.1%
라	25	2.5%
니	24	2.4%
르	22	2.2%
나	20	2.0%
카	20	2.0%
Other values (199)	674	68.7%

ASCII

Value	Count	Frequency (%)
(space)	53	91.4%
&	3	5.2%
,	1	1.7%
-	1	1.7%

Missing values

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
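
The two views can be sketched in plain Python. This is a minimal illustration with hypothetical helper names, not the plotting code the report uses. The first two rows come from the sample in this report. The third row is invented purely to show a missing CODE, since the report does not say which row lacks one.

```python
def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where the cell is missing."""
    return [[row.get(col) is None for col in columns] for row in rows]

def missing_per_column(rows, columns):
    """Count missing cells per column (the 'Count' view)."""
    matrix = nullity_matrix(rows, columns)
    return {col: sum(flags[i] for flags in matrix) for i, col in enumerate(columns)}

columns = ["CODE", "NAME"]
rows = [
    {"CODE": "GH", "NAME": "가나"},
    {"CODE": "GA", "NAME": "가봉"},
    {"CODE": None, "NAME": "(hypothetical name)"},  # invented row, for illustration only
]

# Text version of the 'Matrix' view: '#' = present, '.' = missing.
for flags in nullity_matrix(rows, columns):
    print("".join("." if missing else "#" for missing in flags))

print(missing_per_column(rows, columns))
```

With only one missing cell in 476, the real matrix for this dataset would show a single gap in the CODE column.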

Sample

First rows

	CODE	NAME
0	GH	가나
1	GA	가봉
2	GY	가이아나
3	GM	감비아
4	GP	과델로프
5	GT	과테말라
6	GU	괌
7	VA	교황청
8	GD	그레나다
9	GE	그루지아

Last rows

	CODE	NAME
228	PR	푸에르토리코
229	FR	프랑스
230	FJ	피지
231	PN	피트카이른
232	FI	핀란드
233	PH	필리핀
234	KR	한국
235	HU	헝가리
236	AU	호주
237	HK	홍콩
