gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	237
Missing cells	23
Missing cells (%)	1.9%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	9.6 KiB
Average record size in memory	41.6 B

Variable types

Text	4
Numeric	1

Dataset

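Description	The Ministry of Foreign Affairs provides, as a CSV file, the ISO 3166-1 standard codes (two-character and three-character) it holds for each country and region, together with the Korean and English names.
Author	Ministry of Foreign Affairs (외교부)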
URL	https://www.data.go.kr/data/15076566/fileData.do

Alerts

`국가명(영문)` has 7 (3.0%) missing values	Missing
`ISO(3자리)` has 7 (3.0%) missing values	Missing
`ISO(숫자)` has 8 (3.4%) missing values	Missing
`국가명(국문)` has unique values	Unique
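
The alert percentages can be cross-checked against the counts reported elsewhere in this report. The sketch below is only an arithmetic check, not a description of how ydata-profiling computes its alerts. The variable names are illustrative, and the per-column missing counts are copied from each variable's section.

```python
# Row count and per-column missing counts as reported in this profile.
n_rows = 237
missing = {
    "국가명(영문)": 7,
    "국가명(국문)": 0,
    "ISO(2자리)": 1,
    "ISO(3자리)": 7,
    "ISO(숫자)": 8,
}

# Share of missing values per column, rounded to one decimal like the report.
per_column_pct = {col: round(100 * n / n_rows, 1) for col, n in missing.items()}

# Dataset-level figures: total missing cells over rows x columns.
total_missing = sum(missing.values())
total_pct = round(100 * total_missing / (n_rows * len(missing)), 1)

print(per_column_pct)
print(total_missing, total_pct)
```

The totals agree with the "Missing cells" rows under Dataset statistics.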

Reproduction

Analysis started	2023-12-12 22:39:39.496252
Analysis finished	2023-12-12 22:39:40.000550
Duration	0.5 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

국가명(영문)
Text

MISSING

Distinct	230
Distinct (%)	100.0%
Missing	7
Missing (%)	3.0%
Memory size	2.0 KiB

Length

Max length	31
Median length	27
Mean length	9.2565217
Min length	4

Characters and Unicode

Total characters	2129
Distinct characters	57
Distinct categories	5
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
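
As a rough illustration of what "character properties" means here, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values shown for this variable. It is not the profiler's own implementation. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are illustrative, and the mapping covers only the categories this column reports.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this column.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["Ghana", "Gabon", "Guyana", "Gambia", "Bailiwick of Guernsey"]
counts = category_counts(sample)
print(counts.most_common())
```

Run over the full column rather than five rows, the same kind of tally would correspond to the "Most occurring categories" table.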

Unique

Unique	230
Unique (%)	100.0%

Sample

1st row	Ghana
2nd row	Gabon
3rd row	Guyana
4th row	Gambia
5th row	Bailiwick of Guernsey

Words

Value	Count	Frequency (%)
islands	8	2.5%
of	6	1.9%
st	5	1.6%
republic	4	1.3%
and	4	1.3%
	4	1.3%
british	3	1.0%
new	3	1.0%
united	3	1.0%
guinea	3	1.0%
Other values (261)	271	86.3%

Most occurring characters

Value	Count	Frequency (%)
a	299	14.0%
i	185	8.7%
n	163	7.7%
e	149	7.0%
r	126	5.9%
o	111	5.2%
t	87	4.1%
	84	3.9%
u	80	3.8%
s	80	3.8%
Other values (47)	765	35.9%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	1719	80.7%
Uppercase Letter	308	14.5%
Space Separator	84	3.9%
Other Punctuation	14	0.7%
Dash Punctuation	4	0.2%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
a	299	17.4%
i	185	10.8%
n	163	9.5%
e	149	8.7%
r	126	7.3%
o	111	6.5%
t	87	5.1%
u	80	4.7%
s	80	4.7%
l	79	4.6%
Other values (16)	360	20.9%

Uppercase Letter

Value	Count	Frequency (%)
S	32	10.4%
M	27	8.8%
B	25	8.1%
C	22	7.1%
A	22	7.1%
G	21	6.8%
I	19	6.2%
T	17	5.5%
N	17	5.5%
P	16	5.2%
Other values (15)	90	29.2%

Other Punctuation

Value	Count	Frequency (%)
.	8	57.1%
&	3	21.4%
:	2	14.3%
'	1	7.1%

Space Separator

Value	Count	Frequency (%)
	84	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	2027	95.2%
Common	102	4.8%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	299	14.8%
i	185	9.1%
n	163	8.0%
e	149	7.4%
r	126	6.2%
o	111	5.5%
t	87	4.3%
u	80	3.9%
s	80	3.9%
l	79	3.9%
Other values (41)	668	33.0%

Common

Value	Count	Frequency (%)
	84	82.4%
.	8	7.8%
-	4	3.9%
&	3	2.9%
:	2	2.0%
'	1	1.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2129	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
a	299	14.0%
i	185	8.7%
n	163	7.7%
e	149	7.0%
r	126	5.9%
o	111	5.2%
t	87	4.1%
	84	3.9%
u	80	3.8%
s	80	3.8%
Other values (47)	765	35.9%

국가명(국문)
Text

UNIQUE

Distinct	237
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	18
Median length	13
Mean length	4.2700422
Min length	1

Characters and Unicode

Total characters	1012
Distinct characters	229
Distinct categories	6
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	237
Unique (%)	100.0%

Sample

1st row	가나
2nd row	가봉
3rd row	가이아나
4th row	감비아
5th row	건지

Words

Value	Count	Frequency (%)
세인트	4	1.5%
영국령	3	1.2%
섬	3	1.2%
국가	2	0.8%
제도	2	0.8%
아일랜드	2	0.8%
프랑스령	2	0.8%
일본	1	0.4%
인도네시아	1	0.4%
인도	1	0.4%
Other values (239)	239	91.9%

Most occurring characters

Value	Count	Frequency (%)
아	63	6.2%
스	35	3.5%
리	34	3.4%
이	27	2.7%
니	27	2.7%
르	25	2.5%
	23	2.3%
라	23	2.3%
도	20	2.0%
나	20	2.0%
Other values (219)	715	70.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	980	96.8%
Space Separator	23	2.3%
Other Punctuation	3	0.3%
Open Punctuation	2	0.2%
Close Punctuation	2	0.2%
Uppercase Letter	2	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	63	6.4%
스	35	3.6%
리	34	3.5%
이	27	2.8%
니	27	2.8%
르	25	2.6%
라	23	2.3%
도	20	2.0%
나	20	2.0%
국	17	1.7%
Other values (213)	689	70.3%

Uppercase Letter

Value	Count	Frequency (%)
R	1	50.0%
D	1	50.0%

Space Separator

Value	Count	Frequency (%)
	23	100.0%

Other Punctuation

Value	Count	Frequency (%)
·	3	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	980	96.8%
Common	30	3.0%
Latin	2	0.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	63	6.4%
스	35	3.6%
리	34	3.5%
이	27	2.8%
니	27	2.8%
르	25	2.6%
라	23	2.3%
도	20	2.0%
나	20	2.0%
국	17	1.7%
Other values (213)	689	70.3%

Common

Value	Count	Frequency (%)
	23	76.7%
·	3	10.0%
(	2	6.7%
)	2	6.7%

Latin

Value	Count	Frequency (%)
R	1	50.0%
D	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	980	96.8%
ASCII	29	2.9%
None	3	0.3%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	63	6.4%
스	35	3.6%
리	34	3.5%
이	27	2.8%
니	27	2.8%
르	25	2.6%
라	23	2.3%
도	20	2.0%
나	20	2.0%
국	17	1.7%
Other values (213)	689	70.3%

ASCII

Value	Count	Frequency (%)
	23	79.3%
(	2	6.9%
)	2	6.9%
R	1	3.4%
D	1	3.4%

None

Value	Count	Frequency (%)
·	3	100.0%

ISO(2자리)
Text

Distinct	236
Distinct (%)	100.0%
Missing	1
Missing (%)	0.4%
Memory size	2.0 KiB

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Characters and Unicode

Total characters	472
Distinct characters	27
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	236
Unique (%)	100.0%

Sample

1st row	GH
2nd row	GA
3rd row	GY
4th row	GM
5th row	GG

Words

Value	Count	Frequency (%)
gh	1	0.4%
jo	1	0.4%
uy	1	0.4%
uz	1	0.4%
ua	1	0.4%
iq	1	0.4%
ir	1	0.4%
il	1	0.4%
eg	1	0.4%
it	1	0.4%
Other values (226)	226	95.8%

Most occurring characters

Value	Count	Frequency (%)
M	37	7.8%
G	31	6.6%
S	29	6.1%
A	27	5.7%
T	26	5.5%
C	24	5.1%
N	23	4.9%
E	23	4.9%
B	23	4.9%
R	20	4.2%
Other values (17)	209	44.3%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	470	99.6%
Decimal Number	2	0.4%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
M	37	7.9%
G	31	6.6%
S	29	6.2%
A	27	5.7%
T	26	5.5%
C	24	5.1%
N	23	4.9%
E	23	4.9%
B	23	4.9%
R	20	4.3%
Other values (16)	207	44.0%

Decimal Number

Value	Count	Frequency (%)
9	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	470	99.6%
Common	2	0.4%

Most frequent character per script

Latin

Value	Count	Frequency (%)
M	37	7.9%
G	31	6.6%
S	29	6.2%
A	27	5.7%
T	26	5.5%
C	24	5.1%
N	23	4.9%
E	23	4.9%
B	23	4.9%
R	20	4.3%
Other values (16)	207	44.0%

Common

Value	Count	Frequency (%)
9	2	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	472	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
M	37	7.8%
G	31	6.6%
S	29	6.1%
A	27	5.7%
T	26	5.5%
C	24	5.1%
N	23	4.9%
E	23	4.9%
B	23	4.9%
R	20	4.2%
Other values (17)	209	44.3%

ISO(3자리)
Text

MISSING

Distinct	230
Distinct (%)	100.0%
Missing	7
Missing (%)	3.0%
Memory size	2.0 KiB

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Characters and Unicode

Total characters	690
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	230
Unique (%)	100.0%

Sample

1st row	GHA
2nd row	GAB
3rd row	GUY
4th row	GMB
5th row	GGY

Words

Value	Count	Frequency (%)
mus	1	0.4%
tcd	1	0.4%
ind	1	0.4%
hnd	1	0.4%
wlf	1	0.4%
jor	1	0.4%
uga	1	0.4%
ury	1	0.4%
uzb	1	0.4%
ukr	1	0.4%
Other values (220)	220	95.7%

Most occurring characters

Value	Count	Frequency (%)
A	52	7.5%
N	49	7.1%
R	48	7.0%
M	48	7.0%
S	39	5.7%
T	38	5.5%
L	37	5.4%
G	37	5.4%
B	35	5.1%
C	30	4.3%
Other values (16)	277	40.1%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	690	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	52	7.5%
N	49	7.1%
R	48	7.0%
M	48	7.0%
S	39	5.7%
T	38	5.5%
L	37	5.4%
G	37	5.4%
B	35	5.1%
C	30	4.3%
Other values (16)	277	40.1%

Most occurring scripts

Value	Count	Frequency (%)
Latin	690	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	52	7.5%
N	49	7.1%
R	48	7.0%
M	48	7.0%
S	39	5.7%
T	38	5.5%
L	37	5.4%
G	37	5.4%
B	35	5.1%
C	30	4.3%
Other values (16)	277	40.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	690	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
A	52	7.5%
N	49	7.1%
R	48	7.0%
M	48	7.0%
S	39	5.7%
T	38	5.5%
L	37	5.4%
G	37	5.4%
B	35	5.1%
C	30	4.3%
Other values (16)	277	40.1%

ISO(숫자)
Real number (ℝ)

MISSING

Distinct	229
Distinct (%)	100.0%
Missing	8
Missing (%)	3.4%
Infinite	0
Infinite (%)	0.0%
Mean	435.30131

Minimum	4
Maximum	894
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	2.2 KiB

Quantile statistics

Minimum	4
5-th percentile	45.6
Q1	214
median	434
Q3	654
95-th percentile	831.6
Maximum	894
Range	890
Interquartile range (IQR)	440

Descriptive statistics

Standard deviation	253.83846
Coefficient of variation (CV)	0.58313278
Kurtosis	-1.183379
Mean	435.30131
Median Absolute Deviation (MAD)	220
Skewness	0.0033512732
Sum	99684
Variance	64433.966
Monotonicity	Not monotonic
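
Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the rounded values printed in this report, so small rounding differences are expected. The variable names are illustrative.

```python
# Values copied from the statistics reported for ISO(숫자).
mean = 435.30131
std = 253.83846
total = 99684
minimum, maximum = 4, 894
q1, q3 = 214, 654

cv = std / mean                     # coefficient of variation
variance = std ** 2                 # variance is the squared standard deviation
value_range = maximum - minimum     # range
iqr = q3 - q1                       # interquartile range
n_values = round(total / mean)      # non-missing observations implied by sum / mean

print(cv, variance, value_range, iqr, n_values)
```

The implied count of 229 non-missing values matches 237 observations minus the 8 missing ones.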

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value	Count	Frequency (%)
887	1	0.4%
40	1	0.4%
340	1	0.4%
876	1	0.4%
400	1	0.4%
800	1	0.4%
858	1	0.4%
860	1	0.4%
804	1	0.4%
368	1	0.4%
Other values (219)	219	92.4%
(Missing)	8	3.4%

Minimum 10 values

Value	Count	Frequency (%)
4	1	0.4%
8	1	0.4%
10	1	0.4%
12	1	0.4%
20	1	0.4%
24	1	0.4%
28	1	0.4%
31	1	0.4%
32	1	0.4%
36	1	0.4%

Maximum 10 values

Value	Count	Frequency (%)
894	1	0.4%
887	1	0.4%
882	1	0.4%
876	1	0.4%
862	1	0.4%
860	1	0.4%
858	1	0.4%
854	1	0.4%
840	1	0.4%
834	1	0.4%

Interactions

[Interaction plot: ISO(숫자) against ISO(숫자)]

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
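
To make "nullity correlation" concrete, the sketch below computes a Pearson correlation between is-missing indicators for two columns, using the ten last rows of the sample. This assumes the heatmap is based on Pearson correlation of missingness flags, which is a common way to build such a plot; the report itself does not state the method. The function name `nullity_correlation` is illustrative.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation of is-missing indicators; None if either never varies."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always (or never) missing has no correlation
    return cov / sqrt(var_a * var_b)

# Last ten rows of the sample; None stands for <NA>.
name_en = ["Korea", "Puerto Rico", None, None, None, None, None, None, "Hongkong", None]
iso_num = [410, 630, None, None, None, None, None, None, 344, None]
iso_2 = ["KR", "PR", "EU", "XA", "XB", "UN", "XC", "XD", "HK", "99"]

r_name_num = nullity_correlation(name_en, iso_num)
r_name_iso2 = nullity_correlation(name_en, iso_2)
print(r_name_num, r_name_iso2)
```

In these rows the English name and the numeric code are missing together, so their nullity correlation is 1, while ISO(2자리) is never missing and has no defined correlation.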

Sample

First rows

	국가명(영문)	국가명(국문)	ISO(2자리)	ISO(3자리)	ISO(숫자)
0	Ghana	가나	GH	GHA	288
1	Gabon	가봉	GA	GAB	266
2	Guyana	가이아나	GY	GUY	328
3	Gambia	감비아	GM	GMB	270
4	Bailiwick of Guernsey	건지	GG	GGY	831
5	Guadeloupe	과들루프	GP	GLP	312
6	Guatemala	과테말라	GT	GTM	320
7	Guam	괌	GU	GUM	316
8	Grenada	그레나다	GD	GRD	308
9	Greece	그리스	GR	GRC	300

Last rows

	국가명(영문)	국가명(국문)	ISO(2자리)	ISO(3자리)	ISO(숫자)
227	Korea	대한민국	KR	KOR	410
228	Puerto Rico	푸에르토리코	PR	PRI	630
229	<NA>	유럽연합	EU	<NA>	<NA>
230	<NA>	아세안	XA	<NA>	<NA>
231	<NA>	유네스코	XB	<NA>	<NA>
232	<NA>	국제연합	UN	<NA>	<NA>
233	<NA>	이슬람 국가	XC	<NA>	<NA>
234	<NA>	경제협력개발기구	XD	<NA>	<NA>
235	Hongkong	홍콩	HK	HKG	344
236	<NA>	전체 국가	99	<NA>	<NA>
