gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	192
Missing cells	1
Missing cells (%)	0.2%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	4.8 KiB
Average record size in memory	25.7 B
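The two percentage rows follow from the raw counts: missing cells are divided by the total number of cells (variables × observations), and duplicate rows by the number of observations. A minimal Python check using only the figures in this table (the denominators are inferred from the numbers, not taken from ydata-profiling's source):

```python
# Figures copied from the Dataset statistics table.
n_vars = 3
n_obs = 192
missing_cells = 1
duplicate_rows = 0

total_cells = n_vars * n_obs                     # 576 cells in total
missing_pct = 100 * missing_cells / total_cells  # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_obs     # share of repeated rows

print(f"Missing cells (%): {missing_pct:.1f}%")     # Missing cells (%): 0.2%
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")  # Duplicate rows (%): 0.0%
```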

Variable types

Numeric	1
Text	2

Dataset

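Description	Country names and country codes associated with the oral history records of 현대한국구술자료관 (literally, "Contemporary Korea Oral History Archive")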
Author	한국학중앙연구원 (Academy of Korean Studies)
URL	https://www.data.go.kr/data/15049075/fileData.do

Alerts

`번호` has unique values	Unique
`국가명` has unique values	Unique

Reproduction

Analysis started	2023-12-12 08:54:03.327616
Analysis finished	2023-12-12 08:54:03.786912
Duration	0.46 seconds
Software version	ydata-profiling v4.5.1
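Duration is simply the difference between the two Reproduction timestamps. A quick check with Python's standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-12-12 08:54:03.327616")
finished = datetime.fromisoformat("2023-12-12 08:54:03.786912")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 0.46 seconds
```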

번호 (number)
Real number (ℝ)

UNIQUE

Distinct	192
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	522.23958

Minimum	184
Maximum	894
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.8 KiB

Quantile statistics

Minimum	184
5-th percentile	216.2
Q1	347
median	518
Q3	691
95-th percentile	836.7
Maximum	894
Range	710
Interquartile range (IQR)	344
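Range is Maximum - Minimum (894 - 184 = 710) and IQR is Q3 - Q1 (691 - 347 = 344). For the percentiles, a sketch assuming ydata-profiling delegates to pandas' `Series.quantile`, whose default `interpolation="linear"` places quantile q at the fractional 0-based rank q × (n - 1) and interpolates between the two neighbouring sorted values. The helper `linear_quantile` is hypothetical, written only to illustrate that rule, and is cross-checked against the standard library's equivalent method:

```python
import statistics

def linear_quantile(sorted_values, q):
    """Quantile by linear interpolation at fractional rank q * (n - 1).

    Hypothetical helper; mirrors the default "linear" rule of numpy/pandas.
    """
    pos = q * (len(sorted_values) - 1)        # fractional 0-based rank
    lo = int(pos)                             # lower neighbour index
    hi = min(lo + 1, len(sorted_values) - 1)  # upper neighbour index
    frac = pos - lo                           # weight of the upper neighbour
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

# Range and IQR from the table above.
minimum, q1, q3, maximum = 184, 347, 691, 894
print("Range:", maximum - minimum)  # Range: 710
print("IQR:", q3 - q1)              # IQR: 344

# With n = 192 the 5th percentile sits at rank 0.05 * 191 = 9.55, i.e. 55%
# of the way from the 10th smallest value (214) to the 11th smallest.
sample = [1, 2, 3, 4]
print(linear_quantile(sample, 0.25))                             # 1.75
print(statistics.quantiles(sample, n=4, method="inclusive")[0])  # 1.75, same rule
```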

Descriptive statistics

Standard deviation	202.75897
Coefficient of variation (CV)	0.38824895
Kurtosis	-1.1811906
Mean	522.23958
Median Absolute Deviation (MAD)	173
Skewness	0.040779166
Sum	100270
Variance	41111.199
Monotonicity	Strictly increasing
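Several of these values are derived from one another, which allows a quick consistency check: Mean = Sum / n, CV = Standard deviation / Mean, and Variance = Standard deviation². The sketch below reproduces the reported figures up to the rounding of the displayed inputs; `is_strictly_increasing` is a hypothetical helper illustrating what the Monotonicity row asserts (every value larger than the one before it):

```python
# Figures copied from the tables above.
n = 192
total = 100270    # Sum
std = 202.75897   # Standard deviation (as displayed, already rounded)

mean = total / n
cv = std / mean
variance = std ** 2

print(f"Mean: {mean:.5f}")  # Mean: 522.23958
print(f"CV: {cv:.8f}")
print(f"Variance: {variance:.3f}")

def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# The ten smallest values of 번호, from the Minimum 10 values table.
print(is_strictly_increasing([184, 188, 191, 192, 196, 203, 204, 208, 212, 214]))  # True
```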

[Figure: histogram of 번호 with fixed-size bins (bins=50)]

Common values

Value	Count	Frequency (%)
184	1	0.5%
524	1	0.5%
620	1	0.5%
624	1	0.5%
626	1	0.5%
630	1	0.5%
634	1	0.5%
638	1	0.5%
642	1	0.5%
643	1	0.5%
Other values (182)	182	94.8%
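With fixed-size bins, the span from Minimum to Maximum is presumably cut into 50 equal-width intervals (as numpy.histogram does for an integer bin count), so each bin covers (894 - 184) / 50 = 14.2. The Frequency (%) column is Count / 192. A minimal sketch; `bin_index` is a hypothetical helper:

```python
# Figures copied from the 번호 statistics above.
minimum, maximum, bins = 184, 894, 50
n = 192

width = (maximum - minimum) / bins                      # 14.2 per bin
edges = [minimum + i * width for i in range(bins + 1)]  # 51 bin edges

def bin_index(value):
    """0-based bin for a value; the maximum goes into the last, closed bin."""
    return min(int((value - minimum) / width), bins - 1)

print(width, len(edges))               # 14.2 51
print(bin_index(184), bin_index(894))  # 0 49

# Frequency (%) in the common-values table is Count / n.
print(f"{100 * 1 / n:.1f}%")    # 0.5%, a value that occurs once
print(f"{100 * 182 / n:.1f}%")  # 94.8%, the "Other values (182)" row
```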

Minimum 10 values

Value	Count	Frequency (%)
184	1	0.5%
188	1	0.5%
191	1	0.5%
192	1	0.5%
196	1	0.5%
203	1	0.5%
204	1	0.5%
208	1	0.5%
212	1	0.5%
214	1	0.5%

Maximum 10 values

Value	Count	Frequency (%)
894	1	0.5%
887	1	0.5%
882	1	0.5%
876	1	0.5%
862	1	0.5%
860	1	0.5%
858	1	0.5%
854	1	0.5%
850	1	0.5%
840	1	0.5%
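The two extreme-value tables list the ten smallest values in ascending order and the ten largest in descending order. With the standard library this is `heapq.nsmallest` / `heapq.nlargest`; the input below is only the 20 values shown above, not the full column:

```python
import heapq

# Only the 20 values shown in the two tables above, not the full column.
values = [184, 188, 191, 192, 196, 203, 204, 208, 212, 214,
          894, 887, 882, 876, 862, 860, 858, 854, 850, 840]

smallest = heapq.nsmallest(10, values)  # ascending, like "Minimum 10 values"
largest = heapq.nlargest(10, values)    # descending, like "Maximum 10 values"
print(smallest)
print(largest)
```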

국가명 (country name)
Text

UNIQUE

Distinct	192
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	1.6 KiB

Length

Max length	17
Median length	12.5
Mean length	4.453125
Min length	1
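Lengths are counted in characters (Unicode code points), so each precomposed Hangul syllable and each space counts as one. Mean length is Total characters / observations = 855 / 192 = 4.453125. Checked in Python on the first five names from the Sample section:

```python
# First five rows of 국가명, copied from the Sample section.
names = ["쿡 제도", "코스타리카", "크로아티아", "쿠바", "키프로스"]

# len() counts code points: each Hangul syllable and the space count as 1.
lengths = [len(s) for s in names]
print(lengths)  # [4, 5, 5, 2, 4]

# Column-wide mean length = total characters / observations.
print(855 / 192)  # 4.453125
```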

Characters and Unicode

Total characters	855
Distinct characters	199
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
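Python's unicodedata module exposes the general category of a code point (Lo = Other Letter, Zs = Space Separator, Lu = Uppercase Letter) but not its script or block. In the sketch below the block is approximated with hard-coded code-point ranges covering only the two blocks seen in this report (ASCII, U+0000 to U+007F, and Hangul Syllables, U+AC00 to U+D7AF); this is an assumption for illustration, not how ydata-profiling resolves blocks:

```python
import unicodedata

CATEGORY_NAMES = {"Lo": "Other Letter", "Zs": "Space Separator", "Lu": "Uppercase Letter"}

def char_block(ch):
    """Approximate block lookup covering only the blocks seen in this report."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables block
    return "Other"

for ch in "쿡 제도":
    cat = unicodedata.category(ch)
    print(repr(ch), cat, CATEGORY_NAMES.get(cat, cat), char_block(ch))
```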

Unique

Unique	192
Unique (%)	100.0%
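Distinct counts the different non-missing values, while Unique counts the values that occur exactly once. For 국가명 the two coincide (192) because no name repeats. A sketch of both definitions (`distinct_and_unique` is a hypothetical helper; `None` stands in for a missing cell):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): different values, and values seen exactly once."""
    counts = Counter(v for v in values if v is not None)  # skip missing cells
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

print(distinct_and_unique(["CK", "CR", "HR", "CR", None]))  # (3, 2)
```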

Sample

1st row	쿡 제도
2nd row	코스타리카
3rd row	크로아티아
4th row	쿠바
5th row	키프로스

Words

Value	Count	Frequency (%)
제도	11	4.7%
섬	4	1.7%
공화국	3	1.3%
프랑스령	3	1.3%
도미니카	2	0.9%
기니	2	0.9%
미국령	2	0.9%
루마니아	1	0.4%
러시아	1	0.4%
앵귈라	1	0.4%
Other values (202)	202	87.1%
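This table counts words rather than whole values: 제도 ("islands") occurs 11 times, and 도미니카 counts twice because it appears in both 도미니카 and 도미니카 공화국. The percentages are relative to the total word count, 232 (the ten listed counts, which sum to 30, plus 202 other values). A sketch assuming whitespace tokenization and lowercasing (the lowercase entries in the 국가코드 word table suggest the latter); `word_counts` is a hypothetical helper:

```python
from collections import Counter

def word_counts(values):
    """Count whitespace-separated, lowercased tokens across all values."""
    counter = Counter()
    for value in values:
        counter.update(word.lower() for word in value.split())
    return counter

names = ["쿡 제도", "도미니카", "도미니카 공화국"]  # three names from the Sample tables
counts = word_counts(names)
print(counts["도미니카"])  # 2

total_words = (11 + 4 + 3 + 3 + 2 + 2 + 2 + 1 + 1 + 1) + 202  # 232
print(f"{100 * 11 / total_words:.1f}%")  # 4.7%, the share of 제도
```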

Most occurring characters

Value	Count	Frequency (%)
아	47	5.5%
(space)	40	4.7%
스	34	4.0%
리	27	3.2%
도	22	2.6%
니	22	2.6%
르	21	2.5%
이	20	2.3%
라	18	2.1%
드	16	1.9%
Other values (189)	588	68.8%
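Character counts run over every character of every value, spaces included, and the percentages are relative to Total characters (855): 아 occurs 47 times, and 47 / 855 ≈ 5.5%. A sketch with collections.Counter on the five sample names:

```python
from collections import Counter

names = ["쿡 제도", "코스타리카", "크로아티아", "쿠바", "키프로스"]  # Sample rows
char_counts = Counter("".join(names))  # spaces are counted as characters too
total_chars = sum(char_counts.values())

for ch, count in char_counts.most_common(3):
    print(repr(ch), count, f"{100 * count / total_chars:.1f}%")

# Full-column figure from the report: '아' occurs 47 times out of 855 characters.
print(f"{100 * 47 / 855:.1f}%")  # 5.5%
```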

Most occurring categories

Value	Count	Frequency (%)
Other Letter	815	95.3%
Space Separator	40	4.7%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	47	5.8%
스	34	4.2%
리	27	3.3%
도	22	2.7%
니	22	2.7%
르	21	2.6%
이	20	2.5%
라	18	2.2%
드	16	2.0%
나	15	1.8%
Other values (188)	573	70.3%
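The per-category, per-script and per-block tables use the group total as the denominator, not Total characters. That is why 아 (47 occurrences) shows 5.5% under Most occurring characters but 5.8% under Other Letter and Hangul (815 characters). A small check; `share` is a hypothetical formatting helper:

```python
def share(count, total):
    """Percentage formatted like the report (one decimal place)."""
    return f"{100 * count / total:.1f}%"

total_chars = 855
group_totals = {"Other Letter": 815, "Space Separator": 40}
assert sum(group_totals.values()) == total_chars  # the groups cover every character

print(share(47, total_chars))                    # 5.5%  ('아' among all characters)
print(share(47, group_totals["Other Letter"]))   # 5.8%  ('아' among Other Letter)
print(share(573, group_totals["Other Letter"]))  # 70.3% (the "Other values (188)" row)
```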

Space Separator

Value	Count	Frequency (%)
(space)	40	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	815	95.3%
Common	40	4.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	47	5.8%
스	34	4.2%
리	27	3.3%
도	22	2.7%
니	22	2.7%
르	21	2.6%
이	20	2.5%
라	18	2.2%
드	16	2.0%
나	15	1.8%
Other values (188)	573	70.3%

Common

Value	Count	Frequency (%)
(space)	40	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	815	95.3%
ASCII	40	4.7%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	47	5.8%
스	34	4.2%
리	27	3.3%
도	22	2.7%
니	22	2.7%
르	21	2.6%
이	20	2.5%
라	18	2.2%
드	16	2.0%
나	15	1.8%
Other values (188)	573	70.3%

ASCII

Value	Count	Frequency (%)
(space)	40	100.0%

국가코드 (country code)
Text

Distinct	191
Distinct (%)	100.0%
Missing	1
Missing (%)	0.5%
Memory size	1.6 KiB
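Missing (%) uses all 192 rows (1 / 192 ≈ 0.5%), while Distinct (%) is taken over the 191 non-missing values, hence 100.0%. The report does not say which row lacks a code. One cause worth ruling out, offered as an assumption rather than a finding: pandas.read_csv treats the string "NA" as missing by default, and "NA" is Namibia's ISO 3166-1 alpha-2 code; reading with keep_default_na=False keeps it as text. The standard-library sketch below uses a hypothetical CSV row, not data from this file:

```python
import csv
import io

n_obs, missing, distinct = 192, 1, 191

print(f"Missing (%): {100 * missing / n_obs:.1f}%")                # Missing (%): 0.5%
print(f"Distinct (%): {100 * distinct / (n_obs - missing):.1f}%")  # Distinct (%): 100.0%

# Hypothetical CSV fragment for illustration only: the literal text "NA".
# The csv module never converts it to a missing value, whereas
# pandas.read_csv does unless keep_default_na=False is passed.
sample = "번호,국가명,국가코드\n516,나미비아,NA\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["국가코드"])  # NA
```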

Length

Max length	2
Median length	2
Mean length	2
Min length	2
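Every non-missing code is exactly two characters long, and the character tables for this column contain only uppercase Latin (ASCII) letters. That is consistent with ISO 3166-1 alpha-2 codes (CK, CR, HR, CU and CY in the sample all match), though the report itself does not name a standard. A format-only check, with `is_alpha2_like` as a hypothetical helper:

```python
import re

ALPHA2 = re.compile(r"[A-Z]{2}")

def is_alpha2_like(code):
    """Format check only: exactly two uppercase ASCII letters.

    Does not verify membership in the ISO 3166-1 list.
    """
    return isinstance(code, str) and ALPHA2.fullmatch(code) is not None

codes = ["CK", "CR", "HR", "CU", "CY"]  # first five rows of 국가코드
print(all(is_alpha2_like(c) for c in codes))                              # True
print(is_alpha2_like("ck"), is_alpha2_like(None), is_alpha2_like("USA"))  # False False False
```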

Characters and Unicode

Total characters	382
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1


Unique

Unique	191
Unique (%)	100.0%

Sample

1st row	CK
2nd row	CR
3rd row	HR
4th row	CU
5th row	CY

Words

Value	Count	Frequency (%)
ck	1	0.5%
sh	1	0.5%
pl	1	0.5%
pt	1	0.5%
gw	1	0.5%
tl	1	0.5%
pr	1	0.5%
qa	1	0.5%
re	1	0.5%
ro	1	0.5%
Other values (181)	181	94.8%

Most occurring characters

Value	Count	Frequency (%)
M	33	8.6%
G	28	7.3%
S	27	7.1%
T	23	6.0%
E	21	5.5%
N	21	5.5%
P	19	5.0%
R	18	4.7%
I	18	4.7%
L	17	4.5%
Other values (16)	157	41.1%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	382	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
M	33	8.6%
G	28	7.3%
S	27	7.1%
T	23	6.0%
E	21	5.5%
N	21	5.5%
P	19	5.0%
R	18	4.7%
I	18	4.7%
L	17	4.5%
Other values (16)	157	41.1%

Most occurring scripts

Value	Count	Frequency (%)
Latin	382	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
M	33	8.6%
G	28	7.3%
S	27	7.1%
T	23	6.0%
E	21	5.5%
N	21	5.5%
P	19	5.0%
R	18	4.7%
I	18	4.7%
L	17	4.5%
Other values (16)	157	41.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	382	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
M	33	8.6%
G	28	7.3%
S	27	7.1%
T	23	6.0%
E	21	5.5%
N	21	5.5%
P	19	5.0%
R	18	4.7%
I	18	4.7%
L	17	4.5%
Other values (16)	157	41.1%

Interactions

[Figure: 번호 plotted against 번호]

Missing values

[Figures: Count and Matrix views of missing values]

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
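The Count view tallies non-missing values per column (192, 192 and 191 here), and the Matrix view draws one line per record with gaps where cells are empty. A text-only sketch of both ideas; the helpers are hypothetical, and the third row is invented solely to show a gap:

```python
COLUMNS = ["번호", "국가명", "국가코드"]

rows = [
    (184, "쿡 제도", "CK"),      # first row of the dataset
    (188, "코스타리카", "CR"),   # second row of the dataset
    (0, "hypothetical", None),  # hypothetical row with a missing 국가코드
]

def nullity_counts(rows, columns):
    """Non-missing values per column (the idea behind the Count view)."""
    return {col: sum(row[i] is not None for row in rows) for i, col in enumerate(columns)}

def nullity_matrix(rows):
    """One text line per row: '#' for a present cell, '.' for a missing one."""
    return ["".join("#" if v is not None else "." for v in row) for row in rows]

print(nullity_counts(rows, COLUMNS))  # {'번호': 3, '국가명': 3, '국가코드': 2}
print("\n".join(nullity_matrix(rows)))
```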

Sample

First rows

	번호	국가명	국가코드
0	184	쿡 제도	CK
1	188	코스타리카	CR
2	191	크로아티아	HR
3	192	쿠바	CU
4	196	키프로스	CY
5	203	체코	CZ
6	204	베냉	BJ
7	208	덴마크	DK
8	212	도미니카	DM
9	214	도미니카 공화국	DO

Last rows

	번호	국가명	국가코드
182	840	미국	US
183	850	미국령 버진아일랜드	VI
184	854	부르키나파소	BF
185	858	우루과이	UY
186	860	우즈베키스탄	UZ
187	862	베네수엘라	VE
188	876	왈리스 퓌튀나	WF
189	882	사모아	WS
190	887	예멘	YE
191	894	잠비아	ZM
