gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	240
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	6.0 KiB
Average record size in memory	25.6 B
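A minimal sketch of the overview counts in plain Python, so it runs without pandas or ydata-profiling. It assumes rows are equal-length tuples and counts a row as a duplicate when an identical row appeared earlier (the `DataFrame.duplicated()` default in pandas). The function names are hypothetical. The average record size is total memory divided by the number of observations: 6.0 KiB / 240 = 6144 B / 240 = 25.6 B.

```python
from collections import Counter


def overview(rows):
    """Return (n_vars, n_obs, duplicate_rows, duplicate_pct) for a list of tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    # Each extra copy of an identical row counts as one duplicate,
    # mirroring pandas' DataFrame.duplicated(keep="first") convention.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    duplicate_pct = 100.0 * duplicates / n_obs if n_obs else 0.0
    return n_vars, n_obs, duplicates, duplicate_pct


def average_record_size(total_bytes, n_obs):
    """Average record size = total in-memory size / number of observations."""
    return total_bytes / n_obs


# Two rows from the report's "First rows" sample, the second repeated on
# purpose so that one duplicate is counted.
rows = [
    ("미국", "United States of America", 64310),
    ("중국", "China", 35803),
    ("중국", "China", 35803),
]
print(overview(rows))                        # 3 variables, 3 rows, 1 duplicate
print(average_record_size(6.0 * 1024, 240))  # 25.6 (bytes), as in the report
```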

Variable types

Text	2
Numeric	1

Dataset

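Description	Provides the number of overseas buyers, by country, held by the online export platform Gobizkorea who are currently active in Korea and abroad, for example by registering as members or sending inquiries.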
URL	https://www.data.go.kr/data/15071457/fileData.do

Alerts

`국가명(국문)` has unique values	Unique
`국가명(영문)` has unique values	Unique
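The alerts rest on the difference between Distinct (how many different values a column has) and Unique (how many values occur exactly once). A minimal sketch of both, assuming the UNIQUE alert fires when every value occurs exactly once; the function name is hypothetical. The numeric column shows why the two can differ: its "Last rows" sample contains 65 three times and 3 twice.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = different values; Unique = values that occur exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
        # Assumed trigger for the UNIQUE alert: no value repeats.
        "all_unique": unique == n,
    }


# Buyer counts from the report's "Last rows" sample.
print(distinct_and_unique([66, 65, 65, 65, 63, 60, 51, 3, 3, 2]))
print(distinct_and_unique(["미국", "중국", "아프가니스탄"])["all_unique"])  # True
```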

Reproduction

Analysis started	2023-12-12 03:21:17.506814
Analysis finished	2023-12-12 03:21:17.934449
Duration	0.43 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
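The reported duration is simply the difference between the two timestamps above. A quick check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-12-12 03:21:17.506814")
finished = datetime.fromisoformat("2023-12-12 03:21:17.934449")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.43 seconds
```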

국가명(국문) (country name, Korean)
Text

UNIQUE

Distinct	240
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	11
Median length	10
Mean length	4.3583333
Min length	1
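A sketch of the length statistics, using `len()` per value (each precomposed Hangul syllable is one code point, so "미국" has length 2). Mean length times the number of values gives the total character count: 4.3583333 × 240 ≈ 1046, matching Total characters below. The reported Median length cannot be a plain median of the 240 lengths: if half the names had 10 or more characters, the mean would be at least 5, not 4.36 (the same argument rules out a plain median of 41 for the English column). The function below computes the plain median, so it should not be expected to reproduce that figure; the function name is hypothetical.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min string length, plus total characters."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_chars": sum(lengths),
    }


# The first three Korean names from the report's sample rows.
print(length_stats(["미국", "중국", "아프가니스탄"]))
```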

Characters and Unicode

Total characters	1046
Distinct characters	213
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
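Python's `unicodedata.category()` returns the two-letter General Category code of a character ("Lo", "Zs", and so on). A sketch of the category counts, with a hand-written mapping (an assumption covering only the codes that occur in this dataset) to the long names used in the report:

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this dataset
# (a hand-written subset; unicodedata returns only the two-letter code).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode General Category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One Korean and one English name from the report's "Last rows" sample.
print(category_counts(["쿡 제도", "Cocos (Keeling) Islands"]))
```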

Unique

Unique	240
Unique (%)	100.0%

Sample

1st row	미국
2nd row	중국
3rd row	아프가니스탄
4th row	인도
5th row	홍콩

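Words

Value	Count	Frequency (%)
제도 (islands)	11	3.9%
섬 (island)	6	2.1%
공화국 (republic)	6	2.1%
도미니카 (Dominica)	2	0.7%
연방 (federation)	2	0.7%
미국령 (U.S. territory)	2	0.7%
콩고 (Congo)	2	0.7%
프랑스령 (French territory)	2	0.7%
기니 (Guinea)	2	0.7%
버진아일랜드 (Virgin Islands)	2	0.7%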
Other values (247)	247	87.0%

Most occurring characters

Value	Count	Frequency (%)
아	63	6.0%
(space)	44	4.2%
스	41	3.9%
리	37	3.5%
르	28	2.7%
니	26	2.5%
이	24	2.3%
도	23	2.2%
라	23	2.2%
나	20	1.9%
Other values (203)	717	68.5%
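Each frequency table lists the ten most common values and folds the rest into an "Other values (m)" row, where m is the number of remaining distinct values. A sketch with `collections.Counter`; the function name and `k` parameter are hypothetical. The percentage depends on the denominator: 아 appears 63 times, which is 6.0% of all 1046 characters but 6.3% of the 1002 characters in the Other Letter category, so the same count carries two different percentages in the tables of this section.

```python
from collections import Counter


def frequency_table(items, k=10):
    """(value, count, percent) rows for the k most common items, plus an
    'Other values (m)' row where m is the number of remaining distinct items."""
    counts = Counter(items)
    total = sum(counts.values())
    top = counts.most_common(k)
    rows = [(value, n, 100.0 * n / total) for value, n in top]
    remaining = len(counts) - len(top)
    if remaining:
        other = total - sum(n for _, n in top)
        rows.append((f"Other values ({remaining})", other, 100.0 * other / total))
    return rows


# Characters of three names from the sample rows; 국 is the only repeat.
for value, count, pct in frequency_table("미국중국아프가니스탄", k=2):
    print(f"{value}\t{count}\t{pct:.1f}%")
```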

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1002	95.8%
Space Separator	44	4.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	63	6.3%
스	41	4.1%
리	37	3.7%
르	28	2.8%
니	26	2.6%
이	24	2.4%
도	23	2.3%
라	23	2.3%
나	20	2.0%
트	18	1.8%
Other values (202)	699	69.8%

Space Separator

Value	Count	Frequency (%)
(space)	44	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1002	95.8%
Common	44	4.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	63	6.3%
스	41	4.1%
리	37	3.7%
르	28	2.8%
니	26	2.6%
이	24	2.4%
도	23	2.3%
라	23	2.3%
나	20	2.0%
트	18	1.8%
Other values (202)	699	69.8%

Common

Value	Count	Frequency (%)
(space)	44	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1002	95.8%
ASCII	44	4.2%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	63	6.3%
스	41	4.1%
리	37	3.7%
르	28	2.8%
니	26	2.6%
이	24	2.4%
도	23	2.3%
라	23	2.3%
나	20	2.0%
트	18	1.8%
Other values (202)	699	69.8%

ASCII

Value	Count	Frequency (%)
(space)	44	100.0%
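The Python standard library has no lookup for the Unicode Script or Block properties, so the sketch below approximates both for the characters in this dataset (an assumption, not the profiler's method): the script is guessed from the prefix of the character's Unicode name, and the block from code-point ranges (U+0000 to U+007F is Basic Latin, labelled ASCII in the report; U+AC00 to U+D7AF is Hangul Syllables). Characters from any other script would be misclassified as Common. The function names are hypothetical.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script guess from the character's Unicode name; anything that is
    not a Hangul or Latin character is labelled Common (an approximation)."""
    name = unicodedata.name(ch, "")
    for script in ("HANGUL", "LATIN"):
        if name.startswith(script):
            return script.title()
    return "Common"


def block_of(ch):
    """Block lookup for the two code-point ranges that occur in this dataset."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Unicode block "Basic Latin"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Unicode block "Hangul Syllables"
    return "Other"


sample_name = "크리스마스 섬"  # from the report's "Last rows" sample
print(Counter(script_of(c) for c in sample_name))  # Hangul: 6, Common: 1
print(Counter(block_of(c) for c in sample_name))   # Hangul: 6, ASCII: 1
```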

국가명(영문) (country name, English)
Text

UNIQUE

Distinct	240
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	52
Median length	41
Mean length	11.216667
Min length	4

Characters and Unicode

Total characters	2692
Distinct characters	58
Distinct categories	7
Distinct scripts	2
Distinct blocks	1

Unique

Unique	240
Unique (%)	100.0%

Sample

1st row	United States of America
2nd row	China
3rd row	Afghanistan
4th row	India
5th row	China, Hong Kong Special Administrative Region

Words

Value	Count	Frequency (%)
islands	15	3.8%
of	14	3.6%
republic	12	3.1%
and	12	3.1%
united	6	1.5%
saint	5	1.3%
states	4	1.0%
the	3	0.8%
democratic	3	0.8%
island	3	0.8%
Other values (293)	315	80.4%
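A sketch of the word counts, assuming values are lower-cased and split into runs of word characters so that punctuation such as "(" and "," is dropped; the profiler's exact tokenization may differ, and the function name is hypothetical. Because Python's `re` treats Hangul as word characters, the same function also counts words in the Korean column (for example 제도).

```python
import re
from collections import Counter


def word_counts(values):
    """Lower-case each value and count runs of word characters.
    Punctuation such as '(' or ',' is dropped (an assumed tokenization)."""
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"\w+", value.lower()))
    return counts


# English names from the report's "Last rows" sample.
names = [
    "Cook Islands",
    "Christmas Island",
    "Cocos (Keeling) Islands",
    "United States Minor Outlying Islands",
]
print(word_counts(names).most_common(2))  # [('islands', 3), ('cook', 1)]
print(word_counts(["쿡 제도", "코코스 제도"])["제도"])  # 2
```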

Most occurring characters

Value	Count	Frequency (%)
a	363	13.5%
i	216	8.0%
n	214	7.9%
e	194	7.2%
(space)	152	5.6%
r	140	5.2%
o	136	5.1%
l	118	4.4%
t	110	4.1%
s	105	3.9%
Other values (48)	944	35.1%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	2151	79.9%
Uppercase Letter	363	13.5%
Space Separator	152	5.6%
Other Punctuation	10	0.4%
Open Punctuation	7	0.3%
Close Punctuation	7	0.3%
Dash Punctuation	2	0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
a	363	16.9%
i	216	10.0%
n	214	9.9%
e	194	9.0%
r	140	6.5%
o	136	6.3%
l	118	5.5%
t	110	5.1%
s	105	4.9%
u	98	4.6%
Other values (16)	457	21.2%

Uppercase Letter

Value	Count	Frequency (%)
S	44	12.1%
M	31	8.5%
I	30	8.3%
C	26	7.2%
B	25	6.9%
A	23	6.3%
G	21	5.8%
R	20	5.5%
N	18	5.0%
T	17	4.7%
Other values (15)	108	29.8%

Other Punctuation

Value	Count	Frequency (%)
?	4	40.0%
'	3	30.0%
,	3	30.0%

Space Separator

Value	Count	Frequency (%)
(space)	152	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	7	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	7	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	2514	93.4%
Common	178	6.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	363	14.4%
i	216	8.6%
n	214	8.5%
e	194	7.7%
r	140	5.6%
o	136	5.4%
l	118	4.7%
t	110	4.4%
s	105	4.2%
u	98	3.9%
Other values (41)	820	32.6%

Common

Value	Count	Frequency (%)
(space)	152	85.4%
(	7	3.9%
)	7	3.9%
?	4	2.2%
'	3	1.7%
,	3	1.7%
-	2	1.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2692	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
a	363	13.5%
i	216	8.0%
n	214	7.9%
e	194	7.2%
(space)	152	5.6%
r	140	5.2%
o	136	5.1%
l	118	4.4%
t	110	4.1%
s	105	3.9%
Other values (48)	944	35.1%

해외바이어 수 (number of overseas buyers)
Real number (ℝ)

Distinct	172
Distinct (%)	71.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1399.65

Minimum	2
Maximum	64310
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	2.2 KiB

Quantile statistics

Minimum	2
5th percentile	68
Q1	83
Median	193.5
Q3	475.75
95th percentile	5276.5
Maximum	64310
Range	64308
Interquartile range (IQR)	392.75
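A sketch of the quantile statistics, assuming linear interpolation between order statistics (the default of pandas' `Series.quantile`); the function names are hypothetical. Range and IQR follow directly from the table: 64310 - 2 = 64308 and 475.75 - 83 = 392.75.

```python
import math


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics
    (pandas' default), for data already sorted ascending."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_summary(values):
    """Minimum, 5th/25th/50th/75th/95th percentiles, maximum, range and IQR."""
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "min": xs[0],
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": quantile(xs, 0.5),
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }


# Buyer counts from the report's "Last rows" sample.
print(quantile_summary([66, 65, 65, 65, 63, 60, 51, 3, 3, 2]))
```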

Descriptive statistics

Standard deviation	5649.666
Coefficient of variation (CV)	4.0364849
Kurtosis	75.413223
Mean	1399.65
Median Absolute Deviation (MAD)	117.5
Skewness	8.0602259
Sum	335916
Variance	31918726
Monotonicity	Decreasing
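A sketch of the descriptive statistics, assuming the conventions pandas uses: sample variance and standard deviation (n - 1 denominator), adjusted Fisher-Pearson skewness, and bias-corrected excess kurtosis, with MAD taken as the unscaled median of absolute deviations from the median. The figures in the table are consistent with this: Mean = Sum / n = 335916 / 240 = 1399.65, CV = 5649.666 / 1399.65 ≈ 4.0365, and 5649.666² ≈ 31918726, the reported Variance. The monotonicity labels are a simplified, hypothetical version of the report's.

```python
import math
import statistics


def describe(values):
    """Descriptive statistics for at least four numbers that are not all equal."""
    n = len(values)
    mean = sum(values) / n
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Central moments with an n denominator, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,
        "mad": statistics.median([abs(x - median) for x in values]),
        # Adjusted Fisher-Pearson skewness and bias-corrected excess kurtosis.
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "sum": sum(values),
    }


def monotonicity(values):
    """Classify a sequence as Increasing, Decreasing or Not monotonic."""
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# Buyer counts from the report's "Last rows" sample (sorted descending).
tail = [66, 65, 65, 65, 63, 60, 51, 3, 3, 2]
print(describe(tail))
print(monotonicity(tail))  # Decreasing
```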

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value	Count	Frequency (%)
75	8	3.3%
80	6	2.5%
79	5	2.1%
87	5	2.1%
68	5	2.1%
93	4	1.7%
229	4	1.7%
84	4	1.7%
78	4	1.7%
76	4	1.7%
Other values (162)	191	79.6%
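The histogram uses 50 equal-width bins spanning the minimum to the maximum. A sketch of that binning, assuming the last bin is closed on the right (as in NumPy's `histogram`) so the maximum is counted; the function name is hypothetical. For this column each bin is (64310 - 2) / 50 = 1286.16 buyers wide, so the first bin covers 2 to 1288.16. Since Q3 is 475.75, at least three quarters of the 240 countries fall into that single first bar.

```python
def fixed_width_histogram(values, bins=50):
    """Return (edges, counts) for equal-width bins from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        index = int((x - lo) / width) if width else 0
        # Clamp so the maximum lands in the last bin (closed on the right).
        counts[min(index, bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


# Bin width for this column, from the reported minimum and maximum.
print((64310 - 2) / 50)  # 1286.16

# Buyer counts from the report's "Last rows" sample, in 4 bins of width 16.
edges, counts = fixed_width_histogram([66, 65, 65, 65, 63, 60, 51, 3, 3, 2], bins=4)
print(counts)  # [3, 0, 0, 7]
```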

Minimum 10 values

Value	Count	Frequency (%)
2	1	0.4%
3	2	0.8%
51	1	0.4%
60	1	0.4%
63	1	0.4%
65	3	1.2%
66	1	0.4%
67	1	0.4%
68	5	2.1%
69	2	0.8%

Maximum 10 values

Value	Count	Frequency (%)
64310	1	0.4%
35803	1	0.4%
32414	1	0.4%
29227	1	0.4%
14707	1	0.4%
8090	1	0.4%
7968	1	0.4%
7195	1	0.4%
7103	1	0.4%
6137	1	0.4%
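The extreme-value tables list the smallest and largest distinct values with their counts; the percentage is relative to all rows (1 of 240 rows is about 0.4%). A sketch, with a hypothetical function name:

```python
from collections import Counter


def extreme_values(values, k=10):
    """Return (smallest, largest) lists of (value, count, percent) rows,
    taken over the k smallest and k largest distinct values."""
    counts = Counter(values)
    n = len(values)
    distinct = sorted(counts)

    def rows(selected):
        return [(v, counts[v], 100.0 * counts[v] / n) for v in selected]

    return rows(distinct[:k]), rows(reversed(distinct[-k:]))


# Buyer counts from the report's "Last rows" sample.
tail = [66, 65, 65, 65, 63, 60, 51, 3, 3, 2]
smallest, largest = extreme_values(tail, k=3)
print(smallest)  # [(2, 1, 10.0), (3, 2, 20.0), (51, 1, 10.0)]
print(largest)   # [(66, 1, 10.0), (65, 3, 30.0), (63, 1, 10.0)]
```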

Interactions

[Figure: 해외바이어 수 vs. 해외바이어 수]

Missing values

[Figure: Count] A simple visualization of nullity by column.

[Figure: Matrix] A nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
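No cell in this dataset is missing (Missing cells 0), so both views show complete columns. A sketch of the data behind them, with hypothetical names: per-column missing counts (the Count view) and one text line per row in which `#` marks a present cell and `.` a missing one (a crude nullity matrix). It assumes `None` or a float NaN marks a missing value; the missing value in the second row is hypothetical, added only for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity(rows, columns):
    """Return per-column missing counts and a text nullity matrix."""
    missing = {col: 0 for col in columns}
    matrix = []
    for row in rows:
        cells = []
        for col, value in zip(columns, row):
            if is_missing(value):
                missing[col] += 1
                cells.append(".")
            else:
                cells.append("#")
        matrix.append("".join(cells))
    return missing, matrix


columns = ["국가명(국문)", "국가명(영문)", "해외바이어 수"]
rows = [
    ("미국", "United States of America", 64310),
    ("중국", None, 35803),  # hypothetical missing value, for illustration only
]
counts, matrix = nullity(rows, columns)
print(counts)
print("\n".join(matrix))
```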

Sample

First rows

	국가명(국문)	국가명(영문)	해외바이어 수
0	미국	United States of America	64310
1	중국	China	35803
2	아프가니스탄	Afghanistan	32414
3	인도	India	29227
4	홍콩	China, Hong Kong Special Administrative Region	14707
5	일본	Japan	8090
6	싱가포르	Singapore	7968
7	프랑스	France	7195
8	대한민국	Republic of Korea	7103
9	대만	Taiwan	6137

Last rows

	국가명(국문)	국가명(영문)	해외바이어 수
230	카보베르데	Cabo Verde	66
231	니우에	Niue	65
232	쿡 제도	Cook Islands	65
233	투발루	Tuvalu	65
234	세인트빈센트 그레나딘	Saint Vincent and the Grenadines	63
235	몬테네그로	Montenegro	60
236	레소토	Lesotho	51
237	미국령 군소 제도	United States Minor Outlying Islands	3
238	크리스마스 섬	Christmas Island	3
239	코코스 제도	Cocos (Keeling) Islands	2
