gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	1583
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	24.9 KiB
Average record size in memory	16.1 B
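
The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming the reported KiB figure uses 1024 bytes per KiB (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 1583             # number of observations
n_columns = 2             # number of variables
total_kib = 24.9          # total size in memory, in KiB
missing_cells = 0

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows
missing_pct = 100 * missing_cells / (n_rows * n_columns)

print(f"{avg_record_bytes:.1f} B")  # 16.1 B
print(f"{missing_pct:.1f}%")        # 0.0%
```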

Variable types

Text	2

Dataset

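Description	A list of origins by product variety (ordered by trading volume, highest first), extracted from the seafood market-wholesaler system of the agricultural and marine products wholesale market. It provides variety and origin information.
Author	대구광역시 (Daegu Metropolitan City)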
URL	https://www.data.go.kr/data/15086039/fileData.do

Reproduction

Analysis started	2023-12-12 00:44:28.527862
Analysis finished	2023-12-12 00:44:28.801551
Duration	0.27 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

품종 (variety)
Text

Distinct	570
Distinct (%)	36.0%
Missing	0
Missing (%)	0.0%
Memory size	12.5 KiB

Length

Max length	12
Median length	11
Mean length	5.701832
Min length	2
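
As a rough illustration of how length figures like these could be derived, the sketch below computes them with Python's standard library for five values taken from this variable's sample rows. It is not ydata-profiling's own code, and the full column would be needed to reproduce the reported values.

```python
from statistics import mean, median

# Five values taken from the "Sample" rows of this variable.
sample = ["가자미(가공)", "가자미(가공)", "가자미(가공)", "가자미(가공)", "가자미(일반)"]

# Length is measured in Unicode code points, as Python's len() does for str.
lengths = [len(value) for value in sample]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)
```

As a consistency check on the full column, the reported mean length times the number of observations (5.701832 × 1583 ≈ 9026) matches the total character count reported for this variable.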

Characters and Unicode

Total characters	9026
Distinct characters	221
Distinct categories	4
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
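
A minimal sketch of such an analysis, using Python's built-in `unicodedata` module to count characters per general category. The helper `category_counts` and the name mapping are illustrative assumptions, not the profiler's actual implementation.

```python
import unicodedata
from collections import Counter

# Map two-letter general-category codes to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(["가자미(가공)", "건 가문어"])
print(counts)
```

For these two values the Hangul syllables fall under Other Letter, the space under Space Separator, and the parentheses under Open and Close Punctuation.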

Unique

Unique	248
Unique (%)	15.7%
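
"Distinct" and "Unique" measure different things: as far as this report's figures suggest, Distinct is the number of different values in the column, while Unique is the number of values that occur exactly once. A small sketch of that distinction, with a hypothetical helper name:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

# Five values taken from the "Sample" rows of this variable.
sample = ["가자미(가공)", "가자미(가공)", "가자미(가공)", "가자미(가공)", "가자미(일반)"]
n_distinct, n_unique = distinct_and_unique(sample)
print(n_distinct, n_unique)  # 2 1
```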

Sample

1st row	가자미(가공)
2nd row	가자미(가공)
3rd row	가자미(가공)
4th row	가자미(가공)
5th row	가자미(일반)

Value	Count	Frequency (%)
냉동	649	21.1%
신선	389	12.7%
활	282	9.2%
건	166	5.4%
기타	51	1.7%
오징어	41	1.3%
게	34	1.1%
갈치	30	1.0%
고등어	29	0.9%
새우	28	0.9%
Other values (345)	1370	44.6%
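
The entries in this table look like whitespace-separated words rather than whole cell values: 활 and 건, for example, are the leading words of rows such as 활 해삼 and 건 가문어, and the counts sum to 3,069, which equals the 1,583 rows plus the 1,486 spaces in the column. A sketch of that kind of word count, assuming plain whitespace splitting (the profiler's actual tokenization may differ):

```python
from collections import Counter

def word_counts(values):
    """Count whitespace-separated words across all strings."""
    counts = Counter()
    for value in values:
        counts.update(value.split())
    return counts

# A few illustrative rows in the style of this column.
rows = ["활 해삼", "활 해삼", "활 홍합", "건 가문어"]
counts = word_counts(rows)
print(counts.most_common(2))  # [('활', 3), ('해삼', 2)]
```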

Most occurring characters

Value	Count	Frequency (%)
(space)	1486	16.5%
동	663	7.3%
냉	649	7.2%
신	389	4.3%
선	389	4.3%
어	369	4.1%
기	295	3.3%
활	282	3.1%
타	205	2.3%
)	202	2.2%
Other values (211)	4097	45.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	7136	79.1%
Space Separator	1486	16.5%
Close Punctuation	202	2.2%
Open Punctuation	202	2.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	663	9.3%
냉	649	9.1%
신	389	5.5%
선	389	5.5%
어	369	5.2%
기	295	4.1%
활	282	4.0%
타	205	2.9%
가	172	2.4%
치	172	2.4%
Other values (208)	3551	49.8%

Space Separator

Value	Count	Frequency (%)
(space)	1486	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	202	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	202	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	7136	79.1%
Common	1890	20.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	663	9.3%
냉	649	9.1%
신	389	5.5%
선	389	5.5%
어	369	5.2%
기	295	4.1%
활	282	4.0%
타	205	2.9%
가	172	2.4%
치	172	2.4%
Other values (208)	3551	49.8%

Common

Value	Count	Frequency (%)
(space)	1486	78.6%
)	202	10.7%
(	202	10.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	7136	79.1%
ASCII	1890	20.9%
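
Block membership depends only on a character's code point. The sketch below classifies characters into the two blocks that appear in this table, assuming "Hangul" refers to the Hangul Syllables block (U+AC00 to U+D7AF) and "ASCII" to U+0000 to U+007F. The range table and function names are illustrative.

```python
from collections import Counter

# Assumption: only the two blocks reported above are distinguished.
BLOCK_RANGES = [
    ("ASCII", 0x0000, 0x007F),
    ("Hangul", 0xAC00, 0xD7AF),  # Hangul Syllables block
]

def block_of(ch):
    """Return the block label for a character, or 'Other' if not listed."""
    code_point = ord(ch)
    for name, start, end in BLOCK_RANGES:
        if start <= code_point <= end:
            return name
    return "Other"

def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for value in values for ch in value)

counts = block_counts(["활 홍합", "가자미(가공)"])
print(counts)
```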

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	1486	78.6%
)	202	10.7%
(	202	10.7%

Hangul

Value	Count	Frequency (%)
동	663	9.3%
냉	649	9.1%
신	389	5.5%
선	389	5.5%
어	369	5.2%
기	295	4.1%
활	282	4.0%
타	205	2.9%
가	172	2.4%
치	172	2.4%
Other values (208)	3551	49.8%

원산지 (origin)
Text

Distinct	88
Distinct (%)	5.6%
Missing	0
Missing (%)	0.0%
Memory size	12.5 KiB

Length

Max length	12
Median length	2
Mean length	3.0233733
Min length	2

Characters and Unicode

Total characters	4786
Distinct characters	134
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	22
Unique (%)	1.4%

Sample

1st row	기니
2nd row	세네갈
3rd row	원양산
4th row	한국
5th row	기니

Value	Count	Frequency (%)
한국	442	24.8%
중국	183	10.3%
러시아	78	4.4%
연방	78	4.4%
원양산	59	3.3%
베트남	56	3.1%
수입산	53	3.0%
일본	53	3.0%
동해산	49	2.8%
남해안	49	2.8%
Other values (88)	679	38.2%

Most occurring characters

Value	Count	Frequency (%)
국	687	14.4%
한	442	9.2%
(space)	235	4.9%
시	191	4.0%
중	185	3.9%
남	181	3.8%
아	179	3.7%
산	173	3.6%
해	113	2.4%
수	84	1.8%
Other values (124)	2316	48.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	4551	95.1%
Space Separator	235	4.9%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
국	687	15.1%
한	442	9.7%
시	191	4.2%
중	185	4.1%
남	181	4.0%
아	179	3.9%
산	173	3.8%
해	113	2.5%
수	84	1.8%
연	79	1.7%
Other values (123)	2237	49.2%

Space Separator

Value	Count	Frequency (%)
(space)	235	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	4551	95.1%
Common	235	4.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
국	687	15.1%
한	442	9.7%
시	191	4.2%
중	185	4.1%
남	181	4.0%
아	179	3.9%
산	173	3.8%
해	113	2.5%
수	84	1.8%
연	79	1.7%
Other values (123)	2237	49.2%

Common

Value	Count	Frequency (%)
(space)	235	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	4551	95.1%
ASCII	235	4.9%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
국	687	15.1%
한	442	9.7%
시	191	4.2%
중	185	4.1%
남	181	4.0%
아	179	3.9%
산	173	3.8%
해	113	2.5%
수	84	1.8%
연	79	1.7%
Other values (123)	2237	49.2%

ASCII

Value	Count	Frequency (%)
(space)	235	100.0%

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	품종	원산지
0	가자미(가공)	기니
1	가자미(가공)	세네갈
2	가자미(가공)	원양산
3	가자미(가공)	한국
4	가자미(일반)	기니
5	가자미(일반)	세네갈
6	가자미(일반)	스페인
7	갈치속젓	한국
8	갈치젓	한국
9	건 가문어	페루

Last rows

	품종	원산지
1573	활 해삼	일본
1574	활 해삼	중국
1575	활 해삼	한국
1576	활 홍삼치	한국
1577	활 홍어	아르헨티나
1578	활 홍합	중국
1579	활 홍합	한국
1580	활 홍해삼	한국
1581	활 황석어	한국
1582	황석어젓	한국
