gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	1873
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	29.4 KiB
Average record size in memory	16.1 B
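The overview counters above can be recomputed without a profiling library. Below is a minimal standard-library sketch. It assumes rows are held as tuples and that `None` marks a missing cell; the report's own missing-value rules may differ. `dataset_statistics` is a hypothetical helper, not part of ydata-profiling. The last line reproduces the average record size from the two memory figures in the table.

```python
from collections import Counter

# Illustrative rows: the first five (품종, 원산지) pairs from the sample table.
rows = [
    ("냉동 명태", "러시아 연방"),
    ("활 전어", "한국"),
    ("냉동 갈치", "한국"),
    ("신선 굴", "한국"),
    ("건 과메기", "원양산"),
]


def dataset_statistics(rows):
    """Hypothetical helper: recompute the overview counters for equal-length row tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    # Assumption: None marks a missing cell; empty strings count as present.
    missing = sum(1 for row in rows for cell in row if cell is None)
    # Every repeat of an identical row beyond its first occurrence is a duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    total_cells = n_obs * n_vars
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


print(dataset_statistics(rows))

# Average record size = total memory / rows: 29.4 KiB * 1024 / 1873 is about 16.1 bytes.
print(round(29.4 * 1024 / 1873, 1))
```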

Variable types

Text	2

Dataset

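Description	대구광역시_수산물 거래 품종별 원산지 목록_20210630 (Daegu Metropolitan City: list of origins of traded seafood by variety, 20210630)
Author	대구광역시 (Daegu Metropolitan City)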
URL	http://data.daegu.go.kr/open/data/dataView.do?dataSetId=15086039&dataSetDetailId=150860391c4917c273346&provdMethod=FILE

Reproduction

Analysis started	2023-12-10 19:48:16.095922
Analysis finished	2023-12-10 19:48:16.642226
Duration	0.55 seconds
Software version	ydata-profiling v4.5.1
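The Duration row is the difference between the two timestamps above. The rounding to two decimals is an assumption inferred from the value shown.

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-12-10 19:48:16.095922")
finished = datetime.fromisoformat("2023-12-10 19:48:16.642226")

# 0.546304 seconds elapsed, displayed with two decimals.
elapsed = (finished - started).total_seconds()
print(f"Duration: {elapsed:.2f} seconds")  # Duration: 0.55 seconds
```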

Variables

품종 (variety)
Text

Distinct	705
Distinct (%)	37.6%
Missing	0
Missing (%)	0.0%
Memory size	14.8 KiB

Length

Max length	12
Median length	11
Mean length	5.2712226
Min length	2
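The length statistics count Unicode code points, so the space between words counts as one character. The sketch below runs over a ten-row subset, which is illustrative only. It takes the plain median of per-row lengths; whether the report computes Median length the same way is an assumption. The last line checks the report's own figures: Mean length multiplied by the number of observations should give Total characters (5.2712226 × 1873 ≈ 9873).

```python
import statistics

# Illustrative subset: the first ten 품종 values from the sample table.
values = ["냉동 명태", "활 전어", "냉동 갈치", "신선 굴", "건 과메기",
          "냉동 명태", "냉동 고등어", "활 숭어", "건 멸치", "신선 홍합"]


def length_summary(values):
    """Length statistics over code points; spaces count as characters."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


print(length_summary(values))

# Consistency check on the report: mean length x observations = total characters.
print(round(5.2712226 * 1873))  # 9873
```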

Characters and Unicode

Total characters	9873
Distinct characters	316
Distinct categories	6
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
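The category names used in the tables below are the long forms of Unicode General Category values. Python's standard `unicodedata.category` returns the two-letter codes. The mapping dict in this sketch covers only the six categories this report shows. `category_counts` is a hypothetical helper, not ydata-profiling's own code.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Pe": "Close Punctuation",
    "Ps": "Open Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per General Category, using long names where known."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hangul syllables and the Han character 土 are both "Other Letter" (Lo).
print(category_counts(["냉동 갈치(기타)", "土"]))
```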

Unique

Unique	330
Unique (%)	17.6%
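Distinct counts the different values in the column. Unique counts the values that occur exactly once. Both percentages are taken against the 1,873 observations (705 / 1873 = 37.6%, 330 / 1873 = 17.6%). Below is a short sketch of both definitions; `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Illustrative subset: the first ten 품종 values ("냉동 명태" appears twice).
values = ["냉동 명태", "활 전어", "냉동 갈치", "신선 굴", "건 과메기",
          "냉동 명태", "냉동 고등어", "활 숭어", "건 멸치", "신선 홍합"]
print(distinct_and_unique(values))  # (9, 90.0, 8, 80.0)
```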

Sample

1st row	냉동 명태
2nd row	활 전어
3rd row	냉동 갈치
4th row	신선 굴
5th row	건 과메기

Most occurring words

Value	Count	Frequency (%)
냉동	672	20.1%
신선	363	10.8%
활	281	8.4%
건	155	4.6%
기타	56	1.7%
오징어	41	1.2%
고등어	29	0.9%
새우	29	0.9%
갈치	27	0.8%
가자미	25	0.7%
Other values (488)	1668	49.9%
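These counts are per word, not per row. The listed counts sum to 3,346, so 672 / 3,346 gives the 20.1% shown for 냉동. The sketch below tallies words the same way. It assumes words are split on whitespace only, with no punctuation stripping or lowercasing; ydata-profiling's tokenizer may differ. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(values, top=10):
    """Top words with counts and % of all words; the rest folded into one row."""
    words = Counter(word for value in values for word in value.split())
    total = sum(words.values())
    table = [(w, c, 100 * c / total) for w, c in words.most_common(top)]
    rest = words.most_common()[top:]
    if rest:
        n_rest = sum(c for _, c in rest)
        table.append((f"Other values ({len(rest)})", n_rest, 100 * n_rest / total))
    return table


print(word_frequencies(["냉동 명태", "활 전어", "냉동 갈치", "냉동 명태"], top=2))

# Total word count implied by the table above.
print(sum([672, 363, 281, 155, 56, 41, 29, 29, 27, 25, 1668]))  # 3346
```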

Most occurring characters

Value	Count	Frequency (%)
[space]	1473	14.9%
동	688	7.0%
냉	672	6.8%
신	367	3.7%
선	363	3.7%
어	348	3.5%
기	307	3.1%
활	287	2.9%
)	192	1.9%
(	192	1.9%
Other values (306)	4984	50.5%
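Character frequencies work the same way over individual code points, with each share taken against Total characters (1473 / 9873 = 14.9% for the space). The display rule in `fmt_pct` is an assumption inferred from the tables in this report. The Common script table prints 1 / 1877 as 0.1%, while the categories table prints 3 / 9873 as < 0.1%. That pattern fits the rule "round to one decimal, and show < 0.1% when a non-zero share would round to 0.0%". Both helpers are hypothetical.

```python
from collections import Counter


def fmt_pct(count, total):
    """Assumed display rule: one decimal place; "< 0.1%" when a non-zero share rounds to 0.0."""
    text = f"{100 * count / total:.1f}"
    if count > 0 and text == "0.0":
        return "< 0.1%"
    return text + "%"


def char_frequencies(values, top=10):
    """Top characters with formatted shares; the rest folded into one row."""
    chars = Counter(ch for value in values for ch in value)
    total = sum(chars.values())
    # Label the space character so it stays visible in a plain-text table.
    table = [("[space]" if ch == " " else ch, c, fmt_pct(c, total))
             for ch, c in chars.most_common(top)]
    rest = chars.most_common()[top:]
    if rest:
        n_rest = sum(c for _, c in rest)
        table.append((f"Other values ({len(rest)})", n_rest, fmt_pct(n_rest, total)))
    return table


print(fmt_pct(1473, 9873))  # 14.9%
print(fmt_pct(3, 9873))     # < 0.1%
print(fmt_pct(1, 1877))     # 0.1%
```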

Most occurring categories

Value	Count	Frequency (%)
Other Letter	7996	81.0%
Space Separator	1473	14.9%
Close Punctuation	192	1.9%
Open Punctuation	192	1.9%
Other Punctuation	17	0.2%
Decimal Number	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	688	8.6%
냉	672	8.4%
신	367	4.6%
선	363	4.5%
어	348	4.4%
기	307	3.8%
활	287	3.6%
타	186	2.3%
가	181	2.3%
치	164	2.1%
Other values (300)	4433	55.4%

Decimal Number

Value	Count	Frequency (%)
1	2	66.7%
4	1	33.3%

Space Separator

Value	Count	Frequency (%)
[space]	1473	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	192	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	192	100.0%

Other Punctuation

Value	Count	Frequency (%)
/	17	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	7995	81.0%
Common	1877	19.0%
Han	1	< 0.1%
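Python's standard library does not expose the Unicode Script property. The sketch below is therefore a rough heuristic built on character names and categories, and it is an approximation, not the lookup the report uses. It is enough to separate the three scripts seen here: Hangul syllables, the single Han character 土, and Common, which covers the space, punctuation and digits. `approx_script` is a hypothetical helper.

```python
import unicodedata


def approx_script(ch):
    """Rough script guess; a heuristic, not a substitute for Unicode Scripts.txt data."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    # Punctuation (P*), separators (Z*), numbers (N*) and symbols (S*) map to Common here.
    if unicodedata.category(ch)[0] in "PZNS":
        return "Common"
    return "Other"


print([approx_script(ch) for ch in "동 (1/土"])
```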

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	688	8.6%
냉	672	8.4%
신	367	4.6%
선	363	4.5%
어	348	4.4%
기	307	3.8%
활	287	3.6%
타	186	2.3%
가	181	2.3%
치	164	2.1%
Other values (299)	4432	55.4%

Common

Value	Count	Frequency (%)
[space]	1473	78.5%
)	192	10.2%
(	192	10.2%
/	17	0.9%
1	2	0.1%
4	1	0.1%

Han

Value	Count	Frequency (%)
土	1	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	7995	81.0%
ASCII	1877	19.0%
CJK	1	< 0.1%
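Unicode blocks are fixed code-point ranges, so a block lookup needs only a small range table. The ranges below are the standard Basic Latin, Hangul Syllables and CJK Unified Ideographs blocks. The short labels mirror this report's ("ASCII", "Hangul", "CJK"). Whether the report groups any other blocks under those labels is not shown here, so the table covers only what appears above.

```python
# (start, end, label) for the three blocks seen in this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),   # Basic Latin
    (0xAC00, 0xD7AF, "Hangul"),  # Hangul Syllables
    (0x4E00, 0x9FFF, "CJK"),     # CJK Unified Ideographs
]


def block_of(ch):
    """Return the label of the block containing ch, or "Other" if not listed."""
    cp = ord(ch)
    for start, end, label in BLOCKS:
        if start <= cp <= end:
            return label
    return "Other"


print([block_of(ch) for ch in "동 (土"])
```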

Most frequent character per block

ASCII

Value	Count	Frequency (%)
[space]	1473	78.5%
)	192	10.2%
(	192	10.2%
/	17	0.9%
1	2	0.1%
4	1	0.1%

Hangul

Value	Count	Frequency (%)
동	688	8.6%
냉	672	8.4%
신	367	4.6%
선	363	4.5%
어	348	4.4%
기	307	3.8%
활	287	3.6%
타	186	2.3%
가	181	2.3%
치	164	2.1%
Other values (299)	4432	55.4%

CJK

Value	Count	Frequency (%)
土	1	100.0%

원산지 (origin)
Text

Distinct	141
Distinct (%)	7.5%
Missing	0
Missing (%)	0.0%
Memory size	14.8 KiB

Length

Max length	9
Median length	8
Mean length	3.6572344
Min length	2

Characters and Unicode

Total characters	6850
Distinct characters	164
Distinct categories	2
Distinct scripts	2
Distinct blocks	2


Unique

Unique	39
Unique (%)	2.1%

Sample

1st row	러시아 연방
2nd row	한국
3rd row	한국
4th row	한국
5th row	원양산

Most occurring words

Value	Count	Frequency (%)
한국	418	17.3%
경북	209	8.7%
중국	156	6.5%
동해산	79	3.3%
러시아	73	3.0%
연방	73	3.0%
전남	63	2.6%
베트남	61	2.5%
수입산	59	2.4%
의성군	56	2.3%
Other values (142)	1163	48.3%

Most occurring characters

Value	Count	Frequency (%)
[space]	712	10.4%
국	632	9.2%
한	418	6.1%
시	345	5.0%
경	273	4.0%
남	236	3.4%
산	234	3.4%
북	233	3.4%
군	180	2.6%
아	173	2.5%
Other values (154)	3414	49.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	6138	89.6%
Space Separator	712	10.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
국	632	10.3%
한	418	6.8%
시	345	5.6%
경	273	4.4%
남	236	3.8%
산	234	3.8%
북	233	3.8%
군	180	2.9%
아	173	2.8%
중	163	2.7%
Other values (153)	3251	53.0%

Space Separator

Value	Count	Frequency (%)
[space]	712	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	6138	89.6%
Common	712	10.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
국	632	10.3%
한	418	6.8%
시	345	5.6%
경	273	4.4%
남	236	3.8%
산	234	3.8%
북	233	3.8%
군	180	2.9%
아	173	2.8%
중	163	2.7%
Other values (153)	3251	53.0%

Common

Value	Count	Frequency (%)
[space]	712	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	6138	89.6%
ASCII	712	10.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
[space]	712	100.0%

Hangul

Value	Count	Frequency (%)
국	632	10.3%
한	418	6.8%
시	345	5.6%
경	273	4.4%
남	236	3.8%
산	234	3.8%
북	233	3.8%
군	180	2.9%
아	173	2.8%
중	163	2.7%
Other values (153)	3251	53.0%

Missing values

Count (plot not preserved): a simple visualization of nullity by column.

Matrix (plot not preserved): a data-dense display of nullity that lets you quickly pick out patterns in data completion.
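Both views reduce to simple per-cell checks. The Count view shows how many non-missing values each column has. The Matrix view marks each cell as present or missing, row by row. This dataset has no missing cells, so the row containing `None` below is hypothetical and exists only to show the output. Treating `None` as missing is an assumption, and both helpers are hypothetical.

```python
def nullity_counts(columns, rows):
    """Non-missing count per column (the idea behind the Count view)."""
    return {name: sum(1 for row in rows if row[i] is not None)
            for i, name in enumerate(columns)}


def nullity_matrix(rows):
    """Text nullity matrix: '#' for a present cell, '.' for a missing one."""
    return ["".join("." if cell is None else "#" for cell in row) for row in rows]


columns = ("품종", "원산지")
# Hypothetical rows; the second has a missing 원산지 purely for illustration.
rows = [("냉동 명태", "러시아 연방"), ("활 전어", None), ("냉동 갈치", "한국")]
print(nullity_counts(columns, rows))  # {'품종': 3, '원산지': 2}
for line in nullity_matrix(rows):
    print(line)
```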

Sample

First rows

	품종	원산지
0	냉동 명태	러시아 연방
1	활 전어	한국
2	냉동 갈치	한국
3	신선 굴	한국
4	건 과메기	원양산
5	냉동 명태	수입산
6	냉동 고등어	한국
7	활 숭어	한국
8	건 멸치	한국
9	신선 홍합	한국

Last rows

	품종	원산지
1863	냉동 참치방어	일본
1864	냉동 고둥	아르헨티나
1865	냉동 조기	아이슬란드
1866	활 개불	수입산
1867	신선 전갱이	부산 서구
1868	냉동 갈치(기타)	남해안
1869	활 홍살치	한국
1870	굼벵이(제조)	대구 달성군
1871	원삼	경북 의성군
1872	냉동 서대	대서양
