gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	468.8 KiB
Average record size in memory	48.0 B
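The derived figures in this table can be checked from the raw counts. The following is a minimal sketch, assuming that the reported 468.8 KiB is a rounded value and that 1 KiB equals 1024 B; the variable names are illustrative and do not come from the report.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 10_000
missing_cells = 0
total_size_kib = 468.8  # rounded in the report (assumption: 1 KiB = 1024 B)

# Share of missing cells over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row, derived from the total memory footprint.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```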

Variable types

Text	3
Categorical	2

Dataset

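Description	Information showing the packaging status codes of the 2013 agricultural, livestock, and fishery products standard code that have the same meaning as the packaging status codes of the standard code enacted or revised in 2015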
Author	농림수산식품교육문화정보원
URL	https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220210000000001768

Alerts

UPDT_DE is highly imbalanced (93.6%)
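The 93.6% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the value counts. The sketch below assumes that definition (the report itself does not state the formula) and uses the two UPDT_DE counts listed later in the report, 9925 and 75; the function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy: 0 for a uniform split, near 1 when one class dominates."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0  # assumption: a single class is reported as "no imbalance"
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Value counts of UPDT_DE as shown in the report.
score = imbalance_score([9925, 75])
print(f"{score:.1%}")
```

Under this assumption the score rounds to the same 93.6% shown in the alert.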

Reproduction

Analysis started	2024-04-21 01:00:35.197911
Analysis finished	2024-04-21 01:00:36.382445
Duration	1.18 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

STD_FRMLC_NEW_CODE
Text

Distinct	57
Distinct (%)	0.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Characters and Unicode

Total characters	30000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
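As an illustration of such character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation: the sample strings are illustrative (the value "1ZZ" is assumed from the character tables), and script and block names are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lu') over every character in values."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)

# Three-character codes like those in this column (illustrative sample).
codes = ["714", "703", "1ZZ"]
print(dict(category_counts(codes)))

# A mixed Latin/Hangul value like those in STD_FRMLC_NM.
print(dict(category_counts(["ton 코 3미"])))
```

Here `Nd` corresponds to "Decimal Number", `Lu` to "Uppercase Letter", `Ll` to "Lowercase Letter", `Lo` to "Other Letter", and `Zs` to "Space Separator" in the tables of this report.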

Unique

Unique	1
Unique (%)	< 0.1%
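"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts the values that occur exactly once. A minimal sketch of that distinction, using the five sample rows of this variable; the helper name is hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

# The five sample rows shown for STD_FRMLC_NEW_CODE.
sample = ["714", "703", "117", "117", "117"]
n_distinct, n_unique = distinct_and_unique(sample)
print(n_distinct, n_unique)

# The percentage rows divide by the number of observations.
distinct_pct = 100 * 57 / 10_000
print(f"Distinct (%): {distinct_pct:.1f}%")
```

In the sample, "117" repeats, so it counts as distinct but not unique.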

Sample

1st row	714
2nd row	703
3rd row	117
4th row	117
5th row	117

Value	Count	Frequency (%)
1zz	487	4.9%
108	475	4.8%
7zz	456	4.6%
701	447	4.5%
703	445	4.5%
106	247	2.5%
101	246	2.5%
110	244	2.4%
114	241	2.4%
715	238	2.4%
Other values (47)	6474	64.7%

Most occurring characters

Value	Count	Frequency (%)
1	10380	34.6%
7	5287	17.6%
0	5198	17.3%
Z	2154	7.2%
3	1609	5.4%
8	1235	4.1%
4	995	3.3%
2	977	3.3%
5	954	3.2%
6	712	2.4%
Other values (2)	499	1.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	27824	92.7%
Uppercase Letter	2176	7.3%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	10380	37.3%
7	5287	19.0%
0	5198	18.7%
3	1609	5.8%
8	1235	4.4%
4	995	3.6%
2	977	3.5%
5	954	3.4%
6	712	2.6%
9	477	1.7%

Uppercase Letter

Value	Count	Frequency (%)
Z	2154	99.0%
A	22	1.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	27824	92.7%
Latin	2176	7.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	10380	37.3%
7	5287	19.0%
0	5198	18.7%
3	1609	5.8%
8	1235	4.4%
4	995	3.6%
2	977	3.5%
5	954	3.4%
6	712	2.6%
9	477	1.7%

Latin

Value	Count	Frequency (%)
Z	2154	99.0%
A	22	1.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	30000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	10380	34.6%
7	5287	17.6%
0	5198	17.3%
Z	2154	7.2%
3	1609	5.4%
8	1235	4.1%
4	995	3.3%
2	977	3.3%
5	954	3.2%
6	712	2.4%
Other values (2)	499	1.7%

STD_FRMLC_NEW_NM
Categorical

Distinct	35
Distinct (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타	1077
상자	752
봉지	683
PP대	509
그물망	490
Other values (30)	6489

Length

Max length	6
Median length	5
Mean length	2.1192
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	코
2nd row	D/M
3rd row	축
4th row	축
5th row	축

Common Values

Value	Count	Frequency (%)
기타	1077	10.8%
상자	752	7.5%
봉지	683	6.8%
PP대	509	5.1%
그물망	490	4.9%
속	464	4.6%
D/M	445	4.5%
쾌	440	4.4%
포	430	4.3%
축	419	4.2%
Other values (25)	4291	42.9%

Length

[Figure: Histogram of lengths of the category]

Value	Count	Frequency (%)
기타	1077	10.8%
상자	752	7.5%
봉지	683	6.8%
pp대	509	5.1%
그물망	490	4.9%
속	464	4.6%
d/m	445	4.5%
쾌	440	4.4%
포	430	4.3%
축	419	4.2%
Other values (25)	4291	42.9%

STD_FRMLC_CODE
Text

Distinct	64
Distinct (%)	0.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Characters and Unicode

Total characters	30000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	714
2nd row	703
3rd row	117
4th row	117
5th row	117

Value	Count	Frequency (%)
106	247	2.5%
1zz	246	2.5%
101	246	2.5%
110	244	2.4%
109	244	2.4%
7zz	241	2.4%
114	241	2.4%
100	241	2.4%
715	238	2.4%
112	237	2.4%
Other values (54)	7575	75.8%

Most occurring characters

Value	Count	Frequency (%)
1	10374	34.6%
0	6017	20.1%
7	5287	17.6%
3	1609	5.4%
Z	1114	3.7%
4	995	3.3%
8	991	3.3%
2	977	3.3%
5	954	3.2%
6	939	3.1%
Other values (2)	743	2.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	28864	96.2%
Uppercase Letter	1136	3.8%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	10374	35.9%
0	6017	20.8%
7	5287	18.3%
3	1609	5.6%
4	995	3.4%
8	991	3.4%
2	977	3.4%
5	954	3.3%
6	939	3.3%
9	721	2.5%

Uppercase Letter

Value	Count	Frequency (%)
Z	1114	98.1%
A	22	1.9%

Most occurring scripts

Value	Count	Frequency (%)
Common	28864	96.2%
Latin	1136	3.8%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	10374	35.9%
0	6017	20.8%
7	5287	18.3%
3	1609	5.6%
4	995	3.4%
8	991	3.4%
2	977	3.4%
5	954	3.3%
6	939	3.3%
9	721	2.5%

Latin

Value	Count	Frequency (%)
Z	1114	98.1%
A	22	1.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	30000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	10374	34.6%
0	6017	20.1%
7	5287	17.6%
3	1609	5.4%
Z	1114	3.7%
4	995	3.3%
8	991	3.3%
2	977	3.3%
5	954	3.2%
6	939	3.1%
Other values (2)	743	2.5%

STD_FRMLC_NM
Text

Distinct	9556
Distinct (%)	95.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	25
Median length	22
Mean length	8.978
Min length	1

Characters and Unicode

Total characters	89780
Distinct characters	95
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

Unique

Unique	9133
Unique (%)	91.3%

Sample

1st row	ton 코 3미
2nd row	g D/M 1방
3rd row	ton 축 500개이상
4th row	l 축 80내
5th row	ton 축 35내

Value	Count	Frequency (%)
g	2088	7.7%
kg	2071	7.6%
ton	2046	7.5%
l	969	3.6%
ml	807	3.0%
기타	727	2.7%
상자	525	1.9%
pp대	509	1.9%
그물망	490	1.8%
속	475	1.7%
Other values (181)	16524	60.7%

Most occurring characters

Value	Count	Frequency (%)
(space)	17231	19.2%
0	5499	6.1%
g	4112	4.6%
내	4090	4.6%
1	3957	4.4%
5	2347	2.6%
2	2312	2.6%
P	2196	2.4%
개	2153	2.4%
k	2067	2.3%
Other values (85)	43816	48.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	24882	27.7%
Decimal Number	20356	22.7%
Space Separator	17231	19.2%
Lowercase Letter	16514	18.4%
Uppercase Letter	5932	6.6%
Other Punctuation	1557	1.7%
Open Punctuation	1134	1.3%
Close Punctuation	1059	1.2%
Math Symbol	570	0.6%
Dash Punctuation	545	0.6%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
내	4090	16.4%
개	2153	8.7%
미	1473	5.9%
상	1126	4.5%
대	1017	4.1%
단	990	4.0%
기	971	3.9%
타	727	2.9%
지	683	2.7%
봉	683	2.7%
Other values (45)	10969	44.1%

Uppercase Letter

Value	Count	Frequency (%)
P	2196	37.0%
B	498	8.4%
T	456	7.7%
S	396	6.7%
M	389	6.6%
E	272	4.6%
X	271	4.6%
O	271	4.6%
C	227	3.8%
D	224	3.8%
Other values (5)	732	12.3%

Decimal Number

Value	Count	Frequency (%)
0	5499	27.0%
1	3957	19.4%
5	2347	11.5%
2	2312	11.4%
3	1802	8.9%
4	1291	6.3%
8	972	4.8%
7	906	4.5%
6	675	3.3%
9	595	2.9%

Lowercase Letter

Value	Count	Frequency (%)
g	4112	24.9%
k	2067	12.5%
n	2046	12.4%
o	2046	12.4%
t	2044	12.4%
m	1906	11.5%
l	1723	10.4%
c	570	3.5%

Other Punctuation

Value	Count	Frequency (%)
/	901	57.9%
.	656	42.1%

Space Separator

Value	Count	Frequency (%)
(space)	17231	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1134	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1059	100.0%

Math Symbol

Value	Count	Frequency (%)
×	570	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	545	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	42452	47.3%
Hangul	24882	27.7%
Latin	22446	25.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
내	4090	16.4%
개	2153	8.7%
미	1473	5.9%
상	1126	4.5%
대	1017	4.1%
단	990	4.0%
기	971	3.9%
타	727	2.9%
지	683	2.7%
봉	683	2.7%
Other values (45)	10969	44.1%

Latin

Value	Count	Frequency (%)
g	4112	18.3%
P	2196	9.8%
k	2067	9.2%
n	2046	9.1%
o	2046	9.1%
t	2044	9.1%
m	1906	8.5%
l	1723	7.7%
c	570	2.5%
B	498	2.2%
Other values (13)	3238	14.4%

Common

Value	Count	Frequency (%)
(space)	17231	40.6%
0	5499	13.0%
1	3957	9.3%
5	2347	5.5%
2	2312	5.4%
3	1802	4.2%
4	1291	3.0%
(	1134	2.7%
)	1059	2.5%
8	972	2.3%
Other values (7)	4848	11.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	64328	71.7%
Hangul	24882	27.7%
None	570	0.6%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	17231	26.8%
0	5499	8.5%
g	4112	6.4%
1	3957	6.2%
5	2347	3.6%
2	2312	3.6%
P	2196	3.4%
k	2067	3.2%
n	2046	3.2%
o	2046	3.2%
Other values (29)	20515	31.9%

Hangul

Value	Count	Frequency (%)
내	4090	16.4%
개	2153	8.7%
미	1473	5.9%
상	1126	4.5%
대	1017	4.1%
단	990	4.0%
기	971	3.9%
타	727	2.9%
지	683	2.7%
봉	683	2.7%
Other values (45)	10969	44.1%

None

Value	Count	Frequency (%)
×	570	100.0%

UPDT_DE
Categorical

IMBALANCE

Distinct	2
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

20220127	9925
뿌리)	75

Length

Max length	8
Median length	8
Mean length	7.9625
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	20220127
2nd row	20220127
3rd row	20220127
4th row	20220127
5th row	20220127

Common Values

Value	Count	Frequency (%)
20220127	9925	99.2%
뿌리)	75	0.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Value	Count	Frequency (%)
20220127	9925	99.2%
뿌리	75	0.8%

Correlations

	STD_FRMLC_NEW_CODE	STD_FRMLC_NEW_NM	STD_FRMLC_CODE	UPDT_DE
STD_FRMLC_NEW_CODE	1.000	1.000	1.000	0.072
STD_FRMLC_NEW_NM	1.000	1.000	1.000	0.046
STD_FRMLC_CODE	1.000	1.000	1.000	0.070
UPDT_DE	0.072	0.046	0.070	1.000
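The report does not name the association measure behind these coefficients. One common choice for pairs of categorical variables is Cramér's V, which ranges from 0 (no association) to 1 (one variable fully determines the other). The sketch below is an assumption-laden illustration of that measure without bias correction, so it is not guaranteed to reproduce the exact values in the table; the function name and sample data are hypothetical.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row = Counter(xs)              # marginal counts of xs
    col = Counter(ys)              # marginal counts of ys
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0                 # undefined when a variable is constant; treated as 0 here
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Codes and names that map one-to-one give the maximum association.
codes = ["714", "703", "117", "117"]
names = ["코", "D/M", "축", "축"]
print(cramers_v(codes, names))
```

A one-to-one mapping between code and name, as in the sample above, is the situation that yields coefficients of 1.000 between the code and name columns.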

	UPDT_DE	STD_FRMLC_NEW_NM
UPDT_DE	1.000	0.038
STD_FRMLC_NEW_NM	0.038	1.000

	STD_FRMLC_NEW_NM	UPDT_DE
STD_FRMLC_NEW_NM	1.000	0.038
UPDT_DE	0.038	1.000

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

First rows

	STD_FRMLC_NEW_CODE	STD_FRMLC_NEW_NM	STD_FRMLC_CODE	STD_FRMLC_NM	UPDT_DE
14870	714	코	714	ton 코 3미	20220127
11466	703	D/M	703	g D/M 1방	20220127
6735	117	축	117	ton 축 500개이상	20220127
6620	117	축	117	l 축 80내	20220127
6732	117	축	117	ton 축 35내	20220127
14953	714	코	714	ton 코 80내	20220127
10868	702	PAN(펜)	702	PAN(펜) L	20220127
14262	711	각	711	g 각 170내	20220127
2495	107	파렛트	107	g 파렛트 22개	20220127
11484	703	D/M	713	깡 400내	20220127

Last rows

	STD_FRMLC_NEW_CODE	STD_FRMLC_NEW_NM	STD_FRMLC_CODE	STD_FRMLC_NM	UPDT_DE
5533	114	채	114	ml 채 18개	20220127
1546	104	PP대	104	ton PP대 22개	20220127
749	102	P-BOX	102	ml P-BOX 5개	20220127
10050	701	상자	706	ton C/T(B/T) 20미	20220127
10367	701	상자	706	C/T(B/T) 5통	20220127
3910	110	접시용기	110	ml 접시용기 19개	20220127
12228	705	그물망	705	g 그물망 9통	20220127
10703	702	PAN(펜)	702	ton PAN(펜) 3방	20220127
11688	704	PP대	704	PP대 6미	20220127
5748	115	속	115	속 40내	20220127
