gimi9 Pandas Profiling

Dataset statistics

Number of variables	10
Number of observations	643
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	52.2 KiB
Average record size in memory	83.2 B

Variable types

Numeric	1
Categorical	6
Text	2
DateTime	1

Dataset

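Description	부산광역시_식품방사능검사현황_20201231 (Busan Metropolitan City food radioactivity inspection status, as of 2020-12-31)
Author	부산광역시 (Busan Metropolitan City)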
URL	http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083358

Alerts

`세슘검출량(Bq/kg)` has constant value "0"	Constant
`요오드검출량(Bq/kg)` has constant value "0"	Constant
`적부판정` has constant value "적합"	Constant
`연번` is highly overall correlated with `원산지`	High correlation
`원산지` is highly overall correlated with `연번` and 1 other field (`수입국`)	High correlation
`수입국` is highly overall correlated with `원산지`	High correlation
`수입국` is highly imbalanced (60.7%)	Imbalance
`연번` has unique values	Unique
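The Constant and Unique alerts can be checked against the per-variable distinct counts reported in the variable sections. The sketch below assumes a simple rule: a column is Constant when it has exactly one distinct value, and Unique when every row holds a different value. This rule is our assumption, not ydata-profiling's source code. Missing values are ignored because this dataset has none.

```python
# Distinct-value counts per variable, copied from the variable sections of this report.
N_ROWS = 643
DISTINCT = {
    "연번": 643,
    "분류": 4,
    "제품명": 478,
    "품목(또는 식품유형)": 174,
    "수거일": 96,
    "원산지": 4,
    "수입국": 26,
    "세슘검출량(Bq/kg)": 1,
    "요오드검출량(Bq/kg)": 1,
    "적부판정": 1,
}


def flag_columns(distinct, n_rows):
    """Return the columns that would receive a Constant or Unique alert.

    Assumed rule (a sketch): Constant means exactly one distinct value;
    Unique means as many distinct values as there are rows.
    """
    return {
        "Constant": [name for name, d in distinct.items() if d == 1],
        "Unique": [name for name, d in distinct.items() if d == n_rows],
    }


flags = flag_columns(DISTINCT, N_ROWS)
for kind, names in flags.items():
    print(kind, names)
```

Under this rule the flagged columns match the alerts listed above.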

Reproduction

Analysis started	2024-03-30 09:11:29.195007
Analysis finished	2024-03-30 09:11:32.636692
Duration	3.44 seconds
Software version	ydata-profiling v4.5.1

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	643
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	322

Minimum	1
Maximum	643
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	5.8 KiB

Quantile statistics

Minimum	1
5-th percentile	33.1
Q1	161.5
median	322
Q3	482.5
95-th percentile	610.9
Maximum	643
Range	642
Interquartile range (IQR)	321

Descriptive statistics

Standard deviation	185.76239
Coefficient of variation (CV)	0.57690184
Kurtosis	-1.2
Mean	322
Median Absolute Deviation (MAD)	161
Skewness	0
Sum	207046
Variance	34507.667
Monotonicity	Strictly increasing
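Several of the statistics above can be reproduced with Python's `statistics` module. 연번 has 643 distinct values between 1 and 643, and the extreme-value tables list consecutive integers, so this sketch assumes the column is exactly the integers 1..643. The quantiles use `method="inclusive"` (linear interpolation between order statistics). We assume this matches the interpolation the report used, because it reproduces the listed percentiles.

```python
import statistics

# Assumption: 연번 is exactly 1..643 (643 distinct values, min 1, max 643).
serial = list(range(1, 644))

total = sum(serial)
mean = statistics.fmean(serial)
variance = statistics.variance(serial)  # sample variance (n - 1 denominator)
std = statistics.stdev(serial)          # sample standard deviation
cv = std / mean                         # coefficient of variation
median = statistics.median(serial)
# Median absolute deviation: median of the distances from the median.
mad = statistics.median(abs(x - median) for x in serial)

# method="inclusive" interpolates linearly between order statistics.
q1, _, q3 = statistics.quantiles(serial, n=4, method="inclusive")
ventiles = statistics.quantiles(serial, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

strictly_increasing = all(a < b for a, b in zip(serial, serial[1:]))

print(f"sum={total} mean={mean} median={median} MAD={mad}")
print(f"variance={variance:.3f} std={std:.5f} CV={cv:.8f}")
print(f"P5={p5:.1f} Q1={q1} Q3={q3} P95={p95:.1f} IQR={q3 - q1}")
print(f"strictly increasing: {strictly_increasing}")
```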

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
1	1	0.2%
404	1	0.2%
426	1	0.2%
427	1	0.2%
428	1	0.2%
429	1	0.2%
430	1	0.2%
431	1	0.2%
432	1	0.2%
433	1	0.2%
Other values (633)	633	98.4%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.2%
2	1	0.2%
3	1	0.2%
4	1	0.2%
5	1	0.2%
6	1	0.2%
7	1	0.2%
8	1	0.2%
9	1	0.2%
10	1	0.2%

Maximum 10 values

Value	Count	Frequency (%)
643	1	0.2%
642	1	0.2%
641	1	0.2%
640	1	0.2%
639	1	0.2%
638	1	0.2%
637	1	0.2%
636	1	0.2%
635	1	0.2%
634	1	0.2%

분류 (category)
Categorical

Distinct	4
Distinct (%)	0.6%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

가공식품 (processed food)	312
수산물 (seafood)	265
농산물 (agricultural products)	60
축산물 (livestock products)	6

Length

Max length	4
Median length	3
Mean length	3.4852255
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	수산물
2nd row	수산물
3rd row	가공식품
4th row	수산물
5th row	농산물

Common Values

Value	Count	Frequency (%)
가공식품	312	48.5%
수산물	265	41.2%
농산물	60	9.3%
축산물	6	0.9%
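Frequency (%) is each count divided by the 643 observations. Mean length weights each label's character count by how often it occurs. The sketch below recomputes both from the 분류 counts above, using `collections.Counter` as a convenient container.

```python
from collections import Counter

# 분류 counts as reported: processed food, seafood, agricultural, livestock.
counts = Counter({"가공식품": 312, "수산물": 265, "농산물": 60, "축산물": 6})
n = sum(counts.values())  # 643 observations

frequencies = []
for label, count in counts.most_common():
    pct = round(100 * count / n, 1)
    frequencies.append(pct)
    print(f"{label}\t{count}\t{pct}%")

# Each label contributes len(label) characters once per occurrence.
mean_length = sum(len(label) * count for label, count in counts.items()) / n
print(f"Mean length: {mean_length:.7f}")
```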

제품명 (product name)
Text

Distinct	478
Distinct (%)	74.3%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

Length

Max length	23
Median length	19
Mean length	6.2410575
Min length	1

Characters and Unicode

Total characters	4013
Distinct characters	440
Distinct categories	9
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
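Python's standard `unicodedata` module exposes the general category behind the "Most occurring categories" tables. The two-letter codes it returns (`Lo`, `Zs`, ...) correspond to the long names the report prints. The standard library has no script or block lookup, so this sketch covers categories only. The sample strings are two product names from the sample rows of this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "So": "Other Symbol",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


names = ["기꼬만혼쯔유(코이다시)", "소바가게 소바쯔유"]
result = category_counts(names)
print(result)
```

Precomposed Hangul syllables such as 스 fall under Other Letter (`Lo`). That is why almost all characters in this column land in that category.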

Unique

Unique	406
Unique (%)	63.1%

Sample

1st row	포항가자미
2nd row	물메기
3rd row	블독우스타소스
4th row	생대구
5th row	친환경골든키위

Most common words

Value	Count	Frequency (%)
고등어	16	2.1%
기꼬만혼쯔유(코이다시	9	1.2%
삼치	8	1.0%
오이오차녹차	7	0.9%
신주일미된장	7	0.9%
가자미	7	0.9%
농축쯔유	6	0.8%
동태	6	0.8%
친환경	6	0.8%
갈치	6	0.8%
Other values (524)	696	89.9%

Most occurring characters

Value	Count	Frequency (%)
(space)	138	3.4%
스	110	2.7%
미	87	2.2%
어	83	2.1%
장	81	2.0%
기	71	1.8%
소	69	1.7%
오	69	1.7%
이	65	1.6%
리	59	1.5%
Other values (430)	3181	79.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	3737	93.1%
Space Separator	138	3.4%
Close Punctuation	49	1.2%
Open Punctuation	49	1.2%
Decimal Number	27	0.7%
Other Punctuation	6	0.1%
Uppercase Letter	5	0.1%
Other Symbol	1	< 0.1%
Dash Punctuation	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
스	110	2.9%
미	87	2.3%
어	83	2.2%
장	81	2.2%
기	71	1.9%
소	69	1.8%
오	69	1.8%
이	65	1.7%
리	59	1.6%
시	56	1.5%
Other values (411)	2987	79.9%

Decimal Number

Value	Count	Frequency (%)
0	9	33.3%
1	7	25.9%
2	5	18.5%
5	3	11.1%
7	2	7.4%
3	1	3.7%

Uppercase Letter

Value	Count	Frequency (%)
B	1	20.0%
S	1	20.0%
T	1	20.0%
O	1	20.0%
P	1	20.0%

Other Punctuation

Value	Count	Frequency (%)
&	3	50.0%
.	2	33.3%
/	1	16.7%

Space Separator

Value	Count	Frequency (%)
(space)	138	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	49	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	49	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	1	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	3738	93.1%
Common	270	6.7%
Latin	5	0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
스	110	2.9%
미	87	2.3%
어	83	2.2%
장	81	2.2%
기	71	1.9%
소	69	1.8%
오	69	1.8%
이	65	1.7%
리	59	1.6%
시	56	1.5%
Other values (412)	2988	79.9%

Common

Value	Count	Frequency (%)
(space)	138	51.1%
)	49	18.1%
(	49	18.1%
0	9	3.3%
1	7	2.6%
2	5	1.9%
5	3	1.1%
&	3	1.1%
.	2	0.7%
7	2	0.7%
Other values (3)	3	1.1%

Latin

Value	Count	Frequency (%)
B	1	20.0%
S	1	20.0%
T	1	20.0%
O	1	20.0%
P	1	20.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	3737	93.1%
ASCII	275	6.9%
None	1	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	138	50.2%
)	49	17.8%
(	49	17.8%
0	9	3.3%
1	7	2.5%
2	5	1.8%
5	3	1.1%
&	3	1.1%
.	2	0.7%
7	2	0.7%
Other values (8)	8	2.9%

Hangul

Value	Count	Frequency (%)
스	110	2.9%
미	87	2.3%
어	83	2.2%
장	81	2.2%
기	71	1.9%
소	69	1.8%
오	69	1.8%
이	65	1.7%
리	59	1.6%
시	56	1.5%
Other values (411)	2987	79.9%

None

Value	Count	Frequency (%)
㈜	1	100.0%

품목(또는 식품유형) (item or food type)
Text

Distinct	174
Distinct (%)	27.1%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

Length

Max length	9
Median length	8
Mean length	3.6236392
Min length	1

Characters and Unicode

Total characters	2330
Distinct characters	196
Distinct categories	5
Distinct scripts	2
Distinct blocks	3

Unique

Unique	94
Unique (%)	14.6%

Sample

1st row	가자미
2nd row	메기
3rd row	소스
4th row	대구
5th row	키위

Most common words

Value	Count	Frequency (%)
소스	99	15.2%
기타수산물가공품	75	11.5%
혼합장	34	5.2%
액상차	20	3.1%
카레	19	2.9%
가자미	15	2.3%
고등어	12	1.8%
갈치	11	1.7%
오징어	11	1.7%
다시마	10	1.5%
Other values (163)	344	52.9%

Most occurring characters

Value	Count	Frequency (%)
가	121	5.2%
기	117	5.0%
품	110	4.7%
소	108	4.6%
산	107	4.6%
스	99	4.2%
공	98	4.2%
물	94	4.0%
타	94	4.0%
수	91	3.9%
Other values (186)	1291	55.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	2286	98.1%
Open Punctuation	15	0.6%
Close Punctuation	15	0.6%
Space Separator	9	0.4%
Other Punctuation	5	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
가	121	5.3%
기	117	5.1%
품	110	4.8%
소	108	4.7%
산	107	4.7%
스	99	4.3%
공	98	4.3%
물	94	4.1%
타	94	4.1%
수	91	4.0%
Other values (181)	1247	54.5%

Other Punctuation

Value	Count	Frequency (%)
.	4	80.0%
·	1	20.0%

Open Punctuation

Value	Count	Frequency (%)
(	15	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	15	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	9	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	2286	98.1%
Common	44	1.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
가	121	5.3%
기	117	5.1%
품	110	4.8%
소	108	4.7%
산	107	4.7%
스	99	4.3%
공	98	4.3%
물	94	4.1%
타	94	4.1%
수	91	4.0%
Other values (181)	1247	54.5%

Common

Value	Count	Frequency (%)
(	15	34.1%
)	15	34.1%
(space)	9	20.5%
.	4	9.1%
·	1	2.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	2286	98.1%
ASCII	43	1.8%
None	1	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
가	121	5.3%
기	117	5.1%
품	110	4.8%
소	108	4.7%
산	107	4.7%
스	99	4.3%
공	98	4.3%
물	94	4.1%
타	94	4.1%
수	91	4.0%
Other values (181)	1247	54.5%

ASCII

Value	Count	Frequency (%)
(	15	34.9%
)	15	34.9%
(space)	9	20.9%
.	4	9.3%

None

Value	Count	Frequency (%)
·	1	100.0%

수거일 (collection date)
Date

Distinct	96
Distinct (%)	14.9%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

Minimum	2019-12-04 00:00:00
Maximum	2020-12-04 00:00:00

Histogram with fixed size bins (bins=50)

원산지 (origin)
Categorical

HIGH CORRELATION

Distinct	4
Distinct (%)	0.6%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

국내 (domestic)	274
국외 (foreign)	255
국산 (domestically produced)	96
수입 (imported)	18

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	국내
2nd row	국내
3rd row	수입
4th row	국내
5th row	국내

Common Values

Value	Count	Frequency (%)
국내	274	42.6%
국외	255	39.7%
국산	96	14.9%
수입	18	2.8%

수입국 (country of import)
Categorical

HIGH CORRELATION IMBALANCE

Distinct	26
Distinct (%)	4.0%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

대한민국 (South Korea)	366
일본 (Japan)	195
미국 (United States)	16
러시아 (Russia)	15
중국 (China)	11
Other values (21)	40

Length

Max length	5
Median length	4
Mean length	3.251944
Min length	2

Unique

Unique	12
Unique (%)	1.9%

Sample

1st row	대한민국
2nd row	대한민국
3rd row	일본
4th row	대한민국
5th row	대한민국

Common Values

Value	Count	Frequency (%)
대한민국	366	56.9%
일본	195	30.3%
미국	16	2.5%
러시아	15	2.3%
중국	11	1.7%
노르웨이	6	0.9%
베트남	5	0.8%
페루	4	0.6%
포르투갈	3	0.5%
말레이시아	2	0.3%
Other values (16)	20	3.1%
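The imbalance percentage can be reproduced by assuming it is 1 − H / log2(k). Here H is the base-2 entropy of the value counts and k is the number of distinct values. This is our reading of ydata-profiling's imbalance score, so treat it as an assumption.

The full distribution is reconstructed from the report:
- The ten most frequent counts come from the table above.
- The remaining 16 values cover 20 rows.
- 12 values occur exactly once (Unique = 12). This forces the other 4 values to occur exactly twice each.

```python
from math import log2

# 수입국 value counts: top 10 from the table, tail reconstructed from
# "Other values (16) = 20 rows" and "Unique = 12" (12 singletons + 4 pairs).
top10 = [366, 195, 16, 15, 11, 6, 5, 4, 3, 2]
tail = [2] * 4 + [1] * 12
counts = top10 + tail


def imbalance_score(counts):
    """1 - H / log2(k): 0 when all classes are equally common, near 1 when one dominates.

    Assumed to correspond to the report's imbalance alert; a sketch only.
    """
    n = sum(counts)
    k = len(counts)
    entropy = -sum((c / n) * log2(c / n) for c in counts)
    return 1 - entropy / log2(k)


score = imbalance_score(counts)
print(f"{len(counts)} classes, {sum(counts)} rows, imbalance {score:.1%}")
```

With this reconstruction the score rounds to the 60.7% shown in the Alerts section. That agreement supports, but does not prove, the assumed formula.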

세슘검출량(Bq/kg) (cesium detected, Bq/kg)
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

0	643

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	0
2nd row	0
3rd row	0
4th row	0
5th row	0

Common Values

Value	Count	Frequency (%)
0	643	100.0%

요오드검출량(Bq/kg) (iodine detected, Bq/kg)
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

0	643

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	0
2nd row	0
3rd row	0
4th row	0
5th row	0

Common Values

Value	Count	Frequency (%)
0	643	100.0%

적부판정 (compliance verdict)
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	5.2 KiB

적합 (pass)	643

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	적합
2nd row	적합
3rd row	적합
4th row	적합
5th row	적합

Common Values

Value	Count	Frequency (%)
적합	643	100.0%

Interactions: scatter plot of 연번 against 연번

Correlations

Correlation matrix

	연번	분류	수거일	원산지	수입국
연번	1.000	0.412	0.995	0.707	0.416
분류	0.412	1.000	0.930	0.598	0.574
수거일	0.995	0.930	1.000	0.941	0.601
원산지	0.707	0.598	0.941	1.000	0.819
수입국	0.416	0.574	0.601	0.819	1.000

Correlation matrix

	분류	수입국	원산지
분류	1.000	0.332	0.269
수입국	0.332	1.000	0.576
원산지	0.269	0.576	1.000

Correlation matrix

	연번	분류	원산지	수입국
연번	1.000	0.257	0.507	0.161
분류	0.257	1.000	0.269	0.332
원산지	0.507	0.269	1.000	0.576
수입국	0.161	0.332	0.576	1.000
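The High correlation alerts line up with the last matrix if a threshold of 0.5 is assumed. The report does not state the threshold, so 0.5 is our assumption. Applying the same cut to the first matrix would also flag pairs such as 연번 and 수거일 (0.995), which appear in no alert. The sketch below transcribes the last matrix and lists every pair above the threshold.

```python
# The last correlation matrix above, transcribed as a list of rows.
COLUMNS = ["연번", "분류", "원산지", "수입국"]
MATRIX = [
    [1.000, 0.257, 0.507, 0.161],
    [0.257, 1.000, 0.269, 0.332],
    [0.507, 0.269, 1.000, 0.576],
    [0.161, 0.332, 0.576, 1.000],
]

THRESHOLD = 0.5  # assumed alert threshold; not stated in the report


def high_correlation_pairs(columns, matrix, threshold):
    """Return each unordered column pair whose coefficient exceeds threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # upper triangle only
            if matrix[i][j] > threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs


pairs = high_correlation_pairs(COLUMNS, MATRIX, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

The two resulting pairs reproduce the three correlation alerts. 원산지 appears in both pairs, which is why its alert mentions `연번` and one other field.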

Missing values

Count: a simple visualization of nullity by column.

Matrix: a data-dense display of nullity that lets you quickly spot patterns in data completion.

First rows

	연번	분류	제품명	품목(또는 식품유형)	수거일	원산지	수입국	적부판정
0	1	수산물	포항가자미	가자미	2019-12-04	국내	대한민국	적합
1	2	수산물	물메기	메기	2019-12-04	국내	대한민국	적합
2	3	가공식품	블독우스타소스	소스	2019-12-04	수입	일본	적합
3	4	수산물	생대구	대구	2019-12-11	국내	대한민국	적합
4	5	농산물	친환경골든키위	키위	2019-12-11	국내	대한민국	적합
5	6	농산물	저탄소인증샤인머스켓	포도	2019-12-11	국내	대한민국	적합
6	7	가공식품	맛있게빠르다	과.채음료	2019-12-09	국내	대한민국	적합
7	8	가공식품	간바레오도상팩	청주	2019-12-09	수입	일본	적합
8	9	가공식품	츄하이스위토나그린애플	기타주류	2019-12-09	수입	일본	적합
9	10	수산물	동태전	명태냉동	2019-12-10	수입	러시아	적합

Last rows

	연번	분류	제품명	품목(또는 식품유형)	수거일	원산지	수입국	적부판정
633	634	가공식품	기꼬만혼쯔유(코이다시)	소스	2020-11-23	국외	일본	적합
634	635	수산물	전갱이	전갱이	2020-11-24	국내	대한민국	적합
635	636	수산물	백조기	조기	2020-11-24	국내	대한민국	적합
636	637	수산물	동태	명태	2020-11-24	국내	대한민국	적합
637	638	가공식품	소바가게 소바쯔유	소스	2020-11-25	국외	일본	적합
638	639	가공식품	기꼬만환대두간장	양조간장	2020-11-25	국외	일본	적합
639	640	축산물	한우등심	소고기	2020-12-04	국내	대한민국	적합
640	641	가공식품	덩이뿌리청무와 연근을 함께 담아낸 혼합견과	견과류가공품	2020-11-25	국내	대한민국	적합
641	642	가공식품	오뚜기힐링타임생강차	액상차	2020-11-25	국내	대한민국	적합
642	643	가공식품	광동위생천	액상차	2020-11-25	국내	대한민국	적합
