gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	28
Missing cells	6
Missing cells (%)	5.4%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.1 KiB
Average record size in memory	39.7 B
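
The missing-cell percentage follows from the counts above: missing cells divided by the total number of cells (observations × variables). A minimal sketch of that arithmetic, using only figures from this table (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 4
n_observations = 28
missing_cells = 6

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of cells that are missing

print(f"{missing_pct:.1f}%")
```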

Variable types

Text	1
Numeric	2
Categorical	1

Dataset

Description	Self-quality inspection status 2014 (자가품질검사현황2014)
Author	Jeollabuk-do (전라북도)
URL	https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202362

Alerts

`자기품질검사건수` is highly correlated overall with `적합` and 1 other field	High correlation
`적합` is highly correlated overall with `자기품질검사건수` and 1 other field	High correlation
`부적합` is highly correlated overall with `자기품질검사건수` and 1 other field	High correlation
`부적합` is highly imbalanced (66.9%)	Imbalance
`자기품질검사건수` has 3 (10.7%) missing values	Missing
`적합` has 3 (10.7%) missing values	Missing
`식품 유형` has unique values	Unique
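
The 66.9% imbalance figure for `부적합` is consistent with one minus the normalized Shannon entropy of the class frequencies. The sketch below reproduces the number under that assumption; `imbalance_score` is a hypothetical helper written for illustration, not a function taken from ydata-profiling.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the
    class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is scored as 0 in this sketch
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts of `부적합` as listed in this report: "<NA>" 25 times, "2", "4", "6" once each.
score = imbalance_score([25, 1, 1, 1])
print(f"{score:.1%}")
```

Because the 25 `<NA>` entries are counted as a category of their own, they dominate the distribution and drive the score up.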

Reproduction

Analysis started	2024-03-14 02:08:05.210484
Analysis finished	2024-03-14 02:08:05.820535
Duration	0.61 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

식품 유형 (food type)
Text

UNIQUE

Distinct	28
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	356.0 B

Length

Max length	9
Median length	6.5
Mean length	3.8928571
Min length	2

Characters and Unicode

Total characters	109
Distinct characters	58
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
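
As a rough illustration of how such character properties can be read, the sketch below counts characters and Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample values listed in this report, so its counts cover that sample rather than the full column, and it does not compute scripts or blocks, which `unicodedata` does not expose.

```python
import unicodedata
from collections import Counter

# First five sample values of `식품 유형`, copied from this report.
values = ["과자류", "빵또는 떡류", "포도당", "과당", "엿류"]

chars = "".join(values)
char_counts = Counter(chars)

# unicodedata.category returns the two-letter general category,
# e.g. "Lo" (Other Letter) or "Zs" (Space Separator).
category_counts = Counter(unicodedata.category(ch) for ch in chars)

print("total characters:", len(chars))
print("distinct characters:", len(char_counts))
print("categories:", dict(category_counts))
print("most common character:", char_counts.most_common(1))
```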

Unique

Unique	28
Unique (%)	100.0%

Sample

1st row	과자류
2nd row	빵또는 떡류
3rd row	포도당
4th row	과당
5th row	엿류

Value	Count	Frequency (%)
과자류	1	3.2%
드레싱류	1	3.2%
위생용품	1	3.2%
기구및용기포장	1	3.2%
식품첨가물	1	3.2%
건강기능식품	1	3.2%
장기보존식품	1	3.2%
기타가공품	1	3.2%
기타식품류	1	3.2%
건포류	1	3.2%
Other values (21)	21	67.7%

Most occurring characters

Value	Count	Frequency (%)
류	16	14.7%
품	9	8.3%
식	8	7.3%
기	6	5.5%
장	3	2.8%
(space)	3	2.8%
용	3	2.8%
포	3	2.8%
과	2	1.8%
가	2	1.8%
Other values (48)	54	49.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	106	97.2%
Space Separator	3	2.8%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
류	16	15.1%
품	9	8.5%
식	8	7.5%
기	6	5.7%
장	3	2.8%
용	3	2.8%
포	3	2.8%
과	2	1.9%
가	2	1.9%
타	2	1.9%
Other values (47)	52	49.1%

Space Separator

Value	Count	Frequency (%)
(space)	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	106	97.2%
Common	3	2.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
류	16	15.1%
품	9	8.5%
식	8	7.5%
기	6	5.7%
장	3	2.8%
용	3	2.8%
포	3	2.8%
과	2	1.9%
가	2	1.9%
타	2	1.9%
Other values (47)	52	49.1%

Common

Value	Count	Frequency (%)
(space)	3	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	106	97.2%
ASCII	3	2.8%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
류	16	15.1%
품	9	8.5%
식	8	7.5%
기	6	5.7%
장	3	2.8%
용	3	2.8%
포	3	2.8%
과	2	1.9%
가	2	1.9%
타	2	1.9%
Other values (47)	52	49.1%

ASCII

Value	Count	Frequency (%)
(space)	3	100.0%

자기품질검사건수 (number of self-quality inspections)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	22
Distinct (%)	88.0%
Missing	3
Missing (%)	10.7%
Infinite	0
Infinite (%)	0.0%
Mean	72.48

Minimum	1
Maximum	906
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	384.0 B

Quantile statistics

Minimum	1
5-th percentile	4.6
Q1	14
median	31
Q3	58
95-th percentile	145.6
Maximum	906
Range	905
Interquartile range (IQR)	44

Descriptive statistics

Standard deviation	177.04734
Coefficient of variation (CV)	2.4427061
Kurtosis	22.892019
Mean	72.48
Median Absolute Deviation (MAD)	20
Skewness	4.7087109
Sum	1812
Variance	31345.76
Monotonicity	Not monotonic
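
Several of the figures above are derived from one another: the range and IQR are differences of quantiles, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The following sketch checks those relationships against the reported values; the variable names are illustrative.

```python
# Values copied from the quantile and descriptive tables for `자기품질검사건수`.
minimum, maximum = 1, 906
q1, q3 = 14, 58
mean, std = 72.48, 177.04734
total = 1812
n_present = 28 - 3                # 3 of the 28 observations are missing

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
mean_from_sum = total / n_present # Mean over the non-missing observations

print(value_range, iqr, round(cv, 4), round(variance, 2), mean_from_sum)
```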

Histogram with fixed size bins (bins=22)

Value	Count	Frequency (%)
13	2	7.1%
7	2	7.1%
16	2	7.1%
18	1	3.6%
906	1	3.6%
51	1	3.6%
58	1	3.6%
61	1	3.6%
163	1	3.6%
4	1	3.6%
Other values (12)	12	42.9%
(Missing)	3	10.7%

Minimum 10 values

Value	Count	Frequency (%)
1	1	3.6%
4	1	3.6%
7	2	7.1%
13	2	7.1%
14	1	3.6%
16	2	7.1%
18	1	3.6%
20	1	3.6%
25	1	3.6%
31	1	3.6%

Maximum 10 values

Value	Count	Frequency (%)
906	1	3.6%
163	1	3.6%
76	1	3.6%
74	1	3.6%
61	1	3.6%
60	1	3.6%
58	1	3.6%
56	1	3.6%
51	1	3.6%
47	1	3.6%

적합 (conforming)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	23
Distinct (%)	92.0%
Missing	3
Missing (%)	10.7%
Infinite	0
Infinite (%)	0.0%
Mean	72

Minimum	1
Maximum	900
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	384.0 B

Quantile statistics

Minimum	1
5-th percentile	4.6
Q1	14
median	31
Q3	58
95-th percentile	142.4
Maximum	900
Range	899
Interquartile range (IQR)	44

Descriptive statistics

Standard deviation	175.81477
Coefficient of variation (CV)	2.4418718
Kurtosis	22.927582
Mean	72
Median Absolute Deviation (MAD)	20
Skewness	4.7129068
Sum	1800
Variance	30910.833
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=23)

Value	Count	Frequency (%)
7	2	7.1%
16	2	7.1%
31	1	3.6%
900	1	3.6%
51	1	3.6%
58	1	3.6%
61	1	3.6%
159	1	3.6%
4	1	3.6%
74	1	3.6%
Other values (13)	13	46.4%
(Missing)	3	10.7%

Minimum 10 values

Value	Count	Frequency (%)
1	1	3.6%
4	1	3.6%
7	2	7.1%
11	1	3.6%
13	1	3.6%
14	1	3.6%
16	2	7.1%
18	1	3.6%
20	1	3.6%
25	1	3.6%

Maximum 10 values

Value	Count	Frequency (%)
900	1	3.6%
159	1	3.6%
76	1	3.6%
74	1	3.6%
61	1	3.6%
60	1	3.6%
58	1	3.6%
56	1	3.6%
51	1	3.6%
47	1	3.6%

부적합 (nonconforming)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct	4
Distinct (%)	14.3%
Missing	0
Missing (%)	0.0%
Memory size	356.0 B

Length

Max length	4
Median length	4
Mean length	3.6785714
Min length	1

Unique

Unique	3
Unique (%)	10.7%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	25	89.3%
2	1	3.6%
4	1	3.6%
6	1	3.6%

Histogram of lengths of the category

Common Values (Plot)

Interactions

Axis variables: 자기품질검사건수, 적합

Correlations

Phik (φk)

	식품 유형	자기품질검사건수	적합	부적합
식품 유형	1.000	1.000	1.000	1.000
자기품질검사건수	1.000	1.000	1.000	1.000
적합	1.000	1.000	1.000	1.000
부적합	1.000	1.000	1.000	1.000

Auto

	자기품질검사건수	적합	부적합
자기품질검사건수	1.000	1.000	1.000
적합	1.000	1.000	1.000
부적합	1.000	1.000	1.000

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
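
To make the idea of nullity correlation concrete, the sketch below turns two columns into presence indicators and correlates them. It assumes a plain Pearson correlation of those indicators, which may differ in detail from what the report computes; `pearson`, `inspections`, and `conforming` are illustrative names, and the data are the first ten sample rows of this report.

```python
def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# First ten sample rows of `자기품질검사건수` and `적합`; None stands for <NA>.
inspections = [76, 31, None, None, 20, 13, 43, 32, 60, 14]
conforming = [76, 31, None, None, 20, 13, 43, 32, 60, 14]

# Nullity indicators: 1 where a value is present, 0 where it is missing.
present_a = [int(v is not None) for v in inspections]
present_b = [int(v is not None) for v in conforming]

nullity_corr = pearson(present_a, present_b)
print(nullity_corr)
```

In these rows the two columns are missing in exactly the same places, so their presence indicators are identical and the correlation comes out at (numerically) 1.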

Sample

First rows

	식품 유형	자기품질검사건수	적합	부적합
0	과자류	76	76	<NA>
1	빵또는 떡류	31	31	<NA>
2	포도당	<NA>	<NA>	<NA>
3	과당	<NA>	<NA>	<NA>
4	엿류	20	20	<NA>
5	두부류 또는 묵류	13	13	<NA>
6	식용유지류	43	43	<NA>
7	면류	32	32	<NA>
8	다류	60	60	<NA>
9	커피	14	14	<NA>

Last rows

	식품 유형	자기품질검사건수	적합	부적합
18	주류	74	74	<NA>
19	건포류	4	4	<NA>
20	기타식품류	163	159	4
21	기타가공품	61	61	<NA>
22	장기보존식품	7	7	<NA>
23	건강기능식품	<NA>	<NA>	<NA>
24	식품첨가물	58	58	<NA>
25	기구및용기포장	51	51	<NA>
26	위생용품	16	16	<NA>
27	총합계	906	900	6
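
The label 총합계 ("grand total") in the last row, together with the reported sums, suggests that a total row was profiled as an ordinary observation, although the report does not say so. The sketch below checks that reading with figures copied from this report and shows one way such a row might be dropped before profiling; the dictionaries and tuples are illustrative stand-ins, not the original data frame.

```python
# "Sum" values from the Descriptive statistics tables of this report.
reported_sum = {"자기품질검사건수": 1812, "적합": 1800}
# Values in row 27 (총합계) of the sample above.
total_row = {"자기품질검사건수": 906, "적합": 900}

for column, column_sum in reported_sum.items():
    sum_without_total = column_sum - total_row[column]
    # If 총합계 is a grand total, the remaining rows should add up to it.
    print(column, sum_without_total, sum_without_total == total_row[column])

# Dropping the total row, shown on the last two sample rows as
# (label, inspections, conforming) tuples.
rows = [("위생용품", 16, 16), ("총합계", 906, 900)]
detail_rows = [row for row in rows if row[0] != "총합계"]
print(detail_rows)
```

If that reading is right, the maximum of 906, the high skewness, and the perfect correlations are at least partly artifacts of the total row.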
