Overview

Dataset statistics

Number of variables: 4
Number of observations: 28
Missing cells: 6
Missing cells (%): 5.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 KiB
Average record size in memory: 39.7 B

Variable types

Text: 1
Numeric: 2
Categorical: 1

Alerts

자기품질검사건수 is highly overall correlated with 적합 and 1 other field [High correlation]
적합 is highly overall correlated with 자기품질검사건수 and 1 other field [High correlation]
부적합 is highly overall correlated with 자기품질검사건수 and 1 other field [High correlation]
부적합 is highly imbalanced (66.9%) [Imbalance]
자기품질검사건수 has 3 (10.7%) missing values [Missing]
적합 has 3 (10.7%) missing values [Missing]
식품 유형 has only unique values [Unique]
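The 66.9% imbalance figure for 부적합 can be reproduced by scoring the column as 1 minus the entropy of its category frequencies, normalized by log(k) for k categories. This sketch assumes that is how ydata-profiling computes the score (it matches the reported value here) and, like the profile, treats the 25 `<NA>` entries as one category. `imbalance_score` is a hypothetical helper name.

```python
import math
from collections import Counter

def imbalance_score(values):
    """Return 1 - H / log(k): H is the entropy (natural log) of the category
    frequencies and k the number of distinct categories. 0 means perfectly
    balanced; values near 1 mean a single category dominates."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention: a single category is not "imbalanced"
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1 - entropy / math.log(k)

# 부적합 as profiled: 25 "<NA>" entries plus one each of "2", "4" and "6".
non_compliant = ["<NA>"] * 25 + ["2", "4", "6"]
print(f"{imbalance_score(non_compliant):.1%}")  # → 66.9%
```

Normalizing by log(k) keeps the score between 0 and 1 regardless of how many categories the column has.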

Reproduction

Analysis started: 2024-03-14 02:08:05.210484
Analysis finished: 2024-03-14 02:08:05.820535
Duration: 0.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

식품 유형 (food type)
Text

UNIQUE 

Distinct: 28
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Length

Max length: 9
Median length: 6.5
Mean length: 3.8928571
Min length: 2

Characters and Unicode

Total characters: 109
Distinct characters: 58
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
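As a minimal sketch of how such properties are looked up, Python's standard `unicodedata.category` returns the two-letter general category of each character (`Lo` for Hangul syllables, `Zs` for the space). The three values below are taken from the sample rows; `category_counts` and the name mapping are illustrative. The standard library exposes general categories but not scripts or blocks, so those two breakdowns would need a separate lookup.

```python
import unicodedata
from collections import Counter

# Map the general-category codes seen in this column to the long names used
# in the tables of this report.
CATEGORY_NAMES = {"Lo": "Other Letter", "Zs": "Space Separator"}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Three 식품 유형 values from the sample rows.
sample = ["과자류", "빵또는 떡류", "두부류 또는 묵류"]
print(category_counts(sample))
```

These two multi-word values hold all three spaces in the column, which is why the Space Separator count is 3.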

Unique

Unique: 28
Unique (%): 100.0%

Sample

1st row: 과자류
2nd row: 빵또는 떡류
3rd row: 포도당
4th row: 과당
5th row: 엿류
Value  Count  Frequency (%)
과자류  1  3.2%
드레싱류  1  3.2%
위생용품  1  3.2%
기구및용기포장  1  3.2%
식품첨가물  1  3.2%
건강기능식품  1  3.2%
장기보존식품  1  3.2%
기타가공품  1  3.2%
기타식품류  1  3.2%
건포류  1  3.2%
Other values (21)  21  67.7%
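The percentages in this table are relative to 31 words rather than 28 rows. The values appear to be split on whitespace, and the three spaces in the column (one in 빵또는 떡류, two in 두부류 또는 묵류) turn 28 values into 31 words: a single occurrence is 1/31 ≈ 3.2%, and the remaining 21 words are 21/31 ≈ 67.7%. The sketch below assumes whitespace splitting; `word_frequencies` is a hypothetical helper.

```python
from collections import Counter

def word_frequencies(values):
    """Split each value on whitespace; return (rows, total_words).

    Each row is (word, count, percent of all words), most common first.
    """
    words = [word for value in values for word in value.split()]
    total = len(words)
    counts = Counter(words)
    rows = [(w, c, round(100 * c / total, 1)) for w, c in counts.most_common()]
    return rows, total

# Two multi-word values from the sample rows: 2 + 3 = 5 words.
rows, total = word_frequencies(["빵또는 떡류", "두부류 또는 묵류"])
print(total, rows)

# For the full column, 28 values containing 3 spaces give 31 words.
print(round(100 * 1 / 31, 1), round(100 * 21 / 31, 1))  # → 3.2 67.7
```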

Most occurring characters

Count  Frequency (%)
16  14.7%
9  8.3%
8  7.3%
6  5.5%
3  2.8%
3  2.8%
3  2.8%
3  2.8%
2  1.8%
2  1.8%
Other values (48)  54  49.5%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  106  97.2%
Space Separator  3  2.8%

Most frequent character per category

Other Letter
Count  Frequency (%)
16  15.1%
9  8.5%
8  7.5%
6  5.7%
3  2.8%
3  2.8%
3  2.8%
2  1.9%
2  1.9%
2  1.9%
Other values (47)  52  49.1%
Space Separator
Value  Count  Frequency (%)
(space)  3  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  106  97.2%
Common  3  2.8%

Most frequent character per script

Hangul
Count  Frequency (%)
16  15.1%
9  8.5%
8  7.5%
6  5.7%
3  2.8%
3  2.8%
3  2.8%
2  1.9%
2  1.9%
2  1.9%
Other values (47)  52  49.1%
Common
Value  Count  Frequency (%)
(space)  3  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  106  97.2%
ASCII  3  2.8%

Most frequent character per block

Hangul
Count  Frequency (%)
16  15.1%
9  8.5%
8  7.5%
6  5.7%
3  2.8%
3  2.8%
3  2.8%
2  1.9%
2  1.9%
2  1.9%
Other values (47)  52  49.1%
ASCII
Value  Count  Frequency (%)
(space)  3  100.0%

자기품질검사건수 (number of self-quality inspections)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 22
Distinct (%): 88.0%
Missing: 3
Missing (%): 10.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.48
Minimum: 1
Maximum: 906
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5th percentile: 4.6
Q1: 14
Median: 31
Q3: 58
95th percentile: 145.6
Maximum: 906
Range: 905
Interquartile range (IQR): 44

Descriptive statistics

Standard deviation: 177.04734
Coefficient of variation (CV): 2.4427061
Kurtosis: 22.892019
Mean: 72.48
Median Absolute Deviation (MAD): 20
Skewness: 4.7087109
Sum: 1812
Variance: 31345.76
Monotonicity: Not monotonic
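These statistics can be cross-checked with Python's `statistics` module. The 25 non-missing values below are assembled from the report itself: the tables of smallest and largest values for this variable plus the sample rows 면류 (32) and 식용유지류 (43), which together account for the reported sum of 1812. `method="inclusive"` interpolates linearly between order statistics, which we assume matches the profiler's default quantile method; the reproduced quantiles agree with the table above.

```python
import statistics

# The 25 non-missing values of 자기품질검사건수, assembled from this report's
# smallest/largest value tables plus the sample rows 면류 (32) and 식용유지류 (43).
inspections = [1, 4, 7, 7, 13, 13, 14, 16, 16, 18, 20, 25, 31,
               32, 43, 47, 51, 56, 58, 60, 61, 74, 76, 163, 906]

mean = statistics.mean(inspections)
std = statistics.stdev(inspections)  # sample standard deviation (n - 1)
cv = std / mean                      # coefficient of variation

# "inclusive" = linear interpolation between order statistics (assumed to
# match the profiler's default quantile method).
q1, median, q3 = statistics.quantiles(inspections, n=4, method="inclusive")
ventiles = statistics.quantiles(inspections, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

# Median absolute deviation around the median.
mad = statistics.median(abs(x - median) for x in inspections)

print(f"sum={sum(inspections)} mean={mean} std={std:.5f} cv={cv:.7f}")
print(f"p5={p5:.1f} Q1={q1} median={median} Q3={q3} p95={p95:.1f}")
print(f"IQR={q3 - q1} range={max(inspections) - min(inspections)} MAD={mad}")
```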
Histogram with fixed size bins (bins=22)
Common values
Value  Count  Frequency (%)
13  2  7.1%
7  2  7.1%
16  2  7.1%
18  1  3.6%
906  1  3.6%
51  1  3.6%
58  1  3.6%
61  1  3.6%
163  1  3.6%
4  1  3.6%
Other values (12)  12  42.9%
(Missing)  3  10.7%
Minimum 10 values
Value  Count  Frequency (%)
1  1  3.6%
4  1  3.6%
7  2  7.1%
13  2  7.1%
14  1  3.6%
16  2  7.1%
18  1  3.6%
20  1  3.6%
25  1  3.6%
31  1  3.6%
Maximum 10 values
Value  Count  Frequency (%)
906  1  3.6%
163  1  3.6%
76  1  3.6%
74  1  3.6%
61  1  3.6%
60  1  3.6%
58  1  3.6%
56  1  3.6%
51  1  3.6%
47  1  3.6%

적합 (compliant)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 23
Distinct (%): 92.0%
Missing: 3
Missing (%): 10.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 72
Minimum: 1
Maximum: 900
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5th percentile: 4.6
Q1: 14
Median: 31
Q3: 58
95th percentile: 142.4
Maximum: 900
Range: 899
Interquartile range (IQR): 44

Descriptive statistics

Standard deviation: 175.81477
Coefficient of variation (CV): 2.4418718
Kurtosis: 22.927582
Mean: 72
Median Absolute Deviation (MAD): 20
Skewness: 4.7129068
Sum: 1800
Variance: 30910.833
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Common values
Value  Count  Frequency (%)
7  2  7.1%
16  2  7.1%
31  1  3.6%
900  1  3.6%
51  1  3.6%
58  1  3.6%
61  1  3.6%
159  1  3.6%
4  1  3.6%
74  1  3.6%
Other values (13)  13  46.4%
(Missing)  3  10.7%
Minimum 10 values
Value  Count  Frequency (%)
1  1  3.6%
4  1  3.6%
7  2  7.1%
11  1  3.6%
13  1  3.6%
14  1  3.6%
16  2  7.1%
18  1  3.6%
20  1  3.6%
25  1  3.6%
Maximum 10 values
Value  Count  Frequency (%)
900  1  3.6%
159  1  3.6%
76  1  3.6%
74  1  3.6%
61  1  3.6%
60  1  3.6%
58  1  3.6%
56  1  3.6%
51  1  3.6%
47  1  3.6%

부적합 (non-compliant)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.6785714
Min length: 1
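The length statistics suggest why this column reports no missing values. A maximum and median length of 4, a minimum of 1, and a mean of 3.6785714 = (25 × 4 + 3 × 1) / 28 are exactly what 25 literal four-character `<NA>` strings plus the one-character values 2, 4 and 6 produce. The placeholder is therefore likely stored as text, which would also explain why the profiler types the column as Categorical rather than numeric. The sketch below checks that arithmetic and shows one hypothetical clean-up that turns the placeholder into a real missing value.

```python
import statistics

# 부적합 as the profiler appears to have seen it: 25 literal "<NA>" strings
# plus the three numeric values, all stored as text.
raw = ["<NA>"] * 25 + ["2", "4", "6"]

# String lengths reproduce the reported max, median, min and mean length.
lengths = [len(v) for v in raw]
print(max(lengths), statistics.median(lengths), min(lengths),
      round(statistics.mean(lengths), 7))

# Hypothetical clean-up: map the placeholder to None and parse the rest as int,
# so the 25 entries count as missing instead of forming a category.
cleaned = [None if v == "<NA>" else int(v) for v in raw]
n_missing = sum(v is None for v in cleaned)
present = [v for v in cleaned if v is not None]
print(n_missing, present)
```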

Unique

Unique: 3
Unique (%): 10.7%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>  25  89.3%
2  1  3.6%
4  1  3.6%
6  1  3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na  25  89.3%
2  1  3.6%
4  1  3.6%
6  1  3.6%

Interactions


Correlations

                    식품 유형   자기품질검사건수   적합    부적합
식품 유형           1.000       1.000              1.000   1.000
자기품질검사건수    1.000       1.000              1.000   1.000
적합                1.000       1.000              1.000   1.000
부적합              1.000       1.000              1.000   1.000
                    자기품질검사건수   적합    부적합
자기품질검사건수    1.000              1.000   1.000
적합                1.000              1.000   1.000
부적합              1.000              1.000   1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
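As we understand it, this nullity correlation is the Pearson correlation between per-column missing indicators (1 = missing, 0 = present). Below is a minimal standard-library sketch for the two numeric columns, which according to the sample rows are both missing in rows 2, 3 and 23. `pearson` is a hypothetical helper. A column with no missing values (such as 식품 유형) has zero variance in its indicator, so its nullity correlation is undefined.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a never-missing column has no nullity variance
    return cov / (sx * sy)

# Missing indicators (1 = missing) for all 28 rows. Per the sample rows, both
# numeric columns are missing in rows 2, 3 and 23 (their 3 missing values each).
missing_rows = {2, 3, 23}
inspections_null = [int(i in missing_rows) for i in range(28)]
compliant_null = [int(i in missing_rows) for i in range(28)]
food_type_null = [0] * 28  # 식품 유형 has no missing values

print(pearson(inspections_null, compliant_null))  # identical patterns: 1.0 up to rounding
print(pearson(food_type_null, compliant_null))    # nan: undefined for a complete column
```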

Sample

First rows

    식품 유형           자기품질검사건수   적합    부적합
0   과자류              76                 76      <NA>
1   빵또는 떡류         31                 31      <NA>
2   포도당              <NA>               <NA>    <NA>
3   과당                <NA>               <NA>    <NA>
4   엿류                20                 20      <NA>
5   두부류 또는 묵류    13                 13      <NA>
6   식용유지류          43                 43      <NA>
7   면류                32                 32      <NA>
8   다류                60                 60      <NA>
9   커피                14                 14      <NA>

Last rows

    식품 유형           자기품질검사건수   적합    부적합
18  주류                74                 74      <NA>
19  건포류              4                  4       <NA>
20  기타식품류          163                159     4
21  기타가공품          61                 61      <NA>
22  장기보존식품        7                  7       <NA>
23  건강기능식품        <NA>               <NA>    <NA>
24  식품첨가물          58                 58      <NA>
25  기구및용기포장      51                 51      <NA>
26  위생용품            16                 16      <NA>
27  총합계              906                900     6
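The last sample row, 총합계 (grand total), looks like the sum of all the other rows. The reported column sums are exactly twice its values (1812 = 2 × 906 and 1800 = 2 × 900), and its 부적합 value of 6 equals the other two non-placeholder values, 2 + 4. If so, the profile's mean, maximum, skewness and kurtosis for the numeric columns are inflated by that row. The sketch below tests this hypothesis using only figures printed in this report and recomputes the means without the total row.

```python
# Figures copied from this report: each column's Sum, its non-missing count
# (28 rows minus 3 missing), and its value in the 총합계 (grand total) row.
columns = {
    "자기품질검사건수": {"sum": 1812, "count": 25, "total_row": 906},
    "적합": {"sum": 1800, "count": 25, "total_row": 900},
}

results = {}
for name, c in columns.items():
    rest = c["sum"] - c["total_row"]          # sum over the per-type rows only
    is_grand_total = rest == c["total_row"]   # does 총합계 equal that sum?
    mean_without_total = rest / (c["count"] - 1)
    results[name] = (is_grand_total, mean_without_total)
    print(name, is_grand_total, mean_without_total)

# 부적합: the non-placeholder values are 2, 4 (기타식품류) and 6 (총합계).
print(2 + 4 == 6)
```

Under this reading, the per-type means are 37.75 and 37.5 rather than the reported 72.48 and 72.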