Overview

Dataset statistics

Number of variables: 5
Number of observations: 28
Missing cells: 28
Missing cells (%): 20.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 KiB
Average record size in memory: 48.7 B
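The missing-cell percentage follows directly from the counts in this section. A minimal sketch of the arithmetic, using only the figures reported here (the variable names are illustrative, not taken from the profiling tool):

```python
# Figures copied from the dataset statistics in this section.
n_variables = 5
n_observations = 28
missing_cells = 28

total_cells = n_variables * n_observations        # 5 * 28 = 140 cells
missing_pct = 100 * missing_cells / total_cells   # share of cells that are missing

print(f"{missing_pct:.1f}%")
```

All 28 missing cells come from the single column Unnamed: 0, which is empty in every row.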

Variable types

Unsupported: 1
Text: 1
Numeric: 2
Categorical: 1

Dataset

Description: Self-quality inspection status, first half of 2015 (자가품질검사현황2015년상반기)
Author: Jeollabuk-do (전라북도)
URL: https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202417

Alerts

High correlation: 자기품질검사건수 (number of self-quality inspections) is highly correlated overall with 적합 (conforming) and 1 other field
High correlation: 적합 is highly correlated overall with 자기품질검사건수 and 1 other field
High correlation: 부적합 (nonconforming) is highly correlated overall with 자기품질검사건수 and 1 other field
Imbalance: 부적합 is highly imbalanced (66.9%)
Missing: Unnamed: 0 has 28 (100.0%) missing values
Unique: 식품 유형 (food type) has unique values
Unsupported: Unnamed: 0 is an unsupported type; check whether it needs cleaning or further analysis
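The 66.9% imbalance figure for 부적합 is consistent with a normalized-entropy score: one minus the Shannon entropy of the class shares, divided by the maximum possible entropy. The report does not state this formula, so treat it as an assumption. The counts (25, 1, 1, 1) come from the value table of 부적합, and the function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumed formula; the report itself does not spell it out.
    """
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if len(shares) < 2:
        return 0.0  # convention chosen for this sketch when only one class exists
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1 - entropy / math.log2(len(shares))

score = imbalance_score([25, 1, 1, 1])
print(f"{score:.1%}")
```

With 25 of 28 rows sharing one value, the entropy is low relative to the four-class maximum, which is what pushes the score above one half.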

Reproduction

Analysis started: 2024-03-14 02:34:43.146801
Analysis finished: 2024-03-14 02:34:43.756222
Duration: 0.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 28
Missing (%): 100.0%
Memory size: 384.0 B

식품 유형
Text

UNIQUE 

Distinct: 28
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Length

Max length: 13
Median length: 8
Mean length: 4.4642857
Min length: 2

Characters and Unicode

Total characters: 125
Distinct characters: 70
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
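As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses only the five values listed under Sample for this variable, so the counts cover those five strings rather than the full column of 125 characters.

```python
import unicodedata
from collections import Counter

# First five values of 식품 유형, copied from this variable's Sample list.
samples = ["과자류", "빵또는 떡류", "코코아가공품 및 초콜릿류", "잼류", "올리고당류"]

category_counts = Counter()
for value in samples:
    # NFC keeps each Hangul syllable as one precomposed code point.
    for ch in unicodedata.normalize("NFC", value):
        category_counts[unicodedata.category(ch)] += 1

# "Lo" is Other Letter (the Hangul syllables); "Zs" is Space Separator.
print(dict(category_counts))
```

The same two categories, Other Letter and Space Separator, are the only ones the report finds in the whole column.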

Unique

Unique: 28
Unique (%): 100.0%

Sample

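1st row: 과자류 (confectionery)
2nd row: 빵또는 떡류 (bread or rice cakes)
3rd row: 코코아가공품 및 초콜릿류 (processed cocoa products and chocolates)
4th row: 잼류 (jams)
5th row: 올리고당류 (oligosaccharides)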
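| Value | Count | Frequency (%) |
|---|---|---|
| 과자류 (confectionery) | 1 | 3.0% |
| 조미식품 (seasoning foods) | 1 | 3.0% |
| 위생용품 (hygiene products) | 1 | 3.0% |
| 기구및용기포장 (utensils, containers, and packaging) | 1 | 3.0% |
| 식품첨가물 (food additives) | 1 | 3.0% |
| 건강기능식품 (health functional foods) | 1 | 3.0% |
| 장기보존식품 (long-shelf-life foods) | 1 | 3.0% |
| 규격외일반가공품 (general processed products outside the standards) | 1 | 3.0% |
| 기타식품류 (other foods) | 1 | 3.0% |
| 건포류 (dried fish fillets) | 1 | 3.0% |
| Other values (23) | 23 | 69.7% |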

Most occurring characters

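| Value | Count | Frequency (%) |
|---|---|---|
|  | 18 | 14.4% |
|  | 10 | 8.0% |
|  | 8 | 6.4% |
|  | 5 | 4.0% |
|  | 5 | 4.0% |
|  | 3 | 2.4% |
|  | 3 | 2.4% |
|  | 3 | 2.4% |
|  | 2 | 1.6% |
|  | 2 | 1.6% |
| Other values (60) | 66 | 52.8% |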

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 120 | 96.0% |
| Space Separator | 5 | 4.0% |

Most frequent character per category

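Other Letter

| Value | Count | Frequency (%) |
|---|---|---|
|  | 18 | 15.0% |
|  | 10 | 8.3% |
|  | 8 | 6.7% |
|  | 5 | 4.2% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
| Other values (59) | 64 | 53.3% |

Space Separator

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5 | 100.0% |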

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 120 | 96.0% |
| Common | 5 | 4.0% |

Most frequent character per script

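Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 18 | 15.0% |
|  | 10 | 8.3% |
|  | 8 | 6.7% |
|  | 5 | 4.2% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
| Other values (59) | 64 | 53.3% |

Common

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5 | 100.0% |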

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 120 | 96.0% |
| ASCII | 5 | 4.0% |

Most frequent character per block

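Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 18 | 15.0% |
|  | 10 | 8.3% |
|  | 8 | 6.7% |
|  | 5 | 4.2% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 3 | 2.5% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
|  | 2 | 1.7% |
| Other values (59) | 64 | 53.3% |

ASCII

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5 | 100.0% |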

자기품질검사건수
Real number (ℝ)

HIGH CORRELATION 

Distinct: 19
Distinct (%): 67.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.428571
Minimum: 1
Maximum: 524
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3.75
Median: 10.5
Q3: 25.5
95th percentile: 107.5
Maximum: 524
Range: 523
Interquartile range (IQR): 21.75

Descriptive statistics

Standard deviation: 99.171168
Coefficient of variation (CV): 2.6496114
Kurtosis: 23.494876
Mean: 37.428571
Median Absolute Deviation (MAD): 9.5
Skewness: 4.7218287
Sum: 1048
Variance: 9834.9206
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=19)]
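Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 6 | 21.4% |
| 14 | 2 | 7.1% |
| 10 | 2 | 7.1% |
| 4 | 2 | 7.1% |
| 15 | 2 | 7.1% |
| 45 | 1 | 3.6% |
| 132 | 1 | 3.6% |
| 524 | 1 | 3.6% |
| 8 | 1 | 3.6% |
| 27 | 1 | 3.6% |
| Other values (9) | 9 | 32.1% |

Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 6 | 21.4% |
| 3 | 1 | 3.6% |
| 4 | 2 | 7.1% |
| 7 | 1 | 3.6% |
| 8 | 1 | 3.6% |
| 9 | 1 | 3.6% |
| 10 | 2 | 7.1% |
| 11 | 1 | 3.6% |
| 14 | 2 | 7.1% |
| 15 | 2 | 7.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 524 | 1 | 3.6% |
| 132 | 1 | 3.6% |
| 62 | 1 | 3.6% |
| 54 | 1 | 3.6% |
| 45 | 1 | 3.6% |
| 28 | 1 | 3.6% |
| 27 | 1 | 3.6% |
| 25 | 1 | 3.6% |
| 21 | 1 | 3.6% |
| 15 | 2 | 7.1% |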
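Taken together, the value tables for 자기품질검사건수 account for all 28 observations (19 distinct values), so the summary statistics can be re-derived as a cross-check. The sketch below rebuilds the column from those counts and recomputes several figures with Python's `statistics` module. Treating the quartiles as linear interpolation between order statistics (`method="inclusive"`) is an assumption about the profiler's convention; it does reproduce the reported numbers.

```python
import statistics

# value -> count, merged from the value tables for this variable.
value_counts = {
    1: 6, 3: 1, 4: 2, 7: 1, 8: 1, 9: 1, 10: 2, 11: 1, 14: 2, 15: 2,
    21: 1, 25: 1, 27: 1, 28: 1, 45: 1, 54: 1, 62: 1, 132: 1, 524: 1,
}
data = sorted(v for v, c in value_counts.items() for _ in range(c))

mean = statistics.mean(data)
median = statistics.median(data)
# "inclusive" interpolates linearly between neighbouring sorted values.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
std = statistics.stdev(data)                       # sample standard deviation (n - 1)
cv = std / mean                                    # coefficient of variation
mad = statistics.median([abs(x - median) for x in data])

print(len(data), sum(data), round(mean, 6), median, q1, q3, q3 - q1)
print(round(std, 6), round(cv, 7), mad)

# The maximum equals the sum of all other values, which is what a
# grand-total row counted as an ordinary observation would produce.
print(max(data) == sum(data) - max(data))
```

The last check matters for interpretation: the mean, standard deviation, skewness, and kurtosis reported for this variable all include the 524 total as if it were a food category.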

적합
Real number (ℝ)

HIGH CORRELATION 

Distinct: 19
Distinct (%): 67.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.214286
Minimum: 1
Maximum: 521
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3.75
Median: 10.5
Q3: 25.5
95th percentile: 105.85
Maximum: 521
Range: 520
Interquartile range (IQR): 21.75

Descriptive statistics

Standard deviation: 98.546608
Coefficient of variation (CV): 2.6480854
Kurtosis: 23.554805
Mean: 37.214286
Median Absolute Deviation (MAD): 9.5
Skewness: 4.7286816
Sum: 1042
Variance: 9711.4339
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=19)]
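Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 6 | 21.4% |
| 14 | 2 | 7.1% |
| 10 | 2 | 7.1% |
| 4 | 2 | 7.1% |
| 15 | 2 | 7.1% |
| 45 | 1 | 3.6% |
| 130 | 1 | 3.6% |
| 521 | 1 | 3.6% |
| 8 | 1 | 3.6% |
| 27 | 1 | 3.6% |
| Other values (9) | 9 | 32.1% |

Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 6 | 21.4% |
| 3 | 1 | 3.6% |
| 4 | 2 | 7.1% |
| 7 | 1 | 3.6% |
| 8 | 1 | 3.6% |
| 9 | 1 | 3.6% |
| 10 | 2 | 7.1% |
| 11 | 1 | 3.6% |
| 14 | 2 | 7.1% |
| 15 | 2 | 7.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 521 | 1 | 3.6% |
| 130 | 1 | 3.6% |
| 61 | 1 | 3.6% |
| 54 | 1 | 3.6% |
| 45 | 1 | 3.6% |
| 28 | 1 | 3.6% |
| 27 | 1 | 3.6% |
| 25 | 1 | 3.6% |
| 21 | 1 | 3.6% |
| 15 | 2 | 7.1% |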

부적합
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

| Value | Count |
|---|---|
| <NA> | 25 |
| 2 | 1 |
| 1 | 1 |
| 3 | 1 |

Length

Max length: 4
Median length: 4
Mean length: 3.6785714
Min length: 1

Unique

Unique: 3
Unique (%): 10.7%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| <NA> | 25 | 89.3% |
| 2 | 1 | 3.6% |
| 1 | 1 | 3.6% |
| 3 | 1 | 3.6% |
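The distinct, unique, and length figures for 부적합 can all be derived from these four value counts. The sketch below does so on the assumption that `<NA>` was profiled as a literal four-character string rather than as a true missing value. That assumption is not stated in the report, but it matches both the reported mean length and the fact that the column shows 0 missing cells.

```python
# Value counts for 부적합, copied from the Common Values table.
# "<NA>" is kept as a literal string (assumption, see the lead-in).
value_counts = {"<NA>": 25, "2": 1, "1": 1, "3": 1}

n = sum(value_counts.values())
distinct_pct = 100 * len(value_counts) / n
unique = sum(1 for c in value_counts.values() if c == 1)   # values seen exactly once
unique_pct = 100 * unique / n
mean_length = sum(len(v) * c for v, c in value_counts.items()) / n

print(n, round(distinct_pct, 1), unique, round(unique_pct, 1), round(mean_length, 7))
```

If the placeholder were parsed as a real null instead, 부적합 would have 25 missing cells and three numeric values.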

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

| Value | Count | Frequency (%) |
|---|---|---|
| na | 25 | 89.3% |
| 2 | 1 | 3.6% |
| 1 | 1 | 3.6% |
| 3 | 1 | 3.6% |

Interactions

[Figures: four interaction plots]

Correlations

|  | 식품 유형 | 자기품질검사건수 | 적합 | 부적합 |
|---|---|---|---|---|
| 식품 유형 | 1.000 | 1.000 | 1.000 | 1.000 |
| 자기품질검사건수 | 1.000 | 1.000 | 1.000 | 1.000 |
| 적합 | 1.000 | 1.000 | 1.000 | 1.000 |
| 부적합 | 1.000 | 1.000 | 1.000 | 1.000 |

|  | 자기품질검사건수 | 적합 | 부적합 |
|---|---|---|---|
| 자기품질검사건수 | 1.000 | 1.000 | 1.000 |
| 적합 | 1.000 | 1.000 | 1.000 |
| 부적합 | 1.000 | 1.000 | 1.000 |
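The report does not say which coefficient produced these matrices. As a rough plausibility check for the 자기품질검사건수 and 적합 entry only, the sketch below computes a plain Pearson correlation over the 20 rows shown in the Sample section. Pearson is an illustrative choice rather than the report's stated method, and the hidden rows 10 to 17 are not included.

```python
import math

# (자기품질검사건수, 적합) pairs for the 20 rows listed in the Sample section.
pairs = [
    (45, 45), (25, 25), (1, 1), (1, 1), (14, 14), (9, 9), (10, 10), (4, 4),
    (15, 15), (11, 11), (54, 54), (7, 7), (132, 130), (62, 61), (4, 4),
    (1, 1), (28, 28), (27, 27), (8, 8), (524, 521),
]

def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)

r = pearson(pairs)
print(round(r, 3))
```

The two columns differ in only three of these rows, so a coefficient that rounds to 1.000 is expected for this pair.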

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

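First rows

|  | Unnamed: 0 | 식품 유형 | 자기품질검사건수 | 적합 | 부적합 |
|---|---|---|---|---|---|
| 0 | <NA> | 과자류 | 45 | 45 | <NA> |
| 1 | <NA> | 빵또는 떡류 | 25 | 25 | <NA> |
| 2 | <NA> | 코코아가공품 및 초콜릿류 | 1 | 1 | <NA> |
| 3 | <NA> | 잼류 | 1 | 1 | <NA> |
| 4 | <NA> | 올리고당류 | 14 | 14 | <NA> |
| 5 | <NA> | 두부류 또는 묵류 | 9 | 9 | <NA> |
| 6 | <NA> | 식용유지류 | 10 | 10 | <NA> |
| 7 | <NA> | 면류 | 4 | 4 | <NA> |
| 8 | <NA> | 다류 | 15 | 15 | <NA> |
| 9 | <NA> | 커피 | 11 | 11 | <NA> |

Last rows

|  | Unnamed: 0 | 식품 유형 | 자기품질검사건수 | 적합 | 부적합 |
|---|---|---|---|---|---|
| 18 | <NA> | 주류 | 54 | 54 | <NA> |
| 19 | <NA> | 건포류 | 7 | 7 | <NA> |
| 20 | <NA> | 기타식품류 | 132 | 130 | 2 |
| 21 | <NA> | 규격외일반가공품 | 62 | 61 | 1 |
| 22 | <NA> | 장기보존식품 | 4 | 4 | <NA> |
| 23 | <NA> | 건강기능식품 | 1 | 1 | <NA> |
| 24 | <NA> | 식품첨가물 | 28 | 28 | <NA> |
| 25 | <NA> | 기구및용기포장 | 27 | 27 | <NA> |
| 26 | <NA> | 위생용품 | 8 | 8 | <NA> |
| 27 | <NA> | 총합계 | 524 | 521 | 3 |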
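The final row, 총합계 ("grand total"), is an aggregate rather than a food type, and the `<NA>` entries in 부적합 appear to stand for zero nonconforming results. The sketch below shows one way to clean the last ten rows before re-profiling: drop the total row, read `<NA>` as 0, and check that 적합 plus 부적합 equals 자기품질검사건수. The zero interpretation and all names in the code are assumptions made for illustration.

```python
# (식품 유형, 자기품질검사건수, 적합, 부적합) from the last ten rows of the sample;
# None stands for <NA>.
rows = [
    ("주류", 54, 54, None),
    ("건포류", 7, 7, None),
    ("기타식품류", 132, 130, 2),
    ("규격외일반가공품", 62, 61, 1),
    ("장기보존식품", 4, 4, None),
    ("건강기능식품", 1, 1, None),
    ("식품첨가물", 28, 28, None),
    ("기구및용기포장", 27, 27, None),
    ("위생용품", 8, 8, None),
    ("총합계", 524, 521, 3),
]

TOTAL_LABEL = "총합계"  # "grand total"

cleaned = [
    (name, tested, passed, failed or 0)   # assumption: <NA> means zero failures
    for name, tested, passed, failed in rows
    if name != TOTAL_LABEL                # drop the aggregate row
]

# Every remaining row should satisfy tested == passed + failed.
consistent = all(passed + failed == tested for _, tested, passed, failed in cleaned)
print(len(cleaned), consistent)
```

Applied to the full file, the same two steps would leave 27 food-type rows and remove the outlier that dominates the numeric summaries.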