Overview

Dataset statistics

Number of variables: 11
Number of observations: 3691
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 328.1 KiB
Average record size in memory: 91.0 B

Variable types

Numeric: 2
Categorical: 7
Text: 1
Boolean: 1

Alerts

test_result has constant value "적합" (compliant) [Constant]
apr_at has constant value "False" [Constant]
last_load_dttm has constant value "2023-05-01 05:49:03" [Constant]
test_year is highly overall correlated with skey and 1 other field [High correlation]
data_day is highly overall correlated with skey and 2 other fields [High correlation]
skey is highly overall correlated with test_year and 1 other field [High correlation]
test_month is highly overall correlated with data_day [High correlation]
detec_result is highly overall correlated with origin [High correlation]
origin is highly overall correlated with detec_result [High correlation]
detec_result is highly imbalanced (98.9%) [Imbalance]
origin is highly imbalanced (64.9%) [Imbalance]
skey has unique values [Unique]

Reproduction

Analysis started: 2023-10-09 06:33:36.127833
Analysis finished: 2023-10-09 06:33:42.566345
Duration: 6.44 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

skey
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 3691
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4823
Minimum: 2978
Maximum: 6668
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.6 KiB

Quantile statistics

Minimum: 2978
5th percentile: 3162.5
Q1: 3900.5
Median: 4823
Q3: 5745.5
95th percentile: 6483.5
Maximum: 6668
Range: 3690
Interquartile range (IQR): 1845

Descriptive statistics

Standard deviation: 1065.6442
Coefficient of variation (CV): 0.2209505
Kurtosis: -1.2
Mean: 4823
Median Absolute Deviation (MAD): 923
Skewness: 0
Sum: 17801693
Variance: 1135597.7
Monotonicity: Not monotonic
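These statistics fit skey being the consecutive integers 2978 through 6668: there are 3691 distinct values, and the range is 3690. The sketch below assumes that reconstruction, which is not data read from the source. Under that assumption, Python's standard `statistics` module reproduces the reported values.

```python
import statistics

# Assumption: skey is exactly the consecutive integers 2978..6668.
# This is inferred from the summary above (3691 distinct values, min 2978,
# max 6668, range 3690); the actual column is not reproduced here.
skey = list(range(2978, 6669))

n = len(skey)
mean = statistics.mean(skey)
median = statistics.median(skey)
variance = statistics.variance(skey)   # sample variance (n - 1 denominator)
std = statistics.stdev(skey)
cv = std / mean
mad = statistics.median([abs(x - median) for x in skey])

# method="inclusive" interpolates linearly between order statistics,
# which is why the quantiles fall on .5 values.
q1, _, q3 = statistics.quantiles(skey, n=4, method="inclusive")
cuts = statistics.quantiles(skey, n=20, method="inclusive")
p5, p95 = cuts[0], cuts[-1]

print(n, sum(skey), mean, median, mad)
print(round(variance, 1), round(std, 4), round(cv, 7))
print(p5, q1, q3, p95, q3 - q1)
```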
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
4674 | 1 | < 0.1%
4052 | 1 | < 0.1%
4054 | 1 | < 0.1%
4055 | 1 | < 0.1%
4056 | 1 | < 0.1%
4057 | 1 | < 0.1%
4058 | 1 | < 0.1%
4059 | 1 | < 0.1%
4060 | 1 | < 0.1%
4061 | 1 | < 0.1%
Other values (3681) | 3681 | 99.7%
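The Frequency (%) column rounds to one decimal place. A nonzero share that would otherwise print as 0.0% is shown as "< 0.1%" instead. For example, one row out of 3691 (0.027%) appears as "< 0.1%". Two rows (0.054%) round up and appear as "0.1%", as in the Distinct (%) of test_year. The hypothetical helper `fmt_percent` below is a minimal sketch that reproduces these values. It is modeled on the report's output, not copied from ydata-profiling's source.

```python
def fmt_percent(fraction: float) -> str:
    """Format a share in [0, 1] as a percentage with one decimal place.

    A nonzero share that rounds to 0.0% is reported as "< 0.1%", matching
    the values shown in this report.
    """
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    return f"{fraction * 100:.1f}%"


N = 3691  # number of observations in this dataset

for count in (1, 2, 1283, 2408, 3681, 3691):
    print(count, fmt_percent(count / N))
```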

Minimum 10 values

Value | Count | Frequency (%)
2978 | 1 | < 0.1%
2979 | 1 | < 0.1%
2980 | 1 | < 0.1%
2981 | 1 | < 0.1%
2982 | 1 | < 0.1%
2983 | 1 | < 0.1%
2984 | 1 | < 0.1%
2985 | 1 | < 0.1%
2986 | 1 | < 0.1%
2987 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
6668 | 1 | < 0.1%
6667 | 1 | < 0.1%
6666 | 1 | < 0.1%
6665 | 1 | < 0.1%
6664 | 1 | < 0.1%
6663 | 1 | < 0.1%
6662 | 1 | < 0.1%
6661 | 1 | < 0.1%
6660 | 1 | < 0.1%
6659 | 1 | < 0.1%

test_year
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

2019: 2408
2020: 1283

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2019
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value | Count | Frequency (%)
2019 | 2408 | 65.2%
2020 | 1283 | 34.8%

Length

[Figure: Histogram of lengths of the category]


test_month
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.4968843
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
Median: 6
Q3: 9
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.3986572
Coefficient of variation (CV): 0.5231211
Kurtosis: -1.1395352
Mean: 6.4968843
Median Absolute Deviation (MAD): 3
Skewness: 0.050389878
Sum: 23980
Variance: 11.550871
Monotonicity: Not monotonic
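Because test_month has only twelve values, its summary can be recomputed from the per-month counts listed in the tables that follow. The sketch below expands those counts into one value per row and recomputes the sum, mean, sample variance, standard deviation and quartiles. The `counts` dict is transcribed from this report, and the variable names are illustrative.

```python
import math
import statistics

# Month -> number of rows, transcribed from the test_month value tables.
counts = {1: 372, 2: 48, 3: 444, 4: 324, 5: 458, 6: 242,
          7: 316, 8: 317, 9: 277, 10: 228, 11: 359, 12: 306}

# Expand the counts back into one value per row.
values = [month for month, c in counts.items() for _ in range(c)]

n = len(values)
total = sum(values)
mean = total / n
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std = math.sqrt(variance)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(n, total, round(mean, 7), round(variance, 6), round(std, 7))
print(q1, median, q3)
```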
[Figure: Histogram with fixed size bins (bins=12)]
Common values

Value | Count | Frequency (%)
5 | 458 | 12.4%
3 | 444 | 12.0%
1 | 372 | 10.1%
11 | 359 | 9.7%
4 | 324 | 8.8%
8 | 317 | 8.6%
7 | 316 | 8.6%
12 | 306 | 8.3%
9 | 277 | 7.5%
6 | 242 | 6.6%
Other values (2) | 276 | 7.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 372 | 10.1%
2 | 48 | 1.3%
3 | 444 | 12.0%
4 | 324 | 8.8%
5 | 458 | 12.4%
6 | 242 | 6.6%
7 | 316 | 8.6%
8 | 317 | 8.6%
9 | 277 | 7.5%
10 | 228 | 6.2%

Maximum 10 values

Value | Count | Frequency (%)
12 | 306 | 8.3%
11 | 359 | 9.7%
10 | 228 | 6.2%
9 | 277 | 7.5%
8 | 317 | 8.6%
7 | 316 | 8.6%
6 | 242 | 6.6%
5 | 458 | 12.4%
4 | 324 | 8.8%
3 | 444 | 12.0%

spec_name
Text

Distinct: 733
Distinct (%): 19.9%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

Length

Max length: 53
Median length: 23
Mean length: 5.7745868
Min length: 1

Characters and Unicode

Total characters: 21314
Distinct characters: 508
Distinct categories: 10
Distinct scripts: 5
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
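Python's standard `unicodedata.category` returns the two-letter general-category code of a character. Mapping those codes to their long names gives the category labels used in this section. The helper `category_counts` and the sample strings below are hypothetical: they are chosen to exercise several of the categories found in spec_name and are not rows from the dataset.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in the
# report (only the ten categories that appear in spec_name are listed).
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Ll": "Lowercase Letter", "Nd": "Decimal Number",
    "Lu": "Uppercase Letter", "Pd": "Dash Punctuation",
    "Po": "Other Punctuation", "So": "Other Symbol",
}


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical spec_name-style strings (not taken from the dataset).
sample = ["땅콩맛전병", "컵미소-아와세", "S&B (500g)"]
print(category_counts(sample))
```

Precomposed Hangul syllables fall under "Other Letter" (Lo). This is why that category accounts for about 90% of the characters in spec_name.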

Unique

Unique: 121
Unique (%): 3.3%

Sample

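1st row: 아리담배추김치 (napa cabbage kimchi)
2nd row: 땅콩맛전병 (peanut-flavored jeonbyeong crackers)
3rd row: 잣맛전병 (pine-nut-flavored jeonbyeong crackers)
4th row: 김파래맛전병 (seaweed-flavored jeonbyeong crackers)
5th row: 깨땅콩맛전병 (sesame-and-peanut-flavored jeonbyeong crackers)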
ValueCountFrequency (%)
고등어 182
 
3.8%
삼치 91
 
1.9%
우럭 59
 
1.2%
가자미 59
 
1.2%
오징어 45
 
0.9%
소스 45
 
0.9%
명란 39
 
0.8%
생대구 38
 
0.8%
갈치 37
 
0.8%
기꼬만 36
 
0.7%
Other values (847) 4222
87.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1162 | 5.5%
 | 608 | 2.9%
 | 498 | 2.3%
 | 457 | 2.1%
 | 444 | 2.1%
 | 422 | 2.0%
 | 402 | 1.9%
 | 387 | 1.8%
 | 325 | 1.5%
 | 301 | 1.4%
Other values (498) | 16308 | 76.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 19244 | 90.3%
Space Separator | 1162 | 5.5%
Open Punctuation | 246 | 1.2%
Close Punctuation | 246 | 1.2%
Lowercase Letter | 152 | 0.7%
Decimal Number | 149 | 0.7%
Uppercase Letter | 75 | 0.4%
Dash Punctuation | 18 | 0.1%
Other Punctuation | 18 | 0.1%
Other Symbol | 4 | < 0.1%

Most frequent character per category

Other Letter

Value | Count | Frequency (%)
 | 608 | 3.2%
 | 498 | 2.6%
 | 457 | 2.4%
 | 444 | 2.3%
 | 422 | 2.2%
 | 402 | 2.1%
 | 387 | 2.0%
 | 325 | 1.7%
 | 301 | 1.6%
 | 300 | 1.6%
Other values (450) | 15100 | 78.5%

Lowercase Letter

Value | Count | Frequency (%)
j | 24 | 15.8%
u | 14 | 9.2%
i | 12 | 7.9%
r | 12 | 7.9%
a | 10 | 6.6%
e | 10 | 6.6%
o | 10 | 6.6%
s | 8 | 5.3%
t | 8 | 5.3%
l | 8 | 5.3%
Other values (8) | 36 | 23.7%

Uppercase Letter

Value | Count | Frequency (%)
S | 12 | 16.0%
P | 8 | 10.7%
M | 8 | 10.7%
G | 8 | 10.7%
B | 8 | 10.7%
T | 6 | 8.0%
N | 6 | 8.0%
A | 4 | 5.3%
C | 4 | 5.3%
E | 4 | 5.3%
Other values (3) | 7 | 9.3%

Decimal Number

Value | Count | Frequency (%)
0 | 42 | 28.2%
5 | 38 | 25.5%
1 | 29 | 19.5%
6 | 16 | 10.7%
3 | 10 | 6.7%
2 | 7 | 4.7%
9 | 5 | 3.4%
7 | 2 | 1.3%

Other Punctuation

Value | Count | Frequency (%)
& | 13 | 72.2%
, | 2 | 11.1%
/ | 2 | 11.1%
? | 1 | 5.6%

Space Separator

Value | Count | Frequency (%)
(space) | 1162 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 246 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 246 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 18 | 100.0%

Other Symbol

Value | Count | Frequency (%)
 | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 19239 | 90.3%
Common | 1843 | 8.6%
Latin | 227 | 1.1%
Han | 4 | < 0.1%
Katakana | 1 | < 0.1%

Most frequent character per script

Hangul

Value | Count | Frequency (%)
 | 608 | 3.2%
 | 498 | 2.6%
 | 457 | 2.4%
 | 444 | 2.3%
 | 422 | 2.2%
 | 402 | 2.1%
 | 387 | 2.0%
 | 325 | 1.7%
 | 301 | 1.6%
 | 300 | 1.6%
Other values (445) | 15095 | 78.5%

Latin

Value | Count | Frequency (%)
j | 24 | 10.6%
u | 14 | 6.2%
i | 12 | 5.3%
S | 12 | 5.3%
r | 12 | 5.3%
a | 10 | 4.4%
e | 10 | 4.4%
o | 10 | 4.4%
P | 8 | 3.5%
s | 8 | 3.5%
Other values (21) | 107 | 47.1%

Common

Value | Count | Frequency (%)
(space) | 1162 | 63.0%
( | 246 | 13.3%
) | 246 | 13.3%
0 | 42 | 2.3%
5 | 38 | 2.1%
1 | 29 | 1.6%
- | 18 | 1.0%
6 | 16 | 0.9%
& | 13 | 0.7%
3 | 10 | 0.5%
Other values (7) | 23 | 1.2%

Han

Value | Count | Frequency (%)
 | 1 | 25.0%
 | 1 | 25.0%
 | 1 | 25.0%
 | 1 | 25.0%

Katakana

Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 19239 | 90.3%
ASCII | 2066 | 9.7%
Specials | 4 | < 0.1%
CJK | 4 | < 0.1%
Katakana | 1 | < 0.1%
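Unicode blocks are fixed, contiguous code-point ranges, so a character's block can be found with a range lookup. The sketch below uses a hand-written table that covers the five block groups reported above. Three things in it are assumptions for illustration: the `BLOCK_RANGES` table, the grouping of the Hangul Jamo, Compatibility Jamo and Syllables blocks under the single label "Hangul", and the helper names. The profiler may group blocks differently.

```python
from collections import Counter

# Hand-written code-point ranges (inclusive) for the block groups above.
# Assumption: the report's labels correspond to these standard blocks.
BLOCK_RANGES = [
    (0x0000, 0x007F, "ASCII"),      # Basic Latin
    (0x1100, 0x11FF, "Hangul"),     # Hangul Jamo
    (0x30A0, 0x30FF, "Katakana"),
    (0x3130, 0x318F, "Hangul"),     # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),        # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),     # Hangul Syllables
    (0xFFF0, 0xFFFF, "Specials"),
]


def block_of(ch: str) -> str:
    """Return the block-group label for a single character."""
    cp = ord(ch)
    for lo, hi, name in BLOCK_RANGES:
        if lo <= cp <= hi:
            return name
    return "Other"


def block_counts(values):
    """Count characters per block group across all strings."""
    return Counter(block_of(ch) for text in values for ch in text)


# Hypothetical inputs: a Hangul phrase with a space, an ASCII brand name,
# and U+FFFD (REPLACEMENT CHARACTER), which sits in the Specials block.
print(block_counts(["기꼬만 간장", "UCC", "\ufffd"]))
```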

Most frequent character per block

ASCII

Value | Count | Frequency (%)
(space) | 1162 | 56.2%
( | 246 | 11.9%
) | 246 | 11.9%
0 | 42 | 2.0%
5 | 38 | 1.8%
1 | 29 | 1.4%
j | 24 | 1.2%
- | 18 | 0.9%
6 | 16 | 0.8%
u | 14 | 0.7%
Other values (37) | 231 | 11.2%

Hangul

Value | Count | Frequency (%)
 | 608 | 3.2%
 | 498 | 2.6%
 | 457 | 2.4%
 | 444 | 2.3%
 | 422 | 2.2%
 | 402 | 2.1%
 | 387 | 2.0%
 | 325 | 1.7%
 | 301 | 1.6%
 | 300 | 1.6%
Other values (445) | 15095 | 78.5%

Specials

Value | Count | Frequency (%)
 | 4 | 100.0%

CJK

Value | Count | Frequency (%)
 | 1 | 25.0%
 | 1 | 25.0%
 | 1 | 25.0%
 | 1 | 25.0%

Katakana

Value | Count | Frequency (%)
 | 1 | 100.0%
100.0%

kind
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

가공식품 (processed foods): 1776
수산물 (fishery products): 1631
농산물 (agricultural products): 251
축산물 (livestock products): 33

Length

Max length: 4
Median length: 3
Mean length: 3.4811704
Min length: 3
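The Length statistics are taken over the string length of each row's value, so they can be recomputed from the four kind counts. In the sketch below, the dict is transcribed from the kind tables in this report and the variable names are illustrative. Python's `len` counts code points, and each precomposed Hangul syllable is one code point.

```python
import statistics

# Row counts per kind value, transcribed from the kind tables in this report.
kind_counts = {"가공식품": 1776, "수산물": 1631, "농산물": 251, "축산물": 33}

# One length per row: each label's character count, repeated by its row count.
lengths = [len(label) for label, count in kind_counts.items() for _ in range(count)]

print("Max length", max(lengths))
print("Median length", statistics.median(lengths))
print("Mean length", round(statistics.mean(lengths), 7))
print("Min length", min(lengths))
```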

Unique

Unique: 0
Unique (%): 0.0%

Sample

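1st row: 가공식품 (processed foods)
2nd row: 가공식품 (processed foods)
3rd row: 가공식품 (processed foods)
4th row: 가공식품 (processed foods)
5th row: 가공식품 (processed foods)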

Common Values

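Value | Count | Frequency (%)
가공식품 (processed foods) | 1776 | 48.1%
수산물 (fishery products) | 1631 | 44.2%
농산물 (agricultural products) | 251 | 6.8%
축산물 (livestock products) | 33 | 0.9%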

Length

[Figure: Histogram of lengths of the category]


test_result
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

적합 (compliant): 3691

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 적합 (compliant)
2nd row: 적합 (compliant)
3rd row: 적합 (compliant)
4th row: 적합 (compliant)
5th row: 적합 (compliant)

Common Values

Value | Count | Frequency (%)
적합 (compliant) | 3691 | 100.0%

Length

[Figure: Histogram of lengths of the category]


detec_result
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

불검출 (not detected): 3684
2 Bq/kg: 4
137Cs, 9.8 Bq/kg 검출 (137Cs, 9.8 Bq/kg detected): 2
137Cs 0.9 Bq/kg: 1

Length

Max length: 19
Median length: 3
Mean length: 3.0162558
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 불검출 (not detected)
2nd row: 불검출 (not detected)
3rd row: 불검출 (not detected)
4th row: 불검출 (not detected)
5th row: 불검출 (not detected)

Common Values

Value | Count | Frequency (%)
불검출 (not detected) | 3684 | 99.8%
2 Bq/kg | 4 | 0.1%
137Cs, 9.8 Bq/kg 검출 (137Cs, 9.8 Bq/kg detected) | 2 | 0.1%
137Cs 0.9 Bq/kg | 1 | < 0.1%
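The Imbalance alert for detec_result (98.9%) is consistent with a normalized-entropy score, 1 - H / ln(k). Here H is the Shannon entropy (natural log) of the value distribution and k is the number of distinct values. A score of 0 means a perfectly balanced column, and the score approaches 1 as one value dominates. The sketch below assumes that definition and applies it to the four detec_result counts above. The function name `imbalance` is hypothetical.

```python
import math


def imbalance(counts):
    """Normalized-entropy imbalance: 1 - H / ln(k).

    Assumed definition; 0 for a perfectly balanced column, close to 1
    when a single value dominates. Undefined for fewer than two classes.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("imbalance needs at least two distinct values")
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# detec_result value counts from the table above.
detec_counts = [3684, 4, 2, 1]
score = imbalance(detec_counts)
print(f"{score:.1%}")
```

Under this definition, the constant test_result column (a single distinct value) has no imbalance score. That column is flagged as Constant instead.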

Length

[Figure: Histogram of lengths of the category]

Word counts

Value | Count | Frequency (%)
불검출 (not detected) | 3684 | 99.5%
bq/kg | 7 | 0.2%
2 | 4 | 0.1%
137cs | 3 | 0.1%
9.8 | 2 | 0.1%
검출 (detected) | 2 | 0.1%
0.9 | 1 | < 0.1%
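These word counts differ from the Common Values table because each value is first broken into words. The sketch below reproduces the table under the following rules: lower-case each distinct value, split it on whitespace, strip leading and trailing punctuation from each word, and credit each word with its value's row count. Under these rules, for example, "137Cs," becomes "137cs". The rules are inferred from the output and are an assumption about how the profiler tokenizes, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(value_counts):
    """Count words across a categorical column, given its value counts."""
    words = Counter()
    for value, count in value_counts.items():
        for word in value.lower().split():
            word = word.strip(string.punctuation)  # "137cs," -> "137cs"
            if word:
                words[word] += count
    return words


# detec_result value counts from the Common Values table above.
detec_result = {
    "불검출": 3684,
    "2 Bq/kg": 4,
    "137Cs, 9.8 Bq/kg 검출": 2,
    "137Cs 0.9 Bq/kg": 1,
}
for word, count in word_counts(detec_result).most_common():
    print(word, count)
```

The same rules split the constant last_load_dttm value "2023-05-01 05:49:03" into a date word and a time word, each counted 3691 times.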

origin
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 43
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

국내산 (domestic): 2139
일본산 (Japan): 1064
국산 (domestic): 92
러시아산 (Russia): 86
미국산 (USA): 47
Other values (38): 263

Length

Max length: 13
Median length: 3
Mean length: 3.0671905
Min length: 2

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: 국내산 (domestic)
2nd row: 국내산 (domestic)
3rd row: 국내산 (domestic)
4th row: 국내산 (domestic)
5th row: 국내산 (domestic)

Common Values

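Value | Count | Frequency (%)
국내산 (domestic) | 2139 | 58.0%
일본산 (Japan) | 1064 | 28.8%
국산 (domestic) | 92 | 2.5%
러시아산 (Russia) | 86 | 2.3%
미국산 (USA) | 47 | 1.3%
노르웨이산 (Norway) | 30 | 0.8%
중국산 (China) | 29 | 0.8%
포르투칼산 (Portugal) | 25 | 0.7%
원양산 (distant-water catch) | 24 | 0.7%
러시아 (Russia) | 24 | 0.7%
Other values (33) | 131 | 3.5%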

Length

[Figure: Histogram of lengths of the category]
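
Word counts

Value | Count | Frequency (%)
국내산 (domestic) | 2139 | 57.7%
일본산 (Japan) | 1064 | 28.7%
국산 (domestic) | 92 | 2.5%
러시아산 (Russia) | 86 | 2.3%
미국산 (USA) | 47 | 1.3%
노르웨이산 (Norway) | 30 | 0.8%
중국산 (China) | 29 | 0.8%
포르투칼산 (Portugal) | 25 | 0.7%
원양산 (distant-water catch) | 24 | 0.6%
러시아 (Russia) | 24 | 0.6%
Other values (35) | 145 | 3.9%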

data_day
Categorical

HIGH CORRELATION 

Distinct: 23
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

2019-04-30: 340
2020-02-03: 316
2019-12-13: 284
2019-09-17: 256
2019-12-31: 252
Other values (18): 2243

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019-03-30
2nd row: 2019-03-30
3rd row: 2019-03-30
4th row: 2019-03-30
5th row: 2019-03-30

Common Values

Value | Count | Frequency (%)
2019-04-30 | 340 | 9.2%
2020-02-03 | 316 | 8.6%
2019-12-13 | 284 | 7.7%
2019-09-17 | 256 | 6.9%
2019-12-31 | 252 | 6.8%
2019-06-10 | 248 | 6.7%
2019-05-13 | 212 | 5.7%
2019-10-14 | 212 | 5.7%
2020-06-04 | 210 | 5.7%
2019-08-12 | 184 | 5.0%
Other values (13) | 1177 | 31.9%

Length

[Figure: Histogram of lengths of the category]

apr_at
Boolean

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

False: 3691

Value | Count | Frequency (%)
False | 3691 | 100.0%

last_load_dttm
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 KiB

2023-05-01 05:49:03: 3691

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-05-01 05:49:03
2nd row: 2023-05-01 05:49:03
3rd row: 2023-05-01 05:49:03
4th row: 2023-05-01 05:49:03
5th row: 2023-05-01 05:49:03

Common Values

Value | Count | Frequency (%)
2023-05-01 05:49:03 | 3691 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Word counts

Value | Count | Frequency (%)
2023-05-01 | 3691 | 50.0%
05:49:03 | 3691 | 50.0%

Interactions

[Figures: 4 pairwise interaction plots of the numeric variables]

Correlations

Variable | skey | test_year | test_month | kind | detec_result | origin | data_day
skey | 1.000 | 0.929 | 0.806 | 0.225 | 0.016 | 0.231 | 0.882
test_year | 0.929 | 1.000 | 0.518 | 0.344 | 0.054 | 0.311 | 1.000
test_month | 0.806 | 0.518 | 1.000 | 0.302 | 0.111 | 0.467 | 1.000
kind | 0.225 | 0.344 | 0.302 | 1.000 | 0.117 | 0.613 | 0.441
detec_result | 0.016 | 0.054 | 0.111 | 0.117 | 1.000 | 0.829 | 0.201
origin | 0.231 | 0.311 | 0.467 | 0.613 | 0.829 | 1.000 | 0.595
data_day | 0.882 | 1.000 | 1.000 | 0.441 | 0.201 | 0.595 | 1.000

Variable | test_year | origin | detec_result | kind | data_day
test_year | 1.000 | 0.259 | 0.036 | 0.230 | 0.997
origin | 0.259 | 1.000 | 0.580 | 0.355 | 0.175
detec_result | 0.036 | 0.580 | 1.000 | 0.047 | 0.108
kind | 0.230 | 0.355 | 0.047 | 1.000 | 0.249
data_day | 0.997 | 0.175 | 0.108 | 0.249 | 1.000

Variable | skey | test_month | test_year | kind | detec_result | origin | data_day
skey | 1.000 | 0.168 | 0.775 | 0.136 | 0.010 | 0.081 | 0.581
test_month | 0.168 | 1.000 | 0.398 | 0.184 | 0.066 | 0.179 | 0.998
test_year | 0.775 | 0.398 | 1.000 | 0.230 | 0.036 | 0.259 | 0.997
kind | 0.136 | 0.184 | 0.230 | 1.000 | 0.047 | 0.355 | 0.249
detec_result | 0.010 | 0.066 | 0.036 | 0.047 | 1.000 | 0.580 | 0.108
origin | 0.081 | 0.179 | 0.259 | 0.355 | 0.580 | 1.000 | 0.175
data_day | 0.581 | 0.998 | 0.997 | 0.249 | 0.108 | 0.175 | 1.000
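Comparing these matrices with the Alerts list suggests the High correlation alerts come from the last matrix. Flagging every pair above 0.5 in it reproduces all six alerts. The first matrix, by contrast, would also flag kind (0.613 with origin), and kind has no alert. The 0.5 cutoff is an assumption, because the configured threshold is not shown in this report. `high_correlation_alerts` is a hypothetical helper that lists partners in matrix-column order.

```python
# Last correlation matrix above, transcribed by hand.
COLUMNS = ["skey", "test_month", "test_year", "kind",
           "detec_result", "origin", "data_day"]
MATRIX = [
    [1.000, 0.168, 0.775, 0.136, 0.010, 0.081, 0.581],
    [0.168, 1.000, 0.398, 0.184, 0.066, 0.179, 0.998],
    [0.775, 0.398, 1.000, 0.230, 0.036, 0.259, 0.997],
    [0.136, 0.184, 0.230, 1.000, 0.047, 0.355, 0.249],
    [0.010, 0.066, 0.036, 0.047, 1.000, 0.580, 0.108],
    [0.081, 0.179, 0.259, 0.355, 0.580, 1.000, 0.175],
    [0.581, 0.998, 0.997, 0.249, 0.108, 0.175, 1.000],
]
THRESHOLD = 0.5  # assumed cutoff; not stated in the report


def high_correlation_alerts(columns, matrix, threshold):
    """Build one alert message per column with any partner above threshold."""
    alerts = []
    for i, name in enumerate(columns):
        # Partners in matrix-column order, skipping the diagonal.
        partners = [columns[j] for j, value in enumerate(matrix[i])
                    if j != i and value > threshold]
        if not partners:
            continue
        message = f"{name} is highly overall correlated with {partners[0]}"
        others = len(partners) - 1
        if others == 1:
            message += " and 1 other field"
        elif others > 1:
            message += f" and {others} other fields"
        alerts.append(message)
    return alerts


for alert in high_correlation_alerts(COLUMNS, MATRIX, THRESHOLD):
    print(alert)
```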

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix. A data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

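First rows

Index | skey | test_year | test_month | spec_name | kind | test_result | detec_result | origin | data_day | apr_at | last_load_dttm
0 | 4674 | 2019 | 1 | 아리담배추김치 (napa cabbage kimchi) | 가공식품 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
1 | 4675 | 2019 | 1 | 땅콩맛전병 (peanut-flavored jeonbyeong crackers) | 가공식품 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
2 | 4676 | 2019 | 1 | 잣맛전병 (pine-nut-flavored jeonbyeong crackers) | 가공식품 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
3 | 4677 | 2019 | 1 | 김파래맛전병 (seaweed-flavored jeonbyeong crackers) | 가공식품 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
4 | 4678 | 2019 | 1 | 깨땅콩맛전병 (sesame-and-peanut-flavored jeonbyeong crackers) | 가공식품 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
5 | 4679 | 2019 | 1 | 유씨씨블랙넌슈가PET (UCC Black No Sugar, PET bottle) | 가공식품 | 적합 | 불검출 | 일본산 | 2019-03-30 | N | 2023-05-01 05:49:03
6 | 4680 | 2019 | 1 | 송로 | 가공식품 | 적합 | 불검출 | 일본산 | 2019-03-30 | N | 2023-05-01 05:49:03
7 | 4681 | 2019 | 1 | 치즈크래커 (cheese crackers) | 가공식품 | 적합 | 불검출 | 일본산 | 2019-03-30 | N | 2023-05-01 05:49:03
8 | 4682 | 2019 | 1 | 제주갈치 (Jeju hairtail) | 수산물 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03
9 | 4683 | 2019 | 1 | 아구 (monkfish) | 수산물 | 적합 | 불검출 | 국내산 | 2019-03-30 | N | 2023-05-01 05:49:03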
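
Last rows

Index | skey | test_year | test_month | spec_name | kind | test_result | detec_result | origin | data_day | apr_at | last_load_dttm
3681 | 6659 | 2020 | 12 | 전갱이 (horse mackerel) | 수산물 | 적합 | 불검출 | 국내산 | 2020-12-23 | N | 2023-05-01 05:49:03
3682 | 6660 | 2020 | 12 | 백조기 (white croaker) | 수산물 | 적합 | 불검출 | 국내산 | 2020-12-23 | N | 2023-05-01 05:49:03
3683 | 6661 | 2020 | 12 | 동태 (frozen pollock) | 수산물 | 적합 | 불검출 | 러시아산 | 2020-12-23 | N | 2023-05-01 05:49:03
3684 | 6662 | 2020 | 12 | 소바가게 소바쯔유 (soba tsuyu dipping sauce) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03
3685 | 6663 | 2020 | 12 | 기꼬만 환대두생간장 (Kikkoman whole-soybean raw soy sauce) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03
3686 | 6664 | 2020 | 12 | 다시마장유 (kelp soy sauce) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03
3687 | 6665 | 2020 | 12 | 기꼬만혼쯔유 (Kikkoman Hon Tsuyu) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03
3688 | 6666 | 2020 | 12 | 컵미소-아와세 (cup miso, awase) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03
3689 | 6667 | 2020 | 12 | 한우등심 (Hanwoo beef sirloin) | 축산물 | 적합 | 불검출 | 국내산 | 2020-12-23 | N | 2023-05-01 05:49:03
3690 | 6668 | 2020 | 12 | 에스앤비 골든카레 매운맛 (S&B Golden Curry, hot) | 가공식품 | 적합 | 불검출 | 일본산 | 2020-12-23 | N | 2023-05-01 05:49:03

Coded values in the sample: 가공식품 = processed foods, 수산물 = fishery products, 축산물 = livestock products, 적합 = compliant, 불검출 = not detected, 국내산 = domestic, 일본산 = Japan, 러시아산 = Russia.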