Overview

Dataset statistics

Number of variables: 11
Number of observations: 1696
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 150.8 KiB
Average record size in memory: 91.1 B

Variable types

Numeric: 2
Categorical: 5
Text: 1
DateTime: 2
Boolean: 1

Alerts

test_result has constant value "적합" (Constant)
apr_at has constant value "False" (Constant)
skey is highly overall correlated with test_year (High correlation)
test_month is highly overall correlated with test_year (High correlation)
test_year is highly overall correlated with skey and 1 other field (High correlation)
detec_result is highly overall correlated with origin (High correlation)
origin is highly overall correlated with detec_result (High correlation)
detec_result is highly imbalanced (98.7%) (Imbalance)
origin is highly imbalanced (63.9%) (Imbalance)
skey has unique values (Unique)
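The 98.7% imbalance reported for detec_result is consistent with a normalized-entropy score: one minus the Shannon entropy of the class frequencies, divided by the maximum entropy possible for that number of classes. The sketch below assumes that formulation; the function name `imbalance_score` is illustrative and is not taken from the report. The counts 1693, 2 and 1 are the detec_result value counts listed in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (0 = balanced, 1 = one class dominates)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class leaves nothing to balance
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Value counts of detec_result as listed in the report.
detec_result_counts = [1693, 2, 1]
score = imbalance_score(detec_result_counts)
print(f"detec_result imbalance: {score:.1%}")
```

The origin score (63.9%) cannot be rechecked the same way from this report, because only the most frequent of its 40 categories are listed with counts.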

Reproduction

Analysis started: 2023-12-10 08:45:59.765481
Analysis finished: 2023-12-10 08:46:02.651595
Duration: 2.89 seconds
Software version: ydata-profiling v4.5.1
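The reported duration is the difference between the two analysis timestamps. A minimal sketch of that arithmetic, using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2023-12-10 08:45:59.765481", FMT)
finished = datetime.strptime("2023-12-10 08:46:02.651595", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```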

Variables

skey
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 1696
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3825.5
Minimum: 2978
Maximum: 4673
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.0 KiB

Quantile statistics

Minimum: 2978
5-th percentile: 3062.75
Q1: 3401.75
Median: 3825.5
Q3: 4249.25
95-th percentile: 4588.25
Maximum: 4673
Range: 1695
Interquartile range (IQR): 847.5

Descriptive statistics

Standard deviation: 489.73734
Coefficient of variation (CV): 0.12801917
Kurtosis: -1.2
Mean: 3825.5
Median Absolute Deviation (MAD): 424
Skewness: 0
Sum: 6488048
Variance: 239842.67
Monotonicity: Strictly increasing
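These figures are what one would expect from a plain running key. The sketch below assumes that skey holds the consecutive integers 2978 to 4673; this is inferred from the 1696 distinct values, the range of 1695 and the strictly increasing order, and is not stated outright in the report. The `quantile` helper is a hypothetical linear-interpolation implementation written for this example.

```python
import statistics

# Assumption: skey is the run of consecutive integers 2978..4673 (1696 values).
skey = list(range(2978, 4674))


def quantile(sorted_values, q):
    """Quantile of an already sorted list, using linear interpolation between ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (pos - lower) * (sorted_values[upper] - sorted_values[lower])


mean = statistics.fmean(skey)
median = statistics.median(skey)
variance = statistics.variance(skey)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(skey)
cv = std_dev / mean
# Median absolute deviation: median distance of each value from the median.
mad = statistics.median(abs(x - median) for x in skey)
q1, q3 = quantile(skey, 0.25), quantile(skey, 0.75)

print("n, sum, mean, median:", len(skey), sum(skey), mean, median)
print("variance, std, cv, mad:", round(variance, 2), round(std_dev, 5), round(cv, 8), mad)
print("p5, q1, q3, p95, iqr:", round(quantile(skey, 0.05), 2), q1, q3,
      round(quantile(skey, 0.95), 2), q3 - q1)
```

Under that assumption the computed values agree with the quantile and descriptive statistics listed for skey, which suggests the column carries no information beyond row order.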
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
2978  1  0.1%
4118  1  0.1%
4116  1  0.1%
4115  1  0.1%
4114  1  0.1%
4113  1  0.1%
4112  1  0.1%
4111  1  0.1%
4110  1  0.1%
4109  1  0.1%
Other values (1686)  1686  99.4%
Value  Count  Frequency (%)
2978  1  0.1%
2979  1  0.1%
2980  1  0.1%
2981  1  0.1%
2982  1  0.1%
2983  1  0.1%
2984  1  0.1%
2985  1  0.1%
2986  1  0.1%
2987  1  0.1%
Value  Count  Frequency (%)
4673  1  0.1%
4672  1  0.1%
4671  1  0.1%
4670  1  0.1%
4669  1  0.1%
4668  1  0.1%
4667  1  0.1%
4666  1  0.1%
4665  1  0.1%
4664  1  0.1%

test_year
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
2019: 1204
2020: 492

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2019
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value  Count  Frequency (%)
2019  1204  71.0%
2020  492  29.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2019  1204  71.0%
2020  492  29.0%

test_month
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.1892689
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.3507138
Coefficient of variation (CV): 0.54137474
Kurtosis: -1.0486548
Mean: 6.1892689
Median Absolute Deviation (MAD): 3
Skewness: 0.18489647
Sum: 10497
Variance: 11.227283
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value  Count  Frequency (%)
5  229  13.5%
3  222  13.1%
1  186  11.0%
4  162  9.6%
7  158  9.3%
11  142  8.4%
8  128  7.5%
12  126  7.4%
6  121  7.1%
9  106  6.2%
Other values (2)  116  6.8%
Value  Count  Frequency (%)
1  186  11.0%
2  24  1.4%
3  222  13.1%
4  162  9.6%
5  229  13.5%
6  121  7.1%
7  158  9.3%
8  128  7.5%
9  106  6.2%
10  92  5.4%
Value  Count  Frequency (%)
12  126  7.4%
11  142  8.4%
10  92  5.4%
9  106  6.2%
8  128  7.5%
7  158  9.3%
6  121  7.1%
5  229  13.5%
4  162  9.6%
3  222  13.1%
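Because test_month takes only twelve values, its summary statistics can be rebuilt from the frequency tables alone. The sketch below does this with the per-month counts copied from those tables (months 2 and 10 appear only in the minimum and maximum value tables). The helper `value_at_rank` is a hypothetical name introduced for this example.

```python
import math

# Observations per month, copied from the test_month frequency tables.
month_counts = {1: 186, 2: 24, 3: 222, 4: 162, 5: 229, 6: 121,
                7: 158, 8: 128, 9: 106, 10: 92, 11: 142, 12: 126}

n = sum(month_counts.values())
total = sum(month * count for month, count in month_counts.items())
mean = total / n

# Sample variance from grouped data: weighted squared deviations over (n - 1).
sum_sq_dev = sum(count * (month - mean) ** 2 for month, count in month_counts.items())
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)


def value_at_rank(rank):
    """Return the month found at a 1-based rank of the sorted observations."""
    cumulative = 0
    for month in sorted(month_counts):
        cumulative += month_counts[month]
        if rank <= cumulative:
            return month
    raise ValueError("rank exceeds the number of observations")


# n is even, so the median averages the two middle observations.
median = (value_at_rank(n // 2) + value_at_rank(n // 2 + 1)) / 2

print("n, sum, mean:", n, total, round(mean, 7))
print("variance, std, median:", round(variance, 6), round(std_dev, 7), median)
```

The counts add up to the 1696 observations and reproduce the reported sum, mean, variance and median, so the frequency tables and the descriptive statistics are mutually consistent.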

spec_name
Text

Distinct: 604
Distinct (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB

Length

Max length: 53
Median length: 17
Mean length: 5.6745283
Min length: 1

Characters and Unicode

Total characters: 9624
Distinct characters: 462
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 164
Unique (%): 9.7%

Sample

1st row: 멜론향소다음료
2nd row: 라무바틀탄산음료
3rd row: 쿠리이리도라야키
4th row: 기장돌미역
5th row: 기장다시마
Value  Count  Frequency (%)
고등어  88  4.0%
삼치  45  2.0%
우럭  29  1.3%
가자미  29  1.3%
오징어  21  0.9%
소스  21  0.9%
생대구  19  0.9%
갈치  18  0.8%
아와세미소  17  0.8%
제주갈치  17  0.8%
Other values (705)  1912  86.3%

Most occurring characters

ValueCountFrequency (%)
520
 
5.4%
289
 
3.0%
229
 
2.4%
201
 
2.1%
198
 
2.1%
193
 
2.0%
190
 
2.0%
177
 
1.8%
146
 
1.5%
144
 
1.5%
Other values (452) 7337
76.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  8699  90.4%
Space Separator  520  5.4%
Open Punctuation  112  1.2%
Close Punctuation  112  1.2%
Lowercase Letter  72  0.7%
Decimal Number  63  0.7%
Uppercase Letter  31  0.3%
Other Punctuation  8  0.1%
Dash Punctuation  7  0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
289
 
3.3%
229
 
2.6%
201
 
2.3%
198
 
2.3%
193
 
2.2%
190
 
2.2%
177
 
2.0%
146
 
1.7%
144
 
1.7%
140
 
1.6%
Other values (409) 6792
78.1%
Lowercase Letter
Value  Count  Frequency (%)
j  8  11.1%
u  7  9.7%
i  6  8.3%
r  6  8.3%
o  5  6.9%
a  5  6.9%
e  5  6.9%
t  4  5.6%
s  4  5.6%
l  4  5.6%
Other values (8)  18  25.0%
Uppercase Letter
Value  Count  Frequency (%)
S  6  19.4%
B  4  12.9%
M  4  12.9%
P  3  9.7%
T  3  9.7%
G  3  9.7%
C  2  6.5%
N  2  6.5%
E  2  6.5%
L  1  3.2%
Decimal Number
Value  Count  Frequency (%)
0  19  30.2%
5  17  27.0%
1  11  17.5%
6  6  9.5%
3  5  7.9%
2  3  4.8%
9  2  3.2%
Other Punctuation
Value  Count  Frequency (%)
&  6  75.0%
/  1  12.5%
,  1  12.5%
Space Separator
Value  Count  Frequency (%)
(space)  520  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  112  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  112  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  7  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  8699  90.4%
Common  822  8.5%
Latin  103  1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
289
 
3.3%
229
 
2.6%
201
 
2.3%
198
 
2.3%
193
 
2.2%
190
 
2.2%
177
 
2.0%
146
 
1.7%
144
 
1.7%
140
 
1.6%
Other values (409) 6792
78.1%
Latin
Value  Count  Frequency (%)
j  8  7.8%
u  7  6.8%
i  6  5.8%
r  6  5.8%
S  6  5.8%
o  5  4.9%
a  5  4.9%
e  5  4.9%
t  4  3.9%
s  4  3.9%
Other values (19)  47  45.6%
Common
Value  Count  Frequency (%)
(space)  520  63.3%
(  112  13.6%
)  112  13.6%
0  19  2.3%
5  17  2.1%
1  11  1.3%
-  7  0.9%
6  6  0.7%
&  6  0.7%
3  5  0.6%
Other values (4)  7  0.9%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  8699  90.4%
ASCII  925  9.6%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  520  56.2%
(  112  12.1%
)  112  12.1%
0  19  2.1%
5  17  1.8%
1  11  1.2%
j  8  0.9%
-  7  0.8%
u  7  0.8%
i  6  0.6%
Other values (33)  106  11.5%
Hangul
ValueCountFrequency (%)
289
 
3.3%
229
 
2.6%
201
 
2.3%
198
 
2.3%
193
 
2.2%
190
 
2.2%
177
 
2.0%
146
 
1.7%
144
 
1.7%
140
 
1.6%
Other values (409) 6792
78.1%

kind
Categorical

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
수산물: 795
가공식품: 767
농산물: 118
축산물: 16

Length

Max length: 4
Median length: 3
Mean length: 3.4522406
Min length: 3
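The length statistics follow directly from the category counts: each value contributes its character count once per occurrence. A minimal sketch, assuming lengths are counted in Unicode code points of the NFC-normalized string (the normalization step is an assumption added here so that Hangul syllables count as one character each):

```python
import unicodedata

# Category counts copied from the kind section of this report.
kind_counts = {"수산물": 795, "가공식품": 767, "농산물": 118, "축산물": 16}


def char_length(text):
    """Number of code points after NFC normalization (assumed length definition)."""
    return len(unicodedata.normalize("NFC", text))


n = sum(kind_counts.values())
lengths = {value: char_length(value) for value in kind_counts}
mean_length = sum(lengths[value] * count for value, count in kind_counts.items()) / n
max_length = max(lengths.values())
min_length = min(lengths.values())

print("n, max, min, mean:", n, max_length, min_length, round(mean_length, 7))
```

Only 가공식품 has four characters, so the mean length is 3 plus the share of rows in that category.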

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가공식품
2nd row: 가공식품
3rd row: 가공식품
4th row: 수산물
5th row: 수산물

Common Values

Value  Count  Frequency (%)
수산물  795  46.9%
가공식품  767  45.2%
농산물  118  7.0%
축산물  16  0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
수산물  795  46.9%
가공식품  767  45.2%
농산물  118  7.0%
축산물  16  0.9%

test_result
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
적합: 1696

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 적합
2nd row: 적합
3rd row: 적합
4th row: 적합
5th row: 적합

Common Values

Value  Count  Frequency (%)
적합  1696  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
적합  1696  100.0%

detec_result
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
불검출: 1693
2 Bq/kg: 2
137Cs, 9.8 Bq/kg 검출: 1

Length

Max length: 19
Median length: 3
Mean length: 3.0141509
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 불검출
2nd row: 불검출
3rd row: 불검출
4th row: 불검출
5th row: 불검출

Common Values

Value  Count  Frequency (%)
불검출  1693  99.8%
2 Bq/kg  2  0.1%
137Cs, 9.8 Bq/kg 검출  1  0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
불검출  1693  99.5%
bq/kg  3  0.2%
2  2  0.1%
137cs  1  0.1%
9.8  1  0.1%
검출  1  0.1%
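The second table for detec_result counts words rather than whole values, which is why bq/kg appears three times and the percentages are taken over 1701 tokens instead of 1696 rows. The sketch below reproduces it from the value counts. The tokenization rule (lowercase, split on whitespace, strip punctuation from the ends of each token) is an assumption chosen because it matches the table, not a documented guarantee.

```python
import string
from collections import Counter

# Value counts copied from the Common Values table for detec_result.
value_counts = {"불검출": 1693, "2 Bq/kg": 2, "137Cs, 9.8 Bq/kg 검출": 1}

word_counts = Counter()
for value, count in value_counts.items():
    for token in value.lower().split():
        # Strip punctuation only at the token edges, so "bq/kg" and "9.8" stay intact.
        word = token.strip(string.punctuation + string.whitespace)
        if word:
            word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word}  {count}  {count / total_words:.1%}")
```

The three non-standard values mix a nuclide label, a number and a unit in free text, so any numeric analysis of detected activity would first need them parsed into a consistent form.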

origin
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 40
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
국내산: 992
일본산: 471
국산: 46
러시아산: 41
미국산: 22
Other values (35): 124

Length

Max length: 13
Median length: 3
Mean length: 3.0689858
Min length: 2

Unique

Unique: 13
Unique (%): 0.8%

Sample

1st row: 일본산
2nd row: 일본산
3rd row: 일본산
4th row: 국내산
5th row: 국내산

Common Values

Value  Count  Frequency (%)
국내산  992  58.5%
일본산  471  27.8%
국산  46  2.7%
러시아산  41  2.4%
미국산  22  1.3%
노르웨이산  15  0.9%
원양산  12  0.7%
러시아  12  0.7%
포르투칼산  12  0.7%
중국산  11  0.6%
Other values (30)  62  3.7%
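The "Other values" row is the remainder after the ten most frequent categories are listed. A short sketch of that bookkeeping for origin, using the counts from the Common Values table and the totals reported for the variable (variable names are illustrative):

```python
# Top ten origin values and their counts, copied from the Common Values table.
top_counts = {"국내산": 992, "일본산": 471, "국산": 46, "러시아산": 41, "미국산": 22,
              "노르웨이산": 15, "원양산": 12, "러시아": 12, "포르투칼산": 12, "중국산": 11}

n_observations = 1696  # rows in the dataset
n_distinct = 40        # distinct origin values reported

other_distinct = n_distinct - len(top_counts)
other_count = n_observations - sum(top_counts.values())

print(f"Other values ({other_distinct})  {other_count}  {other_count / n_observations:.1%}")
for value, count in top_counts.items():
    print(f"{value}  {count}  {count / n_observations:.1%}")
```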

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
국내산  992  58.3%
일본산  471  27.7%
국산  46  2.7%
러시아산  41  2.4%
미국산  22  1.3%
노르웨이산  15  0.9%
원양산  12  0.7%
러시아  12  0.7%
포르투칼산  12  0.7%
중국산  11  0.6%
Other values (32)  69  4.1%

data_day
DateTime

Distinct: 18
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
Minimum: 2019-03-30 00:00:00
Maximum: 2020-08-10 00:00:00
Histogram with fixed size bins (bins=18)

apr_at
Boolean

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
False: 1696
Value  Count  Frequency (%)
False  1696  100.0%

last_load_dttm
DateTime

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB
Minimum: 2020-12-21 12:00:43
Maximum: 2020-12-21 12:00:44
Histogram with fixed size bins (bins=2)

Interactions

[Figures: four interaction plots]

Correlations

                skey   test_year  test_month  kind   detec_result  origin  data_day  last_load_dttm
skey            1.000  0.927      0.884       0.230  0.000         0.285   0.896     0.964
test_year       0.927  1.000      0.736       0.337  0.016         0.338   1.000     0.167
test_month      0.884  0.736      1.000       0.292  0.096         0.442   1.000     0.521
kind            0.230  0.337      0.292       1.000  0.057         0.639   0.376     0.171
detec_result    0.000  0.016      0.096       0.057  1.000         0.889   0.205     0.000
origin          0.285  0.338      0.442       0.639  0.889         1.000   0.525     0.000
data_day        0.896  1.000      1.000       0.376  0.205         0.525   1.000     0.657
last_load_dttm  0.964  0.167      0.521       0.171  0.000         0.000   0.657     1.000

              origin  detec_result  test_year  kind
origin        1.000   0.706         0.266      0.352
detec_result  0.706   1.000         0.027      0.054
test_year     0.266   0.027         1.000      0.225
kind          0.352   0.054         0.225      1.000

              skey    test_month  test_year  kind   detec_result  origin
skey          1.000   -0.050      0.771      0.139  0.000         0.094
test_month    -0.050  1.000       0.575      0.178  0.057         0.155
test_year     0.771   0.575       1.000      0.225  0.027         0.266
kind          0.139   0.178       0.225      1.000  0.054         0.352
detec_result  0.000   0.057       0.027      0.054  1.000         0.706
origin        0.094   0.155       0.266      0.352  0.706         1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  skey  test_year  test_month  spec_name  kind  test_result  detec_result  origin  data_day  apr_at  last_load_dttm
0  2978  2019  12  멜론향소다음료  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
1  2979  2019  12  라무바틀탄산음료  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
2  2980  2019  12  쿠리이리도라야키  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
3  2981  2019  12  기장돌미역  수산물  적합  불검출  국내산  2019-12-31  N  2020-12-21 12:00:43
4  2982  2019  12  기장다시마  수산물  적합  불검출  국내산  2019-12-31  N  2020-12-21 12:00:43
5  2983  2019  12  기장 재래미역  수산물  적합  불검출  국내산  2019-12-31  N  2020-12-21 12:00:43
6  2984  2019  12  기꼬만혼쯔유  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
7  2985  2019  12  계란또리에뿌리는 간장소스  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
8  2986  2019  12  끄유노모토  가공식품  적합  불검출  일본산  2019-12-31  N  2020-12-21 12:00:43
9  2987  2019  12  고등어  수산물  적합  불검출  국내산  2019-12-31  N  2020-12-21 12:00:43
(index)  skey  test_year  test_month  spec_name  kind  test_result  detec_result  origin  data_day  apr_at  last_load_dttm
1686  4664  2020  7  한치  수산물  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1687  4665  2020  7  사양벌꿀  가공식품  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1688  4666  2020  7  복음자리 포도잼  가공식품  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1689  4667  2020  7  오가닉스토리유기농딸기잼  가공식품  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1690  4668  2020  7  제주갈치  수산물  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1691  4669  2020  7  참가자미  수산물  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44
1692  4670  2020  7  사시미소유  가공식품  적합  불검출  일본산  2020-08-10  N  2020-12-21 12:00:44
1693  4671  2020  7  컵미소 아와세  가공식품  적합  불검출  일본산  2020-08-10  N  2020-12-21 12:00:44
1694  4672  2020  7  사시미 소유  가공식품  적합  불검출  일본산  2020-08-10  N  2020-12-21 12:00:44
1695  4673  2020  7  영상가이석태  수산물  적합  불검출  국내산  2020-08-10  N  2020-12-21 12:00:44