Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 537
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
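
The derived figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (observations × variables) and that 1 KiB is 1024 bytes; the variable names are illustrative and do not come from the report:

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 537
total_size_kib = 644.5

# Share of missing cells over every cell in the table (assumed denominator).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the 0.8% and 66.0 B listed above.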

Variable types

DateTime: 1
Categorical: 3
Text: 1
Numeric: 2

Dataset

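Description: Origin-labeling violations by province, with the violating items and quantities, managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: 처분년월 (disposition year-month), 업무구분 (enforcement category), 시도명 (province name), 위반품목 (violating item), 위반유형 (violation type), 위반건수 (number of violations), 위반물량 (violation quantity).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001684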

Alerts

업무구분명 is highly imbalanced (76.8%) [Imbalance]
위반유형 is highly imbalanced (50.8%) [Imbalance]
위반물량(kg) has 537 (5.4%) missing values [Missing]
위반물량(kg) is highly skewed (γ1 = 40.50734814) [Skewed]
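
The imbalance percentages in these alerts can be reproduced from the category counts listed under Common Values. The sketch below assumes the score is one minus the Shannon entropy of the category distribution divided by its maximum, log2 of the number of categories. That definition is an assumption about how the figure is derived, not something the report states, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of category counts.

    0 means perfectly balanced, values near 1 mean one category dominates.
    The formula is an assumption; the report does not spell it out.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen here for a single-category column
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts copied from the Common Values tables of the two flagged columns.
business_counts = [8984, 504, 475, 20, 15, 2]   # 업무구분명
violation_counts = [6069, 3913, 16, 2]          # 위반유형

print(f"업무구분명: {imbalance_score(business_counts):.1%}")
print(f"위반유형: {imbalance_score(violation_counts):.1%}")
```

With these counts the sketch yields about 76.8% and 50.8%, which agrees with the alert values and suggests the assumed definition is at least consistent with the report.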

Reproduction

Analysis started: 2024-03-23 07:48:19.776423
Analysis finished: 2024-03-23 07:48:22.541805
Duration: 2.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

처분년월
DateTime

Distinct: 4419
Distinct (%): 44.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1998-05-23 00:00:00
Maximum: 2024-03-15 00:00:00

Histogram with fixed size bins (bins=50)

업무구분명
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
원산지단속 | 8984
축산물이력 | 504
양곡표시 | 475
미검사품 | 20
재사용화환 | 15

Length

Max length: 5
Median length: 5
Mean length: 4.9501
Min length: 3
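
These length statistics can be cross-checked against the category counts listed under Common Values. A minimal sketch, assuming length is counted in Unicode code points (which is what Python's `len` returns for `str`); the dictionary below is transcribed from the report and the variable names are illustrative:

```python
# Category counts for 업무구분명, transcribed from the Common Values table.
counts = {
    "원산지단속": 8984,
    "축산물이력": 504,
    "양곡표시": 475,
    "미검사품": 20,
    "재사용화환": 15,
    "GMO": 2,
}

n_rows = sum(counts.values())

# Weighted mean of the string lengths, one term per category.
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows

min_length = min(len(value) for value in counts)
max_length = max(len(value) for value in counts)

print(f"rows={n_rows} mean={mean_length} min={min_length} max={max_length}")
```

The same weighting applies to any categorical column in the report, as long as every category and its count is known.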

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 원산지단속
2nd row: 원산지단속
3rd row: 원산지단속
4th row: 원산지단속
5th row: 원산지단속

Common Values

Value | Count | Frequency (%)
원산지단속 | 8984 | 89.8%
축산물이력 | 504 | 5.0%
양곡표시 | 475 | 4.8%
미검사품 | 20 | 0.2%
재사용화환 | 15 | 0.1%
GMO | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
원산지단속 | 8984 | 89.8%
축산물이력 | 504 | 5.0%
양곡표시 | 475 | 4.8%
미검사품 | 20 | 0.2%
재사용화환 | 15 | 0.1%
gmo | 2 | < 0.1%

시도명
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기도 | 1316
서울특별시 | 930
전라남도 | 868
경상북도 | 801
경상남도 | 777
Other values (12) | 5308

Length

Max length: 7
Median length: 5
Mean length: 4.702
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기도
2nd row: 경기도
3rd row: 경상북도
4th row: 충청남도
5th row: 전북특별자치도

Common Values

Value | Count | Frequency (%)
경기도 | 1316 | 13.2%
서울특별시 | 930 | 9.3%
전라남도 | 868 | 8.7%
경상북도 | 801 | 8.0%
경상남도 | 777 | 7.8%
강원특별자치도 | 760 | 7.6%
전북특별자치도 | 712 | 7.1%
충청북도 | 644 | 6.4%
충청남도 | 638 | 6.4%
대구광역시 | 551 | 5.5%
Other values (7) | 2003 | 20.0%

Length

Histogram of lengths of the category

위반품목
Text

Distinct: 618
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 3.8163
Min length: 1

Characters and Unicode

Total characters: 38163
Distinct characters: 347
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 219
Unique (%): 2.2%

Sample

1st row: 과실 기타
2nd row: 상황버섯
3rd row: 쇠고기
4th row: 과채가공품
5th row: 쇠고기부산물

Value | Count | Frequency (%)
돼지고기 | 1376 | 13.6%
배추김치 | 907 | 9.0%
쇠고기 | 823 | 8.1%
쇠고기(한우 | 535 | 5.3%
 | 326 | 3.2%
닭고기 | 259 | 2.6%
고추가루 | 208 | 2.1%
멥쌀 | 191 | 1.9%
두부류 | 189 | 1.9%
삼겹살 | 177 | 1.8%
Other values (610) | 5120 | 50.6%

Most occurring characters

Value | Count | Frequency (%)
 | 3877 | 10.2%
 | 3855 | 10.1%
 | 1744 | 4.6%
 | 1534 | 4.0%
 | 1525 | 4.0%
( | 1449 | 3.8%
) | 1449 | 3.8%
 | 1409 | 3.7%
 | 1060 | 2.8%
 | 1022 | 2.7%
Other values (337) | 19239 | 50.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 35141 | 92.1%
Open Punctuation | 1449 | 3.8%
Close Punctuation | 1449 | 3.8%
Space Separator | 111 | 0.3%
Uppercase Letter | 6 | < 0.1%
Decimal Number | 4 | < 0.1%
Other Punctuation | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3877 | 11.0%
 | 3855 | 11.0%
 | 1744 | 5.0%
 | 1534 | 4.4%
 | 1525 | 4.3%
 | 1409 | 4.0%
 | 1060 | 3.0%
 | 1022 | 2.9%
 | 1020 | 2.9%
 | 673 | 1.9%
Other values (326) | 17422 | 49.6%

Uppercase Letter
Value | Count | Frequency (%)
M | 3 | 50.0%
A | 1 | 16.7%
O | 1 | 16.7%
G | 1 | 16.7%

Decimal Number
Value | Count | Frequency (%)
1 | 2 | 50.0%
4 | 1 | 25.0%
6 | 1 | 25.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1449 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1449 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 111 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 35141 | 92.1%
Common | 3016 | 7.9%
Latin | 6 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3877 | 11.0%
 | 3855 | 11.0%
 | 1744 | 5.0%
 | 1534 | 4.4%
 | 1525 | 4.3%
 | 1409 | 4.0%
 | 1060 | 3.0%
 | 1022 | 2.9%
 | 1020 | 2.9%
 | 673 | 1.9%
Other values (326) | 17422 | 49.6%

Common
Value | Count | Frequency (%)
( | 1449 | 48.0%
) | 1449 | 48.0%
(space) | 111 | 3.7%
. | 3 | 0.1%
1 | 2 | 0.1%
4 | 1 | < 0.1%
6 | 1 | < 0.1%

Latin
Value | Count | Frequency (%)
M | 3 | 50.0%
A | 1 | 16.7%
O | 1 | 16.7%
G | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 35141 | 92.1%
ASCII | 3022 | 7.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 3877 | 11.0%
 | 3855 | 11.0%
 | 1744 | 5.0%
 | 1534 | 4.4%
 | 1525 | 4.3%
 | 1409 | 4.0%
 | 1060 | 3.0%
 | 1022 | 2.9%
 | 1020 | 2.9%
 | 673 | 1.9%
Other values (326) | 17422 | 49.6%

ASCII
Value | Count | Frequency (%)
( | 1449 | 47.9%
) | 1449 | 47.9%
(space) | 111 | 3.7%
. | 3 | 0.1%
M | 3 | 0.1%
1 | 2 | 0.1%
4 | 1 | < 0.1%
A | 1 | < 0.1%
O | 1 | < 0.1%
G | 1 | < 0.1%

위반유형
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
거짓표시 | 6069
미표시 | 3913
영수증미비치 | 16
시정명령 위반 | 2

Length

Max length: 7
Median length: 4
Mean length: 3.6125
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 거짓표시
2nd row: 거짓표시
3rd row: 거짓표시
4th row: 미표시
5th row: 거짓표시

Common Values

Value | Count | Frequency (%)
거짓표시 | 6069 | 60.7%
미표시 | 3913 | 39.1%
영수증미비치 | 16 | 0.2%
시정명령 위반 | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
거짓표시 | 6069 | 60.7%
미표시 | 3913 | 39.1%
영수증미비치 | 16 | 0.2%
시정명령 | 2 | < 0.1%
위반 | 2 | < 0.1%

위반건수
Real number (ℝ)

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2582
Minimum: 1
Maximum: 35
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 35
Range: 34
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.88216245
Coefficient of variation (CV): 0.70113054
Kurtosis: 264.78831
Mean: 1.2582
Median Absolute Deviation (MAD): 0
Skewness: 10.851262
Sum: 12582
Variance: 0.77821058
Monotonicity: Not monotonic
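
Because 위반건수 takes only 16 distinct values, these descriptive statistics can be recomputed from the value counts alone. A minimal sketch, assuming the variance is the sample variance (divisor n − 1). The frequency table is assembled from the value-count listings for this variable, and all names are illustrative.

```python
import math

# value -> number of rows, transcribed from the 위반건수 value-count tables.
freq = {
    1: 8424, 2: 1090, 3: 295, 4: 89, 5: 40, 6: 18, 7: 16, 8: 7,
    9: 4, 10: 4, 11: 6, 12: 2, 13: 1, 15: 1, 16: 2, 35: 1,
}

n = sum(freq.values())                                # number of observations
total = sum(value * count for value, count in freq.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                       # coefficient of variation

print(f"n={n} sum={total} mean={mean:.4f}")
print(f"variance={variance:.8f} std={std:.8f} cv={cv:.8f}")
```

The recomputed values agree with the figures listed above when the n − 1 divisor is used; dividing by n instead would give a slightly smaller variance.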
Histogram with fixed size bins (bins=16)

Common values
Value | Count | Frequency (%)
1 | 8424 | 84.2%
2 | 1090 | 10.9%
3 | 295 | 2.9%
4 | 89 | 0.9%
5 | 40 | 0.4%
6 | 18 | 0.2%
7 | 16 | 0.2%
8 | 7 | 0.1%
11 | 6 | 0.1%
10 | 4 | < 0.1%
Other values (6) | 11 | 0.1%

Minimum 10 values
Value | Count | Frequency (%)
1 | 8424 | 84.2%
2 | 1090 | 10.9%
3 | 295 | 2.9%
4 | 89 | 0.9%
5 | 40 | 0.4%
6 | 18 | 0.2%
7 | 16 | 0.2%
8 | 7 | 0.1%
9 | 4 | < 0.1%
10 | 4 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
35 | 1 | < 0.1%
16 | 2 | < 0.1%
15 | 1 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 6 | 0.1%
10 | 4 | < 0.1%
9 | 4 | < 0.1%
8 | 7 | 0.1%
7 | 16 | 0.2%

위반물량(kg)
Real number (ℝ)

MISSING  SKEWED 

Distinct: 2767
Distinct (%): 29.2%
Missing: 537
Missing (%): 5.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3753.5232
Minimum: 0
Maximum: 3361639.4
Zeros: 8
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 5
median: 28
Q3: 212.65
95-th percentile: 5036
Maximum: 3361639.4
Range: 3361639.4
Interquartile range (IQR): 207.65

Descriptive statistics

Standard deviation: 55979.186
Coefficient of variation (CV): 14.913771
Kurtosis: 2020.0767
Mean: 3753.5232
Median Absolute Deviation (MAD): 26.5
Skewness: 40.507348
Sum: 35519590
Variance: 3.1336693 × 10^9
Monotonicity: Not monotonic
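
With 537 missing values in 위반물량(kg), it matters which denominator each statistic uses. The sketch below assumes that the mean and the distinct percentage are taken over non-missing rows, while the missing percentage is taken over all rows. That reading is an assumption that happens to be consistent with the printed figures; the inputs are the rounded values from the report and the names are illustrative.

```python
# Figures transcribed from the 위반물량(kg) statistics (rounded as printed).
n_observations = 10_000
n_missing = 537
n_distinct = 2767
total_kg = 35_519_590      # "Sum"
std_kg = 55_979.186        # "Standard deviation"

# Rows that actually carry a quantity.
n_valid = n_observations - n_missing

mean_kg = total_kg / n_valid                      # mean over non-missing rows
cv = std_kg / mean_kg                             # coefficient of variation
distinct_pct = 100 * n_distinct / n_valid         # distinct share of non-missing rows
missing_pct = 100 * n_missing / n_observations    # missing share of all rows

print(f"valid rows={n_valid} mean={mean_kg:.4f} cv={cv:.4f}")
print(f"distinct={distinct_pct:.1f}% missing={missing_pct:.1f}%")
```

Dividing the sum by all 10000 rows instead would give a noticeably lower mean than the one reported, which is why the non-missing denominator is assumed here.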
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10.0 | 402 | 4.0%
1.0 | 391 | 3.9%
2.0 | 295 | 2.9%
5.0 | 264 | 2.6%
3.0 | 253 | 2.5%
20.0 | 250 | 2.5%
4.0 | 191 | 1.9%
40.0 | 168 | 1.7%
6.0 | 150 | 1.5%
30.0 | 126 | 1.3%
Other values (2757) | 6973 | 69.7%
(Missing) | 537 | 5.4%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 8 | 0.1%
0.01 | 1 | < 0.1%
0.04 | 1 | < 0.1%
0.05 | 1 | < 0.1%
0.1 | 16 | 0.2%
0.15 | 5 | 0.1%
0.16 | 1 | < 0.1%
0.2 | 17 | 0.2%
0.215 | 1 | < 0.1%
0.24 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
3361639.42 | 1 | < 0.1%
2473993.0 | 1 | < 0.1%
2071151.0 | 1 | < 0.1%
1375548.0 | 1 | < 0.1%
794015.0 | 1 | < 0.1%
784682.6 | 1 | < 0.1%
663755.0 | 1 | < 0.1%
616562.0 | 1 | < 0.1%
611231.97 | 1 | < 0.1%
508440.0 | 1 | < 0.1%

Interactions


Correlations

 | 업무구분명 | 시도명 | 위반유형 | 위반건수 | 위반물량(kg)
업무구분명 | 1.000 | 0.092 | 0.302 | 0.000 | 0.000
시도명 | 0.092 | 1.000 | 0.088 | 0.029 | 0.000
위반유형 | 0.302 | 0.088 | 1.000 | 0.000 | 0.000
위반건수 | 0.000 | 0.029 | 0.000 | 1.000 | 0.000
위반물량(kg) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

 | 위반유형 | 업무구분명 | 시도명
위반유형 | 1.000 | 0.199 | 0.049
업무구분명 | 0.199 | 1.000 | 0.043
시도명 | 0.049 | 0.043 | 1.000

 | 위반건수 | 위반물량(kg) | 업무구분명 | 시도명 | 위반유형
위반건수 | 1.000 | 0.165 | 0.000 | 0.013 | 0.000
위반물량(kg) | 0.165 | 1.000 | 0.000 | 0.000 | 0.000
업무구분명 | 0.000 | 0.000 | 1.000 | 0.043 | 0.199
시도명 | 0.013 | 0.000 | 0.043 | 1.000 | 0.049
위반유형 | 0.000 | 0.000 | 0.199 | 0.049 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

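 | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량(kg)
93425 | 2002-02-22 | 원산지단속 | 경기도 | 과실 기타 | 거짓표시 | 1 | 425.0
7030 | 2022-05-10 | 원산지단속 | 경기도 | 상황버섯 | 거짓표시 | 1 | 9.0
54151 | 2011-06-21 | 원산지단속 | 경상북도 | 쇠고기 | 거짓표시 | 1 | 77.0
75593 | 2007-05-01 | 원산지단속 | 충청남도 | 과채가공품 | 미표시 | 1 | 5.0
94566 | 2001-10-10 | 원산지단속 | 전북특별자치도 | 쇠고기부산물 | 거짓표시 | 1 | 29.8
59059 | 2010-08-25 | 원산지단속 | 서울특별시 | 돼지고기 | 거짓표시 | 4 | 35115.0
99893 | 1999-01-13 | 원산지단속 | 경상남도 | 돼지고기 | 거짓표시 | 1 | 1927.0
37177 | 2014-09-24 | 원산지단속 | 제주특별자치도 |  | 거짓표시 | 1 | 100.0
41026 | 2014-01-14 | 원산지단속 | 전라남도 | 닭고기 | 미표시 | 1 | 1.0
55038 | 2011-04-25 | 원산지단속 | 전라남도 | 쇠고기 | 거짓표시 | 1 | 60.0

 | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량(kg)
46027 | 2013-01-21 | 원산지단속 | 전라남도 | 떡류 | 미표시 | 1 | 0.6
42309 | 2013-09-05 | 양곡표시 | 전라남도 | 멥쌀 | 미표시 | 1 | 40.0
13890 | 2020-06-02 | 원산지단속 | 경기도 | 돼지고기 | 거짓표시 | 1 | 5615.2
36623 | 2014-12-15 | 원산지단속 | 경상남도 |  | 미표시 | 1 | <NA>
59277 | 2010-08-10 | 원산지단속 | 대구광역시 | 고추가루 | 미표시 | 1 | 5.0
72057 | 2008-05-06 | 원산지단속 | 광주광역시 | 쇠고기 | 거짓표시 | 2 | 485.0
16386 | 2019-09-03 | 원산지단속 | 울산광역시 | 두부류 | 거짓표시 | 1 | 144.0
58123 | 2010-10-20 | 원산지단속 | 경상남도 | 쇠고기 | 거짓표시 | 1 | 75.0
84554 | 2004-09-24 | 원산지단속 | 경상남도 | 녹두 | 거짓표시 | 1 | 150.0
64777 | 2009-09-25 | 원산지단속 | 충청남도 |  | 미표시 | 1 | 1.0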