Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 549
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
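
The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (rows × columns) and that 1 KiB = 1024 bytes. The variable names are illustrative and not part of the report.

```python
# Values copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 549
total_size_kib = 644.5

# Assumption: the missing-cell percentage is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures reported above.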

Variable types

DateTime: 1
Categorical: 3
Text: 1
Numeric: 2

Dataset

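Description: Country-of-origin labeling violations by province, with violated items and quantities, as managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: 처분년월 (disposition year-month), 업무구분 (task category), 시도명 (province name), 위반품목 (violated item), 위반유형 (violation type), 위반건수 (number of violations), 위반물량 (violation quantity).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001684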

Alerts

Imbalance: 업무구분명 is highly imbalanced (77.3%)
Imbalance: 위반유형 is highly imbalanced (57.4%)
Missing: 위반물량 has 549 (5.5%) missing values
Skewed: 위반물량 is highly skewed (γ1 = 89.18569492)
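
The two imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below is written under that assumption and is not taken from the profiling tool's source. The function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for uniform counts, approaching 1 for one dominant class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts per category, copied from the Common Values tables.
task_category_counts = [9020, 501, 438, 19, 15, 7]   # 업무구분명
violation_type_counts = [6013, 3968, 17, 1, 1]       # 위반유형

print(f"업무구분명: {imbalance_score(task_category_counts):.1%}")
print(f"위반유형: {imbalance_score(violation_type_counts):.1%}")
```

With these counts the scores round to the 77.3% and 57.4% shown in the alerts.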

Reproduction

Analysis started: 2024-03-23 07:48:08.479833
Analysis finished: 2024-03-23 07:48:10.892489
Duration: 2.41 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

처분년월
DateTime

Distinct: 4348
Distinct (%): 43.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1998-02-11 00:00:00
Maximum: 2023-09-15 00:00:00
Histogram with fixed size bins (bins=50)

업무구분명
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
원산지단속 | 9020
축산물이력 | 501
양곡표시 | 438
미검사품 | 19
재사용화환 | 15

Length

Max length: 5
Median length: 5
Mean length: 4.9529
Min length: 3
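
The length statistics can be recomputed from the category counts. This is a minimal sketch, assuming length is measured in Unicode code points (which is what Python's `len` returns for a string), so each precomposed Hangul syllable counts as one character. The counts are copied from the Common Values table for 업무구분명.

```python
# Category counts for 업무구분명, copied from the Common Values table.
counts = {
    "원산지단속": 9020,
    "축산물이력": 501,
    "양곡표시": 438,
    "미검사품": 19,
    "재사용화환": 15,
    "GMO": 7,
}

total = sum(counts.values())

# len() counts code points, so "원산지단속" has length 5 and "GMO" has length 3.
lengths = {value: len(value) for value in counts}

# Weighted mean: each category's length weighted by how often it occurs.
mean_length = sum(lengths[value] * count for value, count in counts.items()) / total

print("Min length:", min(lengths.values()))
print("Max length:", max(lengths.values()))
print("Mean length:", mean_length)
```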

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 원산지단속
2nd row: 원산지단속
3rd row: 원산지단속
4th row: 원산지단속
5th row: 원산지단속

Common Values

Value | Count | Frequency (%)
원산지단속 | 9020 | 90.2%
축산물이력 | 501 | 5.0%
양곡표시 | 438 | 4.4%
미검사품 | 19 | 0.2%
재사용화환 | 15 | 0.1%
GMO | 7 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
원산지단속 | 9020 | 90.2%
축산물이력 | 501 | 5.0%
양곡표시 | 438 | 4.4%
미검사품 | 19 | 0.2%
재사용화환 | 15 | 0.1%
gmo | 7 | 0.1%

시도명
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기도 | 1344
서울특별시 | 943
전라남도 | 848
경상북도 | 814
강원특별자치도 | 803
Other values (12) | 5248

Length

Max length: 7
Median length: 5
Mean length: 4.4904
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경상북도
2nd row: 전라북도
3rd row: 강원특별자치도
4th row: 경상북도
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
경기도 | 1344 | 13.4%
서울특별시 | 943 | 9.4%
전라남도 | 848 | 8.5%
경상북도 | 814 | 8.1%
강원특별자치도 | 803 | 8.0%
전라북도 | 755 | 7.5%
경상남도 | 742 | 7.4%
충청남도 | 638 | 6.4%
충청북도 | 611 | 6.1%
대구광역시 | 532 | 5.3%
Other values (7) | 1970 | 19.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
경기도 | 1344 | 13.4%
서울특별시 | 943 | 9.4%
전라남도 | 848 | 8.5%
경상북도 | 814 | 8.1%
강원특별자치도 | 803 | 8.0%
전라북도 | 755 | 7.5%
경상남도 | 742 | 7.4%
충청남도 | 638 | 6.4%
충청북도 | 611 | 6.1%
대구광역시 | 532 | 5.3%
Other values (7) | 1970 | 19.7%

위반품목
Text

Distinct: 623
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 3.8355
Min length: 1

Characters and Unicode

Total characters: 38355
Distinct characters: 347
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 235
Unique (%): 2.4%

Sample

1st row: 배추김치
2nd row: 쇠고기
3rd row: 돼지고기
4th row: 돼지고기
5th row: 쇠고기

Words

Value | Count | Frequency (%)
돼지고기 | 1381 | 13.6%
배추김치 | 902 | 8.9%
쇠고기 | 778 | 7.7%
쇠고기(한우 | 537 | 5.3%
 | 329 | 3.3%
닭고기 | 258 | 2.5%
멥쌀 | 198 | 2.0%
삼겹살 | 191 | 1.9%
고추가루 | 187 | 1.8%
두부류 | 170 | 1.7%
Other values (618) | 5190 | 51.3%

Most occurring characters

Value | Count | Frequency (%)
 | 3860 | 10.1%
 | 3810 | 9.9%
 | 1769 | 4.6%
 | 1559 | 4.1%
 | 1464 | 3.8%
) | 1433 | 3.7%
( | 1433 | 3.7%
 | 1427 | 3.7%
 | 1085 | 2.8%
 | 1043 | 2.7%
Other values (337) | 19472 | 50.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 35354 | 92.2%
Close Punctuation | 1433 | 3.7%
Open Punctuation | 1433 | 3.7%
Space Separator | 121 | 0.3%
Uppercase Letter | 9 | < 0.1%
Decimal Number | 3 | < 0.1%
Other Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3860 | 10.9%
 | 3810 | 10.8%
 | 1769 | 5.0%
 | 1559 | 4.4%
 | 1464 | 4.1%
 | 1427 | 4.0%
 | 1085 | 3.1%
 | 1043 | 3.0%
 | 1039 | 2.9%
 | 708 | 2.0%
Other values (328) | 17590 | 49.8%

Decimal Number
Value | Count | Frequency (%)
6 | 1 | 33.3%
1 | 1 | 33.3%
4 | 1 | 33.3%

Uppercase Letter
Value | Count | Frequency (%)
M | 6 | 66.7%
A | 3 | 33.3%

Close Punctuation
Value | Count | Frequency (%)
) | 1433 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1433 | 100.0%

Space Separator
Value | Count | Frequency (%)
 | 121 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 35354 | 92.2%
Common | 2992 | 7.8%
Latin | 9 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3860 | 10.9%
 | 3810 | 10.8%
 | 1769 | 5.0%
 | 1559 | 4.4%
 | 1464 | 4.1%
 | 1427 | 4.0%
 | 1085 | 3.1%
 | 1043 | 3.0%
 | 1039 | 2.9%
 | 708 | 2.0%
Other values (328) | 17590 | 49.8%

Common
Value | Count | Frequency (%)
) | 1433 | 47.9%
( | 1433 | 47.9%
 | 121 | 4.0%
. | 2 | 0.1%
6 | 1 | < 0.1%
1 | 1 | < 0.1%
4 | 1 | < 0.1%

Latin
Value | Count | Frequency (%)
M | 6 | 66.7%
A | 3 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 35354 | 92.2%
ASCII | 3001 | 7.8%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 3860 | 10.9%
 | 3810 | 10.8%
 | 1769 | 5.0%
 | 1559 | 4.4%
 | 1464 | 4.1%
 | 1427 | 4.0%
 | 1085 | 3.1%
 | 1043 | 3.0%
 | 1039 | 2.9%
 | 708 | 2.0%
Other values (328) | 17590 | 49.8%

ASCII
Value | Count | Frequency (%)
) | 1433 | 47.8%
( | 1433 | 47.8%
 | 121 | 4.0%
M | 6 | 0.2%
A | 3 | 0.1%
. | 2 | 0.1%
6 | 1 | < 0.1%
1 | 1 | < 0.1%
4 | 1 | < 0.1%

위반유형
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
거짓표시 | 6013
미표시 | 3968
영수증미비치 | 17
조사거부 | 1
시정명령 위반 | 1

Length

Max length: 7
Median length: 4
Mean length: 3.6069
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 미표시
2nd row: 거짓표시
3rd row: 거짓표시
4th row: 거짓표시
5th row: 미표시

Common Values

Value | Count | Frequency (%)
거짓표시 | 6013 | 60.1%
미표시 | 3968 | 39.7%
영수증미비치 | 17 | 0.2%
조사거부 | 1 | < 0.1%
시정명령 위반 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
거짓표시 | 6013 | 60.1%
미표시 | 3968 | 39.7%
영수증미비치 | 17 | 0.2%
조사거부 | 1 | < 0.1%
시정명령 | 1 | < 0.1%
위반 | 1 | < 0.1%

위반건수
Real number (ℝ)

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.268
Minimum: 1
Maximum: 35
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 35
Range: 34
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.92242132
Coefficient of variation (CV): 0.72746161
Kurtosis: 340.62327
Mean: 1.268
Median Absolute Deviation (MAD): 0
Skewness: 12.75645
Sum: 12680
Variance: 0.85086109
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)

Common Values

Value | Count | Frequency (%)
1 | 8369 | 83.7%
2 | 1122 | 11.2%
3 | 296 | 3.0%
4 | 107 | 1.1%
5 | 46 | 0.5%
6 | 22 | 0.2%
7 | 18 | 0.2%
8 | 5 | 0.1%
10 | 3 | < 0.1%
11 | 3 | < 0.1%
Other values (8) | 9 | 0.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 8369 | 83.7%
2 | 1122 | 11.2%
3 | 296 | 3.0%
4 | 107 | 1.1%
5 | 46 | 0.5%
6 | 22 | 0.2%
7 | 18 | 0.2%
8 | 5 | 0.1%
9 | 2 | < 0.1%
10 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
35 | 1 | < 0.1%
32 | 1 | < 0.1%
17 | 1 | < 0.1%
16 | 1 | < 0.1%
15 | 1 | < 0.1%
14 | 1 | < 0.1%
13 | 1 | < 0.1%
11 | 3 | < 0.1%
10 | 3 | < 0.1%
9 | 2 | < 0.1%
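
Because the frequency tables for 위반건수 list every distinct value (18 in total), the descriptive statistics can be rebuilt from them. The sketch below does so, assuming the reported variance is the sample variance (divisor n − 1). That assumption is inferred from the reported numbers, not stated in the report.

```python
import math

# value -> count, combined from the frequency tables for 위반건수.
freq = {
    1: 8369, 2: 1122, 3: 296, 4: 107, 5: 46, 6: 22, 7: 18, 8: 5,
    9: 2, 10: 3, 11: 3, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 32: 1, 35: 1,
}

n = sum(freq.values())                              # number of observations
value_sum = sum(v * c for v, c in freq.items())     # "Sum" in the report
mean = value_sum / n

# Assumption: sample variance with divisor n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                     # coefficient of variation

print(f"n={n}, sum={value_sum}, mean={mean}")
print(f"variance={variance:.8f}, std={std:.8f}, cv={cv:.8f}")
```

Dividing by n instead of n − 1 gives a variance of about 0.850776, which does not match the reported 0.85086109. This is the reason for the sample-variance assumption.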

위반물량
Real number (ℝ)

MISSING  SKEWED 

Distinct: 2706
Distinct (%): 28.6%
Missing: 549
Missing (%): 5.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 4396.2262
Minimum: 0
Maximum: 13325542
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 5
median: 27
Q3: 210
95-th percentile: 5229.1
Maximum: 13325542
Range: 13325542
Interquartile range (IQR): 205

Descriptive statistics

Standard deviation: 141328.09
Coefficient of variation (CV): 32.147593
Kurtosis: 8361.6998
Mean: 4396.2262
Median Absolute Deviation (MAD): 25.5
Skewness: 89.185695
Sum: 41548734
Variance: 1.997363 × 10^10
Monotonicity: Not monotonic
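
The reported figures for 위반물량 are mutually consistent if the 549 missing values are excluded before the mean and the distinct percentage are computed. The sketch below checks this using only numbers printed in this report. The exclusion of missing values is an assumption inferred from those numbers, and the variable names are illustrative.

```python
# Values copied from the statistics for 위반물량.
n_observations = 10_000
n_missing = 549
n_distinct = 2706
value_sum = 41_548_734
std = 141_328.09
q1, q3 = 5, 210

# Assumption: missing values are dropped before computing the statistics.
n_valid = n_observations - n_missing

mean = value_sum / n_valid
distinct_pct = 100 * n_distinct / n_valid
cv = std / mean          # coefficient of variation
variance = std ** 2
iqr = q3 - q1            # interquartile range

print(f"valid rows={n_valid}, mean={mean:.4f}, distinct={distinct_pct:.1f}%")
print(f"cv={cv:.6f}, variance={variance:.6e}, iqr={iqr}")
```

Dividing the sum by all 10000 rows instead would give a mean of about 4154.87, which does not match the reported 4396.2262.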
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
1.0 | 384 | 3.8%
10.0 | 383 | 3.8%
2.0 | 292 | 2.9%
20.0 | 251 | 2.5%
3.0 | 250 | 2.5%
5.0 | 222 | 2.2%
4.0 | 194 | 1.9%
40.0 | 163 | 1.6%
6.0 | 142 | 1.4%
30.0 | 132 | 1.3%
Other values (2696) | 7038 | 70.4%
(Missing) | 549 | 5.5%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 4 | < 0.1%
0.02 | 4 | < 0.1%
0.03 | 1 | < 0.1%
0.04 | 1 | < 0.1%
0.046 | 1 | < 0.1%
0.05 | 3 | < 0.1%
0.08 | 2 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 10 | 0.1%
0.13 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
13325542.0 | 1 | < 0.1%
2071151.0 | 1 | < 0.1%
1574500.0 | 1 | < 0.1%
986298.5 | 1 | < 0.1%
650134.0 | 1 | < 0.1%
466540.0 | 1 | < 0.1%
413891.0 | 1 | < 0.1%
409800.0 | 1 | < 0.1%
406840.0 | 1 | < 0.1%
400500.0 | 1 | < 0.1%

Interactions


Correlations

 | 업무구분명 | 시도명 | 위반유형 | 위반건수 | 위반물량
업무구분명 | 1.000 | 0.101 | 0.240 | 0.061 | 0.000
시도명 | 0.101 | 1.000 | 0.078 | 0.022 | 0.050
위반유형 | 0.240 | 0.078 | 1.000 | 0.000 | 0.000
위반건수 | 0.061 | 0.022 | 0.000 | 1.000 | 0.000
위반물량 | 0.000 | 0.050 | 0.000 | 0.000 | 1.000

 | 위반유형 | 업무구분명 | 시도명
위반유형 | 1.000 | 0.165 | 0.040
업무구분명 | 0.165 | 1.000 | 0.048
시도명 | 0.040 | 0.048 | 1.000

 | 위반건수 | 위반물량 | 업무구분명 | 시도명 | 위반유형
위반건수 | 1.000 | 0.162 | 0.022 | 0.010 | 0.000
위반물량 | 0.162 | 1.000 | 0.000 | 0.026 | 0.000
업무구분명 | 0.022 | 0.000 | 1.000 | 0.048 | 0.165
시도명 | 0.010 | 0.026 | 0.048 | 1.000 | 0.040
위반유형 | 0.000 | 0.000 | 0.165 | 0.040 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

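 | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량
56028 | 2010-11-29 | 원산지단속 | 경상북도 | 배추김치 | 미표시 | 1 | 10.0
27036 | 2016-08-25 | 원산지단속 | 전라북도 | 쇠고기 | 거짓표시 | 1 | 62.8
93089 | 2001-09-22 | 원산지단속 | 강원특별자치도 | 돼지고기 | 거짓표시 | 1 | 35.3
77233 | 2006-06-26 | 원산지단속 | 경상북도 | 돼지고기 | 거짓표시 | 1 | 115.9
33777 | 2015-02-26 | 원산지단속 | 서울특별시 | 쇠고기 | 미표시 | 2 | 9.4
22688 | 2017-09-15 | 원산지단속 | 강원특별자치도 | 배추김치 | 거짓표시 | 1 | 100.0
3057 | 2022-11-27 | 축산물이력 | 전라북도 | 닭고기 | 미표시 | 1 | 35334.0
17851 | 2018-12-03 | 축산물이력 | 경상남도 | 쇠고기(한우) | 미표시 | 1 | 0.1
82538 | 2004-11-09 | 원산지단속 | 대구광역시 | 참기름 | 거짓표시 | 1 | 30.0
97715 | 2000-02-18 | 원산지단속 | 부산광역시 | 맥류기타 | 거짓표시 | 1 | 30.0

 | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량
54802 | 2011-02-01 | 원산지단속 | 경상북도 | 배추김치 | 거짓표시 | 1 | 30.0
21924 | 2017-11-30 | 축산물이력 | 강원특별자치도 | 쇠고기(한우) | 미표시 | 1 | 1.0
41702 | 2013-05-28 | 원산지단속 | 울산광역시 | 배추김치 | 거짓표시 | 2 | 60.0
13241 | 2020-01-16 | 원산지단속 | 경기도 | 삼겹살 | 거짓표시 | 1 | 400.0
74777 | 2007-03-02 | 원산지단속 | 강원특별자치도 | 참기름 | 미표시 | 2 | 10.0
26623 | 2016-09-13 | 원산지단속 | 경기도 | 목삼겹 | 거짓표시 | 1 | 264.6
76023 | 2006-10-18 | 원산지단속 | 충청북도 | 고추가루 | 거짓표시 | 1 | 460.0
11457 | 2020-07-30 | 원산지단속 | 경기도 | 양념육(육지물) | 미표시 | 1 | 15045.0
62434 | 2009-10-17 | 원산지단속 | 경기도 | 떡류 | 미표시 | 1 | 4.5
70361 | 2008-05-01 | 원산지단속 | 경기도 | 쇠고기 | 거짓표시 | 1 | 3600.0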