Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 517
Missing cells (%): 0.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
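
The percentage and per-record figures in this table follow from the raw counts. The sketch below redoes that arithmetic, assuming the report divides missing cells by rows × columns and total memory by the number of rows; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 7
n_observations = 10_000
missing_cells = 517
total_size_kib = 644.5

# Assumption: the missing-cell percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total memory divided by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the table (0.7% and 66.0 B).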

Variable types

Text: 2
Categorical: 3
Numeric: 2

Dataset

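Description: Origin-labeling violations by province and city, with violating items and violation quantities, managed by the National Agricultural Products Quality Management Service. Fields: 처분년월 (disposition year-month), 업무구분 (enforcement category), 시도명 (province or city name), 위반품목 (violating item), 위반유형 (violation type), 위반건수 (number of violations), 위반물량 (violation quantity).
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001684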

Alerts

업무구분명 is highly imbalanced (76.7%) [Imbalance]
위반유형 is highly imbalanced (50.8%) [Imbalance]
위반물량(kg) has 517 (5.2%) missing values [Missing]
위반물량(kg) is highly skewed (γ1 = 35.05938319) [Skewed]
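
The two imbalance percentages are consistent with one minus the normalized Shannon entropy of the class frequencies. The sketch below is an assumption inferred from the numbers in this report, not a confirmed description of how ydata-profiling computes the score; `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits
    of the class frequencies and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch: one class scores 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Value counts copied from the 업무구분명 and 위반유형 tables of this report.
category_counts = [8977, 500, 489, 18, 11, 5]
violation_type_counts = [6091, 3890, 18, 1]

print(f"업무구분명: {imbalance_score(category_counts):.1%}")
print(f"위반유형: {imbalance_score(violation_type_counts):.1%}")
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates, which matches the ordering of the two alerts.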

Reproduction

Analysis started: 2024-03-23 07:47:56.110668
Analysis finished: 2024-03-23 07:47:59.037847
Duration: 2.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

처분년월
Text

Distinct: 4349
Distinct (%): 43.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 100000
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
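
As an illustration of those character properties, the sketch below tallies the Unicode General Category of every character in the five sample values listed for this variable, using Python's standard `unicodedata` module. It covers only those five values, not the full column.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
samples = ["2005-01-30", "2012-10-05", "2008-02-14", "2007-02-01", "2004-10-20"]

# unicodedata.category returns a two-letter General Category code,
# e.g. "Nd" (Decimal Number) or "Pd" (Dash Punctuation).
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

Each date contributes eight digits and two hyphens, which is the same 80/20 split between Decimal Number and Dash Punctuation that the report shows for the whole column.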

Unique

Unique: 1768
Unique (%): 17.7%

Sample

1st row: 2005-01-30
2nd row: 2012-10-05
3rd row: 2008-02-14
4th row: 2007-02-01
5th row: 2004-10-20

Value | Count | Frequency (%)
2010-09-16 | 16 | 0.2%
2010-02-12 | 15 | 0.1%
2009-09-22 | 15 | 0.1%
2009-09-30 | 14 | 0.1%
2012-01-20 | 13 | 0.1%
2010-02-05 | 12 | 0.1%
2010-02-11 | 12 | 0.1%
2009-09-24 | 11 | 0.1%
2005-02-04 | 11 | 0.1%
2016-02-01 | 11 | 0.1%
Other values (4339) | 9870 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 27980 | 28.0%
- | 20000 | 20.0%
2 | 18385 | 18.4%
1 | 13836 | 13.8%
3 | 3371 | 3.4%
9 | 3263 | 3.3%
5 | 2689 | 2.7%
4 | 2672 | 2.7%
8 | 2646 | 2.6%
7 | 2599 | 2.6%
Other values (2) | 2559 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 79998 | 80.0%
Dash Punctuation | 20000 | 20.0%
Space Separator | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 27980 | 35.0%
2 | 18385 | 23.0%
1 | 13836 | 17.3%
3 | 3371 | 4.2%
9 | 3263 | 4.1%
5 | 2689 | 3.4%
4 | 2672 | 3.3%
8 | 2646 | 3.3%
7 | 2599 | 3.2%
6 | 2557 | 3.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 20000 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 100000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 27980 | 28.0%
- | 20000 | 20.0%
2 | 18385 | 18.4%
1 | 13836 | 13.8%
3 | 3371 | 3.4%
9 | 3263 | 3.3%
5 | 2689 | 2.7%
4 | 2672 | 2.7%
8 | 2646 | 2.6%
7 | 2599 | 2.6%
Other values (2) | 2559 | 2.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 100000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 27980 | 28.0%
- | 20000 | 20.0%
2 | 18385 | 18.4%
1 | 13836 | 13.8%
3 | 3371 | 3.4%
9 | 3263 | 3.3%
5 | 2689 | 2.7%
4 | 2672 | 2.7%
8 | 2646 | 2.6%
7 | 2599 | 2.6%
Other values (2) | 2559 | 2.6%

업무구분명
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

원산지단속 | 8977
양곡표시 | 500
축산물이력 | 489
미검사품 | 18
재사용화환 | 11

Length

Max length: 5
Median length: 5
Mean length: 4.9472
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 원산지단속
2nd row: 축산물이력
3rd row: GMO
4th row: 원산지단속
5th row: 원산지단속

Common Values

Value | Count | Frequency (%)
원산지단속 | 8977 | 89.8%
양곡표시 | 500 | 5.0%
축산물이력 | 489 | 4.9%
미검사품 | 18 | 0.2%
재사용화환 | 11 | 0.1%
GMO | 5 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
원산지단속 | 8977 | 89.8%
양곡표시 | 500 | 5.0%
축산물이력 | 489 | 4.9%
미검사품 | 18 | 0.2%
재사용화환 | 11 | 0.1%
gmo | 5 | < 0.1%

시도명
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

경기도 | 1331
서울특별시 | 872
전라남도 | 865
경상북도 | 836
강원도 | 773
Other values (12) | 5323

Length

Max length: 7
Median length: 5
Mean length: 4.1663
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경상남도
2nd row: 대구광역시
3rd row: 전라남도
4th row: 경상북도
5th row: 충청북도

Common Values

Value | Count | Frequency (%)
경기도 | 1331 | 13.3%
서울특별시 | 872 | 8.7%
전라남도 | 865 | 8.6%
경상북도 | 836 | 8.4%
강원도 | 773 | 7.7%
경상남도 | 760 | 7.6%
전라북도 | 750 | 7.5%
충청남도 | 671 | 6.7%
충청북도 | 647 | 6.5%
대구광역시 | 548 | 5.5%
Other values (7) | 1947 | 19.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
경기도 | 1331 | 13.3%
서울특별시 | 872 | 8.7%
전라남도 | 865 | 8.6%
경상북도 | 836 | 8.4%
강원도 | 773 | 7.7%
경상남도 | 760 | 7.6%
전라북도 | 750 | 7.5%
충청남도 | 671 | 6.7%
충청북도 | 647 | 6.5%
대구광역시 | 548 | 5.5%
Other values (7) | 1947 | 19.5%

위반품목
Text

Distinct: 599
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 10
Mean length: 3.7928
Min length: 1

Characters and Unicode

Total characters: 37928
Distinct characters: 336
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
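
For a mixed Hangul and ASCII column such as this one, a block can be assigned from the code point alone. The sketch below classifies the characters of the five sample values listed for this variable, using the standard ranges for Basic Latin (U+0000 to U+007F) and Hangul Syllables (U+AC00 to U+D7AF). `block_of` is a hypothetical helper that handles only these two blocks, and the report's own block labels may be derived differently.

```python
from collections import Counter

# The five sample values listed for this variable in the report.
samples = ["마늘", "쇠고기(한우)", "면화", "돼지고기", "기장(일반)"]


def block_of(ch: str) -> str:
    """Classify one character into a Unicode block by code point range.
    Only the two blocks relevant to this column are handled."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Basic Latin, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul Syllables"  # U+AC00..U+D7AF
    return "Other"


block_counts = Counter(block_of(ch) for value in samples for ch in value)

for block, count in block_counts.most_common():
    print(block, count)
```

In these samples the only ASCII characters are the parentheses, which is in line with the report's finding that the ASCII block is dominated by `(` and `)`.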

Unique

Unique: 207
Unique (%): 2.1%

Sample

1st row: 마늘
2nd row: 쇠고기(한우)
3rd row: 면화
4th row: 돼지고기
5th row: 기장(일반)

Value | Count | Frequency (%)
돼지고기 | 1320 | 13.0%
배추김치 | 911 | 9.0%
쇠고기 | 785 | 7.8%
쇠고기(한우 | 531 | 5.2%
 | 339 | 3.3%
닭고기 | 247 | 2.4%
멥쌀 | 226 | 2.2%
고추가루 | 205 | 2.0%
삼겹살 | 181 | 1.8%
두부류 | 181 | 1.8%
Other values (591) | 5198 | 51.3%

Most occurring characters

Value | Count | Frequency (%)
 | 3772 | 9.9%
 | 3753 | 9.9%
 | 1667 | 4.4%
 | 1481 | 3.9%
 | 1473 | 3.9%
( | 1401 | 3.7%
) | 1401 | 3.7%
 | 1392 | 3.7%
 | 1081 | 2.9%
 | 1033 | 2.7%
Other values (326) | 19474 | 51.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 34990 | 92.3%
Open Punctuation | 1401 | 3.7%
Close Punctuation | 1401 | 3.7%
Space Separator | 124 | 0.3%
Uppercase Letter | 6 | < 0.1%
Other Punctuation | 4 | < 0.1%
Decimal Number | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3772 | 10.8%
 | 3753 | 10.7%
 | 1667 | 4.8%
 | 1481 | 4.2%
 | 1473 | 4.2%
 | 1392 | 4.0%
 | 1081 | 3.1%
 | 1033 | 3.0%
 | 1024 | 2.9%
 | 755 | 2.2%
Other values (319) | 17559 | 50.2%

Uppercase Letter
Value | Count | Frequency (%)
M | 4 | 66.7%
A | 2 | 33.3%

Open Punctuation
Value | Count | Frequency (%)
( | 1401 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1401 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 124 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 4 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 34990 | 92.3%
Common | 2932 | 7.7%
Latin | 6 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3772 | 10.8%
 | 3753 | 10.7%
 | 1667 | 4.8%
 | 1481 | 4.2%
 | 1473 | 4.2%
 | 1392 | 4.0%
 | 1081 | 3.1%
 | 1033 | 3.0%
 | 1024 | 2.9%
 | 755 | 2.2%
Other values (319) | 17559 | 50.2%

Common
Value | Count | Frequency (%)
( | 1401 | 47.8%
) | 1401 | 47.8%
(space) | 124 | 4.2%
. | 4 | 0.1%
1 | 2 | 0.1%

Latin
Value | Count | Frequency (%)
M | 4 | 66.7%
A | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 34990 | 92.3%
ASCII | 2938 | 7.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 3772 | 10.8%
 | 3753 | 10.7%
 | 1667 | 4.8%
 | 1481 | 4.2%
 | 1473 | 4.2%
 | 1392 | 4.0%
 | 1081 | 3.1%
 | 1033 | 3.0%
 | 1024 | 2.9%
 | 755 | 2.2%
Other values (319) | 17559 | 50.2%

ASCII
Value | Count | Frequency (%)
( | 1401 | 47.7%
) | 1401 | 47.7%
(space) | 124 | 4.2%
M | 4 | 0.1%
. | 4 | 0.1%
A | 2 | 0.1%
1 | 2 | 0.1%

위반유형
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

거짓표시 | 6091
미표시 | 3890
영수증미비치 | 18
조사거부 | 1

Length

Max length: 6
Median length: 4
Mean length: 3.6146
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 거짓표시
2nd row: 미표시
3rd row: 미표시
4th row: 거짓표시
5th row: 거짓표시

Common Values

Value | Count | Frequency (%)
거짓표시 | 6091 | 60.9%
미표시 | 3890 | 38.9%
영수증미비치 | 18 | 0.2%
조사거부 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
거짓표시 | 6091 | 60.9%
미표시 | 3890 | 38.9%
영수증미비치 | 18 | 0.2%
조사거부 | 1 | < 0.1%

위반건수
Real number (ℝ)

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2643
Minimum: 1
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 18
Range: 17
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.84493603
Coefficient of variation (CV): 0.66830344
Kurtosis: 70.740413
Mean: 1.2643
Median Absolute Deviation (MAD): 0
Skewness: 6.6338464
Sum: 12643
Variance: 0.7139169
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value | Count | Frequency (%)
1 | 8410 | 84.1%
2 | 1085 | 10.8%
3 | 282 | 2.8%
4 | 106 | 1.1%
5 | 52 | 0.5%
6 | 23 | 0.2%
7 | 12 | 0.1%
8 | 10 | 0.1%
10 | 5 | 0.1%
13 | 4 | < 0.1%
Other values (6) | 11 | 0.1%

Value | Count | Frequency (%)
1 | 8410 | 84.1%
2 | 1085 | 10.8%
3 | 282 | 2.8%
4 | 106 | 1.1%
5 | 52 | 0.5%
6 | 23 | 0.2%
7 | 12 | 0.1%
8 | 10 | 0.1%
9 | 4 | < 0.1%
10 | 5 | 0.1%

Value | Count | Frequency (%)
18 | 1 | < 0.1%
16 | 1 | < 0.1%
14 | 1 | < 0.1%
13 | 4 | < 0.1%
12 | 1 | < 0.1%
11 | 3 | < 0.1%
10 | 5 | 0.1%
9 | 4 | < 0.1%
8 | 10 | 0.1%
7 | 12 | 0.1%
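
Because 위반건수 takes only 16 distinct values, the frequency tables for this variable list every value, so its descriptive statistics can be recomputed from the counts alone. The sketch below does this with the standard library, assuming the sample variance (n − 1 denominator) and the bias-adjusted Fisher-Pearson skewness; these estimator choices are assumptions inferred from the reported figures rather than taken from the profiler's documentation.

```python
import math

# Value counts for 위반건수, merged from this variable's frequency tables
# (most common values plus the smallest and largest values).
value_counts = {
    1: 8410, 2: 1085, 3: 282, 4: 106, 5: 52, 6: 23, 7: 12, 8: 10,
    9: 4, 10: 5, 11: 3, 12: 1, 13: 4, 14: 1, 16: 1, 18: 1,
}

n = sum(value_counts.values())
total = sum(v * c for v, c in value_counts.items())
mean = total / n

# Sums of squared and cubed deviations from the mean.
ss = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
sc = sum(c * (v - mean) ** 3 for v, c in value_counts.items())

variance = ss / (n - 1)  # sample variance (assumed n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean

# Adjusted Fisher-Pearson skewness: biased g1 scaled by sqrt(n(n-1)) / (n-2).
g1 = (sc / n) / (ss / n) ** 1.5
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)

print(n, total, mean)
print(round(variance, 7), round(std, 8), round(cv, 8), round(skewness, 4))
```

The counts sum to 10000 observations and a total of 12643, and the recomputed variance, standard deviation, coefficient of variation and skewness agree with the values reported for this variable.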

위반물량(kg)
Real number (ℝ)

MISSING  SKEWED 

Distinct: 2726
Distinct (%): 28.7%
Missing: 517
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 3627.1068
Minimum: 0
Maximum: 2666400
Zeros: 5
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 5
Median: 25
Q3: 216
95-th percentile: 4908.4
Maximum: 2666400
Range: 2666400
Interquartile range (IQR): 211

Descriptive statistics

Standard deviation: 51081.288
Coefficient of variation (CV): 14.083205
Kurtosis: 1480.2057
Mean: 3627.1068
Median Absolute Deviation (MAD): 23.6
Skewness: 35.059383
Sum: 34395854
Variance: 2.6092979 × 10^9
Monotonicity: Not monotonic
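
Several of the figures for 위반물량(kg) are derived from others in the same panel. The sketch below checks those relationships, assuming the distinct percentage is computed over non-missing values only (an inference from the numbers, since 2726 out of all 10000 rows would give a different percentage); the variable names are illustrative.

```python
# Figures copied from the 위반물량(kg) statistics in this report.
n_rows = 10_000
missing = 517
distinct = 2726
mean = 3627.1068
std = 51081.288
q1, q3 = 5, 216

non_missing = n_rows - missing
missing_pct = 100 * missing / n_rows
# Assumption: the distinct percentage uses non-missing values as the base.
distinct_pct = 100 * distinct / non_missing

variance = std ** 2  # standard deviation squared, on the order of 10^9
cv = std / mean      # coefficient of variation
iqr = q3 - q1        # interquartile range

print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct")
print(f"variance = {variance:.7e}, CV = {cv:.6f}, IQR = {iqr}")
```

A coefficient of variation above 14 together with a median of 25 and a mean above 3600 shows how strongly a handful of very large shipments dominate this column, which is what the Skewed alert flags.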
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10.0 | 401 | 4.0%
1.0 | 376 | 3.8%
2.0 | 310 | 3.1%
20.0 | 258 | 2.6%
3.0 | 241 | 2.4%
5.0 | 226 | 2.3%
4.0 | 207 | 2.1%
6.0 | 174 | 1.7%
40.0 | 169 | 1.7%
8.0 | 138 | 1.4%
Other values (2716) | 6983 | 69.8%
(Missing) | 517 | 5.2%

Value | Count | Frequency (%)
0.0 | 5 | 0.1%
0.02 | 2 | < 0.1%
0.03 | 1 | < 0.1%
0.04 | 1 | < 0.1%
0.043 | 1 | < 0.1%
0.08 | 1 | < 0.1%
0.1 | 16 | 0.2%
0.12 | 1 | < 0.1%
0.15 | 3 | < 0.1%
0.18 | 1 | < 0.1%

Value | Count | Frequency (%)
2666400.0 | 1 | < 0.1%
2147482.0 | 1 | < 0.1%
2000000.0 | 1 | < 0.1%
1375548.0 | 1 | < 0.1%
1282340.0 | 1 | < 0.1%
986298.5 | 1 | < 0.1%
955070.0 | 1 | < 0.1%
612659.0 | 1 | < 0.1%
502546.0 | 1 | < 0.1%
455000.0 | 1 | < 0.1%

Interactions


Correlations

Variable | 업무구분명 | 시도명 | 위반유형 | 위반건수 | 위반물량(kg)
업무구분명 | 1.000 | 0.097 | 0.294 | 0.065 | 0.000
시도명 | 0.097 | 1.000 | 0.110 | 0.000 | 0.000
위반유형 | 0.294 | 0.110 | 1.000 | 0.000 | 0.000
위반건수 | 0.065 | 0.000 | 0.000 | 1.000 | 0.000
위반물량(kg) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 위반유형 | 업무구분명 | 시도명
위반유형 | 1.000 | 0.193 | 0.061
업무구분명 | 0.193 | 1.000 | 0.045
시도명 | 0.061 | 0.045 | 1.000

Variable | 위반건수 | 위반물량(kg) | 업무구분명 | 시도명 | 위반유형
위반건수 | 1.000 | 0.167 | 0.034 | 0.000 | 0.000
위반물량(kg) | 0.167 | 1.000 | 0.000 | 0.000 | 0.000
업무구분명 | 0.034 | 0.000 | 1.000 | 0.045 | 0.193
시도명 | 0.000 | 0.000 | 0.045 | 1.000 | 0.061
위반유형 | 0.000 | 0.000 | 0.193 | 0.061 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

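(index) | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량(kg)
80050 | 2005-01-30 | 원산지단속 | 경상남도 | 마늘 | 거짓표시 | 1 | 2.0
43240 | 2012-10-05 | 축산물이력 | 대구광역시 | 쇠고기(한우) | 미표시 | 1 | 10.0
69520 | 2008-02-14 | GMO | 전라남도 | 면화 | 미표시 | 1 | 2.2
73459 | 2007-02-01 | 원산지단속 | 경상북도 | 돼지고기 | 거짓표시 | 2 | 111.2
80730 | 2004-10-20 | 원산지단속 | 충청북도 | 기장(일반) | 거짓표시 | 1 | 1215.0
31680 | 2015-03-05 | 원산지단속 | 경상남도 | 돼지고기부산물 | 거짓표시 | 2 | 16.0
37646 | 2013-12-04 | 원산지단속 | 강원도 | 배추김치 | 거짓표시 | 1 | 330.0
39947 | 2013-05-15 | 원산지단속 | 경기도 | 더덕 | 거짓표시 | 2 | 12.0
84226 | 2003-09-20 | 원산지단속 | 충청북도 |  | 거짓표시 | 1 | 1.2
39896 | 2013-05-21 | 원산지단속 | 세종특별자치시 | 쇠고기 | 거짓표시 | 1 | 8345.4

(index) | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 | 위반물량(kg)
82115 | 2004-05-12 | 원산지단속 | 경상북도 | 약재류 기타 | 거짓표시 | 1 | 57.6
20254 | 2017-10-24 | 축산물이력 | 경상북도 | 쇠고기(한우) | 미표시 | 1 | 0.5
3822 | 2022-03-23 | 원산지단속 | 경상남도 | 감초 | 거짓표시 | 1 | 25.0
61991 | 2009-09-03 | 원산지단속 | 전라남도 | 깻잎장아찌 | 미표시 | 1 | 0.5
53201 | 2011-01-26 | 원산지단속 | 전라남도 | 쇠고기 | 거짓표시 | 1 | 9.0
36631 | 2014-02-07 | 원산지단속 | 경상남도 | 물고사리 | 미표시 | 1 | 1.0
2383 | 2022-07-27 | 원산지단속 | 대구광역시 | 돼지고기 | 거짓표시 | 1 | 406.83
10207 | 2020-05-29 | 원산지단속 | 강원도 | 과자류기타 | 미표시 | 1 | 16.9
38760 | 2013-08-30 | 원산지단속 | 충청남도 | 배추김치 | 거짓표시 | 1 | 20.0
93479 | 2001-01-19 | 원산지단속 | 전라북도 | 쇠고기 | 거짓표시 | 1 | 6.0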