Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 462
Missing cells (%): 0.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
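
The two derived figures above follow from the raw counts. A minimal sketch, assuming that 1 KiB is 1024 bytes and that the report rounds to one decimal place (both are assumptions about the tool, not statements from the report):

```python
# Raw figures copied from the Dataset statistics block.
n_variables = 7
n_observations = 10_000
missing_cells = 462
total_size_kib = 644.5

# Share of empty cells over the whole 7 x 10000 grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Note that the 0.7% figure is relative to all cells; relative to the single column that contains the gaps, the same 462 missing values are 4.6% of 10000 rows.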

Variable types

DateTime: 1
Categorical: 3
Text: 1
Numeric: 2

Dataset

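Description: Status of country-of-origin labeling violations by province, covering violating items and violation quantities, as managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: disposition year-month (처분년월), task category (업무구분), province name (시도명), violating item (위반품목), violation type (위반유형), number of violations (위반건수), violation quantity (위반물량).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001684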

Alerts

업무구분명 is highly imbalanced (79.0%) [Imbalance]
위반물량(kg) has 462 (4.6%) missing values [Missing]
위반물량(kg) is highly skewed (γ1 = 41.8621486) [Skewed]
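
The 79.0% imbalance figure can be reproduced from the six category counts reported for 업무구분명. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum (log2 of the number of classes). That formula matches the reported value, but it is an assumption about what the profiler computes, not a quotation of its source; the function name `imbalance_score` is hypothetical.

```python
import math

# Counts reported for 업무구분명 (six categories, 10000 rows in total).
counts = [9121, 495, 338, 31, 8, 7]

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy in bits."""
    if len(counts) < 2:
        return 0.0  # a single class has no meaningful imbalance
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    return 1 - entropy / max_entropy

score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score of 0 would mean perfectly even classes and a score near 1 means one class dominates, which fits a column where a single value covers 91.2% of rows.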

Reproduction

Analysis started: 2024-03-23 07:47:28.131676
Analysis finished: 2024-03-23 07:47:30.725768
Duration: 2.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
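
The reported duration is simply the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-03-23 07:47:28.131676")
finished = datetime.fromisoformat("2024-03-23 07:47:30.725768")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```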

Variables

처분년월 (disposition year-month)
DateTime

Distinct: 279
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1998-01-01 00:00:00
Maximum: 2022-09-01 00:00:00
Histogram with fixed size bins (bins=50)
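
The distinct count can be compared with the number of calendar months spanned by the minimum and maximum. A small sketch, assuming every value falls on the first day of a month (suggested by the minimum, the maximum, and the sample rows, but not stated by the report):

```python
from datetime import date

first = date(1998, 1, 1)   # reported minimum
last = date(2022, 9, 1)    # reported maximum
distinct_reported = 279    # reported distinct count

# Inclusive number of calendar months between the two dates.
months_in_range = (last.year - first.year) * 12 + (last.month - first.month) + 1
months_absent = months_in_range - distinct_reported

print(months_in_range, months_absent)
```

Under that assumption, 18 of the 297 months in the range do not appear in this 10000-row sample.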

업무구분명 (task category name)
Categorical

IMBALANCE

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

원산지단속: 9121
양곡표시: 495
축산물이력: 338
미검사품: 31
재사용화환: 8

Length

Max length: 5
Median length: 5
Mean length: 4.946
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 원산지단속
2nd row: 원산지단속
3rd row: 원산지단속
4th row: 원산지단속
5th row: 원산지단속

Common Values

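Value: Count (Frequency %)
원산지단속 (origin-labeling enforcement): 9121 (91.2%)
양곡표시 (grain labeling): 495 (5.0%)
축산물이력 (livestock product traceability): 338 (3.4%)
미검사품 (uninspected goods): 31 (0.3%)
재사용화환 (reused flower wreaths): 8 (0.1%)
GMO: 7 (0.1%)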

Length

Histogram of lengths of the category

Common Values (Plot)

Value: Count (Frequency %)
원산지단속: 9121 (91.2%)
양곡표시: 495 (5.0%)
축산물이력: 338 (3.4%)
미검사품: 31 (0.3%)
재사용화환: 8 (0.1%)
gmo: 7 (0.1%)

시도명 (province or metropolitan city name)
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

경기도: 1185
서울특별시: 846
경상북도: 827
전라남도: 825
전라북도: 777
Other values (12): 5540

Length

Max length: 7
Median length: 5
Mean length: 4.2132
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산광역시
2nd row: 전라남도
3rd row: 강원도
4th row: 대구광역시
5th row: 경상북도

Common Values

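Value: Count (Frequency %)
경기도 (Gyeonggi-do): 1185 (11.8%)
서울특별시 (Seoul): 846 (8.5%)
경상북도 (Gyeongsangbuk-do): 827 (8.3%)
전라남도 (Jeollanam-do): 825 (8.2%)
전라북도 (Jeollabuk-do): 777 (7.8%)
경상남도 (Gyeongsangnam-do): 772 (7.7%)
강원도 (Gangwon-do): 760 (7.6%)
충청남도 (Chungcheongnam-do): 682 (6.8%)
충청북도 (Chungcheongbuk-do): 627 (6.3%)
대구광역시 (Daegu): 557 (5.6%)
Other values (7): 2142 (21.4%)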
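
The overview for 시도명 lists five categories and the Common Values table lists ten; in both cases the remainder is folded into an "Other values (k)" row. The sketch below reproduces both remainder rows from the ten visible counts plus the reported totals (17 distinct values, 10000 rows). The helper name `fold_tail` is hypothetical and the logic is an illustration, not the profiler's own code.

```python
# Top ten counts as shown in the Common Values table for 시도명.
top_counts = [1185, 846, 827, 825, 777, 772, 760, 682, 627, 557]
N_ROWS = 10_000      # number of observations
N_DISTINCT = 17      # distinct values reported for this variable

def fold_tail(counts, keep, n_rows=N_ROWS, n_distinct=N_DISTINCT):
    """Return (k, count) for the 'Other values (k)' row when `keep` rows are shown."""
    shown = sum(counts[:keep])
    return n_distinct - keep, n_rows - shown

overview_other = fold_tail(top_counts, 5)   # overview keeps the top five
table_other = fold_tail(top_counts, 10)     # table keeps the top ten
print(overview_other, table_other)
```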

Length

Histogram of lengths of the category

Value: Count (Frequency %)
경기도: 1185 (11.8%)
서울특별시: 846 (8.5%)
경상북도: 827 (8.3%)
전라남도: 825 (8.2%)
전라북도: 777 (7.8%)
경상남도: 772 (7.7%)
강원도: 760 (7.6%)
충청남도: 682 (6.8%)
충청북도: 627 (6.3%)
대구광역시: 557 (5.6%)
Other values (7): 2142 (21.4%)

위반품목 (violating item)
Text

Distinct: 683
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 12
Mean length: 3.7554
Min length: 1

Characters and Unicode

Total characters: 37554
Distinct characters: 363
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
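
As an illustration of those properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for four item names that appear in this report plus one hypothetical value containing parentheses. It covers general categories only, because the standard library does not expose script or block names.

```python
import unicodedata
from collections import Counter

# The first four names come from the report's sample rows; the last one is a
# hypothetical value added to show the two punctuation categories.
values = ["추출가공식품", "일품검정콩", "돼지고기", "옥수수가루", "쇠고기(한우)"]

# Normalize to NFC so each Hangul syllable is a single code point.
category_counts = Counter(
    unicodedata.category(ch)
    for value in values
    for ch in unicodedata.normalize("NFC", value)
)

# Lo = Other Letter, Ps = Open Punctuation, Pe = Close Punctuation
for category, count in category_counts.most_common():
    print(category, count)
```

Hangul syllables fall under Other Letter (Lo), which is why that category dominates the counts for this variable.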

Unique

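Unique: 225
Unique (%): 2.2%

Sample

1st row: 추출가공식품 (extract-processed food)
2nd row: 일품검정콩 (Ilpum black soybean)
3rd row: 돼지고기 (pork)
4th row: 옥수수가루 (corn flour)
5th row: 추출가공식품 (extract-processed food)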
Value: Count (Frequency %)
돼지고기 (pork): 781 (7.7%)
쇠고기 (beef): 559 (5.5%)
배추김치 (napa cabbage kimchi): 540 (5.3%)
쇠고기(한우: 384 (3.8%)
[unreadable]: 346 (3.4%)
닭고기 (chicken): 261 (2.6%)
고추가루 (red pepper powder): 235 (2.3%)
멥쌀 (non-glutinous rice): 210 (2.1%)
떡류 (rice cakes): 176 (1.7%)
두부류 (tofu products): 174 (1.7%)
Other values (674): 6510 (64.0%)

Most occurring characters

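Value: Count (Frequency %)
[unreadable]: 3115 (8.3%)
[unreadable]: 3095 (8.2%)
"(": 1449 (3.9%)
")": 1449 (3.9%)
[unreadable]: 1231 (3.3%)
[unreadable]: 1153 (3.1%)
[unreadable]: 1117 (3.0%)
[unreadable]: 976 (2.6%)
[unreadable]: 759 (2.0%)
[unreadable]: 758 (2.0%)
Other values (353): 22452 (59.8%)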

Most occurring categories

Value: Count (Frequency %)
Other Letter: 34456 (91.8%)
Open Punctuation: 1449 (3.9%)
Close Punctuation: 1449 (3.9%)
Space Separator: 176 (0.5%)
Uppercase Letter: 12 (< 0.1%)
Other Punctuation: 8 (< 0.1%)
Decimal Number: 4 (< 0.1%)

Most frequent character per category

Other Letter
Value: Count (Frequency %)
[unreadable]: 3115 (9.0%)
[unreadable]: 3095 (9.0%)
[unreadable]: 1231 (3.6%)
[unreadable]: 1153 (3.3%)
[unreadable]: 1117 (3.2%)
[unreadable]: 976 (2.8%)
[unreadable]: 759 (2.2%)
[unreadable]: 758 (2.2%)
[unreadable]: 736 (2.1%)
[unreadable]: 679 (2.0%)
Other values (343): 20837 (60.5%)

Decimal Number
Value: Count (Frequency %)
"1": 2 (50.0%)
"6": 1 (25.0%)
"4": 1 (25.0%)

Uppercase Letter
Value: Count (Frequency %)
"M": 8 (66.7%)
"A": 4 (33.3%)

Other Punctuation
Value: Count (Frequency %)
".": 7 (87.5%)
"/": 1 (12.5%)

Open Punctuation
Value: Count (Frequency %)
"(": 1449 (100.0%)

Close Punctuation
Value: Count (Frequency %)
")": 1449 (100.0%)

Space Separator
Value: Count (Frequency %)
" ": 176 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Hangul: 34456 (91.8%)
Common: 3086 (8.2%)
Latin: 12 (< 0.1%)

Most frequent character per script

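Hangul
Value: Count (Frequency %)
[unreadable]: 3115 (9.0%)
[unreadable]: 3095 (9.0%)
[unreadable]: 1231 (3.6%)
[unreadable]: 1153 (3.3%)
[unreadable]: 1117 (3.2%)
[unreadable]: 976 (2.8%)
[unreadable]: 759 (2.2%)
[unreadable]: 758 (2.2%)
[unreadable]: 736 (2.1%)
[unreadable]: 679 (2.0%)
Other values (343): 20837 (60.5%)

Common
Value: Count (Frequency %)
"(": 1449 (47.0%)
")": 1449 (47.0%)
" ": 176 (5.7%)
".": 7 (0.2%)
"1": 2 (0.1%)
"/": 1 (< 0.1%)
"6": 1 (< 0.1%)
"4": 1 (< 0.1%)

Latin
Value: Count (Frequency %)
"M": 8 (66.7%)
"A": 4 (33.3%)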

Most occurring blocks

Value: Count (Frequency %)
Hangul: 34456 (91.8%)
ASCII: 3098 (8.2%)

Most frequent character per block

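Hangul
Value: Count (Frequency %)
[unreadable]: 3115 (9.0%)
[unreadable]: 3095 (9.0%)
[unreadable]: 1231 (3.6%)
[unreadable]: 1153 (3.3%)
[unreadable]: 1117 (3.2%)
[unreadable]: 976 (2.8%)
[unreadable]: 759 (2.2%)
[unreadable]: 758 (2.2%)
[unreadable]: 736 (2.1%)
[unreadable]: 679 (2.0%)
Other values (343): 20837 (60.5%)

ASCII
Value: Count (Frequency %)
"(": 1449 (46.8%)
")": 1449 (46.8%)
" ": 176 (5.7%)
"M": 8 (0.3%)
".": 7 (0.2%)
"A": 4 (0.1%)
"1": 2 (0.1%)
"/": 1 (< 0.1%)
"6": 1 (< 0.1%)
"4": 1 (< 0.1%)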

위반유형 (violation type)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

거짓표시: 5866
미표시: 4112
영수증미비치: 22

Length

Max length: 6
Median length: 4
Mean length: 3.5932
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 거짓표시
2nd row: 미표시
3rd row: 거짓표시
4th row: 미표시
5th row: 거짓표시

Common Values

Value: Count (Frequency %)
거짓표시 (false labeling): 5866 (58.7%)
미표시 (no labeling): 4112 (41.1%)
영수증미비치 (receipts not kept): 22 (0.2%)
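
The length statistics reported for 위반유형 can be recovered from these three counts alone. A sketch, assuming that length means the number of Unicode code points in the NFC-normalized string (an assumption about how the profiler measures length); the helper `char_length` is hypothetical:

```python
import unicodedata

# Counts from the Common Values table for 위반유형.
value_counts = {"거짓표시": 5866, "미표시": 4112, "영수증미비치": 22}

def char_length(text):
    """Length in code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

n = sum(value_counts.values())
mean_length = sum(char_length(v) * c for v, c in value_counts.items()) / n

# Expand one length per row, sort, and average the two middle entries.
lengths = sorted(
    length
    for v, c in value_counts.items()
    for length in [char_length(v)] * c
)
median_length = (lengths[n // 2 - 1] + lengths[n // 2]) / 2

print(min(lengths), median_length, round(mean_length, 4), max(lengths))
```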

Length

Histogram of lengths of the category

Common Values (Plot)

Value: Count (Frequency %)
거짓표시: 5866 (58.7%)
미표시: 4112 (41.1%)
영수증미비치: 22 (0.2%)

위반건수 (number of violations)
Real number (ℝ)

Distinct: 30
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8733
Minimum: 1
Maximum: 114
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 6
Maximum: 114
Range: 113
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.507125
Coefficient of variation (CV): 1.3383468
Kurtosis: 451.43108
Mean: 1.8733
Median Absolute Deviation (MAD): 0
Skewness: 13.55767
Sum: 18733
Variance: 6.2856757
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=30)

Common values

Value: Count (Frequency %)
1: 6931 (69.3%)
2: 1452 (14.5%)
3: 611 (6.1%)
4: 322 (3.2%)
5: 180 (1.8%)
6: 135 (1.4%)
7: 79 (0.8%)
8: 69 (0.7%)
9: 52 (0.5%)
10: 38 (0.4%)
Other values (20): 131 (1.3%)
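
The ten listed values cover most rows; what the "Other values" bucket contributes can be backed out from the reported totals. A sketch using only figures from this report (the row count and the Sum from the descriptive statistics):

```python
# (value, count) pairs from the table of most common 위반건수 values.
top_values = [(1, 6931), (2, 1452), (3, 611), (4, 322), (5, 180),
              (6, 135), (7, 79), (8, 69), (9, 52), (10, 38)]
N_ROWS = 10_000
TOTAL_SUM = 18_733  # "Sum" from the descriptive statistics

rows_shown = sum(count for _, count in top_values)
sum_shown = sum(value * count for value, count in top_values)

# Whatever is not shown belongs to the "Other values" bucket.
tail_rows = N_ROWS - rows_shown
tail_sum = TOTAL_SUM - sum_shown
tail_mean = tail_sum / tail_rows

print(tail_rows, tail_sum, round(tail_mean, 2))
```

The 131 remaining rows carry 2114 violations, roughly 16 per row against an overall mean of 1.8733, which is consistent with the strong right skew reported for this variable.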

Minimum 10 values

Value: Count (Frequency %)
1: 6931 (69.3%)
2: 1452 (14.5%)
3: 611 (6.1%)
4: 322 (3.2%)
5: 180 (1.8%)
6: 135 (1.4%)
7: 79 (0.8%)
8: 69 (0.7%)
9: 52 (0.5%)
10: 38 (0.4%)

Maximum 10 values

Value: Count (Frequency %)
114: 1 (< 0.1%)
60: 1 (< 0.1%)
39: 1 (< 0.1%)
31: 1 (< 0.1%)
29: 3 (< 0.1%)
28: 2 (< 0.1%)
26: 1 (< 0.1%)
23: 2 (< 0.1%)
22: 3 (< 0.1%)
21: 1 (< 0.1%)

위반물량(kg) (violation quantity, kg)
Real number (ℝ)

MISSING, SKEWED

Distinct: 3203
Distinct (%): 33.6%
Missing: 462
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 8343.8825
Minimum: 0
Maximum: 8105922
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 5.8
Median: 30
Q3: 360
95-th percentile: 9303
Maximum: 8105922
Range: 8105922
Interquartile range (IQR): 354.2

Descriptive statistics

Standard deviation: 159106.44
Coefficient of variation (CV): 19.068634
Kurtosis: 1898.5376
Mean: 8343.8825
Median Absolute Deviation (MAD): 28.5
Skewness: 41.862149
Sum: 79583952
Variance: 2.5314859 × 10^10
Monotonicity: Not monotonic
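
Several of these figures are tied together by simple identities, which makes them easy to cross-check. The sketch below assumes the mean is taken over non-missing rows only; the numbers are consistent with that reading, but the report does not state it. Tolerances allow for the rounding of the displayed values.

```python
# Figures copied from the 위반물량(kg) statistics.
n_rows = 10_000
n_missing = 462
total = 79_583_952     # Sum
mean = 8343.8825       # Mean
std = 159_106.44       # Standard deviation
q1, q3 = 5.8, 360      # Q1 and Q3

n_valid = n_rows - n_missing          # rows that actually hold a value
mean_from_sum = total / n_valid       # assumption: missing rows are excluded
cv = std / mean                       # coefficient of variation
variance = std ** 2
iqr = q3 - q1

print(n_valid, round(mean_from_sum, 2), round(cv, 4), f"{variance:.4e}", round(iqr, 1))
```

The median (30) sits far below the mean (8343.8825), so the mean here is driven by a small number of very large records rather than by typical cases.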
Histogram with fixed size bins (bins=50)

Common values

Value: Count (Frequency %)
10.0: 345 (3.5%)
1.0: 329 (3.3%)
2.0: 287 (2.9%)
5.0: 252 (2.5%)
3.0: 244 (2.4%)
20.0: 222 (2.2%)
4.0: 198 (2.0%)
6.0: 162 (1.6%)
40.0: 124 (1.2%)
30.0: 118 (1.2%)
Other values (3193): 7257 (72.6%)
(Missing): 462 (4.6%)

Minimum 10 values

Value: Count (Frequency %)
0.0: 1 (< 0.1%)
0.05: 2 (< 0.1%)
0.06: 1 (< 0.1%)
0.08: 1 (< 0.1%)
0.1: 8 (0.1%)
0.112: 1 (< 0.1%)
0.143: 1 (< 0.1%)
0.144: 1 (< 0.1%)
0.15: 8 (0.1%)
0.17: 2 (< 0.1%)

Maximum 10 values

Value: Count (Frequency %)
8105922.0: 1 (< 0.1%)
7818860.0: 1 (< 0.1%)
6612348.0: 1 (< 0.1%)
6087342.0: 1 (< 0.1%)
4200000.0: 1 (< 0.1%)
1790464.0: 1 (< 0.1%)
1129380.0: 1 (< 0.1%)
1103395.0: 1 (< 0.1%)
1020367.0: 1 (< 0.1%)
884830.0: 1 (< 0.1%)
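
To put the skew alert in concrete terms, the ten largest records can be compared with the reported Sum for 위반물량(kg). A short sketch using only values listed in this report:

```python
# The ten largest 위반물량(kg) values, each occurring once.
largest = [8105922, 7818860, 6612348, 6087342, 4200000,
           1790464, 1129380, 1103395, 1020367, 884830]
total = 79_583_952  # "Sum" from the descriptive statistics

top10_sum = sum(largest)
share = top10_sum / total

print(top10_sum, f"{share:.1%}")
```

Ten records out of 9538 non-missing rows account for close to half of the total quantity, so summaries based on the median or on quantiles are likely to describe typical cases better than the mean.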

Interactions


Correlations

Variable | 업무구분명 | 시도명 | 위반유형 | 위반건수 | 위반물량(kg)
업무구분명 | 1.000 | 0.058 | 0.411 | 0.043 | 0.000
시도명 | 0.058 | 1.000 | 0.126 | 0.042 | 0.000
위반유형 | 0.411 | 0.126 | 1.000 | 0.045 | 0.000
위반건수 | 0.043 | 0.042 | 0.045 | 1.000 | 0.000
위반물량(kg) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 위반유형 | 업무구분명 | 시도명
위반유형 | 1.000 | 0.187 | 0.067
업무구분명 | 0.187 | 1.000 | 0.027
시도명 | 0.067 | 0.027 | 1.000

Variable | 위반건수 | 위반물량(kg) | 업무구분명 | 시도명 | 위반유형
위반건수 | 1.000 | 0.296 | 0.016 | 0.020 | 0.019
위반물량(kg) | 0.296 | 1.000 | 0.000 | 0.000 | 0.000
업무구분명 | 0.016 | 0.000 | 1.000 | 0.027 | 0.187
시도명 | 0.020 | 0.000 | 0.027 | 1.000 | 0.067
위반유형 | 0.019 | 0.000 | 0.187 | 0.067 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

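Index | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 and 위반물량(kg) (fused; boundary not recoverable)
57841 | 2003-03-01 | 원산지단속 | 부산광역시 | 추출가공식품 | 거짓표시 | 1516.0
26868 | 2013-03-01 | 원산지단속 | 전라남도 | 일품검정콩 | 미표시 | 16.0
39863 | 2009-09-01 | 원산지단속 | 강원도 | 돼지고기 | 거짓표시 | 152024.9
48280 | 2007-02-01 | 원산지단속 | 대구광역시 | 옥수수가루 | 미표시 | 127.0
36294 | 2010-08-01 | 원산지단속 | 경상북도 | 추출가공식품 | 거짓표시 | 151888.0
62038 | 2001-04-01 | 원산지단속 | 서울특별시 | 기타 | 거짓표시 | 140.0
10214 | 2019-01-01 | 원산지단속 | 전라북도 | 월동배추 | 거짓표시 | 1500.0
20021 | 2015-06-01 | 원산지단속 | 경상북도 | 세척당근 | 거짓표시 | 110.0
48291 | 2007-02-01 | 원산지단속 | 부산광역시 | 호박 | 거짓표시 | 12.5
23222 | 2014-05-01 | 원산지단속 | 서울특별시 | 산수유 | 거짓표시 | 110.5

Index | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 and 위반물량(kg) (fused; boundary not recoverable)
62239 | 2001-03-01 | 원산지단속 | 전라북도 | 돼지고기 | 거짓표시 | 489.5
55467 | 2004-03-01 | 원산지단속 | 경기도 | 엿기름 | 거짓표시 | 1540.0
17582 | 2016-04-01 | 양곡표시 | 충청북도 | 메현미 | 거짓표시 | 125000.0
21921 | 2014-10-01 | 원산지단속 | 경상북도 | 약재류 기타 | 미표시 | 15.0
13934 | 2017-08-01 | 원산지단속 | 전라남도 | 돼지고기 | 거짓표시 | 2117.0
25035 | 2013-10-01 | 원산지단속 | 서울특별시 | 포기김치 | 거짓표시 | 180.0
39511 | 2009-11-01 | 원산지단속 | 대전광역시 | 닭고기 | 미표시 | 12.0
55170 | 2004-05-01 | 원산지단속 | 충청남도 | 조미 기타 | 거짓표시 | 25803.0
4737 | 2021-02-01 | 원산지단속 | 충청남도 | 호두 | 미표시 | 10.8
56498 | 2003-10-01 | 원산지단속 | 전라북도 | 돼지고기 | 거짓표시 | 4208.6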