Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 445
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
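The derived rows above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell share is taken over every cell (observations × variables) and that 1 KiB = 1024 bytes; the inputs are copied from the table:

```python
# Figures copied from the dataset statistics.
n_variables = 7
n_observations = 10_000
missing_cells = 445
total_size_kib = 644.5

# Assumption: the missing share is computed over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, spread evenly over all rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")          # Missing cells (%): 0.6%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 66.0 B
```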

Variable types

DateTime: 1
Categorical: 3
Text: 1
Numeric: 2

Dataset

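Description: Country-of-origin labeling violations by province (시도), listing the violating items and violation quantities, as maintained by 국립농산물품질관리원. Fields: disposition year-month, business category, province, violating item, violation type, number of violations, and violation quantity.
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001684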

Alerts

업무구분명 is highly imbalanced (79.6%) [Imbalance]
위반물량(kg) has 445 (4.5%) missing values [Missing]
위반물량(kg) is highly skewed (γ1 = 34.52731747) [Skewed]

Reproduction

Analysis started: 2024-03-23 07:46:52.848135
Analysis finished: 2024-03-23 07:46:55.494618
Duration: 2.65 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
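The reported duration is the difference between the two timestamps. A short sketch using Python's standard `datetime` module, with the timestamps copied from above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-03-23 07:46:52.848135")
finished = datetime.fromisoformat("2024-03-23 07:46:55.494618")

# Subtracting datetimes gives a timedelta; convert it to seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")  # Duration: 2.65 seconds
```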

Variables

처분년월 (disposition year-month)
DateTime

Distinct: 270
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1998-04-01 00:00:00
Maximum: 2022-03-01 00:00:00

[Figure: histogram with fixed size bins (bins=50)]
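The minimum, maximum and sample rows all fall on the first of a month, so one way to read Distinct = 270 is against the number of calendar months in the range. A sketch, assuming every value is a month start: it shows that 18 of the 288 months between the minimum and maximum do not appear in this dataset.

```python
from datetime import date

# Range copied from the 처분년월 statistics.
# Assumption: every value falls on the first day of a month.
first = date(1998, 4, 1)
last = date(2022, 3, 1)

# Count calendar months from first to last, inclusive.
months_in_range = (last.year - first.year) * 12 + (last.month - first.month) + 1
distinct_observed = 270

print(months_in_range)                      # 288
print(months_in_range - distinct_observed)  # 18
```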

업무구분명 (business category)
Categorical

IMBALANCE

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

원산지단속: 9154
양곡표시: 480
축산물이력: 323
미검사품: 27
재사용화환: 10

Length

Max length: 5
Median length: 5
Mean length: 4.9481
Min length: 3
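The mean length is a count-weighted average of the character lengths of the six categories. A sketch with the value counts copied from the Common Values table; Python's `len()` counts code points, so each precomposed Hangul syllable counts as one character:

```python
# Value counts of 업무구분명, copied from the Common Values table.
counts = {"원산지단속": 9154, "양곡표시": 480, "축산물이력": 323,
          "미검사품": 27, "재사용화환": 10, "GMO": 6}

total = sum(counts.values())
# Weight each category's length by how many rows carry it.
mean_length = sum(len(value) * n for value, n in counts.items()) / total
print(mean_length)  # 4.9481
```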

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 양곡표시
2nd row: 원산지단속
3rd row: 원산지단속
4th row: 원산지단속
5th row: 원산지단속

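Common Values

Value | Count | Frequency (%)
원산지단속 (country-of-origin enforcement) | 9154 | 91.5%
양곡표시 (grain labeling) | 480 | 4.8%
축산물이력 (livestock product traceability) | 323 | 3.2%
미검사품 (uninspected products) | 27 | 0.3%
재사용화환 (reused wreaths) | 10 | 0.1%
GMO | 6 | 0.1%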
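The 79.6% imbalance figure in the Alerts section can be reproduced from these counts as one minus the normalised Shannon entropy. A minimal sketch, assuming that definition (score = 1 − H / ln k for k classes); it is not ydata-profiling's own code:

```python
import math

# Value counts of 업무구분명, copied from the Common Values table.
counts = [9154, 480, 323, 27, 10, 6]

def imbalance_score(counts):
    """Assumed definition: 1 - (Shannon entropy / maximum entropy for len(counts) classes)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))

score = imbalance_score(counts)
print(f"{score:.1%}")  # 79.6%
```

A uniform distribution scores 0 and a distribution concentrated in one class approaches 1, which is why a 91.5% majority category triggers the alert.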

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Words

Value | Count | Frequency (%)
원산지단속 | 9154 | 91.5%
양곡표시 | 480 | 4.8%
축산물이력 | 323 | 3.2%
미검사품 | 27 | 0.3%
재사용화환 | 10 | 0.1%
gmo | 6 | 0.1%

시도명 (province or metropolitan city)
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

경기도: 1208
서울특별시: 903
경상북도: 849
전라남도: 836
전라북도: 771
Other values (12): 5433

Length

Max length: 7
Median length: 5
Mean length: 4.2196
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 광주광역시
2nd row: 울산광역시
3rd row: 전라북도
4th row: 강원도
5th row: 경상북도

Common Values

Value | Count | Frequency (%)
경기도 | 1208 | 12.1%
서울특별시 | 903 | 9.0%
경상북도 | 849 | 8.5%
전라남도 | 836 | 8.4%
전라북도 | 771 | 7.7%
강원도 | 731 | 7.3%
경상남도 | 722 | 7.2%
충청북도 | 633 | 6.3%
충청남도 | 621 | 6.2%
대구광역시 | 551 | 5.5%
Other values (7) | 2175 | 21.8%
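Each table lists the most frequent values and folds the remainder into an "Other values (n)" row, where n is the number of distinct values not shown. A sketch of that bookkeeping (the helper name `other_bucket` is hypothetical), checked against the 시도명 figures:

```python
def other_bucket(shown_counts, n_rows, n_distinct):
    """Hypothetical helper: return (distinct values not shown, rows they cover)."""
    return n_distinct - len(shown_counts), n_rows - sum(shown_counts)

# Counts copied from the 시도명 tables.
top10 = [1208, 903, 849, 836, 771, 731, 722, 633, 621, 551]
top5 = top10[:5]

print(other_bucket(top10, 10_000, 17))  # (7, 2175)
print(other_bucket(top5, 10_000, 17))   # (12, 5433)
```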

Length

[Figure: histogram of lengths of the category]

위반품목 (violating item)
Text

Distinct: 678
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 10
Mean length: 3.7488
Min length: 1

Characters and Unicode

Total characters: 37488
Distinct characters: 352
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
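Python's standard `unicodedata.category` returns the two-letter general-category code of a character, and mapping those codes to their long names yields labels like the ones in the category tables of this section. A sketch on hypothetical input strings; it is not ydata-profiling's implementation, and it does not cover scripts or blocks, which the standard library does not expose:

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in 위반품목.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count every character in every value by its Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Hypothetical strings, chosen to exercise all seven categories.
sample = ["쇠고기(한우)", "GMO 옥수수 1.2"]
counts = category_counts(sample)
print(counts.most_common())
```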

Unique

Unique: 239
Unique (%): 2.4%

Sample

1st row: 흑현미
2nd row: 배추김치
3rd row: 당근
4th row: 쇠고기
5th row: 돈가스
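
Words

Value | Count | Frequency (%)
돼지고기 (pork) | 822 | 8.1%
쇠고기 (beef) | 553 | 5.4%
배추김치 (napa cabbage kimchi) | 528 | 5.2%
쇠고기(한우 (Hanwoo beef) | 384 | 3.8%
| 349 | 3.4%
닭고기 (chicken) | 299 | 2.9%
고추가루 (red pepper powder) | 251 | 2.5%
멥쌀 (non-glutinous rice) | 228 | 2.2%
두부류 (tofu products) | 166 | 1.6%
삼겹살 (pork belly) | 156 | 1.5%
Other values (668) | 6417 | 63.2%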

Most occurring characters

Value | Count | Frequency (%)
| 3156 | 8.4%
| 3139 | 8.4%
( | 1411 | 3.8%
) | 1411 | 3.8%
| 1267 | 3.4%
| 1133 | 3.0%
| 1086 | 2.9%
| 1009 | 2.7%
| 777 | 2.1%
| 711 | 1.9%
Other values (342) | 22388 | 59.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 34487 | 92.0%
Open Punctuation | 1411 | 3.8%
Close Punctuation | 1411 | 3.8%
Space Separator | 153 | 0.4%
Uppercase Letter | 12 | < 0.1%
Decimal Number | 9 | < 0.1%
Other Punctuation | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 3156 | 9.2%
| 3139 | 9.1%
| 1267 | 3.7%
| 1133 | 3.3%
| 1086 | 3.1%
| 1009 | 2.9%
| 777 | 2.3%
| 711 | 2.1%
| 694 | 2.0%
| 685 | 2.0%
Other values (333) | 20830 | 60.4%

Decimal Number
Value | Count | Frequency (%)
4 | 6 | 66.7%
1 | 2 | 22.2%
2 | 1 | 11.1%

Uppercase Letter
Value | Count | Frequency (%)
M | 8 | 66.7%
A | 4 | 33.3%

Open Punctuation
Value | Count | Frequency (%)
( | 1411 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1411 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 153 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 34487 | 92.0%
Common | 2989 | 8.0%
Latin | 12 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 3156 | 9.2%
| 3139 | 9.1%
| 1267 | 3.7%
| 1133 | 3.3%
| 1086 | 3.1%
| 1009 | 2.9%
| 777 | 2.3%
| 711 | 2.1%
| 694 | 2.0%
| 685 | 2.0%
Other values (333) | 20830 | 60.4%

Common
Value | Count | Frequency (%)
( | 1411 | 47.2%
) | 1411 | 47.2%
(space) | 153 | 5.1%
4 | 6 | 0.2%
. | 5 | 0.2%
1 | 2 | 0.1%
2 | 1 | < 0.1%

Latin
Value | Count | Frequency (%)
M | 8 | 66.7%
A | 4 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 34487 | 92.0%
ASCII | 3001 | 8.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 3156 | 9.2%
| 3139 | 9.1%
| 1267 | 3.7%
| 1133 | 3.3%
| 1086 | 3.1%
| 1009 | 2.9%
| 777 | 2.3%
| 711 | 2.1%
| 694 | 2.0%
| 685 | 2.0%
Other values (333) | 20830 | 60.4%

ASCII
Value | Count | Frequency (%)
( | 1411 | 47.0%
) | 1411 | 47.0%
(space) | 153 | 5.1%
M | 8 | 0.3%
4 | 6 | 0.2%
. | 5 | 0.2%
A | 4 | 0.1%
1 | 2 | 0.1%
2 | 1 | < 0.1%

위반유형 (violation type)
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

거짓표시: 5787
미표시: 4187
영수증미비치: 21
시정명령 위반: 5

Length

Max length: 7
Median length: 4
Mean length: 3.587
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미표시
2nd row: 거짓표시
3rd row: 미표시
4th row: 거짓표시
5th row: 미표시

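Common Values

Value | Count | Frequency (%)
거짓표시 (false labeling) | 5787 | 57.9%
미표시 (not labeled) | 4187 | 41.9%
영수증미비치 (receipts not kept on file) | 21 | 0.2%
시정명령 위반 (violation of a corrective order) | 5 | 0.1%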

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Words

Value | Count | Frequency (%)
거짓표시 | 5787 | 57.8%
미표시 | 4187 | 41.8%
영수증미비치 | 21 | 0.2%
시정명령 | 5 | < 0.1%
위반 | 5 | < 0.1%
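This table counts words rather than whole values: "시정명령 위반" contributes one "시정명령" and one "위반", so the token total is 10,005 and 거짓표시 drops from 57.9% to 57.8%. A minimal sketch, assuming whitespace tokenization with lowercasing (the lowercase "gmo" in the 업무구분명 word table suggests lowercasing; the library's own tokenizer may differ in detail, for example in punctuation handling):

```python
from collections import Counter

# Value counts of 위반유형, copied from the Common Values table.
value_counts = {"거짓표시": 5787, "미표시": 4187, "영수증미비치": 21, "시정명령 위반": 5}

def word_counts(value_counts):
    """Assumed tokenizer: lowercase, split on whitespace, weight tokens by the value's count."""
    words = Counter()
    for value, n in value_counts.items():
        for token in value.lower().split():
            words[token] += n
    return words

words = word_counts(value_counts)
total_tokens = sum(words.values())
for token, n in words.most_common():
    print(f"{token} | {n} | {n / total_tokens:.1%}")
```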

위반건수 (number of violations)
Real number (ℝ)

Distinct: 32
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8724
Minimum: 1
Maximum: 49
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 5
Maximum: 49
Range: 48
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.327243
Coefficient of variation (CV): 1.2429198
Kurtosis: 63.570753
Mean: 1.8724
Median Absolute Deviation (MAD): 0
Skewness: 6.287976
Sum: 18724
Variance: 5.4160598
Monotonicity: Not monotonic
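The derived statistics relate to one another in fixed ways: CV = SD / mean, variance = SD², IQR = Q3 − Q1, and MAD is the median of absolute deviations from the median. A standard-library sketch of those definitions, assuming sample (n − 1) variance and linear-interpolation quartiles; the demo data is hypothetical, and only the final cross-check uses the reported 위반건수 figures:

```python
import statistics

def describe(values):
    """Summary statistics; assumes sample (n - 1) variance and linear-interpolation quartiles."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median(abs(x - median) for x in values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "iqr": q3 - q1,
        "mad": mad,
    }

# Hypothetical counts with the same shape as 위반건수 (mostly 1s, a long right tail).
print(describe([1, 1, 1, 2, 5]))

# Cross-check the reported figures: CV = SD / mean.
print(round(2.327243 / 1.8724, 4))  # 1.2429
```

With most rows equal to 1, the median and Q1 coincide, which is why a MAD of 0 and an IQR of 1 are expected here.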
[Figure: histogram with fixed size bins (bins=32)]

Common values

Value | Count | Frequency (%)
1 | 6932 | 69.3%
2 | 1464 | 14.6%
3 | 590 | 5.9%
4 | 309 | 3.1%
5 | 213 | 2.1%
6 | 135 | 1.4%
7 | 88 | 0.9%
8 | 70 | 0.7%
10 | 35 | 0.4%
9 | 31 | 0.3%
Other values (22) | 133 | 1.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 6932 | 69.3%
2 | 1464 | 14.6%
3 | 590 | 5.9%
4 | 309 | 3.1%
5 | 213 | 2.1%
6 | 135 | 1.4%
7 | 88 | 0.9%
8 | 70 | 0.7%
9 | 31 | 0.3%
10 | 35 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
49 | 1 | < 0.1%
40 | 1 | < 0.1%
39 | 1 | < 0.1%
29 | 2 | < 0.1%
28 | 6 | 0.1%
27 | 1 | < 0.1%
26 | 2 | < 0.1%
25 | 1 | < 0.1%
24 | 2 | < 0.1%
23 | 2 | < 0.1%
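The extreme-value tables rank distinct values by the value itself, while the common-values table ranks them by frequency; that is why 10 (35 rows) precedes 9 (31 rows) in one table but not in the other. A sketch using `heapq`, assuming a mapping from distinct value to count; the demo dictionary is a hypothetical subset of the counts above:

```python
import heapq

def extreme_values(counts, k=10):
    """Return the k smallest and k largest distinct values with their counts, ranked by value."""
    smallest = heapq.nsmallest(k, counts.items(), key=lambda item: item[0])
    largest = heapq.nlargest(k, counts.items(), key=lambda item: item[0])
    return smallest, largest

# Hypothetical subset of the 위반건수 value counts.
counts = {1: 6932, 2: 1464, 3: 590, 40: 1, 49: 1}
smallest, largest = extreme_values(counts, k=2)
print(smallest)  # [(1, 6932), (2, 1464)]
print(largest)   # [(49, 1), (40, 1)]
```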

위반물량(kg) (violation quantity, kg)
Real number (ℝ)

MISSING SKEWED

Distinct: 3140
Distinct (%): 32.9%
Missing: 445
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 6634.1757
Minimum: 0
Maximum: 4916361
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB
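The percentages in this block use two different denominators: Missing (%) is taken over all 10,000 rows, while Distinct (%) and the mean fit only the 9,555 non-missing values. A sketch of that arithmetic, assuming the non-missing count is the denominator for Distinct (%) and the mean (the figures are copied from the report):

```python
# Figures copied from the 위반물량(kg) statistics.
n_rows = 10_000
n_missing = 445
n_distinct = 3140
total = 63_389_549  # "Sum" from Descriptive statistics

# Assumption: Distinct (%) and the mean use non-missing values only.
n_present = n_rows - n_missing

print(n_present)                                      # 9555
print(f"Distinct (%): {n_distinct / n_present:.1%}")  # Distinct (%): 32.9%
print(f"Mean: {total / n_present:.4f}")               # Mean: 6634.1757
```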

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 5.5
Median: 28.3
Q3: 324.45
95th percentile: 9782.58
Maximum: 4916361
Range: 4916361
Interquartile range (IQR): 318.95

Descriptive statistics

Standard deviation: 84499.916
Coefficient of variation (CV): 12.737063
Kurtosis: 1591.703
Mean: 6634.1757
Median Absolute Deviation (MAD): 26.9
Skewness: 34.527317
Sum: 63389549
Variance: 7.1402358 × 10^9
Monotonicity: Not monotonic
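Skewness and kurtosis summarise the long right tail flagged in the Alerts section. A sketch of the bias-corrected sample estimators (adjusted Fisher-Pearson skewness G1 and excess kurtosis G2), assuming the report uses these pandas-style definitions; the demo inputs are hypothetical:

```python
import math

def _central_moments(xs):
    """Population central moments m2, m3, m4 about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (assumed definition); needs n >= 3."""
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis G2 (assumed definition); needs n >= 4."""
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(sample_skewness([1, 2, 3, 4, 5]))           # symmetric data: 0.0
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))    # flat data: about -1.2
print(sample_skewness([1, 1, 1, 1, 100]) > 0)     # one large outlier: True
```

A single very large value dominates the third and fourth moments, which is how a handful of multi-million-kilogram rows can push the skewness above 34.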
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10.0 | 350 | 3.5%
1.0 | 335 | 3.4%
2.0 | 283 | 2.8%
5.0 | 253 | 2.5%
3.0 | 223 | 2.2%
4.0 | 220 | 2.2%
20.0 | 214 | 2.1%
6.0 | 178 | 1.8%
8.0 | 129 | 1.3%
40.0 | 113 | 1.1%
Other values (3130) | 7257 | 72.6%
(Missing) | 445 | 4.5%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 3 | < 0.1%
0.002 | 1 | < 0.1%
0.01 | 2 | < 0.1%
0.03 | 1 | < 0.1%
0.042 | 1 | < 0.1%
0.043 | 1 | < 0.1%
0.05 | 1 | < 0.1%
0.06 | 2 | < 0.1%
0.08 | 1 | < 0.1%
0.1 | 11 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4916361.0 | 1 | < 0.1%
3361639.42 | 1 | < 0.1%
2147482.0 | 1 | < 0.1%
1790464.0 | 1 | < 0.1%
1674769.0 | 1 | < 0.1%
1639600.0 | 1 | < 0.1%
1637134.0 | 1 | < 0.1%
1540000.0 | 1 | < 0.1%
1318800.0 | 1 | < 0.1%
1218060.0 | 1 | < 0.1%

Interactions

[Figure: interaction plots for the numeric variables (four panels)]

Correlations

[Figure: correlation heatmap]

Variable | 업무구분명 | 시도명 | 위반유형 | 위반건수 | 위반물량(kg)
업무구분명 | 1.000 | 0.095 | 0.221 | 0.061 | 0.000
시도명 | 0.095 | 1.000 | 0.109 | 0.087 | 0.000
위반유형 | 0.221 | 0.109 | 1.000 | 0.045 | 0.000
위반건수 | 0.061 | 0.087 | 0.045 | 1.000 | 0.000
위반물량(kg) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | 위반유형 | 업무구분명 | 시도명
위반유형 | 1.000 | 0.144 | 0.061
업무구분명 | 0.144 | 1.000 | 0.045
시도명 | 0.061 | 0.045 | 1.000

[Figure: correlation heatmap]

Variable | 위반건수 | 위반물량(kg) | 업무구분명 | 시도명 | 위반유형
위반건수 | 1.000 | 0.323 | 0.030 | 0.033 | 0.028
위반물량(kg) | 0.323 | 1.000 | 0.000 | 0.000 | 0.000
업무구분명 | 0.030 | 0.000 | 1.000 | 0.045 | 0.144
시도명 | 0.033 | 0.000 | 0.045 | 1.000 | 0.061
위반유형 | 0.028 | 0.000 | 0.144 | 0.061 | 1.000
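The report does not label which coefficient each matrix uses. One common association measure for pairs of categorical variables such as 업무구분명, 시도명 and 위반유형 is Cramér's V. The sketch below computes the plain statistic from a contingency table; ydata-profiling may apply a bias correction, so its values need not match this sketch, and the demo tables are hypothetical:

```python
import math

def cramers_v(table):
    """Plain Cramér's V (no bias correction) for a table of counts with no empty rows or columns."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

print(cramers_v([[10, 0], [0, 10]]))  # perfect association: 1.0
print(cramers_v([[5, 5], [5, 5]]))    # independence: 0.0
```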

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
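The nullity count chart reduces each column to the number of missing cells; in this dataset only 위반물량(kg) has gaps (445 of 10,000 rows). A minimal sketch on hypothetical rows, treating `None` or an absent key as missing:

```python
def null_counts(rows, columns):
    """Count missing entries (None or absent key) per column in a list of row dicts."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

# Hypothetical rows: only 위반물량(kg) has a gap, mirroring the pattern in this dataset.
columns = ["위반건수", "위반물량(kg)"]
rows = [
    {"위반건수": 1, "위반물량(kg)": 10.0},
    {"위반건수": 2, "위반물량(kg)": None},
    {"위반건수": 1, "위반물량(kg)": 3.5},
]
counts = null_counts(rows, columns)
print(counts)
```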

Sample

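Index | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 / 위반물량(kg) (digits run together)
36201 | 2010-04-01 | 양곡표시 | 광주광역시 | 흑현미 | 미표시 | 37.0
31551 | 2011-06-01 | 원산지단속 | 울산광역시 | 배추김치 | 거짓표시 | 5780.0
50526 | 2005-08-01 | 원산지단속 | 전라북도 | 당근 | 미표시 | 217.0
56697 | 2003-01-01 | 원산지단속 | 강원도 | 쇠고기 | 거짓표시 | 114.4
33169 | 2011-01-01 | 원산지단속 | 경상북도 | 돈가스 | 미표시 | 213.5
4945 | 2020-04-01 | 원산지단속 | 경기도 | 돼지고기 | 미표시 | 220.0
46383 | 2007-03-01 | 원산지단속 | 인천광역시 | 보리쌀 | 미표시 | 15.0
14064 | 2017-01-01 | 원산지단속 | 전라남도 | 연근 | 거짓표시 | 110.0
50659 | 2005-06-01 | 원산지단속 | 강원도 | 땅콩 | 거짓표시 | 115.0
6574 | 2019-09-01 | 원산지단속 | 전라북도 | 쇠고기 | 미표시 | 12.0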
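
Index | 처분년월 | 업무구분명 | 시도명 | 위반품목 | 위반유형 | 위반건수 / 위반물량(kg) (digits run together)
58957 | 2002-01-01 | 원산지단속 | 대전광역시 | 연근 | 거짓표시 | 14.0
8875 | 2018-11-01 | 원산지단속 | 대전광역시 | 총각김치 | 미표시 | 1625.0
22764 | 2014-01-01 | 원산지단속 | 광주광역시 | 돼지고기 | 거짓표시 | 243286.32
14055 | 2017-01-01 | 원산지단속 | 전라남도 | 거봉 | 미표시 | 36.7
38557 | 2009-09-01 | 원산지단속 | 대구광역시 | 쇠고기 | 거짓표시 | 175073.7
60980 | 2001-01-01 | 원산지단속 | 전라남도 | 엿기름 | 거짓표시 | 425.0
56057 | 2003-04-01 | 원산지단속 | 인천광역시 | 쇠고기 | 거짓표시 | 234.0
4543 | 2020-07-01 | 원산지단속 | 충청북도 | 옥수수가루 | 거짓표시 | 14640.0
58945 | 2002-01-01 | 원산지단속 | 대구광역시 | 돼지고기 | 거짓표시 | 3190.0
49841 | 2006-01-01 | 원산지단속 | 충청북도 | 비지 | 미표시 | 16.5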