Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 882
Duplicate rows (%): 8.8%
Total size in memory: 761.7 KiB
Average record size in memory: 78.0 B
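
The percentage and per-record figures above follow from simple arithmetic on the reported totals. The sketch below redoes that arithmetic in Python, assuming one-decimal rounding and 1 KiB = 1024 bytes; the variable names are illustrative and do not come from the report.

```python
# Totals copied from the dataset statistics above.
n_rows = 10_000
duplicate_rows = 882
total_kib = 761.7  # total size in memory, in KiB

# Share of duplicate rows, as a percentage of all rows.
duplicate_pct = 100 * duplicate_rows / n_rows

# Average record size in bytes (assumes 1 KiB = 1024 bytes).
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (8.8% and 78.0 B).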

Variable types

DateTime: 1
Categorical: 5
Numeric: 2

Dataset

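Description: Enforcement data on suppliers of rice for processing, collected in connection with agricultural product distribution. It covers use outside the designated purpose, country-of-origin labeling, and keeping of the management ledger (enforcement year-month, province/city name, number of inspections, number of violating businesses, number of cases of use outside the designated purpose, number of labeling violations, number of cases where the management ledger was not kept, other).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20170912000000000790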

Alerts

표시위반 건수 has constant value "0" [Constant]
Dataset has 882 (8.8%) duplicate rows [Duplicates]
위반업체 수 is highly overall correlated with 관리대장 미비치 건수 [High correlation]
관리대장 미비치 건수 is highly overall correlated with 위반업체 수 [High correlation]
지정용도외 사용 건수 is highly imbalanced (98.0%) [Imbalance]
관리대장 미비치 건수 is highly imbalanced (93.4%) [Imbalance]
기타 is highly imbalanced (98.3%) [Imbalance]
위반업체 수 has 9641 (96.4%) zeros [Zeros]
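
The report does not state how the imbalance percentages are derived. One formula that reproduces all three figures from the value counts listed in the variable sections is one minus the Shannon entropy normalized by the logarithm of the number of classes. The sketch below assumes that definition; `imbalance_score` is an illustrative name, not a ydata-profiling function.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the
    class frequencies and k the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class leaves nothing to normalize
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Value counts copied from the variable sections of this report.
value_counts = {
    "지정용도외 사용 건수": [9981, 19],
    "관리대장 미비치 건수": [9811, 149, 26, 11, 3],
    "기타": [9974, 25, 1],
}

scores = {name: imbalance_score(c) for name, c in value_counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.1%}")
```

A score near 100% means nearly all rows fall into a single class, which is the case for all three flagged columns.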

Reproduction

Analysis started: 2024-03-23 07:23:51.053207
Analysis finished: 2024-03-23 07:23:53.283095
Duration: 2.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

단속년월
DateTime

Distinct: 3984
Distinct (%): 39.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2000-03-27 00:00:00
Maximum: 2024-03-15 00:00:00

Histogram with fixed size bins (bins=50)

시도별
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기도 | 1976
경상북도 | 1323
강원특별자치도 | 1005
경상남도 | 900
전라남도 | 860
Other values (12) | 3936

Length

Max length: 7
Median length: 5
Mean length: 4.4967
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원특별자치도
2nd row: 충청남도
3rd row: 경기도
4th row: 울산광역시
5th row: 경기도

Common Values

Value | Count | Frequency (%)
경기도 | 1976 | 19.8%
경상북도 | 1323 | 13.2%
강원특별자치도 | 1005 | 10.1%
경상남도 | 900 | 9.0%
전라남도 | 860 | 8.6%
충청북도 | 779 | 7.8%
충청남도 | 763 | 7.6%
전북특별자치도 | 682 | 6.8%
서울특별시 | 445 | 4.5%
인천광역시 | 324 | 3.2%
Other values (7) | 943 | 9.4%

Length

Histogram of lengths of the category

조사건수
Real number (ℝ)

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.294
Minimum: 1
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 21
Range: 20
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.90567435
Coefficient of variation (CV): 0.6999029
Kurtosis: 51.957663
Mean: 1.294
Median Absolute Deviation (MAD): 0
Skewness: 5.6028906
Sum: 12940
Variance: 0.82024602
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=14)

Common values

Value | Count | Frequency (%)
1 | 8366 | 83.7%
2 | 1045 | 10.4%
3 | 285 | 2.9%
4 | 133 | 1.3%
5 | 73 | 0.7%
6 | 34 | 0.3%
7 | 33 | 0.3%
8 | 13 | 0.1%
9 | 8 | 0.1%
10 | 3 | < 0.1%
Other values (4) | 7 | 0.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 8366 | 83.7%
2 | 1045 | 10.4%
3 | 285 | 2.9%
4 | 133 | 1.3%
5 | 73 | 0.7%
6 | 34 | 0.3%
7 | 33 | 0.3%
8 | 13 | 0.1%
9 | 8 | 0.1%
10 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
21 | 1 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 3 | < 0.1%
10 | 3 | < 0.1%
9 | 8 | 0.1%
8 | 13 | 0.1%
7 | 33 | 0.3%
6 | 34 | 0.3%
5 | 73 | 0.7%
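
The descriptive statistics for 조사건수 can be cross-checked against its value counts. The sketch below rebuilds the sum, mean, variance, standard deviation, and coefficient of variation from the frequency tables above. It assumes the report uses the sample variance (dividing by n - 1), which is the convention that matches the reported figure, and it expands the "Other values" bucket using the largest values listed for this variable.

```python
import math

# Value -> count for 조사건수, copied from the frequency tables above.
counts = {1: 8366, 2: 1045, 3: 285, 4: 133, 5: 73, 6: 34, 7: 33,
          8: 13, 9: 8, 10: 3, 11: 3, 12: 2, 13: 1, 21: 1}

n = sum(counts.values())                              # number of observations
total = sum(v * c for v, c in counts.items())         # sum of all values
sum_sq = sum(v * v * c for v, c in counts.items())    # sum of squared values

mean = total / n
variance = (sum_sq - total * total / n) / (n - 1)     # sample variance (assumed ddof=1)
std = math.sqrt(variance)
cv = std / mean                                       # coefficient of variation

print(f"n={n}, sum={total}, mean={mean:.3f}")
print(f"variance={variance:.6f}, std={std:.6f}, cv={cv:.6f}")
```

With 83.7% of rows equal to 1 and a single row at 21, the long right tail explains the high skewness and kurtosis reported for this column.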

위반업체 수
Real number (ℝ)

Alerts: HIGH CORRELATION, ZEROS

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0481
Minimum: 0
Maximum: 6
Zeros: 9641
Zeros (%): 96.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.28843493
Coefficient of variation (CV): 5.9965682
Kurtosis: 96.860531
Mean: 0.0481
Median Absolute Deviation (MAD): 0
Skewness: 8.503247
Sum: 481
Variance: 0.083194709
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)
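
Common values

Value | Count | Frequency (%)
0 | 9641 | 96.4%
1 | 281 | 2.8%
2 | 49 | 0.5%
3 | 19 | 0.2%
4 | 6 | 0.1%
5 | 3 | < 0.1%
6 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9641 | 96.4%
1 | 281 | 2.8%
2 | 49 | 0.5%
3 | 19 | 0.2%
4 | 6 | 0.1%
5 | 3 | < 0.1%
6 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
6 | 1 | < 0.1%
5 | 3 | < 0.1%
4 | 6 | 0.1%
3 | 19 | 0.2%
2 | 49 | 0.5%
1 | 281 | 2.8%
0 | 9641 | 96.4%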

지정용도외 사용 건수
Categorical

Alerts: IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9981
1 | 19

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9981 | 99.8%
1 | 19 | 0.2%

Length

Histogram of lengths of the category


표시위반 건수
Categorical

Alerts: CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 10000

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 10000 | 100.0%

Length

Histogram of lengths of the category


관리대장 미비치 건수
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9811
1 | 149
2 | 26
3 | 11
4 | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9811 | 98.1%
1 | 149 | 1.5%
2 | 26 | 0.3%
3 | 11 | 0.1%
4 | 3 | < 0.1%

Length

Histogram of lengths of the category


기타
Categorical

Alerts: IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9974
1 | 25
2 | 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9974 | 99.7%
1 | 25 | 0.2%
2 | 1 | < 0.1%

Length

Histogram of lengths of the category


Interactions

[Four interaction plots; images not included]

Correlations

Variable | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 관리대장 미비치 건수 | 기타
시도별 | 1.000 | 0.227 | 0.097 | 0.063 | 0.052 | 0.000
조사건수 | 0.227 | 1.000 | 0.036 | 0.045 | 0.008 | 0.039
위반업체 수 | 0.097 | 0.036 | 1.000 | 0.221 | 0.794 | 0.482
지정용도외 사용 건수 | 0.063 | 0.045 | 0.221 | 1.000 | 0.021 | 0.000
관리대장 미비치 건수 | 0.052 | 0.008 | 0.794 | 0.021 | 1.000 | 0.128
기타 | 0.000 | 0.039 | 0.482 | 0.000 | 0.128 | 1.000

Variable | 지정용도외 사용 건수 | 기타 | 관리대장 미비치 건수 | 시도별
지정용도외 사용 건수 | 1.000 | 0.000 | 0.026 | 0.056
기타 | 0.000 | 1.000 | 0.096 | 0.000
관리대장 미비치 건수 | 0.026 | 0.096 | 1.000 | 0.026
시도별 | 0.056 | 0.000 | 0.026 | 1.000

Variable | 조사건수 | 위반업체 수 | 시도별 | 지정용도외 사용 건수 | 관리대장 미비치 건수 | 기타
조사건수 | 1.000 | 0.030 | 0.092 | 0.011 | 0.000 | 0.000
위반업체 수 | 0.030 | 1.000 | 0.044 | 0.237 | 0.673 | 0.370
시도별 | 0.092 | 0.044 | 1.000 | 0.056 | 0.026 | 0.000
지정용도외 사용 건수 | 0.011 | 0.237 | 0.056 | 1.000 | 0.026 | 0.000
관리대장 미비치 건수 | 0.000 | 0.673 | 0.026 | 0.026 | 1.000 | 0.096
기타 | 0.000 | 0.370 | 0.000 | 0.000 | 0.096 | 1.000
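
The high-correlation alerts in the overview can be related to these matrices by scanning for large off-diagonal entries. The sketch below does this for the first matrix in this section. The cutoff of 0.5 is an assumption chosen for illustration; the report does not state the threshold it applies.

```python
from itertools import combinations

columns = ["시도별", "조사건수", "위반업체 수",
           "지정용도외 사용 건수", "관리대장 미비치 건수", "기타"]

# First correlation matrix from this section, row by row.
matrix = [
    [1.000, 0.227, 0.097, 0.063, 0.052, 0.000],
    [0.227, 1.000, 0.036, 0.045, 0.008, 0.039],
    [0.097, 0.036, 1.000, 0.221, 0.794, 0.482],
    [0.063, 0.045, 0.221, 1.000, 0.021, 0.000],
    [0.052, 0.008, 0.794, 0.021, 1.000, 0.128],
    [0.000, 0.039, 0.482, 0.000, 0.128, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not taken from the report

# Visit each unordered pair once (upper triangle, diagonal excluded).
high_pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i, j in combinations(range(len(columns)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]

for left, right, value in high_pairs:
    print(f"{left} <-> {right}: {value:.3f}")
```

Under this assumed cutoff only the pair 위반업체 수 and 관리대장 미비치 건수 is flagged, which agrees with the two high-correlation alerts listed in the overview.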

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

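First rows

Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
17865 | 2007-02-28 | 강원특별자치도 | 1 | 0 | 0 | 0 | 0 | 0
10455 | 2022-04-22 | 충청남도 | 1 | 0 | 0 | 0 | 0 | 0
5738 | 2012-05-25 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
3109 | 2006-07-31 | 울산광역시 | 4 | 0 | 0 | 0 | 0 | 0
5930 | 2013-02-25 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
16857 | 2021-06-08 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
16285 | 2013-02-06 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
382 | 2009-03-23 | 서울특별시 | 2 | 0 | 0 | 0 | 0 | 0
12106 | 2021-09-16 | 전라남도 | 1 | 0 | 0 | 0 | 0 | 0
19296 | 2005-04-30 | 전북특별자치도 | 1 | 0 | 0 | 0 | 0 | 0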
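
Last rows

Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
12539 | 2006-02-28 | 경상북도 | 1 | 1 | 0 | 0 | 0 | 0
18645 | 2013-05-14 | 강원특별자치도 | 1 | 0 | 0 | 0 | 0 | 0
4272 | 2006-05-03 | 경기도 | 2 | 0 | 0 | 0 | 0 | 0
6456 | 2016-02-01 | 경기도 | 2 | 0 | 0 | 0 | 0 | 0
16188 | 2011-09-19 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
14985 | 2022-09-30 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0
5440 | 2010-11-15 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
17161 | 2021-02-24 | 제주특별자치도 | 2 | 0 | 0 | 0 | 0 | 0
18261 | 2009-12-10 | 강원특별자치도 | 1 | 0 | 0 | 0 | 0 | 0
18747 | 2014-08-26 | 강원특별자치도 | 1 | 0 | 0 | 0 | 0 | 0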

Duplicate rows

Most frequently occurring

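Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타 | # duplicates
501 | 2012-09-05 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 8
496 | 2012-09-04 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 6
375 | 2010-03-17 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
596 | 2013-11-26 | 전북특별자치도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
77 | 2005-10-07 | 강원특별자치도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
102 | 2006-03-29 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
134 | 2006-08-24 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
148 | 2006-09-20 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
166 | 2006-12-11 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
192 | 2007-03-22 | 전라남도 | 1 | 0 | 0 | 0 | 0 | 0 | 4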
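
A table like the one above can be produced by counting identical rows. The sketch below uses `collections.Counter` on a handful of toy rows, which are made up for illustration and shortened to four fields. It reports two different figures, the number of distinct rows that repeat and the number of redundant copies, because the report does not spell out which of them its "Duplicate rows" count corresponds to.

```python
from collections import Counter

# Toy rows (date, province, inspections, violating businesses).
# These are illustrative values, not records taken from the dataset.
rows = [
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-04", "경기도", 1, 0),
    ("2012-09-04", "경기도", 1, 0),
    ("2013-11-26", "전북특별자치도", 1, 0),
]

occurrences = Counter(rows)

# Rows that occur more than once, most frequent first.
duplicated = [(row, n) for row, n in occurrences.most_common() if n > 1]

n_distinct_duplicated = len(duplicated)              # distinct rows that repeat
n_redundant = sum(n - 1 for _, n in duplicated)      # copies beyond the first

for row, n in duplicated:
    print(row, n)
print(n_distinct_duplicated, n_redundant)
```

Tuples are hashable, so each full row can serve directly as a `Counter` key; with a real table, every column would be included in the tuple.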