Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 964
Duplicate rows (%): 9.6%
Total size in memory: 761.7 KiB
Average record size in memory: 78.0 B

Variable types

DateTime: 1
Categorical: 5
Numeric: 2

Dataset

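Description: Enforcement data on suppliers of rice for processing, collected in connection with agricultural product distribution. It covers use outside the designated purpose, country-of-origin labeling, and the keeping of management ledgers. Fields: enforcement year-month (단속년월), province/city name (시도명), number of inspections (조사건수), number of violating businesses (위반업체수), cases of use outside the designated purpose (지정용도외 사용 건수), labeling violation cases (표시위반 건수), cases of management ledger not kept (관리대장 미비치 건수), other (기타).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20170912000000000790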

Alerts

Dataset has 964 (9.6%) duplicate rows [Duplicates]
위반업체 수 is highly overall correlated with 관리대장 미비치 건수 and 1 other field [High correlation]
관리대장 미비치 건수 is highly overall correlated with 위반업체 수 [High correlation]
기타 is highly overall correlated with 위반업체 수 [High correlation]
지정용도외 사용 건수 is highly imbalanced (98.3%) [Imbalance]
표시위반 건수 is highly imbalanced (99.6%) [Imbalance]
관리대장 미비치 건수 is highly imbalanced (93.4%) [Imbalance]
기타 is highly imbalanced (97.8%) [Imbalance]
위반업체 수 has 9643 (96.4%) zeros [Zeros]
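
The four Imbalance percentages can be recovered from the class counts reported for each variable. The sketch below assumes an entropy-based score, 1 minus the Shannon entropy of the class shares divided by its maximum for that number of classes. This definition reproduces the figures in this report, but the report itself does not state the formula, and the function name `imbalance_score` is only illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the
    class shares and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch, not from the report
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Class counts copied from the per-variable frequency tables in this report.
columns = {
    "지정용도외 사용 건수": [9984, 16],
    "표시위반 건수": [9997, 3],
    "관리대장 미비치 건수": [9811, 149, 27, 10, 3],
    "기타": [9965, 32, 3],
}
for name, counts in columns.items():
    print(f"{name}: {imbalance_score(counts):.1%}")
```

Under this assumed definition the four scores round to 98.3%, 99.6%, 93.4% and 97.8%, matching the alerts.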

Reproduction

Analysis started: 2024-03-23 07:22:33.793802
Analysis finished: 2024-03-23 07:22:38.156884
Duration: 4.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
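
A report of this kind could be regenerated roughly as follows. This is a minimal sketch that assumes pandas and ydata-profiling are installed and that the data is available as a CSV file. The function name `build_report` and both file paths are hypothetical, and the settings stored in config.json for the original run are not reproduced here.

```python
def build_report(csv_path, html_path, date_column="단속년월"):
    """Profile a CSV export and write the result as an HTML report.

    pandas and ydata-profiling are imported inside the function so that
    this module can be loaded even where they are not installed.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Parse the enforcement date so it is profiled as a datetime column.
    df = pd.read_csv(csv_path, parse_dates=[date_column])
    report = ProfileReport(df, title="Overview")
    report.to_file(html_path)
    return report

# Example call (both paths are placeholders, not the original file names):
# build_report("enforcement.csv", "report.html")
```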

Variables

단속년월 (enforcement year-month)
DateTime

Distinct: 3766
Distinct (%): 37.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1999-12-24 00:00:00
Maximum: 2022-09-08 00:00:00
Histogram with fixed size bins (bins=50)

시도별 (province/city)
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기도 | 1993
경상북도 | 1361
강원도 | 972
경상남도 | 939
전라남도 | 904
Other values (12) | 3831

Length

Max length: 7
Median length: 4
Mean length: 3.8848
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도
2nd row: 강원도
3rd row: 인천광역시
4th row: 대전광역시
5th row: 강원도

Common Values

Value | Count | Frequency (%)
경기도 | 1993 | 19.9%
경상북도 | 1361 | 13.6%
강원도 | 972 | 9.7%
경상남도 | 939 | 9.4%
전라남도 | 904 | 9.0%
충청북도 | 763 | 7.6%
충청남도 | 755 | 7.5%
전라북도 | 688 | 6.9%
서울특별시 | 439 | 4.4%
인천광역시 | 332 | 3.3%
Other values (7) | 854 | 8.5%

Length

Histogram of lengths of the category

조사건수 (number of inspections)
Real number (ℝ)

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3014
Minimum: 1
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 26
Range: 25
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.95049902
Coefficient of variation (CV): 0.73036654
Kurtosis: 90.209978
Mean: 1.3014
Median Absolute Deviation (MAD): 0
Skewness: 6.8063058
Sum: 13014
Variance: 0.90344838
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)

Common Values

Value | Count | Frequency (%)
1 | 8348 | 83.5%
2 | 1041 | 10.4%
3 | 310 | 3.1%
4 | 127 | 1.3%
5 | 72 | 0.7%
6 | 35 | 0.4%
7 | 32 | 0.3%
8 | 14 | 0.1%
9 | 8 | 0.1%
10 | 5 | 0.1%
Other values (5) | 8 | 0.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 8348 | 83.5%
2 | 1041 | 10.4%
3 | 310 | 3.1%
4 | 127 | 1.3%
5 | 72 | 0.7%
6 | 35 | 0.4%
7 | 32 | 0.3%
8 | 14 | 0.1%
9 | 8 | 0.1%
10 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
26 | 1 | < 0.1%
21 | 1 | < 0.1%
14 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 3 | < 0.1%
10 | 5 | 0.1%
9 | 8 | 0.1%
8 | 14 | 0.1%
7 | 32 | 0.3%
6 | 35 | 0.4%
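
The descriptive statistics for 조사건수 can be cross-checked against its frequency tables. The sketch below rebuilds the full value distribution (values 1 to 10 from the common values, values 11 to 26 from the largest values) and recomputes the mean, variance, standard deviation and coefficient of variation. It assumes the sample variance with an n - 1 denominator, which is the choice that reproduces the reported figure. The function name `describe` is illustrative.

```python
import math

# Value -> count for 조사건수, read off the frequency tables in this section.
freq = {1: 8348, 2: 1041, 3: 310, 4: 127, 5: 72, 6: 35, 7: 32, 8: 14,
        9: 8, 10: 5, 11: 3, 12: 2, 14: 1, 21: 1, 26: 1}

def describe(freq):
    """Mean, sample variance (n - 1 denominator), standard deviation and
    coefficient of variation computed from a value -> count table."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    mean = total / n
    # Sum of squared deviations from the mean, weighted by count.
    squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
    variance = squared_dev / (n - 1)
    std = math.sqrt(variance)
    return {"n": n, "sum": total, "mean": mean,
            "variance": variance, "std": std, "cv": std / mean}

stats = describe(freq)
for key, value in stats.items():
    print(key, value)
```

The counts add up to 10000 observations and a sum of 13014, and the coefficient of variation is simply the standard deviation divided by the mean (0.95049902 / 1.3014).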

위반업체 수 (number of violating businesses)
Real number (ℝ)

Alerts: HIGH CORRELATION, ZEROS

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.048
Minimum: 0
Maximum: 12
Zeros: 9643
Zeros (%): 96.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 12
Range: 12
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.30117282
Coefficient of variation (CV): 6.2744339
Kurtosis: 311.77234
Mean: 0.048
Median Absolute Deviation (MAD): 0
Skewness: 12.546058
Sum: 480
Variance: 0.090705071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Common Values

Value | Count | Frequency (%)
0 | 9643 | 96.4%
1 | 276 | 2.8%
2 | 58 | 0.6%
3 | 16 | 0.2%
4 | 3 | < 0.1%
5 | 2 | < 0.1%
12 | 1 | < 0.1%
6 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9643 | 96.4%
1 | 276 | 2.8%
2 | 58 | 0.6%
3 | 16 | 0.2%
4 | 3 | < 0.1%
5 | 2 | < 0.1%
6 | 1 | < 0.1%
12 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
12 | 1 | < 0.1%
6 | 1 | < 0.1%
5 | 2 | < 0.1%
4 | 3 | < 0.1%
3 | 16 | 0.2%
2 | 58 | 0.6%
1 | 276 | 2.8%
0 | 9643 | 96.4%

지정용도외 사용 건수 (cases of use outside the designated purpose)
Categorical

Alerts: IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9984
1 | 16

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9984 | 99.8%
1 | 16 | 0.2%

Length

Histogram of lengths of the category


표시위반 건수 (labeling violation cases)
Categorical

Alerts: IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9997
1 | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9997 | > 99.9%
1 | 3 | < 0.1%

Length

Histogram of lengths of the category


관리대장 미비치 건수 (cases of management ledger not kept)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9811
1 | 149
2 | 27
3 | 10
4 | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9811 | 98.1%
1 | 149 | 1.5%
2 | 27 | 0.3%
3 | 10 | 0.1%
4 | 3 | < 0.1%

Length

Histogram of lengths of the category


기타 (other)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
0 | 9965
1 | 32
2 | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9965 | 99.7%
1 | 32 | 0.3%
2 | 3 | < 0.1%

Length

Histogram of lengths of the category



Correlations

Variable | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
시도별 | 1.000 | 0.193 | 0.074 | 0.052 | 0.019 | 0.071 | 0.071
조사건수 | 0.193 | 1.000 | 0.088 | 0.020 | 0.000 | 0.072 | 0.000
위반업체 수 | 0.074 | 0.088 | 1.000 | 0.016 | 0.000 | 0.783 | 0.654
지정용도외 사용 건수 | 0.052 | 0.020 | 0.016 | 1.000 | 0.000 | 0.000 | 0.024
표시위반 건수 | 0.019 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
관리대장 미비치 건수 | 0.071 | 0.072 | 0.783 | 0.000 | 0.000 | 1.000 | 0.251
기타 | 0.071 | 0.000 | 0.654 | 0.024 | 0.000 | 0.251 | 1.000

Variable | 지정용도외 사용 건수 | 관리대장 미비치 건수 | 기타 | 시도별 | 표시위반 건수
지정용도외 사용 건수 | 1.000 | 0.000 | 0.040 | 0.046 | 0.000
관리대장 미비치 건수 | 0.000 | 1.000 | 0.195 | 0.037 | 0.000
기타 | 0.040 | 0.195 | 1.000 | 0.038 | 0.000
시도별 | 0.046 | 0.037 | 0.038 | 1.000 | 0.017
표시위반 건수 | 0.000 | 0.000 | 0.000 | 0.017 | 1.000

Variable | 조사건수 | 위반업체 수 | 시도별 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
조사건수 | 1.000 | 0.038 | 0.081 | 0.006 | 0.000 | 0.024 | 0.006
위반업체 수 | 0.038 | 1.000 | 0.039 | 0.020 | 0.000 | 0.611 | 0.557
시도별 | 0.081 | 0.039 | 1.000 | 0.046 | 0.017 | 0.037 | 0.038
지정용도외 사용 건수 | 0.006 | 0.020 | 0.046 | 1.000 | 0.000 | 0.000 | 0.040
표시위반 건수 | 0.000 | 0.000 | 0.017 | 0.000 | 1.000 | 0.000 | 0.000
관리대장 미비치 건수 | 0.024 | 0.611 | 0.037 | 0.000 | 0.000 | 1.000 | 0.195
기타 | 0.006 | 0.557 | 0.038 | 0.040 | 0.000 | 0.195 | 1.000
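
The High correlation alerts can be related back to the matrices with a simple threshold scan. The sketch below uses the values of the first matrix in this section and assumes a cutoff of 0.5. Neither the cutoff nor the choice of matrix is stated in this report, so both are assumptions, and `high_pairs` is an illustrative name.

```python
# Values copied from the first correlation matrix in this section.
names = ["시도별", "조사건수", "위반업체 수", "지정용도외 사용 건수",
         "표시위반 건수", "관리대장 미비치 건수", "기타"]
matrix = [
    [1.000, 0.193, 0.074, 0.052, 0.019, 0.071, 0.071],
    [0.193, 1.000, 0.088, 0.020, 0.000, 0.072, 0.000],
    [0.074, 0.088, 1.000, 0.016, 0.000, 0.783, 0.654],
    [0.052, 0.020, 0.016, 1.000, 0.000, 0.000, 0.024],
    [0.019, 0.000, 0.000, 0.000, 1.000, 0.000, 0.000],
    [0.071, 0.072, 0.783, 0.000, 0.000, 1.000, 0.251],
    [0.071, 0.000, 0.654, 0.024, 0.000, 0.251, 1.000],
]

def high_pairs(names, matrix, threshold=0.5):
    """Return (a, b, value) for every unordered pair of distinct variables
    whose absolute correlation is at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

for a, b, value in high_pairs(names, matrix):
    print(f"{a} <-> {b}: {value:.3f}")
```

With this cutoff the scan returns two pairs, 위반업체 수 with 관리대장 미비치 건수 (0.783) and 위반업체 수 with 기타 (0.654), which are the same pairs named in the alerts.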

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

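First rows

Row | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
8894 | 2021-09-16 | 강원도 | 1 | 0 | 0 | 0 | 0 | 0
8065 | 2010-03-25 | 강원도 | 1 | 0 | 0 | 0 | 0 | 0
2061 | 2010-08-03 | 인천광역시 | 1 | 0 | 0 | 0 | 0 | 0
2736 | 2013-01-23 | 대전광역시 | 1 | 0 | 0 | 0 | 0 | 0
7950 | 2009-06-09 | 강원도 | 1 | 0 | 0 | 0 | 0 | 0
6106 | 2015-08-04 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
18694 | 2012-04-27 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
18634 | 2011-06-30 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
11577 | 2016-01-22 | 충청남도 | 1 | 0 | 0 | 0 | 0 | 0
15139 | 2006-03-20 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0

Last rows

Row | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
12260 | 2009-09-15 | 전라북도 | 1 | 0 | 0 | 0 | 0 | 0
7058 | 2002-07-23 | 강원도 | 3 | 0 | 0 | 0 | 0 | 0
15110 | 2006-01-25 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0
5615 | 2013-01-25 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
12441 | 2011-12-14 | 전라북도 | 1 | 0 | 0 | 0 | 0 | 0
15357 | 2006-12-11 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0
15610 | 2007-09-10 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0
14374 | 2015-09-23 | 전라남도 | 2 | 0 | 0 | 0 | 0 | 0
19025 | 2016-06-21 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
16348 | 2011-12-26 | 경상북도 | 2 | 0 | 0 | 0 | 0 | 0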

Duplicate rows

Most frequently occurring

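Row | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타 | # duplicates
578 | 2012-09-04 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 8
583 | 2012-09-05 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 6
61 | 2005-07-05 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
576 | 2012-09-03 | 전라남도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
695 | 2014-01-21 | 경기도 | 1 | 1 | 0 | 0 | 1 | 0 | 5
843 | 2017-09-26 | 전라남도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
913 | 2020-06-24 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
47 | 2005-04-18 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
123 | 2006-03-29 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
153 | 2006-07-31 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4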
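
A duplicate count like the one above can be obtained in more than one way, and the report does not say which convention its figure of 964 follows. The sketch below, using only the standard library, computes two common readings: the number of distinct row patterns that occur more than once, and the number of surplus copies beyond the first occurrence. The row labels in the table above run into the 900s, which would be consistent with the first reading, but that is an inference. The function name `duplicate_summary` and the toy rows are hypothetical.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (repeated_patterns, surplus_rows) for an iterable of hashable rows.

    repeated_patterns: distinct rows that appear at least twice.
    surplus_rows: copies beyond the first occurrence of each row.
    """
    counts = Counter(rows)
    repeated_patterns = sum(1 for c in counts.values() if c > 1)
    surplus_rows = sum(c - 1 for c in counts.values())
    return repeated_patterns, surplus_rows

# Toy rows for illustration only (not the real dataset):
# (date, province, inspections, violating businesses)
rows = [
    ("2012-09-04", "경기도", 1, 0),
    ("2012-09-04", "경기도", 1, 0),
    ("2012-09-04", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2014-01-21", "경기도", 1, 1),
]
patterns, surplus = duplicate_summary(rows)
print(patterns, surplus)
```

On the toy rows this gives 2 repeated patterns and 3 surplus rows, which shows how far the two readings can diverge on the same data.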