Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 915
Duplicate rows (%): 9.2%
Total size in memory: 761.7 KiB
Average record size in memory: 78.0 B

Variable types

DateTime: 1
Categorical: 4
Numeric: 3

Dataset

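Description: Enforcement data on suppliers of rice for processing, collected in connection with agricultural product distribution. It covers use outside the designated purpose, country-of-origin labeling, and the keeping of management ledgers. Fields: enforcement year-month (단속년월), province name (시도명), number of inspections (조사건수), number of violating businesses (위반업체수), cases of use outside the designated purpose (지정용도외 사용 건수), labeling violation cases (표시위반 건수), cases of management ledger not kept (관리대장 미비치 건수), other (기타).
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20170912000000000790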

Alerts

Duplicates: Dataset has 915 (9.2%) duplicate rows
High correlation: 위반업체 수 is highly overall correlated with 관리대장 미비치 건수 and 1 other field
High correlation: 관리대장 미비치 건수 is highly overall correlated with 위반업체 수
High correlation: 기타 is highly overall correlated with 위반업체 수
Imbalance: 지정용도외 사용 건수 is highly imbalanced (98.4%)
Imbalance: 표시위반 건수 is highly imbalanced (99.7%)
Imbalance: 기타 is highly imbalanced (97.2%)
Zeros: 위반업체 수 has 9640 (96.4%) zeros
Zeros: 관리대장 미비치 건수 has 9811 (98.1%) zeros
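
The three imbalance percentages are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below is an assumption about how such a score can be computed, not a statement of the profiler's implementation. The helper name `imbalance_score` is hypothetical, and the class counts are copied from the Common Values tables of this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (hypothetical helper)."""
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    total = sum(counts)
    # Shannon entropy in bits of the observed class frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts as listed in this report for each categorical column.
class_counts = {
    "지정용도외 사용 건수": [9985, 15],
    "표시위반 건수": [9998, 2],
    "기타": [9953, 45, 2],
}

scores = {name: imbalance_score(counts) for name, counts in class_counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.1%}")
```

A score near 100% means almost all rows fall into one class, which is the situation flagged for these three columns.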

Reproduction

Analysis started: 2024-03-23 07:23:17.716167
Analysis finished: 2024-03-23 07:23:22.880923
Duration: 5.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
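
The reported duration follows from the two timestamps. As a small check, assuming both are in the same time zone, the standard library can parse them and take the difference:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section of this report.
started = datetime.fromisoformat("2024-03-23 07:23:17.716167")
finished = datetime.fromisoformat("2024-03-23 07:23:22.880923")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```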

Variables

단속년월 (enforcement year-month)
DateTime

Distinct: 3856
Distinct (%): 38.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2000-03-27 00:00:00
Maximum: 2023-03-16 00:00:00

Histogram with fixed size bins (bins=50)

시도별 (province)
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Top categories: 경기도 1994, 경상북도 1370, 강원도 989, 경상남도 919, 전라남도 875, Other values (12) 3853

Length

Max length: 7
Median length: 4
Mean length: 3.8835
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 경기도
2nd row: 충청북도
3rd row: 울산광역시
4th row: 부산광역시
5th row: 전라남도

Common Values

Value | Count | Frequency (%)
경기도 | 1994 | 19.9%
경상북도 | 1370 | 13.7%
강원도 | 989 | 9.9%
경상남도 | 919 | 9.2%
전라남도 | 875 | 8.8%
충청북도 | 766 | 7.7%
충청남도 | 735 | 7.3%
전라북도 | 694 | 6.9%
서울특별시 | 425 | 4.2%
인천광역시 | 340 | 3.4%
Other values (7) | 893 | 8.9%

Length

Histogram of lengths of the category

조사건수 (number of inspections)
Real number (ℝ)

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2914
Minimum: 1
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 3
Maximum: 21
Range: 20
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.90529995
Coefficient of variation (CV): 0.70102211
Kurtosis: 56.912934
Mean: 1.2914
Median Absolute Deviation (MAD): 0
Skewness: 5.8043332
Sum: 12914
Variance: 0.819568
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=15)

Common Values

Value | Count | Frequency (%)
1 | 8389 | 83.9%
2 | 1008 | 10.1%
3 | 310 | 3.1%
4 | 125 | 1.2%
5 | 71 | 0.7%
6 | 38 | 0.4%
7 | 29 | 0.3%
8 | 14 | 0.1%
9 | 5 | 0.1%
10 | 4 | < 0.1%
Other values (5) | 7 | 0.1%

Extreme values (largest 10)

Value | Count | Frequency (%)
21 | 1 | < 0.1%
16 | 1 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 2 | < 0.1%
10 | 4 | < 0.1%
9 | 5 | 0.1%
8 | 14 | 0.1%
7 | 29 | 0.3%
6 | 38 | 0.4%

위반업체 수 (number of violating businesses)
Real number (ℝ)

Alerts: High correlation, Zeros

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0504
Minimum: 0
Maximum: 12
Zeros: 9640
Zeros (%): 96.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 12
Range: 12
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.32290907
Coefficient of variation (CV): 6.406926
Kurtosis: 271.67246
Mean: 0.0504
Median Absolute Deviation (MAD): 0
Skewness: 12.344368
Sum: 504
Variance: 0.10427027
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common Values

Value | Count | Frequency (%)
0 | 9640 | 96.4%
1 | 275 | 2.8%
2 | 56 | 0.6%
3 | 17 | 0.2%
5 | 4 | < 0.1%
4 | 4 | < 0.1%
6 | 3 | < 0.1%
12 | 1 | < 0.1%

지정용도외 사용 건수 (cases of use outside the designated purpose)
Categorical

Alerts: Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9985 | 99.9%
1 | 15 | 0.1%

Length

Histogram of lengths of the category

표시위반 건수 (labeling violation cases)
Categorical

Alerts: Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9998 | > 99.9%
1 | 2 | < 0.1%

Length

Histogram of lengths of the category

관리대장 미비치 건수 (cases of management ledger not kept)
Real number (ℝ)

Alerts: High correlation, Zeros

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0251
Minimum: 0
Maximum: 5
Zeros: 9811
Zeros (%): 98.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.20609279
Coefficient of variation (CV): 8.2108681
Kurtosis: 159.16447
Mean: 0.0251
Median Absolute Deviation (MAD): 0
Skewness: 11.139996
Sum: 251
Variance: 0.042474237
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common Values

Value | Count | Frequency (%)
0 | 9811 | 98.1%
1 | 148 | 1.5%
2 | 26 | 0.3%
3 | 10 | 0.1%
4 | 4 | < 0.1%
5 | 1 | < 0.1%

기타 (other)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9953 | 99.5%
1 | 45 | 0.4%
2 | 2 | < 0.1%

Length

Histogram of lengths of the category

Interactions

[Figures: nine pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

Variable | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
시도별 | 1.000 | 0.220 | 0.063 | 0.059 | 0.000 | 0.076 | 0.045
조사건수 | 0.220 | 1.000 | 0.076 | 0.000 | 0.000 | 0.140 | 0.061
위반업체 수 | 0.063 | 0.076 | 1.000 | 0.058 | 0.085 | 0.783 | 0.718
지정용도외 사용 건수 | 0.059 | 0.000 | 0.058 | 1.000 | 0.000 | 0.043 | 0.000
표시위반 건수 | 0.000 | 0.000 | 0.085 | 0.000 | 1.000 | 0.000 | 0.000
관리대장 미비치 건수 | 0.076 | 0.140 | 0.783 | 0.043 | 0.000 | 1.000 | 0.520
기타 | 0.045 | 0.061 | 0.718 | 0.000 | 0.000 | 0.520 | 1.000

[Figure: correlation heatmap]

Variable | 지정용도외 사용 건수 | 기타 | 시도별 | 표시위반 건수
지정용도외 사용 건수 | 1.000 | 0.000 | 0.053 | 0.000
기타 | 0.000 | 1.000 | 0.024 | 0.000
시도별 | 0.053 | 0.024 | 1.000 | 0.000
표시위반 건수 | 0.000 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | 조사건수 | 위반업체 수 | 관리대장 미비치 건수 | 시도별 | 지정용도외 사용 건수 | 표시위반 건수 | 기타
조사건수 | 1.000 | 0.039 | 0.046 | 0.082 | 0.012 | 0.000 | 0.019
위반업체 수 | 0.039 | 1.000 | 0.721 | 0.029 | 0.062 | 0.091 | 0.637
관리대장 미비치 건수 | 0.046 | 0.721 | 1.000 | 0.036 | 0.031 | 0.000 | 0.250
시도별 | 0.082 | 0.029 | 0.036 | 1.000 | 0.053 | 0.000 | 0.024
지정용도외 사용 건수 | 0.012 | 0.062 | 0.031 | 0.053 | 1.000 | 0.000 | 0.000
표시위반 건수 | 0.000 | 0.091 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
기타 | 0.019 | 0.637 | 0.250 | 0.024 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

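First rows

Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
4847 | 2009-01-29 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
10044 | 2013-01-24 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0
2923 | 2007-09-28 | 울산광역시 | 2 | 0 | 0 | 0 | 0 | 0
1336 | 2022-10-31 | 부산광역시 | 1 | 0 | 0 | 0 | 0 | 0
14039 | 2008-03-31 | 전라남도 | 2 | 1 | 0 | 0 | 0 | 0
382 | 2009-03-23 | 서울특별시 | 2 | 0 | 0 | 0 | 0 | 0
3525 | 2003-05-21 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
6298 | 2016-08-30 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
14 | 2004-12-08 | 서울특별시 | 3 | 0 | 0 | 0 | 0 | 0
15571 | 2006-07-27 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0

Last rows

Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타
4161 | 2006-08-24 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
15778 | 2007-03-21 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0
8399 | 2011-12-06 | 강원도 | 1 | 0 | 0 | 0 | 0 | 0
19315 | 2015-06-23 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
6166 | 2015-07-29 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0
14813 | 2018-09-04 | 전라남도 | 1 | 0 | 0 | 0 | 0 | 0
19022 | 2011-09-19 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0
10446 | 2019-03-12 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0
10215 | 2015-02-06 | 충청북도 | 1 | 0 | 0 | 0 | 0 | 0
4719 | 2008-06-30 | 경기도 | 2 | 0 | 0 | 0 | 0 | 0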

Duplicate rows

Most frequently occurring

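Index | 단속년월 | 시도별 | 조사건수 | 위반업체 수 | 지정용도외 사용 건수 | 표시위반 건수 | 관리대장 미비치 건수 | 기타 | # duplicates
511 | 2012-09-05 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 6
854 | 2021-04-28 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 6
423 | 2011-01-26 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
431 | 2011-01-28 | 전라북도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
507 | 2012-09-04 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
523 | 2012-12-04 | 경상남도 | 1 | 0 | 0 | 0 | 0 | 0 | 5
58 | 2005-07-05 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
116 | 2006-03-30 | 경기도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
151 | 2006-08-25 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4
199 | 2007-02-22 | 경상북도 | 1 | 0 | 0 | 0 | 0 | 0 | 4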
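
A table like this one, listing each repeated row with its number of occurrences, can be derived by counting identical rows. The report does not state how the profiler builds it, so the following is only a minimal sketch of the idea using the standard library. The rows are hypothetical, shortened to four columns, and not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows: (단속년월, 시도별, 조사건수, 위반업체 수); illustrative only.
rows = [
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2012-09-05", "경기도", 1, 0),
    ("2011-01-28", "전라북도", 1, 0),
    ("2011-01-28", "전라북도", 1, 0),
    ("2009-03-23", "서울특별시", 2, 0),
]

# Count how often each distinct row occurs.
counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, count) for row, count in counts.most_common() if count > 1]

for row, count in duplicates:
    print(row, count)
```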