Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 284
Duplicate rows (%): 2.8%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
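
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and using only the numbers reported in this section:

```python
# Values copied from the dataset statistics above.
n_rows = 10_000
n_duplicates = 284
total_kib = 498.0  # total size in memory, in KiB

# Share of rows flagged as duplicates.
duplicate_pct = 100 * n_duplicates / n_rows

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the reported 2.8% and 51.0 B.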

Variable types

Numeric: 3
Text: 1
Categorical: 1

Dataset

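Description: Status information on farms certified under Good Agricultural Practices (GAP), managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: certification number, location, item, cultivated area, and planned production volume.
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20181019000000000974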

Alerts

Dataset has 284 (2.8%) duplicate rows [Duplicates]
인증번호 (certification number) is highly overall correlated with 품목 (item) [High correlation]
재배면적(㎡) (cultivated area, m²) is highly overall correlated with 생산계획량(톤) (planned production, tonnes) [High correlation]
생산계획량(톤) is highly overall correlated with 재배면적(㎡) [High correlation]
품목 is highly overall correlated with 인증번호 [High correlation]
품목 is highly imbalanced (50.3%) [Imbalance]
생산계획량(톤) is highly skewed (γ1 = 71.39895588) [Skewed]
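
The imbalance alert summarizes the class distribution of 품목 as a single score. A sketch of how such a score can be computed, assuming it is defined as one minus the Shannon entropy of the class frequencies divided by log2 of the number of classes. That definition is an assumption here, not something the report states; `imbalance_score` is a hypothetical helper and the counts are toy values:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

uniform = imbalance_score([25, 25, 25, 25])
dominated = imbalance_score([97, 1, 1, 1])
print(uniform, dominated)
```

Under this definition, a score around 50% means the observed entropy is roughly half of the maximum possible for the given number of classes.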

Reproduction

Analysis started: 2024-03-23 07:54:25.080658
Analysis finished: 2024-03-23 07:54:29.317634
Duration: 4.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

인증번호 (certification number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 355
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1001458.1
Minimum: 1000003
Maximum: 1002376
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000003
5-th percentile: 1000113
Q1: 1000364
Median: 1001734
Q3: 1002351
95-th percentile: 1002374
Maximum: 1002376
Range: 2373
Interquartile range (IQR): 1987

Descriptive statistics

Standard deviation: 939.95995
Coefficient of variation (CV): 0.00093859143
Kurtosis: -1.68047
Mean: 1001458.1
Median Absolute Deviation (MAD): 636
Skewness: -0.33890806
Sum: 1.0014581 × 10^10
Variance: 883524.71
Monotonicity: Not monotonic
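
Several of the statistics above are derived from one another. A short consistency check, using only the values reported for 인증번호, that the range, IQR, coefficient of variation, and variance agree with the minimum, maximum, quartiles, mean, and standard deviation:

```python
import math

# Values copied from the quantile and descriptive statistics above.
minimum, maximum = 1000003, 1002376
q1, q3 = 1000364, 1002351
mean, std = 1001458.1, 939.95995

value_range = maximum - minimum  # reported: 2373
iqr = q3 - q1                    # reported: 1987
cv = std / mean                  # reported: 0.00093859143
variance = std ** 2              # reported: 883524.71

print(value_range, iqr)
print(math.isclose(cv, 0.00093859143, rel_tol=1e-6))
print(math.isclose(variance, 883524.71, rel_tol=1e-6))
```

The tiny CV reflects that the values are seven-digit identifiers clustered in a narrow band, so the spread is small relative to the mean.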
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1002352  843  8.4%
1002370  464  4.6%
1002374  430  4.3%
1002344  325  3.2%
1000471  306  3.1%
1002372  276  2.8%
1002335  265  2.6%
1002232  225  2.2%
1001688  221  2.2%
1000058  220  2.2%
Other values (345)  6425  64.2%

Minimum 10 values

Value  Count  Frequency (%)
1000003  3  < 0.1%
1000029  11  0.1%
1000031  8  0.1%
1000032  11  0.1%
1000033  3  < 0.1%
1000034  8  0.1%
1000035  4  < 0.1%
1000041  14  0.1%
1000058  220  2.2%
1000061  4  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1002376  160  1.6%
1002374  430  4.3%
1002373  99  1.0%
1002372  276  2.8%
1002371  102  1.0%
1002370  464  4.6%
1002366  32  0.3%
1002363  2  < 0.1%
1002355  38  0.4%
1002353  1  < 0.1%

소재지 (location)
Text

Distinct: 88
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 14
Mean length: 9.2804
Min length: 7

Characters and Unicode

Total characters: 92804
Distinct characters: 95
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
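
The character categories reported for this variable can be reproduced with Python's standard `unicodedata` module. A minimal sketch using one location string from this dataset, where `Lo` is the Unicode general category "Other Letter" and `Zs` is "Space Separator":

```python
import unicodedata
from collections import Counter

sample = "전라남도 순천시"  # one location value from this dataset

# Count the Unicode general category of every character in the string.
categories = Counter(unicodedata.category(ch) for ch in sample)

for category, count in categories.items():
    print(category, count)
```

Hangul syllables fall under `Lo` and the separating space under `Zs`, which matches the two distinct categories listed above.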

Unique

Unique: 7
Unique (%): 0.1%

Sample

1st row: 전라남도 순천시
2nd row: 충청북도 음성군
3rd row: 전북특별자치도 남원시
4th row: 경기도 파주시
5th row: 충청남도 아산시

Value  Count  Frequency (%)
강원특별자치도  2985  14.4%
철원군  2921  14.1%
경기도  2196  10.6%
경상북도  1783  8.6%
평택시  1106  5.3%
전북특별자치도  1085  5.2%
전라남도  770  3.7%
충청북도  601  2.9%
상주시  442  2.1%
김제시  396  1.9%
Other values (97)  6405  31.0%

Most occurring characters

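Value  Count  Frequency (%)
(space)  10690  11.5%
[not preserved]  9960  10.7%
[not preserved]  6127  6.6%
[not preserved]  5029  5.4%
[not preserved]  4984  5.4%
[not preserved]  4633  5.0%
[not preserved]  4076  4.4%
[not preserved]  4076  4.4%
[not preserved]  4076  4.4%
[not preserved]  4076  4.4%
Other values (85)  35077  37.8%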

Most occurring categories

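Value  Count  Frequency (%)
Other Letter  82114  88.5%
Space Separator  10690  11.5%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[not preserved]  9960  12.1%
[not preserved]  6127  7.5%
[not preserved]  5029  6.1%
[not preserved]  4984  6.1%
[not preserved]  4633  5.6%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  3656  4.5%
Other values (84)  31421  38.3%

Space Separator
Value  Count  Frequency (%)
(space)  10690  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  82114  88.5%
Common  10690  11.5%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
[not preserved]  9960  12.1%
[not preserved]  6127  7.5%
[not preserved]  5029  6.1%
[not preserved]  4984  6.1%
[not preserved]  4633  5.6%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  3656  4.5%
Other values (84)  31421  38.3%

Common
Value  Count  Frequency (%)
(space)  10690  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  82114  88.5%
ASCII  10690  11.5%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  10690  100.0%

Hangul
Value  Count  Frequency (%)
[not preserved]  9960  12.1%
[not preserved]  6127  7.5%
[not preserved]  5029  6.1%
[not preserved]  4984  6.1%
[not preserved]  4633  5.6%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  4076  5.0%
[not preserved]  3656  4.5%
Other values (84)  31421  38.3%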

품목 (item)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 50
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
[not preserved]  4686
[not preserved]  1730
사과 (apple)  934
포도 (grape)  574
복숭아 (peach)  471
Other values (45)  1605

Length

Max length: 7
Median length: 1
Mean length: 1.48
Min length: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: [not preserved]
2nd row: 방울토마토
3rd row: 복숭아
4th row: [not preserved]
5th row: [not preserved]

Common Values

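Value  Count  Frequency (%)
[not preserved]  4686  46.9%
[not preserved]  1730  17.3%
사과 (apple)  934  9.3%
포도 (grape)  574  5.7%
복숭아 (peach)  471  4.7%
[not preserved]  382  3.8%
수박 (watermelon)  191  1.9%
메론 (melon)  127  1.3%
딸기 (strawberry)  108  1.1%
방울토마토 (cherry tomato)  85  0.9%
Other values (40)  712  7.1%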

Length

Histogram of lengths of the category

재배면적(㎡) (cultivated area, m²)
Real number (ℝ)

HIGH CORRELATION

Distinct: 4742
Distinct (%): 47.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2461.1668
Minimum: 2
Maximum: 47504
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 198
Q1: 1010
Median: 2000
Q3: 3400
95-th percentile: 5738.2
Maximum: 47504
Range: 47502
Interquartile range (IQR): 2390
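
The 95-th percentile of 5738.2 is fractional while the individual areas listed in this section are whole numbers, which suggests the quantiles are interpolated between neighboring observations. A minimal sketch of linear interpolation, assuming the common convention of placing quantile q at position (n - 1)·q in the sorted data. Whether the profiler uses exactly this convention is an assumption; `linear_quantile` is a hypothetical helper and the data are toy values:

```python
import math

def linear_quantile(values, q):
    """Quantile q in [0, 1], linearly interpolated between sorted neighbors."""
    if not values:
        raise ValueError("values must not be empty")
    data = sorted(values)
    pos = (len(data) - 1) * q          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                    # how far we are between the two neighbors
    return data[lo] + (data[hi] - data[lo]) * frac

areas = [198.0, 1010.0, 2000.0, 3400.0, 5738.0]  # toy sample, not the real data
median = linear_quantile(areas, 0.5)
p95 = linear_quantile(areas, 0.95)
print(median, round(p95, 1))
```

With five toy values the 95-th percentile falls four fifths of the way between the two largest observations, so it is not itself an observed value.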

Descriptive statistics

Standard deviation: 2250.4882
Coefficient of variation (CV): 0.91439891
Kurtosis: 43.592005
Mean: 2461.1668
Median Absolute Deviation (MAD): 1100
Skewness: 4.2968847
Sum: 24611668
Variance: 5064697.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
4000.0  68  0.7%
2000.0  29  0.3%
3000.0  28  0.3%
1000.0  23  0.2%
3967.0  22  0.2%
500.0  17  0.2%
2975.0  14  0.1%
3002.0  14  0.1%
1600.0  13  0.1%
1500.0  13  0.1%
Other values (4732)  9759  97.6%

Minimum 10 values

Value  Count  Frequency (%)
2.0  1  < 0.1%
3.0  1  < 0.1%
4.0  1  < 0.1%
5.0  1  < 0.1%
6.0  2  < 0.1%
7.0  5  0.1%
9.0  3  < 0.1%
10.0  2  < 0.1%
11.0  1  < 0.1%
12.0  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
47504.0  1  < 0.1%
38072.0  1  < 0.1%
29972.0  1  < 0.1%
28879.0  1  < 0.1%
27582.0  1  < 0.1%
27156.0  1  < 0.1%
26235.0  1  < 0.1%
25657.0  1  < 0.1%
24099.0  1  < 0.1%
23810.0  1  < 0.1%

생산계획량(톤) (planned production, tonnes)
Real number (ℝ)

HIGH CORRELATION  SKEWED

Distinct: 2703
Distinct (%): 27.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6313898
Minimum: 0
Maximum: 7605
Zeros: 20
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2
Q1: 1.0592025
Median: 2.22
Q3: 4.34
95-th percentile: 12.802525
Maximum: 7605
Range: 7605
Interquartile range (IQR): 3.2807975

Descriptive statistics

Standard deviation: 90.487557
Coefficient of variation (CV): 16.068424
Kurtosis: 5540.3017
Mean: 5.6313898
Median Absolute Deviation (MAD): 1.36
Skewness: 71.398956
Sum: 56313.898
Variance: 8187.9981
Monotonicity: Not monotonic
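
The skewness of about 71.4 reflects a handful of very large planned volumes: the maximum is 7605 against a median of 2.22. A sketch of the moment-based skewness, γ1 = m3 / m2^(3/2), on toy data, showing how a single extreme value pulls both the mean and the skewness upward. The profiler may apply a sample-size bias correction, so treat this formula as an assumption rather than its exact method; `skewness` is a hypothetical helper:

```python
import statistics

def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]  # toy values, one extreme observation

print(skewness(symmetric), skewness(with_outlier))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The median stays at 3.0 in both lists while the mean of the second jumps to 102.0, the same pattern as the reported mean of 5.63 against a median of 2.22.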
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
1.0  105  1.1%
2.0  97  1.0%
3.0  88  0.9%
4.0  71  0.7%
0.5  68  0.7%
0.1  62  0.6%
6.0  61  0.6%
2.8  61  0.6%
8.0  61  0.6%
1.2  56  0.6%
Other values (2693)  9270  92.7%

Minimum 10 values

Value  Count  Frequency (%)
0.0  20  0.2%
0.00362  1  < 0.1%
0.00506  1  < 0.1%
0.00723  1  < 0.1%
0.0094  1  < 0.1%
0.01  14  0.1%
0.01808  1  < 0.1%
0.01923  1  < 0.1%
0.02  21  0.2%
0.02024  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
7605.0  1  < 0.1%
4394.0  1  < 0.1%
1632.0  1  < 0.1%
894.0  1  < 0.1%
606.0  1  < 0.1%
506.64  1  < 0.1%
300.0  1  < 0.1%
273.94  1  < 0.1%
270.0  1  < 0.1%
250.0  2  < 0.1%

Interactions


Correlations

Variable  인증번호  소재지  품목  재배면적(㎡)  생산계획량(톤)
인증번호  1.000  0.962  0.869  0.050  0.000
소재지  0.962  1.000  0.989  0.463  0.434
품목  0.869  0.989  1.000  0.721  0.719
재배면적(㎡)  0.050  0.463  0.721  1.000  0.000
생산계획량(톤)  0.000  0.434  0.719  0.000  1.000

Variable  인증번호  재배면적(㎡)  생산계획량(톤)  품목
인증번호  1.000  0.018  0.143  0.519
재배면적(㎡)  0.018  1.000  0.778  0.353
생산계획량(톤)  0.143  0.778  1.000  0.402
품목  0.519  0.353  0.402  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  인증번호  소재지  품목  재배면적(㎡)  생산계획량(톤)
59520  1002262  전라남도 순천시  [not preserved]  235.0  0.4843
41797  1001339  충청북도 음성군  방울토마토  1498.0  7.93
56289  1002219  전북특별자치도 남원시  복숭아  826.0  1.18
5174  1000113  경기도 파주시  [not preserved]  211.0  0.1526
50163  1001733  충청남도 아산시  [not preserved]  1452.0  2.99
90120  1002371  강원특별자치도 철원군  [not preserved]  2193.0  4.4
57109  1002232  경상북도 영주시  사과  813.0  1.6
8982  1000166  경기도 용인시 처인구  [not preserved]  1398.0  0.97
99525  1002376  강원특별자치도 철원군  [not preserved]  2489.0  1.79955
53556  1002148  전북특별자치도 장수군  사과  192.0  0.35

Index  인증번호  소재지  품목  재배면적(㎡)  생산계획량(톤)
10270  1000185  전북특별자치도 김제시  [not preserved]  2499.0  1.81
98318  1002374  강원특별자치도 철원군  [not preserved]  2942.0  5.9
95466  1002374  강원특별자치도 철원군  [not preserved]  2800.0  5.6
88939  1002370  강원특별자치도 철원군  [not preserved]  5432.0  10.9
70385  1002347  충청북도 진천군  [not preserved]  483.0  0.4
31908  1000471  경기도 평택시  [not preserved]  4049.0  2.59
56772  1002226  충청북도 충주시  복숭아  5561.0  7.9133
18212  1000330  경기도 평택시  [not preserved]  1962.0  1.3
90  1000031  경기도 안성시  [not preserved]  1849.0  4.0
16876  1000292  전라남도 나주시  [not preserved]  301.0  0.6

Duplicate rows

Most frequently occurring

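Index  인증번호  소재지  품목  재배면적(㎡)  생산계획량(톤)  # duplicates
132  1002340  전북특별자치도 정읍시  [not preserved]  4000.0  2.89  21
42  1000208  전북특별자치도 부안군  [not preserved]  4000.0  2.8  18
135  1002343  전북특별자치도 정읍시  [not preserved]  4000.0  2.89  7
167  1002352  강원특별자치도 철원군  [not preserved]  42.0  0.03  6
60  1000360  경상북도 경주시  [not preserved]  2000.0  1.45  5
91  1001096  부산광역시 강서구  토마토  2975.0  19.3375  5
143  1002344  전라남도 고흥군  [not preserved]  2000.0  1.446  5
17  1000121  전라남도 나주시  [not preserved]  3000.0  1.82  4
18  1000125  전북특별자치도 전주시 덕진구  [not preserved]  3967.0  2.98  4
31  1000186  전북특별자치도 김제시  [not preserved]  3002.0  2.07  4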
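
The table above groups identical rows and counts how often each one occurs. A minimal sketch of that grouping with the standard library, using toy rows shaped like this dataset; the rows and the `count_duplicates` helper are hypothetical and not the profiler's own implementation:

```python
from collections import Counter

def count_duplicates(rows):
    """Map each row that occurs more than once to its occurrence count, most frequent first."""
    counts = Counter(rows)
    return {row: n for row, n in counts.most_common() if n > 1}

# Toy rows: (certification number, location, area in m², planned production in tonnes).
rows = [
    (1002340, "전북특별자치도 정읍시", 4000.0, 2.89),
    (1002340, "전북특별자치도 정읍시", 4000.0, 2.89),
    (1002352, "강원특별자치도 철원군", 42.0, 0.03),
    (1002340, "전북특별자치도 정읍시", 4000.0, 2.89),
    (1000360, "경상북도 경주시", 2000.0, 1.45),
]

dups = count_duplicates(rows)
for row, n in dups.items():
    print(row, n)
```

Rows must be hashable (tuples here) for `Counter` to group them; rows that occur only once are dropped from the result.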