Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 287
Duplicate rows (%): 2.9%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
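
The duplicate-row figures above can be illustrated with a short count. This is a minimal sketch, not the profiler's own code: the report does not say whether "duplicate rows" counts distinct repeated rows or the extra copies, so the hypothetical helper `duplicate_summary` returns both. The toy rows are invented for illustration and only mimic the report's column order.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (groups, extra_rows) for an iterable of rows.

    groups     -- number of distinct rows that occur more than once
    extra_rows -- number of rows that repeat an earlier identical row
    """
    counts = Counter(tuple(row) for row in rows)
    groups = sum(1 for c in counts.values() if c > 1)
    extra_rows = sum(c - 1 for c in counts.values() if c > 1)
    return groups, extra_rows


# Hypothetical toy rows (certification number, location, item, area, planned output)
toy_rows = [
    (1, "A", "x", 4000.0, 2.89),
    (1, "A", "x", 4000.0, 2.89),
    (1, "A", "x", 4000.0, 2.89),
    (2, "B", "y", 3967.0, 2.73),
    (2, "B", "y", 3967.0, 2.73),
    (3, "C", "z", 11400.0, 21.01),
]
groups, extra_rows = duplicate_summary(toy_rows)

# The reported percentage is consistent with 287 out of 10000 observations.
reported_share = round(100 * 287 / 10000, 1)
```

On the toy rows, two distinct rows are repeated and three rows are extra copies; which of the two definitions the report's 287 corresponds to is not stated in the source.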

Variable types

Numeric: 3
Text: 2

Dataset

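Description: Status of farms certified under Good Agricultural Practices (GAP), as managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: certification number (인증번호), location (소재지), item (품목), cultivation area (재배면적), planned production volume (생산계획량)
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20181019000000000974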

Alerts

Duplicates: Dataset has 287 (2.9%) duplicate rows
High correlation: 재배면적(㎡) is highly overall correlated with 생산계획량(톤)
High correlation: 생산계획량(톤) is highly overall correlated with 재배면적(㎡)
Skewed: 생산계획량(톤) is highly skewed (γ1 = 42.34022058)
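
The skewness alert quotes γ1, a skewness coefficient. The report does not state which estimator it uses, so the following is only a sketch of one common choice, the adjusted Fisher-Pearson sample skewness. The function name `sample_skewness` and the toy data are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: no asymmetry to measure
    g1 = m3 / m2 ** 1.5  # biased moment estimator
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 50]  # one large value pulls the tail right

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)
```

Symmetric data gives a value of zero, while a long right tail gives a positive value. A coefficient as large as the one reported for 생산계획량(톤) points to a few very large values dominating the distribution.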

Reproduction

Analysis started: 2024-03-23 07:54:10.732617
Analysis finished: 2024-03-23 07:54:15.192864
Duration: 4.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

인증번호 (certification number)
Real number (ℝ)

Distinct: 356
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1001448.7
Minimum: 1000003
Maximum: 1002376
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000003
5-th percentile: 1000113
Q1: 1000364
Median: 1001688
Q3: 1002351
95-th percentile: 1002374
Maximum: 1002376
Range: 2373
Interquartile range (IQR): 1987

Descriptive statistics

Standard deviation: 936.47249
Coefficient of variation (CV): 0.00093511777
Kurtosis: -1.686248
Mean: 1001448.7
Median Absolute Deviation (MAD): 682
Skewness: -0.31746768
Sum: 1.0014487 × 10^10
Variance: 876980.73
Monotonicity: Not monotonic
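
Several of the figures above are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, and IQR = Q3 - Q1. The short check below plugs in the values reported for 인증번호. The variable names are illustrative only.

```python
import math

# Values copied from the statistics reported for 인증번호
std = 936.47249
mean = 1001448.7
minimum, maximum = 1000003, 1002376
q1, q3 = 1000364, 1002351

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

# Compare with the reported figures, allowing for rounding in the report
cv_matches = math.isclose(cv, 0.00093511777, rel_tol=1e-6)
variance_matches = math.isclose(variance, 876980.73, rel_tol=1e-6)
```

The range and IQR reproduce the reported 2373 and 1987 exactly. The CV and variance agree with the reported values to within the rounding of the printed inputs.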
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1002352 | 899 | 9.0%
1002370 | 423 | 4.2%
1002374 | 398 | 4.0%
1002344 | 340 | 3.4%
1000471 | 306 | 3.1%
1002372 | 241 | 2.4%
1001688 | 232 | 2.3%
1000113 | 210 | 2.1%
1002335 | 210 | 2.1%
1000058 | 208 | 2.1%
Other values (346) | 6533 | 65.3%

Minimum 10 values

Value | Count | Frequency (%)
1000003 | 1 | < 0.1%
1000029 | 9 | 0.1%
1000031 | 4 | < 0.1%
1000032 | 5 | 0.1%
1000033 | 4 | < 0.1%
1000034 | 4 | < 0.1%
1000035 | 6 | 0.1%
1000041 | 15 | 0.1%
1000058 | 208 | 2.1%
1000061 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1002376 | 182 | 1.8%
1002374 | 398 | 4.0%
1002373 | 116 | 1.2%
1002372 | 241 | 2.4%
1002371 | 107 | 1.1%
1002370 | 423 | 4.2%
1002366 | 20 | 0.2%
1002363 | 3 | < 0.1%
1002355 | 34 | 0.3%
1002352 | 899 | 9.0%

소재지 (location)
Text

Distinct: 90
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 8.9271
Min length: 7

Characters and Unicode

Total characters: 89271
Distinct characters: 97
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
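
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the three strings are the first three sample values of this variable. The two-letter codes `Lo` and `Zs` are the short names for the "Other Letter" and "Space Separator" categories.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["경기도 평택시", "경기도 이천시", "경상북도 영주시"]
counts = category_counts(sample)

# Hangul syllables fall under "Lo"; the ASCII space falls under "Zs".
letters = counts["Lo"]
spaces = counts["Zs"]
```

Scripts and blocks are not exposed by `unicodedata`, so reproducing those parts of the report would need another source of Unicode data.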

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 경기도 평택시
2nd row: 경기도 이천시
3rd row: 경상북도 영주시
4th row: 전라북도 장수군
5th row: 전라남도 고흥군
Value | Count | Frequency (%)
강원특별자치도 | 2998 | 14.5%
철원군 | 2926 | 14.2%
경기도 | 2248 | 10.9%
경상북도 | 1733 | 8.4%
평택시 | 1197 | 5.8%
전라북도 | 1056 | 5.1%
전라남도 | 797 | 3.9%
충청북도 | 584 | 2.8%
김제시 | 403 | 2.0%
상주시 | 372 | 1.8%
Other values (99) | 6304 | 30.6%

Most occurring characters

Value | Count | Frequency (%)
 | 10618 | 11.9%
 | 9948 | 11.1%
 | 6170 | 6.9%
 | 5058 | 5.7%
 | 4958 | 5.6%
 | 4671 | 5.2%
 | 3544 | 4.0%
 | 3016 | 3.4%
 | 3005 | 3.4%
 | 3005 | 3.4%
Other values (87) | 35278 | 39.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 78653 | 88.1%
Space Separator | 10618 | 11.9%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 9948 | 12.6%
 | 6170 | 7.8%
 | 5058 | 6.4%
 | 4958 | 6.3%
 | 4671 | 5.9%
 | 3544 | 4.5%
 | 3016 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
Other values (86) | 32273 | 41.0%
Space Separator
Value | Count | Frequency (%)
 | 10618 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 78653 | 88.1%
Common | 10618 | 11.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 9948 | 12.6%
 | 6170 | 7.8%
 | 5058 | 6.4%
 | 4958 | 6.3%
 | 4671 | 5.9%
 | 3544 | 4.5%
 | 3016 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
Other values (86) | 32273 | 41.0%
Common
Value | Count | Frequency (%)
 | 10618 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 78653 | 88.1%
ASCII | 10618 | 11.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 10618 | 100.0%
Hangul
Value | Count | Frequency (%)
 | 9948 | 12.6%
 | 6170 | 7.8%
 | 5058 | 6.4%
 | 4958 | 6.3%
 | 4671 | 5.9%
 | 3544 | 4.5%
 | 3016 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
 | 3005 | 3.8%
Other values (86) | 32273 | 41.0%

품목 (item)
Text

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 1
Mean length: 1.4853
Min length: 1

Characters and Unicode

Total characters: 14853
Distinct characters: 87
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 0.1%

Sample

1st row:
2nd row: 복숭아
3rd row: 복숭아
4th row: 사과
5th row:
Value | Count | Frequency (%)
 | 4853 | 48.5%
 | 1633 | 16.3%
사과 | 933 | 9.3%
복숭아 | 512 | 5.1%
포도 | 507 | 5.1%
 | 338 | 3.4%
수박 | 168 | 1.7%
메론 | 127 | 1.3%
딸기 | 112 | 1.1%
대추방울 | 81 | 0.8%
Other values (41) | 736 | 7.4%

Most occurring characters

Value | Count | Frequency (%)
 | 4853 | 32.7%
 | 1662 | 11.2%
 | 1013 | 6.8%
 | 1003 | 6.8%
 | 512 | 3.4%
 | 512 | 3.4%
 | 512 | 3.4%
 | 507 | 3.4%
 | 507 | 3.4%
 | 338 | 2.3%
Other values (77) | 3434 | 23.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 14673 | 98.8%
Open Punctuation | 90 | 0.6%
Close Punctuation | 90 | 0.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 4853 | 33.1%
 | 1662 | 11.3%
 | 1013 | 6.9%
 | 1003 | 6.8%
 | 512 | 3.5%
 | 512 | 3.5%
 | 512 | 3.5%
 | 507 | 3.5%
 | 507 | 3.5%
 | 338 | 2.3%
Other values (75) | 3254 | 22.2%
Open Punctuation
Value | Count | Frequency (%)
( | 90 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 90 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14673 | 98.8%
Common | 180 | 1.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 4853 | 33.1%
 | 1662 | 11.3%
 | 1013 | 6.9%
 | 1003 | 6.8%
 | 512 | 3.5%
 | 512 | 3.5%
 | 512 | 3.5%
 | 507 | 3.5%
 | 507 | 3.5%
 | 338 | 2.3%
Other values (75) | 3254 | 22.2%
Common
Value | Count | Frequency (%)
( | 90 | 50.0%
) | 90 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 14673 | 98.8%
ASCII | 180 | 1.2%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 4853 | 33.1%
 | 1662 | 11.3%
 | 1013 | 6.9%
 | 1003 | 6.8%
 | 512 | 3.5%
 | 512 | 3.5%
 | 512 | 3.5%
 | 507 | 3.5%
 | 507 | 3.5%
 | 338 | 2.3%
Other values (75) | 3254 | 22.2%
ASCII
Value | Count | Frequency (%)
( | 90 | 50.0%
) | 90 | 50.0%

재배면적(㎡) (cultivation area, m²)
Real number (ℝ)

HIGH CORRELATION

Distinct: 4776
Distinct (%): 47.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2463.5928
Minimum: 1
Maximum: 49625
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 208.95
Q1: 1015
Median: 2010
Q3: 3381.025
95-th percentile: 5907.05
Maximum: 49625
Range: 49624
Interquartile range (IQR): 2366.025

Descriptive statistics

Standard deviation: 2204.998
Coefficient of variation (CV): 0.89503347
Kurtosis: 41.763733
Mean: 2463.5928
Median Absolute Deviation (MAD): 1098
Skewness: 3.9726599
Sum: 24635928
Variance: 4862016.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
4000.0 | 52 | 0.5%
2000.0 | 50 | 0.5%
3967.0 | 23 | 0.2%
1000.0 | 23 | 0.2%
3000.0 | 22 | 0.2%
1500.0 | 17 | 0.2%
992.0 | 15 | 0.1%
1600.0 | 15 | 0.1%
2975.0 | 15 | 0.1%
1322.0 | 15 | 0.1%
Other values (4766) | 9753 | 97.5%

Minimum 10 values

Value | Count | Frequency (%)
1.0 | 2 | < 0.1%
2.0 | 2 | < 0.1%
3.0 | 1 | < 0.1%
4.0 | 3 | < 0.1%
5.0 | 2 | < 0.1%
6.0 | 1 | < 0.1%
7.0 | 3 | < 0.1%
9.0 | 4 | < 0.1%
10.0 | 1 | < 0.1%
11.0 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
49625.0 | 1 | < 0.1%
33627.0 | 1 | < 0.1%
31000.0 | 1 | < 0.1%
28800.0 | 1 | < 0.1%
27156.0 | 1 | < 0.1%
25657.0 | 1 | < 0.1%
23703.0 | 1 | < 0.1%
23632.0 | 1 | < 0.1%
22391.0 | 1 | < 0.1%
20915.0 | 1 | < 0.1%

생산계획량(톤) (planned production volume, tonnes)
Real number (ℝ)

HIGH CORRELATION, SKEWED

Distinct: 2799
Distinct (%): 28.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.3493725
Minimum: 0
Maximum: 3135
Zeros: 25
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.219881
Q1: 1.03
Median: 2.18
Q3: 4.27
95-th percentile: 12.762
Maximum: 3135
Range: 3135
Interquartile range (IQR): 3.24

Descriptive statistics

Standard deviation: 51.938198
Coefficient of variation (CV): 9.7092132
Kurtosis: 2049.6668
Mean: 5.3493725
Median Absolute Deviation (MAD): 1.34
Skewness: 42.340221
Sum: 53493.725
Variance: 2697.5764
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1.0 | 94 | 0.9%
2.0 | 79 | 0.8%
3.0 | 74 | 0.7%
8.0 | 71 | 0.7%
4.0 | 66 | 0.7%
6.0 | 64 | 0.6%
0.5 | 60 | 0.6%
1.8 | 59 | 0.6%
1.5 | 59 | 0.6%
0.3 | 57 | 0.6%
Other values (2789) | 9317 | 93.2%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 25 | 0.2%
0.00362 | 1 | < 0.1%
0.00651 | 1 | < 0.1%
0.01 | 11 | 0.1%
0.01518 | 1 | < 0.1%
0.01663 | 1 | < 0.1%
0.01952 | 1 | < 0.1%
0.02 | 16 | 0.2%
0.026 | 1 | < 0.1%
0.02784 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3135.0 | 1 | < 0.1%
2218.0 | 1 | < 0.1%
1945.0 | 1 | < 0.1%
1797.0 | 1 | < 0.1%
1200.0 | 1 | < 0.1%
1120.0 | 1 | < 0.1%
900.0 | 1 | < 0.1%
818.0 | 1 | < 0.1%
682.0 | 1 | < 0.1%
400.0 | 1 | < 0.1%

Interactions

[Figure: interaction plots for pairs of the numeric variables 인증번호, 재배면적(㎡), and 생산계획량(톤)]

Correlations

[Figure: correlation heatmap]
Variable | 인증번호 | 소재지 | 품목 | 재배면적(㎡) | 생산계획량(톤)
인증번호 | 1.000 | 0.969 | 0.892 | 0.056 | 0.036
소재지 | 0.969 | 1.000 | 0.991 | 0.402 | 0.705
품목 | 0.892 | 0.991 | 1.000 | 0.425 | 0.756
재배면적(㎡) | 0.056 | 0.402 | 0.425 | 1.000 | 0.000
생산계획량(톤) | 0.036 | 0.705 | 0.756 | 0.000 | 1.000

[Figure: correlation heatmap]
Variable | 인증번호 | 재배면적(㎡) | 생산계획량(톤)
인증번호 | 1.000 | 0.024 | 0.140
재배면적(㎡) | 0.024 | 1.000 | 0.771
생산계획량(톤) | 0.140 | 0.771 | 1.000
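
The report does not name the coefficient behind either matrix. As one illustration of how a coefficient between two numeric columns can be computed, the sketch below implements Spearman's rank correlation (Pearson correlation of average ranks). The choice of Spearman is an assumption, and the helper names are hypothetical. The five value pairs are the 재배면적(㎡) and 생산계획량(톤) entries of the first five rows in the report's Sample section.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


area = [4886.0, 992.0, 2135.0, 11400.0, 3951.2]
planned = [3.25, 1.41, 4.3, 21.01, 2.85672]
rho = spearman(area, planned)
```

Five rows are far too few to reproduce the report's figures. The example only shows the mechanics of a rank-based coefficient, which is less sensitive to extreme values than a coefficient computed on the raw numbers.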

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
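
Nullity by column amounts to counting missing cells per variable. A minimal sketch, assuming missing cells are represented as `None`: the helper name `missing_summary` and the toy rows are hypothetical, and only the column names come from the report.

```python
def missing_summary(rows, columns):
    """Return per-column counts of missing (None) cells and the overall missing share."""
    per_column = {name: 0 for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            if value is None:
                per_column[name] += 1
    total_cells = len(rows) * len(columns)
    share = sum(per_column.values()) / total_cells if total_cells else 0.0
    return per_column, share


columns = ["인증번호", "소재지", "품목", "재배면적(㎡)", "생산계획량(톤)"]
# Hypothetical toy rows; the profiled dataset itself reports no missing cells.
toy_rows = [
    (1, "A", "x", 100.0, 0.1),
    (2, None, "y", 200.0, None),
]
per_column, share = missing_summary(toy_rows, columns)
```

For the profiled dataset every per-column count is 0, which matches the 0 missing cells (0.0%) reported in the overview.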

Sample

Index | 인증번호 | 소재지 | 품목 | 재배면적(㎡) | 생산계획량(톤)
19128 | 1000331 | 경기도 평택시 |  | 4886.0 | 3.25
52635 | 1002143 | 경기도 이천시 | 복숭아 | 992.0 | 1.41
44439 | 1001420 | 경상북도 영주시 | 복숭아 | 2135.0 | 4.3
34918 | 1000612 | 전라북도 장수군 | 사과 | 11400.0 | 21.01
68654 | 1002344 | 전라남도 고흥군 |  | 3951.2 | 2.85672
77975 | 1002352 | 강원특별자치도 철원군 |  | 4836.0 | 3.5
41534 | 1001348 | 경상북도 김천시 | 자두 | 1482.0 | 1.5
4995 | 1000113 | 경기도 파주시 |  | 231.0 | 0.167
20051 | 1000332 | 경기도 평택시 |  | 5051.0 | 3.36
71801 | 1002348 | 강원특별자치도 철원군 |  | 3925.0 | 2.83778

Index | 인증번호 | 소재지 | 품목 | 재배면적(㎡) | 생산계획량(톤)
17228 | 1000330 | 경기도 평택시 |  | 1937.0 | 1.29
95780 | 1002374 | 강원특별자치도 철원군 |  | 2152.0 | 4.3
8519 | 1000166 | 경기도 용인시 처인구 |  | 2866.0 | 1.73
25179 | 1000366 | 경상북도 경주시 |  | 2392.0 | 4.41
77225 | 1002352 | 강원특별자치도 철원군 |  | 1498.0 | 1.08
86423 | 1002370 | 강원특별자치도 철원군 |  | 3135.0 | 6.3
33459 | 1000471 | 경기도 평택시 |  | 2490.0 | 1.59
64602 | 1002335 | 경상북도 상주시 | 포도 | 324.0 | 0.46
69857 | 1002347 | 충청북도 진천군 |  | 965.0 | 0.9
67388 | 1002344 | 전라남도 고흥군 |  | 4019.5 | 2.9061

Duplicate rows

Most frequently occurring

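Index | 인증번호 | 소재지 | 품목 | 재배면적(㎡) | 생산계획량(톤) | # duplicates
128 | 1002340 | 전라북도 정읍시 |  | 4000.0 | 2.89 | 14
130 | 1002343 | 전라북도 정읍시 |  | 4000.0 | 2.89 | 10
133 | 1002344 | 전라남도 고흥군 |  | 2000.0 | 1.446 | 8
24 | 1000186 | 전라북도 김제시 |  | 3967.0 | 2.73 | 7
42 | 1000360 | 경상북도 경주시 |  | 2000.0 | 1.45 | 6
134 | 1002344 | 전라남도 고흥군 |  | 3000.0 | 2.169 | 6
76 | 1001096 | 부산광역시 강서구 | 토마토 | 2975.0 | 19.3375 | 5
8 | 1000113 | 경기도 파주시 |  | 1329.0 | 0.9609 | 4
16 | 1000131 | 전라북도 전주시 덕진구 |  | 3967.0 | 2.98 | 4
81 | 1001248 | 전라북도 남원시 | 파프리카 | 2310.0 | 11.43912 | 4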