Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 323
Duplicate rows (%): 3.2%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
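
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the memory total is expressed in KiB of 1024 bytes; the helper names are illustrative and not part of the profiling tool:

```python
# Raw figures copied from the dataset statistics above.
n_rows = 10000
n_duplicates = 323
total_kib = 498.0


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def average_record_bytes(size_kib: float, rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return round(size_kib * 1024 / rows, 1)


duplicate_pct = percent(n_duplicates, n_rows)
avg_record = average_record_bytes(total_kib, n_rows)
print(duplicate_pct, avg_record)
```

Both results agree with the reported 3.2% duplicate rows and 51.0 B average record size.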

Variable types

Numeric: 3
Text: 2

Dataset

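Description: Status of farms certified under Good Agricultural Practices (GAP), as managed by the National Agricultural Products Quality Management Service (certification number, location, item, cultivated area, planned production)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20181019000000000974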

Alerts

Dataset has 323 (3.2%) duplicate rows [Duplicates]
재배면적(제곱미터) is highly overall correlated with 생산계획량(톤) [High correlation]
생산계획량(톤) is highly overall correlated with 재배면적(제곱미터) [High correlation]
생산계획량(톤) is highly skewed (γ1 = 58.83960284) [Skewed]
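
The skewness alert reports γ1, the standardized third moment. A rough sketch of how such a coefficient can be computed, using the uncorrected moment formula. The profiling tool may apply a bias correction, so its figure need not match this formula digit for digit. The function name and both sample lists are hypothetical:

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up inputs: one symmetric sample, one with a single very large value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_tail = [1.0, 1.0, 2.0, 2.0, 3.0, 500.0]

print(skewness(symmetric), skewness(long_tail))
```

A symmetric sample gives a value near zero, while a single extreme value on the right pushes the coefficient strongly positive, which is the pattern the alert flags.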

Reproduction

Analysis started: 2024-03-23 07:53:41.374114
Analysis finished: 2024-03-23 07:53:45.758655
Duration: 4.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

인증번호 (certification number)
Real number (ℝ)

Distinct: 368
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1001365.9
Minimum: 1000003
Maximum: 1002374
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000003
5-th percentile: 1000112
Q1: 1000334
median: 1001573
Q3: 1002348
95-th percentile: 1002371
Maximum: 1002374
Range: 2371
Interquartile range (IQR): 2014

Descriptive statistics

Standard deviation: 949.61578
Coefficient of variation (CV): 0.00094832045
Kurtosis: -1.7721735
Mean: 1001365.9
Median Absolute Deviation (MAD): 797
Skewness: -0.16836117
Sum: 1.0013659 × 10^10
Variance: 901770.12
Monotonicity: Not monotonic
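
Several of the figures above are derived from others in the same block. A small consistency check, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). The variable names are illustrative:

```python
import math

# Figures copied from the 인증번호 statistics in this report.
n = 10000
mean = 1001365.9
std = 949.61578
minimum, maximum = 1000003, 1002374
q1, q3 = 1000334, 1002348

value_range = maximum - minimum  # compare with the reported Range
iqr = q3 - q1                    # compare with the reported IQR
cv = std / mean                  # compare with the reported CV
variance = std ** 2              # compare with the reported Variance
total = mean * n                 # compare with the reported Sum

print(value_range, iqr, cv, variance, total)
print(math.isclose(cv, 0.00094832045, rel_tol=1e-6))
```

Because the report prints rounded values, the derived quantities agree only up to rounding, hence the relative tolerance.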
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1002352 | 901 | 9.0%
1002370 | 462 | 4.6%
1000471 | 355 | 3.5%
1002344 | 326 | 3.3%
1000058 | 302 | 3.0%
1002372 | 274 | 2.7%
1001688 | 252 | 2.5%
1002335 | 219 | 2.2%
1002232 | 215 | 2.1%
1000377 | 196 | 2.0%
Other values (358) | 6498 | 65.0%
Value | Count | Frequency (%)
1000003 | 4 | < 0.1%
1000014 | 4 | < 0.1%
1000029 | 9 | 0.1%
1000031 | 8 | 0.1%
1000032 | 8 | 0.1%
1000033 | 2 | < 0.1%
1000034 | 5 | 0.1%
1000035 | 4 | < 0.1%
1000041 | 14 | 0.1%
1000058 | 302 | 3.0%
Value | Count | Frequency (%)
1002374 | 97 | 1.0%
1002373 | 94 | 0.9%
1002372 | 274 | 2.7%
1002371 | 103 | 1.0%
1002370 | 462 | 4.6%
1002366 | 32 | 0.3%
1002365 | 19 | 0.2%
1002363 | 4 | < 0.1%
1002355 | 30 | 0.3%
1002353 | 2 | < 0.1%

소재지 (location)
Text

Distinct: 94
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 7.7665
Min length: 7

Characters and Unicode

Total characters: 77665
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
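
As a rough illustration of how character properties can be read per code point, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It returns two-letter codes such as `Lo` (Other Letter) and `Zs` (Space Separator). The helper name is hypothetical, and this is not necessarily how the profiling tool computes its counts:

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of this variable.
sample = "경기도 평택시"
counts = category_counts(sample)
print(len(sample), dict(counts))
```

For this value, the six Hangul syllables fall under `Lo` and the single space under `Zs`, matching the two categories the report lists for the column.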

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 경기도 평택시
2nd row: 경기도 화성시
3rd row: 경기도 화성시
4th row: 강원도 철원군
5th row: 강원도 철원군
Value | Count | Frequency (%)
강원도 | 2568 | 12.4%
철원군 | 2504 | 12.1%
경기도 | 2420 | 11.7%
경상북도 | 1878 | 9.1%
평택시 | 1206 | 5.8%
전라북도 | 1128 | 5.5%
전라남도 | 810 | 3.9%
충청북도 | 577 | 2.8%
김제시 | 432 | 2.1%
상주시 | 398 | 1.9%
Other values (103) | 6735 | 32.6%

Most occurring characters

ValueCountFrequency (%)
10656
13.7%
9957
 
12.8%
5378
 
6.9%
5307
 
6.8%
5004
 
6.4%
4635
 
6.0%
3762
 
4.8%
2586
 
3.3%
2504
 
3.2%
2485
 
3.2%
Other values (84) 25391
32.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67009 | 86.3%
Space Separator | 10656 | 13.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9957
14.9%
5378
 
8.0%
5307
 
7.9%
5004
 
7.5%
4635
 
6.9%
3762
 
5.6%
2586
 
3.9%
2504
 
3.7%
2485
 
3.7%
2422
 
3.6%
Other values (83) 22969
34.3%
Space Separator
ValueCountFrequency (%)
10656
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67009 | 86.3%
Common | 10656 | 13.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9957
14.9%
5378
 
8.0%
5307
 
7.9%
5004
 
7.5%
4635
 
6.9%
3762
 
5.6%
2586
 
3.9%
2504
 
3.7%
2485
 
3.7%
2422
 
3.6%
Other values (83) 22969
34.3%
Common
ValueCountFrequency (%)
10656
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67009 | 86.3%
ASCII | 10656 | 13.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10656
100.0%
Hangul
ValueCountFrequency (%)
9957
14.9%
5378
 
8.0%
5307
 
7.9%
5004
 
7.5%
4635
 
6.9%
3762
 
5.6%
2586
 
3.9%
2504
 
3.7%
2485
 
3.7%
2422
 
3.6%
Other values (83) 22969
34.3%

품목 (item)
Text

Distinct: 58
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 1
Mean length: 1.5303
Min length: 1

Characters and Unicode

Total characters: 15303
Distinct characters: 96
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row
Value | Count | Frequency (%)
 | 4139 | 41.4%
 | 2152 | 21.5%
사과 | 1042 | 10.4%
포도 | 546 | 5.5%
복숭아 | 466 | 4.7%
 | 349 | 3.5%
수박 | 177 | 1.8%
단감 | 124 | 1.2%
찰벼 | 99 | 1.0%
딸기 | 95 | 0.9%
Other values (48) | 811 | 8.1%

Most occurring characters

ValueCountFrequency (%)
4139
27.0%
2267
14.8%
1129
 
7.4%
1122
 
7.3%
548
 
3.6%
548
 
3.6%
466
 
3.0%
466
 
3.0%
466
 
3.0%
349
 
2.3%
Other values (86) 3803
24.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 14981 | 97.9%
Open Punctuation | 161 | 1.1%
Close Punctuation | 161 | 1.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4139
27.6%
2267
15.1%
1129
 
7.5%
1122
 
7.5%
548
 
3.7%
548
 
3.7%
466
 
3.1%
466
 
3.1%
466
 
3.1%
349
 
2.3%
Other values (84) 3481
23.2%
Open Punctuation
ValueCountFrequency (%)
( 161
100.0%
Close Punctuation
ValueCountFrequency (%)
) 161
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14981 | 97.9%
Common | 322 | 2.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4139
27.6%
2267
15.1%
1129
 
7.5%
1122
 
7.5%
548
 
3.7%
548
 
3.7%
466
 
3.1%
466
 
3.1%
466
 
3.1%
349
 
2.3%
Other values (84) 3481
23.2%
Common
ValueCountFrequency (%)
( 161
50.0%
) 161
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 14981 | 97.9%
ASCII | 322 | 2.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4139
27.6%
2267
15.1%
1129
 
7.5%
1122
 
7.5%
548
 
3.7%
548
 
3.7%
466
 
3.1%
466
 
3.1%
466
 
3.1%
349
 
2.3%
Other values (84) 3481
23.2%
ASCII
ValueCountFrequency (%)
( 161
50.0%
) 161
50.0%

재배면적(제곱미터) (cultivated area, square meters)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 4645
Distinct (%): 46.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2449.8015
Minimum: 1
Maximum: 86836
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 221
Q1: 1030.75
median: 1998
Q3: 3324
95-th percentile: 5795.63
Maximum: 86836
Range: 86835
Interquartile range (IQR): 2293.25
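
A fractional quartile such as 1030.75 suggests that quantiles are interpolated between neighbouring observations. A minimal sketch, assuming linear interpolation between the two nearest order statistics (the default in common libraries such as NumPy). The function name and the six input values are illustrative only:

```python
def quantile(values, q):
    """Quantile by linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Illustrative areas in square meters, not the full column.
areas = [992.0, 1000.0, 1322.0, 2000.0, 3967.0, 4000.0]

q1 = quantile(areas, 0.25)
median = quantile(areas, 0.5)
q3 = quantile(areas, 0.75)
iqr = q3 - q1
print(q1, median, q3, iqr)
```

With six values the quartile positions fall between observations, so the results are fractional even though every input is a whole number.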

Descriptive statistics

Standard deviation: 2361.7744
Coefficient of variation (CV): 0.96406767
Kurtosis: 205.73658
Mean: 2449.8015
Median Absolute Deviation (MAD): 1069
Skewness: 8.3575996
Sum: 24498015
Variance: 5577978.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4000.0 | 58 | 0.6%
2000.0 | 35 | 0.4%
3967.0 | 29 | 0.3%
3000.0 | 27 | 0.3%
1000.0 | 26 | 0.3%
992.0 | 16 | 0.2%
1600.0 | 16 | 0.2%
1322.0 | 15 | 0.1%
2975.0 | 14 | 0.1%
660.0 | 13 | 0.1%
Other values (4635) | 9751 | 97.5%
Value | Count | Frequency (%)
1.0 | 1 | < 0.1%
2.0 | 1 | < 0.1%
3.0 | 2 | < 0.1%
5.0 | 2 | < 0.1%
6.0 | 1 | < 0.1%
7.0 | 2 | < 0.1%
8.0 | 1 | < 0.1%
9.0 | 3 | < 0.1%
10.0 | 2 | < 0.1%
11.0 | 5 | 0.1%
Value | Count | Frequency (%)
86836.0 | 1 | < 0.1%
50000.0 | 1 | < 0.1%
43800.0 | 1 | < 0.1%
38059.0 | 1 | < 0.1%
32802.0 | 1 | < 0.1%
27986.0 | 1 | < 0.1%
26384.0 | 1 | < 0.1%
25000.0 | 1 | < 0.1%
23322.0 | 1 | < 0.1%
23100.0 | 1 | < 0.1%

생산계획량(톤) (planned production, tonnes)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 2598
Distinct (%): 26.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.9162399
Minimum: 0
Maximum: 7605
Zeros: 18
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.221924
Q1: 1.05
median: 2.14
Q3: 4
95-th percentile: 12.321
Maximum: 7605
Range: 7605
Interquartile range (IQR): 2.95

Descriptive statistics

Standard deviation: 101.25861
Coefficient of variation (CV): 17.115366
Kurtosis: 3856.174
Mean: 5.9162399
Median Absolute Deviation (MAD): 1.270115
Skewness: 58.839603
Sum: 59162.399
Variance: 10253.306
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1.0 | 97 | 1.0%
3.0 | 79 | 0.8%
2.0 | 77 | 0.8%
1.3 | 73 | 0.7%
4.0 | 73 | 0.7%
2.8 | 67 | 0.7%
1.5 | 62 | 0.6%
0.5 | 59 | 0.6%
1.6 | 57 | 0.6%
8.0 | 56 | 0.6%
Other values (2588) | 9300 | 93.0%
Value | Count | Frequency (%)
0.0 | 18 | 0.2%
0.00506 | 1 | < 0.1%
0.00795 | 1 | < 0.1%
0.0094 | 1 | < 0.1%
0.01 | 23 | 0.2%
0.01663 | 1 | < 0.1%
0.02 | 18 | 0.2%
0.02024 | 1 | < 0.1%
0.02386 | 1 | < 0.1%
0.02603 | 1 | < 0.1%
Value | Count | Frequency (%)
7605.0 | 1 | < 0.1%
4736.0 | 1 | < 0.1%
3697.0 | 1 | < 0.1%
2361.0 | 1 | < 0.1%
1167.0 | 1 | < 0.1%
939.0 | 1 | < 0.1%
591.92 | 1 | < 0.1%
494.0 | 1 | < 0.1%
187.03 | 1 | < 0.1%
135.98 | 2 | < 0.1%

Interactions


Correlations

 | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
인증번호 | 1.000 | 0.961 | 0.884 | 0.068 | 0.000
소재지 | 0.961 | 1.000 | 0.992 | 0.758 | 0.589
품목 | 0.884 | 0.992 | 1.000 | 0.349 | 0.726
재배면적(제곱미터) | 0.068 | 0.758 | 0.349 | 1.000 | 0.000
생산계획량(톤) | 0.000 | 0.589 | 0.726 | 0.000 | 1.000
 | 인증번호 | 재배면적(제곱미터) | 생산계획량(톤)
인증번호 | 1.000 | -0.008 | 0.131
재배면적(제곱미터) | -0.008 | 1.000 | 0.774
생산계획량(톤) | 0.131 | 0.774 | 1.000
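
The report does not state which coefficient each matrix uses. As a hedged illustration of two common choices, the sketch below computes a Pearson coefficient and a rank-based (Spearman) coefficient in plain Python. The function names are hypothetical, and the five (area, tonnes) pairs are taken from rows in the Sample section rather than from the full column:

```python
def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


area = [215.0, 1508.0, 1658.0, 2970.0, 3990.0]
tons = [0.15544, 1.09028, 1.3, 5.9, 8.0]
print(pearson(area, tons), spearman(area, tons))
```

In this small sample the two columns rise together, so the rank-based coefficient is essentially 1 while the Pearson value is lower because the relationship is not exactly linear.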

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
35394 | 1000471 | 경기도 평택시 |  | 1658.0 | 1.3
1078 | 1000058 | 경기도 화성시 |  | 215.0 | 0.15544
3491 | 1000058 | 경기도 화성시 |  | 1508.0 | 1.09028
91765 | 1002370 | 강원도 철원군 |  | 2970.0 | 5.9
96409 | 1002372 | 강원도 철원군 |  | 1277.0 | 2.6
72309 | 1002344 | 전라남도 고흥군 |  | 3826.0 | 2.7662
79136 | 1002351 | 강원도 철원군 |  | 1150.0 | 0.83145
67708 | 1002335 | 경상북도 상주시 | 포도 | 324.0 | 0.46
96347 | 1002372 | 강원도 철원군 |  | 3990.0 | 8.0
60088 | 1002207 | 경상북도 상주시 | 포도 | 2866.0 | 4.82
 | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
17412 | 1000277 | 경상북도 문경시 | 사과(후지) | 109.0 | 0.2
9480 | 1000166 | 경기도 용인시 처인구 |  | 1759.0 | 1.06
67463 | 1002335 | 경상북도 상주시 | 포도 | 1325.0 | 2.23
20358 | 1000330 | 경기도 평택시 |  | 7921.0 | 5.07
9917 | 1000166 | 경기도 용인시 처인구 |  | 1233.0 | 0.75
24897 | 1000334 | 경기도 평택시 |  | 2551.0 | 1.63
67158 | 1002335 | 경상북도 상주시 | 포도 | 269.0 | 0.45
98461 | 1002373 | 강원도 철원군 |  | 1704.0 | 3.4
74504 | 1002347 | 충청북도 진천군 |  | 1868.0 | 1.7
55904 | 1002025 | 인천광역시 옹진군 | 포도 | 1311.0 | 1.78

Duplicate rows

Most frequently occurring

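 | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤) | # duplicates
162 | 1002340 | 전라북도 정읍시 |  | 4000.0 | 2.89 | 15
50 | 1000208 | 전라북도 부안군 |  | 4000.0 | 2.8 | 12
32 | 1000125 | 전라북도 전주시 덕진구 |  | 3967.0 | 2.77 | 7
197 | 1002352 | 강원도 철원군 |  | 42.0 | 0.03 | 7
165 | 1002343 | 전라북도 정읍시 |  | 4000.0 | 2.89 | 6
167 | 1002344 | 전라남도 고흥군 |  | 2000.0 | 1.446 | 6
47 | 1000186 | 전라북도 김제시 |  | 3967.0 | 3.2 | 5
29 | 1000113 | 경기도 파주시 |  | 3031.0 | 2.19 | 4
31 | 1000121 | 전라남도 나주시 |  | 3000.0 | 2.17 | 4
69 | 1000360 | 경상북도 경주시 |  | 2000.0 | 1.45 | 4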
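
Rows that repeat exactly can be grouped and counted with a plain tally. A minimal sketch using `collections.Counter`; the helper name is hypothetical, the rows are illustrative tuples modelled on this report (with the 품목 column omitted), and the way the profiling tool counts duplicates may differ:

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: c for row, c in counts.items() if c > 1}


# Illustrative rows: (인증번호, 소재지, 재배면적(제곱미터), 생산계획량(톤)).
rows = [
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
    (1000208, "전라북도 부안군", 4000.0, 2.8),
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
]

groups = duplicate_groups(rows)
for row, count in groups.items():
    print(row, count)
```

Rows must be hashable for this approach, which is why each record is stored as a tuple rather than a list.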