Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 328
Duplicate rows (%): 3.3%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
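
The two derived figures in this table follow from the others. The sketch below recomputes them from the reported values; the variable names are illustrative, and it assumes that 1 KiB = 1024 bytes and that the report rounds to one decimal place.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 10_000
duplicate_rows = 328
total_size_kib = 498.0

# Share of duplicate rows, as a percentage rounded to one decimal place.
duplicate_pct = round(100 * duplicate_rows / n_rows, 1)

# Average record size: total bytes (assuming 1 KiB = 1024 bytes) per row.
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(duplicate_pct)     # 3.3
print(avg_record_bytes)  # 51.0
```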

Variable types

Numeric: 3
Text: 2

Dataset

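Description: Status of farms certified under Good Agricultural Practices (GAP), managed by the National Agricultural Products Quality Management Service (certification number, location, item, cultivation area, planned production volume)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20181019000000000974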

Alerts

Dataset has 328 (3.3%) duplicate rows [Duplicates]
재배면적(제곱미터) is highly overall correlated with 생산계획량(톤) [High correlation]
생산계획량(톤) is highly overall correlated with 재배면적(제곱미터) [High correlation]
생산계획량(톤) is highly skewed (γ1 = 33.87299154) [Skewed]
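
For readers unfamiliar with γ1, the sketch below computes a sample skewness and compares it with an alert threshold. It assumes the report uses the bias-corrected (adjusted Fisher-Pearson) estimator, as pandas does, and a threshold of 20; both are assumptions rather than facts stated in this report, and the data are made up.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed to match the report's estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

SKEW_THRESHOLD = 20  # assumed alert threshold, not taken from this report

symmetric = [1, 2, 3, 4, 5]           # hypothetical data, no skew
long_tail = [1.0] * 999 + [600.0]     # hypothetical data, one extreme value

for name, data in [("symmetric", symmetric), ("long_tail", long_tail)]:
    skew = sample_skewness(data)
    print(name, round(skew, 2), "skewed" if abs(skew) > SKEW_THRESHOLD else "ok")
```

A single very large value among many small ones is enough to push the statistic past the threshold, which matches the shape of 생산계획량(톤): a median of 2.07 against a maximum of 591.92.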

Reproduction

Analysis started: 2024-03-23 07:53:26.591335
Analysis finished: 2024-03-23 07:53:31.782322
Duration: 5.19 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
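
A report like this one can be regenerated roughly as sketched below. The CSV path, output path, and report title are hypothetical; only `ProfileReport` and `to_file` come from ydata-profiling. The second half checks the reported duration against the two timestamps.

```python
from datetime import datetime

def build_profile(csv_path, html_path="gap_report.html"):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so the sketch can be read without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="GAP certified farms")  # title is illustrative
    report.to_file(html_path)
    return report

# Duration check using the timestamps printed in the Reproduction section.
started = datetime.fromisoformat("2024-03-23 07:53:26.591335")
finished = datetime.fromisoformat("2024-03-23 07:53:31.782322")
duration_s = (finished - started).total_seconds()
print(round(duration_s, 2))  # 5.19
```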

Variables

인증번호 (certification number)
Real number (ℝ)

Distinct: 379
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1001367.6
Minimum: 1000003
Maximum: 1002374
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000003
5-th percentile: 1000113
Q1: 1000334
median: 1001573
Q3: 1002348
95-th percentile: 1002371
Maximum: 1002374
Range: 2371
Interquartile range (IQR): 2014

Descriptive statistics

Standard deviation: 941.6101
Coefficient of variation (CV): 0.00094032409
Kurtosis: -1.7581977
Mean: 1001367.6
Median Absolute Deviation (MAD): 797
Skewness: -0.17114195
Sum: 1.0013676 × 10^10
Variance: 886629.57
Monotonicity: Not monotonic
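
Several of these statistics are simple functions of the others. The sketch below rederives range, IQR, CV, and variance from the reported values for 인증번호; small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the quantile and descriptive statistics of 인증번호.
minimum, maximum = 1000003, 1002374
q1, q3 = 1000334, 1002348
mean, std = 1001367.6, 941.6101

value_range = maximum - minimum  # reported as 2371
iqr = q3 - q1                    # reported as 2014
cv = std / mean                  # reported as 0.00094032409
variance = std ** 2              # reported as 886629.57

print(value_range, iqr)
print(f"{cv:.8f}", f"{variance:.1f}")
```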
Histogram with fixed size bins (bins=50)
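
As a rough illustration of what "fixed size bins" means, the sketch below splits the value range into equal-width intervals and counts the values in each. The function name and the toy data are illustrative; this is not the plotting code used to produce the report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

toy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
print(fixed_width_histogram(toy, bins=5))  # [2, 2, 2, 2, 3]
```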
Value | Count | Frequency (%)
1002352 | 986 | 9.9%
1002370 | 429 | 4.3%
1000471 | 346 | 3.5%
1000058 | 283 | 2.8%
1002344 | 253 | 2.5%
1001688 | 249 | 2.5%
1002372 | 231 | 2.3%
1002335 | 207 | 2.1%
1000113 | 195 | 1.9%
1002232 | 193 | 1.9%
Other values (369) | 6628 | 66.3%
Value | Count | Frequency (%)
1000003 | 4 | < 0.1%
1000014 | 2 | < 0.1%
1000029 | 8 | 0.1%
1000031 | 2 | < 0.1%
1000032 | 12 | 0.1%
1000033 | 3 | < 0.1%
1000034 | 5 | 0.1%
1000035 | 6 | 0.1%
1000041 | 19 | 0.2%
1000058 | 283 | 2.8%
Value | Count | Frequency (%)
1002374 | 152 | 1.5%
1002373 | 116 | 1.2%
1002372 | 231 | 2.3%
1002371 | 97 | 1.0%
1002370 | 429 | 4.3%
1002366 | 22 | 0.2%
1002365 | 18 | 0.2%
1002363 | 2 | < 0.1%
1002359 | 6 | 0.1%
1002357 | 4 | < 0.1%

소재지 (location)
Text

Distinct: 94
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 7.7498
Min length: 7

Characters and Unicode

Total characters: 77498
Distinct characters: 95
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
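
As a small illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to count general categories in one value from this column. It is not the code the report uses; the function name is illustrative.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories, e.g. 'Lo' (Other Letter), 'Zs' (Space Separator)."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "전라남도 고흥군"  # a 소재지 value that appears in this report
counts = category_counts(sample)

print(len(sample))               # total characters in the string
print(counts["Lo"], counts["Zs"])
print(unicodedata.name(sample[0]))  # character names begin with the script, here HANGUL
```

Summing such counts over every row gives the kind of totals shown in the category tables: Hangul syllables fall under Other Letter and the separating space under Space Separator.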

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 전라남도 고흥군
2nd row: 전라남도 고흥군
3rd row: 강원도 철원군
4th row: 강원도 철원군
5th row: 강원도 철원군
Value | Count | Frequency (%)
강원도 | 2603 | 12.6%
철원군 | 2527 | 12.3%
경기도 | 2407 | 11.7%
경상북도 | 1855 | 9.0%
전라북도 | 1235 | 6.0%
평택시 | 1234 | 6.0%
전라남도 | 736 | 3.6%
충청북도 | 592 | 2.9%
김제시 | 512 | 2.5%
화성시 | 374 | 1.8%
Other values (103) | 6542 | 31.7%

Most occurring characters

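Value | Count | Frequency (%)
(space) | 10617 | 13.7%
(not shown) | 9953 | 12.8%
(not shown) | 5359 | 6.9%
(not shown) | 5295 | 6.8%
(not shown) | 4930 | 6.4%
(not shown) | 4724 | 6.1%
(not shown) | 3841 | 5.0%
(not shown) | 2622 | 3.4%
(not shown) | 2527 | 3.3%
(not shown) | 2427 | 3.1%
Other values (85) | 25203 | 32.5%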

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66881 | 86.3%
Space Separator | 10617 | 13.7%

Most frequent character per category

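Other Letter
Value | Count | Frequency (%)
(not shown) | 9953 | 14.9%
(not shown) | 5359 | 8.0%
(not shown) | 5295 | 7.9%
(not shown) | 4930 | 7.4%
(not shown) | 4724 | 7.1%
(not shown) | 3841 | 5.7%
(not shown) | 2622 | 3.9%
(not shown) | 2527 | 3.8%
(not shown) | 2427 | 3.6%
(not shown) | 2408 | 3.6%
Other values (84) | 22795 | 34.1%
Space Separator
Value | Count | Frequency (%)
(space) | 10617 | 100.0%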

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66881 | 86.3%
Common | 10617 | 13.7%

Most frequent character per script

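Hangul
Value | Count | Frequency (%)
(not shown) | 9953 | 14.9%
(not shown) | 5359 | 8.0%
(not shown) | 5295 | 7.9%
(not shown) | 4930 | 7.4%
(not shown) | 4724 | 7.1%
(not shown) | 3841 | 5.7%
(not shown) | 2622 | 3.9%
(not shown) | 2527 | 3.8%
(not shown) | 2427 | 3.6%
(not shown) | 2408 | 3.6%
Other values (84) | 22795 | 34.1%
Common
Value | Count | Frequency (%)
(space) | 10617 | 100.0%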

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66881 | 86.3%
ASCII | 10617 | 13.7%

Most frequent character per block

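ASCII
Value | Count | Frequency (%)
(space) | 10617 | 100.0%
Hangul
Value | Count | Frequency (%)
(not shown) | 9953 | 14.9%
(not shown) | 5359 | 8.0%
(not shown) | 5295 | 7.9%
(not shown) | 4930 | 7.4%
(not shown) | 4724 | 7.1%
(not shown) | 3841 | 5.7%
(not shown) | 2622 | 3.9%
(not shown) | 2527 | 3.8%
(not shown) | 2427 | 3.6%
(not shown) | 2408 | 3.6%
Other values (84) | 22795 | 34.1%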

품목 (item)
Text

Distinct: 61
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 1
Mean length: 1.6332
Min length: 1

Characters and Unicode

Total characters: 16332
Distinct characters: 102
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row
Value | Count | Frequency (%)
(not shown) | 4072 | 40.7%
(not shown) | 2121 | 21.2%
사과 | 1075 | 10.8%
복숭아 | 672 | 6.7%
포도 | 348 | 3.5%
(not shown) | 234 | 2.3%
수박 | 208 | 2.1%
딸기 | 112 | 1.1%
단감 | 105 | 1.1%
찰벼 | 95 | 0.9%
Other values (51) | 958 | 9.6%

Most occurring characters

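Value | Count | Frequency (%)
(not shown) | 4189 | 25.6%
(not shown) | 2234 | 13.7%
(not shown) | 1164 | 7.1%
(not shown) | 1152 | 7.1%
(not shown) | 672 | 4.1%
(not shown) | 672 | 4.1%
(not shown) | 672 | 4.1%
(not shown) | 350 | 2.1%
(not shown) | 350 | 2.1%
(not shown) | 331 | 2.0%
Other values (92) | 4546 | 27.8%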

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15800 | 96.7%
Close Punctuation | 266 | 1.6%
Open Punctuation | 266 | 1.6%

Most frequent character per category

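Other Letter
Value | Count | Frequency (%)
(not shown) | 4189 | 26.5%
(not shown) | 2234 | 14.1%
(not shown) | 1164 | 7.4%
(not shown) | 1152 | 7.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 350 | 2.2%
(not shown) | 350 | 2.2%
(not shown) | 331 | 2.1%
Other values (90) | 4014 | 25.4%
Close Punctuation
Value | Count | Frequency (%)
) | 266 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 266 | 100.0%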

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15800 | 96.7%
Common | 532 | 3.3%

Most frequent character per script

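Hangul
Value | Count | Frequency (%)
(not shown) | 4189 | 26.5%
(not shown) | 2234 | 14.1%
(not shown) | 1164 | 7.4%
(not shown) | 1152 | 7.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 350 | 2.2%
(not shown) | 350 | 2.2%
(not shown) | 331 | 2.1%
Other values (90) | 4014 | 25.4%
Common
Value | Count | Frequency (%)
) | 266 | 50.0%
( | 266 | 50.0%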

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15800 | 96.7%
ASCII | 532 | 3.3%

Most frequent character per block

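Hangul
Value | Count | Frequency (%)
(not shown) | 4189 | 26.5%
(not shown) | 2234 | 14.1%
(not shown) | 1164 | 7.4%
(not shown) | 1152 | 7.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 672 | 4.3%
(not shown) | 350 | 2.2%
(not shown) | 350 | 2.2%
(not shown) | 331 | 2.1%
Other values (90) | 4014 | 25.4%
ASCII
Value | Count | Frequency (%)
) | 266 | 50.0%
( | 266 | 50.0%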

재배면적(제곱미터) (cultivation area, square meters)
Real number (ℝ)

HIGH CORRELATION

Distinct: 4550
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2450.3831
Minimum: 3
Maximum: 42651
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 225
Q1: 1022
median: 2002
Q3: 3401.25
95-th percentile: 5802.95
Maximum: 42651
Range: 42648
Interquartile range (IQR): 2379.25

Descriptive statistics

Standard deviation: 2118.4791
Coefficient of variation (CV): 0.86455014
Kurtosis: 31.987938
Mean: 2450.3831
Median Absolute Deviation (MAD): 1102
Skewness: 3.4943091
Sum: 24503831
Variance: 4487953.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4000.0 | 60 | 0.6%
2000.0 | 34 | 0.3%
3967.0 | 33 | 0.3%
1000.0 | 27 | 0.3%
3000.0 | 20 | 0.2%
992.0 | 18 | 0.2%
400.0 | 17 | 0.2%
595.0 | 17 | 0.2%
3002.0 | 16 | 0.2%
800.0 | 16 | 0.2%
Other values (4540) | 9742 | 97.4%
Value | Count | Frequency (%)
3.0 | 2 | < 0.1%
4.0 | 1 | < 0.1%
5.0 | 2 | < 0.1%
6.0 | 1 | < 0.1%
7.0 | 2 | < 0.1%
8.0 | 1 | < 0.1%
9.0 | 1 | < 0.1%
10.0 | 3 | < 0.1%
11.0 | 3 | < 0.1%
12.0 | 3 | < 0.1%
Value | Count | Frequency (%)
42651.0 | 1 | < 0.1%
32545.0 | 1 | < 0.1%
28321.0 | 1 | < 0.1%
28133.0 | 1 | < 0.1%
25418.0 | 1 | < 0.1%
25346.0 | 1 | < 0.1%
20069.0 | 1 | < 0.1%
20000.0 | 2 | < 0.1%
19786.0 | 1 | < 0.1%
19543.0 | 1 | < 0.1%

생산계획량(톤) (planned production volume, tonnes)
Real number (ℝ)

HIGH CORRELATION  SKEWED

Distinct: 1729
Distinct (%): 17.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6249611
Minimum: 0
Maximum: 591.92
Zeros: 23
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.21
Q1: 1
median: 2.07
Q3: 3.7
95-th percentile: 11.7335
Maximum: 591.92
Range: 591.92
Interquartile range (IQR): 2.7

Descriptive statistics

Standard deviation: 9.4730635
Coefficient of variation (CV): 2.6132869
Kurtosis: 1756.2154
Mean: 3.6249611
Median Absolute Deviation (MAD): 1.17
Skewness: 33.872992
Sum: 36249.611
Variance: 89.738931
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1.0 | 144 | 1.4%
3.0 | 124 | 1.2%
2.8 | 115 | 1.1%
2.0 | 114 | 1.1%
4.0 | 91 | 0.9%
1.2 | 85 | 0.9%
3.1 | 83 | 0.8%
4.1 | 72 | 0.7%
0.5 | 71 | 0.7%
0.9 | 71 | 0.7%
Other values (1719) | 9030 | 90.3%
Value | Count | Frequency (%)
0.0 | 23 | 0.2%
0.01 | 27 | 0.3%
0.02 | 24 | 0.2%
0.03 | 27 | 0.3%
0.04 | 19 | 0.2%
0.04989 | 1 | < 0.1%
0.05 | 21 | 0.2%
0.06 | 21 | 0.2%
0.06435 | 1 | < 0.1%
0.07 | 23 | 0.2%
Value | Count | Frequency (%)
591.92 | 1 | < 0.1%
300.0 | 2 | < 0.1%
273.93 | 1 | < 0.1%
135.63 | 1 | < 0.1%
130.12 | 1 | < 0.1%
105.0 | 1 | < 0.1%
100.0 | 1 | < 0.1%
82.0 | 1 | < 0.1%
75.0 | 1 | < 0.1%
73.86 | 1 | < 0.1%

Interactions

[Figures: 9 interaction plots, not reproduced in this text version]

Correlations

Variable | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
인증번호 | 1.000 | 0.958 | 0.910 | 0.065 | 0.106
소재지 | 0.958 | 1.000 | 0.992 | 0.162 | 0.480
품목 | 0.910 | 0.992 | 1.000 | 0.000 | 0.847
재배면적(제곱미터) | 0.065 | 0.162 | 0.000 | 1.000 | 0.464
생산계획량(톤) | 0.106 | 0.480 | 0.847 | 0.464 | 1.000
Variable | 인증번호 | 재배면적(제곱미터) | 생산계획량(톤)
인증번호 | 1.000 | -0.012 | 0.021
재배면적(제곱미터) | -0.012 | 1.000 | 0.781
생산계획량(톤) | 0.021 | 0.781 | 1.000
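
To make the relationship between 재배면적(제곱미터) and 생산계획량(톤) concrete, the sketch below computes a Pearson coefficient for six (area, production) pairs taken from the Sample section of this report. The report does not name the method behind its matrices, so Pearson is an assumption chosen for illustration, and six rows will not reproduce the full-data figures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (재배면적 in square meters, 생산계획량 in tonnes) from six sample rows.
area = [3826.0, 1540.0, 143.0, 9541.0, 1468.0, 3716.0]
production = [2.77, 1.11, 0.1, 9.8, 1.5, 3.0]

r = pearson(area, production)
print(round(r, 3))
```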

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

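(index) | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
72754 | 1002344 | 전라남도 고흥군 | (not shown) | 3826.0 | 2.77
72437 | 1002344 | 전라남도 고흥군 | (not shown) | 1540.0 | 1.11
81182 | 1002352 | 강원도 철원군 | (not shown) | 143.0 | 0.1
96659 | 1002372 | 강원도 철원군 | (not shown) | 9541.0 | 9.8
93287 | 1002370 | 강원도 철원군 | (not shown) | 1468.0 | 1.5
11224 | 1000185 | 전라북도 김제시 | (not shown) | 3716.0 | 3.0
57678 | 1002139 | 강원도 횡성군 | 브로코리(녹색꽃양배추) | 5170.4814.0
47291 | 1001379 | 경기도 평택시 | 대추방울 | 1057.0 | 4.8
75131 | 1002348 | 강원도 철원군 | (not shown) | 8258.0 | 5.97
5912 | 1000113 | 경기도 파주시 | (not shown) | 3883.0 | 2.81

(index) | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤)
65180 | 1002295 | 충청북도 음성군 | 복숭아 | 1103.0 | 1.57
76055 | 1002349 | 강원도 철원군 | (not shown) | 397.0 | 0.29
12108 | 1000190 | 전라북도 김제시 | (not shown) | 4071.0 | 3.3
70981 | 1002344 | 전라남도 고흥군 | (not shown) | 4011.0 | 2.9
7935 | 1000125 | 전라북도 전주시 덕진구 | (not shown) | 3422.0 | 2.6
81686 | 1002352 | 강원도 철원군 | (not shown) | 4684.0 | 3.39
5501 | 1000113 | 경기도 파주시 | (not shown) | 1558.0 | 1.13
40362 | 1000874 | 충청남도 논산시 | 딸기 | 3030.0 | 9.26
78609 | 1002351 | 강원도 철원군 | (not shown) | 4975.3 | 3.6
53179 | 1001688 | 경기도 연천군 | (not shown) | 1250.0 | 0.9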

Duplicate rows

Most frequently occurring

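(index) | 인증번호 | 소재지 | 품목 | 재배면적(제곱미터) | 생산계획량(톤) | # duplicates
146 | 1002340 | 전라북도 정읍시 | (not shown) | 4000.0 | 2.8 | 26
42 | 1000208 | 전라북도 부안군 | (not shown) | 4000.0 | 2.8 | 9
39 | 1000186 | 전라북도 김제시 | (not shown) | 3967.0 | 3.2 | 7
30 | 1000125 | 전라북도 전주시 덕진구 | (not shown) | 3967.0 | 2.77 | 6
38 | 1000186 | 전라북도 김제시 | (not shown) | 3002.0 | 2.4 | 5
96 | 1000938 | 전라남도 담양군 | 딸기(기타) | 2280.0 | 6.97 | 5
147 | 1002342 | 전라북도 정읍시 | (not shown) | 3675.0 | 2.56 | 5
148 | 1002343 | 전라북도 정읍시 | (not shown) | 4000.0 | 2.8 | 5
154 | 1002344 | 전라남도 고흥군 | (not shown) | 2000.0 | 1.45 | 5
28 | 1000121 | 전라남도 나주시 | (not shown) | 3000.0 | 2.17 | 4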
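
A table of this kind can be produced by counting identical rows and listing the most frequent ones first. The sketch below does this with the standard library on a few made-up rows; the function name and the data are hypothetical, and the report's own counting rules may differ.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)

# Hypothetical rows: (인증번호, 소재지, 재배면적, 생산계획량)
rows = [
    (1, "A", 4000.0, 2.8),
    (1, "A", 4000.0, 2.8),
    (1, "A", 4000.0, 2.8),
    (2, "B", 3967.0, 3.2),
    (2, "B", 3967.0, 3.2),
    (3, "C", 1000.0, 0.7),
]

for row, n in duplicate_summary(rows):
    print(row, n)
```

Rows must be hashable for `Counter`, which is why each one is a tuple here.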