Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 319
Duplicate rows (%): 3.2%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
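
The two derived figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes and that the percentage is taken over all observations (the variable names are illustrative and not part of the report):

```python
# Raw figures copied from the dataset statistics above.
n_observations = 10_000
n_duplicate_rows = 319
total_size_kib = 498.0

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average record size in bytes, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```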

Variable types

Numeric: 3
Text: 2

Dataset

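Description: Status information on Good Agricultural Practices (GAP) certified farms managed by the National Agricultural Products Quality Management Service (certification number, location, item, cultivation area, planned production volume)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20181019000000000974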

Alerts

Dataset has 319 (3.2%) duplicate rows [Duplicates]
재배면적(제곱미터) (cultivation area, m²) is highly overall correlated with 생산계획량(톤) (planned production, tonnes) [High correlation]
생산계획량(톤) is highly overall correlated with 재배면적(제곱미터) [High correlation]
생산계획량(톤) is highly skewed (γ1 = 53.34502798) [Skewed]

Reproduction

Analysis started: 2024-03-23 07:53:55.932612
Analysis finished: 2024-03-23 07:53:59.951097
Duration: 4.02 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
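
The reported duration can be checked against the two timestamps. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-03-23 07:53:55.932612")
finished = datetime.fromisoformat("2024-03-23 07:53:59.951097")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```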

Variables

인증번호 (certification number)
Real number (ℝ)

Distinct: 354
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1001451.2
Minimum: 1000003
Maximum: 1002376
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000003
5-th percentile: 1000113
Q1: 1000364
Median: 1001705.5
Q3: 1002351
95-th percentile: 1002374
Maximum: 1002376
Range: 2373
Interquartile range (IQR): 1987

Descriptive statistics

Standard deviation: 939.35231
Coefficient of variation (CV): 0.00093799112
Kurtosis: -1.6918355
Mean: 1001451.2
Median Absolute Deviation (MAD): 664.5
Skewness: -0.31966203
Sum: 1.0014512 × 10^10
Variance: 882382.76
Monotonicity: Not monotonic
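
Several of the figures above are derived from others in the same listing. A short sketch that recomputes them from the reported values (the inputs are rounded as printed in the report, so the results agree only up to that rounding):

```python
# Figures copied from the 인증번호 statistics above.
minimum, maximum = 1_000_003, 1_002_376
q1, q3 = 1_000_364, 1_002_351
mean, std = 1_001_451.2, 939.35231

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation

print(value_range, iqr)
print(f"{cv:.8f}", f"{variance:.1f}")
```

The very small coefficient of variation reflects that the certification numbers all sit in a narrow band around one million, so summary statistics on this identifier-like column carry little analytical meaning.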
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
1002352               898     9.0%
1002370               464     4.6%
1002374               413     4.1%
1002344               350     3.5%
1000471               326     3.3%
1001688               256     2.6%
1002372               248     2.5%
1000058               222     2.2%
1002335               219     2.2%
1002232               210     2.1%
Other values (344)    6394    63.9%

Minimum 10 values
Value                 Count   Frequency (%)
1000003               3       < 0.1%
1000014               2       < 0.1%
1000029               7       0.1%
1000031               7       0.1%
1000032               10      0.1%
1000033               3       < 0.1%
1000034               6       0.1%
1000035               6       0.1%
1000041               21      0.2%
1000058               222     2.2%

Maximum 10 values
Value                 Count   Frequency (%)
1002376               195     1.9%
1002374               413     4.1%
1002373               91      0.9%
1002372               248     2.5%
1002371               103     1.0%
1002370               464     4.6%
1002366               23      0.2%
1002363               5       0.1%
1002355               32      0.3%
1002353               1       < 0.1%

소재지 (location)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 7
Mean length: 7.6966
Min length: 7

Characters and Unicode

Total characters: 76966
Distinct characters: 92
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
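
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below classifies the characters of one value from this column; script and block lookups are not part of the standard library and are left out.

```python
import unicodedata
from collections import Counter

# One of the location values that appears in this report.
sample = "경기도 평택시"

# General category of every character:
# "Lo" = Other Letter, "Zs" = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in sample)
print(categories)

# Parentheses fall into "Ps" (Open Punctuation) and "Pe" (Close Punctuation).
paren_categories = (unicodedata.category("("), unicodedata.category(")"))
print(paren_categories)
```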

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 전라북도 김제시
2nd row: 경기도 평택시
3rd row: 경기도 평택시
4th row: 경기도 화성시
5th row: 경상북도 칠곡군

Value                 Count   Frequency (%)
강원도                3060    14.9%
철원군                2994    14.6%
경기도                2271    11.0%
경상북도              1713    8.3%
평택시                1216    5.9%
전라북도              1037    5.0%
전라남도              767     3.7%
충청북도              595     2.9%
김제시                388     1.9%
상주시                366     1.8%
Other values (95)     6160    30.0%

Most occurring characters

Value                 Count   Frequency (%)
(space)               10567   13.7%
                      9955    12.9%
                      6285    8.2%
                      5055    6.6%
                      4955    6.4%
                      4633    6.0%
                      3509    4.6%
                      3082    4.0%
                      2994    3.9%
                      2271    3.0%
Other values (82)     23660   30.7%

Most occurring categories

Value                 Count   Frequency (%)
Other Letter          66399   86.3%
Space Separator       10567   13.7%

Most frequent character per category

Other Letter
Value                 Count   Frequency (%)
                      9955    15.0%
                      6285    9.5%
                      5055    7.6%
                      4955    7.5%
                      4633    7.0%
                      3509    5.3%
                      3082    4.6%
                      2994    4.5%
                      2271    3.4%
                      2268    3.4%
Other values (81)     21392   32.2%

Space Separator
Value                 Count   Frequency (%)
(space)               10567   100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Hangul                66399   86.3%
Common                10567   13.7%

Most frequent character per script

Hangul
Value                 Count   Frequency (%)
                      9955    15.0%
                      6285    9.5%
                      5055    7.6%
                      4955    7.5%
                      4633    7.0%
                      3509    5.3%
                      3082    4.6%
                      2994    4.5%
                      2271    3.4%
                      2268    3.4%
Other values (81)     21392   32.2%

Common
Value                 Count   Frequency (%)
(space)               10567   100.0%

Most occurring blocks

Value                 Count   Frequency (%)
Hangul                66399   86.3%
ASCII                 10567   13.7%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
(space)               10567   100.0%

Hangul
Value                 Count   Frequency (%)
                      9955    15.0%
                      6285    9.5%
                      5055    7.6%
                      4955    7.5%
                      4633    7.0%
                      3509    5.3%
                      3082    4.6%
                      2994    4.5%
                      2271    3.4%
                      2268    3.4%
Other values (81)     21392   32.2%

품목 (item)
Text

Distinct: 53
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 1
Mean length: 1.4629
Min length: 1

Characters and Unicode

Total characters: 14629
Distinct characters: 90
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row:
2nd row:
3rd row:
4th row:
5th row: 포도

Value                 Count   Frequency (%)
                      4911    49.1%
                      1686    16.9%
사과 (apple)          926     9.3%
포도 (grape)          487     4.9%
복숭아 (peach)        470     4.7%
                      352     3.5%
수박 (watermelon)     196     2.0%
딸기 (strawberry)     106     1.1%
메론 (melon)          98      1.0%
사과(후지             77      0.8%
Other values (43)     691     6.9%

Most occurring characters

Value                 Count   Frequency (%)
                      4911    33.6%
                      1712    11.7%
                      1016    6.9%
                      1003    6.9%
                      487     3.3%
                      487     3.3%
                      470     3.2%
                      470     3.2%
                      470     3.2%
                      352     2.4%
Other values (80)     3251    22.2%

Most occurring categories

Value                 Count   Frequency (%)
Other Letter          14429   98.6%
Open Punctuation      100     0.7%
Close Punctuation     100     0.7%

Most frequent character per category

Other Letter
Value                 Count   Frequency (%)
                      4911    34.0%
                      1712    11.9%
                      1016    7.0%
                      1003    7.0%
                      487     3.4%
                      487     3.4%
                      470     3.3%
                      470     3.3%
                      470     3.3%
                      352     2.4%
Other values (78)     3051    21.1%

Open Punctuation
Value                 Count   Frequency (%)
(                     100     100.0%

Close Punctuation
Value                 Count   Frequency (%)
)                     100     100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Hangul                14429   98.6%
Common                200     1.4%

Most frequent character per script

Hangul
Value                 Count   Frequency (%)
                      4911    34.0%
                      1712    11.9%
                      1016    7.0%
                      1003    7.0%
                      487     3.4%
                      487     3.4%
                      470     3.3%
                      470     3.3%
                      470     3.3%
                      352     2.4%
Other values (78)     3051    21.1%

Common
Value                 Count   Frequency (%)
(                     100     50.0%
)                     100     50.0%

Most occurring blocks

Value                 Count   Frequency (%)
Hangul                14429   98.6%
ASCII                 200     1.4%

Most frequent character per block

Hangul
Value                 Count   Frequency (%)
                      4911    34.0%
                      1712    11.9%
                      1016    7.0%
                      1003    7.0%
                      487     3.4%
                      487     3.4%
                      470     3.3%
                      470     3.3%
                      470     3.3%
                      352     2.4%
Other values (78)     3051    21.1%

ASCII
Value                 Count   Frequency (%)
(                     100     50.0%
)                     100     50.0%

재배면적(제곱미터) (cultivation area, m²)
Real number (ℝ)

HIGH CORRELATION

Distinct: 4804
Distinct (%): 48.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2479.8595
Minimum: 2
Maximum: 49625
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 218
Q1: 1002
Median: 1990
Q3: 3402
95-th percentile: 5929.05
Maximum: 49625
Range: 49623
Interquartile range (IQR): 2400

Descriptive statistics

Standard deviation: 2279.9193
Coefficient of variation (CV): 0.91937439
Kurtosis: 44.282686
Mean: 2479.8595
Median Absolute Deviation (MAD): 1100
Skewness: 4.1634201
Sum: 24798595
Variance: 5198032.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
4000.0                62      0.6%
3000.0                37      0.4%
2000.0                36      0.4%
1000.0                28      0.3%
1653.0                18      0.2%
1200.0                17      0.2%
3967.0                17      0.2%
500.0                 16      0.2%
800.0                 15      0.1%
1256.0                14      0.1%
Other values (4794)   9740    97.4%

Minimum 10 values
Value                 Count   Frequency (%)
2.0                   2       < 0.1%
4.0                   2       < 0.1%
5.0                   2       < 0.1%
6.0                   2       < 0.1%
7.0                   1       < 0.1%
8.0                   2       < 0.1%
9.0                   2       < 0.1%
10.0                  3       < 0.1%
11.0                  1       < 0.1%
12.0                  1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
49625.0               1       < 0.1%
39600.0               1       < 0.1%
38059.0               1       < 0.1%
30000.0               1       < 0.1%
26446.0               1       < 0.1%
24300.0               1       < 0.1%
22059.0               1       < 0.1%
21976.0               1       < 0.1%
21717.0               1       < 0.1%
20723.1               1       < 0.1%

생산계획량(톤) (planned production, tonnes)
Real number (ℝ)

HIGH CORRELATION, SKEWED

Distinct: 2802
Distinct (%): 28.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.3786989
Minimum: 0
Maximum: 9614
Zeros: 14
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.219881
Q1: 1.027105
Median: 2.173315
Q3: 4.2
95-th percentile: 12.5
Maximum: 9614
Range: 9614
Interquartile range (IQR): 3.172895

Descriptive statistics

Standard deviation: 127.84293
Coefficient of variation (CV): 17.325945
Kurtosis: 3487.8114
Mean: 7.3786989
Median Absolute Deviation (MAD): 1.333315
Skewness: 53.345028
Sum: 73786.989
Variance: 16343.815
Monotonicity: Not monotonic
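
The skewness of 53.345028 reported above points to a very long right tail: the maximum of 9614 lies far above the 95-th percentile of 12.5. The sketch below shows how a few extreme values produce a large positive skewness. It uses the simple moment-based estimator on hypothetical toy values; the exact estimator behind the report's figure is not stated here, so the function is an illustration and not the report's method.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data, not taken from the dataset.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_right_tail = [1.0, 1.0, 2.0, 2.0, 3.0, 100.0]  # one large outlier

print(skewness(symmetric))        # symmetric data has zero skewness
print(skewness(long_right_tail))  # a single outlier gives a positive value
```

For a column this skewed, the median (2.173315) and IQR describe a typical record better than the mean and standard deviation do.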
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
1.0                   92      0.9%
2.0                   87      0.9%
3.0                   77      0.8%
4.0                   74      0.7%
6.0                   60      0.6%
0.8                   59      0.6%
8.0                   56      0.6%
1.5                   54      0.5%
0.5                   53      0.5%
0.3                   53      0.5%
Other values (2792)   9335    93.3%

Minimum 10 values
Value                 Count   Frequency (%)
0.0                   14      0.1%
0.00289               1       < 0.1%
0.00434               1       < 0.1%
0.00651               1       < 0.1%
0.00868               1       < 0.1%
0.01                  23      0.2%
0.01084               1       < 0.1%
0.01301               1       < 0.1%
0.01446               1       < 0.1%
0.016                 1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
9614.0                1       < 0.1%
4394.0                1       < 0.1%
3573.0                1       < 0.1%
3441.0                1       < 0.1%
2576.0                1       < 0.1%
2435.0                1       < 0.1%
2324.0                1       < 0.1%
2000.0                1       < 0.1%
1442.0                1       < 0.1%
1288.0                1       < 0.1%

Interactions

[Figures: nine pairwise scatter plots of the numeric variables 인증번호, 재배면적(제곱미터), and 생산계획량(톤)]

Correlations

                      인증번호   소재지   품목    재배면적(제곱미터)   생산계획량(톤)
인증번호              1.000      0.961    0.873   0.051                0.000
소재지                0.961      1.000    0.990   0.314                0.565
품목                  0.873      0.990    1.000   0.171                0.733
재배면적(제곱미터)    0.051      0.314    0.171   1.000                0.000
생산계획량(톤)        0.000      0.565    0.733   0.000                1.000

                      인증번호   재배면적(제곱미터)   생산계획량(톤)
인증번호              1.000      0.026                0.138
재배면적(제곱미터)    0.026      1.000                0.778
생산계획량(톤)        0.138      0.778                1.000
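
The high-correlation alert for 재배면적(제곱미터) and 생산계획량(톤) matches the 0.778 entry in the second matrix above (the 3 × 3 matrix over the numeric variables). A minimal sketch of how such pairs could be picked out of a correlation matrix; the 0.5 cut-off is an assumption for illustration and is not stated in this report.

```python
# Second correlation matrix from this report (numeric variables only).
columns = ["인증번호", "재배면적(제곱미터)", "생산계획량(톤)"]
matrix = [
    [1.000, 0.026, 0.138],
    [0.026, 1.000, 0.778],
    [0.138, 0.778, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report

# Collect each unordered pair once (upper triangle, diagonal excluded).
high_pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
    if abs(matrix[i][j]) >= THRESHOLD
]
print(high_pairs)
```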

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index   인증번호   소재지            품목     재배면적(제곱미터)   생산계획량(톤)
11218   1000193    전라북도 김제시            2817.0               2.25
19816   1000332    경기도 평택시              2542.0               1.69
32259   1000471    경기도 평택시              5527.0               3.53
1675    1000058    경기도 화성시              2334.0               1.69
51421   1002039    경상북도 칠곡군   포도     5930.0               16.0
4419    1000113    경기도 파주시              2118.0               1.5313
25217   1000366    경상북도 경주시            4290.0               7.91
58656   1002244    강원도 원주시     복숭아   1546.0               2.2
72429   1002350    강원도 철원군              2476.9               1.7908
99771   1002376    강원도 철원군              935.0                0.676

Last rows
Index   인증번호   소재지            품목     재배면적(제곱미터)   생산계획량(톤)
55821   1002219    전라북도 남원시   복숭아   401.0                0.57
99445   1002376    강원도 철원군              7230.0               5.22729
73925   1002351    강원도 철원군              1593.0               1.15174
79578   1002352    강원도 철원군              4304.0               3.11
90794   1002372    강원도 철원군              3741.0               7.5
24033   1000362    경상북도 경주시            2832.0               2.05
10815   1000191    전라북도 김제시            3796.0               2.94
96264   1002374    강원도 철원군              5918.0               11.8
71382   1002348    강원도 철원군              281.0                0.20316
826     1000058    경기도 화성시              625.0                0.45

Duplicate rows

Most frequently occurring

Index   인증번호   소재지                  품목     재배면적(제곱미터)   생산계획량(톤)   # duplicates
142     1002340    전라북도 정읍시                  4000.0               2.89             12
144     1002343    전라북도 정읍시                  4000.0               2.89             8
33      1000208    전라북도 부안군                  4000.0               2.8              7
93      1001096    부산광역시 강서구       토마토   2975.0               19.3375          6
23      1000125    전라북도 전주시 덕진구            3967.0               2.98             5
71      1000377    경기도 평택시                    4000.0               2.66             5
108     1001570    전라북도 전주시 덕진구            4000.0               2.25             5
141     1002340    전라북도 정읍시                  2000.0               1.45             5
147     1002344    전라남도 고흥군                  2000.0               1.446            5
28      1000186    전라북도 김제시                  3967.0               2.73             4
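
A listing like the one above can be produced by counting how often each full row occurs and keeping the rows seen more than once. A minimal sketch using only the standard library; the rows and their repeat counts below are hypothetical, and the 품목 column is left out for brevity.

```python
from collections import Counter

# Hypothetical rows: (인증번호, 소재지, 재배면적(제곱미터), 생산계획량(톤)).
rows = [
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
    (1002340, "전라북도 정읍시", 4000.0, 2.89),
    (1000377, "경기도 평택시", 4000.0, 2.66),
    (1000377, "경기도 평택시", 4000.0, 2.66),
    (1000058, "경기도 화성시", 625.0, 0.45),
]

# Count identical rows, then keep only those that occur more than once.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

# List the most frequently repeated rows first.
for row, n in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(row, n)
```

Before modelling, it may be worth checking whether such repeats are genuine separate plots with identical figures or the same record entered more than once.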