Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B

Variable types

Categorical: 4
Numeric: 2
Text: 1

Dataset

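Description: Planned summer and autumn grain inspection volumes managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields: year (년도), item name (품목명), administrative district (행정구역), work category (업무구분), quantity unit (수량단위), quantity (수량).
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20170915000000000843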

Alerts

실적구분명 has constant value "계획" (Constant)
품목명 is highly overall correlated with 업무구분명 (High correlation)
업무구분명 is highly overall correlated with 품목명 and 1 other field (High correlation)
수량단위 is highly overall correlated with 업무구분명 (High correlation)
수량단위 is highly imbalanced (67.2%) (Imbalance)

Reproduction

Analysis started: 2024-03-23 07:30:55.210701
Analysis finished: 2024-03-23 07:30:57.985625
Duration: 2.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

실적구분명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
계획 | 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 계획
2nd row: 계획
3rd row: 계획
4th row: 계획
5th row: 계획

Common Values

Value | Count | Frequency (%)
계획 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
계획 | 10000 | 100.0%

년도
Real number (ℝ)

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.4227
Minimum: 1998
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 1999
Q1: 2003
Median: 2008
Q3: 2014
95-th percentile: 2019
Maximum: 2021
Range: 23
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 6.5237014
Coefficient of variation (CV): 0.0032481715
Kurtosis: -1.1629532
Mean: 2008.4227
Median Absolute Deviation (MAD): 6
Skewness: 0.22667756
Sum: 20084227
Variance: 42.558681
Monotonicity: Not monotonic
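
Several of the 년도 figures above are derived from others, so they can be cross-checked with plain arithmetic. The sketch below is not part of the report. It copies the reported values into a dictionary (the key names are illustrative) and recomputes the range, IQR, coefficient of variation, variance and sum.

```python
import math

# Values copied from the 년도 statistics in this report.
stats = {
    "min": 1998, "max": 2021,
    "q1": 2003, "q3": 2014,
    "mean": 2008.4227, "std": 6.5237014,
    "n": 10000,
}

value_range = stats["max"] - stats["min"]   # Range
iqr = stats["q3"] - stats["q1"]             # Interquartile range
cv = stats["std"] / stats["mean"]           # Coefficient of variation
variance = stats["std"] ** 2                # Variance
total = stats["mean"] * stats["n"]          # Sum

print(value_range, iqr)
print(cv, variance, round(total))
```

The recomputed values agree with the reported Range (23), IQR (11), CV, Variance and Sum up to rounding of the printed inputs.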
Histogram with fixed size bins (bins=24)

Common Values

Value | Count | Frequency (%)
2001 | 598 | 6.0%
2003 | 572 | 5.7%
2002 | 564 | 5.6%
2004 | 558 | 5.6%
2016 | 521 | 5.2%
2006 | 514 | 5.1%
2008 | 508 | 5.1%
2009 | 498 | 5.0%
2019 | 476 | 4.8%
2007 | 472 | 4.7%
Other values (14) | 4719 | 47.2%

Minimum 10 values

Value | Count | Frequency (%)
1998 | 260 | 2.6%
1999 | 452 | 4.5%
2000 | 452 | 4.5%
2001 | 598 | 6.0%
2002 | 564 | 5.6%
2003 | 572 | 5.7%
2004 | 558 | 5.6%
2005 | 459 | 4.6%
2006 | 514 | 5.1%
2007 | 472 | 4.7%

Maximum 10 values

Value | Count | Frequency (%)
2021 | 17 | 0.2%
2020 | 384 | 3.8%
2019 | 476 | 4.8%
2018 | 316 | 3.2%
2017 | 361 | 3.6%
2016 | 521 | 5.2%
2015 | 368 | 3.7%
2014 | 309 | 3.1%
2013 | 302 | 3.0%
2012 | 264 | 2.6%

품목명
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
벼종자 | 6081
<NA> | 1718
겉보리종자 | 821
(blank) | 511
맥주보리종자 | 374
Other values (5) | 495

Length

Max length: 6
Median length: 3
Mean length: 3.398
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 겉보리종자
3rd row: 벼종자
4th row: 벼종자
5th row: 벼종자

Common Values

Value | Count | Frequency (%)
벼종자 | 6081 | 60.8%
<NA> | 1718 | 17.2%
겉보리종자 | 821 | 8.2%
(blank) | 511 | 5.1%
맥주보리종자 | 374 | 3.7%
옥수수종자 | 253 | 2.5%
밀종자 | 215 | 2.1%
팥종자 | 13 | 0.1%
콩나물콩 | 11 | 0.1%
녹두종자 | 3 | < 0.1%
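
The report lists 0 missing cells for 품목명, yet the literal string "<NA>" (1718 rows) and a blank category (511 rows) appear as ordinary values. The sketch below estimates how many rows are effectively missing once such placeholders are counted. It is a sketch under assumptions: the blank category is taken to be a whitespace-only string (consistent with the reported minimum length of 1), and the placeholder set and helper name are illustrative, not taken from the report.

```python
# Counts copied from the 품목명 "Common Values" table in this report.
# Assumption: the blank category is a single whitespace character.
counts = {
    "벼종자": 6081,
    "<NA>": 1718,
    "겉보리종자": 821,
    " ": 511,
    "맥주보리종자": 374,
    "옥수수종자": 253,
    "밀종자": 215,
    "팥종자": 13,
    "콩나물콩": 11,
    "녹두종자": 3,
}

# Hypothetical set of strings to treat as "no value".
PLACEHOLDERS = {"", "<NA>"}


def is_placeholder(value: str) -> bool:
    """Return True when a value is empty, whitespace-only, or a placeholder."""
    return value.strip() in PLACEHOLDERS


total = sum(counts.values())
effective_missing = sum(c for v, c in counts.items() if is_placeholder(v))
missing_share = effective_missing / total

print(effective_missing, f"{missing_share:.2%}")
```

Under these assumptions, 2229 of the 10000 rows carry no usable item name, which is worth keeping in mind when reading the 품목명 correlations.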

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
벼종자 | 6081 | 60.8%
na | 1718 | 17.2%
겉보리종자 | 821 | 8.2%
(blank) | 511 | 5.1%
맥주보리종자 | 374 | 3.7%
옥수수종자 | 253 | 2.5%
밀종자 | 215 | 2.1%
팥종자 | 13 | 0.1%
콩나물콩 | 11 | 0.1%
녹두종자 | 3 | < 0.1%

행정구역명
Text

Distinct: 251
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 8
Mean length: 7.934
Min length: 6

Characters and Unicode

Total characters: 79340
Distinct characters: 134
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 0.3%

Sample

1st row: 충청북도 보은군
2nd row: 경상남도 밀양시
3rd row: 경상남도 사천시
4th row: 경기도 안성시
5th row: 경상남도 남해군

Value | Count | Frequency (%)
전라남도 | 1821 | 9.1%
경상남도 | 1459 | 7.3%
경상북도 | 1390 | 6.9%
전라북도 | 1064 | 5.3%
충청남도 | 910 | 4.5%
강원도 | 899 | 4.5%
경기도 | 822 | 4.1%
충청북도 | 629 | 3.1%
광주광역시 | 200 | 1.0%
고성군 | 165 | 0.8%
Other values (249) | 10685 | 53.3%

Most occurring characters

Value | Count | Frequency (%)
 | 10045 | 12.7%
 | 9262 | 11.7%
 | 5645 | 7.1%
 | 4688 | 5.9%
 | 4611 | 5.8%
 | 3870 | 4.9%
 | 3243 | 4.1%
 | 3109 | 3.9%
 | 2926 | 3.7%
 | 2885 | 3.6%
Other values (124) | 29056 | 36.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 69295 | 87.3%
Space Separator | 10045 | 12.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 9262 | 13.4%
 | 5645 | 8.1%
 | 4688 | 6.8%
 | 4611 | 6.7%
 | 3870 | 5.6%
 | 3243 | 4.7%
 | 3109 | 4.5%
 | 2926 | 4.2%
 | 2885 | 4.2%
 | 1873 | 2.7%
Other values (123) | 27183 | 39.2%

Space Separator
Value | Count | Frequency (%)
 | 10045 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 69295 | 87.3%
Common | 10045 | 12.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 9262 | 13.4%
 | 5645 | 8.1%
 | 4688 | 6.8%
 | 4611 | 6.7%
 | 3870 | 5.6%
 | 3243 | 4.7%
 | 3109 | 4.5%
 | 2926 | 4.2%
 | 2885 | 4.2%
 | 1873 | 2.7%
Other values (123) | 27183 | 39.2%

Common
Value | Count | Frequency (%)
 | 10045 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 69282 | 87.3%
ASCII | 10045 | 12.7%
Compat Jamo | 13 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 10045 | 100.0%

Hangul
Value | Count | Frequency (%)
 | 9262 | 13.4%
 | 5645 | 8.1%
 | 4688 | 6.8%
 | 4611 | 6.7%
 | 3870 | 5.6%
 | 3243 | 4.7%
 | 3109 | 4.5%
 | 2926 | 4.2%
 | 2885 | 4.2%
 | 1873 | 2.7%
Other values (122) | 27170 | 39.2%

Compat Jamo
Value | Count | Frequency (%)
 | 13 | 100.0%

업무구분명
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
공공비축벼 검사(산물) | 2096
하곡검사 포대벼검사(40kg) | 2073
공공비축벼 포대벼검사(40kg) | 1802
공공비축벼 포대벼검사(40kg),(800kg) | 1644
잡곡검사(콩, 옥수수)(40kg) (2019.01 이전 ) | 1374
Other values (12) | 1011

Length

Max length: 33
Median length: 31
Mean length: 18.8261
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 잡곡검사(콩, 옥수수)(40kg) (2019.01 이전 )
2nd row: 하곡검사 포대벼검사(40kg)
3rd row: 공공비축벼 검사(산물)
4th row: 공공비축벼 검사(산물)
5th row: 농협시가매입(시장격리곡)

Common Values

Value | Count | Frequency (%)
공공비축벼 검사(산물) | 2096 | 21.0%
하곡검사 포대벼검사(40kg) | 2073 | 20.7%
공공비축벼 포대벼검사(40kg) | 1802 | 18.0%
공공비축벼 포대벼검사(40kg),(800kg) | 1644 | 16.4%
잡곡검사(콩, 옥수수)(40kg) (2019.01 이전 ) | 1374 | 13.7%
농협시가매입(시장격리곡) | 234 | 2.3%
공공비축벼 피해벼 | 196 | 2.0%
공공비축벼 포대벼검사(800kg) | 128 | 1.3%
콩검사(일반콩)(20kg),(40kg),(800kg) | 119 | 1.2%
하곡검사 (산물) | 119 | 1.2%
Other values (7) | 215 | 2.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
공공비축벼 | 5866 | 24.9%
포대벼검사(40kg | 3875 | 16.4%
하곡검사 | 2192 | 9.3%
검사(산물 | 2096 | 8.9%
포대벼검사(40kg),(800kg | 1644 | 7.0%
잡곡검사(콩 | 1385 | 5.9%
이전 | 1385 | 5.9%
 | 1385 | 5.9%
옥수수)(40kg | 1374 | 5.8%
2019.01 | 1374 | 5.8%
Other values (15) | 1028 | 4.4%

수량단위
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
40kg/대 | 7622
Kg | 1832
800kg/톤백 | 230
30Kg/대 | 152
20Kg/대 | 83
Other values (5) | 81

Length

Max length: 8
Median length: 6
Mean length: 5.2988
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 40kg/대
2nd row: 40kg/대
3rd row: Kg
4th row: Kg
5th row: Ton

Common Values

Value | Count | Frequency (%)
40kg/대 | 7622 | 76.2%
Kg | 1832 | 18.3%
800kg/톤백 | 230 | 2.3%
30Kg/대 | 152 | 1.5%
20Kg/대 | 83 | 0.8%
Ton | 52 | 0.5%
20kg/대 | 14 | 0.1%
600kg/대 | 13 | 0.1%
10Kg/대 | 1 | < 0.1%
4kg/대 | 1 | < 0.1%
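
The 67.2% imbalance figure for 수량단위 can be reproduced from the counts above. The sketch assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum, log2 of the number of classes. The report does not state its formula, so this definition is an assumption that happens to match the reported figure, and the function name is illustrative.

```python
import math

# Counts copied from the 수량단위 "Common Values" table in this report.
counts = [7622, 1832, 230, 152, 83, 52, 14, 13, 1, 1]


def imbalance_score(class_counts):
    """1 - H / log2(k): 0 for a uniform distribution, 1 for a single class.

    Assumed definition; not confirmed by the report text.
    """
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

Most of the imbalance comes from 40kg/대 alone, which covers 76.2% of the rows.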

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
40kg/대 | 7622 | 76.2%
kg | 1832 | 18.3%
800kg/톤백 | 230 | 2.3%
30kg/대 | 152 | 1.5%
20kg/대 | 97 | 1.0%
ton | 52 | 0.5%
600kg/대 | 13 | 0.1%
10kg/대 | 1 | < 0.1%
4kg/대 | 1 | < 0.1%
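
The plot table lists 9 unit labels where the Common Values table lists 10, because the labels are lowercased before plotting: "20Kg/대" (83 rows) and "20kg/대" (14 rows) collapse into a single "20kg/대" entry with 97 rows. A minimal sketch of that normalization follows. The helper name is illustrative, and whether the two spellings really mean the same unit is an assumption that should be confirmed against the source data.

```python
from collections import Counter

# Counts copied from the 수량단위 "Common Values" table in this report.
raw_counts = {
    "40kg/대": 7622,
    "Kg": 1832,
    "800kg/톤백": 230,
    "30Kg/대": 152,
    "20Kg/대": 83,
    "Ton": 52,
    "20kg/대": 14,
    "600kg/대": 13,
    "10Kg/대": 1,
    "4kg/대": 1,
}


def normalize_unit(label: str) -> str:
    """Trim whitespace and lowercase so case variants share one key."""
    return label.strip().lower()


merged = Counter()
for label, count in raw_counts.items():
    merged[normalize_unit(label)] += count

for label, count in merged.most_common():
    print(label, count)
```

After normalization the column has 9 distinct units instead of 10.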

수량
Real number (ℝ)

Distinct: 7825
Distinct (%): 78.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 421503.07
Minimum: 1
Maximum: 22567000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 88
Q1: 1592
Median: 18662.5
Q3: 136568.5
95-th percentile: 2550044
Maximum: 22567000
Range: 22566999
Interquartile range (IQR): 134976.5

Descriptive statistics

Standard deviation: 1355724.2
Coefficient of variation (CV): 3.216404
Kurtosis: 47.650594
Mean: 421503.07
Median Absolute Deviation (MAD): 18483.5
Skewness: 5.8805602
Sum: 4.2150307 × 10^9
Variance: 1.837988 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
15000 | 28 | 0.3%
5000 | 27 | 0.3%
500 | 25 | 0.2%
200 | 23 | 0.2%
10000 | 23 | 0.2%
100 | 21 | 0.2%
3000 | 20 | 0.2%
50 | 19 | 0.2%
4000 | 18 | 0.2%
2500 | 17 | 0.2%
Other values (7815) | 9779 | 97.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 4 | < 0.1%
3 | 10 | 0.1%
4 | 7 | 0.1%
5 | 5 | 0.1%
6 | 7 | 0.1%
7 | 7 | 0.1%
8 | 5 | 0.1%
9 | 5 | 0.1%
10 | 13 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
22567000 | 1 | < 0.1%
20633800 | 1 | < 0.1%
18893280 | 1 | < 0.1%
18170720 | 1 | < 0.1%
17142560 | 1 | < 0.1%
16436840 | 1 | < 0.1%
16234920 | 1 | < 0.1%
15230640 | 1 | < 0.1%
13845360 | 1 | < 0.1%
13354040 | 1 | < 0.1%

Interactions


Correlations

Variable | 년도 | 품목명 | 업무구분명 | 수량단위 | 수량
년도 | 1.000 | 0.381 | 0.704 | 0.554 | 0.213
품목명 | 0.381 | 1.000 | 0.834 | 0.256 | 0.104
업무구분명 | 0.704 | 0.834 | 1.000 | 0.886 | 0.280
수량단위 | 0.554 | 0.256 | 0.886 | 1.000 | 0.497
수량 | 0.213 | 0.104 | 0.280 | 0.497 | 1.000

Variable | 업무구분명 | 수량단위 | 품목명
업무구분명 | 1.000 | 0.614 | 0.542
수량단위 | 0.614 | 1.000 | 0.128
품목명 | 0.542 | 0.128 | 1.000
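
The matrix restricted to the three categorical variables is consistent with a nominal association measure such as Cramér's V, although the report text does not name the measure, so treat that identification as an assumption. The sketch below shows the plain (uncorrected) statistic on hypothetical toy labels. A library may apply a bias correction, so values computed this way need not match the matrix exactly.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data, not taken from the dataset.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

A value near 1 means one variable almost determines the other, and a value near 0 means no association. The statistic is symmetric, which is why the matrix is too.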
Variable | 년도 | 수량 | 품목명 | 업무구분명 | 수량단위
년도 | 1.000 | -0.021 | 0.181 | 0.368 | 0.199
수량 | -0.021 | 1.000 | 0.047 | 0.113 | 0.172
품목명 | 0.181 | 0.047 | 1.000 | 0.542 | 0.128
업무구분명 | 0.368 | 0.113 | 0.542 | 1.000 | 0.614
수량단위 | 0.199 | 0.172 | 0.128 | 0.614 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 실적구분명 | 년도 | 품목명 | 행정구역명 | 업무구분명 | 수량단위 | 수량
5142 | 계획 | 2006 | <NA> | 충청북도 보은군 | 잡곡검사(콩, 옥수수)(40kg) (2019.01 이전 ) | 40kg/대 | 893
6313 | 계획 | 2009 | 겉보리종자 | 경상남도 밀양시 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 100219
1624 | 계획 | 2001 | 벼종자 | 경상남도 사천시 | 공공비축벼 검사(산물) | Kg | 2755920
1580 | 계획 | 2001 | 벼종자 | 경기도 안성시 | 공공비축벼 검사(산물) | Kg | 7612320
9372 | 계획 | 2016 | 벼종자 | 경상남도 남해군 | 농협시가매입(시장격리곡) | Ton | 600
3652 | 계획 | 2004 | 벼종자 | 경상북도 의성군 | 공공비축벼 검사(산물) | Kg | 4576640
1704 | 계획 | 2001 | 벼종자 | 대전광역시 중구 | 농협시가매입(시장격리곡) | 40kg/대 | 893
1046 | 계획 | 2000 | 벼종자 | 경기도 화성시 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 70166
6266 | 계획 | 2008 | <NA> | 전라남도 화순군 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 10840
4030 | 계획 | 2005 | 겉보리종자 | 경상남도 고성군 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 55
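
The sample rows mix units (40kg/대, Kg, Ton), so raw 수량 values are not directly comparable across rows, which may partly explain the heavy skew reported for that column. The sketch below converts a quantity to kilograms under explicit assumptions that the report does not confirm: a label such as "40kg/대" is read as a count of 40 kg containers, "Ton" as 1000 kg, and "Kg" as kilograms. The function name is illustrative.

```python
import re

# Matches labels such as "40kg/대" or "800kg/톤백" and captures the
# per-container weight. Reading these as container counts is an assumption.
_PER_CONTAINER = re.compile(r"^(\d+)\s*kg/", re.IGNORECASE)


def to_kilograms(quantity, unit):
    """Convert a quantity to kilograms using the assumed unit semantics."""
    label = unit.strip().lower()
    if label == "kg":
        return float(quantity)
    if label == "ton":
        return quantity * 1000.0  # assumes a metric ton
    match = _PER_CONTAINER.match(label)
    if match:
        return quantity * float(match.group(1))
    raise ValueError(f"unrecognised unit: {unit!r}")


# Rows taken from the sample table in this report.
print(to_kilograms(893, "40kg/대"))
print(to_kilograms(2755920, "Kg"))
print(to_kilograms(600, "Ton"))
```

Any comparison of quantities across rows should first confirm these unit semantics with the data provider.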
Index | 실적구분명 | 년도 | 품목명 | 행정구역명 | 업무구분명 | 수량단위 | 수량
1518 | 계획 | 2001 | 벼종자 | 강원도 고성군 | 공공비축벼 검사(산물) | Kg | 496280
4616 | 계획 | 2006 | 겉보리종자 | 충청북도 옥천군 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 633
11142 | 계획 | 2019 | (blank) | 충청북도 음성군 | 콩검사(일반콩)(20kg),(40kg),(800kg) | 40kg/대 | 4246
10281 | 계획 | 2018 | 벼종자 | 강원도 강릉시 | 공공비축벼 포대벼검사(40kg),(800kg) | 40kg/대 | 9155
506 | 계획 | 1999 | 벼종자 | 경기도 의왕시 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 5354
10099 | 계획 | 2017 | 벼종자 | 전라북도 전주시 | 공공비축벼 포대벼검사(40kg),(800kg) | 40kg/대 | 89130
2538 | 계획 | 2002 | <NA> | 경상남도 산청군 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 6164
2111 | 계획 | 2002 | 겉보리종자 | 전라북도 남원시 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 3180
7018 | 계획 | 2010 | 벼종자 | 경기도 연천군 | 공공비축벼 포대벼검사(40kg),(800kg) | 40kg/대 | 61730
5786 | 계획 | 2008 | 맥주보리종자 | 경상남도 진주시 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 14997