Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 15
Missing cells (%): < 0.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
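
The percentage and per-record figures above follow from the raw counts. The sketch below redoes that arithmetic under two assumptions that the report does not state explicitly: a "cell" is one variable value in one observation, and 1 KiB is 1024 bytes. The variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 15
duplicate_rows = 1
total_memory_kib = 644.5

# Assumption: a "cell" is one variable value in one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.4f}%")
print(f"duplicate rows: {duplicate_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both percentages fall below the 0.1% display threshold, which is consistent with the report showing them as "< 0.1%".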

Variable types

Categorical: 3
Numeric: 2
Text: 2

Dataset

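Description: Summer and autumn grain inspection results managed by the National Agricultural Products Quality Management Service (fields: category name, year, item, administrative district, job category, quantity unit, quantity)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001693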

Alerts

Duplicates: Dataset has 1 (< 0.1%) duplicate rows
High correlation: JOB_SE_NM is highly overall correlated with QY_UNIT
High correlation: QY_UNIT is highly overall correlated with JOB_SE_NM
Imbalance: QY_UNIT is highly imbalanced (69.6%)

Reproduction

Analysis started: 2023-12-11 03:47:41.362131
Analysis finished: 2023-12-11 03:47:42.412641
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

PLAN_ACMSLT_SE_NM
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
실적: 6461
계획: 3539

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 실적
2nd row: 실적
3rd row: 계획
4th row: 실적
5th row: 실적

Common Values

Value | Count | Frequency (%)
실적 | 6461 | 64.6%
계획 | 3539 | 35.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
실적 | 6461 | 64.6%
계획 | 3539 | 35.4%

YEAR
Real number (ℝ)

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.5755
Minimum: 1998
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 1999
Q1: 2003
Median: 2007
Q3: 2010
95-th percentile: 2015
Maximum: 2015
Range: 17
Interquartile range (IQR): 7
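
The quantiles above can be recovered from the YEAR frequency tables elsewhere in this report, because those tables list a count for all 18 distinct years. The sketch below assumes linear interpolation between neighbouring order statistics; the report does not state which quantile method it uses, and the helper names are hypothetical.

```python
# Year -> count, copied from the YEAR frequency tables in this report.
year_counts = {
    1998: 379, 1999: 530, 2000: 532, 2001: 412, 2002: 480, 2003: 415,
    2004: 767, 2005: 643, 2006: 812, 2007: 685, 2008: 735, 2009: 720,
    2010: 516, 2011: 494, 2012: 394, 2013: 526, 2014: 444, 2015: 516,
}


def value_at_rank(counts, rank):
    """Return the value at 0-based position `rank` in the sorted data."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if rank < seen:
            return value
    raise IndexError("rank is beyond the number of observations")


def quantile(counts, q):
    """Quantile by linear interpolation (an assumed method, see lead-in)."""
    n = sum(counts.values())
    pos = q * (n - 1)
    lower = int(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    lo_val = value_at_rank(counts, lower)
    hi_val = value_at_rank(counts, upper)
    return lo_val + frac * (hi_val - lo_val)


q1 = quantile(year_counts, 0.25)
median = quantile(year_counts, 0.50)
q3 = quantile(year_counts, 0.75)
print(q1, median, q3, q3 - q1)
```

Because YEAR takes only integer values and each year occurs hundreds of times, the interpolation never lands between two different years here, so other common quantile definitions would give the same results.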

Descriptive statistics

Standard deviation: 4.8207493
Coefficient of variation (CV): 0.0024024759
Kurtosis: -0.96690945
Mean: 2006.5755
Median Absolute Deviation (MAD): 4
Skewness: -0.0038027532
Sum: 20065755
Variance: 23.239624
Monotonicity: Not monotonic
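
As a cross-check, the sum, mean, variance, standard deviation and coefficient of variation can be recomputed from the YEAR frequency tables in this report. The sketch assumes the sample variance (divisor n - 1); that convention is not stated in the report, but it reproduces the printed figures.

```python
import math

# Year -> count, copied from the YEAR frequency tables in this report.
year_counts = {
    1998: 379, 1999: 530, 2000: 532, 2001: 412, 2002: 480, 2003: 415,
    2004: 767, 2005: 643, 2006: 812, 2007: 685, 2008: 735, 2009: 720,
    2010: 516, 2011: 494, 2012: 394, 2013: 526, 2014: 444, 2015: 516,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n

# Assumption: sample variance with divisor n - 1.
squared_dev = sum(count * (year - mean) ** 2 for year, count in year_counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(total, mean, round(variance, 6), round(std, 7), round(cv, 10))
```

The very small CV reflects that calendar years are large numbers with a narrow spread, not that the distribution is unusually concentrated.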
[Figure: histogram with fixed size bins (bins=18)]
Value | Count | Frequency (%)
2006 | 812 | 8.1%
2004 | 767 | 7.7%
2008 | 735 | 7.3%
2009 | 720 | 7.2%
2007 | 685 | 6.9%
2005 | 643 | 6.4%
2000 | 532 | 5.3%
1999 | 530 | 5.3%
2013 | 526 | 5.3%
2010 | 516 | 5.2%
Other values (8) | 3534 | 35.3%

Value | Count | Frequency (%)
1998 | 379 | 3.8%
1999 | 530 | 5.3%
2000 | 532 | 5.3%
2001 | 412 | 4.1%
2002 | 480 | 4.8%
2003 | 415 | 4.2%
2004 | 767 | 7.7%
2005 | 643 | 6.4%
2006 | 812 | 8.1%
2007 | 685 | 6.9%

Value | Count | Frequency (%)
2015 | 516 | 5.2%
2014 | 444 | 4.4%
2013 | 526 | 5.3%
2012 | 394 | 3.9%
2011 | 494 | 4.9%
2010 | 516 | 5.2%
2009 | 720 | 7.2%
2008 | 735 | 7.3%
2007 | 685 | 6.9%
2006 | 812 | 8.1%

PRDLST_NM
Text

Distinct: 63
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 7
Mean length: 3.2521
Min length: 1

Characters and Unicode

Total characters: 32521
Distinct characters: 86
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
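
To make the character-property idea concrete, the sketch below uses Python's standard `unicodedata` module to count general categories in two PRDLST_NM values that appear in this report. It covers categories only; the standard library does not expose scripts or blocks, so those parts of the report are not reproduced. The two-letter codes map to the report's labels as follows: `Lo` is Other Letter, `Nd` is Decimal Number, `Ps` is Open Punctuation, `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

# Two PRDLST_NM values that appear in this report.
samples = ["밭콩(소립종)", "동진1호벼"]

# Count the Unicode general category of every character in the samples.
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in category_counts.most_common():
    print(code, count)
```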

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row:
2nd row: 밭콩(소립종)
3rd row:
4th row: 겉보리
5th row: 황금누리벼

Value | Count | Frequency (%)
 | 1947 | 19.5%
겉보리 | 736 | 7.4%
쌀보리 | 709 | 7.1%
옥수수 | 645 | 6.5%
동진1호벼 | 425 | 4.2%
일미벼 | 371 | 3.7%
추청벼 | 353 | 3.5%
남평벼 | 345 | 3.5%
기타(2군 | 344 | 3.4%
일품벼 | 332 | 3.3%
Other values (53) | 3793 | 37.9%

Most occurring characters

Value | Count | Frequency (%)
 | 5912 | 18.2%
 | 2193 | 6.7%
 | 1836 | 5.6%
 | 1360 | 4.2%
) | 1330 | 4.1%
( | 1330 | 4.1%
 | 1002 | 3.1%
 | 899 | 2.8%
 | 819 | 2.5%
 | 805 | 2.5%
Other values (76) | 15035 | 46.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29081 | 89.4%
Close Punctuation | 1330 | 4.1%
Open Punctuation | 1330 | 4.1%
Decimal Number | 780 | 2.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5912 | 20.3%
 | 2193 | 7.5%
 | 1836 | 6.3%
 | 1360 | 4.7%
 | 1002 | 3.4%
 | 899 | 3.1%
 | 819 | 2.8%
 | 805 | 2.8%
 | 760 | 2.6%
 | 736 | 2.5%
Other values (72) | 12759 | 43.9%
Decimal Number
Value | Count | Frequency (%)
1 | 429 | 55.0%
2 | 351 | 45.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1330 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1330 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 29081 | 89.4%
Common | 3440 | 10.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5912 | 20.3%
 | 2193 | 7.5%
 | 1836 | 6.3%
 | 1360 | 4.7%
 | 1002 | 3.4%
 | 899 | 3.1%
 | 819 | 2.8%
 | 805 | 2.8%
 | 760 | 2.6%
 | 736 | 2.5%
Other values (72) | 12759 | 43.9%
Common
Value | Count | Frequency (%)
) | 1330 | 38.7%
( | 1330 | 38.7%
1 | 429 | 12.5%
2 | 351 | 10.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 29081 | 89.4%
ASCII | 3440 | 10.6%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5912 | 20.3%
 | 2193 | 7.5%
 | 1836 | 6.3%
 | 1360 | 4.7%
 | 1002 | 3.4%
 | 899 | 3.1%
 | 819 | 2.8%
 | 805 | 2.8%
 | 760 | 2.6%
 | 736 | 2.5%
Other values (72) | 12759 | 43.9%
ASCII
Value | Count | Frequency (%)
) | 1330 | 38.7%
( | 1330 | 38.7%
1 | 429 | 12.5%
2 | 351 | 10.2%

ADMINIST_ZONE_NM
Text

Distinct: 239
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 8
Mean length: 7.9975
Min length: 6

Characters and Unicode

Total characters: 79975
Distinct characters: 136
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: 전라북도 김제시
2nd row: 전라남도 영광군
3rd row: 광주광역시 동구
4th row: 전라북도 김제시
5th row: 전라북도 순창군

Value | Count | Frequency (%)
전라남도 | 1814 | 9.0%
경상남도 | 1430 | 7.1%
경상북도 | 1429 | 7.1%
전라북도 | 1036 | 5.1%
충청남도 | 1033 | 5.1%
강원도 | 823 | 4.1%
경기도 | 816 | 4.0%
충청북도 | 636 | 3.1%
광주광역시 | 203 | 1.0%
인천광역시 | 160 | 0.8%
Other values (228) | 10820 | 53.6%

Most occurring characters

Value | Count | Frequency (%)
 | 10249 | 12.8%
 | 9312 | 11.6%
 | 5673 | 7.1%
 | 4703 | 5.9%
 | 4692 | 5.9%
 | 3853 | 4.8%
 | 3247 | 4.1%
 | 3055 | 3.8%
 | 2946 | 3.7%
 | 2850 | 3.6%
Other values (126) | 29395 | 36.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 69726 | 87.2%
Space Separator | 10249 | 12.8%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 9312 | 13.4%
 | 5673 | 8.1%
 | 4703 | 6.7%
 | 4692 | 6.7%
 | 3853 | 5.5%
 | 3247 | 4.7%
 | 3055 | 4.4%
 | 2946 | 4.2%
 | 2850 | 4.1%
 | 2001 | 2.9%
Other values (125) | 27394 | 39.3%
Space Separator
Value | Count | Frequency (%)
 | 10249 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 69726 | 87.2%
Common | 10249 | 12.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 9312 | 13.4%
 | 5673 | 8.1%
 | 4703 | 6.7%
 | 4692 | 6.7%
 | 3853 | 5.5%
 | 3247 | 4.7%
 | 3055 | 4.4%
 | 2946 | 4.2%
 | 2850 | 4.1%
 | 2001 | 2.9%
Other values (125) | 27394 | 39.3%
Common
Value | Count | Frequency (%)
 | 10249 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 69726 | 87.2%
ASCII | 10249 | 12.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 10249 | 100.0%
Hangul
Value | Count | Frequency (%)
 | 9312 | 13.4%
 | 5673 | 8.1%
 | 4703 | 6.7%
 | 4692 | 6.7%
 | 3853 | 5.5%
 | 3247 | 4.7%
 | 3055 | 4.4%
 | 2946 | 4.2%
 | 2850 | 4.1%
 | 2001 | 2.9%
Other values (125) | 27394 | 39.3%

JOB_SE_NM
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
공공비축벼 포대벼검사(40kg): 3275
공공비축벼 검사(산물): 1987
하곡검사 포대벼검사(40kg): 1889
잡곡검사(콩, 옥수수): 1520
공공비축벼 포대벼검사(800kg): 758
Other values (4): 571

Length

Max length: 25
Median length: 18
Mean length: 15.335
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 하곡검사 포대벼검사(40kg)
2nd row: 잡곡검사(콩, 옥수수)
3rd row: 공공비축벼 포대벼검사(40kg),(800kg)
4th row: 하곡검사 포대벼검사(40kg)
5th row: 공공비축벼 포대벼검사(40kg)

Common Values

Value | Count | Frequency (%)
공공비축벼 포대벼검사(40kg) | 3275 | 32.8%
공공비축벼 검사(산물) | 1987 | 19.9%
하곡검사 포대벼검사(40kg) | 1889 | 18.9%
잡곡검사(콩, 옥수수) | 1520 | 15.2%
공공비축벼 포대벼검사(800kg) | 758 | 7.6%
공공비축벼 포대벼검사(40kg),(800kg) | 413 | 4.1%
하곡검사 (산물) | 99 | 1.0%
농협시가매입 | 42 | 0.4%
애프터매입벼검사(800kg) | 17 | 0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
공공비축벼 | 6433 | 32.3%
포대벼검사(40kg | 5164 | 25.9%
하곡검사 | 1988 | 10.0%
검사(산물 | 1987 | 10.0%
잡곡검사(콩 | 1520 | 7.6%
옥수수 | 1520 | 7.6%
포대벼검사(800kg | 758 | 3.8%
포대벼검사(40kg),(800kg | 413 | 2.1%
산물 | 99 | 0.5%
농협시가매입 | 42 | 0.2%

QY_UNIT
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
40kg/대: 8575
800kg/대: 698
Kg: 627
800kg/톤백: 65
<NA>: 28

Length

Max length: 8
Median length: 6
Mean length: 5.8243
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 40kg/대
2nd row: 40kg/대
3rd row: Kg
4th row: 40kg/대
5th row: 40kg/대

Common Values

Value | Count | Frequency (%)
40kg/대 | 8575 | 85.8%
800kg/대 | 698 | 7.0%
Kg | 627 | 6.3%
800kg/톤백 | 65 | 0.7%
<NA> | 28 | 0.3%
Ton | 7 | 0.1%
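
The 69.6% imbalance figure reported for QY_UNIT is consistent with one minus the normalized Shannon entropy of these six counts. The sketch below assumes that definition; it reproduces the printed number, but the report itself does not document the formula, and the function name is hypothetical. Note that `<NA>` is counted as an ordinary category here, since the variable reports 0 missing values.

```python
import math

# QY_UNIT value counts, copied from the Common Values table above.
unit_counts = {
    "40kg/대": 8575,
    "800kg/대": 698,
    "Kg": 627,
    "800kg/톤백": 65,
    "<NA>": 28,
    "Ton": 7,
}


def imbalance_score(counts):
    """Assumed definition: 1 - entropy / log2(number of classes).

    0 means perfectly balanced classes; values near 1 mean a single
    class dominates.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(unit_counts)
print(f"{score:.1%}")
```

The score is high because a single unit, 40kg/대, accounts for 85.8% of the rows.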

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
40kg/대 | 8575 | 85.8%
800kg/대 | 698 | 7.0%
kg | 627 | 6.3%
800kg/톤백 | 65 | 0.7%
na | 28 | 0.3%
ton | 7 | 0.1%

QY
Real number (ℝ)

Distinct: 6887
Distinct (%): 69.0%
Missing: 15
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 242273.56
Minimum: 0
Maximum: 18049800
Zeros: 36
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 31
Q1: 532
Median: 5690
Q3: 57244
95-th percentile: 1200000
Maximum: 18049800
Range: 18049800
Interquartile range (IQR): 56712

Descriptive statistics

Standard deviation: 1003600.6
Coefficient of variation (CV): 4.1424272
Kurtosis: 75.675664
Mean: 242273.56
Median Absolute Deviation (MAD): 5641
Skewness: 7.6266833
Sum: 2.4191015 × 10^9
Variance: 1.0072141 × 10^12
Monotonicity: Not monotonic
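
Several of the QY figures are tied together by simple identities, which makes them easy to sanity-check. The sketch below assumes the mean is taken over non-missing values only (9985 of the 10000 rows); under that assumption the mean and the sum agree. All inputs are copied from the QY statistics in this report.

```python
# Figures copied from the QY statistics in this report.
n_observations = 10_000
missing = 15
mean = 242273.56
std = 1003600.6
q1, q3 = 532, 57244

# Assumption: missing values are excluded before averaging.
non_missing = n_observations - missing
approx_sum = mean * non_missing

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
iqr = q3 - q1          # interquartile range

print(f"sum ~ {approx_sum:.4e}")
print(f"CV ~ {cv:.4f}")
print(f"variance ~ {variance:.4e}")
print(f"IQR = {iqr}")
```

The mean (about 242000) sits far above the median (5690), and the CV exceeds 4; both are consistent with the strong right skew the report shows for this variable.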
[Figure: histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
0 | 36 | 0.4%
10 | 26 | 0.3%
50 | 26 | 0.3%
5 | 24 | 0.2%
20 | 23 | 0.2%
2 | 20 | 0.2%
25 | 20 | 0.2%
30 | 19 | 0.2%
13 | 19 | 0.2%
3 | 19 | 0.2%
Other values (6877) | 9753 | 97.5%

Value | Count | Frequency (%)
0 | 36 | 0.4%
1 | 17 | 0.2%
2 | 20 | 0.2%
3 | 19 | 0.2%
4 | 16 | 0.2%
5 | 24 | 0.2%
6 | 16 | 0.2%
7 | 19 | 0.2%
8 | 19 | 0.2%
9 | 16 | 0.2%

Value | Count | Frequency (%)
18049800 | 1 | < 0.1%
16398720 | 1 | < 0.1%
15554560 | 1 | < 0.1%
15230640 | 1 | < 0.1%
15024840 | 1 | < 0.1%
12771080 | 1 | < 0.1%
12643870 | 1 | < 0.1%
12594440 | 1 | < 0.1%
12338880 | 1 | < 0.1%
12200680 | 1 | < 0.1%

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]
 | PLAN_ACMSLT_SE_NM | YEAR | PRDLST_NM | JOB_SE_NM | QY_UNIT | QY
PLAN_ACMSLT_SE_NM | 1.000 | 0.212 | 0.938 | 0.374 | 0.337 | 0.197
YEAR | 0.212 | 1.000 | 0.702 | 0.466 | 0.425 | 0.209
PRDLST_NM | 0.938 | 0.702 | 1.000 | 0.880 | 0.588 | 0.223
JOB_SE_NM | 0.374 | 0.466 | 0.880 | 1.000 | 0.793 | 0.285
QY_UNIT | 0.337 | 0.425 | 0.588 | 0.793 | 1.000 | 0.562
QY | 0.197 | 0.209 | 0.223 | 0.285 | 0.562 | 1.000

[Figure: correlation heatmap]
 | PLAN_ACMSLT_SE_NM | QY_UNIT | JOB_SE_NM
PLAN_ACMSLT_SE_NM | 1.000 | 0.411 | 0.374
QY_UNIT | 0.411 | 1.000 | 0.617
JOB_SE_NM | 0.374 | 0.617 | 1.000

[Figure: correlation heatmap]
 | YEAR | QY | PLAN_ACMSLT_SE_NM | JOB_SE_NM | QY_UNIT
YEAR | 1.000 | -0.117 | 0.159 | 0.237 | 0.192
QY | -0.117 | 1.000 | 0.151 | 0.133 | 0.267
PLAN_ACMSLT_SE_NM | 0.159 | 0.151 | 1.000 | 0.374 | 0.411
JOB_SE_NM | 0.237 | 0.133 | 0.374 | 1.000 | 0.617
QY_UNIT | 0.192 | 0.267 | 0.411 | 0.617 | 1.000
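
The report does not say which association measure lies behind each matrix. Cramér's V is one common choice for pairs of categorical variables such as JOB_SE_NM and QY_UNIT, and the sketch below assumes it purely for illustration. It implements the plain, uncorrected form, and the labels are hypothetical toy data, not rows from the dataset, so it is not expected to reproduce the coefficients above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy labels: the unit is fully determined by the job type.
job = ["bag", "bag", "bulk", "bulk"]
unit = ["40kg", "40kg", "Kg", "Kg"]
print(cramers_v(job, unit))

# Hypothetical toy labels with no association at all.
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))
```

A value near 1 means one variable almost determines the other, which is the kind of relationship the high-correlation alert for JOB_SE_NM and QY_UNIT points to.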

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | PLAN_ACMSLT_SE_NM | YEAR | PRDLST_NM | ADMINIST_ZONE_NM | JOB_SE_NM | QY_UNIT | QY
12190 | 실적 | 2004 |  | 전라북도 김제시 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 2100
10205 | 실적 | 2001 | 밭콩(소립종) | 전라남도 영광군 | 잡곡검사(콩, 옥수수) | 40kg/대 | 8
24544 | 계획 | 2013 |  | 광주광역시 동구 | 공공비축벼 포대벼검사(40kg),(800kg) | Kg | 4000
19577 | 실적 | 2008 | 겉보리 | 전라북도 김제시 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 33274
6282 | 실적 | 2013 | 황금누리벼 | 전라북도 순창군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 16897
12003 | 실적 | 2002 | 논콩(대립종) | 충청북도 괴산군 | 잡곡검사(콩, 옥수수) | 40kg/대 | 7672
6960 | 계획 | 2001 | 옥수수 | 강원도 횡성군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 638
1567 | 실적 | 2012 | 삼광벼 | 충청남도 서산시 | 공공비축벼 검사(산물) | 40kg/대 | 1878
4685 | 실적 | 2012 | 호품벼 | 충청북도 음성군 | 공공비축벼 검사(산물) | 40kg/대 | 21272
1867 | 계획 | 2003 |  | 충청남도 서천군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 177790

 | PLAN_ACMSLT_SE_NM | YEAR | PRDLST_NM | ADMINIST_ZONE_NM | JOB_SE_NM | QY_UNIT | QY
13018 | 실적 | 2000 | 기타(2군) | 전라남도 보성군 | 공공비축벼 검사(산물) | 40kg/대 | 6108918
21795 | 계획 | 2011 |  | 전라북도 남원시 | 공공비축벼 포대벼검사(40kg),(800kg) | 40kg/대 | 144141
23457 | 계획 | 2006 | 옥수수 | 경상북도 문경시 | 잡곡검사(콩, 옥수수) | 40kg/대 | 16755
22415 | 계획 | 2008 | 옥수수 | 충청북도 제천시 | 잡곡검사(콩, 옥수수) | 40kg/대 | 5
15874 | 실적 | 2007 | 동진1호벼 | 경상남도 고성군 | 공공비축벼 검사(산물) | 40kg/대 | 762240
24371 | 계획 | 2009 |  | 경상북도 칠곡군 | 공공비축벼 포대벼검사(800kg) | 800kg/톤백 | 500
19545 | 실적 | 2006 | 화영벼 | 경상북도 울진군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 3640
19412 | 실적 | 2005 | 추청벼 | 경기도 남양주시 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 3046
14764 | 실적 | 1999 | 기타(2군) | 제주도 남제주군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 546
2515 | 계획 | 2002 |  | 전라남도 장흥군 | 공공비축벼 검사(산물) | Kg | 4853000

Duplicate rows

Most frequently occurring

 | PLAN_ACMSLT_SE_NM | YEAR | PRDLST_NM | ADMINIST_ZONE_NM | JOB_SE_NM | QY_UNIT | QY | # duplicates
0 | 계획 | 2013 | 콩(일반) | 충청남도 홍성군 | 잡곡검사(콩, 옥수수) | <NA> | <NA> | 2
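
A duplicate-row check of this kind can be sketched with a counter over whole rows. The example assumes that two missing values compare as equal, with `None` standing in for `<NA>`; whether the report's tool treats them the same way is not stated. The first two rows mirror the duplicated record above and the third is taken from the sample table; the variable names are illustrative.

```python
from collections import Counter

# Rows as tuples in column order:
# PLAN_ACMSLT_SE_NM, YEAR, PRDLST_NM, ADMINIST_ZONE_NM, JOB_SE_NM, QY_UNIT, QY
rows = [
    ("계획", 2013, "콩(일반)", "충청남도 홍성군", "잡곡검사(콩, 옥수수)", None, None),
    ("계획", 2013, "콩(일반)", "충청남도 홍성군", "잡곡검사(콩, 옥수수)", None, None),
    ("실적", 2008, "겉보리", "전라북도 김제시", "하곡검사 포대벼검사(40kg)", "40kg/대", 33274),
]

# Count how often each complete row occurs.
occurrences = Counter(rows)

# Keep only rows that occur more than once.
duplicates = {row: count for row, count in occurrences.items() if count > 1}

# Rows that would be dropped if only the first occurrence were kept.
n_duplicate_rows = sum(count - 1 for count in duplicates.values())

for row, count in duplicates.items():
    print(count, row)
print("duplicate rows:", n_duplicate_rows)
```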