Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 1268
Missing cells (%): 1.8%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
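
The two derived figures in this block (missing-cell percentage and average record size) can be re-derived from the raw counts. The following is a minimal sketch that uses only the numbers reported above; the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 7
n_observations = 10_000
missing_cells = 1_268
total_size_kib = 644.5

# A "cell" is one value of one variable in one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```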

Variable types

Categorical: 3
Numeric: 2
Text: 2

Dataset

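Description: Summer and autumn grain inspection results managed by the National Agricultural Products Quality Management Service (category name, year, item, administrative district, task category, quantity unit, quantity)
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001693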

Alerts

실적구분명 has constant value "실적" [Constant]
Dataset has 1 (< 0.1%) duplicate rows [Duplicates]
업무구분명 is highly overall correlated with 수량단위 [High correlation]
수량단위 is highly overall correlated with 업무구분명 [High correlation]
수량 has 1208 (12.1%) missing values [Missing]

Reproduction

Analysis started: 2023-12-11 03:47:47.449169
Analysis finished: 2023-12-11 03:47:48.509570
Duration: 1.06 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

실적구분명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
실적: 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 실적
2nd row: 실적
3rd row: 실적
4th row: 실적
5th row: 실적

Common Values

Value | Count | Frequency (%)
실적 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
실적 | 10000 | 100.0%

년도
Real number (ℝ)

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.3197
Minimum: 1998
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 1999
Q1: 2004
Median: 2009
Q3: 2015
95-th percentile: 2019
Maximum: 2021
Range: 23
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 6.3726096
Coefficient of variation (CV): 0.003171526
Kurtosis: -1.0864441
Mean: 2009.3197
Median Absolute Deviation (MAD): 5
Skewness: -0.0055113874
Sum: 20093197
Variance: 40.610153
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%)
2006 | 586 | 5.9%
2009 | 577 | 5.8%
2004 | 571 | 5.7%
2008 | 534 | 5.3%
2007 | 532 | 5.3%
2019 | 495 | 5.0%
2013 | 487 | 4.9%
2016 | 485 | 4.9%
2005 | 436 | 4.4%
2015 | 432 | 4.3%
Other values (14) | 4865 | 48.6%

Value | Count | Frequency (%)
1998 | 332 | 3.3%
1999 | 372 | 3.7%
2000 | 367 | 3.7%
2001 | 295 | 2.9%
2002 | 366 | 3.7%
2003 | 328 | 3.3%
2004 | 571 | 5.7%
2005 | 436 | 4.4%
2006 | 586 | 5.9%
2007 | 532 | 5.3%

Value | Count | Frequency (%)
2021 | 27 | 0.3%
2020 | 401 | 4.0%
2019 | 495 | 5.0%
2018 | 382 | 3.8%
2017 | 425 | 4.2%
2016 | 485 | 4.9%
2015 | 432 | 4.3%
2014 | 388 | 3.9%
2013 | 487 | 4.9%
2012 | 361 | 3.6%

품목명
Text

Distinct: 83
Distinct (%): 0.8%
Missing: 60
Missing (%): 0.6%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 6
Mean length: 3.9853119
Min length: 1

Characters and Unicode

Total characters: 39614
Distinct characters: 93
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: 논콩(대립종)
2nd row: 삼광벼
3rd row: 오대벼
4th row: 밭콩(대립종)
5th row: 화영벼
Value | Count | Frequency (%)
삼광벼 | 625 | 6.3%
동진1호벼 | 510 | 5.1%
새누리벼 | 490 | 4.9%
일품벼 | 482 | 4.8%
추청벼 | 464 | 4.7%
일미벼 | 456 | 4.6%
겉보리종자 | 438 | 4.4%
쌀보리종자 | 400 | 4.0%
남평벼 | 383 | 3.9%
기타(2군 | 382 | 3.8%
Other values (73) | 5310 | 53.4%

Most occurring characters

ValueCountFrequency (%)
6262
 
15.8%
2265
 
5.7%
1858
 
4.7%
1479
 
3.7%
( 1335
 
3.4%
) 1335
 
3.4%
1332
 
3.4%
1312
 
3.3%
1236
 
3.1%
1204
 
3.0%
Other values (83) 19996
50.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 36036
91.0%
Open Punctuation 1335
 
3.4%
Close Punctuation 1335
 
3.4%
Decimal Number 908
 
2.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6262
 
17.4%
2265
 
6.3%
1858
 
5.2%
1479
 
4.1%
1332
 
3.7%
1312
 
3.6%
1236
 
3.4%
1204
 
3.3%
1079
 
3.0%
965
 
2.7%
Other values (79) 17044
47.3%
Decimal Number
ValueCountFrequency (%)
1 519
57.2%
2 389
42.8%
Open Punctuation
ValueCountFrequency (%)
( 1335
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1335
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 36036
91.0%
Common 3578
 
9.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6262
 
17.4%
2265
 
6.3%
1858
 
5.2%
1479
 
4.1%
1332
 
3.7%
1312
 
3.6%
1236
 
3.4%
1204
 
3.3%
1079
 
3.0%
965
 
2.7%
Other values (79) 17044
47.3%
Common
ValueCountFrequency (%)
( 1335
37.3%
) 1335
37.3%
1 519
 
14.5%
2 389
 
10.9%

Most occurring blocks

ValueCountFrequency (%)
Hangul 36036
91.0%
ASCII 3578
 
9.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
6262
 
17.4%
2265
 
6.3%
1858
 
5.2%
1479
 
4.1%
1332
 
3.7%
1312
 
3.6%
1236
 
3.4%
1204
 
3.3%
1079
 
3.0%
965
 
2.7%
Other values (79) 17044
47.3%
ASCII
ValueCountFrequency (%)
( 1335
37.3%
) 1335
37.3%
1 519
 
14.5%
2 389
 
10.9%

행정구역명
Text

Distinct: 184
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 8
Mean length: 7.9019
Min length: 7

Characters and Unicode

Total characters: 79019
Distinct characters: 127
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 경상북도 영주시
2nd row: 충청남도 예산군
3rd row: 강원도 삼척시
4th row: 전라남도 무안군
5th row: 경상북도 포항시
Value | Count | Frequency (%)
전라남도 | 1885 | 9.5%
경상북도 | 1460 | 7.3%
경상남도 | 1337 | 6.7%
충청남도 | 1071 | 5.4%
전라북도 | 1035 | 5.2%
강원도 | 923 | 4.6%
충청북도 | 688 | 3.5%
경기도 | 664 | 3.3%
광주광역시 | 168 | 0.8%
인천광역시 | 160 | 0.8%
Other values (178) | 10521 | 52.8%

Most occurring characters

ValueCountFrequency (%)
10000
 
12.7%
9324
 
11.8%
5734
 
7.3%
4663
 
5.9%
4657
 
5.9%
3634
 
4.6%
3272
 
4.1%
3119
 
3.9%
2920
 
3.7%
2885
 
3.7%
Other values (117) 28811
36.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 69019
87.3%
Space Separator 10000
 
12.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9324
 
13.5%
5734
 
8.3%
4663
 
6.8%
4657
 
6.7%
3634
 
5.3%
3272
 
4.7%
3119
 
4.5%
2920
 
4.2%
2885
 
4.2%
2085
 
3.0%
Other values (116) 26726
38.7%
Space Separator
ValueCountFrequency (%)
10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 69019
87.3%
Common 10000
 
12.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9324
 
13.5%
5734
 
8.3%
4663
 
6.8%
4657
 
6.7%
3634
 
5.3%
3272
 
4.7%
3119
 
4.5%
2920
 
4.2%
2885
 
4.2%
2085
 
3.0%
Other values (116) 26726
38.7%
Common
ValueCountFrequency (%)
10000
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 69019
87.3%
ASCII 10000
 
12.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10000
100.0%
Hangul
ValueCountFrequency (%)
9324
 
13.5%
5734
 
8.3%
4663
 
6.8%
4657
 
6.7%
3634
 
5.3%
3272
 
4.7%
3119
 
4.5%
2920
 
4.2%
2885
 
4.2%
2085
 
3.0%
Other values (116) 26726
38.7%

업무구분명
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
공공비축벼 포대벼검사(40kg): 3650
공공비축벼 검사(산물): 2090
공공비축벼 포대벼검사(800kg): 1619
잡곡검사(콩: 1201
하곡검사 포대벼검사(40kg): 1113
Other values (10): 327

Length

Max length: 20
Median length: 19
Mean length: 14.5001
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 잡곡검사(콩
2nd row: 공공비축벼 검사(산물)
3rd row: 공공비축벼 포대벼검사(800kg)
4th row: 잡곡검사(콩
5th row: 공공비축벼 포대벼검사(40kg)

Common Values

Value | Count | Frequency (%)
공공비축벼 포대벼검사(40kg) | 3650 | 36.5%
공공비축벼 검사(산물) | 2090 | 20.9%
공공비축벼 포대벼검사(800kg) | 1619 | 16.2%
잡곡검사(콩 | 1201 | 12.0%
하곡검사 포대벼검사(40kg) | 1113 | 11.1%
농협시가매입(시장격리곡)(800kg) | 80 | 0.8%
비축농산물 | 60 | 0.6%
하곡검사 (산물) | 50 | 0.5%
콩(40kg) | 42 | 0.4%
농협시가매입(시장격리곡) | 33 | 0.3%
Other values (5) | 62 | 0.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
공공비축벼 | 7359 | 39.7%
포대벼검사(40kg | 4763 | 25.7%
검사(산물 | 2090 | 11.3%
포대벼검사(800kg | 1619 | 8.7%
잡곡검사(콩 | 1201 | 6.5%
하곡검사 | 1163 | 6.3%
농협시가매입(시장격리곡)(800kg | 80 | 0.4%
비축농산물 | 60 | 0.3%
산물 | 50 | 0.3%
콩(40kg | 42 | 0.2%
Other values (6) | 95 | 0.5%

수량단위
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
40kg/대: 7033
800kg/대: 1699
<NA>: 1208
20Kg/대: 57
10Kg/대: 3

Length

Max length: 7
Median length: 6
Mean length: 5.9283
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 40kg/대
3rd row: 800kg/대
4th row: <NA>
5th row: 40kg/대

Common Values

Value | Count | Frequency (%)
40kg/대 | 7033 | 70.3%
800kg/대 | 1699 | 17.0%
<NA> | 1208 | 12.1%
20Kg/대 | 57 | 0.6%
10Kg/대 | 3 | < 0.1%

Length

Histogram of lengths of the category
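
The counts for 수량단위 show Missing as 0 while the literal category `<NA>` occurs 1208 times, the same number as the missing values reported for 수량. This suggests, though the report does not confirm it, that missing units are stored as the string `<NA>` rather than as true nulls. The units also mix `kg` and `Kg`. A minimal cleaning sketch under those assumptions follows; `normalize_unit` is a hypothetical helper, not part of the dataset or of ydata-profiling.

```python
from collections import Counter
from typing import Optional

# Counts copied from the Common Values table for 수량단위.
raw_counts = {
    "40kg/대": 7033,
    "800kg/대": 1699,
    "<NA>": 1208,
    "20Kg/대": 57,
    "10Kg/대": 3,
}

def normalize_unit(value: str) -> Optional[str]:
    """Map the literal '<NA>' placeholder to None and unify 'Kg'/'kg' casing."""
    text = value.strip()
    if text == "<NA>":  # assumption: this string marks a missing unit
        return None
    return text.replace("Kg", "kg")

# Re-aggregate the counts under the normalized labels.
clean_counts = Counter()
for raw, count in raw_counts.items():
    clean_counts[normalize_unit(raw)] += count

print(clean_counts)
```

After this step the column would report 1208 missing values, matching 수량, and four distinct units instead of five categories.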

Common Values (Plot)

Value | Count | Frequency (%)
40kg/대 | 7033 | 70.3%
800kg/대 | 1699 | 17.0%
na | 1208 | 12.1%
20kg/대 | 57 | 0.6%
10kg/대 | 3 | < 0.1%

수량
Real number (ℝ)

MISSING 

Distinct: 6526
Distinct (%): 74.2%
Missing: 1208
Missing (%): 12.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 121388
Minimum: 1
Maximum: 10544000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 46
Q1: 766
Median: 4971
Q3: 30790.5
95-th percentile: 565556
Maximum: 10544000
Range: 10543999
Interquartile range (IQR): 30024.5

Descriptive statistics

Standard deviation: 530025.22
Coefficient of variation (CV): 4.3663725
Kurtosis: 99.514056
Mean: 121388
Median Absolute Deviation (MAD): 4857
Skewness: 8.7271766
Sum: 1.0672433 × 10^9
Variance: 2.8092673 × 10^11
Monotonicity: Not monotonic
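
Several of the figures for 수량 are derived from others: CV is the standard deviation divided by the mean, variance is the squared standard deviation, and IQR and range are simple differences. The sketch below recomputes them from the reported values as a consistency check. Small differences in the last digits are expected because the report rounds the mean.

```python
# Figures copied from the 수량 statistics.
mean = 121388
std = 530025.22
q1, q3 = 766, 30790.5
minimum, maximum = 1, 10544000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # spread between the extremes

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```

A CV above 4 together with a mean (121388) far above the median (4971) is consistent with the strong right skew (8.73) reported for this variable.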
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10.0 | 21 | 0.2%
20.0 | 18 | 0.2%
16.0 | 15 | 0.1%
100.0 | 15 | 0.1%
30.0 | 15 | 0.1%
35.0 | 14 | 0.1%
5.0 | 14 | 0.1%
8.0 | 14 | 0.1%
50.0 | 13 | 0.1%
26.0 | 13 | 0.1%
Other values (6516) | 8640 | 86.4%
(Missing) | 1208 | 12.1%

Value | Count | Frequency (%)
1.0 | 12 | 0.1%
2.0 | 9 | 0.1%
3.0 | 8 | 0.1%
4.0 | 7 | 0.1%
5.0 | 14 | 0.1%
6.0 | 10 | 0.1%
7.0 | 10 | 0.1%
8.0 | 14 | 0.1%
9.0 | 12 | 0.1%
10.0 | 21 | 0.2%

Value | Count | Frequency (%)
10544000.0 | 1 | < 0.1%
8813640.0 | 1 | < 0.1%
8745280.0 | 1 | < 0.1%
8146880.0 | 1 | < 0.1%
7620520.0 | 1 | < 0.1%
7564800.0 | 1 | < 0.1%
7338040.0 | 1 | < 0.1%
7292520.0 | 1 | < 0.1%
7179160.0 | 1 | < 0.1%
7098800.0 | 1 | < 0.1%

Interactions


Correlations

Variable | 년도 | 품목명 | 업무구분명 | 수량단위 | 수량
년도 | 1.000 | 0.828 | 0.581 | 0.414 | 0.289
품목명 | 0.828 | 1.000 | 0.925 | 0.435 | 0.284
업무구분명 | 0.581 | 0.925 | 1.000 | 0.929 | 0.235
수량단위 | 0.414 | 0.435 | 0.929 | 1.000 | 0.066
수량 | 0.289 | 0.284 | 0.235 | 0.066 | 1.000

Variable | 업무구분명 | 수량단위
업무구분명 | 1.000 | 0.816
수량단위 | 0.816 | 1.000

Variable | 년도 | 수량 | 업무구분명 | 수량단위
년도 | 1.000 | -0.361 | 0.260 | 0.264
수량 | -0.361 | 1.000 | 0.097 | 0.039
업무구분명 | 0.260 | 0.097 | 1.000 | 0.816
수량단위 | 0.264 | 0.039 | 0.816 | 1.000
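
The extracted text does not say which coefficient each matrix above uses. One common measure of association between two categorical columns such as 업무구분명 and 수량단위 is Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. The sketch below is a plain-Python version without bias correction, run on hypothetical toy data; it is not a reconstruction of the report's own computation.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column has no measurable association
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Hypothetical toy data: every task type maps to exactly one unit.
tasks = ["bagged-40", "bagged-800", "bulk"] * 10
units = ["40kg", "800kg", "40kg"] * 10
print(round(cramers_v(tasks, units), 3))
```

When one column fully determines the other, as in the toy data, the value reaches its maximum; values near 0 indicate no association.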

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 실적구분명 | 년도 | 품목명 | 행정구역명 | 업무구분명 | 수량단위 | 수량
17404 | 실적 | 2002 | 논콩(대립종) | 경상북도 영주시 | 잡곡검사(콩 | <NA> | <NA>
7538 | 실적 | 2012 | 삼광벼 | 충청남도 예산군 | 공공비축벼 검사(산물) | 40kg/대 | 3157.0
2355 | 실적 | 2018 | 오대벼 | 강원도 삼척시 | 공공비축벼 포대벼검사(800kg) | 800kg/대 | 80.0
13840 | 실적 | 2006 | 밭콩(대립종) | 전라남도 무안군 | 잡곡검사(콩 | <NA> | <NA>
14407 | 실적 | 2006 | 화영벼 | 경상북도 포항시 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 3898.0
1066 | 실적 | 2019 | 삼광벼 | 충청북도 괴산군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 6392.0
12686 | 실적 | 2007 | 밭콩(대립종) | 충청북도 괴산군 | 잡곡검사(콩 | <NA> | <NA>
1767 | 실적 | 2019 | 해담쌀벼 | 전라북도 진안군 | 공공비축벼 포대벼검사(800kg) | 800kg/대 | 864.0
10181 | 실적 | 2009 | 동진1호벼 | 충청남도 청양군 | 공공비축벼 포대벼검사(800kg) | 800kg/대 | 101.0
16433 | 실적 | 2004 | 주남벼 | 경상북도 칠곡군 | 공공비축벼 검사(산물) | 40kg/대 | 597640.0

Index | 실적구분명 | 년도 | 품목명 | 행정구역명 | 업무구분명 | 수량단위 | 수량
12365 | 실적 | 2007 | 논콩(소립종) | 전라남도 신안군 | 잡곡검사(콩 | <NA> | <NA>
5200 | 실적 | 2015 | 운광벼 | 경상북도 의성군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 73236.0
17166 | 실적 | 2003 | 밭콩(소립종) | 전라남도 완도군 | 잡곡검사(콩 | <NA> | <NA>
10285 | 실적 | 2009 | 삼광벼 | 세종특별자치시 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 35348.0
2745 | 실적 | 2017 | 삼광벼 | 경상북도 경산시 | 공공비축벼 포대벼검사(800kg) | 800kg/대 | 101.0
18618 | 실적 | 2000 | 겉보리종자 | 경상북도 고령군 | 하곡검사 포대벼검사(40kg) | 40kg/대 | 6086.0
17982 | 실적 | 2002 | 쌀보리종자 | 전라북도 김제시 | 하곡검사 (산물) | 40kg/대 | 840760.0
361 | 실적 | 2020 | 새일미벼 | 전라북도 익산시 | 공공비축벼 포대벼검사(800kg) | 800kg/대 | 2707.0
17430 | 실적 | 2002 | 논콩(대립종) | 전라남도 장흥군 | 잡곡검사(콩 | <NA> | <NA>
978 | 실적 | 2019 | 삼광벼 | 경상북도 성주군 | 공공비축벼 포대벼검사(40kg) | 40kg/대 | 5467.0

Duplicate rows

Most frequently occurring

Index | 실적구분명 | 년도 | 품목명 | 행정구역명 | 업무구분명 | 수량단위 | 수량 | # duplicates
0 | 실적 | 2017 | | 전라북도 익산시 | 잡곡검사(콩 | <NA> | <NA> | 2
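
Duplicate rows like the one listed above can be found by counting identical records. The following is a minimal sketch using the standard library; the rows are taken from the tables in this report, with `None` standing in for cells that appear empty or missing, which is an assumption about how the underlying data encodes them.

```python
from collections import Counter

# Each row is a tuple in column order:
# 실적구분명, 년도, 품목명, 행정구역명, 업무구분명, 수량단위, 수량
rows = [
    ("실적", 2017, None, "전라북도 익산시", "잡곡검사(콩", "<NA>", None),
    ("실적", 2017, None, "전라북도 익산시", "잡곡검사(콩", "<NA>", None),
    ("실적", 2019, "삼광벼", "경상북도 성주군", "공공비축벼 포대벼검사(40kg)", "40kg/대", 5467.0),
]

# Count identical rows and keep those that occur more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(n, row)
```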