Overview

Dataset statistics

Number of variables: 7
Number of observations: 8831
Missing cells: 120
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 517.6 KiB
Average record size in memory: 60.0 B
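
The percentage and per-record figures follow from the raw counts. The sketch below redoes that arithmetic under two assumptions that the report does not state explicitly: missing cells are counted over all rows × columns, and 1 KiB is 1024 bytes.

```python
# Values copied from the dataset statistics above.
n_variables = 7
n_observations = 8831
missing_cells = 120
total_size_kib = 517.6

# Share of missing cells over the whole table (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (0.2% and 60.0 B).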

Variable types

Numeric: 4
Text: 1
Categorical: 2

Dataset

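Description: Inspection results for rice and other milled grains (정곡) managed by the National Agricultural Products Quality Management Service, covering application year (신청년도), city/county (시군), production year (연산), use (용도), country of origin (원산지), inspected quantity (검사수량), and related fields.
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690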

Alerts

High correlation: 신청년도 is highly overall correlated with 연산
High correlation: 연산 is highly overall correlated with 신청년도
Imbalance: 용도 is highly imbalanced (95.9%)
Missing: 연산 has 120 (1.4%) missing values
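
The 95.9% imbalance figure is consistent with an entropy-based score of the form 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. That formula is an assumption here, not something the report states; the sketch below applies it to the 용도 counts from the Common Values table.

```python
import math

# Category counts for 용도, copied from its Common Values table.
counts = {"정곡": 8771, "<NA>": 31, "대북": 29}

def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, close to 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0  # a single class has no split to measure
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)

score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

With 8771 of 8831 rows in one class the entropy is only about 0.065 bits out of a possible log2(3) ≈ 1.585, which is what pushes the score so close to 100%.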

Reproduction

Analysis started: 2024-03-23 07:43:39.420861
Analysis finished: 2024-03-23 07:43:45.241771
Duration: 5.82 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
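
The reported duration is simply the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2024-03-23 07:43:39.420861")
finished = datetime.fromisoformat("2024-03-23 07:43:45.241771")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```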

Variables

신청년도
Real number (ℝ)

Alerts: High correlation

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.718
Minimum: 2009
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2012
Median: 2016
Q3: 2019
95-th percentile: 2022
Maximum: 2022
Range: 13
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.9734954
Coefficient of variation (CV): 0.0019712556
Kurtosis: -1.1671541
Mean: 2015.718
Median Absolute Deviation (MAD): 3
Skewness: -0.058006551
Sum: 17800806
Variance: 15.788666
Monotonicity: Decreasing
[Figure: histogram with fixed size bins (bins=14)]
Common values
Value              Count   Frequency (%)
2019               737     8.3%
2015               716     8.1%
2014               705     8.0%
2022               694     7.9%
2018               670     7.6%
2016               627     7.1%
2021               625     7.1%
2020               603     6.8%
2017               600     6.8%
2011               599     6.8%
Other values (4)   2255    25.5%

Minimum 10 values
Value   Count   Frequency (%)
2009    535     6.1%
2010    588     6.7%
2011    599     6.8%
2012    552     6.3%
2013    580     6.6%
2014    705     8.0%
2015    716     8.1%
2016    627     7.1%
2017    600     6.8%
2018    670     7.6%

Maximum 10 values
Value   Count   Frequency (%)
2022    694     7.9%
2021    625     7.1%
2020    603     6.8%
2019    737     8.3%
2018    670     7.6%
2017    600     6.8%
2016    627     7.1%
2015    716     8.1%
2014    705     8.0%
2013    580     6.6%
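
Because 신청년도 has only 14 distinct values, the ascending and descending value tables above cover every year between them, so the summary statistics can be rebuilt from the counts alone. The sketch below does that; using the n − 1 denominator for the variance is an assumption, chosen because it reproduces the reported figure.

```python
# Counts per 신청년도, merged from the ascending and descending value tables.
year_counts = {
    2009: 535, 2010: 588, 2011: 599, 2012: 552, 2013: 580,
    2014: 705, 2015: 716, 2016: 627, 2017: 600, 2018: 670,
    2019: 737, 2020: 603, 2021: 625, 2022: 694,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n

# Sample variance (n - 1 denominator), assumed to match the report.
variance = sum(c * (y - mean) ** 2 for y, c in year_counts.items()) / (n - 1)

def value_at(counts, position):
    """Return the value at a 0-based position in the sorted, expanded data."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if position < seen:
            return value
    raise IndexError(position)

median = value_at(year_counts, n // 2)  # n is odd, so there is one middle element

print(n, total, round(mean, 3), round(variance, 4), median)
```

The rebuilt count, sum, mean, variance, and median agree with the values reported for this variable (8831, 17800806, 2015.718, 15.788666, 2016).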

시군
Text

Distinct: 100
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1174272
Min length: 7

Characters and Unicode

Total characters: 71685
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
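
As a small illustration of those character properties, Python's standard `unicodedata` module reports the general category of each code point. Applied to the first sample value of 시군, it yields the same two categories the report lists: `Lo` (Other Letter) for the Hangul syllables and `Zs` (Space Separator) for the space.

```python
import unicodedata
from collections import Counter

sample = "강원도 강릉시"  # first sample row of 시군

# General category per character: 'Lo' = Other Letter, 'Zs' = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in sample)
print(dict(categories))

# The per-category totals in the report add up to the total character count.
other_letter, space_separator, total_characters = 62279, 9406, 71685
print(other_letter + space_separator == total_characters)
```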

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 강원도 강릉시
2nd row: 강원도 강릉시
3rd row: 강원도 강릉시
4th row: 강원도 강릉시
5th row: 강원도 고성군

Value                Count   Frequency (%)
전라남도             1689    9.3%
경상북도             1611    8.8%
전라북도             1019    5.6%
경상남도             935     5.1%
경기도               921     5.1%
충청남도             849     4.7%
충청북도             758     4.2%
강원도               644     3.5%
북구                 160     0.9%
논산시               151     0.8%
Other values (108)   9495    52.1%

Most occurring characters

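Value               Count   Frequency (%)
(space)             9406    13.1%
[not preserved]     8483    11.8%
[not preserved]     4595    6.4%
[not preserved]     4356    6.1%
[not preserved]     3808    5.3%
[not preserved]     3722    5.2%
[not preserved]     3548    4.9%
[not preserved]     2874    4.0%
[not preserved]     2708    3.8%
[not preserved]     2666    3.7%
Other values (84)   25519   35.6%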

Most occurring categories

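Value             Count   Frequency (%)
Other Letter      62279   86.9%
Space Separator   9406    13.1%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
[not preserved]     8483    13.6%
[not preserved]     4595    7.4%
[not preserved]     4356    7.0%
[not preserved]     3808    6.1%
[not preserved]     3722    6.0%
[not preserved]     3548    5.7%
[not preserved]     2874    4.6%
[not preserved]     2708    4.3%
[not preserved]     2666    4.3%
[not preserved]     1930    3.1%
Other values (83)   23589   37.9%

Space Separator
Value     Count   Frequency (%)
(space)   9406    100.0%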

Most occurring scripts

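Value    Count   Frequency (%)
Hangul   62279   86.9%
Common   9406    13.1%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
[not preserved]     8483    13.6%
[not preserved]     4595    7.4%
[not preserved]     4356    7.0%
[not preserved]     3808    6.1%
[not preserved]     3722    6.0%
[not preserved]     3548    5.7%
[not preserved]     2874    4.6%
[not preserved]     2708    4.3%
[not preserved]     2666    4.3%
[not preserved]     1930    3.1%
Other values (83)   23589   37.9%

Common
Value     Count   Frequency (%)
(space)   9406    100.0%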

Most occurring blocks

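Value    Count   Frequency (%)
Hangul   62279   86.9%
ASCII    9406    13.1%

Most frequent character per block

ASCII
Value     Count   Frequency (%)
(space)   9406    100.0%

Hangul
Value               Count   Frequency (%)
[not preserved]     8483    13.6%
[not preserved]     4595    7.4%
[not preserved]     4356    7.0%
[not preserved]     3808    6.1%
[not preserved]     3722    6.0%
[not preserved]     3548    5.7%
[not preserved]     2874    4.6%
[not preserved]     2708    4.3%
[not preserved]     2666    4.3%
[not preserved]     1930    3.1%
Other values (83)   23589   37.9%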

시군구코드
Real number (ℝ)

Distinct: 46
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 502.24018
Minimum: 110
Maximum: 900
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 110
5-th percentile: 113
Q1: 170
Median: 610
Q3: 790
95-th percentile: 870
Maximum: 900
Range: 790
Interquartile range (IQR): 620

Descriptive statistics

Standard deviation: 302.20918
Coefficient of variation (CV): 0.60172244
Kurtosis: -1.8130431
Mean: 502.24018
Median Absolute Deviation (MAD): 260
Skewness: -0.12875996
Sum: 4435283
Variance: 91330.391
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=46)]
Common values
Value               Count   Frequency (%)
150                 531     6.0%
720                 439     5.0%
230                 423     4.8%
130                 420     4.8%
770                 412     4.7%
170                 366     4.1%
820                 360     4.1%
113                 319     3.6%
210                 292     3.3%
730                 281     3.2%
Other values (36)   4988    56.5%

Minimum 10 values
Value   Count   Frequency (%)
110     157     1.8%
113     319     3.6%
121     132     1.5%
130     420     4.8%
131     123     1.4%
140     176     2.0%
150     531     6.0%
170     366     4.1%
171     1       < 0.1%
180     86      1.0%

Maximum 10 values
Value   Count   Frequency (%)
900     134     1.5%
890     74      0.8%
880     98      1.1%
870     214     2.4%
860     147     1.7%
850     69      0.8%
840     226     2.6%
830     203     2.3%
820     360     4.1%
810     235     2.7%

연산
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 19
Distinct (%): 0.2%
Missing: 120
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.6799
Minimum: 2004
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 2004
5-th percentile: 2007
Q1: 2010
Median: 2014
Q3: 2017
95-th percentile: 2020
Maximum: 2022
Range: 18
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.1684905
Coefficient of variation (CV): 0.0020700859
Kurtosis: -0.96991186
Mean: 2013.6799
Median Absolute Deviation (MAD): 3
Skewness: -0.11124217
Sum: 17541166
Variance: 17.376313
Monotonicity: Not monotonic
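
Since 연산 has 120 missing values, its mean is taken over the 8711 non-missing rows rather than all 8831. The short check below shows that only the reduced denominator reproduces the reported mean; the variable names are illustrative.

```python
# Values copied from the 연산 statistics above.
n_observations = 8831
n_missing = 120
reported_sum = 17541166

n_valid = n_observations - n_missing
missing_pct = 100 * n_missing / n_observations

mean_over_valid = reported_sum / n_valid         # skips missing rows
mean_over_all = reported_sum / n_observations    # wrong: counts missing rows as data

print(f"{missing_pct:.1f}% missing, {n_valid} valid rows")
print(f"mean over valid rows: {mean_over_valid:.4f}")
print(f"mean over all rows:   {mean_over_all:.4f}")
```

Dividing by all rows gives a value below the column minimum of 2004, which is a quick sign that the denominator is wrong.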
[Figure: histogram with fixed size bins (bins=19)]

Common values
Value              Count   Frequency (%)
2016               725     8.2%
2018               711     8.1%
2014               680     7.7%
2012               673     7.6%
2011               635     7.2%
2013               634     7.2%
2008               611     6.9%
2015               586     6.6%
2009               585     6.6%
2020               539     6.1%
Other values (9)   2332    26.4%

Minimum 10 values
Value   Count   Frequency (%)
2004    3       < 0.1%
2005    189     2.1%
2006    132     1.5%
2007    224     2.5%
2008    611     6.9%
2009    585     6.6%
2010    488     5.5%
2011    635     7.2%
2012    673     7.6%
2013    634     7.2%

Maximum 10 values
Value   Count   Frequency (%)
2022    1       < 0.1%
2021    231     2.6%
2020    539     6.1%
2019    532     6.0%
2018    711     8.1%
2017    532     6.0%
2016    725     8.2%
2015    586     6.6%
2014    680     7.7%
2013    634     7.2%

용도
Categorical

Alerts: Imbalance

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.0070207
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value                   Count   Frequency (%)
정곡 (milled grain)     8771    99.3%
<NA>                    31      0.4%
대북 (for North Korea)  29      0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values: 정곡 8771 (99.3%), <NA> 31 (0.4%), 대북 29 (0.3%)]

원산지
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.0483524
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 중국
2nd row: 미국
3rd row: 중국
4th row: 국산
5th row: 국산

Common Values

Value                  Count   Frequency (%)
국산 (domestic)        4102    46.5%
중국 (China)           1831    20.7%
미국 (United States)   1592    18.0%
태국 (Thailand)        734     8.3%
베트남 (Vietnam)       343     3.9%
호주 (Australia)       152     1.7%
<NA>                   41      0.5%
인도 (India)           35      0.4%
파키스탄 (Pakistan)    1       < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values: 국산 4102, 중국 1831, 미국 1592, 태국 734, 베트남 343, 호주 152, <NA> 41, 인도 35, 파키스탄 1]

검사수량
Real number (ℝ)

Distinct: 6065
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 946793.4
Minimum: 0
Maximum: 20756080
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 11060
Q1: 115160
Median: 375600
Q3: 1120235
95-th percentile: 3649675
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 1005075

Descriptive statistics

Standard deviation: 1560749
Coefficient of variation (CV): 1.6484579
Kurtosis: 29.379719
Mean: 946793.4
Median Absolute Deviation (MAD): 324720
Skewness: 4.349752
Sum: 8.3611325 × 10^9
Variance: 2.4359376 × 10^12
Monotonicity: Not monotonic
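
Several of the 검사수량 figures are derived from others in the same block. The sketch below recomputes the range, IQR, coefficient of variation, sum, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. Because the inputs are already rounded, the results are compared with a small tolerance rather than for exact equality.

```python
import math

# Values copied from the 검사수량 statistics above (already rounded by the report).
minimum, q1, q3, maximum = 0, 115160, 1120235, 20756080
mean, std = 946793.4, 1560749
n_observations = 8831

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
total = mean * n_observations     # Sum, rebuilt from the mean
variance = std ** 2               # Variance, rebuilt from the standard deviation

print(value_range, iqr)
print(f"CV = {cv:.4f}, sum = {total:.4e}, variance = {variance:.4e}")
print(math.isclose(total, 8.3611325e9, rel_tol=1e-6))
```

A CV above 1 together with a mean (946793.4) far above the median (375600) matches the strong right skew the report gives for this variable (skewness 4.35).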
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value                 Count   Frequency (%)
100000.0              116     1.3%
200000.0              79      0.9%
50000.0               65      0.7%
150000.0              48      0.5%
10000.0               44      0.5%
30000.0               42      0.5%
20000.0               37      0.4%
60000.0               37      0.4%
80000.0               36      0.4%
40000.0               35      0.4%
Other values (6055)   8292    93.9%

Minimum 10 values
Value   Count   Frequency (%)
0.0     2       < 0.1%
20.0    1       < 0.1%
40.0    4       < 0.1%
80.0    3       < 0.1%
120.0   1       < 0.1%
160.0   1       < 0.1%
200.0   1       < 0.1%
320.0   3       < 0.1%
360.0   4       < 0.1%
400.0   4       < 0.1%

Maximum 10 values
Value        Count   Frequency (%)
20756080.0   1       < 0.1%
20447000.0   1       < 0.1%
20075000.0   1       < 0.1%
18161280.0   1       < 0.1%
17180400.0   1       < 0.1%
16235000.0   1       < 0.1%
15599360.0   1       < 0.1%
15247240.0   1       < 0.1%
15087600.0   1       < 0.1%
15001000.0   1       < 0.1%

Interactions

[Figure: 16 interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]
             신청년도  시군    시군구코드  연산    용도    원산지  검사수량
신청년도     1.000     0.060   0.000       0.917   0.231   0.248   0.242
시군         0.060     1.000   1.000       0.162   0.000   0.313   0.210
시군구코드   0.000     1.000   1.000       0.023   0.000   0.120   0.039
연산         0.917     0.162   0.023       1.000   0.301   0.336   0.214
용도         0.231     0.000   0.000       0.301   1.000   0.061   0.000
원산지       0.248     0.313   0.120       0.336   0.061   1.000   0.203
검사수량     0.242     0.210   0.039       0.214   0.000   0.203   1.000

[Figure: correlation heatmap]
         용도    원산지
용도     1.000   0.046
원산지   0.046   1.000

[Figure: correlation heatmap]
             신청년도  시군구코드  연산    검사수량  용도    원산지
신청년도     1.000     0.038       0.961   0.107     0.119   0.122
시군구코드   0.038     1.000       0.025   0.007     0.000   0.059
연산         0.961     0.025       1.000   0.105     0.231   0.167
검사수량     0.107     0.007       0.105   1.000     0.000   0.098
용도         0.119     0.000       0.231   0.000     1.000   0.046
원산지       0.122     0.059       0.167   0.098     0.046   1.000
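
To see which variable pairs stand out in the first matrix, one option is to scan its upper triangle for coefficients at or above a cut-off. The sketch below does this with a threshold of 0.5, which is a hypothetical value picked for illustration and not a setting taken from this report.

```python
from itertools import combinations

# First correlation matrix above, row order = column order.
names = ["신청년도", "시군", "시군구코드", "연산", "용도", "원산지", "검사수량"]
matrix = [
    [1.000, 0.060, 0.000, 0.917, 0.231, 0.248, 0.242],
    [0.060, 1.000, 1.000, 0.162, 0.000, 0.313, 0.210],
    [0.000, 1.000, 1.000, 0.023, 0.000, 0.120, 0.039],
    [0.917, 0.162, 0.023, 1.000, 0.301, 0.336, 0.214],
    [0.231, 0.000, 0.000, 0.301, 1.000, 0.061, 0.000],
    [0.248, 0.313, 0.120, 0.336, 0.061, 1.000, 0.203],
    [0.242, 0.210, 0.039, 0.214, 0.000, 0.203, 1.000],
]

THRESHOLD = 0.5  # hypothetical cut-off, for illustration only

def high_pairs(names, matrix, threshold):
    """List off-diagonal pairs whose absolute coefficient reaches the threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

pairs = high_pairs(names, matrix, THRESHOLD)
for left, right, value in pairs:
    print(f"{left} ~ {right}: {value:.3f}")
```

Two pairs pass this cut-off: 신청년도 with 연산 (0.917) and 시군 with 시군구코드 (1.000). Only the first is listed under Alerts in this report.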

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
Index  신청년도  시군           시군구코드  연산  용도  원산지  검사수량
0      2022      강원도 강릉시  150         2019  정곡  중국    148000.0
1      2022      강원도 강릉시  150         2020  정곡  미국    1089680.0
2      2022      강원도 강릉시  150         2020  정곡  중국    1174000.0
3      2022      강원도 강릉시  150         2021  정곡  국산    774580.0
4      2022      강원도 고성군  820         2019  정곡  국산    51000.0
5      2022      강원도 고성군  820         2020  정곡  미국    593000.0
6      2022      강원도 고성군  820         2020  정곡  중국    1135000.0
7      2022      강원도 고성군  820         2021  정곡  국산    1211820.0
8      2022      강원도 삼척시  230         2019  정곡  중국    81600.0
9      2022      강원도 삼척시  230         2020  정곡  미국    1292000.0

Last rows
Index  신청년도  시군                    시군구코드  연산  용도  원산지  검사수량
8821   2009      충청북도 청주시 흥덕구  113         <NA>  정곡  중국    32000.0
8822   2009      충청북도 청주시 흥덕구  113         <NA>  정곡  태국    7920.0
8823   2009      충청북도 청주시 흥덕구  113         <NA>  정곡  <NA>    247760.0
8824   2009      충청북도 충주시         130         2005  정곡  국산    72440.0
8825   2009      충청북도 충주시         130         2007  정곡  국산    32000.0
8826   2009      충청북도 충주시         130         2008  정곡  국산    1401240.0
8827   2009      충청북도 충주시         130         2008  정곡  미국    440280.0
8828   2009      충청북도 충주시         130         2008  정곡  중국    388920.0
8829   2009      충청북도 충주시         130         2008  정곡  태국    51040.0
8830   2009      충청북도 충주시         130         <NA>  정곡  <NA>    26440.0