Overview

Dataset statistics

Number of variables: 6
Number of observations: 8449
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 420.9 KiB
Average record size in memory: 51.0 B

Variable types

Numeric: 3
Text: 1
Categorical: 2

Dataset

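Description: Inspection results for rice and other polished grain, managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원). Fields include application year (신청년도), city/county (시군), production year (연산), use (용도), country of origin (원산지), and inspected quantity (검사수량).
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690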

Alerts

High correlation: 신청년도 is highly overall correlated with 연산
High correlation: 연산 is highly overall correlated with 신청년도
Imbalance: 용도 is highly imbalanced (95.7%)
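
The 95.7% imbalance figure for 용도 can be reproduced from the three category counts this report lists for that variable (8389, 31, 29). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum, log2 of the number of classes. The function name `imbalance_score` is illustrative and is not taken from the library's API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Returns 0.0 when there are fewer than two classes (assumed convention).
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits, skipping empty classes
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

yongdo_counts = [8389, 31, 29]  # 정곡, <NA>, 대북
score = imbalance_score(yongdo_counts)
print(f"{score:.1%}")
```

With one category holding 99.3% of the rows, the entropy is close to zero, which is why the score sits near its upper bound.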

Reproduction

Analysis started: 2024-03-23 07:42:57.737038
Analysis finished: 2024-03-23 07:43:01.611474
Duration: 3.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
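
The reported duration follows directly from the two timestamps. A minimal check using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block
started = datetime.fromisoformat("2024-03-23 07:42:57.737038")
finished = datetime.fromisoformat("2024-03-23 07:43:01.611474")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```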

Variables

신청년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.8919
Minimum: 2009
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 2009
5-th percentile: 2010
Q1: 2013
Median: 2016
Q3: 2019
95-th percentile: 2022
Maximum: 2022
Range: 13
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.7548633
Coefficient of variation (CV): 0.0018626313
Kurtosis: -1.164884
Mean: 2015.8919
Median Absolute Deviation (MAD): 3
Skewness: -0.023858296
Sum: 17032271
Variance: 14.098999
Monotonicity: Decreasing
Histogram with fixed size bins (bins=14)

Common values

Value             Count  Frequency (%)
2019              737    8.7%
2015              720    8.5%
2014              720    8.5%
2018              669    7.9%
2010              648    7.7%
2011              628    7.4%
2016              627    7.4%
2021              626    7.4%
2020              603    7.1%
2017              598    7.1%
Other values (4)  1873   22.2%

Minimum 10 values

Value  Count  Frequency (%)
2009   116    1.4%
2010   648    7.7%
2011   628    7.4%
2012   587    6.9%
2013   586    6.9%
2014   720    8.5%
2015   720    8.5%
2016   627    7.4%
2017   598    7.1%
2018   669    7.9%

Maximum 10 values

Value  Count  Frequency (%)
2022   584    6.9%
2021   626    7.4%
2020   603    7.1%
2019   737    8.7%
2018   669    7.9%
2017   598    7.1%
2016   627    7.4%
2015   720    8.5%
2014   720    8.5%
2013   586    6.9%
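
Because 신청년도 takes only 14 distinct values, the value counts listed above cover every year, so the summary statistics can be rechecked from the frequency table alone. This is a minimal sketch, assuming the reported variance is the sample variance (divisor n - 1). The variable names are illustrative.

```python
import math

# Rows per 신청년도, taken from the value counts listed above
year_counts = {
    2009: 116, 2010: 648, 2011: 628, 2012: 587, 2013: 586,
    2014: 720, 2015: 720, 2016: 627, 2017: 598, 2018: 669,
    2019: 737, 2020: 603, 2021: 626, 2022: 584,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n

# Sample variance with divisor n - 1 (assumed convention)
variance = sum(count * (year - mean) ** 2
               for year, count in year_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)
print(round(mean, 4), round(variance, 4), round(std, 4))
```

The counts add up to the 8449 observations and the weighted sum matches the reported Sum of 17032271, which suggests the frequency table is complete.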

시군
Text

Distinct: 100
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1109007
Min length: 7

Characters and Unicode

Total characters: 68529
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
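
As an illustration of these character properties, Python's standard `unicodedata` module reports the general category of each code point. The sketch below classifies the first sample value of 시군. `Lo` is the abbreviation for "Other Letter" and `Zs` for "Space Separator". This only shows what the categories mean and is not necessarily how the report computes them.

```python
import unicodedata
from collections import Counter

sample = "강원도 강릉시"  # first sample row of 시군

# Map every character to its Unicode general category and tally them
categories = Counter(unicodedata.category(ch) for ch in sample)

for code, count in categories.items():
    print(code, count)
```

Hangul syllables fall under `Lo`, and the single ASCII space falls under `Zs`, matching the two categories reported for this column.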

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 강원도 강릉시
2nd row: 강원도 강릉시
3rd row: 강원도 강릉시
4th row: 강원도 고성군
5th row: 강원도 고성군

Words

Value               Count  Frequency (%)
전라남도            1641   9.4%
경상북도            1540   8.8%
전라북도            967    5.5%
경기도              894    5.1%
경상남도            883    5.1%
충청남도            808    4.6%
충청북도            725    4.2%
강원도              624    3.6%
북구                152    0.9%
논산시              142    0.8%
Other values (108)  9063   52.0%

Most occurring characters

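Value              Count  Frequency (%)
(space)            8995   13.1%
(not recovered)    8138   11.9%
(not recovered)    4381   6.4%
(not recovered)    4190   6.1%
(not recovered)    3655   5.3%
(not recovered)    3564   5.2%
(not recovered)    3384   4.9%
(not recovered)    2761   4.0%
(not recovered)    2608   3.8%
(not recovered)    2535   3.7%
Other values (84)  24318  35.5%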

Most occurring categories

Value            Count  Frequency (%)
Other Letter     59534  86.9%
Space Separator  8995   13.1%

Most frequent character per category

Other Letter
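Value              Count  Frequency (%)
(not recovered)    8138   13.7%
(not recovered)    4381   7.4%
(not recovered)    4190   7.0%
(not recovered)    3655   6.1%
(not recovered)    3564   6.0%
(not recovered)    3384   5.7%
(not recovered)    2761   4.6%
(not recovered)    2608   4.4%
(not recovered)    2535   4.3%
(not recovered)    1840   3.1%
Other values (83)  22478  37.8%
Space Separator
Value    Count  Frequency (%)
(space)  8995   100.0%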

Most occurring scripts

Value   Count  Frequency (%)
Hangul  59534  86.9%
Common  8995   13.1%

Most frequent character per script

Hangul
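Value              Count  Frequency (%)
(not recovered)    8138   13.7%
(not recovered)    4381   7.4%
(not recovered)    4190   7.0%
(not recovered)    3655   6.1%
(not recovered)    3564   6.0%
(not recovered)    3384   5.7%
(not recovered)    2761   4.6%
(not recovered)    2608   4.4%
(not recovered)    2535   4.3%
(not recovered)    1840   3.1%
Other values (83)  22478  37.8%
Common
Value    Count  Frequency (%)
(space)  8995   100.0%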

Most occurring blocks

Value   Count  Frequency (%)
Hangul  59534  86.9%
ASCII   8995   13.1%

Most frequent character per block

ASCII
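Value    Count  Frequency (%)
(space)  8995   100.0%
Hangul
Value              Count  Frequency (%)
(not recovered)    8138   13.7%
(not recovered)    4381   7.4%
(not recovered)    4190   7.0%
(not recovered)    3655   6.1%
(not recovered)    3564   6.0%
(not recovered)    3384   5.7%
(not recovered)    2761   4.6%
(not recovered)    2608   4.4%
(not recovered)    2535   4.3%
(not recovered)    1840   3.1%
Other values (83)  22478  37.8%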

연산
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.7492
Minimum: 2005
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 2005
5-th percentile: 2007
Q1: 2011
Median: 2014
Q3: 2017
95-th percentile: 2020
Maximum: 2021
Range: 16
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.0218031
Coefficient of variation (CV): 0.0019971718
Kurtosis: -0.9284781
Mean: 2013.7492
Median Absolute Deviation (MAD): 3
Skewness: -0.10491945
Sum: 17014167
Variance: 16.1749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
2016              722    8.5%
2018              710    8.4%
2014              682    8.1%
2012              680    8.0%
2011              651    7.7%
2013              640    7.6%
2009              617    7.3%
2015              585    6.9%
2010              534    6.3%
2017              532    6.3%
Other values (7)  2096   24.8%

Minimum 10 values

Value  Count  Frequency (%)
2005   145    1.7%
2006   118    1.4%
2007   178    2.1%
2008   460    5.4%
2009   617    7.3%
2010   534    6.3%
2011   651    7.7%
2012   680    8.0%
2013   640    7.6%
2014   682    8.1%

Maximum 10 values

Value  Count  Frequency (%)
2021   172    2.0%
2020   526    6.2%
2019   497    5.9%
2018   710    8.4%
2017   532    6.3%
2016   722    8.5%
2015   585    6.9%
2014   682    8.1%
2013   640    7.6%
2012   680    8.0%

용도
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB
Value  Count
정곡   8389
<NA>   31
대북   29

Length

Max length: 4
Median length: 2
Mean length: 2.0073381
Min length: 2
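
These length statistics help interpret the `<NA>` category. With Missing reported as 0 and a maximum length of 4, `<NA>` appears to be stored as a literal four-character string rather than as a null. The quick check below uses the category counts from this section and reproduces the reported mean length under that assumption.

```python
# Category counts for 용도; "<NA>" is treated as a literal 4-character label
counts = {"정곡": 8389, "<NA>": 31, "대북": 29}

total_rows = sum(counts.values())
total_chars = sum(len(label) * count for label, count in counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, f"{mean_length:.7f}")
```

If the 31 `<NA>` rows were true nulls, they would be reported under Missing and would not contribute a length of 4.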

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value  Count  Frequency (%)
정곡   8389   99.3%
<NA>   31     0.4%
대북   29     0.3%

Length

Histogram of lengths of the category

Common Values (Plot)


Words

Value  Count  Frequency (%)
정곡   8389   99.3%
na     31     0.4%
대북   29     0.3%

원산지
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB
Value              Count
국산               3940
중국               1573
미국               1468
태국               667
베트남             336
Other values (12)  465

Length

Max length: 8
Median length: 2
Mean length: 2.1767073
Min length: 2

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: 국산
2nd row: 미국
3rd row: 중국
4th row: 국산
5th row: 중국

Common Values

Value             Count  Frequency (%)
국산              3940   46.6%
중국              1573   18.6%
미국              1468   17.4%
태국              667    7.9%
베트남            336    4.0%
중국(미국)        262    3.1%
호주              142    1.7%
인도              34     0.4%
중국(태국)        14     0.2%
중국(호주)        4      < 0.1%
Other values (7)  9      0.1%

Length

Histogram of lengths of the category

Words

Value             Count  Frequency (%)
국산              3940   46.6%
중국              1573   18.6%
미국              1468   17.4%
태국              667    7.9%
베트남            336    4.0%
중국(미국         262    3.1%
호주              142    1.7%
인도              34     0.4%
중국(태국         14     0.2%
중국(호주         4      < 0.1%
Other values (7)  9      0.1%

검사수량
Real number (ℝ)

Distinct: 5748
Distinct (%): 68.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 927355.47
Minimum: 0
Maximum: 20756080
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10320
Q1: 110000
Median: 358080
Q3: 1083600
95-th percentile: 3620876
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 973600

Descriptive statistics

Standard deviation: 1563524.3
Coefficient of variation (CV): 1.6860032
Kurtosis: 30.14651
Mean: 927355.47
Median Absolute Deviation (MAD): 308680
Skewness: 4.4371391
Sum: 7.8352263 × 10^9
Variance: 2.4446083 × 10^12
Monotonicity: Not monotonic
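
Several of the 검사수량 statistics are derived from others in the same block, so they can be cross-checked with simple arithmetic. The sketch below recomputes the coefficient of variation, variance, IQR, range, and an approximate sum from the rounded figures reported here. Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the 검사수량 statistics (already rounded in the report)
n = 8449
mean = 927355.47
std = 1563524.3
q1, q3 = 110000, 1083600
minimum, maximum = 0, 20756080

cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the square of the standard deviation
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum
approx_sum = mean * n        # sum recovered from mean and row count

print(f"CV={cv:.4f} variance={variance:.4e} sum={approx_sum:.4e}")
print(f"IQR={iqr} range={value_range}")
```

A standard deviation larger than the mean (CV above 1), together with the strongly positive skewness, points to a long right tail: most inspections cover comparatively small quantities, with a few very large ones.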
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
100000.0             113    1.3%
200000.0             73     0.9%
50000.0              73     0.9%
150000.0             47     0.6%
30000.0              44     0.5%
10000.0              43     0.5%
20000.0              39     0.5%
40000.0              38     0.4%
60000.0              34     0.4%
300000.0             34     0.4%
Other values (5738)  7911   93.6%

Minimum 10 values

Value  Count  Frequency (%)
0.0    1      < 0.1%
20.0   1      < 0.1%
40.0   4      < 0.1%
80.0   3      < 0.1%
120.0  1      < 0.1%
200.0  1      < 0.1%
320.0  3      < 0.1%
360.0  4      < 0.1%
400.0  4      < 0.1%
420.0  1      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
20756080.0  1      < 0.1%
20377000.0  1      < 0.1%
19925000.0  1      < 0.1%
18091280.0  1      < 0.1%
17080400.0  1      < 0.1%
16155000.0  1      < 0.1%
15599360.0  1      < 0.1%
15247240.0  1      < 0.1%
15087600.0  1      < 0.1%
15001000.0  1      < 0.1%

Interactions


Correlations

          신청년도  시군   연산   용도   원산지  검사수량
신청년도  1.000     0.093  0.918  0.218  0.366   0.243
시군      0.093     1.000  0.150  0.000  0.349   0.193
연산      0.918     0.150  1.000  0.215  0.421   0.219
용도      0.218     0.000  0.215  1.000  0.038   0.000
원산지    0.366     0.349  0.421  0.038  1.000   0.202
검사수량  0.243     0.193  0.219  0.000  0.202   1.000

        용도   원산지
용도    1.000  0.034
원산지  0.034  1.000

          신청년도  연산   검사수량  용도   원산지
신청년도  1.000     0.959  0.124     0.151  0.153
연산      0.959     1.000  0.132     0.165  0.183
검사수량  0.124     0.132  1.000     0.000  0.080
용도      0.151     0.165  0.000     1.000  0.034
원산지    0.153     0.183  0.080     0.034  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  신청년도  시군             연산  용도  원산지  검사수량
0      2022      강원도 강릉시    2021  정곡  국산    607500.0
1      2022      강원도 강릉시    2020  정곡  미국    888680.0
2      2022      강원도 강릉시    2020  정곡  중국    601000.0
3      2022      강원도 고성군    2021  정곡  국산    997720.0
4      2022      강원도 고성군    2020  정곡  중국    140000.0
5      2022      강원도 고성군    2019  정곡  국산    51000.0
6      2022      강원도 고성군    2020  정곡  미국    593000.0
7      2022      강원도 삼척시    2021  정곡  국산    124440.0
8      2022      강원도 삼척시    2020  정곡  중국    447000.0
9      2022      강원도 삼척시    2020  정곡  미국    1065000.0

Last rows

Index  신청년도  시군             연산  용도  원산지      검사수량
8439   2009      충청북도 제천시  2008  정곡  국산        173240.0
8440   2009      충청북도 진천군  2008  정곡  중국(미국)  393360.0
8441   2009      충청북도 진천군  2008  정곡  중국(태국)  396280.0
8442   2009      충청북도 진천군  2006  정곡  국산        45560.0
8443   2009      충청북도 진천군  2008  정곡  미국        100080.0
8444   2009      충청북도 진천군  2008  정곡  중국        40000.0
8445   2009      충청북도 진천군  2008  정곡  태국        428960.0
8446   2009      충청북도 진천군  2005  정곡  국산        120320.0
8447   2009      충청북도 진천군  2008  정곡  국산        720020.0
8448   2009      충청북도 충주시  2008  정곡  국산        5120.0