Overview

Dataset statistics

Number of variables: 6
Number of observations: 9615
Missing cells: 120
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 479.0 KiB
Average record size in memory: 51.0 B
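
The percentage and per-record figures above follow from the raw counts. The sketch below reruns that arithmetic, assuming "Missing cells (%)" is the missing cells divided by variables times observations and that 1 KiB is 1024 bytes. The variable names are illustrative and do not come from the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 6
n_observations = 9615
missing_cells = 120
total_size_kib = 479.0

# Assumption: the percentage is taken over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```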

Variable types

Numeric: 3
Text: 1
Categorical: 2

Dataset

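Description: Inspection records for rice and other polished grains (정곡) managed by the National Agricultural Products Quality Management Service, covering application year (신청년도), city/county (시군), production year (연산), use (용도), country of origin (원산지), inspected quantity (검사수량), and related fields
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690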

Alerts

신청년도 is highly overall correlated with 연산 (High correlation)
연산 is highly overall correlated with 신청년도 (High correlation)
용도 is highly imbalanced (96.2%) (Imbalance)
연산 has 120 (1.2%) missing values (Missing)

Reproduction

Analysis started: 2024-03-23 07:44:23.018265
Analysis finished: 2024-03-23 07:44:26.931573
Duration: 3.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
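
The reported duration can be checked against the two timestamps. This is a small sketch using only the Python standard library; the format string is an assumption about how the timestamps are laid out, based on how they appear above.

```python
from datetime import datetime

# Assumed layout of the timestamps printed in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-03-23 07:44:23.018265", FMT)
finished = datetime.strptime("2024-03-23 07:44:26.931573", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```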

Variables

신청년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 15
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.3042
Minimum: 2009
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 84.6 KiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2013
Median: 2016
Q3: 2020
95-th percentile: 2023
Maximum: 2023
Range: 14
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.2870429
Coefficient of variation (CV): 0.0021261886
Kurtosis: -1.1818384
Mean: 2016.3042
Median Absolute Deviation (MAD): 4
Skewness: -0.068844607
Sum: 19386765
Variance: 18.378737
Monotonicity: Decreasing
Histogram with fixed size bins (bins=15)
Common values

Value  Count  Frequency (%)
2022  767  8.0%
2019  737  7.7%
2015  716  7.4%
2023  711  7.4%
2014  705  7.3%
2018  670  7.0%
2016  627  6.5%
2021  625  6.5%
2020  603  6.3%
2017  600  6.2%
Other values (5)  2854  29.7%

Minimum 10 values

Value  Count  Frequency (%)
2009  535  5.6%
2010  588  6.1%
2011  599  6.2%
2012  552  5.7%
2013  580  6.0%
2014  705  7.3%
2015  716  7.4%
2016  627  6.5%
2017  600  6.2%
2018  670  7.0%

Maximum 10 values

Value  Count  Frequency (%)
2023  711  7.4%
2022  767  8.0%
2021  625  6.5%
2020  603  6.3%
2019  737  7.7%
2018  670  7.0%
2017  600  6.2%
2016  627  6.5%
2015  716  7.4%
2014  705  7.3%
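
The per-year counts for 신청년도 are enough to rebuild the reported Sum and Mean. The sketch below does so with a plain dictionary; the counts are copied from the value tables for this variable, and the variable names are illustrative.

```python
# Count of records per 신청년도 value, copied from the tables above.
counts = {
    2009: 535, 2010: 588, 2011: 599, 2012: 552, 2013: 580,
    2014: 705, 2015: 716, 2016: 627, 2017: 600, 2018: 670,
    2019: 737, 2020: 603, 2021: 625, 2022: 767, 2023: 711,
}

n = sum(counts.values())                                   # number of observations
total = sum(year * count for year, count in counts.items())  # the reported "Sum"
mean = total / n

# Most frequent year and its share of all records.
top_year, top_count = max(counts.items(), key=lambda item: item[1])
top_share_pct = 100 * top_count / n

print(n, total, round(mean, 4), top_year, round(top_share_pct, 1))
```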

시군
Text

Distinct: 100
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 75.2 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.4045762
Min length: 7

Characters and Unicode

Total characters: 80810
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
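
The character totals for 시군 are consistent with the length statistics: total characters divided by the number of rows gives the mean length. The sketch below checks this, taking the Hangul and space counts from the category tables later in this section. It assumes every row contributes to the character total, which holds here because the variable has no missing values.

```python
# Figures copied from the 시군 section of the report.
n_rows = 9615
total_characters = 80810
hangul_characters = 70570   # "Other Letter" / Hangul count
space_characters = 10240    # "Space Separator" count

mean_length = total_characters / n_rows
hangul_pct = 100 * hangul_characters / total_characters
space_pct = 100 * space_characters / total_characters

print(round(mean_length, 7), round(hangul_pct, 1), round(space_pct, 1))
```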

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 강원특별자치도 강릉시
2nd row: 강원특별자치도 강릉시
3rd row: 강원특별자치도 강릉시
4th row: 강원특별자치도 강릉시
5th row: 강원특별자치도 고성군

Value  Count  Frequency (%)
전라남도  1876  9.5%
경상북도  1728  8.7%
전라북도  1131  5.7%
경상남도  1040  5.2%
경기도  983  5.0%
충청남도  926  4.7%
충청북도  816  4.1%
강원특별자치도  685  3.5%
북구  171  0.9%
논산시  166  0.8%
Other values (108)  10324  52.0%

Most occurring characters

Value  Count  Frequency (%)
10240  12.7%
9255  11.5%
4963  6.1%
4784  5.9%
4201  5.2%
4019  5.0%
3846  4.8%
3185  3.9%
3007  3.7%
2897  3.6%
Other values (84)  30413  37.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  70570  87.3%
Space Separator  10240  12.7%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
9255  13.1%
4963  7.0%
4784  6.8%
4201  6.0%
4019  5.7%
3846  5.4%
3185  4.5%
3007  4.3%
2897  4.1%
2101  3.0%
Other values (83)  28312  40.1%

Space Separator
Value  Count  Frequency (%)
10240  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  70570  87.3%
Common  10240  12.7%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
9255  13.1%
4963  7.0%
4784  6.8%
4201  6.0%
4019  5.7%
3846  5.4%
3185  4.5%
3007  4.3%
2897  4.1%
2101  3.0%
Other values (83)  28312  40.1%

Common
Value  Count  Frequency (%)
10240  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  70570  87.3%
ASCII  10240  12.7%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
10240  100.0%

Hangul
Value  Count  Frequency (%)
9255  13.1%
4963  7.0%
4784  6.8%
4201  6.0%
4019  5.7%
3846  5.4%
3185  4.5%
3007  4.3%
2897  4.1%
2101  3.0%
Other values (83)  28312  40.1%

연산
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 20
Distinct (%): 0.2%
Missing: 120
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2014.2439
Minimum: 2004
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 84.6 KiB

Quantile statistics

Minimum: 2004
5-th percentile: 2007
Q1: 2011
Median: 2014
Q3: 2018
95-th percentile: 2021
Maximum: 2023
Range: 19
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.4286241
Coefficient of variation (CV): 0.0021986534
Kurtosis: -1.008419
Mean: 2014.2439
Median Absolute Deviation (MAD): 4
Skewness: -0.13571301
Sum: 19125246
Variance: 19.612712
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
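
Because 연산 has missing values, its mean depends on which rows are counted. The sketch below assumes the reported mean is taken over the non-missing rows only and recovers it from the Sum. The figures are copied from this section and the variable names are illustrative.

```python
# Figures copied from the 연산 section of the report.
n_observations = 9615
n_missing = 120
reported_sum = 19125246

# Assumption: missing values are excluded before averaging.
n_valid = n_observations - n_missing
mean_over_valid = reported_sum / n_valid

missing_pct = 100 * n_missing / n_observations

print(n_valid, round(mean_over_valid, 4), round(missing_pct, 1))
```

Dividing the same Sum by all 9615 rows would give a lower value, so the reported mean of 2014.2439 is consistent with missing values being skipped.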
Common values

Value  Count  Frequency (%)
2018  753  7.8%
2016  727  7.6%
2020  682  7.1%
2014  680  7.1%
2012  673  7.0%
2011  635  6.6%
2019  635  6.6%
2013  634  6.6%
2008  611  6.4%
2015  586  6.1%
Other values (10)  2879  29.9%

Minimum 10 values

Value  Count  Frequency (%)
2004  3  < 0.1%
2005  189  2.0%
2006  132  1.4%
2007  224  2.3%
2008  611  6.4%
2009  585  6.1%
2010  488  5.1%
2011  635  6.6%
2012  673  7.0%
2013  634  6.6%

Maximum 10 values

Value  Count  Frequency (%)
2023  2  < 0.1%
2022  178  1.9%
2021  526  5.5%
2020  682  7.1%
2019  635  6.6%
2018  753  7.8%
2017  552  5.7%
2016  727  7.6%
2015  586  6.1%
2014  680  7.1%

용도
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.2 KiB

정곡  9555
<NA>  31
대북  29

Length

Max length: 4
Median length: 2
Mean length: 2.0064483
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value  Count  Frequency (%)
정곡  9555  99.4%
<NA>  31  0.3%
대북  29  0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
정곡  9555  99.4%
na  31  0.3%
대북  29  0.3%
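
The 96.2% imbalance figure in the Alerts section can be reproduced from the three category counts for 용도. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible value, log2 of the number of classes. That assumption matches the reported number but is not stated anywhere in the report.

```python
import math

# Category counts for 용도, copied from the Common Values table.
counts = {"정곡": 9555, "<NA>": 31, "대북": 29}

n = sum(counts.values())

# Shannon entropy of the class distribution, in bits.
entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())

# Assumed definition: 0 means perfectly balanced, 1 means a single class.
imbalance = 1 - entropy_bits / math.log2(len(counts))

print(f"Imbalance: {100 * imbalance:.1f}%")
```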

원산지
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 75.2 KiB

국산  4469
중국  2061
미국  1623
태국  785
베트남  395
Other values (4)  282

Length

Max length: 4
Median length: 2
Mean length: 2.049818
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 중국
2nd row: 국산
3rd row: 중국
4th row: 국산
5th row: 국산

Common Values

Value  Count  Frequency (%)
국산  4469  46.5%
중국  2061  21.4%
미국  1623  16.9%
태국  785  8.2%
베트남  395  4.1%
호주  204  2.1%
<NA>  41  0.4%
인도  36  0.4%
파키스탄  1  < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
국산  4469  46.5%
중국  2061  21.4%
미국  1623  16.9%
태국  785  8.2%
베트남  395  4.1%
호주  204  2.1%
na  41  0.4%
인도  36  0.4%
파키스탄  1  < 0.1%

검사수량
Real number (ℝ)

Distinct: 6414
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 934687.72
Minimum: 0
Maximum: 20756080
Zeros: 43
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 84.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 9788
Q1: 113000
Median: 378520
Q3: 1119580
95-th percentile: 3568040
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 1006580

Descriptive statistics

Standard deviation: 1525795.8
Coefficient of variation (CV): 1.6324124
Kurtosis: 29.891932
Mean: 934687.72
Median Absolute Deviation (MAD): 328520
Skewness: 4.3529766
Sum: 8.9870224 × 10^9
Variance: 2.3280529 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
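
Several of the 검사수량 statistics are derived from one another, which gives a quick consistency check. The sketch below recomputes the IQR, range, coefficient of variation, variance and sum from the rounded figures in this section, so small differences in the last digits are expected. The variable names are illustrative.

```python
# Figures copied from the 검사수량 section of the report.
n_observations = 9615
minimum, maximum = 0, 20756080
q1, median, q3 = 113000, 378520, 1119580
mean = 934687.72
std = 1525795.8

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
cv = std / mean                # coefficient of variation
variance = std ** 2
total = mean * n_observations  # approximates the reported Sum

# A mean far above the median is consistent with the strong positive skew.
mean_to_median = mean / median

print(iqr, value_range, round(cv, 4), f"{variance:.4e}", f"{total:.4e}")
```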
Common values

Value  Count  Frequency (%)
100000.0  124  1.3%
200000.0  79  0.8%
50000.0  67  0.7%
150000.0  47  0.5%
10000.0  46  0.5%
0.0  43  0.4%
30000.0  42  0.4%
60000.0  41  0.4%
40000.0  39  0.4%
80000.0  38  0.4%
Other values (6404)  9049  94.1%

Minimum 10 values

Value  Count  Frequency (%)
0.0  43  0.4%
20.0  1  < 0.1%
40.0  4  < 0.1%
80.0  3  < 0.1%
120.0  1  < 0.1%
160.0  1  < 0.1%
200.0  2  < 0.1%
269.0  1  < 0.1%
296.0  1  < 0.1%
320.0  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
20756080.0  1  < 0.1%
20447000.0  1  < 0.1%
20075000.0  1  < 0.1%
18161280.0  1  < 0.1%
17180400.0  1  < 0.1%
16235000.0  1  < 0.1%
15599360.0  1  < 0.1%
15247240.0  1  < 0.1%
15087600.0  1  < 0.1%
15001000.0  1  < 0.1%

Interactions

[Figures: nine interaction plots; images not preserved in this text version.]

Correlations

Variable  신청년도  시군  연산  용도  원산지  검사수량
신청년도  1.000  0.160  0.918  0.233  0.234  0.256
시군  0.160  1.000  0.187  0.000  0.314  0.222
연산  0.918  0.187  1.000  0.302  0.320  0.216
용도  0.233  0.000  0.302  1.000  0.059  0.000
원산지  0.234  0.314  0.320  0.059  1.000  0.199
검사수량  0.256  0.222  0.216  0.000  0.199  1.000

Variable  용도  원산지
용도  1.000  0.044
원산지  0.044  1.000

Variable  신청년도  연산  검사수량  용도  원산지
신청년도  1.000  0.964  0.077  0.121  0.115
연산  0.964  1.000  0.089  0.232  0.158
검사수량  0.077  0.089  1.000  0.000  0.096
용도  0.121  0.232  0.000  1.000  0.044
원산지  0.115  0.158  0.096  0.044  1.000
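
The High correlation alerts can be traced back to the first correlation matrix in this section. The sketch below scans the upper triangle of that matrix for pairs above a cut-off. The 0.5 threshold is an assumption chosen for illustration; the report does not state the value it uses.

```python
# First correlation matrix from this section, in row order.
names = ["신청년도", "시군", "연산", "용도", "원산지", "검사수량"]
matrix = [
    [1.000, 0.160, 0.918, 0.233, 0.234, 0.256],
    [0.160, 1.000, 0.187, 0.000, 0.314, 0.222],
    [0.918, 0.187, 1.000, 0.302, 0.320, 0.216],
    [0.233, 0.000, 0.302, 1.000, 0.059, 0.000],
    [0.234, 0.314, 0.320, 0.059, 1.000, 0.199],
    [0.256, 0.222, 0.216, 0.000, 0.199, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report

size = len(names)

# A correlation matrix should be symmetric; this guards against copy errors.
is_symmetric = all(
    matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size)
)

# Scan the upper triangle so each pair is reported once.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(size)
    for j in range(i + 1, size)
    if matrix[i][j] > THRESHOLD
]

print(is_symmetric, high_pairs)
```

Under this threshold the only pair flagged is 신청년도 and 연산, which matches the two alerts listed near the top of the report.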

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  신청년도  시군  연산  용도  원산지  검사수량
0  2023  강원특별자치도 강릉시  2020  정곡  중국  479000.0
1  2023  강원특별자치도 강릉시  2021  정곡  국산  66840.0
2  2023  강원특별자치도 강릉시  2021  정곡  중국  1398000.0
3  2023  강원특별자치도 강릉시  2022  정곡  국산  1089800.0
4  2023  강원특별자치도 고성군  2018  정곡  국산  11000.0
5  2023  강원특별자치도 고성군  2020  정곡  국산  962000.0
6  2023  강원특별자치도 고성군  2021  정곡  국산  1753000.0
7  2023  강원특별자치도 고성군  2021  정곡  중국  1266000.0
8  2023  강원특별자치도 고성군  2022  정곡  국산  1788890.0
9  2023  강원특별자치도 고성군  2022  정곡  중국  116000.0

Last rows

Index  신청년도  시군  연산  용도  원산지  검사수량
9605  2009  충청북도 청주시 흥덕구  <NA>  정곡  중국  32000.0
9606  2009  충청북도 청주시 흥덕구  <NA>  정곡  태국  7920.0
9607  2009  충청북도 청주시 흥덕구  <NA>  정곡  <NA>  247760.0
9608  2009  충청북도 충주시  2005  정곡  국산  72440.0
9609  2009  충청북도 충주시  2007  정곡  국산  32000.0
9610  2009  충청북도 충주시  2008  정곡  국산  1401240.0
9611  2009  충청북도 충주시  2008  정곡  미국  440280.0
9612  2009  충청북도 충주시  2008  정곡  중국  388920.0
9613  2009  충청북도 충주시  2008  정곡  태국  51040.0
9614  2009  충청북도 충주시  <NA>  정곡  <NA>  26440.0