Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 120
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 576.2 KiB
Average record size in memory: 59.0 B
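
The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 6
n_observations = 10_000
missing_cells = 120
total_size_kib = 576.2

# Assumption: the missing-cell share is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```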

Variable types

Numeric: 3
Text: 1
Categorical: 2

Dataset

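Description: Inspection records for rice and other milled grains (정곡) managed by the National Agricultural Products Quality Management Service: application year (신청년도), city/county (시군), production year (연산), use (용도), country of origin (원산지), inspected quantity (검사수량), etc.
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690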

Alerts

신청년도 is highly overall correlated with 연산 (High correlation)
연산 is highly overall correlated with 신청년도 (High correlation)
용도 is highly imbalanced (96.3%) (Imbalance)
연산 has 120 (1.2%) missing values (Missing)
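
The report does not state how the 96.3% imbalance figure is derived. A minimal sketch, assuming the score is one minus the Shannon entropy of the class shares normalized by log2 of the number of classes, reproduces it from the 용도 counts listed later in this report. The function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the base-2 Shannon entropy of the class shares.

    Assumed definition: 0 means perfectly balanced classes, values near 1 mean
    that a single class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts for 용도, copied from its "Common Values" table.
yongdo_counts = {"정곡": 9941, "<NA>": 30, "대북": 29}
score = imbalance_score(list(yongdo_counts.values()))
print(f"Imbalance: {score:.1%}")
```

Under this assumed definition, a column with two equally frequent classes would score 0.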

Reproduction

Analysis started: 2024-03-23 07:44:35.153273
Analysis finished: 2024-03-23 07:44:39.392293
Duration: 4.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

신청년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.6471
Minimum: 2009
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2013
Median: 2017
Q3: 2021
95-th percentile: 2023
Maximum: 2024
Range: 15
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.4754402
Coefficient of variation (CV): 0.0022192481
Kurtosis: -1.1808102
Mean: 2016.6471
Median Absolute Deviation (MAD): 4
Skewness: -0.070336722
Sum: 20166471
Variance: 20.029565
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value  Count  Frequency (%)
2023  787  7.9%
2022  765  7.6%
2019  732  7.3%
2015  711  7.1%
2014  699  7.0%
2018  666  6.7%
2016  621  6.2%
2021  618  6.2%
2020  596  6.0%
2011  595  5.9%
Other values (6)  3210  32.1%

Value  Count  Frequency (%)
2009  534  5.3%
2010  583  5.8%
2011  595  5.9%
2012  547  5.5%
2013  575  5.8%
2014  699  7.0%
2015  711  7.1%
2016  621  6.2%
2017  595  5.9%
2018  666  6.7%

Value  Count  Frequency (%)
2024  376  3.8%
2023  787  7.9%
2022  765  7.6%
2021  618  6.2%
2020  596  6.0%
2019  732  7.3%
2018  666  6.7%
2017  595  5.9%
2016  621  6.2%
2015  711  7.1%
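
Because 신청년도 takes only 16 distinct values and the tables above list a count for every one of them, the descriptive statistics can be recomputed from the frequency table alone. The sketch below does this with the standard library, assuming the reported variance is the sample variance (divisor n - 1); that assumption matches the reported figure.

```python
import math

# Year -> count, copied from the 신청년도 frequency tables.
year_counts = {
    2009: 534, 2010: 583, 2011: 595, 2012: 547,
    2013: 575, 2014: 699, 2015: 711, 2016: 621,
    2017: 595, 2018: 666, 2019: 732, 2020: 596,
    2021: 618, 2022: 765, 2023: 787, 2024: 376,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n

# Assumption: sample variance with divisor n - 1.
variance = sum(count * (year - mean) ** 2 for year, count in year_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"n={n}, sum={total}, mean={mean:.4f}")
print(f"variance={variance:.6f}, std={std:.7f}, cv={cv:.10f}")
```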

시군
Text

Distinct: 100
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 8
Mean length: 8.7572
Min length: 7

Characters and Unicode

Total characters: 87572
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
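
As an illustration of the character properties mentioned above, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in one 시군 value taken from the sample rows. It covers general categories only; script and block lookups are not part of the standard library and are left out.

```python
import unicodedata
from collections import Counter

# One 시군 value from the sample rows of this report.
sample = "경상남도 합천군"

# General category per character: "Lo" = Other Letter, "Zs" = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in sample)

for category, count in categories.items():
    print(category, count)
```

The two categories found here, Other Letter and Space Separator, are the same two the report lists for the whole column.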

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 경상남도 합천군
2nd row: 전북특별자치도 군산시
3rd row: 전라남도 담양군
4th row: 전라남도 화순군
5th row: 전북특별자치도 김제시

Value  Count  Frequency (%)
전라남도  1928  9.3%
경상북도  1819  8.8%
전북특별자치도  1184  5.7%
경상남도  1094  5.3%
경기도  1030  5.0%
충청남도  957  4.6%
충청북도  834  4.0%
강원특별자치도  712  3.5%
북구  175  0.8%
논산시  172  0.8%
Other values (108)  10730  52.0%

Most occurring characters

Count  Frequency (%)
10646  12.2%
9638  11.0%
5155  5.9%
4984  5.7%
4350  5.0%
4221  4.8%
4012  4.6%
3295  3.8%
3048  3.5%
2168  2.5%
Other values (84)  36055  41.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  76926  87.8%
Space Separator  10646  12.2%

Most frequent character per category

Other Letter
Count  Frequency (%)
9638  12.5%
5155  6.7%
4984  6.5%
4350  5.7%
4221  5.5%
4012  5.2%
3295  4.3%
3048  4.0%
2168  2.8%
1940  2.5%
Other values (83)  34115  44.3%

Space Separator
Count  Frequency (%)
10646  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  76926  87.8%
Common  10646  12.2%

Most frequent character per script

Hangul
Count  Frequency (%)
9638  12.5%
5155  6.7%
4984  6.5%
4350  5.7%
4221  5.5%
4012  5.2%
3295  4.3%
3048  4.0%
2168  2.8%
1940  2.5%
Other values (83)  34115  44.3%

Common
Count  Frequency (%)
10646  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  76926  87.8%
ASCII  10646  12.2%

Most frequent character per block

ASCII
Count  Frequency (%)
10646  100.0%

Hangul
Count  Frequency (%)
9638  12.5%
5155  6.7%
4984  6.5%
4350  5.7%
4221  5.5%
4012  5.2%
3295  4.3%
3048  4.0%
2168  2.8%
1940  2.5%
Other values (83)  34115  44.3%

연산
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 20
Distinct (%): 0.2%
Missing: 120
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2014.6002
Minimum: 2004
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2004
5-th percentile: 2007
Q1: 2011
Median: 2015
Q3: 2018
95-th percentile: 2022
Maximum: 2023
Range: 19
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.6213086
Coefficient of variation (CV): 0.0022939086
Kurtosis: -1.0150978
Mean: 2014.6002
Median Absolute Deviation (MAD): 4
Skewness: -0.12617413
Sum: 19904250
Variance: 21.356494
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value  Count  Frequency (%)
2018  749  7.5%
2016  721  7.2%
2020  697  7.0%
2014  675  6.8%
2012  668  6.7%
2011  632  6.3%
2021  628  6.3%
2019  628  6.3%
2013  627  6.3%
2008  605  6.0%
Other values (10)  3250  32.5%

Value  Count  Frequency (%)
2004  3  < 0.1%
2005  189  1.9%
2006  132  1.3%
2007  221  2.2%
2008  605  6.0%
2009  580  5.8%
2010  484  4.8%
2011  632  6.3%
2012  668  6.7%
2013  627  6.3%

Value  Count  Frequency (%)
2023  114  1.1%
2022  398  4.0%
2021  628  6.3%
2020  697  7.0%
2019  628  6.3%
2018  749  7.5%
2017  547  5.5%
2016  721  7.2%
2015  582  5.8%
2014  675  6.8%
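
The 연산 tables above list a count for each of its 20 distinct values, so the number of missing entries and the mean can be cross-checked against the reported figures. The sketch below assumes that missing values are simply excluded from the sum and the mean, which is consistent with the reported Sum and Mean.

```python
# Production year -> count, copied from the 연산 frequency tables.
yeonsan_counts = {
    2004: 3, 2005: 189, 2006: 132, 2007: 221, 2008: 605,
    2009: 580, 2010: 484, 2011: 632, 2012: 668, 2013: 627,
    2014: 675, 2015: 582, 2016: 721, 2017: 547, 2018: 749,
    2019: 628, 2020: 697, 2021: 628, 2022: 398, 2023: 114,
}
n_observations = 10_000  # from the dataset statistics

non_missing = sum(yeonsan_counts.values())
missing = n_observations - non_missing
missing_pct = 100 * missing / n_observations

# Assumption: missing entries are left out of both the sum and the mean.
total = sum(year * count for year, count in yeonsan_counts.items())
mean = total / non_missing

print(f"non-missing={non_missing}, missing={missing} ({missing_pct:.1f}%)")
print(f"sum={total}, mean={mean:.4f}")
```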

용도
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

정곡  9941
<NA>  30
대북  29

Length

Max length: 4
Median length: 2
Mean length: 2.006
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value  Count  Frequency (%)
정곡  9941  99.4%
<NA>  30  0.3%
대북  29  0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
정곡  9941  99.4%
na  30  0.3%
대북  29  0.3%

원산지
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

국산  4663
중국  2187
미국  1644
태국  806
베트남  413
Other values (4)  287

Length

Max length: 4
Median length: 2
Mean length: 2.0497
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 국산
2nd row: 국산
3rd row: 국산
4th row: 중국
5th row: 미국

Common Values

Value  Count  Frequency (%)
국산  4663  46.6%
중국  2187  21.9%
미국  1644  16.4%
태국  806  8.1%
베트남  413  4.1%
호주  209  2.1%
<NA>  41  0.4%
인도  36  0.4%
파키스탄  1  < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
국산  4663  46.6%
중국  2187  21.9%
미국  1644  16.4%
태국  806  8.1%
베트남  413  4.1%
호주  209  2.1%
na  41  0.4%
인도  36  0.4%
파키스탄  1  < 0.1%

검사수량
Real number (ℝ)

Distinct: 6604
Distinct (%): 66.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 938211.23
Minimum: 0
Maximum: 20756080
Zeros: 44
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 9560
Q1: 111990
Median: 375620
Q3: 1118462.5
95-th percentile: 3613973
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 1006472.5

Descriptive statistics

Standard deviation: 1530953.1
Coefficient of variation (CV): 1.6317787
Kurtosis: 28.856746
Mean: 938211.23
Median Absolute Deviation (MAD): 325620
Skewness: 4.276798
Sum: 9.3821123 × 10^9
Variance: 2.3438174 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
100000.0  129  1.3%
200000.0  83  0.8%
50000.0  70  0.7%
60000.0  49  0.5%
10000.0  49  0.5%
150000.0  48  0.5%
0.0  44  0.4%
30000.0  44  0.4%
40000.0  40  0.4%
80000.0  40  0.4%
Other values (6594)  9404  94.0%

Value  Count  Frequency (%)
0.0  44  0.4%
20.0  1  < 0.1%
40.0  4  < 0.1%
80.0  4  < 0.1%
120.0  1  < 0.1%
160.0  1  < 0.1%
200.0  2  < 0.1%
269.0  1  < 0.1%
296.0  1  < 0.1%
320.0  3  < 0.1%

Value  Count  Frequency (%)
20756080.0  1  < 0.1%
20447000.0  1  < 0.1%
20075000.0  1  < 0.1%
18161280.0  1  < 0.1%
17180400.0  1  < 0.1%
16235000.0  1  < 0.1%
15599360.0  1  < 0.1%
15247240.0  1  < 0.1%
15087600.0  1  < 0.1%
15001000.0  1  < 0.1%
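
Several of the derived figures reported for 검사수량 are simple functions of the others. The sketch below recomputes the IQR, range, coefficient of variation and sum from the rounded values printed in this report, so results agree only up to that rounding. The mean-to-median ratio is an extra illustration of the right skew and is not a figure from the report.

```python
# Figures copied from the 검사수량 statistics in this report (already rounded).
n = 10_000
mean = 938211.23
median = 375620
std = 1530953.1
q1, q3 = 111990, 1118462.5
minimum, maximum = 0, 20756080

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
total = mean * n                  # sum, recovered from the rounded mean
mean_to_median = mean / median    # illustrative: above 1 for a right-skewed column

print(f"IQR={iqr}, range={value_range}")
print(f"CV={cv:.7f}, sum={total:.4e}")
print(f"mean/median={mean_to_median:.2f}")
```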

Interactions

[Figure: interaction plots between pairs of variables (9 panels)]

Correlations

          신청년도  시군  연산  용도  원산지  검사수량
신청년도  1.000  0.159  0.927  0.235  0.244  0.236
시군  0.159  1.000  0.197  0.000  0.316  0.221
연산  0.927  0.197  1.000  0.304  0.308  0.217
용도  0.235  0.000  0.304  1.000  0.057  0.000
원산지  0.244  0.316  0.308  0.057  1.000  0.199
검사수량  0.236  0.221  0.217  0.000  0.199  1.000

          용도  원산지
용도  1.000  0.043
원산지  0.043  1.000

          신청년도  연산  검사수량  용도  원산지
신청년도  1.000  0.968  0.065  0.122  0.120
연산  0.968  1.000  0.076  0.233  0.152
검사수량  0.065  0.076  1.000  0.000  0.096
용도  0.122  0.233  0.000  1.000  0.043
원산지  0.120  0.152  0.096  0.043  1.000
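
To connect the first correlation matrix to the "High correlation" alerts, the sketch below scans its upper triangle for pairs at or above a cutoff. The cutoff of 0.5 is an assumption chosen for illustration, since the report does not state the threshold it used, and `high_correlation_pairs` is a hypothetical helper name.

```python
columns = ["신청년도", "시군", "연산", "용도", "원산지", "검사수량"]

# First correlation matrix from this report, in the column order above.
matrix = [
    [1.000, 0.159, 0.927, 0.235, 0.244, 0.236],
    [0.159, 1.000, 0.197, 0.000, 0.316, 0.221],
    [0.927, 0.197, 1.000, 0.304, 0.308, 0.217],
    [0.235, 0.000, 0.304, 1.000, 0.057, 0.000],
    [0.244, 0.316, 0.308, 0.057, 1.000, 0.199],
    [0.236, 0.221, 0.217, 0.000, 0.199, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not taken from the report


def high_correlation_pairs(names, values, threshold):
    """Return (a, b, value) for each pair above the diagonal with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(values[i][j]) >= threshold:
                pairs.append((names[i], names[j], values[i][j]))
    return pairs


flagged = high_correlation_pairs(columns, matrix, THRESHOLD)
for a, b, value in flagged:
    print(f"{a} ~ {b}: {value:.3f}")
```

With this cutoff the only flagged pair is 신청년도 and 연산, matching the two alerts near the top of the report.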

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  신청년도  시군  연산  용도  원산지  검사수량
2126  2021  경상남도 합천군  2019  정곡  국산  33000.0
275  2024  전북특별자치도 군산시  2022  정곡  국산  1078400.0
6195  2015  전라남도 담양군  2013  정곡  국산  121000.0
9357  2010  전라남도 화순군  2008  정곡  중국  106000.0
3007  2020  전북특별자치도 김제시  2018  정곡  미국  9000.0
6215  2015  전라남도 목포시  2014  정곡  중국  62920.0
6627  2014  경기도 평택시  2013  정곡  미국  147000.0
9919  2009  전북특별자치도 군산시  2007  정곡  미국  4000.0
9838  2009  전라남도 곡성군  2008  정곡  미국  156520.0
10070  2009  충청북도 청주시 흥덕구  <NA>  정곡  태국  7920.0

(index)  신청년도  시군  연산  용도  원산지  검사수량
3142  2020  충청북도 옥천군  2019  정곡  국산  2625190.0
9179  2010  경상북도 영천시  2007  <NA>  국산  93240.0
4692  2017  경상남도 고성군  2013  정곡  국산  3397000.0
2061  2021  경상남도 산청군  2017  정곡  국산  479000.0
4561  2018  충청북도 청주시 흥덕구  2015  정곡  미국  140000.0
4639  2017  경기도 안성시  2016  정곡  미국  775000.0
5907  2015  경기도 평택시  2013  정곡  중국  1928240.0
3652  2019  전라남도 영암군  2017  정곡  미국  36000.0
7729  2013  충청남도 서산시  2010  정곡  국산  230000.0
5128  2017  충청남도 홍성군  2015  정곡  태국  2000.0