Overview

Dataset statistics

Number of variables: 9
Number of observations: 3330
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 253.8 KiB
Average record size in memory: 78.0 B
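
The average record size follows from the total memory size and the number of observations. A minimal check, assuming the reported total uses binary kibibytes (1 KiB = 1024 B); the variable names are illustrative only:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 253.8
n_observations = 3330

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```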

Variable types

Categorical: 4
Numeric: 5

Dataset

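Description: Regional supply status information for green manure crop seeds (녹비작물 종자 지역별 공급 현황정보)
Author: Ministry of Agriculture, Food and Rural Affairs (농림축산식품부)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220216000000001997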

Alerts

SIGNGU_NM has a high cardinality: 153 distinct values (High cardinality)
AR is highly correlated with VOLM (High correlation)
VOLM is highly correlated with AR (High correlation)
AR has 1503 (45.1%) zeros (Zeros)
VOLM has 1508 (45.3%) zeros (Zeros)
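
The percentages in the zero alerts are the zero counts divided by the 3330 observations. The sketch below recomputes them and applies a simple cardinality check. The threshold of 50 is a hypothetical value chosen for illustration; the setting actually used for this report is not stated.

```python
n_observations = 3330

# Counts copied from the alerts above.
zero_counts = {"AR": 1503, "VOLM": 1508}
distinct_counts = {"SIGNGU_NM": 153}

# Hypothetical threshold for this sketch; the profiler's own setting may differ.
CARDINALITY_THRESHOLD = 50


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_share = {name: percent(c, n_observations) for name, c in zero_counts.items()}
high_cardinality = [
    name for name, d in distinct_counts.items() if d > CARDINALITY_THRESHOLD
]

print(zero_share)
print(high_cardinality)
```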

Reproduction

Analysis started: 2022-08-12 14:44:20.781359
Analysis finished: 2022-08-12 14:44:28.090369
Duration: 7.31 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
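
A report of this kind can be regenerated roughly as sketched below. This is an assumption-laden outline, not the original script: the CSV path and report title are hypothetical, the configuration in config.json is not reproduced, and the result may differ in type inference or layout from the report shown here.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # csv_path is a hypothetical local copy of the dataset

    report = ProfileReport(
        df,
        title="Green manure crop seed supply by region",  # hypothetical title
        dataset={
            "description": "녹비작물 종자 지역별 공급 현황정보",
            "author": "농림축산식품부",
            "url": "https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220216000000001997",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("supply.csv")` (file name hypothetical) would write `report.html` next to the script.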

Variables

YEAR
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 26.1 KiB

Value  Count
2013   790
2014   740
2015   655
2016   650
2017   495

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013
2nd row: 2013
3rd row: 2013
4th row: 2013
5th row: 2013

Common Values

Value  Count  Frequency (%)
2013   790    23.7%
2014   740    22.2%
2015   655    19.7%
2016   650    19.5%
2017   495    14.9%
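
Each frequency is the value's count divided by the total number of observations. A minimal sketch that recomputes the YEAR percentages from the counts listed above:

```python
# Counts copied from the YEAR common values.
counts = {"2013": 790, "2014": 740, "2015": 655, "2016": 650, "2017": 495}

total = sum(counts.values())
frequency = {year: round(100 * c / total, 1) for year, c in counts.items()}

for year, pct in frequency.items():
    print(f"{year}: {counts[year]} ({pct}%)")
```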

Length

[Figure: histogram of lengths of the category; image not preserved.]

Category Frequency Plot

[Figure: category frequency plot; image not preserved.]

CTRD_NM
Categorical

Distinct: 16
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 26.1 KiB

Value              Count
경상북도           505
전라남도           495
강원도             415
경상남도           410
경기도             340
Other values (11)  1165

Length

Max length: 7
Median length: 4
Mean length: 3.896396396
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도
2nd row: 경상북도
3rd row: 충청남도
4th row: 경상남도
5th row: 전라남도

Common Values

Value             Count  Frequency (%)
경상북도          505    15.2%
전라남도          495    14.9%
강원도            415    12.5%
경상남도          410    12.3%
경기도            340    10.2%
충청남도          325    9.8%
전라북도          310    9.3%
충청북도          260    7.8%
대전광역시        50     1.5%
제주특별자치도    45     1.4%
Other values (6)  175    5.3%

Length

[Figure: histogram of lengths of the category; image not preserved.]

CTRD_CODE
Real number (ℝ≥0)

Distinct: 16
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6433153.153
Minimum: 5690000
Maximum: 6500000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.4 KiB

Quantile statistics

Minimum: 5690000
5th percentile: 6300000
Q1: 6420000
Median: 6450000
Q3: 6470000
95th percentile: 6480000
Maximum: 6500000
Range: 810000
Interquartile range (IQR): 50000

Descriptive statistics

Standard deviation: 78503.3705
Coefficient of variation (CV): 0.01220293822
Kurtosis: 58.18409364
Mean: 6433153.153
Median Absolute Deviation (MAD): 20000
Skewness: -6.706491034
Sum: 2.14224 × 10^10
Variance: 6162779181
Monotonicity: Not monotonic
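
Several of these statistics are derived from one another: the range and IQR are differences of quantiles, the CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A small sketch that cross-checks the CTRD_CODE figures above (variable names are illustrative):

```python
import math

# Figures copied from the CTRD_CODE statistics above.
minimum, maximum = 5690000, 6500000
q1, q3 = 6420000, 6470000
mean, std = 6433153.153, 78503.3705
n = 3330

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(value_range, iqr)
print(f"{cv:.6f} {variance:.4e} {total:.5e}")
```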
[Figure: histogram with fixed size bins (bins=16); image not preserved.]

Common values

Value             Count  Frequency (%)
6470000           505    15.2%
6460000           495    14.9%
6420000           415    12.5%
6480000           410    12.3%
6410000           340    10.2%
6440000           325    9.8%
6450000           310    9.3%
6430000           260    7.8%
6300000           50     1.5%
6500000           45     1.4%
Other values (6)  175    5.3%

Smallest values

Value    Count  Frequency (%)
5690000  25     0.8%
6260000  25     0.8%
6270000  10     0.3%
6280000  45     1.4%
6290000  35     1.1%
6300000  50     1.5%
6310000  35     1.1%
6410000  340    10.2%
6420000  415    12.5%
6430000  260    7.8%

Largest values

Value    Count  Frequency (%)
6500000  45     1.4%
6480000  410    12.3%
6470000  505    15.2%
6460000  495    14.9%
6450000  310    9.3%
6440000  325    9.8%
6430000  260    7.8%
6420000  415    12.5%
6410000  340    10.2%
6310000  35     1.1%

SIGNGU_NM
Categorical

HIGH CARDINALITY

Distinct: 153
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 26.1 KiB

Value               Count
고성군              45
중구                30
포천시              25
연천군              25
안성시              25
Other values (148)  3180

Length

Max length: 4
Median length: 3
Mean length: 2.984984985
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 철원군
2nd row: 청송군
3rd row: 청양군
4th row: 남해군
5th row: 강진군

Common Values

Value               Count  Frequency (%)
고성군              45     1.4%
중구                30     0.9%
포천시              25     0.8%
연천군              25     0.8%
안성시              25     0.8%
청도군              25     0.8%
봉화군              25     0.8%
화성시              25     0.8%
울진군              25     0.8%
인제군              25     0.8%
Other values (143)  3055   91.7%

Length

[Figure: histogram of lengths of the category; image not preserved.]

Value               Count  Frequency (%)
고성군              45     1.4%
중구                30     0.9%
장수군              25     0.8%
철원군              25     0.8%
여수시              25     0.8%
구미시              25     0.8%
평택시              25     0.8%
태백시              25     0.8%
정읍시              25     0.8%
공주시              25     0.8%
Other values (143)  3055   91.7%

SIGNGU_CODE
Real number (ℝ≥0)

Distinct: 158
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4810773.348
Minimum: 3360000
Maximum: 9999010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.4 KiB

Quantile statistics

Minimum: 3360000
5th percentile: 3690000
Q1: 4350000
Median: 4800000
Q3: 5180000
95th percentile: 5670000
Maximum: 9999010
Range: 6639010
Interquartile range (IQR): 830000

Descriptive statistics

Standard deviation: 725007.1478
Coefficient of variation (CV): 0.1507049065
Kurtosis: 17.93432279
Mean: 4810773.348
Median Absolute Deviation (MAD): 400000
Skewness: 2.668954566
Sum: 1.601987525 × 10^10
Variance: 5.256353644 × 10^11
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not preserved.]

Common values

Value               Count  Frequency (%)
4300000             25     0.8%
4170000             25     0.8%
5460000             25     0.8%
5240000             25     0.8%
5530000             25     0.8%
5250000             25     0.8%
4620000             25     0.8%
5190000             25     0.8%
5600000             25     0.8%
5700000             25     0.8%
Other values (148)  3080   92.5%

Smallest values

Value    Count  Frequency (%)
3360000  10     0.3%
3400000  15     0.5%
3480000  10     0.3%
3490000  5      0.2%
3550000  5      0.2%
3570000  25     0.8%
3580000  10     0.3%
3590000  5      0.2%
3610000  5      0.2%
3620000  10     0.3%

Largest values

Value    Count  Frequency (%)
9999010  25     0.8%
6520000  20     0.6%
6510000  25     0.8%
5710000  25     0.8%
5700000  25     0.8%
5680000  25     0.8%
5670000  25     0.8%
5600000  25     0.8%
5590000  15     0.5%
5580000  10     0.3%

FRMHS_CO
Real number (ℝ≥0)

Distinct: 479
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 869.1171171
Minimum: 1
Maximum: 9102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 58
Median: 269
Q3: 1039
95th percentile: 4038
Maximum: 9102
Range: 9101
Interquartile range (IQR): 981

Descriptive statistics

Standard deviation: 1453.908488
Coefficient of variation (CV): 1.672856809
Kurtosis: 9.698317504
Mean: 869.1171171
Median Absolute Deviation (MAD): 253
Skewness: 2.938884766
Sum: 2894160
Variance: 2113849.89
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not preserved.]

Common values

Value               Count  Frequency (%)
1                   45     1.4%
5                   40     1.2%
3                   40     1.2%
2                   35     1.1%
10                  35     1.1%
19                  30     0.9%
16                  30     0.9%
58                  30     0.9%
57                  30     0.9%
6                   30     0.9%
Other values (469)  2985   89.6%

Smallest values

Value  Count  Frequency (%)
1      45     1.4%
2      35     1.1%
3      40     1.2%
4      25     0.8%
5      40     1.2%
6      30     0.9%
7      10     0.3%
8      10     0.3%
9      25     0.8%
10     35     1.1%

Largest values

Value  Count  Frequency (%)
9102   5      0.2%
8557   5      0.2%
8476   5      0.2%
8404   5      0.2%
8048   5      0.2%
7882   5      0.2%
7755   5      0.2%
7574   5      0.2%
7211   5      0.2%
7082   5      0.2%

PRDLST_NM
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 26.1 KiB

Value         Count
호밀          666
헤어리베치    666
녹비(청)보리  666
들목새        666
자운영        666

Length

Max length: 7
Median length: 5
Mean length: 4
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 호밀
2nd row: 호밀
3rd row: 호밀
4th row: 호밀
5th row: 호밀

Common Values

Value         Count  Frequency (%)
호밀          666    20.0%
헤어리베치    666    20.0%
녹비(청)보리  666    20.0%
들목새        666    20.0%
자운영        666    20.0%

Length

[Figure: histogram of lengths of the category; image not preserved.]

Category Frequency Plot

[Figure: category frequency plot; image not preserved.]

AR
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1814
Distinct (%): 54.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 404502.1218
Minimum: 0
Maximum: 15995095
Zeros: 1503
Zeros (%): 45.1%
Negative: 0
Negative (%): 0.0%
Memory size: 29.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 6093.5
Q3: 174472.5
95th percentile: 2066391.2
Maximum: 15995095
Range: 15995095
Interquartile range (IQR): 174472.5

Descriptive statistics

Standard deviation: 1292497.283
Coefficient of variation (CV): 3.195279366
Kurtosis: 48.41242495
Mean: 404502.1218
Median Absolute Deviation (MAD): 6093.5
Skewness: 6.172817172
Sum: 1346992065
Variance: 1.670549227 × 10^12
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not preserved.]

Common values

Value                Count  Frequency (%)
0                    1503   45.1%
11388                2      0.1%
22359                2      0.1%
1292                 2      0.1%
1577                 2      0.1%
2000                 2      0.1%
4193                 2      0.1%
4112.9               2      0.1%
10760                2      0.1%
11664                2      0.1%
Other values (1804)  1809   54.3%

Smallest values

Value  Count  Frequency (%)
0      1503   45.1%
208    1      < 0.1%
536    1      < 0.1%
585    1      < 0.1%
900    1      < 0.1%
1000   2      0.1%
1060   1      < 0.1%
1065   1      < 0.1%
1091   1      < 0.1%
1170   1      < 0.1%

Largest values

Value       Count  Frequency (%)
15995095    1      < 0.1%
15802421.7  1      < 0.1%
15108395.2  1      < 0.1%
13614790.8  1      < 0.1%
13249691    1      < 0.1%
13094147.7  1      < 0.1%
12991182.2  1      < 0.1%
12337307.4  1      < 0.1%
11841065.7  1      < 0.1%
11143780.2  1      < 0.1%

VOLM
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 763
Distinct (%): 22.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 334.090991
Minimum: 0
Maximum: 18672
Zeros: 1508
Zeros (%): 45.3%
Negative: 0
Negative (%): 0.0%
Memory size: 29.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 5
Q3: 140
95th percentile: 1633.55
Maximum: 18672
Range: 18672
Interquartile range (IQR): 140

Descriptive statistics

Standard deviation: 1203.840473
Coefficient of variation (CV): 3.603331144
Kurtosis: 82.10911534
Mean: 334.090991
Median Absolute Deviation (MAD): 5
Skewness: 7.878045575
Sum: 1112523
Variance: 1449231.884
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not preserved.]

Common values

Value               Count  Frequency (%)
0                   1508   45.3%
1                   41     1.2%
4                   38     1.1%
2                   35     1.1%
5                   34     1.0%
3                   26     0.8%
8                   25     0.8%
12                  22     0.7%
7                   21     0.6%
6                   20     0.6%
Other values (753)  1560   46.8%

Smallest values

Value  Count  Frequency (%)
0      1508   45.3%
1      41     1.2%
2      35     1.1%
3      26     0.8%
4      38     1.1%
5      34     1.0%
6      20     0.6%
7      21     0.6%
8      25     0.8%
9      15     0.5%

Largest values

Value  Count  Frequency (%)
18672  1      < 0.1%
18392  1      < 0.1%
17379  1      < 0.1%
14873  1      < 0.1%
14625  1      < 0.1%
12896  1      < 0.1%
12753  1      < 0.1%
12622  1      < 0.1%
12483  1      < 0.1%
11774  1      < 0.1%

Interactions

[Figures: 25 pairwise interaction plots; images not preserved.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
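
The definition above can be sketched in plain Python. The inputs are the AR and VOLM values of the ten first rows listed under Sample; the function name and the rescaling constants are illustrative, and this is not the report's own implementation.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# AR and VOLM from the ten first rows of the sample.
ar = [313131.9, 2403573.0, 1294832.2, 167248.5, 1083821.0,
      4120646.0, 348545.5, 233356.9, 454769.8, 2628165.0]
volm = [253, 1944, 1051, 133, 875, 3282, 276, 180, 362, 2101]

r = pearson_r(ar, volm)

# Changing location and scale of one variable leaves r unchanged.
r_rescaled = pearson_r([2 * x + 100 for x in ar], volm)

print(r, r_rescaled)
```

On these ten rows AR and VOLM are nearly proportional, which is consistent with the high-correlation alert for the two variables.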

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
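
A minimal sketch of this rank-based calculation, assuming tied values receive the average of their ranks (a common convention, not confirmed by the report). The helper names and the toy data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r; the 1/(n-1) factors cancel, so plain sums are used."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

The toy data shows the point made above: the cubic relationship is perfectly monotonic, so ρ reaches its maximum while Pearson's r on the raw values stays below it.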

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
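
The pair-counting definition can be sketched directly. This is the simple variant in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie-adjusted variant instead, so values may differ from the report's when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```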

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
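
A sketch of a bias-corrected Cramér's V in plain Python, following the correction as it is commonly stated for Bergsma's estimator. The function name and toy inputs are illustrative, and the report's own implementation may differ in detail.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table: a variable with a single category
    return math.sqrt(phi2_corr / denom)


independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["c", "d", "c", "d"] * 25)
perfect = cramers_v_corrected(["a", "b"] * 50, ["c", "d"] * 50)
print(independent, perfect)
```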

Missing values

[Figure: a simple visualization of nullity by column; image not preserved.]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion; image not preserved.]

Sample

First rows

   YEAR  CTRD_NM   CTRD_CODE  SIGNGU_NM  SIGNGU_CODE  FRMHS_CO  PRDLST_NM  AR         VOLM
0  2013  강원도    6420000    철원군     4300000      204       호밀       313131.9   253
1  2013  경상북도  6470000    청송군     5160000      1181      호밀       2403573.0  1944
2  2013  충청남도  6440000    청양군     4590000      820       호밀       1294832.2  1051
3  2013  경상남도  6480000    남해군     5430000      1352      호밀       167248.5   133
4  2013  전라남도  6460000    강진군     4920000      8476      호밀       1083821.0  875
5  2013  강원도    6420000    화천군     4310000      1454      호밀       4120646.0  3282
6  2013  경상북도  6470000    영양군     5170000      454       호밀       348545.5   276
7  2013  전라북도  6450000    부안군     4790000      908       호밀       233356.9   180
8  2013  충청남도  6440000    홍성군     4600000      343       호밀       454769.8   362
9  2013  경상남도  6480000    하동군     5440000      4440      호밀       2628165.0  2101

Last rows

      YEAR  CTRD_NM   CTRD_CODE  SIGNGU_NM  SIGNGU_CODE  FRMHS_CO  PRDLST_NM  AR         VOLM
3320  2016  전라북도  6450000    임실군     4760000      6         자운영     0.0        0
3321  2016  경상남도  6480000    창녕군     5410000      476       자운영     10719.0    1
3322  2016  강원도    6420000    정선군     4290000      250       자운영     0.0        0
3323  2016  전라남도  6460000    화순군     4900000      1146      자운영     562825.5   125
3324  2016  경상북도  6470000    의성군     5150000      27        자운영     0.0        0
3325  2016  전라북도  6450000    순창군     4770000      312       자운영     5368.5     2
3326  2016  경상남도  6480000    고성군     5420000      148       자운영     0.0        0
3327  2016  충청북도  6430000    단양군     4480000      46        자운영     0.0        0
3328  2016  전라남도  6460000    장흥군     4910000      1051      자운영     1079907.7  209
3329  2016  강원도    6420000    철원군     4300000      58        자운영     0.0        0