Overview

Dataset statistics

Number of variables: 13
Number of observations: 10000
Missing cells: 1091
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 112.0 B
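
The missing-cell percentage can be checked by hand from the counts in this report. The sketch below is only an arithmetic check, not the profiler's own code; the per-column counts are copied from the MINPRICE and MAXPRICE alerts.

```python
# Figures copied from the report; nothing here is recomputed from the raw data.
n_variables = 13
n_observations = 10_000
missing_by_column = {"MINPRICE": 545, "MAXPRICE": 546}

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_cells} ({missing_pct:.1f}%)")
```

The two price columns account for all 1091 missing cells, which is why no other variable carries a Missing alert.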

Variable types

Numeric: 8
Categorical: 5

Dataset

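Description: China agricultural product price information (중국농산물 가격정보)
Author: Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries, EPIS (농림수산식품교육문화정보원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220210000000001818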

Alerts

df_index is highly correlated with DATES (High correlation)
DATES is highly correlated with df_index (High correlation)
MCLASSCODE is highly correlated with SCLASSCODE (High correlation)
SCLASSCODE is highly correlated with MCLASSCODE (High correlation)
MINPRICE is highly correlated with AVGPRICE and 1 other fields (High correlation)
AVGPRICE is highly correlated with MINPRICE and 1 other fields (High correlation)
MAXPRICE is highly correlated with MINPRICE and 1 other fields (High correlation)
GRADECODE is highly correlated with GRADENAME and 2 other fields (High correlation)
GRADENAME is highly correlated with GRADECODE and 2 other fields (High correlation)
SCLASSNAME is highly correlated with GRADECODE and 2 other fields (High correlation)
MCLASSNAME is highly correlated with GRADECODE and 2 other fields (High correlation)
MINPRICE has 545 (5.5%) missing values (Missing)
MAXPRICE has 546 (5.5%) missing values (Missing)
df_index has unique values (Unique)

Reproduction

Analysis started: 2022-08-12 15:26:19.482702
Analysis finished: 2022-08-12 15:26:33.269617
Duration: 13.79 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
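
A report of this kind can be regenerated roughly as sketched below. This assumes the data is available as a local CSV file; the file names and the report title are hypothetical and do not come from the original run. The exact settings used here are in config.json, not in this sketch.

```python
def build_report(csv_path: str, html_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="China agricultural product prices")
    report.to_file(html_path)


# Example call (assumed file name):
# build_report("china_agri_prices.csv")
```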

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25373.0856
Minimum: 2
Maximum: 50653
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2500.8
Q1: 12622.75
median: 25438
Q3: 37956.25
95-th percentile: 48228.1
Maximum: 50653
Range: 50651
Interquartile range (IQR): 25333.5

Descriptive statistics

Standard deviation: 14663.83478
Coefficient of variation (CV): 0.5779287161
Kurtosis: -1.202224244
Mean: 25373.0856
Median Absolute Deviation (MAD): 12648.5
Skewness: -0.00378208083
Sum: 253730856
Variance: 215028050.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value | Count | Frequency (%)
9197 | 1 | < 0.1%
31017 | 1 | < 0.1%
28185 | 1 | < 0.1%
30758 | 1 | < 0.1%
49079 | 1 | < 0.1%
33861 | 1 | < 0.1%
26730 | 1 | < 0.1%
16180 | 1 | < 0.1%
5222 | 1 | < 0.1%
5706 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
2 | 1 | < 0.1%
13 | 1 | < 0.1%
15 | 1 | < 0.1%
26 | 1 | < 0.1%
27 | 1 | < 0.1%
29 | 1 | < 0.1%
35 | 1 | < 0.1%
41 | 1 | < 0.1%
43 | 1 | < 0.1%
50 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
50653 | 1 | < 0.1%
50649 | 1 | < 0.1%
50634 | 1 | < 0.1%
50631 | 1 | < 0.1%
50629 | 1 | < 0.1%
50625 | 1 | < 0.1%
50614 | 1 | < 0.1%
50610 | 1 | < 0.1%
50595 | 1 | < 0.1%
50586 | 1 | < 0.1%
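
Several of the df_index statistics are derived from others in the same section. As a quick consistency check (a sketch using the rounded figures printed in this report, not the profiler's code), the range, IQR, coefficient of variation and variance can be recomputed:

```python
# Values copied from the df_index section.
minimum, maximum = 2, 50653
q1, q3 = 12622.75, 37956.25
mean, std = 25373.0856, 14663.83478

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 1))
```

Small differences in the last digits are expected because the inputs are already rounded.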

DATES
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 219
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20190567.28
Minimum: 20190101
Maximum: 20191223
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 20190101
5-th percentile: 20190115
Q1: 20190318
median: 20190529
Q3: 20190813
95-th percentile: 20191018
Maximum: 20191223
Range: 1122
Interquartile range (IQR): 495

Descriptive statistics

Standard deviation: 293.7274318
Coefficient of variation (CV): 1.454775528 × 10^-5
Kurtosis: -1.024916437
Mean: 20190567.28
Median Absolute Deviation (MAD): 224
Skewness: 0.1031900595
Sum: 2.019056728 × 10^11
Variance: 86275.80417
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value | Count | Frequency (%)
20190722 | 68 | 0.7%
20190410 | 66 | 0.7%
20190419 | 65 | 0.7%
20190418 | 64 | 0.6%
20191223 | 62 | 0.6%
20191014 | 61 | 0.6%
20190403 | 60 | 0.6%
20190807 | 60 | 0.6%
20190128 | 60 | 0.6%
20191022 | 59 | 0.6%
Other values (209) | 9375 | 93.8%

Minimum 10 values

Value | Count | Frequency (%)
20190101 | 31 | 0.3%
20190102 | 35 | 0.4%
20190103 | 57 | 0.6%
20190104 | 36 | 0.4%
20190107 | 43 | 0.4%
20190108 | 57 | 0.6%
20190109 | 50 | 0.5%
20190110 | 40 | 0.4%
20190111 | 55 | 0.5%
20190114 | 50 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
20191223 | 62 | 0.6%
20191216 | 44 | 0.4%
20191209 | 52 | 0.5%
20191114 | 56 | 0.6%
20191025 | 35 | 0.4%
20191024 | 51 | 0.5%
20191023 | 58 | 0.6%
20191022 | 59 | 0.6%
20191021 | 47 | 0.5%
20191018 | 47 | 0.5%
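
DATES is profiled as a plain number, but its values look like calendar dates written as YYYYMMDD integers. Under that assumption, statistics such as Range (1122) describe the integer encoding rather than elapsed time. A minimal sketch of the difference, using only the minimum and maximum reported above:

```python
from datetime import datetime


def parse_yyyymmdd(value: int) -> datetime:
    """Interpret an integer such as 20190101 as a calendar date (assumed encoding)."""
    return datetime.strptime(str(value), "%Y%m%d")


first = parse_yyyymmdd(20190101)
last = parse_yyyymmdd(20191223)

numeric_range = 20191223 - 20190101        # what the report calls Range
calendar_span_days = (last - first).days   # actual elapsed days

print(numeric_range, calendar_span_days)
```

Converting the column to a date type before profiling would likely give more meaningful quantiles and histograms.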

MARKETCO
Real number (ℝ≥0)

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2527163.529
Minimum: 1100002
Maximum: 5107001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 1100002
5-th percentile: 1100002
Q1: 1102221
median: 2310001
Q3: 3701005
95-th percentile: 5101006
Maximum: 5107001
Range: 4006999
Interquartile range (IQR): 2598784

Descriptive statistics

Standard deviation: 1353457.642
Coefficient of variation (CV): 0.5355639343
Kurtosis: -1.433828295
Mean: 2527163.529
Median Absolute Deviation (MAD): 1209996
Skewness: 0.2570646287
Sum: 2.527163528 × 10^10
Variance: 1.831847589 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)
Common Values

Value | Count | Frequency (%)
1100003 | 961 | 9.6%
1100002 | 771 | 7.7%
1100005 | 687 | 6.9%
3401002 | 650 | 6.5%
1102221 | 620 | 6.2%
1501001 | 538 | 5.4%
3702006 | 526 | 5.3%
3205001 | 526 | 5.3%
1404001 | 523 | 5.2%
3509821 | 462 | 4.6%
Other values (15) | 3736 | 37.4%

Minimum 10 values

Value | Count | Frequency (%)
1100002 | 771 | 7.7%
1100003 | 961 | 9.6%
1100005 | 687 | 6.9%
1102221 | 620 | 6.2%
1200001 | 110 | 1.1%
1200010 | 429 | 4.3%
1310821 | 60 | 0.6%
1404001 | 523 | 5.2%
1501001 | 538 | 5.4%
2301006 | 168 | 1.7%

Maximum 10 values

Value | Count | Frequency (%)
5107001 | 332 | 3.3%
5101006 | 209 | 2.1%
4405001 | 325 | 3.2%
4305001 | 118 | 1.2%
4201005 | 373 | 3.7%
4114001 | 199 | 2.0%
3702006 | 526 | 5.3%
3701221 | 110 | 1.1%
3701005 | 344 | 3.4%
3604001 | 400 | 4.0%

MARKETNM
Categorical

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
북경신발지농부산물도매시장: 961
북경화간악각장도매시장: 771
북경팔리교농산물중심도매시장: 687
안휘합비주곡퇴농산물도매시장: 650
북경순의석문채소도매시장: 620
Other values (20): 6311

Length

Max length: 19
Median length: 17
Mean length: 13.4086
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 안휘합비주곡퇴농산물도매시장
2nd row: 안휘합비주곡퇴농산물도매시장
3rd row: 산동청도성양채소도매시장
4th row: 흑룡강성할빈하달과채도매시장유한공사
5th row: 산동청도성양채소도매시장

Common Values

Value | Count | Frequency (%)
북경신발지농부산물도매시장 | 961 | 9.6%
북경화간악각장도매시장 | 771 | 7.7%
북경팔리교농산물중심도매시장 | 687 | 6.9%
안휘합비주곡퇴농산물도매시장 | 650 | 6.5%
북경순의석문채소도매시장 | 620 | 6.2%
호화호특시동와요도매시장 | 538 | 5.4%
산동청도성양채소도매시장 | 526 | 5.3%
강소소주남환교농부산물도매시장 | 526 | 5.3%
산서장치시자방농부산물종합교역시장 | 523 | 5.2%
복건복정민절변계농무중심시장 | 462 | 4.6%
Other values (15) | 3736 | 37.4%

Length

Histogram of lengths of the category

MCLASSCODE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1886.8069
Minimum: 1112
Maximum: 2099
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 1112
5-th percentile: 1714
Q1: 1813
median: 1912
Q3: 1999
95-th percentile: 2015
Maximum: 2099
Range: 987
Interquartile range (IQR): 186

Descriptive statistics

Standard deviation: 113.7608428
Coefficient of variation (CV): 0.06029278505
Kurtosis: 6.85895972
Mean: 1886.8069
Median Absolute Deviation (MAD): 87
Skewness: -1.543842236
Sum: 18868069
Variance: 12941.52937
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Common Values

Value | Count | Frequency (%)
1912 | 2923 | 29.2%
1911 | 1474 | 14.7%
1999 | 1021 | 10.2%
1813 | 976 | 9.8%
2014 | 557 | 5.6%
1716 | 462 | 4.6%
1711 | 420 | 4.2%
1714 | 365 | 3.6%
1717 | 333 | 3.3%
2015 | 308 | 3.1%
Other values (7) | 1161 | 11.6%

Minimum 10 values

Value | Count | Frequency (%)
1112 | 39 | 0.4%
1711 | 420 | 4.2%
1714 | 365 | 3.6%
1715 | 308 | 3.1%
1716 | 462 | 4.6%
1717 | 333 | 3.3%
1812 | 91 | 0.9%
1813 | 976 | 9.8%
1911 | 1474 | 14.7%
1912 | 2923 | 29.2%

Maximum 10 values

Value | Count | Frequency (%)
2099 | 94 | 0.9%
2017 | 283 | 2.8%
2015 | 308 | 3.1%
2014 | 557 | 5.6%
2013 | 204 | 2.0%
2011 | 142 | 1.4%
1999 | 1021 | 10.2%
1912 | 2923 | 29.2%
1911 | 1474 | 14.7%
1813 | 976 | 9.8%

MCLASSNAME
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
엽채류: 2923
과채류: 1474
기타: 1115
어류: 1067
사과류: 557
Other values (10): 2864

Length

Max length: 4
Median length: 3
Mean length: 2.7436
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 과채류
2nd row: 기타
3rd row: 감류
4th row: 엽채류
5th row: 사과류

Common Values

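Value | Count | Frequency (%)
엽채류 (leafy vegetables) | 2923 | 29.2%
과채류 (fruit vegetables) | 1474 | 14.7%
기타 (other) | 1115 | 11.2%
어류 (fish) | 1067 | 10.7%
사과류 (apples) | 557 | 5.6%
돼지고기 (pork) | 462 | 4.6%
알류 (eggs) | 420 | 4.2%
소고기 (beef) | 365 | 3.6%
양고기 (mutton) | 333 | 3.3%
닭고기 (chicken) | 308 | 3.1%
Other values (5) | 976 | 9.8%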

Length

Histogram of lengths of the category

SCLASSCODE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 188701.9655
Minimum: 111221
Maximum: 209918
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 111221
5-th percentile: 171415
Q1: 181322
median: 191214
Q3: 199913
95-th percentile: 201511
Maximum: 209918
Range: 98697
Interquartile range (IQR): 18591

Descriptive statistics

Standard deviation: 11375.59665
Coefficient of variation (CV): 0.06028340309
Kurtosis: 6.860832408
Mean: 188701.9655
Median Absolute Deviation (MAD): 8700
Skewness: -1.545085267
Sum: 1887019655
Variance: 129404199.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common Values

Value | Count | Frequency (%)
191214 | 870 | 8.7%
191232 | 828 | 8.3%
191148 | 796 | 8.0%
191111 | 678 | 6.8%
191226 | 666 | 6.7%
199913 | 602 | 6.0%
191231 | 559 | 5.6%
171629 | 462 | 4.6%
181322 | 437 | 4.4%
201413 | 427 | 4.3%
Other values (21) | 3675 | 36.8%

Minimum 10 values

Value | Count | Frequency (%)
111221 | 39 | 0.4%
171111 | 420 | 4.2%
171415 | 365 | 3.6%
171512 | 308 | 3.1%
171629 | 462 | 4.6%
171716 | 333 | 3.3%
181212 | 91 | 0.9%
181320 | 193 | 1.9%
181322 | 437 | 4.4%
181326 | 195 | 1.9%

Maximum 10 values

Value | Count | Frequency (%)
209918 | 94 | 0.9%
201715 | 124 | 1.2%
201711 | 159 | 1.6%
201513 | 92 | 0.9%
201511 | 216 | 2.2%
201418 | 115 | 1.1%
201414 | 15 | 0.1%
201413 | 427 | 4.3%
201321 | 98 | 1.0%
201313 | 106 | 1.1%
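
The high correlation between MCLASSCODE and SCLASSCODE has a likely structural explanation: in the sample rows of this report, each SCLASSCODE is its MCLASSCODE followed by two more digits. The sketch below checks that pattern on pairs copied from those sample rows. Whether it holds for every row in the dataset is an assumption that would need to be verified on the full data.

```python
# (MCLASSCODE, SCLASSCODE) pairs copied from the sample rows of this report.
pairs = [
    (1911, 191111),
    (1999, 199913),
    (2011, 201114),
    (1912, 191214),
    (2014, 201413),
    (1715, 171512),
]


def parent_class(sclass_code: int) -> int:
    """Drop the last two digits of a sub-class code (assumed coding scheme)."""
    return sclass_code // 100


prefix_holds = all(parent_class(s) == m for m, s in pairs)
print(prefix_holds)
```

If the pattern holds throughout, one of the two code columns is largely redundant for modelling purposes.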

SCLASSNAME
Categorical

HIGH CORRELATION

Distinct: 31
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
배추: 870
양상추: 828
토마토: 796
백무우: 678
상추: 666
Other values (26): 6162

Length

Max length: 5
Median length: 4
Mean length: 2.7569
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 백무우
2nd row: 버섯
3rd row: 밀감
4th row: 배추
5th row: 부시사과

Common Values

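Value | Count | Frequency (%)
배추 (napa cabbage) | 870 | 8.7%
양상추 (head lettuce) | 828 | 8.3%
토마토 (tomato) | 796 | 8.0%
백무우 (white radish) | 678 | 6.8%
상추 (leaf lettuce) | 666 | 6.7%
버섯 (mushroom) | 602 | 6.0%
쪽파 (scallion) | 559 | 5.6%
돼지고기 (pork) | 462 | 4.6%
붕어 (crucian carp) | 437 | 4.4%
부시사과 (Fuji apple) | 427 | 4.3%
Other values (21) | 3675 | 36.8%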

Length

Histogram of lengths of the category

GRADECODE
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
0: 9538
1: 462

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 9538 | 95.4%
1 | 462 | 4.6%

Length

Histogram of lengths of the category

Category Frequency Plot


GRADENAME
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
무규격: 9538
1급: 462

Length

Max length: 3
Median length: 3
Mean length: 2.9538
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 무규격
2nd row: 무규격
3rd row: 무규격
4th row: 무규격
5th row: 무규격

Common Values

Value | Count | Frequency (%)
무규격 (ungraded) | 9538 | 95.4%
1급 (grade 1) | 462 | 4.6%

Length

Histogram of lengths of the category

Category Frequency Plot


MINPRICE
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 309
Distinct (%): 3.3%
Missing: 545
Missing (%): 5.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.36478054
Minimum: 0.2
Maximum: 490
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 0.7
Q1: 2
median: 5.2
Q3: 12.15
95-th percentile: 60
Maximum: 490
Range: 489.8
Interquartile range (IQR): 10.15

Descriptive statistics

Standard deviation: 32.27315284
Coefficient of variation (CV): 2.246686105
Kurtosis: 64.79662389
Mean: 14.36478054
Median Absolute Deviation (MAD): 3.8
Skewness: 7.115229715
Sum: 135819
Variance: 1041.556394
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
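Common Values

Value | Count | Frequency (%)
1 | 399 | 4.0%
5 | 395 | 4.0%
4 | 373 | 3.7%
2 | 323 | 3.2%
6 | 323 | 3.2%
3 | 276 | 2.8%
1.6 | 269 | 2.7%
8 | 268 | 2.7%
0.8 | 265 | 2.6%
7 | 245 | 2.5%
Other values (299) | 6319 | 63.2%
(Missing) | 545 | 5.5%

Minimum 10 values

Value | Count | Frequency (%)
0.2 | 5 | 0.1%
0.3 | 24 | 0.2%
0.4 | 123 | 1.2%
0.5 | 85 | 0.9%
0.6 | 223 | 2.2%
0.7 | 121 | 1.2%
0.8 | 265 | 2.6%
0.9 | 95 | 0.9%
1 | 399 | 4.0%
1.1 | 33 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
490 | 1 | < 0.1%
440 | 1 | < 0.1%
410 | 1 | < 0.1%
400 | 3 | < 0.1%
380 | 2 | < 0.1%
350 | 1 | < 0.1%
330 | 3 | < 0.1%
320 | 8 | 0.1%
310 | 1 | < 0.1%
300 | 51 | 0.5%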

AVGPRICE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 453
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.09213
Minimum: 0.3
Maximum: 510
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 0.3
5-th percentile: 0.9
Q1: 2.4
median: 6
Q3: 13.5
95-th percentile: 62
Maximum: 510
Range: 509.7
Interquartile range (IQR): 11.1

Descriptive statistics

Standard deviation: 32.71584735
Coefficient of variation (CV): 2.167742217
Kurtosis: 68.76745895
Mean: 15.09213
Median Absolute Deviation (MAD): 4.2
Skewness: 7.269230236
Sum: 150921.3
Variance: 1070.326668
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
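Common Values

Value | Count | Frequency (%)
6 | 265 | 2.6%
7 | 241 | 2.4%
1 | 231 | 2.3%
1.2 | 211 | 2.1%
1.4 | 185 | 1.8%
0.9 | 171 | 1.7%
9 | 167 | 1.7%
0.8 | 167 | 1.7%
5 | 167 | 1.7%
2 | 153 | 1.5%
Other values (443) | 8042 | 80.4%

Minimum 10 values

Value | Count | Frequency (%)
0.3 | 1 | < 0.1%
0.4 | 11 | 0.1%
0.5 | 40 | 0.4%
0.6 | 66 | 0.7%
0.7 | 129 | 1.3%
0.8 | 167 | 1.7%
0.9 | 171 | 1.7%
1 | 231 | 2.3%
1.1 | 124 | 1.2%
1.2 | 211 | 2.1%

Maximum 10 values

Value | Count | Frequency (%)
510 | 1 | < 0.1%
500 | 1 | < 0.1%
460 | 1 | < 0.1%
420 | 3 | < 0.1%
400 | 2 | < 0.1%
360 | 1 | < 0.1%
350 | 1 | < 0.1%
340 | 4 | < 0.1%
336 | 1 | < 0.1%
330 | 6 | 0.1%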

MAXPRICE
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 358
Distinct (%): 3.8%
Missing: 546
Missing (%): 5.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.05921303
Minimum: 0.4
Maximum: 560
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 1.1
Q1: 3.1
median: 7.5
Q3: 16
95-th percentile: 66
Maximum: 560
Range: 559.6
Interquartile range (IQR): 12.9

Descriptive statistics

Standard deviation: 35.3204558
Coefficient of variation (CV): 2.070462203
Kurtosis: 67.27348167
Mean: 17.05921303
Median Absolute Deviation (MAD): 5.3
Skewness: 7.161155213
Sum: 161277.8
Variance: 1247.534598
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
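Common Values

Value | Count | Frequency (%)
8 | 304 | 3.0%
7 | 280 | 2.8%
9 | 275 | 2.8%
6 | 244 | 2.4%
1.6 | 239 | 2.4%
5 | 232 | 2.3%
14 | 230 | 2.3%
12 | 229 | 2.3%
4 | 224 | 2.2%
10 | 224 | 2.2%
Other values (348) | 6973 | 69.7%
(Missing) | 546 | 5.5%

Minimum 10 values

Value | Count | Frequency (%)
0.4 | 1 | < 0.1%
0.5 | 9 | 0.1%
0.6 | 31 | 0.3%
0.7 | 43 | 0.4%
0.8 | 105 | 1.1%
0.9 | 47 | 0.5%
1 | 208 | 2.1%
1.1 | 56 | 0.6%
1.2 | 216 | 2.2%
1.3 | 44 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
560 | 1 | < 0.1%
510 | 1 | < 0.1%
490 | 1 | < 0.1%
480 | 1 | < 0.1%
460 | 1 | < 0.1%
440 | 3 | < 0.1%
420 | 2 | < 0.1%
390 | 1 | < 0.1%
370 | 5 | 0.1%
360 | 2 | < 0.1%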

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
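
The definition above can be written out directly. This is a minimal sketch, not the report's own implementation; the two short lists are MINPRICE and MAXPRICE values taken from the first sample rows and serve only as illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either input is constant


min_price = [1.0, 6.0, 2.2, 0.9, 9.0]
max_price = [1.8, 7.0, 3.0, 1.0, 15.0]

r = pearson_r(min_price, max_price)
# Shifting and rescaling either variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 + 3 * x for x in min_price], [2 * y - 5 for y in max_price])
print(r, r_rescaled)
```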

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
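
As an illustration of that recipe (a sketch, not the report's own code), the ranks can be computed first and then passed through the Pearson formula. Ties receive the average of their positions, which is a common convention and is assumed here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank-transformed variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho reaches 1 while r stays below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```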

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
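
The pair-counting description above corresponds to the variant usually called tau-a. A minimal sketch follows; note that statistical libraries often compute tau-b instead, which adjusts the denominator for ties, so results can differ when the data contains tied values.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# One swapped neighbour out of four values: 5 concordant pairs, 1 discordant pair.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```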

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
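
A sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013) is given below. It is not taken from the profiler's source. The example contingency table assumes that GRADECODE 0 always pairs with 무규격 and GRADECODE 1 with 1급, which the matching counts (9538 and 462) suggest but do not prove.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Assumed GRADECODE x GRADENAME table: a one-to-one mapping between code and name.
grade_table = [[9538, 0], [0, 462]]
v_grade = cramers_v_corrected(grade_table)
print(v_grade)
```

A value close to 1 for this table would be consistent with the High correlation alert raised for GRADECODE and GRADENAME.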

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
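
The idea behind the nullity views can be illustrated without plotting. The sketch below counts missing values per column and checks how often MINPRICE and MAXPRICE are missing together. The four rows are a small hand-picked subset of the sample rows in this report (with `<NA>` written as `None`), not the full dataset.

```python
# Price columns of four sample rows; None stands for <NA>.
rows = [
    {"MINPRICE": 1.0, "AVGPRICE": 1.4, "MAXPRICE": 1.8},
    {"MINPRICE": None, "AVGPRICE": 1.1, "MAXPRICE": None},
    {"MINPRICE": 1.6, "AVGPRICE": 2.4, "MAXPRICE": 2.6},
    {"MINPRICE": None, "AVGPRICE": 2.0, "MAXPRICE": None},
]
columns = ["MINPRICE", "AVGPRICE", "MAXPRICE"]

# Nullity by column: how many rows lack a value.
missing_counts = {c: sum(row[c] is None for row in rows) for c in columns}

# Co-occurrence of missingness, the quantity the nullity heatmap summarizes.
both_missing = sum(
    row["MINPRICE"] is None and row["MAXPRICE"] is None for row in rows
)

print(missing_counts, both_missing)
```

In the full data the counts are 545 and 546, so the two columns are probably missing together in almost every affected row.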

Sample

First rows

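row | df_index | DATES | MARKETCO | MARKETNM | MCLASSCODE | MCLASSNAME | SCLASSCODE | SCLASSNAME | GRADECODE | GRADENAME | MINPRICE | AVGPRICE | MAXPRICE
0 | 9197 | 20190226 | 3401002 | 안휘합비주곡퇴농산물도매시장 | 1911 | 과채류 | 191111 | 백무우 | 0 | 무규격 | 1.0 | 1.4 | 1.8
1 | 21139 | 20190506 | 3401002 | 안휘합비주곡퇴농산물도매시장 | 1999 | 기타 | 199913 | 버섯 | 0 | 무규격 | 6.0 | 6.5 | 7.0
2 | 9025 | 20190225 | 3702006 | 산동청도성양채소도매시장 | 2011 | 감류 | 201114 | 밀감 | 0 | 무규격 | 2.2 | 2.6 | 3.0
3 | 44118 | 20190923 | 2301006 | 흑룡강성할빈하달과채도매시장유한공사 | 1912 | 엽채류 | 191214 | 배추 | 0 | 무규격 | 0.9 | 1.0 | 1.0
4 | 30537 | 20190628 | 3702006 | 산동청도성양채소도매시장 | 2014 | 사과류 | 201413 | 부시사과 | 0 | 무규격 | 9.0 | 14.0 | 15.0
5 | 122 | 20190101 | 3604001 | 강서구강채소도매시장 | 1912 | 엽채류 | 191226 | 상추 | 0 | 무규격 | 3.0 | 4.6 | 5.0
6 | 12863 | 20190319 | 3401002 | 안휘합비주곡퇴농산물도매시장 | 1911 | 과채류 | 191148 | 토마토 | 0 | 무규격 | 6.4 | 7.2 | 8.0
7 | 43068 | 20190916 | 1200010 | 천진서청구당성무공해농수산물도매시장 | 1911 | 과채류 | 191111 | 백무우 | 0 | 무규격 | <NA> | 1.1 | <NA>
8 | 48130 | 20191017 | 3702006 | 산동청도성양채소도매시장 | 1911 | 과채류 | 191111 | 백무우 | 0 | 무규격 | 1.6 | 2.4 | 2.6
9 | 48930 | 20191022 | 4405001 | 광동산두농부산물도매중심 | 2017 | 복숭아 | 201711 | 복숭아 | 0 | 무규격 | 6.0 | 9.0 | 12.0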

Last rows

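row | df_index | DATES | MARKETCO | MARKETNM | MCLASSCODE | MCLASSNAME | SCLASSCODE | SCLASSNAME | GRADECODE | GRADENAME | MINPRICE | AVGPRICE | MAXPRICE
9990 | 15434 | 20190403 | 1100003 | 북경신발지농부산물도매시장 | 1911 | 과채류 | 191148 | 토마토 | 0 | 무규격 | 3.6 | 4.6 | 5.6
9991 | 34549 | 20190724 | 1100002 | 북경화간악각장도매시장 | 1813 | 어류 | 181320 | 장어 | 0 | 무규격 | 40.0 | 50.0 | 70.0
9992 | 19973 | 20190429 | 1200010 | 천진서청구당성무공해농수산물도매시장 | 1911 | 과채류 | 191111 | 백무우 | 0 | 무규격 | <NA> | 2.0 | <NA>
9993 | 47122 | 20191011 | 4405001 | 광동산두농부산물도매중심 | 2014 | 사과류 | 201413 | 부시사과 | 0 | 무규격 | 6.0 | 9.0 | 12.0
9994 | 11888 | 20190313 | 3205001 | 강소소주남환교농부산물도매시장 | 1813 | 어류 | 181326 | 농어 | 0 | 무규격 | 15.6 | 19.8 | 24.0
9995 | 8475 | 20190222 | 1100005 | 북경팔리교농산물중심도매시장 | 2015 | 포도류 | 201511 | 거봉포도 | 0 | 무규격 | 11.0 | 11.5 | 12.0
9996 | 11192 | 20190308 | 3401002 | 안휘합비주곡퇴농산물도매시장 | 1912 | 엽채류 | 191214 | 배추 | 0 | 무규격 | 0.6 | 0.8 | 1.0
9997 | 13779 | 20190325 | 1102221 | 북경순의석문채소도매시장 | 1912 | 엽채류 | 191232 | 양상추 | 0 | 무규격 | 1.6 | 2.0 | 2.4
9998 | 28221 | 20190614 | 1100003 | 북경신발지농부산물도매시장 | 1715 | 닭고기 | 171512 | 닭고기 | 0 | 무규격 | 13.0 | 13.3 | 13.6
9999 | 5441 | 20190131 | 3509821 | 복건복정민절변계농무중심시장 | 1999 | 기타 | 199914 | 팽이버섯 | 0 | 무규격 | 9.2 | 9.3 | 10.4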
99995441201901313509821복건복정민절변계농무중심시장1999기타199914팽이버섯0무규격9.29.310.4