Overview

Dataset statistics

Number of variables: 12
Number of observations: 101
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.6 KiB
Average record size in memory: 107.3 B

Variable types

Numeric: 8
Categorical: 4

Dataset

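Description: Detailed and support information on specialized fruit production complexes (과실전문생산단지)
Author: Ministry of Agriculture, Food and Rural Affairs (농림축산식품부)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220215000000001918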

Alerts

PRVYYDO_CYFD_AM has constant value "0" [Constant]
DELVRY_AM is highly correlated with BUDGET_AM and 1 other field [High correlation]
BUDGET_AM is highly correlated with DELVRY_AM and 1 other field [High correlation]
EXCUT_DTLS is highly correlated with DELVRY_AM and 1 other field [High correlation]
DISUSE_AM is highly correlated with PRVYYDO_CYFD_AM [High correlation]
SIGNGU_NM is highly correlated with PRVYYDO_CYFD_AM [High correlation]
PRVYYDO_CYFD_AM is highly correlated with DISUSE_AM and 2 other fields [High correlation]
CTRD_NM is highly correlated with PRVYYDO_CYFD_AM [High correlation]
DELVRY_AM has 20 (19.8%) zeros [Zeros]
BUDGET_AM has 20 (19.8%) zeros [Zeros]
EXCUT_DTLS has 20 (19.8%) zeros [Zeros]
CYFD_AM has 96 (95.0%) zeros [Zeros]
RL_EXCUT_RT has 20 (19.8%) zeros [Zeros]
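
The percentages in the zero alerts follow directly from the zero counts and the 101 observations. The sketch below recomputes them; the helper name `zero_share` is illustrative, and this is not the profiling tool's own code.

```python
def zero_share(zero_count: int, total: int) -> str:
    """Format a zero count and its share of all rows, e.g. '20 (19.8%) zeros'."""
    return f"{zero_count} ({zero_count / total:.1%}) zeros"


N_OBSERVATIONS = 101  # "Number of observations" from the overview

# Zero counts as reported in the alerts
zero_counts = {
    "DELVRY_AM": 20,
    "BUDGET_AM": 20,
    "EXCUT_DTLS": 20,
    "CYFD_AM": 96,
    "RL_EXCUT_RT": 20,
}

alerts = {name: zero_share(count, N_OBSERVATIONS) for name, count in zero_counts.items()}
for name, text in alerts.items():
    print(f"{name} has {text}")
```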

Reproduction

Analysis started: 2022-08-12 14:53:21.127917
Analysis finished: 2022-08-12 14:53:30.096093
Duration: 8.97 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

YEAR
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.128713
Minimum: 2005
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 2005
5-th percentile: 2006
Q1: 2009
Median: 2011
Q3: 2014
95-th percentile: 2015
Maximum: 2016
Range: 11
Interquartile range (IQR): 5
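
The quantile statistics can be recomputed from the YEAR value counts listed in this report's frequency tables. This is a minimal sketch: the `quantile` helper and its linear-interpolation convention are assumptions, not a statement about how the report was produced.

```python
import math

# YEAR value counts taken from the frequency tables of this report
year_counts = {
    2005: 4, 2006: 7, 2007: 6, 2008: 7, 2009: 7, 2010: 11,
    2011: 11, 2012: 7, 2013: 10, 2014: 13, 2015: 14, 2016: 4,
}
# Expand the counts into one sorted list of 101 observations
years = sorted(year for year, count in year_counts.items() for _ in range(count))


def quantile(sorted_values, q):
    """Quantile using linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


p05, q1, median, q3, p95 = (quantile(years, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = years[-1] - years[0]
print(p05, q1, median, q3, p95, iqr, value_range)
```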

Descriptive statistics

Standard deviation: 3.167533319
Coefficient of variation (CV): 0.001575002783
Kurtosis: -1.038487119
Mean: 2011.128713
Median Absolute Deviation (MAD): 3
Skewness: -0.2993890921
Sum: 203124
Variance: 10.03326733
Monotonicity: Not monotonic
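
As a cross-check, several of these descriptive statistics can be reproduced with Python's standard library from the YEAR value counts in this report's frequency tables. With these counts, the sample convention (n − 1 denominator) gives the reported variance. Variable names are illustrative.

```python
import statistics

# YEAR value counts taken from the frequency tables of this report
year_counts = {
    2005: 4, 2006: 7, 2007: 6, 2008: 7, 2009: 7, 2010: 11,
    2011: 11, 2012: 7, 2013: 10, 2014: 13, 2015: 14, 2016: 4,
}
years = [year for year, count in year_counts.items() for _ in range(count)]

total = sum(years)
mean = statistics.fmean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = statistics.stdev(years)          # square root of the sample variance
cv = std / mean                        # coefficient of variation
median = statistics.median(years)
mad = statistics.median(abs(y - median) for y in years)  # median absolute deviation

print(total, round(mean, 6), round(variance, 8), round(std, 9), round(cv, 12), mad)
```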
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
2015              14     13.9%
2014              13     12.9%
2010              11     10.9%
2011              11     10.9%
2013              10     9.9%
2006              7      6.9%
2008              7      6.9%
2009              7      6.9%
2012              7      6.9%
2007              6      5.9%
Other values (2)  8      7.9%

Minimum values

Value  Count  Frequency (%)
2005   4      4.0%
2006   7      6.9%
2007   6      5.9%
2008   7      6.9%
2009   7      6.9%
2010   11     10.9%
2011   11     10.9%
2012   7      6.9%
2013   10     9.9%
2014   13     12.9%

Maximum values

Value  Count  Frequency (%)
2016   4      4.0%
2015   14     13.9%
2014   13     12.9%
2013   10     9.9%
2012   7      6.9%
2011   11     10.9%
2010   11     10.9%
2009   7      6.9%
2008   7      6.9%
2007   6      5.9%

CTRD_NM
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 936.0 B

Length

Max length: 7
Median length: 4
Mean length: 4.297029703
Min length: 4

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 경상남도
2nd row: 전라북도
3rd row: 경상북도
4th row: 경상북도
5th row: 경상북도

Common Values

Value           Count  Frequency (%)
경상북도        43     42.6%
충청북도        22     21.8%
전라북도        19     18.8%
제주특별자치도  10     9.9%
경상남도        6      5.9%
충청남도        1      1.0%

Length

Histogram of lengths of the category

Category Frequency Plot


CTRD_CODE
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6460792.079
Minimum: 6430000
Maximum: 6500000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 6430000
5-th percentile: 6430000
Q1: 6450000
Median: 6470000
Q3: 6470000
95-th percentile: 6500000
Maximum: 6500000
Range: 70000
Interquartile range (IQR): 20000

Descriptive statistics

Standard deviation: 21151.0363
Coefficient of variation (CV): 0.003273752822
Kurtosis: -0.7056912064
Mean: 6460792.079
Median Absolute Deviation (MAD): 20000
Skewness: 0.003929897528
Sum: 652540000
Variance: 447366336.6
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value    Count  Frequency (%)
6470000  43     42.6%
6430000  22     21.8%
6450000  19     18.8%
6500000  10     9.9%
6480000  6      5.9%
6440000  1      1.0%

Minimum values

Value    Count  Frequency (%)
6430000  22     21.8%
6440000  1      1.0%
6450000  19     18.8%
6470000  43     42.6%
6480000  6      5.9%
6500000  10     9.9%

Maximum values

Value    Count  Frequency (%)
6500000  10     9.9%
6480000  6      5.9%
6470000  43     42.6%
6450000  19     18.8%
6440000  1      1.0%
6430000  22     21.8%

SIGNGU_NM
Categorical

HIGH CORRELATION

Distinct: 25
Distinct (%): 24.8%
Missing: 0
Missing (%): 0.0%
Memory size: 936.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.079207921
Min length: 3

Unique

Unique: 3
Unique (%): 3.0%

Sample

1st row: 김해시
2nd row: 장수군
3rd row: 김천시
4th row: 영주시
5th row: 영양군

Common Values

Value              Count  Frequency (%)
예천군             10     9.9%
상주시             8      7.9%
서귀포시           8      7.9%
음성군             7      6.9%
영주시             7      6.9%
제천시             7      6.9%
장수군             7      6.9%
남원시             6      5.9%
김천시             5      5.0%
봉화군             4      4.0%
Other values (15)  32     31.7%

Length

Histogram of lengths of the category

SIGNGU_CODE
Real number (ℝ≥0)

Distinct: 25
Distinct (%): 24.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5076930.693
Minimum: 4390000
Maximum: 6520000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 4390000
5-th percentile: 4400000
Q1: 4700000
Median: 5090000
Q3: 5230000
95-th percentile: 6520000
Maximum: 6520000
Range: 2130000
Interquartile range (IQR): 530000

Descriptive statistics

Standard deviation: 578594.4047
Coefficient of variation (CV): 0.1139653936
Kurtosis: 1.407840874
Mean: 5076930.693
Median Absolute Deviation (MAD): 310000
Skewness: 1.270225242
Sum: 512770000
Variance: 3.347714851 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=25)

Common values

Value              Count  Frequency (%)
5230000            10     9.9%
6520000            8      7.9%
5110000            8      7.9%
4470000            7      6.9%
5090000            7      6.9%
4400000            7      6.9%
4750000            7      6.9%
4700000            6      5.9%
5060000            5      5.0%
5240000            4      4.0%
Other values (15)  32     31.7%

Minimum values

Value    Count  Frequency (%)
4390000  3      3.0%
4400000  7      6.9%
4420000  1      1.0%
4440000  2      2.0%
4470000  7      6.9%
4490000  1      1.0%
4700000  6      5.9%
4710000  2      2.0%
4740000  2      2.0%
4750000  7      6.9%

Maximum values

Value    Count  Frequency (%)
6520000  8      7.9%
6510000  2      2.0%
5710000  2      2.0%
5360000  4      4.0%
5350000  2      2.0%
5240000  4      4.0%
5230000  10     9.9%
5170000  2      2.0%
5150000  1      1.0%
5110000  8      7.9%

DELVRY_AM
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 80
Distinct (%): 79.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1010551.564
Minimum: 0
Maximum: 5143700
Zeros: 20
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 313000
Median: 762300
Q3: 1320000
95-th percentile: 2978750
Maximum: 5143700
Range: 5143700
Interquartile range (IQR): 1007000

Descriptive statistics

Standard deviation: 993678.3044
Coefficient of variation (CV): 0.9833029204
Kurtosis: 3.940471033
Mean: 1010551.564
Median Absolute Deviation (MAD): 518550
Skewness: 1.763266221
Sum: 102065708
Variance: 9.873965727 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  20     19.8%
1083750            2      2.0%
940136             2      2.0%
2978750            1      1.0%
4006300            1      1.0%
611250             1      1.0%
738750             1      1.0%
1108347            1      1.0%
993631             1      1.0%
483999             1      1.0%
Other values (70)  70     69.3%

Minimum values

Value   Count  Frequency (%)
0       20     19.8%
205801  1      1.0%
243750  1      1.0%
282500  1      1.0%
287000  1      1.0%
292500  1      1.0%
313000  1      1.0%
341250  1      1.0%
391000  1      1.0%
457000  1      1.0%

Maximum values

Value    Count  Frequency (%)
5143700  1      1.0%
4397500  1      1.0%
4006300  1      1.0%
3497000  1      1.0%
3482500  1      1.0%
2978750  1      1.0%
2743500  1      1.0%
2481250  1      1.0%
2371250  1      1.0%
2364300  1      1.0%

PRVYYDO_CYFD_AM
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 936.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      101    100.0%

Length

Histogram of lengths of the category

Category Frequency Plot


BUDGET_AM
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 80
Distinct (%): 79.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1010551.564
Minimum: 0
Maximum: 5143700
Zeros: 20
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 313000
Median: 762300
Q3: 1320000
95-th percentile: 2978750
Maximum: 5143700
Range: 5143700
Interquartile range (IQR): 1007000

Descriptive statistics

Standard deviation: 993678.3044
Coefficient of variation (CV): 0.9833029204
Kurtosis: 3.940471033
Mean: 1010551.564
Median Absolute Deviation (MAD): 518550
Skewness: 1.763266221
Sum: 102065708
Variance: 9.873965727 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  20     19.8%
1083750            2      2.0%
940136             2      2.0%
2978750            1      1.0%
4006300            1      1.0%
611250             1      1.0%
738750             1      1.0%
1108347            1      1.0%
993631             1      1.0%
483999             1      1.0%
Other values (70)  70     69.3%

Minimum values

Value   Count  Frequency (%)
0       20     19.8%
205801  1      1.0%
243750  1      1.0%
282500  1      1.0%
287000  1      1.0%
292500  1      1.0%
313000  1      1.0%
341250  1      1.0%
391000  1      1.0%
457000  1      1.0%

Maximum values

Value    Count  Frequency (%)
5143700  1      1.0%
4397500  1      1.0%
4006300  1      1.0%
3497000  1      1.0%
3482500  1      1.0%
2978750  1      1.0%
2743500  1      1.0%
2481250  1      1.0%
2371250  1      1.0%
2364300  1      1.0%

EXCUT_DTLS
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 81
Distinct (%): 80.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1010340.109
Minimum: 0
Maximum: 5143700
Zeros: 20
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 287000
Median: 781586
Q3: 1315000
95-th percentile: 3047100
Maximum: 5143700
Range: 5143700
Interquartile range (IQR): 1028000

Descriptive statistics

Standard deviation: 1012395.665
Coefficient of variation (CV): 1.002034519
Kurtosis: 3.543384099
Mean: 1010340.109
Median Absolute Deviation (MAD): 504664
Skewness: 1.694221086
Sum: 102044351
Variance: 1.024944983 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  20     19.8%
1083750            2      2.0%
1193521            1      1.0%
608337             1      1.0%
552250             1      1.0%
611250             1      1.0%
738750             1      1.0%
1108347            1      1.0%
993631             1      1.0%
955000             1      1.0%
Other values (71)  71     70.3%

Minimum values

Value   Count  Frequency (%)
0       20     19.8%
47880   1      1.0%
65778   1      1.0%
203304  1      1.0%
243750  1      1.0%
282500  1      1.0%
287000  1      1.0%
292500  1      1.0%
313000  1      1.0%
341250  1      1.0%

Maximum values

Value    Count  Frequency (%)
5143700  1      1.0%
4391914  1      1.0%
4006300  1      1.0%
3497000  1      1.0%
3482500  1      1.0%
3047100  1      1.0%
2978750  1      1.0%
2587000  1      1.0%
2371250  1      1.0%
2311250  1      1.0%

CYFD_AM
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35384917.82
Minimum: 0
Maximum: 2208000000
Zeros: 96
Zeros (%): 95.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 2208000000
Range: 2208000000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 248359267.3
Coefficient of variation (CV): 7.018788868
Kurtosis: 63.89526964
Mean: 35384917.82
Median Absolute Deviation (MAD): 0
Skewness: 7.828631887
Sum: 3573876700
Variance: 6.168232566 × 10^16
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value       Count  Frequency (%)
0           96     95.0%
2208000000  1      1.0%
2639000     1      1.0%
1180786700  1      1.0%
156500000   1      1.0%
25951000    1      1.0%

Minimum values

Value       Count  Frequency (%)
0           96     95.0%
2639000     1      1.0%
25951000    1      1.0%
156500000   1      1.0%
1180786700  1      1.0%
2208000000  1      1.0%

Maximum values

Value       Count  Frequency (%)
2208000000  1      1.0%
1180786700  1      1.0%
156500000   1      1.0%
25951000    1      1.0%
2639000     1      1.0%
0           96     95.0%

DISUSE_AM
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 936.0 B

Length

Max length: 7
Median length: 1
Mean length: 1.118811881
Min length: 1

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count  Frequency (%)
0        99     98.0%
2478066  1      1.0%
2979600  1      1.0%

Length

Histogram of lengths of the category

Category Frequency Plot


RL_EXCUT_RT
Real number (ℝ≥0)

ZEROS

Distinct: 20
Distinct (%): 19.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8097029703
Minimum: 0
Maximum: 2
Zeros: 20
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.73
Median: 1
Q3: 1
95-th percentile: 1.42
Maximum: 2
Range: 2
Interquartile range (IQR): 0.27

Descriptive statistics

Standard deviation: 0.4681355689
Coefficient of variation (CV): 0.578157159
Kurtosis: 0.1515886403
Mean: 0.8097029703
Median Absolute Deviation (MAD): 0
Skewness: -0.5834695438
Sum: 81.78
Variance: 0.2191509109
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=20)

Common values

Value              Count  Frequency (%)
1                  61     60.4%
0                  20     19.8%
0.99               2      2.0%
1.43               2      2.0%
0.05               1      1.0%
0.92               1      1.0%
1.97               1      1.0%
0.59               1      1.0%
1.82               1      1.0%
2                  1      1.0%
Other values (10)  10     9.9%

Minimum values

Value  Count  Frequency (%)
0      20     19.8%
0.05   1      1.0%
0.08   1      1.0%
0.59   1      1.0%
0.7    1      1.0%
0.72   1      1.0%
0.73   1      1.0%
0.74   1      1.0%
0.77   1      1.0%
0.92   1      1.0%

Maximum values

Value  Count  Frequency (%)
2      1      1.0%
1.97   1      1.0%
1.82   1      1.0%
1.43   2      2.0%
1.42   1      1.0%
1.29   1      1.0%
1.2    1      1.0%
1      61     60.4%
0.99   2      2.0%
0.94   1      1.0%

Interactions

Figure: 64 pairwise scatter-plot panels (8 × 8, one per pair of numeric variables).

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a perfectly linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
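
A minimal sketch of this calculation in plain Python follows. The function name `pearson_r` is illustrative, and the two lists are the BUDGET_AM and EXCUT_DTLS values of the first ten rows shown in the Sample section; this is not the report's own implementation.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) factors of the covariance and the variances cancel out.
    return cov / math.sqrt(ss_x * ss_y)


budget = [762300, 1337539, 611250, 738750, 1108347, 993631, 483999, 1330511, 0, 608337]
executed = [762300, 65778, 611250, 738750, 1108347, 993631, 955000, 781586, 0, 608337]

r = pearson_r(budget, executed)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([x / 1000 + 5 for x in budget], [3 * y - 7 for y in executed])
print(round(r, 4), round(r_rescaled, 4))
```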

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
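
The rank-based calculation can be sketched as follows: rank both variables, then apply the Pearson formula to the ranks. The helper names and the toy data are illustrative, and giving tied values their average rank is an assumed convention.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.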

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
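
The pair-counting definition can be sketched directly. This follows the formula exactly as stated here (ties count as neither concordant nor discordant, and the denominator is the total number of pairs). Library implementations may apply a tie-adjusted variant, so results can differ on data with ties. The function name and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # two swapped neighbours give 2 discordant pairs out of 10
tau = kendall_tau(x, y)
print(tau)
```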

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
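
A minimal sketch of a bias-corrected Cramér's V follows, using the correction as it is commonly stated for Bergsma's estimator. Whether the profiling software computes it in exactly this way is an assumption, and the function name and the two contingency tables are illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


independent = [[20, 30], [40, 60]]  # rows are proportional: no association
associated = [[50, 0], [0, 50]]     # each row maps to exactly one column

v_independent = cramers_v_corrected(independent)
v_associated = cramers_v_corrected(associated)
print(v_independent, v_associated)
```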

Missing values

Figure: a simple visualization of nullity by column.
Figure: the nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   YEAR  CTRD_NM   CTRD_CODE  SIGNGU_NM  SIGNGU_CODE  DELVRY_AM  PRVYYDO_CYFD_AM  BUDGET_AM  EXCUT_DTLS  CYFD_AM  DISUSE_AM  RL_EXCUT_RT
0  2006  경상남도  6480000    김해시     5350000      762300     0                762300     762300      0        0          1.0
1  2007  전라북도  6450000    장수군     4750000      1337539    0                1337539    65778       0        0          0.05
2  2007  경상북도  6470000    김천시     5060000      611250     0                611250     611250      0        0          1.0
3  2007  경상북도  6470000    영주시     5090000      738750     0                738750     738750      0        0          1.0
4  2007  경상북도  6470000    영양군     5170000      1108347    0                1108347    1108347     0        0          1.0
5  2007  경상북도  6470000    예천군     5230000      993631     0                993631     993631      0        0          1.0
6  2007  경상북도  6470000    봉화군     5240000      483999     0                483999     955000      0        0          1.97
7  2008  전라북도  6450000    장수군     4750000      1330511    0                1330511    781586      0        0          0.59
8  2008  전라북도  6450000    고창군     4780000      0          0                0          0           0        0          0.0
9  2008  경상북도  6470000    김천시     5060000      608337     0                608337     608337      0        0          1.0

Last rows

     YEAR  CTRD_NM   CTRD_CODE  SIGNGU_NM  SIGNGU_CODE  DELVRY_AM  PRVYYDO_CYFD_AM  BUDGET_AM  EXCUT_DTLS  CYFD_AM  DISUSE_AM  RL_EXCUT_RT
91   2005  충청북도  6430000    영동군     4440000      1473780    0                1473780    1473780     0        0          1.0
92   2005  경상북도  6470000    영주시     5090000      889000     0                889000     889000      0        0          1.0
93   2005  경상북도  6470000    봉화군     5240000      2032239    0                2032239    2032239     0        0          1.0
94   2005  경상남도  6480000    김해시     5350000      2286900    0                2286900    2286900     0        0          1.0
95   2006  경상북도  6470000    김천시     5060000      1325105    0                1325105    1222000     0        0          0.92
96   2006  경상북도  6470000    안동시     5070000      1093104    0                1093104    1093104     0        0          1.0
97   2006  경상북도  6470000    영주시     5090000      1219744    0                1219744    1219744     0        0          1.0
98   2006  경상북도  6470000    상주시     5110000      1524680    0                1524680    1524680     0        0          1.0
99   2006  경상북도  6470000    예천군     5230000      940136     0                940136     940136      0        0          1.0
100  2006  경상북도  6470000    봉화군     5240000      940136     0                940136     940000      0        0          1.0