Overview

Dataset statistics

Number of variables: 17
Number of observations: 10000
Missing cells: 22206
Missing cells (%): 13.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 142.0 B
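
The missing-cell figures can be cross-checked against the per-variable missing counts listed in the Alerts and Variables sections of this report. The following is a minimal sketch of that arithmetic; the dictionary and variable names are illustrative and not part of the report itself.

```python
# Per-column missing counts as listed in this report's Variables section.
# Columns not listed here report 0 missing values.
missing_by_column = {
    "DETAIL_INFO_URL": 10000,
    "SPCIES_PRTC_APLC_AT": 10000,
    "IMAGE_URL": 2186,
    "TNOAC": 20,
}
n_variables = 17
n_observations = 10000

# Total missing cells and their share of all cells in the table.
missing_cells = sum(missing_by_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")
```

The sum of the four per-column counts reproduces the reported total of 22206 missing cells, which is 13.1% of the 170000 cells in the table.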

Variable types

Numeric: 1
Categorical: 14
Unsupported: 2

Dataset

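Description: Plant resource information (식물자원정보)
Author: Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries (농림수산식품교육문화정보원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220210000000001803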

Alerts

LIFE_RESRCE_LTTOT_AT has constant value "-" Constant
RESRCE_NO has a high cardinality: 10000 distinct values High cardinality
SCNCENM_CD has a high cardinality: 1344 distinct values High cardinality
SCNCENM has a high cardinality: 1461 distinct values High cardinality
TNOAC has a high cardinality: 1409 distinct values High cardinality
IMAGE_URL has a high cardinality: 848 distinct values High cardinality
LIFE_RESRCE_LTTOT_AT is highly correlated with LAST_UPDT_DE and 7 other fields High correlation
LAST_UPDT_DE is highly correlated with LIFE_RESRCE_LTTOT_AT and 2 other fields High correlation
LIFE_RESRCE_STLE_CD_NM is highly correlated with LIFE_RESRCE_LTTOT_AT and 3 other fields High correlation
INSTT_CD_KOREA_NM is highly correlated with LIFE_RESRCE_LTTOT_AT and 1 other field High correlation
OUTNATN_TKOUT_AT is highly correlated with LIFE_RESRCE_LTTOT_AT High correlation
INSTT_CD is highly correlated with LIFE_RESRCE_LTTOT_AT and 1 other field High correlation
LIFE_RESRCE_KND_CD_NM is highly correlated with LIFE_RESRCE_LTTOT_AT and 4 other fields High correlation
LIFE_RESRCE_KND_CD is highly correlated with LIFE_RESRCE_LTTOT_AT and 4 other fields High correlation
LIFE_RESRCE_STLE_CD is highly correlated with LIFE_RESRCE_LTTOT_AT and 3 other fields High correlation
DETAIL_INFO_URL has 10000 (100.0%) missing values Missing
IMAGE_URL has 2186 (21.9%) missing values Missing
SPCIES_PRTC_APLC_AT has 10000 (100.0%) missing values Missing
df_index has unique values Unique
RESRCE_NO has unique values Unique
DETAIL_INFO_URL is an unsupported type, check if it needs cleaning or further analysis Unsupported
SPCIES_PRTC_APLC_AT is an unsupported type, check if it needs cleaning or further analysis Unsupported

Reproduction

Analysis started: 2022-08-12 14:44:46.341854
Analysis finished: 2022-08-12 14:44:50.248796
Duration: 3.91 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23734.2535
Minimum: 15
Maximum: 47825
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 15
5-th percentile: 2279.9
Q1: 11760.5
Median: 23708.5
Q3: 35684.5
95-th percentile: 45357.15
Maximum: 47825
Range: 47810
Interquartile range (IQR): 23924

Descriptive statistics

Standard deviation: 13854.30096
Coefficient of variation (CV): 0.5837260043
Kurtosis: -1.20696792
Mean: 23734.2535
Median Absolute Deviation (MAD): 11963.5
Skewness: 0.0127921497
Sum: 237342535
Variance: 191941655.1
Monotonicity: Not monotonic
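
Several of the df_index statistics are derived from one another: the range from the minimum and maximum, the IQR from the quartiles, the coefficient of variation from the standard deviation and mean, and the variance from the standard deviation. The sketch below recomputes them from the reported values only; the variable names are illustrative, and small differences in the last digits are expected because the report rounds its inputs.

```python
# Values copied from the df_index statistics in this report.
minimum, maximum = 15, 47825
q1, q3 = 11760.5, 35684.5
mean = 23734.2535
std_dev = 13854.30096
n_observations = 10000

value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
cv = std_dev / mean                  # Coefficient of variation
variance = std_dev ** 2              # Variance
total = mean * n_observations        # Sum

print(value_range, iqr, round(cv, 6), round(variance, 1), round(total))
```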
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
786 | 1 | < 0.1%
24310 | 1 | < 0.1%
4314 | 1 | < 0.1%
32803 | 1 | < 0.1%
21444 | 1 | < 0.1%
45918 | 1 | < 0.1%
11277 | 1 | < 0.1%
3720 | 1 | < 0.1%
29717 | 1 | < 0.1%
34661 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%
Value | Count | Frequency (%)
15 | 1 | < 0.1%
16 | 1 | < 0.1%
21 | 1 | < 0.1%
23 | 1 | < 0.1%
24 | 1 | < 0.1%
26 | 1 | < 0.1%
27 | 1 | < 0.1%
33 | 1 | < 0.1%
35 | 1 | < 0.1%
40 | 1 | < 0.1%
Value | Count | Frequency (%)
47825 | 1 | < 0.1%
47822 | 1 | < 0.1%
47813 | 1 | < 0.1%
47809 | 1 | < 0.1%
47799 | 1 | < 0.1%
47789 | 1 | < 0.1%
47786 | 1 | < 0.1%
47783 | 1 | < 0.1%
47779 | 1 | < 0.1%
47776 | 1 | < 0.1%

RESRCE_NO
Categorical

HIGH CARDINALITY
UNIQUE

Distinct10000
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
14001191150200-071-00036799
 
1
14001191120200-020-00013204
 
1
14003771110102-010-00002132
 
1
14003771110103-010-00002249
 
1
14001191120200-020-00001220
 
1
Other values (9995)
9995 

Length

Max length27
Median length27
Mean length27
Min length27

Unique

Unique10000 ?
Unique (%)100.0%

Sample

1st row14001191150200-071-00036799
2nd row14001191120200-020-00001235
3rd row14003771110103-010-00002249
4th row14001191120200-020-00001220
5th row14001191110200-010-00003358

Common Values

Value | Count | Frequency (%)
14001191150200-071-00036799 | 1 | < 0.1%
14001191120200-020-00013204 | 1 | < 0.1%
14003771110102-010-00002132 | 1 | < 0.1%
14003771110103-010-00002249 | 1 | < 0.1%
14001191120200-020-00001220 | 1 | < 0.1%
14001191110200-010-00003358 | 1 | < 0.1%
14003771110102-010-00005164 | 1 | < 0.1%
14003771110103-010-00001813 | 1 | < 0.1%
14001191120200-020-00007281 | 1 | < 0.1%
14001191120200-020-00009602 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
14001191150200-071-00036799 | 1 | < 0.1%
14003771110102-010-00009472 | 1 | < 0.1%
14003771110102-010-00001342 | 1 | < 0.1%
14003771110102-010-00004960 | 1 | < 0.1%
14001191120200-020-00001055 | 1 | < 0.1%
14001191110200-010-00003838 | 1 | < 0.1%
14001191120200-020-00017104 | 1 | < 0.1%
14003771110102-010-00007390 | 1 | < 0.1%
14003771110102-010-00002653 | 1 | < 0.1%
14001191120200-020-00003870 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

LIFE_RESRCE_STLE_CD
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
1
6272 
2
3504 
5
 
224

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row5
2nd row2
3rd row1
4th row2
5th row1

Common Values

Value | Count | Frequency (%)
1 | 6272 | 62.7%
2 | 3504 | 35.0%
5 | 224 | 2.2%

Length

Histogram of lengths of the category

LIFE_RESRCE_STLE_CD_NM
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
종자
6272 
영양체
3504 
표본
 
224

Length

Max length3
Median length2
Mean length2.3504
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row표본
2nd row영양체
3rd row종자
4th row영양체
5th row종자

Common Values

Value | Count | Frequency (%)
종자 (seed) | 6272 | 62.7%
영양체 (vegetative material) | 3504 | 35.0%
표본 (specimen) | 224 | 2.2%

Length

Histogram of lengths of the category

LIFE_RESRCE_KND_CD
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
1
9776 
7
 
224

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row7
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value | Count | Frequency (%)
1 | 9776 | 97.8%
7 | 224 | 2.2%

Length

Histogram of lengths of the category

LIFE_RESRCE_KND_CD_NM
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
식량작물
9776 
수목류
 
224

Length

Max length4
Median length4
Mean length3.9776
Min length3

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row수목류
2nd row식량작물
3rd row식량작물
4th row식량작물
5th row식량작물

Common Values

Value | Count | Frequency (%)
식량작물 (food crops) | 9776 | 97.8%
수목류 (trees) | 224 | 2.2%

Length

Histogram of lengths of the category

SCNCENM_CD
Categorical

HIGH CARDINALITY

Distinct1344
Distinct (%)13.4%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
BSD0003395965
 
249
BSD0001596891
 
198
BSD0001918795
 
164
BSD0003483320
 
144
BSD0001723541
 
133
Other values (1339)
9112 

Length

Max length13
Median length13
Mean length13
Min length13

Unique

Unique478 ?
Unique (%)4.8%

Sample

1st rowBSD0000755943
2nd rowBSD0000792728
3rd rowBSD0003129056
4th rowBSD0002676251
5th rowBSD0001384123

Common Values

Value | Count | Frequency (%)
BSD0003395965 | 249 | 2.5%
BSD0001596891 | 198 | 2.0%
BSD0001918795 | 164 | 1.6%
BSD0003483320 | 144 | 1.4%
BSD0001723541 | 133 | 1.3%
BSD0003400683 | 128 | 1.3%
BSD0000846780 | 125 | 1.2%
BSD0003330177 | 115 | 1.1%
BSD0003250789 | 114 | 1.1%
BSD0002997219 | 110 | 1.1%
Other values (1334) | 8520 | 85.2%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
bsd0003395965 | 249 | 2.5%
bsd0001596891 | 198 | 2.0%
bsd0001918795 | 164 | 1.6%
bsd0003483320 | 144 | 1.4%
bsd0001723541 | 133 | 1.3%
bsd0003400683 | 128 | 1.3%
bsd0000846780 | 125 | 1.2%
bsd0003330177 | 115 | 1.1%
bsd0003250789 | 114 | 1.1%
bsd0002997219 | 110 | 1.1%
Other values (1334) | 8520 | 85.2%

SCNCENM
Categorical

HIGH CARDINALITY

Distinct1461
Distinct (%)14.6%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
Abies koreana Wilson
 
234
Pinus densiflora Siebold & Zucc.
 
167
Rhododendron yedoense for. poukhanense (H.Lev.) Sugim.
 
161
Acer pseudosieboldianum (Pax) Kom.
 
144
Rhododendron schlippenbachii Maxim.
 
133
Other values (1456)
9161 

Length

Max length72
Median length58
Mean length31.958
Min length8

Unique

Unique569 ?
Unique (%)5.7%

Sample

1st rowCarduus crispus L.
2nd rowMagnolia sieboldii k.koch
3rd rowMachilus thunbergii Siebold & Zucc.
4th rowBerberis thunbergii DC.
5th rowBidens frondosa L.

Common Values

Value | Count | Frequency (%)
Abies koreana Wilson | 234 | 2.3%
Pinus densiflora Siebold & Zucc. | 167 | 1.7%
Rhododendron yedoense for. poukhanense (H.Lev.) Sugim. | 161 | 1.6%
Acer pseudosieboldianum (Pax) Kom. | 144 | 1.4%
Rhododendron schlippenbachii Maxim. | 133 | 1.3%
Viburnum odoratissimum var. awabuki (K.Koch) Zabel ex Rumpler | 128 | 1.3%
Camellia sinensis L. | 125 | 1.2%
Melia azedarach L. | 115 | 1.1%
Quercus aliena Blume | 114 | 1.1%
Pinus koraiensis Siebold & Zucc. | 110 | 1.1%
Other values (1451) | 8569 | 85.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1481
 
3.5%
var1433
 
3.4%
thunb1299
 
3.1%
l1123
 
2.7%
siebold1098
 
2.6%
zucc1061
 
2.5%
ex893
 
2.1%
nakai875
 
2.1%
maxim781
 
1.9%
for651
 
1.5%
Other values (2136)31460
74.6%

TNOAC
Categorical

HIGH CARDINALITY

Distinct1409
Distinct (%)14.1%
Missing20
Missing (%)0.2%
Memory size78.2 KiB
구상나무
 
245
소나무
 
167
산철쭉
 
164
당단풍나무
 
144
철쭉
 
133
Other values (1404)
9127 

Length

Max length17
Median length14
Mean length3.9
Min length1

Unique

Unique533 ?
Unique (%)5.3%

Sample

1st row지느러미엉겅퀴
2nd row함박꽃나무
3rd row후박나무
4th row일본매자나무
5th row미국가막사리

Common Values

Value | Count | Frequency (%)
구상나무 | 245 | 2.5%
소나무 | 167 | 1.7%
산철쭉 | 164 | 1.6%
당단풍나무 | 144 | 1.4%
철쭉 | 133 | 1.3%
아왜나무 | 128 | 1.3%
차나무 | 125 | 1.2%
멀구슬나무 | 115 | 1.1%
갈참나무 | 114 | 1.1%
잣나무 | 110 | 1.1%
Other values (1399) | 8535 | 85.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
구상나무 | 245 | 2.4%
소나무 | 167 | 1.7%
산철쭉 | 164 | 1.6%
당단풍나무 | 144 | 1.4%
철쭉 | 133 | 1.3%
아왜나무 | 128 | 1.3%
차나무 | 125 | 1.2%
멀구슬나무 | 115 | 1.1%
갈참나무 | 114 | 1.1%
잣나무 | 110 | 1.1%
Other values (1406) | 8582 | 85.6%

INSTT_CD
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
1400119
5440 
1400377
4511 
1400573
 
49

Length

Max length7
Median length7
Mean length7
Min length7

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1400119
2nd row1400119
3rd row1400377
4th row1400119
5th row1400119

Common Values

Value | Count | Frequency (%)
1400119 | 5440 | 54.4%
1400377 | 4511 | 45.1%
1400573 | 49 | 0.5%

Length

Histogram of lengths of the category

INSTT_CD_KOREA_NM
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
국립수목원
5440 
국립산림과학원
4511 
국립산림품종관리센터
 
49

Length

Max length10
Median length5
Mean length5.9267
Min length5

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row국립수목원
2nd row국립수목원
3rd row국립산림과학원
4th row국립수목원
5th row국립수목원

Common Values

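Value | Count | Frequency (%)
국립수목원 (Korea National Arboretum) | 5440 | 54.4%
국립산림과학원 (National Institute of Forest Science) | 4511 | 45.1%
국립산림품종관리센터 (National Forest Seed Variety Center) | 49 | 0.5%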

Length

Histogram of lengths of the category

DETAIL_INFO_URL
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing10000
Missing (%)100.0%
Memory size88.0 KiB

IMAGE_URL
Categorical

HIGH CARDINALITY
MISSING

Distinct848
Distinct (%)10.9%
Missing2186
Missing (%)21.9%
Memory size78.2 KiB
http://www.forest.go.kr/images/fgri/2012/image/10000004-xxxxx-01.jpg
 
249
http://www.forest.go.kr/images/fgri/2012/image/10000652-xxxxx-01.jpg
 
198
http://www.forest.go.kr/images/fgri/2012/image/10003015-xxxxx-01.jpg
 
164
http://www.forest.go.kr/images/fgri/2012/image/10002397-41343-01.jpg
 
144
http://www.forest.go.kr/images/fgri/2012/image/10003017-xxxxx-01.jpg
 
133
Other values (843)
6926 

Length

Max length94
Median length68
Mean length68.00934221
Min length57

Unique

Unique253 ?
Unique (%)3.2%

Sample

1st rowhttp://www.forest.go.kr/images/fgri/2012/image/10000951-40779-01.jpg
2nd rowhttp://www.forest.go.kr/images/fgri/2012/image/10013994-29745-02.jpg
3rd rowhttp://www.forest.go.kr/images/fgri/2012/image/10012789-28540-01.jpg
4th rowhttp://www.forest.go.kr/images/fgri/2012/image/10000652-xxxxx-01.jpg
5th rowhttp://www.forest.go.kr/images/fgri/2012/image/10001073-xxxxx-01.jpg

Common Values

Value | Count | Frequency (%)
http://www.forest.go.kr/images/fgri/2012/image/10000004-xxxxx-01.jpg | 249 | 2.5%
http://www.forest.go.kr/images/fgri/2012/image/10000652-xxxxx-01.jpg | 198 | 2.0%
http://www.forest.go.kr/images/fgri/2012/image/10003015-xxxxx-01.jpg | 164 | 1.6%
http://www.forest.go.kr/images/fgri/2012/image/10002397-41343-01.jpg | 144 | 1.4%
http://www.forest.go.kr/images/fgri/2012/image/10003017-xxxxx-01.jpg | 133 | 1.3%
http://www.forest.go.kr/images/fgri/2012/image/10013889-29640-02.jpg | 128 | 1.3%
http://www.forest.go.kr/images/fgri/2012/image/10002504-xxxxx-01.jpg | 125 | 1.2%
http://www.forest.go.kr/images/fgri/2012/image/10013810-29561-01.jpg | 115 | 1.1%
http://www.forest.go.kr/images/fgri/2012/image/10004710-30684-01.jpg | 110 | 1.1%
http://www.forest.go.kr/images/fgri/2012/image/10001815-xxxxx-01.jpg | 98 | 1.0%
Other values (838) | 6350 | 63.5%
(Missing) | 2186 | 21.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
http://www.forest.go.kr/images/fgri/2012/image/10000004-xxxxx-01.jpg | 249 | 3.2%
http://www.forest.go.kr/images/fgri/2012/image/10000652-xxxxx-01.jpg | 198 | 2.5%
http://www.forest.go.kr/images/fgri/2012/image/10003015-xxxxx-01.jpg | 164 | 2.1%
http://www.forest.go.kr/images/fgri/2012/image/10002397-41343-01.jpg | 144 | 1.8%
http://www.forest.go.kr/images/fgri/2012/image/10003017-xxxxx-01.jpg | 133 | 1.7%
http://www.forest.go.kr/images/fgri/2012/image/10013889-29640-02.jpg | 128 | 1.6%
http://www.forest.go.kr/images/fgri/2012/image/10002504-xxxxx-01.jpg | 125 | 1.6%
http://www.forest.go.kr/images/fgri/2012/image/10013810-29561-01.jpg | 115 | 1.5%
http://www.forest.go.kr/images/fgri/2012/image/10004710-30684-01.jpg | 110 | 1.4%
http://www.forest.go.kr/images/fgri/2012/image/10001815-xxxxx-01.jpg | 98 | 1.3%
Other values (838) | 6350 | 81.3%

OUTNATN_TKOUT_AT
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
불가능
7474 
가능
2526 

Length

Max length3
Median length3
Mean length2.7474
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row가능
2nd row불가능
3rd row불가능
4th row가능
5th row가능

Common Values

Value | Count | Frequency (%)
불가능 (not possible) | 7474 | 74.7%
가능 (possible) | 2526 | 25.3%

Length

Histogram of lengths of the category

SPCIES_PRTC_APLC_AT
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing10000
Missing (%)100.0%
Memory size88.0 KiB

LIFE_RESRCE_LTTOT_AT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
-
10000 

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row-
2nd row-
3rd row-
4th row-
5th row-

Common Values

Value | Count | Frequency (%)
- | 10000 | 100.0%

Length

Histogram of lengths of the category

LAST_UPDT_DE
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size78.2 KiB
20121210
5265 
20121209
4511 
20121203
 
224

Length

Max length8
Median length8
Mean length8
Min length8

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row20121203
2nd row20121210
3rd row20121209
4th row20121210
5th row20121210

Common Values

Value | Count | Frequency (%)
20121210 | 5265 | 52.6%
20121209 | 4511 | 45.1%
20121203 | 224 | 2.2%

Length

Histogram of lengths of the category

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r, as long as the slope keeps its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
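
The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the code pandas-profiling runs; the function name pearson_r and the sample data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
# Shifting and rescaling y does not change r for a linear relationship.
ys_rising = [3.0 * x + 7.0 for x in xs]
ys_falling = [-0.5 * x + 2.0 for x in xs]

r_rising = pearson_r(xs, ys_rising)
r_falling = pearson_r(xs, ys_falling)
print(round(r_rising, 6), round(r_falling, 6))
```

Both lines are perfectly linear, so r is +1 for the rising line and -1 for the falling one regardless of slope magnitude or intercept.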

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
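
As a sketch of that recipe, the example below ranks both variables (ties share the average rank) and applies the Pearson formula to the ranks. The helper names and the sample data are assumptions for illustration, not the report's implementation.

```python
import math

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(round(rho, 6), r < 1.0)
```

Because the cubic relationship is strictly increasing, ρ is 1, while Pearson's r on the raw values stays below 1.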

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
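
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name and sample data are illustrative, and library implementations often default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six

tau = kendall_tau_a(xs, ys)
print(round(tau, 4))
```

Of the six pairs, five are concordant and one is discordant, giving τ = (5 - 1) / 6.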

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
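
A minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013) follows. It works on a contingency table given as a list of rows; the function name and the two small tables are hypothetical and are not taken from this dataset or from the pandas-profiling source.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Hypothetical tables: one perfectly associated, one independent.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(round(v_perfect, 6), round(v_independent, 6))
```

With this correction, a table with no association yields 0 rather than a small positive value, which is the bias the uncorrected estimator tends to show.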

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

 | df_index | RESRCE_NO | LIFE_RESRCE_STLE_CD | LIFE_RESRCE_STLE_CD_NM | LIFE_RESRCE_KND_CD | LIFE_RESRCE_KND_CD_NM | SCNCENM_CD | SCNCENM | TNOAC | INSTT_CD | INSTT_CD_KOREA_NM | DETAIL_INFO_URL | IMAGE_URL | OUTNATN_TKOUT_AT | SPCIES_PRTC_APLC_AT | LIFE_RESRCE_LTTOT_AT | LAST_UPDT_DE
0 | 786 | 14001191150200-071-00036799 | 5 | 표본 | 7 | 수목류 | BSD0000755943 | Carduus crispus L. | 지느러미엉겅퀴 | 1400119 | 국립수목원 | <NA> | <NA> | 가능 | <NA> | - | 20121203
1 | 30704 | 14001191120200-020-00001235 | 2 | 영양체 | 1 | 식량작물 | BSD0000792728 | Magnolia sieboldii k.koch | 함박꽃나무 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10000951-40779-01.jpg | 불가능 | <NA> | - | 20121210
2 | 7303 | 14003771110103-010-00002249 | 1 | 종자 | 1 | 식량작물 | BSD0003129056 | Machilus thunbergii Siebold & Zucc. | 후박나무 | 1400377 | 국립산림과학원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10013994-29745-02.jpg | 불가능 | <NA> | - | 20121209
3 | 30589 | 14001191120200-020-00001220 | 2 | 영양체 | 1 | 식량작물 | BSD0002676251 | Berberis thunbergii DC. | 일본매자나무 | 1400119 | 국립수목원 | <NA> | <NA> | 가능 | <NA> | - | 20121210
4 | 19640 | 14001191110200-010-00003358 | 1 | 종자 | 1 | 식량작물 | BSD0001384123 | Bidens frondosa L. | 미국가막사리 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10012789-28540-01.jpg | 가능 | <NA> | - | 20121210
5 | 5578 | 14003771110102-010-00005164 | 1 | 종자 | 1 | 식량작물 | BSD0001596891 | Pinus densiflora Siebold & Zucc. | 소나무 | 1400377 | 국립산림과학원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10000652-xxxxx-01.jpg | 불가능 | <NA> | - | 20121209
6 | 9402 | 14003771110103-010-00001813 | 1 | 종자 | 1 | 식량작물 | BSD0004075848 | Actinodaphne lancifolia (Siebold & Zucc.) Meisn. | 육박나무 | 1400377 | 국립산림과학원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10001073-xxxxx-01.jpg | 가능 | <NA> | - | 20121209
7 | 34236 | 14001191120200-020-00007281 | 2 | 영양체 | 1 | 식량작물 | BSD0000262747 | Taxus cuspidata Siebold & Zucc. | 주목 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10012841-28592-01.jpg | 불가능 | <NA> | - | 20121210
8 | 41874 | 14001191120200-020-00013204 | 2 | 영양체 | 1 | 식량작물 | BSD0000394084 | Chaenomeles speciosa (Sweet) Nakai | 산당화 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10001815-xxxxx-01.jpg | 가능 | <NA> | - | 20121210
9 | 44288 | 14001191120200-020-00009602 | 2 | 영양체 | 1 | 식량작물 | BSD0000484275 | Rhododendron sp. | 산철쭉속 | 1400119 | 국립수목원 | <NA> | <NA> | 가능 | <NA> | - | 20121210

Last rows

 | df_index | RESRCE_NO | LIFE_RESRCE_STLE_CD | LIFE_RESRCE_STLE_CD_NM | LIFE_RESRCE_KND_CD | LIFE_RESRCE_KND_CD_NM | SCNCENM_CD | SCNCENM | TNOAC | INSTT_CD | INSTT_CD_KOREA_NM | DETAIL_INFO_URL | IMAGE_URL | OUTNATN_TKOUT_AT | SPCIES_PRTC_APLC_AT | LIFE_RESRCE_LTTOT_AT | LAST_UPDT_DE
9990 | 34104 | 14001191120200-020-00004746 | 2 | 영양체 | 1 | 식량작물 | BSD0001991875 | Forsythia koreana (Rehder) Nakai | 개나리 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10003197-xxxxx-01.jpg | 불가능 | <NA> | - | 20121210
9991 | 14415 | 14003771110103-010-00006080 | 1 | 종자 | 1 | 식량작물 | BSD0003437192 | Viburnum dilatatum Thunb. ex Murray | 가막살나무 | 1400377 | 국립산림과학원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10012796-28547-01.jpg | 불가능 | <NA> | - | 20121209
9992 | 46274 | 14001191120200-020-00014451 | 2 | 영양체 | 1 | 식량작물 | BSD0002414104 | Rhododendron japonicum for. flavum (Miyoshi) Nakai | 황철쭉 | 1400119 | 국립수목원 | <NA> | <NA> | 불가능 | <NA> | - | 20121210
9993 | 21853 | 14001191110200-010-00006099 | 1 | 종자 | 1 | 식량작물 | BSD0003401159 | Akebia quinata (Thunb.) Decne. | 으름덩굴 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10000915-xxxxx-02.jpg | 불가능 | <NA> | - | 20121210
9994 | 1545 | 14003771110102-010-00002374 | 1 | 종자 | 1 | 식량작물 | BSD0000047354 | Liriodendron tulipifera L. | 튜울립나무 | 1400377 | 국립산림과학원 | <NA> | <NA> | 불가능 | <NA> | - | 20121209
9995 | 47243 | 14001191120200-020-00014671 | 2 | 영양체 | 1 | 식량작물 | BSD0003250789 | Quercus aliena Blume | 갈참나무 | 1400119 | 국립수목원 | <NA> | <NA> | 불가능 | <NA> | - | 20121210
9996 | 730 | 14001191150200-071-00031890 | 5 | 표본 | 7 | 수목류 | BSD0003919899 | Gnaphalium uliginosum L. | 왜떡쑥 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10004424-xxxxx-01.jpg | 가능 | <NA> | - | 20121203
9997 | 24474 | 14001191110200-010-00002796 | 1 | 종자 | 1 | 식량작물 | BSD0004000012 | Patrinia villosa (Thunb.) Juss. | 뚝갈 | 1400119 | 국립수목원 | <NA> | http://www.forest.go.kr/images/fgri/2012/image/10013074-28825-01.jpg | 불가능 | <NA> | - | 20121210
9998 | 33909 | 14001191120200-020-00004400 | 2 | 영양체 | 1 | 식량작물 | BSD0002487332 | Eucommia ulmoides Oliv. | 두충 | 1400119 | 국립수목원 | <NA> | <NA> | 불가능 | <NA> | - | 20121210
9999 | 29060 | 14001191110200-010-00006675 | 1 | 종자 | 1 | 식량작물 | BSD0000599862 | Thalictrum kemense var. hypoleucum (Siebold & Zucc.) Kitag. | 좀꿩의다리 | 1400119 | 국립수목원 | <NA> | <NA> | 가능 | <NA> | - | 20121210