Overview

Dataset statistics

Number of variables: 14
Number of observations: 33
Missing cells: 77
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 KiB
Average record size in memory: 116.9 B
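The missing-cell share is the number of missing cells divided by all cells (observations × variables). A minimal plain-Python check, assuming the per-column missing counts listed under Alerts and no missing values in the other ten columns; the dictionary is only an illustration, not pandas-profiling's internal representation:

```python
# Per-column missing counts from the Alerts section; the ten columns
# Unnamed: 4 ... Unnamed: 13 are assumed to have no missing values.
missing_by_column = {
    "Unnamed: 0": 33,
    "ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적": 26,
    "Unnamed: 2": 15,
    "Unnamed: 3": 3,
}
n_observations = 33
n_variables = 14

missing_cells = sum(missing_by_column.values())
total_cells = n_observations * n_variables      # every cell in the table
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")     # 77 16.7%
```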

Variable types

Unsupported: 11
Categorical: 3

Dataset

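Description: Records of exotic pests detected during quarantine inspections by 국제식물검역인증원 at containers, arrival halls, and international mail and express logistics centers. Provides monthly figures by region for regulated (관리), provisionally regulated (잠정), and non-quarantine (비검역) pests, given as number of cases (건수) and number of individual specimens (마리수), among other fields.
Author: 국제식물검역인증원
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220714000000002160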

Alerts

Unnamed: 2 is highly correlated with Unnamed: 3 and 1 other field [High correlation]
Unnamed: 3 is highly correlated with Unnamed: 2 and 1 other field [High correlation]
ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적 is highly correlated with Unnamed: 2 and 1 other field [High correlation]
Unnamed: 0 has 33 (100.0%) missing values [Missing]
ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적 has 26 (78.8%) missing values [Missing]
Unnamed: 2 has 15 (45.5%) missing values [Missing]
Unnamed: 3 has 3 (9.1%) missing values [Missing]
Unnamed: 0 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2022-08-12 14:52:41.729094
Analysis finished: 2022-08-12 14:52:42.958246
Duration: 1.23 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Unnamed: 0
Unsupported

MISSING
REJECTED
UNSUPPORTED

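Missing: 33
Missing (%): 100.0%
Memory size: 425.0 B

ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적
(pest detections from container and loading-site inspections)
Categorical

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): 100.0%
Missing: 26
Missing (%): 78.8%
Memory size: 392.0 B
지역 (region)
인천공항 관리팀 (Incheon Airport management team)
영남 지역관리팀 (Yeongnam regional management team)
중부 지역관리팀 (Jungbu regional management team)
호남 지역관리팀 (Honam regional management team)
Other values (2)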

Length

Max length: 8
Median length: 8
Mean length: 5.571428571
Min length: 2
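Length here is the number of characters in each non-missing value. A sketch with Python's statistics module, assuming the seven values from the Common Values table below. The reported mean of 5.571… only works out if "총계 " carries a trailing space, which matches how that value is rendered in the table, so the sketch keeps it:

```python
import statistics

# Non-missing values of this column. The trailing space in "총계 " is an
# inference: without it the mean would be 38/7 rather than the reported 39/7.
values = ["지역", "인천공항 관리팀", "영남 지역관리팀", "중부 지역관리팀",
          "호남 지역관리팀", "합계", "총계 "]

lengths = [len(v) for v in values]  # characters, not bytes
stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
print(stats)
```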

Unique

Unique: 7
Unique (%): 100.0%

Sample

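1st row: 지역 (region)
2nd row: 인천공항 관리팀 (Incheon Airport management team)
3rd row: 영남 지역관리팀 (Yeongnam regional management team)
4th row: 중부 지역관리팀 (Jungbu regional management team)
5th row: 호남 지역관리팀 (Honam regional management team)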

Common Values

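Value | English | Count | Frequency (%)
지역 | region | 1 | 3.0%
인천공항 관리팀 | Incheon Airport management team | 1 | 3.0%
영남 지역관리팀 | Yeongnam regional management team | 1 | 3.0%
중부 지역관리팀 | Jungbu regional management team | 1 | 3.0%
호남 지역관리팀 | Honam regional management team | 1 | 3.0%
합계 | total | 1 | 3.0%
총계  | grand total | 1 | 3.0%
(Missing) | | 26 | 78.8%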

Length

[Figure] Histogram of lengths of the category

Category Frequency Plot

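[Figure]

Word counts (values split on whitespace)

Value | English | Count | Frequency (%)
지역관리팀 | regional management team | 3 | 27.3%
지역 | region | 1 | 9.1%
인천공항 | Incheon Airport | 1 | 9.1%
관리팀 | management team | 1 | 9.1%
영남 | Yeongnam | 1 | 9.1%
중부 | Jungbu (central region) | 1 | 9.1%
호남 | Honam | 1 | 9.1%
합계 | total | 1 | 9.1%
총계 | grand total | 1 | 9.1%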
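The last table counts words rather than whole values, and its percentages are relative to the total number of words (11), not to the 33 rows. A sketch assuming plain whitespace tokenization, which reproduces the counts and order shown; pandas-profiling's own tokenizer may differ in detail:

```python
from collections import Counter

# Non-missing values of the second column (see its Common Values table).
values = ["지역", "인천공항 관리팀", "영남 지역관리팀", "중부 지역관리팀",
          "호남 지역관리팀", "합계", "총계 "]

# str.split() with no argument splits on runs of whitespace and ignores
# leading/trailing spaces, so "총계 " contributes the single word "총계".
words = Counter(word for v in values for word in v.split())
total = sum(words.values())

# most_common() sorts by count; ties keep first-seen order.
for word, count in words.most_common():
    print(word, count, f"{100 * count / total:.1f}%")
```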

Unnamed: 2
Categorical

HIGH CORRELATION
MISSING

Distinct: 6
Distinct (%): 33.3%
Missing: 15
Missing (%): 45.5%
Memory size: 392.0 B
관리 (regulated)
잠정 (provisional)
비검역 (non-quarantine)
구분 (classification)
건수 (number of cases)

Length

Max length: 3
Median length: 2
Mean length: 2.333333333
Min length: 2

Unique

Unique: 3
Unique (%): 16.7%
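In these summaries, Distinct counts different values and Unique counts values that occur exactly once. Both percentages appear to be taken over the non-missing rows (18 for Unnamed: 2, that is 33 minus 15 missing), which reproduces the figures above. A plain-Python sketch under that assumption, rebuilding the column from its Common Values table:

```python
from collections import Counter

# Non-missing values of Unnamed: 2, rebuilt from its Common Values counts.
values = (["관리"] * 5 + ["잠정"] * 5 + ["비검역"] * 5
          + ["구분", "건수", "마리수"])

counts = Counter(values)
distinct = len(counts)                              # number of different values
unique = sum(1 for c in counts.values() if c == 1)  # values that occur exactly once
n = len(values)                                     # non-missing rows only

print(distinct, f"{100 * distinct / n:.1f}%")  # 6 33.3%
print(unique, f"{100 * unique / n:.1f}%")      # 3 16.7%
```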

Sample

1st row: 구분 (classification)
2nd row: 관리 (regulated)
3rd row: 잠정 (provisional)
4th row: 비검역 (non-quarantine)
5th row: 관리 (regulated)

Common Values

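Value | English | Count | Frequency (%)
관리 | regulated | 5 | 15.2%
잠정 | provisional | 5 | 15.2%
비검역 | non-quarantine | 5 | 15.2%
구분 | classification | 1 | 3.0%
건수 | number of cases | 1 | 3.0%
마리수 | number of individuals | 1 | 3.0%
(Missing) | | 15 | 45.5%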

Length

[Figure] Histogram of lengths of the category

Category Frequency Plot

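[Figure]

Word counts (values split on whitespace)

Value | English | Count | Frequency (%)
관리 | regulated | 5 | 27.8%
잠정 | provisional | 5 | 27.8%
비검역 | non-quarantine | 5 | 27.8%
구분 | classification | 1 | 5.6%
건수 | number of cases | 1 | 5.6%
마리수 | number of individuals | 1 | 5.6%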

Unnamed: 3
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.7%
Missing: 3
Missing (%): 9.1%
Memory size: 392.0 B
건수 (number of cases): 15
마리수 (number of individuals): 15

Length

Max length: 3
Median length: 2.5
Mean length: 2.5
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

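1st row: 건수 (number of cases)
2nd row: 마리수 (number of individuals)
3rd row: 건수 (number of cases)
4th row: 마리수 (number of individuals)
5th row: 건수 (number of cases)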

Common Values

Value | English | Count | Frequency (%)
건수 | number of cases | 15 | 45.5%
마리수 | number of individuals | 15 | 45.5%
(Missing) | | 3 | 9.1%

Length

[Figure] Histogram of lengths of the category

Category Frequency Plot

[Figure]

Word counts (values split on whitespace)

Value | English | Count | Frequency (%)
건수 | number of cases | 15 | 50.0%
마리수 | number of individuals | 15 | 50.0%
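The two frequency tables for Unnamed: 3 use different denominators. Common Values divides by all 33 rows, so 15 occurrences are 45.5%, while the final table divides by the 30 non-missing values, giving 50.0%. A small sketch of both calculations, rebuilding the column from those counts (the row order is illustrative only):

```python
from collections import Counter

# Unnamed: 3 rebuilt from its reported counts: 15 x "건수", 15 x "마리수",
# and 3 missing cells (None). Row order does not affect the result.
column = ["건수", "마리수"] * 15 + [None] * 3

counts = Counter(v for v in column if v is not None)
n_rows = len(column)               # 33: denominator of the Common Values table
n_present = sum(counts.values())   # 30: denominator of the final table

for value, count in counts.items():
    print(value, count,
          f"{100 * count / n_rows:.1f}% of all rows,",
          f"{100 * count / n_present:.1f}% of non-missing values")
```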

Unnamed: 4
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 5
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 6
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 7
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 8
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 9
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 10
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 11
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 12
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Unnamed: 13
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 392.0 B

Correlations


Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables: replacing X with a + bX and Y with c + dY, where b, d > 0, leaves r unchanged. For an exact linear relationship, therefore, the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
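The definition above translates directly into code. A minimal plain-Python sketch (the function name pearson_r is illustrative, not pandas-profiling's API) that also demonstrates the invariance to location and positive scale changes:

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (divide-by-n) covariance and standard deviations; the 1/n
    # factors cancel in the ratio, so using n - 1 throughout gives the same r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(pearson_r(x, [3 * v + 7 for v in x]))     # ~1.0: exact increasing line
print(pearson_r(x, [-0.5 * v + 2 for v in x]))  # ~-1.0: exact decreasing line
# Shifting and positively rescaling x leaves r unchanged.
print(pearson_r(x, y), pearson_r([10 * v - 3 for v in x], y))
```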

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
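Spearman's ρ is Pearson's r applied to ranks. A plain-Python sketch, assuming the common convention that tied values receive the average of the ranks they span (the helper names are illustrative):

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]    # monotonic but not linear
print(spearman_rho(x, y))  # ~1.0: the ranks agree perfectly
print(pearson_r(x, y))     # below 1: the relationship is not linear
```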

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2. (This is the τ_a form; variants such as τ_b adjust the denominator for ties.)
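Counting pairs as described gives τ_a. A plain-Python sketch (the name kendall_tau_a is illustrative); libraries often compute the tie-adjusted τ_b instead, for example SciPy's kendalltau by default, so results can differ when ties are present:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / (n(n-1)/2); pairs tied in x or y count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))  # all n(n-1)/2 index pairs
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # they move in opposite directions
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
print(kendall_tau_a(x, [2, 1, 4, 3, 5]))  # (8 - 2) / 10 = 0.6
```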

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik project's documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
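Bergsma's correction subtracts the expected bias from φ² = χ²/n and shrinks the table dimensions used in the denominator. A plain-Python sketch of the correction as it is usually stated, assuming two equal-length sequences of category labels; pandas-profiling's own implementation may differ in details such as how degenerate tables are handled:

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma 2013) for two equal-length label sequences."""
    n = len(x)
    row_totals = Counter(x)        # r categories of x
    col_totals = Counter(y)        # k categories of y
    observed = Counter(zip(x, y))  # contingency-table cell counts (absent cells -> 0)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-square statistic of the r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction: subtract the expected bias from phi^2 and shrink r and k.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A table with a single row or column carries no association information.
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

print(cramers_v_corrected(["a", "b"] * 10, ["p", "q"] * 10))   # ~1.0: perfect association
print(cramers_v_corrected(["a", "a", "b", "b"] * 5,
                          ["p", "q", "p", "q"] * 5))           # 0.0: independent
```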

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
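Nullity correlation can be read as the Pearson correlation between the is-missing indicators of two columns. A sketch of that idea on hypothetical toy columns (the function name is illustrative). It returns NaN for a column that is always or never missing, such as Unnamed: 0 or Unnamed: 4 to Unnamed: 13 here, because its indicator has no variance; the plotting code behind the report may filter or round values differently:

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Unnormalized sums: the 1/n factors would cancel in the ratio anyway.
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    sd_a = math.sqrt(sum((p - mean_a) ** 2 for p in a))
    sd_b = math.sqrt(sum((q - mean_b) ** 2 for q in b))
    if sd_a == 0 or sd_b == 0:
        # Always-missing or never-missing column: correlation is undefined.
        return float("nan")
    return cov / (sd_a * sd_b)

# Hypothetical toy columns; None marks a missing cell.
a = [1, None, 3, None, 5]
b = ["x", None, "z", None, "v"]   # missing exactly where a is missing
c = [None, 2, None, 4, 5]         # missing mostly where a is present

print(nullity_correlation(a, b))  # ~1.0
print(nullity_correlation(a, c))  # negative, about -0.67
```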
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

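 | Unnamed: 0 | ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13
0 | <NA> | 지역 | 구분 | <NA> | 4월 | 5월 | 6월 | 7월 | 8월 | 9월 | 10월 | 11월 | 12월 | 
1 | <NA> | 인천공항 관리팀 | 관리 | 건수 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | <NA> | <NA> | <NA> | 마리수 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3 | <NA> | <NA> | 잠정 | 건수 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
4 | <NA> | <NA> | <NA> | 마리수 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3
5 | <NA> | <NA> | 비검역 | 건수 | 0 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 5
6 | <NA> | <NA> | <NA> | 마리수 | 0 | 1 | 5 | 6 | 0 | 0 | 0 | 0 | 0 | 12
7 | <NA> | 영남 지역관리팀 | 관리 | 건수 | 4 | 10 | 15 | 18 | 14 | 20 | 16 | 3 | 1 | 101
8 | <NA> | <NA> | <NA> | 마리수 | 44 | 110 | 149 | 251 | 161 | 181 | 198 | 30 | 15 | 1139
9 | <NA> | <NA> | 잠정 | 건수 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3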

Last rows

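 | Unnamed: 0 | ㅇ 컨테이너 및 적재장소 점검 관련 병해충 발견 실적 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13
23 | <NA> | <NA> | 비검역 | 건수 | 15 | 12 | 28 | 68 | 91 | 81 | 65 | 45 | 5 | 410
24 | <NA> | <NA> | <NA> | 마리수 | 181 | 112 | 112 | 316 | 485 | 475 | 304 | 218 | 24 | 2227
25 | <NA> | 합계 | 관리 | 건수 | 4 | 24 | 42 | 43 | 44 | 46 | 25 | 13 | 2 | 243
26 | <NA> | <NA> | <NA> | 마리수 | 44 | 196 | 446 | 400 | 379 | 327 | 234 | 95 | 21 | 2142
27 | <NA> | <NA> | 잠정 | 건수 | 3 | 7 | 2 | 1 | 0 | 1 | 1 | 0 | 0 | 15
28 | <NA> | <NA> | <NA> | 마리수 | 6 | 39 | 13 | 1 | 0 | 10 | 3 | 0 | 0 | 72
29 | <NA> | <NA> | 비검역 | 건수 | 38 | 71 | 146 | 135 | 139 | 136 | 93 | 50 | 6 | 814
30 | <NA> | <NA> | <NA> | 마리수 | 314 | 476 | 612 | 1007 | 730 | 766 | 442 | 245 | 26 | 4618
31 | <NA> | 총계 | 건수 | <NA> | 45 | 102 | 190 | 179 | 183 | 183 | 119 | 63 | 8 | 1072
32 | <NA> | <NA> | 마리수 | <NA> | 364 | 711 | 1071 | 1408 | 1109 | 1103 | 679 | 340 | 47 | 6832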
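In the sample rows, <NA> in the region and category columns appears to mark a cell merged with the row above in the source spreadsheet, and the last column (Unnamed: 13) appears to hold the sum of the nine monthly columns (April to December). Both readings are inferences from the rows shown. A sketch, assuming those readings, that forward-fills the labels and checks the row totals on three rows copied from the first-rows table; forward_fill is a hypothetical helper, not part of pandas-profiling:

```python
# Three sample rows: (region, category, measure, Apr..Dec counts, total).
# None stands for <NA>; region/category labels are assumed to be merged
# cells that apply to the following rows until the next label appears.
rows = [
    ("영남 지역관리팀", "관리", "건수", [4, 10, 15, 18, 14, 20, 16, 3, 1], 101),
    (None, None, "마리수", [44, 110, 149, 251, 161, 181, 198, 30, 15], 1139),
    (None, "잠정", "건수", [1, 0, 0, 1, 0, 1, 0, 0, 0], 3),
]

def forward_fill(rows):
    """Replace None labels with the most recent non-None label above them."""
    last_region = last_category = None
    filled = []
    for region, category, measure, months, total in rows:
        last_region = region if region is not None else last_region
        last_category = category if category is not None else last_category
        filled.append((last_region, last_category, measure, months, total))
    return filled

for region, category, measure, months, total in forward_fill(rows):
    # Check the assumption that the last column is the April-December total.
    assert sum(months) == total, (region, category, measure)
    print(region, category, measure, total)
```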