Overview

Dataset statistics

Number of variables: 6
Number of observations: 75
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.9 KiB
Average record size in memory: 53.7 B

Variable types

Categorical: 3
Numeric: 3

Dataset

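Description: Statistics on vessel AGM inspections of ships departing for North American countries (the United States, Canada, etc.), Chile, New Zealand, and other destinations, as required by NAPPO (North American Plant Protection Organization)
Author: 국제식물검역인증원 (International Plant-quarantine Accreditation Board)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220714000000002161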

Alerts

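Count (basic) [건수(기본)] is highly correlated with Count (surcharge) [건수(할증)]: High correlation
Count (surcharge) [건수(할증)] is highly correlated with Count (basic) [건수(기본)]: High correlation
Count (basic) [건수(기본)] has 7 (9.3%) zeros: Zeros
Count (surcharge) [건수(할증)] has 6 (8.0%) zeros: Zeros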

Reproduction

Analysis started: 2022-08-12 14:48:36.566364
Analysis finished: 2022-08-12 14:48:38.244419
Duration: 1.68 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Inspection year [검사년도]
Categorical

Distinct: 5
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 728.0 B

2017: 15
2016: 15
2015: 15
2014: 15
2013: 15

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2017
2nd row: 2017
3rd row: 2017
4th row: 2017
5th row: 2017

Common Values

Value  Count  Frequency (%)
2017   15     20.0%
2016   15     20.0%
2015   15     20.0%
2014   15     20.0%
2013   15     20.0%

Length

Histogram of lengths of the category

Category Frequency Plot

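Vessel type [선박 종류]
Categorical

Distinct: 4
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 728.0 B

Container ship (컨테이너): 25
Other vessels (기타선박류): 20
Bulk carrier (벌크선): 15
Car carrier (차량운반선): 15

Length

Lengths are measured in characters of the original Korean labels.

Max length: 5
Median length: 4
Mean length: 4.266666667
Min length: 3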
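
The length statistics above can be checked against the category counts. The following is a minimal sketch, assuming the lengths are plain character counts of the original Korean labels; the variable names `counts`, `lengths`, and `length_stats` are illustrative and not part of the report.

```python
from statistics import mean, median

# Category counts copied from the vessel type variable. The original Korean
# labels are used because the report measures the length of the raw strings.
counts = {"컨테이너": 25, "기타선박류": 20, "벌크선": 15, "차량운반선": 15}

# Expand the frequency table into one string length per observation.
lengths = [len(label) for label, n in counts.items() for _ in range(n)]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 9),
    "min": min(lengths),
}
print(length_stats)
```

Under that assumption the sketch yields a maximum of 5, a median of 4, a mean of 4.266666667, and a minimum of 3, in line with the values listed above.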

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bulk carrier (벌크선)
2nd row: Car carrier (차량운반선)
3rd row: Bulk carrier (벌크선)
4th row: Container ship (컨테이너)
5th row: Container ship (컨테이너)

Common Values

Value           Count  Frequency (%)  Original label
Container ship  25     33.3%          컨테이너
Other vessels   20     26.7%          기타선박류
Bulk carrier    15     20.0%          벌크선
Car carrier     15     20.0%          차량운반선

Length

Histogram of lengths of the category

Category Frequency Plot

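Vessel tonnage [선박 중량]
Categorical

Distinct: 9
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Memory size: 728.0 B

Under 25,000 tons (2.5만톤 미만): 15
25,000 to under 40,000 tons (2.5만~4만톤 미만): 15
40,000 tons or more (4만톤 이상): 10
70,000 tons or more (7만톤 이상): 10
Under 20,000 tons (2만톤 미만): 5
Other values (4): 20

Length

Lengths are measured in characters of the original Korean labels.

Max length: 11
Median length: 9
Mean length: 8.2
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Under 25,000 tons (2.5만톤 미만)
2nd row: 40,000 tons or more (4만톤 이상)
3rd row: 40,000 tons or more (4만톤 이상)
4th row: Under 20,000 tons (2만톤 미만)
5th row: 20,000 to under 30,000 tons (2만~3만톤 미만)

Common Values

Value                        Count  Frequency (%)  Original label
Under 25,000 tons            15     20.0%          2.5만톤 미만
25,000 to under 40,000 tons  15     20.0%          2.5만~4만톤 미만
40,000 tons or more          10     13.3%          4만톤 이상
70,000 tons or more          10     13.3%          7만톤 이상
Under 20,000 tons            5      6.7%           2만톤 미만
20,000 to under 30,000 tons  5      6.7%           2만~3만톤 미만
30,000 to under 50,000 tons  5      6.7%           3만~5만톤 미만
50,000 to under 70,000 tons  5      6.7%           5만~7만톤 미만
40,000 to under 70,000 tons  5      6.7%           4만~7만톤 미만

Length

Histogram of lengths of the category

Category Frequency Plot

Value                   Count  Frequency (%)  Original token
under                   55     36.7%          미만
or more                 20     13.3%          이상
25,000 tons             15     10.0%          2.5만톤
25,000 to 40,000 tons   15     10.0%          2.5만~4만톤
40,000 tons             10     6.7%           4만톤
70,000 tons             10     6.7%           7만톤
20,000 tons             5      3.3%           2만톤
20,000 to 30,000 tons   5      3.3%           2만~3만톤
30,000 to 50,000 tons   5      3.3%           3만~5만톤
50,000 to 70,000 tons   5      3.3%           5만~7만톤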

Inspection fee (thousand KRW) [검사수수료(천원)]
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 996.1333333
Minimum: 80
Maximum: 2250
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 803.0 B

Quantile statistics

Minimum: 80
5th percentile: 120
Q1: 200
Median: 1125
Q3: 1500
95th percentile: 2250
Maximum: 2250
Range: 2170
Interquartile range (IQR): 1300
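
The quantile statistics above can be recomputed from the value counts of this variable. The sketch below assumes linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`); the names `fee_counts`, `fees`, and `cuts` are illustrative, and the report itself does not state which interpolation rule it uses.

```python
from statistics import quantiles

# Value -> count pairs copied from the frequency table of the inspection fee
# variable (thousand KRW).
fee_counts = {80: 2, 120: 8, 160: 8, 200: 8, 240: 4,
              750: 3, 1125: 12, 1500: 12, 1875: 12, 2250: 6}

# Rebuild the 75 observations in sorted order.
fees = sorted(v for v, n in fee_counts.items() for _ in range(n))

# method="inclusive" interpolates linearly between order statistics and
# treats the minimum as the 0% point and the maximum as the 100% point.
q1, q2, q3 = quantiles(fees, n=4, method="inclusive")
cuts = quantiles(fees, n=20, method="inclusive")  # 5%, 10%, ..., 95%
p05, p95 = cuts[0], cuts[-1]

value_range = max(fees) - min(fees)  # Range = Maximum - Minimum
iqr = q3 - q1                        # IQR = Q3 - Q1
print(q1, q2, q3, p05, p95, value_range, iqr)
```

With these counts, Range is 2250 - 80 = 2170 and IQR is 1500 - 200 = 1300, which agrees with the listed values.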

Descriptive statistics

Standard deviation: 761.3961098
Coefficient of variation (CV): 0.7643516027
Kurtosis: -1.497680411
Mean: 996.1333333
Median Absolute Deviation (MAD): 750
Skewness: 0.1298834873
Sum: 74710
Variance: 579724.036
Monotonicity: Not monotonic
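
Several of the descriptive statistics above follow from one another: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below rebuilds the column from its value counts and recomputes them, assuming the sample (n - 1) definitions of variance and standard deviation; the variable names are illustrative.

```python
from statistics import mean, median, stdev, variance

# Value -> count pairs copied from the frequency table of the inspection fee
# variable (thousand KRW).
fee_counts = {80: 2, 120: 8, 160: 8, 200: 8, 240: 4,
              750: 3, 1125: 12, 1500: 12, 1875: 12, 2250: 6}
fees = [v for v, n in fee_counts.items() for _ in range(n)]

total = sum(fees)
m = mean(fees)
sd = stdev(fees)      # sample standard deviation (n - 1 denominator)
var = variance(fees)  # sample variance (n - 1 denominator)
cv = sd / m           # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
med = median(fees)
mad = median(abs(x - med) for x in fees)

print(total, round(m, 7), round(sd, 7), round(var, 3), round(cv, 10), mad)
```

Under the n - 1 assumption the recomputed sum, mean, variance, and MAD agree with the values listed above.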

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
1125   12     16.0%
1875   12     16.0%
1500   12     16.0%
120    8      10.7%
200    8      10.7%
160    8      10.7%
2250   6      8.0%
240    4      5.3%
750    3      4.0%
80     2      2.7%

Minimum 10 values

Value  Count  Frequency (%)
80     2      2.7%
120    8      10.7%
160    8      10.7%
200    8      10.7%
240    4      5.3%
750    3      4.0%
1125   12     16.0%
1500   12     16.0%
1875   12     16.0%
2250   6      8.0%

Maximum 10 values

Value  Count  Frequency (%)
2250   6      8.0%
1875   12     16.0%
1500   12     16.0%
1125   12     16.0%
750    3      4.0%
240    4      5.3%
200    8      10.7%
160    8      10.7%
120    8      10.7%
80     2      2.7%

Count (basic) [건수(기본)]
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 56
Distinct (%): 74.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.82666667
Minimum: 0
Maximum: 272
Zeros: 7
Zeros (%): 9.3%
Negative: 0
Negative (%): 0.0%
Memory size: 803.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 7
Median: 86
Q3: 144
95th percentile: 253.1
Maximum: 272
Range: 272
Interquartile range (IQR): 137

Descriptive statistics

Standard deviation: 80.56999195
Coefficient of variation (CV): 0.8774138807
Kurtosis: -0.6468130949
Mean: 91.82666667
Median Absolute Deviation (MAD): 76
Skewness: 0.5491431952
Sum: 6887
Variance: 6491.523604
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  7      9.3%
7                  3      4.0%
3                  3      4.0%
2                  3      4.0%
98                 2      2.7%
64                 2      2.7%
6                  2      2.7%
135                2      2.7%
165                2      2.7%
122                2      2.7%
Other values (46)  47     62.7%

Minimum 10 values

Value  Count  Frequency (%)
0      7      9.3%
1      2      2.7%
2      3      4.0%
3      3      4.0%
4      1      1.3%
6      2      2.7%
7      3      4.0%
10     1      1.3%
16     1      1.3%
22     1      1.3%

Maximum 10 values

Value  Count  Frequency (%)
272    1      1.3%
261    1      1.3%
260    1      1.3%
258    1      1.3%
251    1      1.3%
249    1      1.3%
232    1      1.3%
206    1      1.3%
189    1      1.3%
175    1      1.3%

Count (surcharge) [건수(할증)]
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 52
Distinct (%): 69.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 78.62666667
Minimum: 0
Maximum: 295
Zeros: 6
Zeros (%): 8.0%
Negative: 0
Negative (%): 0.0%
Memory size: 803.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 5.5
Median: 52
Q3: 128.5
95th percentile: 228.2
Maximum: 295
Range: 295
Interquartile range (IQR): 123

Descriptive statistics

Standard deviation: 78.00706367
Coefficient of variation (CV): 0.9921196837
Kurtosis: 0.06105737906
Mean: 78.62666667
Median Absolute Deviation (MAD): 50
Skewness: 0.9603237499
Sum: 5897
Variance: 6085.101982
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
2                  7      9.3%
0                  6      8.0%
33                 4      5.3%
3                  3      4.0%
44                 2      2.7%
92                 2      2.7%
203                2      2.7%
1                  2      2.7%
40                 2      2.7%
31                 2      2.7%
Other values (42)  43     57.3%

Minimum 10 values

Value  Count  Frequency (%)
0      6      8.0%
1      2      2.7%
2      7      9.3%
3      3      4.0%
4      1      1.3%
7      1      1.3%
9      1      1.3%
12     1      1.3%
27     1      1.3%
28     1      1.3%

Maximum 10 values

Value  Count  Frequency (%)
295    1      1.3%
271    1      1.3%
255    1      1.3%
252    1      1.3%
218    1      1.3%
214    1      1.3%
203    2      2.7%
202    1      1.3%
180    1      1.3%
175    1      1.3%

Interactions

[Figure: 9 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r as long as the slope keeps its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
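
The calculation described above can be sketched as follows. This is an illustrative standard-library implementation, not the code the report itself runs; the function name `pearson_r` is hypothetical, and the two input lists are the Count (basic) and Count (surcharge) values of the first ten sample rows only, so the result is not the coefficient for the full dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The common 1/(n - 1) factors of the covariance and the two standard
    # deviations cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Count (basic) and Count (surcharge) from the first ten sample rows.
basic = [189, 136, 163, 2, 7, 40, 90, 272, 64, 146]
surcharge = [125, 51, 218, 1, 0, 33, 92, 214, 53, 90]

r = pearson_r(basic, surcharge)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([2 * v + 5 for v in basic], surcharge)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned in the description.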

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
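
As a sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. It assumes that tied values receive the average of the ranks they span, which is a common convention but is not stated in the report; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this toy input the ranks of x and y coincide, so ρ equals 1, whereas Pearson's r on the raw values is below 1 because the relationship is not linear.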

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
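
The pair-counting definition above can be written down directly. The sketch below implements exactly that formula (often called tau-a); statistical libraries commonly apply an additional correction for ties (tau-b), so results may differ from the report when ties are present. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction,
        # discordant when they move in opposite directions; ties count as neither.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])  # 2 concordant, 1 discordant
print(tau_same, tau_reversed, tau_mixed)
```

The quadratic loop over all pairs is fine for 75 observations; larger datasets would usually call for an O(n log n) algorithm.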

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
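
A minimal sketch of a bias-corrected Cramér's V follows, assuming the usual form of Bergsma's correction: the chi-squared statistic is divided by n, a bias term (k-1)(r-1)/(n-1) is subtracted, and the numbers of rows r and columns k are shrunk accordingly. The function name `cramers_v_corrected` and the toy inputs are illustrative, and the report's own implementation may differ in details.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (for example a single category)
    return sqrt(phi2_corr / denom)


labels = ["a", "b", "c"] * 10
v_perfect = cramers_v_corrected(labels, labels)  # identical variables
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(v_perfect, v_independent)
```

For the two toy inputs the sketch returns 1 for identical variables and 0 for a perfectly balanced, independent table.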

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

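Columns: 검사년도 (inspection year), 선박 종류 (vessel type), 선박 중량 (vessel tonnage), 검사수수료(천원) (inspection fee, thousand KRW), 건수(기본) (count, basic), 건수(할증) (count, surcharge)

Row  Year  Vessel type     Vessel tonnage               Fee (thousand KRW)  Count (basic)  Count (surcharge)
0    2017  Bulk carrier    Under 25,000 tons            1125                189            125
1    2017  Car carrier     40,000 tons or more          1875                136            51
2    2017  Bulk carrier    40,000 tons or more          1875                163            218
3    2017  Container ship  Under 20,000 tons            750                 2              1
4    2017  Container ship  20,000 to under 30,000 tons  1125                7              0
5    2017  Container ship  30,000 to under 50,000 tons  1500                40             33
6    2017  Container ship  50,000 to under 70,000 tons  1875                90             92
7    2017  Container ship  70,000 tons or more          2250                272            214
8    2017  Other vessels   Under 25,000 tons            1125                64             53
9    2017  Other vessels   25,000 to under 40,000 tons  1500                146            90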

Last rows

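Row  Year  Vessel type     Vessel tonnage               Fee (thousand KRW)  Count (basic)  Count (surcharge)
65   2013  Other vessels   25,000 to under 40,000 tons  160                 55             33
66   2013  Other vessels   Under 25,000 tons            120                 46             57
67   2013  Container ship  70,000 tons or more          240                 135            99
68   2013  Container ship  50,000 to under 70,000 tons  200                 134            85
69   2013  Container ship  30,000 to under 50,000 tons  160                 80             60
70   2013  Container ship  20,000 to under 30,000 tons  120                 7              2
71   2013  Container ship  Under 20,000 tons            80                  0              3
72   2013  Bulk carrier    40,000 tons or more          200                 122            150
73   2013  Bulk carrier    Under 25,000 tons            120                 174            160
74   2013  Bulk carrier    25,000 to under 40,000 tons  160                 261            203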