Overview

Dataset statistics

Number of variables: 5
Number of observations: 120
Missing cells: 5
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.2 KiB
Average record size in memory: 44.1 B
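
As a quick sanity check, the percentage of missing cells can be recomputed from the other figures in this table. The sketch below assumes that "Missing cells (%)" is the number of missing cells divided by variables times observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 120
missing_cells = 5

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# The memory figures are already rounded in the report, so this only
# checks that the two sizes are consistent with each other.
avg_record_bytes = 44.1
total_kib = avg_record_bytes * n_observations / 1024

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```

Both printed values agree with the table after rounding to one decimal place.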

Variable types

Categorical: 3
Numeric: 2

Dataset

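Description: A lookup service for production status by sericulture type, and for silkworm production and sales, covering all sericulture households nationwide.
Author: Ministry of Agriculture, Food and Rural Affairs (농림축산식품부)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220215000000001937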

Alerts

PRDCTN_FRMHS_CO has 5 (4.2%) missing values [Missing]
PRDCTN_FRMHS_CO has 15 (12.5%) zeros [Zeros]

Reproduction

Analysis started: 2022-08-12 14:42:04.370348
Analysis finished: 2022-08-12 14:42:05.947881
Duration: 1.58 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

YEAR
Categorical

Distinct: 2
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB
2013: 60
2014: 60

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013
2nd row: 2013
3rd row: 2013
4th row: 2013
5th row: 2013

Common Values

Value  Count  Frequency (%)
2013   60     50.0%
2014   60     50.0%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot for YEAR]

CTPRVN
Categorical

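Distinct: 13
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB
대구광역시 (Daegu Metropolitan City): 10
광주광역시 (Gwangju Metropolitan City): 10
세종자치시 (Sejong Self-Governing City): 10
경기도 (Gyeonggi-do): 10
강원도 (Gangwon-do): 10
Other values (8): 70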

Length

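Max length: 7
Median length: 6
Mean length: 4.25
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 제주광역시 (Jeju Metropolitan City)
2nd row: 제주광역시 (Jeju Metropolitan City)
3rd row: 제주광역시 (Jeju Metropolitan City)
4th row: 제주광역시 (Jeju Metropolitan City)
5th row: 제주광역시 (Jeju Metropolitan City)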

Common Values

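Value  Count  Frequency (%)
대구광역시 (Daegu Metropolitan City)  10  8.3%
광주광역시 (Gwangju Metropolitan City)  10  8.3%
세종자치시 (Sejong Self-Governing City)  10  8.3%
경기도 (Gyeonggi-do)  10  8.3%
강원도 (Gangwon-do)  10  8.3%
충청북도 (Chungcheongbuk-do)  10  8.3%
충청남도 (Chungcheongnam-do)  10  8.3%
전라북도 (Jeollabuk-do)  10  8.3%
전라남도 (Jeollanam-do)  10  8.3%
경상북도 (Gyeongsangbuk-do)  10  8.3%
Other values (3)  20  16.7%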

Length

[Figure: Histogram of lengths of the category]

SE
Categorical

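Distinct: 5
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB
3년이하 (3 years or less): 24
3~5년 (3 to 5 years): 24
6~10년 (6 to 10 years): 24
11~20년 (11 to 20 years): 24
21년이상 (21 years or more): 24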

Length

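Max length: 6
Median length: 5
Mean length: 4.8
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3년이하 (3 years or less)
2nd row: 3~5년 (3 to 5 years)
3rd row: 6~10년 (6 to 10 years)
4th row: 11~20년 (11 to 20 years)
5th row: 21년이상 (21 years or more)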

Common Values

Value  Count  Frequency (%)
3년이하 (3 years or less)  24  20.0%
3~5년 (3 to 5 years)  24  20.0%
6~10년 (6 to 10 years)  24  20.0%
11~20년 (11 to 20 years)  24  20.0%
21년이상 (21 years or more)  24  20.0%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot for SE]

PRDCTN_FRMHS_CO
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 71
Distinct (%): 61.7%
Missing: 5
Missing (%): 4.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 112.6869565
Minimum: 0
Maximum: 1600
Zeros: 15
Zeros (%): 12.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 5
Median: 43
Q3: 97.5
95th percentile: 372
Maximum: 1600
Range: 1600
Interquartile range (IQR): 92.5

Descriptive statistics

Standard deviation: 238.8734292
Coefficient of variation (CV): 2.119796617
Kurtosis: 24.97959279
Mean: 112.6869565
Median Absolute Deviation (MAD): 42
Skewness: 4.657113986
Sum: 12959
Variance: 57060.51518
Monotonicity: Not monotonic
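
Several of the figures above follow from one another, which gives a cheap consistency check on the report. The sketch below recomputes them from the reported values only. It assumes that missing values are excluded from the mean, and the dictionary name `derived` is illustrative.

```python
import math

# Values copied from the PRDCTN_FRMHS_CO statistics in this report.
n_observations, n_missing = 120, 5
total, mean = 12959, 112.6869565
minimum, maximum = 0, 1600
q1, q3 = 5, 97.5
std = 238.8734292

# Assumption: the 5 missing values are left out of the statistics.
n_valid = n_observations - n_missing

derived = {
    "mean": total / n_valid,       # sum divided by the non-missing count
    "range": maximum - minimum,
    "iqr": q3 - q1,                # interquartile range
    "cv": std / mean,              # coefficient of variation
    "variance": std ** 2,
}

for name, value in derived.items():
    print(f"{name}: {value:.4f}")

# Compare one derived value against the reported one.
print(math.isclose(derived["cv"], 2.119796617, rel_tol=1e-6))
```

A coefficient of variation above 2, together with the high skewness and kurtosis, points to a heavily right-skewed distribution in which a few provinces account for most producing households.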

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0      15     12.5%
1      8      6.7%
5      5      4.2%
2      4      3.3%
43     3      2.5%
92     3      2.5%
13     3      2.5%
10     2      1.7%
41     2      1.7%
28     2      1.7%
Other values (61)  68  56.7%
(Missing)          5   4.2%

Minimum 10 values

Value  Count  Frequency (%)
0      15     12.5%
1      8      6.7%
2      4      3.3%
5      5      4.2%
7      2      1.7%
8      1      0.8%
9      1      0.8%
10     2      1.7%
11     1      0.8%
12     1      0.8%

Maximum 10 values

Value  Count  Frequency (%)
1600   1      0.8%
1565   1      0.8%
868    1      0.8%
760    1      0.8%
426    1      0.8%
414    1      0.8%
354    1      0.8%
352    1      0.8%
329    1      0.8%
309    1      0.8%

CTPRVN_CD
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.5
Minimum: 27
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 27
5th percentile: 27
Q1: 39.75
Median: 43.5
Q3: 46.25
95th percentile: 50
Maximum: 50
Range: 23
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 6.999399734
Coefficient of variation (CV): 0.1686602346
Kurtosis: -0.2028529704
Mean: 41.5
Median Absolute Deviation (MAD): 3
Skewness: -0.9868816487
Sum: 4980
Variance: 48.99159664
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value  Count  Frequency (%)
50     10     8.3%
27     10     8.3%
29     10     8.3%
36     10     8.3%
41     10     8.3%
42     10     8.3%
43     10     8.3%
44     10     8.3%
45     10     8.3%
46     10     8.3%
Other values (2)  20  16.7%

Minimum 10 values

Value  Count  Frequency (%)
27     10     8.3%
29     10     8.3%
36     10     8.3%
41     10     8.3%
42     10     8.3%
43     10     8.3%
44     10     8.3%
45     10     8.3%
46     10     8.3%
47     10     8.3%

Maximum 10 values

Value  Count  Frequency (%)
50     10     8.3%
48     10     8.3%
47     10     8.3%
46     10     8.3%
45     10     8.3%
44     10     8.3%
43     10     8.3%
42     10     8.3%
41     10     8.3%
36     10     8.3%

Interactions

[Figures: interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
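
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report tool runs; the function name `pearson_r` and the sample data are assumptions, and the sample (n - 1) denominators are a choice that cancels out in the final ratio.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    # A constant input has zero standard deviation and raises ZeroDivisionError.
    return cov / (std_x * std_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # increasing linear relation
print(pearson_r(x, [10 - 3 * v for v in x]))  # decreasing linear relation
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the invariance property mentioned above.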

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
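
A minimal sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are illustrative, and assigning tied values the mean of their positions is an assumption about tie handling that the text above does not specify.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)  # the shared 1/(n - 1) factors cancel

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic relation in the example, the rank-based coefficient treats the relation as perfectly monotonic while Pearson's r on the raw values stays below 1.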

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
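
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is illustrative. Statistical libraries often default to a tie-adjusted variant (tau-b), so results can differ from theirs when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # A tie in x or y counts as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The quadratic pair loop is fine for a dataset of this size (120 rows); faster algorithms exist for large inputs.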

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
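
For illustration, the bias correction can be sketched as follows. This follows the commonly cited form of Bergsma's correction (shrinking φ² and the effective table dimensions by terms of order 1/(n - 1)); it is an assumption that the report tool computes it in exactly this way, and the function name `cramers_v_corrected` is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    row_totals, col_totals = Counter(x), Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least two categories")

    # Pearson chi-square statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n

    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        raise ValueError("too few observations for this many categories")
    return math.sqrt(phi2_corr / denom)

labels = ["a"] * 50 + ["b"] * 50
print(cramers_v_corrected(labels, labels))                                # identical variables
print(cramers_v_corrected(["a", "a", "b", "b"], ["c", "d", "c", "d"]))  # independent variables
```

Clamping the corrected φ² at zero keeps the coefficient inside the 0 to 1 range even when the uncorrected estimate falls below its expected value under independence.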

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

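Row  YEAR  CTPRVN  SE  PRDCTN_FRMHS_CO  CTPRVN_CD
0  2013  제주광역시 (Jeju Metropolitan City)  3년이하 (3 years or less)  0  50
1  2013  제주광역시 (Jeju Metropolitan City)  3~5년 (3 to 5 years)  10  50
2  2013  제주광역시 (Jeju Metropolitan City)  6~10년 (6 to 10 years)  1  50
3  2013  제주광역시 (Jeju Metropolitan City)  11~20년 (11 to 20 years)  0  50
4  2013  제주광역시 (Jeju Metropolitan City)  21년이상 (21 years or more)  0  50
5  2013  대구광역시 (Daegu Metropolitan City)  3년이하 (3 years or less)  13  27
6  2013  대구광역시 (Daegu Metropolitan City)  3~5년 (3 to 5 years)  0  27
7  2013  대구광역시 (Daegu Metropolitan City)  6~10년 (6 to 10 years)  0  27
8  2013  대구광역시 (Daegu Metropolitan City)  11~20년 (11 to 20 years)  0  27
9  2013  대구광역시 (Daegu Metropolitan City)  21년이상 (21 years or more)  1  27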

Last rows

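Row  YEAR  CTPRVN  SE  PRDCTN_FRMHS_CO  CTPRVN_CD
110  2014  경상북도 (Gyeongsangbuk-do)  3년이하 (3 years or less)  83  47
111  2014  경상북도 (Gyeongsangbuk-do)  3~5년 (3 to 5 years)  168  47
112  2014  경상북도 (Gyeongsangbuk-do)  6~10년 (6 to 10 years)  122  47
113  2014  경상북도 (Gyeongsangbuk-do)  11~20년 (11 to 20 years)  91  47
114  2014  경상북도 (Gyeongsangbuk-do)  21년이상 (21 years or more)  188  47
115  2014  경상남도 (Gyeongsangnam-do)  3년이하 (3 years or less)  63  48
116  2014  경상남도 (Gyeongsangnam-do)  3~5년 (3 to 5 years)  129  48
117  2014  경상남도 (Gyeongsangnam-do)  6~10년 (6 to 10 years)  65  48
118  2014  경상남도 (Gyeongsangnam-do)  11~20년 (11 to 20 years)  41  48
119  2014  경상남도 (Gyeongsangnam-do)  21년이상 (21 years or more)  41  48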