gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	120
Missing cells	5
Missing cells (%)	0.8%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	5.2 KiB
Average record size in memory	44.1 B

Variable types

Categorical	3
Numeric	2

Dataset

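Description	Lookup service for production status by sericulture type, and for silkworm production and sales status, covering all sericulture households nationwide
Author	Ministry of Agriculture, Food and Rural Affairs (농림축산식품부)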
URL	https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220215000000001937

Alerts

`PRDCTN_FRMHS_CO` has 5 (4.2%) missing values	Missing
`PRDCTN_FRMHS_CO` has 15 (12.5%) zeros	Zeros
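
The percentages in the dataset statistics and alerts follow directly from the reported counts. A minimal sketch of that arithmetic, using only numbers from this report (the variable names are illustrative, not part of pandas-profiling):

```python
# Counts taken from the report above.
n_rows = 120          # Number of observations
n_cols = 5            # Number of variables
missing_cells = 5     # all of them in PRDCTN_FRMHS_CO
zero_cells = 15       # zeros in PRDCTN_FRMHS_CO

# Dataset-level share: missing cells over all cells (rows x columns).
missing_cells_pct = 100 * missing_cells / (n_rows * n_cols)

# Column-level shares: counts over the number of rows.
col_missing_pct = 100 * missing_cells / n_rows
col_zero_pct = 100 * zero_cells / n_rows

print(f"{missing_cells_pct:.1f}% {col_missing_pct:.1f}% {col_zero_pct:.1f}%")
```

The same five missing values account for 0.8% of all cells but 4.2% of the single column they sit in, which is why the alert percentage is larger than the dataset-level one.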

Reproduction

Analysis started	2022-08-12 14:42:04.370348
Analysis finished	2022-08-12 14:42:05.947881
Duration	1.58 seconds
Software version	pandas-profiling v3.2.0
Download configuration	config.json
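
The duration can be checked from the two timestamps above. A small sketch using only the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Timestamp layout assumed from the two values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-08-12 14:42:04.370348", FMT)
finished = datetime.strptime("2022-08-12 14:42:05.947881", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```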

YEAR
Categorical

Distinct	2
Distinct (%)	1.7%
Missing	0
Missing (%)	0.0%
Memory size	1.1 KiB

2013	60
2014	60

Length

Max length	4
Median length	4
Mean length	4
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2013
2nd row	2013
3rd row	2013
4th row	2013
5th row	2013

Common Values

Value	Count	Frequency (%)
2013	60	50.0%
2014	60	50.0%

Length

Histogram of lengths of the category

Category Frequency Plot

CTPRVN
Categorical

Distinct	13
Distinct (%)	10.8%
Missing	0
Missing (%)	0.0%
Memory size	1.1 KiB

대구광역시	10
광주광역시	10
세종자치시	10
경기도	10
강원도	10
Other values (8)	70

Length

Max length	7
Median length	6
Mean length	4.25
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	제주광역시
2nd row	제주광역시
3rd row	제주광역시
4th row	제주광역시
5th row	제주광역시

Common Values

Value	Count	Frequency (%)
대구광역시	10	8.3%
광주광역시	10	8.3%
세종자치시	10	8.3%
경기도	10	8.3%
강원도	10	8.3%
충청북도	10	8.3%
충청남도	10	8.3%
전라북도	10	8.3%
전라남도	10	8.3%
경상북도	10	8.3%
Other values (3)	20	16.7%

Length

Histogram of lengths of the category

SE
Categorical

Distinct	5
Distinct (%)	4.2%
Missing	0
Missing (%)	0.0%
Memory size	1.1 KiB

3년이하	24
3~5년	24
6~10년	24
11~20년	24
21년이상	24

Length

Max length	6
Median length	5
Mean length	4.8
Min length	4
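
The length statistics for SE can be reproduced from the five category strings and their counts. A rough sketch, assuming length means the number of characters in each value:

```python
from statistics import mean, median

# The five SE categories; each occurs 24 times according to the counts above.
categories = ["3년이하", "3~5년", "6~10년", "11~20년", "21년이상"]

# One length per row: repeat each category's character count 24 times.
lengths = [len(value) for value in categories for _ in range(24)]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```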

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	3년이하
2nd row	3~5년
3rd row	6~10년
4th row	11~20년
5th row	21년이상

Common Values

Value	Count	Frequency (%)
3년이하	24	20.0%
3~5년	24	20.0%
6~10년	24	20.0%
11~20년	24	20.0%
21년이상	24	20.0%

Length

Histogram of lengths of the category

Category Frequency Plot

PRDCTN_FRMHS_CO
Real number (ℝ_≥0)

MISSING
ZEROS

Distinct	71
Distinct (%)	61.7%
Missing	5
Missing (%)	4.2%
Infinite	0
Infinite (%)	0.0%
Mean	112.6869565

Minimum	0
Maximum	1600
Zeros	15
Zeros (%)	12.5%
Negative	0
Negative (%)	0.0%
Memory size	1.2 KiB

Quantile statistics

Minimum	0
5-th percentile	0
Q1	5
median	43
Q3	97.5
95-th percentile	372
Maximum	1600
Range	1600
Interquartile range (IQR)	92.5

Descriptive statistics

Standard deviation	238.8734292
Coefficient of variation (CV)	2.119796617
Kurtosis	24.97959279
Mean	112.6869565
Median Absolute Deviation (MAD)	42
Skewness	4.657113986
Sum	12959
Variance	57060.51518
Monotonicity	Not monotonic
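
Several of the figures above are tied together by simple identities. The sketch below re-derives them from the reported values only; it assumes that the mean is taken over the non-missing rows, which is consistent with the numbers shown:

```python
# Values reported above for PRDCTN_FRMHS_CO.
total = 12959                  # Sum
n_rows = 120                   # Number of observations
n_missing = 5                  # Missing
std = 238.8734292              # Standard deviation
q1, q3 = 5, 97.5               # Q1 and Q3
minimum, maximum = 0, 1600

n_valid = n_rows - n_missing   # 115 non-missing values
mean = total / n_valid         # Mean
cv = std / mean                # Coefficient of variation (CV)
variance = std ** 2            # Variance
iqr = q3 - q1                  # Interquartile range (IQR)
value_range = maximum - minimum

print(round(mean, 7), round(cv, 6), round(variance, 2), iqr, value_range)
```

A CV above 2 together with a maximum of 1600 against a median of 43 points to a strongly right-skewed column, which matches the reported skewness and kurtosis.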

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
0	15	12.5%
1	8	6.7%
5	5	4.2%
2	4	3.3%
43	3	2.5%
92	3	2.5%
13	3	2.5%
10	2	1.7%
41	2	1.7%
28	2	1.7%
Other values (61)	68	56.7%
(Missing)	5	4.2%

Minimum 10 values

Value	Count	Frequency (%)
0	15	12.5%
1	8	6.7%
2	4	3.3%
5	5	4.2%
7	2	1.7%
8	1	0.8%
9	1	0.8%
10	2	1.7%
11	1	0.8%
12	1	0.8%

Maximum 10 values

Value	Count	Frequency (%)
1600	1	0.8%
1565	1	0.8%
868	1	0.8%
760	1	0.8%
426	1	0.8%
414	1	0.8%
354	1	0.8%
352	1	0.8%
329	1	0.8%
309	1	0.8%

CTPRVN_CD
Real number (ℝ_≥0)

Distinct	12
Distinct (%)	10.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	41.5

Minimum	27
Maximum	50
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.2 KiB

Quantile statistics

Minimum	27
5-th percentile	27
Q1	39.75
median	43.5
Q3	46.25
95-th percentile	50
Maximum	50
Range	23
Interquartile range (IQR)	6.5

Descriptive statistics

Standard deviation	6.999399734
Coefficient of variation (CV)	0.1686602346
Kurtosis	-0.2028529704
Mean	41.5
Median Absolute Deviation (MAD)	3
Skewness	-0.9868816487
Sum	4980
Variance	48.99159664
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=12)

Value	Count	Frequency (%)
50	10	8.3%
27	10	8.3%
29	10	8.3%
36	10	8.3%
41	10	8.3%
42	10	8.3%
43	10	8.3%
44	10	8.3%
45	10	8.3%
46	10	8.3%
Other values (2)	20	16.7%

Minimum 10 values

Value	Count	Frequency (%)
27	10	8.3%
29	10	8.3%
36	10	8.3%
41	10	8.3%
42	10	8.3%
43	10	8.3%
44	10	8.3%
45	10	8.3%
46	10	8.3%
47	10	8.3%

Maximum 10 values

Value	Count	Frequency (%)
50	10	8.3%
48	10	8.3%
47	10	8.3%
46	10	8.3%
45	10	8.3%
44	10	8.3%
43	10	8.3%
42	10	8.3%
41	10	8.3%
36	10	8.3%
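
Because CTPRVN_CD has only 12 distinct codes, each occurring 10 times, the whole column can be rebuilt from the value tables above and its statistics recomputed. This is a sketch with the standard library; the choice of the `inclusive` quantile method and the sample (n - 1) variance are assumptions that happen to reproduce the reported figures:

```python
from statistics import mean, median, quantiles, stdev, variance

# Distinct CTPRVN_CD codes from the value tables above; each occurs 10 times.
codes = [27, 29, 36, 41, 42, 43, 44, 45, 46, 47, 48, 50]
column = sorted(code for code in codes for _ in range(10))

# "inclusive" interpolates linearly between order statistics.
q1, q2, q3 = quantiles(column, n=4, method="inclusive")

# Median absolute deviation around the median.
mad = median(abs(x - q2) for x in column)

summary = {
    "sum": sum(column),
    "mean": mean(column),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "iqr": q3 - q1,
    "mad": mad,
    "variance": variance(column),   # sample variance (divides by n - 1)
    "std": stdev(column),           # sample standard deviation
}
print(summary)
```

These codes are region identifiers rather than measurements, so the statistics are reproducible but carry little meaning; the variable is arguably better treated as categorical.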

Interactions

[Interaction plots for the numeric variables PRDCTN_FRMHS_CO and CTPRVN_CD]

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear function, the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
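
The calculation described above can be sketched in plain Python. The function name and the sample data are hypothetical and only illustrate the formula; this is not the code pandas-profiling runs:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors of covariance and standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling ys (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 10 for y in ys])
print(round(r, 4), round(r_rescaled, 4))
```

The sketch does not guard against a constant input, for which the standard deviation is zero and r is undefined.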

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
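
As a sketch of that recipe, the code below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. Ties receive the average of the positions they occupy, which is a common convention and an assumption here; all names and data are hypothetical:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def correlation(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    return correlation(average_ranks(xs), average_ranks(ys))


# Hypothetical data: ys grows monotonically but not linearly with xs.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = correlation(xs, ys)
print(round(rho, 4), round(r, 4))
```

On this cubic relationship ρ reaches 1 while the plain linear coefficient stays below 1, which is the difference the paragraph above describes.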

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
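
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with hypothetical names and data; statistical libraries often default to the tie-adjusted tau-b instead, so results can differ when the data contain ties:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 is a tie in x or y: counted in neither group
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
print(round(tau, 4))
```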

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
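
To make the correction concrete, here is a sketch of the bias-corrected estimator in the form it is commonly stated after Bergsma (2013). It is written from that formula, not taken from the pandas-profiling source, and the function name and sample data are hypothetical:

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    r, k = len(rows), len(cols)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # no association measurable with a single category

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections applied to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Hypothetical data: perfectly associated labels, then independent ones.
v_perfect = cramers_v_corrected(["a", "b"] * 20, ["x", "y"] * 20)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 10, ["x", "y", "x", "y"] * 10)
print(round(v_perfect, 4), round(v_independent, 4))
```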

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

First rows

	YEAR	CTPRVN	SE	PRDCTN_FRMHS_CO	CTPRVN_CD
0	2013	제주광역시	3년이하	0	50
1	2013	제주광역시	3~5년	10	50
2	2013	제주광역시	6~10년	1	50
3	2013	제주광역시	11~20년	0	50
4	2013	제주광역시	21년이상	0	50
5	2013	대구광역시	3년이하	13	27
6	2013	대구광역시	3~5년	0	27
7	2013	대구광역시	6~10년	0	27
8	2013	대구광역시	11~20년	0	27
9	2013	대구광역시	21년이상	1	27

Last rows

	YEAR	CTPRVN	SE	PRDCTN_FRMHS_CO	CTPRVN_CD
110	2014	경상북도	3년이하	83	47
111	2014	경상북도	3~5년	168	47
112	2014	경상북도	6~10년	122	47
113	2014	경상북도	11~20년	91	47
114	2014	경상북도	21년이상	188	47
115	2014	경상남도	3년이하	63	48
116	2014	경상남도	3~5년	129	48
117	2014	경상남도	6~10년	65	48
118	2014	경상남도	11~20년	41	48
119	2014	경상남도	21년이상	41	48
