gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	19992
Missing cells (%)	40.0%
Duplicate rows	1
Duplicate rows (%)	< 0.1%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
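
The percentage and per-record figures above follow from simple arithmetic on the reported counts. A minimal sketch, assuming the missing-cell percentage is missing cells divided by (variables × observations) and the average record size is total memory divided by the number of observations; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 10_000
missing_cells = 19_992
total_memory_kib = 488.3

# Assumption: missing % = missing cells / total cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the table (40.0% and 50.0 B).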

Variable types

Text	2
Categorical	3

Dataset

Description	Forest product production status (임산물생산현황)
Author	Danyang-gun, Chungcheongbuk-do (충청북도 단양군)
URL	https://www.data.go.kr/data/3067868/fileData.do

Alerts

Dataset has 1 (< 0.1%) duplicate rows	Duplicates
`데이터기준일` is highly overall correlated with `생산량(kg,ℓ)` and 1 other fields	High correlation
`생산량(kg,ℓ)` is highly overall correlated with `생산액(백만원)` and 1 other fields	High correlation
`생산액(백만원)` is highly overall correlated with `생산량(kg,ℓ)` and 1 other fields	High correlation
`생산량(kg,ℓ)` is highly imbalanced (99.7%)	Imbalance
`생산액(백만원)` is highly imbalanced (99.7%)	Imbalance
`데이터기준일` is highly imbalanced (99.5%)	Imbalance
`종 류` has 9996 (> 99.9%) missing values	Missing
`품 목` has 9996 (> 99.9%) missing values	Missing
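
The imbalance percentages in the alerts can be reproduced from the value counts listed for each categorical variable. A minimal sketch, assuming the score is one minus the Shannon entropy of the class frequencies normalised by the log of the number of classes; this assumption matches the reported 99.7% and 99.5%, but the function name and formula are not taken from the report itself:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the counts."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0  # a single class carries no entropy to normalise
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Value counts as listed in the report: 9996 "<NA>" strings plus the real values.
production_counts = [9996, 1, 1, 1, 1]  # 생산량(kg,ℓ) and 생산액(백만원)
date_counts = [9996, 4]                 # 데이터기준일

print(f"{100 * imbalance_score(production_counts):.1f}%")
print(f"{100 * imbalance_score(date_counts):.1f}%")
```

A score near 100% means almost all rows fall into one class, here the `<NA>` placeholder.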

Reproduction

Analysis started	2023-12-12 09:13:06.552301
Analysis finished	2023-12-12 09:13:07.318277
Duration	0.77 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

종 류
Text

MISSING

Distinct	3
Distinct (%)	75.0%
Missing	9996
Missing (%)	> 99.9%
Memory size	156.2 KiB

Length

Max length	4
Median length	4
Mean length	3.75
Min length	3

Characters and Unicode

Total characters	15
Distinct characters	9
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
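
As an illustration of those character properties, the sketch below recomputes the character totals and the general category for this variable with Python's standard `unicodedata` module. It assumes the four non-missing values are the ones listed in the Sample for `종 류`; script and block lookups are not available in the standard library, so only the general category is shown:

```python
import unicodedata
from collections import Counter

# The four non-missing values of `종 류`, as listed in the report's Sample.
values = ["관상식물", "산나물류", "산나물류", "버섯류"]

# Flatten to individual characters and count them.
chars = [ch for value in values for ch in value]
char_counts = Counter(chars)

# unicodedata.category returns the two-letter general category;
# "Lo" is the code the report spells out as "Other Letter".
category_counts = Counter(unicodedata.category(ch) for ch in chars)

print("Total characters:", len(chars))
print("Distinct characters:", len(char_counts))
print("Categories:", dict(category_counts))
print("Most common:", char_counts.most_common(2))
```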

Unique

Unique	2
Unique (%)	50.0%

Sample

1st row	관상식물
2nd row	산나물류
3rd row	산나물류
4th row	버섯류

Value	Count	Frequency (%)
산나물류	2	50.0%
관상식물	1	25.0%
버섯류	1	25.0%
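
The difference between Distinct and Unique, and the length statistics above, can be checked against these four values. A minimal sketch, assuming "distinct" counts different values, "unique" counts values that occur exactly once, and both percentages are taken over the non-missing rows only:

```python
from collections import Counter
from statistics import mean, median

# The four non-missing values of `종 류` from the Sample above.
values = ["관상식물", "산나물류", "산나물류", "버섯류"]
counts = Counter(values)

# Distinct: number of different values. Unique: values seen exactly once.
distinct = len(counts)
unique = sum(1 for c in counts.values() if c == 1)
distinct_pct = 100 * distinct / len(values)
unique_pct = 100 * unique / len(values)

# Length statistics in characters.
lengths = [len(v) for v in values]

print(f"Distinct {distinct} ({distinct_pct:.1f}%), Unique {unique} ({unique_pct:.1f}%)")
print(f"Max {max(lengths)}, Median {median(lengths)}, Mean {mean(lengths)}, Min {min(lengths)}")
```

`산나물류` appears twice, so it is distinct but not unique, which is why Unique (2) is lower than Distinct (3).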

Most occurring characters

Value	Count	Frequency (%)
물	3	20.0%
류	3	20.0%
산	2	13.3%
나	2	13.3%
관	1	6.7%
상	1	6.7%
식	1	6.7%
버	1	6.7%
섯	1	6.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	15	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
물	3	20.0%
류	3	20.0%
산	2	13.3%
나	2	13.3%
관	1	6.7%
상	1	6.7%
식	1	6.7%
버	1	6.7%
섯	1	6.7%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	15	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
물	3	20.0%
류	3	20.0%
산	2	13.3%
나	2	13.3%
관	1	6.7%
상	1	6.7%
식	1	6.7%
버	1	6.7%
섯	1	6.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	15	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
물	3	20.0%
류	3	20.0%
산	2	13.3%
나	2	13.3%
관	1	6.7%
상	1	6.7%
식	1	6.7%
버	1	6.7%
섯	1	6.7%

품 목
Text

MISSING

Distinct	4
Distinct (%)	100.0%
Missing	9996
Missing (%)	> 99.9%
Memory size	156.2 KiB

Length

Max length	5
Median length	3
Mean length	3.5
Min length	3

Characters and Unicode

Total characters	14
Distinct characters	13
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	4
Unique (%)	100.0%

Sample

1st row	소나무
2nd row	고려엉겅퀴
3rd row	도라지
4th row	생표고

Value	Count	Frequency (%)
소나무	1	25.0%
고려엉겅퀴	1	25.0%
도라지	1	25.0%
생표고	1	25.0%

Most occurring characters

Value	Count	Frequency (%)
고	2	14.3%
소	1	7.1%
나	1	7.1%
무	1	7.1%
려	1	7.1%
엉	1	7.1%
겅	1	7.1%
퀴	1	7.1%
도	1	7.1%
라	1	7.1%
Other values (3)	3	21.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	14	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
고	2	14.3%
소	1	7.1%
나	1	7.1%
무	1	7.1%
려	1	7.1%
엉	1	7.1%
겅	1	7.1%
퀴	1	7.1%
도	1	7.1%
라	1	7.1%
Other values (3)	3	21.4%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	14	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
고	2	14.3%
소	1	7.1%
나	1	7.1%
무	1	7.1%
려	1	7.1%
엉	1	7.1%
겅	1	7.1%
퀴	1	7.1%
도	1	7.1%
라	1	7.1%
Other values (3)	3	21.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	14	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
고	2	14.3%
소	1	7.1%
나	1	7.1%
무	1	7.1%
려	1	7.1%
엉	1	7.1%
겅	1	7.1%
퀴	1	7.1%
도	1	7.1%
라	1	7.1%
Other values (3)	3	21.4%

생산량(kg,ℓ)
Categorical

HIGH CORRELATION IMBALANCE

Distinct	5
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

<NA>	9996
0	1
582141	1
1118	1
11488	1

Length

Max length	6
Median length	4
Mean length	4
Min length	1

Unique

Unique	4
Unique (%)	< 0.1%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	9996	> 99.9%
0	1	< 0.1%
582141	1	< 0.1%
1118	1	< 0.1%
11488	1	< 0.1%

Length

Histogram of lengths of the category

생산액(백만원)
Categorical

HIGH CORRELATION IMBALANCE

Distinct	5
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

<NA>	9996
0.0	1
1468.5	1
10.9	1
120.1	1

Length

Max length	6
Median length	4
Mean length	4.0002
Min length	3

Unique

Unique	4
Unique (%)	< 0.1%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	9996	> 99.9%
0.0	1	< 0.1%
1468.5	1	< 0.1%
10.9	1	< 0.1%
120.1	1	< 0.1%

Length

Histogram of lengths of the category

데이터기준일
Categorical

HIGH CORRELATION IMBALANCE

Distinct	2
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

<NA>	9996
2021-01-26	4

Length

Max length	10
Median length	4
Mean length	4.0024
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	9996	> 99.9%
2021-01-26	4	< 0.1%

Length

Histogram of lengths of the category

Correlations

Heatmap
Table

	종 류	품 목	생산량(kg,ℓ)	생산액(백만원)
종 류	1.000	1.000	1.000	1.000
품 목	1.000	1.000	1.000	1.000
생산량(kg,ℓ)	1.000	1.000	1.000	1.000
생산액(백만원)	1.000	1.000	1.000	1.000

Heatmap
Table

	데이터기준일	생산량(kg,ℓ)	생산액(백만원)
데이터기준일	1.000	1.000	1.000
생산량(kg,ℓ)	1.000	1.000	1.000
생산액(백만원)	1.000	1.000	1.000

Heatmap
Table

	생산량(kg,ℓ)	생산액(백만원)	데이터기준일
생산량(kg,ℓ)	1.000	1.000	1.000
생산액(백만원)	1.000	1.000	1.000
데이터기준일	1.000	1.000	1.000

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
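
Nullity correlation can be sketched as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The example below uses a small hypothetical table, not the profiled dataset, and the column names `kind`, `item`, and `note` are made up for illustration; it is an assumption that the report computes the heatmap this way:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when a column is always (or never) missing
    return cov / sqrt(vx * vy)


# Hypothetical toy rows; None marks a missing cell.
rows = [
    {"kind": "x", "item": "p", "note": None},
    {"kind": "y", "item": "q", "note": "n"},
    {"kind": None, "item": None, "note": "n"},
    {"kind": None, "item": None, "note": None},
]

# Presence indicators: 1 if the cell has a value, 0 if it is missing.
present = {col: [int(r[col] is not None) for r in rows] for col in rows[0]}

kind_item = pearson(present["kind"], present["item"])
kind_note = pearson(present["kind"], present["note"])
print(kind_item, kind_note)
```

Columns that are missing in exactly the same rows, as `kind` and `item` are here, get a nullity correlation of 1; columns whose missingness is unrelated get a value near 0.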

Sample

First rows

	종 류	품 목	생산량(kg,ℓ)	생산액(백만원)	데이터기준일
42364	<NA>	<NA>	<NA>	<NA>	<NA>
272	<NA>	<NA>	<NA>	<NA>	<NA>
83062	<NA>	<NA>	<NA>	<NA>	<NA>
75481	<NA>	<NA>	<NA>	<NA>	<NA>
88960	<NA>	<NA>	<NA>	<NA>	<NA>
11837	<NA>	<NA>	<NA>	<NA>	<NA>
41968	<NA>	<NA>	<NA>	<NA>	<NA>
46130	<NA>	<NA>	<NA>	<NA>	<NA>
67024	<NA>	<NA>	<NA>	<NA>	<NA>
17120	<NA>	<NA>	<NA>	<NA>	<NA>

Last rows

	종 류	품 목	생산량(kg,ℓ)	생산액(백만원)	데이터기준일
17632	<NA>	<NA>	<NA>	<NA>	<NA>
43853	<NA>	<NA>	<NA>	<NA>	<NA>
85059	<NA>	<NA>	<NA>	<NA>	<NA>
87502	<NA>	<NA>	<NA>	<NA>	<NA>
20746	<NA>	<NA>	<NA>	<NA>	<NA>
9826	<NA>	<NA>	<NA>	<NA>	<NA>
779	<NA>	<NA>	<NA>	<NA>	<NA>
18481	<NA>	<NA>	<NA>	<NA>	<NA>
66301	<NA>	<NA>	<NA>	<NA>	<NA>
68352	<NA>	<NA>	<NA>	<NA>	<NA>

Duplicate rows

Most frequently occurring

	종 류	품 목	생산량(kg,ℓ)	생산액(백만원)	데이터기준일	# duplicates
0	<NA>	<NA>	<NA>	<NA>	<NA>	9996
