Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 19992
Missing cells (%): 40.0%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
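Missing cells (%) is the total number of missing cells divided by the number of cells (variables × observations). A minimal check in Python, using the per-column missing counts reported under Variables; that the profiler uses exactly this denominator is an assumption, but it reproduces the 40.0% shown:

```python
# Per-column missing counts as reported in the Variables section.
missing_per_column = {
    "종 류": 9996,
    "품 목": 9996,
    "생산량(kg,ℓ)": 0,
    "생산액(백만원)": 0,
    "데이터기준일": 0,
}
n_observations = 10_000

total_cells = n_observations * len(missing_per_column)  # 5 variables x 10000 rows
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells  # 39.984

print(missing_cells)          # 19992
print(f"{missing_pct:.1f}%")  # 40.0%
```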

Variable types

Text: 2
Categorical: 3

Dataset

Description: Forest product production status (임산물생산현황)
Author: Danyang County, Chungcheongbuk-do (충청북도 단양군)
URL: https://www.data.go.kr/data/3067868/fileData.do

Alerts

Dataset has 1 (< 0.1%) duplicate row [Duplicates]
데이터기준일 is highly correlated overall with 생산량(kg,ℓ) and 1 other field [High correlation]
생산량(kg,ℓ) is highly correlated overall with 생산액(백만원) and 1 other field [High correlation]
생산액(백만원) is highly correlated overall with 생산량(kg,ℓ) and 1 other field [High correlation]
생산량(kg,ℓ) is highly imbalanced (99.7%) [Imbalance]
생산액(백만원) is highly imbalanced (99.7%) [Imbalance]
데이터기준일 is highly imbalanced (99.5%) [Imbalance]
종 류 has 9996 (> 99.9%) missing values [Missing]
품 목 has 9996 (> 99.9%) missing values [Missing]
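The imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of a column's value counts, 1 - H / ln(k) for k distinct values. The sketch below assumes that definition (it reproduces both reported figures) and uses the value counts from the Common Values tables: 9996 "<NA>" plus four singletons for 생산량(kg,ℓ) and 생산액(백만원), and 9996 "<NA>" plus four copies of one date for 데이터기준일.

```python
import math


def imbalance(counts):
    """1 - normalized Shannon entropy of the value counts (0 = balanced, 1 = one value)."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Dividing by ln(k), the maximum possible entropy, makes the score base-independent.
    return 1 - entropy / math.log(n_classes)


volume_or_value = round(100 * imbalance([9996, 1, 1, 1, 1]), 1)
reference_date = round(100 * imbalance([9996, 4]), 1)
print(volume_or_value)  # 99.7
print(reference_date)   # 99.5
```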

Reproduction

Analysis started: 2023-12-12 09:13:06.552301
Analysis finished: 2023-12-12 09:13:07.318277
Duration: 0.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

종 류 (type)
Text

MISSING 

Distinct: 3
Distinct (%): 75.0%
Missing: 9996
Missing (%): > 99.9%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.75
Min length: 3
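The length statistics are computed over the four non-missing values only. A minimal sketch using the values listed under Sample; it assumes the strings are stored as precomposed (NFC) Hangul, so `len` counts one code point per syllable:

```python
from statistics import mean, median

# The four non-missing 종 류 values, as listed under Sample.
values = ["관상식물", "산나물류", "산나물류", "버섯류"]
lengths = [len(v) for v in values]  # one code point per precomposed syllable

print(lengths)                                                  # [4, 4, 4, 3]
print(max(lengths), median(lengths), mean(lengths), min(lengths))  # 4 4.0 3.75 3
```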

Characters and Unicode

Total characters: 15
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
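Python's standard `unicodedata` module exposes the general category of each code point; "Lo" is the code for Other Letter, the single category found here. The standard library has no script or block lookup, so the sketch uses the Hangul Syllables code-point range (U+AC00 to U+D7A3) as a stand-in; that is an assumption for illustration, not how the profiler groups characters.

```python
import unicodedata

values = ["관상식물", "산나물류", "산나물류", "버섯류"]
chars = "".join(values)

# General category of every character: "Lo" = Other Letter.
categories = {unicodedata.category(ch) for ch in chars}
print(categories)  # {'Lo'}

# Stand-in for the script/block check: precomposed Hangul syllables
# live in U+AC00..U+D7A3 and are named "HANGUL SYLLABLE ...".
in_hangul_syllables = all(0xAC00 <= ord(ch) <= 0xD7A3 for ch in chars)
names = [unicodedata.name(ch) for ch in "버섯류"]
print(in_hangul_syllables)  # True
print(names)
```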

Unique

Unique: 2
Unique (%): 50.0%
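Distinct counts every different value, while Unique counts only values that occur exactly once; both percentages here are relative to the 4 non-missing values. A short sketch with the values from the Sample list:

```python
from collections import Counter

values = ["관상식물", "산나물류", "산나물류", "버섯류"]
counts = Counter(values)

distinct = len(counts)                             # 관상식물, 산나물류, 버섯류
unique = sum(1 for c in counts.values() if c == 1)  # 관상식물, 버섯류

print(distinct, f"{100 * distinct / len(values):.1f}%")  # 3 75.0%
print(unique, f"{100 * unique / len(values):.1f}%")      # 2 50.0%
```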

Sample

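1st row: 관상식물 (ornamental plants)
2nd row: 산나물류 (wild edible greens)
3rd row: 산나물류 (wild edible greens)
4th row: 버섯류 (mushrooms)

Value | Count | Frequency (%)
산나물류 | 2 | 50.0%
관상식물 | 1 | 25.0%
버섯류 | 1 | 25.0%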

Most occurring characters

Value | Count | Frequency (%)
물 | 3 | 20.0%
류 | 3 | 20.0%
산 | 2 | 13.3%
나 | 2 | 13.3%
관 | 1 | 6.7%
상 | 1 | 6.7%
식 | 1 | 6.7%
버 | 1 | 6.7%
섯 | 1 | 6.7%
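The character frequencies can be recomputed from the four sample values: concatenate them and count code points. A minimal sketch with `collections.Counter`; ties keep first-seen order here, which may differ from the profiler's ordering:

```python
from collections import Counter

values = ["관상식물", "산나물류", "산나물류", "버섯류"]
char_counts = Counter("".join(values))
total = sum(char_counts.values())  # 15 characters in all

for ch, n in char_counts.most_common():
    print(ch, n, f"{100 * n / total:.1f}%")
```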

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15 | 100.0%

Most frequent character per category

Other Letter
Identical to Most occurring characters: all 15 characters belong to this category.

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15 | 100.0%

Most frequent character per script

Hangul
Identical to Most occurring characters: all 15 characters are in the Hangul script.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15 | 100.0%

Most frequent character per block

Hangul
Identical to Most occurring characters: all 15 characters are in the Hangul block.

품 목 (item)
Text

MISSING 

Distinct: 4
Distinct (%): 100.0%
Missing: 9996
Missing (%): > 99.9%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.5
Min length: 3

Characters and Unicode

Total characters: 14
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4
Unique (%): 100.0%

Sample

1st row: 소나무 (pine)
2nd row: 고려엉겅퀴 (Korean thistle)
3rd row: 도라지 (balloon flower root)
4th row: 생표고 (fresh shiitake)

Value | Count | Frequency (%)
소나무 | 1 | 25.0%
고려엉겅퀴 | 1 | 25.0%
도라지 | 1 | 25.0%
생표고 | 1 | 25.0%

Most occurring characters

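Value | Count | Frequency (%)
고 | 2 | 14.3%
(ten further characters) | 1 each | 7.1% each
Other values (3) | 3 | 21.4%

The twelve characters that occur once are 소, 나, 무, 려, 엉, 겅, 퀴, 도, 라, 지, 생 and 표; the report lists ten of them individually and groups the remaining three as Other values (3).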
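The Other values row comes from listing only the most frequent characters and folding the rest into one bucket. A sketch assuming a cut-off of ten rows, as in the table; which tied single-occurrence characters land in the top ten depends on tie-breaking, and `Counter` simply keeps first-seen order:

```python
from collections import Counter

values = ["소나무", "고려엉겅퀴", "도라지", "생표고"]
char_counts = Counter("".join(values))
total = sum(char_counts.values())  # 14 characters, 13 distinct

top = char_counts.most_common(10)      # rows shown individually (assumed cut-off of 10)
rest = char_counts.most_common()[10:]  # folded into "Other values"
others_total = sum(n for _, n in rest)

for ch, n in top:
    print(ch, n, f"{100 * n / total:.1f}%")
print(f"Other values ({len(rest)})", others_total, f"{100 * others_total / total:.1f}%")
```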

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 14 | 100.0%

Most frequent character per category

Other Letter
Identical to Most occurring characters: all 14 characters belong to this category.

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14 | 100.0%

Most frequent character per script

Hangul
Identical to Most occurring characters: all 14 characters are in the Hangul script.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 14 | 100.0%

Most frequent character per block

Hangul
Identical to Most occurring characters: all 14 characters are in the Hangul block.

생산량(kg,ℓ) (production volume, kg or ℓ)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4
Min length: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9996 | > 99.9%
0 | 1 | < 0.1%
582141 | 1 | < 0.1%
1118 | 1 | < 0.1%
11488 | 1 | < 0.1%
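This column reports Missing: 0 even though <NA> is its most common value. That is what you would see if the empty cells are stored as the literal text "<NA>" rather than as true nulls; this is an assumption, since the report does not show the raw file. A minimal pandas sketch of the difference, using the values from the table above:

```python
import pandas as pd

# Assumed raw form: empty cells hold the string "<NA>", not a null.
raw = pd.Series(["<NA>"] * 9996 + ["0", "582141", "1118", "11488"])
print(raw.isna().sum())  # 0: "<NA>" is an ordinary string to pandas

# errors="coerce" turns any unparseable string (including "<NA>") into NaN.
volume = pd.to_numeric(raw, errors="coerce")
print(volume.isna().sum())  # 9996
print(volume.dtype)         # float64
print(volume.sum())         # 594747.0 (NaN is skipped)
```

After coercion the column is numeric and its missing count matches the Text columns.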

Length

Figure: Histogram of lengths of the category.

Common Values (Plot)

Figure: bar chart of the common values listed above.

생산액(백만원) (production value, million KRW)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.0002
Min length: 3
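The mean length of 4.0002 averages the string length over all 10000 rows: the literal <NA> counts as four characters, and the values 0.0, 1468.5, 10.9 and 120.1 contribute 3, 6, 4 and 5.

```latex
\bar{\ell} = \frac{9996 \cdot 4 + 3 + 6 + 4 + 5}{10000} = \frac{40002}{10000} = 4.0002
```

The same count gives exactly 4 for 생산량(kg,ℓ) (9996 · 4 + 1 + 6 + 4 + 5 = 40000) and 4.0024 for 데이터기준일 (9996 · 4 + 4 · 10 = 40024).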

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9996 | > 99.9%
0.0 | 1 | < 0.1%
1468.5 | 1 | < 0.1%
10.9 | 1 | < 0.1%
120.1 | 1 | < 0.1%

Length

Figure: Histogram of lengths of the category.

Common Values (Plot)

Figure: bar chart of the common values listed above.

데이터기준일 (data reference date)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 4
Mean length: 4.0024
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9996 | > 99.9%
2021-01-26 | 4 | < 0.1%

Length

Figure: Histogram of lengths of the category.

Common Values (Plot)

Figure: bar chart of the common values listed above.

Correlations

Figure: correlation matrix.
Variable | 종 류 | 품 목 | 생산량(kg,ℓ) | 생산액(백만원)
종 류 | 1.000 | 1.000 | 1.000 | 1.000
품 목 | 1.000 | 1.000 | 1.000 | 1.000
생산량(kg,ℓ) | 1.000 | 1.000 | 1.000 | 1.000
생산액(백만원) | 1.000 | 1.000 | 1.000 | 1.000
Figure: correlation matrix.
Variable | 데이터기준일 | 생산량(kg,ℓ) | 생산액(백만원)
데이터기준일 | 1.000 | 1.000 | 1.000
생산량(kg,ℓ) | 1.000 | 1.000 | 1.000
생산액(백만원) | 1.000 | 1.000 | 1.000
Figure: correlation matrix.
Variable | 생산량(kg,ℓ) | 생산액(백만원) | 데이터기준일
생산량(kg,ℓ) | 1.000 | 1.000 | 1.000
생산액(백만원) | 1.000 | 1.000 | 1.000
데이터기준일 | 1.000 | 1.000 | 1.000
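Every coefficient in these matrices is 1.000. A plausible reason is structural rather than substantive: outside the 9996 <NA> rows each column holds at most four values, so each value of one column co-occurs with exactly one value of another. The sketch below computes plain (not bias-corrected) Cramér's V, a common association measure for categorical pairs; the report does not say which coefficient each matrix uses, and the row alignment of the four non-empty values is assumed.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Plain (not bias-corrected) Cramér's V between two categorical sequences."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for a, r in row_totals.items():
        for b, c in col_totals.items():
            expected = r * c / n  # count expected under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical alignment: the report does not show which values share a row.
volume = ["<NA>"] * 9996 + ["0", "582141", "1118", "11488"]
value = ["<NA>"] * 9996 + ["0.0", "1468.5", "10.9", "120.1"]
date = ["<NA>"] * 9996 + ["2021-01-26"] * 4

print(round(cramers_v(volume, value), 6))  # 1.0
print(round(cramers_v(volume, date), 6))   # 1.0
```

Any one-to-one pairing of the four values gives the same result, so the 1.000 entries say little about how production volume and value actually relate.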

Missing values

Figure: A simple visualization of nullity by column.
Figure: A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Figure: The nullity correlation heatmap measures how strongly the presence or absence of one variable affects the presence of another.
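Nullity correlation can be sketched as the Pearson correlation between 0/1 missingness indicators; this is an assumption about how the heatmap is computed. Because 종 류 and 품 목 are empty in the same 9996 rows (the duplicate-row table shows all five fields empty together), their nullity correlation is 1. The pairing of the four non-empty values across the two columns is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "종 류": [None] * 9996 + ["관상식물", "산나물류", "산나물류", "버섯류"],
    "품 목": [None] * 9996 + ["소나무", "고려엉겅퀴", "도라지", "생표고"],
})

# 1.0 where the cell is missing, 0.0 where it is present.
nullity = df.isna().astype(float)
nullity_corr = nullity.corr()  # Pearson correlation of the missingness indicators
print(round(nullity_corr.loc["종 류", "품 목"], 6))  # 1.0
```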

Sample

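(index) | 종 류 | 품 목 | 생산량(kg,ℓ) | 생산액(백만원) | 데이터기준일
42364 | <NA> | <NA> | <NA> | <NA> | <NA>
272 | <NA> | <NA> | <NA> | <NA> | <NA>
83062 | <NA> | <NA> | <NA> | <NA> | <NA>
75481 | <NA> | <NA> | <NA> | <NA> | <NA>
88960 | <NA> | <NA> | <NA> | <NA> | <NA>
11837 | <NA> | <NA> | <NA> | <NA> | <NA>
41968 | <NA> | <NA> | <NA> | <NA> | <NA>
46130 | <NA> | <NA> | <NA> | <NA> | <NA>
67024 | <NA> | <NA> | <NA> | <NA> | <NA>
17120 | <NA> | <NA> | <NA> | <NA> | <NA>

(index) | 종 류 | 품 목 | 생산량(kg,ℓ) | 생산액(백만원) | 데이터기준일
17632 | <NA> | <NA> | <NA> | <NA> | <NA>
43853 | <NA> | <NA> | <NA> | <NA> | <NA>
85059 | <NA> | <NA> | <NA> | <NA> | <NA>
87502 | <NA> | <NA> | <NA> | <NA> | <NA>
20746 | <NA> | <NA> | <NA> | <NA> | <NA>
9826 | <NA> | <NA> | <NA> | <NA> | <NA>
779 | <NA> | <NA> | <NA> | <NA> | <NA>
18481 | <NA> | <NA> | <NA> | <NA> | <NA>
66301 | <NA> | <NA> | <NA> | <NA> | <NA>
68352 | <NA> | <NA> | <NA> | <NA> | <NA>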

Duplicate rows

Most frequently occurring

(index) | 종 류 | 품 목 | 생산량(kg,ℓ) | 생산액(백만원) | 데이터기준일 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 9996
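The overview reports 1 duplicate row while this table shows 9996. The figures are consistent if the overview counts distinct rows that repeat, while # duplicates counts how often that row occurs. A plain-Python sketch; the four non-empty rows are hypothetical placeholders because the report does not list them in full:

```python
from collections import Counter

# 9996 all-empty rows: None for the two Text columns, the literal "<NA>"
# for the three Categorical columns (an assumption about the raw data).
empty_row = (None, None, "<NA>", "<NA>", "<NA>")
# Hypothetical placeholders for the four distinct non-empty rows.
other_rows = [(f"placeholder {i}",) * 5 for i in range(4)]
rows = [empty_row] * 9996 + other_rows

row_counts = Counter(rows)
duplicated_groups = {row: n for row, n in row_counts.items() if n > 1}

print(len(duplicated_groups))                          # 1: distinct rows that repeat
print(duplicated_groups[empty_row])                    # 9996: occurrences of that row
print(sum(n - 1 for n in duplicated_groups.values()))  # 9995: copies beyond the first
```

The last figure is what pandas `DataFrame.duplicated().sum()` would return with its default `keep="first"`.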