Overview

Dataset statistics

Number of variables: 4
Number of observations: 449
Missing cells: 423
Missing cells (%): 23.6%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 14.6 KiB
Average record size in memory: 33.3 B
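
The percentage and per-record figures above can be recomputed from the raw counts. The following is a minimal sketch, assuming that the missing-cell share is taken over rows × columns and that 1 KiB = 1024 B; the variable names are illustrative and do not come from the report.

```python
# Counts copied from the dataset statistics above.
n_variables = 4
n_observations = 449
missing_cells = 423
duplicate_rows = 1
total_size_kib = 14.6

# Assumption: the missing-cell percentage is relative to all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of rows flagged as duplicates.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three values round to the figures reported above.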

Variable types

Numeric: 1
Categorical: 2
Text: 1

Dataset

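Description: Stores the keyword information assigned to eco-design ideas published on the Environmental Management Information Portal (환경경영정보포털), for example structural improvement, energy reduction, material improvement, sharing, water resource reduction, and use of new and renewable energy.
Author: Ministry of Environment (환경부)
URL: https://www.data.go.kr/data/15071195/fileData.do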

Alerts

Dataset has 1 (0.2%) duplicate row [Duplicates]
키워드코드 (keyword code) is highly overall correlated with 키워드이름 (keyword name) [High correlation]
키워드이름 (keyword name) is highly overall correlated with 키워드코드 (keyword code) [High correlation]
기타키워드내용 (other keyword content) has 423 (94.2%) missing values [Missing]

Reproduction

Analysis started: 2024-04-21 07:43:39.391056
Analysis finished: 2024-04-21 07:43:40.388574
Duration: 1 second
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

키워드 번호 (keyword number)
Real number (ℝ)

Distinct: 199
Distinct (%): 44.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 240.60356
Minimum: 100
Maximum: 377
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 KiB

Quantile statistics

Minimum: 100
5-th percentile: 112
Q1: 171
Median: 237
Q3: 306
95-th percentile: 366.6
Maximum: 377
Range: 277
Interquartile range (IQR): 135

Descriptive statistics

Standard deviation: 82.533752
Coefficient of variation (CV): 0.34302797
Kurtosis: -1.1793466
Mean: 240.60356
Median Absolute Deviation (MAD): 67
Skewness: 0.094434622
Sum: 108031
Variance: 6811.8202
Monotonicity: Not monotonic
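
Several of the statistics listed for this variable are derived from one another. The sketch below recomputes them from the reported sum, count, standard deviation and quartiles. It assumes the usual definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean); the variable names are illustrative.

```python
import math

# Values copied from the report for 키워드 번호.
n = 449                      # number of observations
total = 108031               # reported sum
std = 82.533752              # reported standard deviation
minimum, q1, q3, maximum = 100, 171, 306, 377

mean = total / n             # arithmetic mean
variance = std ** 2          # variance as the square of the standard deviation
cv = std / mean              # coefficient of variation
value_range = maximum - minimum
iqr = q3 - q1                # interquartile range

print(f"mean={mean:.5f} variance={variance:.4f} cv={cv:.8f}")
print(f"range={value_range} iqr={iqr}")
```

Because the inputs are the rounded values printed in the report, the recomputed figures agree with the reported ones only up to rounding.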
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
259 | 11 | 2.4%
353 | 9 | 2.0%
289 | 8 | 1.8%
354 | 8 | 1.8%
366 | 7 | 1.6%
361 | 7 | 1.6%
313 | 6 | 1.3%
362 | 6 | 1.3%
363 | 5 | 1.1%
358 | 5 | 1.1%
Other values (189) | 377 | 84.0%

Lowest 10 values
Value | Count | Frequency (%)
100 | 4 | 0.9%
101 | 2 | 0.4%
102 | 1 | 0.2%
104 | 4 | 0.9%
105 | 2 | 0.4%
106 | 2 | 0.4%
107 | 2 | 0.4%
109 | 2 | 0.4%
111 | 3 | 0.7%
112 | 2 | 0.4%

Highest 10 values
Value | Count | Frequency (%)
377 | 1 | 0.2%
376 | 1 | 0.2%
375 | 1 | 0.2%
374 | 1 | 0.2%
373 | 1 | 0.2%
372 | 1 | 0.2%
371 | 5 | 1.1%
370 | 5 | 1.1%
369 | 1 | 0.2%
368 | 3 | 0.7%

키워드이름 (keyword name)
Categorical

HIGH CORRELATION 

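Distinct: 23
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 KiB

폐기물 저감/자원화 (waste reduction/resource recovery): 56
에너지 저감 (energy reduction): 56
자원 저감 (resource reduction): 50
기타 (other): 26
생산자/소비자 경제성 향상 (improved producer/consumer economics): 26
Other values (18): 235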

Length

Max length: 14
Median length: 11
Mean length: 7.3563474
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

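1st row: 구조 개선 (structural improvement)
2nd row: 에너지 저감 (energy reduction)
3rd row: 소재 개선 (material improvement)
4th row: 폐기물 저감/자원화 (waste reduction/resource recovery)
5th row: 공유(Sharing)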

Common Values

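Value | Count | Frequency (%)
폐기물 저감/자원화 (waste reduction/resource recovery) | 56 | 12.5%
에너지 저감 (energy reduction) | 56 | 12.5%
자원 저감 (resource reduction) | 50 | 11.1%
기타 (other) | 26 | 5.8%
생산자/소비자 경제성 향상 (improved producer/consumer economics) | 26 | 5.8%
공기질 개선 (air quality improvement) | 24 | 5.3%
신재생 에너지 이용 (use of new and renewable energy) | 23 | 5.1%
재활용 효율 향상 (improved recycling efficiency) | 23 | 5.1%
폐기물의 친환경적 처리 (eco-friendly waste treatment) | 20 | 4.5%
수질 개선 (water quality improvement) | 19 | 4.2%
Other values (13) | 126 | 28.1%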

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
저감 (reduction) | 117 | 11.8%
개선 (improvement) | 115 | 11.6%
에너지 (energy) | 79 | 8.0%
향상 (enhancement) | 68 | 6.8%
폐기물 (waste) | 56 | 5.6%
저감/자원화 (reduction/resource recovery) | 56 | 5.6%
자원 (resource) | 50 | 5.0%
효율 (efficiency) | 44 | 4.4%
생산자/소비자 (producer/consumer) | 26 | 2.6%
경제성 (economic viability) | 26 | 2.6%
Other values (26) | 356 | 35.9%
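
The token counts above appear consistent with splitting each 키워드이름 value on whitespace and summing the category counts per token. The sketch below illustrates that reading using only the ten categories listed under Common Values. The whitespace tokenization is an assumption, since the report does not state how tokens are formed, and tokens that also occur in the unlisted "Other values" categories come out lower than in the table.

```python
from collections import Counter

# The ten most frequent 키워드이름 values and their counts (from Common Values).
category_counts = {
    "폐기물 저감/자원화": 56,
    "에너지 저감": 56,
    "자원 저감": 50,
    "기타": 26,
    "생산자/소비자 경제성 향상": 26,
    "공기질 개선": 24,
    "신재생 에너지 이용": 23,
    "재활용 효율 향상": 23,
    "폐기물의 친환경적 처리": 20,
    "수질 개선": 19,
}

# Assumption: a token is a lowercased, whitespace-delimited piece of the value.
token_counts = Counter()
for category, count in category_counts.items():
    for token in category.lower().split():
        token_counts[token] += count

for token, count in token_counts.most_common(5):
    print(token, count)
```

For example, 에너지 occurs in 에너지 저감 (56) and 신재생 에너지 이용 (23), giving 79 as in the table, while 저감 reaches only 106 here because the remaining occurrences belong to categories that the report groups under "Other values".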

기타키워드내용 (other keyword content)
Text

MISSING 

Distinct: 17
Distinct (%): 65.4%
Missing: 423
Missing (%): 94.2%
Memory size: 3.6 KiB

Length

Max length: 9
Median length: 8
Mean length: 4.5769231
Min length: 2

Characters and Unicode

Total characters: 119
Distinct characters: 55
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
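
As an illustration of such character properties, the sketch below uses Python's standard unicodedata module to count general categories for the five sample values listed for this variable. It is not the report's own implementation, and detecting Hangul through the character name is a simplification chosen for this example.

```python
import unicodedata
from collections import Counter

# The five sample values shown for 기타키워드내용 in this report.
samples = ["친환경제품", "무단투기 개선", "개인 위생", "교육", "재난대응"]

category_counts = Counter()
hangul_chars = 0
for text in samples:
    # Normalize to NFC so each Hangul syllable is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        # General category, e.g. "Lo" (Other Letter) or "Zs" (Space Separator).
        category_counts[unicodedata.category(ch)] += 1
        # Simplified script check: Hangul code points have names starting with "HANGUL".
        if unicodedata.name(ch, "").startswith("HANGUL"):
            hangul_chars += 1

print(dict(category_counts), hangul_chars)
```

Hangul syllables fall in the general category "Lo" (Other Letter) and spaces in "Zs" (Space Separator), which matches the category names used in the tables of this section.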

Unique

Unique: 13
Unique (%): 50.0%

Sample

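1st row: 친환경제품 (eco-friendly product)
2nd row: 무단투기 개선 (reducing illegal dumping)
3rd row: 개인 위생 (personal hygiene)
4th row: 교육 (education)
5th row: 재난대응 (disaster response)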

Value | Count | Frequency (%)
zz | 4 | 10.3%
제품 (product) | 4 | 10.3%
친환경 (eco-friendly) | 4 | 10.3%
교육 (education) | 3 | 7.7%
재난 (disaster) | 2 | 5.1%
대응 (response) | 2 | 5.1%
위생 (hygiene) | 2 | 5.1%
개인 (personal) | 2 | 5.1%
생활 (daily life) | 1 | 2.6%
산불진화 (wildfire suppression) | 1 | 2.6%
Other values (14) | 14 | 35.9%

Most occurring characters

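Value | Count | Frequency (%)
(space) | 13 | 10.9%
z | 8 | 6.7%
(Hangul character not preserved) | 6 | 5.0%
(Hangul character not preserved) | 5 | 4.2%
(Hangul character not preserved) | 5 | 4.2%
(Hangul character not preserved) | 5 | 4.2%
(Hangul character not preserved) | 5 | 4.2%
(Hangul character not preserved) | 3 | 2.5%
(Hangul character not preserved) | 3 | 2.5%
(Hangul character not preserved) | 3 | 2.5%
Other values (45) | 63 | 52.9%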

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 95 | 79.8%
Space Separator | 13 | 10.9%
Lowercase Letter | 9 | 7.6%
Decimal Number | 1 | 0.8%
Uppercase Letter | 1 | 0.8%

Most frequent character per category

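Other Letter
Value | Count | Frequency (%)
(Hangul character not preserved) | 6 | 6.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
Other values (40) | 54 | 56.8%

Lowercase Letter
Value | Count | Frequency (%)
z | 8 | 88.9%
o | 1 | 11.1%

Space Separator
Value | Count | Frequency (%)
(space) | 13 | 100.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 100.0%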

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 95 | 79.8%
Common | 14 | 11.8%
Latin | 10 | 8.4%

Most frequent character per script

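Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 6 | 6.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
Other values (40) | 54 | 56.8%

Latin
Value | Count | Frequency (%)
z | 8 | 80.0%
C | 1 | 10.0%
o | 1 | 10.0%

Common
Value | Count | Frequency (%)
(space) | 13 | 92.9%
2 | 1 | 7.1%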

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 95 | 79.8%
ASCII | 24 | 20.2%

Most frequent character per block

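ASCII
Value | Count | Frequency (%)
(space) | 13 | 54.2%
z | 8 | 33.3%
2 | 1 | 4.2%
C | 1 | 4.2%
o | 1 | 4.2%

Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 6 | 6.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 5 | 5.3%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
(Hangul character not preserved) | 3 | 3.2%
Other values (40) | 54 | 56.8%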

키워드코드 (keyword code)
Categorical

HIGH CORRELATION 

Distinct: 23
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 KiB

KWD21: 56
KWD17: 56
KWD19: 50
KWD23: 26
KWD10: 26
Other values (18): 235

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: KWD03
2nd row: KWD17
3rd row: KWD12
4th row: KWD21
5th row: KWD02

Common Values

Value | Count | Frequency (%)
KWD21 | 56 | 12.5%
KWD17 | 56 | 12.5%
KWD19 | 50 | 11.1%
KWD23 | 26 | 5.8%
KWD10 | 26 | 5.8%
KWD01 | 24 | 5.3%
KWD15 | 23 | 5.1%
KWD20 | 23 | 5.1%
KWD22 | 20 | 4.5%
KWD14 | 19 | 4.2%
Other values (13) | 126 | 28.1%

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
kwd21 | 56 | 12.5%
kwd17 | 56 | 12.5%
kwd19 | 50 | 11.1%
kwd23 | 26 | 5.8%
kwd10 | 26 | 5.8%
kwd01 | 24 | 5.3%
kwd15 | 23 | 5.1%
kwd20 | 23 | 5.1%
kwd22 | 20 | 4.5%
kwd14 | 19 | 4.2%
Other values (13) | 126 | 28.1%

Interactions

[Figure: interactions plot]

Correlations

Variable | 키워드 번호 | 키워드이름 | 기타키워드내용 | 키워드코드
키워드 번호 | 1.000 | 0.374 | 0.945 | 0.374
키워드이름 | 0.374 | 1.000 | 0.000 | 1.000
기타키워드내용 | 0.945 | 0.000 | 1.000 | 0.000
키워드코드 | 0.374 | 1.000 | 0.000 | 1.000

Variable | 키워드코드 | 키워드이름
키워드코드 | 1.000 | 1.000
키워드이름 | 1.000 | 1.000

Variable | 키워드 번호 | 키워드이름 | 키워드코드
키워드 번호 | 1.000 | 0.145 | 0.145
키워드이름 | 0.145 | 1.000 | 1.000
키워드코드 | 0.145 | 1.000 | 1.000
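
A coefficient of 1.000 between 키워드이름 and 키워드코드 is what a one-to-one mapping between names and codes would produce. The sketch below illustrates this with Cramér's V on ten (name, code) pairs taken from the sample rows of this report. Cramér's V without bias correction is an assumption made for the illustration, since the report text does not name the measure behind each matrix, and `cramers_v` is a hypothetical helper written for this example.

```python
import math
from collections import Counter

# (키워드이름, 키워드코드) pairs from the first ten sample rows of the report.
pairs = [
    ("구조 개선", "KWD03"),
    ("에너지 저감", "KWD17"),
    ("소재 개선", "KWD12"),
    ("폐기물 저감/자원화", "KWD21"),
    ("공유(Sharing)", "KWD02"),
    ("수자원 저감", "KWD13"),
    ("에너지 저감", "KWD17"),
    ("자원 저감", "KWD19"),
    ("생산자/소비자 경제성 향상", "KWD10"),
    ("신재생 에너지 이용", "KWD15"),
]


def cramers_v(pairs):
    """Cramér's V (no bias correction) for a list of (a, b) category pairs."""
    n = len(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


v = cramers_v(pairs)
print(round(v, 3))
```

Since every name in these rows maps to exactly one code and vice versa, the statistic reaches its maximum, which is consistent with the High correlation alerts for the two columns.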

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

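First rows
Index | 키워드 번호 | 키워드이름 | 기타키워드내용 | 키워드코드
0 | 100 | 구조 개선 | <NA> | KWD03
1 | 100 | 에너지 저감 | <NA> | KWD17
2 | 101 | 소재 개선 | <NA> | KWD12
3 | 102 | 폐기물 저감/자원화 | <NA> | KWD21
4 | 104 | 공유(Sharing) | <NA> | KWD02
5 | 104 | 수자원 저감 | <NA> | KWD13
6 | 104 | 에너지 저감 | <NA> | KWD17
7 | 104 | 자원 저감 | <NA> | KWD19
8 | 105 | 생산자/소비자 경제성 향상 | <NA> | KWD10
9 | 105 | 신재생 에너지 이용 | <NA> | KWD15

Last rows
Index | 키워드 번호 | 키워드이름 | 기타키워드내용 | 키워드코드
439 | 363 | 생분해성 향상 | <NA> | KWD08
440 | 363 | 재활용 효율 향상 | <NA> | KWD20
441 | 363 | 폐기물 저감/자원화 | <NA> | KWD21
442 | 363 | 폐기물의 친환경적 처리 | <NA> | KWD22
443 | 364 | 소재 개선 | <NA> | KWD12
444 | 367 | 신재생 에너지 이용 | <NA> | KWD15
445 | 367 | 에너지 저감 | <NA> | KWD17
446 | 367 | 기타 | <NA> | KWD23
447 | 376 | 기타 | 다기능 구이기 (multifunction grill) | KWD23
448 | 377 | 기타 | 레인지후드보조장치 (range hood auxiliary device) | KWD23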

Duplicate rows

Most frequently occurring

Index | 키워드 번호 | 키워드이름 | 기타키워드내용 | 키워드코드 | # duplicates
0 | 353 | 생산자/소비자 경제성 향상 | <NA> | KWD10 | 2
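
The "# duplicates" value of 2 alongside a dataset-level count of 1 duplicate row suggests that the row occurs twice and one occurrence is counted as the duplicate. The sketch below shows that reading on a handful of rows; it is an interpretation of the figures, not a statement of how the report computes them. The row list is illustrative: the duplicated row is entered twice and two rows are borrowed from the sample table.

```python
from collections import Counter

# Rows as (키워드 번호, 키워드이름, 기타키워드내용, 키워드코드); None stands for <NA>.
rows = [
    (353, "생산자/소비자 경제성 향상", None, "KWD10"),
    (353, "생산자/소비자 경제성 향상", None, "KWD10"),
    (363, "생분해성 향상", None, "KWD08"),
    (363, "재활용 효율 향상", None, "KWD20"),
]

# Count how often each complete row occurs.
row_counts = Counter(rows)

# Rows that occur more than once, with their number of occurrences.
duplicates = {row: count for row, count in row_counts.items() if count > 1}

# Extra copies beyond the first occurrence of each duplicated row.
extra_rows = sum(count - 1 for count in duplicates.values())

print(duplicates)
print(extra_rows)
```

Here the duplicated row is reported with 2 occurrences and `extra_rows` is 1, mirroring the two figures in the report.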