Overview

Dataset statistics

Number of variables: 3
Number of observations: 153
Missing cells: 69
Missing cells (%): 15.0%
Duplicate rows: 1
Duplicate rows (%): 0.7%
Total size in memory: 3.7 KiB
Average record size in memory: 24.9 B
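
The two percentages above follow from the raw counts. A minimal sketch that recomputes them, assuming (the report does not say so explicitly) that the missing-cell ratio is taken over all cells, i.e. observations × variables:

```python
# Counts copied from the "Dataset statistics" block of the report.
n_variables = 3
n_observations = 153
missing_cells = 69
duplicate_rows = 1

# Assumption: the denominator for missing cells is rows x columns.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%):  {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Under that assumption both values round to the figures shown in the report.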

Variable types

Categorical: 2
Text: 1

Dataset

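Description: This dataset lists the item classification codes that underlie the question-and-answer set (as of 21.9.19) of the chatbot being built by the Korea Environmental Industry & Technology Institute.
Author: 한국환경산업기술원 (Korea Environmental Industry & Technology Institute)
URL: https://www.data.go.kr/data/15089192/fileData.do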

Alerts

Duplicates: Dataset has 1 (0.7%) duplicate rows
High correlation: 카테고리 대분류-필수 is highly overall correlated with 카테고리 중분류-필수
High correlation: 카테고리 중분류-필수 is highly overall correlated with 카테고리 대분류-필수
Missing: 카테고리 소분류-필수 has 69 (45.1%) missing values

Reproduction

Analysis started: 2023-12-12 09:04:35.838015
Analysis finished: 2023-12-12 09:04:36.178529
Duration: 0.34 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
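
The reported duration is the difference between the two timestamps. A small check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 09:04:35.838015")
finished = datetime.fromisoformat("2023-12-12 09:04:36.178529")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```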

Variables

카테고리 대분류-필수 (category, top-level classification; required)
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value | Count
<NA> | 69
환경정보공개 | 27
환경표지인증 | 25
가정용보일러인증 | 16
녹색구매·제품정보 | 16

Length

Max length: 9
Median length: 8
Mean length: 5.620915
Min length: 4
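
The minimum, maximum and mean lengths are consistent with the missing marker being measured as the literal four-character string `<NA>`. That is an inference from the numbers, not something the report states. A sketch that recomputes these three figures from the value counts listed for this variable:

```python
# Value counts copied from the report for this variable.
# Assumption: missing entries are measured as the string "<NA>".
value_counts = {
    "<NA>": 69,
    "환경정보공개": 27,
    "환경표지인증": 25,
    "가정용보일러인증": 16,
    "녹색구매\u00b7제품정보": 16,  # \u00b7 is the middle dot
}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())

min_len = min(len(value) for value in value_counts)
max_len = max(len(value) for value in value_counts)
mean_len = total_chars / n_rows

print(min_len, max_len, round(mean_len, 6))
```

The median is left out on purpose: this simple per-row calculation is not guaranteed to match how the tool derives its median figure.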

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가정용보일러인증
2nd row: 가정용보일러인증
3rd row: 가정용보일러인증
4th row: 가정용보일러인증
5th row: 가정용보일러인증

Common Values

Value | Count | Frequency (%)
<NA> | 69 | 45.1%
환경정보공개 | 27 | 17.6%
환경표지인증 | 25 | 16.3%
가정용보일러인증 | 16 | 10.5%
녹색구매·제품정보 | 16 | 10.5%
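
A frequency table of this shape can be derived from a raw column with `collections.Counter`. The sketch below rebuilds a stand-in column from the counts in the table (the row order of the real file is not known from the report) and recomputes the percentages:

```python
from collections import Counter

# Stand-in column rebuilt from the reported counts; order is arbitrary.
column = (
    ["<NA>"] * 69
    + ["환경정보공개"] * 27
    + ["환경표지인증"] * 25
    + ["가정용보일러인증"] * 16
    + ["녹색구매\u00b7제품정보"] * 16
)

counts = Counter(column)
total = len(column)

# most_common() sorts by count, highest first.
frequency_pct = {
    value: round(100 * count / total, 1)
    for value, count in counts.most_common()
}

for value, pct in frequency_pct.items():
    print(f"{value} | {counts[value]} | {pct}%")
```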

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values. It shows the same counts and percentages as the Common Values table, with <NA> labelled "na".]

카테고리 중분류-필수 (category, mid-level classification; required)
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value | Count
<NA> | 69
정보 등록방법 | 10
기타 | 10
인증신청 | 8
(기관용)구매실적 | 7
Other values (17) | 49

Length

Max length: 10
Median length: 4
Mean length: 4.9673203
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기타
2nd row: 기타
3rd row: 기타
4th row: 인증신청
5th row: 인증신청

Common Values

Value | Count | Frequency (%)
<NA> | 69 | 45.1%
정보 등록방법 | 10 | 6.5%
기타 | 10 | 6.5%
인증신청 | 8 | 5.2%
(기관용)구매실적 | 7 | 4.6%
인증정보 | 6 | 3.9%
인증결과 | 5 | 3.3%
서류 및 현장검증 | 4 | 2.6%
시상식 | 3 | 2.0%
계정관리 및 로그인 | 3 | 2.0%
Other values (12) | 28 | 18.3%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
na | 69 | 35.9%
(not shown) | 13 | 6.8%
등록방법 | 10 | 5.2%
기타 | 10 | 5.2%
정보 | 10 | 5.2%
인증신청 | 8 | 4.2%
기관용)구매실적 | 7 | 3.6%
인증정보 | 6 | 3.1%
인증결과 | 5 | 2.6%
서류 | 4 | 2.1%
Other values (19) | 50 | 26.0%

카테고리 소분류-필수 (category, sub-level classification; required)
Text

MISSING

Distinct: 74
Distinct (%): 88.1%
Missing: 69
Missing (%): 45.1%
Memory size: 1.3 KiB

Length

Max length: 15
Median length: 13
Mean length: 5.6428571
Min length: 2

Characters and Unicode

Total characters: 474
Distinct characters: 140
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
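
The general categories reported further down (Other Letter, Space Separator, and so on) correspond to the two-letter codes returned by Python's `unicodedata.category`. The sketch below classifies a few strings taken from elsewhere in this report. They are used for illustration only and are not the values of this variable; the mapping from code to long name is written out by hand.

```python
import unicodedata
from collections import Counter

# Long names for the five general categories that appear in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

# Illustrative strings only; \u00b7 is the middle dot.
samples = ["서류 및 현장검증", "(기관용)구매실적", "녹색구매\u00b7제품정보"]

# Count the general category of every character in every sample.
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in category_counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)} | {count}")
```

Hangul syllables fall under Other Letter, which matches that category dominating the report's character counts.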

Unique

Unique: 68
Unique (%): 81.0%

Sample

1st row: 보조금
2nd row: 컨설팅
3rd row: 계정분실
4th row: 신규신청
5th row: 변경신청

Words

Value | Count | Frequency (%)
(not shown) | 5 | 4.3%
신청방법 | 4 | 3.4%
가능여부 | 4 | 3.4%
사용량 | 3 | 2.6%
로그인 | 2 | 1.7%
실적제출 | 2 | 1.7%
조회 | 2 | 1.7%
시험성적서 | 2 | 1.7%
교육 | 2 | 1.7%
기타 | 2 | 1.7%
Other values (84) | 88 | 75.9%

Most occurring characters

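Value | Count | Frequency (%)
(space) | 32 | 6.8%
(not shown) | 19 | 4.0%
(not shown) | 17 | 3.6%
(not shown) | 12 | 2.5%
(not shown) | 11 | 2.3%
(not shown) | 11 | 2.3%
(not shown) | 10 | 2.1%
(not shown) | 10 | 2.1%
(not shown) | 10 | 2.1%
(not shown) | 9 | 1.9%
Other values (130) | 333 | 70.3%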

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 434 | 91.6%
Space Separator | 32 | 6.8%
Other Punctuation | 4 | 0.8%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%

Most frequent character per category

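Other Letter
Value | Count | Frequency (%)
(not shown) | 19 | 4.4%
(not shown) | 17 | 3.9%
(not shown) | 12 | 2.8%
(not shown) | 11 | 2.5%
(not shown) | 11 | 2.5%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 9 | 2.1%
(not shown) | 9 | 2.1%
Other values (125) | 316 | 72.8%

Other Punctuation
Value | Count | Frequency (%)
· | 3 | 75.0%
/ | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 32 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%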

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 434 | 91.6%
Common | 40 | 8.4%

Most frequent character per script

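Hangul
Value | Count | Frequency (%)
(not shown) | 19 | 4.4%
(not shown) | 17 | 3.9%
(not shown) | 12 | 2.8%
(not shown) | 11 | 2.5%
(not shown) | 11 | 2.5%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 9 | 2.1%
(not shown) | 9 | 2.1%
Other values (125) | 316 | 72.8%

Common
Value | Count | Frequency (%)
(space) | 32 | 80.0%
· | 3 | 7.5%
( | 2 | 5.0%
) | 2 | 5.0%
/ | 1 | 2.5%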

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 434 | 91.6%
ASCII | 37 | 7.8%
None | 3 | 0.6%

Most frequent character per block

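ASCII
Value | Count | Frequency (%)
(space) | 32 | 86.5%
( | 2 | 5.4%
) | 2 | 5.4%
/ | 1 | 2.7%

Hangul
Value | Count | Frequency (%)
(not shown) | 19 | 4.4%
(not shown) | 17 | 3.9%
(not shown) | 12 | 2.8%
(not shown) | 11 | 2.5%
(not shown) | 11 | 2.5%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 10 | 2.3%
(not shown) | 9 | 2.1%
(not shown) | 9 | 2.1%
Other values (125) | 316 | 72.8%

None
Value | Count | Frequency (%)
· | 3 | 100.0%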

Correlations

[Figure: correlation heatmap]

 | 카테고리 대분류-필수 | 카테고리 중분류-필수 | 카테고리 소분류-필수
카테고리 대분류-필수 | 1.000 | 0.955 | 0.918
카테고리 중분류-필수 | 0.955 | 1.000 | 0.000
카테고리 소분류-필수 | 0.918 | 0.000 | 1.000

[Figure: correlation heatmap]

 | 카테고리 중분류-필수 | 카테고리 대분류-필수
카테고리 중분류-필수 | 1.000 | 0.760
카테고리 대분류-필수 | 0.760 | 1.000

[Figure: correlation heatmap]

 | 카테고리 대분류-필수 | 카테고리 중분류-필수
카테고리 대분류-필수 | 1.000 | 0.760
카테고리 중분류-필수 | 0.760 | 1.000
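
The report text does not name the coefficient behind each matrix. A common measure of association between two categorical columns is Cramér's V, and the sketch below implements the plain, uncorrected form of it as an illustration only. The function name and the toy columns are hypothetical, and the result is not expected to reproduce the figures above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long label sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy columns, not taken from the dataset.
top_level = ["A", "A", "B", "B"]
fully_dependent = ["a1", "a1", "b1", "b1"]
independent = ["x", "y", "x", "y"]

print(cramers_v(top_level, fully_dependent))
print(cramers_v(top_level, independent))
```

A value near 1 means one column almost determines the other, which is what one would expect when a mid-level category belongs to exactly one top-level category.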

Missing values

[Figure: count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 카테고리 대분류-필수 | 카테고리 중분류-필수 | 카테고리 소분류-필수
0 | 가정용보일러인증 | 기타 | 보조금
1 | 가정용보일러인증 | 기타 | 컨설팅
2 | 가정용보일러인증 | 기타 | 계정분실
3 | 가정용보일러인증 | 인증신청 | 신규신청
4 | 가정용보일러인증 | 인증신청 | 변경신청
5 | 가정용보일러인증 | 인증신청 | 처리기간
6 | 가정용보일러인증 | 인증정보 | 국내판매
7 | 가정용보일러인증 | 인증정보 | 인증제품조회
8 | 가정용보일러인증 | 인증정보 | 인증종류
9 | 가정용보일러인증 | 인증정보 | 인증필수

 | 카테고리 대분류-필수 | 카테고리 중분류-필수 | 카테고리 소분류-필수
143 | <NA> | <NA> | <NA>
144 | <NA> | <NA> | <NA>
145 | <NA> | <NA> | <NA>
146 | <NA> | <NA> | <NA>
147 | <NA> | <NA> | <NA>
148 | <NA> | <NA> | <NA>
149 | <NA> | <NA> | <NA>
150 | <NA> | <NA> | <NA>
151 | <NA> | <NA> | <NA>
152 | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

 | 카테고리 대분류-필수 | 카테고리 중분류-필수 | 카테고리 소분류-필수 | # duplicates
0 | <NA> | <NA> | <NA> | 69
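
The table lists a single row pattern (all three columns missing) that occurs 69 times, which appears to be what the alert counts as 1 duplicate row. A minimal sketch of finding repeated row patterns with the standard library, using a few illustrative rows rather than the full dataset and representing missing values as `None`:

```python
from collections import Counter

NA = None  # stand-in for a missing value

# Illustrative rows: two from the sample table plus three all-missing rows.
rows = [
    ("가정용보일러인증", "기타", "보조금"),
    ("가정용보일러인증", "기타", "컨설팅"),
    (NA, NA, NA),
    (NA, NA, NA),
    (NA, NA, NA),
]

# Count identical rows, then keep only patterns seen more than once.
row_counts = Counter(rows)
duplicated = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicated.items():
    print(row, n)
```

Here one distinct pattern is reported together with the number of times it occurs, mirroring the layout of the table above.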