Dataset statistics
Number of variables | 9 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 7.2 KiB |
Average record size in memory | 73.3 B |
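The missing-cell and duplicate-row figures above are simple counts over the rows. The sketch below is illustrative only. It assumes the data is held as a list of tuples and treats `None` as the only missing marker (pandas would also count NaN). It also uses just the first five preview rows from the end of this report, so its numbers describe that subset rather than all 100 observations.

```python
from collections import Counter

# Hypothetical in-memory copy of the first five preview rows (a subset, not the
# full 100-row dataset). Column order follows the sample table.
SOURCE = "환경부고시 제2018-233호 별표1"
rows = [
    ("Formaldehyde", "50-00-0", "CMR", "Y", "N", "N", "N", "N", SOURCE),
    ("benzo[def]chrysene", "50-32-8", "CMR, PBT", "Y", "Y", "N", "N", "N", SOURCE),
    ("Mechlorethamine", "51-75-2", "CMR", "Y", "N", "N", "N", "N", SOURCE),
    ("Urethane", "51-79-6", "CMR", "Y", "N", "N", "N", "N", SOURCE),
    ("carbon tetrachloride", "56-23-5", "CMR, STOT", "Y", "N", "Y", "N", "N", SOURCE),
]


def missing_cells(rows):
    """Count cells holding None (the only missing marker assumed here)."""
    return sum(1 for row in rows for cell in row if cell is None)


def duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    return sum(count - 1 for count in Counter(rows).values())


n_cells = sum(len(row) for row in rows)
missing = missing_cells(rows)
print(f"Missing cells: {missing} ({missing / n_cells:.1%})")
print(f"Duplicate rows: {duplicate_rows(rows)}")
```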
Variable types
Text | 2 |
---|---|
Categorical | 2 |
Boolean | 5 |
Dataset
Description | Sample data (샘플 데이터) |
---|---|
Author | 그린에코스 |
URL | https://www.bigdata-environment.kr/user/data_market/detail.do?id=6b3c1fa0-c1c2-11ea-ab4e-d75b916d14e4 |
출처 has constant value "환경부고시 제2018-233호 별표1" (Ministry of Environment Notice No. 2018-233, Annex Table 1) | Constant |
플래그 is highly overall correlated with 발암성, 생식세포 변이원성, 생식독성물질 여부 and 4 other fields | High correlation |
EU고위험성우려물질 여부 is highly overall correlated with 플래그 | High correlation |
내분비계장애물질 여부 is highly overall correlated with 플래그 | High correlation |
잔류물질 여부 is highly overall correlated with 플래그 | High correlation |
특정표적장기독성물질 여부 is highly overall correlated with 플래그 | High correlation |
발암성, 생식세포 변이원성, 생식독성물질 여부 is highly overall correlated with 플래그 | High correlation |
잔류물질 여부 is highly imbalanced (75.8%) | Imbalance |
내분비계장애물질 여부 is highly imbalanced (63.4%) | Imbalance |
EU고위험성우려물질 여부 is highly imbalanced (75.8%) | Imbalance |
화학물질영문 has unique values | Unique |
CAS등록번호 has unique values | Unique |
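The imbalance percentages are consistent with one minus the normalised Shannon entropy of a column's value counts, 1 - H / ln(k) for k classes. The sketch below recomputes them from the True/False counts reported in the Boolean sections further down. The 0.5 alert threshold is an assumption, chosen because it separates the three flagged columns from the two that are not flagged.

```python
import math


def imbalance(counts):
    """Return 1 - H / ln(k): 0 for a uniform split, approaching 1 as one class dominates."""
    k = len(counts)
    n = sum(counts)
    if k < 2:
        return 0.0  # a single class has no imbalance to measure
    entropy = -sum(c / n * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# (False, True) counts copied from the Boolean variable sections of this report.
boolean_counts = {
    "잔류물질 여부": (96, 4),
    "EU고위험성우려물질 여부": (96, 4),
    "내분비계장애물질 여부": (93, 7),
    "특정표적장기독성물질 여부": (82, 18),
    "발암성, 생식세포 변이원성, 생식독성물질 여부": (12, 88),
}

THRESHOLD = 0.5  # assumed alert threshold

scores = {name: imbalance(counts) for name, counts in boolean_counts.items()}
for name, score in scores.items():
    label = "Imbalance" if score > THRESHOLD else "-"
    print(f"{name}: {score:.1%} {label}")
```

Run as-is, it prints 75.8%, 75.8%, 63.4%, 32.0% and 47.1%; only the first three exceed 0.5, which matches the three Imbalance alerts.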
Reproduction
Analysis started | 2023-12-10 11:42:40.765282 |
---|---|
Analysis finished | 2023-12-10 11:42:41.981238 |
Duration | 1.22 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
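The Duration row is the difference between the two timestamps above. A one-off check with the standard library, assuming the timestamps are naive local times in the format shown:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-12-10 11:42:40.765282")
finished = datetime.fromisoformat("2023-12-10 11:42:41.981238")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 1.22 seconds
```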
화학물질영문 (chemical name in English)
Text
UNIQUE
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 97 |
---|---|
Median length | 32 |
Mean length | 19.43 |
Min length | 7 |
Characters and Unicode
Total characters | 1943 |
---|---|
Distinct characters | 54 |
Distinct categories | 8 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 100 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | Formaldehyde |
---|---|
2nd row | benzo[def]chrysene |
3rd row | Mechlorethamine |
4th row | Urethane |
5th row | carbon tetrachloride |
Most occurring words
Value | Count | Frequency (%) |
phthalate | 5 | 3.6% |
lead | 4 | 2.9% |
acid | 3 | 2.1% |
carbon | 3 | 2.1% |
ether | 3 | 2.1% |
sulphate | 2 | 1.4% |
anhydride | 2 | 1.4% |
3,3'-dichlorobenzidine | 2 | 1.4% |
cobalt | 2 | 1.4% |
chloride | 2 | 1.4% |
Other values (110) | 112 | 80.0% |
Most occurring characters
Value | Count | Frequency (%) |
e | 223 | 11.5% |
o | 147 | 7.6% |
i | 141 | 7.3% |
l | 124 | 6.4% |
n | 116 | 6.0% |
h | 114 | 5.9% |
t | 112 | 5.8% |
a | 100 | 5.1% |
r | 90 | 4.6% |
d | 79 | 4.1% |
Other values (44) | 697 | 35.9% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1597 | 82.2% |
Decimal Number | 80 | 4.1% |
Dash Punctuation | 74 | 3.8% |
Uppercase Letter | 68 | 3.5% |
Other Punctuation | 49 | 2.5% |
Space Separator | 41 | 2.1% |
Close Punctuation | 17 | 0.9% |
Open Punctuation | 17 | 0.9% |
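The category labels in this table are the long names of Unicode general categories (Ll, Lu, Nd, Pd, Po, Ps, Pe, Zs), and a single ASCII block means every code point is below U+0080. A minimal sketch of the counting with Python's `unicodedata` module, applied to the five sample names only, so the totals differ from the full-column figures:

```python
import unicodedata
from collections import Counter

# Unicode general-category codes and their long names, as used in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Nd": "Decimal Number",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def all_ascii(values):
    """True if every character falls in the ASCII block (U+0000 to U+007F)."""
    return all(ord(ch) < 128 for value in values for ch in value)


# The five sample names shown above (a subset of the 100 values).
sample = ["Formaldehyde", "benzo[def]chrysene", "Mechlorethamine",
          "Urethane", "carbon tetrachloride"]
counts = category_counts(sample)
total = sum(counts.values())
for name, count in counts.most_common():
    print(f"{name} | {count} | {count / total:.1%}")
print("ASCII only:", all_ascii(sample))
```

For these five names it reports 67 lowercase letters, 3 uppercase letters and one each of open punctuation, close punctuation and space, 73 characters in total.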
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 223 | 14.0% |
o | 147 | 9.2% |
i | 141 | 8.8% |
l | 124 | 7.8% |
n | 116 | 7.3% |
h | 114 | 7.1% |
t | 112 | 7.0% |
a | 100 | 6.3% |
r | 90 | 5.6% |
d | 79 | 4.9% |
Other values (10) | 351 | 22.0% |
Uppercase Letter
Value | Count | Frequency (%) |
D | 15 | 22.1% |
C | 9 | 13.2% |
B | 6 | 8.8% |
A | 6 | 8.8% |
T | 6 | 8.8% |
N | 5 | 7.4% |
M | 4 | 5.9% |
E | 2 | 2.9% |
L | 2 | 2.9% |
S | 2 | 2.9% |
Other values (8) | 11 | 16.2% |
Decimal Number
Value | Count | Frequency (%) |
2 | 27 | 33.8% |
4 | 19 | 23.8% |
1 | 18 | 22.5% |
3 | 11 | 13.8% |
6 | 3 | 3.8% |
0 | 1 | 1.2% |
5 | 1 | 1.2% |
Other Punctuation
Value | Count | Frequency (%) |
, | 35 | 71.4% |
' | 13 | 26.5% |
; | 1 | 2.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 13 | 76.5% |
] | 4 | 23.5% |
Open Punctuation
Value | Count | Frequency (%) |
( | 13 | 76.5% |
[ | 4 | 23.5% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 74 | 100.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 41 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1665 | 85.7% |
Common | 278 | 14.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 223 | 13.4% |
o | 147 | 8.8% |
i | 141 | 8.5% |
l | 124 | 7.4% |
n | 116 | 7.0% |
h | 114 | 6.8% |
t | 112 | 6.7% |
a | 100 | 6.0% |
r | 90 | 5.4% |
d | 79 | 4.7% |
Other values (28) | 419 | 25.2% |
Common
Value | Count | Frequency (%) |
- | 74 | 26.6% |
(space) | 41 | 14.7% |
, | 35 | 12.6% |
2 | 27 | 9.7% |
4 | 19 | 6.8% |
1 | 18 | 6.5% |
' | 13 | 4.7% |
) | 13 | 4.7% |
( | 13 | 4.7% |
3 | 11 | 4.0% |
Other values (6) | 14 | 5.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1943 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 223 | 11.5% |
o | 147 | 7.6% |
i | 141 | 7.3% |
l | 124 | 6.4% |
n | 116 | 6.0% |
h | 114 | 5.9% |
t | 112 | 5.8% |
a | 100 | 5.1% |
r | 90 | 4.6% |
d | 79 | 4.1% |
Other values (44) | 697 | 35.9% |
CAS등록번호 (CAS Registry Number)
Text
UNIQUE
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
50-00-0 | 1 | 1.0% |
110-80-5 | 1 | 1.0% |
127-18-4 | 1 | 1.0% |
123-77-3 | 1 | 1.0% |
121-14-2 | 1 | 1.0% |
120-71-8 | 1 | 1.0% |
119-90-4 | 1 | 1.0% |
117-81-7 | 1 | 1.0% |
116-14-3 | 1 | 1.0% |
115-96-8 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
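CAS Registry Numbers have the layout `NNNNNNN-NN-N`, with two to seven digits in the first group and a final check digit. This is why the column contains only decimal digits and exactly two hyphens per value (200 hyphens over 100 rows). The check digit is the sum of the other digits, weighted 1, 2, 3, ... from the right, modulo 10. The validator below is a sketch and is not part of the profiling output; the regular expression is an assumption about which layouts to accept.

```python
import re


def is_valid_cas(cas):
    """Check the layout and check digit of a CAS Registry Number string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


# The ten values listed in the table above.
listed = ["50-00-0", "110-80-5", "127-18-4", "123-77-3", "121-14-2",
          "120-71-8", "119-90-4", "117-81-7", "116-14-3", "115-96-8"]
print(all(is_valid_cas(cas) for cas in listed))  # True
print(is_valid_cas("50-00-1"))                     # False: wrong check digit
```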
Most occurring characters
Value | Count | Frequency (%) |
- | 200 | 26.3% |
1 | 98 | 12.9% |
0 | 74 | 9.7% |
7 | 61 | 8.0% |
5 | 56 | 7.4% |
3 | 49 | 6.4% |
8 | 49 | 6.4% |
6 | 45 | 5.9% |
2 | 43 | 5.7% |
9 | 43 | 5.7% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 561 | 73.7% |
Dash Punctuation | 200 | 26.3% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 98 | 17.5% |
0 | 74 | 13.2% |
7 | 61 | 10.9% |
5 | 56 | 10.0% |
3 | 49 | 8.7% |
8 | 49 | 8.7% |
6 | 45 | 8.0% |
2 | 43 | 7.7% |
9 | 43 | 7.7% |
4 | 43 | 7.7% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 200 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 761 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
- | 200 | 26.3% |
1 | 98 | 12.9% |
0 | 74 | 9.7% |
7 | 61 | 8.0% |
5 | 56 | 7.4% |
3 | 49 | 6.4% |
8 | 49 | 6.4% |
6 | 45 | 5.9% |
2 | 43 | 5.7% |
9 | 43 | 5.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 761 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
- | 200 | 26.3% |
1 | 98 | 12.9% |
0 | 74 | 9.7% |
7 | 61 | 8.0% |
5 | 56 | 7.4% |
3 | 49 | 6.4% |
8 | 49 | 6.4% |
6 | 45 | 5.9% |
2 | 43 | 5.7% |
9 | 43 | 5.7% |
플래그 (flag: hazard-class codes; 기타 = other, 호흡기과민성 = respiratory sensitizer)
Categorical
HIGH CORRELATION
Distinct | 9 |
---|---|
Distinct (%) | 9.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
CMR | 68 |
---|---|
CMR, STOT | 12 |
STOT | 5 |
CMR, EDCs | 5 |
기타(EU SVHC 호흡기과민성) | 4 |
Other values (4) | 6 |
Length
Max length | 18 |
---|---|
Median length | 4 |
Mean length | 5.86 |
Min length | 4 |
Unique
Unique | 2 |
---|---|
Unique (%) | 2.0% |
Sample
1st row | CMR |
---|---|
2nd row | CMR, PBT |
3rd row | CMR |
4th row | CMR |
5th row | CMR, STOT |
Common Values
Value | Count | Frequency (%) |
CMR | 68 | 68.0% |
CMR, STOT | 12 | 12.0% |
STOT | 5 | 5.0% |
CMR, EDCs | 5 | 5.0% |
기타(EU SVHC 호흡기과민성) | 4 | 4.0% |
CMR, PBT | 2 | 2.0% |
EDCs | 2 | 2.0% |
CMR, PBT, STOT | 1 | 1.0% |
PBT | 1 | 1.0% |
Most occurring words
Value | Count | Frequency (%) |
cmr | 88 | 68.2% |
stot | 18 | 14.0% |
edcs | 7 | 5.4% |
기타(eu | 4 | 3.1% |
svhc | 4 | 3.1% |
호흡기과민성 | 4 | 3.1% |
pbt | 4 | 3.1% |
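The word table above is consistent with the following procedure: lower-case each 플래그 value, split on whitespace, strip punctuation from the ends of each token, and weight each token by its value count. That is why "기타(eu" keeps its inner parenthesis while "호흡기과민성" loses its trailing one. The sketch below approximates that procedure and is not ydata-profiling's own code. It also recovers the Distinct and Unique figures.

```python
import string
from collections import Counter

# 플래그 value counts as listed under Common Values.
value_counts = {
    "CMR": 68, "CMR, STOT": 12, "STOT": 5, "CMR, EDCs": 5,
    "기타(EU SVHC 호흡기과민성)": 4, "CMR, PBT": 2, "EDCs": 2,
    "CMR, PBT, STOT": 1, "PBT": 1,
}


def word_counts(value_counts):
    """Lower-case, split on whitespace, strip edge punctuation, weight by count."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            # "cmr," -> "cmr" and "호흡기과민성)" -> "호흡기과민성"; inner "(" is kept.
            token = token.strip(string.punctuation)
            if token:
                words[token] += count
    return words


words = word_counts(value_counts)
total = sum(words.values())
for word, count in words.most_common():
    print(f"{word} | {count} | {count / total:.1%}")

# Distinct and Unique (values seen exactly once) from the same counts.
n_rows = sum(value_counts.values())
distinct = len(value_counts)
unique = sum(1 for count in value_counts.values() if count == 1)
print(distinct, f"{distinct / n_rows:.1%}", unique)
```

It yields 129 words in total, with cmr at 88 (68.2%) and stot at 18 (14.0%), and prints 9, 9.0% and 2 for Distinct, Distinct (%) and Unique.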
발암성, 생식세포 변이원성, 생식독성물질 여부 (carcinogen, germ cell mutagen or reproductive toxicant: CMR)
Boolean
HIGH CORRELATION
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 232.0 B |
True | 88 |
---|---|
False | 12 |
Value | Count | Frequency (%) |
True | 88 | 88.0% |
False | 12 | 12.0% |
잔류물질 여부 (persistent substance: PBT)
Boolean
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 232.0 B |
False | 96 |
---|---|
True | 4 |
Value | Count | Frequency (%) |
False | 96 | 96.0% |
True | 4 | 4.0% |
특정표적장기독성물질 여부 (specific target organ toxicant: STOT)
Boolean
HIGH CORRELATION
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 232.0 B |
False | 82 |
---|---|
True | 18 |
Value | Count | Frequency (%) |
False | 82 | 82.0% |
True | 18 | 18.0% |
내분비계장애물질 여부 (endocrine disruptor: EDCs)
Boolean
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 232.0 B |
False | 93 |
---|---|
True | 7 |
Value | Count | Frequency (%) |
False | 93 | 93.0% |
True | 7 | 7.0% |
EU고위험성우려물질 여부 (EU substance of very high concern: SVHC)
Boolean
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 232.0 B |
False | 96 |
---|---|
True | 4 |
Value | Count | Frequency (%) |
False | 96 | 96.0% |
True | 4 | 4.0% |
출처 (source)
Categorical
CONSTANT
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
환경부고시 제2018-233호 별표1 | 100 |
---|---|
Length
Max length | 20 |
---|---|
Median length | 20 |
Mean length | 20 |
Min length | 20 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 환경부고시 제2018-233호 별표1 |
---|---|
2nd row | 환경부고시 제2018-233호 별표1 |
3rd row | 환경부고시 제2018-233호 별표1 |
4th row | 환경부고시 제2018-233호 별표1 |
5th row | 환경부고시 제2018-233호 별표1 |
Common Values
Value | Count | Frequency (%) |
환경부고시 제2018-233호 별표1 | 100 | 100.0% |
Most occurring words
Value | Count | Frequency (%) |
환경부고시 | 100 | 33.3% |
제2018-233호 | 100 | 33.3% |
별표1 | 100 | 33.3% |
Correlations
| | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 |
---|---|---|---|---|---|---|---|---|
화학물질영문 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
CAS등록번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
플래그 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
발암성, 생식세포 변이원성, 생식독성물질 여부 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.248 | 0.000 | 0.668 |
잔류물질 여부 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 |
특정표적장기독성물질 여부 | 1.000 | 1.000 | 1.000 | 0.248 | 0.000 | 1.000 | 0.000 | 0.000 |
내분비계장애물질 여부 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
EU고위험성우려물질 여부 | 1.000 | 1.000 | 1.000 | 0.668 | 0.000 | 0.000 | 0.000 | 1.000 |
| | 플래그 | EU고위험성우려물질 여부 | 내분비계장애물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 발암성, 생식세포 변이원성, 생식독성물질 여부 |
---|---|---|---|---|---|---|
플래그 | 1.000 | 0.964 | 0.964 | 0.964 | 0.964 | 0.964 |
EU고위험성우려물질 여부 | 0.964 | 1.000 | 0.000 | 0.000 | 0.000 | 0.466 |
내분비계장애물질 여부 | 0.964 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 |
잔류물질 여부 | 0.964 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 |
특정표적장기독성물질 여부 | 0.964 | 0.000 | 0.000 | 0.000 | 1.000 | 0.159 |
발암성, 생식세포 변이원성, 생식독성물질 여부 | 0.964 | 0.466 | 0.000 | 0.000 | 0.159 | 1.000 |
| | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 |
---|---|---|---|---|---|---|
플래그 | 1.000 | 0.964 | 0.964 | 0.964 | 0.964 | 0.964 |
발암성, 생식세포 변이원성, 생식독성물질 여부 | 0.964 | 1.000 | 0.000 | 0.159 | 0.000 | 0.466 |
잔류물질 여부 | 0.964 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 |
특정표적장기독성물질 여부 | 0.964 | 0.159 | 0.000 | 1.000 | 0.000 | 0.000 |
내분비계장애물질 여부 | 0.964 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
EU고위험성우려물질 여부 | 0.964 | 0.466 | 0.000 | 0.000 | 0.000 | 1.000 |
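The 0.466, 0.159 and 0.964 entries in the two 6 × 6 tables can be reproduced as a bias-corrected Cramér's V, with Yates' continuity correction applied to 2 × 2 tables. The joint counts used below do not appear in the report directly; they are derived from the 플래그 value counts (for example, none of the four EU SVHC rows is also CMR). This is a stdlib sketch consistent with those numbers, not ydata-profiling's implementation, and it does not attempt to reproduce the first (8 × 8) table.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Applies Yates' continuity correction when the table has one degree of
    freedom (2 x 2), then Bergsma's small-sample bias correction.
    Assumes every row and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(k)]
    yates = (r - 1) * (k - 1) == 1

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if yates:
                # Move each observed count up to 0.5 towards its expected value.
                diff = expected - observed
                observed = observed + math.copysign(min(0.5, abs(diff)), diff)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Joint counts derived from the 플래그 value counts (rows: CMR True/False).
cmr_vs_eu_svhc = [[0, 88], [4, 8]]  # columns: EU SVHC True/False
cmr_vs_stot = [[13, 75], [5, 7]]    # columns: STOT True/False
# Rows: the nine 플래그 values in Common Values order; columns: CMR True/False.
flag_vs_cmr = [[68, 0], [12, 0], [0, 5], [5, 0], [0, 4], [2, 0], [0, 2], [1, 0], [0, 1]]

for name, table in [("CMR vs EU SVHC", cmr_vs_eu_svhc),
                    ("CMR vs STOT", cmr_vs_stot),
                    ("플래그 vs CMR", flag_vs_cmr)]:
    print(f"{name}: {cramers_v_corrected(table):.3f}")
```

Because every Boolean column is a deterministic function of 플래그, the uncorrected V for each of those pairs is exactly 1; the bias correction for a 9 × 2 table with 100 rows is what lowers it to 0.964.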
First rows
| | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 | 출처 |
---|---|---|---|---|---|---|---|---|---|
0 | Formaldehyde | 50-00-0 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
1 | benzo[def]chrysene | 50-32-8 | CMR, PBT | Y | Y | N | N | N | 환경부고시 제2018-233호 별표1 |
2 | Mechlorethamine | 51-75-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
3 | Urethane | 51-79-6 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
4 | carbon tetrachloride | 56-23-5 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
5 | diphenoxarsin-10-yl oxide | 58-36-6 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
6 | Aniline | 62-53-3 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
7 | diethyl sulphate | 64-67-5 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
8 | Chloroform | 67-66-3 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
9 | N,N-Dimethylformamide | 68-12-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
Last rows
| | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 | 출처 |
---|---|---|---|---|---|---|---|---|---|
90 | Carbon monoxide | 630-08-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
91 | dibutyltin dichloride | 683-18-1 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
92 | 1-Methyl-2-pyrrolidone | 872-50-4 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
93 | lead distearate | 1072-35-1 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
94 | 1,3-Propanesultone | 1120-71-4 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
95 | bis(pentabromophenyl) ether | 1163-19-5 | PBT | N | Y | N | N | N | 환경부고시 제2018-233호 별표1 |
96 | Gallium arsenide | 1303-00-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
97 | Arsenic sulfides | 1303-33-9 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
98 | diboron trioxide | 1303-86-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1 |
99 | Cadmium oxide | 1306-19-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1 |
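The preview rows, together with the matching True counts in the Boolean sections (88, 4, 18, 7 and 4), suggest that 플래그 is a concatenation of the five Y/N indicator columns. That would explain why it is highly correlated with all of them. The token-to-column mapping below is inferred from the data and is not documented in the source, so treat it as a hypothesis to check against the full dataset.

```python
# Inferred (hypothetical) mapping from 플래그 tokens to the five Y/N columns,
# in the column order used by the sample tables.
FLAG_TOKENS = ["CMR", "PBT", "STOT", "EDCs", "SVHC"]


def decode_flag(flag):
    """Return the five Y/N indicators implied by a 플래그 value, e.g. "YNYNN"."""
    # Treat commas and parentheses as separators, e.g. "기타(EU SVHC 호흡기과민성)".
    for sep in ",()":
        flag = flag.replace(sep, " ")
    tokens = set(flag.split())
    return "".join("Y" if token in tokens else "N" for token in FLAG_TOKENS)


# (플래그, indicators) pairs copied from the 20 preview rows.
preview = [
    ("CMR", "YNNNN"), ("CMR, PBT", "YYNNN"), ("CMR", "YNNNN"), ("CMR", "YNNNN"),
    ("CMR, STOT", "YNYNN"), ("CMR", "YNNNN"), ("CMR, STOT", "YNYNN"), ("CMR", "YNNNN"),
    ("CMR, STOT", "YNYNN"), ("CMR", "YNNNN"), ("CMR, STOT", "YNYNN"), ("CMR, STOT", "YNYNN"),
    ("CMR", "YNNNN"), ("CMR", "YNNNN"), ("CMR", "YNNNN"), ("PBT", "NYNNN"),
    ("CMR, STOT", "YNYNN"), ("CMR", "YNNNN"), ("CMR", "YNNNN"), ("CMR, STOT", "YNYNN"),
]
mismatches = [flag for flag, expected in preview if decode_flag(flag) != expected]
print("preview mismatches:", mismatches)

# Decoding the full 플래그 value counts should reproduce the Boolean True counts.
value_counts = {"CMR": 68, "CMR, STOT": 12, "STOT": 5, "CMR, EDCs": 5,
                "기타(EU SVHC 호흡기과민성)": 4, "CMR, PBT": 2, "EDCs": 2,
                "CMR, PBT, STOT": 1, "PBT": 1}
true_counts = dict.fromkeys(FLAG_TOKENS, 0)
for flag, count in value_counts.items():
    for token, indicator in zip(FLAG_TOKENS, decode_flag(flag)):
        if indicator == "Y":
            true_counts[token] += count
print(true_counts)  # {'CMR': 88, 'PBT': 4, 'STOT': 18, 'EDCs': 7, 'SVHC': 4}
```

Both checks pass on the data shown here: the preview has no mismatches, and decoding the nine 플래그 values reproduces the Boolean True counts.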