Overview

Dataset statistics

Number of variables: 9
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.2 KiB
Average record size in memory: 73.3 B

Variable types

Text: 2
Categorical: 2
Boolean: 5

Alerts

출처 has constant value "환경부고시 제2018-233호 별표1" [Constant]
플래그 is highly overall correlated with 발암성, 생식세포 변이원성, 생식독성물질 여부 and 4 other fields [High correlation]
EU고위험성우려물질 여부 is highly overall correlated with 플래그 [High correlation]
내분비계장애물질 여부 is highly overall correlated with 플래그 [High correlation]
잔류물질 여부 is highly overall correlated with 플래그 [High correlation]
특정표적장기독성물질 여부 is highly overall correlated with 플래그 [High correlation]
발암성, 생식세포 변이원성, 생식독성물질 여부 is highly overall correlated with 플래그 [High correlation]
잔류물질 여부 is highly imbalanced (75.8%) [Imbalance]
내분비계장애물질 여부 is highly imbalanced (63.4%) [Imbalance]
EU고위험성우려물질 여부 is highly imbalanced (75.8%) [Imbalance]
화학물질영문 has unique values [Unique]
CAS등록번호 has unique values [Unique]
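
The imbalance percentages in the alerts are consistent with one minus the normalised Shannon entropy of the class counts. The sketch below assumes that definition (the report itself does not state it) and plugs in the False/True counts listed in the Boolean variable sections; `imbalance_score` is a hypothetical helper name, not part of any library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class counts and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# False/True counts as listed in the variable sections of this report.
score_96_4 = imbalance_score([96, 4])  # 잔류물질 여부 and EU고위험성우려물질 여부
score_93_7 = imbalance_score([93, 7])  # 내분비계장애물질 여부

print(f"{score_96_4:.1%}")  # 75.8%
print(f"{score_93_7:.1%}")  # 63.4%
```

Under the same assumed definition, a perfectly balanced variable scores 0% and a constant one scores 100%.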

Reproduction

Analysis started: 2023-12-10 11:42:40.765282
Analysis finished: 2023-12-10 11:42:41.981238
Duration: 1.22 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

화학물질영문 (chemical name, English)
Text

UNIQUE

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 97
Median length: 32
Mean length: 19.43
Min length: 7

Characters and Unicode

Total characters: 1943
Distinct characters: 54
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
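
As a minimal illustration of those character properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The two input strings are taken from the Sample rows of this variable; `category_counts` is a hypothetical helper name. The two-letter codes correspond to the category names used in this report: `Ll` is Lowercase Letter, `Lu` Uppercase Letter, `Nd` Decimal Number, `Pd` Dash Punctuation, `Po` Other Punctuation, `Ps` Open Punctuation, `Pe` Close Punctuation, and `Zs` Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Two values from the Sample rows of this variable.
bracketed = category_counts("benzo[def]chrysene")
print(bracketed["Ll"], bracketed["Ps"], bracketed["Pe"])  # 16 1 1

spaced = category_counts("carbon tetrachloride")
print(spaced["Zs"])  # 1
```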

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: Formaldehyde
2nd row: benzo[def]chrysene
3rd row: Mechlorethamine
4th row: Urethane
5th row: carbon tetrachloride

Value | Count | Frequency (%)
phthalate | 5 | 3.6%
lead | 4 | 2.9%
acid | 3 | 2.1%
carbon | 3 | 2.1%
ether | 3 | 2.1%
sulphate | 2 | 1.4%
anhydride | 2 | 1.4%
3,3'-dichlorobenzidine | 2 | 1.4%
cobalt | 2 | 1.4%
chloride | 2 | 1.4%
Other values (110) | 112 | 80.0%

Most occurring characters

Value | Count | Frequency (%)
e | 223 | 11.5%
o | 147 | 7.6%
i | 141 | 7.3%
l | 124 | 6.4%
n | 116 | 6.0%
h | 114 | 5.9%
t | 112 | 5.8%
a | 100 | 5.1%
r | 90 | 4.6%
d | 79 | 4.1%
Other values (44) | 697 | 35.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1597 | 82.2%
Decimal Number | 80 | 4.1%
Dash Punctuation | 74 | 3.8%
Uppercase Letter | 68 | 3.5%
Other Punctuation | 49 | 2.5%
Space Separator | 41 | 2.1%
Close Punctuation | 17 | 0.9%
Open Punctuation | 17 | 0.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 223 | 14.0%
o | 147 | 9.2%
i | 141 | 8.8%
l | 124 | 7.8%
n | 116 | 7.3%
h | 114 | 7.1%
t | 112 | 7.0%
a | 100 | 6.3%
r | 90 | 5.6%
d | 79 | 4.9%
Other values (10) | 351 | 22.0%

Uppercase Letter
Value | Count | Frequency (%)
D | 15 | 22.1%
C | 9 | 13.2%
B | 6 | 8.8%
A | 6 | 8.8%
T | 6 | 8.8%
N | 5 | 7.4%
M | 4 | 5.9%
E | 2 | 2.9%
L | 2 | 2.9%
S | 2 | 2.9%
Other values (8) | 11 | 16.2%

Decimal Number
Value | Count | Frequency (%)
2 | 27 | 33.8%
4 | 19 | 23.8%
1 | 18 | 22.5%
3 | 11 | 13.8%
6 | 3 | 3.8%
0 | 1 | 1.2%
5 | 1 | 1.2%

Other Punctuation
Value | Count | Frequency (%)
, | 35 | 71.4%
' | 13 | 26.5%
; | 1 | 2.0%

Close Punctuation
Value | Count | Frequency (%)
) | 13 | 76.5%
] | 4 | 23.5%

Open Punctuation
Value | Count | Frequency (%)
( | 13 | 76.5%
[ | 4 | 23.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 74 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 41 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1665 | 85.7%
Common | 278 | 14.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 223 | 13.4%
o | 147 | 8.8%
i | 141 | 8.5%
l | 124 | 7.4%
n | 116 | 7.0%
h | 114 | 6.8%
t | 112 | 6.7%
a | 100 | 6.0%
r | 90 | 5.4%
d | 79 | 4.7%
Other values (28) | 419 | 25.2%

Common
Value | Count | Frequency (%)
- | 74 | 26.6%
(space) | 41 | 14.7%
, | 35 | 12.6%
2 | 27 | 9.7%
4 | 19 | 6.8%
1 | 18 | 6.5%
' | 13 | 4.7%
) | 13 | 4.7%
( | 13 | 4.7%
3 | 11 | 4.0%
Other values (6) | 14 | 5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1943 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 223 | 11.5%
o | 147 | 7.6%
i | 141 | 7.3%
l | 124 | 6.4%
n | 116 | 6.0%
h | 114 | 5.9%
t | 112 | 5.8%
a | 100 | 5.1%
r | 90 | 4.6%
d | 79 | 4.1%
Other values (44) | 697 | 35.9%

CAS등록번호 (CAS registry number)
Text

UNIQUE

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 9
Median length: 8
Mean length: 7.61
Min length: 7

Characters and Unicode

Total characters: 761
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: 50-00-0
2nd row: 50-32-8
3rd row: 51-75-2
4th row: 51-79-6
5th row: 56-23-5
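
CAS Registry Numbers end in a check digit, so values in this column can be sanity-checked without an external lookup. The sketch below is one way to do that, not something the report itself performs: it assumes the usual layout of two to seven digits, two digits, and one check digit, and `is_valid_cas` is a hypothetical helper name.

```python
import re

# Layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` has the CAS layout and a correct check digit."""
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... from right to left, excluding the check digit.
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


# The five sample rows listed for this variable.
samples = ["50-00-0", "50-32-8", "51-75-2", "51-79-6", "56-23-5"]
print(all(is_valid_cas(s) for s in samples))  # True
print(is_valid_cas("50-00-1"))                # False (wrong check digit)
```
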

Value | Count | Frequency (%)
50-00-0 | 1 | 1.0%
110-80-5 | 1 | 1.0%
127-18-4 | 1 | 1.0%
123-77-3 | 1 | 1.0%
121-14-2 | 1 | 1.0%
120-71-8 | 1 | 1.0%
119-90-4 | 1 | 1.0%
117-81-7 | 1 | 1.0%
116-14-3 | 1 | 1.0%
115-96-8 | 1 | 1.0%
Other values (90) | 90 | 90.0%

Most occurring characters

Value | Count | Frequency (%)
- | 200 | 26.3%
1 | 98 | 12.9%
0 | 74 | 9.7%
7 | 61 | 8.0%
5 | 56 | 7.4%
3 | 49 | 6.4%
8 | 49 | 6.4%
6 | 45 | 5.9%
2 | 43 | 5.7%
9 | 43 | 5.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 561 | 73.7%
Dash Punctuation | 200 | 26.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 98 | 17.5%
0 | 74 | 13.2%
7 | 61 | 10.9%
5 | 56 | 10.0%
3 | 49 | 8.7%
8 | 49 | 8.7%
6 | 45 | 8.0%
2 | 43 | 7.7%
9 | 43 | 7.7%
4 | 43 | 7.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 200 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 200 | 26.3%
1 | 98 | 12.9%
0 | 74 | 9.7%
7 | 61 | 8.0%
5 | 56 | 7.4%
3 | 49 | 6.4%
8 | 49 | 6.4%
6 | 45 | 5.9%
2 | 43 | 5.7%
9 | 43 | 5.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 200 | 26.3%
1 | 98 | 12.9%
0 | 74 | 9.7%
7 | 61 | 8.0%
5 | 56 | 7.4%
3 | 49 | 6.4%
8 | 49 | 6.4%
6 | 45 | 5.9%
2 | 43 | 5.7%
9 | 43 | 5.7%

플래그 (flag)
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
CMR | 68
CMR, STOT | 12
STOT | 5
CMR, EDCs | 5
기타(EU SVHC 호흡기과민성) | 4
Other values (4) | 6

Length

Max length: 18
Median length: 4
Mean length: 5.86
Min length: 4

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: CMR
2nd row: CMR, PBT
3rd row: CMR
4th row: CMR
5th row: CMR, STOT

Common Values

Value | Count | Frequency (%)
CMR | 68 | 68.0%
CMR, STOT | 12 | 12.0%
STOT | 5 | 5.0%
CMR, EDCs | 5 | 5.0%
기타(EU SVHC 호흡기과민성) (Other: EU SVHC, respiratory sensitization) | 4 | 4.0%
CMR, PBT | 2 | 2.0%
EDCs | 2 | 2.0%
CMR, PBT, STOT | 1 | 1.0%
PBT | 1 | 1.0%
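
The high-correlation alerts between 플래그 and the Boolean variables are easier to read once the combined flag values are split into individual labels. The sketch below sums the counts from the Common Values table per label; the resulting totals match the True counts reported for the Boolean variables (88, 18, 7, 4). That each Boolean column corresponds to exactly one label is an assumption drawn from those matching counts, and `label_totals` is a hypothetical helper name.

```python
from collections import Counter

# Value counts copied from the Common Values table of 플래그.
flag_counts = {
    "CMR": 68,
    "CMR, STOT": 12,
    "STOT": 5,
    "CMR, EDCs": 5,
    "기타(EU SVHC 호흡기과민성)": 4,
    "CMR, PBT": 2,
    "EDCs": 2,
    "CMR, PBT, STOT": 1,
    "PBT": 1,
}


def label_totals(counts):
    """Split each comma-separated flag into labels and sum the row counts per label."""
    totals = Counter()
    for flag, n_rows in counts.items():
        for label in flag.split(", "):
            totals[label] += n_rows
    return totals


totals = label_totals(flag_counts)
print(sum(flag_counts.values()))  # 100 rows in total
print(totals["CMR"], totals["STOT"], totals["EDCs"], totals["PBT"])  # 88 18 7 4
```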

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
cmr | 88 | 68.2%
stot | 18 | 14.0%
edcs | 7 | 5.4%
기타(eu | 4 | 3.1%
svhc | 4 | 3.1%
호흡기과민성 | 4 | 3.1%
pbt | 4 | 3.1%

발암성, 생식세포 변이원성, 생식독성물질 여부 (carcinogenic, germ cell mutagenic, or reproductive toxic substance: yes/no)
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value | Count | Frequency (%)
True | 88 | 88.0%
False | 12 | 12.0%

잔류물질 여부 (residual (persistent) substance: yes/no)
Boolean

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value | Count | Frequency (%)
False | 96 | 96.0%
True | 4 | 4.0%

특정표적장기독성물질 여부 (specific target organ toxicity substance: yes/no)
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value | Count | Frequency (%)
False | 82 | 82.0%
True | 18 | 18.0%

내분비계장애물질 여부 (endocrine-disrupting substance: yes/no)
Boolean

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value | Count | Frequency (%)
False | 93 | 93.0%
True | 7 | 7.0%

EU고위험성우려물질 여부 (EU substance of very high concern: yes/no)
Boolean

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value | Count | Frequency (%)
False | 96 | 96.0%
True | 4 | 4.0%

출처 (source)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
환경부고시 제2018-233호 별표1 | 100

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 환경부고시 제2018-233호 별표1
2nd row: 환경부고시 제2018-233호 별표1
3rd row: 환경부고시 제2018-233호 별표1
4th row: 환경부고시 제2018-233호 별표1
5th row: 환경부고시 제2018-233호 별표1

Common Values

Value | Count | Frequency (%)
환경부고시 제2018-233호 별표1 (Ministry of Environment Notice No. 2018-233, Appendix Table 1) | 100 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
환경부고시 | 100 | 33.3%
제2018-233호 | 100 | 33.3%
별표1 | 100 | 33.3%

Correlations

Variable | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부
화학물질영문 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
CAS등록번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
플래그 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
발암성, 생식세포 변이원성, 생식독성물질 여부 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.248 | 0.000 | 0.668
잔류물질 여부 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
특정표적장기독성물질 여부 | 1.000 | 1.000 | 1.000 | 0.248 | 0.000 | 1.000 | 0.000 | 0.000
내분비계장애물질 여부 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
EU고위험성우려물질 여부 | 1.000 | 1.000 | 1.000 | 0.668 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 플래그 | EU고위험성우려물질 여부 | 내분비계장애물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 발암성, 생식세포 변이원성, 생식독성물질 여부
플래그 | 1.000 | 0.964 | 0.964 | 0.964 | 0.964 | 0.964
EU고위험성우려물질 여부 | 0.964 | 1.000 | 0.000 | 0.000 | 0.000 | 0.466
내분비계장애물질 여부 | 0.964 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
잔류물질 여부 | 0.964 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
특정표적장기독성물질 여부 | 0.964 | 0.000 | 0.000 | 0.000 | 1.000 | 0.159
발암성, 생식세포 변이원성, 생식독성물질 여부 | 0.964 | 0.466 | 0.000 | 0.000 | 0.159 | 1.000

Variable | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부
플래그 | 1.000 | 0.964 | 0.964 | 0.964 | 0.964 | 0.964
발암성, 생식세포 변이원성, 생식독성물질 여부 | 0.964 | 1.000 | 0.000 | 0.159 | 0.000 | 0.466
잔류물질 여부 | 0.964 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
특정표적장기독성물질 여부 | 0.964 | 0.159 | 0.000 | 1.000 | 0.000 | 0.000
내분비계장애물질 여부 | 0.964 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
EU고위험성우려물질 여부 | 0.964 | 0.466 | 0.000 | 0.000 | 0.000 | 1.000
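
The values 0.466 and 0.159 in the two six-variable matrices are consistent with a bias-corrected Cramér's V computed with Yates' continuity correction. That choice of statistic is an inference from the numbers, not something this report states, and the sketch below is limited to 2x2 tables. The contingency counts are also assumptions: they are derived from the 플래그 value counts by treating each label in a flag as the True value of the matching Boolean column. `cramers_v_corrected` is a hypothetical helper name.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a 2x2 table of counts,
    with Yates' continuity correction applied to the chi-squared statistic."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            # Yates: shrink |observed - expected| by at most 0.5.
            diff = abs(observed - expected)
            diff -= min(0.5, diff)
            chi2 += diff * diff / expected
    r, k = len(table), len(table[0])
    # Bias correction of phi squared and of the table dimensions.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(r_corr - 1, k_corr - 1))


# Rows: CMR True / False. Columns: the other variable True / False.
# Counts are inferred from the flag value counts (an assumption, see above).
cmr_vs_svhc = [[0, 88], [4, 8]]
cmr_vs_stot = [[13, 75], [5, 7]]

print(round(cramers_v_corrected(cmr_vs_svhc), 3))  # 0.466
print(round(cramers_v_corrected(cmr_vs_stot), 3))  # 0.159
```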

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 | 출처
0 | Formaldehyde | 50-00-0 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
1 | benzo[def]chrysene | 50-32-8 | CMR, PBT | Y | Y | N | N | N | 환경부고시 제2018-233호 별표1
2 | Mechlorethamine | 51-75-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
3 | Urethane | 51-79-6 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
4 | carbon tetrachloride | 56-23-5 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
5 | diphenoxarsin-10-yl oxide | 58-36-6 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
6 | Aniline | 62-53-3 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
7 | diethyl sulphate | 64-67-5 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
8 | Chloroform | 67-66-3 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
9 | N,N-Dimethylformamide | 68-12-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1

Index | 화학물질영문 | CAS등록번호 | 플래그 | 발암성, 생식세포 변이원성, 생식독성물질 여부 | 잔류물질 여부 | 특정표적장기독성물질 여부 | 내분비계장애물질 여부 | EU고위험성우려물질 여부 | 출처
90 | Carbon monoxide | 630-08-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
91 | dibutyltin dichloride | 683-18-1 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
92 | 1-Methyl-2-pyrrolidone | 872-50-4 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
93 | lead distearate | 1072-35-1 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
94 | 1,3-Propanesultone | 1120-71-4 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
95 | bis(pentabromophenyl) ether | 1163-19-5 | PBT | N | Y | N | N | N | 환경부고시 제2018-233호 별표1
96 | Gallium arsenide | 1303-00-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1
97 | Arsenic sulfides | 1303-33-9 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
98 | diboron trioxide | 1303-86-2 | CMR | Y | N | N | N | N | 환경부고시 제2018-233호 별표1
99 | Cadmium oxide | 1306-19-0 | CMR, STOT | Y | N | Y | N | N | 환경부고시 제2018-233호 별표1