Overview

Dataset statistics

Number of variables: 9
Number of observations: 100
Missing cells: 100
Missing cells (%): 11.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 KiB
Average record size in memory: 74.3 B
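
The missing-cell percentage follows from the other figures in this block: 100 missing cells out of 9 × 100 = 900 cells. A minimal sketch of that arithmetic; the function and parameter names are hypothetical and not part of the report:

```python
def missing_cell_pct(n_missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells as a percentage of all cells in the table."""
    return 100.0 * n_missing / (n_vars * n_obs)


# Figures taken from the dataset statistics above.
pct = missing_cell_pct(n_missing=100, n_vars=9, n_obs=100)
print(f"{pct:.1f}%")
```

All 100 missing cells belong to the single fully empty column 정의, which is why the count equals the number of observations.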

Variable types

Text: 2
Unsupported: 1
Boolean: 1
Categorical: 5

Alerts

출처 has constant value "ACToR" [Constant]
갱신내용 has constant value "Last updated 09/2019" [Constant]
플래그 is highly correlated overall with UVCB 여부 and 2 other fields [High correlation]
UVCB 여부 is highly correlated overall with 플래그 and 1 other field [High correlation]
플래그 정의 is highly correlated overall with UVCB 여부 and 2 other fields [High correlation]
상용여부 is highly correlated overall with 플래그 and 1 other field [High correlation]
UVCB 여부 is highly imbalanced (85.9%) [Imbalance]
플래그 is highly imbalanced (64.9%) [Imbalance]
플래그 정의 is highly imbalanced (64.9%) [Imbalance]
정의 has 100 (100.0%) missing values [Missing]
정의 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
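
The imbalance percentages in these alerts are consistent with one minus the normalized Shannon entropy of the value counts. That this is the exact formula the profiler uses is an assumption; the sketch below only shows that it reproduces the reported figures from the counts listed in the variable sections (98/2 for UVCB 여부 and 87/4/3/3/3 for 플래그). The function name is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single-class column is reported as 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


uvcb = imbalance_score([98, 2])           # UVCB 여부: False 98, True 2
flag = imbalance_score([87, 4, 3, 3, 3])  # 플래그: <NA>, S, PMN, SP, 5E
print(f"{uvcb:.1%} {flag:.1%}")
```

Because 플래그 정의 maps one-to-one onto 플래그, it has the same counts and therefore the same score.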

Reproduction

Analysis started: 2023-12-10 11:21:27.670024
Analysis finished: 2023-12-10 11:21:28.911404
Duration: 1.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

CAS등록번호 (CAS registry number)
Text

Distinct: 94
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 9
Median length: 7
Mean length: 7.56
Min length: 7

Characters and Unicode

Total characters: 756
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
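
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point ("Nd" is Decimal Number, "Pd" is Dash Punctuation). The sketch below counts categories over the five sample values of this variable only, not the full column; the helper name is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


# The five sample rows shown for this variable.
sample = ["50-00-0", "623-37-0", "770-35-4", "920-66-1", "50-01-1"]
counts = category_counts(sample)
print(dict(counts))
```

Running the same function over all 100 values would give the category totals reported in this section.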

Unique

Unique: 91
Unique (%): 91.0%
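
Distinct and Unique measure different things: Distinct counts the different values present (94), while Unique counts the values that occur exactly once (91). The gap of 3 matches the three CAS numbers that each appear three times. A small sketch with a hypothetical column of the same shape; the `id-…` placeholders are invented for illustration and are not real data:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for n in freq.values() if n == 1)
    return distinct, unique


# Three values repeated three times plus 91 hypothetical singletons = 100 rows.
values = ["50-45-3", "50-30-6", "51-36-5"] * 3 + [f"id-{i}" for i in range(91)]
distinct, unique = distinct_and_unique(values)
print(distinct, unique)
```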

Sample

1st row: 50-00-0
2nd row: 623-37-0
3rd row: 770-35-4
4th row: 920-66-1
5th row: 50-01-1

Value              Count  Frequency (%)
50-45-3            3      3.0%
50-30-6            3      3.0%
51-36-5            3      3.0%
51-83-2            1      1.0%
51-57-0            1      1.0%
51-52-5            1      1.0%
51-48-9            1      1.0%
51-46-7            1      1.0%
51-45-6            1      1.0%
51-44-5            1      1.0%
Other values (84)  84     84.0%

Most occurring characters

Value  Count  Frequency (%)
-      200    26.5%
5      112    14.8%
1      84     11.1%
0      73     9.7%
2      50     6.6%
3      48     6.3%
6      40     5.3%
7      40     5.3%
4      38     5.0%
9      36     4.8%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    556    73.5%
Dash Punctuation  200    26.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
5      112    20.1%
1      84     15.1%
0      73     13.1%
2      50     9.0%
3      48     8.6%
6      40     7.2%
7      40     7.2%
4      38     6.8%
9      36     6.5%
8      35     6.3%

Dash Punctuation
Value  Count  Frequency (%)
-      200    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  756    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
-      200    26.5%
5      112    14.8%
1      84     11.1%
0      73     9.7%
2      50     6.6%
3      48     6.3%
6      40     5.3%
7      40     5.3%
4      38     5.0%
9      36     4.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  756    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-      200    26.5%
5      112    14.8%
1      84     11.1%
0      73     9.7%
2      50     6.6%
3      48     6.3%
6      40     5.3%
7      40     5.3%
4      38     5.0%
9      36     4.8%

화학물질영문 (chemical name in English)
Text

Distinct: 94
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 100
Median length: 54
Mean length: 37.81
Min length: 6

Characters and Unicode

Total characters: 3781
Distinct characters: 59
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 91
Unique (%): 91.0%

Sample

1st row: Formaldehyde
2nd row: 3-Hexanol
3rd row: 2-Propanol, 1-phenoxy-
4th row: 2-Propanol, 1,1,1,3,3,3-hexafluoro-
5th row: Guanidine, hydrochloride (1:1)

Value               Count  Frequency (%)
acid                27     10.5%
benzoic             14     5.5%
1:1                 9      3.5%
ester               9      3.5%
hydrochloride       6      2.3%
benzene             4      1.6%
phenol              4      1.6%
2,6-dichloro        3      1.2%
11.beta             3      1.2%
3,5-dichloro        3      1.2%
Other values (158)  174    68.0%

Most occurring characters

Value              Count  Frequency (%)
-                  321    8.5%
e                  279    7.4%
o                  243    6.4%
i                  219    5.8%
,                  205    5.4%
n                  176    4.7%
a                  163    4.3%
l                  160    4.2%
(space)            157    4.2%
h                  150    4.0%
Other values (49)  1708   45.2%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   2369   62.7%
Decimal Number     347    9.2%
Dash Punctuation   321    8.5%
Other Punctuation  264    7.0%
Uppercase Letter   164    4.3%
Space Separator    157    4.2%
Open Punctuation   83     2.2%
Close Punctuation  76     2.0%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  279    11.8%
o                  243    10.3%
i                  219    9.2%
n                  176    7.4%
a                  163    6.9%
l                  160    6.8%
h                  150    6.3%
y                  147    6.2%
d                  141    6.0%
t                  136    5.7%
Other values (11)  555    23.4%

Uppercase Letter
Value             Count  Frequency (%)
B                 28     17.1%
P                 27     16.5%
N                 21     12.8%
H                 18     11.0%
S                 10     6.1%
O                 8      4.9%
E                 8      4.9%
R                 8      4.9%
C                 7      4.3%
A                 6      3.7%
Other values (8)  23     14.0%

Decimal Number
Value  Count  Frequency (%)
1      111    32.0%
2      64     18.4%
3      52     15.0%
4      41     11.8%
5      18     5.2%
8      16     4.6%
6      15     4.3%
7      14     4.0%
9      9      2.6%
0      7      2.0%

Other Punctuation
Value  Count  Frequency (%)
,      205    77.7%
.      41     15.5%
:      12     4.5%
'      6      2.3%

Open Punctuation
Value  Count  Frequency (%)
(      60     72.3%
[      23     27.7%

Close Punctuation
Value  Count  Frequency (%)
)      57     75.0%
]      19     25.0%

Dash Punctuation
Value  Count  Frequency (%)
-      321    100.0%

Space Separator
Value    Count  Frequency (%)
(space)  157    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   2533   67.0%
Common  1248   33.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
e                  279    11.0%
o                  243    9.6%
i                  219    8.6%
n                  176    6.9%
a                  163    6.4%
l                  160    6.3%
h                  150    5.9%
y                  147    5.8%
d                  141    5.6%
t                  136    5.4%
Other values (29)  719    28.4%

Common
Value              Count  Frequency (%)
-                  321    25.7%
,                  205    16.4%
(space)            157    12.6%
1                  111    8.9%
2                  64     5.1%
(                  60     4.8%
)                  57     4.6%
3                  52     4.2%
.                  41     3.3%
4                  41     3.3%
Other values (10)  139    11.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3781   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
-                  321    8.5%
e                  279    7.4%
o                  243    6.4%
i                  219    5.8%
,                  205    5.4%
n                  176    4.7%
a                  163    4.3%
l                  160    4.2%
(space)            157    4.2%
h                  150    4.0%
Other values (49)  1708   45.2%

정의 (definition)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 100
Missing (%): 100.0%
Memory size: 1.0 KiB

UVCB 여부 (UVCB status)
Boolean

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value  Count  Frequency (%)
False  98     98.0%
True   2      2.0%

플래그 (flag)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 5
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.73
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   87     87.0%
S      4      4.0%
PMN    3      3.0%
SP     3      3.0%
5E     3      3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     87     87.0%
s      4      4.0%
pmn    3      3.0%
sp     3      3.0%
5e     3      3.0%

플래그 정의 (flag definition)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 5
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 93
Median length: 4
Mean length: 13.56
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 87 | 87.0%
SNUR(중요신규용도규정) 대상물질(a substance that is identified in a final Significant New Use Rule) | 4 | 4.0%
사전제조신고 물질(a commenced PMN substance) | 3 | 3.0%
SNUR(중요신규용도규정(안)) 대상물질(a substance that is identified in a proposed Significant New Use Rule) | 3 | 3.0%
TSCA section 5(e) 대상물질C3:C16 (a substance that is the subject of a TSCA section 5(e) order) | 3 | 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value              Count  Frequency (%)
na                 87     36.7%
substance          13     5.5%
a                  13     5.5%
that               10     4.2%
is                 10     4.2%
in                 7      3.0%
rule               7      3.0%
significant        7      3.0%
identified         7      3.0%
대상물질(a         7      3.0%
Other values (19)  69     29.1%

상용여부 (commercial status)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.23
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 활성화
2nd row: 활성화
3rd row: 활성화
4th row: 활성화
5th row: 활성화

Common Values

Value                Count  Frequency (%)
활성화 (active)      77     77.0%
비활성화 (inactive)  23     23.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
활성화    77     77.0%
비활성화  23     23.0%

출처 (source)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ACToR
2nd row: ACToR
3rd row: ACToR
4th row: ACToR
5th row: ACToR

Common Values

Value  Count  Frequency (%)
ACToR  100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
actor  100    100.0%

갱신내용 (update details)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Last updated 09/2019
2nd row: Last updated 09/2019
3rd row: Last updated 09/2019
4th row: Last updated 09/2019
5th row: Last updated 09/2019

Common Values

Value                 Count  Frequency (%)
Last updated 09/2019  100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
last     100    33.3%
updated  100    33.3%
09/2019  100    33.3%

Correlations

 | CAS등록번호 | 화학물질영문 | UVCB 여부 | 플래그 | 플래그 정의 | 상용여부
CAS등록번호 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000
화학물질영문 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000
UVCB 여부 | 1.000 | 1.000 | 1.000 | NaN | NaN | 0.000
플래그 | 0.000 | 0.000 | NaN | 1.000 | 1.000 | NaN
플래그 정의 | 0.000 | 0.000 | NaN | 1.000 | 1.000 | NaN
상용여부 | 1.000 | 1.000 | 0.000 | NaN | NaN | 1.000

 | 플래그 | UVCB 여부 | 플래그 정의 | 상용여부
플래그 | 1.000 | 1.000 | 1.000 | 1.000
UVCB 여부 | 1.000 | 1.000 | 1.000 | 0.000
플래그 정의 | 1.000 | 1.000 | 1.000 | 1.000
상용여부 | 1.000 | 0.000 | 1.000 | 1.000

 | UVCB 여부 | 플래그 | 플래그 정의 | 상용여부
UVCB 여부 | 1.000 | 1.000 | 1.000 | 0.000
플래그 | 1.000 | 1.000 | 1.000 | 1.000
플래그 정의 | 1.000 | 1.000 | 1.000 | 1.000
상용여부 | 0.000 | 1.000 | 1.000 | 1.000
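
The report does not say which association measure each matrix uses. For pairs of categorical columns, one common choice is Cramér's V, and the sketch below is an uncorrected version of it in plain Python. Whether any matrix above was computed this way, and whether a bias correction was applied, is an assumption; the function name is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences.
    Returns NaN when either column has a single category."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    k = min(len(row), len(col)) - 1
    if k == 0:
        return float("nan")
    chi2 = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# 플래그 counts from the report; the paired column is a stand-in that maps
# each flag to exactly one label, as 플래그 정의 does.
flags = ["<NA>"] * 87 + ["S"] * 4 + ["PMN"] * 3 + ["SP"] * 3 + ["5E"] * 3
labels = ["def-" + f for f in flags]
v_perfect = cramers_v(flags, labels)

# Two independent columns give 0.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(v_perfect, v_independent)
```

A one-to-one mapping between two columns gives a value of 1, which is consistent with the 1.000 entries for 플래그 against 플래그 정의.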

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
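
The per-column nullity behind both plots can be computed directly. A minimal sketch over two rows taken from the sample table, with `<NA>` represented as `None`; the helper name and the three-column subset are choices made for illustration:

```python
def missing_per_column(rows, columns):
    """Count missing (None) entries per column across a list of dict rows."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


rows = [
    {"CAS등록번호": "50-00-0", "화학물질영문": "Formaldehyde", "정의": None},
    {"CAS등록번호": "623-37-0", "화학물질영문": "3-Hexanol", "정의": None},
]
nullity = missing_per_column(rows, ["CAS등록번호", "화학물질영문", "정의"])
print(nullity)
```

In the full dataset only 정의 would show a non-zero count (100 of 100 rows).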

Sample

First rows

# | CAS등록번호 | 화학물질영문 | 정의 | UVCB 여부 | 플래그 | 플래그 정의 | 상용여부 | 출처 | 갱신내용
0 | 50-00-0 | Formaldehyde | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
1 | 623-37-0 | 3-Hexanol | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
2 | 770-35-4 | 2-Propanol, 1-phenoxy- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
3 | 920-66-1 | 2-Propanol, 1,1,1,3,3,3-hexafluoro- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
4 | 50-01-1 | Guanidine, hydrochloride (1:1) | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
5 | 1070-40-2 | 5-Decyne-4,7-diol | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
6 | 1117-79-9 | Octane, 3-chloro- | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019
7 | 1122-81-2 | Pyridine, 4-propyl- | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019
8 | 50-02-2 | Pregna-1,4-diene-3,20-dione, 9-fluoro-11,17,21-trihydroxy-16-methyl-, (11.beta.,16.alpha.)- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
9 | 1313-60-6 | Sodium peroxide (Na2(O2)) | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019

Last rows

# | CAS등록번호 | 화학물질영문 | 정의 | UVCB 여부 | 플래그 | 플래그 정의 | 상용여부 | 출처 | 갱신내용
90 | 51-93-4 | Ethanaminium, N,N,N-trimethyl-, iodide (1:1) | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019
91 | 52-01-7 | Pregn-4-ene-21-carboxylic acid, 7-(acetylthio)-17-hydroxy-3-oxo-, .gamma.-lactone, (7.alpha.,17.alph | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019
92 | 52-39-1 | Pregn-4-en-18-al, 11,21-dihydroxy-3,20-dioxo-, (11.beta.)- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
93 | 52-51-7 | 1,3-Propanediol, 2-bromo-2-nitro- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
94 | 52-52-8 | Cyclopentanecarboxylic acid, 1-amino- | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
95 | 52-85-7 | Phosphorothioic acid, O-[4-[(dimethylamino)sulfonyl]phenyl] O,O-dimethyl ester | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
96 | 52-88-0 | 8-Azoniabicyclo[3.2.1]octane, 3-(3-hydroxy-1-oxo-2-phenylpropoxy)-8,8-dimethyl-, (3-endo)-, nitrate | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019
97 | 52-89-1 | L-Cysteine, hydrochloride (1:1) | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
98 | 52-90-4 | L-Cysteine | <NA> | N | <NA> | <NA> | 활성화 | ACToR | Last updated 09/2019
99 | 53-03-2 | Pregna-1,4-diene-3,11,20-trione, 17,21-dihydroxy- | <NA> | N | <NA> | <NA> | 비활성화 | ACToR | Last updated 09/2019