Overview

Dataset statistics

Number of variables: 6
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.8 KiB
Average record size in memory: 49.3 B

Variable types

Text: 3
Categorical: 2
DateTime: 1

Alerts

인용 출처 has constant value "NITE-CHRIP" [Constant]
갱신일자 has constant value "2020-05-27" [Constant]
비고 is highly imbalanced (71.4%) [Imbalance]
CHRIP등록번호 has unique values [Unique]
CAS등록번호 has unique values [Unique]
화학물질영문 has unique values [Unique]
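
The 71.4% figure in the imbalance alert is consistent with a normalized-entropy score: one minus the Shannon entropy of the class counts, divided by the maximum entropy for that number of classes. The sketch below recomputes it from the 95/5 split reported for 비고. The function name `imbalance_score` is illustrative, and the formula is an assumption about how the profiler derives the number, not a quotation of its source.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        # A single class (or no data) has no entropy to normalize.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# 비고 holds 95 "<NA>" values and 5 copies of the CSCL note.
score = imbalance_score([95, 5])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 and a column dominated by one value approaches 1, so a 95/5 split lands well above the midpoint.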

Reproduction

Analysis started: 2023-12-10 13:23:02.037327
Analysis finished: 2023-12-10 13:23:02.831532
Duration: 0.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

CHRIP등록번호
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1200
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
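
As a minimal illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The helper name `category_counts` is hypothetical, and the report itself may compute its counts differently. Scripts and blocks are not available from this module, so they are left out of the sketch.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample value of CHRIP등록번호.
counts = category_counts("C004-685-91A")
print(counts)
```

Here `Nd` is Decimal Number, `Lu` is Uppercase Letter, and `Pd` is Dash Punctuation. The sample value contains 8, 2, and 2 characters of those categories respectively.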

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: C004-685-91A
2nd row: C005-480-87A
3rd row: C004-690-50A
4th row: C004-741-06A
5th row: C004-660-17A
Value | Count | Frequency (%)
c004-685-91a | 1 | 1.0%
c004-785-95a | 1 | 1.0%
c004-660-62a | 1 | 1.0%
c004-666-31a | 1 | 1.0%
c004-685-24a | 1 | 1.0%
c004-691-41a | 1 | 1.0%
c004-737-38a | 1 | 1.0%
c004-665-51a | 1 | 1.0%
c004-726-55a | 1 | 1.0%
c004-675-98a | 1 | 1.0%
Other values (90) | 90 | 90.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 245 | 20.4%
- | 200 | 16.7%
4 | 123 | 10.2%
6 | 101 | 8.4%
C | 100 | 8.3%
A | 100 | 8.3%
7 | 79 | 6.6%
8 | 47 | 3.9%
2 | 47 | 3.9%
5 | 46 | 3.8%
Other values (3) | 112 | 9.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 800 | 66.7%
Dash Punctuation | 200 | 16.7%
Uppercase Letter | 200 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 245 | 30.6%
4 | 123 | 15.4%
6 | 101 | 12.6%
7 | 79 | 9.9%
8 | 47 | 5.9%
2 | 47 | 5.9%
5 | 46 | 5.8%
1 | 41 | 5.1%
3 | 37 | 4.6%
9 | 34 | 4.2%
Uppercase Letter
Value | Count | Frequency (%)
C | 100 | 50.0%
A | 100 | 50.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 200 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1000 | 83.3%
Latin | 200 | 16.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 245 | 24.5%
- | 200 | 20.0%
4 | 123 | 12.3%
6 | 101 | 10.1%
7 | 79 | 7.9%
8 | 47 | 4.7%
2 | 47 | 4.7%
5 | 46 | 4.6%
1 | 41 | 4.1%
3 | 37 | 3.7%
Latin
Value | Count | Frequency (%)
C | 100 | 50.0%
A | 100 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1200 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 245 | 20.4%
- | 200 | 16.7%
4 | 123 | 10.2%
6 | 101 | 8.4%
C | 100 | 8.3%
A | 100 | 8.3%
7 | 79 | 6.6%
8 | 47 | 3.9%
2 | 47 | 3.9%
5 | 46 | 3.8%
Other values (3) | 112 | 9.3%

CAS등록번호
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 7
Mean length: 7.31
Min length: 7

Characters and Unicode

Total characters: 731
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: 50-00-0
2nd row: 57-09-0
3rd row: 57-55-6
4th row: 60-00-4
5th row: 62-53-3
Value | Count | Frequency (%)
50-00-0 | 1 | 1.0%
95-63-6 | 1 | 1.0%
100-44-7 | 1 | 1.0%
100-42-5 | 1 | 1.0%
100-41-4 | 1 | 1.0%
100-21-0 | 1 | 1.0%
100-00-5 | 1 | 1.0%
98-95-3 | 1 | 1.0%
98-83-9 | 1 | 1.0%
98-82-8 | 1 | 1.0%
Other values (90) | 90 | 90.0%

Most occurring characters

Value | Count | Frequency (%)
- | 200 | 27.4%
0 | 85 | 11.6%
7 | 73 | 10.0%
1 | 68 | 9.3%
5 | 59 | 8.1%
8 | 51 | 7.0%
6 | 48 | 6.6%
9 | 42 | 5.7%
4 | 37 | 5.1%
3 | 35 | 4.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 531 | 72.6%
Dash Punctuation | 200 | 27.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 85 | 16.0%
7 | 73 | 13.7%
1 | 68 | 12.8%
5 | 59 | 11.1%
8 | 51 | 9.6%
6 | 48 | 9.0%
9 | 42 | 7.9%
4 | 37 | 7.0%
3 | 35 | 6.6%
2 | 33 | 6.2%
Dash Punctuation
Value | Count | Frequency (%)
- | 200 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 731 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 200 | 27.4%
0 | 85 | 11.6%
7 | 73 | 10.0%
1 | 68 | 9.3%
5 | 59 | 8.1%
8 | 51 | 7.0%
6 | 48 | 6.6%
9 | 42 | 5.7%
4 | 37 | 5.1%
3 | 35 | 4.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 731 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 200 | 27.4%
0 | 85 | 11.6%
7 | 73 | 10.0%
1 | 68 | 9.3%
5 | 59 | 8.1%
8 | 51 | 7.0%
6 | 48 | 6.6%
9 | 42 | 5.7%
4 | 37 | 5.1%
3 | 35 | 4.8%

비고
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
<NA> | 95
※2 : CSCL 간주 물질 (substance which is regarded as CSCL) | 5

Length

Max length: 53
Median length: 4
Mean length: 6.45
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: ※2 : CSCL 간주 물질 (substance which is regarded as CSCL)
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 95 | 95.0%
※2 : CSCL 간주 물질 (substance which is regarded as CSCL) | 5 | 5.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
na | 95 | 63.3%
cscl | 10 | 6.7%
※2 | 5 | 3.3%
(empty string) | 5 | 3.3%
간주 | 5 | 3.3%
물질 | 5 | 3.3%
substance | 5 | 3.3%
which | 5 | 3.3%
is | 5 | 3.3%
regarded | 5 | 3.3%
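
The token counts above are consistent with lowercasing each value, splitting it on whitespace, and stripping ASCII punctuation from both ends of every token. The sketch below reproduces the counts under that assumption. The function name `word_counts` is hypothetical and the tokenization rule is inferred from the numbers, not taken from the profiler's source.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, and strip ASCII punctuation from each token."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


note = "※2 : CSCL 간주 물질 (substance which is regarded as CSCL)"
values = ["<NA>"] * 95 + [note] * 5  # the 95/5 split reported for 비고
counts = word_counts(values)
total = sum(counts.values())
print(counts["na"], counts["cscl"], total)
```

Under this rule `<NA>` becomes `na`, both occurrences of `CSCL` collapse into one token counted 10 times, and the lone `:` is stripped to an empty string, which would explain the row with no visible value. The 150 tokens in total give 95/150 = 63.3% for `na`.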

화학물질영문
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 55
Median length: 33
Mean length: 19.26
Min length: 6

Characters and Unicode

Total characters: 1926
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: Formaldehyde
2nd row: Cetrimonium bromide
3rd row: Propane-1,2-diol
4th row: Ethylenediaminetetraacetic acid
5th row: Aniline
Value | Count | Frequency (%)
acid | 8 | 5.9%
chloride | 4 | 3.0%
methyl | 4 | 3.0%
acetate | 3 | 2.2%
ketone | 2 | 1.5%
tetramethylammonium | 2 | 1.5%
n,n,n-trimethylmethanaminium | 2 | 1.5%
bromide | 2 | 1.5%
ethyl | 2 | 1.5%
phosphate | 2 | 1.5%
Other values (103) | 104 | 77.0%

Most occurring characters

Value | Count | Frequency (%)
e | 219 | 11.4%
o | 130 | 6.7%
l | 125 | 6.5%
n | 113 | 5.9%
i | 109 | 5.7%
t | 108 | 5.6%
a | 103 | 5.3%
- | 101 | 5.2%
h | 99 | 5.1%
y | 92 | 4.8%
Other values (47) | 727 | 37.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1511 | 78.5%
Uppercase Letter | 117 | 6.1%
Dash Punctuation | 101 | 5.2%
Decimal Number | 89 | 4.6%
Other Punctuation | 39 | 2.0%
Space Separator | 35 | 1.8%
Close Punctuation | 17 | 0.9%
Open Punctuation | 17 | 0.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 219 | 14.5%
o | 130 | 8.6%
l | 125 | 8.3%
n | 113 | 7.5%
i | 109 | 7.2%
t | 108 | 7.1%
a | 103 | 6.8%
h | 99 | 6.6%
y | 92 | 6.1%
r | 88 | 5.8%
Other values (12) | 325 | 21.5%
Uppercase Letter
Value | Count | Frequency (%)
N | 22 | 18.8%
T | 16 | 13.7%
C | 12 | 10.3%
B | 12 | 10.3%
A | 9 | 7.7%
D | 9 | 7.7%
E | 9 | 7.7%
P | 7 | 6.0%
M | 6 | 5.1%
I | 3 | 2.6%
Other values (7) | 12 | 10.3%
Decimal Number
Value | Count | Frequency (%)
2 | 34 | 38.2%
1 | 20 | 22.5%
4 | 11 | 12.4%
3 | 9 | 10.1%
5 | 7 | 7.9%
6 | 4 | 4.5%
7 | 2 | 2.2%
0 | 1 | 1.1%
8 | 1 | 1.1%
Other Punctuation
Value | Count | Frequency (%)
, | 33 | 84.6%
. | 5 | 12.8%
' | 1 | 2.6%
Close Punctuation
Value | Count | Frequency (%)
) | 15 | 88.2%
] | 2 | 11.8%
Open Punctuation
Value | Count | Frequency (%)
( | 15 | 88.2%
[ | 2 | 11.8%
Dash Punctuation
Value | Count | Frequency (%)
- | 101 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 35 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1628 | 84.5%
Common | 298 | 15.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 219 | 13.5%
o | 130 | 8.0%
l | 125 | 7.7%
n | 113 | 6.9%
i | 109 | 6.7%
t | 108 | 6.6%
a | 103 | 6.3%
h | 99 | 6.1%
y | 92 | 5.7%
r | 88 | 5.4%
Other values (29) | 442 | 27.1%
Common
Value | Count | Frequency (%)
- | 101 | 33.9%
(space) | 35 | 11.7%
2 | 34 | 11.4%
, | 33 | 11.1%
1 | 20 | 6.7%
) | 15 | 5.0%
( | 15 | 5.0%
4 | 11 | 3.7%
3 | 9 | 3.0%
5 | 7 | 2.3%
Other values (8) | 18 | 6.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1926 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 219 | 11.4%
o | 130 | 6.7%
l | 125 | 6.5%
n | 113 | 5.9%
i | 109 | 5.7%
t | 108 | 5.6%
a | 103 | 5.3%
- | 101 | 5.2%
h | 99 | 5.1%
y | 92 | 4.8%
Other values (47) | 727 | 37.7%

인용 출처
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
NITE-CHRIP | 100

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NITE-CHRIP
2nd row: NITE-CHRIP
3rd row: NITE-CHRIP
4th row: NITE-CHRIP
5th row: NITE-CHRIP

Common Values

Value | Count | Frequency (%)
NITE-CHRIP | 100 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
nite-chrip | 100 | 100.0%

갱신일자
Date

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Minimum: 2020-05-27 00:00:00
Maximum: 2020-05-27 00:00:00
[Figure: Histogram with fixed size bins (bins=1)]

Correlations

Variable | CHRIP등록번호 | CAS등록번호 | 화학물질영문
CHRIP등록번호 | 1.000 | 1.000 | 1.000
CAS등록번호 | 1.000 | 1.000 | 1.000
화학물질영문 | 1.000 | 1.000 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

Index | CHRIP등록번호 | CAS등록번호 | 비고 | 화학물질영문 | 인용 출처 | 갱신일자
0 | C004-685-91A | 50-00-0 | <NA> | Formaldehyde | NITE-CHRIP | 2020.05.27
1 | C005-480-87A | 57-09-0 | ※2 : CSCL 간주 물질 (substance which is regarded as CSCL) | Cetrimonium bromide | NITE-CHRIP | 2020.05.27
2 | C004-690-50A | 57-55-6 | <NA> | Propane-1,2-diol | NITE-CHRIP | 2020.05.27
3 | C004-741-06A | 60-00-4 | <NA> | Ethylenediaminetetraacetic acid | NITE-CHRIP | 2020.05.27
4 | C004-660-17A | 62-53-3 | <NA> | Aniline | NITE-CHRIP | 2020.05.27
5 | C004-721-99A | 62-56-6 | <NA> | Thiourea | NITE-CHRIP | 2020.05.27
6 | C004-704-99A | 64-18-6 | <NA> | Formic acid | NITE-CHRIP | 2020.05.27
7 | C006-326-60A | 64-20-0 | ※2 : CSCL 간주 물질 (substance which is regarded as CSCL) | N,N,N-Trimethylmethanaminium bromide | NITE-CHRIP | 2020.05.27
8 | C004-664-71A | 67-56-1 | <NA> | Methanol | NITE-CHRIP | 2020.05.27
9 | C004-711-07A | 67-63-0 | <NA> | Propan-2-ol | NITE-CHRIP | 2020.05.27

Last rows

Index | CHRIP등록번호 | CAS등록번호 | 비고 | 화학물질영문 | 인용 출처 | 갱신일자
90 | C004-660-06A | 107-05-1 | <NA> | 3-Chloroprop-1-ene | NITE-CHRIP | 2020.05.27
91 | C004-683-42A | 107-06-2 | <NA> | 1,2-Dichloroethane | NITE-CHRIP | 2020.05.27
92 | C004-668-24A | 107-13-1 | <NA> | Acrylonitrile | NITE-CHRIP | 2020.05.27
93 | C004-685-46A | 107-21-1 | <NA> | Ethylene glycol | NITE-CHRIP | 2020.05.27
94 | C004-762-15A | 107-22-2 | <NA> | Oxalaldehyde | NITE-CHRIP | 2020.05.27
95 | C005-489-85A | 107-46-0 | <NA> | Disiloxane, hexamethyl- | NITE-CHRIP | 2020.05.27
96 | C004-794-96A | 107-64-2 | ※2 : CSCL 간주 물질 (substance which is regarded as CSCL) | N,N-Dimethyl-N,N-dioctadecan-1-ylammonium chloride | NITE-CHRIP | 2020.05.27
97 | C004-692-98A | 108-05-4 | <NA> | Vinyl acetate | NITE-CHRIP | 2020.05.27
98 | C004-707-40A | 108-10-1 | <NA> | Methyl isobutyl ketone | NITE-CHRIP | 2020.05.27
99 | C004-679-52A | 108-24-7 | <NA> | Acetic anhydride | NITE-CHRIP | 2020.05.27