Overview

Dataset statistics

Number of variables: 8
Number of observations: 100
Missing cells: 200
Missing cells (%): 25.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.7 KiB
Average record size in memory: 68.3 B
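
The missing-cell percentage follows directly from the counts above. A minimal sketch of the arithmetic, with illustrative variable names that are not part of the report:

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 8
n_observations = 100
missing_cells = 200

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing = {missing_pct:.1f}%")
```

The 200 missing cells match the two columns flagged as 100% missing in the alerts (2 columns x 100 rows).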

Variable types

Numeric: 1
Categorical: 5
Unsupported: 2

Alerts

인용출처 has constant value "NCIS" [Constant]
화학물질영문 is highly overall correlated with 연번 and 2 other fields [High correlation]
CAS등록번호 is highly overall correlated with 연번 and 2 other fields [High correlation]
화학물질국문 is highly overall correlated with 연번 and 2 other fields [High correlation]
연번 is highly overall correlated with CAS등록번호 and 2 other fields [High correlation]
비고 has 100 (100.0%) missing values [Missing]
출처 has 100 (100.0%) missing values [Missing]
연번 has unique values [Unique]
비고 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
출처 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
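
To illustrate how alerts such as Constant, Unique and Missing can be derived from column contents, here is a minimal sketch. The `column_alerts` helper and its rules are simplifying assumptions for illustration, not ydata-profiling's actual implementation. The input is the first five rows shown in the Sample section of this report.

```python
def column_alerts(values):
    """Return alert labels for one column; None marks a missing cell.

    Hypothetical helper: the rules below are a simplification.
    """
    alerts = []
    present = [v for v in values if v is not None]
    if len(present) < len(values):
        alerts.append("Missing")
    if present:
        distinct = len(set(present))
        if distinct == 1:
            alerts.append("Constant")
        elif distinct == len(present):
            alerts.append("Unique")
    return alerts


# First five rows from the "Sample" section of this report.
columns = {
    "연번": [1, 2, 3, 4, 5],
    "비고": [None] * 5,
    "인용출처": ["NCIS"] * 5,
    "CAS등록번호": ["531-86-2", "531-86-2", "531-86-2", "60-41-3", "60-41-3"],
}

for name, values in columns.items():
    print(name, column_alerts(values))
```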

Reproduction

Analysis started: 2023-12-10 12:09:51.022055
Analysis finished: 2023-12-10 12:09:52.019615
Duration: 1 second
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5
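
The fractional quantiles above are consistent with linear interpolation between neighbouring ranks. The sketch below assumes that 연번 holds the integers 1 to 100 (inferred from the distinct count, minimum, maximum and sum reported here) and uses a hypothetical `percentile` helper. It illustrates the interpolation rule and is not the profiler's own code.

```python
def percentile(sorted_values, q):
    """Quantile q in [0, 1] using linear interpolation between closest ranks."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: 연번 is the sequence 1..100 with no gaps.
values = list(range(1, 101))

for label, q in [("5th", 0.05), ("Q1", 0.25), ("median", 0.5), ("Q3", 0.75), ("95th", 0.95)]:
    print(label, round(percentile(values, q), 2))

iqr = percentile(values, 0.75) - percentile(values, 0.25)
print("IQR", iqr)
```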

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
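
Several of these figures can be checked with Python's standard `statistics` module. The sketch assumes 연번 is the sequence 1 to 100 and that the reported variance is the sample variance (divisor n - 1). Both are inferences from the numbers in this report rather than stated facts.

```python
import statistics

# Assumption: 연번 is the sequence 1..100 with no gaps.
values = list(range(1, 101))

total = sum(values)
mean = statistics.mean(values)
variance = float(statistics.variance(values))  # sample variance, divisor n - 1
std = statistics.stdev(values)                 # square root of the sample variance
cv = std / mean                                # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, round(variance, 5), round(std, 6), round(cv, 8), mad)
```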
[Figure: histogram with fixed size bins (bins=50)]
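
Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 1.0% |
| 65 | 1 | 1.0% |
| 75 | 1 | 1.0% |
| 74 | 1 | 1.0% |
| 73 | 1 | 1.0% |
| 72 | 1 | 1.0% |
| 71 | 1 | 1.0% |
| 70 | 1 | 1.0% |
| 69 | 1 | 1.0% |
| 68 | 1 | 1.0% |
| Other values (90) | 90 | 90.0% |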
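
Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 1.0% |
| 2 | 1 | 1.0% |
| 3 | 1 | 1.0% |
| 4 | 1 | 1.0% |
| 5 | 1 | 1.0% |
| 6 | 1 | 1.0% |
| 7 | 1 | 1.0% |
| 8 | 1 | 1.0% |
| 9 | 1 | 1.0% |
| 10 | 1 | 1.0% |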
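
Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 1 | 1.0% |
| 99 | 1 | 1.0% |
| 98 | 1 | 1.0% |
| 97 | 1 | 1.0% |
| 96 | 1 | 1.0% |
| 95 | 1 | 1.0% |
| 94 | 1 | 1.0% |
| 93 | 1 | 1.0% |
| 92 | 1 | 1.0% |
| 91 | 1 | 1.0% |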

CAS등록번호 (CAS registry number)
Categorical

HIGH CORRELATION 

Distinct: 42
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Top categories:
4109-96-0: 4
75-79-6: 3
531-86-2: 3
19750-95-9: 3
297-99-4: 3
Other values (37): 84

Length

Max length: 11
Median length: 10
Mean length: 8.71
Min length: 7

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: 531-86-2
2nd row: 531-86-2
3rd row: 531-86-2
4th row: 60-41-3
5th row: 60-41-3

Common Values

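| Value | Count | Frequency (%) |
|---|---|---|
| 4109-96-0 | 4 | 4.0% |
| 75-79-6 | 3 | 3.0% |
| 531-86-2 | 3 | 3.0% |
| 19750-95-9 | 3 | 3.0% |
| 297-99-4 | 3 | 3.0% |
| 10294-34-5 | 3 | 3.0% |
| 7778-73-6 | 3 | 3.0% |
| 11113-75-0 | 3 | 3.0% |
| 115-21-9 | 3 | 3.0% |
| 75-94-5 | 3 | 3.0% |
| Other values (32) | 69 | 69.0% |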

Length

[Figure: histogram of lengths of the category]
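
Words

| Value | Count | Frequency (%) |
|---|---|---|
| 4109-96-0 | 4 | 4.0% |
| 75-54-7 | 3 | 3.0% |
| 75-79-6 | 3 | 3.0% |
| 10025-78-2 | 3 | 3.0% |
| 7803-62-5 | 3 | 3.0% |
| 13463-40-6 | 3 | 3.0% |
| 116-14-3 | 3 | 3.0% |
| 78-79-5 | 3 | 3.0% |
| 7782-65-2 | 3 | 3.0% |
| 75-35-4 | 3 | 3.0% |
| Other values (32) | 69 | 69.0% |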

물질표지세계조화시스템 (GHS substance label)
Categorical

Distinct: 8
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Top categories:
GHS06: 26
GHS09: 18
GHS08: 16
GHS02: 16
GHS05: 11
Other values (3): 13

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: GHS07
2nd row: GHS08
3rd row: GHS09
4th row: GHS06
5th row: GHS09

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| GHS06 | 26 | 26.0% |
| GHS09 | 18 | 18.0% |
| GHS08 | 16 | 16.0% |
| GHS02 | 16 | 16.0% |
| GHS05 | 11 | 11.0% |
| GHS04 | 7 | 7.0% |
| GHS07 | 5 | 5.0% |
| GHS03 | 1 | 1.0% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| ghs06 | 26 | 26.0% |
| ghs09 | 18 | 18.0% |
| ghs08 | 16 | 16.0% |
| ghs02 | 16 | 16.0% |
| ghs05 | 11 | 11.0% |
| ghs04 | 7 | 7.0% |
| ghs07 | 5 | 5.0% |
| ghs03 | 1 | 1.0% |

비고 (remarks)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 100
Missing (%): 100.0%
Memory size: 1.0 KiB

출처 (source)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 100
Missing (%): 100.0%
Memory size: 1.0 KiB

인용출처 (cited source)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Top categories:
NCIS: 100

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NCIS
2nd row: NCIS
3rd row: NCIS
4th row: NCIS
5th row: NCIS

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| NCIS | 100 | 100.0% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| ncis | 100 | 100.0% |

화학물질국문 (chemical name, Korean)
Categorical

HIGH CORRELATION 

Distinct: 39
Distinct (%): 39.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Top categories:
<NA>: 12
1,1-다이클로로에틸렌: 3
황산 벤지딘: 3
테트라플루오르화 규소: 3
클로르디메폼 염화수소산염: 3
Other values (34): 76

Length

Max length: 62
Median length: 22.5
Mean length: 10.71
Min length: 3

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: 황산 벤지딘
2nd row: 황산 벤지딘
3rd row: 황산 벤지딘
4th row: 스트리크닌 황산
5th row: 스트리크닌 황산

Common Values

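| Value | Count | Frequency (%) |
|---|---|---|
| <NA> | 12 | 12.0% |
| 1,1-다이클로로에틸렌 | 3 | 3.0% |
| 황산 벤지딘 | 3 | 3.0% |
| 테트라플루오르화 규소 | 3 | 3.0% |
| 클로르디메폼 염화수소산염 | 3 | 3.0% |
| 저메인 | 3 | 3.0% |
| 트라이뷰틸(펜타클로로페녹시)스태낸) | 3 | 3.0% |
| 삼염화 붕소 | 3 | 3.0% |
| 이소프렌 | 3 | 3.0% |
| 칼륨 펜타클로로펜에이트 | 3 | 3.0% |
| Other values (29) | 61 | 61.0% |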

Length

[Figure: histogram of lengths of the category]
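
Words

| Value | Count | Frequency (%) |
|---|---|---|
| na | 12 | 7.4% |
| 스트리크닌 | 10 | 6.2% |
| 황산 | 5 | 3.1% |
| 펜타클로로펜에이트 | 3 | 1.9% |
| 1,1-다이클로로에틸렌 | 3 | 1.9% |
| 에테르 | 3 | 1.9% |
| 트랜스-포스파미돈 | 3 | 1.9% |
| 트라이클로로에틸실레인 | 3 | 1.9% |
| 트라이클로로비닐실레인 | 3 | 1.9% |
| 메틸다이클로로실레인 | 3 | 1.9% |
| Other values (52) | 114 | 70.4% |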

화학물질영문 (chemical name, English)
Categorical

HIGH CORRELATION 

Distinct: 42
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Top categories:
Dichlorosilane: 4
Trichloromethylsilane: 3
Benzidine sulfate: 3
Chlordimeform hydrochloride: 3
trans-Phosphamidon: 3
Other values (37): 84

Length

Max length: 83
Median length: 36
Mean length: 22
Min length: 6

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: Benzidine sulfate
2nd row: Benzidine sulfate
3rd row: Benzidine sulfate
4th row: Strychnine sulfate
5th row: Strychnine sulfate

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| Dichlorosilane | 4 | 4.0% |
| Trichloromethylsilane | 3 | 3.0% |
| Benzidine sulfate | 3 | 3.0% |
| Chlordimeform hydrochloride | 3 | 3.0% |
| trans-Phosphamidon | 3 | 3.0% |
| Boron trichloride | 3 | 3.0% |
| Potassium pentachlorophenate | 3 | 3.0% |
| Nickel sulfide | 3 | 3.0% |
| Silane, trichloroethyl- | 3 | 3.0% |
| Trichloroethenylsilane | 3 | 3.0% |
| Other values (32) | 69 | 69.0% |

Length

[Figure: histogram of lengths of the category]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| strychnine | 14 | 8.1% |
| silane | 9 | 5.2% |
| sulfate | 7 | 4.0% |
| hydrochloride | 5 | 2.9% |
| silicon | 5 | 2.9% |
| dichlorosilane | 4 | 2.3% |
| phosphate | 4 | 2.3% |
| tetrachloride | 4 | 2.3% |
| strychnidin-10-one | 4 | 2.3% |
| potassium | 3 | 1.7% |
| Other values (50) | 114 | 65.9% |

Interactions

[Figure: interactions plot]

Correlations

| | 연번 | CAS등록번호 | 물질표지세계조화시스템 | 화학물질국문 | 화학물질영문 |
|---|---|---|---|---|---|
| 연번 | 1.000 | 0.992 | 0.419 | 0.992 | 0.992 |
| CAS등록번호 | 0.992 | 1.000 | 0.000 | 1.000 | 1.000 |
| 물질표지세계조화시스템 | 0.419 | 0.000 | 1.000 | 0.000 | 0.000 |
| 화학물질국문 | 0.992 | 1.000 | 0.000 | 1.000 | 1.000 |
| 화학물질영문 | 0.992 | 1.000 | 0.000 | 1.000 | 1.000 |

| | 물질표지세계조화시스템 | 화학물질영문 | CAS등록번호 | 화학물질국문 |
|---|---|---|---|---|
| 물질표지세계조화시스템 | 1.000 | 0.000 | 0.000 | 0.000 |
| 화학물질영문 | 0.000 | 1.000 | 1.000 | 1.000 |
| CAS등록번호 | 0.000 | 1.000 | 1.000 | 1.000 |
| 화학물질국문 | 0.000 | 1.000 | 1.000 | 1.000 |

| | 연번 | CAS등록번호 | 물질표지세계조화시스템 | 화학물질국문 | 화학물질영문 |
|---|---|---|---|---|---|
| 연번 | 1.000 | 0.743 | 0.210 | 0.748 | 0.743 |
| CAS등록번호 | 0.743 | 1.000 | 0.000 | 1.000 | 1.000 |
| 물질표지세계조화시스템 | 0.210 | 0.000 | 1.000 | 0.000 | 0.000 |
| 화학물질국문 | 0.748 | 1.000 | 0.000 | 1.000 | 1.000 |
| 화학물질영문 | 0.743 | 1.000 | 0.000 | 1.000 | 1.000 |
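
The report does not name the association measures behind these matrices. Assuming the categorical-only matrix uses something like Cramér's V, the sketch below shows why columns that map one-to-one (for example CAS등록번호 and 화학물질영문) score 1.000. It implements the uncorrected textbook form with a hypothetical `cramers_v` helper. The profiler may apply a bias correction, so this is not expected to reproduce the reported values exactly.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_total in x_counts.items():
        for y, y_total in y_counts.items():
            expected = x_total * y_total / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# First five rows from the "Sample" section: each CAS number maps to one name.
cas = ["531-86-2", "531-86-2", "531-86-2", "60-41-3", "60-41-3"]
name = ["Benzidine sulfate"] * 3 + ["Strychnine sulfate"] * 2
print(round(cramers_v(cas, name), 3))

# A made-up balanced example in which the two columns are independent.
print(round(cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"]), 3))
```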

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

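First rows

| | 연번 | CAS등록번호 | 물질표지세계조화시스템 | 비고 | 출처 | 인용출처 | 화학물질국문 | 화학물질영문 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 531-86-2 | GHS07 | <NA> | <NA> | NCIS | 황산 벤지딘 | Benzidine sulfate |
| 1 | 2 | 531-86-2 | GHS08 | <NA> | <NA> | NCIS | 황산 벤지딘 | Benzidine sulfate |
| 2 | 3 | 531-86-2 | GHS09 | <NA> | <NA> | NCIS | 황산 벤지딘 | Benzidine sulfate |
| 3 | 4 | 60-41-3 | GHS06 | <NA> | <NA> | NCIS | 스트리크닌 황산 | Strychnine sulfate |
| 4 | 5 | 60-41-3 | GHS09 | <NA> | <NA> | NCIS | 스트리크닌 황산 | Strychnine sulfate |
| 5 | 6 | 60491-10-3 | GHS06 | <NA> | <NA> | NCIS | 스트리시닌-10-온, 황산염(2:1), 5수화물 | Strychnidin-10-one, sulfate (2:1), pentahydrate |
| 6 | 7 | 60491-10-3 | GHS09 | <NA> | <NA> | NCIS | 스트리시닌-10-온, 황산염(2:1), 5수화물 | Strychnidin-10-one, sulfate (2:1), pentahydrate |
| 7 | 8 | 10476-87-6 | GHS06 | <NA> | <NA> | NCIS | 스트리크닌 다이메틸아르신산 | Strychnine dimethylarsinate |
| 8 | 9 | 10476-87-6 | GHS09 | <NA> | <NA> | NCIS | 스트리크닌 다이메틸아르신산 | Strychnine dimethylarsinate |
| 9 | 10 | 10476-82-1 | GHS06 | <NA> | <NA> | NCIS | 스트리시닌 비산염 | Strychnine arsenate |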
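
Last rows

| | 연번 | CAS등록번호 | 물질표지세계조화시스템 | 비고 | 출처 | 인용출처 | 화학물질국문 | 화학물질영문 |
|---|---|---|---|---|---|---|---|---|
| 90 | 91 | 10026-04-7 | GHS05 | <NA> | <NA> | NCIS | 실리콘 테트라염화물 | Silicon tetrachloride |
| 91 | 92 | 10026-04-7 | GHS06 | <NA> | <NA> | NCIS | 실리콘 테트라염화물 | Silicon tetrachloride |
| 92 | 93 | 7783-61-1 | GHS04 | <NA> | <NA> | NCIS | 테트라플루오르화 규소 | Silicon tetrafluoride |
| 93 | 94 | 7783-61-1 | GHS05 | <NA> | <NA> | NCIS | 테트라플루오르화 규소 | Silicon tetrafluoride |
| 94 | 95 | 7783-61-1 | GHS06 | <NA> | <NA> | NCIS | 테트라플루오르화 규소 | Silicon tetrafluoride |
| 95 | 96 | 11113-75-0 | GHS06 | <NA> | <NA> | NCIS | <NA> | Nickel sulfide |
| 96 | 97 | 11113-75-0 | GHS08 | <NA> | <NA> | NCIS | <NA> | Nickel sulfide |
| 97 | 98 | 11113-75-0 | GHS09 | <NA> | <NA> | NCIS | <NA> | Nickel sulfide |
| 98 | 99 | 1314-06-3 | GHS07 | <NA> | <NA> | NCIS | <NA> | Dinickel trioxide |
| 99 | 100 | 1314-06-3 | GHS08 | <NA> | <NA> | NCIS | <NA> | Dinickel trioxide |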