Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 24808
Missing cells (%): 24.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 878.9 KiB
Average record size in memory: 90.0 B
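
The derived figures in this block follow from the raw counts. The sketch below recomputes the missing-cell percentage and the average record size from the numbers listed above; the function names are hypothetical helpers, not part of the profiling tool.

```python
# Figures copied from the "Dataset statistics" block.
N_VARIABLES = 10
N_OBSERVATIONS = 10_000
MISSING_CELLS = 24_808
TOTAL_SIZE_KIB = 878.9


def missing_cell_percentage(missing, n_rows, n_cols):
    """Share of empty cells among all cells, in percent."""
    return 100.0 * missing / (n_rows * n_cols)


def average_record_size_bytes(total_kib, n_rows):
    """Average in-memory size of one row, in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


print(f"{missing_cell_percentage(MISSING_CELLS, N_OBSERVATIONS, N_VARIABLES):.1f}%")
print(f"{average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS):.1f} B")
```

Both results agree with the reported 24.8% and 90.0 B after rounding to one decimal place.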

Variable types

Text: 6
Categorical: 2
Unsupported: 2

Alerts

GHS코드 is highly overall correlated with 유해위험세부분류 (High correlation)
유해위험세부분류 is highly overall correlated with GHS코드 (High correlation)
국문 has 4808 (48.1%) missing values (Missing)
Unnamed: 8 has 10000 (100.0%) missing values (Missing)
Unnamed: 9 has 10000 (100.0%) missing values (Missing)
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2024-01-09 21:51:10.987825
Analysis finished: 2024-01-09 21:51:11.973581
Duration: 0.99 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

고유(CAS)번호
Text

Distinct: 6781
Distinct (%): 67.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 11
Mean length: 9.0037
Min length: 7

Characters and Unicode

Total characters: 90037
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
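
As a rough illustration of this kind of character analysis, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. That module is a stand-in chosen for the example and is not necessarily what the report generator uses. The report's "Decimal Number" and "Dash Punctuation" labels correspond to the general categories `Nd` and `Pd`; `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample values listed for this variable.
sample = ["50-84-0", "625-43-4", "1310-73-2", "461-58-5", "157299-02-0"]
counts = category_counts(sample)
print(counts["Nd"], counts["Pd"])  # digits, dashes
```

For these five values the sketch finds only the two categories the report lists for the whole column.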

Unique

Unique: 4499
Unique (%): 45.0%

Sample

1st row: 50-84-0
2nd row: 625-43-4
3rd row: 1310-73-2
4th row: 461-58-5
5th row: 157299-02-0
Value | Count | Frequency (%)
16337-84-1 | 8 | 0.1%
96-29-7 | 7 | 0.1%
822-36-6 | 7 | 0.1%
3121-61-7 | 7 | 0.1%
15571-58-1 | 7 | 0.1%
593-74-8 | 7 | 0.1%
584-79-2 | 7 | 0.1%
108-90-7 | 6 | 0.1%
7785-87-7 | 6 | 0.1%
7446-14-2 | 6 | 0.1%
Other values (6771) | 9932 | 99.3%
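
Because this column holds CAS Registry Numbers, its values can be screened with the standard CAS check-digit rule: the last digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... from right to left) modulo 10. The sketch below is one possible validator; `is_valid_cas` and `CAS_PATTERN` are hypothetical names, and whether any values in this dataset fail the check is not something the report states.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(value):
    """Return True if `value` has CAS format and a correct check digit."""
    match = CAS_PATTERN.match(value)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... from right to left, then reduce modulo 10.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


for cas in ["50-84-0", "625-43-4", "1310-73-2", "461-58-5", "157299-02-0"]:
    print(cas, is_valid_cas(cas))
```

The pattern accepts lengths from 7 to 12 characters, which matches the minimum and maximum lengths reported for this variable.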

Most occurring characters

ValueCountFrequency (%)
- 20000
22.2%
1 9149
10.2%
0 7156
 
7.9%
5 7017
 
7.8%
6 7000
 
7.8%
2 6996
 
7.8%
3 6889
 
7.7%
8 6592
 
7.3%
7 6541
 
7.3%
4 6428
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 70037
77.8%
Dash Punctuation 20000
 
22.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 9149
13.1%
0 7156
10.2%
5 7017
10.0%
6 7000
10.0%
2 6996
10.0%
3 6889
9.8%
8 6592
9.4%
7 6541
9.3%
4 6428
9.2%
9 6269
9.0%
Dash Punctuation
ValueCountFrequency (%)
- 20000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 90037
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 20000
22.2%
1 9149
10.2%
0 7156
 
7.9%
5 7017
 
7.8%
6 7000
 
7.8%
2 6996
 
7.8%
3 6889
 
7.7%
8 6592
 
7.3%
7 6541
 
7.3%
4 6428
 
7.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90037
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 20000
22.2%
1 9149
10.2%
0 7156
 
7.9%
5 7017
 
7.8%
6 7000
 
7.8%
2 6996
 
7.8%
3 6889
 
7.7%
8 6592
 
7.3%
7 6541
 
7.3%
4 6428
 
7.1%

영문
Text

Distinct: 6764
Distinct (%): 67.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 285
Median length: 191
Mean length: 34.3778
Min length: 3

Characters and Unicode

Total characters: 343778
Distinct characters: 105
Distinct categories: 13
Distinct scripts: 3
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4475
Unique (%): 44.8%

Sample

1st row: 2,4-Dichlorobenzoic acid
2nd row: N-Methylisobutylamine
3rd row: Sodium,hydroxide
4th row: Cyanoguanidine
5th row: 1,3-Bis(1-isocyanato-1-methylethyl)benzene homopolymer
Value | Count | Frequency (%)
acid | 1098 | 5.7%
salt | 393 | 2.0%
ester | 330 | 1.7%
with | 294 | 1.5%
sodium | 252 | 1.3%
homopolymer | 187 | 1.0%
polymer | 175 | 0.9%
and | 134 | 0.7%
chloride | 128 | 0.7%
hydrochloride | 96 | 0.5%
Other values (7794) | 16087 | 83.9%

Most occurring characters

ValueCountFrequency (%)
e 29692
 
8.6%
o 23342
 
6.8%
i 21866
 
6.4%
l 19980
 
5.8%
- 19605
 
5.7%
t 17982
 
5.2%
n 17831
 
5.2%
a 17559
 
5.1%
y 15595
 
4.5%
h 14669
 
4.3%
Other values (95) 145657
42.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 256169
74.5%
Decimal Number 19828
 
5.8%
Dash Punctuation 19605
 
5.7%
Uppercase Letter 15122
 
4.4%
Other Punctuation 13685
 
4.0%
Space Separator 9176
 
2.7%
Open Punctuation 4889
 
1.4%
Close Punctuation 4871
 
1.4%
Math Symbol 400
 
0.1%
Modifier Symbol 22
 
< 0.1%
Other values (3) 11
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 29692
11.6%
o 23342
 
9.1%
i 21866
 
8.5%
l 19980
 
7.8%
t 17982
 
7.0%
n 17831
 
7.0%
a 17559
 
6.9%
y 15595
 
6.1%
h 14669
 
5.7%
r 13676
 
5.3%
Other values (25) 63977
25.0%
Uppercase Letter
ValueCountFrequency (%)
D 1760
11.6%
C 1518
10.0%
N 1383
9.1%
T 1162
 
7.7%
H 1159
 
7.7%
M 1144
 
7.6%
P 1030
 
6.8%
B 996
 
6.6%
A 933
 
6.2%
E 717
 
4.7%
Other values (16) 3320
22.0%
Other Punctuation
ValueCountFrequency (%)
, 11193
81.8%
; 991
 
7.2%
' 710
 
5.2%
. 556
 
4.1%
: 159
 
1.2%
* 33
 
0.2%
16
 
0.1%
/ 13
 
0.1%
% 4
 
< 0.1%
" 4
 
< 0.1%
Other values (3) 6
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 5286
26.7%
1 4942
24.9%
3 2855
14.4%
4 2771
14.0%
5 1267
 
6.4%
6 1038
 
5.2%
7 532
 
2.7%
0 428
 
2.2%
8 395
 
2.0%
9 314
 
1.6%
Math Symbol
ValueCountFrequency (%)
= 256
64.0%
+ 89
 
22.2%
± 26
 
6.5%
14
 
3.5%
~ 6
 
1.5%
4
 
1.0%
< 4
 
1.0%
> 1
 
0.2%
Close Punctuation
ValueCountFrequency (%)
) 3561
73.1%
] 1308
 
26.9%
} 2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 3560
72.8%
[ 1328
 
27.2%
{ 1
 
< 0.1%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 19605
100.0%
Space Separator
ValueCountFrequency (%)
9176
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 22
100.0%
Final Punctuation
ValueCountFrequency (%)
6
100.0%
Letter Number
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 270740
78.8%
Common 72484
 
21.1%
Greek 554
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 29692
 
11.0%
o 23342
 
8.6%
i 21866
 
8.1%
l 19980
 
7.4%
t 17982
 
6.6%
n 17831
 
6.6%
a 17559
 
6.5%
y 15595
 
5.8%
h 14669
 
5.4%
r 13676
 
5.1%
Other values (43) 78548
29.0%
Common
ValueCountFrequency (%)
- 19605
27.0%
, 11193
15.4%
9176
12.7%
2 5286
 
7.3%
1 4942
 
6.8%
) 3561
 
4.9%
( 3560
 
4.9%
3 2855
 
3.9%
4 2771
 
3.8%
[ 1328
 
1.8%
Other values (33) 8207
11.3%
Greek
ValueCountFrequency (%)
α 319
57.6%
ω 101
 
18.2%
β 90
 
16.2%
γ 15
 
2.7%
κ 11
 
2.0%
μ 7
 
1.3%
η 5
 
0.9%
δ 3
 
0.5%
λ 3
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 343153
99.8%
None 580
 
0.2%
Punctuation 24
 
< 0.1%
Arrows 14
 
< 0.1%
Math Operators 4
 
< 0.1%
Number Forms 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 29692
 
8.7%
o 23342
 
6.8%
i 21866
 
6.4%
l 19980
 
5.8%
- 19605
 
5.7%
t 17982
 
5.2%
n 17831
 
5.2%
a 17559
 
5.1%
y 15595
 
4.5%
h 14669
 
4.3%
Other values (78) 145032
42.3%
None
ValueCountFrequency (%)
α 319
55.0%
ω 101
 
17.4%
β 90
 
15.5%
± 26
 
4.5%
γ 15
 
2.6%
κ 11
 
1.9%
μ 7
 
1.2%
η 5
 
0.9%
δ 3
 
0.5%
λ 3
 
0.5%
Punctuation
ValueCountFrequency (%)
16
66.7%
6
 
25.0%
1
 
4.2%
1
 
4.2%
Arrows
ValueCountFrequency (%)
14
100.0%
Math Operators
ValueCountFrequency (%)
4
100.0%
Number Forms
ValueCountFrequency (%)
3
100.0%

국문
Text

MISSING 

Distinct: 3424
Distinct (%): 65.9%
Missing: 4808
Missing (%): 48.1%
Memory size: 156.2 KiB

Length

Max length: 159
Median length: 82
Mean length: 12.843413
Min length: 1

Characters and Unicode

Total characters: 66683
Distinct characters: 452
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2172
Unique (%): 41.8%

Sample

1st row: 수산화나트륨
2nd row: 사이아노구아니딘
3rd row: 2,4-다이나이트로페놀
4th row: 3,4-다이클로로톨루엔
5th row: 스칸듐과,결합한,안티모니,화합물(1,:,1)
Value | Count | Frequency (%)
1:1 | 16 | 0.3%
 | 15 | 0.3%
니켈 | 12 | 0.2%
메틸 | 11 | 0.2%
에스터 | 8 | 0.1%
리튬 | 8 | 0.1%
에틸렌,글라이콜,메틸,에테르,아크릴레이트 | 7 | 0.1%
황산 | 7 | 0.1%
소듐 | 7 | 0.1%
다이메틸수은 | 7 | 0.1%
Other values (3565) | 5455 | 98.2%

Most occurring characters

ValueCountFrequency (%)
- 4935
 
7.4%
4640
 
7.0%
, 4487
 
6.7%
2689
 
4.0%
2098
 
3.1%
1851
 
2.8%
1798
 
2.7%
2 1378
 
2.1%
1327
 
2.0%
1276
 
1.9%
Other values (442) 40204
60.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 48196
72.3%
Decimal Number 4974
 
7.5%
Dash Punctuation 4935
 
7.4%
Other Punctuation 4854
 
7.3%
Uppercase Letter 1033
 
1.5%
Open Punctuation 956
 
1.4%
Close Punctuation 953
 
1.4%
Lowercase Letter 364
 
0.5%
Space Separator 361
 
0.5%
Math Symbol 54
 
0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4640
 
9.6%
2689
 
5.6%
2098
 
4.4%
1851
 
3.8%
1798
 
3.7%
1327
 
2.8%
1276
 
2.6%
1257
 
2.6%
1144
 
2.4%
988
 
2.0%
Other values (354) 29128
60.4%
Lowercase Letter
ValueCountFrequency (%)
t 51
14.0%
e 40
11.0%
a 39
10.7%
r 32
 
8.8%
o 21
 
5.8%
n 20
 
5.5%
k 20
 
5.5%
α 19
 
5.2%
p 17
 
4.7%
c 12
 
3.3%
Other values (20) 93
25.5%
Uppercase Letter
ValueCountFrequency (%)
N 313
30.3%
I 193
18.7%
O 92
 
8.9%
C 76
 
7.4%
H 58
 
5.6%
S 57
 
5.5%
T 47
 
4.5%
R 44
 
4.3%
E 29
 
2.8%
P 28
 
2.7%
Other values (10) 96
 
9.3%
Other Punctuation
ValueCountFrequency (%)
, 4487
92.4%
' 160
 
3.3%
. 82
 
1.7%
: 64
 
1.3%
* 13
 
0.3%
/ 13
 
0.3%
10
 
0.2%
· 9
 
0.2%
; 7
 
0.1%
% 4
 
0.1%
Other values (3) 5
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 1378
27.7%
1 1194
24.0%
3 809
16.3%
4 789
15.9%
5 346
 
7.0%
6 223
 
4.5%
7 71
 
1.4%
9 62
 
1.2%
8 56
 
1.1%
0 46
 
0.9%
Math Symbol
ValueCountFrequency (%)
+ 34
63.0%
= 10
 
18.5%
~ 7
 
13.0%
± 2
 
3.7%
< 1
 
1.9%
Close Punctuation
ValueCountFrequency (%)
) 793
83.2%
] 158
 
16.6%
} 2
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 790
82.6%
[ 165
 
17.3%
{ 1
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4935
100.0%
Space Separator
ValueCountFrequency (%)
361
100.0%
Initial Punctuation
ValueCountFrequency (%)
2
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 48196
72.3%
Common 17090
 
25.6%
Latin 1352
 
2.0%
Greek 45
 
0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4640
 
9.6%
2689
 
5.6%
2098
 
4.4%
1851
 
3.8%
1798
 
3.7%
1327
 
2.8%
1276
 
2.6%
1257
 
2.6%
1144
 
2.4%
988
 
2.0%
Other values (354) 29128
60.4%
Latin
ValueCountFrequency (%)
N 313
23.2%
I 193
14.3%
O 92
 
6.8%
C 76
 
5.6%
H 58
 
4.3%
S 57
 
4.2%
t 51
 
3.8%
T 47
 
3.5%
R 44
 
3.3%
e 40
 
3.0%
Other values (33) 381
28.2%
Common
ValueCountFrequency (%)
- 4935
28.9%
, 4487
26.3%
2 1378
 
8.1%
1 1194
 
7.0%
3 809
 
4.7%
) 793
 
4.6%
( 790
 
4.6%
4 789
 
4.6%
361
 
2.1%
5 346
 
2.0%
Other values (28) 1208
 
7.1%
Greek
ValueCountFrequency (%)
α 19
42.2%
κ 9
20.0%
μ 5
 
11.1%
η 4
 
8.9%
λ 3
 
6.7%
β 3
 
6.7%
ω 2
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 48196
72.3%
ASCII 18418
 
27.6%
None 56
 
0.1%
Punctuation 13
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 4935
26.8%
, 4487
24.4%
2 1378
 
7.5%
1 1194
 
6.5%
3 809
 
4.4%
) 793
 
4.3%
( 790
 
4.3%
4 789
 
4.3%
361
 
2.0%
5 346
 
1.9%
Other values (66) 2536
13.8%
Hangul
ValueCountFrequency (%)
4640
 
9.6%
2689
 
5.6%
2098
 
4.4%
1851
 
3.8%
1798
 
3.7%
1327
 
2.8%
1276
 
2.6%
1257
 
2.6%
1144
 
2.4%
988
 
2.0%
Other values (354) 29128
60.4%
None
ValueCountFrequency (%)
α 19
33.9%
· 9
16.1%
κ 9
16.1%
μ 5
 
8.9%
η 4
 
7.1%
λ 3
 
5.4%
β 3
 
5.4%
± 2
 
3.6%
ω 2
 
3.6%
Punctuation
ValueCountFrequency (%)
10
76.9%
2
 
15.4%
1
 
7.7%

유해위험코드
Text

Distinct: 65
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 40000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: H319
2nd row: H224
3rd row: H290
4th row: H312
5th row: H334
Value | Count | Frequency (%)
h319 | 1155 | 11.6%
h315 | 1094 | 10.9%
h335 | 814 | 8.1%
h302 | 751 | 7.5%
h400 | 471 | 4.7%
h317 | 435 | 4.3%
h410 | 395 | 4.0%
h314 | 381 | 3.8%
h318 | 359 | 3.6%
h332 | 309 | 3.1%
Other values (55) | 3836 | 38.4%

Most occurring characters

ValueCountFrequency (%)
H 10000
25.0%
3 9891
24.7%
1 6079
15.2%
0 3376
 
8.4%
2 2883
 
7.2%
5 2299
 
5.7%
4 2185
 
5.5%
9 1179
 
2.9%
7 1044
 
2.6%
6 625
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 30000
75.0%
Uppercase Letter 10000
 
25.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 9891
33.0%
1 6079
20.3%
0 3376
 
11.3%
2 2883
 
9.6%
5 2299
 
7.7%
4 2185
 
7.3%
9 1179
 
3.9%
7 1044
 
3.5%
6 625
 
2.1%
8 439
 
1.5%
Uppercase Letter
ValueCountFrequency (%)
H 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 30000
75.0%
Latin 10000
 
25.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 9891
33.0%
1 6079
20.3%
0 3376
 
11.3%
2 2883
 
9.6%
5 2299
 
7.7%
4 2185
 
7.3%
9 1179
 
3.9%
7 1044
 
3.5%
6 625
 
2.1%
8 439
 
1.5%
Latin
ValueCountFrequency (%)
H 10000
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
H 10000
25.0%
3 9891
24.7%
1 6079
15.2%
0 3376
 
8.4%
2 2883
 
7.2%
5 2299
 
5.7%
4 2185
 
5.5%
9 1179
 
2.9%
7 1044
 
2.6%
6 625
 
1.6%

GHS코드
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

GHS07: 4926
GHS08: 1278
GHS09: 1118
GHS06: 815
GHS05: 764
Other values (5): 1099

Length

Max length: 5
Median length: 5
Mean length: 4.9454
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GHS07
2nd row: GHS02
3rd row: GHS05
4th row: GHS07
5th row: GHS08

Common Values

Value | Count | Frequency (%)
GHS07 | 4926 | 49.3%
GHS08 | 1278 | 12.8%
GHS09 | 1118 | 11.2%
GHS06 | 815 | 8.2%
GHS05 | 764 | 7.6%
<NA> | 546 | 5.5%
GHS02 | 463 | 4.6%
GHS03 | 41 | 0.4%
GHS04 | 37 | 0.4%
GHS01 | 12 | 0.1%
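
The table lists `<NA>` as a value with 546 rows, while this variable reports 0 missing values. The length statistics (minimum 4, mean 4.9454) are consistent with `<NA>` being counted as a literal four-character string. The sketch below recomputes the mean length from the counts and shows one way such placeholders could be converted to real missing values. `PLACEHOLDERS` and `normalize_placeholders` are hypothetical names, and the report does not say whether the string was present in the source file or introduced by a cast before profiling.

```python
from collections import Counter

# Counts copied from the "Common Values" table for GHS코드.
GHS_COUNTS = Counter({
    "GHS07": 4926, "GHS08": 1278, "GHS09": 1118, "GHS06": 815, "GHS05": 764,
    "<NA>": 546, "GHS02": 463, "GHS03": 41, "GHS04": 37, "GHS01": 12,
})


def mean_length(counts):
    """Mean string length, weighting each value by its row count."""
    total_rows = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())
    return total_chars / total_rows


# Assumption: "<NA>" is the only placeholder string in this column.
PLACEHOLDERS = frozenset({"<NA>"})


def normalize_placeholders(values, placeholders=PLACEHOLDERS):
    """Replace placeholder strings with None so they count as missing."""
    return [None if v in placeholders else v for v in values]


print(round(mean_length(GHS_COUNTS), 4))
print(normalize_placeholders(["GHS07", "<NA>", "GHS02"]))
```

If the placeholders were normalized this way before profiling, the variable would be expected to show 546 missing values and 9 distinct codes.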

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
ghs07 | 4926 | 49.3%
ghs08 | 1278 | 12.8%
ghs09 | 1118 | 11.2%
ghs06 | 815 | 8.2%
ghs05 | 764 | 7.6%
na | 546 | 5.5%
ghs02 | 463 | 4.6%
ghs03 | 41 | 0.4%
ghs04 | 37 | 0.4%
ghs01 | 12 | 0.1%

유해위험세부분류
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

심한 눈 손상 또는 눈 자극성: 1572
피부 부식성 또는 자극성: 1495
급성독성-경구: 1090
특정 표적장기 독성-1회 노출: 1070
수생환경 유해성-만성: 959
Other values (23): 3814

Length

Max length: 22
Median length: 18
Mean length: 11.05
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 심한 눈 손상 또는 눈 자극성
2nd row: 인화성 액체
3rd row: 금속부식성 물질
4th row: 급성독성-경피
5th row: 호흡기 과민성

Common Values

Value | Count | Frequency (%)
심한 눈 손상 또는 눈 자극성 | 1572 | 15.7%
피부 부식성 또는 자극성 | 1495 | 14.9%
급성독성-경구 | 1090 | 10.9%
특정 표적장기 독성-1회 노출 | 1070 | 10.7%
수생환경 유해성-만성 | 959 | 9.6%
급성독성-흡입 | 617 | 6.2%
수생환경 유해성-급성 | 533 | 5.3%
급성독성-경피 | 440 | 4.4%
피부 과민성 | 435 | 4.3%
인화성 액체 | 396 | 4.0%
Other values (18) | 1393 | 13.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
눈 | 3144 | 10.6%
또는 | 3067 | 10.3%
자극성 | 3067 | 10.3%
피부 | 1930 | 6.5%
심한 | 1572 | 5.3%
손상 | 1572 | 5.3%
부식성 | 1495 | 5.0%
수생환경 | 1492 | 5.0%
특정 | 1463 | 4.9%
표적장기 | 1463 | 4.9%
Other values (33) | 9473 | 31.9%

유해성 항목
Text

Distinct: 65
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 63
Median length: 41
Mean length: 23.5113
Min length: 14

Characters and Unicode

Total characters: 235113
Distinct characters: 94
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 심한 눈 손상/눈 자극성(3.3)의 구분 2(2A)
2nd row: 인화성 액체(2.6)의 구분 1
3rd row: 금속부식성 물질(2.16)의 구분 1
4th row: 급성독성-경피(3.1)의 구분 4
5th row: 호흡기 과민성(3.4)의 구분 1(1A, 1B)
Value | Count | Frequency (%)
구분 | 10041 | 21.1%
2 | 2581 | 5.4%
3 | 2032 | 4.3%
1 | 1985 | 4.2%
심한 | 1572 | 3.3%
손상/눈 | 1572 | 3.3%
자극성(3.3)의 | 1572 | 3.3%
눈 | 1572 | 3.3%
4 | 1506 | 3.2%
피부부식성/자극성(3.2)의 | 1495 | 3.1%
Other values (62) | 21712 | 45.6%

Most occurring characters

ValueCountFrequency (%)
37640
16.0%
15139
 
6.4%
) 12374
 
5.3%
( 12374
 
5.3%
3 11523
 
4.9%
11148
 
4.7%
1 10825
 
4.6%
. 10058
 
4.3%
10058
 
4.3%
10040
 
4.3%
Other values (84) 93934
40.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 113156
48.1%
Space Separator 37640
 
16.0%
Decimal Number 35858
 
15.3%
Other Punctuation 16185
 
6.9%
Close Punctuation 12374
 
5.3%
Open Punctuation 12374
 
5.3%
Uppercase Letter 3916
 
1.7%
Dash Punctuation 3610
 
1.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
15139
 
13.4%
11148
 
9.9%
10058
 
8.9%
10040
 
8.9%
3915
 
3.5%
3881
 
3.4%
3871
 
3.4%
3449
 
3.0%
3144
 
2.8%
3022
 
2.7%
Other values (64) 45489
40.2%
Decimal Number
ValueCountFrequency (%)
3 11523
32.1%
1 10825
30.2%
2 7170
20.0%
4 3549
 
9.9%
8 1086
 
3.0%
6 667
 
1.9%
9 399
 
1.1%
7 304
 
0.8%
5 229
 
0.6%
0 106
 
0.3%
Other Punctuation
ValueCountFrequency (%)
. 10058
62.1%
/ 3067
 
18.9%
, 3060
 
18.9%
Uppercase Letter
ValueCountFrequency (%)
A 2316
59.1%
B 1219
31.1%
C 381
 
9.7%
Space Separator
ValueCountFrequency (%)
37640
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12374
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12374
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3610
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 118041
50.2%
Hangul 113156
48.1%
Latin 3916
 
1.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
15139
 
13.4%
11148
 
9.9%
10058
 
8.9%
10040
 
8.9%
3915
 
3.5%
3881
 
3.4%
3871
 
3.4%
3449
 
3.0%
3144
 
2.8%
3022
 
2.7%
Other values (64) 45489
40.2%
Common
ValueCountFrequency (%)
37640
31.9%
) 12374
 
10.5%
( 12374
 
10.5%
3 11523
 
9.8%
1 10825
 
9.2%
. 10058
 
8.5%
2 7170
 
6.1%
- 3610
 
3.1%
4 3549
 
3.0%
/ 3067
 
2.6%
Other values (7) 5851
 
5.0%
Latin
ValueCountFrequency (%)
A 2316
59.1%
B 1219
31.1%
C 381
 
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 121957
51.9%
Hangul 113156
48.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37640
30.9%
) 12374
 
10.1%
( 12374
 
10.1%
3 11523
 
9.4%
1 10825
 
8.9%
. 10058
 
8.2%
2 7170
 
5.9%
- 3610
 
3.0%
4 3549
 
2.9%
/ 3067
 
2.5%
Other values (10) 9767
 
8.0%
Hangul
ValueCountFrequency (%)
15139
 
13.4%
11148
 
9.9%
10058
 
8.9%
10040
 
8.9%
3915
 
3.5%
3881
 
3.4%
3871
 
3.4%
3449
 
3.0%
3144
 
2.8%
3022
 
2.7%
Other values (64) 45489
40.2%

유해위험문구
Text

Distinct: 65
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 162
Median length: 125
Mean length: 26.6686
Min length: 6

Characters and Unicode

Total characters: 266686
Distinct characters: 151
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 눈에 심한 자극을 일으킴
2nd row: 극인화성 액체 및 증기
3rd row: 금속을 부식시킬 수 있음
4th row: 피부와 접촉하면 유해함
5th row: 흡입 시 알레르기성 반응, 천식 또는 호흡 곤란 등을 일으킬 수 있음
Value | Count | Frequency (%)
일으킴 | 3288 | 4.8%
자극을 | 3141 | 4.6%
있음 | 2366 | 3.5%
일으킬 | 2360 | 3.5%
 | 2262 | 3.3%
의해 | 2066 | 3.0%
눈에 | 1953 | 2.9%
심한 | 1895 | 2.8%
유독함 | 1703 | 2.5%
유해함 | 1538 | 2.2%
Other values (128) | 45801 | 67.0%

Most occurring characters

ValueCountFrequency (%)
58373
 
21.9%
8909
 
3.3%
8705
 
3.3%
8639
 
3.2%
7959
 
3.0%
6472
 
2.4%
5833
 
2.2%
5705
 
2.1%
4671
 
1.8%
3875
 
1.5%
Other values (141) 147545
55.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 197949
74.2%
Space Separator 58373
 
21.9%
Other Punctuation 4250
 
1.6%
Close Punctuation 2925
 
1.1%
Open Punctuation 2925
 
1.1%
Decimal Number 264
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8909
 
4.5%
8705
 
4.4%
8639
 
4.4%
7959
 
4.0%
6472
 
3.3%
5833
 
2.9%
5705
 
2.9%
4671
 
2.4%
3875
 
2.0%
3828
 
1.9%
Other values (134) 133353
67.4%
Other Punctuation
ValueCountFrequency (%)
. 2982
70.2%
, 1178
 
27.7%
; 90
 
2.1%
Space Separator
ValueCountFrequency (%)
58373
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2925
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2925
100.0%
Decimal Number
ValueCountFrequency (%)
1 264
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 197949
74.2%
Common 68737
 
25.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8909
 
4.5%
8705
 
4.4%
8639
 
4.4%
7959
 
4.0%
6472
 
3.3%
5833
 
2.9%
5705
 
2.9%
4671
 
2.4%
3875
 
2.0%
3828
 
1.9%
Other values (134) 133353
67.4%
Common
ValueCountFrequency (%)
58373
84.9%
. 2982
 
4.3%
) 2925
 
4.3%
( 2925
 
4.3%
, 1178
 
1.7%
1 264
 
0.4%
; 90
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 197949
74.2%
ASCII 68737
 
25.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
58373
84.9%
. 2982
 
4.3%
) 2925
 
4.3%
( 2925
 
4.3%
, 1178
 
1.7%
1 264
 
0.4%
; 90
 
0.1%
Hangul
ValueCountFrequency (%)
8909
 
4.5%
8705
 
4.4%
8639
 
4.4%
7959
 
4.0%
6472
 
3.3%
5833
 
2.9%
5705
 
2.9%
4671
 
2.4%
3875
 
2.0%
3828
 
1.9%
Other values (134) 133353
67.4%

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

Correlations

 | 유해위험코드 | GHS코드 | 유해위험세부분류 | 유해성 항목 | 유해위험문구
유해위험코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
GHS코드 | 1.000 | 1.000 | 0.980 | 1.000 | 1.000
유해위험세부분류 | 1.000 | 0.980 | 1.000 | 1.000 | 1.000
유해성 항목 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
유해위험문구 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

 | 유해위험세부분류 | GHS코드
유해위험세부분류 | 1.000 | 0.875
GHS코드 | 0.875 | 1.000

 | GHS코드 | 유해위험세부분류
GHS코드 | 1.000 | 0.875
유해위험세부분류 | 0.875 | 1.000
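
The report does not state which association measure produced these matrices. One common choice for pairs of categorical columns is Cramér's V, and the sketch below shows the uncorrected form of it as an illustration only. Treat the link to this report as an assumption; `cramers_v` is a hypothetical helper, and a bias-corrected variant would give somewhat different values.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    k = min(len(x_totals), len(y_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy data: each class maps to exactly one code, so the association is perfect.
classes = ["a", "b", "c"] * 10
codes = [{"a": "X", "b": "Y", "c": "Z"}[c] for c in classes]
print(cramers_v(classes, codes))
```

A value near 1 means that one column almost determines the other, which is what the High correlation alert for GHS코드 and 유해위험세부분류 indicates.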

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
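
The per-column nullity that these plots visualize can be tallied directly. The sketch below counts missing entries per column over a list of row dictionaries and flags columns that are empty in every row, as `Unnamed: 8` and `Unnamed: 9` are in this dataset. The helper names and the two demo rows are illustrative assumptions, with `None` standing in for the report's `<NA>`.

```python
def missing_by_column(rows, columns):
    """Count None entries per column across a list of row dictionaries."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            if row.get(column) is None:
                counts[column] += 1
    return counts


def fully_missing(counts, n_rows):
    """Columns whose missing count equals the number of rows."""
    return [column for column, n in counts.items() if n == n_rows]


# Two illustrative rows using a subset of the dataset's columns.
columns = ["국문", "GHS코드", "Unnamed: 8"]
rows = [
    {"국문": None, "GHS코드": "GHS07", "Unnamed: 8": None},
    {"국문": "수산화나트륨", "GHS코드": "GHS05", "Unnamed: 8": None},
]
counts = missing_by_column(rows, columns)
print(counts)
print(fully_missing(counts, len(rows)))
```

Columns returned by `fully_missing` carry no information and are natural candidates for removal before further analysis.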

Sample

고유(CAS)번호영문국문유해위험코드GHS코드유해위험세부분류유해성 항목유해위험문구Unnamed: 8Unnamed: 9
4347650-84-02,4-Dichlorobenzoic acid<NA>H319GHS07심한 눈 손상 또는 눈 자극성심한 눈 손상/눈 자극성(3.3)의 구분 2(2A)눈에 심한 자극을 일으킴<NA><NA>
39194625-43-4N-Methylisobutylamine<NA>H224GHS02인화성 액체인화성 액체(2.6)의 구분 1극인화성 액체 및 증기<NA><NA>
68361310-73-2Sodium,hydroxide수산화나트륨H290GHS05금속부식성 물질금속부식성 물질(2.16)의 구분 1금속을 부식시킬 수 있음<NA><NA>
43961461-58-5Cyanoguanidine사이아노구아니딘H312GHS07급성독성-경피급성독성-경피(3.1)의 구분 4피부와 접촉하면 유해함<NA><NA>
53051157299-02-01,3-Bis(1-isocyanato-1-methylethyl)benzene homopolymer<NA>H334GHS08호흡기 과민성호흡기 과민성(3.4)의 구분 1(1A, 1B)흡입 시 알레르기성 반응, 천식 또는 호흡 곤란 등을 일으킬 수 있음<NA><NA>
353851-28-52,4-Dinitrophenol2,4-다이나이트로페놀H331GHS06급성독성-흡입급성독성-흡입(3.1)의 구분 3흡입하면 유독함<NA><NA>
1839795-75-03,4-Dichlorotoluene3,4-다이클로로톨루엔H227<NA>인화성 액체인화성 액체(2.6)의 구분 4가연성 액체<NA><NA>
309779003-09-02Methoxyethene homopolymer<NA>H280GHS04고압가스고압가스(2.5)의 구분 1, 2, 4고압가스 포함; 가열하면 폭발할 수 있음<NA><NA>
739612166-36-8Antimony,,compound,with,scandium,(1,:,1)스칸듐과,결합한,안티모니,화합물(1,:,1)H302GHS07급성독성-경구급성독성-경구(3.1)의 구분 4삼키면 유해함<NA><NA>
907510025-87-3Phosphorus,oxychloride산화염화,인H302GHS07급성독성-경구급성독성-경구(3.1)의 구분 4삼키면 유해함<NA><NA>
고유(CAS)번호영문국문유해위험코드GHS코드유해위험세부분류유해성 항목유해위험문구Unnamed: 8Unnamed: 9
886710101-96-9Nickel,selenite니켈,셀레나이트H331GHS06급성독성-흡입급성독성-흡입(3.1)의 구분 3흡입하면 유독함<NA><NA>
3604968855-54-9Kieselguhr, soda ash flux-calcined<NA>H332GHS07급성독성-흡입급성독성-흡입(3.1)의 구분 4흡입하면 유해함<NA><NA>
251157252-83-7Bromoacetaldehyde,dimethyl,acetal브로모아세트알데하이드,다이메틸,아세탈H319GHS07심한 눈 손상 또는 눈 자극성심한 눈 손상/눈 자극성(3.3)의 구분 2(2A)눈에 심한 자극을 일으킴<NA><NA>
21622123-42-24-Hydroxy-4-methyl-2-pentanone4-하이드록시-4-메틸-2-펜타논H319GHS07심한 눈 손상 또는 눈 자극성심한 눈 손상/눈 자극성(3.3)의 구분 2(2A)눈에 심한 자극을 일으킴<NA><NA>
40513349-06-02Nickel diformate니켈 디포메이트H341GHS08생식세포 변이원성생식세포 변이원성(3.5)의 구분 2유전적인 결함을 일으킬 것으로 의심됨 (유전적인 결함을 일으키는 노출 경로를 기재한다. 단, 다른 노출경로에 의해 유전적인 결함을 일으키지 않는다는 결정적인 증거가 있는 경우에 한한다.)<NA><NA>
1522674-97-5Chlorobromomethane클로로브로모메테인H336GHS07특정 표적장기 독성-1회 노출특정 표적장기 독성-1회 노출(3.8)의 구분 3, 마취 영향졸음 또는 현기증을 일으킬 수 있음<NA><NA>
2576210034-81-8Magnesium,perchlorate과염소산,마그네슘H302GHS07급성독성-경구급성독성-경구(3.1)의 구분 4삼키면 유해함<NA><NA>
47187302-17-02,2,2-Trichloro-1,1-ethanediol; Tosyl, chloral hydrate<NA>H370GHS08특정 표적장기 독성-1회 노출특정 표적장기 독성-1회 노출(3.8)의 구분 1장기(영향을 받는 것으로 알려진 모든 장기를 명시한다.)에 손상을 일으킴 (특정 표적장기 독성(1회 노출)을 일으키는 노출 경로를 기재. 단, 다른 노출경로에 의해 특정 표적장기 독성(1회 노출)을 일으키지 않는다는 결정적인 증거가 있는 경우에 한한다.)<NA><NA>
200951627-45-5O-Methylhydroxylamine, methanesulfonate (1:1)메탄술폰산 O-메틸하이드록실아민 (1:1)H410GHS09수생환경 유해성-만성수생환경 유해성(4.1)의 만성 구분 1장기적 영향에 의해 수생생물에 매우 유독함<NA><NA>
959716529-66-1trans-3-Pentenenitrile트렌스-3-펜텐니트릴H335GHS07특정 표적장기 독성-1회 노출특정 표적장기 독성-1회 노출(3.8)의 구분 3, 호흡기 자극호흡기 자극을 일으킬 수 있음<NA><NA>