Overview

Dataset statistics

Number of variables: 4
Number of observations: 191
Missing cells: 109
Missing cells (%): 14.3%
Duplicate rows: 4
Duplicate rows (%): 2.1%
Total size in memory: 6.3 KiB
Average record size in memory: 33.7 B
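The missing-cell figures follow from the per-variable counts reported further down: 102 missing values in 화학기호, 7 in 카스번호(CAS No), and none in the other two columns. A quick arithmetic check, assuming the percentage is taken over all cells (observations × variables):

```python
# Per-column missing counts as reported in this profile.
missing_by_column = {
    "가스명": 0,
    "화학기호": 102,
    "검사주기": 0,
    "카스번호(CAS No)": 7,
}
n_observations = 191
n_variables = len(missing_by_column)

missing_cells = sum(missing_by_column.values())
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, total_cells, f"{missing_pct:.1f}%")
```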

Variable types

Text: 3
Categorical: 1

Dataset

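Description: Property data (gas name, chemical symbol, inspection cycle) for the 191 toxic gases subject to inspection by the Korea Gas Safety Corporation (한국가스안전공사), published to give the general public an overview of toxic gases.
URL: https://www.data.go.kr/data/15067783/fileData.do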

Alerts

Dataset has 4 (2.1%) duplicate rows [Duplicates]
화학기호 has 102 (53.4%) missing values [Missing]
카스번호(CAS No) has 7 (3.7%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 02:37:16.059034
Analysis finished: 2023-12-12 02:37:16.486165
Duration: 0.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

가스명 (gas name)
Text

Distinct: 174
Distinct (%): 91.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 34
Median length: 23
Mean length: 9.0418848
Min length: 2

Characters and Unicode

Total characters: 1727
Distinct characters: 151
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
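ydata-profiling uses its own Unicode tables for these properties; the sketch below uses Python's standard `unicodedata` module instead, which exposes the same general-category property, to show how counts such as "Uppercase Letter" or "Other Letter" arise. The mapping from two-letter category codes to long names is written out by hand and only covers the categories seen in this profile.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this profile.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Tally the Unicode general category of every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the table above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# A few gas names taken from the sample rows of this dataset.
counts = category_counts(["염화수소", "0.1%B2H6/H2", "CH3CL(17%)+HF(83%)"])
print(counts)
```

Hangul syllables fall under "Other Letter", which is why that category matches the Hangul script and block counts in this column.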

Unique

Unique: 164
Unique (%): 85.9%

Sample

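1st row: 염화수소 (hydrogen chloride)
2nd row: 삼염화붕소 (boron trichloride)
3rd row: 사불화규소 (silicon tetrafluoride)
4th row: 육불화텅스텐 (tungsten hexafluoride)
5th row: 사불화유황 (sulfur tetrafluoride)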
Value | Count | Frequency (%)
0.1%b2h6/h2 | 5 | 2.2%
5%b2h6/n2 | 5 | 2.2%
co | 5 | 2.2%
(not shown) | 3 | 1.3%
bcl3 | 3 | 1.3%
n2+sif4 | 3 | 1.3%
15%b2h6 | 3 | 1.3%
암모니아 | 2 | 0.9%
toxic | 2 | 0.9%
0.95%f2/3.5%ar/ne | 2 | 0.9%
Other values (180) | 190 | 85.2%
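The value table above counts words rather than whole cells, which is why entries appear in lowercase (e.g. `0.1%b2h6/h2`). A minimal sketch of this kind of word count, assuming lowercasing, whitespace splitting, and stripping punctuation from both ends of each token. This approximates ydata-profiling's behaviour and is not its actual code.

```python
import string
from collections import Counter

def word_counts(values):
    """Count lowercase, whitespace-separated tokens with edge punctuation stripped."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # A bare ':' token strips to the empty string under this rule.
            counts[token.strip(string.punctuation)] += 1
    return counts

names = ["0.1%B2H6/H2", "0.1%B2H6/H2", "HBr acid", "B2H6 : 1333-74-0"]
counts = word_counts(names)
total = sum(counts.values())
for word, n in counts.most_common():
    print(f"{word} | {n} | {100 * n / total:.1f}%")
```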

Most occurring characters

Value | Count | Frequency (%)
H | 121 | 7.0%
2 | 116 | 6.7%
/ | 98 | 5.7%
C | 80 | 4.6%
N | 67 | 3.9%
% | 59 | 3.4%
O | 51 | 3.0%
B | 41 | 2.4%
3 | 40 | 2.3%
F | 40 | 2.3%
Other values (141) | 1014 | 58.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 669 | 38.7%
Other Letter | 327 | 18.9%
Decimal Number | 300 | 17.4%
Other Punctuation | 210 | 12.2%
Lowercase Letter | 120 | 6.9%
Space Separator | 32 | 1.9%
Math Symbol | 24 | 1.4%
Open Punctuation | 22 | 1.3%
Close Punctuation | 20 | 1.2%
Dash Punctuation | 3 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 28 | 8.6%
(not shown) | 22 | 6.7%
(not shown) | 11 | 3.4%
(not shown) | 11 | 3.4%
(not shown) | 10 | 3.1%
(not shown) | 9 | 2.8%
(not shown) | 9 | 2.8%
(not shown) | 8 | 2.4%
(not shown) | 7 | 2.1%
(not shown) | 7 | 2.1%
Other values (79) | 205 | 62.7%

Uppercase Letter
Value | Count | Frequency (%)
H | 121 | 18.1%
C | 80 | 12.0%
N | 67 | 10.0%
O | 51 | 7.6%
B | 41 | 6.1%
F | 40 | 6.0%
A | 37 | 5.5%
S | 35 | 5.2%
E | 28 | 4.2%
L | 23 | 3.4%
Other values (15) | 146 | 21.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 38 | 31.7%
r | 24 | 20.0%
i | 13 | 10.8%
l | 13 | 10.8%
o | 7 | 5.8%
n | 4 | 3.3%
a | 4 | 3.3%
t | 3 | 2.5%
s | 2 | 1.7%
d | 2 | 1.7%
Other values (8) | 10 | 8.3%

Decimal Number
Value | Count | Frequency (%)
2 | 116 | 38.7%
3 | 40 | 13.3%
1 | 34 | 11.3%
5 | 31 | 10.3%
6 | 26 | 8.7%
4 | 22 | 7.3%
0 | 20 | 6.7%
9 | 5 | 1.7%
8 | 4 | 1.3%
7 | 2 | 0.7%

Other Punctuation
Value | Count | Frequency (%)
/ | 98 | 46.7%
% | 59 | 28.1%
, | 31 | 14.8%
. | 22 | 10.5%

Space Separator
Value | Count | Frequency (%)
(space) | 32 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 24 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 22 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 20 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 789 | 45.7%
Common | 611 | 35.4%
Hangul | 327 | 18.9%
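Python's standard library has no lookup for the Unicode Script property, so the sketch below approximates the Latin / Common / Hangul split from character names. This is an assumption that happens to work for the characters in this column, not a general script classifier; for example, fullwidth Latin letters would be misclassified as Common.

```python
import unicodedata
from collections import Counter

def rough_script(ch):
    """Approximate the Unicode script of `ch` from its character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("HANGUL"):
        return "Hangul"
    # Digits, punctuation, spaces and math symbols are shared across scripts.
    return "Common"

def script_counts(values):
    """Count characters per (approximate) script over all values."""
    return Counter(rough_script(ch) for value in values for ch in value)

counts = script_counts(["염화수소", "PH3+Ar", "HBr acid"])
print(counts)
```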

Most frequent character per script

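Hangul
Value | Count | Frequency (%)
(not shown) | 28 | 8.6%
(not shown) | 22 | 6.7%
(not shown) | 11 | 3.4%
(not shown) | 11 | 3.4%
(not shown) | 10 | 3.1%
(not shown) | 9 | 2.8%
(not shown) | 9 | 2.8%
(not shown) | 8 | 2.4%
(not shown) | 7 | 2.1%
(not shown) | 7 | 2.1%
Other values (79) | 205 | 62.7%

Latin
Value | Count | Frequency (%)
H | 121 | 15.3%
C | 80 | 10.1%
N | 67 | 8.5%
O | 51 | 6.5%
B | 41 | 5.2%
F | 40 | 5.1%
e | 38 | 4.8%
A | 37 | 4.7%
S | 35 | 4.4%
E | 28 | 3.5%
Other values (33) | 251 | 31.8%

Common
Value | Count | Frequency (%)
2 | 116 | 19.0%
/ | 98 | 16.0%
% | 59 | 9.7%
3 | 40 | 6.5%
1 | 34 | 5.6%
(space) | 32 | 5.2%
5 | 31 | 5.1%
, | 31 | 5.1%
6 | 26 | 4.3%
+ | 24 | 3.9%
Other values (9) | 120 | 19.6%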

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1400 | 81.1%
Hangul | 327 | 18.9%
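Unicode blocks are plain code-point ranges, so block membership can be checked directly. A minimal sketch covering only the two blocks seen here: Basic Latin (U+0000..U+007F), which the profile labels ASCII, and Hangul Syllables (U+AC00..U+D7AF), presumably what the profile labels Hangul. Any other character is reported as "Other" by this hypothetical helper.

```python
from collections import Counter

# Inclusive code-point ranges for the two blocks that appear in this profile.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),    # Unicode "Basic Latin" block
    ("Hangul", 0xAC00, 0xD7AF),   # Unicode "Hangul Syllables" block
]

def block_of(ch):
    """Return the block label for `ch`, or 'Other' if it is in neither range."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

def block_counts(values):
    """Count characters per block over all values."""
    return Counter(block_of(ch) for value in values for ch in value)

counts = block_counts(["염화수소", "Hcl", "7647-01-0"])
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name} | {n} | {100 * n / total:.1f}%")
```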

Most frequent character per block

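ASCII
Value | Count | Frequency (%)
H | 121 | 8.6%
2 | 116 | 8.3%
/ | 98 | 7.0%
C | 80 | 5.7%
N | 67 | 4.8%
% | 59 | 4.2%
O | 51 | 3.6%
B | 41 | 2.9%
3 | 40 | 2.9%
F | 40 | 2.9%
Other values (52) | 687 | 49.1%

Hangul
Value | Count | Frequency (%)
(not shown) | 28 | 8.6%
(not shown) | 22 | 6.7%
(not shown) | 11 | 3.4%
(not shown) | 11 | 3.4%
(not shown) | 10 | 3.1%
(not shown) | 9 | 2.8%
(not shown) | 9 | 2.8%
(not shown) | 8 | 2.4%
(not shown) | 7 | 2.1%
(not shown) | 7 | 2.1%
Other values (79) | 205 | 62.7%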

화학기호 (chemical symbol)
Text

MISSING 

Distinct: 75
Distinct (%): 84.3%
Missing: 102
Missing (%): 53.4%
Memory size: 1.6 KiB
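Distinct (%) and Unique (%) here are relative to the non-missing values (191 − 102 = 89), not to all 191 rows, while Missing (%) is relative to all rows; 75/191 would give about 39%, not the 84.3% shown. A quick check of the three percentages:

```python
n_rows = 191
n_missing = 102
n_distinct = 75
n_unique = 66

n_present = n_rows - n_missing  # values that are not missing

print(f"Missing (%):  {100 * n_missing / n_rows:.1f}%")      # share of all rows
print(f"Distinct (%): {100 * n_distinct / n_present:.1f}%")  # share of non-missing values
print(f"Unique (%):   {100 * n_unique / n_present:.1f}%")    # share of non-missing values
```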

Length

Max length: 20
Median length: 10
Mean length: 5.1460674
Min length: 1

Characters and Unicode

Total characters: 458
Distinct characters: 41
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 66
Unique (%): 74.2%

Sample

1st row: Hcl
2nd row: Bcl3
3rd row: SiF4
4th row: WF6
5th row: SF4
Value | Count | Frequency (%)
so2 | 4 | 4.4%
nh3 | 3 | 3.3%
bf3 | 3 | 3.3%
gef4 | 3 | 3.3%
15%b2h6 | 3 | 3.3%
sif4 | 2 | 2.2%
sih4 | 2 | 2.2%
bcl3 | 2 | 2.2%
b2h6 | 2 | 2.2%
hf | 2 | 2.2%
Other values (60) | 64 | 71.1%

Most occurring characters

Value | Count | Frequency (%)
H | 62 | 13.5%
2 | 50 | 10.9%
C | 32 | 7.0%
3 | 29 | 6.3%
F | 24 | 5.2%
N | 21 | 4.6%
S | 21 | 4.6%
4 | 21 | 4.6%
B | 18 | 3.9%
% | 17 | 3.7%
Other values (31) | 163 | 35.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 228 | 49.8%
Decimal Number | 145 | 31.7%
Lowercase Letter | 40 | 8.7%
Other Punctuation | 32 | 7.0%
Math Symbol | 8 | 1.7%
Close Punctuation | 2 | 0.4%
Open Punctuation | 2 | 0.4%
Space Separator | 1 | 0.2%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
H | 62 | 27.2%
C | 32 | 14.0%
F | 24 | 10.5%
N | 21 | 9.2%
S | 21 | 9.2%
B | 18 | 7.9%
O | 16 | 7.0%
P | 7 | 3.1%
L | 6 | 2.6%
A | 6 | 2.6%
Other values (7) | 15 | 6.6%

Decimal Number
Value | Count | Frequency (%)
2 | 50 | 34.5%
3 | 29 | 20.0%
4 | 21 | 14.5%
6 | 17 | 11.7%
5 | 11 | 7.6%
1 | 9 | 6.2%
0 | 5 | 3.4%
8 | 2 | 1.4%
9 | 1 | 0.7%

Lowercase Letter
Value | Count | Frequency (%)
i | 10 | 25.0%
e | 9 | 22.5%
l | 7 | 17.5%
r | 6 | 15.0%
s | 3 | 7.5%
c | 2 | 5.0%
a | 2 | 5.0%
b | 1 | 2.5%

Other Punctuation
Value | Count | Frequency (%)
% | 17 | 53.1%
/ | 12 | 37.5%
, | 3 | 9.4%

Math Symbol
Value | Count | Frequency (%)
+ | 8 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 268 | 58.5%
Common | 190 | 41.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
H | 62 | 23.1%
C | 32 | 11.9%
F | 24 | 9.0%
N | 21 | 7.8%
S | 21 | 7.8%
B | 18 | 6.7%
O | 16 | 6.0%
i | 10 | 3.7%
e | 9 | 3.4%
l | 7 | 2.6%
Other values (15) | 48 | 17.9%

Common
Value | Count | Frequency (%)
2 | 50 | 26.3%
3 | 29 | 15.3%
4 | 21 | 11.1%
% | 17 | 8.9%
6 | 17 | 8.9%
/ | 12 | 6.3%
5 | 11 | 5.8%
1 | 9 | 4.7%
+ | 8 | 4.2%
0 | 5 | 2.6%
Other values (6) | 11 | 5.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 458 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
H | 62 | 13.5%
2 | 50 | 10.9%
C | 32 | 7.0%
3 | 29 | 6.3%
F | 24 | 5.2%
N | 21 | 4.6%
S | 21 | 4.6%
4 | 21 | 4.6%
B | 18 | 3.9%
% | 17 | 3.7%
Other values (31) | 163 | 35.6%

검사주기 (inspection cycle)
Categorical

Distinct: 5
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.026178
Min length: 1
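For this column the length statistics can be recomputed from the value counts alone, since every value is one of five short strings. A standard-library sketch, assuming lengths are measured on each value's text form (so `12` has length 2):

```python
from statistics import median

# Value counts of 검사주기 as reported in this profile, keyed by text form.
value_counts = {"0": 125, "1": 58, "12": 5, "4": 2, "6": 1}

# Expand to one length per row.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

print("Max length:", max(lengths))
print("Median length:", median(lengths))
print("Mean length:", round(sum(lengths) / len(lengths), 6))
print("Min length:", min(lengths))
```

Only the five rows holding `12` have length 2, which is why the mean sits just above 1.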

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 125 | 65.4%
1 | 58 | 30.4%
12 | 5 | 2.6%
4 | 2 | 1.0%
6 | 1 | 0.5%
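The summary figures for this column follow directly from the value counts. Distinct is the number of different values, Unique is the number of values that occur exactly once (here only `6`), and each frequency is the count divided by the 191 non-missing rows. A standard-library sketch:

```python
from collections import Counter

# Rebuild the column's value counts from the table above.
column = Counter({"0": 125, "1": 58, "12": 5, "4": 2, "6": 1})
n = sum(column.values())

distinct = len(column)
unique = sum(1 for count in column.values() if count == 1)

print(f"Distinct: {distinct} ({100 * distinct / n:.1f}%)")
print(f"Unique: {unique} ({100 * unique / n:.1f}%)")
for value, count in column.most_common():
    print(f"{value} | {count} | {100 * count / n:.1f}%")
```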

Length

[Figure: histogram of lengths of the category]

카스번호(CAS No)
Text

MISSING 

Distinct: 115
Distinct (%): 62.5%
Missing: 7
Missing (%): 3.7%
Memory size: 1.6 KiB

Length

Max length: 124
Median length: 59
Mean length: 24.51087
Min length: 7

Characters and Unicode

Total characters: 4510
Distinct characters: 47
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 81
Unique (%): 44.0%

Sample

1st row: 7647-01-0
2nd row: 10294-34-5
3rd row: 7783-61-1
4th row: 7783-82-6
5th row: 7783-60-0
Value | Count | Frequency (%)
(not shown) | 192 | 30.5%
7727-37-9 | 23 | 3.7%
b2h6 | 22 | 3.5%
7440-01-9 | 18 | 2.9%
f2 | 14 | 2.2%
1333-74-0 | 14 | 2.2%
hcl | 12 | 1.9%
ph3 | 10 | 1.6%
19287-45-7/h2 | 9 | 1.4%
7782-41-4/ar | 8 | 1.3%
Other values (184) | 308 | 48.9%
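CAS Registry Numbers have the form `NNNNNNN-NN-N`: two to seven digits, two digits, and a single check digit. The check digit is the sum of the other digits, each multiplied by its position counted from the right (starting at 1), modulo 10. Because several cells in this column hold more than one number (e.g. `B2H6 : 19287-45-7/H2 : 1333-74-0`), a sketch that first extracts CAS-shaped substrings with a regular expression and then validates each one might look like this. The helper names are illustrative, and the profile itself does not perform this check.

```python
import re

# A CAS RN: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")

def extract_cas_numbers(cell):
    """Return every CAS-shaped substring found in a cell."""
    return ["-".join(groups) for groups in CAS_PATTERN.findall(cell)]

def is_valid_cas(cas):
    """Check the CAS check digit: weighted digit sum modulo 10."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False  # not in NNNNNNN-NN-N form at all
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight 1 for the rightmost body digit, 2 for the next, and so on.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check

for cell in ["7647-01-0", "B2H6 : 19287-45-7/H2 : 1333-74-0"]:
    for cas in extract_cas_numbers(cell):
        print(cas, is_valid_cas(cas))
```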

Most occurring characters

Value | Count | Frequency (%)
- | 665 | 14.7%
7 | 504 | 11.2%
(space) | 447 | 9.9%
4 | 324 | 7.2%
2 | 277 | 6.1%
0 | 270 | 6.0%
: | 255 | 5.7%
3 | 255 | 5.7%
1 | 228 | 5.1%
9 | 166 | 3.7%
Other values (37) | 1119 | 24.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2473 | 54.8%
Dash Punctuation | 665 | 14.7%
Space Separator | 447 | 9.9%
Uppercase Letter | 419 | 9.3%
Other Punctuation | 405 | 9.0%
Lowercase Letter | 95 | 2.1%
Other Letter | 6 | 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
H | 113 | 27.0%
C | 66 | 15.8%
N | 58 | 13.8%
F | 38 | 9.1%
O | 31 | 7.4%
B | 31 | 7.4%
S | 17 | 4.1%
A | 13 | 3.1%
P | 12 | 2.9%
K | 8 | 1.9%
Other values (10) | 32 | 7.6%

Decimal Number
Value | Count | Frequency (%)
7 | 504 | 20.4%
4 | 324 | 13.1%
2 | 277 | 11.2%
0 | 270 | 10.9%
3 | 255 | 10.3%
1 | 228 | 9.2%
9 | 166 | 6.7%
6 | 161 | 6.5%
5 | 147 | 5.9%
8 | 141 | 5.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 36 | 37.9%
l | 23 | 24.2%
r | 19 | 20.0%
i | 11 | 11.6%
o | 3 | 3.2%
t | 1 | 1.1%
b | 1 | 1.1%
a | 1 | 1.1%

Other Letter
Value | Count | Frequency (%)
(not shown) | 2 | 33.3%
(not shown) | 2 | 33.3%
(not shown) | 1 | 16.7%
(not shown) | 1 | 16.7%

Other Punctuation
Value | Count | Frequency (%)
: | 255 | 63.0%
/ | 149 | 36.8%
, | 1 | 0.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 665 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 447 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3990 | 88.5%
Latin | 514 | 11.4%
Hangul | 6 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
H | 113 | 22.0%
C | 66 | 12.8%
N | 58 | 11.3%
F | 38 | 7.4%
e | 36 | 7.0%
O | 31 | 6.0%
B | 31 | 6.0%
l | 23 | 4.5%
r | 19 | 3.7%
S | 17 | 3.3%
Other values (18) | 82 | 16.0%

Common
Value | Count | Frequency (%)
- | 665 | 16.7%
7 | 504 | 12.6%
(space) | 447 | 11.2%
4 | 324 | 8.1%
2 | 277 | 6.9%
0 | 270 | 6.8%
: | 255 | 6.4%
3 | 255 | 6.4%
1 | 228 | 5.7%
9 | 166 | 4.2%
Other values (5) | 599 | 15.0%

Hangul
Value | Count | Frequency (%)
(not shown) | 2 | 33.3%
(not shown) | 2 | 33.3%
(not shown) | 1 | 16.7%
(not shown) | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4504 | 99.9%
Hangul | 6 | 0.1%

Most frequent character per block

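ASCII
Value | Count | Frequency (%)
- | 665 | 14.8%
7 | 504 | 11.2%
(space) | 447 | 9.9%
4 | 324 | 7.2%
2 | 277 | 6.2%
0 | 270 | 6.0%
: | 255 | 5.7%
3 | 255 | 5.7%
1 | 228 | 5.1%
9 | 166 | 3.7%
Other values (33) | 1113 | 24.7%

Hangul
Value | Count | Frequency (%)
(not shown) | 2 | 33.3%
(not shown) | 2 | 33.3%
(not shown) | 1 | 16.7%
(not shown) | 1 | 16.7%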

Correlations

   | 화학기호 | 검사주기
화학기호 | 1.000 | 0.000
검사주기 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix: a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure] Nullity correlation heatmap: measures how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
   | 가스명 | 화학기호 | 검사주기 | 카스번호(CAS No)
0 | 염화수소 | Hcl | 1 | 7647-01-0
1 | 삼염화붕소 | Bcl3 | 1 | 10294-34-5
2 | 사불화규소 | SiF4 | 1 | 7783-61-1
3 | 육불화텅스텐 | WF6 | 1 | 7783-82-6
4 | 사불화유황 | SF4 | 0 | 7783-60-0
5 | 포스핀 | PH3 | 1 | 7803-51-2
6 | 디실란 | Si2H6 | 1 | 1590-87-0
7 | 삼불화붕소 | BF3 | 1 | 7637-07-02
8 | 아크릴로니트릴 | C2H3CN | 1 | 107-13-1
9 | 아크릴알데히드 | C3H4O | 1 | 107-02-8

Last rows
   | 가스명 | 화학기호 | 검사주기 | 카스번호(CAS No)
181 | 0.1%B2H6/H2 | <NA> | 1 | B2H6 : 19287-45-7/H2 : 1333-74-0
182 | 0.1%B2H6/H2 | <NA> | 1 | B2H6 : 19287-45-7/H2 : 1333-74-0
183 | TOXIC | <NA> | 4 | <NA>
184 | MTBE/WATER | <NA> | 0 | MTBE : 1634-04-4
185 | N2+SiF4 | <NA> | 0 | SiH4 : 7803-62-5/N2 : 7727-37-9
186 | PH3+Ar | PH3+Ar | 0 | PH3 : 7803-51-2/Ar : 7440-37-1
187 | CH3CL(17%)+HF(83%) | <NA> | 0 | CH3Cl : 74-87-3/HF : 7664-39-3
188 | 5%B2H6/N2 | <NA> | 1 | B2H6 : 19287-45-7/N2 : 7727-37-9
189 | 옥타플루오르화부테인 | C4F8 | 12 | C4F8 : 115-25-3
190 | HBr acid | <NA> | 0 | 10035-10-6

Duplicate rows

Most frequently occurring

   | 가스명 | 화학기호 | 검사주기 | 카스번호(CAS No) | # duplicates
0 | 0.1%B2H6/H2 | <NA> | 1 | B2H6 : 19287-45-7/H2 : 1333-74-0 | 5
1 | 0.95%F2/3.5%Ar/Ne | <NA> | 0 | F2: 7782-41-4/Ar : 7440-37-1/Ne: 7440-01-9 | 2
2 | 5%B2H6/N2 | <NA> | 0 | B2H6 : 19287-45-7/N2 : 7727-37-9 | 2
3 | 5%B2H6/N2 | <NA> | 1 | B2H6 : 19287-45-7/N2 : 7727-37-9 | 2
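The "Duplicate rows" figure in the overview appears to count distinct repeated rows rather than surplus copies. The four rows listed occur 5, 2, 2 and 2 times; 4/191 is about 2.1%, matching the alert, whereas dropping all but the first occurrence of each would remove 7 rows. A standard-library sketch of both counts, using tuples that stand in for only the duplicated records (None represents `<NA>`):

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (number of distinct repeated rows, number of surplus copies)."""
    counts = Counter(rows)
    repeated = {row: k for row, k in counts.items() if k > 1}
    return len(repeated), sum(k - 1 for k in repeated.values())

# Rows are tuples: (가스명, 화학기호, 검사주기, 카스번호(CAS No)).
a = ("0.1%B2H6/H2", None, 1, "B2H6 : 19287-45-7/H2 : 1333-74-0")
b = ("0.95%F2/3.5%Ar/Ne", None, 0, "F2: 7782-41-4/Ar : 7440-37-1/Ne: 7440-01-9")
c = ("5%B2H6/N2", None, 0, "B2H6 : 19287-45-7/N2 : 7727-37-9")
d = ("5%B2H6/N2", None, 1, "B2H6 : 19287-45-7/N2 : 7727-37-9")
rows = [a] * 5 + [b] * 2 + [c] * 2 + [d] * 2  # only the duplicated rows, not all 191

groups, surplus = duplicate_summary(rows)
print(groups, surplus)
print(f"{100 * groups / 191:.1f}%")
```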