gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	191
Missing cells	109
Missing cells (%)	14.3%
Duplicate rows	4
Duplicate rows (%)	2.1%
Total size in memory	6.3 KiB
Average record size in memory	33.7 B
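
The percentages in this table can be re-derived from the raw counts. The following is a minimal sketch, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and "Duplicate rows (%)" over all observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 4
n_observations = 191
missing_cells = 109
duplicate_rows = 4

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate percentage is relative to the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Under these assumptions the sketch reproduces the 14.3% and 2.1% shown above.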

Variable types

Text	3
Categorical	1

Dataset

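Description	Property information (gas name, chemical symbol, inspection cycle) for the 191 toxic gases subject to inspection by the Korea Gas Safety Corporation (한국가스안전공사), published to give the general public an overview of toxic gases.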
URL	https://www.data.go.kr/data/15067783/fileData.do

Alerts

Dataset has 4 (2.1%) duplicate rows	Duplicates
`화학기호` has 102 (53.4%) missing values	Missing
`카스번호(CAS No)` has 7 (3.7%) missing values	Missing

Reproduction

Analysis started	2023-12-12 02:37:16.059034
Analysis finished	2023-12-12 02:37:16.486165
Duration	0.43 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

가스명 (gas name)
Text

Distinct	174
Distinct (%)	91.1%
Missing	0
Missing (%)	0.0%
Memory size	1.6 KiB

Length

Max length	34
Median length	23
Mean length	9.0418848
Min length	2

Characters and Unicode

Total characters	1727
Distinct characters	151
Distinct categories	10
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
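
As an illustration of the per-character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The sample strings are a handful of 가스명 values that appear elsewhere in this report; the `category_counts` helper and the `LABELS` mapping are hypothetical names introduced here. The standard library does not expose scripts or blocks, so only categories are covered.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes and the long names used in this report.
LABELS = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "Po": "Other Punctuation", "Zs": "Space Separator",
    "Sm": "Math Symbol", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}

# A few 가스명 values taken from the sample rows of this report.
samples = ["염화수소", "0.1%B2H6/H2", "PH3+Ar", "HBr acid"]

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

counts = category_counts(samples)
for code, n in counts.most_common():
    print(f"{LABELS.get(code, code)}\t{n}")
```

Hangul syllables fall under "Other Letter", which is why that category appears in the tables below alongside the Latin letter categories.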

Unique

Unique	164
Unique (%)	85.9%

Sample

1st row	염화수소
2nd row	삼염화붕소
3rd row	사불화규소
4th row	육불화텅스텐
5th row	사불화유황

Value	Count	Frequency (%)
0.1%b2h6/h2	5	2.2%
5%b2h6/n2	5	2.2%
co	5	2.2%
	3	1.3%
bcl3	3	1.3%
n2+sif4	3	1.3%
15%b2h6	3	1.3%
암모니아	2	0.9%
toxic	2	0.9%
0.95%f2/3.5%ar/ne	2	0.9%
Other values (180)	190	85.2%

Most occurring characters

Value	Count	Frequency (%)
H	121	7.0%
2	116	6.7%
/	98	5.7%
C	80	4.6%
N	67	3.9%
%	59	3.4%
O	51	3.0%
B	41	2.4%
3	40	2.3%
F	40	2.3%
Other values (141)	1014	58.7%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	669	38.7%
Other Letter	327	18.9%
Decimal Number	300	17.4%
Other Punctuation	210	12.2%
Lowercase Letter	120	6.9%
Space Separator	32	1.9%
Math Symbol	24	1.4%
Open Punctuation	22	1.3%
Close Punctuation	20	1.2%
Dash Punctuation	3	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
화	28	8.6%
소	22	6.7%
스	11	3.4%
불	11	3.4%
로	10	3.1%
아	9	2.8%
오	9	2.8%
수	8	2.4%
사	7	2.1%
산	7	2.1%
Other values (79)	205	62.7%

Uppercase Letter

Value	Count	Frequency (%)
H	121	18.1%
C	80	12.0%
N	67	10.0%
O	51	7.6%
B	41	6.1%
F	40	6.0%
A	37	5.5%
S	35	5.2%
E	28	4.2%
L	23	3.4%
Other values (15)	146	21.8%

Lowercase Letter

Value	Count	Frequency (%)
e	38	31.7%
r	24	20.0%
i	13	10.8%
l	13	10.8%
o	7	5.8%
n	4	3.3%
a	4	3.3%
t	3	2.5%
s	2	1.7%
d	2	1.7%
Other values (8)	10	8.3%

Decimal Number

Value	Count	Frequency (%)
2	116	38.7%
3	40	13.3%
1	34	11.3%
5	31	10.3%
6	26	8.7%
4	22	7.3%
0	20	6.7%
9	5	1.7%
8	4	1.3%
7	2	0.7%

Other Punctuation

Value	Count	Frequency (%)
/	98	46.7%
%	59	28.1%
,	31	14.8%
.	22	10.5%

Space Separator

Value	Count	Frequency (%)
(space)	32	100.0%

Math Symbol

Value	Count	Frequency (%)
+	24	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	22	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	20	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	789	45.7%
Common	611	35.4%
Hangul	327	18.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
화	28	8.6%
소	22	6.7%
스	11	3.4%
불	11	3.4%
로	10	3.1%
아	9	2.8%
오	9	2.8%
수	8	2.4%
사	7	2.1%
산	7	2.1%
Other values (79)	205	62.7%

Latin

Value	Count	Frequency (%)
H	121	15.3%
C	80	10.1%
N	67	8.5%
O	51	6.5%
B	41	5.2%
F	40	5.1%
e	38	4.8%
A	37	4.7%
S	35	4.4%
E	28	3.5%
Other values (33)	251	31.8%

Common

Value	Count	Frequency (%)
2	116	19.0%
/	98	16.0%
%	59	9.7%
3	40	6.5%
1	34	5.6%
(space)	32	5.2%
5	31	5.1%
,	31	5.1%
6	26	4.3%
+	24	3.9%
Other values (9)	120	19.6%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1400	81.1%
Hangul	327	18.9%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
H	121	8.6%
2	116	8.3%
/	98	7.0%
C	80	5.7%
N	67	4.8%
%	59	4.2%
O	51	3.6%
B	41	2.9%
3	40	2.9%
F	40	2.9%
Other values (52)	687	49.1%

Hangul

Value	Count	Frequency (%)
화	28	8.6%
소	22	6.7%
스	11	3.4%
불	11	3.4%
로	10	3.1%
아	9	2.8%
오	9	2.8%
수	8	2.4%
사	7	2.1%
산	7	2.1%
Other values (79)	205	62.7%

화학기호 (chemical symbol)
Text

MISSING

Distinct	75
Distinct (%)	84.3%
Missing	102
Missing (%)	53.4%
Memory size	1.6 KiB

Length

Max length	20
Median length	10
Mean length	5.1460674
Min length	1

Characters and Unicode

Total characters	458
Distinct characters	41
Distinct categories	8
Distinct scripts	2
Distinct blocks	1

Unique

Unique	66
Unique (%)	74.2%

Sample

1st row	Hcl
2nd row	Bcl3
3rd row	SiF4
4th row	WF6
5th row	SF4

Value	Count	Frequency (%)
so2	4	4.4%
nh3	3	3.3%
bf3	3	3.3%
gef4	3	3.3%
15%b2h6	3	3.3%
sif4	2	2.2%
sih4	2	2.2%
bcl3	2	2.2%
b2h6	2	2.2%
hf	2	2.2%
Other values (60)	64	71.1%

Most occurring characters

Value	Count	Frequency (%)
H	62	13.5%
2	50	10.9%
C	32	7.0%
3	29	6.3%
F	24	5.2%
N	21	4.6%
S	21	4.6%
4	21	4.6%
B	18	3.9%
%	17	3.7%
Other values (31)	163	35.6%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	228	49.8%
Decimal Number	145	31.7%
Lowercase Letter	40	8.7%
Other Punctuation	32	7.0%
Math Symbol	8	1.7%
Close Punctuation	2	0.4%
Open Punctuation	2	0.4%
Space Separator	1	0.2%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
H	62	27.2%
C	32	14.0%
F	24	10.5%
N	21	9.2%
S	21	9.2%
B	18	7.9%
O	16	7.0%
P	7	3.1%
L	6	2.6%
A	6	2.6%
Other values (7)	15	6.6%

Decimal Number

Value	Count	Frequency (%)
2	50	34.5%
3	29	20.0%
4	21	14.5%
6	17	11.7%
5	11	7.6%
1	9	6.2%
0	5	3.4%
8	2	1.4%
9	1	0.7%

Lowercase Letter

Value	Count	Frequency (%)
i	10	25.0%
e	9	22.5%
l	7	17.5%
r	6	15.0%
s	3	7.5%
c	2	5.0%
a	2	5.0%
b	1	2.5%

Other Punctuation

Value	Count	Frequency (%)
%	17	53.1%
/	12	37.5%
,	3	9.4%

Math Symbol

Value	Count	Frequency (%)
+	8	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	268	58.5%
Common	190	41.5%

Most frequent character per script

Latin

Value	Count	Frequency (%)
H	62	23.1%
C	32	11.9%
F	24	9.0%
N	21	7.8%
S	21	7.8%
B	18	6.7%
O	16	6.0%
i	10	3.7%
e	9	3.4%
l	7	2.6%
Other values (15)	48	17.9%

Common

Value	Count	Frequency (%)
2	50	26.3%
3	29	15.3%
4	21	11.1%
%	17	8.9%
6	17	8.9%
/	12	6.3%
5	11	5.8%
1	9	4.7%
+	8	4.2%
0	5	2.6%
Other values (6)	11	5.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	458	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
H	62	13.5%
2	50	10.9%
C	32	7.0%
3	29	6.3%
F	24	5.2%
N	21	4.6%
S	21	4.6%
4	21	4.6%
B	18	3.9%
%	17	3.7%
Other values (31)	163	35.6%

검사주기 (inspection cycle)
Categorical

Distinct	5
Distinct (%)	2.6%
Missing	0
Missing (%)	0.0%
Memory size	1.6 KiB

0	125
1	58
12	5
4	2
6	1

Length

Max length	2
Median length	1
Mean length	1.026178
Min length	1

Unique

Unique	1
Unique (%)	0.5%
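
The length and uniqueness figures for this variable follow directly from its value counts. The sketch below is a worked check, assuming that "Mean length" is the count-weighted mean string length and that "Unique" counts values occurring exactly once; the variable names are illustrative.

```python
# Value counts copied from the 검사주기 frequency listing in this section.
value_counts = {"0": 125, "1": 58, "12": 5, "4": 2, "6": 1}

n_rows = sum(value_counts.values())

# Count-weighted mean of the string lengths ("12" is the only 2-character value).
total_length = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_length / n_rows

distinct_pct = 100 * len(value_counts) / n_rows

# Assumption: "Unique" means values that occur exactly once.
unique_values = [value for value, count in value_counts.items() if count == 1]
unique_pct = 100 * len(unique_values) / n_rows

print(f"Mean length: {mean_length:.6f}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique: {len(unique_values)} ({unique_pct:.1f}%)")
```

Under these assumptions the results agree with the reported mean length of 1.026178, 2.6% distinct, and one unique value (0.5%).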

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	0

Common Values

Value	Count	Frequency (%)
0	125	65.4%
1	58	30.4%
12	5	2.6%
4	2	1.0%
6	1	0.5%

Length

Figure: Histogram of lengths of the category

Figure: Common Values (Plot)

카스번호(CAS No)
Text

MISSING

Distinct	115
Distinct (%)	62.5%
Missing	7
Missing (%)	3.7%
Memory size	1.6 KiB

Length

Max length	124
Median length	59
Mean length	24.51087
Min length	7

Characters and Unicode

Total characters	4510
Distinct characters	47
Distinct categories	7
Distinct scripts	3
Distinct blocks	2

Unique

Unique	81
Unique (%)	44.0%

Sample

1st row	7647-01-0
2nd row	10294-34-5
3rd row	7783-61-1
4th row	7783-82-6
5th row	7783-60-0
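
Single-substance entries in this column are CAS Registry Numbers, which carry a check digit: the digits before the final one are weighted 1, 2, 3, ... from right to left, and the weighted sum modulo 10 must equal the final digit. The sketch below applies that rule to the ten CAS values shown in the report's first rows; `is_valid_cas` is a hypothetical helper and not part of ydata-profiling.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

def is_valid_cas(value: str) -> bool:
    """Return True if `value` has the CAS layout and a matching check digit."""
    match = CAS_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... from right to left, then take the sum mod 10.
    weighted = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return weighted % 10 == check_digit

# CAS values from the first ten rows of the dataset, as shown in this report.
first_rows = [
    "7647-01-0", "10294-34-5", "7783-61-1", "7783-82-6", "7783-60-0",
    "7803-51-2", "1590-87-0", "7637-07-02", "107-13-1", "107-02-8",
]

invalid = [cas for cas in first_rows if not is_valid_cas(cas)]
print(invalid)
```

Of these ten values only `7637-07-02` is rejected, because its last group has two digits; the same digits written as `7637-07-2` satisfy the check. A check like this may be a useful cleaning step before joining the column against other chemical databases.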

Value	Count	Frequency (%)
	192	30.5%
7727-37-9	23	3.7%
b2h6	22	3.5%
7440-01-9	18	2.9%
f2	14	2.2%
1333-74-0	14	2.2%
hcl	12	1.9%
ph3	10	1.6%
19287-45-7/h2	9	1.4%
7782-41-4/ar	8	1.3%
Other values (184)	308	48.9%

Most occurring characters

Value	Count	Frequency (%)
-	665	14.7%
7	504	11.2%
(space)	447	9.9%
4	324	7.2%
2	277	6.1%
0	270	6.0%
:	255	5.7%
3	255	5.7%
1	228	5.1%
9	166	3.7%
Other values (37)	1119	24.8%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	2473	54.8%
Dash Punctuation	665	14.7%
Space Separator	447	9.9%
Uppercase Letter	419	9.3%
Other Punctuation	405	9.0%
Lowercase Letter	95	2.1%
Other Letter	6	0.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
H	113	27.0%
C	66	15.8%
N	58	13.8%
F	38	9.1%
O	31	7.4%
B	31	7.4%
S	17	4.1%
A	13	3.1%
P	12	2.9%
K	8	1.9%
Other values (10)	32	7.6%

Decimal Number

Value	Count	Frequency (%)
7	504	20.4%
4	324	13.1%
2	277	11.2%
0	270	10.9%
3	255	10.3%
1	228	9.2%
9	166	6.7%
6	161	6.5%
5	147	5.9%
8	141	5.7%

Lowercase Letter

Value	Count	Frequency (%)
e	36	37.9%
l	23	24.2%
r	19	20.0%
i	11	11.6%
o	3	3.2%
t	1	1.1%
b	1	1.1%
a	1	1.1%

Other Letter

Value	Count	Frequency (%)
벤	2	33.3%
젠	2	33.3%
놀	1	16.7%
페	1	16.7%

Other Punctuation

Value	Count	Frequency (%)
:	255	63.0%
/	149	36.8%
,	1	0.2%

Dash Punctuation

Value	Count	Frequency (%)
-	665	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	447	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	3990	88.5%
Latin	514	11.4%
Hangul	6	0.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
H	113	22.0%
C	66	12.8%
N	58	11.3%
F	38	7.4%
e	36	7.0%
O	31	6.0%
B	31	6.0%
l	23	4.5%
r	19	3.7%
S	17	3.3%
Other values (18)	82	16.0%

Common

Value	Count	Frequency (%)
-	665	16.7%
7	504	12.6%
(space)	447	11.2%
4	324	8.1%
2	277	6.9%
0	270	6.8%
:	255	6.4%
3	255	6.4%
1	228	5.7%
9	166	4.2%
Other values (5)	599	15.0%

Hangul

Value	Count	Frequency (%)
벤	2	33.3%
젠	2	33.3%
놀	1	16.7%
페	1	16.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	4504	99.9%
Hangul	6	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
-	665	14.8%
7	504	11.2%
(space)	447	9.9%
4	324	7.2%
2	277	6.2%
0	270	6.0%
:	255	5.7%
3	255	5.7%
1	228	5.1%
9	166	3.7%
Other values (33)	1113	24.7%

Hangul

Value	Count	Frequency (%)
벤	2	33.3%
젠	2	33.3%
놀	1	16.7%
페	1	16.7%

Phik (φk)

Table

	화학기호	검사주기
화학기호	1.000	0.000
검사주기	0.000	1.000

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

First rows

	가스명	화학기호	검사주기	카스번호(CAS No)
0	염화수소	Hcl	1	7647-01-0
1	삼염화붕소	Bcl3	1	10294-34-5
2	사불화규소	SiF4	1	7783-61-1
3	육불화텅스텐	WF6	1	7783-82-6
4	사불화유황	SF4	0	7783-60-0
5	포스핀	PH3	1	7803-51-2
6	디실란	Si2H6	1	1590-87-0
7	삼불화붕소	BF3	1	7637-07-02
8	아크릴로니트릴	C2H3CN	1	107-13-1
9	아크릴알데히드	C3H4O	1	107-02-8

Last rows

	가스명	화학기호	검사주기	카스번호(CAS No)
181	0.1%B2H6/H2	<NA>	1	B2H6 : 19287-45-7/H2 : 1333-74-0
182	0.1%B2H6/H2	<NA>	1	B2H6 : 19287-45-7/H2 : 1333-74-0
183	TOXIC	<NA>	4	<NA>
184	MTBE/WATER	<NA>	0	MTBE : 1634-04-4
185	N2+SiF4	<NA>	0	SiH4 : 7803-62-5/N2 : 7727-37-9
186	PH3+Ar	PH3+Ar	0	PH3 : 7803-51-2/Ar : 7440-37-1
187	CH3CL(17%)+HF(83%)	<NA>	0	CH3Cl : 74-87-3/HF : 7664-39-3
188	5%B2H6/N2	<NA>	1	B2H6 : 19287-45-7/N2 : 7727-37-9
189	옥타플루오르화부테인	C4F8	12	C4F8 : 115-25-3
190	HBr acid	<NA>	0	10035-10-6
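
The rows above show that the 카스번호(CAS No) column mixes two layouts: a bare CAS number for single substances, and `NAME : CAS` pairs joined by `/` for gas mixtures. The sketch below shows one way to split such a field into components, assuming `/` separates components and `:` separates a name from its number; `parse_cas_field` is a hypothetical helper introduced for illustration.

```python
from typing import List, Optional, Tuple

def parse_cas_field(value: str) -> List[Tuple[Optional[str], str]]:
    """Split 'NAME : CAS/NAME : CAS' into (name, cas) pairs.

    The name is None when a component is a bare CAS number.
    """
    pairs = []
    for part in value.split("/"):
        name, sep, cas = part.partition(":")
        if sep:
            pairs.append((name.strip(), cas.strip()))
        else:
            pairs.append((None, part.strip()))
    return pairs

# Field values copied from the rows shown in this report.
print(parse_cas_field("B2H6 : 19287-45-7/H2 : 1333-74-0"))
print(parse_cas_field("F2: 7782-41-4/Ar : 7440-37-1/Ne: 7440-01-9"))
print(parse_cas_field("10035-10-6"))
```

Stripping whitespace around each piece absorbs the inconsistent spacing around `:` seen in the data (`F2:` versus `Ar :`).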

Most frequently occurring

	가스명	화학기호	검사주기	카스번호(CAS No)	# duplicates
0	0.1%B2H6/H2	<NA>	1	B2H6 : 19287-45-7/H2 : 1333-74-0	5
1	0.95%F2/3.5%Ar/Ne	<NA>	0	F2: 7782-41-4/Ar : 7440-37-1/Ne: 7440-01-9	2
2	5%B2H6/N2	<NA>	0	B2H6 : 19287-45-7/N2 : 7727-37-9	2
3	5%B2H6/N2	<NA>	1	B2H6 : 19287-45-7/N2 : 7727-37-9	2
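
A table like this one can be approximated by counting identical rows. The sketch below uses `collections.Counter` on a few rows taken from this report, treating a row as a duplicate only when every column matches; it illustrates the idea and is not ydata-profiling's own implementation. Note that the "# duplicates" column above sums to 11, while the alert's count of 4 coincides with the number of distinct rows listed.

```python
from collections import Counter

# Rows as (가스명, 화학기호, 검사주기, CAS) tuples; None stands in for <NA>.
rows = [
    ("0.1%B2H6/H2", None, "1", "B2H6 : 19287-45-7/H2 : 1333-74-0"),
    ("0.1%B2H6/H2", None, "1", "B2H6 : 19287-45-7/H2 : 1333-74-0"),
    ("5%B2H6/N2", None, "1", "B2H6 : 19287-45-7/N2 : 7727-37-9"),
    ("5%B2H6/N2", None, "0", "B2H6 : 19287-45-7/N2 : 7727-37-9"),
    ("HBr acid", None, "0", "10035-10-6"),
]

# Count how often each complete row occurs, then keep those seen more than once.
row_counts = Counter(rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(row[0], n)
```

In this small sample the two `5%B2H6/N2` rows are not duplicates of each other because their 검사주기 values differ, which matches how they appear as separate entries in the table above.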
