Overview

Dataset statistics

Number of variables: 5
Number of observations: 30
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 KiB
Average record size in memory: 44.4 B
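The dataset-level counts above can be recomputed from the raw table. Below is a minimal plain-Python sketch. It assumes the table is a list of equal-length tuples and that a missing cell is `None`. The report does not show how ydata-profiling itself counts these figures, so treat this as an illustration only. The two memory figures are consistent with each other: 44.4 B × 30 rows ≈ 1,332 B, and 1,332 / 1,024 ≈ 1.3 KiB. The exact byte total is not reported, so 1,332 is back-computed from the rounded average.

```python
from collections import Counter


def overview(rows):
    """Return dataset-level counts for a table given as a list of tuples.

    Assumptions: every row has the same number of cells and a missing
    cell is represented as None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars

    # Missing cells: every None anywhere in the table.
    missing = sum(cell is None for row in rows for cell in row)

    # Duplicate rows: each repeat beyond the first copy of an identical row.
    duplicates = sum(count - 1 for count in Counter(rows).values())

    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


def average_record_size(total_bytes, n_obs):
    """Average in-memory size per row, in bytes."""
    return total_bytes / n_obs


# Three rows copied from the Sample section (first three of the five columns).
rows = [
    ("75-04-7", "Ethylamine", "Chem"),
    ("68-11-1", "Thioglycolic acid derivatives", "Chem"),
    ("253-52-1", "Phthalazine", "Elec"),
]
print(overview(rows))
print(round(average_record_size(1332, 30), 1))
```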

Variable types

Text: 4
Categorical: 1

Dataset

Description: Sample data (샘플 데이터)
Author: Korea Institute of Industrial Technology (한국생산기술연구원)
URL: https://bigdata-region.kr/#/dataset/bb99ea99-1800-4848-b50a-6c827d7a84be

Reproduction

Analysis started: 2023-12-10 13:48:38.227093
Analysis finished: 2023-12-10 13:48:39.127732
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

화학물질요약서비스번호 (Chemical Substance Summary Service number)
Text

Distinct: 28
Distinct (%): 93.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 10
Median length: 9.5
Mean length: 9.1
Min length: 7
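The length statistics are taken over the per-value character counts. The mean length equals total characters divided by observations: 273 / 30 = 9.1. The other Text columns follow the same pattern: 632 / 30 ≈ 21.066667, 464 / 30 ≈ 15.466667, and 148 / 30 ≈ 4.9333333. With 30 values, the median averages the 15th and 16th sorted lengths. A half-integer such as 9.5 is therefore expected, and here it is consistent with those two lengths being 9 and 10. The sketch below uses only Python's `statistics` module and counts lengths in Unicode code points. It is an illustration, not the profiler's code.

```python
from statistics import mean, median


def length_stats(values):
    """Max, median, mean and min string length, counted in code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),  # averages the two middle values when the count is even
        "mean": mean(lengths),
        "min": min(lengths),
    }


# The five sample values shown for 화학물질요약서비스번호.
sample = ["75-04-7", "68-11-1", "100-16-3", "10026-12-7", "10026-24-1"]
print(length_stats(sample))
```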

Characters and Unicode

Total characters: 273
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
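Python's standard `unicodedata` module exposes one of these properties, the general category, as a two-letter code. For example, `Nd` is Decimal Number and `Pd` is Dash Punctuation. The sketch below maps the codes that occur in this report to the long names used in its tables and tallies them. Script and block membership are not available in the standard library, so this sketch leaves them out. It is a plain illustration and does not reproduce the profiler's own Unicode lookup.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["75-04-7"]))
print(category_counts(["Cobalt (Ⅱ) sulfate"]))
```

In the second call, Ⅱ (U+2161 ROMAN NUMERAL TWO) falls under Letter Number. This matches the single Letter Number character reported for the substance-name column.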

Unique

Unique: 26
Unique (%): 86.7%
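Distinct and Unique measure different things. Distinct counts how many different values occur. Unique counts values that occur exactly once. In this column, 287-92-3 and 26658-19-5 each appear twice. Both count toward Distinct (28 / 30 = 93.3%) but not toward Unique (26 / 30 = 86.7%). Below is a minimal sketch using `collections.Counter`:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


# Rows 26 to 28 of the sample: one value appears once, another twice.
print(distinct_and_unique(["281-23-2", "287-92-3", "287-92-3"]))
```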

Sample

1st row: 75-04-7
2nd row: 68-11-1
3rd row: 100-16-3
4th row: 10026-12-7
5th row: 10026-24-1
Value | Count | Frequency (%)
287-92-3 | 2 | 6.7%
26658-19-5 | 2 | 6.7%
75-04-7 | 1 | 3.3%
68-11-1 | 1 | 3.3%
281-23-2 | 1 | 3.3%
27458-94-2 | 1 | 3.3%
27193-86-8 | 1 | 3.3%
2695-37-6 | 1 | 3.3%
26761-40-0 | 1 | 3.3%
26523-78-4 | 1 | 3.3%
Other values (18) | 18 | 60.0%
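Word tables like the one above list the top entries plus an "Other values (k)" row. In that row, k is the number of remaining distinct words and the count is their combined total. Percentages are relative to the total number of words, not the number of rows. Each code here is a single word, so 2 / 30 = 6.7%. In the substance-name table later in the report, 2 / 60 = 3.3% and 45 / 60 = 75.0%, which implies 60 words across 30 names. The entries "cyclopentane" and "유기surfactant" suggest that words are lowercased and split on whitespace. The sketch below assumes exactly that, which may not match the profiler's tokenizer in every detail.

```python
from collections import Counter


def top_words(values, n=10):
    """Top-n word counts plus an 'Other values (k)' bucket.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    words = Counter(w for v in values for w in v.lower().split())
    total = sum(words.values())
    ranked = words.most_common()  # highest count first; ties keep first-seen order
    rows = [(w, c, round(100 * c / total, 1)) for w, c in ranked[:n]]
    rest = ranked[n:]
    if rest:
        rest_count = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", rest_count,
                     round(100 * rest_count / total, 1)))
    return rows


names = ["Cyclopentane", "Cyclopentane", "Sodium stannate", "Boron carbide"]
for row in top_words(names, n=2):
    print(row)
```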

Most occurring characters

Value | Count | Frequency (%)
- | 60 | 22.0%
2 | 36 | 13.2%
1 | 27 | 9.9%
6 | 25 | 9.2%
8 | 21 | 7.7%
5 | 19 | 7.0%
7 | 18 | 6.6%
3 | 18 | 6.6%
0 | 18 | 6.6%
4 | 17 | 6.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 213 | 78.0%
Dash Punctuation | 60 | 22.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 36 | 16.9%
1 | 27 | 12.7%
6 | 25 | 11.7%
8 | 21 | 9.9%
5 | 19 | 8.9%
7 | 18 | 8.5%
3 | 18 | 8.5%
0 | 18 | 8.5%
4 | 17 | 8.0%
9 | 14 | 6.6%
Dash Punctuation
Value | Count | Frequency (%)
- | 60 | 100.0%
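The per-category tables use a different denominator from the overall character table. The character 2 accounts for 36 / 273 ≈ 13.2% of all characters but 36 / 213 ≈ 16.9% of the Decimal Number characters. The sketch below makes the denominator explicit by optionally restricting the count to one Unicode general category. It is a plain-Python illustration, not the profiler's implementation.

```python
import unicodedata
from collections import Counter


def char_table(values, category=None, n=10):
    """Top-n characters with counts and percentages.

    With a category code (e.g. "Nd"), only characters of that Unicode general
    category are counted, so percentages are relative to that category's total.
    """
    chars = Counter(
        ch for v in values for ch in v
        if category is None or unicodedata.category(ch) == category
    )
    total = sum(chars.values())
    return [(ch, c, round(100 * c / total, 1)) for ch, c in chars.most_common(n)]


codes = ["75-04-7", "68-11-1"]
print(char_table(codes))        # denominator: all 14 characters
print(char_table(codes, "Nd"))  # denominator: the 10 digits only
```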

Most occurring scripts

Value | Count | Frequency (%)
Common | 273 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 60 | 22.0%
2 | 36 | 13.2%
1 | 27 | 9.9%
6 | 25 | 9.2%
8 | 21 | 7.7%
5 | 19 | 7.0%
7 | 18 | 6.6%
3 | 18 | 6.6%
0 | 18 | 6.6%
4 | 17 | 6.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 273 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 60 | 22.0%
2 | 36 | 13.2%
1 | 27 | 9.9%
6 | 25 | 9.2%
8 | 21 | 7.7%
5 | 19 | 7.0%
7 | 18 | 6.6%
3 | 18 | 6.6%
0 | 18 | 6.6%
4 | 17 | 6.2%

화학물질요약서비스물질명 (Chemical Substance Summary Service substance name)
Text

Distinct: 28
Distinct (%): 93.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 37
Median length: 27
Mean length: 21.066667
Min length: 10

Characters and Unicode

Total characters: 632
Distinct characters: 52
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 26
Unique (%): 86.7%

Sample

1st row: Ethylamine
2nd row: Thioglycolic acid derivatives
3rd row: p-Nitrophenylhydrazine
4th row: Pentachloroniobium
5th row: Cobalt (Ⅱ) sulfate
Value | Count | Frequency (%)
cyclopentane | 2 | 3.3%
비이온성 (nonionic) | 2 | 3.3%
sodium | 2 | 3.3%
1 | 2 | 3.3%
유기surfactant (organic surfactant) | 2 | 3.3%
1-bis | 1 | 1.7%
plastics | 1 | 1.7%
silicon | 1 | 1.7%
tetrafluoride | 1 | 1.7%
hydrazine | 1 | 1.7%
Other values (45) | 45 | 75.0%

Most occurring characters

Value | Count | Frequency (%)
e | 55 | 8.7%
o | 48 | 7.6%
n | 44 | 7.0%
a | 42 | 6.6%
i | 42 | 6.6%
t | 41 | 6.5%
l | 38 | 6.0%
(space) | 31 | 4.9%
r | 27 | 4.3%
h | 26 | 4.1%
Other values (42) | 238 | 37.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 516 | 81.6%
Uppercase Letter | 39 | 6.2%
Space Separator | 31 | 4.9%
Dash Punctuation | 13 | 2.1%
Other Letter | 12 | 1.9%
Decimal Number | 10 | 1.6%
Close Punctuation | 5 | 0.8%
Open Punctuation | 5 | 0.8%
Letter Number | 1 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 55 | 10.7%
o | 48 | 9.3%
n | 44 | 8.5%
a | 42 | 8.1%
i | 42 | 8.1%
t | 41 | 7.9%
l | 38 | 7.4%
r | 27 | 5.2%
h | 26 | 5.0%
y | 22 | 4.3%
Other values (12) | 131 | 25.4%
Uppercase Letter
Value | Count | Frequency (%)
S | 6 | 15.4%
T | 6 | 15.4%
D | 5 | 12.8%
C | 5 | 12.8%
A | 4 | 10.3%
B | 3 | 7.7%
P | 2 | 5.1%
L | 2 | 5.1%
H | 1 | 2.6%
I | 1 | 2.6%
Other values (4) | 4 | 10.3%
Other Letter
Value | Count | Frequency (%)
비 | 2 | 16.7%
이 | 2 | 16.7%
온 | 2 | 16.7%
성 | 2 | 16.7%
유 | 2 | 16.7%
기 | 2 | 16.7%
Decimal Number
Value | Count | Frequency (%)
1 | 4 | 40.0%
4 | 2 | 20.0%
7 | 2 | 20.0%
3 | 1 | 10.0%
2 | 1 | 10.0%
Space Separator
Value | Count | Frequency (%)
(space) | 31 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 5 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 5 | 100.0%
Letter Number
Value | Count | Frequency (%)
Ⅱ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 556 | 88.0%
Common | 64 | 10.1%
Hangul | 12 | 1.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 55 | 9.9%
o | 48 | 8.6%
n | 44 | 7.9%
a | 42 | 7.6%
i | 42 | 7.6%
t | 41 | 7.4%
l | 38 | 6.8%
r | 27 | 4.9%
h | 26 | 4.7%
y | 22 | 4.0%
Other values (27) | 171 | 30.8%
Common
Value | Count | Frequency (%)
(space) | 31 | 48.4%
- | 13 | 20.3%
) | 5 | 7.8%
( | 5 | 7.8%
1 | 4 | 6.2%
4 | 2 | 3.1%
7 | 2 | 3.1%
3 | 1 | 1.6%
2 | 1 | 1.6%
Hangul
Value | Count | Frequency (%)
비 | 2 | 16.7%
이 | 2 | 16.7%
온 | 2 | 16.7%
성 | 2 | 16.7%
유 | 2 | 16.7%
기 | 2 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 619 | 97.9%
Hangul | 12 | 1.9%
Number Forms | 1 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 55 | 8.9%
o | 48 | 7.8%
n | 44 | 7.1%
a | 42 | 6.8%
i | 42 | 6.8%
t | 41 | 6.6%
l | 38 | 6.1%
(space) | 31 | 5.0%
r | 27 | 4.4%
h | 26 | 4.2%
Other values (35) | 225 | 36.3%
Hangul
Value | Count | Frequency (%)
비 | 2 | 16.7%
이 | 2 | 16.7%
온 | 2 | 16.7%
성 | 2 | 16.7%
유 | 2 | 16.7%
기 | 2 | 16.7%
Number Forms
Value | Count | Frequency (%)
Ⅱ | 1 | 100.0%

물질용도분류명 (substance use category)
Categorical

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Chem
2nd row: Chem
3rd row: Chem
4th row: Chem
5th row: Chem

Common Values

Value | Count | Frequency (%)
Chem | 22 | 73.3%
Elec | 8 | 26.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
chem | 22 | 73.3%
elec | 8 | 26.7%

물질용도영어명 (substance use, English name)
Text

Distinct: 29
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 40
Median length: 28
Mean length: 15.466667
Min length: 7

Characters and Unicode

Total characters: 464
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 28
Unique (%): 93.3%

Sample

1st row: Insecticid
2nd row: Hair dye
3rd row: Cosmetic cleanser
4th row: Catalyst
5th row: Fertilizer
Value | Count | Frequency (%)
electrolyte | 4 | 7.3%
battery | 4 | 7.3%
agent | 3 | 5.5%
emulsifier | 2 | 3.6%
ion | 2 | 3.6%
lithium | 2 | 3.6%
electric | 2 | 3.6%
swelling | 1 | 1.8%
fertilizer | 1 | 1.8%
aircraft | 1 | 1.8%
Other values (33) | 33 | 60.0%

Most occurring characters

Value | Count | Frequency (%)
e | 56 | 12.1%
t | 53 | 11.4%
i | 46 | 9.9%
r | 37 | 8.0%
a | 34 | 7.3%
l | 28 | 6.0%
(space) | 25 | 5.4%
c | 24 | 5.2%
n | 23 | 5.0%
o | 19 | 4.1%
Other values (25) | 119 | 25.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 409 | 88.1%
Uppercase Letter | 30 | 6.5%
Space Separator | 25 | 5.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 56 | 13.7%
t | 53 | 13.0%
i | 46 | 11.2%
r | 37 | 9.0%
a | 34 | 8.3%
l | 28 | 6.8%
c | 24 | 5.9%
n | 23 | 5.6%
o | 19 | 4.6%
s | 13 | 3.2%
Other values (12) | 76 | 18.6%
Uppercase Letter
Value | Count | Frequency (%)
E | 7 | 23.3%
A | 5 | 16.7%
H | 4 | 13.3%
C | 3 | 10.0%
S | 2 | 6.7%
F | 2 | 6.7%
L | 2 | 6.7%
R | 1 | 3.3%
B | 1 | 3.3%
I | 1 | 3.3%
Other values (2) | 2 | 6.7%
Space Separator
Value | Count | Frequency (%)
(space) | 25 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 439 | 94.6%
Common | 25 | 5.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 56 | 12.8%
t | 53 | 12.1%
i | 46 | 10.5%
r | 37 | 8.4%
a | 34 | 7.7%
l | 28 | 6.4%
c | 24 | 5.5%
n | 23 | 5.2%
o | 19 | 4.3%
s | 13 | 3.0%
Other values (24) | 106 | 24.1%
Common
Value | Count | Frequency (%)
(space) | 25 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 464 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 56 | 12.1%
t | 53 | 11.4%
i | 46 | 9.9%
r | 37 | 8.0%
a | 34 | 7.3%
l | 28 | 6.0%
(space) | 25 | 5.4%
c | 24 | 5.2%
n | 23 | 5.0%
o | 19 | 4.1%
Other values (25) | 119 | 25.6%

물질용도한글명 (substance use, Korean name)
Text

Distinct: 29
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 12
Median length: 11
Mean length: 4.9333333
Min length: 2

Characters and Unicode

Total characters: 148
Distinct characters: 66
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 28
Unique (%): 93.3%

Sample

1st row: 살충제 (insecticide)
2nd row: 머리 염색제 (hair dye)
3rd row: 화장품 클렌저 (cosmetic cleanser)
4th row: 촉매 (catalyst)
5th row: 비료 (fertilizer)
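Value | Count | Frequency (%)
전지 (battery) | 3 | 6.1%
유화제 (emulsifier) | 2 | 4.1%
전해질 (electrolyte) | 2 | 4.1%
전해액 (electrolyte solution) | 2 | 4.1%
이온 (ion) | 2 | 4.1%
리튬 (lithium) | 2 | 4.1%
방지제 (preventive agent) | 1 | 2.0%
제초제 (herbicide) | 1 | 2.0%
항공기 (aircraft) | 1 | 2.0%
코팅 (coating) | 1 | 2.0%
Other values (32) | 32 | 65.3%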

Most occurring characters

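Value | Count | Frequency (%)
(not rendered) | 19 | 12.8%
(not rendered) | 19 | 12.8%
(not rendered) | 11 | 7.4%
(not rendered) | 6 | 4.1%
(not rendered) | 5 | 3.4%
(not rendered) | 5 | 3.4%
(not rendered) | 4 | 2.7%
(not rendered) | 4 | 2.7%
(not rendered) | 3 | 2.0%
(not rendered) | 3 | 2.0%
Other values (56) | 69 | 46.6%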

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 129 | 87.2%
Space Separator | 19 | 12.8%

Most frequent character per category

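Other Letter
Value | Count | Frequency (%)
(not rendered) | 19 | 14.7%
(not rendered) | 11 | 8.5%
(not rendered) | 6 | 4.7%
(not rendered) | 5 | 3.9%
(not rendered) | 5 | 3.9%
(not rendered) | 4 | 3.1%
(not rendered) | 4 | 3.1%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
Other values (55) | 66 | 51.2%
Space Separator
Value | Count | Frequency (%)
(space) | 19 | 100.0%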

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 129 | 87.2%
Common | 19 | 12.8%

Most frequent character per script

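Hangul
Value | Count | Frequency (%)
(not rendered) | 19 | 14.7%
(not rendered) | 11 | 8.5%
(not rendered) | 6 | 4.7%
(not rendered) | 5 | 3.9%
(not rendered) | 5 | 3.9%
(not rendered) | 4 | 3.1%
(not rendered) | 4 | 3.1%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
Other values (55) | 66 | 51.2%
Common
Value | Count | Frequency (%)
(space) | 19 | 100.0%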

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 129 | 87.2%
ASCII | 19 | 12.8%

Most frequent character per block

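ASCII
Value | Count | Frequency (%)
(space) | 19 | 100.0%
Hangul
Value | Count | Frequency (%)
(not rendered) | 19 | 14.7%
(not rendered) | 11 | 8.5%
(not rendered) | 6 | 4.7%
(not rendered) | 5 | 3.9%
(not rendered) | 5 | 3.9%
(not rendered) | 4 | 3.1%
(not rendered) | 4 | 3.1%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
(not rendered) | 3 | 2.3%
Other values (55) | 66 | 51.2%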

Correlations

Variable | 화학물질요약서비스번호 | 화학물질요약서비스물질명 | 물질용도분류명 | 물질용도영어명 | 물질용도한글명
화학물질요약서비스번호 | 1.000 | 1.000 | 1.000 | 0.984 | 0.984
화학물질요약서비스물질명 | 1.000 | 1.000 | 1.000 | 0.984 | 0.984
물질용도분류명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
물질용도영어명 | 0.984 | 0.984 | 1.000 | 1.000 | 1.000
물질용도한글명 | 0.984 | 0.984 | 1.000 | 1.000 | 1.000
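The report does not state which coefficient fills this matrix. As I understand ydata-profiling's documentation, its default `auto` setting uses Cramér's V for pairs of categorical columns. The sketch below computes plain Cramér's V without bias correction, so its values may differ slightly from the profiler's. A value of 1.000 between the number and the substance name is consistent with a one-to-one mapping between those two columns. A one-to-one mapping always gives V = 1.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction)."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return math.nan  # undefined when either variable has a single category
    return math.sqrt(chi2 / (n * k))


# A one-to-one mapping between categories gives V = 1.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
# Independent columns give V = 0.
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```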

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
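Both plots summarise the same presence/absence information. The first is a per-column count of non-missing values. The second is a row-by-column grid in which gaps mark absent values. With 0 missing cells, every column of this dataset has 30 of 30 values present. The sketch below computes the data behind such plots in plain Python, without any plotting. It assumes a missing cell is `None` and is not the profiler's own code.

```python
def nullity_summary(rows, columns):
    """Per-column non-null counts and a row-by-column presence matrix.

    Assumption: a missing cell is represented as None.
    """
    present = [[cell is not None for cell in row] for row in rows]
    counts = {name: sum(flags[i] for flags in present)
              for i, name in enumerate(columns)}
    return counts, present


columns = ["화학물질요약서비스번호", "물질용도분류명"]
rows = [("75-04-7", "Chem"), ("253-52-1", "Elec")]
counts, matrix = nullity_summary(rows, columns)
print(counts)
print(matrix)
```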

Sample

First rows
# | 화학물질요약서비스번호 | 화학물질요약서비스물질명 | 물질용도분류명 | 물질용도영어명 | 물질용도한글명
0 | 75-04-7 | Ethylamine | Chem | Insecticid | 살충제
1 | 68-11-1 | Thioglycolic acid derivatives | Chem | Hair dye | 머리 염색제
2 | 100-16-3 | p-Nitrophenylhydrazine | Chem | Cosmetic cleanser | 화장품 클렌저
3 | 10026-12-7 | Pentachloroniobium | Chem | Catalyst | 촉매
4 | 10026-24-1 | Cobalt (Ⅱ) sulfate | Chem | Fertilizer | 비료
5 | 12002-48-1 | Trichlorobenzene (mixture of isomers) | Chem | Swelling agent | 팽창제
6 | 120-54-7 | Dipentamethylenethiuram tetrasulfide | Chem | Heat stabilizer | 열 안정제
7 | 12058-66-1 | Sodium stannate | Chem | Electrolytic surface treatment agent | 전해 표면 처리제
8 | 12069-32-8 | Boron carbide | Chem | Abrasive | 연마제
9 | 253-52-1 | Phthalazine | Elec | Rechargeable lithium ion battery cathode | 리튬 이온 전지 양극

Last rows
# | 화학물질요약서비스번호 | 화학물질요약서비스물질명 | 물질용도분류명 | 물질용도영어명 | 물질용도한글명
20 | 26658-19-5 | 비이온성 유기Surfactant | Chem | Corrosion inhibitor | 부식억제제
21 | 26658-19-5 | 비이온성 유기Surfactant | Chem | Emulsifier | 유화제
22 | 26761-40-0 | Diisodecyl phthalate | Chem | Artificial leather | 인조 가죽
23 | 2695-37-6 | Sodium styrenesulfonate | Chem | Emulsifier | 유화제
24 | 27193-86-8 | p-Dodecylphenol | Chem | Fuel additive | 연료 첨가제
25 | 27458-94-2 | Isononyl alcohol | Chem | Detergent | 세제
26 | 281-23-2 | Adamantane | Chem | Hardener | 경화제
27 | 287-92-3 | Cyclopentane | Chem | Adhesive | 접착제
28 | 287-92-3 | Cyclopentane | Chem | Lubricant | 윤활유
29 | 3006-86-8 | 1 1-Bis (t-butylperoxy) cyclohexane | Chem | Oxidation agent | 산화제