Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 6
Missing cells (%): < 0.1%
Duplicate rows: 12
Duplicate rows (%): 0.1%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
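The percentages and the per-record size follow from the raw counts. The sketch below recomputes them, assuming the report divides missing cells by the total number of cells (variables × observations) and converts KiB to bytes with 1 KiB = 1024 B. Both are assumptions, not statements made by the report.

```python
# Recompute derived dataset statistics from the raw figures in the report.
n_variables = 5
n_observations = 10_000
missing_cells = 6
duplicate_rows = 12
total_size_kib = 488.3  # as reported (already rounded)

total_cells = n_variables * n_observations             # assumed denominator
missing_pct = 100 * missing_cells / total_cells        # 0.012 %, shown as "< 0.1%"
duplicate_pct = 100 * duplicate_rows / n_observations  # 0.12 %, shown as "0.1%"
avg_record_bytes = total_size_kib * 1024 / n_observations  # assumes 1 KiB = 1024 B

print(f"Missing cells: {missing_pct:.3f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```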

Variable types

Categorical: 3
Text: 2

Dataset

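Description: Provides information on the submission and processing of application documents for the various procedures under the LMO Act, such as import reports and export notifications for LMOs (living modified organisms) used for testing and research and reports on research facilities, as well as information on LMO safety management grades.
Author: 한국생명공학연구원 (Korea Research Institute of Bioscience and Biotechnology)
URL: https://www.data.go.kr/data/15040518/fileData.do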

Alerts

Dataset has 12 (0.1%) duplicate rows [Duplicates]
위험군 is highly overall correlated with 등급 [High correlation]
등급 is highly overall correlated with 위험군 [High correlation]
분류 is highly imbalanced (50.5%) [Imbalance]
위험군 is highly imbalanced (64.4%) [Imbalance]
등급 is highly imbalanced (62.6%) [Imbalance]
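The imbalance percentages can be reproduced from the value counts listed under each variable with a normalized-entropy score, 1 - H / ln(k), where H is the Shannon entropy of the category frequencies and k is the number of distinct categories. This formula is an assumption that happens to match all three reported figures; it is not quoted from the ydata-profiling source.

```python
import math

def imbalance(counts):
    """Normalized-entropy imbalance: 0 for a uniform distribution, closer to 1
    as a single category dominates. Requires at least two categories."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    # Shannon entropy (natural log) of the category frequencies.
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # ln(k) is the maximum possible entropy, so h / ln(k) lies in [0, 1].
    return 1 - h / math.log(k)

# Value counts copied from the Variables section of this report.
value_counts = {
    "분류": [7771, 884, 840, 384, 121],
    "위험군": [7771, 2142, 65, 18, 4],
    "등급": [7771, 2074, 72, 65, 18],
}

for name, counts in value_counts.items():
    print(f"{name}: {imbalance(counts):.1%}")
```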

Reproduction

Analysis started: 2023-12-12 19:44:39.332896
Analysis finished: 2023-12-12 19:44:40.138878
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

분류 (classification)
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.866
Min length: 2
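Because every occurrence of a category has the same length, the length statistics follow directly from the value counts. The sketch below recomputes them for 분류 from the counts in the Common Values table; Python's `len()` counts Unicode code points, so each precomposed Hangul syllable counts as one character.

```python
# Value counts for 분류, copied from the Common Values table.
counts = {"동식물": 7771, "세균": 884, "진균": 840, "바이러스": 384, "기생충": 121}

n = sum(counts.values())
# Count-weighted mean of string lengths; len() counts Unicode code points.
mean_length = sum(len(value) * c for value, c in counts.items()) / n
max_length = max(len(value) for value in counts)
min_length = min(len(value) for value in counts)

print(n, mean_length, max_length, min_length)
```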

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동식물
2nd row: 진균
3rd row: 동식물
4th row: 동식물
5th row: 동식물

Common Values

Value | Count | Frequency (%)
동식물 (animals and plants) | 7771 | 77.7%
세균 (bacteria) | 884 | 8.8%
진균 (fungi) | 840 | 8.4%
바이러스 (viruses) | 384 | 3.8%
기생충 (parasites) | 121 | 1.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

위험군 (risk group)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.3313
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 2
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 7771 | 77.7%
2 | 2142 | 21.4%
3 | 65 | 0.7%
4 | 18 | 0.2%
1 | 4 | < 0.1%
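Although 7771 rows display <NA>, the report lists 0 missing values for 위험군, and the mean length of 3.3313 equals (7771 × 4 + 2229 × 1) / 10000, that is, 7771 values of length 4 plus 2229 one-character risk groups. Both facts are consistent with <NA> being stored as the literal text "<NA>" rather than as a true missing value. Assuming that is the case, the sketch below rebuilds a hypothetical column from the reported counts and maps the placeholder to `None`.

```python
# Hypothetical column rebuilt from the reported value counts of 위험군.
counts = {"<NA>": 7771, "2": 2142, "3": 65, "4": 18, "1": 4}
column = [value for value, c in counts.items() for _ in range(c)]

# Mean length when "<NA>" is treated as ordinary four-character text.
mean_length = sum(len(v) for v in column) / len(column)

# Map the placeholder string to a real missing value (None).
cleaned = [None if v == "<NA>" else v for v in column]
n_missing = sum(v is None for v in cleaned)

print(mean_length, n_missing)
```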

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

구분 (category)
Text

Distinct: 4795
Distinct (%): 48.0%
Missing: 6
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 17
Mean length: 9.484991
Min length: 2

Characters and Unicode

Total characters: 94793
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
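Python's standard `unicodedata` module exposes the general-category property directly. The sketch below tallies categories over the five sample values of this column; the mapping from two-letter codes (Ll, Lu, Zs, Po) to the long names used in the report is written out by hand and covers only those four codes.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this column.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the table.
            counter[CATEGORY_NAMES.get(code, code)] += 1
    return counter

samples = ["Aconitum", "Lyophyllum", "Eriophyes", "Rhomphocallus", "Arctium"]
print(category_counts(samples))
```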

Unique

Unique: 3238
Unique (%): 32.4%

Sample

1st row: Aconitum
2nd row: Lyophyllum
3rd row: Eriophyes
4th row: Rhomphocallus
5th row: Arctium

Common words

Value | Count | Frequency (%)
allium | 85 | 0.9%
aconitum | 76 | 0.8%
amanita | 68 | 0.7%
acer | 66 | 0.7%
agrilus | 56 | 0.6%
achnanthes | 52 | 0.5%
mycoplasma | 44 | 0.4%
alternaria | 43 | 0.4%
acremonium | 41 | 0.4%
acleris | 35 | 0.4%
Other values (4786) | 9431 | 94.3%
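The entries in this table are lowercase ("allium" rather than "Allium"), which suggests that values are lowercased and split into words before counting. A sketch of that tokenization with the standard library, assuming plain whitespace splitting; the report does not state ydata-profiling's exact tokenizer rules, and the input values below are hypothetical.

```python
from collections import Counter

def word_frequencies(values, top=10):
    """Lowercase each value, split on whitespace, and return the most common
    words as (word, count, percentage of all words)."""
    words = Counter()
    for value in values:
        words.update(value.lower().split())
    total = sum(words.values())
    return [(w, c, 100 * c / total) for w, c in words.most_common(top)]

# Hypothetical values, not taken from the dataset.
values = ["Allium", "Allium", "Acer", "Tobacco mosaic virus"]
for word, count, pct in word_frequencies(values, top=3):
    print(f"{word}\t{count}\t{pct:.1f}%")
```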

Most occurring characters

Value | Count | Frequency (%)
a | 10232 | 10.8%
i | 8105 | 8.6%
o | 7588 | 8.0%
e | 6600 | 7.0%
r | 6311 | 6.7%
s | 5984 | 6.3%
l | 5244 | 5.5%
c | 4630 | 4.9%
t | 4493 | 4.7%
n | 4480 | 4.7%
Other values (44) | 31126 | 32.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 84783 | 89.4%
Uppercase Letter | 9993 | 10.5%
Space Separator | 16 | < 0.1%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 10232 | 12.1%
i | 8105 | 9.6%
o | 7588 | 8.9%
e | 6600 | 7.8%
r | 6311 | 7.4%
s | 5984 | 7.1%
l | 5244 | 6.2%
c | 4630 | 5.5%
t | 4493 | 5.3%
n | 4480 | 5.3%
Other values (16) | 21116 | 24.9%

Uppercase Letter
Value | Count | Frequency (%)
A | 3475 | 34.8%
P | 1081 | 10.8%
C | 924 | 9.2%
S | 594 | 5.9%
M | 496 | 5.0%
L | 388 | 3.9%
E | 341 | 3.4%
T | 334 | 3.3%
H | 323 | 3.2%
B | 307 | 3.1%
Other values (16) | 1730 | 17.3%

Space Separator
Value | Count | Frequency (%)
(space) | 16 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
& | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 94776 | > 99.9%
Common | 17 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 10232 | 10.8%
i | 8105 | 8.6%
o | 7588 | 8.0%
e | 6600 | 7.0%
r | 6311 | 6.7%
s | 5984 | 6.3%
l | 5244 | 5.5%
c | 4630 | 4.9%
t | 4493 | 4.7%
n | 4480 | 4.7%
Other values (42) | 31109 | 32.8%

Common
Value | Count | Frequency (%)
(space) | 16 | 94.1%
& | 1 | 5.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 94793 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 10232 | 10.8%
i | 8105 | 8.6%
o | 7588 | 8.0%
e | 6600 | 7.0%
r | 6311 | 6.7%
s | 5984 | 6.3%
l | 5244 | 5.5%
c | 4630 | 4.9%
t | 4493 | 4.7%
n | 4480 | 4.7%
Other values (44) | 31126 | 32.8%

생물체 (organism)
Text

Distinct: 9176
Distinct (%): 91.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 197
Median length: 69
Mean length: 11.6146
Min length: 4

Characters and Unicode

Total characters: 116146
Distinct characters: 249
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
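The standard `unicodedata` module provides general categories but not Unicode blocks or scripts. As an illustration only, the sketch below assigns blocks from a hand-written table of three code-point ranges (Basic Latin, Hangul Syllables, General Punctuation) that would cover the blocks reported for this column. The short labels mirror the report's "ASCII", "Hangul" and "Punctuation"; mapping the report's "Punctuation" to the General Punctuation block (U+2000 to U+206F) is an assumption, and the input strings are hypothetical.

```python
from collections import Counter

# Hand-written, partial block table: (first code point, last code point, label).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),        # Basic Latin
    (0xAC00, 0xD7AF, "Hangul"),       # Hangul Syllables
    (0x2000, 0x206F, "Punctuation"),  # General Punctuation (assumed mapping)
]

def block_of(ch):
    """Return the label of the block containing ch, or "Other" if not listed."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"

def block_counts(values):
    """Count characters per block across all values."""
    return Counter(block_of(ch) for value in values for ch in value)

print(block_counts(["A.kirinense", "가나다", "\u2019"]))
```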

Unique

Unique: 8635
Unique (%): 86.4%

Sample

1st row: A.kirinense
2nd row: L.shimeji
3rd row: E.mali
4th row: R.coreanus
5th row: A.minus

Common words

Value | Count | Frequency (%)
virus | 283 | 2.5%
a.japonica | 30 | 0.3%
c | 25 | 0.2%
b | 23 | 0.2%
a.koreana | 18 | 0.2%
mosaic | 16 | 0.1%
bovine | 16 | 0.1%
herpesvirus | 15 | 0.1%
a | 15 | 0.1%
disease | 15 | 0.1%
Other values (9399) | 10716 | 95.9%

Most occurring characters

Value | Count | Frequency (%)
a | 11510 | 9.9%
i | 11094 | 9.6%
. | 9651 | 8.3%
s | 8664 | 7.5%
e | 7143 | 6.2%
n | 6465 | 5.6%
r | 6149 | 5.3%
u | 5628 | 4.8%
o | 5546 | 4.8%
l | 4923 | 4.2%
Other values (239) | 39373 | 33.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 93838 | 80.8%
Uppercase Letter | 10136 | 8.7%
Other Punctuation | 9677 | 8.3%
Space Separator | 1426 | 1.2%
Other Letter | 736 | 0.6%
Close Punctuation | 120 | 0.1%
Open Punctuation | 119 | 0.1%
Decimal Number | 50 | < 0.1%
Dash Punctuation | 44 | < 0.1%

Most frequent character per category

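Other Letter
Value | Count | Frequency (%)
(character lost) | 70 | 9.5%
(character lost) | 19 | 2.6%
(character lost) | 18 | 2.4%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 14 | 1.9%
Other values (169) | 519 | 70.5%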

Lowercase Letter
Value | Count | Frequency (%)
a | 11510 | 12.3%
i | 11094 | 11.8%
s | 8664 | 9.2%
e | 7143 | 7.6%
n | 6465 | 6.9%
r | 6149 | 6.6%
u | 5628 | 6.0%
o | 5546 | 5.9%
l | 4923 | 5.2%
t | 4653 | 5.0%
Other values (16) | 22063 | 23.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 3489 | 34.4%
P | 1034 | 10.2%
C | 950 | 9.4%
S | 630 | 6.2%
M | 523 | 5.2%
L | 403 | 4.0%
E | 362 | 3.6%
T | 336 | 3.3%
H | 324 | 3.2%
B | 313 | 3.1%
Other values (16) | 1772 | 17.5%

Decimal Number
Value | Count | Frequency (%)
1 | 16 | 32.0%
2 | 13 | 26.0%
3 | 8 | 16.0%
4 | 7 | 14.0%
7 | 2 | 4.0%
5 | 1 | 2.0%
8 | 1 | 2.0%
0 | 1 | 2.0%
9 | 1 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
. | 9651 | 99.7%
, | 20 | 0.2%
' | 3 | < 0.1%
(character lost) | 2 | < 0.1%
/ | 1 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 1426 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 120 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 119 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 44 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 103974 | 89.5%
Common | 11436 | 9.8%
Hangul | 736 | 0.6%

Most frequent character per script

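Hangul
Value | Count | Frequency (%)
(character lost) | 70 | 9.5%
(character lost) | 19 | 2.6%
(character lost) | 18 | 2.4%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 14 | 1.9%
Other values (169) | 519 | 70.5%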

Latin
Value | Count | Frequency (%)
a | 11510 | 11.1%
i | 11094 | 10.7%
s | 8664 | 8.3%
e | 7143 | 6.9%
n | 6465 | 6.2%
r | 6149 | 5.9%
u | 5628 | 5.4%
o | 5546 | 5.3%
l | 4923 | 4.7%
t | 4653 | 4.5%
Other values (42) | 32199 | 31.0%

Common
Value | Count | Frequency (%)
. | 9651 | 84.4%
(space) | 1426 | 12.5%
) | 120 | 1.0%
( | 119 | 1.0%
- | 44 | 0.4%
, | 20 | 0.2%
1 | 16 | 0.1%
2 | 13 | 0.1%
3 | 8 | 0.1%
4 | 7 | 0.1%
Other values (8) | 12 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 115408 | 99.4%
Hangul | 736 | 0.6%
Punctuation | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 11510 | 10.0%
i | 11094 | 9.6%
. | 9651 | 8.4%
s | 8664 | 7.5%
e | 7143 | 6.2%
n | 6465 | 5.6%
r | 6149 | 5.3%
u | 5628 | 4.9%
o | 5546 | 4.8%
l | 4923 | 4.3%
Other values (59) | 38635 | 33.5%
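
Hangul
Value | Count | Frequency (%)
(character lost) | 70 | 9.5%
(character lost) | 19 | 2.6%
(character lost) | 18 | 2.4%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 17 | 2.3%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 15 | 2.0%
(character lost) | 14 | 1.9%
Other values (169) | 519 | 70.5%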

Punctuation
Value | Count | Frequency (%)
(character lost) | 2 | 100.0%

등급 (grade)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.3313
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 2
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 7771 | 77.7%
2 | 2074 | 20.7%
1 | 72 | 0.7%
3 | 65 | 0.7%
4 | 18 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Correlations

[Figure: Correlation plot]

Variable | 분류 | 위험군 | 등급
분류 | 1.000 | 0.368 | 0.805
위험군 | 0.368 | 1.000 | 0.984
등급 | 0.805 | 0.984 | 1.000

[Figure: Correlation plot]

Variable | 분류 | 등급 | 위험군
분류 | 1.000 | 0.446 | 0.151
등급 | 0.446 | 1.000 | 0.827
위험군 | 0.151 | 0.827 | 1.000

[Figure: Correlation plot]

Variable | 분류 | 위험군 | 등급
분류 | 1.000 | 0.151 | 0.446
위험군 | 0.151 | 1.000 | 0.827
등급 | 0.446 | 0.827 | 1.000
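The report does not say which coefficient produced each matrix. For pairs of categorical variables such as these, one widely used measure is Cramér's V, V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is Pearson's chi-squared statistic of the r × c contingency table. The sketch below computes the plain (uncorrected) version with the standard library; it illustrates the idea and is not claimed to reproduce the figures above. It requires at least two rows and two columns.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Toy tables, not taken from this dataset.
print(cramers_v([[10, 0], [0, 10]]))  # 1.0 (perfect association)
print(cramers_v([[5, 5], [5, 5]]))    # 0.0 (independence)
```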

Missing values

[Figure] A simple visualization of nullity by column.

[Figure] Nullity matrix: a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

Index | 분류 | 위험군 | 구분 | 생물체 | 등급
3979 | 동식물 | <NA> | Aconitum | A.kirinense | <NA>
2824 | 진균 | 2 | Lyophyllum | L.shimeji | 2
8898 | 동식물 | <NA> | Eriophyes | E.mali | <NA>
12699 | 동식물 | <NA> | Rhomphocallus | R.coreanus | <NA>
6660 | 동식물 | <NA> | Arctium | A.minus | <NA>
13080 | 동식물 | <NA> | Semisulcospira | S.libertina | <NA>
6041 | 동식물 | <NA> | Alosterna | A.perpera | <NA>
7278 | 동식물 | <NA> | Calopogonium | C.mucunoides | <NA>
6011 | 동식물 | <NA> | Alonella | A.exigua | <NA>
7608 | 동식물 | <NA> | Chitalpa | C.tashkinensis | <NA>

Last rows

Index | 분류 | 위험군 | 구분 | 생물체 | 등급
1068 | 세균 | 2 | Actinokineospora | A.inagensis | 2
12329 | 동식물 | <NA> | Prunus | P.ishidoyana | <NA>
13560 | 동식물 | <NA> | Tegecoelotes | T.secundus | <NA>
13122 | 동식물 | <NA> | Shiragaia | S.taeguensis | <NA>
13734 | 동식물 | <NA> | Todarodes | T.pacificus | <NA>
10829 | 동식물 | <NA> | Molophilus | M.avidus | <NA>
11837 | 동식물 | <NA> | Philodromus | P.auricomus | <NA>
1330 | 세균 | 2 | Achnanthes | A.rupestoides | 2
12990 | 동식물 | <NA> | Scirpus | S.juncoides | <NA>
8123 | 동식물 | <NA> | Cryptoblabes | C.adoceta | <NA>
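In every sample row, 생물체 is the first letter of 구분 followed by a period and a species epithet (Aconitum and A.kirinense, for example). Assuming that convention holds across the dataset (the report does not state it, and multi-word virus names in 생물체 suggest it may not), a sketch of a consistency check that flags rows breaking it. The last row below is hypothetical.

```python
def follows_convention(genus, organism):
    """True if organism looks like '<genus initial>.<epithet>' for the given genus."""
    if not genus or not organism:
        return False  # 구분 has 6 missing values; treat them as non-conforming
    initial, sep, epithet = organism.partition(".")
    return sep == "." and initial == genus[0] and epithet != ""

rows = [
    ("Aconitum", "A.kirinense"),
    ("Lyophyllum", "L.shimeji"),
    ("Prunus", "P.ishidoyana"),
    ("Acer", "P.palmatum"),  # hypothetical mismatched row
]
flagged = [row for row in rows if not follows_convention(*row)]
print(flagged)
```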

Duplicate rows

Most frequently occurring

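Row | 분류 | 위험군 | 구분 | 생물체 | 등급 | # duplicates
0 | 동식물 | <NA> | Alopecurus | A.aequalis | <NA> | 2
1 | 동식물 | <NA> | Chrysso | C.lativentris | <NA> | 2
2 | 동식물 | <NA> | Clubiona | C.papillata | <NA> | 2
3 | 동식물 | <NA> | Hydrolithon | H.sargassi | <NA> | 2
4 | 동식물 | <NA> | Melanoplus | M.differentialis | <NA> | 2
5 | 동식물 | <NA> | Oncometopia | O.nigricans | <NA> | 2
6 | 동식물 | <NA> | Prosopis | P.juliflora | <NA> | 2
7 | 동식물 | <NA> | Sarcocheilichthys | S.variegatus | <NA> | 2
8 | 동식물 | <NA> | Spergularia | S.marina | <NA> | 2
9 | 동식물 | <NA> | Vespa | V.velutina | <NA> | 2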
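The # duplicates column gives how often each duplicated row occurs. A sketch of computing that column with `collections.Counter` over whole rows, treating each row as a tuple. The input rows are hypothetical (the second row is invented), and whether the headline figure of 12 counts distinct duplicated rows or extra copies is not something this sketch settles.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return (row, occurrences) for every row that appears more than once,
    most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]

# Hypothetical rows in the column order 분류, 위험군, 구분, 생물체, 등급.
rows = [
    ("동식물", "<NA>", "Vespa", "V.velutina", "<NA>"),
    ("동식물", "<NA>", "Vespa", "V.velutina", "<NA>"),
    ("세균", "2", "Mycoplasma", "M.bovis", "2"),
]
for row, n in duplicate_rows(rows):
    print(row, n)
```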