Dataset statistics
Number of variables | 5 |
---|---|
Number of observations | 10000 |
Missing cells | 6 |
Missing cells (%) | < 0.1% |
Duplicate rows | 12 |
Duplicate rows (%) | 0.1% |
Total size in memory | 488.3 KiB |
Average record size in memory | 50.0 B |
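The two percentage rows above come from simple ratios. Missing cells are counted over all 10000 × 5 cells, and duplicate rows over the 10000 rows. The sketch below reproduces both figures. `fmt_percent` is a hypothetical helper that follows the report's visible convention of printing "< 0.1%" for tiny non-zero shares; it is not ydata-profiling's own function.

```python
def fmt_percent(value: float) -> str:
    """Format a fraction as a one-decimal percentage.

    Non-zero values that would round to 0.0% print as "< 0.1%", and values
    below 1 that would round to 100.0% print as "> 99.9%" (assumed convention,
    matching what this report displays).
    """
    pct = value * 100
    if value > 0 and round(pct, 1) == 0:
        return "< 0.1%"
    if value < 1 and round(pct, 1) == 100:
        return "> 99.9%"
    return f"{pct:.1f}%"


n_rows, n_vars = 10_000, 5
missing_cells = 6
duplicate_rows = 12

missing_pct = fmt_percent(missing_cells / (n_rows * n_vars))  # 6 / 50,000 = 0.012%
duplicate_pct = fmt_percent(duplicate_rows / n_rows)          # 12 / 10,000 = 0.12%
print("Missing cells (%):", missing_pct)
print("Duplicate rows (%):", duplicate_pct)
```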
Variable types
Categorical | 3 |
---|---|
Text | 2 |
Dataset
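Description | Provides information on LMO safety management grades and on the receipt and processing of petition documents for the procedures established under the LMO Act, such as import reports and export notifications for LMOs intended for testing and research, and research-facility reports. |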
---|---|
Author | Korea Research Institute of Bioscience and Biotechnology (한국생명공학연구원) |
URL | https://www.data.go.kr/data/15040518/fileData.do |
Reproduction
Analysis started | 2023-12-12 19:44:39.332896 |
---|---|
Analysis finished | 2023-12-12 19:44:40.138878 |
Duration | 0.81 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
분류 (category)
Categorical
IMBALANCE
Distinct | 5 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 4 |
---|---|
Median length | 3 |
Mean length | 2.866 |
Min length | 2 |
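The mean length can be checked by hand. Weight each label's character count (동식물 3, 세균 2, 진균 2, 바이러스 4, 기생충 3) by its count from the Common Values table:

```latex
\bar{L} = \frac{7771 \cdot 3 + 884 \cdot 2 + 840 \cdot 2 + 384 \cdot 4 + 121 \cdot 3}{10000}
        = \frac{28660}{10000} = 2.866
```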
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 동식물 |
---|---|
2nd row | 진균 |
3rd row | 동식물 |
4th row | 동식물 |
5th row | 동식물 |
Common Values
Value | Count | Frequency (%) |
동식물 (animals and plants) | 7771 | 77.7% |
세균 (bacteria) | 884 | 8.8% |
진균 (fungi) | 840 | 8.4% |
바이러스 (viruses) | 384 | 3.8% |
기생충 (parasites) | 121 | 1.2% |
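The IMBALANCE alert on 분류 can be sanity-checked with an entropy-based score. This assumes, without confirmation from the report, that ydata-profiling uses 1 minus the normalized Shannon entropy of the value counts with a default threshold of 0.5. Under that assumption, the counts above give a score just over 0.5:

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - normalized Shannon entropy of a categorical value distribution.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumption: this mirrors the score behind ydata-profiling's IMBALANCE
    alert, with a default threshold of 0.5.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Value counts for 분류 from the Common Values table above.
bunryu_counts = [7771, 884, 840, 384, 121]
score = imbalance_score(bunryu_counts)
print(f"imbalance(분류) = {score:.3f}", "-> IMBALANCE" if score > 0.5 else "")
```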
위험군 (risk group)
Categorical
HIGH CORRELATION
IMBALANCE
Distinct | 5 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 3.3313 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | <NA> |
---|---|
2nd row | 2 |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 7771 | 77.7% |
2 | 2142 | 21.4% |
3 | 65 | 0.7% |
4 | 18 | 0.2% |
1 | 4 | < 0.1% |
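The report's own numbers suggest that `<NA>` in 위험군 is stored as literal text rather than as a true missing value. Missing is 0 even though `<NA>` is the most common value, and the mean length of 3.3313 equals (7771 · 4 + 2229 · 1) / 10000, i.e. `<NA>` counted as a 4-character string. The sketch below reproduces that figure from the value counts and then reinterprets the placeholder as missing:

```python
from collections import Counter

# Value counts for 위험군 as reported above; "<NA>" is treated as text here.
counts = Counter({"<NA>": 7771, "2": 2142, "3": 65, "4": 18, "1": 4})
n = sum(counts.values())

# Mean string length, counting "<NA>" as a 4-character value.
mean_len = sum(len(v) * c for v, c in counts.items()) / n
print(f"mean length with '<NA>' as text: {mean_len:.4f}")

# Reinterpret the placeholder as a real missing value (None) and recount.
cleaned = Counter({(None if v == "<NA>" else v): c for v, c in counts.items()})
missing = cleaned[None]
print(f"missing after cleaning: {missing} ({missing / n:.1%})")
```

In pandas, a similar cleanup would be `df["위험군"].replace("<NA>", pd.NA)` before profiling. The same reasoning applies to 등급.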
구분 (division)
Text
Distinct | 4795 |
---|---|
Distinct (%) | 48.0% |
Missing | 6 |
Missing (%) | 0.1% |
Memory size | 156.2 KiB |
Length
Max length | 20 |
---|---|
Median length | 17 |
Mean length | 9.484991 |
Min length | 2 |
Characters and Unicode
Total characters | 94793 |
---|---|
Distinct characters | 54 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 3238 |
---|---|
Unique (%) | 32.4% |
Sample
1st row | Aconitum |
---|---|
2nd row | Lyophyllum |
3rd row | Eriophyes |
4th row | Rhomphocallus |
5th row | Arctium |
Most occurring words
Value | Count | Frequency (%) |
allium | 85 | 0.9% |
aconitum | 76 | 0.8% |
amanita | 68 | 0.7% |
acer | 66 | 0.7% |
agrilus | 56 | 0.6% |
achnanthes | 52 | 0.5% |
mycoplasma | 44 | 0.4% |
alternaria | 43 | 0.4% |
acremonium | 41 | 0.4% |
acleris | 35 | 0.4% |
Other values (4786) | 9431 | 94.3% |
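The word table lists lowercase forms (`allium`, `aconitum`) even though the samples are capitalized, which implies that values are case-folded before counting. A minimal sketch, assuming a simple lowercase-and-split-on-whitespace tokenizer (ydata-profiling's own tokenization may differ), on a hypothetical mini-sample:

```python
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Count lowercased, whitespace-separated words across all values.

    Assumption: a simplified tokenizer; ydata-profiling's own word
    splitting and stop-word handling may differ.
    """
    counter: Counter = Counter()
    for value in values:
        counter.update(value.lower().split())
    return counter


# Hypothetical mini-sample in the style of the 구분 column.
sample = ["Aconitum", "Allium", "Aconitum", "Mycoplasma", "Allium", "Allium"]
wc = word_counts(sample)
for word, count in wc.most_common(3):
    print(word, count)
```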
Most occurring characters
Value | Count | Frequency (%) |
a | 10232 | 10.8% |
i | 8105 | 8.6% |
o | 7588 | 8.0% |
e | 6600 | 7.0% |
r | 6311 | 6.7% |
s | 5984 | 6.3% |
l | 5244 | 5.5% |
c | 4630 | 4.9% |
t | 4493 | 4.7% |
n | 4480 | 4.7% |
Other values (44) | 31126 | 32.8% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 84783 | 89.4% |
Uppercase Letter | 9993 | 10.5% |
Space Separator | 16 | < 0.1% |
Other Punctuation | 1 | < 0.1% |
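The category names above are long forms of Unicode general categories (`Ll` = Lowercase Letter, `Zs` = Space Separator, and so on). A stdlib sketch of the same tally, using Python's `unicodedata.category` on hypothetical values:

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counter: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counter[CATEGORY_NAMES.get(code, code)] += 1
    return counter


# Hypothetical values: one containing a space and one containing '&'.
counts = category_counts(["Aconitum", "Tobacco mosaic", "A&B"])
print(counts)
```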
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 10232 | 12.1% |
i | 8105 | 9.6% |
o | 7588 | 8.9% |
e | 6600 | 7.8% |
r | 6311 | 7.4% |
s | 5984 | 7.1% |
l | 5244 | 6.2% |
c | 4630 | 5.5% |
t | 4493 | 5.3% |
n | 4480 | 5.3% |
Other values (16) | 21116 | 24.9% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 3475 | 34.8% |
P | 1081 | 10.8% |
C | 924 | 9.2% |
S | 594 | 5.9% |
M | 496 | 5.0% |
L | 388 | 3.9% |
E | 341 | 3.4% |
T | 334 | 3.3% |
H | 323 | 3.2% |
B | 307 | 3.1% |
Other values (16) | 1730 | 17.3% |
Space Separator
Value | Count | Frequency (%) |
(space) | 16 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
& | 1 | 100.0% |
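Percentages in the per-category tables are relative to that category's own total, not to all characters. That is why `a` shows a different share here than in the overall character table. Worked for `a` in 구분:

```latex
\frac{10232}{94793} \approx 10.8\% \ \text{(all characters)}, \qquad
\frac{10232}{84783} \approx 12.1\% \ \text{(Lowercase Letter only)}
```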
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 94776 | > 99.9% |
Common | 17 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 10232 | 10.8% |
i | 8105 | 8.6% |
o | 7588 | 8.0% |
e | 6600 | 7.0% |
r | 6311 | 6.7% |
s | 5984 | 6.3% |
l | 5244 | 5.5% |
c | 4630 | 4.9% |
t | 4493 | 4.7% |
n | 4480 | 4.7% |
Other values (42) | 31109 | 32.8% |
Common
Value | Count | Frequency (%) |
(space) | 16 | 94.1% |
& | 1 | 5.9% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 94793 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 10232 | 10.8% |
i | 8105 | 8.6% |
o | 7588 | 8.0% |
e | 6600 | 7.0% |
r | 6311 | 6.7% |
s | 5984 | 6.3% |
l | 5244 | 5.5% |
c | 4630 | 4.9% |
t | 4493 | 4.7% |
n | 4480 | 4.7% |
Other values (44) | 31126 | 32.8% |
생물체 (organism)
Text
Distinct | 9176 |
---|---|
Distinct (%) | 91.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 197 |
---|---|
Median length | 69 |
Mean length | 11.6146 |
Min length | 4 |
Characters and Unicode
Total characters | 116146 |
---|---|
Distinct characters | 249 |
Distinct categories | 9 |
Distinct scripts | 3 |
Distinct blocks | 3 |
Unique
Unique | 8635 |
---|---|
Unique (%) | 86.4% |
Sample
1st row | A.kirinense |
---|---|
2nd row | L.shimeji |
3rd row | E.mali |
4th row | R.coreanus |
5th row | A.minus |
Most occurring words
Value | Count | Frequency (%) |
virus | 283 | 2.5% |
a.japonica | 30 | 0.3% |
c | 25 | 0.2% |
b | 23 | 0.2% |
a.koreana | 18 | 0.2% |
mosaic | 16 | 0.1% |
bovine | 16 | 0.1% |
herpesvirus | 15 | 0.1% |
a | 15 | 0.1% |
disease | 15 | 0.1% |
Other values (9399) | 10716 | 95.9% |
Most occurring characters
Value | Count | Frequency (%) |
a | 11510 | 9.9% |
i | 11094 | 9.6% |
. | 9651 | 8.3% |
s | 8664 | 7.5% |
e | 7143 | 6.2% |
n | 6465 | 5.6% |
r | 6149 | 5.3% |
u | 5628 | 4.8% |
o | 5546 | 4.8% |
l | 4923 | 4.2% |
Other values (239) | 39373 | 33.9% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 93838 | 80.8% |
Uppercase Letter | 10136 | 8.7% |
Other Punctuation | 9677 | 8.3% |
Space Separator | 1426 | 1.2% |
Other Letter | 736 | 0.6% |
Close Punctuation | 120 | 0.1% |
Open Punctuation | 119 | 0.1% |
Decimal Number | 50 | < 0.1% |
Dash Punctuation | 44 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
충 | 70 | 9.5% |
구 | 19 | 2.6% |
지 | 18 | 2.4% |
리 | 17 | 2.3% |
스 | 17 | 2.3% |
제 | 17 | 2.3% |
모 | 15 | 2.0% |
이 | 15 | 2.0% |
선 | 15 | 2.0% |
편 | 14 | 1.9% |
Other values (169) | 519 | 70.5% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 11510 | 12.3% |
i | 11094 | 11.8% |
s | 8664 | 9.2% |
e | 7143 | 7.6% |
n | 6465 | 6.9% |
r | 6149 | 6.6% |
u | 5628 | 6.0% |
o | 5546 | 5.9% |
l | 4923 | 5.2% |
t | 4653 | 5.0% |
Other values (16) | 22063 | 23.5% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 3489 | 34.4% |
P | 1034 | 10.2% |
C | 950 | 9.4% |
S | 630 | 6.2% |
M | 523 | 5.2% |
L | 403 | 4.0% |
E | 362 | 3.6% |
T | 336 | 3.3% |
H | 324 | 3.2% |
B | 313 | 3.1% |
Other values (16) | 1772 | 17.5% |
Decimal Number
Value | Count | Frequency (%) |
1 | 16 | 32.0% |
2 | 13 | 26.0% |
3 | 8 | 16.0% |
4 | 7 | 14.0% |
7 | 2 | 4.0% |
5 | 1 | 2.0% |
8 | 1 | 2.0% |
0 | 1 | 2.0% |
9 | 1 | 2.0% |
Other Punctuation
Value | Count | Frequency (%) |
. | 9651 | 99.7% |
, | 20 | 0.2% |
' | 3 | < 0.1% |
※ | 2 | < 0.1% |
/ | 1 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1426 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 120 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 119 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 44 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 103974 | 89.5% |
Common | 11436 | 9.8% |
Hangul | 736 | 0.6% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
충 | 70 | 9.5% |
구 | 19 | 2.6% |
지 | 18 | 2.4% |
리 | 17 | 2.3% |
스 | 17 | 2.3% |
제 | 17 | 2.3% |
모 | 15 | 2.0% |
이 | 15 | 2.0% |
선 | 15 | 2.0% |
편 | 14 | 1.9% |
Other values (169) | 519 | 70.5% |
Latin
Value | Count | Frequency (%) |
a | 11510 | 11.1% |
i | 11094 | 10.7% |
s | 8664 | 8.3% |
e | 7143 | 6.9% |
n | 6465 | 6.2% |
r | 6149 | 5.9% |
u | 5628 | 5.4% |
o | 5546 | 5.3% |
l | 4923 | 4.7% |
t | 4653 | 4.5% |
Other values (42) | 32199 | 31.0% |
Common
Value | Count | Frequency (%) |
. | 9651 | 84.4% |
(space) | 1426 | 12.5% |
) | 120 | 1.0% |
( | 119 | 1.0% |
- | 44 | 0.4% |
, | 20 | 0.2% |
1 | 16 | 0.1% |
2 | 13 | 0.1% |
3 | 8 | 0.1% |
4 | 7 | 0.1% |
Other values (8) | 12 | 0.1% |
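Python's `unicodedata` module exposes no Unicode Script property. A rough way to reproduce a Latin/Hangul/Common split like the one above is to use the character name prefix. This is an approximation: it mislabels scripts such as Han as "Common", and it is not how ydata-profiling itself assigns scripts. Sketch on a hypothetical organism string:

```python
import unicodedata
from collections import Counter


def approx_script(ch: str) -> str:
    """Approximate a character's Unicode script from its name.

    Assumption: uses the name prefix because unicodedata has no Script
    property; adequate for Latin/Hangul text only.
    """
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("HANGUL"):
        return "Hangul"
    return "Common"


# Hypothetical organism name mixing Latin letters, punctuation and Hangul.
script_counts = Counter(approx_script(ch) for ch in "A.kirinense (충)")
print(script_counts)
```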
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 115408 | 99.4% |
Hangul | 736 | 0.6% |
Punctuation | 2 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 11510 | 10.0% |
i | 11094 | 9.6% |
. | 9651 | 8.4% |
s | 8664 | 7.5% |
e | 7143 | 6.2% |
n | 6465 | 5.6% |
r | 6149 | 5.3% |
u | 5628 | 4.9% |
o | 5546 | 4.8% |
l | 4923 | 4.3% |
Other values (59) | 38635 | 33.5% |
Hangul
Value | Count | Frequency (%) |
충 | 70 | 9.5% |
구 | 19 | 2.6% |
지 | 18 | 2.4% |
리 | 17 | 2.3% |
스 | 17 | 2.3% |
제 | 17 | 2.3% |
모 | 15 | 2.0% |
이 | 15 | 2.0% |
선 | 15 | 2.0% |
편 | 14 | 1.9% |
Other values (169) | 519 | 70.5% |
Punctuation
Value | Count | Frequency (%) |
※ | 2 | 100.0% |
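Unicode blocks are fixed code-point ranges, so a block tally only needs a range lookup. The sketch makes two assumptions: the report's "Hangul" block is Hangul Syllables (U+AC00 to U+D7AF), and its "Punctuation" block is General Punctuation (U+2000 to U+206F), which contains `※` (U+203B). The input string is hypothetical.

```python
from collections import Counter

# (start, end, label) code-point ranges for the blocks seen in this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x2000, 0x206F, "Punctuation"),  # assumed: General Punctuation block
    (0xAC00, 0xD7AF, "Hangul"),       # assumed: Hangul Syllables block
]


def block_of(ch: str) -> str:
    """Return the label of the block containing ch, or "Other"."""
    cp = ord(ch)
    for start, end, label in BLOCKS:
        if start <= cp <= end:
            return label
    return "Other"


block_counts = Counter(block_of(ch) for ch in "P.koraiensis ※ 충")
print(block_counts)
```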
등급 (grade)
Categorical
HIGH CORRELATION
IMBALANCE
Distinct | 5 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 3.3313 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | <NA> |
---|---|
2nd row | 2 |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 7771 | 77.7% |
2 | 2074 | 20.7% |
1 | 72 | 0.7% |
3 | 65 | 0.7% |
4 | 18 | 0.2% |
Correlations
Variable | 분류 | 위험군 | 등급 |
---|---|---|---|
분류 | 1.000 | 0.368 | 0.805 |
위험군 | 0.368 | 1.000 | 0.984 |
등급 | 0.805 | 0.984 | 1.000 |
Variable | 분류 | 등급 | 위험군 |
---|---|---|---|
분류 | 1.000 | 0.446 | 0.151 |
등급 | 0.446 | 1.000 | 0.827 |
위험군 | 0.151 | 0.827 | 1.000 |
Variable | 분류 | 위험군 | 등급 |
---|---|---|---|
분류 | 1.000 | 0.151 | 0.446 |
위험군 | 0.151 | 1.000 | 0.827 |
등급 | 0.446 | 0.827 | 1.000 |
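The report does not say which measure produced each matrix. As far as we know, ydata-profiling's default "auto" setting uses Cramér's V for pairs of categorical variables, but treat that as an assumption. The sketch below computes the plain, uncorrected Cramér's V from paired observations. It is not expected to reproduce the values above exactly, and the example pairs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(pairs: list[tuple[str, str]]) -> float:
    """Cramér's V between two categorical variables from paired observations.

    Plain (uncorrected) formula: V = sqrt(chi2 / (n * (min(r, c) - 1))).
    """
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint[(x, y)]
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical (분류, 위험군) pairs: perfectly associated vs. independent.
perfect = [("세균", "2")] * 5 + [("동식물", "<NA>")] * 5
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(cramers_v(perfect), cramers_v(independent))
```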
Sample
First rows
Index | 분류 | 위험군 | 구분 | 생물체 | 등급 |
---|---|---|---|---|---|
3979 | 동식물 | <NA> | Aconitum | A.kirinense | <NA> |
2824 | 진균 | 2 | Lyophyllum | L.shimeji | 2 |
8898 | 동식물 | <NA> | Eriophyes | E.mali | <NA> |
12699 | 동식물 | <NA> | Rhomphocallus | R.coreanus | <NA> |
6660 | 동식물 | <NA> | Arctium | A.minus | <NA> |
13080 | 동식물 | <NA> | Semisulcospira | S.libertina | <NA> |
6041 | 동식물 | <NA> | Alosterna | A.perpera | <NA> |
7278 | 동식물 | <NA> | Calopogonium | C.mucunoides | <NA> |
6011 | 동식물 | <NA> | Alonella | A.exigua | <NA> |
7608 | 동식물 | <NA> | Chitalpa | C.tashkinensis | <NA> |
Last rows
Index | 분류 | 위험군 | 구분 | 생물체 | 등급 |
---|---|---|---|---|---|
1068 | 세균 | 2 | Actinokineospora | A.inagensis | 2 |
12329 | 동식물 | <NA> | Prunus | P.ishidoyana | <NA> |
13560 | 동식물 | <NA> | Tegecoelotes | T.secundus | <NA> |
13122 | 동식물 | <NA> | Shiragaia | S.taeguensis | <NA> |
13734 | 동식물 | <NA> | Todarodes | T.pacificus | <NA> |
10829 | 동식물 | <NA> | Molophilus | M.avidus | <NA> |
11837 | 동식물 | <NA> | Philodromus | P.auricomus | <NA> |
1330 | 세균 | 2 | Achnanthes | A.rupestoides | 2 |
12990 | 동식물 | <NA> | Scirpus | S.juncoides | <NA> |
8123 | 동식물 | <NA> | Cryptoblabes | C.adoceta | <NA> |
Duplicate rows
Most frequently occurring
Index | 분류 | 위험군 | 구분 | 생물체 | 등급 | # duplicates |
---|---|---|---|---|---|---|
0 | 동식물 | <NA> | Alopecurus | A.aequalis | <NA> | 2 |
1 | 동식물 | <NA> | Chrysso | C.lativentris | <NA> | 2 |
2 | 동식물 | <NA> | Clubiona | C.papillata | <NA> | 2 |
3 | 동식물 | <NA> | Hydrolithon | H.sargassi | <NA> | 2 |
4 | 동식물 | <NA> | Melanoplus | M.differentialis | <NA> | 2 |
5 | 동식물 | <NA> | Oncometopia | O.nigricans | <NA> | 2 |
6 | 동식물 | <NA> | Prosopis | P.juliflora | <NA> | 2 |
7 | 동식물 | <NA> | Sarcocheilichthys | S.variegatus | <NA> | 2 |
8 | 동식물 | <NA> | Spergularia | S.marina | <NA> | 2 |
9 | 동식물 | <NA> | Vespa | V.velutina | <NA> | 2 |
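One plausible reading of the 12 duplicate rows in the overview is that the figure counts redundant copies, as pandas' `DataFrame.duplicated()` does with its default `keep="first"`. That reading is an assumption. Under it, the "# duplicates" column above is the size of each repeated group. A stdlib sketch of both figures, using hypothetical rows modelled on the table:

```python
from collections import Counter


def duplicate_summary(rows: list[tuple]) -> tuple[int, list[tuple[tuple, int]]]:
    """Return (redundant copies, [(row, group size)]) for rows seen more than once."""
    counts = Counter(rows)
    repeated = [(row, c) for row, c in counts.most_common() if c > 1]
    n_redundant = sum(c - 1 for _, c in repeated)  # first occurrence is not a duplicate
    return n_redundant, repeated


# Hypothetical rows in the column order 분류, 위험군, 구분, 생물체, 등급.
rows = [
    ("동식물", "<NA>", "Vespa", "V.velutina", "<NA>"),
    ("동식물", "<NA>", "Vespa", "V.velutina", "<NA>"),
    ("세균", "2", "Achnanthes", "A.rupestoides", "2"),
]
n_dup, repeated = duplicate_summary(rows)
print(n_dup, repeated)
```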