Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 2222 |
Missing cells | 2744 |
Missing cells (%) | 20.6% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 106.5 KiB |
Average record size in memory | 49.1 B |
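The derived figures in this table can be re-computed from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 B; the variable names are illustrative and not part of the report.

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 6
n_observations = 2222
missing_cells = 2744
total_size_kib = 106.5

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the reported 20.6% and 49.1 B. The 2744 missing cells also equal the sum of the two per-column missing counts listed in the alerts (522 + 2222).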
Variable types
Categorical | 3 |
---|---|
Text | 2 |
Unsupported | 1 |
Dataset
Description | Plant quarantine regulated pest species information (prohibited pests and controlled pests) |
---|---|
Author | Animal and Plant Quarantine Agency (농림축산검역본부) |
URL | https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220214000000001888 |
검역지위 is highly overall correlated with 분류 | High correlation |
분류 is highly overall correlated with 검역지위 | High correlation |
검역지위 is highly imbalanced (79.8%) | Imbalance |
분류 is highly imbalanced (52.5%) | Imbalance |
검토 is highly imbalanced (98.2%) | Imbalance |
비고(일반명, 과명 등) has 522 (23.5%) missing values | Missing |
DS has 2222 (100.0%) missing values | Missing |
DS is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
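The imbalance percentages in the alerts are consistent with a normalized-entropy score, 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of distinct classes. The sketch below assumes that definition (the report itself does not state it) and uses the class counts from the Common Values tables of 검역지위 and 검토; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k), with H the Shannon entropy (in bits) of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance score
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

quarantine_status_counts = [2087, 77, 51, 7]  # 검역지위
review_counts = [2216, 5, 1]                  # 검토

print(f"검역지위: {imbalance_score(quarantine_status_counts):.1%}")
print(f"검토: {imbalance_score(review_counts):.1%}")
```

With these counts the score rounds to 79.8% for 검역지위 and 98.2% for 검토, matching the alerts. A score near 100% means almost all rows fall into one class.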
Reproduction
Analysis started | 2023-12-11 03:05:47.059040 |
---|---|
Analysis finished | 2023-12-11 03:05:47.799728 |
Duration | 0.74 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
검역지위
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 4 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.5 KiB |
관리병해충 | 2087 |
---|---|
금지병해충 | 77 |
규제비검역병해충 | 51 |
금지병해충(매개충) | 7 |
Length
Max length | 10 |
---|---|
Median length | 5 |
Mean length | 5.0846085 |
Min length | 5 |
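The length statistics follow directly from the category labels and their counts. As a rough check, assuming length is measured in Unicode code points (which is what Python's `len` returns for these precomposed Hangul strings), the values can be recomputed as follows:

```python
from statistics import mean, median

# Category labels and counts from the Common Values table for 검역지위.
value_counts = {
    "관리병해충": 2087,
    "금지병해충": 77,
    "규제비검역병해충": 51,
    "금지병해충(매개충)": 7,
}

# One length entry per row of the dataset.
lengths = [len(label) for label, n in value_counts.items() for _ in range(n)]

print("max:", max(lengths))
print("median:", median(lengths))
print("mean:", round(mean(lengths), 7))
print("min:", min(lengths))
```

The mean works out to 11298 / 2222, which rounds to the reported 5.0846085.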
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 금지병해충 |
---|---|
2nd row | 금지병해충 |
3rd row | 금지병해충 |
4th row | 금지병해충 |
5th row | 금지병해충 |
Common Values
Value | Count | Frequency (%) |
관리병해충 | 2087 | 93.9% |
금지병해충 | 77 | 3.5% |
규제비검역병해충 | 51 | 2.3% |
금지병해충(매개충) | 7 | 0.3% |
Words
Value | Count | Frequency (%) |
관리병해충 | 2087 | 93.9% |
금지병해충 | 77 | 3.5% |
규제비검역병해충 | 51 | 2.3% |
금지병해충(매개충 | 7 | 0.3% |
분류
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 11 |
---|---|
Distinct (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.5 KiB |
곤충 | 1517 |
---|---|
진균 | 348 |
바이러스 | 112 |
세균 | 62 |
응애 | 57 |
Other values (6) | 126 |
Length
Max length | 7 |
---|---|
Median length | 2 |
Mean length | 2.139964 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 진균 |
---|---|
2nd row | 진균 |
3rd row | 진균 |
4th row | 진균 |
5th row | 진균 |
Common Values
Value | Count | Frequency (%) |
곤충 | 1517 | 68.3% |
진균 | 348 | 15.7% |
바이러스 | 112 | 5.0% |
세균 | 62 | 2.8% |
응애 | 57 | 2.6% |
잡초 | 46 | 2.1% |
선충 | 41 | 1.8% |
달팽이 | 21 | 0.9% |
바이로이드 | 10 | 0.5% |
곤충(매개충) | 7 | 0.3% |
Words
Value | Count | Frequency (%) |
곤충 | 1517 | 68.3% |
진균 | 348 | 15.7% |
바이러스 | 112 | 5.0% |
세균 | 62 | 2.8% |
응애 | 57 | 2.6% |
잡초 | 46 | 2.1% |
선충 | 41 | 1.8% |
달팽이 | 21 | 0.9% |
바이로이드 | 10 | 0.5% |
곤충(매개충 | 7 | 0.3% |
학명
Text
Distinct | 2221 |
---|---|
Distinct (%) | > 99.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.5 KiB |
Length
Max length | 347 |
---|---|
Median length | 148 |
Mean length | 34.169217 |
Min length | 11 |
Characters and Unicode
Total characters | 75924 |
---|---|
Distinct characters | 84 |
Distinct categories | 11 |
Distinct scripts | 3 |
Distinct blocks | 3 |
Unique
Unique | 2220 |
---|---|
Unique (%) | 99.9% |
Sample
1st row | Balansia oryzae-sativae |
---|---|
2nd row | Cronartium coleosporioides |
3rd row | Peronospora tabacina |
4th row | Phytophthora ramorum |
5th row | Synchytrium endobioticum |
Words
Value | Count | Frequency (%) |
(empty token) | 379 | 4.3% |
virus | 106 | 1.2% |
fabricius | 92 | 1.0% |
linnaeus | 64 | 0.7% |
walker | 51 | 0.6% |
l | 46 | 0.5% |
et | 39 | 0.4% |
say | 35 | 0.4% |
cockerell | 31 | 0.4% |
al | 30 | 0.3% |
Other values (4651) | 7959 |
Most occurring characters
Value | Count | Frequency (%) |
a | 6648 | 8.8% |
(space) | 6578 | 8.7% |
i | 5665 | 7.5% |
e | 5205 | 6.9% |
s | 4717 | 6.2% |
r | 4674 | 6.2% |
o | 4390 | 5.8% |
l | 3600 | 4.7% |
n | 3468 | 4.6% |
t | 3124 | 4.1% |
Other values (74) | 27855 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 58769 | 77.4% |
Space Separator | 6578 | 8.7% |
Uppercase Letter | 5807 | 7.6% |
Other Punctuation | 1446 | 1.9% |
Open Punctuation | 1388 | 1.8% |
Close Punctuation | 1386 | 1.8% |
Decimal Number | 269 | 0.4% |
Math Symbol | 160 | 0.2% |
Control | 81 | 0.1% |
Dash Punctuation | 27 | < 0.1% |
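The category names in this table are Unicode general categories. A similar tally can be sketched with Python's standard `unicodedata` module, which returns two-letter codes (for example `Ll` for Lowercase Letter, `Lu` for Uppercase Letter, `Zs` for Space Separator, `Pd` for Dash Punctuation). This is only an illustration on the first sample value of 학명, not the profiler's own implementation; `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "Balansia oryzae-sativae"  # 1st row of the 학명 sample
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

Applying the same function to every value in the column and summing the counters would give a per-category total comparable to the table above.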
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 6648 | 11.3% |
i | 5665 | 9.6% |
e | 5205 | 8.9% |
s | 4717 | 8.0% |
r | 4674 | 8.0% |
o | 4390 | 7.5% |
l | 3600 | 6.1% |
n | 3468 | 5.9% |
t | 3124 | 5.3% |
u | 3114 | 5.3% |
Other values (17) | 14164 |
Uppercase Letter
Value | Count | Frequency (%) |
C | 623 | 10.7% |
P | 528 | 9.1% |
S | 487 | 8.4% |
M | 424 | 7.3% |
B | 389 | 6.7% |
L | 346 | 6.0% |
D | 316 | 5.4% |
A | 309 | 5.3% |
H | 299 | 5.1% |
F | 275 | 4.7% |
Other values (16) | 1811 |
Decimal Number
Value | Count | Frequency (%) |
1 | 72 | 26.8% |
9 | 56 | 20.8% |
8 | 38 | 14.1% |
7 | 23 | 8.6% |
3 | 16 | 5.9% |
5 | 15 | 5.6% |
4 | 14 | 5.2% |
0 | 13 | 4.8% |
2 | 12 | 4.5% |
6 | 10 | 3.7% |
Other Letter
Value | Count | Frequency (%) |
외 | 2 | 15.4% |
군 | 2 | 15.4% |
단 | 2 | 15.4% |
제 | 2 | 15.4% |
하 | 1 | 7.7% |
위 | 1 | 7.7% |
류 | 1 | 7.7% |
포 | 1 | 7.7% |
함 | 1 | 7.7% |
Other Punctuation
Value | Count | Frequency (%) |
. | 993 | 68.7% |
& | 270 | 18.7% |
, | 114 | 7.9% |
? | 64 | 4.4% |
' | 3 | 0.2% |
: | 2 | 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 6578 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1388 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1386 | 100.0% |
Math Symbol
Value | Count | Frequency (%) |
= | 160 | 100.0% |
Control
Value | Count | Frequency (%) |
(control character) | 81 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 27 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 64576 | 85.1% |
Common | 11335 | 14.9% |
Hangul | 13 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 6648 | 10.3% |
i | 5665 | 8.8% |
e | 5205 | 8.1% |
s | 4717 | 7.3% |
r | 4674 | 7.2% |
o | 4390 | 6.8% |
l | 3600 | 5.6% |
n | 3468 | 5.4% |
t | 3124 | 4.8% |
u | 3114 | 4.8% |
Other values (43) | 19971 |
Common
Value | Count | Frequency (%) |
(space) | 6578 | 58.0% |
( | 1388 | 12.2% |
) | 1386 | 12.2% |
. | 993 | 8.8% |
& | 270 | 2.4% |
= | 160 | 1.4% |
, | 114 | 1.0% |
(control character) | 81 | 0.7% |
1 | 72 | 0.6% |
? | 64 | 0.6% |
Other values (12) | 229 | 2.0% |
Hangul
Value | Count | Frequency (%) |
외 | 2 | 15.4% |
군 | 2 | 15.4% |
단 | 2 | 15.4% |
제 | 2 | 15.4% |
하 | 1 | 7.7% |
위 | 1 | 7.7% |
류 | 1 | 7.7% |
포 | 1 | 7.7% |
함 | 1 | 7.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 75910 | > 99.9% |
Hangul | 13 | < 0.1% |
None | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 6648 | 8.8% |
(space) | 6578 | 8.7% |
i | 5665 | 7.5% |
e | 5205 | 6.9% |
s | 4717 | 6.2% |
r | 4674 | 6.2% |
o | 4390 | 5.8% |
l | 3600 | 4.7% |
n | 3468 | 4.6% |
t | 3124 | 4.1% |
Other values (64) | 27841 |
Hangul
Value | Count | Frequency (%) |
외 | 2 | 15.4% |
군 | 2 | 15.4% |
단 | 2 | 15.4% |
제 | 2 | 15.4% |
하 | 1 | 7.7% |
위 | 1 | 7.7% |
류 | 1 | 7.7% |
포 | 1 | 7.7% |
함 | 1 | 7.7% |
None
Value | Count | Frequency (%) |
ø | 1 | 100.0% |
비고(일반명, 과명 등)
Text
MISSING
 
Distinct | 295 |
---|---|
Distinct (%) | 17.4% |
Missing | 522 |
Missing (%) | 23.5% |
Memory size | 17.5 KiB |
Words
Value | Count | Frequency (%) |
curculionidae(바구미과 | 155 | 8.7% |
diaspididae(깍지벌레과 | 92 | 5.2% |
pseudococcidae(가루깍지벌레과 | 82 | 4.6% |
cerambycidae(하늘소과 | 67 | 3.8% |
scolytidae(나무좀과 | 66 | 3.7% |
tortricidae(잎말이나방과 | 61 | 3.4% |
chrysomelidae(잎벌레과 | 53 | 3.0% |
cicadellidae(매미충과 | 51 | 2.9% |
thripidae(총채벌레과 | 50 | 2.8% |
aphididae(진딧물과 | 48 | 2.7% |
Other values (336) | 1055 |
Most occurring characters
Value | Count | Frequency (%) |
i | 2599 | 8.8% |
e | 2356 | 8.0% |
a | 2211 | 7.5% |
d | 2023 | 6.9% |
과 | 1568 | 5.3% |
( | 1560 | 5.3% |
) | 1560 | 5.3% |
c | 1068 | 3.6% |
o | 1053 | 3.6% |
r | 1034 | 3.5% |
Other values (322) | 12476 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 16690 | 56.6% |
Other Letter | 7974 | 27.0% |
Uppercase Letter | 1611 | 5.5% |
Open Punctuation | 1560 | 5.3% |
Close Punctuation | 1560 | 5.3% |
Space Separator | 91 | 0.3% |
Other Punctuation | 14 | < 0.1% |
Dash Punctuation | 7 | < 0.1% |
Final Punctuation | 1 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
과 | 1568 | 19.7% |
벌 | 474 | 5.9% |
레 | 440 | 5.5% |
나 | 375 | 4.7% |
미 | 280 | 3.5% |
방 | 261 | 3.3% |
지 | 236 | 3.0% |
깍 | 231 | 2.9% |
구 | 208 | 2.6% |
이 | 190 | 2.4% |
Other values (269) | 3711 |
Lowercase Letter
Value | Count | Frequency (%) |
i | 2599 | 15.6% |
e | 2356 | 14.1% |
a | 2211 | 13.2% |
d | 2023 | 12.1% |
c | 1068 | 6.4% |
o | 1053 | 6.3% |
r | 1034 | 6.2% |
l | 702 | 4.2% |
t | 533 | 3.2% |
u | 530 | 3.2% |
Other values (15) | 2581 |
Uppercase Letter
Value | Count | Frequency (%) |
C | 432 | 26.8% |
T | 207 | 12.8% |
P | 201 | 12.5% |
A | 150 | 9.3% |
S | 118 | 7.3% |
D | 113 | 7.0% |
N | 58 | 3.6% |
B | 56 | 3.5% |
L | 52 | 3.2% |
E | 43 | 2.7% |
Other values (12) | 181 |
Open Punctuation
Value | Count | Frequency (%) |
( | 1560 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1560 | 100.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 91 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 14 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 7 | 100.0% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 18301 | 62.0% |
Hangul | 7974 | 27.0% |
Common | 3233 | 11.0% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
과 | 1568 | 19.7% |
벌 | 474 | 5.9% |
레 | 440 | 5.5% |
나 | 375 | 4.7% |
미 | 280 | 3.5% |
방 | 261 | 3.3% |
지 | 236 | 3.0% |
깍 | 231 | 2.9% |
구 | 208 | 2.6% |
이 | 190 | 2.4% |
Other values (269) | 3711 |
Latin
Value | Count | Frequency (%) |
i | 2599 | 14.2% |
e | 2356 | 12.9% |
a | 2211 | 12.1% |
d | 2023 | 11.1% |
c | 1068 | 5.8% |
o | 1053 | 5.8% |
r | 1034 | 5.6% |
l | 702 | 3.8% |
t | 533 | 2.9% |
u | 530 | 2.9% |
Other values (37) | 4192 |
Common
Value | Count | Frequency (%) |
( | 1560 | 48.3% |
) | 1560 | 48.3% |
(space) | 91 | 2.8% |
, | 14 | 0.4% |
- | 7 | 0.2% |
’ | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 21533 | 73.0% |
Hangul | 7974 | 27.0% |
Punctuation | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 2599 | 12.1% |
e | 2356 | 10.9% |
a | 2211 | 10.3% |
d | 2023 | 9.4% |
( | 1560 | 7.2% |
) | 1560 | 7.2% |
c | 1068 | 5.0% |
o | 1053 | 4.9% |
r | 1034 | 4.8% |
l | 702 | 3.3% |
Other values (42) | 5367 |
Hangul
Value | Count | Frequency (%) |
과 | 1568 | 19.7% |
벌 | 474 | 5.9% |
레 | 440 | 5.5% |
나 | 375 | 4.7% |
미 | 280 | 3.5% |
방 | 261 | 3.3% |
지 | 236 | 3.0% |
깍 | 231 | 2.9% |
구 | 208 | 2.6% |
이 | 190 | 2.4% |
Other values (269) | 3711 |
Punctuation
Value | Count | Frequency (%) |
’ | 1 | 100.0% |
검토
Categorical
IMBALANCE
 
Distinct | 3 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.5 KiB |
<NA> | 2216 |
---|---|
학명 수정 | 5 |
제외 | 1 |
Length
Max length | 5 |
---|---|
Median length | 4 |
Mean length | 4.0013501 |
Min length | 2 |
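These length figures, together with the Missing count of 0, are consistent with empty entries having been profiled as the literal four-character string `<NA>` rather than as missing values. A quick check under that assumption, using the counts from the Common Values table for 검토:

```python
# Assumption: "<NA>" is treated as an ordinary 4-character string value.
value_counts = {"<NA>": 2216, "학명 수정": 5, "제외": 1}

total_chars = sum(len(label) * n for label, n in value_counts.items())
n_rows = sum(value_counts.values())
mean_length = total_chars / n_rows

print("rows:", n_rows)
print("mean length:", round(mean_length, 7))
```

The result, 8891 / 2222, rounds to the reported 4.0013501. If the `<NA>` entries were handled as true missing values instead, this column would show 2216 missing rows and only 6 populated ones.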
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | 학명 수정 |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 2216 | 99.7% |
학명 수정 | 5 | 0.2% |
제외 | 1 | < 0.1% |
Words
Value | Count | Frequency (%) |
na | 2216 | 99.5% |
학명 | 5 | 0.2% |
수정 | 5 | 0.2% |
제외 | 1 | < 0.1% |
DS
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 2222 |
---|---|
Missing (%) | 100.0% |
Memory size | 19.7 KiB |
 | 검역지위 | 분류 | 검토 |
---|---|---|---|
검역지위 | 1.000 | 0.779 | 0.000 |
분류 | 0.779 | 1.000 | 0.000 |
검토 | 0.000 | 0.000 | 1.000 |
 | 검역지위 | 검토 | 분류 |
---|---|---|---|
검역지위 | 1.000 | 0.000 | 0.603 |
검토 | 0.000 | 1.000 | 0.000 |
분류 | 0.603 | 0.000 | 1.000 |
 | 검역지위 | 분류 | 검토 |
---|---|---|---|
검역지위 | 1.000 | 0.603 | 0.000 |
분류 | 0.603 | 1.000 | 0.000 |
검토 | 0.000 | 0.000 | 1.000 |
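The matrices above do not name the association measure used. For pairs of categorical variables a common choice is Cramér's V, so the sketch below assumes that measure in its plain, uncorrected form. The input lists are hypothetical toy labels, not rows from this dataset, and `cramers_v` is an illustrative helper. Profiling libraries may apply a bias correction, in which case this plain form would not necessarily reproduce the 0.603 or 0.779 shown above.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, rn in row_totals.items():
        for c, cn in col_totals.items():
            expected = rn * cn / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical toy data: perfectly associated vs. independent labels.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

Values range from 0 (no association) to 1 (one variable fully determines the other).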
 | 검역지위 | 분류 | 학명 | 비고(일반명, 과명 등) | 검토 | DS |
---|---|---|---|---|---|---|
0 | 금지병해충 | 진균 | Balansia oryzae-sativae | 벼이삭미이라병 | <NA> | <NA> |
1 | 금지병해충 | 진균 | Cronartium coleosporioides | 소나무종유석병 | <NA> | <NA> |
2 | 금지병해충 | 진균 | Peronospora tabacina | 담배노균병 | 학명 수정 | <NA> |
3 | 금지병해충 | 진균 | Phytophthora ramorum | 참나무역병 | <NA> | <NA> |
4 | 금지병해충 | 진균 | Synchytrium endobioticum | 감자암종병 | <NA> | <NA> |
5 | 금지병해충 | 세균 | Candidatus Liberibacter solanacearum | 제브라칩병 | <NA> | <NA> |
6 | 금지병해충 | 세균 | Citrus huanglongbing(greening) disease | 감귤그린병 | 학명 수정 | <NA> |
7 | 금지병해충 | 세균 | Xylella fastidiosa | 포도피어슨병 | <NA> | <NA> |
8 | 금지병해충 | 세균 | Erwinia amylovora | 과수화상병 | <NA> | <NA> |
9 | 금지병해충 | 세균 | Apple proliferation phytoplasma | 사과빗자루병 | 학명 수정 | <NA> |
 | 검역지위 | 분류 | 학명 | 비고(일반명, 과명 등) | 검토 | DS |
---|---|---|---|---|---|---|
2212 | 규제비검역병해충 | 잡초 | Aneilema keisak Hassk. | 사마귀풀 | <NA> | <NA> |
2213 | 규제비검역병해충 | 잡초 | Capsella bursa-pastoris (L.) Medik. | 냉이 | <NA> | <NA> |
2214 | 규제비검역병해충 | 잡초 | Cruciferae family | 십자화과 잡초 | <NA> | <NA> |
2215 | 규제비검역병해충 | 잡초 | Cuscuta spp. | 새삼 | <NA> | <NA> |
2216 | 규제비검역병해충 | 잡초 | Echinochloa crus-galli (하위군류군 포함) (L.) Beauv. | 돌피 | <NA> | <NA> |
2217 | 규제비검역병해충 | 잡초 | Echinochloa utilis Ohwi et Yabuno | 피 | <NA> | <NA> |
2218 | 규제비검역병해충 | 잡초 | Monochoria vaginalis (Burn. f.) Presl | 물달개비 | <NA> | <NA> |
2219 | 규제비검역병해충 | 잡초 | Persicaria hydropiper (L.) Spach | 여뀌 | <NA> | <NA> |
2220 | 규제비검역병해충 | 잡초 | Rotala indica (Willd.) Koehne | 마디꽃 | <NA> | <NA> |
2221 | 규제비검역병해충 | 잡초 | Schoenoplectiella juncoides (Roxb.) Lye | 올챙이고랭이 | <NA> | <NA> |