Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 2071 |
Missing cells | 119 |
Missing cells (%) | 1.9% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 48.7 KiB |
Average record size in memory | 24.1 B |
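The headline figures are plain ratios of the counts above. 119 missing cells out of 3 × 2071 = 6213 cells is 1.9%. 48.7 KiB spread over 2071 records is about 24.1 B per record. The sketch below computes such overview numbers for a small table. It uses the standard library only, assumes rows are tuples, and treats `None` as a missing cell. The helper `dataset_stats` is hypothetical and is not a ydata-profiling function.

```python
from collections import Counter


def dataset_stats(rows):
    """Overview numbers for a list of equal-length tuples; None counts as a missing cell."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_vars * n_obs
    missing = sum(cell is None for row in rows for cell in row)
    # Every occurrence of a row beyond its first counts as one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }


# Recompute two of the report's ratios from its own counts.
print(f"{100 * 119 / (3 * 2071):.1f}%")  # 1.9%
print(f"{48.7 * 1024 / 2071:.1f} B")     # 24.1 B
```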
Variable types
Categorical | 1 |
---|---|
Text | 2 |
Dataset
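Description | Plant quarantine pest information covering regulated pathogens, regulated insect pests and regulated weeds; updated once the committee confirms the outcome of a pest risk assessment |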
---|---|
URL | https://www.data.go.kr/data/15073134/fileData.do |
Alerts
별표. is highly imbalanced (54.4%) | Imbalance |
Unnamed: 2 has 118 (5.7%) missing values | Missing |
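The 54.4% imbalance score for `별표.` can be reproduced with an entropy-based measure, 1 - H / log2(k). Here H is the Shannon entropy (in bits) of the value frequencies and k is the number of distinct values. That this is exactly ydata-profiling's formula is an assumption; the only support is that the numbers match. The counts come from the Common Values table. They also include the header cell 구분, which the Sample and Unique figures show occurring once.

```python
import math


def imbalance_score(counts):
    """1 - H / log2(k): 0 for perfectly balanced categories, approaching 1 as one value dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# 별표.: nine pest groups, then <NA> (1) and the leaked header cell 구분 (1).
counts = [1442, 318, 100, 57, 54, 35, 33, 21, 9, 1, 1]
print(f"{imbalance_score(counts):.1%}")  # 54.4%
```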
Reproduction
Analysis started | 2023-12-12 16:00:48.916607 |
---|---|
Analysis finished | 2023-12-12 16:00:49.413923 |
Duration | 0.5 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
별표. ("annex")
Categorical
IMBALANCE
Distinct | 11 |
---|---|
Distinct (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 16.3 KiB |
Value | Count |
---|---|
곤충 | 1442 |
진균 | 318 |
바이러스 | 100 |
응애 | 57 |
세균 | 54 |
Other values (6) | 100 |
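The overview lists the five most frequent values and folds the remaining six into a single "Other values (6)" row. Those six are 잡초, 선충, 달팽이, 바이로이드, <NA> and the header cell 구분 seen in the Sample rows. Their counts total 35 + 33 + 21 + 9 + 1 + 1 = 100. The sketch below reproduces that grouping with `collections.Counter`. The helper name `top_k_with_other` is hypothetical.

```python
from collections import Counter


def top_k_with_other(counter, k=5):
    """Return the k most common (value, count) pairs, then one row summarising the rest."""
    ranked = counter.most_common()
    rows = ranked[:k]
    rest = ranked[k:]
    if rest:
        rows.append((f"Other values ({len(rest)})", sum(count for _, count in rest)))
    return rows


counts = Counter({
    "곤충": 1442, "진균": 318, "바이러스": 100, "응애": 57, "세균": 54,
    "잡초": 35, "선충": 33, "달팽이": 21, "바이로이드": 9, "<NA>": 1, "구분": 1,
})
for value, count in top_k_with_other(counts):
    print(value, count)
```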
Length
Max length | 5 |
---|---|
Median length | 2 |
Mean length | 2.1207146 |
Min length | 2 |
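These length figures agree with the Common Values counts under two conditions. Each observation must be measured in characters, and the `<NA>` entry must count as the four-character string "<NA>". The total length is then 4392, and 4392 / 2071 ≈ 2.1207146. The sketch below performs that cross-check with the standard library. The treatment of `<NA>` is an inference from the arithmetic, not documented behaviour.

```python
from statistics import mean, median

# Value counts for 별표. from Common Values, plus the header cell 구분 (1).
# Assumption: <NA> is measured as the 4-character string "<NA>".
counts = {
    "곤충": 1442, "진균": 318, "바이러스": 100, "응애": 57, "세균": 54,
    "잡초": 35, "선충": 33, "달팽이": 21, "바이로이드": 9, "<NA>": 1, "구분": 1,
}

# One length per observation, repeated as often as each value occurs.
lengths = [len(value) for value, count in counts.items() for _ in range(count)]

print(len(lengths), max(lengths), median(lengths), round(mean(lengths), 7), min(lengths))
# 2071 5 2 2.1207146 2
```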
Unique
Unique | 2 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | <NA> |
---|---|
2nd row | 구분 ("category") |
3rd row | 진균 |
4th row | 진균 |
5th row | 진균 |
Common Values
Value | Count | Frequency (%) |
곤충 (insect) | 1442 | 69.6% |
진균 (fungus) | 318 | 15.4% |
바이러스 (virus) | 100 | 4.8% |
응애 (mite) | 57 | 2.8% |
세균 (bacterium) | 54 | 2.6% |
잡초 (weed) | 35 | 1.7% |
선충 (nematode) | 33 | 1.6% |
달팽이 (snail) | 21 | 1.0% |
바이로이드 (viroid) | 9 | 0.4% |
<NA> | 1 | < 0.1% |
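Each frequency is the count divided by the 2071 observations; for example, 318 / 2071 = 15.4%. Shares that would round to 0.0% are printed as "< 0.1%" (1 / 2071 is about 0.05%). Shares that round to at least 0.1% keep their value: 58 / 71318 ≈ 0.08% appears elsewhere in this report as 0.1%. The formatter below mimics that observed behaviour. It is a sketch, not the library's own code.

```python
def fmt_percent(fraction):
    """Format a share as 'x.y%', printing '< 0.1%' for positive shares that would round to 0.0%."""
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    return f"{100 * fraction:.1f}%"


n_obs = 2071
for value, count in [("곤충", 1442), ("진균", 318), ("바이로이드", 9), ("<NA>", 1)]:
    print(value, count, fmt_percent(count / n_obs))
```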
Most occurring words
Value | Count | Frequency (%) |
곤충 | 1442 | 69.6% |
진균 | 318 | 15.4% |
바이러스 | 100 | 4.8% |
응애 | 57 | 2.8% |
세균 | 54 | 2.6% |
잡초 | 35 | 1.7% |
선충 | 33 | 1.6% |
달팽이 | 21 | 1.0% |
바이로이드 | 9 | 0.4% |
na | 1 | < 0.1% |
관리병해충(제2조 관련) ("regulated pests, related to Article 2")
Text
Distinct | 2070 |
---|---|
Distinct (%) | 100.0% |
Missing | 1 |
Missing (%) | < 0.1% |
Memory size | 16.3 KiB |
Length
Max length | 347 |
---|---|
Median length | 145 |
Mean length | 34.45314 |
Min length | 2 |
Characters and Unicode
Total characters | 71318 |
---|---|
Distinct characters | 87 |
Distinct categories | 11 |
Distinct scripts | 3 |
Distinct blocks | 3 |
Unique
Unique | 2070 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 학명 ("scientific name") |
---|---|
2nd row | Acroconidiella tropaeoli (T.E.T. Bond) J.C. Lindq. & Alippi |
3rd row | Alternaria triticina Prasada & Prabhu |
4th row | Anisogramma anomala (Peck) E. Müll. |
5th row | Aphanomyces euteiches Drechsler |
Most occurring words
Value | Count | Frequency (%) |
(blank) | 401 | 4.8% |
virus | 101 | 1.2% |
fabricius | 90 | 1.1% |
linnaeus | 64 | 0.8% |
walker | 51 | 0.6% |
et | 36 | 0.4% |
say | 35 | 0.4% |
cockerell | 30 | 0.4% |
leconte | 27 | 0.3% |
al | 27 | 0.3% |
Other values (4411) | 7460 |
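The word table lowercases its tokens. That is why authority names taken from the scientific names (fabricius, linnaeus, walker) appear among the most frequent words. The sketch below roughly approximates that tokenization. It assumes whitespace splitting followed by stripping of surrounding ASCII punctuation. ydata-profiling's actual tokenizer may differ, for instance in stop-word handling. `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(texts):
    """Lowercase each text, split on whitespace and strip surrounding ASCII punctuation."""
    counter = Counter()
    for text in texts:
        for token in text.lower().split():
            word = token.strip(string.punctuation)
            if word:  # pure-punctuation tokens such as "&" are dropped
                counter[word] += 1
    return counter


names = [
    "Acroconidiella tropaeoli (T.E.T. Bond) J.C. Lindq. & Alippi",
    "Alternaria triticina Prasada & Prabhu",
    "Anisogramma anomala (Peck) E. Müll.",
]
print(len(word_counts(names)))  # 16
```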
Most occurring characters
Value | Count | Frequency (%) |
a | 6158 | 8.6% |
(space) | 6150 | 8.6% |
i | 5354 | 7.5% |
e | 4890 | 6.9% |
s | 4454 | 6.2% |
r | 4339 | 6.1% |
o | 4100 | 5.7% |
l | 3396 | 4.8% |
n | 3289 | 4.6% |
u | 2948 | 4.1% |
Other values (77) | 26240 | 36.8% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 55167 | 77.4% |
Space Separator | 6150 | 8.6% |
Uppercase Letter | 5454 | 7.6% |
Open Punctuation | 1335 | 1.9% |
Close Punctuation | 1332 | 1.9% |
Other Punctuation | 1265 | 1.8% |
Decimal Number | 277 | 0.4% |
Math Symbol | 155 | 0.2% |
Control | 151 | 0.2% |
Dash Punctuation | 24 | < 0.1% |
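These category names are the long forms of Unicode general categories. Python's `unicodedata.category` returns the same categories as two-letter codes, such as `Ll` for Lowercase Letter and `Zs` for Space Separator. The sketch below tallies them for a string. Its mapping covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Lo": "Other Letter",
    "Zs": "Space Separator", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Pf": "Final Punctuation",
    "Nd": "Decimal Number", "Sm": "Math Symbol", "Sk": "Modifier Symbol",
    "Cc": "Control",
}


def category_counts(text):
    """Count characters by Unicode general category, falling back to the raw code."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts("Alternaria triticina Prasada & Prabhu"))
```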
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 6158 | 11.2% |
i | 5354 | 9.7% |
e | 4890 | 8.9% |
s | 4454 | 8.1% |
r | 4339 | 7.9% |
o | 4100 | 7.4% |
l | 3396 | 6.2% |
n | 3289 | 6.0% |
u | 2948 | 5.3% |
t | 2924 | 5.3% |
Other values (26) | 13315 | 24.1% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 578 | 10.6% |
P | 509 | 9.3% |
S | 459 | 8.4% |
M | 397 | 7.3% |
B | 350 | 6.4% |
L | 313 | 5.7% |
D | 303 | 5.6% |
H | 286 | 5.2% |
A | 284 | 5.2% |
F | 265 | 4.9% |
Other values (16) | 1710 | 31.4% |
Decimal Number
Value | Count | Frequency (%) |
1 | 75 | 27.1% |
9 | 57 | 20.6% |
8 | 39 | 14.1% |
7 | 24 | 8.7% |
3 | 16 | 5.8% |
0 | 15 | 5.4% |
5 | 15 | 5.4% |
4 | 14 | 5.1% |
2 | 12 | 4.3% |
6 | 10 | 3.6% |
Other Letter
Value | Count | Frequency (%) |
단 | 2 | 25.0% |
제 | 2 | 25.0% |
외 | 2 | 25.0% |
학 | 1 | 12.5% |
명 | 1 | 12.5% |
Other Punctuation
Value | Count | Frequency (%) |
. | 895 | 70.8% |
& | 253 | 20.0% |
, | 114 | 9.0% |
' | 3 | 0.2% |
Space Separator
Value | Count | Frequency (%) |
(space) | 6150 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1335 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1332 | 100.0% |
Math Symbol
Value | Count | Frequency (%) |
= | 155 | 100.0% |
Control
Value | Count | Frequency (%) |
(control character) | 151 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 24 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 60621 | 85.0% |
Common | 10689 | 15.0% |
Hangul | 8 | < 0.1% |
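Python's standard `unicodedata` module does not expose the Unicode Script property. This text mixes only Latin letters, Hangul syllables, and shared punctuation, digits and whitespace. For text like that, a heuristic based on character names yields the same three buckets. The sketch below is that approximation, not the Script property itself.

```python
import unicodedata
from collections import Counter


def approx_script(ch):
    """Heuristic script label from the character name: LATIN..., HANGUL..., otherwise Common.

    Not the Unicode Script property; it only suits text limited to these scripts.
    """
    name = unicodedata.name(ch, "")  # control characters have no name
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("HANGUL"):
        return "Hangul"
    return "Common"


print(Counter(approx_script(ch) for ch in "Anisogramma anomala (Peck) E. Müll. 학명"))
```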
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 6158 | 10.2% |
i | 5354 | 8.8% |
e | 4890 | 8.1% |
s | 4454 | 7.3% |
r | 4339 | 7.2% |
o | 4100 | 6.8% |
l | 3396 | 5.6% |
n | 3289 | 5.4% |
u | 2948 | 4.9% |
t | 2924 | 4.8% |
Other values (52) | 18769 | 31.0% |
Common
Value | Count | Frequency (%) |
(space) | 6150 | 57.5% |
( | 1335 | 12.5% |
) | 1332 | 12.5% |
. | 895 | 8.4% |
& | 253 | 2.4% |
= | 155 | 1.5% |
(control character) | 151 | 1.4% |
, | 114 | 1.1% |
1 | 75 | 0.7% |
9 | 57 | 0.5% |
Other values (10) | 172 | 1.6% |
Hangul
Value | Count | Frequency (%) |
단 | 2 | 25.0% |
제 | 2 | 25.0% |
외 | 2 | 25.0% |
학 | 1 | 12.5% |
명 | 1 | 12.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 71252 | 99.9% |
None | 58 | 0.1% |
Hangul | 8 | < 0.1% |
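Unicode blocks are fixed code-point ranges. In this column, the 58 characters labelled "None" (é, ü, ö and the like) all fall in the Latin-1 Supplement block, U+0080 to U+00FF. The eight Korean characters fall in Hangul Syllables. The sketch below assigns blocks by range and covers only the blocks that occur in this report. The `BLOCKS` table and the fallback label "Other" are illustrative choices.

```python
from collections import Counter

# (first code point, last code point, label); Unicode calls the first block "Basic Latin".
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x2000, 0x206F, "General Punctuation"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]


def block_of(ch):
    """Return the label of the listed block containing ch, or "Other"."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"


print(Counter(block_of(ch) for ch in "E. Müll."))
```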
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 6158 | 8.6% |
(space) | 6150 | 8.6% |
i | 5354 | 7.5% |
e | 4890 | 6.9% |
s | 4454 | 6.3% |
r | 4339 | 6.1% |
o | 4100 | 5.8% |
l | 3396 | 4.8% |
n | 3289 | 4.6% |
u | 2948 | 4.1% |
Other values (62) | 26174 | 36.7% |
None (Latin-1 Supplement, U+0080 to U+00FF)
Value | Count | Frequency (%) |
é | 20 | 34.5% |
ü | 18 | 31.0% |
ö | 8 | 13.8% |
ë | 3 | 5.2% |
á | 3 | 5.2% |
ä | 2 | 3.4% |
å | 1 | 1.7% |
ý | 1 | 1.7% |
ó | 1 | 1.7% |
ø | 1 | 1.7% |
Hangul
Value | Count | Frequency (%) |
단 | 2 | 25.0% |
제 | 2 | 25.0% |
외 | 2 | 25.0% |
학 | 1 | 12.5% |
명 | 1 | 12.5% |
Unnamed: 2
Text
MISSING
Distinct | 523 |
---|---|
Distinct (%) | 26.8% |
Missing | 118 |
Missing (%) | 5.7% |
Memory size | 16.3 KiB |
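The percentages in this block use different denominators. Missing (%) is relative to all 2071 rows: 118 / 2071 = 5.7%. Distinct (%) matches the 1953 non-missing values: 523 / 1953 = 26.8%, whereas dividing by 2071 would give 25.3%. Unique (%) below follows the same rule: 383 / 1953 = 19.6%. A quick check:

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(100 * part / whole, 1)


n_obs, missing, distinct, unique = 2071, 118, 523, 383  # Unnamed: 2, as reported
non_missing = n_obs - missing  # 1953

print(pct(missing, n_obs), pct(distinct, non_missing), pct(unique, non_missing))  # 5.7 26.8 19.6
print(pct(distinct, n_obs))  # 25.3, which is not the figure the report shows
```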
Length
Max length | 82 |
---|---|
Median length | 47 |
Mean length | 17.579109 |
Min length | 1 |
Characters and Unicode
Total characters | 34332 |
---|---|
Distinct characters | 241 |
Distinct categories | 12 |
Distinct scripts | 3 |
Distinct blocks | 3 |
Unique
Unique | 383 |
---|---|
Unique (%) | 19.6% |
Sample
1st row | 비고(일반명, 과명) ("remarks: common name, family name") |
---|---|
2nd row | Leaf spot |
3rd row | Leaf blight |
4th row | Eastern filbert blight |
5th row | Root rot |
Most occurring words
Value | Count | Frequency (%) |
curculionidae(바구미과 | 154 | 5.9% |
diaspididae(깍지벌레과 | 91 | 3.5% |
pseudococcidae(가루깍지벌레과 | 79 | 3.0% |
scolytidae(나무좀과 | 66 | 2.5% |
cerambycidae(하늘소과 | 64 | 2.4% |
tortricidae(잎말이나방과 | 61 | 2.3% |
virus | 56 | 2.1% |
spot | 54 | 2.1% |
chrysomelidae(잎벌레과 | 52 | 2.0% |
leaf | 52 | 2.0% |
Other values (583) | 1884 | 72.1% |
Most occurring characters
Value | Count | Frequency (%) |
i | 2908 | 8.5% |
e | 2789 | 8.1% |
a | 2606 | 7.6% |
d | 2098 | 6.1% |
( | 1536 | 4.5% |
) | 1536 | 4.5% |
과 | 1535 | 4.5% |
o | 1456 | 4.2% |
r | 1435 | 4.2% |
c | 1259 | 3.7% |
Other values (231) | 15174 | 44.2% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 21250 | 61.9% |
Other Letter | 7338 | 21.4% |
Uppercase Letter | 1944 | 5.7% |
Open Punctuation | 1536 | 4.5% |
Close Punctuation | 1536 | 4.5% |
Space Separator | 687 | 2.0% |
Other Punctuation | 20 | 0.1% |
Dash Punctuation | 16 | < 0.1% |
Modifier Symbol | 2 | < 0.1% |
Final Punctuation | 1 | < 0.1% |
Other values (2) | 2 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
과 | 1535 | 20.9% |
벌 | 466 | 6.4% |
레 | 431 | 5.9% |
나 | 364 | 5.0% |
미 | 271 | 3.7% |
방 | 255 | 3.5% |
지 | 230 | 3.1% |
깍 | 226 | 3.1% |
구 | 203 | 2.8% |
바 | 177 | 2.4% |
Other values (170) | 3180 | 43.3% |
Lowercase Letter
Value | Count | Frequency (%) |
i | 2908 | 13.7% |
e | 2789 | 13.1% |
a | 2606 | 12.3% |
d | 2098 | 9.9% |
o | 1456 | 6.9% |
r | 1435 | 6.8% |
c | 1259 | 5.9% |
l | 985 | 4.6% |
t | 958 | 4.5% |
u | 719 | 3.4% |
Other values (16) | 4037 | 19.0% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 459 | 23.6% |
P | 232 | 11.9% |
T | 223 | 11.5% |
A | 169 | 8.7% |
S | 160 | 8.2% |
D | 125 | 6.4% |
B | 105 | 5.4% |
L | 81 | 4.2% |
N | 61 | 3.1% |
F | 56 | 2.9% |
Other values (15) | 273 | 14.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 15 | 75.0% |
' | 5 | 25.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1536 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1536 | 100.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 687 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 16 | 100.0% |
Modifier Symbol
Value | Count | Frequency (%) |
` | 2 | 100.0% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 1 | 100.0% |
Control
Value | Count | Frequency (%) |
(control character) | 1 | 100.0% |
Math Symbol
Value | Count | Frequency (%) |
= | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 23194 | 67.6% |
Hangul | 7338 | 21.4% |
Common | 3800 | 11.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
과 | 1535 | 20.9% |
벌 | 466 | 6.4% |
레 | 431 | 5.9% |
나 | 364 | 5.0% |
미 | 271 | 3.7% |
방 | 255 | 3.5% |
지 | 230 | 3.1% |
깍 | 226 | 3.1% |
구 | 203 | 2.8% |
바 | 177 | 2.4% |
Other values (170) | 3180 | 43.3% |
Latin
Value | Count | Frequency (%) |
i | 2908 | 12.5% |
e | 2789 | 12.0% |
a | 2606 | 11.2% |
d | 2098 | 9.0% |
o | 1456 | 6.3% |
r | 1435 | 6.2% |
c | 1259 | 5.4% |
l | 985 | 4.2% |
t | 958 | 4.1% |
u | 719 | 3.1% |
Other values (41) | 5981 | 25.8% |
Common
Value | Count | Frequency (%) |
( | 1536 | 40.4% |
) | 1536 | 40.4% |
(space) | 687 | 18.1% |
- | 16 | 0.4% |
, | 15 | 0.4% |
' | 5 | 0.1% |
` | 2 | 0.1% |
’ | 1 | < 0.1% |
(control character) | 1 | < 0.1% |
= | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 26993 | 78.6% |
Hangul | 7338 | 21.4% |
Punctuation | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 2908 | 10.8% |
e | 2789 | 10.3% |
a | 2606 | 9.7% |
d | 2098 | 7.8% |
( | 1536 | 5.7% |
) | 1536 | 5.7% |
o | 1456 | 5.4% |
r | 1435 | 5.3% |
c | 1259 | 4.7% |
l | 985 | 3.6% |
Other values (50) | 8385 | 31.1% |
Hangul
Value | Count | Frequency (%) |
과 | 1535 | 20.9% |
벌 | 466 | 6.4% |
레 | 431 | 5.9% |
나 | 364 | 5.0% |
미 | 271 | 3.7% |
방 | 255 | 3.5% |
지 | 230 | 3.1% |
깍 | 226 | 3.1% |
구 | 203 | 2.8% |
바 | 177 | 2.4% |
Other values (170) | 3180 | 43.3% |
Punctuation
Value | Count | Frequency (%) |
’ | 1 | 100.0% |
First rows
Row | 별표. | 관리병해충(제2조 관련) | Unnamed: 2 |
---|---|---|---|
0 | <NA> | <NA> | <NA> |
1 | 구분 | 학명 | 비고(일반명, 과명) |
2 | 진균 | Acroconidiella tropaeoli (T.E.T. Bond) J.C. Lindq. & Alippi | Leaf spot |
3 | 진균 | Alternaria triticina Prasada & Prabhu | Leaf blight |
4 | 진균 | Anisogramma anomala (Peck) E. Müll. | Eastern filbert blight |
5 | 진균 | Aphanomyces euteiches Drechsler | Root rot |
6 | 진균 | Apiosporina morbosa (Schwein.) Arx (= Dibotryon morbosum (Schwein.) Theiss. & Syd.) | Black know |
7 | 진균 | Ascochyta corticola McAlpine | Bark boltch |
8 | 진균 | Ascochyta ligulariae Sawada | <NA> |
9 | 진균 | Ascochyta sorghi Sacc. | Rough leaf spot |
Last rows
Row | 별표. | 관리병해충(제2조 관련) | Unnamed: 2 |
---|---|---|---|
2061 | 잡초 | Oenanthe pimpinelloides | Corky-fruited water-dropwort |
2062 | 잡초 | Onopordum acanthium | Scotch thistle |
2063 | 잡초 | Orobanche spp. | <NA> |
2064 | 잡초 | Rhaponticum repens (= Centaurea repens) | Russian Knapweed |
2065 | 잡초 | Salvinia adnata (= Salvinia molesta) | Karibaweed |
2066 | 잡초 | Senecio jacobaea | Stinking willie |
2067 | 잡초 | Solanum elaeagnifolium | White horsenettle |
2068 | 잡초 | Striga spp. | Witchweed |
2069 | 잡초 | Themeda quadrivalvis | Grader grass |
2070 | 잡초 | Xanthium spinosum | Prickly burweed |
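In the first rows, row 0 is entirely empty and row 1 holds the real column labels. These are 구분 ("category"), 학명 ("scientific name") and 비고(일반명, 과명) ("remarks: common name, family name"). This suggests the profiled columns were named after a title row. The standard-library sketch below drops the title row and any blank rows, then promotes the next row to the header. It assumes a CSV layout. The sample text and `read_with_real_header` are hypothetical, modelled on the preview.

```python
import csv
import io


def read_with_real_header(text):
    """Parse CSV text, skip the title row and all-empty rows, and use the next row as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    body = [row for row in rows[1:] if any(cell.strip() for cell in row)]
    if not body:
        return []
    header, records = body[0], body[1:]
    return [dict(zip(header, record)) for record in records]


# Hypothetical file contents shaped like the preview: title row, blank row, header, data.
sample = (
    "별표.,관리병해충(제2조 관련),\n"
    ",,\n"
    "구분,학명,\"비고(일반명, 과명)\"\n"
    "진균,Aphanomyces euteiches Drechsler,Root rot\n"
)
records = read_with_real_header(sample)
print(records[0]["학명"])  # Aphanomyces euteiches Drechsler
```

With pandas, the `skiprows` or `header` arguments of `read_csv` serve the same purpose. The correct values depend on the actual file layout.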