Overview

Dataset statistics

Number of variables: 3
Number of observations: 2222
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 52.2 KiB
Average record size in memory: 24.1 B
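The missing-cell and duplicate-row figures can be illustrated with a minimal standard-library sketch. It assumes the table is held as a list of row tuples and that a missing cell is represented as `None`. The helper names are hypothetical, and the "count repeats after the first copy" rule for duplicates is one common convention that may differ from ydata-profiling's own.

```python
def missing_cells(rows):
    """Count cells that are None across all rows (None = missing, by assumption)."""
    return sum(cell is None for row in rows for cell in row)


def duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row; the first copy is not counted."""
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return duplicates


def summarize(rows, n_columns):
    """Return the four dataset-level figures shown under "Dataset statistics"."""
    n_cells = len(rows) * n_columns
    missing = missing_cells(rows)
    dups = duplicate_rows(rows)
    return {
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": dups,
        "duplicate_rows_pct": 100.0 * dups / len(rows) if rows else 0.0,
    }


# Three rows copied from the sample table of this report.
sample = [
    ("금지병해충", "진균", "Balansia oryzae-sativae"),
    ("금지병해충", "진균", "Cronartium coleosporioides"),
    ("금지병해충", "세균", "Xylella fastidiosa"),
]
print(summarize(sample, n_columns=3))
```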

Variable types

Categorical: 2
Text: 1

Dataset

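Description: Designation status of regulated pests for plant quarantine, providing data on quarantine status, classification, scientific name, common name, and more. There are currently 2022 designated regulated pests.
URL: https://www.data.go.kr/data/3055531/fileData.do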

Alerts

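검역지위 (quarantine status) is highly overall correlated with 분류 (classification) [High correlation]
분류 (classification) is highly overall correlated with 검역지위 (quarantine status) [High correlation]
검역지위 (quarantine status) is highly imbalanced (79.8%) [Imbalance]
분류 (classification) is highly imbalanced (52.5%) [Imbalance]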
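The two imbalance percentages are consistent with the score 1 - H/ln(k), where H is the Shannon entropy of a column's value frequencies and k is its number of distinct values. The sketch below assumes that is how the figure is derived. The counts for 검역지위 come from its Common Values table. For 분류, ten counts come from its table and an eleventh count of 1 is inferred from Distinct = 11, Unique = 1, and the ten listed counts summing to 2221 of 2222 rows.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / ln(k): 0 for perfectly balanced classes, near 1 when one dominates.

    H is the Shannon entropy (natural log) of the class frequencies and k is the
    number of classes. Assumed, not confirmed, to mirror the report's Imbalance alert.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("imbalance is undefined for fewer than two classes")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# 검역지위 value counts from its Common Values table.
quarantine_status = [2087, 77, 51, 7]
# 분류 value counts; the trailing 1 is inferred (an 11th value seen once).
classification = [1517, 348, 112, 62, 57, 46, 41, 21, 10, 7, 1]

print(f"검역지위: {imbalance_score(quarantine_status):.1%}")
print(f"분류: {imbalance_score(classification):.1%}")
```

Because the score is normalized by ln(k), it is comparable across columns with different numbers of categories.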

Reproduction

Analysis started: 2023-12-12 22:08:04.929243
Analysis finished: 2023-12-12 22:08:05.244793
Duration: 0.32 seconds
Software version: ydata-profiling v4.5.1

Variables

검역지위 (quarantine status)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Length

Max length: 10
Median length: 5
Mean length: 5.0846085
Min length: 5
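Because every row of 검역지위 holds one of four labels, the length statistics follow from the value counts alone. This sketch expands the Common Values counts into one length per row. It assumes the labels are stored as precomposed Hangul syllables, so Python's `len` counts one character per syllable.

```python
import statistics

# 검역지위 value counts as listed under "Common Values" in this report.
value_counts = {
    "관리병해충": 2087,
    "금지병해충": 77,
    "규제비검역병해충": 51,
    "금지병해충(매개충)": 7,
}

# One length per row: each label's character count, repeated by its frequency.
lengths = [len(label) for label, count in value_counts.items() for _ in range(count)]

print(len(lengths), max(lengths), statistics.median(lengths),
      round(statistics.mean(lengths), 7), min(lengths))
```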

Unique

Unique: 0
Unique (%): 0.0%

Sample

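1st row: 금지병해충 (prohibited pest)
2nd row: 금지병해충 (prohibited pest)
3rd row: 금지병해충 (prohibited pest)
4th row: 금지병해충 (prohibited pest)
5th row: 금지병해충 (prohibited pest)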

Common Values

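Value | Count | Frequency (%)
관리병해충 (controlled pest) | 2087 | 93.9%
금지병해충 (prohibited pest) | 77 | 3.5%
규제비검역병해충 (regulated non-quarantine pest) | 51 | 2.3%
금지병해충(매개충) (prohibited pest, vector) | 7 | 0.3%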

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

분류 (classification)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.139964
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%
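"Distinct" counts how many different values a column holds, while "Unique" counts the values that occur exactly once. The sketch below rebuilds the 분류 column from its Common Values counts. The ten listed counts cover 2221 of the 2222 rows, so one hypothetical placeholder label, `<unlisted>`, stands in for the eleventh value that occurs only once; its real name is not shown in this report.

```python
from collections import Counter


def distinct_unique_summary(values):
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


# 분류 counts from the Common Values table, plus the hypothetical "<unlisted>" label.
classification_counts = Counter({
    "곤충": 1517, "진균": 348, "바이러스": 112, "세균": 62, "응애": 57,
    "잡초": 46, "선충": 41, "달팽이": 21, "바이로이드": 10, "곤충(매개충)": 7,
    "<unlisted>": 1,
})
rows = list(classification_counts.elements())  # one entry per row
summary = distinct_unique_summary(rows)
print(summary)
```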

Sample

1st row: 진균 (fungi)
2nd row: 진균 (fungi)
3rd row: 진균 (fungi)
4th row: 진균 (fungi)
5th row: 진균 (fungi)

Common Values

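Value | Count | Frequency (%)
곤충 (insects) | 1517 | 68.3%
진균 (fungi) | 348 | 15.7%
바이러스 (viruses) | 112 | 5.0%
세균 (bacteria) | 62 | 2.8%
응애 (mites) | 57 | 2.6%
잡초 (weeds) | 46 | 2.1%
선충 (nematodes) | 41 | 1.8%
달팽이 (snails) | 21 | 0.9%
바이로이드 (viroids) | 10 | 0.5%
곤충(매개충) (insects, vectors) | 7 | 0.3%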

Length

[Figure: Histogram of lengths of the category]

학명 (scientific name)
Text

Distinct: 2221
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Length

Max length: 341
Median length: 146
Mean length: 34.132763
Min length: 11

Characters and Unicode

Total characters: 75843
Distinct characters: 83
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
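The category names used in the tables below (Lowercase Letter, Space Separator, and so on) are the long forms of Unicode general-category codes. A minimal sketch with Python's standard `unicodedata` module shows how characters fall into these categories; the code-to-name mapping covers only the ten categories reported here. The standard library does not expose a character's script or block, so those breakdowns are not reproduced.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Sm": "Math Symbol",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count characters per Unicode general category, keyed by long name when known."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


# A scientific name copied from the sample rows of this report.
print(category_counts("Echinochloa crus-galli (하위군류군 포함) (L.) Beauv."))
```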

Unique

Unique: 2220
Unique (%): 99.9%

Sample

1st row: Balansia oryzae-sativae
2nd row: Cronartium coleosporioides
3rd row: Peronospora tabacina
4th row: Phytophthora ramorum
5th row: Synchytrium endobioticum

Words

Value | Count | Frequency (%)
(blank) | 305 | 3.5%
virus | 106 | 1.2%
fabricius | 92 | 1.1%
linnaeus | 64 | 0.7%
walker | 51 | 0.6%
l | 46 | 0.5%
et | 39 | 0.4%
say | 35 | 0.4%
cockerell | 31 | 0.4%
al | 30 | 0.3%
Other values (4651) | 7957 | 90.9%
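The word table lists lowercase tokens such as `l` (from the author abbreviation "(L.)"), which suggests the names are lowercased, split on whitespace, and stripped of surrounding punctuation. The sketch below assumes exactly that tokenization; it is not confirmed to be ydata-profiling's implementation. Under this assumption, a token made only of punctuation, such as a standalone `&`, is reduced to an empty string, which may be what the blank row in the table represents; that is a guess.

```python
import string
from collections import Counter


def word_counts(names):
    """Count lowercase whitespace-separated tokens with surrounding punctuation stripped."""
    counts = Counter()
    for name in names:
        for token in name.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


# Scientific names copied from the sample rows of this report.
names = [
    "Capsella bursa-pastoris (L.) Medik.",
    "Persicaria hydropiper (L.) Spach",
    "Echinochloa utilis Ohwi et Yabuno",
]
counts = word_counts(names)
print(counts.most_common(3))
```

Note that `str.strip` only removes characters at the ends of a token, so inner hyphens such as the one in `bursa-pastoris` survive.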

Most occurring characters

Value | Count | Frequency (%)
a | 6648 | 8.8%
(space) | 6578 | 8.7%
i | 5665 | 7.5%
e | 5205 | 6.9%
s | 4717 | 6.2%
r | 4674 | 6.2%
o | 4390 | 5.8%
l | 3600 | 4.7%
n | 3468 | 4.6%
t | 3124 | 4.1%
Other values (73) | 27774 | 36.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 58769 | 77.5%
Space Separator | 6578 | 8.7%
Uppercase Letter | 5807 | 7.7%
Other Punctuation | 1446 | 1.9%
Open Punctuation | 1388 | 1.8%
Close Punctuation | 1386 | 1.8%
Decimal Number | 269 | 0.4%
Math Symbol | 160 | 0.2%
Dash Punctuation | 27 | < 0.1%
Other Letter | 13 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 6648 | 11.3%
i | 5665 | 9.6%
e | 5205 | 8.9%
s | 4717 | 8.0%
r | 4674 | 8.0%
o | 4390 | 7.5%
l | 3600 | 6.1%
n | 3468 | 5.9%
t | 3124 | 5.3%
u | 3114 | 5.3%
Other values (17) | 14164 | 24.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 623 | 10.7%
P | 528 | 9.1%
S | 487 | 8.4%
M | 424 | 7.3%
B | 389 | 6.7%
L | 346 | 6.0%
D | 316 | 5.4%
A | 309 | 5.3%
H | 299 | 5.1%
F | 275 | 4.7%
Other values (16) | 1811 | 31.2%

Decimal Number
Value | Count | Frequency (%)
1 | 72 | 26.8%
9 | 56 | 20.8%
8 | 38 | 14.1%
7 | 23 | 8.6%
3 | 16 | 5.9%
5 | 15 | 5.6%
4 | 14 | 5.2%
0 | 13 | 4.8%
2 | 12 | 4.5%
6 | 10 | 3.7%

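Other Letter
Value | Count | Frequency (%)
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
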
Other Punctuation
Value | Count | Frequency (%)
. | 993 | 68.7%
& | 270 | 18.7%
, | 114 | 7.9%
? | 64 | 4.4%
' | 3 | 0.2%
: | 2 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 6578 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1388 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1386 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 160 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 27 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 64576 | 85.1%
Common | 11254 | 14.8%
Hangul | 13 | < 0.1%
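The script totals can be cross-checked against the category totals. Judging from the character tables, every Lowercase or Uppercase Letter in 학명 is Latin, every Other Letter is Hangul, and every remaining character (spaces, punctuation, digits, `=`) belongs to the Common script. A quick arithmetic check under that assumption, using only figures copied from this report:

```python
# Totals copied from "Most occurring categories" for 학명.
category_totals = {
    "Lowercase Letter": 58769,
    "Space Separator": 6578,
    "Uppercase Letter": 5807,
    "Other Punctuation": 1446,
    "Open Punctuation": 1388,
    "Close Punctuation": 1386,
    "Decimal Number": 269,
    "Math Symbol": 160,
    "Dash Punctuation": 27,
    "Other Letter": 13,
}
LATIN_CATEGORIES = ("Lowercase Letter", "Uppercase Letter")  # assumed all Latin script
HANGUL_CATEGORIES = ("Other Letter",)                        # assumed all Hangul script

latin = sum(category_totals[name] for name in LATIN_CATEGORIES)
hangul = sum(category_totals[name] for name in HANGUL_CATEGORIES)
common = sum(
    count for name, count in category_totals.items()
    if name not in LATIN_CATEGORIES + HANGUL_CATEGORIES
)
total = sum(category_totals.values())
print(latin, common, hangul, total)
```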

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 6648 | 10.3%
i | 5665 | 8.8%
e | 5205 | 8.1%
s | 4717 | 7.3%
r | 4674 | 7.2%
o | 4390 | 6.8%
l | 3600 | 5.6%
n | 3468 | 5.4%
t | 3124 | 4.8%
u | 3114 | 4.8%
Other values (43) | 19971 | 30.9%

Common
Value | Count | Frequency (%)
(space) | 6578 | 58.5%
( | 1388 | 12.3%
) | 1386 | 12.3%
. | 993 | 8.8%
& | 270 | 2.4%
= | 160 | 1.4%
, | 114 | 1.0%
1 | 72 | 0.6%
? | 64 | 0.6%
9 | 56 | 0.5%
Other values (11) | 173 | 1.5%

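Hangul
Value | Count | Frequency (%)
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%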

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 75829 | > 99.9%
Hangul | 13 | < 0.1%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 6648 | 8.8%
(space) | 6578 | 8.7%
i | 5665 | 7.5%
e | 5205 | 6.9%
s | 4717 | 6.2%
r | 4674 | 6.2%
o | 4390 | 5.8%
l | 3600 | 4.7%
n | 3468 | 4.6%
t | 3124 | 4.1%
Other values (63) | 27760 | 36.6%

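Hangul
Value | Count | Frequency (%)
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 2 | 15.4%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
(not rendered) | 1 | 7.7%
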
None
Value | Count | Frequency (%)
ø | 1 | 100.0%

Correlations

[Figure: correlation heatmap]
 | 검역지위 | 분류
검역지위 | 1.000 | 0.779
분류 | 0.779 | 1.000

[Figure: correlation heatmap]
 | 분류 | 검역지위
분류 | 1.000 | 0.603
검역지위 | 0.603 | 1.000

[Figure: correlation heatmap]
 | 검역지위 | 분류
검역지위 | 1.000 | 0.603
분류 | 0.603 | 1.000
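The three matrices above do not record which correlation method produced them. For two categorical columns, one measure ydata-profiling can compute is Cramér's V, which rescales the chi-square statistic of the contingency table to the range 0 to 1. The sketch below is the plain, uncorrected textbook form; the tool may apply a bias correction, and the full 2222-row data is not in this report, so it is not expected to reproduce 0.779 or 0.603. The toy inputs in the usage lines are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of category labels."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    x_totals, y_totals = Counter(xs), Counter(ys)
    k = min(len(x_totals), len(y_totals))
    if k < 2:
        raise ValueError("each variable needs at least two categories")
    observed = Counter(zip(xs, ys))  # contingency-table cell counts
    chi2 = 0.0
    for a, n_a in x_totals.items():
        for b, n_b in y_totals.items():
            expected = n_a * n_b / n  # expected count under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data: perfectly associated, then fully independent labels.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

A value near 1 means one column's category largely determines the other's, which is the situation the High correlation alerts describe.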

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

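First rows
# | 검역지위 (quarantine status) | 분류 (classification) | 학명 (scientific name)
0 | 금지병해충 (prohibited pest) | 진균 (fungi) | Balansia oryzae-sativae
1 | 금지병해충 (prohibited pest) | 진균 (fungi) | Cronartium coleosporioides
2 | 금지병해충 (prohibited pest) | 진균 (fungi) | Peronospora tabacina
3 | 금지병해충 (prohibited pest) | 진균 (fungi) | Phytophthora ramorum
4 | 금지병해충 (prohibited pest) | 진균 (fungi) | Synchytrium endobioticum
5 | 금지병해충 (prohibited pest) | 세균 (bacteria) | Candidatus Liberibacter solanacearum
6 | 금지병해충 (prohibited pest) | 세균 (bacteria) | Citrus huanglongbing(greening) disease
7 | 금지병해충 (prohibited pest) | 세균 (bacteria) | Xylella fastidiosa
8 | 금지병해충 (prohibited pest) | 세균 (bacteria) | Erwinia amylovora
9 | 금지병해충 (prohibited pest) | 세균 (bacteria) | Apple proliferation phytoplasma

Last rows
# | 검역지위 (quarantine status) | 분류 (classification) | 학명 (scientific name)
2212 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Aneilema keisak Hassk.
2213 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Capsella bursa-pastoris (L.) Medik.
2214 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Cruciferae family
2215 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Cuscuta spp.
2216 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Echinochloa crus-galli (하위군류군 포함) (L.) Beauv. [Korean note: including subgroups]
2217 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Echinochloa utilis Ohwi et Yabuno
2218 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Monochoria vaginalis (Burn. f.) Presl
2219 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Persicaria hydropiper (L.) Spach
2220 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Rotala indica (Willd.) Koehne
2221 | 규제비검역병해충 (regulated non-quarantine pest) | 잡초 (weeds) | Schoenoplectiella juncoides (Roxb.) Lye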