Overview

Dataset statistics

Number of variables: 6
Number of observations: 2222
Missing cells: 2744
Missing cells (%): 20.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 106.5 KiB
Average record size in memory: 49.1 B
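
The missing-cell figures can be cross-checked against the per-column counts reported in the Alerts section. The following is a minimal sketch of that arithmetic, not the profiler's own code; the variable names are illustrative, and it assumes the two columns flagged as missing are the only ones with missing cells.

```python
# Figures copied from this report; the per-column split comes from the Alerts section.
n_variables = 6
n_observations = 2222
missing_per_column = {"비고(일반명, 과명 등)": 522, "DS": 2222}

total_cells = n_variables * n_observations          # 6 * 2222 = 13332
missing_cells = sum(missing_per_column.values())    # 522 + 2222 = 2744
missing_pct = round(100 * missing_cells / total_cells, 1)

print(total_cells, missing_cells, missing_pct)
```

Under that assumption the two flagged columns account for all 2744 missing cells, which agrees with the 20.6% reported above.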

Variable types

Categorical: 3
Text: 2
Unsupported: 1

Dataset

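Description: Plant quarantine regulated pest species information (prohibited pests and controlled pests); original title: 식물검역 규제병해충 종정보(금지병해충 및 관리병해충)
Author: Animal and Plant Quarantine Agency (농림축산검역본부)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220214000000001888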

Alerts

[High correlation] 검역지위 is highly overall correlated with 분류
[High correlation] 분류 is highly overall correlated with 검역지위
[Imbalance] 검역지위 is highly imbalanced (79.8%)
[Imbalance] 분류 is highly imbalanced (52.5%)
[Imbalance] 검토 is highly imbalanced (98.2%)
[Missing] 비고(일반명, 과명 등) has 522 (23.5%) missing values
[Missing] DS has 2222 (100.0%) missing values
[Unsupported] DS is an unsupported type, check if it needs cleaning or further analysis
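
The imbalance percentages in the alerts are consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the class counts and k is the number of classes. The sketch below is an assumption about how the figure could be derived, not a statement of the profiler's implementation; the function name is illustrative and the counts are taken from the Common Values tables in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (0 = balanced, 1 = one class only)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits, skipping empty classes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


status_counts = [2087, 77, 51, 7]  # 검역지위 value counts from this report
review_counts = [2216, 5, 1]       # 검토 value counts from this report

print(f"{imbalance_score(status_counts):.1%}")
print(f"{imbalance_score(review_counts):.1%}")
```

With these inputs the score rounds to the 79.8% and 98.2% shown in the alerts, which suggests (but does not prove) that the report uses the same definition.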

Reproduction

Analysis started: 2023-12-11 03:05:47.059040
Analysis finished: 2023-12-11 03:05:47.799728
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

검역지위 (quarantine status)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Value | Count
관리병해충 | 2087
금지병해충 | 77
규제비검역병해충 | 51
금지병해충(매개충) | 7

Length

Max length: 10
Median length: 5
Mean length: 5.0846085
Min length: 5
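
The length statistics follow directly from the value counts of this variable. As a minimal sketch (the dictionary and variable names are illustrative), they can be recomputed with Python's `len`, which counts code points, so each Hangul syllable counts as one character:

```python
import statistics

# Value counts for 검역지위 as reported in this section.
value_counts = {
    "관리병해충": 2087,
    "금지병해충": 77,
    "규제비검역병해충": 51,
    "금지병해충(매개충)": 7,
}

# Expand to one length per observation.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": round(statistics.mean(lengths), 7),
    "min": min(lengths),
}
print(stats)
```

The mean works out to 11298 / 2222, which matches the reported 5.0846085.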

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 금지병해충
2nd row: 금지병해충
3rd row: 금지병해충
4th row: 금지병해충
5th row: 금지병해충

Common Values

Value | Count | Frequency (%)
관리병해충 (controlled pests) | 2087 | 93.9%
금지병해충 (prohibited pests) | 77 | 3.5%
규제비검역병해충 (regulated non-quarantine pests) | 51 | 2.3%
금지병해충(매개충) (prohibited pests, vectors) | 7 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values tabulated above]

분류 (classification)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Value | Count
곤충 | 1517
진균 | 348
바이러스 | 112
세균 | 62
응애 | 57
Other values (6) | 126

Length

Max length: 7
Median length: 2
Mean length: 2.139964
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 진균
2nd row: 진균
3rd row: 진균
4th row: 진균
5th row: 진균

Common Values

Value | Count | Frequency (%)
곤충 (insects) | 1517 | 68.3%
진균 (fungi) | 348 | 15.7%
바이러스 (viruses) | 112 | 5.0%
세균 (bacteria) | 62 | 2.8%
응애 (mites) | 57 | 2.6%
잡초 (weeds) | 46 | 2.1%
선충 (nematodes) | 41 | 1.8%
달팽이 (snails) | 21 | 0.9%
바이로이드 (viroids) | 10 | 0.5%
곤충(매개충) (insects, vectors) | 7 | 0.3%

Length

[Figure: histogram of lengths of the category]

학명 (scientific name)
Text

Distinct: 2221
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Length

Max length: 347
Median length: 148
Mean length: 34.169217
Min length: 11

Characters and Unicode

Total characters: 75924
Distinct characters: 84
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2220
Unique (%): 99.9%
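
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of the distinction, using a hypothetical toy column (the names in `names` are made up for illustration and the function name is not from the profiler):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: one name appears twice.
names = ["Aus bus", "Cus dus", "Aus bus", "Eus fus"]
print(distinct_and_unique(names))
```

Read this way, 2222 observations with 2221 distinct and 2220 unique values are consistent with exactly one scientific name appearing twice. Since the report lists no duplicate rows, those two rows would have to differ in some other column.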

Sample

1st row: Balansia oryzae-sativae
2nd row: Cronartium coleosporioides
3rd row: Peronospora tabacina
4th row: Phytophthora ramorum
5th row: Synchytrium endobioticum

Value | Count | Frequency (%)
(blank in source) | 379 | 4.3%
virus | 106 | 1.2%
fabricius | 92 | 1.0%
linnaeus | 64 | 0.7%
walker | 51 | 0.6%
l | 46 | 0.5%
et | 39 | 0.4%
say | 35 | 0.4%
cockerell | 31 | 0.4%
al | 30 | 0.3%
Other values (4651) | 7959 | 90.1%

Most occurring characters

Value | Count | Frequency (%)
a | 6648 | 8.8%
(space) | 6578 | 8.7%
i | 5665 | 7.5%
e | 5205 | 6.9%
s | 4717 | 6.2%
r | 4674 | 6.2%
o | 4390 | 5.8%
l | 3600 | 4.7%
n | 3468 | 4.6%
t | 3124 | 4.1%
Other values (74) | 27855 | 36.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 58769 | 77.4%
Space Separator | 6578 | 8.7%
Uppercase Letter | 5807 | 7.6%
Other Punctuation | 1446 | 1.9%
Open Punctuation | 1388 | 1.8%
Close Punctuation | 1386 | 1.8%
Decimal Number | 269 | 0.4%
Math Symbol | 160 | 0.2%
Control | 81 | 0.1%
Dash Punctuation | 27 | < 0.1%
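
The category names in the table above correspond to Unicode general categories (Ll, Lu, Zs, and so on). The sketch below shows one way such counts could be produced with Python's standard `unicodedata` module. It is an illustration under that assumption, not the profiler's implementation; `CATEGORY_NAMES` and `category_counts` are hypothetical names, and the sample strings are taken from the Sample section of this report.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Lo": "Other Letter",
    "Zs": "Space Separator", "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pd": "Dash Punctuation", "Pf": "Final Punctuation",
    "Nd": "Decimal Number", "Sm": "Math Symbol", "Cc": "Control",
}


def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["Cuscuta spp.", "Capsella bursa-pastoris (L.) Medik."]))
print(category_counts(["새삼"]))  # Hangul syllables fall under "Other Letter"
```

This also explains why the Hangul characters in this column appear under "Other Letter": Hangul syllables have no case, so they are neither uppercase nor lowercase letters.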

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 6648 | 11.3%
i | 5665 | 9.6%
e | 5205 | 8.9%
s | 4717 | 8.0%
r | 4674 | 8.0%
o | 4390 | 7.5%
l | 3600 | 6.1%
n | 3468 | 5.9%
t | 3124 | 5.3%
u | 3114 | 5.3%
Other values (17) | 14164 | 24.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 623 | 10.7%
P | 528 | 9.1%
S | 487 | 8.4%
M | 424 | 7.3%
B | 389 | 6.7%
L | 346 | 6.0%
D | 316 | 5.4%
A | 309 | 5.3%
H | 299 | 5.1%
F | 275 | 4.7%
Other values (16) | 1811 | 31.2%

Decimal Number
Value | Count | Frequency (%)
1 | 72 | 26.8%
9 | 56 | 20.8%
8 | 38 | 14.1%
7 | 23 | 8.6%
3 | 16 | 5.9%
5 | 15 | 5.6%
4 | 14 | 5.2%
0 | 13 | 4.8%
2 | 12 | 4.5%
6 | 10 | 3.7%

Other Letter
Nine characters whose glyphs were lost in extraction: four occur 2 times each (15.4% each) and five occur once each (7.7% each).

Other Punctuation
Value | Count | Frequency (%)
. | 993 | 68.7%
& | 270 | 18.7%
, | 114 | 7.9%
? | 64 | 4.4%
' | 3 | 0.2%
: | 2 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 6578 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1388 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1386 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 160 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 81 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 27 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 64576 | 85.1%
Common | 11335 | 14.9%
Hangul | 13 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 6648 | 10.3%
i | 5665 | 8.8%
e | 5205 | 8.1%
s | 4717 | 7.3%
r | 4674 | 7.2%
o | 4390 | 6.8%
l | 3600 | 5.6%
n | 3468 | 5.4%
t | 3124 | 4.8%
u | 3114 | 4.8%
Other values (43) | 19971 | 30.9%

Common
Value | Count | Frequency (%)
(space) | 6578 | 58.0%
( | 1388 | 12.2%
) | 1386 | 12.2%
. | 993 | 8.8%
& | 270 | 2.4%
= | 160 | 1.4%
, | 114 | 1.0%
(control character) | 81 | 0.7%
1 | 72 | 0.6%
? | 64 | 0.6%
Other values (12) | 229 | 2.0%

Hangul
Nine characters whose glyphs were lost in extraction: four occur 2 times each (15.4% each) and five occur once each (7.7% each).

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 75910 | > 99.9%
Hangul | 13 | < 0.1%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 6648 | 8.8%
(space) | 6578 | 8.7%
i | 5665 | 7.5%
e | 5205 | 6.9%
s | 4717 | 6.2%
r | 4674 | 6.2%
o | 4390 | 5.8%
l | 3600 | 4.7%
n | 3468 | 4.6%
t | 3124 | 4.1%
Other values (64) | 27841 | 36.7%

Hangul
Nine characters whose glyphs were lost in extraction: four occur 2 times each (15.4% each) and five occur once each (7.7% each).

None
Value | Count | Frequency (%)
ø | 1 | 100.0%

비고(일반명, 과명 등) (remarks: common name, family name, etc.)
Text

MISSING

Distinct: 295
Distinct (%): 17.4%
Missing: 522
Missing (%): 23.5%
Memory size: 17.5 KiB

Length

Max length: 82
Median length: 47
Mean length: 17.357647
Min length: 1

Characters and Unicode

Total characters: 29508
Distinct characters: 332
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 175
Unique (%): 10.3%

Sample

1st row: 벼이삭미이라병
2nd row: 소나무종유석병
3rd row: 담배노균병
4th row: 참나무역병
5th row: 감자암종병

Value | Count | Frequency (%)
curculionidae(바구미과 | 155 | 8.7%
diaspididae(깍지벌레과 | 92 | 5.2%
pseudococcidae(가루깍지벌레과 | 82 | 4.6%
cerambycidae(하늘소과 | 67 | 3.8%
scolytidae(나무좀과 | 66 | 3.7%
tortricidae(잎말이나방과 | 61 | 3.4%
chrysomelidae(잎벌레과 | 53 | 3.0%
cicadellidae(매미충과 | 51 | 2.9%
thripidae(총채벌레과 | 50 | 2.8%
aphididae(진딧물과 | 48 | 2.7%
Other values (336) | 1055 | 59.3%

Most occurring characters

Value | Count | Frequency (%)
i | 2599 | 8.8%
e | 2356 | 8.0%
a | 2211 | 7.5%
d | 2023 | 6.9%
(Hangul character, glyph lost) | 1568 | 5.3%
( | 1560 | 5.3%
) | 1560 | 5.3%
c | 1068 | 3.6%
o | 1053 | 3.6%
r | 1034 | 3.5%
Other values (322) | 12476 | 42.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 16690 | 56.6%
Other Letter | 7974 | 27.0%
Uppercase Letter | 1611 | 5.5%
Open Punctuation | 1560 | 5.3%
Close Punctuation | 1560 | 5.3%
Space Separator | 91 | 0.3%
Other Punctuation | 14 | < 0.1%
Dash Punctuation | 7 | < 0.1%
Final Punctuation | 1 | < 0.1%
Most frequent character per category

Other Letter
The glyphs were lost in extraction; only the counts survive, in descending order: 1568 (19.7%), 474 (5.9%), 440 (5.5%), 375 (4.7%), 280 (3.5%), 261 (3.3%), 236 (3.0%), 231 (2.9%), 208 (2.6%), 190 (2.4%). Other values (269): 3711 (46.5%).

Lowercase Letter
Value | Count | Frequency (%)
i | 2599 | 15.6%
e | 2356 | 14.1%
a | 2211 | 13.2%
d | 2023 | 12.1%
c | 1068 | 6.4%
o | 1053 | 6.3%
r | 1034 | 6.2%
l | 702 | 4.2%
t | 533 | 3.2%
u | 530 | 3.2%
Other values (15) | 2581 | 15.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 432 | 26.8%
T | 207 | 12.8%
P | 201 | 12.5%
A | 150 | 9.3%
S | 118 | 7.3%
D | 113 | 7.0%
N | 58 | 3.6%
B | 56 | 3.5%
L | 52 | 3.2%
E | 43 | 2.7%
Other values (12) | 181 | 11.2%

Open Punctuation
Value | Count | Frequency (%)
( | 1560 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1560 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 91 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 14 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18301 | 62.0%
Hangul | 7974 | 27.0%
Common | 3233 | 11.0%

Most frequent character per script

Hangul
The glyphs were lost in extraction; only the counts survive, in descending order: 1568 (19.7%), 474 (5.9%), 440 (5.5%), 375 (4.7%), 280 (3.5%), 261 (3.3%), 236 (3.0%), 231 (2.9%), 208 (2.6%), 190 (2.4%). Other values (269): 3711 (46.5%).

Latin
Value | Count | Frequency (%)
i | 2599 | 14.2%
e | 2356 | 12.9%
a | 2211 | 12.1%
d | 2023 | 11.1%
c | 1068 | 5.8%
o | 1053 | 5.8%
r | 1034 | 5.6%
l | 702 | 3.8%
t | 533 | 2.9%
u | 530 | 2.9%
Other values (37) | 4192 | 22.9%

Common
Value | Count | Frequency (%)
( | 1560 | 48.3%
) | 1560 | 48.3%
(space) | 91 | 2.8%
, | 14 | 0.4%
- | 7 | 0.2%
(glyph lost) | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 21533 | 73.0%
Hangul | 7974 | 27.0%
Punctuation | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 2599 | 12.1%
e | 2356 | 10.9%
a | 2211 | 10.3%
d | 2023 | 9.4%
( | 1560 | 7.2%
) | 1560 | 7.2%
c | 1068 | 5.0%
o | 1053 | 4.9%
r | 1034 | 4.8%
l | 702 | 3.3%
Other values (42) | 5367 | 24.9%

Hangul
The glyphs were lost in extraction; only the counts survive, in descending order: 1568 (19.7%), 474 (5.9%), 440 (5.5%), 375 (4.7%), 280 (3.5%), 261 (3.3%), 236 (3.0%), 231 (2.9%), 208 (2.6%), 190 (2.4%). Other values (269): 3711 (46.5%).

Punctuation
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%

검토 (review)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.5 KiB

Value | Count
<NA> | 2216
학명 수정 | 5
제외 | 1

Length

Max length: 5
Median length: 4
Mean length: 4.0013501
Min length: 2
Min length2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 학명 수정
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 2216 | 99.7%
학명 수정 (scientific name revised) | 5 | 0.2%
제외 (excluded) | 1 | < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values tabulated above]

DS
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2222
Missing (%): 100.0%
Memory size: 19.7 KiB

Correlations

 | 검역지위 | 분류 | 검토
검역지위 | 1.000 | 0.779 | 0.000
분류 | 0.779 | 1.000 | 0.000
검토 | 0.000 | 0.000 | 1.000

 | 검역지위 | 검토 | 분류
검역지위 | 1.000 | 0.000 | 0.603
검토 | 0.000 | 1.000 | 0.000
분류 | 0.603 | 0.000 | 1.000

 | 검역지위 | 분류 | 검토
검역지위 | 1.000 | 0.603 | 0.000
분류 | 0.603 | 1.000 | 0.000
검토 | 0.000 | 0.000 | 1.000
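
The report does not state which association measure each matrix uses. For pairs of categorical variables, a common choice is Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. The sketch below is the plain (uncorrected) form on hypothetical toy data. Treating it as the measure behind the matrices above is an assumption, and tools that apply a bias correction will produce somewhat different values.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Hypothetical toy labels, not taken from the dataset.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # perfectly associated
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # independent
```

A value near 1 means one variable almost determines the other, and a value near 0 means the two are close to independent.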

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

 | 검역지위 | 분류 | 학명 | 비고(일반명, 과명 등) | 검토 | DS
0 | 금지병해충 | 진균 | Balansia oryzae-sativae | 벼이삭미이라병 | <NA> | <NA>
1 | 금지병해충 | 진균 | Cronartium coleosporioides | 소나무종유석병 | <NA> | <NA>
2 | 금지병해충 | 진균 | Peronospora tabacina | 담배노균병 | 학명 수정 | <NA>
3 | 금지병해충 | 진균 | Phytophthora ramorum | 참나무역병 | <NA> | <NA>
4 | 금지병해충 | 진균 | Synchytrium endobioticum | 감자암종병 | <NA> | <NA>
5 | 금지병해충 | 세균 | Candidatus Liberibacter solanacearum | 제브라칩병 | <NA> | <NA>
6 | 금지병해충 | 세균 | Citrus huanglongbing(greening) disease | 감귤그린병 | 학명 수정 | <NA>
7 | 금지병해충 | 세균 | Xylella fastidiosa | 포도피어슨병 | <NA> | <NA>
8 | 금지병해충 | 세균 | Erwinia amylovora | 과수화상병 | <NA> | <NA>
9 | 금지병해충 | 세균 | Apple proliferation phytoplasma | 사과빗자루병 | 학명 수정 | <NA>

 | 검역지위 | 분류 | 학명 | 비고(일반명, 과명 등) | 검토 | DS
2212 | 규제비검역병해충 | 잡초 | Aneilema keisak Hassk. | 사마귀풀 | <NA> | <NA>
2213 | 규제비검역병해충 | 잡초 | Capsella bursa-pastoris (L.) Medik. | 냉이 | <NA> | <NA>
2214 | 규제비검역병해충 | 잡초 | Cruciferae family | 십자화과 잡초 | <NA> | <NA>
2215 | 규제비검역병해충 | 잡초 | Cuscuta spp. | 새삼 | <NA> | <NA>
2216 | 규제비검역병해충 | 잡초 | Echinochloa crus-galli (하위군류군 포함) (L.) Beauv. | 돌피 | <NA> | <NA>
2217 | 규제비검역병해충 | 잡초 | Echinochloa utilis Ohwi et Yabuno |  | <NA> | <NA>
2218 | 규제비검역병해충 | 잡초 | Monochoria vaginalis (Burn. f.) Presl | 물달개비 | <NA> | <NA>
2219 | 규제비검역병해충 | 잡초 | Persicaria hydropiper (L.) Spach | 여뀌 | <NA> | <NA>
2220 | 규제비검역병해충 | 잡초 | Rotala indica (Willd.) Koehne | 마디꽃 | <NA> | <NA>
2221 | 규제비검역병해충 | 잡초 | Schoenoplectiella juncoides (Roxb.) Lye | 올챙이고랭이 | <NA> | <NA>