Overview

Dataset statistics

Number of variables: 5
Number of observations: 2005
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 78.4 KiB
Average record size in memory: 40.1 B
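The overview figures can be recomputed directly from the raw records. The sketch below is a minimal, standard-library-only illustration, not ydata-profiling's own implementation. It assumes each row is a Python dict and that a missing cell is stored as `None`.

With 5 variables and 2005 observations there are 10,025 cells, so 3 missing cells is 3 / 10025 ≈ 0.03%. The report rounds this to "< 0.1%".

```python
from typing import Any


def dataset_stats(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Recompute overview figures from a list of records (one dict per row).

    Assumption: a missing cell is represented as None.
    """
    columns = list(rows[0].keys()) if rows else []
    n_vars = len(columns)
    n_obs = len(rows)
    total_cells = n_vars * n_obs
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row counts as a duplicate if an identical tuple of values was already seen.
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }
```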

Variable types

Text: 4
Categorical: 1

Dataset

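Description: This dataset provides, for the first time in Korea or abroad, a catalogue of the quarantine-related plant pest specimens held in Korea. It supports the public's right to know and can serve as educational material. The aim is to help prevent foreign pests from being introduced from overseas and so protect Korea's natural environment.
Author: 공공데이터포털 (Public Data Portal)
URL: https://www.data.go.kr/data/15117735/fileData.do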

Alerts

라벨 종류 (label type) is highly imbalanced (98.9%) [Imbalance]
표본번호 (specimen number) has unique values [Unique]

Reproduction

Analysis started: 2024-04-21 09:03:07.277443
Analysis finished: 2024-04-21 09:03:08.179146
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

표본번호 (specimen number)
Text

UNIQUE

Distinct: 2005
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 24060
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
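The category breakdowns in this report can be reproduced with Python's built-in `unicodedata` module. The standard library has no script lookup (Latin vs. Common) and no block lookup (ASCII vs. Hangul), so this sketch covers general categories only. The `CATEGORY_NAMES` mapping is an illustrative helper that lists just the category codes seen in this report.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not in the map.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["PQG 0000001"]))
```

Hangul syllables such as those in the 국가 column fall under "Other Letter" (`Lo`), which matches that column's breakdown further down.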

Unique

Unique: 2005
Unique (%): 100.0%

Sample

1st row: PQG 0000001
2nd row: PQG 0000002
3rd row: PQG 0000003
4th row: PQG 0000004
5th row: PQG 0000005
Most occurring words

Value                Count  Frequency (%)
pqg                  2005   50.0%
0001333              1      < 0.1%
0001346              1      < 0.1%
0001345              1      < 0.1%
0001344              1      < 0.1%
0001343              1      < 0.1%
0001342              1      < 0.1%
0001341              1      < 0.1%
0001340              1      < 0.1%
0001339              1      < 0.1%
Other values (1996)  1996   49.8%
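This table counts words rather than whole values. That is why the prefix appears once per row as the lowercase token `pqg`: 2005 of the 4010 tokens, or 50.0%.

The sketch below approximates that tokenization: lowercase each value, split on whitespace, and strip leading and trailing punctuation. It is our reading of the output, not ydata-profiling's actual code, and `word_counts` is a hypothetical helper name.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Lowercase, split on whitespace and strip edge punctuation, then count."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation + string.whitespace)
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    return counts


print(word_counts(["PQG 0000001", "PQG 0000002"]).most_common(1))
```

Stripping edge punctuation also explains why authorities like "(Quaintance)" appear as bare lowercase words in the 학명 word table.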

Most occurring characters

Value             Count  Frequency (%)
0                 7624   31.7%
[space]           4010   16.7%
P                 2005   8.3%
Q                 2005   8.3%
G                 2005   8.3%
1                 1601   6.7%
2                 607    2.5%
3                 601    2.5%
4                 601    2.5%
5                 601    2.5%
Other values (4)  2400   10.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    14035  58.3%
Uppercase Letter  6015   25.0%
Space Separator   4010   16.7%
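These counts reveal a formatting detail. The 2005 IDs contain 4010 space characters, two per ID, and every ID is 12 characters long. Yet the rendered samples, such as `PQG 0000001`, show only 11 characters and one space. The second space (doubled or trailing) was presumably lost when the report was rendered.

Below is a validation sketch that tolerates this by normalizing whitespace before matching. The pattern `PQG`, one space, seven digits is inferred from the sample rows, not taken from any published specification. The helper names are hypothetical.

```python
import re

# Pattern inferred from the sample rows: "PQG", one space, seven ASCII digits.
SPECIMEN_ID = re.compile(r"PQG [0-9]{7}")


def normalize_specimen_id(raw: str) -> str:
    """Collapse internal whitespace runs to a single space and trim both ends."""
    return " ".join(raw.split())


def is_valid_specimen_id(raw: str) -> bool:
    """True if the whitespace-normalized ID matches the inferred pattern."""
    return SPECIMEN_ID.fullmatch(normalize_specimen_id(raw)) is not None


print(is_valid_specimen_id("PQG  0000001"))
```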

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      7624   54.3%
1      1601   11.4%
2      607    4.3%
3      601    4.3%
4      601    4.3%
5      601    4.3%
6      600    4.3%
7      600    4.3%
8      600    4.3%
9      600    4.3%

Uppercase Letter
Value  Count  Frequency (%)
P      2005   33.3%
Q      2005   33.3%
G      2005   33.3%

Space Separator
Value    Count  Frequency (%)
[space]  4010   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  18045  75.0%
Latin   6015   25.0%

Most frequent character per script

Common
Value    Count  Frequency (%)
0        7624   42.2%
[space]  4010   22.2%
1        1601   8.9%
2        607    3.4%
3        601    3.3%
4        601    3.3%
5        601    3.3%
6        600    3.3%
7        600    3.3%
8        600    3.3%

Latin
Value  Count  Frequency (%)
P      2005   33.3%
Q      2005   33.3%
G      2005   33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  24060  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 7624   31.7%
[space]           4010   16.7%
P                 2005   8.3%
Q                 2005   8.3%
G                 2005   8.3%
1                 1601   6.7%
2                 607    2.5%
3                 601    2.5%
4                 601    2.5%
5                 601    2.5%
Other values (4)  2400   10.0%

학명 (scientific name)
Text

Distinct: 367
Distinct (%): 18.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Length

Max length: 52
Median length: 47
Mean length: 32.715711
Min length: 11
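The length statistics are plain summary statistics over the character length of each value. Multiplying the mean by the row count recovers the total character count reported below: 32.715711 × 2005 ≈ 65595. A standard-library sketch (the `length_summary` name is illustrative):

```python
import statistics


def length_summary(values: list[str]) -> dict[str, float]:
    """Max/median/mean/min of the character lengths of the given strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        # statistics.median averages the two middle lengths for even counts.
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


print(length_summary(["Aleurocanthus woglumi Ashby", "Aleurocanthus spiniferus (Quaintance)"]))
```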

Characters and Unicode

Total characters: 65595
Distinct characters: 58
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 153
Unique (%): 7.6%
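"Distinct" counts different values, while "Unique" counts values that occur exactly once. So 367 scientific names appear in the column, but only 153 of them label a single specimen (367 / 2005 = 18.3%, 153 / 2005 = 7.6%). A minimal sketch of both counts, with a hypothetical helper name:

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


print(distinct_and_unique(["Aleurocanthus woglumi Ashby"] * 2 + ["Aleurolobus marlatti (Quaintance)"]))
```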

Sample

1st row: Aleurocanthus spiniferus (Quaintance)
2nd row: Aleurocanthus spiniferus (Quaintance)
3rd row: Aleurocanthus spiniferus (Quaintance)
4th row: Aleurocanthus woglumi Ashby
5th row: Aleurocanthus woglumi Ashby
Most occurring words

Value               Count  Frequency (%)
takahashi           222    3.5%
lepidosaphes        194    3.0%
maskell             190    3.0%
trialeurodes        156    2.4%
westwood            147    2.3%
vaporariorum        141    2.2%
kuwana              138    2.2%
and                 128    2.0%
green               127    2.0%
ceroplastes         120    1.9%
Other values (585)  4818   75.5%
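The most frequent words mix genus names (lepidosaphes, trialeurodes, ceroplastes) with author names (takahashi, maskell, kuwana, westwood). This is because each value holds a binomial followed by its authority. Splitting the two makes the column easier to group by genus.

The rough sketch below assumes the `Genus species Authority` shape seen in the sample rows. It flags a parenthesized authority, which is the zoological convention for a species since moved to a different genus. The regex and the `parse_scientific_name` helper are illustrative. Names that do not fit are returned as `None`, and trinomials would be split imperfectly (the subspecies lands in the authority).

```python
import re
from typing import Optional

# Genus (capitalised), species epithet (lowercase), then an optional authority.
NAME_PATTERN = re.compile(
    r"(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z][a-z-]*)(?:\s+(?P<authority>.+))?"
)


def parse_scientific_name(name: str) -> Optional[dict]:
    """Split 'Genus species Authority' into parts; None if the name does not fit."""
    match = NAME_PATTERN.fullmatch(name.strip())
    if match is None:
        return None
    authority = match.group("authority") or ""
    return {
        "genus": match.group("genus"),
        "species": match.group("species"),
        "authority": authority,
        # Parentheses around the author mark a species now placed in another genus.
        "recombined": authority.startswith("(") and authority.endswith(")"),
    }


print(parse_scientific_name("Aleurolobus marlatti (Quaintance)"))
```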

Most occurring characters

Value              Count  Frequency (%)
[space]            8549   13.0%
a                  6349   9.7%
e                  5138   7.8%
i                  5081   7.7%
s                  4755   7.2%
r                  3701   5.6%
o                  3592   5.5%
l                  3076   4.7%
u                  2928   4.5%
c                  2355   3.6%
Other values (48)  20071  30.6%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       49793  75.9%
Space Separator        8549   13.0%
Uppercase Letter       4154   6.3%
Open Punctuation       1475   2.2%
Close Punctuation      1475   2.2%
Other Punctuation      118    0.2%
Connector Punctuation  24     < 0.1%
Decimal Number         7      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
a                  6349   12.8%
e                  5138   10.3%
i                  5081   10.2%
s                  4755   9.5%
r                  3701   7.4%
o                  3592   7.2%
l                  3076   6.2%
u                  2928   5.9%
c                  2355   4.7%
n                  2341   4.7%
Other values (16)  10477  21.0%

Uppercase Letter
Value              Count  Frequency (%)
P                  537    12.9%
T                  472    11.4%
C                  455    11.0%
A                  396    9.5%
M                  342    8.2%
L                  285    6.9%
B                  261    6.3%
G                  220    5.3%
W                  216    5.2%
D                  206    5.0%
Other values (15)  764    18.4%

Other Punctuation
Value  Count  Frequency (%)
.      117    99.2%
,      1      0.8%

Space Separator
Value    Count  Frequency (%)
[space]  8549   100.0%

Open Punctuation
Value  Count  Frequency (%)
(      1475   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1475   100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      24     100.0%

Decimal Number
Value  Count  Frequency (%)
1      7      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   53947  82.2%
Common  11648  17.8%

Most frequent character per script

Latin
Value              Count  Frequency (%)
a                  6349   11.8%
e                  5138   9.5%
i                  5081   9.4%
s                  4755   8.8%
r                  3701   6.9%
o                  3592   6.7%
l                  3076   5.7%
u                  2928   5.4%
c                  2355   4.4%
n                  2341   4.3%
Other values (41)  14631  27.1%

Common
Value    Count  Frequency (%)
[space]  8549   73.4%
(        1475   12.7%
)        1475   12.7%
.        117    1.0%
_        24     0.2%
1        7      0.1%
,        1      < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  65595  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
[space]            8549   13.0%
a                  6349   9.7%
e                  5138   7.8%
i                  5081   7.7%
s                  4755   7.2%
r                  3701   5.6%
o                  3592   5.5%
l                  3076   4.7%
u                  2928   4.5%
c                  2355   3.6%
Other values (48)  20071  30.6%

국가 (country)
Text

Distinct: 51
Distinct (%): 2.5%
Missing: 3
Missing (%): 0.1%
Memory size: 15.8 KiB

Length

Max length: 10
Median length: 4
Mean length: 3.4535465
Min length: 2

Characters and Unicode

Total characters: 6914
Distinct characters: 96
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 15
Unique (%): 0.7%

Sample

1st row: 대만 (Taiwan)
2nd row: 대만 (Taiwan)
3rd row: 대만 (Taiwan)
4th row: 케냐 (Kenya)
5th row: 미국 (United States)
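Most occurring words

Value                   Count  Frequency (%)
대한민국 (South Korea)   1001   49.7%
라오스 (Laos)            269    13.3%
베트남 (Vietnam)         122    6.1%
미국 (United States)     119    5.9%
대만 (Taiwan)            77     3.8%
일본 (Japan)             53     2.6%
태국 (Thailand)          46     2.3%
중국 (China)             44     2.2%
호주 (Australia)         40     2.0%
필리핀 (Philippines)     31     1.5%
Other values (47)       214    10.6%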
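Like the other text columns, this table counts whitespace-separated words. Its counts sum to 2016, more than the 2002 non-missing values, because 14 space characters split some multi-word country names. That is why 대한민국 shows 49.7% here (1001 / 2016), although it accounts for exactly half of the 2002 recorded countries.

Counting whole values, with the 3 missing entries kept as their own bucket, gives the per-country picture directly. The sketch below assumes missing values arrive as `None` and also treats blank strings as missing. The `MISSING` label and the helper name are hypothetical.

```python
from collections import Counter
from typing import Optional

MISSING = "<missing>"  # hypothetical label for absent or blank countries


def country_counts(values: list[Optional[str]]) -> Counter:
    """Count whole values; None and blank strings go into one MISSING bucket."""
    counts: Counter = Counter()
    for value in values:
        if value is None or not value.strip():
            counts[MISSING] += 1
        else:
            # Normalise internal whitespace so "A  B" and "A B" count together.
            counts[" ".join(value.split())] += 1
    return counts


print(country_counts(["대한민국", None, "미국", "대한민국"]))
```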

Most occurring characters

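Value              Count  Frequency (%)
[not recovered]    1234   17.8%
[not recovered]    1078   15.6%
[not recovered]    1001   14.5%
[not recovered]    1001   14.5%
[not recovered]    302    4.4%
[not recovered]    280    4.0%
[not recovered]    269    3.9%
[not recovered]    140    2.0%
[not recovered]    128    1.9%
[not recovered]    126    1.8%
Other values (86)  1355   19.6%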

Most occurring categories

Value            Count  Frequency (%)
Other Letter     6900   99.8%
Space Separator  14     0.2%

Most frequent character per category

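Other Letter
Value              Count  Frequency (%)
[not recovered]    1234   17.9%
[not recovered]    1078   15.6%
[not recovered]    1001   14.5%
[not recovered]    1001   14.5%
[not recovered]    302    4.4%
[not recovered]    280    4.1%
[not recovered]    269    3.9%
[not recovered]    140    2.0%
[not recovered]    128    1.9%
[not recovered]    126    1.8%
Other values (85)  1341   19.4%

Space Separator
Value    Count  Frequency (%)
[space]  14     100.0%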

Most occurring scripts

Value   Count  Frequency (%)
Hangul  6900   99.8%
Common  14     0.2%

Most frequent character per script

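Hangul
Value              Count  Frequency (%)
[not recovered]    1234   17.9%
[not recovered]    1078   15.6%
[not recovered]    1001   14.5%
[not recovered]    1001   14.5%
[not recovered]    302    4.4%
[not recovered]    280    4.1%
[not recovered]    269    3.9%
[not recovered]    140    2.0%
[not recovered]    128    1.9%
[not recovered]    126    1.8%
Other values (85)  1341   19.4%

Common
Value    Count  Frequency (%)
[space]  14     100.0%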

Most occurring blocks

Value   Count  Frequency (%)
Hangul  6900   99.8%
ASCII   14     0.2%

Most frequent character per block

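Hangul
Value              Count  Frequency (%)
[not recovered]    1234   17.9%
[not recovered]    1078   15.6%
[not recovered]    1001   14.5%
[not recovered]    1001   14.5%
[not recovered]    302    4.4%
[not recovered]    280    4.1%
[not recovered]    269    3.9%
[not recovered]    140    2.0%
[not recovered]    128    1.9%
[not recovered]    126    1.8%
Other values (85)  1341   19.4%

ASCII
Value    Count  Frequency (%)
[space]  14     100.0%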

검사_채집일 (inspection/collection date)
Text

Distinct: 610
Distinct (%): 30.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.9850374
Min length: 4

Characters and Unicode

Total characters: 20020
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 304
Unique (%): 15.2%

Sample

1st row: 1990-06-18
2nd row: 2011-01-14
3rd row: 2011-01-14
4th row: 1965-08-06
5th row: 2005-05-10
Value               Count  Frequency (%)
2015-04-27          99     4.9%
2015-04-28          47     2.3%
2003-08-30          35     1.7%
2003-10-03          31     1.5%
1998-04-15          26     1.3%
2015-04-29          24     1.2%
2015-04-30          24     1.2%
2008-03-18          21     1.0%
2007-05-10          19     0.9%
2003-10-04          18     0.9%
Other values (600)  1661   82.8%

Most occurring characters

Value             Count  Frequency (%)
0                 5208   26.0%
-                 4010   20.0%
2                 2770   13.8%
1                 2548   12.7%
9                 1179   5.9%
5                 983    4.9%
8                 726    3.6%
3                 717    3.6%
7                 648    3.2%
4                 628    3.1%
Other values (2)  603    3.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     16000  79.9%
Dash Punctuation   4010   20.0%
Other Punctuation  10     < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      5208   32.6%
2      2770   17.3%
1      2548   15.9%
9      1179   7.4%
5      983    6.1%
8      726    4.5%
3      717    4.5%
7      648    4.0%
4      628    3.9%
6      593    3.7%

Dash Punctuation
Value  Count  Frequency (%)
-      4010   100.0%

Other Punctuation
Value  Count  Frequency (%)
.      10     100.0%
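The dates are stored as text rather than as a date type. Almost every value has the 10-character `YYYY-MM-DD` shape (median length 10). However, the minimum length of 4 and the ten `.` characters show that a few entries deviate.

The sketch below parses the expected format and returns `None` for anything else, so those rows can be listed and reviewed. Treating every non-`YYYY-MM-DD` value as invalid is an assumption, and both helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_collection_date(raw: str) -> Optional[date]:
    """Parse 'YYYY-MM-DD'; return None for values in any other format."""
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        # Wrong shape (e.g. dots, year only) or impossible calendar date.
        return None


def invalid_dates(values: list[str]) -> list[str]:
    """Values that could not be parsed, kept for manual review."""
    return [v for v in values if parse_collection_date(v) is None]


print(invalid_dates(["2015-04-27", "2003.08.30"]))
```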

Most occurring scripts

Value   Count  Frequency (%)
Common  20020  100.0%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                 5208   26.0%
-                 4010   20.0%
2                 2770   13.8%
1                 2548   12.7%
9                 1179   5.9%
5                 983    4.9%
8                 726    3.6%
3                 717    3.6%
7                 648    3.2%
4                 628    3.1%
Other values (2)  603    3.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  20020  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 5208   26.0%
-                 4010   20.0%
2                 2770   13.8%
1                 2548   12.7%
9                 1179   5.9%
5                 983    4.9%
8                 726    3.6%
3                 717    3.6%
7                 648    3.2%
4                 628    3.1%
Other values (2)  603    3.0%

라벨 종류 (label type)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB
슬라이드 (slide): 2003
해충(건조) (pest, dried): 2

Length

Max length: 6
Median length: 4
Mean length: 4.001995
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 슬라이드
2nd row: 슬라이드
3rd row: 슬라이드
4th row: 슬라이드
5th row: 슬라이드

Common Values

Value                     Count  Frequency (%)
슬라이드 (slide)           2003   99.9%
해충(건조) (pest, dried)   2      0.1%
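The 98.9% imbalance figure can be reproduced from these two counts. The sketch below uses an entropy-based score, which as far as we understand is what ydata-profiling applies to categorical columns. The score is one minus the Shannon entropy of the class frequencies divided by its maximum, log2 of the number of classes.

A score of 0 means perfectly balanced classes, and values near 1 mean one class dominates. The `imbalance_score` name is illustrative.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - H(p) / log2(k): 0 for balanced classes, close to 1 when one dominates."""
    n = sum(counts)
    k = len(counts)
    if k < 2 or n == 0:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


print(round(100 * imbalance_score([2003, 2]), 1))
```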

Length

[Figure: histogram of lengths of the category]


Correlations

           국가    라벨 종류
국가       1.000   0.133
라벨 종류  0.133   1.000
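For a pair of categorical columns, the "auto" correlation setting reports (to our understanding) a bias-corrected Cramér's V. This is a 0-to-1 measure of association derived from the chi-squared statistic of the contingency table.

The sketch below computes the plain, uncorrected statistic from a contingency table given as nested lists. The report does not include the 국가 × 라벨 종류 contingency table, so the sketch cannot reproduce the 0.133 above. It is shown only to make the measure concrete.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    if n == 0:
        return 0.0
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0
    # Pearson chi-squared: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([[10, 0], [0, 10]]))
```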

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix: a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index | 표본번호 | 학명 | 국가 | 검사_채집일 | 라벨 종류
0 | PQG 0000001 | Aleurocanthus spiniferus (Quaintance) | 대만 | 1990-06-18 | 슬라이드
1 | PQG 0000002 | Aleurocanthus spiniferus (Quaintance) | 대만 | 2011-01-14 | 슬라이드
2 | PQG 0000003 | Aleurocanthus spiniferus (Quaintance) | 대만 | 2011-01-14 | 슬라이드
3 | PQG 0000004 | Aleurocanthus woglumi Ashby | 케냐 | 1965-08-06 | 슬라이드
4 | PQG 0000005 | Aleurocanthus woglumi Ashby | 미국 | 2005-05-10 | 슬라이드
5 | PQG 0000006 | Aleuroduplidens eucalyptifolia Martin | 호주 | 2006-04-26 | 슬라이드
6 | PQG 0000007 | Aleurolobus marlatti (Quaintance) | 대만 | 2003-08-30 | 슬라이드
7 | PQG 0000008 | Aleurolobus marlatti (Quaintance) | 베트남 | 2013-02-06 | 슬라이드
8 | PQG 0000009 | Aleurolobus marlatti (Quaintance) | 베트남 | 2013-02-06 | 슬라이드
9 | PQG 0000010 | Aleurotrachelus dryandrae Solomon | 호주 | 2006-05-22 | 슬라이드

Index | 표본번호 | 학명 | 국가 | 검사_채집일 | 라벨 종류
1995 | PQG 0001996 | Pseudaulacaspis cockerelli (Cooley) | 대한민국 | 2014-03-13 | 슬라이드
1996 | PQG 0001997 | Pseudaulacaspis cockerelli (Cooley) | 대한민국 | 2014-04-02 | 슬라이드
1997 | PQG 0001998 | Pseudaulacaspis cockerelli (Cooley) | 대한민국 | 2014-04-01 | 슬라이드
1998 | PQG 0001999 | Pseudaulacaspis cockerelli (Cooley) | 대한민국 | 2014-04-01 | 슬라이드
1999 | PQG 0002000 | Pseudaulacaspis cockerelli (Cooley) | 대한민국 | 2014-07-27 | 슬라이드
2000 | PQG 0002001 | Tinocallis kahawaluokalani (Kirkaldy) | 대한민국 | 2014-05-20 | 슬라이드
2001 | PQG 0002002 | Tinocallis kahawaluokalani (Kirkaldy) | 대한민국 | 2014-05-20 | 슬라이드
2002 | PQG 0002003 | Tinocallis kahawaluokalani (Kirkaldy) | 대한민국 | 2014-05-20 | 슬라이드
2003 | PQG 0002004 | Aspidiotus chinensis Kuwana | 일본 | 2015-02-10 | 슬라이드
2004 | PQG 0002005 | Octaspidiotus stauntoniae (Takahashi) | 대한민국 | 2014-12-19 | 슬라이드