Overview

Dataset statistics

Number of variables: 4
Number of observations: 1014
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 21
Duplicate rows (%): 2.1%
Total size in memory: 31.8 KiB
Average record size in memory: 32.1 B
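
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and rounding to one decimal place (the function names are illustrative and are not part of ydata-profiling):

```python
def duplicate_row_pct(n_duplicates: int, n_rows: int) -> float:
    """Share of rows flagged as duplicates, in percent."""
    return 100.0 * n_duplicates / n_rows


def avg_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size of one row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


print(round(duplicate_row_pct(21, 1014), 1))        # 2.1
print(round(avg_record_size_bytes(31.8, 1014), 1))  # 32.1
```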

Variable types

Text: 3
Categorical: 1

Dataset

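Description: Data on the definition, use, and library information of primer sequences that underpin barcode-gene and phylogenetic analysis for the genetic analysis of species.
Author: National Institute of Biological Resources (국립생물자원관), Ministry of Environment (환경부)
URL: https://www.data.go.kr/data/15067613/fileData.do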

Alerts

Dataset has 21 (2.1%) duplicate rows (alert type: Duplicates)

Reproduction

Analysis started: 2023-12-12 06:23:57.454126
Analysis finished: 2023-12-12 06:23:57.855312
Duration: 0.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
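
The reported duration can be checked against the two timestamps. A small sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 06:23:57.454126", FMT)
finished = datetime.strptime("2023-12-12 06:23:57.855312", FMT)

# Elapsed wall-clock time between the start and end of the analysis.
duration_s = (finished - started).total_seconds()
print(round(duration_s, 1))  # 0.4
```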

Variables

프라이머명 (primer name)
Text

Distinct: 888
Distinct (%): 87.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 23
Median length: 18
Mean length: 7.199211
Min length: 2
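
The mean length is the total number of characters divided by the number of observations. A minimal sketch, using the first five sample values of this variable and the report's own totals (7300 characters over 1014 observations); `mean_length` is an illustrative helper, not a ydata-profiling function:

```python
def mean_length(values):
    """Average string length over a list of values."""
    return sum(len(v) for v in values) / len(values)


sample = ["Dorid_COI_3F", "1F-spionid-LCO", "18S329", "18SL", "18S R8"]
print(mean_length(sample))       # 8.4 (lengths 12, 14, 6, 4, 6)

# Same statistic for the whole column, from the reported totals.
print(round(7300 / 1014, 6))     # 7.199211
```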

Characters and Unicode

Total characters: 7300
Distinct characters: 70
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
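
As a rough illustration of the category breakdown, Python's standard `unicodedata` module exposes the general category of each code point (`Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Nd` for Decimal Number, `Pc` for Connector Punctuation, `Pd` for Dash Punctuation, `Zs` for Space Separator). This is a sketch over five sample values, not the report's own implementation; scripts and blocks are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the values."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


sample = ["Dorid_COI_3F", "1F-spionid-LCO", "18S329", "18SL", "18S R8"]
counts = category_counts(sample)
for category, n in counts.most_common():
    print(category, n)
```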

Unique

Unique: 801
Unique (%): 79.0%
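
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of that distinction, using the first seven primer names from the sample rows (the helper name is illustrative):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


primers = ["Dorid_COI_3F", "1F-spionid-LCO", "18S329", "18SL",
           "18S R8", "18S F2", "18S F2"]
print(distinct_and_unique(primers))  # (6, 5)
```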

Sample

1st row: Dorid_COI_3F
2nd row: 1F-spionid-LCO
3rd row: 18S329
4th row: 18SL
5th row: 18S R8

Value               Count  Frequency (%)
psbaf               6      0.6%
ycf1                5      0.5%
m13_trnh            5      0.5%
trnhr               5      0.5%
its4                5      0.5%
m13_its1a           5      0.5%
m13_its4            5      0.5%
18s                 5      0.5%
m13_psba            5      0.5%
lco1490             4      0.4%
Other values (885)  1007   95.3%
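
The values in this table are lowercased tokens rather than whole cell values. One plausible way to produce such counts is to lowercase each value and split it on whitespace; that tokenization rule is an assumption here, since the report does not state it.

```python
from collections import Counter


def token_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    return Counter(tok for v in values for tok in v.lower().split())


sample = ["18S R8", "18S F2", "18S F2", "ITS4", "M13_ITS1a"]
print(token_counts(sample).most_common(2))  # [('18s', 3), ('f2', 2)]
```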

Most occurring characters

Value               Count  Frequency (%)
1                   589    8.1%
R                   397    5.4%
_                   396    5.4%
F                   383    5.2%
2                   282    3.9%
-                   250    3.4%
r                   249    3.4%
t                   247    3.4%
a                   236    3.2%
L                   231    3.2%
Other values (60)   4040   55.3%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       2564   35.1%
Decimal Number         2007   27.5%
Lowercase Letter       2004   27.5%
Connector Punctuation  396    5.4%
Dash Punctuation       250    3.4%
Space Separator        50     0.7%
Other Punctuation      10     0.1%
Open Punctuation       9      0.1%
Close Punctuation      9      0.1%
Letter Number          1      < 0.1%

Most frequent character per category

Uppercase Letter
Value               Count  Frequency (%)
R                   397    15.5%
F                   383    14.9%
L                   231    9.0%
S                   187    7.3%
C                   164    6.4%
K                   156    6.1%
I                   129    5.0%
T                   126    4.9%
A                   116    4.5%
H                   105    4.1%
Other values (16)   570    22.2%

Lowercase Letter
Value               Count  Frequency (%)
r                   249    12.4%
t                   247    12.3%
a                   236    11.8%
m                   175    8.7%
c                   166    8.3%
b                   165    8.2%
n                   104    5.2%
s                   78     3.9%
p                   77     3.8%
e                   72     3.6%
Other values (16)   435    21.7%

Decimal Number
Value               Count  Frequency (%)
1                   589    29.3%
2                   282    14.1%
3                   196    9.8%
4                   172    8.6%
8                   158    7.9%
0                   137    6.8%
9                   123    6.1%
5                   121    6.0%
6                   121    6.0%
7                   108    5.4%

Other Punctuation
Value               Count  Frequency (%)
.                   9      90.0%
:                   1      10.0%

Connector Punctuation
Value               Count  Frequency (%)
_                   396    100.0%

Dash Punctuation
Value               Count  Frequency (%)
-                   250    100.0%

Space Separator
Value               Count  Frequency (%)
(space)             50     100.0%

Open Punctuation
Value               Count  Frequency (%)
(                   9      100.0%

Close Punctuation
Value               Count  Frequency (%)
)                   9      100.0%

Letter Number
Value               Count  Frequency (%)
(not preserved)     1      100.0%

Most occurring scripts

Value         Count  Frequency (%)
Latin         4567   62.6%
Common        2731   37.4%
Greek         2      < 0.1%

Most frequent character per script

Latin
Value               Count  Frequency (%)
R                   397    8.7%
F                   383    8.4%
r                   249    5.5%
t                   247    5.4%
a                   236    5.2%
L                   231    5.1%
S                   187    4.1%
m                   175    3.8%
c                   166    3.6%
b                   165    3.6%
Other values (42)   2131   46.7%

Common
Value               Count  Frequency (%)
1                   589    21.6%
_                   396    14.5%
2                   282    10.3%
-                   250    9.2%
3                   196    7.2%
4                   172    6.3%
8                   158    5.8%
0                   137    5.0%
9                   123    4.5%
5                   121    4.4%
Other values (7)    307    11.2%

Greek
Value               Count  Frequency (%)
β                   2      100.0%

Most occurring blocks

Value         Count  Frequency (%)
ASCII         7297   > 99.9%
None          2      < 0.1%
Number Forms  1      < 0.1%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
1                   589    8.1%
R                   397    5.4%
_                   396    5.4%
F                   383    5.2%
2                   282    3.9%
-                   250    3.4%
r                   249    3.4%
t                   247    3.4%
a                   236    3.2%
L                   231    3.2%
Other values (58)   4037   55.3%

None
Value               Count  Frequency (%)
β                   2      100.0%

Number Forms
Value               Count  Frequency (%)
(not preserved)     1      100.0%

방향 (direction)
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Value    Count
reverse  514
forward  500

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: forward
2nd row: forward
3rd row: reverse
4th row: forward
5th row: reverse

Common Values

Value    Count  Frequency (%)
reverse  514    50.7%
forward  500    49.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
reverse  514    50.7%
forward  500    49.3%

마커명 (marker name)
Text

Distinct: 60
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 14
Median length: 11
Mean length: 4.4575937
Min length: 3

Characters and Unicode

Total characters: 4520
Distinct characters: 54
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: COI
2nd row: COI
3rd row: 18S rRNA
4th row: 18S rRNA
5th row: 18S rRNA

Value               Count  Frequency (%)
coi                 172    15.6%
matk                148    13.4%
rbcl                140    12.7%
its                 88     8.0%
rrna                85     7.7%
cytb                55     5.0%
18s                 52     4.7%
16s                 38     3.5%
28s                 33     3.0%
trnh-psba           32     2.9%
Other values (51)   258    23.4%

Most occurring characters

Value               Count  Frequency (%)
S                   319    7.1%
r                   314    6.9%
t                   287    6.3%
I                   279    6.2%
b                   259    5.7%
C                   251    5.6%
L                   186    4.1%
O                   180    4.0%
c                   175    3.9%
a                   162    3.6%
Other values (44)   2108   46.6%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       2072   45.8%
Lowercase Letter       1858   41.1%
Decimal Number         365    8.1%
Dash Punctuation       124    2.7%
Space Separator        99     2.2%
Other Punctuation      2      < 0.1%

Most frequent character per category

Uppercase Letter
Value               Count  Frequency (%)
S                   319    15.4%
I                   279    13.5%
C                   251    12.1%
L                   186    9.0%
O                   180    8.7%
K                   157    7.6%
A                   138    6.7%
T                   108    5.2%
R                   100    4.8%
N                   91     4.4%
Other values (12)   263    12.7%

Lowercase Letter
Value               Count  Frequency (%)
r                   314    16.9%
t                   287    15.4%
b                   259    13.9%
c                   175    9.4%
a                   162    8.7%
m                   152    8.2%
p                   99     5.3%
n                   92     5.0%
s                   68     3.7%
y                   64     3.4%
Other values (11)   186    10.0%

Decimal Number
Value               Count  Frequency (%)
1                   140    38.4%
8                   85     23.3%
2                   76     20.8%
6                   47     12.9%
3                   8      2.2%
5                   5      1.4%
4                   2      0.5%
9                   2      0.5%

Dash Punctuation
Value               Count  Frequency (%)
-                   124    100.0%

Space Separator
Value               Count  Frequency (%)
(space)             99     100.0%

Other Punctuation
Value               Count  Frequency (%)
/                   2      100.0%

Most occurring scripts

Value         Count  Frequency (%)
Latin         3916   86.6%
Common        590    13.1%
Greek         14     0.3%

Most frequent character per script

Latin
Value               Count  Frequency (%)
S                   319    8.1%
r                   314    8.0%
t                   287    7.3%
I                   279    7.1%
b                   259    6.6%
C                   251    6.4%
L                   186    4.7%
O                   180    4.6%
c                   175    4.5%
a                   162    4.1%
Other values (32)   1504   38.4%

Common
Value               Count  Frequency (%)
1                   140    23.7%
-                   124    21.0%
(space)             99     16.8%
8                   85     14.4%
2                   76     12.9%
6                   47     8.0%
3                   8      1.4%
5                   5      0.8%
/                   2      0.3%
4                   2      0.3%

Greek
Value               Count  Frequency (%)
α                   14     100.0%

Most occurring blocks

Value         Count  Frequency (%)
ASCII         4506   99.7%
None          14     0.3%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
S                   319    7.1%
r                   314    7.0%
t                   287    6.4%
I                   279    6.2%
b                   259    5.7%
C                   251    5.6%
L                   186    4.1%
O                   180    4.0%
c                   175    3.9%
a                   162    3.6%
Other values (43)   2094   46.5%

None
Value               Count  Frequency (%)
α                   14     100.0%

대상 분류군 (target taxon)
Text

Distinct: 190
Distinct (%): 18.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 64
Median length: 37
Mean length: 20.99211
Min length: 4

Characters and Unicode

Total characters: 21286
Distinct characters: 67
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24
Unique (%): 2.4%

Sample

1st row: Pseudopolydora gigeriosa
2nd row: Polychaeta Grube, 1850
3rd row: Crustacea Brnnich, 1772
4th row: Crustacea Brnnich, 1772
5th row: Crustacea Brnnich, 1772

Value               Count  Frequency (%)
ex                  92     3.2%
(blank)             88     3.1%
plantae             75     2.6%
fungi               73     2.6%
l                   57     2.0%
linnaeus            32     1.1%
wettstein           29     1.0%
rhodophyta          29     1.0%
1922                29     1.0%
insecta             28     1.0%
Other values (438)  2325   81.4%

Most occurring characters

Value               Count  Frequency (%)
a                   2164   10.2%
(space)             1843   8.7%
e                   1632   7.7%
i                   1275   6.0%
r                   997    4.7%
n                   967    4.5%
o                   957    4.5%
t                   931    4.4%
s                   875    4.1%
l                   722    3.4%
Other values (57)   8923   41.9%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       14681  69.0%
Uppercase Letter       2108   9.9%
Space Separator        1843   8.7%
Decimal Number         1600   7.5%
Other Punctuation      830    3.9%
Open Punctuation       100    0.5%
Close Punctuation      100    0.5%
Dash Punctuation       24     0.1%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
a                   2164   14.7%
e                   1632   11.1%
i                   1275   8.7%
r                   997    6.8%
n                   967    6.6%
o                   957    6.5%
t                   931    6.3%
s                   875    6.0%
l                   722    4.9%
c                   637    4.3%
Other values (16)   3524   24.0%

Uppercase Letter
Value               Count  Frequency (%)
C                   235    11.1%
P                   211    10.0%
L                   200    9.5%
A                   172    8.2%
R                   138    6.5%
F                   130    6.2%
S                   109    5.2%
B                   106    5.0%
M                   104    4.9%
H                   99     4.7%
Other values (14)   604    28.7%

Decimal Number
Value               Count  Frequency (%)
1                   461    28.8%
8                   248    15.5%
9                   211    13.2%
2                   157    9.8%
7                   150    9.4%
0                   122    7.6%
5                   76     4.8%
4                   61     3.8%
6                   60     3.8%
3                   54     3.4%

Other Punctuation
Value               Count  Frequency (%)
.                   431    51.9%
,                   311    37.5%
&                   88     10.6%

Space Separator
Value               Count  Frequency (%)
(space)             1843   100.0%

Open Punctuation
Value               Count  Frequency (%)
(                   100    100.0%

Close Punctuation
Value               Count  Frequency (%)
)                   100    100.0%

Dash Punctuation
Value               Count  Frequency (%)
-                   24     100.0%

Most occurring scripts

Value         Count  Frequency (%)
Latin         16789  78.9%
Common        4497   21.1%

Most frequent character per script

Latin
Value               Count  Frequency (%)
a                   2164   12.9%
e                   1632   9.7%
i                   1275   7.6%
r                   997    5.9%
n                   967    5.8%
o                   957    5.7%
t                   931    5.5%
s                   875    5.2%
l                   722    4.3%
c                   637    3.8%
Other values (40)   5632   33.5%

Common
Value               Count  Frequency (%)
(space)             1843   41.0%
1                   461    10.3%
.                   431    9.6%
,                   311    6.9%
8                   248    5.5%
9                   211    4.7%
2                   157    3.5%
7                   150    3.3%
0                   122    2.7%
(                   100    2.2%
Other values (7)    463    10.3%

Most occurring blocks

Value         Count  Frequency (%)
ASCII         21286  100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
a                   2164   10.2%
(space)             1843   8.7%
e                   1632   7.7%
i                   1275   6.0%
r                   997    4.7%
n                   967    4.5%
o                   957    4.5%
t                   931    4.4%
s                   875    4.1%
l                   722    3.4%
Other values (57)   8923   41.9%

Correlations

Variable              방향 (direction)  마커명 (marker name)
방향 (direction)      1.000             0.000
마커명 (marker name)  0.000             1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  프라이머명 (primer name)  방향 (direction)  마커명 (marker name)  대상 분류군 (target taxon)
0      Dorid_COI_3F              forward           COI                   Pseudopolydora gigeriosa
1      1F-spionid-LCO            forward           COI                   Polychaeta Grube, 1850
2      18S329                    reverse           18S rRNA              Crustacea Brnnich, 1772
3      18SL                      forward           18S rRNA              Crustacea Brnnich, 1772
4      18S R8                    reverse           18S rRNA              Crustacea Brnnich, 1772
5      18S F2                    forward           18S rRNA              Crustacea Brnnich, 1772
6      18S F2                    forward           18S rRNA              Crustacea Brnnich, 1772
7      psbA                      reverse           trnH-psbA             Elaeocarpus L.
8      trnL                      forward           trnL-F                Elaeocarpus L.
9      mtDNA_ext(Cytb)R          reverse           Cytb                  Aves Linnaeus, 1758

Last rows

Index  프라이머명 (primer name)  방향 (direction)  마커명 (marker name)  대상 분류군 (target taxon)
1004   rbcL_902R                 reverse           rbcL                  Lamiales Bromhead
1005   rbcL_26F                  forward           rbcL                  Lamiales Bromhead
1006   Am β-tubulin-R            reverse           Tubulin               Amanita Pers. 1797
1007   Am β-tubulin F            forward           Tubulin               Amanita Pers. 1797
1008   Am-7R-DK                  reverse           RPB2                  Amanita Pers. 1797
1009   Am-6F-DK                  forward           RPB2                  Amanita Pers. 1797
1010   LROR-DK                   forward           LSU                   Amanita Pers. 1797
1011   LR5-DK                    reverse           LSU                   Amanita Pers. 1797
1012   ITS4-DK                   reverse           ITS                   Amanita Pers. 1797
1013   ITS1-DK                   forward           ITS                   Amanita Pers. 1797

Duplicate rows

Most frequently occurring

Index  프라이머명 (primer name)  방향 (direction)  마커명 (marker name)  대상 분류군 (target taxon)  # duplicates
4      28sFF                     reverse           28S rRNA              Insecta                     3
0      1055F                     forward           18S rRNA              Protozoa                    2
1      1055R                     reverse           18S rRNA              Protozoa                    2
2      18S F2                    forward           18S rRNA              Crustacea Brnnich, 1772     2
3      28sDD                     forward           28S rRNA              Insecta                     2
5      BTUB4Rd                   reverse           Tubulin               Fungi                       2
6      COI2F                     forward           COI                   Acari                       2
7      ITS4                      reverse           ITS                   Plantae                     2
8      LCOech1aF1                forward           COI                   Echinodermata Klein, 1734   2
9      M13_ITS1a                 forward           ITS                   Rubia argyi (H. Lv. & Vaniot) H. Hara ex Lauener & D.K. Ferguson  2
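
A table like this can be approximated by counting identical rows and keeping those that occur more than once. A minimal standard-library sketch over a few rows from the sample above; the helper name and the tuple layout (primer name, direction, marker name, target taxon) are illustrative assumptions, not the report's implementation:

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    ("18S R8", "reverse", "18S rRNA", "Crustacea Brnnich, 1772"),
    ("18S F2", "forward", "18S rRNA", "Crustacea Brnnich, 1772"),
    ("18S F2", "forward", "18S rRNA", "Crustacea Brnnich, 1772"),
    ("psbA", "reverse", "trnH-psbA", "Elaeocarpus L."),
]
dupes = duplicate_rows(rows)
for row, n in dupes.items():
    print(row[0], n)  # 18S F2 2
```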