Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 2686
Missing cells (%): 6.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
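
Missing cells (%) and average record size are derived figures. The Python sketch below reproduces them. It assumes the percentage is taken over all 10000 × 4 cells and that 1 KiB = 1024 B. The helper names are hypothetical and are not ydata-profiling functions.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Percentage of missing cells over the full rows x columns grid."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size per row in bytes, taking 1 KiB = 1024 B."""
    return total_kib * 1024 / n_rows


# Figures from the dataset statistics: 10000 rows, 4 columns.
print(f"Missing cells (%): {missing_cell_pct(2686, 10_000, 4):.1f}%")
print(f"Average record size: {avg_record_bytes(390.6, 10_000):.1f} B")
```

Run as-is, it prints 6.7% and 40.0 B, matching the reported values. With `n_cols=1`, the same formula gives the 26.9% reported for 국명 alone.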

Variable types

Text: 4

Dataset

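Description: Provides the current status of genetic information for native wild organisms, based on DNA barcode sequences produced by the National Institute of Biological Resources (genetic information management number, scientific name, Korean name, etc.)
Author: National Institute of Biological Resources, Ministry of Environment (환경부 국립생물자원관)
URL: https://www.data.go.kr/data/3070009/fileData.do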

Alerts

국명 (Korean name) has 2686 (26.9%) missing values [Missing]
유전정보아이디 (genetic information ID) has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 16:32:35.548703
Analysis finished: 2023-12-12 16:32:36.263375
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

유전정보아이디 (genetic information ID)
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 100000
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
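
The category breakdowns in this report can be approximated with Python's standard `unicodedata` module. In the sketch below, the long names in `CATEGORY_NAMES` mirror the labels used in this report. The mapping and the helper `category_counts` are illustrative assumptions, not ydata-profiling internals. `unicodedata` does not expose script or block properties, so the script and block tables would need a different data source.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes that occur in this report.
# This is a subset of the full General_Category list, chosen for illustration.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Pf": "Final Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # two-letter code, e.g. "Nd"
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


print(category_counts(["WBN0368067", "WBN0342171"]))
# Counter({'Decimal Number': 14, 'Uppercase Letter': 6})
```

Each ID contributes three uppercase letters and seven digits. This matches the 30000 / 70000 split reported for the full column.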

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: WBN0368067
2nd row: WBN0342171
3rd row: WBN0377694
4th row: WBN0340032
5th row: WBN0383118
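
The character profile of this column is consistent with a fixed `WBN` plus seven-digit format, as in the sample rows. Every value is 10 characters long, W, B and N each occur 10000 times across 10000 values, and the other 70000 characters are digits. The sketch below checks IDs against that pattern and tests for duplicates. The pattern is an assumption inferred from this profile, not a documented specification, and the helper names are hypothetical.

```python
import re

# Assumed format, inferred from the profile: "WBN" followed by 7 ASCII digits.
# [0-9] is used instead of \d, which also matches non-ASCII digits in Python.
ID_PATTERN = re.compile(r"WBN[0-9]{7}")


def invalid_ids(ids):
    """Return the IDs that do not match the assumed pattern, in input order."""
    return [value for value in ids if not ID_PATTERN.fullmatch(value)]


def has_duplicates(ids):
    """True if any ID occurs more than once."""
    return len(set(ids)) != len(ids)


sample = ["WBN0368067", "WBN0342171", "WBN0377694", "WBN0340032", "WBN0383118"]
print(invalid_ids(sample), has_duplicates(sample))  # [] False
```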
Value | Count | Frequency (%)
wbn0368067 | 1 | < 0.1%
wbn0382855 | 1 | < 0.1%
wbn0361566 | 1 | < 0.1%
wbn0397397 | 1 | < 0.1%
wbn0346392 | 1 | < 0.1%
wbn0379514 | 1 | < 0.1%
wbn0345017 | 1 | < 0.1%
wbn0368320 | 1 | < 0.1%
wbn0400002 | 1 | < 0.1%
wbn0358100 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 14709 | 14.7%
3 | 13904 | 13.9%
W | 10000 | 10.0%
B | 10000 | 10.0%
N | 10000 | 10.0%
4 | 5799 | 5.8%
9 | 5695 | 5.7%
8 | 5527 | 5.5%
6 | 5493 | 5.5%
7 | 5486 | 5.5%
Other values (3) | 13387 | 13.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 70000 | 70.0%
Uppercase Letter | 30000 | 30.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 14709 | 21.0%
3 | 13904 | 19.9%
4 | 5799 | 8.3%
9 | 5695 | 8.1%
8 | 5527 | 7.9%
6 | 5493 | 7.8%
7 | 5486 | 7.8%
5 | 5381 | 7.7%
1 | 4007 | 5.7%
2 | 3999 | 5.7%
Uppercase Letter
Value | Count | Frequency (%)
W | 10000 | 33.3%
B | 10000 | 33.3%
N | 10000 | 33.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 70000 | 70.0%
Latin | 30000 | 30.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 14709 | 21.0%
3 | 13904 | 19.9%
4 | 5799 | 8.3%
9 | 5695 | 8.1%
8 | 5527 | 7.9%
6 | 5493 | 7.8%
7 | 5486 | 7.8%
5 | 5381 | 7.7%
1 | 4007 | 5.7%
2 | 3999 | 5.7%
Latin
Value | Count | Frequency (%)
W | 10000 | 33.3%
B | 10000 | 33.3%
N | 10000 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 100000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 14709 | 14.7%
3 | 13904 | 13.9%
W | 10000 | 10.0%
B | 10000 | 10.0%
N | 10000 | 10.0%
4 | 5799 | 5.8%
9 | 5695 | 5.7%
8 | 5527 | 5.5%
6 | 5493 | 5.5%
7 | 5486 | 5.5%
Other values (3) | 13387 | 13.4%

학명 (scientific name)
Text

Distinct: 4794
Distinct (%): 47.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 117
Median length: 75
Mean length: 32.1174
Min length: 5

Characters and Unicode

Total characters: 321174
Distinct characters: 77
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 3104
Unique (%): 31.0%

Sample

1st row: Asplenium incisum Thunb.
2nd row: Anthus gustavi Swinhoe, 1863
3rd row: Glomerella Spauld. & H. Schrenk 1903
4th row: Petrolisthes coccineus (Owen, 1839)
5th row: Pagurus maculosus Komai & Imafuku, 1996
Value | Count | Frequency (%)
l | 1053 | 2.5%
(not rendered) | 1038 | 2.4%
et | 506 | 1.2%
al | 504 | 1.2%
ex | 444 | 1.0%
nakai | 375 | 0.9%
japonica | 311 | 0.7%
a | 297 | 0.7%
var | 294 | 0.7%
h | 286 | 0.7%
Other values (8363) | 37354 | 88.0%
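
The word table lowercases values. For example, the authority abbreviation `L.` is counted as `l`. The sketch below produces a comparable tally. It assumes tokens are split on whitespace and stripped of surrounding ASCII punctuation; ydata-profiling's exact tokenization may differ. The helper `word_counts` is hypothetical.

```python
import string
from collections import Counter


def word_counts(values):
    """Tally lowercase words, stripping ASCII punctuation from both ends of each token."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were pure punctuation, e.g. "&"
                counts[word] += 1
    return counts


names = [
    "Coriandrum sativum L.",
    "Solanum lycopersicum L.",
    "Pagurus maculosus Komai & Imafuku, 1996",
]
print(word_counts(names).most_common(1))  # [('l', 2)]
```

For these three names, `l` is the only word that occurs twice. The standalone `&` is dropped because it consists only of punctuation.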

Most occurring characters

Value | Count | Frequency (%)
(space) | 32462 | 10.1%
a | 28650 | 8.9%
i | 22195 | 6.9%
e | 18804 | 5.9%
s | 15899 | 5.0%
o | 15720 | 4.9%
r | 15295 | 4.8%
n | 14436 | 4.5%
l | 13077 | 4.1%
u | 12660 | 3.9%
Other values (67) | 131976 | 41.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 222556 | 69.3%
Space Separator | 32462 | 10.1%
Uppercase Letter | 26618 | 8.3%
Decimal Number | 18956 | 5.9%
Other Punctuation | 13550 | 4.2%
Open Punctuation | 3406 | 1.1%
Close Punctuation | 3406 | 1.1%
Dash Punctuation | 183 | 0.1%
Math Symbol | 20 | < 0.1%
Final Punctuation | 17 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 28650 | 12.9%
i | 22195 | 10.0%
e | 18804 | 8.4%
s | 15899 | 7.1%
o | 15720 | 7.1%
r | 15295 | 6.9%
n | 14436 | 6.5%
l | 13077 | 5.9%
u | 12660 | 5.7%
t | 10225 | 4.6%
Other values (17) | 55595 | 25.0%
Uppercase Letter
Value | Count | Frequency (%)
C | 2655 | 10.0%
L | 2518 | 9.5%
S | 2253 | 8.5%
M | 2003 | 7.5%
P | 1751 | 6.6%
A | 1741 | 6.5%
H | 1578 | 5.9%
T | 1360 | 5.1%
B | 1336 | 5.0%
K | 1186 | 4.5%
Other values (17) | 8237 | 30.9%
Decimal Number
Value | Count | Frequency (%)
1 | 4936 | 26.0%
8 | 3280 | 17.3%
9 | 2275 | 12.0%
0 | 1779 | 9.4%
7 | 1581 | 8.3%
2 | 1488 | 7.8%
5 | 1011 | 5.3%
6 | 981 | 5.2%
3 | 848 | 4.5%
4 | 777 | 4.1%
Other Punctuation
Value | Count | Frequency (%)
. | 8328 | 61.5%
, | 3759 | 27.7%
& | 1035 | 7.6%
? | 421 | 3.1%
' | 7 | 0.1%
Open Punctuation
Value | Count | Frequency (%)
( | 3403 | 99.9%
[ | 3 | 0.1%
Close Punctuation
Value | Count | Frequency (%)
) | 3403 | 99.9%
] | 3 | 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 32462 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 183 | 100.0%
Math Symbol
Value | Count | Frequency (%)
× | 20 | 100.0%
Final Punctuation
Value | Count | Frequency (%)
(not rendered) | 17 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 249174 | 77.6%
Common | 72000 | 22.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 28650 | 11.5%
i | 22195 | 8.9%
e | 18804 | 7.5%
s | 15899 | 6.4%
o | 15720 | 6.3%
r | 15295 | 6.1%
n | 14436 | 5.8%
l | 13077 | 5.2%
u | 12660 | 5.1%
t | 10225 | 4.1%
Other values (44) | 82213 | 33.0%
Common
Value | Count | Frequency (%)
(space) | 32462 | 45.1%
. | 8328 | 11.6%
1 | 4936 | 6.9%
, | 3759 | 5.2%
( | 3403 | 4.7%
) | 3403 | 4.7%
8 | 3280 | 4.6%
9 | 2275 | 3.2%
0 | 1779 | 2.5%
7 | 1581 | 2.2%
Other values (13) | 6794 | 9.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 321134 | > 99.9%
None | 23 | < 0.1%
Punctuation | 17 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 32462 | 10.1%
a | 28650 | 8.9%
i | 22195 | 6.9%
e | 18804 | 5.9%
s | 15899 | 5.0%
o | 15720 | 4.9%
r | 15295 | 4.8%
n | 14436 | 4.5%
l | 13077 | 4.1%
u | 12660 | 3.9%
Other values (63) | 131936 | 41.1%
None
Value | Count | Frequency (%)
× | 20 | 87.0%
Ø | 2 | 8.7%
ø | 1 | 4.3%
Punctuation
Value | Count | Frequency (%)
(not rendered) | 17 | 100.0%

국명 (Korean name)
Text

MISSING 

Distinct: 3057
Distinct (%): 41.8%
Missing: 2686
Missing (%): 26.9%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 11
Mean length: 4.6462948
Min length: 1

Characters and Unicode

Total characters: 33983
Distinct characters: 688
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 1812
Unique (%): 24.8%
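
국명 has 2686 missing values. Its Distinct (%) and Unique (%) figures are consistent with percentages taken over the 7314 non-missing values rather than all 10000 rows. Likewise, the mean length equals total characters divided by 7314. The quick check below assumes that denominator; `pct` is a hypothetical helper.

```python
def pct(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


n_rows, n_missing = 10_000, 2686
n_present = n_rows - n_missing  # 7314 non-missing 국명 values

print(f"Distinct (%): {pct(3057, n_present):.1f}%")  # 41.8%
print(f"Unique (%): {pct(1812, n_present):.1f}%")    # 24.8%
print(f"Mean length: {33983 / n_present:.7f}")       # 4.6462948
```

Dividing by 10000 instead would give 30.6% and 18.1%, which do not match the report.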

Sample

1st row: 꼬리고사리
2nd row: 흰등밭종다리
3rd row: 작은뿔껍질균속
4th row: 검붉은게붙이
5th row: 가는몸참집게
Value | Count | Frequency (%)
밤고둥 | 180 | 2.5%
낫균속 | 103 | 1.4%
구멍밤고둥 | 102 | 1.4%
홍합 | 94 | 1.3%
가는몸참집게 | 76 | 1.0%
고랑딱개비 | 76 | 1.0%
극동갯강구 | 74 | 1.0%
쇠살모사 | 63 | 0.9%
덧나무 | 52 | 0.7%
무당거미 | 50 | 0.7%
Other values (3047) | 6444 | 88.1%

Most occurring characters

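Value | Count | Frequency (%)
(not rendered) | 1111 | 3.3%
(not rendered) | 1044 | 3.1%
(not rendered) | 844 | 2.5%
(not rendered) | 811 | 2.4%
(not rendered) | 791 | 2.3%
(not rendered) | 723 | 2.1%
(not rendered) | 629 | 1.9%
(not rendered) | 478 | 1.4%
(not rendered) | 454 | 1.3%
(not rendered) | 454 | 1.3%
Other values (678) | 26644 | 78.4%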

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 33979 | > 99.9%
Other Punctuation | 4 | < 0.1%

Most frequent character per category

Other Letter
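Value | Count | Frequency (%)
(not rendered) | 1111 | 3.3%
(not rendered) | 1044 | 3.1%
(not rendered) | 844 | 2.5%
(not rendered) | 811 | 2.4%
(not rendered) | 791 | 2.3%
(not rendered) | 723 | 2.1%
(not rendered) | 629 | 1.9%
(not rendered) | 478 | 1.4%
(not rendered) | 454 | 1.3%
(not rendered) | 454 | 1.3%
Other values (677) | 26640 | 78.4%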
Other Punctuation
Value | Count | Frequency (%)
/ | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 33979 | > 99.9%
Common | 4 | < 0.1%

Most frequent character per script

Hangul
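Value | Count | Frequency (%)
(not rendered) | 1111 | 3.3%
(not rendered) | 1044 | 3.1%
(not rendered) | 844 | 2.5%
(not rendered) | 811 | 2.4%
(not rendered) | 791 | 2.3%
(not rendered) | 723 | 2.1%
(not rendered) | 629 | 1.9%
(not rendered) | 478 | 1.4%
(not rendered) | 454 | 1.3%
(not rendered) | 454 | 1.3%
Other values (677) | 26640 | 78.4%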
Common
Value | Count | Frequency (%)
/ | 4 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 33979 | > 99.9%
ASCII | 4 | < 0.1%

Most frequent character per block

Hangul
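Value | Count | Frequency (%)
(not rendered) | 1111 | 3.3%
(not rendered) | 1044 | 3.1%
(not rendered) | 844 | 2.5%
(not rendered) | 811 | 2.4%
(not rendered) | 791 | 2.3%
(not rendered) | 723 | 2.1%
(not rendered) | 629 | 1.9%
(not rendered) | 478 | 1.4%
(not rendered) | 454 | 1.3%
(not rendered) | 454 | 1.3%
Other values (677) | 26640 | 78.4%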
ASCII
Value | Count | Frequency (%)
/ | 4 | 100.0%

마커명 (marker name)
Text

Distinct: 55
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 3
Mean length: 3.9099
Min length: 3
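
The length statistics can be reproduced with Python's standard `statistics` module. The sketch below uses the five sample 마커명 values listed in this section; the helper `length_summary` is hypothetical. `len` counts code points. For precomposed Hangul text, such as the 국명 column, that equals the number of syllables.

```python
import statistics


def length_summary(values):
    """Max, median, mean and min string length, counted in code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


print(length_summary(["rbcL", "Cytb", "CHS-1", "COI", "COI"]))
# {'max': 5, 'median': 4, 'mean': 3.8, 'min': 3}
```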

Characters and Unicode

Total characters: 39099
Distinct characters: 56
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: rbcL
2nd row: Cytb
3rd row: CHS-1
4th row: COI
5th row: COI
Value | Count | Frequency (%)
rbcl | 1926 | 18.9%
coi | 1835 | 18.0%
its | 1636 | 16.0%
16s | 1396 | 13.7%
matk | 1130 | 11.1%
trnh-psba | 422 | 4.1%
cytb | 260 | 2.5%
rrna | 213 | 2.1%
trnl-f | 191 | 1.9%
lsu | 135 | 1.3%
Other values (46) | 1073 | 10.5%

Most occurring characters

Value | Count | Frequency (%)
S | 3573 | 9.1%
I | 3543 | 9.1%
r | 2839 | 7.3%
b | 2744 | 7.0%
L | 2272 | 5.8%
C | 2237 | 5.7%
t | 2128 | 5.4%
c | 1982 | 5.1%
O | 1840 | 4.7%
1 | 1719 | 4.4%
Other values (46) | 14222 | 36.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 19330 | 49.4%
Lowercase Letter | 14849 | 38.0%
Decimal Number | 3765 | 9.6%
Dash Punctuation | 922 | 2.4%
Space Separator | 217 | 0.6%
Other Punctuation | 16 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 3573 | 18.5%
I | 3543 | 18.3%
L | 2272 | 11.8%
C | 2237 | 11.6%
O | 1840 | 9.5%
T | 1688 | 8.7%
K | 1202 | 6.2%
A | 684 | 3.5%
H | 588 | 3.0%
R | 328 | 1.7%
Other values (14) | 1375 | 7.1%
Lowercase Letter
Value | Count | Frequency (%)
r | 2839 | 19.1%
b | 2744 | 18.5%
t | 2128 | 14.3%
c | 1982 | 13.3%
a | 1202 | 8.1%
m | 1152 | 7.8%
n | 758 | 5.1%
p | 681 | 4.6%
s | 573 | 3.9%
y | 280 | 1.9%
Other values (11) | 510 | 3.4%
Decimal Number
Value | Count | Frequency (%)
1 | 1719 | 45.7%
6 | 1410 | 37.5%
2 | 361 | 9.6%
8 | 213 | 5.7%
3 | 29 | 0.8%
4 | 20 | 0.5%
5 | 8 | 0.2%
9 | 5 | 0.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 922 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 217 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 16 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 34140 | 87.3%
Common | 4920 | 12.6%
Greek | 39 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 3573 | 10.5%
I | 3543 | 10.4%
r | 2839 | 8.3%
b | 2744 | 8.0%
L | 2272 | 6.7%
C | 2237 | 6.6%
t | 2128 | 6.2%
c | 1982 | 5.8%
O | 1840 | 5.4%
T | 1688 | 4.9%
Other values (34) | 9294 | 27.2%
Common
Value | Count | Frequency (%)
1 | 1719 | 34.9%
6 | 1410 | 28.7%
- | 922 | 18.7%
2 | 361 | 7.3%
(space) | 217 | 4.4%
8 | 213 | 4.3%
3 | 29 | 0.6%
4 | 20 | 0.4%
/ | 16 | 0.3%
5 | 8 | 0.2%
Greek
Value | Count | Frequency (%)
α | 39 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 39060 | 99.9%
None | 39 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
S | 3573 | 9.1%
I | 3543 | 9.1%
r | 2839 | 7.3%
b | 2744 | 7.0%
L | 2272 | 5.8%
C | 2237 | 5.7%
t | 2128 | 5.4%
c | 1982 | 5.1%
O | 1840 | 4.7%
1 | 1719 | 4.4%
Other values (45) | 14183 | 36.3%
None
Value | Count | Frequency (%)
α | 39 | 100.0%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
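
The sketch below is a text-only version of both missing-value views, using two rows from this report's sample table. `None` stands in for the `<NA>` cell. `null_counts` gives the per-column count, and `nullity_matrix` marks each cell as present (`#`) or missing (`.`). Both helpers are hypothetical and are not how ydata-profiling draws its charts.

```python
COLUMNS = ["유전정보아이디", "학명", "국명", "마커명"]

# Two rows copied from the sample table; None marks the <NA> cell.
rows = [
    {"유전정보아이디": "WBN0389533", "학명": "Modiolicola bifida Tanaka, 1961",
     "국명": "진주담치속살이", "마커명": "COI"},
    {"유전정보아이디": "WBN0376532", "학명": "Dissotis rotundifolia (Sm.) Triana",
     "국명": None, "마커명": "ITS"},
]


def null_counts(rows, columns):
    """Number of cells per column that are None or absent."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def nullity_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if row.get(col) is None else "#" for col in columns)
            for row in rows]


print(null_counts(rows, COLUMNS))
for line in nullity_matrix(rows, COLUMNS):
    print(line)
```

For these two rows, only 국명 has a missing cell, so the matrix reads `####` and then `##.#`.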

Sample

Index | 유전정보아이디 | 학명 | 국명 | 마커명
30127 | WBN0368067 | Asplenium incisum Thunb. | 꼬리고사리 | rbcL
2111 | WBN0342171 | Anthus gustavi Swinhoe, 1863 | 흰등밭종다리 | Cytb
39921 | WBN0377694 | Glomerella Spauld. & H. Schrenk 1903 | 작은뿔껍질균속 | CHS-1
2272 | WBN0340032 | Petrolisthes coccineus (Owen, 1839) | 검붉은게붙이 | COI
57483 | WBN0383118 | Pagurus maculosus Komai & Imafuku, 1996 | 가는몸참집게 | COI
53234 | WBN0387239 | Polygonatum Mill. | 둥굴레속 | rbcL
467 | WBN0337853 | Petunia × hybrida (Hook.) Vilm. | 페튜니아 | rbcL
7881 | WBN0356460 | Potamogeton fryeri A. Benn. | 선가래 | rbcL
45620 | WBN0401025 | Cardamine leucantha (Tausch) O. E. Schulz | 미나리냉이 | trnH-psbA
47831 | WBN0395041 | Clematis ochotensis (Pall.) Poir. | 자주종덩굴 | rbcL

Index | 유전정보아이디 | 학명 | 국명 | 마커명
63650 | WBN0403163 | Cylindromyia brassicaria (Fabricius, 1775) | 표주박기생파리 | COI
47852 | WBN0395062 | Coriandrum sativum L. | 고수 | rbcL
19620 | WBN0347780 | Solanum lycopersicum L. | 토마토 | trnH-psbA
10779 | WBN0348976 | Asparagus cochinchinensis (Lour.) Merr. | 천문동 | ITS
29079 | WBN0365442 | Mytilus unguiculatus Valenciennes, 1858 | 홍합 | 16S
9701 | WBN0354896 | Aster meyendorffii (Regel & Maack) Voss | 개쑥부쟁이 | matK
17755 | WBN0361442 | Sagina L. | 개미자리속 | trnL-F
58179 | WBN0389533 | Modiolicola bifida Tanaka, 1961 | 진주담치속살이 | COI
38161 | WBN0373770 | Gasteracantha kuhli C. L. Koch, 1837 | 가시거미 | 16S
41303 | WBN0376532 | Dissotis rotundifolia (Sm.) Triana | <NA> | ITS