Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 2734
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
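
The percentage and per-record figures in this table follow from the raw counts. The short Python sketch below reproduces that arithmetic; the variable names are illustrative and not part of the report, and it assumes that a cell is one value of one variable and that 1 KiB is 1024 bytes.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 3
n_observations = 10_000
missing_cells = 2_734
total_size_kib = 312.5

# Assumption: every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average record size in bytes (1 KiB = 1024 bytes).
record_size_bytes = total_size_kib * 1024 / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing, {record_size_bytes:.1f} B per record")
```

Rounded to one decimal place, the results agree with the reported 9.1% and 32.0 B.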

Variable types

Text: 3

Dataset

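Description: Content on DNA barcodes within biological genetic information, covering their definition, research trends abroad and in Korea, and the need for DNA barcodes.
Author: 환경부 국립생물자원관 (Ministry of Environment, National Institute of Biological Resources)
URL: https://www.data.go.kr/data/15067608/fileData.do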

Alerts

국명 has 2734 (27.3%) missing values [Missing]
유전정보아이디 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 09:19:47.544366
Analysis finished: 2023-12-12 09:19:48.382278
Duration: 0.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

유전정보아이디
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10
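
As a rough illustration of how such length statistics can be derived, the sketch below computes them for the five sample values listed in this variable's Sample section. It uses only the Python standard library; the names `sample_ids` and `summary` are illustrative, and the report itself computes these figures over all 10000 values.

```python
from statistics import mean, median

# The five sample rows shown for this variable.
sample_ids = ["WBN0403419", "WBN0362518", "WBN0369629", "WBN0339461", "WBN0364430"]

# Length of each value in characters.
lengths = [len(value) for value in sample_ids]

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)
```

Every identifier has the same fixed width, which is why all four statistics coincide.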

Characters and Unicode

Total characters: 100000
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
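
A minimal sketch of reading one such property, the general category, with Python's standard `unicodedata` module. It is applied to the first sample value of this variable; the report's own implementation may differ, and script and block lookups are not covered here because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

value = "WBN0403419"

# unicodedata.category returns a two-letter general category code:
# "Lu" is Uppercase Letter and "Nd" is Decimal Number.
category_counts = Counter(unicodedata.category(ch) for ch in value)

print(dict(category_counts))
```

Three uppercase letters and seven digits per value, over 10000 values, is consistent with the 30000 Uppercase Letter and 70000 Decimal Number characters reported for this variable.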

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: WBN0403419
2nd row: WBN0362518
3rd row: WBN0369629
4th row: WBN0339461
5th row: WBN0364430

Value | Count | Frequency (%)
wbn0403419 | 1 | < 0.1%
wbn0378672 | 1 | < 0.1%
wbn0355176 | 1 | < 0.1%
wbn0351627 | 1 | < 0.1%
wbn0377113 | 1 | < 0.1%
wbn0388020 | 1 | < 0.1%
wbn0386675 | 1 | < 0.1%
wbn0338612 | 1 | < 0.1%
wbn0401216 | 1 | < 0.1%
wbn0369089 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 14637 | 14.6%
3 | 13740 | 13.7%
W | 10000 | 10.0%
B | 10000 | 10.0%
N | 10000 | 10.0%
4 | 5864 | 5.9%
6 | 5588 | 5.6%
9 | 5557 | 5.6%
7 | 5556 | 5.6%
8 | 5447 | 5.4%
Other values (3) | 13611 | 13.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 70000 | 70.0%
Uppercase Letter | 30000 | 30.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 14637 | 20.9%
3 | 13740 | 19.6%
4 | 5864 | 8.4%
6 | 5588 | 8.0%
9 | 5557 | 7.9%
7 | 5556 | 7.9%
8 | 5447 | 7.8%
5 | 5403 | 7.7%
2 | 4157 | 5.9%
1 | 4051 | 5.8%

Uppercase Letter
Value | Count | Frequency (%)
W | 10000 | 33.3%
B | 10000 | 33.3%
N | 10000 | 33.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 70000 | 70.0%
Latin | 30000 | 30.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 14637 | 20.9%
3 | 13740 | 19.6%
4 | 5864 | 8.4%
6 | 5588 | 8.0%
9 | 5557 | 7.9%
7 | 5556 | 7.9%
8 | 5447 | 7.8%
5 | 5403 | 7.7%
2 | 4157 | 5.9%
1 | 4051 | 5.8%

Latin
Value | Count | Frequency (%)
W | 10000 | 33.3%
B | 10000 | 33.3%
N | 10000 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 100000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 14637 | 14.6%
3 | 13740 | 13.7%
W | 10000 | 10.0%
B | 10000 | 10.0%
N | 10000 | 10.0%
4 | 5864 | 5.9%
6 | 5588 | 5.6%
9 | 5557 | 5.6%
7 | 5556 | 5.6%
8 | 5447 | 5.4%
Other values (3) | 13611 | 13.6%

학명
Text

Distinct: 4776
Distinct (%): 47.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 125
Median length: 75
Mean length: 32.2949
Min length: 5

Characters and Unicode

Total characters: 322949
Distinct characters: 76
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3072
Unique (%): 30.7%
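
For this variable the Distinct and Unique counts differ (4776 versus 3072). The sketch below shows one reading that is consistent with those figures: Distinct counts the different values present, and Unique counts the values that occur exactly once. That reading is an assumption, not something the report states, and the list `values` is toy data built from names that appear in the samples, not the real column.

```python
from collections import Counter

# Toy data for illustration only: one name is repeated, two occur once.
values = [
    "Impatiens L.",
    "Impatiens L.",
    "Forsythia ovata Nakai",
    "Agelena limbata Thorell, 1897",
]

counts = Counter(values)

# Distinct: how many different values appear at all.
n_distinct = len(counts)

# Unique (assumed definition): values that appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```

Under this reading, the reported percentages are the counts divided by the 10000 non-missing values: 4776 gives 47.8% and 3072 gives 30.7%.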

Sample

1st row: Micropsalliota pleurocystidiata Heinem. & Little Flower 1983
2nd row: Agelena limbata Thorell, 1897
3rd row: Impatiens L.
4th row: Chrysosplenium japonicum (Maxim.) Makino
5th row: Chlorostoma lischkei Tapparone Canefri, 1874

Value | Count | Frequency (%)
(not rendered) | 1126 | 2.6%
l | 1068 | 2.5%
et | 540 | 1.3%
al | 537 | 1.3%
ex | 450 | 1.1%
nakai | 362 | 0.8%
japonica | 314 | 0.7%
a | 301 | 0.7%
var | 271 | 0.6%
h | 270 | 0.6%
Other values (8284) | 37618 | 87.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 32857 | 10.2%
a | 28913 | 9.0%
i | 22305 | 6.9%
e | 18941 | 5.9%
s | 15777 | 4.9%
o | 15594 | 4.8%
r | 15195 | 4.7%
n | 14672 | 4.5%
l | 13148 | 4.1%
u | 12720 | 3.9%
Other values (66) | 132827 | 41.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 223530 | 69.2%
Space Separator | 32857 | 10.2%
Uppercase Letter | 26693 | 8.3%
Decimal Number | 18996 | 5.9%
Other Punctuation | 13649 | 4.2%
Open Punctuation | 3483 | 1.1%
Close Punctuation | 3483 | 1.1%
Dash Punctuation | 221 | 0.1%
Math Symbol | 21 | < 0.1%
Final Punctuation | 16 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
L | 2551 | 9.6%
C | 2516 | 9.4%
S | 2445 | 9.2%
M | 2110 | 7.9%
A | 1766 | 6.6%
P | 1724 | 6.5%
H | 1503 | 5.6%
B | 1340 | 5.0%
T | 1305 | 4.9%
K | 1191 | 4.5%
Other values (17) | 8242 | 30.9%

Lowercase Letter
Value | Count | Frequency (%)
a | 28913 | 12.9%
i | 22305 | 10.0%
e | 18941 | 8.5%
s | 15777 | 7.1%
o | 15594 | 7.0%
r | 15195 | 6.8%
n | 14672 | 6.6%
l | 13148 | 5.9%
u | 12720 | 5.7%
t | 10211 | 4.6%
Other values (16) | 56054 | 25.1%

Decimal Number
Value | Count | Frequency (%)
1 | 4912 | 25.9%
8 | 3178 | 16.7%
9 | 2392 | 12.6%
0 | 1850 | 9.7%
7 | 1558 | 8.2%
2 | 1543 | 8.1%
5 | 991 | 5.2%
6 | 903 | 4.8%
3 | 860 | 4.5%
4 | 809 | 4.3%

Other Punctuation
Value | Count | Frequency (%)
. | 8249 | 60.4%
, | 3798 | 27.8%
& | 1124 | 8.2%
? | 468 | 3.4%
' | 10 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 3479 | 99.9%
[ | 4 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 3479 | 99.9%
] | 4 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 32857 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 221 | 100.0%

Math Symbol
Value | Count | Frequency (%)
× | 21 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(not rendered) | 16 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 250223 | 77.5%
Common | 72726 | 22.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 28913 | 11.6%
i | 22305 | 8.9%
e | 18941 | 7.6%
s | 15777 | 6.3%
o | 15594 | 6.2%
r | 15195 | 6.1%
n | 14672 | 5.9%
l | 13148 | 5.3%
u | 12720 | 5.1%
t | 10211 | 4.1%
Other values (43) | 82747 | 33.1%

Common
Value | Count | Frequency (%)
(space) | 32857 | 45.2%
. | 8249 | 11.3%
1 | 4912 | 6.8%
, | 3798 | 5.2%
( | 3479 | 4.8%
) | 3479 | 4.8%
8 | 3178 | 4.4%
9 | 2392 | 3.3%
0 | 1850 | 2.5%
7 | 1558 | 2.1%
Other values (13) | 6974 | 9.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 322910 | > 99.9%
None | 23 | < 0.1%
Punctuation | 16 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 32857 | 10.2%
a | 28913 | 9.0%
i | 22305 | 6.9%
e | 18941 | 5.9%
s | 15777 | 4.9%
o | 15594 | 4.8%
r | 15195 | 4.7%
n | 14672 | 4.5%
l | 13148 | 4.1%
u | 12720 | 3.9%
Other values (63) | 132788 | 41.1%

None
Value | Count | Frequency (%)
× | 21 | 91.3%
Ø | 2 | 8.7%

Punctuation
Value | Count | Frequency (%)
(not rendered) | 16 | 100.0%

국명
Text

MISSING 

Distinct: 3092
Distinct (%): 42.6%
Missing: 2734
Missing (%): 27.3%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 11
Mean length: 4.6526287
Min length: 1

Characters and Unicode

Total characters: 33806
Distinct characters: 696
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1861
Unique (%): 25.6%

Sample

1st row: 들풀거미
2nd row: 물봉선속
3rd row: 산괭이눈
4th row: 밤고둥
5th row: 세포큰조롱

Value | Count | Frequency (%)
밤고둥 | 159 | 2.2%
구멍밤고둥 | 105 | 1.4%
낫균속 | 84 | 1.2%
극동갯강구 | 80 | 1.1%
고랑딱개비 | 79 | 1.1%
홍합 | 79 | 1.1%
쇠살모사 | 76 | 1.0%
가는몸참집게 | 62 | 0.9%
갯장대 | 42 | 0.6%
덧나무 | 42 | 0.6%
Other values (3082) | 6458 | 88.9%

Most occurring characters

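Value | Count | Frequency (%)
(not rendered) | 1118 | 3.3%
(not rendered) | 943 | 2.8%
(not rendered) | 788 | 2.3%
(not rendered) | 769 | 2.3%
(not rendered) | 734 | 2.2%
(not rendered) | 678 | 2.0%
(not rendered) | 632 | 1.9%
(not rendered) | 470 | 1.4%
(not rendered) | 464 | 1.4%
(not rendered) | 460 | 1.4%
Other values (686) | 26750 | 79.1%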

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 33802 | > 99.9%
Uppercase Letter | 2 | < 0.1%
Other Punctuation | 1 | < 0.1%
Lowercase Letter | 1 | < 0.1%

Most frequent character per category

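Other Letter
Value | Count | Frequency (%)
(not rendered) | 1118 | 3.3%
(not rendered) | 943 | 2.8%
(not rendered) | 788 | 2.3%
(not rendered) | 769 | 2.3%
(not rendered) | 734 | 2.2%
(not rendered) | 678 | 2.0%
(not rendered) | 632 | 1.9%
(not rendered) | 470 | 1.4%
(not rendered) | 464 | 1.4%
(not rendered) | 460 | 1.4%
Other values (682) | 26746 | 79.1%

Uppercase Letter
Value | Count | Frequency (%)
U | 1 | 50.0%
K | 1 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 1 | 100.0%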

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 33802 | > 99.9%
Latin | 3 | < 0.1%
Common | 1 | < 0.1%

Most frequent character per script

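Hangul
Value | Count | Frequency (%)
(not rendered) | 1118 | 3.3%
(not rendered) | 943 | 2.8%
(not rendered) | 788 | 2.3%
(not rendered) | 769 | 2.3%
(not rendered) | 734 | 2.2%
(not rendered) | 678 | 2.0%
(not rendered) | 632 | 1.9%
(not rendered) | 470 | 1.4%
(not rendered) | 464 | 1.4%
(not rendered) | 460 | 1.4%
Other values (682) | 26746 | 79.1%

Latin
Value | Count | Frequency (%)
a | 1 | 33.3%
U | 1 | 33.3%
K | 1 | 33.3%

Common
Value | Count | Frequency (%)
/ | 1 | 100.0%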

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 33802 | > 99.9%
ASCII | 4 | < 0.1%

Most frequent character per block

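Hangul
Value | Count | Frequency (%)
(not rendered) | 1118 | 3.3%
(not rendered) | 943 | 2.8%
(not rendered) | 788 | 2.3%
(not rendered) | 769 | 2.3%
(not rendered) | 734 | 2.2%
(not rendered) | 678 | 2.0%
(not rendered) | 632 | 1.9%
(not rendered) | 470 | 1.4%
(not rendered) | 464 | 1.4%
(not rendered) | 460 | 1.4%
Other values (682) | 26746 | 79.1%

ASCII
Value | Count | Frequency (%)
/ | 1 | 25.0%
a | 1 | 25.0%
U | 1 | 25.0%
K | 1 | 25.0%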

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
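
The per-column nullity that these plots summarise can be sketched in plain Python. The example below uses four rows copied from the Sample section of this report, with `<NA>` represented as `None`. The names `rows`, `columns`, and `missing_per_column` are illustrative, and four rows are far too few to reproduce the 27.3% missing rate reported for 국명.

```python
# Column names as they appear in the report.
columns = ["유전정보아이디", "학명", "국명"]

# Four rows from the Sample section; <NA> is represented as None.
rows = [
    ("WBN0403419", "Micropsalliota pleurocystidiata Heinem. & Little Flower 1983", None),
    ("WBN0362518", "Agelena limbata Thorell, 1897", "들풀거미"),
    ("WBN0369629", "Impatiens L.", "물봉선속"),
    ("WBN0378668", "Eriocaulon truncatum Buch.-Ham. ex Mart.", None),
]

# Count the missing cells in each column.
missing_per_column = {
    name: sum(row[i] is None for row in rows)
    for i, name in enumerate(columns)
}

# Nullity matrix: one row of booleans per record, True where a value is present.
nullity_matrix = [[cell is not None for cell in row] for row in rows]

print(missing_per_column)
```

In this small sample only 국명 has gaps, which matches the Alerts section, where 국명 is the only variable flagged as Missing.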

Sample

(index) | 유전정보아이디 | 학명 | 국명
62991 | WBN0403419 | Micropsalliota pleurocystidiata Heinem. & Little Flower 1983 | <NA>
17404 | WBN0362518 | Agelena limbata Thorell, 1897 | 들풀거미
26325 | WBN0369629 | Impatiens L. | 물봉선속
833 | WBN0339461 | Chrysosplenium japonicum (Maxim.) Makino | 산괭이눈
24921 | WBN0364430 | Chlorostoma lischkei Tapparone Canefri, 1874 | 밤고둥
1889 | WBN0338910 | Cynanchum volubile (Maxim.) Hemsl. | 세포큰조롱
22828 | WBN0348845 | Maianthemum japonicum (A. Gray) La Frankie | 풀솜대
16881 | WBN0360619 | Leibnitzia anandria (L.) Turcz. | 솜나물
36011 | WBN0378668 | Eriocaulon truncatum Buch.-Ham. ex Mart. | <NA>
18724 | WBN0347991 | Galium kinuta Nakai & H. Hara | 민둥갈퀴

(index) | 유전정보아이디 | 학명 | 국명
63243 | WBN0402663 | Rikiosatoa grisea (Butler, 1878) | 두줄가지나방
10576 | WBN0350608 | Paraburkholderia caledonica Coenye et al. 2001 | <NA>
47431 | WBN0392150 | Arabis gemmifera (Matsum.) Makino | 산장대
41121 | WBN0381411 | Peromyia Kieffer, 1894 | 어리애혹파리속
33542 | WBN0375844 | Spermacoce remota Lam. | <NA>
28465 | WBN0366217 | Gloydius ussuriensis (Emelianov, 1929) | 쇠살모사
7834 | WBN0345319 | Sphingobium algicola Lee Y and Jeon CO. 2017 | <NA>
12356 | WBN0355370 | Taraxacum formosanum Kitam. | 영도민들레
48436 | WBN0390003 | Orthocladius ulaanbaatus Sasa and Suzuki, 1997 | 울란바트깃깔따구
24055 | WBN0343215 | Forsythia ovata Nakai | 만리화