Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B

Variable types

Numeric: 2
Categorical: 4
Text: 1

Dataset

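Description: Genome map information for wild organisms, together with the list of analysis methods, obtained by the National Institute of Biological Resources (국립생물자원관) in the course of research projects related to genetic information.
Author: Ministry of Environment, National Institute of Biological Resources (환경부 국립생물자원관)
URL: https://www.data.go.kr/data/15089672/fileData.do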

Alerts

어셈블리조립버전 has constant value "V_1" (Constant)
어셈블리조립방법 is highly overall correlated with 게놈고유아이디 and 2 other fields (High correlation)
게놈조립커버리지 is highly overall correlated with 게놈고유아이디 and 2 other fields (High correlation)
게놈고유아이디 is highly overall correlated with 유전자정보고유아이디 and 3 other fields (High correlation)
유전자정보고유아이디 is highly overall correlated with 게놈고유아이디 and 3 other fields (High correlation)
게놈조립정도 is highly overall correlated with 게놈고유아이디 and 1 other field (High correlation)
게놈조립정도 is highly imbalanced (74.6%) (Imbalance)
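The 74.6% imbalance score for 게놈조립정도 is consistent with one minus the normalized Shannon entropy of the class shares. The sketch below checks that under this assumption, using the counts 9575 (scaffold) and 425 (contig) reported for the variable in this profile. The function name `imbalance_score` is illustrative and is not taken from the profiling library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Assumption: a single-class column is scored 0 in this sketch.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts for 게놈조립정도: scaffold and contig.
score = imbalance_score([9575, 425])
print(f"{score:.1%}")  # prints 74.6%
```

A perfectly balanced column scores 0 and a column dominated by one class approaches 1, so 74.6% reflects the 95.8% / 4.2% split.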

Reproduction

Analysis started: 2023-12-12 05:18:27.176241
Analysis finished: 2023-12-12 05:18:28.388151
Duration: 1.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
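A report of this kind could be regenerated roughly as follows. This is a sketch, not the configuration actually used for this run (that is what config.json records): the CSV file name, the encoding, and the report title are hypothetical, and the third-party imports are deferred so that the function can be defined without pandas or ydata-profiling installed.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file with ydata-profiling and write an HTML report."""
    # Imported lazily: pandas and ydata-profiling are third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="Genome map information",  # hypothetical title
        dataset={
            # Metadata shown in the "Dataset" block of the overview.
            "author": "환경부 국립생물자원관",
            "url": "https://www.data.go.kr/data/15089672/fileData.do",
        },
    )
    report.to_file(output_path)
    return report


# Example call; the file name and encoding are assumptions:
# build_report("genome_map.csv", encoding="cp949")
```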

Variables

게놈고유아이디 (genome unique ID)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.2242
Minimum: 6
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 6
5-th percentile: 6
Q1: 7
median: 9
Q3: 11
95-th percentile: 13
Maximum: 13
Range: 7
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.3372806
Coefficient of variation (CV): 0.25338573
Kurtosis: -1.3720012
Mean: 9.2242
Median Absolute Deviation (MAD): 2
Skewness: -0.066484285
Sum: 92242
Variance: 5.4628806
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Value  Count  Frequency (%)
6      2054   20.5%
12     1565   15.7%
11     1537   15.4%
9      1343   13.4%
7      1277   12.8%
10     1194   11.9%
13     605    6.0%
8      425    4.2%

Value  Count  Frequency (%)
6      2054   20.5%
7      1277   12.8%
8      425    4.2%
9      1343   13.4%
10     1194   11.9%
11     1537   15.4%
12     1565   15.7%
13     605    6.0%

Value  Count  Frequency (%)
13     605    6.0%
12     1565   15.7%
11     1537   15.4%
10     1194   11.9%
9      1343   13.4%
8      425    4.2%
7      1277   12.8%
6      2054   20.5%
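The summary statistics for 게놈고유아이디 can be re-derived from its value counts alone. The sketch below assumes the variance uses the n - 1 (sample) denominator and that quantiles use linear interpolation between order statistics; under those assumptions it reproduces the mean, variance, standard deviation, CV, quartiles, and MAD reported above. It uses only the Python standard library.

```python
import math
import statistics

# Value counts copied from the frequency table of 게놈고유아이디.
counts = {6: 2054, 7: 1277, 8: 425, 9: 1343, 10: 1194, 11: 1537, 12: 1565, 13: 605}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # "Sum" in the report
mean = total / n
# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

# Expand the counts into a sorted list to compute order statistics.
values = sorted(v for v, c in counts.items() for _ in range(c))
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(n, total, mean, round(variance, 7), round(std, 7), round(cv, 8))
print(q1, median, q3, q3 - q1, mad)
```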

게놈조립정도 (genome assembly level)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value     Count
scaffold  9575
contig    425

Length

Max length: 8
Median length: 8
Mean length: 7.915
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: scaffold
2nd row: scaffold
3rd row: scaffold
4th row: scaffold
5th row: scaffold

Common Values

Value     Count  Frequency (%)
scaffold  9575   95.8%
contig    425    4.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
scaffold  9575   95.8%
contig    425    4.2%

어셈블리조립버전 (assembly version)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
V_1    10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V_1
2nd row: V_1
3rd row: V_1
4th row: V_1
5th row: V_1

Common Values

Value  Count  Frequency (%)
V_1    10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
v_1    10000  100.0%

어셈블리조립방법 (assembly method)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value        Count
Not known    7946
SOAPdenovo2  2054

Length

Max length: 11
Median length: 9
Mean length: 9.4108
Min length: 9

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SOAPdenovo2
2nd row: Not known
3rd row: Not known
4th row: Not known
5th row: Not known

Common Values

Value        Count  Frequency (%)
Not known    7946   79.5%
SOAPdenovo2  2054   20.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value        Count  Frequency (%)
not          7946   44.3%
known        7946   44.3%
soapdenovo2  2054   11.4%
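The final table for 어셈블리조립방법 lists lowercase tokens ("not", "known", "soapdenovo2") rather than whole category values, which is why its percentages (44.3%, 44.3%, 11.4%) differ from the 79.5% / 20.5% split above. The sketch below reproduces those figures by assuming each value is lowercased and split on whitespace. That tokenization is an inference from the numbers, not a statement about the profiler's internals.

```python
from collections import Counter

# Category counts reported for 어셈블리조립방법.
value_counts = {"Not known": 7946, "SOAPdenovo2": 2054}

# Count every lowercase, whitespace-separated token, weighted by how
# often its parent value occurs.
word_counts = Counter()
for value, count in value_counts.items():
    for token in value.lower().split():
        word_counts[token] += count

total_tokens = sum(word_counts.values())
shares = {w: round(100 * c / total_tokens, 1) for w, c in word_counts.items()}
print(total_tokens, shares)
```

Because "Not known" contributes two tokens per row, the denominator becomes 17946 tokens instead of 10000 rows.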

게놈조립커버리지 (genome assembly coverage)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value      Count
Not known  7946
30X        2054

Length

Max length: 9
Median length: 9
Mean length: 7.7676
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30X
2nd row: Not known
3rd row: Not known
4th row: Not known
5th row: Not known

Common Values

Value      Count  Frequency (%)
Not known  7946   79.5%
30X        2054   20.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
not    7946   44.3%
known  7946   44.3%
30x    2054   11.4%

유전자정보고유아이디 (gene information unique ID)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9876
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 99996.032
Minimum: 63411
Maximum: 147732
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 63411
5-th percentile: 65805.65
Q1: 75093.75
median: 98351
Q3: 122312.25
95-th percentile: 141204.8
Maximum: 147732
Range: 84321
Interquartile range (IQR): 47218.5

Descriptive statistics

Standard deviation: 25254.427
Coefficient of variation (CV): 0.25255429
Kurtosis: -1.281019
Mean: 99996.032
Median Absolute Deviation (MAD): 23483.5
Skewness: 0.19850996
Sum: 9.9996032 × 10^8
Variance: 6.377861 × 10^8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                Count  Frequency (%)
73837                2      < 0.1%
65420                2      < 0.1%
71338                2      < 0.1%
70847                2      < 0.1%
69453                2      < 0.1%
67813                2      < 0.1%
69772                2      < 0.1%
64976                2      < 0.1%
67566                2      < 0.1%
66984                2      < 0.1%
Other values (9866)  9980   99.8%

Value  Count  Frequency (%)
63411  1      < 0.1%
63413  1      < 0.1%
63418  1      < 0.1%
63427  1      < 0.1%
63428  1      < 0.1%
63429  1      < 0.1%
63432  1      < 0.1%
63436  1      < 0.1%
63437  1      < 0.1%
63439  1      < 0.1%

Value   Count  Frequency (%)
147732  1      < 0.1%
147705  1      < 0.1%
147704  1      < 0.1%
147687  1      < 0.1%
147637  1      < 0.1%
147624  1      < 0.1%
147607  1      < 0.1%
147589  1      < 0.1%
147560  1      < 0.1%
147559  1      < 0.1%

유전자심볼이름 (gene symbol name)
Text

Distinct: 8426
Distinct (%): 84.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 17
Mean length: 7.2082
Min length: 1

Characters and Unicode

Total characters: 72082
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
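As a small illustration of that idea, the sketch below classifies each character of a few gene symbols by its Unicode general category using Python's standard `unicodedata` module. The four symbols are taken from sample rows of this report. The mapping from two-letter category codes to long names covers only the nine categories listed in this profile, and the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this profile.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Gene symbols that appear in the sample rows of this report.
symbols = ["ZNF420", "glnA2", "6-HDNO", "SPAC6G10.04c"]
result = category_counts(symbols)
for name, count in result.most_common():
    print(name, count)
```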

Unique

Unique: 7388
Unique (%): 73.9%

Sample

1st row: ZNF420
2nd row: MON1
3rd row: GENE10981
4th row: EXG1
5th row: GPI16

Value                Count  Frequency (%)
unknown              38     0.4%
het-e1               19     0.2%
het-6                14     0.1%
lovf                 14     0.1%
spcc1529.01          12     0.1%
toxa                 10     0.1%
mch5                 10     0.1%
6-hdno               10     0.1%
stl1                 10     0.1%
mtr                  9      0.1%
Other values (8305)  9854   98.5%

Most occurring characters

Value              Count  Frequency (%)
E                  7382   10.2%
1                  5976   8.3%
0                  5853   8.1%
N                  4487   6.2%
G                  3974   5.5%
2                  3309   4.6%
3                  2767   3.8%
8                  2668   3.7%
4                  2309   3.2%
A                  2301   3.2%
Other values (59)  31056  43.1%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       32586  45.2%
Decimal Number         30665  42.5%
Lowercase Letter       6560   9.1%
Other Punctuation      1378   1.9%
Connector Punctuation  572    0.8%
Dash Punctuation       315    0.4%
Open Punctuation       2      < 0.1%
Math Symbol            2      < 0.1%
Close Punctuation      2      < 0.1%

Most frequent character per category

Uppercase Letter

Value              Count  Frequency (%)
E                  7382   22.7%
N                  4487   13.8%
G                  3974   12.2%
A                  2301   7.1%
C                  1757   5.4%
P                  1667   5.1%
S                  1360   4.2%
D                  1206   3.7%
R                  1045   3.2%
O                  927    2.8%
Other values (16)  6480   19.9%

Lowercase Letter

Value              Count  Frequency (%)
c                  714    10.9%
p                  541    8.2%
a                  502    7.7%
r                  431    6.6%
s                  422    6.4%
t                  372    5.7%
n                  359    5.5%
l                  300    4.6%
m                  293    4.5%
d                  266    4.1%
Other values (16)  2360   36.0%

Decimal Number

Value  Count  Frequency (%)
1      5976   19.5%
0      5853   19.1%
2      3309   10.8%
3      2767   9.0%
8      2668   8.7%
4      2309   7.5%
6      2083   6.8%
5      2021   6.6%
7      1913   6.2%
9      1766   5.8%

Other Punctuation

Value  Count  Frequency (%)
.      1377   99.9%
'      1      0.1%

Connector Punctuation

Value  Count  Frequency (%)
_      572    100.0%

Dash Punctuation

Value  Count  Frequency (%)
-      315    100.0%

Open Punctuation

Value  Count  Frequency (%)
(      2      100.0%

Math Symbol

Value  Count  Frequency (%)
>      2      100.0%

Close Punctuation

Value  Count  Frequency (%)
)      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   39146  54.3%
Common  32936  45.7%

Most frequent character per script

Latin

Value              Count  Frequency (%)
E                  7382   18.9%
N                  4487   11.5%
G                  3974   10.2%
A                  2301   5.9%
C                  1757   4.5%
P                  1667   4.3%
S                  1360   3.5%
D                  1206   3.1%
R                  1045   2.7%
O                  927    2.4%
Other values (42)  13040  33.3%

Common

Value             Count  Frequency (%)
1                 5976   18.1%
0                 5853   17.8%
2                 3309   10.0%
3                 2767   8.4%
8                 2668   8.1%
4                 2309   7.0%
6                 2083   6.3%
5                 2021   6.1%
7                 1913   5.8%
9                 1766   5.4%
Other values (7)  2271   6.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  72082  100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
E                  7382   10.2%
1                  5976   8.3%
0                  5853   8.1%
N                  4487   6.2%
G                  3974   5.5%
2                  3309   4.6%
3                  2767   3.8%
8                  2668   3.7%
4                  2309   3.2%
A                  2301   3.2%
Other values (59)  31056  43.1%

Interactions


Correlations

                      게놈고유아이디  게놈조립정도  어셈블리조립방법  게놈조립커버리지  유전자정보고유아이디
게놈고유아이디          1.000  1.000  1.000  1.000  0.927
게놈조립정도            1.000  1.000  0.166  0.166  0.835
어셈블리조립방법        1.000  0.166  1.000  1.000  0.837
게놈조립커버리지        1.000  0.166  1.000  1.000  0.837
유전자정보고유아이디    0.927  0.835  0.837  0.837  1.000

                  게놈조립정도  어셈블리조립방법  게놈조립커버리지
게놈조립정도        1.000  0.106  0.106
어셈블리조립방법    0.106  1.000  1.000
게놈조립커버리지    0.106  1.000  1.000

                      게놈고유아이디  유전자정보고유아이디  게놈조립정도  어셈블리조립방법  게놈조립커버리지
게놈고유아이디          1.000  0.953  1.000  1.000  1.000
유전자정보고유아이디    0.953  1.000  0.668  0.670  0.670
게놈조립정도            1.000  0.668  1.000  0.106  0.106
어셈블리조립방법        1.000  0.670  0.106  1.000  1.000
게놈조립커버리지        1.000  0.670  0.106  1.000  1.000
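The 0.106 entry between 게놈조립정도 and 어셈블리조립방법 can be reproduced with a bias-corrected Cramér's V computed from a chi-square statistic with Yates' continuity correction. Two things in the sketch below are assumptions rather than facts stated in the report. First, the matrix is not labelled with its coefficient in this extract, so treating it as Cramér's V is a guess that happens to match. Second, the 2x2 contingency table is inferred from the reported counts and sample rows (all 425 contig rows fall under "Not known", and all 2054 SOAPdenovo2 rows are scaffolds). The function name is illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V with Yates' continuity correction."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates' correction: shrink each |observed - expected| by 0.5.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    r, k = len(table), len(table[0])
    # Bias correction of phi^2 and of the table dimensions.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(k_corr - 1, r_corr - 1))


# Inferred table (assumption): rows = scaffold, contig;
# columns = "Not known", "SOAPdenovo2".
table = [[7521, 2054], [425, 0]]
v = cramers_v_corrected(table)
print(round(v, 3))  # prints 0.106
```

Without the continuity correction the same table gives roughly 0.107, so the match at three decimals depends on that choice.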

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  게놈고유아이디  게놈조립정도  어셈블리조립버전  어셈블리조립방법  게놈조립커버리지  유전자정보고유아이디  유전자심볼이름
26695    6   scaffold  V_1  SOAPdenovo2  30X        82114   ZNF420
63201    11  scaffold  V_1  Not known    Not known  115421  MON1
69869    11  scaffold  V_1  Not known    Not known  121669  GENE10981
23282    7   scaffold  V_1  Not known    Not known  67577   EXG1
86189    12  scaffold  V_1  Not known    Not known  137941  GPI16
38104    9   scaffold  V_1  Not known    Not known  88851   GENE02278
51434    10  scaffold  V_1  Not known    Not known  103894  glnA2
80980    12  scaffold  V_1  Not known    Not known  133538  AAE1
51763    10  scaffold  V_1  Not known    Not known  103676  HAP2
79355    12  scaffold  V_1  Not known    Not known  128912  GENE03584

(index)  게놈고유아이디  게놈조립정도  어셈블리조립버전  어셈블리조립방법  게놈조립커버리지  유전자정보고유아이디  유전자심볼이름
9350     6   scaffold  V_1  SOAPdenovo2  30X        66425   CHRDL1
22024    7   scaffold  V_1  Not known    Not known  67180   mvd1
46799    9   scaffold  V_1  Not known    Not known  98582   DNF1
30149    8   contig    V_1  Not known    Not known  82454   88.DANOV.1_00072
20889    7   scaffold  V_1  Not known    Not known  68957   SPAC6G10.04c
16608    6   scaffold  V_1  SOAPdenovo2  30X        76629   RAI1
89638    12  scaffold  V_1  Not known    Not known  139460  bznB
70968    11  scaffold  V_1  Not known    Not known  122770  GENE12082
22710    7   scaffold  V_1  Not known    Not known  66953   GENE03631
86266    12  scaffold  V_1  Not known    Not known  135007  6-HDNO