Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 10000 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 644.5 KiB |
Average record size in memory | 66.0 B |
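The average record size follows from the total memory footprint and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 644.5        # total size in memory, in KiB
n_observations = 10000   # number of rows

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 66.0 B shown in the table.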
Variable types
Numeric | 2 |
---|---|
Categorical | 4 |
Text | 1 |
Dataset
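Description | Genome map information for wildlife, together with a list of the associated analysis methods, obtained by the National Institute of Biological Resources (국립생물자원관) through its genetic-information research projects. |
---|---|
Author | National Institute of Biological Resources, Ministry of Environment (환경부 국립생물자원관) |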
URL | https://www.data.go.kr/data/15089672/fileData.do |
어셈블리조립버전 has constant value "V_1" | Constant |
어셈블리조립방법 is highly overall correlated with 게놈고유아이디 and 2 other fields | High correlation |
게놈조립커버리지 is highly overall correlated with 게놈고유아이디 and 2 other fields | High correlation |
게놈고유아이디 is highly overall correlated with 유전자정보고유아이디 and 3 other fields | High correlation |
유전자정보고유아이디 is highly overall correlated with 게놈고유아이디 and 3 other fields | High correlation |
게놈조립정도 is highly overall correlated with 게놈고유아이디 and 1 other field | High correlation |
게놈조립정도 is highly imbalanced (74.6%) | Imbalance |
Reproduction
Analysis started | 2023-12-12 05:18:27.176241 |
---|---|
Analysis finished | 2023-12-12 05:18:28.388151 |
Duration | 1.21 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
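The reported duration is the difference between the two timestamps above. A short sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-12-12 05:18:27.176241")
finished = datetime.fromisoformat("2023-12-12 05:18:28.388151")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # prints "1.21 seconds"
```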
게놈고유아이디
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 8 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9.2242 |
Minimum | 6 |
---|---|
Maximum | 13 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 6 |
---|---|
5-th percentile | 6 |
Q1 | 7 |
median | 9 |
Q3 | 11 |
95-th percentile | 13 |
Maximum | 13 |
Range | 7 |
Interquartile range (IQR) | 4 |
Descriptive statistics
Standard deviation | 2.3372806 |
---|---|
Coefficient of variation (CV) | 0.25338573 |
Kurtosis | -1.3720012 |
Mean | 9.2242 |
Median Absolute Deviation (MAD) | 2 |
Skewness | -0.066484285 |
Sum | 92242 |
Variance | 5.4628806 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
6 | 2054 | 20.5% |
12 | 1565 | 15.7% |
11 | 1537 | 15.4% |
9 | 1343 | 13.4% |
7 | 1277 | 12.8% |
10 | 1194 | 11.9% |
13 | 605 | 6.0% |
8 | 425 | 4.2% |
Value | Count | Frequency (%) |
6 | 2054 | 20.5% |
7 | 1277 | 12.8% |
8 | 425 | 4.2% |
9 | 1343 | 13.4% |
10 | 1194 | 11.9% |
11 | 1537 | 15.4% |
12 | 1565 | 15.7% |
13 | 605 | 6.0% |
Value | Count | Frequency (%) |
13 | 605 | 6.0% |
12 | 1565 | 15.7% |
11 | 1537 | 15.4% |
10 | 1194 | 11.9% |
9 | 1343 | 13.4% |
8 | 425 | 4.2% |
7 | 1277 | 12.8% |
6 | 2054 | 20.5% |
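The summary statistics reported for 게놈고유아이디 can be cross-checked from its eight value counts. The sketch below assumes the report uses the sample variance (dividing by n - 1) and linearly interpolated quartiles; both assumptions are consistent with the figures shown, but they are not stated in the report itself.

```python
import math
import statistics

# Value counts for 게놈고유아이디, copied from the frequency table.
counts = {6: 2054, 7: 1277, 8: 425, 9: 1343, 10: 1194, 11: 1537, 12: 1565, 13: 605}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # "Sum" in the report
mean = total / n

# Assumption: sample variance, i.e. divide by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

# Expand the counts into a sorted list to compute quartiles.
values = sorted(v for v, c in counts.items() for _ in range(c))
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(n, total, mean)
print(round(variance, 7), round(std, 7), round(cv, 8))
print(q1, median, q3, q3 - q1)
```

Under these assumptions the sum (92242), mean (9.2242), quartiles (7, 9, 11) and IQR (4) match the tables above, and the variance, standard deviation and CV agree to the reported precision.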
게놈조립정도
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
scaffold | 9575 |
---|---|
contig | 425 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 7.915 |
Min length | 6 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | scaffold |
---|---|
2nd row | scaffold |
3rd row | scaffold |
4th row | scaffold |
5th row | scaffold |
Common Values
Value | Count | Frequency (%) |
scaffold | 9575 | 95.8% |
contig | 425 | 4.2% |
Words
Value | Count | Frequency (%) |
scaffold | 9575 | 95.8% |
contig | 425 | 4.2% |
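The 74.6% imbalance flagged for 게놈조립정도 can be reproduced from the two category counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for that number of classes. That definition is an assumption about the tool rather than something the report states; the function name is illustrative.

```python
import math

def imbalance_score(class_counts):
    """Return 1 - H / H_max, where H is the Shannon entropy in bits.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumed definition, not taken from the report.
    """
    n_classes = len(class_counts)
    if n_classes <= 1:
        return 0.0  # a single class has no balance to measure
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# Counts copied from the Common Values table: scaffold, contig.
score = imbalance_score([9575, 425])
print(f"{score:.1%}")
```

With 9575 scaffold rows against 425 contig rows, the score rounds to the 74.6% shown in the alert.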
어셈블리조립버전
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
V_1 | 10000 |
---|---|
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 3 |
Min length | 3 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | V_1 |
---|---|
2nd row | V_1 |
3rd row | V_1 |
4th row | V_1 |
5th row | V_1 |
Common Values
Value | Count | Frequency (%) |
V_1 | 10000 | 100.0% |
Words
Value | Count | Frequency (%) |
v_1 | 10000 | 100.0% |
어셈블리조립방법
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Not known | 7946 |
---|---|
SOAPdenovo2 | 2054 |
Length
Max length | 11 |
---|---|
Median length | 9 |
Mean length | 9.4108 |
Min length | 9 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | SOAPdenovo2 |
---|---|
2nd row | Not known |
3rd row | Not known |
4th row | Not known |
5th row | Not known |
Common Values
Value | Count | Frequency (%) |
Not known | 7946 | 79.5% |
SOAPdenovo2 | 2054 | 20.5% |
Words
Value | Count | Frequency (%) |
not | 7946 | 44.3% |
known | 7946 | 44.3% |
soapdenovo2 | 2054 | 11.4% |
게놈조립커버리지
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Not known | 7946 |
---|---|
30X | 2054 |
Length
Max length | 9 |
---|---|
Median length | 9 |
Mean length | 7.7676 |
Min length | 3 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 30X |
---|---|
2nd row | Not known |
3rd row | Not known |
4th row | Not known |
5th row | Not known |
Common Values
Value | Count | Frequency (%) |
Not known | 7946 | 79.5% |
30X | 2054 | 20.5% |
Words
Value | Count | Frequency (%) |
not | 7946 | 44.3% |
known | 7946 | 44.3% |
30x | 2054 | 11.4% |
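The "Mean length" figures for the categorical variables are count-weighted averages of the string lengths of their values. A minimal sketch, with an illustrative helper name and the counts taken from the Common Values tables:

```python
def mean_length(value_counts):
    """Count-weighted mean string length of a categorical column."""
    total = sum(value_counts.values())
    return sum(len(value) * count for value, count in value_counts.items()) / total

# Value counts copied from the Common Values tables of this report.
assembly_level = {"scaffold": 9575, "contig": 425}          # 게놈조립정도
assembly_method = {"Not known": 7946, "SOAPdenovo2": 2054}  # 어셈블리조립방법
coverage = {"Not known": 7946, "30X": 2054}                 # 게놈조립커버리지

for name, counts in [("게놈조립정도", assembly_level),
                     ("어셈블리조립방법", assembly_method),
                     ("게놈조립커버리지", coverage)]:
    print(name, round(mean_length(counts), 4))
```

The three results agree with the reported mean lengths of 7.915, 9.4108 and 7.7676.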
유전자정보고유아이디
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 9876 |
---|---|
Distinct (%) | 98.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 99996.032 |
Minimum | 63411 |
---|---|
Maximum | 147732 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 63411 |
---|---|
5-th percentile | 65805.65 |
Q1 | 75093.75 |
median | 98351 |
Q3 | 122312.25 |
95-th percentile | 141204.8 |
Maximum | 147732 |
Range | 84321 |
Interquartile range (IQR) | 47218.5 |
Descriptive statistics
Standard deviation | 25254.427 |
---|---|
Coefficient of variation (CV) | 0.25255429 |
Kurtosis | -1.281019 |
Mean | 99996.032 |
Median Absolute Deviation (MAD) | 23483.5 |
Skewness | 0.19850996 |
Sum | 9.9996032 × 10^8 |
Variance | 6.377861 × 10^8 |
Monotonicity | Not monotonic |
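Several of the figures for 유전자정보고유아이디 are derived from others in the same tables, so they can be checked against each other. The sketch below only reuses numbers printed in this report; the variable names are illustrative, and small differences are expected because the inputs are already rounded.

```python
# Figures copied from the tables for 유전자정보고유아이디.
minimum, maximum = 63411, 147732
q1, q3 = 75093.75, 122312.25
mean, std = 99996.032, 25254.427
n_observations = 10000

value_range = maximum - minimum   # reported: 84321
iqr = q3 - q1                     # reported: 47218.5
cv = std / mean                   # reported: 0.25255429
variance = std ** 2               # reported: about 6.377861e8
total = mean * n_observations     # reported: about 9.9996032e8

print(value_range, iqr, round(cv, 8))
print(f"{variance:.6e}", f"{total:.7e}")
```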
Value | Count | Frequency (%) |
73837 | 2 | < 0.1% |
65420 | 2 | < 0.1% |
71338 | 2 | < 0.1% |
70847 | 2 | < 0.1% |
69453 | 2 | < 0.1% |
67813 | 2 | < 0.1% |
69772 | 2 | < 0.1% |
64976 | 2 | < 0.1% |
67566 | 2 | < 0.1% |
66984 | 2 | < 0.1% |
Other values (9866) | 9980 | 99.8% |
Value | Count | Frequency (%) |
63411 | 1 | < 0.1% |
63413 | 1 | < 0.1% |
63418 | 1 | < 0.1% |
63427 | 1 | < 0.1% |
63428 | 1 | < 0.1% |
63429 | 1 | < 0.1% |
63432 | 1 | < 0.1% |
63436 | 1 | < 0.1% |
63437 | 1 | < 0.1% |
63439 | 1 | < 0.1% |
Value | Count | Frequency (%) |
147732 | 1 | < 0.1% |
147705 | 1 | < 0.1% |
147704 | 1 | < 0.1% |
147687 | 1 | < 0.1% |
147637 | 1 | < 0.1% |
147624 | 1 | < 0.1% |
147607 | 1 | < 0.1% |
147589 | 1 | < 0.1% |
147560 | 1 | < 0.1% |
147559 | 1 | < 0.1% |
유전자심볼이름
Text
Distinct | 8426 |
---|---|
Distinct (%) | 84.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Value | Count | Frequency (%) |
unknown | 38 | 0.4% |
het-e1 | 19 | 0.2% |
het-6 | 14 | 0.1% |
lovf | 14 | 0.1% |
spcc1529.01 | 12 | 0.1% |
toxa | 10 | 0.1% |
mch5 | 10 | 0.1% |
6-hdno | 10 | 0.1% |
stl1 | 10 | 0.1% |
mtr | 9 | 0.1% |
Other values (8305) | 9854 | 98.5% |
Most occurring characters
Value | Count | Frequency (%) |
E | 7382 | 10.2% |
1 | 5976 | 8.3% |
0 | 5853 | 8.1% |
N | 4487 | 6.2% |
G | 3974 | 5.5% |
2 | 3309 | 4.6% |
3 | 2767 | 3.8% |
8 | 2668 | 3.7% |
4 | 2309 | 3.2% |
A | 2301 | 3.2% |
Other values (59) | 31056 | 43.1% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 32586 | 45.2% |
Decimal Number | 30665 | 42.5% |
Lowercase Letter | 6560 | 9.1% |
Other Punctuation | 1378 | 1.9% |
Connector Punctuation | 572 | 0.8% |
Dash Punctuation | 315 | 0.4% |
Open Punctuation | 2 | < 0.1% |
Math Symbol | 2 | < 0.1% |
Close Punctuation | 2 | < 0.1% |
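The category names in this table correspond to Unicode general categories. As an illustration of how such a tally can be produced, the sketch below counts categories with Python's `unicodedata` module for four gene symbols taken from the sample rows of this report. The helper name and the code-to-name mapping are illustrative assumptions, not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Four gene symbols from the sample rows (a small illustration, not the full column).
symbols = ["ZNF420", "glnA2", "88.DANOV.1_00072", "6-HDNO"]
category_totals = category_counts(symbols)
for name, count in category_totals.most_common():
    print(name, count)
```

Applied to the full 유전자심볼이름 column, the same kind of tally would give the counts shown in the table above.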
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 7382 | 22.7% |
N | 4487 | 13.8% |
G | 3974 | 12.2% |
A | 2301 | 7.1% |
C | 1757 | 5.4% |
P | 1667 | 5.1% |
S | 1360 | 4.2% |
D | 1206 | 3.7% |
R | 1045 | 3.2% |
O | 927 | 2.8% |
Other values (16) | 6480 | 19.9% |
Lowercase Letter
Value | Count | Frequency (%) |
c | 714 | 10.9% |
p | 541 | 8.2% |
a | 502 | 7.7% |
r | 431 | 6.6% |
s | 422 | 6.4% |
t | 372 | 5.7% |
n | 359 | 5.5% |
l | 300 | 4.6% |
m | 293 | 4.5% |
d | 266 | 4.1% |
Other values (16) | 2360 | 36.0% |
Decimal Number
Value | Count | Frequency (%) |
1 | 5976 | 19.5% |
0 | 5853 | 19.1% |
2 | 3309 | 10.8% |
3 | 2767 | 9.0% |
8 | 2668 | 8.7% |
4 | 2309 | 7.5% |
6 | 2083 | 6.8% |
5 | 2021 | 6.6% |
7 | 1913 | 6.2% |
9 | 1766 | 5.8% |
Other Punctuation
Value | Count | Frequency (%) |
. | 1377 | 99.9% |
' | 1 | 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 572 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 315 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 2 | 100.0% |
Math Symbol
Value | Count | Frequency (%) |
> | 2 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 2 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 39146 | 54.3% |
Common | 32936 | 45.7% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 7382 | 18.9% |
N | 4487 | 11.5% |
G | 3974 | 10.2% |
A | 2301 | 5.9% |
C | 1757 | 4.5% |
P | 1667 | 4.3% |
S | 1360 | 3.5% |
D | 1206 | 3.1% |
R | 1045 | 2.7% |
O | 927 | 2.4% |
Other values (42) | 13040 | 33.3% |
Common
Value | Count | Frequency (%) |
1 | 5976 | 18.1% |
0 | 5853 | 17.8% |
2 | 3309 | 10.0% |
3 | 2767 | 8.4% |
8 | 2668 | 8.1% |
4 | 2309 | 7.0% |
6 | 2083 | 6.3% |
5 | 2021 | 6.1% |
7 | 1913 | 5.8% |
9 | 1766 | 5.4% |
Other values (7) | 2271 | 6.9% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 72082 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 7382 | 10.2% |
1 | 5976 | 8.3% |
0 | 5853 | 8.1% |
N | 4487 | 6.2% |
G | 3974 | 5.5% |
2 | 3309 | 4.6% |
3 | 2767 | 3.8% |
8 | 2668 | 3.7% |
4 | 2309 | 3.2% |
A | 2301 | 3.2% |
Other values (59) | 31056 | 43.1% |
Variable | 게놈고유아이디 | 게놈조립정도 | 어셈블리조립방법 | 게놈조립커버리지 | 유전자정보고유아이디 |
---|---|---|---|---|---|
게놈고유아이디 | 1.000 | 1.000 | 1.000 | 1.000 | 0.927 |
게놈조립정도 | 1.000 | 1.000 | 0.166 | 0.166 | 0.835 |
어셈블리조립방법 | 1.000 | 0.166 | 1.000 | 1.000 | 0.837 |
게놈조립커버리지 | 1.000 | 0.166 | 1.000 | 1.000 | 0.837 |
유전자정보고유아이디 | 0.927 | 0.835 | 0.837 | 0.837 | 1.000 |
Variable | 게놈조립정도 | 어셈블리조립방법 | 게놈조립커버리지 |
---|---|---|---|
게놈조립정도 | 1.000 | 0.106 | 0.106 |
어셈블리조립방법 | 0.106 | 1.000 | 1.000 |
게놈조립커버리지 | 0.106 | 1.000 | 1.000 |
Variable | 게놈고유아이디 | 유전자정보고유아이디 | 게놈조립정도 | 어셈블리조립방법 | 게놈조립커버리지 |
---|---|---|---|---|---|
게놈고유아이디 | 1.000 | 0.953 | 1.000 | 1.000 | 1.000 |
유전자정보고유아이디 | 0.953 | 1.000 | 0.668 | 0.670 | 0.670 |
게놈조립정도 | 1.000 | 0.668 | 1.000 | 0.106 | 0.106 |
어셈블리조립방법 | 1.000 | 0.670 | 0.106 | 1.000 | 1.000 |
게놈조립커버리지 | 1.000 | 0.670 | 0.106 | 1.000 | 1.000 |
Row index | 게놈고유아이디 | 게놈조립정도 | 어셈블리조립버전 | 어셈블리조립방법 | 게놈조립커버리지 | 유전자정보고유아이디 | 유전자심볼이름 |
---|---|---|---|---|---|---|---|
26695 | 6 | scaffold | V_1 | SOAPdenovo2 | 30X | 82114 | ZNF420 |
63201 | 11 | scaffold | V_1 | Not known | Not known | 115421 | MON1 |
69869 | 11 | scaffold | V_1 | Not known | Not known | 121669 | GENE10981 |
23282 | 7 | scaffold | V_1 | Not known | Not known | 67577 | EXG1 |
86189 | 12 | scaffold | V_1 | Not known | Not known | 137941 | GPI16 |
38104 | 9 | scaffold | V_1 | Not known | Not known | 88851 | GENE02278 |
51434 | 10 | scaffold | V_1 | Not known | Not known | 103894 | glnA2 |
80980 | 12 | scaffold | V_1 | Not known | Not known | 133538 | AAE1 |
51763 | 10 | scaffold | V_1 | Not known | Not known | 103676 | HAP2 |
79355 | 12 | scaffold | V_1 | Not known | Not known | 128912 | GENE03584 |
Row index | 게놈고유아이디 | 게놈조립정도 | 어셈블리조립버전 | 어셈블리조립방법 | 게놈조립커버리지 | 유전자정보고유아이디 | 유전자심볼이름 |
---|---|---|---|---|---|---|---|
9350 | 6 | scaffold | V_1 | SOAPdenovo2 | 30X | 66425 | CHRDL1 |
22024 | 7 | scaffold | V_1 | Not known | Not known | 67180 | mvd1 |
46799 | 9 | scaffold | V_1 | Not known | Not known | 98582 | DNF1 |
30149 | 8 | contig | V_1 | Not known | Not known | 82454 | 88.DANOV.1_00072 |
20889 | 7 | scaffold | V_1 | Not known | Not known | 68957 | SPAC6G10.04c |
16608 | 6 | scaffold | V_1 | SOAPdenovo2 | 30X | 76629 | RAI1 |
89638 | 12 | scaffold | V_1 | Not known | Not known | 139460 | bznB |
70968 | 11 | scaffold | V_1 | Not known | Not known | 122770 | GENE12082 |
22710 | 7 | scaffold | V_1 | Not known | Not known | 66953 | GENE03631 |
86266 | 12 | scaffold | V_1 | Not known | Not known | 135007 | 6-HDNO |