Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 7122 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 174.0 KiB |
Average record size in memory | 25.0 B |
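The average record size follows from the two memory figures above. A quick check, assuming 1 KiB = 1024 bytes:

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 174.0
n_observations = 7122

# Assumption: the report uses binary units, so 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")  # 25.0 B per record
```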
Variable types
Numeric | 1 |
---|---|
Text | 1 |
Categorical | 1 |
Dataset
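Description | Resource information on the microorganisms, cell lines, fungi, and other resources that the Biological Resource Center provides through its distribution service. The data has the following columns: 자원번호 (resource number), 자원명 (resource name), 분류 (classification). |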
---|---|
URL | https://www.data.go.kr/data/3034156/fileData.do |
Reproduction
Analysis started | 2023-12-12 05:43:01.858878 |
---|---|
Analysis finished | 2023-12-12 05:43:02.609817 |
Duration | 0.75 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
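A report like this one can be regenerated with ydata-profiling. The sketch below assumes the dataset has been downloaded from the URL above as a CSV file. The file path, the encoding, and the report title are placeholders, and the settings in the original config.json are not reproduced, so library defaults apply.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported inside the function so that defining it does not require
    # the third-party packages to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is plain CSV; adjust `encoding` if the download
    # is not UTF-8.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Default settings; the original run's config.json is not available here.
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(output_path)
    return report
```

Calling `build_report("resources.csv")` (a hypothetical file name) would write `report.html` to the working directory.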
자원번호
Real number (ℝ)
UNIQUE
Distinct | 7122 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 32850.93 |
Minimum | 1018 |
---|---|
Maximum | 92972 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 62.7 KiB |
Quantile statistics
Minimum | 1018 |
---|---|
5-th percentile | 3801.05 |
Q1 | 13619.5 |
median | 23647 |
Q3 | 49545.5 |
95-th percentile | 82383.85 |
Maximum | 92972 |
Range | 91954 |
Interquartile range (IQR) | 35926 |
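Range and IQR are plain differences of the values listed above (maximum minus minimum, Q3 minus Q1). The fractional percentiles such as 3801.05 suggest that quantiles are interpolated between neighbouring sorted values. The sketch below assumes linear interpolation, which is an assumption about the report rather than something it states, and uses toy data instead of the dataset.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)      # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Toy data, not taken from the dataset.
toy = [10, 20, 30, 40, 50, 60]
q1, q3 = quantile(toy, 0.25), quantile(toy, 0.75)
print(q1, q3, q3 - q1)  # 22.5 47.5 25.0

# The same subtractions applied to the figures reported above.
reported_range = 92972 - 1018       # 91954
reported_iqr = 49545.5 - 13619.5    # 35926.0
```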
Descriptive statistics
Standard deviation | 23731.609 |
---|---|
Coefficient of variation (CV) | 0.72240297 |
Kurtosis | -0.44545375 |
Mean | 32850.93 |
Median Absolute Deviation (MAD) | 14544.5 |
Skewness | 0.75787221 |
Sum | 2.3396432 × 10⁸ |
Variance | 5.6318927 × 10⁸ |
Monotonicity | Not monotonic |
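Several of the descriptive statistics above are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A small consistency check using the rounded figures from the tables (so agreement is only approximate):

```python
# Figures copied from the tables above (already rounded by the report).
n = 7122
mean = 32850.93
std = 23731.609

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
```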
Value | Count | Frequency (%) |
6968 | 1 | < 0.1% |
52305 | 1 | < 0.1% |
52344 | 1 | < 0.1% |
52343 | 1 | < 0.1% |
52339 | 1 | < 0.1% |
52336 | 1 | < 0.1% |
52335 | 1 | < 0.1% |
52325 | 1 | < 0.1% |
52323 | 1 | < 0.1% |
52318 | 1 | < 0.1% |
Other values (7112) | 7112 | 99.9% |
Value | Count | Frequency (%) |
1018 | 1 | < 0.1% |
1036 | 1 | < 0.1% |
1038 | 1 | < 0.1% |
1060 | 1 | < 0.1% |
1063 | 1 | < 0.1% |
1066 | 1 | < 0.1% |
1077 | 1 | < 0.1% |
1080 | 1 | < 0.1% |
1372 | 1 | < 0.1% |
1373 | 1 | < 0.1% |
Value | Count | Frequency (%) |
92972 | 1 | < 0.1% |
92971 | 1 | < 0.1% |
92890 | 1 | < 0.1% |
92889 | 1 | < 0.1% |
92879 | 1 | < 0.1% |
92878 | 1 | < 0.1% |
92877 | 1 | < 0.1% |
92876 | 1 | < 0.1% |
92872 | 1 | < 0.1% |
92860 | 1 | < 0.1% |
자원명
Text
Distinct | 7039 |
---|---|
Distinct (%) | 98.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 55.8 KiB |
Length
Max length | 80 |
---|---|
Median length | 52 |
Mean length | 23.220303 |
Min length | 11 |
Characters and Unicode
Total characters | 165375 |
---|---|
Distinct characters | 57 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 6984 |
---|---|
Unique (%) | 98.1% |
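Distinct counts the different values in the column, while Unique counts only the values that occur exactly once, which is why the Unique figure (6984) is smaller than the Distinct figure (7039). A minimal sketch on made-up rows:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Illustrative rows only; the repetition is invented for the example.
names = [
    "Candida albicans",
    "Halomonas venusta",
    "Candida albicans",
    "Emericella nidulans",
]
print(distinct_and_unique(names))  # (3, 2)
```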
Sample
1st row | Trichoderma reesei |
---|---|
2nd row | Candida albicans |
3rd row | Aspergillus oryzae var. oryzae |
4th row | Emericella nidulans |
5th row | Penicillium chrysogenum |
Value | Count | Frequency (%) |
paenibacillus | 169 | 1.2% |
subsp | 145 | 1.0% |
flavobacterium | 143 | 1.0% |
sp | 107 | 0.7% |
nocardioides | 101 | 0.7% |
soli | 98 | 0.7% |
sediminis | 93 | 0.6% |
clostridium | 82 | 0.6% |
hymenobacter | 75 | 0.5% |
streptomyces | 74 | 0.5% |
Other values (5461) | 13496 | 92.5% |
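The word table above lists lowercased tokens, so names are evidently split into words and case-folded before counting. The exact tokenizer the report uses is not shown, so the regular expression below is an assumption. The input is the five sample rows listed earlier.

```python
import re
from collections import Counter

def word_counts(names):
    """Count lowercased alphabetic tokens across all names."""
    counts = Counter()
    for name in names:
        # Assumption: a word is a run of ASCII letters; punctuation is dropped.
        counts.update(re.findall(r"[a-z]+", name.lower()))
    return counts

sample = [
    "Trichoderma reesei",
    "Candida albicans",
    "Aspergillus oryzae var. oryzae",
    "Emericella nidulans",
    "Penicillium chrysogenum",
]
counts = word_counts(sample)
print(counts.most_common(1))  # [('oryzae', 2)]
```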
Most occurring characters
Value | Count | Frequency (%) |
i | 17848 | 10.8% |
a | 17007 | 10.3% |
e | 12283 | 7.4% |
s | 11716 | 7.1% |
o | 11296 | 6.8% |
r | 9801 | 5.9% |
l | 9341 | 5.6% |
n | 9242 | 5.6% |
c | 8969 | 5.4% |
(space) | 8281 | 5.0% |
Other values (47) | 49591 | 30.0% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 149579 | 90.4% |
Space Separator | 8281 | 5.0% |
Uppercase Letter | 7124 | 4.3% |
Other Punctuation | 391 | 0.2% |
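The four categories above are Unicode general categories. Python's standard `unicodedata` module returns them as two-letter codes; the mapping to the long names used in the table is written out by hand here for illustration.

```python
import unicodedata
from collections import Counter

# Hand-written mapping for the four categories that appear in this column.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}

def category_counts(text):
    """Count characters per Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

result = category_counts("Aspergillus oryzae var. oryzae")
for name, count in result.most_common():
    print(name, count)
```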
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
i | 17848 | 11.9% |
a | 17007 | 11.4% |
e | 12283 | 8.2% |
s | 11716 | 7.8% |
o | 11296 | 7.6% |
r | 9801 | 6.6% |
l | 9341 | 6.2% |
n | 9242 | 6.2% |
c | 8969 | 6.0% |
u | 7709 | 5.2% |
Other values (16) | 34367 | 23.0% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1011 | 14.2% |
S | 819 | 11.5% |
A | 816 | 11.5% |
M | 648 | 9.1% |
C | 547 | 7.7% |
L | 440 | 6.2% |
B | 311 | 4.4% |
N | 308 | 4.3% |
H | 304 | 4.3% |
F | 304 | 4.3% |
Other values (16) | 1616 | 22.7% |
Other Punctuation
Value | Count | Frequency (%) |
. | 270 | 69.1% |
" | 119 | 30.4% |
; | 1 | 0.3% |
, | 1 | 0.3% |
Space Separator
Value | Count | Frequency (%) |
(space) | 8281 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 156703 | 94.8% |
Common | 8672 | 5.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
i | 17848 | 11.4% |
a | 17007 | 10.9% |
e | 12283 | 7.8% |
s | 11716 | 7.5% |
o | 11296 | 7.2% |
r | 9801 | 6.3% |
l | 9341 | 6.0% |
n | 9242 | 5.9% |
c | 8969 | 5.7% |
u | 7709 | 4.9% |
Other values (42) | 41491 | 26.5% |
Common
Value | Count | Frequency (%) |
(space) | 8281 | 95.5% |
. | 270 | 3.1% |
" | 119 | 1.4% |
; | 1 | < 0.1% |
, | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 165375 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 17848 | 10.8% |
a | 17007 | 10.3% |
e | 12283 | 7.4% |
s | 11716 | 7.1% |
o | 11296 | 6.8% |
r | 9801 | 5.9% |
l | 9341 | 5.6% |
n | 9242 | 5.6% |
c | 8969 | 5.4% |
(space) | 8281 | 5.0% |
Other values (47) | 49591 | 30.0% |
분류
Categorical
IMBALANCE
Distinct | 4 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 55.8 KiB |
Bacteria | 7008 |
---|---|
Yeast | 59 |
Archaea | 42 |
Mold | 13 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 7.9619489 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Mold |
---|---|
2nd row | Yeast |
3rd row | Mold |
4th row | Mold |
5th row | Mold |
Common Values
Value | Count | Frequency (%) |
Bacteria | 7008 | 98.4% |
Yeast | 59 | 0.8% |
Archaea | 42 | 0.6% |
Mold | 13 | 0.2% |
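The IMBALANCE flag on this column reflects how strongly one class dominates. The class shares and the mean label length reported above can both be recomputed from the four counts alone:

```python
# Counts copied from the "Common Values" table above.
counts = {"Bacteria": 7008, "Yeast": 59, "Archaea": 42, "Mold": 13}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

# Mean label length, weighting each label's character count by its frequency.
mean_length = sum(len(label) * n for label, n in counts.items()) / total

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
print(f"Mean length: {mean_length:.4f}")
```

The dominant class accounts for about 98.4% of rows, and the weighted mean length agrees with the reported 7.9619489.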
Common Values (Plot)
Value | Count | Frequency (%) |
bacteria | 7008 | 98.4% |
yeast | 59 | 0.8% |
archaea | 42 | 0.6% |
mold | 13 | 0.2% |
Variable | 자원번호 | 분류 |
---|---|---|
자원번호 | 1.000 | 0.188 |
분류 | 0.188 | 1.000 |
Variable | 자원번호 | 분류 |
---|---|---|
자원번호 | 1.000 | 0.113 |
분류 | 0.113 | 1.000 |
Index | 자원번호 | 자원명 | 분류 |
---|---|---|---|
0 | 6968 | Trichoderma reesei | Mold |
1 | 7965 | Candida albicans | Yeast |
2 | 6983 | Aspergillus oryzae var. oryzae | Mold |
3 | 6048 | Emericella nidulans | Mold |
4 | 6052 | Penicillium chrysogenum | Mold |
5 | 6080 | Penicillium roquefortii | Mold |
6 | 7123 | Ogataea polymorpha | Yeast |
7 | 7125 | Hanseniaspora valbyensis | Yeast |
8 | 7134 | Rhodosporidium toruloides | Yeast |
9 | 7183 | Zygosaccharomyces mrakii | Yeast |
Index | 자원번호 | 자원명 | 분류 |
---|---|---|---|
7112 | 8068 | Rhizobium rhizolycopersici | Bacteria |
7113 | 8069 | Azotobacter chroococcum subsp. chroococcum | Bacteria |
7114 | 8070 | Stenotrophomonas sepilia | Bacteria |
7115 | 8076 | Pseudomonas rhizovicinus | Bacteria |
7116 | 8089 | Halomonas elongata | Bacteria |
7117 | 8090 | Halomonas venusta | Bacteria |
7118 | 8091 | Pseudoxanthomonas helianthi | Bacteria |
7119 | 8114 | Halomonas malpeensis | Bacteria |
7120 | 8125 | Halomonas korlensis | Bacteria |
7121 | 8131 | Novosphingobium soli | Bacteria |