Overview

Dataset statistics

Number of variables: 3
Number of observations: 7122
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 174.0 KiB
Average record size in memory: 25.0 B
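
The average record size appears to be the total in-memory size divided by the number of observations. The following is a minimal arithmetic check of that reading, assuming 1 KiB = 1024 B; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 174.0
n_observations = 7122

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 25.0 B shown in the report.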

Variable types

Numeric: 1
Text: 1
Categorical: 1

Dataset

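Description: Resource information on the microorganisms, cell lines, molds, and other resources that the Biological Resource Center offers through its distribution service. The dataset has the following columns: 자원번호 (resource number), 자원명 (resource name), 분류 (classification).
URL: https://www.data.go.kr/data/3034156/fileData.do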

Alerts

분류 is highly imbalanced (93.0%) [Imbalance]
자원번호 has unique values [Unique]
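
The report does not state how the 93.0% imbalance figure is computed. One reading that reproduces it is a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class distribution and k is the number of classes. The sketch below tests that assumption against the 분류 counts reported later in this profile; the function name `imbalance_score` is hypothetical.

```python
import math

# Class counts for 분류 as reported in this profile.
counts = {"Bacteria": 7008, "Yeast": 59, "Archaea": 42, "Mold": 13}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, 1 for a single class.

    Assumption: this normalized-entropy form is what the report uses.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total)
        for c in class_counts.values()
        if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

With these counts the score comes out at about 0.930, which matches the alert, so the assumption is at least consistent with the reported value.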

Reproduction

Analysis started: 2023-12-12 05:43:01.858878
Analysis finished: 2023-12-12 05:43:02.609817
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
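
A report of this kind can be regenerated along the following lines. This is only a sketch: the CSV path, the encoding, the report title, and the helper name `build_profile` are assumptions, and the settings stored in the original config.json are not known, so library defaults are used.

```python
def build_profile(csv_path, output_html="report.html", encoding="utf-8"):
    """Profile a CSV file with ydata-profiling and write an HTML report.

    Assumptions: csv_path points to a local copy of the dataset, and the
    encoding is a guess that may need changing for the actual file.
    """
    # Imported lazily so this sketch loads even when the third-party
    # packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Biological resource profile")
    report.to_file(output_html)
    return report
```

A call such as `build_profile("resources.csv")` (hypothetical file name) would then write `report.html` to the working directory.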

Variables

자원번호
Real number (ℝ)

UNIQUE 

Distinct: 7122
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32850.93
Minimum: 1018
Maximum: 92972
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 62.7 KiB

Quantile statistics

Minimum: 1018
5-th percentile: 3801.05
Q1: 13619.5
Median: 23647
Q3: 49545.5
95-th percentile: 82383.85
Maximum: 92972
Range: 91954
Interquartile range (IQR): 35926
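
The range and the interquartile range follow directly from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. A short check using only the values listed above (variable names are illustrative):

```python
# Quantile values copied from the report.
minimum, maximum = 1018, 92972
q1, q3 = 13619.5, 49545.5

value_range = maximum - minimum  # spread of the full data
iqr = q3 - q1                    # spread of the middle 50% of the data

print(value_range, iqr)
```

Both results agree with the reported Range (91954) and IQR (35926).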

Descriptive statistics

Standard deviation: 23731.609
Coefficient of variation (CV): 0.72240297
Kurtosis: -0.44545375
Mean: 32850.93
Median Absolute Deviation (MAD): 14544.5
Skewness: 0.75787221
Sum: 2.3396432 × 10^8
Variance: 5.6318927 × 10^8
Monotonicity: Not monotonic
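
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported figures against those identities; it works from the rounded values printed in the report, so agreement is only expected to several significant digits.

```python
import math

# Values copied from the report (already rounded there).
n = 7122
mean = 32850.93
std = 23731.609

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all values

# Compare against the reported figures with a loose relative tolerance.
print(math.isclose(cv, 0.72240297, rel_tol=1e-5))
print(math.isclose(variance, 5.6318927e8, rel_tol=1e-5))
print(math.isclose(total, 2.3396432e8, rel_tol=1e-5))
```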
Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
6968                       1  < 0.1%
52305                      1  < 0.1%
52344                      1  < 0.1%
52343                      1  < 0.1%
52339                      1  < 0.1%
52336                      1  < 0.1%
52335                      1  < 0.1%
52325                      1  < 0.1%
52323                      1  < 0.1%
52318                      1  < 0.1%
Other values (7112)     7112  99.9%

Minimum 10 values

Value                  Count  Frequency (%)
1018                       1  < 0.1%
1036                       1  < 0.1%
1038                       1  < 0.1%
1060                       1  < 0.1%
1063                       1  < 0.1%
1066                       1  < 0.1%
1077                       1  < 0.1%
1080                       1  < 0.1%
1372                       1  < 0.1%
1373                       1  < 0.1%

Maximum 10 values

Value                  Count  Frequency (%)
92972                      1  < 0.1%
92971                      1  < 0.1%
92890                      1  < 0.1%
92889                      1  < 0.1%
92879                      1  < 0.1%
92878                      1  < 0.1%
92877                      1  < 0.1%
92876                      1  < 0.1%
92872                      1  < 0.1%
92860                      1  < 0.1%

자원명
Text

Distinct: 7039
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 55.8 KiB

Length

Max length: 80
Median length: 52
Mean length: 23.220303
Min length: 11

Characters and Unicode

Total characters: 165375
Distinct characters: 57
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
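
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Zs` for Space Separator, `Po` for Other Punctuation). The sketch below counts categories for one of the sample names. It is not necessarily how the report computes its figures, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (two-letter codes) in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of 자원명 shown in this report.
sample = "Aspergillus oryzae var. oryzae"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

The four codes that appear for this sample correspond to the four categories the report lists for the column.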

Unique

Unique: 6984
Unique (%): 98.1%

Sample

1st row: Trichoderma reesei
2nd row: Candida albicans
3rd row: Aspergillus oryzae var. oryzae
4th row: Emericella nidulans
5th row: Penicillium chrysogenum

Value                  Count  Frequency (%)
paenibacillus            169  1.2%
subsp                    145  1.0%
flavobacterium           143  1.0%
sp                       107  0.7%
nocardioides             101  0.7%
soli                      98  0.7%
sediminis                 93  0.6%
clostridium               82  0.6%
hymenobacter              75  0.5%
streptomyces              74  0.5%
Other values (5461)    13496  92.5%

Most occurring characters

Value                  Count  Frequency (%)
i                      17848  10.8%
a                      17007  10.3%
e                      12283  7.4%
s                      11716  7.1%
o                      11296  6.8%
r                       9801  5.9%
l                       9341  5.6%
n                       9242  5.6%
c                       8969  5.4%
(space)                 8281  5.0%
Other values (47)      49591  30.0%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter      149579  90.4%
Space Separator         8281  5.0%
Uppercase Letter        7124  4.3%
Other Punctuation        391  0.2%

Most frequent character per category

Lowercase Letter

Value                  Count  Frequency (%)
i                      17848  11.9%
a                      17007  11.4%
e                      12283  8.2%
s                      11716  7.8%
o                      11296  7.6%
r                       9801  6.6%
l                       9341  6.2%
n                       9242  6.2%
c                       8969  6.0%
u                       7709  5.2%
Other values (16)      34367  23.0%

Uppercase Letter

Value                  Count  Frequency (%)
P                       1011  14.2%
S                        819  11.5%
A                        816  11.5%
M                        648  9.1%
C                        547  7.7%
L                        440  6.2%
B                        311  4.4%
N                        308  4.3%
H                        304  4.3%
F                        304  4.3%
Other values (16)       1616  22.7%

Other Punctuation

Value                  Count  Frequency (%)
.                        270  69.1%
"                        119  30.4%
;                          1  0.3%
,                          1  0.3%

Space Separator

Value                  Count  Frequency (%)
(space)                 8281  100.0%

Most occurring scripts

Value                  Count  Frequency (%)
Latin                 156703  94.8%
Common                  8672  5.2%

Most frequent character per script

Latin

Value                  Count  Frequency (%)
i                      17848  11.4%
a                      17007  10.9%
e                      12283  7.8%
s                      11716  7.5%
o                      11296  7.2%
r                       9801  6.3%
l                       9341  6.0%
n                       9242  5.9%
c                       8969  5.7%
u                       7709  4.9%
Other values (42)      41491  26.5%

Common

Value                  Count  Frequency (%)
(space)                 8281  95.5%
.                        270  3.1%
"                        119  1.4%
;                          1  < 0.1%
,                          1  < 0.1%

Most occurring blocks

Value                  Count  Frequency (%)
ASCII                 165375  100.0%

Most frequent character per block

ASCII

Value                  Count  Frequency (%)
i                      17848  10.8%
a                      17007  10.3%
e                      12283  7.4%
s                      11716  7.1%
o                      11296  6.8%
r                       9801  5.9%
l                       9341  5.6%
n                       9242  5.6%
c                       8969  5.4%
(space)                 8281  5.0%
Other values (47)      49591  30.0%

분류
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.8 KiB

Value                  Count
Bacteria                7008
Yeast                     59
Archaea                   42
Mold                      13

Length

Max length: 8
Median length: 8
Mean length: 7.9619489
Min length: 4
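
The length statistics for 분류 can be recovered from the category counts alone, because every row holds one of four labels: the mean length is the count-weighted average of the label lengths. A small check, assuming the labels are stored exactly as displayed (no padding or trailing whitespace):

```python
# Category counts for 분류 as reported in this profile.
counts = {"Bacteria": 7008, "Yeast": 59, "Archaea": 42, "Mold": 13}

total_rows = sum(counts.values())
total_chars = sum(len(label) * n for label, n in counts.items())

mean_length = total_chars / total_rows
min_length = min(len(label) for label in counts)
max_length = max(len(label) for label in counts)

print(f"{mean_length:.7f}", min_length, max_length)
```

The weighted mean agrees with the reported 7.9619489, and the shortest and longest labels (Mold, Bacteria) give the reported minimum of 4 and maximum of 8.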

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mold
2nd row: Yeast
3rd row: Mold
4th row: Mold
5th row: Mold

Common Values

Value                  Count  Frequency (%)
Bacteria                7008  98.4%
Yeast                     59  0.8%
Archaea                   42  0.6%
Mold                      13  0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                  Count  Frequency (%)
bacteria                7008  98.4%
yeast                     59  0.8%
archaea                   42  0.6%
mold                      13  0.2%

Interactions

[Figure: interactions plot]

Correlations

            자원번호   분류
자원번호    1.000     0.188
분류        0.188     1.000

            자원번호   분류
자원번호    1.000     0.113
분류        0.113     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  자원번호  자원명                                      분류
0      6968      Trichoderma reesei                          Mold
1      7965      Candida albicans                            Yeast
2      6983      Aspergillus oryzae var. oryzae              Mold
3      6048      Emericella nidulans                         Mold
4      6052      Penicillium chrysogenum                     Mold
5      6080      Penicillium roquefortii                     Mold
6      7123      Ogataea polymorpha                          Yeast
7      7125      Hanseniaspora valbyensis                    Yeast
8      7134      Rhodosporidium toruloides                   Yeast
9      7183      Zygosaccharomyces mrakii                    Yeast

Last rows

Index  자원번호  자원명                                      분류
7112   8068      Rhizobium rhizolycopersici                  Bacteria
7113   8069      Azotobacter chroococcum subsp. chroococcum  Bacteria
7114   8070      Stenotrophomonas sepilia                    Bacteria
7115   8076      Pseudomonas rhizovicinus                    Bacteria
7116   8089      Halomonas elongata                          Bacteria
7117   8090      Halomonas venusta                           Bacteria
7118   8091      Pseudoxanthomonas helianthi                 Bacteria
7119   8114      Halomonas malpeensis                        Bacteria
7120   8125      Halomonas korlensis                         Bacteria
7121   8131      Novosphingobium soli                        Bacteria