Overview

Dataset statistics

Number of variables: 9
Number of observations: 2366
Missing cells: 9445
Missing cells (%): 44.4%
Duplicate rows: 10
Duplicate rows (%): 0.4%
Total size in memory: 166.5 KiB
Average record size in memory: 72.1 B
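
The percentages and the average record size above follow from the raw counts. A minimal Python check, using only the figures reported in this section (the variable names are illustrative):

```python
# Figures copied from the Dataset statistics block above.
n_variables = 9
n_observations = 2366
missing_cells = 9445
duplicate_rows = 10
total_size_kib = 166.5

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```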

Variable types

Categorical: 2
Text: 7

Dataset

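Description: Information on R&D papers in agriculture, forestry, fisheries, and food production and processing (농림수산식품 생산·가공R&D 논문 정보)
Author: 농림식품기술기획평가원 (Korea Institute of Planning and Evaluation for Technology in Food, Agriculture and Forestry, IPET)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220211000000001835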

Alerts

Unnamed: 6 has constant value "" [Constant]
Unnamed: 7 has constant value "" [Constant]
Unnamed: 8 has constant value "" [Constant]
Dataset has 10 (0.4%) duplicate rows [Duplicates]
발표년월 is highly overall correlated with Unnamed: 5 [High correlation]
Unnamed: 5 is highly overall correlated with 발표년월 [High correlation]
Unnamed: 5 is highly imbalanced (98.2%) [Imbalance]
Unnamed: 4 has 2353 (99.5%) missing values [Missing]
Unnamed: 6 has 2364 (99.9%) missing values [Missing]
Unnamed: 7 has 2364 (99.9%) missing values [Missing]
Unnamed: 8 has 2364 (99.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-11 03:15:07.461699
Analysis finished: 2023-12-11 03:15:08.969291
Duration: 1.51 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
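
A report like this one can be regenerated with ydata-profiling. The sketch below is an assumption about how it was produced, not the original command: the function name, CSV path, and encoding are hypothetical, and the `dataset` metadata simply repeats the Dataset block above.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write the HTML report (sketch)."""
    # Imported lazily so the function can be defined even where the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the source file.
    df = pd.read_csv(csv_path, encoding="utf-8")

    profile = ProfileReport(
        df,
        title="Profiling report",
        dataset={
            "description": "농림수산식품 생산·가공R&D 논문 정보",
            "author": "농림식품기술기획평가원",
            "url": "https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220211000000001835",
        },
    )
    profile.to_file(out_path)
```

Calling `build_report("papers.csv")` with a hypothetical local copy of the data would write `report.html` to the working directory.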

Variables

발표년월 (publication year and month)
Categorical

HIGH CORRELATION 

Distinct: 48
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB

Value | Count
2012-12 | 817
2011-12 | 482
2010-12 | 203
2009-12 | 104
2012-09 | 58
Other values (43) | 702

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-12
2nd row: 2013-12
3rd row: 2013-12
4th row: 2013-12
5th row: 2013-12

Common Values

Value | Count | Frequency (%)
2012-12 | 817 | 34.5%
2011-12 | 482 | 20.4%
2010-12 | 203 | 8.6%
2009-12 | 104 | 4.4%
2012-09 | 58 | 2.5%
2012-06 | 56 | 2.4%
2011-06 | 40 | 1.7%
2011-09 | 36 | 1.5%
2012-03 | 36 | 1.5%
2011-03 | 31 | 1.3%
Other values (38) | 503 | 21.3%
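
Because every value is a seven-character `YYYY-MM` string, the column can be parsed into year and month for further analysis. A small sketch using the ten counts listed above; the remaining 38 values are not itemised in the report, so the December share computed here is only a lower bound:

```python
from collections import Counter
from datetime import datetime

# Top-10 counts copied from the Common Values table above.
top_counts = {
    "2012-12": 817, "2011-12": 482, "2010-12": 203, "2009-12": 104,
    "2012-09": 58, "2012-06": 56, "2011-06": 40, "2011-09": 36,
    "2012-03": 36, "2011-03": 31,
}
n_observations = 2366

by_month = Counter()
by_year = Counter()
for value, count in top_counts.items():
    parsed = datetime.strptime(value, "%Y-%m")  # raises on malformed input
    by_month[parsed.month] += count
    by_year[parsed.year] += count

december_share = 100 * by_month[12] / n_observations
print(f"December (listed values only): {by_month[12]} rows, {december_share:.1f}%")
```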

Length

[Figure: histogram of lengths of the category]

논문명 (paper title)
Text

Distinct: 2323
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB

Length

Max length: 244
Median length: 163
Mean length: 69.894336
Min length: 12

Characters and Unicode

Total characters: 165370
Distinct characters: 774
Distinct categories: 16
Distinct scripts: 5
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
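
The category counts reported for text variables correspond to Unicode general categories. As an illustration (not the profiler's own code), Python's standard `unicodedata` module returns the two-letter category code for each character, for example `Ll` for Lowercase Letter, `Lo` for Other Letter, and `Zs` for Space Separator:

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. 'Ll', 'Lu', 'Zs') in text."""
    return Counter(unicodedata.category(ch) for ch in text)

# A short mixed Latin/Hangul string built from tokens seen in this column.
counts = category_counts("UV-B lamp 영향")
print(counts)
```

Hangul syllables fall under `Lo`, which is why Other Letter is the second-largest category for a column that mixes English and Korean titles.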

Unique

Unique: 2284
Unique (%): 96.5%

Sample

1st row: Molecular characterization and phylogenetic analysis of deformed wing viruses.isolated from South Korea
2nd row: Development of RAPD-SCAR Molecular Marker Related to Seed-hair Characteristic in Carrot
3rd row: Molecular characterization and phylogenetic analysis of deformed wing viruses isolated from South Korea
4th row: Growth and phenolic content of sowthistle grown in a closed-type plant production system with a UV-A or UV-B lamp
5th row: Light intensity and photoperiod influence the growth and development of hydroponically grown leaf lettuce in a closed-type plant factory system
Value | Count | Frequency (%)
of | 1247 | 4.7%
and | 716 | 2.7%
in | 598 | 2.3%
(token not rendered) | 448 | 1.7%
the | 358 | 1.4%
미치는 | 241 | 0.9%
on | 240 | 0.9%
영향 | 232 | 0.9%
a | 197 | 0.7%
따른 | 172 | 0.6%
Other values (9383) | 22058 | 83.2%
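
The word counts above are consistent with lower-casing each value and splitting it on whitespace. That tokenisation is an assumption here, since the report does not document it. A sketch with two of the sample titles:

```python
from collections import Counter

def word_counts(values):
    """Count lower-cased, whitespace-separated tokens across values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

titles = [
    "Development of RAPD-SCAR Molecular Marker Related to Seed-hair Characteristic in Carrot",
    "Growth and phenolic content of sowthistle grown in a closed-type plant production system with a UV-A or UV-B lamp",
]
counts = word_counts(titles)
print(counts.most_common(3))
```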

Most occurring characters

ValueCountFrequency (%)
24163
 
14.6%
e 10225
 
6.2%
i 9391
 
5.7%
a 8726
 
5.3%
o 8524
 
5.2%
n 7911
 
4.8%
t 7626
 
4.6%
r 6651
 
4.0%
s 5969
 
3.6%
l 4522
 
2.7%
Other values (764) 71662
43.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 97891 | 59.2%
Other Letter | 29852 | 18.1%
Space Separator | 24163 | 14.6%
Uppercase Letter | 9973 | 6.0%
Other Punctuation | 1032 | 0.6%
Decimal Number | 712 | 0.4%
Dash Punctuation | 700 | 0.4%
Open Punctuation | 408 | 0.2%
Close Punctuation | 408 | 0.2%
Final Punctuation | 89 | 0.1%
Other values (6) | 142 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1143
 
3.8%
830
 
2.8%
779
 
2.6%
525
 
1.8%
496
 
1.7%
456
 
1.5%
453
 
1.5%
432
 
1.4%
427
 
1.4%
378
 
1.3%
Other values (662) 23933
80.2%
Lowercase Letter
ValueCountFrequency (%)
e 10225
10.4%
i 9391
 
9.6%
a 8726
 
8.9%
o 8524
 
8.7%
n 7911
 
8.1%
t 7626
 
7.8%
r 6651
 
6.8%
s 5969
 
6.1%
l 4522
 
4.6%
c 4291
 
4.4%
Other values (22) 24055
24.6%
Uppercase Letter
ValueCountFrequency (%)
C 959
 
9.6%
S 922
 
9.2%
P 900
 
9.0%
A 871
 
8.7%
E 628
 
6.3%
R 567
 
5.7%
M 534
 
5.4%
T 484
 
4.9%
I 430
 
4.3%
D 418
 
4.2%
Other values (16) 3260
32.7%
Other Punctuation
ValueCountFrequency (%)
, 487
47.2%
. 216
20.9%
' 165
 
16.0%
: 92
 
8.9%
/ 38
 
3.7%
; 9
 
0.9%
9
 
0.9%
· 7
 
0.7%
? 5
 
0.5%
& 3
 
0.3%
Decimal Number
ValueCountFrequency (%)
1 190
26.7%
2 138
19.4%
3 83
11.7%
0 75
 
10.5%
4 45
 
6.3%
6 44
 
6.2%
5 41
 
5.8%
9 33
 
4.6%
8 33
 
4.6%
7 30
 
4.2%
Math Symbol
ValueCountFrequency (%)
| 19
51.4%
× 7
 
18.9%
< 4
 
10.8%
> 4
 
10.8%
+ 3
 
8.1%
Open Punctuation
ValueCountFrequency (%)
( 402
98.5%
[ 5
 
1.2%
1
 
0.2%
Close Punctuation
ValueCountFrequency (%)
) 402
98.5%
] 5
 
1.2%
1
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
- 699
99.9%
1
 
0.1%
Final Punctuation
ValueCountFrequency (%)
78
87.6%
11
 
12.4%
Initial Punctuation
ValueCountFrequency (%)
76
87.4%
11
 
12.6%
Letter Number
ValueCountFrequency (%)
4
66.7%
2
33.3%
Space Separator
ValueCountFrequency (%)
24163
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 9
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 107858 | 65.2%
Hangul | 29845 | 18.0%
Common | 27648 | 16.7%
Greek | 12 | < 0.1%
Han | 7 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1143
 
3.8%
830
 
2.8%
779
 
2.6%
525
 
1.8%
496
 
1.7%
456
 
1.5%
453
 
1.5%
432
 
1.4%
427
 
1.4%
378
 
1.3%
Other values (655) 23926
80.2%
Latin
ValueCountFrequency (%)
e 10225
 
9.5%
i 9391
 
8.7%
a 8726
 
8.1%
o 8524
 
7.9%
n 7911
 
7.3%
t 7626
 
7.1%
r 6651
 
6.2%
s 5969
 
5.5%
l 4522
 
4.2%
c 4291
 
4.0%
Other values (45) 34022
31.5%
Common
ValueCountFrequency (%)
24163
87.4%
- 699
 
2.5%
, 487
 
1.8%
( 402
 
1.5%
) 402
 
1.5%
. 216
 
0.8%
1 190
 
0.7%
' 165
 
0.6%
2 138
 
0.5%
: 92
 
0.3%
Other values (32) 694
 
2.5%
Han
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Greek
ValueCountFrequency (%)
β 8
66.7%
τ 1
 
8.3%
δ 1
 
8.3%
κ 1
 
8.3%
α 1
 
8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 135295 | 81.8%
Hangul | 29845 | 18.0%
Punctuation | 177 | 0.1%
None | 39 | < 0.1%
CJK | 7 | < 0.1%
Number Forms | 6 | < 0.1%
CJK Compat | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
24163
17.9%
e 10225
 
7.6%
i 9391
 
6.9%
a 8726
 
6.4%
o 8524
 
6.3%
n 7911
 
5.8%
t 7626
 
5.6%
r 6651
 
4.9%
s 5969
 
4.4%
l 4522
 
3.3%
Other values (73) 41587
30.7%
Hangul
ValueCountFrequency (%)
1143
 
3.8%
830
 
2.8%
779
 
2.6%
525
 
1.8%
496
 
1.7%
456
 
1.5%
453
 
1.5%
432
 
1.4%
427
 
1.4%
378
 
1.3%
Other values (655) 23926
80.2%
Punctuation
ValueCountFrequency (%)
78
44.1%
76
42.9%
11
 
6.2%
11
 
6.2%
1
 
0.6%
None
ValueCountFrequency (%)
9
23.1%
β 8
20.5%
× 7
17.9%
· 7
17.9%
ß 2
 
5.1%
1
 
2.6%
1
 
2.6%
τ 1
 
2.6%
δ 1
 
2.6%
κ 1
 
2.6%
Number Forms
ValueCountFrequency (%)
4
66.7%
2
33.3%
CJK
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
CJK Compat
ValueCountFrequency (%)
1
100.0%

저자 (author)
Text

Distinct: 1596
Distinct (%): 67.5%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB

Length

Max length: 39
Median length: 3
Mean length: 5.4763314
Min length: 2

Characters and Unicode

Total characters: 12957
Distinct characters: 319
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1154
Unique (%): 48.8%

Sample

1st row: Reddy, Kondreddy Eswar
2nd row: Eun-JoShim
3rd row: Seung-WonKang
4th row: Min-JeongLee
5th row: JeongHwaKang
Value | Count | Frequency (%)
kim | 64 | 2.1%
lee | 47 | 1.5%
park | 35 | 1.1%
choi | 18 | 0.6%
최기춘 | 15 | 0.5%
김남훈 | 12 | 0.4%
s | 12 | 0.4%
김성은 | 11 | 0.4%
h | 11 | 0.4%
kang | 11 | 0.4%
Other values (1712) | 2811 | 92.3%

Most occurring characters

ValueCountFrequency (%)
n 737
 
5.7%
681
 
5.3%
o 545
 
4.2%
a 444
 
3.4%
e 443
 
3.4%
, 405
 
3.1%
379
 
2.9%
u 371
 
2.9%
i 351
 
2.7%
g 342
 
2.6%
Other values (309) 8259
63.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 5486
42.3%
Lowercase Letter 4311
33.3%
Uppercase Letter 1650
 
12.7%
Space Separator 681
 
5.3%
Other Punctuation 607
 
4.7%
Dash Punctuation 197
 
1.5%
Close Punctuation 10
 
0.1%
Open Punctuation 10
 
0.1%
Decimal Number 4
 
< 0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
379
 
6.9%
261
 
4.8%
191
 
3.5%
164
 
3.0%
122
 
2.2%
107
 
2.0%
106
 
1.9%
96
 
1.7%
91
 
1.7%
86
 
1.6%
Other values (249) 3883
70.8%
Uppercase Letter
ValueCountFrequency (%)
J 208
12.6%
S 207
12.5%
K 193
11.7%
H 189
11.5%
Y 121
 
7.3%
C 95
 
5.8%
M 89
 
5.4%
L 82
 
5.0%
P 62
 
3.8%
I 45
 
2.7%
Other values (16) 359
21.8%
Lowercase Letter
ValueCountFrequency (%)
n 737
17.1%
o 545
12.6%
a 444
10.3%
e 443
10.3%
u 371
8.6%
i 351
8.1%
g 342
7.9%
h 202
 
4.7%
m 165
 
3.8%
y 144
 
3.3%
Other values (15) 567
13.2%
Other Punctuation
ValueCountFrequency (%)
, 405
66.7%
. 179
29.5%
; 23
 
3.8%
Space Separator
ValueCountFrequency (%)
681
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 197
100.0%
Close Punctuation
ValueCountFrequency (%)
) 10
100.0%
Open Punctuation
ValueCountFrequency (%)
( 10
100.0%
Decimal Number
ValueCountFrequency (%)
1 4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5961
46.0%
Hangul 5486
42.3%
Common 1510
 
11.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
379
 
6.9%
261
 
4.8%
191
 
3.5%
164
 
3.0%
122
 
2.2%
107
 
2.0%
106
 
1.9%
96
 
1.7%
91
 
1.7%
86
 
1.6%
Other values (249) 3883
70.8%
Latin
ValueCountFrequency (%)
n 737
 
12.4%
o 545
 
9.1%
a 444
 
7.4%
e 443
 
7.4%
u 371
 
6.2%
i 351
 
5.9%
g 342
 
5.7%
J 208
 
3.5%
S 207
 
3.5%
h 202
 
3.4%
Other values (41) 2111
35.4%
Common
ValueCountFrequency (%)
681
45.1%
, 405
26.8%
- 197
 
13.0%
. 179
 
11.9%
; 23
 
1.5%
) 10
 
0.7%
( 10
 
0.7%
1 4
 
0.3%
_ 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7471
57.7%
Hangul 5486
42.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 737
 
9.9%
681
 
9.1%
o 545
 
7.3%
a 444
 
5.9%
e 443
 
5.9%
, 405
 
5.4%
u 371
 
5.0%
i 351
 
4.7%
g 342
 
4.6%
J 208
 
2.8%
Other values (50) 2944
39.4%
Hangul
ValueCountFrequency (%)
379
 
6.9%
261
 
4.8%
191
 
3.5%
164
 
3.0%
122
 
2.2%
107
 
2.0%
106
 
1.9%
96
 
1.7%
91
 
1.7%
86
 
1.6%
Other values (249) 3883
70.8%

게재지명 (journal name)
Text

Distinct: 720
Distinct (%): 30.4%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB

Length

Max length: 123
Median length: 79
Mean length: 19.954353
Min length: 4

Characters and Unicode

Total characters: 47212
Distinct characters: 268
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
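
The mean length of a text variable is its total character count divided by the number of observations, which can be checked against the figures in this report. The dictionary below copies the reported totals for the three fully populated text variables. Assigning the first and third entries to 논문명 and 게재지명 assumes the column order shown in the Sample section:

```python
n_observations = 2366

# (total characters, reported mean length) per text variable.
reported = {
    "논문명": (165370, 69.894336),
    "저자": (12957, 5.4763314),
    "게재지명": (47212, 19.954353),
}

computed = {}
for name, (total_chars, mean_length) in reported.items():
    computed[name] = total_chars / n_observations
    print(f"{name}: computed {computed[name]:.6f}, reported {mean_length}")
```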

Unique

Unique: 450
Unique (%): 19.0%

Sample

1st row: VETERINARY MICROBIOLOGY
2nd row: Korean Journal of Horticultural Science & Technology
3rd row: Veterinary Microbiology
4th row: HORTICULTURE ENVIRONMENT AND BIOTECHNOLOGY
5th row: Hort. Environ. Biotechnol.
Value | Count | Frequency (%)
of | 561 | 9.0%
journal | 520 | 8.4%
and | 294 | 4.7%
science | 253 | 4.1%
korean | 197 | 3.2%
(token not rendered) | 168 | 2.7%
biotechnology | 120 | 1.9%
한국육종학회지 | 115 | 1.9%
technology | 108 | 1.7%
food | 107 | 1.7%
Other values (703) | 3763 | 60.6%

Most occurring characters

ValueCountFrequency (%)
3860
 
8.2%
o 2074
 
4.4%
O 1683
 
3.6%
e 1590
 
3.4%
n 1523
 
3.2%
A 1378
 
2.9%
E 1331
 
2.8%
a 1287
 
2.7%
N 1284
 
2.7%
1241
 
2.6%
Other values (258) 29961
63.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 16115
34.1%
Lowercase Letter 15491
32.8%
Other Letter 11213
23.8%
Space Separator 3860
 
8.2%
Other Punctuation 205
 
0.4%
Decimal Number 110
 
0.2%
Open Punctuation 68
 
0.1%
Close Punctuation 68
 
0.1%
Dash Punctuation 63
 
0.1%
Math Symbol 19
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1241
 
11.1%
1136
 
10.1%
1103
 
9.8%
997
 
8.9%
961
 
8.6%
293
 
2.6%
292
 
2.6%
246
 
2.2%
235
 
2.1%
222
 
2.0%
Other values (187) 4487
40.0%
Lowercase Letter
ValueCountFrequency (%)
o 2074
13.4%
e 1590
10.3%
n 1523
9.8%
a 1287
8.3%
i 1199
 
7.7%
r 1097
 
7.1%
l 1095
 
7.1%
c 1000
 
6.5%
t 871
 
5.6%
u 602
 
3.9%
Other values (16) 3153
20.4%
Uppercase Letter
ValueCountFrequency (%)
O 1683
 
10.4%
A 1378
 
8.6%
E 1331
 
8.3%
N 1284
 
8.0%
R 1106
 
6.9%
L 1063
 
6.6%
I 1044
 
6.5%
C 1012
 
6.3%
S 910
 
5.6%
T 820
 
5.1%
Other values (16) 4484
27.8%
Decimal Number
ValueCountFrequency (%)
0 44
40.0%
2 26
23.6%
1 17
 
15.5%
6 6
 
5.5%
9 6
 
5.5%
4 5
 
4.5%
7 3
 
2.7%
5 2
 
1.8%
3 1
 
0.9%
Other Punctuation
ValueCountFrequency (%)
& 136
66.3%
. 59
28.8%
: 7
 
3.4%
2
 
1.0%
' 1
 
0.5%
Space Separator
ValueCountFrequency (%)
3860
100.0%
Open Punctuation
ValueCountFrequency (%)
( 68
100.0%
Close Punctuation
ValueCountFrequency (%)
) 68
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 63
100.0%
Math Symbol
ValueCountFrequency (%)
= 19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 31606
66.9%
Hangul 11080
 
23.5%
Common 4393
 
9.3%
Han 133
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1241
 
11.2%
1136
 
10.3%
1103
 
10.0%
997
 
9.0%
961
 
8.7%
293
 
2.6%
292
 
2.6%
246
 
2.2%
235
 
2.1%
222
 
2.0%
Other values (164) 4354
39.3%
Latin
ValueCountFrequency (%)
o 2074
 
6.6%
O 1683
 
5.3%
e 1590
 
5.0%
n 1523
 
4.8%
A 1378
 
4.4%
E 1331
 
4.2%
a 1287
 
4.1%
N 1284
 
4.1%
i 1199
 
3.8%
R 1106
 
3.5%
Other values (42) 17151
54.3%
Han
ValueCountFrequency (%)
16
12.0%
16
12.0%
11
 
8.3%
10
 
7.5%
10
 
7.5%
8
 
6.0%
8
 
6.0%
8
 
6.0%
7
 
5.3%
7
 
5.3%
Other values (13) 32
24.1%
Common
ValueCountFrequency (%)
3860
87.9%
& 136
 
3.1%
( 68
 
1.5%
) 68
 
1.5%
- 63
 
1.4%
. 59
 
1.3%
0 44
 
1.0%
2 26
 
0.6%
= 19
 
0.4%
1 17
 
0.4%
Other values (9) 33
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 35997
76.2%
Hangul 11080
 
23.5%
CJK 133
 
0.3%
None 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3860
 
10.7%
o 2074
 
5.8%
O 1683
 
4.7%
e 1590
 
4.4%
n 1523
 
4.2%
A 1378
 
3.8%
E 1331
 
3.7%
a 1287
 
3.6%
N 1284
 
3.6%
i 1199
 
3.3%
Other values (60) 18788
52.2%
Hangul
ValueCountFrequency (%)
1241
 
11.2%
1136
 
10.3%
1103
 
10.0%
997
 
9.0%
961
 
8.7%
293
 
2.6%
292
 
2.6%
246
 
2.2%
235
 
2.1%
222
 
2.0%
Other values (164) 4354
39.3%
CJK
ValueCountFrequency (%)
16
12.0%
16
12.0%
11
 
8.3%
10
 
7.5%
10
 
7.5%
8
 
6.0%
8
 
6.0%
8
 
6.0%
7
 
5.3%
7
 
5.3%
Other values (13) 32
24.1%
None
ValueCountFrequency (%)
2
100.0%

Unnamed: 4
Text

MISSING 

Distinct: 7
Distinct (%): 53.8%
Missing: 2353
Missing (%): 99.5%
Memory size: 18.6 KiB
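
For a mostly empty column the distinct percentage is easy to misread: 7 distinct values out of 2366 rows would be 0.3%, not 53.8%. The reported figure is reproduced when the denominator is the number of non-missing values, as this check shows (an inference from the numbers, not from the profiler's documentation; the function name is illustrative):

```python
n_observations = 2366

def distinct_pct(n_distinct: int, n_missing: int) -> float:
    """Distinct values as a percentage of the non-missing values."""
    n_present = n_observations - n_missing
    return 100 * n_distinct / n_present

# Unnamed: 4 has 7 distinct values and 2353 missing, so 13 are present.
print(f"{distinct_pct(7, 2353):.1f}%")
```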

Length

Max length: 27
Median length: 26
Mean length: 14.538462
Min length: 5

Characters and Unicode

Total characters: 189
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 38.5%

Sample

1st row Environment
2nd row Biotechnology
3rd row Environment
4th row Environment
5th row Environment
Value | Count | Frequency (%)
environment | 9 | 47.4%
(token not rendered) | 3 | 15.8%
bioenergy | 2 | 10.5%
agriculture | 2 | 10.5%
biotechnology | 1 | 5.3%
정책연구 | 1 | 5.3%
ecosystems | 1 | 5.3%

Most occurring characters

ValueCountFrequency (%)
n 30
15.9%
20
10.6%
e 17
9.0%
r 15
7.9%
o 15
7.9%
i 14
 
7.4%
t 13
 
6.9%
E 10
 
5.3%
m 10
 
5.3%
v 9
 
4.8%
Other values (15) 36
19.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 149
78.8%
Space Separator 20
 
10.6%
Uppercase Letter 13
 
6.9%
Other Letter 4
 
2.1%
Other Punctuation 3
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 30
20.1%
e 17
11.4%
r 15
10.1%
o 15
10.1%
i 14
9.4%
t 13
8.7%
m 10
 
6.7%
v 9
 
6.0%
g 5
 
3.4%
c 4
 
2.7%
Other values (6) 17
11.4%
Other Letter
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Uppercase Letter
ValueCountFrequency (%)
E 10
76.9%
A 2
 
15.4%
B 1
 
7.7%
Space Separator
ValueCountFrequency (%)
20
100.0%
Other Punctuation
ValueCountFrequency (%)
& 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 162
85.7%
Common 23
 
12.2%
Hangul 4
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 30
18.5%
e 17
10.5%
r 15
9.3%
o 15
9.3%
i 14
8.6%
t 13
8.0%
E 10
 
6.2%
m 10
 
6.2%
v 9
 
5.6%
g 5
 
3.1%
Other values (9) 24
14.8%
Hangul
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Common
ValueCountFrequency (%)
20
87.0%
& 3
 
13.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 185
97.9%
Hangul 4
 
2.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 30
16.2%
20
10.8%
e 17
9.2%
r 15
8.1%
o 15
8.1%
i 14
7.6%
t 13
7.0%
E 10
 
5.4%
m 10
 
5.4%
v 9
 
4.9%
Other values (11) 32
17.3%
Hangul
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Unnamed: 5
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB

Value | Count
<NA> | 2357
and Biotechnology | 5
biowastes | 2
and Biochemistry | 1
and Biotechnology(electronic version) | 1

Length

Max length: 38
Median length: 4
Mean length: 4.0545224
Min length: 4

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 2357 | 99.6%
and Biotechnology | 5 | 0.2%
biowastes | 2 | 0.1%
and Biochemistry | 1 | < 0.1%
and Biotechnology(electronic version) | 1 | < 0.1%
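
The 98.2% imbalance flagged for this variable can be reproduced from the counts above as one minus the normalised Shannon entropy of the class distribution. That this is the formula behind the alert is an assumption, made because it matches the reported value; the function name is illustrative:

```python
from math import log2

def imbalance_score(counts):
    """1 - H / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # with a single class there is no entropy to normalise
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(k)

# Counts for <NA>, "and Biotechnology", "biowastes", "and Biochemistry",
# and "and Biotechnology(electronic version)".
score = imbalance_score([2357, 5, 2, 1, 1])
print(f"{100 * score:.1f}%")
```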

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
na | 2357 | 99.3%
and | 7 | 0.3%
biotechnology | 5 | 0.2%
biowastes | 2 | 0.1%
biochemistry | 1 | < 0.1%
biotechnology(electronic | 1 | < 0.1%
version | 1 | < 0.1%

Unnamed: 6
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 50.0%
Missing: 2364
Missing (%): 99.9%
Memory size: 18.6 KiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 48
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row conversion technologies
2nd row conversion technologies
Value | Count | Frequency (%)
conversion | 2 | 50.0%
technologies | 2 | 50.0%

Most occurring characters

ValueCountFrequency (%)
o 8
16.7%
n 6
12.5%
e 6
12.5%
4
8.3%
c 4
8.3%
s 4
8.3%
i 4
8.3%
v 2
 
4.2%
r 2
 
4.2%
t 2
 
4.2%
Other values (3) 6
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 44
91.7%
Space Separator 4
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 8
18.2%
n 6
13.6%
e 6
13.6%
c 4
9.1%
s 4
9.1%
i 4
9.1%
v 2
 
4.5%
r 2
 
4.5%
t 2
 
4.5%
h 2
 
4.5%
Other values (2) 4
9.1%
Space Separator
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 44
91.7%
Common 4
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 8
18.2%
n 6
13.6%
e 6
13.6%
c 4
9.1%
s 4
9.1%
i 4
9.1%
v 2
 
4.5%
r 2
 
4.5%
t 2
 
4.5%
h 2
 
4.5%
Other values (2) 4
9.1%
Common
ValueCountFrequency (%)
4
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 8
16.7%
n 6
12.5%
e 6
12.5%
4
8.3%
c 4
8.3%
s 4
8.3%
i 4
8.3%
v 2
 
4.2%
r 2
 
4.2%
t 2
 
4.2%
Other values (3) 6
12.5%

Unnamed: 7
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 50.0%
Missing: 2364
Missing (%): 99.9%
Memory size: 18.6 KiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 38
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row biotransformations
2nd row biotransformations
Value | Count | Frequency (%)
biotransformations | 2 | 100.0%

Most occurring characters

ValueCountFrequency (%)
o 6
15.8%
i 4
10.5%
t 4
10.5%
r 4
10.5%
a 4
10.5%
n 4
10.5%
s 4
10.5%
2
 
5.3%
b 2
 
5.3%
f 2
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 36
94.7%
Space Separator 2
 
5.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 6
16.7%
i 4
11.1%
t 4
11.1%
r 4
11.1%
a 4
11.1%
n 4
11.1%
s 4
11.1%
b 2
 
5.6%
f 2
 
5.6%
m 2
 
5.6%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 36
94.7%
Common 2
 
5.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 6
16.7%
i 4
11.1%
t 4
11.1%
r 4
11.1%
a 4
11.1%
n 4
11.1%
s 4
11.1%
b 2
 
5.6%
f 2
 
5.6%
m 2
 
5.6%
Common
ValueCountFrequency (%)
2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 38
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 6
15.8%
i 4
10.5%
t 4
10.5%
r 4
10.5%
a 4
10.5%
n 4
10.5%
s 4
10.5%
2
 
5.3%
b 2
 
5.3%
f 2
 
5.3%

Unnamed: 8
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 50.0%
Missing: 2364
Missing (%): 99.9%
Memory size: 18.6 KiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 48
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row production technologies
2nd row production technologies
Value | Count | Frequency (%)
production | 2 | 50.0%
technologies | 2 | 50.0%

Most occurring characters

ValueCountFrequency (%)
o 8
16.7%
4
 
8.3%
c 4
 
8.3%
t 4
 
8.3%
i 4
 
8.3%
n 4
 
8.3%
e 4
 
8.3%
p 2
 
4.2%
r 2
 
4.2%
d 2
 
4.2%
Other values (5) 10
20.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 44
91.7%
Space Separator 4
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 8
18.2%
c 4
9.1%
t 4
9.1%
i 4
9.1%
n 4
9.1%
e 4
9.1%
p 2
 
4.5%
r 2
 
4.5%
d 2
 
4.5%
u 2
 
4.5%
Other values (4) 8
18.2%
Space Separator
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 44
91.7%
Common 4
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 8
18.2%
c 4
9.1%
t 4
9.1%
i 4
9.1%
n 4
9.1%
e 4
9.1%
p 2
 
4.5%
r 2
 
4.5%
d 2
 
4.5%
u 2
 
4.5%
Other values (4) 8
18.2%
Common
ValueCountFrequency (%)
4
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 8
16.7%
4
 
8.3%
c 4
 
8.3%
t 4
 
8.3%
i 4
 
8.3%
n 4
 
8.3%
e 4
 
8.3%
p 2
 
4.2%
r 2
 
4.2%
d 2
 
4.2%
Other values (5) 10
20.8%

Correlations

 | 발표년월 | Unnamed: 4 | Unnamed: 5
발표년월 | 1.000 | 0.920 | 0.870
Unnamed: 4 | 0.920 | 1.000 | 1.000
Unnamed: 5 | 0.870 | 1.000 | 1.000

 | 발표년월 | Unnamed: 5
발표년월 | 1.000 | 0.529
Unnamed: 5 | 0.529 | 1.000

 | 발표년월 | Unnamed: 5
발표년월 | 1.000 | 0.529
Unnamed: 5 | 0.529 | 1.000
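
The report does not say which coefficient produced each matrix. For pairs of categorical variables a common choice is Cramér's V, and the sketch below shows the uncorrected form of that statistic as an illustration only. It is not claimed to reproduce the values above, and a profiler may apply a bias correction that this version omits.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # undefined when either variable is constant; 0 by convention here
    return sqrt(chi2 / (n * k))

print(cramers_v(["a", "a", "b", "b"], [1, 1, 2, 2]))
```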

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
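
Nullity correlation can be computed as the Pearson correlation between the present/missing indicators of two columns. A minimal sketch, assuming missing values are represented as `None` (the function name is illustrative):

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 'value is present' indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

# Two columns that are always missing together correlate at 1.
print(nullity_correlation([None, "x", None, "y"], [None, "p", None, "q"]))
```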

Sample

 | 발표년월 | 논문명 | 저자 | 게재지명 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 2013-12 | Molecular characterization and phylogenetic analysis of deformed wing viruses.isolated from South Korea | Reddy, Kondreddy Eswar | VETERINARY MICROBIOLOGY | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 2013-12 | Development of RAPD-SCAR Molecular Marker Related to Seed-hair Characteristic in Carrot | Eun-JoShim | Korean Journal of Horticultural Science & Technology | <NA> | <NA> | <NA> | <NA> | <NA>
2 | 2013-12 | Molecular characterization and phylogenetic analysis of deformed wing viruses isolated from South Korea | Seung-WonKang | Veterinary Microbiology | <NA> | <NA> | <NA> | <NA> | <NA>
3 | 2013-12 | Growth and phenolic content of sowthistle grown in a closed-type plant production system with a UV-A or UV-B lamp | Min-JeongLee | HORTICULTURE ENVIRONMENT AND BIOTECHNOLOGY | <NA> | <NA> | <NA> | <NA> | <NA>
4 | 2013-12 | Light intensity and photoperiod influence the growth and development of hydroponically grown leaf lettuce in a closed-type plant factory system | JeongHwaKang | Hort. Environ. Biotechnol. | <NA> | <NA> | <NA> | <NA> | <NA>
5 | 2013-12 | 밀폐형 식물생산시스템에서 백색 LED를 이용한 광도와 광주기에 따른 상추의 생장 | 박지은 | Protected Horticulture and Plant Factory | <NA> | <NA> | <NA> | <NA> | <NA>
6 | 2013-12 | Growth and Phenolic Content of Sowthistle Grown in a Closed-type Plant Production System with a UV-A or UV-B Lamp | Min-JeongLee | Horticulture | Environment | and Biotechnology | <NA> | <NA> | <NA>
7 | 2013-12 | 시들음병 저항성 양배추 품종 ‘CT-171’ 육성 | 송준호 | 원예과학기술지 | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 2013-12 | 홀스타인 착유우에서 중성세제불용섬유소의 수준과 조사료유래 중성세제불용섬유소의 수준이 사료섭취량 및 유생산성에 미치는 영향 | 이도형 | 한국초지조사료학회지 | <NA> | <NA> | <NA> | <NA> | <NA>
9 | 2013-10 | Physicochemical and Sensory Properties of Restructured Jerky with Four Additives | KimYoungBoong | Korean J. Food Sci. An. | <NA> | <NA> | <NA> | <NA> | <NA>

 | 발표년월 | 논문명 | 저자 | 게재지명 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
2356 | 2009-12 | 선인장 Mammillaria goldii, M. theresae와 M. pseudopectinata의 생육, 개화 및 종자형성 특성 | 송천영 | 화훼연구 | <NA> | <NA> | <NA> | <NA> | <NA>
2357 | 2009-12 | 과채류 주수입국 일본에 있어서 파프리카에대한 선호도 분석 | 심이성 | 한국생물환경조절학회 | <NA> | <NA> | <NA> | <NA> | <NA>
2358 | 2009-12 | 멜론 수출농가 경영컨설팅 의향 분석 | 송경환 | 지역발전연구 | <NA> | <NA> | <NA> | <NA> | <NA>
2359 | 2009-12 | A GHR Polymorphism and Its Associations with Carcass Traits in Hanwoo Cattle | Han,S.-H.;Han, S.-H. | Genes & Genomics | <NA> | <NA> | <NA> | <NA> | <NA>
2360 | 2009-12 | 초임계유체 추출을 이용한 포도씨 tocotrienol추출 조건 최적화 | 정헌상;김경미 | 한국식품과학회지 | <NA> | <NA> | <NA> | <NA> | <NA>
2361 | 2009-12 | Maturation and Spawning of the Korean Anchovy Coilia nasus on the West Coast of Korea | 전제천 | 발생과 생식 | <NA> | <NA> | <NA> | <NA> | <NA>
2362 | 2009-12 | 어린 포도 잎을 이용한 폴레페놀 고함유 분말 제조 | 장석원 | 한국식품저장유통학회지 | <NA> | <NA> | <NA> | <NA> | <NA>
2363 | 2009-12 | Evaluation of Extruded Pellets Containing Different Protein and Lipid Levels, and Raw Fish-Based Moist Pellet for Growth of Flounder(Paralichthys olivaceus) | 김경덕 | 한국수산과학회지 | <NA> | <NA> | <NA> | <NA> | <NA>
2364 | 2009-12 | 시설하우스용 보온자재의 보온특성 | 정성원 | 한국생물환경조절학회 | <NA> | <NA> | <NA> | <NA> | <NA>
2365 | 2009-12 | Histological and Biochemical Analyses on Reproductive Cycle of Gomphina melanaegis (Bivalvia; Veneridae) | 김수경 | 한국수산과학회지 | <NA> | <NA> | <NA> | <NA> | <NA>
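
Row 6 suggests why the `Unnamed` columns exist: the journal name appears to have been split across 게재지명, Unnamed: 4 and Unnamed: 5, as would happen if a comma inside the name were read as a field separator. That reading is an inference from the sample, not something the report states. Under that assumption, the fragments could be rejoined as follows (the helper name is hypothetical):

```python
OVERFLOW_COLUMNS = ["Unnamed: 4", "Unnamed: 5", "Unnamed: 6", "Unnamed: 7", "Unnamed: 8"]

def rejoin_journal(row: dict) -> str:
    """Join 게재지명 with any non-missing overflow fragments."""
    parts = [row["게재지명"]]
    for column in OVERFLOW_COLUMNS:
        fragment = row.get(column)
        if fragment is not None:
            parts.append(fragment)
    # Fragments may still carry the space that followed the comma.
    return ", ".join(part.strip() for part in parts)

# Row 6 of the sample, with missing cells represented as None.
row6 = {
    "게재지명": "Horticulture",
    "Unnamed: 4": " Environment",
    "Unnamed: 5": "and Biotechnology",
    "Unnamed: 6": None,
    "Unnamed: 7": None,
    "Unnamed: 8": None,
}
print(rejoin_journal(row6))
```

Fixing the quoting when the file is read would be preferable to repairing rows afterwards, if the source file allows it.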

Duplicate rows

Most frequently occurring

 | 발표년월 | 논문명 | 저자 | 게재지명 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | # duplicates
9 | 2013-06 | Silicon Uptake Level of Six Potted Plants from a Potassium Silicate-Supplemented Hydroponic Solution | 손문숙 | 원예과학기술지 | <NA> | <NA> | <NA> | <NA> | <NA> | 3
0 | 2010-06 | 유자 부산물 에탄올 추출물의 항노화 및 미백효과 | 김다슬 | 대한화장품학회지 | <NA> | <NA> | <NA> | <NA> | <NA> | 2
1 | 2011-12 | 간척지토양에서 하수슬러지 고화물 처리가 에너지작물의 생육에 미치는 영향 | 안기홍 | 한국작물학회지 | <NA> | <NA> | <NA> | <NA> | <NA> | 2
2 | 2012-12 | A novel thermotolerant and acidotolerant peptide produced by a Bacillus strain newly isolated from a fermented food (kimchi) shows activity against multidrug-resistant bacteria. | ChoiYunHee | International Journal of Antimicrobial Agents | <NA> | <NA> | <NA> | <NA> | <NA> | 2
3 | 2012-12 | An extremely alkaline novel xylanase from a newly isolated streptomyces strain cultivated in corncob medium. | SimkhadaJR | Applied Biochemistry and Biotechnology | <NA> | <NA> | <NA> | <NA> | <NA> | 2
4 | 2012-12 | Automatic Tension Control of a Timber Carriage Used for Biomass Collection | 최윤성 | 바이오시스템공학 | <NA> | <NA> | <NA> | <NA> | <NA> | 2
5 | 2012-12 | Effect of Different Biosynthetic Precursors on the Production of Nargenicin A1 from Metabolically Engineered Nocardia sp. CS682 | DineshKoju | Journal of Microbiology and Biotechnology | <NA> | <NA> | <NA> | <NA> | <NA> | 2
6 | 2012-12 | Validity Test for Molecular Markers Associated with Resistance to Phytophthora Root Rot in Chili Pepper (Capsicum annuum L.) | WonPhilLee | Korean Journal of Horticultural Science and Technolology | <NA> | <NA> | <NA> | <NA> | <NA> | 2
7 | 2012-12 | 파프리카 여름재배시 차광방법이 생육과 과실특성에 미치는 영향 | 하준봉 | 한국생물환경조절학회지 | <NA> | <NA> | <NA> | <NA> | <NA> | 2
8 | 2013-02 | Subirrigational supply of silicon affects the growth of three chrysanthemum cultivars | I. Sivanesan | Horticulture | Environment | and Biotechnology | <NA> | <NA> | <NA> | 2
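
Duplicate groups like those above can be found by counting identical rows. The sketch below treats each row as a tuple and keeps the rows that occur more than once. The toy rows use only three of the nine columns and are stand-ins for illustration, not the full records:

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# (발표년월, 논문명, 저자) only; the remaining columns are omitted for brevity.
rows = [
    ("2010-06", "유자 부산물 에탄올 추출물의 항노화 및 미백효과", "김다슬"),
    ("2010-06", "유자 부산물 에탄올 추출물의 항노화 및 미백효과", "김다슬"),
    ("2012-12", "Automatic Tension Control of a Timber Carriage Used for Biomass Collection", "최윤성"),
    ("2012-12", "Automatic Tension Control of a Timber Carriage Used for Biomass Collection", "최윤성"),
    ("2012-12", "Automatic Tension Control of a Timber Carriage Used for Biomass Collection", "최윤성"),
    ("2009-12", "시설하우스용 보온자재의 보온특성", "정성원"),
]

groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(n, row[0], row[2])
```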