Overview

Dataset statistics

Number of variables: 14
Number of observations: 2465
Missing cells: 9686
Missing cells (%): 28.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 272.1 KiB
Average record size in memory: 113.1 B
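
The missing-cell percentage can be checked by hand from the other figures in this block. The short sketch below assumes that the percentage is the number of missing cells divided by the total cell count (variables times observations); the variable names are illustrative, not taken from the profiling tool.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 14
n_observations = 2465
missing_cells = 9686

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Under that assumption the result rounds to the 28.1% shown in the report.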

Variable types

Numeric: 1
Text: 9
Categorical: 4

Dataset

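Description: Author information for research outputs of the Overseas Korean Studies Support Program, Academy of Korean Studies (한국학중앙연구원)
Author: Academy of Korean Studies (한국학중앙연구원)
URL: https://www.data.go.kr/data/15049069/fileData.do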

Alerts

High correlation: GANADA_AUTHOR_ENG is highly overall correlated with GANADA_AUTHOR_ORI and 1 other field
High correlation: GANADA_AUTHOR_ETC is highly overall correlated with GANADA_AUTHOR_ORI and 1 other field
High correlation: GANADA_AUTHOR_KOR is highly overall correlated with GANADA_AUTHOR_ORI
High correlation: GANADA_AUTHOR_ORI is highly overall correlated with GANADA_AUTHOR_KOR and 2 other fields
Imbalance: GANADA_AUTHOR_KOR is highly imbalanced (56.2%)
Imbalance: GANADA_AUTHOR_ETC is highly imbalanced (76.8%)
Missing: AUTHOR_KOR has 1771 (71.8%) missing values
Missing: AUTHOR_ENG has 1213 (49.2%) missing values
Missing: AUTHOR_ETC has 1859 (75.4%) missing values
Missing: SORT_AUTHOR_KOR has 1771 (71.8%) missing values
Missing: SORT_AUTHOR_ENG has 1213 (49.2%) missing values
Missing: SORT_AUTHOR_ETC has 1859 (75.4%) missing values
Unique: AUTHOR_ID has unique values
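
The six missing-value alerts can be reconciled with the dataset-level total of 9686 missing cells. The sketch below only re-adds the counts listed in the alerts and recomputes each share against the 2465 observations; the dictionary and variable names are illustrative.

```python
n_observations = 2465

# Missing counts as listed in the alerts.
missing_by_column = {
    "AUTHOR_KOR": 1771,
    "AUTHOR_ENG": 1213,
    "AUTHOR_ETC": 1859,
    "SORT_AUTHOR_KOR": 1771,
    "SORT_AUTHOR_ENG": 1213,
    "SORT_AUTHOR_ETC": 1859,
}

total_missing = sum(missing_by_column.values())
shares = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_column.items()
}

print(total_missing)
print(shares)
```

The sum matches the reported total, which suggests that these six columns account for all missing cells in the dataset.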

Reproduction

Analysis started: 2023-12-12 06:02:53.471672
Analysis finished: 2023-12-12 06:02:56.544119
Duration: 3.07 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

AUTHOR_ID
Real number (ℝ)

UNIQUE 

Distinct: 2465
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8719.3671
Minimum: 7438
Maximum: 10443
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.8 KiB

Quantile statistics

Minimum: 7438
5th percentile: 7561.2
Q1: 8068
Median: 8712
Q3: 9341
95th percentile: 9838.8
Maximum: 10443
Range: 3005
Interquartile range (IQR): 1273

Descriptive statistics

Standard deviation: 759.23051
Coefficient of variation (CV): 0.087074038
Kurtosis: -0.96733188
Mean: 8719.3671
Median Absolute Deviation (MAD): 635
Skewness: 0.12412938
Sum: 21493240
Variance: 576430.96
Monotonicity: Strictly increasing
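
Several of the AUTHOR_ID figures are derived from one another, so they can be cross-checked with simple arithmetic. The sketch below uses the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean times count); the inputs are the rounded values printed in the report, so small rounding differences are expected.

```python
# Inputs copied from the AUTHOR_ID statistics.
minimum, maximum = 7438, 10443
q1, q3 = 8068, 9341
mean, std = 8719.3671, 759.23051
n = 2465

value_range = maximum - minimum   # reported as 3005
iqr = q3 - q1                     # reported as 1273
cv = std / mean                   # reported as 0.087074038
variance = std ** 2               # reported as 576430.96
total = mean * n                  # reported as 21493240

print(value_range, iqr, round(cv, 9), round(variance, 2), round(total))
```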
Histogram with fixed size bins (bins=50)
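
The report states only that the histogram uses 50 fixed-size bins; how the profiling tool computes them is not shown here. As a rough illustration of the idea, the plain-Python sketch below splits the span between minimum and maximum into equal-width intervals. The function name and the stand-in data are hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        counts[0] = len(values)  # degenerate case: every value is identical
        return counts, 0.0
    width = (hi - lo) / bins
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width

# Bin width implied by the reported minimum, maximum and bins=50.
reported_width = (10443 - 7438) / 50

# Hypothetical stand-in data: 100 consecutive IDs starting at the reported minimum.
ids = list(range(7438, 7538))
counts, width = fixed_width_histogram(ids, bins=50)
print(reported_width, len(counts), sum(counts))
```

With the reported minimum and maximum, each of the 50 bins would cover about 60 ID values.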
Value                  Count  Frequency (%)
7438                   1      < 0.1%
9132                   1      < 0.1%
9125                   1      < 0.1%
9126                   1      < 0.1%
9127                   1      < 0.1%
9128                   1      < 0.1%
9129                   1      < 0.1%
9130                   1      < 0.1%
9131                   1      < 0.1%
9133                   1      < 0.1%
Other values (2455)    2455   99.6%
Value                  Count  Frequency (%)
7438                   1      < 0.1%
7439                   1      < 0.1%
7440                   1      < 0.1%
7441                   1      < 0.1%
7442                   1      < 0.1%
7443                   1      < 0.1%
7444                   1      < 0.1%
7445                   1      < 0.1%
7446                   1      < 0.1%
7447                   1      < 0.1%
Value                  Count  Frequency (%)
10443                  1      < 0.1%
10442                  1      < 0.1%
10437                  1      < 0.1%
10436                  1      < 0.1%
10435                  1      < 0.1%
10433                  1      < 0.1%
10432                  1      < 0.1%
10431                  1      < 0.1%
10430                  1      < 0.1%
10428                  1      < 0.1%
Distinct: 2177
Distinct (%): 88.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.4 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.6640974
Min length: 1

Characters and Unicode

Total characters: 23822
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
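
As an illustration of what the category counts measure, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module, using the five sample rows shown for this variable. The helper name is hypothetical, and this is not necessarily how the profiling tool computes its figures. The standard library does not expose script or block lookups, so those are not covered.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# Sample rows taken from this variable's "Sample" block.
sample = ["08C19_0019", "08C09_0004", "06C10_0028", "06C10_0028", "08C09_0024"]
counts = category_counts(sample)
print(counts)
```

The two-letter codes correspond to the names used in the report: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, and `Pc` is Connector Punctuation.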

Unique

Unique: 1950
Unique (%): 79.1%
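
The report lists both a "Distinct" and a "Unique" count. A reading consistent with the numbers is that distinct counts the different values present, while unique counts the values that occur exactly once. The sketch below applies that reading to the five sample rows; the function name is illustrative.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

sample = ["08C19_0019", "08C09_0004", "06C10_0028", "06C10_0028", "08C09_0024"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)

# The reported percentages are relative to the 2465 observations.
distinct_pct = round(100 * 2177 / 2465, 1)
unique_pct = round(100 * 1950 / 2465, 1)
print(distinct_pct, unique_pct)
```

In the sample, `06C10_0028` appears twice, so it counts toward distinct but not toward unique.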

Sample

1st row: 08C19_0019
2nd row: 08C09_0004
3rd row: 06C10_0028
4th row: 06C10_0028
5th row: 08C09_0024
Value                  Count  Frequency (%)
06c10_0047             5      0.2%
06c10_0054             5      0.2%
09c12_0004             5      0.2%
10r41                  4      0.2%
09r33_0001             4      0.2%
07c14_0010             4      0.2%
06p03_0006             4      0.2%
09c12_0018             4      0.2%
12r15_0001             4      0.2%
09c05_0070             4      0.2%
Other values (2167)    2422   98.3%

Most occurring characters

Value                  Count  Frequency (%)
0                      9205   38.6%
1                      2540   10.7%
_                      2299   9.7%
C                      1826   7.7%
6                      1209   5.1%
2                      1136   4.8%
9                      1028   4.3%
7                      1023   4.3%
5                      788    3.3%
8                      778    3.3%
Other values (8)       1990   8.4%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         19050  80.0%
Uppercase Letter       2463   10.3%
Connector Punctuation  2299   9.7%
Lowercase Letter       10     < 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 9205
48.3%
1 2540
 
13.3%
6 1209
 
6.3%
2 1136
 
6.0%
9 1028
 
5.4%
7 1023
 
5.4%
5 788
 
4.1%
8 778
 
4.1%
3 690
 
3.6%
4 653
 
3.4%
Uppercase Letter
ValueCountFrequency (%)
C 1826
74.1%
P 341
 
13.8%
R 295
 
12.0%
S 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
a 5
50.0%
b 4
40.0%
d 1
 
10.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2299
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 21349
89.6%
Latin 2473
 
10.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0 9205
43.1%
1 2540
 
11.9%
_ 2299
 
10.8%
6 1209
 
5.7%
2 1136
 
5.3%
9 1028
 
4.8%
7 1023
 
4.8%
5 788
 
3.7%
8 778
 
3.6%
3 690
 
3.2%
Latin
ValueCountFrequency (%)
C 1826
73.8%
P 341
 
13.8%
R 295
 
11.9%
a 5
 
0.2%
b 4
 
0.2%
d 1
 
< 0.1%
S 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23822
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 9205
38.6%
1 2540
 
10.7%
_ 2299
 
9.7%
C 1826
 
7.7%
6 1209
 
5.1%
2 1136
 
4.8%
9 1028
 
4.3%
7 1023
 
4.3%
5 788
 
3.3%
8 778
 
3.3%
Other values (8) 1990
 
8.4%
Distinct: 1929
Distinct (%): 78.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.4 KiB

Length

Max length: 30
Median length: 27
Mean length: 9.2490872
Min length: 1

Characters and Unicode

Total characters: 22799
Distinct characters: 870
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 9

Unique

Unique: 1596
Unique (%): 64.7%

Sample

1st row: Surachai Sensri
2nd row: Sumi Yoon
3rd row: Tetsuharu Moriya
4th row: Kaoru Horie
5th row: Tetsuharu Moriya
ValueCountFrequency (%)
kim 135
 
3.1%
lee 66
 
1.5%
park 52
 
1.2%
김관웅 30
 
0.7%
shin 27
 
0.6%
ким 24
 
0.6%
а 22
 
0.5%
suh 18
 
0.4%
han 18
 
0.4%
j 17
 
0.4%
Other values (2478) 3902
90.5%
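
The table above counts individual words (for example `kim`, `lee`, `park`) rather than whole names, and the entries are lowercase. A minimal sketch of such a word count follows. It assumes that names are lowercased, split on whitespace, and stripped of surrounding ASCII punctuation; that is a guess at the tokenization, not the profiling tool's documented behaviour, and the function name is hypothetical.

```python
import string
from collections import Counter

def word_counts(names):
    """Count lowercase whitespace-separated tokens across all names."""
    counts = Counter()
    for name in names:
        for token in name.lower().split():
            token = token.strip(string.punctuation)  # e.g. "j." -> "j"
            if token:
                counts[token] += 1
    return counts

# Sample rows taken from this variable's "Sample" block.
sample = [
    "Surachai Sensri",
    "Sumi Yoon",
    "Tetsuharu Moriya",
    "Kaoru Horie",
    "Tetsuharu Moriya",
]
counts = word_counts(sample)
print(counts.most_common(3))
```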

Most occurring characters

ValueCountFrequency (%)
1847
 
8.1%
n 1513
 
6.6%
a 1261
 
5.5%
o 1199
 
5.3%
e 1100
 
4.8%
i 975
 
4.3%
u 701
 
3.1%
h 615
 
2.7%
g 614
 
2.7%
r 588
 
2.6%
Other values (860) 12386
54.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12791
56.1%
Uppercase Letter 4176
 
18.3%
Other Letter 3211
 
14.1%
Space Separator 1847
 
8.1%
Other Punctuation 413
 
1.8%
Dash Punctuation 309
 
1.4%
Open Punctuation 21
 
0.1%
Close Punctuation 21
 
0.1%
Private Use 5
 
< 0.1%
Math Symbol 2
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
173
 
5.4%
67
 
2.1%
66
 
2.1%
47
 
1.5%
43
 
1.3%
40
 
1.2%
36
 
1.1%
33
 
1.0%
33
 
1.0%
33
 
1.0%
Other values (681) 2640
82.2%
Lowercase Letter
ValueCountFrequency (%)
n 1513
 
11.8%
a 1261
 
9.9%
o 1199
 
9.4%
e 1100
 
8.6%
i 975
 
7.6%
u 701
 
5.5%
h 615
 
4.8%
g 614
 
4.8%
r 588
 
4.6%
m 386
 
3.0%
Other values (88) 3839
30.0%
Uppercase Letter
ValueCountFrequency (%)
S 422
 
10.1%
K 340
 
8.1%
H 265
 
6.3%
J 247
 
5.9%
M 206
 
4.9%
Y 183
 
4.4%
A 162
 
3.9%
C 156
 
3.7%
L 145
 
3.5%
P 144
 
3.4%
Other values (55) 1906
45.6%
Other Punctuation
ValueCountFrequency (%)
. 360
87.2%
, 41
 
9.9%
' 10
 
2.4%
2
 
0.5%
Dash Punctuation
ValueCountFrequency (%)
- 308
99.7%
1
 
0.3%
Open Punctuation
ValueCountFrequency (%)
( 17
81.0%
[ 4
 
19.0%
Close Punctuation
ValueCountFrequency (%)
) 17
81.0%
] 4
 
19.0%
Decimal Number
ValueCountFrequency (%)
3 1
50.0%
1 1
50.0%
Space Separator
ValueCountFrequency (%)
1847
100.0%
Private Use
ValueCountFrequency (%)
􀀁 5
100.0%
Math Symbol
ValueCountFrequency (%)
÷ 2
100.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14853
65.1%
Common 2616
 
11.5%
Cyrillic 2114
 
9.3%
Hangul 2026
 
8.9%
Han 1160
 
5.1%
Katakana 25
 
0.1%
Unknown 5
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
43
 
3.7%
29
 
2.5%
16
 
1.4%
15
 
1.3%
14
 
1.2%
14
 
1.2%
12
 
1.0%
11
 
0.9%
11
 
0.9%
11
 
0.9%
Other values (439) 984
84.8%
Hangul
ValueCountFrequency (%)
173
 
8.5%
67
 
3.3%
66
 
3.3%
47
 
2.3%
40
 
2.0%
36
 
1.8%
33
 
1.6%
33
 
1.6%
33
 
1.6%
28
 
1.4%
Other values (217) 1470
72.6%
Latin
ValueCountFrequency (%)
n 1513
 
10.2%
a 1261
 
8.5%
o 1199
 
8.1%
e 1100
 
7.4%
i 975
 
6.6%
u 701
 
4.7%
h 615
 
4.1%
g 614
 
4.1%
r 588
 
4.0%
S 422
 
2.8%
Other values (91) 5865
39.5%
Cyrillic
ValueCountFrequency (%)
а 241
 
11.4%
н 131
 
6.2%
А 127
 
6.0%
и 119
 
5.6%
о 83
 
3.9%
е 80
 
3.8%
в 80
 
3.8%
м 68
 
3.2%
М 66
 
3.1%
л 64
 
3.0%
Other values (52) 1055
49.9%
Common
ValueCountFrequency (%)
1847
70.6%
. 360
 
13.8%
- 308
 
11.8%
, 41
 
1.6%
( 17
 
0.6%
) 17
 
0.6%
' 10
 
0.4%
[ 4
 
0.2%
] 4
 
0.2%
2
 
0.1%
Other values (5) 6
 
0.2%
Katakana
ValueCountFrequency (%)
6
24.0%
3
12.0%
3
12.0%
2
 
8.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
Other values (5) 5
20.0%
Unknown
ValueCountFrequency (%)
􀀁 5
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17321
76.0%
Cyrillic 2114
 
9.3%
Hangul 2026
 
8.9%
CJK 1144
 
5.0%
None 126
 
0.6%
Katakana 28
 
0.1%
Latin Ext Additional 23
 
0.1%
CJK Compat Ideographs 16
 
0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1847
 
10.7%
n 1513
 
8.7%
a 1261
 
7.3%
o 1199
 
6.9%
e 1100
 
6.4%
i 975
 
5.6%
u 701
 
4.0%
h 615
 
3.6%
g 614
 
3.5%
r 588
 
3.4%
Other values (53) 6908
39.9%
Cyrillic
ValueCountFrequency (%)
а 241
 
11.4%
н 131
 
6.2%
А 127
 
6.0%
и 119
 
5.6%
о 83
 
3.9%
е 80
 
3.8%
в 80
 
3.8%
м 68
 
3.2%
М 66
 
3.1%
л 64
 
3.0%
Other values (52) 1055
49.9%
Hangul
ValueCountFrequency (%)
173
 
8.5%
67
 
3.3%
66
 
3.3%
47
 
2.3%
40
 
2.0%
36
 
1.8%
33
 
1.6%
33
 
1.6%
33
 
1.6%
28
 
1.4%
Other values (217) 1470
72.6%
CJK
ValueCountFrequency (%)
43
 
3.8%
29
 
2.5%
16
 
1.4%
15
 
1.3%
14
 
1.2%
14
 
1.2%
12
 
1.0%
11
 
1.0%
11
 
1.0%
11
 
1.0%
Other values (435) 968
84.6%
None
ValueCountFrequency (%)
é 16
 
12.7%
ö 10
 
7.9%
ü 10
 
7.9%
á 8
 
6.3%
É 7
 
5.6%
å 6
 
4.8%
ê 5
 
4.0%
ư 5
 
4.0%
􀀁 5
 
4.0%
ô 5
 
4.0%
Other values (26) 49
38.9%
CJK Compat Ideographs
ValueCountFrequency (%)
11
68.8%
2
 
12.5%
2
 
12.5%
1
 
6.2%
Katakana
ValueCountFrequency (%)
6
21.4%
3
10.7%
3
10.7%
2
 
7.1%
2
 
7.1%
1
 
3.6%
1
 
3.6%
1
 
3.6%
1
 
3.6%
1
 
3.6%
Other values (7) 7
25.0%
Latin Ext Additional
ValueCountFrequency (%)
4
17.4%
3
13.0%
3
13.0%
2
 
8.7%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
Other values (5) 5
21.7%
Punctuation
ValueCountFrequency (%)
1
100.0%

AUTHOR_KOR
Text

MISSING 

Distinct: 479
Distinct (%): 69.0%
Missing: 1771
Missing (%): 71.8%
Memory size: 19.4 KiB

Length

Max length: 14
Median length: 3
Mean length: 3.1282421
Min length: 2

Characters and Unicode

Total characters: 2171
Distinct characters: 230
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 377
Unique (%): 54.3%

Sample

1st row: 정광
2nd row: 정광
3rd row: 정광
4th row: 정광
5th row: 정광
ValueCountFrequency (%)
김관웅 31
 
4.3%
김도영 11
 
1.5%
김호웅 11
 
1.5%
오상순 8
 
1.1%
안평추 8
 
1.1%
정광 7
 
1.0%
김춘선 6
 
0.8%
장광군 5
 
0.7%
태평무 5
 
0.7%
장흥권 5
 
0.7%
Other values (489) 626
86.6%

Most occurring characters

ValueCountFrequency (%)
179
 
8.2%
69
 
3.2%
66
 
3.0%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.3%
Other values (220) 1593
73.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2141
98.6%
Space Separator 29
 
1.3%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
Space Separator
ValueCountFrequency (%)
29
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2141
98.6%
Common 30
 
1.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
Common
ValueCountFrequency (%)
29
96.7%
, 1
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2141
98.6%
ASCII 30
 
1.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
ASCII
ValueCountFrequency (%)
29
96.7%
, 1
 
3.3%

AUTHOR_ENG
Text

MISSING 

Distinct: 999
Distinct (%): 79.8%
Missing: 1213
Missing (%): 49.2%
Memory size: 19.4 KiB

Length

Max length: 30
Median length: 24
Mean length: 13.511981
Min length: 3

Characters and Unicode

Total characters: 16917
Distinct characters: 77
Distinct categories: 6
Distinct scripts: 4
Distinct blocks: 4

Unique

Unique: 814
Unique (%): 65.0%

Sample

1st row: Surachai Sensri
2nd row: Sumi Yoon
3rd row: Tetsuharu Moriya
4th row: Kaoru Horie
5th row: Tetsuharu Moriya
ValueCountFrequency (%)
kim 139
 
5.0%
lee 68
 
2.5%
park 54
 
2.0%
shin 26
 
0.9%
han 19
 
0.7%
suh 18
 
0.7%
j 18
 
0.7%
yoon 16
 
0.6%
john 15
 
0.5%
young 14
 
0.5%
Other values (1409) 2380
86.0%

Most occurring characters

ValueCountFrequency (%)
n 1541
 
9.1%
1518
 
9.0%
a 1280
 
7.6%
o 1201
 
7.1%
e 1116
 
6.6%
i 992
 
5.9%
u 713
 
4.2%
g 635
 
3.8%
h 606
 
3.6%
r 581
 
3.4%
Other values (67) 6734
39.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11510
68.0%
Uppercase Letter 3370
 
19.9%
Space Separator 1518
 
9.0%
Dash Punctuation 326
 
1.9%
Other Punctuation 190
 
1.1%
Other Letter 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 1541
13.4%
a 1280
11.1%
o 1201
10.4%
e 1116
9.7%
i 992
8.6%
u 713
 
6.2%
g 635
 
5.5%
h 606
 
5.3%
r 581
 
5.0%
k 388
 
3.4%
Other values (29) 2457
21.3%
Uppercase Letter
ValueCountFrequency (%)
S 413
 
12.3%
K 344
 
10.2%
H 278
 
8.2%
J 255
 
7.6%
M 206
 
6.1%
Y 189
 
5.6%
A 174
 
5.2%
C 164
 
4.9%
P 150
 
4.5%
L 146
 
4.3%
Other values (20) 1051
31.2%
Other Punctuation
ValueCountFrequency (%)
. 131
68.9%
, 50
 
26.3%
' 9
 
4.7%
Other Letter
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Space Separator
ValueCountFrequency (%)
1518
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 326
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14876
87.9%
Common 2034
 
12.0%
Cyrillic 4
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 1541
 
10.4%
a 1280
 
8.6%
o 1201
 
8.1%
e 1116
 
7.5%
i 992
 
6.7%
u 713
 
4.8%
g 635
 
4.3%
h 606
 
4.1%
r 581
 
3.9%
S 413
 
2.8%
Other values (55) 5798
39.0%
Common
ValueCountFrequency (%)
1518
74.6%
- 326
 
16.0%
. 131
 
6.4%
, 50
 
2.5%
' 9
 
0.4%
Cyrillic
ValueCountFrequency (%)
о 1
25.0%
Н 1
25.0%
А 1
25.0%
В 1
25.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16862
99.7%
None 48
 
0.3%
Cyrillic 4
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 1541
 
9.1%
1518
 
9.0%
a 1280
 
7.6%
o 1201
 
7.1%
e 1116
 
6.6%
i 992
 
5.9%
u 713
 
4.2%
g 635
 
3.8%
h 606
 
3.6%
r 581
 
3.4%
Other values (47) 6679
39.6%
None
ValueCountFrequency (%)
é 8
16.7%
ö 8
16.7%
á 7
14.6%
ü 7
14.6%
š 4
8.3%
ô 4
8.3%
É 4
8.3%
ű 1
 
2.1%
õ 1
 
2.1%
ř 1
 
2.1%
Other values (3) 3
 
6.2%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Cyrillic
ValueCountFrequency (%)
о 1
25.0%
Н 1
25.0%
А 1
25.0%
В 1
25.0%

AUTHOR_ETC
Text

MISSING 

Distinct: 508
Distinct (%): 83.8%
Missing: 1859
Missing (%): 75.4%
Memory size: 19.4 KiB

Length

Max length: 29
Median length: 26
Mean length: 6.9125413
Min length: 2

Characters and Unicode

Total characters: 4189
Distinct characters: 596
Distinct categories: 10
Distinct scripts: 5
Distinct blocks: 8

Unique

Unique: 438
Unique (%): 72.3%

Sample

1st row: 磯崎典世
2nd row: 磯崎典世
3rd row: 鄭光
4th row: 鄭光
5th row: 鄭光
ValueCountFrequency (%)
ким 24
 
2.5%
а 21
 
2.2%
пак 12
 
1.3%
с 12
 
1.3%
м 12
 
1.3%
в 8
 
0.8%
鄭光 7
 
0.7%
и 7
 
0.7%
тен 6
 
0.6%
ю 6
 
0.6%
Other values (631) 828
87.8%

Most occurring characters

ValueCountFrequency (%)
337
 
8.0%
а 238
 
5.7%
. 233
 
5.6%
н 128
 
3.1%
А 124
 
3.0%
и 118
 
2.8%
о 81
 
1.9%
в 78
 
1.9%
е 76
 
1.8%
м 68
 
1.6%
Other values (586) 2708
64.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1545
36.9%
Other Letter 1138
27.2%
Uppercase Letter 926
22.1%
Space Separator 337
 
8.0%
Other Punctuation 236
 
5.6%
Dash Punctuation 3
 
0.1%
Decimal Number 1
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%
Modifier Letter 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
40
 
3.5%
31
 
2.7%
20
 
1.8%
15
 
1.3%
14
 
1.2%
13
 
1.1%
13
 
1.1%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (444) 958
84.2%
Lowercase Letter
ValueCountFrequency (%)
а 238
15.4%
н 128
 
8.3%
и 118
 
7.6%
о 81
 
5.2%
в 78
 
5.0%
е 76
 
4.9%
м 68
 
4.4%
л 62
 
4.0%
р 59
 
3.8%
к 47
 
3.0%
Other values (70) 590
38.2%
Uppercase Letter
ValueCountFrequency (%)
А 124
 
13.4%
М 66
 
7.1%
К 63
 
6.8%
Н 60
 
6.5%
С 54
 
5.8%
В 53
 
5.7%
И 43
 
4.6%
Е 35
 
3.8%
Б 34
 
3.7%
Р 30
 
3.2%
Other values (41) 364
39.3%
Other Punctuation
ValueCountFrequency (%)
. 233
98.7%
' 1
 
0.4%
, 1
 
0.4%
1
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 2
66.7%
1
33.3%
Space Separator
ValueCountFrequency (%)
337
100.0%
Decimal Number
ValueCountFrequency (%)
3 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Cyrillic 2078
49.6%
Han 1120
26.7%
Common 580
 
13.8%
Latin 393
 
9.4%
Katakana 18
 
0.4%

Most frequent character per script

Han
ValueCountFrequency (%)
40
 
3.6%
31
 
2.8%
20
 
1.8%
15
 
1.3%
14
 
1.2%
13
 
1.2%
13
 
1.2%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (430) 940
83.9%
Latin
ValueCountFrequency (%)
n 31
 
7.9%
h 27
 
6.9%
T 26
 
6.6%
a 20
 
5.1%
e 18
 
4.6%
i 17
 
4.3%
g 16
 
4.1%
r 15
 
3.8%
u 12
 
3.1%
N 12
 
3.1%
Other values (59) 199
50.6%
Cyrillic
ValueCountFrequency (%)
а 238
 
11.5%
н 128
 
6.2%
А 124
 
6.0%
и 118
 
5.7%
о 81
 
3.9%
в 78
 
3.8%
е 76
 
3.7%
м 68
 
3.3%
М 66
 
3.2%
К 63
 
3.0%
Other values (52) 1038
50.0%
Katakana
ValueCountFrequency (%)
3
16.7%
2
11.1%
2
11.1%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
Other values (4) 4
22.2%
Common
ValueCountFrequency (%)
337
58.1%
. 233
40.2%
- 2
 
0.3%
3 1
 
0.2%
' 1
 
0.2%
( 1
 
0.2%
, 1
 
0.2%
) 1
 
0.2%
1
 
0.2%
1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
Cyrillic 2078
49.6%
CJK 1107
26.4%
ASCII 900
21.5%
None 43
 
1.0%
Latin Ext Additional 27
 
0.6%
Katakana 20
 
0.5%
CJK Compat Ideographs 13
 
0.3%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
337
37.4%
. 233
25.9%
n 31
 
3.4%
h 27
 
3.0%
T 26
 
2.9%
a 20
 
2.2%
e 18
 
2.0%
i 17
 
1.9%
g 16
 
1.8%
r 15
 
1.7%
Other values (37) 160
17.8%
Cyrillic
ValueCountFrequency (%)
а 238
 
11.5%
н 128
 
6.2%
А 124
 
6.0%
и 118
 
5.7%
о 81
 
3.9%
в 78
 
3.8%
е 76
 
3.7%
м 68
 
3.3%
М 66
 
3.2%
К 63
 
3.0%
Other values (52) 1038
50.0%
CJK
ValueCountFrequency (%)
40
 
3.6%
31
 
2.8%
20
 
1.8%
15
 
1.4%
14
 
1.3%
13
 
1.2%
13
 
1.2%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (426) 927
83.7%
CJK Compat Ideographs
ValueCountFrequency (%)
8
61.5%
2
 
15.4%
2
 
15.4%
1
 
7.7%
None
ValueCountFrequency (%)
é 8
18.6%
ê 7
16.3%
ư 5
11.6%
ü 4
9.3%
ö 4
9.3%
Đ 3
 
7.0%
ă 2
 
4.7%
â 2
 
4.7%
ơ 2
 
4.7%
à 1
 
2.3%
Other values (5) 5
11.6%
Latin Ext Additional
ValueCountFrequency (%)
5
18.5%
5
18.5%
4
14.8%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (5) 5
18.5%
Katakana
ValueCountFrequency (%)
3
15.0%
2
 
10.0%
2
 
10.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (6) 6
30.0%
Punctuation
ValueCountFrequency (%)
1
100.0%
Distinct: 1894
Distinct (%): 76.8%
Missing: 0
Missing (%): 0.0%
Memory size: 19.4 KiB

Length

Max length: 30
Median length: 28
Mean length: 8.9395538
Min length: 1

Characters and Unicode

Total characters: 22036
Distinct characters: 800
Distinct categories: 9
Distinct scripts: 7
Distinct blocks: 9

Unique

Unique: 1536
Unique (%): 62.3%

Sample

1st row: SURACHAI SENSRI
2nd row: SUMI YOON
3rd row: TETSUHARU MORIYA
4th row: KAORU HORIE
5th row: TETSUHARU MORIYA
ValueCountFrequency (%)
kim 135
 
3.1%
lee 66
 
1.5%
park 52
 
1.2%
김관웅 30
 
0.7%
shin 27
 
0.6%
ким 24
 
0.6%
а 22
 
0.5%
han 18
 
0.4%
suh 18
 
0.4%
j 17
 
0.4%
Other values (2460) 3901
90.5%

Most occurring characters

ValueCountFrequency (%)
1845
 
8.4%
N 1624
 
7.4%
A 1423
 
6.5%
O 1281
 
5.8%
E 1186
 
5.4%
I 1058
 
4.8%
H 880
 
4.0%
S 804
 
3.6%
K 721
 
3.3%
U 721
 
3.3%
Other values (790) 10493
47.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 16967
77.0%
Other Letter 3211
 
14.6%
Space Separator 1845
 
8.4%
Private Use 5
 
< 0.1%
Other Punctuation 2
 
< 0.1%
Math Symbol 2
 
< 0.1%
Decimal Number 2
 
< 0.1%
Dash Punctuation 1
 
< 0.1%
Modifier Letter 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
173
 
5.4%
67
 
2.1%
66
 
2.1%
47
 
1.5%
43
 
1.3%
40
 
1.2%
36
 
1.1%
33
 
1.0%
33
 
1.0%
33
 
1.0%
Other values (681) 2640
82.2%
Uppercase Letter
ValueCountFrequency (%)
N 1624
 
9.6%
A 1423
 
8.4%
O 1281
 
7.5%
E 1186
 
7.0%
I 1058
 
6.2%
H 880
 
5.2%
S 804
 
4.7%
K 721
 
4.2%
U 721
 
4.2%
G 698
 
4.1%
Other values (91) 6571
38.7%
Decimal Number
ValueCountFrequency (%)
1 1
50.0%
3 1
50.0%
Space Separator
ValueCountFrequency (%)
1845
100.0%
Private Use
ValueCountFrequency (%)
􀀁 5
100.0%
Other Punctuation
ValueCountFrequency (%)
2
100.0%
Math Symbol
ValueCountFrequency (%)
÷ 2
100.0%
Dash Punctuation
ValueCountFrequency (%)
1
100.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14853
67.4%
Cyrillic 2114
 
9.6%
Hangul 2026
 
9.2%
Common 1853
 
8.4%
Han 1160
 
5.3%
Katakana 25
 
0.1%
Unknown 5
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
43
 
3.7%
29
 
2.5%
16
 
1.4%
15
 
1.3%
14
 
1.2%
14
 
1.2%
12
 
1.0%
11
 
0.9%
11
 
0.9%
11
 
0.9%
Other values (439) 984
84.8%
Hangul
ValueCountFrequency (%)
173
 
8.5%
67
 
3.3%
66
 
3.3%
47
 
2.3%
40
 
2.0%
36
 
1.8%
33
 
1.6%
33
 
1.6%
33
 
1.6%
28
 
1.4%
Other values (217) 1470
72.6%
Latin
ValueCountFrequency (%)
N 1624
 
10.9%
A 1423
 
9.6%
O 1281
 
8.6%
E 1186
 
8.0%
I 1058
 
7.1%
H 880
 
5.9%
S 804
 
5.4%
K 721
 
4.9%
U 721
 
4.9%
G 698
 
4.7%
Other values (58) 4457
30.0%
Cyrillic
ValueCountFrequency (%)
А 368
17.4%
Н 191
 
9.0%
И 162
 
7.7%
В 135
 
6.4%
М 134
 
6.3%
Е 115
 
5.4%
О 112
 
5.3%
К 111
 
5.3%
Р 92
 
4.4%
Л 91
 
4.3%
Other values (23) 603
28.5%
Katakana
ValueCountFrequency (%)
6
24.0%
3
12.0%
3
12.0%
2
 
8.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
Other values (5) 5
20.0%
Common
ValueCountFrequency (%)
1845
99.6%
2
 
0.1%
÷ 2
 
0.1%
1 1
 
0.1%
3 1
 
0.1%
1
 
0.1%
1
 
0.1%
Unknown
ValueCountFrequency (%)
􀀁 5
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16559
75.1%
Cyrillic 2114
 
9.6%
Hangul 2026
 
9.2%
CJK 1144
 
5.2%
None 125
 
0.6%
Katakana 28
 
0.1%
Latin Ext Additional 23
 
0.1%
CJK Compat Ideographs 16
 
0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1845
 
11.1%
N 1624
 
9.8%
A 1423
 
8.6%
O 1281
 
7.7%
E 1186
 
7.2%
I 1058
 
6.4%
H 880
 
5.3%
S 804
 
4.9%
K 721
 
4.4%
U 721
 
4.4%
Other values (19) 5016
30.3%
Cyrillic
ValueCountFrequency (%)
А 368
17.4%
Н 191
 
9.0%
И 162
 
7.7%
В 135
 
6.4%
М 134
 
6.3%
Е 115
 
5.4%
О 112
 
5.3%
К 111
 
5.3%
Р 92
 
4.4%
Л 91
 
4.3%
Other values (23) 603
28.5%
Hangul
ValueCountFrequency (%)
173
 
8.5%
67
 
3.3%
66
 
3.3%
47
 
2.3%
40
 
2.0%
36
 
1.8%
33
 
1.6%
33
 
1.6%
33
 
1.6%
28
 
1.4%
Other values (217) 1470
72.6%
CJK
ValueCountFrequency (%)
43
 
3.8%
29
 
2.5%
16
 
1.4%
15
 
1.3%
14
 
1.2%
14
 
1.2%
12
 
1.0%
11
 
1.0%
11
 
1.0%
11
 
1.0%
Other values (435) 968
84.6%
None
ValueCountFrequency (%)
É 23
18.4%
Ü 10
 
8.0%
Ö 10
 
8.0%
Á 8
 
6.4%
Ð 6
 
4.8%
 6
 
4.8%
Å 6
 
4.8%
􀀁 5
 
4.0%
Ê 5
 
4.0%
Ô 5
 
4.0%
Other values (19) 41
32.8%
CJK Compat Ideographs
ValueCountFrequency (%)
11
68.8%
2
 
12.5%
2
 
12.5%
1
 
6.2%
Katakana
ValueCountFrequency (%)
6
21.4%
3
10.7%
3
10.7%
2
 
7.1%
2
 
7.1%
1
 
3.6%
1
 
3.6%
1
 
3.6%
1
 
3.6%
1
 
3.6%
Other values (7) 7
25.0%
Latin Ext Additional
ValueCountFrequency (%)
4
17.4%
3
13.0%
3
13.0%
2
 
8.7%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
Other values (5) 5
21.7%
Punctuation
ValueCountFrequency (%)
1
100.0%

SORT_AUTHOR_KOR
Text

MISSING 

Distinct: 479
Distinct (%): 69.0%
Missing: 1771
Missing (%): 71.8%
Memory size: 19.4 KiB

Length

Max length: 14
Median length: 3
Mean length: 3.1268012
Min length: 2

Characters and Unicode

Total characters: 2170
Distinct characters: 229
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 377
Unique (%): 54.3%

Sample

1st row: 정광
2nd row: 정광
3rd row: 정광
4th row: 정광
5th row: 정광
ValueCountFrequency (%)
김관웅 31
 
4.3%
김도영 11
 
1.5%
김호웅 11
 
1.5%
오상순 8
 
1.1%
안평추 8
 
1.1%
정광 7
 
1.0%
김춘선 6
 
0.8%
장광군 5
 
0.7%
태평무 5
 
0.7%
장흥권 5
 
0.7%
Other values (489) 626
86.6%

Most occurring characters

ValueCountFrequency (%)
179
 
8.2%
69
 
3.2%
66
 
3.0%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.3%
Other values (219) 1592
73.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2141
98.7%
Space Separator 29
 
1.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
Space Separator
ValueCountFrequency (%)
29
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2141
98.7%
Common 29
 
1.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
Common
ValueCountFrequency (%)
29
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2141
98.7%
ASCII 29
 
1.3%

Most frequent character per block

Hangul
ValueCountFrequency (%)
179
 
8.4%
69
 
3.2%
66
 
3.1%
50
 
2.3%
48
 
2.2%
37
 
1.7%
35
 
1.6%
34
 
1.6%
31
 
1.4%
29
 
1.4%
Other values (218) 1563
73.0%
ASCII
ValueCountFrequency (%)
29
100.0%

SORT_AUTHOR_ENG
Text

MISSING 

Distinct: 969
Distinct (%): 77.4%
Missing: 1213
Missing (%): 49.2%
Memory size: 19.4 KiB

Length

Max length: 29
Median length: 23
Mean length: 13.097444
Min length: 3

Characters and Unicode

Total characters: 16398
Distinct characters: 45
Distinct categories: 3
Distinct scripts: 4
Distinct blocks: 4

Unique

Unique: 766
Unique (%): 61.2%

Sample

1st row: SURACHAI SENSRI
2nd row: SUMI YOON
3rd row: TETSUHARU MORIYA
4th row: KAORU HORIE
5th row: TETSUHARU MORIYA
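
The sample rows and character statistics suggest that SORT_AUTHOR_ENG is an uppercased copy of AUTHOR_ENG with punctuation removed: the sort column reports only uppercase letters, spaces, and a few other letters. The sketch below shows one normalization that would be consistent with that observation. It is an inference from the statistics, not the data provider's documented rule; the function name and the second example name are hypothetical.

```python
import unicodedata

def sort_key(name):
    """Uppercase a name and drop punctuation, collapsing leftover whitespace."""
    # Unicode general categories starting with "P" cover all punctuation,
    # including dashes (Pd) and periods or commas (Po).
    kept = [ch for ch in name if not unicodedata.category(ch).startswith("P")]
    return " ".join("".join(kept).upper().split())

print(sort_key("Tetsuharu Moriya"))
print(sort_key("Kim, J.-H."))  # hypothetical input, not taken from the dataset
```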
ValueCountFrequency (%)
kim 139
 
5.0%
lee 68
 
2.5%
park 54
 
2.0%
shin 26
 
0.9%
han 19
 
0.7%
j 18
 
0.7%
suh 18
 
0.7%
yoon 16
 
0.6%
john 15
 
0.5%
young 14
 
0.5%
Other values (1394) 2380
86.0%

Most occurring characters

ValueCountFrequency (%)
N 1645
 
10.0%
1515
 
9.2%
A 1454
 
8.9%
O 1281
 
7.8%
E 1199
 
7.3%
I 1081
 
6.6%
H 884
 
5.4%
S 792
 
4.8%
U 733
 
4.5%
K 732
 
4.5%
Other values (35) 5082
31.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 14880
90.7%
Space Separator 1515
 
9.2%
Other Letter 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1645
 
11.1%
A 1454
 
9.8%
O 1281
 
8.6%
E 1199
 
8.1%
I 1081
 
7.3%
H 884
 
5.9%
S 792
 
5.3%
U 733
 
4.9%
K 732
 
4.9%
G 716
 
4.8%
Other values (31) 4363
29.3%
Other Letter
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Space Separator
ValueCountFrequency (%)
1515
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14876
90.7%
Common 1515
 
9.2%
Cyrillic 4
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1645
 
11.1%
A 1454
 
9.8%
O 1281
 
8.6%
E 1199
 
8.1%
I 1081
 
7.3%
H 884
 
5.9%
S 792
 
5.3%
U 733
 
4.9%
K 732
 
4.9%
G 716
 
4.8%
Other values (27) 4359
29.3%
Cyrillic
ValueCountFrequency (%)
О 1
25.0%
А 1
25.0%
В 1
25.0%
Н 1
25.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Common
ValueCountFrequency (%)
1515
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16344
99.7%
None 47
 
0.3%
Cyrillic 4
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 1645
 
10.1%
1515
 
9.3%
A 1454
 
8.9%
O 1281
 
7.8%
E 1199
 
7.3%
I 1081
 
6.6%
H 884
 
5.4%
S 792
 
4.8%
U 733
 
4.5%
K 732
 
4.5%
Other values (17) 5028
30.8%
None
ValueCountFrequency (%)
É 12
25.5%
Ö 8
17.0%
Á 7
14.9%
Ü 7
14.9%
Ô 4
 
8.5%
Š 4
 
8.5%
Ű 1
 
2.1%
Ř 1
 
2.1%
Õ 1
 
2.1%
Ğ 1
 
2.1%
Cyrillic
ValueCountFrequency (%)
О 1
25.0%
А 1
25.0%
В 1
25.0%
Н 1
25.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

SORT_AUTHOR_ETC
Text

MISSING 

Distinct: 504
Distinct (%): 83.2%
Missing: 1859
Missing (%): 75.4%
Memory size: 19.4 KiB

Length

Max length: 28
Median length: 24
Mean length: 6.5165017
Min length: 2

Characters and Unicode

Total characters: 3949
Distinct characters: 545
Distinct categories: 7
Distinct scripts: 5
Distinct blocks: 8

Unique

Unique: 432
Unique (%): 71.3%

Sample

1st row: 磯崎典世
2nd row: 磯崎典世
3rd row: 鄭光
4th row: 鄭光
5th row: 鄭光
ValueCountFrequency (%)
ким 24
 
2.5%
а 21
 
2.2%
пак 12
 
1.3%
м 12
 
1.3%
с 12
 
1.3%
в 8
 
0.8%
鄭光 7
 
0.7%
и 7
 
0.7%
л 6
 
0.6%
хан 6
 
0.6%
Other values (629) 827
87.8%

Most occurring characters

ValueCountFrequency (%)
А 362
 
9.2%
336
 
8.5%
Н 188
 
4.8%
И 161
 
4.1%
М 134
 
3.4%
В 131
 
3.3%
Е 111
 
2.8%
К 110
 
2.8%
О 110
 
2.8%
Л 89
 
2.3%
Other values (535) 2217
56.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2471 | 62.6%
Other Letter | 1138 | 28.8%
Space Separator | 336 | 8.5%
Decimal Number | 1 | < 0.1%
Modifier Letter | 1 | < 0.1%
Other Punctuation | 1 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
40
 
3.5%
31
 
2.7%
20
 
1.8%
15
 
1.3%
14
 
1.2%
13
 
1.1%
13
 
1.1%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (444) 958
84.2%
Uppercase Letter
ValueCountFrequency (%)
А 362
 
14.6%
Н 188
 
7.6%
И 161
 
6.5%
М 134
 
5.4%
В 131
 
5.3%
Е 111
 
4.5%
К 110
 
4.5%
О 110
 
4.5%
Л 89
 
3.6%
Р 89
 
3.6%
Other values (76) 986
39.9%
Space Separator
ValueCountFrequency (%)
336
100.0%
Decimal Number
ValueCountFrequency (%)
3 1
100.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%
Other Punctuation
ValueCountFrequency (%)
1
100.0%
Dash Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Cyrillic | 2078 | 52.6%
Han | 1120 | 28.4%
Latin | 393 | 10.0%
Common | 340 | 8.6%
Katakana | 18 | 0.5%
0.5%

Most frequent character per script

Han
ValueCountFrequency (%)
40
 
3.6%
31
 
2.8%
20
 
1.8%
15
 
1.3%
14
 
1.2%
13
 
1.2%
13
 
1.2%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (430) 940
83.9%
Latin
ValueCountFrequency (%)
N 43
 
10.9%
H 31
 
7.9%
T 28
 
7.1%
G 27
 
6.9%
A 21
 
5.3%
I 19
 
4.8%
E 19
 
4.8%
L 18
 
4.6%
R 17
 
4.3%
S 15
 
3.8%
Other values (43) 155
39.4%
Cyrillic
ValueCountFrequency (%)
А 362
17.4%
Н 188
 
9.0%
И 161
 
7.7%
М 134
 
6.4%
В 131
 
6.3%
Е 111
 
5.3%
К 110
 
5.3%
О 110
 
5.3%
Л 89
 
4.3%
Р 89
 
4.3%
Other values (23) 593
28.5%
Katakana
ValueCountFrequency (%)
3
16.7%
2
11.1%
2
11.1%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
1
 
5.6%
Other values (4) 4
22.2%
Common
ValueCountFrequency (%)
336
98.8%
3 1
 
0.3%
1
 
0.3%
1
 
0.3%
1
 
0.3%

Most occurring blocks

Value | Count | Frequency (%)
Cyrillic | 2078 | 52.6%
CJK | 1107 | 28.0%
ASCII | 660 | 16.7%
None | 43 | 1.1%
Latin Ext Additional | 27 | 0.7%
Katakana | 20 | 0.5%
CJK Compat Ideographs | 13 | 0.3%
Punctuation | 1 | < 0.1%

Most frequent character per block

Cyrillic
ValueCountFrequency (%)
А 362
17.4%
Н 188
 
9.0%
И 161
 
7.7%
М 134
 
6.4%
В 131
 
6.3%
Е 111
 
5.3%
К 110
 
5.3%
О 110
 
5.3%
Л 89
 
4.3%
Р 89
 
4.3%
Other values (23) 593
28.5%
ASCII
ValueCountFrequency (%)
336
50.9%
N 43
 
6.5%
H 31
 
4.7%
T 28
 
4.2%
G 27
 
4.1%
A 21
 
3.2%
I 19
 
2.9%
E 19
 
2.9%
L 18
 
2.7%
R 17
 
2.6%
Other values (15) 101
 
15.3%
CJK
ValueCountFrequency (%)
40
 
3.6%
31
 
2.8%
20
 
1.8%
15
 
1.4%
14
 
1.3%
13
 
1.2%
13
 
1.2%
12
 
1.1%
11
 
1.0%
11
 
1.0%
Other values (426) 927
83.7%
CJK Compat Ideographs
ValueCountFrequency (%)
8
61.5%
2
 
15.4%
2
 
15.4%
1
 
7.7%
None
ValueCountFrequency (%)
É 8
18.6%
Ê 7
16.3%
Ư 5
11.6%
Ü 4
9.3%
Ö 4
9.3%
Đ 3
 
7.0%
Ơ 2
 
4.7%
Ă 2
 
4.7%
 2
 
4.7%
Ì 1
 
2.3%
Other values (5) 5
11.6%
Latin Ext Additional
ValueCountFrequency (%)
5
18.5%
5
18.5%
4
14.8%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (5) 5
18.5%
Katakana
ValueCountFrequency (%)
3
15.0%
2
 
10.0%
2
 
10.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (6) 6
30.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

GANADA_AUTHOR_ORI
Categorical

HIGH CORRELATION 

Distinct 40
Distinct (%) 1.6%
Missing 0
Missing (%) 0.0%
Memory size 19.4 KiB

Value | Count
ETC | 588
 | 223
S | 164
J | 132
 | 128
Other values (35) | 1230

Length

Max length 3
Median length 1
Mean length 1.4770791
Min length 1

Unique

Unique 5
Unique (%) 0.2%

Sample

1st row S
2nd row S
3rd row T
4th row K
5th row T

Common Values

Value | Count | Frequency (%)
ETC | 588 | 23.9%
 | 223 | 9.0%
S | 164 | 6.7%
J | 132 | 5.4%
 | 128 | 5.2%
H | 100 | 4.1%
K | 93 | 3.8%
 | 92 | 3.7%
M | 87 | 3.5%
Y | 74 | 3.0%
Other values (30) | 784 | 31.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
etc 588
23.9%
223
 
9.0%
s 164
 
6.7%
j 132
 
5.4%
128
 
5.2%
h 100
 
4.1%
k 93
 
3.8%
92
 
3.7%
m 87
 
3.5%
y 74
 
3.0%
Other values (30) 784
31.8%

GANADA_AUTHOR_KOR
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 14
Distinct (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory size 19.4 KiB

Value | Count
<NA> | 1771
 | 231
 | 133
 | 100
 | 43
Other values (9) | 187

Length

Max length 4
Median length 4
Mean length 3.1553753
Min length 1

Unique

Unique 2
Unique (%) 0.1%

Sample

1st row <NA>
2nd row <NA>
3rd row <NA>
4th row <NA>
5th row <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 1771 | 71.8%
 | 231 | 9.4%
 | 133 | 5.4%
 | 100 | 4.1%
 | 43 | 1.7%
 | 41 | 1.7%
 | 39 | 1.6%
 | 36 | 1.5%
 | 22 | 0.9%
 | 21 | 0.9%
Other values (4) | 28 | 1.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 1771
71.8%
231
 
9.4%
133
 
5.4%
100
 
4.1%
43
 
1.7%
41
 
1.7%
39
 
1.6%
36
 
1.5%
22
 
0.9%
21
 
0.9%
Other values (4) 28
 
1.1%

GANADA_AUTHOR_ENG
Categorical

HIGH CORRELATION 

Distinct 28
Distinct (%) 1.1%
Missing 0
Missing (%) 0.0%
Memory size 19.4 KiB

Value | Count
<NA> | 1213
S | 162
J | 134
H | 103
K | 99
Other values (23) | 754

Length

Max length 4
Median length 1
Mean length 2.4778905
Min length 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row S
2nd row S
3rd row T
4th row K
5th row T

Common Values

Value | Count | Frequency (%)
<NA> | 1213 | 49.2%
S | 162 | 6.6%
J | 134 | 5.4%
H | 103 | 4.2%
K | 99 | 4.0%
M | 82 | 3.3%
Y | 79 | 3.2%
C | 75 | 3.0%
P | 61 | 2.5%
D | 60 | 2.4%
Other values (18) | 397 | 16.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 1213
49.2%
s 162
 
6.6%
j 134
 
5.4%
h 103
 
4.2%
k 99
 
4.0%
m 82
 
3.3%
y 79
 
3.2%
c 75
 
3.0%
p 61
 
2.5%
d 60
 
2.4%
Other values (18) 397
 
16.1%

GANADA_AUTHOR_ETC
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 15
Distinct (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory size 19.4 KiB

Value | Count
<NA> | 1859
ETC | 578
T | 8
V | 4
S | 3
Other values (10) | 13

Length

Max length 4
Median length 4
Mean length 3.7314402
Min length 1

Unique

Unique 7
Unique (%) 0.3%

Sample

1st row <NA>
2nd row <NA>
3rd row <NA>
4th row <NA>
5th row <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 1859 | 75.4%
ETC | 578 | 23.4%
T | 8 | 0.3%
V | 4 | 0.2%
S | 3 | 0.1%
P | 2 | 0.1%
G | 2 | 0.1%
L | 2 | 0.1%
M | 1 | < 0.1%
3 | 1 | < 0.1%
Other values (5) | 5 | 0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 1859
75.4%
etc 578
 
23.4%
t 8
 
0.3%
v 4
 
0.2%
s 3
 
0.1%
p 2
 
0.1%
g 2
 
0.1%
l 2
 
0.1%
m 1
 
< 0.1%
3 1
 
< 0.1%
Other values (5) 5
 
0.2%

Interactions

[Figure: interactions plot]

Correlations

Variable | AUTHOR_ID | GANADA_AUTHOR_ORI | GANADA_AUTHOR_KOR | GANADA_AUTHOR_ENG | GANADA_AUTHOR_ETC
AUTHOR_ID | 1.000 | 0.598 | 0.431 | 0.370 | 0.662
GANADA_AUTHOR_ORI | 0.598 | 1.000 | 0.989 | 0.996 | 0.992
GANADA_AUTHOR_KOR | 0.431 | 0.989 | 1.000 | 0.765 | 0.000
GANADA_AUTHOR_ENG | 0.370 | 0.996 | 0.765 | 1.000 | 1.000
GANADA_AUTHOR_ETC | 0.662 | 0.992 | 0.000 | 1.000 | 1.000
Variable | GANADA_AUTHOR_ENG | GANADA_AUTHOR_ETC | GANADA_AUTHOR_KOR | GANADA_AUTHOR_ORI
GANADA_AUTHOR_ENG | 1.000 | 0.707 | 0.405 | 0.909
GANADA_AUTHOR_ETC | 0.707 | 1.000 | 0.000 | 0.936
GANADA_AUTHOR_KOR | 0.405 | 0.000 | 1.000 | 0.922
GANADA_AUTHOR_ORI | 0.909 | 0.936 | 0.922 | 1.000
Variable | AUTHOR_ID | GANADA_AUTHOR_ORI | GANADA_AUTHOR_KOR | GANADA_AUTHOR_ENG | GANADA_AUTHOR_ETC
AUTHOR_ID | 1.000 | 0.231 | 0.194 | 0.142 | 0.339
GANADA_AUTHOR_ORI | 0.231 | 1.000 | 0.922 | 0.909 | 0.936
GANADA_AUTHOR_KOR | 0.194 | 0.922 | 1.000 | 0.405 | 0.000
GANADA_AUTHOR_ENG | 0.142 | 0.909 | 0.405 | 1.000 | 0.707
GANADA_AUTHOR_ETC | 0.339 | 0.936 | 0.000 | 0.707 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
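
Nullity correlation of this kind is commonly computed as the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 where it is present). The report does not state the exact formula it uses, so the sketch below is an assumption along those lines, written with the standard library only. The `pearson` and `nullity` helpers are hypothetical names; the three columns reproduce the AUTHOR_KOR, AUTHOR_ENG and AUTHOR_ETC values of the first ten sample rows.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a column that is never (or always) missing has no nullity correlation
    return cov / sqrt(var_x * var_y)


def nullity(column):
    """Map a column to 1 where the value is missing (None) and 0 otherwise."""
    return [1 if value is None else 0 for value in column]


# First ten sample rows of the dataset; None stands in for <NA>.
author_kor = [None] * 8 + ["정광", "정광"]
author_eng = ["Surachai Sensri", "Sumi Yoon", "Tetsuharu Moriya", "Kaoru Horie",
              "Tetsuharu Moriya", "Yong-Taek Kim"] + [None] * 4
author_etc = [None] * 6 + ["磯崎典世", "磯崎典世", "鄭光", "鄭光"]

eng_vs_etc = pearson(nullity(author_eng), nullity(author_etc))
eng_vs_kor = pearson(nullity(author_eng), nullity(author_kor))
print(eng_vs_etc, eng_vs_kor)
```

In these ten rows AUTHOR_ENG is missing exactly where AUTHOR_ETC is present, so their nullity correlation is strongly negative; ten rows are far too few to say anything about the full dataset.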

Sample

Row | AUTHOR_ID | CATALOG_ID | AUTHOR_ORI | AUTHOR_KOR | AUTHOR_ENG | AUTHOR_ETC | SORT_AUTHOR_ORI | SORT_AUTHOR_KOR | SORT_AUTHOR_ENG | SORT_AUTHOR_ETC | GANADA_AUTHOR_ORI | GANADA_AUTHOR_KOR | GANADA_AUTHOR_ENG | GANADA_AUTHOR_ETC
0 | 7438 | 08C19_0019 | Surachai Sensri | <NA> | Surachai Sensri | <NA> | SURACHAI SENSRI | <NA> | SURACHAI SENSRI | <NA> | S | <NA> | S | <NA>
1 | 7439 | 08C09_0004 | Sumi Yoon | <NA> | Sumi Yoon | <NA> | SUMI YOON | <NA> | SUMI YOON | <NA> | S | <NA> | S | <NA>
2 | 7440 | 06C10_0028 | Tetsuharu Moriya | <NA> | Tetsuharu Moriya | <NA> | TETSUHARU MORIYA | <NA> | TETSUHARU MORIYA | <NA> | T | <NA> | T | <NA>
3 | 7441 | 06C10_0028 | Kaoru Horie | <NA> | Kaoru Horie | <NA> | KAORU HORIE | <NA> | KAORU HORIE | <NA> | K | <NA> | K | <NA>
4 | 7442 | 08C09_0024 | Tetsuharu Moriya | <NA> | Tetsuharu Moriya | <NA> | TETSUHARU MORIYA | <NA> | TETSUHARU MORIYA | <NA> | T | <NA> | T | <NA>
5 | 7443 | 08C09_0024 | Yong-Taek Kim | <NA> | Yong-Taek Kim | <NA> | YONGTAEK KIM | <NA> | YONGTAEK KIM | <NA> | Y | <NA> | Y | <NA>
6 | 7444 | 10C11_0005 | 磯崎典世 | <NA> | <NA> | 磯崎典世 | 磯崎典世 | <NA> | <NA> | 磯崎典世 | ETC | <NA> | <NA> | ETC
7 | 7445 | 10C11_0006 | 磯崎典世 | <NA> | <NA> | 磯崎典世 | 磯崎典世 | <NA> | <NA> | 磯崎典世 | ETC | <NA> | <NA> | ETC
8 | 7446 | 06C06_0028 | 정광 | 정광 | <NA> | 鄭光 | 정광 | 정광 | <NA> | 鄭光 | | | <NA> | ETC
9 | 7447 | 07C06_0002 | 정광 | 정광 | <NA> | 鄭光 | 정광 | 정광 | <NA> | 鄭光 | | | <NA> | ETC
Row | AUTHOR_ID | CATALOG_ID | AUTHOR_ORI | AUTHOR_KOR | AUTHOR_ENG | AUTHOR_ETC | SORT_AUTHOR_ORI | SORT_AUTHOR_KOR | SORT_AUTHOR_ENG | SORT_AUTHOR_ETC | GANADA_AUTHOR_ORI | GANADA_AUTHOR_KOR | GANADA_AUTHOR_ENG | GANADA_AUTHOR_ETC
2455 | 10428 | 11R61_0003 | 심현숙 | 심현숙 | <NA> | <NA> | 심현숙 | 심현숙 | <NA> | <NA> | | | <NA> | <NA>
2456 | 10430 | 12R15_0002 | Nusta Carranza Ko | <NA> | <NA> | Nusta Carranza Ko | NUSTA CARRANZA KO | <NA> | <NA> | NUSTA CARRANZA KO | N | <NA> | <NA> | N
2457 | 10431 | 12R15_0002 | Jeong-Nam Kim | <NA> | <NA> | Jeong-Nam Kim | JEONGNAM KIM | <NA> | <NA> | JEONGNAM KIM | J | <NA> | <NA> | J
2458 | 10432 | 12R15_0002 | Song I. No | <NA> | <NA> | Song I. No | SONG I NO | <NA> | <NA> | SONG I NO | S | <NA> | <NA> | S
2459 | 10433 | 12R15_0002 | Ronald Gobbi Simoes | <NA> | <NA> | Ronald Gobbi Simoes | RONALD GOBBI SIMOES | <NA> | <NA> | RONALD GOBBI SIMOES | R | <NA> | <NA> | R
2460 | 10435 | 11R61 | 沈贤淑 | 심현숙 | <NA> | <NA> | 沈贤淑 | 심현숙 | <NA> | <NA> | ETC | | <NA> | <NA>
2461 | 10436 | 09R84 | Byung-jin Lim | 임병진 | Byung-jin Lim | <NA> | BYUNGJIN LIM | 임병진 | BYUNGJIN LIM | <NA> | B | | B | <NA>
2462 | 10437 | 07R73 | Simbirtseva Tatiana M. | <NA> | Simbirtseva Tatiana M. | <NA> | SIMBIRTSEVA TATIANA M | <NA> | SIMBIRTSEVA TATIANA M | <NA> | S | <NA> | S | <NA>
2463 | 10442 | 07R12 | 한영 | 한영 | <NA> | <NA> | 한영 | 한영 | <NA> | <NA> | | | <NA> | <NA>
2464 | 10443 | 07R62 | Xuehua Hong | <NA> | Xuehua Hong | <NA> | XUEHUA HONG | <NA> | XUEHUA HONG | <NA> | X | <NA> | X | <NA>
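
The sample rows suggest how the SORT_ and GANADA_ columns relate to the AUTHOR_ columns: the sort form looks like the name upper-cased with punctuation removed, and the index value looks like the first letter for Latin-script names and "ETC" for Han-script names. The dataset's documentation is not part of this report, so the sketch below is only an inference from those rows; `sort_key` and `index_bucket` are hypothetical names. Index values for Hangul names are not legible in this text, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional


def sort_key(name: str) -> str:
    """Upper-case a name and drop punctuation, keeping letters, digits and spaces."""
    return "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())


def index_bucket(sort_name: str) -> Optional[str]:
    """Guess the index value: A-Z initial, None for Hangul (unknown here), else 'ETC'."""
    first = sort_name[:1]
    if "A" <= first <= "Z":
        return first
    if "\uac00" <= first <= "\ud7a3":  # Hangul syllables: rule not recoverable from this report
        return None
    return "ETC"


for name in ["Yong-Taek Kim", "Song I. No", "磯崎典世", "정광"]:
    key = sort_key(name)
    print(name, "->", key, "->", index_bucket(key))
```

Other cases visible in the frequency tables, such as the single "3" value in GANADA_AUTHOR_ETC, are outside what this sketch covers.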