Dataset statistics
Number of variables | 14 |
---|---|
Number of observations | 2191 |
Missing cells | 5078 |
Missing cells (%) | 16.6% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 241.9 KiB |
Average record size in memory | 113.1 B |
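The derived figures in this table can be checked against the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is total memory divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 14
n_observations = 2191
missing_cells = 5078
total_memory_kib = 241.9

# Assumption: the percentage is taken over every cell (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / rows, with 1 KiB = 1024 B.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the 16.6% and 113.1 B shown in the table.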
Variable types
Numeric | 1 |
---|---|
Text | 9 |
Categorical | 4 |
Dataset
Description | Information on institutions that registered research outputs of the Overseas Korean Studies Support Program |
---|---|
Author | The Academy of Korean Studies (한국학중앙연구원) |
URL | https://www.data.go.kr/data/15049067/fileData.do |
Alerts
Alert | Type |
---|---|
GANADA_ORGANIZATION_ORI is highly overall correlated with GANADA_ORGANIZATION_KOR and 2 other fields | High correlation |
GANADA_ORGANIZATION_ETC is highly overall correlated with GANADA_ORGANIZATION_ORI | High correlation |
ORGANIZATION_ID is highly overall correlated with GANADA_ORGANIZATION_KOR and 1 other field | High correlation |
GANADA_ORGANIZATION_KOR is highly overall correlated with ORGANIZATION_ID and 2 other fields | High correlation |
GANADA_ORGANIZATION_ENG is highly overall correlated with ORGANIZATION_ID and 2 other fields | High correlation |
GANADA_ORGANIZATION_ETC is highly imbalanced (86.3%) | Imbalance |
ORGANIZATION_KOR has 250 (11.4%) missing values | Missing |
ORGANIZATION_ENG has 235 (10.7%) missing values | Missing |
ORGANIZATION_ETC has 2043 (93.2%) missing values | Missing |
SORT_ORGANIZATION_KOR has 250 (11.4%) missing values | Missing |
SORT_ORGANIZATION_ENG has 235 (10.7%) missing values | Missing |
SORT_ORGANIZATION_ETC has 2043 (93.2%) missing values | Missing |
ORGANIZATION_ID has unique values | Unique |
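The missing-value alerts can be cross-checked against the dataset-level total of 5078 missing cells. This is a small sketch using only numbers quoted in the report; the dictionary and variable names are illustrative.

```python
n_observations = 2191
total_missing_cells = 5078  # from "Dataset statistics"

# Missing counts quoted in the "Missing" alerts.
alerted_missing = {
    "ORGANIZATION_KOR": 250,
    "ORGANIZATION_ENG": 235,
    "ORGANIZATION_ETC": 2043,
    "SORT_ORGANIZATION_KOR": 250,
    "SORT_ORGANIZATION_ENG": 235,
    "SORT_ORGANIZATION_ETC": 2043,
}

# Share of observations missing per alerted variable, rounded as in the alerts.
missing_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in alerted_missing.items()
}

alerted_total = sum(alerted_missing.values())
not_alerted = total_missing_cells - alerted_total

for name, pct in missing_pct.items():
    print(f"{name}: {pct}%")
print(f"missing cells covered by alerts: {alerted_total}")
print(f"missing cells in variables without an alert: {not_alerted}")
```

The alerts account for 5056 of the 5078 missing cells. The remaining 22 belong to variables that did not trigger a Missing alert; ORGANIZATION_ORI, for example, reports 11 missing values (0.5%).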
Reproduction
Analysis started | 2023-12-12 02:09:17.738799 |
---|---|
Analysis finished | 2023-12-12 02:09:20.558884 |
Duration | 2.82 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
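The reported duration is simply the difference between the two timestamps. A minimal check with the standard library, assuming both timestamps share the same (unstated) time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-12-12 02:09:17.738799")
finished = datetime.fromisoformat("2023-12-12 02:09:20.558884")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```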
ORGANIZATION_ID
Real number (ℝ)
HIGH CORRELATION, UNIQUE
Distinct | 2191 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 7279.8129 |
Minimum | 6154 |
---|---|
Maximum | 8608 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 19.4 KiB |
Quantile statistics
Minimum | 6154 |
---|---|
5-th percentile | 6263.5 |
Q1 | 6705.5 |
median | 7267 |
Q3 | 7817.5 |
95-th percentile | 8437.5 |
Maximum | 8608 |
Range | 2454 |
Interquartile range (IQR) | 1112 |
Descriptive statistics
Standard deviation | 668.89804 |
---|---|
Coefficient of variation (CV) | 0.091883961 |
Kurtosis | -1.0390722 |
Mean | 7279.8129 |
Median Absolute Deviation (MAD) | 556 |
Skewness | 0.12539726 |
Sum | 15950070 |
Variance | 447424.59 |
Monotonicity | Strictly increasing |
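Several of the ORGANIZATION_ID statistics are functions of one another, which makes the tables easy to sanity-check. The sketch below recomputes them from the rounded figures printed in the report, so small rounding differences are expected; the variable names are illustrative.

```python
# Values copied from the ORGANIZATION_ID tables.
n = 2191
mean = 7279.8129
std = 668.89804
minimum, maximum = 6154, 8608
q1, q3 = 6705.5, 7817.5

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation
total = mean * n                 # Sum; approximate because the mean is rounded

print(f"range = {value_range}, IQR = {iqr}")
print(f"CV = {cv:.9f}")
print(f"variance = {variance:.2f}")
print(f"sum = {total:.0f} (approximate)")
```

The recomputed values agree with the report's Range (2454), IQR (1112), CV (0.091883961), Variance (447424.59), and Sum (15950070) to within rounding of the inputs.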
Common values
Value | Count | Frequency (%) |
6154 | 1 | < 0.1% |
7637 | 1 | < 0.1% |
7631 | 1 | < 0.1% |
7632 | 1 | < 0.1% |
7633 | 1 | < 0.1% |
7634 | 1 | < 0.1% |
7635 | 1 | < 0.1% |
7636 | 1 | < 0.1% |
7638 | 1 | < 0.1% |
7612 | 1 | < 0.1% |
Other values (2181) | 2181 |
Minimum 10 values
Value | Count | Frequency (%) |
6154 | 1 | |
6155 | 1 | |
6156 | 1 | |
6157 | 1 | |
6158 | 1 | |
6159 | 1 | |
6160 | 1 | |
6161 | 1 | |
6162 | 1 | |
6163 | 1 |
Maximum 10 values
Value | Count | Frequency (%) |
8608 | 1 | |
8607 | 1 | |
8602 | 1 | |
8601 | 1 | |
8600 | 1 | |
8598 | 1 | |
8596 | 1 | |
8593 | 1 | |
8590 | 1 | |
8588 | 1 |
CATALOG_ID
Text
Distinct | 2100 |
---|---|
Distinct (%) | 95.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.2 KiB |
Length
Max length | 11 |
---|---|
Median length | 10 |
Mean length | 9.3455043 |
Min length | 5 |
Characters and Unicode
Total characters | 20476 |
---|---|
Distinct characters | 20 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 2025 |
---|---|
Unique (%) | 92.4% |
Sample
1st row | 08C19_0019 |
---|---|
2nd row | 08C09_0004 |
3rd row | 06C10_0028 |
4th row | 06C10_0028 |
5th row | 08C09_0024 |
Value | Count | Frequency (%) |
09r33_0002 | 4 | 0.2% |
09r33_0001 | 4 | 0.2% |
09c12_0018 | 4 | 0.2% |
09r34 | 4 | 0.2% |
09c12_0004 | 4 | 0.2% |
09c02_0052 | 3 | 0.1% |
06c13_0016 | 3 | 0.1% |
07c09_0040 | 3 | 0.1% |
06p01_0017 | 3 | 0.1% |
10c15_0016 | 3 | 0.1% |
Other values (2090) | 2156 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 7649 | |
1 | 2288 | 11.2% |
_ | 1887 | 9.2% |
C | 1585 | 7.7% |
2 | 1042 | 5.1% |
6 | 1022 | 5.0% |
9 | 951 | 4.6% |
7 | 820 | 4.0% |
5 | 686 | 3.4% |
8 | 665 | 3.2% |
Other values (10) | 1881 | 9.2% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 16319 | |
Uppercase Letter | 2263 | 11.1% |
Connector Punctuation | 1887 | 9.2% |
Lowercase Letter | 7 | < 0.1% |
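The category names in this table correspond to Unicode general categories. As an illustration, the sketch below classifies the five CATALOG_ID sample rows with Python's standard `unicodedata` module. It is not the profiler's own implementation, which may use a different library, and the `CATEGORY_NAMES` mapping is a hand-written subset covering only the categories seen in this variable.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in CATALOG_ID.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pc": "Connector Punctuation",
}

# The five sample rows shown for CATALOG_ID.
samples = ["08C19_0019", "08C09_0004", "06C10_0028", "06C10_0028", "08C09_0024"]

# Count characters per category; unknown codes fall back to the raw code.
category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in samples
    for ch in value
)

for name, count in category_counts.most_common():
    print(name, count)
```

Each sample contains eight digits, one uppercase letter, and one underscore, which mirrors the ordering of categories in the full table.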
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 7649 | |
1 | 2288 | 14.0% |
2 | 1042 | 6.4% |
6 | 1022 | 6.3% |
9 | 951 | 5.8% |
7 | 820 | 5.0% |
5 | 686 | 4.2% |
8 | 665 | 4.1% |
3 | 654 | 4.0% |
4 | 542 | 3.3% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 1585 | |
R | 340 | 15.0% |
P | 186 | 8.2% |
V | 150 | 6.6% |
M | 2 | 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
b | 3 | |
a | 2 | |
d | 1 | 14.3% |
t | 1 | 14.3% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1887 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 18206 | |
Latin | 2270 | 11.1% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 7649 | |
1 | 2288 | 12.6% |
_ | 1887 | 10.4% |
2 | 1042 | 5.7% |
6 | 1022 | 5.6% |
9 | 951 | 5.2% |
7 | 820 | 4.5% |
5 | 686 | 3.8% |
8 | 665 | 3.7% |
3 | 654 | 3.6% |
Latin
Value | Count | Frequency (%) |
C | 1585 | |
R | 340 | 15.0% |
P | 186 | 8.2% |
V | 150 | 6.6% |
b | 3 | 0.1% |
a | 2 | 0.1% |
M | 2 | 0.1% |
d | 1 | < 0.1% |
t | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 20476 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 7649 | |
1 | 2288 | 11.2% |
_ | 1887 | 9.2% |
C | 1585 | 7.7% |
2 | 1042 | 5.1% |
6 | 1022 | 5.0% |
9 | 951 | 4.6% |
7 | 820 | 4.0% |
5 | 686 | 3.4% |
8 | 665 | 3.2% |
Other values (10) | 1881 | 9.2% |
ORGANIZATION_ORI
Text
Distinct | 965 |
---|---|
Distinct (%) | 44.3% |
Missing | 11 |
Missing (%) | 0.5% |
Memory size | 17.2 KiB |
Length
Max length | 107 |
---|---|
Median length | 79 |
Mean length | 18.038991 |
Min length | 1 |
Characters and Unicode
Total characters | 39325 |
---|---|
Distinct characters | 654 |
Distinct categories | 13 |
Distinct scripts | 11 |
Distinct blocks | 14 |
Unique
Unique | 660 |
---|---|
Unique (%) | 30.3% |
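The percentages and the mean length for ORGANIZATION_ORI are consistent with statistics computed over non-missing values only. The sketch below checks this under that assumption, using figures copied from the report; the variable names are illustrative.

```python
# Figures copied from the report: 2191 observations overall,
# plus the ORGANIZATION_ORI tables.
n_observations = 2191
missing = 11
total_characters = 39325
distinct = 965
unique = 660

# Assumption: text statistics use the non-missing values as denominator.
n_present = n_observations - missing

mean_length = total_characters / n_present
distinct_pct = 100 * distinct / n_present
unique_pct = 100 * unique / n_present

print(f"non-missing values: {n_present}")
print(f"mean length: {mean_length:.6f}")
print(f"distinct: {distinct_pct:.1f}%, unique: {unique_pct:.1f}%")
```

With 2180 non-missing values, the results match the reported mean length (18.038991), Distinct (44.3%), and Unique (30.3%).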
Sample
1st row | GMM Grammy ltd |
---|---|
2nd row | Kanazawa University |
3rd row | Kanazawa University |
4th row | Tohoku University |
5th row | Kanazawa University |
Value | Count | Frequency (%) |
university | 852 | 16.0% |
of | 431 | 8.1% |
the | 118 | 2.2% |
national | 90 | 1.7% |
연변대학교 | 64 | 1.2% |
studies | 56 | 1.0% |
연변대학 | 52 | 1.0% |
им | 47 | 0.9% |
중앙민족대학교 | 46 | 0.9% |
state | 46 | 0.9% |
Other values (1204) | 3534 |
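The table above counts words rather than whole values, and the tokens appear lowercased. A minimal sketch of such a count, assuming tokens are produced by lowercasing and splitting on whitespace (the profiler's exact tokenization is not stated in the report), applied to the five sample rows:

```python
from collections import Counter

# The five sample rows shown for ORGANIZATION_ORI.
samples = [
    "GMM Grammy ltd",
    "Kanazawa University",
    "Kanazawa University",
    "Tohoku University",
    "Kanazawa University",
]

# Assumption: lowercase each value, then split on whitespace.
word_counts = Counter(
    word for value in samples for word in value.lower().split()
)

for word, count in word_counts.most_common(3):
    print(word, count)
```

Even in this small sample "university" is the most frequent token, as it is in the full column.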
Most occurring characters
Value | Count | Frequency (%) |
(space) | 3176 | 8.1% |
i | 2913 | 7.4% |
n | 2256 | 5.7% |
e | 2220 | 5.6% |
t | 1738 | 4.4% |
a | 1606 | 4.1% |
o | 1597 | 4.1% |
r | 1511 | 3.8% |
s | 1471 | 3.7% |
y | 1106 | 2.8% |
Other values (644) | 19731 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 25650 | |
Other Letter | 5879 | 14.9% |
Uppercase Letter | 4098 | 10.4% |
Space Separator | 3176 | 8.1% |
Other Punctuation | 260 | 0.7% |
Dash Punctuation | 74 | 0.2% |
Close Punctuation | 51 | 0.1% |
Open Punctuation | 51 | 0.1% |
Decimal Number | 36 | 0.1% |
Nonspacing Mark | 31 | 0.1% |
Other values (3) | 19 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
학 | 640 | 10.9% |
대 | 588 | 10.0% |
교 | 418 | 7.1% |
大 | 273 | 4.6% |
学 | 256 | 4.4% |
연 | 155 | 2.6% |
국 | 137 | 2.3% |
변 | 118 | 2.0% |
중 | 116 | 2.0% |
앙 | 96 | 1.6% |
Other values (473) | 3082 |
Lowercase Letter
Value | Count | Frequency (%) |
i | 2913 | 11.4% |
n | 2256 | 8.8% |
e | 2220 | 8.7% |
t | 1738 | 6.8% |
a | 1606 | 6.3% |
o | 1597 | 6.2% |
r | 1511 | 5.9% |
s | 1471 | 5.7% |
y | 1106 | 4.3% |
v | 975 | 3.8% |
Other values (77) | 8257 |
Uppercase Letter
Value | Count | Frequency (%) |
U | 940 | |
S | 472 | 11.5% |
C | 229 | 5.6% |
A | 217 | 5.3% |
T | 189 | 4.6% |
N | 167 | 4.1% |
H | 141 | 3.4% |
K | 139 | 3.4% |
M | 128 | 3.1% |
I | 128 | 3.1% |
Other values (42) | 1348 |
Nonspacing Mark
Value | Count | Frequency (%) |
្ | 9 | |
ិ | 6 | |
́ | 4 | |
ំ | 3 | 9.7% |
័ | 3 | 9.7% |
ូ | 3 | 9.7% |
ั | 1 | 3.2% |
ิ | 1 | 3.2% |
ู | 1 | 3.2% |
Other Punctuation
Value | Count | Frequency (%) |
. | 116 | |
, | 116 | |
' | 14 | 5.4% |
& | 8 | 3.1% |
" | 2 | 0.8% |
: | 2 | 0.8% |
/ | 2 | 0.8% |
Decimal Number
Value | Count | Frequency (%) |
1 | 18 | |
2 | 10 | |
7 | 4 | 11.1% |
3 | 3 | 8.3% |
5 | 1 | 2.8% |
Open Punctuation
Value | Count | Frequency (%) |
( | 50 | |
„ | 1 | 2.0% |
Spacing Mark
Value | Count | Frequency (%) |
ា | 6 | |
េ | 3 |
Final Punctuation
Value | Count | Frequency (%) |
’ | 4 | |
” | 2 |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 3 | |
‘ | 1 | 25.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 3176 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 74 |
Close Punctuation
Value | Count | Frequency (%) |
) | 51 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 25253 | |
Cyrillic | 4494 | 11.4% |
Hangul | 4083 | 10.4% |
Common | 3658 | 9.3% |
Han | 1685 | 4.3% |
Khmer | 81 | 0.2% |
Arabic | 30 | 0.1% |
Katakana | 20 | 0.1% |
Thai | 16 | < 0.1% |
Inherited | 4 | < 0.1% |
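The script table groups characters by their Unicode Script property. Python's standard library does not expose that property, so the sketch below uses a rough stand-in: the first word of each character's Unicode name. This is a heuristic for illustration only, not the method used by the report, and the helper name `rough_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def rough_script(ch: str) -> str:
    """Heuristic: first word of the Unicode character name.

    This is NOT the real Unicode Script property; it only approximates it
    for letters whose names start with the script name.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


# Three tokens taken from the ORGANIZATION_ORI word table.
words = ["university", "연변대학교", "им"]

script_counts = Counter(rough_script(ch) for word in words for ch in word)

for script, count in script_counts.most_common():
    print(script, count)
```

The heuristic breaks down for characters the report files under Common or Inherited (digits, spaces, punctuation, combining marks), whose names do not begin with a script name, and Han characters come out as "CJK".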
Most frequent character per script
Han
Value | Count | Frequency (%) |
大 | 273 | 16.2% |
学 | 256 | 15.2% |
延 | 62 | 3.7% |
边 | 58 | 3.4% |
院 | 51 | 3.0% |
国 | 42 | 2.5% |
學 | 38 | 2.3% |
中 | 36 | 2.1% |
科 | 28 | 1.7% |
东 | 28 | 1.7% |
Other values (228) | 813 |
Hangul
Value | Count | Frequency (%) |
학 | 640 | 15.7% |
대 | 588 | 14.4% |
교 | 418 | 10.2% |
연 | 155 | 3.8% |
국 | 137 | 3.4% |
변 | 118 | 2.9% |
중 | 116 | 2.8% |
앙 | 96 | 2.4% |
원 | 90 | 2.2% |
사 | 83 | 2.0% |
Other values (192) | 1642 |
Latin
Value | Count | Frequency (%) |
i | 2913 | 11.5% |
n | 2256 | 8.9% |
e | 2220 | 8.8% |
t | 1738 | 6.9% |
a | 1606 | 6.4% |
o | 1597 | 6.3% |
r | 1511 | 6.0% |
s | 1471 | 5.8% |
y | 1106 | 4.4% |
v | 975 | 3.9% |
Other values (76) | 7860 |
Cyrillic
Value | Count | Frequency (%) |
и | 469 | 10.4% |
а | 410 | 9.1% |
н | 334 | 7.4% |
т | 322 | 7.2% |
с | 316 | 7.0% |
е | 292 | 6.5% |
о | 205 | 4.6% |
р | 188 | 4.2% |
к | 185 | 4.1% |
в | 157 | 3.5% |
Other values (42) | 1616 |
Common
Value | Count | Frequency (%) |
(space) | 3176 | |
. | 116 | 3.2% |
, | 116 | 3.2% |
- | 74 | 2.0% |
) | 51 | 1.4% |
( | 50 | 1.4% |
1 | 18 | 0.5% |
' | 14 | 0.4% |
2 | 10 | 0.3% |
& | 8 | 0.2% |
Other values (11) | 25 | 0.7% |
Khmer
Value | Count | Frequency (%) |
្ | 9 | 11.1% |
ទ | 6 | 7.4% |
ិ | 6 | 7.4% |
ា | 6 | 7.4% |
ល | 6 | 7.4% |
ន | 6 | 7.4% |
ភ | 6 | 7.4% |
យ | 6 | 7.4% |
ំ | 3 | 3.7% |
័ | 3 | 3.7% |
Other values (8) | 24 |
Thai
Value | Count | Frequency (%) |
า | 3 | |
ย | 2 | |
ม | 1 | 6.2% |
ห | 1 | 6.2% |
พ | 1 | 6.2% |
ร | 1 | 6.2% |
บ | 1 | 6.2% |
ั | 1 | 6.2% |
ล | 1 | 6.2% |
ท | 1 | 6.2% |
Other values (3) | 3 |
Arabic
Value | Count | Frequency (%) |
ا | 7 | |
ل | 4 | |
ة | 4 | |
ج | 2 | 6.7% |
ع | 2 | 6.7% |
ر | 2 | 6.7% |
د | 2 | 6.7% |
ن | 2 | 6.7% |
ي | 2 | 6.7% |
م | 2 | 6.7% |
Katakana
Value | Count | Frequency (%) |
ル | 4 | |
ソ | 4 | |
ウ | 4 | |
オ | 1 | 5.0% |
ジ | 1 | 5.0% |
ラ | 1 | 5.0% |
ト | 1 | 5.0% |
ッ | 1 | 5.0% |
ク | 1 | 5.0% |
リ | 1 | 5.0% |
Inherited
Value | Count | Frequency (%) |
́ | 4 |
Greek
Value | Count | Frequency (%) |
γ | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 28758 | |
Cyrillic | 4494 | 11.4% |
Hangul | 4083 | 10.4% |
CJK | 1683 | 4.3% |
None | 99 | 0.3% |
Khmer | 81 | 0.2% |
Latin Ext Additional | 42 | 0.1% |
Arabic | 30 | 0.1% |
Katakana | 20 | 0.1% |
Thai | 16 | < 0.1% |
Other values (4) | 19 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 3176 | 11.0% |
i | 2913 | 10.1% |
n | 2256 | 7.8% |
e | 2220 | 7.7% |
t | 1738 | 6.0% |
a | 1606 | 5.6% |
o | 1597 | 5.6% |
r | 1511 | 5.3% |
s | 1471 | 5.1% |
y | 1106 | 3.8% |
Other values (58) | 9164 |
Hangul
Value | Count | Frequency (%) |
학 | 640 | 15.7% |
대 | 588 | 14.4% |
교 | 418 | 10.2% |
연 | 155 | 3.8% |
국 | 137 | 3.4% |
변 | 118 | 2.9% |
중 | 116 | 2.8% |
앙 | 96 | 2.4% |
원 | 90 | 2.2% |
사 | 83 | 2.0% |
Other values (192) | 1642 |
Cyrillic
Value | Count | Frequency (%) |
и | 469 | 10.4% |
а | 410 | 9.1% |
н | 334 | 7.4% |
т | 322 | 7.2% |
с | 316 | 7.0% |
е | 292 | 6.5% |
о | 205 | 4.6% |
р | 188 | 4.2% |
к | 185 | 4.1% |
в | 157 | 3.5% |
Other values (42) | 1616 |
CJK
Value | Count | Frequency (%) |
大 | 273 | 16.2% |
学 | 256 | 15.2% |
延 | 62 | 3.7% |
边 | 58 | 3.4% |
院 | 51 | 3.0% |
国 | 42 | 2.5% |
學 | 38 | 2.3% |
中 | 36 | 2.1% |
科 | 28 | 1.7% |
东 | 28 | 1.7% |
Other values (226) | 811 |
None
Value | Count | Frequency (%) |
ö | 12 | |
ä | 11 | |
Đ | 11 | |
é | 9 | 9.1% |
à | 9 | 9.1% |
á | 9 | 9.1% |
ó | 5 | 5.1% |
Ü | 4 | 4.0% |
ü | 3 | 3.0% |
ș | 3 | 3.0% |
Other values (14) | 23 |
Latin Ext Additional
Value | Count | Frequency (%) |
ọ | 10 | |
ạ | 10 | |
ố | 5 | |
ệ | 5 | |
ộ | 4 | 9.5% |
ứ | 3 | 7.1% |
ế | 2 | 4.8% |
ồ | 1 | 2.4% |
ờ | 1 | 2.4% |
ử | 1 | 2.4% |
Khmer
Value | Count | Frequency (%) |
្ | 9 | 11.1% |
ទ | 6 | 7.4% |
ិ | 6 | 7.4% |
ា | 6 | 7.4% |
ល | 6 | 7.4% |
ន | 6 | 7.4% |
ភ | 6 | 7.4% |
យ | 6 | 7.4% |
ំ | 3 | 3.7% |
័ | 3 | 3.7% |
Other values (8) | 24 |
Arabic
Value | Count | Frequency (%) |
ا | 7 | |
ل | 4 | |
ة | 4 | |
ج | 2 | 6.7% |
ع | 2 | 6.7% |
ر | 2 | 6.7% |
د | 2 | 6.7% |
ن | 2 | 6.7% |
ي | 2 | 6.7% |
م | 2 | 6.7% |
Katakana
Value | Count | Frequency (%) |
ル | 4 | |
ソ | 4 | |
ウ | 4 | |
オ | 1 | 5.0% |
ジ | 1 | 5.0% |
ラ | 1 | 5.0% |
ト | 1 | 5.0% |
ッ | 1 | 5.0% |
ク | 1 | 5.0% |
リ | 1 | 5.0% |
Diacriticals
Value | Count | Frequency (%) |
́ | 4 |
Punctuation
Value | Count | Frequency (%) |
’ | 4 | |
“ | 3 | |
” | 2 | |
„ | 1 | 9.1% |
‘ | 1 | 9.1% |
Thai
Value | Count | Frequency (%) |
า | 3 | |
ย | 2 | |
ม | 1 | 6.2% |
ห | 1 | 6.2% |
พ | 1 | 6.2% |
ร | 1 | 6.2% |
บ | 1 | 6.2% |
ั | 1 | 6.2% |
ล | 1 | 6.2% |
ท | 1 | 6.2% |
Other values (3) | 3 |
IPA Ext
Value | Count | Frequency (%) |
ə | 2 |
CJK Compat Ideographs
Value | Count | Frequency (%) |
女 | 1 | |
林 | 1 |
ORGANIZATION_KOR
Text
MISSING
Distinct | 514 |
---|---|
Distinct (%) | 26.5% |
Missing | 250 |
Missing (%) | 11.4% |
Memory size | 17.2 KiB |
Value | Count | Frequency (%) |
연변대학교 | 146 | 6.9% |
중앙민족대학교 | 71 | 3.4% |
서울대학교 | 55 | 2.6% |
조선사회과학원 | 39 | 1.8% |
한국학중앙연구원 | 36 | 1.7% |
고려대학교 | 33 | 1.6% |
푸단대학교 | 32 | 1.5% |
하와이대학교 | 26 | 1.2% |
산둥대학교 | 26 | 1.2% |
중앙대학교 | 26 | 1.2% |
Other values (540) | 1623 |
Most occurring characters
Value | Count | Frequency (%) |
학 | 1920 | 13.7% |
대 | 1779 | 12.7% |
교 | 1739 | 12.4% |
국 | 300 | 2.1% |
스 | 271 | 1.9% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.2% |
Other values (383) | 7015 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 13775 | |
Space Separator | 173 | 1.2% |
Decimal Number | 24 | 0.2% |
Close Punctuation | 15 | 0.1% |
Open Punctuation | 15 | 0.1% |
Dash Punctuation | 14 | 0.1% |
Other Punctuation | 11 | 0.1% |
Uppercase Letter | 3 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 |
Decimal Number
Value | Count | Frequency (%) |
2 | 9 | |
5 | 6 | |
7 | 4 | |
3 | 3 | 12.5% |
0 | 1 | 4.2% |
1 | 1 | 4.2% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 2 | |
G | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 173 |
Close Punctuation
Value | Count | Frequency (%) |
) | 15 |
Open Punctuation
Value | Count | Frequency (%) |
( | 15 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 14 |
Other Punctuation
Value | Count | Frequency (%) |
, | 11 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 13775 | |
Common | 252 | 1.8% |
Latin | 3 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 |
Common
Value | Count | Frequency (%) |
(space) | 173 | |
) | 15 | 6.0% |
( | 15 | 6.0% |
- | 14 | 5.6% |
, | 11 | 4.4% |
2 | 9 | 3.6% |
5 | 6 | 2.4% |
7 | 4 | 1.6% |
3 | 3 | 1.2% |
0 | 1 | 0.4% |
Latin
Value | Count | Frequency (%) |
M | 2 | |
G | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 13775 | |
ASCII | 255 | 1.8% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 |
ASCII
Value | Count | Frequency (%) |
(space) | 173 | |
) | 15 | 5.9% |
( | 15 | 5.9% |
- | 14 | 5.5% |
, | 11 | 4.3% |
2 | 9 | 3.5% |
5 | 6 | 2.4% |
7 | 4 | 1.6% |
3 | 3 | 1.2% |
M | 2 | 0.8% |
Other values (3) | 3 | 1.2% |
ORGANIZATION_ENG
Text
MISSING
Distinct | 565 |
---|---|
Distinct (%) | 28.9% |
Missing | 235 |
Missing (%) | 10.7% |
Memory size | 17.2 KiB |
Length
Max length | 122 |
---|---|
Median length | 60 |
Mean length | 26.858384 |
Min length | 6 |
Characters and Unicode
Total characters | 52535 |
---|---|
Distinct characters | 78 |
Distinct categories | 10 |
Distinct scripts | 3 |
Distinct blocks | 4 |
Unique
Unique | 284 |
---|---|
Unique (%) | 14.5% |
Sample
1st row | GMM Grammy ltd |
---|---|
2nd row | Kanazawa University |
3rd row | Kanazawa University |
4th row | Tohoku University |
5th row | Kanazawa University |
Value | Count | Frequency (%) |
university | 1688 | |
of | 692 | 10.2% |
the | 196 | 2.9% |
national | 169 | 2.5% |
yanbian | 159 | 2.4% |
studies | 126 | 1.9% |
and | 87 | 1.3% |
for | 86 | 1.3% |
central | 75 | 1.1% |
nationalities | 73 | 1.1% |
Other values (675) | 3404 |
Most occurring characters
Value | Count | Frequency (%) |
i | 5712 | 10.9% |
(space) | 4824 | 9.2% |
n | 4712 | 9.0% |
e | 4186 | 8.0% |
a | 3349 | 6.4% |
t | 3246 | 6.2% |
o | 2838 | 5.4% |
r | 2807 | 5.3% |
s | 2802 | 5.3% |
y | 2155 | 4.1% |
Other values (68) | 15904 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 41244 | |
Uppercase Letter | 6150 | 11.7% |
Space Separator | 4825 | 9.2% |
Other Punctuation | 140 | 0.3% |
Dash Punctuation | 100 | 0.2% |
Open Punctuation | 34 | 0.1% |
Close Punctuation | 34 | 0.1% |
Final Punctuation | 5 | < 0.1% |
Decimal Number | 2 | < 0.1% |
Initial Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
i | 5712 | |
n | 4712 | |
e | 4186 | |
a | 3349 | |
t | 3246 | |
o | 2838 | 6.9% |
r | 2807 | 6.8% |
s | 2802 | 6.8% |
y | 2155 | 5.2% |
v | 1794 | 4.3% |
Other values (26) | 7643 |
Uppercase Letter
Value | Count | Frequency (%) |
U | 1742 | |
S | 739 | |
C | 407 | 6.6% |
N | 361 | 5.9% |
T | 323 | 5.3% |
A | 294 | 4.8% |
K | 290 | 4.7% |
Y | 228 | 3.7% |
F | 223 | 3.6% |
H | 181 | 2.9% |
Other values (18) | 1362 |
Other Punctuation
Value | Count | Frequency (%) |
, | 91 | |
' | 25 | 17.9% |
& | 13 | 9.3% |
. | 9 | 6.4% |
: | 2 | 1.4% |
Space Separator
Value | Count | Frequency (%) |
(space) | 4824 | |
(non-ASCII space) | 1 | < 0.1% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 4 | |
” | 1 | 20.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 100 |
Open Punctuation
Value | Count | Frequency (%) |
( | 34 |
Close Punctuation
Value | Count | Frequency (%) |
) | 34 |
Decimal Number
Value | Count | Frequency (%) |
3 | 2 |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 47393 | |
Common | 5141 | 9.8% |
Cyrillic | 1 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
i | 5712 | |
n | 4712 | 9.9% |
e | 4186 | 8.8% |
a | 3349 | 7.1% |
t | 3246 | 6.8% |
o | 2838 | 6.0% |
r | 2807 | 5.9% |
s | 2802 | 5.9% |
y | 2155 | 4.5% |
v | 1794 | 3.8% |
Other values (53) | 13792 |
Common
Value | Count | Frequency (%) |
(space) | 4824 | |
- | 100 | 1.9% |
, | 91 | 1.8% |
( | 34 | 0.7% |
) | 34 | 0.7% |
' | 25 | 0.5% |
& | 13 | 0.3% |
. | 9 | 0.2% |
’ | 4 | 0.1% |
: | 2 | < 0.1% |
Other values (4) | 5 | 0.1% |
Cyrillic
Value | Count | Frequency (%) |
К | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 52462 | |
None | 66 | 0.1% |
Punctuation | 6 | < 0.1% |
Cyrillic | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 5712 | 10.9% |
(space) | 4824 | 9.2% |
n | 4712 | 9.0% |
e | 4186 | 8.0% |
a | 3349 | 6.4% |
t | 3246 | 6.2% |
o | 2838 | 5.4% |
r | 2807 | 5.4% |
s | 2802 | 5.3% |
y | 2155 | 4.1% |
Other values (52) | 15831 |
None
Value | Count | Frequency (%) |
ö | 18 | |
é | 15 | |
ä | 10 | |
á | 10 | |
ş | 3 | 4.5% |
É | 2 | 3.0% |
ó | 2 | 3.0% |
ü | 2 | 3.0% |
(non-ASCII space) | 1 | 1.5% |
ç | 1 | 1.5% |
Other values (2) | 2 | 3.0% |
Punctuation
Value | Count | Frequency (%) |
’ | 4 | |
“ | 1 | 16.7% |
” | 1 | 16.7% |
Cyrillic
Value | Count | Frequency (%) |
К | 1 |
ORGANIZATION_ETC
Text
MISSING
Distinct | 116 |
---|---|
Distinct (%) | 78.4% |
Missing | 2043 |
Missing (%) | 93.2% |
Memory size | 17.2 KiB |
Length
Max length | 107 |
---|---|
Median length | 62 |
Mean length | 34.912162 |
Min length | 4 |
Characters and Unicode
Total characters | 5167 |
---|---|
Distinct characters | 185 |
Distinct categories | 9 |
Distinct scripts | 5 |
Distinct blocks | 6 |
Unique
Unique | 99 |
---|---|
Unique (%) | 66.9% |
Sample
1st row | 民族出版社 |
---|---|
2nd row | Сахалинский государственный университет |
3rd row | Сахалинский государственный университет |
4th row | 上海市档案馆 |
5th row | 中央研究院 |
Value | Count | Frequency (%) |
им | 44 | 7.0% |
университет | 37 | 5.9% |
государственный | 34 | 5.4% |
институт | 29 | 4.6% |
и | 22 | 3.5% |
арабаева | 16 | 2.6% |
ташкентский | 15 | 2.4% |
казахский | 11 | 1.8% |
востоковедения | 10 | 1.6% |
кыргызский | 10 | 1.6% |
Other values (199) | 398 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 478 | 9.3% |
и | 418 | 8.1% |
а | 376 | 7.3% |
н | 301 | 5.8% |
т | 289 | 5.6% |
с | 282 | 5.5% |
е | 262 | 5.1% |
о | 193 | 3.7% |
р | 170 | 3.3% |
к | 166 | 3.2% |
Other values (175) | 2232 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 3839 | |
Uppercase Letter | 592 | 11.5% |
Space Separator | 478 | 9.3% |
Other Punctuation | 128 | 2.5% |
Other Letter | 114 | 2.2% |
Dash Punctuation | 6 | 0.1% |
Open Punctuation | 4 | 0.1% |
Close Punctuation | 4 | 0.1% |
Decimal Number | 2 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
и | 418 | 10.9% |
а | 376 | 9.8% |
н | 301 | 7.8% |
т | 289 | 7.5% |
с | 282 | 7.3% |
е | 262 | 6.8% |
о | 193 | 5.0% |
р | 170 | 4.4% |
к | 166 | 4.3% |
в | 141 | 3.7% |
Other values (60) | 1241 |
Other Letter
Value | Count | Frequency (%) |
学 | 8 | 7.0% |
院 | 6 | 5.3% |
大 | 5 | 4.4% |
青 | 4 | 3.5% |
为 | 3 | 2.6% |
现 | 3 | 2.6% |
中 | 3 | 2.6% |
岛 | 3 | 2.6% |
科 | 3 | 2.6% |
技 | 3 | 2.6% |
Other values (57) | 73 |
Uppercase Letter
Value | Count | Frequency (%) |
Г | 69 | |
У | 69 | |
И | 61 | |
К | 60 | 10.1% |
А | 49 | 8.3% |
Т | 39 | 6.6% |
Н | 30 | 5.1% |
М | 24 | 4.1% |
Р | 23 | 3.9% |
В | 18 | 3.0% |
Other values (29) | 150 |
Other Punctuation
Value | Count | Frequency (%) |
. | 93 | |
, | 34 | 26.6% |
& | 1 | 0.8% |
Decimal Number
Value | Count | Frequency (%) |
1 | 1 | |
2 | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 478 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 6 |
Open Punctuation
Value | Count | Frequency (%) |
( | 4 |
Close Punctuation
Value | Count | Frequency (%) |
) | 4 |
Most occurring scripts
Value | Count | Frequency (%) |
Cyrillic | 4088 | |
Common | 622 | 12.0% |
Latin | 343 | 6.6% |
Han | 111 | 2.1% |
Katakana | 3 | 0.1% |
Most frequent character per script
Han
Value | Count | Frequency (%) |
学 | 8 | 7.2% |
院 | 6 | 5.4% |
大 | 5 | 4.5% |
青 | 4 | 3.6% |
为 | 3 | 2.7% |
现 | 3 | 2.7% |
中 | 3 | 2.7% |
岛 | 3 | 2.7% |
科 | 3 | 2.7% |
技 | 3 | 2.7% |
Other values (54) | 70 |
Latin
Value | Count | Frequency (%) |
i | 32 | 9.3% |
a | 25 | 7.3% |
c | 21 | 6.1% |
h | 20 | 5.8% |
n | 19 | 5.5% |
g | 16 | 4.7% |
N | 12 | 3.5% |
u | 11 | 3.2% |
t | 11 | 3.2% |
ạ | 10 | 2.9% |
Other values (47) | 166 |
Cyrillic
Value | Count | Frequency (%) |
и | 418 | 10.2% |
а | 376 | 9.2% |
н | 301 | 7.4% |
т | 289 | 7.1% |
с | 282 | 6.9% |
е | 262 | 6.4% |
о | 193 | 4.7% |
р | 170 | 4.2% |
к | 166 | 4.1% |
в | 141 | 3.4% |
Other values (42) | 1490 |
Common
Value | Count | Frequency (%) |
(space) | 478 | |
. | 93 | 15.0% |
, | 34 | 5.5% |
- | 6 | 1.0% |
( | 4 | 0.6% |
) | 4 | 0.6% |
1 | 1 | 0.2% |
2 | 1 | 0.2% |
& | 1 | 0.2% |
Katakana
Value | Count | Frequency (%) |
ラ | 1 | |
ジ | 1 | |
オ | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
Cyrillic | 4088 | |
ASCII | 895 | 17.3% |
CJK | 111 | 2.1% |
Latin Ext Additional | 41 | 0.8% |
None | 29 | 0.6% |
Katakana | 3 | 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 478 | |
. | 93 | 10.4% |
, | 34 | 3.8% |
i | 32 | 3.6% |
a | 25 | 2.8% |
c | 21 | 2.3% |
h | 20 | 2.2% |
n | 19 | 2.1% |
g | 16 | 1.8% |
N | 12 | 1.3% |
Other values (36) | 145 | 16.2% |
Cyrillic
Value | Count | Frequency (%) |
и | 418 | 10.2% |
а | 376 | 9.2% |
н | 301 | 7.4% |
т | 289 | 7.1% |
с | 282 | 6.9% |
е | 262 | 6.4% |
о | 193 | 4.7% |
р | 170 | 4.2% |
к | 166 | 4.1% |
в | 141 | 3.4% |
Other values (42) | 1490 |
Latin Ext Additional
Value | Count | Frequency (%) |
ạ | 10 | |
ọ | 10 | |
ố | 5 | |
ệ | 5 | |
ộ | 3 | 7.3% |
ứ | 3 | 7.3% |
ế | 2 | 4.9% |
ử | 1 | 2.4% |
ờ | 1 | 2.4% |
ồ | 1 | 2.4% |
None
Value | Count | Frequency (%) |
Đ | 10 | |
à | 4 | 13.8% |
ư | 3 | 10.3% |
ê | 3 | 10.3% |
ô | 2 | 6.9% |
á | 2 | 6.9% |
Á | 2 | 6.9% |
í | 1 | 3.4% |
ò | 1 | 3.4% |
ơ | 1 | 3.4% |
CJK
Value | Count | Frequency (%) |
学 | 8 | 7.2% |
院 | 6 | 5.4% |
大 | 5 | 4.5% |
青 | 4 | 3.6% |
为 | 3 | 2.7% |
现 | 3 | 2.7% |
中 | 3 | 2.7% |
岛 | 3 | 2.7% |
科 | 3 | 2.7% |
技 | 3 | 2.7% |
Other values (54) | 70 |
Katakana
Value | Count | Frequency (%) |
ラ | 1 | |
ジ | 1 | |
オ | 1 |
SORT_ORGANIZATION_ORI
Text
Distinct | 937 |
---|---|
Distinct (%) | 43.0% |
Missing | 11 |
Missing (%) | 0.5% |
Memory size | 17.2 KiB |
Length
Max length | 100 |
---|---|
Median length | 78 |
Mean length | 17.649541 |
Min length | 1 |
Characters and Unicode
Total characters | 38476 |
---|---|
Distinct characters | 593 |
Distinct categories | 9 |
Distinct scripts | 11 |
Distinct blocks | 13 |
Unique
Unique | 632 |
---|---|
Unique (%) | 29.0% |
Sample
1st row | GMM GRAMMY LTD |
---|---|
2nd row | KANAZAWA UNIVERSITY |
3rd row | KANAZAWA UNIVERSITY |
4th row | TOHOKU UNIVERSITY |
5th row | KANAZAWA UNIVERSITY |
Value | Count | Frequency (%) |
university | 852 | 16.3% |
of | 431 | 8.3% |
national | 90 | 1.7% |
연변대학교 | 64 | 1.2% |
studies | 56 | 1.1% |
연변대학 | 52 | 1.0% |
им | 47 | 0.9% |
state | 46 | 0.9% |
중앙민족대학교 | 46 | 0.9% |
университет | 45 | 0.9% |
Other values (1201) | 3495 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 3059 | 8.0% |
I | 3041 | 7.9% |
N | 2423 | 6.3% |
E | 2222 | 5.8% |
S | 1943 | 5.0% |
T | 1829 | 4.8% |
A | 1823 | 4.7% |
O | 1642 | 4.3% |
R | 1563 | 4.1% |
U | 1468 | 3.8% |
Other values (583) | 17463 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 29454 | |
Other Letter | 5879 | 15.3% |
Space Separator | 3059 | 8.0% |
Decimal Number | 36 | 0.1% |
Nonspacing Mark | 31 | 0.1% |
Spacing Mark | 9 | < 0.1% |
Initial Punctuation | 4 | < 0.1% |
Final Punctuation | 3 | < 0.1% |
Open Punctuation | 1 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
학 | 640 | 10.9% |
대 | 588 | 10.0% |
교 | 418 | 7.1% |
大 | 273 | 4.6% |
学 | 256 | 4.4% |
연 | 155 | 2.6% |
국 | 137 | 2.3% |
변 | 118 | 2.0% |
중 | 116 | 2.0% |
앙 | 96 | 1.6% |
Other values (473) | 3082 |
Uppercase Letter
Value | Count | Frequency (%) |
I | 3041 | 10.3% |
N | 2423 | 8.2% |
E | 2222 | 7.5% |
S | 1943 | 6.6% |
T | 1829 | 6.2% |
A | 1823 | 6.2% |
O | 1642 | 5.6% |
R | 1563 | 5.3% |
U | 1468 | 5.0% |
Y | 1144 | 3.9% |
Other values (78) | 10356 |
Nonspacing Mark
Value | Count | Frequency (%) |
្ | 9 | |
ិ | 6 | |
́ | 4 | |
ំ | 3 | 9.7% |
ូ | 3 | 9.7% |
័ | 3 | 9.7% |
ิ | 1 | 3.2% |
ั | 1 | 3.2% |
ู | 1 | 3.2% |
Decimal Number
Value | Count | Frequency (%) |
1 | 18 | |
2 | 10 | |
7 | 4 | 11.1% |
3 | 3 | 8.3% |
5 | 1 | 2.8% |
Spacing Mark
Value | Count | Frequency (%) |
ា | 6 | |
េ | 3 |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 3 | |
‘ | 1 | 25.0% |
Final Punctuation
Value | Count | Frequency (%) |
” | 2 | |
’ | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 3059 |
Open Punctuation
Value | Count | Frequency (%) |
„ | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 24959 | |
Cyrillic | 4494 | 11.7% |
Hangul | 4083 | 10.6% |
Common | 3103 | 8.1% |
Han | 1685 | 4.4% |
Khmer | 81 | 0.2% |
Arabic | 30 | 0.1% |
Katakana | 20 | 0.1% |
Thai | 16 | < 0.1% |
Inherited | 4 | < 0.1% |
Most frequent character per script
Han
Value | Count | Frequency (%) |
大 | 273 | 16.2% |
学 | 256 | 15.2% |
延 | 62 | 3.7% |
边 | 58 | 3.4% |
院 | 51 | 3.0% |
国 | 42 | 2.5% |
學 | 38 | 2.3% |
中 | 36 | 2.1% |
科 | 28 | 1.7% |
东 | 28 | 1.7% |
Other values (228) | 813 |
Hangul
Value | Count | Frequency (%) |
학 | 640 | 15.7% |
대 | 588 | 14.4% |
교 | 418 | 10.2% |
연 | 155 | 3.8% |
국 | 137 | 3.4% |
변 | 118 | 2.9% |
중 | 116 | 2.8% |
앙 | 96 | 2.4% |
원 | 90 | 2.2% |
사 | 83 | 2.0% |
Other values (192) | 1642 |
Latin
Value | Count | Frequency (%) |
I | 3041 | |
N | 2423 | |
E | 2222 | 8.9% |
S | 1943 | 7.8% |
T | 1829 | 7.3% |
A | 1823 | 7.3% |
O | 1642 | 6.6% |
R | 1563 | 6.3% |
U | 1468 | 5.9% |
Y | 1144 | 4.6% |
Other values (47) | 5861 |
Cyrillic
Value | Count | Frequency (%) |
И | 530 | |
А | 461 | 10.3% |
Н | 366 | 8.1% |
Т | 362 | 8.1% |
С | 333 | 7.4% |
Е | 296 | 6.6% |
К | 251 | 5.6% |
О | 213 | 4.7% |
Р | 213 | 4.7% |
У | 189 | 4.2% |
Other values (20) | 1280 |
Khmer
Value | Count | Frequency (%) |
្ | 9 | 11.1% |
ន | 6 | 7.4% |
ល | 6 | 7.4% |
ិ | 6 | 7.4% |
ទ | 6 | 7.4% |
យ | 6 | 7.4% |
ភ | 6 | 7.4% |
ា | 6 | 7.4% |
ំ | 3 | 3.7% |
ព | 3 | 3.7% |
Other values (8) | 24 |
Thai
Value | Count | Frequency (%) |
า | 3 | |
ย | 2 | |
ว | 1 | 6.2% |
ิ | 1 | 6.2% |
ท | 1 | 6.2% |
ล | 1 | 6.2% |
ั | 1 | 6.2% |
บ | 1 | 6.2% |
ู | 1 | 6.2% |
ร | 1 | 6.2% |
Other values (3) | 3 |
Common
Value | Count | Frequency (%) |
(space) | 3059 | |
1 | 18 | 0.6% |
2 | 10 | 0.3% |
7 | 4 | 0.1% |
3 | 3 | 0.1% |
“ | 3 | 0.1% |
” | 2 | 0.1% |
‘ | 1 | < 0.1% |
5 | 1 | < 0.1% |
„ | 1 | < 0.1% |
Arabic
Value | Count | Frequency (%) |
ا | 7 | |
ل | 4 | |
ة | 4 | |
ج | 2 | 6.7% |
م | 2 | 6.7% |
ع | 2 | 6.7% |
ر | 2 | 6.7% |
د | 2 | 6.7% |
ي | 2 | 6.7% |
ن | 2 | 6.7% |
Katakana
Value | Count | Frequency (%) |
ソ | 4 | |
ウ | 4 | |
ル | 4 | |
ジ | 1 | 5.0% |
カ | 1 | 5.0% |
ト | 1 | 5.0% |
ク | 1 | 5.0% |
ッ | 1 | 5.0% |
リ | 1 | 5.0% |
ラ | 1 | 5.0% |
Inherited
Value | Count | Frequency (%) |
́ | 4 |
Greek
Value | Count | Frequency (%) |
Γ | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27912 | |
Cyrillic | 4494 | 11.7% |
Hangul | 4083 | 10.6% |
CJK | 1683 | 4.4% |
None | 101 | 0.3% |
Khmer | 81 | 0.2% |
Latin Ext Additional | 42 | 0.1% |
Arabic | 30 | 0.1% |
Katakana | 20 | 0.1% |
Thai | 16 | < 0.1% |
Other values (3) | 14 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 3059 | |
I | 3041 | |
N | 2423 | 8.7% |
E | 2222 | 8.0% |
S | 1943 | 7.0% |
T | 1829 | 6.6% |
A | 1823 | 6.5% |
O | 1642 | 5.9% |
R | 1563 | 5.6% |
U | 1468 | 5.3% |
Other values (22) | 6899 |
Hangul
Value | Count | Frequency (%) |
학 | 640 | 15.7% |
대 | 588 | 14.4% |
교 | 418 | 10.2% |
연 | 155 | 3.8% |
국 | 137 | 3.4% |
변 | 118 | 2.9% |
중 | 116 | 2.8% |
앙 | 96 | 2.4% |
원 | 90 | 2.2% |
사 | 83 | 2.0% |
Other values (192) | 1642 |
Cyrillic
Value | Count | Frequency (%) |
И | 530 | 11.8% |
А | 461 | 10.3% |
Н | 366 | 8.1% |
Т | 362 | 8.1% |
С | 333 | 7.4% |
Е | 296 | 6.6% |
К | 251 | 5.6% |
О | 213 | 4.7% |
Р | 213 | 4.7% |
У | 189 | 4.2% |
Other values (20) | 1280 | 28.5% |
CJK
Value | Count | Frequency (%) |
大 | 273 | 16.2% |
学 | 256 | 15.2% |
延 | 62 | 3.7% |
边 | 58 | 3.4% |
院 | 51 | 3.0% |
国 | 42 | 2.5% |
學 | 38 | 2.3% |
中 | 36 | 2.1% |
科 | 28 | 1.7% |
东 | 28 | 1.7% |
Other values (226) | 811 | 48.2% |
None
Value | Count | Frequency (%) |
Ö | 12 | 11.9% |
Ä | 11 | 10.9% |
Đ | 11 | 10.9% |
Á | 11 | 10.9% |
É | 10 | 9.9% |
À | 9 | 8.9% |
Ü | 7 | 6.9% |
Ó | 5 | 5.0% |
Ư | 3 | 3.0% |
Ž | 3 | 3.0% |
Other values (12) | 19 | 18.8% |
Latin Ext Additional
Value | Count | Frequency (%) |
Ạ | 10 | 23.8% |
Ọ | 10 | 23.8% |
Ố | 5 | 11.9% |
Ệ | 5 | 11.9% |
Ộ | 4 | 9.5% |
Ứ | 3 | 7.1% |
Ế | 2 | 4.8% |
Ờ | 1 | 2.4% |
Ồ | 1 | 2.4% |
Ử | 1 | 2.4% |
Khmer
Value | Count | Frequency (%) |
្ | 9 | 11.1% |
ន | 6 | 7.4% |
ល | 6 | 7.4% |
ិ | 6 | 7.4% |
ទ | 6 | 7.4% |
យ | 6 | 7.4% |
ភ | 6 | 7.4% |
ា | 6 | 7.4% |
ំ | 3 | 3.7% |
ព | 3 | 3.7% |
Other values (8) | 24 | 29.6% |
Arabic
Value | Count | Frequency (%) |
ا | 7 | 23.3% |
ل | 4 | 13.3% |
ة | 4 | 13.3% |
ج | 2 | 6.7% |
م | 2 | 6.7% |
ع | 2 | 6.7% |
ر | 2 | 6.7% |
د | 2 | 6.7% |
ي | 2 | 6.7% |
ن | 2 | 6.7% |
Katakana
Value | Count | Frequency (%) |
ソ | 4 | 20.0% |
ウ | 4 | 20.0% |
ル | 4 | 20.0% |
ジ | 1 | 5.0% |
カ | 1 | 5.0% |
ト | 1 | 5.0% |
ク | 1 | 5.0% |
ッ | 1 | 5.0% |
リ | 1 | 5.0% |
ラ | 1 | 5.0% |
Diacriticals
Value | Count | Frequency (%) |
́ | 4 | 100.0% |
Punctuation
Value | Count | Frequency (%) |
“ | 3 | 37.5% |
” | 2 | 25.0% |
‘ | 1 | 12.5% |
„ | 1 | 12.5% |
’ | 1 | 12.5% |
Thai
Value | Count | Frequency (%) |
า | 3 | 18.8% |
ย | 2 | 12.5% |
ว | 1 | 6.2% |
ิ | 1 | 6.2% |
ท | 1 | 6.2% |
ล | 1 | 6.2% |
ั | 1 | 6.2% |
บ | 1 | 6.2% |
ู | 1 | 6.2% |
ร | 1 | 6.2% |
Other values (3) | 3 | 18.8% |
CJK Compat Ideographs
Value | Count | Frequency (%) |
女 | 1 | 50.0% |
林 | 1 | 50.0% |
SORT_ORGANIZATION_KOR
Text
MISSING
 
Distinct | 513 |
---|---|
Distinct (%) | 26.4% |
Missing | 250 |
Missing (%) | 11.4% |
Memory size | 17.2 KiB |
Value | Count | Frequency (%) |
연변대학교 | 146 | 6.9% |
중앙민족대학교 | 71 | 3.4% |
서울대학교 | 55 | 2.6% |
조선사회과학원 | 39 | 1.8% |
한국학중앙연구원 | 36 | 1.7% |
고려대학교 | 33 | 1.6% |
푸단대학교 | 32 | 1.5% |
하와이대학교 | 26 | 1.2% |
산둥대학교 | 26 | 1.2% |
중앙대학교 | 26 | 1.2% |
Other values (540) | 1623 | 76.8% |
Most occurring characters
Value | Count | Frequency (%) |
학 | 1920 | 13.7% |
대 | 1779 | 12.7% |
교 | 1739 | 12.4% |
국 | 300 | 2.1% |
스 | 271 | 1.9% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (379) | 6959 | 49.8% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 13775 | 98.6% |
Space Separator | 172 | 1.2% |
Decimal Number | 24 | 0.2% |
Uppercase Letter | 3 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 | 49.1% |
Decimal Number
Value | Count | Frequency (%) |
2 | 9 | 37.5% |
5 | 6 | 25.0% |
7 | 4 | 16.7% |
3 | 3 | 12.5% |
0 | 1 | 4.2% |
1 | 1 | 4.2% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 2 | 66.7% |
G | 1 | 33.3% |
Space Separator
Value | Count | Frequency (%) |
(space) | 172 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 13775 | 98.6% |
Common | 196 | 1.4% |
Latin | 3 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 | 49.1% |
Common
Value | Count | Frequency (%) |
(space) | 172 | 87.8% |
2 | 9 | 4.6% |
5 | 6 | 3.1% |
7 | 4 | 2.0% |
3 | 3 | 1.5% |
0 | 1 | 0.5% |
1 | 1 | 0.5% |
Latin
Value | Count | Frequency (%) |
M | 2 | 66.7% |
G | 1 | 33.3% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 13775 | 98.6% |
ASCII | 199 | 1.4% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
학 | 1920 | 13.9% |
대 | 1779 | 12.9% |
교 | 1739 | 12.6% |
국 | 300 | 2.2% |
스 | 271 | 2.0% |
연 | 268 | 1.9% |
사 | 198 | 1.4% |
이 | 183 | 1.3% |
아 | 182 | 1.3% |
중 | 175 | 1.3% |
Other values (370) | 6760 | 49.1% |
ASCII
Value | Count | Frequency (%) |
(space) | 172 | 86.4% |
2 | 9 | 4.5% |
5 | 6 | 3.0% |
7 | 4 | 2.0% |
3 | 3 | 1.5% |
M | 2 | 1.0% |
0 | 1 | 0.5% |
1 | 1 | 0.5% |
G | 1 | 0.5% |
SORT_ORGANIZATION_ENG
Text
MISSING
 
Distinct | 542 |
---|---|
Distinct (%) | 27.7% |
Missing | 235 |
Missing (%) | 10.7% |
Memory size | 17.2 KiB |
Length
Max length | 121 |
---|---|
Median length | 58 |
Mean length | 26.322597 |
Min length | 6 |
Characters and Unicode
Total characters | 51487 |
---|---|
Distinct characters | 43 |
Distinct categories | 5 |
Distinct scripts | 3 |
Distinct blocks | 4 |
Unique
Unique | 260 |
---|---|
Unique (%) | 13.3% |
Sample
1st row | GMM GRAMMY LTD |
---|---|
2nd row | KANAZAWA UNIVERSITY |
3rd row | KANAZAWA UNIVERSITY |
4th row | TOHOKU UNIVERSITY |
5th row | KANAZAWA UNIVERSITY |
Value | Count | Frequency (%) |
university | 1688 | 25.7% |
of | 692 | 10.5% |
national | 169 | 2.6% |
yanbian | 159 | 2.4% |
studies | 126 | 1.9% |
and | 87 | 1.3% |
for | 86 | 1.3% |
central | 75 | 1.1% |
nationalities | 73 | 1.1% |
academy | 72 | 1.1% |
Other values (673) | 3336 | 50.8% |
Most occurring characters
Value | Count | Frequency (%) |
I | 5883 | 11.4% |
N | 5073 | 9.9% |
(space) | 4621 | 9.0% |
E | 4141 | 8.0% |
A | 3643 | 7.1% |
S | 3541 | 6.9% |
T | 3391 | 6.6% |
O | 2909 | 5.6% |
R | 2875 | 5.6% |
U | 2794 | 5.4% |
Other values (33) | 12616 | 24.5% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 46860 | 91.0% |
Space Separator | 4622 | 9.0% |
Decimal Number | 2 | < 0.1% |
Final Punctuation | 2 | < 0.1% |
Initial Punctuation | 1 | < 0.1% |
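The category counts in this table can be approximated with Python's standard `unicodedata` module. The sketch below is only an illustration, not the profiler's own code: it counts Unicode general categories over the five sample rows shown for this variable and formats shares the way the report appears to (one decimal, with `< 0.1%` for very small non-zero shares). The helper names `category_counts` and `fmt_percent` are hypothetical, and the rounding rule is inferred from the table rather than confirmed. Note that `unicodedata.category` returns two-letter codes (`Lu`, `Zs`) where the report prints long names (Uppercase Letter, Space Separator).

```python
import unicodedata
from collections import Counter

# Sample rows copied from the "Sample" table of this variable.
rows = [
    "GMM GRAMMY LTD",
    "KANAZAWA UNIVERSITY",
    "KANAZAWA UNIVERSITY",
    "TOHOKU UNIVERSITY",
    "KANAZAWA UNIVERSITY",
]


def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Zs')."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


def fmt_percent(part, total):
    """One-decimal percentage; tiny non-zero shares become '< 0.1%' (assumed rule)."""
    share = part / total
    if 0 < share < 0.0005:
        return "< 0.1%"
    return f"{share * 100:.1f}%"


counts = category_counts(rows)
total = sum(counts.values())
for code, n in counts.most_common():
    print(code, n, fmt_percent(n, total))
```

Run over the full column instead of five rows, the same loop should yield counts comparable to the table above, assuming the profiler counts every character of every non-missing value.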
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
I | 5883 | 12.6% |
N | 5073 | 10.8% |
E | 4141 | 8.8% |
A | 3643 | 7.8% |
S | 3541 | 7.6% |
T | 3391 | 7.2% |
O | 2909 | 6.2% |
R | 2875 | 6.1% |
U | 2794 | 6.0% |
Y | 2383 | 5.1% |
Other values (27) | 10227 | 21.8% |
Space Separator
Value | Count | Frequency (%) |
(space) | 4621 | > 99.9% |
(non-ASCII space) | 1 | < 0.1% |
Final Punctuation
Value | Count | Frequency (%) |
” | 1 | 50.0% |
’ | 1 | 50.0% |
Decimal Number
Value | Count | Frequency (%) |
3 | 2 | 100.0% |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 46859 | 91.0% |
Common | 4627 | 9.0% |
Cyrillic | 1 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
I | 5883 | 12.6% |
N | 5073 | 10.8% |
E | 4141 | 8.8% |
A | 3643 | 7.8% |
S | 3541 | 7.6% |
T | 3391 | 7.2% |
O | 2909 | 6.2% |
R | 2875 | 6.1% |
U | 2794 | 6.0% |
Y | 2383 | 5.1% |
Other values (26) | 10226 | 21.8% |
Common
Value | Count | Frequency (%) |
(space) | 4621 | 99.9% |
3 | 2 | < 0.1% |
(non-ASCII space) | 1 | < 0.1% |
“ | 1 | < 0.1% |
” | 1 | < 0.1% |
’ | 1 | < 0.1% |
Cyrillic
Value | Count | Frequency (%) |
К | 1 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 51417 | 99.9% |
None | 66 | 0.1% |
Punctuation | 3 | < 0.1% |
Cyrillic | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
I | 5883 | 11.4% |
N | 5073 | 9.9% |
(space) | 4621 | 9.0% |
E | 4141 | 8.1% |
A | 3643 | 7.1% |
S | 3541 | 6.9% |
T | 3391 | 6.6% |
O | 2909 | 5.7% |
R | 2875 | 5.6% |
U | 2794 | 5.4% |
Other values (18) | 12546 | 24.4% |
None
Value | Count | Frequency (%) |
Ö | 18 | 27.3% |
É | 17 | 25.8% |
Á | 10 | 15.2% |
Ä | 10 | 15.2% |
Ş | 3 | 4.5% |
Ó | 2 | 3.0% |
Ü | 2 | 3.0% |
(non-ASCII space) | 1 | 1.5% |
Ç | 1 | 1.5% |
Ě | 1 | 1.5% |
Cyrillic
Value | Count | Frequency (%) |
К | 1 | 100.0% |
Punctuation
Value | Count | Frequency (%) |
“ | 1 | 33.3% |
” | 1 | 33.3% |
’ | 1 | 33.3% |
SORT_ORGANIZATION_ETC
Text
MISSING
 
Distinct | 106 |
---|---|
Distinct (%) | 71.6% |
Missing | 2043 |
Missing (%) | 93.2% |
Memory size | 17.2 KiB |
Length
Max length | 100 |
---|---|
Median length | 62 |
Mean length | 33.925676 |
Min length | 4 |
Characters and Unicode
Total characters | 5021 |
---|---|
Distinct characters | 144 |
Distinct categories | 4 |
Distinct scripts | 5 |
Distinct blocks | 6 |
Unique
Unique | 87 |
---|---|
Unique (%) | 58.8% |
Sample
1st row | 民族出版社 |
---|---|
2nd row | САХАЛИНСКИЙ ГОСУДАРСТВЕННЫЙ УНИВЕРСИТЕТ |
3rd row | САХАЛИНСКИЙ ГОСУДАРСТВЕННЫЙ УНИВЕРСИТЕТ |
4th row | 上海市档案馆 |
5th row | 中央研究院 |
Value | Count | Frequency (%) |
им | 44 | 7.1% |
университет | 37 | 6.0% |
государственный | 34 | 5.5% |
институт | 29 | 4.7% |
и | 22 | 3.5% |
арабаева | 16 | 2.6% |
ташкентский | 15 | 2.4% |
казахский | 11 | 1.8% |
кыргызский | 10 | 1.6% |
кгу | 10 | 1.6% |
Other values (197) | 393 | 63.3% |
Most occurring characters
Value | Count | Frequency (%) |
И | 479 | 9.5% |
(space) | 477 | 9.5% |
А | 425 | 8.5% |
Н | 331 | 6.6% |
Т | 328 | 6.5% |
С | 293 | 5.8% |
Е | 266 | 5.3% |
К | 226 | 4.5% |
О | 200 | 4.0% |
Р | 193 | 3.8% |
Other values (134) | 1803 | 35.9% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 4428 | 88.2% |
Space Separator | 477 | 9.5% |
Other Letter | 114 | 2.3% |
Decimal Number | 2 | < 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
И | 479 | 10.8% |
А | 425 | 9.6% |
Н | 331 | 7.5% |
Т | 328 | 7.4% |
С | 293 | 6.6% |
Е | 266 | 6.0% |
К | 226 | 5.1% |
О | 200 | 4.5% |
Р | 193 | 4.4% |
У | 172 | 3.9% |
Other values (64) | 1515 | 34.2% |
Other Letter
Value | Count | Frequency (%) |
学 | 8 | 7.0% |
院 | 6 | 5.3% |
大 | 5 | 4.4% |
青 | 4 | 3.5% |
为 | 3 | 2.6% |
岛 | 3 | 2.6% |
科 | 3 | 2.6% |
技 | 3 | 2.6% |
现 | 3 | 2.6% |
中 | 3 | 2.6% |
Other values (57) | 73 | 64.0% |
Decimal Number
Value | Count | Frequency (%) |
2 | 1 | 50.0% |
1 | 1 | 50.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 477 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Cyrillic | 4088 | 81.4% |
Common | 479 | 9.5% |
Latin | 340 | 6.8% |
Han | 111 | 2.2% |
Katakana | 3 | 0.1% |
Most frequent character per script
Han
Value | Count | Frequency (%) |
学 | 8 | 7.2% |
院 | 6 | 5.4% |
大 | 5 | 4.5% |
青 | 4 | 3.6% |
为 | 3 | 2.7% |
岛 | 3 | 2.7% |
科 | 3 | 2.7% |
技 | 3 | 2.7% |
现 | 3 | 2.7% |
中 | 3 | 2.7% |
Other values (54) | 70 | 63.1% |
Latin
Value | Count | Frequency (%) |
I | 32 | 9.4% |
N | 31 | 9.1% |
H | 29 | 8.5% |
A | 26 | 7.6% |
C | 22 | 6.5% |
T | 17 | 5.0% |
G | 16 | 4.7% |
U | 13 | 3.8% |
Đ | 10 | 2.9% |
Ạ | 10 | 2.9% |
Other values (34) | 134 | 39.4% |
Cyrillic
Value | Count | Frequency (%) |
И | 479 | 11.7% |
А | 425 | 10.4% |
Н | 331 | 8.1% |
Т | 328 | 8.0% |
С | 293 | 7.2% |
Е | 266 | 6.5% |
К | 226 | 5.5% |
О | 200 | 4.9% |
Р | 193 | 4.7% |
У | 172 | 4.2% |
Other values (20) | 1175 | 28.7% |
Common
Value | Count | Frequency (%) |
(space) | 477 | 99.6% |
2 | 1 | 0.2% |
1 | 1 | 0.2% |
Katakana
Value | Count | Frequency (%) |
ラ | 1 | 33.3% |
ジ | 1 | 33.3% |
オ | 1 | 33.3% |
Most occurring blocks
Value | Count | Frequency (%) |
Cyrillic | 4088 | 81.4% |
ASCII | 749 | 14.9% |
CJK | 111 | 2.2% |
Latin Ext Additional | 41 | 0.8% |
None | 29 | 0.6% |
Katakana | 3 | 0.1% |
Most frequent character per block
Cyrillic
Value | Count | Frequency (%) |
И | 479 | 11.7% |
А | 425 | 10.4% |
Н | 331 | 8.1% |
Т | 328 | 8.0% |
С | 293 | 7.2% |
Е | 266 | 6.5% |
К | 226 | 5.5% |
О | 200 | 4.9% |
Р | 193 | 4.7% |
У | 172 | 4.2% |
Other values (20) | 1175 | 28.7% |
ASCII
Value | Count | Frequency (%) |
(space) | 477 | 63.7% |
I | 32 | 4.3% |
N | 31 | 4.1% |
H | 29 | 3.9% |
A | 26 | 3.5% |
C | 22 | 2.9% |
T | 17 | 2.3% |
G | 16 | 2.1% |
U | 13 | 1.7% |
K | 10 | 1.3% |
Other values (18) | 76 | 10.1% |
None
Value | Count | Frequency (%) |
Đ | 10 | 34.5% |
Á | 4 | 13.8% |
À | 4 | 13.8% |
Ê | 3 | 10.3% |
Ư | 3 | 10.3% |
Ô | 2 | 6.9% |
Ơ | 1 | 3.4% |
Ò | 1 | 3.4% |
Í | 1 | 3.4% |
Latin Ext Additional
Value | Count | Frequency (%) |
Ạ | 10 | 24.4% |
Ọ | 10 | 24.4% |
Ệ | 5 | 12.2% |
Ố | 5 | 12.2% |
Ứ | 3 | 7.3% |
Ộ | 3 | 7.3% |
Ế | 2 | 4.9% |
Ờ | 1 | 2.4% |
Ử | 1 | 2.4% |
Ồ | 1 | 2.4% |
CJK
Value | Count | Frequency (%) |
学 | 8 | 7.2% |
院 | 6 | 5.4% |
大 | 5 | 4.5% |
青 | 4 | 3.6% |
为 | 3 | 2.7% |
岛 | 3 | 2.7% |
科 | 3 | 2.7% |
技 | 3 | 2.7% |
现 | 3 | 2.7% |
中 | 3 | 2.7% |
Other values (54) | 70 | 63.1% |
Katakana
Value | Count | Frequency (%) |
ラ | 1 | 33.3% |
ジ | 1 | 33.3% |
オ | 1 | 33.3% |
GANADA_ORGANIZATION_ORI
Categorical
HIGH CORRELATION
 
Distinct | 41 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.2 KiB |
ETC | 452 |
---|---|
U | 350 |
아 | 151 |
자 | 140 |
S | 116 |
Other values (36) | 982 |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 1.4276586 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | G |
---|---|
2nd row | K |
3rd row | K |
4th row | T |
5th row | K |
Common Values
Value | Count | Frequency (%) |
ETC | 452 | 20.6% |
U | 350 | 16.0% |
아 | 151 | 6.9% |
자 | 140 | 6.4% |
S | 116 | 5.3% |
사 | 101 | 4.6% |
C | 80 | 3.7% |
K | 80 | 3.7% |
가 | 58 | 2.6% |
바 | 55 | 2.5% |
Other values (31) | 608 | 27.7% |
Length
Value | Count | Frequency (%) |
etc | 452 | 20.6% |
u | 350 | 16.0% |
아 | 151 | 6.9% |
자 | 140 | 6.4% |
s | 116 | 5.3% |
사 | 101 | 4.6% |
c | 80 | 3.7% |
k | 80 | 3.7% |
가 | 58 | 2.6% |
바 | 55 | 2.5% |
Other values (31) | 608 | 27.7% |
GANADA_ORGANIZATION_KOR
Categorical
HIGH CORRELATION
 
Distinct | 16 |
---|---|
Distinct (%) | 0.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.2 KiB |
아 | 415 |
---|---|
사 | 286 |
<NA> | 250 |
자 | 228 |
하 | 175 |
Other values (11) | 837 |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 1.3423094 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | G |
---|---|
2nd row | 가 |
3rd row | 가 |
4th row | 다 |
5th row | 가 |
Common Values
Value | Count | Frequency (%) |
아 | 415 | 18.9% |
사 | 286 | 13.1% |
<NA> | 250 | 11.4% |
자 | 228 | 10.4% |
하 | 175 | 8.0% |
가 | 154 | 7.0% |
바 | 144 | 6.6% |
다 | 118 | 5.4% |
파 | 80 | 3.7% |
나 | 76 | 3.5% |
Other values (6) | 265 | 12.1% |
Length
Value | Count | Frequency (%) |
아 | 415 | 18.9% |
사 | 286 | 13.1% |
na | 250 | 11.4% |
자 | 228 | 10.4% |
하 | 175 | 8.0% |
가 | 154 | 7.0% |
바 | 144 | 6.6% |
다 | 118 | 5.4% |
파 | 80 | 3.7% |
나 | 76 | 3.5% |
Other values (6) | 265 | 12.1% |
GANADA_ORGANIZATION_ENG
Categorical
HIGH CORRELATION
 
Distinct | 27 |
---|---|
Distinct (%) | 1.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.2 KiB |
U | 389 |
---|---|
S | 272 |
<NA> | 235 |
Y | 206 |
C | 198 |
Other values (22) | 891 |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 1.3245094 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | G |
---|---|
2nd row | K |
3rd row | K |
4th row | T |
5th row | K |
Common Values
Value | Count | Frequency (%) |
U | 389 | 17.8% |
S | 272 | 12.4% |
<NA> | 235 | 10.7% |
Y | 206 | 9.4% |
C | 198 | 9.0% |
K | 153 | 7.0% |
A | 96 | 4.4% |
H | 63 | 2.9% |
B | 57 | 2.6% |
T | 46 | 2.1% |
Other values (17) | 476 | 21.7% |
Length
Value | Count | Frequency (%) |
u | 389 | 17.8% |
s | 272 | 12.4% |
na | 235 | 10.7% |
y | 206 | 9.4% |
c | 198 | 9.0% |
k | 153 | 7.0% |
a | 96 | 4.4% |
h | 63 | 2.9% |
b | 57 | 2.6% |
t | 46 | 2.1% |
Other values (17) | 476 | 21.7% |
GANADA_ORGANIZATION_ETC
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 7 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 17.2 KiB |
<NA> | 2043 |
---|---|
ETC | 140 |
V | 3 |
T | 2 |
N | 1 |
Other values (2) | 2 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 3.9251483 |
Min length | 1 |
Unique
Unique | 3 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 2043 | 93.2% |
ETC | 140 | 6.4% |
V | 3 | 0.1% |
T | 2 | 0.1% |
N | 1 | < 0.1% |
Q | 1 | < 0.1% |
A | 1 | < 0.1% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
na | 2043 | 93.2% |
etc | 140 | 6.4% |
v | 3 | 0.1% |
t | 2 | 0.1% |
n | 1 | < 0.1% |
q | 1 | < 0.1% |
a | 1 | < 0.1% |
 | ORGANIZATION_ID | GANADA_ORGANIZATION_ORI | GANADA_ORGANIZATION_KOR | GANADA_ORGANIZATION_ENG | GANADA_ORGANIZATION_ETC |
---|---|---|---|---|---|
ORGANIZATION_ID | 1.000 | 0.874 | 0.912 | 0.849 | 0.000 |
GANADA_ORGANIZATION_ORI | 0.874 | 1.000 | 0.935 | 0.984 | 1.000 |
GANADA_ORGANIZATION_KOR | 0.912 | 0.935 | 1.000 | 0.927 | 0.302 |
GANADA_ORGANIZATION_ENG | 0.849 | 0.984 | 0.927 | 1.000 | 0.193 |
GANADA_ORGANIZATION_ETC | 0.000 | 1.000 | 0.302 | 0.193 | 1.000 |
 | GANADA_ORGANIZATION_KOR | GANADA_ORGANIZATION_ORI | GANADA_ORGANIZATION_ENG | GANADA_ORGANIZATION_ETC |
---|---|---|---|---|
GANADA_ORGANIZATION_KOR | 1.000 | 0.604 | 0.610 | 0.236 |
GANADA_ORGANIZATION_ORI | 0.604 | 1.000 | 0.716 | 0.982 |
GANADA_ORGANIZATION_ENG | 0.610 | 0.716 | 1.000 | 0.000 |
GANADA_ORGANIZATION_ETC | 0.236 | 0.982 | 0.000 | 1.000 |
 | ORGANIZATION_ID | GANADA_ORGANIZATION_ORI | GANADA_ORGANIZATION_KOR | GANADA_ORGANIZATION_ENG | GANADA_ORGANIZATION_ETC |
---|---|---|---|---|---|
ORGANIZATION_ID | 1.000 | 0.494 | 0.688 | 0.511 | 0.000 |
GANADA_ORGANIZATION_ORI | 0.494 | 1.000 | 0.604 | 0.716 | 0.982 |
GANADA_ORGANIZATION_KOR | 0.688 | 0.604 | 1.000 | 0.610 | 0.236 |
GANADA_ORGANIZATION_ENG | 0.511 | 0.716 | 0.610 | 1.000 | 0.000 |
GANADA_ORGANIZATION_ETC | 0.000 | 0.982 | 0.236 | 0.000 | 1.000 |
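The exported matrices do not name the association measure behind each value. For pairs of categorical columns, a common choice is Cramér's V, which is derived from the chi-squared statistic of the contingency table. The sketch below computes the plain, not bias-corrected, form using only the standard library. It is an illustration of the statistic, not a claim about how the figures above were produced. The function name `cramers_v` is hypothetical, and the input lists are the first ten GANADA_ORGANIZATION_ORI and GANADA_ORGANIZATION_ENG values from the sample rows of this report.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain (not bias-corrected) Cramér's V for two categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    # Chi-squared: sum over every cell of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# First ten rows of the dataset as shown in this report.
ori = ["G", "K", "K", "T", "K", "U", "ETC", "ETC", "가", "카"]
eng = ["G", "K", "K", "T", "K", "U", "G", "G", "C", "C"]
print(round(cramers_v(ori, eng), 3))
```

With only ten rows the result is not comparable to the matrix values, which are based on all 2191 observations; a bias-corrected variant would also give smaller values on sparse tables like these.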
 | ORGANIZATION_ID | CATALOG_ID | ORGANIZATION_ORI | ORGANIZATION_KOR | ORGANIZATION_ENG | ORGANIZATION_ETC | SORT_ORGANIZATION_ORI | SORT_ORGANIZATION_KOR | SORT_ORGANIZATION_ENG | SORT_ORGANIZATION_ETC | GANADA_ORGANIZATION_ORI | GANADA_ORGANIZATION_KOR | GANADA_ORGANIZATION_ENG | GANADA_ORGANIZATION_ETC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6154 | 08C19_0019 | GMM Grammy ltd | GMM그래미사 | GMM Grammy ltd | <NA> | GMM GRAMMY LTD | GMM그래미사 | GMM GRAMMY LTD | <NA> | G | G | G | <NA> |
1 | 6155 | 08C09_0004 | Kanazawa University | 가나자와대학교 | Kanazawa University | <NA> | KANAZAWA UNIVERSITY | 가나자와대학교 | KANAZAWA UNIVERSITY | <NA> | K | 가 | K | <NA> |
2 | 6156 | 06C10_0028 | Kanazawa University | 가나자와대학교 | Kanazawa University | <NA> | KANAZAWA UNIVERSITY | 가나자와대학교 | KANAZAWA UNIVERSITY | <NA> | K | 가 | K | <NA> |
3 | 6157 | 06C10_0028 | Tohoku University | 도호쿠대학교 | Tohoku University | <NA> | TOHOKU UNIVERSITY | 도호쿠대학교 | TOHOKU UNIVERSITY | <NA> | T | 다 | T | <NA> |
4 | 6158 | 08C09_0024 | Kanazawa University | 가나자와대학교 | Kanazawa University | <NA> | KANAZAWA UNIVERSITY | 가나자와대학교 | KANAZAWA UNIVERSITY | <NA> | K | 가 | K | <NA> |
5 | 6159 | 08C09_0024 | University of Oregon | 오리건대학교 | University of Oregon | <NA> | UNIVERSITY OF OREGON | 오리건대학교 | UNIVERSITY OF OREGON | <NA> | U | 아 | U | <NA> |
6 | 6160 | 10C11_0005 | 学習院大学 | 가쿠슈인대학교 | Gakushuin University | <NA> | 学習院大学 | 가쿠슈인대학교 | GAKUSHUIN UNIVERSITY | <NA> | ETC | 가 | G | <NA> |
7 | 6161 | 10C11_0006 | 学習院大学 | 가쿠슈인대학교 | Gakushuin University | <NA> | 学習院大学 | 가쿠슈인대학교 | GAKUSHUIN UNIVERSITY | <NA> | ETC | 가 | G | <NA> |
8 | 6162 | 06C06_0028 | 가톨릭대학교 | 가톨릭대학교 | The Catholic University of Korea | <NA> | 가톨릭대학교 | 가톨릭대학교 | CATHOLIC UNIVERSITY OF KOREA | <NA> | 가 | 가 | C | <NA> |
9 | 6163 | 07C06_0002 | 카톨릭대학교 | 가톨릭대학교 | The Catholic University of Korea | <NA> | 카톨릭대학교 | 가톨릭대학교 | CATHOLIC UNIVERSITY OF KOREA | <NA> | 카 | 가 | C | <NA> |
 | ORGANIZATION_ID | CATALOG_ID | ORGANIZATION_ORI | ORGANIZATION_KOR | ORGANIZATION_ENG | ORGANIZATION_ETC | SORT_ORGANIZATION_ORI | SORT_ORGANIZATION_KOR | SORT_ORGANIZATION_ENG | SORT_ORGANIZATION_ETC | GANADA_ORGANIZATION_ORI | GANADA_ORGANIZATION_KOR | GANADA_ORGANIZATION_ENG | GANADA_ORGANIZATION_ETC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2181 | 8588 | 12R15_0001 | Purdue University | <NA> | Purdue University | <NA> | PURDUE UNIVERSITY | <NA> | PURDUE UNIVERSITY | <NA> | P | <NA> | P | <NA> |
2182 | 8590 | 11R61_0001 | 연변대학교 | 연변대학교 | Yanbian University | <NA> | 연변대학교 | 연변대학교 | YANBIAN UNIVERSITY | <NA> | 아 | 아 | Y | <NA> |
2183 | 8593 | 11R08 | 延边大学 | 연변대학교 | Yanbian University | <NA> | 延边大学 | 연변대학교 | YANBIAN UNIVERSITY | <NA> | ETC | 아 | Y | <NA> |
2184 | 8596 | 11R61_0003 | 연변대학교 | 연변대학교 | Yanbian University | <NA> | 연변대학교 | 연변대학교 | YANBIAN UNIVERSITY | <NA> | 아 | 아 | Y | <NA> |
2185 | 8598 | 12R15_0002 | Purdue University | <NA> | Purdue University | <NA> | PURDUE UNIVERSITY | <NA> | PURDUE UNIVERSITY | <NA> | P | <NA> | P | <NA> |
2186 | 8600 | 11R61 | 延边大学 | 연변대학교 | Yanbian University | <NA> | 延边大学 | 연변대학교 | YANBIAN UNIVERSITY | <NA> | ETC | 아 | Y | <NA> |
2187 | 8601 | 09R84 | University of Wisconsin-Madison | <NA> | University of Wisconsin-Madison | <NA> | UNIVERSITY OF WISCONSINMADISON | <NA> | UNIVERSITY OF WISCONSINMADISON | <NA> | U | <NA> | U | <NA> |
2188 | 8602 | 07R73 | Russian State University for the Humanities | <NA> | Russian State University for the Humanities | <NA> | RUSSIAN STATE UNIVERSITY FOR THE HUMANITIES | <NA> | RUSSIAN STATE UNIVERSITY FOR THE HUMANITIES | <NA> | R | <NA> | R | <NA> |
2189 | 8607 | 07R12 | 대련대학 한국학연구원 | 대련대학 한국학연구원 | <NA> | <NA> | 대련대학 한국학연구원 | 대련대학 한국학연구원 | <NA> | <NA> | 다 | 다 | <NA> | <NA> |
2190 | 8608 | 07R62 | 延边大学 | 연변대학교 | Yanbian University | <NA> | 延边大学 | 연변대학교 | YANBIAN UNIVERSITY | <NA> | ETC | 아 | Y | <NA> |
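In the rows above, each SORT_* value looks like a normalised form of the matching ORGANIZATION_* value: upper-cased, with the hyphen removed ("Wisconsin-Madison" becomes "WISCONSINMADISON") and a leading "The" dropped. The dataset does not publish its normalisation rule, so the sketch below is a guess that reproduces the visible rows only. The function name `sort_key` is hypothetical. The character tables for SORT_ORGANIZATION_ENG still contain a few quotation marks, so the real rule evidently strips less punctuation than this one does.

```python
import re


def sort_key(name):
    """Guess at the SORT_* normalisation: drop leading 'The', strip punctuation, upper-case."""
    if name is None:
        return None
    # Remove a leading article such as "The " (case-insensitive).
    text = re.sub(r"^\s*the\s+", "", name, flags=re.IGNORECASE)
    # Keep letters and digits of any script plus whitespace; drop the rest.
    text = re.sub(r"[^\w\s]", "", text).replace("_", "")
    # Collapse runs of whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text).strip()
    return text.upper()


for name in ["University of Wisconsin-Madison",
             "The Catholic University of Korea",
             "学習院大学"]:
    print(sort_key(name))
```

Names in Hangul or Han script pass through unchanged, since upper-casing has no effect on them, which matches the SORT_ORGANIZATION_ORI and SORT_ORGANIZATION_KOR values shown.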