Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B

Variable types

Text: 5
Categorical: 1

Dataset

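Description: Input records for Korean medicine research reports from the traditional medicine information portal OASIS (전통의학정보포털 오아시스). Each record consists of 온톨로지키워드제어번호 (ontology keyword control number), 키워드분류 (keyword category), 식별자 (identifier), 약재명 (medicinal material name), 한글명 (Korean name), and 온톨로지검색한문명 (ontology search Hanja name).
Author: 한국한의학연구원 (Korea Institute of Oriental Medicine)
URL: https://www.data.go.kr/data/15086079/fileData.do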

Alerts

온톨로지키워드제어번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 22:30:25.832877
Analysis finished: 2023-12-12 22:30:30.275585
Duration: 4.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
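
The reproduction details above can be checked and, in principle, re-run. The sketch below first recomputes the duration from the two timestamps, then defines `build_report`, a hypothetical helper that profiles a CSV export with ydata-profiling. The file paths, the report title, and the choice of `dtype=str` are assumptions; the report does not state the file's encoding, so an `encoding` argument to `read_csv` may also be needed.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2023-12-12 22:30:25.832877")
finished = datetime.fromisoformat("2023-12-12 22:30:30.275585")
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")


def build_report(csv_path: str, html_path: str) -> None:
    """Profile a CSV export and write an HTML report (hypothetical helper)."""
    # Imported lazily so the arithmetic above runs without these packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # dtype=str keeps values such as "9,491" as text instead of parsing them.
    df = pd.read_csv(csv_path, dtype=str)
    profile = ProfileReport(df, title="Profiling Report")  # title is illustrative
    profile.to_file(html_path)
```

The computed duration rounds to the 4.44 seconds shown above.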

Variables

온톨로지키워드제어번호
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.5743
Min length: 1

Characters and Unicode

Total characters: 55743
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
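
As a minimal sketch of how such character properties can be read, the snippet below tallies the Unicode general category of each character with Python's standard `unicodedata` module. The five input strings are the sample values of this variable; how ydata-profiling itself computes the counts is not shown in the report, so this is only an illustration of the idea.

```python
import unicodedata
from collections import Counter

# Sample values copied from the first variable of this report.
values = ["9,491", "12,922", "28,041", "356", "13,812"]

# Tally the general category of every character:
# "Nd" is Decimal Number, "Po" is Other Punctuation.
categories = Counter(unicodedata.category(ch) for v in values for ch in v)
print(categories)
```

Only two categories appear, matching the "Distinct categories" figure for this variable.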

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 9,491
2nd row: 12,922
3rd row: 28,041
4th row: 356
5th row: 13,812

Value | Count | Frequency (%)
9,491 | 1 | < 0.1%
26,727 | 1 | < 0.1%
9,461 | 1 | < 0.1%
8,161 | 1 | < 0.1%
20,776 | 1 | < 0.1%
15,667 | 1 | < 0.1%
12,168 | 1 | < 0.1%
9,222 | 1 | < 0.1%
1,056 | 1 | < 0.1%
22,705 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%
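
The values of this variable look like integers written with a thousands separator, which would explain why the column is typed as Text. Assuming that reading is correct, a small helper (the name `parse_control_number` is hypothetical) can convert them to numbers:

```python
def parse_control_number(text: str) -> int:
    """Convert a string such as "9,491" to the integer 9491."""
    # Assumes the comma is only ever a thousands separator.
    return int(text.replace(",", ""))


# Sample values copied from this variable.
samples = ["9,491", "12,922", "28,041", "356", "13,812"]
numbers = [parse_control_number(s) for s in samples]
print(numbers)
```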

Most occurring characters

Value | Count | Frequency (%)
, | 9626 | 17.3%
1 | 7532 | 13.5%
2 | 7095 | 12.7%
3 | 4107 | 7.4%
5 | 4080 | 7.3%
7 | 4024 | 7.2%
4 | 4021 | 7.2%
6 | 3975 | 7.1%
8 | 3899 | 7.0%
0 | 3706 | 6.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 46117 | 82.7%
Other Punctuation | 9626 | 17.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 7532 | 16.3%
2 | 7095 | 15.4%
3 | 4107 | 8.9%
5 | 4080 | 8.8%
7 | 4024 | 8.7%
4 | 4021 | 8.7%
6 | 3975 | 8.6%
8 | 3899 | 8.5%
0 | 3706 | 8.0%
9 | 3678 | 8.0%
Other Punctuation
Value | Count | Frequency (%)
, | 9626 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 55743 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
, | 9626 | 17.3%
1 | 7532 | 13.5%
2 | 7095 | 12.7%
3 | 4107 | 7.4%
5 | 4080 | 7.3%
7 | 4024 | 7.2%
4 | 4021 | 7.2%
6 | 3975 | 7.1%
8 | 3899 | 7.0%
0 | 3706 | 6.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 55743 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
, | 9626 | 17.3%
1 | 7532 | 13.5%
2 | 7095 | 12.7%
3 | 4107 | 7.4%
5 | 4080 | 7.3%
7 | 4024 | 7.2%
4 | 4021 | 7.2%
6 | 3975 | 7.1%
8 | 3899 | 7.0%
0 | 3706 | 6.6%

키워드분류
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Formula: 3885
Symptom: 3509
Medicinal_Material: 1051
Effect: 688
Disease: 647

Length

Max length: 18
Median length: 7
Mean length: 8.0873
Min length: 6
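
These length figures can be recomputed from the category counts listed in the Common Values table of this variable. The sketch below is a worked check using only the standard library; it assumes the length of a value is simply its number of characters.

```python
from statistics import mean, median

# Category counts copied from the Common Values table of this variable.
counts = {
    "Formula": 3885, "Symptom": 3509, "Medicinal_Material": 1051,
    "Effect": 688, "Disease": 647, "Pattern": 220,
}

# Expand to one string length per observation, then summarise.
lengths = [len(name) for name, n in counts.items() for _ in range(n)]
stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)
```

The result agrees with the reported maximum of 18, median of 7, mean of 8.0873, and minimum of 6.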

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Formula
2nd row: Formula
3rd row: Symptom
4th row: Disease
5th row: Formula

Common Values

Value | Count | Frequency (%)
Formula | 3885 | 38.9%
Symptom | 3509 | 35.1%
Medicinal_Material | 1051 | 10.5%
Effect | 688 | 6.9%
Disease | 647 | 6.5%
Pattern | 220 | 2.2%
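
The frequency column follows directly from the counts. As a short illustration, the snippet below derives each category's share of the 10000 observations; the printed rounding is Python's own and is not guaranteed to match the report's rounding in every digit.

```python
# Counts copied from the Common Values table above.
counts = {
    "Formula": 3885, "Symptom": 3509, "Medicinal_Material": 1051,
    "Effect": 688, "Disease": 647, "Pattern": 220,
}

total = sum(counts.values())  # equals the number of observations
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```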

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
formula | 3885 | 38.9%
symptom | 3509 | 35.1%
medicinal_material | 1051 | 10.5%
effect | 688 | 6.9%
disease | 647 | 6.5%
pattern | 220 | 2.2%

식별자(CID)
Text

Distinct: 9964
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 180
Median length: 89
Mean length: 11.4994
Min length: 3

Characters and Unicode

Total characters: 114994
Distinct characters: 2220
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9940
Unique (%): 99.4%
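
The "Distinct" and "Unique" figures differ for this variable. One reading consistent with the numbers is that Distinct counts the different values present, while Unique counts the values that occur exactly once. The sketch below illustrates that reading on a toy column; the list is illustrative and not taken from the dataset.

```python
from collections import Counter

# Toy column for illustration only.
column = ["大黃", "人蔘", "大黃", "知母", "杏仁", "人蔘", "銅靑"]

counts = Counter(column)
distinct = len(counts)                               # different values present
unique = sum(1 for n in counts.values() if n == 1)   # values seen exactly once
print(distinct, unique)
```

Under this reading Unique can never exceed Distinct, which holds for every variable in the report.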

Sample

1st row: FO海蛤散-MB得效-BK심계내과학
2nd row: FO藿香安胃散-MB東醫寶鑑-MB醫鑑-BK동의방제와처방해설-108
3rd row: SY食無味
4th row: DI宿食秘
5th row: FO金水六君煎-MB景岳全書
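
The sample identifiers appear to share a layout: a two-letter prefix (FO, SY, DI, and MM occur alongside the Formula, Symptom, Disease, and Medicinal_Material categories in the sample rows), a name, and optional hyphen-separated segments that begin with a two-letter tag such as MB, BK, or BC. The report does not say what these tags mean, so the sketch below treats them as opaque labels. The function name `split_identifier` and the splitting rule are assumptions based only on the samples.

```python
import re


def split_identifier(cid):
    """Split an identifier into (prefix, name, [(tag, text), ...])."""
    # Assumed rule: a new segment starts at a hyphen followed by two
    # ASCII capital letters; other hyphens stay inside their segment.
    head, *rest = re.split(r"-(?=[A-Z]{2})", cid)
    prefix, name = head[:2], head[2:]
    segments = [(seg[:2], seg[2:]) for seg in rest]
    return prefix, name, segments


print(split_identifier("FO海蛤散-MB得效-BK심계내과학"))
print(split_identifier("FO藿香安胃散-MB東醫寶鑑-MB醫鑑-BK동의방제와처방해설-108"))
```

A trailing number such as "-108" is kept with the preceding segment because it does not start with two capital letters.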
ValueCountFrequency (%)
35
 
0.3%
mmoo 6
 
0.1%
syo 6
 
0.1%
6
 
0.1%
大黃 6
 
0.1%
人蔘 6
 
0.1%
fo二陳湯 5
 
< 0.1%
知母 5
 
< 0.1%
杏仁 5
 
< 0.1%
fo四物湯 5
 
< 0.1%
Other values (10097) 10201
99.2%

Most occurring characters

ValueCountFrequency (%)
- 7705
 
6.7%
O 5367
 
4.7%
B 4911
 
4.3%
F 4573
 
4.0%
M 4316
 
3.8%
S 3511
 
3.1%
Y 3509
 
3.1%
K 2225
 
1.9%
2060
 
1.8%
2012
 
1.7%
Other values (2210) 74805
65.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 71737 | 62.4%
Uppercase Letter | 31313 | 27.2%
Dash Punctuation | 7705 | 6.7%
Decimal Number | 2218 | 1.9%
Connector Punctuation | 1404 | 1.2%
Space Separator | 286 | 0.2%
Lowercase Letter | 254 | 0.2%
Other Punctuation | 29 | < 0.1%
Close Punctuation | 24 | < 0.1%
Open Punctuation | 24 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2060
 
2.9%
2012
 
2.8%
1349
 
1.9%
1191
 
1.7%
1151
 
1.6%
1121
 
1.6%
1095
 
1.5%
1068
 
1.5%
1047
 
1.5%
1044
 
1.5%
Other values (2150) 58599
81.7%
Lowercase Letter
ValueCountFrequency (%)
a 34
13.4%
e 27
10.6%
r 27
10.6%
i 21
 
8.3%
t 21
 
8.3%
g 18
 
7.1%
s 15
 
5.9%
o 14
 
5.5%
c 12
 
4.7%
l 9
 
3.5%
Other values (13) 56
22.0%
Uppercase Letter
ValueCountFrequency (%)
O 5367
17.1%
B 4911
15.7%
F 4573
14.6%
M 4316
13.8%
S 3511
11.2%
Y 3509
11.2%
K 2225
7.1%
E 689
 
2.2%
I 647
 
2.1%
D 647
 
2.1%
Other values (6) 918
 
2.9%
Decimal Number
ValueCountFrequency (%)
1 291
13.1%
4 279
12.6%
2 265
11.9%
3 255
11.5%
5 240
10.8%
7 217
9.8%
6 182
8.2%
9 170
7.7%
0 162
7.3%
8 157
7.1%
Other Punctuation
ValueCountFrequency (%)
. 13
44.8%
· 9
31.0%
/ 3
 
10.3%
; 2
 
6.9%
\ 1
 
3.4%
" 1
 
3.4%
Dash Punctuation
ValueCountFrequency (%)
- 7705
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1404
100.0%
Space Separator
ValueCountFrequency (%)
286
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 51791 | 45.0%
Latin | 31567 | 27.5%
Hangul | 19946 | 17.3%
Common | 11690 | 10.2%

Most frequent character per script

Han
ValueCountFrequency (%)
2012
 
3.9%
1121
 
2.2%
1021
 
2.0%
944
 
1.8%
919
 
1.8%
854
 
1.6%
687
 
1.3%
669
 
1.3%
485
 
0.9%
475
 
0.9%
Other values (1777) 42604
82.3%
Hangul
ValueCountFrequency (%)
2060
 
10.3%
1349
 
6.8%
1191
 
6.0%
1151
 
5.8%
1095
 
5.5%
1068
 
5.4%
1047
 
5.2%
1044
 
5.2%
891
 
4.5%
884
 
4.4%
Other values (363) 8166
40.9%
Latin
ValueCountFrequency (%)
O 5367
17.0%
B 4911
15.6%
F 4573
14.5%
M 4316
13.7%
S 3511
11.1%
Y 3509
11.1%
K 2225
7.0%
E 689
 
2.2%
I 647
 
2.0%
D 647
 
2.0%
Other values (29) 1172
 
3.7%
Common
ValueCountFrequency (%)
- 7705
65.9%
_ 1404
 
12.0%
1 291
 
2.5%
286
 
2.4%
4 279
 
2.4%
2 265
 
2.3%
3 255
 
2.2%
5 240
 
2.1%
7 217
 
1.9%
6 182
 
1.6%
Other values (11) 566
 
4.8%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 51789 | 45.0%
ASCII | 43248 | 37.6%
Hangul | 19946 | 17.3%
None | 9 | < 0.1%
CJK Compat Ideographs | 2 | < 0.1%
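
Python's standard library does not expose Unicode blocks, but a block can be looked up from a code point range. The sketch below maps the short block labels used in this report to ranges from the Unicode block chart. The pairing of labels to ranges is an assumption, and anything outside the listed ranges is labelled "None" here, which is also how the report labels the middle dot "·".

```python
from collections import Counter

# Assumed mapping from the report's short labels to Unicode block ranges.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),                  # Basic Latin
    ("Compat Jamo", 0x3130, 0x318F),            # Hangul Compatibility Jamo
    ("CJK", 0x4E00, 0x9FFF),                    # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),                 # Hangul Syllables
    ("CJK Compat Ideographs", 0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


def block_of(ch):
    """Return the block label for one character, or "None" if unlisted."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "None"


sample = "FO海蛤散-MB得效-BK심계내과학"  # 1st sample row of this variable
counts = Counter(block_of(ch) for ch in sample)
print(counts)
```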

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 7705
17.8%
O 5367
12.4%
B 4911
11.4%
F 4573
10.6%
M 4316
10.0%
S 3511
8.1%
Y 3509
8.1%
K 2225
 
5.1%
_ 1404
 
3.2%
E 689
 
1.6%
Other values (49) 5038
11.6%
Hangul
ValueCountFrequency (%)
2060
 
10.3%
1349
 
6.8%
1191
 
6.0%
1151
 
5.8%
1095
 
5.5%
1068
 
5.4%
1047
 
5.2%
1044
 
5.2%
891
 
4.5%
884
 
4.4%
Other values (363) 8166
40.9%
CJK
ValueCountFrequency (%)
2012
 
3.9%
1121
 
2.2%
1021
 
2.0%
944
 
1.8%
919
 
1.8%
854
 
1.6%
687
 
1.3%
669
 
1.3%
485
 
0.9%
475
 
0.9%
Other values (1775) 42602
82.3%
None
ValueCountFrequency (%)
· 9
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
50.0%
1
50.0%

약재명(LOCAL_NAME)
Text

Distinct: 9903
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 178
Median length: 87
Mean length: 9.4994
Min length: 1

Characters and Unicode

Total characters: 94994
Distinct characters: 2216
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9826
Unique (%): 98.3%

Sample

1st row: 海蛤散-MB得效-BK심계내과학
2nd row: 藿香安胃散-MB東醫寶鑑-MB醫鑑-BK동의방제와처방해설-108
3rd row: 食無味
4th row: 宿食秘
5th row: 金水六君煎-MB景岳全書
ValueCountFrequency (%)
35
 
0.3%
oo 12
 
0.1%
o 7
 
0.1%
大黃 6
 
0.1%
人蔘 6
 
0.1%
二陳湯 6
 
0.1%
6
 
0.1%
黃芩 6
 
0.1%
知母 5
 
< 0.1%
杏仁 5
 
< 0.1%
Other values (10020) 10189
99.1%

Most occurring characters

ValueCountFrequency (%)
- 7705
 
8.1%
B 4911
 
5.2%
K 2225
 
2.3%
M 2214
 
2.3%
2060
 
2.2%
2012
 
2.1%
O 1482
 
1.6%
_ 1404
 
1.5%
1349
 
1.4%
1191
 
1.3%
Other values (2206) 68441
72.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 71737 | 75.5%
Uppercase Letter | 11313 | 11.9%
Dash Punctuation | 7705 | 8.1%
Decimal Number | 2218 | 2.3%
Connector Punctuation | 1404 | 1.5%
Space Separator | 286 | 0.3%
Lowercase Letter | 254 | 0.3%
Other Punctuation | 29 | < 0.1%
Close Punctuation | 24 | < 0.1%
Open Punctuation | 24 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2060
 
2.9%
2012
 
2.8%
1349
 
1.9%
1191
 
1.7%
1151
 
1.6%
1121
 
1.6%
1095
 
1.5%
1068
 
1.5%
1047
 
1.5%
1044
 
1.5%
Other values (2150) 58599
81.7%
Lowercase Letter
ValueCountFrequency (%)
a 34
13.4%
e 27
10.6%
r 27
10.6%
i 21
 
8.3%
t 21
 
8.3%
g 18
 
7.1%
s 15
 
5.9%
o 14
 
5.5%
c 12
 
4.7%
h 9
 
3.5%
Other values (13) 56
22.0%
Uppercase Letter
ValueCountFrequency (%)
B 4911
43.4%
K 2225
19.7%
M 2214
19.6%
O 1482
 
13.1%
C 472
 
4.2%
P 2
 
< 0.1%
S 2
 
< 0.1%
L 1
 
< 0.1%
H 1
 
< 0.1%
A 1
 
< 0.1%
Other values (2) 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 291
13.1%
4 279
12.6%
2 265
11.9%
3 255
11.5%
5 240
10.8%
7 217
9.8%
6 182
8.2%
9 170
7.7%
0 162
7.3%
8 157
7.1%
Other Punctuation
ValueCountFrequency (%)
. 13
44.8%
· 9
31.0%
/ 3
 
10.3%
; 2
 
6.9%
\ 1
 
3.4%
" 1
 
3.4%
Dash Punctuation
ValueCountFrequency (%)
- 7705
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1404
100.0%
Space Separator
ValueCountFrequency (%)
286
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 51791 | 54.5%
Hangul | 19946 | 21.0%
Common | 11690 | 12.3%
Latin | 11567 | 12.2%

Most frequent character per script

Han
ValueCountFrequency (%)
2012
 
3.9%
1121
 
2.2%
1021
 
2.0%
944
 
1.8%
919
 
1.8%
854
 
1.6%
687
 
1.3%
669
 
1.3%
485
 
0.9%
475
 
0.9%
Other values (1777) 42604
82.3%
Hangul
ValueCountFrequency (%)
2060
 
10.3%
1349
 
6.8%
1191
 
6.0%
1151
 
5.8%
1095
 
5.5%
1068
 
5.4%
1047
 
5.2%
1044
 
5.2%
891
 
4.5%
884
 
4.4%
Other values (363) 8166
40.9%
Latin
ValueCountFrequency (%)
B 4911
42.5%
K 2225
19.2%
M 2214
19.1%
O 1482
 
12.8%
C 472
 
4.1%
a 34
 
0.3%
e 27
 
0.2%
r 27
 
0.2%
i 21
 
0.2%
t 21
 
0.2%
Other values (25) 133
 
1.1%
Common
ValueCountFrequency (%)
- 7705
65.9%
_ 1404
 
12.0%
1 291
 
2.5%
286
 
2.4%
4 279
 
2.4%
2 265
 
2.3%
3 255
 
2.2%
5 240
 
2.1%
7 217
 
1.9%
6 182
 
1.6%
Other values (11) 566
 
4.8%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 51789 | 54.5%
ASCII | 23248 | 24.5%
Hangul | 19946 | 21.0%
None | 9 | < 0.1%
CJK Compat Ideographs | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 7705
33.1%
B 4911
21.1%
K 2225
 
9.6%
M 2214
 
9.5%
O 1482
 
6.4%
_ 1404
 
6.0%
C 472
 
2.0%
1 291
 
1.3%
286
 
1.2%
4 279
 
1.2%
Other values (45) 1979
 
8.5%
Hangul
ValueCountFrequency (%)
2060
 
10.3%
1349
 
6.8%
1191
 
6.0%
1151
 
5.8%
1095
 
5.5%
1068
 
5.4%
1047
 
5.2%
1044
 
5.2%
891
 
4.5%
884
 
4.4%
Other values (363) 8166
40.9%
CJK
ValueCountFrequency (%)
2012
 
3.9%
1121
 
2.2%
1021
 
2.0%
944
 
1.8%
919
 
1.8%
854
 
1.6%
687
 
1.3%
669
 
1.3%
485
 
0.9%
475
 
0.9%
Other values (1775) 42602
82.3%
None
ValueCountFrequency (%)
· 9
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
50.0%
1
50.0%

한문명
Text

Distinct: 8050
Distinct (%): 80.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 178
Median length: 130
Mean length: 5.1377
Min length: 1

Characters and Unicode

Total characters: 51377
Distinct characters: 1769
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6960
Unique (%): 69.6%

Sample

1st row: 해합산
2nd row: 곽향안위산
3rd row: 식무미
4th row: 숙식비
5th row: 금수육군전
ValueCountFrequency (%)
37
 
0.3%
16
 
0.1%
접촉시 13
 
0.1%
양격산 9
 
0.1%
계지복령환 9
 
0.1%
이진탕 9
 
0.1%
귀비탕 9
 
0.1%
지출환 8
 
0.1%
가미온담탕 8
 
0.1%
백호탕 8
 
0.1%
Other values (8423) 10592
98.8%

Most occurring characters

ValueCountFrequency (%)
1767
 
3.4%
- 1193
 
2.3%
1036
 
2.0%
858
 
1.7%
844
 
1.6%
_ 798
 
1.6%
748
 
1.5%
718
 
1.4%
698
 
1.4%
641
 
1.2%
Other values (1759) 42076
81.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 47743 | 92.9%
Dash Punctuation | 1193 | 2.3%
Connector Punctuation | 798 | 1.6%
Space Separator | 718 | 1.4%
Uppercase Letter | 410 | 0.8%
Lowercase Letter | 236 | 0.5%
Open Punctuation | 87 | 0.2%
Close Punctuation | 87 | 0.2%
Decimal Number | 87 | 0.2%
Other Punctuation | 18 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1767
 
3.7%
1036
 
2.2%
858
 
1.8%
844
 
1.8%
748
 
1.6%
698
 
1.5%
641
 
1.3%
639
 
1.3%
540
 
1.1%
534
 
1.1%
Other values (1707) 39438
82.6%
Lowercase Letter
ValueCountFrequency (%)
a 29
12.3%
r 27
11.4%
e 26
11.0%
t 20
8.5%
i 19
 
8.1%
g 17
 
7.2%
s 15
 
6.4%
o 13
 
5.5%
c 12
 
5.1%
n 9
 
3.8%
Other values (12) 49
20.8%
Decimal Number
ValueCountFrequency (%)
5 23
26.4%
1 17
19.5%
7 14
16.1%
3 11
12.6%
0 7
 
8.0%
2 7
 
8.0%
4 3
 
3.4%
8 3
 
3.4%
9 1
 
1.1%
6 1
 
1.1%
Uppercase Letter
ValueCountFrequency (%)
O 401
97.8%
P 2
 
0.5%
S 2
 
0.5%
H 1
 
0.2%
L 1
 
0.2%
A 1
 
0.2%
E 1
 
0.2%
R 1
 
0.2%
Other Punctuation
ValueCountFrequency (%)
, 7
38.9%
· 5
27.8%
/ 3
16.7%
; 2
 
11.1%
" 1
 
5.6%
Open Punctuation
ValueCountFrequency (%)
( 56
64.4%
{ 31
35.6%
Close Punctuation
ValueCountFrequency (%)
) 56
64.4%
} 31
35.6%
Dash Punctuation
ValueCountFrequency (%)
- 1193
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 798
100.0%
Space Separator
ValueCountFrequency (%)
718
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 38117 | 74.2%
Han | 9626 | 18.7%
Common | 2988 | 5.8%
Latin | 646 | 1.3%

Most frequent character per script

Han
ValueCountFrequency (%)
240
 
2.5%
201
 
2.1%
162
 
1.7%
147
 
1.5%
134
 
1.4%
122
 
1.3%
120
 
1.2%
117
 
1.2%
107
 
1.1%
94
 
1.0%
Other values (1172) 8182
85.0%
Hangul
ValueCountFrequency (%)
1767
 
4.6%
1036
 
2.7%
858
 
2.3%
844
 
2.2%
748
 
2.0%
698
 
1.8%
641
 
1.7%
639
 
1.7%
540
 
1.4%
534
 
1.4%
Other values (525) 29812
78.2%
Latin
ValueCountFrequency (%)
O 401
62.1%
a 29
 
4.5%
r 27
 
4.2%
e 26
 
4.0%
t 20
 
3.1%
i 19
 
2.9%
g 17
 
2.6%
s 15
 
2.3%
o 13
 
2.0%
c 12
 
1.9%
Other values (20) 67
 
10.4%
Common
ValueCountFrequency (%)
- 1193
39.9%
_ 798
26.7%
718
24.0%
( 56
 
1.9%
) 56
 
1.9%
} 31
 
1.0%
{ 31
 
1.0%
5 23
 
0.8%
1 17
 
0.6%
7 14
 
0.5%
Other values (12) 51
 
1.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 38114 | 74.2%
CJK | 9625 | 18.7%
ASCII | 3629 | 7.1%
None | 5 | < 0.1%
Compat Jamo | 3 | < 0.1%
CJK Compat Ideographs | 1 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1767
 
4.6%
1036
 
2.7%
858
 
2.3%
844
 
2.2%
748
 
2.0%
698
 
1.8%
641
 
1.7%
639
 
1.7%
540
 
1.4%
534
 
1.4%
Other values (524) 29809
78.2%
ASCII
ValueCountFrequency (%)
- 1193
32.9%
_ 798
22.0%
718
19.8%
O 401
 
11.0%
( 56
 
1.5%
) 56
 
1.5%
} 31
 
0.9%
{ 31
 
0.9%
a 29
 
0.8%
r 27
 
0.7%
Other values (41) 289
 
8.0%
CJK
ValueCountFrequency (%)
240
 
2.5%
201
 
2.1%
162
 
1.7%
147
 
1.5%
134
 
1.4%
122
 
1.3%
120
 
1.2%
117
 
1.2%
107
 
1.1%
94
 
1.0%
Other values (1171) 8181
85.0%
None
ValueCountFrequency (%)
· 5
100.0%
Compat Jamo
ValueCountFrequency (%)
3
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

온톨로지검색한문명
Text

Distinct: 8019
Distinct (%): 80.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 178
Median length: 130
Mean length: 5.1315
Min length: 1

Characters and Unicode

Total characters: 51315
Distinct characters: 2113
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6897
Unique (%): 69.0%

Sample

1st row: 海蛤散
2nd row: 藿香安胃散
3rd row: 食無味
4th row: 宿食秘
5th row: 金水六君煎
ValueCountFrequency (%)
37
 
0.4%
oo 12
 
0.1%
二陳湯 11
 
0.1%
四物湯 10
 
0.1%
歸脾湯 9
 
0.1%
桂枝茯o丸 9
 
0.1%
枳朮丸 8
 
0.1%
血府逐瘀湯 8
 
0.1%
加味溫膽湯 8
 
0.1%
當歸四逆湯 8
 
0.1%
Other values (8115) 10163
98.8%

Most occurring characters

ValueCountFrequency (%)
2006
 
3.9%
_ 1365
 
2.7%
O 1304
 
2.5%
- 1231
 
2.4%
848
 
1.7%
684
 
1.3%
664
 
1.3%
653
 
1.3%
650
 
1.3%
642
 
1.3%
Other values (2103) 41268
80.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 46724 | 91.1%
Connector Punctuation | 1365 | 2.7%
Uppercase Letter | 1313 | 2.6%
Dash Punctuation | 1231 | 2.4%
Space Separator | 283 | 0.6%
Lowercase Letter | 252 | 0.5%
Decimal Number | 85 | 0.2%
Close Punctuation | 24 | < 0.1%
Open Punctuation | 24 | < 0.1%
Other Punctuation | 14 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2006
 
4.3%
848
 
1.8%
684
 
1.5%
664
 
1.4%
653
 
1.4%
650
 
1.4%
642
 
1.4%
641
 
1.4%
469
 
1.0%
457
 
1.0%
Other values (2053) 39010
83.5%
Lowercase Letter
ValueCountFrequency (%)
a 34
13.5%
r 27
10.7%
e 27
10.7%
t 21
 
8.3%
i 21
 
8.3%
g 16
 
6.3%
s 15
 
6.0%
o 14
 
5.6%
c 12
 
4.8%
h 9
 
3.6%
Other values (13) 56
22.2%
Decimal Number
ValueCountFrequency (%)
5 22
25.9%
1 17
20.0%
7 13
15.3%
3 11
12.9%
2 7
 
8.2%
0 7
 
8.2%
8 3
 
3.5%
4 3
 
3.5%
9 1
 
1.2%
6 1
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
O 1304
99.3%
P 2
 
0.2%
S 2
 
0.2%
L 1
 
0.1%
H 1
 
0.1%
A 1
 
0.1%
R 1
 
0.1%
E 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
· 8
57.1%
/ 3
 
21.4%
; 2
 
14.3%
" 1
 
7.1%
Connector Punctuation
ValueCountFrequency (%)
_ 1365
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1231
100.0%
Space Separator
ValueCountFrequency (%)
283
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 41266 | 80.4%
Hangul | 5458 | 10.6%
Common | 3026 | 5.9%
Latin | 1565 | 3.0%

Most frequent character per script

Han
ValueCountFrequency (%)
2006
 
4.9%
848
 
2.1%
684
 
1.7%
642
 
1.6%
469
 
1.1%
457
 
1.1%
439
 
1.1%
428
 
1.0%
425
 
1.0%
421
 
1.0%
Other values (1684) 34447
83.5%
Hangul
ValueCountFrequency (%)
664
 
12.2%
653
 
12.0%
650
 
11.9%
641
 
11.7%
214
 
3.9%
138
 
2.5%
133
 
2.4%
113
 
2.1%
110
 
2.0%
90
 
1.6%
Other values (359) 2052
37.6%
Latin
ValueCountFrequency (%)
O 1304
83.3%
a 34
 
2.2%
r 27
 
1.7%
e 27
 
1.7%
t 21
 
1.3%
i 21
 
1.3%
g 16
 
1.0%
s 15
 
1.0%
o 14
 
0.9%
c 12
 
0.8%
Other values (21) 74
 
4.7%
Common
ValueCountFrequency (%)
_ 1365
45.1%
- 1231
40.7%
283
 
9.4%
) 24
 
0.8%
( 24
 
0.8%
5 22
 
0.7%
1 17
 
0.6%
7 13
 
0.4%
3 11
 
0.4%
· 8
 
0.3%
Other values (9) 28
 
0.9%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 41264 | 80.4%
Hangul | 5458 | 10.6%
ASCII | 4583 | 8.9%
None | 8 | < 0.1%
CJK Compat Ideographs | 2 | < 0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
2006
 
4.9%
848
 
2.1%
684
 
1.7%
642
 
1.6%
469
 
1.1%
457
 
1.1%
439
 
1.1%
428
 
1.0%
425
 
1.0%
421
 
1.0%
Other values (1682) 34445
83.5%
ASCII
ValueCountFrequency (%)
_ 1365
29.8%
O 1304
28.5%
- 1231
26.9%
283
 
6.2%
a 34
 
0.7%
r 27
 
0.6%
e 27
 
0.6%
) 24
 
0.5%
( 24
 
0.5%
5 22
 
0.5%
Other values (39) 242
 
5.3%
Hangul
ValueCountFrequency (%)
664
 
12.2%
653
 
12.0%
650
 
11.9%
641
 
11.7%
214
 
3.9%
138
 
2.5%
133
 
2.4%
113
 
2.1%
110
 
2.0%
90
 
1.6%
Other values (359) 2052
37.6%
None
ValueCountFrequency (%)
· 8
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
50.0%
1
50.0%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
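
The per-column nullity shown in these plots amounts to counting empty cells in each column. As a rough sketch with the standard library only, the snippet below does this for two rows copied from the sample table of this report. The CSV layout and the rule that an empty string counts as missing are assumptions.

```python
import csv
import io

# Two rows from the sample table, restricted to three columns;
# the comma-separated layout is assumed for illustration.
text = (
    "키워드분류,한문명,온톨로지검색한문명\n"
    "Formula,해합산,海蛤散\n"
    "Symptom,식무미,食無味\n"
)

rows = list(csv.DictReader(io.StringIO(text)))

# Count cells that are absent or empty, column by column.
missing = {col: sum(1 for r in rows if r[col] in (None, "")) for col in rows[0]}
print(missing)
```

Every count is zero for these rows, in line with the "Missing cells: 0" figure in the overview.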

Sample

Index | 온톨로지키워드제어번호 | 키워드분류 | 식별자(CID) | 약재명(LOCAL_NAME) | 한문명 | 온톨로지검색한문명
16047 | 9,491 | Formula | FO海蛤散-MB得效-BK심계내과학 | 海蛤散-MB得效-BK심계내과학 | 해합산 | 海蛤散
18937 | 12,922 | Formula | FO藿香安胃散-MB東醫寶鑑-MB醫鑑-BK동의방제와처방해설-108 | 藿香安胃散-MB東醫寶鑑-MB醫鑑-BK동의방제와처방해설-108 | 곽향안위산 | 藿香安胃散
7903 | 28,041 | Symptom | SY食無味 | 食無味 | 식무미 | 食無味
8460 | 356 | Disease | DI宿食秘 | 宿食秘 | 숙식비 | 宿食秘
18983 | 13,812 | Formula | FO金水六君煎-MB景岳全書 | 金水六君煎-MB景岳全書 | 금수육군전 | 金水六君煎
3644 | 22,874 | Symptom | SY氣上衝胸 | 氣上衝胸 | 기상충흉 | 氣上衝胸
17716 | 10,218 | Formula | FO滋補養榮丸 | 滋補養榮丸 | 자보양영환 | 滋補養榮丸
7363 | 27,731 | Symptom | SY頑癬 | 頑癬 | 완선 | 頑癬
24439 | 17,535 | Medicinal_Material | MM銅靑 | 銅靑 | 동청 | 銅靑
17878 | 8,304 | Formula | FO旋覆代O湯-MB傷寒論-BC辨太陽病脈證幷治下-BK방제학-399 | 旋覆代O湯-MB傷寒論-BC辨太陽病脈證幷治下-BK방제학-399 | 선복대자탕 | 旋覆代O湯

Index | 온톨로지키워드제어번호 | 키워드분류 | 식별자(CID) | 약재명(LOCAL_NAME) | 한문명 | 온톨로지검색한문명
15836 | 9,444 | Formula | FO活絡丹-MB東醫寶鑑-MB局方-BK동의방제와처방해설-861 | 活絡丹-MB東醫寶鑑-MB局方-BK동의방제와처방해설-861 | 활락단 | 活絡丹
1349 | 22,288 | Symptom | SY或面黃而O | 或面黃而O | 혹면황이반 | 或面黃而O
27421 | 16,230 | Medicinal_Material | MM玄胡索-炒 | 玄胡索-炒 | 玄胡索-炒 | 玄胡索-炒
28672 | 26,968 | Symptom | SY贅O | 贅O | 췌우 | 贅O
1419 | 23,597 | Symptom | SY熱結裏實-증상아님 | 熱結裏實-증상아님 | 熱結裏實-증상아님 | 熱結裏實-증상아님
24208 | 13,673 | Formula | FO連翹敗毒散-MB東醫寶鑑-BC癰疽--癰疽五發證-MB醫鑑-BK동의방제와처방해설-104 | 連翹敗毒散-MB東醫寶鑑-BC癰疽--癰疽五發證-MB醫鑑-BK동의방제와처방해설-104 | 연교패독산 | 連翹敗毒散
7352 | 27,720 | Symptom | SY項背拘急不舒 | 項背拘急不舒 | 항배구급불서 | 項背拘急不舒
14261 | 7,081 | Formula | FO大秦O湯-MB東醫寶鑑-MB易老-BK동의방제와처방해설-287 | 大秦O湯-MB東醫寶鑑-MB易老-BK동의방제와처방해설-287 | 대진교탕 | 大秦O湯
4219 | 22,973 | Symptom | SY氣血疼痛 | 氣血疼痛 | 氣血疼痛 | 氣血疼痛
15860 | 9,468 | Formula | FO活血驅風湯(散)-BK신계내과학 | 活血驅風湯(散)-BK신계내과학 | 活血驅風湯(散) | 活血驅風湯(散)