Overview

Dataset statistics

Number of variables: 14
Number of observations: 10000
Missing cells: 24049
Missing cells (%): 17.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 124.0 B
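
The percentages in this table can be re-derived from the raw counts. The sketch below is only a sanity check on the figures shown above; treating the total size as "average record size times row count" is an assumption about how the two memory figures relate, not something the report states.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 14
n_observations = 10_000
missing_cells = 24_049
avg_record_bytes = 124.0

# Share of missing cells over the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is roughly average record size times row count.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Approx. total size: {total_mib:.1f} MiB")
```

Both results round to the values reported above (17.2% and 1.2 MiB).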

Variable types

Numeric: 3
Text: 9
Categorical: 1
DateTime: 1

Dataset

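Description: A thesaurus database that collects and classifies the Korean historical terms served by the Korean History Database (http://db.history.go.kr), a website providing major sources on Korean history from ancient times to the present, and builds a thesaurus search dictionary from them so that this large body of information can be searched more easily.
Author: National Institute of Korean History (국사편찬위원회), Ministry of Education (교육부)
URL: https://www.data.go.kr/data/3039423/fileData.do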

Alerts

term_kind is highly imbalanced (94.7%) [Imbalance]
term_remark has 7846 (78.5%) missing values [Missing]
term_attr has 9952 (99.5%) missing values [Missing]
term_year has 175 (1.8%) missing values [Missing]
term_times has 289 (2.9%) missing values [Missing]
term_desc has 131 (1.3%) missing values [Missing]
term_reference has 5511 (55.1%) missing values [Missing]
term_id has unique values [Unique]
term_user has 2360 (23.6%) zeros [Zeros]
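
The 94.7% imbalance figure for term_kind is not the share of the majority class (that is 99.0%). A minimal sketch, assuming the score is one minus the normalised Shannon entropy of the class counts, reproduces it from the counts 9899, 98 and 3 listed in the term_kind section of this report. The function name `imbalance_score` is hypothetical, and the formula is an assumption that happens to match the reported value rather than a statement about the tool's internals.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts for term_kind (values 2, 1 and 0) taken from this report.
term_kind_counts = [9899, 98, 3]
score = imbalance_score(term_kind_counts)
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by one class approaches 1.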

Reproduction

Analysis started: 2023-12-12 13:03:15.098277
Analysis finished: 2023-12-12 13:03:21.747741
Duration: 6.65 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
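
The reported duration follows directly from the two timestamps. A short check using only the standard library (the format string is an assumption based on how the timestamps are printed above):

```python
from datetime import datetime

# Timestamp layout as printed in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 13:03:15.098277", FMT)
finished = datetime.strptime("2023-12-12 13:03:21.747741", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```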

Variables

term_id
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13062767
Minimum: 1
Maximum: 52430056
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3595.85
Q1: 15014.25
Median: 6296781
Q3: 23073634
95-th percentile: 39857069
Maximum: 52430056
Range: 52430055
Interquartile range (IQR): 23058620

Descriptive statistics

Standard deviation: 15143195
Coefficient of variation (CV): 1.159264
Kurtosis: -0.83719477
Mean: 13062767
Median Absolute Deviation (MAD): 6290441.5
Skewness: 0.78306527
Sum: 1.3062767 × 10^11
Variance: 2.2931635 × 10^14
Monotonicity: Not monotonic
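
Several of the term_id statistics are derived from others in the same tables. The sketch below recomputes them from the printed values. Because the inputs are already rounded, the results agree with the report only up to rounding.

```python
# Values copied from the term_id tables above.
minimum, maximum = 1, 52_430_056
q1, q3 = 15_014.25, 23_073_634
mean, std = 13_062_767, 15_143_195

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance is the squared standard deviation

print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.7e}")
```

A CV above 1 together with a median far below the mean is consistent with identifiers that cluster in a few widely separated ranges rather than forming a dense sequence.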
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
10489860             1      < 0.1%
6297830              1      < 0.1%
2106001              1      < 0.1%
14680392             1      < 0.1%
10491122             1      < 0.1%
2106004              1      < 0.1%
10082                1      < 0.1%
39857914             1      < 0.1%
880                  1      < 0.1%
12591737             1      < 0.1%
Other values (9990)  9990   99.9%

Value  Count  Frequency (%)
1      1      < 0.1%
7      1      < 0.1%
10     1      < 0.1%
20     1      < 0.1%
30     1      < 0.1%
36     1      < 0.1%
48     1      < 0.1%
49     1      < 0.1%
70     1      < 0.1%
183    1      < 0.1%

Value     Count  Frequency (%)
52430056  1      < 0.1%
52430035  1      < 0.1%
52429891  1      < 0.1%
52429732  1      < 0.1%
52429649  1      < 0.1%
48245807  1      < 0.1%
48245805  1      < 0.1%
48245149  1      < 0.1%
48245117  1      < 0.1%
48245057  1      < 0.1%

topterm_id
Real number (ℝ)

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 342.1283
Minimum: 1
Maximum: 665
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 49
Median: 333
Q3: 631
95-th percentile: 665
Maximum: 665
Range: 664
Interquartile range (IQR): 582

Descriptive statistics

Standard deviation: 259.71796
Coefficient of variation (CV): 0.75912447
Kurtosis: -1.5782601
Mean: 342.1283
Median Absolute Deviation (MAD): 298
Skewness: -0.073113456
Sum: 3421283
Variance: 67453.421
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value             Count  Frequency (%)
8                 1941   19.4%
662               805    8.1%
439               764    7.6%
665               705    7.0%
333               674    6.7%
631               658    6.6%
199               636    6.4%
86                541    5.4%
49                535    5.3%
407               503    5.0%
Other values (7)  2238   22.4%

Value  Count  Frequency (%)
1      319    3.2%
8      1941   19.4%
49     535    5.3%
86     541    5.4%
199    636    6.4%
283    227    2.3%
310    175    1.8%
333    674    6.7%
407    503    5.0%
439    764    7.6%

Value  Count  Frequency (%)
665    705    7.0%
664    438    4.4%
663    258    2.6%
662    805    8.1%
631    658    6.6%
574    339    3.4%
518    482    4.8%
439    764    7.6%
407    503    5.0%
333    674    6.7%
Distinct: 9688
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 18
Mean length: 4.2749
Min length: 1

Characters and Unicode

Total characters: 42749
Distinct characters: 648
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
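
As an illustration of the per-category counts in this section, Python's standard `unicodedata` module exposes the Unicode general category of each character. This is a minimal sketch: the helper name `category_counts` and the label mapping are assumptions made for the example, and the report's own script and block breakdowns are not reproduced here because the standard library does not expose those properties.

```python
import unicodedata
from collections import Counter

# Two-letter general categories mapped to the labels used in this report.
CATEGORY_LABELS = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Sample values that appear in this variable's tables.
for value in ["현무문", "정1품", "9품"]:
    counts = category_counts(value)
    labelled = {CATEGORY_LABELS.get(code, code): n for code, n in counts.items()}
    print(value, labelled)
```

Hangul syllables fall under "Other Letter" (Lo), which is why that category accounts for almost all characters in Korean-language columns.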

Unique

Unique: 9431
Unique (%): 94.3%

Sample

1st row: 현무문
2nd row: 창원전보사
3rd row: 가감역관
4th row: 공보부
5th row: 사회주의법무생활지도위원회
Value                                      Count  Frequency (%)
교동                                       8      0.1%
구암집                                     5      < 0.1%
9품                                        5      < 0.1%
대불정여래밀인수증요의제보살만행수능엄경   4      < 0.1%
영흥                                       4      < 0.1%
정1품                                      4      < 0.1%
2품                                        4      < 0.1%
정4품                                      3      < 0.1%
양현고                                     3      < 0.1%
종5품                                      3      < 0.1%
Other values (9685)                        9964   99.6%

Most occurring characters

ValueCountFrequency (%)
1496
 
3.5%
835
 
2.0%
787
 
1.8%
785
 
1.8%
770
 
1.8%
617
 
1.4%
560
 
1.3%
558
 
1.3%
555
 
1.3%
529
 
1.2%
Other values (638) 35257
82.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 42195
98.7%
Decimal Number 444
 
1.0%
Other Punctuation 84
 
0.2%
Uppercase Letter 13
 
< 0.1%
Space Separator 7
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Open Punctuation 3
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1496
 
3.5%
835
 
2.0%
787
 
1.9%
785
 
1.9%
770
 
1.8%
617
 
1.5%
560
 
1.3%
558
 
1.3%
555
 
1.3%
529
 
1.3%
Other values (613) 34703
82.2%
Decimal Number
ValueCountFrequency (%)
3 105
23.6%
1 65
14.6%
4 56
12.6%
5 43
9.7%
2 42
 
9.5%
6 41
 
9.2%
9 32
 
7.2%
8 25
 
5.6%
7 21
 
4.7%
0 14
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
K 4
30.8%
N 2
15.4%
L 1
 
7.7%
A 1
 
7.7%
C 1
 
7.7%
S 1
 
7.7%
B 1
 
7.7%
P 1
 
7.7%
H 1
 
7.7%
Other Punctuation
ValueCountFrequency (%)
· 82
97.6%
1
 
1.2%
% 1
 
1.2%
Space Separator
ValueCountFrequency (%)
7
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 42195
98.7%
Common 541
 
1.3%
Latin 13
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1496
 
3.5%
835
 
2.0%
787
 
1.9%
785
 
1.9%
770
 
1.8%
617
 
1.5%
560
 
1.3%
558
 
1.3%
555
 
1.3%
529
 
1.3%
Other values (613) 34703
82.2%
Common
ValueCountFrequency (%)
3 105
19.4%
· 82
15.2%
1 65
12.0%
4 56
10.4%
5 43
7.9%
2 42
 
7.8%
6 41
 
7.6%
9 32
 
5.9%
8 25
 
4.6%
7 21
 
3.9%
Other values (6) 29
 
5.4%
Latin
ValueCountFrequency (%)
K 4
30.8%
N 2
15.4%
L 1
 
7.7%
A 1
 
7.7%
C 1
 
7.7%
S 1
 
7.7%
B 1
 
7.7%
P 1
 
7.7%
H 1
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 42195
98.7%
ASCII 471
 
1.1%
None 82
 
0.2%
Punctuation 1
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1496
 
3.5%
835
 
2.0%
787
 
1.9%
785
 
1.9%
770
 
1.8%
617
 
1.5%
560
 
1.3%
558
 
1.3%
555
 
1.3%
529
 
1.3%
Other values (613) 34703
82.2%
ASCII
ValueCountFrequency (%)
3 105
22.3%
1 65
13.8%
4 56
11.9%
5 43
9.1%
2 42
 
8.9%
6 41
 
8.7%
9 32
 
6.8%
8 25
 
5.3%
7 21
 
4.5%
0 14
 
3.0%
Other values (13) 27
 
5.7%
None
ValueCountFrequency (%)
· 82
100.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

term_kind
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
2      9899
1      98
0      3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      9899   99.0%
1      98     1.0%
0      3      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2      9899   99.0%
1      98     1.0%
0      3      < 0.1%
Distinct: 9699
Distinct (%): 97.7%
Missing: 74
Missing (%): 0.7%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 21
Mean length: 4.2812815
Min length: 1

Characters and Unicode

Total characters: 42496
Distinct characters: 3125
Distinct categories: 9
Distinct scripts: 5
Distinct blocks: 7

Unique

Unique: 9532
Unique (%): 96.0%

Sample

1st row: 玄武門
2nd row: 昌原電報司
3rd row: 假監役官
4th row: 公報部
5th row: 社會主義法務生活指導委員會
Value                                      Count  Frequency (%)
校洞                                       8      0.1%
九品                                       5      0.1%
龜巖集                                     5      0.1%
正一品                                     4      < 0.1%
大佛頂如來密因修證了義諸菩薩萬行首楞嚴經   4      < 0.1%
二品                                       4      < 0.1%
永興                                       4      < 0.1%
從三品                                     3      < 0.1%
正四品                                     3      < 0.1%
正三品                                     3      < 0.1%
Other values (9710)                        9882   99.6%

Most occurring characters

ValueCountFrequency (%)
804
 
1.9%
726
 
1.7%
587
 
1.4%
527
 
1.2%
447
 
1.1%
325
 
0.8%
319
 
0.8%
288
 
0.7%
238
 
0.6%
222
 
0.5%
Other values (3115) 38013
89.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 42077
99.0%
Lowercase Letter 180
 
0.4%
Decimal Number 67
 
0.2%
Uppercase Letter 66
 
0.2%
Other Punctuation 51
 
0.1%
Space Separator 47
 
0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
804
 
1.9%
726
 
1.7%
587
 
1.4%
527
 
1.3%
447
 
1.1%
325
 
0.8%
319
 
0.8%
288
 
0.7%
238
 
0.6%
222
 
0.5%
Other values (3054) 37594
89.3%
Lowercase Letter
ValueCountFrequency (%)
a 24
13.3%
e 20
11.1%
n 18
10.0%
o 15
 
8.3%
r 14
 
7.8%
l 12
 
6.7%
i 12
 
6.7%
t 11
 
6.1%
d 7
 
3.9%
s 7
 
3.9%
Other values (12) 40
22.2%
Uppercase Letter
ValueCountFrequency (%)
C 8
12.1%
K 7
10.6%
A 7
10.6%
H 6
 
9.1%
S 5
 
7.6%
M 4
 
6.1%
N 4
 
6.1%
G 3
 
4.5%
P 3
 
4.5%
U 3
 
4.5%
Other values (10) 16
24.2%
Decimal Number
ValueCountFrequency (%)
3 11
16.4%
1 10
14.9%
9 10
14.9%
2 8
11.9%
8 7
10.4%
4 6
9.0%
7 5
7.5%
0 5
7.5%
5 3
 
4.5%
6 2
 
3.0%
Other Punctuation
ValueCountFrequency (%)
· 44
86.3%
. 4
 
7.8%
, 2
 
3.9%
% 1
 
2.0%
Open Punctuation
ValueCountFrequency (%)
[ 3
75.0%
( 1
 
25.0%
Space Separator
ValueCountFrequency (%)
47
100.0%
Close Punctuation
ValueCountFrequency (%)
] 3
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 40840
96.1%
Hangul 1236
 
2.9%
Latin 246
 
0.6%
Common 173
 
0.4%
Hiragana 1
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
804
 
2.0%
726
 
1.8%
587
 
1.4%
527
 
1.3%
447
 
1.1%
325
 
0.8%
319
 
0.8%
288
 
0.7%
238
 
0.6%
222
 
0.5%
Other values (2694) 36357
89.0%
Hangul
ValueCountFrequency (%)
66
 
5.3%
46
 
3.7%
43
 
3.5%
29
 
2.3%
27
 
2.2%
25
 
2.0%
18
 
1.5%
17
 
1.4%
16
 
1.3%
14
 
1.1%
Other values (349) 935
75.6%
Latin
ValueCountFrequency (%)
a 24
 
9.8%
e 20
 
8.1%
n 18
 
7.3%
o 15
 
6.1%
r 14
 
5.7%
l 12
 
4.9%
i 12
 
4.9%
t 11
 
4.5%
C 8
 
3.3%
K 7
 
2.8%
Other values (32) 105
42.7%
Common
ValueCountFrequency (%)
47
27.2%
· 44
25.4%
3 11
 
6.4%
1 10
 
5.8%
9 10
 
5.8%
2 8
 
4.6%
8 7
 
4.0%
4 6
 
3.5%
7 5
 
2.9%
0 5
 
2.9%
Other values (9) 20
11.6%
Hiragana
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
CJK 40111
94.4%
Hangul 1236
 
2.9%
CJK Compat Ideographs 728
 
1.7%
ASCII 375
 
0.9%
None 44
 
0.1%
CJK Ext A 1
 
< 0.1%
Hiragana 1
 
< 0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
804
 
2.0%
726
 
1.8%
587
 
1.5%
527
 
1.3%
447
 
1.1%
325
 
0.8%
319
 
0.8%
288
 
0.7%
238
 
0.6%
222
 
0.6%
Other values (2587) 35628
88.8%
CJK Compat Ideographs
ValueCountFrequency (%)
121
 
16.6%
54
 
7.4%
52
 
7.1%
38
 
5.2%
35
 
4.8%
29
 
4.0%
27
 
3.7%
27
 
3.7%
19
 
2.6%
19
 
2.6%
Other values (96) 307
42.2%
Hangul
ValueCountFrequency (%)
66
 
5.3%
46
 
3.7%
43
 
3.5%
29
 
2.3%
27
 
2.2%
25
 
2.0%
18
 
1.5%
17
 
1.4%
16
 
1.3%
14
 
1.1%
Other values (349) 935
75.6%
ASCII
ValueCountFrequency (%)
47
 
12.5%
a 24
 
6.4%
e 20
 
5.3%
n 18
 
4.8%
o 15
 
4.0%
r 14
 
3.7%
l 12
 
3.2%
i 12
 
3.2%
t 11
 
2.9%
3 11
 
2.9%
Other values (50) 191
50.9%
None
ValueCountFrequency (%)
· 44
100.0%
CJK Ext A
ValueCountFrequency (%)
1
100.0%
Hiragana
ValueCountFrequency (%)
1
100.0%

term_remark
Text

MISSING 

Distinct: 436
Distinct (%): 20.2%
Missing: 7846
Missing (%): 78.5%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 1
Mean length: 1.7934076
Min length: 1

Characters and Unicode

Total characters: 3863
Distinct characters: 244
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 324
Unique (%): 15.0%
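
For a column that is mostly missing, it helps to know which denominator each percentage uses. The sketch below recomputes the term_remark figures. The choice of denominators (all rows for Missing, non-missing rows for Distinct, Unique and Mean length) is inferred from the numbers themselves, not taken from the tool's documentation.

```python
# Counts copied from the term_remark tables above.
n_rows = 10_000
missing = 7_846
distinct = 436
unique = 324
total_chars = 3_863

present = n_rows - missing  # rows that actually hold a value

missing_pct = 100 * missing / n_rows     # share of all rows
distinct_pct = 100 * distinct / present  # share of non-missing rows
unique_pct = 100 * unique / present      # share of non-missing rows
mean_length = total_chars / present      # characters per non-missing value

print(f"Missing:  {missing_pct:.1f}%")
print(f"Distinct: {distinct_pct:.1f}%")
print(f"Unique:   {unique_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```

So "Distinct (%) 20.2%" describes the 2154 populated rows, not the full 10000.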

Sample

1st row판소리
2nd row
3rd row
4th row
5th row인명
Value               Count  Frequency (%)
고려                101    10.3%
조선                87     8.9%
인명                35     3.6%
북한                20     2.0%
신라                18     1.8%
지명                18     1.8%
잡지                11     1.1%
서명                11     1.1%
발해                10     1.0%
대한제국기          9      0.9%
Other values (426)  657    67.2%

Most occurring characters

ValueCountFrequency (%)
1185
30.7%
155
 
4.0%
146
 
3.8%
135
 
3.5%
128
 
3.3%
1 108
 
2.8%
9 83
 
2.1%
73
 
1.9%
47
 
1.2%
0 44
 
1.1%
Other values (234) 1759
45.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2283
59.1%
Space Separator 1185
30.7%
Decimal Number 390
 
10.1%
Other Punctuation 5
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
155
 
6.8%
146
 
6.4%
135
 
5.9%
128
 
5.6%
73
 
3.2%
47
 
2.1%
41
 
1.8%
38
 
1.7%
37
 
1.6%
37
 
1.6%
Other values (222) 1446
63.3%
Decimal Number
ValueCountFrequency (%)
1 108
27.7%
9 83
21.3%
0 44
11.3%
8 37
 
9.5%
5 30
 
7.7%
2 26
 
6.7%
6 21
 
5.4%
4 17
 
4.4%
7 17
 
4.4%
3 7
 
1.8%
Space Separator
ValueCountFrequency (%)
1185
100.0%
Other Punctuation
ValueCountFrequency (%)
· 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2283
59.1%
Common 1580
40.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
155
 
6.8%
146
 
6.4%
135
 
5.9%
128
 
5.6%
73
 
3.2%
47
 
2.1%
41
 
1.8%
38
 
1.7%
37
 
1.6%
37
 
1.6%
Other values (222) 1446
63.3%
Common
ValueCountFrequency (%)
1185
75.0%
1 108
 
6.8%
9 83
 
5.3%
0 44
 
2.8%
8 37
 
2.3%
5 30
 
1.9%
2 26
 
1.6%
6 21
 
1.3%
4 17
 
1.1%
7 17
 
1.1%
Other values (2) 12
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2283
59.1%
ASCII 1575
40.8%
None 5
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1185
75.2%
1 108
 
6.9%
9 83
 
5.3%
0 44
 
2.8%
8 37
 
2.3%
5 30
 
1.9%
2 26
 
1.7%
6 21
 
1.3%
4 17
 
1.1%
7 17
 
1.1%
Hangul
ValueCountFrequency (%)
155
 
6.8%
146
 
6.4%
135
 
5.9%
128
 
5.6%
73
 
3.2%
47
 
2.1%
41
 
1.8%
38
 
1.7%
37
 
1.6%
37
 
1.6%
Other values (222) 1446
63.3%
None
ValueCountFrequency (%)
· 5
100.0%

term_attr
Text

MISSING 

Distinct: 42
Distinct (%): 87.5%
Missing: 9952
Missing (%): 99.5%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 5
Mean length: 5.125
Min length: 2

Characters and Unicode

Total characters: 246
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 36
Unique (%): 75.0%

Sample

1st row: 08.01
2nd row: 09.01
3rd row: 05.01
4th row: 04.09
5th row: 03.02
Value              Count  Frequency (%)
05.01              2      4.2%
03                 2      4.2%
08.02              2      4.2%
01.02              2      4.2%
07.01              2      4.2%
08.01              2      4.2%
06.03              1      2.1%
10                 1      2.1%
01.03.04           1      2.1%
04.04              1      2.1%
Other values (32)  32     66.7%

Most occurring characters

ValueCountFrequency (%)
0 96
39.0%
. 50
20.3%
1 34
 
13.8%
3 12
 
4.9%
2 12
 
4.9%
4 12
 
4.9%
5 8
 
3.3%
8 7
 
2.8%
7 5
 
2.0%
9 5
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 196
79.7%
Other Punctuation 50
 
20.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 96
49.0%
1 34
 
17.3%
3 12
 
6.1%
2 12
 
6.1%
4 12
 
6.1%
5 8
 
4.1%
8 7
 
3.6%
7 5
 
2.6%
9 5
 
2.6%
6 5
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 50
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 246
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 96
39.0%
. 50
20.3%
1 34
 
13.8%
3 12
 
4.9%
2 12
 
4.9%
4 12
 
4.9%
5 8
 
3.3%
8 7
 
2.8%
7 5
 
2.0%
9 5
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 246
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 96
39.0%
. 50
20.3%
1 34
 
13.8%
3 12
 
4.9%
2 12
 
4.9%
4 12
 
4.9%
5 8
 
3.3%
8 7
 
2.8%
7 5
 
2.0%
9 5
 
2.0%

term_year
Text

MISSING 

Distinct: 2740
Distinct (%): 27.9%
Missing: 175
Missing (%): 1.8%
Memory size: 156.2 KiB

Length

Max length: 72
Median length: 51
Mean length: 4.9283461
Min length: 1

Characters and Unicode

Total characters: 48421
Distinct characters: 72
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 4

Unique

Unique: 2023
Unique (%): 20.6%

Sample

1st row: ?
2nd row: 1899
3rd row: ?-?
4th row: 1946-1948
5th row: 1977-?
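
The sample rows suggest that term_year stores either a single year or a start-end span, with "?" marking an unknown bound. A hedged sketch of how such values might be normalised is shown below. The function name `parse_year_span` is hypothetical, and the parser covers only the five sample shapes above; the character statistics further down show that the column also contains letters, commas, parentheses and Hangul, which this sketch simply maps to `None`.

```python
import re
from typing import Optional, Tuple

# One to four digits; anything else (including "?") counts as unknown.
_YEAR = re.compile(r"\d{1,4}")

def _to_year(part: str) -> Optional[int]:
    part = part.strip()
    return int(part) if _YEAR.fullmatch(part) else None

def parse_year_span(raw: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse 'YEAR' or 'START-END' into (start, end); unknown bounds become None."""
    text = raw.strip()
    if "-" in text:
        start, end = text.split("-", 1)
    else:
        start = end = text  # a single year starts and ends in the same year
    return _to_year(start), _to_year(end)

for sample in ["?", "1899", "?-?", "1946-1948", "1977-?"]:
    print(sample, "->", parse_year_span(sample))
```

Returning `None` for anything unrecognised keeps the parser conservative, so unusual entries can be reviewed by hand instead of being silently misread.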
ValueCountFrequency (%)
4313
43.1%
1919 74
 
0.7%
1907 67
 
0.7%
1909 64
 
0.6%
1945 62
 
0.6%
1946 46
 
0.5%
1908 42
 
0.4%
1920 41
 
0.4%
1894 40
 
0.4%
1921 40
 
0.4%
Other values (2407) 5214
52.1%

Most occurring characters

ValueCountFrequency (%)
? 9651
19.9%
1 8594
17.7%
- 7530
15.6%
9 5569
11.5%
8 2460
 
5.1%
6 2122
 
4.4%
4 2048
 
4.2%
5 1944
 
4.0%
3 1940
 
4.0%
0 1924
 
4.0%
Other values (62) 4639
9.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 30197
62.4%
Other Punctuation 9970
 
20.6%
Dash Punctuation 7530
 
15.6%
Space Separator 261
 
0.5%
Other Letter 258
 
0.5%
Uppercase Letter 104
 
0.2%
Open Punctuation 45
 
0.1%
Close Punctuation 45
 
0.1%
Math Symbol 11
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
37
14.3%
29
11.2%
23
 
8.9%
21
 
8.1%
19
 
7.4%
17
 
6.6%
14
 
5.4%
12
 
4.7%
11
 
4.3%
5
 
1.9%
Other values (38) 70
27.1%
Decimal Number
ValueCountFrequency (%)
1 8594
28.5%
9 5569
18.4%
8 2460
 
8.1%
6 2122
 
7.0%
4 2048
 
6.8%
5 1944
 
6.4%
3 1940
 
6.4%
0 1924
 
6.4%
2 1886
 
6.2%
7 1710
 
5.7%
Uppercase Letter
ValueCountFrequency (%)
B 46
44.2%
C 42
40.4%
D 6
 
5.8%
A 6
 
5.8%
P 4
 
3.8%
Other Punctuation
ValueCountFrequency (%)
? 9651
96.8%
, 216
 
2.2%
. 101
 
1.0%
· 2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 7530
100.0%
Space Separator
ValueCountFrequency (%)
261
100.0%
Open Punctuation
ValueCountFrequency (%)
( 45
100.0%
Close Punctuation
ValueCountFrequency (%)
) 45
100.0%
Math Symbol
ValueCountFrequency (%)
11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 48059
99.3%
Hangul 258
 
0.5%
Latin 104
 
0.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
37
14.3%
29
11.2%
23
 
8.9%
21
 
8.1%
19
 
7.4%
17
 
6.6%
14
 
5.4%
12
 
4.7%
11
 
4.3%
5
 
1.9%
Other values (38) 70
27.1%
Common
ValueCountFrequency (%)
? 9651
20.1%
1 8594
17.9%
- 7530
15.7%
9 5569
11.6%
8 2460
 
5.1%
6 2122
 
4.4%
4 2048
 
4.3%
5 1944
 
4.0%
3 1940
 
4.0%
0 1924
 
4.0%
Other values (9) 4277
8.9%
Latin
ValueCountFrequency (%)
B 46
44.2%
C 42
40.4%
D 6
 
5.8%
A 6
 
5.8%
P 4
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48150
99.4%
Hangul 258
 
0.5%
Math Operators 11
 
< 0.1%
None 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
? 9651
20.0%
1 8594
17.8%
- 7530
15.6%
9 5569
11.6%
8 2460
 
5.1%
6 2122
 
4.4%
4 2048
 
4.3%
5 1944
 
4.0%
3 1940
 
4.0%
0 1924
 
4.0%
Other values (12) 4368
9.1%
Hangul
ValueCountFrequency (%)
37
14.3%
29
11.2%
23
 
8.9%
21
 
8.1%
19
 
7.4%
17
 
6.6%
14
 
5.4%
12
 
4.7%
11
 
4.3%
5
 
1.9%
Other values (38) 70
27.1%
Math Operators
ValueCountFrequency (%)
11
100.0%
None
ValueCountFrequency (%)
· 2
100.0%

term_times
Text

MISSING 

Distinct: 67
Distinct (%): 0.7%
Missing: 289
Missing (%): 2.9%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 4
Mean length: 3.7951807
Min length: 2

Characters and Unicode

Total characters: 36855
Distinct characters: 37
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 19
Unique (%): 0.2%

Sample

1st row: 삼국시대
2nd row: 근대
3rd row: 조선후기
4th row: 현대
5th row: 현대
Value              Count  Frequency (%)
조선시대           1498   15.4%
현대               1222   12.6%
일제시기           1038   10.7%
조선후기           971    10.0%
통시대             860    8.9%
고려시대           566    5.8%
삼국시대           485    5.0%
근대               480    4.9%
조선전기           416    4.3%
고려후기           378    3.9%
Other values (53)  1800   18.5%

Most occurring characters

ValueCountFrequency (%)
6927
18.8%
5595
15.2%
3569
9.7%
3332
9.0%
3252
8.8%
1791
 
4.9%
1538
 
4.2%
1412
 
3.8%
1402
 
3.8%
1161
 
3.2%
Other values (27) 6876
18.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 36205
98.2%
Dash Punctuation 642
 
1.7%
Space Separator 5
 
< 0.1%
Other Punctuation 1
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6927
19.1%
5595
15.5%
3569
9.9%
3332
9.2%
3252
9.0%
1791
 
4.9%
1538
 
4.2%
1412
 
3.9%
1402
 
3.9%
1161
 
3.2%
Other values (22) 6226
17.2%
Dash Punctuation
ValueCountFrequency (%)
- 642
100.0%
Space Separator
ValueCountFrequency (%)
5
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 36205
98.2%
Common 650
 
1.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6927
19.1%
5595
15.5%
3569
9.9%
3332
9.2%
3252
9.0%
1791
 
4.9%
1538
 
4.2%
1412
 
3.9%
1402
 
3.9%
1161
 
3.2%
Other values (22) 6226
17.2%
Common
ValueCountFrequency (%)
- 642
98.8%
5
 
0.8%
, 1
 
0.2%
( 1
 
0.2%
) 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
Hangul 36205
98.2%
ASCII 650
 
1.8%

Most frequent character per block

Hangul
ValueCountFrequency (%)
6927
19.1%
5595
15.5%
3569
9.9%
3332
9.2%
3252
9.0%
1791
 
4.9%
1538
 
4.2%
1412
 
3.9%
1402
 
3.9%
1161
 
3.2%
Other values (22) 6226
17.2%
ASCII
ValueCountFrequency (%)
- 642
98.8%
5
 
0.8%
, 1
 
0.2%
( 1
 
0.2%
) 1
 
0.2%
Distinct: 285
Distinct (%): 2.9%
Missing: 71
Missing (%): 0.7%
Memory size: 156.2 KiB

Length

Max length: 61
Median length: 46
Mean length: 11.672072
Min length: 2

Characters and Unicode

Total characters: 115892
Distinct characters: 203
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 40
Unique (%): 0.4%

Sample

1st row: 문화재
2nd row: 정치·행정·법제>행정>중앙행정기구
3rd row: 정치·행정·법제>인사
4th row: 정치·행정·법제>행정>중앙행정기구
5th row: 외교·국제관계>북한>정치·행정·법제(북한)
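
The sample values read as hierarchical category paths, with ">" separating levels and "·" joining related topics within one level. Assuming that reading is right, a path can be split into levels and tallied by its top-level category as sketched below. The helper name `split_category` is hypothetical, and the five paths are the sample rows shown above.

```python
from collections import Counter
from typing import List

def split_category(path: str, sep: str = ">") -> List[str]:
    """Split a category path into its levels, dropping empty segments."""
    return [level.strip() for level in path.split(sep) if level.strip()]

samples = [
    "문화재",
    "정치·행정·법제>행정>중앙행정기구",
    "정치·행정·법제>인사",
    "정치·행정·법제>행정>중앙행정기구",
    "외교·국제관계>북한>정치·행정·법제(북한)",
]

# Tally the first level of each path.
top_levels = Counter(split_category(path)[0] for path in samples)
for name, count in top_levels.most_common():
    print(name, count)
```

Grouping by the first level gives a much smaller set of categories than the 285 distinct full paths reported for this variable.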
Value                                 Count  Frequency (%)
정치·행정·법제>인사                   974    9.8%
인명                                  765    7.7%
문화재                                705    7.1%
서명                                  428    4.3%
정치·행정·법제>행정>중앙행정기구      414    4.2%
지명                                  241    2.4%
역사일반                              222    2.2%
문화·예술>미술                        211    2.1%
경제·산업>경제단체·기구>회사·기업     168    1.7%
교육>근대교육기관>초등교육기관        162    1.6%
Other values (277)                    5681   57.0%

Most occurring characters

ValueCountFrequency (%)
· 11989
 
10.3%
> 11714
 
10.1%
6315
 
5.4%
4789
 
4.1%
4285
 
3.7%
4013
 
3.5%
3330
 
2.9%
2848
 
2.5%
2532
 
2.2%
2414
 
2.1%
Other values (193) 61663
53.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 90851
78.4%
Other Punctuation 11989
 
10.3%
Math Symbol 11714
 
10.1%
Open Punctuation 558
 
0.5%
Close Punctuation 558
 
0.5%
Uppercase Letter 120
 
0.1%
Connector Punctuation 60
 
0.1%
Space Separator 42
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6315
 
7.0%
4789
 
5.3%
4285
 
4.7%
4013
 
4.4%
3330
 
3.7%
2848
 
3.1%
2532
 
2.8%
2414
 
2.7%
2075
 
2.3%
1946
 
2.1%
Other values (184) 56304
62.0%
Uppercase Letter
ValueCountFrequency (%)
L 60
50.0%
U 30
25.0%
N 30
25.0%
Other Punctuation
ValueCountFrequency (%)
· 11989
100.0%
Math Symbol
ValueCountFrequency (%)
> 11714
100.0%
Open Punctuation
ValueCountFrequency (%)
( 558
100.0%
Close Punctuation
ValueCountFrequency (%)
) 558
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 60
100.0%
Space Separator
ValueCountFrequency (%)
42
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 90851
78.4%
Common 24921
 
21.5%
Latin 120
 
0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6315
 
7.0%
4789
 
5.3%
4285
 
4.7%
4013
 
4.4%
3330
 
3.7%
2848
 
3.1%
2532
 
2.8%
2414
 
2.7%
2075
 
2.3%
1946
 
2.1%
Other values (184) 56304
62.0%
Common
ValueCountFrequency (%)
· 11989
48.1%
> 11714
47.0%
( 558
 
2.2%
) 558
 
2.2%
_ 60
 
0.2%
42
 
0.2%
Latin
ValueCountFrequency (%)
L 60
50.0%
U 30
25.0%
N 30
25.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 90851
78.4%
ASCII 13052
 
11.3%
None 11989
 
10.3%

Most frequent character per block

None
ValueCountFrequency (%)
· 11989
100.0%
ASCII
ValueCountFrequency (%)
> 11714
89.7%
( 558
 
4.3%
) 558
 
4.3%
_ 60
 
0.5%
L 60
 
0.5%
42
 
0.3%
U 30
 
0.2%
N 30
 
0.2%
Hangul
ValueCountFrequency (%)
6315
 
7.0%
4789
 
5.3%
4285
 
4.7%
4013
 
4.4%
3330
 
3.7%
2848
 
3.1%
2532
 
2.8%
2414
 
2.7%
2075
 
2.3%
1946
 
2.1%
Other values (184) 56304
62.0%

term_desc
Text

MISSING 

Distinct: 9644
Distinct (%): 97.7%
Missing: 131
Missing (%): 1.3%
Memory size: 156.2 KiB

Length

Max length: 281
Median length: 169
Mean length: 47.065457
Min length: 1

Characters and Unicode

Total characters: 464489
Distinct characters: 3517
Distinct categories: 16
Distinct scripts: 5
Distinct blocks: 13

Unique

Unique: 9515
Unique (%): 96.4%

Sample

1st row: 고구려가 수도 평양성의 북성을 쌓을 때 처음 세운 북문으로 1954년 다시 복구한 평양직할시 중구역(中區域) 금수산(錦繡山)에 있는 성문.
2nd row: 1899년 8월 29일 경상남도 창원(昌原)에 개설한 전신(電信) 업무 행정기관.
3rd row: 조선시대 선공감(繕工監)에서 토목 영선을 감독하던 종9품의 임시직.
4th row: 1946년 미군정청 개편에 따라 설치된 기구.
5th row: 1977년 김일성의 연설을 통해 알려진 것으로 북한주민들이 법규범과 규정대로 생활하도록 법무생활을 지도감독하는 기관.
ValueCountFrequency (%)
조선시대 1029
 
1.1%
617
 
0.7%
하나 611
 
0.7%
있음 571
 
0.6%
있는 446
 
0.5%
고려시대 421
 
0.4%
중국 420
 
0.4%
위해 411
 
0.4%
관직 306
 
0.3%
설립한 298
 
0.3%
Other values (37545) 88859
94.5%

Most occurring characters

ValueCountFrequency (%)
84335
 
18.2%
. 12638
 
2.7%
) 12078
 
2.6%
( 12077
 
2.6%
9333
 
2.0%
8114
 
1.7%
1 7001
 
1.5%
5366
 
1.2%
5023
 
1.1%
5021
 
1.1%
Other values (3507) 303503
65.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 311419
67.0%
Space Separator 84335
 
18.2%
Decimal Number 25179
 
5.4%
Other Punctuation 16746
 
3.6%
Close Punctuation 12172
 
2.6%
Open Punctuation 12170
 
2.6%
Control 716
 
0.2%
Math Symbol 612
 
0.1%
Lowercase Letter 492
 
0.1%
Dash Punctuation 407
 
0.1%
Other values (6) 241
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9333
 
3.0%
8114
 
2.6%
5366
 
1.7%
5023
 
1.6%
5021
 
1.6%
4763
 
1.5%
4585
 
1.5%
4560
 
1.5%
4395
 
1.4%
4357
 
1.4%
Other values (3404) 255902
82.2%
Lowercase Letter
ValueCountFrequency (%)
e 53
10.8%
a 51
10.4%
r 47
9.6%
n 45
9.1%
o 44
8.9%
l 37
 
7.5%
i 33
 
6.7%
t 27
 
5.5%
m 26
 
5.3%
s 25
 
5.1%
Other values (13) 104
21.1%
Uppercase Letter
ValueCountFrequency (%)
C 26
13.1%
B 21
 
10.6%
A 19
 
9.5%
M 16
 
8.0%
S 12
 
6.0%
H 11
 
5.5%
P 11
 
5.5%
J 11
 
5.5%
E 9
 
4.5%
F 9
 
4.5%
Other values (12) 54
27.1%
Other Punctuation
ValueCountFrequency (%)
. 12638
75.5%
, 2007
 
12.0%
· 1966
 
11.7%
: 59
 
0.4%
' 56
 
0.3%
" 10
 
0.1%
3
 
< 0.1%
/ 2
 
< 0.1%
% 1
 
< 0.1%
1
 
< 0.1%
Other values (3) 3
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 7001
27.8%
9 4237
16.8%
2 2289
 
9.1%
3 1885
 
7.5%
8 1729
 
6.9%
0 1693
 
6.7%
4 1669
 
6.6%
5 1656
 
6.6%
6 1656
 
6.6%
7 1364
 
5.4%
Math Symbol
ValueCountFrequency (%)
260
42.5%
260
42.5%
28
 
4.6%
> 21
 
3.4%
< 21
 
3.4%
~ 12
 
2.0%
5
 
0.8%
5
 
0.8%
Close Punctuation
ValueCountFrequency (%)
) 12078
99.2%
] 79
 
0.6%
7
 
0.1%
5
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 12077
99.2%
[ 79
 
0.6%
6
 
< 0.1%
5
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
8
30.8%
8
30.8%
8
30.8%
1
 
3.8%
° 1
 
3.8%
Control
ValueCountFrequency (%)
358
50.0%
358
50.0%
Space Separator
ValueCountFrequency (%)
84335
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 407
100.0%
Final Punctuation
ValueCountFrequency (%)
7
100.0%
Initial Punctuation
ValueCountFrequency (%)
7
100.0%
Letter Number
ValueCountFrequency (%)
1
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 282879
60.9%
Common 152378
32.8%
Han 28539
 
6.1%
Latin 692
 
0.1%
Hiragana 1
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
261
 
0.9%
252
 
0.9%
251
 
0.9%
244
 
0.9%
242
 
0.8%
233
 
0.8%
223
 
0.8%
217
 
0.8%
211
 
0.7%
199
 
0.7%
Other values (2342) 26206
91.8%
Hangul
ValueCountFrequency (%)
9333
 
3.3%
8114
 
2.9%
5366
 
1.9%
5023
 
1.8%
5021
 
1.8%
4763
 
1.7%
4585
 
1.6%
4560
 
1.6%
4395
 
1.6%
4357
 
1.5%
Other values (1051) 227362
80.4%
Common
ValueCountFrequency (%)
84335
55.3%
. 12638
 
8.3%
) 12078
 
7.9%
( 12077
 
7.9%
1 7001
 
4.6%
9 4237
 
2.8%
2 2289
 
1.5%
, 2007
 
1.3%
· 1966
 
1.3%
3 1885
 
1.2%
Other values (47) 11865
 
7.8%
Latin
ValueCountFrequency (%)
e 53
 
7.7%
a 51
 
7.4%
r 47
 
6.8%
n 45
 
6.5%
o 44
 
6.4%
l 37
 
5.3%
i 33
 
4.8%
t 27
 
3.9%
C 26
 
3.8%
m 26
 
3.8%
Other values (36) 303
43.8%
Hiragana
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 282848
60.9%
ASCII 150470
32.4%
CJK 27934
 
6.0%
None 2010
 
0.4%
CJK Compat Ideographs 604
 
0.1%
Math Operators 548
 
0.1%
Compat Jamo 31
 
< 0.1%
CJK Compat 24
 
< 0.1%
Punctuation 16
 
< 0.1%
Number Forms 1
 
< 0.1%
Other values (3) 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
84335
56.0%
. 12638
 
8.4%
) 12078
 
8.0%
( 12077
 
8.0%
1 7001
 
4.7%
9 4237
 
2.8%
2 2289
 
1.5%
, 2007
 
1.3%
3 1885
 
1.3%
8 1729
 
1.1%
Other values (65) 10194
 
6.8%
Hangul
ValueCountFrequency (%)
9333
 
3.3%
8114
 
2.9%
5366
 
1.9%
5023
 
1.8%
5021
 
1.8%
4763
 
1.7%
4585
 
1.6%
4560
 
1.6%
4395
 
1.6%
4357
 
1.5%
Other values (1032) 227331
80.4%
None
ValueCountFrequency (%)
· 1966
97.8%
7
 
0.3%
6
 
0.3%
5
 
0.2%
5
 
0.2%
5
 
0.2%
5
 
0.2%
3
 
0.1%
1
 
< 0.1%
1
 
< 0.1%
Other values (6) 6
 
0.3%
CJK
ValueCountFrequency (%)
261
 
0.9%
252
 
0.9%
251
 
0.9%
244
 
0.9%
242
 
0.9%
233
 
0.8%
223
 
0.8%
217
 
0.8%
211
 
0.8%
199
 
0.7%
Other values (2231) 25601
91.6%
Math Operators
ValueCountFrequency (%)
260
47.4%
260
47.4%
28
 
5.1%
CJK Compat Ideographs
ValueCountFrequency (%)
99
 
16.4%
76
 
12.6%
20
 
3.3%
16
 
2.6%
15
 
2.5%
14
 
2.3%
14
 
2.3%
13
 
2.2%
12
 
2.0%
12
 
2.0%
Other values (100) 313
51.8%
CJK Compat
ValueCountFrequency (%)
8
33.3%
8
33.3%
8
33.3%
Punctuation
ValueCountFrequency (%)
7
43.8%
7
43.8%
1
 
6.2%
1
 
6.2%
Compat Jamo
ValueCountFrequency (%)
7
22.6%
3
 
9.7%
2
 
6.5%
2
 
6.5%
2
 
6.5%
2
 
6.5%
1
 
3.2%
1
 
3.2%
1
 
3.2%
1
 
3.2%
Other values (9) 9
29.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
CJK Ext A
ValueCountFrequency (%)
1
100.0%
Geometric Shapes
ValueCountFrequency (%)
1
100.0%
Hiragana
ValueCountFrequency (%)
1
100.0%

term_user
Real number (ℝ)

ZEROS 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.5955
Minimum: 0
Maximum: 25
Zeros: 2360
Zeros (%): 23.6%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 8
Q3: 14
95-th percentile: 19
Maximum: 25
Range: 25
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.598563
Coefficient of variation (CV): 0.76767646
Kurtosis: -0.77238142
Mean: 8.5955
Median Absolute Deviation (MAD): 6
Skewness: 0.29334879
Sum: 85955
Variance: 43.541034
Monotonicity: Not monotonic
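
Several of the term_user figures above follow from one another. The sketch below re-derives them from the reported mean, standard deviation, and quartiles as a consistency check. The values are copied from this report; the variable names are illustrative only.

```python
# Values copied from the term_user summary in this report.
n = 10000          # number of observations
mean = 8.5955
std = 6.598563
q1, q3 = 2, 14
minimum, maximum = 0, 25

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum of all values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4f}")
print(f"Sum      = {total:.0f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```

Each derived number agrees with the corresponding reported statistic up to rounding of the inputs.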
Histogram with fixed size bins (bins=24)
Most frequent values

Value              Count  Frequency (%)
0                  2360   23.6%
10                 1731   17.3%
8                  1304   13.0%
14                 547    5.5%
17                 514    5.1%
19                 396    4.0%
18                 379    3.8%
23                 359    3.6%
6                  352    3.5%
5                  345    3.5%
Other values (14)  1713   17.1%

Smallest 10 values

Value  Count  Frequency (%)
0      2360   23.6%
1      47     0.5%
2      148    1.5%
3      217    2.2%
4      94     0.9%
5      345    3.5%
6      352    3.5%
7      302    3.0%
8      1304   13.0%
9      131    1.3%

Largest 10 values

Value  Count  Frequency (%)
25     11     0.1%
23     359    3.6%
22     2      < 0.1%
21     21     0.2%
20     5      0.1%
19     396    4.0%
18     379    3.8%
17     514    5.1%
16     334    3.3%
14     547    5.5%

term_created
DateTime

Distinct: 8470
Distinct (%): 84.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2002-09-30 22:42:00
Maximum: 2008-02-14 14:34:49
Histogram with fixed size bins (bins=50)

term_reference
Text

MISSING 

Distinct: 206
Distinct (%): 4.6%
Missing: 5511
Missing (%): 55.1%
Memory size: 156.2 KiB

Length

Max length: 37
Median length: 31
Mean length: 9.7119626
Min length: 3

Characters and Unicode

Total characters: 43597
Distinct characters: 319
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
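
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. This is a stand-in chosen for illustration, not necessarily the library the profiler itself uses, and it covers only the category property (the standard library does not expose script or block). The helper name `category_counts` is hypothetical; the sample string is one of the term_reference values shown in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. 'Lo', 'Nd', 'Zs') in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# One term_reference value from this report: Hangul syllables, a space, digits.
sample = "북한용어 400선집"
counts = category_counts(sample)

# 'Lo' = Other Letter, 'Nd' = Decimal Number, 'Zs' = Space Separator
for code, count in counts.most_common():
    print(code, count)
```

The two-letter codes map onto the category names used in the tables of this section, for example `Lo` for Other Letter and `Zs` for Space Separator.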

Unique

Unique: 96
Unique (%): 2.1%
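
The percentages reported for term_reference can be reproduced from the raw counts. The sketch below assumes that the distinct and unique percentages, and the mean length, are taken over non-missing rows only. That assumption is inferred from the numbers in this report rather than from documented behaviour, and the variable names are illustrative.

```python
# Counts copied from the term_reference summary in this report.
n_rows = 10000
n_missing = 5511
n_distinct = 206
n_unique = 96
total_chars = 43597

# Assumption: ratios are computed over rows that actually hold a value.
n_present = n_rows - n_missing

missing_pct = 100 * n_missing / n_rows
distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present
mean_length = total_chars / n_present

print(f"Non-missing rows: {n_present}")
print(f"Missing (%):      {missing_pct:.1f}")
print(f"Distinct (%):     {distinct_pct:.1f}")
print(f"Unique (%):       {unique_pct:.1f}")
print(f"Mean length:      {mean_length:.7f}")
```

Under that assumption the results match the reported 55.1%, 4.6%, 2.1%, and 9.7119626.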

Sample

1st row: 파스칼세계대백과사전
2nd row: 한국문화사대계3
3rd row: 두산세계대백과사전
4th row: 북한용어 400선집
5th row: 한국민족문화대백과사전

Value                   Count  Frequency (%)
한국민족문화대백과사전  1868   36.4%
두산세계대백과사전      338    6.6%
두산동아백과사전        241    4.7%
한국역대제도용어사전    139    2.7%
한국마정사              115    2.2%
학회총람                108    2.1%
두산동아대백과사전      105    2.0%
한국사신론              95     1.9%
두산세계대백과          85     1.7%
한국독립운동사사전      80     1.6%
Other values (242)      1959   38.2%

Most occurring characters

ValueCountFrequency (%)
3617
 
8.3%
3236
 
7.4%
2775
 
6.4%
2757
 
6.3%
2748
 
6.3%
2702
 
6.2%
2686
 
6.2%
2059
 
4.7%
2050
 
4.7%
1890
 
4.3%
Other values (309) 17077
39.2%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       40685  93.3%
Decimal Number     1250   2.9%
Space Separator    651    1.5%
Open Punctuation   292    0.7%
Close Punctuation  292    0.7%
Other Punctuation  283    0.6%
Lowercase Letter   115    0.3%
Math Symbol        20     < 0.1%
Dash Punctuation   9      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3617
 
8.9%
3236
 
8.0%
2775
 
6.8%
2757
 
6.8%
2748
 
6.8%
2702
 
6.6%
2686
 
6.6%
2059
 
5.1%
2050
 
5.0%
1890
 
4.6%
Other values (272) 14165
34.8%
Lowercase Letter
ValueCountFrequency (%)
w 18
15.7%
o 13
11.3%
e 12
10.4%
u 12
10.4%
m 12
10.4%
k 12
10.4%
r 11
9.6%
s 6
 
5.2%
g 5
 
4.3%
a 5
 
4.3%
Other values (6) 9
7.8%
Decimal Number
ValueCountFrequency (%)
1 384
30.7%
0 277
22.2%
9 217
17.4%
4 93
 
7.4%
8 88
 
7.0%
3 68
 
5.4%
5 57
 
4.6%
2 29
 
2.3%
7 26
 
2.1%
6 11
 
0.9%
Other Punctuation
ValueCountFrequency (%)
, 254
89.8%
. 18
 
6.4%
· 8
 
2.8%
/ 2
 
0.7%
: 1
 
0.4%
Math Symbol
ValueCountFrequency (%)
> 10
50.0%
< 10
50.0%
Space Separator
ValueCountFrequency (%)
651
100.0%
Open Punctuation
ValueCountFrequency (%)
( 292
100.0%
Close Punctuation
ValueCountFrequency (%)
) 292
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  40088  92.0%
Common  2797   6.4%
Han     597    1.4%
Latin   115    0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3617
 
9.0%
3236
 
8.1%
2775
 
6.9%
2757
 
6.9%
2748
 
6.9%
2702
 
6.7%
2686
 
6.7%
2059
 
5.1%
2050
 
5.1%
1890
 
4.7%
Other values (204) 13568
33.8%
Han
ValueCountFrequency (%)
63
 
10.6%
63
 
10.6%
63
 
10.6%
24
 
4.0%
23
 
3.9%
21
 
3.5%
20
 
3.4%
19
 
3.2%
19
 
3.2%
19
 
3.2%
Other values (58) 263
44.1%
Common
ValueCountFrequency (%)
651
23.3%
1 384
13.7%
( 292
10.4%
) 292
10.4%
0 277
9.9%
, 254
 
9.1%
9 217
 
7.8%
4 93
 
3.3%
8 88
 
3.1%
3 68
 
2.4%
Other values (11) 181
 
6.5%
Latin
ValueCountFrequency (%)
w 18
15.7%
o 13
11.3%
e 12
10.4%
u 12
10.4%
m 12
10.4%
k 12
10.4%
r 11
9.6%
s 6
 
5.2%
g 5
 
4.3%
a 5
 
4.3%
Other values (6) 9
7.8%

Most occurring blocks

Value                  Count  Frequency (%)
Hangul                 40088  92.0%
ASCII                  2904   6.7%
CJK                    596    1.4%
None                   8      < 0.1%
CJK Compat Ideographs  1      < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
3617
 
9.0%
3236
 
8.1%
2775
 
6.9%
2757
 
6.9%
2748
 
6.9%
2702
 
6.7%
2686
 
6.7%
2059
 
5.1%
2050
 
5.1%
1890
 
4.7%
Other values (204) 13568
33.8%
ASCII
ValueCountFrequency (%)
651
22.4%
1 384
13.2%
( 292
10.1%
) 292
10.1%
0 277
9.5%
, 254
 
8.7%
9 217
 
7.5%
4 93
 
3.2%
8 88
 
3.0%
3 68
 
2.3%
Other values (26) 288
9.9%
CJK
ValueCountFrequency (%)
63
 
10.6%
63
 
10.6%
63
 
10.6%
24
 
4.0%
23
 
3.9%
21
 
3.5%
20
 
3.4%
19
 
3.2%
19
 
3.2%
19
 
3.2%
Other values (57) 262
44.0%
None
ValueCountFrequency (%)
· 8
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

Interactions

[Interaction plots: 9 figures, not preserved in this text version]

Correlations

            term_id  topterm_id  term_kind  term_attr  term_times  term_user
term_id     1.000    0.347       0.085      1.000      0.565       0.915
topterm_id  0.347    1.000       0.101      NaN        0.655       0.510
term_kind   0.085    0.101       1.000      NaN        0.000       0.162
term_attr   1.000    NaN         NaN        1.000      NaN         1.000
term_times  0.565    0.655       0.000      NaN        1.000       0.494
term_user   0.915    0.510       0.162      1.000      0.494       1.000

            term_id  topterm_id  term_user  term_kind
term_id     1.000    0.101       0.279      0.051
topterm_id  0.101    1.000       -0.359     0.064
term_user   0.279    -0.359      1.000      0.097
term_kind   0.051    0.064       0.097      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
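
A minimal sketch of nullity correlation, assuming it is the Pearson correlation between per-column "is missing" indicators. The helper names and the toy columns below are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / sqrt(vx * vy)


def nullity_correlation(col_a, col_b):
    """Correlate the missingness (None) patterns of two columns."""
    a = [1 if v is None else 0 for v in col_a]
    b = [1 if v is None else 0 for v in col_b]
    return pearson(a, b)


# Hypothetical toy columns, for illustration only.
remark = [None, "x", None, "y", None, "z"]
reference = [None, "a", None, "b", "c", "d"]
attr = ["p", None, "q", None, "r", None]

r_similar = nullity_correlation(remark, reference)
r_opposite = nullity_correlation(remark, attr)
print(round(r_similar, 3), round(r_opposite, 3))
```

A value near 1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is missing.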

Sample

term_idtopterm_idterm_nameterm_kindterm_chterm_remarkterm_attrterm_yearterm_timesterm_lkterm_descterm_userterm_createdterm_reference
607910489860665현무문2玄武門<NA><NA>?삼국시대문화재고구려가 수도 평양성의 북성을 쌓을 때 처음 세운 북문으로 1954년 다시 복구한 평양직할시 중구역(中區域) 금수산(錦繡山)에 있는 성문.02004-07-10 10:15:16파스칼세계대백과사전
6185146813548창원전보사2昌原電報司<NA><NA>1899근대정치·행정·법제>행정>중앙행정기구1899년 8월 29일 경상남도 창원(昌原)에 개설한 전신(電信) 업무 행정기관.72004-07-12 17:07:57한국문화사대계3
22299356579968가감역관2假監役官<NA><NA>?-?조선후기정치·행정·법제>인사조선시대 선공감(繕工監)에서 토목 영선을 감독하던 종9품의 임시직.102003-10-02 16:50:48두산세계대백과사전
18288112208공보부2公報部<NA><NA>1946-1948현대정치·행정·법제>행정>중앙행정기구1946년 미군정청 개편에 따라 설치된 기구.102003-09-24 19:04:31<NA>
44362307035686사회주의법무생활지도위원회2社會主義法務生活指導委員會<NA><NA>1977-?현대외교·국제관계>북한>정치·행정·법제(북한)1977년 김일성의 연설을 통해 알려진 것으로 북한주민들이 법규범과 규정대로 생활하도록 법무생활을 지도감독하는 기관.112004-06-30 09:46:35북한용어 400선집
40238391237574가삼2家蔘<NA><NA>?-?통시대학술·과학기술>의학·약학밭에 씨를 뿌려 거둔 인삼.82004-06-29 13:23:59<NA>
331833005439단가2短歌판소리<NA>?-?근세-현대문화·예술>음악판소리를 부르기 전에 목청을 가다듬기 위하여 부르는 짧은 노래.82002-09-30 22:42:00한국민족문화대백과사전
2598621669662권유2權裕<NA>1745-1804조선후기인명조선 영조-순조 때의 문신으로 본관은 안동(安東). 순조와 순원왕후(純元王后)의 국혼을 반대하는 소를 올린 일로 대역부도죄로 죽임을 당함.02002-09-30 22:44:00한국민족문화대백과사전
1885623072505574전북대사학회2全北大史學會<NA><NA>1976-?현대학술·과학기술>학술기구·단체1976년 역사연구를 기본 목적으로 전북대학교에 설립한 학회.112004-07-26 15:26:02학회총람
2352417752631봉명학원2鳳鳴學院<NA><NA>1907근대교육>근대교육기관>초등교육기관1907년 윤최명(尹最明) 등이 평안북도 박천군(博川郡) 가산면 동문동에 설립한 사립교육기관.162003-06-10 16:49:58<NA>
term_idtopterm_idterm_nameterm_kindterm_chterm_remarkterm_attrterm_yearterm_timesterm_lkterm_descterm_userterm_createdterm_reference
1066637762349665유점사부도군2楡岾寺浮屠群<NA><NA>?조선후기문화재조선후기에 제작된 유점사(楡岾寺) 경내에 세워져 있는 10여 기의 석조 부도와 비석의 무리. 강원도 고성군 소재.02004-11-10 01:10:00북한문화재해설집1
999316058계공랑2啓功郞<NA><NA>1392-1894조선시대정치·행정·법제>인사조선시대 문산계(文散階) 종7품의 품계.102003-08-06 17:52:13<NA>
22613157678영집자2永執者<NA><NA>?-?조선시대정치·행정·법제>사법>범죄전호(佃戶)가 남의 토지를 아울러 경작하게 된 것을 기화로 삼아 그 토지를 영구히 점유하는 것.182003-06-03 11:50:14<NA>
3716137755718662조완기2趙完基<NA><NA>1570-1592조선전기인명1592년(선조 25) 임진왜란 때 옥천(沃川)에서 의병을 일으킨 아버지 조헌을 따라 종군한 의병으로 현종 때 지평(持平)이에 추증. 경기도 김포 출생.182003-08-21 16:49:43<NA>
3138316797584662환인2桓因<NA><NA>?고대인명단군신화(檀君神話)에 나오는 신적인 존재. 환웅(桓雄)의 아버지이며 단군의 할아버지인 천제(天帝)로, 환웅에게 천부인(天符印) 3개를 주어 세상을 다스리게 함, 환인의 이름은 불전(佛典)에서 따온 제석신(帝釋神)의 이름으로 원래는 '하늘', '하느님'이라는 한글의 근원이 되는 어떤 어형의 음사(音寫)로 봄.82003-09-16 13:18:57<NA>
297194200017665삼군부총무당2三軍府總武堂<NA><NA>1868조선후기문화재1868년(고종 5) 삼군부의 무략(武略)을 총괄하던 청사. 현 정부종합청사 자리에 세워졌다가, 1930년에 서울시 성북구 돈암동 삼선공원 옮김.02004-05-21 15:48:44한국민족문화대백과사전
204846457439와당2瓦當<NA><NA>?-?통시대문화·예술>미술추녀 끝에 덮는 기와로 기와 한쪽 끝에 둥글게 모양을 낸 부분.82002-11-13 11:27:24한국민족문화대백과사전
3425616579665속리의정2품송2俗離의正二品松<NA><NA>?근세-현대문화재1962년 천연기념물 제103호로 지정된 속리산의 소나무.02002-10-22 09:44:33두산세계대백과사전
3859244042351199곰배괭이2곰배괭이<NA><NA>?-?통시대경제·산업>농업>농기구흙을 파거나 씨를 뿌리기 위해 골을 켤 때, 덩어리진 흙을 부수거나 땅을 고를 때 쓰는 농기구.82003-10-25 12:04:01한국민족문화대백과사전
3825439853982574조선박물교원회2朝鮮博物敎員會<NA><NA>1935-1940일제시기학술·과학기술>학술기구·단체일제시기에 생물교사들이 박물(博物)에 관해 연구하고자 설립한 학술단체. 1940년 조선박물교원연구회로 명칭을 바꾸었고, 8·15광복 후 조선생물학회와 통합.192003-09-16 13:59:03<NA>