Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 14103
Missing cells (%): 17.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 703.1 KiB
Average record size in memory: 72.0 B

Variable types

Text: 7
Categorical: 1

Dataset

Description: An encyclopedia that compiles the history and culture of Korea, containing more than 73,000 entries.
Author: 한국학중앙연구원 (Academy of Korean Studies)
URL: https://www.data.go.kr/data/3059498/fileData.do

Alerts

원어 has 455 (4.5%) missing values (alert: Missing)
이칭 has 7083 (70.8%) missing values (alert: Missing)
키워드 has 4710 (47.1%) missing values (alert: Missing)
시대 has 1855 (18.6%) missing values (alert: Missing)
웹사이트 주소 has unique values (alert: Unique)
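
The overview figure for missing cells can be cross-checked against the per-column alerts. The sketch below assumes that the four columns flagged as Missing are the only columns with missing values, which is consistent with the totals in the report; the variable names are illustrative.

```python
# Per-column missing counts, copied from the Alerts list.
missing_per_column = {"원어": 455, "이칭": 7083, "키워드": 4710, "시대": 1855}

# Dataset shape, copied from the Dataset statistics block.
n_variables = 8
n_observations = 10_000

missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 14103
print(f"{missing_pct:.1f}%")  # 17.6%
```

Both values match the "Missing cells" and "Missing cells (%)" entries in the dataset statistics.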

Reproduction

Analysis started: 2023-12-12 12:22:27.068234
Analysis finished: 2023-12-12 12:22:32.173162
Duration: 5.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

항목명
Text

Distinct: 9841
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 25
Mean length: 4.169
Min length: 1

Characters and Unicode

Total characters: 41690
Distinct characters: 735
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
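
As a rough illustration of how such character properties can be counted, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation: the helper name `category_counts` and the partial `CATEGORY_NAMES` mapping are assumptions made for this example, and the standard library exposes only the general category, so script and block statistics would need an additional data source.

```python
import unicodedata
from collections import Counter

# Long names for a few general-category codes (partial mapping, for display only).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code when no long name is listed.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One sample value from the 이칭 column of this report.
counts = category_counts(["희여(希輿)|우헌(愚軒)|서강(西岡)"])
print(dict(counts))
```

Hangul syllables and Han ideographs both fall under Other Letter, while the `|` delimiter is classified as a Math Symbol, which is why that category shows up for the pipe-delimited columns.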

Unique

Unique: 9695
Unique (%): 97.0%

Sample

1st row: 제고
2nd row: 유원중
3rd row: 표충사아미타삼존도
4th row: 계모
5th row: 은곡서당

Value | Count | Frequency (%)
(blank) | 32 | 0.3%
경주 | 18 | 0.2%
고택 | 17 | 0.2%
삼층석탑 | 15 | 0.1%
청자 | 11 | 0.1%
순천 | 11 | 0.1%
고인돌 | 9 | 0.1%
일괄 | 9 | 0.1%
안동 | 8 | 0.1%
서울 | 8 | 0.1%
Other values (10201) | 10704 | 98.7%

Most occurring characters

ValueCountFrequency (%)
1026
 
2.5%
842
 
2.0%
650
 
1.6%
605
 
1.5%
563
 
1.4%
559
 
1.3%
558
 
1.3%
549
 
1.3%
536
 
1.3%
529
 
1.3%
Other values (725) 35273
84.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 40677 | 97.6%
Space Separator | 842 | 2.0%
Decimal Number | 66 | 0.2%
Other Symbol | 26 | 0.1%
Other Punctuation | 24 | 0.1%
Close Punctuation | 19 | < 0.1%
Open Punctuation | 18 | < 0.1%
Uppercase Letter | 11 | < 0.1%
Dash Punctuation | 4 | < 0.1%
Initial Punctuation | 1 | < 0.1%
Other values (2) | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1026
 
2.5%
650
 
1.6%
605
 
1.5%
563
 
1.4%
559
 
1.4%
558
 
1.4%
549
 
1.3%
536
 
1.3%
529
 
1.3%
517
 
1.3%
Other values (696) 34585
85.0%
Decimal Number
ValueCountFrequency (%)
1 15
22.7%
2 15
22.7%
3 12
18.2%
7 6
 
9.1%
4 5
 
7.6%
6 4
 
6.1%
5 3
 
4.5%
0 2
 
3.0%
9 2
 
3.0%
8 2
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
D 3
27.3%
K 2
18.2%
B 2
18.2%
E 1
 
9.1%
C 1
 
9.1%
G 1
 
9.1%
T 1
 
9.1%
Other Punctuation
ValueCountFrequency (%)
· 21
87.5%
, 3
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 18
94.7%
1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 17
94.4%
1
 
5.6%
Space Separator
ValueCountFrequency (%)
842
100.0%
Other Symbol
ValueCountFrequency (%)
26
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%
Initial Punctuation
ValueCountFrequency (%)
1
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%
Math Symbol
ValueCountFrequency (%)
~ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 40703 | 97.6%
Common | 976 | 2.3%
Latin | 11 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1026
 
2.5%
650
 
1.6%
605
 
1.5%
563
 
1.4%
559
 
1.4%
558
 
1.4%
549
 
1.3%
536
 
1.3%
529
 
1.3%
517
 
1.3%
Other values (697) 34611
85.0%
Common
ValueCountFrequency (%)
842
86.3%
· 21
 
2.2%
) 18
 
1.8%
( 17
 
1.7%
1 15
 
1.5%
2 15
 
1.5%
3 12
 
1.2%
7 6
 
0.6%
4 5
 
0.5%
6 4
 
0.4%
Other values (11) 21
 
2.2%
Latin
ValueCountFrequency (%)
D 3
27.3%
K 2
18.2%
B 2
18.2%
E 1
 
9.1%
C 1
 
9.1%
G 1
 
9.1%
T 1
 
9.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 40674 | 97.6%
ASCII | 962 | 2.3%
None | 49 | 0.1%
Compat Jamo | 3 | < 0.1%
Punctuation | 2 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1026
 
2.5%
650
 
1.6%
605
 
1.5%
563
 
1.4%
559
 
1.4%
558
 
1.4%
549
 
1.3%
536
 
1.3%
529
 
1.3%
517
 
1.3%
Other values (693) 34582
85.0%
ASCII
ValueCountFrequency (%)
842
87.5%
) 18
 
1.9%
( 17
 
1.8%
1 15
 
1.6%
2 15
 
1.6%
3 12
 
1.2%
7 6
 
0.6%
4 5
 
0.5%
6 4
 
0.4%
- 4
 
0.4%
Other values (13) 24
 
2.5%
None
ValueCountFrequency (%)
26
53.1%
· 21
42.9%
1
 
2.0%
1
 
2.0%
Compat Jamo
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%

원어
Text

MISSING 

Distinct: 9454
Distinct (%): 99.0%
Missing: 455
Missing (%): 4.5%
Memory size: 156.2 KiB

Length

Max length: 25
Median length: 24
Mean length: 4.051231
Min length: 1

Characters and Unicode

Total characters: 38669
Distinct characters: 2993
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9381
Unique (%): 98.3%

Sample

1st row: 齊鼓
2nd row: 柳遠重
3rd row: 表忠寺阿彌陀三尊圖
4th row: 繼母
5th row: 隱谷書堂

Value | Count | Frequency (%)
─打令 | 7 | 0.1%
銀杏─ | 6 | 0.1%
古宅 | 4 | < 0.1%
珍島─ | 4 | < 0.1%
郎中 | 4 | < 0.1%
大韓─協會 | 4 | < 0.1%
─大學校 | 3 | < 0.1%
慶州 | 3 | < 0.1%
良洞마을 | 3 | < 0.1%
盤松 | 3 | < 0.1%
Other values (9529) | 9621 | 99.6%

Most occurring characters

ValueCountFrequency (%)
503
 
1.3%
473
 
1.2%
467
 
1.2%
460
 
1.2%
459
 
1.2%
447
 
1.2%
425
 
1.1%
413
 
1.1%
362
 
0.9%
295
 
0.8%
Other values (2983) 34365
88.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 37957 | 98.2%
Other Symbol | 459 | 1.2%
Space Separator | 117 | 0.3%
Close Punctuation | 38 | 0.1%
Open Punctuation | 37 | 0.1%
Uppercase Letter | 16 | < 0.1%
Other Punctuation | 12 | < 0.1%
Decimal Number | 11 | < 0.1%
Lowercase Letter | 9 | < 0.1%
Dash Punctuation | 8 | < 0.1%
Other values (3) | 5 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
503
 
1.3%
473
 
1.2%
467
 
1.2%
460
 
1.2%
447
 
1.2%
425
 
1.1%
413
 
1.1%
362
 
1.0%
295
 
0.8%
275
 
0.7%
Other values (2949) 33837
89.1%
Uppercase Letter
ValueCountFrequency (%)
E 3
18.8%
T 2
12.5%
H 2
12.5%
K 2
12.5%
R 2
12.5%
A 2
12.5%
O 1
 
6.2%
L 1
 
6.2%
D 1
 
6.2%
Decimal Number
ValueCountFrequency (%)
4 2
18.2%
1 2
18.2%
6 2
18.2%
7 2
18.2%
9 1
9.1%
3 1
9.1%
2 1
9.1%
Lowercase Letter
ValueCountFrequency (%)
a 3
33.3%
t 2
22.2%
f 1
 
11.1%
u 1
 
11.1%
s 1
 
11.1%
r 1
 
11.1%
Other Punctuation
ValueCountFrequency (%)
· 9
75.0%
, 3
 
25.0%
Dash Punctuation
ValueCountFrequency (%)
5
62.5%
- 3
37.5%
Math Symbol
ValueCountFrequency (%)
+ 2
66.7%
~ 1
33.3%
Other Symbol
ValueCountFrequency (%)
459
100.0%
Space Separator
ValueCountFrequency (%)
117
100.0%
Close Punctuation
ValueCountFrequency (%)
) 38
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Initial Punctuation
ValueCountFrequency (%)
1
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 37861 | 97.9%
Common | 687 | 1.8%
Hangul | 96 | 0.2%
Latin | 25 | 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
503
 
1.3%
473
 
1.2%
467
 
1.2%
460
 
1.2%
447
 
1.2%
425
 
1.1%
413
 
1.1%
362
 
1.0%
295
 
0.8%
275
 
0.7%
Other values (2876) 33741
89.1%
Hangul
ValueCountFrequency (%)
4
 
4.2%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
2
 
2.1%
2
 
2.1%
2
 
2.1%
2
 
2.1%
Other values (63) 69
71.9%
Common
ValueCountFrequency (%)
459
66.8%
117
 
17.0%
) 38
 
5.5%
( 37
 
5.4%
· 9
 
1.3%
5
 
0.7%
, 3
 
0.4%
- 3
 
0.4%
4 2
 
0.3%
1 2
 
0.3%
Other values (9) 12
 
1.7%
Latin
ValueCountFrequency (%)
a 3
12.0%
E 3
12.0%
t 2
 
8.0%
T 2
 
8.0%
H 2
 
8.0%
K 2
 
8.0%
R 2
 
8.0%
A 2
 
8.0%
f 1
 
4.0%
u 1
 
4.0%
Other values (5) 5
20.0%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 37503 | 97.0%
Box Drawing | 459 | 1.2%
CJK Compat Ideographs | 348 | 0.9%
ASCII | 237 | 0.6%
Hangul | 96 | 0.2%
CJK Ext A | 10 | < 0.1%
None | 9 | < 0.1%
Punctuation | 7 | < 0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
503
 
1.3%
473
 
1.3%
467
 
1.2%
460
 
1.2%
447
 
1.2%
425
 
1.1%
413
 
1.1%
362
 
1.0%
295
 
0.8%
275
 
0.7%
Other values (2795) 33383
89.0%
Box Drawing
ValueCountFrequency (%)
459
100.0%
ASCII
ValueCountFrequency (%)
117
49.4%
) 38
 
16.0%
( 37
 
15.6%
a 3
 
1.3%
, 3
 
1.3%
E 3
 
1.3%
- 3
 
1.3%
t 2
 
0.8%
4 2
 
0.8%
1 2
 
0.8%
Other values (19) 27
 
11.4%
CJK Compat Ideographs
ValueCountFrequency (%)
42
 
12.1%
33
 
9.5%
29
 
8.3%
21
 
6.0%
20
 
5.7%
17
 
4.9%
14
 
4.0%
11
 
3.2%
9
 
2.6%
8
 
2.3%
Other values (61) 144
41.4%
None
ValueCountFrequency (%)
· 9
100.0%
Punctuation
ValueCountFrequency (%)
5
71.4%
1
 
14.3%
1
 
14.3%
Hangul
ValueCountFrequency (%)
4
 
4.2%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
2
 
2.1%
2
 
2.1%
2
 
2.1%
2
 
2.1%
Other values (63) 69
71.9%
CJK Ext A
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
㦿 1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

이칭
Text

MISSING 

Distinct: 2901
Distinct (%): 99.5%
Missing: 7083
Missing (%): 70.8%
Memory size: 156.2 KiB

Length

Max length: 126
Median length: 87
Mean length: 12.760027
Min length: 1

Characters and Unicode

Total characters: 37221
Distinct characters: 2316
Distinct categories: 12
Distinct scripts: 5
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2887
Unique (%): 99.0%

Sample

1st row: 희여(希輿)|우헌(愚軒)|서강(西岡)
2nd row: 의모(義母)|의붓어머니
3rd row: 토광묘
4th row: 냉천사(冷泉寺)
5th row: 자회(子晦)|주계(朱溪)|해양(海陽)
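
The sample values suggest that each 이칭 cell holds several aliases separated by `|`, each optionally followed by its original script in parentheses. Under that assumption, a cell could be split into structured pairs as in the sketch below; the function name `parse_aliases` and the regular expression are illustrative and not part of the dataset or the profiler.

```python
import re

# An alias is a name optionally followed by "(original script)".
ALIAS_PATTERN = re.compile(r"^(?P<name>[^()]+?)(?:\((?P<original>[^()]*)\))?$")


def parse_aliases(cell):
    """Split a pipe-delimited alias cell into (name, original_script_or_None) pairs."""
    aliases = []
    for part in cell.split("|"):
        part = part.strip()
        if not part:
            continue
        match = ALIAS_PATTERN.match(part)
        if match:
            aliases.append((match.group("name").strip(), match.group("original")))
        else:
            # Keep unparseable fragments as-is rather than dropping data.
            aliases.append((part, None))
    return aliases


print(parse_aliases("의모(義母)|의붓어머니"))
```

Splitting cells this way before profiling would make the word-frequency table for this column count whole aliases instead of fragments such as `화중(和仲`.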

Value | Count | Frequency (%)
of | 17 | 0.5%
국도 | 5 | 0.1%
republic | 4 | 0.1%
화중(和仲 | 3 | 0.1%
충간(忠簡 | 3 | 0.1%
월여(月如 | 3 | 0.1%
공화국|republic | 2 | 0.1%
democratic | 2 | 0.1%
문안(文安 | 2 | 0.1%
청동방울 | 2 | 0.1%
Other values (3281) | 3307 | 98.7%

Most occurring characters

ValueCountFrequency (%)
( 4024
 
10.8%
) 4024
 
10.8%
| 2211
 
5.9%
911
 
2.4%
339
 
0.9%
286
 
0.8%
273
 
0.7%
259
 
0.7%
258
 
0.7%
212
 
0.6%
Other values (2306) 24424
65.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 25129 | 67.5%
Open Punctuation | 4032 | 10.8%
Close Punctuation | 4032 | 10.8%
Math Symbol | 2216 | 6.0%
Space Separator | 911 | 2.4%
Lowercase Letter | 498 | 1.3%
Other Punctuation | 209 | 0.6%
Uppercase Letter | 119 | 0.3%
Decimal Number | 63 | 0.2%
Dash Punctuation | 5 | < 0.1%
Other values (2) | 7 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
339
 
1.3%
286
 
1.1%
273
 
1.1%
259
 
1.0%
258
 
1.0%
212
 
0.8%
210
 
0.8%
202
 
0.8%
191
 
0.8%
189
 
0.8%
Other values (2226) 22710
90.4%
Lowercase Letter
ValueCountFrequency (%)
e 58
11.6%
i 53
10.6%
o 52
10.4%
a 41
 
8.2%
n 38
 
7.6%
l 33
 
6.6%
t 28
 
5.6%
c 25
 
5.0%
u 23
 
4.6%
p 18
 
3.6%
Other values (16) 129
25.9%
Uppercase Letter
ValueCountFrequency (%)
R 15
12.6%
S 12
 
10.1%
K 9
 
7.6%
C 9
 
7.6%
T 7
 
5.9%
A 7
 
5.9%
N 7
 
5.9%
I 6
 
5.0%
P 6
 
5.0%
G 5
 
4.2%
Other values (12) 36
30.3%
Decimal Number
ValueCountFrequency (%)
0 13
20.6%
1 10
15.9%
8 7
11.1%
9 6
9.5%
4 6
9.5%
3 5
 
7.9%
6 5
 
7.9%
2 5
 
7.9%
7 3
 
4.8%
5 3
 
4.8%
Open Punctuation
ValueCountFrequency (%)
( 4024
99.8%
4
 
0.1%
2
 
< 0.1%
1
 
< 0.1%
[ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 4024
99.8%
4
 
0.1%
2
 
< 0.1%
1
 
< 0.1%
] 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
, 206
98.6%
· 1
 
0.5%
/ 1
 
0.5%
. 1
 
0.5%
Private Use
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Math Symbol
ValueCountFrequency (%)
| 2211
99.8%
~ 5
 
0.2%
Space Separator
ValueCountFrequency (%)
911
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5
100.0%
Other Symbol
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15323 | 41.2%
Common | 11472 | 30.8%
Han | 9806 | 26.3%
Latin | 617 | 1.7%
Unknown | 3 | < 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
186
 
1.9%
121
 
1.2%
120
 
1.2%
111
 
1.1%
107
 
1.1%
89
 
0.9%
76
 
0.8%
74
 
0.8%
68
 
0.7%
67
 
0.7%
Other values (1649) 8787
89.6%
Hangul
ValueCountFrequency (%)
339
 
2.2%
286
 
1.9%
273
 
1.8%
259
 
1.7%
258
 
1.7%
212
 
1.4%
210
 
1.4%
202
 
1.3%
191
 
1.2%
189
 
1.2%
Other values (567) 12904
84.2%
Latin
ValueCountFrequency (%)
e 58
 
9.4%
i 53
 
8.6%
o 52
 
8.4%
a 41
 
6.6%
n 38
 
6.2%
l 33
 
5.3%
t 28
 
4.5%
c 25
 
4.1%
u 23
 
3.7%
p 18
 
2.9%
Other values (38) 248
40.2%
Common
ValueCountFrequency (%)
( 4024
35.1%
) 4024
35.1%
| 2211
19.3%
911
 
7.9%
, 206
 
1.8%
0 13
 
0.1%
1 10
 
0.1%
8 7
 
0.1%
9 6
 
0.1%
4 6
 
0.1%
Other values (19) 54
 
0.5%
Unknown
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15323 | 41.2%
ASCII | 12069 | 32.4%
CJK | 9736 | 26.2%
CJK Compat Ideographs | 64 | 0.2%
None | 16 | < 0.1%
CJK Ext A | 6 | < 0.1%
Geometric Shapes | 4 | < 0.1%
PUA | 3 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
( 4024
33.3%
) 4024
33.3%
| 2211
18.3%
911
 
7.5%
, 206
 
1.7%
e 58
 
0.5%
i 53
 
0.4%
o 52
 
0.4%
a 41
 
0.3%
n 38
 
0.3%
Other values (58) 451
 
3.7%
Hangul
ValueCountFrequency (%)
339
 
2.2%
286
 
1.9%
273
 
1.8%
259
 
1.7%
258
 
1.7%
212
 
1.4%
210
 
1.4%
202
 
1.3%
191
 
1.2%
189
 
1.2%
Other values (567) 12904
84.2%
CJK
ValueCountFrequency (%)
186
 
1.9%
121
 
1.2%
120
 
1.2%
111
 
1.1%
107
 
1.1%
89
 
0.9%
76
 
0.8%
74
 
0.8%
68
 
0.7%
67
 
0.7%
Other values (1619) 8717
89.5%
CJK Compat Ideographs
ValueCountFrequency (%)
15
23.4%
12
18.8%
4
 
6.2%
4
 
6.2%
4
 
6.2%
3
 
4.7%
2
 
3.1%
2
 
3.1%
2
 
3.1%
2
 
3.1%
Other values (14) 14
21.9%
None
ValueCountFrequency (%)
4
25.0%
4
25.0%
2
12.5%
2
12.5%
1
 
6.2%
1
 
6.2%
· 1
 
6.2%
ñ 1
 
6.2%
Geometric Shapes
ValueCountFrequency (%)
4
100.0%
CJK Ext A
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
PUA
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

키워드
Text

MISSING 

Distinct: 5220
Distinct (%): 98.7%
Missing: 4710
Missing (%): 47.1%
Memory size: 156.2 KiB

Length

Max length: 405
Median length: 184
Mean length: 21.313233
Min length: 1

Characters and Unicode

Total characters: 112747
Distinct characters: 2670
Distinct categories: 11
Distinct scripts: 5
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5171
Unique (%): 97.8%

Sample

1st row: 고려기(高麗伎), 서량기(西凉伎), 악서(樂書)
2nd row: 최익현, 서강문집
3rd row: 평양시 태성리, 황해도 은율군 운성리, 경상북도 경주시 조양동, 경상남도 김해시 예안리
4th row: 고종, 대한제국
5th row: 당속악(唐俗樂) 14조

Value | Count | Frequency (%)
대한불교조계종 | 67 | 0.3%
(blank) | 61 | 0.3%
읍지 | 31 | 0.2%
임진왜란 | 26 | 0.1%
송시열 | 23 | 0.1%
이황 | 19 | 0.1%
침입 | 18 | 0.1%
홍건적 | 16 | 0.1%
인조반정 | 14 | 0.1%
대흥사 | 13 | 0.1%
Other values (16448) | 19501 | 98.5%

Most occurring characters

ValueCountFrequency (%)
14511
 
12.9%
, 13708
 
12.2%
( 2507
 
2.2%
) 2505
 
2.2%
1896
 
1.7%
1304
 
1.2%
1134
 
1.0%
1034
 
0.9%
987
 
0.9%
984
 
0.9%
Other values (2660) 72177
64.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 78783 | 69.9%
Space Separator | 14511 | 12.9%
Other Punctuation | 13753 | 12.2%
Open Punctuation | 2514 | 2.2%
Close Punctuation | 2512 | 2.2%
Decimal Number | 295 | 0.3%
Lowercase Letter | 168 | 0.1%
Private Use | 121 | 0.1%
Uppercase Letter | 85 | 0.1%
Dash Punctuation | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1896
 
2.4%
1304
 
1.7%
1134
 
1.4%
1034
 
1.3%
987
 
1.3%
984
 
1.2%
919
 
1.2%
901
 
1.1%
844
 
1.1%
835
 
1.1%
Other values (2590) 67945
86.2%
Lowercase Letter
ValueCountFrequency (%)
a 16
9.5%
e 16
9.5%
n 15
 
8.9%
o 14
 
8.3%
i 14
 
8.3%
r 12
 
7.1%
s 11
 
6.5%
l 11
 
6.5%
t 11
 
6.5%
u 10
 
6.0%
Other values (15) 38
22.6%
Uppercase Letter
ValueCountFrequency (%)
S 9
 
10.6%
C 8
 
9.4%
A 6
 
7.1%
G 6
 
7.1%
T 6
 
7.1%
O 6
 
7.1%
M 5
 
5.9%
L 5
 
5.9%
E 4
 
4.7%
D 4
 
4.7%
Other values (12) 26
30.6%
Decimal Number
ValueCountFrequency (%)
1 78
26.4%
2 40
13.6%
3 31
 
10.5%
0 29
 
9.8%
4 28
 
9.5%
6 27
 
9.2%
5 25
 
8.5%
9 16
 
5.4%
8 14
 
4.7%
7 7
 
2.4%
Other Punctuation
ValueCountFrequency (%)
, 13708
99.7%
· 43
 
0.3%
. 2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 2507
99.7%
[ 6
 
0.2%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 2505
99.7%
] 6
 
0.2%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
14511
100.0%
Private Use
ValueCountFrequency (%)
󰠐 121
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 69700 | 61.8%
Common | 33590 | 29.8%
Han | 9083 | 8.1%
Latin | 253 | 0.2%
Unknown | 121 | 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
97
 
1.1%
94
 
1.0%
77
 
0.8%
70
 
0.8%
67
 
0.7%
67
 
0.7%
65
 
0.7%
57
 
0.6%
55
 
0.6%
54
 
0.6%
Other values (1805) 8380
92.3%
Hangul
ValueCountFrequency (%)
1896
 
2.7%
1304
 
1.9%
1134
 
1.6%
1034
 
1.5%
987
 
1.4%
984
 
1.4%
919
 
1.3%
901
 
1.3%
844
 
1.2%
835
 
1.2%
Other values (775) 58862
84.5%
Latin
ValueCountFrequency (%)
a 16
 
6.3%
e 16
 
6.3%
n 15
 
5.9%
o 14
 
5.5%
i 14
 
5.5%
r 12
 
4.7%
s 11
 
4.3%
l 11
 
4.3%
t 11
 
4.3%
u 10
 
4.0%
Other values (37) 123
48.6%
Common
ValueCountFrequency (%)
14511
43.2%
, 13708
40.8%
( 2507
 
7.5%
) 2505
 
7.5%
1 78
 
0.2%
· 43
 
0.1%
2 40
 
0.1%
3 31
 
0.1%
0 29
 
0.1%
4 28
 
0.1%
Other values (12) 110
 
0.3%
Unknown
ValueCountFrequency (%)
󰠐 121
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 69697 | 61.8%
ASCII | 33795 | 30.0%
CJK | 8943 | 7.9%
None | 167 | 0.1%
CJK Compat Ideographs | 138 | 0.1%
Compat Jamo | 3 | < 0.1%
Geometric Shapes | 2 | < 0.1%
CJK Ext A | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14511
42.9%
, 13708
40.6%
( 2507
 
7.4%
) 2505
 
7.4%
1 78
 
0.2%
2 40
 
0.1%
3 31
 
0.1%
0 29
 
0.1%
4 28
 
0.1%
6 27
 
0.1%
Other values (54) 331
 
1.0%
Hangul
ValueCountFrequency (%)
1896
 
2.7%
1304
 
1.9%
1134
 
1.6%
1034
 
1.5%
987
 
1.4%
984
 
1.4%
919
 
1.3%
901
 
1.3%
844
 
1.2%
835
 
1.2%
Other values (773) 58859
84.4%
None
ValueCountFrequency (%)
󰠐 121
72.5%
· 43
 
25.7%
1
 
0.6%
1
 
0.6%
ñ 1
 
0.6%
CJK
ValueCountFrequency (%)
97
 
1.1%
94
 
1.1%
77
 
0.9%
70
 
0.8%
67
 
0.7%
67
 
0.7%
65
 
0.7%
57
 
0.6%
55
 
0.6%
54
 
0.6%
Other values (1750) 8240
92.1%
CJK Compat Ideographs
ValueCountFrequency (%)
27
19.6%
20
 
14.5%
6
 
4.3%
6
 
4.3%
5
 
3.6%
5
 
3.6%
3
 
2.2%
3
 
2.2%
3
 
2.2%
3
 
2.2%
Other values (43) 57
41.3%
Compat Jamo
ValueCountFrequency (%)
2
66.7%
1
33.3%
Geometric Shapes
ValueCountFrequency (%)
2
100.0%
CJK Ext A
ValueCountFrequency (%)
1
50.0%
1
50.0%

분야
Text

Distinct: 61
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 8
Mean length: 7.3492
Min length: 5

Characters and Unicode

Total characters: 73492
Distinct characters: 90
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 예술·체육/국악
2nd row: 종교·철학/유교
3rd row: 예술·체육/회화
4th row: 사회/가족
5th row: 교육/교육

Value | Count | Frequency (%)
역사/조선시대사 | 1572 | 15.7%
종교·철학/유교 | 941 | 9.4%
역사/근대사 | 649 | 6.5%
역사/고려시대사 | 572 | 5.7%
역사/고대사 | 499 | 5.0%
교육/교육 | 489 | 4.9%
종교·철학/불교 | 476 | 4.8%
예술·체육/건축 | 329 | 3.3%
지리/인문지리 | 305 | 3.0%
지리/자연지리 | 224 | 2.2%
Other values (51) | 3944 | 39.4%

Most occurring characters

ValueCountFrequency (%)
/ 10002
 
13.6%
7179
 
9.8%
4280
 
5.8%
· 4032
 
5.5%
3565
 
4.9%
3535
 
4.8%
3486
 
4.7%
2278
 
3.1%
2247
 
3.1%
1756
 
2.4%
Other values (80) 31132
42.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 59456 | 80.9%
Other Punctuation | 14034 | 19.1%
Math Symbol | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7179
 
12.1%
4280
 
7.2%
3565
 
6.0%
3535
 
5.9%
3486
 
5.9%
2278
 
3.8%
2247
 
3.8%
1756
 
3.0%
1749
 
2.9%
1747
 
2.9%
Other values (77) 27634
46.5%
Other Punctuation
ValueCountFrequency (%)
/ 10002
71.3%
· 4032
28.7%
Math Symbol
ValueCountFrequency (%)
| 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 59456 | 80.9%
Common | 14036 | 19.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
7179
 
12.1%
4280
 
7.2%
3565
 
6.0%
3535
 
5.9%
3486
 
5.9%
2278
 
3.8%
2247
 
3.8%
1756
 
3.0%
1749
 
2.9%
1747
 
2.9%
Other values (77) 27634
46.5%
Common
ValueCountFrequency (%)
/ 10002
71.3%
· 4032
28.7%
| 2
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 59456 | 80.9%
ASCII | 10004 | 13.6%
None | 4032 | 5.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 10002
> 99.9%
| 2
 
< 0.1%
Hangul
ValueCountFrequency (%)
7179
 
12.1%
4280
 
7.2%
3565
 
6.0%
3535
 
5.9%
3486
 
5.9%
2278
 
3.8%
2247
 
3.8%
1756
 
3.0%
1749
 
2.9%
1747
 
2.9%
Other values (77) 27634
46.5%
None
ValueCountFrequency (%)
· 4032
100.0%

유형
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

인물: 2639
문헌: 1635
유적: 1074
제도: 973
개념용어: 880
Other values (12): 2799

Length

Max length: 5
Median length: 2
Mean length: 2.3967
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 유물
2nd row: 인물
3rd row: 작품
4th row: 개념용어
5th row: 유적

Common Values

Value | Count | Frequency (%)
인물 | 2639 | 26.4%
문헌 | 1635 | 16.4%
유적 | 1074 | 10.7%
제도 | 973 | 9.7%
개념용어 | 880 | 8.8%
단체 | 652 | 6.5%
작품 | 581 | 5.8%
지명/지명 | 553 | 5.5%
유물 | 290 | 2.9%
물품 | 232 | 2.3%
Other values (7) | 491 | 4.9%

Length

Histogram of lengths of the category

시대
Text

MISSING 

Distinct: 77
Distinct (%): 0.9%
Missing: 1855
Missing (%): 18.6%
Memory size: 156.2 KiB

Length

Max length: 65
Median length: 2
Mean length: 3.695887
Min length: 2

Characters and Unicode

Total characters: 30103
Distinct characters: 43
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 0.4%

Sample

1st row: 고대/삼국
2nd row: 근대
3rd row: 근대/개항기
4th row: 조선/조선 전기
5th row: 현대/현대

Value | Count | Frequency (%)
조선 | 3227 | 38.2%
현대/현대 | 1272 | 15.0%
근대 | 1061 | 12.5%
고려 | 837 | 9.9%
고대/삼국 | 295 | 3.5%
조선/조선 | 273 | 3.2%
후기 | 212 | 2.5%
고대/남북국 | 169 | 2.0%
고대/남북국/통일신라 | 136 | 1.6%
근대/일제강점기 | 113 | 1.3%
Other values (66) | 863 | 10.2%

Most occurring characters

ValueCountFrequency (%)
4861
16.1%
4157
13.8%
4037
13.4%
/ 3561
11.8%
2591
8.6%
2057
6.8%
1327
 
4.4%
1112
 
3.7%
941
 
3.1%
931
 
3.1%
Other values (33) 4528
15.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 26096 | 86.7%
Other Punctuation | 3561 | 11.8%
Space Separator | 313 | 1.0%
Math Symbol | 133 | 0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4861
18.6%
4157
15.9%
4037
15.5%
2591
9.9%
2057
7.9%
1327
 
5.1%
1112
 
4.3%
941
 
3.6%
931
 
3.6%
560
 
2.1%
Other values (30) 3522
13.5%
Other Punctuation
ValueCountFrequency (%)
/ 3561
100.0%
Space Separator
ValueCountFrequency (%)
313
100.0%
Math Symbol
ValueCountFrequency (%)
| 133
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 26096 | 86.7%
Common | 4007 | 13.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4861
18.6%
4157
15.9%
4037
15.5%
2591
9.9%
2057
7.9%
1327
 
5.1%
1112
 
4.3%
941
 
3.6%
931
 
3.6%
560
 
2.1%
Other values (30) 3522
13.5%
Common
ValueCountFrequency (%)
/ 3561
88.9%
313
 
7.8%
| 133
 
3.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 26096 | 86.7%
ASCII | 4007 | 13.3%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4861
18.6%
4157
15.9%
4037
15.5%
2591
9.9%
2057
7.9%
1327
 
5.1%
1112
 
4.3%
941
 
3.6%
931
 
3.6%
560
 
2.1%
Other values (30) 3522
13.5%
ASCII
ValueCountFrequency (%)
/ 3561
88.9%
313
 
7.8%
| 133
 
3.3%

웹사이트 주소
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 49
Median length: 49
Mean length: 49
Min length: 49

Characters and Unicode

Total characters: 490000
Distinct characters: 29
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: http://encykorea.aks.ac.kr/Contents/Item/E0051235
2nd row: http://encykorea.aks.ac.kr/Contents/Item/E0041714
3rd row: http://encykorea.aks.ac.kr/Contents/Item/E0060282
4th row: http://encykorea.aks.ac.kr/Contents/Item/E0003119
5th row: http://encykorea.aks.ac.kr/Contents/Item/E0042806

Value | Count | Frequency (%)
http://encykorea.aks.ac.kr/contents/item/e0051235 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0012897 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0049002 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0004996 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0033266 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0016514 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0036486 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0040650 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0010014 | 1 | < 0.1%
http://encykorea.aks.ac.kr/contents/item/e0010979 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

ValueCountFrequency (%)
/ 50000
 
10.2%
t 50000
 
10.2%
e 40000
 
8.2%
n 30000
 
6.1%
k 30000
 
6.1%
a 30000
 
6.1%
. 30000
 
6.1%
0 25394
 
5.2%
s 20000
 
4.1%
c 20000
 
4.1%
Other values (19) 164606
33.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 300000 | 61.2%
Other Punctuation | 90000 | 18.4%
Decimal Number | 70000 | 14.3%
Uppercase Letter | 30000 | 6.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 50000
16.7%
e 40000
13.3%
n 30000
10.0%
k 30000
10.0%
a 30000
10.0%
s 20000
 
6.7%
c 20000
 
6.7%
o 20000
 
6.7%
r 20000
 
6.7%
m 10000
 
3.3%
Other values (3) 30000
10.0%
Decimal Number
ValueCountFrequency (%)
0 25394
36.3%
1 5668
 
8.1%
6 5510
 
7.9%
4 5501
 
7.9%
5 5436
 
7.8%
3 5396
 
7.7%
2 5254
 
7.5%
9 3970
 
5.7%
7 3947
 
5.6%
8 3924
 
5.6%
Other Punctuation
ValueCountFrequency (%)
/ 50000
55.6%
. 30000
33.3%
: 10000
 
11.1%
Uppercase Letter
ValueCountFrequency (%)
I 10000
33.3%
E 10000
33.3%
C 10000
33.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 330000 | 67.3%
Common | 160000 | 32.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 50000
15.2%
e 40000
12.1%
n 30000
9.1%
k 30000
9.1%
a 30000
9.1%
s 20000
 
6.1%
c 20000
 
6.1%
o 20000
 
6.1%
r 20000
 
6.1%
I 10000
 
3.0%
Other values (6) 60000
18.2%
Common
ValueCountFrequency (%)
/ 50000
31.2%
. 30000
18.8%
0 25394
15.9%
: 10000
 
6.2%
1 5668
 
3.5%
6 5510
 
3.4%
4 5501
 
3.4%
5 5436
 
3.4%
3 5396
 
3.4%
2 5254
 
3.3%
Other values (3) 11841
 
7.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 490000 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 50000
 
10.2%
t 50000
 
10.2%
e 40000
 
8.2%
n 30000
 
6.1%
k 30000
 
6.1%
a 30000
 
6.1%
. 30000
 
6.1%
0 25394
 
5.2%
s 20000
 
4.1%
c 20000
 
4.1%
Other values (19) 164606
33.6%

Correlations

Variable | 분야 | 유형 | 시대
분야 | 1.000 | 0.942 | 0.899
유형 | 0.942 | 1.000 | 0.672
시대 | 0.899 | 0.672 | 1.000
Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
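
As a minimal sketch of what nullity correlation means, the example below computes the Pearson correlation between the missing-value indicators of two columns. It assumes missing cells are represented as `None`; the function name is illustrative, and the two indicator lists are transcribed from the 이칭 and 키워드 columns of the first ten rows in the Sample section (a placeholder string stands in for any present value).

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / math.sqrt(var_a * var_b)


# "v" marks a present value, None a missing one.
aliases = [None, "v", None, "v", None, None, None, None, None, "v"]
keywords = ["v", "v", None, None, None, None, None, None, None, "v"]
r = nullity_correlation(aliases, keywords)
print(round(r, 3))
```

A positive result means the two columns tend to be missing in the same rows; ten rows are far too few to generalize to the full 10000 observations.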

Sample

Index | 항목명 | 원어 | 이칭 | 키워드 | 분야 | 유형 | 시대 | 웹사이트 주소
48070 | 제고 | 齊鼓 | <NA> | 고려기(高麗伎), 서량기(西凉伎), 악서(樂書) | 예술·체육/국악 | 유물 | 고대/삼국 | http://encykorea.aks.ac.kr/Contents/Item/E0051235
38952 | 유원중 | 柳遠重 | 희여(希輿)|우헌(愚軒)|서강(西岡) | 최익현, 서강문집 | 종교·철학/유교 | 인물 | 근대 | http://encykorea.aks.ac.kr/Contents/Item/E0041714
56598 | 표충사아미타삼존도 | 表忠寺阿彌陀三尊圖 | <NA> | <NA> | 예술·체육/회화 | 작품 | 근대/개항기 | http://encykorea.aks.ac.kr/Contents/Item/E0060282
2849 | 계모 | 繼母 | 의모(義母)|의붓어머니 | <NA> | 사회/가족 | 개념용어 | <NA> | http://encykorea.aks.ac.kr/Contents/Item/E0003119
40014 | 은곡서당 | 隱谷書堂 | <NA> | <NA> | 교육/교육 | 유적 | 조선/조선 전기 | http://encykorea.aks.ac.kr/Contents/Item/E0042806
18379 | 문헌학 | 文獻學 | <NA> | <NA> | 언론·출판/출판 | 개념용어 | <NA> | http://encykorea.aks.ac.kr/Contents/Item/E0019755
5024 | 교보증권㈜ | 敎保證券(株) | <NA> | <NA> | 경제·산업/경제 | 단체 | 현대/현대 | http://encykorea.aks.ac.kr/Contents/Item/E0005479
32281 | 안동사범학교 | 安東師範學校 | <NA> | <NA> | 교육/교육 | 단체 | 현대/현대 | http://encykorea.aks.ac.kr/Contents/Item/E0034617
12054 | 녹둔도 | 鹿屯島 | <NA> | <NA> | 역사/조선시대사 | 지명/지명 | 조선 | http://encykorea.aks.ac.kr/Contents/Item/E0012968
11714 | 널무덤 | <NA> | 토광묘 | 평양시 태성리, 황해도 은율군 운성리, 경상북도 경주시 조양동, 경상남도 김해시 예안리 | 역사/선사문화 | 개념용어 | 선사/석기 | http://encykorea.aks.ac.kr/Contents/Item/E0012614

Index | 항목명 | 원어 | 이칭 | 키워드 | 분야 | 유형 | 시대 | 웹사이트 주소
2631 | 경주 보문사지 연화문 당간지주 | 慶州普門寺址蓮華文幢竿支柱 | <NA> | <NA> | 예술·체육/건축 | 유적 | 고대/남북국/통일신라 | http://encykorea.aks.ac.kr/Contents/Item/E0002877
47719 | 정조 | 正朝 | <NA> | 마진국(摩震國), 광평성(廣評省) | 역사/고려시대사 | 제도 | 고려 | http://encykorea.aks.ac.kr/Contents/Item/E0050868
11502 | 낭중 | 郎中 | <NA> | <NA> | 역사/조선시대사 | 제도 | 조선 | http://encykorea.aks.ac.kr/Contents/Item/E0012382
39851 | 윤치호 | 尹致昊 | 좌옹(佐翁)|이토 지코(伊東致昊) | <NA> | 역사/근대사 | 인물 | 근대/일제강점기 | http://encykorea.aks.ac.kr/Contents/Item/E0042640
27875 | 소년 | 少年 | <NA> | <NA> | 언론·출판/언론·방송 | 문헌 | 현대/현대 | http://encykorea.aks.ac.kr/Contents/Item/E0029962
918 | 강남산맥 | 江南山脈 | <NA> | <NA> | 지리/자연지리 | 지명/지명 | <NA> | http://encykorea.aks.ac.kr/Contents/Item/E0001018
20205 | 방귀온 | 房貴溫 | 옥여(玉汝)|금서(錦西) | 방사량, 방구성, 방계문, 정존, 조광조, 기묘사화 | 역사/조선시대사 | 인물 | 조선 | http://encykorea.aks.ac.kr/Contents/Item/E0021670
43638 | 이호 | 李皓 | 세보(世輔)|경평군(慶平君) | 이당, 김좌근, 서대순, 흥선대원군, 신지도 | 역사/조선시대사 | 인물 | 근대 | http://encykorea.aks.ac.kr/Contents/Item/E0046546
11358 | 남파유고 | 南坡遺稿 | <NA> | 박필조 | 종교·철학/유교 | 문헌 | 조선 | http://encykorea.aks.ac.kr/Contents/Item/E0012216
1315 | 강찬 | 姜찬 | <NA> | <NA> | 역사/근대사 | 인물 | 근대 | http://encykorea.aks.ac.kr/Contents/Item/E0001434