Overview

Dataset statistics

Number of variables: 4
Number of observations: 2823
Missing cells: 741
Missing cells (%): 6.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 88.3 KiB
Average record size in memory: 32.0 B
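
The missing-cell percentage follows from the counts above: 741 missing cells out of 2823 × 4 = 11,292 cells in total. A minimal sketch of that arithmetic, with illustrative variable names that are not part of the report:

```python
# Figures copied from the dataset statistics above.
n_variables = 4
n_observations = 2823
missing_cells = 741

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(total_cells)
print(f"{missing_pct:.1f}%")
```

Rounded to one decimal place this gives the 6.6% shown in the overview.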

Variable types

Text: 4

Dataset

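Description: A dictionary that explains, in Korean and English, terms from Korean cultural heritage related to architecture (395 terms), archaeology (884 terms), and art history (1,203 terms).
Author: 한국국제교류재단 (Korea Foundation)
URL: https://www.data.go.kr/data/3073111/fileData.do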

Alerts

한자 has 741 (26.2%) missing values (Missing)

Reproduction

Analysis started: 2024-03-15 00:27:02.266443
Analysis finished: 2024-03-15 00:27:05.024281
Duration: 2.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시작자음 (initial consonant)
Text

Distinct: 2802
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 22.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 3.3035778
Min length: 1

Characters and Unicode

Total characters: 9326
Distinct characters: 639
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
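
As a rough illustration of how such properties can be read per code point, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The report's labels correspond to the two-letter codes the module returns (for example "Other Letter" is `Lo` and "Space Separator" is `Zs`). The helper name `category_counts` is hypothetical, and the standard library does not expose script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        # Normalise to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            counts[unicodedata.category(ch)] += 1
    return counts


# A few headwords taken from the sample rows of this report.
sample = ["가교", "가께수리", "횡혈식 돌칸무덤"]
counts = category_counts(sample)
print(counts)
```

For these three values every Hangul syllable falls under `Lo` and the single space under `Zs`.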

Unique

Unique: 2781
Unique (%): 98.5%
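
The report lists both a Distinct and a Unique count. Assuming the usual reading of these two figures (distinct is the number of different values, unique is the number of values that occur exactly once), the difference can be sketched as follows. The function name and the toy list are hypothetical and not taken from the dataset.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Toy data: "a" and "d" occur once, "b" twice, "c" three times.
toy = ["a", "b", "b", "c", "c", "c", "d"]
result = distinct_and_unique(toy)
print(result)
```

Under this reading the unique count can never exceed the distinct count, which is consistent with the figures reported for each variable.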

Sample

1st row:
2nd row: 가교
3rd row: 가께수리
4th row: 가늠자
5th row: 가독권
Value | Count | Frequency (%)
토기 | 16 | 0.6%
세형 | 5 | 0.2%
동기 | 5 | 0.2%
횡혈식 | 3 | 0.1%
차형 | 3 | 0.1%
부도 | 3 | 0.1%
석도 | 3 | 0.1%
석핵 | 3 | 0.1%
입식 | 3 | 0.1%
석부 | 3 | 0.1%
Other values (2790) | 2846 | 98.4%
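
The counts in this table add up to more than the 2823 observations, which suggests that whitespace-separated tokens are counted rather than whole cells. The sample rows at the end of this report show 횡혈식 once on its own and twice inside longer terms, matching its count of 3. A rough sketch of such a token count, assuming plain whitespace splitting (the profiling tool's own tokenizer may differ) and a hypothetical helper name:

```python
from collections import Counter


def token_counts(values):
    """Count whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.split())
    return counts


# Headwords taken from the last sample rows of this report.
terms = ["횡혈식", "횡혈식 돌칸무덤", "횡혈식 석실분", "흑도"]
counts = token_counts(terms)
print(counts.most_common(1))
```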

Most occurring characters

Value | Count | Frequency (%)
(space) | 779 | 8.4%
 | 250 | 2.7%
 | 158 | 1.7%
 | 134 | 1.4%
 | 131 | 1.4%
 | 130 | 1.4%
 | 126 | 1.4%
 | 121 | 1.3%
 | 120 | 1.3%
 | 115 | 1.2%
Other values (629) | 7262 | 77.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8541 | 91.6%
Space Separator | 779 | 8.4%
Open Punctuation | 3 | < 0.1%
Close Punctuation | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 250 | 2.9%
 | 158 | 1.8%
 | 134 | 1.6%
 | 131 | 1.5%
 | 130 | 1.5%
 | 126 | 1.5%
 | 121 | 1.4%
 | 120 | 1.4%
 | 115 | 1.3%
 | 105 | 1.2%
Other values (626) | 7151 | 83.7%

Space Separator
Value | Count | Frequency (%)
(space) | 779 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8541 | 91.6%
Common | 785 | 8.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 250 | 2.9%
 | 158 | 1.8%
 | 134 | 1.6%
 | 131 | 1.5%
 | 130 | 1.5%
 | 126 | 1.5%
 | 121 | 1.4%
 | 120 | 1.4%
 | 115 | 1.3%
 | 105 | 1.2%
Other values (626) | 7151 | 83.7%

Common
Value | Count | Frequency (%)
(space) | 779 | 99.2%
( | 3 | 0.4%
) | 3 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8541 | 91.6%
ASCII | 785 | 8.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 779 | 99.2%
( | 3 | 0.4%
) | 3 | 0.4%

Hangul
Value | Count | Frequency (%)
 | 250 | 2.9%
 | 158 | 1.8%
 | 134 | 1.6%
 | 131 | 1.5%
 | 130 | 1.5%
 | 126 | 1.5%
 | 121 | 1.4%
 | 120 | 1.4%
 | 115 | 1.3%
 | 105 | 1.2%
Other values (626) | 7151 | 83.7%

한자 (Hanja, Chinese characters)
Text

MISSING 

Distinct: 1910
Distinct (%): 91.7%
Missing: 741
Missing (%): 26.2%
Memory size: 22.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 2.7137368
Min length: 1

Characters and Unicode

Total characters: 5650
Distinct characters: 1130
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1842
Unique (%): 88.5%

Sample

1st row: 駕轎
2nd row: 家督權
3rd row: 迦羅頻伽
4th row: ~裝飾
5th row: 加蘭花簪
Value | Count | Frequency (%)
土器 | 19 | 0.9%
裝飾 | 16 | 0.8%
 | 10 | 0.5%
靑銅 | 10 | 0.5%
 | 10 | 0.5%
石器 | 8 | 0.4%
天障 | 8 | 0.4%
 | 7 | 0.3%
 | 6 | 0.3%
 | 6 | 0.3%
Other values (1821) | 1981 | 95.2%

Most occurring characters

Value | Count | Frequency (%)
~ | 465 | 8.2%
 | 121 | 2.1%
 | 117 | 2.1%
 | 72 | 1.3%
 | 54 | 1.0%
 | 53 | 0.9%
 | 52 | 0.9%
 | 48 | 0.8%
 | 45 | 0.8%
 | 45 | 0.8%
Other values (1120) | 4578 | 81.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 5094 | 90.2%
Math Symbol | 465 | 8.2%
Close Punctuation | 35 | 0.6%
Open Punctuation | 34 | 0.6%
Space Separator | 15 | 0.3%
Other Punctuation | 6 | 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 121 | 2.4%
 | 117 | 2.3%
 | 72 | 1.4%
 | 54 | 1.1%
 | 53 | 1.0%
 | 52 | 1.0%
 | 48 | 0.9%
 | 45 | 0.9%
 | 45 | 0.9%
 | 44 | 0.9%
Other values (1114) | 4443 | 87.2%

Math Symbol
Value | Count | Frequency (%)
~ | 465 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 35 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 34 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 15 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 6 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 5092 | 90.1%
Common | 556 | 9.8%
Hangul | 2 | < 0.1%

Most frequent character per script

Han
Value | Count | Frequency (%)
 | 121 | 2.4%
 | 117 | 2.3%
 | 72 | 1.4%
 | 54 | 1.1%
 | 53 | 1.0%
 | 52 | 1.0%
 | 48 | 0.9%
 | 45 | 0.9%
 | 45 | 0.9%
 | 44 | 0.9%
Other values (1112) | 4441 | 87.2%

Common
Value | Count | Frequency (%)
~ | 465 | 83.6%
) | 35 | 6.3%
( | 34 | 6.1%
(space) | 15 | 2.7%
/ | 6 | 1.1%
- | 1 | 0.2%

Hangul
Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 4927 | 87.2%
ASCII | 556 | 9.8%
CJK Compat Ideographs | 165 | 2.9%
Hangul | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
~ | 465 | 83.6%
) | 35 | 6.3%
( | 34 | 6.1%
(space) | 15 | 2.7%
/ | 6 | 1.1%
- | 1 | 0.2%

CJK
Value | Count | Frequency (%)
 | 121 | 2.5%
 | 117 | 2.4%
 | 72 | 1.5%
 | 54 | 1.1%
 | 53 | 1.1%
 | 48 | 1.0%
 | 45 | 0.9%
 | 45 | 0.9%
 | 44 | 0.9%
 | 44 | 0.9%
Other values (1064) | 4284 | 86.9%

CJK Compat Ideographs
Value | Count | Frequency (%)
 | 52 | 31.5%
 | 13 | 7.9%
 | 9 | 5.5%
 | 8 | 4.8%
 | 7 | 4.2%
 | 5 | 3.0%
 | 5 | 3.0%
 | 3 | 1.8%
 | 3 | 1.8%
 | 3 | 1.8%
Other values (38) | 57 | 34.5%

Hangul
Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

독음 (reading, romanized)
Text

Distinct: 2799
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.2 KiB

Length

Max length: 33
Median length: 28
Mean length: 11.368757
Min length: 2

Characters and Unicode

Total characters: 32094
Distinct characters: 34
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2775
Unique (%): 98.3%

Sample

1st row: ga
2nd row: gagyo
3rd row: gakkesuri
4th row: ganeumja
5th row: gadokgwon
Value | Count | Frequency (%)
togi | 16 | 0.6%
donggi | 5 | 0.2%
sehyeong | 5 | 0.2%
seokbu | 3 | 0.1%
chahyeong | 3 | 0.1%
seokhaek | 3 | 0.1%
gidae | 3 | 0.1%
budo | 3 | 0.1%
seokdo | 3 | 0.1%
ipsik | 3 | 0.1%
Other values (2786) | 2847 | 98.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 5495 | 17.1%
a | 2736 | 8.5%
g | 2729 | 8.5%
n | 2640 | 8.2%
o | 2607 | 8.1%
e | 2249 | 7.0%
u | 1613 | 5.0%
i | 1490 | 4.6%
m | 1089 | 3.4%
s | 1037 | 3.2%
Other values (24) | 8409 | 26.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 25917 | 80.8%
Space Separator | 5495 | 17.1%
Dash Punctuation | 653 | 2.0%
Uppercase Letter | 21 | 0.1%
Open Punctuation | 4 | < 0.1%
Close Punctuation | 4 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 2736 | 10.6%
g | 2729 | 10.5%
n | 2640 | 10.2%
o | 2607 | 10.1%
e | 2249 | 8.7%
u | 1613 | 6.2%
i | 1490 | 5.7%
m | 1089 | 4.2%
s | 1037 | 4.0%
j | 982 | 3.8%
Other values (11) | 6745 | 26.0%

Uppercase Letter
Value | Count | Frequency (%)
G | 7 | 33.3%
D | 6 | 28.6%
S | 2 | 9.5%
N | 1 | 4.8%
M | 1 | 4.8%
B | 1 | 4.8%
Y | 1 | 4.8%
T | 1 | 4.8%
H | 1 | 4.8%

Space Separator
Value | Count | Frequency (%)
(space) | 5495 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 653 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 4 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 25938 | 80.8%
Common | 6156 | 19.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 2736 | 10.5%
g | 2729 | 10.5%
n | 2640 | 10.2%
o | 2607 | 10.1%
e | 2249 | 8.7%
u | 1613 | 6.2%
i | 1490 | 5.7%
m | 1089 | 4.2%
s | 1037 | 4.0%
j | 982 | 3.8%
Other values (20) | 6766 | 26.1%

Common
Value | Count | Frequency (%)
(space) | 5495 | 89.3%
- | 653 | 10.6%
( | 4 | 0.1%
) | 4 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 32094 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 5495 | 17.1%
a | 2736 | 8.5%
g | 2729 | 8.5%
n | 2640 | 8.2%
o | 2607 | 8.1%
e | 2249 | 7.0%
u | 1613 | 5.0%
i | 1490 | 4.6%
m | 1089 | 3.4%
s | 1037 | 3.2%
Other values (24) | 8409 | 26.2%

설명 (description)
Text

Distinct: 2663
Distinct (%): 94.3%
Missing: 0
Missing (%): 0.0%
Memory size: 22.2 KiB

Length

Max length: 657
Median length: 323
Mean length: 98.99752
Min length: 5

Characters and Unicode

Total characters: 279470
Distinct characters: 1317
Distinct categories: 14
Distinct scripts: 4
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2521
Unique (%): 89.3%

Sample

1st row: 청동 제기 술잔 중 가장 큰 것. 1개의 손잡이가 달린 몸통에 3개의 다리와 2개의 꼭지가 달려 있음 (ritual wine vessel); the largest bronze vessel used for pouring wine in rituals; features include a handle, three legs, and two knobs
2nd row: 쌍가마 ssanggama
3rd row: 궤짝의 앞면은 금속 장식을 하고 내부는 여러 단을 두거나 서랍을 두어 문서나 귀중품을 보관하는 상자 모양 가구 (cabinet); a chest with decorated metal fittings on the outside, containing several shelves or drawers inside for storing documents or precious belongings
4th row: 도자기를 만들 때 물레 성형 시 기형의 높이와 둘레 치수를 동시에 재면서 작업하는 용구(用具) (ruler); a tool for measuring both the height and width of a vessel while it is on the potter’s wheel
5th row: 가정을 이끌어 가는 권한. ‘가독’은 맏아들을 일컫기도 함 (authority of the family head); authority given to the eldest son to act as head of the family
Value | Count | Frequency (%)
a | 2272 | 4.4%
the | 1943 | 3.7%
of | 1559 | 3.0%
in | 670 | 1.3%
and | 628 | 1.2%
to | 609 | 1.2%
or | 594 | 1.1%
with | 511 | 1.0%
on | 378 | 0.7%
for | 374 | 0.7%
Other values (13980) | 42579 | 81.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 50203 | 18.0%
e | 18639 | 6.7%
a | 15747 | 5.6%
o | 13995 | 5.0%
t | 13233 | 4.7%
n | 12674 | 4.5%
i | 11941 | 4.3%
r | 11296 | 4.0%
s | 10026 | 3.6%
h | 7483 | 2.7%
Other values (1307) | 114233 | 40.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 169267 | 60.6%
Space Separator | 50203 | 18.0%
Other Letter | 48215 | 17.3%
Other Punctuation | 4431 | 1.6%
Close Punctuation | 2181 | 0.8%
Open Punctuation | 2180 | 0.8%
Uppercase Letter | 1257 | 0.4%
Dash Punctuation | 950 | 0.3%
Final Punctuation | 298 | 0.1%
Decimal Number | 240 | 0.1%
Other values (4) | 248 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1282 | 2.7%
 | 1241 | 2.6%
 | 1206 | 2.5%
 | 1182 | 2.5%
 | 1079 | 2.2%
 | 947 | 2.0%
 | 860 | 1.8%
 | 775 | 1.6%
 | 736 | 1.5%
 | 673 | 1.4%
Other values (1223) | 38234 | 79.3%

Lowercase Letter
Value | Count | Frequency (%)
e | 18639 | 11.0%
a | 15747 | 9.3%
o | 13995 | 8.3%
t | 13233 | 7.8%
n | 12674 | 7.5%
i | 11941 | 7.1%
r | 11296 | 6.7%
s | 10026 | 5.9%
h | 7483 | 4.4%
d | 7414 | 4.4%
Other values (16) | 46819 | 27.7%

Uppercase Letter
Value | Count | Frequency (%)
B | 387 | 30.8%
C | 93 | 7.4%
S | 85 | 6.8%
G | 78 | 6.2%
K | 77 | 6.1%
A | 75 | 6.0%
N | 48 | 3.8%
J | 48 | 3.8%
T | 46 | 3.7%
H | 43 | 3.4%
Other values (16) | 277 | 22.0%

Decimal Number
Value | Count | Frequency (%)
1 | 56 | 23.3%
0 | 52 | 21.7%
3 | 38 | 15.8%
2 | 35 | 14.6%
4 | 17 | 7.1%
5 | 17 | 7.1%
6 | 10 | 4.2%
7 | 6 | 2.5%
8 | 5 | 2.1%
9 | 4 | 1.7%

Other Punctuation
Value | Count | Frequency (%)
; | 2155 | 48.6%
, | 1590 | 35.9%
. | 618 | 13.9%
/ | 43 | 1.0%
: | 21 | 0.5%
 | 2 | < 0.1%
! | 2 | < 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 2158 | 98.9%
] | 23 | 1.1%

Open Punctuation
Value | Count | Frequency (%)
( | 2157 | 98.9%
[ | 23 | 1.1%

Final Punctuation
Value | Count | Frequency (%)
 | 251 | 84.2%
 | 47 | 15.8%

Math Symbol
Value | Count | Frequency (%)
 | 159 | 94.1%
~ | 10 | 5.9%

Initial Punctuation
Value | Count | Frequency (%)
 | 47 | 72.3%
 | 18 | 27.7%

Other Symbol
Value | Count | Frequency (%)
 | 12 | 92.3%
 | 1 | 7.7%

Space Separator
Value | Count | Frequency (%)
(space) | 50203 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 950 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
´ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 170524 | 61.0%
Common | 60731 | 21.7%
Hangul | 47689 | 17.1%
Han | 526 | 0.2%
0.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1282 | 2.7%
 | 1241 | 2.6%
 | 1206 | 2.5%
 | 1182 | 2.5%
 | 1079 | 2.3%
 | 947 | 2.0%
 | 860 | 1.8%
 | 775 | 1.6%
 | 736 | 1.5%
 | 673 | 1.4%
Other values (905) | 37708 | 79.1%

Han
Value | Count | Frequency (%)
 | 25 | 4.8%
 | 8 | 1.5%
 | 7 | 1.3%
 | 7 | 1.3%
 | 7 | 1.3%
 | 6 | 1.1%
 | 6 | 1.1%
 | 5 | 1.0%
 | 5 | 1.0%
 | 5 | 1.0%
Other values (308) | 445 | 84.6%

Latin
Value | Count | Frequency (%)
e | 18639 | 10.9%
a | 15747 | 9.2%
o | 13995 | 8.2%
t | 13233 | 7.8%
n | 12674 | 7.4%
i | 11941 | 7.0%
r | 11296 | 6.6%
s | 10026 | 5.9%
h | 7483 | 4.4%
d | 7414 | 4.3%
Other values (42) | 48076 | 28.2%

Common
Value | Count | Frequency (%)
(space) | 50203 | 82.7%
) | 2158 | 3.6%
( | 2157 | 3.6%
; | 2155 | 3.5%
, | 1590 | 2.6%
- | 950 | 1.6%
. | 618 | 1.0%
 | 251 | 0.4%
 | 159 | 0.3%
1 | 56 | 0.1%
Other values (22) | 434 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 230717 | 82.6%
Hangul | 47688 | 17.1%
CJK | 502 | 0.2%
Punctuation | 363 | 0.1%
Arrows | 159 | 0.1%
CJK Compat Ideographs | 24 | < 0.1%
Letterlike Symbols | 12 | < 0.1%
None | 3 | < 0.1%
CJK Compat | 1 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 50203 | 21.8%
e | 18639 | 8.1%
a | 15747 | 6.8%
o | 13995 | 6.1%
t | 13233 | 5.7%
n | 12674 | 5.5%
i | 11941 | 5.2%
r | 11296 | 4.9%
s | 10026 | 4.3%
h | 7483 | 3.2%
Other values (65) | 65480 | 28.4%

Hangul
Value | Count | Frequency (%)
 | 1282 | 2.7%
 | 1241 | 2.6%
 | 1206 | 2.5%
 | 1182 | 2.5%
 | 1079 | 2.3%
 | 947 | 2.0%
 | 860 | 1.8%
 | 775 | 1.6%
 | 736 | 1.5%
 | 673 | 1.4%
Other values (904) | 37707 | 79.1%

Punctuation
Value | Count | Frequency (%)
 | 251 | 69.1%
 | 47 | 12.9%
 | 47 | 12.9%
 | 18 | 5.0%

Arrows
Value | Count | Frequency (%)
 | 159 | 100.0%

CJK
Value | Count | Frequency (%)
 | 25 | 5.0%
 | 8 | 1.6%
 | 7 | 1.4%
 | 7 | 1.4%
 | 7 | 1.4%
 | 6 | 1.2%
 | 6 | 1.2%
 | 5 | 1.0%
 | 5 | 1.0%
 | 5 | 1.0%
Other values (290) | 421 | 83.9%

Letterlike Symbols
Value | Count | Frequency (%)
 | 12 | 100.0%

CJK Compat Ideographs
Value | Count | Frequency (%)
 | 4 | 16.7%
 | 2 | 8.3%
 | 2 | 8.3%
 | 2 | 8.3%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
Other values (8) | 8 | 33.3%

None
Value | Count | Frequency (%)
 | 2 | 66.7%
´ | 1 | 33.3%

CJK Compat
Value | Count | Frequency (%)
 | 1 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
 | 1 | 100.0%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
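
To make the idea concrete, the sketch below builds a one-column nullity mask in plain Python from the 한자 values of the first ten sample rows listed in this report, with `None` standing in for `<NA>`. The helper name `nullity` and the text rendering are illustrative assumptions; the report itself draws these plots as images.

```python
# 한자 values of sample rows 0 to 9; None marks a missing value (<NA>).
hanja = [None, "駕轎", None, None, "家督權", "迦羅頻伽", None, None, "~裝飾", "加蘭花簪"]


def nullity(column):
    """Return a per-row missing mask and the number of missing values."""
    mask = [value is None for value in column]
    return mask, sum(mask)


mask, n_missing = nullity(hanja)

# Text stand-in for one matrix column: '#' is present, '.' is missing.
rendered = "".join("." if is_missing else "#" for is_missing in mask)
print(rendered)
print(f"{n_missing} of {len(hanja)} values missing")
```

Half of these ten rows lack a 한자 value, a higher share than the 26.2% reported for the full column, so the first rows are not representative of the whole dataset.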

Sample

Index | 시작자음 | 한자 | 독음 | 설명
0 |  | <NA> | ga | 청동 제기 술잔 중 가장 큰 것. 1개의 손잡이가 달린 몸통에 3개의 다리와 2개의 꼭지가 달려 있음 (ritual wine vessel); the largest bronze vessel used for pouring wine in rituals; features include a handle, three legs, and two knobs
1 | 가교 | 駕轎 | gagyo | 쌍가마 ssanggama
2 | 가께수리 | <NA> | gakkesuri | 궤짝의 앞면은 금속 장식을 하고 내부는 여러 단을 두거나 서랍을 두어 문서나 귀중품을 보관하는 상자 모양 가구 (cabinet); a chest with decorated metal fittings on the outside, containing several shelves or drawers inside for storing documents or precious belongings
3 | 가늠자 | <NA> | ganeumja | 도자기를 만들 때 물레 성형 시 기형의 높이와 둘레 치수를 동시에 재면서 작업하는 용구(用具) (ruler); a tool for measuring both the height and width of a vessel while it is on the potter’s wheel
4 | 가독권 | 家督權 | gadokgwon | 가정을 이끌어 가는 권한. ‘가독’은 맏아들을 일컫기도 함 (authority of the family head); authority given to the eldest son to act as head of the family
5 | 가라빈가 | 迦羅頻伽 | garabinga | 가릉빈가 gareungbinga
6 | 가락바퀴 | <NA> | garak-bakwi | 물레로 실을 자을 때 쓰는, 가락에 끼워 회전을 돕는 바퀴. 석제품과 토제품이 있음 (spindle whorl); a disc made of either stone or pottery sherd, attached to a spindle to help rotation when spinning yarn
7 | 가락지 | <NA> | garakji | 손가락에 끼는 꾸미개의 하나 (ring); a ring worn on the finger
8 | 가락지장식 | ~裝飾 | garakji-jangsik | 감잡이 gamjabi
9 | 가란화잠 | 加蘭花簪 | garanhwajam | 난초비녀 nancho-binyeo

Index | 시작자음 | 한자 | 독음 | 설명
2813 | 횡구식 | 橫口式 | hoenggusik | 앞트기식 apteugisik
2814 | 횡혈식 | 橫穴式 | hoenghyeolsik | 굴식 gulsik
2815 | 횡혈식 돌칸무덤 | 橫穴式~ | hoenghyeolsik dolkan-mudeom | (북) → 굴식 돌방무덤 gulsik dolbang-mudeom
2816 | 횡혈식 석실분 | 橫穴式石室墳 | hoenghyeolsik seoksilbun | 굴식 돌방무덤 gulsik dolbang-mudeom
2817 | 후륜 | 後輪 | huryun | 뒤가리개 dwigarigae
2818 | 휘모리 | <NA> | hwimori | 우리 음악의 가장 빠른 장단 (hwimori rhythm); the fastest tempo in Korean music
2819 | 흑도 | 黑陶 | heukdo | 검은간토기 geomeun-gantogi
2820 | 흑유 | 黑釉 | heugyu | 철분이 많이 함유된 유약을 씌운 후 번조하여 흑갈색으로 된 것 (black glaze); a glaze high in iron content, which gives the vessel a black hue when fired
2821 | 흙불 | ~佛 | heukbul | 토불 tobul
2822 | 흰자기 | ~瓷(磁)器 | huinjagi | 백자 baekja