Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 146
Missing cells (%): 0.3%
Duplicate rows: 198
Duplicate rows (%): 2.0%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
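The percentages above can be cross-checked against the raw counts. The sketch below is a minimal arithmetic check, assuming that missing cells are divided by variables times observations and that the average record size is total memory divided by observations; those formulas are an assumption that happens to reproduce the reported figures, not something taken from the ydata-profiling source. The per-variable missing counts are the "Missing" values listed in the Variables section.

```python
# Figures copied from the report; the formulas are assumptions that
# reproduce the reported values, not confirmed library behaviour.
n_vars = 5
n_obs = 10_000
missing_per_variable = [12, 33, 79, 22, 0]  # "Missing" count of each variable
duplicate_rows = 198
total_kib = 468.8

missing_cells = sum(missing_per_variable)
missing_pct = 100 * missing_cells / (n_vars * n_obs)
duplicate_pct = 100 * duplicate_rows / n_obs
avg_record_bytes = total_kib * 1024 / n_obs

print(missing_cells)               # 146
print(round(missing_pct, 1))       # 0.3
print(round(duplicate_pct, 1))     # 2.0
print(round(avg_record_bytes, 1))  # 48.0
```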

Variable types

Text: 4
Categorical: 1

Dataset

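Description: Book materials database (도서자료DB) of the National Folk Museum of Korea (국립민속박물관) reference library
Author: Ministry of Culture, Sports and Tourism, National Folk Museum of Korea (문화체육관광부 국립민속박물관)
URL: https://www.data.go.kr/data/3074339/fileData.do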

Alerts

Dataset has 198 (2.0%) duplicate rows (Duplicates)
자료실 is highly imbalanced (66.7%) (Imbalance)

Reproduction

Analysis started: 2023-12-12 11:16:00.807717
Analysis finished: 2023-12-12 11:16:07.006949
Duration: 6.2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

서명사항
Text

Distinct: 9652
Distinct (%): 96.6%
Missing: 12
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length: 334
Median length: 167
Mean length: 22.211954
Min length: 1
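The reported mean lengths are consistent with dividing each variable's total character count by its number of non-missing rows. The sketch below checks that relationship using figures copied from the report; the formula itself is an assumption that matches the numbers. The mapping of the third and fourth statistics blocks to 발행자 and 발행년 is inferred from the column order in the sample table.

```python
# (total characters, missing rows, reported mean length) per variable.
# The 발행자 and 발행년 labels are inferred from the sample table's column order.
N_OBS = 10_000
reported = {
    "서명사항": (221853, 12, 22.211954),
    "저자": (104067, 33, 10.441156),
    "발행자": (74729, 79, 7.532406),
    "발행년": (41867, 22, 4.195931),
}

def mean_length(total_chars, missing, n_obs=N_OBS):
    """Average string length over the non-missing rows only."""
    return total_chars / (n_obs - missing)

for name, (total_chars, missing, expected) in reported.items():
    computed = mean_length(total_chars, missing)
    print(name, round(computed, 6), expected)
```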

Characters and Unicode

Total characters: 221853
Distinct characters: 3044
Distinct categories: 18
Distinct scripts: 7
Distinct blocks: 15
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
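As an illustration of these character properties, the following sketch counts general categories with Python's standard `unicodedata` module. It is not the report's own implementation; the helper name `category_counts` is hypothetical, and the input is one of the sample titles. `unicodedata.category` returns two-letter codes, where `Lo` is Other Letter, `Zs` is Space Separator, `Po` is Other Punctuation and `Nd` is Decimal Number.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "漢語大辭典 . 2"  # 5th sample row of the title variable
counts = category_counts(sample)
print(counts)
# Five Han characters (Lo), two spaces (Zs), one period (Po), one digit (Nd).
```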

Unique

Unique: 9366
Unique (%): 93.8%
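"Distinct" and "Unique" measure different things. A minimal sketch, assuming "Distinct" counts the different non-missing values and "Unique" counts the values that occur exactly once (an interpretation consistent with Unique never exceeding Distinct in this report); the helper name and the toy list are illustrative only.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Toy column: one repeated title, one missing value, two one-off titles.
titles = ["민속과 종교", "문화의 세계화", "민속과 종교", None, "漢語大辭典 . 2"]
distinct, unique = distinct_and_unique(titles)
print(distinct, unique)  # 3 2
```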

Sample

1st row: (전통문화)한국의 전통예술 : 기지시줄다리기
2nd row: 민속과 종교
3rd row: 국왕경응조무구정탑원기 : 정밀 학술 조사보고서
4th row: 문화의 세계화
5th row: 漢語大辭典 . 2
Value | Count | Frequency (%)
(not preserved) | 5200 | 12.0%
of | 548 | 1.3%
the | 453 | 1.0%
1 | 363 | 0.8%
2 | 361 | 0.8%
(not preserved) | 245 | 0.6%
and | 242 | 0.6%
3 | 226 | 0.5%
in | 197 | 0.5%
한국의 | 178 | 0.4%
Other values (19184) | 35418 | 81.6%
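The word table is what a whitespace tokenizer produces when it lowercases each value before counting. The sketch below is one plausible way to build such a table, assuming lowercase-and-split tokenization; the exact tokenizer used by the report is not shown in this document. The inputs are three titles taken from the sample rows.

```python
from collections import Counter

def word_frequencies(values):
    """Lowercase each non-missing value, split on whitespace, count the tokens."""
    words = Counter()
    for value in values:
        if value is None:
            continue
        words.update(value.lower().split())
    return words

titles = [
    "(Das)Bild im Islam : Ein Verbot und seine Folgen",
    "Arbeiterfamilie und sozialer Aufstieg",
    "Roofs and Lines :",
]
freq = word_frequencies(titles)
total = sum(freq.values())
for word, count in freq.most_common(3):
    print(f"{word} | {count} | {100 * count / total:.1f}%")
```

Note that punctuation surrounded by spaces, such as the lone colon, becomes a token of its own under this scheme.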

Most occurring characters

Value | Count | Frequency (%)
(space) | 36509 | 16.5%
e | 6239 | 2.8%
. | 4459 | 2.0%
n | 4267 | 1.9%
a | 3989 | 1.8%
i | 3962 | 1.8%
o | 3933 | 1.8%
r | 3438 | 1.5%
t | 3421 | 1.5%
: | 3265 | 1.5%
Other values (3034) | 148371 | 66.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 97566 | 44.0%
Lowercase Letter | 48916 | 22.0%
Space Separator | 36509 | 16.5%
Decimal Number | 12844 | 5.8%
Other Punctuation | 10283 | 4.6%
Uppercase Letter | 9600 | 4.3%
Open Punctuation | 2048 | 0.9%
Close Punctuation | 2047 | 0.9%
Math Symbol | 986 | 0.4%
Dash Punctuation | 951 | 0.4%
Other values (8) | 103 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2314
 
2.4%
1432
 
1.5%
1373
 
1.4%
1274
 
1.3%
1204
 
1.2%
1159
 
1.2%
1134
 
1.2%
946
 
1.0%
919
 
0.9%
893
 
0.9%
Other values (2860) 84918
87.0%
Uppercase Letter
ValueCountFrequency (%)
A 766
 
8.0%
T 707
 
7.4%
E 706
 
7.4%
S 698
 
7.3%
I 559
 
5.8%
N 490
 
5.1%
K 489
 
5.1%
R 477
 
5.0%
C 463
 
4.8%
O 452
 
4.7%
Other values (47) 3793
39.5%
Lowercase Letter
ValueCountFrequency (%)
e 6239
12.8%
n 4267
 
8.7%
a 3989
 
8.2%
i 3962
 
8.1%
o 3933
 
8.0%
r 3438
 
7.0%
t 3421
 
7.0%
s 2939
 
6.0%
l 2338
 
4.8%
u 2251
 
4.6%
Other values (46) 12139
24.8%
Other Punctuation
ValueCountFrequency (%)
. 4459
43.4%
: 3265
31.8%
1666
 
16.2%
· 623
 
6.1%
' 128
 
1.2%
/ 49
 
0.5%
& 32
 
0.3%
" 26
 
0.3%
! 16
 
0.2%
; 14
 
0.1%
Other values (3) 5
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 3195
24.9%
2 2113
16.5%
0 1659
12.9%
9 1120
 
8.7%
3 1078
 
8.4%
5 873
 
6.8%
4 809
 
6.3%
7 687
 
5.3%
8 670
 
5.2%
6 640
 
5.0%
Letter Number
ValueCountFrequency (%)
33
37.5%
33
37.5%
8
 
9.1%
8
 
9.1%
2
 
2.3%
2
 
2.3%
1
 
1.1%
1
 
1.1%
Math Symbol
ValueCountFrequency (%)
= 856
86.8%
~ 92
 
9.3%
21
 
2.1%
> 6
 
0.6%
+ 5
 
0.5%
< 5
 
0.5%
1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 2002
97.8%
20
 
1.0%
[ 17
 
0.8%
8
 
0.4%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 2003
97.9%
19
 
0.9%
] 17
 
0.8%
8
 
0.4%
Other Number
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
66.7%
˙ 1
33.3%
Final Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
36509
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 951
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%
Initial Punctuation
ValueCountFrequency (%)
1
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 65683 | 29.6%
Hangul | 58952 | 26.6%
Latin | 58172 | 26.2%
Han | 36813 | 16.6%
Hiragana | 1062 | 0.5%
Katakana | 739 | 0.3%
Cyrillic | 432 | 0.2%

Most frequent character per script

Han
ValueCountFrequency (%)
1134
 
3.1%
919
 
2.5%
796
 
2.2%
540
 
1.5%
526
 
1.4%
519
 
1.4%
491
 
1.3%
490
 
1.3%
442
 
1.2%
392
 
1.1%
Other values (1838) 30564
83.0%
Hangul
ValueCountFrequency (%)
2314
 
3.9%
1432
 
2.4%
1373
 
2.3%
1274
 
2.2%
1204
 
2.0%
1159
 
2.0%
946
 
1.6%
893
 
1.5%
818
 
1.4%
786
 
1.3%
Other values (875) 46753
79.3%
Katakana
ValueCountFrequency (%)
67
 
9.1%
50
 
6.8%
38
 
5.1%
34
 
4.6%
34
 
4.6%
29
 
3.9%
27
 
3.7%
26
 
3.5%
24
 
3.2%
22
 
3.0%
Other values (67) 388
52.5%
Latin
ValueCountFrequency (%)
e 6239
 
10.7%
n 4267
 
7.3%
a 3989
 
6.9%
i 3962
 
6.8%
o 3933
 
6.8%
r 3438
 
5.9%
t 3421
 
5.9%
s 2939
 
5.1%
l 2338
 
4.0%
u 2251
 
3.9%
Other values (52) 21395
36.8%
Hiragana
ValueCountFrequency (%)
401
37.8%
138
 
13.0%
38
 
3.6%
36
 
3.4%
31
 
2.9%
31
 
2.9%
28
 
2.6%
23
 
2.2%
20
 
1.9%
20
 
1.9%
Other values (50) 296
27.9%
Cyrillic
ValueCountFrequency (%)
О 27
 
6.2%
И 25
 
5.8%
А 22
 
5.1%
а 18
 
4.2%
о 18
 
4.2%
Н 17
 
3.9%
и 16
 
3.7%
К 15
 
3.5%
Е 15
 
3.5%
В 14
 
3.2%
Other values (49) 245
56.7%
Common
ValueCountFrequency (%)
36509
55.6%
. 4459
 
6.8%
: 3265
 
5.0%
1 3195
 
4.9%
2 2113
 
3.2%
) 2003
 
3.0%
( 2002
 
3.0%
1666
 
2.5%
0 1659
 
2.5%
9 1120
 
1.7%
Other values (43) 7692
 
11.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 121381 | 54.7%
Hangul | 58898 | 26.5%
CJK | 35997 | 16.2%
None | 2375 | 1.1%
Hiragana | 1062 | 0.5%
CJK Compat Ideographs | 816 | 0.4%
Katakana | 739 | 0.3%
Cyrillic | 432 | 0.2%
Number Forms | 88 | < 0.1%
Compat Jamo | 54 | < 0.1%
Other values (5) | 11 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
36509
30.1%
e 6239
 
5.1%
. 4459
 
3.7%
n 4267
 
3.5%
a 3989
 
3.3%
i 3962
 
3.3%
o 3933
 
3.2%
r 3438
 
2.8%
t 3421
 
2.8%
: 3265
 
2.7%
Other values (75) 47899
39.5%
Hangul
ValueCountFrequency (%)
2314
 
3.9%
1432
 
2.4%
1373
 
2.3%
1274
 
2.2%
1204
 
2.0%
1159
 
2.0%
946
 
1.6%
893
 
1.5%
818
 
1.4%
786
 
1.3%
Other values (863) 46699
79.3%
None
ValueCountFrequency (%)
1666
70.1%
· 623
 
26.2%
21
 
0.9%
20
 
0.8%
19
 
0.8%
8
 
0.3%
8
 
0.3%
æ 4
 
0.2%
ß 2
 
0.1%
2
 
0.1%
Other values (2) 2
 
0.1%
CJK
ValueCountFrequency (%)
1134
 
3.2%
919
 
2.6%
796
 
2.2%
540
 
1.5%
526
 
1.5%
519
 
1.4%
491
 
1.4%
490
 
1.4%
442
 
1.2%
392
 
1.1%
Other values (1751) 29748
82.6%
Hiragana
ValueCountFrequency (%)
401
37.8%
138
 
13.0%
38
 
3.6%
36
 
3.4%
31
 
2.9%
31
 
2.9%
28
 
2.6%
23
 
2.2%
20
 
1.9%
20
 
1.9%
Other values (50) 296
27.9%
CJK Compat Ideographs
ValueCountFrequency (%)
145
17.8%
86
 
10.5%
77
 
9.4%
59
 
7.2%
49
 
6.0%
39
 
4.8%
36
 
4.4%
32
 
3.9%
30
 
3.7%
17
 
2.1%
Other values (77) 246
30.1%
Katakana
ValueCountFrequency (%)
67
 
9.1%
50
 
6.8%
38
 
5.1%
34
 
4.6%
34
 
4.6%
29
 
3.9%
27
 
3.7%
26
 
3.5%
24
 
3.2%
22
 
3.0%
Other values (67) 388
52.5%
Number Forms
ValueCountFrequency (%)
33
37.5%
33
37.5%
8
 
9.1%
8
 
9.1%
2
 
2.3%
2
 
2.3%
1
 
1.1%
1
 
1.1%
Compat Jamo
ValueCountFrequency (%)
28
51.9%
6
 
11.1%
5
 
9.3%
4
 
7.4%
3
 
5.6%
2
 
3.7%
1
 
1.9%
1
 
1.9%
1
 
1.9%
1
 
1.9%
Other values (2) 2
 
3.7%
Cyrillic
ValueCountFrequency (%)
О 27
 
6.2%
И 25
 
5.8%
А 22
 
5.1%
а 18
 
4.2%
о 18
 
4.2%
Н 17
 
3.9%
и 16
 
3.7%
К 15
 
3.5%
Е 15
 
3.5%
В 14
 
3.2%
Other values (49) 245
56.7%
Geometric Shapes
ValueCountFrequency (%)
2
100.0%
Enclosed Alphanum
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Punctuation
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Math Operators
ValueCountFrequency (%)
1
100.0%
Modifier Letters
ValueCountFrequency (%)
˙ 1
100.0%

저자
Text

Distinct: 6532
Distinct (%): 65.5%
Missing: 33
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 158
Median length: 106
Mean length: 10.441156
Min length: 2

Characters and Unicode

Total characters: 104067
Distinct characters: 2146
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 11
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5519
Unique (%): 55.4%

Sample

1st row: 문화체육부 문화재관리국
2nd row: 비교민속학회 편
3rd row: 문화재청, 불교문화재연구소 [편]
4th row: 바르니에 쟝-피에르 지음 ; 주형일 옮김
5th row: 韓語大辭典編輯委員會
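Value | Count | Frequency (%)
(not preserved) | 1545 | 7.1%
(not preserved) | 1390 | 6.4%
(not preserved) | 566 | 2.6%
(not preserved) | 520 | 2.4%
지음 | 515 | 2.4%
국립민속박물관 | 414 | 1.9%
(not preserved) | 347 | 1.6%
옮김 | 196 | 0.9%
(not preserved) | 123 | 0.6%
(not preserved) | 110 | 0.5%
Other values (8682) | 15940 | 73.6%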

Most occurring characters

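Value | Count | Frequency (%)
(space) | 12981 | 12.5%
(not preserved) | 2022 | 1.9%
e | 1827 | 1.8%
(not preserved) | 1783 | 1.7%
[ | 1475 | 1.4%
] | 1473 | 1.4%
; | 1362 | 1.3%
(not preserved) | 1337 | 1.3%
a | 1299 | 1.2%
(not preserved) | 1273 | 1.2%
Other values (2136) | 77235 | 74.2%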

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66750 | 64.1%
Lowercase Letter | 13921 | 13.4%
Space Separator | 12981 | 12.5%
Uppercase Letter | 4416 | 4.2%
Other Punctuation | 2882 | 2.8%
Open Punctuation | 1493 | 1.4%
Close Punctuation | 1491 | 1.4%
Decimal Number | 80 | 0.1%
Dash Punctuation | 48 | < 0.1%
Math Symbol | 2 | < 0.1%
Other values (2) | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2022
 
3.0%
1783
 
2.7%
1337
 
2.0%
1273
 
1.9%
1182
 
1.8%
1086
 
1.6%
1059
 
1.6%
981
 
1.5%
898
 
1.3%
846
 
1.3%
Other values (1995) 54283
81.3%
Uppercase Letter
ValueCountFrequency (%)
S 324
 
7.3%
A 299
 
6.8%
M 288
 
6.5%
K 266
 
6.0%
B 256
 
5.8%
E 248
 
5.6%
H 245
 
5.5%
R 223
 
5.0%
C 211
 
4.8%
N 198
 
4.5%
Other values (44) 1858
42.1%
Lowercase Letter
ValueCountFrequency (%)
e 1827
13.1%
a 1299
 
9.3%
r 1257
 
9.0%
n 1186
 
8.5%
i 1066
 
7.7%
o 949
 
6.8%
t 824
 
5.9%
s 732
 
5.3%
l 721
 
5.2%
u 585
 
4.2%
Other values (43) 3475
25.0%
Decimal Number
ValueCountFrequency (%)
2 26
32.5%
0 22
27.5%
1 13
16.2%
9 6
 
7.5%
5 4
 
5.0%
4 4
 
5.0%
7 2
 
2.5%
3 2
 
2.5%
8 1
 
1.2%
Other Punctuation
ValueCountFrequency (%)
; 1362
47.3%
747
25.9%
: 370
 
12.8%
. 345
 
12.0%
· 33
 
1.1%
& 10
 
0.3%
' 8
 
0.3%
/ 7
 
0.2%
Open Punctuation
ValueCountFrequency (%)
[ 1475
98.8%
( 13
 
0.9%
3
 
0.2%
2
 
0.1%
Close Punctuation
ValueCountFrequency (%)
] 1473
98.8%
) 13
 
0.9%
3
 
0.2%
2
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 45
93.8%
2
 
4.2%
1
 
2.1%
Math Symbol
ValueCountFrequency (%)
+ 1
50.0%
= 1
50.0%
Modifier Symbol
ValueCountFrequency (%)
¨ 1
50.0%
˙ 1
50.0%
Space Separator
ValueCountFrequency (%)
12981
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 43154 | 41.5%
Han | 23181 | 22.3%
Common | 18980 | 18.2%
Latin | 17997 | 17.3%
Cyrillic | 340 | 0.3%
Katakana | 326 | 0.3%
Hiragana | 89 | 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
981
 
4.2%
785
 
3.4%
697
 
3.0%
679
 
2.9%
673
 
2.9%
596
 
2.6%
580
 
2.5%
497
 
2.1%
486
 
2.1%
435
 
1.9%
Other values (1286) 16772
72.4%
Hangul
ValueCountFrequency (%)
2022
 
4.7%
1783
 
4.1%
1337
 
3.1%
1273
 
2.9%
1182
 
2.7%
1086
 
2.5%
1059
 
2.5%
898
 
2.1%
846
 
2.0%
825
 
1.9%
Other values (608) 30843
71.5%
Katakana
ValueCountFrequency (%)
22
 
6.7%
21
 
6.4%
18
 
5.5%
18
 
5.5%
16
 
4.9%
12
 
3.7%
12
 
3.7%
12
 
3.7%
11
 
3.4%
10
 
3.1%
Other values (52) 174
53.4%
Latin
ValueCountFrequency (%)
e 1827
 
10.2%
a 1299
 
7.2%
r 1257
 
7.0%
n 1186
 
6.6%
i 1066
 
5.9%
o 949
 
5.3%
t 824
 
4.6%
s 732
 
4.1%
l 721
 
4.0%
u 585
 
3.3%
Other values (44) 7551
42.0%
Cyrillic
ValueCountFrequency (%)
И 22
 
6.5%
А 20
 
5.9%
С 20
 
5.9%
Т 17
 
5.0%
К 17
 
5.0%
О 16
 
4.7%
Н 14
 
4.1%
а 14
 
4.1%
н 14
 
4.1%
Е 12
 
3.5%
Other values (43) 174
51.2%
Common
ValueCountFrequency (%)
12981
68.4%
[ 1475
 
7.8%
] 1473
 
7.8%
; 1362
 
7.2%
747
 
3.9%
: 370
 
1.9%
. 345
 
1.8%
- 45
 
0.2%
· 33
 
0.2%
2 26
 
0.1%
Other values (24) 123
 
0.6%
Hiragana
ValueCountFrequency (%)
18
20.2%
11
12.4%
8
 
9.0%
8
 
9.0%
6
 
6.7%
6
 
6.7%
4
 
4.5%
4
 
4.5%
2
 
2.2%
2
 
2.2%
Other values (19) 20
22.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 43152 | 41.5%
ASCII | 36176 | 34.8%
CJK | 22603 | 21.7%
None | 798 | 0.8%
CJK Compat Ideographs | 578 | 0.6%
Cyrillic | 340 | 0.3%
Katakana | 326 | 0.3%
Hiragana | 89 | 0.1%
Compat Jamo | 2 | < 0.1%
Punctuation | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12981
35.9%
e 1827
 
5.1%
[ 1475
 
4.1%
] 1473
 
4.1%
; 1362
 
3.8%
a 1299
 
3.6%
r 1257
 
3.5%
n 1186
 
3.3%
i 1066
 
2.9%
o 949
 
2.6%
Other values (65) 11301
31.2%
Hangul
ValueCountFrequency (%)
2022
 
4.7%
1783
 
4.1%
1337
 
3.1%
1273
 
3.0%
1182
 
2.7%
1086
 
2.5%
1059
 
2.5%
898
 
2.1%
846
 
2.0%
825
 
1.9%
Other values (607) 30841
71.5%
CJK
ValueCountFrequency (%)
981
 
4.3%
785
 
3.5%
697
 
3.1%
679
 
3.0%
673
 
3.0%
596
 
2.6%
580
 
2.6%
497
 
2.2%
486
 
2.2%
435
 
1.9%
Other values (1231) 16194
71.6%
None
ValueCountFrequency (%)
747
93.6%
· 33
 
4.1%
æ 4
 
0.5%
3
 
0.4%
3
 
0.4%
2
 
0.3%
2
 
0.3%
2
 
0.3%
¨ 1
 
0.1%
ø 1
 
0.1%
CJK Compat Ideographs
ValueCountFrequency (%)
273
47.2%
30
 
5.2%
23
 
4.0%
22
 
3.8%
21
 
3.6%
18
 
3.1%
18
 
3.1%
17
 
2.9%
14
 
2.4%
11
 
1.9%
Other values (45) 131
22.7%
Cyrillic
ValueCountFrequency (%)
И 22
 
6.5%
А 20
 
5.9%
С 20
 
5.9%
Т 17
 
5.0%
К 17
 
5.0%
О 16
 
4.7%
Н 14
 
4.1%
а 14
 
4.1%
н 14
 
4.1%
Е 12
 
3.5%
Other values (43) 174
51.2%
Katakana
ValueCountFrequency (%)
22
 
6.7%
21
 
6.4%
18
 
5.5%
18
 
5.5%
16
 
4.9%
12
 
3.7%
12
 
3.7%
12
 
3.7%
11
 
3.4%
10
 
3.1%
Other values (52) 174
53.4%
Hiragana
ValueCountFrequency (%)
18
20.2%
11
12.4%
8
 
9.0%
8
 
9.0%
6
 
6.7%
6
 
6.7%
4
 
4.5%
4
 
4.5%
2
 
2.2%
2
 
2.2%
Other values (19) 20
22.5%
Compat Jamo
ValueCountFrequency (%)
2
100.0%
Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Modifier Letters
ValueCountFrequency (%)
˙ 1
100.0%

발행자
Text

Distinct: 3978
Distinct (%): 40.1%
Missing: 79
Missing (%): 0.8%
Memory size: 156.2 KiB

Length

Max length: 91
Median length: 71
Mean length: 7.532406
Min length: 1

Characters and Unicode

Total characters: 74729
Distinct characters: 1394
Distinct categories: 13
Distinct scripts: 7
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2712
Unique (%): 27.3%

Sample

1st row: 한국문화재보호재단
2nd row: 민속원
3rd row: 문화재청
4th row: 한울
5th row: 한어대사전출판사
Value | Count | Frequency (%)
국립민속박물관 | 441 | 3.6%
verlag | 232 | 1.9%
민속원 | 170 | 1.4%
민족문화추진회 | 148 | 1.2%
한국정신문화연구원 | 140 | 1.1%
문화재관리국 | 116 | 0.9%
국립문화재연구소 | 100 | 0.8%
국사편찬위원회 | 89 | 0.7%
of | 66 | 0.5%
문화재청 | 65 | 0.5%
Other values (4275) | 10733 | 87.3%

Most occurring characters

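Value | Count | Frequency (%)
(space) | 2379 | 3.2%
(not preserved) | 2060 | 2.8%
(not preserved) | 1965 | 2.6%
e | 1709 | 2.3%
(not preserved) | 1658 | 2.2%
(not preserved) | 1610 | 2.2%
(not preserved) | 1524 | 2.0%
(not preserved) | 1376 | 1.8%
(not preserved) | 1294 | 1.7%
a | 1179 | 1.6%
Other values (1384) | 57975 | 77.6%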

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 54488 | 72.9%
Lowercase Letter | 13216 | 17.7%
Uppercase Letter | 3986 | 5.3%
Space Separator | 2379 | 3.2%
Other Punctuation | 287 | 0.4%
Open Punctuation | 131 | 0.2%
Close Punctuation | 130 | 0.2%
Decimal Number | 70 | 0.1%
Dash Punctuation | 38 | 0.1%
Math Symbol | 1 | < 0.1%
Other values (3) | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2060
 
3.8%
1965
 
3.6%
1658
 
3.0%
1610
 
3.0%
1524
 
2.8%
1376
 
2.5%
1294
 
2.4%
1156
 
2.1%
1151
 
2.1%
1098
 
2.0%
Other values (1245) 39596
72.7%
Lowercase Letter
ValueCountFrequency (%)
e 1709
12.9%
a 1179
 
8.9%
r 1133
 
8.6%
n 1017
 
7.7%
i 954
 
7.2%
s 887
 
6.7%
l 817
 
6.2%
o 791
 
6.0%
t 713
 
5.4%
u 606
 
4.6%
Other values (44) 3410
25.8%
Uppercase Letter
ValueCountFrequency (%)
V 351
 
8.8%
S 269
 
6.7%
A 267
 
6.7%
B 257
 
6.4%
C 230
 
5.8%
T 222
 
5.6%
M 199
 
5.0%
K 192
 
4.8%
P 186
 
4.7%
E 180
 
4.5%
Other values (43) 1633
41.0%
Other Punctuation
ValueCountFrequency (%)
. 174
60.6%
29
 
10.1%
28
 
9.8%
· 20
 
7.0%
' 18
 
6.3%
& 13
 
4.5%
" 2
 
0.7%
; 1
 
0.3%
: 1
 
0.3%
/ 1
 
0.3%
Decimal Number
ValueCountFrequency (%)
2 25
35.7%
0 16
22.9%
1 12
17.1%
9 5
 
7.1%
7 4
 
5.7%
3 2
 
2.9%
4 2
 
2.9%
8 2
 
2.9%
5 1
 
1.4%
6 1
 
1.4%
Open Punctuation
ValueCountFrequency (%)
[ 124
94.7%
( 6
 
4.6%
1
 
0.8%
Close Punctuation
ValueCountFrequency (%)
] 123
94.6%
) 6
 
4.6%
1
 
0.8%
Space Separator
ValueCountFrequency (%)
2379
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 38
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
¨ 1
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 41508 | 55.5%
Latin | 16848 | 22.5%
Han | 12823 | 17.2%
Common | 3039 | 4.1%
Cyrillic | 354 | 0.5%
Katakana | 125 | 0.2%
Hiragana | 32 | < 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
677
 
5.3%
606
 
4.7%
474
 
3.7%
440
 
3.4%
362
 
2.8%
359
 
2.8%
310
 
2.4%
301
 
2.3%
297
 
2.3%
288
 
2.2%
Other values (702) 8709
67.9%
Hangul
ValueCountFrequency (%)
2060
 
5.0%
1965
 
4.7%
1658
 
4.0%
1610
 
3.9%
1524
 
3.7%
1376
 
3.3%
1294
 
3.1%
1156
 
2.8%
1151
 
2.8%
1098
 
2.6%
Other values (472) 26616
64.1%
Cyrillic
ValueCountFrequency (%)
А 21
 
5.9%
Т 16
 
4.5%
е 15
 
4.2%
К 15
 
4.2%
И 14
 
4.0%
Н 13
 
3.7%
и 13
 
3.7%
о 13
 
3.7%
Р 13
 
3.7%
У 12
 
3.4%
Other values (44) 209
59.0%
Latin
ValueCountFrequency (%)
e 1709
 
10.1%
a 1179
 
7.0%
r 1133
 
6.7%
n 1017
 
6.0%
i 954
 
5.7%
s 887
 
5.3%
l 817
 
4.8%
o 791
 
4.7%
t 713
 
4.2%
u 606
 
3.6%
Other values (43) 7042
41.8%
Katakana
ValueCountFrequency (%)
10
 
8.0%
9
 
7.2%
6
 
4.8%
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
4
 
3.2%
4
 
3.2%
4
 
3.2%
Other values (33) 67
53.6%
Common
ValueCountFrequency (%)
2379
78.3%
. 174
 
5.7%
[ 124
 
4.1%
] 123
 
4.0%
- 38
 
1.3%
29
 
1.0%
28
 
0.9%
2 25
 
0.8%
· 20
 
0.7%
' 18
 
0.6%
Other values (22) 81
 
2.7%
Hiragana
ValueCountFrequency (%)
6
18.8%
5
15.6%
2
 
6.2%
2
 
6.2%
2
 
6.2%
2
 
6.2%
2
 
6.2%
1
 
3.1%
1
 
3.1%
1
 
3.1%
Other values (8) 8
25.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 41508 | 55.5%
ASCII | 19801 | 26.5%
CJK | 12709 | 17.0%
Cyrillic | 354 | 0.5%
Katakana | 125 | 0.2%
CJK Compat Ideographs | 114 | 0.2%
None | 84 | 0.1%
Hiragana | 32 | < 0.1%
Geometric Shapes | 1 | < 0.1%
Punctuation | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2379
 
12.0%
e 1709
 
8.6%
a 1179
 
6.0%
r 1133
 
5.7%
n 1017
 
5.1%
i 954
 
4.8%
s 887
 
4.5%
l 817
 
4.1%
o 791
 
4.0%
t 713
 
3.6%
Other values (66) 8222
41.5%
Hangul
ValueCountFrequency (%)
2060
 
5.0%
1965
 
4.7%
1658
 
4.0%
1610
 
3.9%
1524
 
3.7%
1376
 
3.3%
1294
 
3.1%
1156
 
2.8%
1151
 
2.8%
1098
 
2.6%
Other values (472) 26616
64.1%
CJK
ValueCountFrequency (%)
677
 
5.3%
606
 
4.8%
474
 
3.7%
440
 
3.5%
362
 
2.8%
359
 
2.8%
310
 
2.4%
301
 
2.4%
297
 
2.3%
288
 
2.3%
Other values (682) 8595
67.6%
None
ValueCountFrequency (%)
29
34.5%
28
33.3%
· 20
23.8%
æ 4
 
4.8%
1
 
1.2%
¨ 1
 
1.2%
1
 
1.2%
CJK Compat Ideographs
ValueCountFrequency (%)
23
20.2%
18
15.8%
13
11.4%
9
 
7.9%
7
 
6.1%
7
 
6.1%
6
 
5.3%
6
 
5.3%
4
 
3.5%
4
 
3.5%
Other values (10) 17
14.9%
Cyrillic
ValueCountFrequency (%)
А 21
 
5.9%
Т 16
 
4.5%
е 15
 
4.2%
К 15
 
4.2%
И 14
 
4.0%
Н 13
 
3.7%
и 13
 
3.7%
о 13
 
3.7%
Р 13
 
3.7%
У 12
 
3.4%
Other values (44) 209
59.0%
Katakana
ValueCountFrequency (%)
10
 
8.0%
9
 
7.2%
6
 
4.8%
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
4
 
3.2%
4
 
3.2%
4
 
3.2%
Other values (33) 67
53.6%
Hiragana
ValueCountFrequency (%)
6
18.8%
5
15.6%
2
 
6.2%
2
 
6.2%
2
 
6.2%
2
 
6.2%
2
 
6.2%
1
 
3.1%
1
 
3.1%
1
 
3.1%
Other values (8) 8
25.0%
Geometric Shapes
ValueCountFrequency (%)
1
100.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

발행년
Text

Distinct: 263
Distinct (%): 2.6%
Missing: 22
Missing (%): 0.2%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 4
Mean length: 4.195931
Min length: 1

Characters and Unicode

Total characters: 41867
Distinct characters: 48
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 1.1%

Sample

1st row: 1993
2nd row: 2003
3rd row: 2013
4th row: 2000
5th row: 1994
Value | Count | Frequency (%)
2000 | 345 | 3.4%
1999 | 316 | 3.1%
2001 | 306 | 3.0%
2006 | 306 | 3.0%
1998 | 301 | 3.0%
1997 | 298 | 3.0%
1996 | 294 | 2.9%
1994 | 291 | 2.9%
2003 | 290 | 2.9%
2002 | 286 | 2.8%
Other values (230) | 7044 | 69.9%

Most occurring characters

Value | Count | Frequency (%)
9 | 9753 | 23.3%
1 | 7998 | 19.1%
0 | 7617 | 18.2%
2 | 4796 | 11.5%
8 | 2444 | 5.8%
7 | 1912 | 4.6%
6 | 1400 | 3.3%
5 | 1152 | 2.8%
3 | 1138 | 2.7%
4 | 1093 | 2.6%
Other values (38) | 2564 | 6.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 39303 | 93.9%
Other Letter | 1468 | 3.5%
Close Punctuation | 405 | 1.0%
Open Punctuation | 403 | 1.0%
Space Separator | 163 | 0.4%
Dash Punctuation | 75 | 0.2%
Lowercase Letter | 42 | 0.1%
Other Punctuation | 7 | < 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
223
15.2%
220
15.0%
220
15.0%
219
14.9%
219
14.9%
110
7.5%
96
6.5%
96
6.5%
8
 
0.5%
8
 
0.5%
Other values (16) 49
 
3.3%
Decimal Number
ValueCountFrequency (%)
9 9753
24.8%
1 7998
20.3%
0 7617
19.4%
2 4796
12.2%
8 2444
 
6.2%
7 1912
 
4.9%
6 1400
 
3.6%
5 1152
 
2.9%
3 1138
 
2.9%
4 1093
 
2.8%
Other Punctuation
ValueCountFrequency (%)
4
57.1%
. 2
28.6%
· 1
 
14.3%
Close Punctuation
ValueCountFrequency (%)
] 402
99.3%
) 3
 
0.7%
Open Punctuation
ValueCountFrequency (%)
[ 400
99.3%
( 3
 
0.7%
Lowercase Letter
ValueCountFrequency (%)
c 34
81.0%
b 8
 
19.0%
Space Separator
ValueCountFrequency (%)
163
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 75
100.0%
Uppercase Letter
ValueCountFrequency (%)
W 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 40356 | 96.4%
Hangul | 1235 | 2.9%
Han | 233 | 0.6%
Latin | 43 | 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
9 9753
24.2%
1 7998
19.8%
0 7617
18.9%
2 4796
11.9%
8 2444
 
6.1%
7 1912
 
4.7%
6 1400
 
3.5%
5 1152
 
2.9%
3 1138
 
2.8%
4 1093
 
2.7%
Other values (9) 1053
 
2.6%
Hangul
ValueCountFrequency (%)
223
18.1%
220
17.8%
220
17.8%
219
17.7%
219
17.7%
110
8.9%
8
 
0.6%
8
 
0.6%
2
 
0.2%
2
 
0.2%
Other values (4) 4
 
0.3%
Han
ValueCountFrequency (%)
96
41.2%
96
41.2%
7
 
3.0%
6
 
2.6%
6
 
2.6%
5
 
2.1%
5
 
2.1%
3
 
1.3%
3
 
1.3%
3
 
1.3%
Other values (2) 3
 
1.3%
Latin
ValueCountFrequency (%)
c 34
79.1%
b 8
 
18.6%
W 1
 
2.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 40394 | 96.5%
Hangul | 1235 | 2.9%
CJK | 231 | 0.6%
None | 5 | < 0.1%
CJK Compat Ideographs | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9 9753
24.1%
1 7998
19.8%
0 7617
18.9%
2 4796
11.9%
8 2444
 
6.1%
7 1912
 
4.7%
6 1400
 
3.5%
5 1152
 
2.9%
3 1138
 
2.8%
4 1093
 
2.7%
Other values (10) 1091
 
2.7%
Hangul
ValueCountFrequency (%)
223
18.1%
220
17.8%
220
17.8%
219
17.7%
219
17.7%
110
8.9%
8
 
0.6%
8
 
0.6%
2
 
0.2%
2
 
0.2%
Other values (4) 4
 
0.3%
CJK
ValueCountFrequency (%)
96
41.6%
96
41.6%
7
 
3.0%
6
 
2.6%
6
 
2.6%
5
 
2.2%
5
 
2.2%
3
 
1.3%
3
 
1.3%
3
 
1.3%
None
ValueCountFrequency (%)
4
80.0%
· 1
 
20.0%
CJK Compat Ideographs
ValueCountFrequency (%)
2
100.0%

자료실
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
자료실: 8002
수장고: 1762
아카이브서고: 146
별도 자료: 56
아카이브서고2: 25

Length

Max length: 10
Median length: 3
Mean length: 3.0713
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 자료실
2nd row: 자료실
3rd row: 자료실
4th row: 자료실
5th row: 자료실

Common Values

Value | Count | Frequency (%)
자료실 | 8002 | 80.0%
수장고 | 1762 | 17.6%
아카이브서고 | 146 | 1.5%
별도 자료 | 56 | 0.6%
아카이브서고2 | 25 | 0.2%
아카이브서고(유물) | 9 | 0.1%
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
자료실 | 8002 | 79.6%
수장고 | 1762 | 17.5%
아카이브서고 | 146 | 1.5%
별도 | 56 | 0.6%
자료 | 56 | 0.6%
아카이브서고2 | 25 | 0.2%
아카이브서고(유물 | 9 | 0.1%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
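Nullity correlation can be illustrated by turning each column into a present/absent indicator and taking the Pearson correlation of the indicators. This is a minimal standard-library sketch of that idea, not the code that produced the heatmap; the function name and the toy columns are hypothetical.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column that is never (or always) missing has no correlation
    return cov / math.sqrt(var_a * var_b)

# Toy columns: author and publisher are missing together; year is missing in the other rows.
author = ["a", None, "b", None]
publisher = ["x", None, "y", None]
year = [None, "1999", None, "2001"]

same = nullity_correlation(author, publisher)
opposite = nullity_correlation(author, year)
print(same, opposite)
```

A value near 1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is missing.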

Sample

First rows

(index) | 서명사항 | 저자 | 발행자 | 발행년 | 자료실
8537 | (전통문화)한국의 전통예술 : 기지시줄다리기 | 문화체육부 문화재관리국 | 한국문화재보호재단 | 1993 | 자료실
62031 | 민속과 종교 | 비교민속학회 편 | 민속원 | 2003 | 자료실
29446 | 국왕경응조무구정탑원기 : 정밀 학술 조사보고서 | 문화재청, 불교문화재연구소 [편] | 문화재청 | 2013 | 자료실
21390 | 문화의 세계화 | 바르니에 쟝-피에르 지음 ; 주형일 옮김 | 한울 | 2000 | 자료실
65049 | 漢語大辭典 . 2 | 韓語大辭典編輯委員會 | 한어대사전출판사 | 1994 | 자료실
5966 | 한국고대사회경제사 | 전덕재 지음 | 태학사 | 2006 | 자료실
60566 | 日本語書簡文 | 崔水鳳 著 | 靑山文化社 | 1973 | 자료실
73372 | (옛부터 내려온) 우리 색깔 배우기 | 김지희 저 | 국립민속박물관 | 2009 | 자료실
49121 | 中國遠古지三代宗敎史 | 王吉懷 | 상해인민줄판사 | 1994 | 자료실
42920 | (하늘에서 가져온) 솔씨 : 집지킴이 성주신 이야기 | 글: 정해왕 ; 그림: 윤보원 | 여원미디어 | 2007 | 자료실

Last rows

(index) | 서명사항 | 저자 | 발행자 | 발행년 | 자료실
47767 | 始興金石總覽 | 始興郡誌編纂委員會 [編] | 始興群 | 1988 | 자료실
51113 | 景福宮 泰元殿址 | 國立文化財硏究所 | 국립문화재연구소 | 1998 | 수장고
46776 | (Das)Bild im Islam : Ein Verbot und seine Folgen | Ipsiroglu M.S | Verlag Anton Schroll & Co | 1971 | 수장고
54020 | Roofs and Lines : | Yim Seock Jae ; Lee Jean Young Translated by | Ewha Womans University | 2005 | 수장고
76328 | Arbeiterfamilie und sozialer Aufstieg | Ortmann Hedwig | Juventa Verlag | 2001 | 수장고
15884 | 주술동요의 사설구조와 기능 연구 | 韓靜媚 [저] | 江陵大學校 | 1994 | 수장고
72328 | 중국의 고구려사 연구 동향 분석 | 고구려연구재단 | 고구려연구재단 | 2004 | 자료실
5400 | 태평광기. 16 | 李昉 등 모음 ; 김장환 ; 이민숙 옮김 | 학고방 | 2001 | 자료실
60895 | 韓國語 語源 硏究. 1, 原始韓國語의 探求 | 李男德 著 | 梨花女子大學校 出版部 | 1987 | 자료실
2864 | 韓國의 刺繡 어제와 오늘 : 수실과 마음이 함께 한 | 숙명여자대학교 박물관 | 숙명여자대학교 박물관 | 2000 | 자료실

Duplicate rows

Most frequently occurring

(index) | 서명사항 | 저자 | 발행자 | 발행년 | 자료실 | # duplicates
107 | 국어국문학 = (The) Korean language and literature. 13-75, 77-85, 87-113, 116-122, 124-157 | 국어국문학회 [편] | 국어국문학회 | 1953-2011 | 자료실 | 28
197 | <NA> | <NA> | <NA> | <NA> | 자료실 | 11
117 | 문화와 나 | <NA> | <NA> | <NA> | 자료실 | 5
104 | 국립국악원 구술총서 = Oral history series by national Gugak center. 1-17 | 국립국악원 [편] | 국립국악원 | 2011-2014 | 자료실 | 4
42 | 峨山硏究論文集 : 要約 = Research papers(abstracts) funded by the Asan foundation. 2-7 | 峨山社會福祉事業財團 [편] | 峨山社會福지事業財團 | 1982-1989 | 수장고 | 3
59 | 民俗調査ハンドブック | 上野和男 編 | 吉川弘文館 | 昭和62[1987] | 자료실 | 3
95 | 강원도 산간지역의 가옥과 생활 : 삼척군.평창군.정선군 | 국립민속박물관 [편] | 국립민속박물관 | 1994 | 자료실 | 3
100 | 고성농요교본 : 등지소리 | 金石明 | 고성농요보존회 | 1995 | 자료실 | 3
163 | 청소년 민속강좌 : 민속박물관 문화학교. 1993 | 국립민속박물관 [편] | 국립민속박물관 | 1993 | 자료실 | 3
171 | 한국근대미술시장사자료집. 1-6 | 김상엽 편저 | 경인문화사 | 2015 | 자료실 | 3
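A table like this one can be built by grouping identical rows and keeping the groups that occur more than once. The sketch below shows that grouping with `collections.Counter` on a few toy rows modelled on the table; it is an illustration under that assumption, not the report's implementation, and it does not settle whether the headline figure of 198 counts duplicate groups or surplus rows.

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows (title, author, publisher, year, location); None stands for <NA>.
rows = [
    ("문화와 나", None, None, None, "자료실"),
    ("민속과 종교", "비교민속학회 편", "민속원", "2003", "자료실"),
    (None, None, None, None, "자료실"),
    ("문화와 나", None, None, None, "자료실"),
    (None, None, None, None, "자료실"),
    (None, None, None, None, "자료실"),
]
groups = duplicate_groups(rows)

# List the most frequent groups first, as the table does.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```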