Overview

Dataset statistics

Number of variables: 19
Number of observations: 10000
Missing cells: 53218
Missing cells (%): 28.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 162.0 B
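
The missing-cell percentage follows from the figures above: 19 variables times 10000 observations gives 190000 cells, of which 53218 are missing. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the report. It also shows how much of the total is explained by the two columns that the alerts list as 100% missing.

```python
# Figures copied from the dataset statistics above.
n_variables = 19
n_observations = 10_000
missing_cells = 53_218

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# The two all-empty "Unnamed" columns contribute 10,000 missing cells each.
empty_column_cells = 2 * n_observations
share_from_empty_columns = 100 * empty_column_cells / missing_cells

print(f"{missing_pct:.1f}% of all cells are missing")
print(f"{share_from_empty_columns:.1f}% of the missing cells sit in the two empty columns")
```

Dropping the two empty columns before profiling would therefore lower the headline missing rate noticeably.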

Variable types

Text: 12
Categorical: 5
Unsupported: 2

Dataset

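Description: The March First Movement Database consolidates basic information on the March First Movement and links it to a GIS (geographic information system). It provides information by region and by type, extracted from materials related to the movement, including documents on the disturbance incidents in Korea, records of the Japanese Ministry of Foreign Affairs, court rulings related to the March First Movement, and materials from missionaries in Korea.
Author: Ministry of Education, National Institute of Korean History (교육부 국사편찬위원회)
URL: https://www.data.go.kr/data/15064311/fileData.do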

Alerts

제공일자 is highly overall correlated with 출처정보구분 and 3 other fields (High correlation)
제공 is highly overall correlated with 출처정보구분 and 3 other fields (High correlation)
출전문서구분 is highly overall correlated with 제공 and 1 other fields (High correlation)
출처정보구분 is highly overall correlated with 제공 and 1 other fields (High correlation)
탄압정보포함여부 is highly overall correlated with 제공 and 1 other fields (High correlation)
탄압정보포함여부 is highly imbalanced (50.9%) (Imbalance)
제공 is highly imbalanced (98.0%) (Imbalance)
제공일자 is highly imbalanced (98.0%) (Imbalance)
사건ID has 2893 (28.9%) missing values (Missing)
비고 has 7610 (76.1%) missing values (Missing)
사건종료일자 has 9279 (92.8%) missing values (Missing)
사건국외도시 has 9043 (90.4%) missing values (Missing)
사건세부장소 has 4366 (43.7%) missing values (Missing)
Unnamed: 17 has 10000 (100.0%) missing values (Missing)
Unnamed: 18 has 10000 (100.0%) missing values (Missing)
출처정보ID has unique values (Unique)
Unnamed: 17 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 18 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
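
Names of the form `Unnamed: N` are what pandas assigns to columns whose header cell is blank. A common cause is rows ending in extra delimiters, but the report does not confirm that this is what happened here. The sketch below uses a small hypothetical CSV (not the real file) and only the standard library to find and drop columns that have neither a header nor any data.

```python
import csv
import io

# Hypothetical miniature CSV: every row ends with two extra delimiters,
# so a parser sees two trailing columns with no header and no values.
raw = "id,kind,,\nhd_026,운동,,\nhaf_109,독립운동,,\n"

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# A column counts as empty when its header and every cell are blank.
empty_columns = [
    i
    for i, name in enumerate(header)
    if name.strip() == "" and all(row[i].strip() == "" for row in body)
]

# Rebuild every row without the empty columns.
cleaned = [
    [cell for i, cell in enumerate(row) if i not in empty_columns]
    for row in rows
]

print(empty_columns)
print(cleaned[0])
```

Removing such columns before profiling would clear four of the alerts above (two Missing, two Unsupported).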

Reproduction

Analysis started: 2023-12-12 08:56:32.016062
Analysis finished: 2023-12-12 08:56:40.802383
Duration: 8.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
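
The reported duration is the difference between the two timestamps above. A short check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 08:56:32.016062")
finished = datetime.fromisoformat("2023-12-12 08:56:40.802383")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```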

Variables

출처정보ID
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 26
Mean length: 13.9212
Min length: 8

Characters and Unicode

Total characters: 139212
Distinct characters: 33
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
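
As a rough illustration of how such category counts can be obtained, the sketch below classifies each character of the five sample values listed for this variable with Python's `unicodedata.category`. It is not the profiler's own implementation, and the `category_names` mapping covers only the categories reported for this column.

```python
import unicodedata
from collections import Counter

# Sample values copied from this variable's Sample block.
samples = [
    "16G_0479",
    "hd_026_0070_0100_001",
    "haf_109_0151_001",
    "hd_013_0030_1010_001",
    "dr_07101001_0010",
]

# unicodedata.category returns two-letter codes; map them to readable names.
category_names = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
}

category_counts = Counter(
    category_names.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in samples
    for ch in value
)

for name, count in category_counts.most_common():
    print(f"{name}: {count}")
```

Even on five rows the ordering matches the full column: digits dominate, followed by underscores and letters.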

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 16G_0479
2nd row: hd_026_0070_0100_001
3rd row: haf_109_0151_001
4th row: hd_013_0030_1010_001
5th row: dr_07101001_0010
ValueCountFrequency (%)
16g_0479 1
 
< 0.1%
haf_115_0000_001 1
 
< 0.1%
haf_023_0570_003 1
 
< 0.1%
mis_0003_013 1
 
< 0.1%
mi191903070460_0100 1
 
< 0.1%
dr_00604002_0320 1
 
< 0.1%
dr_05103002_0010 1
 
< 0.1%
haf_111_0243_021 1
 
< 0.1%
16a_0019 1
 
< 0.1%
ks0116_01420_001 1
 
< 0.1%
Other values (9990) 9990
99.9%

Most occurring characters

ValueCountFrequency (%)
0 44125
31.7%
_ 21009
15.1%
1 20196
14.5%
2 6796
 
4.9%
6 6396
 
4.6%
3 4495
 
3.2%
9 4462
 
3.2%
4 4038
 
2.9%
5 3522
 
2.5%
8 3467
 
2.5%
Other values (23) 20706
14.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100720
72.4%
Connector Punctuation 21009
 
15.1%
Lowercase Letter 10658
 
7.7%
Uppercase Letter 6819
 
4.9%
Dash Punctuation 6
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1215
17.8%
M 1030
15.1%
I 1030
15.1%
B 844
12.4%
F 610
8.9%
G 594
8.7%
E 503
7.4%
D 402
 
5.9%
S 252
 
3.7%
K 252
 
3.7%
Decimal Number
ValueCountFrequency (%)
0 44125
43.8%
1 20196
20.1%
2 6796
 
6.7%
6 6396
 
6.4%
3 4495
 
4.5%
9 4462
 
4.4%
4 4038
 
4.0%
5 3522
 
3.5%
8 3467
 
3.4%
7 3223
 
3.2%
Lowercase Letter
ValueCountFrequency (%)
h 2919
27.4%
d 2875
27.0%
f 1179
11.1%
a 1179
11.1%
r 729
 
6.8%
s 545
 
5.1%
i 413
 
3.9%
k 406
 
3.8%
j 230
 
2.2%
m 183
 
1.7%
Connector Punctuation
ValueCountFrequency (%)
_ 21009
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 121735
87.4%
Latin 17477
 
12.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
h 2919
16.7%
d 2875
16.5%
A 1215
 
7.0%
f 1179
 
6.7%
a 1179
 
6.7%
M 1030
 
5.9%
I 1030
 
5.9%
B 844
 
4.8%
r 729
 
4.2%
F 610
 
3.5%
Other values (11) 3867
22.1%
Common
ValueCountFrequency (%)
0 44125
36.2%
_ 21009
17.3%
1 20196
16.6%
2 6796
 
5.6%
6 6396
 
5.3%
3 4495
 
3.7%
9 4462
 
3.7%
4 4038
 
3.3%
5 3522
 
2.9%
8 3467
 
2.8%
Other values (2) 3229
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 139212
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 44125
31.7%
_ 21009
15.1%
1 20196
14.5%
2 6796
 
4.9%
6 6396
 
4.6%
3 4495
 
3.2%
9 4462
 
3.2%
4 4038
 
2.9%
5 3522
 
2.5%
8 3467
 
2.5%
Other values (23) 20706
14.9%

사건ID
Text

MISSING 

Distinct: 1774
Distinct (%): 25.0%
Missing: 2893
Missing (%): 28.9%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 7
Mean length: 6.9991558
Min length: 3

Characters and Unicode

Total characters: 49743
Distinct characters: 36
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 772
Unique (%): 10.9%
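
"Distinct" counts the different values that occur, while "Unique" counts the values that occur exactly once. The percentages reported for 사건ID are consistent with using the 7107 non-missing rows as the denominator, not all 10000 rows. The sketch below shows both ideas; `distinct_and_unique` and the toy list are illustrative and not taken from the profiler.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique, missing) counts; None marks a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)                                # different values seen
    unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    missing = len(values) - len(present)
    return distinct, unique, missing

# Toy column: one repeated ID, two singletons, two missing entries.
toy = ["a_00556", "a_00556", "x_00000", "b_30221", None, None]
distinct, unique, missing = distinct_and_unique(toy)

# Reproduce the reported percentages from the reported counts.
non_missing = 10_000 - 2_893
distinct_pct = 100 * 1_774 / non_missing
unique_pct = 100 * 772 / non_missing

print(distinct, unique, missing)
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```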

Sample

1st row: b_30221
2nd row: a_00051
3rd row: x_00000
4th row: b_51722
5th row: b_00822
ValueCountFrequency (%)
a_00556 365
 
5.1%
a_00511 258
 
3.6%
a_00609 184
 
2.6%
a_00509 173
 
2.4%
x_00000 117
 
1.6%
a_00608 111
 
1.6%
a_00951 80
 
1.1%
a_00605 61
 
0.9%
a_00993 60
 
0.8%
a_00051 58
 
0.8%
Other values (1761) 5640
79.4%

Most occurring characters

ValueCountFrequency (%)
0 12452
25.0%
_ 7103
14.3%
1 6711
13.5%
2 3211
 
6.5%
a 3036
 
6.1%
5 2924
 
5.9%
3 2395
 
4.8%
b 2293
 
4.6%
6 2092
 
4.2%
4 1947
 
3.9%
Other values (26) 5579
11.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 35529
71.4%
Connector Punctuation 7103
 
14.3%
Lowercase Letter 6225
 
12.5%
Uppercase Letter 884
 
1.8%
Space Separator 1
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3036
48.8%
b 2293
36.8%
h 309
 
5.0%
z 259
 
4.2%
j 133
 
2.1%
x 128
 
2.1%
f 47
 
0.8%
s 8
 
0.1%
i 5
 
0.1%
k 3
 
< 0.1%
Other values (2) 4
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
N 485
54.9%
M 173
 
19.6%
R 89
 
10.1%
J 85
 
9.6%
L 39
 
4.4%
Y 5
 
0.6%
I 3
 
0.3%
X 2
 
0.2%
B 1
 
0.1%
A 1
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 12452
35.0%
1 6711
18.9%
2 3211
 
9.0%
5 2924
 
8.2%
3 2395
 
6.7%
6 2092
 
5.9%
4 1947
 
5.5%
9 1443
 
4.1%
7 1376
 
3.9%
8 978
 
2.8%
Connector Punctuation
ValueCountFrequency (%)
_ 7103
100.0%
Space Separator
ValueCountFrequency (%)
1
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42634
85.7%
Latin 7109
 
14.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3036
42.7%
b 2293
32.3%
N 485
 
6.8%
h 309
 
4.3%
z 259
 
3.6%
M 173
 
2.4%
j 133
 
1.9%
x 128
 
1.8%
R 89
 
1.3%
J 85
 
1.2%
Other values (13) 119
 
1.7%
Common
ValueCountFrequency (%)
0 12452
29.2%
_ 7103
16.7%
1 6711
15.7%
2 3211
 
7.5%
5 2924
 
6.9%
3 2395
 
5.6%
6 2092
 
4.9%
4 1947
 
4.6%
9 1443
 
3.4%
7 1376
 
3.2%
Other values (3) 980
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 49743
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 12452
25.0%
_ 7103
14.3%
1 6711
13.5%
2 3211
 
6.5%
a 3036
 
6.1%
5 2924
 
5.9%
3 2395
 
4.8%
b 2293
 
4.6%
6 2092
 
4.2%
4 1947
 
3.9%
Other values (26) 5579
11.2%
Distinct: 4651
Distinct (%): 46.5%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 17
Mean length: 13.792779
Min length: 8

Characters and Unicode

Total characters: 137914
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3737
Unique (%): 37.4%

Sample

1st row: jssy_007_0060
2nd row: hd_026_0070_0100
3rd row: haf_109_0151
4th row: hd_013_0030_1010
5th row: dr_071_01_001
ValueCountFrequency (%)
jssy_001_5470 621
 
6.2%
jssy_007_2570 436
 
4.4%
jssy_007_2540 123
 
1.2%
jssy_001_1960 95
 
1.0%
haf_113_0060 93
 
0.9%
jssy_007_2550 90
 
0.9%
ij_100_0010_0030 75
 
0.8%
haf_110_2051 71
 
0.7%
ij_007_0040_00010 70
 
0.7%
jssy_001_1680 65
 
0.7%
Other values (4641) 8260
82.6%

Most occurring characters

ValueCountFrequency (%)
0 40150
29.1%
_ 23406
17.0%
1 13603
 
9.9%
s 8696
 
6.3%
2 6021
 
4.4%
j 4481
 
3.2%
4 4384
 
3.2%
7 4328
 
3.1%
y 4251
 
3.1%
5 4083
 
3.0%
Other values (16) 24511
17.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 84287
61.1%
Lowercase Letter 27657
 
20.1%
Connector Punctuation 23406
 
17.0%
Uppercase Letter 2564
 
1.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 8696
31.4%
j 4481
16.2%
y 4251
15.4%
h 3267
 
11.8%
d 2526
 
9.1%
a 1527
 
5.5%
f 1527
 
5.5%
r 729
 
2.6%
i 413
 
1.5%
m 183
 
0.7%
Decimal Number
ValueCountFrequency (%)
0 40150
47.6%
1 13603
 
16.1%
2 6021
 
7.1%
4 4384
 
5.2%
7 4328
 
5.1%
5 4083
 
4.8%
9 3882
 
4.6%
3 3495
 
4.1%
6 2653
 
3.1%
8 1688
 
2.0%
Uppercase Letter
ValueCountFrequency (%)
M 1030
40.2%
I 1030
40.2%
K 252
 
9.8%
S 252
 
9.8%
Connector Punctuation
ValueCountFrequency (%)
_ 23406
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 107693
78.1%
Latin 30221
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 8696
28.8%
j 4481
14.8%
y 4251
14.1%
h 3267
 
10.8%
d 2526
 
8.4%
a 1527
 
5.1%
f 1527
 
5.1%
M 1030
 
3.4%
I 1030
 
3.4%
r 729
 
2.4%
Other values (5) 1157
 
3.8%
Common
ValueCountFrequency (%)
0 40150
37.3%
_ 23406
21.7%
1 13603
 
12.6%
2 6021
 
5.6%
4 4384
 
4.1%
7 4328
 
4.0%
5 4083
 
3.8%
9 3882
 
3.6%
3 3495
 
3.2%
6 2653
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 137914
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 40150
29.1%
_ 23406
17.0%
1 13603
 
9.9%
s 8696
 
6.3%
2 6021
 
4.4%
j 4481
 
3.2%
4 4384
 
3.2%
7 4328
 
3.1%
y 4251
 
3.1%
5 4083
 
3.0%
Other values (16) 24511
17.8%
Distinct: 3409
Distinct (%): 34.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 201
Median length: 79
Mean length: 19.09591
Min length: 2

Characters and Unicode

Total characters: 190940
Distinct characters: 2134
Distinct categories: 11
Distinct scripts: 6
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2571
Unique (%): 25.7%

Sample

1st row: 獨立運動에 관한 건(제6보)
2nd row: 調書
3rd row: [當地朝鮮人ニハ元來日韓倂合ヲ…]
4th row: 第2回證人訊問調書
5th row: 獨立公債募集
ValueCountFrequency (%)
1987
 
5.4%
관한 1448
 
3.9%
1116
 
3.0%
關한 1081
 
2.9%
電報 1031
 
2.8%
獨立運動에 1004
 
2.7%
調書 662
 
1.8%
朝鮮騷擾事件一覽表에 621
 
1.7%
상황 604
 
1.6%
經過 436
 
1.2%
Other values (6183) 26717
72.8%

Most occurring characters

ValueCountFrequency (%)
26588
 
13.9%
1 4450
 
2.3%
3590
 
1.9%
3571
 
1.9%
3369
 
1.8%
9 2991
 
1.6%
2940
 
1.5%
. 2933
 
1.5%
2587
 
1.4%
) 2533
 
1.3%
Other values (2124) 135388
70.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 127383
66.7%
Space Separator 26748
 
14.0%
Decimal Number 13533
 
7.1%
Other Punctuation 7721
 
4.0%
Close Punctuation 4732
 
2.5%
Open Punctuation 4730
 
2.5%
Lowercase Letter 4402
 
2.3%
Uppercase Letter 765
 
0.4%
Dash Punctuation 681
 
0.4%
Math Symbol 208
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3590
 
2.8%
3571
 
2.8%
3369
 
2.6%
2940
 
2.3%
2587
 
2.0%
2267
 
1.8%
2267
 
1.8%
2240
 
1.8%
2203
 
1.7%
2188
 
1.7%
Other values (2036) 100161
78.6%
Lowercase Letter
ValueCountFrequency (%)
e 612
13.9%
o 500
11.4%
r 495
11.2%
n 426
9.7%
a 399
9.1%
i 314
 
7.1%
t 276
 
6.3%
s 272
 
6.2%
d 136
 
3.1%
c 136
 
3.1%
Other values (16) 836
19.0%
Uppercase Letter
ValueCountFrequency (%)
S 128
16.7%
D 83
10.8%
K 81
10.6%
M 75
9.8%
T 55
 
7.2%
W 45
 
5.9%
B 42
 
5.5%
H 41
 
5.4%
A 39
 
5.1%
R 38
 
5.0%
Other values (13) 138
18.0%
Decimal Number
ValueCountFrequency (%)
1 4450
32.9%
9 2991
22.1%
3 1895
14.0%
4 1265
 
9.3%
2 866
 
6.4%
0 717
 
5.3%
5 543
 
4.0%
6 485
 
3.6%
7 169
 
1.2%
8 149
 
1.1%
Other values (2) 3
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 2933
38.0%
: 2094
27.1%
, 1842
23.9%
706
 
9.1%
? 127
 
1.6%
' 9
 
0.1%
; 7
 
0.1%
· 1
 
< 0.1%
/ 1
 
< 0.1%
! 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 2533
53.5%
] 2071
43.8%
121
 
2.6%
6
 
0.1%
1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 2531
53.5%
[ 2071
43.8%
121
 
2.6%
6
 
0.1%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
26588
99.4%
  160
 
0.6%
Math Symbol
ValueCountFrequency (%)
~ 206
99.0%
2
 
1.0%
Other Symbol
ValueCountFrequency (%)
35
94.6%
2
 
5.4%
Dash Punctuation
ValueCountFrequency (%)
- 681
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 91369
47.9%
Common 58390
30.6%
Hangul 33141
 
17.4%
Latin 5167
 
2.7%
Katakana 2864
 
1.5%
Hiragana 9
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
3590
 
3.9%
2587
 
2.8%
2267
 
2.5%
2267
 
2.5%
2240
 
2.5%
2203
 
2.4%
2188
 
2.4%
2078
 
2.3%
2032
 
2.2%
1986
 
2.2%
Other values (1508) 67931
74.3%
Hangul
ValueCountFrequency (%)
3571
 
10.8%
3369
 
10.2%
2940
 
8.9%
1537
 
4.6%
1152
 
3.5%
905
 
2.7%
857
 
2.6%
807
 
2.4%
800
 
2.4%
776
 
2.3%
Other values (468) 16427
49.6%
Latin
ValueCountFrequency (%)
e 612
 
11.8%
o 500
 
9.7%
r 495
 
9.6%
n 426
 
8.2%
a 399
 
7.7%
i 314
 
6.1%
t 276
 
5.3%
s 272
 
5.3%
d 136
 
2.6%
c 136
 
2.6%
Other values (39) 1601
31.0%
Katakana
ValueCountFrequency (%)
840
29.3%
427
14.9%
317
 
11.1%
218
 
7.6%
203
 
7.1%
134
 
4.7%
121
 
4.2%
87
 
3.0%
66
 
2.3%
48
 
1.7%
Other values (36) 403
14.1%
Common
ValueCountFrequency (%)
26588
45.5%
1 4450
 
7.6%
9 2991
 
5.1%
. 2933
 
5.0%
) 2533
 
4.3%
( 2531
 
4.3%
: 2094
 
3.6%
[ 2071
 
3.5%
] 2071
 
3.5%
3 1895
 
3.2%
Other values (29) 8233
 
14.1%
Hiragana
ValueCountFrequency (%)
3
33.3%
3
33.3%
2
22.2%
1
 
11.1%

Most occurring blocks

ValueCountFrequency (%)
CJK 91369
47.9%
ASCII 62392
32.7%
Hangul 33141
 
17.4%
Katakana 2864
 
1.5%
Punctuation 706
 
0.4%
None 420
 
0.2%
Geometric Shapes 37
 
< 0.1%
Hiragana 9
 
< 0.1%
Math Operators 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
26588
42.6%
1 4450
 
7.1%
9 2991
 
4.8%
. 2933
 
4.7%
) 2533
 
4.1%
( 2531
 
4.1%
: 2094
 
3.4%
[ 2071
 
3.3%
] 2071
 
3.3%
3 1895
 
3.0%
Other values (64) 12235
19.6%
CJK
ValueCountFrequency (%)
3590
 
3.9%
2587
 
2.8%
2267
 
2.5%
2267
 
2.5%
2240
 
2.5%
2203
 
2.4%
2188
 
2.4%
2078
 
2.3%
2032
 
2.2%
1986
 
2.2%
Other values (1508) 67931
74.3%
Hangul
ValueCountFrequency (%)
3571
 
10.8%
3369
 
10.2%
2940
 
8.9%
1537
 
4.6%
1152
 
3.5%
905
 
2.7%
857
 
2.6%
807
 
2.4%
800
 
2.4%
776
 
2.3%
Other values (468) 16427
49.6%
Katakana
ValueCountFrequency (%)
840
29.3%
427
14.9%
317
 
11.1%
218
 
7.6%
203
 
7.1%
134
 
4.7%
121
 
4.2%
87
 
3.0%
66
 
2.3%
48
 
1.7%
Other values (36) 403
14.1%
Punctuation
ValueCountFrequency (%)
706
100.0%
None
ValueCountFrequency (%)
  160
38.1%
121
28.8%
121
28.8%
6
 
1.4%
6
 
1.4%
2
 
0.5%
1
 
0.2%
· 1
 
0.2%
1
 
0.2%
1
 
0.2%
Geometric Shapes
ValueCountFrequency (%)
35
94.6%
2
 
5.4%
Hiragana
ValueCountFrequency (%)
3
33.3%
3
33.3%
2
22.2%
1
 
11.1%
Math Operators
ValueCountFrequency (%)
2
100.0%

출처정보구분
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
운동: 7088
독립운동: 1300
동향정보: 1022
탄압: 585
<NA>: 3

Length

Max length: 4
Median length: 2
Mean length: 2.4654
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 운동
2nd row: 운동
3rd row: 독립운동
4th row: 운동
5th row: 독립운동

Common Values

Value | Count | Frequency (%)
운동 | 7088 | 70.9%
독립운동 | 1300 | 13.0%
동향정보 | 1022 | 10.2%
탄압 | 585 | 5.9%
<NA> | 3 | < 0.1%
동향보고 | 2 | < 0.1%
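
The mean length of 2.4654 reported for 출처정보구분 can be reproduced from the counts in the Common Values table above. The arithmetic only works out if the three `<NA>` entries are counted as 4-character strings, which is consistent with the column reporting 0 missing values and suggests they are not treated as missing. The sketch below is a check of that reading, not the profiler's code.

```python
# Counts copied from the Common Values table for 출처정보구분.
value_counts = {
    "운동": 7088,
    "독립운동": 1300,
    "동향정보": 1022,
    "탄압": 585,
    "<NA>": 3,  # counted here as a 4-character string, not as a missing value
    "동향보고": 2,
}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, f"{mean_length:.4f}")
```

If those three entries are placeholders for absent data, converting them to real missing values before profiling would make the Missing count reflect them.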

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
운동 7088
70.9%
독립운동 1300
 
13.0%
동향정보 1022
 
10.2%
탄압 585
 
5.9%
na 3
 
< 0.1%
동향보고 2
 
< 0.1%

출전문서구분
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
소요사건서류: 4251
경성지법검사국문서: 1992
일본외무성기록: 1584
매일신보: 1030
독립신문: 729
Other values (4): 414

Length

Max length: 9
Median length: 7
Mean length: 6.4659
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 소요사건서류
2nd row: 경성지법검사국문서
3rd row: 일본외무성기록
4th row: 경성지법검사국문서
5th row: 독립신문

Common Values

Value | Count | Frequency (%)
소요사건서류 | 4251 | 42.5%
경성지법검사국문서 | 1992 | 19.9%
일본외무성기록 | 1584 | 15.8%
매일신보 | 1030 | 10.3%
독립신문 | 729 | 7.3%
재한선교사보고자료 | 183 | 1.8%
독립운동사략 | 160 | 1.6%
한일관계사료집 | 70 | 0.7%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
소요사건서류 4251
42.5%
경성지법검사국문서 1992
19.9%
일본외무성기록 1584
 
15.8%
매일신보 1030
 
10.3%
독립신문 729
 
7.3%
재한선교사보고자료 183
 
1.8%
독립운동사략 160
 
1.6%
한일관계사료집 70
 
0.7%
na 1
 
< 0.1%
Distinct: 9077
Distinct (%): 90.8%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 1024
Median length: 672
Mean length: 158.40884
Min length: 8

Characters and Unicode

Total characters: 1583930
Distinct characters: 4111
Distinct categories: 16
Distinct scripts: 6
Distinct blocks: 13
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8683
Unique (%): 86.8%

Sample

1st row: [독립운동에 관한 건(제6보)] 平安南道 江西郡 沙川憲兵駐在所는 3월 4일(時間不詳)에 暴民이 來襲하므로 所長 上等兵 佐藤實五郞 외 補助員 3명이 힘을 다해 시위대를 격퇴하는 데 힘썼으나 탄약을 다 사용하고 중과부적으로 마침내 장렬한 최후를 맞었다. 平壤憲兵分隊長 이하 10명이 사천으로 급행했는데 暴民 중에도 다수의 사상자가 있는 것 같다.
2nd row: 1919년 3월 18일 강화군 부내면 만세시위사건 피고인 高成根(松海面長)에 대한 경찰 신문조서. 피고인 염씨가 전달한 불온문서의 수취 경위 등 신문
3rd row: 니콜리스크(ニコリスクコエ)에서는 韓族國民委員會와 협의 후 2월 상순 파리에 尹海 및 高昌一 2명을 파견하였다. 또 [上海]에서도 김규식을 파견하였다.
4th row: 1919년 3월 1일 경기도 개성의 독립선언서 배포 관련자 강조원에 대한 증인 오화영의 경찰신문조서임(제2회). 2월 중순 개성 방문시 행적, 2월 26일 상경한 강조원과 면담 내용 등 신문임
5th row: 1920년 04월 29일 [獨立公債募集] 독립공채모집 홍보를 게재
ValueCountFrequency (%)
1919년 4346
 
1.3%
3월 4280
 
1.2%
4월 2877
 
0.8%
2065
 
0.6%
1551
 
0.5%
대한 1539
 
0.4%
1일 1326
 
0.4%
관한 1242
 
0.4%
1087
 
0.3%
조선인 1078
 
0.3%
Other values (77121) 322916
93.8%

Most occurring characters

ValueCountFrequency (%)
335978
 
21.2%
27484
 
1.7%
1 25003
 
1.6%
, 23378
 
1.5%
23236
 
1.5%
21553
 
1.4%
19074
 
1.2%
18711
 
1.2%
. 18063
 
1.1%
15483
 
1.0%
Other values (4101) 1055967
66.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1048494
66.2%
Space Separator 336046
 
21.2%
Decimal Number 90334
 
5.7%
Other Punctuation 51161
 
3.2%
Lowercase Letter 17608
 
1.1%
Close Punctuation 14255
 
0.9%
Open Punctuation 14042
 
0.9%
Math Symbol 8242
 
0.5%
Uppercase Letter 1453
 
0.1%
Other Symbol 1433
 
0.1%
Other values (6) 862
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
27484
 
2.6%
23236
 
2.2%
21553
 
2.1%
19074
 
1.8%
18711
 
1.8%
15483
 
1.5%
14558
 
1.4%
14443
 
1.4%
14058
 
1.3%
13793
 
1.3%
Other values (3974) 866101
82.6%
Lowercase Letter
ValueCountFrequency (%)
r 3613
20.5%
b 2692
15.3%
n 1471
8.4%
o 1394
 
7.9%
t 1297
 
7.4%
s 1235
 
7.0%
g 821
 
4.7%
a 731
 
4.2%
e 696
 
4.0%
i 586
 
3.3%
Other values (15) 3072
17.4%
Uppercase Letter
ValueCountFrequency (%)
S 136
 
9.4%
M 124
 
8.5%
C 98
 
6.7%
B 95
 
6.5%
K 93
 
6.4%
A 89
 
6.1%
H 77
 
5.3%
G 75
 
5.2%
R 66
 
4.5%
T 63
 
4.3%
Other values (15) 537
37.0%
Other Punctuation
ValueCountFrequency (%)
, 23378
45.7%
. 18063
35.3%
/ 4370
 
8.5%
? 2961
 
5.8%
" 706
 
1.4%
· 537
 
1.0%
' 372
 
0.7%
: 349
 
0.7%
; 187
 
0.4%
& 173
 
0.3%
Other values (7) 65
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 25003
27.7%
9 12350
13.7%
0 12020
13.3%
3 11905
13.2%
2 9331
 
10.3%
4 7915
 
8.8%
5 3918
 
4.3%
6 3136
 
3.5%
8 2611
 
2.9%
7 2142
 
2.4%
Other values (2) 3
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
1048
73.1%
235
 
16.4%
107
 
7.5%
27
 
1.9%
6
 
0.4%
3
 
0.2%
3
 
0.2%
1
 
0.1%
1
 
0.1%
1
 
0.1%
Math Symbol
ValueCountFrequency (%)
< 3553
43.1%
> 3552
43.1%
~ 966
 
11.7%
= 162
 
2.0%
3
 
< 0.1%
+ 3
 
< 0.1%
2
 
< 0.1%
| 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 7888
55.3%
] 6080
42.7%
209
 
1.5%
55
 
0.4%
18
 
0.1%
3
 
< 0.1%
} 2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 7670
54.6%
[ 6085
43.3%
211
 
1.5%
56
 
0.4%
18
 
0.1%
{ 2
 
< 0.1%
Other Number
ValueCountFrequency (%)
5
29.4%
4
23.5%
4
23.5%
2
 
11.8%
1
 
5.9%
1
 
5.9%
Space Separator
ValueCountFrequency (%)
335978
> 99.9%
  68
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 561
95.7%
25
 
4.3%
Final Punctuation
ValueCountFrequency (%)
77
63.6%
44
36.4%
Initial Punctuation
ValueCountFrequency (%)
76
63.3%
44
36.7%
Connector Punctuation
ValueCountFrequency (%)
_ 17
100.0%
Letter Number
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 795574
50.2%
Common 516374
32.6%
Han 249444
 
15.7%
Latin 19062
 
1.2%
Katakana 3428
 
0.2%
Hiragana 48
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
5462
 
2.2%
3846
 
1.5%
3833
 
1.5%
3500
 
1.4%
3370
 
1.4%
2652
 
1.1%
2601
 
1.0%
2532
 
1.0%
2479
 
1.0%
2425
 
1.0%
Other values (2731) 216744
86.9%
Hangul
ValueCountFrequency (%)
27484
 
3.5%
23236
 
2.9%
21553
 
2.7%
19074
 
2.4%
18711
 
2.4%
15483
 
1.9%
14558
 
1.8%
14443
 
1.8%
14058
 
1.8%
13793
 
1.7%
Other values (1141) 613181
77.1%
Katakana
ValueCountFrequency (%)
341
 
9.9%
323
 
9.4%
242
 
7.1%
168
 
4.9%
156
 
4.6%
151
 
4.4%
148
 
4.3%
111
 
3.2%
89
 
2.6%
86
 
2.5%
Other values (68) 1613
47.1%
Common
ValueCountFrequency (%)
335978
65.1%
1 25003
 
4.8%
, 23378
 
4.5%
. 18063
 
3.5%
9 12350
 
2.4%
0 12020
 
2.3%
3 11905
 
2.3%
2 9331
 
1.8%
4 7915
 
1.5%
) 7888
 
1.5%
Other values (66) 52543
 
10.2%
Latin
ValueCountFrequency (%)
r 3613
19.0%
b 2692
14.1%
n 1471
 
7.7%
o 1394
 
7.3%
t 1297
 
6.8%
s 1235
 
6.5%
g 821
 
4.3%
a 731
 
3.8%
e 696
 
3.7%
i 586
 
3.1%
Other values (41) 4526
23.7%
Hiragana
ValueCountFrequency (%)
10
20.8%
7
14.6%
7
14.6%
5
10.4%
5
10.4%
3
 
6.2%
3
 
6.2%
2
 
4.2%
1
 
2.1%
1
 
2.1%
Other values (4) 4
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 795498
50.2%
ASCII 532524
33.6%
CJK 249444
 
15.7%
Katakana 3428
 
0.2%
Geometric Shapes 1432
 
0.1%
None 1184
 
0.1%
Punctuation 272
 
< 0.1%
Compat Jamo 76
 
< 0.1%
Hiragana 48
 
< 0.1%
Enclosed Alphanum 19
 
< 0.1%
Other values (3) 5
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
335978
63.1%
1 25003
 
4.7%
, 23378
 
4.4%
. 18063
 
3.4%
9 12350
 
2.3%
0 12020
 
2.3%
3 11905
 
2.2%
2 9331
 
1.8%
4 7915
 
1.5%
) 7888
 
1.5%
Other values (78) 68693
 
12.9%
Hangul
ValueCountFrequency (%)
27484
 
3.5%
23236
 
2.9%
21553
 
2.7%
19074
 
2.4%
18711
 
2.4%
15483
 
1.9%
14558
 
1.8%
14443
 
1.8%
14058
 
1.8%
13793
 
1.7%
Other values (1135) 613105
77.1%
CJK
ValueCountFrequency (%)
5462
 
2.2%
3846
 
1.5%
3833
 
1.5%
3500
 
1.4%
3370
 
1.4%
2652
 
1.1%
2601
 
1.0%
2532
 
1.0%
2479
 
1.0%
2425
 
1.0%
Other values (2731) 216744
86.9%
Geometric Shapes
ValueCountFrequency (%)
1048
73.2%
235
 
16.4%
107
 
7.5%
27
 
1.9%
6
 
0.4%
3
 
0.2%
3
 
0.2%
2
 
0.1%
1
 
0.1%
None
ValueCountFrequency (%)
· 537
45.4%
211
 
17.8%
209
 
17.7%
  68
 
5.7%
56
 
4.7%
55
 
4.6%
18
 
1.5%
18
 
1.5%
3
 
0.3%
3
 
0.3%
Other values (3) 6
 
0.5%
Katakana
ValueCountFrequency (%)
341
 
9.9%
323
 
9.4%
242
 
7.1%
168
 
4.9%
156
 
4.6%
151
 
4.4%
148
 
4.3%
111
 
3.2%
89
 
2.6%
86
 
2.5%
Other values (68) 1613
47.1%
Punctuation
ValueCountFrequency (%)
77
28.3%
76
27.9%
44
16.2%
44
16.2%
25
 
9.2%
6
 
2.2%
Compat Jamo
ValueCountFrequency (%)
65
85.5%
4
 
5.3%
4
 
5.3%
1
 
1.3%
1
 
1.3%
1
 
1.3%
Hiragana
ValueCountFrequency (%)
10
20.8%
7
14.6%
7
14.6%
5
10.4%
5
10.4%
3
 
6.2%
3
 
6.2%
2
 
4.2%
1
 
2.1%
1
 
2.1%
Other values (4) 4
 
8.3%
Enclosed Alphanum
ValueCountFrequency (%)
5
26.3%
4
21.1%
4
21.1%
2
 
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Math Operators
ValueCountFrequency (%)
3
100.0%
Box Drawing
ValueCountFrequency (%)
1
100.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
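
The character tables for this variable show 3553 `<` and 3552 `>` characters, 4370 `/` characters, and `r` and `b` as the two most frequent Latin letters, along with `&` and `;`. That pattern is consistent with leftover HTML markup such as line-break tags and character entities, but the report does not show the raw values, so this is an assumption. Under that assumption, a minimal clean-up sketch with the standard library could look like this; `strip_markup` and the example string are hypothetical.

```python
import html
import re

# Matches anything that looks like a tag: "<", then non-bracket text, then ">".
TAG_RE = re.compile(r"<[^<>]+>")

def strip_markup(text):
    """Replace tags with a space, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())

# Hypothetical value containing a line-break tag and three entities.
example = "3월 1일 독립선언<br/>선언서 배포 &amp; 시위&#40;경성&#41;"
cleaned_text = strip_markup(example)
print(cleaned_text)
```

The tag pattern is deliberately broad, so it would also remove legitimate angle-bracketed text such as a quoted title. Inspecting a sample of raw values first would show whether a narrower pattern is needed.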

비고
Text

MISSING 

Distinct: 1792
Distinct (%): 75.0%
Missing: 7610
Missing (%): 76.1%
Memory size: 156.2 KiB

Length

Max length: 1024
Median length: 284
Mean length: 75.28159
Min length: 1

Characters and Unicode

Total characters: 179923
Distinct characters: 1651
Distinct categories: 15
Distinct scripts: 6
Distinct blocks: 11
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1586
Unique (%): 66.4%

Sample

1st row: 첨부문서 있음(大同團總部의 通告文과 醵金勸告文, 大同團總裁의 佈告文)
2nd row: 3월 29일 양주군의 시위는 노해면, 장흥면, 별내면이 있다. 각면에 배치한다.
3rd row: ポクラニ?チナヤ라고 표기된 지역은 ‘포그라니치니’로, 중국 綏芬河와 국경을 맞대고 있다.
4th row:  본문에 독립선언 준비 활동, 선언서 배포, 3월 1일 독립만세, 이후 유인물 배포 등 독립운동 주도자 29명에 대한 예심결정(서)이 게재됨
5th row:  본문에 3월 1일 독립선언 및 이후 독립만세운동을 주도 참여한 학생 212명에 대한 예심종결서(8월 30일)가 게재됨
ValueCountFrequency (%)
본문에 1005
 
2.6%
3월 857
 
2.2%
게재됨 529
 
1.3%
4월 383
 
1.0%
대한 279
 
0.7%
시위 242
 
0.6%
독립만세운동 199
 
0.5%
것으로 176
 
0.4%
1일 174
 
0.4%
171
 
0.4%
Other values (10405) 35363
89.8%

Most occurring characters

ValueCountFrequency (%)
38174
 
21.2%
3142
 
1.7%
3104
 
1.7%
1 2777
 
1.5%
2744
 
1.5%
, 2673
 
1.5%
2512
 
1.4%
2205
 
1.2%
3 1998
 
1.1%
. 1898
 
1.1%
Other values (1641) 118696
66.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 115672
64.3%
Space Separator 38187
 
21.2%
Decimal Number 12086
 
6.7%
Other Punctuation 6888
 
3.8%
Lowercase Letter 2506
 
1.4%
Close Punctuation 1301
 
0.7%
Open Punctuation 1218
 
0.7%
Math Symbol 924
 
0.5%
Connector Punctuation 418
 
0.2%
Uppercase Letter 385
 
0.2%
Other values (5) 338
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3142
 
2.7%
3104
 
2.7%
2744
 
2.4%
2512
 
2.2%
2205
 
1.9%
1741
 
1.5%
1724
 
1.5%
1706
 
1.5%
1687
 
1.5%
1597
 
1.4%
Other values (1541) 93510
80.8%
Lowercase Letter
ValueCountFrequency (%)
s 299
11.9%
r 234
 
9.3%
b 222
 
8.9%
y 180
 
7.2%
n 163
 
6.5%
a 160
 
6.4%
t 152
 
6.1%
l 148
 
5.9%
o 145
 
5.8%
i 132
 
5.3%
Other values (15) 671
26.8%
Uppercase Letter
ValueCountFrequency (%)
D 108
28.1%
B 72
18.7%
I 38
 
9.9%
F 22
 
5.7%
N 20
 
5.2%
E 16
 
4.2%
G 16
 
4.2%
S 14
 
3.6%
T 11
 
2.9%
A 11
 
2.9%
Other values (13) 57
14.8%
Other Punctuation
ValueCountFrequency (%)
, 2673
38.8%
. 1898
27.6%
' 446
 
6.5%
; 353
 
5.1%
& 327
 
4.7%
" 316
 
4.6%
# 279
 
4.1%
/ 238
 
3.5%
: 207
 
3.0%
? 135
 
2.0%
Other values (3) 16
 
0.2%
Decimal Number
ValueCountFrequency (%)
1 2777
23.0%
3 1998
16.5%
0 1671
13.8%
2 1445
12.0%
9 1389
11.5%
4 1049
 
8.7%
5 502
 
4.2%
8 441
 
3.6%
6 423
 
3.5%
7 391
 
3.2%
Math Symbol
ValueCountFrequency (%)
> 369
39.9%
< 366
39.6%
~ 121
 
13.1%
= 62
 
6.7%
5
 
0.5%
+ 1
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 1108
85.2%
] 134
 
10.3%
38
 
2.9%
20
 
1.5%
} 1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 1026
84.2%
[ 133
 
10.9%
38
 
3.1%
20
 
1.6%
{ 1
 
0.1%
Other Number
ValueCountFrequency (%)
8
38.1%
8
38.1%
3
 
14.3%
2
 
9.5%
Space Separator
ValueCountFrequency (%)
38174
> 99.9%
  13
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 290
99.7%
1
 
0.3%
Other Symbol
ValueCountFrequency (%)
4
80.0%
1
 
20.0%
Connector Punctuation
ValueCountFrequency (%)
_ 418
100.0%
Initial Punctuation
ValueCountFrequency (%)
11
100.0%
Final Punctuation
ValueCountFrequency (%)
10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 109255
60.7%
Common 61360
34.1%
Han 6061
 
3.4%
Latin 2891
 
1.6%
Katakana 355
 
0.2%
Hiragana 1
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
157
 
2.6%
156
 
2.6%
145
 
2.4%
126
 
2.1%
121
 
2.0%
120
 
2.0%
100
 
1.6%
90
 
1.5%
89
 
1.5%
87
 
1.4%
Other values (880) 4870
80.3%
Hangul
ValueCountFrequency (%)
3142
 
2.9%
3104
 
2.8%
2744
 
2.5%
2512
 
2.3%
2205
 
2.0%
1741
 
1.6%
1724
 
1.6%
1706
 
1.6%
1687
 
1.5%
1597
 
1.5%
Other values (598) 87093
79.7%
Common
ValueCountFrequency (%)
38174
62.2%
1 2777
 
4.5%
, 2673
 
4.4%
3 1998
 
3.3%
. 1898
 
3.1%
0 1671
 
2.7%
2 1445
 
2.4%
9 1389
 
2.3%
) 1108
 
1.8%
4 1049
 
1.7%
Other values (42) 7178
 
11.7%
Katakana
ValueCountFrequency (%)
38
 
10.7%
32
 
9.0%
31
 
8.7%
25
 
7.0%
25
 
7.0%
25
 
7.0%
20
 
5.6%
19
 
5.4%
16
 
4.5%
11
 
3.1%
Other values (42) 113
31.8%
Latin
ValueCountFrequency (%)
s 299
 
10.3%
r 234
 
8.1%
b 222
 
7.7%
y 180
 
6.2%
n 163
 
5.6%
a 160
 
5.5%
t 152
 
5.3%
l 148
 
5.1%
o 145
 
5.0%
i 132
 
4.6%
Other values (38) 1056
36.5%
Hiragana
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 109255
60.7%
ASCII 64055
35.6%
CJK 6061
 
3.4%
Katakana 355
 
0.2%
None 141
 
0.1%
Punctuation 24
 
< 0.1%
Enclosed Alphanum 21
 
< 0.1%
Arrows 5
 
< 0.1%
Misc Symbols 4
 
< 0.1%
Geometric Shapes 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
38174
59.6%
1 2777
 
4.3%
, 2673
 
4.2%
3 1998
 
3.1%
. 1898
 
3.0%
0 1671
 
2.6%
2 1445
 
2.3%
9 1389
 
2.2%
) 1108
 
1.7%
4 1049
 
1.6%
Other values (73) 9873
 
15.4%
Hangul
ValueCountFrequency (%)
3142
 
2.9%
3104
 
2.8%
2744
 
2.5%
2512
 
2.3%
2205
 
2.0%
1741
 
1.6%
1724
 
1.6%
1706
 
1.6%
1687
 
1.5%
1597
 
1.5%
Other values (598) 87093
79.7%
CJK
ValueCountFrequency (%)
157
 
2.6%
156
 
2.6%
145
 
2.4%
126
 
2.1%
121
 
2.0%
120
 
2.0%
100
 
1.6%
90
 
1.5%
89
 
1.5%
87
 
1.4%
Other values (880) 4870
80.3%
None
ValueCountFrequency (%)
38
27.0%
38
27.0%
20
14.2%
20
14.2%
  13
 
9.2%
· 12
 
8.5%
Katakana
ValueCountFrequency (%)
38
 
10.7%
32
 
9.0%
31
 
8.7%
25
 
7.0%
25
 
7.0%
25
 
7.0%
20
 
5.6%
19
 
5.4%
16
 
4.5%
11
 
3.1%
Other values (42) 113
31.8%
Punctuation
ValueCountFrequency (%)
11
45.8%
10
41.7%
2
 
8.3%
1
 
4.2%
Enclosed Alphanum
ValueCountFrequency (%)
8
38.1%
8
38.1%
3
 
14.3%
2
 
9.5%
Arrows
ValueCountFrequency (%)
5
100.0%
Misc Symbols
ValueCountFrequency (%)
4
100.0%
Geometric Shapes
ValueCountFrequency (%)
1
100.0%
Hiragana
ValueCountFrequency (%)
1
100.0%
Distinct: 543
Distinct (%): 5.4%
Missing: 22
Missing (%): 0.2%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 99780
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 174
Unique (%): 1.7%

Sample

1st row: 1919-03-04
2nd row: 1919-03-16
3rd row: 1919-02-99
4th row: 1919-02-22
5th row: 1919-04-29
ValueCountFrequency (%)
1919-04-01 550
 
5.5%
1919-03-01 539
 
5.4%
1919-04-03 481
 
4.8%
1919-03-28 311
 
3.1%
1919-03-99 281
 
2.8%
1919-04-02 277
 
2.8%
1919-03-31 215
 
2.2%
1919-04-08 204
 
2.0%
1919-04-99 197
 
2.0%
1919-03-23 193
 
1.9%
Other values (533) 6730
67.4%
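
Several of the frequent values above, such as 1919-03-99 and 1919-04-99, have a day part of 99, so they cannot be parsed as calendar dates. The sketch below assumes that 99 marks an unknown day, which the report does not state. It falls back to the first of the month and records that the day is unknown. `parse_partial_date` is a hypothetical helper name.

```python
from datetime import date

def parse_partial_date(text):
    """Parse 'YYYY-MM-DD'; return (date_or_None, day_known)."""
    parts = text.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return None, False  # malformed value, e.g. a wrong separator
    year, month, day = (int(p) for p in parts)
    try:
        return date(year, month, day), True
    except ValueError:
        pass  # invalid day such as 99: retry with the first of the month
    try:
        # Assumption: an invalid day means "day unknown".
        return date(year, month, 1), False
    except ValueError:
        return None, False  # the month or year is invalid as well

for raw_value in ["1919-03-01", "1919-03-99", "1919_03-01"]:
    print(raw_value, parse_partial_date(raw_value))
```

Keeping the `day_known` flag separate avoids treating the fallback date as if it were a real observation on the first of the month.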

Most occurring characters

Value  Count  Frequency (%)
1  23989  24.0%
9  22831  22.9%
-  19955  20.0%
0  15083  15.1%
3  6129  6.1%
2  4065  4.1%
4  3812  3.8%
8  1158  1.2%
5  1118  1.1%
6  892  0.9%
Other values (2)  748  0.7%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  79824  80.0%
Dash Punctuation  19955  20.0%
Connector Punctuation  1  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  23989  30.1%
9  22831  28.6%
0  15083  18.9%
3  6129  7.7%
2  4065  5.1%
4  3812  4.8%
8  1158  1.5%
5  1118  1.4%
6  892  1.1%
7  747  0.9%
Dash Punctuation
Value  Count  Frequency (%)
-  19955  100.0%
Connector Punctuation
Value  Count  Frequency (%)
_  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  99780  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  23989  24.0%
9  22831  22.9%
-  19955  20.0%
0  15083  15.1%
3  6129  6.1%
2  4065  4.1%
4  3812  3.8%
8  1158  1.2%
5  1118  1.1%
6  892  0.9%
Other values (2)  748  0.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  99780  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  23989  24.0%
9  22831  22.9%
-  19955  20.0%
0  15083  15.1%
3  6129  6.1%
2  4065  4.1%
4  3812  3.8%
8  1158  1.2%
5  1118  1.1%
6  892  0.9%
Other values (2)  748  0.7%
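
The values listed for this date column include entries such as 1919-03-99 and 1919-04-99, whose day part is not a calendar day, and the character counts show a single underscore somewhere in the column. The sketch below shows one way to parse such strings with pandas. It assumes that day 99 is a placeholder for an unknown day, which the report itself does not state. The helper name `split_event_dates` and the malformed example `1919_03-01` are hypothetical.

```python
import pandas as pd


def split_event_dates(dates: pd.Series) -> pd.DataFrame:
    """Parse YYYY-MM-DD strings, flagging day 99 as 'unknown' (an assumption)."""
    parts = dates.str.extract(r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$")
    day_unknown = parts["day"].eq("99")
    # Keep only strings that matched the pattern and carry a real day value.
    parseable = dates.where(parts["day"].notna() & ~day_unknown)
    parsed = pd.to_datetime(parseable, format="%Y-%m-%d", errors="coerce")
    return pd.DataFrame({
        "raw": dates,
        "date": parsed,
        "day_unknown": day_unknown,
        # Present but not in YYYY-MM-DD shape (for example a stray underscore).
        "malformed": dates.notna() & parts["year"].isna(),
    })


# Two values taken from the samples above, one hypothetical malformed value, one missing.
events = pd.Series(["1919-03-04", "1919-02-99", "1919_03-01", None])
result = split_event_dates(events)
print(result)
```

Keeping `day_unknown` as a separate flag preserves the year and month for month-level aggregation instead of discarding those rows as invalid dates.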

사건종료일자
Text

MISSING 

Distinct 177
Distinct (%) 24.5%
Missing 9279
Missing (%) 92.8%
Memory size 156.2 KiB

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 7210
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 94
Unique (%) 13.0%

Sample

1st row 1919-02-28
2nd row 1920-03-10
3rd row 1919-04-99
4th row 1919-04-23
5th row 1919-04-06

Value  Count  Frequency (%)
1919-04-99  65  9.0%
1919-08-99  63  8.7%
1919-02-28  37  5.1%
1919-04-23  34  4.7%
1919-03-10  18  2.5%
1919-03-08  16  2.2%
1919-03-15  16  2.2%
1919-04-03  14  1.9%
1919-03-04  14  1.9%
1919-03-03  14  1.9%
Other values (167)  430  59.6%

Most occurring characters

Value  Count  Frequency (%)
9  1820  25.2%
1  1654  22.9%
-  1442  20.0%
0  985  13.7%
3  342  4.7%
2  340  4.7%
4  269  3.7%
8  166  2.3%
5  85  1.2%
6  57  0.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  5768  80.0%
Dash Punctuation  1442  20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
9  1820  31.6%
1  1654  28.7%
0  985  17.1%
3  342  5.9%
2  340  5.9%
4  269  4.7%
8  166  2.9%
5  85  1.5%
6  57  1.0%
7  50  0.9%
Dash Punctuation
Value  Count  Frequency (%)
-  1442  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  7210  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
9  1820  25.2%
1  1654  22.9%
-  1442  20.0%
0  985  13.7%
3  342  4.7%
2  340  4.7%
4  269  3.7%
8  166  2.3%
5  85  1.2%
6  57  0.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  7210  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
9  1820  25.2%
1  1654  22.9%
-  1442  20.0%
0  985  13.7%
3  342  4.7%
2  340  4.7%
4  269  3.7%
8  166  2.3%
5  85  1.2%
6  57  0.8%

사건장소코드
Text

Distinct 1352
Distinct (%) 13.5%
Missing 1
Missing (%) < 0.1%
Memory size 156.2 KiB

Length

Max length 167
Median length 6
Mean length 6.2710271
Min length 1

Characters and Unicode

Total characters 62704
Distinct characters 31
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 502
Unique (%) 5.0%

Sample

1st row K02_03
2nd row A02_10
3rd row X03_01
4th row A03_11
5th row X02

Value  Count  Frequency (%)
z01  1272  12.7%
x02  486  4.9%
a13_13  255  2.5%
a13_09  221  2.2%
x01_02_25  214  2.1%
x03_01  198  2.0%
k15  181  1.8%
a15_08;a15_09  179  1.8%
x01_02_39  133  1.3%
x01_02  127  1.3%
Other values (1341)  6736  67.3%
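
Entries such as `a15_08;a15_09` in the table above suggest that a single cell can hold several place codes joined by a semicolon. The sketch below shows how such cells might be split into one code per row with pandas. That `;` is a list delimiter is an assumption, as is the choice to strip spaces and upper-case the result. `explode_place_codes` is a hypothetical helper name.

```python
import pandas as pd


def explode_place_codes(codes: pd.Series) -> pd.Series:
    """Split ';'-joined codes into one stripped, upper-cased code per row."""
    return (
        codes.dropna()
        .str.split(";")
        .explode()
        .str.strip()
        .str.upper()
    )


# Codes taken from the sample rows and the frequency table, plus one missing value.
codes = pd.Series(["K02_03", "a15_08;a15_09", "X02", None])
exploded = explode_place_codes(codes)
print(exploded.value_counts())
# Group by the leading letter of each code.
print(exploded.str[0].value_counts())
```

The exploded series keeps the original row index, so each code can still be joined back to the record it came from.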

Most occurring characters

Value  Count  Frequency (%)
0  13611  21.7%
1  11620  18.5%
_  8828  14.1%
2  3545  5.7%
3  2727  4.3%
5  2538  4.0%
A  2501  4.0%
X  2046  3.3%
;  1641  2.6%
9  1587  2.5%
Other values (21)  12060  19.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  40571  64.7%
Uppercase Letter  11648  18.6%
Connector Punctuation  8828  14.1%
Other Punctuation  1645  2.6%
Space Separator  10  < 0.1%
Close Punctuation  1  < 0.1%
Lowercase Letter  1  < 0.1%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
A  2501  21.5%
X  2046  17.6%
Z  1423  12.2%
L  743  6.4%
M  717  6.2%
G  626  5.4%
K  604  5.2%
D  530  4.6%
F  451  3.9%
C  441  3.8%
Other values (5)  1566  13.4%
Decimal Number
Value  Count  Frequency (%)
0  13611  33.5%
1  11620  28.6%
2  3545  8.7%
3  2727  6.7%
5  2538  6.3%
9  1587  3.9%
8  1425  3.5%
6  1273  3.1%
4  1245  3.1%
7  1000  2.5%
Other Punctuation
Value  Count  Frequency (%)
;  1641  99.8%
'  4  0.2%
Connector Punctuation
Value  Count  Frequency (%)
_  8828  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  10  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  1  100.0%
Lowercase Letter
Value  Count  Frequency (%)
j  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  51055  81.4%
Latin  11649  18.6%

Most frequent character per script

Latin
Value  Count  Frequency (%)
A  2501  21.5%
X  2046  17.6%
Z  1423  12.2%
L  743  6.4%
M  717  6.2%
G  626  5.4%
K  604  5.2%
D  530  4.5%
F  451  3.9%
C  441  3.8%
Other values (6)  1567  13.5%
Common
Value  Count  Frequency (%)
0  13611  26.7%
1  11620  22.8%
_  8828  17.3%
2  3545  6.9%
3  2727  5.3%
5  2538  5.0%
;  1641  3.2%
9  1587  3.1%
8  1425  2.8%
6  1273  2.5%
Other values (5)  2260  4.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  62704  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  13611  21.7%
1  11620  18.5%
_  8828  14.1%
2  3545  5.7%
3  2727  4.3%
5  2538  4.0%
A  2501  4.0%
X  2046  3.3%
;  1641  2.6%
9  1587  2.5%
Other values (21)  12060  19.2%

사건장소명
Text

Distinct 1445
Distinct (%) 14.5%
Missing 1
Missing (%) < 0.1%
Memory size 156.2 KiB

Length

Max length 311
Median length 159
Mean length 11.935794
Min length 2

Characters and Unicode

Total characters 119346
Distinct characters 623
Distinct categories 6
Distinct scripts 4
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 607
Unique (%) 6.1%

Sample

1st row 平安南道 江西郡 斑石面
2nd row 京畿道 江華郡 松海面
3rd row 러시아 연해주 지역
4th row 京畿道 開城郡 松都面
5th row 중국 關內

Value  Count  Frequency (%)
京畿道  3306  11.5%
京城府  1315  4.6%
安城郡  824  2.9%
북간도[吉林省  658  2.3%
黃海道  654  2.3%
平安北道  610  2.1%
水原郡  606  2.1%
慶尙南道  539  1.9%
延吉道  514  1.8%
平安南道  504  1.8%
Other values (1529)  19185  66.8%

Most occurring characters

ValueCountFrequency (%)
18716
 
15.7%
10472
 
8.8%
7357
 
6.2%
6600
 
5.5%
5398
 
4.5%
3904
 
3.3%
3215
 
2.7%
2811
 
2.4%
2564
 
2.1%
2491
 
2.1%
Other values (613) 55818
46.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 96872
81.2%
Space Separator 18716
 
15.7%
Other Punctuation 1740
 
1.5%
Close Punctuation 985
 
0.8%
Open Punctuation 985
 
0.8%
Uppercase Letter 48
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
10472
 
10.8%
7357
 
7.6%
6600
 
6.8%
5398
 
5.6%
3904
 
4.0%
3215
 
3.3%
2811
 
2.9%
2564
 
2.6%
2491
 
2.6%
2171
 
2.2%
Other values (605) 49889
51.5%
Other Punctuation
ValueCountFrequency (%)
; 1619
93.0%
? 97
 
5.6%
. 24
 
1.4%
Uppercase Letter
ValueCountFrequency (%)
C 24
50.0%
D 24
50.0%
Space Separator
ValueCountFrequency (%)
18716
100.0%
Close Punctuation
ValueCountFrequency (%)
] 985
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 985
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 90089
75.5%
Common 22426
 
18.8%
Hangul 6783
 
5.7%
Latin 48
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
10472
 
11.6%
7357
 
8.2%
6600
 
7.3%
5398
 
6.0%
3904
 
4.3%
3215
 
3.6%
2811
 
3.1%
2564
 
2.8%
2491
 
2.8%
2171
 
2.4%
Other values (554) 43106
47.8%
Hangul
ValueCountFrequency (%)
962
14.2%
962
14.2%
701
10.3%
555
 
8.2%
495
 
7.3%
320
 
4.7%
320
 
4.7%
318
 
4.7%
289
 
4.3%
272
 
4.0%
Other values (41) 1589
23.4%
Common
ValueCountFrequency (%)
18716
83.5%
; 1619
 
7.2%
] 985
 
4.4%
[ 985
 
4.4%
? 97
 
0.4%
. 24
 
0.1%
Latin
ValueCountFrequency (%)
C 24
50.0%
D 24
50.0%

Most occurring blocks

ValueCountFrequency (%)
CJK 90089
75.5%
ASCII 22474
 
18.8%
Hangul 6783
 
5.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
18716
83.3%
; 1619
 
7.2%
] 985
 
4.4%
[ 985
 
4.4%
? 97
 
0.4%
C 24
 
0.1%
D 24
 
0.1%
. 24
 
0.1%
CJK
ValueCountFrequency (%)
10472
 
11.6%
7357
 
8.2%
6600
 
7.3%
5398
 
6.0%
3904
 
4.3%
3215
 
3.6%
2811
 
3.1%
2564
 
2.8%
2491
 
2.8%
2171
 
2.4%
Other values (554) 43106
47.8%
Hangul
ValueCountFrequency (%)
962
14.2%
962
14.2%
701
10.3%
555
 
8.2%
495
 
7.3%
320
 
4.7%
320
 
4.7%
318
 
4.7%
289
 
4.3%
272
 
4.0%
Other values (41) 1589
23.4%

사건국외도시
Text

MISSING 

Distinct 265
Distinct (%) 27.7%
Missing 9043
Missing (%) 90.4%
Memory size 156.2 KiB

Length

Max length 39
Median length 34
Mean length 6.2884013
Min length 1

Characters and Unicode

Total characters 6018
Distinct characters 488
Distinct categories 9
Distinct scripts 5
Distinct blocks 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 195
Unique (%) 20.4%

Sample

1st row 니콜리스크(ニコリスクコエ)
2nd row 上海(상해)
3rd row 上海
4th row 포그라니치니
5th row 浦潮(블라디보스토크)

Value  Count  Frequency (%)
上海(상해  271  27.6%
上海  154  15.7%
浦潮(블라디보스토크  46  4.7%
東京(도쿄  17  1.7%
니콜스크-우수리스크  16  1.6%
東京  16  1.6%
블라디보스토크  14  1.4%
北京  12  1.2%
華盛頓(위싱턴  11  1.1%
新韓村  8  0.8%
Other values (278)  416  42.4%

Most occurring characters

ValueCountFrequency (%)
) 542
 
9.0%
( 541
 
9.0%
459
 
7.6%
451
 
7.5%
288
 
4.8%
284
 
4.7%
203
 
3.4%
171
 
2.8%
; 130
 
2.2%
107
 
1.8%
Other values (478) 2842
47.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 4598
76.4%
Close Punctuation 542
 
9.0%
Open Punctuation 541
 
9.0%
Other Punctuation 140
 
2.3%
Space Separator 107
 
1.8%
Lowercase Letter 52
 
0.9%
Dash Punctuation 30
 
0.5%
Uppercase Letter 7
 
0.1%
Decimal Number 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
459
 
10.0%
451
 
9.8%
288
 
6.3%
284
 
6.2%
203
 
4.4%
171
 
3.7%
104
 
2.3%
90
 
2.0%
87
 
1.9%
87
 
1.9%
Other values (442) 2374
51.6%
Lowercase Letter
ValueCountFrequency (%)
s 8
15.4%
i 7
13.5%
k 6
11.5%
a 5
9.6%
o 4
 
7.7%
r 3
 
5.8%
l 2
 
3.8%
v 2
 
3.8%
t 2
 
3.8%
n 2
 
3.8%
Other values (11) 11
21.2%
Uppercase Letter
ValueCountFrequency (%)
V 1
14.3%
B 1
14.3%
N 1
14.3%
U 1
14.3%
K 1
14.3%
P 1
14.3%
O 1
14.3%
Other Punctuation
ValueCountFrequency (%)
; 130
92.9%
? 9
 
6.4%
1
 
0.7%
Close Punctuation
ValueCountFrequency (%)
) 542
100.0%
Open Punctuation
ValueCountFrequency (%)
( 541
100.0%
Space Separator
ValueCountFrequency (%)
107
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 30
100.0%
Decimal Number
ValueCountFrequency (%)
3 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2302
38.3%
Han 2204
36.6%
Common 1361
22.6%
Katakana 92
 
1.5%
Latin 59
 
1.0%

Most frequent character per script

Han
ValueCountFrequency (%)
459
20.8%
451
20.5%
73
 
3.3%
68
 
3.1%
59
 
2.7%
53
 
2.4%
39
 
1.8%
36
 
1.6%
29
 
1.3%
27
 
1.2%
Other values (240) 910
41.3%
Hangul
ValueCountFrequency (%)
288
 
12.5%
284
 
12.3%
203
 
8.8%
171
 
7.4%
104
 
4.5%
90
 
3.9%
87
 
3.8%
87
 
3.8%
81
 
3.5%
52
 
2.3%
Other values (162) 855
37.1%
Katakana
ValueCountFrequency (%)
9
 
9.8%
9
 
9.8%
8
 
8.7%
7
 
7.6%
6
 
6.5%
5
 
5.4%
5
 
5.4%
5
 
5.4%
4
 
4.3%
3
 
3.3%
Other values (20) 31
33.7%
Latin
ValueCountFrequency (%)
s 8
13.6%
i 7
 
11.9%
k 6
 
10.2%
a 5
 
8.5%
o 4
 
6.8%
r 3
 
5.1%
l 2
 
3.4%
v 2
 
3.4%
t 2
 
3.4%
n 2
 
3.4%
Other values (18) 18
30.5%
Common
ValueCountFrequency (%)
) 542
39.8%
( 541
39.8%
; 130
 
9.6%
107
 
7.9%
- 30
 
2.2%
? 9
 
0.7%
1
 
0.1%
3 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2302
38.3%
CJK 2176
36.2%
ASCII 1419
23.6%
Katakana 92
 
1.5%
CJK Compat Ideographs 28
 
0.5%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
) 542
38.2%
( 541
38.1%
; 130
 
9.2%
107
 
7.5%
- 30
 
2.1%
? 9
 
0.6%
s 8
 
0.6%
i 7
 
0.5%
k 6
 
0.4%
a 5
 
0.4%
Other values (25) 34
 
2.4%
CJK
ValueCountFrequency (%)
459
21.1%
451
20.7%
73
 
3.4%
68
 
3.1%
59
 
2.7%
53
 
2.4%
39
 
1.8%
36
 
1.7%
29
 
1.3%
27
 
1.2%
Other values (232) 882
40.5%
Hangul
ValueCountFrequency (%)
288
 
12.5%
284
 
12.3%
203
 
8.8%
171
 
7.4%
104
 
4.5%
90
 
3.9%
87
 
3.8%
87
 
3.8%
81
 
3.5%
52
 
2.3%
Other values (162) 855
37.1%
CJK Compat Ideographs
ValueCountFrequency (%)
15
53.6%
5
 
17.9%
2
 
7.1%
2
 
7.1%
1
 
3.6%
1
 
3.6%
1
 
3.6%
1
 
3.6%
Katakana
ValueCountFrequency (%)
9
 
9.8%
9
 
9.8%
8
 
8.7%
7
 
7.6%
6
 
6.5%
5
 
5.4%
5
 
5.4%
5
 
5.4%
4
 
4.3%
3
 
3.3%
Other values (20) 31
33.7%
None
ValueCountFrequency (%)
1
100.0%

사건세부장소
Text

MISSING 

Distinct 3480
Distinct (%) 61.8%
Missing 4366
Missing (%) 43.7%
Memory size 156.2 KiB

Length

Max length 149
Median length 93
Mean length 9.1446574
Min length 1

Characters and Unicode

Total characters 51521
Distinct characters 1484
Distinct categories 10
Distinct scripts 5
Distinct blocks 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2779
Unique (%) 49.3%

Sample

1st row 沙川憲兵駐在所
2nd row 率丁里
3rd row 李萬珪 집;北部禮拜堂;京城 吳華英 집
4th row 洪原邑內
5th row 邑內洞
ValueCountFrequency (%)
160
 
2.2%
元谷面事務所 74
 
1.0%
沙江里 63
 
0.9%
도내 62
 
0.9%
松山面事務所 55
 
0.8%
陽城警察官駐在所;陽城郵便所;元谷面事務所 54
 
0.8%
雨汀面事務所;花樹駐在所;雙峯山;長安面事務所 53
 
0.7%
龍井村 53
 
0.7%
京城)府內;숭덕학교;홍서동 50
 
0.7%
서부교회;송항리;북청읍내;이원읍내;천북동;구암리;만정;고사정;청주천도교구실;고려정;동막리 50
 
0.7%
Other values (3859) 6496
90.6%

Most occurring characters

ValueCountFrequency (%)
; 4035
 
7.8%
1785
 
3.5%
1643
 
3.2%
1415
 
2.7%
959
 
1.9%
867
 
1.7%
783
 
1.5%
723
 
1.4%
715
 
1.4%
708
 
1.4%
Other values (1474) 37888
73.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 44116
85.6%
Other Punctuation 4211
 
8.2%
Space Separator 1643
 
3.2%
Open Punctuation 597
 
1.2%
Close Punctuation 588
 
1.1%
Decimal Number 191
 
0.4%
Lowercase Letter 115
 
0.2%
Uppercase Letter 40
 
0.1%
Dash Punctuation 14
 
< 0.1%
Math Symbol 6
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1785
 
4.0%
1415
 
3.2%
959
 
2.2%
867
 
2.0%
783
 
1.8%
723
 
1.6%
715
 
1.6%
708
 
1.6%
690
 
1.6%
588
 
1.3%
Other values (1414) 34883
79.1%
Lowercase Letter
ValueCountFrequency (%)
a 19
16.5%
n 16
13.9%
o 13
11.3%
g 9
7.8%
m 7
 
6.1%
i 7
 
6.1%
l 6
 
5.2%
s 6
 
5.2%
d 5
 
4.3%
e 5
 
4.3%
Other values (8) 22
19.1%
Uppercase Letter
ValueCountFrequency (%)
S 8
20.0%
A 4
10.0%
O 3
 
7.5%
C 3
 
7.5%
K 3
 
7.5%
U 3
 
7.5%
P 2
 
5.0%
M 2
 
5.0%
H 2
 
5.0%
R 2
 
5.0%
Other values (7) 8
20.0%
Decimal Number
ValueCountFrequency (%)
1 83
43.5%
2 51
26.7%
3 22
 
11.5%
5 16
 
8.4%
4 15
 
7.9%
0 2
 
1.0%
8 1
 
0.5%
6 1
 
0.5%
Other Punctuation
ValueCountFrequency (%)
; 4035
95.8%
? 152
 
3.6%
: 18
 
0.4%
. 4
 
0.1%
* 1
 
< 0.1%
/ 1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 474
79.4%
[ 122
 
20.4%
{ 1
 
0.2%
Close Punctuation
ValueCountFrequency (%)
) 466
79.3%
] 121
 
20.6%
} 1
 
0.2%
Math Symbol
ValueCountFrequency (%)
~ 3
50.0%
2
33.3%
+ 1
 
16.7%
Space Separator
ValueCountFrequency (%)
1643
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 33805
65.6%
Hangul 10011
 
19.4%
Common 7250
 
14.1%
Katakana 300
 
0.6%
Latin 155
 
0.3%

Most frequent character per script

Han
ValueCountFrequency (%)
1785
 
5.3%
1415
 
4.2%
959
 
2.8%
867
 
2.6%
783
 
2.3%
715
 
2.1%
708
 
2.1%
690
 
2.0%
588
 
1.7%
559
 
1.7%
Other values (1050) 24736
73.2%
Hangul
ValueCountFrequency (%)
723
 
7.2%
367
 
3.7%
336
 
3.4%
283
 
2.8%
269
 
2.7%
266
 
2.7%
257
 
2.6%
243
 
2.4%
200
 
2.0%
199
 
2.0%
Other values (307) 6868
68.6%
Katakana
ValueCountFrequency (%)
56
18.7%
54
18.0%
53
17.7%
12
 
4.0%
11
 
3.7%
11
 
3.7%
10
 
3.3%
8
 
2.7%
7
 
2.3%
6
 
2.0%
Other values (37) 72
24.0%
Latin
ValueCountFrequency (%)
a 19
 
12.3%
n 16
 
10.3%
o 13
 
8.4%
g 9
 
5.8%
S 8
 
5.2%
m 7
 
4.5%
i 7
 
4.5%
l 6
 
3.9%
s 6
 
3.9%
d 5
 
3.2%
Other values (25) 59
38.1%
Common
ValueCountFrequency (%)
; 4035
55.7%
1643
22.7%
( 474
 
6.5%
) 466
 
6.4%
? 152
 
2.1%
[ 122
 
1.7%
] 121
 
1.7%
1 83
 
1.1%
2 51
 
0.7%
3 22
 
0.3%
Other values (15) 81
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
CJK 33805
65.6%
Hangul 10010
 
19.4%
ASCII 7403
 
14.4%
Katakana 300
 
0.6%
Math Operators 2
 
< 0.1%
Compat Jamo 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
; 4035
54.5%
1643
22.2%
( 474
 
6.4%
) 466
 
6.3%
? 152
 
2.1%
[ 122
 
1.6%
] 121
 
1.6%
1 83
 
1.1%
2 51
 
0.7%
3 22
 
0.3%
Other values (49) 234
 
3.2%
CJK
ValueCountFrequency (%)
1785
 
5.3%
1415
 
4.2%
959
 
2.8%
867
 
2.6%
783
 
2.3%
715
 
2.1%
708
 
2.1%
690
 
2.0%
588
 
1.7%
559
 
1.7%
Other values (1050) 24736
73.2%
Hangul
ValueCountFrequency (%)
723
 
7.2%
367
 
3.7%
336
 
3.4%
283
 
2.8%
269
 
2.7%
266
 
2.7%
257
 
2.6%
243
 
2.4%
200
 
2.0%
199
 
2.0%
Other values (306) 6867
68.6%
Katakana
ValueCountFrequency (%)
56
18.7%
54
18.0%
53
17.7%
12
 
4.0%
11
 
3.7%
11
 
3.7%
10
 
3.3%
8
 
2.7%
7
 
2.3%
6
 
2.0%
Other values (37) 72
24.0%
Math Operators
ValueCountFrequency (%)
2
100.0%
Compat Jamo
ValueCountFrequency (%)
1
100.0%

탄압정보포함여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 156.2 KiB
아니요
7790 
2191 
<NA>
 
19

Length

Max length 4
Median length 3
Mean length 2.5637
Min length 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 아니요
2nd row 아니요
3rd row 아니요
4th row 아니요
5th row 아니요

Common Values

ValueCountFrequency (%)
아니요 7790
77.9%
2191
 
21.9%
<NA> 19
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
아니요 7790
77.9%
2191
 
21.9%
na 19
 
0.2%

제공
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 156.2 KiB
국사편찬위원회 9981
<NA> 19

Length

Max length 7
Median length 7
Mean length 6.9943
Min length 4

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 국사편찬위원회
2nd row 국사편찬위원회
3rd row 국사편찬위원회
4th row 국사편찬위원회
5th row 국사편찬위원회

Common Values

Value  Count  Frequency (%)
국사편찬위원회  9981  99.8%
<NA>  19  0.2%
Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
국사편찬위원회  9981  99.8%
na  19  0.2%

제공일자
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 156.2 KiB
2020-01-31 9981
<NA> 19

Length

Max length 10
Median length 10
Mean length 9.9886
Min length 4

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2020-01-31
2nd row 2020-01-31
3rd row 2020-01-31
4th row 2020-01-31
5th row 2020-01-31

Common Values

Value  Count  Frequency (%)
2020-01-31  9981  99.8%
<NA>  19  0.2%
Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2020-01-31  9981  99.8%
na  19  0.2%

Unnamed: 17
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 10000
Missing (%) 100.0%
Memory size 166.0 KiB

Unnamed: 18
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 10000
Missing (%) 100.0%
Memory size 166.0 KiB

Correlations

출처정보구분  출전문서구분  탄압정보포함여부
출처정보구분  1.000  0.388  0.297
출전문서구분  0.388  1.000  0.503
탄압정보포함여부  0.297  0.503  1.000
제공일자  제공  출전문서구분  출처정보구분  탄압정보포함여부
제공일자  1.000  1.000  1.000  1.000  1.000
제공  1.000  1.000  1.000  1.000  1.000
출전문서구분  1.000  1.000  1.000  0.250  0.379
출처정보구분  1.000  1.000  0.250  1.000  0.362
탄압정보포함여부  1.000  1.000  0.379  0.362  1.000
출처정보구분  출전문서구분  탄압정보포함여부  제공  제공일자
출처정보구분  1.000  0.250  0.362  1.000  1.000
출전문서구분  0.250  1.000  0.379  1.000  1.000
탄압정보포함여부  0.362  0.379  1.000  1.000  1.000
제공  1.000  1.000  1.000  1.000  1.000
제공일자  1.000  1.000  1.000  1.000  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
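
Nullity correlation can be reproduced by correlating per-column missingness indicators. The sketch below assumes the heatmap follows the common approach of taking the Pearson correlation of `isna()` flags after dropping columns that are always or never missing. The report does not confirm the exact method. `nullity_correlation` is a hypothetical helper, and the toy frame only borrows column names from this dataset.

```python
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between per-column missingness indicators."""
    is_missing = df.isna()
    # Columns that are always or never missing have zero variance,
    # so their correlation is undefined and they are left out.
    varying = [col for col in is_missing.columns if is_missing[col].nunique() == 2]
    return is_missing[varying].astype(int).corr()


# Toy data for illustration only; values are not taken from the real table.
toy = pd.DataFrame({
    "사건종료일자": ["1919-03-10", None, None, "1919-04-23"],
    "사건국외도시": [None, "上海", "東京", None],
    "제공": ["국사편찬위원회"] * 4,
})
corr = nullity_correlation(toy)
print(corr)
```

A value near 1 means two columns tend to be missing together, a value near -1 means one is present when the other is missing, and a value near 0 means their missingness is unrelated.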

Sample

출처정보ID  사건ID  출전문서ID  출전문서제목  출처정보구분  출전문서구분  출처정보  비고  사건일자  사건종료일자  사건장소코드  사건장소명  사건국외도시  사건세부장소  탄압정보포함여부  제공  제공일자  Unnamed: 17  Unnamed: 18
565616G_0479b_30221jssy_007_0060獨立運動에 관한 건(제6보)운동소요사건서류[독립운동에 관한 건(제6보)] 平安南道 江西郡 沙川憲兵駐在所는 3월 4일(時間不詳)에 暴民이 來襲하므로 所長 上等兵 佐藤實五郞 외 補助員 3명이 힘을 다해 시위대를 격퇴하는 데 힘썼으나 탄약을 다 사용하고 중과부적으로 마침내 장렬한 최후를 맞었다. 平壤憲兵分隊長 이하 10명이 사천으로 급행했는데 暴民 중에도 다수의 사상자가 있는 것 같다.<NA>1919-03-04<NA>K02_03平安南道 江西郡 斑石面<NA>沙川憲兵駐在所아니요국사편찬위원회2020-01-31<NA><NA>
17499hd_026_0070_0100_001a_00051hd_026_0070_0100調書운동경성지법검사국문서1919년 3월 18일 강화군 부내면 만세시위사건 피고인 高成根(松海面長)에 대한 경찰 신문조서. 피고인 염씨가 전달한 불온문서의 수취 경위 등 신문<NA>1919-03-16<NA>A02_10京畿道 江華郡 松海面<NA>率丁里아니요국사편찬위원회2020-01-31<NA><NA>
13281haf_109_0151_001<NA>haf_109_0151[當地朝鮮人ニハ元來日韓倂合ヲ…]독립운동일본외무성기록니콜리스크(ニコリスクコエ)에서는 韓族國民委員會와 협의 후 2월 상순 파리에 尹海 및 高昌一 2명을 파견하였다. 또 [上海]에서도 김규식을 파견하였다.<NA>1919-02-99<NA>X03_01러시아 연해주 지역니콜리스크(ニコリスクコエ)<NA>아니요국사편찬위원회2020-01-31<NA><NA>
15003hd_013_0030_1010_001x_00000hd_013_0030_1010第2回證人訊問調書운동경성지법검사국문서1919년 3월 1일 경기도 개성의 독립선언서 배포 관련자 강조원에 대한 증인 오화영의 경찰신문조서임(제2회). 2월 중순 개성 방문시 행적, 2월 26일 상경한 강조원과 면담 내용 등 신문임<NA>1919-02-221919-02-28A03_11京畿道 開城郡 松都面<NA>李萬珪 집;北部禮拜堂;京城 吳華英 집아니요국사편찬위원회2020-01-31<NA><NA>
12053dr_07101001_0010<NA>dr_071_01_001獨立公債募集독립운동독립신문1920년 04월 29일 [獨立公債募集] 독립공채모집 홍보를 게재<NA>1919-04-29<NA>X02중국 關內上海(상해)<NA>아니요국사편찬위원회2020-01-31<NA><NA>
600216G_0801b_51722jssy_007_0260獨立運動에 관한 건(제18보)운동소요사건서류獨立運動에 관한 건(제18보) / 咸鏡南道 洪原郡 洪原. 3월 16일 오전 11시경 홍원에서 天道敎徒 약 700명이 집합하여 舊韓國國旗를 세우고 독립만세를 연호하면서 읍내를 돌아다니는 不穩한 형세가 있어서 首謀者 10명을 경찰서에 검속하고 다른사람은 해산시켰다.<NA>1919-03-16<NA>I17_08咸鏡南道 洪原郡 州翼面<NA>洪原邑內국사편찬위원회2020-01-31<NA><NA>
805418B_0628b_00822jssy_007_2570騷擾事件 經過 槪覽表(1919.3.1-1919.4.30)운동소요사건서류[騷擾事件經過槪覽表(1919.3.1~4.30.)]4월 4일, 江原 襄陽, 示威運動以上, 群衆數 1100명, 發砲, 死傷(暴民[한국] 3명, 我[일제] 1명)<NA>1919-04-04<NA>H08江原道 襄陽郡<NA><NA>국사편찬위원회2020-01-31<NA><NA>
723118A_1021h_10256jssy_001_5470朝鮮騷擾事件一覽表에 關한 件운동소요사건서류[朝鮮騷擾事件一覽表 1919년 4월말 조사]3월 17일, 河東郡 河東面 邑內洞, 未然防止<NA>1919-03-17<NA>G18_06慶尙南道 河東郡 河東面<NA>邑內洞아니요국사편찬위원회2020-01-31<NA><NA>
273216D_0540<NA>jssy_006_0070國外情報 : 大同團이 配付한 不穩印刷物의 件독립운동소요사건서류金嘉鎭이 總載인 在上海大同團은 조선인이 奮起할 것을 喚起하고 또 軍資金醵出을 권유하기 위해 두 종류의 不穩文書를 인쇄하여 조선 내외 각지에 발송하였는데 佈告文 중 血戰 云云하는 시기는 지난번 威鏡北道 穩城地方의 침입을 뜻하는 것이 아니고 오히려 앞으로의 일을 기도한 것이라 함.첨부문서 있음(大同團總部의 通告文과 醵金勸告文, 大同團總裁의 佈告文)1920-03-061920-03-10X02중국 關內上海<NA>아니요국사편찬위원회2020-01-31<NA><NA>
305716E_0100a_00579jssy_001_1680朝鮮에 있어서 獨立運動에 關한 件(次官으로부터 侍從武官長에 통첩)운동소요사건서류[獨立運動을 위한 朝鮮人 不穩 行動에 關한 狀況(3월 21일부터 31일 사이의 조사)] 京畿道 始興郡, 富川郡, 水原郡, 龍仁郡, 楊州郡, 抱川郡 등 11개 장소에서 폭민 약 200, 300명 많게는 2000명이 시위를 일으켰다. 이들은 폭력적 양상을 보였는데, 수원 남쪽에 있는 우편서, 주재소, 일본인 가옥을 파괴, 방화 등의 광폭한 성향을 보였다. 대부분 약간명의 군대를 파견하였다. 군대의 원조를 받아 진압하였다. 사상자가 약간 있다.3월 29일 양주군의 시위는 노해면, 장흥면, 별내면이 있다. 각면에 배치한다.1919-03-29<NA>A16_03京畿道 楊州郡 蘆海面<NA><NA>아니요국사편찬위원회2020-01-31<NA><NA>
출처정보ID  사건ID  출전문서ID  출전문서제목  출처정보구분  출전문서구분  출처정보  비고  사건일자  사건종료일자  사건장소코드  사건장소명  사건국외도시  사건세부장소  탄압정보포함여부  제공  제공일자  Unnamed: 17  Unnamed: 18
720218A_0991h_10226jssy_001_5470朝鮮騷擾事件一覽表에 關한 件운동소요사건서류[朝鮮騷擾事件一覽表 1919년 4월말 조사]4월 3일, 金海郡 長有面 新文市場, 未然防止<NA>1919-04-03<NA>G03_09慶尙南道 金海郡 長有面<NA>新文市場아니요국사편찬위원회2020-01-31<NA><NA>
16730hd_022_0020_1060_0100_001a_00509hd_022_0020_1060_0100公判始末書운동경성지법검사국문서1919년 3월 28일 수원군 송산면 만세시위 및 순사살해사건 피고인 洪?玉[당시 松山面 서기]에 대한 1920년 3월 31일 경성지법 공판 시말서<NA>1919-03-28<NA>A13_09京畿道 水原郡 松山面<NA>松山面事務所아니요국사편찬위원회2020-01-31<NA><NA>
543016G_0304_02<NA>jssy_004_0210鮮內外一般 狀況(6月21日~7月10日)독립운동소요사건서류조선 내외 일반상황(6월 21일~7월 10일), 鮮外 상황, 豆滿江 對岸 방면 / 6월 24일 간도지방의 한족독립운동 유력자 柳河天(前 淸津府 서기로서 현재 間島時報 기자인 자로서 독립운동 때 문서부장을 맡았음)은 영사관에 자수해 구금되었다.間島領事館 소재지인 용정촌을 장소로 특정하였다.1919-06-24<NA>X01_02_25북간도[吉林省 延吉道] 延吉縣<NA>용정촌 間島領事館아니요국사편찬위원회2020-01-31<NA><NA>
712018A_0909a_00310jssy_001_5470朝鮮騷擾事件一覽表에 關한 件운동소요사건서류[朝鮮騷擾事件一覽表 1919년 4월말 조사]4월 2일, 公州郡 正安面 臺山里[大山里], 暴行 없음, 騷擾人員 500명, 騷擾者種別 보통민, 警察 管轄 騷擾地<NA>1919-04-02<NA>C01_11忠淸南道 公州郡 正安面<NA>臺山里아니요국사편찬위원회2020-01-31<NA><NA>
19141mis_0062_002a_00019mis_0062Stories of Wounded Koreans in Severance Hospital운동재한선교사보고자료[세브란스병원 한국인 부상자들의 증언, F. G. Vesey 목사, 1919년 3월 29일]<br/>(3) Kim Nam San[김남산], 27세. 파주(Paiju) 출신, 어깨에 총상, 종교 없음. 공웅(Kong Ung)장날, 마을 사람들과 함께 장터로 가서 약 1,000명 정도 되는 사람들이 함께 모여 만세시위. 헌병 8명 출종(일본인 6명, 한국인 2명). 일본인 헌병들만이 총을 휴대하고고 있었으며, 군중이 계속하여 소리 높이 외치자, 발포. 4명은 죽고 3명 부상(그가 아는 한에서).mis_2008_002/mis_0059_002와 동일한 내용.1919-03-28<NA>A21_01京畿道 坡州郡 廣灘面<NA>공웅장국사편찬위원회2020-01-31<NA><NA>
9235MI191903290400_0010N_12020MI_1919_0329_0400騷擾事件의 後報, 경상북도 金泉, 잔치 끝에 소요운동매일신보매일신보 1919년 3월 29일 [騷擾事件의 後報, 경상북도 金泉, 잔치 끝에 소요]24일 김천군 개령면 보통학교 졸업증서 수여식을 거행하게되어 다수의 학부형이 참석하고 그 수여식을 마친후 동면에 있던 사람의 집 결혼식이 있어서 여기 모인 사람들은 음식을 먹은 뒤에 동부 높은 산에 올라가서 소요를 일으키였음으로 헌병출장소원은 쫓아가서 해산을 명하는 동시에 4명을 인치하였더라 한편 이말을 들은 김천헌병분대는 고교생등 병보조원 3명이 개령으로 급하하고 출장소원과 함께 한쪽의 가택을 수색하고자 하였으나 아무증거가 없었더라.1919-03-24<NA>F05_02慶尙北道 金泉郡 開寧面<NA>동부동(개령읍내)국사편찬위원회2020-01-31<NA><NA>
13562haf_110_0007_001a_00478haf_110_0007[電報譯 : 三十日京畿龍仁郡ホウトクセン附近ニ暴民…]운동일본외무성기록3월 30일 京畿道 龍仁郡 豊德川 부근에서 暴民 2,000명이 騷擾를 일으켜 출장한 헌병을 폭행하였다. 發砲하여 해산시켰다. 조선인 사망 2명이다.<NA>1919-03-30<NA>A10_07京畿道 龍仁郡 水枝面<NA>豊德川 부근아니요국사편찬위원회2020-01-31<NA><NA>
9419MI191904060690_0030N_00397MI_1919_0406_0690騷擾事件의 後報, 충청북도 槐山, 각 면에서 소요운동매일신보매일신보 1919년 4월 6일 [騷擾事件의 後報, 충청북도 槐山, 각 면에서 소요]괴산군 증평면 장연면에서는 일일에 시위운동을 시작하였음으로 주모자로 인정할만한자 약간명을 체포하였으며 장연에서는 다수한 군중이 면사무소를 습격하여 문부을 파기하였다하며 감을면 방면에서는 이일에 시위운동을 하고 소수면에서는 군중 약 7백명이 동면 면장을 끌어내어 선두에 세우고 만세를 부르게하고 각동리로 돌아다니다가 해산하였다더라.본문에 청천은 괴산군 청천면 청천리, 장연은 장연면 오가리, 감을면은 감물면의 오기, 방축동은 사리면의 방축리, 제월은 괴산면 제월리, 이탄은 감물면의 오성리와 검승리 사이의 자연마을(성골산 아래)이다.1919-04-02<NA>B01_07忠淸北道 槐山郡 沼壽面<NA><NA>아니요국사편찬위원회2020-01-31<NA><NA>
822018B_0794b_21221jssy_007_2570騷擾事件 經過 槪覽表(1919.3.1-1919.4.30)운동소요사건서류[騷擾事件經過槪覽表(1919.3.1~4.30.)]4월 12일, 慶北 善山郡 善山, 示威運動以上, 群衆數 30명<NA>1919-04-12<NA>F12_06慶尙北道 善山郡 善山面<NA><NA>아니요국사편찬위원회2020-01-31<NA><NA>
16269hd_021_0010_0080_001a_00511hd_021_0010_0080證人 宋哲浩 調書운동경성지법검사국문서1919년 4월 3일 장안면 우정면 시위사건 증인 송철호에 대한 경성지방법원 예심판사의 신문조서. 우정면 화수리 송철호는 이번 소요로 아버지가 피살되고 집이 소실되었으며, 소요 당일 저녁 崔長官이 피묻은 몽둥이를 들고 있었다는 애기를 朱日峰으로부터 들었다고 진술.이하 원본에 관련문서 1장 있음1919-04-03<NA>A13_13京畿道 水原郡 雨汀面<NA>花樹駐在所아니요국사편찬위원회2020-01-31<NA><NA>