Overview

Dataset statistics

Number of variables: 4
Number of observations: 592
Missing cells: 196
Missing cells (%): 8.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.2 KiB
Average record size in memory: 33.2 B
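The headline figures above follow from simple counts. The sketch below recomputes them. It assumes the data is held as a list of equal-length rows, with `None` marking a missing cell. `dataset_summary` is a hypothetical helper, not part of ydata-profiling.

```python
from typing import Any, Optional, Sequence

Row = Sequence[Optional[Any]]  # one record; None marks a missing cell


def dataset_summary(rows: list[Row]) -> dict[str, float]:
    """Recompute headline dataset statistics from equal-length rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)
    seen: set = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)  # cells must be hashable
        if key in seen:
            duplicates += 1  # an identical row already appeared earlier
        else:
            seen.add(key)
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / cells if cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```

With 592 rows and 4 variables there are 2368 cells. The 196 missing cells therefore give 196 / 2368 ≈ 8.28%, which the report rounds to 8.3%.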

Variable types

Numeric: 1
Text: 2
Categorical: 1

Dataset

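Description: Data provided by the National Museum of Korea on the official documents of the Government-General of Korea Museum (조선총독부박물관). It contains the fields 문권제목 (document file title), 정리분류 (arrangement category) and 연도 (year).
Author: Ministry of Culture, Sports and Tourism, National Museum of Korea (문화체육관광부 국립중앙박물관)
URL: https://www.data.go.kr/data/3070537/fileData.do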

Alerts

연번 is highly overall correlated with 정리분류 [High correlation]
정리분류 is highly overall correlated with 연번 [High correlation]
연도 has 196 (33.1%) missing values [Missing]
연번 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 06:14:18.886645
Analysis finished: 2023-12-12 06:14:19.623417
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 592
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 297.375
Minimum: 1
Maximum: 593
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 30.55
Q1: 149.75
Median: 297.5
Q3: 445.25
95th percentile: 563.45
Maximum: 593
Range: 592
Interquartile range (IQR): 295.5
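The quartiles and percentiles above are order statistics. The sketch below assumes linear interpolation between neighbouring sorted values at position q·(n − 1), which is the default `interpolation="linear"` of pandas `Series.quantile`. The names `quantile` and `quantile_summary` are illustrative only.

```python
import math


def quantile(sorted_values: list[float], q: float) -> float:
    """Linear-interpolation quantile of pre-sorted data at position q * (n - 1)."""
    if not sorted_values:
        raise ValueError("quantile of empty data")
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo  # how far we are between the two neighbouring values
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def quantile_summary(values: list[float]) -> dict[str, float]:
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "min": xs[0],
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": quantile(xs, 0.5),
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }
```

With n = 592 values, Q1 sits at position 0.25 × 591 = 147.75. That is three quarters of the way between the 148th and 149th smallest values. The IQR is Q3 − Q1 = 445.25 − 149.75 = 295.5.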

Descriptive statistics

Standard deviation: 171.22977
Coefficient of variation (CV): 0.5758042
Kurtosis: -1.1975536
Mean: 297.375
Median Absolute Deviation (MAD): 148
Skewness: -0.0028923879
Sum: 176046
Variance: 29319.636
Monotonicity: Strictly increasing
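The moment-based figures can be computed with the bias-adjusted estimators that pandas uses for `Series.skew` and `Series.kurt`: adjusted Fisher-Pearson skewness and excess kurtosis. Whether the report calls exactly those methods is an assumption here. The function `describe` is hypothetical. It computes the unscaled MAD, meaning the median of absolute deviations from the median, with no 1.4826 consistency factor.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Moment-based summary using bias-adjusted estimators (as pandas does)."""
    n = len(values)
    if n < 4:
        raise ValueError("adjusted kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev)
    m3 = sum(d ** 3 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    if m2 == 0:
        raise ValueError("constant data has no defined skewness or kurtosis")
    variance = m2 / (n - 1)  # sample variance, ddof = 1
    std = math.sqrt(variance)
    med = statistics.median(values)
    return {
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "kurtosis": (n + 1) * n * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),  # excess kurtosis G2
        "mean": mean,
        "mad": statistics.median(abs(x - med) for x in values),  # unscaled MAD
        "skewness": n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5,  # G1
        "sum": sum(values),
        "variance": variance,
    }
```

As a quick check on the table, CV = 171.22977 / 297.375 ≈ 0.5758. An excess kurtosis near −1.2 is what an evenly spread sequence such as a strictly increasing serial number would give.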
[Figure: histogram with fixed size bins (bins=50)]
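A fixed-size-bin histogram splits the range [min, max] into equal-width intervals. With 50 bins over a range of 592, each bin of 연번 is 592 / 50 = 11.84 wide. The sketch below (hypothetical `fixed_width_histogram`) follows the half-open convention documented for `numpy.histogram`, in which only the last bin also includes the maximum. Values lying exactly on a bin edge may occasionally land differently from NumPy because of floating-point rounding.

```python
def fixed_width_histogram(values: list[float], bins: int = 50) -> list[int]:
    """Count values into `bins` equal-width bins spanning [min(values), max(values)].

    Bins are half-open [left, right) except the last, which also includes the
    maximum (the convention documented for numpy.histogram).
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:  # degenerate range: simplification, put everything in bin 0
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum maps to index `bins`; clamp it
    return counts
```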
Common values
Value | Count | Frequency (%)
1 | 1 | 0.2%
392 | 1 | 0.2%
394 | 1 | 0.2%
395 | 1 | 0.2%
396 | 1 | 0.2%
397 | 1 | 0.2%
398 | 1 | 0.2%
399 | 1 | 0.2%
400 | 1 | 0.2%
401 | 1 | 0.2%
Other values (582) | 582 | 98.3%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
3 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
593 | 1 | 0.2%
592 | 1 | 0.2%
591 | 1 | 0.2%
590 | 1 | 0.2%
589 | 1 | 0.2%
588 | 1 | 0.2%
587 | 1 | 0.2%
586 | 1 | 0.2%
585 | 1 | 0.2%
584 | 1 | 0.2%
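The three tables above combine a frequency count with the two ends of the sorted distinct values. The sketch below builds them with `collections.Counter`; the function names are hypothetical. Ties in the common-values table are listed here in first-seen order, which need not match the order the report uses.

```python
from collections import Counter


def common_values(values: list, top: int = 10) -> list[tuple[object, int, float]]:
    """Most frequent values with counts and percentages, plus an 'Other values' row."""
    n = len(values)
    ranked = Counter(values).most_common()  # sorted by count, ties in first-seen order
    rows = [(v, c, 100.0 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    return rows


def extreme_values(values: list, k: int = 10) -> tuple[list, list]:
    """The k smallest distinct values (ascending) and k largest (descending)."""
    distinct = sorted(set(values))
    return distinct[:k], distinct[::-1][:k]
```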

문권제목 (document file title)
Text

Distinct: 582
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

Length

Max length: 62
Median length: 44.5
Mean length: 18.739865
Min length: 2
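Length statistics are taken over per-row string lengths. The sketch below (hypothetical `length_summary`) assumes that length means the number of Unicode code points, as returned by Python's `len`. Under that assumption a precomposed Hangul syllable such as 기 counts once, whereas text in decomposed (NFD) form would count its jamo separately. The sketch takes an ordinary median of the per-row lengths, which is not guaranteed to match how the report derives its median figure.

```python
import statistics


def length_summary(texts: list[str]) -> dict[str, float]:
    """Per-row length statistics plus total and distinct character counts."""
    lengths = [len(t) for t in texts]  # len() counts Unicode code points
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
        "distinct_characters": len(set("".join(texts))),
    }
```

As a consistency check, the mean length times the row count gives the total character count: 18.739865 × 592 ≈ 11094.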

Characters and Unicode

Total characters: 11094
Distinct characters: 673
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
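Python's standard `unicodedata` module exposes the same general-category property. The long names used in this report (Other Letter, Space Separator, Decimal Number, Open Punctuation, Close Punctuation, Other Punctuation, Dash Punctuation, Math Symbol) correspond to the codes Lo, Zs, Nd, Ps, Pe, Po, Pd and Sm. A minimal sketch follows; `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(texts: list[str]) -> Counter:
    """Count characters by Unicode general category across all texts."""
    counts: Counter = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # unknown codes kept as-is
    return counts
```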

Unique

Unique: 573
Unique (%): 96.8%

Sample

1st row: 기부품 목록
2nd row: 백자투조모란문호(白磁透彫牡丹文壺)
3rd row: 수탁증서(受託證書) 1~96호
4th row: 대정 13년도~소화 4년도 진열품 기부 문서철
5th row: 대정 4년도~소화 8년도 진열품 기부 문서철

Most occurring words

Value | Count | Frequency (%)
대정 | 53 | 2.2%
 | 48 | 1.9%
관계 | 36 | 1.5%
소화 | 36 | 1.5%
 | 31 | 1.3%
목록 | 30 | 1.2%
고적 | 29 | 1.2%
 | 28 | 1.1%
쇼와(昭和 | 26 | 1.1%
지정 | 26 | 1.1%
Other values (1079) | 2120 | 86.1%
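The word table splits titles into space-separated tokens. The token 쇼와(昭和 shows that at least some punctuation stays attached to words. The sketch below assumes plain whitespace splitting with `str.split()`. The report may apply further filtering, so its counts need not match exactly. `word_counts` is a hypothetical name.

```python
from collections import Counter


def word_counts(texts: list[str], top: int = 10) -> tuple[list[tuple[str, int, float]], int]:
    """Top whitespace-separated tokens with counts and percentages of all tokens."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    rows = [(w, c, 100.0 * c / total) for w, c in counts.most_common(top)]
    return rows, total
```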

Most occurring characters

Value | Count | Frequency (%)
(space) | 1883 | 17.0%
( | 331 | 3.0%
) | 330 | 3.0%
 | 277 | 2.5%
 | 275 | 2.5%
 | 267 | 2.4%
 | 205 | 1.8%
 | 178 | 1.6%
 | 164 | 1.5%
 | 160 | 1.4%
Other values (663) | 7024 | 63.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7959 | 71.7%
Space Separator | 1883 | 17.0%
Decimal Number | 412 | 3.7%
Open Punctuation | 340 | 3.1%
Close Punctuation | 340 | 3.1%
Other Punctuation | 56 | 0.5%
Dash Punctuation | 54 | 0.5%
Math Symbol | 50 | 0.5%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 277 | 3.5%
 | 275 | 3.5%
 | 267 | 3.4%
 | 205 | 2.6%
 | 178 | 2.2%
 | 164 | 2.1%
 | 160 | 2.0%
 | 153 | 1.9%
 | 142 | 1.8%
 | 132 | 1.7%
Other values (641) | 6006 | 75.5%

Decimal Number
Value | Count | Frequency (%)
1 | 147 | 35.7%
5 | 44 | 10.7%
2 | 42 | 10.2%
3 | 34 | 8.3%
4 | 32 | 7.8%
8 | 29 | 7.0%
6 | 27 | 6.6%
7 | 19 | 4.6%
9 | 19 | 4.6%
0 | 19 | 4.6%

Open Punctuation
Value | Count | Frequency (%)
( | 331 | 97.4%
[ | 8 | 2.4%
「 | 1 | 0.3%

Close Punctuation
Value | Count | Frequency (%)
) | 330 | 97.1%
] | 9 | 2.6%
」 | 1 | 0.3%

Other Punctuation
Value | Count | Frequency (%)
, | 53 | 94.6%
· | 2 | 3.6%
. | 1 | 1.8%

Space Separator
Value | Count | Frequency (%)
(space) | 1883 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 54 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 50 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7129 | 64.3%
Common | 3135 | 28.3%
Han | 829 | 7.5%
Hiragana | 1 | < 0.1%

Most frequent character per script

Han
Value | Count | Frequency (%)
 | 77 | 9.3%
 | 35 | 4.2%
 | 34 | 4.1%
 | 22 | 2.7%
 | 18 | 2.2%
殿 | 15 | 1.8%
 | 13 | 1.6%
 | 11 | 1.3%
 | 10 | 1.2%
 | 7 | 0.8%
Other values (325) | 587 | 70.8%

Hangul
Value | Count | Frequency (%)
 | 277 | 3.9%
 | 275 | 3.9%
 | 267 | 3.7%
 | 205 | 2.9%
 | 178 | 2.5%
 | 164 | 2.3%
 | 160 | 2.2%
 | 153 | 2.1%
 | 142 | 2.0%
 | 132 | 1.9%
Other values (305) | 5176 | 72.6%

Common
Value | Count | Frequency (%)
(space) | 1883 | 60.1%
( | 331 | 10.6%
) | 330 | 10.5%
1 | 147 | 4.7%
- | 54 | 1.7%
, | 53 | 1.7%
~ | 50 | 1.6%
5 | 44 | 1.4%
2 | 42 | 1.3%
3 | 34 | 1.1%
Other values (12) | 167 | 5.3%

Hiragana
Value | Count | Frequency (%)
の | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7129 | 64.3%
ASCII | 3131 | 28.2%
CJK | 804 | 7.2%
CJK Compat Ideographs | 25 | 0.2%
None | 4 | < 0.1%
Hiragana | 1 | < 0.1%
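Python's standard library has no lookup for Unicode block names, but blocks are fixed code-point ranges. The sketch below covers only the blocks relevant here. The range table and `block_of` are hypothetical helpers. The report's own labels (ASCII, CJK, None) differ from the official block names used below. For instance, the middle dot · (U+00B7) lies in the Latin-1 Supplement block, which is not in this table, so the sketch reports it as "Other".

```python
from collections import Counter

# Inclusive code-point ranges of a few Unicode blocks relevant to this dataset.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin (ASCII)"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
]


def block_of(ch: str) -> str:
    """Name of the block containing `ch`, or 'Other' if it is not in BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(texts: list[str]) -> Counter:
    """Count characters per block across all texts."""
    return Counter(block_of(ch) for text in texts for ch in text)
```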

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1883 | 60.1%
( | 331 | 10.6%
) | 330 | 10.5%
1 | 147 | 4.7%
- | 54 | 1.7%
, | 53 | 1.7%
~ | 50 | 1.6%
5 | 44 | 1.4%
2 | 42 | 1.3%
3 | 34 | 1.1%
Other values (9) | 163 | 5.2%

Hangul
Value | Count | Frequency (%)
 | 277 | 3.9%
 | 275 | 3.9%
 | 267 | 3.7%
 | 205 | 2.9%
 | 178 | 2.5%
 | 164 | 2.3%
 | 160 | 2.2%
 | 153 | 2.1%
 | 142 | 2.0%
 | 132 | 1.9%
Other values (305) | 5176 | 72.6%

CJK
Value | Count | Frequency (%)
 | 77 | 9.6%
 | 35 | 4.4%
 | 34 | 4.2%
 | 22 | 2.7%
 | 18 | 2.2%
殿 | 15 | 1.9%
 | 13 | 1.6%
 | 11 | 1.4%
 | 10 | 1.2%
 | 7 | 0.9%
Other values (312) | 562 | 69.9%

CJK Compat Ideographs
Value | Count | Frequency (%)
 | 5 | 20.0%
 | 5 | 20.0%
 | 3 | 12.0%
 | 2 | 8.0%
 | 2 | 8.0%
 | 1 | 4.0%
 | 1 | 4.0%
 | 1 | 4.0%
 | 1 | 4.0%
 | 1 | 4.0%
Other values (3) | 3 | 12.0%

None
Value | Count | Frequency (%)
· | 2 | 50.0%
「 | 1 | 25.0%
」 | 1 | 25.0%

Hiragana
Value | Count | Frequency (%)
の | 1 | 100.0%

정리분류 (arrangement category)
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

보존 | 197
고적조사 | 193
지정 | 56
진열 | 33
국유림 | 33
Other values (7) | 80

Length

Max length: 6
Median length: 2
Mean length: 2.7179054
Min length: 2

Unique

Unique: 2
Unique (%): 0.3%

Sample

1st row: 기부
2nd row: 기부
3rd row: 기부
4th row: 기부
5th row: 기부

Common Values

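Value | Count | Frequency (%)
보존 (preservation) | 197 | 33.3%
고적조사 (survey of historic sites) | 193 | 32.6%
지정 (designation) | 56 | 9.5%
진열 (exhibition) | 33 | 5.6%
국유림 (national forest) | 33 | 5.6%
기타 (miscellaneous) | 26 | 4.4%
발견 (discovery) | 24 | 4.1%
구입 (purchase) | 20 | 3.4%
기부 (donation) | 6 | 1.0%
도면 (drawings) | 2 | 0.3%
Other values (2) | 2 | 0.3%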
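In the report, Distinct counts the number of different categories (12). Unique counts categories that occur in exactly one row; here the two categories folded into "Other values (2)" account for both. The mean length is averaged over rows, not over categories. A minimal sketch follows; `category_summary` is a hypothetical helper.

```python
from collections import Counter


def category_summary(values: list[str]) -> dict[str, float]:
    """Distinct/unique counts and mean length for a categorical column."""
    n = len(values)
    counts = Counter(values)
    unique = sum(1 for c in counts.values() if c == 1)  # categories seen exactly once
    return {
        "distinct": len(counts),
        "distinct_pct": 100.0 * len(counts) / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
        "mean_length": sum(len(v) for v in values) / n,  # averaged over rows
    }
```

For this column, the reported mean length implies about 2.7179054 × 592 ≈ 1609 characters in total.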

Length

[Figure: histogram of lengths of the category]

연도 (year)
Text

MISSING

Distinct: 181
Distinct (%): 45.7%
Missing: 196
Missing (%): 33.1%
Memory size: 4.8 KiB

Length

Max length: 16
Median length: 15
Mean length: 5.5176768
Min length: 2

Characters and Unicode

Total characters: 2185
Distinct characters: 28
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 128
Unique (%): 32.3%

Sample

1st row: 대정12, 소화19
2nd row: 소화15
3rd row: 대정09~소화04
4th row: 대정04~소화07
5th row: 소화09~13

Most occurring words

Value | Count | Frequency (%)
대정 | 38 | 9.3%
대정06 | 18 | 4.4%
소화13 | 14 | 3.4%
소화09 | 13 | 3.2%
소화04 | 12 | 2.9%
소화05 | 11 | 2.7%
대정05 | 9 | 2.2%
소화12 | 9 | 2.2%
대정11 | 8 | 2.0%
소화15 | 8 | 2.0%
Other values (172) | 269 | 65.8%
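The 연도 values are Japanese era years written with Korean readings. 대정 is 大正 (Taishō), whose year 1 is 1912; 소화 is 昭和 (Shōwa), whose year 1 is 1926. As the sample rows show, `~` marks a range and `,` separates entries. The parser sketch below assumes that a bare number after `~` or `,` (as in 소화09~13) continues the preceding era. Forms whose meaning is not evident from the report, such as 소화09.13,14, are rejected rather than guessed. The function names are hypothetical.

```python
import re
from typing import Optional

# Gregorian year of year 1 of each era (Korean readings of the Japanese eras).
ERA_START = {"대정": 1912, "소화": 1926}

# One token: optional era name, optional whitespace, then a 1-2 digit year.
TOKEN = re.compile(r"(대정|소화)?\s*(\d{1,2})")


def to_gregorian(token: str, era: Optional[str]) -> tuple[int, str]:
    """Convert one era-year token; a bare number inherits the previous era."""
    m = TOKEN.fullmatch(token.strip())
    if m is None:
        raise ValueError(f"unrecognised year token: {token!r}")
    era = m.group(1) or era
    if era is None:
        raise ValueError(f"no era given for {token!r}")
    return ERA_START[era] + int(m.group(2)) - 1, era


def parse_year_field(field: str) -> list[tuple[int, int]]:
    """Parse e.g. '대정09~소화04' or '대정12, 소화19' into (start, end) year spans."""
    spans = []
    era: Optional[str] = None
    for part in field.split(","):
        ends = part.split("~")
        if len(ends) > 2:
            raise ValueError(f"too many '~' in {part!r}")
        start, era = to_gregorian(ends[0], era)
        end = start
        if len(ends) == 2:
            end, era = to_gregorian(ends[1], era)
        spans.append((start, end))
    return spans
```

For example, 대정09~소화04 becomes the span 1920 to 1929, and 대정12, 소화19 becomes the two single years 1923 and 1944.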

Most occurring characters

Value | Count | Frequency (%)
1 | 327 | 15.0%
0 | 270 | 12.4%
 | 251 | 11.5%
 | 251 | 11.5%
 | 164 | 7.5%
 | 164 | 7.5%
~ | 113 | 5.2%
4 | 74 | 3.4%
5 | 72 | 3.3%
, | 71 | 3.2%
Other values (18) | 428 | 19.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1093 | 50.0%
Other Letter | 872 | 39.9%
Math Symbol | 113 | 5.2%
Other Punctuation | 73 | 3.3%
Space Separator | 13 | 0.6%
Open Punctuation | 8 | 0.4%
Close Punctuation | 8 | 0.4%
Dash Punctuation | 5 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 251 | 28.8%
 | 251 | 28.8%
 | 164 | 18.8%
 | 164 | 18.8%
 | 10 | 1.1%
 | 10 | 1.1%
 | 8 | 0.9%
 | 8 | 0.9%
 | 4 | 0.5%
 | 1 | 0.1%

Decimal Number
Value | Count | Frequency (%)
1 | 327 | 29.9%
0 | 270 | 24.7%
4 | 74 | 6.8%
5 | 72 | 6.6%
6 | 64 | 5.9%
2 | 62 | 5.7%
9 | 61 | 5.6%
3 | 57 | 5.2%
8 | 53 | 4.8%
7 | 53 | 4.8%

Other Punctuation
Value | Count | Frequency (%)
, | 71 | 97.3%
. | 2 | 2.7%

Math Symbol
Value | Count | Frequency (%)
~ | 113 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 13 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 8 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 8 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1313 | 60.1%
Hangul | 872 | 39.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 327 | 24.9%
0 | 270 | 20.6%
~ | 113 | 8.6%
4 | 74 | 5.6%
5 | 72 | 5.5%
, | 71 | 5.4%
6 | 64 | 4.9%
2 | 62 | 4.7%
9 | 61 | 4.6%
3 | 57 | 4.3%
Other values (7) | 142 | 10.8%

Hangul
Value | Count | Frequency (%)
 | 251 | 28.8%
 | 251 | 28.8%
 | 164 | 18.8%
 | 164 | 18.8%
 | 10 | 1.1%
 | 10 | 1.1%
 | 8 | 0.9%
 | 8 | 0.9%
 | 4 | 0.5%
 | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1313 | 60.1%
Hangul | 872 | 39.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 327 | 24.9%
0 | 270 | 20.6%
~ | 113 | 8.6%
4 | 74 | 5.6%
5 | 72 | 5.5%
, | 71 | 5.4%
6 | 64 | 4.9%
2 | 62 | 4.7%
9 | 61 | 4.6%
3 | 57 | 4.3%
Other values (7) | 142 | 10.8%

Hangul
Value | Count | Frequency (%)
 | 251 | 28.8%
 | 251 | 28.8%
 | 164 | 18.8%
 | 164 | 18.8%
 | 10 | 1.1%
 | 10 | 1.1%
 | 8 | 0.9%
 | 8 | 0.9%
 | 4 | 0.5%
 | 1 | 0.1%

Interactions


Correlations

 | 연번 | 정리분류
연번 | 1.000 | 0.849
정리분류 | 0.849 | 1.000

 | 연번 | 정리분류
연번 | 1.000 | 0.576
정리분류 | 0.576 | 1.000
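The two matrices above give two different association values between 연번 and 정리분류, but the report does not name the coefficient behind either. For a pair of categorical variables, a standard measure is Cramér's V, V = √(χ² / (n · (min(r, c) − 1))). As we understand the ydata-profiling documentation, its default `auto` setting uses Cramér's V for categorical pairs, and also for numeric-categorical pairs after discretising the numeric column. The binning and any bias correction are library details this sketch does not reproduce, so it should not be expected to return 0.849 or 0.576. Below is a minimal sketch of the plain, uncorrected V, with a hypothetical equal-width discretiser for the numeric side.

```python
import math
from collections import Counter


def cramers_v(xs: list, ys: list) -> float:
    """Plain (not bias-corrected) Cramér's V between two categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n  # expected cell count under independence
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def equal_width_bins(values: list[float], bins: int) -> list[int]:
    """Hypothetical discretiser: map each number to an equal-width bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero on constant data
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

A serial number that runs in blocks aligned with the category, as in the sample rows where the first six rows are 기부 and the next ones 진열, yields a high V after binning. This is consistent with the High correlation alert.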

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion]

Sample

First rows
 | 연번 | 문권제목 | 정리분류 | 연도
0 | 1 | 기부품 목록 | 기부 | 대정12, 소화19
1 | 2 | 백자투조모란문호(白磁透彫牡丹文壺) | 기부 | 소화15
2 | 3 | 수탁증서(受託證書) 1~96호 | 기부 | <NA>
3 | 4 | 대정 13년도~소화 4년도 진열품 기부 문서철 | 기부 | 대정09~소화04
4 | 5 | 대정 4년도~소화 8년도 진열품 기부 문서철 | 기부 | 대정04~소화07
5 | 6 | 소화 8~13년도 진열품 기부 문서철 | 기부 | 소화09~13
6 | 7 | 대정 4년도 진열물품 청구서 | 진열 | 대정04
7 | 8 | 대정 5년도 박물관 진열품 대장 고본 | 진열 | 대정05
8 | 9 | 대정 6년도 진열물품 청구서 | 진열 | 대정06, 소화09.13,14
9 | 10 | 대정 7년도 진열물품 청구서 | 진열 | 대정07

Last rows
 | 연번 | 문권제목 | 정리분류 | 연도
582 | 584 | 부읍 소재 보물 건조물 전시 비상 조치 요강 | 기타 | <NA>
583 | 585 | 상주분랑기(常住分記) | 기타 | <NA>
584 | 586 | 고려도경 필사본 일부 및 고려자기 관련 내용 | 기타 | <NA>
585 | 587 | 「조선의 혼인형태(朝鮮の婚姻形態)」, 아키바 다카시(秋葉隆, 추엽륭), 문학논집 49-64 | 기타 | <NA>
586 | 588 | 좌석배치도 | 기타 | <NA>
587 | 589 | 건조물 조사 계획 지도 | 기타 | <NA>
588 | 590 | 진해 요항(要港) 부근 사적 개설 | 고적조사 | <NA>
589 | 591 | 칭원(稱元)에 관한 자료 | 기타 | <NA>
590 | 592 | 탁본 명세 | 기타 | <NA>
591 | 593 | 제1회 조선총독부 사료 조사 사진첩 명함판 - 함경도 | 고적조사 | <NA>