Overview

Dataset statistics

Number of variables: 4
Number of observations: 172
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.8 KiB
Average record size in memory: 34.8 B

Variable types

Numeric: 1
Text: 2
Categorical: 1

Dataset

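Description: List of books held by the North Korean human rights records archive (북한인권기록보존소). Columns: 연번 (serial number), 도서제목 (book title), 구입연도 (year of purchase), 저자 (author).
Author: 법무부 (Ministry of Justice)
URL: https://www.data.go.kr/data/15042257/fileData.do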

Alerts

연번 is highly overall correlated with 구입연도 (High correlation)
구입연도 is highly overall correlated with 연번 (High correlation)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 18:58:44.613092
Analysis finished: 2023-12-12 18:58:45.522210
Duration: 0.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
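
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used: the file name `books.csv`, the `cp949` encoding, the report title, and the helper names `dataset_metadata` and `build_report` are all assumptions made for illustration. Only the metadata values are taken from the Dataset section of this report.

```python
def dataset_metadata() -> dict:
    """Metadata copied from the Dataset section of this report."""
    return {
        "description": "북한인권기록보존소 도서 보유 목록입니다.(연번,도서제목,구입연도,저자)",
        "author": "법무부",
        "url": "https://www.data.go.kr/data/15042257/fileData.do",
    }


def build_report(csv_path: str, encoding: str = "cp949"):
    """Build a profile report for the CSV at csv_path.

    The path and the encoding are assumptions; adjust them to the
    file actually downloaded from the URL above.
    """
    # Imported lazily so dataset_metadata() works without these packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    return ProfileReport(df, title="Book holdings profile", dataset=dataset_metadata())


# Hypothetical usage:
#   report = build_report("books.csv")
#   report.to_file("report.html")
```

Keeping the metadata in its own function makes it easy to check that the Description, Author, and URL fields match the published dataset before a full profiling run.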

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 172
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 86.5
Minimum: 1
Maximum: 172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 9.55
Q1: 43.75
Median: 86.5
Q3: 129.25
95-th percentile: 163.45
Maximum: 172
Range: 171
Interquartile range (IQR): 85.5

Descriptive statistics

Standard deviation: 49.796252
Coefficient of variation (CV): 0.57567921
Kurtosis: -1.2
Mean: 86.5
Median Absolute Deviation (MAD): 43
Skewness: 0
Sum: 14878
Variance: 2479.6667
Monotonicity: Strictly increasing
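
The figures above are consistent with 연번 being the integer sequence 1 to 172. The sketch below recomputes several of them with the Python standard library under that assumption (the raw column is not part of this report). The `quantile` helper is a hypothetical name for a plain linear-interpolation quantile.

```python
import statistics

# Assumption: 연번 is exactly 1, 2, ..., 172 (172 distinct values, min 1, max 172).
values = list(range(1, 173))


def quantile(sorted_vals, p):
    """Quantile by linear interpolation at position p * (n - 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])
q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
iqr = q3 - q1
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, round(variance, 4), round(std, 6), mad, q1, q3, iqr)
```

Under this assumption the sum, mean, variance, MAD, and quartiles agree with the values reported above, which suggests that the profiler uses the sample (n - 1) variance and linear-interpolation quantiles here.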
[Figure: Histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
1 | 1 | 0.6%
120 | 1 | 0.6%
112 | 1 | 0.6%
113 | 1 | 0.6%
114 | 1 | 0.6%
115 | 1 | 0.6%
116 | 1 | 0.6%
117 | 1 | 0.6%
118 | 1 | 0.6%
119 | 1 | 0.6%
Other values (162) | 162 | 94.2%

Value | Count | Frequency (%)
1 | 1 | 0.6%
2 | 1 | 0.6%
3 | 1 | 0.6%
4 | 1 | 0.6%
5 | 1 | 0.6%
6 | 1 | 0.6%
7 | 1 | 0.6%
8 | 1 | 0.6%
9 | 1 | 0.6%
10 | 1 | 0.6%

Value | Count | Frequency (%)
172 | 1 | 0.6%
171 | 1 | 0.6%
170 | 1 | 0.6%
169 | 1 | 0.6%
168 | 1 | 0.6%
167 | 1 | 0.6%
166 | 1 | 0.6%
165 | 1 | 0.6%
164 | 1 | 0.6%
163 | 1 | 0.6%

도서제목
Text

Distinct: 169
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 139
Median length: 37
Mean length: 17.5
Min length: 3

Characters and Unicode

Total characters: 3010
Distinct characters: 353
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
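
As a small illustration of these character properties, the sketch below counts Unicode general categories for one title taken from the sample rows of this report, using Python's standard `unicodedata` module. It covers only the general category; script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# One title from the sample rows; "\u2161" is ROMAN NUMERAL TWO (Ⅱ).
title = "북한의 오늘 \u2161"

# Two-letter general category codes: Lo = Other Letter,
# Zs = Space Separator, Nl = Letter Number.
category_counts = Counter(unicodedata.category(ch) for ch in title)

for code, count in category_counts.most_common():
    print(code, count)
```

The Hangul syllables fall under Other Letter and the Roman numeral under Letter Number, which matches the category names used in the tables of this section.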

Unique

Unique: 166
Unique (%): 96.5%

Sample

1st row: Human Rights Abuses and Crimes Against Humanity in North Korea, Subcommittee on Africa, Global Health, Global Human Rights, and Internation
2nd row: The Hidden Gulag: putting human rights on the north korea polish agenda
3rd row: Human Rights Discourse in North Korea
4th row: ACT OF WAR
5th row: Nothing to Envy

Value | Count | Frequency (%)
북한 | 29 | 4.3%
북한의 | 14 | 2.1%
north | 8 | 1.2%
korea | 8 | 1.2%
한반도 | 8 | 1.2%
통일 | 7 | 1.0%
the | 6 | 0.9%
rights | 5 | 0.7%
이해 | 5 | 0.7%
현대사 | 5 | 0.7%
Other values (474) | 582 | 86.0%
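
A word-frequency table like the one above can be approximated by lowercasing each title, splitting on whitespace, and counting tokens. The sketch below does this for the five sample titles only. The report does not state the profiler's exact tokenization rules, so the punctuation stripping here is an assumption, and `tokenize` is a hypothetical helper name.

```python
import string
from collections import Counter

# The five sample rows shown for this variable (copied verbatim).
titles = [
    "Human Rights Abuses and Crimes Against Humanity in North Korea, "
    "Subcommittee on Africa, Global Health, Global Human Rights, and Internation",
    "The Hidden Gulag: putting human rights on the north korea polish agenda",
    "Human Rights Discourse in North Korea",
    "ACT OF WAR",
    "Nothing to Envy",
]


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding ASCII punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


word_counts = Counter(w for t in titles for w in tokenize(t))
print(word_counts.most_common(5))
```

Because only five of the 172 titles are used, the counts are much smaller than those in the table; the point is the counting procedure, not the totals.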

Most occurring characters

ValueCountFrequency (%)
505
 
16.8%
101
 
3.4%
84
 
2.8%
74
 
2.5%
e 70
 
2.3%
o 56
 
1.9%
t 51
 
1.7%
n 50
 
1.7%
r 48
 
1.6%
a 47
 
1.6%
Other values (343) 1924
63.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1689 | 56.1%
Lowercase Letter | 584 | 19.4%
Space Separator | 505 | 16.8%
Uppercase Letter | 136 | 4.5%
Other Punctuation | 41 | 1.4%
Decimal Number | 36 | 1.2%
Open Punctuation | 7 | 0.2%
Close Punctuation | 7 | 0.2%
Dash Punctuation | 3 | 0.1%
Letter Number | 2 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
101
 
6.0%
84
 
5.0%
74
 
4.4%
38
 
2.2%
32
 
1.9%
31
 
1.8%
31
 
1.8%
30
 
1.8%
27
 
1.6%
26
 
1.5%
Other values (280) 1215
71.9%
Lowercase Letter
ValueCountFrequency (%)
e 70
12.0%
o 56
9.6%
t 51
 
8.7%
n 50
 
8.6%
r 48
 
8.2%
a 47
 
8.0%
i 47
 
8.0%
s 35
 
6.0%
h 25
 
4.3%
l 22
 
3.8%
Other values (13) 133
22.8%
Uppercase Letter
ValueCountFrequency (%)
N 14
 
10.3%
R 12
 
8.8%
A 11
 
8.1%
C 10
 
7.4%
T 10
 
7.4%
H 10
 
7.4%
E 9
 
6.6%
I 9
 
6.6%
K 8
 
5.9%
S 6
 
4.4%
Other values (10) 37
27.2%
Other Punctuation
ValueCountFrequency (%)
, 16
39.0%
: 13
31.7%
" 3
 
7.3%
! 3
 
7.3%
? 2
 
4.9%
· 2
 
4.9%
' 1
 
2.4%
. 1
 
2.4%
Decimal Number
ValueCountFrequency (%)
1 9
25.0%
2 7
19.4%
3 7
19.4%
0 5
13.9%
4 5
13.9%
7 2
 
5.6%
5 1
 
2.8%
Space Separator
ValueCountFrequency (%)
505
100.0%
Open Punctuation
ValueCountFrequency (%)
( 7
100.0%
Close Punctuation
ValueCountFrequency (%)
) 7
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1689 | 56.1%
Latin | 722 | 24.0%
Common | 599 | 19.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
101
 
6.0%
84
 
5.0%
74
 
4.4%
38
 
2.2%
32
 
1.9%
31
 
1.8%
31
 
1.8%
30
 
1.8%
27
 
1.6%
26
 
1.5%
Other values (280) 1215
71.9%
Latin
ValueCountFrequency (%)
e 70
 
9.7%
o 56
 
7.8%
t 51
 
7.1%
n 50
 
6.9%
r 48
 
6.6%
a 47
 
6.5%
i 47
 
6.5%
s 35
 
4.8%
h 25
 
3.5%
l 22
 
3.0%
Other values (34) 271
37.5%
Common
ValueCountFrequency (%)
505
84.3%
, 16
 
2.7%
: 13
 
2.2%
1 9
 
1.5%
2 7
 
1.2%
( 7
 
1.2%
3 7
 
1.2%
) 7
 
1.2%
0 5
 
0.8%
4 5
 
0.8%
Other values (9) 18
 
3.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1689 | 56.1%
ASCII | 1317 | 43.8%
Number Forms | 2 | 0.1%
None | 2 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
505
38.3%
e 70
 
5.3%
o 56
 
4.3%
t 51
 
3.9%
n 50
 
3.8%
r 48
 
3.6%
a 47
 
3.6%
i 47
 
3.6%
s 35
 
2.7%
h 25
 
1.9%
Other values (51) 383
29.1%
Hangul
ValueCountFrequency (%)
101
 
6.0%
84
 
5.0%
74
 
4.4%
38
 
2.2%
32
 
1.9%
31
 
1.8%
31
 
1.8%
30
 
1.8%
27
 
1.6%
26
 
1.5%
Other values (280) 1215
71.9%
Number Forms
ValueCountFrequency (%)
2
100.0%
None
ValueCountFrequency (%)
· 2
100.0%

구입연도
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
2018: 126
2019: 46

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018
2nd row: 2018
3rd row: 2018
4th row: 2018
5th row: 2018

Common Values

Value | Count | Frequency (%)
2018 | 126 | 73.3%
2019 | 46 | 26.7%
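
The frequency percentages are each count divided by the 172 observations. A minimal sketch, with the column rebuilt from the reported counts (an assumption, since the raw file is not part of this report):

```python
from collections import Counter

# Assumption: column reconstructed from the counts above, not read from the source file.
purchase_year = ["2018"] * 126 + ["2019"] * 46

counts = Counter(purchase_year)
n = len(purchase_year)

# Percentage of observations per category, rounded to one decimal place.
freq_pct = {year: round(100 * c / n, 1) for year, c in counts.items()}
print(freq_pct)
```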

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
2018 | 126 | 73.3%
2019 | 46 | 26.7%

저자
Text

Distinct: 142
Distinct (%): 82.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 51
Median length: 3
Mean length: 5.3604651
Min length: 2

Characters and Unicode

Total characters: 922
Distinct characters: 222
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 129
Unique (%): 75.0%

Sample

1st row: create independant
2nd row: create independant
3rd row: Jiyoung Song
4th row: Cheevers, Jack
5th row: Demick, Barbara
ValueCountFrequency (%)
황장엽 14
 
6.1%
8
 
3.5%
안문석 5
 
2.2%
권순희 3
 
1.3%
윤대규 3
 
1.3%
허만호 3
 
1.3%
강동완 2
 
0.9%
2
 
0.9%
2
 
0.9%
kim 2
 
0.9%
Other values (177) 187
81.0%

Most occurring characters

ValueCountFrequency (%)
75
 
8.1%
e 39
 
4.2%
n 24
 
2.6%
23
 
2.5%
a 22
 
2.4%
i 18
 
2.0%
r 16
 
1.7%
15
 
1.6%
o 15
 
1.6%
15
 
1.6%
Other values (212) 660
71.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 524 | 56.8%
Lowercase Letter | 236 | 25.6%
Space Separator | 75 | 8.1%
Uppercase Letter | 59 | 6.4%
Other Punctuation | 17 | 1.8%
Decimal Number | 5 | 0.5%
Dash Punctuation | 4 | 0.4%
Close Punctuation | 1 | 0.1%
Open Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
23
 
4.4%
15
 
2.9%
15
 
2.9%
15
 
2.9%
14
 
2.7%
10
 
1.9%
10
 
1.9%
9
 
1.7%
9
 
1.7%
9
 
1.7%
Other values (159) 395
75.4%
Lowercase Letter
ValueCountFrequency (%)
e 39
16.5%
n 24
10.2%
a 22
9.3%
i 18
 
7.6%
r 16
 
6.8%
o 15
 
6.4%
t 13
 
5.5%
u 13
 
5.5%
m 12
 
5.1%
s 11
 
4.7%
Other values (11) 53
22.5%
Uppercase Letter
ValueCountFrequency (%)
H 5
 
8.5%
A 5
 
8.5%
B 5
 
8.5%
S 5
 
8.5%
J 4
 
6.8%
L 4
 
6.8%
C 4
 
6.8%
T 4
 
6.8%
I 3
 
5.1%
P 3
 
5.1%
Other values (10) 17
28.8%
Decimal Number
ValueCountFrequency (%)
3 1
20.0%
7 1
20.0%
1 1
20.0%
0 1
20.0%
2 1
20.0%
Other Punctuation
ValueCountFrequency (%)
, 13
76.5%
. 3
 
17.6%
& 1
 
5.9%
Space Separator
ValueCountFrequency (%)
75
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 524 | 56.8%
Latin | 295 | 32.0%
Common | 103 | 11.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
23
 
4.4%
15
 
2.9%
15
 
2.9%
15
 
2.9%
14
 
2.7%
10
 
1.9%
10
 
1.9%
9
 
1.7%
9
 
1.7%
9
 
1.7%
Other values (159) 395
75.4%
Latin
ValueCountFrequency (%)
e 39
 
13.2%
n 24
 
8.1%
a 22
 
7.5%
i 18
 
6.1%
r 16
 
5.4%
o 15
 
5.1%
t 13
 
4.4%
u 13
 
4.4%
m 12
 
4.1%
s 11
 
3.7%
Other values (31) 112
38.0%
Common
ValueCountFrequency (%)
75
72.8%
, 13
 
12.6%
- 4
 
3.9%
. 3
 
2.9%
3 1
 
1.0%
) 1
 
1.0%
7 1
 
1.0%
1 1
 
1.0%
0 1
 
1.0%
2 1
 
1.0%
Other values (2) 2
 
1.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 524 | 56.8%
ASCII | 398 | 43.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
75
18.8%
e 39
 
9.8%
n 24
 
6.0%
a 22
 
5.5%
i 18
 
4.5%
r 16
 
4.0%
o 15
 
3.8%
t 13
 
3.3%
, 13
 
3.3%
u 13
 
3.3%
Other values (43) 150
37.7%
Hangul
ValueCountFrequency (%)
23
 
4.4%
15
 
2.9%
15
 
2.9%
15
 
2.9%
14
 
2.7%
10
 
1.9%
10
 
1.9%
9
 
1.7%
9
 
1.7%
9
 
1.7%
Other values (159) 395
75.4%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]
Variable | 연번 | 구입연도
연번 | 1.000 | 0.995
구입연도 | 0.995 | 1.000

[Figure: correlation plot]
Variable | 연번 | 구입연도
연번 | 1.000 | 0.915
구입연도 | 0.915 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

index | 연번 | 도서제목 | 구입연도 | 저자
0 | 1 | Human Rights Abuses and Crimes Against Humanity in North Korea, Subcommittee on Africa, Global Health, Global Human Rights, and Internation | 2018 | create independant
1 | 2 | The Hidden Gulag: putting human rights on the north korea polish agenda | 2018 | create independant
2 | 3 | Human Rights Discourse in North Korea | 2018 | Jiyoung Song
3 | 4 | ACT OF WAR | 2018 | Cheevers, Jack
4 | 5 | Nothing to Envy | 2018 | Demick, Barbara
5 | 6 | Transitional Justice in Unifed Korea | 2018 | Teitel, Ruti G. (EDT) , Baekbuhm-suk, Baek
6 | 7 | North Korea and the World | 2018 | Clemens, Walter C.,Jr
7 | 8 | Escape from Camp 14 | 2018 | harden, blaine
8 | 9 | Escaping North Korea: Defiance and Hope in the World"s Most Repressive Country | 2018 | Kim, Mike
9 | 10 | A Thousand Miles to Freedom: My Escape from North Korea | 2018 | Kim, Eunsun

index | 연번 | 도서제목 | 구입연도 | 저자
162 | 163 | 분단체제의 노동 | 2019 | 김화순
163 | 164 | 북한 사람과 거래하는 법 | 2019 | 오기현
164 | 165 | 북한을 읽다 | 2019 | 구애림 외
165 | 166 | 북한의 오늘 Ⅱ | 2019 | 윤영관
166 | 167 | 김정은 체제의 북한 전쟁전략 | 2019 | 박용환
167 | 168 | 김정은 평전 마지막 계승자 | 2019 | 애나 파이필드
168 | 169 | 대통령과 통일정책 | 2019 | 김창진
169 | 170 | 나의 살던 북한은 | 2019 | 경화
170 | 171 | 인민의 얼굴 | 2019 | 한성훈
171 | 172 | 평양, 제가 한 번 가보겠습니다 | 2019 | 정재연