Overview

Dataset statistics

Number of variables: 5
Number of observations: 1802
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 19
Duplicate rows (%): 1.1%
Total size in memory: 72.3 KiB
Average record size in memory: 41.1 B
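
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming (the report does not say so) that values are rounded to one decimal place and that 1 KiB = 1024 bytes:

```python
# Raw figures copied from the overview table.
n_rows = 1802
n_duplicate_rows = 19
total_size_kib = 72.3

# Share of duplicate rows, as a percentage of all rows.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (assumes 1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both printed values agree with the table once rounded.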

Variable types

Text: 3
Numeric: 1
Categorical: 1

Dataset

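Description: Provides the list of books purchased by the county library located in Jangsu-gun, Jeonbuk Special Self-Governing Province (전북특별자치도 장수군), with the fields title (도서명), author (저자), publisher (출판사), quantity (수량), and category (구분).
Author: Jangsu-gun, Jeonbuk Special Self-Governing Province (전북특별자치도 장수군)
URL: https://www.data.go.kr/data/15055047/fileData.do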

Alerts

Dataset has 19 (1.1%) duplicate rows [Duplicates]
구분 is highly imbalanced (59.4%) [Imbalance]
수량 is highly skewed (γ1 = 27.60347527) [Skewed]
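
The 59.4% imbalance figure is consistent with a normalized-entropy score, 1 - H / log(k), applied to the class counts of 구분 reported in this document (1574, 179, 49). The sketch below is an assumption about how such a score can be computed, not a quotation of the profiling tool's source; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for a uniform split, 1 when one class holds every row."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    # Shannon entropy of the class distribution (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Class counts for 구분: 일반, 전자책, 오디오북.
score = imbalance_score([1574, 179, 49])
print(f"{score:.1%}")
```

A perfectly balanced column would score 0, so a value near 0.6 reflects how strongly the 일반 class dominates.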

Reproduction

Analysis started: 2024-04-06 08:46:03.759203
Analysis finished: 2024-04-06 08:46:06.564966
Duration: 2.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

도서명
Text

Distinct: 1780
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 14.2 KiB

Length

Max length: 49
Median length: 37
Mean length: 15.420089
Min length: 1

Characters and Unicode

Total characters: 27787
Distinct characters: 914
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
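
As a rough illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on one title from the sample rows of this dataset. The helper name `category_counts` is hypothetical. `unicodedata.category` returns two-letter codes: `Lo` corresponds to "Other Letter", `Zs` to "Space Separator", `Nd` to "Decimal Number", `Ps`/`Pe` to "Open/Close Punctuation", `Po` to "Other Punctuation", and `Sm` to "Math Symbol".

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Zs', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)

# One book title taken from the sample rows of this report.
title = "(신기한 맛)도깨비 식당.4~5"
counts = category_counts(title)

for category, n in counts.most_common():
    print(category, n)
```

Summing these counters over every value in a column would give a table like "Most occurring categories".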

Unique

Unique: 1760
Unique (%): 97.7%

Sample

1st row 게으르지만 콘텐츠로 돈은 잘 법니다
2nd row 무인 양품의 생각과 말
3rd row 어포메이션
4th row 마녀의 은신처
5th row 미스터 프레지던트
Value | Count | Frequency (%)
[not preserved] | 74 | 1.1%
1 | 69 | 1.0%
2 | 67 | 1.0%
오디오북 | 49 | 0.7%
3 | 35 | 0.5%
제로니모의 | 30 | 0.4%
환상 | 30 | 0.4%
시리즈 | 28 | 0.4%
4 | 28 | 0.4%
5 | 27 | 0.4%
Other values (3914) | 6457 | 93.7%

Most occurring characters

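Value | Count | Frequency (%)
(space) | 6027 | 21.7%
[not preserved] | 667 | 2.4%
[not preserved] | 505 | 1.8%
1 | 327 | 1.2%
( | 323 | 1.2%
) | 323 | 1.2%
[not preserved] | 317 | 1.1%
[not preserved] | 309 | 1.1%
. | 306 | 1.1%
2 | 277 | 1.0%
Other values (904) | 18406 | 66.2%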

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 18640 | 67.1%
Space Separator | 6027 | 21.7%
Decimal Number | 1304 | 4.7%
Other Punctuation | 746 | 2.7%
Open Punctuation | 388 | 1.4%
Close Punctuation | 388 | 1.4%
Uppercase Letter | 169 | 0.6%
Lowercase Letter | 88 | 0.3%
Math Symbol | 24 | 0.1%
Dash Punctuation | 13 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not preserved] | 667 | 3.6%
[not preserved] | 505 | 2.7%
[not preserved] | 317 | 1.7%
[not preserved] | 309 | 1.7%
[not preserved] | 260 | 1.4%
[not preserved] | 236 | 1.3%
[not preserved] | 228 | 1.2%
[not preserved] | 222 | 1.2%
[not preserved] | 220 | 1.2%
[not preserved] | 215 | 1.2%
Other values (828) | 15461 | 82.9%

Uppercase Letter
Value | Count | Frequency (%)
G | 22 | 13.0%
T | 19 | 11.2%
S | 15 | 8.9%
V | 12 | 7.1%
A | 11 | 6.5%
E | 10 | 5.9%
I | 10 | 5.9%
O | 9 | 5.3%
P | 8 | 4.7%
D | 6 | 3.6%
Other values (14) | 47 | 27.8%

Lowercase Letter
Value | Count | Frequency (%)
o | 23 | 26.1%
e | 7 | 8.0%
r | 6 | 6.8%
c | 6 | 6.8%
x | 6 | 6.8%
t | 4 | 4.5%
a | 4 | 4.5%
s | 3 | 3.4%
d | 3 | 3.4%
l | 3 | 3.4%
Other values (13) | 23 | 26.1%

Decimal Number
Value | Count | Frequency (%)
1 | 327 | 25.1%
2 | 277 | 21.2%
3 | 149 | 11.4%
5 | 118 | 9.0%
0 | 104 | 8.0%
4 | 104 | 8.0%
6 | 73 | 5.6%
8 | 54 | 4.1%
9 | 53 | 4.1%
7 | 45 | 3.5%

Other Punctuation
Value | Count | Frequency (%)
. | 306 | 41.0%
: | 243 | 32.6%
, | 90 | 12.1%
! | 62 | 8.3%
? | 36 | 4.8%
· | 4 | 0.5%
' | 2 | 0.3%
% | 2 | 0.3%
& | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 323 | 83.2%
[ | 49 | 12.6%
[not preserved] | 16 | 4.1%

Close Punctuation
Value | Count | Frequency (%)
) | 323 | 83.2%
] | 49 | 12.6%
[not preserved] | 16 | 4.1%

Math Symbol
Value | Count | Frequency (%)
~ | 23 | 95.8%
+ | 1 | 4.2%

Space Separator
Value | Count | Frequency (%)
(space) | 6027 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 18636 | 67.1%
Common | 8890 | 32.0%
Latin | 257 | 0.9%
Han | 4 | < 0.1%

Most frequent character per script

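Hangul
Value | Count | Frequency (%)
[not preserved] | 667 | 3.6%
[not preserved] | 505 | 2.7%
[not preserved] | 317 | 1.7%
[not preserved] | 309 | 1.7%
[not preserved] | 260 | 1.4%
[not preserved] | 236 | 1.3%
[not preserved] | 228 | 1.2%
[not preserved] | 222 | 1.2%
[not preserved] | 220 | 1.2%
[not preserved] | 215 | 1.2%
Other values (824) | 15457 | 82.9%

Latin
Value | Count | Frequency (%)
o | 23 | 8.9%
G | 22 | 8.6%
T | 19 | 7.4%
S | 15 | 5.8%
V | 12 | 4.7%
A | 11 | 4.3%
E | 10 | 3.9%
I | 10 | 3.9%
O | 9 | 3.5%
P | 8 | 3.1%
Other values (37) | 118 | 45.9%

Common
Value | Count | Frequency (%)
(space) | 6027 | 67.8%
1 | 327 | 3.7%
( | 323 | 3.6%
) | 323 | 3.6%
. | 306 | 3.4%
2 | 277 | 3.1%
: | 243 | 2.7%
3 | 149 | 1.7%
5 | 118 | 1.3%
0 | 104 | 1.2%
Other values (19) | 693 | 7.8%

Han
Value | Count | Frequency (%)
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%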

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 18635 | 67.1%
ASCII | 9111 | 32.8%
None | 36 | 0.1%
CJK | 4 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

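ASCII
Value | Count | Frequency (%)
(space) | 6027 | 66.2%
1 | 327 | 3.6%
( | 323 | 3.5%
) | 323 | 3.5%
. | 306 | 3.4%
2 | 277 | 3.0%
: | 243 | 2.7%
3 | 149 | 1.6%
5 | 118 | 1.3%
0 | 104 | 1.1%
Other values (63) | 914 | 10.0%

Hangul
Value | Count | Frequency (%)
[not preserved] | 667 | 3.6%
[not preserved] | 505 | 2.7%
[not preserved] | 317 | 1.7%
[not preserved] | 309 | 1.7%
[not preserved] | 260 | 1.4%
[not preserved] | 236 | 1.3%
[not preserved] | 228 | 1.2%
[not preserved] | 222 | 1.2%
[not preserved] | 220 | 1.2%
[not preserved] | 215 | 1.2%
Other values (823) | 15456 | 82.9%

None
Value | Count | Frequency (%)
[not preserved] | 16 | 44.4%
[not preserved] | 16 | 44.4%
· | 4 | 11.1%

CJK
Value | Count | Frequency (%)
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%
[not preserved] | 1 | 25.0%

Compat Jamo
Value | Count | Frequency (%)
[not preserved] | 1 | 100.0%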

저자
Text

Distinct: 1250
Distinct (%): 69.4%
Missing: 0
Missing (%): 0.0%
Memory size: 14.2 KiB

Length

Max length: 68
Median length: 48
Mean length: 5.309101
Min length: 1

Characters and Unicode

Total characters: 9567
Distinct characters: 573
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1065
Unique (%): 59.1%

Sample

1st row 신태순
2nd row 양품계획
3rd row 노아세인트존
4th row 존 딕슨 카
5th row 탁현민
Value | Count | Frequency (%)
[not preserved] | 38 | 1.5%
스틸턴 | 30 | 1.2%
제로니모 | 30 | 1.2%
㈜셉텐트리오 | 26 | 1.0%
안비루 | 25 | 1.0%
야스코 | 25 | 1.0%
김강현 | 23 | 0.9%
[not preserved] | 22 | 0.9%
호럽 | 20 | 0.8%
조앤 | 20 | 0.8%
Other values (1605) | 2266 | 89.7%

Most occurring characters

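Value | Count | Frequency (%)
(space) | 1712 | 17.9%
[not preserved] | 298 | 3.1%
[not preserved] | 242 | 2.5%
[not preserved] | 173 | 1.8%
[not preserved] | 155 | 1.6%
[not preserved] | 125 | 1.3%
, | 120 | 1.3%
[not preserved] | 113 | 1.2%
[not preserved] | 105 | 1.1%
[not preserved] | 100 | 1.0%
Other values (563) | 6424 | 67.1%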

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7514 | 78.5%
Space Separator | 1712 | 17.9%
Other Punctuation | 151 | 1.6%
Uppercase Letter | 81 | 0.8%
Lowercase Letter | 28 | 0.3%
Other Symbol | 26 | 0.3%
Open Punctuation | 22 | 0.2%
Close Punctuation | 22 | 0.2%
Decimal Number | 7 | 0.1%
Dash Punctuation | 2 | < 0.1%
Other values (2) | 2 | < 0.1%

Most frequent character per category

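Other Letter
Value | Count | Frequency (%)
[not preserved] | 298 | 4.0%
[not preserved] | 242 | 3.2%
[not preserved] | 173 | 2.3%
[not preserved] | 155 | 2.1%
[not preserved] | 125 | 1.7%
[not preserved] | 113 | 1.5%
[not preserved] | 105 | 1.4%
[not preserved] | 100 | 1.3%
[not preserved] | 96 | 1.3%
[not preserved] | 94 | 1.3%
Other values (512) | 6013 | 80.0%

Uppercase Letter
Value | Count | Frequency (%)
R | 18 | 22.2%
P | 6 | 7.4%
J | 6 | 7.4%
T | 6 | 7.4%
A | 5 | 6.2%
L | 4 | 4.9%
K | 4 | 4.9%
O | 4 | 4.9%
B | 4 | 4.9%
D | 3 | 3.7%
Other values (10) | 21 | 25.9%

Lowercase Letter
Value | Count | Frequency (%)
a | 4 | 14.3%
m | 3 | 10.7%
n | 3 | 10.7%
i | 3 | 10.7%
v | 2 | 7.1%
t | 2 | 7.1%
r | 2 | 7.1%
c | 2 | 7.1%
b | 2 | 7.1%
g | 1 | 3.6%
Other values (4) | 4 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
, | 120 | 79.5%
. | 29 | 19.2%
& | 1 | 0.7%
· | 1 | 0.7%

Decimal Number
Value | Count | Frequency (%)
3 | 3 | 42.9%
2 | 2 | 28.6%
5 | 1 | 14.3%
4 | 1 | 14.3%

Open Punctuation
Value | Count | Frequency (%)
( | 20 | 90.9%
[not preserved] | 2 | 9.1%

Close Punctuation
Value | Count | Frequency (%)
) | 20 | 90.9%
[not preserved] | 2 | 9.1%

Space Separator
Value | Count | Frequency (%)
(space) | 1712 | 100.0%

Other Symbol
Value | Count | Frequency (%)
[not preserved] | 26 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Initial Punctuation
Value | Count | Frequency (%)
[not preserved] | 1 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
[not preserved] | 1 | 100.0%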

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7540 | 78.8%
Common | 1918 | 20.0%
Latin | 109 | 1.1%

Most frequent character per script

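Hangul
Value | Count | Frequency (%)
[not preserved] | 298 | 4.0%
[not preserved] | 242 | 3.2%
[not preserved] | 173 | 2.3%
[not preserved] | 155 | 2.1%
[not preserved] | 125 | 1.7%
[not preserved] | 113 | 1.5%
[not preserved] | 105 | 1.4%
[not preserved] | 100 | 1.3%
[not preserved] | 96 | 1.3%
[not preserved] | 94 | 1.2%
Other values (513) | 6039 | 80.1%

Latin
Value | Count | Frequency (%)
R | 18 | 16.5%
P | 6 | 5.5%
J | 6 | 5.5%
T | 6 | 5.5%
A | 5 | 4.6%
L | 4 | 3.7%
K | 4 | 3.7%
O | 4 | 3.7%
a | 4 | 3.7%
B | 4 | 3.7%
Other values (24) | 48 | 44.0%

Common
Value | Count | Frequency (%)
(space) | 1712 | 89.3%
, | 120 | 6.3%
. | 29 | 1.5%
( | 20 | 1.0%
) | 20 | 1.0%
3 | 3 | 0.2%
2 | 2 | 0.1%
- | 2 | 0.1%
[not preserved] | 2 | 0.1%
[not preserved] | 2 | 0.1%
Other values (6) | 6 | 0.3%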

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7510 | 78.5%
ASCII | 2020 | 21.1%
None | 31 | 0.3%
Compat Jamo | 4 | < 0.1%
Punctuation | 2 | < 0.1%

Most frequent character per block

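ASCII
Value | Count | Frequency (%)
(space) | 1712 | 84.8%
, | 120 | 5.9%
. | 29 | 1.4%
( | 20 | 1.0%
) | 20 | 1.0%
R | 18 | 0.9%
P | 6 | 0.3%
J | 6 | 0.3%
T | 6 | 0.3%
A | 5 | 0.2%
Other values (35) | 78 | 3.9%

Hangul
Value | Count | Frequency (%)
[not preserved] | 298 | 4.0%
[not preserved] | 242 | 3.2%
[not preserved] | 173 | 2.3%
[not preserved] | 155 | 2.1%
[not preserved] | 125 | 1.7%
[not preserved] | 113 | 1.5%
[not preserved] | 105 | 1.4%
[not preserved] | 100 | 1.3%
[not preserved] | 96 | 1.3%
[not preserved] | 94 | 1.3%
Other values (511) | 6009 | 80.0%

None
Value | Count | Frequency (%)
[not preserved] | 26 | 83.9%
[not preserved] | 2 | 6.5%
[not preserved] | 2 | 6.5%
· | 1 | 3.2%

Compat Jamo
Value | Count | Frequency (%)
[not preserved] | 4 | 100.0%

Punctuation
Value | Count | Frequency (%)
[not preserved] | 1 | 50.0%
[not preserved] | 1 | 50.0%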

출판사
Text

Distinct: 727
Distinct (%): 40.3%
Missing: 0
Missing (%): 0.0%
Memory size: 14.2 KiB

Length

Max length: 13
Median length: 11
Mean length: 4.9600444
Min length: 1

Characters and Unicode

Total characters: 8938
Distinct characters: 477
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 427
Unique (%): 23.7%

Sample

1st row 나비의 활주로
2nd row 웅진지식하우스
3rd row 나비스쿨
4th row 엘릭시르
5th row 메디치미디어
Value | Count | Frequency (%)
위즈덤하우스 | 39 | 2.1%
서울문화사 | 34 | 1.8%
사파리 | 30 | 1.6%
예림당 | 27 | 1.4%
셉텐트리오 | 26 | 1.4%
창비 | 26 | 1.4%
[not preserved] | 26 | 1.4%
아울북 | 26 | 1.4%
자음과모음 | 23 | 1.2%
rhk | 23 | 1.2%
Other values (656) | 1590 | 85.0%

Most occurring characters

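Value | Count | Frequency (%)
(space) | 991 | 11.1%
[not preserved] | 399 | 4.5%
[not preserved] | 297 | 3.3%
[not preserved] | 268 | 3.0%
[not preserved] | 194 | 2.2%
[not preserved] | 190 | 2.1%
[not preserved] | 172 | 1.9%
[not preserved] | 171 | 1.9%
[not preserved] | 125 | 1.4%
[not preserved] | 102 | 1.1%
Other values (467) | 6029 | 67.5%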

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7607 | 85.1%
Space Separator | 991 | 11.1%
Uppercase Letter | 129 | 1.4%
Lowercase Letter | 92 | 1.0%
Open Punctuation | 49 | 0.5%
Close Punctuation | 49 | 0.5%
Decimal Number | 19 | 0.2%
Other Punctuation | 2 | < 0.1%

Most frequent character per category

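Other Letter
Value | Count | Frequency (%)
[not preserved] | 399 | 5.2%
[not preserved] | 297 | 3.9%
[not preserved] | 268 | 3.5%
[not preserved] | 194 | 2.6%
[not preserved] | 190 | 2.5%
[not preserved] | 172 | 2.3%
[not preserved] | 171 | 2.2%
[not preserved] | 125 | 1.6%
[not preserved] | 102 | 1.3%
[not preserved] | 101 | 1.3%
Other values (435) | 5588 | 73.5%

Uppercase Letter
Value | Count | Frequency (%)
K | 32 | 24.8%
H | 26 | 20.2%
R | 25 | 19.4%
O | 12 | 9.3%
B | 11 | 8.5%
S | 6 | 4.7%
P | 5 | 3.9%
M | 5 | 3.9%
G | 2 | 1.6%
J | 1 | 0.8%
Other values (4) | 4 | 3.1%

Lowercase Letter
Value | Count | Frequency (%)
o | 28 | 30.4%
b | 15 | 16.3%
k | 14 | 15.2%
i | 6 | 6.5%
n | 6 | 6.5%
s | 6 | 6.5%
d | 5 | 5.4%
t | 5 | 5.4%
e | 5 | 5.4%
a | 1 | 1.1%

Decimal Number
Value | Count | Frequency (%)
2 | 12 | 63.2%
4 | 4 | 21.1%
1 | 3 | 15.8%

Space Separator
Value | Count | Frequency (%)
(space) | 991 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 49 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 49 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
# | 2 | 100.0%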

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7606 | 85.1%
Common | 1110 | 12.4%
Latin | 221 | 2.5%
Han | 1 | < 0.1%

Most frequent character per script

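Hangul
Value | Count | Frequency (%)
[not preserved] | 399 | 5.2%
[not preserved] | 297 | 3.9%
[not preserved] | 268 | 3.5%
[not preserved] | 194 | 2.6%
[not preserved] | 190 | 2.5%
[not preserved] | 172 | 2.3%
[not preserved] | 171 | 2.2%
[not preserved] | 125 | 1.6%
[not preserved] | 102 | 1.3%
[not preserved] | 101 | 1.3%
Other values (434) | 5587 | 73.5%

Latin
Value | Count | Frequency (%)
K | 32 | 14.5%
o | 28 | 12.7%
H | 26 | 11.8%
R | 25 | 11.3%
b | 15 | 6.8%
k | 14 | 6.3%
O | 12 | 5.4%
B | 11 | 5.0%
i | 6 | 2.7%
n | 6 | 2.7%
Other values (15) | 46 | 20.8%

Common
Value | Count | Frequency (%)
(space) | 991 | 89.3%
( | 49 | 4.4%
) | 49 | 4.4%
2 | 12 | 1.1%
4 | 4 | 0.4%
1 | 3 | 0.3%
# | 2 | 0.2%

Han
Value | Count | Frequency (%)
[not preserved] | 1 | 100.0%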

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7606 | 85.1%
ASCII | 1331 | 14.9%
CJK | 1 | < 0.1%

Most frequent character per block

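ASCII
Value | Count | Frequency (%)
(space) | 991 | 74.5%
( | 49 | 3.7%
) | 49 | 3.7%
K | 32 | 2.4%
o | 28 | 2.1%
H | 26 | 2.0%
R | 25 | 1.9%
b | 15 | 1.1%
k | 14 | 1.1%
2 | 12 | 0.9%
Other values (22) | 90 | 6.8%

Hangul
Value | Count | Frequency (%)
[not preserved] | 399 | 5.2%
[not preserved] | 297 | 3.9%
[not preserved] | 268 | 3.5%
[not preserved] | 194 | 2.6%
[not preserved] | 190 | 2.5%
[not preserved] | 172 | 2.3%
[not preserved] | 171 | 2.2%
[not preserved] | 125 | 1.6%
[not preserved] | 102 | 1.3%
[not preserved] | 101 | 1.3%
Other values (434) | 5587 | 73.5%

CJK
Value | Count | Frequency (%)
[not preserved] | 1 | 100.0%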

수량
Real number (ℝ)

SKEWED 

Distinct: 7
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0233074
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 14
Range: 13
Interquartile range (IQR): 0
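
These quantiles can be reproduced from the value counts of 수량 listed in this report (1784 ones, 11 twos, and so on). The sketch below is one way to do it, assuming linear interpolation between order statistics; the report does not state which quantile method was used, and `percentile` is a hypothetical helper. Because 99% of the values equal 1, any common method gives the same answer here.

```python
# Value counts for 수량, copied from the frequency table in this report.
value_counts = {1: 1784, 2: 11, 3: 3, 4: 1, 5: 1, 6: 1, 14: 1}

# Expand the counts into a sorted list of 1802 observations.
values = sorted(v for v, c in value_counts.items() for _ in range(c))

def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics (q in [0, 1])."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)
iqr = q3 - q1
value_range = values[-1] - values[0]

print(q1, q3, p95, iqr, value_range)
```

An IQR of 0 means the middle half of the column is a single value, which is why the few larger quantities dominate the skewness figure.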

Descriptive statistics

Standard deviation: 0.36582271
Coefficient of variation (CV): 0.35749052
Kurtosis: 910.11122
Mean: 1.0233074
Median Absolute Deviation (MAD): 0
Skewness: 27.603475
Sum: 1844
Variance: 0.13382626
Monotonicity: Not monotonic
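
Most of these figures can be recomputed from the value counts of 수량 alone. The sketch below assumes the sample variance (divisor n - 1) and the adjusted Fisher-Pearson skewness estimator; those choices are assumptions that happen to reproduce the reported numbers, not something the report states. `central_moment_sum` is a hypothetical helper.

```python
import math

# Value counts for 수량, copied from the frequency table in this report.
value_counts = {1: 1784, 2: 11, 3: 3, 4: 1, 5: 1, 6: 1, 14: 1}

n = sum(value_counts.values())
total = sum(v * c for v, c in value_counts.items())
mean = total / n

def central_moment_sum(power):
    """Sum of (x - mean)**power over all observations."""
    return sum(c * (v - mean) ** power for v, c in value_counts.items())

variance = central_moment_sum(2) / (n - 1)  # sample variance (assumed ddof=1)
std = math.sqrt(variance)
cv = std / mean                             # coefficient of variation

# Adjusted Fisher-Pearson skewness (assumed estimator).
m2 = central_moment_sum(2) / n
m3 = central_moment_sum(3) / n
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

print(f"n={n} sum={total} mean={mean:.7f}")
print(f"variance={variance:.8f} std={std:.8f} cv={cv:.8f}")
print(f"skewness={skewness:.4f}")
```

The single row with quantity 14 contributes most of the third central moment, which explains the very large skewness despite a mean close to 1.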
[Figure: Histogram with fixed size bins (bins=7)]

Most common values
Value | Count | Frequency (%)
1 | 1784 | 99.0%
2 | 11 | 0.6%
3 | 3 | 0.2%
14 | 1 | 0.1%
6 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%

Smallest values
Value | Count | Frequency (%)
1 | 1784 | 99.0%
2 | 11 | 0.6%
3 | 3 | 0.2%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
14 | 1 | 0.1%

Largest values
Value | Count | Frequency (%)
14 | 1 | 0.1%
6 | 1 | 0.1%
5 | 1 | 0.1%
4 | 1 | 0.1%
3 | 3 | 0.2%
2 | 11 | 0.6%
1 | 1784 | 99.0%

구분
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 14.2 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.1537181
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row 일반
2nd row 일반
3rd row 일반
4th row 일반
5th row 일반

Common Values

Value | Count | Frequency (%)
일반 (general) | 1574 | 87.3%
전자책 (e-book) | 179 | 9.9%
오디오북 (audiobook) | 49 | 2.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 구분]

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 수량 | 구분
수량 | 1.000 | 0.000
구분 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

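First rows
Index | 도서명 | 저자 | 출판사 | 수량 | 구분
0 | 게으르지만 콘텐츠로 돈은 잘 법니다 | 신태순 | 나비의 활주로 | 1 | 일반
1 | 무인 양품의 생각과 말 | 양품계획 | 웅진지식하우스 | 1 | 일반
2 | 어포메이션 | 노아세인트존 | 나비스쿨 | 1 | 일반
3 | 마녀의 은신처 | 존 딕슨 카 | 엘릭시르 | 1 | 일반
4 | 미스터 프레지던트 | 탁현민 | 메디치미디어 | 1 | 일반
5 | 교만의 요새 | 마사 c.누스바움 | 민음사 | 1 | 일반
6 | 연약한 선 | 마사 c.누스바움 | 서커스 | 1 | 일반
7 | 피렌체 서점 이양기 | 로스킹 | 책과함께 | 1 | 일반
8 | 오래된 신들이 섬에 내려오시니 | 전건우외 5인 | 들녁 | 1 | 일반
9 | 생강빵과 진저브레드 | 김지현 | 비채 | 1 | 일반

Last rows
Index | 도서명 | 저자 | 출판사 | 수량 | 구분
1792 | 어느날, 노비가 되었다.3. | 지은지,이민아 | 아르볼 | 1 | 일반
1793 | 설전도 수련관.2 | 김경미 | 슈크림북 | 1 | 일반
1794 | 톰과 소야의 도시 탐험.1. | 하야미네 카오루 | 상상출판 | 1 | 일반
1795 | 톰과 소야의 도시 탐험.3. | 하야미네 카오루 | 상상출판 | 1 | 일반
1796 | 미지의 파랑.1 : 소울메이트를 찾아서 | 차율이 | 비룡소 | 1 | 일반
1797 | 미지의 파랑.3 : 새로운 세계를 찾아서 | 차율이 | 비룡소 | 1 | 일반
1798 | 왔구마고구마구마.2. : 쉿!비밀이구마 | 조주희 | 킨더랜드 | 1 | 일반
1799 | (신기한 맛)도깨비 식당.4~5 | 김용세, 김병섭 | 꿈터 | 2 | 일반
1800 | 이리의 형제.4.: 친구와 적 | 허교범 | 창비 | 1 | 일반
1801 | 이리의 형제.5: 목숨보다 소중한 것 | 허교범 | 창비 | 1 | 일반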

Duplicate rows

Most frequently occurring

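Index | 도서명 | 저자 | 출판사 | 수량 | 구분 | # duplicates
17 | 이브의 세 딸 | 엘리프 샤팍 | 소담출판사 | 1 | 일반 | 3
0 | 김미경의 마흔수업 | 김미경 | 어웨이크북스 | 1 | 일반 | 2
1 | LA 이방인 | 신재동 | 북랩 | 1 | 일반 | 2
2 | 겨울이 마주한 봄은 멍멍이에요 | 강이서 | 스토리해윰 | 1 | 일반 | 2
3 | 고양이에게 말 걸기 | 백종선 | 청어 | 1 | 일반 | 2
4 | 꼬마철학자 두발로 | 신광철 | 느티나무가있는풍경 | 1 | 일반 | 2
5 | 남자무리 여사친 | 정율리 | 북폴리오 | 1 | 일반 | 2
6 | 두 번째 원고 | 함윤이 | 사계절 | 1 | 일반 | 2
7 | 라이크 팔로우 리벤지(스토리 콜렉터 105) | 엘러리 로이드 | 북로드 | 1 | 일반 | 2
8 | 마음의 비율 | 김승연 | 마시멜로 | 1 | 일반 | 2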
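
A table like the one above can be produced by grouping identical rows and counting each group. The sketch below does this with the standard library on a small toy list; the rows are illustrative (field values as read from this report, in the column order 도서명, 저자, 출판사, 수량, 구분), and `duplicate_groups` is a hypothetical helper, not the profiling tool's own routine.

```python
from collections import Counter

# Toy rows in the report's column order: 도서명, 저자, 출판사, 수량, 구분.
rows = [
    ("이브의 세 딸", "엘리프 샤팍", "소담출판사", 1, "일반"),
    ("김미경의 마흔수업", "김미경", "어웨이크북스", 1, "일반"),
    ("이브의 세 딸", "엘리프 샤팍", "소담출판사", 1, "일반"),
    ("마녀의 은신처", "존 딕슨 카", "엘릭시르", 1, "일반"),
    ("김미경의 마흔수업", "김미경", "어웨이크북스", 1, "일반"),
    ("이브의 세 딸", "엘리프 샤팍", "소담출판사", 1, "일반"),
]

def duplicate_groups(records):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(records)
    groups = [(rec, n) for rec, n in counts.items() if n > 1]
    return sorted(groups, key=lambda item: item[1], reverse=True)

dups = duplicate_groups(rows)
for rec, n in dups:
    print(rec[0], n)
```

Rows are compared on all five fields, so two purchases of the same title with a different 수량 or 구분 would not be grouped together.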