Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 258
Missing cells (%): 0.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 634.8 KiB
Average record size in memory: 65.0 B
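The overview figures can be cross-checked against the per-variable numbers reported further down. The sketch below only re-derives the missing-cell share and the average record size from values printed in this report; the variable-to-count mapping follows the column order shown in the Sample section, and nothing is read from the dataset itself.

```python
# Values copied from this report; no data file is loaded.
n_variables = 7
n_observations = 10_000

# "Missing" counts as listed in each variable's section.
missing_per_variable = {
    "서지번호": 0,
    "유형": 0,
    "서명": 1,
    "저자": 93,
    "출판사": 17,
    "출판년도": 38,
    "분류번호": 109,
}

missing_cells = sum(missing_per_variable.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 634.8 KiB spread over 10000 records.
avg_record_bytes = 634.8 * 1024 / n_observations

print(missing_cells, f"{missing_pct:.1f}%", f"{avg_record_bytes:.1f} B")
```

The share is taken over all cells (variables times observations), which is why 258 missing cells come out at well under one percent.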

Variable types

Numeric: 1
Categorical: 1
Text: 5

Dataset

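Description: Book title, author name, publisher name, physical description, notes, full-text information, etc.
Author: 충북대학교 (Chungbuk National University)
URL: https://www.data.go.kr/data/3058186/fileData.do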

Alerts

Imbalance: 유형 is highly imbalanced (82.6%)
Missing: 분류번호 has 109 (1.1%) missing values
Unique: 서지번호 has unique values
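The 82.6% imbalance figure is consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below works under that assumption (it is not taken from the profiler's source) and uses the 유형 counts reported in this document; the function name `imbalance_score` is made up for illustration.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        # Degenerate case; returning 0 here is a convention assumed for this sketch.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Category counts of 유형 as reported in this document.
type_counts = {"국내단행본": 9576, "DVD": 375, "고서": 49}
score = imbalance_score(list(type_counts.values()))
print(f"{score:.1%}")
```

With three categories the maximum entropy is log2(3), so a column where one value covers about 96% of rows lands close to the top of the scale.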

Reproduction

Analysis started: 2023-12-12 04:56:07.387462
Analysis finished: 2023-12-12 04:56:10.843416
Duration: 3.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

서지번호
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1175787.5
Minimum: 44494
Maximum: 3017993
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 44494
5-th percentile: 93635.7
Q1: 264490.25
Median: 422836
Q3: 2242929.8
95-th percentile: 2962100.6
Maximum: 3017993
Range: 2973499
Interquartile range (IQR): 1978439.5

Descriptive statistics

Standard deviation: 1075023
Coefficient of variation (CV): 0.91430039
Kurtosis: -1.5860814
Mean: 1175787.5
Median Absolute Deviation (MAD): 279787
Skewness: 0.44847822
Sum: 1.1757875 × 10^10
Variance: 1.1556744 × 10^12
Monotonicity: Not monotonic
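Several of the statistics above are simple functions of others, which gives a quick consistency check. The sketch below recomputes range, IQR, CV, sum, and variance from the printed values only. Because the report rounds its figures, the results agree to the displayed precision rather than exactly.

```python
# Figures copied from the quantile and descriptive statistics of 서지번호.
minimum, maximum = 44494, 3017993
q1, q3 = 264490.25, 2242929.8
mean, std = 1175787.5, 1075023
n = 10_000

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
total = mean * n                  # Sum
variance = std ** 2               # Variance

print(value_range, round(iqr, 1), round(cv, 4), f"{total:.7e}", f"{variance:.7e}")
```

A CV near 0.91 together with strongly negative kurtosis suggests the values are spread widely rather than clustered around the mean, which fits the large gap between the median and Q3.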
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
285701 | 1 | < 0.1%
272390 | 1 | < 0.1%
2081396 | 1 | < 0.1%
2187130 | 1 | < 0.1%
1933784 | 1 | < 0.1%
2973452 | 1 | < 0.1%
404881 | 1 | < 0.1%
2771411 | 1 | < 0.1%
358401 | 1 | < 0.1%
389211 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
44494 | 1 | < 0.1%
44503 | 1 | < 0.1%
44517 | 1 | < 0.1%
44631 | 1 | < 0.1%
44634 | 1 | < 0.1%
44645 | 1 | < 0.1%
44757 | 1 | < 0.1%
44803 | 1 | < 0.1%
44839 | 1 | < 0.1%
44951 | 1 | < 0.1%

Value | Count | Frequency (%)
3017993 | 1 | < 0.1%
3017983 | 1 | < 0.1%
3017980 | 1 | < 0.1%
3017709 | 1 | < 0.1%
3015435 | 1 | < 0.1%
3015243 | 1 | < 0.1%
3015233 | 1 | < 0.1%
3015226 | 1 | < 0.1%
3014845 | 1 | < 0.1%
3014828 | 1 | < 0.1%

유형
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
국내단행본: 9576
DVD: 375
고서: 49

Length

Max length: 5
Median length: 5
Mean length: 4.9103
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국내단행본
2nd row: 국내단행본
3rd row: 국내단행본
4th row: 국내단행본
5th row: 국내단행본

Common Values

Value | Count | Frequency (%)
국내단행본 | 9576 | 95.8%
DVD | 375 | 3.8%
고서 | 49 | 0.5%
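The frequency column and the mean length reported for 유형 both follow directly from these three counts. The sketch below recomputes them from the counts alone, with string length measured in Unicode code points (Python's `len`). That way of measuring length is an assumption about the profiler, although it does reproduce the reported 4.9103.

```python
# Counts copied from the Common Values table for 유형.
counts = {"국내단행본": 9576, "DVD": 375, "고서": 49}

n = sum(counts.values())

# Share of each category in percent.
shares = {value: 100 * count / n for value, count in counts.items()}

# Count-weighted mean of the string lengths.
mean_length = sum(len(value) * count for value, count in counts.items()) / n

for value, share in shares.items():
    print(value, f"{share:.2f}%")
print("mean length:", mean_length)
```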

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
국내단행본 | 9576 | 95.8%
dvd | 375 | 3.8%
고서 | 49 | 0.5%

서명
Text

Distinct: 9728
Distinct (%): 97.3%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 133
Median length: 88
Mean length: 15.020002
Min length: 1

Characters and Unicode

Total characters: 150185
Distinct characters: 2635
Distinct categories: 16
Distinct scripts: 7
Distinct blocks: 13
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
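As a rough illustration of those character properties, the sketch below tallies Unicode general categories for a title using Python's standard `unicodedata` module. The category labels mirror the ones used in this report. The `rough_script` helper is a hypothetical approximation based on character-name prefixes, since the standard library does not expose the Unicode script property, and it is not how the profiler determines scripts.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the labels used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(text):
    """Count characters per Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


def rough_script(ch):
    """Approximate the script from the character name (illustrative only)."""
    name = unicodedata.name(ch, "")
    for prefix, script in (("HANGUL", "Hangul"), ("CJK", "Han"),
                           ("LATIN", "Latin"), ("CYRILLIC", "Cyrillic"),
                           ("HIRAGANA", "Hiragana"), ("KATAKANA", "Katakana")):
        if name.startswith(prefix):
            return script
    return "Common"


title = "TCP/IP 네트워킹"  # first sample row of 서명
print(category_counts(title))
print(Counter(rough_script(ch) for ch in title))
```

Hangul syllables and Han ideographs both fall under "Other Letter", which is why that category dominates the tables for the text variables.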

Unique

Unique: 9557
Unique (%): 95.6%
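"Distinct" and "Unique" measure different things: assuming the usual profiling meaning, distinct counts the different values present, while unique counts the values that occur exactly once. The sketch below shows the distinction on a small hypothetical list, then checks that the reported percentages for 서명 are consistent with using the non-missing rows as the denominator (an inference from the printed numbers, not a documented rule).

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: "B" repeats, None is a missing value.
toy_titles = ["A", "B", "B", "C", None]
print(distinct_and_unique(toy_titles))

# Figures reported for 서명: 10000 rows, 1 missing.
non_missing = 10_000 - 1
distinct_pct = 100 * 9728 / non_missing
unique_pct = 100 * 9557 / non_missing
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```

The gap between the two counts (9728 minus 9557) is the number of titles that appear more than once.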

Sample

1st row: TCP/IP 네트워킹
2nd row: 건축설계이론
3rd row: (새로운)財務管理論
4th row: (단재)신채호
5th row: 軍改革 이렇게 해야 한다

Value | Count | Frequency (%)
위한 | 325 | 1.1%
연구 | 261 | 0.9%
 | 244 | 0.8%
개발 | 118 | 0.4%
관한 | 92 | 0.3%
이야기 | 78 | 0.3%
21세기 | 72 | 0.2%
이해 | 69 | 0.2%
 | 67 | 0.2%
프로그래밍 | 58 | 0.2%
Other values (18772) | 29041 | 95.5%

Most occurring characters

ValueCountFrequency (%)
20438
 
13.6%
) 5270
 
3.5%
( 5269
 
3.5%
2607
 
1.7%
1726
 
1.1%
1605
 
1.1%
1524
 
1.0%
1492
 
1.0%
0 1234
 
0.8%
1181
 
0.8%
Other values (2625) 107839
71.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 98158
65.4%
Space Separator 20438
 
13.6%
Lowercase Letter 9229
 
6.1%
Uppercase Letter 5958
 
4.0%
Close Punctuation 5326
 
3.5%
Open Punctuation 5325
 
3.5%
Decimal Number 4365
 
2.9%
Other Punctuation 1082
 
0.7%
Dash Punctuation 185
 
0.1%
Math Symbol 87
 
0.1%
Other values (6) 32
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2607
 
2.7%
1726
 
1.8%
1605
 
1.6%
1524
 
1.6%
1492
 
1.5%
1181
 
1.2%
1107
 
1.1%
1055
 
1.1%
997
 
1.0%
962
 
1.0%
Other values (2488) 83902
85.5%
Lowercase Letter
ValueCountFrequency (%)
e 1028
11.1%
o 842
 
9.1%
i 812
 
8.8%
a 807
 
8.7%
n 714
 
7.7%
r 684
 
7.4%
t 619
 
6.7%
s 555
 
6.0%
l 429
 
4.6%
c 373
 
4.0%
Other values (38) 2366
25.6%
Uppercase Letter
ValueCountFrequency (%)
S 569
 
9.6%
C 505
 
8.5%
A 491
 
8.2%
T 441
 
7.4%
I 419
 
7.0%
E 414
 
6.9%
P 329
 
5.5%
O 312
 
5.2%
M 291
 
4.9%
D 273
 
4.6%
Other values (20) 1914
32.1%
Other Punctuation
ValueCountFrequency (%)
. 291
26.9%
, 275
25.4%
· 191
17.7%
/ 81
 
7.5%
! 64
 
5.9%
& 49
 
4.5%
' 45
 
4.2%
: 24
 
2.2%
" 18
 
1.7%
% 15
 
1.4%
Other values (7) 29
 
2.7%
Decimal Number
ValueCountFrequency (%)
0 1234
28.3%
1 895
20.5%
2 849
19.5%
3 280
 
6.4%
5 247
 
5.7%
9 219
 
5.0%
4 174
 
4.0%
8 166
 
3.8%
6 156
 
3.6%
7 145
 
3.3%
Math Symbol
ValueCountFrequency (%)
+ 56
64.4%
~ 21
 
24.1%
> 4
 
4.6%
= 3
 
3.4%
< 2
 
2.3%
1
 
1.1%
Letter Number
ValueCountFrequency (%)
6
33.3%
5
27.8%
4
22.2%
1
 
5.6%
1
 
5.6%
1
 
5.6%
Close Punctuation
ValueCountFrequency (%)
) 5270
98.9%
] 30
 
0.6%
13
 
0.2%
12
 
0.2%
1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 5269
98.9%
[ 30
 
0.6%
13
 
0.2%
12
 
0.2%
1
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
5
62.5%
® 2
 
25.0%
1
 
12.5%
Modifier Symbol
ValueCountFrequency (%)
´ 1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
20438
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 185
100.0%
Final Punctuation
ValueCountFrequency (%)
2
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%
Initial Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 82217
54.7%
Common 36822
24.5%
Han 15871
 
10.6%
Latin 15149
 
10.1%
Cyrillic 56
 
< 0.1%
Katakana 48
 
< 0.1%
Hiragana 22
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
465
 
2.9%
372
 
2.3%
298
 
1.9%
239
 
1.5%
237
 
1.5%
232
 
1.5%
213
 
1.3%
181
 
1.1%
180
 
1.1%
137
 
0.9%
Other values (1403) 13317
83.9%
Hangul
ValueCountFrequency (%)
2607
 
3.2%
1726
 
2.1%
1605
 
2.0%
1524
 
1.9%
1492
 
1.8%
1181
 
1.4%
1107
 
1.3%
1055
 
1.3%
997
 
1.2%
962
 
1.2%
Other values (1038) 67961
82.7%
Latin
ValueCountFrequency (%)
e 1028
 
6.8%
o 842
 
5.6%
i 812
 
5.4%
a 807
 
5.3%
n 714
 
4.7%
r 684
 
4.5%
t 619
 
4.1%
S 569
 
3.8%
s 555
 
3.7%
C 505
 
3.3%
Other values (48) 8014
52.9%
Common
ValueCountFrequency (%)
20438
55.5%
) 5270
 
14.3%
( 5269
 
14.3%
0 1234
 
3.4%
1 895
 
2.4%
2 849
 
2.3%
. 291
 
0.8%
3 280
 
0.8%
, 275
 
0.7%
5 247
 
0.7%
Other values (43) 1774
 
4.8%
Katakana
ValueCountFrequency (%)
6
 
12.5%
5
 
10.4%
4
 
8.3%
4
 
8.3%
3
 
6.2%
2
 
4.2%
2
 
4.2%
1
 
2.1%
1
 
2.1%
1
 
2.1%
Other values (19) 19
39.6%
Cyrillic
ValueCountFrequency (%)
и 7
 
12.5%
я 5
 
8.9%
е 5
 
8.9%
л 4
 
7.1%
а 4
 
7.1%
р 4
 
7.1%
д 3
 
5.4%
ы 2
 
3.6%
т 2
 
3.6%
п 2
 
3.6%
Other values (16) 18
32.1%
Hiragana
ValueCountFrequency (%)
13
59.1%
3
 
13.6%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
Hangul 82182
54.7%
ASCII 51680
34.4%
CJK 15561
 
10.4%
CJK Compat Ideographs 310
 
0.2%
None 264
 
0.2%
Cyrillic 56
 
< 0.1%
Katakana 48
 
< 0.1%
Compat Jamo 35
 
< 0.1%
Hiragana 22
 
< 0.1%
Number Forms 18
 
< 0.1%
Other values (3) 9
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20438
39.5%
) 5270
 
10.2%
( 5269
 
10.2%
0 1234
 
2.4%
e 1028
 
2.0%
1 895
 
1.7%
2 849
 
1.6%
o 842
 
1.6%
i 812
 
1.6%
a 807
 
1.6%
Other values (75) 14236
27.5%
Hangul
ValueCountFrequency (%)
2607
 
3.2%
1726
 
2.1%
1605
 
2.0%
1524
 
1.9%
1492
 
1.8%
1181
 
1.4%
1107
 
1.3%
1055
 
1.3%
997
 
1.2%
962
 
1.2%
Other values (1036) 67926
82.7%
CJK
ValueCountFrequency (%)
465
 
3.0%
372
 
2.4%
298
 
1.9%
239
 
1.5%
237
 
1.5%
232
 
1.5%
213
 
1.4%
181
 
1.2%
180
 
1.2%
137
 
0.9%
Other values (1343) 13007
83.6%
None
ValueCountFrequency (%)
· 191
72.3%
13
 
4.9%
13
 
4.9%
12
 
4.5%
12
 
4.5%
7
 
2.7%
4
 
1.5%
3
 
1.1%
® 2
 
0.8%
´ 1
 
0.4%
Other values (6) 6
 
2.3%
CJK Compat Ideographs
ValueCountFrequency (%)
43
13.9%
41
 
13.2%
29
 
9.4%
29
 
9.4%
14
 
4.5%
13
 
4.2%
10
 
3.2%
10
 
3.2%
9
 
2.9%
7
 
2.3%
Other values (50) 105
33.9%
Compat Jamo
ValueCountFrequency (%)
32
91.4%
3
 
8.6%
Hiragana
ValueCountFrequency (%)
13
59.1%
3
 
13.6%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
Cyrillic
ValueCountFrequency (%)
и 7
 
12.5%
я 5
 
8.9%
е 5
 
8.9%
л 4
 
7.1%
а 4
 
7.1%
р 4
 
7.1%
д 3
 
5.4%
ы 2
 
3.6%
т 2
 
3.6%
п 2
 
3.6%
Other values (16) 18
32.1%
Katakana
ValueCountFrequency (%)
6
 
12.5%
5
 
10.4%
4
 
8.3%
4
 
8.3%
3
 
6.2%
2
 
4.2%
2
 
4.2%
1
 
2.1%
1
 
2.1%
1
 
2.1%
Other values (19) 19
39.6%
Number Forms
ValueCountFrequency (%)
6
33.3%
5
27.8%
4
22.2%
1
 
5.6%
1
 
5.6%
1
 
5.6%
Letterlike Symbols
ValueCountFrequency (%)
5
100.0%
Punctuation
ValueCountFrequency (%)
2
66.7%
1
33.3%
Enclosed Alphanum
ValueCountFrequency (%)
1
100.0%

저자
Text

Distinct: 8245
Distinct (%): 83.2%
Missing: 93
Missing (%): 0.9%
Memory size: 156.2 KiB

Length

Max length: 64
Median length: 3
Mean length: 6.7266579
Min length: 2

Characters and Unicode

Total characters: 66641
Distinct characters: 915
Distinct categories: 11
Distinct scripts: 7
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7309
Unique (%): 73.8%

Sample

1st row: Martin, James
2nd row: 조영호
3rd row: 노덕환
4th row: 외솔회
5th row: 서효일

Value | Count | Frequency (%)
한국 | 168 | 1.2%
편집부 | 80 | 0.6%
david | 45 | 0.3%
j | 43 | 0.3%
정보통신부 | 37 | 0.3%
john | 35 | 0.3%
michael | 35 | 0.3%
m | 33 | 0.2%
과학기술부 | 31 | 0.2%
l | 27 | 0.2%
Other values (9538) | 13057 | 96.1%

Most occurring characters

ValueCountFrequency (%)
3686
 
5.5%
a 2156
 
3.2%
e 1830
 
2.7%
, 1756
 
2.6%
i 1623
 
2.4%
r 1469
 
2.2%
n 1436
 
2.2%
o 1322
 
2.0%
1114
 
1.7%
1092
 
1.6%
Other values (905) 49157
73.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 37087
55.7%
Lowercase Letter 17385
26.1%
Uppercase Letter 4185
 
6.3%
Space Separator 3686
 
5.5%
Other Punctuation 2619
 
3.9%
Decimal Number 1299
 
1.9%
Dash Punctuation 279
 
0.4%
Open Punctuation 41
 
0.1%
Close Punctuation 40
 
0.1%
Math Symbol 14
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1114
 
3.0%
1092
 
2.9%
1061
 
2.9%
957
 
2.6%
937
 
2.5%
898
 
2.4%
826
 
2.2%
814
 
2.2%
630
 
1.7%
575
 
1.6%
Other values (788) 28183
76.0%
Lowercase Letter
ValueCountFrequency (%)
a 2156
12.4%
e 1830
10.5%
i 1623
 
9.3%
r 1469
 
8.4%
n 1436
 
8.3%
o 1322
 
7.6%
l 925
 
5.3%
s 862
 
5.0%
t 825
 
4.7%
h 791
 
4.5%
Other values (38) 4146
23.8%
Uppercase Letter
ValueCountFrequency (%)
S 387
 
9.2%
M 355
 
8.5%
C 261
 
6.2%
J 253
 
6.0%
D 240
 
5.7%
R 239
 
5.7%
H 238
 
5.7%
K 228
 
5.4%
B 213
 
5.1%
A 207
 
4.9%
Other values (25) 1564
37.4%
Other Punctuation
ValueCountFrequency (%)
, 1756
67.0%
. 814
31.1%
& 14
 
0.5%
' 12
 
0.5%
· 12
 
0.5%
: 6
 
0.2%
# 2
 
0.1%
1
 
< 0.1%
* 1
 
< 0.1%
@ 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 346
26.6%
9 289
22.2%
5 115
 
8.9%
6 104
 
8.0%
4 99
 
7.6%
7 83
 
6.4%
2 77
 
5.9%
0 74
 
5.7%
3 57
 
4.4%
8 55
 
4.2%
Math Symbol
ValueCountFrequency (%)
> 5
35.7%
< 4
28.6%
+ 2
 
14.3%
= 2
 
14.3%
~ 1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 278
99.6%
1
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 37
92.5%
] 3
 
7.5%
Open Punctuation
ValueCountFrequency (%)
( 37
90.2%
[ 4
 
9.8%
Modifier Symbol
ValueCountFrequency (%)
^ 5
83.3%
¨ 1
 
16.7%
Space Separator
ValueCountFrequency (%)
3686
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 36772
55.2%
Latin 21502
32.3%
Common 7984
 
12.0%
Han 302
 
0.5%
Cyrillic 68
 
0.1%
Katakana 11
 
< 0.1%
Hiragana 2
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1114
 
3.0%
1092
 
3.0%
1061
 
2.9%
957
 
2.6%
937
 
2.5%
898
 
2.4%
826
 
2.2%
814
 
2.2%
630
 
1.7%
575
 
1.6%
Other values (569) 27868
75.8%
Han
ValueCountFrequency (%)
7
 
2.3%
6
 
2.0%
5
 
1.7%
5
 
1.7%
4
 
1.3%
4
 
1.3%
4
 
1.3%
4
 
1.3%
3
 
1.0%
西 3
 
1.0%
Other values (197) 257
85.1%
Latin
ValueCountFrequency (%)
a 2156
 
10.0%
e 1830
 
8.5%
i 1623
 
7.5%
r 1469
 
6.8%
n 1436
 
6.7%
o 1322
 
6.1%
l 925
 
4.3%
s 862
 
4.0%
t 825
 
3.8%
h 791
 
3.7%
Other values (43) 8263
38.4%
Common
ValueCountFrequency (%)
3686
46.2%
, 1756
22.0%
. 814
 
10.2%
1 346
 
4.3%
9 289
 
3.6%
- 278
 
3.5%
5 115
 
1.4%
6 104
 
1.3%
4 99
 
1.2%
7 83
 
1.0%
Other values (24) 414
 
5.2%
Cyrillic
ValueCountFrequency (%)
а 9
 
13.2%
в 6
 
8.8%
о 6
 
8.8%
н 5
 
7.4%
и 4
 
5.9%
л 4
 
5.9%
е 3
 
4.4%
д 3
 
4.4%
й 3
 
4.4%
А 2
 
2.9%
Other values (20) 23
33.8%
Katakana
ValueCountFrequency (%)
2
18.2%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
Hiragana
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 36767
55.2%
ASCII 29470
44.2%
CJK 297
 
0.4%
Cyrillic 68
 
0.1%
None 15
 
< 0.1%
Katakana 11
 
< 0.1%
CJK Compat Ideographs 5
 
< 0.1%
Compat Jamo 5
 
< 0.1%
Hiragana 2
 
< 0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3686
 
12.5%
a 2156
 
7.3%
e 1830
 
6.2%
, 1756
 
6.0%
i 1623
 
5.5%
r 1469
 
5.0%
n 1436
 
4.9%
o 1322
 
4.5%
l 925
 
3.1%
s 862
 
2.9%
Other values (72) 12405
42.1%
Hangul
ValueCountFrequency (%)
1114
 
3.0%
1092
 
3.0%
1061
 
2.9%
957
 
2.6%
937
 
2.5%
898
 
2.4%
826
 
2.2%
814
 
2.2%
630
 
1.7%
575
 
1.6%
Other values (565) 27863
75.8%
None
ValueCountFrequency (%)
· 12
80.0%
¨ 1
 
6.7%
1
 
6.7%
æ 1
 
6.7%
Cyrillic
ValueCountFrequency (%)
а 9
 
13.2%
в 6
 
8.8%
о 6
 
8.8%
н 5
 
7.4%
и 4
 
5.9%
л 4
 
5.9%
е 3
 
4.4%
д 3
 
4.4%
й 3
 
4.4%
А 2
 
2.9%
Other values (20) 23
33.8%
CJK
ValueCountFrequency (%)
7
 
2.4%
6
 
2.0%
5
 
1.7%
5
 
1.7%
4
 
1.3%
4
 
1.3%
4
 
1.3%
4
 
1.3%
3
 
1.0%
西 3
 
1.0%
Other values (193) 252
84.8%
Katakana
ValueCountFrequency (%)
2
18.2%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
CJK Compat Ideographs
ValueCountFrequency (%)
2
40.0%
1
20.0%
1
20.0%
1
20.0%
Compat Jamo
ValueCountFrequency (%)
2
40.0%
1
20.0%
1
20.0%
1
20.0%
Punctuation
ValueCountFrequency (%)
1
100.0%
Hiragana
ValueCountFrequency (%)
1
50.0%
1
50.0%

출판사
Text

Distinct: 4311
Distinct (%): 43.2%
Missing: 17
Missing (%): 0.2%
Memory size: 156.2 KiB

Length

Max length: 48
Median length: 42
Mean length: 5.2625463
Min length: 1

Characters and Unicode

Total characters: 52536
Distinct characters: 1365
Distinct categories: 10
Distinct scripts: 6
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2885
Unique (%): 28.9%

Sample

1st row: 이한
2nd row: 예문사
3rd row: 學文社
4th row: 정음문화사
5th row: 백암

Value | Count | Frequency (%)
博英社 | 111 | 1.0%
法文社 | 74 | 0.7%
과학기술부 | 65 | 0.6%
출판부 | 64 | 0.6%
학지사 | 61 | 0.6%
정보통신부 | 61 | 0.6%
한국학술정보 | 51 | 0.5%
교육과학사 | 43 | 0.4%
螢雪出版社 | 41 | 0.4%
산업자원부 | 40 | 0.4%
Other values (4514) | 10390 | 94.4%

Most occurring characters

ValueCountFrequency (%)
2034
 
3.9%
1161
 
2.2%
1020
 
1.9%
977
 
1.9%
969
 
1.8%
957
 
1.8%
948
 
1.8%
941
 
1.8%
874
 
1.7%
614
 
1.2%
Other values (1355) 42041
80.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 47630
90.7%
Lowercase Letter 1943
 
3.7%
Space Separator 1020
 
1.9%
Uppercase Letter 1000
 
1.9%
Open Punctuation 280
 
0.5%
Close Punctuation 279
 
0.5%
Other Punctuation 198
 
0.4%
Decimal Number 172
 
0.3%
Dash Punctuation 12
 
< 0.1%
Math Symbol 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2034
 
4.3%
1161
 
2.4%
977
 
2.1%
969
 
2.0%
957
 
2.0%
948
 
2.0%
941
 
2.0%
874
 
1.8%
614
 
1.3%
599
 
1.3%
Other values (1271) 37556
78.8%
Lowercase Letter
ValueCountFrequency (%)
o 215
11.1%
e 205
10.6%
i 193
9.9%
a 181
 
9.3%
n 160
 
8.2%
s 139
 
7.2%
r 112
 
5.8%
l 104
 
5.4%
t 101
 
5.2%
m 69
 
3.6%
Other values (23) 464
23.9%
Uppercase Letter
ValueCountFrequency (%)
B 136
13.6%
M 98
 
9.8%
C 91
 
9.1%
S 78
 
7.8%
E 74
 
7.4%
K 65
 
6.5%
A 46
 
4.6%
P 45
 
4.5%
O 39
 
3.9%
D 35
 
3.5%
Other values (14) 293
29.3%
Decimal Number
ValueCountFrequency (%)
2 71
41.3%
1 67
39.0%
0 17
 
9.9%
4 5
 
2.9%
3 3
 
1.7%
9 3
 
1.7%
8 2
 
1.2%
5 2
 
1.2%
6 1
 
0.6%
7 1
 
0.6%
Other Punctuation
ValueCountFrequency (%)
· 57
28.8%
. 47
23.7%
& 38
19.2%
, 25
12.6%
: 12
 
6.1%
10
 
5.1%
/ 5
 
2.5%
' 2
 
1.0%
* 1
 
0.5%
# 1
 
0.5%
Open Punctuation
ValueCountFrequency (%)
[ 217
77.5%
( 63
 
22.5%
Close Punctuation
ValueCountFrequency (%)
] 216
77.4%
) 63
 
22.6%
Space Separator
ValueCountFrequency (%)
1020
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 38458
73.2%
Han 9162
 
17.4%
Latin 2921
 
5.6%
Common 1963
 
3.7%
Cyrillic 22
 
< 0.1%
Katakana 10
 
< 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
1161
 
12.7%
614
 
6.7%
278
 
3.0%
275
 
3.0%
260
 
2.8%
235
 
2.6%
181
 
2.0%
175
 
1.9%
164
 
1.8%
149
 
1.6%
Other values (651) 5670
61.9%
Hangul
ValueCountFrequency (%)
2034
 
5.3%
977
 
2.5%
969
 
2.5%
957
 
2.5%
948
 
2.5%
941
 
2.4%
874
 
2.3%
599
 
1.6%
598
 
1.6%
582
 
1.5%
Other values (600) 28979
75.4%
Latin
ValueCountFrequency (%)
o 215
 
7.4%
e 205
 
7.0%
i 193
 
6.6%
a 181
 
6.2%
n 160
 
5.5%
s 139
 
4.8%
B 136
 
4.7%
r 112
 
3.8%
l 104
 
3.6%
t 101
 
3.5%
Other values (38) 1375
47.1%
Common
ValueCountFrequency (%)
1020
52.0%
[ 217
 
11.1%
] 216
 
11.0%
2 71
 
3.6%
1 67
 
3.4%
) 63
 
3.2%
( 63
 
3.2%
· 57
 
2.9%
. 47
 
2.4%
& 38
 
1.9%
Other values (17) 104
 
5.3%
Katakana
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Cyrillic
ValueCountFrequency (%)
п 4
18.2%
н 4
18.2%
а 2
9.1%
у 2
9.1%
К 2
9.1%
р 2
9.1%
й 2
9.1%
л 2
9.1%
ы 2
9.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 38454
73.2%
CJK 9096
 
17.3%
ASCII 4817
 
9.2%
None 67
 
0.1%
CJK Compat Ideographs 66
 
0.1%
Cyrillic 22
 
< 0.1%
Katakana 10
 
< 0.1%
Compat Jamo 4
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2034
 
5.3%
977
 
2.5%
969
 
2.5%
957
 
2.5%
948
 
2.5%
941
 
2.4%
874
 
2.3%
599
 
1.6%
598
 
1.6%
582
 
1.5%
Other values (599) 28975
75.3%
CJK
ValueCountFrequency (%)
1161
 
12.8%
614
 
6.8%
278
 
3.1%
275
 
3.0%
260
 
2.9%
235
 
2.6%
181
 
2.0%
175
 
1.9%
164
 
1.8%
149
 
1.6%
Other values (621) 5604
61.6%
ASCII
ValueCountFrequency (%)
1020
21.2%
[ 217
 
4.5%
] 216
 
4.5%
o 215
 
4.5%
e 205
 
4.3%
i 193
 
4.0%
a 181
 
3.8%
n 160
 
3.3%
s 139
 
2.9%
B 136
 
2.8%
Other values (63) 2135
44.3%
None
ValueCountFrequency (%)
· 57
85.1%
10
 
14.9%
CJK Compat Ideographs
ValueCountFrequency (%)
8
 
12.1%
7
 
10.6%
6
 
9.1%
5
 
7.6%
4
 
6.1%
4
 
6.1%
4
 
6.1%
3
 
4.5%
2
 
3.0%
2
 
3.0%
Other values (20) 21
31.8%
Cyrillic
ValueCountFrequency (%)
п 4
18.2%
н 4
18.2%
а 2
9.1%
у 2
9.1%
К 2
9.1%
р 2
9.1%
й 2
9.1%
л 2
9.1%
ы 2
9.1%
Compat Jamo
ValueCountFrequency (%)
4
100.0%
Katakana
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

출판년도
Text

Distinct: 93
Distinct (%): 0.9%
Missing: 38
Missing (%): 0.4%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9979924
Min length: 2

Characters and Unicode

Total characters: 39828
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 21
Unique (%): 0.2%

Sample

1st row: 1998
2nd row: 2007
3rd row: 1991
4th row: 1989
5th row: 1995

Value | Count | Frequency (%)
2010 | 507 | 5.1%
2009 | 463 | 4.6%
2012 | 430 | 4.3%
2014 | 405 | 4.1%
2006 | 404 | 4.1%
2007 | 401 | 4.0%
2005 | 399 | 4.0%
2011 | 382 | 3.8%
2008 | 379 | 3.8%
2004 | 367 | 3.7%
Other values (82) | 5825 | 58.5%

Most occurring characters

ValueCountFrequency (%)
0 11218
28.2%
2 7672
19.3%
1 7260
18.2%
9 6017
15.1%
8 1970
 
4.9%
7 1417
 
3.6%
4 1126
 
2.8%
6 1120
 
2.8%
5 1074
 
2.7%
3 947
 
2.4%
Other values (4) 7
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 39821
> 99.9%
Dash Punctuation 3
 
< 0.1%
Uppercase Letter 2
 
< 0.1%
Lowercase Letter 2
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 11218
28.2%
2 7672
19.3%
1 7260
18.2%
9 6017
15.1%
8 1970
 
4.9%
7 1417
 
3.6%
4 1126
 
2.8%
6 1120
 
2.8%
5 1074
 
2.7%
3 947
 
2.4%
Lowercase Letter
ValueCountFrequency (%)
u 1
50.0%
s 1
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Uppercase Letter
ValueCountFrequency (%)
U 2
100.0%
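The category counts above show that a handful of characters in this column are not digits (three dashes and four Latin letters), and the minimum length is 2, so a few values are not plain four-digit years. The sketch below shows one cautious way to convert such a text column to numbers: accept only exact four-digit strings and map everything else to `None`. The function name `parse_year` and the malformed examples are hypothetical; the actual irregular values are not listed in this report.

```python
import re

# Exactly four ASCII digits, nothing else.
YEAR_PATTERN = re.compile(r"[0-9]{4}")


def parse_year(raw):
    """Return the year as int for a clean four-digit string, else None."""
    if raw is None:
        return None
    text = raw.strip()
    if YEAR_PATTERN.fullmatch(text):
        return int(text)
    return None


# "1998" and "2007" appear in the sample rows; the rest are made-up examples.
examples = ["1998", "2007", "19uu", "199-", "", None]
print([parse_year(value) for value in examples])
```

Keeping rejected values as `None` rather than guessing a year makes it easy to review them separately before any numeric analysis.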

Most occurring scripts

ValueCountFrequency (%)
Common 39824
> 99.9%
Latin 4
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 11218
28.2%
2 7672
19.3%
1 7260
18.2%
9 6017
15.1%
8 1970
 
4.9%
7 1417
 
3.6%
4 1126
 
2.8%
6 1120
 
2.8%
5 1074
 
2.7%
3 947
 
2.4%
Latin
ValueCountFrequency (%)
U 2
50.0%
u 1
25.0%
s 1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39828
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 11218
28.2%
2 7672
19.3%
1 7260
18.2%
9 6017
15.1%
8 1970
 
4.9%
7 1417
 
3.6%
4 1126
 
2.8%
6 1120
 
2.8%
5 1074
 
2.7%
3 947
 
2.4%
Other values (4) 7
 
< 0.1%

분류번호
Text

MISSING 

Distinct: 3276
Distinct (%): 33.1%
Missing: 109
Missing (%): 1.1%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 10
Mean length: 5.3564857
Min length: 1

Characters and Unicode

Total characters: 52981
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1906
Unique (%): 19.3%

Sample

1st row: 5.4
2nd row: 542.1
3rd row: 325.8
4th row: 991.17
5th row: 390

Value | Count | Frequency (%)
688.2 | 262 | 2.6%
813.6 | 137 | 1.4%
325.1 | 85 | 0.9%
814.6 | 73 | 0.7%
811.6 | 65 | 0.7%
325.04 | 58 | 0.6%
28.64 | 56 | 0.6%
4.76 | 53 | 0.5%
818 | 48 | 0.5%
320.1 | 48 | 0.5%
Other values (3266) | 9007 | 91.1%

Most occurring characters

ValueCountFrequency (%)
. 8294
15.7%
3 6576
12.4%
1 6434
12.1%
5 5087
9.6%
2 4687
8.8%
0 4070
7.7%
6 3800
7.2%
8 3745
7.1%
7 3597
6.8%
9 3444
6.5%
Other values (3) 3247
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 44685
84.3%
Other Punctuation 8295
 
15.7%
Space Separator 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 6576
14.7%
1 6434
14.4%
5 5087
11.4%
2 4687
10.5%
0 4070
9.1%
6 3800
8.5%
8 3745
8.4%
7 3597
8.0%
9 3444
7.7%
4 3245
7.3%
Other Punctuation
ValueCountFrequency (%)
. 8294
> 99.9%
, 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 52981
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 8294
15.7%
3 6576
12.4%
1 6434
12.1%
5 5087
9.6%
2 4687
8.8%
0 4070
7.7%
6 3800
7.2%
8 3745
7.1%
7 3597
6.8%
9 3444
6.5%
Other values (3) 3247
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 52981
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 8294
15.7%
3 6576
12.4%
1 6434
12.1%
5 5087
9.6%
2 4687
8.8%
0 4070
7.7%
6 3800
7.2%
8 3745
7.1%
7 3597
6.8%
9 3444
6.5%
Other values (3) 3247
 
6.1%
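Values such as 5.4, 4.76, and 28.64 sit alongside three-digit numbers such as 542.1 and 813.6. One possible explanation, offered here purely as an assumption, is that these are decimal classification numbers whose leading zeros were dropped at some earlier step. If that holds, the integer part can be padded back to three digits as sketched below; the helper name `restore_leading_zeros` is hypothetical, and the report itself does not confirm the classification scheme.

```python
def restore_leading_zeros(raw, width=3):
    """Pad the integer part of a classification number to `width` digits.

    Assumes the value is a decimal class number whose leading zeros were lost.
    Anything that does not look like digits[.digits] is returned unchanged.
    """
    text = raw.strip()
    integer_part, dot, fraction = text.partition(".")
    if not integer_part.isdigit():
        return text
    return integer_part.zfill(width) + dot + fraction


# Values taken from the sample rows and frequency table of 분류번호.
for value in ["5.4", "4.76", "28.64", "542.1", "390"]:
    print(value, "->", restore_leading_zeros(value))
```

Values that already have a three-digit integer part pass through unchanged, so the function is safe to apply to the whole column if the assumption is judged to hold.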

Interactions


Correlations

 | 서지번호 | 유형 | 출판년도
서지번호 | 1.000 | 0.191 | 0.939
유형 | 0.191 | 1.000 | 0.761
출판년도 | 0.939 | 0.761 | 1.000

 | 서지번호 | 유형
서지번호 | 1.000 | 0.122
유형 | 0.122 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
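
Nullity correlation can be sketched as an ordinary Pearson correlation between presence indicators (1 where a value exists, 0 where it is missing). The example below assumes that definition and uses small hypothetical columns, not the actual data; the helper names are made up for illustration.

```python
import math


def presence(column):
    """1 where a value is present, 0 where it is missing (None)."""
    return [0 if value is None else 1 for value in column]


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy columns with six rows each.
author = ["a", None, "c", None, "e", "f"]
publisher = ["p", None, "r", None, "t", "u"]     # missing in the same rows
year = ["1998", "2007", "1991", "1989", "1995", None]

same_rows = pearson(presence(author), presence(publisher))
different_rows = pearson(presence(author), presence(year))
print(round(same_rows, 3), round(different_rows, 3))
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is absent, and columns with no missing values have no defined nullity correlation.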

Sample

Row index and 서지번호 (run together) | 유형 | 서명 | 저자 | 출판사 | 출판년도 | 분류번호
65128285701 | 국내단행본 | TCP/IP 네트워킹 | Martin, James | 이한 | 1998 | 5.4
73126430753 | 국내단행본 | 건축설계이론 | 조영호 | 예문사 | 2007 | 542.1
27044153979 | 국내단행본 | (새로운)財務管理論 | 노덕환 | 學文社 | 1991 | 325.8
19930244004 | 국내단행본 | (단재)신채호 | 외솔회 | 정음문화사 | 1989 | 991.17
94723212169 | 국내단행본 | 軍改革 이렇게 해야 한다 | 서효일 | 백암 | 1995 | 390
322562182127 | 국내단행본 | (실험과 도전,) 식민지의 심연 | 이상, 1910-1937 | 민음사 | 2010 | 810.906
18846452080 | 국내단행본 | (내 손으로 받는) 우리 종자 | 안완식, 1942- | 들녘 | 2007 | 523.22
4033844839 | 국내단행본 | (長篇小說)그 少年의 첫사랑 | Wouk, Herman | 乙酉文化社 | 1956 | 843
195072980699 | 국내단행본 | (누리과정과 연계한) 창의적 전통놀이 | 임혜수 | 창지사 | 2016 | 375.1
57443348907 | 국내단행본 | After effects 5 | 이병현 | 사이버출판사 | 2001 | 4.76

Row index and 서지번호 (run together) | 유형 | 서명 | 저자 | 출판사 | 출판년도 | 분류번호
19158452841 | 국내단행본 | (노인복지를 위한) 노인영양관리 | 이병순 | 광문각 | 2007 | 594.1
77638413527 | 국내단행본 | 계량정보분석을 위한 프로그래밍 활용사례연구 | 한국과학기술정보연구원 | 한국과학기술정보연구원 | 2005 | 3.56
32273197330 | 국내단행본 | (實話集)狼虎血戰記 | 구소청 | 朝洋社 | 1953 | 813.7
1795887986 | 국내단행본 | (김수용 운명소설)命 | 김수용 | 玄岩社 | 1989 | 813.6
82082205691 | 국내단행본 | (Again!)뒤집어본 영문법 | 오성호 | 김영사 | 2006 | 745
31180166666 | 국내단행본 | (新制)作物生理學 | 박종성 | 鄕文社 | 1994 | 524
924272542252 | 국내단행본 | 국어과 교과서론 | 주세형, 1973- | 사회평론 | 2014 | 374.71
60432437071 | 국내단행본 | GoF의 디자인 패턴 | Gamma, Erich | 피어슨에듀케이션코리아 | 2007 | 5.115
76341151404 | 국내단행본 | 경제사강의 | 한경민 | 두리 | 1989 | 320.9
28574249357 | 국내단행본 | (소설)마쓰시타 | Kosaka Jiro | 매일경제신문사 | 1995 | 813.6