Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1028
Duplicate rows (%): 10.3%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B
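
The derived figures in this block follow from the raw ones by simple arithmetic. The sketch below recomputes them as a sanity check; the variable names are illustrative, and the 1 KiB = 1024 B conversion is an assumption about how the tool reports sizes.

```python
# Figures copied from the "Dataset statistics" block.
n_observations = 10_000
duplicate_rows = 1_028
total_size_kib = 546.9

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size: total bytes divided by the number of rows
# (assuming 1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both results round to the values reported above (10.3% and 56.0 B).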

Variable types

Text: 2
Categorical: 3
Boolean: 1

Dataset

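Description: 대구광역시_공공도서관 연속자료(간행물) 보유 내역_20210920 (Daegu Metropolitan City public library serial publication (periodical) holdings, 2021-09-20)
Author: 대구광역시 (Daegu Metropolitan City)
URL: http://data.daegu.go.kr/open/data/dataView.do?dataSetId=15089209&dataSetDetailId=1508920919cfddd557f68&provdMethod=FILE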

Alerts

Dataset has 1028 (10.3%) duplicate rows [Duplicates]
대출가능여부 is highly overall correlated with 등록번호 and 2 other fields [High correlation]
등록번호 is highly overall correlated with 대출가능여부 and 1 other fields [High correlation]
발행빈도 is highly overall correlated with 대출가능여부 [High correlation]
도서관 is highly overall correlated with 등록번호 and 1 other fields [High correlation]
등록번호 is highly imbalanced (85.5%) [Imbalance]
대출가능여부 is highly imbalanced (95.2%) [Imbalance]
발행빈도 is highly imbalanced (50.5%) [Imbalance]
도서관 is highly imbalanced (62.1%) [Imbalance]
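
The imbalance percentages in the alerts are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class shares and k is the number of distinct values. That the profiling tool computes the score exactly this way is an assumption; the sketch below only shows that the formula reproduces two of the reported figures from the value counts listed later in this report. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (H in bits)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch: a single class has no score
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts taken from the 대출가능여부 and 도서관 sections of this report.
loanable_counts = [9946, 54]
library_counts = [6541, 3147, 153, 106, 22, 21, 9, 1]

print(f"대출가능여부: {100 * imbalance_score(loanable_counts):.1f}%")
print(f"도서관: {100 * imbalance_score(library_counts):.1f}%")
```

A score near 100% means almost all rows share one value; a score of 0% means the values are evenly spread.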

Reproduction

Analysis started: 2023-12-10 19:41:49.678481
Analysis finished: 2023-12-10 19:41:51.153954
Duration: 1.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
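
The reported duration can be checked against the two timestamps. A minimal sketch using only the standard library (the format string is an assumption about how the timestamps are laid out, based on their appearance here):

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-10 19:41:49.678481", FMT)
finished = datetime.strptime("2023-12-10 19:41:51.153954", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()

print(f"Duration: {duration_seconds:.2f} seconds")
```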

Variables

서명 (Title)
Text

Distinct: 1437
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 179
Median length: 152
Mean length: 11.6968
Min length: 1

Characters and Unicode

Total characters: 116968
Distinct characters: 511
Distinct categories: 7
Distinct scripts: 6
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
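
As a small illustration of such properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` is illustrative, and the input is the first sample row of 서명. The standard library exposes the general category (for example `Lo` for Other Letter) but not the script or block properties, so those are not shown.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in `text` by Unicode general category (e.g. 'Lo', 'Lu', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample_title = "객석 AUDITORIUM"  # first sample row of the 서명 column
counts = category_counts(sample_title)

# 'Lo' = Other Letter (the Hangul syllables), 'Lu' = Uppercase Letter,
# 'Zs' = Space Separator.
for category, n in counts.most_common():
    print(category, n)
```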

Unique

Unique: 435
Unique (%): 4.3%

Sample

1st row: 객석 AUDITORIUM
2nd row: 매일신문
3rd row: 이코노미스트2013 이코노미스트 2013
4th row: 신동아2014
5th row: 월간바둑 월간 바둑
Value | Count | Frequency (%)
2013 | 1379 | 8.0%
time | 594 | 3.5%
타임 | 560 | 3.3%
time2013 | 543 | 3.2%
타임2013 | 543 | 3.2%
2012 | 542 | 3.2%
뉴스위크2014 | 465 | 2.7%
시사in2014 | 400 | 2.3%
이코노미스트2014 | 339 | 2.0%
주간동아2014 | 336 | 2.0%
Other values (2049) | 11484 | 66.8%

Most occurring characters

ValueCountFrequency (%)
0 10578
 
9.0%
2 9669
 
8.3%
1 8799
 
7.5%
7215
 
6.2%
E 3652
 
3.1%
3 3489
 
3.0%
I 3017
 
2.6%
2485
 
2.1%
T 2045
 
1.7%
4 2011
 
1.7%
Other values (501) 64008
54.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 50817 | 43.4%
Decimal Number | 36513 | 31.2%
Uppercase Letter | 22107 | 18.9%
Space Separator | 7215 | 6.2%
Lowercase Letter | 300 | 0.3%
Math Symbol | 10 | < 0.1%
Other Punctuation | 6 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2485
 
4.9%
1822
 
3.6%
1799
 
3.5%
1503
 
3.0%
1419
 
2.8%
1356
 
2.7%
1322
 
2.6%
1303
 
2.6%
1279
 
2.5%
1221
 
2.4%
Other values (441) 35308
69.5%
Uppercase Letter
ValueCountFrequency (%)
E 3652
16.5%
I 3017
13.6%
T 2045
9.3%
M 1762
 
8.0%
N 1607
 
7.3%
A 1462
 
6.6%
O 1345
 
6.1%
R 1134
 
5.1%
S 866
 
3.9%
L 717
 
3.2%
Other values (17) 4500
20.4%
Lowercase Letter
ValueCountFrequency (%)
o 42
14.0%
n 32
10.7%
a 30
10.0%
e 29
9.7%
r 26
8.7%
i 23
7.7%
l 18
 
6.0%
c 18
 
6.0%
t 14
 
4.7%
f 13
 
4.3%
Other values (9) 55
18.3%
Decimal Number
ValueCountFrequency (%)
0 10578
29.0%
2 9669
26.5%
1 8799
24.1%
3 3489
 
9.6%
4 2011
 
5.5%
8 896
 
2.5%
7 570
 
1.6%
9 469
 
1.3%
5 21
 
0.1%
6 11
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 5
83.3%
: 1
 
16.7%
Space Separator
ValueCountFrequency (%)
7215
100.0%
Math Symbol
ValueCountFrequency (%)
= 10
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 50726 | 43.4%
Common | 43744 | 37.4%
Latin | 22402 | 19.2%
Han | 76 | 0.1%
Katakana | 15 | < 0.1%
Greek | 5 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2485
 
4.9%
1822
 
3.6%
1799
 
3.5%
1503
 
3.0%
1419
 
2.8%
1356
 
2.7%
1322
 
2.6%
1303
 
2.6%
1279
 
2.5%
1221
 
2.4%
Other values (401) 35217
69.4%
Latin
ValueCountFrequency (%)
E 3652
16.3%
I 3017
13.5%
T 2045
9.1%
M 1762
 
7.9%
N 1607
 
7.2%
A 1462
 
6.5%
O 1345
 
6.0%
R 1134
 
5.1%
S 866
 
3.9%
L 717
 
3.2%
Other values (35) 4795
21.4%
Han
ValueCountFrequency (%)
10
 
13.2%
9
 
11.8%
5
 
6.6%
4
 
5.3%
4
 
5.3%
4
 
5.3%
4
 
5.3%
3
 
3.9%
2
 
2.6%
2
 
2.6%
Other values (27) 29
38.2%
Common
ValueCountFrequency (%)
0 10578
24.2%
2 9669
22.1%
1 8799
20.1%
7215
16.5%
3 3489
 
8.0%
4 2011
 
4.6%
8 896
 
2.0%
7 570
 
1.3%
9 469
 
1.1%
5 21
 
< 0.1%
Other values (4) 27
 
0.1%
Katakana
ValueCountFrequency (%)
5
33.3%
5
33.3%
5
33.3%
Greek
ValueCountFrequency (%)
Α 5
100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 66146 | 56.6%
Hangul | 50725 | 43.4%
CJK | 66 | 0.1%
Katakana | 15 | < 0.1%
CJK Compat Ideographs | 10 | < 0.1%
None | 5 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 10578
16.0%
2 9669
14.6%
1 8799
13.3%
7215
10.9%
E 3652
 
5.5%
3 3489
 
5.3%
I 3017
 
4.6%
T 2045
 
3.1%
4 2011
 
3.0%
M 1762
 
2.7%
Other values (49) 13909
21.0%
Hangul
ValueCountFrequency (%)
2485
 
4.9%
1822
 
3.6%
1799
 
3.5%
1503
 
3.0%
1419
 
2.8%
1356
 
2.7%
1322
 
2.6%
1303
 
2.6%
1279
 
2.5%
1221
 
2.4%
Other values (400) 35216
69.4%
CJK Compat Ideographs
ValueCountFrequency (%)
10
100.0%
CJK
ValueCountFrequency (%)
9
 
13.6%
5
 
7.6%
4
 
6.1%
4
 
6.1%
4
 
6.1%
4
 
6.1%
3
 
4.5%
2
 
3.0%
2
 
3.0%
2
 
3.0%
Other values (26) 27
40.9%
None
ValueCountFrequency (%)
Α 5
100.0%
Katakana
ValueCountFrequency (%)
5
33.3%
5
33.3%
5
33.3%
Compat Jamo
ValueCountFrequency (%)
1
100.0%

출판사 (Publisher)
Text

Distinct: 629
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 47
Median length: 24
Mean length: 6.0059
Min length: 1

Characters and Unicode

Total characters: 60059
Distinct characters: 442
Distinct categories: 6
Distinct scripts: 5
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 168
Unique (%): 1.7%

Sample

1st row: 주돌꽃컴퍼니
2nd row: 매일신문
3rd row: 중앙일보시사미디어사
4th row: 동아일보사
5th row: 한국기원
Value | Count | Frequency (%)
timemagazine | 555 | 5.5%
중앙일보사 | 473 | 4.7%
중앙일보시사미디어 | 468 | 4.7%
참언론 | 448 | 4.5%
동아일보사 | 446 | 4.4%
중앙일보시사미디어사 | 305 | 3.0%
조선일보 | 171 | 1.7%
매일경제신문사 | 156 | 1.6%
내일신문 | 155 | 1.5%
서울신문 | 149 | 1.5%
Other values (633) | 6704 | 66.8%

Most occurring characters

ValueCountFrequency (%)
4061
 
6.8%
3743
 
6.2%
3149
 
5.2%
2106
 
3.5%
1717
 
2.9%
E 1498
 
2.5%
1491
 
2.5%
1483
 
2.5%
I 1465
 
2.4%
A 1406
 
2.3%
Other values (432) 37940
63.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49911 | 83.1%
Uppercase Letter | 9871 | 16.4%
Decimal Number | 245 | 0.4%
Space Separator | 30 | < 0.1%
Close Punctuation | 1 | < 0.1%
Open Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4061
 
8.1%
3743
 
7.5%
3149
 
6.3%
2106
 
4.2%
1717
 
3.4%
1491
 
3.0%
1483
 
3.0%
1292
 
2.6%
1235
 
2.5%
1196
 
2.4%
Other values (397) 28438
57.0%
Uppercase Letter
ValueCountFrequency (%)
E 1498
15.2%
I 1465
14.8%
A 1406
14.2%
M 1366
13.8%
T 778
7.9%
N 746
7.6%
G 616
6.2%
Z 575
 
5.8%
L 166
 
1.7%
O 165
 
1.7%
Other values (16) 1090
11.0%
Decimal Number
ValueCountFrequency (%)
2 115
46.9%
1 110
44.9%
0 7
 
2.9%
3 5
 
2.0%
4 5
 
2.0%
8 3
 
1.2%
Space Separator
ValueCountFrequency (%)
30
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49820 | 83.0%
Latin | 9871 | 16.4%
Common | 277 | 0.5%
Han | 87 | 0.1%
Hiragana | 4 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4061
 
8.2%
3743
 
7.5%
3149
 
6.3%
2106
 
4.2%
1717
 
3.4%
1491
 
3.0%
1483
 
3.0%
1292
 
2.6%
1235
 
2.5%
1196
 
2.4%
Other values (351) 28347
56.9%
Han
ValueCountFrequency (%)
12
 
13.8%
9
 
10.3%
5
 
5.7%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
2
 
2.3%
Other values (35) 41
47.1%
Latin
ValueCountFrequency (%)
E 1498
15.2%
I 1465
14.8%
A 1406
14.2%
M 1366
13.8%
T 778
7.9%
N 746
7.6%
G 616
6.2%
Z 575
 
5.8%
L 166
 
1.7%
O 165
 
1.7%
Other values (16) 1090
11.0%
Common
ValueCountFrequency (%)
2 115
41.5%
1 110
39.7%
30
 
10.8%
0 7
 
2.5%
3 5
 
1.8%
4 5
 
1.8%
8 3
 
1.1%
) 1
 
0.4%
( 1
 
0.4%
Hiragana
ValueCountFrequency (%)
4
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49820 | 83.0%
ASCII | 10148 | 16.9%
CJK | 87 | 0.1%
Hiragana | 4 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4061
 
8.2%
3743
 
7.5%
3149
 
6.3%
2106
 
4.2%
1717
 
3.4%
1491
 
3.0%
1483
 
3.0%
1292
 
2.6%
1235
 
2.5%
1196
 
2.4%
Other values (351) 28347
56.9%
ASCII
ValueCountFrequency (%)
E 1498
14.8%
I 1465
14.4%
A 1406
13.9%
M 1366
13.5%
T 778
7.7%
N 746
7.4%
G 616
6.1%
Z 575
 
5.7%
L 166
 
1.6%
O 165
 
1.6%
Other values (25) 1367
13.5%
CJK
ValueCountFrequency (%)
12
 
13.8%
9
 
10.3%
5
 
5.7%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
3
 
3.4%
2
 
2.3%
Other values (35) 41
47.1%
Hiragana
ValueCountFrequency (%)
4
100.0%

등록번호 (Registration number)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 41
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 8880
2.01E+11 | 429
2.011E+11 | 391
2.007E+11 | 74
2.012E+11 | 71
Other values (36) | 155

Length

Max length: 12
Median length: 4
Mean length: 4.5267
Min length: 4

Unique

Unique: 32
Unique (%): 0.3%

Sample

1st row: 2.01E+11
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 8880 | 88.8%
2.01E+11 | 429 | 4.3%
2.011E+11 | 391 | 3.9%
2.007E+11 | 74 | 0.7%
2.012E+11 | 71 | 0.7%
2.005E+11 | 52 | 0.5%
2.009E+11 | 36 | 0.4%
2.008E+11 | 25 | 0.2%
2.006E+11 | 10 | 0.1%
BMQ000001783 | 1 | < 0.1%
Other values (31) | 31 | 0.3%
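
Values such as 2.01E+11 look like long numeric identifiers that were rendered in scientific notation before the data reached the profiler, in which case the original digits cannot be recovered from this column. That reading is an inference from the values shown, not something the report states. A minimal sketch for flagging such values, with a hypothetical helper `classify_registration` and an assumed pattern for the notation:

```python
import re

# Assumed shape of a value rendered in scientific notation, e.g. "2.011E+11".
SCI_NOTATION = re.compile(r"^\d(?:\.\d+)?E\+\d+$", re.IGNORECASE)

def classify_registration(value):
    """Label a 등록번호 value as 'placeholder', 'scientific-notation', or 'other'."""
    if value == "<NA>":
        return "placeholder"
    if SCI_NOTATION.match(value):
        return "scientific-notation"
    return "other"

# Values copied from the Common Values table.
for value in ["<NA>", "2.01E+11", "2.011E+11", "BMQ000001783"]:
    print(value, "->", classify_registration(value))
```

Rows labelled `scientific-notation` would need to be re-read from the source file as text to obtain usable identifiers.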

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
na | 8880 | 88.8%
2.01e+11 | 429 | 4.3%
2.011e+11 | 391 | 3.9%
2.007e+11 | 74 | 0.7%
2.012e+11 | 71 | 0.7%
2.005e+11 | 52 | 0.5%
2.009e+11 | 36 | 0.4%
2.008e+11 | 25 | 0.2%
2.006e+11 | 10 | 0.1%
bmq000001697 | 1 | < 0.1%
Other values (31) | 31 | 0.3%

대출가능여부 (Available for loan)
Boolean

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB

Value | Count | Frequency (%)
False | 9946 | 99.5%
True | 54 | 0.5%

발행빈도 (Publication frequency)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
주간 | 3742
월간 | 3035
일간 | 2701
계간 | 146
연간 | 110
Other values (10) | 266

Length

Max length: 4
Median length: 2
Mean length: 2.0228
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 월간
2nd row: 일간
3rd row: 주간
4th row: 월간
5th row: 월간

Common Values

Value | Count | Frequency (%)
주간 | 3742 | 37.4%
월간 | 3035 | 30.3%
일간 | 2701 | 27.0%
계간 | 146 | 1.5%
연간 | 110 | 1.1%
격월간 | 66 | 0.7%
격주간 | 53 | 0.5%
반월간 | 47 | 0.5%
기타 | 47 | 0.5%
반년간 | 26 | 0.3%
Other values (5) | 27 | 0.3%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
주간 | 3742 | 37.4%
월간 | 3035 | 30.3%
일간 | 2701 | 27.0%
계간 | 146 | 1.5%
연간 | 110 | 1.1%
격월간 | 66 | 0.7%
격주간 | 53 | 0.5%
반월간 | 47 | 0.5%
기타 | 47 | 0.5%
반년간 | 26 | 0.3%
Other values (5) | 27 | 0.3%

도서관 (Library)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
대구광역시립중앙도서관 | 6541
대구광역시립수성도서관 | 3147
대구광역시립북부도서관 | 153
대구광역시립남부도서관 | 106
구수산도서관 | 22
Other values (3) | 31

Length

Max length: 11
Median length: 11
Mean length: 10.9862
Min length: 6

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 대구광역시립중앙도서관
2nd row: 대구광역시립수성도서관
3rd row: 대구광역시립중앙도서관
4th row: 대구광역시립중앙도서관
5th row: 대구광역시립수성도서관

Common Values

Value | Count | Frequency (%)
대구광역시립중앙도서관 | 6541 | 65.4%
대구광역시립수성도서관 | 3147 | 31.5%
대구광역시립북부도서관 | 153 | 1.5%
대구광역시립남부도서관 | 106 | 1.1%
구수산도서관 | 22 | 0.2%
대구광역시립동부도서관 | 21 | 0.2%
이천어울림도서관 | 9 | 0.1%
수성구립 범어도서관 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
대구광역시립중앙도서관 | 6541 | 65.4%
대구광역시립수성도서관 | 3147 | 31.5%
대구광역시립북부도서관 | 153 | 1.5%
대구광역시립남부도서관 | 106 | 1.1%
구수산도서관 | 22 | 0.2%
대구광역시립동부도서관 | 21 | 0.2%
이천어울림도서관 | 9 | 0.1%
수성구립 | 1 | < 0.1%
범어도서관 | 1 | < 0.1%

Correlations

Variable | 등록번호 | 대출가능여부 | 발행빈도 | 도서관
등록번호 | 1.000 | NaN | 0.823 | 0.948
대출가능여부 | NaN | 1.000 | 0.590 | 0.698
발행빈도 | 0.823 | 0.590 | 1.000 | 0.792
도서관 | 0.948 | 0.698 | 0.792 | 1.000

Variable | 대출가능여부 | 등록번호 | 발행빈도 | 도서관
대출가능여부 | 1.000 | 1.000 | 0.543 | 0.533
등록번호 | 1.000 | 1.000 | 0.467 | 0.739
발행빈도 | 0.543 | 0.467 | 1.000 | 0.494
도서관 | 0.533 | 0.739 | 0.494 | 1.000

Variable | 등록번호 | 대출가능여부 | 발행빈도 | 도서관
등록번호 | 1.000 | 1.000 | 0.467 | 0.739
대출가능여부 | 1.000 | 1.000 | 0.543 | 0.533
발행빈도 | 0.467 | 0.543 | 1.000 | 0.494
도서관 | 0.739 | 0.533 | 0.494 | 1.000
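
The report does not say which association measure each matrix uses. For pairs of categorical variables, one common choice is Cramér's V, which is derived from the chi-squared statistic of the contingency table. The sketch below is a plain, uncorrected implementation for illustration only; whether any of the matrices above were produced this way is an assumption, and the two small tables are made-up inputs.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # Degrees-of-freedom factor: min(rows, columns) - 1.
    k = min(len(row_totals), len(col_totals)) - 1
    if k <= 0 or n == 0:
        return float("nan")
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables: one variable fully determines the other,
# and the two variables are unrelated.
perfect = [[50, 0], [0, 50]]
independent = [[25, 25], [25, 25]]

print(cramers_v(perfect), cramers_v(independent))
```

Values near 1 indicate that one variable almost determines the other; values near 0 indicate little association.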

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
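
The report counts 0 missing cells, yet 등록번호 lists <NA> 8880 times as an ordinary category with a minimum length of 4. This suggests the column holds the literal string "<NA>" rather than true nulls, though that is an inference from the figures above. Under that assumption, the sketch below computes a per-column missing share that treats the placeholder as missing. The helper `missing_share` is hypothetical, and the rows are the first five sample rows, reduced to two columns.

```python
PLACEHOLDER = "<NA>"  # assumed to be a literal string in the data

# First five sample rows of the report, reduced to two columns.
rows = [
    {"서명": "객석 AUDITORIUM", "등록번호": "2.01E+11"},
    {"서명": "매일신문", "등록번호": "<NA>"},
    {"서명": "이코노미스트2013 이코노미스트 2013", "등록번호": "<NA>"},
    {"서명": "신동아2014", "등록번호": "<NA>"},
    {"서명": "월간바둑 월간 바둑", "등록번호": "<NA>"},
]

def missing_share(rows, column, placeholder=PLACEHOLDER):
    """Fraction of rows whose `column` is None or equals the placeholder string."""
    values = [row[column] for row in rows]
    missing = sum(1 for value in values if value is None or value == placeholder)
    return missing / len(values)

for column in ("서명", "등록번호"):
    print(column, missing_share(rows, column))
```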

Sample

Index | 서명 | 출판사 | 등록번호 | 대출가능여부 | 발행빈도 | 도서관
23629 | 객석 AUDITORIUM | 주돌꽃컴퍼니 | 2.01E+11 | N | 월간 | 대구광역시립중앙도서관
9306 | 매일신문 | 매일신문 | <NA> | N | 일간 | 대구광역시립수성도서관
21199 | 이코노미스트2013 이코노미스트 2013 | 중앙일보시사미디어사 | <NA> | N | 주간 | 대구광역시립중앙도서관
24992 | 신동아2014 | 동아일보사 | <NA> | N | 월간 | 대구광역시립중앙도서관
740 | 월간바둑 월간 바둑 | 한국기원 | <NA> | N | 월간 | 대구광역시립수성도서관
24948 | CAMPUSJOBJOY CAMPUS JOBJOY | 한국경제신문사 | <NA> | N | 월간 | 대구광역시립중앙도서관
17408 | 人文論集 | 고려대학교 문과대학 | <NA> | Y | 연간 | 대구광역시립남부도서관
7055 | 한겨레신문 | 한겨레신문 | <NA> | N | 일간 | 대구광역시립수성도서관
4972 | 서울신문 | 서울신문 | <NA> | N | 일간 | 대구광역시립수성도서관
12677 | 한겨레신문 | 한겨레신문사 | <NA> | N | 일간 | 대구광역시립수성도서관
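
Index | 서명 | 출판사 | 등록번호 | 대출가능여부 | 발행빈도 | 도서관
7385 | 한국경제신문 | 한국경제신문 | <NA> | N | 일간 | 대구광역시립수성도서관
35153 | TIME2013 TIME 2013 타임2013 타임 | TIMEMAGAZINE | <NA> | N | 주간 | 대구광역시립중앙도서관
19639 | 월간유아09 | 월간유아 | <NA> | N | 월간 | 대구광역시립중앙도서관
28782 | 시사IN2014 | 참언론 | <NA> | N | 주간 | 대구광역시립중앙도서관
17123 | 주간조선 | 조선뉴스프레스 | <NA> | N | 주간 | 대구광역시립수성도서관
13544 | 한겨레신문 | 한겨레신문 | <NA> | N | 일간 | 대구광역시립수성도서관
12975 | 한국경제신문 | 한국경제 | <NA> | N | 일간 | 대구광역시립수성도서관
10259 | 내일신문 | 내일신문 | <NA> | N | 일간 | 대구광역시립수성도서관
15144 | 스포츠서울 | 스포츠서울 | <NA> | N | 일간 | 대구광역시립수성도서관
10885 | 매일경제신문 | 매일경제신문사 | <NA> | N | 일간 | 대구광역시립수성도서관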

Duplicate rows

Most frequently occurring

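Index | 서명 | 출판사 | 등록번호 | 대출가능여부 | 발행빈도 | 도서관 | # duplicates
171 | TIME2013 TIME 2013 타임2013 타임 | TIMEMAGAZINE | <NA> | N | 주간 | 대구광역시립중앙도서관 | 543
339 | 뉴스위크2014 | 중앙일보사 | <NA> | N | 주간 | 대구광역시립중앙도서관 | 465
551 | 시사IN2014 | 참언론 | <NA> | N | 주간 | 대구광역시립중앙도서관 | 400
757 | 이코노미스트2014 | 중앙일보시사미디어 | <NA> | N | 주간 | 대구광역시립중앙도서관 | 339
858 | 주간동아2014 | 동아일보사 | <NA> | N | 주간 | 대구광역시립중앙도서관 | 336
756 | 이코노미스트2013 이코노미스트 2013 | 중앙일보시사미디어사 | <NA> | N | 주간 | 대구광역시립중앙도서관 | 294
523 | 서울신문 | 서울신문 | <NA> | N | 일간 | 대구광역시립수성도서관 | 149
235 | 경북일보 | 경북일보 | <NA> | N | 일간 | 대구광역시립수성도서관 | 138
348 | 대구신문 | 대구신문 | <NA> | N | 일간 | 대구광역시립수성도서관 | 138
632 | 영남일보 | 영남일보 | <NA> | N | 일간 | 대구광역시립수성도서관 | 137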
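
The three largest groups above already total 1408 rows (543 + 465 + 400), more than the 1028 duplicate rows in the overview. The overview figure therefore appears to count distinct row patterns that repeat, not the number of redundant rows; this is an inference from the numbers, not something the report states. A minimal sketch of that kind of grouping, with a hypothetical helper `duplicate_groups` and a handful of made-up rows modelled on the table:

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each row pattern that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {pattern: n for pattern, n in counts.items() if n > 1}

# Illustrative rows only; the repetition counts here are not from the dataset.
seoul = ("서울신문", "서울신문", "<NA>", "N", "일간", "대구광역시립수성도서관")
yeongnam = ("영남일보", "영남일보", "<NA>", "N", "일간", "대구광역시립수성도서관")
maeil = ("매일신문", "매일신문", "<NA>", "N", "일간", "대구광역시립수성도서관")
rows = [seoul, seoul, seoul, yeongnam, yeongnam, maeil]

groups = duplicate_groups(rows)
print("distinct repeated patterns:", len(groups))
for pattern, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(n, pattern[0])
```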