Overview

Dataset statistics

Number of variables: 2
Number of observations: 207
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 3.9%
Total size in memory: 3.4 KiB
Average record size in memory: 16.6 B

Variable types

Text: 1
Categorical: 1

Dataset

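Description: Categories of the library cyber learning center of Uijeongbu City, Gyeonggi Province, as of 2020. The data provides the category name (카테고리명, i.e. the book type name) and type (타입) fields.
URL: https://www.data.go.kr/data/15064193/fileData.do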

Alerts

Dataset has 8 (3.9%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-12 22:10:04.242313
Analysis finished: 2023-12-12 22:10:04.453513
Duration: 0.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
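
A report of this kind can be regenerated with ydata-profiling's `ProfileReport`. The following is a minimal sketch, not the configuration used for this run: the helper name `build_report`, the file paths, and the `cp949` encoding are assumptions (Korean public-data CSV files are often distributed in that encoding). The third-party imports sit inside the function so the snippet loads even where the packages are not installed.

```python
def build_report(csv_path, html_path="report.html", encoding="cp949"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily: pandas and ydata-profiling are third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Build the profile and write it as a standalone HTML file.
    report = ProfileReport(df, title="Overview")
    report.to_file(html_path)
    return html_path
```

Calling `build_report("categories.csv")` with a locally downloaded copy of the dataset (the file name is hypothetical) would write `report.html` next to the script.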

Variables

카테고리명
Text

Distinct: 186
Distinct (%): 89.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 13
Median length: 11
Mean length: 4.5700483
Min length: 1

Characters and Unicode

Total characters: 946
Distinct characters: 231
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
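
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below maps the two-letter category codes to the long names used in this report; the `CATEGORY_NAMES` table and the `category_counts` helper are illustrative and not part of ydata-profiling.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two category names taken from the sample rows of this report.
sample = ["에세이/산문", "한국소설"]
print(category_counts(sample))
```

Hangul syllables fall under Other Letter, `/` under Other Punctuation, and `~` under Math Symbol, which is why those categories appear in the breakdown of this column.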

Unique

Unique: 169
Unique (%): 81.6%
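
The percentages and the mean length for this variable follow directly from the counts reported above. A small worked check, using only figures from this report (the variable names are illustrative):

```python
n_rows = 207        # Number of observations
n_distinct = 186    # Distinct values
n_unique = 169      # Unique values, i.e. values that occur exactly once
total_chars = 946   # Total characters

# Each share is the count divided by the number of observations.
distinct_pct = round(100 * n_distinct / n_rows, 1)
unique_pct = round(100 * n_unique / n_rows, 1)

# Mean length is the total character count spread over all rows.
mean_length = round(total_chars / n_rows, 7)

print(distinct_pct, unique_pct, mean_length)
```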

Sample

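1st row: 문학 (literature)
2nd row: 한국소설 (Korean novels)
3rd row: 한국근대소설 (modern Korean novels)
4th row: 감성소설 (sentimental novels)
5th row: 외국소설 (foreign novels)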

Value                              Count  Frequency (%)
기타 (other)                       5      2.4%
it                                 3      1.4%
문학 (literature)                  2      0.9%
교양 (liberal arts)                2      0.9%
어린이영어 (children's English)    2      0.9%
인물이야기 (biographies)           2      0.9%
역사 (history)                     2      0.9%
어학 (language study)              2      0.9%
어린이 (children)                  2      0.9%
일본어 (Japanese)                  2      0.9%
Other values (180)                 187    88.6%

Most occurring characters

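Value                      Count  Frequency (%)
/                          68     7.2%
(Hangul, not preserved)    53     5.6%
(Hangul, not preserved)    29     3.1%
(Hangul, not preserved)    26     2.7%
(Hangul, not preserved)    26     2.7%
(Hangul, not preserved)    22     2.3%
(Hangul, not preserved)    20     2.1%
(Hangul, not preserved)    18     1.9%
(Hangul, not preserved)    16     1.7%
(Hangul, not preserved)    15     1.6%
Other values (221)         653    69.0%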

Most occurring categories

Value                Count  Frequency (%)
Other Letter         822    86.9%
Other Punctuation    68     7.2%
Uppercase Letter     29     3.1%
Decimal Number       17     1.8%
Math Symbol          5      0.5%
Space Separator      4      0.4%
Dash Punctuation     1      0.1%

Most frequent character per category

Other Letter

Value                      Count  Frequency (%)
(Hangul, not preserved)    53     6.4%
(Hangul, not preserved)    29     3.5%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    22     2.7%
(Hangul, not preserved)    20     2.4%
(Hangul, not preserved)    18     2.2%
(Hangul, not preserved)    16     1.9%
(Hangul, not preserved)    15     1.8%
(Hangul, not preserved)    15     1.8%
Other values (198)         582    70.8%

Uppercase Letter

Value               Count  Frequency (%)
O                   6      20.7%
I                   4      13.8%
E                   3      10.3%
T                   3      10.3%
S                   3      10.3%
D                   2      6.9%
A                   2      6.9%
B                   2      6.9%
K                   1      3.4%
R                   1      3.4%
Other values (2)    2      6.9%

Decimal Number

Value  Count  Frequency (%)
3      3      17.6%
4      3      17.6%
5      3      17.6%
6      3      17.6%
2      2      11.8%
1      2      11.8%
0      1      5.9%

Other Punctuation

Value  Count  Frequency (%)
/      68     100.0%

Math Symbol

Value  Count  Frequency (%)
~      5      100.0%

Space Separator

Value    Count  Frequency (%)
(space)  4      100.0%

Dash Punctuation

Value  Count  Frequency (%)
-      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  822    86.9%
Common  95     10.0%
Latin   29     3.1%

Most frequent character per script

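Hangul

Value                      Count  Frequency (%)
(Hangul, not preserved)    53     6.4%
(Hangul, not preserved)    29     3.5%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    22     2.7%
(Hangul, not preserved)    20     2.4%
(Hangul, not preserved)    18     2.2%
(Hangul, not preserved)    16     1.9%
(Hangul, not preserved)    15     1.8%
(Hangul, not preserved)    15     1.8%
Other values (198)         582    70.8%

Latin

Value               Count  Frequency (%)
O                   6      20.7%
I                   4      13.8%
E                   3      10.3%
T                   3      10.3%
S                   3      10.3%
D                   2      6.9%
A                   2      6.9%
B                   2      6.9%
K                   1      3.4%
R                   1      3.4%
Other values (2)    2      6.9%

Common

Value    Count  Frequency (%)
/        68     71.6%
~        5      5.3%
(space)  4      4.2%
3        3      3.2%
4        3      3.2%
5        3      3.2%
6        3      3.2%
2        2      2.1%
1        2      2.1%
-        1      1.1%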

Most occurring blocks

Value   Count  Frequency (%)
Hangul  822    86.9%
ASCII   124    13.1%

Most frequent character per block

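ASCII

Value                Count  Frequency (%)
/                    68     54.8%
O                    6      4.8%
~                    5      4.0%
(space)              4      3.2%
I                    4      3.2%
3                    3      2.4%
4                    3      2.4%
5                    3      2.4%
6                    3      2.4%
E                    3      2.4%
Other values (13)    22     17.7%

Hangul

Value                      Count  Frequency (%)
(Hangul, not preserved)    53     6.4%
(Hangul, not preserved)    29     3.5%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    26     3.2%
(Hangul, not preserved)    22     2.7%
(Hangul, not preserved)    20     2.4%
(Hangul, not preserved)    18     2.2%
(Hangul, not preserved)    16     1.9%
(Hangul, not preserved)    15     1.8%
(Hangul, not preserved)    15     1.8%
Other values (198)         582    70.8%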

타입
Categorical

Distinct: 4
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count
EBK    159
web    35
ado    11
ebk    2

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EBK
2nd row: EBK
3rd row: EBK
4th row: EBK
5th row: EBK

Common Values

Value  Count  Frequency (%)
EBK    159    76.8%
web    35     16.9%
ado    11     5.3%
ebk    2      1.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Value  Count  Frequency (%)
ebk    161    77.8%
web    35     16.9%
ado    11     5.3%
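
The raw 타입 column has four distinct labels, yet `EBK` (159) and `ebk` (2) differ only by letter case, and the case-folded table above lists three labels with `ebk` at 161. A minimal sketch of that kind of normalization, assuming a lowercase canonical form is acceptable for this column (the `fold_case` helper is illustrative):

```python
from collections import Counter

# Counts as listed under "Common Values" for the 타입 column.
raw_counts = Counter({"EBK": 159, "web": 35, "ado": 11, "ebk": 2})


def fold_case(counts):
    """Merge counts for labels that differ only by letter case."""
    merged = Counter()
    for label, n in counts.items():
        merged[label.lower()] += n
    return merged


normalized = fold_case(raw_counts)
print(normalized.most_common())
```

Whether the two spellings really denote the same type is a question for the data owner; the sketch only shows how the counts combine if they do.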

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

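Index  카테고리명                                  타입
0      문학 (literature)                           EBK
1      한국소설 (Korean novels)                    EBK
2      한국근대소설 (modern Korean novels)         EBK
3      감성소설 (sentimental novels)               EBK
4      외국소설 (foreign novels)                   EBK
5      고전 (classics)                             EBK
6      (not preserved)                             EBK
7      희곡 (plays)                                EBK
8      어른을위한동화 (fairy tales for adults)     EBK
9      에세이/산문 (essays/prose)                  EBK

Index  카테고리명                                        타입
197    마케팅 (marketing)                                ado
198    리더쉽 (leadership)                               ado
199    개인브랜드 (personal brand)                       ado
200    교양/평생교육 (liberal arts/lifelong education)   ado
201    커뮤니케이션 (communication)                      ado
202    심리/예절 (psychology/etiquette)                  ado
203    생활건강 (everyday health)                        ado
204    의학/건강 (medicine/health)                       ado
205    영어동화 (English fairy tales)                    ebk
206    교양사상 (liberal arts thought)                   ebk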

Duplicate rows

Most frequently occurring

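Index  카테고리명                         타입  # duplicates
2      기타 (other)                       EBK   4
0      IT                                 web   3
1      경제 (economics)                   web   2
3      문학 (literature)                  EBK   2
4      문화/예술 (culture/arts)           EBK   2
5      어린이영어 (children's English)    EBK   2
6      역사 (history)                     EBK   2
7      인물이야기 (biographies)           EBK   2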
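
The figure of 8 duplicate rows in the overview matches the number of entries in this table, which suggests it counts distinct repeated rows and not surplus copies. That reading is an inference from the numbers here, not a statement about the tool's internals. A short check under that assumption, and assuming the table lists every repeated row:

```python
# "# duplicates" column from the table above, keyed by (카테고리명, 타입).
duplicate_groups = {
    ("기타", "EBK"): 4,
    ("IT", "web"): 3,
    ("경제", "web"): 2,
    ("문학", "EBK"): 2,
    ("문화/예술", "EBK"): 2,
    ("어린이영어", "EBK"): 2,
    ("역사", "EBK"): 2,
    ("인물이야기", "EBK"): 2,
}
n_rows = 207  # Number of observations

n_groups = len(duplicate_groups)                # distinct repeated rows
rows_involved = sum(duplicate_groups.values())  # every copy of those rows
extra_copies = rows_involved - n_groups         # rows a de-duplication would drop
duplicate_pct = round(100 * n_groups / n_rows, 1)

print(n_groups, rows_involved, extra_copies, duplicate_pct)
```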