gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	207
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	8
Duplicate rows (%)	3.9%
Total size in memory	3.4 KiB
Average record size in memory	16.6 B
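
The derived figures in this table can be checked against the raw counts. The following is a minimal Python sketch, assuming the duplicate percentage is duplicate rows divided by observations and the total size is the average record size times the number of observations; the variable names are illustrative and do not come from the report.

```python
# Values copied from the Dataset statistics table above.
n_observations = 207
n_duplicate_rows = 8
avg_record_bytes = 16.6

# Assumption: duplicate share = duplicate rows / observations.
duplicate_pct = round(n_duplicate_rows / n_observations * 100, 1)

# Assumption: total size = average record size * observations, in KiB.
total_kib = round(avg_record_bytes * n_observations / 1024, 1)

print(duplicate_pct, total_kib)
```

Under these assumptions the results agree with the reported 3.9% and 3.4 KiB.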

Variable types

Text	1
Categorical	1

Dataset

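Description	Categories of the Uijeongbu City Library Cyber Learning Center in Gyeonggi-do, as of 2020. Provides data for the category name (카테고리명, the book type name) and type (타입) fields.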
URL	https://www.data.go.kr/data/15064193/fileData.do

Alerts

Dataset has 8 (3.9%) duplicate rows	Duplicates

Reproduction

Analysis started	2023-12-12 22:10:04.242313
Analysis finished	2023-12-12 22:10:04.453513
Duration	0.21 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

카테고리명
Text

Distinct	186
Distinct (%)	89.9%
Missing	0
Missing (%)	0.0%
Memory size	1.7 KiB

Length

Max length	13
Median length	11
Mean length	4.5700483
Min length	1
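
As an illustration of how such length statistics can be derived, here is a small Python sketch. It assumes lengths are counted in Unicode code points and uses only the five sample values of 카테고리명 listed in this report, so its results describe that sample, not the full column. The last line checks the reported mean against the report's total of 946 characters over 207 observations.

```python
from statistics import mean, median

# First five sample values of 카테고리명 shown in this report.
sample = ["문학", "한국소설", "한국근대소설", "감성소설", "외국소설"]
lengths = [len(value) for value in sample]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)

# Over the full column: mean length = total characters / observations.
full_mean = 946 / 207
print(round(full_mean, 7))
```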

Characters and Unicode

Total characters	946
Distinct characters	231
Distinct categories	7
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
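
A minimal sketch of this kind of analysis, using Python's standard `unicodedata` module to count characters per general category. The helper name `category_counts` is hypothetical, and this is not necessarily how the report itself was computed.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category (two-letter code)."""
    return Counter(unicodedata.category(ch) for ch in text)


# "에세이/산문" is one of the category names shown in the first rows.
counts = category_counts("에세이/산문")

# Lo = Other Letter, Po = Other Punctuation
print(counts)
```

The standard library exposes the general category but not the script or block properties, so those breakdowns would need another data source.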

Unique

Unique	169
Unique (%)	81.6%
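
The report lists Distinct and Unique separately. The sketch below assumes that Distinct counts the different values in the column and Unique counts the values that occur exactly once; the miniature column is hypothetical and is not the real data.

```python
from collections import Counter

# Hypothetical miniature column; not the real data.
values = ["문학", "기타", "기타", "역사", "문학", "시"]

counts = Counter(values)
n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen once

print(n_distinct, n_unique)

# Percentages as reported: counts divided by the 207 observations.
distinct_pct = round(186 / 207 * 100, 1)
unique_pct = round(169 / 207 * 100, 1)
```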

Sample

1st row	문학
2nd row	한국소설
3rd row	한국근대소설
4th row	감성소설
5th row	외국소설

Value	Count	Frequency (%)
기타	5	2.4%
it	3	1.4%
문학	2	0.9%
교양	2	0.9%
어린이영어	2	0.9%
인물이야기	2	0.9%
역사	2	0.9%
어학	2	0.9%
어린이	2	0.9%
일본어	2	0.9%
Other values (180)	187	88.6%

Most occurring characters

Value	Count	Frequency (%)
/	68	7.2%
학	53	5.6%
어	29	3.1%
교	26	2.7%
문	26	2.7%
화	22	2.3%
기	20	2.1%
이	18	1.9%
리	16	1.7%
자	15	1.6%
Other values (221)	653	69.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	822	86.9%
Other Punctuation	68	7.2%
Uppercase Letter	29	3.1%
Decimal Number	17	1.8%
Math Symbol	5	0.5%
Space Separator	4	0.4%
Dash Punctuation	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
학	53	6.4%
어	29	3.5%
교	26	3.2%
문	26	3.2%
화	22	2.7%
기	20	2.4%
이	18	2.2%
리	16	1.9%
자	15	1.8%
사	15	1.8%
Other values (198)	582	70.8%

Uppercase Letter

Value	Count	Frequency (%)
O	6	20.7%
I	4	13.8%
E	3	10.3%
T	3	10.3%
S	3	10.3%
D	2	6.9%
A	2	6.9%
B	2	6.9%
K	1	3.4%
R	1	3.4%
Other values (2)	2	6.9%

Decimal Number

Value	Count	Frequency (%)
3	3	17.6%
4	3	17.6%
5	3	17.6%
6	3	17.6%
2	2	11.8%
1	2	11.8%
0	1	5.9%

Other Punctuation

Value	Count	Frequency (%)
/	68	100.0%

Math Symbol

Value	Count	Frequency (%)
~	5	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	4	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	822	86.9%
Common	95	10.0%
Latin	29	3.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
학	53	6.4%
어	29	3.5%
교	26	3.2%
문	26	3.2%
화	22	2.7%
기	20	2.4%
이	18	2.2%
리	16	1.9%
자	15	1.8%
사	15	1.8%
Other values (198)	582	70.8%

Latin

Value	Count	Frequency (%)
O	6	20.7%
I	4	13.8%
E	3	10.3%
T	3	10.3%
S	3	10.3%
D	2	6.9%
A	2	6.9%
B	2	6.9%
K	1	3.4%
R	1	3.4%
Other values (2)	2	6.9%

Common

Value	Count	Frequency (%)
/	68	71.6%
~	5	5.3%
(space)	4	4.2%
3	3	3.2%
4	3	3.2%
5	3	3.2%
6	3	3.2%
2	2	2.1%
1	2	2.1%
-	1	1.1%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	822	86.9%
ASCII	124	13.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
/	68	54.8%
O	6	4.8%
~	5	4.0%
(space)	4	3.2%
I	4	3.2%
3	3	2.4%
4	3	2.4%
5	3	2.4%
6	3	2.4%
E	3	2.4%
Other values (13)	22	17.7%

Hangul

Value	Count	Frequency (%)
학	53	6.4%
어	29	3.5%
교	26	3.2%
문	26	3.2%
화	22	2.7%
기	20	2.4%
이	18	2.2%
리	16	1.9%
자	15	1.8%
사	15	1.8%
Other values (198)	582	70.8%

타입
Categorical

Distinct	4
Distinct (%)	1.9%
Missing	0
Missing (%)	0.0%
Memory size	1.7 KiB

EBK	159
web	35
ado	11
ebk	2

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	EBK
2nd row	EBK
3rd row	EBK
4th row	EBK
5th row	EBK

Common Values

Value	Count	Frequency (%)
EBK	159	76.8%
web	35	16.9%
ado	11	5.3%
ebk	2	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
ebk	161	77.8%
web	35	16.9%
ado	11	5.3%

Missing values

Count
A simple visualization of nullity by column.

Matrix
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

	카테고리명	타입
0	문학	EBK
1	한국소설	EBK
2	한국근대소설	EBK
3	감성소설	EBK
4	외국소설	EBK
5	고전	EBK
6	시	EBK
7	희곡	EBK
8	어른을위한동화	EBK
9	에세이/산문	EBK

Last rows

	카테고리명	타입
197	마케팅	ado
198	리더쉽	ado
199	개인브랜드	ado
200	교양/평생교육	ado
201	커뮤니케이션	ado
202	심리/예절	ado
203	생활건강	ado
204	의학/건강	ado
205	영어동화	ebk
206	교양사상	ebk

Duplicate rows

Most frequently occurring

	카테고리명	타입	# duplicates
2	기타	EBK	4
0	IT	web	3
1	경제	web	2
3	문학	EBK	2
4	문화/예술	EBK	2
5	어린이영어	EBK	2
6	역사	EBK	2
7	인물이야기	EBK	2
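
The table lists each repeated (카테고리명, 타입) pair once, together with how often it occurs. A sketch of how such a table could be built with the standard library follows; the rows are hypothetical, and this is not necessarily how the report computes it.

```python
from collections import Counter

# Hypothetical rows of (카테고리명, 타입); not the real dataset.
rows = [
    ("기타", "EBK"), ("기타", "EBK"), ("IT", "web"),
    ("IT", "web"), ("IT", "web"), ("시", "EBK"),
]

# Keep only rows that occur more than once, most frequent first.
duplicates = [
    (row, n) for row, n in Counter(rows).most_common() if n > 1
]
print(duplicates)
```

Rows are compared exactly, so values that differ only in letter case count as different rows.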
