gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	100
Missing cells	1
Missing cells (%)	0.5%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.8 KiB
Average record size in memory	18.3 B
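
The missing-cell percentage above is taken over every cell in the table, which is why it differs from the 1.0% reported later for the single column that contains the gap. A minimal sketch of that arithmetic, using only the figures from this table (the variable names are illustrative and are not ydata-profiling identifiers):

```python
# Figures copied from the "Dataset statistics" table of this report.
n_variables = 2
n_observations = 100
missing_cells = 1

# Dataset-level percentage: missing cells over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# Column-level percentage: the same missing cell over the rows of one column.
missing_in_column_pct = 100 * missing_cells / n_observations

print(f"Missing cells (%): {missing_cells_pct:.1f}%")
print(f"Missing (%) in the affected column: {missing_in_column_pct:.1f}%")
```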

Variable types

Numeric	1
Text	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-2788/C/1/datasetView.do

Alerts

* 한국십진분류표 has unique values

Reproduction

Analysis started	2024-03-23 02:56:57.021814
Analysis finished	2024-03-23 02:56:57.632106
Duration	0.61 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

* 한국십진분류표
Real number (ℝ)

UNIQUE

Distinct	100
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	495

Minimum	0
Maximum	990
Zeros	1
Zeros (%)	1.0%
Negative	0
Negative (%)	0.0%
Memory size	1.0 KiB

Quantile statistics

Minimum	0
5-th percentile	49.5
Q1	247.5
median	495
Q3	742.5
95-th percentile	940.5
Maximum	990
Range	990
Interquartile range (IQR)	495

Descriptive statistics

Standard deviation	290.11492
Coefficient of variation (CV)	0.58609075
Kurtosis	-1.2
Mean	495
Median Absolute Deviation (MAD)	250
Skewness	0
Sum	49500
Variance	84166.667
Monotonicity	Strictly increasing
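
The quantile and descriptive statistics above can be cross-checked with the standard library alone. The sketch below assumes the column holds exactly the values 0, 10, ..., 990 (consistent with the minimum, maximum, count, and monotonicity reported here), that quantiles use linear interpolation, and that variance is the sample variance (n - 1 denominator). The helper `quantile` is written for this illustration and is not part of ydata-profiling.

```python
import math
import statistics

# Assumed column contents: 100 class numbers, 0 through 990 in steps of 10.
values = [10 * k for k in range(100)]


def quantile(sorted_values, p):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = (len(sorted_values) - 1) * p
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation
q1 = quantile(values, 0.25)
q3 = quantile(values, 0.75)
iqr = q3 - q1
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(f"mean={mean} median={median} sum={sum(values)}")
print(f"variance={variance:.3f} std={std:.5f} cv={cv:.8f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} MAD={mad}")
print(f"strictly increasing: {strictly_increasing}")
```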

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
0	1	1.0%
640	1	1.0%
740	1	1.0%
730	1	1.0%
720	1	1.0%
710	1	1.0%
700	1	1.0%
690	1	1.0%
680	1	1.0%
670	1	1.0%
Other values (90)	90	90.0%

Minimum 10 values

Value	Count	Frequency (%)
0	1	1.0%
10	1	1.0%
20	1	1.0%
30	1	1.0%
40	1	1.0%
50	1	1.0%
60	1	1.0%
70	1	1.0%
80	1	1.0%
90	1	1.0%

Maximum 10 values

Value	Count	Frequency (%)
990	1	1.0%
980	1	1.0%
970	1	1.0%
960	1	1.0%
950	1	1.0%
940	1	1.0%
930	1	1.0%
920	1	1.0%
910	1	1.0%
900	1	1.0%

Unnamed: 1
Text

Distinct	99
Distinct (%)	100.0%
Missing	1
Missing (%)	1.0%
Memory size	932.0 B

Length

Max length	17
Median length	16
Mean length	6.2424242
Min length	3

Characters and Unicode

Total characters	618
Distinct characters	140
Distinct categories	5
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
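
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses four of the sample values shown in this report (the ones without whitespace, so the result does not depend on which space character the data uses). This is not how ydata-profiling computes its tables internally; it only shows the kind of lookup involved.

```python
import unicodedata
from collections import Counter

# Sample values taken from this report; entries containing spaces are left out.
samples = ["도서학,서지학", "문헌정보학", "백과사전", "강연집,수필집,연설문집"]

# unicodedata.category returns the two-letter general category of a character,
# e.g. "Lo" (Other Letter) for Hangul syllables and "Po" (Other Punctuation) for ",".
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for category, count in category_counts.most_common():
    print(category, count)
```

The two-letter codes correspond to the category names in the tables of this report: `Lo` is Other Letter, `Po` is Other Punctuation, `Zs` is Space Separator, and `Ps` and `Pe` are Open and Close Punctuation.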

Unique

Unique	99
Unique (%)	100.0%

Sample

1st row	총 류
2nd row	도서학,서지학
3rd row	문헌정보학
4th row	백과사전
5th row	강연집,수필집,연설문집

Value	Count	Frequency (%)
학	18	9.8%
어	6	3.3%
및	6	3.3%
교	6	3.3%
기타	5	2.7%
물	4	2.2%
일반	3	1.6%
제	2	1.1%
기	2	1.1%
독	2	1.1%
Other values (116)	130	70.7%

Most occurring characters

Value	Count	Frequency (%)
(space, U+0020)	85	13.8%
(non-ASCII space)	66	10.7%
학	59	9.5%
,	25	4.0%
문	16	2.6%
아	13	2.1%
교	13	2.1%
어	12	1.9%
리	11	1.8%
공	11	1.8%
Other values (130)	307	49.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	440	71.2%
Space Separator	151	24.4%
Other Punctuation	25	4.0%
Close Punctuation	1	0.2%
Open Punctuation	1	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
학	59	13.4%
문	16	3.6%
아	13	3.0%
교	13	3.0%
어	12	2.7%
리	11	2.5%
공	11	2.5%
기	11	2.5%
예	8	1.8%
사	8	1.8%
Other values (125)	278	63.2%

Space Separator

Value	Count	Frequency (%)
(space, U+0020)	85	56.3%
(non-ASCII space)	66	43.7%

Other Punctuation

Value	Count	Frequency (%)
,	25	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	440	71.2%
Common	178	28.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
학	59	13.4%
문	16	3.6%
아	13	3.0%
교	13	3.0%
어	12	2.7%
리	11	2.5%
공	11	2.5%
기	11	2.5%
예	8	1.8%
사	8	1.8%
Other values (125)	278	63.2%

Common

Value	Count	Frequency (%)
(space, U+0020)	85	47.8%
(non-ASCII space)	66	37.1%
,	25	14.0%
)	1	0.6%
(	1	0.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	440	71.2%
ASCII	112	18.1%
None	66	10.7%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space, U+0020)	85	75.9%
,	25	22.3%
)	1	0.9%
(	1	0.9%

None

Value	Count	Frequency (%)
(non-ASCII space)	66	100.0%

Hangul

Value	Count	Frequency (%)
학	59	13.4%
문	16	3.6%
아	13	3.0%
교	13	3.0%
어	12	2.7%
리	11	2.5%
공	11	2.5%
기	11	2.5%
예	8	1.8%
사	8	1.8%
Other values (125)	278	63.2%

Phik (φk)

	* 한국십진분류표	Unnamed: 1
* 한국십진분류표	1.000	1.000
Unnamed: 1	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.
Matrix: a data-dense display of nullity that lets you quickly pick out patterns in data completion.

First rows

	* 한국십진분류표	Unnamed: 1
0	0	총 류
1	10	도서학,서지학
2	20	문헌정보학
3	30	백과사전
4	40	강연집,수필집,연설문집
5	50	일반 연속간행물
6	60	일반 학회,단체,협회,기관
7	70	신문,저널리즘
8	80	일반 전집,총서
9	90	향토자료

Last rows

	* 한국십진분류표	Unnamed: 1
90	900	역 사
91	910	아 시 아
92	920	유 럽
93	930	아프리카
94	940	북아메리카
95	950	남아메리카
96	960	오세아니아
97	970	양극지방
98	980	지 리
99	990	전 기
