gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	32
Missing cells	25
Missing cells (%)	26.0%
Duplicate rows	1
Duplicate rows (%)	3.1%
Total size in memory	900.0 B
Average record size in memory	28.1 B
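The percentage and per-record rows follow from the raw counts in the same table. Here is a minimal sketch. It assumes the denominators are all cells (3 variables × 32 observations = 96) for missing cells, and all 32 observations for duplicate rows and record size.

```python
# Counts copied from the "Dataset statistics" table above.
n_vars, n_obs = 3, 32
missing_cells = 25
duplicate_rows = 1
total_bytes = 900.0

# Assumption: percentages use variables x observations (96 cells) and
# observations (32 rows) as denominators, rounded to one decimal place.
n_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / n_cells
duplicate_pct = 100 * duplicate_rows / n_obs
avg_record_bytes = total_bytes / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```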

Variable types

Text	1
Categorical	2

Dataset

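Description	Management evaluation grade results for 가평군시설관리공단 (Gapyeong-gun Facilities Management Corporation), listing the grade awarded in each evaluation year from 2016 to 2022.
Author	가평군시설관리공단 (Gapyeong-gun Facilities Management Corporation)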
URL	https://www.data.go.kr/data/15121526/fileData.do

Alerts

Dataset has 1 (3.1%) duplicate rows	Duplicates
`데이터기준일자` is highly overall correlated with `평가등급`	High correlation
`평가등급` is highly overall correlated with `데이터기준일자`	High correlation
`평가(실적)연도` has 25 (78.1%) missing values	Missing

Reproduction

Analysis started	2023-12-12 03:38:39.248812
Analysis finished	2023-12-12 03:38:39.564150
Duration	0.32 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
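The reported duration is the difference between the two timestamps, rounded to two decimals. Here is a quick check with the standard library. It uses only the timestamps shown above.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-12-12 03:38:39.248812")
finished = datetime.fromisoformat("2023-12-12 03:38:39.564150")

# Elapsed wall-clock time in seconds, shown with two decimals as in the report.
elapsed = (finished - started).total_seconds()
print(f"Duration: {elapsed:.2f} seconds")
```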

평가(실적)연도 (evaluation year, with the performance year in parentheses)
Text

MISSING

Distinct	7
Distinct (%)	100.0%
Missing	25
Missing (%)	78.1%
Memory size	388.0 B

Length

Max length	10
Median length	10
Mean length	10
Min length	10

Characters and Unicode

Total characters	70
Distinct characters	10
Distinct categories	3
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
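The category tables below can be approximated with Python's standard `unicodedata` module, which returns each character's general-category code. Here is a minimal sketch over the seven non-missing values. `unicodedata` has no script or block lookup, so only the category breakdown is reproduced here.

```python
import unicodedata
from collections import Counter

# The seven non-missing values of 평가(실적)연도 (see the sample rows).
years = ["2022(2021)", "2021(2020)", "2020(2019)", "2019(2018)",
         "2018(2017)", "2017(2016)", "2016(2015)"]

# Unicode general-category codes and the long names used in the report.
LONG_NAMES = {"Nd": "Decimal Number", "Ps": "Open Punctuation",
              "Pe": "Close Punctuation"}

# Tally the general category of every character in every value.
categories = Counter(unicodedata.category(ch) for v in years for ch in v)
total = sum(categories.values())
for code, n in categories.most_common():
    print(f"{LONG_NAMES.get(code, code)}\t{n}\t{100 * n / total:.1f}%")
```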

Unique

Unique	7
Unique (%)	100.0%

Sample

1st row	2022(2021)
2nd row	2021(2020)
3rd row	2020(2019)
4th row	2019(2018)
5th row	2018(2017)

Value	Count	Frequency (%)
2022(2021)	1	14.3%
2021(2020)	1	14.3%
2020(2019)	1	14.3%
2019(2018)	1	14.3%
2018(2017)	1	14.3%
2017(2016)	1	14.3%
2016(2015)	1	14.3%

Most occurring characters

Value	Count	Frequency (%)
2	20	28.6%
0	16	22.9%
1	11	15.7%
(	7	10.0%
)	7	10.0%
9	2	2.9%
8	2	2.9%
7	2	2.9%
6	2	2.9%
5	1	1.4%
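The character counts can be recomputed by tallying every character of the seven non-missing values. Percentages use the 70 total characters as the denominator. Here is a minimal sketch.

```python
from collections import Counter

# The seven non-missing values of 평가(실적)연도 (see the sample rows).
years = ["2022(2021)", "2021(2020)", "2020(2019)", "2019(2018)",
         "2018(2017)", "2017(2016)", "2016(2015)"]

# Count each character across all values combined.
chars = Counter("".join(years))
total = sum(chars.values())
print("Distinct characters:", len(chars))
for ch, n in chars.most_common(5):
    print(f"{ch}\t{n}\t{100 * n / total:.1f}%")
```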

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	56	80.0%
Open Punctuation	7	10.0%
Close Punctuation	7	10.0%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
2	20	35.7%
0	16	28.6%
1	11	19.6%
9	2	3.6%
8	2	3.6%
7	2	3.6%
6	2	3.6%
5	1	1.8%

Open Punctuation

Value	Count	Frequency (%)
(	7	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	70	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
2	20	28.6%
0	16	22.9%
1	11	15.7%
(	7	10.0%
)	7	10.0%
9	2	2.9%
8	2	2.9%
7	2	2.9%
6	2	2.9%
5	1	1.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	70	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
2	20	28.6%
0	16	22.9%
1	11	15.7%
(	7	10.0%
)	7	10.0%
9	2	2.9%
8	2	2.9%
7	2	2.9%
6	2	2.9%
5	1	1.4%

평가등급 (evaluation grade)
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	9.4%
Missing	0
Missing (%)	0.0%
Memory size	388.0 B

<NA>	25
나	4
다	3

Length

Max length	4
Median length	4
Mean length	3.34375
Min length	1
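The minimum length of 1 and mean of 3.34375 match the report if the 25 `<NA>` entries are measured as the literal four-character text `<NA>`. This is consistent with this column reporting 0 missing values. Here is a sketch under that assumption.

```python
import statistics

# Assumption: the 25 empty rows are stored as the literal text "<NA>",
# not as true missing values. Grades are taken from the sample rows.
grades = ["나", "다", "나", "나", "다", "다", "나"] + ["<NA>"] * 25

lengths = [len(g) for g in grades]
print(max(lengths), statistics.median(lengths),
      statistics.mean(lengths), min(lengths))
```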

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	나
2nd row	다
3rd row	나
4th row	나
5th row	다

Common Values

Value	Count	Frequency (%)
<NA>	25	78.1%
나	4	12.5%
다	3	9.4%
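In these tables, Distinct counts every different value, while Unique counts values that occur exactly once. No grade appears only once, hence Unique 0. Here is a minimal sketch that rebuilds the counts and percentages from the seven grades in the sample rows. It again assumes `<NA>` is kept as a text value.

```python
from collections import Counter

# Assumption: empty rows hold the literal text "<NA>".
grades = ["나", "다", "나", "나", "다", "다", "나"] + ["<NA>"] * 25
n = len(grades)  # 32 rows

# Frequency table, most common value first.
counts = Counter(grades)
for value, count in counts.most_common():
    print(f"{value}\t{count}\t{100 * count / n:.1f}%")

distinct = len(counts)
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
print(f"Distinct {distinct} ({100 * distinct / n:.1f}%), Unique {unique}")
```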

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
<NA>	25	78.1%
나	4	12.5%
다	3	9.4%

데이터기준일자 (data reference date)
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	6.2%
Missing	0
Missing (%)	0.0%
Memory size	388.0 B

<NA>	25
2023-08-31	7

Length

Max length	10
Median length	4
Mean length	5.3125
Min length	4
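As with 평가등급, the mean length above matches counting each `<NA>` entry as a four-character string and each `2023-08-31` as ten characters. Because 25 of the 32 values have length 4, the median is also 4.

```latex
\bar{\ell} = \frac{25 \cdot 4 + 7 \cdot 10}{32} = \frac{170}{32} = 5.3125
```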

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2023-08-31
2nd row	2023-08-31
3rd row	2023-08-31
4th row	2023-08-31
5th row	2023-08-31

Common Values

Value	Count	Frequency (%)
<NA>	25	78.1%
2023-08-31	7	21.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
<NA>	25	78.1%
2023-08-31	7	21.9%

Correlations

	평가(실적)연도	평가등급
평가(실적)연도	1.000	1.000
평가등급	1.000	1.000


	데이터기준일자	평가등급
데이터기준일자	1.000	1.000
평가등급	1.000	1.000
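Every table here reports 1.000. In this data, a row either has all three fields filled or none, so the filled/empty split of one column lines up exactly with the other. One common association measure for two categorical columns is Cramér's V. The sketch below computes its uncorrected form on the 32 reconstructed rows, treating empty rows (`None`) as their own category. This measure is chosen here for illustration and is not ydata-profiling's own code. Its "auto" setting may use a different or bias-corrected statistic.

```python
import math
from collections import Counter

# Rows reconstructed from the sample tables; None marks an empty row.
dates = ["2023-08-31"] * 7 + [None] * 25
grades = ["나", "다", "나", "나", "다", "다", "나"] + [None] * 25


def cramers_v(xs, ys):
    """Uncorrected Cramér's V; None is treated as an ordinary category."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(row), len(col))  # both columns need at least 2 categories
    return math.sqrt(chi2 / (n * (k - 1)))


print(f"{cramers_v(dates, grades):.3f}")
```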


	평가등급	데이터기준일자
평가등급	1.000	1.000
데이터기준일자	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: a data-dense display of nullity that lets you quickly pick out visual patterns in data completion.
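Both views can be sketched in plain text with the standard library. The sketch assumes the 25 empty rows are missing in all three columns, as they appear in the sample tables. Note that the per-variable sections above count only 평가(실적)연도 as missing. In the matrix, `#` marks a present value and `.` a missing one.

```python
# Rows reconstructed from the sample tables; None marks a missing value.
columns = {
    "평가(실적)연도": ["2022(2021)", "2021(2020)", "2020(2019)", "2019(2018)",
                  "2018(2017)", "2017(2016)", "2016(2015)"] + [None] * 25,
    "평가등급": ["나", "다", "나", "나", "다", "다", "나"] + [None] * 25,
    "데이터기준일자": ["2023-08-31"] * 7 + [None] * 25,
}

# Count view: number of present (non-missing) values per column.
for name, values in columns.items():
    present = sum(v is not None for v in values)
    print(f"{name}: {present}/{len(values)} present")

# Matrix view: one text line per row, one character per column.
n_rows = len(columns["평가등급"])
matrix = [
    "".join("#" if col[i] is not None else "." for col in columns.values())
    for i in range(n_rows)
]
print("\n".join(matrix))
```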

Sample

First rows

	평가(실적)연도	평가등급	데이터기준일자
0	2022(2021)	나	2023-08-31
1	2021(2020)	다	2023-08-31
2	2020(2019)	나	2023-08-31
3	2019(2018)	나	2023-08-31
4	2018(2017)	다	2023-08-31
5	2017(2016)	다	2023-08-31
6	2016(2015)	나	2023-08-31
7	<NA>	<NA>	<NA>
8	<NA>	<NA>	<NA>
9	<NA>	<NA>	<NA>

Last rows

	평가(실적)연도	평가등급	데이터기준일자
22	<NA>	<NA>	<NA>
23	<NA>	<NA>	<NA>
24	<NA>	<NA>	<NA>
25	<NA>	<NA>	<NA>
26	<NA>	<NA>	<NA>
27	<NA>	<NA>	<NA>
28	<NA>	<NA>	<NA>
29	<NA>	<NA>	<NA>
30	<NA>	<NA>	<NA>
31	<NA>	<NA>	<NA>

Duplicate rows

Most frequently occurring

	평가(실적)연도	평가등급	데이터기준일자	# duplicates
0	<NA>	<NA>	<NA>	25
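The alert's single duplicate row and this table's 25 occurrences are consistent with counting distinct repeated rows, each listed with how often it appears. Here is a sketch of that reading, with `None` standing in for `<NA>`.

```python
from collections import Counter

# Rows reconstructed from the sample tables; None stands in for <NA>.
years = ["2022(2021)", "2021(2020)", "2020(2019)", "2019(2018)",
         "2018(2017)", "2017(2016)", "2016(2015)"] + [None] * 25
grades = ["나", "다", "나", "나", "다", "다", "나"] + [None] * 25
dates = ["2023-08-31"] * 7 + [None] * 25

row_counts = Counter(zip(years, grades, dates))

# Keep only rows that occur more than once, with their occurrence counts.
duplicates = {row: n for row, n in row_counts.items() if n > 1}
print("Distinct duplicated rows:", len(duplicates))
for row, n in duplicates.items():
    print(row, "# duplicates:", n)
```

By contrast, pandas' `DataFrame.duplicated()` with its default `keep='first'` would flag 24 of the 25 empty rows, because it marks every occurrence after the first.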
