gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	100
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	2.6 KiB
Average record size in memory	26.3 B
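The headline counts above can be recomputed with standard-library Python. This is a minimal sketch, not ydata-profiling's own code. It assumes the table is held as a list of row tuples and that a missing cell is stored as `None`. The helper name `dataset_stats` is hypothetical, and the sample rows are the first ten records shown under "First rows".

```python
from typing import Sequence


def dataset_stats(columns: Sequence[str], rows: Sequence[tuple]) -> dict:
    """Recompute the report's headline counts (hypothetical helper, not a ydata-profiling API)."""
    n_vars = len(columns)
    n_obs = len(rows)
    # Assumption: a missing cell is stored as None; no other NA markers are recognised.
    missing = sum(cell is None for row in rows for cell in row)
    # A duplicate row is any row identical to an earlier one.
    duplicates = n_obs - len(set(rows))
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": 100.0 * missing / (n_vars * n_obs),
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100.0 * duplicates / n_obs,
    }


columns = ("RID", "Age_grp", "SEX")
# The first ten records from the "First rows" table of this report.
rows = [
    ("R0000001", "70대", 0),
    ("R0000002", "60대", 0),
    ("R0000003", "50대", 0),
    ("R0000004", "50대", 1),
    ("R0000005", "50대", 0),
    ("R0000006", "40대", 1),
    ("R0000007", "60대", 1),
    ("R0000008", "30대", 1),
    ("R0000009", "50대", 0),
    ("R0000010", "60대", 0),
]

for name, value in dataset_stats(columns, rows).items():
    print(f"{name}\t{value}")
```

Running it on all 100 records would be expected to give the same zero counts for missing cells and duplicate rows reported above.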

Variable types

Text	1
Categorical	2

Dataset

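Description	Demographic data for a cohort of patients extracted from all data stored in the hospital information system, selected for the ICD-10 diagnosis codes E10, E11-E14, or O24. Using each patient's age and sex at first diagnosis, the data can be used to analyze characteristics by age group and by sex. SEX: 0 = male, 1 = female.
Author	The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)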
URL	http://cmcdata.net/data/dataset/diabetes_demo
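The cohort's inclusion rule can be read as a filter on ICD-10 code prefixes. This is a minimal sketch under stated assumptions. It takes the last listed code to be ICD-10 O24 (diabetes mellitus in pregnancy) and reads "E11-E14" as the four categories E11, E12, E13 and E14. It also assumes codes are stored as strings such as "E11.9", since the hospital's storage format is not documented here. `is_cohort_code`, `select_cohort` and the patient IDs `P1`-`P3` are hypothetical.

```python
# ICD-10 three-character categories named in the cohort description.
# Assumption: "E11-E14" means E11, E12, E13 and E14; the last code is read as O24.
COHORT_PREFIXES = ("E10", "E11", "E12", "E13", "E14", "O24")


def is_cohort_code(code: str) -> bool:
    """Return True if an ICD-10 code falls under one of the cohort categories."""
    normalized = code.strip().upper().replace(".", "")
    # str.startswith accepts a tuple and matches any of its prefixes.
    return normalized.startswith(COHORT_PREFIXES)


def select_cohort(diagnoses: dict[str, list[str]]) -> set[str]:
    """Keep patient IDs that have at least one qualifying diagnosis code."""
    return {pid for pid, codes in diagnoses.items() if any(is_cohort_code(c) for c in codes)}


# Hypothetical patients: P1 has type 2 diabetes (E11.9), P3 has O24.4.
example = {"P1": ["I10", "E11.9"], "P2": ["J45"], "P3": ["O24.4"]}
print(sorted(select_cohort(example)))
```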

Alerts

RID has unique values (Unique)

Reproduction

Analysis started	2023-10-08 18:57:48.723170
Analysis finished	2023-10-08 18:57:51.114656
Duration	2.39 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

RID
Text

UNIQUE

Distinct	100
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Length

Max length	8
Median length	8
Mean length	8
Min length	8

Characters and Unicode

Total characters	800
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
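Python's standard `unicodedata` module exposes each code point's general category, which is enough to reproduce the "Most occurring categories" counts below. This is a sketch that assumes the RID values run from R0000001 to R0000100, as the first and last rows of the report suggest. `unicodedata` has no script or block lookup, so the script breakdown is not reproduced here. The ASCII block is approximated as code points below 128.

```python
import unicodedata
from collections import Counter

# Assumption: RIDs are "R" followed by a 7-digit zero-padded sequence number.
rids = [f"R{i:07d}" for i in range(1, 101)]

# General category per character: "Nd" = decimal number, "Lu" = uppercase letter.
categories = Counter(unicodedata.category(ch) for rid in rids for ch in rid)

# The Basic Latin (ASCII) block covers code points U+0000 to U+007F.
all_ascii = all(ord(ch) < 128 for rid in rids for ch in rid)

print(categories)  # Counter({'Nd': 700, 'Lu': 100})
print(all_ascii)   # True
```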

Unique

Unique	100
Unique (%)	100.0%

Sample

1st row	R0000001
2nd row	R0000002
3rd row	R0000003
4th row	R0000004
5th row	R0000005

Value	Count	Frequency (%)
r0000001	1	1.0%
r0000063	1	1.0%
r0000074	1	1.0%
r0000073	1	1.0%
r0000072	1	1.0%
r0000071	1	1.0%
r0000070	1	1.0%
r0000069	1	1.0%
r0000068	1	1.0%
r0000067	1	1.0%
Other values (90)	90	90.0%

Most occurring characters

Value	Count	Frequency (%)
0	519	64.9%
R	100	12.5%
1	21	2.6%
3	20	2.5%
4	20	2.5%
5	20	2.5%
6	20	2.5%
7	20	2.5%
8	20	2.5%
9	20	2.5%
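The character frequencies can be checked with `collections.Counter`. This sketch uses the same assumption as above: RIDs are R0000001 through R0000100. The digit 2 is missing from the top-10 table only because ten rows fit. The listed counts sum to 780 of 800 characters, and 2 accounts for the remaining 20.

```python
from collections import Counter

# Assumption: RIDs are "R" followed by a 7-digit zero-padded sequence number.
rids = [f"R{i:07d}" for i in range(1, 101)]

char_counts = Counter(ch for rid in rids for ch in rid)
total = sum(char_counts.values())  # 100 IDs x 8 characters = 800

# Ties (the digits 2-9, 20 each) may be ordered differently from the report.
for ch, n in char_counts.most_common(10):
    print(f"{ch}\t{n}\t{n / total:.1%}")
```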

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	700	87.5%
Uppercase Letter	100	12.5%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	519	74.1%
1	21	3.0%
3	20	2.9%
4	20	2.9%
5	20	2.9%
6	20	2.9%
7	20	2.9%
8	20	2.9%
9	20	2.9%
2	20	2.9%

Uppercase Letter

Value	Count	Frequency (%)
R	100	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	700	87.5%
Latin	100	12.5%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	519	74.1%
1	21	3.0%
3	20	2.9%
4	20	2.9%
5	20	2.9%
6	20	2.9%
7	20	2.9%
8	20	2.9%
9	20	2.9%
2	20	2.9%

Latin

Value	Count	Frequency (%)
R	100	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	800	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	519	64.9%
R	100	12.5%
1	21	2.6%
3	20	2.5%
4	20	2.5%
5	20	2.5%
6	20	2.5%
7	20	2.5%
8	20	2.5%
9	20	2.5%

Age_grp
Categorical

Distinct	7
Distinct (%)	7.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

60대	32
50대	28
70대	17
80대	10
30대	6
Other values (2)	7

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%
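"Distinct" counts how many different values occur, while "Unique" counts values that occur exactly once. That is why Age_grp has 7 distinct values but 0 unique ones, while every RID is both distinct and unique. This is a sketch using the Age_grp frequencies from the Common Values table in this section. The helper name `distinct_and_unique` is hypothetical.

```python
from collections import Counter


def distinct_and_unique(counts: Counter) -> tuple[int, int]:
    """Return (distinct, unique) from a value -> frequency mapping."""
    distinct = len(counts)                               # number of different values
    unique = sum(1 for n in counts.values() if n == 1)   # values seen exactly once
    return distinct, unique


age_grp = Counter({"60대": 32, "50대": 28, "70대": 17, "80대": 10, "30대": 6, "40대": 5, "20대": 2})
# Assumption: RIDs are R0000001 .. R0000100.
rid = Counter(f"R{i:07d}" for i in range(1, 101))

print(distinct_and_unique(age_grp))  # (7, 0)
print(distinct_and_unique(rid))      # (100, 100)
```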

Sample

1st row	70대
2nd row	60대
3rd row	50대
4th row	50대
5th row	50대

Common Values

Value	Count	Frequency (%)
60대	32	32.0%
50대	28	28.0%
70대	17	17.0%
80대	10	10.0%
30대	6	6.0%
40대	5	5.0%
20대	2	2.0%
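Each Frequency (%) value is the count divided by the 100 observations. The labels are Korean decade groups: 60대 means "in one's 60s", that is, ages 60 to 69. This sketch parses the labels into numeric lower bounds so the groups can be listed by age rather than by frequency. `decade_start` is a hypothetical helper. It assumes every label is an integer followed by 대.

```python
age_counts = {"60대": 32, "50대": 28, "70대": 17, "80대": 10, "30대": 6, "40대": 5, "20대": 2}
n_obs = sum(age_counts.values())  # 100 observations


def decade_start(label: str) -> int:
    """Parse a label such as '60대' into its lower age bound, 60 (assumes the '<number>대' form)."""
    return int(label.removesuffix("대"))


for label in sorted(age_counts, key=decade_start):
    lo = decade_start(label)
    share = 100 * age_counts[label] / n_obs
    print(f"{label} ({lo}-{lo + 9})\t{age_counts[label]}\t{share:.1f}%")
```

Sorting by age this way shows that the 50s, 60s and 70s together hold 77 of the 100 patients.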

Length

Histogram of lengths of the category

SEX
Categorical

Distinct	2
Distinct (%)	2.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

1	52
0	48

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	0
2nd row	0
3rd row	0
4th row	1
5th row	0

Common Values

Value	Count	Frequency (%)
1	52	52.0%
0	48	48.0%
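According to the dataset description, SEX is coded 0 for male and 1 for female, so the table corresponds to 52 female and 48 male patients. This is a minimal decoding sketch. The mapping comes from the description, while `SEX_LABELS` and `decode_sex` are hypothetical names. The sample values are the SEX column of the "First rows" table.

```python
from collections import Counter

# Coding documented in the dataset description: 0 = male, 1 = female.
SEX_LABELS = {"0": "male", "1": "female"}


def decode_sex(code) -> str:
    """Translate a raw SEX code (int or str) into a label; raises KeyError on unknown codes."""
    return SEX_LABELS[str(code)]


# SEX values of the first ten rows shown in this report.
first_rows_sex = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
sex_counts = Counter(decode_sex(c) for c in first_rows_sex)
print(sex_counts)
```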

Length

Histogram of lengths of the category

Heatmap
Table

	RID	Age_grp	SEX
RID	1.000	1.000	1.000
Age_grp	1.000	1.000	0.000
SEX	1.000	0.000	1.000

Heatmap
Table

	SEX	Age_grp
SEX	1.000	0.000
Age_grp	0.000	1.000

Heatmap
Table

	Age_grp	SEX
Age_grp	1.000	0.000
SEX	0.000	1.000
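The extraction lost the name of the correlation method behind each table, so the following is an interpretation, not a statement of how the report computed them. With an association measure for categorical data such as Cramér's V, a column in which every value occurs once (RID) determines every other column exactly, so its score saturates at 1. That is why the RID entries in the first table carry little information. This sketch computes uncorrected Cramér's V from a contingency table. ydata-profiling may apply a bias correction, so its values can differ. The toy tables are illustrative, not taken from this dataset.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Independent 2 x 2 table.
print(cramers_v([[5, 5], [5, 5]]))  # 0.0
# ID-like column: each of four IDs is its own row category and fixes the SEX column exactly.
print(cramers_v([[1, 0], [1, 0], [0, 1], [0, 1]]))  # 1.0
```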

Count
Matrix

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
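The Count view shows the number of non-missing values per column, and the nullity matrix marks each missing cell row by row. This is a standard-library sketch of both. It assumes missing cells are stored as `None`, and the helper name `nullity` is hypothetical. This dataset has no missing cells, so the two-row example with a gap is a made-up toy input.

```python
from typing import Optional, Sequence


def nullity(columns: Sequence[str], rows: Sequence[Sequence[Optional[object]]]):
    """Return per-column non-null counts and a row-by-column missingness matrix."""
    matrix = [[cell is None for cell in row] for row in rows]
    counts = {col: sum(not row[j] for row in matrix) for j, col in enumerate(columns)}
    return counts, matrix


# Toy input (not from this dataset): the second row is missing its Age_grp value.
counts, matrix = nullity(("RID", "Age_grp", "SEX"), [("A", "70대", 0), ("B", None, 1)])
print(counts)  # {'RID': 2, 'Age_grp': 1, 'SEX': 2}
print(matrix)  # [[False, False, False], [False, True, False]]
```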

First rows
Last rows

	RID	Age_grp	SEX
0	R0000001	70대	0
1	R0000002	60대	0
2	R0000003	50대	0
3	R0000004	50대	1
4	R0000005	50대	0
5	R0000006	40대	1
6	R0000007	60대	1
7	R0000008	30대	1
8	R0000009	50대	0
9	R0000010	60대	0

	RID	Age_grp	SEX
90	R0000091	40대	0
91	R0000092	70대	0
92	R0000093	60대	1
93	R0000094	70대	1
94	R0000095	70대	0
95	R0000096	60대	1
96	R0000097	70대	0
97	R0000098	70대	1
98	R0000099	60대	1
99	R0000100	50대	1
