gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	100
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	2.6 KiB
Average record size in memory	26.3 B

Variable types

Text	1
Categorical	2

Dataset

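Description	Demographic data for a cohort extracted from the full data stored in the hospital information system, using a query that applies the selection criteria for a hyperlipidemia study. The data records the age and sex of patients at the time they were first prescribed a statin, so characteristics can be analyzed by age group and by sex. SEX: 0 denotes male, 1 denotes female.
Author	Seoul St. Mary's Hospital, The Catholic University of Korea (가톨릭대학교 서울성모병원)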
URL	http://cmcdata.net/data/dataset/demographic-data-dyslipidemia
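The SEX coding in the description (0 is male, 1 is female) and the decade labels in Age_grp (for example 60대, meaning people in their 60s) can be decoded with a small lookup. This is a minimal sketch: the names `SEX_LABELS`, `decode_age_group`, and `decode_row` are hypothetical helpers, not part of the dataset or of ydata-profiling.

```python
# Hypothetical lookup table; only the 0/1 meaning comes from the dataset description.
SEX_LABELS = {"0": "male", "1": "female"}


def decode_age_group(value: str) -> str:
    """Turn a Korean decade label such as '60대' into '60s'; leave other values unchanged."""
    if value.endswith("대"):
        return value[:-1] + "s"
    return value


def decode_row(row: dict) -> dict:
    """Return a copy of one record with SEX and Age_grp made human-readable."""
    return {
        "RID": row["RID"],
        "Age_grp": decode_age_group(row["Age_grp"]),
        "SEX": SEX_LABELS.get(str(row["SEX"]), "unknown"),
    }


# First record shown in the report's sample rows.
sample = {"RID": "R0000001", "Age_grp": "60대", "SEX": "1"}
print(decode_row(sample))
```

Keeping the raw codes in the data and decoding only for display avoids changing the category values that the statistics in this report are based on.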

Alerts

RID has unique values	Unique

Reproduction

Analysis started	2023-10-08 18:55:48.946786
Analysis finished	2023-10-08 18:55:52.024460
Duration	3.08 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
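The reported duration follows directly from the two timestamps above. As a quick check, a sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamp layout used in the "Analysis started/finished" rows.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-10-08 18:55:48.946786", FMT)
finished = datetime.strptime("2023-10-08 18:55:52.024460", FMT)

# Elapsed wall-clock time in seconds, rounded the way the report shows it.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```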

RID
Text

UNIQUE

Distinct	100
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Length

Max length	8
Median length	8
Mean length	8
Min length	8

Characters and Unicode

Total characters	800
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
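The category counts for a text column like RID can be approximated with Python's standard `unicodedata` module. The sketch below uses only the five sample RID values shown in this report, so its totals are smaller than the report's; the function name `category_counts` is illustrative. The standard library exposes general categories (such as `Lu` for Uppercase Letter and `Nd` for Decimal Number) but not scripts or blocks, so those are not covered here.

```python
import unicodedata
from collections import Counter

# The five sample RID values listed in the report.
rids = ["R0000001", "R0000002", "R0000004", "R0000010", "R0000015"]


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(rids)
for category, count in counts.most_common():
    print(category, count)
```

Each RID consists of one uppercase letter followed by seven digits, which matches the 12.5% / 87.5% split between Uppercase Letter and Decimal Number reported for the full column.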

Unique

Unique	100
Unique (%)	100.0%

Sample

1st row	R0000001
2nd row	R0000002
3rd row	R0000004
4th row	R0000010
5th row	R0000015

Value	Count	Frequency (%)
r0000001	1	1.0%
r0000204	1	1.0%
r0000230	1	1.0%
r0000226	1	1.0%
r0000225	1	1.0%
r0000222	1	1.0%
r0000219	1	1.0%
r0000210	1	1.0%
r0000209	1	1.0%
r0000208	1	1.0%
Other values (90)	90	90.0%

Most occurring characters

Value	Count	Frequency (%)
0	454	56.8%
R	100	12.5%
2	58	7.2%
1	51	6.4%
5	26	3.2%
3	24	3.0%
6	21	2.6%
8	18	2.2%
4	16	2.0%
7	16	2.0%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	700	87.5%
Uppercase Letter	100	12.5%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	454	64.9%
2	58	8.3%
1	51	7.3%
5	26	3.7%
3	24	3.4%
6	21	3.0%
8	18	2.6%
4	16	2.3%
7	16	2.3%
9	16	2.3%

Uppercase Letter

Value	Count	Frequency (%)
R	100	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	700	87.5%
Latin	100	12.5%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	454	64.9%
2	58	8.3%
1	51	7.3%
5	26	3.7%
3	24	3.4%
6	21	3.0%
8	18	2.6%
4	16	2.3%
7	16	2.3%
9	16	2.3%

Latin

Value	Count	Frequency (%)
R	100	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	800	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	454	56.8%
R	100	12.5%
2	58	7.2%
1	51	6.4%
5	26	3.2%
3	24	3.0%
6	21	2.6%
8	18	2.2%
4	16	2.0%
7	16	2.0%

Age_grp
Categorical

Distinct	7
Distinct (%)	7.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

60대	31
50대	24
40대	20
70대	17
30대	4
Other values (2)	4

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	60대
2nd row	40대
3rd row	60대
4th row	60대
5th row	40대

Common Values

Value	Count	Frequency (%)
60대	31	31.0%
50대	24	24.0%
40대	20	20.0%
70대	17	17.0%
30대	4	4.0%
20대	2	2.0%
80대	2	2.0%
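The Frequency (%) column is each count divided by the number of observations. A small sketch using the counts from the table above (the dictionary and variable names are illustrative):

```python
# Counts copied from the Age_grp "Common Values" table.
age_counts = {
    "60대": 31,
    "50대": 24,
    "40대": 20,
    "70대": 17,
    "30대": 4,
    "20대": 2,
    "80대": 2,
}

total = sum(age_counts.values())          # number of observations
distinct = len(age_counts)                # number of distinct categories
frequencies = {k: 100 * v / total for k, v in age_counts.items()}

for label, pct in frequencies.items():
    print(f"{label}\t{age_counts[label]}\t{pct:.1f}%")
print(f"Distinct (%)\t{100 * distinct / total:.1f}%")
```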

Length

Histogram of lengths of the category


SEX
Categorical

Distinct	2
Distinct (%)	2.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

1	57
0	43

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	0
3rd row	0
4th row	0
5th row	0

Common Values

Value	Count	Frequency (%)
1	57	57.0%
0	43	43.0%

Length

Histogram of lengths of the category


Correlations

	RID	Age_grp	SEX
RID	1.000	1.000	1.000
Age_grp	1.000	1.000	0.239
SEX	1.000	0.239	1.000


	Age_grp	SEX
Age_grp	1.000	0.248
SEX	0.248	1.000


	Age_grp	SEX
Age_grp	1.000	0.248
SEX	0.248	1.000
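The report does not name the association measure behind these tables. For two categorical variables such as Age_grp and SEX, one common choice is Cramér's V. The sketch below assumes that measure purely for illustration and uses made-up contingency tables, because the report does not include the Age_grp by SEX cross-tabulation. It uses the uncorrected formula and is not expected to reproduce the 0.239 or 0.248 shown above.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical tables, not taken from the dataset.
perfect = [[10, 0], [0, 10]]      # categories fully determine each other
independent = [[10, 10], [10, 10]]  # no association at all

print(cramers_v(perfect))
print(cramers_v(independent))
```

Values near 0 indicate little association and values near 1 indicate strong association, which is how a figure such as 0.248 between Age_grp and SEX would be read under this assumption.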

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
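The per-column nullity count is simply the number of empty cells in each column. A minimal sketch on the first three sample rows of this dataset, treating `None` and the empty string as missing (that definition and the helper name `missing_per_column` are assumptions made for illustration):

```python
# First three records from the report's sample rows.
rows = [
    {"RID": "R0000001", "Age_grp": "60대", "SEX": "1"},
    {"RID": "R0000002", "Age_grp": "40대", "SEX": "0"},
    {"RID": "R0000004", "Age_grp": "60대", "SEX": "0"},
]


def missing_per_column(records):
    """Count cells that are None or an empty string, per column."""
    columns = list(records[0].keys())
    return {
        col: sum(1 for rec in records if rec.get(col) in (None, ""))
        for col in columns
    }


missing = missing_per_column(rows)
print(missing)
```

For this dataset every count is zero, consistent with the "Missing cells 0" figure in the dataset statistics.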

First rows

	RID	Age_grp	SEX
0	R0000001	60대	1
1	R0000002	40대	0
2	R0000004	60대	0
3	R0000010	60대	0
4	R0000015	40대	0
5	R0000016	40대	0
6	R0000018	50대	1
7	R0000021	70대	1
8	R0000022	70대	0
9	R0000030	40대	1

Last rows

	RID	Age_grp	SEX
90	R0000265	50대	0
91	R0000266	40대	0
92	R0000268	40대	1
93	R0000276	50대	1
94	R0000280	60대	0
95	R0000281	50대	1
96	R0000283	70대	0
97	R0000284	50대	1
98	R0000285	20대	0
99	R0000287	60대	1
