Overview

Dataset statistics

Number of variables: 3
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.6 KiB
Average record size in memory: 26.3 B
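
The missing-cell and duplicate-row figures can be recomputed from raw rows. The sketch below is a minimal illustration that uses only the first ten rows listed in the Sample section, so its row count differs from the full data set. The helper name `dataset_statistics` is hypothetical, and both the missing-cell rule (None or empty string) and the duplicate rule (every repeat after the first occurrence) are assumptions that may differ from the profiling tool's own definitions.

```python
from collections import Counter

# First ten rows from the Sample section: (RID, Age_grp, SEX).
rows = [
    ("R0000001", "70대", "0"),
    ("R0000002", "60대", "0"),
    ("R0000003", "50대", "0"),
    ("R0000004", "50대", "1"),
    ("R0000005", "50대", "0"),
    ("R0000006", "40대", "1"),
    ("R0000007", "60대", "1"),
    ("R0000008", "30대", "1"),
    ("R0000009", "50대", "0"),
    ("R0000010", "60대", "0"),
]


def dataset_statistics(rows):
    """Return basic counts for a list of equal-length row tuples."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_cells = n_rows * n_cols
    # Assumption: a cell is missing when it is None or an empty string.
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    # Assumption: each repeat of an already-seen row counts as one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


stats = dataset_statistics(rows)
print(stats)
```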

Variable types

Text: 1
Categorical: 2

Dataset

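Description: Demographic data for a cohort of patients extracted from the full data stored in the hospital information system, selected by the ICD-10 diagnosis codes E10, E11 to E14, and O24. The patients' age and sex at the time of first diagnosis can be used to analyze characteristics by age group and by sex. SEX is coded as 0 = male, 1 = female.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/diabetes_demo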
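
The cohort rule and the SEX coding in the description can be sketched as two small helpers. This is an illustration under stated assumptions, not the hospital's extraction logic: the last code in the description is read as ICD-10 O24 (letter O), codes are matched by prefix so that subcodes such as E11.9 are included, and the names `in_cohort` and `sex_label` are hypothetical.

```python
# Code prefixes named in the dataset description (E10, E11 to E14, O24).
# Assumption: the last code is O24 with the letter O, not the digit zero.
COHORT_PREFIXES = ("E10", "E11", "E12", "E13", "E14", "O24")

# SEX coding from the dataset description: 0 = male, 1 = female.
SEX_LABELS = {"0": "male", "1": "female"}


def in_cohort(icd10_code):
    """Return True when the code starts with one of the cohort prefixes."""
    normalized = icd10_code.strip().upper().replace(".", "")
    return normalized.startswith(COHORT_PREFIXES)


def sex_label(code):
    """Map the SEX code (int or str) to a readable label."""
    return SEX_LABELS.get(str(code).strip(), "unknown")


print(in_cohort("E11.9"), in_cohort("I10"))
print(sex_label(0), sex_label("1"))
```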

Alerts

RID has unique values (Unique)

Reproduction

Analysis started: 2023-10-08 18:57:48.723170
Analysis finished: 2023-10-08 18:57:51.114656
Duration: 2.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
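
A report of this kind can be regenerated with ydata-profiling once the source table is available. The following is a minimal sketch, not the exact invocation used here: the function name `build_report`, the file names, and the report title are hypothetical, and reading every column as text (`dtype=str`) is an assumption about how the data was loaded. The configuration recorded in config.json is not reproduced.

```python
def build_report(csv_path, output_path="diabetes_demo_report.html"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module loads even without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: all columns (RID, Age_grp, SEX) are read as text.
    df = pd.read_csv(csv_path, dtype=str)

    # Hypothetical title; default profiling settings are used.
    profile = ProfileReport(df, title="Diabetes cohort demographics")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("diabetes_demo.csv")` with a local copy of the data (hypothetical file name) would write the HTML report next to the script. Timestamps and duration will differ from the values listed above.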

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
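
The character statistics for RID can be checked with Python's standard `unicodedata` module. The sketch below assumes the 100 identifiers run contiguously from R0000001 to R0000100, which the first and last sample rows suggest but the report does not state outright. Script information is not exposed by `unicodedata`, so only characters, general categories, and the ASCII block are counted.

```python
import unicodedata
from collections import Counter

# Assumption: identifiers are R0000001 ... R0000100 with no gaps.
rids = [f"R{i:07d}" for i in range(1, 101)]
text = "".join(rids)

# Count individual characters and their Unicode general categories.
char_counts = Counter(text)
category_counts = Counter(unicodedata.category(ch) for ch in text)

# Characters in the ASCII block have code points below 128.
ascii_count = sum(1 for ch in text if ord(ch) < 128)

print("total characters:", len(text))
print("distinct characters:", len(char_counts))
print("categories:", dict(category_counts))  # 'Nd' = Decimal Number, 'Lu' = Uppercase Letter
print("ASCII characters:", ascii_count)
```

Under that assumption the counts agree with the figures in this section: 800 characters, 11 distinct characters, 700 decimal digits, and 100 uppercase letters.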

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000002
3rd row: R0000003
4th row: R0000004
5th row: R0000005

Value              Count  Frequency (%)
r0000001           1      1.0%
r0000063           1      1.0%
r0000074           1      1.0%
r0000073           1      1.0%
r0000072           1      1.0%
r0000071           1      1.0%
r0000070           1      1.0%
r0000069           1      1.0%
r0000068           1      1.0%
r0000067           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

Age_grp
Categorical

Distinct: 7
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 70대
2nd row: 60대
3rd row: 50대
4th row: 50대
5th row: 50대

Common Values

Values are age decades at first diagnosis (대 = decade; for example, 60대 = 60s).

Value  Count  Frequency (%)
60대   32     32.0%
50대   28     28.0%
70대   17     17.0%
80대   10     10.0%
30대   6      6.0%
40대   5      5.0%
20대   2      2.0%
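
The frequency column is simply each count divided by the number of observations. The sketch below recomputes it from the Age_grp counts listed above and also reorders the groups by decade, since the table is sorted by count. The variable names are illustrative only.

```python
# Counts copied from the Common Values table for Age_grp.
age_counts = {
    "60대": 32,
    "50대": 28,
    "70대": 17,
    "80대": 10,
    "30대": 6,
    "40대": 5,
    "20대": 2,
}

total = sum(age_counts.values())

# Frequency (%) per group and the share of distinct values.
frequency_pct = {group: round(100 * count / total, 1) for group, count in age_counts.items()}
distinct_pct = round(100 * len(age_counts) / total, 1)

# Sort by decade instead of by count; group[:-1] drops the trailing 대.
by_decade = sorted(age_counts, key=lambda group: int(group[:-1]))

# Combined share of the three largest groups (50s, 60s, 70s).
share_50_to_79 = round(100 * sum(age_counts[g] for g in ("50대", "60대", "70대")) / total, 1)

print(total, frequency_pct, distinct_pct)
print(by_decade, share_50_to_79)
```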

Length

Histogram of lengths of the category

Common Values (Plot)

[Bar chart of the Age_grp value counts; the counts are those listed under Common Values.]

SEX
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
1      52     52.0%
0      48     48.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Bar chart of the SEX value counts; the counts are those listed under Common Values.]

Correlations

         RID    Age_grp  SEX
RID      1.000  1.000    1.000
Age_grp  1.000  1.000    0.000
SEX      1.000  0.000    1.000

         SEX    Age_grp
SEX      1.000  0.000
Age_grp  0.000  1.000

         Age_grp  SEX
Age_grp  1.000    0.000
SEX      0.000    1.000
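
The report does not state which association measure produced these matrices. As an illustration only, the sketch below computes the plain (uncorrected) Cramér's V, a common measure for pairs of categorical variables, on the first ten sample rows. The function name `cramers_v` is hypothetical, and the values it returns on ten rows are not expected to match the matrices above, which cover all 100 rows.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected

    dof = min(len(row_totals), len(col_totals)) - 1
    if dof == 0:
        # A constant column carries no association information.
        return float("nan")
    return sqrt(chi2 / (n * dof))


# First ten rows from the Sample section.
rid = [f"R{i:07d}" for i in range(1, 11)]
age = ["70대", "60대", "50대", "50대", "50대", "40대", "60대", "30대", "50대", "60대"]
sex = ["0", "0", "0", "1", "0", "1", "1", "1", "0", "0"]

print(cramers_v(rid, sex))  # unique identifier against SEX
print(cramers_v(age, sex))  # Age_grp against SEX
```

A column of unique identifiers is perfectly associated with any other column under this measure, which is consistent with the 1.000 entries in the RID row and is why such values say nothing about the patients themselves.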

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    RID       Age_grp  SEX
0   R0000001  70대     0
1   R0000002  60대     0
2   R0000003  50대     0
3   R0000004  50대     1
4   R0000005  50대     0
5   R0000006  40대     1
6   R0000007  60대     1
7   R0000008  30대     1
8   R0000009  50대     0
9   R0000010  60대     0

Last rows

    RID       Age_grp  SEX
90  R0000091  40대     0
91  R0000092  70대     0
92  R0000093  60대     1
93  R0000094  70대     1
94  R0000095  70대     0
95  R0000096  60대     1
96  R0000097  70대     0
97  R0000098  70대     1
98  R0000099  60대     1
99  R0000100  50대     1