Overview

Dataset statistics

Number of variables: 3
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.6 KiB
Average record size in memory: 26.3 B

Variable types

Text: 1
Categorical: 2

Dataset

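Description: Demographic data for a cohort extracted from the full data stored in the hospital information system, consisting of patients with the ICD-10 diagnosis codes F101, F102, F103, F104, or F109 and patients with the diagnosis codes K700, K701, K703, K7030, K7031, K7041, or K709. The patients' age and sex at the time of their first prescription can be used to analyze characteristics by age group and by sex. SEX: 0 denotes male, 1 denotes female.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/demographic-data-alcohol-use-disorder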

Alerts

SEX is highly imbalanced (53.1%) [Imbalance]
RID has unique values [Unique]
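
The 53.1% figure for SEX is consistent with an entropy-based imbalance score: one minus the Shannon entropy of the class distribution divided by its maximum, log2 of the number of classes. The sketch below checks that arithmetic against the SEX counts given in this report (90 and 10). The function name `imbalance_score` is illustrative, and treating this as the exact formula used by the profiling tool is an assumption.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced; the score approaches 1 as one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is not scored here
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(k)

sex_counts = [90, 10]  # SEX = 0 and SEX = 1, as reported for this dataset
score = imbalance_score(sex_counts)
print(f"{score:.1%}")  # prints 53.1%
```

A perfectly balanced column (for example 50 and 50) would score 0 under this definition.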

Reproduction

Analysis started: 2023-10-08 18:56:20.154028
Analysis finished: 2023-10-08 18:56:20.562791
Duration: 0.41 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
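
As a rough illustration of how such character statistics can be derived, the sketch below counts characters and Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample RID values listed in this report, so the totals are illustrative rather than the report's full figures. Script and block properties are not exposed by the standard library and are left out.

```python
import unicodedata
from collections import Counter

# The five sample RID values shown in this report.
rids = ["R0000002", "R0000003", "R0000004", "R0000006", "R0000008"]

# Frequency of each individual character across all values.
char_counts = Counter(ch for rid in rids for ch in rid)

# unicodedata.category returns two-letter general categories,
# e.g. "Lu" (Uppercase Letter) and "Nd" (Decimal Number).
category_counts = Counter(unicodedata.category(ch) for rid in rids for ch in rid)

print(char_counts.most_common(2))  # [('0', 30), ('R', 5)]
print(dict(category_counts))       # {'Lu': 5, 'Nd': 35}
```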

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000002
2nd row: R0000003
3rd row: R0000004
4th row: R0000006
5th row: R0000008
Value              Count  Frequency (%)
r0000002           1      1.0%
r0000109           1      1.0%
r0000133           1      1.0%
r0000129           1      1.0%
r0000128           1      1.0%
r0000125           1      1.0%
r0000122           1      1.0%
r0000118           1      1.0%
r0000116           1      1.0%
r0000114           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      479    59.9%
R      100    12.5%
1      65     8.1%
6      24     3.0%
2      22     2.8%
5      21     2.6%
4      21     2.6%
3      21     2.6%
7      17     2.1%
9      16     2.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      479    68.4%
1      65     9.3%
6      24     3.4%
2      22     3.1%
5      21     3.0%
4      21     3.0%
3      21     3.0%
7      17     2.4%
9      16     2.3%
8      14     2.0%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      479    68.4%
1      65     9.3%
6      24     3.4%
2      22     3.1%
5      21     3.0%
4      21     3.0%
3      21     3.0%
7      17     2.4%
9      16     2.3%
8      14     2.0%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      479    59.9%
R      100    12.5%
1      65     8.1%
6      24     3.0%
2      22     2.8%
5      21     2.6%
4      21     2.6%
3      21     2.6%
7      17     2.1%
9      16     2.0%

Age_grp
Categorical

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 1
Unique (%): 1.0%
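
Distinct and Unique measure different things. Judging by the figures in this report, Distinct is the number of different values, while Unique is the number of values that occur exactly once (here only 10대). The sketch below recomputes both from the Age_grp counts listed under Common Values; the variable names are illustrative, and this reading of the two metrics is inferred from the numbers rather than stated by the report.

```python
from collections import Counter

# Age_grp value counts as listed under Common Values in this report.
age_counts = Counter({"50대": 35, "40대": 25, "60대": 18, "30대": 13, "70대": 8, "10대": 1})

n_rows = sum(age_counts.values())                          # total observations
n_distinct = len(age_counts)                               # different values
n_unique = sum(1 for c in age_counts.values() if c == 1)   # values seen exactly once

print(n_rows, n_distinct, n_unique)                              # 100 6 1
print(f"{n_distinct / n_rows:.1%}", f"{n_unique / n_rows:.1%}")  # 6.0% 1.0%
```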

Sample

1st row: 60대 (60s)
2nd row: 50대 (50s)
3rd row: 70대 (70s)
4th row: 30대 (30s)
5th row: 50대 (50s)

Common Values

Value       Count  Frequency (%)
50대 (50s)  35     35.0%
40대 (40s)  25     25.0%
60대 (60s)  18     18.0%
30대 (30s)  13     13.0%
70대 (70s)  8      8.0%
10대 (10s)  1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Bar chart: frequency of each Age_grp value]

SEX
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      90     90.0%
1      10     10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Bar chart: frequency of each SEX value]

Correlations

         RID    Age_grp  SEX
RID      1.000  1.000    1.000
Age_grp  1.000  1.000    0.123
SEX      1.000  0.123    1.000

         Age_grp  SEX
Age_grp  1.000    0.084
SEX      0.084    1.000

         Age_grp  SEX
Age_grp  1.000    0.084
SEX      0.084    1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

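    RID       Age_grp     SEX
0   R0000002  60대 (60s)  0
1   R0000003  50대 (50s)  1
2   R0000004  70대 (70s)  0
3   R0000006  30대 (30s)  1
4   R0000008  50대 (50s)  0
5   R0000010  30대 (30s)  0
6   R0000016  30대 (30s)  0
7   R0000019  50대 (50s)  0
8   R0000020  50대 (50s)  0
9   R0000022  40대 (40s)  1

    RID       Age_grp     SEX
90  R0000163  70대 (70s)  1
91  R0000164  60대 (60s)  0
92  R0000166  50대 (50s)  0
93  R0000171  40대 (40s)  1
94  R0000172  60대 (60s)  0
95  R0000173  50대 (50s)  0
96  R0000175  40대 (40s)  0
97  R0000176  60대 (60s)  0
98  R0000178  40대 (40s)  0
99  R0000181  30대 (30s)  0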
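
The dataset description suggests analyzing characteristics by age group and by sex. As a rough sketch of that kind of tabulation, the snippet below cross-tabulates Age_grp against SEX for the 20 sample rows shown here only, not the full 100-row dataset, so the counts are illustrative. The variable names are hypothetical.

```python
from collections import Counter

# (Age_grp, SEX) pairs copied from the 20 sample rows; SEX: 0 = male, 1 = female.
rows = [
    ("60대", 0), ("50대", 1), ("70대", 0), ("30대", 1), ("50대", 0),
    ("30대", 0), ("30대", 0), ("50대", 0), ("50대", 0), ("40대", 1),
    ("70대", 1), ("60대", 0), ("50대", 0), ("40대", 1), ("60대", 0),
    ("50대", 0), ("40대", 0), ("60대", 0), ("40대", 0), ("30대", 0),
]

# Count each (age group, sex) combination; missing combinations read as 0.
crosstab = Counter(rows)

for age in sorted({a for a, _ in rows}):
    print(age, "male:", crosstab[(age, 0)], "female:", crosstab[(age, 1)])
```

A `Counter` keyed by the pair keeps the example dependency-free; with a data-frame library the same table would typically come from a built-in cross-tabulation function.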