Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 100 |
Missing cells | 312 |
Missing cells (%) | 28.4% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 9.1 KiB |
Average record size in memory | 93.3 B |
Variable types
Text | 2 |
---|---|
Categorical | 5 |
DateTime | 4 |
Dataset
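Description | Data on the initial diagnosing department and the diagnosis names and diagnosis codes of various coexisting diseases in patients with hyperlipidemia. Diagnoses include hypertensive diseases, ischemic heart disease, disorders of bone density and structure, and neoplasms. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes. Presence of a diagnosis code for each disease is coded as 0 = No, 1 = Yes. |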
---|---|
Author | The Catholic University of Korea Eunpyeong St. Mary's Hospital (가톨릭대학교 은평성모병원) |
URL | http://cmcdata.net/data/dataset/coexistence-disease-data-dyslipidemia-eunpyeong |
osteo_dig is highly overall correlated with osteo_cd | High correlation |
osteo_cd is highly overall correlated with osteo_dig | High correlation |
osteo_cd is highly imbalanced (56.7%) | Imbalance |
Hypt_date has 19 (19.0%) missing values | Missing |
Heart_date has 51 (51.0%) missing values | Missing |
cancer_cd has 85 (85.0%) missing values | Missing |
cancer_date has 85 (85.0%) missing values | Missing |
osteo_date has 72 (72.0%) missing values | Missing |
RID has unique values | Unique |
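The missing-cell total in the dataset statistics can be cross-checked against the per-variable Missing alerts above. The following is a minimal Python sketch of that arithmetic, with all counts copied from this report; the variable names are illustrative.

```python
# Per-variable missing counts, copied from the Missing alerts in this report.
missing_by_variable = {
    "Hypt_date": 19,
    "Heart_date": 51,
    "cancer_cd": 85,
    "cancer_date": 85,
    "osteo_date": 72,
}

n_variables = 11
n_observations = 100

# Total missing cells and their share of all cells (variables x observations).
missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)

print(missing_cells)          # 312
print(f"{missing_pct:.1f}%")  # 28.4%
```

The five alerts account for all 312 missing cells, so no other variable contributes missing values.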
Reproduction
Analysis started | 2023-10-08 18:57:24.157582 |
---|---|
Analysis finished | 2023-10-08 18:57:25.833129 |
Duration | 1.68 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
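As a small sanity check, the reported duration follows directly from the two timestamps above. This sketch uses only the Python standard library and the values copied from this report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2023-10-08 18:57:24.157582")
finished = datetime.fromisoformat("2023-10-08 18:57:25.833129")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # 1.68 seconds
```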
RID
Text
UNIQUE
 
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
r0000001 | 1 | 1.0% |
r0000063 | 1 | 1.0% |
r0000074 | 1 | 1.0% |
r0000073 | 1 | 1.0% |
r0000072 | 1 | 1.0% |
r0000071 | 1 | 1.0% |
r0000070 | 1 | 1.0% |
r0000069 | 1 | 1.0% |
r0000068 | 1 | 1.0% |
r0000067 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 519 | 64.9% |
R | 100 | 12.5% |
1 | 21 | 2.6% |
3 | 20 | 2.5% |
4 | 20 | 2.5% |
5 | 20 | 2.5% |
6 | 20 | 2.5% |
7 | 20 | 2.5% |
8 | 20 | 2.5% |
9 | 20 | 2.5% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 700 | 87.5% |
Uppercase Letter | 100 | 12.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 519 | 74.1% |
1 | 21 | 3.0% |
3 | 20 | 2.9% |
4 | 20 | 2.9% |
5 | 20 | 2.9% |
6 | 20 | 2.9% |
7 | 20 | 2.9% |
8 | 20 | 2.9% |
9 | 20 | 2.9% |
2 | 20 | 2.9% |
Uppercase Letter
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 700 | 87.5% |
Latin | 100 | 12.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 519 | 74.1% |
1 | 21 | 3.0% |
3 | 20 | 2.9% |
4 | 20 | 2.9% |
5 | 20 | 2.9% |
6 | 20 | 2.9% |
7 | 20 | 2.9% |
8 | 20 | 2.9% |
9 | 20 | 2.9% |
2 | 20 | 2.9% |
Latin
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 800 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 519 | 64.9% |
R | 100 | 12.5% |
1 | 21 | 2.6% |
3 | 20 | 2.5% |
4 | 20 | 2.5% |
5 | 20 | 2.5% |
6 | 20 | 2.5% |
7 | 20 | 2.5% |
8 | 20 | 2.5% |
9 | 20 | 2.5% |
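The character counts for RID can be reproduced from the identifier pattern alone. The sketch below assumes the 100 identifiers are `R` followed by a 7-digit zero-padded counter running from `R0000001` to `R0000100`, as the sample rows suggest; that pattern is an assumption, not something the report states.

```python
from collections import Counter

# Assumption: RID values are "R" + a 7-digit zero-padded counter, 1 through 100.
rids = [f"R{i:07d}" for i in range(1, 101)]

# Count every character across all identifiers.
char_counts = Counter("".join(rids))
total_chars = sum(char_counts.values())

for char, count in char_counts.most_common(3):
    print(char, count, f"{100 * count / total_chars:.1f}%")
# 0 519 64.9%
# R 100 12.5%
# 1 21 2.6%
```

Under that assumption the totals match the tables above: 800 characters overall, of which 700 are digits and 100 are the letter `R`.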
Hypt_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1 | 81 |
---|---|
0 | 19 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 81 | 81.0% |
0 | 19 | 19.0% |
Hypt_date
Date
MISSING
 
Distinct | 73 |
---|---|
Distinct (%) | 90.1% |
Missing | 19 |
Missing (%) | 19.0% |
Memory size | 932.0 B |
Minimum | 2001-08-01 00:00:00 |
---|---|
Maximum | 2018-06-07 00:00:00 |
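For variables with missing values, the reported Distinct (%) figures are consistent with dividing the distinct count by the number of non-missing rows rather than by all 100 observations. Here is a minimal Python sketch of that arithmetic, using figures copied from this report; the helper name `distinct_pct` is illustrative and not part of ydata-profiling.

```python
def distinct_pct(n_distinct: int, n_rows: int, n_missing: int) -> float:
    """Share of distinct values among the non-missing rows, in percent."""
    return 100 * n_distinct / (n_rows - n_missing)

# (distinct, missing) pairs copied from this report; every variable has 100 rows.
reported = {
    "Hypt_date": (73, 19),   # reported as 90.1%
    "Heart_date": (48, 51),  # reported as 98.0%
    "cancer_cd": (11, 85),   # reported as 73.3%
    "osteo_date": (24, 72),  # reported as 85.7%
}

for name, (n_distinct, n_missing) in reported.items():
    print(name, f"{distinct_pct(n_distinct, 100, n_missing):.1f}%")
```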
Heart_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 51 |
---|---|
1 | 49 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 0 |
3rd row | 1 |
4th row | 1 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 51 | 51.0% |
1 | 49 | 49.0% |
Heart_date
Date
MISSING
 
Distinct | 48 |
---|---|
Distinct (%) | 98.0% |
Missing | 51 |
Missing (%) | 51.0% |
Memory size | 932.0 B |
Minimum | 2001-03-07 00:00:00 |
---|---|
Maximum | 2017-12-11 00:00:00 |
cancer_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 85 |
---|---|
1 | 15 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 1 |
4th row | 0 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
0 | 85 | 85.0% |
1 | 15 | 15.0% |
cancer_cd
Text
MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 73.3% |
Missing | 85 |
Missing (%) | 85.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
c3499 | 4 | 26.7% |
c649 | 2 | 13.3% |
c220 | 1 | 6.7% |
c1680 | 1 | 6.7% |
c169 | 1 | 6.7% |
c569 | 1 | 6.7% |
c61 | 1 | 6.7% |
c1699 | 1 | 6.7% |
c679 | 1 | 6.7% |
c187 | 1 | 6.7% |
Most occurring characters
Value | Count | Frequency (%) |
C | 15 | 23.1% |
9 | 15 | 23.1% |
6 | 9 | 13.8% |
4 | 6 | 9.2% |
1 | 5 | 7.7% |
3 | 4 | 6.2% |
0 | 3 | 4.6% |
7 | 3 | 4.6% |
2 | 2 | 3.1% |
8 | 2 | 3.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 50 | 76.9% |
Uppercase Letter | 15 | 23.1% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
9 | 15 | 30.0% |
6 | 9 | 18.0% |
4 | 6 | 12.0% |
1 | 5 | 10.0% |
3 | 4 | 8.0% |
0 | 3 | 6.0% |
7 | 3 | 6.0% |
2 | 2 | 4.0% |
8 | 2 | 4.0% |
5 | 1 | 2.0% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 15 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 50 | 76.9% |
Latin | 15 | 23.1% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
9 | 15 | 30.0% |
6 | 9 | 18.0% |
4 | 6 | 12.0% |
1 | 5 | 10.0% |
3 | 4 | 8.0% |
0 | 3 | 6.0% |
7 | 3 | 6.0% |
2 | 2 | 4.0% |
8 | 2 | 4.0% |
5 | 1 | 2.0% |
Latin
Value | Count | Frequency (%) |
C | 15 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 65 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
C | 15 | 23.1% |
9 | 15 | 23.1% |
6 | 9 | 13.8% |
4 | 6 | 9.2% |
1 | 5 | 7.7% |
3 | 4 | 6.2% |
0 | 3 | 4.6% |
7 | 3 | 4.6% |
2 | 2 | 3.1% |
8 | 2 | 3.1% |
cancer_date
Date
MISSING
 
Distinct | 15 |
---|---|
Distinct (%) | 100.0% |
Missing | 85 |
Missing (%) | 85.0% |
Memory size | 932.0 B |
Minimum | 2002-12-16 00:00:00 |
---|---|
Maximum | 2019-06-11 00:00:00 |
osteo_dig
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 72 |
---|---|
1 | 28 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 72 | 72.0% |
1 | 28 | 28.0% |
osteo_cd
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 13 |
---|---|
Distinct (%) | 13.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
<NA> | 72 |
---|---|
M8219.44E.00 | 14 |
M8198.44E.00 | 3 |
M8199.000.01 | 2 |
M8283.000.00 | 1 |
Other values (8) | 8 |
Length
Max length | 12 |
---|---|
Median length | 4 |
Mean length | 6.24 |
Min length | 4 |
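These length statistics are consistent with the 72 `<NA>` entries being stored as literal 4-character strings, which would also explain why osteo_cd reports 0 missing values, and with the remaining 28 entries being 12-character codes. The sketch below checks that reading with Python's standard library; the composition of the list is an assumption based on the counts in this report.

```python
from statistics import mean, median

# Assumption: 72 entries are the literal string "<NA>" and the other 28 are
# 12-character codes such as "M8219.44E.00" (counts taken from this report).
values = ["<NA>"] * 72 + ["M8219.44E.00"] * 28
lengths = [len(v) for v in values]

print(min(lengths), max(lengths))  # 4 12
print(median(lengths))             # 4.0
print(mean(lengths))               # 6.24
```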
Unique
Unique | 9 |
---|---|
Unique (%) | 9.0% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 72 | 72.0% |
M8219.44E.00 | 14 | 14.0% |
M8198.44E.00 | 3 | 3.0% |
M8199.000.01 | 2 | 2.0% |
M8283.000.00 | 1 | 1.0% |
M8190.44E.00 | 1 | 1.0% |
M8100.44E.00 | 1 | 1.0% |
M8105.44E.00 | 1 | 1.0% |
M8188.44E.00 | 1 | 1.0% |
M8050.000.00 | 1 | 1.0% |
Other values (3) | 3 | 3.0% |
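The 56.7% imbalance figure reported for osteo_cd is consistent with one minus the normalized Shannon entropy of these value counts. The following sketch assumes that definition; the function name `imbalance_score` is illustrative. Because "Other values (3)" covers three distinct values with a combined count of 3, each of them occurs once.

```python
from math import log2
from typing import Sequence

def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes (0 = perfectly balanced)."""
    if len(counts) < 2:
        return 0.0  # a single class has no entropy to normalize
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(len(counts))

# Counts from the Common Values table: "<NA>", three repeated codes, and nine
# codes that occur once (six listed individually plus "Other values (3)").
osteo_cd_counts = [72, 14, 3, 2] + [1] * 9

print(f"{100 * imbalance_score(osteo_cd_counts):.1f}%")  # 56.7%
```

A score near 0 would indicate evenly spread classes; here the dominant `<NA>` value pushes the score above one half.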
osteo_date
Date
MISSING
 
Distinct | 24 |
---|---|
Distinct (%) | 85.7% |
Missing | 72 |
Missing (%) | 72.0% |
Memory size | 932.0 B |
Minimum | 2014-01-07 00:00:00 |
---|---|
Maximum | 2019-04-22 00:00:00 |
Variable | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
RID | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Hypt_dig | 1.000 | 1.000 | NaN | 0.161 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 |
Hypt_date | 1.000 | NaN | 1.000 | 0.593 | 1.000 | 0.730 | 1.000 | 1.000 | 0.700 | 1.000 | 0.994 |
Heart_dig | 1.000 | 0.161 | 0.593 | 1.000 | NaN | 0.106 | 0.000 | 1.000 | 0.000 | 0.682 | 1.000 |
Heart_date | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
cancer_dig | 1.000 | 0.000 | 0.730 | 0.106 | 1.000 | 1.000 | NaN | NaN | 0.000 | 0.301 | 1.000 |
cancer_cd | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | NaN | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 |
cancer_date | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
osteo_dig | 1.000 | 0.000 | 0.700 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | NaN | NaN |
osteo_cd | 1.000 | 0.000 | 1.000 | 0.682 | 1.000 | 0.301 | 1.000 | 1.000 | NaN | 1.000 | 0.979 |
osteo_date | 1.000 | 0.000 | 0.994 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | NaN | 0.979 | 1.000 |
Variable | osteo_dig | Heart_dig | cancer_dig | osteo_cd | Hypt_dig |
---|---|---|---|---|---|
osteo_dig | 1.000 | 0.000 | 0.000 | 1.000 | 0.000 |
Heart_dig | 0.000 | 1.000 | 0.067 | 0.409 | 0.103 |
cancer_dig | 0.000 | 0.067 | 1.000 | 0.139 | 0.000 |
osteo_cd | 1.000 | 0.409 | 0.139 | 1.000 | 0.000 |
Hypt_dig | 0.000 | 0.103 | 0.000 | 0.000 | 1.000 |
Variable | Hypt_dig | Heart_dig | cancer_dig | osteo_dig | osteo_cd |
---|---|---|---|---|---|
Hypt_dig | 1.000 | 0.103 | 0.000 | 0.000 | 0.000 |
Heart_dig | 0.103 | 1.000 | 0.067 | 0.000 | 0.409 |
cancer_dig | 0.000 | 0.067 | 1.000 | 0.000 | 0.139 |
osteo_dig | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 |
osteo_cd | 0.000 | 0.409 | 0.139 | 1.000 | 1.000 |
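The report does not name the association measure behind these coefficients. As one illustration of why osteo_dig and osteo_cd reach 1.000, the sketch below computes plain (uncorrected) Cramér's V on an assumed contingency table in which osteo_cd is `<NA>` exactly when osteo_dig is 0. Both the choice of measure and the table are assumptions: the table is inferred from the value counts and sample rows in this report, and the tool may use a different or bias-corrected statistic.

```python
from math import sqrt

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return sqrt(chi2 / (n * (k - 1)))

# Assumed table: rows are osteo_dig = 0 and 1; columns are "<NA>" followed by
# the 12 osteo_cd codes (counts 14, 3, 2 and nine codes that occur once).
code_counts = [14, 3, 2] + [1] * 9
table = [
    [72] + [0] * 12,
    [0] + code_counts,
]

print(round(cramers_v(table), 3))  # 1.0
```

Each osteo_cd value falls under exactly one osteo_dig value in this table, so knowing one variable fully determines the other, which is the situation a coefficient of 1 describes.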
First rows
Index | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | R0000001 | 0 | <NA> | 1 | 2015-09-01T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
1 | R0000002 | 1 | 2014-04-15T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
2 | R0000003 | 1 | 2010-10-19T00:00:00 | 1 | 2010-10-19T00:00:00 | 1 | C3499 | 2017-01-13T00:00:00 | 0 | <NA> | <NA> |
3 | R0000004 | 1 | 2012-09-21T00:00:00 | 1 | 2012-09-21T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
4 | R0000005 | 1 | 2014-02-13T00:00:00 | 0 | <NA> | 1 | C220 | 2013-08-30T00:00:00 | 0 | <NA> | <NA> |
5 | R0000006 | 1 | 2002-10-30T00:00:00 | 1 | 2002-12-26T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
6 | R0000007 | 1 | 2014-06-19T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8283.000.00 | 2015-09-01T00:00:00 |
7 | R0000008 | 1 | 2011-06-07T00:00:00 | 1 | 2015-10-14T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
8 | R0000009 | 1 | 2014-02-17T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8219.44E.00 | 2014-02-17T00:00:00 |
9 | R0000010 | 1 | 2005-08-13T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
Last rows
Index | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
90 | R0000091 | 1 | 2014-03-10T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8219.44E.00 | 2014-03-10T00:00:00 |
91 | R0000092 | 1 | 2008-06-08T00:00:00 | 1 | 2008-06-08T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
92 | R0000093 | 1 | 2015-07-17T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
93 | R0000094 | 1 | 2010-07-08T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
94 | R0000095 | 0 | <NA> | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8199.000.01 | 2015-09-07T00:00:00 |
95 | R0000096 | 1 | 2011-04-26T00:00:00 | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8219.44E.00 | 2014-02-18T00:00:00 |
96 | R0000097 | 0 | <NA> | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8219.44E.00 | 2014-04-22T00:00:00 |
97 | R0000098 | 1 | 2007-05-08T00:00:00 | 1 | 2014-04-17T00:00:00 | 1 | C649 | 2019-02-14T00:00:00 | 0 | <NA> | <NA> |
98 | R0000099 | 1 | 2015-07-16T00:00:00 | 1 | 2005-05-23T00:00:00 | 0 | <NA> | <NA> | 0 | <NA> | <NA> |
99 | R0000100 | 0 | <NA> | 0 | <NA> | 0 | <NA> | <NA> | 1 | M8219.44E.00 | 2014-02-17T00:00:00 |
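In the sample rows, each `*_dig` flag appears to be 1 exactly when the matching `*_date` is present. A minimal Python sketch of that consistency check follows, using three rows copied from the tables above, with `None` standing in for `<NA>`. The helper name `mismatched_pairs` is illustrative, and the rule itself is an observation from the samples, not something the report states.

```python
# Flag/date column pairs that are expected to agree.
PAIRS = [
    ("Hypt_dig", "Hypt_date"),
    ("Heart_dig", "Heart_date"),
    ("cancer_dig", "cancer_date"),
    ("osteo_dig", "osteo_date"),
]

def mismatched_pairs(record):
    """Return the flag columns whose 0/1 value disagrees with date presence."""
    return [
        flag
        for flag, date in PAIRS
        if (record[flag] == 1) != (record[date] is not None)
    ]

# Three rows copied from the sample tables in this report (None = "<NA>").
sample = [
    {"RID": "R0000001", "Hypt_dig": 0, "Hypt_date": None,
     "Heart_dig": 1, "Heart_date": "2015-09-01",
     "cancer_dig": 0, "cancer_date": None,
     "osteo_dig": 0, "osteo_date": None},
    {"RID": "R0000003", "Hypt_dig": 1, "Hypt_date": "2010-10-19",
     "Heart_dig": 1, "Heart_date": "2010-10-19",
     "cancer_dig": 1, "cancer_date": "2017-01-13",
     "osteo_dig": 0, "osteo_date": None},
    {"RID": "R0000095", "Hypt_dig": 0, "Hypt_date": None,
     "Heart_dig": 0, "Heart_date": None,
     "cancer_dig": 0, "cancer_date": None,
     "osteo_dig": 1, "osteo_date": "2015-09-07"},
]

for record in sample:
    print(record["RID"], mismatched_pairs(record))  # an empty list means consistent
```

The same check could be run over all 100 rows of the source file to confirm that the missing-value counts of the date columns mirror the 0 counts of the flags.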