Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 100 |
Missing cells | 310 |
Missing cells (%) | 28.2% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 9.1 KiB |
Average record size in memory | 93.3 B |
Variable types
Text | 2 |
---|---|
Categorical | 5 |
DateTime | 4 |
Dataset
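Description | Data on the initial diagnosing department of patients with hyperlipidemia, together with the diagnosis names and diagnosis codes of various coexisting diseases. Diagnoses include hypertensive diseases, ischemic heart disease, disorders of bone density and structure, and neoplasms. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes. Presence of a diagnosis code for each disease is encoded as 0 = No, 1 = Yes. |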
---|---|
Author | The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원) |
URL | http://cmcdata.net/data/dataset/coexistence-disease-data-dyslipidemia |
Alerts
osteo_cd is highly overall correlated with osteo_dig | High correlation |
osteo_dig is highly overall correlated with osteo_cd | High correlation |
osteo_cd is highly imbalanced (53.8%) | Imbalance |
Hypt_date has 36 (36.0%) missing values | Missing |
Heart_date has 51 (51.0%) missing values | Missing |
cancer_cd has 75 (75.0%) missing values | Missing |
cancer_date has 75 (75.0%) missing values | Missing |
osteo_date has 73 (73.0%) missing values | Missing |
RID has unique values | Unique |
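The missing-value alerts can be cross-checked against the dataset statistics. The following is a minimal sketch that uses only counts reported in this document; the variable names are illustrative and not part of the original analysis.

```python
# Missing-value counts copied from the alerts above.
missing_by_column = {
    "Hypt_date": 36,
    "Heart_date": 51,
    "cancer_cd": 75,
    "cancer_date": 75,
    "osteo_date": 73,
}

n_observations = 100  # "Number of observations"
n_variables = 11      # "Number of variables"

total_missing = sum(missing_by_column.values())
total_cells = n_observations * n_variables
missing_pct = 100 * total_missing / total_cells

print(total_missing)          # 310
print(f"{missing_pct:.1f}%")  # 28.2%
```

Both figures agree with the "Missing cells" rows of the dataset statistics, which suggests that the five alerted columns account for all missing cells.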
Reproduction
Analysis started | 2023-10-08 18:57:41.063504 |
---|---|
Analysis finished | 2023-10-08 18:57:43.005861 |
Duration | 1.94 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
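A report of this kind can be regenerated roughly as follows. This is a sketch under assumptions: the CSV path, the output file name, and the report title are hypothetical, and the settings and dtype handling of the original run (stored in `config.json`) are not known. It relies on `pandas` and `ydata-profiling` being installed.

```python
# The four date columns listed in this report (4 DateTime variables).
DATE_COLUMNS = ["Hypt_date", "Heart_date", "cancer_date", "osteo_date"]


def build_report(csv_path, html_path="report.html"):
    """Profile the dataset at csv_path and write an HTML report.

    csv_path and html_path are hypothetical names chosen for illustration.
    """
    # Imports are kept local so this module loads without the optional packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Parse the date columns so they are profiled as dates, not text.
    df = pd.read_csv(csv_path, parse_dates=DATE_COLUMNS)

    profile = ProfileReport(df, title="Coexistence disease data (dyslipidemia)")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("coexistence.csv")` would write `report.html` next to the script. Variable types in the regenerated report may differ from those shown here if the original run cast columns before profiling.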
RID
Text
UNIQUE
 
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
r0000001 | 1 | 1.0% |
r0000064 | 1 | 1.0% |
r0000075 | 1 | 1.0% |
r0000074 | 1 | 1.0% |
r0000073 | 1 | 1.0% |
r0000072 | 1 | 1.0% |
r0000071 | 1 | 1.0% |
r0000070 | 1 | 1.0% |
r0000069 | 1 | 1.0% |
r0000068 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 518 | 64.8% |
R | 100 | 12.5% |
1 | 24 | 3.0% |
2 | 21 | 2.6% |
4 | 20 | 2.5% |
5 | 20 | 2.5% |
7 | 20 | 2.5% |
9 | 20 | 2.5% |
3 | 19 | 2.4% |
6 | 19 | 2.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 700 | 87.5% |
Uppercase Letter | 100 | 12.5% |
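The category breakdown above corresponds to Unicode general categories. As a small illustration, the sketch below classifies the characters of the first ten RID values shown in the sample rows of this report (R0000001 to R0000010) with Python's standard `unicodedata` module. It covers ten rows only, so the totals are smaller than the 800 characters reported for the full column.

```python
import unicodedata
from collections import Counter

# RIDs of the first ten sample rows in this report: R0000001 ... R0000010.
rids = [f"R{i:07d}" for i in range(1, 11)]

# Human-readable names for the two Unicode categories that occur here.
CATEGORY_NAMES = {"Lu": "Uppercase Letter", "Nd": "Decimal Number"}

category_counts = Counter(
    CATEGORY_NAMES[unicodedata.category(ch)] for rid in rids for ch in rid
)
char_counts = Counter(ch for rid in rids for ch in rid)

for name, count in category_counts.most_common():
    print(name, count)
```

Each RID contributes one uppercase letter and seven digits, which matches the 100 to 700 split reported for the full column.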
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 518 | 74.0% |
1 | 24 | 3.4% |
2 | 21 | 3.0% |
4 | 20 | 2.9% |
5 | 20 | 2.9% |
7 | 20 | 2.9% |
9 | 20 | 2.9% |
3 | 19 | 2.7% |
6 | 19 | 2.7% |
8 | 19 | 2.7% |
Uppercase Letter
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 700 | 87.5% |
Latin | 100 | 12.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 518 | 74.0% |
1 | 24 | 3.4% |
2 | 21 | 3.0% |
4 | 20 | 2.9% |
5 | 20 | 2.9% |
7 | 20 | 2.9% |
9 | 20 | 2.9% |
3 | 19 | 2.7% |
6 | 19 | 2.7% |
8 | 19 | 2.7% |
Latin
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 800 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 518 | 64.8% |
R | 100 | 12.5% |
1 | 24 | 3.0% |
2 | 21 | 2.6% |
4 | 20 | 2.5% |
5 | 20 | 2.5% |
7 | 20 | 2.5% |
9 | 20 | 2.5% |
3 | 19 | 2.4% |
6 | 19 | 2.4% |
Hypt_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1 | 64 |
---|---|
0 | 36 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
1 | 64 | 64.0% |
0 | 36 | 36.0% |
Hypt_date
Date
MISSING
 
Distinct | 63 |
---|---|
Distinct (%) | 98.4% |
Missing | 36 |
Missing (%) | 36.0% |
Memory size | 932.0 B |
Minimum | 1996-01-16 00:00:00 |
---|---|
Maximum | 2018-06-21 00:00:00 |
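The "Distinct (%)" figure for this variable appears to be computed relative to the non-missing values rather than to all 100 observations. A short worked check using the numbers above (the function name is illustrative):

```python
def distinct_pct(distinct, n_rows, missing):
    """Share of distinct values among the non-missing observations, in percent."""
    non_missing = n_rows - missing
    return 100 * distinct / non_missing


# Hypt_date: 63 distinct values, 36 of 100 observations missing.
print(f"{distinct_pct(63, 100, 36):.1f}%")  # 98.4%
```

With 64 non-missing dates and 63 distinct values, exactly one date occurs twice.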
Heart_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 51 |
---|---|
1 | 49 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
0 | 51 | 51.0% |
1 | 49 | 49.0% |
Heart_date
Date
MISSING
 
Distinct | 49 |
---|---|
Distinct (%) | 100.0% |
Missing | 51 |
Missing (%) | 51.0% |
Memory size | 932.0 B |
Minimum | 1995-04-19 00:00:00 |
---|---|
Maximum | 2018-07-06 00:00:00 |
cancer_dig
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 75 |
---|---|
1 | 25 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 1 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 75 | 75.0% |
1 | 25 | 25.0% |
cancer_cd
Text
MISSING
 
Distinct | 18 |
---|---|
Distinct (%) | 72.0% |
Missing | 75 |
Missing (%) | 75.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
c73.0 | 4 | 16.0% |
c509.4 | 3 | 12.0% |
c220.4 | 2 | 8.0% |
c539.4 | 2 | 8.0% |
c186.4 | 1 | 4.0% |
c20.0 | 1 | 4.0% |
c221.0 | 1 | 4.0% |
c56.4 | 1 | 4.0% |
c649.0 | 1 | 4.0% |
c189.4 | 1 | 4.0% |
Other values (8) | 8 | 32.0% |
Most occurring characters
Value | Count | Frequency (%) |
C | 25 | 17.9% |
. | 25 | 17.9% |
4 | 18 | 12.9% |
0 | 15 | 10.7% |
9 | 10 | 7.1% |
3 | 8 | 5.7% |
5 | 8 | 5.7% |
2 | 8 | 5.7% |
6 | 8 | 5.7% |
1 | 7 | 5.0% |
Other values (2) | 8 | 5.7% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 90 | 64.3% |
Uppercase Letter | 25 | 17.9% |
Other Punctuation | 25 | 17.9% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
4 | 18 | 20.0% |
0 | 15 | 16.7% |
9 | 10 | 11.1% |
3 | 8 | 8.9% |
5 | 8 | 8.9% |
2 | 8 | 8.9% |
6 | 8 | 8.9% |
1 | 7 | 7.8% |
7 | 5 | 5.6% |
8 | 3 | 3.3% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 25 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
. | 25 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 115 | 82.1% |
Latin | 25 | 17.9% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
. | 25 | 21.7% |
4 | 18 | 15.7% |
0 | 15 | 13.0% |
9 | 10 | 8.7% |
3 | 8 | 7.0% |
5 | 8 | 7.0% |
2 | 8 | 7.0% |
6 | 8 | 7.0% |
1 | 7 | 6.1% |
7 | 5 | 4.3% |
Latin
Value | Count | Frequency (%) |
C | 25 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 140 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
C | 25 | 17.9% |
. | 25 | 17.9% |
4 | 18 | 12.9% |
0 | 15 | 10.7% |
9 | 10 | 7.1% |
3 | 8 | 5.7% |
5 | 8 | 5.7% |
2 | 8 | 5.7% |
6 | 8 | 5.7% |
1 | 7 | 5.0% |
Other values (2) | 8 | 5.7% |
cancer_date
Date
MISSING
 
Distinct | 24 |
---|---|
Distinct (%) | 96.0% |
Missing | 75 |
Missing (%) | 75.0% |
Memory size | 932.0 B |
Minimum | 1996-11-17 00:00:00 |
---|---|
Maximum | 2015-08-17 00:00:00 |
osteo_dig
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
0 | 73 |
---|---|
1 | 27 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 1 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 73 | 73.0% |
1 | 27 | 27.0% |
osteo_cd
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 10 |
---|---|
Distinct (%) | 10.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
<NA> | 73 |
---|---|
M819.4 | 11 |
M8199.0 | 5 |
M818.4 | 3 |
M810.4 | 3 |
Other values (5) | 5 |
Length
Max length | 7 |
---|---|
Median length | 4 |
Mean length | 4.6 |
Min length | 4 |
Unique
Unique | 5 |
---|---|
Unique (%) | 5.0% |
Sample
1st row | <NA> |
---|---|
2nd row | M818.4 |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 73 | 73.0% |
M819.4 | 11 | 11.0% |
M8199.0 | 5 | 5.0% |
M818.4 | 3 | 3.0% |
M810.4 | 3 | 3.0% |
M8000.4 | 1 | 1.0% |
M8190.4 | 1 | 1.0% |
M80.4 | 1 | 1.0% |
M81.4 | 1 | 1.0% |
M8109.0 | 1 | 1.0% |
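The 53.8% imbalance reported for osteo_cd can be reproduced from the counts above with a normalized-entropy score, that is, one minus the Shannon entropy divided by its maximum for the given number of classes. The sketch below assumes this is how the score is defined; the report itself does not state the formula. Note that `<NA>` is counted as a category here, which is consistent with the variable showing 10 distinct values and 0 missing.

```python
import math

# Counts from the Common Values table for osteo_cd (<NA> counted as a category).
counts = [73, 11, 5, 3, 3, 1, 1, 1, 1, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 means perfectly balanced, values near 1 mean
    one class dominates. H is the Shannon entropy in bits, k the class count."""
    k = len(counts)
    if k < 2:
        # Normalization is undefined for a single class; returning 0.0 is an
        # assumption made for this sketch.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


print(f"{imbalance_score(counts):.1%}")  # 53.8%
```

The score is driven mainly by the 73 `<NA>` entries; the nine actual codes share the remaining 27 observations.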
osteo_date
Date
MISSING
 
Distinct | 27 |
---|---|
Distinct (%) | 100.0% |
Missing | 73 |
Missing (%) | 73.0% |
Memory size | 932.0 B |
Minimum | 1997-04-12 00:00:00 |
---|---|
Maximum | 2017-11-15 00:00:00 |
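The per-variable summaries suggest that each `*_date` column is filled in exactly when the matching `*_dig` flag is 1. The sketch below checks this at the level of counts only, using figures reported above; it cannot confirm row-by-row agreement, which would require the underlying data.

```python
n_rows = 100

# (number of rows with flag == 1, number of missing dates), from this report.
reported = {
    "Hypt": (64, 36),
    "Heart": (49, 51),
    "cancer": (25, 75),
    "osteo": (27, 73),
}

for name, (flag_ones, date_missing) in reported.items():
    dates_present = n_rows - date_missing
    # True when the number of diagnosed patients equals the number of dates.
    print(name, flag_ones, dates_present, flag_ones == dates_present)

consistent = all(ones == n_rows - miss for ones, miss in reported.values())
print(consistent)  # True
```

All four pairs agree, so the missing dates are plausibly structural (no diagnosis, hence no date) rather than data-quality gaps.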
Correlations
Variable | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
RID | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Hypt_dig | 1.000 | 1.000 | NaN | 0.363 | 1.000 | 0.212 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 |
Hypt_date | 1.000 | NaN | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Heart_dig | 1.000 | 0.363 | 0.000 | 1.000 | NaN | 0.000 | 0.000 | 0.000 | 0.212 | 0.285 | 1.000 |
Heart_date | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
cancer_dig | 1.000 | 0.212 | 0.000 | 0.000 | 1.000 | 1.000 | NaN | NaN | 0.349 | 0.144 | 1.000 |
cancer_cd | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | NaN | 1.000 | 1.000 | 0.000 | 0.292 | 1.000 |
cancer_date | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | NaN | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 |
osteo_dig | 1.000 | 0.000 | 1.000 | 0.212 | 1.000 | 0.349 | 0.000 | 0.000 | 1.000 | NaN | NaN |
osteo_cd | 1.000 | 0.000 | 1.000 | 0.285 | 1.000 | 0.144 | 0.292 | 1.000 | NaN | 1.000 | 1.000 |
osteo_date | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 |
Variable | osteo_cd | cancer_dig | Hypt_dig | osteo_dig | Heart_dig |
---|---|---|---|---|---|
osteo_cd | 1.000 | 0.058 | 0.000 | 1.000 | 0.218 |
cancer_dig | 0.058 | 1.000 | 0.136 | 0.227 | 0.000 |
Hypt_dig | 0.000 | 0.136 | 1.000 | 0.000 | 0.237 |
osteo_dig | 1.000 | 0.227 | 0.000 | 1.000 | 0.135 |
Heart_dig | 0.218 | 0.000 | 0.237 | 0.135 | 1.000 |
Variable | Hypt_dig | Heart_dig | cancer_dig | osteo_dig | osteo_cd |
---|---|---|---|---|---|
Hypt_dig | 1.000 | 0.237 | 0.136 | 0.000 | 0.000 |
Heart_dig | 0.237 | 1.000 | 0.000 | 0.135 | 0.218 |
cancer_dig | 0.136 | 0.000 | 1.000 | 0.227 | 0.058 |
osteo_dig | 0.000 | 0.135 | 0.227 | 1.000 | 1.000 |
osteo_cd | 0.000 | 0.218 | 0.058 | 1.000 | 1.000 |
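The value of 1.000 between osteo_dig and osteo_cd is what one would expect if osteo_cd is `<NA>` exactly when osteo_dig is 0. The report does not say which association measure produced these tables, so the following is an illustration only: it computes the uncorrected Cramér's V on a contingency table that is an assumption, inferred from the matching counts (73 zeros, 73 `<NA>` values) and the sample rows. A bias-corrected variant would give a somewhat different number.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Assumed cross-tabulation: rows are osteo_dig = 0 and 1; columns are <NA>
# followed by the nine osteo_cd codes, with counts from the Common Values table.
assumed_table = [
    [73, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 11, 5, 3, 3, 1, 1, 1, 1, 1],
]

print(round(cramers_v(assumed_table), 3))  # 1.0
```

Because every osteo_cd value falls entirely within one osteo_dig group, the code column carries all the information in the flag, and the association is perfect under this measure.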
First rows
Index | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | R0000001 | 1 | 1997-04-29 | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
1 | R0000002 | 1 | 1999-05-28 | 1 | 2014-04-11 | 0 | <NA> | NaT | 1 | M818.4 | 2005-05-07 |
2 | R0000003 | 1 | 1997-04-10 | 1 | 2014-03-14 | 0 | <NA> | NaT | 0 | <NA> | NaT |
3 | R0000004 | 1 | 1997-09-29 | 1 | 2001-12-05 | 1 | C189.4 | 1997-08-05 | 0 | <NA> | NaT |
4 | R0000005 | 0 | NaT | 1 | 2001-02-07 | 0 | <NA> | NaT | 0 | <NA> | NaT |
5 | R0000006 | 1 | 2009-12-07 | 0 | NaT | 1 | C73.0 | 2014-06-24 | 0 | <NA> | NaT |
6 | R0000007 | 1 | 2003-12-20 | 1 | 2001-03-17 | 0 | <NA> | NaT | 0 | <NA> | NaT |
7 | R0000008 | 1 | 2014-06-09 | 1 | 2001-02-12 | 1 | C20.0 | 2015-08-17 | 0 | <NA> | NaT |
8 | R0000009 | 1 | 1999-07-28 | 1 | 2000-09-15 | 0 | <NA> | NaT | 0 | <NA> | NaT |
9 | R0000010 | 1 | 2009-12-31 | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
Last rows
Index | RID | Hypt_dig | Hypt_date | Heart_dig | Heart_date | cancer_dig | cancer_cd | cancer_date | osteo_dig | osteo_cd | osteo_date |
---|---|---|---|---|---|---|---|---|---|---|---|
90 | R0000093 | 1 | 2006-02-11 | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
91 | R0000094 | 1 | 2000-04-10 | 1 | 2000-12-18 | 0 | <NA> | NaT | 0 | <NA> | NaT |
92 | R0000095 | 1 | 2006-01-03 | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
93 | R0000096 | 0 | NaT | 1 | 2006-01-03 | 0 | <NA> | NaT | 0 | <NA> | NaT |
94 | R0000097 | 1 | 2006-01-05 | 1 | 2008-03-29 | 1 | C73.0 | 2014-06-24 | 1 | M8109.0 | 2016-02-02 |
95 | R0000098 | 0 | NaT | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
96 | R0000099 | 0 | NaT | 0 | NaT | 0 | <NA> | NaT | 0 | <NA> | NaT |
97 | R0000101 | 1 | 2001-10-11 | 1 | 2001-10-11 | 0 | <NA> | NaT | 0 | <NA> | NaT |
98 | R0000102 | 1 | 2002-04-30 | 0 | NaT | 1 | C64.4 | 1998-06-18 | 0 | <NA> | NaT |
99 | R0000103 | 1 | 1996-04-09 | 1 | 2005-11-21 | 0 | <NA> | NaT | 0 | <NA> | NaT |