Overview

Dataset statistics

Number of variables: 11
Number of observations: 100
Missing cells: 310
Missing cells (%): 28.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.1 KiB
Average record size in memory: 93.3 B
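
The missing-cell percentage follows from the counts above: 310 missing cells out of 11 x 100 = 1,100 cells. A minimal sketch of that arithmetic is shown here; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 11
n_observations = 100
missing_cells = 310

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing = {missing_pct:.1f}%")
```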

Variable types

Text: 2
Categorical: 5
DateTime: 4

Dataset

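Description: Data on the department of first diagnosis for hyperlipidemia patients, together with the diagnosis names and diagnosis codes of various coexisting diseases. Diagnosis names include hypertensive diseases, ischemic heart disease, disorders of bone density and structure, and neoplasms. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes. Presence of a diagnosis code for each disease is coded as 0 = No, 1 = Yes.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/coexistence-disease-data-dyslipidemia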

Alerts

osteo_cd is highly overall correlated with osteo_dig [High correlation]
osteo_dig is highly overall correlated with osteo_cd [High correlation]
osteo_cd is highly imbalanced (53.8%) [Imbalance]
Hypt_date has 36 (36.0%) missing values [Missing]
Heart_date has 51 (51.0%) missing values [Missing]
cancer_cd has 75 (75.0%) missing values [Missing]
cancer_date has 75 (75.0%) missing values [Missing]
osteo_date has 73 (73.0%) missing values [Missing]
RID has unique values [Unique]

Reproduction

Analysis started: 2023-10-08 18:57:41.063504
Analysis finished: 2023-10-08 18:57:43.005861
Duration: 1.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
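
As a rough illustration of the category breakdown, the sketch below counts Unicode general categories for the five sample RID values listed in this section. It relies on Python's standard `unicodedata` module, which exposes general categories (for example `Lu` for Uppercase Letter and `Nd` for Decimal Number) but not scripts or blocks, so only the "categories" view is covered. This is not necessarily how the profiler computes its figures.

```python
import unicodedata
from collections import Counter

# Sample RID values copied from the report's Sample list.
rids = ["R0000001", "R0000002", "R0000003", "R0000004", "R0000005"]

# Map every character to its Unicode general category code
# ("Lu" = Uppercase Letter, "Nd" = Decimal Number) and count them.
category_counts = Counter(
    unicodedata.category(ch) for rid in rids for ch in rid
)

for code, count in category_counts.most_common():
    print(code, count)
```

Each RID consists of one uppercase letter followed by seven digits, which is why the full column splits into 100 letters and 700 digits.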

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000002
3rd row: R0000003
4th row: R0000004
5th row: R0000005

Value              Count  Frequency (%)
r0000001           1      1.0%
r0000064           1      1.0%
r0000075           1      1.0%
r0000074           1      1.0%
r0000073           1      1.0%
r0000072           1      1.0%
r0000071           1      1.0%
r0000070           1      1.0%
r0000069           1      1.0%
r0000068           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      518    64.8%
R      100    12.5%
1      24     3.0%
2      21     2.6%
4      20     2.5%
5      20     2.5%
7      20     2.5%
9      20     2.5%
3      19     2.4%
6      19     2.4%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      518    74.0%
1      24     3.4%
2      21     3.0%
4      20     2.9%
5      20     2.9%
7      20     2.9%
9      20     2.9%
3      19     2.7%
6      19     2.7%
8      19     2.7%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      518    74.0%
1      24     3.4%
2      21     3.0%
4      20     2.9%
5      20     2.9%
7      20     2.9%
9      20     2.9%
3      19     2.7%
6      19     2.7%
8      19     2.7%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      518    64.8%
R      100    12.5%
1      24     3.0%
2      21     2.6%
4      20     2.5%
5      20     2.5%
7      20     2.5%
9      20     2.5%
3      19     2.4%
6      19     2.4%

Hypt_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
1      64
0      36

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
1      64     64.0%
0      36     36.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      64     64.0%
0      36     36.0%

Hypt_date
Date

MISSING 

Distinct: 63
Distinct (%): 98.4%
Missing: 36
Missing (%): 36.0%
Memory size: 932.0 B
Minimum: 1996-01-16 00:00:00
Maximum: 2018-06-21 00:00:00
Histogram with fixed size bins (bins=50)

Heart_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
0      51
1      49

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      51     51.0%
1      49     49.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      51     51.0%
1      49     49.0%

Heart_date
Date

MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 51
Missing (%): 51.0%
Memory size: 932.0 B
Minimum: 1995-04-19 00:00:00
Maximum: 2018-07-06 00:00:00
Histogram with fixed size bins (bins=49)

cancer_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
0      75
1      25

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      75     75.0%
1      25     25.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      75     75.0%
1      25     25.0%

cancer_cd
Text

MISSING 

Distinct: 18
Distinct (%): 72.0%
Missing: 75
Missing (%): 75.0%
Memory size: 932.0 B

Length

Max length: 7
Median length: 6
Mean length: 5.6
Min length: 5

Characters and Unicode

Total characters: 140
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 56.0%

Sample

1st row: C189.4
2nd row: C73.0
3rd row: C20.0
4th row: C221.0
5th row: C56.4

Value             Count  Frequency (%)
c73.0             4      16.0%
c509.4            3      12.0%
c220.4            2      8.0%
c539.4            2      8.0%
c186.4            1      4.0%
c20.0             1      4.0%
c221.0            1      4.0%
c56.4             1      4.0%
c649.0            1      4.0%
c189.4            1      4.0%
Other values (8)  8      32.0%

Most occurring characters

Value             Count  Frequency (%)
C                 25     17.9%
.                 25     17.9%
4                 18     12.9%
0                 15     10.7%
9                 10     7.1%
3                 8      5.7%
5                 8      5.7%
2                 8      5.7%
6                 8      5.7%
1                 7      5.0%
Other values (2)  8      5.7%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     90     64.3%
Uppercase Letter   25     17.9%
Other Punctuation  25     17.9%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
4      18     20.0%
0      15     16.7%
9      10     11.1%
3      8      8.9%
5      8      8.9%
2      8      8.9%
6      8      8.9%
1      7      7.8%
7      5      5.6%
8      3      3.3%

Uppercase Letter
Value  Count  Frequency (%)
C      25     100.0%

Other Punctuation
Value  Count  Frequency (%)
.      25     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  115    82.1%
Latin   25     17.9%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      25     21.7%
4      18     15.7%
0      15     13.0%
9      10     8.7%
3      8      7.0%
5      8      7.0%
2      8      7.0%
6      8      7.0%
1      7      6.1%
7      5      4.3%

Latin
Value  Count  Frequency (%)
C      25     100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  140    100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
C                 25     17.9%
.                 25     17.9%
4                 18     12.9%
0                 15     10.7%
9                 10     7.1%
3                 8      5.7%
5                 8      5.7%
2                 8      5.7%
6                 8      5.7%
1                 7      5.0%
Other values (2)  8      5.7%

cancer_date
Date

MISSING 

Distinct: 24
Distinct (%): 96.0%
Missing: 75
Missing (%): 75.0%
Memory size: 932.0 B
Minimum: 1996-11-17 00:00:00
Maximum: 2015-08-17 00:00:00
Histogram with fixed size bins (bins=24)

osteo_dig
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
0      73
1      27

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      73     73.0%
1      27     27.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      73     73.0%
1      27     27.0%

osteo_cd
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value             Count
<NA>              73
M819.4            11
M8199.0           5
M818.4            3
M810.4            3
Other values (5)  5

Length

Max length: 7
Median length: 4
Mean length: 4.6
Min length: 4

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: <NA>
2nd row: M818.4
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value    Count  Frequency (%)
<NA>     73     73.0%
M819.4   11     11.0%
M8199.0  5      5.0%
M818.4   3      3.0%
M810.4   3      3.0%
M8000.4  1      1.0%
M8190.4  1      1.0%
M80.4    1      1.0%
M81.4    1      1.0%
M8109.0  1      1.0%
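
The 53.8% imbalance figure reported for osteo_cd is consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy (base 2) of the value counts and k is the number of distinct values. The sketch below recomputes it from the counts in the Common Values table. That the profiler uses exactly this formula is an assumption; the function name `imbalance_score` is illustrative.

```python
import math

# Value counts for osteo_cd, copied from the Common Values table.
# The literal string "<NA>" is counted as its own category.
counts = [73, 11, 5, 3, 3, 1, 1, 1, 1, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k); 0 means perfectly uniform classes."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        # Degenerate case; returning 0 is a convention chosen for this sketch.
        return 0.0
    # Shannon entropy in bits, skipping empty classes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score near 0 indicates evenly spread categories, and the score grows toward 1 as a single value (here `<NA>`) dominates.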

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
na       73     73.0%
m819.4   11     11.0%
m8199.0  5      5.0%
m818.4   3      3.0%
m810.4   3      3.0%
m8000.4  1      1.0%
m8190.4  1      1.0%
m80.4    1      1.0%
m81.4    1      1.0%
m8109.0  1      1.0%

osteo_date
Date

MISSING 

Distinct: 27
Distinct (%): 100.0%
Missing: 73
Missing (%): 73.0%
Memory size: 932.0 B
Minimum: 1997-04-12 00:00:00
Maximum: 2017-11-15 00:00:00
Histogram with fixed size bins (bins=27)

Correlations

             RID    Hypt_dig  Hypt_date  Heart_dig  Heart_date  cancer_dig  cancer_cd  cancer_date  osteo_dig  osteo_cd  osteo_date
RID          1.000  1.000     1.000      1.000      1.000       1.000       1.000      1.000        1.000      1.000     1.000
Hypt_dig     1.000  1.000     NaN        0.363      1.000       0.212       0.000      1.000        0.000      0.000     1.000
Hypt_date    1.000  NaN       1.000      0.000      1.000       0.000       1.000      1.000        1.000      1.000     1.000
Heart_dig    1.000  0.363     0.000      1.000      NaN         0.000       0.000      0.000        0.212      0.285     1.000
Heart_date   1.000  1.000     1.000      NaN        1.000       1.000       1.000      1.000        1.000      1.000     1.000
cancer_dig   1.000  0.212     0.000      0.000      1.000       1.000       NaN        NaN          0.349      0.144     1.000
cancer_cd    1.000  0.000     1.000      0.000      1.000       NaN         1.000      1.000        0.000      0.292     1.000
cancer_date  1.000  1.000     1.000      0.000      1.000       NaN         1.000      1.000        0.000      1.000     1.000
osteo_dig    1.000  0.000     1.000      0.212      1.000       0.349       0.000      0.000        1.000      NaN       NaN
osteo_cd     1.000  0.000     1.000      0.285      1.000       0.144       0.292      1.000        NaN        1.000     1.000
osteo_date   1.000  1.000     1.000      1.000      1.000       1.000       1.000      1.000        NaN        1.000     1.000

            osteo_cd  cancer_dig  Hypt_dig  osteo_dig  Heart_dig
osteo_cd    1.000     0.058       0.000     1.000      0.218
cancer_dig  0.058     1.000       0.136     0.227      0.000
Hypt_dig    0.000     0.136       1.000     0.000      0.237
osteo_dig   1.000     0.227       0.000     1.000      0.135
Heart_dig   0.218     0.000       0.237     0.135      1.000

            Hypt_dig  Heart_dig  cancer_dig  osteo_dig  osteo_cd
Hypt_dig    1.000     0.237      0.136       0.000      0.000
Heart_dig   0.237     1.000      0.000       0.135      0.218
cancer_dig  0.136     0.000      1.000       0.227      0.058
osteo_dig   0.000     0.135      0.227       1.000      1.000
osteo_cd    0.000     0.218      0.058       1.000      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
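
Nullity correlation can be read as the Pearson correlation between binary "is missing" indicators of two columns. The sketch below applies that idea to the first ten rows of the Sample table in this report, transcribed by hand. With only ten rows the values are illustrative and will not match the heatmap computed on all 100 observations. The helper name `pearson` is hypothetical.

```python
import math

# Missingness indicators (1 = missing) for sample rows 0-9,
# transcribed from the Sample table of the report.
hypt_date_missing = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
heart_date_missing = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
cancer_cd_missing = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]
cancer_date_missing = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# cancer_cd and cancer_date are missing on exactly the same rows.
r_cancer = pearson(cancer_cd_missing, cancer_date_missing)
# Hypt_date and Heart_date are never missing together in these rows.
r_hypt_heart = pearson(hypt_date_missing, heart_date_missing)

print(round(r_cancer, 3), round(r_hypt_heart, 3))
```

A value near 1 means two columns tend to be missing together, while a negative value means one tends to be present when the other is absent.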

Sample

First rows

    RID       Hypt_dig  Hypt_date   Heart_dig  Heart_date  cancer_dig  cancer_cd  cancer_date  osteo_dig  osteo_cd  osteo_date
0   R0000001  1         1997-04-29  0          NaT         0           <NA>       NaT          0          <NA>      NaT
1   R0000002  1         1999-05-28  1          2014-04-11  0           <NA>       NaT          1          M818.4    2005-05-07
2   R0000003  1         1997-04-10  1          2014-03-14  0           <NA>       NaT          0          <NA>      NaT
3   R0000004  1         1997-09-29  1          2001-12-05  1           C189.4     1997-08-05   0          <NA>      NaT
4   R0000005  0         NaT         1          2001-02-07  0           <NA>       NaT          0          <NA>      NaT
5   R0000006  1         2009-12-07  0          NaT         1           C73.0      2014-06-24   0          <NA>      NaT
6   R0000007  1         2003-12-20  1          2001-03-17  0           <NA>       NaT          0          <NA>      NaT
7   R0000008  1         2014-06-09  1          2001-02-12  1           C20.0      2015-08-17   0          <NA>      NaT
8   R0000009  1         1999-07-28  1          2000-09-15  0           <NA>       NaT          0          <NA>      NaT
9   R0000010  1         2009-12-31  0          NaT         0           <NA>       NaT          0          <NA>      NaT

Last rows

    RID       Hypt_dig  Hypt_date   Heart_dig  Heart_date  cancer_dig  cancer_cd  cancer_date  osteo_dig  osteo_cd  osteo_date
90  R0000093  1         2006-02-11  0          NaT         0           <NA>       NaT          0          <NA>      NaT
91  R0000094  1         2000-04-10  1          2000-12-18  0           <NA>       NaT          0          <NA>      NaT
92  R0000095  1         2006-01-03  0          NaT         0           <NA>       NaT          0          <NA>      NaT
93  R0000096  0         NaT         1          2006-01-03  0           <NA>       NaT          0          <NA>      NaT
94  R0000097  1         2006-01-05  1          2008-03-29  1           C73.0      2014-06-24   1          M8109.0   2016-02-02
95  R0000098  0         NaT         0          NaT         0           <NA>       NaT          0          <NA>      NaT
96  R0000099  0         NaT         0          NaT         0           <NA>       NaT          0          <NA>      NaT
97  R0000101  1         2001-10-11  1          2001-10-11  0           <NA>       NaT          0          <NA>      NaT
98  R0000102  1         2002-04-30  0          NaT         1           C64.4      1998-06-18   0          <NA>      NaT
99  R0000103  1         1996-04-09  1          2005-11-21  0           <NA>       NaT          0          <NA>      NaT