Overview

Dataset statistics

Number of variables: 11
Number of observations: 100
Missing cells: 312
Missing cells (%): 28.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.1 KiB
Average record size in memory: 93.3 B
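The missing-cell figures can be cross-checked against the per-variable missing counts listed under Alerts. The sketch below is only an arithmetic check on numbers copied from this report, not the profiler's own code. It assumes the six variables without a missing-value alert contribute zero missing cells, which matches the per-variable summaries (osteo_cd reports 0 missing because its `<NA>` entries are counted as a category).

```python
# Figures copied from this report.
n_variables = 11
n_observations = 100

# Per-variable missing counts from the Alerts section.
missing_per_variable = {
    "Hypt_date": 19,
    "Heart_date": 51,
    "cancer_cd": 85,
    "cancer_date": 85,
    "osteo_date": 72,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

print(f"missing cells: {missing_cells} of {total_cells} ({missing_pct:.1f}%)")
```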

Variable types

Text: 2
Categorical: 5
DateTime: 4

Dataset

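Description: Data on the department of first diagnosis, and the diagnosis names and diagnosis codes of various comorbidities, for patients with hyperlipidemia. Diagnosis names include hypertensive diseases, ischemic heart disease, disorders of bone density and structure, and neoplasms, among others. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes. Presence of a diagnosis code for each disease is coded as 0 = No, 1 = Yes.
Author: The Catholic University of Korea, Eunpyeong St. Mary's Hospital (가톨릭대학교 은평성모병원)
URL: http://cmcdata.net/data/dataset/coexistence-disease-data-dyslipidemia-eunpyeong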

Alerts

osteo_dig is highly overall correlated with osteo_cd (High correlation)
osteo_cd is highly overall correlated with osteo_dig (High correlation)
osteo_cd is highly imbalanced (56.7%) (Imbalance)
Hypt_date has 19 (19.0%) missing values (Missing)
Heart_date has 51 (51.0%) missing values (Missing)
cancer_cd has 85 (85.0%) missing values (Missing)
cancer_date has 85 (85.0%) missing values (Missing)
osteo_date has 72 (72.0%) missing values (Missing)
RID has unique values (Unique)
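The 56.7% imbalance figure for osteo_cd can be reproduced from the value counts reported for that variable. The sketch below assumes the score is one minus the Shannon entropy normalized by the log of the number of classes. That formula matches the reported number, but it is an assumption about the profiler's internals, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes (0.0 means perfectly balanced)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# osteo_cd counts from this report: "<NA>" 72, then 14, 3, 2,
# plus nine codes that occur once each (13 distinct values in total).
osteo_cd_counts = [72, 14, 3, 2] + [1] * 9

score = imbalance_score(osteo_cd_counts)
print(f"{score:.1%}")  # prints 56.7%
```

For comparison, a perfectly balanced variable scores 0 under this definition, so Heart_dig (51 vs. 49) would score close to 0 and raises no alert.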

Reproduction

Analysis started: 2023-10-08 18:57:24.157582
Analysis finished: 2023-10-08 18:57:25.833129
Duration: 1.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
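As an illustration of that idea, the character and category counts for RID can be reproduced with Python's standard `unicodedata` module. The sketch assumes the 100 RID values run consecutively from R0000001 to R0000100, which is consistent with the sample rows and the unique count but is not stated outright in the report. General categories come back as two-letter codes (`Nd` for Decimal Number, `Lu` for Uppercase Letter).

```python
import unicodedata
from collections import Counter

# Assumption: RIDs are "R" followed by a zero-padded 7-digit number, 1..100.
rids = [f"R{i:07d}" for i in range(1, 101)]
text = "".join(rids)

# Frequency of each character across all values.
char_counts = Counter(text)

# Frequency of each Unicode general category (e.g. "Nd", "Lu").
category_counts = Counter(unicodedata.category(ch) for ch in text)

print("total characters:", len(text))
print("distinct characters:", len(char_counts))
print("most common:", char_counts.most_common(3))
print("categories:", dict(category_counts))
```

Scripts and blocks are not exposed by `unicodedata`, so this sketch covers only the character and category tables.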

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000002
3rd row: R0000003
4th row: R0000004
5th row: R0000005

Value              Count  Frequency (%)
r0000001           1      1.0%
r0000063           1      1.0%
r0000074           1      1.0%
r0000073           1      1.0%
r0000072           1      1.0%
r0000071           1      1.0%
r0000070           1      1.0%
r0000069           1      1.0%
r0000068           1      1.0%
r0000067           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Uppercase Letter

Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Latin

Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

Hypt_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      81     81.0%
0      19     19.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

Hypt_date
Date

MISSING 

Distinct: 73
Distinct (%): 90.1%
Missing: 19
Missing (%): 19.0%
Memory size: 932.0 B
Minimum: 2001-08-01 00:00:00
Maximum: 2018-06-07 00:00:00
Histogram with fixed size bins (bins=50)
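The Distinct (%) value here is not 73 out of 100. The reported percentages are consistent with dividing the distinct count by the number of non-missing values. The sketch below checks that reading against the figures for every variable with missing values in this report. It is an inference from the numbers, not a statement about the profiler's source code.

```python
n_observations = 100

# (distinct, missing, reported distinct %) copied from the variable summaries.
variables = {
    "Hypt_date": (73, 19, 90.1),
    "Heart_date": (48, 51, 98.0),
    "cancer_cd": (11, 85, 73.3),
    "cancer_date": (15, 85, 100.0),
    "osteo_date": (24, 72, 85.7),
}

computed = {}
for name, (distinct, missing, reported) in variables.items():
    present = n_observations - missing       # non-missing values
    computed[name] = round(100 * distinct / present, 1)
    print(f"{name}: {distinct}/{present} = {computed[name]}% (reported {reported}%)")
```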

Heart_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      51     51.0%
1      49     49.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

Heart_date
Date

MISSING 

Distinct: 48
Distinct (%): 98.0%
Missing: 51
Missing (%): 51.0%
Memory size: 932.0 B
Minimum: 2001-03-07 00:00:00
Maximum: 2017-12-11 00:00:00
Histogram with fixed size bins (bins=48)

cancer_dig
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      85     85.0%
1      15     15.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

cancer_cd
Text

MISSING 

Distinct: 11
Distinct (%): 73.3%
Missing: 85
Missing (%): 85.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 4
Mean length: 4.3333333
Min length: 3

Characters and Unicode

Total characters: 65
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 60.0%

Sample

1st row: C3499
2nd row: C220
3rd row: C3499
4th row: C1680
5th row: C3499

Value  Count  Frequency (%)
c3499  4      26.7%
c649   2      13.3%
c220   1      6.7%
c1680  1      6.7%
c169   1      6.7%
c569   1      6.7%
c61    1      6.7%
c1699  1      6.7%
c679   1      6.7%
c187   1      6.7%

Most occurring characters

Value  Count  Frequency (%)
C      15     23.1%
9      15     23.1%
6      9      13.8%
4      6      9.2%
1      5      7.7%
3      4      6.2%
0      3      4.6%
7      3      4.6%
2      2      3.1%
8      2      3.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    50     76.9%
Uppercase Letter  15     23.1%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
9      15     30.0%
6      9      18.0%
4      6      12.0%
1      5      10.0%
3      4      8.0%
0      3      6.0%
7      3      6.0%
2      2      4.0%
8      2      4.0%
5      1      2.0%

Uppercase Letter

Value  Count  Frequency (%)
C      15     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  50     76.9%
Latin   15     23.1%

Most frequent character per script

Common

Value  Count  Frequency (%)
9      15     30.0%
6      9      18.0%
4      6      12.0%
1      5      10.0%
3      4      8.0%
0      3      6.0%
7      3      6.0%
2      2      4.0%
8      2      4.0%
5      1      2.0%

Latin

Value  Count  Frequency (%)
C      15     100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  65     100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
C      15     23.1%
9      15     23.1%
6      9      13.8%
4      6      9.2%
1      5      7.7%
3      4      6.2%
0      3      4.6%
7      3      4.6%
2      2      3.1%
8      2      3.1%

cancer_date
Date

MISSING 

Distinct: 15
Distinct (%): 100.0%
Missing: 85
Missing (%): 85.0%
Memory size: 932.0 B
Minimum: 2002-12-16 00:00:00
Maximum: 2019-06-11 00:00:00
Histogram with fixed size bins (bins=15)

osteo_dig
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      72     72.0%
1      28     28.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts listed under Common Values]

osteo_cd
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 13
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 12
Median length: 4
Mean length: 6.24
Min length: 4

Unique

Unique: 9
Unique (%): 9.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value             Count  Frequency (%)
<NA>              72     72.0%
M8219.44E.00      14     14.0%
M8198.44E.00      3      3.0%
M8199.000.01      2      2.0%
M8283.000.00      1      1.0%
M8190.44E.00      1      1.0%
M8100.44E.00      1      1.0%
M8105.44E.00      1      1.0%
M8188.44E.00      1      1.0%
M8050.000.00      1      1.0%
Other values (3)  3      3.0%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
na                72     72.0%
m8219.44e.00      14     14.0%
m8198.44e.00      3      3.0%
m8199.000.01      2      2.0%
m8283.000.00      1      1.0%
m8190.44e.00      1      1.0%
m8100.44e.00      1      1.0%
m8105.44e.00      1      1.0%
m8188.44e.00      1      1.0%
m8050.000.00      1      1.0%
Other values (3)  3      3.0%

osteo_date
Date

MISSING 

Distinct: 24
Distinct (%): 85.7%
Missing: 72
Missing (%): 72.0%
Memory size: 932.0 B
Minimum: 2014-01-07 00:00:00
Maximum: 2019-04-22 00:00:00
Histogram with fixed size bins (bins=24)

Correlations

             RID          Hypt_dig     Hypt_date    Heart_dig    Heart_date   cancer_dig   cancer_cd    cancer_date  osteo_dig    osteo_cd     osteo_date
RID          1.000        1.000        1.000        1.000        1.000        1.000        1.000        1.000        1.000        1.000        1.000
Hypt_dig     1.000        1.000        NaN          0.161        1.000        0.000        1.000        1.000        0.000        0.000        0.000
Hypt_date    1.000        NaN          1.000        0.593        1.000        0.730        1.000        1.000        0.700        1.000        0.994
Heart_dig    1.000        0.161        0.593        1.000        NaN          0.106        0.000        1.000        0.000        0.682        1.000
Heart_date   1.000        1.000        1.000        NaN          1.000        1.000        1.000        1.000        1.000        1.000        1.000
cancer_dig   1.000        0.000        0.730        0.106        1.000        1.000        NaN          NaN          0.000        0.301        1.000
cancer_cd    1.000        1.000        1.000        0.000        1.000        NaN          1.000        1.000        0.000        1.000        1.000
cancer_date  1.000        1.000        1.000        1.000        1.000        NaN          1.000        1.000        1.000        1.000        1.000
osteo_dig    1.000        0.000        0.700        0.000        1.000        0.000        0.000        1.000        1.000        NaN          NaN
osteo_cd     1.000        0.000        1.000        0.682        1.000        0.301        1.000        1.000        NaN          1.000        0.979
osteo_date   1.000        0.000        0.994        1.000        1.000        1.000        1.000        1.000        NaN          0.979        1.000

            osteo_dig   Heart_dig   cancer_dig  osteo_cd    Hypt_dig
osteo_dig   1.000       0.000       0.000       1.000       0.000
Heart_dig   0.000       1.000       0.067       0.409       0.103
cancer_dig  0.000       0.067       1.000       0.139       0.000
osteo_cd    1.000       0.409       0.139       1.000       0.000
Hypt_dig    0.000       0.103       0.000       0.000       1.000

            Hypt_dig    Heart_dig   cancer_dig  osteo_dig   osteo_cd
Hypt_dig    1.000       0.103       0.000       0.000       0.000
Heart_dig   0.103       1.000       0.067       0.000       0.409
cancer_dig  0.000       0.067       1.000       0.000       0.139
osteo_dig   0.000       0.000       0.000       1.000       1.000
osteo_cd    0.000       0.409       0.139       1.000       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

    RID       Hypt_dig  Hypt_date            Heart_dig  Heart_date           cancer_dig  cancer_cd  cancer_date          osteo_dig  osteo_cd      osteo_date
0   R0000001  0         <NA>                 1          2015-09-01T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
1   R0000002  1         2014-04-15T00:00:00  0          <NA>                 0           <NA>       <NA>                 0          <NA>          <NA>
2   R0000003  1         2010-10-19T00:00:00  1          2010-10-19T00:00:00  1           C3499      2017-01-13T00:00:00  0          <NA>          <NA>
3   R0000004  1         2012-09-21T00:00:00  1          2012-09-21T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
4   R0000005  1         2014-02-13T00:00:00  0          <NA>                 1           C220       2013-08-30T00:00:00  0          <NA>          <NA>
5   R0000006  1         2002-10-30T00:00:00  1          2002-12-26T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
6   R0000007  1         2014-06-19T00:00:00  0          <NA>                 0           <NA>       <NA>                 1          M8283.000.00  2015-09-01T00:00:00
7   R0000008  1         2011-06-07T00:00:00  1          2015-10-14T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
8   R0000009  1         2014-02-17T00:00:00  0          <NA>                 0           <NA>       <NA>                 1          M8219.44E.00  2014-02-17T00:00:00
9   R0000010  1         2005-08-13T00:00:00  0          <NA>                 0           <NA>       <NA>                 0          <NA>          <NA>

    RID       Hypt_dig  Hypt_date            Heart_dig  Heart_date           cancer_dig  cancer_cd  cancer_date          osteo_dig  osteo_cd      osteo_date
90  R0000091  1         2014-03-10T00:00:00  0          <NA>                 0           <NA>       <NA>                 1          M8219.44E.00  2014-03-10T00:00:00
91  R0000092  1         2008-06-08T00:00:00  1          2008-06-08T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
92  R0000093  1         2015-07-17T00:00:00  0          <NA>                 0           <NA>       <NA>                 0          <NA>          <NA>
93  R0000094  1         2010-07-08T00:00:00  0          <NA>                 0           <NA>       <NA>                 0          <NA>          <NA>
94  R0000095  0         <NA>                 0          <NA>                 0           <NA>       <NA>                 1          M8199.000.01  2015-09-07T00:00:00
95  R0000096  1         2011-04-26T00:00:00  0          <NA>                 0           <NA>       <NA>                 1          M8219.44E.00  2014-02-18T00:00:00
96  R0000097  0         <NA>                 0          <NA>                 0           <NA>       <NA>                 1          M8219.44E.00  2014-04-22T00:00:00
97  R0000098  1         2007-05-08T00:00:00  1          2014-04-17T00:00:00  1           C649       2019-02-14T00:00:00  0          <NA>          <NA>
98  R0000099  1         2015-07-16T00:00:00  1          2005-05-23T00:00:00  0           <NA>       <NA>                 0          <NA>          <NA>
99  R0000100  0         <NA>                 0          <NA>                 0           <NA>       <NA>                 1          M8219.44E.00  2014-02-17T00:00:00
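In the sample rows, each `*_date` column appears to be filled exactly when the matching `*_dig` flag is 1. That would also explain why the missing counts of the date columns equal the counts of 0 in the flag columns. The sketch below checks this pattern on the first five sample rows only, transcribed by hand with `None` standing for `<NA>`. It is an illustrative consistency check, not a guarantee about the full dataset.

```python
COLUMNS = [
    "RID",
    "Hypt_dig", "Hypt_date",
    "Heart_dig", "Heart_date",
    "cancer_dig", "cancer_date",
    "osteo_dig", "osteo_date",
]

# First five sample rows from this report (time-of-day part omitted).
RAW_ROWS = [
    ("R0000001", 0, None, 1, "2015-09-01", 0, None, 0, None),
    ("R0000002", 1, "2014-04-15", 0, None, 0, None, 0, None),
    ("R0000003", 1, "2010-10-19", 1, "2010-10-19", 1, "2017-01-13", 0, None),
    ("R0000004", 1, "2012-09-21", 1, "2012-09-21", 0, None, 0, None),
    ("R0000005", 1, "2014-02-13", 0, None, 1, "2013-08-30", 0, None),
]
rows = [dict(zip(COLUMNS, raw)) for raw in RAW_ROWS]

FLAG_DATE_PAIRS = [
    ("Hypt_dig", "Hypt_date"),
    ("Heart_dig", "Heart_date"),
    ("cancer_dig", "cancer_date"),
    ("osteo_dig", "osteo_date"),
]

# A mismatch is a flag of 1 without a date, or a date without a flag of 1.
mismatches = [
    (row["RID"], flag)
    for row in rows
    for flag, date in FLAG_DATE_PAIRS
    if (row[flag] == 1) != (row[date] is not None)
]

print("mismatches:", mismatches)
```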