Overview

Dataset statistics

Number of variables: 15
Number of observations: 342
Missing cells: 2093
Missing cells (%): 40.8%
Duplicate rows: 1
Duplicate rows (%): 0.3%
Total size in memory: 42.9 KiB
Average record size in memory: 128.4 B
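
The percentage and per-record figures follow from the raw counts. A minimal sketch of that arithmetic, assuming the size figure is the rounded 42.9 KiB shown here (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 15
n_observations = 342
missing_cells = 2093
total_size_kib = 42.9  # rounded value as printed in the report

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

With 15 × 342 = 5130 cells, 2093 missing cells round to the reported 40.8%.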

Variable types

Numeric: 2
Categorical: 8
Unsupported: 4
Text: 1

Dataset

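Description: Data on the department of first diagnosis, the first diagnosis name, and the diagnosis code for hyperlipidemia patients. The diagnosing departments include gastroenterology, cardiology, and endocrinology, among others, so patient inflow routes can be analyzed. Statin drug data are mapped to RxNorm codes.
Author: The Catholic University of Korea, Eunpyeong St. Mary's Hospital (가톨릭대학교 은평성모병원)
URL: http://cmcdata.net/data/dataset/first-statin-prescription-dyslipidemia-data-eunpyeong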

Alerts

Dataset has 1 (0.3%) duplicate rows [Duplicates]
MED_CD is highly overall correlated with STTN_FST_PRCD and 1 other field [High correlation]
STTN_FST_PRCD is highly overall correlated with MED_CD and 1 other field [High correlation]
Diag_date is highly overall correlated with RID and 4 other fields [High correlation]
STTN_FST_DEPT is highly overall correlated with Diag_date [High correlation]
RID is highly overall correlated with Diag_date [High correlation]
STTN_FST_DAYS is highly overall correlated with Diag_date [High correlation]
Unnamed: 11 is highly overall correlated with Unnamed: 12 and 1 other field [High correlation]
Unnamed: 12 is highly overall correlated with Unnamed: 11 and 2 other fields [High correlation]
Unnamed: 13 is highly overall correlated with Unnamed: 12 [High correlation]
Unnamed: 14 is highly overall correlated with Unnamed: 11 and 1 other field [High correlation]
STTN_FST_DEPT is highly imbalanced (56.8%) [Imbalance]
Unnamed: 11 is highly imbalanced (57.8%) [Imbalance]
Unnamed: 14 is highly imbalanced (54.7%) [Imbalance]
RID has 242 (70.8%) missing values [Missing]
STTN_FST_DAYS has 242 (70.8%) missing values [Missing]
Unnamed: 6 has 342 (100.0%) missing values [Missing]
Unnamed: 7 has 342 (100.0%) missing values [Missing]
Unnamed: 8 has 342 (100.0%) missing values [Missing]
Unnamed: 9 has 342 (100.0%) missing values [Missing]
Unnamed: 10 has 241 (70.5%) missing values [Missing]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-10-08 18:58:10.134282
Analysis finished: 2023-10-08 18:58:12.385872
Duration: 2.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 100
Distinct (%): 100.0%
Missing: 242
Missing (%): 70.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
65                 1      0.3%
75                 1      0.3%
74                 1      0.3%
73                 1      0.3%
72                 1      0.3%
71                 1      0.3%
70                 1      0.3%
69                 1      0.3%
68                 1      0.3%
67                 1      0.3%
Other values (90)  90     26.3%
(Missing)          242    70.8%

Value  Count  Frequency (%)
1      1      0.3%
2      1      0.3%
3      1      0.3%
4      1      0.3%
5      1      0.3%
6      1      0.3%
7      1      0.3%
8      1      0.3%
9      1      0.3%
10     1      0.3%

Value  Count  Frequency (%)
100    1      0.3%
99     1      0.3%
98     1      0.3%
97     1      0.3%
96     1      0.3%
95     1      0.3%
94     1      0.3%
93     1      0.3%
92     1      0.3%
91     1      0.3%
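
The RID statistics are what one would expect from a plain running index. The sketch below assumes RID is exactly the integers 1 to 100 (consistent with 100 distinct values, minimum 1, maximum 100, sum 5050 and strict monotonicity, but not stated outright in the report) and recomputes the reported figures. The `quantile` helper is an illustrative linear-interpolation implementation, not the profiler's own code.

```python
import statistics

rid = list(range(1, 101))  # assumption: RID is exactly 1, 2, ..., 100

def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = (len(sorted_values) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

mean = statistics.mean(rid)
variance = statistics.variance(rid)   # sample variance (n - 1 denominator)
std = statistics.stdev(rid)
cv = std / mean                       # coefficient of variation
q1, q3 = quantile(rid, 0.25), quantile(rid, 0.75)

print(f"mean={mean}, variance={variance:.5f}, std={std:.6f}, cv={cv:.8f}")
print(f"p5={quantile(rid, 0.05):.2f}, Q1={q1}, Q3={q3}, IQR={q3 - q1}")
```

That the sample variance (not the population variance, which would be 833.25) matches the reported 841.66667 suggests the report uses the n − 1 denominator.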

STTN_FST_DEPT
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 10
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value             Count
<NA>              242
순환기내과        50
내분비내과        33
외과              4
호흡기내과        3
Other values (5)  10

Length

Max length: 10
Median length: 4
Mean length: 4.2748538
Min length: 2

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 내분비내과
2nd row: 내분비내과
3rd row: 내분비내과
4th row: 내분비내과
5th row: 내분비내과

Common Values

Value                 English gloss               Count  Frequency (%)
<NA>                  -                           242    70.8%
순환기내과            Cardiology                  50     14.6%
내분비내과            Endocrinology               33     9.6%
외과                  Surgery                     4      1.2%
호흡기내과            Pulmonology                 3      0.9%
신장내과              Nephrology                  3      0.9%
소화기내과            Gastroenterology            2      0.6%
재활의학과            Rehabilitation medicine     2      0.6%
관절센터(정형외과)    Joint center (orthopedics)  2      0.6%
흉부외과              Thoracic surgery            1      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
na                  242    70.8%
순환기내과          50     14.6%
내분비내과          33     9.6%
외과                4      1.2%
호흡기내과          3      0.9%
신장내과            3      0.9%
소화기내과          2      0.6%
재활의학과          2      0.6%
관절센터(정형외과   2      0.6%
흉부외과            1      0.3%
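
The 56.8% imbalance alert for STTN_FST_DEPT can be reproduced from the category counts with a normalized-entropy score. This is a sketch of one definition that happens to match the reported figure; whether it is exactly what the profiler computes is an assumption, and `imbalance_score` is an illustrative name.

```python
import math

# Category counts for STTN_FST_DEPT, copied from its "Common Values" table.
# The "<NA>" string is counted as a category, as the report does.
dept_counts = [242, 50, 33, 4, 3, 3, 2, 2, 2, 1]

def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy): 0 is balanced, 1 is a single class."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    max_entropy = math.log2(len(counts))
    return 1 - entropy / max_entropy if max_entropy > 0 else 0.0

score = imbalance_score(dept_counts)
print(f"imbalance: {score:.1%}")
```

Most of the score comes from the 242 "<NA>" entries, so the imbalance says more about the sheet layout than about how patients are spread across departments.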

STTN_FST_PRCD
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value         Count
<NA>          242
Rosuvastatin  53
Atorvastatin  47

Length

Max length: 12
Median length: 4
Mean length: 6.3391813
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Rosuvastatin
2nd row: Rosuvastatin
3rd row: Atorvastatin
4th row: Rosuvastatin
5th row: Rosuvastatin

Common Values

Value         Count  Frequency (%)
<NA>          242    70.8%
Rosuvastatin  53     15.5%
Atorvastatin  47     13.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
na            242    70.8%
rosuvastatin  53     15.5%
atorvastatin  47     13.7%

MED_CD
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value   Count
<NA>    242
859747  53
617310  47

Length

Max length: 6
Median length: 4
Mean length: 4.5847953
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 859747
2nd row: 859747
3rd row: 617310
4th row: 859747
5th row: 859747

Common Values

Value   Count  Frequency (%)
<NA>    242    70.8%
859747  53     15.5%
617310  47     13.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
na      242    70.8%
859747  53     15.5%
617310  47     13.7%
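
MED_CD and STTN_FST_PRCD have identical count profiles (53 and 47), which is what a one-to-one code-to-drug mapping would produce and would explain the high-correlation alert. A small sketch of how that could be checked, using only the ten (drug, code) pairs visible in the report's Sample section; `is_one_to_one` is a hypothetical helper.

```python
# (STTN_FST_PRCD, MED_CD) pairs copied from the first ten rows of the Sample section.
sample_pairs = [
    ("Rosuvastatin", "859747"),
    ("Rosuvastatin", "859747"),
    ("Atorvastatin", "617310"),
    ("Rosuvastatin", "859747"),
    ("Rosuvastatin", "859747"),
    ("Rosuvastatin", "859747"),
    ("Atorvastatin", "617310"),
    ("Atorvastatin", "617310"),
    ("Rosuvastatin", "859747"),
    ("Rosuvastatin", "859747"),
]

def is_one_to_one(pairs):
    """True if every left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True

print(is_one_to_one(sample_pairs))
```

If the mapping holds for all 100 records, one of the two columns is redundant for modelling and can be dropped or kept only as a label.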

Diag_date
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value  Count
<NA>   242
2015   100

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015
2nd row: 2015
3rd row: 2015
4th row: 2015
5th row: 2015

Common Values

Value  Count  Frequency (%)
<NA>   242    70.8%
2015   100    29.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     242    70.8%
2015   100    29.2%

STTN_FST_DAYS
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 16
Distinct (%): 16.0%
Missing: 242
Missing (%): 70.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.68
Minimum: 1
Maximum: 120
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 33.75
Median: 63
Q3: 90
95-th percentile: 100
Maximum: 120
Range: 119
Interquartile range (IQR): 56.25

Descriptive statistics

Standard deviation: 30.415267
Coefficient of variation (CV): 0.5096392
Kurtosis: -0.52815551
Mean: 59.68
Median Absolute Deviation (MAD): 27
Skewness: -0.27934754
Sum: 5968
Variance: 925.08848
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=16)

Value             Count  Frequency (%)
90                23     6.7%
60                14     4.1%
30                13     3.8%
70                11     3.2%
1                 9      2.6%
63                8      2.3%
35                5      1.5%
120               4      1.2%
65                3      0.9%
56                2      0.6%
Other values (6)  8      2.3%
(Missing)         242    70.8%

Value  Count  Frequency (%)
1      9      2.6%
14     1      0.3%
21     1      0.3%
28     1      0.3%
30     13     3.8%
35     5      1.5%
40     2      0.6%
56     2      0.6%
60     14     4.1%
63     8      2.3%

Value  Count  Frequency (%)
120    4      1.2%
100    2      0.6%
90     23     6.7%
80     1      0.3%
70     11     3.2%
65     3      0.9%
63     8      2.3%
60     14     4.1%
56     2      0.6%
40     2      0.6%
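
Taken together, the three STTN_FST_DAYS frequency tables list all 16 distinct values, so the full distribution can be rebuilt and the summary statistics cross-checked. A sketch, assuming the merged counts below are transcribed correctly (the dictionary is assembled by hand from the tables above):

```python
import statistics

# value -> count, merged from the three STTN_FST_DAYS frequency tables above
days_counts = {
    1: 9, 14: 1, 21: 1, 28: 1, 30: 13, 35: 5, 40: 2, 56: 2,
    60: 14, 63: 8, 65: 3, 70: 11, 80: 1, 90: 23, 100: 2, 120: 4,
}

# Expand the table back into one value per record.
days = sorted(v for v, c in days_counts.items() for _ in range(c))

mean = statistics.mean(days)
variance = statistics.variance(days)     # sample variance (n - 1 denominator)
median = statistics.median(days)
mad = statistics.median(abs(d - median) for d in days)  # median absolute deviation

print(f"n={len(days)}, sum={sum(days)}, mean={mean}")
print(f"variance={variance:.5f}, median={median}, MAD={mad}")
```

The most frequent values (30, 60, 90) sit on month-like multiples, which looks like standard prescription lengths; that reading is an interpretation, not something the report states.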

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 342
Missing (%): 100.0%
Memory size: 3.1 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 342
Missing (%): 100.0%
Memory size: 3.1 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 342
Missing (%): 100.0%
Memory size: 3.1 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 342
Missing (%): 100.0%
Memory size: 3.1 KiB

Unnamed: 10
Text

MISSING 

Distinct: 101
Distinct (%): 100.0%
Missing: 241
Missing (%): 70.5%
Memory size: 2.8 KiB

Length

Max length: 8
Median length: 8
Mean length: 7.950495
Min length: 3

Characters and Unicode

Total characters: 803
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 101
Unique (%): 100.0%

Sample

1st row: RID
2nd row: R0000202
3rd row: R0000203
4th row: R0000215
5th row: R0000218

Value              Count  Frequency (%)
r0000249           1      1.0%
r0000529           1      1.0%
r0000515           1      1.0%
r0000514           1      1.0%
r0000511           1      1.0%
r0000508           1      1.0%
r0000503           1      1.0%
r0000500           1      1.0%
r0000491           1      1.0%
r0000489           1      1.0%
Other values (91)  91     90.1%

Most occurring characters

Value             Count  Frequency (%)
0                 417    51.9%
R                 101    12.6%
5                 54     6.7%
4                 49     6.1%
2                 42     5.2%
3                 39     4.9%
8                 22     2.7%
1                 21     2.6%
7                 21     2.6%
6                 18     2.2%
Other values (3)  19     2.4%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.2%
Uppercase Letter  103    12.8%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      417    59.6%
5      54     7.7%
4      49     7.0%
2      42     6.0%
3      39     5.6%
8      22     3.1%
1      21     3.0%
7      21     3.0%
6      18     2.6%
9      17     2.4%
Uppercase Letter
Value  Count  Frequency (%)
R      101    98.1%
I      1      1.0%
D      1      1.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.2%
Latin   103    12.8%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      417    59.6%
5      54     7.7%
4      49     7.0%
2      42     6.0%
3      39     5.6%
8      22     3.1%
1      21     3.0%
7      21     3.0%
6      18     2.6%
9      17     2.4%
Latin
Value  Count  Frequency (%)
R      101    98.1%
I      1      1.0%
D      1      1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  803    100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 417    51.9%
R                 101    12.6%
5                 54     6.7%
4                 49     6.1%
2                 42     5.2%
3                 39     4.9%
8                 22     2.7%
1                 21     2.6%
7                 21     2.6%
6                 18     2.2%
Other values (3)  19     2.4%
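
The character counts for Unnamed: 10 are consistent with 100 identifiers of the form "R" plus seven digits, together with one stray header cell "RID". The sketch below redoes that bookkeeping. The eight-character identifier format is an assumption inferred from the sample rows and the length statistics.

```python
n_ids = 100       # assumption: 100 identifiers shaped like "R0000202"
id_length = 8     # one uppercase letter followed by seven digits
header = "RID"    # the header cell that shows up as the column's first value

n_values = n_ids + 1                              # identifiers plus the header cell
total_chars = n_ids * id_length + len(header)
mean_length = total_chars / n_values
digit_chars = n_ids * (id_length - 1)             # "Decimal Number" category
letter_chars = n_ids + len(header)                # "Uppercase Letter" category
r_count = n_ids + header.count("R")

print(n_values, total_chars, round(mean_length, 6))
print(digit_chars, letter_chars, r_count)
```

The single "I" and "D" in the character tables therefore come from the header cell alone, which supports treating that first value as a header rather than as data.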

Unnamed: 11
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 11
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value             Count
<NA>              241
순환기내과        50
내분비내과        33
외과              4
호흡기내과        3
Other values (6)  11

Length

Max length: 10
Median length: 4
Mean length: 4.2807018
Min length: 2

Unique

Unique: 2
Unique (%): 0.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                 Count  Frequency (%)
<NA>                  241    70.5%
순환기내과            50     14.6%
내분비내과            33     9.6%
외과                  4      1.2%
호흡기내과            3      0.9%
신장내과              3      0.9%
소화기내과            2      0.6%
재활의학과            2      0.6%
관절센터(정형외과)    2      0.6%
Dept_S                1      0.3%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
na                  241    70.5%
순환기내과          50     14.6%
내분비내과          33     9.6%
외과                4      1.2%
호흡기내과          3      0.9%
신장내과            3      0.9%
소화기내과          2      0.6%
재활의학과          2      0.6%
관절센터(정형외과   2      0.6%
dept_s              1      0.3%

Unnamed: 12
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value         Count
<NA>          241
Rosuvastatin  53
Atorvastatin  47
Sta_S         1

Length

Max length: 12
Median length: 4
Mean length: 6.3421053
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value         Count  Frequency (%)
<NA>          241    70.5%
Rosuvastatin  53     15.5%
Atorvastatin  47     13.7%
Sta_S         1      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
na            241    70.5%
rosuvastatin  53     15.5%
atorvastatin  47     13.7%
sta_s         1      0.3%

Unnamed: 13
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value             Count
<NA>              241
2015-09-07        35
2015-09-04        28
2015-09-03        22
2015-09-05        9
Other values (2)  7

Length

Max length: 10
Median length: 4
Mean length: 5.7602339
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value       Count  Frequency (%)
<NA>        241    70.5%
2015-09-07  35     10.2%
2015-09-04  28     8.2%
2015-09-03  22     6.4%
2015-09-05  9      2.6%
2015-09-02  6      1.8%
Sta_SD      1      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
na          241    70.5%
2015-09-07  35     10.2%
2015-09-04  28     8.2%
2015-09-03  22     6.4%
2015-09-05  9      2.6%
2015-09-02  6      1.8%
sta_sd      1      0.3%

Unnamed: 14
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 18
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Value              Count
<NA>               241
90                 23
60                 14
30                 13
70                 11
Other values (13)  40

Length

Max length: 6
Median length: 4
Mean length: 3.4122807
Min length: 1

Unique

Unique: 5
Unique (%): 1.5%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value             Count  Frequency (%)
<NA>              241    70.5%
90                23     6.7%
60                14     4.1%
30                13     3.8%
70                11     3.2%
1                 9      2.6%
63                8      2.3%
35                5      1.5%
120               4      1.2%
65                3      0.9%
Other values (8)  11     3.2%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
na                241    70.5%
90                23     6.7%
60                14     4.1%
30                13     3.8%
70                11     3.2%
1                 9      2.6%
63                8      2.3%
35                5      1.5%
120               4      1.2%
65                3      0.9%
Other values (8)  11     3.2%

Interactions

[Figures: four interaction plots; images not preserved]

Correlations

Variable       RID    STTN_FST_DEPT  STTN_FST_PRCD  MED_CD  STTN_FST_DAYS  Unnamed: 11  Unnamed: 12  Unnamed: 13  Unnamed: 14
RID            1.000  0.000          0.000          0.000   0.452          NaN          NaN          NaN          NaN
STTN_FST_DEPT  0.000  1.000          0.349          0.349   0.390          NaN          NaN          NaN          NaN
STTN_FST_PRCD  0.000  0.349          1.000          0.999   0.000          NaN          NaN          NaN          NaN
MED_CD         0.000  0.349          0.999          1.000   0.000          NaN          NaN          NaN          NaN
STTN_FST_DAYS  0.452  0.390          0.000          0.000   1.000          NaN          NaN          NaN          NaN
Unnamed: 11    NaN    NaN            NaN            NaN     NaN            1.000        0.836        0.696        0.849
Unnamed: 12    NaN    NaN            NaN            NaN     NaN            0.836        1.000        0.941        0.850
Unnamed: 13    NaN    NaN            NaN            NaN     NaN            0.696        0.941        1.000        0.753
Unnamed: 14    NaN    NaN            NaN            NaN     NaN            0.849        0.850        0.753        1.000

Variable       Unnamed: 11  Unnamed: 14  Unnamed: 12  MED_CD  Unnamed: 13  STTN_FST_PRCD  Diag_date  STTN_FST_DEPT
Unnamed: 11    1.000        0.522        0.719        NaN     0.449        NaN            NaN        NaN
Unnamed: 14    0.522        1.000        0.650        NaN     0.450        NaN            NaN        NaN
Unnamed: 12    0.719        0.650        1.000        NaN     0.695        NaN            NaN        NaN
MED_CD         NaN          NaN          NaN          1.000   NaN          0.980          1.000      0.335
Unnamed: 13    0.449        0.450        0.695        NaN     1.000        NaN            NaN        NaN
STTN_FST_PRCD  NaN          NaN          NaN          0.980   NaN          1.000          1.000      0.335
Diag_date      NaN          NaN          NaN          1.000   NaN          1.000          1.000      1.000
STTN_FST_DEPT  NaN          NaN          NaN          0.335   NaN          0.335          1.000      1.000

Variable       RID    STTN_FST_DAYS  STTN_FST_DEPT  STTN_FST_PRCD  MED_CD  Diag_date  Unnamed: 11  Unnamed: 12  Unnamed: 13  Unnamed: 14
RID            1.000  0.029          0.000          0.000          0.000   1.000      0.000        0.000        0.000        0.000
STTN_FST_DAYS  0.029  1.000          0.185          0.000          0.000   1.000      0.000        0.000        0.000        0.000
STTN_FST_DEPT  0.000  0.185          1.000          0.335          0.335   1.000      0.000        0.000        0.000        0.000
STTN_FST_PRCD  0.000  0.000          0.335          1.000          0.980   1.000      0.000        0.000        0.000        0.000
MED_CD         0.000  0.000          0.335          0.980          1.000   1.000      0.000        0.000        0.000        0.000
Diag_date      1.000  1.000          1.000          1.000          1.000   1.000      0.000        0.000        0.000        0.000
Unnamed: 11    0.000  0.000          0.000          0.000          0.000   0.000      1.000        0.719        0.449        0.522
Unnamed: 12    0.000  0.000          0.000          0.000          0.000   0.000      0.719        1.000        0.695        0.650
Unnamed: 13    0.000  0.000          0.000          0.000          0.000   0.000      0.449        0.695        1.000        0.450
Unnamed: 14    0.000  0.000          0.000          0.000          0.000   0.000      0.522        0.650        0.450        1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index  RID   STTN_FST_DEPT  STTN_FST_PRCD  MED_CD  Diag_date  STTN_FST_DAYS  Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  Unnamed: 10  Unnamed: 11  Unnamed: 12  Unnamed: 13  Unnamed: 14
0      1     내분비내과     Rosuvastatin   859747  2015       60             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
1      2     내분비내과     Rosuvastatin   859747  2015       30             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
2      3     내분비내과     Atorvastatin   617310  2015       30             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
3      4     내분비내과     Rosuvastatin   859747  2015       30             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
4      5     내분비내과     Rosuvastatin   859747  2015       90             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
5      6     순환기내과     Rosuvastatin   859747  2015       60             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
6      7     소화기내과     Atorvastatin   617310  2015       56             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
7      8     순환기내과     Atorvastatin   617310  2015       120            <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
8      9     순환기내과     Rosuvastatin   859747  2015       100            <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>
9      10    순환기내과     Rosuvastatin   859747  2015       21             <NA>        <NA>        <NA>        <NA>        <NA>         <NA>         <NA>         <NA>         <NA>

Index  RID   STTN_FST_DEPT  STTN_FST_PRCD  MED_CD  Diag_date  STTN_FST_DAYS  Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  Unnamed: 10  Unnamed: 11  Unnamed: 12   Unnamed: 13  Unnamed: 14
332    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000567     순환기내과   Rosuvastatin  2015-09-07   14
333    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000568     내분비내과   Atorvastatin  2015-09-07   40
334    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000572     순환기내과   Rosuvastatin  2015-09-07   30
335    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000573     순환기내과   Atorvastatin  2015-09-07   70
336    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000579     순환기내과   Rosuvastatin  2015-09-07   90
337    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000580     순환기내과   Atorvastatin  2015-09-07   60
338    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000586     신장내과     Atorvastatin  2015-09-07   35
339    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000592     순환기내과   Atorvastatin  2015-09-07   60
340    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000595     내분비내과   Atorvastatin  2015-09-07   60
341    <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>        <NA>        <NA>        <NA>        R0000597     내분비내과   Atorvastatin  2015-09-07   60

Duplicate rows

Most frequently occurring

Index  RID   STTN_FST_DEPT  STTN_FST_PRCD  MED_CD  Diag_date  STTN_FST_DAYS  Unnamed: 10  Unnamed: 11  Unnamed: 12  Unnamed: 13  Unnamed: 14  # duplicates
0      <NA>  <NA>           <NA>           <NA>    <NA>       <NA>           <NA>         <NA>         <NA>         <NA>         <NA>         141
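
The figures in this report suggest that the source sheet holds two tables stacked diagonally: 100 rows filled only in the first six columns, 141 fully empty rows (the duplicate entry above), and 101 rows filled only in Unnamed: 10 to Unnamed: 14 (one embedded header row plus 100 records), which adds up to the 342 observations. A minimal sketch of separating them before profiling follows. The row data is a tiny hand-made stand-in, `pad` and `split_tables` are hypothetical helpers, and "COL14" is a placeholder because the header of the last column is not shown in the report.

```python
def pad(left=None, right=None):
    """Build one 15-cell row: 6 left cells, 4 empty spacer cells, 5 right cells."""
    return (left or [""] * 6) + [""] * 4 + (right or [""] * 5)

# Stand-in for the rows a CSV reader would yield (values taken from the Sample section).
rows = [
    pad(left=["1", "내분비내과", "Rosuvastatin", "859747", "2015", "60"]),
    pad(left=["2", "내분비내과", "Rosuvastatin", "859747", "2015", "30"]),
    pad(),  # fully empty row, like the 141 duplicates
    pad(right=["RID", "Dept_S", "Sta_S", "Sta_SD", "COL14"]),  # embedded header
    pad(right=["R0000567", "순환기내과", "Rosuvastatin", "2015-09-07", "14"]),
]

def split_tables(rows, left=slice(0, 6), right=slice(10, 15)):
    """Separate the two blocks and drop rows that are empty in a given block."""
    left_rows = [row[left] for row in rows if any(row[left])]
    right_rows = [row[right] for row in rows if any(row[right])]
    # The right-hand block carries its own header as its first non-empty row.
    right_header, *right_body = right_rows
    return left_rows, right_header, right_body

left_rows, right_header, right_body = split_tables(rows)
print(len(left_rows), right_header, len(right_body))
```

Profiling each block separately would remove the 70% missing-value alerts and the NaN blocks in the correlation matrices, since those appear to be artifacts of the layout rather than properties of the data.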