Overview

Dataset statistics

Number of variables: 6
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.1 KiB
Average record size in memory: 52.3 B

Variable types

Text: 1
Categorical: 3
Numeric: 1
DateTime: 1

Dataset

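Description: Data for analyzing statin prescriptions in patients with hyperlipidemia, together with the preceding and concomitant drugs prescribed before or after the statin. For statin prescriptions, the total administered dose can be derived from the daily dose and quantity, the number of prescriptions, and the number of days prescribed. Drug prescription data are mapped to RxNorm codes.
- Preceding drug flag: 0 = No, 1 = Yes
- Concomitant drug flag: 0 = No, 1 = Yes
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/precedence-combination-administration-drug-data-dyslipidemia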
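
The description says a total dose can be derived from the daily dose, quantity, number of prescriptions, and days prescribed, but it does not give the formula, and the profiled table contains none of those columns. The sketch below is therefore only an illustration: the field names are hypothetical, and the formula (a plain product of the four fields) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StatinPrescription:
    """One statin prescription record. All field names are hypothetical."""
    daily_dose_mg: float      # dose per unit on a per-day basis (assumed to be mg)
    daily_quantity: float     # number of units taken per day
    prescription_count: int   # how many times the prescription was issued
    days_supplied: int        # days covered by each prescription


def total_dose_mg(rx: StatinPrescription) -> float:
    """Assumed formula: product of the four fields (not stated in the source)."""
    return (rx.daily_dose_mg * rx.daily_quantity
            * rx.prescription_count * rx.days_supplied)


example = StatinPrescription(daily_dose_mg=20.0, daily_quantity=1.0,
                             prescription_count=3, days_supplied=30)
print(total_dose_mg(example))  # 20 mg * 1 unit * 3 prescriptions * 30 days
```

If the source data define these quantities differently (for example, a dose already multiplied by quantity), the product would need to be adjusted accordingly.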

Alerts

comorbidity is highly overall correlated with drug_cd and 2 other fields (High correlation)
drug_exposure is highly overall correlated with drug_cd and 2 other fields (High correlation)
drug_cd is highly overall correlated with drug_name and 2 other fields (High correlation)
drug_name is highly overall correlated with drug_cd and 2 other fields (High correlation)
RID has unique values (Unique)

Reproduction

Analysis started: 2023-10-08 18:56:03.913247
Analysis finished: 2023-10-08 18:56:08.443003
Duration: 4.53 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
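
The reported duration can be checked against the two timestamps. A minimal sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamp layout used in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-10-08 18:56:03.913247", FMT)
finished = datetime.strptime("2023-10-08 18:56:08.443003", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```

The difference is 4.529756 seconds, which rounds to the 4.53 seconds shown in the report.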

Variables

RID
Text

UNIQUE

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
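
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories over the five sample RID values listed in this report. It uses Python's standard `unicodedata` module, which exposes general categories (`Lu` for uppercase letter, `Nd` for decimal number) but not scripts or blocks; it is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# The five sample RID values shown in this report.
sample_rids = ["R0000002", "R0000003", "R0000007", "R0000012", "R0000017"]

# Count the general category of every character in every sample value.
category_counts = Counter(
    unicodedata.category(ch) for rid in sample_rids for ch in rid
)
print(category_counts)
```

Each identifier consists of one uppercase letter followed by seven digits, so the five samples contribute 5 `Lu` and 35 `Nd` characters, in line with the 12.5% / 87.5% split reported for the full column.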

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000002
2nd row: R0000003
3rd row: R0000007
4th row: R0000012
5th row: R0000017

Value              Count  Frequency (%)
r0000002           1      1.0%
r0000318           1      1.0%
r0000423           1      1.0%
r0000421           1      1.0%
r0000379           1      1.0%
r0000363           1      1.0%
r0000360           1      1.0%
r0000359           1      1.0%
r0000357           1      1.0%
r0000352           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      442    55.2%
R      100    12.5%
5      46     5.8%
2      41     5.1%
1      34     4.2%
3      32     4.0%
4      31     3.9%
8      21     2.6%
7      20     2.5%
9      18     2.2%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      442    63.1%
5      46     6.6%
2      41     5.9%
1      34     4.9%
3      32     4.6%
4      31     4.4%
8      21     3.0%
7      20     2.9%
9      18     2.6%
6      15     2.1%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      442    63.1%
5      46     6.6%
2      41     5.9%
1      34     4.9%
3      32     4.6%
4      31     4.4%
8      21     3.0%
7      20     2.9%
9      18     2.6%
6      15     2.1%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      442    55.2%
R      100    12.5%
5      46     5.8%
2      41     5.1%
1      34     4.2%
3      32     4.0%
4      31     3.9%
8      21     2.6%
7      20     2.5%
9      18     2.2%

drug_name
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Fenofibrate: 15
Gemfibrozil: 15
Propranolol: 14
Thyroxine: 14
Warfarin: 14
Other values (8): 28

Length

Max length: 14
Median length: 11
Mean length: 10.16
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Fenofibrate
2nd row: Gemfibrozil
3rd row: Omega-3
4th row: Propranolol
5th row: Thyroxine

Common Values

Value             Count  Frequency (%)
Fenofibrate       15     15.0%
Gemfibrozil       15     15.0%
Propranolol       14     14.0%
Thyroxine         14     14.0%
Warfarin          14     14.0%
Bisphosphonate    14     14.0%
Omega-3           2      2.0%
Omega-4           2      2.0%
Omega-5           2      2.0%
Omega-6           2      2.0%
Other values (3)  6      6.0%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
fenofibrate       15     15.0%
gemfibrozil       15     15.0%
propranolol       14     14.0%
thyroxine         14     14.0%
warfarin          14     14.0%
bisphosphonate    14     14.0%
omega-3           2      2.0%
omega-4           2      2.0%
omega-5           2      2.0%
omega-6           2      2.0%
Other values (3)  6      6.0%

drug_cd
Real number (ℝ)

HIGH CORRELATION

Distinct: 7
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 669002.27
Minimum: 315106
Maximum: 966221
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 315106
5-th percentile: 315106
Q1: 349287
Median: 855302
Q3: 904419
95-th percentile: 966221
Maximum: 966221
Range: 651115
Interquartile range (IQR): 555132

Descriptive statistics

Standard deviation: 263613.51
Coefficient of variation (CV): 0.39403978
Kurtosis: -1.7714708
Mean: 669002.27
Median Absolute Deviation (MAD): 110919
Skewness: -0.28707765
Sum: 66900227
Variance: 6.9492082 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value   Count  Frequency (%)
349287  15     15.0%
315106  15     15.0%
484348  14     14.0%
856448  14     14.0%
966221  14     14.0%
855302  14     14.0%
904419  14     14.0%

Value   Count  Frequency (%)
315106  15     15.0%
349287  15     15.0%
484348  14     14.0%
855302  14     14.0%
856448  14     14.0%
904419  14     14.0%
966221  14     14.0%

Value   Count  Frequency (%)
966221  14     14.0%
904419  14     14.0%
856448  14     14.0%
855302  14     14.0%
484348  14     14.0%
349287  15     15.0%
315106  15     15.0%
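
Because drug_cd takes only seven distinct values, several of the summary statistics above can be recomputed directly from the value counts. The sketch below rebuilds the 100 observations from the frequency table and derives the sum, mean, median, range, and median absolute deviation with the standard library. It is a consistency check, not the profiler's own code.

```python
import statistics

# Value -> count, copied from the drug_cd frequency table.
value_counts = {
    315106: 15, 349287: 15, 484348: 14, 855302: 14,
    856448: 14, 904419: 14, 966221: 14,
}

# Expand the frequency table back into a sorted list of observations.
values = sorted(v for v, n in value_counts.items() for _ in range(n))

total = sum(values)
mean = total / len(values)
median = statistics.median(values)
value_range = max(values) - min(values)
# Median absolute deviation: median of the distances from the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), total, mean, median, value_range, mad)
```

The recomputed sum (66900227), mean (669002.27), median (855302), range (651115), and MAD (110919) agree with the figures reported above. Note that drug_cd holds drug codes, so these statistics describe the code numbers rather than any clinical quantity.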

drug_exposure
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

0: 79
1: 21

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      79     79.0%
1      21     21.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      79     79.0%
1      21     21.0%

drug_start_date
DateTime

Distinct: 85
Distinct (%): 85.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Minimum: 1999-10-05 00:00:00
Maximum: 2018-06-20 00:00:00
Histogram with fixed size bins (bins=50)

comorbidity
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

0: 79
1: 21

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      79     79.0%
1      21     21.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      79     79.0%
1      21     21.0%

Interactions


Correlations

Variable         RID    drug_name  drug_cd  drug_exposure  drug_start_date  comorbidity
RID              1.000  1.000      1.000    1.000          1.000            1.000
drug_name        1.000  1.000      1.000    0.883          0.703            0.883
drug_cd          1.000  1.000      1.000    0.956          0.305            0.956
drug_exposure    1.000  0.883      0.956    1.000          0.000            0.999
drug_start_date  1.000  0.703      0.305    0.000          1.000            0.000
comorbidity      1.000  0.883      0.956    0.999          0.000            1.000

Variable       comorbidity  drug_name  drug_exposure
comorbidity    1.000        0.821      0.970
drug_name      0.821        1.000      0.821
drug_exposure  0.970        0.821      1.000

Variable       drug_cd  drug_name  drug_exposure  comorbidity
drug_cd        1.000    0.952      0.812          0.812
drug_name      0.952    1.000      0.821          0.821
drug_exposure  0.812    0.821      1.000          0.970
comorbidity    0.812    0.821      0.970          1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

Row  RID       drug_name       drug_cd  drug_exposure  drug_start_date  comorbidity
0    R0000002  Fenofibrate     349287   0              2003-10-25       0
1    R0000003  Gemfibrozil     315106   0              2000-03-31       0
2    R0000007  Omega-3         484348   0              2005-05-09       0
3    R0000012  Propranolol     856448   0              2006-01-16       0
4    R0000017  Thyroxine       966221   0              2009-09-17       0
5    R0000018  Warfarin        855302   0              2013-11-21       0
6    R0000021  Bisphosphonate  904419   1              2004-12-23       1
7    R0000026  Fenofibrate     349287   0              2003-07-29       0
8    R0000027  Gemfibrozil     315106   0              2008-08-05       0
9    R0000030  Omega-3         484348   0              2015-09-30       0

Last rows

Row  RID       drug_name       drug_cd  drug_exposure  drug_start_date  comorbidity
90   R0000540  Bisphosphonate  904419   1              2008-10-02       1
91   R0000542  Fenofibrate     349287   0              2007-11-02       0
92   R0000543  Gemfibrozil     315106   0              2006-01-05       0
93   R0000544  Omega-9         484348   0              2007-04-21       0
94   R0000549  Propranolol     856448   0              2009-04-21       0
95   R0000555  Thyroxine       966221   1              2014-10-15       1
96   R0000556  Warfarin        855302   0              2006-08-30       0
97   R0000557  Bisphosphonate  904419   1              2006-02-20       1
98   R0000570  Fenofibrate     349287   0              2013-12-12       0
99   R0000572  Gemfibrozil     315106   0              2006-07-05       0
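
The report lists 13 distinct drug_name values but only 7 distinct drug_cd values, and in the sample rows the code 484348 appears next to both Omega-3 and Omega-9. A short check like the one below, run here only on the 20 sample rows shown in this report, flags codes that map to more than one name. The variable names are illustrative.

```python
from collections import defaultdict

# (drug_name, drug_cd) pairs copied from the 20 sample rows in this report.
sample_rows = [
    ("Fenofibrate", 349287), ("Gemfibrozil", 315106), ("Omega-3", 484348),
    ("Propranolol", 856448), ("Thyroxine", 966221), ("Warfarin", 855302),
    ("Bisphosphonate", 904419), ("Fenofibrate", 349287),
    ("Gemfibrozil", 315106), ("Omega-3", 484348),
    ("Bisphosphonate", 904419), ("Fenofibrate", 349287),
    ("Gemfibrozil", 315106), ("Omega-9", 484348), ("Propranolol", 856448),
    ("Thyroxine", 966221), ("Warfarin", 855302), ("Bisphosphonate", 904419),
    ("Fenofibrate", 349287), ("Gemfibrozil", 315106),
]

# Collect every name seen for each code.
names_by_code = defaultdict(set)
for name, code in sample_rows:
    names_by_code[code].add(name)

# Keep only the codes associated with more than one name.
ambiguous = {
    code: sorted(names)
    for code, names in names_by_code.items()
    if len(names) > 1
}
print(ambiguous)
```

On these rows the only flagged code is 484348. Whether the differing names are intended or a data-entry artifact cannot be determined from the report alone.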