Overview

Dataset statistics

Number of variables: 6
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.1 KiB
Average record size in memory: 52.3 B

Variable types

Text: 1
Categorical: 4
Numeric: 1

Dataset

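Description: Data on the first diagnosing department, first diagnosis name, and diagnosis code for patients with hyperlipidemia. The diagnosing departments include gastroenterology, cardiology, and endocrinology, among others, so patient referral pathways can be analyzed. Statin drug data are mapped to RxNorm codes.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/first-statin-prescription-dyslipidemia-data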

Alerts

Med_CD is highly overall correlated with STTN_FST_DAYS and 2 other fields (High correlation)
STTN_FST_DEPT is highly overall correlated with STTN_FST_PRCD and 1 other field (High correlation)
STTN_FST_PRCD is highly overall correlated with STTN_FST_DAYS and 2 other fields (High correlation)
STTN_FST_DAYS is highly overall correlated with STTN_FST_PRCD and 1 other field (High correlation)
Diag_date is highly imbalanced (91.9%) (Imbalance)
RID has unique values (Unique)
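
The 91.9% imbalance figure for Diag_date can be reproduced from that variable's value counts (99 and 1). The sketch below assumes the score is one minus the Shannon entropy of the value distribution divided by the maximum entropy for that number of classes. That definition is an assumption that happens to match the reported number; the report itself does not state it, and the name imbalance_score is illustrative only.

```python
from math import log2


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    Assumed definition: 0.0 means perfectly balanced, 1.0 means a single class.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / log2(k)


# Diag_date value counts from this report: 2006 occurs 99 times, 2005 once.
score = imbalance_score([99, 1])
print(f"Diag_date imbalance: {score:.1%}")
```

A balanced two-class variable would score 0.0 under the same definition, so a value above 0.9 indicates that nearly all rows share one category.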

Reproduction

Analysis started: 2023-10-08 18:57:45.562716
Analysis finished: 2023-10-08 18:57:46.738335
Duration: 1.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
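
As a minimal sketch of how such character properties can be read, the snippet below uses Python's standard unicodedata module on the five sample RID values shown in this report. It covers only the general category ("Lu" is Uppercase Letter, "Nd" is Decimal Number); the standard library does not expose script or block names, so those are left out. Because it uses five rows rather than all 100, the totals differ from the figures above.

```python
import unicodedata
from collections import Counter

# The five sample RID values listed in this report.
sample = ["R0000001", "R0000002", "R0000004", "R0000010", "R0000015"]
text = "".join(sample)

# Tally each character's Unicode general category.
category_counts = Counter(unicodedata.category(ch) for ch in text)

total_characters = len(text)
distinct_characters = len(set(text))

print("Total characters:", total_characters)
print("Distinct characters:", distinct_characters)
print("Categories:", dict(category_counts))
```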

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000002
3rd row: R0000004
4th row: R0000010
5th row: R0000015
Value              Count  Frequency (%)
r0000001           1      1.0%
r0001724           1      1.0%
r0002117           1      1.0%
r0002092           1      1.0%
r0002070           1      1.0%
r0002047           1      1.0%
r0002044           1      1.0%
r0002023           1      1.0%
r0002001           1      1.0%
r0002000           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      391    48.9%
R      100    12.5%
2      65     8.1%
1      51     6.4%
3      38     4.8%
4      28     3.5%
7      28     3.5%
8      27     3.4%
9      27     3.4%
5      24     3.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      391    55.9%
2      65     9.3%
1      51     7.3%
3      38     5.4%
4      28     4.0%
7      28     4.0%
8      27     3.9%
9      27     3.9%
5      24     3.4%
6      21     3.0%

Uppercase Letter

Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      391    55.9%
2      65     9.3%
1      51     7.3%
3      38     5.4%
4      28     4.0%
7      28     4.0%
8      27     3.9%
9      27     3.9%
5      24     3.4%
6      21     3.0%

Latin

Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0      391    48.9%
R      100    12.5%
2      65     8.1%
1      51     6.4%
3      38     4.8%
4      28     3.5%
7      28     3.5%
8      27     3.4%
9      27     3.4%
5      24     3.0%

STTN_FST_DEPT
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value                        Count
신장내과 (Nephrology)            47
내과 (Internal Medicine)        25
신경과 (Neurology)              22
흉부외과 (Thoracic Surgery)      4
외과 (Surgery)                  1

Length

Max length: 5
Median length: 4
Mean length: 3.27
Min length: 2

Unique

Unique: 2
Unique (%): 2.0%
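
The report gives both a Distinct count (6) and a Unique count (2) for this variable. A reading consistent with the value counts listed for STTN_FST_DEPT is that "distinct" counts the different values present, while "unique" counts the values that occur exactly once. The sketch below applies that assumed reading to the reported counts.

```python
from collections import Counter

# Value counts for STTN_FST_DEPT as listed in this report.
counts = Counter({
    "신장내과": 47,    # Nephrology
    "내과": 25,        # Internal Medicine
    "신경과": 22,      # Neurology
    "흉부외과": 4,     # Thoracic Surgery
    "외과": 1,         # Surgery
    "응급의학과": 1,   # Emergency Medicine
})

n_rows = sum(counts.values())

# Distinct: how many different values appear at all.
distinct = len(counts)
# Unique: how many values appear exactly once (assumed reading).
unique = sum(1 for c in counts.values() if c == 1)

print(f"Distinct: {distinct} ({100 * distinct / n_rows:.1f}%)")
print(f"Unique: {unique} ({100 * unique / n_rows:.1f}%)")
```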

Sample

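1st row: 신장내과 (Nephrology)
2nd row: 내과 (Internal Medicine)
3rd row: 내과 (Internal Medicine)
4th row: 내과 (Internal Medicine)
5th row: 내과 (Internal Medicine)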

Common Values

Value                            Count  Frequency (%)
신장내과 (Nephrology)              47     47.0%
내과 (Internal Medicine)          25     25.0%
신경과 (Neurology)                22     22.0%
흉부외과 (Thoracic Surgery)        4      4.0%
외과 (Surgery)                    1      1.0%
응급의학과 (Emergency Medicine)    1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                            Count  Frequency (%)
신장내과 (Nephrology)              47     47.0%
내과 (Internal Medicine)          25     25.0%
신경과 (Neurology)                22     22.0%
흉부외과 (Thoracic Surgery)        4      4.0%
외과 (Surgery)                    1      1.0%
응급의학과 (Emergency Medicine)    1      1.0%

STTN_FST_PRCD
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value         Count
Atorvastatin  75
Rosuvastatin  25

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Atorvastatin
2nd row: Rosuvastatin
3rd row: Rosuvastatin
4th row: Rosuvastatin
5th row: Rosuvastatin

Common Values

Value         Count  Frequency (%)
Atorvastatin  75     75.0%
Rosuvastatin  25     25.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
atorvastatin  75     75.0%
rosuvastatin  25     25.0%

Med_CD
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value   Count
617310  69
859747  25
617312  5
617311  1

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 617310
2nd row: 859747
3rd row: 859747
4th row: 859747
5th row: 859747

Common Values

Value   Count  Frequency (%)
617310  69     69.0%
859747  25     25.0%
617312  5      5.0%
617311  1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
617310  69     69.0%
859747  25     25.0%
617312  5      5.0%
617311  1      1.0%

Diag_date
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
2006   99
2005   1

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 2005
2nd row: 2006
3rd row: 2006
4th row: 2006
5th row: 2006

Common Values

Value  Count  Frequency (%)
2006   99     99.0%
2005   1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2006   99     99.0%
2005   1      1.0%

STTN_FST_DAYS
Real number (ℝ)

HIGH CORRELATION 

Distinct: 22
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.84
Minimum: 1
Maximum: 120
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 5.5
Median: 40
Q3: 60
95th percentile: 90
Maximum: 120
Range: 119
Interquartile range (IQR): 54.5
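
A Q1 of 5.5 is not a value that occurs in the data, which suggests the quantiles are interpolated between neighbouring observations. The sketch below assumes linear interpolation at position q * (n - 1) in the sorted sample, and uses the ten smallest distinct values and their counts as listed for STTN_FST_DAYS elsewhere in this report (49 of the 100 observations, enough to cover the lower quartile). The function name and the prefix-based approach are illustrative; the report does not state which interpolation method was used.

```python
from math import floor


def quantile_linear(sorted_prefix, n, q):
    """Linear-interpolation quantile of an n-element sample.

    Only the smallest values (in ascending order) are supplied, so the
    requested quantile must fall inside that prefix.
    """
    pos = q * (n - 1)          # fractional index into the sorted sample
    lo = floor(pos)
    hi = min(lo + 1, n - 1)
    if hi >= len(sorted_prefix):
        raise ValueError("prefix too short for this quantile")
    frac = pos - lo
    return sorted_prefix[lo] + frac * (sorted_prefix[hi] - sorted_prefix[lo])


# (value, count) pairs for the ten smallest STTN_FST_DAYS values in this report.
smallest_counts = [(1, 25), (7, 1), (16, 1), (17, 1), (20, 1),
                   (21, 1), (28, 1), (30, 12), (35, 5), (37, 1)]
prefix = [value for value, count in smallest_counts for _ in range(count)]

q1 = quantile_linear(prefix, n=100, q=0.25)
p5 = quantile_linear(prefix, n=100, q=0.05)
print("Q1:", q1)
print("5th percentile:", p5)
```

Under this assumption Q1 falls three quarters of the way between the 25th observation (1) and the 26th (7), which matches the reported 5.5.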

Descriptive statistics

Standard deviation: 29.063747
Coefficient of variation (CV): 0.72951173
Kurtosis: -0.77924987
Mean: 39.84
Median Absolute Deviation (MAD): 20
Skewness: 0.10017075
Sum: 3984
Variance: 844.70141
Monotonicity: Not monotonic
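
Several of these figures are tied together by simple identities, which gives a quick consistency check on the report. The sketch below recomputes the mean, coefficient of variation, variance, IQR, and range from the other reported values for STTN_FST_DAYS. Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values as reported for STTN_FST_DAYS.
n = 100
total = 3984
mean = 39.84
std = 29.063747
q1, q3 = 5.5, 60
minimum, maximum = 1, 120

derived = {
    "mean": total / n,            # sum divided by number of observations
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # square of the standard deviation
    "iqr": q3 - q1,               # interquartile range
    "range": maximum - minimum,
}

for name, value in derived.items():
    print(f"{name}: {value:.6f}")
```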
Histogram with fixed size bins (bins=22)
Value              Count  Frequency (%)
1                  25     25.0%
60                 23     23.0%
30                 12     12.0%
90                 7      7.0%
35                 5      5.0%
63                 4      4.0%
65                 4      4.0%
50                 3      3.0%
70                 3      3.0%
40                 2      2.0%
Other values (12)  12     12.0%

Value  Count  Frequency (%)
1      25     25.0%
7      1      1.0%
16     1      1.0%
17     1      1.0%
20     1      1.0%
21     1      1.0%
28     1      1.0%
30     12     12.0%
35     5      5.0%
37     1      1.0%

Value  Count  Frequency (%)
120    1      1.0%
90     7      7.0%
70     3      3.0%
65     4      4.0%
63     4      4.0%
60     23     23.0%
56     1      1.0%
53     1      1.0%
50     3      3.0%
45     1      1.0%

Interactions


Correlations

               RID    STTN_FST_DEPT  STTN_FST_PRCD  Med_CD  Diag_date  STTN_FST_DAYS
RID            1.000  1.000          1.000          1.000   1.000      1.000
STTN_FST_DEPT  1.000  1.000          1.000          1.000   0.000      0.645
STTN_FST_PRCD  1.000  1.000          1.000          1.000   0.000      0.991
Med_CD         1.000  1.000          1.000          1.000   0.000      0.842
Diag_date      1.000  0.000          0.000          0.000   1.000      0.000
STTN_FST_DAYS  1.000  0.645          0.991          0.842   0.000      1.000

               Med_CD  STTN_FST_DEPT  Diag_date  STTN_FST_PRCD
Med_CD         1.000   0.990          0.000      0.990
STTN_FST_DEPT  0.990   1.000          0.000      0.979
Diag_date      0.000   0.000          1.000      0.000
STTN_FST_PRCD  0.990   0.979          0.000      1.000

               STTN_FST_DAYS  STTN_FST_DEPT  STTN_FST_PRCD  Med_CD  Diag_date
STTN_FST_DAYS  1.000          0.423          0.888          0.504   0.000
STTN_FST_DEPT  0.423          1.000          0.979          0.990   0.000
STTN_FST_PRCD  0.888          0.979          1.000          0.990   0.000
Med_CD         0.504          0.990          0.990          1.000   0.000
Diag_date      0.000          0.000          0.000          0.000   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

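    RID       STTN_FST_DEPT             STTN_FST_PRCD  Med_CD  Diag_date  STTN_FST_DAYS
0   R0000001  신장내과 (Nephrology)       Atorvastatin   617310  2005       30
1   R0000002  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
2   R0000004  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
3   R0000010  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
4   R0000015  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
5   R0000016  내과 (Internal Medicine)   Rosuvastatin   859747  2006       30
6   R0000018  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
7   R0000021  신장내과 (Nephrology)       Atorvastatin   617310  2006       35
8   R0000022  신경과 (Neurology)         Atorvastatin   617310  2006       63
9   R0000030  신경과 (Neurology)         Atorvastatin   617310  2006       60

    RID       STTN_FST_DEPT             STTN_FST_PRCD  Med_CD  Diag_date  STTN_FST_DAYS
90  R0002906  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
91  R0002926  신경과 (Neurology)         Atorvastatin   617310  2006       16
92  R0002928  신경과 (Neurology)         Atorvastatin   617310  2006       63
93  R0002943  신장내과 (Nephrology)       Atorvastatin   617310  2006       17
94  R0002955  신경과 (Neurology)         Atorvastatin   617310  2006       63
95  R0002987  내과 (Internal Medicine)   Rosuvastatin   859747  2006       1
96  R0003104  신경과 (Neurology)         Atorvastatin   617310  2006       63
97  R0003128  신장내과 (Nephrology)       Atorvastatin   617310  2006       60
98  R0003181  신경과 (Neurology)         Atorvastatin   617310  2006       90
99  R0003225  신장내과 (Nephrology)       Atorvastatin   617310  2006       35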