Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 5.1 KiB |
Average record size in memory | 52.3 B |
Variable types
Text | 1 |
---|---|
Categorical | 4 |
Numeric | 1 |
Dataset
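Description | Data on the first diagnosing department, first diagnosis name, and diagnosis code of hyperlipidemia patients. The departments include gastroenterology, cardiology, and endocrinology, among others, so patient inflow pathways can be analyzed. Statin drug data are mapped to RxNorm codes. |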
---|---|
Author | The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원) |
URL | http://cmcdata.net/data/dataset/first-statin-prescription-dyslipidemia-data |
Alerts
Med_CD is highly overall correlated with STTN_FST_DAYS and 2 other fields | High correlation |
---|---|
STTN_FST_DEPT is highly overall correlated with STTN_FST_PRCD and 1 other field | High correlation |
STTN_FST_PRCD is highly overall correlated with STTN_FST_DAYS and 2 other fields | High correlation |
STTN_FST_DAYS is highly overall correlated with STTN_FST_PRCD and 1 other field | High correlation |
Diag_date is highly imbalanced (91.9%) | Imbalance |
RID has unique values | Unique |
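The 91.9% imbalance figure for Diag_date is consistent with a normalized Shannon entropy score, 1 - H / log2(k), applied to the class counts 99 and 1. The sketch below is an illustration under that assumption, not the profiler's source code, and the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        # Assumption: a single class is reported as 0.0 here.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Diag_date: 99 rows with the value 2006 and 1 row with the value 2005.
score = imbalance_score([99, 1])
print(f"{score:.1%}")  # 91.9%
```

A score this high says that Diag_date carries almost no information in this sample, which is why its correlations with the other fields are reported as 0.000.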
Reproduction
Analysis started | 2023-10-08 18:57:45.562716 |
---|---|
Analysis finished | 2023-10-08 18:57:46.738335 |
Duration | 1.18 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
RID
Text
UNIQUE
 
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
r0000001 | 1 | 1.0% |
r0001724 | 1 | 1.0% |
r0002117 | 1 | 1.0% |
r0002092 | 1 | 1.0% |
r0002070 | 1 | 1.0% |
r0002047 | 1 | 1.0% |
r0002044 | 1 | 1.0% |
r0002023 | 1 | 1.0% |
r0002001 | 1 | 1.0% |
r0002000 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 391 | 48.9% |
R | 100 | 12.5% |
2 | 65 | 8.1% |
1 | 51 | 6.4% |
3 | 38 | 4.8% |
4 | 28 | 3.5% |
7 | 28 | 3.5% |
8 | 27 | 3.4% |
9 | 27 | 3.4% |
5 | 24 | 3.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 700 | 87.5% |
Uppercase Letter | 100 | 12.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 391 | 55.9% |
2 | 65 | 9.3% |
1 | 51 | 7.3% |
3 | 38 | 5.4% |
4 | 28 | 4.0% |
7 | 28 | 4.0% |
8 | 27 | 3.9% |
9 | 27 | 3.9% |
5 | 24 | 3.4% |
6 | 21 | 3.0% |
Uppercase Letter
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 700 | 87.5% |
Latin | 100 | 12.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 391 | 55.9% |
2 | 65 | 9.3% |
1 | 51 | 7.3% |
3 | 38 | 5.4% |
4 | 28 | 4.0% |
7 | 28 | 4.0% |
8 | 27 | 3.9% |
9 | 27 | 3.9% |
5 | 24 | 3.4% |
6 | 21 | 3.0% |
Latin
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 800 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 391 | 48.9% |
R | 100 | 12.5% |
2 | 65 | 8.1% |
1 | 51 | 6.4% |
3 | 38 | 4.8% |
4 | 28 | 3.5% |
7 | 28 | 3.5% |
8 | 27 | 3.4% |
9 | 27 | 3.4% |
5 | 24 | 3.0% |
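The character, category, and script tables above are frequency counts over every character of every RID. A minimal sketch of that kind of tally, using only the standard library, is shown here. The helper name `char_profile` is hypothetical, and the three RIDs are taken from the sample rows of this report, so the counts differ from the full 100-row figures.

```python
from collections import Counter
import unicodedata


def char_profile(values):
    """Count individual characters and their Unicode general categories."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, n in chars.items():
        # 'Nd' is Decimal Number and 'Lu' is Uppercase Letter.
        categories[unicodedata.category(ch)] += n
    return chars, categories


# Three identifiers from the sample rows, not the full column.
rids = ["R0000001", "R0000002", "R0000004"]
chars, categories = char_profile(rids)

total = sum(chars.values())
for ch, n in chars.most_common(3):
    print(ch, n, f"{n / total:.1%}")
print(dict(categories))
```

Because every RID is one uppercase `R` followed by seven digits, the category split is fixed at 1:7, which matches the 12.5% share reported for Uppercase Letter.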
STTN_FST_DEPT
Categorical
HIGH CORRELATION
 
Distinct | 6 |
---|---|
Distinct (%) | 6.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
신장내과 | 47 |
---|---|
내과 | 25 |
신경과 | 22 |
흉부외과 | 4 |
외과 | 1 |
Length
Max length | 5 |
---|---|
Median length | 4 |
Mean length | 3.27 |
Min length | 2 |
Unique
Unique | 2 |
---|---|
Unique (%) | 2.0% |
Sample
1st row | 신장내과 |
---|---|
2nd row | 내과 |
3rd row | 내과 |
4th row | 내과 |
5th row | 내과 |
Common Values
Value | Count | Frequency (%) |
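신장내과 (Nephrology) | 47 | 47.0% |
내과 (Internal Medicine) | 25 | 25.0% |
신경과 (Neurology) | 22 | 22.0% |
흉부외과 (Thoracic Surgery) | 4 | 4.0% |
외과 (Surgery) | 1 | 1.0% |
응급의학과 (Emergency Medicine) | 1 | 1.0% |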
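The Length and Unique figures for STTN_FST_DEPT can be rechecked from the value counts alone. The sketch below assumes that length means the number of Unicode code points in the original Korean value and that "Unique" counts values occurring exactly once. The variable names are hypothetical.

```python
import statistics

# Value counts for STTN_FST_DEPT as listed in the Common Values table.
dept_counts = {
    "신장내과": 47,
    "내과": 25,
    "신경과": 22,
    "흉부외과": 4,
    "외과": 1,
    "응급의학과": 1,
}

# Expand the counts into one string length per observation.
lengths = [len(name) for name, n in dept_counts.items() for _ in range(n)]

summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": round(statistics.mean(lengths), 2),
    "min": min(lengths),
}
# Values that occur exactly once in the column.
unique_values = [name for name, n in dept_counts.items() if n == 1]

print(summary)
print(len(unique_values), unique_values)
```

Under these assumptions the summary matches the reported maximum of 5, median of 4, mean of 3.27, and minimum of 2, with two values occurring once.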
STTN_FST_PRCD
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Atorvastatin | 75 |
---|---|
Rosuvastatin | 25 |
Length
Max length | 12 |
---|---|
Median length | 12 |
Mean length | 12 |
Min length | 12 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Atorvastatin |
---|---|
2nd row | Rosuvastatin |
3rd row | Rosuvastatin |
4th row | Rosuvastatin |
5th row | Rosuvastatin |
Common Values
Value | Count | Frequency (%) |
Atorvastatin | 75 | 75.0% |
Rosuvastatin | 25 | 25.0% |
Med_CD
Categorical
HIGH CORRELATION
 
Distinct | 4 |
---|---|
Distinct (%) | 4.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
617310 | 69 |
---|---|
859747 | 25 |
617312 | 5 |
617311 | 1 |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 6 |
Min length | 6 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 617310 |
---|---|
2nd row | 859747 |
3rd row | 859747 |
4th row | 859747 |
5th row | 859747 |
Common Values
Value | Count | Frequency (%) |
617310 | 69 | 69.0% |
859747 | 25 | 25.0% |
617312 | 5 | 5.0% |
617311 | 1 | 1.0% |
Diag_date
Categorical
IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
2006 | 99 |
---|---|
2005 | 1 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 2005 |
---|---|
2nd row | 2006 |
3rd row | 2006 |
4th row | 2006 |
5th row | 2006 |
Common Values
Value | Count | Frequency (%) |
2006 | 99 | 99.0% |
2005 | 1 | 1.0% |
STTN_FST_DAYS
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 22 |
---|---|
Distinct (%) | 22.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 39.84 |
Minimum | 1 |
---|---|
Maximum | 120 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 5.5 |
median | 40 |
Q3 | 60 |
95-th percentile | 90 |
Maximum | 120 |
Range | 119 |
Interquartile range (IQR) | 54.5 |
Descriptive statistics
Standard deviation | 29.063747 |
---|---|
Coefficient of variation (CV) | 0.72951173 |
Kurtosis | -0.77924987 |
Mean | 39.84 |
Median Absolute Deviation (MAD) | 20 |
Skewness | 0.10017075 |
Sum | 3984 |
Variance | 844.70141 |
Monotonicity | Not monotonic |
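Several of the STTN_FST_DAYS statistics are derived from others in the same tables, so they can be cross-checked with plain arithmetic. The sketch below uses only the figures reported above. The variable names are hypothetical, and small differences in the last digits come from the rounding of the reported standard deviation.

```python
# Figures copied from the STTN_FST_DAYS tables.
mean = 39.84
std = 29.063747
q1, q3 = 5.5, 60
minimum, maximum = 1, 120

derived = {
    "range": maximum - minimum,   # reported as 119
    "iqr": q3 - q1,               # reported as 54.5
    "cv": std / mean,             # reported as 0.72951173
    "variance": std ** 2,         # reported as 844.70141
}

for name, value in derived.items():
    print(f"{name}: {value:.6f}")
```

The coefficient of variation of about 0.73 and the wide interquartile range reflect the two clusters in the data: 25 rows at 1 day and 23 rows at 60 days.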
Common Values
Value | Count | Frequency (%) |
1 | 25 | 25.0% |
60 | 23 | 23.0% |
30 | 12 | 12.0% |
90 | 7 | 7.0% |
35 | 5 | 5.0% |
63 | 4 | 4.0% |
65 | 4 | 4.0% |
50 | 3 | 3.0% |
70 | 3 | 3.0% |
40 | 2 | 2.0% |
Other values (12) | 12 | 12.0% |
Minimum 10 values
Value | Count | Frequency (%) |
1 | 25 | 25.0% |
7 | 1 | 1.0% |
16 | 1 | 1.0% |
17 | 1 | 1.0% |
20 | 1 | 1.0% |
21 | 1 | 1.0% |
28 | 1 | 1.0% |
30 | 12 | 12.0% |
35 | 5 | 5.0% |
37 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
120 | 1 | 1.0% |
90 | 7 | 7.0% |
70 | 3 | 3.0% |
65 | 4 | 4.0% |
63 | 4 | 4.0% |
60 | 23 | 23.0% |
56 | 1 | 1.0% |
53 | 1 | 1.0% |
50 | 3 | 3.0% |
45 | 1 | 1.0% |
Correlations
Variable | RID | STTN_FST_DEPT | STTN_FST_PRCD | Med_CD | Diag_date | STTN_FST_DAYS |
---|---|---|---|---|---|---|
RID | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
STTN_FST_DEPT | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.645 |
STTN_FST_PRCD | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.991 |
Med_CD | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.842 |
Diag_date | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
STTN_FST_DAYS | 1.000 | 0.645 | 0.991 | 0.842 | 0.000 | 1.000 |
Variable | Med_CD | STTN_FST_DEPT | Diag_date | STTN_FST_PRCD |
---|---|---|---|---|
Med_CD | 1.000 | 0.990 | 0.000 | 0.990 |
STTN_FST_DEPT | 0.990 | 1.000 | 0.000 | 0.979 |
Diag_date | 0.000 | 0.000 | 1.000 | 0.000 |
STTN_FST_PRCD | 0.990 | 0.979 | 0.000 | 1.000 |
Variable | STTN_FST_DAYS | STTN_FST_DEPT | STTN_FST_PRCD | Med_CD | Diag_date |
---|---|---|---|---|---|
STTN_FST_DAYS | 1.000 | 0.423 | 0.888 | 0.504 | 0.000 |
STTN_FST_DEPT | 0.423 | 1.000 | 0.979 | 0.990 | 0.000 |
STTN_FST_PRCD | 0.888 | 0.979 | 1.000 | 0.990 | 0.000 |
Med_CD | 0.504 | 0.990 | 0.990 | 1.000 | 0.000 |
Diag_date | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
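A common association measure for two categorical columns is Cramér's V, computed from the chi-squared statistic of their contingency table. Whether the matrices above use this exact measure, or a bias-corrected variant, is an assumption. The sketch below is the plain, uncorrected form, applied to the department and drug values of rows 0 to 9 of this report rather than to all 100 rows, so it is not expected to reproduce the reported coefficients. The function name `cramers_v` is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined when a column is constant; reported as 0 here
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# STTN_FST_DEPT and STTN_FST_PRCD for rows 0 to 9 of the report.
dept = ["신장내과", "내과", "내과", "내과", "내과", "내과", "내과", "신장내과", "신경과", "신경과"]
drug = ["Atorvastatin"] + ["Rosuvastatin"] * 6 + ["Atorvastatin"] * 3

print(round(cramers_v(dept, drug), 3))
```

In these ten rows the department fully determines the drug, which is the pattern behind the high-correlation alerts for STTN_FST_DEPT, STTN_FST_PRCD, and Med_CD.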
First rows
Row | RID | STTN_FST_DEPT | STTN_FST_PRCD | Med_CD | Diag_date | STTN_FST_DAYS |
---|---|---|---|---|---|---|
0 | R0000001 | 신장내과 | Atorvastatin | 617310 | 2005 | 30 |
1 | R0000002 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
2 | R0000004 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
3 | R0000010 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
4 | R0000015 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
5 | R0000016 | 내과 | Rosuvastatin | 859747 | 2006 | 30 |
6 | R0000018 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
7 | R0000021 | 신장내과 | Atorvastatin | 617310 | 2006 | 35 |
8 | R0000022 | 신경과 | Atorvastatin | 617310 | 2006 | 63 |
9 | R0000030 | 신경과 | Atorvastatin | 617310 | 2006 | 60 |
Last rows
Row | RID | STTN_FST_DEPT | STTN_FST_PRCD | Med_CD | Diag_date | STTN_FST_DAYS |
---|---|---|---|---|---|---|
90 | R0002906 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
91 | R0002926 | 신경과 | Atorvastatin | 617310 | 2006 | 16 |
92 | R0002928 | 신경과 | Atorvastatin | 617310 | 2006 | 63 |
93 | R0002943 | 신장내과 | Atorvastatin | 617310 | 2006 | 17 |
94 | R0002955 | 신경과 | Atorvastatin | 617310 | 2006 | 63 |
95 | R0002987 | 내과 | Rosuvastatin | 859747 | 2006 | 1 |
96 | R0003104 | 신경과 | Atorvastatin | 617310 | 2006 | 63 |
97 | R0003128 | 신장내과 | Atorvastatin | 617310 | 2006 | 60 |
98 | R0003181 | 신경과 | Atorvastatin | 617310 | 2006 | 90 |
99 | R0003225 | 신장내과 | Atorvastatin | 617310 | 2006 | 35 |
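Since the description states that statin data are mapped to RxNorm codes, one quick sanity check is that each Med_CD value corresponds to a single STTN_FST_PRCD value. The sketch below runs that check on a few (Med_CD, STTN_FST_PRCD) pairs copied from the sample rows. The helper name `code_to_drugs` is hypothetical, and the check says nothing about codes 617311 and 617312, which do not appear in the sample rows.

```python
from collections import defaultdict


def code_to_drugs(pairs):
    """Group drug names by code so one-to-many mappings become visible."""
    mapping = defaultdict(set)
    for code, drug in pairs:
        mapping[code].add(drug)
    return dict(mapping)


# (Med_CD, STTN_FST_PRCD) pairs from rows 0, 1, 7, 8, 90, and 99.
pairs = [
    ("617310", "Atorvastatin"),
    ("859747", "Rosuvastatin"),
    ("617310", "Atorvastatin"),
    ("617310", "Atorvastatin"),
    ("859747", "Rosuvastatin"),
    ("617310", "Atorvastatin"),
]

mapping = code_to_drugs(pairs)
ambiguous = {code: drugs for code, drugs in mapping.items() if len(drugs) > 1}
print(mapping)
print("ambiguous codes:", ambiguous)
```

An empty `ambiguous` result means the drug name is a function of the code in the rows checked, which would explain a near-perfect association between the two columns.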