Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 105 |
Missing cells | 15 |
Missing cells (%) | 3.6% |
Duplicate rows | 1 |
Duplicate rows (%) | 1.0% |
Total size in memory | 3.8 KiB |
Average record size in memory | 37.3 B |
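The percentages in this table follow directly from the counts. A minimal sketch of the arithmetic, assuming the 15 missing cells are spread over all 105 × 4 cells and that percentages are rounded to one decimal place (the helper name `pct` is illustrative, not part of the profiling tool):

```python
# Counts taken from the "Dataset statistics" table.
n_rows, n_cols = 105, 4
missing_cells = 15
duplicate_rows = 1


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_pct = pct(missing_cells, n_rows * n_cols)  # share of all cells
duplicate_pct = pct(duplicate_rows, n_rows)        # share of all rows

print(f"Missing cells (%): {missing_pct}")
print(f"Duplicate rows (%): {duplicate_pct}")
```

The 15 missing cells are also consistent with the per-variable counts reported later: 5 missing values in each of the 3 numeric variables.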
Variable types
Numeric | 3 |
---|---|
Categorical | 1 |
Dataset
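Description | Contains laboratory data from blood tests performed on patients with hyperlipidemia, which can be used to assess associations with diabetes and urological disease. Time-point data were generated by using the specimen collection date and the receipt date to calculate the interval from the time of prescription. The test items include HbA1c, PSA (Prostate Specific Ag), free PSA, and other key tests for evaluating various adverse effects associated with hyperlipidemia, such as hepatotoxicity and nephrotoxicity. - HbA1c (glycated hemoglobin): hemoglobin in red blood cells to which glucose has bound. An ordinary blood glucose test shows only the glucose level at the time of testing, whereas HbA1c reflects the average blood glucose over 3 months. - PSA (Prostate Specific Antigen): a glycoprotein secreted by the prostate and present in semen and blood; a tumor marker for prostate cancer. - free PSA: active prostate-specific antigen |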
---|---|
Author | The Catholic University of Korea, Eunpyeong St. Mary's Hospital (가톨릭대학교 은평성모병원) |
URL | http://cmcdata.net/data/dataset/coexistence-disease-analysis-blood-test-data-dyslipidemia-eunpyeong |
Dataset has 1 (1.0%) duplicate rows | Duplicates |
일련번호 is highly overall correlated with PSA_X_VAL | High correlation |
A1C_VAL is highly overall correlated with PSA_X_VAL | High correlation |
PSA_L_VAL is highly overall correlated with PSA_X_VAL | High correlation |
PSA_X_VAL is highly overall correlated with 일련번호 and 2 other fields | High correlation |
PSA_X_VAL is highly imbalanced (86.7%) | Imbalance |
일련번호 has 5 (4.8%) missing values | Missing |
A1C_VAL has 5 (4.8%) missing values | Missing |
PSA_L_VAL has 5 (4.8%) missing values | Missing |
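The 86.7% imbalance figure for PSA_X_VAL is consistent with a normalized Shannon entropy score, 1 − H(p) / log2(k), computed over the k = 5 categories listed in the PSA_X_VAL section (101 `<NA>` entries plus four values that occur once each). The sketch below is an assumption about how such a score can be derived, not a statement of the tool's internal implementation; `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy); 0 = balanced, 1 = one class only."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)


# Category counts for PSA_X_VAL: "<NA>" x 101, then 0.07, 0.59, 0.32, 0.81 once each.
psa_x_counts = [101, 1, 1, 1, 1]
score = imbalance_score(psa_x_counts)
print(f"Imbalance: {score:.1%}")

# The "Missing" alerts are plain ratios: 5 missing values out of 105 rows.
missing_share = round(100 * 5 / 105, 1)
print(f"Missing: {missing_share}%")
```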
Reproduction
Analysis started | 2023-10-08 18:58:02.475671 |
---|---|
Analysis finished | 2023-10-08 18:58:04.353789 |
Duration | 1.88 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
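A report of this kind can be regenerated along the following lines. This is a hedged sketch: it assumes the dataset is available as a CSV file, that `pandas` and `ydata-profiling` are installed, and it uses default settings rather than the `config.json` listed above. The function name, file paths, and report title are hypothetical.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write the HTML report (paths are placeholders)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")  # default configuration
    profile.to_file(html_path)
    return profile
```

Calling `build_report("data.csv")` with a real file path would write `report.html`; timings and memory figures will differ from the values shown in this section.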
일련번호
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 5 |
Missing (%) | 4.8% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 50.5 |
Minimum | 1 |
---|---|
Maximum | 100 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.1 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 5.95 |
Q1 | 25.75 |
median | 50.5 |
Q3 | 75.25 |
95-th percentile | 95.05 |
Maximum | 100 |
Range | 99 |
Interquartile range (IQR) | 49.5 |
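The fractional quantiles above (5.95, 25.75, and so on) are what linear interpolation between order statistics produces. A minimal sketch, assuming the 100 non-missing values of 일련번호 are exactly the integers 1 to 100 (consistent with the reported minimum, maximum, distinct count, and sum); the `percentile` helper is illustrative:

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between neighbours."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100        # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


serial = list(range(1, 101))             # assumed non-missing values of 일련번호

p5 = percentile(serial, 5)
q1 = percentile(serial, 25)
median = percentile(serial, 50)
q3 = percentile(serial, 75)
p95 = percentile(serial, 95)
iqr = q3 - q1

print(round(p5, 2), q1, median, q3, round(p95, 2), iqr)
```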
Descriptive statistics
Standard deviation | 29.011492 |
---|---|
Coefficient of variation (CV) | 0.57448499 |
Kurtosis | -1.2 |
Mean | 50.5 |
Median Absolute Deviation (MAD) | 25 |
Skewness | 0 |
Sum | 5050 |
Variance | 841.66667 |
Monotonicity | Strictly increasing |
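These figures match what one would expect from a plain running index. The sketch below recomputes several of them with the standard library, again assuming the non-missing values are the integers 1 to 100 and that the standard deviation and variance use the sample (n − 1) definition:

```python
import statistics

serial = list(range(1, 101))                 # assumed non-missing values of 일련번호

total = sum(serial)
mean = statistics.mean(serial)
variance = statistics.variance(serial)       # sample variance (n - 1 denominator)
std = statistics.stdev(serial)               # sample standard deviation
cv = std / mean                              # coefficient of variation
median = statistics.median(serial)
mad = statistics.median([abs(x - median) for x in serial])  # median absolute deviation

print(total, mean, round(variance, 5), round(std, 6), round(cv, 8), mad)
```

A variable with these properties (all values distinct, strictly increasing, uniform spacing) behaves like a row identifier rather than a measurement.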
Value | Count | Frequency (%) |
65 | 1 | 1.0% |
75 | 1 | 1.0% |
74 | 1 | 1.0% |
73 | 1 | 1.0% |
72 | 1 | 1.0% |
71 | 1 | 1.0% |
70 | 1 | 1.0% |
69 | 1 | 1.0% |
68 | 1 | 1.0% |
67 | 1 | 1.0% |
Other values (90) | 90 | |
(Missing) | 5 | 4.8% |
Value | Count | Frequency (%) |
1 | 1 | |
2 | 1 | |
3 | 1 | |
4 | 1 | |
5 | 1 | |
6 | 1 | |
7 | 1 | |
8 | 1 | |
9 | 1 | |
10 | 1 |
Value | Count | Frequency (%) |
100 | 1 | |
99 | 1 | |
98 | 1 | |
97 | 1 | |
96 | 1 | |
95 | 1 | |
94 | 1 | |
93 | 1 | |
92 | 1 | |
91 | 1 |
A1C_VAL
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 42 |
---|---|
Distinct (%) | 42.0% |
Missing | 5 |
Missing (%) | 4.8% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 6.719 |
Minimum | 4.3 |
---|---|
Maximum | 14.7 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.1 KiB |
Quantile statistics
Minimum | 4.3 |
---|---|
5-th percentile | 5.2 |
Q1 | 5.575 |
median | 6 |
Q3 | 7.3 |
95-th percentile | 10.41 |
Maximum | 14.7 |
Range | 10.4 |
Interquartile range (IQR) | 1.725 |
Descriptive statistics
Standard deviation | 1.7490947 |
---|---|
Coefficient of variation (CV) | 0.26032069 |
Kurtosis | 4.5085791 |
Mean | 6.719 |
Median Absolute Deviation (MAD) | 0.6 |
Skewness | 1.9242015 |
Sum | 671.9 |
Variance | 3.0593323 |
Monotonicity | Not monotonic |
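Several of the A1C_VAL figures are linked by simple identities, which gives a quick consistency check on the table. A minimal sketch using only the reported (rounded) values, so the results agree to within rounding error:

```python
import math

# Reported values for A1C_VAL.
n_valid = 105 - 5            # observations minus missing values
mean = 6.719
std = 1.7490947

cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the squared standard deviation
total = mean * n_valid       # sum = mean x number of non-missing values

print(round(cv, 8), round(variance, 7), round(total, 1))
assert math.isclose(total, 671.9, rel_tol=1e-9)
```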
Value | Count | Frequency (%) |
5.5 | 9 | 8.6% |
5.8 | 8 | 7.6% |
5.4 | 7 | 6.7% |
5.7 | 6 | 5.7% |
7.3 | 5 | 4.8% |
6.3 | 5 | 4.8% |
5.6 | 4 | 3.8% |
6.0 | 4 | 3.8% |
5.9 | 4 | 3.8% |
5.3 | 3 | 2.9% |
Other values (32) | 45 | |
(Missing) | 5 | 4.8% |
Value | Count | Frequency (%) |
4.3 | 1 | 1.0% |
4.8 | 1 | 1.0% |
5.0 | 1 | 1.0% |
5.2 | 3 | 2.9% |
5.3 | 3 | 2.9% |
5.4 | 7 | |
5.5 | 9 | |
5.6 | 4 | |
5.7 | 6 | |
5.8 | 8 |
Value | Count | Frequency (%) |
14.7 | 1 | |
11.9 | 1 | |
11.7 | 1 | |
10.8 | 1 | |
10.6 | 1 | |
10.4 | 1 | |
10.0 | 1 | |
9.8 | 2 | |
9.1 | 1 | |
8.7 | 1 |
PSA_L_VAL
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 79 |
---|---|
Distinct (%) | 79.0% |
Missing | 5 |
Missing (%) | 4.8% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.8223 |
Minimum | 0.04 |
---|---|
Maximum | 20.72 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.1 KiB |
Quantile statistics
Minimum | 0.04 |
---|---|
5-th percentile | 0.1685 |
Q1 | 0.47 |
median | 0.745 |
Q3 | 1.3675 |
95-th percentile | 7.7635 |
Maximum | 20.72 |
Range | 20.68 |
Interquartile range (IQR) | 0.8975 |
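The range and interquartile range are simple differences of the quantiles listed above. A short sketch that recomputes them from the reported PSA_L_VAL values (rounding is applied only to suppress floating-point noise):

```python
# Reported quantiles for PSA_L_VAL.
minimum, q1, q3, maximum = 0.04, 0.47, 1.3675, 20.72

value_range = round(maximum - minimum, 2)   # Range = Maximum - Minimum
iqr = round(q3 - q1, 4)                     # IQR = Q3 - Q1

print(f"Range: {value_range}, IQR: {iqr}")
```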
Descriptive statistics
Standard deviation | 3.4427171 |
---|---|
Coefficient of variation (CV) | 1.8892153 |
Kurtosis | 15.341024 |
Mean | 1.8223 |
Median Absolute Deviation (MAD) | 0.385 |
Skewness | 3.8495146 |
Sum | 182.23 |
Variance | 11.852301 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0.69 | 3 | 2.9% |
0.65 | 3 | 2.9% |
0.84 | 3 | 2.9% |
1.19 | 3 | 2.9% |
0.61 | 3 | 2.9% |
0.04 | 2 | 1.9% |
0.53 | 2 | 1.9% |
0.47 | 2 | 1.9% |
1.01 | 2 | 1.9% |
0.22 | 2 | 1.9% |
Other values (69) | 75 | |
(Missing) | 5 | 4.8% |
Value | Count | Frequency (%) |
0.04 | 2 | |
0.08 | 1 | |
0.09 | 1 | |
0.14 | 1 | |
0.17 | 1 | |
0.18 | 1 | |
0.22 | 2 | |
0.26 | 1 | |
0.29 | 1 | |
0.3 | 1 |
Value | Count | Frequency (%) |
20.72 | 1 | |
16.5 | 1 | |
16.1 | 1 | |
12.45 | 1 | |
12.2 | 1 | |
7.53 | 1 | |
4.69 | 1 | |
4.36 | 1 | |
3.99 | 1 | |
3.52 | 1 |
PSA_X_VAL
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 5 |
---|---|
Distinct (%) | 4.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 972.0 B |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 4 |
---|---|
Unique (%) | 3.8% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 101 | |
0.07 | 1 | 1.0% |
0.59 | 1 | 1.0% |
0.32 | 1 | 1.0% |
0.81 | 1 | 1.0% |
Correlations
Variable | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
일련번호 | 1.000 | 0.000 | 0.000 | 1.000 |
A1C_VAL | 0.000 | 1.000 | 0.314 | 1.000 |
PSA_L_VAL | 0.000 | 0.314 | 1.000 | 1.000 |
PSA_X_VAL | 1.000 | 1.000 | 1.000 | 1.000 |
Variable | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
일련번호 | 1.000 | -0.030 | -0.028 | 1.000 |
A1C_VAL | -0.030 | 1.000 | -0.023 | 1.000 |
PSA_L_VAL | -0.028 | -0.023 | 1.000 | 1.000 |
PSA_X_VAL | 1.000 | 1.000 | 1.000 | 1.000 |
First rows
Index | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
0 | 1 | 6.0 | 0.77 | <NA> |
1 | 2 | 5.5 | 0.47 | <NA> |
2 | 3 | 5.5 | 1.05 | <NA> |
3 | 4 | 5.5 | 1.44 | <NA> |
4 | 5 | 6.0 | 0.09 | <NA> |
5 | 6 | 6.3 | 4.69 | <NA> |
6 | 7 | 8.7 | 1.16 | <NA> |
7 | 8 | 11.7 | 0.42 | <NA> |
8 | 9 | 7.0 | 16.5 | 0.07 |
9 | 10 | 5.4 | 0.82 | <NA> |
Last rows
Index | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
95 | 96 | 5.2 | 0.3 | <NA> |
96 | 97 | 10.0 | 20.72 | <NA> |
97 | 98 | 8.1 | 0.82 | <NA> |
98 | 99 | 5.4 | 0.58 | <NA> |
99 | 100 | 5.5 | 0.41 | <NA> |
100 | <NA> | <NA> | <NA> | <NA> |
101 | <NA> | <NA> | <NA> | <NA> |
102 | <NA> | <NA> | <NA> | <NA> |
103 | <NA> | <NA> | <NA> | <NA> |
104 | <NA> | <NA> | <NA> | <NA> |
Most frequently occurring
Index | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL | # duplicates |
---|---|---|---|---|---|
0 | <NA> | <NA> | <NA> | <NA> | 5 |