Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 100 |
Missing cells | 33 |
Missing cells (%) | 8.2% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 3.5 KiB |
Average record size in memory | 36.3 B |
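The missing-cell percentage follows from the counts in the table above: 33 missing cells out of 4 × 100 cells. The sketch below reproduces that arithmetic. The helper name `missing_cell_share` is illustrative and not part of ydata-profiling. The second call is a hypothetical: it assumes the 95 literal `<NA>` entries listed for PSA_X_VAL (which the report counts as a category, with 0 missing) were treated as missing instead.

```python
def missing_cell_share(n_missing: int, n_rows: int, n_cols: int) -> float:
    """Return the fraction of cells that are missing."""
    return n_missing / (n_rows * n_cols)


# Figures taken from the "Dataset statistics" table.
reported = missing_cell_share(33, n_rows=100, n_cols=4)

# Hypothetical: also count the 95 "<NA>" strings in PSA_X_VAL as missing.
with_na_strings = missing_cell_share(33 + 95, n_rows=100, n_cols=4)

print(f"{reported:.2%}")         # 8.25%, which the report shows as 8.2%
print(f"{with_na_strings:.2%}")  # 32.00%
```

If the second figure is the more realistic one, the `<NA>` strings would need to be converted to real missing values before profiling.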
Variable types
Text | 1 |
---|---|
Numeric | 2 |
Categorical | 1 |
Dataset
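Description | Contains blood test data from patients with hyperlipidemia that can be used to evaluate associations with diabetes and urological diseases. Time-point data were generated by using the specimen collection date and the receipt date to calculate the period elapsed since the time of prescription. The test items include HbA1c, PSA (Prostate Specific Ag), free PSA, and other key tests that can be used to evaluate various adverse effects of hyperlipidemia, such as hepatotoxicity and nephrotoxicity. - HbA1c (glycated hemoglobin): hemoglobin in red blood cells to which some glucose in the blood has bound. An ordinary blood glucose test shows only the glucose level at the time of testing, whereas HbA1c indicates the average blood glucose over 3 months. - PSA (Prostate Specific Antigen): prostate-specific antigen, a glycoprotein secreted by the prostate and present in semen and blood; a tumor marker for prostate cancer. - free PSA: active prostate-specific antigen |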
---|---|
Author | 가톨릭대학교 서울성모병원 (The Catholic University of Korea, Seoul St. Mary's Hospital) |
URL | http://cmcdata.net/data/dataset/coexistence-disease-analysis-blood-test-data-dyslipidemia |
A1C_VAL is highly overall correlated with PSA_X_VAL | High correlation |
PSA_L_VAL is highly overall correlated with PSA_X_VAL | High correlation |
PSA_X_VAL is highly overall correlated with A1C_VAL and 1 other field | High correlation |
PSA_X_VAL is highly imbalanced (84.4%) | Imbalance |
A1C_VAL has 33 (33.0%) missing values | Missing |
일련번호 has unique values | Unique |
PSA_L_VAL has 5 (5.0%) zeros | Zeros |
Reproduction
Analysis started | 2023-10-08 18:56:09.468877 |
---|---|
Analysis finished | 2023-10-08 18:56:11.467073 |
Duration | 2 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
일련번호
Text
UNIQUE
 
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
r0010750 | 1 | 1.0% |
r0028163 | 1 | 1.0% |
r0030478 | 1 | 1.0% |
r0030194 | 1 | 1.0% |
r0029952 | 1 | 1.0% |
r0029721 | 1 | 1.0% |
r0029696 | 1 | 1.0% |
r0029537 | 1 | 1.0% |
r0029339 | 1 | 1.0% |
r0028940 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 238 | 29.8% |
R | 100 | 12.5% |
2 | 77 | 9.6% |
3 | 71 | 8.9% |
1 | 68 | 8.5% |
9 | 51 | 6.4% |
4 | 42 | 5.2% |
6 | 42 | 5.2% |
8 | 41 | 5.1% |
7 | 35 | 4.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 700 | 87.5% |
Uppercase Letter | 100 | 12.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 238 | 34.0% |
2 | 77 | 11.0% |
3 | 71 | 10.1% |
1 | 68 | 9.7% |
9 | 51 | 7.3% |
4 | 42 | 6.0% |
6 | 42 | 6.0% |
8 | 41 | 5.9% |
7 | 35 | 5.0% |
5 | 35 | 5.0% |
Uppercase Letter
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 700 | 87.5% |
Latin | 100 | 12.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 238 | 34.0% |
2 | 77 | 11.0% |
3 | 71 | 10.1% |
1 | 68 | 9.7% |
9 | 51 | 7.3% |
4 | 42 | 6.0% |
6 | 42 | 6.0% |
8 | 41 | 5.9% |
7 | 35 | 5.0% |
5 | 35 | 5.0% |
Latin
Value | Count | Frequency (%) |
R | 100 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 800 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 238 | 29.8% |
R | 100 | 12.5% |
2 | 77 | 9.6% |
3 | 71 | 8.9% |
1 | 68 | 8.5% |
9 | 51 | 6.4% |
4 | 42 | 5.2% |
6 | 42 | 5.2% |
8 | 41 | 5.1% |
7 | 35 | 4.4% |
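The character tables above group every character of 일련번호 by Unicode general category ("Decimal Number", "Uppercase Letter"). As a rough illustration of that kind of tally, the sketch below counts characters and categories with Python's standard `unicodedata` module, using only the ten identifiers visible in the report's first sample rows. It is an approximation for illustration, not ydata-profiling's internal code, and it covers 10 of the 100 values, so the counts differ from the tables. `unicodedata.category` returns short codes: `Nd` is Decimal Number and `Lu` is Uppercase Letter.

```python
import unicodedata
from collections import Counter

# The ten identifiers shown in the report's first sample rows.
ids = [
    "R0010750", "R0011300", "R0011754", "R0012316", "R0012507",
    "R0013226", "R0013701", "R0013987", "R0014264", "R0014330",
]

text = "".join(ids)

# Count individual characters, then the Unicode general category of each.
char_counts = Counter(text)
category_counts = Counter(unicodedata.category(ch) for ch in text)

for category, count in category_counts.most_common():
    print(f"{category}: {count} ({count / len(text):.1%})")
```

Each identifier is one uppercase letter followed by seven digits, which is why the report's category split is 700 to 100.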
A1C_VAL
Real number (ℝ)
HIGH CORRELATION
  MISSING
 
Distinct | 47 |
---|---|
Distinct (%) | 70.1% |
Missing | 33 |
Missing (%) | 33.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.4192537 |
Minimum | 0.02 |
---|---|
Maximum | 25.95 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0.02 |
---|---|
5-th percentile | 0.03 |
Q1 | 0.065 |
median | 0.19 |
Q3 | 2.25 |
95-th percentile | 15.533 |
Maximum | 25.95 |
Range | 25.93 |
Interquartile range (IQR) | 2.185 |
Descriptive statistics
Standard deviation | 5.4366351 |
---|---|
Coefficient of variation (CV) | 2.2472364 |
Kurtosis | 10.674644 |
Mean | 2.4192537 |
Median Absolute Deviation (MAD) | 0.15 |
Skewness | 3.2712488 |
Sum | 162.09 |
Variance | 29.557001 |
Monotonicity | Not monotonic |
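Several of the A1C_VAL figures above are simple functions of one another, which gives a quick consistency check on the report. The sketch below recomputes the mean, coefficient of variation, variance, IQR, and range from the other reported values. The variable names are illustrative, and the inputs are copied from the tables rather than recomputed from the raw data.

```python
import math

# Values copied from the A1C_VAL tables above.
n = 100 - 33                 # observations minus missing values
total = 162.09               # Sum
std = 5.4366351              # Standard deviation
q1, q3 = 0.065, 2.25         # Q1 and Q3
minimum, maximum = 0.02, 25.95

mean = total / n                  # reported: 2.4192537
cv = std / mean                   # reported: 2.2472364
variance = std ** 2               # reported: 29.557001
iqr = q3 - q1                     # reported: 2.185
value_range = maximum - minimum   # reported: 25.93

# Each derived value should match the report within rounding error.
assert math.isclose(mean, 2.4192537, abs_tol=1e-6)
assert math.isclose(cv, 2.2472364, abs_tol=1e-5)
assert math.isclose(variance, 29.557001, abs_tol=1e-4)
```

The mean of about 2.42 sits far above the median of 0.19, in line with the reported skewness of about 3.27.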
Value | Count | Frequency (%) |
0.04 | 5 | 5.0% |
0.12 | 4 | 4.0% |
0.06 | 4 | 4.0% |
0.02 | 3 | 3.0% |
0.03 | 3 | 3.0% |
0.05 | 2 | 2.0% |
0.13 | 2 | 2.0% |
0.21 | 2 | 2.0% |
0.16 | 2 | 2.0% |
0.29 | 2 | 2.0% |
Other values (37) | 38 | 38.0% |
(Missing) | 33 | 33.0% |
Value | Count | Frequency (%) |
0.02 | 3 | 3.0% |
0.03 | 3 | 3.0% |
0.04 | 5 | 5.0% |
0.05 | 2 | 2.0% |
0.06 | 4 | 4.0% |
0.07 | 2 | 2.0% |
0.09 | 1 | 1.0% |
0.1 | 1 | 1.0% |
0.11 | 1 | 1.0% |
0.12 | 4 | 4.0% |
Value | Count | Frequency (%) |
25.95 | 1 | 1.0% |
25.29 | 1 | 1.0% |
18.89 | 1 | 1.0% |
18.71 | 1 | 1.0% |
8.12 | 1 | 1.0% |
7.1 | 1 | 1.0% |
6.2 | 1 | 1.0% |
5.33 | 1 | 1.0% |
5.13 | 1 | 1.0% |
5.11 | 1 | 1.0% |
PSA_L_VAL
Real number (ℝ)
HIGH CORRELATION
  ZEROS
 
Distinct | 88 |
---|---|
Distinct (%) | 88.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.5688 |
Minimum | 0 |
---|---|
Maximum | 45.62 |
Zeros | 5 |
Zeros (%) | 5.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0.0095 |
Q1 | 0.4575 |
median | 1.25 |
Q3 | 2.76 |
95-th percentile | 7.4695 |
Maximum | 45.62 |
Range | 45.62 |
Interquartile range (IQR) | 2.3025 |
Descriptive statistics
Standard deviation | 5.2049375 |
---|---|
Coefficient of variation (CV) | 2.0262136 |
Kurtosis | 48.78677 |
Mean | 2.5688 |
Median Absolute Deviation (MAD) | 0.94 |
Skewness | 6.3237092 |
Sum | 256.88 |
Variance | 27.091374 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0.0 | 5 | 5.0% |
2.23 | 2 | 2.0% |
1.25 | 2 | 2.0% |
0.41 | 2 | 2.0% |
0.66 | 2 | 2.0% |
0.67 | 2 | 2.0% |
0.86 | 2 | 2.0% |
0.57 | 2 | 2.0% |
0.02 | 2 | 2.0% |
3.38 | 1 | 1.0% |
Other values (78) | 78 | 78.0% |
Value | Count | Frequency (%) |
0.0 | 5 | 5.0% |
0.01 | 1 | 1.0% |
0.02 | 2 | 2.0% |
0.03 | 1 | 1.0% |
0.07 | 1 | 1.0% |
0.1 | 1 | 1.0% |
0.13 | 1 | 1.0% |
0.17 | 1 | 1.0% |
0.19 | 1 | 1.0% |
0.21 | 1 | 1.0% |
Value | Count | Frequency (%) |
45.62 | 1 | 1.0% |
20.37 | 1 | 1.0% |
11.74 | 1 | 1.0% |
8.91 | 1 | 1.0% |
8.41 | 1 | 1.0% |
7.42 | 1 | 1.0% |
7.01 | 1 | 1.0% |
5.75 | 1 | 1.0% |
5.59 | 1 | 1.0% |
5.41 | 1 | 1.0% |
PSA_X_VAL
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 6 |
---|---|
Distinct (%) | 6.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
<NA> | 95 |
---|---|
0.09 | 1 |
0.36 | 1 |
0.82 | 1 |
0.29 | 1 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 5 |
---|---|
Unique (%) | 5.0% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 95 | 95.0% |
0.09 | 1 | 1.0% |
0.36 | 1 | 1.0% |
0.82 | 1 | 1.0% |
0.29 | 1 | 1.0% |
0.23 | 1 | 1.0% |
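The 84.4% imbalance alert for PSA_X_VAL can be reproduced from the counts above with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes. The sketch below is a standalone illustration of that calculation. That ydata-profiling computes its score in exactly this way is an assumption here, although the result matches the reported figure. The function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max: 0 for perfectly balanced classes, near 1 when one class dominates."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Counts from the "Common Values" table: "<NA>" plus five singleton values.
counts = [95, 1, 1, 1, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")  # 84.4%
```

Because 95 of the 100 entries are the literal string `<NA>`, the score mostly reflects missing data rather than a genuinely dominant category.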
 | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
일련번호 | 1.000 | 1.000 | 1.000 | 1.000 |
A1C_VAL | 1.000 | 1.000 | 0.000 | NaN |
PSA_L_VAL | 1.000 | 0.000 | 1.000 | NaN |
PSA_X_VAL | 1.000 | NaN | NaN | 1.000 |
 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|
A1C_VAL | 1.000 | -0.084 | 1.000 |
PSA_L_VAL | -0.084 | 1.000 | 1.000 |
PSA_X_VAL | 1.000 | 1.000 | 1.000 |
 | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
0 | R0010750 | 0.09 | 3.38 | <NA> |
1 | R0011300 | 0.1 | 0.01 | <NA> |
2 | R0011754 | 0.04 | 4.32 | <NA> |
3 | R0012316 | 0.18 | 1.59 | <NA> |
4 | R0012507 | 0.19 | 1.15 | <NA> |
5 | R0013226 | 0.23 | 3.91 | <NA> |
6 | R0013701 | 0.46 | 0.41 | <NA> |
7 | R0013987 | <NA> | 2.69 | <NA> |
8 | R0014264 | 0.04 | 5.59 | <NA> |
9 | R0014330 | <NA> | 2.97 | <NA> |
 | 일련번호 | A1C_VAL | PSA_L_VAL | PSA_X_VAL |
---|---|---|---|---|
90 | R0033869 | <NA> | 0.51 | <NA> |
91 | R0034103 | <NA> | 0.0 | <NA> |
92 | R0034187 | 18.89 | 7.42 | <NA> |
93 | R0034330 | 25.29 | 0.21 | <NA> |
94 | R0034337 | 0.76 | 1.17 | <NA> |
95 | R0034515 | 0.02 | 2.02 | <NA> |
96 | R0034640 | <NA> | 0.27 | <NA> |
97 | R0034878 | <NA> | 1.8 | <NA> |
98 | R0035004 | 0.11 | 0.0 | <NA> |
99 | R0035066 | 0.02 | 5.41 | <NA> |
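The sample rows mark absent values with the literal text `<NA>`. As a minimal sketch of turning rows like these into records with real missing values, the code below parses the last ten rows and counts how many lack an A1C_VAL. The helper names `to_float` and `parse_row` are illustrative, and mapping `<NA>` to `None` in PSA_X_VAL is an assumption, since the report itself treats that column as categorical with no missing values.

```python
from typing import Optional

COLUMNS = ["일련번호", "A1C_VAL", "PSA_L_VAL", "PSA_X_VAL"]


def to_float(cell: str) -> Optional[float]:
    """Convert a cell to float, mapping the "<NA>" marker to None."""
    return None if cell == "<NA>" else float(cell)


def parse_row(line: str) -> dict:
    """Split one pipe-delimited sample row into a record."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    index, serial, *values = cells
    record = {"index": int(index), COLUMNS[0]: serial}
    for name, cell in zip(COLUMNS[1:], values):
        record[name] = to_float(cell)
    return record


# The last ten rows as shown in the report.
last_rows = """\
90 | R0033869 | <NA> | 0.51 | <NA> |
91 | R0034103 | <NA> | 0.0 | <NA> |
92 | R0034187 | 18.89 | 7.42 | <NA> |
93 | R0034330 | 25.29 | 0.21 | <NA> |
94 | R0034337 | 0.76 | 1.17 | <NA> |
95 | R0034515 | 0.02 | 2.02 | <NA> |
96 | R0034640 | <NA> | 0.27 | <NA> |
97 | R0034878 | <NA> | 1.8 | <NA> |
98 | R0035004 | 0.11 | 0.0 | <NA> |
99 | R0035066 | 0.02 | 5.41 | <NA> |
""".splitlines()

records = [parse_row(line) for line in last_rows]
missing_a1c = sum(r["A1C_VAL"] is None for r in records)
print(missing_a1c)  # 4
```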