Overview

Dataset statistics

Number of variables: 4
Number of observations: 105
Missing cells: 15
Missing cells (%): 3.6%
Duplicate rows: 1
Duplicate rows (%): 1.0%
Total size in memory: 3.8 KiB
Average record size in memory: 37.3 B
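
The missing-cell percentage can be checked from the counts above. This is a minimal sketch, assuming (as the per-variable statistics in this report indicate) that the three numeric variables each have 5 missing values and that the `<NA>` entries of PSA_X_VAL are counted as a category rather than as missing cells.

```python
# Counts taken from this report; PSA_X_VAL is reported with 0 missing values
# because its <NA> entries are treated as a category.
n_variables = 4
n_observations = 105
missing_per_variable = {
    "일련번호": 5,
    "A1C_VAL": 5,
    "PSA_L_VAL": 5,
    "PSA_X_VAL": 0,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```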

Variable types

Numeric: 3
Categorical: 1

Dataset

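Description: Contains laboratory data from blood tests performed on patients with hyperlipidemia, which can be used to assess associations with diabetes and urological disease. Time-point data were generated by using the specimen collection date and the receipt date to calculate the time elapsed since the prescription. The test items include HbA1c, PSA (Prostate Specific Ag), free PSA, and other key tests that can be used to assess various adverse effects of hyperlipidemia, such as hepatotoxicity and nephrotoxicity.
- HbA1c (glycated hemoglobin): the state in which glucose is bound to hemoglobin inside red blood cells. An ordinary blood glucose test shows only the glucose level at the time of the test, whereas HbA1c indicates the average blood glucose over three months.
- PSA (Prostate Specific Antigen): prostate-specific antigen. A glycoprotein secreted by the prostate and found in semen and blood; a tumor marker for prostate cancer.
- free PSA: active prostate-specific antigen.
Author: The Catholic University of Korea, Eunpyeong St. Mary's Hospital
URL: http://cmcdata.net/data/dataset/coexistence-disease-analysis-blood-test-data-dyslipidemia-eunpyeong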

Alerts

Dataset has 1 (1.0%) duplicate rows [Duplicates]
일련번호 is highly overall correlated with PSA_X_VAL [High correlation]
A1C_VAL is highly overall correlated with PSA_X_VAL [High correlation]
PSA_L_VAL is highly overall correlated with PSA_X_VAL [High correlation]
PSA_X_VAL is highly overall correlated with 일련번호 and 2 other fields [High correlation]
PSA_X_VAL is highly imbalanced (86.7%) [Imbalance]
일련번호 has 5 (4.8%) missing values [Missing]
A1C_VAL has 5 (4.8%) missing values [Missing]
PSA_L_VAL has 5 (4.8%) missing values [Missing]
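
The 86.7% imbalance figure for PSA_X_VAL is consistent with a score defined as one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition and uses the counts listed for PSA_X_VAL in this report (101, 1, 1, 1, 1). The function name `imbalance_score` is illustrative and not part of any library API.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts
    and k is the number of categories. 0 means perfectly balanced."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(k)


# Category counts for PSA_X_VAL: "<NA>" plus four single-occurrence values.
score = imbalance_score([101, 1, 1, 1, 1])
print(f"{score:.1%}")
```

A score near 100% means that almost all rows fall into one category, which here is the `<NA>` placeholder.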

Reproduction

Analysis started: 2023-10-08 18:58:02.475671
Analysis finished: 2023-10-08 18:58:04.353789
Duration: 1.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
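
A report of this kind can typically be regenerated along the following lines. This is a sketch rather than the configuration actually used: the file names and the title are hypothetical, the dataset is assumed to be available locally as a CSV file, and the settings stored in config.json are not reproduced. `ProfileReport` and its `to_file` method are part of ydata-profiling.

```python
def build_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so the function can be defined even where
    # pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return html_path


# Example call (hypothetical file names):
# build_report("blood_test_data.csv", "report.html")
```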

Variables

일련번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 100
Distinct (%): 100.0%
Missing: 5
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
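
Because 일련번호 has 100 distinct non-missing values between 1 and 100 and a sum of 5050, the values must be the integers 1 to 100, so the figures above can be checked directly. The sketch below uses only the Python standard library. It assumes the sample (n - 1) variance and linearly interpolated percentiles, which is what the reported numbers are consistent with; the `percentile` helper is illustrative.

```python
import statistics

values = list(range(1, 101))  # the 100 non-missing serial numbers

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean  # coefficient of variation


def percentile(sorted_vals, q):
    """Percentile by linear interpolation between closest ranks (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)
iqr = q3 - q1

median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(mean, round(variance, 5), round(std, 6), round(cv, 8))
print(round(p05, 2), q1, q3, round(p95, 2), iqr, mad)
```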
Histogram with fixed size bins (bins=50)
Common values
Value              Count  Frequency (%)
65                 1      1.0%
75                 1      1.0%
74                 1      1.0%
73                 1      1.0%
72                 1      1.0%
71                 1      1.0%
70                 1      1.0%
69                 1      1.0%
68                 1      1.0%
67                 1      1.0%
Other values (90)  90     85.7%
(Missing)          5      4.8%

Minimum 10 values
Value  Count  Frequency (%)
1      1      1.0%
2      1      1.0%
3      1      1.0%
4      1      1.0%
5      1      1.0%
6      1      1.0%
7      1      1.0%
8      1      1.0%
9      1      1.0%
10     1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
100    1      1.0%
99     1      1.0%
98     1      1.0%
97     1      1.0%
96     1      1.0%
95     1      1.0%
94     1      1.0%
93     1      1.0%
92     1      1.0%
91     1      1.0%

A1C_VAL
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 42
Distinct (%): 42.0%
Missing: 5
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.719
Minimum: 4.3
Maximum: 14.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 4.3
5-th percentile: 5.2
Q1: 5.575
Median: 6
Q3: 7.3
95-th percentile: 10.41
Maximum: 14.7
Range: 10.4
Interquartile range (IQR): 1.725

Descriptive statistics

Standard deviation: 1.7490947
Coefficient of variation (CV): 0.26032069
Kurtosis: 4.5085791
Mean: 6.719
Median Absolute Deviation (MAD): 0.6
Skewness: 1.9242015
Sum: 671.9
Variance: 3.0593323
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=42)
Common values
Value              Count  Frequency (%)
5.5                9      8.6%
5.8                8      7.6%
5.4                7      6.7%
5.7                6      5.7%
7.3                5      4.8%
6.3                5      4.8%
5.6                4      3.8%
6.0                4      3.8%
5.9                4      3.8%
5.3                3      2.9%
Other values (32)  45     42.9%
(Missing)          5      4.8%

Minimum 10 values
Value  Count  Frequency (%)
4.3    1      1.0%
4.8    1      1.0%
5.0    1      1.0%
5.2    3      2.9%
5.3    3      2.9%
5.4    7      6.7%
5.5    9      8.6%
5.6    4      3.8%
5.7    6      5.7%
5.8    8      7.6%

Maximum 10 values
Value  Count  Frequency (%)
14.7   1      1.0%
11.9   1      1.0%
11.7   1      1.0%
10.8   1      1.0%
10.6   1      1.0%
10.4   1      1.0%
10.0   1      1.0%
9.8    2      1.9%
9.1    1      1.0%
8.7    1      1.0%

PSA_L_VAL
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 79
Distinct (%): 79.0%
Missing: 5
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8223
Minimum: 0.04
Maximum: 20.72
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 0.04
5-th percentile: 0.1685
Q1: 0.47
Median: 0.745
Q3: 1.3675
95-th percentile: 7.7635
Maximum: 20.72
Range: 20.68
Interquartile range (IQR): 0.8975

Descriptive statistics

Standard deviation: 3.4427171
Coefficient of variation (CV): 1.8892153
Kurtosis: 15.341024
Mean: 1.8223
Median Absolute Deviation (MAD): 0.385
Skewness: 3.8495146
Sum: 182.23
Variance: 11.852301
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value              Count  Frequency (%)
0.69               3      2.9%
0.65               3      2.9%
0.84               3      2.9%
1.19               3      2.9%
0.61               3      2.9%
0.04               2      1.9%
0.53               2      1.9%
0.47               2      1.9%
1.01               2      1.9%
0.22               2      1.9%
Other values (69)  75     71.4%
(Missing)          5      4.8%

Minimum 10 values
Value  Count  Frequency (%)
0.04   2      1.9%
0.08   1      1.0%
0.09   1      1.0%
0.14   1      1.0%
0.17   1      1.0%
0.18   1      1.0%
0.22   2      1.9%
0.26   1      1.0%
0.29   1      1.0%
0.3    1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
20.72  1      1.0%
16.5   1      1.0%
16.1   1      1.0%
12.45  1      1.0%
12.2   1      1.0%
7.53   1      1.0%
4.69   1      1.0%
4.36   1      1.0%
3.99   1      1.0%
3.52   1      1.0%

PSA_X_VAL
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Memory size: 972.0 B
Category counts: <NA> 101; 0.07 1; 0.59 1; 0.32 1; 0.81 1

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 4
Unique (%): 3.8%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   101    96.2%
0.07   1      1.0%
0.59   1      1.0%
0.32   1      1.0%
0.81   1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of category frequencies; image not preserved. The plotted values are <NA> 101 (96.2%) and 0.07, 0.59, 0.32, 0.81 with 1 (1.0%) each.]

Interactions

[Nine interaction plots; images not preserved.]

Correlations

           일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL
일련번호    1.000    0.000    0.000      1.000
A1C_VAL    0.000    1.000    0.314      1.000
PSA_L_VAL  0.000    0.314    1.000      1.000
PSA_X_VAL  1.000    1.000    1.000      1.000

           일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL
일련번호    1.000    -0.030   -0.028     1.000
A1C_VAL    -0.030   1.000    -0.023     1.000
PSA_L_VAL  -0.028   -0.023   1.000      1.000
PSA_X_VAL  1.000    1.000    1.000      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
Index  일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL
0      1        6.0      0.77       <NA>
1      2        5.5      0.47       <NA>
2      3        5.5      1.05       <NA>
3      4        5.5      1.44       <NA>
4      5        6.0      0.09       <NA>
5      6        6.3      4.69       <NA>
6      7        8.7      1.16       <NA>
7      8        11.7     0.42       <NA>
8      9        7.0      16.5       0.07
9      10       5.4      0.82       <NA>

Last rows
Index  일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL
95     96       5.2      0.3        <NA>
96     97       10.0     20.72      <NA>
97     98       8.1      0.82       <NA>
98     99       5.4      0.58       <NA>
99     100      5.5      0.41       <NA>
100    <NA>     <NA>     <NA>       <NA>
101    <NA>     <NA>     <NA>       <NA>
102    <NA>     <NA>     <NA>       <NA>
103    <NA>     <NA>     <NA>       <NA>
104    <NA>     <NA>     <NA>       <NA>

Duplicate rows

Most frequently occurring

Index  일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL  # duplicates
0      <NA>     <NA>     <NA>       <NA>       5