Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 33
Missing cells (%): 8.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.5 KiB
Average record size in memory: 36.3 B
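
The percentages and sizes in this table follow from the raw counts. The short check below is a sketch using only the figures printed above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 100
missing_cells = 33
avg_record_bytes = 36.3

# Missing cells (%) is the missing count over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total size is roughly the average record size times the row count.
total_kib = avg_record_bytes * n_observations / 1024

print(f"Missing cells: {missing_pct:.2f}% of {total_cells} cells")
print(f"Total size: {total_kib:.2f} KiB")
```

Both results agree with the reported 8.2% and 3.5 KiB once rounded to one decimal place.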

Variable types

Text: 1
Numeric: 2
Categorical: 1

Dataset

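Description: Contains laboratory data for evaluating associations with diabetes and urological disease, based on blood tests performed on patients with hyperlipidemia. Time-point data were generated by using the specimen collection date and the receipt date to calculate the elapsed time since the prescription. The test items include HbA1c, PSA (prostate-specific antigen), and free PSA, among other key tests for evaluating adverse effects such as hepatotoxicity and nephrotoxicity in hyperlipidemia.
- HbA1c (glycated hemoglobin): hemoglobin inside red blood cells to which glucose has bound. An ordinary blood glucose test shows only the glucose level at the time of testing, whereas HbA1c indicates the average blood glucose over three months.
- PSA (Prostate Specific Antigen): a glycoprotein secreted by the prostate and present in semen and blood; a tumor marker for prostate cancer.
- free PSA: the free (active) form of prostate-specific antigen.
Author: Seoul St. Mary's Hospital, The Catholic University of Korea
URL: http://cmcdata.net/data/dataset/coexistence-disease-analysis-blood-test-data-dyslipidemia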
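
The dataset description says that time-point data were derived from the elapsed time between the prescription and the specimen collection or receipt date. A minimal sketch of that calculation follows; the function name and the example dates are hypothetical, since the report does not show the date columns themselves.

```python
from datetime import date


def days_since_prescription(prescribed: date, event: date) -> int:
    """Return the number of days from the prescription date to an event date."""
    return (event - prescribed).days


# Hypothetical dates, for illustration only.
prescribed = date(2020, 1, 10)
collected = date(2020, 4, 9)

offset = days_since_prescription(prescribed, collected)
print(f"Specimen collected {offset} days after prescription")
```

A negative result would indicate a specimen collected before the prescription date.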

Alerts

A1C_VAL is highly overall correlated with PSA_X_VAL (High correlation)
PSA_L_VAL is highly overall correlated with PSA_X_VAL (High correlation)
PSA_X_VAL is highly overall correlated with A1C_VAL and 1 other field (High correlation)
PSA_X_VAL is highly imbalanced (84.4%) (Imbalance)
A1C_VAL has 33 (33.0%) missing values (Missing)
일련번호 has unique values (Unique)
PSA_L_VAL has 5 (5.0%) zeros (Zeros)
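
The 84.4% in the imbalance alert is not the share of the dominant value (95 of 100 rows). One way to arrive at it is sketched below, assuming the score is one minus the Shannon entropy of the value counts divided by the log of the number of classes. That formula is an assumption about how the profiler computes the score and is not stated in this report; the counts come from the PSA_X_VAL value table.

```python
import math


def imbalance_score(counts):
    """One minus normalised Shannon entropy: 0 is balanced, near 1 is dominated by one class."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# PSA_X_VAL: one value with 95 rows and five values with 1 row each.
psa_x_counts = [95, 1, 1, 1, 1, 1]
score = imbalance_score(psa_x_counts)
print(f"Imbalance: {score:.1%}")
```

Under this assumption the score reproduces the 84.4% shown in the alert.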

Reproduction

Analysis started: 2023-10-08 18:56:09.468877
Analysis finished: 2023-10-08 18:56:11.467073
Duration: 2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

일련번호
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0010750
2nd row: R0011300
3rd row: R0011754
4th row: R0012316
5th row: R0012507
Value              Count  Frequency (%)
r0010750           1      1.0%
r0028163           1      1.0%
r0030478           1      1.0%
r0030194           1      1.0%
r0029952           1      1.0%
r0029721           1      1.0%
r0029696           1      1.0%
r0029537           1      1.0%
r0029339           1      1.0%
r0028940           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      238    29.8%
R      100    12.5%
2      77     9.6%
3      71     8.9%
1      68     8.5%
9      51     6.4%
4      42     5.2%
6      42     5.2%
8      41     5.1%
7      35     4.4%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      238    34.0%
2      77     11.0%
3      71     10.1%
1      68     9.7%
9      51     7.3%
4      42     6.0%
6      42     6.0%
8      41     5.9%
7      35     5.0%
5      35     5.0%
Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      238    34.0%
2      77     11.0%
3      71     10.1%
1      68     9.7%
9      51     7.3%
4      42     6.0%
6      42     6.0%
8      41     5.9%
7      35     5.0%
5      35     5.0%
Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      238    29.8%
R      100    12.5%
2      77     9.6%
3      71     8.9%
1      68     8.5%
9      51     6.4%
4      42     5.2%
6      42     5.2%
8      41     5.1%
7      35     4.4%

A1C_VAL
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 47
Distinct (%): 70.1%
Missing: 33
Missing (%): 33.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4192537
Minimum: 0.02
Maximum: 25.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0.02
5-th percentile: 0.03
Q1: 0.065
Median: 0.19
Q3: 2.25
95-th percentile: 15.533
Maximum: 25.95
Range: 25.93
Interquartile range (IQR): 2.185

Descriptive statistics

Standard deviation: 5.4366351
Coefficient of variation (CV): 2.2472364
Kurtosis: 10.674644
Mean: 2.4192537
Median Absolute Deviation (MAD): 0.15
Skewness: 3.2712488
Sum: 162.09
Variance: 29.557001
Monotonicity: Not monotonic
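
Several of the A1C_VAL statistics are derived from one another, so they can be cross-checked without the raw data. The sketch below uses only figures printed in this report (67 non-missing values out of 100); the dictionary layout and tolerance are illustrative choices, not part of the profiler.

```python
import math

# Figures copied from the A1C_VAL tables in this report.
mean, std, total = 2.4192537, 5.4366351, 162.09
q1, q3 = 0.065, 2.25
minimum, maximum = 0.02, 25.95
n_present = 100 - 33  # observations minus missing values

# Each entry pairs a recomputed value with the value the report prints.
checks = {
    "mean = sum / n": (total / n_present, 2.4192537),
    "variance = std^2": (std ** 2, 29.557001),
    "CV = std / mean": (std / mean, 2.2472364),
    "IQR = Q3 - Q1": (q3 - q1, 2.185),
    "range = max - min": (maximum - minimum, 25.93),
}

results = {}
for name, (computed, reported) in checks.items():
    results[name] = math.isclose(computed, reported, rel_tol=1e-6)
    print(f"{name}: computed {computed:.7f}, reported {reported}")
```

The same relationships apply to the PSA_L_VAL statistics, with n = 100 because that column has no missing values.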
Histogram with fixed size bins (bins=47)
Value              Count  Frequency (%)
0.04               5      5.0%
0.12               4      4.0%
0.06               4      4.0%
0.02               3      3.0%
0.03               3      3.0%
0.05               2      2.0%
0.13               2      2.0%
0.21               2      2.0%
0.16               2      2.0%
0.29               2      2.0%
Other values (37)  38     38.0%
(Missing)          33     33.0%

Value  Count  Frequency (%)
0.02   3      3.0%
0.03   3      3.0%
0.04   5      5.0%
0.05   2      2.0%
0.06   4      4.0%
0.07   2      2.0%
0.09   1      1.0%
0.1    1      1.0%
0.11   1      1.0%
0.12   4      4.0%

Value  Count  Frequency (%)
25.95  1      1.0%
25.29  1      1.0%
18.89  1      1.0%
18.71  1      1.0%
8.12   1      1.0%
7.1    1      1.0%
6.2    1      1.0%
5.33   1      1.0%
5.13   1      1.0%
5.11   1      1.0%

PSA_L_VAL
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 88
Distinct (%): 88.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5688
Minimum: 0
Maximum: 45.62
Zeros: 5
Zeros (%): 5.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0095
Q1: 0.4575
Median: 1.25
Q3: 2.76
95-th percentile: 7.4695
Maximum: 45.62
Range: 45.62
Interquartile range (IQR): 2.3025

Descriptive statistics

Standard deviation: 5.2049375
Coefficient of variation (CV): 2.0262136
Kurtosis: 48.78677
Mean: 2.5688
Median Absolute Deviation (MAD): 0.94
Skewness: 6.3237092
Sum: 256.88
Variance: 27.091374
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.0                5      5.0%
2.23               2      2.0%
1.25               2      2.0%
0.41               2      2.0%
0.66               2      2.0%
0.67               2      2.0%
0.86               2      2.0%
0.57               2      2.0%
0.02               2      2.0%
3.38               1      1.0%
Other values (78)  78     78.0%

Value  Count  Frequency (%)
0.0    5      5.0%
0.01   1      1.0%
0.02   2      2.0%
0.03   1      1.0%
0.07   1      1.0%
0.1    1      1.0%
0.13   1      1.0%
0.17   1      1.0%
0.19   1      1.0%
0.21   1      1.0%

Value  Count  Frequency (%)
45.62  1      1.0%
20.37  1      1.0%
11.74  1      1.0%
8.91   1      1.0%
8.41   1      1.0%
7.42   1      1.0%
7.01   1      1.0%
5.75   1      1.0%
5.59   1      1.0%
5.41   1      1.0%

PSA_X_VAL
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   95     95.0%
0.09   1      1.0%
0.36   1      1.0%
0.82   1      1.0%
0.29   1      1.0%
0.23   1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     95     95.0%
0.09   1      1.0%
0.36   1      1.0%
0.82   1      1.0%
0.29   1      1.0%
0.23   1      1.0%

Interactions


Correlations

           일련번호  A1C_VAL  PSA_L_VAL  PSA_X_VAL
일련번호     1.000    1.000    1.000      1.000
A1C_VAL    1.000    1.000    0.000      NaN
PSA_L_VAL  1.000    0.000    1.000      NaN
PSA_X_VAL  1.000    NaN      NaN        1.000

           A1C_VAL  PSA_L_VAL  PSA_X_VAL
A1C_VAL    1.000    -0.084     1.000
PSA_L_VAL  -0.084   1.000      1.000
PSA_X_VAL  1.000    1.000      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

    일련번호   A1C_VAL  PSA_L_VAL  PSA_X_VAL
0   R0010750  0.09     3.38       <NA>
1   R0011300  0.1      0.01       <NA>
2   R0011754  0.04     4.32       <NA>
3   R0012316  0.18     1.59       <NA>
4   R0012507  0.19     1.15       <NA>
5   R0013226  0.23     3.91       <NA>
6   R0013701  0.46     0.41       <NA>
7   R0013987  <NA>     2.69       <NA>
8   R0014264  0.04     5.59       <NA>
9   R0014330  <NA>     2.97       <NA>

    일련번호   A1C_VAL  PSA_L_VAL  PSA_X_VAL
90  R0033869  <NA>     0.51       <NA>
91  R0034103  <NA>     0.0        <NA>
92  R0034187  18.89    7.42       <NA>
93  R0034330  25.29    0.21       <NA>
94  R0034337  0.76     1.17       <NA>
95  R0034515  0.02     2.02       <NA>
96  R0034640  <NA>     0.27       <NA>
97  R0034878  <NA>     1.8        <NA>
98  R0035004  0.11     0.0        <NA>
99  R0035066  0.02     5.41       <NA>