Overview

Dataset statistics

Number of variables: 6
Number of observations: 100
Missing cells: 102
Missing cells (%): 17.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.3 KiB
Average record size in memory: 54.3 B
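
The missing-cell percentage follows directly from the counts above. A minimal sketch, assuming the percentage is the missing cells divided by rows times columns (the helper name `missing_cell_pct` is illustrative, not part of ydata-profiling):

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells among all cells, as a percentage."""
    return 100.0 * missing_cells / (n_rows * n_cols)


# Figures from the statistics above: 100 observations, 6 variables, 102 missing cells.
pct = missing_cell_pct(102, 100, 6)
print(f"{pct:.1f}%")
```

Of the 102 missing cells, 100 come from the fully empty column Unnamed: 5 and 2 from BDWT, as the Alerts section reports.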

Variable types

Text: 1
Numeric: 4
Unsupported: 1

Dataset

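Description: Vital-sign data for diabetes patients at the time of first diagnosis, comprising anthropometric measurements such as height and weight, plus systolic and diastolic blood pressure. Body Mass Index (BMI) can be derived from the height and weight data, and hypertension status can be determined from the blood pressure data.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/diabetes_vital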
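
The two derived quantities named in the description can be sketched as follows. This is an illustration under stated assumptions, not the dataset author's method: BDHT is assumed to be height in centimetres and BDWT weight in kilograms, and the 140/90 mmHg cut-off is one common convention that the source does not specify. The function names are hypothetical.

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body Mass Index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def is_hypertensive(systolic: float, diastolic: float,
                    sys_cutoff: float = 140.0,
                    dia_cutoff: float = 90.0) -> bool:
    """Flag a reading as hypertensive if either value reaches its cut-off.

    The default cut-offs are an assumption; adjust them to the guideline in use.
    """
    return systolic >= sys_cutoff or diastolic >= dia_cutoff


# First row of the Sample section: BDHT=160.5, BDWT=61.4, SYSTOLIC=130, DIASTOLIC=70.
print(round(bmi(160.5, 61.4), 2))
print(is_hypertensive(130, 70))
```

Because BDWT has two missing values, a real pipeline would need to decide how to handle rows where BMI cannot be computed.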

Alerts

BDHT is highly overall correlated with BDWT (High correlation)
BDWT is highly overall correlated with BDHT (High correlation)
SYSTOLIC is highly overall correlated with DIASTOLIC (High correlation)
DIASTOLIC is highly overall correlated with SYSTOLIC (High correlation)
BDWT has 2 (2.0%) missing values (Missing)
Unnamed: 5 has 100 (100.0%) missing values (Missing)
RID has unique values (Unique)
Unnamed: 5 is an unsupported type; check if it needs cleaning or further analysis (Unsupported)
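
One way to act on the Unnamed: 5 alerts is to drop any column that has no values at all before profiling. A minimal standard-library sketch, assuming the records are held as a list of dictionaries (the helper `drop_empty_columns` is hypothetical, and the two rows are taken from the Sample section):

```python
def drop_empty_columns(rows):
    """Return copies of the rows without columns that are empty in every row."""
    if not rows:
        return []
    keep = [
        col for col in rows[0]
        if any(row[col] not in (None, "") for row in rows)
    ]
    return [{col: row[col] for col in keep} for row in rows]


rows = [
    {"RID": "R0000001", "BDHT": 160.5, "BDWT": 61.4,
     "SYSTOLIC": 130, "DIASTOLIC": 70, "Unnamed: 5": None},
    {"RID": "R0000002", "BDHT": 150.6, "BDWT": 62.6,
     "SYSTOLIC": 110, "DIASTOLIC": 60, "Unnamed: 5": None},
]
cleaned = drop_empty_columns(rows)
print(list(cleaned[0]))
```

A column like this often comes from a trailing delimiter in the source file, though the report itself does not say where it originated.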

Reproduction

Analysis started: 2023-10-08 18:56:05.636806
Analysis finished: 2023-10-08 18:56:12.896214
Duration: 7.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
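
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below is not how the report was produced; it only shows the idea on two RID values. The category code `Lu` stands for Uppercase Letter and `Nd` for Decimal Number.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the values."""
    return Counter(
        unicodedata.category(ch) for value in values for ch in value
    )


# Two identifiers in the same format as the RID column.
counts = category_counts(["R0000001", "R0000002"])
print(dict(counts))
```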

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000002
3rd row: R0000003
4th row: R0000004
5th row: R0000005

Value              Count  Frequency (%)
r0000001           1      1.0%
r0000063           1      1.0%
r0000074           1      1.0%
r0000073           1      1.0%
r0000072           1      1.0%
r0000071           1      1.0%
r0000070           1      1.0%
r0000069           1      1.0%
r0000068           1      1.0%
r0000067           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      519    74.1%
1      21     3.0%
3      20     2.9%
4      20     2.9%
5      20     2.9%
6      20     2.9%
7      20     2.9%
8      20     2.9%
9      20     2.9%
2      20     2.9%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      519    64.9%
R      100    12.5%
1      21     2.6%
3      20     2.5%
4      20     2.5%
5      20     2.5%
6      20     2.5%
7      20     2.5%
8      20     2.5%
9      20     2.5%

BDHT
Real number (ℝ)

HIGH CORRELATION 

Distinct: 61
Distinct (%): 61.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 162.037
Minimum: 142.9
Maximum: 181
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 142.9
5-th percentile: 147
Q1: 155.075
median: 162.5
Q3: 168.275
95-th percentile: 176.05
Maximum: 181
Range: 38.1
Interquartile range (IQR): 13.2
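
The last two entries are derived from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A short check using the BDHT figures listed above (variable names are illustrative):

```python
# BDHT quantile statistics as reported above.
minimum, q1, q3, maximum = 142.9, 155.075, 168.275, 181.0

value_range = maximum - minimum   # spread of the full data
iqr = q3 - q1                     # spread of the middle 50% of the data

print(round(value_range, 3), round(iqr, 3))
```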

Descriptive statistics

Standard deviation: 8.9937538
Coefficient of variation (CV): 0.055504322
Kurtosis: -0.82054362
Mean: 162.037
Median Absolute Deviation (MAD): 6.35
Skewness: 0.00051419387
Sum: 16203.7
Variance: 80.887607
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
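
Several of the descriptive statistics for BDHT are related to one another: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A minimal consistency check on the reported figures (variable names are illustrative):

```python
# BDHT descriptive statistics as reported above; BDHT has no missing values.
n_obs = 100
total = 16203.7
std_dev = 8.9937538

mean = total / n_obs        # arithmetic mean
variance = std_dev ** 2     # variance is the square of the standard deviation
cv = std_dev / mean         # coefficient of variation (unitless)

print(round(mean, 3), round(variance, 4), round(cv, 6))
```

Small differences in the last digits are expected because the reported standard deviation is itself rounded.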

Common values
Value              Count  Frequency (%)
165.0              6      6.0%
163.0              5      5.0%
172.0              4      4.0%
158.0              4      4.0%
168.0              4      4.0%
160.0              4      4.0%
147.0              4      4.0%
153.0              3      3.0%
157.0              3      3.0%
174.0              3      3.0%
Other values (51)  60     60.0%

Minimum 10 values
Value  Count  Frequency (%)
142.9  1      1.0%
146.0  1      1.0%
147.0  4      4.0%
148.0  1      1.0%
149.0  2      2.0%
149.2  1      1.0%
150.0  2      2.0%
150.6  2      2.0%
151.0  1      1.0%
152.0  1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
181.0  1      1.0%
179.5  1      1.0%
178.0  2      2.0%
177.0  1      1.0%
176.0  1      1.0%
175.5  1      1.0%
175.0  1      1.0%
174.9  1      1.0%
174.0  3      3.0%
173.0  1      1.0%

BDWT
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 64
Distinct (%): 65.3%
Missing: 2
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.109694
Minimum: 43
Maximum: 135
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB
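
Because BDWT has missing values, its percentages use two different denominators. The reported figures are consistent with the missing percentage being taken over all 100 rows, while the distinct percentage and the mean are taken over the 98 non-missing values. A small sketch of that arithmetic (the sum of 6478.75 comes from the descriptive statistics for BDWT; variable names are illustrative):

```python
# BDWT figures as reported in this section.
n_rows = 100
n_missing = 2
n_distinct = 64
total = 6478.75

n_valid = n_rows - n_missing                 # 98 usable observations
missing_pct = 100.0 * n_missing / n_rows     # relative to all rows
distinct_pct = 100.0 * n_distinct / n_valid  # relative to non-missing values
mean = total / n_valid                       # missing values are excluded

print(round(missing_pct, 1), round(distinct_pct, 1), round(mean, 6))
```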

Quantile statistics

Minimum: 43
5-th percentile: 47.485
Q1: 57
median: 63
Q3: 74.75
95-th percentile: 84.9425
Maximum: 135
Range: 92
Interquartile range (IQR): 17.75

Descriptive statistics

Standard deviation: 14.279624
Coefficient of variation (CV): 0.21599894
Kurtosis: 4.9732653
Mean: 66.109694
Median Absolute Deviation (MAD): 9
Skewness: 1.5526674
Sum: 6478.75
Variance: 203.90766
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value              Count  Frequency (%)
63.0               5      5.0%
68.0               5      5.0%
62.0               5      5.0%
59.0               3      3.0%
74.0               3      3.0%
72.0               3      3.0%
57.0               3      3.0%
76.0               3      3.0%
61.4               2      2.0%
75.0               2      2.0%
Other values (54)  64     64.0%

Minimum 10 values
Value  Count  Frequency (%)
43.0   1      1.0%
45.7   1      1.0%
47.0   2      2.0%
47.4   1      1.0%
47.5   1      1.0%
48.0   1      1.0%
48.7   1      1.0%
49.1   1      1.0%
49.5   1      1.0%
50.0   1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
135.0  1      1.0%
109.4  1      1.0%
98.1   1      1.0%
94.0   1      1.0%
90.0   1      1.0%
84.05  1      1.0%
84.0   1      1.0%
83.5   1      1.0%
83.0   2      2.0%
82.1   1      1.0%

SYSTOLIC
Real number (ℝ)

HIGH CORRELATION 

Distinct: 41
Distinct (%): 41.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.46
Minimum: 95
Maximum: 191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 95
5-th percentile: 104.95
Q1: 115.75
median: 123
Q3: 139.25
95-th percentile: 155.25
Maximum: 191
Range: 96
Interquartile range (IQR): 23.5

Descriptive statistics

Standard deviation: 17.443948
Coefficient of variation (CV): 0.13685821
Kurtosis: 0.97937086
Mean: 127.46
Median Absolute Deviation (MAD): 11.5
Skewness: 0.81495383
Sum: 12746
Variance: 304.29131
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)

Common values
Value              Count  Frequency (%)
110                15     15.0%
120                12     12.0%
130                9      9.0%
140                5      5.0%
131                4      4.0%
134                4      4.0%
117                3      3.0%
119                3      3.0%
115                3      3.0%
122                3      3.0%
Other values (31)  39     39.0%

Minimum 10 values
Value  Count  Frequency (%)
95     1      1.0%
96     1      1.0%
99     1      1.0%
100    1      1.0%
104    1      1.0%
105    1      1.0%
106    1      1.0%
110    15     15.0%
115    3      3.0%
116    1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
191    1      1.0%
174    1      1.0%
166    1      1.0%
160    2      2.0%
155    1      1.0%
154    2      2.0%
152    1      1.0%
151    2      2.0%
150    2      2.0%
149    1      1.0%

DIASTOLIC
Real number (ℝ)

HIGH CORRELATION 

Distinct: 33
Distinct (%): 33.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.1
Minimum: 60
Maximum: 119
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 60
5-th percentile: 60
Q1: 69.75
median: 75.5
Q3: 80
95-th percentile: 97.05
Maximum: 119
Range: 59
Interquartile range (IQR): 10.25

Descriptive statistics

Standard deviation: 11.31594
Coefficient of variation (CV): 0.1486983
Kurtosis: 1.5096128
Mean: 76.1
Median Absolute Deviation (MAD): 5.5
Skewness: 1.0076088
Sum: 7610
Variance: 128.05051
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)

Common values
Value              Count  Frequency (%)
70                 18     18.0%
80                 17     17.0%
60                 8      8.0%
82                 4      4.0%
67                 4      4.0%
76                 3      3.0%
69                 3      3.0%
64                 3      3.0%
95                 3      3.0%
72                 3      3.0%
Other values (23)  34     34.0%

Minimum 10 values
Value  Count  Frequency (%)
60     8      8.0%
62     1      1.0%
63     2      2.0%
64     3      3.0%
65     2      2.0%
66     2      2.0%
67     4      4.0%
69     3      3.0%
70     18     18.0%
71     2      2.0%

Maximum 10 values
Value  Count  Frequency (%)
119    1      1.0%
109    1      1.0%
99     2      2.0%
98     1      1.0%
97     1      1.0%
95     3      3.0%
94     1      1.0%
91     1      1.0%
90     2      2.0%
89     1      1.0%

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 100
Missing (%): 100.0%
Memory size: 1.0 KiB

Interactions

[16 interaction plots; image content not recoverable]

Correlations

           RID    BDHT   BDWT   SYSTOLIC  DIASTOLIC
RID        1.000  1.000  1.000  1.000     1.000
BDHT       1.000  1.000  0.534  0.000     0.440
BDWT       1.000  0.534  1.000  0.000     0.270
SYSTOLIC   1.000  0.000  0.000  1.000     0.770
DIASTOLIC  1.000  0.440  0.270  0.770     1.000

           BDHT   BDWT   SYSTOLIC  DIASTOLIC
BDHT       1.000  0.606  0.244     0.247
BDWT       0.606  1.000  0.252     0.261
SYSTOLIC   0.244  0.252  1.000     0.630
DIASTOLIC  0.247  0.261  0.630     1.000
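
The extracted text does not say which correlation measure each matrix uses, so the following is only a generic illustration of how one pairwise coefficient can be computed. It implements the Pearson coefficient in plain Python and applies it to the SYSTOLIC and DIASTOLIC values of the first ten rows in the Sample section; the result is not expected to match the matrix entries, which are based on all 100 rows and possibly a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# SYSTOLIC and DIASTOLIC for rows 0-9 of the Sample section.
systolic = [130, 110, 120, 146, 130, 116, 145, 122, 110, 120]
diastolic = [70, 60, 80, 76, 80, 70, 65, 64, 60, 60]
r = pearson(systolic, diastolic)
print(round(r, 3))
```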

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

    RID       BDHT   BDWT   SYSTOLIC  DIASTOLIC  Unnamed: 5
0   R0000001  160.5  61.4   130       70         <NA>
1   R0000002  150.6  62.6   110       60         <NA>
2   R0000003  172.0  73.6   120       80         <NA>
3   R0000004  168.7  63.0   146       76         <NA>
4   R0000005  152.0  68.0   130       80         <NA>
5   R0000006  165.0  75.6   116       70         <NA>
6   R0000007  157.0  79.0   145       65         <NA>
7   R0000008  158.0  68.0   122       64         <NA>
8   R0000009  163.0  52.0   110       60         <NA>
9   R0000010  165.0  72.0   120       60         <NA>

    RID       BDHT   BDWT   SYSTOLIC  DIASTOLIC  Unnamed: 5
90  R0000091  179.5  94.0   151       109        <NA>
91  R0000092  165.0  63.0   174       119        <NA>
92  R0000093  172.0  72.0   131       81         <NA>
93  R0000094  168.5  75.25  160       90         <NA>
94  R0000095  174.0  66.0   134       88         <NA>
95  R0000096  168.0  68.0   139       99         <NA>
96  R0000097  160.0  50.0   117       67         <NA>
97  R0000098  147.0  63.0   110       70         <NA>
98  R0000099  174.0  74.0   130       80         <NA>
99  R0000100  168.2  78.2   119       79         <NA>