Overview

Dataset statistics

Number of variables: 8
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.1 KiB
Average record size in memory: 72.3 B

Variable types

Numeric: 4
Categorical: 4

Dataset

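Description: Laboratory test records of diabetes patients, produced in OMOP CDM format
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/diabetes_measurement_2020-omop-cdm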

Alerts

measurement_type_concept_id has a constant value (44818702) [Constant]
unit_concept_id is highly overall correlated with unit_source_value [High correlation]
unit_source_value is highly overall correlated with measurement_concept_id and 2 other fields [High correlation]
measurement_id is highly overall correlated with measurement_date [High correlation]
measurement_concept_id is highly overall correlated with unit_source_value [High correlation]
measurement_date is highly overall correlated with measurement_id [High correlation]
operator_concept_id is highly overall correlated with unit_source_value [High correlation]
unit_concept_id is highly imbalanced (91.9%) [Imbalance]
measurement_id has unique values [Unique]

Reproduction

Analysis started: 2023-10-08 18:56:34.184416
Analysis finished: 2023-10-08 18:56:39.266296
Duration: 5.08 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
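
The reported duration can be cross-checked from the two timestamps above. The following is a minimal sketch using only the Python standard library; the function name duration_seconds is a hypothetical helper, not part of ydata-profiling.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
STARTED = "2023-10-08 18:56:34.184416"
FINISHED = "2023-10-08 18:56:39.266296"
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Return the elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


# Rounded to two decimals, this matches the "Duration" entry.
print(f"{duration_seconds(STARTED, FINISHED):.2f} seconds")
```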

Variables

measurement_id
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.6291679 × 10^8
Minimum: 1.5345084 × 10^8
Maximum: 3.7207922 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1.5345084 × 10^8
5th percentile: 1.5351941 × 10^8
Q1: 1.6105586 × 10^8
Median: 3.0207249 × 10^8
Q3: 3.2170486 × 10^8
95th percentile: 3.5887188 × 10^8
Maximum: 3.7207922 × 10^8
Range: 2.1862838 × 10^8
Interquartile range (IQR): 1.6064899 × 10^8

Descriptive statistics

Standard deviation: 77747132
Coefficient of variation (CV): 0.29571003
Kurtosis: -1.4237216
Mean: 2.6291679 × 10^8
Median Absolute Deviation (MAD): 27392429
Skewness: -0.47362182
Sum: 2.6291679 × 10^10
Variance: 6.0446165 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
290835079           1      1.0%
302597039           1      1.0%
153519411           1      1.0%
153714485           1      1.0%
153796045           1      1.0%
357611797           1      1.0%
153609883           1      1.0%
305401595           1      1.0%
324986769           1      1.0%
303486949           1      1.0%
Other values (90)   90     90.0%

Smallest values

Value               Count  Frequency (%)
153450843           1      1.0%
153450844           1      1.0%
153450851           1      1.0%
153450852           1      1.0%
153519410           1      1.0%
153519411           1      1.0%
153519418           1      1.0%
153519419           1      1.0%
153609882           1      1.0%
153609883           1      1.0%

Largest values

Value               Count  Frequency (%)
372079224           1      1.0%
372079223           1      1.0%
372079213           1      1.0%
366916824           1      1.0%
366916823           1      1.0%
358448458           1      1.0%
358448456           1      1.0%
357611806           1      1.0%
357611805           1      1.0%
357611797           1      1.0%

measurement_concept_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3014097.5
Minimum: 3004410
Maximum: 3036887
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 3004410
5th percentile: 3006923
Q1: 3006923
Median: 3013721
Q3: 3016723
95th percentile: 3036887
Maximum: 3036887
Range: 32477
Interquartile range (IQR): 9800

Descriptive statistics

Standard deviation: 7248.7998
Coefficient of variation (CV): 0.0024049652
Kurtosis: 4.5441326
Mean: 3014097.5
Median Absolute Deviation (MAD): 3002
Skewness: 1.9739808
Sum: 3.0140975 × 10^8
Variance: 52545099
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value               Count  Frequency (%)
3006923             25     25.0%
3013721             24     24.0%
3016723             21     21.0%
3013682             20     20.0%
3036887             7      7.0%
3009966             2      2.0%
3004410             1      1.0%

Smallest values

Value               Count  Frequency (%)
3004410             1      1.0%
3006923             25     25.0%
3009966             2      2.0%
3013682             20     20.0%
3013721             24     24.0%
3016723             21     21.0%
3036887             7      7.0%

Largest values

Value               Count  Frequency (%)
3036887             7      7.0%
3016723             21     21.0%
3013721             24     24.0%
3013682             20     20.0%
3009966             2      2.0%
3006923             25     25.0%
3004410             1      1.0%

measurement_date
Real number (ℝ)

HIGH CORRELATION 

Distinct: 21
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201757.6
Minimum: 201503
Maximum: 202002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 201503
5th percentile: 201602
Q1: 201707
Median: 201710
Q3: 201812
95th percentile: 201908.15
Maximum: 202002
Range: 499
Interquartile range (IQR): 105

Descriptive statistics

Standard deviation: 111.43128
Coefficient of variation (CV): 0.00055230274
Kurtosis: -0.16302157
Mean: 201757.6
Median Absolute Deviation (MAD): 97.5
Skewness: -0.095374758
Sum: 20175760
Variance: 12416.929
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)

Common Values

Value               Count  Frequency (%)
201707              24     24.0%
201812              9      9.0%
201802              7      7.0%
201602              6      6.0%
201908              6      6.0%
201610              5      5.0%
201708              5      5.0%
201905              4      4.0%
201503              4      4.0%
201901              4      4.0%
Other values (11)   26     26.0%

Smallest values

Value               Count  Frequency (%)
201503              4      4.0%
201602              6      6.0%
201606              1      1.0%
201610              5      5.0%
201707              24     24.0%
201708              5      5.0%
201709              2      2.0%
201710              4      4.0%
201802              7      7.0%
201803              4      4.0%

Largest values

Value               Count  Frequency (%)
202002              3      3.0%
201911              2      2.0%
201908              6      6.0%
201905              4      4.0%
201901              4      4.0%
201812              9      9.0%
201811              3      3.0%
201809              1      1.0%
201808              2      2.0%
201807              1      1.0%
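
The measurement_date values (201503 to 202002) look like year-month codes stored as integers, which would explain why the report types the column as a real number and why figures such as a range of 499 or a 95th percentile of 201908.15 are not calendar quantities. The sketch below assumes a YYYYMM encoding (an assumption drawn from the value range, not stated in the report) and converts the codes to dates before measuring the span; parse_yyyymm and months_between are hypothetical helper names.

```python
from datetime import date


def parse_yyyymm(value: int) -> date:
    """Convert an integer such as 201707 to the first day of that month.

    Assumes a YYYYMM encoding; raises ValueError if the month is not 1..12.
    """
    year, month = divmod(value, 100)
    return date(year, month, 1)


def months_between(start: int, end: int) -> int:
    """Number of calendar months from one YYYYMM code to another."""
    s, e = parse_yyyymm(start), parse_yyyymm(end)
    return (e.year - s.year) * 12 + (e.month - s.month)


# Minimum and maximum copied from the measurement_date section.
print(parse_yyyymm(201503), parse_yyyymm(202002))
print(months_between(201503, 202002))  # calendar span, versus a numeric range of 499
```

Under this assumption, converting the column to a date type before profiling would give date-aware statistics instead of arithmetic on the integer codes.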

measurement_type_concept_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
44818702: 100

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 44818702
2nd row: 44818702
3rd row: 44818702
4th row: 44818702
5th row: 44818702

Common Values

Value               Count  Frequency (%)
44818702            100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
44818702            100    100.0%

operator_concept_id
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
<NA>: 61
4172704: 24
4171756: 15

Length

Max length: 7
Median length: 4
Mean length: 5.17
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: 4172704

Common Values

Value               Count  Frequency (%)
<NA>                61     61.0%
4172704             24     24.0%
4171756             15     15.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
na                  61     61.0%
4172704             24     24.0%
4171756             15     15.0%
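
The operator_concept_id section reports 0 missing values even though 61 of the 100 entries read <NA>. The reported mean length of 5.17 is reproduced when "<NA>" is counted as an ordinary four-character string, which suggests (as an inference from these figures, not something the report states) that the placeholder was profiled as a category rather than as a null. A minimal sketch of that arithmetic:

```python
# Category counts copied from the operator_concept_id section.
counts = {"<NA>": 61, "4172704": 24, "4171756": 15}

total = sum(counts.values())

# Mean string length, treating "<NA>" as a literal 4-character value.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

# Share of rows that would be missing if "<NA>" were read as a null instead.
missing_share = counts["<NA>"] / total

print(f"mean length = {mean_length:.2f}")
print(f"missing if <NA> is null = {missing_share:.1%}")
```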

value_as_number
Real number (ℝ)

Distinct: 35
Distinct (%): 35.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.16
Minimum: 0
Maximum: 307
Zeros: 1
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 8
Median: 13
Q3: 18.25
95th percentile: 167.1
Maximum: 307
Range: 307
Interquartile range (IQR): 10.25

Descriptive statistics

Standard deviation: 55.306076
Coefficient of variation (CV): 1.8966418
Kurtosis: 10.832609
Mean: 29.16
Median Absolute Deviation (MAD): 5
Skewness: 3.2996582
Sum: 2916
Variance: 3058.762
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=35)

Common Values

Value               Count  Frequency (%)
1                   15     15.0%
15                  10     10.0%
16                  6      6.0%
12                  5      5.0%
8                   5      5.0%
13                  5      5.0%
2                   4      4.0%
11                  4      4.0%
14                  4      4.0%
9                   4      4.0%
Other values (25)   38     38.0%

Smallest values

Value               Count  Frequency (%)
0                   1      1.0%
1                   15     15.0%
2                   4      4.0%
4                   1      1.0%
7                   3      3.0%
8                   5      5.0%
9                   4      4.0%
10                  4      4.0%
11                  4      4.0%
12                  5      5.0%

Largest values

Value               Count  Frequency (%)
307                 1      1.0%
259                 1      1.0%
212                 1      1.0%
199                 1      1.0%
188                 1      1.0%
166                 1      1.0%
162                 1      1.0%
130                 1      1.0%
95                  1      1.0%
66                  1      1.0%
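
Several of the statistics reported for value_as_number are derived from others, so they can be checked against each other. The sketch below recomputes the range, interquartile range, coefficient of variation, variance, and sum from the reported minimum, maximum, quartiles, mean, and standard deviation; the inputs are copied from this report and are rounded, so agreement is only expected to within rounding.

```python
import math

# Values copied from the value_as_number section (rounded as reported).
n = 100
mean = 29.16
std = 55.306076
minimum, maximum = 0, 307
q1, q3 = 8, 18.25

value_range = maximum - minimum   # reported: 307
iqr = q3 - q1                     # reported: 10.25
cv = std / mean                   # reported: 1.8966418
variance = std ** 2               # reported: 3058.762
total = mean * n                  # reported: 2916

print(value_range, iqr)
print(round(cv, 7), round(variance, 3), round(total, 2))

# A CV well above 1 together with a median (13) far below the mean (29.16)
# is consistent with the strong right skew reported for this variable.
assert math.isclose(total, 2916)
```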

unit_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
0: 99
8554: 1

Length

Max length: 4
Median length: 1
Mean length: 1.03
Min length: 1

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value               Count  Frequency (%)
0                   99     99.0%
8554                1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
0                   99     99.0%
8554                1      1.0%
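
The 91.9% imbalance figure for unit_concept_id can be reproduced from the category counts (99 and 1) as one minus the Shannon entropy normalized by its maximum for the number of classes. Whether ydata-profiling computes the score exactly this way is an assumption; the sketch only shows that this formula matches the reported value. The name imbalance_score is a hypothetical helper.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes.

    0.0 means perfectly balanced; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen here: a single class is not scored
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts copied from the unit_concept_id section: value 0 (99 rows), 8554 (1 row).
print(f"{imbalance_score([99, 1]):.1%}")
```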

unit_source_value
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
㎎/㎗: 50
U/ℓ: 49
%: 1

Length

Max length: 3
Median length: 3
Mean length: 2.98
Min length: 1

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: U/ℓ
2nd row: U/ℓ
3rd row: U/ℓ
4th row: U/ℓ
5th row: U/ℓ

Common Values

Value               Count  Frequency (%)
㎎/㎗               50     50.0%
U/ℓ                 49     49.0%
%                   1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
㎎/㎗               50     50.0%
u/ℓ                 49     49.0%
(blank)             1      1.0%
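
The report distinguishes "Distinct" from "Unique". The figures here are consistent with "Distinct" counting the different values present and "Unique" counting the values that occur exactly once: unit_source_value has 3 distinct values and 1 unique value (%), while operator_concept_id has 3 distinct values and none that occur only once. The sketch below applies that reading to the counts copied from this report; the helper name distinct_and_unique is hypothetical.

```python
from collections import Counter


def distinct_and_unique(counts: Counter) -> tuple[int, int]:
    """Return (number of distinct values, number of values seen exactly once)."""
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Counts copied from the unit_source_value and operator_concept_id sections.
unit_source_value = Counter({"㎎/㎗": 50, "U/ℓ": 49, "%": 1})
operator_concept_id = Counter({"<NA>": 61, "4172704": 24, "4171756": 15})

print(distinct_and_unique(unit_source_value))
print(distinct_and_unique(operator_concept_id))
```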

Interactions

[16 interaction plots; images not included in this text version]

Correlations

                        measurement_id  measurement_concept_id  measurement_date  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
measurement_id          1.000           0.210                   0.890             0.587                0.466            0.000            0.000
measurement_concept_id  0.210           1.000                   0.000             0.342                0.797            NaN              0.610
measurement_date        0.890           0.000                   1.000             0.414                0.383            0.000            0.000
operator_concept_id     0.587           0.342                   0.414             1.000                0.088            0.000            0.460
value_as_number         0.466           0.797                   0.383             0.088                1.000            0.000            0.000
unit_concept_id         0.000           NaN                     0.000             0.000                0.000            1.000            1.000
unit_source_value       0.000           0.610                   0.000             0.460                0.000            1.000            1.000

                     unit_concept_id  operator_concept_id  unit_source_value
unit_concept_id      1.000            0.000                0.995
operator_concept_id  0.000            1.000                0.700
unit_source_value    0.995            0.700                1.000

                        measurement_id  measurement_concept_id  measurement_date  value_as_number  operator_concept_id  unit_concept_id  unit_source_value
measurement_id          1.000           0.011                   0.758             0.214            0.399                0.000            0.000
measurement_concept_id  0.011           1.000                   -0.047            -0.131           0.318                0.000            0.504
measurement_date        0.758           -0.047                  1.000             0.019            0.408                0.000            0.000
value_as_number         0.214           -0.131                  0.019             1.000            0.059                0.000            0.000
operator_concept_id     0.399           0.318                   0.408             0.059            1.000                0.000            0.700
unit_concept_id         0.000           0.000                   0.000             0.000            0.000                1.000            0.995
unit_source_value       0.000           0.504                   0.000             0.000            0.700                0.995            1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    measurement_id  measurement_concept_id  measurement_date  measurement_type_concept_id  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
0   290835079       3006923                 201610            44818702                     <NA>                 28               0                U/ℓ
1   310590893       3006923                 201803            44818702                     <NA>                 13               0                U/ℓ
2   323081168       3006923                 201812            44818702                     <NA>                 20               0                U/ℓ
3   285208182       3013721                 201606            44818702                     <NA>                 23               0                U/ℓ
4   281052028       3013721                 201602            44818702                     4172704              50               0                U/ℓ
5   281052029       3006923                 201602            44818702                     4172704              95               0                U/ℓ
6   290835078       3013721                 201610            44818702                     <NA>                 19               0                U/ℓ
7   358448458       3036887                 201908            44818702                     4172704              307              0                ㎎/㎗
8   281052034       3036887                 201602            44818702                     4172704              259              0                ㎎/㎗
9   317072245       3004410                 201807            44818702                     4172704              9                8554             %

Last rows

    measurement_id  measurement_concept_id  measurement_date  measurement_type_concept_id  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
90  319151025       3016723                 201809            44818702                     4172704              2                0                ㎎/㎗
91  357611796       3013682                 201908            44818702                     4172704              25               0                ㎎/㎗
92  372079213       3016723                 202002            44818702                     4172704              2                0                ㎎/㎗
93  321704853       3013682                 201811            44818702                     4172704              43               0                ㎎/㎗
94  313911078       3013682                 201805            44818702                     4172704              43               0                ㎎/㎗
95  172882680       3013682                 201812            44818702                     4172704              30               0                ㎎/㎗
96  302072482       3016723                 201708            44818702                     4172704              1                0                ㎎/㎗
97  305401589       3016723                 201710            44818702                     4172704              1                0                ㎎/㎗
98  153450844       3016723                 201707            44818702                     <NA>                 1                0                ㎎/㎗
99  153683657       3016723                 201707            44818702                     <NA>                 1                0                ㎎/㎗