Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 7.1 KiB |
Average record size in memory | 72.3 B |
Variable types
Numeric | 4 |
---|---|
Categorical | 4 |
Dataset
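Description | Laboratory test records of patients with diabetes, produced in OMOP CDM format |
---|---|
Author | The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원) |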
URL | http://cmcdata.net/data/dataset/diabetes_measurement_2020-omop-cdm |
Alerts
measurement_type_concept_id has constant value "44818702" | Constant |
unit_concept_id is highly overall correlated with unit_source_value | High correlation |
unit_source_value is highly overall correlated with measurement_concept_id and 2 other fields | High correlation |
measurement_id is highly overall correlated with measurement_date | High correlation |
measurement_concept_id is highly overall correlated with unit_source_value | High correlation |
measurement_date is highly overall correlated with measurement_id | High correlation |
operator_concept_id is highly overall correlated with unit_source_value | High correlation |
unit_concept_id is highly imbalanced (91.9%) | Imbalance |
measurement_id has unique values | Unique |
Reproduction
Analysis started | 2023-10-08 18:56:34.184416 |
---|---|
Analysis finished | 2023-10-08 18:56:39.266296 |
Duration | 5.08 seconds |
Software version | ydata-profiling v4.5.1 |
Variables
measurement_id
Real number (ℝ)
HIGH CORRELATION
UNIQUE
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.6291679 × 10⁸ |
Minimum | 1.5345084 × 10⁸ |
---|---|
Maximum | 3.7207922 × 10⁸ |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1.5345084 × 10⁸ |
---|---|
5-th percentile | 1.5351941 × 10⁸ |
Q1 | 1.6105586 × 10⁸ |
median | 3.0207249 × 10⁸ |
Q3 | 3.2170486 × 10⁸ |
95-th percentile | 3.5887188 × 10⁸ |
Maximum | 3.7207922 × 10⁸ |
Range | 2.1862838 × 10⁸ |
Interquartile range (IQR) | 1.6064899 × 10⁸ |
Descriptive statistics
Standard deviation | 77747132 |
---|---|
Coefficient of variation (CV) | 0.29571003 |
Kurtosis | -1.4237216 |
Mean | 2.6291679 × 10⁸ |
Median Absolute Deviation (MAD) | 27392429 |
Skewness | -0.47362182 |
Sum | 2.6291679 × 10¹⁰ |
Variance | 6.0446165 × 10¹⁵ |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
---|---|---|
290835079 | 1 | 1.0% |
302597039 | 1 | 1.0% |
153519411 | 1 | 1.0% |
153714485 | 1 | 1.0% |
153796045 | 1 | 1.0% |
357611797 | 1 | 1.0% |
153609883 | 1 | 1.0% |
305401595 | 1 | 1.0% |
324986769 | 1 | 1.0% |
303486949 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
153450843 | 1 | 1.0% |
153450844 | 1 | 1.0% |
153450851 | 1 | 1.0% |
153450852 | 1 | 1.0% |
153519410 | 1 | 1.0% |
153519411 | 1 | 1.0% |
153519418 | 1 | 1.0% |
153519419 | 1 | 1.0% |
153609882 | 1 | 1.0% |
153609883 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
372079224 | 1 | 1.0% |
372079223 | 1 | 1.0% |
372079213 | 1 | 1.0% |
366916824 | 1 | 1.0% |
366916823 | 1 | 1.0% |
358448458 | 1 | 1.0% |
358448456 | 1 | 1.0% |
357611806 | 1 | 1.0% |
357611805 | 1 | 1.0% |
357611797 | 1 | 1.0% |
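measurement_id is the primary key of the OMOP CDM MEASUREMENT table. That means the only informative checks here are the UNIQUE badge and the monotonicity line; the mean, skewness and quantiles are just arithmetic over identifiers. The minimal sketch below runs both checks on the 20 IDs visible in this report's first-rows and last-rows samples, not on the full column. The helpers `is_unique` and `monotonicity` are hypothetical, and their labels mirror the report's wording rather than ydata-profiling's internal code.

```python
# measurement_id values copied from the first-rows and last-rows samples of
# this report (20 of the 100 rows); the full column is not reproduced here.
SAMPLE_IDS = [
    290835079, 310590893, 323081168, 285208182, 281052028,
    281052029, 290835078, 358448458, 281052034, 317072245,
    319151025, 357611796, 372079213, 321704853, 313911078,
    172882680, 302072482, 305401589, 153450844, 153683657,
]


def is_unique(values):
    """True when no value repeats (the property behind the UNIQUE badge)."""
    return len(set(values)) == len(values)


def monotonicity(values):
    """Label a sequence by comparing each element with the next one."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(is_unique(SAMPLE_IDS))
print(monotonicity(SAMPLE_IDS))
```

Sorting the IDs first would make the sequence strictly increasing. This is why "Not monotonic" says something about row order in the export, not about the IDs themselves.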
measurement_concept_id
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 7 |
---|---|
Distinct (%) | 7.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3014097.5 |
Minimum | 3004410 |
---|---|
Maximum | 3036887 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 3004410 |
---|---|
5-th percentile | 3006923 |
Q1 | 3006923 |
median | 3013721 |
Q3 | 3016723 |
95-th percentile | 3036887 |
Maximum | 3036887 |
Range | 32477 |
Interquartile range (IQR) | 9800 |
Descriptive statistics
Standard deviation | 7248.7998 |
---|---|
Coefficient of variation (CV) | 0.0024049652 |
Kurtosis | 4.5441326 |
Mean | 3014097.5 |
Median Absolute Deviation (MAD) | 3002 |
Skewness | 1.9739808 |
Sum | 3.0140975 × 10⁸ |
Variance | 52545099 |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
---|---|---|
3006923 | 25 | 25.0% |
3013721 | 24 | 24.0% |
3016723 | 21 | 21.0% |
3013682 | 20 | 20.0% |
3036887 | 7 | 7.0% |
3009966 | 2 | 2.0% |
3004410 | 1 | 1.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
3004410 | 1 | 1.0% |
3006923 | 25 | 25.0% |
3009966 | 2 | 2.0% |
3013682 | 20 | 20.0% |
3013721 | 24 | 24.0% |
3016723 | 21 | 21.0% |
3036887 | 7 | 7.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
3036887 | 7 | 7.0% |
3016723 | 21 | 21.0% |
3013721 | 24 | 24.0% |
3013682 | 20 | 20.0% |
3009966 | 2 | 2.0% |
3006923 | 25 | 25.0% |
3004410 | 1 | 1.0% |
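measurement_concept_id holds OMOP concept identifiers, so its mean, standard deviation and skewness describe ID arithmetic rather than any clinical quantity. As a sketch, the column can be rebuilt from the seven value counts listed above. The rebuilt column matches the reported mean (3014097.5, rounded) and sample standard deviation. The share of rows per concept is the more meaningful summary. `CONCEPT_COUNTS` and the other names are illustrative.

```python
import statistics
from collections import Counter

# Value counts of measurement_concept_id as listed in this report.
CONCEPT_COUNTS = Counter({
    3006923: 25, 3013721: 24, 3016723: 21, 3013682: 20,
    3036887: 7, 3009966: 2, 3004410: 1,
})

# Rebuild the 100-row column from the counts.
concepts = list(CONCEPT_COUNTS.elements())
n = len(concepts)

# Treating concept IDs as numbers reproduces the report's summary values,
# but these are statistics of identifiers, not of measurements.
mean_id = statistics.mean(concepts)
sd_id = statistics.stdev(concepts)  # sample standard deviation (n - 1)

# Categorical view: percentage of rows per concept, most frequent first.
shares = {cid: 100 * cnt / n for cid, cnt in CONCEPT_COUNTS.most_common()}

print(n, round(mean_id, 2), round(sd_id, 4))
for cid, pct in shares.items():
    print(f"{cid}: {pct:.1f}%")
```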
measurement_date
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 21 |
---|---|
Distinct (%) | 21.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 201757.6 |
Minimum | 201503 |
---|---|
Maximum | 202002 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 201503 |
---|---|
5-th percentile | 201602 |
Q1 | 201707 |
median | 201710 |
Q3 | 201812 |
95-th percentile | 201908.15 |
Maximum | 202002 |
Range | 499 |
Interquartile range (IQR) | 105 |
Descriptive statistics
Standard deviation | 111.43128 |
---|---|
Coefficient of variation (CV) | 0.00055230274 |
Kurtosis | -0.16302157 |
Mean | 201757.6 |
Median Absolute Deviation (MAD) | 97.5 |
Skewness | -0.095374758 |
Sum | 20175760 |
Variance | 12416.929 |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
---|---|---|
201707 | 24 | 24.0% |
201812 | 9 | 9.0% |
201802 | 7 | 7.0% |
201602 | 6 | 6.0% |
201908 | 6 | 6.0% |
201610 | 5 | 5.0% |
201708 | 5 | 5.0% |
201905 | 4 | 4.0% |
201503 | 4 | 4.0% |
201901 | 4 | 4.0% |
Other values (11) | 26 | 26.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
201503 | 4 | 4.0% |
201602 | 6 | 6.0% |
201606 | 1 | 1.0% |
201610 | 5 | 5.0% |
201707 | 24 | 24.0% |
201708 | 5 | 5.0% |
201709 | 2 | 2.0% |
201710 | 4 | 4.0% |
201802 | 7 | 7.0% |
201803 | 4 | 4.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
202002 | 3 | 3.0% |
201911 | 2 | 2.0% |
201908 | 6 | 6.0% |
201905 | 4 | 4.0% |
201901 | 4 | 4.0% |
201812 | 9 | 9.0% |
201811 | 3 | 3.0% |
201809 | 1 | 1.0% |
201808 | 2 | 2.0% |
201807 | 1 | 1.0% |
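The values of measurement_date (201503 to 202002) look like YYYYMM month codes rather than calendar dates, which is why they were profiled as real numbers. The reported Range of 499, for instance, is in code units, not months. The sketch below rebuilds the column from the counts in the minimum and maximum tables. It assumes one unlisted value, 201805 with count 3, which is implied by three facts: the report's distinct count of 21, the 100-row total, and the last-rows sample. The quantile rule is assumed to be linear interpolation at h = (n - 1)p (the pandas default). Under that rule, the rebuilt column reproduces the reported mean, quartiles, 95th percentile and variance. All helper names are hypothetical.

```python
import math
import statistics

# measurement_date value counts as listed in this report (YYYYMM codes).
# Assumption: 201805 (count 3) is inferred rather than listed. The report
# shows 21 distinct values but its min/max tables list only 20, whose
# counts sum to 97, and 201805 appears in the last-rows sample.
DATE_COUNTS = {
    201503: 4, 201602: 6, 201606: 1, 201610: 5, 201707: 24, 201708: 5,
    201709: 2, 201710: 4, 201802: 7, 201803: 4, 201805: 3, 201807: 1,
    201808: 2, 201809: 1, 201811: 3, 201812: 9, 201901: 4, 201905: 4,
    201908: 6, 201911: 2, 202002: 3,
}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return sorted(v for v, c in counts.items() for _ in range(c))


def quantile_linear(sorted_values, p):
    """Linear interpolation between order statistics at h = (n - 1) * p."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def to_year_month(code):
    """Split a YYYYMM code such as 201707 into (2017, 7)."""
    return divmod(code, 100)


def month_index(code):
    """Months since year 0, so that differences count calendar months."""
    year, month = to_year_month(code)
    return year * 12 + (month - 1)


dates = expand(DATE_COUNTS)
print(sum(dates) / len(dates))                 # mean, as in the report
print(quantile_linear(dates, 0.95))            # 95-th percentile
print(statistics.variance(dates))              # sample variance
print(max(dates) - min(dates))                 # Range in YYYYMM code units
print(month_index(max(dates)) - month_index(min(dates)))  # span in months
```

Converting the codes to year/month pairs (or to real dates) before profiling would give a span of 59 months instead of the 499 code units reported above.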
measurement_type_concept_id
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 44818702 |
---|---|
2nd row | 44818702 |
3rd row | 44818702 |
4th row | 44818702 |
5th row | 44818702 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
44818702 | 100 | 100.0% |
operator_concept_id
Categorical
HIGH CORRELATION
 
Distinct | 3 |
---|---|
Distinct (%) | 3.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 7 |
---|---|
Median length | 4 |
Mean length | 5.17 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | 4172704 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
<NA> | 61 | 61.0% |
4172704 | 24 | 24.0% |
4171756 | 15 | 15.0% |
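The report lists Missing = 0 for operator_concept_id, yet 61 rows hold the text `<NA>`. The Length statistics show it was profiled as a four-character string: (61 × 4 + 39 × 7) / 100 = 5.17. The sketch below reproduces that mean length from the listed counts, then recounts missing values with the placeholder treated as missing. Reading `<NA>` as a missing value (pandas displays missing values that way) is an assumption, and `MISSING_TOKENS` is a hypothetical name.

```python
# Value counts of operator_concept_id as listed in this report. The literal
# text "<NA>" was profiled as an ordinary category.
OPERATOR_COUNTS = {"<NA>": 61, "4172704": 24, "4171756": 15}

# Assumption: these tokens stand for missing values, not real categories.
MISSING_TOKENS = {"<NA>", ""}

n = sum(OPERATOR_COUNTS.values())

# Reproduce the report's mean length: "<NA>" counts as 4 characters.
mean_length = sum(len(v) * c for v, c in OPERATOR_COUNTS.items()) / n

# Recount with the placeholder treated as missing.
missing = sum(c for v, c in OPERATOR_COUNTS.items() if v in MISSING_TOKENS)
present = {v: c for v, c in OPERATOR_COUNTS.items() if v not in MISSING_TOKENS}

print(f"mean length {mean_length:.2f}")
print(f"missing {missing} ({100 * missing / n:.1f}%)")
print(present)
```

If the placeholder is converted to a real missing value before profiling, the Missing row, the Distinct count and the correlation inputs for this column would all change accordingly.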
value_as_number
Real number (ℝ)
Distinct | 35 |
---|---|
Distinct (%) | 35.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 29.16 |
Minimum | 0 |
---|---|
Maximum | 307 |
Zeros | 1 |
Zeros (%) | 1.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 8 |
median | 13 |
Q3 | 18.25 |
95-th percentile | 167.1 |
Maximum | 307 |
Range | 307 |
Interquartile range (IQR) | 10.25 |
Descriptive statistics
Standard deviation | 55.306076 |
---|---|
Coefficient of variation (CV) | 1.8966418 |
Kurtosis | 10.832609 |
Mean | 29.16 |
Median Absolute Deviation (MAD) | 5 |
Skewness | 3.2996582 |
Sum | 2916 |
Variance | 3058.762 |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 15 | 15.0% |
15 | 10 | 10.0% |
16 | 6 | 6.0% |
12 | 5 | 5.0% |
8 | 5 | 5.0% |
13 | 5 | 5.0% |
2 | 4 | 4.0% |
11 | 4 | 4.0% |
14 | 4 | 4.0% |
9 | 4 | 4.0% |
Other values (25) | 38 | 38.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
0 | 1 | 1.0% |
1 | 15 | 15.0% |
2 | 4 | 4.0% |
4 | 1 | 1.0% |
7 | 3 | 3.0% |
8 | 5 | 5.0% |
9 | 4 | 4.0% |
10 | 4 | 4.0% |
11 | 4 | 4.0% |
12 | 5 | 5.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
307 | 1 | 1.0% |
259 | 1 | 1.0% |
212 | 1 | 1.0% |
199 | 1 | 1.0% |
188 | 1 | 1.0% |
166 | 1 | 1.0% |
162 | 1 | 1.0% |
130 | 1 | 1.0% |
95 | 1 | 1.0% |
66 | 1 | 1.0% |
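The 95th percentile of value_as_number (167.1) is not an observed value. It follows from linear interpolation between order statistics, assuming the report uses the same default rule as pandas `Series.quantile` (`interpolation="linear"`). With n = 100, sorted values indexed from 0, and the ten largest values listed above (x₍₉₉₎ = 307 down to x₍₉₀₎ = 66), the two neighbours are x₍₉₄₎ = 166 and x₍₉₅₎ = 188:

```latex
\begin{aligned}
h &= (n-1)\,p = 99 \times 0.95 = 94.05,\\
Q_{0.95} &= x_{(94)} + (h - 94)\,\bigl(x_{(95)} - x_{(94)}\bigr)
          = 166 + 0.05 \times (188 - 166) = 167.1.
\end{aligned}
```

Only the five values above 166 exceed this cut point, which is consistent with the strong right skew (skewness 3.30) and with the mean (29.16) lying far above the median (13).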
unit_concept_id
Categorical
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 1.03 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 99 | 99.0% |
8554 | 1 | 1.0% |
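In OMOP CDM, a concept_id of 0 means "no matching concept", so 99 of the 100 rows carry an unmapped unit. The 91.9% imbalance figure can be reproduced with an entropy-based score: one minus the Shannon entropy of the category shares divided by log k, where k is the number of categories. This definition is assumed to match ydata-profiling's. The sketch below uses a hypothetical `imbalance_score` helper and returns 91.9% for the 99/1 split.

```python
import math

# Value counts of unit_concept_id as listed in this report.
UNIT_CONCEPT_COUNTS = {"0": 99, "8554": 1}


def imbalance_score(counts):
    """1 - entropy(shares) / log(k): 0 = perfectly balanced, 1 = one category."""
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)
    return 1.0 - entropy / math.log(k)


score = imbalance_score(UNIT_CONCEPT_COUNTS)
print(f"{score:.1%}")
```

For comparison, the same score for unit_source_value (50/49/1) is about 0.32, and the report raises no imbalance alert for that column.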
unit_source_value
Categorical
HIGH CORRELATION
 
Distinct | 3 |
---|---|
Distinct (%) | 3.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 2.98 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | U/ℓ |
---|---|
2nd row | U/ℓ |
3rd row | U/ℓ |
4th row | U/ℓ |
5th row | U/ℓ |
Common Values
Value | Count | Frequency (%) |
---|---|---|
㎎/㎗ | 50 | 50.0% |
U/ℓ | 49 | 49.0% |
% | 1 | 1.0% |
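unit_source_value is written with Unicode compatibility characters (㎎ U+338E, ㎗ U+3397, ℓ U+2113). This is why a unit that reads as "mg/dl" has a Max length of only 3. NFKC normalization from Python's standard `unicodedata` module folds these characters into plain letters, which can help before matching source units to standard concepts. The sketch assumes NFKC is acceptable for this data and uses a hypothetical `normalise_unit` helper. It neither maps units to OMOP concepts nor fixes letter case (UCUM, for example, writes mg/dL).

```python
import unicodedata
from collections import Counter

# Value counts of unit_source_value as listed in this report.
UNIT_SOURCE_COUNTS = {"㎎/㎗": 50, "U/ℓ": 49, "%": 1}


def normalise_unit(text):
    """Fold compatibility characters to plain letters (NFKC), e.g. ㎎ -> mg."""
    return unicodedata.normalize("NFKC", text)


# Re-aggregate the counts under the normalised spellings.
normalised = Counter()
for raw, count in UNIT_SOURCE_COUNTS.items():
    normalised[normalise_unit(raw)] += count

print(dict(normalised))
```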
Correlations
| | measurement_id | measurement_concept_id | measurement_date | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|
measurement_id | 1.000 | 0.210 | 0.890 | 0.587 | 0.466 | 0.000 | 0.000 |
measurement_concept_id | 0.210 | 1.000 | 0.000 | 0.342 | 0.797 | NaN | 0.610 |
measurement_date | 0.890 | 0.000 | 1.000 | 0.414 | 0.383 | 0.000 | 0.000 |
operator_concept_id | 0.587 | 0.342 | 0.414 | 1.000 | 0.088 | 0.000 | 0.460 |
value_as_number | 0.466 | 0.797 | 0.383 | 0.088 | 1.000 | 0.000 | 0.000 |
unit_concept_id | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 |
unit_source_value | 0.000 | 0.610 | 0.000 | 0.460 | 0.000 | 1.000 | 1.000 |
| | unit_concept_id | operator_concept_id | unit_source_value |
---|---|---|---|
unit_concept_id | 1.000 | 0.000 | 0.995 |
operator_concept_id | 0.000 | 1.000 | 0.700 |
unit_source_value | 0.995 | 0.700 | 1.000 |
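The 0.995 between unit_concept_id and unit_source_value in the categorical-only matrix above can be reproduced by hand. The contingency table is fully determined by the report: concept 8554 and the unit "%" each occur once, on the same row (sample row 9), and every other row has concept 0. This export does not name the matrix's method. The sketch assumes it is Cramér's V with Bergsma's small-sample bias correction, an inference based only on this numerical match. The uncorrected statistic would be exactly 1.0. `chi_square` and `cramers_v` are hypothetical helpers.

```python
import math

# unit_concept_id (rows) x unit_source_value (columns), rebuilt from the
# counts in this report.
#          ㎎/㎗  U/ℓ   %
TABLE = [[50,   49,   0],   # unit_concept_id = 0
         [0,    0,    1]]   # unit_concept_id = 8554


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for r, rt in zip(table, row_totals)
        for obs, ct in zip(r, col_totals)
    )


def cramers_v(table, bias_corrected=True):
    """Cramér's V, optionally with Bergsma's small-sample bias correction."""
    r, k = len(table), len(table[0])
    n = sum(map(sum, table))
    phi2 = chi_square(table) / n
    if not bias_corrected:
        return math.sqrt(phi2 / min(r - 1, k - 1))
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(round(cramers_v(TABLE, bias_corrected=False), 3))
print(round(cramers_v(TABLE), 3))
```

The correction shrinks V most when one category is rare, which explains why a perfect one-to-one mapping between the two columns is reported as 0.995 rather than 1.000.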
| | measurement_id | measurement_concept_id | measurement_date | value_as_number | operator_concept_id | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|
measurement_id | 1.000 | 0.011 | 0.758 | 0.214 | 0.399 | 0.000 | 0.000 |
measurement_concept_id | 0.011 | 1.000 | -0.047 | -0.131 | 0.318 | 0.000 | 0.504 |
measurement_date | 0.758 | -0.047 | 1.000 | 0.019 | 0.408 | 0.000 | 0.000 |
value_as_number | 0.214 | -0.131 | 0.019 | 1.000 | 0.059 | 0.000 | 0.000 |
operator_concept_id | 0.399 | 0.318 | 0.408 | 0.059 | 1.000 | 0.000 | 0.700 |
unit_concept_id | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.995 |
unit_source_value | 0.000 | 0.504 | 0.000 | 0.000 | 0.700 | 0.995 | 1.000 |
Sample
First rows
| | measurement_id | measurement_concept_id | measurement_date | measurement_type_concept_id | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|---|
0 | 290835079 | 3006923 | 201610 | 44818702 | <NA> | 28 | 0 | U/ℓ |
1 | 310590893 | 3006923 | 201803 | 44818702 | <NA> | 13 | 0 | U/ℓ |
2 | 323081168 | 3006923 | 201812 | 44818702 | <NA> | 20 | 0 | U/ℓ |
3 | 285208182 | 3013721 | 201606 | 44818702 | <NA> | 23 | 0 | U/ℓ |
4 | 281052028 | 3013721 | 201602 | 44818702 | 4172704 | 50 | 0 | U/ℓ |
5 | 281052029 | 3006923 | 201602 | 44818702 | 4172704 | 95 | 0 | U/ℓ |
6 | 290835078 | 3013721 | 201610 | 44818702 | <NA> | 19 | 0 | U/ℓ |
7 | 358448458 | 3036887 | 201908 | 44818702 | 4172704 | 307 | 0 | ㎎/㎗ |
8 | 281052034 | 3036887 | 201602 | 44818702 | 4172704 | 259 | 0 | ㎎/㎗ |
9 | 317072245 | 3004410 | 201807 | 44818702 | 4172704 | 9 | 8554 | % |
Last rows
| | measurement_id | measurement_concept_id | measurement_date | measurement_type_concept_id | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|---|
90 | 319151025 | 3016723 | 201809 | 44818702 | 4172704 | 2 | 0 | ㎎/㎗ |
91 | 357611796 | 3013682 | 201908 | 44818702 | 4172704 | 25 | 0 | ㎎/㎗ |
92 | 372079213 | 3016723 | 202002 | 44818702 | 4172704 | 2 | 0 | ㎎/㎗ |
93 | 321704853 | 3013682 | 201811 | 44818702 | 4172704 | 43 | 0 | ㎎/㎗ |
94 | 313911078 | 3013682 | 201805 | 44818702 | 4172704 | 43 | 0 | ㎎/㎗ |
95 | 172882680 | 3013682 | 201812 | 44818702 | 4172704 | 30 | 0 | ㎎/㎗ |
96 | 302072482 | 3016723 | 201708 | 44818702 | 4172704 | 1 | 0 | ㎎/㎗ |
97 | 305401589 | 3016723 | 201710 | 44818702 | 4172704 | 1 | 0 | ㎎/㎗ |
98 | 153450844 | 3016723 | 201707 | 44818702 | <NA> | 1 | 0 | ㎎/㎗ |
99 | 153683657 | 3016723 | 201707 | 44818702 | <NA> | 1 | 0 | ㎎/㎗ |