Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 7.1 KiB |
Average record size in memory | 72.3 B |
Variable types
Numeric | 4 |
---|---|
Categorical | 4 |
Dataset
Description | Laboratory test records of patients with alcohol use disorder, produced in OMOP CDM format |
---|---|
Author | Seoul St. Mary's Hospital, The Catholic University of Korea |
URL | http://cmcdata.net/data/dataset/alcohol_measurement_2020-omop-cdm |
measurement_type_concept_id has constant value "44818702" | Constant |
unit_concept_id is highly overall correlated with measurement_concept_id and 1 other fields | High correlation |
unit_source_value is highly overall correlated with measurement_concept_id and 1 other fields | High correlation |
measurement_id is highly overall correlated with measurement_date | High correlation |
measurement_concept_id is highly overall correlated with value_as_number and 3 other fields | High correlation |
measurement_date is highly overall correlated with measurement_id | High correlation |
value_as_number is highly overall correlated with measurement_concept_id and 1 other fields | High correlation |
operator_concept_id is highly overall correlated with measurement_concept_id and 1 other fields | High correlation |
unit_concept_id is highly imbalanced (50.0%) | Imbalance |
measurement_id has unique values | Unique |
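The 50.0% imbalance figure for unit_concept_id can be checked by hand. The sketch below assumes the score is one minus the Shannon entropy of the class counts, normalized by the maximum entropy for that number of classes; the function name `imbalance_score` is illustrative and not part of the report. The counts 89 and 11 come from the Common Values table for unit_concept_id.

```python
from math import log2


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (assumed definition)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        # A single class has no entropy to normalize against.
        return 0.0
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)


# unit_concept_id: value 0 appears 89 times, value 8554 appears 11 times.
score = imbalance_score([89, 11])
print(f"{score:.1%}")
```

Under this assumption the 89/11 split gives a score of about 0.500, which agrees with the alert.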
Reproduction
Analysis started | 2023-10-08 18:56:19.630673 |
---|---|
Analysis finished | 2023-10-08 18:56:24.270495 |
Duration | 4.64 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
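A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used: the helper name `build_profile`, the input file name, and the report title are assumptions, and the settings stored in config.json are not reproduced here.

```python
def build_profile(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module loads even without the dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Alcohol measurement (OMOP CDM)")
    profile.to_file(html_path)
    return profile


# Example call; "measurement.csv" is a placeholder file name:
# build_profile("measurement.csv")
```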
measurement_id
Real number (ℝ)
HIGH CORRELATION, UNIQUE
Distinct | 100 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.4703628 × 10^8 |
Minimum | 9700478 |
---|---|
Maximum | 3.6978949 × 10^8 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 9700478 |
---|---|
5-th percentile | 71784313 |
Q1 | 2.2355058 × 10^8 |
median | 2.5157225 × 10^8 |
Q3 | 3.0426758 × 10^8 |
95-th percentile | 3.4843393 × 10^8 |
Maximum | 3.6978949 × 10^8 |
Range | 3.6008901 × 10^8 |
Interquartile range (IQR) | 80716998 |
Descriptive statistics
Standard deviation | 79307109 |
---|---|
Coefficient of variation (CV) | 0.32103426 |
Kurtosis | 1.4336274 |
Mean | 2.4703628 × 10^8 |
Median Absolute Deviation (MAD) | 37229415 |
Skewness | -1.1605554 |
Sum | 2.4703628 × 10^10 |
Variance | 6.2896176 × 10^15 |
Monotonicity | Not monotonic |
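Several of the figures above follow from the others by simple arithmetic, which gives a quick consistency check on the table. The sketch below recomputes them from the reported values for measurement_id; the function name `derived_stats` is illustrative, and small differences from the report are expected because the inputs are rounded to eight significant digits.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std, n):
    """Recompute summary figures that follow from other reported values."""
    return {
        "range": maximum - minimum,   # Maximum - Minimum
        "iqr": q3 - q1,               # Q3 - Q1
        "cv": std / mean,             # Standard deviation / Mean
        "variance": std ** 2,         # Standard deviation squared
        "sum": mean * n,              # Mean * number of observations
    }


# Reported values for measurement_id (100 observations).
stats = derived_stats(
    minimum=9700478,
    maximum=369789490,
    q1=2.2355058e8,
    q3=3.0426758e8,
    mean=2.4703628e8,
    std=79307109,
    n=100,
)
for name, value in stats.items():
    print(f"{name}: {value:.8g}")
```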
Value | Count | Frequency (%) |
332277538 | 1 | 1.0% |
218740619 | 1 | 1.0% |
226951215 | 1 | 1.0% |
217935647 | 1 | 1.0% |
326922842 | 1 | 1.0% |
295746090 | 1 | 1.0% |
369789490 | 1 | 1.0% |
254632501 | 1 | 1.0% |
244422773 | 1 | 1.0% |
234703832 | 1 | 1.0% |
Other values (90) | 90 | 90.0% |
Value | Count | Frequency (%) |
9700478 | 1 | 1.0% |
17606445 | 1 | 1.0% |
24338175 | 1 | 1.0% |
28785807 | 1 | 1.0% |
68307647 | 1 | 1.0% |
71967295 | 1 | 1.0% |
72813659 | 1 | 1.0% |
86894848 | 1 | 1.0% |
106971770 | 1 | 1.0% |
144791498 | 1 | 1.0% |
Value | Count | Frequency (%) |
369789490 | 1 | 1.0% |
361389120 | 1 | 1.0% |
360261430 | 1 | 1.0% |
358088875 | 1 | 1.0% |
349794397 | 1 | 1.0% |
348362327 | 1 | 1.0% |
345419988 | 1 | 1.0% |
345100511 | 1 | 1.0% |
341832176 | 1 | 1.0% |
332277538 | 1 | 1.0% |
measurement_concept_id
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 16 |
---|---|
Distinct (%) | 16.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3015101.6 |
Minimum | 3000905 |
---|---|
Maximum | 3036887 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 3000905 |
---|---|
5-th percentile | 3000963 |
Q1 | 3004501 |
median | 3013650 |
Q3 | 3023314 |
95-th percentile | 3036887 |
Maximum | 3036887 |
Range | 35982 |
Interquartile range (IQR) | 18813 |
Descriptive statistics
Standard deviation | 11751.374 |
---|---|
Coefficient of variation (CV) | 0.0038975052 |
Kurtosis | -0.98084115 |
Mean | 3015101.6 |
Median Absolute Deviation (MAD) | 9149 |
Skewness | 0.54850166 |
Sum | 3.0151016 × 10^8 |
Variance | 1.3809479 × 10^8 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
3004501 | 22 | 22.0% |
3000963 | 9 | 9.0% |
3036887 | 9 | 9.0% |
3023314 | 9 | 9.0% |
3024929 | 7 | 7.0% |
3013721 | 7 | 7.0% |
3007070 | 6 | 6.0% |
3022192 | 5 | 5.0% |
3013650 | 4 | 4.0% |
3006923 | 4 | 4.0% |
Other values (6) | 18 | 18.0% |
Value | Count | Frequency (%) |
3000905 | 2 | 2.0% |
3000963 | 9 | 9.0% |
3004501 | 22 | 22.0% |
3006923 | 4 | 4.0% |
3007070 | 6 | 6.0% |
3009966 | 4 | 4.0% |
3010156 | 2 | 2.0% |
3013650 | 4 | 4.0% |
3013721 | 7 | 7.0% |
3020416 | 2 | 2.0% |
Value | Count | Frequency (%) |
3036887 | 9 | 9.0% |
3035995 | 4 | 4.0% |
3026910 | 4 | 4.0% |
3024929 | 7 | 7.0% |
3023314 | 9 | 9.0% |
3022192 | 5 | 5.0% |
3020416 | 2 | 2.0% |
3013721 | 7 | 7.0% |
3013650 | 4 | 4.0% |
3010156 | 2 | 2.0% |
measurement_date
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 66 |
---|---|
Distinct (%) | 66.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 201466.74 |
Minimum | 201101 |
---|---|
Maximum | 201912 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 201101 |
---|---|
5-th percentile | 201102.95 |
Q1 | 201206.25 |
median | 201407.5 |
Q3 | 201801.25 |
95-th percentile | 201906.05 |
Maximum | 201912 |
Range | 811 |
Interquartile range (IQR) | 595 |
Descriptive statistics
Standard deviation | 291.0664 |
---|---|
Coefficient of variation (CV) | 0.0014447367 |
Kurtosis | -1.5353255 |
Mean | 201466.74 |
Median Absolute Deviation (MAD) | 296.5 |
Skewness | 0.15500291 |
Sum | 20146674 |
Variance | 84719.649 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
201211 | 4 | 4.0% |
201102 | 3 | 3.0% |
201803 | 3 | 3.0% |
201802 | 3 | 3.0% |
201105 | 3 | 3.0% |
201207 | 3 | 3.0% |
201311 | 3 | 3.0% |
201908 | 2 | 2.0% |
201304 | 2 | 2.0% |
201103 | 2 | 2.0% |
Other values (56) | 72 | 72.0% |
Value | Count | Frequency (%) |
201101 | 2 | 2.0% |
201102 | 3 | 3.0% |
201103 | 2 | 2.0% |
201105 | 3 | 3.0% |
201106 | 2 | 2.0% |
201107 | 1 | 1.0% |
201108 | 2 | 2.0% |
201109 | 1 | 1.0% |
201110 | 1 | 1.0% |
201111 | 2 | 2.0% |
Value | Count | Frequency (%) |
201912 | 1 | 1.0% |
201909 | 1 | 1.0% |
201908 | 2 | 2.0% |
201907 | 1 | 1.0% |
201906 | 1 | 1.0% |
201904 | 2 | 2.0% |
201903 | 1 | 1.0% |
201902 | 1 | 1.0% |
201812 | 2 | 2.0% |
201811 | 1 | 1.0% |
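The values of measurement_date (201101 through 201912) look like year-month codes in YYYYMM form, although the report treats them as plain real numbers. Assuming that reading is right, numeric summaries such as Range = 811 do not measure elapsed time. The sketch below decodes the codes and counts calendar months instead; both function names are illustrative.

```python
def yyyymm_to_year_month(value):
    """Split an integer such as 201907 into (2019, 7); assumes YYYYMM coding."""
    year, month = divmod(value, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"{value} is not a valid YYYYMM code")
    return year, month


def months_between(start, end):
    """Number of calendar months from one YYYYMM code to another."""
    start_year, start_month = yyyymm_to_year_month(start)
    end_year, end_month = yyyymm_to_year_month(end)
    return (end_year - start_year) * 12 + (end_month - start_month)


# Reported minimum and maximum of measurement_date.
span = months_between(201101, 201912)
print(span)  # 107 months, compared with a numeric range of 811
```

Converting the column to a proper date type before profiling would give more meaningful quantiles and correlations under the same assumption.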
measurement_type_concept_id
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 44818702 |
---|---|
2nd row | 44818702 |
3rd row | 44818702 |
4th row | 44818702 |
5th row | 44818702 |
Common Values
Value | Count | Frequency (%) |
44818702 | 100 | 100.0% |
operator_concept_id
Categorical
HIGH CORRELATION
 
Distinct | 3 |
---|---|
Distinct (%) | 3.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 7 |
---|---|
Median length | 4 |
Mean length | 5.17 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | <NA> |
---|---|
2nd row | 4171756 |
3rd row | 4172704 |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 61 | 61.0% |
4172704 | 24 | 24.0% |
4171756 | 15 | 15.0% |
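The length statistics for operator_concept_id are consistent with the 61 `<NA>` entries being counted as the literal four-character string rather than as missing values, which would also explain why Missing is reported as 0. The sketch below recomputes the mean length from the value counts; the function name `mean_length` is illustrative.

```python
def mean_length(value_counts):
    """Mean string length over all rows, given a {value: count} mapping."""
    total_rows = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    return total_chars / total_rows


# Counts taken from the Common Values table for operator_concept_id.
operator_counts = {"<NA>": 61, "4172704": 24, "4171756": 15}
print(round(mean_length(operator_counts), 2))  # 5.17, the reported mean length
```

The same calculation with the unit_concept_id counts (89 rows of "0" and 11 rows of "8554") gives 1.33, the mean length reported for that variable.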
value_as_number
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 73 |
---|---|
Distinct (%) | 73.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 81.27 |
Minimum | 0 |
---|---|
Maximum | 670 |
Zeros | 1 |
Zeros (%) | 1.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 14 |
median | 42 |
Q3 | 104.75 |
95-th percentile | 243.25 |
Maximum | 670 |
Range | 670 |
Interquartile range (IQR) | 90.75 |
Descriptive statistics
Standard deviation | 110.96117 |
---|---|
Coefficient of variation (CV) | 1.3653398 |
Kurtosis | 12.505813 |
Mean | 81.27 |
Median Absolute Deviation (MAD) | 33 |
Skewness | 3.0434917 |
Sum | 8127 |
Variance | 12312.381 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1 | 7 | 7.0% |
14 | 5 | 5.0% |
3 | 4 | 4.0% |
43 | 3 | 3.0% |
47 | 3 | 3.0% |
28 | 2 | 2.0% |
41 | 2 | 2.0% |
11 | 2 | 2.0% |
141 | 2 | 2.0% |
15 | 2 | 2.0% |
Other values (63) | 68 | 68.0% |
Value | Count | Frequency (%) |
0 | 1 | 1.0% |
1 | 7 | 7.0% |
2 | 1 | 1.0% |
3 | 4 | 4.0% |
5 | 1 | 1.0% |
7 | 1 | 1.0% |
8 | 1 | 1.0% |
9 | 1 | 1.0% |
10 | 2 | 2.0% |
11 | 2 | 2.0% |
Value | Count | Frequency (%) |
670 | 1 | 1.0% |
631 | 1 | 1.0% |
285 | 1 | 1.0% |
262 | 1 | 1.0% |
248 | 1 | 1.0% |
243 | 1 | 1.0% |
241 | 1 | 1.0% |
239 | 1 | 1.0% |
234 | 1 | 1.0% |
229 | 1 | 1.0% |
unit_concept_id
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 1.33 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 8554 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 89 | 89.0% |
8554 | 11 | 11.0% |
unit_source_value
Categorical
HIGH CORRELATION
 
Distinct | 9 |
---|---|
Distinct (%) | 9.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 7 |
---|---|
Median length | 3 |
Mean length | 3.15 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | g/㎗ |
---|---|
2nd row | % |
3rd row | ㎎/㎗ |
4th row | U/ℓ |
5th row | 10^9/L |
Common Values
Value | Count | Frequency (%) |
㎎/㎗ | 29 | 29.0% |
U/ℓ | 19 | 19.0% |
10^9/L | 14 | 14.0% |
% | 11 | 11.0% |
g/㎗ | 9 | 9.0% |
초 | 9 | 9.0% |
<NA> | 5 | 5.0% |
㎎/ℓ | 2 | 2.0% |
10^12/L | 2 | 2.0% |
| | measurement_id | measurement_concept_id | measurement_date | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|
measurement_id | 1.000 | 0.137 | 0.778 | 0.201 | 0.000 | 0.000 | 0.000 |
measurement_concept_id | 0.137 | 1.000 | 0.163 | 0.758 | 0.599 | 0.776 | 0.903 |
measurement_date | 0.778 | 0.163 | 1.000 | 0.000 | 0.328 | 0.000 | 0.000 |
operator_concept_id | 0.201 | 0.758 | 0.000 | 1.000 | 0.447 | 0.111 | 0.635 |
value_as_number | 0.000 | 0.599 | 0.328 | 0.447 | 1.000 | 0.000 | 0.371 |
unit_concept_id | 0.000 | 0.776 | 0.000 | 0.111 | 0.000 | 1.000 | 1.000 |
unit_source_value | 0.000 | 0.903 | 0.000 | 0.635 | 0.371 | 1.000 | 1.000 |
| | unit_concept_id | unit_source_value | operator_concept_id |
---|---|---|---|
unit_concept_id | 1.000 | 0.967 | 0.065 |
unit_source_value | 0.967 | 1.000 | 0.433 |
operator_concept_id | 0.065 | 0.433 | 1.000 |
| | measurement_id | measurement_concept_id | measurement_date | value_as_number | operator_concept_id | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|
measurement_id | 1.000 | 0.029 | 0.561 | 0.163 | 0.000 | 0.000 | 0.062 |
measurement_concept_id | 0.029 | 1.000 | -0.175 | 0.605 | 0.531 | 0.582 | 0.518 |
measurement_date | 0.561 | -0.175 | 1.000 | -0.062 | 0.000 | 0.000 | 0.000 |
value_as_number | 0.163 | 0.605 | -0.062 | 1.000 | 0.519 | 0.000 | 0.212 |
operator_concept_id | 0.000 | 0.531 | 0.000 | 0.519 | 1.000 | 0.065 | 0.433 |
unit_concept_id | 0.000 | 0.582 | 0.000 | 0.000 | 0.065 | 1.000 | 0.967 |
unit_source_value | 0.062 | 0.518 | 0.000 | 0.212 | 0.433 | 0.967 | 1.000 |
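The seven High correlation alerts near the top of the report can be reproduced from the matrix directly above by flagging every pair whose absolute coefficient is at least 0.5. The 0.5 cutoff is an assumption inferred from the numbers, not something stated in the report, and the function name `correlated_fields` is illustrative.

```python
NAMES = [
    "measurement_id", "measurement_concept_id", "measurement_date",
    "value_as_number", "operator_concept_id", "unit_concept_id",
    "unit_source_value",
]

# Coefficients copied from the matrix above, in the same row and column order.
MATRIX = [
    [1.000, 0.029, 0.561, 0.163, 0.000, 0.000, 0.062],
    [0.029, 1.000, -0.175, 0.605, 0.531, 0.582, 0.518],
    [0.561, -0.175, 1.000, -0.062, 0.000, 0.000, 0.000],
    [0.163, 0.605, -0.062, 1.000, 0.519, 0.000, 0.212],
    [0.000, 0.531, 0.000, 0.519, 1.000, 0.065, 0.433],
    [0.000, 0.582, 0.000, 0.000, 0.065, 1.000, 0.967],
    [0.062, 0.518, 0.000, 0.212, 0.433, 0.967, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff for a "high" correlation


def correlated_fields(names, matrix, threshold):
    """Map each variable to the other variables it correlates highly with."""
    result = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, coefficient in enumerate(matrix[i])
            if j != i and abs(coefficient) >= threshold
        ]
        if partners:
            result[name] = partners
    return result


high = correlated_fields(NAMES, MATRIX, THRESHOLD)
for name, partners in high.items():
    print(f"{name}: {', '.join(partners)}")
```

With this cutoff measurement_concept_id is flagged against four other fields and measurement_id only against measurement_date, which matches the wording of the alerts.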
| | measurement_id | measurement_concept_id | measurement_date | measurement_type_concept_id | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|---|
0 | 332277538 | 3000963 | 201907 | 44818702 | <NA> | 14 | 0 | g/㎗ |
1 | 71967295 | 3023314 | 201105 | 44818702 | 4171756 | 31 | 8554 | % |
2 | 231453304 | 3004501 | 201204 | 44818702 | 4172704 | 262 | 0 | ㎎/㎗ |
3 | 261715506 | 3006923 | 201409 | 44818702 | <NA> | 23 | 0 | U/ℓ |
4 | 221858571 | 3013650 | 201106 | 44818702 | <NA> | 3 | 0 | 10^9/L |
5 | 311802146 | 3036887 | 201803 | 44818702 | <NA> | 95 | 0 | ㎎/㎗ |
6 | 228468649 | 3024929 | 201201 | 44818702 | <NA> | 157 | 0 | 10^9/L |
7 | 173245334 | 3000905 | 201812 | 44818702 | <NA> | 8 | 0 | 10^9/L |
8 | 161799588 | 3006923 | 201802 | 44818702 | 4172704 | 51 | 0 | U/ℓ |
9 | 278150542 | 3024929 | 201512 | 44818702 | <NA> | 234 | 0 | 10^9/L |
| | measurement_id | measurement_concept_id | measurement_date | measurement_type_concept_id | operator_concept_id | value_as_number | unit_concept_id | unit_source_value |
---|---|---|---|---|---|---|---|---|
90 | 86894848 | 3013650 | 201207 | 44818702 | <NA> | 3 | 0 | 10^9/L |
91 | 251761423 | 3024929 | 201311 | 44818702 | <NA> | 285 | 0 | 10^9/L |
92 | 239798462 | 3004501 | 201212 | 44818702 | <NA> | 87 | 8554 | % |
93 | 156432283 | 3023314 | 201710 | 44818702 | 4171756 | 28 | 8554 | % |
94 | 235445658 | 3004501 | 201207 | 44818702 | 4172704 | 163 | 0 | ㎎/㎗ |
95 | 311293427 | 3000963 | 201803 | 44818702 | <NA> | 14 | 0 | g/㎗ |
96 | 251358585 | 3007070 | 201311 | 44818702 | 4171756 | 47 | 0 | ㎎/㎗ |
97 | 222655663 | 3023314 | 201107 | 44818702 | <NA> | 39 | 8554 | % |
98 | 361389120 | 3009966 | 201909 | 44818702 | <NA> | 75 | 0 | ㎎/㎗ |
99 | 224493643 | 3006923 | 201109 | 44818702 | 4172704 | 71 | 0 | U/ℓ |