Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 9.5 KiB |
Average record size in memory | 97.3 B |
Variable types
Categorical | 8 |
---|---|
Numeric | 3 |
Dataset
Description | Basic information on hyperlipidemia patients, produced in OMOP CDM format |
---|---|
Author | The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원) |
URL | http://cmcdata.net/data/dataset/_person_2020-omop-cdm |
day_of_birth has constant value "1" | Constant |
care_site_id has constant value "1300" | Constant |
race_concept_id is highly overall correlated with ethnicity_concept_id and 2 other fields | High correlation |
gender_concept_id is highly overall correlated with gender_source_value and 1 other field | High correlation |
gender_source_value is highly overall correlated with gender_concept_id and 1 other field | High correlation |
ethnicity_source_value is highly overall correlated with race_concept_id and 2 other fields | High correlation |
race_source_value is highly overall correlated with year_of_birth and 7 other fields | High correlation |
ethnicity_concept_id is highly overall correlated with race_concept_id and 2 other fields | High correlation |
year_of_birth is highly overall correlated with race_source_value | High correlation |
month_of_birth is highly overall correlated with race_source_value | High correlation |
location_id is highly overall correlated with race_source_value | High correlation |
race_concept_id is highly imbalanced (71.4%) | Imbalance |
ethnicity_concept_id is highly imbalanced (71.4%) | Imbalance |
race_source_value is highly imbalanced (71.4%) | Imbalance |
ethnicity_source_value is highly imbalanced (71.4%) | Imbalance |
location_id has 4 (4.0%) zeros | Zeros |
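The 71.4% figure in the imbalance alerts is consistent with a normalized-entropy score: one minus the Shannon entropy of the class counts, divided by the maximum entropy for that number of classes. The report does not state its formula, so the sketch below is an assumption; the function name `imbalance_score` is hypothetical, and the 95/5 split is taken from the Common Values tables for these columns.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/H_max for a list of class counts.

    0 means perfectly balanced; values near 1 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single-class column is not scored
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

score = imbalance_score([95, 5])
print(f"{score:.1%}")  # 71.4%
```

A 50/50 split scores 0 under this definition, so the score measures how far a column is from uniform rather than how rare the minority class is.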
Reproduction
Analysis started | 2023-10-08 18:58:19.858446 |
---|---|
Analysis finished | 2023-10-08 18:58:21.400100 |
Duration | 1.54 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
gender_concept_id
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
8532 | |
---|---|
8507 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 8532 |
---|---|
2nd row | 8507 |
3rd row | 8507 |
4th row | 8507 |
5th row | 8507 |
Common Values
Value | Count | Frequency (%) |
8532 | 55 | 55.0% |
8507 | 45 | 45.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
8532 | 55 | 55.0% |
8507 | 45 | 45.0% |
year_of_birth
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 46 |
---|---|
Distinct (%) | 46.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1952.17 |
Minimum | 1923 |
---|---|
Maximum | 2007 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1923 |
---|---|
5-th percentile | 1932.95 |
Q1 | 1942 |
median | 1952 |
Q3 | 1960 |
95-th percentile | 1979 |
Maximum | 2007 |
Range | 84 |
Interquartile range (IQR) | 18 |
Descriptive statistics
Standard deviation | 14.091372 |
---|---|
Coefficient of variation (CV) | 0.007218312 |
Kurtosis | 1.4901061 |
Mean | 1952.17 |
Median Absolute Deviation (MAD) | 8.5 |
Skewness | 0.75011313 |
Sum | 195217 |
Variance | 198.56677 |
Monotonicity | Not monotonic |
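Several of the year_of_birth statistics above are derived from others in the same tables, so they can be cross-checked by hand. A minimal sketch using only the values printed in this report (the variable names are illustrative):

```python
# Values copied from the year_of_birth tables above.
mean, std = 1952.17, 14.091372
minimum, maximum = 1923, 2007
q1, q3 = 1942, 1960

value_range = maximum - minimum   # 84, matches "Range"
iqr = q3 - q1                     # 18, matches "Interquartile range (IQR)"
cv = std / mean                   # coefficient of variation, about 0.0072183
variance = std ** 2               # about 198.5668

print(value_range, iqr, round(cv, 7), round(variance, 4))
```

The very small coefficient of variation reflects the large mean of a calendar-year column, not a tight age distribution; the standard deviation of about 14 years is the more informative spread here.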
Value | Count | Frequency (%) |
1947 | 6 | 6.0% |
1953 | 5 | 5.0% |
1948 | 5 | 5.0% |
1960 | 4 | 4.0% |
1938 | 4 | 4.0% |
1956 | 4 | 4.0% |
1952 | 4 | 4.0% |
1955 | 4 | 4.0% |
1950 | 4 | 4.0% |
1940 | 3 | 3.0% |
Other values (36) | 57 | 57.0% |
Value | Count | Frequency (%) |
1923 | 1 | 1.0% |
1927 | 1 | 1.0% |
1929 | 1 | 1.0% |
1930 | 1 | 1.0% |
1932 | 1 | 1.0% |
1933 | 3 | 3.0% |
1934 | 1 | 1.0% |
1935 | 2 | 2.0% |
1936 | 3 | 3.0% |
1938 | 4 | 4.0% |
Value | Count | Frequency (%) |
2007 | 1 | 1.0% |
1985 | 1 | 1.0% |
1983 | 1 | 1.0% |
1979 | 3 | 3.0% |
1974 | 1 | 1.0% |
1973 | 2 | 2.0% |
1970 | 2 | 2.0% |
1967 | 3 | 3.0% |
1966 | 2 | 2.0% |
1965 | 1 | 1.0% |
month_of_birth
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 12 |
---|---|
Distinct (%) | 12.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 6.17 |
Minimum | 1 |
---|---|
Maximum | 12 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 3 |
median | 6 |
Q3 | 9 |
95-th percentile | 12 |
Maximum | 12 |
Range | 11 |
Interquartile range (IQR) | 6 |
Descriptive statistics
Standard deviation | 3.5733441 |
---|---|
Coefficient of variation (CV) | 0.57914815 |
Kurtosis | -1.2620655 |
Mean | 6.17 |
Median Absolute Deviation (MAD) | 3 |
Skewness | 0.16441211 |
Sum | 617 |
Variance | 12.768788 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
3 | 12 | 12.0% |
6 | 11 | 11.0% |
1 | 10 | 10.0% |
9 | 10 | 10.0% |
2 | 9 | 9.0% |
12 | 9 | 9.0% |
4 | 8 | 8.0% |
11 | 8 | 8.0% |
8 | 7 | 7.0% |
5 | 7 | 7.0% |
Other values (2) | 9 | 9.0% |
Value | Count | Frequency (%) |
1 | 10 | 10.0% |
2 | 9 | 9.0% |
3 | 12 | 12.0% |
4 | 8 | 8.0% |
5 | 7 | 7.0% |
6 | 11 | 11.0% |
7 | 4 | 4.0% |
8 | 7 | 7.0% |
9 | 10 | 10.0% |
10 | 5 | 5.0% |
Value | Count | Frequency (%) |
12 | 9 | 9.0% |
11 | 8 | 8.0% |
10 | 5 | 5.0% |
9 | 10 | 10.0% |
8 | 7 | 7.0% |
7 | 4 | 4.0% |
6 | 11 | 11.0% |
5 | 7 | 7.0% |
4 | 8 | 8.0% |
3 | 12 | 12.0% |
day_of_birth
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1 |
---|
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 100 | 100.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
1 | 100 | 100.0% |
race_concept_id
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
38003585 | |
---|---|
8552 | 5 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 7.8 |
Min length | 4 |
Unique
Unique | 0 ? |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 38003585 |
---|---|
2nd row | 38003585 |
3rd row | 38003585 |
4th row | 38003585 |
5th row | 38003585 |
Common Values
Value | Count | Frequency (%) |
38003585 | 95 | 95.0% |
8552 | 5 | 5.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
38003585 | 95 | 95.0% |
8552 | 5 | 5.0% |
ethnicity_concept_id
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
38003564 | |
---|---|
38003563 | 5 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 38003564 |
---|---|
2nd row | 38003564 |
3rd row | 38003564 |
4th row | 38003564 |
5th row | 38003564 |
Common Values
Value | Count | Frequency (%) |
38003564 | 95 | 95.0% |
38003563 | 5 | 5.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
38003564 | 95 | 95.0% |
38003563 | 5 | 5.0% |
location_id
Real number (ℝ)
HIGH CORRELATION
  ZEROS
 
Distinct | 44 |
---|---|
Distinct (%) | 44.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 40338346 |
Minimum | 0 |
---|---|
Maximum | 42019244 |
Zeros | 4 |
Zeros (%) | 4.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 42018961 |
Q1 | 42019016 |
median | 42019104 |
Q3 | 42019171 |
95-th percentile | 42019238 |
Maximum | 42019244 |
Range | 42019244 |
Interquartile range (IQR) | 155 |
Descriptive statistics
Standard deviation | 8275511.9 |
---|---|
Coefficient of variation (CV) | 0.20515248 |
Kurtosis | 21.143554 |
Mean | 40338346 |
Median Absolute Deviation (MAD) | 84 |
Skewness | -4.7666552 |
Sum | 4.0338346 × 10^9 |
Variance | 6.8484097 × 10^13 |
Monotonicity | Not monotonic |
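The four zero values dominate the location_id summary: they set the minimum and the range, and they produce the large standard deviation and negative skew, while the quantiles from the 5th percentile upward sit within a few hundred of each other. A rough sketch of the effect on the mean, assuming the zeros are placeholders for an unknown location (the report does not confirm this) and using the rounded mean printed above:

```python
n_rows = 100          # "Number of observations"
n_zeros = 4           # "Zeros" reported for location_id
mean_all = 40338346   # "Mean" reported for location_id (rounded in the report)

# Zeros add nothing to the sum, so the sum over all rows equals the sum over non-zero rows.
total = mean_all * n_rows
mean_nonzero = total / (n_rows - n_zeros)

print(round(mean_nonzero))
```

The result lands between Q1 (42019016) and Q3 (42019171), close to the reported median. Because location_id is an identifier, none of these averages carries meaning beyond showing how the zeros distort the numeric summary.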
Value | Count | Frequency (%) |
42019171 | 16 | 16.0% |
42019016 | 6 | 6.0% |
42019008 | 5 | 5.0% |
42018961 | 4 | 4.0% |
42019244 | 4 | 4.0% |
0 | 4 | 4.0% |
42019238 | 4 | 4.0% |
42019074 | 4 | 4.0% |
42019220 | 4 | 4.0% |
42019190 | 3 | 3.0% |
Other values (34) | 46 | 46.0% |
Value | Count | Frequency (%) |
0 | 4 | 4.0% |
42018955 | 1 | 1.0% |
42018961 | 4 | 4.0% |
42018968 | 1 | 1.0% |
42018971 | 1 | 1.0% |
42018981 | 1 | 1.0% |
42018994 | 1 | 1.0% |
42018996 | 1 | 1.0% |
42019000 | 2 | 2.0% |
42019008 | 5 | 5.0% |
Value | Count | Frequency (%) |
42019244 | 4 | 4.0% |
42019241 | 1 | 1.0% |
42019238 | 4 | 4.0% |
42019226 | 2 | 2.0% |
42019220 | 4 | 4.0% |
42019218 | 1 | 1.0% |
42019211 | 1 | 1.0% |
42019195 | 2 | 2.0% |
42019190 | 3 | 3.0% |
42019189 | 1 | 1.0% |
care_site_id
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1300 |
---|
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1300 |
---|---|
2nd row | 1300 |
3rd row | 1300 |
4th row | 1300 |
5th row | 1300 |
Common Values
Value | Count | Frequency (%) |
1300 | 100 | 100.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
1300 | 100 | 100.0% |
gender_source_value
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
F | |
---|---|
M |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | F |
---|---|
2nd row | M |
3rd row | M |
4th row | M |
5th row | M |
Common Values
Value | Count | Frequency (%) |
F | 55 | 55.0% |
M | 45 | 45.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
f | 55 | 55.0% |
m | 45 | 45.0% |
race_source_value
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Korean | |
---|---|
<NA> | 5 |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 5.9 |
Min length | 4 |
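The length statistics suggest that the five non-Korean rows hold the literal four-character string `<NA>` rather than a true null, which would also explain why Missing is reported as 0 for this column. A small check under that assumption, using the counts from the Common Values table (the helper `mean_length` is hypothetical):

```python
def mean_length(value_counts):
    """Mean string length over all rows, given a mapping of value -> row count."""
    total_rows = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    return total_chars / total_rows

race_source_value = {"Korean": 95, "<NA>": 5}  # assumption: "<NA>" is stored as text
print(mean_length(race_source_value))
```

The same calculation on ethnicity_source_value (95 rows of "Not Hispanic or Latino", 5 rows of "Hispanic") gives 21.3, matching that column's reported mean length.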
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Korean |
---|---|
2nd row | Korean |
3rd row | Korean |
4th row | Korean |
5th row | Korean |
Common Values
Value | Count | Frequency (%) |
Korean | 95 | 95.0% |
<NA> | 5 | 5.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
korean | 95 | 95.0% |
na | 5 | 5.0% |
ethnicity_source_value
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Not Hispanic or Latino | |
---|---|
Hispanic | 5 |
Length
Max length | 22 |
---|---|
Median length | 22 |
Mean length | 21.3 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Not Hispanic or Latino |
---|---|
2nd row | Not Hispanic or Latino |
3rd row | Not Hispanic or Latino |
4th row | Not Hispanic or Latino |
5th row | Not Hispanic or Latino |
Common Values
Value | Count | Frequency (%) |
Not Hispanic or Latino | 95 | 95.0% |
Hispanic | 5 | 5.0% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
hispanic | 100 | |
not | 95 | |
or | 95 | |
latino | 95 |
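The lowercase table above appears to count words rather than whole values: "hispanic" occurs in both categories, so its count is 95 + 5 = 100. A minimal sketch that reproduces those counts, assuming a simple lowercase, letters-only tokenizer (the report does not specify how it tokenizes; `word_counts` is a hypothetical helper):

```python
import re
from collections import Counter

def word_counts(value_counts):
    """Count lowercase word tokens, weighting each value by its row count."""
    words = Counter()
    for value, count in value_counts.items():
        for token in re.findall(r"[a-z]+", value.lower()):
            words[token] += count
    return words

counts = word_counts({"Not Hispanic or Latino": 95, "Hispanic": 5})
print(counts.most_common())
```

The same tokenizer maps `<NA>` to `na`, which matches the lowercase entry shown for race_source_value.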
Variable | gender_concept_id | year_of_birth | month_of_birth | race_concept_id | ethnicity_concept_id | location_id | gender_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|
gender_concept_id | 1.000 | 0.068 | 0.000 | 0.000 | 0.000 | 0.000 | 0.999 | 0.000 |
year_of_birth | 0.068 | 1.000 | 0.000 | 0.380 | 0.380 | 0.000 | 0.068 | 0.380 |
month_of_birth | 0.000 | 0.000 | 1.000 | 0.395 | 0.395 | 0.000 | 0.000 | 0.395 |
race_concept_id | 0.000 | 0.380 | 0.395 | 1.000 | 0.986 | 0.000 | 0.000 | 0.986 |
ethnicity_concept_id | 0.000 | 0.380 | 0.395 | 0.986 | 1.000 | 0.000 | 0.000 | 0.986 |
location_id | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 |
gender_source_value | 0.999 | 0.068 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
ethnicity_source_value | 0.000 | 0.380 | 0.395 | 0.986 | 0.986 | 0.000 | 0.000 | 1.000 |
Variable | race_concept_id | gender_concept_id | gender_source_value | ethnicity_source_value | race_source_value | ethnicity_concept_id |
---|---|---|---|---|---|---|
race_concept_id | 1.000 | 0.000 | 0.000 | 0.894 | 1.000 | 0.894 |
gender_concept_id | 0.000 | 1.000 | 0.980 | 0.000 | 1.000 | 0.000 |
gender_source_value | 0.000 | 0.980 | 1.000 | 0.000 | 1.000 | 0.000 |
ethnicity_source_value | 0.894 | 0.000 | 0.000 | 1.000 | 1.000 | 0.894 |
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
ethnicity_concept_id | 0.894 | 0.000 | 0.000 | 0.894 | 1.000 | 1.000 |
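The report does not label the matrix above, but its off-diagonal values of 0.894 and 0.980 are consistent with a bias-corrected Cramér's V computed from a Yates-corrected chi-square statistic. The sketch below is an assumption about the method, not a statement of what the tool does. It also assumes the paired columns line up perfectly (for example, every 38003585 row is also a 38003564 row), which the sample rows suggest but the report does not state.

```python
import math

def cramers_v(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts).

    Yates' continuity correction is applied only when the table has one degree of freedom.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    use_yates = (r - 1) * (k - 1) == 1

    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(table[i][j] - expected)
            if use_yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff * diff / expected

    phi2 = chi2 / n
    # Bias correction for finite samples.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# race_concept_id vs ethnicity_concept_id: assumed perfectly aligned 95/5 split.
print(round(cramers_v([[95, 0], [0, 5]]), 3))    # 0.894
# gender_concept_id vs gender_source_value: assumed perfectly aligned 55/45 split.
print(round(cramers_v([[55, 0], [0, 45]]), 3))   # 0.98
```

Under these assumptions, two columns that encode the same information score below 1.000 only because of the continuity correction, and the gap grows as the split becomes more lopsided.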
Variable | year_of_birth | month_of_birth | location_id | gender_concept_id | race_concept_id | ethnicity_concept_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|
year_of_birth | 1.000 | -0.077 | -0.003 | 0.000 | 0.366 | 0.366 | 0.000 | 1.000 | 0.366 |
month_of_birth | -0.077 | 1.000 | -0.081 | 0.000 | 0.289 | 0.289 | 0.000 | 1.000 | 0.289 |
location_id | -0.003 | -0.081 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
gender_concept_id | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.980 | 1.000 | 0.000 |
race_concept_id | 0.366 | 0.289 | 0.000 | 0.000 | 1.000 | 0.894 | 0.000 | 1.000 | 0.894 |
ethnicity_concept_id | 0.366 | 0.289 | 0.000 | 0.000 | 0.894 | 1.000 | 0.000 | 1.000 | 0.894 |
gender_source_value | 0.000 | 0.000 | 0.000 | 0.980 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 |
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
ethnicity_source_value | 0.366 | 0.289 | 0.000 | 0.000 | 0.894 | 0.894 | 0.000 | 1.000 | 1.000 |
Index | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8532 | 1957 | 4 | 1 | 38003585 | 38003564 | 42019066 | 1300 | F | Korean | Not Hispanic or Latino |
1 | 8507 | 1933 | 2 | 1 | 38003585 | 38003564 | 42019046 | 1300 | M | Korean | Not Hispanic or Latino |
2 | 8507 | 1934 | 8 | 1 | 38003585 | 38003564 | 42018981 | 1300 | M | Korean | Not Hispanic or Latino |
3 | 8507 | 1953 | 6 | 1 | 38003585 | 38003564 | 42019032 | 1300 | M | Korean | Not Hispanic or Latino |
4 | 8507 | 1983 | 1 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino |
5 | 8532 | 1966 | 1 | 1 | 38003585 | 38003564 | 42019176 | 1300 | F | Korean | Not Hispanic or Latino |
6 | 8507 | 1945 | 8 | 1 | 38003585 | 38003564 | 42019058 | 1300 | M | Korean | Not Hispanic or Latino |
7 | 8507 | 1927 | 11 | 1 | 38003585 | 38003564 | 42019016 | 1300 | M | Korean | Not Hispanic or Latino |
8 | 8507 | 1938 | 1 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino |
9 | 8507 | 1938 | 4 | 1 | 38003585 | 38003564 | 42019080 | 1300 | M | Korean | Not Hispanic or Latino |
Index | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|---|---|
90 | 8507 | 1974 | 2 | 1 | 38003585 | 38003564 | 42019000 | 1300 | M | Korean | Not Hispanic or Latino |
91 | 8507 | 1962 | 10 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino |
92 | 8532 | 1956 | 12 | 1 | 38003585 | 38003564 | 42018961 | 1300 | F | Korean | Not Hispanic or Latino |
93 | 8532 | 1940 | 12 | 1 | 38003585 | 38003564 | 42019171 | 1300 | F | Korean | Not Hispanic or Latino |
94 | 8507 | 1979 | 9 | 1 | 8552 | 38003563 | 0 | 1300 | M | <NA> | Hispanic |
95 | 8532 | 1944 | 2 | 1 | 38003585 | 38003564 | 42019074 | 1300 | F | Korean | Not Hispanic or Latino |
96 | 8532 | 1951 | 10 | 1 | 38003585 | 38003564 | 42019033 | 1300 | F | Korean | Not Hispanic or Latino |
97 | 8507 | 1955 | 7 | 1 | 38003585 | 38003564 | 42019008 | 1300 | M | Korean | Not Hispanic or Latino |
98 | 8532 | 1942 | 4 | 1 | 38003585 | 38003564 | 42019061 | 1300 | F | Korean | Not Hispanic or Latino |
99 | 8532 | 1960 | 6 | 1 | 38003585 | 38003564 | 42019226 | 1300 | F | Korean | Not Hispanic or Latino |