Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 9.5 KiB |
Average record size in memory | 97.3 B |
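The dataset-level counts can be recomputed from the raw rows. A minimal sketch, assuming the table has been loaded as a list of dicts; the `overview` and `average_record_size` helpers are hypothetical illustrations, not part of ydata-profiling. Here a row counts as a duplicate when it repeats an earlier row in every column. The last line checks the reported average record size: 9.5 KiB spread over 100 rows is 97.28 B, shown as 97.3 B.

```python
from typing import Any, Dict, List


def overview(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Recompute the 'Dataset statistics' counts for a list of row dicts."""
    n_obs = len(records)
    columns = sorted({key for row in records for key in row})
    n_cells = n_obs * len(columns)
    # A cell is missing when its key is absent or holds None.
    missing = sum(1 for row in records for col in columns if row.get(col) is None)
    # Count every row that repeats an earlier row in all columns.
    seen = set()
    duplicates = 0
    for row in records:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "n_variables": len(columns),
        "n_observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


def average_record_size(total_bytes: float, n_obs: int) -> float:
    """Average in-memory size of one row, in bytes."""
    return total_bytes / n_obs


# 9.5 KiB over 100 observations, as reported above.
print(round(average_record_size(9.5 * 1024, 100), 1))  # 97.3
```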
Variable types
Categorical | 8 |
---|---|
Numeric | 3 |
Dataset
Description | Data containing basic information on patients with alcohol use disorder, produced in the OMOP CDM format |
---|---|
Author | The Catholic University of Korea, Seoul St. Mary's Hospital |
URL | http://cmcdata.net/data/dataset/alcohol_person_2020-omop-cdm |
day_of_birth has constant value "1" | Constant |
care_site_id has constant value "1300" | Constant |
ethnicity_source_value is highly overall correlated with race_concept_id and 2 other fields | High correlation |
race_source_value is highly overall correlated with year_of_birth and 7 other fields | High correlation |
gender_source_value is highly overall correlated with gender_concept_id and 1 other field | High correlation |
gender_concept_id is highly overall correlated with gender_source_value and 1 other field | High correlation |
race_concept_id is highly overall correlated with ethnicity_concept_id and 2 other fields | High correlation |
ethnicity_concept_id is highly overall correlated with race_concept_id and 2 other fields | High correlation |
year_of_birth is highly overall correlated with race_source_value | High correlation |
month_of_birth is highly overall correlated with race_source_value | High correlation |
location_id is highly overall correlated with race_source_value | High correlation |
race_concept_id is highly imbalanced (85.9%) | Imbalance |
ethnicity_concept_id is highly imbalanced (85.9%) | Imbalance |
race_source_value is highly imbalanced (85.9%) | Imbalance |
ethnicity_source_value is highly imbalanced (85.9%) | Imbalance |
location_id has 5 (5.0%) zeros | Zeros |
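The 85.9% imbalance figure is consistent with a normalized-entropy score, 1 - H / ln(k), where H is the Shannon entropy of the category shares and k is the number of distinct categories. Treat the sketch below as an assumption about how the score is defined, checked only against this report: a 98/2 split gives 85.9%, while the 84/16 split of the gender columns gives about 36.6%, and those columns are not flagged as imbalanced above.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / ln(k) for the category counts of one column.

    0 means all k categories are equally common; values near 1 mean one
    category dominates.
    """
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k <= 1:
        # One category leaves nothing to balance; this report flags such
        # columns as Constant rather than imbalanced.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(k)


print(f"{imbalance_score([98, 2]):.1%}")   # race/ethnicity columns: 85.9%
print(f"{imbalance_score([84, 16]):.1%}")  # gender columns: 36.6%
```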
Reproduction
Analysis started | 2023-10-08 18:58:03.465264 |
---|---|
Analysis finished | 2023-10-08 18:58:06.071373 |
Duration | 2.61 seconds |
Software version | ydata-profiling v4.5.1 |
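The reported duration is the difference between the two timestamps above, which the standard library can confirm:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-10-08 18:58:03.465264")
finished = datetime.fromisoformat("2023-10-08 18:58:06.071373")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 2.61 seconds
```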
gender_concept_id
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
8507 | 84 |
---|---|
8532 | 16 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 8507 |
---|---|
2nd row | 8532 |
3rd row | 8507 |
4th row | 8507 |
5th row | 8532 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
8507 | 84 | 84.0% |
8532 | 16 | 16.0% |
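The Common Values table is a value count in which each count is also expressed as a percentage of the 100 observations; beyond the top entries, the remaining values are folded into one "Other values (n)" row, as in the year_of_birth table. A minimal sketch with `collections.Counter`; the `common_values` helper is hypothetical, and the input list is rebuilt from the counts above (84 × 8507, 16 × 8532) rather than read from the source file.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def common_values(values: Iterable[Hashable], top: int = 10) -> List[Tuple[str, int, str]]:
    """Return (value, count, frequency) rows, most frequent first."""
    values = list(values)
    n = len(values)
    ranked = Counter(values).most_common()
    rows = [(str(v), c, f"{100 * c / n:.1f}%") for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        # Fold everything outside the top entries into a single row.
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / n:.1f}%"))
    return rows


gender_concept_id = ["8507"] * 84 + ["8532"] * 16
for row in common_values(gender_concept_id):
    print(*row, sep=" | ")
# 8507 | 84 | 84.0%
# 8532 | 16 | 16.0%
```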
year_of_birth
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 44 |
---|---|
Distinct (%) | 44.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1962.65 |

Minimum | 1936 |
---|---|
Maximum | 1998 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1936 |
---|---|
5th percentile | 1942 |
Q1 | 1954 |
median | 1959.5 |
Q3 | 1972 |
95th percentile | 1992 |
Maximum | 1998 |
Range | 62 |
Interquartile range (IQR) | 18 |
Descriptive statistics
Standard deviation | 14.270387 |
---|---|
Coefficient of variation (CV) | 0.0072709789 |
Kurtosis | -0.061042274 |
Mean | 1962.65 |
Median Absolute Deviation (MAD) | 8.5 |
Skewness | 0.60106926 |
Sum | 196265 |
Variance | 203.64394 |
Monotonicity | Not monotonic |
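Several of the descriptive statistics follow from the others, which gives a quick consistency check on the year_of_birth tables: the mean is Sum / n, the coefficient of variation is σ / μ, the variance is σ², the range is max - min and the IQR is Q3 - Q1. The sketch below only re-derives values from the reported figures; it does not recompute them from the raw column.

```python
# Figures copied from the year_of_birth tables above.
n = 100
total = 196265
std = 14.270387
minimum, maximum = 1936, 1998
q1, q3 = 1954, 1972

mean = total / n                 # 1962.65
cv = std / mean                  # coefficient of variation
variance = std ** 2              # sample variance
value_range = maximum - minimum  # 62
iqr = q3 - q1                    # 18

print(f"mean={mean} CV={cv:.10f} variance={variance:.5f} range={value_range} IQR={iqr}")
```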
Common Values
Value | Count | Frequency (%) |
---|---|---|
1958 | 5 | 5.0% |
1955 | 5 | 5.0% |
1954 | 5 | 5.0% |
1962 | 5 | 5.0% |
1972 | 4 | 4.0% |
1957 | 4 | 4.0% |
1960 | 4 | 4.0% |
1968 | 4 | 4.0% |
1956 | 4 | 4.0% |
1952 | 4 | 4.0% |
Other values (34) | 56 | 56.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
1936 | 1 | 1.0% |
1937 | 1 | 1.0% |
1939 | 1 | 1.0% |
1940 | 1 | 1.0% |
1942 | 2 | 2.0% |
1943 | 3 | 3.0% |
1944 | 1 | 1.0% |
1946 | 1 | 1.0% |
1947 | 1 | 1.0% |
1949 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
1998 | 1 | 1.0% |
1996 | 1 | 1.0% |
1995 | 2 | 2.0% |
1992 | 2 | 2.0% |
1991 | 1 | 1.0% |
1988 | 2 | 2.0% |
1982 | 2 | 2.0% |
1981 | 2 | 2.0% |
1979 | 2 | 2.0% |
1977 | 3 | 3.0% |
month_of_birth
Real number (ℝ)
HIGH CORRELATION
 
Distinct | 12 |
---|---|
Distinct (%) | 12.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 6.42 |

Minimum | 1 |
---|---|
Maximum | 12 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5th percentile | 1 |
Q1 | 3 |
median | 7 |
Q3 | 9 |
95th percentile | 12 |
Maximum | 12 |
Range | 11 |
Interquartile range (IQR) | 6 |
Descriptive statistics
Standard deviation | 3.5109641 |
---|---|
Coefficient of variation (CV) | 0.54687914 |
Kurtosis | -1.1519945 |
Mean | 6.42 |
Median Absolute Deviation (MAD) | 3 |
Skewness | -0.085036171 |
Sum | 642 |
Variance | 12.326869 |
Monotonicity | Not monotonic |
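Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 12 | 12.0% |
9 | 11 | 11.0% |
7 | 11 | 11.0% |
6 | 10 | 10.0% |
2 | 9 | 9.0% |
12 | 9 | 9.0% |
8 | 8 | 8.0% |
10 | 8 | 8.0% |
4 | 6 | 6.0% |
5 | 6 | 6.0% |
Other values (2) | 10 | 10.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
1 | 12 | 12.0% |
2 | 9 | 9.0% |
3 | 5 | 5.0% |
4 | 6 | 6.0% |
5 | 6 | 6.0% |
6 | 10 | 10.0% |
7 | 11 | 11.0% |
8 | 8 | 8.0% |
9 | 11 | 11.0% |
10 | 8 | 8.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
12 | 9 | 9.0% |
11 | 5 | 5.0% |
10 | 8 | 8.0% |
9 | 11 | 11.0% |
8 | 8 | 8.0% |
7 | 11 | 11.0% |
6 | 10 | 10.0% |
5 | 6 | 6.0% |
4 | 6 | 6.0% |
3 | 5 | 5.0% |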
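Because all twelve months appear in the tables above, the month_of_birth column can be rebuilt exactly from its value counts and every summary statistic recomputed. A minimal sketch using only the standard library: `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, which reproduces the reported percentiles and quartiles, and the bias-adjusted skewness (G1) and excess kurtosis (G2) estimators, the ones pandas uses for `Series.skew` and `Series.kurt`, reproduce -0.085 and -1.152. That these are the estimators behind the report is an inference from the matching numbers, not something the report states.

```python
import math
import statistics

# Count of each month, copied from the month_of_birth value tables above.
COUNTS = {1: 12, 2: 9, 3: 5, 4: 6, 5: 6, 6: 10, 7: 11, 8: 8, 9: 11, 10: 8, 11: 5, 12: 9}
months = [month for month, count in COUNTS.items() for _ in range(count)]


def central_moment(xs, k):
    """k-th central moment with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def adjusted_skew(xs):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def adjusted_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2)."""
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


q1, median, q3 = statistics.quantiles(months, n=4, method="inclusive")
ventiles = statistics.quantiles(months, n=20, method="inclusive")  # 5th, 10th, ..., 95th
mad = statistics.median(abs(x - median) for x in months)

print(f"n={len(months)} sum={sum(months)} mean={statistics.mean(months)}")
print(f"5th={ventiles[0]} Q1={q1} median={median} Q3={q3} 95th={ventiles[-1]} MAD={mad}")
print(f"variance={statistics.variance(months):.6f} std={statistics.stdev(months):.7f}")
print(f"skewness={adjusted_skew(months):.6f} kurtosis={adjusted_excess_kurtosis(months):.6f}")
```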
day_of_birth
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1 | 100 |
---|---|
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 100 | 100.0% |
race_concept_id
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
38003585 | 98 |
---|---|
8552 | 2 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 7.92 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 38003585 |
---|---|
2nd row | 38003585 |
3rd row | 38003585 |
4th row | 38003585 |
5th row | 38003585 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
38003585 | 98 | 98.0% |
8552 | 2 | 2.0% |
ethnicity_concept_id
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
38003564 | 98 |
---|---|
38003563 | 2 |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 38003564 |
---|---|
2nd row | 38003564 |
3rd row | 38003564 |
4th row | 38003564 |
5th row | 38003564 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
38003564 | 98 | 98.0% |
38003563 | 2 | 2.0% |
location_id
Real number (ℝ)
HIGH CORRELATION
  ZEROS
 
Distinct | 46 |
---|---|
Distinct (%) | 46.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 39918147 |

Minimum | 0 |
---|---|
Maximum | 42019244 |
Zeros | 5 |
Zeros (%) | 5.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 39918007 |
Q1 | 42019016 |
median | 42019074 |
Q3 | 42019171 |
95th percentile | 42019238 |
Maximum | 42019244 |
Range | 42019244 |
Interquartile range (IQR) | 155 |
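The 5th percentile of location_id (39918007) looks odd next to a minimum of 0 and a Q1 of 42019016, but it is what linear interpolation between order statistics at the 0-based position (n - 1)·p produces; that this is the rule the report uses is an assumption. With n = 100 and p = 0.05 the position is 4.95, which falls between the fifth zero and the smallest non-zero id, 42018955, so the result lands 95% of the way toward that id. The hypothetical `interpolated_quantile` helper below needs only a sorted prefix of the column, here the ten lowest values listed in the table of minimum values.

```python
import math
from typing import Sequence


def interpolated_quantile(sorted_low: Sequence[float], n: int, p: float) -> float:
    """Linear-interpolation quantile at 0-based position (n - 1) * p.

    Only order statistics up to that position are needed, so a sorted prefix
    of the column is enough for low percentiles.
    """
    h = (n - 1) * p
    j = math.floor(h)
    frac = h - j
    return sorted_low[j] + frac * (sorted_low[j + 1] - sorted_low[j])


# Ten smallest location_id values, expanded from the minimum-values table
# (five zeros, then the smallest non-zero ids).
lowest = [0, 0, 0, 0, 0, 42018955, 42018961, 42018962, 42018962, 42018965]

p5 = interpolated_quantile(lowest, n=100, p=0.05)
print(round(p5))            # 39918007
print(42019171 - 42019016)  # IQR = Q3 - Q1 = 155
```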
Descriptive statistics
Standard deviation | 9203986.5 |
---|---|
Coefficient of variation (CV) | 0.23057149 |
Kurtosis | 15.895778 |
Mean | 39918147 |
Median Absolute Deviation (MAD) | 96.5 |
Skewness | -4.1926366 |
Sum | 3.9918147 × 10⁹ |
Variance | 8.4713368 × 10¹³ |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
---|---|---|
42019171 | 16 | 16.0% |
42019074 | 9 | 9.0% |
42019016 | 8 | 8.0% |
42019238 | 5 | 5.0% |
42019008 | 5 | 5.0% |
0 | 5 | 5.0% |
42019046 | 4 | 4.0% |
42019244 | 3 | 3.0% |
42019220 | 2 | 2.0% |
42019190 | 2 | 2.0% |
Other values (36) | 41 | 41.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
0 | 5 | 5.0% |
42018955 | 1 | 1.0% |
42018961 | 1 | 1.0% |
42018962 | 2 | 2.0% |
42018965 | 1 | 1.0% |
42018968 | 2 | 2.0% |
42018976 | 1 | 1.0% |
42018983 | 1 | 1.0% |
42018994 | 1 | 1.0% |
42018997 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
42019244 | 3 | 3.0% |
42019243 | 1 | 1.0% |
42019238 | 5 | 5.0% |
42019231 | 1 | 1.0% |
42019229 | 1 | 1.0% |
42019220 | 2 | 2.0% |
42019204 | 1 | 1.0% |
42019199 | 1 | 1.0% |
42019190 | 2 | 2.0% |
42019189 | 1 | 1.0% |
care_site_id
Categorical
CONSTANT
 
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
1300 | 100 |
---|---|
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1300 |
---|---|
2nd row | 1300 |
3rd row | 1300 |
4th row | 1300 |
5th row | 1300 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1300 | 100 | 100.0% |
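day_of_birth and care_site_id are flagged Constant because each holds a single distinct value in all 100 rows. A minimal sketch of that check; the `constant_columns` helper is hypothetical, and the data is a subset of columns from the last ten sample rows (indices 90 to 99) at the end of this report. On a short sample a column can look constant by chance (race_concept_id is 38003585 in all of the first ten rows), so the check is only meaningful over the full column.

```python
from typing import Dict, List, Sequence


def constant_columns(columns: Dict[str, Sequence[object]]) -> List[str]:
    """Names of columns that hold exactly one distinct value."""
    return [name for name, values in columns.items() if len(set(values)) == 1]


# A subset of columns from sample rows 90-99 of this report.
last_rows = {
    "month_of_birth": [10, 5, 8, 1, 12, 1, 6, 2, 1, 2],
    "day_of_birth": ["1"] * 10,
    "race_concept_id": ["38003585"] * 4 + ["8552"] + ["38003585"] * 5,
    "care_site_id": ["1300"] * 10,
    "gender_source_value": ["M"] * 7 + ["F", "F", "M"],
}

print(constant_columns(last_rows))  # ['day_of_birth', 'care_site_id']
```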
gender_source_value
Categorical
HIGH CORRELATION
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
M | 84 |
---|---|
F | 16 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | M |
---|---|
2nd row | F |
3rd row | M |
4th row | M |
5th row | F |
Common Values
Value | Count | Frequency (%) |
---|---|---|
M | 84 | 84.0% |
F | 16 | 16.0% |
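The near-perfect association between gender_concept_id and gender_source_value (0.998 and 0.962 in the correlation tables below) reflects that the two columns encode the same attribute: in every sample row, 8507 pairs with M and 8532 with F. A minimal cross-tabulation sketch with the standard library; the `crosstab` and `is_one_to_one` helpers are hypothetical, and the pairs are taken from sample rows 0 to 9.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def crosstab(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count how often each source value occurs with each concept id."""
    table: Dict[str, Counter] = {}
    for concept_id, source_value in pairs:
        table.setdefault(concept_id, Counter())[source_value] += 1
    return {k: dict(v) for k, v in table.items()}


def is_one_to_one(table: Dict[str, Dict[str, int]]) -> bool:
    """True when each concept id maps to exactly one source value and vice versa."""
    if any(len(sources) != 1 for sources in table.values()):
        return False
    sources = [next(iter(s)) for s in table.values()]
    return len(sources) == len(set(sources))


# (gender_concept_id, gender_source_value) for sample rows 0-9.
pairs = [("8507", "M"), ("8532", "F"), ("8507", "M"), ("8507", "M"), ("8532", "F"),
         ("8507", "M"), ("8532", "F"), ("8507", "M"), ("8507", "M"), ("8507", "M")]
table = crosstab(pairs)
print(table)                 # {'8507': {'M': 7}, '8532': {'F': 3}}
print(is_one_to_one(table))  # True
```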
race_source_value
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Korean | 98 |
---|---|
<NA> | 2 |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 5.96 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Korean |
---|---|
2nd row | Korean |
3rd row | Korean |
4th row | Korean |
5th row | Korean |
Common Values
Value | Count | Frequency (%) |
---|---|---|
Korean | 98 | 98.0% |
<NA> | 2 | 2.0% |
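race_source_value reports 0 missing cells even though two rows read <NA>, and its minimum length is 4, the length of the string "<NA>". Together these suggest the value was written to the file as literal text rather than left empty, so it is profiled as an ordinary category. A minimal sketch that maps such sentinel strings to None before counting; the `normalize_nulls` helper and the `NULL_SENTINELS` set are assumptions to adjust for each source.

```python
from typing import Iterable, List, Optional

# Assumed set of strings that stand for "no value" in this kind of export.
NULL_SENTINELS = {"", "<NA>", "NA", "NaN", "null", "None"}


def normalize_nulls(values: Iterable[str]) -> List[Optional[str]]:
    """Replace sentinel strings with None so they count as missing."""
    return [None if v.strip() in NULL_SENTINELS else v for v in values]


race_source_value = ["Korean"] * 98 + ["<NA>"] * 2
cleaned = normalize_nulls(race_source_value)
missing = sum(v is None for v in cleaned)
print(missing, f"{100 * missing / len(cleaned):.1f}%")  # 2 2.0%
```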
ethnicity_source_value
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Not Hispanic or Latino | 98 |
---|---|
Hispanic | 2 |
Length
Max length | 22 |
---|---|
Median length | 22 |
Mean length | 21.72 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Not Hispanic or Latino |
---|---|
2nd row | Not Hispanic or Latino |
3rd row | Not Hispanic or Latino |
4th row | Not Hispanic or Latino |
5th row | Not Hispanic or Latino |
Common Values
Value | Count | Frequency (%) |
---|---|---|
Not Hispanic or Latino | 98 | 98.0% |
Hispanic | 2 | 2.0% |
Common Values (Plot)
Value | Count | Frequency (%) |
---|---|---|
hispanic | 100 | |
not | 98 | |
or | 98 | |
latino | 98 | |
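The last ethnicity_source_value table counts words rather than whole values: each value is lower-cased and split into words, so "hispanic" is counted 100 times (98 from "Not Hispanic or Latino" plus 2 from "Hispanic"). A minimal sketch that reproduces those counts from the value counts; the `word_counts` helper is hypothetical, and the tokenization rule (lower-case, then runs of word characters) is an assumption that happens to match this table.

```python
import re
from collections import Counter
from typing import Dict


def word_counts(value_counts: Dict[str, int]) -> Counter:
    """Lower-case each value, split it into words and weight each word by the value's count."""
    words: Counter = Counter()
    for value, count in value_counts.items():
        for word in re.findall(r"\w+", value.lower()):
            words[word] += count
    return words


ethnicity = {"Not Hispanic or Latino": 98, "Hispanic": 2}
print(word_counts(ethnicity).most_common())
# [('hispanic', 100), ('not', 98), ('or', 98), ('latino', 98)]
```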
Correlations
| | gender_concept_id | year_of_birth | month_of_birth | race_concept_id | ethnicity_concept_id | location_id | gender_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|
gender_concept_id | 1.000 | 0.331 | 0.169 | 0.000 | 0.000 | 0.000 | 0.998 | 0.000 |
year_of_birth | 0.331 | 1.000 | 0.403 | 0.000 | 0.000 | 0.046 | 0.331 | 0.000 |
month_of_birth | 0.169 | 0.403 | 1.000 | 0.144 | 0.144 | 0.097 | 0.169 | 0.144 |
race_concept_id | 0.000 | 0.000 | 0.144 | 1.000 | 0.919 | 0.000 | 0.000 | 0.919 |
ethnicity_concept_id | 0.000 | 0.000 | 0.144 | 0.919 | 1.000 | 0.000 | 0.000 | 0.919 |
location_id | 0.000 | 0.046 | 0.097 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 |
gender_source_value | 0.998 | 0.331 | 0.169 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
ethnicity_source_value | 0.000 | 0.000 | 0.144 | 0.919 | 0.919 | 0.000 | 0.000 | 1.000 |
| | ethnicity_source_value | race_source_value | gender_source_value | gender_concept_id | race_concept_id | ethnicity_concept_id |
---|---|---|---|---|---|---|
ethnicity_source_value | 1.000 | 1.000 | 0.000 | 0.000 | 0.742 | 0.742 |
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
gender_source_value | 0.000 | 1.000 | 1.000 | 0.962 | 0.000 | 0.000 |
gender_concept_id | 0.000 | 1.000 | 0.962 | 1.000 | 0.000 | 0.000 |
race_concept_id | 0.742 | 1.000 | 0.000 | 0.000 | 1.000 | 0.742 |
ethnicity_concept_id | 0.742 | 1.000 | 0.000 | 0.000 | 0.742 | 1.000 |
| | year_of_birth | month_of_birth | location_id | gender_concept_id | race_concept_id | ethnicity_concept_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|
year_of_birth | 1.000 | 0.019 | -0.017 | 0.247 | 0.000 | 0.000 | 0.247 | 1.000 | 0.000 |
month_of_birth | 0.019 | 1.000 | -0.191 | 0.121 | 0.101 | 0.101 | 0.121 | 1.000 | 0.101 |
location_id | -0.017 | -0.191 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
gender_concept_id | 0.247 | 0.121 | 0.000 | 1.000 | 0.000 | 0.000 | 0.962 | 1.000 | 0.000 |
race_concept_id | 0.000 | 0.101 | 0.000 | 0.000 | 1.000 | 0.742 | 0.000 | 1.000 | 0.742 |
ethnicity_concept_id | 0.000 | 0.101 | 0.000 | 0.000 | 0.742 | 1.000 | 0.000 | 1.000 | 0.742 |
gender_source_value | 0.247 | 0.121 | 0.000 | 0.962 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 |
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
ethnicity_source_value | 0.000 | 0.101 | 0.000 | 0.000 | 0.742 | 0.742 | 0.000 | 1.000 | 1.000 |
First rows
| | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8507 | 1951 | 8 | 1 | 38003585 | 38003564 | 42019046 | 1300 | M | Korean | Not Hispanic or Latino |
1 | 8532 | 1973 | 2 | 1 | 38003585 | 38003564 | 42019008 | 1300 | F | Korean | Not Hispanic or Latino |
2 | 8507 | 1958 | 8 | 1 | 38003585 | 38003564 | 42019108 | 1300 | M | Korean | Not Hispanic or Latino |
3 | 8507 | 1959 | 4 | 1 | 38003585 | 38003564 | 42019170 | 1300 | M | Korean | Not Hispanic or Latino |
4 | 8532 | 1957 | 11 | 1 | 38003585 | 38003564 | 42019008 | 1300 | F | Korean | Not Hispanic or Latino |
5 | 8507 | 1960 | 9 | 1 | 38003585 | 38003564 | 42019047 | 1300 | M | Korean | Not Hispanic or Latino |
6 | 8532 | 1998 | 10 | 1 | 38003585 | 38003564 | 42019143 | 1300 | F | Korean | Not Hispanic or Latino |
7 | 8507 | 1964 | 9 | 1 | 38003585 | 38003564 | 42019030 | 1300 | M | Korean | Not Hispanic or Latino |
8 | 8507 | 1953 | 1 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino |
9 | 8507 | 1957 | 9 | 1 | 38003585 | 38003564 | 0 | 1300 | M | Korean | Not Hispanic or Latino |
Last rows
| | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value |
---|---|---|---|---|---|---|---|---|---|---|---|
90 | 8507 | 1950 | 10 | 1 | 38003585 | 38003564 | 42019074 | 1300 | M | Korean | Not Hispanic or Latino |
91 | 8507 | 1968 | 5 | 1 | 38003585 | 38003564 | 42019238 | 1300 | M | Korean | Not Hispanic or Latino |
92 | 8507 | 1960 | 8 | 1 | 38003585 | 38003564 | 42019072 | 1300 | M | Korean | Not Hispanic or Latino |
93 | 8507 | 1942 | 1 | 1 | 38003585 | 38003564 | 42019016 | 1300 | M | Korean | Not Hispanic or Latino |
94 | 8507 | 1972 | 12 | 1 | 8552 | 38003563 | 42019238 | 1300 | M | <NA> | Hispanic |
95 | 8507 | 1954 | 1 | 1 | 38003585 | 38003564 | 42019243 | 1300 | M | Korean | Not Hispanic or Latino |
96 | 8507 | 1952 | 6 | 1 | 38003585 | 38003564 | 42019190 | 1300 | M | Korean | Not Hispanic or Latino |
97 | 8532 | 1962 | 2 | 1 | 38003585 | 38003564 | 42019238 | 1300 | F | Korean | Not Hispanic or Latino |
98 | 8532 | 1956 | 1 | 1 | 38003585 | 38003564 | 42019238 | 1300 | F | Korean | Not Hispanic or Latino |
99 | 8507 | 1950 | 2 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino |