Overview

Dataset statistics

Number of variables: 11
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.5 KiB
Average record size in memory: 97.3 B

Variable types

Categorical: 8
Numeric: 3

Dataset

Description: Basic information on patients with hyperlipidemia, produced in the OMOP CDM format
Author: The Catholic University of Korea, Seoul St. Mary's Hospital
URL: http://cmcdata.net/data/dataset/_person_2020-omop-cdm

Alerts

day_of_birth has constant value "1" [Constant]
care_site_id has constant value "1300" [Constant]
race_concept_id has a high overall correlation with ethnicity_concept_id and 2 other fields [High correlation]
gender_concept_id has a high overall correlation with gender_source_value and 1 other field [High correlation]
gender_source_value has a high overall correlation with gender_concept_id and 1 other field [High correlation]
ethnicity_source_value has a high overall correlation with race_concept_id and 2 other fields [High correlation]
race_source_value has a high overall correlation with year_of_birth and 7 other fields [High correlation]
ethnicity_concept_id has a high overall correlation with race_concept_id and 2 other fields [High correlation]
year_of_birth has a high overall correlation with race_source_value [High correlation]
month_of_birth has a high overall correlation with race_source_value [High correlation]
location_id has a high overall correlation with race_source_value [High correlation]
race_concept_id is highly imbalanced (71.4%) [Imbalance]
ethnicity_concept_id is highly imbalanced (71.4%) [Imbalance]
race_source_value is highly imbalanced (71.4%) [Imbalance]
ethnicity_source_value is highly imbalanced (71.4%) [Imbalance]
location_id has 4 (4.0%) zeros [Zeros]
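All four 71.4% imbalance alerts come from the same 95/5 split. That figure is reproduced exactly by defining imbalance as one minus the Shannon entropy of the category frequencies divided by the log of the number of categories. This definition is an assumption inferred from the numbers, not something the report states. A minimal sketch:

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 - normalized Shannon entropy of category counts (assumed definition).

    0.0 means perfectly balanced categories; values near 1.0 mean one
    category dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # undefined for a single category; 0.0 is this sketch's convention
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# race_concept_id: 95 rows of 38003585 and 5 rows of 8552
print(f"{imbalance_score([95, 5]):.1%}")   # 71.4%
# gender_concept_id: 55 rows of 8532 and 45 rows of 8507
print(f"{imbalance_score([55, 45]):.1%}")  # 0.7%
```

By the same measure the 55/45 gender split scores below 1%, which is consistent with gender_concept_id not being flagged as imbalanced.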

Reproduction

Analysis started: 2023-10-08 18:58:19.858446
Analysis finished: 2023-10-08 18:58:21.400100
Duration: 1.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

gender_concept_id
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8532
2nd row: 8507
3rd row: 8507
4th row: 8507
5th row: 8507

Common Values

Value | Count | Frequency (%)
8532 | 55 | 55.0%
8507 | 45 | 45.0%

Length

[Figure: Histogram of lengths of the category]

year_of_birth
Real number (ℝ)

HIGH CORRELATION 

Distinct: 46
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1952.17
Minimum: 1923
Maximum: 2007
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1923
5th percentile: 1932.95
Q1: 1942
Median: 1952
Q3: 1960
95th percentile: 1979
Maximum: 2007
Range: 84
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 14.091372
Coefficient of variation (CV): 0.007218312
Kurtosis: 1.4901061
Mean: 1952.17
Median Absolute Deviation (MAD): 8.5
Skewness: 0.75011313
Sum: 195217
Variance: 198.56677
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=46)]

Common values

Value | Count | Frequency (%)
1947 | 6 | 6.0%
1953 | 5 | 5.0%
1948 | 5 | 5.0%
1960 | 4 | 4.0%
1938 | 4 | 4.0%
1956 | 4 | 4.0%
1952 | 4 | 4.0%
1955 | 4 | 4.0%
1950 | 4 | 4.0%
1940 | 3 | 3.0%
Other values (36) | 57 | 57.0%

Minimum 10 values

Value | Count | Frequency (%)
1923 | 1 | 1.0%
1927 | 1 | 1.0%
1929 | 1 | 1.0%
1930 | 1 | 1.0%
1932 | 1 | 1.0%
1933 | 3 | 3.0%
1934 | 1 | 1.0%
1935 | 2 | 2.0%
1936 | 3 | 3.0%
1938 | 4 | 4.0%

Maximum 10 values

Value | Count | Frequency (%)
2007 | 1 | 1.0%
1985 | 1 | 1.0%
1983 | 1 | 1.0%
1979 | 3 | 3.0%
1974 | 1 | 1.0%
1973 | 2 | 2.0%
1970 | 2 | 2.0%
1967 | 3 | 3.0%
1966 | 2 | 2.0%
1965 | 1 | 1.0%
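The 5th percentile of year_of_birth (1932.95) is not a whole year because the quantiles appear to use linear interpolation between order statistics (the default method in NumPy and pandas, also known as R type 7). With n = 100 the 5th percentile sits at sorted position 0.05 × 99 = 4.95, between the 5th- and 6th-smallest birth years, so the minimum and maximum tables alone are enough to check both tail percentiles. The sketch below assumes that interpolation rule; the helper names are illustrative.

```python
import math


def expand(freq):
    """Turn (value, count) pairs into a flat list, keeping the given order."""
    return [value for value, count in freq for _ in range(count)]


def quantile_from_tails(n, q, lowest, highest):
    """Linear-interpolation (R type 7) quantile of an n-element sample when only
    its smallest values (ascending) and largest values (descending) are known.
    Raises ValueError if the needed sorted positions fall between the tails."""
    h = (n - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)

    def at(i):
        if i < len(lowest):
            return lowest[i]
        offset = n - 1 - i  # distance from the largest value
        if 0 <= offset < len(highest):
            return highest[offset]
        raise ValueError(f"sorted position {i} is not covered by the known tails")

    return at(lo) + (h - lo) * (at(hi) - at(lo))


# "Minimum 10 values" and "Maximum 10 values" for year_of_birth
lowest = expand([(1923, 1), (1927, 1), (1929, 1), (1930, 1), (1932, 1),
                 (1933, 3), (1934, 1), (1935, 2), (1936, 3), (1938, 4)])
highest = expand([(2007, 1), (1985, 1), (1983, 1), (1979, 3), (1974, 1),
                  (1973, 2), (1970, 2), (1967, 3), (1966, 2), (1965, 1)])

print(round(quantile_from_tails(100, 0.05, lowest, highest), 2))  # 1932.95
print(quantile_from_tails(100, 0.95, lowest, highest))            # 1979.0
```

The median and quartiles need positions in the middle of the sorted data, which these tables do not cover, so the function refuses them rather than guessing.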

month_of_birth
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.17
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 6
Q3: 9
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.5733441
Coefficient of variation (CV): 0.57914815
Kurtosis: -1.2620655
Mean: 6.17
Median Absolute Deviation (MAD): 3
Skewness: 0.16441211
Sum: 617
Variance: 12.768788
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value | Count | Frequency (%)
3 | 12 | 12.0%
6 | 11 | 11.0%
1 | 10 | 10.0%
9 | 10 | 10.0%
2 | 9 | 9.0%
12 | 9 | 9.0%
4 | 8 | 8.0%
11 | 8 | 8.0%
8 | 7 | 7.0%
5 | 7 | 7.0%
Other values (2) | 9 | 9.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 10 | 10.0%
2 | 9 | 9.0%
3 | 12 | 12.0%
4 | 8 | 8.0%
5 | 7 | 7.0%
6 | 11 | 11.0%
7 | 4 | 4.0%
8 | 7 | 7.0%
9 | 10 | 10.0%
10 | 5 | 5.0%

Maximum 10 values

Value | Count | Frequency (%)
12 | 9 | 9.0%
11 | 8 | 8.0%
10 | 5 | 5.0%
9 | 10 | 10.0%
8 | 7 | 7.0%
7 | 4 | 4.0%
6 | 11 | 11.0%
5 | 7 | 7.0%
4 | 8 | 8.0%
3 | 12 | 12.0%
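Because month_of_birth has only 12 distinct values, the tables above describe the entire column, so every statistic in this section can be recomputed from them. The sketch below uses only the Python standard library. It assumes the report's skewness and kurtosis are the bias-adjusted estimators used by pandas `Series.skew` and `Series.kurt`, that quantiles use linear interpolation, and that MAD means the median absolute deviation from the median; with those assumptions it matches the reported values.

```python
import math
import statistics

# month -> count, taken from the month_of_birth tables
counts = {1: 10, 2: 9, 3: 12, 4: 8, 5: 7, 6: 11,
          7: 4, 8: 7, 9: 10, 10: 5, 11: 8, 12: 9}
months = [m for m, c in counts.items() for _ in range(c)]


def adjusted_skew(xs):
    """Adjusted Fisher-Pearson skewness (G1), the estimator pandas Series.skew uses."""
    mean = statistics.fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    n = len(xs)
    return n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2), the estimator pandas Series.kurt uses."""
    mean = statistics.fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    n = len(xs)
    return (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


median = statistics.median(months)
qs20 = statistics.quantiles(months, n=20, method="inclusive")  # 5%, 10%, ..., 95%
summary = {
    "n": len(months),
    "sum": sum(months),
    "mean": statistics.fmean(months),
    "std": statistics.stdev(months),          # sample (n - 1) standard deviation
    "variance": statistics.variance(months),
    "q1_median_q3": statistics.quantiles(months, n=4, method="inclusive"),
    "p5_p95": (qs20[0], qs20[-1]),
    "mad": statistics.median(abs(m - median) for m in months),
    "skewness": adjusted_skew(months),
    "kurtosis": excess_kurtosis(months),
}
for name, value in summary.items():
    print(name, value)
```

`method="inclusive"` in `statistics.quantiles` is the same linear interpolation as the NumPy and pandas default, which is why Q1, the median and Q3 come out as 3, 6 and 9.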

day_of_birth
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 100 | 100.0%

Length

[Figure: Histogram of lengths of the category]

race_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 7.8
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 38003585
2nd row: 38003585
3rd row: 38003585
4th row: 38003585
5th row: 38003585

Common Values

Value | Count | Frequency (%)
38003585 | 95 | 95.0%
8552 | 5 | 5.0%

Length

[Figure: Histogram of lengths of the category]

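The length statistics of a categorical column follow directly from its value counts. For race_concept_id, 95 values have 8 characters (38003585) and 5 have 4 characters (8552), so the mean length is (95 × 8 + 5 × 4) / 100 = 7.8. A small sketch, assuming lengths are measured on each value's text form; `length_summary` is an illustrative name:

```python
import statistics


def length_summary(value_counts):
    """Max/median/mean/min string length, weighting each value by its row count."""
    lengths = [len(str(v)) for v, c in value_counts.items() for _ in range(c)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


print(length_summary({"38003585": 95, "8552": 5}))
# {'max': 8, 'median': 8.0, 'mean': 7.8, 'min': 4}
print(length_summary({"Not Hispanic or Latino": 95, "Hispanic": 5})["mean"])  # 21.3
```

The second call reproduces the 21.3 mean length reported for ethnicity_source_value.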

ethnicity_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 38003564
2nd row: 38003564
3rd row: 38003564
4th row: 38003564
5th row: 38003564

Common Values

Value | Count | Frequency (%)
38003564 | 95 | 95.0%
38003563 | 5 | 5.0%

Length

[Figure: Histogram of lengths of the category]

location_id
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 44
Distinct (%): 44.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40338346
Minimum: 0
Maximum: 42019244
Zeros: 4
Zeros (%): 4.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 42018961
Q1: 42019016
Median: 42019104
Q3: 42019171
95th percentile: 42019238
Maximum: 42019244
Range: 42019244
Interquartile range (IQR): 155

Descriptive statistics

Standard deviation: 8275511.9
Coefficient of variation (CV): 0.20515248
Kurtosis: 21.143554
Mean: 40338346
Median Absolute Deviation (MAD): 84
Skewness: -4.7666552
Sum: 4.0338346 × 10^9
Variance: 6.8484097 × 10^13
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=44)]

Common values

Value | Count | Frequency (%)
42019171 | 16 | 16.0%
42019016 | 6 | 6.0%
42019008 | 5 | 5.0%
42018961 | 4 | 4.0%
42019244 | 4 | 4.0%
0 | 4 | 4.0%
42019238 | 4 | 4.0%
42019074 | 4 | 4.0%
42019220 | 4 | 4.0%
42019190 | 3 | 3.0%
Other values (34) | 46 | 46.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 4 | 4.0%
42018955 | 1 | 1.0%
42018961 | 4 | 4.0%
42018968 | 1 | 1.0%
42018971 | 1 | 1.0%
42018981 | 1 | 1.0%
42018994 | 1 | 1.0%
42018996 | 1 | 1.0%
42019000 | 2 | 2.0%
42019008 | 5 | 5.0%

Maximum 10 values

Value | Count | Frequency (%)
42019244 | 4 | 4.0%
42019241 | 1 | 1.0%
42019238 | 4 | 4.0%
42019226 | 2 | 2.0%
42019220 | 4 | 4.0%
42019218 | 1 | 1.0%
42019211 | 1 | 1.0%
42019195 | 2 | 2.0%
42019190 | 3 | 3.0%
42019189 | 1 | 1.0%
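The shape statistics of location_id are driven almost entirely by its 4 zeros. The other 96 IDs lie between 42018955 and 42019244, a span of under 300, so to a close approximation the column is a two-point distribution. The sketch below replaces the 96 non-zero IDs with a single representative value (the reported median, 42019104, an assumption made for illustration) and applies the bias-adjusted skewness and kurtosis estimators also used by pandas. It lands on the reported skewness (-4.767), kurtosis (21.14) and standard deviation (about 8.28 million). If 0 is a placeholder for an unknown location rather than a real ID, these statistics describe the placeholder rather than the locations.

```python
import math
import statistics


def adjusted_skew(xs):
    """Adjusted Fisher-Pearson skewness (G1), the estimator pandas Series.skew uses."""
    mean = statistics.fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    n = len(xs)
    return n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2), the estimator pandas Series.kurt uses."""
    mean = statistics.fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    n = len(xs)
    return (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Approximation (assumption): 4 zeros plus 96 IDs collapsed onto the median.
approx = [0] * 4 + [42019104] * 96

print(round(statistics.stdev(approx)))    # close to the reported 8275511.9
print(round(adjusted_skew(approx), 3))    # -4.767
print(round(excess_kurtosis(approx), 2))  # 21.14

# Without the zeros, nothing is left of the spread in this approximation.
nonzero = [x for x in approx if x != 0]
print(statistics.stdev(nonzero))          # 0.0
```

In the real column the non-zero IDs still vary by a few hundred, but that spread is negligible next to the 42-million gap between 0 and the other IDs, which is why the approximation reproduces the reported moments so closely.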

care_site_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1300
2nd row: 1300
3rd row: 1300
4th row: 1300
5th row: 1300

Common Values

Value | Count | Frequency (%)
1300 | 100 | 100.0%

Length

[Figure: Histogram of lengths of the category]

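day_of_birth and care_site_id hold a single value in all 100 rows, so they carry no information for comparing patients. A minimal sketch for dropping such columns from row-oriented data before further analysis; the layout (a list of dicts keyed by column name) and the function name are assumptions for illustration. On a small subset a column can look constant by chance, so the check is best run on the full table.

```python
def drop_constant_columns(rows):
    """Return (filtered_rows, dropped): dropped lists columns whose value is the
    same in every row. Columns are taken from the first row; values must be
    hashable, and a missing key counts as the value None."""
    if not rows:
        return rows, []
    columns = list(rows[0])
    dropped = [c for c in columns if len({row.get(c) for row in rows}) == 1]
    kept = [c for c in columns if c not in dropped]
    return [{c: row.get(c) for c in kept} for row in rows], dropped


# Two rows taken from the sample (rows 0 and 1), trimmed to four columns
rows = [
    {"year_of_birth": 1957, "day_of_birth": 1, "care_site_id": 1300, "gender_source_value": "F"},
    {"year_of_birth": 1933, "day_of_birth": 1, "care_site_id": 1300, "gender_source_value": "M"},
]
filtered, dropped = drop_constant_columns(rows)
print(dropped)   # ['day_of_birth', 'care_site_id']
print(filtered)  # [{'year_of_birth': 1957, 'gender_source_value': 'F'}, {'year_of_birth': 1933, 'gender_source_value': 'M'}]
```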

gender_source_value
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: M
4th row: M
5th row: M

Common Values

Value | Count | Frequency (%)
F | 55 | 55.0%
M | 45 | 45.0%

Length

[Figure: Histogram of lengths of the category]

Words

Word | Count | Frequency (%)
f | 55 | 55.0%
m | 45 | 45.0%
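The near-perfect correlation between gender_concept_id and gender_source_value reflects a one-to-one coding: in the OMOP standard vocabulary, concept 8507 is MALE and 8532 is FEMALE, and the counts (45 and 55) match M and F exactly. A minimal consistency check over (concept, source) pairs; the expected mapping is written out by hand here rather than looked up from a vocabulary table, and the function name is illustrative:

```python
# Expected pairing of OMOP gender concepts with this dataset's source codes.
EXPECTED = {8507: "M", 8532: "F"}


def mismatched_gender_rows(rows):
    """Indices of (gender_concept_id, gender_source_value) pairs that disagree."""
    return [
        i for i, (concept_id, source) in enumerate(rows)
        if EXPECTED.get(concept_id) != source
    ]


# The first five sample rows
sample = [(8532, "F"), (8507, "M"), (8507, "M"), (8507, "M"), (8507, "M")]
print(mismatched_gender_rows(sample))  # []
```

An unknown concept ID (for example 0) is reported as a mismatch, since `EXPECTED.get` returns None for it.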

race_source_value
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 6
Mean length: 5.9
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Korean
2nd row: Korean
3rd row: Korean
4th row: Korean
5th row: Korean

Common Values

Value | Count | Frequency (%)
Korean | 95 | 95.0%
<NA> | 5 | 5.0%

Length

[Figure: Histogram of lengths of the category]

Words

Word | Count | Frequency (%)
korean | 95 | 95.0%
na | 5 | 5.0%
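race_source_value reports 0 missing cells, yet 5 rows hold <NA>, and the length statistics count those entries as 4 characters (95 × 6 + 5 × 4 = 590, a mean of 5.9). That suggests the marker was treated as the text "<NA>" rather than as a true null when the report was generated. If so, the sentinel can be converted to None before profiling. The marker set and function name below are assumptions, not something the report defines:

```python
import statistics

# Text markers assumed to mean "missing"; adjust to the actual export.
MISSING_MARKERS = {"<NA>", "NA", ""}


def normalize_missing(values):
    """Replace sentinel strings with None so they count as missing."""
    return [None if v in MISSING_MARKERS else v for v in values]


raw = ["Korean"] * 95 + ["<NA>"] * 5
clean = normalize_missing(raw)

missing = sum(v is None for v in clean)
mean_len_raw = statistics.fmean(len(v) for v in raw)
mean_len_clean = statistics.fmean(len(v) for v in clean if v is not None)
print(missing, mean_len_raw, mean_len_clean)  # 5 5.9 6.0
```

After normalization the column would report 5 missing cells and a constant length of 6.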

ethnicity_source_value
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 22
Median length: 22
Mean length: 21.3
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Hispanic or Latino
2nd row: Not Hispanic or Latino
3rd row: Not Hispanic or Latino
4th row: Not Hispanic or Latino
5th row: Not Hispanic or Latino

Common Values

Value | Count | Frequency (%)
Not Hispanic or Latino | 95 | 95.0%
Hispanic | 5 | 5.0%

Length

[Figure: Histogram of lengths of the category]

Words

Word | Count | Frequency (%)
hispanic | 100 | 26.0%
not | 95 | 24.7%
or | 95 | 24.7%
latino | 95 | 24.7%
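The word table splits each value into lowercased words and counts them across all rows. "Not Hispanic or Latino" contributes four words in each of 95 rows and "Hispanic" one word in each of 5 rows, for 385 words in total, so "hispanic" appears 95 + 5 = 100 times (100 / 385 ≈ 26.0%). A sketch of that counting, assuming a simple lowercase, letters-only tokenization (ydata-profiling's own tokenizer may differ in details):

```python
import re
from collections import Counter


def word_frequencies(value_counts):
    """(word, count, percent) for lowercased words, weighting each value by its row count."""
    words = Counter()
    for value, count in value_counts.items():
        for word in re.findall(r"[a-z]+", value.lower()):
            words[word] += count
    total = sum(words.values())
    return [(w, c, round(100 * c / total, 1)) for w, c in words.most_common()]


for row in word_frequencies({"Not Hispanic or Latino": 95, "Hispanic": 5}):
    print(row)
# ('hispanic', 100, 26.0)
# ('not', 95, 24.7)
# ('or', 95, 24.7)
# ('latino', 95, 24.7)
```

The same rule turns the race_source_value entries into "korean" and "na", matching that variable's word table.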

Interactions

[Figure: pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

variable | gender_concept_id | year_of_birth | month_of_birth | race_concept_id | ethnicity_concept_id | location_id | gender_source_value | ethnicity_source_value
gender_concept_id | 1.000 | 0.068 | 0.000 | 0.000 | 0.000 | 0.000 | 0.999 | 0.000
year_of_birth | 0.068 | 1.000 | 0.000 | 0.380 | 0.380 | 0.000 | 0.068 | 0.380
month_of_birth | 0.000 | 0.000 | 1.000 | 0.395 | 0.395 | 0.000 | 0.000 | 0.395
race_concept_id | 0.000 | 0.380 | 0.395 | 1.000 | 0.986 | 0.000 | 0.000 | 0.986
ethnicity_concept_id | 0.000 | 0.380 | 0.395 | 0.986 | 1.000 | 0.000 | 0.000 | 0.986
location_id | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
gender_source_value | 0.999 | 0.068 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
ethnicity_source_value | 0.000 | 0.380 | 0.395 | 0.986 | 0.986 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

variable | race_concept_id | gender_concept_id | gender_source_value | ethnicity_source_value | race_source_value | ethnicity_concept_id
race_concept_id | 1.000 | 0.000 | 0.000 | 0.894 | 1.000 | 0.894
gender_concept_id | 0.000 | 1.000 | 0.980 | 0.000 | 1.000 | 0.000
gender_source_value | 0.000 | 0.980 | 1.000 | 0.000 | 1.000 | 0.000
ethnicity_source_value | 0.894 | 0.000 | 0.000 | 1.000 | 1.000 | 0.894
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
ethnicity_concept_id | 0.894 | 0.000 | 0.000 | 0.894 | 1.000 | 1.000

[Figure: correlation heatmap]

variable | year_of_birth | month_of_birth | location_id | gender_concept_id | race_concept_id | ethnicity_concept_id | gender_source_value | race_source_value | ethnicity_source_value
year_of_birth | 1.000 | -0.077 | -0.003 | 0.000 | 0.366 | 0.366 | 0.000 | 1.000 | 0.366
month_of_birth | -0.077 | 1.000 | -0.081 | 0.000 | 0.289 | 0.289 | 0.000 | 1.000 | 0.289
location_id | -0.003 | -0.081 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
gender_concept_id | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.980 | 1.000 | 0.000
race_concept_id | 0.366 | 0.289 | 0.000 | 0.000 | 1.000 | 0.894 | 0.000 | 1.000 | 0.894
ethnicity_concept_id | 0.366 | 0.289 | 0.000 | 0.000 | 0.894 | 1.000 | 0.000 | 1.000 | 0.894
gender_source_value | 0.000 | 0.000 | 0.000 | 0.980 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
race_source_value | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
ethnicity_source_value | 0.366 | 0.289 | 0.000 | 0.000 | 0.894 | 0.894 | 0.000 | 1.000 | 1.000
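The 0.894 linking race_concept_id, ethnicity_concept_id and ethnicity_source_value in the second and third tables, and the 0.980 linking the two gender columns, are consistent with a bias-corrected Cramér's V (Bergsma's correction) computed on a 2 × 2 contingency table with Yates' continuity correction. For race and ethnicity this assumes the 5 rows coded 8552 are the same 5 rows coded Hispanic, as in sample row 94. The same formula returns 1.0 whenever a table has only one row or column, which may be why race_source_value shows 1.000 against every variable, although the report does not say so. A standard-library sketch under those assumptions:

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table given as a list of rows.

    Applies Yates' continuity correction when there is one degree of freedom
    (as scipy.stats.chi2_contingency does with correction=True), then
    Bergsma's bias correction. Assumes every row and column total is > 0.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    yates = (r - 1) * (k - 1) == 1

    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)  # shrink the gap toward expected by at most 0.5
            chi2 += diff ** 2 / expected

    phi2 = chi2 / n
    phi2corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    rcorr = r - (r - 1) ** 2 / (n - 1)
    kcorr = k - (k - 1) ** 2 / (n - 1)
    denom = min(kcorr - 1, rcorr - 1)
    # A table with a single row or column leaves nothing to normalize by;
    # this sketch returns 1.0 in that case.
    return 1.0 if denom == 0 else math.sqrt(phi2corr / denom)


# Rows: race 38003585 / 8552; columns: Not Hispanic / Hispanic (assumed to coincide)
print(round(cramers_v_corrected([[95, 0], [0, 5]]), 3))   # 0.894
# Rows: 8507 / 8532; columns: M / F
print(round(cramers_v_corrected([[45, 0], [0, 55]]), 3))  # 0.98
# A single category on one side
print(cramers_v_corrected([[95, 5]]))                      # 1.0
```

The bias correction is why a perfect one-to-one pairing scores below 1.0 here: with only 5 rows in the minority category, the corrected statistic discounts the association that small counts can produce by chance.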

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.]

Sample

First rows

# | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value
0 | 8532 | 1957 | 4 | 1 | 38003585 | 38003564 | 42019066 | 1300 | F | Korean | Not Hispanic or Latino
1 | 8507 | 1933 | 2 | 1 | 38003585 | 38003564 | 42019046 | 1300 | M | Korean | Not Hispanic or Latino
2 | 8507 | 1934 | 8 | 1 | 38003585 | 38003564 | 42018981 | 1300 | M | Korean | Not Hispanic or Latino
3 | 8507 | 1953 | 6 | 1 | 38003585 | 38003564 | 42019032 | 1300 | M | Korean | Not Hispanic or Latino
4 | 8507 | 1983 | 1 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino
5 | 8532 | 1966 | 1 | 1 | 38003585 | 38003564 | 42019176 | 1300 | F | Korean | Not Hispanic or Latino
6 | 8507 | 1945 | 8 | 1 | 38003585 | 38003564 | 42019058 | 1300 | M | Korean | Not Hispanic or Latino
7 | 8507 | 1927 | 11 | 1 | 38003585 | 38003564 | 42019016 | 1300 | M | Korean | Not Hispanic or Latino
8 | 8507 | 1938 | 1 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino
9 | 8507 | 1938 | 4 | 1 | 38003585 | 38003564 | 42019080 | 1300 | M | Korean | Not Hispanic or Latino

Last rows

# | gender_concept_id | year_of_birth | month_of_birth | day_of_birth | race_concept_id | ethnicity_concept_id | location_id | care_site_id | gender_source_value | race_source_value | ethnicity_source_value
90 | 8507 | 1974 | 2 | 1 | 38003585 | 38003564 | 42019000 | 1300 | M | Korean | Not Hispanic or Latino
91 | 8507 | 1962 | 10 | 1 | 38003585 | 38003564 | 42019171 | 1300 | M | Korean | Not Hispanic or Latino
92 | 8532 | 1956 | 12 | 1 | 38003585 | 38003564 | 42018961 | 1300 | F | Korean | Not Hispanic or Latino
93 | 8532 | 1940 | 12 | 1 | 38003585 | 38003564 | 42019171 | 1300 | F | Korean | Not Hispanic or Latino
94 | 8507 | 1979 | 9 | 1 | 8552 | 38003563 | 0 | 1300 | M | <NA> | Hispanic
95 | 8532 | 1944 | 2 | 1 | 38003585 | 38003564 | 42019074 | 1300 | F | Korean | Not Hispanic or Latino
96 | 8532 | 1951 | 10 | 1 | 38003585 | 38003564 | 42019033 | 1300 | F | Korean | Not Hispanic or Latino
97 | 8507 | 1955 | 7 | 1 | 38003585 | 38003564 | 42019008 | 1300 | M | Korean | Not Hispanic or Latino
98 | 8532 | 1942 | 4 | 1 | 38003585 | 38003564 | 42019061 | 1300 | F | Korean | Not Hispanic or Latino
99 | 8532 | 1960 | 6 | 1 | 38003585 | 38003564 | 42019226 | 1300 | F | Korean | Not Hispanic or Latino
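For working with these rows programmatically, a typed record for the 11 profiled columns can catch coding problems early. The sketch below parses CSV lines in the column order of the sample tables with the standard csv module. The class name, the month check, and the choice to treat location_id 0 and race_source_value "<NA>" as missing are illustrative assumptions, not rules stated by the report or the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

COLUMNS = [
    "gender_concept_id", "year_of_birth", "month_of_birth", "day_of_birth",
    "race_concept_id", "ethnicity_concept_id", "location_id", "care_site_id",
    "gender_source_value", "race_source_value", "ethnicity_source_value",
]


@dataclass(frozen=True)
class PersonRow:
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    race_concept_id: int
    ethnicity_concept_id: int
    location_id: Optional[int]        # 0 treated as unknown (assumption)
    care_site_id: int
    gender_source_value: str
    race_source_value: Optional[str]  # "<NA>" treated as missing (assumption)
    ethnicity_source_value: str

    @classmethod
    def from_record(cls, rec: dict) -> "PersonRow":
        month = int(rec["month_of_birth"])
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month_of_birth: {month}")
        race_src = rec["race_source_value"]
        return cls(
            gender_concept_id=int(rec["gender_concept_id"]),
            year_of_birth=int(rec["year_of_birth"]),
            month_of_birth=month,
            day_of_birth=int(rec["day_of_birth"]),
            race_concept_id=int(rec["race_concept_id"]),
            ethnicity_concept_id=int(rec["ethnicity_concept_id"]),
            location_id=int(rec["location_id"]) or None,  # 0 becomes None
            care_site_id=int(rec["care_site_id"]),
            gender_source_value=rec["gender_source_value"],
            race_source_value=None if race_src == "<NA>" else race_src,
            ethnicity_source_value=rec["ethnicity_source_value"],
        )


# Sample rows 0 and 94 written as CSV without a header line
text = (
    "8532,1957,4,1,38003585,38003564,42019066,1300,F,Korean,Not Hispanic or Latino\n"
    "8507,1979,9,1,8552,38003563,0,1300,M,<NA>,Hispanic\n"
)
reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
people = [PersonRow.from_record(rec) for rec in reader]
print(people[1].location_id, people[1].race_source_value)  # None None
```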