Overview

Dataset statistics

Number of variables: 11
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.5 KiB
Average record size in memory: 97.3 B

Variable types

Categorical: 8
Numeric: 3

Dataset

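Description: Basic information on patients with alcohol use disorder, produced in OMOP CDM format
Author: Seoul St. Mary's Hospital, The Catholic University of Korea (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/alcohol_person_2020-omop-cdm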

Alerts

day_of_birth has constant value "1" [Constant]
care_site_id has constant value "1300" [Constant]
ethnicity_source_value is highly overall correlated with race_concept_id and 2 other fields [High correlation]
race_source_value is highly overall correlated with year_of_birth and 7 other fields [High correlation]
gender_source_value is highly overall correlated with gender_concept_id and 1 other field [High correlation]
gender_concept_id is highly overall correlated with gender_source_value and 1 other field [High correlation]
race_concept_id is highly overall correlated with ethnicity_concept_id and 2 other fields [High correlation]
ethnicity_concept_id is highly overall correlated with race_concept_id and 2 other fields [High correlation]
year_of_birth is highly overall correlated with race_source_value [High correlation]
month_of_birth is highly overall correlated with race_source_value [High correlation]
location_id is highly overall correlated with race_source_value [High correlation]
race_concept_id is highly imbalanced (85.9%) [Imbalance]
ethnicity_concept_id is highly imbalanced (85.9%) [Imbalance]
race_source_value is highly imbalanced (85.9%) [Imbalance]
ethnicity_source_value is highly imbalanced (85.9%) [Imbalance]
location_id has 5 (5.0%) zeros [Zeros]
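All four imbalance figures are identical because those four columns share the same 98/2 split. The sketch below reproduces the 85.9% figure under an assumption that the report does not state. It assumes imbalance is scored as one minus the Shannon entropy of the value counts, divided by the logarithm of the number of distinct values. Under that definition, 0 means perfectly balanced and values near 1 mean one category dominates. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Hypothetical helper: 1 - (Shannon entropy / log(number of classes)).

    This normalized-entropy definition is an assumption used to explain
    the report's figures, not a documented formula.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 1.0  # assumption: a single class is treated as fully imbalanced
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)


# 98 rows hold one category and 2 rows the other (race and ethnicity columns)
print(f"{imbalance_score([98, 2]):.1%}")  # 85.9%
# gender_concept_id has an 84/16 split and is not flagged as imbalanced
print(f"{imbalance_score([84, 16]):.1%}")
```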

Reproduction

Analysis started: 2023-10-08 18:58:03.465264
Analysis finished: 2023-10-08 18:58:06.071373
Duration: 2.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

gender_concept_id
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: 8507 (84), 8532 (16)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8507
2nd row: 8532
3rd row: 8507
4th row: 8507
5th row: 8532

Common Values

Value  Count  Frequency (%)
8507   84     84.0%
8532   16     16.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies:
Value  Count  Frequency (%)
8507   84     84.0%
8532   16     16.0%

year_of_birth
Real number (ℝ)

HIGH CORRELATION 

Distinct: 44
Distinct (%): 44.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1962.65
Minimum: 1936
Maximum: 1998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1936
5th percentile: 1942
Q1: 1954
Median: 1959.5
Q3: 1972
95th percentile: 1992
Maximum: 1998
Range: 62
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 14.270387
Coefficient of variation (CV): 0.0072709789
Kurtosis: -0.061042274
Mean: 1962.65
Median Absolute Deviation (MAD): 8.5
Skewness: 0.60106926
Sum: 196265
Variance: 203.64394
Monotonicity: Not monotonic
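Several of these statistics follow directly from others, so they can be cross-checked. This sketch recomputes four of them for year_of_birth from the reported numbers:

- the coefficient of variation (standard deviation divided by mean);
- the variance (the square of the standard deviation);
- the interquartile range (Q3 minus Q1);
- the range (maximum minus minimum).

Any small residuals come only from rounding of the inputs.

```python
# Reported summary statistics for year_of_birth, copied from the tables above
mean = 1962.65
std = 14.270387
q1, q3 = 1954, 1972
minimum, maximum = 1936, 1998

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.10f} variance={variance:.5f} IQR={iqr} range={value_range}")
```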
Histogram with fixed size bins (bins=44)
Common values:
Value              Count  Frequency (%)
1958               5      5.0%
1955               5      5.0%
1954               5      5.0%
1962               5      5.0%
1972               4      4.0%
1957               4      4.0%
1960               4      4.0%
1968               4      4.0%
1956               4      4.0%
1952               4      4.0%
Other values (34)  56     56.0%
Minimum 10 values:
Value  Count  Frequency (%)
1936   1      1.0%
1937   1      1.0%
1939   1      1.0%
1940   1      1.0%
1942   2      2.0%
1943   3      3.0%
1944   1      1.0%
1946   1      1.0%
1947   1      1.0%
1949   1      1.0%
Maximum 10 values:
Value  Count  Frequency (%)
1998   1      1.0%
1996   1      1.0%
1995   2      2.0%
1992   2      2.0%
1991   1      1.0%
1988   2      2.0%
1982   2      2.0%
1981   2      2.0%
1979   2      2.0%
1977   3      3.0%

month_of_birth
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.42
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 7
Q3: 9
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.5109641
Coefficient of variation (CV): 0.54687914
Kurtosis: -1.1519945
Mean: 6.42
Median Absolute Deviation (MAD): 3
Skewness: -0.085036171
Sum: 642
Variance: 12.326869
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values:
Value             Count  Frequency (%)
1                 12     12.0%
9                 11     11.0%
7                 11     11.0%
6                 10     10.0%
2                 9      9.0%
12                9      9.0%
8                 8      8.0%
10                8      8.0%
4                 6      6.0%
5                 6      6.0%
Other values (2)  10     10.0%
Minimum 10 values:
Value  Count  Frequency (%)
1      12     12.0%
2      9      9.0%
3      5      5.0%
4      6      6.0%
5      6      6.0%
6      10     10.0%
7      11     11.0%
8      8      8.0%
9      11     11.0%
10     8      8.0%
Maximum 10 values:
Value  Count  Frequency (%)
12     9      9.0%
11     5      5.0%
10     8      8.0%
9      11     11.0%
8      8      8.0%
7      11     11.0%
6      10     10.0%
5      6      6.0%
4      6      6.0%
3      5      5.0%
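The minimum-value and maximum-value tables together cover all 12 distinct months. They therefore give the complete distribution of month_of_birth, and the summary statistics above can be recomputed from it. This sketch uses only Python's standard library. It assumes the report uses the sample variance (n - 1 denominator) and computes quartiles by linear interpolation between order statistics. The latter is what `statistics.quantiles(..., method="inclusive")` does.

```python
import statistics

# Full frequency table of month_of_birth (month -> count), read off the
# minimum-value and maximum-value tables above
month_counts = {1: 12, 2: 9, 3: 5, 4: 6, 5: 6, 6: 10,
                7: 11, 8: 8, 9: 11, 10: 8, 11: 5, 12: 9}

# Expand the counts back into the 100 individual observations, in sorted order
months = [m for m, c in sorted(month_counts.items()) for _ in range(c)]

mean = statistics.mean(months)
variance = statistics.variance(months)  # sample variance (n - 1 denominator)
stdev = statistics.stdev(months)
# "inclusive" interpolates between order statistics at position (n - 1) * p
q1, median, q3 = statistics.quantiles(months, n=4, method="inclusive")

print(len(months), mean, round(variance, 6), round(stdev, 7), q1, median, q3)
```

The recomputed values agree with the reported mean (6.42), variance (12.326869), standard deviation (3.5109641) and quartiles (3, 7, 9).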

day_of_birth
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: 1 (100)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies:
Value  Count  Frequency (%)
1      100    100.0%

race_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: 38003585 (98), 8552 (2)

Length

Max length: 8
Median length: 8
Mean length: 7.92
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 38003585
2nd row: 38003585
3rd row: 38003585
4th row: 38003585
5th row: 38003585

Common Values

Value     Count  Frequency (%)
38003585  98     98.0%
8552      2      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies:
Value     Count  Frequency (%)
38003585  98     98.0%
8552      2      2.0%

ethnicity_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: 38003564 (98), 38003563 (2)

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 38003564
2nd row: 38003564
3rd row: 38003564
4th row: 38003564
5th row: 38003564

Common Values

Value     Count  Frequency (%)
38003564  98     98.0%
38003563  2      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies:
Value     Count  Frequency (%)
38003564  98     98.0%
38003563  2      2.0%

location_id
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 46
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39918147
Minimum: 0
Maximum: 42019244
Zeros: 5
Zeros (%): 5.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 39918007
Q1: 42019016
Median: 42019074
Q3: 42019171
95th percentile: 42019238
Maximum: 42019244
Range: 42019244
Interquartile range (IQR): 155

Descriptive statistics

Standard deviation: 9203986.5
Coefficient of variation (CV): 0.23057149
Kurtosis: 15.895778
Mean: 39918147
Median Absolute Deviation (MAD): 96.5
Skewness: -4.1926366
Sum: 3.9918147 × 10^9
Variance: 8.4713368 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Common values:
Value              Count  Frequency (%)
42019171           16     16.0%
42019074           9      9.0%
42019016           8      8.0%
42019238           5      5.0%
42019008           5      5.0%
0                  5      5.0%
42019046           4      4.0%
42019244           3      3.0%
42019220           2      2.0%
42019190           2      2.0%
Other values (36)  41     41.0%
Minimum 10 values:
Value     Count  Frequency (%)
0         5      5.0%
42018955  1      1.0%
42018961  1      1.0%
42018962  2      2.0%
42018965  1      1.0%
42018968  2      2.0%
42018976  1      1.0%
42018983  1      1.0%
42018994  1      1.0%
42018997  1      1.0%
Maximum 10 values:
Value     Count  Frequency (%)
42019244  3      3.0%
42019243  1      1.0%
42019238  5      5.0%
42019231  1      1.0%
42019229  1      1.0%
42019220  2      2.0%
42019204  1      1.0%
42019199  1      1.0%
42019190  2      2.0%
42019189  1      1.0%
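The 5th percentile of location_id (39918007) is not an actual location ID. It is an interpolation artifact of the five zeros. The sketch below assumes the report uses linear interpolation between order statistics, the default method in NumPy and pandas. With 100 observations, the 5th percentile then sits at position 0.05 × 99 = 4.95. That position is 95% of the way from the fifth-smallest value (the last of the five zeros) to the sixth-smallest (42018955). The helper name `linear_quantile` is hypothetical.

```python
import math


def linear_quantile(sorted_values, n, p):
    """Hypothetical helper: quantile by linear interpolation at h = (n - 1) * p.

    Only the order statistics at floor(h) and floor(h) + 1 are read, so
    `sorted_values` may hold just the smallest values of a column of n rows.
    """
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


# The 16 smallest location_id values, expanded from the minimum-value table above
smallest = [0] * 5 + [42018955, 42018961, 42018962, 42018962, 42018965,
                      42018968, 42018968, 42018976, 42018983, 42018994, 42018997]

p5 = linear_quantile(smallest, n=100, p=0.05)
print(p5)  # about 39918007.25: 95% of the way from 0 to 42018955
```

The same five zeros also pull the mean (39918147) well below the median (42019074) and produce the strong negative skewness.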

care_site_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: 1300 (100)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1300
2nd row: 1300
3rd row: 1300
4th row: 1300
5th row: 1300

Common Values

Value  Count  Frequency (%)
1300   100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies:
Value  Count  Frequency (%)
1300   100    100.0%

gender_source_value
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: M (84), F (16)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: M
5th row: F

Common Values

Value  Count  Frequency (%)
M      84     84.0%
F      16     16.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies (lower-cased):
Value  Count  Frequency (%)
m      84     84.0%
f      16     16.0%

race_source_value
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: Korean (98), <NA> (2)

Length

Max length: 6
Median length: 6
Mean length: 5.96
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Korean
2nd row: Korean
3rd row: Korean
4th row: Korean
5th row: Korean

Common Values

Value   Count  Frequency (%)
Korean  98     98.0%
<NA>    2      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies (lower-cased; the "<NA>" text appears as the token "na"):
Value   Count  Frequency (%)
korean  98     98.0%
na      2      2.0%
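The length statistics show how the two "<NA>" entries in race_source_value are stored. They are the literal four-character text "<NA>", not missing values, which is consistent with the Missing count of 0. The reported mean length of 5.96 is exactly what 98 six-character "Korean" values and 2 four-character "<NA>" values produce. A quick check:

```python
from statistics import mean

# Reported value counts for race_source_value
value_counts = {"Korean": 98, "<NA>": 2}

# One string length per row; "<NA>" contributes 4 characters when stored as text
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

print(mean(lengths), max(lengths), min(lengths))  # 5.96 6 4
```

The same arithmetic gives the mean lengths reported for race_concept_id, (98 × 8 + 2 × 4) / 100 = 7.92, and for ethnicity_source_value, (98 × 22 + 2 × 8) / 100 = 21.72.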

ethnicity_source_value
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top values: Not Hispanic or Latino (98), Hispanic (2)

Length

Max length: 22
Median length: 22
Mean length: 21.72
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Hispanic or Latino
2nd row: Not Hispanic or Latino
3rd row: Not Hispanic or Latino
4th row: Not Hispanic or Latino
5th row: Not Hispanic or Latino

Common Values

Value                   Count  Frequency (%)
Not Hispanic or Latino  98     98.0%
Hispanic                2      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Word frequencies (each row counts a lower-cased word token across all 100 values, so the percentages are relative to 394 tokens):
Value     Count  Frequency (%)
hispanic  100    25.4%
not       98     24.9%
or        98     24.9%
latino    98     24.9%
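The word-frequency table counts lower-cased word tokens rather than whole values. That is why "hispanic" appears 100 times: 98 times inside "Not Hispanic or Latino" plus the 2 "Hispanic" values. A minimal sketch that reproduces the table from the two value counts. It assumes simple whitespace tokenization; the report's actual tokenizer may differ, for example in how it treats punctuation.

```python
from collections import Counter

# Reported value counts for ethnicity_source_value
value_counts = {"Not Hispanic or Latino": 98, "Hispanic": 2}

# Lower-case each value, split on whitespace, and weight every token by the
# number of rows that hold the value
words = Counter()
for value, count in value_counts.items():
    for token in value.lower().split():
        words[token] += count

total = sum(words.values())  # 394 tokens in total
for word, count in words.most_common():
    print(f"{word:<10}{count:>5}{count / total:>8.1%}")
```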

Interactions

[Nine pairwise interaction plots of the numeric variables year_of_birth, month_of_birth and location_id; images not reproduced.]

Correlations

[Correlation heatmap; image not reproduced.]
Correlation matrix (rows and columns in the same order):
                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) gender_concept_id       1.000   0.331   0.169   0.000   0.000   0.000   0.998   0.000
(2) year_of_birth           0.331   1.000   0.403   0.000   0.000   0.046   0.331   0.000
(3) month_of_birth          0.169   0.403   1.000   0.144   0.144   0.097   0.169   0.144
(4) race_concept_id         0.000   0.000   0.144   1.000   0.919   0.000   0.000   0.919
(5) ethnicity_concept_id    0.000   0.000   0.144   0.919   1.000   0.000   0.000   0.919
(6) location_id             0.000   0.046   0.097   0.000   0.000   1.000   0.000   0.000
(7) gender_source_value     0.998   0.331   0.169   0.000   0.000   0.000   1.000   0.000
(8) ethnicity_source_value  0.000   0.000   0.144   0.919   0.919   0.000   0.000   1.000
[Correlation heatmap; image not reproduced.]
Correlation matrix (rows and columns in the same order):
                            (1)     (2)     (3)     (4)     (5)     (6)
(1) ethnicity_source_value  1.000   1.000   0.000   0.000   0.742   0.742
(2) race_source_value       1.000   1.000   1.000   1.000   1.000   1.000
(3) gender_source_value     0.000   1.000   1.000   0.962   0.000   0.000
(4) gender_concept_id       0.000   1.000   0.962   1.000   0.000   0.000
(5) race_concept_id         0.742   1.000   0.000   0.000   1.000   0.742
(6) ethnicity_concept_id    0.742   1.000   0.000   0.000   0.742   1.000
[Correlation heatmap; image not reproduced.]
Correlation matrix (rows and columns in the same order):
                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
(1) year_of_birth           1.000   0.019   -0.017  0.247   0.000   0.000   0.247   1.000   0.000
(2) month_of_birth          0.019   1.000   -0.191  0.121   0.101   0.101   0.121   1.000   0.101
(3) location_id             -0.017  -0.191  1.000   0.000   0.000   0.000   0.000   1.000   0.000
(4) gender_concept_id       0.247   0.121   0.000   1.000   0.000   0.000   0.962   1.000   0.000
(5) race_concept_id         0.000   0.101   0.000   0.000   1.000   0.742   0.000   1.000   0.742
(6) ethnicity_concept_id    0.000   0.101   0.000   0.000   0.742   1.000   0.000   1.000   0.742
(7) gender_source_value     0.247   0.121   0.000   0.962   0.000   0.000   1.000   1.000   0.000
(8) race_source_value       1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
(9) ethnicity_source_value  0.000   0.101   0.000   0.000   0.742   0.742   0.000   1.000   1.000
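Some coefficients stay below 1 even where the association looks perfect. gender_concept_id and gender_source_value score 0.962, and race_concept_id and ethnicity_concept_id score 0.742. One plausible explanation, offered as an assumption rather than a documented fact about the report, is a bias-corrected Cramér's V. On this reading, the chi-squared statistic uses Yates' continuity correction (what `scipy.stats.chi2_contingency` applies to 2×2 tables by default) and is followed by Bergsma's small-sample correction. The standard-library sketch below applies that formula to perfectly aligned 2×2 tables built from the reported value counts. Gender uses an 84/16 split. Race and ethnicity use a 98/2 split, assuming the two rare values fall in the same two rows, as they do in row 94 of the sample. The function name `cramers_v_corrected` is hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Hypothetical helper: bias-corrected Cramér's V for a 2x2 count table.

    Assumptions: chi-squared with Yates' continuity correction, then the
    Bergsma small-sample bias correction.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: move each observed count up to 0.5 towards its expected value
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    r = k = 2
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Perfect one-to-one association with an 84/16 split (gender columns)
gender = cramers_v_corrected([[84, 0], [0, 16]])
# Perfect association with a 98/2 split (race vs ethnicity, assumed aligned)
race_ethnicity = cramers_v_corrected([[98, 0], [0, 2]])
print(round(gender, 3), round(race_ethnicity, 3))  # 0.962 0.742
```

If this reading is right, a coefficient below 1 here reflects the skewed category counts, not a weaker relationship between the columns.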

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows:
    gender_concept_id  year_of_birth  month_of_birth  day_of_birth  race_concept_id  ethnicity_concept_id  location_id  care_site_id  gender_source_value  race_source_value  ethnicity_source_value
 0  8507               1951           8               1             38003585         38003564              42019046     1300          M                    Korean             Not Hispanic or Latino
 1  8532               1973           2               1             38003585         38003564              42019008     1300          F                    Korean             Not Hispanic or Latino
 2  8507               1958           8               1             38003585         38003564              42019108     1300          M                    Korean             Not Hispanic or Latino
 3  8507               1959           4               1             38003585         38003564              42019170     1300          M                    Korean             Not Hispanic or Latino
 4  8532               1957           11              1             38003585         38003564              42019008     1300          F                    Korean             Not Hispanic or Latino
 5  8507               1960           9               1             38003585         38003564              42019047     1300          M                    Korean             Not Hispanic or Latino
 6  8532               1998           10              1             38003585         38003564              42019143     1300          F                    Korean             Not Hispanic or Latino
 7  8507               1964           9               1             38003585         38003564              42019030     1300          M                    Korean             Not Hispanic or Latino
 8  8507               1953           1               1             38003585         38003564              42019171     1300          M                    Korean             Not Hispanic or Latino
 9  8507               1957           9               1             38003585         38003564              0            1300          M                    Korean             Not Hispanic or Latino

Last rows:
    gender_concept_id  year_of_birth  month_of_birth  day_of_birth  race_concept_id  ethnicity_concept_id  location_id  care_site_id  gender_source_value  race_source_value  ethnicity_source_value
90  8507               1950           10              1             38003585         38003564              42019074     1300          M                    Korean             Not Hispanic or Latino
91  8507               1968           5               1             38003585         38003564              42019238     1300          M                    Korean             Not Hispanic or Latino
92  8507               1960           8               1             38003585         38003564              42019072     1300          M                    Korean             Not Hispanic or Latino
93  8507               1942           1               1             38003585         38003564              42019016     1300          M                    Korean             Not Hispanic or Latino
94  8507               1972           12              1             8552             38003563              42019238     1300          M                    <NA>               Hispanic
95  8507               1954           1               1             38003585         38003564              42019243     1300          M                    Korean             Not Hispanic or Latino
96  8507               1952           6               1             38003585         38003564              42019190     1300          M                    Korean             Not Hispanic or Latino
97  8532               1962           2               1             38003585         38003564              42019238     1300          F                    Korean             Not Hispanic or Latino
98  8532               1956           1               1             38003585         38003564              42019238     1300          F                    Korean             Not Hispanic or Latino
99  8507               1950           2               1             38003585         38003564              42019171     1300          M                    Korean             Not Hispanic or Latino
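The near-1 correlation between gender_concept_id and gender_source_value suggests that one column is a recoding of the other. The minimal sketch below checks this on the 20 sample rows only (rows 0 to 9 and 90 to 99). It cannot prove that the mapping holds for all 100 rows. The helper name `code_to_labels` is hypothetical.

```python
from collections import defaultdict


def code_to_labels(codes, labels):
    """Hypothetical helper: map each code to the set of labels it co-occurs with."""
    mapping = defaultdict(set)
    for code, label in zip(codes, labels):
        mapping[code].add(label)
    return dict(mapping)


# gender_concept_id and gender_source_value for the 20 sample rows,
# in the order listed above (rows 0-9, then rows 90-99)
gender_concept_id = [8507, 8532, 8507, 8507, 8532, 8507, 8532, 8507, 8507, 8507,
                     8507, 8507, 8507, 8507, 8507, 8507, 8507, 8532, 8532, 8507]
gender_source_value = ["M", "F", "M", "M", "F", "M", "F", "M", "M", "M",
                       "M", "M", "M", "M", "M", "M", "M", "F", "F", "M"]

mapping = code_to_labels(gender_concept_id, gender_source_value)
# Every concept ID co-occurs with exactly one label in these rows
one_to_one = all(len(labels) == 1 for labels in mapping.values())
print(mapping, one_to_one)
```

Running the same check over the full table would show whether either column can be dropped as redundant.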