Overview

Dataset statistics

Number of variables: 7
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.1 KiB
Average record size in memory: 62.3 B

Variable types

Numeric: 3
Categorical: 4

Dataset

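Description: Condition (symptom) information for patients with alcohol use disorder, produced in OMOP CDM format
Author: The Catholic University of Korea, Seoul St. Mary's Hospital
URL: http://cmcdata.net/data/dataset/alcohol_condition_occurrence_2020-omop-cdm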

Alerts

Constant: condition_status_source_value has constant value "C"
Constant: condition_status_concept_id has constant value "4230359"
High correlation: condition_occurrence_id is highly overall correlated with condition_start_date
High correlation: condition_concept_id is highly overall correlated with condition_source_value
High correlation: condition_start_date is highly overall correlated with condition_occurrence_id
High correlation: condition_source_value is highly overall correlated with condition_concept_id
Imbalance: condition_type_concept_id is highly imbalanced (53.1%)
Unique: condition_occurrence_id has unique values
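
The 53.1% imbalance figure can be reproduced from the category counts reported for condition_type_concept_id (90 and 10). The sketch below assumes the score is defined as one minus the Shannon entropy of the class shares divided by the maximum possible entropy, log2 of the number of classes. That definition is an assumption that happens to match the reported value, and the function name is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class shares and k is the number of classes (assumed definition)."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts reported for condition_type_concept_id: 44786629 -> 90, 44786627 -> 10
score = imbalance_score([90, 10])
print(f"{score:.1%}")  # prints 53.1%
```

A perfectly balanced column scores 0, and the score approaches 1 as a single category dominates.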

Reproduction

Analysis started: 2023-10-08 18:55:58.070727
Analysis finished: 2023-10-08 18:56:02.651693
Duration: 4.58 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

condition_occurrence_id
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56274886
Minimum: 37892410
Maximum: 76135914
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 37892410
5-th percentile: 38391587
Q1: 46607654
Median: 57291136
Q3: 66851673
95-th percentile: 72039281
Maximum: 76135914
Range: 38243504
Interquartile range (IQR): 20244020

Descriptive statistics

Standard deviation: 11203691
Coefficient of variation (CV): 0.19908865
Kurtosis: -1.1244452
Mean: 56274886
Median Absolute Deviation (MAD): 9600076.5
Skewness: -0.16326485
Sum: 5.6274886 × 10^9
Variance: 1.2552269 × 10^14
Monotonicity: Not monotonic
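
Several of the statistics for condition_occurrence_id are derived from others in the same listing. The sketch below recomputes them under the usual textbook definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared). It uses the rounded values printed in this report as inputs, so the last digit can differ from the reported figure.

```python
# Summary values copied from the report for condition_occurrence_id.
n = 100
mean = 56274886
std = 11203691
minimum, maximum = 37892410, 76135914
q1, q3 = 46607654, 66851673

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
total = mean * n                   # sum of all values
variance = std ** 2                # variance from the standard deviation

print(f"CV       = {cv:.8f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.7e}")
print(f"Variance = {variance:.7e}")
```

The recomputed IQR is 20244019 rather than the reported 20244020, most likely because the report rounds Q1 and Q3 before printing them.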
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
70329014           1      1.0%
44799099           1      1.0%
61973785           1      1.0%
67690880           1      1.0%
44434129           1      1.0%
38392457           1      1.0%
43181834           1      1.0%
39411149           1      1.0%
53432479           1      1.0%
57291132           1      1.0%
Other values (90)  90     90.0%

Value     Count  Frequency (%)
37892410  1      1.0%
37959287  1      1.0%
37959288  1      1.0%
38375073  1      1.0%
38375074  1      1.0%
38392456  1      1.0%
38392457  1      1.0%
38761936  1      1.0%
39224891  1      1.0%
39224893  1      1.0%

Value     Count  Frequency (%)
76135914  1      1.0%
74487210  1      1.0%
74487206  1      1.0%
74453623  1      1.0%
74084930  1      1.0%
71931615  1      1.0%
71931613  1      1.0%
71931607  1      1.0%
70329014  1      1.0%
70121178  1      1.0%

condition_concept_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 18
Distinct (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7302959.2
Minimum: 194984
Maximum: 40525349
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 194984
5-th percentile: 197032
Q1: 320128
Median: 4001171
Q3: 4145627
95-th percentile: 40525349
Maximum: 40525349
Range: 40330365
Interquartile range (IQR): 3825499

Descriptive statistics

Standard deviation: 13020870
Coefficient of variation (CV): 1.782958
Kurtosis: 2.841802
Mean: 7302959.2
Median Absolute Deviation (MAD): 3565647
Skewness: 2.1491705
Sum: 7.3029592 × 10^8
Variance: 1.6954305 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)

Value             Count  Frequency (%)
4001171           23     23.0%
201820            13     13.0%
40525349          11     11.0%
320128            10     10.0%
435524            8      8.0%
4098483           7      7.0%
197032            6      6.0%
4340383           6      6.0%
4145627           4      4.0%
4096673           2      2.0%
Other values (8)  10     10.0%

Value     Count  Frequency (%)
194984    1      1.0%
197032    6      6.0%
201820    13     13.0%
320128    10     10.0%
321318    2      2.0%
435524    8      8.0%
4001171   23     23.0%
4096044   1      1.0%
4096673   2      2.0%
4098483   7      7.0%

Value     Count  Frequency (%)
40525349  11     11.0%
40397928  1      1.0%
40356720  1      1.0%
4342779   1      1.0%
4340383   6      6.0%
4169287   2      2.0%
4145627   4      4.0%
4121624   1      1.0%
4098483   7      7.0%
4096673   2      2.0%

condition_start_date
Real number (ℝ)

HIGH CORRELATION 

Distinct: 36
Distinct (%): 36.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201658.23
Minimum: 201402
Maximum: 201909
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 201402
5-th percentile: 201403
Q1: 201509
Median: 201703
Q3: 201807
95-th percentile: 201902.2
Maximum: 201909
Range: 507
Interquartile range (IQR): 298

Descriptive statistics

Standard deviation: 159.52574
Coefficient of variation (CV): 0.00079106982
Kurtosis: -1.1169059
Mean: 201658.23
Median Absolute Deviation (MAD): 104
Skewness: -0.28461164
Sum: 20165823
Variance: 25448.462
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=36)

Value              Count  Frequency (%)
201807             7      7.0%
201405             6      6.0%
201703             5      5.0%
201808             5      5.0%
201812             4      4.0%
201403             4      4.0%
201505             4      4.0%
201906             4      4.0%
201704             4      4.0%
201803             4      4.0%
Other values (26)  53     53.0%

Value   Count  Frequency (%)
201402  3      3.0%
201403  4      4.0%
201404  1      1.0%
201405  6      6.0%
201407  1      1.0%
201411  2      2.0%
201502  1      1.0%
201503  1      1.0%
201504  1      1.0%
201505  4      4.0%

Value   Count  Frequency (%)
201909  1      1.0%
201906  4      4.0%
201902  3      3.0%
201812  4      4.0%
201810  3      3.0%
201808  5      5.0%
201807  7      7.0%
201806  1      1.0%
201803  4      4.0%
201802  2      2.0%
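
The values of condition_start_date (201402 to 201909) look like year-month codes in YYYYMM form, even though the report profiles the column as a real number. Assuming that encoding, which the report itself does not state, the sketch below splits a code into year and month and measures the span in calendar months. It illustrates why numeric summaries such as the range of 507 are hard to interpret for this column. Both function names are hypothetical.

```python
def parse_yyyymm(value):
    """Split an integer such as 201812 into (year, month), assuming YYYYMM encoding."""
    year, month = divmod(int(value), 100)
    if not 1 <= month <= 12:
        raise ValueError(f"{value!r} is not a valid YYYYMM code")
    return year, month


def months_between(start, end):
    """Number of calendar months from one YYYYMM code to another."""
    start_year, start_month = parse_yyyymm(start)
    end_year, end_month = parse_yyyymm(end)
    return (end_year - start_year) * 12 + (end_month - start_month)


# Minimum and maximum reported for condition_start_date.
print(parse_yyyymm(201402))            # (2014, 2)
print(months_between(201402, 201909))  # 67
```

Under this reading the data cover 67 months, from February 2014 to September 2019, whereas the arithmetic difference of the raw codes is 507.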

condition_type_concept_id
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

44786629  90
44786627  10

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 44786629
2nd row: 44786629
3rd row: 44786629
4th row: 44786629
5th row: 44786629

Common Values

Value     Count  Frequency (%)
44786629  90     90.0%
44786627  10     10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
44786629  90     90.0%
44786627  10     10.0%

condition_source_value
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

C22                23
E14                13
Z94                11
I10                10
G47                8
Other values (11)  35

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 4
Unique (%): 4.0%

Sample

1st row: I20
2nd row: F32
3rd row: I20
4th row: E11
5th row: K76

Common Values

Value             Count  Frequency (%)
C22               23     23.0%
E14               13     13.0%
Z94               11     11.0%
I10               10     10.0%
G47               8      8.0%
E78               7      7.0%
K70               6      6.0%
N40               6      6.0%
K80               4      4.0%
E11               3      3.0%
Other values (6)  9      9.0%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
c22               23     23.0%
e14               13     13.0%
z94               11     11.0%
i10               10     10.0%
g47               8      8.0%
e78               7      7.0%
k70               6      6.0%
n40               6      6.0%
k80               4      4.0%
e11               3      3.0%
Other values (6)  9      9.0%

condition_status_source_value
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

C  100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: C
4th row: C
5th row: C

Common Values

Value  Count  Frequency (%)
C      100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
c      100    100.0%

condition_status_concept_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

4230359  100

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4230359
2nd row: 4230359
3rd row: 4230359
4th row: 4230359
5th row: 4230359

Common Values

Value    Count  Frequency (%)
4230359  100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
4230359  100    100.0%

Interactions

[Nine interaction plots; images not included in this text.]

Correlations

                           condition_occurrence_id  condition_concept_id  condition_start_date  condition_type_concept_id  condition_source_value
condition_occurrence_id    1.000                    0.569                 0.922                 0.351                      0.638
condition_concept_id       0.569                    1.000                 0.686                 0.071                      1.000
condition_start_date       0.922                    0.686                 1.000                 0.275                      0.606
condition_type_concept_id  0.351                    0.071                 0.275                 1.000                      0.466
condition_source_value     0.638                    1.000                 0.606                 0.466                      1.000

                           condition_type_concept_id  condition_source_value
condition_type_concept_id  1.000                      0.338
condition_source_value     0.338                      1.000

                           condition_occurrence_id  condition_concept_id  condition_start_date  condition_type_concept_id  condition_source_value
condition_occurrence_id    1.000                    -0.155                0.999                 0.250                      0.296
condition_concept_id       -0.155                   1.000                 -0.156                0.117                      0.914
condition_start_date       0.999                    -0.156                1.000                 0.249                      0.289
condition_type_concept_id  0.250                    0.117                 0.249                 1.000                      0.338
condition_source_value     0.296                    0.914                 0.289                 0.338                      1.000
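
The report does not name the coefficient behind each matrix. Purely as an illustration, the sketch below computes Spearman's rank correlation between condition_occurrence_id and condition_start_date for the first ten rows listed in this report's Sample section. Spearman's coefficient is an assumption chosen for the example, not a statement of what the report used, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# First ten sample rows of the report (rows 0-9).
occurrence_id = [70329014, 67974602, 66284110, 74084930, 76135914,
                 67974600, 40026298, 58051545, 58752609, 38761936]
start_date = [201812, 201808, 201806, 201906, 201909,
              201808, 201407, 201705, 201706, 201404]

rho = spearman(occurrence_id, start_date)
print(round(rho, 3))  # 0.997
```

On these ten rows the identifiers rise almost exactly in step with the start dates, which is consistent with the high-correlation alert for this pair. Ten rows are too few to reproduce the matrix values above.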

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

    condition_occurrence_id  condition_concept_id  condition_start_date  condition_type_concept_id  condition_source_value  condition_status_source_value  condition_status_concept_id
0   70329014                 321318                201812                44786629                   I20                     C                              4230359
1   67974602                 40356720              201808                44786629                   F32                     C                              4230359
2   66284110                 321318                201806                44786629                   I20                     C                              4230359
3   74084930                 4096044               201906                44786629                   E11                     C                              4230359
4   76135914                 194984                201909                44786629                   K76                     C                              4230359
5   67974600                 4121624               201808                44786627                   I65                     C                              4230359
6   40026298                 4169287               201407                44786629                   L29                     C                              4230359
7   58051545                 201820                201705                44786629                   E14                     C                              4230359
8   58752609                 4001171               201706                44786629                   C22                     C                              4230359
9   38761936                 40525349              201404                44786629                   Z94                     C                              4230359

    condition_occurrence_id  condition_concept_id  condition_start_date  condition_type_concept_id  condition_source_value  condition_status_source_value  condition_status_concept_id
90  39235215                 40525349              201405                44786629                   Z94                     C                              4230359
91  68860230                 4098483               201810                44786629                   E78                     C                              4230359
92  41864809                 40525349              201411                44786629                   Z94                     C                              4230359
93  37959288                 40525349              201402                44786629                   Z94                     C                              4230359
94  70121173                 435524                201812                44786629                   G47                     C                              4230359
95  44799096                 40525349              201505                44786627                   Z94                     C                              4230359
96  44742779                 40525349              201505                44786629                   Z94                     C                              4230359
97  67160029                 4001171               201807                44786629                   C22                     C                              4230359
98  38375073                 320128                201403                44786629                   I10                     C                              4230359
99  37959287                 201820                201402                44786629                   E14                     C                              4230359