Overview

Dataset statistics

Number of variables: 8
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.1 KiB
Average record size in memory: 72.3 B

Variable types

Numeric: 4
Categorical: 4

Dataset

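Description: Laboratory test records of patients with alcohol use disorder, produced in the OMOP CDM format
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/alcohol_measurement_2020-omop-cdm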

Alerts

measurement_type_concept_id has constant value "44818702" [Constant]
unit_concept_id has high overall correlation with measurement_concept_id and 1 other field [High correlation]
unit_source_value has high overall correlation with measurement_concept_id and 1 other field [High correlation]
measurement_id has high overall correlation with measurement_date [High correlation]
measurement_concept_id has high overall correlation with value_as_number and 3 other fields [High correlation]
measurement_date has high overall correlation with measurement_id [High correlation]
value_as_number has high overall correlation with measurement_concept_id and 1 other field [High correlation]
operator_concept_id has high overall correlation with measurement_concept_id and 1 other field [High correlation]
unit_concept_id is highly imbalanced (50.0%) [Imbalance]
measurement_id has unique values [Unique]

Reproduction

Analysis started: 2023-10-08 18:56:19.630673
Analysis finished: 2023-10-08 18:56:24.270495
Duration: 4.64 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

measurement_id
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4703628 × 10^8
Minimum: 9700478
Maximum: 3.6978949 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 9700478
5th percentile: 71784313
Q1: 2.2355058 × 10^8
Median: 2.5157225 × 10^8
Q3: 3.0426758 × 10^8
95th percentile: 3.4843393 × 10^8
Maximum: 3.6978949 × 10^8
Range: 3.6008901 × 10^8
Interquartile range (IQR): 80716998

Descriptive statistics

Standard deviation: 79307109
Coefficient of variation (CV): 0.32103426
Kurtosis: 1.4336274
Mean: 2.4703628 × 10^8
Median absolute deviation (MAD): 37229415
Skewness: -1.1605554
Sum: 2.4703628 × 10^10
Variance: 6.2896176 × 10^15
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value              Count  Frequency (%)
332277538          1      1.0%
218740619          1      1.0%
226951215          1      1.0%
217935647          1      1.0%
326922842          1      1.0%
295746090          1      1.0%
369789490          1      1.0%
254632501          1      1.0%
244422773          1      1.0%
234703832          1      1.0%
Other values (90)  90     90.0%

Minimum 10 values
Value              Count  Frequency (%)
9700478            1      1.0%
17606445           1      1.0%
24338175           1      1.0%
28785807           1      1.0%
68307647           1      1.0%
71967295           1      1.0%
72813659           1      1.0%
86894848           1      1.0%
106971770          1      1.0%
144791498          1      1.0%

Maximum 10 values
Value              Count  Frequency (%)
369789490          1      1.0%
361389120          1      1.0%
360261430          1      1.0%
358088875          1      1.0%
349794397          1      1.0%
348362327          1      1.0%
345419988          1      1.0%
345100511          1      1.0%
341832176          1      1.0%
332277538          1      1.0%

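The derived rows in the quantile and descriptive statistics follow arithmetically from the primary ones: Range = Maximum - Minimum, IQR = Q3 - Q1, CV = standard deviation / mean, Variance = (standard deviation)^2, and Sum = mean × number of observations. The sketch below recomputes them for measurement_id as a consistency check. `PrimaryStats` and `derived_stats` are illustrative names, not part of ydata-profiling, and the sketch assumes Variance is the square of the reported (sample) standard deviation.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStats:
    """Primary summary statistics as printed in the report (hypothetical container)."""
    n: int
    minimum: float
    maximum: float
    q1: float
    q3: float
    mean: float
    std: float


def derived_stats(s: PrimaryStats) -> dict:
    """Recompute the report's derived rows from its primary rows."""
    return {
        "range": s.maximum - s.minimum,  # Maximum - Minimum
        "iqr": s.q3 - s.q1,              # Q3 - Q1
        "cv": s.std / s.mean,            # standard deviation / mean
        "variance": s.std ** 2,          # square of the standard deviation
        "sum": s.mean * s.n,             # mean times number of observations
    }


# Values copied from the measurement_id section. The report rounds the
# quartiles and mean to 8 significant figures, so IQR and CV match only
# approximately; Range is exact because Minimum and Maximum are exact.
measurement_id = PrimaryStats(
    n=100,
    minimum=9_700_478,
    maximum=369_789_490,
    q1=2.2355058e8,
    q3=3.0426758e8,
    mean=2.4703628e8,
    std=79_307_109,
)

if __name__ == "__main__":
    for name, value in derived_stats(measurement_id).items():
        print(f"{name}: {value:.8g}")
```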
measurement_concept_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 16
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3015101.6
Minimum: 3000905
Maximum: 3036887
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 3000905
5th percentile: 3000963
Q1: 3004501
Median: 3013650
Q3: 3023314
95th percentile: 3036887
Maximum: 3036887
Range: 35982
Interquartile range (IQR): 18813

Descriptive statistics

Standard deviation: 11751.374
Coefficient of variation (CV): 0.0038975052
Kurtosis: -0.98084115
Mean: 3015101.6
Median absolute deviation (MAD): 9149
Skewness: 0.54850166
Sum: 3.0151016 × 10^8
Variance: 1.3809479 × 10^8
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=16)]
Common values
Value              Count  Frequency (%)
3004501            22     22.0%
3000963            9      9.0%
3036887            9      9.0%
3023314            9      9.0%
3024929            7      7.0%
3013721            7      7.0%
3007070            6      6.0%
3022192            5      5.0%
3013650            4      4.0%
3006923            4      4.0%
Other values (6)   18     18.0%

Minimum 10 values
Value              Count  Frequency (%)
3000905            2      2.0%
3000963            9      9.0%
3004501            22     22.0%
3006923            4      4.0%
3007070            6      6.0%
3009966            4      4.0%
3010156            2      2.0%
3013650            4      4.0%
3013721            7      7.0%
3020416            2      2.0%

Maximum 10 values
Value              Count  Frequency (%)
3036887            9      9.0%
3035995            4      4.0%
3026910            4      4.0%
3024929            7      7.0%
3023314            9      9.0%
3022192            5      5.0%
3020416            2      2.0%
3013721            7      7.0%
3013650            4      4.0%
3010156            2      2.0%

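Together, the frequency tables for measurement_concept_id list all 16 distinct values with their counts (summing to 100), so the full column can be rebuilt and the reported statistics checked. The sketch below does this in plain Python. `quantile` interpolates linearly between order statistics, which is the default method of pandas `Series.quantile`. Assuming the profiler uses that method, it reproduces the reported 5th percentile, quartiles, 95th percentile, IQR, and MAD. The helper names are illustrative.

```python
import math

# Value counts of measurement_concept_id, copied from the frequency tables above.
CONCEPT_COUNTS = {
    3000905: 2, 3000963: 9, 3004501: 22, 3006923: 4,
    3007070: 6, 3009966: 4, 3010156: 2, 3013650: 4,
    3013721: 7, 3020416: 2, 3022192: 5, 3023314: 9,
    3024929: 7, 3026910: 4, 3035995: 4, 3036887: 9,
}


def expand(counts: dict) -> list:
    """Rebuild the sorted column from a value -> count mapping."""
    return sorted(v for v, c in counts.items() for _ in range(c))


def quantile(sorted_values: list, q: float) -> float:
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def median_absolute_deviation(sorted_values: list) -> float:
    """Median of the absolute deviations from the median."""
    med = quantile(sorted_values, 0.5)
    return quantile(sorted(abs(v - med) for v in sorted_values), 0.5)


values = expand(CONCEPT_COUNTS)
summary = {
    "n": len(values),
    "distinct": len(CONCEPT_COUNTS),
    "mean": sum(values) / len(values),
    "p05": quantile(values, 0.05),
    "q1": quantile(values, 0.25),
    "median": quantile(values, 0.50),
    "q3": quantile(values, 0.75),
    "p95": quantile(values, 0.95),
    "mad": median_absolute_deviation(values),
}

if __name__ == "__main__":
    for key, val in summary.items():
        print(key, val)
```

The exact mean of the rebuilt column is 3015101.58, which the report rounds to 3015101.6.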
measurement_date
Real number (ℝ)

HIGH CORRELATION 

Distinct: 66
Distinct (%): 66.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201466.74
Minimum: 201101
Maximum: 201912
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 201101
5th percentile: 201102.95
Q1: 201206.25
Median: 201407.5
Q3: 201801.25
95th percentile: 201906.05
Maximum: 201912
Range: 811
Interquartile range (IQR): 595

Descriptive statistics

Standard deviation: 291.0664
Coefficient of variation (CV): 0.0014447367
Kurtosis: -1.5353255
Mean: 201466.74
Median absolute deviation (MAD): 296.5
Skewness: 0.15500291
Sum: 20146674
Variance: 84719.649
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value              Count  Frequency (%)
201211             4      4.0%
201102             3      3.0%
201803             3      3.0%
201802             3      3.0%
201105             3      3.0%
201207             3      3.0%
201311             3      3.0%
201908             2      2.0%
201304             2      2.0%
201103             2      2.0%
Other values (56)  72     72.0%

Minimum 10 values
Value              Count  Frequency (%)
201101             2      2.0%
201102             3      3.0%
201103             2      2.0%
201105             3      3.0%
201106             2      2.0%
201107             1      1.0%
201108             2      2.0%
201109             1      1.0%
201110             1      1.0%
201111             2      2.0%

Maximum 10 values
Value              Count  Frequency (%)
201912             1      1.0%
201909             1      1.0%
201908             2      2.0%
201907             1      1.0%
201906             1      1.0%
201904             2      2.0%
201903             1      1.0%
201902             1      1.0%
201812             2      2.0%
201811             1      1.0%

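The measurement_date values run from 201101 to 201912, and their last two digits never exceed 12. They therefore appear to encode year and month as a YYYYMM integer rather than a calendar date, which would explain why the profiler treats the column as a real number. Under that assumption, the arithmetic summaries are not durations: Range (811) is not a count of months, and the observation window is 107 months (January 2011 to December 2019). The helper names below are illustrative.

```python
def parse_yyyymm(value: int) -> tuple:
    """Split a YYYYMM integer (e.g. 201101) into (year, month), validating the month."""
    year, month = divmod(value, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"{value} is not a valid YYYYMM value")
    return year, month


def month_index(value: int) -> int:
    """Months elapsed since January of year 0, so differences count calendar months."""
    year, month = parse_yyyymm(value)
    return year * 12 + (month - 1)


def months_between(start: int, end: int) -> int:
    """Calendar months from start to end, both given as YYYYMM integers."""
    return month_index(end) - month_index(start)


if __name__ == "__main__":
    # Minimum and maximum of measurement_date as reported above.
    print(months_between(201101, 201912))
```

With pandas, `pd.to_datetime(df["measurement_date"].astype(str), format="%Y%m")` performs the same parsing and maps each value to the first day of its month.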
measurement_type_concept_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 44818702
2nd row: 44818702
3rd row: 44818702
4th row: 44818702
5th row: 44818702

Common Values

Value     Count  Frequency (%)
44818702  100    100.0%

[Figure: histogram of category lengths]


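Because measurement_type_concept_id holds the single value 44818702 in all 100 rows, it cannot distinguish one record from another within this extract. A generic check for such columns, over rows held as dictionaries, might look like the sketch below; `constant_columns` and the three-row `SAMPLE` (a few columns copied from the Sample section) are illustrative. With pandas, `df.columns[df.nunique(dropna=False) == 1]` yields the same column list.

```python
def constant_columns(rows: list) -> dict:
    """Return {column: value} for every column that holds exactly one distinct value."""
    if not rows:
        return {}
    constants = {}
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) == 1:
            constants[column] = distinct.pop()
    return constants


# Three rows from the Sample section, reduced to three columns (illustrative).
SAMPLE = [
    {"measurement_id": 332277538, "measurement_type_concept_id": "44818702", "unit_concept_id": "0"},
    {"measurement_id": 71967295, "measurement_type_concept_id": "44818702", "unit_concept_id": "8554"},
    {"measurement_id": 231453304, "measurement_type_concept_id": "44818702", "unit_concept_id": "0"},
]

if __name__ == "__main__":
    print(constant_columns(SAMPLE))
```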
operator_concept_id
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 7
Median length: 4
Mean length: 5.17
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 4171756
3rd row: 4172704
4th row: <NA>
5th row: <NA>

Common Values

Value    Count  Frequency (%)
<NA>     61     61.0%
4172704  24     24.0%
4171756  15     15.0%

[Figure: histogram of category lengths]


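Although operator_concept_id reports Missing: 0, 61 of its values are the text <NA>. The length statistics suggest these are stored as four-character strings rather than true nulls: (61 × 4 + 39 × 7) / 100 = 5.17, the reported mean length. The sketch below rebuilds the column from its Common Values table and converts such placeholders to real missing values. `PLACEHOLDERS` is an assumed list of spellings to treat as missing, and the helper name is illustrative. When loading the raw file with pandas, passing `na_values=["<NA>"]` to `pandas.read_csv` achieves a similar effect.

```python
from statistics import mean

PLACEHOLDERS = {"", "<NA>"}  # assumed placeholder spellings; extend as needed


def normalize_missing(values: list) -> list:
    """Return a copy of `values` with placeholder strings (after stripping) replaced by None."""
    return [None if v is None or v.strip() in PLACEHOLDERS else v for v in values]


# operator_concept_id rebuilt from its Common Values table: 61 / 24 / 15 rows.
operator_raw = ["<NA>"] * 61 + ["4172704"] * 24 + ["4171756"] * 15

# Treated as text, "<NA>" has length 4 and each concept ID length 7,
# which reproduces the reported mean length of 5.17.
raw_mean_length = mean(len(v) for v in operator_raw)

operator_clean = normalize_missing(operator_raw)
missing_count = sum(v is None for v in operator_clean)

if __name__ == "__main__":
    print(raw_mean_length, missing_count)
```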
value_as_number
Real number (ℝ)

HIGH CORRELATION 

Distinct: 73
Distinct (%): 73.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.27
Minimum: 0
Maximum: 670
Zeros: 1
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 14
Median: 42
Q3: 104.75
95th percentile: 243.25
Maximum: 670
Range: 670
Interquartile range (IQR): 90.75

Descriptive statistics

Standard deviation: 110.96117
Coefficient of variation (CV): 1.3653398
Kurtosis: 12.505813
Mean: 81.27
Median absolute deviation (MAD): 33
Skewness: 3.0434917
Sum: 8127
Variance: 12312.381
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value              Count  Frequency (%)
1                  7      7.0%
14                 5      5.0%
3                  4      4.0%
43                 3      3.0%
47                 3      3.0%
28                 2      2.0%
41                 2      2.0%
11                 2      2.0%
141                2      2.0%
15                 2      2.0%
Other values (63)  68     68.0%

Minimum 10 values
Value              Count  Frequency (%)
0                  1      1.0%
1                  7      7.0%
2                  1      1.0%
3                  4      4.0%
5                  1      1.0%
7                  1      1.0%
8                  1      1.0%
9                  1      1.0%
10                 2      2.0%
11                 2      2.0%

Maximum 10 values
Value              Count  Frequency (%)
670                1      1.0%
631                1      1.0%
285                1      1.0%
262                1      1.0%
248                1      1.0%
243                1      1.0%
241                1      1.0%
239                1      1.0%
234                1      1.0%
229                1      1.0%

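value_as_number pools results from 16 different tests (measurement_concept_id) recorded in different units (unit_source_value). This plausibly explains both its high correlation with measurement_concept_id and its heavy right tail (skewness 3.04, kurtosis 12.5), so summaries are easier to interpret per test. The sketch below groups the 20 rows shown in the Sample section by concept and takes each group's median. It uses only those sample rows, not the full 100, and the helper name is illustrative. With pandas, `df.groupby("measurement_concept_id")["value_as_number"].median()` is the equivalent.

```python
from collections import defaultdict
from statistics import median

# (measurement_concept_id, value_as_number) for the 20 rows in the Sample section.
SAMPLE_PAIRS = [
    (3000963, 14), (3023314, 31), (3004501, 262), (3006923, 23),
    (3013650, 3), (3036887, 95), (3024929, 157), (3000905, 8),
    (3006923, 51), (3024929, 234), (3013650, 3), (3024929, 285),
    (3004501, 87), (3023314, 28), (3004501, 163), (3000963, 14),
    (3007070, 47), (3023314, 39), (3009966, 75), (3006923, 71),
]


def median_by_concept(pairs: list) -> dict:
    """Group values by measurement_concept_id and return each group's median."""
    groups = defaultdict(list)
    for concept_id, value in pairs:
        groups[concept_id].append(value)
    return {concept_id: median(values) for concept_id, values in sorted(groups.items())}


if __name__ == "__main__":
    print("pooled median:", median(value for _, value in SAMPLE_PAIRS))
    for concept_id, med in median_by_concept(SAMPLE_PAIRS).items():
        print(concept_id, med)
```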
unit_concept_id
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 1
Mean length: 1.33
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 8554
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      89     89.0%
8554   11     11.0%

[Figure: histogram of category lengths]


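The Imbalance alert on unit_concept_id can be reproduced with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy in bits of the value distribution and k is the number of distinct values. A score of 0 means perfectly balanced classes and a score near 1 means almost every row falls in one class. It is an assumption that this is exactly the formula ydata-profiling uses, but applied to the 89 / 11 split it gives about 0.500, matching the reported 50.0%. The function name is illustrative.

```python
import math


def imbalance_score(counts: list) -> float:
    """1 - H / log2(k), with H the Shannon entropy (bits) of the counts and k = len(counts)."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


if __name__ == "__main__":
    # unit_concept_id: 89 rows with 0 and 11 rows with 8554.
    print(f"{imbalance_score([89, 11]):.1%}")
```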
unit_source_value
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 7
Median length: 3
Mean length: 3.15
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: g/㎗
2nd row: %
3rd row: ㎎/㎗
4th row: U/ℓ
5th row: 10^9/L

Common Values

Value    Count  Frequency (%)
㎎/㎗     29     29.0%
U/ℓ      19     19.0%
10^9/L   14     14.0%
%        11     11.0%
g/㎗     9      9.0%
(blank)  9      9.0%
<NA>     5      5.0%
㎎/ℓ     2      2.0%
10^12/L  2      2.0%

[Figure: histogram of category lengths]


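In this extract unit_concept_id appears to be 8554 only for the 11 rows whose unit_source_value is %, and 0 everywhere else. In the OMOP CDM, concept 0 means that no matching concept was found, so the remaining source units seem to be unmapped. Several of those units use Unicode compatibility characters (㎎, ㎗, ℓ), and Unicode NFKC normalization folds them to plain ASCII (mg, dl, l), a reasonable first step before mapping them to standard unit concepts. The sketch below applies this to the Common Values table. The unlabeled entry is assumed to be a single space, because a one-character value is what makes the reported mean length of 3.15 add up. The helper names are illustrative.

```python
import unicodedata
from collections import Counter

# unit_source_value counts from the Common Values table above.
UNIT_COUNTS = {
    "\u338e/\u3397": 29,  # ㎎/㎗
    "U/\u2113": 19,       # U/ℓ
    "10^9/L": 14,
    "%": 11,
    "g/\u3397": 9,        # g/㎗
    " ": 9,               # shown blank in the report; assumed to be one space
    "<NA>": 5,
    "\u338e/\u2113": 2,   # ㎎/ℓ
    "10^12/L": 2,
}


def normalize_unit(raw: str):
    """NFKC-fold compatibility characters (㎎ -> mg, ㎗ -> dl, ℓ -> l); map blanks and <NA> to None."""
    text = unicodedata.normalize("NFKC", raw).strip()
    return None if text in {"", "<NA>"} else text


def normalized_counts(counts: dict) -> Counter:
    """Merge raw counts under their normalized spelling (None collects missing units)."""
    merged = Counter()
    for raw, n in counts.items():
        merged[normalize_unit(raw)] += n
    return merged


if __name__ == "__main__":
    for unit, n in normalized_counts(UNIT_COUNTS).most_common():
        print(unit, n)
```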
Interactions

[Figure: pairwise interaction plots for the four numeric variables (4 × 4 grid)]

Correlations

[Figure: correlation heatmap]
                        measurement_id  measurement_concept_id  measurement_date  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
measurement_id          1.000           0.137                   0.778             0.201                0.000            0.000            0.000
measurement_concept_id  0.137           1.000                   0.163             0.758                0.599            0.776            0.903
measurement_date        0.778           0.163                   1.000             0.000                0.328            0.000            0.000
operator_concept_id     0.201           0.758                   0.000             1.000                0.447            0.111            0.635
value_as_number         0.000           0.599                   0.328             0.447                1.000            0.000            0.371
unit_concept_id         0.000           0.776                   0.000             0.111                0.000            1.000            1.000
unit_source_value       0.000           0.903                   0.000             0.635                0.371            1.000            1.000
[Figure: correlation heatmap (categorical variables only)]
                     unit_concept_id  unit_source_value  operator_concept_id
unit_concept_id      1.000            0.967              0.065
unit_source_value    0.967            1.000              0.433
operator_concept_id  0.065            0.433              1.000
[Figure: correlation heatmap]
                        measurement_id  measurement_concept_id  measurement_date  value_as_number  operator_concept_id  unit_concept_id  unit_source_value
measurement_id          1.000           0.029                   0.561             0.163            0.000                0.000            0.062
measurement_concept_id  0.029           1.000                   -0.175            0.605            0.531                0.582            0.518
measurement_date        0.561           -0.175                  1.000             -0.062           0.000                0.000            0.000
value_as_number         0.163           0.605                   -0.062            1.000            0.519                0.000            0.212
operator_concept_id     0.000           0.531                   0.000             0.519            1.000                0.065            0.433
unit_concept_id         0.000           0.582                   0.000             0.000            0.065                1.000            0.967
unit_source_value       0.062           0.518                   0.000             0.212            0.433                0.967            1.000

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion]

Sample

First rows
    measurement_id  measurement_concept_id  measurement_date  measurement_type_concept_id  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
0   332277538       3000963                 201907            44818702                     <NA>                 14               0                g/㎗
1   71967295        3023314                 201105            44818702                     4171756              31               8554             %
2   231453304       3004501                 201204            44818702                     4172704              262              0                ㎎/㎗
3   261715506       3006923                 201409            44818702                     <NA>                 23               0                U/ℓ
4   221858571       3013650                 201106            44818702                     <NA>                 3                0                10^9/L
5   311802146       3036887                 201803            44818702                     <NA>                 95               0                ㎎/㎗
6   228468649       3024929                 201201            44818702                     <NA>                 157              0                10^9/L
7   173245334       3000905                 201812            44818702                     <NA>                 8                0                10^9/L
8   161799588       3006923                 201802            44818702                     4172704              51               0                U/ℓ
9   278150542       3024929                 201512            44818702                     <NA>                 234              0                10^9/L
Last rows
    measurement_id  measurement_concept_id  measurement_date  measurement_type_concept_id  operator_concept_id  value_as_number  unit_concept_id  unit_source_value
90  86894848        3013650                 201207            44818702                     <NA>                 3                0                10^9/L
91  251761423       3024929                 201311            44818702                     <NA>                 285              0                10^9/L
92  239798462       3004501                 201212            44818702                     <NA>                 87               8554             %
93  156432283       3023314                 201710            44818702                     4171756              28               8554             %
94  235445658       3004501                 201207            44818702                     4172704              163              0                ㎎/㎗
95  311293427       3000963                 201803            44818702                     <NA>                 14               0                g/㎗
96  251358585       3007070                 201311            44818702                     4171756              47               0                ㎎/㎗
97  222655663       3023314                 201107            44818702                     <NA>                 39               8554             %
98  361389120       3009966                 201909            44818702                     <NA>                 75               0                ㎎/㎗
99  224493643       3006923                 201109            44818702                     4172704              71               0                U/ℓ