Overview

Dataset statistics

Number of variables: 7
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.2 KiB
Average record size in memory: 63.3 B

Variable types

Numeric: 3
Categorical: 4

Dataset

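Description: Observation records of diabetes patients, produced in OMOP CDM format
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/diabetes_observation_2020-omop-cdm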

Alerts

observation_type_concept_id has constant value "44814644" (Constant)
unit_source_value is highly overall correlated with value_as_number and 2 other fields (High correlation)
unit_concept_id is highly overall correlated with value_as_number and 2 other fields (High correlation)
observation_concept_id is highly overall correlated with value_as_number and 2 other fields (High correlation)
value_as_number is highly overall correlated with observation_concept_id and 2 other fields (High correlation)

Reproduction

Analysis started: 2023-10-08 18:55:37.006834
Analysis finished: 2023-10-08 18:55:46.963316
Duration: 9.96 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

observation_id
Real number (ℝ)

Distinct: 88
Distinct (%): 88.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8805297 × 10^15
Minimum: 1.6388867 × 10^11
Maximum: 2.458449 × 10^16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1.6388867 × 10^11
5-th percentile: 1.9236646 × 10^11
Q1: 2.9695147 × 10^11
Median: 4.1953879 × 10^11
Q3: 2.4566575 × 10^16
95-th percentile: 2.4577635 × 10^16
Maximum: 2.458449 × 10^16
Range: 2.4584326 × 10^16
Interquartile range (IQR): 2.4566278 × 10^16

Descriptive statistics

Standard deviation: 1.1088393 × 10^16
Coefficient of variation (CV): 1.611561
Kurtosis: -1.0311147
Mean: 6.8805297 × 10^15
Median Absolute Deviation (MAD): 1.6163166 × 10^11
Skewness: 0.99494495
Sum: 6.8805297 × 10^17
Variance: 1.2295247 × 10^32
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
24568190000200000  3      3.0%
24567340001120000  3      3.0%
24570190000100000  2      2.0%
24567400001060000  2      2.0%
24584490000020000  2      2.0%
24575940001180000  2      2.0%
24570840001370000  2      2.0%
24579060001110000  2      2.0%
24572750000540000  2      2.0%
24577560000980000  2      2.0%
Other values (78)  78     78.0%

Minimum 10 values

Value         Count  Frequency (%)
163888670001  1      1.0%
163888680004  1      1.0%
163888730001  1      1.0%
192366410002  1      1.0%
192366420001  1      1.0%
192366460001  1      1.0%
192393520001  1      1.0%
199363230001  1      1.0%
199363270001  1      1.0%
199363280001  1      1.0%

Maximum 10 values

Value              Count  Frequency (%)
24584490000020000  2      2.0%
24580530001380000  1      1.0%
24579060001110000  2      2.0%
24577560000980000  2      2.0%
24575950000040000  1      1.0%
24575940001180000  2      2.0%
24575410000840000  1      1.0%
24572750000540000  2      2.0%
24570840001370000  2      2.0%
24570190000100000  2      2.0%

observation_concept_id
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value    Count
4060831  27
4062019  27
4099154  23
4177340  23

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4060831
2nd row: 4062019
3rd row: 4099154
4th row: 4177340
5th row: 4060831

Common Values

Value    Count  Frequency (%)
4060831  27     27.0%
4062019  27     27.0%
4099154  23     23.0%
4177340  23     23.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
4060831  27     27.0%
4062019  27     27.0%
4099154  23     23.0%
4177340  23     23.0%

observation_date
Real number (ℝ)

Distinct: 26
Distinct (%): 26.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201568.27
Minimum: 201208
Maximum: 201912
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 201208
5-th percentile: 201303
Q1: 201405
Median: 201606
Q3: 201706
95-th percentile: 201911
Maximum: 201912
Range: 704
Interquartile range (IQR): 301

Descriptive statistics

Standard deviation: 202.54202
Coefficient of variation (CV): 0.0010048309
Kurtosis: -1.0953559
Mean: 201568.27
Median Absolute Deviation (MAD): 196.5
Skewness: 0.10014988
Sum: 20156827
Variance: 41023.27
Monotonicity: Not monotonic
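
The values of observation_date look like year-month codes (YYYYMM) that were profiled as plain numbers, so the reported range of 704 is a difference between codes rather than a length of time. The sketch below assumes that YYYYMM reading, which the report does not state, and uses a hypothetical helper name to convert the reported minimum and maximum into a span in months.

```python
def months_between(start_yyyymm: int, end_yyyymm: int) -> int:
    """Months from start to end, both given as YYYYMM integers (assumed encoding)."""
    start_year, start_month = divmod(start_yyyymm, 100)
    end_year, end_month = divmod(end_yyyymm, 100)
    return (end_year - start_year) * 12 + (end_month - start_month)


# Minimum and maximum as reported for observation_date.
numeric_range = 201912 - 201208  # the "Range" figure shown in the report
span_in_months = months_between(201208, 201912)
print(numeric_range, span_in_months)
```

Under the same assumption, the mean, standard deviation, and other moments of this column are statistics of the codes, not of calendar time.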
Histogram with fixed size bins (bins=26)

Common values

Value              Count  Frequency (%)
201403             8      8.0%
201606             7      7.0%
201607             5      5.0%
201412             4      4.0%
201912             4      4.0%
201208             4      4.0%
201801             4      4.0%
201911             4      4.0%
201509             4      4.0%
201702             4      4.0%
Other values (16)  52     52.0%

Minimum 10 values

Value   Count  Frequency (%)
201208  4      4.0%
201303  4      4.0%
201304  4      4.0%
201312  4      4.0%
201403  8      8.0%
201405  2      2.0%
201406  4      4.0%
201408  4      4.0%
201412  4      4.0%
201503  4      4.0%

Maximum 10 values

Value   Count  Frequency (%)
201912  4      4.0%
201911  4      4.0%
201901  2      2.0%
201811  4      4.0%
201804  2      2.0%
201801  4      4.0%
201711  2      2.0%
201710  1      1.0%
201706  4      4.0%
201705  4      4.0%

observation_type_concept_id
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value     Count
44814644  100

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 44814644
2nd row: 44814644
3rd row: 44814644
4th row: 44814644
5th row: 44814644

Common Values

Value     Count  Frequency (%)
44814644  100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
44814644  100    100.0%

value_as_number
Real number (ℝ)

HIGH CORRELATION 

Distinct: 67
Distinct (%): 67.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.51
Minimum: 43
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 43
5-th percentile: 51
Q1: 68
Median: 90.5
Q3: 144.25
95-th percentile: 169
Maximum: 180
Range: 137
Interquartile range (IQR): 76.25

Descriptive statistics

Standard deviation: 41.072858
Coefficient of variation (CV): 0.3930041
Kurtosis: -1.4160967
Mean: 104.51
Median Absolute Deviation (MAD): 32.5
Skewness: 0.28830259
Sum: 10451
Variance: 1686.9797
Monotonicity: Not monotonic
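
Several of the figures reported for value_as_number are derived from others, so they can be cross-checked with basic arithmetic. The sketch below assumes the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations). The report does not state its formulas, and the variable names are illustrative.

```python
# Cross-check derived statistics reported for value_as_number.
# Inputs are copied from the report; the formulas are assumed, not quoted from it.
n = 100
minimum, maximum = 43, 180
q1, q3 = 68, 144.25
mean, std = 104.51, 41.072858

value_range = maximum - minimum  # reported: 137
iqr = q3 - q1                    # reported: 76.25
cv = std / mean                  # reported: 0.3930041
variance = std ** 2              # reported: 1686.9797
total = mean * n                 # reported: 10451

print(value_range, iqr, round(cv, 7), round(variance, 4), round(total))
```

Because the inputs are themselves rounded, the recomputed values should be compared with a small tolerance rather than for exact equality.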
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
80                 7      7.0%
110                4      4.0%
156                3      3.0%
61                 3      3.0%
158                3      3.0%
130                3      3.0%
51                 3      3.0%
133                2      2.0%
74                 2      2.0%
169                2      2.0%
Other values (57)  68     68.0%

Minimum 10 values

Value  Count  Frequency (%)
43     1      1.0%
46     1      1.0%
50     1      1.0%
51     3      3.0%
53     1      1.0%
54     1      1.0%
56     2      2.0%
58     2      2.0%
60     2      2.0%
61     3      3.0%

Maximum 10 values

Value  Count  Frequency (%)
180    1      1.0%
178    1      1.0%
172    1      1.0%
170    1      1.0%
169    2      2.0%
164    2      2.0%
163    1      1.0%
162    1      1.0%
160    2      2.0%
159    1      1.0%

unit_concept_id
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
8876   54
9529   23
8582   23

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8876
2nd row: 8876
3rd row: 9529
4th row: 8582
5th row: 8876

Common Values

Value  Count  Frequency (%)
8876   54     54.0%
9529   23     23.0%
8582   23     23.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
8876   54     54.0%
9529   23     23.0%
8582   23     23.0%

unit_source_value
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
mmhg   54
KG     23
CM     23

Length

Max length: 4
Median length: 4
Mean length: 3.08
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mmhg
2nd row: mmhg
3rd row: KG
4th row: CM
5th row: mmhg

Common Values

Value  Count  Frequency (%)
mmhg   54     54.0%
KG     23     23.0%
CM     23     23.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
mmhg   54     54.0%
kg     23     23.0%
cm     23     23.0%

Interactions

[Figures: nine interaction plots]

Correlations

                        observation_id  observation_concept_id  observation_date  value_as_number  unit_concept_id  unit_source_value
observation_id          1.000           0.322                   0.166             0.000            0.141            0.141
observation_concept_id  0.322           1.000                   0.000             0.886            1.000            1.000
observation_date        0.166           0.000                   1.000             0.000            0.000            0.000
value_as_number         0.000           0.886                   0.000             1.000            0.844            0.844
unit_concept_id         0.141           1.000                   0.000             0.844            1.000            1.000
unit_source_value       0.141           1.000                   0.000             0.844            1.000            1.000

                        unit_source_value  unit_concept_id  observation_concept_id
unit_source_value       1.000              1.000            0.995
unit_concept_id         1.000              1.000            0.995
observation_concept_id  0.995              0.995            1.000

                        observation_id  observation_date  value_as_number  observation_concept_id  unit_concept_id  unit_source_value
observation_id          1.000           0.492             0.132            0.217                   0.237            0.237
observation_date        0.492           1.000             0.085            0.000                   0.000            0.000
value_as_number         0.132           0.085             1.000            0.732                   0.731            0.731
observation_concept_id  0.217           0.000             0.732            1.000                   0.995            0.995
unit_concept_id         0.237           0.000             0.731            0.995                   1.000            1.000
unit_source_value       0.237           0.000             0.731            0.995                   1.000            1.000
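
The 0.995 and 1.000 entries among the three categorical variables can be reproduced by hand. The sketch below computes a bias-corrected Cramér's V from contingency counts. It rests on two assumptions: that the report uses this measure for categorical pairs (the text does not say so), and that the pairing of concept IDs with units seen in the sample rows holds for all 100 rows, which is consistent with the reported counts (27 + 27 = 54, 23, 23). The function name is illustrative.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias correction applied to phi squared and to the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Rows: observation_concept_id 4060831, 4062019, 4099154, 4177340.
# Columns: unit_concept_id 8876, 9529, 8582 (assumed pairing, see above).
concept_vs_unit = [[27, 0, 0], [27, 0, 0], [0, 23, 0], [0, 0, 23]]

# Rows: unit_concept_id 8876, 9529, 8582. Columns: mmhg, KG, CM.
unit_vs_source = [[54, 0, 0], [0, 23, 0], [0, 0, 23]]

v_concept_unit = cramers_v_corrected(concept_vs_unit)
v_unit_source = cramers_v_corrected(unit_vs_source)
print(round(v_concept_unit, 3), round(v_unit_source, 3))
```

Under these assumptions, the value just below 1 for observation_concept_id comes from two concept IDs sharing one unit, while unit_concept_id and unit_source_value map one-to-one.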

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    observation_id     observation_concept_id  observation_date  observation_type_concept_id  value_as_number  unit_concept_id  unit_source_value
0   289546300003       4060831                 201412            44814644                     75               8876             mmhg
1   289546290003       4062019                 201412            44814644                     141              8876             mmhg
2   24570190000100000  4099154                 201412            44814644                     51               9529             KG
3   24570190000100000  4177340                 201412            44814644                     150              8582             CM
4   505140510002       4060831                 201811            44814644                     74               8876             mmhg
5   505140500001       4062019                 201811            44814644                     156              8876             mmhg
6   24584490000020000  4099154                 201811            44814644                     69               9529             KG
7   24584490000020000  4177340                 201811            44814644                     163              8582             CM
8   231778410001       4060831                 201312            44814644                     80               8876             mmhg
9   24566320001200000  4062019                 201312            44814644                     145              8876             mmhg

Last rows

    observation_id     observation_concept_id  observation_date  observation_type_concept_id  value_as_number  unit_concept_id  unit_source_value
90  365384910001       4099154                 201606            44814644                     46               9529             KG
91  24575410000840000  4177340                 201606            44814644                     148              8582             CM
92  163888680004       4060831                 201208            44814644                     80               8876             mmhg
93  163888670001       4062019                 201208            44814644                     130              8876             mmhg
94  163888730001       4099154                 201208            44814644                     61               9529             KG
95  24561540000210000  4177340                 201208            44814644                     164              8582             CM
96  248142950001       4060831                 201403            44814644                     60               8876             mmhg
97  248142940001       4062019                 201403            44814644                     110              8876             mmhg
98  24567400001060000  4099154                 201403            44814644                     54               9529             KG
99  24567400001060000  4177340                 201403            44814644                     158              8582             CM
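
The sample rows show the data in a long layout: each row carries one measurement, identified by observation_concept_id, together with its value and unit. As a sketch of how such rows could be regrouped, the code below pivots the first four sample rows into one record per observation_date. Grouping by date alone is an assumption made for illustration, since the table has no patient identifier, and the function name is hypothetical.

```python
from collections import defaultdict

# First four sample rows: (observation_id, observation_concept_id,
# observation_date, value_as_number, unit_source_value).
rows = [
    (289546300003, 4060831, 201412, 75, "mmhg"),
    (289546290003, 4062019, 201412, 141, "mmhg"),
    (24570190000100000, 4099154, 201412, 51, "KG"),
    (24570190000100000, 4177340, 201412, 150, "CM"),
]


def pivot_by_date(rows):
    """Group long-format rows into {date: {concept_id: (value, unit)}}."""
    wide = defaultdict(dict)
    for _observation_id, concept_id, date, value, unit in rows:
        wide[date][concept_id] = (value, unit)
    return dict(wide)


wide = pivot_by_date(rows)
print(wide[201412])
```

With real data, a patient or visit key would be needed in the grouping to keep measurements from different people apart.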