Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 35.3 B
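
As a rough illustration of how the dataset-level counts above can be derived, the sketch below computes missing cells and duplicate rows for a small table. The `overview` helper and the three example rows are hypothetical, and counting each repeat of an earlier row as one duplicate is an assumed definition; the profiling tool may count duplicates differently.

```python
from collections import Counter


def overview(rows, n_columns):
    """Return dataset-level counts for a list of equal-length row tuples."""
    n_rows = len(rows)
    total_cells = n_rows * n_columns
    # A cell is treated as missing when it holds None.
    missing_cells = sum(value is None for row in rows for value in row)
    # Assumed definition: every repeat of an earlier row counts once.
    duplicate_rows = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_columns,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100.0 * missing_cells / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100.0 * duplicate_rows / n_rows if n_rows else 0.0,
    }


# Hypothetical rows: the third repeats the first, and one cell is missing.
rows = [(1, "K769", 2016), (2, None, 2015), (1, "K769", 2016)]
summary = overview(rows, n_columns=3)
print(summary)
```

For the profiled dataset both counts are 0, which is consistent with RID being unique in every row.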

Variable types

Numeric: 2
Text: 2

Dataset

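Description: Contains the diagnosis names and diagnosis codes of various comorbid conditions in patients with alcohol use disorder, together with the date of first diagnosis and diagnosis data. The main comorbidities include diseases of the digestive system, mental and behavioural disorders, endocrine diseases, and diseases of the circulatory system. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes.
Author: The Catholic University of Korea, Eunpyeong St. Mary's Hospital (가톨릭대학교 은평성모병원)
URL: http://cmcdata.net/data/dataset/coexistence-disease-data-alcohol-use-disorder-eunpyeong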

Alerts

RID has unique values (Unique)

Reproduction

Analysis started: 2023-10-08 18:55:38.290807
Analysis finished: 2023-10-08 18:55:45.929273
Duration: 7.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
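
The reported duration follows directly from the two timestamps. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-10-08 18:55:38.290807", FORMAT)
finished = datetime.strptime("2023-10-08 18:55:45.929273", FORMAT)

# Difference between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # 7.64 seconds
```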

Variables

RID
Real number (ℝ)

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5
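
The quantiles above are consistent with linear interpolation between neighbouring sorted values. The sketch below assumes that RID is exactly the integers 1 to 100 (which matches the reported minimum, maximum, distinct count and sum) and uses a hypothetical `percentile` helper; it is not the profiling tool's own code.

```python
def percentile(sorted_values, q):
    """Linearly interpolated quantile for q in [0, 1] on pre-sorted data."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


rid = list(range(1, 101))  # assumption: RID runs from 1 to 100 with no gaps

p5, q1, median, q3, p95 = (percentile(rid, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = rid[-1] - rid[0]

print(round(p5, 2), q1, median, q3, round(p95, 2), iqr, value_range)
```

For example, the first quartile sits at position 0.25 × 99 = 24.75 in the zero-based sorted list, three quarters of the way from 25 to 26, which gives 25.75.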

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
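
Several of these figures can be checked with the Python standard library. The sketch assumes RID is the integers 1 to 100 and that the variance uses the sample (n - 1) denominator, which is what the reported value 841.66667 implies.

```python
import statistics

rid = list(range(1, 101))  # assumption: RID runs from 1 to 100 with no gaps

total = sum(rid)
mean = statistics.fmean(rid)
variance = statistics.variance(rid)  # sample variance, n - 1 denominator
std = statistics.stdev(rid)
cv = std / mean  # coefficient of variation
median = statistics.median(rid)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in rid)
strictly_increasing = all(a < b for a, b in zip(rid, rid[1:]))

print(total, mean, round(variance, 5), round(std, 6), round(cv, 8), mad, strictly_increasing)
```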
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value              Count  Frequency (%)
1                  1      1.0%
65                 1      1.0%
75                 1      1.0%
74                 1      1.0%
73                 1      1.0%
72                 1      1.0%
71                 1      1.0%
70                 1      1.0%
69                 1      1.0%
68                 1      1.0%
Other values (90)  90     90.0%

Minimum 10 values
Value  Count  Frequency (%)
1      1      1.0%
2      1      1.0%
3      1      1.0%
4      1      1.0%
5      1      1.0%
6      1      1.0%
7      1      1.0%
8      1      1.0%
9      1      1.0%
10     1      1.0%

Maximum 10 values
Value  Count  Frequency (%)
100    1      1.0%
99     1      1.0%
98     1      1.0%
97     1      1.0%
96     1      1.0%
95     1      1.0%
94     1      1.0%
93     1      1.0%
92     1      1.0%
91     1      1.0%

CODIAG_GRP1
Text

Distinct: 66
Distinct (%): 66.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 71
Median length: 47.5
Mean length: 32.52
Min length: 15

Characters and Unicode

Total characters: 3252
Distinct characters: 181
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
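
As an illustration of these character properties, the sketch below counts Unicode general categories for the first sample value of this variable using the standard `unicodedata` module. The `category_counts` helper and the mapping from two-letter category codes to the long names used in this report are assumptions made for the example.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count characters per Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


first_row = "혈액 및 조혈기관의 질환-빈혈-영양성 빈혈"
counts = category_counts(first_row)
print(len(first_row), counts)
```

Hangul syllables fall under Other Letter, the hyphen under Dash Punctuation, and the comma under Other Punctuation, which is why those categories dominate the tables for this variable.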

Unique

Unique: 48
Unique (%): 48.0%

Sample

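1st row: 혈액 및 조혈기관의 질환-빈혈-영양성 빈혈 (Diseases of the blood and blood-forming organs - Anaemia - Nutritional anaemia)
2nd row: 내분비질환, 영양 및 대사-대사장애-수분, 전해질 및 산-염기균형의 기타 장애 (Endocrine, nutritional and metabolic diseases - Metabolic disorders - Other disorders of fluid, electrolyte and acid-base balance)
3rd row: 소화계통의 질환-식도, 위 및 십이지장의 질환-위염 및 십이지장염 (Diseases of the digestive system - Diseases of the oesophagus, stomach and duodenum - Gastritis and duodenitis)
4th row: 정신 및 행동장애-기타-기타 (Mental and behavioural disorders - Other - Other)
5th row: 소화계통의 질환-간의 질환-간의 기타질환 (Diseases of the digestive system - Diseases of the liver - Other diseases of the liver)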
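Value | Count | Frequency (%)
[value lost] | 138 | 19.8%
증상 (symptoms) | 44 | 6.3%
검사의 (of laboratory tests) | 25 | 3.6%
임상 (clinical) | 25 | 3.6%
징후와 (signs and) | 25 | 3.6%
소화계통의 (of the digestive system) | 17 | 2.4%
질환-간의 (diseases - of the liver) | 17 | 2.4%
정신 (mental) | 14 | 2.0%
순환계통의 (of the circulatory system) | 14 | 2.0%
기타 (other) | 13 | 1.9%
Other values (179) | 366 | 52.4%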

Most occurring characters

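Value                  Count  Frequency (%)
(space)                598    18.4%
-                      206    6.3%
[Hangul, glyph lost]   155    4.8%
[Hangul, glyph lost]   138    4.2%
[Hangul, glyph lost]   126    3.9%
[Hangul, glyph lost]   118    3.6%
[Hangul, glyph lost]   100    3.1%
[Hangul, glyph lost]   79     2.4%
[Hangul, glyph lost]   79     2.4%
[Hangul, glyph lost]   77     2.4%
Other values (171)     1576   48.5%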

Most occurring categories

Value              Count  Frequency (%)
Other Letter       2396   73.7%
Space Separator    598    18.4%
Dash Punctuation   206    6.3%
Other Punctuation  51     1.6%
Uppercase Letter   1      < 0.1%

Most frequent character per category

Other Letter
Value                  Count  Frequency (%)
[Hangul, glyph lost]   155    6.5%
[Hangul, glyph lost]   138    5.8%
[Hangul, glyph lost]   126    5.3%
[Hangul, glyph lost]   118    4.9%
[Hangul, glyph lost]   100    4.2%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   77     3.2%
[Hangul, glyph lost]   69     2.9%
[Hangul, glyph lost]   55     2.3%
Other values (167)     1400   58.4%

Space Separator
Value    Count  Frequency (%)
(space)  598    100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      206    100.0%

Other Punctuation
Value  Count  Frequency (%)
,      51     100.0%

Uppercase Letter
Value  Count  Frequency (%)
B      1      100.0%

Most occurring scripts

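Value   Count  Frequency (%)
Hangul  2396   73.7%
Common  855    26.3%
Latin   1      < 0.1%

Most frequent character per script

Hangul
Value                  Count  Frequency (%)
[Hangul, glyph lost]   155    6.5%
[Hangul, glyph lost]   138    5.8%
[Hangul, glyph lost]   126    5.3%
[Hangul, glyph lost]   118    4.9%
[Hangul, glyph lost]   100    4.2%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   77     3.2%
[Hangul, glyph lost]   69     2.9%
[Hangul, glyph lost]   55     2.3%
Other values (167)     1400   58.4%

Common
Value    Count  Frequency (%)
(space)  598    69.9%
-        206    24.1%
,        51     6.0%

Latin
Value  Count  Frequency (%)
B      1      100.0%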

Most occurring blocks

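Value   Count  Frequency (%)
Hangul  2396   73.7%
ASCII   856    26.3%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)  598    69.9%
-        206    24.1%
,        51     6.0%
B        1      0.1%

Hangul
Value                  Count  Frequency (%)
[Hangul, glyph lost]   155    6.5%
[Hangul, glyph lost]   138    5.8%
[Hangul, glyph lost]   126    5.3%
[Hangul, glyph lost]   118    4.9%
[Hangul, glyph lost]   100    4.2%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   79     3.3%
[Hangul, glyph lost]   77     3.2%
[Hangul, glyph lost]   69     2.9%
[Hangul, glyph lost]   55     2.3%
Other values (167)     1400   58.4%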

CODIAG_CD
Text

Distinct: 80
Distinct (%): 80.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 4
Mean length: 4.01
Min length: 3
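
The length figures are plain summary statistics over the string lengths. The sketch below applies the same calculation to the five codes listed under Sample for this variable, so its results describe only those five values and not the full column.

```python
import statistics

# The five codes shown under Sample for this variable.
codes = ["D509", "E871", "K294", "F99", "K769"]
lengths = [len(code) for code in codes]

summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.fmean(lengths),
    "min": min(lengths),
    "total_characters": sum(lengths),
}
print(summary)
```

On the full column the figures are mutually consistent: a mean length of 4.01 over 100 observations gives the 401 total characters reported for this variable.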

Characters and Unicode

Total characters: 401
Distinct characters: 26
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 68
Unique (%): 68.0%
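
Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A small sketch of the difference, using only the 20 codes visible in the Sample tables at the end of this report (so the results differ from the full-column figures of 80 and 68):

```python
from collections import Counter

# Codes from the first ten and last ten rows shown under Sample.
codes = [
    "D509", "E871", "K294", "F99", "K769", "F328", "I109", "I471", "S0650", "R074",
    "R074", "N400", "I109", "F419", "L84", "K769", "E785", "K297", "K7469", "E860",
]

counts = Counter(codes)
n_distinct = len(counts)  # number of different values
n_unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(n_distinct, n_unique)  # 17 14
```

Here K769, I109 and R074 each appear twice, so they count towards Distinct but not towards Unique.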

Sample

1st row: D509
2nd row: E871
3rd row: K294
4th row: F99
5th row: K769

Value              Count  Frequency (%)
k769               5      5.0%
f329               5      5.0%
i109               3      3.0%
r074               3      3.0%
k291               2      2.0%
i638               2      2.0%
r1012              2      2.0%
r251               2      2.0%
r51                2      2.0%
e785               2      2.0%
Other values (70)  72     72.0%

Most occurring characters

Value              Count  Frequency (%)
0                  54     13.5%
9                  46     11.5%
1                  42     10.5%
2                  29     7.2%
4                  26     6.5%
7                  26     6.5%
R                  25     6.2%
8                  21     5.2%
6                  21     5.2%
5                  19     4.7%
Other values (16)  92     22.9%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    301    75.1%
Uppercase Letter  100    24.9%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
R                 25     25.0%
K                 17     17.0%
I                 14     14.0%
F                 13     13.0%
E                 8      8.0%
S                 5      5.0%
N                 4      4.0%
M                 3      3.0%
H                 2      2.0%
J                 2      2.0%
Other values (6)  7      7.0%

Decimal Number
Value  Count  Frequency (%)
0      54     17.9%
9      46     15.3%
1      42     14.0%
2      29     9.6%
4      26     8.6%
7      26     8.6%
8      21     7.0%
6      21     7.0%
5      19     6.3%
3      17     5.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  301    75.1%
Latin   100    24.9%

Most frequent character per script

Latin
Value             Count  Frequency (%)
R                 25     25.0%
K                 17     17.0%
I                 14     14.0%
F                 13     13.0%
E                 8      8.0%
S                 5      5.0%
N                 4      4.0%
M                 3      3.0%
H                 2      2.0%
J                 2      2.0%
Other values (6)  7      7.0%

Common
Value  Count  Frequency (%)
0      54     17.9%
9      46     15.3%
1      42     14.0%
2      29     9.6%
4      26     8.6%
7      26     8.6%
8      21     7.0%
6      21     7.0%
5      19     6.3%
3      17     5.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  401    100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
0                  54     13.5%
9                  46     11.5%
1                  42     10.5%
2                  29     7.2%
4                  26     6.5%
7                  26     6.5%
R                  25     6.2%
8                  21     5.2%
6                  21     5.2%
5                  19     4.7%
Other values (16)  92     22.9%

DIAG_1ST_DO
Real number (ℝ)

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2017.31
Minimum: 2015
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 2015
5th percentile: 2015
Q1: 2015
Median: 2018
Q3: 2019
95th percentile: 2020
Maximum: 2020
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.8460714
Coefficient of variation (CV): 0.0009151154
Kurtosis: -1.660724
Mean: 2017.31
Median Absolute Deviation (MAD): 1
Skewness: -0.18308288
Sum: 201731
Variance: 3.4079798
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]

Common values
Value  Count  Frequency (%)
2019   37     37.0%
2015   32     32.0%
2018   10     10.0%
2017   8      8.0%
2016   7      7.0%
2020   6      6.0%

Minimum 10 values
Value  Count  Frequency (%)
2015   32     32.0%
2016   7      7.0%
2017   8      8.0%
2018   10     10.0%
2019   37     37.0%
2020   6      6.0%

Maximum 10 values
Value  Count  Frequency (%)
2020   6      6.0%
2019   37     37.0%
2018   10     10.0%
2017   8      8.0%
2016   7      7.0%
2015   32     32.0%
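
Because DIAG_1ST_DO takes only six values, the whole column can be rebuilt from the value counts above (ignoring row order), and the summary statistics reported for this variable can be checked against it. A minimal sketch using the standard library, assuming the variance uses the sample (n - 1) denominator:

```python
import statistics

# Value counts for DIAG_1ST_DO as listed in the frequency tables.
year_counts = {2015: 32, 2016: 7, 2017: 8, 2018: 10, 2019: 37, 2020: 6}

# Expand the counts into a sorted list of 100 observations.
years = [year for year, n in sorted(year_counts.items()) for _ in range(n)]

total = sum(years)
mean = statistics.fmean(years)
median = statistics.median(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = statistics.stdev(years)

print(len(years), total, round(mean, 2), median, round(variance, 7), round(std, 7))
```

The results agree with the reported sum (201731), mean (2017.31), median (2018), variance (3.4079798) and standard deviation (1.8460714).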

Interactions

[Figures: 4 interaction plots]

Correlations

             RID    CODIAG_GRP1  CODIAG_CD  DIAG_1ST_DO
RID          1.000  0.000        0.368      0.000
CODIAG_GRP1  0.000  1.000        1.000      0.635
CODIAG_CD    0.368  1.000        1.000      0.719
DIAG_1ST_DO  0.000  0.635        0.719      1.000

             RID    DIAG_1ST_DO
RID          1.000  0.057
DIAG_1ST_DO  0.057  1.000
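
The report does not say which coefficient the numeric-only matrix uses. As one possibility, the sketch below computes a Pearson correlation between RID and DIAG_1ST_DO with a hypothetical `pearson` helper. It uses only the 20 rows visible under Sample, so its result is not expected to match the 0.057 reported for all 100 rows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# RID and DIAG_1ST_DO from the first ten and last ten rows shown under Sample.
rid = list(range(1, 11)) + list(range(91, 101))
year = [2015, 2020, 2015, 2016, 2016, 2015, 2015, 2015, 2019, 2019,
        2019, 2016, 2015, 2019, 2019, 2015, 2018, 2016, 2020, 2018]

r = pearson(rid, year)
print(round(r, 3))
```

A value near 0 would be expected when RID is an arbitrary row identifier with no relation to the year of first diagnosis.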

Missing values

[Figure: count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.

Sample

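First rows
Index | RID | CODIAG_GRP1 | CODIAG_CD | DIAG_1ST_DO
0 | 1 | 혈액 및 조혈기관의 질환-빈혈-영양성 빈혈 (Diseases of the blood and blood-forming organs - Anaemia - Nutritional anaemia) | D509 | 2015
1 | 2 | 내분비질환, 영양 및 대사-대사장애-수분, 전해질 및 산-염기균형의 기타 장애 (Endocrine, nutritional and metabolic diseases - Metabolic disorders - Other disorders of fluid, electrolyte and acid-base balance) | E871 | 2020
2 | 3 | 소화계통의 질환-식도, 위 및 십이지장의 질환-위염 및 십이지장염 (Diseases of the digestive system - Diseases of the oesophagus, stomach and duodenum - Gastritis and duodenitis) | K294 | 2015
3 | 4 | 정신 및 행동장애-기타-기타 (Mental and behavioural disorders - Other - Other) | F99 | 2016
4 | 5 | 소화계통의 질환-간의 질환-간의 기타질환 (Diseases of the digestive system - Diseases of the liver - Other diseases of the liver) | K769 | 2016
5 | 6 | 정신 및 행동장애-기분장애-우울장애 (Mental and behavioural disorders - Mood disorders - Depressive disorders) | F328 | 2015
6 | 7 | 순환계통의 질환-고혈압성 질환-고혈압 (Diseases of the circulatory system - Hypertensive diseases - Hypertension) | I109 | 2015
7 | 8 | 순환계통의 질환-기타 형태의 심장병-발작성 빈맥 (Diseases of the circulatory system - Other forms of heart disease - Paroxysmal tachycardia) | I471 | 2015
8 | 9 | 손상, 중독 및 외인에 의한 특정 기타 결과-머리의 손상-두개내 손상 (Injury, poisoning and certain other consequences of external causes - Injuries to the head - Intracranial injury) | S0650 | 2019
9 | 10 | 증상, 징후와 임상 및 검사의 이상소견-순환계통 및 호흡계통의 증상 및 징후-목구멍 및 가슴의 통증 (Symptoms, signs and abnormal clinical and laboratory findings - Symptoms and signs involving the circulatory and respiratory systems - Pain in throat and chest) | R074 | 2019

Last rows
Index | RID | CODIAG_GRP1 | CODIAG_CD | DIAG_1ST_DO
90 | 91 | 증상, 징후와 임상 및 검사의 이상소견-순환계통 및 호흡계통의 증상 및 징후-목구멍 및 가슴의 통증 (Symptoms, signs and abnormal clinical and laboratory findings - Symptoms and signs involving the circulatory and respiratory systems - Pain in throat and chest) | R074 | 2019
91 | 92 | 비뇨생식계통의 질환-남성생식기관의 질환-전림선증식증 (Diseases of the genitourinary system - Diseases of male genital organs - Hyperplasia of prostate) | N400 | 2016
92 | 93 | 순환계통의 질환-고혈압성 질환-고혈압 (Diseases of the circulatory system - Hypertensive diseases - Hypertension) | I109 | 2015
93 | 94 | 정신 및 행동장애-불안장애-기타 불안장애 (Mental and behavioural disorders - Anxiety disorders - Other anxiety disorders) | F419 | 2019
94 | 95 | 피부 및 피하조직의 질환-피부 및 피하조직의 기타 장애-피부 및 피하조직의 기타 장애 (Diseases of the skin and subcutaneous tissue - Other disorders of the skin and subcutaneous tissue - Other disorders of the skin and subcutaneous tissue) | L84 | 2019
95 | 96 | 소화계통의 질환-간의 질환-간의 기타질환 (Diseases of the digestive system - Diseases of the liver - Other diseases of the liver) | K769 | 2015
96 | 97 | 내분비질환, 영양 및 대사-고지혈증-고지혈증 (Endocrine, nutritional and metabolic diseases - Hyperlipidaemia - Hyperlipidaemia) | E785 | 2018
97 | 98 | 소화계통의 질환-식도, 위 및 십이지장의 질환-위염 및 십이지장염 (Diseases of the digestive system - Diseases of the oesophagus, stomach and duodenum - Gastritis and duodenitis) | K297 | 2016
98 | 99 | 소화계통의 질환-간의 질환-간의 섬유증 및 경변증 (Diseases of the digestive system - Diseases of the liver - Fibrosis and cirrhosis of the liver) | K7469 | 2020
99 | 100 | 내분비질환, 영양 및 대사-대사장애-용적고갈 (Endocrine, nutritional and metabolic diseases - Metabolic disorders - Volume depletion) | E860 | 2018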