Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 34.3 B

Variable types

Text: 3
Numeric: 1

Dataset

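Description: Contains diagnosis names and diagnosis codes, dates of first diagnosis, and diagnosis data for the various comorbidities of patients with alcohol use disorder. The major comorbidities include diseases of the digestive system, mental and behavioural disorders, endocrine diseases, and diseases of the circulatory system. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)
URL: http://cmcdata.net/data/dataset/coexistence-disease-data-alcohol-use-disorder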

Alerts

RID has unique values (Unique)

Reproduction

Analysis started: 2023-10-08 18:56:20.556207
Analysis finished: 2023-10-08 18:56:22.256037
Duration: 1.7 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
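
The reported duration follows directly from the two timestamps above. A minimal sketch using only Python's standard library; the variable names are illustrative and not part of the report:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section of this report.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-10-08 18:56:20.556207", FORMAT)
finished = datetime.strptime("2023-10-08 18:56:22.256037", FORMAT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.1f} seconds")  # prints "1.7 seconds"
```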

Variables

RID
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
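
As a rough illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The standard library exposes general categories only, not scripts or blocks, so this covers just the "categories" part of the analysis.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two RID values taken from this variable's sample rows.
counts = category_counts(["R0000347", "R0000348"])
print(counts)  # Nd = Decimal Number, Lu = Uppercase Letter
```

Each RID has one uppercase letter and seven digits, which is why the report shows exactly two distinct categories for this variable.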

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000347
2nd row: R0000348
3rd row: R0000349
4th row: R0000350
5th row: R0000351
Value | Count | Frequency (%)
r0000347 | 1 | 1.0%
r0000413 | 1 | 1.0%
r0000425 | 1 | 1.0%
r0000424 | 1 | 1.0%
r0000423 | 1 | 1.0%
r0000422 | 1 | 1.0%
r0000421 | 1 | 1.0%
r0000420 | 1 | 1.0%
r0000419 | 1 | 1.0%
r0000418 | 1 | 1.0%
Other values (90) | 90 | 90.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 421 | 52.6%
R | 100 | 12.5%
4 | 74 | 9.2%
3 | 69 | 8.6%
5 | 23 | 2.9%
9 | 20 | 2.5%
2 | 20 | 2.5%
8 | 20 | 2.5%
1 | 19 | 2.4%
7 | 17 | 2.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 700 | 87.5%
Uppercase Letter | 100 | 12.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 421 | 60.1%
4 | 74 | 10.6%
3 | 69 | 9.9%
5 | 23 | 3.3%
9 | 20 | 2.9%
2 | 20 | 2.9%
8 | 20 | 2.9%
1 | 19 | 2.7%
7 | 17 | 2.4%
6 | 17 | 2.4%

Uppercase Letter
Value | Count | Frequency (%)
R | 100 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 700 | 87.5%
Latin | 100 | 12.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 421 | 60.1%
4 | 74 | 10.6%
3 | 69 | 9.9%
5 | 23 | 3.3%
9 | 20 | 2.9%
2 | 20 | 2.9%
8 | 20 | 2.9%
1 | 19 | 2.7%
7 | 17 | 2.4%
6 | 17 | 2.4%

Latin
Value | Count | Frequency (%)
R | 100 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 800 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 421 | 52.6%
R | 100 | 12.5%
4 | 74 | 9.2%
3 | 69 | 8.6%
5 | 23 | 2.9%
9 | 20 | 2.5%
2 | 20 | 2.5%
8 | 20 | 2.5%
1 | 19 | 2.4%
7 | 17 | 2.1%

CODIAG_GRP1
Text

Distinct: 61
Distinct (%): 61.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 75
Median length: 48
Mean length: 31.81
Min length: 14

Characters and Unicode

Total characters: 3181
Distinct characters: 182
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 44.0%

Sample

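1st row: 피부 및 피하조직의 질환-피부 및 피하조직의 감염-연조직염 (Diseases of the skin and subcutaneous tissue - Infections of the skin and subcutaneous tissue - Cellulitis)
2nd row: 정신 및 행동장애-수면-각성 장애-기타 수면장애 (Mental and behavioural disorders - Sleep-wake disorders - Other sleep disorders)
3rd row: 정신 및 행동장애-수면-각성 장애-기타 수면장애 (Mental and behavioural disorders - Sleep-wake disorders - Other sleep disorders)
4th row: 순환계통의 질환-허혈성심질환-협심증 (Diseases of the circulatory system - Ischaemic heart diseases - Angina pectoris)
5th row: 소화계통의 질환-담낭, 담도 및 췌장의 장애-담석증 (Diseases of the digestive system - Disorders of the gallbladder, biliary tract and pancreas - Cholelithiasis)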
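Value | Count | Frequency (%)
(value missing in source) | 133 | 19.4%
소화계통의 (of the digestive system) | 32 | 4.7%
증상 (symptoms) | 29 | 4.2%
질환-간의 (diseases - of the liver) | 18 | 2.6%
검사의 (of laboratory tests) | 16 | 2.3%
임상 (clinical) | 16 | 2.3%
징후와 (signs and) | 16 | 2.3%
정신 (mental) | 16 | 2.3%
질환-식도 (diseases - oesophagus) | 12 | 1.8%
(value missing in source) | 12 | 1.8%
Other values (184) | 385 | 56.2%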

Most occurring characters

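Value | Count | Frequency (%)
(space) | 585 | 18.4%
- | 208 | 6.5%
(Hangul character not preserved) | 141 | 4.4%
(Hangul character not preserved) | 133 | 4.2%
(Hangul character not preserved) | 106 | 3.3%
(Hangul character not preserved) | 104 | 3.3%
(Hangul character not preserved) | 92 | 2.9%
(Hangul character not preserved) | 89 | 2.8%
(Hangul character not preserved) | 65 | 2.0%
(Hangul character not preserved) | 64 | 2.0%
Other values (172) | 1594 | 50.1%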

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2334 | 73.4%
Space Separator | 585 | 18.4%
Dash Punctuation | 208 | 6.5%
Other Punctuation | 51 | 1.6%
Uppercase Letter | 3 | 0.1%

Most frequent character per category

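Other Letter
Value | Count | Frequency (%)
(Hangul character not preserved) | 141 | 6.0%
(Hangul character not preserved) | 133 | 5.7%
(Hangul character not preserved) | 106 | 4.5%
(Hangul character not preserved) | 104 | 4.5%
(Hangul character not preserved) | 92 | 3.9%
(Hangul character not preserved) | 89 | 3.8%
(Hangul character not preserved) | 65 | 2.8%
(Hangul character not preserved) | 64 | 2.7%
(Hangul character not preserved) | 61 | 2.6%
(Hangul character not preserved) | 57 | 2.4%
Other values (168) | 1422 | 60.9%

Space Separator
Value | Count | Frequency (%)
(space) | 585 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 208 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 51 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
B | 3 | 100.0%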

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2334 | 73.4%
Common | 844 | 26.5%
Latin | 3 | 0.1%

Most frequent character per script

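Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 141 | 6.0%
(Hangul character not preserved) | 133 | 5.7%
(Hangul character not preserved) | 106 | 4.5%
(Hangul character not preserved) | 104 | 4.5%
(Hangul character not preserved) | 92 | 3.9%
(Hangul character not preserved) | 89 | 3.8%
(Hangul character not preserved) | 65 | 2.8%
(Hangul character not preserved) | 64 | 2.7%
(Hangul character not preserved) | 61 | 2.6%
(Hangul character not preserved) | 57 | 2.4%
Other values (168) | 1422 | 60.9%

Common
Value | Count | Frequency (%)
(space) | 585 | 69.3%
- | 208 | 24.6%
, | 51 | 6.0%

Latin
Value | Count | Frequency (%)
B | 3 | 100.0%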

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2334 | 73.4%
ASCII | 847 | 26.6%

Most frequent character per block

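ASCII
Value | Count | Frequency (%)
(space) | 585 | 69.1%
- | 208 | 24.6%
, | 51 | 6.0%
B | 3 | 0.4%

Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 141 | 6.0%
(Hangul character not preserved) | 133 | 5.7%
(Hangul character not preserved) | 106 | 4.5%
(Hangul character not preserved) | 104 | 4.5%
(Hangul character not preserved) | 92 | 3.9%
(Hangul character not preserved) | 89 | 3.8%
(Hangul character not preserved) | 65 | 2.8%
(Hangul character not preserved) | 64 | 2.7%
(Hangul character not preserved) | 61 | 2.6%
(Hangul character not preserved) | 57 | 2.4%
Other values (168) | 1422 | 60.9%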

CODIAG_CD
Text

Distinct: 82
Distinct (%): 82.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 4
Mean length: 4.05
Min length: 3

Characters and Unicode

Total characters: 405
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 69
Unique (%): 69.0%
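
The gap between Distinct (82) and Unique (69) for this variable is easiest to see on a toy example. The sketch below assumes that Distinct counts different values while Unique counts values that occur exactly once; the five codes are an illustrative subset, not the full column.

```python
from collections import Counter

# Illustrative subset of diagnosis codes (not the full 100-row column).
codes = ["K760", "K760", "F329", "K742", "L039"]

frequencies = Counter(codes)
n_distinct = len(frequencies)                                      # different values
n_unique = sum(1 for count in frequencies.values() if count == 1)  # values seen once

print(n_distinct, n_unique)  # prints "4 3"
```

Under this reading, 82 - 69 = 13 codes occur more than once, and together they account for the remaining 100 - 69 = 31 rows.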

Sample

1st row: L039
2nd row: G479
3rd row: G478
4th row: I200
5th row: K8020
Value | Count | Frequency (%)
k760 | 5 | 5.0%
f329 | 3 | 3.0%
k742 | 3 | 3.0%
k210 | 2 | 2.0%
l039 | 2 | 2.0%
b181 | 2 | 2.0%
k746 | 2 | 2.0%
k219 | 2 | 2.0%
r509 | 2 | 2.0%
r418 | 2 | 2.0%
Other values (72) | 75 | 75.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 47 | 11.6%
1 | 42 | 10.4%
9 | 37 | 9.1%
2 | 34 | 8.4%
4 | 33 | 8.1%
K | 31 | 7.7%
7 | 26 | 6.4%
3 | 26 | 6.4%
8 | 24 | 5.9%
6 | 20 | 4.9%
Other values (15) | 85 | 21.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 305 | 75.3%
Uppercase Letter | 100 | 24.7%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
K | 31 | 31.0%
R | 16 | 16.0%
F | 12 | 12.0%
E | 5 | 5.0%
G | 5 | 5.0%
J | 5 | 5.0%
L | 4 | 4.0%
M | 4 | 4.0%
B | 3 | 3.0%
D | 3 | 3.0%
Other values (5) | 12 | 12.0%

Decimal Number
Value | Count | Frequency (%)
0 | 47 | 15.4%
1 | 42 | 13.8%
9 | 37 | 12.1%
2 | 34 | 11.1%
4 | 33 | 10.8%
7 | 26 | 8.5%
3 | 26 | 8.5%
8 | 24 | 7.9%
6 | 20 | 6.6%
5 | 16 | 5.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 305 | 75.3%
Latin | 100 | 24.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
K | 31 | 31.0%
R | 16 | 16.0%
F | 12 | 12.0%
E | 5 | 5.0%
G | 5 | 5.0%
J | 5 | 5.0%
L | 4 | 4.0%
M | 4 | 4.0%
B | 3 | 3.0%
D | 3 | 3.0%
Other values (5) | 12 | 12.0%

Common
Value | Count | Frequency (%)
0 | 47 | 15.4%
1 | 42 | 13.8%
9 | 37 | 12.1%
2 | 34 | 11.1%
4 | 33 | 10.8%
7 | 26 | 8.5%
3 | 26 | 8.5%
8 | 24 | 7.9%
6 | 20 | 6.6%
5 | 16 | 5.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 405 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 47 | 11.6%
1 | 42 | 10.4%
9 | 37 | 9.1%
2 | 34 | 8.4%
4 | 33 | 8.1%
K | 31 | 7.7%
7 | 26 | 6.4%
3 | 26 | 6.4%
8 | 24 | 5.9%
6 | 20 | 4.9%
Other values (15) | 85 | 21.0%

DIAG_1ST_DD
Real number (ℝ)

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.93
Minimum: 2008
Maximum: 2017
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 2008
5th percentile: 2008
Q1: 2009
Median: 2011
Q3: 2014
95th percentile: 2017
Maximum: 2017
Range: 9
Interquartile range (IQR): 5
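
The quantile statistics can be re-derived from the value counts listed for this variable. The sketch below rebuilds the 100 observations from those counts and uses Python's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics. That this matches the profiler's own interpolation rule is an assumption.

```python
from statistics import quantiles

# Value counts for DIAG_1ST_DD as listed in this report (year -> count).
year_counts = {2008: 12, 2009: 19, 2010: 12, 2011: 9, 2012: 6,
               2013: 6, 2014: 12, 2015: 5, 2016: 9, 2017: 10}

# Expand the counts back into a sorted list of 100 observations.
years = [year for year, count in sorted(year_counts.items()) for _ in range(count)]

# Quartiles: Q1, median, Q3.
q1, q2, q3 = quantiles(years, n=4, method="inclusive")
value_range = max(years) - min(years)
iqr = q3 - q1

print(q1, q2, q3, value_range, iqr)
```

The quartiles, range and IQR computed this way agree with the values reported above.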

Descriptive statistics

Standard deviation: 3.0359962
Coefficient of variation (CV): 0.0015089969
Kurtosis: -1.3011531
Mean: 2011.93
Median Absolute Deviation (MAD): 2
Skewness: 0.31995257
Sum: 201193
Variance: 9.2172727
Monotonicity: Not monotonic
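
Several of the descriptive statistics can be checked from the value counts listed for this variable. The sketch below is a standard-library approximation, not the profiler's own code. It assumes the sample (n - 1) variance, which reproduces the reported figure, and takes CV as standard deviation divided by mean.

```python
from statistics import mean, median, stdev, variance

# Value counts for DIAG_1ST_DD as listed in this report (year -> count).
year_counts = {2008: 12, 2009: 19, 2010: 12, 2011: 9, 2012: 6,
               2013: 6, 2014: 12, 2015: 5, 2016: 9, 2017: 10}
years = [year for year, count in sorted(year_counts.items()) for _ in range(count)]

total = sum(years)                          # Sum
avg = mean(years)                           # Mean
var = variance(years)                       # sample variance (n - 1 denominator)
std = stdev(years)                          # sample standard deviation
cv = std / avg                              # coefficient of variation
med = median(years)
mad = median(abs(y - med) for y in years)   # median absolute deviation

print(total, avg, round(var, 7), round(std, 7), round(cv, 10), mad)
```

Skewness and kurtosis are left out because they depend on which bias correction is applied, and the report does not state it.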
Histogram with fixed size bins (bins=10)

Values sorted by frequency
Value | Count | Frequency (%)
2009 | 19 | 19.0%
2010 | 12 | 12.0%
2014 | 12 | 12.0%
2008 | 12 | 12.0%
2017 | 10 | 10.0%
2016 | 9 | 9.0%
2011 | 9 | 9.0%
2013 | 6 | 6.0%
2012 | 6 | 6.0%
2015 | 5 | 5.0%

Values in ascending order
Value | Count | Frequency (%)
2008 | 12 | 12.0%
2009 | 19 | 19.0%
2010 | 12 | 12.0%
2011 | 9 | 9.0%
2012 | 6 | 6.0%
2013 | 6 | 6.0%
2014 | 12 | 12.0%
2015 | 5 | 5.0%
2016 | 9 | 9.0%
2017 | 10 | 10.0%

Values in descending order
Value | Count | Frequency (%)
2017 | 10 | 10.0%
2016 | 9 | 9.0%
2015 | 5 | 5.0%
2014 | 12 | 12.0%
2013 | 6 | 6.0%
2012 | 6 | 6.0%
2011 | 9 | 9.0%
2010 | 12 | 12.0%
2009 | 19 | 19.0%
2008 | 12 | 12.0%

Interactions

[Figure: interactions plot]

Correlations

Variable | RID | CODIAG_GRP1 | CODIAG_CD | DIAG_1ST_DD
RID | 1.000 | 1.000 | 1.000 | 1.000
CODIAG_GRP1 | 1.000 | 1.000 | 1.000 | 0.655
CODIAG_CD | 1.000 | 1.000 | 1.000 | 0.866
DIAG_1ST_DD | 1.000 | 0.655 | 0.866 | 1.000
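
The report does not state which association measure produced this matrix. If it is a contingency-table measure such as Cramér's V (an assumption), a column whose values are all unique, like RID, scores 1.000 against any other column regardless of real dependence. The sketch below uses the plain, uncorrected formula on toy data to show the effect; `cramers_v` is a hypothetical helper, not the profiler's implementation.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))

# Toy data: an all-unique identifier against an arbitrary code column.
ids = ["R1", "R2", "R3", "R4"]
codes = ["K760", "K760", "F329", "F329"]
v_unique = cramers_v(ids, codes)

# Toy data: two columns that are independent of each other.
v_independent = cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"])
print(v_unique, v_independent)
```

If that assumption holds, the 1.000 entries in the RID row and column say little about the data, and the CODIAG_GRP1, CODIAG_CD and DIAG_1ST_DD entries are the ones worth reading.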

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

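First rows
(index) | RID | CODIAG_GRP1 | CODIAG_CD | DIAG_1ST_DD
0 | R0000347 | 피부 및 피하조직의 질환-피부 및 피하조직의 감염-연조직염 (Diseases of the skin and subcutaneous tissue - Infections of the skin and subcutaneous tissue - Cellulitis) | L039 | 2013
1 | R0000348 | 정신 및 행동장애-수면-각성 장애-기타 수면장애 (Mental and behavioural disorders - Sleep-wake disorders - Other sleep disorders) | G479 | 2017
2 | R0000349 | 정신 및 행동장애-수면-각성 장애-기타 수면장애 (Mental and behavioural disorders - Sleep-wake disorders - Other sleep disorders) | G478 | 2010
3 | R0000350 | 순환계통의 질환-허혈성심질환-협심증 (Diseases of the circulatory system - Ischaemic heart diseases - Angina pectoris) | I200 | 2014
4 | R0000351 | 소화계통의 질환-담낭, 담도 및 췌장의 장애-담석증 (Diseases of the digestive system - Disorders of the gallbladder, biliary tract and pancreas - Cholelithiasis) | K8020 | 2014
5 | R0000352 | 정신 및 행동장애-신경인지장애-알츠하이머 병으로 인한 신경인지장애 (Mental and behavioural disorders - Neurocognitive disorders - Neurocognitive disorder due to Alzheimer's disease) | F009 | 2014
6 | R0000353 | 증상, 징후와 임상 및 검사의 이상소견-순환계통 및 호흡계통의 증상 및 징후-목구멍 및 가슴의 통증 (Symptoms, signs and abnormal clinical and laboratory findings - Symptoms and signs involving the circulatory and respiratory systems - Pain in throat and chest) | R074 | 2009
7 | R0000354 | 손상, 중독 및 외인에 의한 특정 기타 결과-흉부의 손상-흉부의 손상 (Injury, poisoning and certain other consequences of external causes - Injuries to the thorax - Injuries to the thorax) | S22330 | 2016
8 | R0000355 | 증상, 징후와 임상 및 검사의 이상소견-인지, 지각, 정서상태 및 행위에 관련된 증상 및 징후-인지기능 및 자각에 관련된 증상 및 징후 (Symptoms, signs and abnormal clinical and laboratory findings - Symptoms and signs involving cognition, perception, emotional state and behaviour - Symptoms and signs involving cognitive functions and awareness) | R413 | 2010
9 | R0000356 | 소화계통의 질환-간의 질환-분류되지 않은 지방간 (Diseases of the digestive system - Diseases of the liver - Fatty liver, not elsewhere classified) | K760 | 2017

Last rows
(index) | RID | CODIAG_GRP1 | CODIAG_CD | DIAG_1ST_DD
90 | R0000443 | 정신 및 행동장애-기분장애-우울장애 (Mental and behavioural disorders - Mood disorders - Depressive disorder) | F329 | 2010
91 | R0000444 | 신경계통의 질환-우발적 및 발작적 장애-기타두통 (Diseases of the nervous system - Episodic and paroxysmal disorders - Other headache) | G442 | 2010
92 | R0000445 | 소화계통의 질환-장의 기타 질환-기타 기능성 장장애 (Diseases of the digestive system - Other diseases of the intestines - Other functional intestinal disorders) | K599 | 2017
93 | R0000446 | 소화계통의 질환-식도, 위 및 십이지장의 질환-위-식도역류병 (Diseases of the digestive system - Diseases of the oesophagus, stomach and duodenum - Gastro-oesophageal reflux disease) | K210 | 2009
94 | R0000447 | 감영성 및 기생충성 질환-바이러스 간염-만성 B형 간염 (Infectious and parasitic diseases - Viral hepatitis - Chronic hepatitis B) | B181 | 2015
95 | R0000448 | 내분비질환, 영양 및 대사-당뇨-당뇨 (Endocrine, nutritional and metabolic diseases - Diabetes - Diabetes) | E149 | 2011
96 | R0000449 | 소화계통의 질환-간의 질환-간의 섬유증 및 경변증 (Diseases of the digestive system - Diseases of the liver - Fibrosis and cirrhosis of the liver) | K742 | 2009
97 | R0000450 | 증상, 징후와 임상 및 검사의 이상소견-인지, 지각, 정서상태 및 행위에 관련된 증상 및 징후-인지기능 및 자각에 관련된 증상 및 징후 (Symptoms, signs and abnormal clinical and laboratory findings - Symptoms and signs involving cognition, perception, emotional state and behaviour - Symptoms and signs involving cognitive functions and awareness) | R418 | 2009
98 | R0000451 | 신생물-악성 신생물-간 및 간내 담관의 악성 신생물 (Neoplasms - Malignant neoplasms - Malignant neoplasm of the liver and intrahepatic bile ducts) | C221 | 2015
99 | R0000452 | 소화계통의 질환-간의 질환-분류되지 않은 지방간 (Diseases of the digestive system - Diseases of the liver - Fatty liver, not elsewhere classified) | K760 | 2009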