Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 34.3 B

Variable types

Text: 1
Categorical: 2
Numeric: 1

Dataset

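Description: Data on the department of first diagnosis, the first diagnosis name, and the diagnosis code for patients with alcohol use disorder. The diagnosing departments include gastroenterology, psychiatry, emergency medicine, family medicine, and cardiology, so patient entry pathways can be analyzed. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes.
Author: The Catholic University of Korea, Seoul St. Mary's Hospital
URL: http://cmcdata.net/data/dataset/diagnosis-data-alcohol-use-disorder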

Alerts

RID has unique values (Unique)

Reproduction

Analysis started: 2023-10-08 18:56:20.507027
Analysis finished: 2023-10-08 18:56:21.295729
Duration: 0.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
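
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal example under stated assumptions: the CSV file name, the report title, and the output path are placeholders that do not come from this report, and the data is assumed to have been downloaded to a local CSV file with the four columns profiled here.

```python
from pathlib import Path


def build_report(csv_path: str, html_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module still loads where the optional
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")  # title is a placeholder
    profile.to_file(html_path)
    return Path(html_path)


# Example call (the file name is an assumption, not taken from the source):
# build_report("alcohol_use_disorder_diagnosis.csv")
```

Timings and memory figures will differ between runs and library versions, so only the statistics themselves should be expected to match.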

Variables

RID
Text

UNIQUE

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 800
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
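
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample RID values shown in this report, not the full column, and the helper name `category_counts` is made up for this example. The standard library exposes the general category but not the script or block, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# The five sample RID values listed in this report's Sample section.
sample_rids = ["R0000001", "R0000004", "R0000007", "R0000013", "R0000015"]


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(sample_rids)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / total:.1%})")
# prints:
# Nd: 35 (87.5%)
# Lu: 5 (12.5%)
```

`Nd` is the Decimal Number category and `Lu` is Uppercase Letter. Every RID is one letter followed by seven digits, which is why the sample shares match the 87.5% and 12.5% reported for the full column.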

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: R0000001
2nd row: R0000004
3rd row: R0000007
4th row: R0000013
5th row: R0000015

Value              Count  Frequency (%)
r0000001           1      1.0%
r0000170           1      1.0%
r0000202           1      1.0%
r0000197           1      1.0%
r0000196           1      1.0%
r0000195           1      1.0%
r0000188           1      1.0%
r0000186           1      1.0%
r0000184           1      1.0%
r0000183           1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Value  Count  Frequency (%)
0      457    57.1%
R      100    12.5%
1      55     6.9%
2      51     6.4%
4      24     3.0%
3      23     2.9%
5      20     2.5%
8      19     2.4%
6      18     2.2%
7      17     2.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    700    87.5%
Uppercase Letter  100    12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      457    65.3%
1      55     7.9%
2      51     7.3%
4      24     3.4%
3      23     3.3%
5      20     2.9%
8      19     2.7%
6      18     2.6%
7      17     2.4%
9      16     2.3%

Uppercase Letter
Value  Count  Frequency (%)
R      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  700    87.5%
Latin   100    12.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      457    65.3%
1      55     7.9%
2      51     7.3%
4      24     3.4%
3      23     3.3%
5      20     2.9%
8      19     2.7%
6      18     2.6%
7      17     2.4%
9      16     2.3%

Latin
Value  Count  Frequency (%)
R      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      457    57.1%
R      100    12.5%
1      55     6.9%
2      51     6.4%
4      24     3.0%
3      23     2.9%
5      20     2.5%
8      19     2.4%
6      18     2.2%
7      17     2.1%

DEPTNM
Categorical

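Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

정신건강의학과 (Psychiatry): 65
응급의학과 (Emergency Medicine): 18
소화기내과 (Gastroenterology): 7
신경과 (Neurology): 3
가정의학과 (Family Medicine): 2
Other values (5): 5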

Length

Max length: 7
Median length: 7
Mean length: 6.2
Min length: 2

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: 응급의학과 (Emergency Medicine)
2nd row: 응급의학과 (Emergency Medicine)
3rd row: 정신건강의학과 (Psychiatry)
4th row: 정신건강의학과 (Psychiatry)
5th row: 정신건강의학과 (Psychiatry)

Common Values

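Value                                Count  Frequency (%)
정신건강의학과 (Psychiatry)          65     65.0%
응급의학과 (Emergency Medicine)      18     18.0%
소화기내과 (Gastroenterology)        7      7.0%
신경과 (Neurology)                   3      3.0%
가정의학과 (Family Medicine)         2      2.0%
순환기내과 (Cardiology)              1      1.0%
내분비내과 (Endocrinology)           1      1.0%
외과 (Surgery)                       1      1.0%
재활의학과 (Rehabilitation Medicine) 1      1.0%
신경외과 (Neurosurgery)              1      1.0%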
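
The summary figures reported for DEPTNM (distinct and unique counts, length statistics) can be cross-checked against the Common Values table. The sketch below rebuilds one value per observation from the table counts and recomputes them. It is a consistency check on the published counts, not the profiler's own code, and it assumes that "Unique" means values occurring exactly once and that length is measured in Unicode code points.

```python
import unicodedata
from collections import Counter
from statistics import mean, median

# Counts copied from the Common Values table for DEPTNM.
dept_counts = Counter({
    "정신건강의학과": 65, "응급의학과": 18, "소화기내과": 7, "신경과": 3,
    "가정의학과": 2, "순환기내과": 1, "내분비내과": 1, "외과": 1,
    "재활의학과": 1, "신경외과": 1,
})

# Expand the table back into one value per observation.
# NFC normalization keeps each Hangul syllable as a single code point.
values = [unicodedata.normalize("NFC", name)
          for name, n in dept_counts.items() for _ in range(n)]
lengths = [len(v) for v in values]

distinct = len(dept_counts)                               # different values
unique = sum(1 for n in dept_counts.values() if n == 1)   # values seen once

print("observations:", len(values))
print("distinct:", distinct, "unique:", unique)
print("length max/median/mean/min:",
      max(lengths), median(lengths), mean(lengths), min(lengths))
```

Under those assumptions the recomputed values agree with the report: 10 distinct values, 5 unique values, and lengths ranging from 2 to 7 with a mean of 6.2.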

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

DIAGCD
Categorical

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

F102: 48
F101: 45
F104: 6
F103: 1

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: F102
2nd row: F102
3rd row: F101
4th row: F102
5th row: F102

Common Values

Value  Count  Frequency (%)
F102   48     48.0%
F101   45     45.0%
F104   6      6.0%
F103   1      1.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

DIAG_date
Real number (ℝ)

Distinct: 11
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.07
Minimum: 2008
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 2008
5-th percentile: 2009
Q1: 2010
Median: 2013
Q3: 2015
95-th percentile: 2018
Maximum: 2018
Range: 10
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.8998607
Coefficient of variation (CV): 0.0014405166
Kurtosis: -1.2046935
Mean: 2013.07
Median Absolute Deviation (MAD): 2.5
Skewness: 0.081212327
Sum: 201307
Variance: 8.4091919
Monotonicity: Not monotonic
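
Several of these statistics can be recomputed from the per-year value counts listed for DIAG_date in this section. The sketch below does so with Python's standard `statistics` module. It assumes the sample (n - 1) convention for variance and standard deviation, CV defined as standard deviation divided by mean, and MAD defined as the median of absolute deviations from the median. These conventions reproduce the reported numbers but are inferred from them, not stated by the report.

```python
from statistics import mean, median, stdev, variance

# Observations per year, copied from the DIAG_date value-count tables.
year_counts = {2008: 2, 2009: 10, 2010: 14, 2011: 11, 2012: 10, 2013: 4,
               2014: 12, 2015: 13, 2016: 11, 2017: 5, 2018: 8}

# Expand the counts into one value per observation, in ascending order.
years = [year for year, n in sorted(year_counts.items()) for _ in range(n)]

total = sum(years)
avg = mean(years)
var = variance(years)   # sample variance, n - 1 denominator
std = stdev(years)      # square root of the sample variance
cv = std / avg          # coefficient of variation
med = median(years)
mad = median(abs(y - med) for y in years)

print(f"sum={total} mean={avg:.2f} variance={var:.7f} std={std:.7f}")
print(f"cv={cv:.10f} median={med} mad={mad}")
```

Because DIAG_date holds calendar years, the mean and CV are of limited interpretive value. The check mainly confirms that the value counts and the summary statistics are mutually consistent.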
[Figure: Histogram with fixed size bins (bins=11)]

Value  Count  Frequency (%)
2010   14     14.0%
2015   13     13.0%
2014   12     12.0%
2011   11     11.0%
2016   11     11.0%
2009   10     10.0%
2012   10     10.0%
2018   8      8.0%
2017   5      5.0%
2013   4      4.0%

Value  Count  Frequency (%)
2008   2      2.0%
2009   10     10.0%
2010   14     14.0%
2011   11     11.0%
2012   10     10.0%
2013   4      4.0%
2014   12     12.0%
2015   13     13.0%
2016   11     11.0%
2017   5      5.0%

Value  Count  Frequency (%)
2018   8      8.0%
2017   5      5.0%
2016   11     11.0%
2015   13     13.0%
2014   12     12.0%
2013   4      4.0%
2012   10     10.0%
2011   11     11.0%
2010   14     14.0%
2009   10     10.0%

Interactions

[Figure: interactions plot]

Correlations

           RID    DEPTNM  DIAGCD  DIAG_date
RID        1.000  1.000   1.000   1.000
DEPTNM     1.000  1.000   0.664   0.408
DIAGCD     1.000  0.664   1.000   0.408
DIAG_date  1.000  0.408   0.408   1.000

        DIAGCD  DEPTNM
DIAGCD  1.000   0.449
DEPTNM  0.449   1.000

           DIAG_date  DEPTNM  DIAGCD
DIAG_date  1.000      0.181   0.250
DEPTNM     0.181      1.000   0.449
DIAGCD     0.250      0.449   1.000
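
The labels identifying each correlation matrix did not survive in this text. The two-variable matrix (DIAGCD vs DEPTNM, 0.449) covers only the categorical columns, so it is plausibly a Cramér's V table, but that is an assumption. The sketch below shows the plain, uncorrected Cramér's V for two categorical variables, applied to pairs tallied from the 20 rows in this report's Sample section. The function name `cramers_v` is made up for this example. Because it sees only 20 of the 100 rows, and because a profiler may apply a bias correction, its result is not expected to equal 0.449.

```python
from collections import Counter
from math import sqrt


def cramers_v(pairs):
    """Uncorrected Cramér's V for an iterable of (a, b) category pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0  # association is undefined with a single category
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# (DEPTNM, DIAGCD) tallies from the 20 sample rows shown in this report.
sample_tally = {
    ("정신건강의학과", "F101"): 9, ("정신건강의학과", "F102"): 5,
    ("응급의학과", "F102"): 3, ("응급의학과", "F101"): 1,
    ("순환기내과", "F102"): 1, ("신경외과", "F102"): 1,
}
sample_pairs = [pair for pair, n in sample_tally.items() for _ in range(n)]
v = cramers_v(sample_pairs)
print(f"Cramér's V on the 20 sample rows: {v:.3f}")
```

The statistic ranges from 0 (no association) to 1 (one variable fully determines the other).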

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

    RID       DEPTNM          DIAGCD  DIAG_date
0   R0000001  응급의학과      F102    2011
1   R0000004  응급의학과      F102    2014
2   R0000007  정신건강의학과  F101    2010
3   R0000013  정신건강의학과  F102    2016
4   R0000015  정신건강의학과  F102    2010
5   R0000019  정신건강의학과  F102    2018
6   R0000020  정신건강의학과  F102    2015
7   R0000022  순환기내과      F102    2008
8   R0000026  정신건강의학과  F101    2014
9   R0000028  정신건강의학과  F101    2010

    RID       DEPTNM          DIAGCD  DIAG_date
90  R0000244  정신건강의학과  F102    2010
91  R0000246  정신건강의학과  F101    2015
92  R0000247  응급의학과      F102    2010
93  R0000249  정신건강의학과  F101    2009
94  R0000253  정신건강의학과  F101    2011
95  R0000256  정신건강의학과  F101    2015
96  R0000259  정신건강의학과  F101    2014
97  R0000260  응급의학과      F101    2014
98  R0000272  정신건강의학과  F101    2009
99  R0000274  신경외과        F102    2017