gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	100
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	3.4 KiB
Average record size in memory	34.3 B

Variable types

Text	1
Categorical	2
Numeric	1

Dataset

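Description	First diagnosing department, first diagnosis name, and diagnosis code data for patients with alcohol use disorder. The diagnosing departments include gastroenterology, psychiatry, emergency medicine, family medicine, and cardiology, so patient inflow pathways can be analyzed. Diagnosis codes are mapped to ICD-11 and SNOMED-CT codes.
Author	The Catholic University of Korea, Seoul St. Mary's Hospital (가톨릭대학교 서울성모병원)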
URL	http://cmcdata.net/data/dataset/diagnosis-data-alcohol-use-disorder

Alerts

RID has unique values Unique

Reproduction

Analysis started	2023-10-08 18:56:20.507027
Analysis finished	2023-10-08 18:56:21.295729
Duration	0.79 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
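
The reported duration follows from the two timestamps above. A minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section of the report.
started = datetime.fromisoformat("2023-10-08 18:56:20.507027")
finished = datetime.fromisoformat("2023-10-08 18:56:21.295729")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```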

RID
Text

UNIQUE

Distinct	100
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Length

Max length	8
Median length	8
Mean length	8
Min length	8

Characters and Unicode

Total characters	800
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
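
As a rough illustration of how such character properties can be read, the sketch below looks up the Unicode general category of every character in the five sample RID values listed in this section. It uses Python's standard `unicodedata` module, which exposes categories but not scripts or blocks; it is an assumption-laden sketch, not the code the profiler itself runs.

```python
import unicodedata
from collections import Counter

# The five sample RID values shown in the report (not the full column).
sample_rids = ["R0000001", "R0000004", "R0000007", "R0000013", "R0000015"]

# Map each character to its Unicode general category code:
# "Lu" is Uppercase Letter, "Nd" is Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for rid in sample_rids for ch in rid
)
print(category_counts)
```

Each sample value contributes one uppercase letter and seven decimal digits, which is consistent with the two distinct categories reported for the full column.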

Unique

Unique	100
Unique (%)	100.0%

Sample

1st row	R0000001
2nd row	R0000004
3rd row	R0000007
4th row	R0000013
5th row	R0000015

Value	Count	Frequency (%)
r0000001	1	1.0%
r0000170	1	1.0%
r0000202	1	1.0%
r0000197	1	1.0%
r0000196	1	1.0%
r0000195	1	1.0%
r0000188	1	1.0%
r0000186	1	1.0%
r0000184	1	1.0%
r0000183	1	1.0%
Other values (90)	90	90.0%

Most occurring characters

Value	Count	Frequency (%)
0	457	57.1%
R	100	12.5%
1	55	6.9%
2	51	6.4%
4	24	3.0%
3	23	2.9%
5	20	2.5%
8	19	2.4%
6	18	2.2%
7	17	2.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	700	87.5%
Uppercase Letter	100	12.5%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	457	65.3%
1	55	7.9%
2	51	7.3%
4	24	3.4%
3	23	3.3%
5	20	2.9%
8	19	2.7%
6	18	2.6%
7	17	2.4%
9	16	2.3%

Uppercase Letter

Value	Count	Frequency (%)
R	100	100.0%
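
The same character count yields different percentages depending on the denominator: all characters in the overall table, but only the characters of one category in the per-category tables. A small sketch with the counts copied from the tables above (the helper name `share` is hypothetical):

```python
# Character counts for RID, copied from the tables above.
digit_counts = {"0": 457, "1": 55, "2": 51, "4": 24, "3": 23,
                "5": 20, "8": 19, "6": 18, "7": 17, "9": 16}
letter_counts = {"R": 100}

total_digits = sum(digit_counts.values())                 # Decimal Number total
total_chars = total_digits + sum(letter_counts.values())  # all characters

def share(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

print(share(digit_counts["0"], total_chars))   # share among all characters
print(share(digit_counts["0"], total_digits))  # share among digits only
```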

Most occurring scripts

Value	Count	Frequency (%)
Common	700	87.5%
Latin	100	12.5%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	457	65.3%
1	55	7.9%
2	51	7.3%
4	24	3.4%
3	23	3.3%
5	20	2.9%
8	19	2.7%
6	18	2.6%
7	17	2.4%
9	16	2.3%

Latin

Value	Count	Frequency (%)
R	100	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	800	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	457	57.1%
R	100	12.5%
1	55	6.9%
2	51	6.4%
4	24	3.0%
3	23	2.9%
5	20	2.5%
8	19	2.4%
6	18	2.2%
7	17	2.1%

DEPTNM
Categorical

Distinct	10
Distinct (%)	10.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

정신건강의학과 (Psychiatry)	65
응급의학과 (Emergency Medicine)	18
소화기내과 (Gastroenterology)	7
신경과 (Neurology)	3
가정의학과 (Family Medicine)	2
Other values (5)	5

Length

Max length	7
Median length	7
Mean length	6.2
Min length	2

Unique

Unique	5
Unique (%)	5.0%

Sample

1st row	응급의학과
2nd row	응급의학과
3rd row	정신건강의학과
4th row	정신건강의학과
5th row	정신건강의학과

Common Values

Value	Count	Frequency (%)
정신건강의학과	65	65.0%
응급의학과	18	18.0%
소화기내과	7	7.0%
신경과	3	3.0%
가정의학과	2	2.0%
순환기내과	1	1.0%
내분비내과	1	1.0%
외과	1	1.0%
재활의학과	1	1.0%
신경외과	1	1.0%
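
The Distinct, Unique, and Length figures for DEPTNM can be recomputed from this table alone. The sketch below assumes that string length is counted in Unicode code points over NFC-normalized text; that assumption reproduces the reported values but is not confirmed by the report itself.

```python
import statistics
import unicodedata

# Value counts for DEPTNM, copied from the Common Values table.
dept_counts = {
    "정신건강의학과": 65, "응급의학과": 18, "소화기내과": 7, "신경과": 3,
    "가정의학과": 2, "순환기내과": 1, "내분비내과": 1, "외과": 1,
    "재활의학과": 1, "신경외과": 1,
}

# One length per observation; NFC keeps each Hangul syllable as one code point.
lengths = [
    len(unicodedata.normalize("NFC", name))
    for name, count in dept_counts.items()
    for _ in range(count)
]

distinct = len(dept_counts)                                       # different values
unique = sum(1 for count in dept_counts.values() if count == 1)   # values seen once
print(distinct, unique)
print(max(lengths), statistics.median(lengths), statistics.mean(lengths), min(lengths))
```

"Distinct" counts how many different values occur, whereas "Unique" counts only the values that occur exactly once, which is why the two figures differ.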

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
정신건강의학과	65	65.0%
응급의학과	18	18.0%
소화기내과	7	7.0%
신경과	3	3.0%
가정의학과	2	2.0%
순환기내과	1	1.0%
내분비내과	1	1.0%
외과	1	1.0%
재활의학과	1	1.0%
신경외과	1	1.0%

DIAGCD
Categorical

Distinct	4
Distinct (%)	4.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

F102	48
F101	45
F104	6
F103	1

Length

Max length	4
Median length	4
Mean length	4
Min length	4

Unique

Unique	1
Unique (%)	1.0%

Sample

1st row	F102
2nd row	F102
3rd row	F101
4th row	F102
5th row	F102

Common Values

Value	Count	Frequency (%)
F102	48	48.0%
F101	45	45.0%
F104	6	6.0%
F103	1	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
f102	48	48.0%
f101	45	45.0%
f104	6	6.0%
f103	1	1.0%

DIAG_date
Real number (ℝ)

Distinct	11
Distinct (%)	11.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2013.07

Minimum	2008
Maximum	2018
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.0 KiB

Quantile statistics

Minimum	2008
5-th percentile	2009
Q1	2010
median	2013
Q3	2015
95-th percentile	2018
Maximum	2018
Range	10
Interquartile range (IQR)	5

Descriptive statistics

Standard deviation	2.8998607
Coefficient of variation (CV)	0.0014405166
Kurtosis	-1.2046935
Mean	2013.07
Median Absolute Deviation (MAD)	2.5
Skewness	0.081212327
Sum	201307
Variance	8.4091919
Monotonicity	Not monotonic
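
These descriptive statistics can be cross-checked against the year counts listed in this section's value tables. The sketch below uses only the Python standard library and assumes the sample variance (n − 1 denominator), which reproduces the reported figure; variable names are illustrative.

```python
import statistics

# Year counts for DIAG_date, copied from the value tables in this report.
year_counts = {2008: 2, 2009: 10, 2010: 14, 2011: 11, 2012: 10, 2013: 4,
               2014: 12, 2015: 13, 2016: 11, 2017: 5, 2018: 8}

# Expand the counts into one value per observation (100 in total).
years = [year for year, count in sorted(year_counts.items()) for _ in range(count)]

mean = statistics.mean(years)
variance = statistics.variance(years)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(years)
median = statistics.median(years)
# Median absolute deviation around the median.
mad = statistics.median(abs(y - median) for y in years)
cv = std_dev / mean                     # coefficient of variation

print(sum(years), mean, variance, std_dev, cv, mad)
```

The coefficient of variation is tiny because the values are calendar years: the spread of about 2.9 years is divided by a mean near 2013.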

Histogram with fixed size bins (bins=11)

Value	Count	Frequency (%)
2010	14	14.0%
2015	13	13.0%
2014	12	12.0%
2011	11	11.0%
2016	11	11.0%
2009	10	10.0%
2012	10	10.0%
2018	8	8.0%
2017	5	5.0%
2013	4	4.0%

Minimum 10 values

Value	Count	Frequency (%)
2008	2	2.0%
2009	10	10.0%
2010	14	14.0%
2011	11	11.0%
2012	10	10.0%
2013	4	4.0%
2014	12	12.0%
2015	13	13.0%
2016	11	11.0%
2017	5	5.0%

Maximum 10 values

Value	Count	Frequency (%)
2018	8	8.0%
2017	5	5.0%
2016	11	11.0%
2015	13	13.0%
2014	12	12.0%
2013	4	4.0%
2012	10	10.0%
2011	11	11.0%
2010	14	14.0%
2009	10	10.0%

Interactions: DIAG_date vs. DIAG_date (plot omitted)

Heatmap
Table

	RID	DEPTNM	DIAGCD	DIAG_date
RID	1.000	1.000	1.000	1.000
DEPTNM	1.000	1.000	0.664	0.408
DIAGCD	1.000	0.664	1.000	0.408
DIAG_date	1.000	0.408	0.408	1.000

Heatmap
Table

	DIAGCD	DEPTNM
DIAGCD	1.000	0.449
DEPTNM	0.449	1.000

Heatmap
Table

	DIAG_date	DEPTNM	DIAGCD
DIAG_date	1.000	0.181	0.250
DEPTNM	0.181	1.000	0.449
DIAGCD	0.250	0.449	1.000

Count
Matrix

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	RID	DEPTNM	DIAGCD	DIAG_date
0	R0000001	응급의학과	F102	2011
1	R0000004	응급의학과	F102	2014
2	R0000007	정신건강의학과	F101	2010
3	R0000013	정신건강의학과	F102	2016
4	R0000015	정신건강의학과	F102	2010
5	R0000019	정신건강의학과	F102	2018
6	R0000020	정신건강의학과	F102	2015
7	R0000022	순환기내과	F102	2008
8	R0000026	정신건강의학과	F101	2014
9	R0000028	정신건강의학과	F101	2010

Last rows

	RID	DEPTNM	DIAGCD	DIAG_date
90	R0000244	정신건강의학과	F102	2010
91	R0000246	정신건강의학과	F101	2015
92	R0000247	응급의학과	F102	2010
93	R0000249	정신건강의학과	F101	2009
94	R0000253	정신건강의학과	F101	2011
95	R0000256	정신건강의학과	F101	2015
96	R0000259	정신건강의학과	F101	2014
97	R0000260	응급의학과	F101	2014
98	R0000272	정신건강의학과	F101	2009
99	R0000274	신경외과	F102	2017
