gimi9 Pandas Profiling

Overview

Dataset statistics

Number of variables	4
Number of observations	31
Missing cells	1
Missing cells (%)	0.8%
Duplicate rows	1
Duplicate rows (%)	3.2%
Total size in memory	1.1 KiB
Average record size in memory	36.3 B
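As a cross-check of the figures above: 4 variables × 31 observations give 124 cells, so 1 missing cell is 1/124 ≈ 0.8%, and 1 duplicate row out of 31 is 3.2%. Below is a minimal pandas sketch of the same calculations. `dataset_statistics` is a hypothetical helper, not part of ydata-profiling, and the toy frame stands in for the real data. The report's internal definitions may differ in edge cases.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Hypothetical helper: recompute the 'Dataset statistics' figures."""
    n_obs, n_vars = df.shape
    missing = int(df.isna().sum().sum())      # missing cells across all columns
    duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": round(100 * missing / df.size, 1),
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": round(100 * duplicates / n_obs, 1),
    }


# Toy frame (not the real dataset): one missing cell and one duplicated row.
toy = pd.DataFrame({"a": ["x", "x", "y"], "b": ["p", "p", None]})
print(dataset_statistics(toy))
```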

Variable types

Categorical	2
Text	2

Dataset

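Description	Data on pathology tests offered by Buk-gu, Ulsan Metropolitan City, covering the list of pathology tests, test items, reference values (units), and processing deadlines.
Author	Buk-gu, Ulsan Metropolitan City (울산광역시 북구)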
URL	https://www.data.go.kr/data/3076002/fileData.do

Alerts

Dataset has 1 (3.2%) duplicate rows	Duplicates
`구 분` is highly overall correlated with `처리기한`	High correlation
`처리기한` is highly overall correlated with `구 분`	High correlation
`참고치 (단위)` has 1 (3.2%) missing values	Missing

Reproduction

Analysis started	2023-12-12 14:55:48.268583
Analysis finished	2023-12-12 14:55:48.668622
Duration	0.4 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

Variables

구 분 (Category)
Categorical

HIGH CORRELATION

Distinct	13
Distinct (%)	41.9%
Missing	0
Missing (%)	0.0%
Memory size	380.0 B

간기능검사 (liver function test)	7
고지혈증검사 (hyperlipidemia test)	5
소변검사 (urinalysis)	3
신장기능검사 (kidney function test)	3
빈혈검사 (anemia test)	2
Other values (8)	11

Length

Max length	13
Median length	6
Mean length	6.0322581
Min length	4
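Lengths are counted in characters (Unicode code points), so each Hangul syllable counts as one: `당뇨검사` has length 4 (the minimum above) and `B형 간염검사(정밀검사)` has length 13 (the maximum). A minimal stdlib sketch of the four statistics, assuming missing values are represented as `None`. With an even number of values the median is the mean of the two middle lengths, which is how a median such as 16.5 can arise.

```python
from statistics import mean, median


def length_summary(values):
    """Max/median/mean/min length in characters, skipping missing (None) values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Max length": max(lengths),
        "Median length": median(lengths),
        "Mean length": mean(lengths),
        "Min length": min(lengths),
    }


# Each Hangul syllable counts as one character, so these lengths are 4 and 13.
print(length_summary(["당뇨검사", "B형 간염검사(정밀검사)"]))
```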

Unique

Unique	5
Unique (%)	16.1%
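Distinct counts how many different values occur; Unique counts the values that occur exactly once. In this report both percentages are taken over the non-missing rows (for `참고치 (단위)`, 24 distinct values out of 30 non-missing rows gives 80.0%). A stdlib sketch, assuming missing values are `None`, checked against the counts reported for `처리기한` (24 × 3일, 6 × 즉시, 1 × 2일):

```python
from collections import Counter


def distinct_unique(values):
    """Distinct/unique counts and percentages over the non-missing values."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())                            # number of non-missing rows
    distinct = len(counts)                              # different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "Distinct": distinct,
        "Distinct (%)": round(100 * distinct / n, 1),
        "Unique": unique,
        "Unique (%)": round(100 * unique / n, 1),
    }


deadlines = ["3일"] * 24 + ["즉시"] * 6 + ["2일"]
print(distinct_unique(deadlines))
# {'Distinct': 3, 'Distinct (%)': 9.7, 'Unique': 1, 'Unique (%)': 3.2}
```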

Sample

1st row	당뇨검사
2nd row	당뇨정밀검사
3rd row	빈혈검사
4th row	빈혈검사
5th row	소변검사

Common Values

Value	Count	Frequency (%)
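간기능검사 (liver function test)	7	22.6%
고지혈증검사 (hyperlipidemia test)	5	16.1%
소변검사 (urinalysis)	3	9.7%
신장기능검사 (kidney function test)	3	9.7%
빈혈검사 (anemia test)	2	6.5%
통풍검사 (gout test)	2	6.5%
B형 간염검사(정밀검사) (hepatitis B test, detailed)	2	6.5%
C형 간염검사(정밀검사) (hepatitis C test, detailed)	2	6.5%
당뇨검사 (diabetes test)	1	3.2%
당뇨정밀검사 (detailed diabetes test)	1	3.2%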
Other values (3)	3	9.7%
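The table lists the ten most frequent values and folds the remaining ones into a single "Other values (n)" row, with percentages over all 31 rows. A stdlib sketch of that layout; the tie ordering is an assumption of this sketch (`Counter.most_common` keeps first-seen order among equal counts), so rows with equal counts may come out in a different order than in the report.

```python
from collections import Counter


def common_values(values, top_n=10):
    """(value, count, percent) rows for the top_n values plus an 'Other values (n)' row."""
    ranked = Counter(values).most_common()
    total = sum(count for _, count in ranked)
    rows = [(v, c, round(100 * c / total, 1)) for v, c in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        # Fold every value beyond the top_n into one summary row.
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


for row in common_values(["3일"] * 24 + ["즉시"] * 6 + ["2일"], top_n=2):
    print(row)
```

With the default `top_n=10`, a column with 13 distinct values such as `구 분` gets an "Other values (3)" row, matching the table above.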

Length

[Figure: histogram of category lengths]

Words

Value	Count	Frequency (%)
간기능검사	7	20.0%
고지혈증검사	5	14.3%
간염검사(정밀검사	4	11.4%
소변검사	3	8.6%
신장기능검사	3	8.6%
빈혈검사	2	5.7%
통풍검사	2	5.7%
b형	2	5.7%
c형	2	5.7%
당뇨검사	1	2.9%
Other values (4)	4	11.4%
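The tokens in this table are lowercased (`b형`, `c형`) and have lost their trailing parenthesis (`간염검사(정밀검사`), which is consistent with splitting each value on whitespace and stripping punctuation from both ends of each token. The sketch below applies that rule. It is an approximation, not the library's own tokenizer, but on the `구 분` values it gives the same 35 tokens (31 rows, four of which split into two words).

```python
import string
from collections import Counter


def word_counts(values):
    """Approximate word counts: lowercase, split on whitespace, strip edge punctuation."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            token = token.strip(string.punctuation)  # removes ')' but keeps inner '('
            if token:  # this sketch drops tokens that were pure punctuation
                counts[token] += 1
    return counts


print(word_counts(["B형 간염검사(정밀검사)", "C형 간염검사(정밀검사)", "간기능검사"]))
```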

검사항목 (Test item)
Text

Distinct	23
Distinct (%)	74.2%
Missing	0
Missing (%)	0.0%
Memory size	380.0 B

Length

Max length	15
Median length	11
Mean length	8.1935484
Min length	2

Characters and Unicode

Total characters	254
Distinct characters	56
Distinct categories	9
Distinct scripts	4
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
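For example, `unicodedata.category` in Python's standard library returns the two-letter general category of a code point (`Lu`, `Ll`, `Lo`, and so on). The sketch maps those codes to the long names used in the tables that follow; the mapping dict is written by hand and covers only the categories that occur in this report. Note that `γ` is a lowercase letter (`Ll`), so it is counted under Lowercase Letter.

```python
import unicodedata
from collections import Counter

# Hand-written mapping for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Sm": "Math Symbol",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all non-missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


print(category_counts(["Hb (헤모글로빈)", "γ-GTP"]))
```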

Unique

Unique	15
Unique (%)	48.4%

Sample

1st row	Glucose (혈당)
2nd row	HbA1C
3rd row	Hb (헤모글로빈)
4th row	Hb (헤모글로빈)
5th row	3종

Words

Value	Count	Frequency (%)
creatinine	2	5.1%
acid	2	5.1%
hcv(eia	2	5.1%
hb	2	5.1%
hdl-cholesterol	2	5.1%
헤모글로빈	2	5.1%
uric	2	5.1%
γ-gtp	2	5.1%
alt	2	5.1%
ast	2	5.1%
Other values (18)	19	48.7%

Most occurring characters

Value	Count	Frequency (%)
e	15	5.9%
A	15	5.9%
i	13	5.1%
C	11	4.3%
r	11	4.3%
l	11	4.3%
H	10	3.9%
(	9	3.5%
)	9	3.5%
o	9	3.5%
Other values (46)	141	55.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	105	41.3%
Uppercase Letter	88	34.6%
Other Letter	22	8.7%
Open Punctuation	9	3.5%
Close Punctuation	9	3.5%
Dash Punctuation	8	3.1%
Space Separator	8	3.1%
Decimal Number	4	1.6%
Other Punctuation	1	0.4%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
e	15	14.3%
i	13	12.4%
r	11	10.5%
l	11	10.5%
o	9	8.6%
t	7	6.7%
s	7	6.7%
c	6	5.7%
n	5	4.8%
b	5	4.8%
Other values (7)	16	15.2%

Uppercase Letter

Value	Count	Frequency (%)
A	15	17.0%
C	11	12.5%
H	10	11.4%
T	8	9.1%
I	6	6.8%
L	6	6.8%
B	5	5.7%
P	5	5.7%
E	5	5.7%
U	3	3.4%
Other values (6)	14	15.9%

Other Letter

Value	Count	Frequency (%)
종	2	9.1%
경	2	9.1%
헤	2	9.1%
로	2	9.1%
글	2	9.1%
빈	2	9.1%
모	2	9.1%
당	1	4.5%
혈	1	4.5%
검	1	4.5%
Other values (5)	5	22.7%

Decimal Number

Value	Count	Frequency (%)
1	2	50.0%
3	1	25.0%
0	1	25.0%

Open Punctuation

Value	Count	Frequency (%)
(	9	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	9	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	8	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	8	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	191	75.2%
Common	39	15.4%
Hangul	22	8.7%
Greek	2	0.8%

Most frequent character per script

Latin

Value	Count	Frequency (%)
e	15	7.9%
A	15	7.9%
i	13	6.8%
C	11	5.8%
r	11	5.8%
l	11	5.8%
H	10	5.2%
o	9	4.7%
T	8	4.2%
t	7	3.7%
Other values (22)	81	42.4%

Hangul

Value	Count	Frequency (%)
종	2	9.1%
경	2	9.1%
헤	2	9.1%
로	2	9.1%
글	2	9.1%
빈	2	9.1%
모	2	9.1%
당	1	4.5%
혈	1	4.5%
검	1	4.5%
Other values (5)	5	22.7%

Common

Value	Count	Frequency (%)
(	9	23.1%
)	9	23.1%
-	8	20.5%
(space)	8	20.5%
1	2	5.1%
3	1	2.6%
,	1	2.6%
0	1	2.6%

Greek

Value	Count	Frequency (%)
γ	2	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	230	90.6%
Hangul	22	8.7%
None	2	0.8%
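Block membership is a code-point range lookup. The sketch below uses a hypothetical, deliberately minimal table containing only the ASCII range (U+0000 to U+007F) and the Hangul Syllables block (U+AC00 to U+D7AF); anything outside these ranges falls through to `None`, mirroring the `None` row for `γ` (U+03B3) above. It is not the library's own block table.

```python
from collections import Counter

# Hypothetical minimal block table: (name, first code point, last code point).
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),
    ("Hangul", 0xAC00, 0xD7AF),  # Hangul Syllables block
]


def block_of(ch):
    """Name of the block containing ch, or None if no listed range matches."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def block_counts(values):
    """Count characters per block across all non-missing values."""
    return Counter(block_of(ch) for value in values if value is not None for ch in value)


print(block_counts(["γ-GTP", "Hb (헤모글로빈)"]))
```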

Most frequent character per block

ASCII

Value	Count	Frequency (%)
e	15	6.5%
A	15	6.5%
i	13	5.7%
C	11	4.8%
r	11	4.8%
l	11	4.8%
H	10	4.3%
(	9	3.9%
)	9	3.9%
o	9	3.9%
Other values (30)	117	50.9%

Hangul

Value	Count	Frequency (%)
종	2	9.1%
경	2	9.1%
헤	2	9.1%
로	2	9.1%
글	2	9.1%
빈	2	9.1%
모	2	9.1%
당	1	4.5%
혈	1	4.5%
검	1	4.5%
Other values (5)	5	22.7%

None

Value	Count	Frequency (%)
γ	2	100.0%

참고치 (단위) (Reference value, unit)
Text

MISSING

Distinct	24
Distinct (%)	80.0%
Missing	1
Missing (%)	3.2%
Memory size	380.0 B

Length

Max length	20
Median length	16.5
Mean length	13.333333
Min length	2

Characters and Unicode

Total characters	400
Distinct characters	33
Distinct categories	9
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	20
Unique (%)	66.7%

Sample

1st row	70 ~ 100 (mg/dL) 공복
2nd row	3.5 ~ 5.9이하 (mg/dL)
3rd row	남 : 13이상(mg/dL)
4th row	여 : 12이상(mg/dL)
5th row	불검출

Words

Value	Count	Frequency (%)
	16	20.0%
mg/dl	14	17.5%
여성	6	7.5%
남성	6	7.5%
음성(1.0이하	3	3.8%
불검출	3	3.8%
200이하	2	2.5%
31이하(mg/dl	2	2.5%
음성	2	2.5%
5.7이하	1	1.2%
Other values (25)	25	31.2%

Most occurring characters

Value	Count	Frequency (%)
(space)	52	13.0%
)	26	6.5%
(	26	6.5%
d	21	5.2%
g	21	5.2%
m	21	5.2%
L	21	5.2%
/	21	5.2%
이	20	5.0%
0	19	4.8%
Other values (23)	152	38.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	89	22.2%
Decimal Number	68	17.0%
Lowercase Letter	63	15.8%
Space Separator	52	13.0%
Other Punctuation	48	12.0%
Close Punctuation	26	6.5%
Open Punctuation	26	6.5%
Uppercase Letter	21	5.2%
Math Symbol	7	1.8%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
이	20	22.5%
성	18	20.2%
하	15	16.9%
여	7	7.9%
남	7	7.9%
상	5	5.6%
음	5	5.6%
출	3	3.4%
검	3	3.4%
불	3	3.4%
Other values (3)	3	3.4%

Decimal Number

Value	Count	Frequency (%)
0	19	27.9%
1	16	23.5%
3	7	10.3%
5	7	10.3%
2	6	8.8%
7	6	8.8%
9	3	4.4%
4	2	2.9%
6	2	2.9%

Lowercase Letter

Value	Count	Frequency (%)
d	21	33.3%
g	21	33.3%
m	21	33.3%

Other Punctuation

Value	Count	Frequency (%)
/	21	43.8%
:	14	29.2%
.	13	27.1%

Space Separator

Value	Count	Frequency (%)
(space)	52	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	26	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	26	100.0%

Uppercase Letter

Value	Count	Frequency (%)
L	21	100.0%

Math Symbol

Value	Count	Frequency (%)
~	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	227	56.8%
Hangul	89	22.2%
Latin	84	21.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
(space)	52	22.9%
)	26	11.5%
(	26	11.5%
/	21	9.3%
0	19	8.4%
1	16	7.0%
:	14	6.2%
.	13	5.7%
3	7	3.1%
~	7	3.1%
Other values (6)	26	11.5%

Hangul

Value	Count	Frequency (%)
이	20	22.5%
성	18	20.2%
하	15	16.9%
여	7	7.9%
남	7	7.9%
상	5	5.6%
음	5	5.6%
출	3	3.4%
검	3	3.4%
불	3	3.4%
Other values (3)	3	3.4%

Latin

Value	Count	Frequency (%)
d	21	25.0%
g	21	25.0%
m	21	25.0%
L	21	25.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	311	77.8%
Hangul	89	22.2%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	52	16.7%
)	26	8.4%
(	26	8.4%
d	21	6.8%
g	21	6.8%
m	21	6.8%
L	21	6.8%
/	21	6.8%
0	19	6.1%
1	16	5.1%
Other values (10)	67	21.5%

Hangul

Value	Count	Frequency (%)
이	20	22.5%
성	18	20.2%
하	15	16.9%
여	7	7.9%
남	7	7.9%
상	5	5.6%
음	5	5.6%
출	3	3.4%
검	3	3.4%
불	3	3.4%
Other values (3)	3	3.4%

처리기한 (Processing deadline)
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	9.7%
Missing	0
Missing (%)	0.0%
Memory size	380.0 B

3일 (3 days)	24
즉시 (immediate)	6
2일 (2 days)	1

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Unique

Unique	1
Unique (%)	3.2%

Sample

1st row	즉시
2nd row	3일
3rd row	즉시
4th row	즉시
5th row	즉시

Common Values

Value	Count	Frequency (%)
3일 (3 days)	24	77.4%
즉시 (immediate)	6	19.4%
2일 (2 days)	1	3.2%

Length

[Figure: histogram of category lengths]

Correlations

	구 분	검사항목	참고치 (단위)	처리기한
구 분	1.000	1.000	0.985	1.000
검사항목	1.000	1.000	0.738	1.000
참고치 (단위)	0.985	0.738	1.000	1.000
처리기한	1.000	1.000	1.000	1.000

	처리기한	구 분
처리기한	1.000	0.802
구 분	0.802	1.000

	구 분	처리기한
구 분	1.000	0.802
처리기한	0.802	1.000
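The tab labels that named the coefficient behind each table were lost, so which measure produced 0.802 cannot be recovered here. As one standard example of an association measure between two categorical columns such as `구 분` and `처리기한`, the sketch below computes Cramér's V (without bias correction) from a contingency table. It is an illustration only, not necessarily the computation the report used.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equal-length categorical sequences (no bias correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))          # contingency table as {(a, b): count}
    rows, cols = Counter(x), Counter(y)  # marginal totals
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected  # missing cells count as 0
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0  # undefined when a column is constant; 0.0 is a convention chosen here
    return sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["p", "p", "q", "q"]))  # perfectly associated
print(cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"]))  # independent
```

A value of 1 means one column fully determines the other; 0 means no association in the sample.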

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
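Both views are derived from a boolean null mask. A pandas sketch using a toy frame; `nullity_summary` is a hypothetical helper. In the real data only `참고치 (단위)` has a missing value (row 30, `CBC`).

```python
import pandas as pd


def nullity_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column counts of present and missing values (what the Count view plots)."""
    mask = df.isna()  # True where a cell is missing; this mask is what the Matrix view draws
    return pd.DataFrame({"present": (~mask).sum(), "missing": mask.sum()})


toy = pd.DataFrame({
    "검사항목": ["HbA1C", "CBC"],
    "참고치 (단위)": ["3.5 ~ 5.9이하 (mg/dL)", None],
})
print(nullity_summary(toy))
print(toy[toy.isna().any(axis=1)])  # rows containing at least one missing cell
```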

Sample

First rows (rows 0 to 9) and last rows (rows 21 to 30):

	구 분	검사항목	참고치 (단위)	처리기한
0	당뇨검사	Glucose (혈당)	70 ~ 100 (mg/dL) 공복	즉시
1	당뇨정밀검사	HbA1C	3.5 ~ 5.9이하 (mg/dL)	3일
2	빈혈검사	Hb (헤모글로빈)	남 : 13이상(mg/dL)	즉시
3	빈혈검사	Hb (헤모글로빈)	여 : 12이상(mg/dL)	즉시
4	소변검사	3종	불검출	즉시
5	소변검사	10종	불검출	즉시
6	소변검사	요침사(현미경검경)	불검출	즉시
7	간기능검사	AST	남성 : 37이하(mg/dL)	3일
8	간기능검사	AST	여성 : 31이하(mg/dL)	3일
9	간기능검사	ALT	남성 : 41이하(mg/dL)	3일

	구 분	검사항목	참고치 (단위)	처리기한
21	고지혈증검사	LDL-Cholesterol	130이하(mg/dL)	3일
22	통풍검사	Uric Acid	남성 : 7.0이하 (mg/dL)	3일
23	통풍검사	Uric Acid	여성 : 5.7이하 (mg/dL)	3일
24	B형 간염검사(정밀검사)	HBs-Ag (EIA)	음성 (1.0이하)	3일
25	B형 간염검사(정밀검사)	HBs-Ab (EIA)	양성 (10이상)	3일
26	C형 간염검사(정밀검사)	HCV(EIA)	음성(1.0이하)	3일
27	C형 간염검사(정밀검사)	HCV(EIA)	음성(1.0이하)	3일
28	매독검사	RPR, TPPA	음성	3일
29	에이즈검사	HIV(EIA)	음성(1.0이하)	3일
30	혈액학검사	CBC	<NA>	2일

Duplicate rows

Most frequently occurring

	구 분	검사항목	참고치 (단위)	처리기한	# duplicates
0	C형 간염검사(정밀검사)	HCV(EIA)	음성(1.0이하)	3일	2
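The `# duplicates` column counts how many times an identical row occurs. A pandas sketch (the helper name is hypothetical) that groups fully duplicated rows and counts each group; on the real data it would return the single `HCV(EIA)` row with a count of 2, matching the table above.

```python
import pandas as pd


def duplicate_groups(df: pd.DataFrame) -> pd.DataFrame:
    """One row per distinct duplicated record, with its number of occurrences."""
    dups = df[df.duplicated(keep=False)]  # keep every copy, not just the extra ones
    return (
        dups.groupby(list(df.columns), dropna=False)
        .size()
        .reset_index(name="# duplicates")
        .sort_values("# duplicates", ascending=False, ignore_index=True)
    )


toy = pd.DataFrame({
    "구 분": ["C형 간염검사(정밀검사)", "C형 간염검사(정밀검사)", "매독검사"],
    "처리기한": ["3일", "3일", "3일"],
})
print(duplicate_groups(toy))
```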
