gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	239
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	9.5 KiB
Average record size in memory	40.5 B

Variable types

Text	3
Categorical	2

Dataset

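Description	Data on the pharmaceuticals used at Naju National Hospital under the Ministry of Health and Welfare. It covers the drug code (약품코드), ingredient name in Korean (성분한글명), ingredient name in English (성분영문명), drug classification (약품분류: general or psychotropic), and drug type (약품구분: oral, topical, or injectable).
Author	Naju National Hospital, Ministry of Health and Welfare (보건복지부 국립나주병원)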
URL	https://www.data.go.kr/data/15079898/fileData.do

Alerts

`약품분류` is highly imbalanced (50.4%)	Imbalance
`약품코드` has unique values	Unique
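
The 50.4% reported for `약품분류` is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition (the report itself does not state it) and applies it to the counts 213 and 26 listed for the variable. `imbalance_score` is a hypothetical helper name, not part of any library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts, where H is the Shannon entropy in bits.

    Assumed definition: 0 means perfectly balanced classes, values near 1 mean
    that one class dominates.
    """
    k = len(counts)
    if k < 2:
        # The score is not defined for a single class; 0.0 is returned by convention here.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts for 약품분류 as listed in this report: 일반 = 213, 향정 = 26.
score = imbalance_score([213, 26])
print(f"{score:.1%}")  # 50.4%
```

Under this definition, a perfectly balanced two-class variable scores 0%, so 50.4% reflects the 89.1% / 10.9% split shown for the variable.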

Reproduction

Analysis started	2024-03-15 01:52:52.347531
Analysis finished	2024-03-15 01:52:53.244298
Duration	0.9 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

약품코드
Text

UNIQUE

Distinct	239
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	11
Median length	9
Mean length	5.3305439
Min length	3

Characters and Unicode

Total characters	1274
Distinct characters	36
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
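
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module for the five `약품코드` values listed in the Sample table. The two-letter codes it returns correspond to the category names used in this report (`Lu` is Uppercase Letter, `Nd` is Decimal Number, `Po` is Other Punctuation). `category_counts` is a hypothetical helper, and script and block properties are not covered because `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# First five 약품코드 values from the Sample table of this report.
codes = ["DAAP", "DABILOD10", "DABILOD15", "DACAM", "DACAR2"]


def category_counts(values):
    """Count characters across all values by Unicode general category code."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(codes)
for category, count in counts.most_common():
    print(category, count)
```

Running the same function over the full column should give the per-category totals, assuming the report derives them from the same general-category property.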

Unique

Unique	239
Unique (%)	100.0%

Sample

1st row	DAAP
2nd row	DABILOD10
3rd row	DABILOD15
4th row	DACAM
5th row	DACAR2

Value	Count	Frequency (%)
daap	1	0.4%
drisq2	1	0.4%
dro	1	0.4%
drzp	1	0.4%
dscital	1	0.4%
dscital2	1	0.4%
dscital5	1	0.4%
dscitalod10	1	0.4%
dscitalod20	1	0.4%
dsero	1	0.4%
Other values (229)	229	95.8%

Most occurring characters

Value	Count	Frequency (%)
D	233	18.3%
A	89	7.0%
L	70	5.5%
P	68	5.3%
0	68	5.3%
I	62	4.9%
O	54	4.2%
T	53	4.2%
R	51	4.0%
1	44	3.5%
Other values (26)	482	37.8%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	1037	81.4%
Decimal Number	235	18.4%
Other Punctuation	2	0.2%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
D	233	22.5%
A	89	8.6%
L	70	6.8%
P	68	6.6%
I	62	6.0%
O	54	5.2%
T	53	5.1%
R	51	4.9%
W	38	3.7%
E	37	3.6%
Other values (15)	282	27.2%

Decimal Number

Value	Count	Frequency (%)
0	68	28.9%
1	44	18.7%
5	44	18.7%
2	40	17.0%
4	12	5.1%
3	11	4.7%
8	6	2.6%
6	5	2.1%
7	3	1.3%
9	2	0.9%

Other Punctuation

Value	Count	Frequency (%)
.	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	1037	81.4%
Common	237	18.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
D	233	22.5%
A	89	8.6%
L	70	6.8%
P	68	6.6%
I	62	6.0%
O	54	5.2%
T	53	5.1%
R	51	4.9%
W	38	3.7%
E	37	3.6%
Other values (15)	282	27.2%

Common

Value	Count	Frequency (%)
0	68	28.7%
1	44	18.6%
5	44	18.6%
2	40	16.9%
4	12	5.1%
3	11	4.6%
8	6	2.5%
6	5	2.1%
7	3	1.3%
.	2	0.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1274	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
D	233	18.3%
A	89	7.0%
L	70	5.5%
P	68	5.3%
0	68	5.3%
I	62	4.9%
O	54	4.2%
T	53	4.2%
R	51	4.0%
1	44	3.5%
Other values (26)	482	37.8%

성분한글명
Text

Distinct	222
Distinct (%)	92.9%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	304
Median length	67
Mean length	15.037657
Min length	7

Characters and Unicode

Total characters	3594
Distinct characters	215
Distinct categories	10
Distinct scripts	4
Distinct blocks	4

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	205
Unique (%)	85.8%
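
"Distinct" and "Unique" differ here (222 versus 205): distinct counts the different values, while unique counts only the values that occur exactly once. The sketch below illustrates the difference on the last ten `성분한글명` values shown in the sample rows of this report, where one value appears twice. It is a minimal illustration using `collections.Counter`, not the profiler's own code.

```python
from collections import Counter

# Last ten 성분한글명 values from the sample rows of this report.
names = [
    "염화나트륨 9g/L",
    "팔리페리돈 100mg",
    "팔리페리돈 150mg",
    "팔리페리돈 50mg",
    "팔리페리돈팔미테이트 312mg/mL",
    "팔리페리돈 75mg",
    "팔리페리돈팔미테이트 312mg/mL",
    "할로페리돌 5mg/mL",
    "염산프로파세타몰 1g",
    "치아민염산염 50mg/2mL",
]

value_counts = Counter(names)
n_distinct = len(value_counts)  # number of different values
n_unique = sum(1 for c in value_counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)  # 9 8
```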

Sample

1st row	아세트아미노펜 300mg
2nd row	아리피프라졸 10mg
3rd row	아리피프라졸 15mg
4th row	아캄프로세이트칼슘 333mg
5th row	아카보즈 100mg

Value	Count	Frequency (%)
10mg	23	4.3%
25mg	16	3.0%
100mg	16	3.0%
50mg	15	2.8%
5mg	11	2.1%
2mg	11	2.1%
1mg	10	1.9%
아리피프라졸	10	1.9%
쿠에티아핀	9	1.7%
200mg	8	1.5%
Other values (258)	404	75.8%

Most occurring characters

Value	Count	Frequency (%)
m	303	8.4%
g	298	8.3%
(space)	296	8.2%
0	210	5.8%
5	114	3.2%
1	112	3.1%
염	112	3.1%
2	93	2.6%
산	92	2.6%
리	72	2.0%
Other values (205)	1892	52.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1848	51.4%
Decimal Number	646	18.0%
Lowercase Letter	604	16.8%
Space Separator	296	8.2%
Other Punctuation	90	2.5%
Math Symbol	53	1.5%
Uppercase Letter	45	1.3%
Dash Punctuation	4	0.1%
Open Punctuation	4	0.1%
Close Punctuation	4	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
염	112	6.1%
산	92	5.0%
리	72	3.9%
로	69	3.7%
아	62	3.4%
라	56	3.0%
트	55	3.0%
프	54	2.9%
페	41	2.2%
스	39	2.1%
Other values (179)	1196	64.7%

Decimal Number

Value	Count	Frequency (%)
0	210	32.5%
5	114	17.6%
1	112	17.3%
2	93	14.4%
3	43	6.7%
4	30	4.6%
6	17	2.6%
7	13	2.0%
8	8	1.2%
9	6	0.9%

Other Punctuation

Value	Count	Frequency (%)
/	48	53.3%
.	37	41.1%
%	3	3.3%
:	2	2.2%

Lowercase Letter

Value	Count	Frequency (%)
m	303	50.2%
g	298	49.3%
μ	3	0.5%

Uppercase Letter

Value	Count	Frequency (%)
L	43	95.6%
D	1	2.2%
S	1	2.2%

Math Symbol

Value	Count	Frequency (%)
+	52	98.1%
→	1	1.9%

Space Separator

Value	Count	Frequency (%)
(space)	296	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	4	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1848	51.4%
Common	1097	30.5%
Latin	646	18.0%
Greek	3	0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
염	112	6.1%
산	92	5.0%
리	72	3.9%
로	69	3.7%
아	62	3.4%
라	56	3.0%
트	55	3.0%
프	54	2.9%
페	41	2.2%
스	39	2.1%
Other values (179)	1196	64.7%

Common

Value	Count	Frequency (%)
(space)	296	27.0%
0	210	19.1%
5	114	10.4%
1	112	10.2%
2	93	8.5%
+	52	4.7%
/	48	4.4%
3	43	3.9%
.	37	3.4%
4	30	2.7%
Other values (10)	62	5.7%

Latin

Value	Count	Frequency (%)
m	303	46.9%
g	298	46.1%
L	43	6.7%
D	1	0.2%
S	1	0.2%

Greek

Value	Count	Frequency (%)
μ	3	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1848	51.4%
ASCII	1742	48.5%
None	3	0.1%
Arrows	1	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
m	303	17.4%
g	298	17.1%
(space)	296	17.0%
0	210	12.1%
5	114	6.5%
1	112	6.4%
2	93	5.3%
+	52	3.0%
/	48	2.8%
L	43	2.5%
Other values (14)	173	9.9%

Hangul

Value	Count	Frequency (%)
염	112	6.1%
산	92	5.0%
리	72	3.9%
로	69	3.7%
아	62	3.4%
라	56	3.0%
트	55	3.0%
프	54	2.9%
페	41	2.2%
스	39	2.1%
Other values (179)	1196	64.7%

None

Value	Count	Frequency (%)
μ	3	100.0%

Arrows

Value	Count	Frequency (%)
→	1	100.0%

성분영문명
Text

Distinct	223
Distinct (%)	93.3%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

Length

Max length	533
Median length	77
Mean length	26.435146
Min length	11

Characters and Unicode

Total characters	6318
Distinct characters	68
Distinct categories	9
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
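
The mean lengths reported for the three text variables can be cross-checked from figures already in this report: each equals the variable's total character count divided by the 239 observations. The sketch below performs only that arithmetic, and the dictionary names are illustrative.

```python
# Figures copied from this report.
n_observations = 239
total_characters = {"약품코드": 1274, "성분한글명": 3594, "성분영문명": 6318}

# Mean length = total characters / number of observations.
mean_length = {name: total / n_observations for name, total in total_characters.items()}

for name, value in mean_length.items():
    print(f"{name}\t{value:.6f}")
```

The median, minimum, and maximum lengths cannot be recovered this way because they depend on the individual values.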

Unique

Unique	207
Unique (%)	86.6%

Sample

1st row	Acetaminophen 300mg
2nd row	Aripiprazole 10mg
3rd row	Aripiprazole 15mg
4th row	Acamprosate Calcium 333mg
5th row	Acarbose 100mg

Value	Count	Frequency (%)
hydrochloride	47	6.8%
10mg	22	3.2%
25mg	16	2.3%
100mg	16	2.3%
50mg	15	2.2%
sodium	13	1.9%
5mg	11	1.6%
2mg	11	1.6%
aripiprazole	10	1.4%
1mg	10	1.4%
Other values (307)	523	75.4%
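
The counts in the table above add up to more than the 239 rows and the values are lowercased, which suggests that they are lowercased, whitespace-separated words rather than whole cell values. The sketch below assumes that tokenization (an inference from the table, not something the report states) and applies it to the five sample rows of `성분영문명`. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter

# First five 성분영문명 values from the Sample table of this report.
rows = [
    "Acetaminophen 300mg",
    "Aripiprazole 10mg",
    "Aripiprazole 15mg",
    "Acamprosate Calcium 333mg",
    "Acarbose 100mg",
]


def word_frequencies(values):
    """Return (word, count, share) tuples for lowercased whitespace-separated words."""
    words = [word for value in values for word in value.lower().split()]
    total = len(words)
    return [(word, count, count / total) for word, count in Counter(words).most_common()]


table = word_frequencies(rows)
for word, count, share in table[:3]:
    print(f"{word}\t{count}\t{share:.1%}")
```

In this sample only `aripiprazole` occurs more than once, so it leads the table, much as frequent strengths such as `10mg` lead the full-column table.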

Most occurring characters

Value	Count	Frequency (%)
(space)	474	7.5%
i	466	7.4%
e	459	7.3%
m	435	6.9%
o	398	6.3%
a	341	5.4%
r	319	5.0%
g	310	4.9%
l	270	4.3%
n	263	4.2%
Other values (58)	2583	40.9%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	4555	72.1%
Decimal Number	645	10.2%
Uppercase Letter	486	7.7%
Space Separator	474	7.5%
Other Punctuation	97	1.5%
Math Symbol	52	0.8%
Dash Punctuation	5	0.1%
Close Punctuation	2	< 0.1%
Open Punctuation	2	< 0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
i	466	10.2%
e	459	10.1%
m	435	9.5%
o	398	8.7%
a	341	7.5%
r	319	7.0%
g	310	6.8%
l	270	5.9%
n	263	5.8%
d	223	4.9%
Other values (16)	1071	23.5%

Uppercase Letter

Value	Count	Frequency (%)
H	66	13.6%
L	64	13.2%
C	52	10.7%
A	51	10.5%
P	36	7.4%
S	33	6.8%
M	28	5.8%
D	28	5.8%
T	18	3.7%
B	17	3.5%
Other values (13)	93	19.1%

Decimal Number

Value	Count	Frequency (%)
0	210	32.6%
5	115	17.8%
1	112	17.4%
2	92	14.3%
3	43	6.7%
4	30	4.7%
6	17	2.6%
7	13	2.0%
8	8	1.2%
9	5	0.8%

Other Punctuation

Value	Count	Frequency (%)
/	48	49.5%
.	44	45.4%
%	4	4.1%
&	1	1.0%

Space Separator

Value	Count	Frequency (%)
(space)	474	100.0%

Math Symbol

Value	Count	Frequency (%)
+	52	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	5	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	5037	79.7%
Common	1277	20.2%
Greek	4	0.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
i	466	9.3%
e	459	9.1%
m	435	8.6%
o	398	7.9%
a	341	6.8%
r	319	6.3%
g	310	6.2%
l	270	5.4%
n	263	5.2%
d	223	4.4%
Other values (37)	1553	30.8%

Common

Value	Count	Frequency (%)
(space)	474	37.1%
0	210	16.4%
5	115	9.0%
1	112	8.8%
2	92	7.2%
+	52	4.1%
/	48	3.8%
.	44	3.4%
3	43	3.4%
4	30	2.3%
Other values (9)	57	4.5%

Greek

Value	Count	Frequency (%)
μ	3	75.0%
β	1	25.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	6314	99.9%
None	4	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	474	7.5%
i	466	7.4%
e	459	7.3%
m	435	6.9%
o	398	6.3%
a	341	5.4%
r	319	5.1%
g	310	4.9%
l	270	4.3%
n	263	4.2%
Other values (56)	2579	40.8%

None

Value	Count	Frequency (%)
μ	3	75.0%
β	1	25.0%

약품분류
Categorical

IMBALANCE

Distinct	2
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

일반	213
향정	26

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	일반
2nd row	일반
3rd row	일반
4th row	일반
5th row	일반

Common Values

Value	Count	Frequency (%)
일반	213	89.1%
향정	26	10.9%

약품구분
Categorical

Distinct	3
Distinct (%)	1.3%
Missing	0
Missing (%)	0.0%
Memory size	2.0 KiB

내복약	193
주사약	36
외용약	10

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	내복약
2nd row	내복약
3rd row	내복약
4th row	내복약
5th row	내복약

Common Values

Value	Count	Frequency (%)
내복약	193	80.8%
주사약	36	15.1%
외용약	10	4.2%

Correlations

	약품분류	약품구분
약품분류	1.000	0.034
약품구분	0.034	1.000

	약품구분	약품분류
약품구분	1.000	0.056
약품분류	0.056	1.000

	약품분류	약품구분
약품분류	1.000	0.056
약품구분	0.056	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First 10 rows, followed by the last 10 rows:

	약품코드	성분한글명	성분영문명	약품분류	약품구분
0	DAAP	아세트아미노펜 300mg	Acetaminophen 300mg	일반	내복약
1	DABILOD10	아리피프라졸 10mg	Aripiprazole 10mg	일반	내복약
2	DABILOD15	아리피프라졸 15mg	Aripiprazole 15mg	일반	내복약
3	DACAM	아캄프로세이트칼슘 333mg	Acamprosate Calcium 333mg	일반	내복약
4	DACAR2	아카보즈 100mg	Acarbose 100mg	일반	내복약
5	DACTI	슈도에페드린염산염 60mg+트리프롤리딘염산염수화물 2.5mg	Pseudoephedrine Hydrochloride 60mg+Triprolidine Hydrochloride Hydrate 2.5mg	일반	내복약
6	DALP	알프라졸람 0.25mg	Alprazolam 0.25mg	향정	내복약
7	DALP125	알프라졸람 0.125mg	Alprazolam 0.125mg	향정	내복약
8	DALP5	알프라졸람 0.5mg	Alprazolam 0.5mg	향정	내복약
9	DAMA	글리메피리드 2mg	Glimepiride 2mg	일반	내복약

	약품코드	성분한글명	성분영문명	약품분류	약품구분
229	WNS1	염화나트륨 9g/L	Sodium Chloride 9g/L	일반	주사약
230	WPAL100	팔리페리돈 100mg	Paliperidone 100mg	일반	주사약
231	WPAL150	팔리페리돈 150mg	Paliperidone 150mg	일반	주사약
232	WPAL50	팔리페리돈 50mg	Paliperidone 50mg	일반	주사약
233	WPAL546	팔리페리돈팔미테이트 312mg/mL	Paliperidone palmitate 312mg/mL	일반	주사약
234	WPAL75	팔리페리돈 75mg	Paliperidone 75mg	일반	주사약
235	WPAL819	팔리페리돈팔미테이트 312mg/mL	Paliperidone palmitate 312mg/mL	일반	주사약
236	WPERI	할로페리돌 5mg/mL	Haloperidol 5mg/1mL	일반	주사약
237	WPPCT	염산프로파세타몰 1g	Propacetamol Hydrochloride 1g	일반	주사약
238	WTHI	치아민염산염 50mg/2mL	Thiamine Hydrochloride 50mg/2mL	일반	주사약
