gimi9 Pandas Profiling

Dataset statistics

Number of variables	1
Number of observations	429
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	10
Duplicate rows (%)	2.3%
Total size in memory	3.5 KiB
Average record size in memory	8.3 B

Variable types

Text	1

Dataset

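Description	Provides the names of roughly 400 medical institutions located in Gangnam-gu, Seoul (in Arabic). For details, please contact the Tourism Promotion Division of Gangnam-gu, Seoul.
Author	Gangnam-gu, Seoul Metropolitan City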
URL	https://www.data.go.kr/data/15072589/fileData.do

Alerts

Dataset has 10 (2.3%) duplicate rows

Duplicates

Reproduction

Analysis started	2023-12-12 23:15:09.787595
Analysis finished	2023-12-12 23:15:09.936624
Duration	0.15 seconds
Software version	ydata-profiling v4.5.1

의료기관
Text

Distinct	412
Distinct (%)	96.0%
Missing	0
Missing (%)	0.0%
Memory size	3.5 KiB

Length

Max length	56
Median length	39
Mean length	22.417249
Min length	3
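
As a rough illustration of how length figures like these can be derived, the sketch below computes the same four statistics with Python's standard library. It assumes that "length" means the number of Unicode code points (Python's `len`), which may not match the profiler's own definition. `length_summary` is a hypothetical helper, and the three rows are values that appear in this report. The reported mean is consistent with total characters divided by observations (9617 / 429).

```python
from statistics import mean, median


def length_summary(values):
    """Return max, median, mean and min length (in code points) of the strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Three values taken from the report; any list of strings would do.
rows = ["الجبين", "فندق TRIA", "فندق JBIS"]
summary = length_summary(rows)

# Cross-check of the reported mean: total characters / number of observations.
report_mean = 9617 / 429
print(summary, round(report_mean, 6))
```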

Characters and Unicode

Total characters	9617
Distinct characters	110
Distinct categories	9
Distinct scripts	4
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
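
A minimal sketch of this kind of analysis, using Python's built-in `unicodedata` module to count general categories (`Lo` = Other Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator). The helper name `category_counts` is hypothetical, and this is not necessarily how the profiler computes its tables. The standard library exposes categories only; script and block lookups would need additional data.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


# Two values that appear in this report.
sample = ["فندق TRIA", "عيادة ريبيلو"]
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```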

Unique

Unique	402
Unique (%)	93.7%

Sample

1st row	تاكو لجراحة التجميل
2nd row	مركز برايت سانت ماري للعيون
3rd row	عيادة هيونداي للتجميل لجراحة التجميل
4th row	جراحة تجميل الوجه
5th row	عيادة جراحة الثدي

Value	Count	Frequency (%)
عيادة	186	11.8%
التجميل	94	6.0%
جراحة	68	4.3%
لجراحة	46	2.9%
مستشفى	39	2.5%
التجميلية	31	2.0%
لطب	26	1.7%
مركز	25	1.6%
الجلدية	23	1.5%
سيول	23	1.5%
Other values (601)	1010	64.3%
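
The counts in the table above sum to 1571 words, which is consistent with splitting each value on whitespace and counting the resulting tokens. Whether the profiler tokenizes exactly this way is an assumption. The sketch below applies that approach to three of the sample rows; `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(values):
    """Split each value on whitespace and count how often each token occurs."""
    return Counter(token for text in values for token in text.split())


# Three of the sample rows shown in this report.
rows = [
    "تاكو لجراحة التجميل",
    "عيادة هيونداي للتجميل لجراحة التجميل",
    "عيادة جراحة الثدي",
]
word_counts = word_frequencies(rows)
total_words = sum(word_counts.values())

# Print each token with its count and share of all tokens.
for word, count in word_counts.most_common():
    print(word, count, f"{100 * count / total_words:.1f}%")
```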

Most occurring characters

Value	Count	Frequency (%)
(space)	1142	11.9%
ا	1130	11.8%
ل	948	9.9%
ي	844	8.8%
ج	491	5.1%
ة	461	4.8%
ن	425	4.4%
م	396	4.1%
و	374	3.9%
ر	358	3.7%
Other values (100)	3048	31.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	7714	80.2%
Space Separator	1142	11.9%
Lowercase Letter	395	4.1%
Uppercase Letter	313	3.3%
Decimal Number	18	0.2%
Other Punctuation	17	0.2%
Open Punctuation	7	0.1%
Close Punctuation	7	0.1%
Dash Punctuation	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
ا	1130	14.6%
ل	948	12.3%
ي	844	10.9%
ج	491	6.4%
ة	461	6.0%
ن	425	5.5%
م	396	5.1%
و	374	4.8%
ر	358	4.6%
د	333	4.3%
Other values (33)	1954	25.3%

Uppercase Letter

Value	Count	Frequency (%)
A	27	8.6%
I	22	7.0%
O	20	6.4%
C	19	6.1%
J	17	5.4%
S	17	5.4%
E	17	5.4%
G	16	5.1%
U	16	5.1%
N	14	4.5%
Other values (16)	128	40.9%

Lowercase Letter

Value	Count	Frequency (%)
e	58	14.7%
n	48	12.2%
o	38	9.6%
a	34	8.6%
i	32	8.1%
l	24	6.1%
u	24	6.1%
g	22	5.6%
r	18	4.6%
d	15	3.8%
Other values (14)	82	20.8%

Decimal Number

Value	Count	Frequency (%)
1	3	16.7%
2	3	16.7%
0	2	11.1%
6	2	11.1%
8	2	11.1%
3	2	11.1%
9	2	11.1%
7	1	5.6%
5	1	5.6%

Other Punctuation

Value	Count	Frequency (%)
،	6	35.3%
.	5	29.4%
&	5	29.4%
/	1	5.9%

Space Separator

Value	Count	Frequency (%)
(space)	1142	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	7	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	7	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Arabic	7707	80.1%
Common	1195	12.4%
Latin	708	7.4%
Hangul	7	0.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
e	58	8.2%
n	48	6.8%
o	38	5.4%
a	34	4.8%
i	32	4.5%
A	27	3.8%
l	24	3.4%
u	24	3.4%
I	22	3.1%
g	22	3.1%
Other values (40)	379	53.5%

Arabic

Value	Count	Frequency (%)
ا	1130	14.7%
ل	948	12.3%
ي	844	11.0%
ج	491	6.4%
ة	461	6.0%
ن	425	5.5%
م	396	5.1%
و	374	4.9%
ر	358	4.6%
د	333	4.3%
Other values (26)	1947	25.3%

Common

Value	Count	Frequency (%)
(space)	1142	95.6%
(	7	0.6%
)	7	0.6%
،	6	0.5%
.	5	0.4%
&	5	0.4%
-	4	0.3%
1	3	0.3%
2	3	0.3%
0	2	0.2%
Other values (7)	11	0.9%

Hangul

Value	Count	Frequency (%)
과	1	14.3%
의	1	14.3%
원	1	14.3%
외	1	14.3%
앤	1	14.3%
성	1	14.3%
형	1	14.3%

Most occurring blocks

Value	Count	Frequency (%)
Arabic	7713	80.2%
ASCII	1897	19.7%
Hangul	7	0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	1142	60.2%
e	58	3.1%
n	48	2.5%
o	38	2.0%
a	34	1.8%
i	32	1.7%
A	27	1.4%
l	24	1.3%
u	24	1.3%
I	22	1.2%
Other values (56)	448	23.6%

Arabic

Value	Count	Frequency (%)
ا	1130	14.7%
ل	948	12.3%
ي	844	10.9%
ج	491	6.4%
ة	461	6.0%
ن	425	5.5%
م	396	5.1%
و	374	4.8%
ر	358	4.6%
د	333	4.3%
Other values (27)	1953	25.3%

Hangul

Value	Count	Frequency (%)
과	1	14.3%
의	1	14.3%
원	1	14.3%
외	1	14.3%
앤	1	14.3%
성	1	14.3%
형	1	14.3%

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	의료기관
0	تاكو لجراحة التجميل
1	مركز برايت سانت ماري للعيون
2	عيادة هيونداي للتجميل لجراحة التجميل
3	جراحة تجميل الوجه
4	عيادة جراحة الثدي
5	جراحة البانتنج البلاستيكية
6	الجراحة التجميلية الأولى
7	أوبرا لجراحة التجميل
8	عيادة الأسنان SOJOONG
9	للحصول على جراحة التجميل

Last rows

	의료기관
419	الجبين
420	فندق جانجنام فاميلي
421	فندق جراموس
422	فندق المصممين
423	فندق TRIA
424	بست ويسترن بريمير جانجنام
425	نوفوتيل سيول أمباسادور جانجنام
426	ريتز كارلتون سيول
427	فندق JBIS
428	مركز Oakwood Premier Coex

Most frequently occurring

	의료기관	# duplicates
1	جانجنام سيفيرانس هوسيبيتال	6
8	مركز سامسونج سيول الطبي	5
0	اقتراح جراحة التجميل	2
2	جراحة التجميل JJ	2
3	جراحة تجميل الوجه	2
4	جراحة لافيان التجميلية	2
5	جلوفي لجراحة التجميل	2
6	عيادة ريبيلو	2
7	مركز CHA Gangnam الطبي ، جامعة CHA	2
9	مستشفى سو لطب الاسنان	2
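
The duplicate counts in this table tie together the headline figures of the report: the ten duplicated values cover 27 rows (6 + 5 + 8 × 2), and adding the 402 values that occur once gives 429 observations and 412 distinct values. The sketch below reproduces that arithmetic on synthetic stand-in values (the `dup-*` and `single-*` strings are placeholders, not real data) and `summarize` is a hypothetical helper. It treats "duplicate rows" as the number of distinct values that occur more than once, which matches the numbers here but is an assumption about the profiler's definition.

```python
from collections import Counter


def summarize(values):
    """Return row, distinct, unique and duplicated-value counts for a column."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "rows": len(values),
        "distinct": distinct,
        "unique": unique,
        "duplicated_values": distinct - unique,
    }


# Group sizes from the "# duplicates" column above, plus 402 one-off values.
dup_sizes = [6, 5, 2, 2, 2, 2, 2, 2, 2, 2]
values = [f"dup-{i}" for i, size in enumerate(dup_sizes) for _ in range(size)]
values += [f"single-{i}" for i in range(402)]

stats = summarize(values)
percentages = {
    key: round(100 * stats[key] / stats["rows"], 1)
    for key in ("distinct", "unique", "duplicated_values")
}
print(stats, percentages)
```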
