gimi9 Pandas Profiling

Overview

Dataset statistics

Number of variables	1
Number of observations	67
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	8
Duplicate rows (%)	11.9%
Total size in memory	668.0 B
Average record size in memory	10.0 B
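The percentage rows and the average record size are plain ratios over the 67 observations. With a single variable, cells and rows coincide. A quick arithmetic check using only the figures in the table above:

```python
# Figures copied from the "Dataset statistics" table above.
N_OBSERVATIONS = 67
MISSING_CELLS = 0
DUPLICATE_ROWS = 8
TOTAL_MEMORY_BYTES = 668.0


def pct(part: float, whole: float) -> str:
    """Format part / whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


missing_pct = pct(MISSING_CELLS, N_OBSERVATIONS)  # one variable, so cells == rows
duplicate_pct = pct(DUPLICATE_ROWS, N_OBSERVATIONS)
avg_record_bytes = TOTAL_MEMORY_BYTES / N_OBSERVATIONS

print(missing_pct, duplicate_pct, f"{avg_record_bytes:.1f} B")  # 0.0% 11.9% 10.0 B
```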

Variable types

Text	1

Dataset

Description	Apartment jeonse and monthly rent data (아파트 전월세 자료 현황)
Author	Gyeonggi Province (경기도)
URL	https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=18SI9BJ6M874FNBOOOTU23125858&infSeq=2

Alerts

Dataset has 8 (11.9%) duplicate rows	Duplicates

Reproduction

Analysis started	2023-12-10 21:15:17.624686
Analysis finished	2023-12-10 21:15:17.796396
Duration	0.17 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

Variables

<!DOCTYPE html>
Text

Distinct	58
Distinct (%)	86.6%
Missing	0
Missing (%)	0.0%
Memory size	668.0 B

Length

Max length	160
Median length	71
Mean length	41.328358
Min length	6
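Each length figure summarises len() of every row, and the mean is simply total characters over observations (2769 / 67 ≈ 41.328358, matching Mean length). Below is a minimal sketch of the summary. It assumes the column is available as a list of strings, and the toy rows are illustrative only. The sketch uses a plain per-row median, which is an assumption, so it is not guaranteed to match how ydata-profiling computes its Median length.

```python
import statistics


def length_summary(values: list[str]) -> dict[str, float]:
    """Per-row string lengths summarised as max, median, mean and min."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Toy rows for illustration only, not the profiled column.
toy_rows = ["<head>", '<meta charset="utf-8" />', "</p>"]
print(length_summary(toy_rows))

# Consistency check on the report itself: total characters / observations.
print(round(2769 / 67, 6))  # 41.328358
```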

Characters and Unicode

Total characters	2769
Distinct characters	133
Distinct categories	12
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
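Each character carries a two-letter General Category code, which Python's unicodedata.category returns (for example "Ll" for a lowercase letter). Below is a minimal sketch of the "Most occurring categories" tally. It assumes the column is a list of strings. The long names mirror the labels in this report, and the toy input is not from the profiled data.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Sm": "Math Symbol",
    "Sc": "Currency Symbol",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters across all values by Unicode General Category."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not named above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["<title>오류</title>"]).most_common())
# [('Lowercase Letter', 10), ('Math Symbol', 4), ('Other Letter', 2), ('Other Punctuation', 1)]
```

Note that "<", ">", "=" and "|" are all Math Symbol (Sm), which is why that category ranks so high in an HTML-heavy column.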

Unique

Unique	50
Unique (%)	74.6%
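Distinct counts every different value. Unique counts only values that occur exactly once. The figures agree: of the 58 distinct values, 8 occur more than once, leaving 58 - 8 = 50 unique ones. Here is a sketch of both counts on a toy column:

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (distinct, unique): different values, and values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Toy column for illustration only.
toy = ["</p>", "</p>", "</p>", "}", "}", "<head>", "<body>"]
print(distinct_and_unique(toy))  # (4, 2)
```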

Sample

1st row	<html lang="ko">
2nd row	<head>
3rd row	<title>오류 메세지 | 경기데이터드림</title>
4th row	<meta charset="utf-8" />
5th row	<meta http-equiv="X-UA-Compatible" content="IE=edge" />

Most occurring words

Value	Count	Frequency (%)
	40	17.2%
script	9	3.9%
p	6	2.6%
type="text/javascript	6	2.6%
div	6	2.6%
페이지를	5	2.1%
type="text/css	5	2.1%
link	5	2.1%
rel="stylesheet	5	2.1%
if	4	1.7%
Other values (106)	142	60.9%
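The words table reads like a frequency count over whitespace-separated tokens with surrounding punctuation trimmed. For example, `type="text/javascript` keeps its inner quote but loses the trailing one. The sketch below assumes that rule. It is an approximation and not ydata-profiling's actual tokenizer. Under this rule, a token made only of punctuation, such as `/>`, becomes an empty string, which may account for the blank top entry. word_table is a hypothetical helper.

```python
import string
from collections import Counter


def word_table(values: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Top `top` words with counts and percentages, plus an 'Other values (n)' row."""
    words = Counter()
    for value in values:
        for token in value.split():
            # Assumed tokenizer: whitespace split, then trim ASCII punctuation at both ends.
            words[token.strip(string.punctuation)] += 1
    total = sum(words.values())
    head = words.most_common(top)
    rows = [(w, n, round(100 * n / total, 1)) for w, n in head]
    if len(words) > top:
        rest = total - sum(n for _, n in head)
        rows.append((f"Other values ({len(words) - top})", rest, round(100 * rest / total, 1)))
    return rows


# Toy rows for illustration only.
toy = ['<script type="text/javascript">', "</script>", "<p>", "/>"]
for row in word_table(toy, top=2):
    print(row)
```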

Most occurring characters

Value	Count	Frequency (%)
(space)	540	19.5%
t	162	5.9%
e	153	5.5%
r	128	4.6%
s	122	4.4%
"	112	4.0%
/	105	3.8%
o	91	3.3%
i	83	3.0%
a	77	2.8%
Other values (123)	1196	43.2%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	1439	52.0%
Space Separator	540	19.5%
Other Punctuation	287	10.4%
Math Symbol	195	7.0%
Other Letter	174	6.3%
Uppercase Letter	31	1.1%
Dash Punctuation	25	0.9%
Open Punctuation	23	0.8%
Connector Punctuation	19	0.7%
Close Punctuation	19	0.7%
Other values (2)	17	0.6%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
이	10	5.7%
을	6	3.4%
지	6	3.4%
니	6	3.4%
다	6	3.4%
하	6	3.4%
시	5	2.9%
페	5	2.9%
를	5	2.9%
기	4	2.3%
Other values (57)	115	66.1%

Lowercase Letter

Value	Count	Frequency (%)
t	162	11.3%
e	153	10.6%
r	128	8.9%
s	122	8.5%
o	91	6.3%
i	83	5.8%
a	77	5.4%
l	76	5.3%
p	71	4.9%
n	69	4.8%
Other values (15)	407	28.3%

Uppercase Letter

Value	Count	Frequency (%)
A	6	19.4%
E	4	12.9%
Q	4	12.9%
T	3	9.7%
I	3	9.7%
N	2	6.5%
P	2	6.5%
R	2	6.5%
X	1	3.2%
U	1	3.2%
Other values (3)	3	9.7%

Other Punctuation

Value	Count	Frequency (%)
"	112	39.0%
/	105	36.6%
.	50	17.4%
&	8	2.8%
!	6	2.1%
;	4	1.4%
:	1	0.3%
?	1	0.3%

Decimal Number

Value	Count	Frequency (%)
1	7	43.8%
0	4	25.0%
9	2	12.5%
8	1	6.2%
2	1	6.2%
5	1	6.2%

Math Symbol

Value	Count	Frequency (%)
>	70	35.9%
<	68	34.9%
=	56	28.7%
|	1	0.5%

Open Punctuation

Value	Count	Frequency (%)
(	12	52.2%
{	7	30.4%
[	4	17.4%

Close Punctuation

Value	Count	Frequency (%)
)	10	52.6%
}	5	26.3%
]	4	21.1%

Space Separator

Value	Count	Frequency (%)
(space)	540	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	25	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	19	100.0%

Currency Symbol

Value	Count	Frequency (%)
$	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	1470	53.1%
Common	1125	40.6%
Hangul	174	6.3%
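In this column the script counts line up with the category counts. Latin (1470) is Lowercase Letter (1439) plus Uppercase Letter (31), and Hangul (174) equals Other Letter. Everything else (spaces, punctuation, symbols, digits) falls under Common. Python's standard library exposes no Script property, so the sketch below approximates it from character names. approx_script is a hypothetical helper, and the heuristic would mislabel letters from other scripts, such as Cyrillic, as Common.

```python
import unicodedata
from collections import Counter


def approx_script(ch: str) -> str:
    """Rough stand-in for the Unicode Script property, based on character names."""
    name = unicodedata.name(ch, "")  # "" for unnamed characters such as controls
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"


print(Counter(approx_script(ch) for ch in "<p>오류</p>"))
```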

Most frequent character per script

Hangul

Value	Count	Frequency (%)
이	10	5.7%
을	6	3.4%
지	6	3.4%
니	6	3.4%
다	6	3.4%
하	6	3.4%
시	5	2.9%
페	5	2.9%
를	5	2.9%
기	4	2.3%
Other values (57)	115	66.1%

Latin

Value	Count	Frequency (%)
t	162	11.0%
e	153	10.4%
r	128	8.7%
s	122	8.3%
o	91	6.2%
i	83	5.6%
a	77	5.2%
l	76	5.2%
p	71	4.8%
n	69	4.7%
Other values (28)	438	29.8%

Common

Value	Count	Frequency (%)
(space)	540	48.0%
"	112	10.0%
/	105	9.3%
>	70	6.2%
<	68	6.0%
=	56	5.0%
.	50	4.4%
-	25	2.2%
_	19	1.7%
(	12	1.1%
Other values (18)	68	6.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2595	93.7%
Hangul	174	6.3%
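Blocks are fixed code-point ranges, so the split needs only ord(). ASCII covers U+0000 to U+007F. The Hangul row presumably corresponds to the Hangul Syllables block, U+AC00 to U+D7AF, since every Hangul character listed is a precomposed syllable. The two counts sum to the 2769 total characters. Here is a sketch with a hypothetical approx_block helper that knows only these two ranges:

```python
from collections import Counter


def approx_block(ch: str) -> str:
    """Map a character to the two blocks seen in this report, by code point range."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Basic Latin, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables, U+AC00..U+D7AF
    return "Other"  # any other block (not needed for this column)


print(Counter(approx_block(ch) for ch in "<p>오류</p>"))
```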

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	540	20.8%
t	162	6.2%
e	153	5.9%
r	128	4.9%
s	122	4.7%
"	112	4.3%
/	105	4.0%
o	91	3.5%
i	83	3.2%
a	77	3.0%
Other values (56)	1022	39.4%

Hangul

Value	Count	Frequency (%)
이	10	5.7%
을	6	3.4%
지	6	3.4%
니	6	3.4%
다	6	3.4%
하	6	3.4%
시	5	2.9%
페	5	2.9%
를	5	2.9%
기	4	2.3%
Other values (57)	115	66.1%

Missing values

Count: A simple visualization of nullity by column.

Matrix: A nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
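The Count chart tallies how many values are present in each column. Here the single column would reach 67 of 67, since Missing cells is 0. The sketch below shows that tally. It assumes rows are stored as dicts (a hypothetical layout) and treats None, an absent key, or a float NaN as missing.

```python
import math


def non_missing_counts(rows: list[dict], columns: list[str]) -> dict[str, int]:
    """Per-column count of present values; None, absent keys and float NaN count as missing."""

    def is_missing(value) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {col: sum(not is_missing(row.get(col)) for row in rows) for col in columns}


# Toy rows for illustration only.
toy_rows = [{"a": 1, "b": None}, {"a": float("nan"), "b": 2}, {"a": 3}]
print(non_missing_counts(toy_rows, ["a", "b"]))  # {'a': 2, 'b': 1}
```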

Sample

First rows

	<!DOCTYPE html>
0	<html lang="ko">
1	<head>
2	<title>오류 메세지 | 경기데이터드림</title>
3	<meta charset="utf-8" />
4	<meta http-equiv="X-UA-Compatible" content="IE=edge" />
5	<meta name="format-detection" content="telephone=no" />
6	<meta name="copyright" content="Gyeonggi Province. All Rights Reserved." />
7	<link rel="stylesheet" type="text/css" href="/css/ggportal/base.css" />
8	<link rel="stylesheet" type="text/css" href="/css/ggportal/layout.css" />
9	<link rel="stylesheet" type="text/css" href="/css/ggportal/remainder.css?ver=1.0" />

Last rows

	<!DOCTYPE html>
57	</p>
58	<span class="logo">
59	<a href="http://www.gg.go.kr" target="_blank" title="새창으로 이동"><img src="/img/ggportal/desktop/remainder/logo_footer_1.png" alt="셰게속의 경기도" /></a>
60	</span>
61	</div>
62	</div>
63	</div>
64	<!-- // layout_A -->
65	</body>
66	</html>
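The sampled rows appear to be lines of an HTML page rather than lease records. The page title "오류 메세지" means "error message", and one repeated line reads "요청하신 페이지를 찾을 수 없습니다. 이용에 불편을 드려 죄송합니다." ("The page you requested could not be found. We apologize for the inconvenience."). The page's first line, `<!DOCTYPE html>`, became the column header. Below is a minimal guard. It assumes the download is available as raw bytes before it is parsed as a table, and looks_like_html is a hypothetical helper name.

```python
# Leading bytes that mark an HTML document (compared case-insensitively).
HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(payload: bytes) -> bool:
    """Heuristic: True if a downloaded payload starts like an HTML page."""
    # Drop a UTF-8 byte-order mark and leading whitespace, then inspect the start.
    head = payload.removeprefix(b"\xef\xbb\xbf").lstrip()[:64].lower()
    return head.startswith(HTML_MARKERS)


print(looks_like_html(b'<!DOCTYPE html>\n<html lang="ko">'))  # True
print(looks_like_html(b"a,b\n1,2\n"))  # False
```

A check like this only flags the symptom. The fix is to re-fetch the data from the portal URL listed under Dataset.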

Duplicate rows

Most frequently occurring

	<!DOCTYPE html>	# duplicates
1	</p>	3
0	setTimeout(function() {	2
2	}	2
3	요청하신 페이지를 찾을 수 없습니다. 이용에 불편을 드려 죄송합니다.<br /><br />	2
4	}	2
5	<!--[if lt IE 9]>	2
6	<![endif]-->	2
7	</script>	2
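The 8 reported under Duplicate rows matches the eight entries listed here, with each repeated value counted once. Counting every surplus copy instead would give (3 - 1) + 7 × (2 - 1) = 9, which is 67 observations minus 58 distinct values. That ydata-profiling counts groups is inferred from these figures, and the sketch reproduces that reading on toy data. The two `}` entries are presumably distinct strings that differ only in whitespace not visible in this rendering.

```python
from collections import Counter


def duplicate_groups(values: list[str]) -> list[tuple[str, int]]:
    """Values occurring more than once, with their counts, most frequent first."""
    counts = Counter(values)
    groups = [(v, n) for v, n in counts.items() if n > 1]
    groups.sort(key=lambda vn: vn[1], reverse=True)
    return groups


# Toy column for illustration only.
toy = ["</p>", "</p>", "</p>", "}", "}", "<head>"]
groups = duplicate_groups(toy)
print(groups)  # [('</p>', 3), ('}', 2)]
print(len(groups))  # 2 duplicated values
print(sum(n - 1 for _, n in groups))  # 3 surplus copies
```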
