Overview

Dataset statistics

Number of variables: 1
Number of observations: 67
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 11.9%
Total size in memory: 668.0 B
Average record size in memory: 10.0 B

Variable types

Text: 1

Alerts

Dataset has 8 (11.9%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-10 21:15:17.624686
Analysis finished: 2023-12-10 21:15:17.796396
Duration: 0.17 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables


Text

Distinct: 58
Distinct (%): 86.6%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 B

Length

Max length: 160
Median length: 71
Mean length: 41.328358
Min length: 6
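
The length figures above are descriptive statistics over the character length of each row. A minimal sketch using only Python's standard library follows; it assumes the five rows listed under Sample as stand-in data, so its results will not match the figures for the full 67-row column, and it is not necessarily how the profiling tool computes them.

```python
import statistics

# Stand-in data: the five sample rows shown in this report, not the full column.
rows = [
    '<html lang="ko">',
    "<head>",
    "<title>오류 메세지 | 경기데이터드림</title>",
    '<meta charset="utf-8" />',
    '<meta http-equiv="X-UA-Compatible" content="IE=edge" />',
]

# len() counts Unicode code points, so each Hangul syllable counts as one character.
lengths = [len(row) for row in rows]

length_stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
total_characters = sum(lengths)

print(length_stats)
print("total characters:", total_characters)
```

The mean is the total character count divided by the number of rows. For the full column that is 2769 / 67 ≈ 41.328358, which agrees with the reported mean length.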

Characters and Unicode

Total characters: 2769
Distinct characters: 133
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
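
As an illustration of that idea, the sketch below tallies the Unicode general category of each character with Python's standard `unicodedata` module. The input is a single sample row from this report, used as stand-in data; the module returns two-letter category codes rather than the long names used in the tables of this report.

```python
import unicodedata
from collections import Counter

# Stand-in data: one sample row from this report, not the full column.
text = "<title>오류 메세지 | 경기데이터드림</title>"

# unicodedata.category() returns the two-letter general category code,
# e.g. "Ll" (Lowercase Letter), "Lo" (Other Letter), "Zs" (Space Separator),
# "Sm" (Math Symbol), "Po" (Other Punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in text)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Lo" (Other Letter), and the characters `<`, `>`, `=` and `|` fall under "Sm" (Math Symbol), which is why markup-heavy text shows a large Math Symbol share.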

Unique

Unique: 50
Unique (%): 74.6%

Sample

1st row: <html lang="ko">
2nd row: <head>
3rd row: <title>오류 메세지 | 경기데이터드림</title>
4th row: <meta charset="utf-8" />
5th row: <meta http-equiv="X-UA-Compatible" content="IE=edge" />
Value                  Count  Frequency (%)
(value not shown)      40     17.2%
script                 9      3.9%
p                      6      2.6%
type="text/javascript  6      2.6%
div                    6      2.6%
페이지를               5      2.1%
type="text/css         5      2.1%
link                   5      2.1%
rel="stylesheet        5      2.1%
if                     4      1.7%
Other values (106)     142    60.9%

Most occurring characters

Value               Count  Frequency (%)
(space)             540    19.5%
t                   162    5.9%
e                   153    5.5%
r                   128    4.6%
s                   122    4.4%
"                   112    4.0%
/                   105    3.8%
o                   91     3.3%
i                   83     3.0%
a                   77     2.8%
Other values (123)  1196   43.2%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       1439   52.0%
Space Separator        540    19.5%
Other Punctuation      287    10.4%
Math Symbol            195    7.0%
Other Letter           174    6.3%
Uppercase Letter       31     1.1%
Dash Punctuation       25     0.9%
Open Punctuation       23     0.8%
Connector Punctuation  19     0.7%
Close Punctuation      19     0.7%
Other values (2)       17     0.6%

Most frequent character per category

Other Letter
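Value              Count  Frequency (%)
(not shown)        10     5.7%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        4      2.3%
Other values (57)  115    66.1%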
Lowercase Letter
Value              Count  Frequency (%)
t                  162    11.3%
e                  153    10.6%
r                  128    8.9%
s                  122    8.5%
o                  91     6.3%
i                  83     5.8%
a                  77     5.4%
l                  76     5.3%
p                  71     4.9%
n                  69     4.8%
Other values (15)  407    28.3%
Uppercase Letter
Value             Count  Frequency (%)
A                 6      19.4%
E                 4      12.9%
Q                 4      12.9%
T                 3      9.7%
I                 3      9.7%
N                 2      6.5%
P                 2      6.5%
R                 2      6.5%
X                 1      3.2%
U                 1      3.2%
Other values (3)  3      9.7%
Other Punctuation
Value  Count  Frequency (%)
"      112    39.0%
/      105    36.6%
.      50     17.4%
&      8      2.8%
!      6      2.1%
;      4      1.4%
:      1      0.3%
?      1      0.3%
Decimal Number
Value  Count  Frequency (%)
1      7      43.8%
0      4      25.0%
9      2      12.5%
8      1      6.2%
2      1      6.2%
5      1      6.2%
Math Symbol
Value  Count  Frequency (%)
>      70     35.9%
<      68     34.9%
=      56     28.7%
|      1      0.5%
Open Punctuation
Value  Count  Frequency (%)
(      12     52.2%
{      7      30.4%
[      4      17.4%
Close Punctuation
Value  Count  Frequency (%)
)      10     52.6%
}      5      26.3%
]      4      21.1%
Space Separator
Value    Count  Frequency (%)
(space)  540    100.0%
Dash Punctuation
Value    Count  Frequency (%)
-        25     100.0%
Connector Punctuation
Value    Count  Frequency (%)
_        19     100.0%
Currency Symbol
Value    Count  Frequency (%)
$        1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   1470   53.1%
Common  1125   40.6%
Hangul  174    6.3%

Most frequent character per script

Hangul
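Value              Count  Frequency (%)
(not shown)        10     5.7%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        4      2.3%
Other values (57)  115    66.1%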
Latin
Value              Count  Frequency (%)
t                  162    11.0%
e                  153    10.4%
r                  128    8.7%
s                  122    8.3%
o                  91     6.2%
i                  83     5.6%
a                  77     5.2%
l                  76     5.2%
p                  71     4.8%
n                  69     4.7%
Other values (28)  438    29.8%
Common
Value              Count  Frequency (%)
(space)            540    48.0%
"                  112    10.0%
/                  105    9.3%
>                  70     6.2%
<                  68     6.0%
=                  56     5.0%
.                  50     4.4%
-                  25     2.2%
_                  19     1.7%
(                  12     1.1%
Other values (18)  68     6.0%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   2595   93.7%
Hangul  174    6.3%
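
Python's standard `unicodedata` module does not expose the block property, so a block tally like the one above needs a lookup by code point range. The sketch below is a minimal illustration, assuming that the "ASCII" label corresponds to U+0000 to U+007F (Basic Latin) and the "Hangul" label to the Hangul Syllables block, U+AC00 to U+D7AF. The `BLOCKS` table and the `block_of` helper are hypothetical and cover only these two ranges.

```python
from collections import Counter

# Hypothetical, deliberately incomplete block table: (first, last, label).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),   # Basic Latin
    (0xAC00, 0xD7AF, "Hangul"),  # Hangul Syllables (assumed mapping)
]


def block_of(ch: str) -> str:
    """Return the label of the first range containing ch, else 'Other'."""
    code_point = ord(ch)
    for first, last, label in BLOCKS:
        if first <= code_point <= last:
            return label
    return "Other"


# Stand-in data: one sample row from this report, not the full column.
text = "<title>오류 메세지 | 경기데이터드림</title>"
block_counts = Counter(block_of(ch) for ch in text)
print(block_counts)
```

A complete implementation would load the full block list from the Unicode Character Database instead of hard-coding two ranges.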

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            540    20.8%
t                  162    6.2%
e                  153    5.9%
r                  128    4.9%
s                  122    4.7%
"                  112    4.3%
/                  105    4.0%
o                  91     3.5%
i                  83     3.2%
a                  77     3.0%
Other values (56)  1022   39.4%
Hangul
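Value              Count  Frequency (%)
(not shown)        10     5.7%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        6      3.4%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        5      2.9%
(not shown)        4      2.3%
Other values (57)  115    66.1%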

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

    <!DOCTYPE html>
0   <html lang="ko">
1   <head>
2   <title>오류 메세지 | 경기데이터드림</title>
3   <meta charset="utf-8" />
4   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
5   <meta name="format-detection" content="telephone=no" />
6   <meta name="copyright" content="Gyeonggi Province. All Rights Reserved." />
7   <link rel="stylesheet" type="text/css" href="/css/ggportal/base.css" />
8   <link rel="stylesheet" type="text/css" href="/css/ggportal/layout.css" />
9   <link rel="stylesheet" type="text/css" href="/css/ggportal/remainder.css?ver=1.0" />
    <!DOCTYPE html>
57  </p>
58  <span class="logo">
59  <a href="http://www.gg.go.kr" target="_blank" title="새창으로 이동"><img src="/img/ggportal/desktop/remainder/logo_footer_1.png" alt="셰게속의 경기도" /></a>
60  </span>
61  </div>
62  </div>
63  </div>
64  <!-- // layout_A -->
65  </body>
66  </html>

Duplicate rows

Most frequently occurring

    <!DOCTYPE html>          # duplicates
1   </p>                     3
0   setTimeout(function() {  2
2   }                        2
3   요청하신 페이지를 찾을 수 없습니다. 이용에 불편을 드려 죄송합니다.<br /><br />  2
4   }                        2
5   <!--[if lt IE 9]>        2
6   <![endif]-->             2
7   </script>                2
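
The distinct, unique and duplicate figures in this report are related tallies over the same column. A minimal sketch with `collections.Counter` follows, using a short hypothetical list of rows in place of the real 67-row column; it illustrates the definitions and is not necessarily how the profiling tool computes them.

```python
from collections import Counter

# Hypothetical stand-in rows; the real column has 67 rows.
rows = ["<div>", "</p>", "}", "</p>", "<head>", "}", "</p>"]

row_counts = Counter(rows)

n_rows = len(rows)
# Distinct: how many different values occur at all.
n_distinct = len(row_counts)
# Unique: values that occur exactly once.
n_unique = sum(1 for count in row_counts.values() if count == 1)
# Duplicated: values that occur more than once, with their counts.
duplicated = {row: count for row, count in row_counts.items() if count > 1}

# Number of duplicated values as a share of all rows.
duplicate_pct = 100 * len(duplicated) / n_rows

print(n_distinct, n_unique, duplicated, round(duplicate_pct, 1))
```

The same relationships hold for the figures reported here: 50 unique values plus 8 duplicated values give the 58 distinct values, and 8 / 67 ≈ 11.9%.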