Overview

Dataset statistics

Number of variables: 1
Number of observations: 78
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 9
Duplicate rows (%): 11.5%
Total size in memory: 756.0 B
Average record size in memory: 9.7 B

Variable types

Text: 1

Alerts

Dataset has 9 (11.5%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2024-03-12 23:26:50.388553
Analysis finished: 2024-03-12 23:26:50.520590
Duration: 0.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables


Text

Distinct: 68
Distinct (%): 87.2%
Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Length

Max length: 160
Median length: 71.5
Mean length: 39.961538
Min length: 6
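As a rough illustration of how length statistics like these can be derived, the sketch below computes the same four figures over a few rows taken from the Sample section. The helper name `length_stats` is hypothetical and this is not the profiler's own code. The last line only cross-checks that the reported mean agrees with total characters divided by observations (3117 / 78).

```python
import statistics

def length_stats(values):
    """Return max, median, mean and min string length for a list of rows."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }

# Three rows copied from the report's Sample section (not the full dataset).
sample = ['<html lang="ko">', '<head>', '<meta charset="utf-8" />']
stats = length_stats(sample)
print(stats)

# Cross-check of the reported mean: total characters / number of observations.
report_mean = 3117 / 78
print(round(report_mean, 6))
```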

Characters and Unicode

Total characters: 3117
Distinct characters: 144
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
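A minimal sketch of this kind of analysis, using only Python's standard `unicodedata` module: each character is mapped to its two-letter general category (for example `Ll` for Lowercase Letter, `Lo` for Other Letter, `Sm` for Math Symbol) and the categories are counted. The function name `category_counts` is an assumption made for illustration, and the input is a single row from the Sample section rather than the whole dataset.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the rows."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# One row copied from the report's Sample section.
sample = ['<title>오류 메세지 | 경기데이터드림</title>']
counts = category_counts(sample)
for category, n in counts.most_common():
    print(category, n)
```

The standard library exposes general categories but not script or block properties, so the script and block breakdowns in this report would need an additional data source.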

Unique

Unique: 59
Unique (%): 75.6%

Sample

1st row: <html lang="ko">
2nd row: <head>
3rd row: <title>오류 메세지 | 경기데이터드림</title>
4th row: <meta charset="utf-8" />
5th row: <meta name="viewport" content="width=device-width
Value                   Count  Frequency (%)
(empty)                 42     16.4%
script                  9      3.5%
meta                    7      2.7%
p                       6      2.3%
div                     6      2.3%
type="text/javascript   6      2.3%
link                    5      2.0%
rel="stylesheet         5      2.0%
페이지를                5      2.0%
type="text/css          5      2.0%
Other values (124)      160    62.5%
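Entries such as `type="text/javascript` (no closing quote) and the empty value suggest that words are obtained by splitting on whitespace and then stripping leading and trailing punctuation, which leaves an empty string for tokens like `/>` or `}`. That is an inference from the table, not a confirmed description of the profiler. A sketch under that assumption, with the hypothetical helper `word_counts`:

```python
import string
from collections import Counter

def word_counts(values):
    """Split rows on whitespace and strip punctuation from both ends of each token."""
    strip_chars = string.punctuation + string.whitespace
    counts = Counter()
    for value in values:
        for token in value.split():
            # Tokens made only of punctuation (e.g. "/>") become the empty string.
            counts[token.strip(strip_chars)] += 1
    return counts

# Illustrative rows in the style of the profiled data.
sample = ['<meta charset="utf-8" />', '<script type="text/javascript">', '}']
counts = word_counts(sample)
print(counts.most_common())
```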

Most occurring characters

Value               Count  Frequency (%)
(space)             596    19.1%
t                   182    5.8%
e                   180    5.8%
r                   139    4.5%
s                   132    4.2%
"                   121    3.9%
/                   105    3.4%
o                   102    3.3%
i                   95     3.0%
a                   94     3.0%
Other values (134)  1371   44.0%

Most occurring categories

Value                   Count  Frequency (%)
Lowercase Letter        1627   52.2%
Space Separator         596    19.1%
Other Punctuation       312    10.0%
Math Symbol             211    6.8%
Other Letter            201    6.4%
Uppercase Letter        39     1.3%
Open Punctuation        36     1.2%
Close Punctuation       31     1.0%
Dash Punctuation        26     0.8%
Connector Punctuation   19     0.6%
Other values (2)        19     0.6%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(missing)           11     5.5%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           5      2.5%
Other values (66)   137    68.2%

Lowercase Letter
Value               Count  Frequency (%)
t                   182    11.2%
e                   180    11.1%
r                   139    8.5%
s                   132    8.1%
o                   102    6.3%
i                   95     5.8%
a                   94     5.8%
n                   88     5.4%
l                   77     4.7%
p                   75     4.6%
Other values (15)   463    28.5%

Uppercase Letter
Value               Count  Frequency (%)
A                   6      15.4%
E                   5      12.8%
N                   4      10.3%
Q                   4      10.3%
T                   4      10.3%
P                   3      7.7%
I                   3      7.7%
B                   2      5.1%
C                   2      5.1%
R                   2      5.1%
Other values (4)    4      10.3%

Other Punctuation
Value               Count  Frequency (%)
"                   121    38.8%
/                   105    33.7%
.                   59     18.9%
&                   8      2.6%
;                   7      2.2%
!                   6      1.9%
'                   4      1.3%
:                   1      0.3%
?                   1      0.3%

Decimal Number
Value               Count  Frequency (%)
1                   8      44.4%
0                   5      27.8%
9                   2      11.1%
8                   1      5.6%
5                   1      5.6%
2                   1      5.6%

Math Symbol
Value               Count  Frequency (%)
<                   71     33.6%
>                   70     33.2%
=                   67     31.8%
|                   3      1.4%

Open Punctuation
Value               Count  Frequency (%)
(                   21     58.3%
[                   8      22.2%
{                   7      19.4%

Close Punctuation
Value               Count  Frequency (%)
)                   14     45.2%
}                   9      29.0%
]                   8      25.8%

Space Separator
Value               Count  Frequency (%)
(space)             596    100.0%

Dash Punctuation
Value               Count  Frequency (%)
-                   26     100.0%

Connector Punctuation
Value               Count  Frequency (%)
_                   19     100.0%

Currency Symbol
Value               Count  Frequency (%)
$                   1      100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               1666   53.4%
Common              1250   40.1%
Hangul              201    6.4%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(missing)           11     5.5%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           5      2.5%
Other values (66)   137    68.2%

Latin
Value               Count  Frequency (%)
t                   182    10.9%
e                   180    10.8%
r                   139    8.3%
s                   132    7.9%
o                   102    6.1%
i                   95     5.7%
a                   94     5.6%
n                   88     5.3%
l                   77     4.6%
p                   75     4.5%
Other values (29)   502    30.1%

Common
Value               Count  Frequency (%)
(space)             596    47.7%
"                   121    9.7%
/                   105    8.4%
<                   71     5.7%
>                   70     5.6%
=                   67     5.4%
.                   59     4.7%
-                   26     2.1%
(                   21     1.7%
_                   19     1.5%
Other values (19)   95     7.6%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               2916   93.6%
Hangul              201    6.4%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
(space)             596    20.4%
t                   182    6.2%
e                   180    6.2%
r                   139    4.8%
s                   132    4.5%
"                   121    4.1%
/                   105    3.6%
o                   102    3.5%
i                   95     3.3%
a                   94     3.2%
Other values (58)   1170   40.1%

Hangul
Value               Count  Frequency (%)
(missing)           11     5.5%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           6      3.0%
(missing)           5      2.5%
Other values (66)   137    68.2%

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  <!DOCTYPE html>
0      <html lang="ko">
1      <head>
2      <title>오류 메세지 | 경기데이터드림</title>
3      <meta charset="utf-8" />
4      <meta name="viewport" content="width=device-width
5      <meta http-equiv="X-UA-Compatible" content="IE=edge" />
6      <meta name="format-detection" content="telephone=no" />
7      <meta name="copyright" content="Gyeonggi Province. All Rights Reserved." />
8      <meta name="description" content="경기데이터드림 에서 필요한 자료를 손쉽게 찾아보세요. PC웹
9      <meta name="keywords" content="경기도

Index  <!DOCTYPE html>
68     </p>
69     <span class="logo">
70     <a href="http://www.gg.go.kr" target="_blank" title="새창으로 이동"><img src="/img/ggportal/desktop/remainder/logo_footer_1.png" alt="셰게속의 경기도" /></a>
71     </span>
72     </div>
73     </div>
74     </div>
75     <!-- // layout_A -->
76     </body>
77     </html>

Duplicate rows

Most frequently occurring

Index  # duplicates  <!DOCTYPE html>
2      3             </p>
0      2             setTimeout(function() {
1      2             }
3      2             }
4      2             요청하신 페이지를 찾을 수 없습니다. 이용에 불편을 드려 죄송합니다.<br /><br />
5      2             }
6      2             <!--[if lt IE 9]>
7      2             <![endif]-->
8      2             </script>
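The figures in this report are mutually consistent if "duplicate rows" counts the distinct values that occur more than once: 68 distinct values minus 59 unique values gives 9, and 9 / 78 is 11.5%. This reading is inferred from the numbers and is not a statement about the profiler's implementation. The sketch below reproduces the counts on synthetic stand-in data with the same shape (59 singletons, one value occurring 3 times, eight values occurring twice). The helper `summarize` and the placeholder strings are hypothetical.

```python
from collections import Counter

def summarize(values):
    """Count observations, distinct values, unique values and duplicated values."""
    counts = Counter(values)
    distinct = len(counts)
    # "Unique" means the value occurs exactly once.
    unique = sum(1 for c in counts.values() if c == 1)
    duplicated = distinct - unique
    return {
        "n": len(values),
        "distinct": distinct,
        "unique": unique,
        "duplicated": duplicated,
        "duplicated_pct": round(100.0 * duplicated / len(values), 1),
    }

# Synthetic stand-in data, not the real rows.
values = [f"line-{i}" for i in range(59)]                   # 59 values seen once
values += ["</p>"] * 3                                      # one value seen 3 times
values += [f"dup-{i}" for i in range(8) for _ in range(2)]  # 8 values seen twice
summary = summarize(values)
print(summary)
```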