Overview

Dataset statistics

Number of variables: 6
Number of observations: 128
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.2 KiB
Average record size in memory: 50.0 B

Variable types

Numeric: 1
Text: 1
Categorical: 3
DateTime: 1

Dataset

Description: Sample data
Author: 지란지교시큐리티
URL: https://www.findatamall.or.kr/market/dataProdDetail?gdsSn=27&gdsSeCd=GENERAL&gdsVer=1

Alerts

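Filter target (필터링 대상) is highly correlated overall with Filter condition (필터링 조건) [High correlation]
Filter condition (필터링 조건) is highly correlated overall with Filter target (필터링 대상) [High correlation]
Filter target (필터링 대상) is highly imbalanced (51.6%) [Imbalance]
Filter condition (필터링 조건) is highly imbalanced (79.9%) [Imbalance]
Filter ID (필터ID) has unique values [Unique]
Filter value (필터링 값) has unique values [Unique]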
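
The two imbalance percentages above are consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition (it is inferred from the numbers, not taken from the report itself) and uses the category counts listed under Common Values for each variable. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class frequencies and k is the number of classes.
    0.0 means perfectly balanced, 1.0 means a single class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Category counts as reported in this profile (128 observations each).
filter_target_counts = [92, 25, 5, 4, 1, 1]   # 필터링 대상
filter_condition_counts = [124, 4]            # 필터링 조건

print(f"{imbalance_score(filter_target_counts):.1%}")     # 51.6%
print(f"{imbalance_score(filter_condition_counts):.1%}")  # 79.9%
```

Under this assumed definition, a two-class variable split 124 to 4 scores much higher than a six-class variable whose largest class holds 92 of 128 rows, which matches the ordering of the alerts.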

Reproduction

Analysis started: 2024-03-03 10:03:54.668427
Analysis finished: 2024-03-03 10:03:55.833672
Duration: 1.17 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
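
As a quick check, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the values printed above; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction block of this report.
started = datetime.strptime("2024-03-03 10:03:54.668427", FMT)
finished = datetime.strptime("2024-03-03 10:03:55.833672", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 1.17 seconds
```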

Variables

Filter ID (필터ID)
Real number (ℝ)

UNIQUE 

Distinct: 128
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9446924
Minimum: 9428137
Maximum: 9447169
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 9428137
5th percentile: 9447048.3
Q1: 9447073.8
Median: 9447105.5
Q3: 9447137.2
95th percentile: 9447162.7
Maximum: 9447169
Range: 19032
Interquartile range (IQR): 63.5

Descriptive statistics

Standard deviation: 1716.8674
Coefficient of variation (CV): 0.00018173825
Kurtosis: 115.56187
Mean: 9446924
Median Absolute Deviation (MAD): 32
Skewness: -10.590766
Sum: 1.2092063 × 10^9
Variance: 2947633.6
Monotonicity: Not monotonic
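
Several of these figures are derived from others, so they can be cross-checked against each other. The sketch below recomputes the range, coefficient of variation, variance, and sum from the rounded values printed in this report. Because the inputs are rounded, the results agree with the reported numbers only to within rounding.

```python
import math

# Values as printed (rounded) in this report.
n = 128
minimum, maximum = 9428137, 9447169
mean = 9446924
std = 1716.8674

value_range = maximum - minimum   # range = max - min
cv = std / mean                   # coefficient of variation = std / mean
variance = std ** 2               # variance = std squared
total = mean * n                  # sum = mean * number of observations

print(value_range)
print(f"{cv:.6e}")
print(f"{variance:.1f}")
print(f"{total:.7e}")
```

The very small CV alongside a large negative skewness and high kurtosis fits the extreme-value listing: almost all IDs sit in a narrow band near the maximum, with a few much smaller IDs far below it.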
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
9447169             1      0.8%
9447108             1      0.8%
9447069             1      0.8%
9447076             1      0.8%
9447077             1      0.8%
9447078             1      0.8%
9447079             1      0.8%
9447084             1      0.8%
9447083             1      0.8%
9447082             1      0.8%
Other values (118)  118    92.2%

Minimum 10 values

Value    Count  Frequency (%)
9428137  1      0.8%
9442795  1      0.8%
9446969  1      0.8%
9447042  1      0.8%
9447043  1      0.8%
9447047  1      0.8%
9447048  1      0.8%
9447049  1      0.8%
9447050  1      0.8%
9447051  1      0.8%

Maximum 10 values

Value    Count  Frequency (%)
9447169  1      0.8%
9447168  1      0.8%
9447167  1      0.8%
9447166  1      0.8%
9447165  1      0.8%
9447164  1      0.8%
9447163  1      0.8%
9447162  1      0.8%
9447161  1      0.8%
9447160  1      0.8%

Filter value (필터링 값)
Text

UNIQUE 

Distinct: 128
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Length

Max length: 50
Median length: 34
Mean length: 18.421875
Min length: 10

Characters and Unicode

Total characters: 2358
Distinct characters: 73
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
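
As an illustration of these character properties, the sketch below counts Unicode general categories for one of the sample values from this report, using Python's standard `unicodedata` module. The mapping from two-letter category codes to the long names used in this report follows the Unicode Standard. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes and the long names used in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(text):
    """Count characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "mrallison60@gmail.com"  # 2nd sample row of this variable
counts = category_counts(sample)
for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

For this sample the letters are all lowercase, the two digits are decimal numbers, and both `@` and `.` fall under Other Punctuation.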

Unique

Unique: 128
Unique (%): 100.0%

Sample

1st row: 1GtdpkxN7izueE1696LQUSRB72Mh71BeNh
2nd row: mrallison60@gmail.com
3rd row: 2748482748&extra=&&&484827&&&vxc.azurewebsites.net
4th row: hronicealt1d8f.top
5th row: esliston.xyz

Value                               Count  Frequency (%)
1gtdpkxn7izuee1696lqusrb72mh71benh  1      0.7%
wqk4gsm74w.biz                      1      0.7%
pasteascript.com/home               1      0.7%
y6hitbsfc3.biz                      1      0.7%
gruyerec7nsgvday.onion.pet          1      0.7%
glasees.duckdns.org                 1      0.7%
kidocx.xyz                          1      0.7%
newyork-defense-lawyer.com/wp-http  1      0.7%
0jl5l1rntu.biz                      1      0.7%
8b0p0zldx4.biz                      1      0.7%
Other values (127)                  127    92.7%

Most occurring characters

Value              Count  Frequency (%)
.                  214    9.1%
o                  142    6.0%
i                  139    5.9%
a                  122    5.2%
c                  121    5.1%
e                  121    5.1%
l                  100    4.2%
n                  86     3.6%
m                  84     3.6%
t                  82     3.5%
Other values (63)  1147   48.6%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       1787   75.8%
Other Punctuation      263    11.2%
Decimal Number         244    10.3%
Uppercase Letter       26     1.1%
Dash Punctuation       16     0.7%
Space Separator        9      0.4%
Other Letter           7      0.3%
Connector Punctuation  3      0.1%
Open Punctuation       1      < 0.1%
Math Symbol            1      < 0.1%

Most frequent character per category

Lowercase Letter

Value              Count  Frequency (%)
o                  142    7.9%
i                  139    7.8%
a                  122    6.8%
c                  121    6.8%
e                  121    6.8%
l                  100    5.6%
n                  86     4.8%
m                  84     4.7%
t                  82     4.6%
r                  81     4.5%
Other values (16)  709    39.7%

Uppercase Letter

Value             Count  Frequency (%)
B                 3      11.5%
A                 2      7.7%
N                 2      7.7%
C                 2      7.7%
G                 2      7.7%
U                 2      7.7%
S                 2      7.7%
K                 1      3.8%
I                 1      3.8%
H                 1      3.8%
Other values (8)  8      30.8%

Decimal Number

Value  Count  Frequency (%)
7      31     12.7%
8      28     11.5%
1      28     11.5%
5      26     10.7%
4      25     10.2%
0      24     9.8%
2      24     9.8%
6      23     9.4%
9      21     8.6%
3      14     5.7%

Other Letter
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%

Other Punctuation

Value  Count  Frequency (%)
.      214    81.4%
/      21     8.0%
@      17     6.5%
&      7      2.7%
:      3      1.1%
!      1      0.4%

Dash Punctuation

Value  Count  Frequency (%)
-      16     100.0%

Space Separator

Value    Count  Frequency (%)
(space)  9      100.0%

Connector Punctuation

Value  Count  Frequency (%)
_      3      100.0%

Open Punctuation

Value  Count  Frequency (%)
(      1      100.0%

Math Symbol

Value  Count  Frequency (%)
=      1      100.0%

Close Punctuation

Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   1813   76.9%
Common  538    22.8%
Hangul  7      0.3%

Most frequent character per script

Latin

Value              Count  Frequency (%)
o                  142    7.8%
i                  139    7.7%
a                  122    6.7%
c                  121    6.7%
e                  121    6.7%
l                  100    5.5%
n                  86     4.7%
m                  84     4.6%
t                  82     4.5%
r                  81     4.5%
Other values (34)  735    40.5%

Common

Value              Count  Frequency (%)
.                  214    39.8%
7                  31     5.8%
8                  28     5.2%
1                  28     5.2%
5                  26     4.8%
4                  25     4.6%
0                  24     4.5%
2                  24     4.5%
6                  23     4.3%
9                  21     3.9%
Other values (12)  94     17.5%

Hangul
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   2351   99.7%
Hangul  7      0.3%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
.                  214    9.1%
o                  142    6.0%
i                  139    5.9%
a                  122    5.2%
c                  121    5.1%
e                  121    5.1%
l                  100    4.3%
n                  86     3.7%
m                  84     3.6%
t                  82     3.5%
Other values (56)  1140   48.5%

Hangul
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%

Filter target (필터링 대상)
Categorical

HIGH CORRELATION  IMBALANCE 

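Distinct: 6
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Most frequent categories (count):
URL in body (본문의 URL): 92
Body (본문): 25
Full sender (보내는 사람 전체): 5
Sending mail server Reply-To (보내는 메일 서버 Reply to): 4
Subject (제목): 1

Length

Max length: 18
Median length: 7
Mean length: 6.40625
Min length: 2

Unique

Unique: 2
Unique (%): 1.6%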

Sample

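1st row: Body (본문)
2nd row: Body (본문)
3rd row: Body (본문)
4th row: URL in body (본문의 URL)
5th row: URL in body (본문의 URL)

Common Values

Value                                                     Count  Frequency (%)
URL in body (본문의 URL)                                   92     71.9%
Body (본문)                                                25     19.5%
Full sender (보내는 사람 전체)                              5      3.9%
Sending mail server Reply-To (보내는 메일 서버 Reply to)    4      3.1%
Subject (제목)                                             1      0.8%
Attachment name (첨부파일 이름)                             1      0.8%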
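
The difference between "Distinct" and "Unique" in this report can be illustrated from the counts above. Judging by the figures shown here, distinct counts the different values, while unique counts only the values that occur exactly once. The sketch below is a minimal reconstruction from the reported counts, not the profiling tool's own code. It keeps the original Korean strings because the length statistics refer to them.

```python
from collections import Counter

# Category counts copied from the Common Values table of this variable.
counts = Counter({
    "본문의 URL": 92,
    "본문": 25,
    "보내는 사람 전체": 5,
    "보내는 메일 서버 Reply to": 4,
    "제목": 1,
    "첨부파일 이름": 1,
})

n = sum(counts.values())                             # number of observations
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

# Mean string length, weighted by how often each value occurs.
mean_length = sum(len(value) * c for value, c in counts.items()) / n

print(n, distinct, unique)
print(f"{distinct / n:.1%}", f"{unique / n:.1%}")
print(mean_length)
```

Run against these counts, the sketch gives 6 distinct and 2 unique values out of 128 observations and a mean length of 6.40625, in line with the figures reported for this variable.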

Length

Histogram of lengths of the category

Common Values (Plot)

Value                  Count  Frequency (%)
본문의 (of the body)    92     37.2%
url                    92     37.2%
본문 (body)             25     10.1%
보내는 (sending)        9      3.6%
사람 (person)           5      2.0%
전체 (entire)           5      2.0%
메일 (mail)             4      1.6%
서버 (server)           4      1.6%
reply                  4      1.6%
to                     4      1.6%
Other values (3)       3      1.2%

Filter condition (필터링 조건)
Categorical

HIGH CORRELATION  IMBALANCE 

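Distinct: 2
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Most frequent categories (count):
Contains (포함하면): 124
Matches (일치하면): 4

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Contains (포함하면)
2nd row: Contains (포함하면)
3rd row: Contains (포함하면)
4th row: Contains (포함하면)
5th row: Contains (포함하면)

Common Values

Value                Count  Frequency (%)
Contains (포함하면)   124    96.9%
Matches (일치하면)    4      3.1%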

Length

Histogram of lengths of the category

Common Values (Plot)

Value                Count  Frequency (%)
포함하면 (contains)   124    96.9%
일치하면 (matches)    4      3.1%

Classification (분류)
Categorical

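Distinct: 4
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Most frequent categories (count):
Adult (A) (성인(A)): 65
Phishing (X) (피싱(X)): 42
Promotion (P) (홍보(P)): 20
Promotion (P), Phishing (X) (홍보(P), 피싱(X)): 1

Length

Max length: 12
Median length: 5
Mean length: 5.0546875
Min length: 5

Unique

Unique: 1
Unique (%): 0.8%

Sample

1st row: Phishing (X) (피싱(X))
2nd row: Promotion (P) (홍보(P))
3rd row: Phishing (X) (피싱(X))
4th row: Promotion (P) (홍보(P))
5th row: Promotion (P) (홍보(P))

Common Values

Value                                           Count  Frequency (%)
Adult (A) (성인(A))                              65     50.8%
Phishing (X) (피싱(X))                           42     32.8%
Promotion (P) (홍보(P))                          20     15.6%
Promotion (P), Phishing (X) (홍보(P), 피싱(X))    1      0.8%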

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
성인(a   65     50.4%
피싱(x   43     33.3%
홍보(p   21     16.3%

Modified time (수정시간)
DateTime

Distinct: 75
Distinct (%): 58.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB
Minimum: 2019-07-15 15:44:39
Maximum: 2019-07-15 18:04:42
Histogram with fixed size bins (bins=50)

Interactions


Correlations

                               Filter ID  Filter target  Filter condition  Classification  Modified time
Filter ID (필터ID)              1.000      0.000          0.000             0.000           1.000
Filter target (필터링 대상)      0.000      1.000          0.930             0.605           1.000
Filter condition (필터링 조건)   0.000      0.930          1.000             0.390           1.000
Classification (분류)           0.000      0.605          0.390             1.000           0.996
Modified time (수정시간)         1.000      1.000          1.000             0.996           1.000

                               Filter condition  Classification  Filter target
Filter condition (필터링 조건)   1.000             0.259           0.752
Classification (분류)           0.259             1.000           0.432
Filter target (필터링 대상)      0.752             0.432           1.000

                               Filter ID  Filter target  Filter condition  Classification
Filter ID (필터ID)              1.000      0.000          0.000             0.000
Filter target (필터링 대상)      0.000      1.000          0.752             0.432
Filter condition (필터링 조건)   0.000      0.752          1.000             0.259
Classification (분류)           0.000      0.432          0.259             1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

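First rows

Index | Filter ID (필터ID) | Filter value (필터링 값) | Filter target (필터링 대상) | Filter condition (필터링 조건) | Classification (분류) | Modified time (수정시간)
0 | 9447169 | 1GtdpkxN7izueE1696LQUSRB72Mh71BeNh | Body | Contains | Phishing (X) | 2019-07-15 18:04:42
1 | 9447168 | mrallison60@gmail.com | Body | Contains | Promotion (P) | 2019-07-15 18:04:02
2 | 9447167 | 2748482748&extra=&&&484827&&&vxc.azurewebsites.net | Body | Contains | Phishing (X) | 2019-07-15 18:03:43
3 | 9447166 | hronicealt1d8f.top | URL in body | Contains | Promotion (P) | 2019-07-15 18:02:53
4 | 9447165 | esliston.xyz | URL in body | Contains | Promotion (P) | 2019-07-15 18:02:12
5 | 9447164 | safra.nationalbank.enquiaries@gmail.com | Body | Contains | Promotion (P) | 2019-07-15 17:56:44
6 | 9447163 | wo7783@gmail.com | Sending mail server Reply-To | Matches | Promotion (P) | 2019-07-15 17:55:21
7 | 9447162 | chonghinatakaitu56@gmail.com | Body | Contains | Promotion (P) | 2019-07-15 17:53:38
8 | 9447161 | contabilidadeatual.com | URL in body | Contains | Phishing (X) | 2019-07-15 17:50:18
9 | 9447160 | bannerman_jp@yahoo.com | Full sender | Contains | Promotion (P) | 2019-07-15 17:48:57

Last rows

Index | Filter ID (필터ID) | Filter value (필터링 값) | Filter target (필터링 대상) | Filter condition (필터링 조건) | Classification (분류) | Modified time (수정시간)
118 | 9447050 | .swing-particular.com | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
119 | 9447051 | .fza0yslzpz7.cloud | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
120 | 9447052 | .ggut7reuar.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
121 | 9447053 | .4lrj9zaex1.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
122 | 9447054 | .2qqo9eic76.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
123 | 9447055 | .bcgkwtnauigv.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:54:11
124 | 9447048 | HGCK181160287-CIFA (1).lzh | Attachment name | Matches | Phishing (X) | 2019-07-15 15:48:02
125 | 9447047 | /secnote24.club | Body | Contains | Phishing (X) | 2019-07-15 15:45:52
126 | 9447042 | .ijv4l4yjv7.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:44:39
127 | 9447043 | .i1pr5jctqy.biz | URL in body | Contains | Adult (A) | 2019-07-15 15:44:39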