Overview

Dataset statistics

Number of variables: 6
Number of observations: 153
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): 3.9%
Total size in memory: 7.9 KiB
Average record size in memory: 52.9 B
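
The duplicate figures can be checked by hand. The sketch below assumes, going by the Duplicate rows table at the end of this report, that the count refers to distinct row patterns that occur more than once and that the percentage divides that count by the number of observations. The function name `duplicate_summary` and the toy rows are illustrative and not part of the report.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (distinct row patterns seen more than once, that count as % of all rows)."""
    counts = Counter(rows)
    n_patterns = sum(1 for c in counts.values() if c > 1)
    pct = 100.0 * n_patterns / len(rows) if rows else 0.0
    return n_patterns, pct


# Toy data (hypothetical): one row pattern appears twice among four rows.
toy_rows = [
    ("2019", "5", "23", 120000, "192.*.78.25", "-"),
    ("2019", "5", "23", 120000, "192.*.78.25", "-"),
    ("2019", "5", "23", 25000, "50.*.202.39", "-"),
    ("2019", "7", "10", 114000, "23.*.239.12", "-"),
]
n_patterns, pct = duplicate_summary(toy_rows)

# The report's own figures: 6 duplicated patterns out of 153 observations.
reported_pct = 100.0 * 6 / 153
```

Under that assumption, 6 patterns out of 153 rows rounds to the 3.9% shown above.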

Variable types

Categorical: 4
Numeric: 1
Text: 1

Dataset

Description: Sample
Author: Korea Internet & Security Agency (한국인터넷진흥원)
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KIS00000000000000004

Alerts

생성년도 has constant value "2019" [Constant]
Dataset has 6 (3.9%) duplicate rows [Duplicates]
생성월 is highly overall correlated with 생성시분초 and 1 other field [High correlation]
생성일 is highly overall correlated with 생성시분초 and 1 other field [High correlation]
생성시분초 is highly overall correlated with 생성월 and 2 other fields [High correlation]
URL is highly overall correlated with 생성시분초 [High correlation]
URL is highly imbalanced (91.5%) [Imbalance]

Reproduction

Analysis started: 2023-12-10 06:36:31.023536
Analysis finished: 2023-12-10 06:36:31.737540
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

생성년도 (creation year)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value  Count
2019   153

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2019
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value  Count  Frequency (%)
2019   153    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as in the Common Values table]

생성월 (creation month)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value  Count
5      107
7      46

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5
2nd row: 5
3rd row: 5
4th row: 5
5th row: 5

Common Values

Value  Count  Frequency (%)
5      107    69.9%
7      46     30.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as in the Common Values table]

생성일 (creation day)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value  Count
22     75
10     46
23     32

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 23
2nd row: 23
3rd row: 23
4th row: 23
5th row: 23

Common Values

Value  Count  Frequency (%)
22     75     49.0%
10     46     30.1%
23     32     20.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as in the Common Values table]

생성시분초 (creation time: hour, minute, second)
Real number (ℝ)

HIGH CORRELATION

Distinct: 23
Distinct (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92179.085
Minimum: 3600
Maximum: 223700
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 3600
5-th percentile: 31000
Q1: 51800
Median: 112500
Q3: 120000
95-th percentile: 132580
Maximum: 223700
Range: 220100
Interquartile range (IQR): 68200
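
The column name suggests a clock time, and the values look like integers in HHMMSS form with leading zeros dropped (3600 would be 00:36:00). That reading is an assumption, not something the report states. The sketch below decodes such integers; `parse_hhmmss` is a hypothetical helper name.

```python
from datetime import time


def parse_hhmmss(value):
    """Interpret an integer such as 3600 as the clock time 00:36:00 (assumed HHMMSS encoding)."""
    hours, remainder = divmod(int(value), 10_000)
    minutes, seconds = divmod(remainder, 100)
    # datetime.time raises ValueError for out-of-range fields such as 80 seconds.
    return time(hours, minutes, seconds)


earliest = parse_hhmmss(3600)    # reported minimum
latest = parse_hhmmss(223700)    # reported maximum

try:
    parse_hhmmss(132580)         # reported 95-th percentile
    percentile_is_clock_time = True
except ValueError:
    percentile_is_clock_time = False
```

If the assumption holds, interpolated statistics such as the 95-th percentile (132580) need not be valid clock times, because the profiler treats the column as an ordinary real number.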

Descriptive statistics

Standard deviation: 42063.418
Coefficient of variation (CV): 0.4563228
Kurtosis: -0.38277847
Mean: 92179.085
Median Absolute Deviation (MAD): 18000
Skewness: 0.026893349
Sum: 14103400
Variance: 1.7693311 × 10^9
Monotonicity: Not monotonic
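
Several of these figures follow from one another, which gives a quick consistency check on the extracted numbers. The sketch below uses only values printed in this report and the standard definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The variable names are illustrative.

```python
import math

# Values copied from the report.
n = 153
total = 14103400
std = 42063.418
q1, q3 = 51800, 120000
minimum, maximum = 3600, 223700

mean = total / n                  # arithmetic mean
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance from the standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
```

The derived values agree with the reported mean, CV, variance (about 1.769 × 10^9), IQR and range to the printed precision.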

[Figure: histogram with fixed size bins (bins=23)]

Common values

Value              Count  Frequency (%)
130500             28     18.3%
112500             20     13.1%
50600              20     13.1%
51800              19     12.4%
120000             16     10.5%
114000             16     10.5%
35100              6      3.9%
53300              4      2.6%
31000              3      2.0%
24300              2      1.3%
Other values (13)  19     12.4%

Minimum 10 values

Value   Count  Frequency (%)
3600    1      0.7%
3800    1      0.7%
23700   1      0.7%
24300   2      1.3%
25000   2      1.3%
31000   3      2.0%
35100   6      3.9%
50600   20     13.1%
51800   19     12.4%
53300   4      2.6%

Maximum 10 values

Value   Count  Frequency (%)
223700  2      1.3%
182100  1      0.7%
144500  1      0.7%
140900  2      1.3%
135700  2      1.3%
130500  28     18.3%
120000  16     10.5%
114300  2      1.3%
114000  16     10.5%
112500  20     13.1%

IP주소 (IP address)
Text

Distinct: 131
Distinct (%): 85.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 13
Median length: 12
Mean length: 11.705882
Min length: 9

Characters and Unicode

Total characters: 1791
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
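
As an illustration of what the category counts mean, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module over the five sample addresses shown in this report. `category_counts` is a hypothetical helper, and the profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category (e.g. 'Nd', 'Po') across all strings."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# The five sample rows of this variable (masked IPv4 addresses).
sample = ["51.*.16.36", "50.*.202.39", "110.*.111.190", "202.*.239.17", "34.*.102.38"]
counts = category_counts(sample)
```

Only two categories occur: `Nd` (Decimal Number) for the digits and `Po` (Other Punctuation) for `.` and `*`. Each address contributes three dots and one asterisk, which matches the reported 459 dots and 153 asterisks over 153 rows.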

Unique

Unique: 113
Unique (%): 73.9%

Sample

1st row: 51.*.16.36
2nd row: 50.*.202.39
3rd row: 110.*.111.190
4th row: 202.*.239.17
5th row: 34.*.102.38
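
Every sample value has the shape of an IPv4 address whose second octet is replaced by `*`. A minimal sketch of a validator for that shape follows. The pattern and the helper name `parse_masked_ipv4` are assumptions based on the rows shown here, not a documented format of the dataset.

```python
import re

# first octet, literal "*", third octet, fourth octet
MASKED_IPV4 = re.compile(r"(\d{1,3})\.\*\.(\d{1,3})\.(\d{1,3})")


def parse_masked_ipv4(text):
    """Return (first, None, third, fourth) octets, or raise ValueError if the shape is wrong."""
    match = MASKED_IPV4.fullmatch(text)
    if match is None:
        raise ValueError(f"not a masked IPv4 address: {text!r}")
    first, third, fourth = (int(group) for group in match.groups())
    if any(octet > 255 for octet in (first, third, fourth)):
        raise ValueError(f"octet out of range in {text!r}")
    return (first, None, third, fourth)


parsed = parse_masked_ipv4("110.*.111.190")
```

The masked octet is returned as `None` so that downstream code cannot mistake it for a real value.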

Value               Count  Frequency (%)
175.*.163.169       4      2.6%
23.*.239.12         3      2.0%
50.*.202.39         3      2.0%
192.*.78.25         2      1.3%
104.*.115.34        2      1.3%
62.*.70.146         2      1.3%
86.*.200.105        2      1.3%
185.*.136.222       2      1.3%
181.*.254.21        2      1.3%
37.*.33.242         2      1.3%
Other values (121)  129    84.3%

Most occurring characters

Value             Count  Frequency (%)
.                 459    25.6%
1                 287    16.0%
2                 172    9.6%
*                 153    8.5%
3                 111    6.2%
9                 96     5.4%
8                 95     5.3%
0                 91     5.1%
7                 88     4.9%
5                 85     4.7%
Other values (2)  154    8.6%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1179   65.8%
Other Punctuation  612    34.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      287    24.3%
2      172    14.6%
3      111    9.4%
9      96     8.1%
8      95     8.1%
0      91     7.7%
7      88     7.5%
5      85     7.2%
4      85     7.2%
6      69     5.9%

Other Punctuation
Value  Count  Frequency (%)
.      459    75.0%
*      153    25.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1791   100.0%

Most frequent character per script

Common
Value             Count  Frequency (%)
.                 459    25.6%
1                 287    16.0%
2                 172    9.6%
*                 153    8.5%
3                 111    6.2%
9                 96     5.4%
8                 95     5.3%
0                 91     5.1%
7                 88     4.9%
5                 85     4.7%
Other values (2)  154    8.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1791   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
.                 459    25.6%
1                 287    16.0%
2                 172    9.6%
*                 153    8.5%
3                 111    6.2%
9                 96     5.4%
8                 95     5.3%
0                 91     5.1%
7                 88     4.9%
5                 85     4.7%
Other values (2)  154    8.6%

URL
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value                                        Count
-                                            150
hxxp://versuvius.ru/phazzy/Panel/fre.php     1
hxxp://37.49.230.231/285217/logs/fre.php     1
hxxp://5.8.88.176/es2cdNybX27IOKuk.conf.php  1

Length

Max length: 43
Median length: 1
Mean length: 1.7843137
Min length: 1

Unique

Unique: 3
Unique (%): 2.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value                                        Count  Frequency (%)
-                                            150    98.0%
hxxp://versuvius.ru/phazzy/Panel/fre.php     1      0.7%
hxxp://37.49.230.231/285217/logs/fre.php     1      0.7%
hxxp://5.8.88.176/es2cdNybX27IOKuk.conf.php  1      0.7%
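
The report does not say how the 91.5% imbalance figure for this variable is computed. One definition that reproduces it from the counts above is one minus the Shannon entropy of the class shares divided by the maximum possible entropy, log2 of the number of classes. The sketch below implements that definition. Treat it as an assumption about the method, and `imbalance_score` as a hypothetical name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy in bits, k the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts from the Common Values table: "-" occurs 150 times, each URL once.
url_counts = [150, 1, 1, 1]
score = imbalance_score(url_counts)
```

A score of 0 means perfectly balanced classes and values near 1 mean that one class dominates. Here the placeholder "-" accounts for 150 of 153 rows.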

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as in the Common Values table]

Correlations

[Figure: correlation heatmap]

            생성월  생성일  생성시분초  URL
생성월      1.000   1.000   0.646       0.249
생성일      1.000   1.000   0.922       0.065
생성시분초  0.646   0.922   1.000       0.761
URL         0.249   0.065   0.761       1.000

        생성월  생성일  URL
생성월  1.000   0.997   0.164
생성일  0.997   1.000   0.060
URL     0.164   0.060   1.000

            생성시분초  생성월  생성일  URL
생성시분초  1.000       0.630   0.652   0.598
생성월      0.630       1.000   0.997   0.164
생성일      0.652       0.997   1.000   0.060
URL         0.598       0.164   0.060   1.000
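
The extracted text does not name the method behind each matrix. For the 생성월 / 생성일 pair, the reported values are consistent with Cramér's V: 1.000 is what the uncorrected statistic gives and 0.997 is what the bias-corrected variant gives. The sketch below shows both. The contingency table is an assumption inferred from the value counts and sample rows (days 22 and 23 fall in month 5, day 10 in month 7), and `cramers_v` is an illustrative helper, not the profiler's code.

```python
import math


def cramers_v(table, bias_correction=False):
    """Cramér's V for a contingency table given as a list of equal-length rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    if bias_correction:
        # Small-sample correction of phi^2 and of the table dimensions.
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        r = r - (r - 1) ** 2 / (n - 1)
        k = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(r - 1, k - 1))


# Assumed table. Rows: 생성월 = 5, 7. Columns: 생성일 = 22, 23, 10.
month_by_day = [[75, 32, 0], [0, 0, 46]]
v_plain = cramers_v(month_by_day)
v_corrected = cramers_v(month_by_day, bias_correction=True)
```

Because the day determines the month in this sample, the association is perfect. The corrected value falls slightly below 1 only because of the small-sample adjustment.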

Missing values

[Figure: bar chart] A simple visualization of nullity by column.
[Figure: matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Row  생성년도  생성월  생성일  생성시분초  IP주소         URL
0    2019      5       23      120000      51.*.16.36     -
1    2019      5       23      25000       50.*.202.39    -
2    2019      5       23      223700      110.*.111.190  -
3    2019      5       23      120000      202.*.239.17   -
4    2019      5       23      35100       34.*.102.38    -
5    2019      5       23      120000      192.*.78.25    -
6    2019      5       23      31000       103.*.72.54    -
7    2019      5       23      223700      104.*.110.190  -
8    2019      5       23      120000      185.*.61.161   -
9    2019      5       23      35100       18.*.215.84    -

Last rows

Row  생성년도  생성월  생성일  생성시분초  IP주소         URL
143  2019      7       10      112500      178.*.83.248   -
144  2019      7       10      114000      79.*.23.90     -
145  2019      7       10      112500      104.*.84.171   -
146  2019      7       10      114300      198.*.117.212  -
147  2019      7       10      114000      184.*.221.60   -
148  2019      7       10      114000      184.*.131.241  -
149  2019      7       10      112500      104.*.18.74    -
150  2019      7       10      112500      104.*.27.170   -
151  2019      7       10      114000      23.*.239.12    -
152  2019      7       10      82900       5.*.88.176     hxxp://5.8.88.176/es2cdNybX27IOKuk.conf.php

Duplicate rows

Most frequently occurring

Row  생성년도  생성월  생성일  생성시분초  IP주소         URL  # duplicates
0    2019      5       22      53300       104.*.114.34   -    2
1    2019      5       22      53300       104.*.115.34   -    2
2    2019      5       22      140900      175.*.163.169  -    2
3    2019      5       23      120000      192.*.78.25    -    2
4    2019      7       10      112500      148.*.235.217  -    2
5    2019      7       10      112500      185.*.136.222  -    2