Overview

Dataset statistics

Number of variables: 5
Number of observations: 33
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 3.0%
Total size in memory: 1.5 KiB
Average record size in memory: 48.0 B

Variable types

Categorical: 1
Numeric: 3
Text: 1

Dataset

Description: Sample
Author: 한국인터넷진흥원 (Korea Internet & Security Agency, KISA)
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KIS00000000000000010

Alerts

생성년도 has constant value "2019" [Constant]
Dataset has 1 (3.0%) duplicate rows [Duplicates]
생성월 is highly overall correlated with 생성시분초 [High correlation]
생성시분초 is highly overall correlated with 생성월 [High correlation]
생성시분초 has 24 (72.7%) zeros [Zeros]
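The Constant and Zeros alerts follow directly from the value counts reported in the variable sections. The sketch below recomputes them from those counts. `is_constant` and `zeros_percent` are hypothetical helpers, not ydata-profiling functions, and the profiler's own alert thresholds are not modelled.

```python
# Hypothetical helpers (not part of ydata-profiling) that recompute two of
# the alerts from value -> count mappings copied from this report.

def is_constant(value_counts: dict) -> bool:
    """True when the column holds exactly one distinct value."""
    return len(value_counts) == 1


def zeros_percent(value_counts: dict) -> float:
    """Share of observations equal to 0, in percent."""
    total = sum(value_counts.values())
    return 100.0 * value_counts.get(0, 0) / total


# 생성년도: the single value "2019" in all 33 rows.
year_counts = {"2019": 33}
# 생성시분초: counts from its Common values table.
time_counts = {0: 24, 101648: 2, 151411: 2, 104927: 2,
               161131: 1, 104214: 1, 110622: 1}

print(is_constant(year_counts))                     # True
print(f"{zeros_percent(time_counts):.1f}% zeros")   # 72.7% zeros
```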

Reproduction

Analysis started: 2023-12-10 06:27:52.037694
Analysis finished: 2023-12-10 06:27:53.890159
Duration: 1.85 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

생성년도 (creation year)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B
Value counts: 2019 (33)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2019
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value  Count  Frequency (%)
2019   33     100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

생성월 (creation month)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.0909091
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 4
Median: 6
Q3: 7
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.2051096
Coefficient of variation (CV): 0.52621202
Kurtosis: -0.63640527
Mean: 6.0909091
Median Absolute Deviation (MAD): 2
Skewness: 0.50045096
Sum: 201
Variance: 10.272727
Monotonicity: Not monotonic
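The descriptive statistics for 생성월 can be cross-checked from its value counts. A minimal sketch using only Python's `statistics` module, assuming those ten counts cover all 33 observations and that the variance uses the sample (n - 1) denominator, which is the choice that reproduces 10.272727:

```python
import math
import statistics
from collections import Counter

# 생성월 value -> count, copied from the tables in this report
# (assumption: these 10 values cover all 33 observations).
month_counts = {6: 6, 7: 5, 4: 5, 12: 4, 3: 4, 2: 3, 10: 2, 9: 2, 1: 1, 5: 1}
values = list(Counter(month_counts).elements())  # expand to 33 observations

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(x - median) for x in values)

print(len(values), sum(values))            # 33 201
print(round(mean, 7), round(variance, 6))  # 6.0909091 10.272727
print(round(std, 7), round(cv, 8), mad)    # 3.2051096 0.52621202 2
```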
[Figure: histogram with fixed size bins (bins=10)]

Common values
Value  Count  Frequency (%)
6      6      18.2%
7      5      15.2%
4      5      15.2%
12     4      12.1%
3      4      12.1%
2      3      9.1%
10     2      6.1%
9      2      6.1%
1      1      3.0%
5      1      3.0%

Minimum 10 values
Value  Count  Frequency (%)
1      1      3.0%
2      3      9.1%
3      4      12.1%
4      5      15.2%
5      1      3.0%
6      6      18.2%
7      5      15.2%
9      2      6.1%
10     2      6.1%
12     4      12.1%

Maximum 10 values
Value  Count  Frequency (%)
12     4      12.1%
10     2      6.1%
9      2      6.1%
7      5      15.2%
6      6      18.2%
5      1      3.0%
4      5      15.2%
3      4      12.1%
2      3      9.1%
1      1      3.0%
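The 생성월 histogram uses "fixed size bins", that is, equal-width bins between the minimum and the maximum (bins=10 here). The sketch below assumes the same edge convention as `numpy.histogram` (every bin half-open except the last, which is closed) and applies it to the 생성월 counts. `fixed_width_histogram` is a hypothetical helper, and the bin counts are derived here, not read from the figure; floating-point edge cases in numpy may differ for non-integer data.

```python
from collections import Counter


def fixed_width_histogram(values, bins):
    """Hypothetical helper: counts per equal-width bin between min and max.

    Follows the numpy.histogram convention: bins are half-open [a, b),
    except the last one, which also contains the maximum.
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:                      # constant data: everything in one bin
        counts[0] = len(values)
        return counts
    for x in values:
        # Multiply before dividing: with integer inputs, values sitting
        # exactly on an edge are not misplaced by rounding.
        idx = int((x - lo) * bins / (hi - lo))
        counts[min(idx, bins - 1)] += 1   # the maximum lands in the last bin
    return counts


# 생성월 value -> count from the report's tables.
month_counts = {6: 6, 7: 5, 4: 5, 12: 4, 3: 4, 2: 3, 10: 2, 9: 2, 1: 1, 5: 1}
values = list(Counter(month_counts).elements())

print(fixed_width_histogram(values, bins=10))
# [4, 4, 5, 1, 6, 5, 0, 2, 2, 4]
```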

생성일 (creation day)
Real number (ℝ)

Distinct: 13
Distinct (%): 39.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.727273
Minimum: 1
Maximum: 28
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 1
5th percentile: 2.6
Q1: 17
Median: 18
Q3: 25
95th percentile: 28
Maximum: 28
Range: 27
Interquartile range (IQR): 8
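A 5th percentile of 2.6 indicates that the quantiles are interpolated rather than picked from observed values. The sketch below reproduces the 생성일 quantile statistics with linear interpolation between order statistics (the default rule of `numpy.percentile` and `pandas.Series.quantile`), via `statistics.quantiles(..., method="inclusive")`. It assumes the value counts in the report's tables are complete; the three "Other values" are taken to be 2, 15 and 27 from the minimum and maximum tables.

```python
import statistics
from collections import Counter

# 생성일 value -> count from this report (assumption: complete, 33 rows).
day_counts = {17: 7, 24: 6, 18: 5, 28: 4, 25: 3, 16: 1, 1: 1,
              26: 1, 3: 1, 21: 1, 2: 1, 15: 1, 27: 1}
days = sorted(Counter(day_counts).elements())

# "inclusive" interpolates linearly at position p * (n - 1).
q1, median, q3 = statistics.quantiles(days, n=4, method="inclusive")
ventiles = statistics.quantiles(days, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]   # 5th and 95th percentiles

print(p5, q1, median, q3, p95)   # 2.6 17.0 18.0 25.0 28.0
print("IQR:", q3 - q1)           # IQR: 8.0
```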

Descriptive statistics

Standard deviation: 7.0899256
Coefficient of variation (CV): 0.35939715
Kurtosis: 1.599316
Mean: 19.727273
Median Absolute Deviation (MAD): 6
Skewness: -1.2450191
Sum: 651
Variance: 50.267045
Monotonicity: Not monotonic
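The skewness and kurtosis values are consistent with pandas' bias-corrected estimators (adjusted Fisher-Pearson skewness and excess kurtosis, as returned by `Series.skew()` and `Series.kurt()`). Assuming that is what the profiler reports, the sketch below reimplements both formulas in plain Python and applies them to the 생성일 and 생성월 value counts from this report. `skew_kurt` is a hypothetical helper.

```python
import math
from collections import Counter


def skew_kurt(values):
    """Hypothetical helper: bias-corrected skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Sums (not means) of powered deviations from the mean.
    s2 = sum((x - mean) ** 2 for x in values)
    s3 = sum((x - mean) ** 3 for x in values)
    s4 = sum((x - mean) ** 4 for x in values)
    skew = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    kurt = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


# Value -> count mappings copied from the report's tables.
day_counts = {17: 7, 24: 6, 18: 5, 28: 4, 25: 3, 16: 1, 1: 1,
              26: 1, 3: 1, 21: 1, 2: 1, 15: 1, 27: 1}
month_counts = {6: 6, 7: 5, 4: 5, 12: 4, 3: 4, 2: 3, 10: 2, 9: 2, 1: 1, 5: 1}

for label, counts in (("생성일", day_counts), ("생성월", month_counts)):
    s, k = skew_kurt(list(Counter(counts).elements()))
    print(f"{label}: skewness {s:.4f}, kurtosis {k:.4f}")
# 생성일: skewness -1.2450, kurtosis 1.5993
# 생성월: skewness 0.5005, kurtosis -0.6364
```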
[Figure: histogram with fixed size bins (bins=13)]

Common values
Value             Count  Frequency (%)
17                7      21.2%
24                6      18.2%
18                5      15.2%
28                4      12.1%
25                3      9.1%
16                1      3.0%
1                 1      3.0%
26                1      3.0%
3                 1      3.0%
21                1      3.0%
Other values (3)  3      9.1%

Minimum 10 values
Value  Count  Frequency (%)
1      1      3.0%
2      1      3.0%
3      1      3.0%
15     1      3.0%
16     1      3.0%
17     7      21.2%
18     5      15.2%
21     1      3.0%
24     6      18.2%
25     3      9.1%

Maximum 10 values
Value  Count  Frequency (%)
28     4      12.1%
27     1      3.0%
26     1      3.0%
25     3      9.1%
24     6      18.2%
21     1      3.0%
18     5      15.2%
17     7      21.2%
16     1      3.0%
15     1      3.0%

생성시분초 (creation time: hour, minute, second)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 7
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33089.061
Minimum: 0
Maximum: 161131
Zeros: 24
Zeros (%): 72.7%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 101648
95th percentile: 151411
Maximum: 161131
Range: 161131
Interquartile range (IQR): 101648

Descriptive statistics

Standard deviation: 56309.5
Coefficient of variation (CV): 1.7017558
Kurtosis: -0.12043012
Mean: 33089.061
Median Absolute Deviation (MAD): 0
Skewness: 1.2585116
Sum: 1091939
Variance: 3.1707598 × 10^9
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=7)]

Common values
Value   Count  Frequency (%)
0       24     72.7%
101648  2      6.1%
151411  2      6.1%
104927  2      6.1%
161131  1      3.0%
104214  1      3.0%
110622  1      3.0%

Minimum 10 values
Value   Count  Frequency (%)
0       24     72.7%
101648  2      6.1%
104214  1      3.0%
104927  2      6.1%
110622  1      3.0%
151411  2      6.1%
161131  1      3.0%

Maximum 10 values
Value   Count  Frequency (%)
161131  1      3.0%
151411  2      6.1%
110622  1      3.0%
104927  2      6.1%
104214  1      3.0%
101648  2      6.1%
0       24     72.7%
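The profiler treats 생성시분초 as a real number, so summaries such as the mean (33089.061) are not clock times. Reading the column name (시분초, hour-minute-second) as an integer HHMMSS value with leading zeros dropped is an assumption; the report does not state the encoding, nor whether 0 is a real midnight time or a placeholder. Under that reading, the seven distinct values decode as below. `hhmmss_to_str` is a hypothetical helper.

```python
def hhmmss_to_str(value: int) -> str:
    """Hypothetical helper: format an HHMMSS integer (e.g. 161131) as HH:MM:SS.

    Assumes leading zeros were dropped when the time was stored as a number.
    """
    hours, rest = divmod(value, 10_000)
    minutes, seconds = divmod(rest, 100)
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid HHMMSS value: {value}")
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# The seven distinct 생성시분초 values listed in the report's tables.
for v in (0, 101648, 104214, 104927, 110622, 151411, 161131):
    print(f"{v:>6} -> {hhmmss_to_str(v)}")
```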

분석보고서명 (analysis report name)
Text

Distinct: 32
Distinct (%): 97.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 121
Median length: 42
Mean length: 33.969697
Min length: 18

Characters and Unicode

Total characters: 1121
Distinct characters: 187
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: 2018년_하반기_악성코드_은닉사이트_탐지_동향_보고서.pdf
2nd row: _KISA__갠드크랩_분석_스폐셜_리포트.pdf
3rd row: 20190226_Clop랜섬웨어_유포에_따른_감염_주의.pdf
4th row: 사이버보안_빅데이터_활용사례(연구-국민대-윤명근_교수).pdf
5th row: 사이버보안_빅데이터_활용사례(기업-두산디지털이노베이션BU-김민교_대리).pdf

Value  Count  Frequency (%)
공급망_공격_사례_분석_및_대응_방안.pdf  2  6.1%
2018년_하반기_악성코드_은닉사이트_탐지_동향_보고서.pdf  1  3.0%
3._머신러닝_기반의_보안데이터_분석_연구.pdf  1  3.0%
4._operation_kitty_phishing.pdf  1  3.0%
3._system_anomaly_analysis___detection.pdf  1  3.0%
2._threat_intelligence_수집방법_및_활용사례.pdf  1  3.0%
1._사이버보안빅데이터센터_소개.pdf  1  3.0%
ad(active_directory)_관리자가_피해야_할_6가지_ad_운영_사례.pdf  1  3.0%
2019년_1분기_사이버_위협_동향_보고서.pdf  1  3.0%
kisa_technical_report__analysis_on_cases_of_distribution_of_internal_network_ransomware_through_exploiting_ad_server.pdf  1  3.0%
Other values (22)  22  66.7%

Most occurring characters

Value               Count  Frequency (%)
_                   178    15.9%
.                   44     3.9%
p                   39     3.5%
f                   35     3.1%
d                   33     2.9%
e                   29     2.6%
2                   22     2.0%
                    22     2.0%
                    21     1.9%
r                   19     1.7%
Other values (177)  679    60.6%

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           413    36.8%
Lowercase Letter       300    26.8%
Connector Punctuation  178    15.9%
Uppercase Letter       97     8.7%
Decimal Number         69     6.2%
Other Punctuation      44     3.9%
Close Punctuation      7      0.6%
Open Punctuation       7      0.6%
Dash Punctuation       6      0.5%
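The category table groups characters by Unicode general category (Hangul syllables are "Other Letter", `_` is "Connector Punctuation", and so on). Python's `unicodedata.category` returns the two-letter code for each character; the sketch maps the nine codes that appear in this report to the long names used in the table and applies it to the third sample file name. The mapping table and `category_counts` are illustrative assumptions, not the profiler's implementation.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report
# (an illustrative subset of the Unicode general categories).
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Ll": "Lowercase Letter", "Lu": "Uppercase Letter",
    "Nd": "Decimal Number", "Pc": "Connector Punctuation",
    "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)   # e.g. "Lo" for a Hangul syllable
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


name = "20190226_Clop랜섬웨어_유포에_따른_감염_주의.pdf"  # 3rd sample row
for category, count in category_counts(name).most_common():
    print(f"{category:<22} {count}")
```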

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    22     5.3%
                    21     5.1%
                    14     3.4%
                    12     2.9%
                    11     2.7%
                    11     2.7%
                    9      2.2%
                    9      2.2%
                    8      1.9%
                    8      1.9%
Other values (121)  288    69.7%

Lowercase Letter
Value               Count  Frequency (%)
p                   39     13.0%
f                   35     11.7%
d                   33     11.0%
e                   29     9.7%
r                   19     6.3%
t                   19     6.3%
o                   16     5.3%
i                   15     5.0%
n                   14     4.7%
s                   13     4.3%
Other values (13)   68     22.7%

Uppercase Letter
Value               Count  Frequency (%)
A                   18     18.6%
I                   16     16.5%
S                   13     13.4%
K                   8      8.2%
D                   8      8.2%
T                   6      6.2%
R                   6      6.2%
C                   5      5.2%
N                   3      3.1%
H                   2      2.1%
Other values (8)    12     12.4%

Decimal Number
Value  Count  Frequency (%)
2      22     31.9%
0      15     21.7%
1      11     15.9%
9      7      10.1%
3      4      5.8%
6      3      4.3%
5      2      2.9%
4      2      2.9%
7      2      2.9%
8      1      1.4%

Connector Punctuation
Value  Count  Frequency (%)
_      178    100.0%

Other Punctuation
Value  Count  Frequency (%)
.      44     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      7      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      7      100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      6      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  413    36.8%
Latin   397    35.4%
Common  311    27.7%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    22     5.3%
                    21     5.1%
                    14     3.4%
                    12     2.9%
                    11     2.7%
                    11     2.7%
                    9      2.2%
                    9      2.2%
                    8      1.9%
                    8      1.9%
Other values (121)  288    69.7%

Latin
Value               Count  Frequency (%)
p                   39     9.8%
f                   35     8.8%
d                   33     8.3%
e                   29     7.3%
r                   19     4.8%
t                   19     4.8%
A                   18     4.5%
o                   16     4.0%
I                   16     4.0%
i                   15     3.8%
Other values (31)   158    39.8%

Common
Value               Count  Frequency (%)
_                   178    57.2%
.                   44     14.1%
2                   22     7.1%
0                   15     4.8%
1                   11     3.5%
9                   7      2.3%
)                   7      2.3%
(                   7      2.3%
-                   6      1.9%
3                   4      1.3%
Other values (5)    10     3.2%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   708    63.2%
Hangul  413    36.8%
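The script and block tables come from Unicode character properties. Python's standard library has no script or block lookup, so the sketch below approximates both: the script is guessed from the first word of `unicodedata.name` (HANGUL, LATIN, otherwise Common), and only the ASCII range and the Hangul Syllables block (U+AC00 to U+D7AF) are recognised as blocks. These shortcuts are assumptions that happen to cover the characters in this dataset; `script_of` and `block_of` are hypothetical helpers, not a full Unicode Character Database lookup.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Heuristic: guess the script from the first word of the character name."""
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return {"HANGUL": "Hangul", "LATIN": "Latin"}.get(first_word, "Common")


def block_of(ch: str) -> str:
    """Heuristic: recognise only the two blocks seen in this report."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:   # Hangul Syllables block
        return "Hangul"
    return "Other"


name = "20190226_Clop랜섬웨어_유포에_따른_감염_주의.pdf"  # 3rd sample row
print(Counter(script_of(ch) for ch in name))
print(Counter(block_of(ch) for ch in name))
```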

Most frequent character per block

ASCII
Value               Count  Frequency (%)
_                   178    25.1%
.                   44     6.2%
p                   39     5.5%
f                   35     4.9%
d                   33     4.7%
e                   29     4.1%
2                   22     3.1%
r                   19     2.7%
t                   19     2.7%
A                   18     2.5%
Other values (46)   272    38.4%

Hangul
Value               Count  Frequency (%)
                    22     5.3%
                    21     5.1%
                    14     3.4%
                    12     2.9%
                    11     2.7%
                    11     2.7%
                    9      2.2%
                    9      2.2%
                    8      1.9%
                    8      1.9%
Other values (121)  288    69.7%

Interactions

[Figures: 9 pairwise interaction plots]

Correlations

[Figure: correlation heatmap]
              생성월  생성일  생성시분초  분석보고서명
생성월        1.000   0.912   0.932       1.000
생성일        0.912   1.000   0.578       1.000
생성시분초    0.932   0.578   1.000       1.000
분석보고서명  1.000   1.000   1.000       1.000

[Figure: correlation heatmap]
              생성월  생성일  생성시분초
생성월         1.000  -0.155   0.547
생성일        -0.155   1.000  -0.185
생성시분초     0.547  -0.185   1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] A nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows
    생성년도  생성월  생성일  생성시분초  분석보고서명
0   2019      1       16      161131      2018년_하반기_악성코드_은닉사이트_탐지_동향_보고서.pdf
1   2019      2       1       0           _KISA__갠드크랩_분석_스폐셜_리포트.pdf
2   2019      2       26      0           20190226_Clop랜섬웨어_유포에_따른_감염_주의.pdf
3   2019      12      18      101648      사이버보안_빅데이터_활용사례(연구-국민대-윤명근_교수).pdf
4   2019      12      18      101648      사이버보안_빅데이터_활용사례(기업-두산디지털이노베이션BU-김민교_대리).pdf
5   2019      12      17      151411      KISA-포스터(2020년_7대_사이버_공격_전망).pdf
6   2019      12      17      151411      KISA-발표자료(2020년_7대_사이버_공격_전망).pdf
7   2019      10      25      104927      2019년_3분기_사이버_위협_동향_보고서.pdf
8   2019      10      25      104927      KISA_Cyber_Security_Issue_Report__Q3_2019.pdf
9   2019      9       25      104214      2._머신러닝을_활용한_피싱_사이트_탐지_방안.pdf

Last rows
    생성년도  생성월  생성일  생성시분초  분석보고서명
23  2019      4       17      0           AD_악용_랜섬웨어_유포사례_분석.pdf
24  2019      4       17      0           _KISA_Technical_Report.pdf
25  2019      4       17      0           _KISA_Technical_Report__Analysis_on_Cases_of_Distribution_of_Internal_Network_Ransomware_through_Exploiting_AD_Server.pdf
26  2019      4       15      0           2019년_1분기_사이버_위협_동향_보고서.pdf
27  2019      4       2       0           AD(Active_Directory)_관리자가_피해야_할_6가지_AD_운영_사례.pdf
28  2019      3       28      0           1._사이버보안빅데이터센터_소개.pdf
29  2019      3       28      0           2._Threat_Intelligence_수집방법_및_활용사례.pdf
30  2019      3       28      0           3._System_Anomaly_Analysis___Detection.pdf
31  2019      3       28      0           4._OPERATION_KITTY_PHISHING.pdf
32  2019      2       27      0           AD_관리자_계정_탈취를_통한_기업_내부망_장악_사례와_대응방안.pdf

Duplicate rows

Most frequently occurring

    생성년도  생성월  생성일  생성시분초  분석보고서명                            # duplicates
0   2019      7       18      0           공급망_공격_사례_분석_및_대응_방안.pdf  2
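The duplicate counts can be reproduced with a `Counter` over whole rows. In this sketch, "Duplicate rows" is read as the number of repeats after the first occurrence (the convention of pandas `DataFrame.duplicated()` with its default `keep="first"`) and "# duplicates" as the total number of occurrences of a repeated row. Both readings are assumptions that agree with the report's 1 and 2 here. Only three illustrative rows from the report are used, not the full 33-row dataset.

```python
from collections import Counter

# Illustrative rows (생성년도, 생성월, 생성일, 생성시분초, 분석보고서명) from this
# report: the duplicated row twice plus one other row. Assumption: a small
# excerpt, not the full dataset.
rows = [
    ("2019", 7, 18, 0, "공급망_공격_사례_분석_및_대응_방안.pdf"),
    ("2019", 7, 18, 0, "공급망_공격_사례_분석_및_대응_방안.pdf"),
    ("2019", 1, 16, 161131, "2018년_하반기_악성코드_은닉사이트_탐지_동향_보고서.pdf"),
]

occurrences = Counter(rows)  # identical tuples are counted together

# Repeats after the first occurrence (like DataFrame.duplicated(keep="first")).
repeat_count = sum(c - 1 for c in occurrences.values())
# Rows that occur more than once, with their total number of occurrences.
repeated = {row: c for row, c in occurrences.items() if c > 1}

print(repeat_count)                # 1
for row, c in repeated.items():
    print(row[-1], c)              # 공급망_공격_사례_분석_및_대응_방안.pdf 2
```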