Overview

Dataset statistics

Number of variables: 5
Number of observations: 8119
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 333.1 KiB
Average record size in memory: 42.0 B
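
The two memory figures above are related by simple division. As a quick sanity check, and assuming the report uses binary units (1 KiB = 1024 B), the average record size can be recomputed from the total size and the number of observations. The variable names are illustrative and not part of the report.

```python
# Recompute the average record size from the totals reported above.
# Assumption: 1 KiB = 1024 bytes.
total_kib = 333.1   # "Total size in memory"
n_rows = 8119       # "Number of observations"

avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B per record")
```

The result rounds to the 42.0 B shown in the report.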

Variable types

Numeric: 2
DateTime: 1
Text: 1
Categorical: 1

Dataset

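Description: Keyword search data from the 개인정보 On마당 (Personal Information On-Madang) website, covering May 2022 through June 2023. For each year and month, it lists the keywords entered and the number of times each was searched.
URL: https://www.data.go.kr/data/15119847/fileData.do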

Alerts

Imbalance: 키워드분류 is highly imbalanced (66.7%)
Unique: 번호 has unique values

Reproduction

Analysis started: 2023-12-12 12:07:57.090126
Analysis finished: 2023-12-12 12:07:58.343023
Duration: 1.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
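
The reported duration follows directly from the two timestamps above. A minimal sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 12:07:57.090126")
finished = datetime.fromisoformat("2023-12-12 12:07:58.343023")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```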

Variables

번호
Real number (ℝ)

UNIQUE 

Distinct: 8119
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4060
Minimum: 1
Maximum: 8119
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 71.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 406.9
Q1: 2030.5
Median: 4060
Q3: 6089.5
95-th percentile: 7713.1
Maximum: 8119
Range: 8118
Interquartile range (IQR): 4059

Descriptive statistics

Standard deviation: 2343.8978
Coefficient of variation (CV): 0.57731472
Kurtosis: -1.2
Mean: 4060
Median Absolute Deviation (MAD): 2030
Skewness: 0
Sum: 32963140
Variance: 5493856.7
Monotonicity: Strictly increasing
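
These figures are what one would expect if 번호 is simply a row counter running from 1 to 8119 (unique, strictly increasing, minimum 1, maximum 8119). The sketch below rebuilds that sequence under this assumption and recomputes the summary statistics. It assumes sample variance (n - 1 denominator) and linear-interpolation percentiles; the helper `percentile` is hypothetical and not part of ydata-profiling.

```python
import math
import statistics

n = 8119
ids = list(range(1, n + 1))  # assumption: 번호 is the row counter 1..n

total = sum(ids)
mean = total / n
variance = statistics.variance(ids)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                      # coefficient of variation


def percentile(q):
    """Linear-interpolation percentile of the sequence 1..n, q in [0, 1]."""
    return 1 + q * (n - 1)


print(total, mean, round(variance, 1), round(std, 4), round(cv, 8))
print([round(percentile(q), 1) for q in (0.05, 0.25, 0.5, 0.75, 0.95)])
```

Under these assumptions the recomputed sum, mean, variance, standard deviation, CV, and quantiles agree with the values reported for 번호.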
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
5409 | 1 | < 0.1%
5422 | 1 | < 0.1%
5421 | 1 | < 0.1%
5420 | 1 | < 0.1%
5419 | 1 | < 0.1%
5418 | 1 | < 0.1%
5417 | 1 | < 0.1%
5416 | 1 | < 0.1%
5415 | 1 | < 0.1%
Other values (8109) | 8109 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
8119 | 1 | < 0.1%
8118 | 1 | < 0.1%
8117 | 1 | < 0.1%
8116 | 1 | < 0.1%
8115 | 1 | < 0.1%
8114 | 1 | < 0.1%
8113 | 1 | < 0.1%
8112 | 1 | < 0.1%
8111 | 1 | < 0.1%
8110 | 1 | < 0.1%

입력확정일자
DateTime

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 63.6 KiB
Minimum: 2022-05-01 00:00:00
Maximum: 2023-06-01 00:00:00
Histogram with fixed size bins (bins=14)

키워드
Text

Distinct: 4788
Distinct (%): 59.0%
Missing: 0
Missing (%): 0.0%
Memory size: 63.6 KiB

Length

Max length: 10
Median length: 8
Mean length: 4.2433797
Min length: 1

Characters and Unicode

Total characters: 34452
Distinct characters: 626
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3631
Unique (%): 44.7%
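
The percentages and the mean length reported for this text variable can be reproduced from the raw counts. The sketch below assumes each ratio is taken over the 8119 observations, which is consistent with the figures shown; the variable names are illustrative.

```python
n_rows = 8119        # number of observations
distinct = 4788      # distinct keyword values
unique = 3631        # values that occur exactly once
total_chars = 34452  # total characters across all keywords

distinct_pct = 100 * distinct / n_rows
unique_pct = 100 * unique / n_rows
mean_length = total_chars / n_rows

print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique, "
      f"mean length {mean_length:.4f}")
```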

Sample

1st row:
2nd row:
3rd row: 가명
4th row: 가상
5th row: 가족

Value | Count | Frequency (%)
개인정보 | 257 | 2.4%
동의 | 136 | 1.3%
cctv | 128 | 1.2%
제공 | 114 | 1.1%
위탁 | 78 | 0.7%
수집 | 77 | 0.7%
파기 | 74 | 0.7%
직원 | 73 | 0.7%
정보 | 68 | 0.6%
제3자 | 61 | 0.6%
Other values (3285) | 9660 | 90.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2720 | 7.9%
 | 1238 | 3.6%
 | 1009 | 2.9%
 | 983 | 2.9%
 | 651 | 1.9%
 | 612 | 1.8%
 | 576 | 1.7%
 | 532 | 1.5%
 | 454 | 1.3%
 | 447 | 1.3%
Other values (616) | 25230 | 73.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29547 | 85.8%
Space Separator | 2720 | 7.9%
Lowercase Letter | 1044 | 3.0%
Uppercase Letter | 646 | 1.9%
Decimal Number | 454 | 1.3%
Other Punctuation | 28 | 0.1%
Dash Punctuation | 8 | < 0.1%
Close Punctuation | 2 | < 0.1%
Connector Punctuation | 2 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1238 | 4.2%
 | 1009 | 3.4%
 | 983 | 3.3%
 | 651 | 2.2%
 | 612 | 2.1%
 | 576 | 1.9%
 | 532 | 1.8%
 | 454 | 1.5%
 | 447 | 1.5%
 | 435 | 1.5%
Other values (547) | 22610 | 76.5%

Lowercase Letter
Value | Count | Frequency (%)
c | 219 | 21.0%
t | 106 | 10.2%
v | 102 | 9.8%
s | 94 | 9.0%
i | 80 | 7.7%
a | 70 | 6.7%
p | 46 | 4.4%
d | 45 | 4.3%
o | 37 | 3.5%
e | 37 | 3.5%
Other values (16) | 208 | 19.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 122 | 18.9%
I | 73 | 11.3%
S | 67 | 10.4%
D | 54 | 8.4%
T | 53 | 8.2%
V | 52 | 8.0%
A | 48 | 7.4%
P | 43 | 6.7%
B | 25 | 3.9%
G | 16 | 2.5%
Other values (11) | 93 | 14.4%

Decimal Number
Value | Count | Frequency (%)
3 | 129 | 28.4%
1 | 106 | 23.3%
2 | 66 | 14.5%
4 | 53 | 11.7%
0 | 31 | 6.8%
8 | 21 | 4.6%
5 | 19 | 4.2%
9 | 12 | 2.6%
7 | 10 | 2.2%
6 | 7 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
, | 13 | 46.4%
\ | 5 | 17.9%
/ | 5 | 17.9%
. | 2 | 7.1%
; | 1 | 3.6%
· | 1 | 3.6%
: | 1 | 3.6%

Space Separator
Value | Count | Frequency (%)
(space) | 2720 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 8 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
] | 2 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 29547 | 85.8%
Common | 3215 | 9.3%
Latin | 1690 | 4.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1238 | 4.2%
 | 1009 | 3.4%
 | 983 | 3.3%
 | 651 | 2.2%
 | 612 | 2.1%
 | 576 | 1.9%
 | 532 | 1.8%
 | 454 | 1.5%
 | 447 | 1.5%
 | 435 | 1.5%
Other values (547) | 22610 | 76.5%

Latin
Value | Count | Frequency (%)
c | 219 | 13.0%
C | 122 | 7.2%
t | 106 | 6.3%
v | 102 | 6.0%
s | 94 | 5.6%
i | 80 | 4.7%
I | 73 | 4.3%
a | 70 | 4.1%
S | 67 | 4.0%
D | 54 | 3.2%
Other values (37) | 703 | 41.6%

Common
Value | Count | Frequency (%)
(space) | 2720 | 84.6%
3 | 129 | 4.0%
1 | 106 | 3.3%
2 | 66 | 2.1%
4 | 53 | 1.6%
0 | 31 | 1.0%
8 | 21 | 0.7%
5 | 19 | 0.6%
, | 13 | 0.4%
9 | 12 | 0.4%
Other values (12) | 45 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 29507 | 85.6%
ASCII | 4904 | 14.2%
Compat Jamo | 40 | 0.1%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2720 | 55.5%
c | 219 | 4.5%
3 | 129 | 2.6%
C | 122 | 2.5%
1 | 106 | 2.2%
t | 106 | 2.2%
v | 102 | 2.1%
s | 94 | 1.9%
i | 80 | 1.6%
I | 73 | 1.5%
Other values (58) | 1153 | 23.5%

Hangul
Value | Count | Frequency (%)
 | 1238 | 4.2%
 | 1009 | 3.4%
 | 983 | 3.3%
 | 651 | 2.2%
 | 612 | 2.1%
 | 576 | 2.0%
 | 532 | 1.8%
 | 454 | 1.5%
 | 447 | 1.5%
 | 435 | 1.5%
Other values (527) | 22570 | 76.5%

Compat Jamo
Value | Count | Frequency (%)
 | 6 | 15.0%
 | 4 | 10.0%
 | 3 | 7.5%
 | 3 | 7.5%
 | 3 | 7.5%
 | 2 | 5.0%
 | 2 | 5.0%
 | 2 | 5.0%
 | 2 | 5.0%
 | 2 | 5.0%
Other values (10) | 11 | 27.5%

None
Value | Count | Frequency (%)
· | 1 | 100.0%

검색개수
Real number (ℝ)

Distinct: 87
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.4392167
Minimum: 1
Maximum: 304
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 71.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 4
95-th percentile: 16
Maximum: 304
Range: 303
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.4522031
Coefficient of variation (CV): 2.1292502
Kurtosis: 188.13809
Mean: 4.4392167
Median Absolute Deviation (MAD): 1
Skewness: 9.9219943
Sum: 36042
Variance: 89.344143
Monotonicity: Not monotonic
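
Several of the descriptive statistics for 검색개수 are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks these identities using only the reported figures (the raw data are not reproduced here); the variable names are illustrative.

```python
n_rows = 8119
reported_sum = 36042
reported_std = 9.4522031

mean = reported_sum / n_rows   # mean = sum / n
variance = reported_std ** 2   # variance = std^2
cv = reported_std / mean       # CV = std / mean

print(f"mean={mean:.7f} variance={variance:.4f} cv={cv:.6f}")
```

A CV above 2 together with a median of 2 and a maximum of 304 indicates a heavily right-skewed count distribution, in line with the reported skewness of about 9.9.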
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 3897 | 48.0%
2 | 1296 | 16.0%
3 | 657 | 8.1%
4 | 432 | 5.3%
5 | 289 | 3.6%
6 | 224 | 2.8%
7 | 191 | 2.4%
8 | 160 | 2.0%
9 | 111 | 1.4%
10 | 106 | 1.3%
Other values (77) | 756 | 9.3%

Minimum 10 values
Value | Count | Frequency (%)
1 | 3897 | 48.0%
2 | 1296 | 16.0%
3 | 657 | 8.1%
4 | 432 | 5.3%
5 | 289 | 3.6%
6 | 224 | 2.8%
7 | 191 | 2.4%
8 | 160 | 2.0%
9 | 111 | 1.4%
10 | 106 | 1.3%

Maximum 10 values
Value | Count | Frequency (%)
304 | 1 | < 0.1%
181 | 1 | < 0.1%
156 | 1 | < 0.1%
149 | 1 | < 0.1%
144 | 1 | < 0.1%
123 | 2 | < 0.1%
121 | 1 | < 0.1%
116 | 1 | < 0.1%
110 | 1 | < 0.1%
108 | 1 | < 0.1%

키워드분류
Categorical

IMBALANCE 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 63.6 KiB
Value | Count
기타 | 6441
직장 | 473
웹사이트 | 351
영상 | 176
휴대폰 | 161
Other values (13) | 517

Length

Max length: 7
Median length: 2
Mean length: 2.1704643
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 기타
2nd row: 기타
3rd row: 가명정보
4th row: 기타
5th row: 기타

Common Values

Value | Count | Frequency (%)
기타 | 6441 | 79.3%
직장 | 473 | 5.8%
웹사이트 | 351 | 4.3%
영상 | 176 | 2.2%
휴대폰 | 161 | 2.0%
금융 | 145 | 1.8%
녹취 | 89 | 1.1%
내부시스템 | 63 | 0.8%
계약 | 37 | 0.5%
SNS | 31 | 0.4%
Other values (8) | 152 | 1.9%
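
The frequency column in the table above is each count divided by the 8119 observations. The sketch below recomputes the shares from the listed counts and confirms that the top ten categories plus the "Other values" bucket account for every row. The dictionary `counts` is transcribed from the table; nothing here is read from the original file.

```python
# Category counts transcribed from the Common Values table.
counts = {
    "기타": 6441, "직장": 473, "웹사이트": 351, "영상": 176, "휴대폰": 161,
    "금융": 145, "녹취": 89, "내부시스템": 63, "계약": 37, "SNS": 31,
    "Other values (8)": 152,
}

n_rows = sum(counts.values())
shares = {name: round(100 * c / n_rows, 1) for name, c in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

With 기타 alone covering roughly four fifths of the rows, the imbalance alert raised for 키워드분류 is unsurprising.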

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
기타 | 6441 | 79.3%
직장 | 473 | 5.8%
웹사이트 | 351 | 4.3%
영상 | 176 | 2.2%
휴대폰 | 161 | 2.0%
금융 | 145 | 1.8%
녹취 | 89 | 1.1%
내부시스템 | 63 | 0.8%
계약 | 37 | 0.5%
이메일 | 31 | 0.4%
Other values (8) | 152 | 1.9%

Interactions

[Figure: interaction plots (4 panels)]

Correlations

 | 번호 | 입력확정일자 | 검색개수 | 키워드분류
번호 | 1.000 | 0.977 | 0.020 | 0.074
입력확정일자 | 0.977 | 1.000 | 0.000 | 0.098
검색개수 | 0.020 | 0.000 | 1.000 | 0.132
키워드분류 | 0.074 | 0.098 | 0.132 | 1.000

 | 번호 | 검색개수 | 키워드분류
번호 | 1.000 | -0.051 | 0.028
검색개수 | -0.051 | 1.000 | 0.059
키워드분류 | 0.028 | 0.059 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

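First rows
index | 번호 | 입력확정일자 | 키워드 | 검색개수 | 키워드분류
0 | 1 | 2022-05-01 |  | 1 | 기타
1 | 2 | 2022-05-01 |  | 1 | 기타
2 | 3 | 2022-05-01 | 가명 | 1 | 가명정보
3 | 4 | 2022-05-01 | 가상 | 2 | 기타
4 | 5 | 2022-05-01 | 가족 | 34 | 기타
5 | 6 | 2022-05-01 | 간편 | 1 | 기타
6 | 7 | 2022-05-01 | 감사 | 25 | 기타
7 | 8 | 2022-05-01 | 같은 | 2 | 기타
8 | 9 | 2022-05-01 | 개인 | 1 | 기타
9 | 10 | 2022-05-01 | 걸음 | 1 | 기타

Last rows
index | 번호 | 입력확정일자 | 키워드 | 검색개수 | 키워드분류
8109 | 8110 | 2023-06-01 | IDFA ADIA | 1 | 기타
8110 | 8111 | 2023-06-01 | IDFA ADIA | 1 | 기타
8111 | 8112 | 2023-06-01 | ip | 6 | 기타
8112 | 8113 | 2023-06-01 | PG사 | 3 | 기타
8113 | 8114 | 2023-06-01 | PG사 | 1 | 기타
8114 | 8115 | 2023-06-01 | PG사 재위탁 | 1 | 기타
8115 | 8116 | 2023-06-01 | PG사 재위탁 | 1 | 기타
8116 | 8117 | 2023-06-01 | SNS | 6 | SNS
8117 | 8118 | 2023-06-01 | tlrqufwk | 1 | 기타
8118 | 8119 | 2023-06-01 | vdi | 1 | 기타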