Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 17
Duplicate rows (%): 0.2%
Total size in memory: 322.3 KiB
Average record size in memory: 33.0 B

Variable types

Text: 1
Numeric: 1
DateTime: 1

Dataset

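Description: Related-keyword information for each menu of the institution's main website. The dataset provides the keyword, view count, menu ID, and keyword registration date.
Author: 한국보건산업진흥원 (Korea Health Industry Development Institute)
URL: https://www.data.go.kr/data/15122043/fileData.do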

Alerts

Dataset has 17 (0.2%) duplicate rows (Duplicates)
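The duplicate-row alert can be reproduced in spirit with a few lines of standard-library Python. This is a minimal sketch, not the profiler's own code: the function name `duplicate_summary` and the tuple-based row representation are assumptions, and the report does not state whether its figure of 17 counts distinct duplicated rows or extra copies, so the sketch returns both.

```python
from collections import Counter

def duplicate_summary(rows):
    """Summarise exact duplicate rows in a list of hashable row tuples."""
    counts = Counter(rows)
    # Distinct rows that occur more than once.
    duplicated_groups = sum(1 for c in counts.values() if c > 1)
    # Extra copies beyond the first occurrence of each row.
    extra_copies = sum(c - 1 for c in counts.values() if c > 1)
    total = len(rows)
    pct = 100 * duplicated_groups / total if total else 0.0
    return {
        "duplicated_groups": duplicated_groups,
        "extra_copies": extra_copies,
        "duplicated_groups_pct": pct,
    }

# Illustrative rows only; the first two are identical.
rows = [
    ("영국", 2660, "2016-11-13"),
    ("영국", 2660, "2016-11-13"),
    ("교육", 4974, "2016-09-21"),
]
summary = duplicate_summary(rows)
```

With the report's own figures, 17 out of 10000 rows is 0.17%, which rounds to the 0.2% shown in the alert.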

Reproduction

Analysis started: 2023-12-12 05:55:51.306995
Analysis finished: 2023-12-12 05:55:52.032683
Duration: 0.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

키워드 (keyword)
Text

Distinct: 4108
Distinct (%): 41.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 97
Median length: 83
Mean length: 4.9841
Min length: 1
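Length statistics of this kind can be computed directly from the text values. The sketch below is an illustration under assumptions: `length_stats` is a hypothetical helper, lengths are counted in Unicode code points (Python's `len`), and the five inputs are the sample keywords listed in this report rather than the full column.

```python
from statistics import mean, median

def length_stats(values):
    """Return basic length statistics for a non-empty list of strings."""
    lengths = [len(v) for v in values]  # code points per value
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "total_characters": sum(lengths),
    }

sample = ["정책보럼", "제약산어", "쇼케이스", "한국의료기기산업협회", "투자활성화"]
stats = length_stats(sample)
```

For the full column, the reported mean length of 4.9841 matches the reported total of 49841 characters divided by 10000 observations.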

Characters and Unicode

Total characters: 49841
Distinct characters: 712
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
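As a rough illustration of how per-category character counts can be derived, the sketch below uses Python's standard `unicodedata` module. The helper name `category_counts` is an assumption, and the module returns two-letter codes (for example `Lo` for Other Letter, `Lu` for Uppercase Letter, `Nd` for Decimal Number, `Zs` for Space Separator) rather than the long names used in this report.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Three keywords taken from the tables in this report.
counts = category_counts(["MOA 체결", "20회", "KOHES뉴스레터"])
```

Script and block names are not exposed by `unicodedata`, so reproducing those two breakdowns would need an additional data source.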

Unique

Unique: 3056
Unique (%): 30.6%
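The gap between the Distinct (4108) and Unique (3056) figures is easiest to see in code. This sketch assumes that "distinct" counts the different values present and "unique" counts the values that occur exactly once; the helper name `distinct_and_unique` is hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

n_distinct, n_unique = distinct_and_unique(["의료기기", "의료기기", "중국", "동향"])
```

Under that reading, 4108 - 3056 = 1052 keywords appear more than once in the column.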

Sample

1st row: 정책보럼
2nd row: 제약산어
3rd row: 쇼케이스
4th row: 한국의료기기산업협회
5th row: 투자활성화
Value | Count | Frequency (%)
의료기기 | 416 | 3.7%
한국보건산업진흥원 | 201 | 1.8%
보건산업 | 182 | 1.6%
중국 | 108 | 1.0%
글로벌 | 106 | 0.9%
식품의약품안전처 | 105 | 0.9%
의료해외진출 | 104 | 0.9%
보건복지부 | 94 | 0.8%
동향 | 90 | 0.8%
지원사업 | 83 | 0.7%
Other values (3920) | 9768 | 86.8%
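A frequency table with an "Other values" roll-up like the one above can be built as follows. This is a sketch under assumptions: `top_values` is a hypothetical helper, and the input is taken to be a flat list of tokens. The counts in the table sum to 11257, more than the 10000 observations, which suggests the table counts whitespace-separated words rather than whole values; that is an inference, not something the report states.

```python
from collections import Counter

def top_values(tokens, k=10):
    """Return the k most common tokens plus a roll-up of everything else."""
    counts = Counter(tokens)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    rows = [(value, count, 100 * count / total) for value, count in ranked[:k]]
    rest = [count for _, count in ranked[k:]]
    # (number of distinct other values, their combined count, combined share)
    other = (len(rest), sum(rest), 100 * sum(rest) / total) if rest else None
    return rows, other

rows, other = top_values("a a a b b c d".split(), k=2)
```

As a check against the table, 416 / 11257 is about 3.7%, matching the first row.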

Most occurring characters

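Value | Count | Frequency (%)
(not preserved) | 2193 | 4.4%
(not preserved) | 1796 | 3.6%
(not preserved) | 1599 | 3.2%
(not preserved) | 1314 | 2.6%
(space) | 1258 | 2.5%
(not preserved) | 1155 | 2.3%
(not preserved) | 915 | 1.8%
(not preserved) | 858 | 1.7%
(not preserved) | 857 | 1.7%
(not preserved) | 840 | 1.7%
Other values (702) | 37056 | 74.3%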

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 42553 | 85.4%
Uppercase Letter | 2378 | 4.8%
Lowercase Letter | 1990 | 4.0%
Space Separator | 1258 | 2.5%
Decimal Number | 1218 | 2.4%
Other Punctuation | 245 | 0.5%
Dash Punctuation | 107 | 0.2%
Open Punctuation | 37 | 0.1%
Close Punctuation | 35 | 0.1%
Math Symbol | 16 | < 0.1%
Other values (3) | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2193 | 5.2%
(not preserved) | 1796 | 4.2%
(not preserved) | 1599 | 3.8%
(not preserved) | 1314 | 3.1%
(not preserved) | 1155 | 2.7%
(not preserved) | 915 | 2.2%
(not preserved) | 858 | 2.0%
(not preserved) | 857 | 2.0%
(not preserved) | 840 | 2.0%
(not preserved) | 692 | 1.6%
Other values (618) | 30334 | 71.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 218 | 9.2%
I | 201 | 8.5%
K | 180 | 7.6%
E | 162 | 6.8%
D | 156 | 6.6%
R | 147 | 6.2%
S | 143 | 6.0%
T | 141 | 5.9%
O | 138 | 5.8%
M | 126 | 5.3%
Other values (17) | 766 | 32.2%

Lowercase Letter
Value | Count | Frequency (%)
e | 231 | 11.6%
a | 217 | 10.9%
i | 180 | 9.0%
t | 162 | 8.1%
o | 156 | 7.8%
r | 155 | 7.8%
n | 124 | 6.2%
l | 117 | 5.9%
c | 89 | 4.5%
d | 86 | 4.3%
Other values (16) | 473 | 23.8%

Decimal Number
Value | Count | Frequency (%)
2 | 302 | 24.8%
1 | 276 | 22.7%
0 | 267 | 21.9%
9 | 76 | 6.2%
7 | 69 | 5.7%
8 | 61 | 5.0%
3 | 58 | 4.8%
4 | 53 | 4.4%
6 | 31 | 2.5%
5 | 25 | 2.1%

Other Punctuation
Value | Count | Frequency (%)
, | 146 | 59.6%
. | 39 | 15.9%
& | 36 | 14.7%
· | 9 | 3.7%
' | 8 | 3.3%
/ | 3 | 1.2%
" | 2 | 0.8%
% | 1 | 0.4%
* | 1 | 0.4%

Math Symbol
Value | Count | Frequency (%)
> | 6 | 37.5%
< | 6 | 37.5%
+ | 4 | 25.0%

Open Punctuation
Value | Count | Frequency (%)
( | 36 | 97.3%
[ | 1 | 2.7%

Close Punctuation
Value | Count | Frequency (%)
) | 34 | 97.1%
] | 1 | 2.9%

Space Separator
Value | Count | Frequency (%)
(space) | 1258 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 107 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 42546 | 85.4%
Latin | 4368 | 8.8%
Common | 2919 | 5.9%
Han | 8 | < 0.1%

Most frequent character per script

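Hangul
Value | Count | Frequency (%)
(not preserved) | 2193 | 5.2%
(not preserved) | 1796 | 4.2%
(not preserved) | 1599 | 3.8%
(not preserved) | 1314 | 3.1%
(not preserved) | 1155 | 2.7%
(not preserved) | 915 | 2.2%
(not preserved) | 858 | 2.0%
(not preserved) | 857 | 2.0%
(not preserved) | 840 | 2.0%
(not preserved) | 692 | 1.6%
Other values (611) | 30327 | 71.3%

Latin
Value | Count | Frequency (%)
e | 231 | 5.3%
A | 218 | 5.0%
a | 217 | 5.0%
I | 201 | 4.6%
K | 180 | 4.1%
i | 180 | 4.1%
E | 162 | 3.7%
t | 162 | 3.7%
o | 156 | 3.6%
D | 156 | 3.6%
Other values (43) | 2505 | 57.3%

Common
Value | Count | Frequency (%)
(space) | 1258 | 43.1%
2 | 302 | 10.3%
1 | 276 | 9.5%
0 | 267 | 9.1%
, | 146 | 5.0%
- | 107 | 3.7%
9 | 76 | 2.6%
7 | 69 | 2.4%
8 | 61 | 2.1%
3 | 58 | 2.0%
Other values (20) | 299 | 10.2%

Han
Value | Count | Frequency (%)
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%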

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 42542 | 85.4%
ASCII | 7276 | 14.6%
None | 11 | < 0.1%
CJK | 8 | < 0.1%
Compat Jamo | 3 | < 0.1%
Punctuation | 1 | < 0.1%

Most frequent character per block

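Hangul
Value | Count | Frequency (%)
(not preserved) | 2193 | 5.2%
(not preserved) | 1796 | 4.2%
(not preserved) | 1599 | 3.8%
(not preserved) | 1314 | 3.1%
(not preserved) | 1155 | 2.7%
(not preserved) | 915 | 2.2%
(not preserved) | 858 | 2.0%
(not preserved) | 857 | 2.0%
(not preserved) | 840 | 2.0%
(not preserved) | 692 | 1.6%
Other values (608) | 30323 | 71.3%

ASCII
Value | Count | Frequency (%)
(space) | 1258 | 17.3%
2 | 302 | 4.2%
1 | 276 | 3.8%
0 | 267 | 3.7%
e | 231 | 3.2%
A | 218 | 3.0%
a | 217 | 3.0%
I | 201 | 2.8%
K | 180 | 2.5%
i | 180 | 2.5%
Other values (70) | 3946 | 54.2%

None
Value | Count | Frequency (%)
· | 9 | 81.8%
(not preserved) | 1 | 9.1%
(not preserved) | 1 | 9.1%

Compat Jamo
Value | Count | Frequency (%)
(not preserved) | 2 | 66.7%
(not preserved) | 1 | 33.3%

Punctuation
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

CJK
Value | Count | Frequency (%)
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%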

조회수 (view count)
Real number (ℝ)

Distinct: 5020
Distinct (%): 50.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4213.558
Minimum: 1
Maximum: 78677
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 1993
Median: 3649
Q3: 5516
95-th percentile: 10846
Maximum: 78677
Range: 78676
Interquartile range (IQR): 3523

Descriptive statistics

Standard deviation: 3974.7822
Coefficient of variation (CV): 0.94333155
Kurtosis: 41.568381
Mean: 4213.558
Median Absolute Deviation (MAD): 1762
Skewness: 3.9318059
Sum: 42135580
Variance: 15798894
Monotonicity: Not monotonic
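Several of the figures above are linked by simple identities, which gives a quick way to sanity-check the report. The sketch below takes the reported values as inputs and recomputes the derived ones; the variable names are illustrative, and small differences are expected because the reported inputs are rounded.

```python
import math

# Values as reported for 조회수.
mean = 4213.558
std = 3974.7822
n = 10000
q1, q3 = 1993, 5516
minimum, maximum = 1, 78677

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values
variance = std ** 2              # variance from the standard deviation

checks = {
    "cv": math.isclose(cv, 0.94333155, rel_tol=1e-6),
    "iqr": iqr == 3523,
    "range": value_range == 78676,
    "sum": math.isclose(total, 42135580, rel_tol=1e-9),
    "variance": abs(variance - 15798894) < 1,
}
```

All five checks hold for the reported numbers, so the summary is internally consistent on these points.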
Histogram with fixed size bins (bins=50)
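For readers who want to rebuild a comparable histogram, the following is a minimal sketch of fixed-width binning. It assumes equal-width bins spanning the minimum to the maximum, with the maximum falling in the last bin; the report only states `bins=50`, so the exact edge handling is an assumption, and `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins=50):
    """Count a non-empty list of numbers into equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # All values identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # Clamp so the maximum lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

example = fixed_width_histogram([1, 2, 3, 4, 10], bins=3)
```

With the reported range of 78676 and 50 bins, each bin would be about 1573.5 wide, so most observations (Q3 is 5516) would fall in the first four bins.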
Common values
Value | Count | Frequency (%)
1 | 416 | 4.2%
2 | 216 | 2.2%
3 | 148 | 1.5%
4 | 81 | 0.8%
6 | 54 | 0.5%
5 | 52 | 0.5%
7 | 47 | 0.5%
8 | 21 | 0.2%
9 | 21 | 0.2%
12 | 18 | 0.2%
Other values (5010) | 8926 | 89.3%

Minimum 10 values
Value | Count | Frequency (%)
1 | 416 | 4.2%
2 | 216 | 2.2%
3 | 148 | 1.5%
4 | 81 | 0.8%
5 | 52 | 0.5%
6 | 54 | 0.5%
7 | 47 | 0.5%
8 | 21 | 0.2%
9 | 21 | 0.2%
10 | 17 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
78677 | 1 | < 0.1%
70841 | 2 | < 0.1%
58211 | 1 | < 0.1%
44409 | 1 | < 0.1%
43769 | 2 | < 0.1%
42086 | 1 | < 0.1%
39176 | 2 | < 0.1%
38487 | 1 | < 0.1%
35538 | 1 | < 0.1%
35080 | 1 | < 0.1%

키워드 등록 날짜 (keyword registration date)
DateTime

Distinct: 1674
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2006-02-01 00:00:00
Maximum: 2021-07-01 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

[Figure: interactions plot]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

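First rows
(index) | 키워드 (keyword) | 조회수 (view count) | 키워드 등록 날짜 (keyword registration date)
56904 | 정책보럼 | 5753 | 2018-11-12
11051 | 제약산어 | 11772 | 2016-02-19
32603 | 쇼케이스 | 11530 | 2017-07-27
21112 | 한국의료기기산업협회 | 2987 | 2016-12-01
49943 | 투자활성화 | 7548 | 2018-05-08
31359 | 활용 | 2963 | 2017-07-03
18517 | 교육 | 4974 | 2016-09-21
51881 | 식품의약품안전처 | 4289 | 2018-06-29
60460 | 비품의 관리 | 1014 | 2019-01-15
20087 | 의료기관 | 1179 | 2016-11-07

Last rows
(index) | 키워드 (keyword) | 조회수 (view count) | 키워드 등록 날짜 (keyword registration date)
18340 | 특허연계 | 5343 | 2016-09-13
30940 | 의료기기 | 3666 | 2017-06-26
50528 | 보건복지부 | 7042 | 2018-05-23
54707 | 평가결과 | 10883 | 2018-09-19
91883 | 러시아 | 8383 | 2019-12-03
255904318 (index, keyword, and view count fused in the source; boundaries unclear) | 2017-03-06
44924 | 20회 | 4167 | 2018-01-23
68287 | 인도 | 6698 | 2019-07-09
20909 | 보건의료 | 2 | 2016-11-25
93379 | 글로벌 | 7409 | 2020-11-26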

Duplicate rows

Most frequently occurring

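(index) | 키워드 (keyword) | 조회수 (view count) | 키워드 등록 날짜 (keyword registration date) | # duplicates
0 | <script>alert('XSS')</script> | 2 | 2019-07-15 | 2
1 | KOHES뉴스레터 | 1 | 2018-10-15 | 2
2 | MOA 체결 | 3 | 2016-03-31 | 2
3 | 교육비 | 2 | 2016-10-21 | 2
4 | 김현정기자 | 3 | 2016-08-31 | 2
5 | 뉴스레터 | 3 | 2017-09-20 | 2
6 | 서울경제 | 1 | 2017-05-19 | 2
7 | 영국 | 2660 | 2016-11-13 | 2
8 | 의료 한류 | 1 | 2016-07-04 | 2
9 | 의료기기 | 1 | 2019-02-26 | 2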