Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 400.4 KiB
Average record size in memory: 41.0 B
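The average record size is the total memory footprint divided by the number of observations. A quick check of the reported figures, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not taken from the profiler):

```python
# Values as reported in the dataset statistics.
total_size_kib = 400.4
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # 41.0 B
```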

Variable types

Text: 2
DateTime: 1
Numeric: 1

Dataset

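Description: Daily user search-term data from OASIS, the traditional medicine information portal (전통의학정보포털 오아시스). Each record consists of a keyword (키워드), a user (사용자), a registration date (등록일), and a search count (검색건수).
Author: Korea Institute of Oriental Medicine (한국한의학연구원)
URL: https://www.data.go.kr/data/15086067/fileData.do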

Reproduction

Analysis started: 2023-12-12 18:28:42.796969
Analysis finished: 2023-12-12 18:28:43.589937
Duration: 0.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
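A report like this one could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used: the CSV path, the report title, and the `encoding` argument are assumptions, and the function needs the third-party `pandas` and `ydata-profiling` packages to be installed.

```python
def build_report(csv_path, output_path="report.html", encoding=None):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so the snippet can be loaded without the
    # third-party packages; they are only needed when it is called.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: 등록일 holds dates, so parse it to get a DateTime column.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["등록일"])

    report = ProfileReport(
        df,
        title="OASIS search keywords",  # hypothetical title
        dataset={
            "description": "Daily user search terms from the OASIS portal",
            "author": "한국한의학연구원",
            "url": "https://www.data.go.kr/data/15086067/fileData.do",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("oasis_keywords.csv")` (a hypothetical filename) would write `report.html` next to the script.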

Variables

키워드 (keyword)
Text

Distinct: 4596
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 102
Median length: 40
Mean length: 4.3868
Min length: 1
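Length statistics are computed over the character length of each value; the reported mean is the total character count divided by the number of observations (43868 / 10000 = 4.3868). A small sketch using the five sample keywords listed in this report (the list name is illustrative):

```python
from statistics import mean, median

# The five sample keywords shown in this report.
keywords = ["불면증", "2000-10-22", "갑상선", "Phellodendri", "대한한방안이비인후피부과학회"]

# len() on a Python str counts Unicode code points, not bytes.
lengths = [len(k) for k in keywords]

print(lengths)          # [3, 10, 3, 12, 14]
print(max(lengths))     # 14
print(median(lengths))  # 10
print(mean(lengths))    # 8.4
print(min(lengths))     # 3
```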

Characters and Unicode

Total characters: 43868
Distinct characters: 1018
Distinct categories: 12
Distinct scripts: 6
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
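As an illustration of the category property, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over three of the sample keywords; the function name is illustrative, and this is not necessarily how the profiler computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)

counts = category_counts(["불면증", "2000-10-22", "Phellodendri"])

# Lo = Other Letter, Nd = Decimal Number, Pd = Dash Punctuation,
# Lu = Uppercase Letter, Ll = Lowercase Letter
for code, n in counts.most_common():
    print(code, n)
```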

Unique

Unique: 3135
Unique (%): 31.4%
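"Distinct" counts how many different values occur, while "Unique" counts the values that occur exactly once; both percentages are relative to the number of observations. A toy sketch of the difference (the data below is made up for illustration):

```python
from collections import Counter

# Toy data, not taken from the dataset.
values = ["한약", "한약", "약침", "비만", "비만", "감초"]
counts = Counter(values)

n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen once

print(n_distinct, n_unique)  # 4 2
print(f"{n_distinct / len(values):.1%}")  # 66.7%
```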

Sample

1st row: 불면증
2nd row: 2000-10-22
3rd row: 갑상선
4th row: Phellodendri
5th row: 대한한방안이비인후피부과학회
Value                   Count   Frequency (%)
(not shown)             104     1.0%
한약                    92      0.9%
acupuncture             65      0.6%
약침                    54      0.5%
비만                    49      0.5%
증례                    43      0.4%
아토피                  43      0.4%
추나                    43      0.4%
(not shown)             40      0.4%
감초                    35      0.3%
Other values (4552)     9436    94.3%

Most occurring characters

Value                   Count   Frequency (%)
0                       1964    4.5%
e                       1467    3.3%
a                       1415    3.2%
-                       1339    3.1%
i                       1256    2.9%
2                       1086    2.5%
n                       1082    2.5%
t                       1048    2.4%
r                       1026    2.3%
o                       968     2.2%
Other values (1008)     31217   71.2%

Most occurring categories

Value                   Count   Frequency (%)
Other Letter            21885   49.9%
Lowercase Letter        14783   33.7%
Decimal Number          5620    12.8%
Dash Punctuation        1339    3.1%
Uppercase Letter        155     0.4%
Other Punctuation       61      0.1%
Math Symbol             10      < 0.1%
Connector Punctuation   8       < 0.1%
Space Separator         4       < 0.1%
Modifier Symbol         1       < 0.1%
Other values (2)        2       < 0.1%

Most frequent character per category

Other Letter
Value                   Count   Frequency (%)
(not shown)             454     2.1%
(not shown)             372     1.7%
(not shown)             314     1.4%
(not shown)             310     1.4%
(not shown)             297     1.4%
(not shown)             270     1.2%
(not shown)             269     1.2%
(not shown)             267     1.2%
(not shown)             263     1.2%
(not shown)             255     1.2%
Other values (927)      18814   86.0%

Lowercase Letter
Value                   Count   Frequency (%)
e                       1467    9.9%
a                       1415    9.6%
i                       1256    8.5%
n                       1082    7.3%
t                       1048    7.1%
r                       1026    6.9%
o                       968     6.5%
c                       931     6.3%
s                       816     5.5%
u                       729     4.9%
Other values (17)       4045    27.4%

Uppercase Letter
Value                   Count   Frequency (%)
C                       18      11.6%
L                       17      11.0%
A                       14      9.0%
T                       13      8.4%
P                       11      7.1%
R                       10      6.5%
S                       7       4.5%
H                       6       3.9%
E                       6       3.9%
U                       6       3.9%
Other values (15)       47      30.3%

Decimal Number
Value                   Count   Frequency (%)
0                       1964    34.9%
2                       1086    19.3%
1                       960     17.1%
9                       267     4.8%
5                       266     4.7%
8                       247     4.4%
3                       226     4.0%
6                       219     3.9%
4                       207     3.7%
7                       178     3.2%

Other Punctuation
Value                   Count   Frequency (%)
/                       34      55.7%
:                       12      19.7%
&                       7       11.5%
·                       3       4.9%
¡                       2       3.3%
#                       2       3.3%
\                       1       1.6%

Math Symbol
Value                   Count   Frequency (%)
=                       4       40.0%
+                       2       20.0%
(not shown)             2       20.0%
÷                       1       10.0%
~                       1       10.0%

Space Separator
Value                   Count   Frequency (%)
(whitespace)            2       50.0%
(whitespace)            2       50.0%

Dash Punctuation
Value                   Count   Frequency (%)
-                       1339    100.0%

Connector Punctuation
Value                   Count   Frequency (%)
_                       8       100.0%

Modifier Symbol
Value                   Count   Frequency (%)
¨                       1       100.0%

Other Number
Value                   Count   Frequency (%)
(not shown)             1       100.0%

Letter Number
Value                   Count   Frequency (%)
(not shown)             1       100.0%

Most occurring scripts

Value                   Count   Frequency (%)
Hangul                  21123   48.2%
Latin                   14938   34.1%
Common                  7044    16.1%
Han                     759     1.7%
Katakana                3       < 0.1%
Greek                   1       < 0.1%

Most frequent character per script

Hangul
Value                   Count   Frequency (%)
(not shown)             454     2.1%
(not shown)             372     1.8%
(not shown)             314     1.5%
(not shown)             310     1.5%
(not shown)             297     1.4%
(not shown)             270     1.3%
(not shown)             269     1.3%
(not shown)             267     1.3%
(not shown)             263     1.2%
(not shown)             255     1.2%
Other values (598)      18052   85.5%

Han
Value                   Count   Frequency (%)
(not shown)             31      4.1%
(not shown)             21      2.8%
(not shown)             19      2.5%
(not shown)             17      2.2%
(not shown)             15      2.0%
(not shown)             12      1.6%
(not shown)             11      1.4%
(not shown)             10      1.3%
(not shown)             10      1.3%
(not shown)             9       1.2%
Other values (316)      604     79.6%

Latin
Value                   Count   Frequency (%)
e                       1467    9.8%
a                       1415    9.5%
i                       1256    8.4%
n                       1082    7.2%
t                       1048    7.0%
r                       1026    6.9%
o                       968     6.5%
c                       931     6.2%
s                       816     5.5%
u                       729     4.9%
Other values (42)       4200    28.1%

Common
Value                   Count   Frequency (%)
0                       1964    27.9%
-                       1339    19.0%
2                       1086    15.4%
1                       960     13.6%
9                       267     3.8%
5                       266     3.8%
8                       247     3.5%
3                       226     3.2%
6                       219     3.1%
4                       207     2.9%
Other values (18)       263     3.7%

Katakana
Value                   Count   Frequency (%)
(not shown)             1       33.3%
(not shown)             1       33.3%
(not shown)             1       33.3%

Greek
Value                   Count   Frequency (%)
γ                       1       100.0%

Most occurring blocks

Value                   Count   Frequency (%)
ASCII                   21967   50.1%
Hangul                  21098   48.1%
CJK                     738     1.7%
Compat Jamo             25      0.1%
CJK Compat Ideographs   21      < 0.1%
None                    13      < 0.1%
Katakana                3       < 0.1%
Math Operators          2       < 0.1%
Number Forms            1       < 0.1%
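A Unicode block is a contiguous range of code points, so assigning a character to a block is a range lookup. The sketch below hard-codes only the blocks named in the table above, using the report's abbreviated labels. Returning "None" for anything outside those ranges is an assumption made for illustration; it is not the profiler's actual lookup table.

```python
from collections import Counter

# (first code point, last code point, label): a small subset of Unicode blocks.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2200, 0x22FF, "Math Operators"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3130, 0x318F, "Compat Jamo"),
    (0x4E00, 0x9FFF, "CJK"),
    (0xAC00, 0xD7AF, "Hangul"),
    (0xF900, 0xFAFF, "CJK Compat Ideographs"),
]

def block_of(ch):
    """Return the label of the block containing ch, or "None" if unlisted."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "None"

# "\ubd88" is 불 (Hangul syllable), "\u314e" is ㅎ (compatibility jamo),
# "\u03b3" is γ (Greek, not in the table above).
sample = "a0-" + "\ubd88" + "\u314e" + "\u03b3"
print(Counter(block_of(ch) for ch in sample))
```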

Most frequent character per block

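ASCII
Value                   Count   Frequency (%)
0                       1964    8.9%
e                       1467    6.7%
a                       1415    6.4%
-                       1339    6.1%
i                       1256    5.7%
2                       1086    4.9%
n                       1082    4.9%
t                       1048    4.8%
r                       1026    4.7%
o                       968     4.4%
Other values (61)       9316    42.4%

Hangul
Value                   Count   Frequency (%)
(not shown)             454     2.2%
(not shown)             372     1.8%
(not shown)             314     1.5%
(not shown)             310     1.5%
(not shown)             297     1.4%
(not shown)             270     1.3%
(not shown)             269     1.3%
(not shown)             267     1.3%
(not shown)             263     1.2%
(not shown)             255     1.2%
Other values (587)      18027   85.4%

CJK
Value                   Count   Frequency (%)
(not shown)             31      4.2%
(not shown)             21      2.8%
(not shown)             19      2.6%
(not shown)             17      2.3%
(not shown)             15      2.0%
(not shown)             12      1.6%
(not shown)             11      1.5%
(not shown)             10      1.4%
(not shown)             10      1.4%
(not shown)             9       1.2%
Other values (304)      583     79.0%

Compat Jamo
Value                   Count   Frequency (%)
(not shown)             8       32.0%
(not shown)             4       16.0%
(not shown)             4       16.0%
(not shown)             2       8.0%
(not shown)             1       4.0%
(not shown)             1       4.0%
(not shown)             1       4.0%
(not shown)             1       4.0%
(not shown)             1       4.0%
(not shown)             1       4.0%

CJK Compat Ideographs
Value                   Count   Frequency (%)
(not shown)             4       19.0%
(not shown)             3       14.3%
(not shown)             3       14.3%
(not shown)             3       14.3%
(not shown)             1       4.8%
(not shown)             1       4.8%
(not shown)             1       4.8%
(not shown)             1       4.8%
(not shown)             1       4.8%
(not shown)             1       4.8%
Other values (2)        2       9.5%

None
Value                   Count   Frequency (%)
·                       3       23.1%
(whitespace)            2       15.4%
¡                       2       15.4%
Æ                       2       15.4%
÷                       1       7.7%
¨                       1       7.7%
(not shown)             1       7.7%
γ                       1       7.7%

Math Operators
Value                   Count   Frequency (%)
(not shown)             2       100.0%

Number Forms
Value                   Count   Frequency (%)
(not shown)             1       100.0%

Katakana
Value                   Count   Frequency (%)
(not shown)             1       33.3%
(not shown)             1       33.3%
(not shown)             1       33.3%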
사용자 (user)
Text

Distinct: 679
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 60000
Distinct characters: 45
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 303
Unique (%): 3.0%

Sample

1st row: guesOO
2nd row: guesOO
3rd row: guesOO
4th row: guesOO
5th row: gateOO
Value                   Count   Frequency (%)
guesoo                  6447    64.5%
solaoo                  167     1.7%
pkhgoo                  97      1.0%
pastoo                  89      0.9%
gll1oo                  69      0.7%
kwonoo                  59      0.6%
ssamoo                  51      0.5%
fanuoo                  48      0.5%
eunsoo                  45      0.4%
min9oo                  42      0.4%
Other values (668)      2886    28.9%

Most occurring characters

Value                   Count   Frequency (%)
O                       20006   33.3%
s                       7470    12.4%
e                       7218    12.0%
u                       6985    11.6%
g                       6923    11.5%
a                       1092    1.8%
o                       1062    1.8%
l                       880     1.5%
n                       845     1.4%
i                       833     1.4%
Other values (35)       6686    11.1%

Most occurring categories

Value                   Count   Frequency (%)
Lowercase Letter        39266   65.4%
Uppercase Letter        20024   33.4%
Decimal Number          710     1.2%

Most frequent character per category

Lowercase Letter
Value                   Count   Frequency (%)
s                       7470    19.0%
e                       7218    18.4%
u                       6985    17.8%
g                       6923    17.6%
a                       1092    2.8%
o                       1062    2.7%
l                       880     2.2%
n                       845     2.2%
i                       833     2.1%
k                       660     1.7%
Other values (16)       5298    13.5%

Decimal Number
Value                   Count   Frequency (%)
1                       200     28.2%
0                       197     27.7%
9                       91      12.8%
3                       55      7.7%
2                       32      4.5%
4                       31      4.4%
6                       30      4.2%
8                       30      4.2%
5                       26      3.7%
7                       18      2.5%

Uppercase Letter
Value                   Count   Frequency (%)
O                       20006   99.9%
M                       5       < 0.1%
N                       5       < 0.1%
T                       2       < 0.1%
D                       2       < 0.1%
P                       1       < 0.1%
H                       1       < 0.1%
B                       1       < 0.1%
J                       1       < 0.1%

Most occurring scripts

Value                   Count   Frequency (%)
Latin                   59290   98.8%
Common                  710     1.2%

Most frequent character per script

Latin
Value                   Count   Frequency (%)
O                       20006   33.7%
s                       7470    12.6%
e                       7218    12.2%
u                       6985    11.8%
g                       6923    11.7%
a                       1092    1.8%
o                       1062    1.8%
l                       880     1.5%
n                       845     1.4%
i                       833     1.4%
Other values (25)       5976    10.1%

Common
Value                   Count   Frequency (%)
1                       200     28.2%
0                       197     27.7%
9                       91      12.8%
3                       55      7.7%
2                       32      4.5%
4                       31      4.4%
6                       30      4.2%
8                       30      4.2%
5                       26      3.7%
7                       18      2.5%

Most occurring blocks

Value                   Count   Frequency (%)
ASCII                   60000   100.0%

Most frequent character per block

ASCII
Value                   Count   Frequency (%)
O                       20006   33.3%
s                       7470    12.4%
e                       7218    12.0%
u                       6985    11.6%
g                       6923    11.5%
a                       1092    1.8%
o                       1062    1.8%
l                       880     1.5%
n                       845     1.4%
i                       833     1.4%
Other values (35)       6686    11.1%
등록일 (registration date)
DateTime

Distinct: 1074
Distinct (%): 10.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2018-01-09 00:00:00
Maximum: 2021-04-20 00:00:00
Histogram with fixed size bins (bins=50)

검색건수 (search count)
Real number (ℝ)

Distinct: 94
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.2217
Minimum: 1
Maximum: 363
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 16
Median: 29
Q3: 56
95-th percentile: 201
Maximum: 363
Range: 362
Interquartile range (IQR): 40
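The range and the interquartile range follow directly from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. A short check with the reported values (variable names are illustrative):

```python
# Quantiles as reported for 검색건수.
minimum, q1, median, q3, maximum = 1, 16, 29, 56, 363

value_range = maximum - minimum  # spread of the whole variable
iqr = q3 - q1                    # spread of the middle 50%

print(value_range)  # 362
print(iqr)          # 40
```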

Descriptive statistics

Standard deviation: 68.323745
Coefficient of variation (CV): 1.2152558
Kurtosis: 5.8063183
Mean: 56.2217
Median Absolute Deviation (MAD): 16
Skewness: 2.3152571
Sum: 562217
Variance: 4668.1342
Monotonicity: Not monotonic
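Several of these statistics are tied together by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. A minimal consistency check on the reported figures (the tolerances allow for the rounding in the report):

```python
import math

n = 10_000         # number of observations
total = 562_217    # reported sum
std = 68.323745    # reported standard deviation

mean = total / n
cv = std / mean
variance = std ** 2

print(mean)                # 56.2217
print(round(cv, 4))        # 1.2153
print(round(variance, 2))  # 4668.13
```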
Histogram with fixed size bins (bins=50)
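With fixed-size bins, the interval from the minimum to the maximum is split into 50 equal-width bins and each value is assigned to the bin that contains it. The sketch below assumes the histogram spans exactly the reported minimum (1) and maximum (363) and that the maximum is counted in the last bin, as `numpy.histogram` does; the function name is illustrative.

```python
def bin_index(value, lo, hi, bins=50):
    """Return the 0-based index of the equal-width bin containing value."""
    width = (hi - lo) / bins
    idx = int((value - lo) / width)
    # The maximum sits on the right edge; place it in the last bin.
    return min(idx, bins - 1)

lo, hi = 1, 363  # reported minimum and maximum of 검색건수
print((hi - lo) / 50)          # bin width: 7.24
print(bin_index(1, lo, hi))    # 0
print(bin_index(29, lo, hi))   # 3
print(bin_index(363, lo, hi))  # 49
```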
Common values
Value                   Count   Frequency (%)
16                      275     2.8%
19                      260     2.6%
21                      249     2.5%
17                      238     2.4%
20                      235     2.4%
28                      234     2.3%
10                      225     2.2%
24                      225     2.2%
13                      214     2.1%
12                      213     2.1%
Other values (84)       7632    76.3%

Minimum 10 values
Value                   Count   Frequency (%)
1                       21      0.2%
2                       49      0.5%
3                       72      0.7%
4                       103     1.0%
5                       144     1.4%
6                       181     1.8%
7                       155     1.6%
8                       193     1.9%
9                       179     1.8%
10                      225     2.2%

Maximum 10 values
Value                   Count   Frequency (%)
363                     161     1.6%
262                     118     1.2%
213                     85      0.9%
202                     89      0.9%
201                     90      0.9%
199                     96      1.0%
190                     77      0.8%
175                     74      0.7%
157                     66      0.7%
155                     59      0.6%

Interactions


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
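A nullity matrix marks, for every record and column, whether a value is present; the per-column totals are what the bar chart summarises. A minimal sketch on toy rows follows. The `None` in the second row is hypothetical and inserted only to show a gap, since the profiled dataset itself has 0 missing cells.

```python
columns = ["키워드", "사용자", "등록일", "검색건수"]

# Toy rows for illustration; the None value is hypothetical.
rows = [
    {"키워드": "불면증", "사용자": "guesOO", "등록일": "2020-05-28", "검색건수": 37},
    {"키워드": None, "사용자": "guesOO", "등록일": "2018-07-23", "검색건수": 9},
]

# Nullity matrix: one row per record, True where a value is present.
matrix = [[row[c] is not None for c in columns] for row in rows]

# Per-column count of present values.
present = {c: sum(r[i] for r in matrix) for i, c in enumerate(columns)}

print(matrix)
print(present)
```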

Sample

(index)  키워드                          사용자   등록일       검색건수
16230    불면증                          guesOO   2020-05-28   37
5828     2000-10-22                      guesOO   2018-06-21   20
6498     갑상선                          guesOO   2018-07-23   9
10406    Phellodendri                    guesOO   2019-01-02   9
6554     대한한방안이비인후피부과학회    gateOO   2018-07-30   27
7778     가래나무                        guesOO   2018-09-17   77
9469     치료                            guesOO   2018-11-11   76
17013    retrospective                   pkhgOO   2020-06-22   60
2687     서창용                          guesOO   2018-01-26   92
16589    침치료                          guesOO   2020-05-31   139

(index)  키워드                          사용자   등록일       검색건수
4886     소양인                          guesOO   2018-05-17   20
15998    복근                            guesOO   2020-05-11   47
2311     (not shown)                     guesOO   2018-01-23   190
11941    병행한                          dkwkOO   2019-05-17   39
15661    청각자극                        guesOO   2020-04-03   37
17980    알즈하이머                      guesOO   2020-09-08   37
18684    crohn                           guesOO   2020-11-01   30
1305     ㅎㅎㅎ                          guesOO   2018-01-16   97
4172     대전                            guesOO   2018-04-09   53
9113     won                             guesOO   2018-11-01   54