Overview

Dataset statistics

Number of variables: 4
Number of observations: 3663
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 250
Duplicate rows (%): 6.8%
Total size in memory: 118.2 KiB
Average record size in memory: 33.0 B
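
The two derived figures in this block follow from the raw counts. A minimal sketch, assuming only the numbers reported above (the variable names are illustrative, not part of the report):

```python
# Values copied from the dataset statistics above.
n_rows = 3663
n_duplicate_rows = 250
total_size_kib = 118.2

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (6.8% and 33.0 B).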

Variable types

Categorical: 1
Text: 1
DateTime: 1
Numeric: 1

Dataset

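Description: Input records for Korean medicine research reports from the traditional medicine information portal OASIS (전통의학정보포털 오아시스). Each record consists of a category (구분), user ID (사용자(ID)), sequence number (순번), registration date (등록일), and scrap count (스크랩건수).
Author: Korea Institute of Oriental Medicine (한국한의학연구원)
URL: https://www.data.go.kr/data/15086072/fileData.do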

Alerts

Dataset has 250 (6.8%) duplicate rows [Duplicates]
구분 is highly imbalanced (78.6%) [Imbalance]
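
The 78.6% figure appears consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class distribution. The sketch below rests on that assumption; it is not taken from the report itself. The counts of the ten most common 구분 values come from the report. The report gives only the total (13 rows over 5 values) for the remaining classes, so the split used here is hypothetical, and `imbalance_score` is an illustrative name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts.

    The score is 0 for a uniform distribution and approaches 1 when a
    single class dominates. A single-class input is defined as 0 here.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Counts of the ten most common 구분 values, as listed in the report.
top_counts = [3197, 230, 121, 34, 18, 12, 11, 11, 10, 6]
# HYPOTHETICAL split of the remaining 13 rows over 5 values; the report
# only states their combined total.
other_counts = [4, 3, 3, 2, 1]

score = imbalance_score(top_counts + other_counts)
print(f"Imbalance score: {score:.1%}")
```

Under these assumptions the score lands close to the reported value. The exact split of the small classes has little effect because they account for well under 1% of the rows.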

Reproduction

Analysis started: 2023-12-12 00:57:48.741618
Analysis finished: 2023-12-12 01:02:51.087403
Duration: 5 minutes and 2.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
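
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption based on how the timestamps are printed above:

```python
from datetime import datetime

# Timestamp layout assumed from the report's "Analysis started/finished" lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 00:57:48.741618", FMT)
finished = datetime.strptime("2023-12-12 01:02:51.087403", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```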

Variables

구분
Categorical

IMBALANCE 

Distinct: 15
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 28.7 KiB

Value | Count
논문 (paper) | 3197
보고서 (report) | 230
처방상세-TAB1 | 121
모노상세-SEQ1 | 34
통계 (statistics) | 18
Other values (10) | 63

Length

Max length: 9
Median length: 2
Mean length: 2.4777505
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 논문
2nd row: 논문
3rd row: 보고서
4th row: 보고서
5th row: 논문

Common Values

Value | Count | Frequency (%)
논문 | 3197 | 87.3%
보고서 | 230 | 6.3%
처방상세-TAB1 | 121 | 3.3%
모노상세-SEQ1 | 34 | 0.9%
통계 | 18 | 0.5%
모노상세-SEQ2 | 12 | 0.3%
처방상세-TAB4 | 11 | 0.3%
처방상세-TAB3 | 11 | 0.3%
표본상세-SPE1 | 10 | 0.3%
모노상세-SEQ5 | 6 | 0.2%
Other values (5) | 13 | 0.4%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
논문 | 3197 | 87.3%
보고서 | 230 | 6.3%
처방상세-tab1 | 121 | 3.3%
모노상세-seq1 | 34 | 0.9%
통계 | 18 | 0.5%
모노상세-seq2 | 12 | 0.3%
처방상세-tab4 | 11 | 0.3%
처방상세-tab3 | 11 | 0.3%
표본상세-spe1 | 10 | 0.3%
모노상세-seq5 | 6 | 0.2%
Other values (5) | 13 | 0.4%

사용자(ID)
Text

Distinct: 945
Distinct (%): 25.8%
Missing: 0
Missing (%): 0.0%
Memory size: 28.7 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 21978
Distinct characters: 40
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
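
As an illustration of such character properties, the sketch below tallies the Unicode general category of each character in the five sample user IDs listed in this report, using Python's standard `unicodedata` module. It covers only those five values, so it does not reproduce the report's full-column totals.

```python
import unicodedata
from collections import Counter

# The five sample user IDs shown in this report.
sample_ids = ["jhe5OO", "motoOO", "ghshOO", "kuriOO", "sillOO"]

# Tally the Unicode general category of every character:
# "Ll" = Lowercase Letter, "Lu" = Uppercase Letter, "Nd" = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for user_id in sample_ids for ch in user_id
)

for category, count in category_counts.most_common():
    print(category, count)
```

Every sample ID ends in the two uppercase letters "OO". That fits the full-column figures, where the 7326 occurrences of "O" equal exactly two per row (2 × 3663).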

Unique

Unique: 563
Unique (%): 15.4%

Sample

1st row: jhe5OO
2nd row: motoOO
3rd row: ghshOO
4th row: kuriOO
5th row: sillOO

Value | Count | Frequency (%)
solaoo | 129 | 3.5%
zailoo | 123 | 3.4%
jhy4oo | 87 | 2.4%
nimdoo | 80 | 2.2%
yoyooo | 72 | 2.0%
clemoo | 71 | 1.9%
jcjooo | 70 | 1.9%
cheroo | 56 | 1.5%
tripoo | 52 | 1.4%
phssoo | 50 | 1.4%
Other values (934) | 2873 | 78.4%

Most occurring characters

Value | Count | Frequency (%)
O | 7326 | 33.3%
a | 1211 | 5.5%
o | 1069 | 4.9%
s | 1000 | 4.6%
h | 996 | 4.5%
i | 931 | 4.2%
n | 885 | 4.0%
l | 835 | 3.8%
e | 781 | 3.6%
j | 697 | 3.2%
Other values (30) | 6247 | 28.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 14167 | 64.5%
Uppercase Letter | 7336 | 33.4%
Decimal Number | 475 | 2.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 1211 | 8.5%
o | 1069 | 7.5%
s | 1000 | 7.1%
h | 996 | 7.0%
i | 931 | 6.6%
n | 885 | 6.2%
l | 835 | 5.9%
e | 781 | 5.5%
j | 697 | 4.9%
y | 689 | 4.9%
Other values (16) | 5073 | 35.8%

Decimal Number
Value | Count | Frequency (%)
0 | 130 | 27.4%
4 | 96 | 20.2%
1 | 59 | 12.4%
2 | 46 | 9.7%
3 | 37 | 7.8%
9 | 35 | 7.4%
5 | 21 | 4.4%
8 | 19 | 4.0%
6 | 18 | 3.8%
7 | 14 | 2.9%

Uppercase Letter
Value | Count | Frequency (%)
O | 7326 | 99.9%
M | 8 | 0.1%
L | 1 | < 0.1%
D | 1 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 21503 | 97.8%
Common | 475 | 2.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
O | 7326 | 34.1%
a | 1211 | 5.6%
o | 1069 | 5.0%
s | 1000 | 4.7%
h | 996 | 4.6%
i | 931 | 4.3%
n | 885 | 4.1%
l | 835 | 3.9%
e | 781 | 3.6%
j | 697 | 3.2%
Other values (20) | 5772 | 26.8%

Common
Value | Count | Frequency (%)
0 | 130 | 27.4%
4 | 96 | 20.2%
1 | 59 | 12.4%
2 | 46 | 9.7%
3 | 37 | 7.8%
9 | 35 | 7.4%
5 | 21 | 4.4%
8 | 19 | 4.0%
6 | 18 | 3.8%
7 | 14 | 2.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 21978 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
O | 7326 | 33.3%
a | 1211 | 5.5%
o | 1069 | 4.9%
s | 1000 | 4.6%
h | 996 | 4.5%
i | 931 | 4.2%
n | 885 | 4.0%
l | 835 | 3.8%
e | 781 | 3.6%
j | 697 | 3.2%
Other values (30) | 6247 | 28.4%

등록일
DateTime

Distinct: 824
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Memory size: 28.7 KiB
Minimum: 2018-01-02 00:00:00
Maximum: 2021-07-26 00:00:00
Histogram with fixed size bins (bins=50)
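
To give a sense of the histogram's resolution, the sketch below computes the span between the minimum and maximum dates and the width of each bin. It assumes the 50 bins are of equal width and cover exactly the range from minimum to maximum, which the report does not state.

```python
from datetime import date

# Minimum and maximum registration dates from the report.
first = date(2018, 1, 2)
last = date(2021, 7, 26)
n_bins = 50  # from the histogram caption

# Number of days covered and the width of one equal-sized bin.
span_days = (last - first).days
bin_width_days = span_days / n_bins
print(f"Span: {span_days} days, about {bin_width_days:.2f} days per bin")
```

Under that assumption each bin covers a little under four weeks.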

스크랩건수
Real number (ℝ)

Distinct: 41
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.024843
Minimum: 1
Maximum: 130
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 11
Q3: 43
95-th percentile: 122
Maximum: 130
Range: 129
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 34.747481
Coefficient of variation (CV): 1.2398814
Kurtosis: 1.7997032
Mean: 28.024843
Median Absolute Deviation (MAD): 9
Skewness: 1.5984555
Sum: 102655
Variance: 1207.3874
Monotonicity: Not monotonic
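
Several of these statistics are linked by simple identities, which makes a quick consistency check possible. The sketch below uses only figures reported for 스크랩건수 and standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum, sum = mean × number of observations); the variable names are illustrative.

```python
# Figures copied from the report for 스크랩건수.
n = 3663
mean = 28.024843
std = 34.747481
q1, q3 = 3, 43
minimum, maximum = 1, 130

# Derived quantities, using their standard definitions.
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values

print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.4f}")
print(f"IQR: {iqr}, Range: {value_range}")
print(f"Sum: {total:.0f}")
```

The recomputed values agree with the reported ones to the precision shown. A mean (28.0) well above the median (11) is consistent with the positive skewness.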
Histogram with fixed size bins (bins=41)

Value | Count | Frequency (%)
1 | 361 | 9.9%
2 | 354 | 9.7%
3 | 252 | 6.9%
4 | 192 | 5.2%
74 | 148 | 4.0%
5 | 145 | 4.0%
130 | 130 | 3.5%
6 | 126 | 3.4%
122 | 122 | 3.3%
9 | 117 | 3.2%
Other values (31) | 1716 | 46.8%

Minimum 10 values
Value | Count | Frequency (%)
1 | 361 | 9.9%
2 | 354 | 9.7%
3 | 252 | 6.9%
4 | 192 | 5.2%
5 | 145 | 4.0%
6 | 126 | 3.4%
7 | 70 | 1.9%
8 | 104 | 2.8%
9 | 117 | 3.2%
10 | 70 | 1.9%

Maximum 10 values
Value | Count | Frequency (%)
130 | 130 | 3.5%
122 | 122 | 3.3%
74 | 148 | 4.0%
73 | 73 | 2.0%
67 | 67 | 1.8%
65 | 65 | 1.8%
59 | 59 | 1.6%
56 | 56 | 1.5%
51 | 51 | 1.4%
50 | 50 | 1.4%

Interactions


Correlations

 | 구분 | 스크랩건수
구분 | 1.000 | 0.267
스크랩건수 | 0.267 | 1.000

 | 스크랩건수 | 구분
스크랩건수 | 1.000 | 0.124
구분 | 0.124 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

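First rows
 | 구분 (category) | 사용자(ID) (user ID) | 등록일 (registration date) | 스크랩건수 (scrap count)
0 | 논문 | jhe5OO | 2018-01-02 | 1
1 | 논문 | motoOO | 2018-01-03 | 2
2 | 보고서 | ghshOO | 2018-01-03 | 2
3 | 보고서 | kuriOO | 2018-01-05 | 1
4 | 논문 | sillOO | 2018-01-11 | 6
5 | 논문 | sillOO | 2018-01-11 | 6
6 | 논문 | sillOO | 2018-01-11 | 6
7 | 논문 | sillOO | 2018-01-11 | 6
8 | 논문 | sillOO | 2018-01-11 | 6
9 | 논문 | kailOO | 2018-01-11 | 6

Last rows
 | 구분 | 사용자(ID) | 등록일 | 스크랩건수
3653 | 논문 | hye0OO | 2021-07-14 | 3
3654 | 논문 | hye0OO | 2021-07-14 | 3
3655 | 논문 | dldbOO | 2021-07-15 | 1
3656 | 논문 | swnsOO | 2021-07-16 | 1
3657 | 논문 | condOO | 2021-07-18 | 2
3658 | 논문 | jjl1OO | 2021-07-18 | 2
3659 | 논문 | cdi9OO | 2021-07-21 | 1
3660 | 논문 | jcjoOO | 2021-07-22 | 1
3661 | 논문 | jeonOO | 2021-07-25 | 1
3662 | 보고서 | heewOO | 2021-07-26 | 1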

Duplicate rows

Most frequently occurring

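 | 구분 | 사용자(ID) | 등록일 | 스크랩건수 | # duplicates
166 | 논문 | solaOO | 2018-09-27 | 130 | 129
209 | 논문 | zailOO | 2019-05-11 | 122 | 122
88 | 논문 | jhy4OO | 2018-11-12 | 74 | 74
203 | 논문 | yoyoOO | 2019-05-12 | 73 | 72
23 | 논문 | clemOO | 2021-01-23 | 67 | 66
20 | 논문 | cherOO | 2019-10-15 | 56 | 56
132 | 논문 | nimdOO | 2021-03-25 | 59 | 56
187 | 논문 | tripOO | 2018-08-09 | 51 | 51
144 | 논문 | phssOO | 2021-06-12 | 65 | 50
93 | 논문 | joosOO | 2018-08-06 | 50 | 48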
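
Exact duplicates like these can be found by counting identical rows. The sketch below applies that idea to the first ten sample rows of this report, using only the standard library. It illustrates the technique and is not the profiling tool's own implementation. The report does not state whether its duplicate totals count repeated groups or extra copies, so both tallies are computed.

```python
from collections import Counter

# First ten sample rows: (구분, 사용자(ID), 등록일, 스크랩건수).
rows = [
    ("논문", "jhe5OO", "2018-01-02", 1),
    ("논문", "motoOO", "2018-01-03", 2),
    ("보고서", "ghshOO", "2018-01-03", 2),
    ("보고서", "kuriOO", "2018-01-05", 1),
    ("논문", "sillOO", "2018-01-11", 6),
    ("논문", "sillOO", "2018-01-11", 6),
    ("논문", "sillOO", "2018-01-11", 6),
    ("논문", "sillOO", "2018-01-11", 6),
    ("논문", "sillOO", "2018-01-11", 6),
    ("논문", "kailOO", "2018-01-11", 6),
]

# Count how often each distinct row occurs.
row_counts = Counter(rows)

# Distinct rows that occur more than once, with their occurrence counts.
repeated = {row: n for row, n in row_counts.items() if n > 1}

# Copies beyond the first occurrence of each repeated row.
n_extra = sum(n - 1 for n in repeated.values())

print(f"Repeated distinct rows: {len(repeated)}")
print(f"Extra copies: {n_extra}")
```

In this small sample only the sillOO row from 2018-01-11 repeats. It occurs five times, so there are four extra copies.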