Overview

Dataset statistics

Number of variables: 5
Number of observations: 1235
Missing cells: 1734
Missing cells (%): 28.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 48.4 KiB
Average record size in memory: 40.1 B
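
The derived figures in this block can be checked from the raw counts. The sketch below recomputes the missing-cell percentage and the average record size; it assumes that "KiB" means 1024 bytes and that the average is taken per observation, which is an inference from the numbers rather than something the report states.

```python
# Figures copied from the Dataset statistics block.
n_variables = 5
n_observations = 1235
missing_cells = 1734
total_size_kib = 48.4  # reported total size in memory

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over observations.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results round to the reported values (28.1% and 40.1 B).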

Variable types

Text: 4
Boolean: 1

Dataset

Description: Book information published by the National Cancer Center (국립암센터) through its website, up to September 2019
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15049626/fileData.do

Alerts

COL1 has 1008 (81.6%) missing values [Missing]
COL2 has 640 (51.8%) missing values [Missing]
NOVIEW has 86 (7.0%) missing values [Missing]
BBS_SEQ has unique values [Unique]
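
As a consistency check, the per-column missing counts in these alerts should add up to the 1734 missing cells reported under Dataset statistics. The sketch below recomputes each percentage against the 1235 observations; the dictionary is simply the alert figures typed in by hand.

```python
n_observations = 1235

# Missing-value counts taken from the three "Missing" alerts.
missing_by_column = {"COL1": 1008, "COL2": 640, "NOVIEW": 86}

# Share of observations missing in each column, rounded as in the report.
missing_pct = {
    column: round(100 * count / n_observations, 1)
    for column, count in missing_by_column.items()
}
total_missing = sum(missing_by_column.values())

print(missing_pct)
print("total missing cells:", total_missing)
```

BBS_SEQ and BBSNUM have no missing values, so these three columns account for every missing cell.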

Reproduction

Analysis started: 2023-12-12 09:32:14.254933
Analysis finished: 2023-12-12 09:32:15.224894
Duration: 0.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
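
The reported duration follows directly from the two timestamps. A minimal check with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 09:32:14.254933")
finished = datetime.fromisoformat("2023-12-12 09:32:15.224894")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```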

Variables

BBS_SEQ
Text

UNIQUE 

Distinct: 1235
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.3740891
Min length: 2

Characters and Unicode

Total characters: 5402
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
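
To illustrate what a category count means, the sketch below tallies Unicode general categories with Python's `unicodedata` module. The helper name `category_counts` and the small name mapping are illustrative only and are not taken from ydata-profiling; the input is a handful of BBS_SEQ values that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter category codes seen in this column (subset only).
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["587", "588", "1,828", "1,857"]
counts = category_counts(sample)
print(counts)
```

Digits fall under "Decimal Number" and the comma under "Other Punctuation", which are the two categories reported for this variable.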

Unique

Unique: 1235
Unique (%): 100.0%

Sample

1st row: 587
2nd row: 588
3rd row: 589
4th row: 590
5th row: 591

Value | Count | Frequency (%)
587 | 1 | 0.1%
1,828 | 1 | 0.1%
1,857 | 1 | 0.1%
1,856 | 1 | 0.1%
1,854 | 1 | 0.1%
1,853 | 1 | 0.1%
1,855 | 1 | 0.1%
1,841 | 1 | 0.1%
1,840 | 1 | 0.1%
1,818 | 1 | 0.1%
Other values (1225) | 1225 | 99.2%

Most occurring characters

Value | Count | Frequency (%)
, | 867 | 16.0%
1 | 807 | 14.9%
3 | 600 | 11.1%
2 | 541 | 10.0%
4 | 450 | 8.3%
6 | 419 | 7.8%
7 | 396 | 7.3%
8 | 387 | 7.2%
5 | 320 | 5.9%
9 | 308 | 5.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4535 | 84.0%
Other Punctuation | 867 | 16.0%
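
All 867 "Other Punctuation" characters are commas, and the maximum length is 5, which suggests that identifiers of 1,000 and above are stored as text with a thousands separator (for example "1,828"). If numeric identifiers are needed, one possible clean-up step is sketched below. The helper `parse_grouped_int` is hypothetical and assumes the comma is only ever a grouping separator.

```python
def parse_grouped_int(text):
    """Convert a string such as "1,828" to the integer 1828.

    Assumes commas are thousands separators; anything else is rejected.
    """
    cleaned = text.strip().replace(",", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a grouped integer: {text!r}")
    return int(cleaned)


values = ["587", "1,828", "1,857"]
parsed = [parse_grouped_int(v) for v in values]
print(parsed)  # [587, 1828, 1857]
```

Rejecting unexpected input explicitly keeps malformed identifiers from being silently coerced.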

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 807 | 17.8%
3 | 600 | 13.2%
2 | 541 | 11.9%
4 | 450 | 9.9%
6 | 419 | 9.2%
7 | 396 | 8.7%
8 | 387 | 8.5%
5 | 320 | 7.1%
9 | 308 | 6.8%
0 | 307 | 6.8%

Other Punctuation
Value | Count | Frequency (%)
, | 867 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5402 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
, | 867 | 16.0%
1 | 807 | 14.9%
3 | 600 | 11.1%
2 | 541 | 10.0%
4 | 450 | 8.3%
6 | 419 | 7.8%
7 | 396 | 7.3%
8 | 387 | 7.2%
5 | 320 | 5.9%
9 | 308 | 5.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5402 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
, | 867 | 16.0%
1 | 807 | 14.9%
3 | 600 | 11.1%
2 | 541 | 10.0%
4 | 450 | 8.3%
6 | 419 | 7.8%
7 | 396 | 7.3%
8 | 387 | 7.2%
5 | 320 | 5.9%
9 | 308 | 5.7%

BBSNUM
Text

Distinct: 147
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.6850202
Min length: 2

Characters and Unicode

Total characters: 4551
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 1.3%

Sample

1st row: 203
2nd row: 223
3rd row: 223
4th row: 223
5th row: 223

Value | Count | Frequency (%)
704 | 31 | 2.5%
364 | 30 | 2.4%
403 | 27 | 2.2%
504 | 26 | 2.1%
523 | 26 | 2.1%
304 | 25 | 2.0%
644 | 24 | 1.9%
325 | 24 | 1.9%
464 | 24 | 1.9%
626 | 23 | 1.9%
Other values (137) | 975 | 78.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 760 | 16.7%
4 | 629 | 13.8%
3 | 533 | 11.7%
, | 462 | 10.2%
2 | 448 | 9.8%
6 | 446 | 9.8%
0 | 432 | 9.5%
5 | 298 | 6.5%
8 | 232 | 5.1%
7 | 220 | 4.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4089 | 89.8%
Other Punctuation | 462 | 10.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 760 | 18.6%
4 | 629 | 15.4%
3 | 533 | 13.0%
2 | 448 | 11.0%
6 | 446 | 10.9%
0 | 432 | 10.6%
5 | 298 | 7.3%
8 | 232 | 5.7%
7 | 220 | 5.4%
9 | 91 | 2.2%

Other Punctuation
Value | Count | Frequency (%)
, | 462 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4551 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 760 | 16.7%
4 | 629 | 13.8%
3 | 533 | 11.7%
, | 462 | 10.2%
2 | 448 | 9.8%
6 | 446 | 9.8%
0 | 432 | 9.5%
5 | 298 | 6.5%
8 | 232 | 5.1%
7 | 220 | 4.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4551 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 760 | 16.7%
4 | 629 | 13.8%
3 | 533 | 11.7%
, | 462 | 10.2%
2 | 448 | 9.8%
6 | 446 | 9.8%
0 | 432 | 9.5%
5 | 298 | 6.5%
8 | 232 | 5.1%
7 | 220 | 4.8%

COL1
Text

MISSING 

Distinct: 102
Distinct (%): 44.9%
Missing: 1008
Missing (%): 81.6%
Memory size: 9.8 KiB

Length

Max length: 45
Median length: 41
Mean length: 9.339207
Min length: 1

Characters and Unicode

Total characters: 2120
Distinct characters: 313
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 89
Unique (%): 39.2%
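
The Distinct (%), Unique (%), and mean-length figures for COL1 are consistent with being computed over the 227 non-missing values rather than over all 1235 observations. This is an inference from the numbers in this report, not a statement about how ydata-profiling defines them. A quick check:

```python
# Figures copied from the COL1 summary.
n_observations = 1235
missing = 1008
distinct = 102
unique = 89
total_characters = 2120

# Assumption: percentages and mean length use non-missing rows as the base.
non_missing = n_observations - missing

distinct_pct = 100 * distinct / non_missing
unique_pct = 100 * unique / non_missing
mean_length = total_characters / non_missing

print(f"non-missing values: {non_missing}")
print(f"distinct: {distinct_pct:.1f}%")
print(f"unique: {unique_pct:.1f}%")
print(f"mean length: {mean_length:.6f}")
```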

Sample

1st row: 출간일 (publication date)
2nd row: 가격 (price)
3rd row:
4th row: 엮은이 (editor)
5th row: 지은이 (author)

Value | Count | Frequency (%)
국립암센터 (National Cancer Center) | 35 | 6.8%
출간일 (publication date) | 34 | 6.6%
가격 (price) | 32 | 6.2%
지은이 (author) | 32 | 6.2%
뉴스레터 (newsletter) | 16 | 3.1%
 | 15 | 2.9%
출판사 (publisher) | 11 | 2.1%
개최 (held) | 8 | 1.6%
 | 5 | 1.0%
정가 (list price) | 5 | 1.0%
Other values (280) | 323 | 62.6%

Most occurring characters

ValueCountFrequency (%)
359
 
16.9%
70
 
3.3%
65
 
3.1%
51
 
2.4%
48
 
2.3%
47
 
2.2%
0 44
 
2.1%
43
 
2.0%
43
 
2.0%
42
 
2.0%
Other values (303) 1308
61.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1461 | 68.9%
Space Separator | 363 | 17.1%
Decimal Number | 147 | 6.9%
Other Punctuation | 60 | 2.8%
Uppercase Letter | 35 | 1.7%
Lowercase Letter | 34 | 1.6%
Dash Punctuation | 10 | 0.5%
Initial Punctuation | 4 | 0.2%
Final Punctuation | 4 | 0.2%
Open Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
70
 
4.8%
65
 
4.4%
51
 
3.5%
48
 
3.3%
47
 
3.2%
43
 
2.9%
43
 
2.9%
42
 
2.9%
41
 
2.8%
40
 
2.7%
Other values (249) 971
66.5%
Uppercase Letter
ValueCountFrequency (%)
O 5
14.3%
C 4
11.4%
M 4
11.4%
N 3
8.6%
S 3
8.6%
I 3
8.6%
W 2
 
5.7%
H 2
 
5.7%
F 2
 
5.7%
E 1
 
2.9%
Other values (6) 6
17.1%
Lowercase Letter
ValueCountFrequency (%)
e 6
17.6%
a 6
17.6%
r 3
8.8%
n 3
8.8%
i 3
8.8%
s 2
 
5.9%
t 2
 
5.9%
c 2
 
5.9%
g 2
 
5.9%
l 1
 
2.9%
Other values (4) 4
11.8%
Decimal Number
ValueCountFrequency (%)
0 44
29.9%
1 35
23.8%
2 24
16.3%
9 9
 
6.1%
6 8
 
5.4%
4 6
 
4.1%
7 6
 
4.1%
8 5
 
3.4%
3 5
 
3.4%
5 5
 
3.4%
Other Punctuation
ValueCountFrequency (%)
, 32
53.3%
: 16
26.7%
. 8
 
13.3%
· 1
 
1.7%
% 1
 
1.7%
& 1
 
1.7%
? 1
 
1.7%
Space Separator
ValueCountFrequency (%)
359
98.9%
  4
 
1.1%
Dash Punctuation
ValueCountFrequency (%)
- 10
100.0%
Initial Punctuation
ValueCountFrequency (%)
4
100.0%
Final Punctuation
ValueCountFrequency (%)
4
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1460 | 68.9%
Common | 590 | 27.8%
Latin | 69 | 3.3%
Han | 1 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
70
 
4.8%
65
 
4.5%
51
 
3.5%
48
 
3.3%
47
 
3.2%
43
 
2.9%
43
 
2.9%
42
 
2.9%
41
 
2.8%
40
 
2.7%
Other values (248) 970
66.4%
Latin
ValueCountFrequency (%)
e 6
 
8.7%
a 6
 
8.7%
O 5
 
7.2%
C 4
 
5.8%
M 4
 
5.8%
r 3
 
4.3%
n 3
 
4.3%
N 3
 
4.3%
S 3
 
4.3%
I 3
 
4.3%
Other values (20) 29
42.0%
Common
ValueCountFrequency (%)
359
60.8%
0 44
 
7.5%
1 35
 
5.9%
, 32
 
5.4%
2 24
 
4.1%
: 16
 
2.7%
- 10
 
1.7%
9 9
 
1.5%
. 8
 
1.4%
6 8
 
1.4%
Other values (14) 45
 
7.6%
Han
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1460 | 68.9%
ASCII | 646 | 30.5%
Punctuation | 8 | 0.4%
None | 5 | 0.2%
CJK | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
359
55.6%
0 44
 
6.8%
1 35
 
5.4%
, 32
 
5.0%
2 24
 
3.7%
: 16
 
2.5%
- 10
 
1.5%
9 9
 
1.4%
. 8
 
1.2%
6 8
 
1.2%
Other values (40) 101
 
15.6%
Hangul
ValueCountFrequency (%)
70
 
4.8%
65
 
4.5%
51
 
3.5%
48
 
3.3%
47
 
3.2%
43
 
2.9%
43
 
2.9%
42
 
2.9%
41
 
2.8%
40
 
2.7%
Other values (248) 970
66.4%
Punctuation
ValueCountFrequency (%)
4
50.0%
4
50.0%
None
ValueCountFrequency (%)
  4
80.0%
· 1
 
20.0%
CJK
ValueCountFrequency (%)
1
100.0%

COL2
Text

MISSING 

Distinct: 501
Distinct (%): 84.2%
Missing: 640
Missing (%): 51.8%
Memory size: 9.8 KiB

Length

Max length: 163
Median length: 49
Mean length: 20.332773
Min length: 2

Characters and Unicode

Total characters: 12098
Distinct characters: 573
Distinct categories: 14
Distinct scripts: 4
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 470
Unique (%): 79.0%

Sample

1st row: 금연콜센터 (Smoking Cessation Call Center)
2nd row: 이진수 박사, 제4대 국립암센터 원장에 취임 (Dr. Lee Jin-soo inaugurated as the 4th president of the National Cancer Center)
3rd row: 국립암센터 인사 (National Cancer Center personnel announcements)
4th row: 국립암센터 제2회 국제심포지엄 성황리에 마쳐 (National Cancer Center's 2nd International Symposium concludes successfully)
5th row: 국제심포지엄 주요 해외 연제 요약 및 연자 소개 (Summary of the main overseas presentations at the international symposium and introduction of the speakers)

Value | Count | Frequency (%)
국립암센터 (National Cancer Center) | 85 | 3.4%
 | 45 | 1.8%
 | 28 | 1.1%
개최 (held) | 21 | 0.8%
 | 20 | 0.8%
포토뉴스 (photo news) | 15 | 0.6%
isbn | 15 | 0.6%
 | 15 | 0.6%
친절직원 (courteous staff) | 15 | 0.6%
위한 (for) | 14 | 0.6%
Other values (1464) | 2254 | 89.2%

Most occurring characters

ValueCountFrequency (%)
2019
 
16.7%
397
 
3.3%
0 315
 
2.6%
209
 
1.7%
, 197
 
1.6%
191
 
1.6%
184
 
1.5%
2 156
 
1.3%
1 138
 
1.1%
116
 
1.0%
Other values (563) 8176
67.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7284 | 60.2%
Space Separator | 2019 | 16.7%
Decimal Number | 1086 | 9.0%
Lowercase Letter | 724 | 6.0%
Other Punctuation | 516 | 4.3%
Uppercase Letter | 211 | 1.7%
Dash Punctuation | 85 | 0.7%
Math Symbol | 77 | 0.6%
Close Punctuation | 30 | 0.2%
Open Punctuation | 28 | 0.2%
Other values (4) | 38 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
397
 
5.5%
209
 
2.9%
191
 
2.6%
184
 
2.5%
116
 
1.6%
106
 
1.5%
103
 
1.4%
100
 
1.4%
94
 
1.3%
89
 
1.2%
Other values (476) 5695
78.2%
Lowercase Letter
ValueCountFrequency (%)
e 94
13.0%
a 65
 
9.0%
n 65
 
9.0%
r 63
 
8.7%
t 59
 
8.1%
o 44
 
6.1%
p 39
 
5.4%
i 37
 
5.1%
c 35
 
4.8%
k 28
 
3.9%
Other values (15) 195
26.9%
Uppercase Letter
ValueCountFrequency (%)
N 35
16.6%
C 32
15.2%
B 31
14.7%
S 30
14.2%
I 26
12.3%
E 7
 
3.3%
A 6
 
2.8%
T 6
 
2.8%
M 5
 
2.4%
P 5
 
2.4%
Other values (11) 28
13.3%
Other Punctuation
ValueCountFrequency (%)
, 197
38.2%
. 82
15.9%
/ 76
 
14.7%
" 68
 
13.2%
: 41
 
7.9%
% 14
 
2.7%
· 13
 
2.5%
! 9
 
1.7%
& 8
 
1.6%
' 4
 
0.8%
Decimal Number
ValueCountFrequency (%)
0 315
29.0%
2 156
14.4%
1 138
12.7%
8 94
 
8.7%
5 93
 
8.6%
9 93
 
8.6%
6 57
 
5.2%
4 52
 
4.8%
3 48
 
4.4%
7 40
 
3.7%
Math Symbol
ValueCountFrequency (%)
= 24
31.2%
> 24
31.2%
< 24
31.2%
| 5
 
6.5%
Close Punctuation
ValueCountFrequency (%)
) 16
53.3%
8
26.7%
] 5
 
16.7%
1
 
3.3%
Open Punctuation
ValueCountFrequency (%)
( 14
50.0%
8
28.6%
[ 5
 
17.9%
1
 
3.6%
Final Punctuation
ValueCountFrequency (%)
4
80.0%
1
 
20.0%
Initial Punctuation
ValueCountFrequency (%)
3
75.0%
1
 
25.0%
Space Separator
ValueCountFrequency (%)
2019
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 85
100.0%
Control
ValueCountFrequency (%)
18
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 11
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7280 | 60.2%
Common | 3879 | 32.1%
Latin | 935 | 7.7%
Han | 4 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
397
 
5.5%
209
 
2.9%
191
 
2.6%
184
 
2.5%
116
 
1.6%
106
 
1.5%
103
 
1.4%
100
 
1.4%
94
 
1.3%
89
 
1.2%
Other values (472) 5691
78.2%
Latin
ValueCountFrequency (%)
e 94
 
10.1%
a 65
 
7.0%
n 65
 
7.0%
r 63
 
6.7%
t 59
 
6.3%
o 44
 
4.7%
p 39
 
4.2%
i 37
 
4.0%
N 35
 
3.7%
c 35
 
3.7%
Other values (36) 399
42.7%
Common
ValueCountFrequency (%)
2019
52.0%
0 315
 
8.1%
, 197
 
5.1%
2 156
 
4.0%
1 138
 
3.6%
8 94
 
2.4%
5 93
 
2.4%
9 93
 
2.4%
- 85
 
2.2%
. 82
 
2.1%
Other values (31) 607
 
15.6%
Han
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7256 | 60.0%
ASCII | 4774 | 39.5%
None | 31 | 0.3%
Compat Jamo | 24 | 0.2%
Punctuation | 9 | 0.1%
CJK | 3 | < 0.1%
CJK Compat Ideographs | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2019
42.3%
0 315
 
6.6%
, 197
 
4.1%
2 156
 
3.3%
1 138
 
2.9%
e 94
 
2.0%
8 94
 
2.0%
5 93
 
1.9%
9 93
 
1.9%
- 85
 
1.8%
Other values (68) 1490
31.2%
Hangul
ValueCountFrequency (%)
397
 
5.5%
209
 
2.9%
191
 
2.6%
184
 
2.5%
116
 
1.6%
106
 
1.5%
103
 
1.4%
100
 
1.4%
94
 
1.3%
89
 
1.2%
Other values (470) 5667
78.1%
Compat Jamo
ValueCountFrequency (%)
21
87.5%
3
 
12.5%
None
ValueCountFrequency (%)
· 13
41.9%
8
25.8%
8
25.8%
1
 
3.2%
1
 
3.2%
Punctuation
ValueCountFrequency (%)
4
44.4%
3
33.3%
1
 
11.1%
1
 
11.1%
CJK
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

NOVIEW
Boolean

MISSING 

Distinct: 2
Distinct (%): 0.2%
Missing: 86
Missing (%): 7.0%
Memory size: 2.5 KiB

Value | Count | Frequency (%)
True | 742 | 60.1%
False | 407 | 33.0%
(Missing) | 86 | 7.0%
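
The sample rows in this report show NOVIEW stored as "Y"/"N", while the profile treats it as Boolean. The sketch below shows one way such a flag could be mapped and how the reported shares follow from the counts. The helper `to_bool` is hypothetical, and the assumption that "Y" corresponds to True is not confirmed by the report.

```python
def to_bool(flag):
    """Map 'Y'/'N' to True/False; any other value (including None) becomes None."""
    # Assumption: 'Y' means True and 'N' means False.
    return {"Y": True, "N": False}.get(flag)


# Counts copied from the NOVIEW frequency table; None stands for (Missing).
counts = {True: 742, False: 407, None: 86}
total = sum(counts.values())
shares = {value: round(100 * n / total, 1) for value, n in counts.items()}

print(to_bool("Y"), to_bool("N"), to_bool(None))
print(shares)
```

Keeping missing flags as `None` rather than coercing them to False preserves the 7.0% of rows where the value is unknown.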

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
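
One common way to quantify nullity correlation is the Pearson correlation between per-column presence indicators (1 if a value is present, 0 if it is missing). The sketch below assumes that definition and uses a few made-up rows, so the result does not describe this dataset; the `pearson` helper is written out by hand to keep the example dependency-free.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # column is always present or always missing
    return cov / sqrt(var_x * var_y)


# Hypothetical rows for illustration; None marks a missing value.
rows = [
    {"COL1": None, "COL2": "a"},
    {"COL1": None, "COL2": "b"},
    {"COL1": "x", "COL2": None},
    {"COL1": "y", "COL2": None},
    {"COL1": None, "COL2": None},
    {"COL1": "z", "COL2": "c"},
]

col1_present = [int(row["COL1"] is not None) for row in rows]
col2_present = [int(row["COL2"] is not None) for row in rows]

r = pearson(col1_present, col2_present)
print(f"nullity correlation COL1 vs COL2: {r:.3f}")
```

A value near -1 means one column tends to be filled when the other is empty, a value near 1 means they tend to be filled or empty together, and a value near 0 means no linear relationship.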

Sample

Row | BBS_SEQ | BBSNUM | COL1 | COL2 | NOVIEW
0 | 587 | 203 | <NA> | 금연콜센터 | N
1 | 588 | 223 | <NA> | 이진수 박사, 제4대 국립암센터 원장에 취임 | Y
2 | 589 | 223 | <NA> | 국립암센터 인사 | N
3 | 590 | 223 | <NA> | 국립암센터 제2회 국제심포지엄 성황리에 마쳐 | Y
4 | 591 | 223 | <NA> | 국제심포지엄 주요 해외 연제 요약 및 연자 소개 | N
5 | 592 | 223 | <NA> | 암 전문의료기관하면 국립암센터 | Y
6 | 593 | 223 | <NA> | 암조기검진의 최신지견 세미나 개최 | N
7 | 594 | 223 | <NA> | 연구성과 | N
8 | 595 | 223 | <NA> | 복강경 수술, 위암환자 삶의 질을 향상 시키는 것으로 나와 | N
9 | 596 | 223 | <NA> | 단신 | N

Row | BBS_SEQ | BBSNUM | COL1 | COL2 | NOVIEW
1225 | 4,403 | 2,226 | 국립암센터, 암생존자 주간 기념 심포지엄 개최 | <NA> | Y
1226 | 4,404 | 2,226 | 암생존자와 함께하는 소생캠페인 진행 | <NA> | Y
1227 | 4,405 | 2,226 | 인공지능 기반 상담형 챗봇 서비스 구축 | <NA> | Y
1228 | 4,406 | 2,226 | 마이크로바이옴, 암 치료와의 연결고리는? | <NA> | Y
1229 | 4,407 | 2,226 | 3기 병원실무자양성과정 수료식 진행 | <NA> | Y
1230 | 4,408 | 2,226 | 개그맨 유상무, 유튜브 수익금으로 소아암 기부 마라톤 선행 | <NA> | Y
1231 | 4,409 | 2,226 | 5월, 6월 친절직원 | <NA> | Y
1232 | 4,410 | 2,226 | 암정보 | <NA> | Y
1233 | 4,484 | 2,246 | <NA> | <NA> | Y
1234 | 4,485 | 2,266 | <NA> | <NA> | Y