Dataset statistics
Number of variables | 5 |
---|---|
Number of observations | 1235 |
Missing cells | 1734 |
Missing cells (%) | 28.1% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 48.4 KiB |
Average record size in memory | 40.1 B |
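The derived figures in this table can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 bytes. The variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 1235
missing_cells = 1734
total_size_kib = 48.4

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported in the table (28.1% and 40.1 B).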
Variable types
Text | 4 |
---|---|
Boolean | 1 |
Dataset
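Description | Book information released by the National Cancer Center (국립암센터) through its website, covering the period up to September 2019 |
---|---|
Author | National Cancer Center (국립암센터) |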
URL | https://www.data.go.kr/data/15049626/fileData.do |
Reproduction
Analysis started | 2023-12-12 09:32:14.254933 |
---|---|
Analysis finished | 2023-12-12 09:32:15.224894 |
Duration | 0.97 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
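The reported duration follows from the two timestamps. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 09:32:14.254933", FMT)
finished = datetime.strptime("2023-12-12 09:32:15.224894", FMT)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```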
BBS_SEQ
Text
UNIQUE
 
Distinct | 1235 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 9.8 KiB |
Value | Count | Frequency (%) |
587 | 1 | 0.1% |
1,828 | 1 | 0.1% |
1,857 | 1 | 0.1% |
1,856 | 1 | 0.1% |
1,854 | 1 | 0.1% |
1,853 | 1 | 0.1% |
1,855 | 1 | 0.1% |
1,841 | 1 | 0.1% |
1,840 | 1 | 0.1% |
1,818 | 1 | 0.1% |
Other values (1225) | 1225 | 99.2% |
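Values such as "1,828" carry a thousands separator, which is probably why BBS_SEQ was profiled as Text and not as a number. A minimal sketch of converting such strings to integers, assuming the comma is only ever a grouping separator (the helper name `parse_grouped_int` is hypothetical):

```python
def parse_grouped_int(text: str) -> int:
    """Convert a string such as '1,828' to the integer 1828.

    Assumption: commas are grouping separators, never decimal marks.
    """
    return int(text.replace(",", ""))


# Three values taken from the frequency table.
sample = ["587", "1,828", "1,857"]
parsed = [parse_grouped_int(value) for value in sample]
print(parsed)
```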
Most occurring characters
Value | Count | Frequency (%) |
, | 867 | 16.0% |
1 | 807 | 14.9% |
3 | 600 | 11.1% |
2 | 541 | 10.0% |
4 | 450 | 8.3% |
6 | 419 | 7.8% |
7 | 396 | 7.3% |
8 | 387 | 7.2% |
5 | 320 | 5.9% |
9 | 308 | 5.7% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 4535 | 84.0% |
Other Punctuation | 867 | 16.0% |
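The category names in this table correspond to Unicode general categories. The sketch below shows one way to tally them with Python's standard `unicodedata` module. It is an illustration, not the profiler's own implementation, and the `CATEGORY_NAMES` mapping covers only the two categories seen in this column.

```python
import unicodedata
from collections import Counter

# Readable names for the two general-category codes seen in BBS_SEQ.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Three values taken from the BBS_SEQ frequency table.
counts = category_counts(["587", "1,828", "1,857"])
print(counts)
```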
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 807 | 17.8% |
3 | 600 | 13.2% |
2 | 541 | 11.9% |
4 | 450 | 9.9% |
6 | 419 | 9.2% |
7 | 396 | 8.7% |
8 | 387 | 8.5% |
5 | 320 | 7.1% |
9 | 308 | 6.8% |
0 | 307 | 6.8% |
Other Punctuation
Value | Count | Frequency (%) |
, | 867 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 5402 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
, | 867 | 16.0% |
1 | 807 | 14.9% |
3 | 600 | 11.1% |
2 | 541 | 10.0% |
4 | 450 | 8.3% |
6 | 419 | 7.8% |
7 | 396 | 7.3% |
8 | 387 | 7.2% |
5 | 320 | 5.9% |
9 | 308 | 5.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5402 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
, | 867 | 16.0% |
1 | 807 | 14.9% |
3 | 600 | 11.1% |
2 | 541 | 10.0% |
4 | 450 | 8.3% |
6 | 419 | 7.8% |
7 | 396 | 7.3% |
8 | 387 | 7.2% |
5 | 320 | 5.9% |
9 | 308 | 5.7% |
BBSNUM
Text
Distinct | 147 |
---|---|
Distinct (%) | 11.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 9.8 KiB |
Value | Count | Frequency (%) |
704 | 31 | 2.5% |
364 | 30 | 2.4% |
403 | 27 | 2.2% |
504 | 26 | 2.1% |
523 | 26 | 2.1% |
304 | 25 | 2.0% |
644 | 24 | 1.9% |
325 | 24 | 1.9% |
464 | 24 | 1.9% |
626 | 23 | 1.9% |
Other values (137) | 975 | 78.9% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 760 | 16.7% |
4 | 629 | 13.8% |
3 | 533 | 11.7% |
, | 462 | 10.2% |
2 | 448 | 9.8% |
6 | 446 | 9.8% |
0 | 432 | 9.5% |
5 | 298 | 6.5% |
8 | 232 | 5.1% |
7 | 220 | 4.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 4089 | 89.8% |
Other Punctuation | 462 | 10.2% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 760 | 18.6% |
4 | 629 | 15.4% |
3 | 533 | 13.0% |
2 | 448 | 11.0% |
6 | 446 | 10.9% |
0 | 432 | 10.6% |
5 | 298 | 7.3% |
8 | 232 | 5.7% |
7 | 220 | 5.4% |
9 | 91 | 2.2% |
Other Punctuation
Value | Count | Frequency (%) |
, | 462 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 4551 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 760 | 16.7% |
4 | 629 | 13.8% |
3 | 533 | 11.7% |
, | 462 | 10.2% |
2 | 448 | 9.8% |
6 | 446 | 9.8% |
0 | 432 | 9.5% |
5 | 298 | 6.5% |
8 | 232 | 5.1% |
7 | 220 | 4.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4551 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 760 | 16.7% |
4 | 629 | 13.8% |
3 | 533 | 11.7% |
, | 462 | 10.2% |
2 | 448 | 9.8% |
6 | 446 | 9.8% |
0 | 432 | 9.5% |
5 | 298 | 6.5% |
8 | 232 | 5.1% |
7 | 220 | 4.8% |
COL1
Text
MISSING
 
Distinct | 102 |
---|---|
Distinct (%) | 44.9% |
Missing | 1008 |
Missing (%) | 81.6% |
Memory size | 9.8 KiB |
Value | Count | Frequency (%) |
국립암센터 | 35 | 6.8% |
출간일 | 34 | 6.6% |
가격 | 32 | 6.2% |
지은이 | 32 | 6.2% |
뉴스레터 | 16 | 3.1% |
(blank) | 15 | 2.9% |
출판사 | 11 | 2.1% |
개최 | 8 | 1.6% |
암 | 5 | 1.0% |
정가 | 5 | 1.0% |
Other values (280) | 323 | 62.6% |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 359 | 16.9% |
암 | 70 | 3.3% |
터 | 65 | 3.1% |
출 | 51 | 2.4% |
이 | 48 | 2.3% |
일 | 47 | 2.2% |
0 | 44 | 2.1% |
센 | 43 | 2.0% |
국 | 43 | 2.0% |
가 | 42 | 2.0% |
Other values (303) | 1308 | 61.7% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 1461 | 68.9% |
Space Separator | 363 | 17.1% |
Decimal Number | 147 | 6.9% |
Other Punctuation | 60 | 2.8% |
Uppercase Letter | 35 | 1.7% |
Lowercase Letter | 34 | 1.6% |
Dash Punctuation | 10 | 0.5% |
Initial Punctuation | 4 | 0.2% |
Final Punctuation | 4 | 0.2% |
Open Punctuation | 1 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
암 | 70 | 4.8% |
터 | 65 | 4.4% |
출 | 51 | 3.5% |
이 | 48 | 3.3% |
일 | 47 | 3.2% |
센 | 43 | 2.9% |
국 | 43 | 2.9% |
가 | 42 | 2.9% |
립 | 41 | 2.8% |
간 | 40 | 2.7% |
Other values (249) | 971 | 66.5% |
Uppercase Letter
Value | Count | Frequency (%) |
O | 5 | 14.3% |
C | 4 | 11.4% |
M | 4 | 11.4% |
N | 3 | 8.6% |
S | 3 | 8.6% |
I | 3 | 8.6% |
W | 2 | 5.7% |
H | 2 | 5.7% |
F | 2 | 5.7% |
E | 1 | 2.9% |
Other values (6) | 6 | 17.1% |
Lowercase Letter
Value | Count | Frequency (%) |
e | 6 | 17.6% |
a | 6 | 17.6% |
r | 3 | 8.8% |
n | 3 | 8.8% |
i | 3 | 8.8% |
s | 2 | 5.9% |
t | 2 | 5.9% |
c | 2 | 5.9% |
g | 2 | 5.9% |
l | 1 | 2.9% |
Other values (4) | 4 | 11.8% |
Decimal Number
Value | Count | Frequency (%) |
0 | 44 | 29.9% |
1 | 35 | 23.8% |
2 | 24 | 16.3% |
9 | 9 | 6.1% |
6 | 8 | 5.4% |
4 | 6 | 4.1% |
7 | 6 | 4.1% |
8 | 5 | 3.4% |
3 | 5 | 3.4% |
5 | 5 | 3.4% |
Other Punctuation
Value | Count | Frequency (%) |
, | 32 | 53.3% |
: | 16 | 26.7% |
. | 8 | 13.3% |
· | 1 | 1.7% |
% | 1 | 1.7% |
& | 1 | 1.7% |
? | 1 | 1.7% |
Space Separator
Value | Count | Frequency (%) |
(space) | 359 | 98.9% |
(non-ASCII space) | 4 | 1.1% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 10 | 100.0% |
Initial Punctuation
Value | Count | Frequency (%) |
‘ | 4 | 100.0% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 4 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 1460 | 68.9% |
Common | 590 | 27.8% |
Latin | 69 | 3.3% |
Han | 1 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
암 | 70 | 4.8% |
터 | 65 | 4.5% |
출 | 51 | 3.5% |
이 | 48 | 3.3% |
일 | 47 | 3.2% |
센 | 43 | 2.9% |
국 | 43 | 2.9% |
가 | 42 | 2.9% |
립 | 41 | 2.8% |
간 | 40 | 2.7% |
Other values (248) | 970 | 66.4% |
Latin
Value | Count | Frequency (%) |
e | 6 | 8.7% |
a | 6 | 8.7% |
O | 5 | 7.2% |
C | 4 | 5.8% |
M | 4 | 5.8% |
r | 3 | 4.3% |
n | 3 | 4.3% |
N | 3 | 4.3% |
S | 3 | 4.3% |
I | 3 | 4.3% |
Other values (20) | 29 | 42.0% |
Common
Value | Count | Frequency (%) |
(space) | 359 | 60.8% |
0 | 44 | 7.5% |
1 | 35 | 5.9% |
, | 32 | 5.4% |
2 | 24 | 4.1% |
: | 16 | 2.7% |
- | 10 | 1.7% |
9 | 9 | 1.5% |
. | 8 | 1.4% |
6 | 8 | 1.4% |
Other values (14) | 45 | 7.6% |
Han
Value | Count | Frequency (%) |
美 | 1 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 1460 | 68.9% |
ASCII | 646 | 30.5% |
Punctuation | 8 | 0.4% |
None | 5 | 0.2% |
CJK | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 359 | 55.6% |
0 | 44 | 6.8% |
1 | 35 | 5.4% |
, | 32 | 5.0% |
2 | 24 | 3.7% |
: | 16 | 2.5% |
- | 10 | 1.5% |
9 | 9 | 1.4% |
. | 8 | 1.2% |
6 | 8 | 1.2% |
Other values (40) | 101 | 15.6% |
Hangul
Value | Count | Frequency (%) |
암 | 70 | 4.8% |
터 | 65 | 4.5% |
출 | 51 | 3.5% |
이 | 48 | 3.3% |
일 | 47 | 3.2% |
센 | 43 | 2.9% |
국 | 43 | 2.9% |
가 | 42 | 2.9% |
립 | 41 | 2.8% |
간 | 40 | 2.7% |
Other values (248) | 970 | 66.4% |
Punctuation
Value | Count | Frequency (%) |
‘ | 4 | 50.0% |
’ | 4 | 50.0% |
None
Value | Count | Frequency (%) |
(non-ASCII space) | 4 | 80.0% |
· | 1 | 20.0% |
CJK
Value | Count | Frequency (%) |
美 | 1 | 100.0% |
COL2
Text
MISSING
 
Distinct | 501 |
---|---|
Distinct (%) | 84.2% |
Missing | 640 |
Missing (%) | 51.8% |
Memory size | 9.8 KiB |
Length
Max length | 163 |
---|---|
Median length | 49 |
Mean length | 20.332773 |
Min length | 2 |
Characters and Unicode
Total characters | 12098 |
---|---|
Distinct characters | 573 |
Distinct categories | 14 |
Distinct scripts | 4 |
Distinct blocks | 7 |
Unique
Unique | 470 |
---|---|
Unique (%) | 79.0% |
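The mean length and the percentages above are consistent with each other if they are computed over the non-missing values only. The check below rests on that assumption; the variable names are illustrative.

```python
# Figures copied from the COL2 tables.
n_observations = 1235
n_missing = 640
n_distinct = 501
n_unique = 470
total_characters = 12098

# Assumption: length and distinct/unique shares use non-missing values only.
n_present = n_observations - n_missing

mean_length = total_characters / n_present
missing_pct = 100 * n_missing / n_observations
distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present

print(f"Mean length: {mean_length:.6f}")
print(f"Missing (%): {missing_pct:.1f}%")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
```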
Sample
1st row | 금연콜센터 |
---|---|
2nd row | 이진수 박사, 제4대 국립암센터 원장에 취임 |
3rd row | 국립암센터 인사 |
4th row | 국립암센터 제2회 국제심포지엄 성황리에 마쳐 |
5th row | 국제심포지엄 주요 해외 연제 요약 및 연자 소개 |
Value | Count | Frequency (%) |
국립암센터 | 85 | 3.4% |
(blank) | 45 | 1.8% |
암 | 28 | 1.1% |
개최 | 21 | 0.8% |
ㅣ | 20 | 0.8% |
포토뉴스 | 15 | 0.6% |
isbn | 15 | 0.6% |
및 | 15 | 0.6% |
친절직원 | 15 | 0.6% |
위한 | 14 | 0.6% |
Other values (1464) | 2254 | 89.2% |
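The table above counts words, not whole cell values. The lowercase entry "isbn" suggests the text is lowercased before counting. A minimal sketch of such a count, assuming lowercasing and whitespace tokenization (the profiler's actual tokenizer may differ, and `word_counts` is a hypothetical helper):

```python
from collections import Counter


def word_counts(values):
    """Count whitespace-separated, lowercased words, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:  # missing cell
            continue
        counts.update(value.lower().split())
    return counts


# Three COL2 values from the "Sample" table, plus one missing cell.
rows = [
    "금연콜센터",
    "국립암센터 인사",
    "국립암센터 제2회 국제심포지엄 성황리에 마쳐",
    None,
]
counts = word_counts(rows)
print(counts.most_common(3))
```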
Most occurring characters
Value | Count | Frequency (%) |
(space) | 2019 | 16.7% |
암 | 397 | 3.3% |
0 | 315 | 2.6% |
국 | 209 | 1.7% |
, | 197 | 1.6% |
터 | 191 | 1.6% |
센 | 184 | 1.5% |
2 | 156 | 1.3% |
1 | 138 | 1.1% |
립 | 116 | 1.0% |
Other values (563) | 8176 | 67.6% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 7284 | 60.2% |
Space Separator | 2019 | 16.7% |
Decimal Number | 1086 | 9.0% |
Lowercase Letter | 724 | 6.0% |
Other Punctuation | 516 | 4.3% |
Uppercase Letter | 211 | 1.7% |
Dash Punctuation | 85 | 0.7% |
Math Symbol | 77 | 0.6% |
Close Punctuation | 30 | 0.2% |
Open Punctuation | 28 | 0.2% |
Other values (4) | 38 | 0.3% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
암 | 397 | 5.5% |
국 | 209 | 2.9% |
터 | 191 | 2.6% |
센 | 184 | 2.5% |
립 | 116 | 1.6% |
의 | 106 | 1.5% |
가 | 103 | 1.4% |
연 | 100 | 1.4% |
기 | 94 | 1.3% |
원 | 89 | 1.2% |
Other values (476) | 5695 | 78.2% |
Lowercase Letter
Value | Count | Frequency (%) |
e | 94 | 13.0% |
a | 65 | 9.0% |
n | 65 | 9.0% |
r | 63 | 8.7% |
t | 59 | 8.1% |
o | 44 | 6.1% |
p | 39 | 5.4% |
i | 37 | 5.1% |
c | 35 | 4.8% |
k | 28 | 3.9% |
Other values (15) | 195 | 26.9% |
Uppercase Letter
Value | Count | Frequency (%) |
N | 35 | 16.6% |
C | 32 | 15.2% |
B | 31 | 14.7% |
S | 30 | 14.2% |
I | 26 | 12.3% |
E | 7 | 3.3% |
A | 6 | 2.8% |
T | 6 | 2.8% |
M | 5 | 2.4% |
P | 5 | 2.4% |
Other values (11) | 28 | 13.3% |
Other Punctuation
Value | Count | Frequency (%) |
, | 197 | 38.2% |
. | 82 | 15.9% |
/ | 76 | 14.7% |
" | 68 | 13.2% |
: | 41 | 7.9% |
% | 14 | 2.7% |
· | 13 | 2.5% |
! | 9 | 1.7% |
& | 8 | 1.6% |
' | 4 | 0.8% |
Decimal Number
Value | Count | Frequency (%) |
0 | 315 | 29.0% |
2 | 156 | 14.4% |
1 | 138 | 12.7% |
8 | 94 | 8.7% |
5 | 93 | 8.6% |
9 | 93 | 8.6% |
6 | 57 | 5.2% |
4 | 52 | 4.8% |
3 | 48 | 4.4% |
7 | 40 | 3.7% |
Math Symbol
Value | Count | Frequency (%) |
= | 24 | 31.2% |
> | 24 | 31.2% |
< | 24 | 31.2% |
| | 5 | 6.5% |
Close Punctuation
Value | Count | Frequency (%) |
) | 16 | 53.3% |
』 | 8 | 26.7% |
] | 5 | 16.7% |
」 | 1 | 3.3% |
Open Punctuation
Value | Count | Frequency (%) |
( | 14 | 50.0% |
『 | 8 | 28.6% |
[ | 5 | 17.9% |
「 | 1 | 3.6% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 4 | 80.0% |
” | 1 | 20.0% |
Initial Punctuation
Value | Count | Frequency (%) |
‘ | 3 | 75.0% |
“ | 1 | 25.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2019 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 85 | 100.0% |
Control
Value | Count | Frequency (%) |
(control character) | 18 | 100.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 11 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 7280 | 60.2% |
Common | 3879 | 32.1% |
Latin | 935 | 7.7% |
Han | 4 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
암 | 397 | 5.5% |
국 | 209 | 2.9% |
터 | 191 | 2.6% |
센 | 184 | 2.5% |
립 | 116 | 1.6% |
의 | 106 | 1.5% |
가 | 103 | 1.4% |
연 | 100 | 1.4% |
기 | 94 | 1.3% |
원 | 89 | 1.2% |
Other values (472) | 5691 | 78.2% |
Latin
Value | Count | Frequency (%) |
e | 94 | 10.1% |
a | 65 | 7.0% |
n | 65 | 7.0% |
r | 63 | 6.7% |
t | 59 | 6.3% |
o | 44 | 4.7% |
p | 39 | 4.2% |
i | 37 | 4.0% |
N | 35 | 3.7% |
c | 35 | 3.7% |
Other values (36) | 399 | 42.7% |
Common
Value | Count | Frequency (%) |
(space) | 2019 | 52.0% |
0 | 315 | 8.1% |
, | 197 | 5.1% |
2 | 156 | 4.0% |
1 | 138 | 3.6% |
8 | 94 | 2.4% |
5 | 93 | 2.4% |
9 | 93 | 2.4% |
- | 85 | 2.2% |
. | 82 | 2.1% |
Other values (31) | 607 | 15.6% |
Han
Value | Count | Frequency (%) |
場 | 1 | 25.0% |
賞 | 1 | 25.0% |
大 | 1 | 25.0% |
茶 | 1 | 25.0% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 7256 | 60.0% |
ASCII | 4774 | 39.5% |
None | 31 | 0.3% |
Compat Jamo | 24 | 0.2% |
Punctuation | 9 | 0.1% |
CJK | 3 | < 0.1% |
CJK Compat Ideographs | 1 | < 0.1% |
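The block labels above appear to follow Unicode code-point ranges. The sketch below classifies characters using the standard ranges for these blocks. The range table and the "None" fallback are assumptions made to mirror the labels in this report, not the profiler's actual lookup.

```python
# (first code point, last code point, label used in this report)
BLOCK_RANGES = [
    (0x0000, 0x007F, "ASCII"),                  # Basic Latin
    (0x2000, 0x206F, "Punctuation"),            # General Punctuation
    (0x3130, 0x318F, "Compat Jamo"),            # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),                    # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),                 # Hangul Syllables
    (0xF900, 0xFAFF, "CJK Compat Ideographs"),  # CJK Compatibility Ideographs
]


def block_of(ch: str) -> str:
    """Return the block label for one character, or 'None' if unlisted."""
    code_point = ord(ch)
    for first, last, label in BLOCK_RANGES:
        if first <= code_point <= last:
            return label
    return "None"


for ch in ["암", ",", "ㅣ", "‘", "·", "大"]:
    print(repr(ch), block_of(ch))
```

Under these ranges the middle dot (U+00B7) and the corner brackets (U+300C to U+300F) fall outside every listed block, which matches their placement under "None" in this report.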
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 2019 | 42.3% |
0 | 315 | 6.6% |
, | 197 | 4.1% |
2 | 156 | 3.3% |
1 | 138 | 2.9% |
e | 94 | 2.0% |
8 | 94 | 2.0% |
5 | 93 | 1.9% |
9 | 93 | 1.9% |
- | 85 | 1.8% |
Other values (68) | 1490 | 31.2% |
Hangul
Value | Count | Frequency (%) |
암 | 397 | 5.5% |
국 | 209 | 2.9% |
터 | 191 | 2.6% |
센 | 184 | 2.5% |
립 | 116 | 1.6% |
의 | 106 | 1.5% |
가 | 103 | 1.4% |
연 | 100 | 1.4% |
기 | 94 | 1.3% |
원 | 89 | 1.2% |
Other values (470) | 5667 | 78.1% |
Compat Jamo
Value | Count | Frequency (%) |
ㅣ | 21 | 87.5% |
ㆍ | 3 | 12.5% |
None
Value | Count | Frequency (%) |
· | 13 | 41.9% |
』 | 8 | 25.8% |
『 | 8 | 25.8% |
「 | 1 | 3.2% |
」 | 1 | 3.2% |
Punctuation
Value | Count | Frequency (%) |
’ | 4 | 44.4% |
‘ | 3 | 33.3% |
“ | 1 | 11.1% |
” | 1 | 11.1% |
CJK
Value | Count | Frequency (%) |
場 | 1 | 33.3% |
賞 | 1 | 33.3% |
大 | 1 | 33.3% |
CJK Compat Ideographs
Value | Count | Frequency (%) |
茶 | 1 | 100.0% |
NOVIEW
Boolean
MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | 0.2% |
Missing | 86 |
Missing (%) | 7.0% |
Memory size | 2.5 KiB |
Value | Count | Frequency (%) |
True | 742 | 60.1% |
False | 407 | 33.0% |
(Missing) | 86 | 7.0% |
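The sample rows show NOVIEW stored as "Y" and "N", while the profile reports it as Boolean. A minimal sketch of that mapping and of the frequency shares, assuming "Y" maps to True and "N" to False (`FLAG_MAP` and `to_bool` are hypothetical names):

```python
# Assumption: "Y" means True and "N" means False; anything else is missing.
FLAG_MAP = {"Y": True, "N": False}


def to_bool(flag):
    """Map a raw flag to True/False, or None when missing or unrecognized."""
    return FLAG_MAP.get(flag)


# Counts copied from the NOVIEW frequency table (None stands for missing).
counts = {True: 742, False: 407, None: 86}
total = sum(counts.values())
shares = {key: round(100 * n / total, 1) for key, n in counts.items()}

print(to_bool("Y"), to_bool("N"), to_bool(None))
print(shares)
```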
First rows
Index | BBS_SEQ | BBSNUM | COL1 | COL2 | NOVIEW |
---|---|---|---|---|---|
0 | 587 | 203 | <NA> | 금연콜센터 | N |
1 | 588 | 223 | <NA> | 이진수 박사, 제4대 국립암센터 원장에 취임 | Y |
2 | 589 | 223 | <NA> | 국립암센터 인사 | N |
3 | 590 | 223 | <NA> | 국립암센터 제2회 국제심포지엄 성황리에 마쳐 | Y |
4 | 591 | 223 | <NA> | 국제심포지엄 주요 해외 연제 요약 및 연자 소개 | N |
5 | 592 | 223 | <NA> | 암 전문의료기관하면 국립암센터 | Y |
6 | 593 | 223 | <NA> | 암조기검진의 최신지견 세미나 개최 | N |
7 | 594 | 223 | <NA> | 연구성과 | N |
8 | 595 | 223 | <NA> | 복강경 수술, 위암환자 삶의 질을 향상 시키는 것으로 나와 | N |
9 | 596 | 223 | <NA> | 단신 | N |
Last rows
Index | BBS_SEQ | BBSNUM | COL1 | COL2 | NOVIEW |
---|---|---|---|---|---|
1225 | 4,403 | 2,226 | 국립암센터, 암생존자 주간 기념 심포지엄 개최 | <NA> | Y |
1226 | 4,404 | 2,226 | 암생존자와 함께하는 소생캠페인 진행 | <NA> | Y |
1227 | 4,405 | 2,226 | 인공지능 기반 상담형 챗봇 서비스 구축 | <NA> | Y |
1228 | 4,406 | 2,226 | 마이크로바이옴, 암 치료와의 연결고리는? | <NA> | Y |
1229 | 4,407 | 2,226 | 3기 병원실무자양성과정 수료식 진행 | <NA> | Y |
1230 | 4,408 | 2,226 | 개그맨 유상무, 유튜브 수익금으로 소아암 기부 마라톤 선행 | <NA> | Y |
1231 | 4,409 | 2,226 | 5월, 6월 친절직원 | <NA> | Y |
1232 | 4,410 | 2,226 | 암정보 | <NA> | Y |
1233 | 4,484 | 2,246 | <NA> | <NA> | Y |
1234 | 4,485 | 2,266 | <NA> | <NA> | Y |