Overview

Dataset statistics

Number of variables: 11
Number of observations: 69
Missing cells: 185
Missing cells (%): 24.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.5 KiB
Average record size in memory: 96.9 B
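
The missing-cell figures above can be cross-checked against the per-variable counts listed under Alerts. A minimal sketch of that arithmetic, assuming the three variables flagged as missing (BBSCONTENT, BBSCONTENTYN, ETC) are the only ones that contribute missing cells; the dictionary name is illustrative:

```python
# Missing counts per variable, copied from the Alerts section of this report.
# Assumption: no other variable contributes missing cells.
missing_per_variable = {"BBSCONTENT": 69, "BBSCONTENTYN": 47, "ETC": 69}

n_variables = 11
n_observations = 69

total_cells = n_variables * n_observations          # every cell in the table
missing_cells = sum(missing_per_variable.values())  # cells flagged as missing
missing_pct = 100 * missing_cells / total_cells

print(f"Missing cells: {missing_cells} of {total_cells} ({missing_pct:.1f}%)")
```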

Variable types

Numeric: 1
Text: 1
Unsupported: 2
Boolean: 1
Categorical: 5
DateTime: 1

Dataset

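Description: Table of miscellaneous announcements that the National Cancer Center published through its website, covering entries up to September 2019
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15049632/fileData.do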

Alerts

BBSCONTENTYN has constant value "" (Constant)
BBS_KIND is highly overall correlated with BBSNUM and 2 other fields (High correlation)
CONTENT2 is highly overall correlated with BBSNUM and 2 other fields (High correlation)
READNUM is highly overall correlated with BBSNUM and 2 other fields (High correlation)
BBSNUM is highly overall correlated with READNUM and 2 other fields (High correlation)
CONTENT2 is highly imbalanced (62.5%) (Imbalance)
BBS_KIND is highly imbalanced (62.5%) (Imbalance)
BBSCONTENT has 69 (100.0%) missing values (Missing)
BBSCONTENTYN has 47 (68.1%) missing values (Missing)
ETC has 69 (100.0%) missing values (Missing)
BBSNUM has unique values (Unique)
BBSCONTENT is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
ETC is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-12 09:29:11.541811
Analysis finished: 2023-12-12 09:29:12.487976
Duration: 0.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

BBSNUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 69
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 151.52174
Minimum: 59
Maximum: 292
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 753.0 B

Quantile statistics

Minimum: 59
5-th percentile: 63.4
Q1: 77
Median: 113
Q3: 274
95-th percentile: 288.6
Maximum: 292
Range: 233
Interquartile range (IQR): 197

Descriptive statistics

Standard deviation: 90.682641
Coefficient of variation (CV): 0.59847941
Kurtosis: -1.4049328
Mean: 151.52174
Median Absolute Deviation (MAD): 41
Skewness: 0.67139649
Sum: 10455
Variance: 8223.3414
Monotonicity: Not monotonic
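
Several of the BBSNUM figures are simple functions of one another. The sketch below recomputes the derived statistics from the reported ones as a consistency check; it works only from the rounded values printed in this report, so small floating-point differences from the profiler's own output are expected.

```python
# Figures copied from the BBSNUM statistics in this report.
n = 69
total = 10455
std = 90.682641
q1, q3 = 77, 274
minimum, maximum = 59, 292

mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = standard deviation squared
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.5f} cv={cv:.8f} variance={variance:.4f}")
print(f"iqr={iqr} range={value_range}")
```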
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
124 | 1 | 1.4%
64 | 1 | 1.4%
89 | 1 | 1.4%
82 | 1 | 1.4%
81 | 1 | 1.4%
80 | 1 | 1.4%
79 | 1 | 1.4%
78 | 1 | 1.4%
63 | 1 | 1.4%
90 | 1 | 1.4%
Other values (59) | 59 | 85.5%

Value | Count | Frequency (%)
59 | 1 | 1.4%
61 | 1 | 1.4%
62 | 1 | 1.4%
63 | 1 | 1.4%
64 | 1 | 1.4%
65 | 1 | 1.4%
66 | 1 | 1.4%
67 | 1 | 1.4%
68 | 1 | 1.4%
69 | 1 | 1.4%

Value | Count | Frequency (%)
292 | 1 | 1.4%
291 | 1 | 1.4%
290 | 1 | 1.4%
289 | 1 | 1.4%
288 | 1 | 1.4%
287 | 1 | 1.4%
286 | 1 | 1.4%
285 | 1 | 1.4%
284 | 1 | 1.4%
283 | 1 | 1.4%

BBSTITLE
Text

Distinct: 61
Distinct (%): 88.4%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 159
Median length: 39
Mean length: 30.376812
Min length: 2

Characters and Unicode

Total characters: 2096
Distinct characters: 127
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 79.7%

Sample

1st row: 2006년도 암정복추진연구개발사업 지정과제(추가) 공고
2nd row: 2006년도 암정복추진연구개발사업 추가 지정과제 선정결과
3rd row: 암정복추진연구개발사업 연구성과물, 수행과제요약서, 진도보고서 제출 요청
4th row: 암정복추진연구개발사업「연구비 사용실적보고서 및 최종결과보고서」서식 안내
5th row: 2003년도 암정복추진연구개발사업 신청서

Value | Count | Frequency (%)
암정복추진연구개발사업 | 37 | 11.5%
안내 | 21 | 6.5%
 | 13 | 4.0%
2009년도 | 11 | 3.4%
선정결과 | 10 | 3.1%
구두발표평가 | 10 | 3.1%
2006년도 | 10 | 3.1%
2005년도 | 9 | 2.8%
공고 | 9 | 2.8%
2003년도 | 7 | 2.2%
Other values (110) | 185 | 57.5%

Most occurring characters

Value | Count | Frequency (%)
 | 253 | 12.1%
0 | 106 | 5.1%
2 | 83 | 4.0%
 | 77 | 3.7%
 | 73 | 3.5%
 | 67 | 3.2%
 | 62 | 3.0%
 | 59 | 2.8%
 | 55 | 2.6%
 | 52 | 2.5%
Other values (117) | 1209 | 57.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1400 | 66.8%
Decimal Number | 331 | 15.8%
Space Separator | 253 | 12.1%
Other Punctuation | 70 | 3.3%
Open Punctuation | 20 | 1.0%
Close Punctuation | 20 | 1.0%
Dash Punctuation | 2 | 0.1%
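
The category names above are Unicode general categories. As a rough illustration, the following sketch tallies them for the first sample title using Python's standard `unicodedata` module. The two-letter codes it returns map onto the labels used here (`Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Ps` and `Pe` are Open and Close Punctuation); the helper name `category_counts` is illustrative and not part of ydata-profiling.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# First sample row of the text variable, copied from the Sample list above.
# NFC normalization keeps each Hangul syllable as a single code point.
title = unicodedata.normalize("NFC", "2006년도 암정복추진연구개발사업 지정과제(추가) 공고")
counts = category_counts(title)

# Print categories from most to least frequent.
for code, count in counts.most_common():
    print(code, count)
```

Applied to every title in the column and summed, the same tally would give a table shaped like the one above.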

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 77 | 5.5%
 | 73 | 5.2%
 | 67 | 4.8%
 | 62 | 4.4%
 | 59 | 4.2%
 | 55 | 3.9%
 | 52 | 3.7%
 | 52 | 3.7%
 | 51 | 3.6%
 | 51 | 3.6%
Other values (95) | 801 | 57.2%

Decimal Number
Value | Count | Frequency (%)
0 | 106 | 32.0%
2 | 83 | 25.1%
1 | 26 | 7.9%
5 | 24 | 7.3%
6 | 23 | 6.9%
4 | 23 | 6.9%
9 | 17 | 5.1%
3 | 16 | 4.8%
8 | 7 | 2.1%
7 | 5 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
; | 19 | 27.1%
& | 19 | 27.1%
# | 19 | 27.1%
· | 7 | 10.0%
, | 6 | 8.6%

Open Punctuation
Value | Count | Frequency (%)
( | 14 | 70.0%
 | 6 | 30.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14 | 70.0%
 | 6 | 30.0%

Space Separator
Value | Count | Frequency (%)
 | 253 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1400 | 66.8%
Common | 696 | 33.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 77 | 5.5%
 | 73 | 5.2%
 | 67 | 4.8%
 | 62 | 4.4%
 | 59 | 4.2%
 | 55 | 3.9%
 | 52 | 3.7%
 | 52 | 3.7%
 | 51 | 3.6%
 | 51 | 3.6%
Other values (95) | 801 | 57.2%

Common
Value | Count | Frequency (%)
 | 253 | 36.4%
0 | 106 | 15.2%
2 | 83 | 11.9%
1 | 26 | 3.7%
5 | 24 | 3.4%
6 | 23 | 3.3%
4 | 23 | 3.3%
; | 19 | 2.7%
& | 19 | 2.7%
# | 19 | 2.7%
Other values (12) | 101 | 14.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1398 | 66.7%
ASCII | 676 | 32.3%
None | 20 | 1.0%
Compat Jamo | 2 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 253 | 37.4%
0 | 106 | 15.7%
2 | 83 | 12.3%
1 | 26 | 3.8%
5 | 24 | 3.6%
6 | 23 | 3.4%
4 | 23 | 3.4%
; | 19 | 2.8%
& | 19 | 2.8%
# | 19 | 2.8%
Other values (8) | 81 | 12.0%

Hangul
Value | Count | Frequency (%)
 | 77 | 5.5%
 | 73 | 5.2%
 | 67 | 4.8%
 | 62 | 4.4%
 | 59 | 4.2%
 | 55 | 3.9%
 | 52 | 3.7%
 | 52 | 3.7%
 | 51 | 3.6%
 | 51 | 3.6%
Other values (94) | 799 | 57.2%

None
Value | Count | Frequency (%)
· | 7 | 35.0%
 | 6 | 30.0%
 | 6 | 30.0%
 | 1 | 5.0%

Compat Jamo
Value | Count | Frequency (%)
 | 2 | 100.0%

BBSCONTENT
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 69
Missing (%): 100.0%
Memory size: 753.0 B

BBSCONTENTYN
Boolean

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 4.5%
Missing: 47
Missing (%): 68.1%
Memory size: 270.0 B

Value | Count | Frequency (%)
True | 22 | 31.9%
(Missing) | 47 | 68.1%
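
A Distinct (%) of 4.5% for a single distinct value looks odd against 69 rows. The number is consistent with the share being taken over the 22 non-missing values rather than over all observations. A small sketch of that reading, which is an inference from the figures in this report and not a statement about ydata-profiling internals:

```python
# Figures copied from the BBSCONTENTYN statistics in this report.
n_rows = 69
n_missing = 47
n_distinct = 1

n_present = n_rows - n_missing               # non-missing values
distinct_pct = 100 * n_distinct / n_present  # share among non-missing values
missing_pct = 100 * n_missing / n_rows       # share among all rows

print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Missing (%): {missing_pct:.1f}%")
```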

BBSFROMDATE
Categorical

Distinct: 3
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 8
Median length: 4
Mean length: 4.9855072
Min length: 4

Unique

Unique: 1
Unique (%): 1.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 52 | 75.4%
20100705 | 16 | 23.2%
20090818 | 1 | 1.4%
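
BBSFROMDATE reports 0 missing values even though 52 of its 69 entries read `<NA>`. The placeholder appears to be stored as a literal four-character string, which is consistent with the reported mean length. The sketch below checks that arithmetic and shows one way such a column might be cleaned with the standard library; the helper name `parse_yyyymmdd` and the choice to map `<NA>` to `None` are illustrative assumptions.

```python
from datetime import date, datetime
from typing import Optional

# Value counts copied from the Common Values table of BBSFROMDATE.
values = ["<NA>"] * 52 + ["20100705"] * 16 + ["20090818"] * 1

# Mean string length when "<NA>" is counted as a 4-character string.
mean_length = sum(len(v) for v in values) / len(values)

def parse_yyyymmdd(raw: str) -> Optional[date]:
    """Return a date for YYYYMMDD strings, or None for the "<NA>" placeholder."""
    if raw == "<NA>":
        return None
    return datetime.strptime(raw, "%Y%m%d").date()

parsed = [parse_yyyymmdd(v) for v in values]
n_missing = sum(p is None for p in parsed)

print(f"mean length = {mean_length:.7f}")
print(f"missing after cleaning = {n_missing} of {len(parsed)}")
```

After a conversion like this, the column would be profiled as a date with 52 missing values instead of as a three-level categorical.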

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 52 | 75.4%
20100705 | 16 | 23.2%
20090818 | 1 | 1.4%

BBSTODATE
Categorical

Distinct: 3
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 8
Median length: 4
Mean length: 4.9855072
Min length: 4

Unique

Unique: 1
Unique (%): 1.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 52 | 75.4%
20100705 | 16 | 23.2%
20090818 | 1 | 1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 52 | 75.4%
20100705 | 16 | 23.2%
20090818 | 1 | 1.4%

READNUM
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 4
Median length: 1
Mean length: 1.7391304
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 52 | 75.4%
<NA> | 17 | 24.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 52 | 75.4%
na | 17 | 24.6%

CONTENT2
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 64 | 92.8%
1111 | 5 | 7.2%
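
The 62.5% imbalance reported for CONTENT2 (and, with the same 64/5 split, for BBS_KIND) is consistent with one minus the Shannon entropy of the class frequencies, normalized by the logarithm of the number of classes. The sketch below reproduces the figure under that assumption; `imbalance_score` is an illustrative helper, not the library's own function.

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """1 - normalized Shannon entropy: 0 for a uniform split, near 1 when one class dominates."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # convention chosen here for the degenerate single-class case
    total = sum(counts)
    # Entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Class counts copied from the Common Values table: "<NA>" x 64, "1111" x 5.
score = imbalance_score([64, 5])
print(f"imbalance = {score:.1%}")
```

Note that the dominant class here is the literal `<NA>` string, so the imbalance largely reflects how sparsely the field is populated.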

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 64 | 92.8%
1111 | 5 | 7.2%

BBS_KIND
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.7826087
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 64 | 92.8%
A | 5 | 7.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 64 | 92.8%
a | 5 | 7.2%

MOD_DATE
DateTime

Distinct: 43
Distinct (%): 62.3%
Missing: 0
Missing (%): 0.0%
Memory size: 684.0 B
Minimum: 2001-01-01 00:00:00
Maximum: 2011-04-18 00:00:00
Histogram with fixed size bins (bins=43)

ETC
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 69
Missing (%): 100.0%
Memory size: 753.0 B

Interactions


Correlations

 | BBSNUM | BBSTITLE | BBSFROMDATE | BBSTODATE | MOD_DATE
BBSNUM | 1.000 | 0.935 | 0.605 | 0.605 | 1.000
BBSTITLE | 0.935 | 1.000 | 1.000 | 1.000 | 0.983
BBSFROMDATE | 0.605 | 1.000 | 1.000 | 0.605 | 0.605
BBSTODATE | 0.605 | 1.000 | 0.605 | 1.000 | 0.605
MOD_DATE | 1.000 | 0.983 | 0.605 | 0.605 | 1.000

 | BBS_KIND | CONTENT2 | READNUM | BBSTODATE | BBSFROMDATE
BBS_KIND | 1.000 | 1.000 | 1.000 | NaN | NaN
CONTENT2 | 1.000 | 1.000 | 1.000 | NaN | NaN
READNUM | 1.000 | 1.000 | 1.000 | NaN | NaN
BBSTODATE | NaN | NaN | NaN | 1.000 | 0.410
BBSFROMDATE | NaN | NaN | NaN | 0.410 | 1.000

 | BBSNUM | BBSFROMDATE | BBSTODATE | READNUM | CONTENT2 | BBS_KIND
BBSNUM | 1.000 | 0.410 | 0.410 | 1.000 | 1.000 | 1.000
BBSFROMDATE | 0.410 | 1.000 | 0.410 | 0.000 | 0.000 | 0.000
BBSTODATE | 0.410 | 0.410 | 1.000 | 0.000 | 0.000 | 0.000
READNUM | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
CONTENT2 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
BBS_KIND | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | BBSNUM | BBSTITLE | BBSCONTENT | BBSCONTENTYN | BBSFROMDATE | BBSTODATE | READNUM | CONTENT2 | BBS_KIND | MOD_DATE | ETC
0 | 124 | 2006년도 암정복추진연구개발사업 지정과제(추가) 공고 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2006-08-14 | <NA>
1 | 125 | 2006년도 암정복추진연구개발사업 추가 지정과제 선정결과 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2006-09-28 | <NA>
2 | 130 | 암정복추진연구개발사업 연구성과물, 수행과제요약서, 진도보고서 제출 요청 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2006-12-08 | <NA>
3 | 74 | 암정복추진연구개발사업「연구비 사용실적보고서 및 최종결과보고서」서식 안내 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2004-05-10 | <NA>
4 | 65 | 2003년도 암정복추진연구개발사업 신청서 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2003-01-01 | <NA>
5 | 66 | 암정복 추진연구개발사업 사업설명회 개최일정 변경 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2003-04-15 | <NA>
6 | 67 | 2003년도 암정복추진연구사업 2차평가 대상자 및 구두발표평가 일정안내 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2003-05-22 | <NA>
7 | 68 | 2003년도 암정복사업 2차평가 대상과제 선정결과 고지일정 변경 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2003-06-17 | <NA>
8 | 77 | 제5기 암정복추진기획단장 위촉-김창민 국립암센터 연구소장 연임 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2004-09-01 | <NA>
9 | 95 | 2005년도 암정복추진연구개발사업 선정결과 안내 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2005-06-28 | <NA>

 | BBSNUM | BBSTITLE | BBSCONTENT | BBSCONTENTYN | BBSFROMDATE | BBSTODATE | READNUM | CONTENT2 | BBS_KIND | MOD_DATE | ETC
59 | 118 | 2006년도 암정복추진연구개발사업 계속과제 구두발표평가 일정 안내 | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | 2006-05-23 | <NA>
60 | 283 | 2010년도 암정복추진연구개발사업 연차(단계)실적ㆍ계획서 제출요청 | <NA> | Y | 20100705 | 20100705 | <NA> | <NA> | <NA> | 2010-07-05 | <NA>
61 | 284 | 2010년도 암정복추진연구개발사업 공고 | <NA> | Y | 20100705 | 20100705 | <NA> | <NA> | <NA> | 2010-07-05 | <NA>
62 | 285 | 2010년도 암정복추진연구개발사업 계속과제(2년차 및 2단계 2, 3년차) 구두발표평가 실시 안내 | <NA> | Y | 20100705 | 20100705 | <NA> | <NA> | <NA> | 2010-07-05 | <NA>
63 | 286 | 2010년도 암정복추진연구개발사업 계속과제(2년차 및 2단계 2,3년차) 선정결과 통보 | <NA> | Y | 20100705 | 20100705 | <NA> | <NA> | <NA> | 2010-07-05 | <NA>
64 | 287 | 2010년도 신규 및 계속과제(3년차 및 2단계1년차) 선정결과 통보 - 이메일 공지 | <NA> | Y | 20100705 | 20100705 | <NA> | <NA> | <NA> | 2010-07-05 | <NA>
65 | 289 | 11 | <NA> | Y | <NA> | <NA> | 0 | 1111 | A | 2011-04-18 | <NA>
66 | 290 | 111 | <NA> | Y | <NA> | <NA> | 0 | 1111 | A | 2011-04-18 | <NA>
67 | 291 | 111 | <NA> | Y | <NA> | <NA> | 0 | 1111 | A | 2011-04-18 | <NA>
68 | 292 | 111 | <NA> | Y | <NA> | <NA> | 0 | 1111 | A | 2011-04-18 | <NA>