Overview

Dataset statistics

Number of variables: 11
Number of observations: 249
Missing cells: 249
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 22.0 KiB
Average record size in memory: 90.5 B
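
The percentage and per-record figures above can be re-derived from the raw counts. The following is a minimal sketch with illustrative variable names; it assumes that the reported 22.0 KiB is a rounded total and that 1 KiB is 1024 bytes.

```python
# Raw counts as reported in the dataset statistics.
n_variables = 11
n_observations = 249
missing_cells = 249
total_size_kib = 22.0  # rounded value from the report (assumption: 1 KiB = 1024 B)

# Share of missing cells over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average memory footprint of one record, in bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 249 missing cells equal the number of observations, which matches a single column (colCnt) being entirely empty.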

Variable types

Numeric: 1
Categorical: 6
Text: 3
Unsupported: 1

Dataset

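Description: Provides metadata for the pediatric and adolescent cancer registry (the data items to be provided, their types, sizes, per-item counts, sample data, and so on).
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048704/fileData.do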

Alerts

tblId is highly overall correlated with NUM and 3 other fields (High correlation)
gpNm is highly overall correlated with NUM and 3 other fields (High correlation)
gpId is highly overall correlated with NUM and 3 other fields (High correlation)
tblNm is highly overall correlated with NUM and 3 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
dataType is highly overall correlated with dispFormat (High correlation)
dispFormat is highly overall correlated with dataType (High correlation)
colCnt has 249 (100.0%) missing values (Missing)
NUM has unique values (Unique)
colCnt is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-12 06:02:37.720349
Analysis finished: 2023-12-12 06:02:38.605613
Duration: 0.89 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 249
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 125
Minimum: 1
Maximum: 249
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 13.4
Q1: 63
Median: 125
Q3: 187
95-th percentile: 236.6
Maximum: 249
Range: 248
Interquartile range (IQR): 124

Descriptive statistics

Standard deviation: 72.024301
Coefficient of variation (CV): 0.57619441
Kurtosis: -1.2
Mean: 125
Median Absolute Deviation (MAD): 62
Skewness: 0
Sum: 31125
Variance: 5187.5
Monotonicity: Strictly increasing
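
The statistics above are consistent with NUM being the plain row sequence 1 to 249 (minimum 1, maximum 249, 249 distinct values, strictly increasing). Under that assumption, the sketch below reproduces the main figures with the Python standard library. The `quantile` helper is illustrative: it uses linear interpolation between order statistics, and the variance uses the n-1 denominator; both choices are assumptions about how the profiler computes these values, although they do match the reported numbers.

```python
import math
import statistics

# Assumption: NUM is simply 1, 2, ..., 249.
values = list(range(1, 250))


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)
cv = std_dev / mean                      # coefficient of variation
mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation

print("Sum:", sum(values))
print("Mean:", mean, "Median:", median)
print("Variance:", variance, "Std dev:", round(std_dev, 6))
print("CV:", round(cv, 8), "MAD:", mad)
print("5th / 95th percentile:", round(quantile(values, 0.05), 1), round(quantile(values, 0.95), 1))
print("Q1 / Q3:", quantile(values, 0.25), quantile(values, 0.75))
```

For a sequence like this, the sample variance has the closed form n(n+1)/12, which gives 249 × 250 / 12 = 5187.5.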
Histogram with fixed size bins (bins=50)

Common values

Value; Count; Frequency (%)
1; 1; 0.4%
172; 1; 0.4%
159; 1; 0.4%
160; 1; 0.4%
161; 1; 0.4%
162; 1; 0.4%
163; 1; 0.4%
164; 1; 0.4%
165; 1; 0.4%
166; 1; 0.4%
Other values (239); 239; 96.0%

Minimum 10 values

Value; Count; Frequency (%)
1; 1; 0.4%
2; 1; 0.4%
3; 1; 0.4%
4; 1; 0.4%
5; 1; 0.4%
6; 1; 0.4%
7; 1; 0.4%
8; 1; 0.4%
9; 1; 0.4%
10; 1; 0.4%

Maximum 10 values

Value; Count; Frequency (%)
249; 1; 0.4%
248; 1; 0.4%
247; 1; 0.4%
246; 1; 0.4%
245; 1; 0.4%
244; 1; 0.4%
243; 1; 0.4%
242; 1; 0.4%
241; 1; 0.4%
240; 1; 0.4%

gpId
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
PDTR_HLTH; 76
PDTR_CHMO_FLST; 26
PDTR_CNDX_BDMS; 16
PDTR_TRGT; 13
PDTR_EVAL_DEAD; 10
Other values (17); 108

Length

Max length: 14
Median length: 9
Mean length: 10.64257
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PDTR_TRGT
2nd row: PDTR_TRGT
3rd row: PDTR_TRGT
4th row: PDTR_TRGT
5th row: PDTR_TRGT

Common Values

Value; Count; Frequency (%)
PDTR_HLTH; 76; 30.5%
PDTR_CHMO_FLST; 26; 10.4%
PDTR_CNDX_BDMS; 16; 6.4%
PDTR_TRGT; 13; 5.2%
PDTR_EVAL_DEAD; 10; 4.0%
PDTR_MTST_RLPS; 10; 4.0%
PDTR_RTX; 10; 4.0%
PDTR_CHMO; 9; 3.6%
PDTR_BX; 8; 3.2%
PDTR_OPRT; 8; 3.2%
Other values (12); 63; 25.3%

Length

Histogram of lengths of the category

Word; Count; Frequency (%)
pdtr_hlth; 76; 30.5%
pdtr_chmo_flst; 26; 10.4%
pdtr_cndx_bdms; 16; 6.4%
pdtr_trgt; 13; 5.2%
pdtr_eval_dead; 10; 4.0%
pdtr_mtst_rlps; 10; 4.0%
pdtr_rtx; 10; 4.0%
pdtr_chmo; 9; 3.6%
pdtr_bx; 8; 3.2%
pdtr_oprt; 8; 3.2%
Other values (12); 63; 25.3%

gpNm
Categorical

HIGH CORRELATION 

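Distinct: 22
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
환자건강정보 [Patient health information]; 76
항암 FlowSheet [Chemotherapy FlowSheet]; 26
진단 및 신체 [Diagnosis and body]; 16
Summary; 13
치료평가 및 사망정보 [Treatment evaluation and death information]; 10
Other values (17); 108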

Length

Max length: 12
Median length: 11
Mean length: 6.8915663
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

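Value; Count; Frequency (%)
환자건강정보 [Patient health information]; 76; 30.5%
항암 FlowSheet [Chemotherapy FlowSheet]; 26; 10.4%
진단 및 신체 [Diagnosis and body]; 16; 6.4%
Summary; 13; 5.2%
치료평가 및 사망정보 [Treatment evaluation and death information]; 10; 4.0%
전이 및 재발 [Metastasis and recurrence]; 10; 4.0%
방사선 치료 [Radiation therapy]; 10; 4.0%
항암치료 [Chemotherapy]; 9; 3.6%
병리검사 [Pathology test]; 8; 3.2%
수술 [Surgery]; 8; 3.2%
Other values (12); 63; 25.3%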

Length

Histogram of lengths of the category
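
Word; Count; Frequency (%)
환자건강정보 [Patient health information]; 76; 20.7%
(token not rendered); 36; 9.8%
flowsheet; 26; 7.1%
항암 [Anticancer]; 26; 7.1%
진단 [Diagnosis]; 16; 4.3%
신체 [Body]; 16; 4.3%
summary; 13; 3.5%
initial; 11; 3.0%
치료평가 [Treatment evaluation]; 10; 2.7%
사망정보 [Death information]; 10; 2.7%
Other values (19); 128; 34.8%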

tblId
Categorical

HIGH CORRELATION 

Distinct: 35
Distinct (%): 14.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
PDTR_PE_CHMO_FLST; 26
PDTR_MR_HLTH_10; 14
PDTR_PT_TRGT; 13
PDTR_PT_BDMS; 11
PDTR_PE_RTX; 10
Other values (30); 175

Length

Max length: 17
Median length: 15
Mean length: 13.594378
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PDTR_PT_TRGT
2nd row: PDTR_PT_TRGT
3rd row: PDTR_PT_TRGT
4th row: PDTR_PT_TRGT
5th row: PDTR_PT_TRGT

Common Values

Value; Count; Frequency (%)
PDTR_PE_CHMO_FLST; 26; 10.4%
PDTR_MR_HLTH_10; 14; 5.6%
PDTR_PT_TRGT; 13; 5.2%
PDTR_PT_BDMS; 11; 4.4%
PDTR_PE_RTX; 10; 4.0%
PDTR_MR_HLTH_8; 9; 3.6%
PDTR_MR_HLTH_7; 9; 3.6%
PDTR_MR_HLTH_6; 9; 3.6%
PDTR_MR_HLTH_5; 9; 3.6%
PDTR_PE_CHMO; 9; 3.6%
Other values (25); 130; 52.2%

Length

Histogram of lengths of the category

Word; Count; Frequency (%)
pdtr_pe_chmo_flst; 26; 10.4%
pdtr_mr_hlth_10; 14; 5.6%
pdtr_pt_trgt; 13; 5.2%
pdtr_pt_bdms; 11; 4.4%
pdtr_pe_rtx; 10; 4.0%
pdtr_mr_hlth_8; 9; 3.6%
pdtr_mr_hlth_7; 9; 3.6%
pdtr_mr_hlth_6; 9; 3.6%
pdtr_mr_hlth_5; 9; 3.6%
pdtr_pe_chmo; 9; 3.6%
Other values (25); 130; 52.2%

tblNm
Categorical

HIGH CORRELATION 

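Distinct: 35
Distinct (%): 14.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
Flow Sheet; 26
과거력 [Past medical history]; 14
기본정보 [Basic information]; 13
신체계측정보 [Body measurement information]; 11
방사선 치료 정보 [Radiation therapy information]; 10
Other values (30); 175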

Length

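Max length: 11
Median length: 9
Mean length: 6.3815261
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기본정보 [Basic information]
2nd row: 기본정보 [Basic information]
3rd row: 기본정보 [Basic information]
4th row: 기본정보 [Basic information]
5th row: 기본정보 [Basic information]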

Common Values

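Value; Count; Frequency (%)
Flow Sheet; 26; 10.4%
과거력 [Past medical history]; 14; 5.6%
기본정보 [Basic information]; 13; 5.2%
신체계측정보 [Body measurement information]; 11; 4.4%
방사선 치료 정보 [Radiation therapy information]; 10; 4.0%
가족력(형제/자매) [Family history: siblings]; 9; 3.6%
가족력(자녀) [Family history: children]; 9; 3.6%
가족력(모) [Family history: mother]; 9; 3.6%
가족력(부) [Family history: father]; 9; 3.6%
항암치료 정보 [Chemotherapy information]; 9; 3.6%
Other values (25); 130; 52.2%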

Length

Histogram of lengths of the category
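
Word; Count; Frequency (%)
flow; 26; 8.2%
sheet; 26; 8.2%
정보 [Information]; 19; 6.0%
과거력 [Past medical history]; 14; 4.4%
기본정보 [Basic information]; 13; 4.1%
initial; 11; 3.5%
신체계측정보 [Body measurement information]; 11; 3.5%
치료 [Treatment]; 10; 3.2%
방사선 [Radiation]; 10; 3.2%
가족력(형제/자매 [Family history: siblings]; 9; 2.8%
Other values (30); 168; 53.0%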

colId
Text

Distinct: 217
Distinct (%): 87.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 24
Median length: 18
Mean length: 12.040161
Min length: 5

Characters and Unicode

Total characters: 2998
Distinct characters: 30
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 209
Unique (%): 83.9%

Sample

1st row: PT_SBST_NO
2nd row: SEX_CD
3rd row: BRTH_YMD
4th row: FRST_DIAG_CD
5th row: FRST_DIAG_YMD

Word; Count; Frequency (%)
pt_sbst_no; 25; 10.0%
chmo_strt_ymd; 3; 1.2%
wt_vl; 2; 0.8%
rtx_strt_ymd; 2; 0.8%
intr_pret_cmnt; 2; 0.8%
chmo_end_ymd; 2; 0.8%
ecog_cd; 2; 0.8%
dead_ymd; 2; 0.8%
fmhs_fath_yn; 1; 0.4%
fmhs_fath_cncr_yn; 1; 0.4%
Other values (207); 207; 83.1%

Most occurring characters

Value; Count; Frequency (%)
_; 500; 16.7%
T; 293; 9.8%
M; 268; 8.9%
N; 232; 7.7%
S; 197; 6.6%
C; 193; 6.4%
D; 155; 5.2%
R; 118; 3.9%
H; 116; 3.9%
Y; 115; 3.8%
Other values (20); 811; 27.1%

Most occurring categories

Value; Count; Frequency (%)
Uppercase Letter; 2495; 83.2%
Connector Punctuation; 500; 16.7%
Decimal Number; 3; 0.1%

Most frequent character per category

Uppercase Letter
Value; Count; Frequency (%)
T; 293; 11.7%
M; 268; 10.7%
N; 232; 9.3%
S; 197; 7.9%
C; 193; 7.7%
D; 155; 6.2%
R; 118; 4.7%
H; 116; 4.6%
Y; 115; 4.6%
L; 101; 4.0%
Other values (16); 707; 28.3%

Decimal Number
Value; Count; Frequency (%)
1; 1; 33.3%
2; 1; 33.3%
3; 1; 33.3%

Connector Punctuation
Value; Count; Frequency (%)
_; 500; 100.0%

Most occurring scripts

Value; Count; Frequency (%)
Latin; 2495; 83.2%
Common; 503; 16.8%

Most frequent character per script

Latin
Value; Count; Frequency (%)
T; 293; 11.7%
M; 268; 10.7%
N; 232; 9.3%
S; 197; 7.9%
C; 193; 7.7%
D; 155; 6.2%
R; 118; 4.7%
H; 116; 4.6%
Y; 115; 4.6%
L; 101; 4.0%
Other values (16); 707; 28.3%

Common
Value; Count; Frequency (%)
_; 500; 99.4%
1; 1; 0.2%
2; 1; 0.2%
3; 1; 0.2%

Most occurring blocks

Value; Count; Frequency (%)
ASCII; 2998; 100.0%

Most frequent character per block

ASCII
Value; Count; Frequency (%)
_; 500; 16.7%
T; 293; 9.8%
M; 268; 8.9%
N; 232; 7.7%
S; 197; 6.6%
C; 193; 6.4%
D; 155; 5.2%
R; 118; 3.9%
H; 116; 3.9%
Y; 115; 3.8%
Other values (20); 811; 27.1%

colNm
Text

Distinct: 182
Distinct (%): 73.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 33
Median length: 26
Mean length: 7.7871486
Min length: 2

Characters and Unicode

Total characters: 1939
Distinct characters: 186
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 163
Unique (%): 65.5%

Sample

1st row: 환자대체번호 [Patient substitute number]
2nd row: 성별 코드 [Sex code]
3rd row: 생년월일 [Date of birth]
4th row: 최초 진단 코드 [First diagnosis code]
5th row: 최초 진단일 [First diagnosis date]
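
Word; Count; Frequency (%)
내용 [Content]; 42; 10.3%
환자대체번호 [Patient substitute number]; 25; 6.1%
검사 [Test]; 22; 5.4%
결과 [Result]; 11; 2.7%
검사명 [Test name]; 10; 2.4%
접수일 [Receipt date]; 10; 2.4%
여부 [Yes/no indicator]; 7; 1.7%
n:무 [N: no]; 6; 1.5%
y:유 [Y: yes]; 6; 1.5%
코드 [Code]; 5; 1.2%
Other values (186); 265; 64.8%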

Most occurring characters

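Value; Count; Frequency (%)
(space); 160; 8.3%
(character not rendered); 58; 3.0%
(character not rendered); 56; 2.9%
(character not rendered); 54; 2.8%
(character not rendered); 53; 2.7%
(character not rendered); 53; 2.7%
); 53; 2.7%
(; 53; 2.7%
(character not rendered); 47; 2.4%
(character not rendered); 46; 2.4%
Other values (176); 1306; 67.4%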

Most occurring categories

Value; Count; Frequency (%)
Other Letter; 1374; 70.9%
Space Separator; 160; 8.3%
Uppercase Letter; 145; 7.5%
Lowercase Letter; 123; 6.3%
Close Punctuation; 53; 2.7%
Open Punctuation; 53; 2.7%
Other Punctuation; 27; 1.4%
Decimal Number; 4; 0.2%

Most frequent character per category

Other Letter
Value; Count; Frequency (%)
(character not rendered); 58; 4.2%
(character not rendered); 56; 4.1%
(character not rendered); 54; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 47; 3.4%
(character not rendered); 46; 3.3%
(character not rendered); 44; 3.2%
(character not rendered); 40; 2.9%
(character not rendered); 39; 2.8%
Other values (125); 884; 64.3%

Uppercase Letter
Value; Count; Frequency (%)
N; 17; 11.7%
T; 14; 9.7%
I; 13; 9.0%
A; 11; 7.6%
C; 10; 6.9%
S; 9; 6.2%
O; 9; 6.2%
R; 8; 5.5%
E; 8; 5.5%
B; 8; 5.5%
Other values (12); 38; 26.2%

Lowercase Letter
Value; Count; Frequency (%)
n; 14; 11.4%
i; 12; 9.8%
e; 11; 8.9%
o; 10; 8.1%
s; 9; 7.3%
a; 8; 6.5%
r; 6; 4.9%
g; 6; 4.9%
m; 6; 4.9%
c; 6; 4.9%
Other values (11); 35; 28.5%

Decimal Number
Value; Count; Frequency (%)
1; 2; 50.0%
3; 1; 25.0%
2; 1; 25.0%

Other Punctuation
Value; Count; Frequency (%)
/; 15; 55.6%
:; 12; 44.4%

Space Separator
Value; Count; Frequency (%)
(space); 160; 100.0%

Close Punctuation
Value; Count; Frequency (%)
); 53; 100.0%

Open Punctuation
Value; Count; Frequency (%)
(; 53; 100.0%

Most occurring scripts

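Value; Count; Frequency (%)
Hangul; 1374; 70.9%
Common; 297; 15.3%
Latin; 268; 13.8%

Most frequent character per script

Hangul
Value; Count; Frequency (%)
(character not rendered); 58; 4.2%
(character not rendered); 56; 4.1%
(character not rendered); 54; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 47; 3.4%
(character not rendered); 46; 3.3%
(character not rendered); 44; 3.2%
(character not rendered); 40; 2.9%
(character not rendered); 39; 2.8%
Other values (125); 884; 64.3%

Latin
Value; Count; Frequency (%)
N; 17; 6.3%
n; 14; 5.2%
T; 14; 5.2%
I; 13; 4.9%
i; 12; 4.5%
A; 11; 4.1%
e; 11; 4.1%
o; 10; 3.7%
C; 10; 3.7%
s; 9; 3.4%
Other values (33); 147; 54.9%

Common
Value; Count; Frequency (%)
(space); 160; 53.9%
); 53; 17.8%
(; 53; 17.8%
/; 15; 5.1%
:; 12; 4.0%
1; 2; 0.7%
3; 1; 0.3%
2; 1; 0.3%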

Most occurring blocks

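Value; Count; Frequency (%)
Hangul; 1374; 70.9%
ASCII; 565; 29.1%

Most frequent character per block

ASCII
Value; Count; Frequency (%)
(space); 160; 28.3%
); 53; 9.4%
(; 53; 9.4%
N; 17; 3.0%
/; 15; 2.7%
n; 14; 2.5%
T; 14; 2.5%
I; 13; 2.3%
i; 12; 2.1%
:; 12; 2.1%
Other values (41); 202; 35.8%

Hangul
Value; Count; Frequency (%)
(character not rendered); 58; 4.2%
(character not rendered); 56; 4.1%
(character not rendered); 54; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 53; 3.9%
(character not rendered); 47; 3.4%
(character not rendered); 46; 3.3%
(character not rendered); 44; 3.2%
(character not rendered); 40; 2.9%
(character not rendered); 39; 2.8%
Other values (125); 884; 64.3%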

dataType
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
String(1); 52
DATE; 46
String(10); 31
String(100); 23
String(50); 14
Other values (20); 83

Length

Max length: 13
Median length: 12
Mean length: 9.1485944
Min length: 4

Unique

Unique: 5
Unique (%): 2.0%

Sample

1st row: String(10)
2nd row: String(code)
3rd row: DATE
4th row: String(code)
5th row: DATE

Common Values

Value; Count; Frequency (%)
String(1); 52; 20.9%
DATE; 46; 18.5%
String(10); 31; 12.4%
String(100); 23; 9.2%
String(50); 14; 5.6%
String(200); 14; 5.6%
String(4000); 12; 4.8%
Integer(code); 7; 2.8%
String(code); 6; 2.4%
Integer(3); 6; 2.4%
Other values (15); 38; 15.3%
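
The dataType values follow a visible pattern: a base type name, optionally followed by a parenthesised argument such as a length (`String(10)`), a pair of numbers (`Float(5,2)`, seen in the sample rows), or the literal `code`. The sketch below splits such a value into its parts. `ColumnType` and `parse_data_type` are hypothetical names introduced for illustration, and reading the numbers as length and scale is an assumption; the 15 values grouped under "Other values" are not shown in the report and may not fit this pattern.

```python
import re
from typing import NamedTuple, Optional


class ColumnType(NamedTuple):
    base: str               # e.g. "String", "Integer", "DATE"
    length: Optional[int]   # first number in parentheses, if any
    scale: Optional[int]    # second number in parentheses, if any
    is_code: bool           # True for values such as "String(code)"


# Base name, then an optional "(...)" argument list.
_TYPE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(([^)]*)\))?\s*$")


def parse_data_type(spec: str) -> ColumnType:
    """Split a dataType string into base type and arguments."""
    match = _TYPE_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"Unrecognised dataType: {spec!r}")
    base, args = match.group(1), match.group(2)
    if args is None:
        return ColumnType(base, None, None, False)
    if args.strip().lower() == "code":
        return ColumnType(base, None, None, True)
    # Remaining case: one or two comma-separated integers.
    numbers = [int(part) for part in args.split(",")]
    scale = numbers[1] if len(numbers) > 1 else None
    return ColumnType(base, numbers[0], scale, False)


for spec in ["String(10)", "DATE", "Integer(code)", "Float(5,2)"]:
    print(spec, "->", parse_data_type(spec))
```

Parsing the type this way makes it possible to group columns by base type, which is coarser than the 25 distinct raw strings the report counts.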

Length

Histogram of lengths of the category

Word; Count; Frequency (%)
string(1; 52; 20.9%
date; 46; 18.5%
string(10; 31; 12.4%
string(100; 23; 9.2%
string(50; 14; 5.6%
string(200; 14; 5.6%
string(4000; 12; 4.8%
integer(code; 7; 2.8%
string(code; 6; 2.4%
integer(3; 6; 2.4%
Other values (15); 38; 15.3%

colDesc
Text

Distinct: 216
Distinct (%): 86.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 38
Median length: 23
Mean length: 8.5702811
Min length: 2

Characters and Unicode

Total characters: 2134
Distinct characters: 204
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 207
Unique (%): 83.1%

Sample

1st row: 환자대체번호 [Patient substitute number]
2nd row: 성별코드 [Sex code]
3rd row: 생년월일 [Date of birth]
4th row: 최초진단코드 [First diagnosis code]
5th row: 최초진단일자 [First diagnosis date]
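
Word; Count; Frequency (%)
환자대체번호 [Patient substitute number]; 25; 6.4%
접수일 [Receipt date]; 8; 2.1%
(token not rendered); 7; 1.8%
검사명 [Test name]; 7; 1.8%
내용 [Content]; 7; 1.8%
검사결과 [Test result]; 5; 1.3%
biopsy; 5; 1.3%
항암화학요법치료 [Chemotherapy treatment]; 5; 1.3%
세포병리검사 [Cytopathology test]; 5; 1.3%
원인 [Cause]; 4; 1.0%
Other values (236); 312; 80.0%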

Most occurring characters

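Value; Count; Frequency (%)
(space); 141; 6.6%
(character not rendered); 62; 2.9%
(character not rendered); 60; 2.8%
(character not rendered); 58; 2.7%
(character not rendered); 56; 2.6%
(character not rendered); 53; 2.5%
(; 52; 2.4%
); 52; 2.4%
(character not rendered); 50; 2.3%
(character not rendered); 44; 2.1%
Other values (194); 1506; 70.6%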

Most occurring categories

Value; Count; Frequency (%)
Other Letter; 1556; 72.9%
Uppercase Letter; 168; 7.9%
Space Separator; 141; 6.6%
Lowercase Letter; 135; 6.3%
Open Punctuation; 52; 2.4%
Close Punctuation; 52; 2.4%
Other Punctuation; 24; 1.1%
Decimal Number; 4; 0.2%
Connector Punctuation; 2; 0.1%

Most frequent character per category

Other Letter
Value; Count; Frequency (%)
(character not rendered); 62; 4.0%
(character not rendered); 60; 3.9%
(character not rendered); 58; 3.7%
(character not rendered); 56; 3.6%
(character not rendered); 53; 3.4%
(character not rendered); 50; 3.2%
(character not rendered); 44; 2.8%
(character not rendered); 41; 2.6%
(character not rendered); 39; 2.5%
(character not rendered); 38; 2.4%
Other values (142); 1055; 67.8%

Uppercase Letter
Value; Count; Frequency (%)
N; 19; 11.3%
T; 16; 9.5%
I; 14; 8.3%
L; 13; 7.7%
C; 13; 7.7%
B; 12; 7.1%
A; 10; 6.0%
O; 10; 6.0%
Y; 9; 5.4%
E; 9; 5.4%
Other values (12); 43; 25.6%

Lowercase Letter
Value; Count; Frequency (%)
i; 18; 13.3%
n; 15; 11.1%
s; 13; 9.6%
o; 13; 9.6%
e; 10; 7.4%
y; 8; 5.9%
a; 8; 5.9%
p; 6; 4.4%
r; 5; 3.7%
d; 5; 3.7%
Other values (11); 34; 25.2%

Decimal Number
Value; Count; Frequency (%)
1; 2; 50.0%
3; 1; 25.0%
2; 1; 25.0%

Other Punctuation
Value; Count; Frequency (%)
/; 14; 58.3%
:; 10; 41.7%

Space Separator
Value; Count; Frequency (%)
(space); 141; 100.0%

Open Punctuation
Value; Count; Frequency (%)
(; 52; 100.0%

Close Punctuation
Value; Count; Frequency (%)
); 52; 100.0%

Connector Punctuation
Value; Count; Frequency (%)
_; 2; 100.0%

Most occurring scripts

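Value; Count; Frequency (%)
Hangul; 1556; 72.9%
Latin; 303; 14.2%
Common; 275; 12.9%

Most frequent character per script

Hangul
Value; Count; Frequency (%)
(character not rendered); 62; 4.0%
(character not rendered); 60; 3.9%
(character not rendered); 58; 3.7%
(character not rendered); 56; 3.6%
(character not rendered); 53; 3.4%
(character not rendered); 50; 3.2%
(character not rendered); 44; 2.8%
(character not rendered); 41; 2.6%
(character not rendered); 39; 2.5%
(character not rendered); 38; 2.4%
Other values (142); 1055; 67.8%

Latin
Value; Count; Frequency (%)
N; 19; 6.3%
i; 18; 5.9%
T; 16; 5.3%
n; 15; 5.0%
I; 14; 4.6%
s; 13; 4.3%
L; 13; 4.3%
o; 13; 4.3%
C; 13; 4.3%
B; 12; 4.0%
Other values (33); 157; 51.8%

Common
Value; Count; Frequency (%)
(space); 141; 51.3%
(; 52; 18.9%
); 52; 18.9%
/; 14; 5.1%
:; 10; 3.6%
1; 2; 0.7%
_; 2; 0.7%
3; 1; 0.4%
2; 1; 0.4%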

Most occurring blocks

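Value; Count; Frequency (%)
Hangul; 1556; 72.9%
ASCII; 578; 27.1%

Most frequent character per block

ASCII
Value; Count; Frequency (%)
(space); 141; 24.4%
(; 52; 9.0%
); 52; 9.0%
N; 19; 3.3%
i; 18; 3.1%
T; 16; 2.8%
n; 15; 2.6%
I; 14; 2.4%
/; 14; 2.4%
s; 13; 2.2%
Other values (42); 224; 38.8%

Hangul
Value; Count; Frequency (%)
(character not rendered); 62; 4.0%
(character not rendered); 60; 3.9%
(character not rendered); 58; 3.7%
(character not rendered); 56; 3.6%
(character not rendered); 53; 3.4%
(character not rendered); 50; 3.2%
(character not rendered); 44; 2.8%
(character not rendered); 41; 2.6%
(character not rendered); 39; 2.5%
(character not rendered); 38; 2.4%
Other values (142); 1055; 67.8%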

colCnt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 249
Missing (%): 100.0%
Memory size: 2.3 KiB

dispFormat
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Value; Count
텍스트 [Text]; 73
Y : 유 / N : 무 [Y: yes / N: no]; 50
YYYY-MM-DD; 46
숫자 [Number]; 34
RN+비식별숫자(8) [RN + de-identified number (8)]; 25
Other values (4); 21

Length

Max length: 15
Median length: 11
Mean length: 7.4417671
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: RN+비식별숫자(8) [RN + de-identified number (8)]
2nd row: M 남 | F 여 [M: male | F: female]
3rd row: YYYY-MM-DD
4th row: 원내검사 코드 [In-hospital test code]
5th row: YYYY-MM-DD

Common Values

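Value; Count; Frequency (%)
텍스트 [Text]; 73; 29.3%
Y : 유 / N : 무 [Y: yes / N: no]; 50; 20.1%
YYYY-MM-DD; 46; 18.5%
숫자 [Number]; 34; 13.7%
RN+비식별숫자(8) [RN + de-identified number (8)]; 25; 10.0%
Free 텍스트 [Free text]; 16; 6.4%
원내검사 코드 [In-hospital test code]; 2; 0.8%
Y : 내부 / N : 외부 [Y: internal / N: external]; 2; 0.8%
M 남 | F 여 [M: male | F: female]; 1; 0.4%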

Length

Histogram of lengths of the category

Common Values (Plot)

Word; Count; Frequency (%)
(token not rendered); 157; 26.9%
텍스트 [Text]; 89; 15.3%
y; 52; 8.9%
n; 52; 8.9%
(token not rendered); 50; 8.6%
(token not rendered); 50; 8.6%
yyyy-mm-dd; 46; 7.9%
숫자 [Number]; 34; 5.8%
rn+비식별숫자(8 [RN + de-identified number (8)]; 25; 4.3%
free; 16; 2.7%
Other values (8); 12; 2.1%

Interactions


Correlations

; NUM; gpId; gpNm; tblId; tblNm; dataType; dispFormat
NUM; 1.000; 0.964; 0.964; 0.994; 0.994; 0.769; 0.579
gpId; 0.964; 1.000; 1.000; 1.000; 1.000; 0.760; 0.579
gpNm; 0.964; 1.000; 1.000; 1.000; 1.000; 0.760; 0.579
tblId; 0.994; 1.000; 1.000; 1.000; 1.000; 0.757; 0.586
tblNm; 0.994; 1.000; 1.000; 1.000; 1.000; 0.757; 0.586
dataType; 0.769; 0.760; 0.760; 0.757; 0.757; 1.000; 0.949
dispFormat; 0.579; 0.579; 0.579; 0.586; 0.586; 0.949; 1.000

; dataType; tblId; dispFormat; gpNm; gpId; tblNm
dataType; 1.000; 0.260; 0.739; 0.299; 0.299; 0.260
tblId; 0.260; 1.000; 0.241; 0.971; 0.971; 1.000
dispFormat; 0.739; 0.241; 1.000; 0.260; 0.260; 0.241
gpNm; 0.299; 0.971; 0.260; 1.000; 1.000; 0.971
gpId; 0.299; 0.971; 0.260; 1.000; 1.000; 0.971
tblNm; 0.260; 1.000; 0.241; 0.971; 0.971; 1.000

; NUM; gpId; gpNm; tblId; tblNm; dataType; dispFormat
NUM; 1.000; 0.795; 0.795; 0.881; 0.881; 0.375; 0.307
gpId; 0.795; 1.000; 1.000; 0.971; 0.971; 0.299; 0.260
gpNm; 0.795; 1.000; 1.000; 0.971; 0.971; 0.299; 0.260
tblId; 0.881; 0.971; 0.971; 1.000; 1.000; 0.260; 0.241
tblNm; 0.881; 0.971; 0.971; 1.000; 1.000; 0.260; 0.241
dataType; 0.375; 0.299; 0.299; 0.260; 0.260; 1.000; 0.739
dispFormat; 0.307; 0.260; 0.260; 0.241; 0.241; 0.739; 1.000
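
The matrix labels did not survive in this copy of the report. The six-variable matrix contains only the categorical columns, which is what one would expect from a Cramér's V matrix; that reading is an assumption. Under it, the sketch below shows how the association between two categorical columns can be computed with the standard library. `cramers_v` is an illustrative helper implementing the plain, uncorrected statistic; the profiler may apply a bias correction, so this would not necessarily reproduce the reported values digit for digit.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long lists of categories."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# An ID column and its one-to-one name column are perfectly associated,
# as with gpId/gpNm and tblId/tblNm in the report.
ids = ["PDTR_TRGT", "PDTR_TRGT", "PDTR_RTX", "PDTR_RTX"]
names = ["Summary", "Summary", "Radiation", "Radiation"]
print(cramers_v(ids, names))

# Two columns that vary independently show no association.
print(cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"]))
```

A value of 1.000 between an ID column and its name column indicates the two carry the same information, so one of each pair could be dropped for analysis.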

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

index; NUM; gpId; gpNm; tblId; tblNm; colId; colNm; dataType; colDesc; colCnt; dispFormat
0; 1; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; PT_SBST_NO; 환자대체번호; String(10); 환자대체번호; <NA>; RN+비식별숫자(8)
1; 2; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; SEX_CD; 성별 코드; String(code); 성별코드; <NA>; M 남 | F 여
2; 3; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; BRTH_YMD; 생년월일; DATE; 생년월일; <NA>; YYYY-MM-DD
3; 4; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRST_DIAG_CD; 최초 진단 코드; String(code); 최초진단코드; <NA>; 원내검사 코드
4; 5; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRST_DIAG_YMD; 최초 진단일; DATE; 최초진단일자; <NA>; YYYY-MM-DD
5; 6; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRST_DIAG_NM; 최초 진단명; String(256); 최초진단명; <NA>; 텍스트
6; 7; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; DIAG_ATT_AGE; 진단 시 나이; Integer(3); 진단 시 나이; <NA>; 숫자
7; 8; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRMD_YMD; 초진일; DATE; 초진일자; <NA>; YYYY-MM-DD
8; 9; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRST_OPRT_YMD; 최초 수술일; DATE; 최초 수술일자; <NA>; YYYY-MM-DD
9; 10; PDTR_TRGT; Summary; PDTR_PT_TRGT; 기본정보; FRST_OPRT_NM; 최초 수술명; String(256); 최초 수술명; <NA>; 텍스트

Last rows

index; NUM; gpId; gpNm; tblId; tblNm; colId; colNm; dataType; colDesc; colCnt; dispFormat
239; 240; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; FATG_CMNT; FATIGUE 내용; String(50); FATIGUE; <NA>; 텍스트
240; 241; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; NV_CMNT; NV 내용; String(50); NV; <NA>; 텍스트
241; 242; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; CSTP_CMNT; CONSTIPATION 내용; String(50); CONSTIPATION; <NA>; 텍스트
242; 243; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; DIAR_CMNT; DIARRHEA 내용; String(50); DIARRHEA; <NA>; 텍스트
243; 244; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; SKIN_RASH_CMNT; SKINRASH 내용; String(50); SKINRASH; <NA>; 텍스트
244; 245; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; MCST_CMNT; MUCOSITIS 내용; String(50); MUCOSITIS; <NA>; 텍스트
245; 246; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; NURO_PTHY_CMNT; NEUROPATHY 내용; String(50); NEUROPATHY; <NA>; 텍스트
246; 247; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; ECOG_CD; ECOG 코드; Integer(code); ECOG 전신상태평가; <NA>; 숫자
247; 248; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; WT_VL; 체중 (kg); Float(5,2); 체중; <NA>; 숫자
248; 249; PDTR_CHMO_FLST; 항암 FlowSheet; PDTR_PE_CHMO_FLST; Flow Sheet; BSA_VL; BSA; Float(10,2); 체표면적; <NA>; 숫자