Overview

Dataset statistics

Number of variables: 11
Number of observations: 269
Missing cells: 269
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.8 KiB
Average record size in memory: 90.5 B
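The Missing cells (%) figure is the number of missing cells divided by the total number of cells (observations × variables). A minimal sketch; the helper name is hypothetical, and the per-column counts assume, as the Alerts report, that colCnt is the only column with missing values:

```python
def missing_cell_stats(n_rows, n_cols, missing_per_column):
    """Return (missing cell count, missing share in percent) for a table.

    missing_per_column maps column name -> number of missing cells
    (hypothetical helper, for illustration only).
    """
    total_cells = n_rows * n_cols
    missing = sum(missing_per_column.values())
    return missing, 100.0 * missing / total_cells


# 269 observations x 11 variables; colCnt is missing in every row.
missing, pct = missing_cell_stats(269, 11, {"colCnt": 269})
print(missing, f"{pct:.1f}%")  # 269 9.1%
```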

Variable types

Numeric: 1
Categorical: 6
Text: 3
Unsupported: 1

Dataset

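Description: Metadata for the oral cancer registry, covering the data items to be provided, their types and sizes, the number of records per item, sample data, and so on.
Author: National Cancer Center, Korea (국립암센터)
URL: https://www.data.go.kr/data/15048706/fileData.do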

Alerts

gpNm is highly overall correlated with NUM and 3 other fields (High correlation)
tblNm is highly overall correlated with NUM and 3 other fields (High correlation)
gpId is highly overall correlated with NUM and 3 other fields (High correlation)
tblId is highly overall correlated with NUM and 3 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
dataType is highly overall correlated with dispFormat (High correlation)
dispFormat is highly overall correlated with dataType (High correlation)
colCnt has 269 (100.0%) missing values (Missing)
NUM has unique values (Unique)
colCnt is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-12 12:53:10.596180
Analysis finished: 2023-12-12 12:53:11.576081
Duration: 0.98 seconds
Software version: ydata-profiling v4.5.1

Variables

NUM
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 269
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 135
Minimum: 1
Maximum: 269
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 14.4
Q1: 68
Median: 135
Q3: 202
95-th percentile: 255.6
Maximum: 269
Range: 268
Interquartile range (IQR): 134
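The quantiles above can be reproduced with linear interpolation between neighbouring order statistics, placing the q-quantile at position q × (n - 1) in the sorted data. A minimal sketch, assuming NUM holds exactly the integers 1 to 269, which the distinct count, minimum, maximum and sum all indicate; the function name is illustrative:

```python
import math


def quantile_linear(values, q):
    """Return the q-quantile (0 <= q <= 1) by linear interpolation between
    neighbouring order statistics (position q * (n - 1) in the sorted data)."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])


# Assumption: NUM is exactly the integers 1..269.
num = list(range(1, 270))
for label, q in [("5-th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                 ("Q3", 0.75), ("95-th percentile", 0.95)]:
    print(label, round(quantile_linear(num, q), 1))
```

With this convention the IQR is Q3 - Q1 = 202 - 68 = 134, matching the table.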

Descriptive statistics

Standard deviation: 77.797815
Coefficient of variation (CV): 0.57628011
Kurtosis: -1.2
Mean: 135
Median Absolute Deviation (MAD): 67
Skewness: 0
Sum: 36315
Variance: 6052.5
Monotonicity: Strictly increasing
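The descriptive statistics follow from the same assumption (NUM is 1 to 269). A sketch using Python's statistics module; variance and stdev use the sample (n - 1) denominator, which is what yields 6052.5 here (the population variance of 1..269 would be 6030):

```python
import statistics

# Assumption: NUM holds exactly the integers 1..269.
num = list(range(1, 270))

mean = statistics.mean(num)
var = statistics.variance(num)   # sample variance, n - 1 denominator
sd = statistics.stdev(num)       # sample standard deviation
cv = sd / mean                   # coefficient of variation
med = statistics.median(num)
mad = statistics.median(abs(x - med) for x in num)  # median absolute deviation
strictly_increasing = all(a < b for a, b in zip(num, num[1:]))

print(mean, var, round(sd, 6), round(cv, 8), mad, sum(num), strictly_increasing)
```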
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
1 | 1 | 0.4%
186 | 1 | 0.4%
172 | 1 | 0.4%
173 | 1 | 0.4%
174 | 1 | 0.4%
175 | 1 | 0.4%
176 | 1 | 0.4%
177 | 1 | 0.4%
178 | 1 | 0.4%
179 | 1 | 0.4%
Other values (259) | 259 | 96.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.4%
2 | 1 | 0.4%
3 | 1 | 0.4%
4 | 1 | 0.4%
5 | 1 | 0.4%
6 | 1 | 0.4%
7 | 1 | 0.4%
8 | 1 | 0.4%
9 | 1 | 0.4%
10 | 1 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
269 | 1 | 0.4%
268 | 1 | 0.4%
267 | 1 | 0.4%
266 | 1 | 0.4%
265 | 1 | 0.4%
264 | 1 | 0.4%
263 | 1 | 0.4%
262 | 1 | 0.4%
261 | 1 | 0.4%
260 | 1 | 0.4%

gpId
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
ORAL_HLTH | 74
ORAL_SPR | 37
ORAL_CHMO_FLST | 26
ORAL_OPRT | 19
ORAL_TRGT | 18
Other values (12) | 95

Length

Max length: 14
Median length: 9
Mean length: 10.092937
Min length: 7
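The Length block summarises the character length of each value. A minimal sketch computing the same four figures for a handful of gpId values (illustrative input only, not the full column). It uses a plain row-wise median; the profiler may compute Median length differently, so treat that figure as an illustration rather than a reproduction:

```python
import statistics


def length_stats(values):
    """Max, median, mean and min character length of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),  # plain row-wise median
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


# Illustrative input: the five most frequent gpId values, one row each.
print(length_stats(["ORAL_HLTH", "ORAL_SPR", "ORAL_CHMO_FLST", "ORAL_OPRT", "ORAL_TRGT"]))
```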

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ORAL_TRGT
2nd row: ORAL_TRGT
3rd row: ORAL_TRGT
4th row: ORAL_TRGT
5th row: ORAL_TRGT

Common Values

Value | Count | Frequency (%)
ORAL_HLTH | 74 | 27.5%
ORAL_SPR | 37 | 13.8%
ORAL_CHMO_FLST | 26 | 9.7%
ORAL_OPRT | 19 | 7.1%
ORAL_TRGT | 18 | 6.7%
ORAL_CNDX_BDMS | 12 | 4.5%
ORAL_CHMO | 11 | 4.1%
ORAL_RTX | 11 | 4.1%
ORAL_EVAL_DEAD | 10 | 3.7%
ORAL_BX | 9 | 3.3%
Other values (7) | 42 | 15.6%
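Each Common Values table lists the ten most frequent values and pools the rest into an "Other values (k)" row, where k is the number of remaining distinct values; percentages are taken over all 269 rows (for gpId, 42 / 269 ≈ 15.6%, and 10 + 7 = 17 distinct values). A minimal sketch of that layout with collections.Counter; the function name and the toy column are hypothetical:

```python
from collections import Counter


def common_values(values, top=10):
    """Rows of (value, count, percent) for the `top` most common values,
    plus an 'Other values (k)' row pooling the remaining distinct values."""
    n = len(values)
    counts = Counter(values).most_common()
    rows = [(v, c, round(100 * c / n, 1)) for v, c in counts[:top]]
    rest = counts[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / n, 1)))
    return rows


# Hypothetical toy column; the real gpId column has 269 rows and 17 distinct values.
toy = ["ORAL_HLTH"] * 5 + ["ORAL_SPR"] * 3 + ["ORAL_BX", "ORAL_RTX"]
for row in common_values(toy, top=2):
    print(row)
```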

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
oral_hlth | 74 | 27.5%
oral_spr | 37 | 13.8%
oral_chmo_flst | 26 | 9.7%
oral_oprt | 19 | 7.1%
oral_trgt | 18 | 6.7%
oral_cndx_bdms | 12 | 4.5%
oral_chmo | 11 | 4.1%
oral_rtx | 11 | 4.1%
oral_eval_dead | 10 | 3.7%
oral_bx | 9 | 3.3%
Other values (7) | 42 | 15.6%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
환자건강정보 | 74
외과병리 | 37
항암 FlowSheet | 26
수술 | 19
Summary | 18
Other values (12) | 95

Length

Max length: 12
Median length: 11
Mean length: 6.4869888
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

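Value | Count | Frequency (%)
환자건강정보 (patient health information) | 74 | 27.5%
외과병리 (surgical pathology) | 37 | 13.8%
항암 FlowSheet (chemotherapy flow sheet) | 26 | 9.7%
수술 (surgery) | 19 | 7.1%
Summary | 18 | 6.7%
진단 및 신체 (diagnosis and physical data) | 12 | 4.5%
항암치료 (chemotherapy) | 11 | 4.1%
방사선치료 (radiotherapy) | 11 | 4.1%
치료평가 및 사망정보 (treatment evaluation and death information) | 10 | 3.7%
병리검사 (pathology examination) | 9 | 3.3%
Other values (7) | 42 | 15.6%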

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
환자건강정보 | 74 | 20.4%
외과병리 | 37 | 10.2%
항암 | 26 | 7.2%
flowsheet | 26 | 7.2%
및 | 22 | 6.1%
수술 | 19 | 5.2%
summary | 18 | 5.0%
영상검사 | 14 | 3.9%
initial | 12 | 3.3%
진단 | 12 | 3.3%
Other values (11) | 103 | 28.4%

tblId
Categorical

HIGH CORRELATION 

Distinct: 30
Distinct (%): 11.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
ORAL_PE_SPR | 37
ORAL_PE_CHMO_FLST | 26
ORAL_PT_TRGT | 18
ORAL_MR_HLTH_10 | 14
ORAL_PE_OPRT | 14
Other values (25) | 160

Length

Max length: 17
Median length: 15
Mean length: 13.386617
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ORAL_PT_TRGT
2nd row: ORAL_PT_TRGT
3rd row: ORAL_PT_TRGT
4th row: ORAL_PT_TRGT
5th row: ORAL_PT_TRGT

Common Values

Value | Count | Frequency (%)
ORAL_PE_SPR | 37 | 13.8%
ORAL_PE_CHMO_FLST | 26 | 9.7%
ORAL_PT_TRGT | 18 | 6.7%
ORAL_MR_HLTH_10 | 14 | 5.2%
ORAL_PE_OPRT | 14 | 5.2%
ORAL_PE_RTX | 11 | 4.1%
ORAL_PE_CHMO | 11 | 4.1%
ORAL_PE_BX | 9 | 3.3%
ORAL_MR_HLTH_5 | 9 | 3.3%
ORAL_MR_HLTH_8 | 9 | 3.3%
Other values (20) | 111 | 41.3%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
oral_pe_spr | 37 | 13.8%
oral_pe_chmo_flst | 26 | 9.7%
oral_pt_trgt | 18 | 6.7%
oral_mr_hlth_10 | 14 | 5.2%
oral_pe_oprt | 14 | 5.2%
oral_pe_rtx | 11 | 4.1%
oral_pe_chmo | 11 | 4.1%
oral_pe_bx | 9 | 3.3%
oral_mr_hlth_5 | 9 | 3.3%
oral_mr_hlth_8 | 9 | 3.3%
Other values (20) | 111 | 41.3%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 30
Distinct (%): 11.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
수술 후 결과 | 37
Flow Sheet | 26
기본정보 | 18
과거력 | 14
수술정보 | 14
Other values (25) | 160

Length

Max length: 11
Median length: 10
Mean length: 6.2342007
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기본정보
2nd row: 기본정보
3rd row: 기본정보
4th row: 기본정보
5th row: 기본정보

Common Values

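Value | Count | Frequency (%)
수술 후 결과 (post-operative results) | 37 | 13.8%
Flow Sheet | 26 | 9.7%
기본정보 (basic information) | 18 | 6.7%
과거력 (past medical history) | 14 | 5.2%
수술정보 (surgery information) | 14 | 5.2%
방사선치료정보 (radiotherapy information) | 11 | 4.1%
항암치료정보 (chemotherapy information) | 11 | 4.1%
병리결과 (pathology results) | 9 | 3.3%
가족력 (부) (family history: father) | 9 | 3.3%
가족력(형제/자매) (family history: siblings) | 9 | 3.3%
Other values (20) | 111 | 41.3%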

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
결과 | 43 | 10.5%
수술 | 37 | 9.0%
후 | 37 | 9.0%
flow | 26 | 6.3%
sheet | 26 | 6.3%
기본정보 | 18 | 4.4%
과거력 | 14 | 3.4%
수술정보 | 14 | 3.4%
영상 | 14 | 3.4%
initial | 12 | 2.9%
Other values (25) | 169 | 41.2%

colId
Text

Distinct: 226
Distinct (%): 84.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 24
Median length: 17
Mean length: 11.933086
Min length: 5

Characters and Unicode

Total characters: 3210
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 204
Unique (%): 75.8%
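Distinct and Unique count different things: Distinct is the number of different values, while Unique is the number of values that occur exactly once. That is why colId has 226 distinct but only 204 unique values (pt_sbst_no alone accounts for 20 rows). A small sketch on a hypothetical toy column:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: PT_SBST_NO repeats, the other two occur once.
toy = ["PT_SBST_NO", "PT_SBST_NO", "PT_SBST_NO", "SEX_CD", "BRTH_YMD"]
print(distinct_and_unique(toy))  # (3, 2)
```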

Sample

1st row: PT_SBST_NO
2nd row: SEX_CD
3rd row: BRTH_YMD
4th row: FRST_DIAG_YMD
5th row: FRST_DIAG_CD

Words

Value | Count | Frequency (%)
pt_sbst_no | 20 | 7.4%
oprt_ymd | 3 | 1.1%
chmo_strt_ymd | 3 | 1.1%
oprt_nm | 3 | 1.1%
cexm_rslt_cmnt | 2 | 0.7%
miex_ymd | 2 | 0.7%
miex_rslt_cmnt | 2 | 0.7%
chmo_prps_nm | 2 | 0.7%
main_smpl_site_cmnt | 2 | 0.7%
miex_opnn_cmnt | 2 | 0.7%
Other values (216) | 228 | 84.8%

Most occurring characters

Value | Count | Frequency (%)
_ | 528 | 16.4%
T | 325 | 10.1%
M | 285 | 8.9%
N | 260 | 8.1%
C | 202 | 6.3%
S | 201 | 6.3%
D | 154 | 4.8%
R | 146 | 4.5%
H | 124 | 3.9%
Y | 119 | 3.7%
Other values (23) | 866 | 27.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2669 | 83.1%
Connector Punctuation | 528 | 16.4%
Decimal Number | 13 | 0.4%
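The category names correspond to the Unicode general category of each character (Lu for Uppercase Letter, Pc for Connector Punctuation, Nd for Decimal Number, and so on). A sketch using Python's unicodedata module; the name mapping covers only the categories that appear in this report, and the inputs are two colId values from the sample:

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "Pc": "Connector Punctuation", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["PT_SBST_NO", "SEX_CD"]))
```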

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 325 | 12.2%
M | 285 | 10.7%
N | 260 | 9.7%
C | 202 | 7.6%
S | 201 | 7.5%
D | 154 | 5.8%
R | 146 | 5.5%
H | 124 | 4.6%
Y | 119 | 4.5%
P | 106 | 4.0%
Other values (16) | 747 | 28.0%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 30.8%
2 | 3 | 23.1%
1 | 2 | 15.4%
6 | 2 | 15.4%
5 | 1 | 7.7%
7 | 1 | 7.7%

Connector Punctuation
Value | Count | Frequency (%)
_ | 528 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2669 | 83.1%
Common | 541 | 16.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 325 | 12.2%
M | 285 | 10.7%
N | 260 | 9.7%
C | 202 | 7.6%
S | 201 | 7.5%
D | 154 | 5.8%
R | 146 | 5.5%
H | 124 | 4.6%
Y | 119 | 4.5%
P | 106 | 4.0%
Other values (16) | 747 | 28.0%

Common
Value | Count | Frequency (%)
_ | 528 | 97.6%
3 | 4 | 0.7%
2 | 3 | 0.6%
1 | 2 | 0.4%
6 | 2 | 0.4%
5 | 1 | 0.2%
7 | 1 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3210 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 528 | 16.4%
T | 325 | 10.1%
M | 285 | 8.9%
N | 260 | 8.1%
C | 202 | 6.3%
S | 201 | 6.3%
D | 154 | 4.8%
R | 146 | 4.5%
H | 124 | 3.9%
Y | 119 | 3.7%
Other values (23) | 866 | 27.0%

colNm
Text

Distinct: 207
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 45
Median length: 24
Mean length: 8.5130112
Min length: 2

Characters and Unicode

Total characters: 2290
Distinct characters: 191
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 184
Unique (%): 68.4%

Sample

1st row: 환자대체번호 (patient surrogate ID)
2nd row: 성별 코드 (sex code)
3rd row: 생년월일 (date of birth)
4th row: 최초 진단일 (date of first diagnosis)
5th row: 최초 진단 코드 (first diagnosis code)

Words

Value | Count | Frequency (%)
내용 | 57 | 12.1%
환자대체번호 | 20 | 4.2%
검사 | 19 | 4.0%
결과 | 8 | 1.7%
여부 | 8 | 1.7%
최초 | 7 | 1.5%
기타 | 7 | 1.5%
n:무 | 6 | 1.3%
y:유 | 6 | 1.3%
tumor | 6 | 1.3%
Other values (225) | 329 | 69.6%

Most occurring characters

Value | Count | Frequency (%)
[space] | 204 | 8.9%
[Hangul character] | 71 | 3.1%
[Hangul character] | 67 | 2.9%
[Hangul character] | 65 | 2.8%
( | 55 | 2.4%
) | 55 | 2.4%
[Hangul character] | 53 | 2.3%
[Hangul character] | 53 | 2.3%
[Hangul character] | 42 | 1.8%
[Hangul character] | 41 | 1.8%
Other values (181) | 1584 | 69.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1357 | 59.3%
Lowercase Letter | 401 | 17.5%
Space Separator | 204 | 8.9%
Uppercase Letter | 176 | 7.7%
Open Punctuation | 55 | 2.4%
Close Punctuation | 55 | 2.4%
Other Punctuation | 27 | 1.2%
Decimal Number | 14 | 0.6%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[Hangul character] | 71 | 5.2%
[Hangul character] | 67 | 4.9%
[Hangul character] | 65 | 4.8%
[Hangul character] | 53 | 3.9%
[Hangul character] | 53 | 3.9%
[Hangul character] | 42 | 3.1%
[Hangul character] | 41 | 3.0%
[Hangul character] | 39 | 2.9%
[Hangul character] | 39 | 2.9%
[Hangul character] | 38 | 2.8%
Other values (123) | 849 | 62.6%

Lowercase Letter
Value | Count | Frequency (%)
o | 41 | 10.2%
i | 38 | 9.5%
e | 35 | 8.7%
a | 34 | 8.5%
n | 29 | 7.2%
t | 28 | 7.0%
r | 27 | 6.7%
s | 25 | 6.2%
l | 22 | 5.5%
m | 20 | 5.0%
Other values (13) | 102 | 25.4%

Uppercase Letter
Value | Count | Frequency (%)
T | 21 | 11.9%
N | 19 | 10.8%
I | 16 | 9.1%
C | 14 | 8.0%
A | 13 | 7.4%
S | 13 | 7.4%
P | 10 | 5.7%
R | 8 | 4.5%
E | 8 | 4.5%
H | 7 | 4.0%
Other values (13) | 47 | 26.7%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 28.6%
2 | 3 | 21.4%
1 | 3 | 21.4%
6 | 2 | 14.3%
7 | 1 | 7.1%
5 | 1 | 7.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 15 | 55.6%
: | 12 | 44.4%

Space Separator
Value | Count | Frequency (%)
[space] | 204 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 55 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 55 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1357 | 59.3%
Latin | 577 | 25.2%
Common | 356 | 15.5%
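The script breakdown groups characters into Hangul, Latin and Common (shared characters such as spaces, digits and punctuation). Python's standard library does not expose the Unicode Script property, so the sketch below approximates it from each character's Unicode name; that shortcut is an assumption that suits the characters in this dataset but is not a general script detector. The inputs are two colNm values from the sample rows:

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Approximate script label taken from the character's Unicode name.

    Shortcut, not the Unicode Script property: names starting with HANGUL or
    LATIN map to those scripts, everything else is treated as Common.
    """
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"


def script_counts(values):
    """Count characters per (approximate) script across all values."""
    return Counter(script_of(ch) for value in values for ch in value)


# Two colNm values from the sample rows.
print(script_counts(["성별 코드", "ECOG 코드"]))
```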

Most frequent character per script

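Hangul
Value | Count | Frequency (%)
[Hangul character] | 71 | 5.2%
[Hangul character] | 67 | 4.9%
[Hangul character] | 65 | 4.8%
[Hangul character] | 53 | 3.9%
[Hangul character] | 53 | 3.9%
[Hangul character] | 42 | 3.1%
[Hangul character] | 41 | 3.0%
[Hangul character] | 39 | 2.9%
[Hangul character] | 39 | 2.9%
[Hangul character] | 38 | 2.8%
Other values (123) | 849 | 62.6%

Latin
Value | Count | Frequency (%)
o | 41 | 7.1%
i | 38 | 6.6%
e | 35 | 6.1%
a | 34 | 5.9%
n | 29 | 5.0%
t | 28 | 4.9%
r | 27 | 4.7%
s | 25 | 4.3%
l | 22 | 3.8%
T | 21 | 3.6%
Other values (36) | 277 | 48.0%

Common
Value | Count | Frequency (%)
[space] | 204 | 57.3%
( | 55 | 15.4%
) | 55 | 15.4%
/ | 15 | 4.2%
: | 12 | 3.4%
3 | 4 | 1.1%
2 | 3 | 0.8%
1 | 3 | 0.8%
6 | 2 | 0.6%
- | 1 | 0.3%
Other values (2) | 2 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1357 | 59.3%
ASCII | 933 | 40.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
[space] | 204 | 21.9%
( | 55 | 5.9%
) | 55 | 5.9%
o | 41 | 4.4%
i | 38 | 4.1%
e | 35 | 3.8%
a | 34 | 3.6%
n | 29 | 3.1%
t | 28 | 3.0%
r | 27 | 2.9%
Other values (48) | 387 | 41.5%

Hangul
Value | Count | Frequency (%)
[Hangul character] | 71 | 5.2%
[Hangul character] | 67 | 4.9%
[Hangul character] | 65 | 4.8%
[Hangul character] | 53 | 3.9%
[Hangul character] | 53 | 3.9%
[Hangul character] | 42 | 3.1%
[Hangul character] | 41 | 3.0%
[Hangul character] | 39 | 2.9%
[Hangul character] | 39 | 2.9%
[Hangul character] | 38 | 2.8%
Other values (123) | 849 | 62.6%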

dataType
Categorical

HIGH CORRELATION 

Distinct: 27
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
String(1) | 54
DATE | 42
String(100) | 33
String(10) | 27
String(200) | 20
Other values (22) | 93

Length

Max length: 13
Median length: 12
Mean length: 9.3085502
Min length: 4

Unique

Unique: 8
Unique (%): 3.0%

Sample

1st row: String(10)
2nd row: String(code)
3rd row: DATE
4th row: DATE
5th row: String(code)

Common Values

Value | Count | Frequency (%)
String(1) | 54 | 20.1%
DATE | 42 | 15.6%
String(100) | 33 | 12.3%
String(10) | 27 | 10.0%
String(200) | 20 | 7.4%
String(50) | 17 | 6.3%
String(256) | 10 | 3.7%
Integer(code) | 9 | 3.3%
String(4000) | 8 | 3.0%
String(20) | 8 | 3.0%
Other values (17) | 41 | 15.2%
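Since the registry metadata is meant to describe each column's type and size, the dataType strings pack both into one value such as String(100). A sketch of splitting them apart with a regular expression; it assumes every value has the form Name or Name(argument), which holds for the values shown here and in the sample rows but has not been checked against all 27 distinct values. The argument is a length for most String types, a precision pair for Float(5,2), and the literal word code where the column appears to hold coded values:

```python
import re

# Assumed shape: a type name, optionally followed by "(argument)".
TYPE_RE = re.compile(r"(?P<base>[A-Za-z]+)(?:\((?P<arg>[^)]*)\))?")


def parse_data_type(text):
    """Split a dataType value such as 'String(100)' into (base type, argument)."""
    m = TYPE_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised dataType: {text!r}")
    return m.group("base"), m.group("arg")


for t in ["String(100)", "DATE", "Integer(code)", "Float(5,2)"]:
    print(t, "->", parse_data_type(t))
```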

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
string(1 | 54 | 20.1%
date | 42 | 15.6%
string(100 | 33 | 12.3%
string(10 | 27 | 10.0%
string(200 | 20 | 7.4%
string(50 | 17 | 6.3%
string(256 | 10 | 3.7%
integer(code | 9 | 3.3%
integer(3 | 9 | 3.3%
string(4000 | 8 | 3.0%
Other values (16) | 40 | 14.9%
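The lowercase word lists for the categorical variables contain tokens such as string(1 and integer(code, with the closing parenthesis missing, and they split multi-word values like 항암 FlowSheet into separate words. Both are consistent with lowercasing each value, splitting on whitespace, and stripping leading and trailing punctuation from each token. A sketch of that assumed procedure (inferred from the output, not taken from the profiler's source); note that a bare ":" or "/" token becomes an empty string under it:

```python
import string
from collections import Counter


def word_counts(values):
    """Assumed tokenisation: lowercase, split on whitespace, strip edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation + string.whitespace)] += 1
    return counts


counts = word_counts(["String(100)", "Integer(code)", "항암 FlowSheet", "Y : 유 / N : 무"])
print(counts["string(100"], counts["flowsheet"], counts[""])  # 1 1 3
```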

colDesc
Text

Distinct: 225
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 42
Median length: 24
Mean length: 8.9256506
Min length: 2

Characters and Unicode

Total characters: 2401
Distinct characters: 200
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 202
Unique (%): 75.1%

Sample

1st row: 환자대체번호 (patient surrogate ID)
2nd row: 성별코드 (sex code)
3rd row: 생년월일 (date of birth)
4th row: 최초진단일자 (date of first diagnosis)
5th row: 최초진단코드 (first diagnosis code)

Words

Value | Count | Frequency (%)
환자대체번호 | 20 | 4.5%
내용 | 15 | 3.4%
[unrecoverable] | 7 | 1.6%
명칭 | 6 | 1.4%
항암화학요법치료 | 6 | 1.4%
tumor | 6 | 1.4%
기타 | 6 | 1.4%
수술 | 5 | 1.1%
방사선치료 | 5 | 1.1%
y:유 | 4 | 0.9%
Other values (264) | 361 | 81.9%

Most occurring characters

Value | Count | Frequency (%)
[space] | 172 | 7.2%
[Hangul character] | 65 | 2.7%
[Hangul character] | 64 | 2.7%
[Hangul character] | 56 | 2.3%
[Hangul character] | 53 | 2.2%
( | 52 | 2.2%
) | 52 | 2.2%
i | 48 | 2.0%
[Hangul character] | 42 | 1.7%
o | 42 | 1.7%
Other values (190) | 1755 | 73.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1492 | 62.1%
Lowercase Letter | 414 | 17.2%
Uppercase Letter | 179 | 7.5%
Space Separator | 172 | 7.2%
Open Punctuation | 52 | 2.2%
Close Punctuation | 52 | 2.2%
Other Punctuation | 24 | 1.0%
Decimal Number | 14 | 0.6%
Connector Punctuation | 1 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[Hangul character] | 65 | 4.4%
[Hangul character] | 64 | 4.3%
[Hangul character] | 56 | 3.8%
[Hangul character] | 53 | 3.6%
[Hangul character] | 42 | 2.8%
[Hangul character] | 42 | 2.8%
[Hangul character] | 41 | 2.7%
[Hangul character] | 40 | 2.7%
[Hangul character] | 39 | 2.6%
[Hangul character] | 38 | 2.5%
Other values (132) | 1012 | 67.8%

Uppercase Letter
Value | Count | Frequency (%)
N | 20 | 11.2%
T | 18 | 10.1%
C | 15 | 8.4%
I | 13 | 7.3%
A | 12 | 6.7%
S | 12 | 6.7%
L | 11 | 6.1%
B | 9 | 5.0%
P | 9 | 5.0%
Y | 8 | 4.5%
Other values (13) | 52 | 29.1%

Lowercase Letter
Value | Count | Frequency (%)
i | 48 | 11.6%
o | 42 | 10.1%
e | 36 | 8.7%
a | 35 | 8.5%
n | 31 | 7.5%
t | 29 | 7.0%
r | 28 | 6.8%
s | 27 | 6.5%
l | 21 | 5.1%
c | 19 | 4.6%
Other values (12) | 98 | 23.7%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 28.6%
2 | 3 | 21.4%
1 | 3 | 21.4%
6 | 2 | 14.3%
5 | 1 | 7.1%
7 | 1 | 7.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 14 | 58.3%
: | 10 | 41.7%

Space Separator
Value | Count | Frequency (%)
[space] | 172 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 52 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 52 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1492 | 62.1%
Latin | 593 | 24.7%
Common | 316 | 13.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[Hangul character] | 65 | 4.4%
[Hangul character] | 64 | 4.3%
[Hangul character] | 56 | 3.8%
[Hangul character] | 53 | 3.6%
[Hangul character] | 42 | 2.8%
[Hangul character] | 42 | 2.8%
[Hangul character] | 41 | 2.7%
[Hangul character] | 40 | 2.7%
[Hangul character] | 39 | 2.6%
[Hangul character] | 38 | 2.5%
Other values (132) | 1012 | 67.8%

Latin
Value | Count | Frequency (%)
i | 48 | 8.1%
o | 42 | 7.1%
e | 36 | 6.1%
a | 35 | 5.9%
n | 31 | 5.2%
t | 29 | 4.9%
r | 28 | 4.7%
s | 27 | 4.6%
l | 21 | 3.5%
N | 20 | 3.4%
Other values (35) | 276 | 46.5%

Common
Value | Count | Frequency (%)
[space] | 172 | 54.4%
( | 52 | 16.5%
) | 52 | 16.5%
/ | 14 | 4.4%
: | 10 | 3.2%
3 | 4 | 1.3%
2 | 3 | 0.9%
1 | 3 | 0.9%
6 | 2 | 0.6%
5 | 1 | 0.3%
Other values (3) | 3 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1492 | 62.1%
ASCII | 909 | 37.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
[space] | 172 | 18.9%
( | 52 | 5.7%
) | 52 | 5.7%
i | 48 | 5.3%
o | 42 | 4.6%
e | 36 | 4.0%
a | 35 | 3.9%
n | 31 | 3.4%
t | 29 | 3.2%
r | 28 | 3.1%
Other values (48) | 384 | 42.2%

Hangul
Value | Count | Frequency (%)
[Hangul character] | 65 | 4.4%
[Hangul character] | 64 | 4.3%
[Hangul character] | 56 | 3.8%
[Hangul character] | 53 | 3.6%
[Hangul character] | 42 | 2.8%
[Hangul character] | 42 | 2.8%
[Hangul character] | 41 | 2.7%
[Hangul character] | 40 | 2.7%
[Hangul character] | 39 | 2.6%
[Hangul character] | 38 | 2.5%
Other values (132) | 1012 | 67.8%

colCnt
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 269
Missing (%): 100.0%
Memory size: 2.5 KiB
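colCnt is missing in all 269 rows, so it carries no information and is a natural candidate for removal before analysis. A standard-library sketch with rows as dictionaries and None standing in for <NA>; the helper names are hypothetical. With pandas, DataFrame.dropna(axis="columns", how="all") does the same job:

```python
def all_missing_columns(rows):
    """Return the names of columns that are None in every row (rows are dicts)."""
    if not rows:
        return []
    return [col for col in rows[0] if all(row.get(col) is None for row in rows)]


def drop_columns(rows, cols):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in cols} for row in rows]


# Two abbreviated rows shaped like the sample; None stands in for <NA>.
rows = [
    {"NUM": 1, "colId": "PT_SBST_NO", "colCnt": None},
    {"NUM": 2, "colId": "SEX_CD", "colCnt": None},
]
empty = all_missing_columns(rows)
print(empty)                              # ['colCnt']
print(drop_columns(rows, set(empty))[0])  # {'NUM': 1, 'colId': 'PT_SBST_NO'}
```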

dispFormat
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
텍스트 | 107
Y : 유 / N : 무 | 51
YYYY-MM-DD | 41
숫자 | 34
RN+비식별숫자(8) | 20
Other values (4) | 16

Length

Max length: 15
Median length: 13
Mean length: 6.802974
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: RN+비식별숫자(8)
2nd row: M 남 | F 여
3rd row: YYYY-MM-DD
4th row: YYYY-MM-DD
5th row: 원내검사 코드

Common Values

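Value | Count | Frequency (%)
텍스트 (text) | 107 | 39.8%
Y : 유 / N : 무 (Y: present / N: absent) | 51 | 19.0%
YYYY-MM-DD | 41 | 15.2%
숫자 (number) | 34 | 12.6%
RN+비식별숫자(8) (RN + 8-digit de-identified number) | 20 | 7.4%
Free 텍스트 (free text) | 10 | 3.7%
Y : 내부 / N : 외부 (Y: internal / N: external) | 3 | 1.1%
원내검사 코드 (in-hospital test code) | 2 | 0.7%
"M 남 | F 여" (M: male / F: female) | 1 | 0.4%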
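Several dispFormat values encode a small code list, for example Y : 유 / N : 무 (Y: present, N: absent). A sketch of turning that notation into a lookup table; it assumes entries are separated by "/" and written as CODE : LABEL, so the one-off M 남 | F 여 format (which uses "|" and no colon) is not handled and yields an empty mapping. The function name is hypothetical:

```python
import re


def parse_code_map(disp_format):
    """Turn a dispFormat string like 'Y : 유 / N : 무' into {'Y': '유', 'N': '무'}.

    Assumption: entries are separated by '/' and each entry is 'CODE : LABEL';
    entries of any other shape are skipped.
    """
    mapping = {}
    for part in disp_format.split("/"):
        m = re.fullmatch(r"\s*(\S+)\s*:\s*(.+?)\s*", part)
        if m:
            mapping[m.group(1)] = m.group(2)
    return mapping


print(parse_code_map("Y : 유 / N : 무"))      # {'Y': '유', 'N': '무'}
print(parse_code_map("Y : 내부 / N : 외부"))  # {'Y': '내부', 'N': '외부'}
```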

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value | Count | Frequency (%)
[empty string] | 163 | 26.8%
텍스트 | 117 | 19.2%
y | 54 | 8.9%
n | 54 | 8.9%
[Hangul character] | 51 | 8.4%
[Hangul character] | 51 | 8.4%
yyyy-mm-dd | 41 | 6.7%
숫자 | 34 | 5.6%
rn+비식별숫자(8 | 20 | 3.3%
free | 10 | 1.6%
Other values (8) | 14 | 2.3%

Interactions


Correlations

Variable | NUM | gpId | gpNm | tblId | tblNm | dataType | dispFormat
NUM | 1.000 | 0.948 | 0.948 | 0.997 | 0.997 | 0.749 | 0.564
gpId | 0.948 | 1.000 | 1.000 | 1.000 | 1.000 | 0.752 | 0.610
gpNm | 0.948 | 1.000 | 1.000 | 1.000 | 1.000 | 0.752 | 0.610
tblId | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 0.710 | 0.639
tblNm | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 0.710 | 0.639
dataType | 0.749 | 0.752 | 0.752 | 0.710 | 0.710 | 1.000 | 0.985
dispFormat | 0.564 | 0.610 | 0.610 | 0.639 | 0.639 | 0.985 | 1.000

Variable | dispFormat | gpNm | tblNm | dataType | gpId | tblId
dispFormat | 1.000 | 0.291 | 0.284 | 0.785 | 0.291 | 0.284
gpNm | 0.291 | 1.000 | 0.974 | 0.311 | 1.000 | 0.974
tblNm | 0.284 | 0.974 | 1.000 | 0.234 | 0.974 | 1.000
dataType | 0.785 | 0.311 | 0.234 | 1.000 | 0.311 | 0.234
gpId | 0.291 | 1.000 | 0.974 | 0.311 | 1.000 | 0.974
tblId | 0.284 | 0.974 | 1.000 | 0.234 | 0.974 | 1.000

Variable | NUM | gpId | gpNm | tblId | tblNm | dataType | dispFormat
NUM | 1.000 | 0.767 | 0.767 | 0.875 | 0.875 | 0.371 | 0.297
gpId | 0.767 | 1.000 | 1.000 | 0.974 | 0.974 | 0.311 | 0.291
gpNm | 0.767 | 1.000 | 1.000 | 0.974 | 0.974 | 0.311 | 0.291
tblId | 0.875 | 0.974 | 0.974 | 1.000 | 1.000 | 0.234 | 0.284
tblNm | 0.875 | 0.974 | 0.974 | 1.000 | 1.000 | 0.234 | 0.284
dataType | 0.371 | 0.311 | 0.311 | 0.234 | 0.234 | 1.000 | 0.785
dispFormat | 0.297 | 0.291 | 0.291 | 0.284 | 0.284 | 0.785 | 1.000
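The report does not say which coefficient each matrix above uses. As an illustration only, here is plain Cramér's V (no bias correction), a common association measure for pairs of categorical columns. It shows why a one-to-one pair such as gpId and gpNm (each group code has exactly one group name, as their matching Common Values counts suggest) scores 1.000. The toy rows below are hypothetical:

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Plain Cramér's V (no bias correction) between two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy rows: each gpId maps to exactly one gpNm.
gp_id = ["ORAL_TRGT", "ORAL_TRGT", "ORAL_SPR", "ORAL_HLTH", "ORAL_HLTH"]
gp_nm = ["Summary", "Summary", "외과병리", "환자건강정보", "환자건강정보"]
print(round(cramers_v(gp_id, gp_nm), 3))  # 1.0
```

Two independent columns, by contrast, score 0.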

Missing values

[Figure: nullity count by column] A simple visualization of nullity by column.

[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

Index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | PT_SBST_NO | 환자대체번호 | String(10) | 환자대체번호 | <NA> | RN+비식별숫자(8)
1 | 2 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | SEX_CD | 성별 코드 | String(code) | 성별코드 | <NA> | "M 남 | F 여"
2 | 3 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | BRTH_YMD | 생년월일 | DATE | 생년월일 | <NA> | YYYY-MM-DD
3 | 4 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | FRST_DIAG_YMD | 최초 진단일 | DATE | 최초진단일자 | <NA> | YYYY-MM-DD
4 | 5 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | FRST_DIAG_CD | 최초 진단 코드 | String(code) | 최초진단코드 | <NA> | 원내검사 코드
5 | 6 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | FRST_DIAG_NM | 최초 진단명 | String(256) | 최초진단명 | <NA> | 텍스트
6 | 7 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | DIAG_ATT_AGE | 진단 시 나이 | Integer(3) | 진단 시 나이 | <NA> | 숫자
7 | 8 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | FRMD_YMD | 초진일 | DATE | 초진일자 | <NA> | YYYY-MM-DD
8 | 9 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | FRMD_DEPT_NM | 초진 부서명 | String(20) | 초진 부서 | <NA> | 텍스트
9 | 10 | ORAL_TRGT | Summary | ORAL_PT_TRGT | 기본정보 | SPRA_FRMD_YMD | 희귀암클리닉 초진일 | DATE | 희귀암클리닉 초진일 | <NA> | YYYY-MM-DD

Last rows

Index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
259 | 260 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | FATG_CMNT | FATIGUE 내용 | String(50) | FATIGUE | <NA> | 텍스트
260 | 261 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | NV_CMNT | NV 내용 | String(50) | NV | <NA> | 텍스트
261 | 262 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | CSTP_CMNT | CONSTIPATION 내용 | String(50) | CONSTIPATION | <NA> | 텍스트
262 | 263 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | DIAR_CMNT | DIARRHEA 내용 | String(50) | DIARRHEA | <NA> | 텍스트
263 | 264 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | SKIN_RASH_CMNT | SKINRASH 내용 | String(50) | SKINRASH | <NA> | 텍스트
264 | 265 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | MCST_CMNT | MUCOSITIS 내용 | String(50) | MUCOSITIS | <NA> | 텍스트
265 | 266 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | NURO_PTHY_CMNT | NEUROPATHY 내용 | String(50) | NEUROPATHY | <NA> | 텍스트
266 | 267 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | ECOG_CD | ECOG 코드 | Integer(code) | ECOG 전신상태평가 | <NA> | 숫자
267 | 268 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | WT_VL | 체중 (kg) | Float(5,2) | 체중 | <NA> | 숫자
268 | 269 | ORAL_CHMO_FLST | 항암 FlowSheet | ORAL_PE_CHMO_FLST | Flow Sheet | BSA_VL | BSA | Float(10,2) | 체표면적 | <NA> | 숫자
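Each row of this metadata file describes one column (colId) of one registry table (tblId), so a natural first step when using it is to group columns by table. A minimal sketch over a few (tblId, colId) pairs copied from the sample rows; the function name is hypothetical:

```python
from collections import defaultdict

# A few (tblId, colId) pairs copied from the sample rows.
pairs = [
    ("ORAL_PT_TRGT", "PT_SBST_NO"),
    ("ORAL_PT_TRGT", "SEX_CD"),
    ("ORAL_PT_TRGT", "BRTH_YMD"),
    ("ORAL_PE_CHMO_FLST", "ECOG_CD"),
    ("ORAL_PE_CHMO_FLST", "WT_VL"),
    ("ORAL_PE_CHMO_FLST", "BSA_VL"),
]


def columns_by_table(pairs):
    """Group column IDs under their table ID, preserving file order."""
    tables = defaultdict(list)
    for tbl_id, col_id in pairs:
        tables[tbl_id].append(col_id)
    return dict(tables)


schema = columns_by_table(pairs)
print(schema["ORAL_PT_TRGT"])  # ['PT_SBST_NO', 'SEX_CD', 'BRTH_YMD']
```

Applied to all 269 rows, the same grouping would yield one entry per distinct tblId (30, per the tblId summary).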