Overview

Dataset statistics

Number of variables: 11
Number of observations: 308
Missing cells: 308
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.2 KiB
Average record size in memory: 90.4 B
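
The percentage and per-record figures above follow from simple arithmetic on the reported counts. The sketch below assumes that the missing share is missing cells divided by (variables x observations), and that the rounded 27.2 KiB total is precise enough to recover the per-record size. The variable names are illustrative and do not come from the report.

```python
# Values copied from the dataset statistics above.
n_variables = 11
n_observations = 308
missing_cells = 308
total_size_kib = 27.2  # rounded in the report, so the result is approximate

total_cells = n_variables * n_observations        # 11 * 308 = 3388 cells
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

All 308 missing cells are accounted for by the single fully empty column colCnt, which is why the share works out to 1/11 of the table.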

Variable types

Numeric: 1
Categorical: 6
Text: 3
Unsupported: 1

Dataset

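Description: Provides metadata for the bladder cancer registry (the data items to be provided, their types, sizes, per-item counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048705/fileData.do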

Alerts

tblId is highly overall correlated with NUM and 3 other fields (High correlation)
gpNm is highly overall correlated with NUM and 3 other fields (High correlation)
gpId is highly overall correlated with NUM and 3 other fields (High correlation)
tblNm is highly overall correlated with NUM and 3 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
dataType is highly overall correlated with dispFormat (High correlation)
dispFormat is highly overall correlated with dataType (High correlation)
colCnt has 308 (100.0%) missing values (Missing)
NUM has unique values (Unique)
colCnt is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-12 09:20:26.327044
Analysis finished: 2023-12-12 09:20:27.682713
Duration: 1.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
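
The reported duration can be checked from the two timestamps. This is a minimal sketch using only the Python standard library; it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 09:20:26.327044")
finished = datetime.fromisoformat("2023-12-12 09:20:27.682713")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()

print(f"Duration: {duration_seconds:.2f} seconds")
```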

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 308
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 154.5
Minimum: 1
Maximum: 308
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 16.35
Q1: 77.75
Median: 154.5
Q3: 231.25
95-th percentile: 292.65
Maximum: 308
Range: 307
Interquartile range (IQR): 153.5
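
The quantile statistics above can be reproduced by linear interpolation between neighbouring order statistics. The sketch below assumes that NUM is the integer sequence 1..308, which is consistent with the reported minimum, maximum, distinct count and sum, and that the report interpolates linearly. The helper name quantile is illustrative.

```python
def quantile(sorted_values, q):
    """Linear interpolation between the two nearest order statistics."""
    position = q * (len(sorted_values) - 1)          # fractional 0-based index
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)   # guard for q == 1
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


num = list(range(1, 309))  # assumption: NUM takes each integer 1..308 once

p05 = quantile(num, 0.05)
q1 = quantile(num, 0.25)
median = quantile(num, 0.50)
q3 = quantile(num, 0.75)
p95 = quantile(num, 0.95)
iqr = q3 - q1
value_range = num[-1] - num[0]

print(p05, q1, median, q3, p95, iqr, value_range)
```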

Descriptive statistics

Standard deviation: 89.056162
Coefficient of variation (CV): 0.57641529
Kurtosis: -1.2
Mean: 154.5
Median Absolute Deviation (MAD): 77
Skewness: 0
Sum: 47586
Variance: 7931
Monotonicity: Strictly increasing
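
Most of the descriptive statistics above can be recomputed with the standard library. The sketch assumes that NUM is the sequence 1..308 and that the variance is the sample variance (n - 1 denominator), which is the convention that yields 7931. Kurtosis and skewness are omitted because their exact estimator is not stated in the report.

```python
import statistics

num = list(range(1, 309))  # assumption: NUM takes each integer 1..308 once

total = sum(num)
mean = statistics.fmean(num)
variance = statistics.variance(num)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(num)       # square root of the sample variance
cv = std_dev / mean                   # coefficient of variation
median = statistics.median(num)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in num)
# Strictly increasing: every value is larger than the one before it.
strictly_increasing = all(a < b for a, b in zip(num, num[1:]))

print(total, mean, variance, round(std_dev, 6), round(cv, 8), mad, strictly_increasing)
```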
Figure: Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
1                   1      0.3%
205                 1      0.3%
212                 1      0.3%
211                 1      0.3%
210                 1      0.3%
209                 1      0.3%
208                 1      0.3%
207                 1      0.3%
206                 1      0.3%
204                 1      0.3%
Other values (298)  298    96.8%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.3%
2      1      0.3%
3      1      0.3%
4      1      0.3%
5      1      0.3%
6      1      0.3%
7      1      0.3%
8      1      0.3%
9      1      0.3%
10     1      0.3%

Maximum 10 values

Value  Count  Frequency (%)
308    1      0.3%
307    1      0.3%
306    1      0.3%
305    1      0.3%
304    1      0.3%
303    1      0.3%
302    1      0.3%
301    1      0.3%
300    1      0.3%
299    1      0.3%

gpId
Categorical

HIGH CORRELATION 

Distinct: 19
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value              Count
BLAD_HLTH          74
BLAD_SPR           60
BLAD_CHMO_FLST     26
BLAD_OPRT          19
BLAD_IMNL          13
Other values (14)  116

Length

Max length: 14
Median length: 12
Mean length: 10.126623
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BLAD_TRGT
2nd row: BLAD_TRGT
3rd row: BLAD_TRGT
4th row: BLAD_TRGT
5th row: BLAD_TRGT

Common Values

Value             Count  Frequency (%)
BLAD_HLTH         74     24.0%
BLAD_SPR          60     19.5%
BLAD_CHMO_FLST    26     8.4%
BLAD_OPRT         19     6.2%
BLAD_IMNL         13     4.2%
BLAD_TRGT         13     4.2%
BLAD_CNDX_BDMS    12     3.9%
BLAD_CHMO         12     3.9%
BLAD_EVAL_DEAD    10     3.2%
BLAD_RTX          9      2.9%
Other values (9)  60     19.5%

Length

Figure: Histogram of lengths of the category

Words

Value             Count  Frequency (%)
blad_hlth         74     24.0%
blad_spr          60     19.5%
blad_chmo_flst    26     8.4%
blad_oprt         19     6.2%
blad_imnl         13     4.2%
blad_trgt         13     4.2%
blad_cndx_bdms    12     3.9%
blad_chmo         12     3.9%
blad_eval_dead    10     3.2%
blad_itrv_tx      9      2.9%
Other values (9)  60     19.5%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 19
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value              Count  English gloss
환자건강정보        74     patient health information
외과병리            60     surgical pathology
항암 FlowSheet      26     chemotherapy flowsheet
수술                19     surgery
면역병리검사        13     immunopathology test
Other values (14)  116

Length

Max length: 15
Median length: 12
Mean length: 6.775974
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

Value                 Count  Frequency (%)  English gloss
환자건강정보           74     24.0%          patient health information
외과병리               60     19.5%          surgical pathology
항암 FlowSheet         26     8.4%           chemotherapy flowsheet
수술                   19     6.2%           surgery
면역병리검사           13     4.2%           immunopathology test
Summary               13     4.2%
진단 및 신체           12     3.9%           diagnosis and physical
항암치료               12     3.9%           chemotherapy
치료평가 및 사망정보   10     3.2%           treatment evaluation and death information
방사선 치료            9      2.9%           radiation therapy
Other values (9)      60     19.5%

Length

Figure: Histogram of lengths of the category

Words

Value              Count  Frequency (%)  English gloss
환자건강정보        74     17.3%          patient health information
외과병리            60     14.1%          surgical pathology
항암                26     6.1%           anticancer
flowsheet          26     6.1%
                   22     5.2%
initial            20     4.7%
수술                19     4.4%           surgery
면역병리검사        13     3.0%           immunopathology test
summary            13     3.0%
진단                12     2.8%           diagnosis
Other values (15)  142    33.3%

tblId
Categorical

HIGH CORRELATION 

Distinct: 34
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value              Count
BLAD_PE_SPR        36
BLAD_PE_CHMO_FLST  26
BLAD_PE_SPR_TURB   24
BLAD_MR_HLTH_5     14
BLAD_PE_IMNL       13
Other values (29)  195

Length

Max length: 17
Median length: 16
Mean length: 13.75
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BLAD_PT_TRGT
2nd row: BLAD_PT_TRGT
3rd row: BLAD_PT_TRGT
4th row: BLAD_PT_TRGT
5th row: BLAD_PT_TRGT

Common Values

Value              Count  Frequency (%)
BLAD_PE_SPR        36     11.7%
BLAD_PE_CHMO_FLST  26     8.4%
BLAD_PE_SPR_TURB   24     7.8%
BLAD_MR_HLTH_5     14     4.5%
BLAD_PE_IMNL       13     4.2%
BLAD_PT_TRGT       13     4.2%
BLAD_PE_CHMO       12     3.9%
BLAD_MR_HLTH_6     9      2.9%
BLAD_PE_RTX        9      2.9%
BLAD_PE_BX_INIT    9      2.9%
Other values (24)  143    46.4%

Length

Figure: Histogram of lengths of the category

Words

Value              Count  Frequency (%)
blad_pe_spr        36     11.7%
blad_pe_chmo_flst  26     8.4%
blad_pe_spr_turb   24     7.8%
blad_mr_hlth_5     14     4.5%
blad_pe_imnl       13     4.2%
blad_pt_trgt       13     4.2%
blad_pe_chmo       12     3.9%
blad_mr_hlth_7     9      2.9%
blad_pe_itrv_tx    9      2.9%
blad_mr_hlth_8     9      2.9%
Other values (24)  143    46.4%

tblNm
Categorical

HIGH CORRELATION 

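Distinct: 34
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value               Count  English gloss
수술 후 결과         36     post-surgery results
Flow Sheet          26
수술 후 결과(TURB)   24     post-surgery results (TURB)
과거력               14     past medical history
면역병리결과         13     immunopathology results
Other values (29)   195

Length

Max length: 13
Median length: 11
Mean length: 7.2564935
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기본정보 (basic information)
2nd row: 기본정보 (basic information)
3rd row: 기본정보 (basic information)
4th row: 기본정보 (basic information)
5th row: 기본정보 (basic information)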

Common Values

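Value               Count  Frequency (%)  English gloss
수술 후 결과         36     11.7%          post-surgery results
Flow Sheet          26     8.4%
수술 후 결과(TURB)   24     7.8%           post-surgery results (TURB)
과거력               14     4.5%           past medical history
면역병리결과         13     4.2%           immunopathology results
기본정보             13     4.2%           basic information
항암치료정보         12     3.9%           chemotherapy information
가족력(부)           9      2.9%           family history (father)
방사선치료정보       9      2.9%           radiation therapy information
Initial 병리결과     9      2.9%           initial pathology results
Other values (24)   143    46.4%

Length

Figure: Histogram of lengths of the category

Words

Value              Count  Frequency (%)  English gloss
수술                66     12.4%          surgery
                   66     12.4%
결과                43     8.1%           result
flow               26     4.9%
sheet              26     4.9%
결과(turb           24     4.5%           result (TURB)
initial            20     3.8%
과거력              14     2.6%           past medical history
기본정보            13     2.4%           basic information
면역병리결과        13     2.4%           immunopathology results
Other values (30)  220    41.4%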

colId
Text

Distinct: 240
Distinct (%): 77.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 20
Median length: 16
Mean length: 11.896104
Min length: 5

Characters and Unicode

Total characters: 3664
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
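
As an illustration of the character properties mentioned above, the standard library exposes each code point's general category. The sketch below counts categories for two values that appear in this report; the helper name category_counts is illustrative, and script and block lookups are left out because the standard library does not provide them.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (e.g. Lu, Pc, Lo) in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


# "PT_SBST_NO" is the first colId sample value in this report.
col_id_counts = category_counts("PT_SBST_NO")
# "환자대체번호" is the matching colNm sample value (Hangul syllables).
col_nm_counts = category_counts("환자대체번호")

print(dict(col_id_counts))  # Lu = Uppercase Letter, Pc = Connector Punctuation
print(dict(col_nm_counts))  # Lo = Other Letter
```

Here Lu, Pc and Lo correspond to the report's "Uppercase Letter", "Connector Punctuation" and "Other Letter" rows.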

Unique

Unique: 200
Unique (%): 64.9%

Sample

1st row: PT_SBST_NO
2nd row: SEX_CD
3rd row: BRTH_YMD
4th row: FRST_DIAG_CD
5th row: FRST_DIAG_YMD

Words

Value               Count  Frequency (%)
pt_sbst_no          24     7.8%
chmo_strt_ymd       4      1.3%
spr_read_ymd        3      1.0%
spr_acpt_ymd        3      1.0%
oprt_ymd            3      1.0%
chmo_end_ymd        3      1.0%
mrph_type_cmnt      2      0.6%
ord_ymd             2      0.6%
cis_yn              2      0.6%
miex_clsf_nm        2      0.6%
Other values (230)  260    84.4%

Most occurring characters

Value              Count  Frequency (%)
_                  597    16.3%
T                  384    10.5%
M                  319    8.7%
N                  284    7.8%
S                  244    6.7%
C                  241    6.6%
D                  177    4.8%
R                  144    3.9%
Y                  143    3.9%
H                  137    3.7%
Other values (25)  994    27.1%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       3020   82.4%
Connector Punctuation  597    16.3%
Decimal Number         47     1.3%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
T                  384    12.7%
M                  319    10.6%
N                  284    9.4%
S                  244    8.1%
C                  241    8.0%
D                  177    5.9%
R                  144    4.8%
Y                  143    4.7%
H                  137    4.5%
A                  112    3.7%
Other values (16)  835    27.6%

Decimal Number
Value  Count  Frequency (%)
0      13     27.7%
2      11     23.4%
1      10     21.3%
7      5      10.6%
5      2      4.3%
3      2      4.3%
6      2      4.3%
4      2      4.3%

Connector Punctuation
Value  Count  Frequency (%)
_      597    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   3020   82.4%
Common  644    17.6%

Most frequent character per script

Latin
Value              Count  Frequency (%)
T                  384    12.7%
M                  319    10.6%
N                  284    9.4%
S                  244    8.1%
C                  241    8.0%
D                  177    5.9%
R                  144    4.8%
Y                  143    4.7%
H                  137    4.5%
A                  112    3.7%
Other values (16)  835    27.6%

Common
Value  Count  Frequency (%)
_      597    92.7%
0      13     2.0%
2      11     1.7%
1      10     1.6%
7      5      0.8%
5      2      0.3%
3      2      0.3%
6      2      0.3%
4      2      0.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3664   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
_                  597    16.3%
T                  384    10.5%
M                  319    8.7%
N                  284    7.8%
S                  244    6.7%
C                  241    6.6%
D                  177    4.8%
R                  144    3.9%
Y                  143    3.9%
H                  137    3.7%
Other values (25)  994    27.1%

colNm
Text

Distinct: 225
Distinct (%): 73.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 27
Median length: 23
Mean length: 9.1720779
Min length: 2

Characters and Unicode

Total characters: 2825
Distinct characters: 194
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 186
Unique (%): 60.4%

Sample

1st row: 환자대체번호 (patient substitute number)
2nd row: 성별 코드 (sex code)
3rd row: 생년월일 (date of birth)
4th row: 최초 진단 코드 (initial diagnosis code)
5th row: 최초 진단일 (initial diagnosis date)

Words

Value               Count  Frequency (%)  English gloss
내용                 72     12.6%          content
환자대체번호         24     4.2%           patient substitute number
검사                 17     3.0%           test
stage               14     2.5%
여부                 9      1.6%           yes/no indicator
invasion            9      1.6%
부위                 7      1.2%           site
기타                 7      1.2%           other
y:유                 6      1.1%           y: present
n:무                 6      1.1%           n: absent
Other values (238)  399    70.0%

Most occurring characters

Value               Count  Frequency (%)
(space)             262    9.3%
                    92     3.3%
                    85     3.0%
(                   65     2.3%
)                   65     2.3%
                    65     2.3%
a                   64     2.3%
o                   62     2.2%
e                   61     2.2%
i                   56     2.0%
Other values (184)  1948   69.0%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1438   50.9%
Lowercase Letter   629    22.3%
Uppercase Letter   288    10.2%
Space Separator    262    9.3%
Open Punctuation   65     2.3%
Close Punctuation  65     2.3%
Decimal Number     48     1.7%
Other Punctuation  27     1.0%
Dash Punctuation   3      0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    92     6.4%
                    85     5.9%
                    65     4.5%
                    53     3.7%
                    53     3.7%
                    52     3.6%
                    42     2.9%
                    42     2.9%
                    38     2.6%
                    38     2.6%
Other values (124)  878    61.1%

Lowercase Letter
Value              Count  Frequency (%)
a                  64     10.2%
o                  62     9.9%
e                  61     9.7%
i                  56     8.9%
t                  49     7.8%
s                  47     7.5%
n                  47     7.5%
g                  31     4.9%
r                  27     4.3%
l                  25     4.0%
Other values (13)  160    25.4%

Uppercase Letter
Value              Count  Frequency (%)
C                  39     13.5%
T                  28     9.7%
A                  24     8.3%
N                  23     8.0%
I                  21     7.3%
S                  17     5.9%
P                  15     5.2%
L                  14     4.9%
M                  14     4.9%
R                  12     4.2%
Other values (13)  81     28.1%

Decimal Number
Value  Count  Frequency (%)
0      13     27.1%
2      11     22.9%
1      11     22.9%
7      5      10.4%
6      2      4.2%
5      2      4.2%
3      2      4.2%
4      2      4.2%

Other Punctuation
Value  Count  Frequency (%)
/      15     55.6%
:      12     44.4%

Space Separator
Value    Count  Frequency (%)
(space)  262    100.0%

Open Punctuation
Value  Count  Frequency (%)
(      65     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      65     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1438   50.9%
Latin   917    32.5%
Common  470    16.6%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    92     6.4%
                    85     5.9%
                    65     4.5%
                    53     3.7%
                    53     3.7%
                    52     3.6%
                    42     2.9%
                    42     2.9%
                    38     2.6%
                    38     2.6%
Other values (124)  878    61.1%

Latin
Value              Count  Frequency (%)
a                  64     7.0%
o                  62     6.8%
e                  61     6.7%
i                  56     6.1%
t                  49     5.3%
s                  47     5.1%
n                  47     5.1%
C                  39     4.3%
g                  31     3.4%
T                  28     3.1%
Other values (36)  433    47.2%

Common
Value             Count  Frequency (%)
(space)           262    55.7%
(                 65     13.8%
)                 65     13.8%
/                 15     3.2%
0                 13     2.8%
:                 12     2.6%
2                 11     2.3%
1                 11     2.3%
7                 5      1.1%
-                 3      0.6%
Other values (4)  8      1.7%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1438   50.9%
ASCII   1387   49.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            262    18.9%
(                  65     4.7%
)                  65     4.7%
a                  64     4.6%
o                  62     4.5%
e                  61     4.4%
i                  56     4.0%
t                  49     3.5%
s                  47     3.4%
n                  47     3.4%
Other values (50)  609    43.9%

Hangul
Value               Count  Frequency (%)
                    92     6.4%
                    85     5.9%
                    65     4.5%
                    53     3.7%
                    53     3.7%
                    52     3.6%
                    42     2.9%
                    42     2.9%
                    38     2.6%
                    38     2.6%
Other values (124)  878    61.1%

dataType
Categorical

HIGH CORRELATION 

Distinct: 31
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value              Count
String(1)          56
DATE               50
String(10)         37
String(100)        31
String(200)        19
Other values (26)  115

Length

Max length: 13
Median length: 12
Mean length: 9.2305195
Min length: 4

Unique

Unique: 9
Unique (%): 2.9%

Sample

1st row: String(10)
2nd row: String(code)
3rd row: DATE
4th row: String(code)
5th row: DATE

Common Values

Value              Count  Frequency (%)
String(1)          56     18.2%
DATE               50     16.2%
String(10)         37     12.0%
String(100)        31     10.1%
String(200)        19     6.2%
String(50)         17     5.5%
String(20)         12     3.9%
String(400)        10     3.2%
Integer(code)      9      2.9%
Integer(4)         8      2.6%
Other values (21)  59     19.2%
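
The dataType values follow a small pattern: a base name, optionally followed by parenthesised arguments, as in String(10), Float(5,2), Integer(code) or DATE. The sketch below splits such a value into its parts. The function name parse_data_type and the reading of the arguments (a length, a precision and scale pair, or the literal "code") are assumptions for illustration, not something the registry metadata defines.

```python
import re

# Base name, then an optional "(...)" argument list.
_DATA_TYPE = re.compile(r"^(?P<base>[A-Za-z]+)(?:\((?P<args>[^)]*)\))?$")


def parse_data_type(spec):
    """Split e.g. 'Float(5,2)' into ('FLOAT', (5, 2))."""
    match = _DATA_TYPE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised dataType: {spec!r}")
    base = match.group("base").upper()
    raw_args = match.group("args")
    parts = [] if raw_args is None else [p.strip() for p in raw_args.split(",")]
    # Numeric arguments become ints; anything else (e.g. 'code') stays text.
    args = tuple(int(p) if p.isdigit() else p for p in parts)
    return base, args


for spec in ("String(10)", "DATE", "Float(5,2)", "Integer(code)"):
    print(spec, "->", parse_data_type(spec))
```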

Length

Figure: Histogram of lengths of the category

Words

Value              Count  Frequency (%)
string(1           56     18.2%
date               51     16.6%
string(10          37     12.0%
string(100         31     10.1%
string(200         19     6.2%
string(50          17     5.5%
string(20          12     3.9%
string(400         10     3.2%
integer(code       9      2.9%
integer(4          8      2.6%
Other values (20)  58     18.8%

colDesc
Text

Distinct: 239
Distinct (%): 77.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 38
Median length: 22
Mean length: 9.4318182
Min length: 2

Characters and Unicode

Total characters: 2905
Distinct characters: 204
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 200
Unique (%): 64.9%

Sample

1st row: 환자대체번호 (patient substitute number)
2nd row: 성별코드 (sex code)
3rd row: 생년월일 (date of birth)
4th row: 최초진단코드 (initial diagnosis code)
5th row: 최초진단일자 (initial diagnosis date)

Words

Value               Count  Frequency (%)  English gloss
환자대체번호         24     4.5%           patient substitute number
내용                 18     3.4%           content
stage               14     2.6%
invasion            9      1.7%
검체결과             8      1.5%           specimen result
항암치료             6      1.1%           chemotherapy
기타                 6      1.1%           other
명칭                 6      1.1%           name
                    5      0.9%
세포병리검사         5      0.9%           cytopathology test
Other values (273)  429    80.9%

Most occurring characters

Value               Count  Frequency (%)
(space)             222    7.6%
i                   68     2.3%
o                   66     2.3%
                    66     2.3%
)                   65     2.2%
(                   65     2.2%
a                   64     2.2%
                    62     2.1%
e                   61     2.1%
                    59     2.0%
Other values (194)  2107   72.5%

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           1554   53.5%
Lowercase Letter       660    22.7%
Uppercase Letter       261    9.0%
Space Separator        222    7.6%
Close Punctuation      65     2.2%
Open Punctuation       65     2.2%
Decimal Number         48     1.7%
Other Punctuation      24     0.8%
Connector Punctuation  3      0.1%
Dash Punctuation       3      0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    66     4.2%
                    62     4.0%
                    59     3.8%
                    53     3.4%
                    52     3.3%
                    44     2.8%
                    41     2.6%
                    40     2.6%
                    39     2.5%
                    38     2.4%
Other values (133)  1060   68.2%

Lowercase Letter
Value              Count  Frequency (%)
i                  68     10.3%
o                  66     10.0%
a                  64     9.7%
e                  61     9.2%
t                  52     7.9%
s                  50     7.6%
n                  48     7.3%
r                  28     4.2%
g                  28     4.2%
y                  27     4.1%
Other values (13)  168    25.5%

Uppercase Letter
Value              Count  Frequency (%)
C                  28     10.7%
T                  24     9.2%
N                  24     9.2%
A                  18     6.9%
S                  17     6.5%
I                  15     5.7%
L                  15     5.7%
P                  14     5.4%
B                  13     5.0%
R                  12     4.6%
Other values (13)  81     31.0%

Decimal Number
Value  Count  Frequency (%)
0      13     27.1%
2      11     22.9%
1      11     22.9%
7      5      10.4%
4      2      4.2%
3      2      4.2%
5      2      4.2%
6      2      4.2%

Other Punctuation
Value  Count  Frequency (%)
/      14     58.3%
:      10     41.7%

Space Separator
Value    Count  Frequency (%)
(space)  222    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      65     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      65     100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      3      100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1554   53.5%
Latin   921    31.7%
Common  430    14.8%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    66     4.2%
                    62     4.0%
                    59     3.8%
                    53     3.4%
                    52     3.3%
                    44     2.8%
                    41     2.6%
                    40     2.6%
                    39     2.5%
                    38     2.4%
Other values (133)  1060   68.2%

Latin
Value              Count  Frequency (%)
i                  68     7.4%
o                  66     7.2%
a                  64     6.9%
e                  61     6.6%
t                  52     5.6%
s                  50     5.4%
n                  48     5.2%
r                  28     3.0%
C                  28     3.0%
g                  28     3.0%
Other values (36)  428    46.5%

Common
Value             Count  Frequency (%)
(space)           222    51.6%
)                 65     15.1%
(                 65     15.1%
/                 14     3.3%
0                 13     3.0%
2                 11     2.6%
1                 11     2.6%
:                 10     2.3%
7                 5      1.2%
_                 3      0.7%
Other values (5)  11     2.6%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1554   53.5%
ASCII   1351   46.5%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            222    16.4%
i                  68     5.0%
o                  66     4.9%
)                  65     4.8%
(                  65     4.8%
a                  64     4.7%
e                  61     4.5%
t                  52     3.8%
s                  50     3.7%
n                  48     3.6%
Other values (51)  590    43.7%

Hangul
Value               Count  Frequency (%)
                    66     4.2%
                    62     4.0%
                    59     3.8%
                    53     3.4%
                    52     3.3%
                    44     2.8%
                    41     2.6%
                    40     2.6%
                    39     2.5%
                    38     2.4%
Other values (133)  1060   68.2%

colCnt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 308
Missing (%): 100.0%
Memory size: 2.8 KiB

dispFormat
Categorical

HIGH CORRELATION 

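Distinct: 10
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value             Count  English gloss
텍스트             126    text
YYYY-MM-DD        51
Y : 유 / N : 무    51     Y: present / N: absent
숫자               39     number
RN+비식별숫자(8)   24     RN + de-identified number (8)
Other values (5)  17

Length

Max length: 18
Median length: 15
Mean length: 6.6850649
Min length: 2

Unique

Unique: 2
Unique (%): 0.6%

Sample

1st row: RN+비식별숫자(8) (RN + de-identified number (8))
2nd row: M 남 | F 여 (M male | F female)
3rd row: YYYY-MM-DD
4th row: 원내검사 코드 (in-hospital test code)
5th row: YYYY-MM-DD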

Common Values

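Value                     Count  Frequency (%)  English gloss
텍스트                     126    40.9%          text
YYYY-MM-DD                51     16.6%
Y : 유 / N : 무            51     16.6%          Y: present / N: absent
숫자                       39     12.7%          number
RN+비식별숫자(8)           24     7.8%           RN + de-identified number (8)
Free 텍스트                10     3.2%           free text
Y : 내부 / N : 외부        3      1.0%           Y: internal / N: external
원내검사 코드              2      0.6%           in-hospital test code
M 남 | F 여                1      0.3%           M male | F female
Y : 유 / N : 무/알수없음   1      0.3%           Y: present / N: absent or unknown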

Length

Figure: Histogram of lengths of the category

Figure: Common Values (Plot)

Words

Value             Count  Frequency (%)
                  166    25.4%
텍스트             136    20.8%
y                 55     8.4%
n                 55     8.4%
                  52     8.0%
yyyy-mm-dd        51     7.8%
                  51     7.8%
숫자               39     6.0%
rn+비식별숫자(8    24     3.7%
free              10     1.5%
Other values (9)  15     2.3%

Correlations

            NUM    gpId   gpNm   tblId  tblNm  dataType  dispFormat
NUM         1.000  0.960  0.960  0.992  0.992  0.748     0.704
gpId        0.960  1.000  1.000  1.000  1.000  0.767     0.605
gpNm        0.960  1.000  1.000  1.000  1.000  0.767     0.605
tblId       0.992  1.000  1.000  1.000  1.000  0.746     0.609
tblNm       0.992  1.000  1.000  1.000  1.000  0.746     0.609
dataType    0.748  0.767  0.767  0.746  0.746  1.000     0.953
dispFormat  0.704  0.605  0.605  0.609  0.609  0.953     1.000

            dataType  tblId  dispFormat  gpNm   gpId   tblNm
dataType    1.000     0.242  0.718       0.304  0.304  0.242
tblId       0.242     1.000  0.251       0.974  0.974  1.000
dispFormat  0.718     0.251  1.000       0.272  0.272  0.251
gpNm        0.304     0.974  0.272       1.000  1.000  0.974
gpId        0.304     0.974  0.272       1.000  1.000  0.974
tblNm       0.242     1.000  0.251       0.974  0.974  1.000

            NUM    gpId   gpNm   tblId  tblNm  dataType  dispFormat
NUM         1.000  0.788  0.788  0.896  0.896  0.360     0.285
gpId        0.788  1.000  1.000  0.974  0.974  0.304     0.272
gpNm        0.788  1.000  1.000  0.974  0.974  0.304     0.272
tblId       0.896  0.974  0.974  1.000  1.000  0.242     0.251
tblNm       0.896  0.974  0.974  1.000  1.000  0.242     0.251
dataType    0.360  0.304  0.304  0.242  0.242  1.000     0.718
dispFormat  0.285  0.272  0.272  0.251  0.251  0.718     1.000
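
The correlation of 1.000 between gpId and gpNm is what one would expect if each group ID maps to exactly one group name. The sketch below illustrates this with the textbook (uncorrected) Cramér's V on a tiny hand-made sample. Which association measure each matrix above uses is not stated in this extract, so treating the categorical pairs as Cramér's V is an assumption, and the report may apply a bias-corrected variant that gives slightly lower values. The function name cramers_v and the eight-row sample are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long category lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# ID/name pairs taken from the Sample rows of this report, repeated by hand.
gp_id = ["BLAD_TRGT", "BLAD_TRGT", "BLAD_CHMO_FLST", "BLAD_CHMO_FLST"]
gp_nm = ["Summary", "Summary", "항암 FlowSheet", "항암 FlowSheet"]
v_one_to_one = cramers_v(gp_id, gp_nm)

# Hypothetical unrelated column: every combination occurs equally often.
unrelated = ["a", "b", "a", "b"]
v_unrelated = cramers_v(gp_id, unrelated)

print(v_one_to_one, v_unrelated)
```

A one-to-one mapping gives the maximum association, while a column that is evenly spread across both groups gives none.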

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     NUM  gpId       gpNm     tblId         tblNm     colId          colNm           dataType      colDesc       colCnt  dispFormat
0    1    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  PT_SBST_NO     환자대체번호    String(10)    환자대체번호  <NA>    RN+비식별숫자(8)
1    2    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  SEX_CD         성별 코드       String(code)  성별코드      <NA>    M 남 | F 여
2    3    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  BRTH_YMD       생년월일        DATE          생년월일      <NA>    YYYY-MM-DD
3    4    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRST_DIAG_CD   최초 진단 코드  String(code)  최초진단코드  <NA>    원내검사 코드
4    5    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRST_DIAG_YMD  최초 진단일     DATE          최초진단일자  <NA>    YYYY-MM-DD
5    6    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRST_DIAG_NM   최초 진단명     String(256)   최초진단명    <NA>    텍스트
6    7    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  DIAG_ATT_AGE   진단 시 나이    Integer(3)    진단 시 나이  <NA>    숫자
7    8    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRMD_YMD       초진일          DATE          초진일자      <NA>    YYYY-MM-DD
8    9    BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRST_OPRT_YMD  최초 수술일     DATE          최초 수술일자  <NA>    YYYY-MM-DD
9    10   BLAD_TRGT  Summary  BLAD_PT_TRGT  기본정보  FRST_OPRT_NM   최초 수술명     String(256)   최초 수술명   <NA>    텍스트

Last rows

     NUM  gpId            gpNm            tblId              tblNm       colId           colNm              dataType       colDesc            colCnt  dispFormat
298  299  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  FATG_CMNT       FATIGUE 내용       String(50)     FATIGUE            <NA>    텍스트
299  300  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  NV_CMNT         NV 내용            String(50)     NV                 <NA>    텍스트
300  301  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  CSTP_CMNT       CONSTIPATION 내용  String(50)     CONSTIPATION       <NA>    텍스트
301  302  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  DIAR_CMNT       DIARRHEA 내용      String(50)     DIARRHEA           <NA>    텍스트
302  303  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  SKIN_RASH_CMNT  SKINRASH 내용      String(50)     SKINRASH           <NA>    텍스트
303  304  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  MCST_CMNT       MUCOSITIS 내용     String(50)     MUCOSITIS          <NA>    텍스트
304  305  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  NURO_PTHY_CMNT  NEUROPATHY 내용    String(50)     NEUROPATHY         <NA>    텍스트
305  306  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  ECOG_CD         ECOG 코드          Integer(code)  ECOG 전신상태평가  <NA>    숫자
306  307  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  WT_VL           체중 (kg)          Float(5,2)     체중               <NA>    숫자
307  308  BLAD_CHMO_FLST  항암 FlowSheet  BLAD_PE_CHMO_FLST  Flow Sheet  BSA_VL          BSA                Float(10,2)    체표면적           <NA>    숫자