Overview

Dataset statistics

Number of variables: 11
Number of observations: 246
Missing cells: 114
Missing cells (%): 4.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.8 KiB
Average record size in memory: 90.5 B
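
The missing-cell percentage follows directly from the counts above: 11 variables times 246 observations gives 2,706 cells, of which 114 are missing. A minimal sketch of that arithmetic (the helper name `missing_cell_pct` is illustrative and not part of ydata-profiling):

```python
def missing_cell_pct(n_missing: int, n_vars: int, n_obs: int) -> float:
    """Return missing cells as a percentage of all cells in the table."""
    total_cells = n_vars * n_obs
    return 100.0 * n_missing / total_cells


# Figures taken from the dataset statistics of this report.
pct = missing_cell_pct(n_missing=114, n_vars=11, n_obs=246)
print(f"{pct:.1f}%")
```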

Variable types

Numeric: 2
Categorical: 6
Text: 3

Dataset

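Description: Provides metadata for the biliary tract cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, and so on).
Author: 국립암센터 (National Cancer Center)
URL: https://www.data.go.kr/data/15048698/fileData.do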

Alerts

gpId is highly overall correlated with NUM and 4 other fields (High correlation)
gpNm is highly overall correlated with NUM and 4 other fields (High correlation)
tblId is highly overall correlated with NUM and 4 other fields (High correlation)
tblNm is highly overall correlated with NUM and 4 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
colCnt is highly overall correlated with gpId and 4 other fields (High correlation)
dispFormat is highly overall correlated with colCnt (High correlation)
dataType is highly imbalanced (63.3%) (Imbalance)
colCnt has 114 (46.3%) missing values (Missing)
NUM has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 13:11:21.220729
Analysis finished: 2023-12-12 13:11:22.361377
Duration: 1.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 246
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 123.5
Minimum: 1
Maximum: 246
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 13.25
Q1: 62.25
Median: 123.5
Q3: 184.75
95-th percentile: 233.75
Maximum: 246
Range: 245
Interquartile range (IQR): 122.5

Descriptive statistics

Standard deviation: 71.158274
Coefficient of variation (CV): 0.57618036
Kurtosis: -1.2
Mean: 123.5
Median Absolute Deviation (MAD): 61.5
Skewness: 0
Sum: 30381
Variance: 5063.5
Monotonicity: Strictly increasing
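
The NUM figures are what one would expect from a plain row counter. The sketch below assumes NUM is exactly the sequence 1..246 (consistent with 246 distinct, strictly increasing values between 1 and 246) and recomputes the statistics with the Python standard library. It uses the sample variance (n - 1 denominator) and linearly interpolated quantiles, which is an assumption about how the report computes them.

```python
import math
import statistics

# Assumption: NUM is the row counter 1..246.
values = list(range(1, 247))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation

# Quartiles with linear interpolation over the closed range of the data.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, round(std, 6), round(cv, 8))
print(q1, median, q3, iqr, mad)
```

Under these assumptions every value agrees with the figures listed for NUM.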
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   1      0.4%
156                 1      0.4%
158                 1      0.4%
159                 1      0.4%
160                 1      0.4%
161                 1      0.4%
162                 1      0.4%
163                 1      0.4%
164                 1      0.4%
165                 1      0.4%
Other values (236)  236    95.9%

Value  Count  Frequency (%)
1      1      0.4%
2      1      0.4%
3      1      0.4%
4      1      0.4%
5      1      0.4%
6      1      0.4%
7      1      0.4%
8      1      0.4%
9      1      0.4%
10     1      0.4%

Value  Count  Frequency (%)
246    1      0.4%
245    1      0.4%
244    1      0.4%
243    1      0.4%
242    1      0.4%
241    1      0.4%
240    1      0.4%
239    1      0.4%
238    1      0.4%
237    1      0.4%

gpId
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

BC_HLTH: 73
BC_SPR: 35
BC_OPRT: 31
BC_COMP: 28
BC_CCRT_RT: 15
Other values (9): 64

Length

Max length: 12
Median length: 7
Mean length: 7.601626
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BC_SUMMARY
2nd row: BC_SUMMARY
3rd row: BC_SUMMARY
4th row: BC_SUMMARY
5th row: BC_SUMMARY

Common Values

Value             Count  Frequency (%)
BC_HLTH           73     29.7%
BC_SPR            35     14.2%
BC_OPRT           31     12.6%
BC_COMP           28     11.4%
BC_CCRT_RT        15     6.1%
BC_CST            12     4.9%
BC_FLUP_DEAD      12     4.9%
BC_DIAG           11     4.5%
BC_MIEX_SREX      6      2.4%
BC_INIT_BX        6      2.4%
Other values (4)  17     6.9%
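
Each Frequency (%) entry is the count divided by the 246 observations, and the "Other values" row is whatever remains after the ten listed categories. A short sketch of that bookkeeping, with the counts copied from the gpId table above (the helper name `frequency_pct` is illustrative):

```python
# Counts of the ten most common gpId values, as listed in this report.
top_counts = {
    "BC_HLTH": 73, "BC_SPR": 35, "BC_OPRT": 31, "BC_COMP": 28,
    "BC_CCRT_RT": 15, "BC_CST": 12, "BC_FLUP_DEAD": 12, "BC_DIAG": 11,
    "BC_MIEX_SREX": 6, "BC_INIT_BX": 6,
}
n_obs = 246  # number of observations in the dataset


def frequency_pct(count: int, total: int) -> float:
    """Relative frequency as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Rows not covered by the ten listed categories.
other_count = n_obs - sum(top_counts.values())

for value, count in top_counts.items():
    print(f"{value}: {count} ({frequency_pct(count, n_obs)}%)")
print(f"Other values: {other_count} ({frequency_pct(other_count, n_obs)}%)")
```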

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bc_hlth 73
29.7%
bc_spr 35
14.2%
bc_oprt 31
12.6%
bc_comp 28
 
11.4%
bc_ccrt_rt 15
 
6.1%
bc_cst 12
 
4.9%
bc_flup_dead 12
 
4.9%
bc_diag 11
 
4.5%
bc_miex_srex 6
 
2.4%
bc_init_bx 6
 
2.4%
Other values (4) 17
 
6.9%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

기타건강정보: 73
외과병리보고서: 35
수술정보: 31
합병증: 28
CCRT/RT: 15
Other values (9): 64

Length

Max length: 17
Median length: 16
Mean length: 6.199187
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

ValueCountFrequency (%)
기타건강정보 73
29.7%
외과병리보고서 35
14.2%
수술정보 31
12.6%
합병증 28
 
11.4%
CCRT/RT 15
 
6.1%
전이 및 재발 12
 
4.9%
사망 및 치료평가 12
 
4.9%
진단정보 11
 
4.5%
진단검사(영상/시술) 6
 
2.4%
진단검사(Initial Bx) 6
 
2.4%
Other values (4) 17
 
6.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
기타건강정보 73
23.9%
외과병리보고서 35
11.4%
수술정보 31
10.1%
합병증 28
 
9.2%
24
 
7.8%
ccrt/rt 15
 
4.9%
사망 12
 
3.9%
치료평가 12
 
3.9%
재발 12
 
3.9%
전이 12
 
3.9%
Other values (8) 52
17.0%

tblId
Categorical

HIGH CORRELATION 

Distinct: 36
Distinct (%): 14.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

PE_BC_SPR: 35
PE_BC_COMP: 28
MR_BC_HLTH_9: 14
PE_BC_OPRT: 10
PE_BC_RTX_1: 10
Other values (31): 149

Length

Max length: 17
Median length: 16
Mean length: 11.51626
Min length: 9

Unique

Unique: 2
Unique (%): 0.8%

Sample

1st row: BC_SUMMARY_PTIF_V
2nd row: BC_SUMMARY_PTIF_V
3rd row: BC_SUMMARY_PTIF_V
4th row: BC_SUMMARY_PTIF_V
5th row: BC_SUMMARY_PTIF_V

Common Values

ValueCountFrequency (%)
PE_BC_SPR 35
 
14.2%
PE_BC_COMP 28
 
11.4%
MR_BC_HLTH_9 14
 
5.7%
PE_BC_OPRT 10
 
4.1%
PE_BC_RTX_1 10
 
4.1%
MR_BC_HLTH_4 9
 
3.7%
MR_BC_HLTH_7 9
 
3.7%
MR_BC_HLTH_5 9
 
3.7%
MR_BC_HLTH_6 9
 
3.7%
MR_BC_HLTH_2 7
 
2.8%
Other values (26) 106
43.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pe_bc_spr 35
 
14.2%
pe_bc_comp 28
 
11.4%
mr_bc_hlth_9 14
 
5.7%
pe_bc_oprt 10
 
4.1%
pe_bc_rtx_1 10
 
4.1%
mr_bc_hlth_4 9
 
3.7%
mr_bc_hlth_7 9
 
3.7%
mr_bc_hlth_5 9
 
3.7%
mr_bc_hlth_6 9
 
3.7%
pt_bc_dead 7
 
2.8%
Other values (26) 106
43.1%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 35
Distinct (%): 14.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

외과병리보고서: 35
합병증: 28
과거력: 14
수술정보: 10
RT 정보: 10
Other values (30): 149

Length

Max length: 22
Median length: 14
Mean length: 6.5934959
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: Patient info
2nd row: Patient info
3rd row: Patient info
4th row: Patient info
5th row: Patient info

Common Values

ValueCountFrequency (%)
외과병리보고서 35
 
14.2%
합병증 28
 
11.4%
과거력 14
 
5.7%
수술정보 10
 
4.1%
RT 정보 10
 
4.1%
가족력(형제/자매) 9
 
3.7%
가족력(부) 9
 
3.7%
가족력(자녀) 9
 
3.7%
가족력(모) 9
 
3.7%
사망정보 7
 
2.8%
Other values (25) 106
43.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
외과병리보고서 35
 
10.7%
합병증 28
 
8.5%
정보 16
 
4.9%
결과 15
 
4.6%
과거력 14
 
4.3%
initial 11
 
3.4%
수술정보 10
 
3.0%
rt 10
 
3.0%
가족력(모 9
 
2.7%
가족력(자녀 9
 
2.7%
Other values (35) 171
52.1%

colId
Text

Distinct: 237
Distinct (%): 96.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 19
Median length: 17
Mean length: 12.373984
Min length: 5

Characters and Unicode

Total characters: 3044
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
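
As a rough illustration of how such a category breakdown can be derived, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module for the five sample colId values shown in this section. It is not the report's own implementation; the two-letter codes map to the names used here as `Lu` = Uppercase Letter, `Pc` = Connector Punctuation, `Nd` = Decimal Number.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lu', 'Pc', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample colId values shown in this report.
samples = ["DIAG_AGE", "FRMD_YMD", "ORD_YMD", "FRST_TRTM_RSRV_YMD", "OPRT_NM"]

totals = Counter()
for name in samples:
    totals += category_counts(name)

print(dict(totals))
```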

Unique

Unique: 228
Unique (%): 92.7%

Sample

1st row: DIAG_AGE
2nd row: FRMD_YMD
3rd row: ORD_YMD
4th row: FRST_TRTM_RSRV_YMD
5th row: OPRT_NM
ValueCountFrequency (%)
ancd_nm 2
 
0.8%
clnc_diag_nm 2
 
0.8%
stag_rcrd_ymd 2
 
0.8%
mtst_part_cmnt 2
 
0.8%
ancd_ingr_nm 2
 
0.8%
ctx_cycl 2
 
0.8%
oprt_nm 2
 
0.8%
frst_trtm_rsrv_ymd 2
 
0.8%
ord_ymd 2
 
0.8%
last_vshs_ymd 1
 
0.4%
Other values (227) 227
92.3%

Most occurring characters

ValueCountFrequency (%)
_ 487
16.0%
T 271
 
8.9%
N 269
 
8.8%
M 238
 
7.8%
C 223
 
7.3%
S 182
 
6.0%
R 163
 
5.4%
D 150
 
4.9%
Y 115
 
3.8%
H 110
 
3.6%
Other values (22) 836
27.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2549
83.7%
Connector Punctuation 487
 
16.0%
Decimal Number 8
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 271
 
10.6%
N 269
 
10.6%
M 238
 
9.3%
C 223
 
8.7%
S 182
 
7.1%
R 163
 
6.4%
D 150
 
5.9%
Y 115
 
4.5%
H 110
 
4.3%
A 99
 
3.9%
Other values (16) 729
28.6%
Decimal Number
ValueCountFrequency (%)
1 3
37.5%
2 2
25.0%
6 1
 
12.5%
3 1
 
12.5%
4 1
 
12.5%
Connector Punctuation
ValueCountFrequency (%)
_ 487
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2549
83.7%
Common 495
 
16.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 271
 
10.6%
N 269
 
10.6%
M 238
 
9.3%
C 223
 
8.7%
S 182
 
7.1%
R 163
 
6.4%
D 150
 
5.9%
Y 115
 
4.5%
H 110
 
4.3%
A 99
 
3.9%
Other values (16) 729
28.6%
Common
ValueCountFrequency (%)
_ 487
98.4%
1 3
 
0.6%
2 2
 
0.4%
6 1
 
0.2%
3 1
 
0.2%
4 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3044
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 487
16.0%
T 271
 
8.9%
N 269
 
8.8%
M 238
 
7.8%
C 223
 
7.3%
S 182
 
6.0%
R 163
 
5.4%
D 150
 
4.9%
Y 115
 
3.8%
H 110
 
3.6%
Other values (22) 836
27.5%

colNm
Text

Distinct: 230
Distinct (%): 93.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 52
Median length: 29
Mean length: 11.658537
Min length: 2

Characters and Unicode

Total characters: 2868
Distinct characters: 160
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 216
Unique (%): 87.8%

Sample

1st row: 진단 시 나이
2nd row: 초진일
3rd row: 약물치료시작일
4th row: 방사선치료시작일
5th row: 수술명
ValueCountFrequency (%)
grade 14
 
3.1%
of 10
 
2.2%
가족병력(모 8
 
1.7%
기타 8
 
1.7%
invasion 8
 
1.7%
가족병력(부 8
 
1.7%
가족병력(형제/자매 8
 
1.7%
stage 6
 
1.3%
구분 6
 
1.3%
resection 5
 
1.1%
Other values (242) 378
82.4%

Most occurring characters

ValueCountFrequency (%)
217
 
7.6%
A 163
 
5.7%
E 161
 
5.6%
I 147
 
5.1%
T 121
 
4.2%
O 113
 
3.9%
N 104
 
3.6%
S 102
 
3.6%
R 101
 
3.5%
C 88
 
3.1%
Other values (150) 1551
54.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1584
55.2%
Other Letter 951
33.2%
Space Separator 217
 
7.6%
Open Punctuation 43
 
1.5%
Close Punctuation 43
 
1.5%
Other Punctuation 19
 
0.7%
Dash Punctuation 6
 
0.2%
Decimal Number 5
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
54
 
5.7%
53
 
5.6%
51
 
5.4%
40
 
4.2%
38
 
4.0%
37
 
3.9%
26
 
2.7%
24
 
2.5%
22
 
2.3%
20
 
2.1%
Other values (116) 586
61.6%
Uppercase Letter
ValueCountFrequency (%)
A 163
10.3%
E 161
 
10.2%
I 147
 
9.3%
T 121
 
7.6%
O 113
 
7.1%
N 104
 
6.6%
S 102
 
6.4%
R 101
 
6.4%
C 88
 
5.6%
L 79
 
5.0%
Other values (15) 405
25.6%
Decimal Number
ValueCountFrequency (%)
2 2
40.0%
1 2
40.0%
3 1
20.0%
Other Punctuation
ValueCountFrequency (%)
/ 16
84.2%
. 3
 
15.8%
Space Separator
ValueCountFrequency (%)
217
100.0%
Open Punctuation
ValueCountFrequency (%)
( 43
100.0%
Close Punctuation
ValueCountFrequency (%)
) 43
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1584
55.2%
Hangul 951
33.2%
Common 333
 
11.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
54
 
5.7%
53
 
5.6%
51
 
5.4%
40
 
4.2%
38
 
4.0%
37
 
3.9%
26
 
2.7%
24
 
2.5%
22
 
2.3%
20
 
2.1%
Other values (116) 586
61.6%
Latin
ValueCountFrequency (%)
A 163
10.3%
E 161
 
10.2%
I 147
 
9.3%
T 121
 
7.6%
O 113
 
7.1%
N 104
 
6.6%
S 102
 
6.4%
R 101
 
6.4%
C 88
 
5.6%
L 79
 
5.0%
Other values (15) 405
25.6%
Common
ValueCountFrequency (%)
217
65.2%
( 43
 
12.9%
) 43
 
12.9%
/ 16
 
4.8%
- 6
 
1.8%
. 3
 
0.9%
2 2
 
0.6%
1 2
 
0.6%
3 1
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1917
66.8%
Hangul 951
33.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
217
 
11.3%
A 163
 
8.5%
E 161
 
8.4%
I 147
 
7.7%
T 121
 
6.3%
O 113
 
5.9%
N 104
 
5.4%
S 102
 
5.3%
R 101
 
5.3%
C 88
 
4.6%
Other values (24) 600
31.3%
Hangul
ValueCountFrequency (%)
54
 
5.7%
53
 
5.6%
51
 
5.4%
40
 
4.2%
38
 
4.0%
37
 
3.9%
26
 
2.7%
24
 
2.5%
22
 
2.3%
20
 
2.1%
Other values (116) 586
61.6%

dataType
Categorical

IMBALANCE 

Distinct: 14
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

String(): 193
DATE: 13
Integer(): 12
<NA>: 11
Float(): 3
Other values (9): 14

Length

Max length: 13
Median length: 8
Mean length: 7.7764228
Min length: 4

Unique

Unique: 5
Unique (%): 2.0%

Sample

1st row: Integer()
2nd row: DATE
3rd row: DATE
4th row: DATE
5th row: String()

Common Values

Value             Count  Frequency (%)
String()          193    78.5%
DATE              13     5.3%
Integer()         12     4.9%
<NA>              11     4.5%
Float()           3      1.2%
Integer(code)     3      1.2%
Float(51)         2      0.8%
String(code)      2      0.8%
Float(,)          2      0.8%
Float(62)         1      0.4%
Other values (4)  4      1.6%
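
The 63.3% imbalance flagged for dataType is consistent with a score of one minus the Shannon entropy of the class counts, normalised by the logarithm of the number of classes. The sketch below assumes that definition (it is not stated in this report) and takes the four "Other values" to occur once each, since they account for four rows in total.

```python
import math

# dataType counts from the common-values table; the four remaining
# values are assumed to occur once each (4 values, 4 rows).
counts = [193, 13, 12, 11, 3, 3, 2, 2, 2, 1] + [1] * 4


def imbalance_score(class_counts: list) -> float:
    """One minus Shannon entropy normalised by log2(number of classes)."""
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts)
    return 1.0 - entropy / math.log2(len(class_counts))


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score of 0 would mean all 14 classes are equally frequent, and a score near 1 means a single class dominates, as String() does here with 193 of 246 rows.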

Length

Histogram of lengths of the category
ValueCountFrequency (%)
string 193
78.5%
date 13
 
5.3%
integer 13
 
5.3%
na 11
 
4.5%
float 5
 
2.0%
integer(code 3
 
1.2%
float(51 2
 
0.8%
string(code 2
 
0.8%
float(62 1
 
0.4%
string(4000 1
 
0.4%
Other values (2) 2
 
0.8%

colDesc
Text

Distinct: 244
Distinct (%): 99.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 54
Median length: 25
Mean length: 13.109756
Min length: 5

Characters and Unicode

Total characters: 3225
Distinct characters: 242
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 242
Unique (%): 98.4%

Sample

1st row: 담도암 진단시 나이
2nd row: 간암센터 외래 초진일
3rd row: 항암치료 첫 치료시작일
4th row: 방사선치료 첫 치료시작일
5th row: 담도암 수술명
ValueCountFrequency (%)
유무 43
 
5.5%
상세내용 22
 
2.8%
수술 21
 
2.7%
20
 
2.6%
기타 16
 
2.0%
과거력 14
 
1.8%
악성도 13
 
1.7%
담도암 11
 
1.4%
10
 
1.3%
첫번째 10
 
1.3%
Other values (296) 602
77.0%

Most occurring characters

ValueCountFrequency (%)
536
 
16.6%
87
 
2.7%
72
 
2.2%
59
 
1.8%
55
 
1.7%
55
 
1.7%
52
 
1.6%
48
 
1.5%
46
 
1.4%
( 46
 
1.4%
Other values (232) 2169
67.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2273
70.5%
Space Separator 536
 
16.6%
Uppercase Letter 258
 
8.0%
Other Punctuation 50
 
1.6%
Open Punctuation 46
 
1.4%
Close Punctuation 46
 
1.4%
Decimal Number 12
 
0.4%
Dash Punctuation 3
 
0.1%
Other Number 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
87
 
3.8%
72
 
3.2%
59
 
2.6%
55
 
2.4%
55
 
2.4%
52
 
2.3%
48
 
2.1%
46
 
2.0%
46
 
2.0%
46
 
2.0%
Other values (199) 1707
75.1%
Uppercase Letter
ValueCountFrequency (%)
C 40
15.5%
T 26
10.1%
L 26
10.1%
E 22
8.5%
A 22
8.5%
I 20
 
7.8%
N 14
 
5.4%
R 14
 
5.4%
O 13
 
5.0%
S 11
 
4.3%
Other values (10) 50
19.4%
Decimal Number
ValueCountFrequency (%)
1 4
33.3%
2 3
25.0%
4 2
16.7%
3 2
16.7%
0 1
 
8.3%
Other Punctuation
ValueCountFrequency (%)
' 26
52.0%
/ 22
44.0%
, 2
 
4.0%
Space Separator
ValueCountFrequency (%)
536
100.0%
Open Punctuation
ValueCountFrequency (%)
( 46
100.0%
Close Punctuation
ValueCountFrequency (%)
) 46
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Other Number
ValueCountFrequency (%)
² 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2273
70.5%
Common 694
 
21.5%
Latin 258
 
8.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
87
 
3.8%
72
 
3.2%
59
 
2.6%
55
 
2.4%
55
 
2.4%
52
 
2.3%
48
 
2.1%
46
 
2.0%
46
 
2.0%
46
 
2.0%
Other values (199) 1707
75.1%
Latin
ValueCountFrequency (%)
C 40
15.5%
T 26
10.1%
L 26
10.1%
E 22
8.5%
A 22
8.5%
I 20
 
7.8%
N 14
 
5.4%
R 14
 
5.4%
O 13
 
5.0%
S 11
 
4.3%
Other values (10) 50
19.4%
Common
ValueCountFrequency (%)
536
77.2%
( 46
 
6.6%
) 46
 
6.6%
' 26
 
3.7%
/ 22
 
3.2%
1 4
 
0.6%
- 3
 
0.4%
2 3
 
0.4%
4 2
 
0.3%
3 2
 
0.3%
Other values (3) 4
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2273
70.5%
ASCII 951
29.5%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
536
56.4%
( 46
 
4.8%
) 46
 
4.8%
C 40
 
4.2%
' 26
 
2.7%
T 26
 
2.7%
L 26
 
2.7%
E 22
 
2.3%
/ 22
 
2.3%
A 22
 
2.3%
Other values (22) 139
 
14.6%
Hangul
ValueCountFrequency (%)
87
 
3.8%
72
 
3.2%
59
 
2.6%
55
 
2.4%
55
 
2.4%
52
 
2.3%
48
 
2.1%
46
 
2.0%
46
 
2.0%
46
 
2.0%
Other values (199) 1707
75.1%
None
ValueCountFrequency (%)
² 1
100.0%

colCnt
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 83
Distinct (%): 62.9%
Missing: 114
Missing (%): 46.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 4628.0909
Minimum: 1
Maximum: 90932
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 103.5
Q1: 384
Median: 743.5
Q3: 2321
95-th percentile: 28783
Maximum: 90932
Range: 90931
Interquartile range (IQR): 1937

Descriptive statistics

Standard deviation: 13943.03
Coefficient of variation (CV): 3.0126957
Kurtosis: 24.843484
Mean: 4628.0909
Median Absolute Deviation (MAD): 563.5
Skewness: 4.7912511
Sum: 610908
Variance: 1.9440808 × 10^8
Monotonicity: Not monotonic
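
Several colCnt figures can be cross-checked against one another. The sketch below uses only numbers reported in this section and assumes that the mean and the distinct percentage are taken over the non-missing rows, an assumption that the figures themselves bear out.

```python
# Figures reported for colCnt in this report.
n_obs, n_missing, n_distinct = 246, 114, 83
total, std = 610908, 13943.03
q1, q3 = 384, 2321
minimum, maximum = 1, 90932

n_present = n_obs - n_missing                 # rows that have a colCnt value
mean = total / n_present                      # mean over non-missing rows
cv = std / mean                               # coefficient of variation
iqr = q3 - q1                                 # interquartile range
value_range = maximum - minimum
missing_pct = 100.0 * n_missing / n_obs
distinct_pct = 100.0 * n_distinct / n_present

print(n_present, round(mean, 4), round(cv, 5), iqr, value_range)
print(round(missing_pct, 1), round(distinct_pct, 1))
```

The coefficient of variation above 3 and a mean far above the median (4628.09 versus 743.5) both point to a heavily right-skewed column, in line with the reported skewness of 4.79.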
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
2321               17     6.9%
523                7      2.8%
1009               7      2.8%
658                6      2.4%
180                4      1.6%
4774               3      1.2%
34339              3      1.2%
828                2      0.8%
712                2      0.8%
166                2      0.8%
Other values (73)  79     32.1%
(Missing)          114    46.3%

Value  Count  Frequency (%)
1      1      0.4%
19     1      0.4%
26     1      0.4%
29     1      0.4%
55     1      0.4%
79     1      0.4%
87     1      0.4%
117    1      0.4%
145    1      0.4%
163    1      0.4%

Value  Count  Frequency (%)
90932  2      0.8%
69449  1      0.4%
34339  3      1.2%
28783  2      0.8%
27949  1      0.4%
17879  2      0.8%
5718   1      0.4%
4774   3      1.2%
4750   1      0.4%
4570   1      0.4%

dispFormat
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): 8.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

<NA>: 105
텍스트: 54
Y 유 |N 무: 21
YYYY-MM-DD: 16
숫자: 16
Other values (16): 34

Length

Max length: 681
Median length: 71
Mean length: 8.5447154
Min length: 2

Unique

Unique: 12
Unique (%): 4.9%

Sample

1st row: 숫자
2nd row: YYYY-MM-DD
3rd row: YYYY-MM-DD
4th row: YYYY-MM-DD
5th row: 텍스트

Common Values

Value              Count  Frequency (%)
<NA>               105    42.7%
텍스트             54     22.0%
Y 유 |N 무         21     8.5%
YYYY-MM-DD         16     6.5%
숫자               16     6.5%
Y : 유 / N : 무    9      3.7%
Y, Y |N, N         8      3.3%
Y Y |N N           3      1.2%
내부 | 외부        2      0.8%
예)NEEDLE BIOPSY   1      0.4%
Other values (11)  11     4.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 105
18.6%
81
14.3%
텍스트 55
9.7%
y 52
9.2%
n 52
9.2%
30
 
5.3%
30
 
5.3%
yyyy-mm-dd 16
 
2.8%
숫자 16
 
2.8%
neutrophil 2
 
0.4%
Other values (122) 127
22.4%

Interactions

[Interaction plots: images not preserved in this text version]

Correlations

            NUM     gpId    gpNm    tblId   tblNm   dataType  colCnt  dispFormat
NUM         1.000   0.938   0.938   0.988   0.991   0.445     0.440   0.734
gpId        0.938   1.000   1.000   1.000   1.000   0.565     0.847   0.822
gpNm        0.938   1.000   1.000   1.000   1.000   0.565     0.847   0.822
tblId       0.988   1.000   1.000   1.000   1.000   0.721     0.871   0.887
tblNm       0.991   1.000   1.000   1.000   1.000   0.715     0.871   0.887
dataType    0.445   0.565   0.565   0.721   0.715   1.000     0.000   0.853
colCnt      0.440   0.847   0.847   0.871   0.871   0.000     1.000   0.908
dispFormat  0.734   0.822   0.822   0.887   0.887   0.853     0.908   1.000

            dispFormat  gpId    gpNm    tblId   dataType  tblNm
dispFormat  1.000       0.457   0.457   0.456   0.434     0.456
gpId        0.457       1.000   1.000   0.951   0.241     0.948
gpNm        0.457       1.000   1.000   0.951   0.241     0.948
tblId       0.456       0.951   0.951   1.000   0.295     0.998
dataType    0.434       0.241   0.241   0.295   1.000     0.292
tblNm       0.456       0.948   0.948   0.998   0.292     1.000

            NUM     colCnt  gpId    gpNm    tblId   tblNm   dataType  dispFormat
NUM         1.000   -0.090  0.752   0.752   0.851   0.849   0.204     0.381
colCnt      -0.090  1.000   0.686   0.686   0.613   0.613   0.000     0.716
gpId        0.752   0.686   1.000   1.000   0.951   0.948   0.241     0.457
gpNm        0.752   0.686   1.000   1.000   0.951   0.948   0.241     0.457
tblId       0.851   0.613   0.951   0.951   1.000   0.998   0.295     0.456
tblNm       0.849   0.613   0.948   0.948   0.998   1.000   0.292     0.456
dataType    0.204   0.000   0.241   0.241   0.295   0.292   1.000     0.434
dispFormat  0.381   0.716   0.457   0.457   0.456   0.456   0.434     1.000
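
The High correlation alerts in the overview can be reproduced from the last matrix above by flagging every pair whose absolute coefficient is at least 0.5. That cut-off is an assumption: the report does not state it, but it yields exactly the partner counts given in the alerts. The function name `highly_correlated` is illustrative.

```python
# Last correlation matrix of this report, in the printed row/column order.
fields = ["NUM", "colCnt", "gpId", "gpNm", "tblId", "tblNm", "dataType", "dispFormat"]
matrix = [
    [1.000, -0.090, 0.752, 0.752, 0.851, 0.849, 0.204, 0.381],
    [-0.090, 1.000, 0.686, 0.686, 0.613, 0.613, 0.000, 0.716],
    [0.752, 0.686, 1.000, 1.000, 0.951, 0.948, 0.241, 0.457],
    [0.752, 0.686, 1.000, 1.000, 0.951, 0.948, 0.241, 0.457],
    [0.851, 0.613, 0.951, 0.951, 1.000, 0.998, 0.295, 0.456],
    [0.849, 0.613, 0.948, 0.948, 0.998, 1.000, 0.292, 0.456],
    [0.204, 0.000, 0.241, 0.241, 0.295, 0.292, 1.000, 0.434],
    [0.381, 0.716, 0.457, 0.457, 0.456, 0.456, 0.434, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off; not stated in the report


def highly_correlated(names, rows, threshold):
    """Map each field to the other fields whose |coefficient| >= threshold."""
    result = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, value in enumerate(rows[i])
            if j != i and abs(value) >= threshold
        ]
        if partners:
            result[name] = partners
    return result


high = highly_correlated(fields, matrix, THRESHOLD)
for name, partners in high.items():
    if len(partners) > 1:
        print(f"{name}: {partners[0]} and {len(partners) - 1} other fields")
    else:
        print(f"{name}: {partners[0]}")
```

dataType has no coefficient at or above the cut-off, which matches the absence of a correlation alert for it.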

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | BC_SUMMARY | Summary | BC_SUMMARY_PTIF_V | Patient info | DIAG_AGE | 진단 시 나이 | Integer() | 담도암 진단시 나이 | <NA> | 숫자
1 | 2 | BC_SUMMARY | Summary | BC_SUMMARY_PTIF_V | Patient info | FRMD_YMD | 초진일 | DATE | 간암센터 외래 초진일 | <NA> | YYYY-MM-DD
2 | 3 | BC_SUMMARY | Summary | BC_SUMMARY_PTIF_V | Patient info | ORD_YMD | 약물치료시작일 | DATE | 항암치료 첫 치료시작일 | <NA> | YYYY-MM-DD
3 | 4 | BC_SUMMARY | Summary | BC_SUMMARY_PTIF_V | Patient info | FRST_TRTM_RSRV_YMD | 방사선치료시작일 | DATE | 방사선치료 첫 치료시작일 | <NA> | YYYY-MM-DD
4 | 5 | BC_SUMMARY | Summary | BC_SUMMARY_PTIF_V | Patient info | OPRT_NM | 수술명 | String() | 담도암 수술명 | <NA> | 텍스트
5 | 6 | BC_DIAG | 진단정보 | RG_BC_CNDX | 진단정보 | CLNC_DIAG_NM | 임상진단명 | String() | 담도암 및 기타암 임상진단명 | 4570 | 텍스트
6 | 7 | BC_DIAG | 진단정보 | RG_BC_CNDX | 진단정보 | CLNC_DIAG_CD | 담도암/기타암 임상진단코드 | String() | 담도암과 기타암 임상진단코드 | <NA> | 원내검사코드
7 | 8 | BC_DIAG | 진단정보 | RG_BC_CNDX | 진단정보 | DIAG_YMD | 진단일 | <NA> | 담도암 및 기타암 진단일자 | <NA> | YYYY-MM-DD
8 | 9 | BC_DIAG | 진단정보 | PT_BC_BDMS | 증상 및 신체계측 | HT_MSRM_YMD | 신장 측정일 | DATE | 담도암 진단 이후 첫번째 신장 측정일 | 1694 | YYYY-MM-DD
9 | 10 | BC_DIAG | 진단정보 | PT_BC_BDMS | 증상 및 신체계측 | HT_VL | 신장 | Float(51) | 담도암 진단 이후 첫번째 신장 값 | 1782 | 숫자

index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
236 | 237 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_CNCR_YN | 과거병력암여부 | String() | 과거력 암 유무 | <NA> | <NA>
237 | 238 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_DEPR_YN | 과거병력우울증여부 | String() | 과거력 우울 유무 | <NA> | <NA>
238 | 239 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_INSM_YN | 과거병력불면증여부 | String() | 과거력 불면 유무 | <NA> | <NA>
239 | 240 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_CADZ_YN | 과거병력심장질환여부 | String() | 과거력 심장질환 유무 | <NA> | <NA>
240 | 241 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_CADZ_CMNT | 과거병력심장질환내용 | String() | 과거력 심장질환 상세내용 | <NA> | <NA>
241 | 242 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_ETC_YN | 과거병력기타여부 | String() | 과거력 기타 유무 | <NA> | <NA>
242 | 243 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_9 | 과거력 | PHIS_ETC_CMNT | 과거병력기타내용 | String() | 과거력 기타 상세내용 | <NA> | <NA>
243 | 244 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_10 | 증상/전원 정보 | MAIN_SYMP_YN | 증상 | String() | 입원 시 주증상 유무 | <NA> | <NA>
244 | 245 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_10 | 증상/전원 정보 | MAIN_SYMP_CMNT | 증상 상세내용 | String() | 입원 시 주증상 상세내용 | <NA> | <NA>
245 | 246 | BC_HLTH | 기타건강정보 | MR_BC_HLTH_10 | 증상/전원 정보 | OUTS_DIAG_TRANS_YN | 타 병원 진단 후 전원여부 | String() | 타 병원 진단 후 전원여부 | <NA> | <NA>