Overview

Dataset statistics

Number of variables: 11
Number of observations: 221
Missing cells: 183
Missing cells (%): 7.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.6 KiB
Average record size in memory: 90.6 B
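
The missing-cell percentage follows from the counts above: 183 missing cells out of 11 × 221 cells. A minimal sketch of that arithmetic, assuming the percentage is simply missing cells divided by total cells (the variable names are illustrative only):

```python
# Counts taken from the dataset statistics above.
n_variables = 11
n_observations = 221
missing_cells = 183

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```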

Variable types

Numeric: 2
Categorical: 6
Text: 3

Dataset

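Description: Provides metadata for the pancreatic cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048699/fileData.do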

Alerts

tblNm is highly overall correlated with NUM and 5 other fields [High correlation]
gpId is highly overall correlated with NUM and 5 other fields [High correlation]
gpNm is highly overall correlated with NUM and 5 other fields [High correlation]
tblId is highly overall correlated with NUM and 5 other fields [High correlation]
NUM is highly overall correlated with gpId and 4 other fields [High correlation]
colCnt is highly overall correlated with gpId and 3 other fields [High correlation]
dataType is highly overall correlated with dispFormat [High correlation]
dispFormat is highly overall correlated with NUM and 5 other fields [High correlation]
dataType is highly imbalanced (59.4%) [Imbalance]
colCnt has 183 (82.8%) missing values [Missing]
NUM has unique values [Unique]
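
The 59.4% imbalance reported for dataType can be reproduced from its category counts if the score is taken to be one minus the normalised Shannon entropy. That definition is an assumption here (the report does not state its formula), and `imbalance_score` is an illustrative helper, not a library function. A minimal sketch:

```python
import math

# Category counts for dataType, copied from its "Common Values" table;
# the two categories grouped under "Other values (2)" occur once each.
counts = [166, 21, 13, 6, 6, 2, 2, 1, 1, 1, 1, 1]

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform column, close to 1 for a dominated one."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # assumption: a single-category column scores 0
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(probs))

score = imbalance_score(counts)
print(f"{100 * score:.1f}%")
```

The same counts also give the 75.1% share of `String()` (166 of 221), which is what drives the score up.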

Reproduction

Analysis started: 2023-12-12 20:07:47.201709
Analysis finished: 2023-12-12 20:07:48.684454
Duration: 1.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
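
The duration is the difference between the two timestamps above. A small sketch of that subtraction using only the standard library (the format string is an assumption matching how the timestamps are printed):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 20:07:47.201709", fmt)
finished = datetime.strptime("2023-12-12 20:07:48.684454", fmt)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```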

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 221
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 111
Minimum: 1
Maximum: 221
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 12
Q1: 56
Median: 111
Q3: 166
95th percentile: 210
Maximum: 221
Range: 220
Interquartile range (IQR): 110

Descriptive statistics

Standard deviation: 63.941379
Coefficient of variation (CV): 0.57604846
Kurtosis: -1.2
Mean: 111
Median Absolute Deviation (MAD): 55
Skewness: 0
Sum: 24531
Variance: 4088.5
Monotonicity: Strictly increasing
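
NUM looks like a plain row counter: 221 distinct values, minimum 1, maximum 221, strictly increasing. Assuming it is exactly the sequence 1, 2, ..., 221, the figures above can be re-derived with the standard library. The sketch below uses the sample variance (n - 1 denominator) and linearly interpolated quartiles, both of which are assumptions about how the report computes them:

```python
import math
import statistics

values = list(range(1, 222))  # assumption: NUM is exactly 1, 2, ..., 221

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
center = statistics.median(values)
mad = statistics.median(abs(v - center) for v in values)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, variance, round(std, 6), round(cv, 8), mad, q1, median, q3)
```

Under these assumptions the computed sum, mean, variance, standard deviation, CV, MAD, and quartiles agree with the reported values.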
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                       1   0.5%
153                     1   0.5%
142                     1   0.5%
143                     1   0.5%
144                     1   0.5%
145                     1   0.5%
146                     1   0.5%
147                     1   0.5%
148                     1   0.5%
149                     1   0.5%
Other values (211)    211  95.5%

Value  Count  Frequency (%)
1          1  0.5%
2          1  0.5%
3          1  0.5%
4          1  0.5%
5          1  0.5%
6          1  0.5%
7          1  0.5%
8          1  0.5%
9          1  0.5%
10         1  0.5%

Value  Count  Frequency (%)
221        1  0.5%
220        1  0.5%
219        1  0.5%
218        1  0.5%
217        1  0.5%
216        1  0.5%
215        1  0.5%
214        1  0.5%
213        1  0.5%
212        1  0.5%

gpId
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

PNC_HLTH: 73
PNC_OPRT: 29
PNC_COMP: 28
PNC_SPR: 21
PNC_CST: 12
Other values (9): 58

Length

Max length: 13
Median length: 8
Mean length: 8.5972851
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PNC_SUMMARY
2nd row: PNC_SUMMARY
3rd row: PNC_SUMMARY
4th row: PNC_SUMMARY
5th row: PNC_SUMMARY

Common Values

Value             Count  Frequency (%)
PNC_HLTH             73  33.0%
PNC_OPRT             29  13.1%
PNC_COMP             28  12.7%
PNC_SPR              21   9.5%
PNC_CST              12   5.4%
PNC_FLUP_DEAD        12   5.4%
PNC_DIAG             11   5.0%
PNC_CHMO              7   3.2%
PNC_MIEX_SREX         6   2.7%
PNC_INIT_BX           6   2.7%
Other values (4)     16   7.2%

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
pnc_hlth             73  33.0%
pnc_oprt             29  13.1%
pnc_comp             28  12.7%
pnc_spr              21   9.5%
pnc_cst              12   5.4%
pnc_flup_dead        12   5.4%
pnc_diag             11   5.0%
pnc_chmo              7   3.2%
pnc_miex_srex         6   2.7%
pnc_init_bx           6   2.7%
Other values (4)     16   7.2%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Other health information: 73
Surgery information: 29
Complications: 28
Surgical pathology report: 21
Metastasis and recurrence: 12
Other values (9): 58

Length

Max length: 17
Median length: 16
Mean length: 6.1221719
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

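Value                                                    Count  Frequency (%)
Other health information [기타건강정보]                      73  33.0%
Surgery information [수술정보]                               29  13.1%
Complications [합병증]                                       28  12.7%
Surgical pathology report [외과병리보고서]                   21   9.5%
Metastasis and recurrence [전이 및 재발]                     12   5.4%
Death and treatment evaluation [사망 및 치료평가]            12   5.4%
Diagnosis information [진단정보]                             11   5.0%
Chemotherapy [항암치료]                                       7   3.2%
Diagnostic tests (imaging/procedures) [진단검사(영상/시술)]   6   2.7%
Diagnostic tests (Initial Bx) [진단검사(Initial Bx)]          6   2.7%
Other values (4)                                             16   7.2%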

Length

Histogram of lengths of the category
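Value                      Count  Frequency (%)
Other health information      73  26.0%
Surgery information           29  10.3%
Complications                 28  10.0%
and [및]                      24   8.5%
Surgical pathology report     21   7.5%
Metastasis                    12   4.3%
Recurrence                    12   4.3%
Death                         12   4.3%
Treatment evaluation          12   4.3%
Diagnosis information         11   3.9%
Other values (8)              47  16.7%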

tblId
Categorical

HIGH CORRELATION 

Distinct: 35
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

PE_PNC_COMP: 28
PE_PNC_SPR: 21
MR_PNC_HLTH_9: 14
PE_PNC_OPRT: 10
MR_PNC_HLTH_4: 9
Other values (30): 139

Length

Max length: 18
Median length: 17
Mean length: 12.687783
Min length: 10

Unique

Unique: 2
Unique (%): 0.9%

Sample

1st row: PNC_SUMMARY_PTIF_V
2nd row: PNC_SUMMARY_PTIF_V
3rd row: PNC_SUMMARY_PTIF_V
4th row: PNC_SUMMARY_PTIF_V
5th row: PNC_SUMMARY_PTIF_V

Common Values

Value              Count  Frequency (%)
PE_PNC_COMP           28  12.7%
PE_PNC_SPR            21   9.5%
MR_PNC_HLTH_9         14   6.3%
PE_PNC_OPRT           10   4.5%
MR_PNC_HLTH_4          9   4.1%
MR_PNC_HLTH_7          9   4.1%
MR_PNC_HLTH_5          9   4.1%
MR_PNC_HLTH_6          9   4.1%
PE_PNC_CHMO            7   3.2%
PT_PNC_DEAD            7   3.2%
Other values (25)     98  44.3%

Length

Histogram of lengths of the category
Value              Count  Frequency (%)
pe_pnc_comp           28  12.7%
pe_pnc_spr            21   9.5%
mr_pnc_hlth_9         14   6.3%
pe_pnc_oprt           10   4.5%
mr_pnc_hlth_4          9   4.1%
mr_pnc_hlth_7          9   4.1%
mr_pnc_hlth_5          9   4.1%
mr_pnc_hlth_6          9   4.1%
pe_pnc_chmo            7   3.2%
pt_pnc_dead            7   3.2%
Other values (25)     98  44.3%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 34
Distinct (%): 15.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Complications: 28
Surgical pathology report: 21
Past medical history: 14
Surgery information: 10
Family history (siblings): 9
Other values (29): 139

Length

Max length: 22
Median length: 14
Mean length: 6.6063348
Min length: 2

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: Patient info
2nd row: Patient info
3rd row: Patient info
4th row: Patient info
5th row: Patient info

Common Values

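Value                                          Count  Frequency (%)
Complications [합병증]                            28  12.7%
Surgical pathology report [외과병리보고서]        21   9.5%
Past medical history [과거력]                     14   6.3%
Surgery information [수술정보]                    10   4.5%
Family history (siblings) [가족력(형제/자매)]      9   4.1%
Family history (children) [가족력(자녀)]           9   4.1%
Family history (mother) [가족력(모)]               9   4.1%
Family history (father) [가족력(부)]               9   4.1%
Chemotherapy information [항암치료정보]            7   3.2%
Drinking history [음주력]                          7   3.2%
Other values (24)                                 98  44.3%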

Length

Histogram of lengths of the category
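Value                        Count  Frequency (%)
Complications                   28   9.6%
Surgical pathology report       21   7.2%
Result                          15   5.1%
Past medical history            14   4.8%
initial                         11   3.8%
Surgery information             10   3.4%
Family history (siblings         9   3.1%
Family history (children         9   3.1%
Family history (mother           9   3.1%
Family history (father           9   3.1%
Other values (34)              158  53.9%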

colId
Text

Distinct: 214
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Length

Max length: 19
Median length: 16
Mean length: 12.253394
Min length: 5

Characters and Unicode

Total characters: 2708
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
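
As a rough illustration of the idea, Python's standard `unicodedata` module exposes the general category of each code point, which is enough to rebuild a per-category count like the ones in this report. The helper name `category_counts` is illustrative, and the standard library does not expose scripts or blocks, so only categories are shown:

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general-category code (e.g. 'Lu', 'Pc')."""
    return Counter(unicodedata.category(ch) for ch in text)

# 'Lu' = Uppercase Letter, 'Pc' = Connector Punctuation,
# 'Lo' = Other Letter (Hangul syllables), 'Zs' = Space Separator.
print(category_counts("FRMD_YMD"))      # a colId sample value
print(category_counts("진단 시 나이"))   # a colNm sample value
```

The superscript two that appears once in colDesc falls into the 'No' (Other Number) category under the same scheme.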

Unique

Unique: 207
Unique (%): 93.7%
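
"Distinct" and "Unique" measure different things: distinct counts the different values that occur, while unique counts the values that occur exactly once. The sketch below shows the difference on a toy column (not the real colId data) and then checks that the reported figures are mutually consistent, assuming every repeated colId occurs exactly twice, as the frequency table for this variable suggests:

```python
from collections import Counter

# Toy column for illustration only: two values repeat, three occur once.
column = ["ORD_YMD", "ORD_YMD", "OPRT_NM", "OPRT_NM",
          "ADM_YMD", "DIAG_AGE", "FRMD_YMD"]

freq = Counter(column)
distinct = len(freq)                               # different values seen
unique = sum(1 for n in freq.values() if n == 1)   # values seen exactly once

# Reported colId figures: 221 rows, 214 distinct, 207 unique.
rows, reported_distinct, reported_unique = 221, 214, 207
repeated = reported_distinct - reported_unique     # values seen more than once
consistent = reported_unique + 2 * repeated == rows

print(distinct, unique, repeated, consistent)
```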

Sample

1st row: FRMD_YMD
2nd row: DIAG_AGE
3rd row: OPRT_NM
4th row: ORD_YMD
5th row: FRST_TRTM_RSRV_YMD

Value               Count  Frequency (%)
stag_rcrd_ymd           2   0.9%
ancd_ingr_nm            2   0.9%
ctx_cycl                2   0.9%
oprt_nm                 2   0.9%
ord_ymd                 2   0.9%
mtst_part_cmnt          2   0.9%
ancd_nm                 2   0.9%
dcuz3_cmnt              1   0.5%
edu_dgre_cd             1   0.5%
adm_ymd                 1   0.5%
Other values (204)    204  92.3%

Most occurring characters

Value              Count  Frequency (%)
_                    436  16.1%
N                    238   8.8%
T                    222   8.2%
M                    209   7.7%
C                    198   7.3%
S                    161   5.9%
D                    141   5.2%
R                    132   4.9%
Y                    110   4.1%
H                    103   3.8%
Other values (22)    758  28.0%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter        2264  83.6%
Connector Punctuation    436  16.1%
Decimal Number             8   0.3%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
N                    238  10.5%
T                    222   9.8%
M                    209   9.2%
C                    198   8.7%
S                    161   7.1%
D                    141   6.2%
R                    132   5.8%
Y                    110   4.9%
H                    103   4.5%
A                     93   4.1%
Other values (16)    657  29.0%

Decimal Number
Value  Count  Frequency (%)
1          3  37.5%
2          2  25.0%
3          1  12.5%
4          1  12.5%
6          1  12.5%

Connector Punctuation
Value  Count  Frequency (%)
_        436  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin    2264  83.6%
Common    444  16.4%

Most frequent character per script

Latin
Value              Count  Frequency (%)
N                    238  10.5%
T                    222   9.8%
M                    209   9.2%
C                    198   8.7%
S                    161   7.1%
D                    141   6.2%
R                    132   5.8%
Y                    110   4.9%
H                    103   4.5%
A                     93   4.1%
Other values (16)    657  29.0%

Common
Value  Count  Frequency (%)
_        436  98.2%
1          3   0.7%
2          2   0.5%
3          1   0.2%
4          1   0.2%
6          1   0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   2708  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
_                    436  16.1%
N                    238   8.8%
T                    222   8.2%
M                    209   7.7%
C                    198   7.3%
S                    161   5.9%
D                    141   5.2%
R                    132   4.9%
Y                    110   4.1%
H                    103   3.8%
Other values (22)    758  28.0%

colNm
Text

Distinct: 206
Distinct (%): 93.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Length

Max length: 35
Median length: 25
Mean length: 10.950226
Min length: 2

Characters and Unicode

Total characters: 2420
Distinct characters: 158
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 193
Unique (%): 87.3%

Sample

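1st row: First visit date
2nd row: Age at diagnosis
3rd row: Surgery name
4th row: Drug therapy start date
5th row: Radiotherapy start date

Value                              Count  Frequency (%)
grade                                 13   3.3%
Family medical history (father         9   2.3%
Family medical history (siblings       9   2.3%
Family medical history (mother         9   2.3%
Family medical history (children       9   2.3%
Other                                  8   2.0%
Presence/absence                       7   1.8%
stage                                  6   1.5%
Other details                          5   1.3%
invasion                               5   1.3%
Other values (193)                   317  79.8%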

Most occurring characters

Value               Count  Frequency (%)
(space)               181   7.5%
E                     135   5.6%
A                     129   5.3%
I                     111   4.6%
T                      93   3.8%
S                      84   3.5%
R                      84   3.5%
O                      80   3.3%
N                      74   3.1%
L                      66   2.7%
Other values (148)   1383  57.1%

Most occurring categories

Value              Count  Frequency (%)
Uppercase Letter    1235  51.0%
Other Letter         891  36.8%
Space Separator      181   7.5%
Open Punctuation      43   1.8%
Close Punctuation     43   1.8%
Other Punctuation     16   0.7%
Dash Punctuation       6   0.2%
Decimal Number         5   0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
53
 
5.9%
53
 
5.9%
50
 
5.6%
40
 
4.5%
38
 
4.3%
34
 
3.8%
25
 
2.8%
23
 
2.6%
19
 
2.1%
19
 
2.1%
Other values (114) 537
60.3%
Uppercase Letter
Value              Count  Frequency (%)
E                    135  10.9%
A                    129  10.4%
I                    111   9.0%
T                     93   7.5%
S                     84   6.8%
R                     84   6.8%
O                     80   6.5%
N                     74   6.0%
L                     66   5.3%
C                     65   5.3%
Other values (15)    314  25.4%

Decimal Number
Value  Count  Frequency (%)
2          2  40.0%
1          2  40.0%
3          1  20.0%

Other Punctuation
Value  Count  Frequency (%)
/         14  87.5%
.          2  12.5%

Space Separator
Value    Count  Frequency (%)
(space)    181  100.0%

Open Punctuation
Value  Count  Frequency (%)
(         43  100.0%

Close Punctuation
Value  Count  Frequency (%)
)         43  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-          6  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin    1235  51.0%
Hangul    891  36.8%
Common    294  12.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
53
 
5.9%
53
 
5.9%
50
 
5.6%
40
 
4.5%
38
 
4.3%
34
 
3.8%
25
 
2.8%
23
 
2.6%
19
 
2.1%
19
 
2.1%
Other values (114) 537
60.3%
Latin
Value              Count  Frequency (%)
E                    135  10.9%
A                    129  10.4%
I                    111   9.0%
T                     93   7.5%
S                     84   6.8%
R                     84   6.8%
O                     80   6.5%
N                     74   6.0%
L                     66   5.3%
C                     65   5.3%
Other values (15)    314  25.4%

Common
Value    Count  Frequency (%)
(space)    181  61.6%
(           43  14.6%
)           43  14.6%
/           14   4.8%
-            6   2.0%
.            2   0.7%
2            2   0.7%
1            2   0.7%
3            1   0.3%

Most occurring blocks

Value   Count  Frequency (%)
ASCII    1529  63.2%
Hangul    891  36.8%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)              181  11.8%
E                    135   8.8%
A                    129   8.4%
I                    111   7.3%
T                     93   6.1%
S                     84   5.5%
R                     84   5.5%
O                     80   5.2%
N                     74   4.8%
L                     66   4.3%
Other values (24)    492  32.2%
Hangul
ValueCountFrequency (%)
53
 
5.9%
53
 
5.9%
50
 
5.6%
40
 
4.5%
38
 
4.3%
34
 
3.8%
25
 
2.8%
23
 
2.6%
19
 
2.1%
19
 
2.1%
Other values (114) 537
60.3%

dataType
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 12
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

String(): 166
DATE: 21
Integer(): 13
Float(): 6
Integer(code): 6
Other values (7): 9

Length

Max length: 13
Median length: 8
Mean length: 7.8371041
Min length: 4

Unique

Unique: 5
Unique (%): 2.3%

Sample

1st row: DATE
2nd row: Integer()
3rd row: String()
4th row: DATE
5th row: DATE

Common Values

Value             Count  Frequency (%)
String()            166  75.1%
DATE                 21   9.5%
Integer()            13   5.9%
Float()               6   2.7%
Integer(code)         6   2.7%
Float(51)             2   0.9%
Integer(1)            2   0.9%
String(code)          1   0.5%
Float(62)             1   0.5%
String(4000)          1   0.5%
Other values (2)      2   0.9%

Length

Histogram of lengths of the category
Value         Count  Frequency (%)
string          166  75.1%
date             21   9.5%
integer          13   5.9%
float             7   3.2%
integer(code      6   2.7%
float(51          2   0.9%
integer(1         2   0.9%
string(code       1   0.5%
float(62          1   0.5%
string(4000       1   0.5%

colDesc
Text

Distinct: 219
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Length

Max length: 54
Median length: 25
Mean length: 13.40724
Min length: 5

Characters and Unicode

Total characters: 2963
Distinct characters: 232
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 217
Unique (%): 98.2%

Sample

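1st row: First outpatient visit date at the liver cancer center
2nd row: Age at pancreatic cancer diagnosis
3rd row: Pancreatic cancer surgery name
4th row: Start date of first chemotherapy treatment
5th row: Start date of first radiotherapy treatment

Value                 Count  Frequency (%)
Presence/absence         44   6.1%
Surgery                  21   2.9%
Details                  21   2.9%
(not preserved)          17   2.4%
Other                    16   2.2%
Past medical history     14   1.9%
Malignancy grade         13   1.8%
Pancreatic cancer        11   1.5%
First                    10   1.4%
(not preserved)          10   1.4%
Other values (258)      542  75.4%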

Most occurring characters

ValueCountFrequency (%)
498
 
16.8%
88
 
3.0%
73
 
2.5%
54
 
1.8%
49
 
1.7%
48
 
1.6%
48
 
1.6%
) 46
 
1.6%
( 46
 
1.6%
46
 
1.6%
Other values (222) 1967
66.4%

Most occurring categories

Value              Count  Frequency (%)
Other Letter        2051  69.2%
Space Separator      498  16.8%
Uppercase Letter     255   8.6%
Other Punctuation     52   1.8%
Close Punctuation     46   1.6%
Open Punctuation      46   1.6%
Decimal Number        11   0.4%
Dash Punctuation       3   0.1%
Other Number           1  < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
88
 
4.3%
73
 
3.6%
54
 
2.6%
49
 
2.4%
48
 
2.3%
48
 
2.3%
46
 
2.2%
45
 
2.2%
43
 
2.1%
42
 
2.0%
Other values (189) 1515
73.9%
Uppercase Letter
Value              Count  Frequency (%)
C                     40  15.7%
L                     26  10.2%
T                     25   9.8%
E                     22   8.6%
A                     21   8.2%
I                     20   7.8%
N                     14   5.5%
R                     14   5.5%
O                     13   5.1%
S                     11   4.3%
Other values (10)     49  19.2%

Decimal Number
Value  Count  Frequency (%)
2          3  27.3%
1          3  27.3%
4          2  18.2%
3          2  18.2%
0          1   9.1%

Other Punctuation
Value  Count  Frequency (%)
'         26  50.0%
/         24  46.2%
,          2   3.8%

Space Separator
Value    Count  Frequency (%)
(space)    498  100.0%

Close Punctuation
Value  Count  Frequency (%)
)         46  100.0%

Open Punctuation
Value  Count  Frequency (%)
(         46  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-          3  100.0%

Other Number
Value  Count  Frequency (%)
²          1  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul   2051  69.2%
Common    657  22.2%
Latin     255   8.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
88
 
4.3%
73
 
3.6%
54
 
2.6%
49
 
2.4%
48
 
2.3%
48
 
2.3%
46
 
2.2%
45
 
2.2%
43
 
2.1%
42
 
2.0%
Other values (189) 1515
73.9%
Latin
Value              Count  Frequency (%)
C                     40  15.7%
L                     26  10.2%
T                     25   9.8%
E                     22   8.6%
A                     21   8.2%
I                     20   7.8%
N                     14   5.5%
R                     14   5.5%
O                     13   5.1%
S                     11   4.3%
Other values (10)     49  19.2%

Common
Value             Count  Frequency (%)
(space)             498  75.8%
)                    46   7.0%
(                    46   7.0%
'                    26   4.0%
/                    24   3.7%
2                     3   0.5%
-                     3   0.5%
1                     3   0.5%
,                     2   0.3%
4                     2   0.3%
Other values (3)      4   0.6%

Most occurring blocks

Value   Count  Frequency (%)
Hangul   2051  69.2%
ASCII     911  30.7%
None        1  < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)              498  54.7%
)                     46   5.0%
(                     46   5.0%
C                     40   4.4%
L                     26   2.9%
'                     26   2.9%
T                     25   2.7%
/                     24   2.6%
E                     22   2.4%
A                     21   2.3%
Other values (22)    137  15.0%
Hangul
ValueCountFrequency (%)
88
 
4.3%
73
 
3.6%
54
 
2.6%
49
 
2.4%
48
 
2.3%
48
 
2.3%
46
 
2.2%
45
 
2.2%
43
 
2.1%
42
 
2.0%
Other values (189) 1515
73.9%
None
Value  Count  Frequency (%)
²          1  100.0%

colCnt
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 23
Distinct (%): 60.5%
Missing: 183
Missing (%): 82.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 8048.2895
Minimum: 14
Maximum: 85076
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 14
5th percentile: 48.3
Q1: 514.75
Median: 2269.5
Q3: 2321
95th percentile: 82753.8
Maximum: 85076
Range: 85062
Interquartile range (IQR): 1806.25

Descriptive statistics

Standard deviation: 22613.591
Coefficient of variation (CV): 2.8097388
Kurtosis: 9.0122344
Mean: 8048.2895
Median Absolute Deviation (MAD): 1301.5
Skewness: 3.2406059
Sum: 305835
Variance: 5.1137451 × 10^8
Monotonicity: Not monotonic
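
Because 183 of the 221 colCnt values are missing, the statistics above are based on the 38 values that are present. The sketch below cross-checks the reported figures against one another under that reading; it only re-uses numbers printed in this section, and the relationships it tests (for example that Distinct (%) is taken over non-missing rows) are assumptions that the arithmetic happens to support:

```python
import math

rows, missing = 221, 183
present = rows - missing              # non-missing colCnt values

distinct_pct = 100 * 23 / present     # Distinct (%) over non-missing rows
mean = 305835 / present               # Sum / number of present values
cv = 22613.591 / mean                 # standard deviation / mean
variance = 22613.591 ** 2             # standard deviation squared
iqr = 2321 - 514.75                   # Q3 - Q1

print(present, round(distinct_pct, 1), round(mean, 4), round(cv, 7), iqr)
```

The large gap between the mean (about 8048) and the median (2269.5) reflects the few very large counts (82344 and 85076) in an otherwise small-valued column, which is also what the skewness of 3.24 indicates.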
Histogram with fixed size bins (bins=23)
Value              Count  Frequency (%)
2321                  13   5.9%
85076                  2   0.9%
4156                   2   0.9%
995                    2   0.9%
401                    1   0.5%
1427                   1   0.5%
941                    1   0.5%
68                     1   0.5%
512                    1   0.5%
564                    1   0.5%
Other values (13)     13   5.9%
(Missing)            183  82.8%

Value  Count  Frequency (%)
14         1  0.5%
16         1  0.5%
54         1  0.5%
62         1  0.5%
68         1  0.5%
130        1  0.5%
401        1  0.5%
426        1  0.5%
427        1  0.5%
512        1  0.5%

Value  Count  Frequency (%)
85076      2  0.9%
82344      1  0.5%
4156       2  0.9%
2321      13  5.9%
2320       1  0.5%
2219       1  0.5%
2058       1  0.5%
1427       1  0.5%
995        2  0.9%
941        1  0.5%

dispFormat
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

<NA>: 60
Text: 43
Y : present / N : absent: 32
Number: 25
YYYY-MM-DD: 19
Other values (7): 42

Length

Max length: 15
Median length: 13
Mean length: 6.6877828
Min length: 2

Unique

Unique: 4
Unique (%): 1.8%

Sample

1st row: YYYY-MM-DD
2nd row: Number
3rd row: Text
4th row: YYYY-MM-DD
5th row: YYYY-MM-DD

Common Values

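Value                                              Count  Frequency (%)
<NA>                                                  60  27.1%
Text [텍스트]                                         43  19.5%
Y : present / N : absent [Y : 유 / N : 무]            32  14.5%
Number [숫자]                                         25  11.3%
YYYY-MM-DD                                            19   8.6%
Y, Y |N, N                                            13   5.9%
Y present |N absent [Y 유 |N 무]                      13   5.9%
Y : internal / N : external [Y : 내부 / N : 외부]     12   5.4%
In-hospital test code [원내검사코드]                   1   0.5%
Free text [Free 텍스트]                                1   0.5%
Other values (2)                                       2   0.9%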

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
(not preserved)     132  23.3%
y                    84  14.8%
n                    84  14.8%
na                   60  10.6%
(not preserved)      45   7.9%
(not preserved)      45   7.9%
text                 44   7.8%
number               25   4.4%
yyyy-mm-dd           20   3.5%
internal             12   2.1%
Other values (5)     16   2.8%

Interactions


Correlations

            NUM    gpId   gpNm   tblId  tblNm  dataType  colCnt  dispFormat
NUM         1.000  0.915  0.915  0.984  0.980  0.433     0.963   0.758
gpId        0.915  1.000  1.000  1.000  0.999  0.638     1.000   0.819
gpNm        0.915  1.000  1.000  1.000  0.999  0.638     1.000   0.819
tblId       0.984  1.000  1.000  1.000  1.000  0.741     1.000   0.936
tblNm       0.980  0.999  0.999  1.000  1.000  0.745     1.000   0.936
dataType    0.433  0.638  0.638  0.741  0.745  1.000     0.000   0.882
colCnt      0.963  1.000  1.000  1.000  1.000  0.000     1.000   0.000
dispFormat  0.758  0.819  0.819  0.936  0.936  0.882     0.000   1.000

            dataType  tblNm  dispFormat  gpId   gpNm   tblId
dataType    1.000     0.334  0.665       0.309  0.309  0.332
tblNm       0.334     1.000  0.639       0.940  0.940  0.997
dispFormat  0.665     0.639  1.000       0.506  0.506  0.639
gpId        0.309     0.940  0.506       1.000  1.000  0.948
gpNm        0.309     0.940  0.506       1.000  1.000  0.948
tblId       0.332     0.997  0.639       0.948  0.948  1.000

            NUM    colCnt  gpId   gpNm   tblId  tblNm  dataType  dispFormat
NUM         1.000  0.017   0.678  0.678  0.812  0.810  0.196     0.551
colCnt      0.017  1.000   0.957  0.957  0.866  0.866  0.000     0.000
gpId        0.678  0.957   1.000  1.000  0.948  0.940  0.309     0.506
gpNm        0.678  0.957   1.000  1.000  0.948  0.940  0.309     0.506
tblId       0.812  0.866   0.948  0.948  1.000  0.997  0.332     0.639
tblNm       0.810  0.866   0.940  0.940  0.997  1.000  0.334     0.639
dataType    0.196  0.000   0.309  0.309  0.332  0.334  1.000     0.665
dispFormat  0.551  0.000   0.506  0.506  0.639  0.639  0.665     1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness by eye.

Sample

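,NUM,gpId,gpNm,tblId,tblNm,colId,colNm,dataType,colDesc,colCnt,dispFormat
0,1,PNC_SUMMARY,Summary,PNC_SUMMARY_PTIF_V,Patient info,FRMD_YMD,First visit date,DATE,First outpatient visit date at the liver cancer center,<NA>,YYYY-MM-DD
1,2,PNC_SUMMARY,Summary,PNC_SUMMARY_PTIF_V,Patient info,DIAG_AGE,Age at diagnosis,Integer(),Age at pancreatic cancer diagnosis,<NA>,Number
2,3,PNC_SUMMARY,Summary,PNC_SUMMARY_PTIF_V,Patient info,OPRT_NM,Surgery name,String(),Pancreatic cancer surgery name,<NA>,Text
3,4,PNC_SUMMARY,Summary,PNC_SUMMARY_PTIF_V,Patient info,ORD_YMD,Drug therapy start date,DATE,Start date of first chemotherapy treatment,<NA>,YYYY-MM-DD
4,5,PNC_SUMMARY,Summary,PNC_SUMMARY_PTIF_V,Patient info,FRST_TRTM_RSRV_YMD,Radiotherapy start date,DATE,Start date of first radiotherapy treatment,<NA>,YYYY-MM-DD
5,6,PNC_DIAG,Diagnosis information,RG_PNC_CNDX,Diagnosis information,CLNC_DIAG_CD,Clinical diagnosis code,String(code),Clinical diagnosis code for pancreatic cancer and other cancers,<NA>,In-hospital test code
6,7,PNC_DIAG,Diagnosis information,RG_PNC_CNDX,Diagnosis information,CLNC_DIAG_NM,Clinical diagnosis name,String(),Clinical diagnosis name for pancreatic cancer and other cancers,<NA>,Text
7,8,PNC_DIAG,Diagnosis information,RG_PNC_CNDX,Diagnosis information,DIAG_YMD,Diagnosis date,DATE,Diagnosis date for pancreatic cancer and other cancers,<NA>,YYYY-MM-DD
8,9,PNC_DIAG,Diagnosis information,PT_PNC_BDMS,Symptoms and body measurements,HT_MSRM_YMD,Height measurement date,DATE,Date of first height measurement after pancreatic cancer diagnosis,<NA>,YYYY-MM-DD
9,10,PNC_DIAG,Diagnosis information,PT_PNC_BDMS,Symptoms and body measurements,HT_VL,Height,Float(51),First height value after pancreatic cancer diagnosis,<NA>,Number

,NUM,gpId,gpNm,tblId,tblNm,colId,colNm,dataType,colDesc,colCnt,dispFormat
211,212,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_CNCR_YN,Past history: cancer (yes/no),String(),Past history of cancer (present/absent),2321,Y present |N absent
212,213,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_DEPR_YN,Past history: depression (yes/no),String(),Past history of depression (present/absent),2321,Y present |N absent
213,214,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_INSM_YN,Past history: insomnia (yes/no),String(),Past history of insomnia (present/absent),2321,Y present |N absent
214,215,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_CADZ_YN,Past history: heart disease (yes/no),String(),Past history of heart disease (present/absent),2321,Y present |N absent
215,216,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_CADZ_CMNT,Past history: heart disease details,String(),Details of past heart disease,68,Text
216,217,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_ETC_YN,Past history: other (yes/no),String(),Past history of other conditions (present/absent),2321,Y present |N absent
217,218,PNC_HLTH,Other health information,MR_PNC_HLTH_9,Past medical history,PHIS_ETC_CMNT,Past history: other details,String(),Details of other past history,941,Text
218,219,PNC_HLTH,Other health information,MR_PNC_HLTH_10,Symptoms/transfer information,MAIN_SYMP_YN,Chief symptom (present/absent),String(),Presence of a chief symptom at admission,995,Y present |N absent
219,220,PNC_HLTH,Other health information,MR_PNC_HLTH_10,Symptoms/transfer information,MAIN_SYMP_CMNT,Symptom details,String(),Details of the chief symptom at admission,1427,Text
220,221,PNC_HLTH,Other health information,MR_PNC_HLTH_10,Symptoms/transfer information,OUTS_DIAG_TRANS_YN,Transfer after diagnosis at another hospital,String(),Whether the patient was transferred after diagnosis at another hospital,2321,Y present |N absent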