Overview

Dataset statistics

Number of variables: 11
Number of observations: 382
Missing cells: 382
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 34.1 KiB
Average record size in memory: 91.3 B
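
The missing-cell percentage can be reproduced from the counts above: 382 missing cells out of 382 rows times 11 columns. A minimal sketch of that arithmetic follows; the function name `missing_cell_pct` is illustrative and is not part of the report or of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells, in percent, over all rows x columns."""
    total_cells = n_rows * n_cols
    return 100.0 * missing_cells / total_cells


# Figures taken from the dataset statistics above.
pct = missing_cell_pct(missing_cells=382, n_rows=382, n_cols=11)
print(f"{pct:.1f}%")
```

All 382 missing cells come from the single fully empty column dispFormat, which is why the share equals one column out of eleven.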

Variable types

Numeric: 2
Categorical: 5
Text: 3
Unsupported: 1

Dataset

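Description: Provides metadata for the liver cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048691/fileData.do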

Alerts

gpNm is highly overall correlated with NUM and 4 other fields [High correlation]
tblNm is highly overall correlated with NUM and 4 other fields [High correlation]
gpId is highly overall correlated with NUM and 4 other fields [High correlation]
tblId is highly overall correlated with NUM and 4 other fields [High correlation]
NUM is highly overall correlated with gpId and 3 other fields [High correlation]
colCnt is highly overall correlated with gpId and 3 other fields [High correlation]
dataType is highly imbalanced (50.6%) [Imbalance]
dispFormat has 382 (100.0%) missing values [Missing]
NUM has unique values [Unique]
dispFormat is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
colCnt has 82 (21.5%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 17:10:20.259944
Analysis finished: 2023-12-12 17:10:21.621350
Duration: 1.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 382
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 191.5
Minimum: 1
Maximum: 382
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 20.05
Q1: 96.25
Median: 191.5
Q3: 286.75
95-th percentile: 362.95
Maximum: 382
Range: 381
Interquartile range (IQR): 190.5
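
The quantiles above are consistent with linear interpolation over the integers 1 to 382. The sketch below assumes NUM is exactly that row counter (an assumption inferred from the minimum, maximum, distinct count and sum, not something the report states) and uses the standard-library `statistics.quantiles` with `method="inclusive"`.

```python
import statistics

num = list(range(1, 383))  # assumed: NUM is the row counter 1..382

# method="inclusive" interpolates linearly between the sample minimum and maximum.
q1, median, q3 = statistics.quantiles(num, n=4, method="inclusive")
twentieths = statistics.quantiles(num, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

value_range = max(num) - min(num)
iqr = q3 - q1
print(p5, q1, median, q3, p95, value_range, iqr)
```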

Descriptive statistics

Standard deviation: 110.41814
Coefficient of variation (CV): 0.57659606
Kurtosis: -1.2
Mean: 191.5
Median Absolute Deviation (MAD): 95.5
Skewness: 0
Sum: 73153
Variance: 12192.167
Monotonicity: Strictly increasing
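
As a cross-check, the descriptive statistics above can be recomputed under the same assumption that NUM holds the integers 1 to 382. This is a sketch using only the standard library; the variance uses the sample (n - 1) denominator, which is what reproduces the reported 12192.167.

```python
import statistics

num = list(range(1, 383))  # assumed: NUM is the row counter 1..382

mean = statistics.fmean(num)
variance = statistics.variance(num)   # sample variance (n - 1 denominator)
std = statistics.stdev(num)
cv = std / mean                       # coefficient of variation
med = statistics.median(num)
mad = statistics.median(abs(x - med) for x in num)  # median absolute deviation
strictly_increasing = all(a < b for a, b in zip(num, num[1:]))

print(sum(num), mean, round(variance, 3), round(std, 5), round(cv, 8), mad)
```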
Histogram with fixed size bins (bins=50)
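
Fixed-size binning splits the range between the minimum and maximum into equal-width intervals and counts the values in each. The sketch below is a plain-Python illustration of that idea, not the plotting code used by the report; `fixed_width_histogram` is a hypothetical helper, and the data are again assumed to be the integers 1 to 382.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Assumes the values are not all identical (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


counts = fixed_width_histogram(list(range(1, 383)), bins=50)
print(len(counts), sum(counts))
```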
Value | Count | Frequency (%)
1 | 1 | 0.3%
253 | 1 | 0.3%
262 | 1 | 0.3%
261 | 1 | 0.3%
260 | 1 | 0.3%
259 | 1 | 0.3%
258 | 1 | 0.3%
257 | 1 | 0.3%
256 | 1 | 0.3%
255 | 1 | 0.3%
Other values (372) | 372 | 97.4%

Value | Count | Frequency (%)
1 | 1 | 0.3%
2 | 1 | 0.3%
3 | 1 | 0.3%
4 | 1 | 0.3%
5 | 1 | 0.3%
6 | 1 | 0.3%
7 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
10 | 1 | 0.3%

Value | Count | Frequency (%)
382 | 1 | 0.3%
381 | 1 | 0.3%
380 | 1 | 0.3%
379 | 1 | 0.3%
378 | 1 | 0.3%
377 | 1 | 0.3%
376 | 1 | 0.3%
375 | 1 | 0.3%
374 | 1 | 0.3%
373 | 1 | 0.3%

gpId
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
_LVER_HLTH
79 
_LVER_SPR
77 
_LVER_OPRT
38 
_LVER_COMP
28 
_LVER_HEPA
23 
Other values (13)
137 

Length

Max length15
Median length10
Mean length10.670157
Min length9

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row_LVER_Summary
2nd row_LVER_Summary
3rd row_LVER_Summary
4th row_LVER_Summary
5th row_LVER_Summary

Common Values

Value | Count | Frequency (%)
_LVER_HLTH | 79 | 20.7%
_LVER_SPR | 77 | 20.2%
_LVER_OPRT | 38 | 9.9%
_LVER_COMP | 28 | 7.3%
_LVER_HEPA | 23 | 6.0%
_LVER_Summary | 18 | 4.7%
_LVER_RTX | 15 | 3.9%
_LVER_RLPS_MIST | 15 | 3.9%
_LVER_CNDX | 14 | 3.7%
_LVER_BX_INIT | 13 | 3.4%
Other values (8) | 62 | 16.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
lver_hlth 79
20.7%
lver_spr 77
20.2%
lver_oprt 38
9.9%
lver_comp 28
 
7.3%
lver_hepa 23
 
6.0%
lver_summary 18
 
4.7%
lver_rtx 15
 
3.9%
lver_rlps_mist 15
 
3.9%
lver_cndx 14
 
3.7%
lver_bx_init 13
 
3.4%
Other values (8) 62
16.2%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
기타건강정보
79 
외과병리보고서
77 
수술정보
38 
합병증
28 
기저간질환 정보
23 
Other values (13)
137 

Length

Max length12
Median length11
Mean length6.1806283
Min length3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowSummary
2nd rowSummary
3rd rowSummary
4th rowSummary
5th rowSummary

Common Values

Value | Count | Frequency (%)
기타건강정보 | 79 | 20.7%
외과병리보고서 | 77 | 20.2%
수술정보 | 38 | 9.9%
합병증 | 28 | 7.3%
기저간질환 정보 | 23 | 6.0%
Summary | 18 | 4.7%
방사선치료 | 15 | 3.9%
전이 및 재발 | 15 | 3.9%
진단정보 | 14 | 3.7%
병리검사 | 13 | 3.4%
Other values (8) | 62 | 16.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
기타건강정보 79
16.4%
외과병리보고서 77
15.9%
수술정보 38
 
7.9%
합병증 28
 
5.8%
27
 
5.6%
기저간질환 23
 
4.8%
정보 23
 
4.8%
summary 18
 
3.7%
방사선치료 15
 
3.1%
전이 15
 
3.1%
Other values (13) 140
29.0%

tblId
Categorical

HIGH CORRELATION 

Distinct: 33
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
_LVER_MR_HLTH
79 
_LVER_PE_SPR
44 
_LVER_PE_COMP
28 
_LVER_PE_SPR_SUB
23 
_LVER_PT_TRGT
 
18
Other values (28)
190 

Length

Max length18
Median length13
Mean length13.740838
Min length12

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row_LVER_PT_TRGT
2nd row_LVER_PT_TRGT
3rd row_LVER_PT_TRGT
4th row_LVER_PT_TRGT
5th row_LVER_PT_TRGT

Common Values

Value | Count | Frequency (%)
_LVER_MR_HLTH | 79 | 20.7%
_LVER_PE_SPR | 44 | 11.5%
_LVER_PE_COMP | 28 | 7.3%
_LVER_PE_SPR_SUB | 23 | 6.0%
_LVER_PT_TRGT | 18 | 4.7%
_LVER_PE_RTX | 15 | 3.9%
_LVER_PE_OPRT | 14 | 3.7%
_LVER_PE_CHMO | 12 | 3.1%
_LVER_PE_OPRT_LIST | 9 | 2.4%
_LVER_PE_BX_INIT | 9 | 2.4%
Other values (23) | 131 | 34.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
lver_mr_hlth 79
20.7%
lver_pe_spr 44
 
11.5%
lver_pe_comp 28
 
7.3%
lver_pe_spr_sub 23
 
6.0%
lver_pt_trgt 18
 
4.7%
lver_pe_rtx 15
 
3.9%
lver_pe_oprt 14
 
3.7%
lver_pe_chmo 12
 
3.1%
lver_pe_oprt_list 9
 
2.4%
lver_pe_bx_init 9
 
2.4%
Other values (23) 131
34.3%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 33
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
간암 환자건강정보
79 
간암 외과병리
44 
간암 합병증
28 
간암 외과병리내용
23 
간암 대상자
 
18
Other values (28)
190 

Length

Max length25
Median length19
Mean length9.3743455
Min length6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row간암 대상자
2nd row간암 대상자
3rd row간암 대상자
4th row간암 대상자
5th row간암 대상자

Common Values

Value | Count | Frequency (%)
간암 환자건강정보 | 79 | 20.7%
간암 외과병리 | 44 | 11.5%
간암 합병증 | 28 | 7.3%
간암 외과병리내용 | 23 | 6.0%
간암 대상자 | 18 | 4.7%
간암 방사선치료 | 15 | 3.9%
간암 수술정보 | 14 | 3.7%
간암 항암치료 | 12 | 3.1%
간암 수술기록 | 9 | 2.4%
간암 Initial 병리검사 | 9 | 2.4%
Other values (23) | 131 | 34.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
간암 359
40.7%
환자건강정보 79
 
9.0%
외과병리 44
 
5.0%
initial 30
 
3.4%
합병증 28
 
3.2%
외과병리내용 23
 
2.6%
대상자 18
 
2.0%
정보 18
 
2.0%
관련 18
 
2.0%
방사선치료 15
 
1.7%
Other values (34) 250
28.3%

colId
Text

Distinct: 337
Distinct (%): 88.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Length

Max length18
Median length16
Mean length11.39267
Min length5

Characters and Unicode

Total characters: 4352
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
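
One of those character properties, the general category, is available in Python's standard `unicodedata` module. The sketch below tallies categories over the five sample colId values shown in this report; `category_counts` is a hypothetical helper, and the report's own script and block breakdowns are not reproduced here because the standard library does not expose them.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Tally Unicode general categories (e.g. 'Lu', 'Pc') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


sample = ["PT_NM", "SEX_CD", "BRTH_YMD", "DIAG_AGE", "FRMD_YMD"]
counts = category_counts(sample)
print(counts)
```

Here `Lu` is Uppercase Letter and `Pc` is Connector Punctuation (the underscore), two of the three categories the report lists for this variable.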

Unique

Unique: 305
Unique (%): 79.8%

Sample

1st rowPT_NM
2nd rowSEX_CD
3rd rowBRTH_YMD
4th rowDIAG_AGE
5th rowFRMD_YMD
ValueCountFrequency (%)
path_no 6
 
1.6%
oprt_ymd 5
 
1.3%
ord_seq 3
 
0.8%
cexm_kind_nm 3
 
0.8%
cexm_nm 3
 
0.8%
cexm_rslt_cmnt 3
 
0.8%
cexm_cd 3
 
0.8%
ctx_estm_cmnt 3
 
0.8%
stag_rcrd_ymd 2
 
0.5%
lymp_seq 2
 
0.5%
Other values (327) 349
91.4%

Most occurring characters

ValueCountFrequency (%)
_ 688
15.8%
T 395
 
9.1%
M 349
 
8.0%
N 346
 
8.0%
C 289
 
6.6%
R 234
 
5.4%
S 229
 
5.3%
D 210
 
4.8%
A 196
 
4.5%
H 160
 
3.7%
Other values (23) 1256
28.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3615
83.1%
Connector Punctuation 688
 
15.8%
Decimal Number 49
 
1.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 395
 
10.9%
M 349
 
9.7%
N 346
 
9.6%
C 289
 
8.0%
R 234
 
6.5%
S 229
 
6.3%
D 210
 
5.8%
A 196
 
5.4%
H 160
 
4.4%
Y 156
 
4.3%
Other values (16) 1051
29.1%
Decimal Number
ValueCountFrequency (%)
1 15
30.6%
2 15
30.6%
0 12
24.5%
7 4
 
8.2%
3 2
 
4.1%
4 1
 
2.0%
Connector Punctuation
ValueCountFrequency (%)
_ 688
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3615
83.1%
Common 737
 
16.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 395
 
10.9%
M 349
 
9.7%
N 346
 
9.6%
C 289
 
8.0%
R 234
 
6.5%
S 229
 
6.3%
D 210
 
5.8%
A 196
 
5.4%
H 160
 
4.4%
Y 156
 
4.3%
Other values (16) 1051
29.1%
Common
ValueCountFrequency (%)
_ 688
93.4%
1 15
 
2.0%
2 15
 
2.0%
0 12
 
1.6%
7 4
 
0.5%
3 2
 
0.3%
4 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4352
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 688
15.8%
T 395
 
9.1%
M 349
 
8.0%
N 346
 
8.0%
C 289
 
6.6%
R 234
 
5.4%
S 229
 
5.3%
D 210
 
4.8%
A 196
 
4.5%
H 160
 
3.7%
Other values (23) 1256
28.9%

colNm
Text

Distinct: 329
Distinct (%): 86.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Length

Max length64
Median length39
Mean length13.193717
Min length2

Characters and Unicode

Total characters: 5040
Distinct characters: 219
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 303
Unique (%): 79.3%

Sample

1st row성명
2nd row성별코드
3rd row생년월일
4th row진단시나이
5th row첫진료일자
ValueCountFrequency (%)
stage 33
 
4.3%
grade 19
 
2.5%
pathology 16
 
2.1%
항바이러스제 12
 
1.6%
ajcc 12
 
1.6%
경구용 12
 
1.6%
of 11
 
1.4%
histologic 10
 
1.3%
lymph 7
 
0.9%
node 7
 
0.9%
Other values (357) 635
82.0%

Most occurring characters

ValueCountFrequency (%)
398
 
7.9%
e 257
 
5.1%
a 239
 
4.7%
t 228
 
4.5%
i 218
 
4.3%
o 216
 
4.3%
r 165
 
3.3%
n 148
 
2.9%
s 139
 
2.8%
g 115
 
2.3%
Other values (209) 2917
57.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2407
47.8%
Other Letter 1634
32.4%
Space Separator 398
 
7.9%
Uppercase Letter 292
 
5.8%
Open Punctuation 99
 
2.0%
Close Punctuation 99
 
2.0%
Decimal Number 51
 
1.0%
Other Punctuation 50
 
1.0%
Dash Punctuation 9
 
0.2%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%
Lowercase Letter
ValueCountFrequency (%)
e 257
10.7%
a 239
9.9%
t 228
 
9.5%
i 218
 
9.1%
o 216
 
9.0%
r 165
 
6.9%
n 148
 
6.1%
s 139
 
5.8%
g 115
 
4.8%
l 107
 
4.4%
Other values (16) 575
23.9%
Uppercase Letter
ValueCountFrequency (%)
C 58
19.9%
P 28
 
9.6%
S 23
 
7.9%
A 20
 
6.8%
I 19
 
6.5%
N 17
 
5.8%
L 15
 
5.1%
B 14
 
4.8%
J 12
 
4.1%
T 12
 
4.1%
Other values (11) 74
25.3%
Other Punctuation
ValueCountFrequency (%)
/ 18
36.0%
: 17
34.0%
, 9
18.0%
% 3
 
6.0%
. 3
 
6.0%
Decimal Number
ValueCountFrequency (%)
1 16
31.4%
2 16
31.4%
0 12
23.5%
7 4
 
7.8%
3 3
 
5.9%
Space Separator
ValueCountFrequency (%)
398
100.0%
Open Punctuation
ValueCountFrequency (%)
( 99
100.0%
Close Punctuation
ValueCountFrequency (%)
) 99
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2699
53.6%
Hangul 1634
32.4%
Common 707
 
14.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%
Latin
ValueCountFrequency (%)
e 257
 
9.5%
a 239
 
8.9%
t 228
 
8.4%
i 218
 
8.1%
o 216
 
8.0%
r 165
 
6.1%
n 148
 
5.5%
s 139
 
5.2%
g 115
 
4.3%
l 107
 
4.0%
Other values (37) 867
32.1%
Common
ValueCountFrequency (%)
398
56.3%
( 99
 
14.0%
) 99
 
14.0%
/ 18
 
2.5%
: 17
 
2.4%
1 16
 
2.3%
2 16
 
2.3%
0 12
 
1.7%
, 9
 
1.3%
- 9
 
1.3%
Other values (5) 14
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3406
67.6%
Hangul 1634
32.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
398
 
11.7%
e 257
 
7.5%
a 239
 
7.0%
t 228
 
6.7%
i 218
 
6.4%
o 216
 
6.3%
r 165
 
4.8%
n 148
 
4.3%
s 139
 
4.1%
g 115
 
3.4%
Other values (52) 1283
37.7%
Hangul
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%

dataType
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
STRING
303 
DATE
44 
INTEGER
31 
<NA>
 
4

Length

Max length7
Median length6
Mean length5.8298429
Min length4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowSTRING
2nd rowSTRING
3rd rowDATE
4th rowINTEGER
5th rowDATE

Common Values

Value | Count | Frequency (%)
STRING | 303 | 79.3%
DATE | 44 | 11.5%
INTEGER | 31 | 8.1%
<NA> | 4 | 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
string 303
79.3%
date 44
 
11.5%
integer 31
 
8.1%
na 4
 
1.0%
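
The 50.6% imbalance figure reported for dataType is consistent with a normalised-entropy score, one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy. The sketch below shows that calculation under the assumption that this is the form of the score; the report itself does not state the formula, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy); 0 = balanced, 1 = one class only.

    Assumes at least two classes, otherwise the maximum entropy is zero.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    return 1.0 - entropy / max_entropy


score = imbalance_score([303, 44, 31, 4])  # STRING, DATE, INTEGER, <NA>
print(f"{score:.1%}")
```

With STRING covering 79.3% of the rows, the entropy is roughly half of the two bits available to four classes, which gives a score near 0.5.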

colDesc
Text

Distinct: 329
Distinct (%): 86.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Length

Max length64
Median length39
Mean length13.193717
Min length2

Characters and Unicode

Total characters: 5040
Distinct characters: 219
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 303
Unique (%): 79.3%

Sample

1st row성명
2nd row성별코드
3rd row생년월일
4th row진단시나이
5th row첫진료일자
ValueCountFrequency (%)
stage 33
 
4.3%
grade 19
 
2.5%
pathology 16
 
2.1%
항바이러스제 12
 
1.6%
ajcc 12
 
1.6%
경구용 12
 
1.6%
of 11
 
1.4%
histologic 10
 
1.3%
lymph 7
 
0.9%
node 7
 
0.9%
Other values (357) 635
82.0%

Most occurring characters

ValueCountFrequency (%)
398
 
7.9%
e 257
 
5.1%
a 239
 
4.7%
t 228
 
4.5%
i 218
 
4.3%
o 216
 
4.3%
r 165
 
3.3%
n 148
 
2.9%
s 139
 
2.8%
g 115
 
2.3%
Other values (209) 2917
57.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2407
47.8%
Other Letter 1634
32.4%
Space Separator 398
 
7.9%
Uppercase Letter 292
 
5.8%
Open Punctuation 99
 
2.0%
Close Punctuation 99
 
2.0%
Decimal Number 51
 
1.0%
Other Punctuation 50
 
1.0%
Dash Punctuation 9
 
0.2%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%
Lowercase Letter
ValueCountFrequency (%)
e 257
10.7%
a 239
9.9%
t 228
 
9.5%
i 218
 
9.1%
o 216
 
9.0%
r 165
 
6.9%
n 148
 
6.1%
s 139
 
5.8%
g 115
 
4.8%
l 107
 
4.4%
Other values (16) 575
23.9%
Uppercase Letter
ValueCountFrequency (%)
C 58
19.9%
P 28
 
9.6%
S 23
 
7.9%
A 20
 
6.8%
I 19
 
6.5%
N 17
 
5.8%
L 15
 
5.1%
B 14
 
4.8%
J 12
 
4.1%
T 12
 
4.1%
Other values (11) 74
25.3%
Other Punctuation
ValueCountFrequency (%)
/ 18
36.0%
: 17
34.0%
, 9
18.0%
% 3
 
6.0%
. 3
 
6.0%
Decimal Number
ValueCountFrequency (%)
1 16
31.4%
2 16
31.4%
0 12
23.5%
7 4
 
7.8%
3 3
 
5.9%
Space Separator
ValueCountFrequency (%)
398
100.0%
Open Punctuation
ValueCountFrequency (%)
( 99
100.0%
Close Punctuation
ValueCountFrequency (%)
) 99
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2699
53.6%
Hangul 1634
32.4%
Common 707
 
14.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%
Latin
ValueCountFrequency (%)
e 257
 
9.5%
a 239
 
8.9%
t 228
 
8.4%
i 218
 
8.1%
o 216
 
8.0%
r 165
 
6.1%
n 148
 
5.5%
s 139
 
5.2%
g 115
 
4.3%
l 107
 
4.0%
Other values (37) 867
32.1%
Common
ValueCountFrequency (%)
398
56.3%
( 99
 
14.0%
) 99
 
14.0%
/ 18
 
2.5%
: 17
 
2.4%
1 16
 
2.3%
2 16
 
2.3%
0 12
 
1.7%
, 9
 
1.3%
- 9
 
1.3%
Other values (5) 14
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3406
67.6%
Hangul 1634
32.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
398
 
11.7%
e 257
 
7.5%
a 239
 
7.0%
t 228
 
6.7%
i 218
 
6.4%
o 216
 
6.3%
r 165
 
4.8%
n 148
 
4.3%
s 139
 
4.1%
g 115
 
3.4%
Other values (52) 1283
37.7%
Hangul
ValueCountFrequency (%)
70
 
4.3%
65
 
4.0%
58
 
3.5%
47
 
2.9%
46
 
2.8%
44
 
2.7%
42
 
2.6%
42
 
2.6%
36
 
2.2%
35
 
2.1%
Other values (147) 1149
70.3%

colCnt
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 142
Distinct (%): 37.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74961.393
Minimum: 0
Maximum: 5061656
Zeros: 82
Zeros (%): 21.5%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 36.75
Median: 1688
Q3: 2257
95-th percentile: 63745.3
Maximum: 5061656
Range: 5061656
Interquartile range (IQR): 2220.25

Descriptive statistics

Standard deviation: 572154.62
Coefficient of variation (CV): 7.6326573
Kurtosis: 71.920694
Mean: 74961.393
Median Absolute Deviation (MAD): 1504
Skewness: 8.5609433
Sum: 28635252
Variance: 3.2736091 × 10^11
Monotonicity: Not monotonic
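
Several of the colCnt figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the row count. The sketch below re-derives them from the reported (rounded) values only; it does not have access to the underlying column, so small rounding differences are expected.

```python
# Reported values for colCnt, copied from this section.
mean = 74961.393
std = 572154.62
n_rows, zeros, total = 382, 82, 28635252

cv = std / mean                    # coefficient of variation
variance = std ** 2                # should be close to 3.2736091e11
zeros_pct = 100 * zeros / n_rows   # share of rows with colCnt == 0
mean_from_sum = total / n_rows

print(round(cv, 4), f"{variance:.4e}", round(zeros_pct, 1), round(mean_from_sum, 3))
```

A coefficient of variation above 7, together with a median of 1688 against a mean near 75,000, reflects a handful of very large counts dominating the mean.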
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 82 | 21.5%
2104 | 37 | 9.7%
2102 | 12 | 3.1%
1728 | 11 | 2.9%
2257 | 10 | 2.6%
903 | 8 | 2.1%
13049 | 7 | 1.8%
50443 | 7 | 1.8%
881 | 6 | 1.6%
543 | 6 | 1.6%
Other values (132) | 196 | 51.3%

Value | Count | Frequency (%)
0 | 82 | 21.5%
1 | 1 | 0.3%
2 | 2 | 0.5%
3 | 2 | 0.5%
4 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
15 | 1 | 0.3%
16 | 1 | 0.3%
24 | 1 | 0.3%

Value | Count | Frequency (%)
5061656 | 4 | 1.0%
4892526 | 1 | 0.3%
261251 | 4 | 1.0%
254070 | 1 | 0.3%
105528 | 4 | 1.0%
102459 | 1 | 0.3%
101029 | 1 | 0.3%
65605 | 3 | 0.8%
63806 | 1 | 0.3%
62592 | 1 | 0.3%

dispFormat
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 382
Missing (%): 100.0%
Memory size: 3.5 KiB

Interactions


Correlations

         | NUM   | gpId  | gpNm  | tblId | tblNm | dataType | colCnt
NUM      | 1.000 | 0.955 | 0.955 | 0.976 | 0.976 | 0.378    | 0.409
gpId     | 0.955 | 1.000 | 1.000 | 1.000 | 1.000 | 0.497    | 1.000
gpNm     | 0.955 | 1.000 | 1.000 | 1.000 | 1.000 | 0.497    | 1.000
tblId    | 0.976 | 1.000 | 1.000 | 1.000 | 1.000 | 0.623    | 1.000
tblNm    | 0.976 | 1.000 | 1.000 | 1.000 | 1.000 | 0.623    | 1.000
dataType | 0.378 | 0.497 | 0.497 | 0.623 | 0.623 | 1.000    | 0.000
colCnt   | 0.409 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000    | 1.000

         | gpNm  | dataType | tblNm | gpId  | tblId
gpNm     | 1.000 | 0.259    | 0.979 | 1.000 | 0.979
dataType | 0.259 | 1.000    | 0.350 | 0.259 | 0.350
tblNm    | 0.979 | 0.350    | 1.000 | 0.979 | 1.000
gpId     | 1.000 | 0.259    | 0.979 | 1.000 | 0.979
tblId    | 0.979 | 0.350    | 1.000 | 0.979 | 1.000

         | NUM    | colCnt | gpId  | gpNm  | tblId | tblNm | dataType
NUM      | 1.000  | -0.214 | 0.786 | 0.786 | 0.819 | 0.819 | 0.243
colCnt   | -0.214 | 1.000  | 0.979 | 0.979 | 0.958 | 0.958 | 0.000
gpId     | 0.786  | 0.979  | 1.000 | 1.000 | 0.979 | 0.979 | 0.259
gpNm     | 0.786  | 0.979  | 1.000 | 1.000 | 0.979 | 0.979 | 0.259
tblId    | 0.819  | 0.958  | 0.979 | 0.979 | 1.000 | 1.000 | 0.350
tblNm    | 0.819  | 0.958  | 0.979 | 0.979 | 1.000 | 1.000 | 0.350
dataType | 0.243  | 0.000  | 0.259 | 0.259 | 0.350 | 0.350 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | PT_NM | 성명 | STRING | 성명 | 13049 | <NA>
1 | 2 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | SEX_CD | 성별코드 | STRING | 성별코드 | 13049 | <NA>
2 | 3 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | BRTH_YMD | 생년월일 | DATE | 생년월일 | 13049 | <NA>
3 | 4 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | DIAG_AGE | 진단시나이 | INTEGER | 진단시나이 | 13049 | <NA>
4 | 5 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | FRMD_YMD | 첫진료일자 | DATE | 첫진료일자 | 13049 | <NA>
5 | 6 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | DIAG_CD | 첫진단코드 | STRING | 첫진단코드 | 13049 | <NA>
6 | 7 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | DIAG_ENM | 첫진단한글명 | STRING | 첫진단한글명 | 13049 | <NA>
7 | 8 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | FMDR_ID | 주치의ID | STRING | 주치의ID | 12899 | <NA>
8 | 9 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | FMDR_NM | 주치의명 | STRING | 주치의명 | 12888 | <NA>
9 | 10 | _LVER_Summary | Summary | _LVER_PT_TRGT | 간암 대상자 | OPRT_YMD | 첫수술일자 | DATE | 첫수술일자 | 2102 | <NA>

index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
372 | 373 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_CNCR_YN | 과거병력암여부 | STRING | 과거병력암여부 | 2104 | <NA>
373 | 374 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_DEPR_YN | 과거병력우울증여부 | STRING | 과거병력우울증여부 | 2104 | <NA>
374 | 375 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_INSM_YN | 과거병력불면증여부 | STRING | 과거병력불면증여부 | 2104 | <NA>
375 | 376 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_CADZ_YN | 과거병력심장질환여부 | STRING | 과거병력심장질환여부 | 2104 | <NA>
376 | 377 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_CADZ_CMNT | 과거병력심장질환내용 | STRING | 과거병력심장질환내용 | 39 | <NA>
377 | 378 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_ETC_YN | 과거병력기타여부 | STRING | 과거병력기타여부 | 2104 | <NA>
378 | 379 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | PHIS_ETC_CMNT | 과거병력기타내용 | STRING | 과거병력기타내용 | 678 | <NA>
379 | 380 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | MAIN_SYMP_YN | 주증상유무 | STRING | 주증상유무 | 889 | <NA>
380 | 381 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | MAIN_SYMP_CMNT | 주증상내용 | STRING | 주증상내용 | 1038 | <NA>
381 | 382 | _LVER_HLTH | 기타건강정보 | _LVER_MR_HLTH | 간암 환자건강정보 | OUTS_DIAG_TRANS_YN | 타병원진단후전원여부 | STRING | 타병원진단후전원여부 | 2104 | <NA>