Overview

Dataset statistics

Number of variables: 11
Number of observations: 284
Missing cells: 284
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 25.4 KiB
Average record size in memory: 91.5 B
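
The missing-cell percentage follows from the counts above: 284 missing cells out of 284 × 11 cells in total. A minimal Python check of that arithmetic (the variable names are illustrative and not part of the report):

```python
# Counts reported in the dataset statistics.
n_variables = 11
n_observations = 284
n_missing_cells = 284

total_cells = n_variables * n_observations
missing_pct = 100 * n_missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Since the number of missing cells equals the number of observations, the share is exactly 1/11, which is what a single fully empty column would produce.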

Variable types

Numeric: 2
Categorical: 5
Text: 3
Unsupported: 1

Dataset

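Description: Provides metadata for the colorectal cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048689/fileData.do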

Alerts

tblId is highly overall correlated with NUM and 4 other fields (High correlation)
tblNm is highly overall correlated with NUM and 4 other fields (High correlation)
gpNm is highly overall correlated with NUM and 4 other fields (High correlation)
gpId is highly overall correlated with NUM and 4 other fields (High correlation)
NUM is highly overall correlated with colCnt and 4 other fields (High correlation)
colCnt is highly overall correlated with NUM and 4 other fields (High correlation)
dispFormat has 284 (100.0%) missing values (Missing)
NUM has unique values (Unique)
dispFormat is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
colCnt has 98 (34.5%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 13:10:52.065721
Analysis finished: 2023-12-12 13:10:53.376701
Duration: 1.31 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
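
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, assuming the registry metadata has been downloaded as a CSV file. The function name, file paths and report title are hypothetical, and the configuration that was actually used is in config.json, not shown here.

```python
def build_profile_report(csv_path: str, html_path: str) -> None:
    """Profile a CSV file and write the resulting report as HTML."""
    # Third-party imports are kept inside the function so that this module
    # can still be imported where pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Colorectal cancer registry metadata")
    report.to_file(html_path)
```

A call such as `build_profile_report("clrc_registry_meta.csv", "report.html")` would then write the report. Both file names are placeholders.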

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 284
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.5
Minimum: 1
Maximum: 284
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 15.15
Q1: 71.75
Median: 142.5
Q3: 213.25
95th percentile: 269.85
Maximum: 284
Range: 283
Interquartile range (IQR): 141.5
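
These quantiles are what linear interpolation gives for the sequence 1..284. The sketch below assumes NUM is simply the row counter, which the minimum, maximum, distinct count and sample rows suggest but the report does not state outright. The helper `percentile` is illustrative, not the profiler's own code.

```python
def percentile(sorted_values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


num = list(range(1, 285))  # assumed content of NUM: 1, 2, ..., 284

q1 = percentile(num, 0.25)
q3 = percentile(num, 0.75)
iqr = q3 - q1
p05 = percentile(num, 0.05)
p95 = percentile(num, 0.95)
print(q1, q3, iqr)
```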

Descriptive statistics

Standard deviation: 82.127949
Coefficient of variation (CV): 0.57633648
Kurtosis: -1.2
Mean: 142.5
Median Absolute Deviation (MAD): 71
Skewness: 0
Sum: 40470
Variance: 6745
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
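
The descriptive statistics for NUM can be checked with the Python standard library. This sketch again assumes NUM holds the integers 1..284. The variance of 6745 matches the sample variance (n − 1 denominator), not the population variance.

```python
import math
import statistics

num = list(range(1, 285))  # assumed content of NUM: 1, 2, ..., 284

total = sum(num)
mean = statistics.mean(num)
variance = statistics.variance(num)  # sample variance, n - 1 denominator
std_dev = math.sqrt(variance)
cv = std_dev / mean                  # coefficient of variation
median = statistics.median(num)
mad = statistics.median(abs(x - median) for x in num)  # median absolute deviation

print(total, mean, variance, round(std_dev, 6), round(cv, 8), mad)
```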
Value               Count  Frequency (%)
1                   1      0.4%
189                 1      0.4%
195                 1      0.4%
194                 1      0.4%
193                 1      0.4%
192                 1      0.4%
191                 1      0.4%
190                 1      0.4%
188                 1      0.4%
197                 1      0.4%
Other values (274)  274    96.5%

Value  Count  Frequency (%)
1      1      0.4%
2      1      0.4%
3      1      0.4%
4      1      0.4%
5      1      0.4%
6      1      0.4%
7      1      0.4%
8      1      0.4%
9      1      0.4%
10     1      0.4%

Value  Count  Frequency (%)
284    1      0.4%
283    1      0.4%
282    1      0.4%
281    1      0.4%
280    1      0.4%
279    1      0.4%
278    1      0.4%
277    1      0.4%
276    1      0.4%
275    1      0.4%

gpId
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value             Count
_CLRC_CHMO        58
_CLRC_SPR         52
_CLRC_OPRT        42
_CLRC_RTX         41
_CLRC_RLPS_MIST   22
Other values (8)  69

Length

Max length: 15
Median length: 13
Mean length: 10.771127
Min length: 9

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _CLRC_Summary
2nd row: _CLRC_Summary
3rd row: _CLRC_Summary
4th row: _CLRC_Summary
5th row: _CLRC_Summary

Common Values

Value             Count  Frequency (%)
_CLRC_CHMO        58     20.4%
_CLRC_SPR         52     18.3%
_CLRC_OPRT        42     14.8%
_CLRC_RTX         41     14.4%
_CLRC_RLPS_MIST   22     7.7%
_CLRC_Summary     14     4.9%
_CLRC_EVAL_DEAD   14     4.9%
_CLRC_CNDX        11     3.9%
_CLRC_COMP        10     3.5%
_CLRC_MIEX_INIT   6      2.1%
Other values (3)  14     4.9%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
clrc_chmo         58     20.4%
clrc_spr          52     18.3%
clrc_oprt         42     14.8%
clrc_rtx          41     14.4%
clrc_rlps_mist    22     7.7%
clrc_summary      14     4.9%
clrc_eval_dead    14     4.9%
clrc_cndx         11     3.9%
clrc_comp         10     3.5%
clrc_miex_init    6      2.1%
Other values (3)  14     4.9%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value  Count
항암치료 [Chemotherapy]  58
외과병리보고서 [Surgical pathology report]  52
수술정보 [Surgery information]  42
방사선치료 [Radiation therapy]  41
전이 및 재발 [Metastasis and recurrence]  22
Other values (8)  69

Length

Max length: 17
Median length: 16
Mean length: 5.915493
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

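Value  Count  Frequency (%)
항암치료 [Chemotherapy]  58  20.4%
외과병리보고서 [Surgical pathology report]  52  18.3%
수술정보 [Surgery information]  42  14.8%
방사선치료 [Radiation therapy]  41  14.4%
전이 및 재발 [Metastasis and recurrence]  22  7.7%
Summary  14  4.9%
사망 및 치료평가 [Death and treatment evaluation]  14  4.9%
진단정보 [Diagnosis information]  11  3.9%
합병증 [Complications]  10  3.5%
진단검사(영상/시술) [Diagnostic tests (imaging/procedures)]  6  2.1%
Other values (3)  14  4.9%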

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
항암치료 [chemotherapy]  58  15.7%
외과병리보고서 [surgical pathology report]  52  14.1%
수술정보 [surgery information]  42  11.4%
방사선치료 [radiation therapy]  41  11.1%
및 [and]  36  9.7%
전이 [metastasis]  22  5.9%
재발 [recurrence]  22  5.9%
치료평가 [treatment evaluation]  14  3.8%
사망 [death]  14  3.8%
summary  14  3.8%
Other values (7)  55  14.9%

tblId
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value              Count
_CLRC_PE_SPR       52
_CLRC_PE_OPRT      31
_CLRC_PE_CHMO      20
_CLRC_PE_RTX       16
_CLRC_PE_RTX_PRE   15
Other values (20)  150

Length

Max length: 18
Median length: 13
Mean length: 13.327465
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _CLRC_PT_TRGT
2nd row: _CLRC_PT_TRGT
3rd row: _CLRC_PT_TRGT
4th row: _CLRC_PT_TRGT
5th row: _CLRC_PT_TRGT

Common Values

Value              Count  Frequency (%)
_CLRC_PE_SPR       52     18.3%
_CLRC_PE_OPRT      31     10.9%
_CLRC_PE_CHMO      20     7.0%
_CLRC_PE_RTX       16     5.6%
_CLRC_PE_RTX_PRE   15     5.3%
_CLRC_PT_TRGT      14     4.9%
_CLRC_PE_RLPS      14     4.9%
_CLRC_PE_4S        12     4.2%
_CLRC_PE_RTX_PST   10     3.5%
_CLRC_PE_COMP      10     3.5%
Other values (15)  90     31.7%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
clrc_pe_spr        52     18.3%
clrc_pe_oprt       31     10.9%
clrc_pe_chmo       20     7.0%
clrc_pe_rtx        16     5.6%
clrc_pe_rtx_pre    15     5.3%
clrc_pt_trgt       14     4.9%
clrc_pe_rlps       14     4.9%
clrc_pe_4s         12     4.2%
clrc_pe_rtx_pst    10     3.5%
clrc_pe_comp       10     3.5%
Other values (15)  90     31.7%

tblNm
Categorical

HIGH CORRELATION 

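Distinct: 25
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value  Count
대장암 외과병리결과 [Colorectal cancer surgical pathology results]  52
대장암 수술기록 [Colorectal cancer surgery records]  31
대장암 항암치료 [Colorectal cancer chemotherapy]  20
대장암 방사선치료 [Colorectal cancer radiation therapy]  16
대장암 방사선치료전검사 [Colorectal cancer pre-radiotherapy examination]  15
Other values (20)  150

Length

Max length: 19
Median length: 16
Mean length: 9.7570423
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대장암 대상자 [Colorectal cancer subjects]
2nd row: 대장암 대상자 [Colorectal cancer subjects]
3rd row: 대장암 대상자 [Colorectal cancer subjects]
4th row: 대장암 대상자 [Colorectal cancer subjects]
5th row: 대장암 대상자 [Colorectal cancer subjects]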

Common Values

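Value  Count  Frequency (%)
대장암 외과병리결과 [Colorectal cancer surgical pathology results]  52  18.3%
대장암 수술기록 [Colorectal cancer surgery records]  31  10.9%
대장암 항암치료 [Colorectal cancer chemotherapy]  20  7.0%
대장암 방사선치료 [Colorectal cancer radiation therapy]  16  5.6%
대장암 방사선치료전검사 [Colorectal cancer pre-radiotherapy examination]  15  5.3%
대장암 대상자 [Colorectal cancer subjects]  14  4.9%
대장암 재발정보 [Colorectal cancer recurrence information]  14  4.9%
대장암 4기진단정보 [Colorectal cancer stage 4 diagnosis information]  12  4.2%
대장암 방사선치료후검사 [Colorectal cancer post-radiotherapy examination]  10  3.5%
대장암 합병증 [Colorectal cancer complications]  10  3.5%
Other values (15)  90  31.7%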

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
대장암 [colorectal cancer]  284  47.9%
외과병리결과 [surgical pathology results]  52  8.8%
수술기록 [surgery records]  31  5.2%
항암치료 [chemotherapy]  20  3.4%
initial  16  2.7%
방사선치료 [radiation therapy]  16  2.7%
방사선치료전검사 [pre-radiotherapy examination]  15  2.5%
대상자 [subjects]  14  2.4%
재발정보 [recurrence information]  14  2.4%
4기진단정보 [stage 4 diagnosis information]  12  2.0%
Other values (18)  119  20.1%

colId
Text

Distinct: 253
Distinct (%): 89.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 15
Median length: 13
Mean length: 10.697183
Min length: 5

Characters and Unicode

Total characters: 3038
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 225
Unique (%): 79.2%
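
"Distinct" and "Unique" measure different things. The figures in this report are consistent with "Distinct" counting the different values that occur and "Unique" counting the values that occur exactly once (gpId, for example, has 13 distinct values and 0 unique ones). A minimal sketch under that assumption follows. The function name and the toy column are illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) counts for a sequence of values.

    distinct: number of different values that occur at all
    unique:   number of values that occur exactly once
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical mini-column: OPRT_YMD repeats, the other two values do not.
col_ids = ["OPRT_YMD", "DIAG_AGE", "OPRT_YMD", "SEX_CD"]
print(distinct_and_unique(col_ids))  # (3, 2)
```

For colId this reading means that 253 − 225 = 28 identifiers are reused, and together they account for the remaining 284 − 225 = 59 rows.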

Sample

1st row: DIAG_AGE
2nd row: BRTH_YMD
3rd row: SEX_CD
4th row: FRMD_YMD
5th row: OPRT_CD

Value               Count  Frequency (%)
oprt_ymd            4      1.4%
diag_ymd            3      1.1%
t4b_inst_cmnt       2      0.7%
meta_loca_cmnt      2      0.7%
tme_eval_cnmt       2      0.7%
exam_ymd            2      0.7%
tumr_size_vl        2      0.7%
ht_vl               2      0.7%
mtst_part_cmnt      2      0.7%
cexm_rslt_cmnt      2      0.7%
Other values (243)  261    91.9%

Most occurring characters

Value              Count  Frequency (%)
_                  458    15.1%
T                  304    10.0%
M                  281    9.2%
N                  257    8.5%
C                  215    7.1%
R                  190    6.3%
D                  185    6.1%
A                  130    4.3%
E                  126    4.1%
S                  125    4.1%
Other values (23)  767    25.2%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       2565   84.4%
Connector Punctuation  458    15.1%
Decimal Number         15     0.5%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
T                  304    11.9%
M                  281    11.0%
N                  257    10.0%
C                  215    8.4%
R                  190    7.4%
D                  185    7.2%
A                  130    5.1%
E                  126    4.9%
S                  125    4.9%
L                  96     3.7%
Other values (16)  656    25.6%

Decimal Number
Value  Count  Frequency (%)
4      4      26.7%
2      4      26.7%
1      2      13.3%
6      2      13.3%
3      2      13.3%
8      1      6.7%

Connector Punctuation
Value  Count  Frequency (%)
_      458    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   2565   84.4%
Common  473    15.6%

Most frequent character per script

Latin
Value              Count  Frequency (%)
T                  304    11.9%
M                  281    11.0%
N                  257    10.0%
C                  215    8.4%
R                  190    7.4%
D                  185    7.2%
A                  130    5.1%
E                  126    4.9%
S                  125    4.9%
L                  96     3.7%
Other values (16)  656    25.6%

Common
Value  Count  Frequency (%)
_      458    96.8%
4      4      0.8%
2      4      0.8%
1      2      0.4%
6      2      0.4%
3      2      0.4%
8      1      0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3038   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
_                  458    15.1%
T                  304    10.0%
M                  281    9.2%
N                  257    8.5%
C                  215    7.1%
R                  190    6.3%
D                  185    6.1%
A                  130    4.3%
E                  126    4.1%
S                  125    4.1%
Other values (23)  767    25.2%

colNm
Text

Distinct: 256
Distinct (%): 90.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 134
Median length: 43
Mean length: 12.933099
Min length: 1

Characters and Unicode

Total characters: 3673
Distinct characters: 192
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 236
Unique (%): 83.1%

Sample

1st row: 나이 [Age]
2nd row: 생년월일 [Date of birth]
3rd row: 성별 [Sex]
4th row: 초진일 [Date of first visit]
5th row: 수술코드 [Surgery code]

Value               Count  Frequency (%)
tumor               19     3.2%
of                  15     2.5%
the                 12     2.0%
재발 [recurrence]    10     1.7%
margin              8      1.3%
invasion            7      1.2%
lymph               6      1.0%
node                6      1.0%
기타 [other]         6      1.0%
stage               6      1.0%
Other values (335)  506    84.2%

Most occurring characters

Value               Count  Frequency (%)
(space)             317    8.6%
e                   228    6.2%
o                   199    5.4%
r                   197    5.4%
t                   193    5.3%
a                   192    5.2%
i                   177    4.8%
n                   149    4.1%
s                   122    3.3%
l                   112    3.0%
Other values (182)  1787   48.7%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       2194   59.7%
Other Letter           728    19.8%
Space Separator        317    8.6%
Uppercase Letter       287    7.8%
Close Punctuation      40     1.1%
Open Punctuation       40     1.1%
Other Punctuation      27     0.7%
Connector Punctuation  17     0.5%
Decimal Number         14     0.4%
Dash Punctuation       8      0.2%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Lowercase Letter
Value              Count  Frequency (%)
e                  228    10.4%
o                  199    9.1%
r                  197    9.0%
t                  193    8.8%
a                  192    8.8%
i                  177    8.1%
n                  149    6.8%
s                  122    5.6%
l                  112    5.1%
m                  102    4.6%
Other values (15)  523    23.8%

Uppercase Letter
Value              Count  Frequency (%)
M                  26     9.1%
D                  26     9.1%
C                  21     7.3%
T                  20     7.0%
E                  20     7.0%
N                  18     6.3%
S                  18     6.3%
A                  17     5.9%
R                  17     5.9%
P                  16     5.6%
Other values (12)  88     30.7%

Decimal Number
Value  Count  Frequency (%)
2      4      28.6%
3      3      21.4%
4      3      21.4%
1      2      14.3%
6      1      7.1%
8      1      7.1%

Other Punctuation
Value  Count  Frequency (%)
/      21     77.8%
.      4      14.8%
:      1      3.7%
,      1      3.7%

Space Separator
Value    Count  Frequency (%)
(space)  317    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      40     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      40     100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      17     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      8      100.0%

Math Symbol
Value             Count  Frequency (%)
[character lost]  1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   2481   67.5%
Hangul  728    19.8%
Common  464    12.6%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Latin
Value              Count  Frequency (%)
e                  228    9.2%
o                  199    8.0%
r                  197    7.9%
t                  193    7.8%
a                  192    7.7%
i                  177    7.1%
n                  149    6.0%
s                  122    4.9%
l                  112    4.5%
m                  102    4.1%
Other values (37)  810    32.6%

Common
Value             Count  Frequency (%)
(space)           317    68.3%
)                 40     8.6%
(                 40     8.6%
/                 21     4.5%
_                 17     3.7%
-                 8      1.7%
2                 4      0.9%
.                 4      0.9%
3                 3      0.6%
4                 3      0.6%
Other values (6)  7      1.5%

Most occurring blocks

Value           Count  Frequency (%)
ASCII           2944   80.2%
Hangul          728    19.8%
Math Operators  1      < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            317    10.8%
e                  228    7.7%
o                  199    6.8%
r                  197    6.7%
t                  193    6.6%
a                  192    6.5%
i                  177    6.0%
n                  149    5.1%
s                  122    4.1%
l                  112    3.8%
Other values (52)  1058   35.9%

Hangul
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Math Operators
Value             Count  Frequency (%)
[character lost]  1      100.0%

dataType
Categorical

Distinct: 3
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value    Count
STRING   213
DATE     47
INTEGER  24

Length

Max length: 7
Median length: 6
Mean length: 5.7535211
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: INTEGER
2nd row: DATE
3rd row: STRING
4th row: DATE
5th row: STRING

Common Values

Value    Count  Frequency (%)
STRING   213    75.0%
DATE     47     16.5%
INTEGER  24     8.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
string   213    75.0%
date     47     16.5%
integer  24     8.5%

colDesc
Text

Distinct: 256
Distinct (%): 90.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 134
Median length: 43
Mean length: 12.933099
Min length: 1

Characters and Unicode

Total characters: 3673
Distinct characters: 192
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 236
Unique (%): 83.1%

Sample

1st row: 나이 [Age]
2nd row: 생년월일 [Date of birth]
3rd row: 성별 [Sex]
4th row: 초진일 [Date of first visit]
5th row: 수술코드 [Surgery code]

Value               Count  Frequency (%)
tumor               19     3.2%
of                  15     2.5%
the                 12     2.0%
재발 [recurrence]    10     1.7%
margin              8      1.3%
invasion            7      1.2%
lymph               6      1.0%
node                6      1.0%
기타 [other]         6      1.0%
stage               6      1.0%
Other values (335)  506    84.2%

Most occurring characters

Value               Count  Frequency (%)
(space)             317    8.6%
e                   228    6.2%
o                   199    5.4%
r                   197    5.4%
t                   193    5.3%
a                   192    5.2%
i                   177    4.8%
n                   149    4.1%
s                   122    3.3%
l                   112    3.0%
Other values (182)  1787   48.7%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       2194   59.7%
Other Letter           728    19.8%
Space Separator        317    8.6%
Uppercase Letter       287    7.8%
Close Punctuation      40     1.1%
Open Punctuation       40     1.1%
Other Punctuation      27     0.7%
Connector Punctuation  17     0.5%
Decimal Number         14     0.4%
Dash Punctuation       8      0.2%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Lowercase Letter
Value              Count  Frequency (%)
e                  228    10.4%
o                  199    9.1%
r                  197    9.0%
t                  193    8.8%
a                  192    8.8%
i                  177    8.1%
n                  149    6.8%
s                  122    5.6%
l                  112    5.1%
m                  102    4.6%
Other values (15)  523    23.8%

Uppercase Letter
Value              Count  Frequency (%)
M                  26     9.1%
D                  26     9.1%
C                  21     7.3%
T                  20     7.0%
E                  20     7.0%
N                  18     6.3%
S                  18     6.3%
A                  17     5.9%
R                  17     5.9%
P                  16     5.6%
Other values (12)  88     30.7%

Decimal Number
Value  Count  Frequency (%)
2      4      28.6%
3      3      21.4%
4      3      21.4%
1      2      14.3%
6      1      7.1%
8      1      7.1%

Other Punctuation
Value  Count  Frequency (%)
/      21     77.8%
.      4      14.8%
:      1      3.7%
,      1      3.7%

Space Separator
Value    Count  Frequency (%)
(space)  317    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      40     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      40     100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      17     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      8      100.0%

Math Symbol
Value             Count  Frequency (%)
[character lost]  1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   2481   67.5%
Hangul  728    19.8%
Common  464    12.6%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Latin
Value              Count  Frequency (%)
e                  228    9.2%
o                  199    8.0%
r                  197    7.9%
t                  193    7.8%
a                  192    7.7%
i                  177    7.1%
n                  149    6.0%
s                  122    4.9%
l                  112    4.5%
m                  102    4.1%
Other values (37)  810    32.6%

Common
Value             Count  Frequency (%)
(space)           317    68.3%
)                 40     8.6%
(                 40     8.6%
/                 21     4.5%
_                 17     3.7%
-                 8      1.7%
2                 4      0.9%
.                 4      0.9%
3                 3      0.6%
4                 3      0.6%
Other values (6)  7      1.5%

Most occurring blocks

Value           Count  Frequency (%)
ASCII           2944   80.2%
Hangul          728    19.8%
Math Operators  1      < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            317    10.8%
e                  228    7.7%
o                  199    6.8%
r                  197    6.7%
t                  193    6.6%
a                  192    6.5%
i                  177    6.0%
n                  149    5.1%
s                  122    4.1%
l                  112    3.8%
Other values (52)  1058   35.9%

Hangul
Value               Count  Frequency (%)
[character lost]    46     6.3%
[character lost]    32     4.4%
[character lost]    23     3.2%
[character lost]    21     2.9%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    19     2.6%
[character lost]    18     2.5%
[character lost]    18     2.5%
[character lost]    16     2.2%
Other values (119)  497    68.3%

Math Operators
Value             Count  Frequency (%)
[character lost]  1      100.0%

colCnt
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 109
Distinct (%): 38.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74970.236
Minimum: 0
Maximum: 7821950
Zeros: 98
Zeros (%): 34.5%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1321.5
Q3: 12801
95th percentile: 123415.4
Maximum: 7821950
Range: 7821950
Interquartile range (IQR): 12801

Descriptive statistics

Standard deviation: 642776.1
Coefficient of variation (CV): 8.5737506
Kurtosis: 136.83565
Mean: 74970.236
Median Absolute Deviation (MAD): 1321.5
Skewness: 11.68213
Sum: 21291547
Variance: 4.1316112 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
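
Several of the colCnt figures are derived from one another, which gives a quick consistency check: the variance is the square of the standard deviation, the CV is the standard deviation divided by the mean, and the sum is the mean times the row count. A small Python sketch using only the numbers printed in this report (the variable names are illustrative):

```python
# Values copied from the colCnt statistics in this report.
mean = 74970.236
std_dev = 642776.1
n_rows = 284
n_zeros = 98

variance = std_dev ** 2             # reported as 4.1316112 x 10^11
cv = std_dev / mean                 # reported as 8.5737506
total = mean * n_rows               # reported sum: 21291547
zero_share = 100 * n_zeros / n_rows # reported as 34.5%

print(f"variance={variance:.4e} cv={cv:.4f} sum={total:.0f} zeros={zero_share:.1f}%")
```

A CV above 8, together with a median of 1321.5 against a maximum of 7821950, shows that a few very large counts dominate the mean.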
Value              Count  Frequency (%)
0                  98     34.5%
49741              11     3.9%
12801              10     3.5%
19106              7      2.5%
1864               6      2.1%
251                6      2.1%
4883               6      2.1%
10135              5      1.8%
56267              5      1.8%
443                4      1.4%
Other values (99)  126    44.4%

Value  Count  Frequency (%)
0      98     34.5%
16     1      0.4%
33     1      0.4%
65     1      0.4%
85     1      0.4%
94     1      0.4%
133    1      0.4%
185    1      0.4%
193    1      0.4%
196    1      0.4%

Value    Count  Frequency (%)
7821950  1      0.4%
7483859  1      0.4%
516812   1      0.4%
491323   1      0.4%
354382   4      1.4%
340848   1      0.4%
133075   3      1.1%
131312   1      0.4%
130852   1      0.4%
129485   1      0.4%

dispFormat
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 284
Missing (%): 100.0%
Memory size: 2.6 KiB

Interactions


Correlations

          NUM    gpId   gpNm   tblId  tblNm  dataType  colCnt
NUM       1.000  0.925  0.925  0.972  0.972  0.338     0.241
gpId      0.925  1.000  1.000  1.000  1.000  0.324     0.723
gpNm      0.925  1.000  1.000  1.000  1.000  0.324     0.723
tblId     0.972  1.000  1.000  1.000  1.000  0.530     0.754
tblNm     0.972  1.000  1.000  1.000  1.000  0.530     0.754
dataType  0.338  0.324  0.324  0.530  0.530  1.000     0.000
colCnt    0.241  0.723  0.723  0.754  0.754  0.000     1.000

          tblId  tblNm  gpNm   gpId   dataType
tblId     1.000  1.000  0.978  0.978  0.310
tblNm     1.000  1.000  0.978  0.978  0.310
gpNm      0.978  0.978  1.000  1.000  0.189
gpId      0.978  0.978  1.000  1.000  0.189
dataType  0.310  0.310  0.189  0.189  1.000

          NUM     colCnt  gpId   gpNm   tblId  tblNm  dataType
NUM       1.000   -0.536  0.730  0.730  0.793  0.793  0.213
colCnt    -0.536  1.000   0.675  0.675  0.643  0.643  0.000
gpId      0.730   0.675   1.000  1.000  0.978  0.978  0.189
gpNm      0.730   0.675   1.000  1.000  0.978  0.978  0.189
tblId     0.793   0.643   0.978  0.978  1.000  1.000  0.310
tblNm     0.793   0.643   0.978  0.978  1.000  1.000  0.310
dataType  0.213   0.000   0.189  0.189  0.310  0.310  1.000
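
Pairs such as gpId and gpNm score 1.000 because each identifier maps to exactly one name. An association measure like Cramér's V behaves this way for categorical columns. The report does not say which measure each matrix uses, so the following is only a sketch of plain (uncorrected) Cramér's V. The function name and the toy rows are illustrative, and a profiler may apply a bias correction that gives slightly different values.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy rows in which every group id has exactly one group name.
gp_id = ["_CLRC_CHMO", "_CLRC_CHMO", "_CLRC_SPR", "_CLRC_Summary"]
gp_nm = ["항암치료", "항암치료", "외과병리보고서", "Summary"]
print(cramers_v(gp_id, gp_nm))
```

For such a one-to-one mapping the result is 1, while two columns that vary independently score 0.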

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

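index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | DIAG_AGE | 나이 [Age] | INTEGER | 나이 [Age] | 19106 | <NA>
1 | 2 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | BRTH_YMD | 생년월일 [Date of birth] | DATE | 생년월일 [Date of birth] | 19106 | <NA>
2 | 3 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | SEX_CD | 성별 [Sex] | STRING | 성별 [Sex] | 19106 | <NA>
3 | 4 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | FRMD_YMD | 초진일 [Date of first visit] | DATE | 초진일 [Date of first visit] | 19087 | <NA>
4 | 5 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | OPRT_CD | 수술코드 [Surgery code] | STRING | 수술코드 [Surgery code] | 10135 | <NA>
5 | 6 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | OPRT_NM | 수술명 [Surgery name] | STRING | 수술명 [Surgery name] | 10135 | <NA>
6 | 7 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | OPDR_ID | 집도의ID [Surgeon ID] | STRING | 집도의ID [Surgeon ID] | 10135 | <NA>
7 | 8 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | OPDR_NM | 집도의 [Surgeon] | STRING | 집도의 [Surgeon] | 10135 | <NA>
8 | 9 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | DRTR_YMD | 약물치료시작일 [Drug therapy start date] | DATE | 약물치료시작일 [Drug therapy start date] | 0 | <NA>
9 | 10 | _CLRC_Summary | Summary | _CLRC_PT_TRGT | 대장암 대상자 [Colorectal cancer subjects] | RATH_YMD | 방사선치료시작일 [Radiation therapy start date] | DATE | 방사선치료시작일 [Radiation therapy start date] | 3615 | <NA>

index | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
274 | 275 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PE_EVAL | 대장암 치료평가 [Colorectal cancer treatment evaluation] | FLUP_LOSS_CD | 마지막 F/U 상태 [Last follow-up status] | STRING | 마지막 F/U 상태 [Last follow-up status] | 0 | <NA>
275 | 276 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PE_EVAL | 대장암 치료평가 [Colorectal cancer treatment evaluation] | DFS_DRTN | Disease-free survival (DFS) | STRING | Disease-free survival (DFS) | 0 | <NA>
276 | 277 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PE_EVAL | 대장암 치료평가 [Colorectal cancer treatment evaluation] | OS_DRTN | Overall survival (OS) | STRING | Overall survival (OS) | 0 | <NA>
277 | 278 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DEAD_YN | 사망여부 [Death status] | STRING | 사망여부 [Death status] | 19106 | <NA>
278 | 279 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DEAD_YMD | 사망일 [Date of death] | DATE | 사망일 [Date of death] | 719 | <NA>
279 | 280 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DCUZ1_CMNT | 사망 사유(text) [Cause of death (text)] | STRING | 사망 사유(text) [Cause of death (text)] | 712 | <NA>
280 | 281 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DCUZ2_CMNT | 기타 사망원인1 [Other cause of death 1] | STRING | 기타 사망원인1 [Other cause of death 1] | 266 | <NA>
281 | 282 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DCUZ3_CMNT | 기타 사망원인2 [Other cause of death 2] | STRING | 기타 사망원인2 [Other cause of death 2] | 133 | <NA>
282 | 283 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DCUZ4_CMNT | 기타 사망원인3 [Other cause of death 3] | STRING | 기타 사망원인3 [Other cause of death 3] | 33 | <NA>
283 | 284 | _CLRC_EVAL_DEAD | 사망 및 치료평가 [Death and treatment evaluation] | _CLRC_PT_DEAD | 대장암 사망정보 [Colorectal cancer death information] | DCUZ_KCD6_CMNT | 주 사망원인코드(KCD) [Primary cause-of-death code (KCD)] | STRING | 주 사망원인코드(KCD) [Primary cause-of-death code (KCD)] | 0 | <NA>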