Overview

Dataset statistics

Number of variables: 11
Number of observations: 321
Missing cells: 320
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.3 KiB
Average record size in memory: 90.4 B
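
The missing-cell percentage can be checked by hand from the counts in this report. The sketch below is only a sanity check, not the profiler's own code; it assumes that all 320 missing cells belong to the dispFormat column, which is what the Alerts section reports.

```python
# Counts taken from the dataset statistics in this report.
n_variables = 11
n_observations = 321
missing_cells = 320

# Share of missing cells over the whole table (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every missing cell is in dispFormat, so its per-column
# missing share is taken over the 321 observations only.
disp_format_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"dispFormat missing: {disp_format_missing_pct:.1f}%")
```

Both figures round to the values shown in the report (9.1% and 99.7%).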

Variable types

Numeric: 2
Categorical: 5
Text: 4

Dataset

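Description: Provides meta-information for the thyroid cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048690/fileData.do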

Alerts

dispFormat has constant value "" (Constant)
gpNm is highly overall correlated with NUM and 4 other fields (High correlation)
tblNm is highly overall correlated with NUM and 4 other fields (High correlation)
gpId is highly overall correlated with NUM and 4 other fields (High correlation)
tblId is highly overall correlated with NUM and 4 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
colCnt is highly overall correlated with gpId and 3 other fields (High correlation)
dispFormat has 320 (99.7%) missing values (Missing)
NUM has unique values (Unique)
colCnt has 6 (1.9%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 15:53:23.646175
Analysis finished: 2023-12-12 15:53:25.907905
Duration: 2.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
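
The reported duration is simply the difference between the two timestamps. A minimal check using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 15:53:23.646175")
finished = datetime.fromisoformat("2023-12-12 15:53:25.907905")

duration = finished - started
print(f"{duration.total_seconds():.2f} seconds")  # prints "2.26 seconds"
```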

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 321
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 161
Minimum: 1
Maximum: 321
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 17
Q1: 81
median: 161
Q3: 241
95-th percentile: 305
Maximum: 321
Range: 320
Interquartile range (IQR): 160

Descriptive statistics

Standard deviation: 92.808944
Coefficient of variation (CV): 0.57645307
Kurtosis: -1.2
Mean: 161
Median Absolute Deviation (MAD): 80
Skewness: 0
Sum: 51681
Variance: 8613.5
Monotonicity: Strictly increasing
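
These figures are what one would expect if NUM is a plain row counter. The sketch below assumes NUM takes the values 1 through 321 (consistent with 321 distinct values, minimum 1, maximum 321 and strict monotonicity, but not stated outright in the report) and recomputes the statistics with the standard library. The reported variance matches the sample (n - 1) estimator.

```python
import statistics

# Assumption: NUM is the sequence 1, 2, ..., 321.
values = list(range(1, 322))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std_dev = statistics.stdev(values)
cv = std_dev / mean                     # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, round(std_dev, 6), round(cv, 8), median, mad)
```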
[Figure: Histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
1 | 1 | 0.3%
242 | 1 | 0.3%
220 | 1 | 0.3%
219 | 1 | 0.3%
218 | 1 | 0.3%
217 | 1 | 0.3%
216 | 1 | 0.3%
215 | 1 | 0.3%
214 | 1 | 0.3%
213 | 1 | 0.3%
Other values (311) | 311 | 96.9%

Value | Count | Frequency (%)
1 | 1 | 0.3%
2 | 1 | 0.3%
3 | 1 | 0.3%
4 | 1 | 0.3%
5 | 1 | 0.3%
6 | 1 | 0.3%
7 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
10 | 1 | 0.3%

Value | Count | Frequency (%)
321 | 1 | 0.3%
320 | 1 | 0.3%
319 | 1 | 0.3%
318 | 1 | 0.3%
317 | 1 | 0.3%
316 | 1 | 0.3%
315 | 1 | 0.3%
314 | 1 | 0.3%
313 | 1 | 0.3%
312 | 1 | 0.3%

gpId
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
_THRD_HLTH: 68
_THRD_OPRT: 49
_THRD_SPR: 48
_THRD_CNDX: 20
_THRD_RADI: 17
Other values (13): 119

Length

Max length: 15
Median length: 10
Mean length: 10.82243
Min length: 9

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _THRD_Summary
2nd row: _THRD_Summary
3rd row: _THRD_Summary
4th row: _THRD_Summary
5th row: _THRD_Summary

Common Values

Value | Count | Frequency (%)
_THRD_HLTH | 68 | 21.2%
_THRD_OPRT | 49 | 15.3%
_THRD_SPR | 48 | 15.0%
_THRD_CNDX | 20 | 6.2%
_THRD_RADI | 17 | 5.3%
_THRD_RTX | 14 | 4.4%
_THRD_IMMU | 13 | 4.0%
_THRD_BX_INIT | 12 | 3.7%
_THRD_Summary | 12 | 3.7%
_THRD_BX_FLUP | 12 | 3.7%
Other values (8) | 56 | 17.4%

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
thrd_hlth | 68 | 21.2%
thrd_oprt | 49 | 15.3%
thrd_spr | 48 | 15.0%
thrd_cndx | 20 | 6.2%
thrd_radi | 17 | 5.3%
thrd_rtx | 14 | 4.4%
thrd_immu | 13 | 4.0%
thrd_eval_dead | 12 | 3.7%
thrd_bx_flup | 12 | 3.7%
thrd_summary | 12 | 3.7%
Other values (8) | 56 | 17.4%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
기타: 68
수술: 49
외과병리: 48
진단정보: 20
방사성요오드치료: 17
Other values (13): 119

Length

Max length: 15
Median length: 13
Mean length: 4.728972
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

Value | Count | Frequency (%)
기타 | 68 | 21.2%
수술 | 49 | 15.3%
외과병리 | 48 | 15.0%
진단정보 | 20 | 6.2%
방사성요오드치료 | 17 | 5.3%
방사선치료 | 14 | 4.4%
면역병리 | 13 | 4.0%
수술 전 병리검사 | 12 | 3.7%
Summary | 12 | 3.7%
F/U 병리검사 | 12 | 3.7%
Other values (8) | 56 | 17.4%

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
수술 | 73 | 17.2%
기타 | 68 | 16.0%
외과병리 | 48 | 11.3%
f/u | 24 | 5.6%
(value missing in source) | 24 | 5.6%
병리검사 | 24 | 5.6%
진단정보 | 20 | 4.7%
방사성요오드치료 | 17 | 4.0%
방사선치료 | 14 | 3.3%
면역병리 | 13 | 3.1%
Other values (13) | 100 | 23.5%

tblId
Categorical

HIGH CORRELATION 

Distinct: 29
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
_THRD_MR_HLTH: 68
_THRD_PE_OPRT_LIST: 26
_THRD_PE_SPR_SUB: 23
_THRD_PE_SPR: 19
_THRD_PE_OPRT: 17
Other values (24): 168

Length

Max length: 18
Median length: 13
Mean length: 14.573209
Min length: 12

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _THRD_PT_TRGT
2nd row: _THRD_PT_TRGT
3rd row: _THRD_PT_TRGT
4th row: _THRD_PT_TRGT
5th row: _THRD_PT_TRGT

Common Values

Value | Count | Frequency (%)
_THRD_MR_HLTH | 68 | 21.2%
_THRD_PE_OPRT_LIST | 26 | 8.1%
_THRD_PE_SPR_SUB | 23 | 7.2%
_THRD_PE_SPR | 19 | 5.9%
_THRD_PE_OPRT | 17 | 5.3%
_THRD_PE_IMMU | 13 | 4.0%
_THRD_PE_BX_INIT | 12 | 3.7%
_THRD_PE_RADI | 12 | 3.7%
_THRD_PE_BX_FLUP | 12 | 3.7%
_THRD_PT_TRGT | 12 | 3.7%
Other values (19) | 107 | 33.3%

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
thrd_mr_hlth | 68 | 21.2%
thrd_pe_oprt_list | 26 | 8.1%
thrd_pe_spr_sub | 23 | 7.2%
thrd_pe_spr | 19 | 5.9%
thrd_pe_oprt | 17 | 5.3%
thrd_pe_immu | 13 | 4.0%
thrd_pe_bx_init | 12 | 3.7%
thrd_pe_radi | 12 | 3.7%
thrd_pe_bx_flup | 12 | 3.7%
thrd_pt_trgt | 12 | 3.7%
Other values (19) | 107 | 33.3%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
갑상선암 환자건강정보: 68
갑상선암 수술내용: 26
갑상선암 외과병리(SUB): 23
갑상선암 외과병리내용: 19
갑상선암 수술정보: 17
Other values (23): 168

Length

Max length: 27
Median length: 20
Mean length: 12.05919
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 갑성선암 대상자
2nd row: 갑성선암 대상자
3rd row: 갑성선암 대상자
4th row: 갑성선암 대상자
5th row: 갑성선암 대상자

Common Values

Value | Count | Frequency (%)
갑상선암 환자건강정보 | 68 | 21.2%
갑상선암 수술내용 | 26 | 8.1%
갑상선암 외과병리(SUB) | 23 | 7.2%
갑상선암 외과병리내용 | 19 | 5.9%
갑상선암 수술정보 | 17 | 5.3%
갑상선암 방사성요오드치료 | 17 | 5.3%
갑상선암 면역병리 | 13 | 4.0%
갑상선암 Initial 병리검사 | 12 | 3.7%
갑상선암 F/U 병리검사 | 12 | 3.7%
갑성선암 대상자 | 12 | 3.7%
Other values (18) | 102 | 31.8%

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
갑상선암 | 309 | 43.1%
환자건강정보 | 68 | 9.5%
수술내용 | 26 | 3.6%
initial | 25 | 3.5%
병리검사 | 24 | 3.3%
f/u | 24 | 3.3%
외과병리(sub | 23 | 3.2%
외과병리내용 | 19 | 2.6%
수술정보 | 17 | 2.4%
방사성요오드치료 | 17 | 2.4%
Other values (24) | 165 | 23.0%

colId
Text

Distinct: 269
Distinct (%): 83.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 20
Median length: 17
Mean length: 12.105919
Min length: 5

Characters and Unicode

Total characters: 3886
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
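
As an illustration of those character properties, the sketch below counts Unicode general categories for the five sample colId values listed in this section. It uses the standard-library `unicodedata` module and is not the profiler's own implementation; the two-letter codes map to the names used in the tables (`Lu` = Uppercase Letter, `Pc` = Connector Punctuation, `Nd` = Decimal Number).

```python
import unicodedata
from collections import Counter

# The five sample colId values shown in this report.
sample_values = ["PT_NM", "BRTH_YMD", "SEX_CD", "DIAG_YMD", "DIAG_AGE"]

# Count the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in sample_values for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

Run on the full column rather than five samples, the same counting would yield the category table for this variable.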

Unique

Unique: 231
Unique (%): 72.0%

Sample

1st row: PT_NM
2nd row: BRTH_YMD
3rd row: SEX_CD
4th row: DIAG_YMD
5th row: DIAG_AGE

Value | Count | Frequency (%)
pt_sbst_no | 8 | 2.5%
ldng_ymd | 7 | 2.2%
oprt_ymd | 4 | 1.2%
oprt_nm | 3 | 0.9%
miex_nm | 2 | 0.6%
bx_inhs_yn | 2 | 0.6%
bx_mthd_cmnt | 2 | 0.6%
cexm_nm | 2 | 0.6%
cexm_rslt_cmnt | 2 | 0.6%
miex_srex_cd | 2 | 0.6%
Other values (259) | 287 | 89.4%

Most occurring characters

Value | Count | Frequency (%)
_ | 630 | 16.2%
T | 361 | 9.3%
M | 329 | 8.5%
N | 316 | 8.1%
C | 217 | 5.6%
D | 211 | 5.4%
R | 208 | 5.4%
S | 203 | 5.2%
E | 153 | 3.9%
A | 143 | 3.7%
Other values (25) | 1115 | 28.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3212 | 82.7%
Connector Punctuation | 630 | 16.2%
Decimal Number | 44 | 1.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 361 | 11.2%
M | 329 | 10.2%
N | 316 | 9.8%
C | 217 | 6.8%
D | 211 | 6.6%
R | 208 | 6.5%
S | 203 | 6.3%
E | 153 | 4.8%
A | 143 | 4.5%
Y | 141 | 4.4%
Other values (16) | 930 | 29.0%
Decimal Number
Value | Count | Frequency (%)
1 | 13 | 29.5%
0 | 12 | 27.3%
2 | 9 | 20.5%
7 | 4 | 9.1%
3 | 3 | 6.8%
9 | 1 | 2.3%
5 | 1 | 2.3%
4 | 1 | 2.3%
Connector Punctuation
Value | Count | Frequency (%)
_ | 630 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3212 | 82.7%
Common | 674 | 17.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 361 | 11.2%
M | 329 | 10.2%
N | 316 | 9.8%
C | 217 | 6.8%
D | 211 | 6.6%
R | 208 | 6.5%
S | 203 | 6.3%
E | 153 | 4.8%
A | 143 | 4.5%
Y | 141 | 4.4%
Other values (16) | 930 | 29.0%
Common
Value | Count | Frequency (%)
_ | 630 | 93.5%
1 | 13 | 1.9%
0 | 12 | 1.8%
2 | 9 | 1.3%
7 | 4 | 0.6%
3 | 3 | 0.4%
9 | 1 | 0.1%
5 | 1 | 0.1%
4 | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3886 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 630 | 16.2%
T | 361 | 9.3%
M | 329 | 8.5%
N | 316 | 8.1%
C | 217 | 5.6%
D | 211 | 5.4%
R | 208 | 5.4%
S | 203 | 5.2%
E | 153 | 3.9%
A | 143 | 3.7%
Other values (25) | 1115 | 28.7%

colNm
Text

Distinct: 268
Distinct (%): 83.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 53
Median length: 36
Mean length: 11.196262
Min length: 2

Characters and Unicode

Total characters: 3594
Distinct characters: 228
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 245
Unique (%): 76.3%

Sample

1st row: 성명
2nd row: 생년월일
3rd row: 성별
4th row: 최초 진단일자
5th row: 최초 진단시 나이

Value | Count | Frequency (%)
stage | 14 | 2.4%
of | 13 | 2.2%
finding | 11 | 1.9%
operation | 11 | 1.9%
procedure | 10 | 1.7%
검체결과 | 10 | 1.7%
and | 10 | 1.7%
최초 | 8 | 1.4%
환자대체번호 | 8 | 1.4%
diagnosis | 7 | 1.2%
Other values (306) | 490 | 82.8%

Most occurring characters

ValueCountFrequency (%)
272
 
7.6%
o 159
 
4.4%
i 150
 
4.2%
e 130
 
3.6%
n 117
 
3.3%
a 115
 
3.2%
t 106
 
2.9%
r 92
 
2.6%
s 77
 
2.1%
74
 
2.1%
Other values (218) 2302
64.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1435
39.9%
Other Letter 1430
39.8%
Space Separator 272
 
7.6%
Uppercase Letter 202
 
5.6%
Open Punctuation 73
 
2.0%
Close Punctuation 73
 
2.0%
Other Punctuation 55
 
1.5%
Decimal Number 49
 
1.4%
Dash Punctuation 3
 
0.1%
Connector Punctuation 2
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.6%
51
 
3.6%
49
 
3.4%
49
 
3.4%
44
 
3.1%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 952
66.6%
Lowercase Letter
ValueCountFrequency (%)
o 159
11.1%
i 150
10.5%
e 130
 
9.1%
n 117
 
8.2%
a 115
 
8.0%
t 106
 
7.4%
r 92
 
6.4%
s 77
 
5.4%
d 71
 
4.9%
g 65
 
4.5%
Other values (13) 353
24.6%
Uppercase Letter
ValueCountFrequency (%)
P 34
16.8%
N 27
13.4%
T 19
9.4%
O 14
 
6.9%
L 14
 
6.9%
S 14
 
6.9%
M 12
 
5.9%
C 10
 
5.0%
E 9
 
4.5%
F 6
 
3.0%
Other values (13) 43
21.3%
Decimal Number
ValueCountFrequency (%)
1 14
28.6%
0 14
28.6%
2 10
20.4%
7 4
 
8.2%
3 3
 
6.1%
4 1
 
2.0%
5 1
 
2.0%
6 1
 
2.0%
9 1
 
2.0%
Other Punctuation
ValueCountFrequency (%)
: 33
60.0%
/ 14
25.5%
, 4
 
7.3%
" 4
 
7.3%
Space Separator
ValueCountFrequency (%)
272
100.0%
Open Punctuation
ValueCountFrequency (%)
( 73
100.0%
Close Punctuation
ValueCountFrequency (%)
) 73
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1637
45.5%
Hangul 1430
39.8%
Common 527
 
14.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.6%
51
 
3.6%
49
 
3.4%
49
 
3.4%
44
 
3.1%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 952
66.6%
Latin
ValueCountFrequency (%)
o 159
 
9.7%
i 150
 
9.2%
e 130
 
7.9%
n 117
 
7.1%
a 115
 
7.0%
t 106
 
6.5%
r 92
 
5.6%
s 77
 
4.7%
d 71
 
4.3%
g 65
 
4.0%
Other values (36) 555
33.9%
Common
ValueCountFrequency (%)
272
51.6%
( 73
 
13.9%
) 73
 
13.9%
: 33
 
6.3%
1 14
 
2.7%
0 14
 
2.7%
/ 14
 
2.7%
2 10
 
1.9%
7 4
 
0.8%
, 4
 
0.8%
Other values (8) 16
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2164
60.2%
Hangul 1430
39.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
272
 
12.6%
o 159
 
7.3%
i 150
 
6.9%
e 130
 
6.0%
n 117
 
5.4%
a 115
 
5.3%
t 106
 
4.9%
r 92
 
4.3%
s 77
 
3.6%
( 73
 
3.4%
Other values (54) 873
40.3%
Hangul
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.6%
51
 
3.6%
49
 
3.4%
49
 
3.4%
44
 
3.1%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 952
66.6%

dataType
Categorical

Distinct: 3
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
STRING: 248
DATE: 50
INTEGER: 23

Length

Max length: 7
Median length: 6
Mean length: 5.7601246
Min length: 4
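
The mean length and the frequency percentages for dataType follow directly from the three category counts. The short check below uses only those counts; the dictionary name is illustrative.

```python
# Category counts for dataType, as listed in this section.
counts = {"STRING": 248, "DATE": 50, "INTEGER": 23}

n_rows = sum(counts.values())

# Mean string length, weighted by how often each value occurs.
mean_length = sum(len(value) * n for value, n in counts.items()) / n_rows

# Relative frequency of each category, in percent.
shares = {value: 100 * n / n_rows for value, n in counts.items()}

print(n_rows, round(mean_length, 7))
for value, pct in shares.items():
    print(f"{value}: {pct:.1f}%")
```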

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: STRING
2nd row: DATE
3rd row: STRING
4th row: DATE
5th row: INTEGER

Common Values

Value | Count | Frequency (%)
STRING | 248 | 77.3%
DATE | 50 | 15.6%
INTEGER | 23 | 7.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value | Count | Frequency (%)
string | 248 | 77.3%
date | 50 | 15.6%
integer | 23 | 7.2%

colDesc
Text

Distinct: 268
Distinct (%): 83.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 53
Median length: 36
Mean length: 11.183801
Min length: 2

Characters and Unicode

Total characters: 3590
Distinct characters: 229
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 245
Unique (%): 76.3%

Sample

1st row: 성명
2nd row: 생년월일
3rd row: 성별
4th row: 최초 진단일자
5th row: 최초 진단시 나이

Value | Count | Frequency (%)
stage | 14 | 2.4%
of | 13 | 2.2%
operation | 11 | 1.9%
finding | 11 | 1.9%
검체결과 | 10 | 1.7%
procedure | 10 | 1.7%
and | 10 | 1.7%
환자대체번호 | 8 | 1.4%
최초 | 8 | 1.4%
lymph | 7 | 1.2%
Other values (307) | 490 | 82.8%

Most occurring characters

ValueCountFrequency (%)
272
 
7.6%
o 161
 
4.5%
i 150
 
4.2%
e 130
 
3.6%
n 117
 
3.3%
a 115
 
3.2%
t 106
 
3.0%
r 92
 
2.6%
s 77
 
2.1%
74
 
2.1%
Other values (219) 2296
64.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1437
40.0%
Other Letter 1416
39.4%
Space Separator 272
 
7.6%
Uppercase Letter 208
 
5.8%
Open Punctuation 73
 
2.0%
Close Punctuation 73
 
2.0%
Other Punctuation 57
 
1.6%
Decimal Number 49
 
1.4%
Dash Punctuation 3
 
0.1%
Connector Punctuation 2
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.7%
51
 
3.6%
49
 
3.5%
49
 
3.5%
42
 
3.0%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 940
66.4%
Lowercase Letter
ValueCountFrequency (%)
o 161
11.2%
i 150
10.4%
e 130
 
9.0%
n 117
 
8.1%
a 115
 
8.0%
t 106
 
7.4%
r 92
 
6.4%
s 77
 
5.4%
d 71
 
4.9%
g 65
 
4.5%
Other values (13) 353
24.6%
Uppercase Letter
ValueCountFrequency (%)
P 34
16.3%
N 29
13.9%
T 21
10.1%
O 14
 
6.7%
L 14
 
6.7%
S 14
 
6.7%
M 12
 
5.8%
C 10
 
4.8%
E 9
 
4.3%
R 8
 
3.8%
Other values (13) 43
20.7%
Decimal Number
ValueCountFrequency (%)
1 14
28.6%
0 14
28.6%
2 10
20.4%
7 4
 
8.2%
3 3
 
6.1%
4 1
 
2.0%
5 1
 
2.0%
6 1
 
2.0%
9 1
 
2.0%
Other Punctuation
ValueCountFrequency (%)
: 33
57.9%
/ 14
24.6%
" 4
 
7.0%
, 4
 
7.0%
. 2
 
3.5%
Space Separator
ValueCountFrequency (%)
272
100.0%
Open Punctuation
ValueCountFrequency (%)
( 73
100.0%
Close Punctuation
ValueCountFrequency (%)
) 73
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1645
45.8%
Hangul 1416
39.4%
Common 529
 
14.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.7%
51
 
3.6%
49
 
3.5%
49
 
3.5%
42
 
3.0%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 940
66.4%
Latin
ValueCountFrequency (%)
o 161
 
9.8%
i 150
 
9.1%
e 130
 
7.9%
n 117
 
7.1%
a 115
 
7.0%
t 106
 
6.4%
r 92
 
5.6%
s 77
 
4.7%
d 71
 
4.3%
g 65
 
4.0%
Other values (36) 561
34.1%
Common
ValueCountFrequency (%)
272
51.4%
( 73
 
13.8%
) 73
 
13.8%
: 33
 
6.2%
/ 14
 
2.6%
1 14
 
2.6%
0 14
 
2.6%
2 10
 
1.9%
7 4
 
0.8%
" 4
 
0.8%
Other values (9) 18
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2174
60.6%
Hangul 1416
39.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
272
 
12.5%
o 161
 
7.4%
i 150
 
6.9%
e 130
 
6.0%
n 117
 
5.4%
a 115
 
5.3%
t 106
 
4.9%
r 92
 
4.2%
s 77
 
3.5%
( 73
 
3.4%
Other values (55) 881
40.5%
Hangul
ValueCountFrequency (%)
74
 
5.2%
54
 
3.8%
52
 
3.7%
51
 
3.6%
49
 
3.5%
49
 
3.5%
42
 
3.0%
37
 
2.6%
34
 
2.4%
34
 
2.4%
Other values (154) 940
66.4%

colCnt
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 173
Distinct (%): 53.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73865.209
Minimum: 0
Maximum: 2951558
Zeros: 6
Zeros (%): 1.9%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 1279
median: 6000
Q3: 7999
95-th percentile: 87424
Maximum: 2951558
Range: 2951558
Interquartile range (IQR): 6720

Descriptive statistics

Standard deviation: 421073.01
Coefficient of variation (CV): 5.7005594
Kurtosis: 41.772907
Mean: 73865.209
Median Absolute Deviation (MAD): 3619
Skewness: 6.5702359
Sum: 23710732
Variance: 1.7730248 × 10^11
Monotonicity: Not monotonic
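
Several of the colCnt figures are derived from others in this section, so they can be cross-checked without the raw data. The sketch below works only from the reported numbers (not from the dataset itself), so small differences from rounding are expected.

```python
# Figures copied from the colCnt statistics in this report.
n_observations = 321
total = 23710732          # Sum
std_dev = 421073.01       # Standard deviation
q1, q3 = 1279, 7999       # Quartiles

mean = total / n_observations   # reported as 73865.209
cv = std_dev / mean             # coefficient of variation, reported as 5.7005594
variance = std_dev ** 2         # reported as 1.7730248e11
iqr = q3 - q1                   # reported as 6720

print(round(mean, 3), round(cv, 7), f"{variance:.7e}", iqr)
```

A coefficient of variation well above 1, together with the large skewness, indicates that a few very large counts dominate the mean; the median (6000) is the more representative central value here.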
[Figure: Histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
7785 | 34 | 10.6%
5900 | 8 | 2.5%
3893 | 7 | 2.2%
7802 | 7 | 2.2%
11061 | 6 | 1.9%
203408 | 6 | 1.9%
8658 | 6 | 1.9%
0 | 6 | 1.9%
2951558 | 6 | 1.9%
4576 | 6 | 1.9%
Other values (163) | 229 | 71.3%

Value | Count | Frequency (%)
0 | 6 | 1.9%
1 | 1 | 0.3%
2 | 3 | 0.9%
4 | 1 | 0.3%
11 | 1 | 0.3%
13 | 1 | 0.3%
15 | 1 | 0.3%
22 | 1 | 0.3%
23 | 6 | 1.9%
24 | 1 | 0.3%

Value | Count | Frequency (%)
2951558 | 6 | 1.9%
2430429 | 1 | 0.3%
203408 | 6 | 1.9%
87424 | 4 | 1.2%
76052 | 1 | 0.3%
38147 | 5 | 1.6%
30165 | 3 | 0.9%
29842 | 1 | 0.3%
29814 | 1 | 0.3%
29565 | 1 | 0.3%

dispFormat
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 320
Missing (%): 99.7%
Memory size: 2.6 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 10
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: YYYY-MM-DD

Value | Count | Frequency (%)
yyyy-mm-dd | 1 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
Y | 4 | 40.0%
- | 2 | 20.0%
M | 2 | 20.0%
D | 2 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8 | 80.0%
Dash Punctuation | 2 | 20.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
Y | 4 | 50.0%
M | 2 | 25.0%
D | 2 | 25.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8 | 80.0%
Common | 2 | 20.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
Y | 4 | 50.0%
M | 2 | 25.0%
D | 2 | 25.0%
Common
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
Y | 4 | 40.0%
- | 2 | 20.0%
M | 2 | 20.0%
D | 2 | 20.0%

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]
          NUM     gpId    gpNm    tblId   tblNm   dataType  colCnt
NUM       1.000   0.960   0.960   0.983   0.981   0.417     0.414
gpId      0.960   1.000   1.000   1.000   1.000   0.589     0.909
gpNm      0.960   1.000   1.000   1.000   1.000   0.589     0.909
tblId     0.983   1.000   1.000   1.000   1.000   0.648     0.854
tblNm     0.981   1.000   1.000   1.000   1.000   0.639     0.854
dataType  0.417   0.589   0.589   0.648   0.639   1.000     0.000
colCnt    0.414   0.909   0.909   0.854   0.854   0.000     1.000
[Figure: correlation heatmap]
          gpNm    tblNm   dataType  gpId    tblId
gpNm      1.000   0.983   0.325     1.000   0.982
tblNm     0.983   1.000   0.400     0.983   0.998
dataType  0.325   0.400   1.000     0.325   0.406
gpId      1.000   0.983   0.325     1.000   0.982
tblId     0.982   0.998   0.406     0.982   1.000
[Figure: correlation heatmap]
          NUM     colCnt  gpId    gpNm    tblId   tblNm   dataType
NUM       1.000   -0.222  0.795   0.795   0.850   0.852   0.281
colCnt    -0.222  1.000   0.671   0.671   0.644   0.647   0.000
gpId      0.795   0.671   1.000   1.000   0.982   0.983   0.325
gpNm      0.795   0.671   1.000   1.000   0.982   0.983   0.325
tblId     0.850   0.644   0.982   0.982   1.000   0.998   0.406
tblNm     0.852   0.647   0.983   0.983   0.998   1.000   0.400
dataType  0.281   0.000   0.325   0.325   0.406   0.400   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | PT_NM | 성명 | STRING | 성명 | 11061 | <NA>
1 | 2 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | BRTH_YMD | 생년월일 | DATE | 생년월일 | 11061 | <NA>
2 | 3 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | SEX_CD | 성별 | STRING | 성별 | 11061 | <NA>
3 | 4 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | DIAG_YMD | 최초 진단일자 | DATE | 최초 진단일자 | 10513 | <NA>
4 | 5 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | DIAG_AGE | 최초 진단시 나이 | INTEGER | 최초 진단시 나이 | 10513 | <NA>
5 | 6 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | DIAG_ENM | 최초 진단영문명 | STRING | 최초 진단영문명 | 10513 | <NA>
6 | 7 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | DIAG_KNM | 최초 진단한글명 | STRING | 최초 진단한글명 | 10513 | <NA>
7 | 8 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | OPRT_YMD | 최초 수술일자 | DATE | 최초 수술일자 | 7572 | <NA>
8 | 9 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | OPRT_NM | 수술명 | STRING | 수술명 | 7572 | <NA>
9 | 10 | _THRD_Summary | Summary | _THRD_PT_TRGT | 갑성선암 대상자 | MDCL_YMD | 최초 초진일자 | DATE | 최초 초진일자 | 9368 | <NA>
(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
311 | 312 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_INSM_YN | 과거병력불면증여부 | STRING | 과거병력불면증여부 | 7785 | <NA>
312 | 313 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_CADZ_YN | 과거병력심장질환여부 | STRING | 과거병력심장질환여부 | 7785 | <NA>
313 | 314 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_ETC_YN | 과거병력기타여부 | STRING | 과거병력기타여부 | 7785 | <NA>
314 | 315 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_HTN_CMNT | 과거병력고혈압내용 | STRING | 과거병력고혈압내용 | 2056 | <NA>
315 | 316 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_DM_CMNT | 과거병력당뇨내용 | STRING | 과거병력당뇨내용 | 345 | <NA>
316 | 317 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_CADZ_CMNT | 과거병력심장질환내용 | STRING | 과거병력심장질환내용 | 68 | <NA>
317 | 318 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | PHIS_ETC_CMNT | 과거병력기타내용 | STRING | 과거병력기타내용 | 1096 | <NA>
318 | 319 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | MAIN_SYMP_YN | 주증상 | STRING | 주증상 | 3776 | <NA>
319 | 320 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | MAIN_SYMP_CMNT | 주증상 상세내용 | STRING | 주증상 상세내용 | 3553 | <NA>
320 | 321 | _THRD_HLTH | 기타 | _THRD_MR_HLTH | 갑상선암 환자건강정보 | LDNG_YMD | 적재일자 | DATE | 적재일자 | 7785 | <NA>