Overview

Dataset statistics

Number of variables: 11
Number of observations: 362
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 31.9 KiB
Average record size in memory: 90.4 B

Variable types

Numeric: 2
Categorical: 5
Text: 4

Dataset

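Description: Provides PRE위암_라이브러리_메타정보 (metadata for the PRE gastric-cancer library): the data items to be provided, their types and sizes, the number of records per item, and so on.
Author: 국립암센터 (National Cancer Center, Korea)
URL: https://www.data.go.kr/data/15073862/fileData.do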

Alerts

분류명 (category name) is highly overall correlated with 순번 and 3 other fields [High correlation]
테이블ID (table ID) is highly overall correlated with 순번 and 4 other fields [High correlation]
분류ID (category ID) is highly overall correlated with 순번 and 3 other fields [High correlation]
테이블명 (table name) is highly overall correlated with 순번 and 4 other fields [High correlation]
순번 (sequence number) is highly overall correlated with 분류ID and 3 other fields [High correlation]
컬럼데이터수 (number of data values per column) is highly overall correlated with 테이블ID and 1 other field [High correlation]
순번 has unique values [Unique]
컬럼데이터수 has 17 (4.7%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 08:18:51.791257
Analysis finished: 2023-12-12 08:18:53.439825
Duration: 1.65 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
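The reported duration is simply the difference between the two timestamps, rounded to two decimals. A quick check with Python's standard library, using the timestamps copied from above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 08:18:51.791257")
finished = datetime.fromisoformat("2023-12-12 08:18:53.439825")

# timedelta -> float seconds, then round for display as the report does.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```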

Variables

순번 (sequence number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 362
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.5
Minimum: 1
Maximum: 362
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 19.05
Q1: 91.25
Median: 181.5
Q3: 271.75
95th percentile: 343.95
Maximum: 362
Range: 361
Interquartile range (IQR): 180.5

Descriptive statistics

Standard deviation: 104.64464
Coefficient of variation (CV): 0.57655447
Kurtosis: -1.2
Mean: 181.5
Median Absolute Deviation (MAD): 90.5
Skewness: 0
Sum: 65703
Variance: 10950.5
Monotonicity: Strictly increasing
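Because 순번 is just the integers 1 to 362, every figure above can be recomputed exactly. A sketch with Python's `statistics` module; it assumes the percentiles use linear interpolation at position (n - 1) × p (the rule that reproduces 19.05, 91.25, 271.75 and 343.95 here) and that the variance uses the sample (n - 1) denominator:

```python
import statistics

# 순번 is the sequence 1..362: strictly increasing, every value unique.
seq = list(range(1, 363))

mean = statistics.fmean(seq)
variance = statistics.variance(seq)      # sample variance, n - 1 denominator
stdev = statistics.stdev(seq)
cv = stdev / mean                        # coefficient of variation

# method="inclusive" interpolates linearly at position (n - 1) * p.
q1, median, q3 = statistics.quantiles(seq, n=4, method="inclusive")
ventiles = statistics.quantiles(seq, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

# Unscaled median absolute deviation around the median.
mad = statistics.median(abs(x - median) for x in seq)

strictly_increasing = all(a < b for a, b in zip(seq, seq[1:]))

print(mean, variance, round(stdev, 5), round(cv, 8))
print(p5, q1, median, q3, p95, q3 - q1, mad, strictly_increasing)
```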
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1 | 1 | 0.3%
250 | 1 | 0.3%
248 | 1 | 0.3%
247 | 1 | 0.3%
246 | 1 | 0.3%
245 | 1 | 0.3%
244 | 1 | 0.3%
243 | 1 | 0.3%
242 | 1 | 0.3%
241 | 1 | 0.3%
Other values (352) | 352 | 97.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.3%
2 | 1 | 0.3%
3 | 1 | 0.3%
4 | 1 | 0.3%
5 | 1 | 0.3%
6 | 1 | 0.3%
7 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
10 | 1 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
362 | 1 | 0.3%
361 | 1 | 0.3%
360 | 1 | 0.3%
359 | 1 | 0.3%
358 | 1 | 0.3%
357 | 1 | 0.3%
356 | 1 | 0.3%
355 | 1 | 0.3%
354 | 1 | 0.3%
353 | 1 | 0.3%

분류ID (category ID)
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

EX | 89
PT | 85
PTH | 56
OPRT | 50
TRTM | 40
Other values (2) | 42

Length

Max length: 4
Median length: 2
Mean length: 2.6850829
Min length: 2
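The length summary depends only on the value counts: the mean length is the count-weighted average of the code lengths, 972 characters over 362 rows. A small check in Python, with the counts copied from the Common Values table:

```python
import statistics

# 분류ID value counts, copied from the Common Values table.
counts = {"EX": 89, "PT": 85, "PTH": 56, "OPRT": 50, "TRTM": 40, "DG": 36, "DEAD": 6}

n = sum(counts.values())
total_chars = sum(len(code) * count for code, count in counts.items())
mean_length = total_chars / n

# Expand to one length per row so the median reflects row frequencies.
lengths = [len(code) for code, count in counts.items() for _ in range(count)]
median_length = statistics.median(lengths)

print(n, total_chars, round(mean_length, 7), median_length, max(lengths), min(lengths))
```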

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PT
2nd row: PT
3rd row: PT
4th row: PT
5th row: PT

Common Values

Value | Count | Frequency (%)
EX | 89 | 24.6%
PT | 85 | 23.5%
PTH | 56 | 15.5%
OPRT | 50 | 13.8%
TRTM | 40 | 11.0%
DG | 36 | 9.9%
DEAD | 6 | 1.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values]

분류명 (category name)
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

검사 | 89
환자 | 85
병리 | 56
수술 | 50
치료 | 40
Other values (2) | 42

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

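Unique: 0
Unique (%): 0.0%

Sample

1st row: 환자
2nd row: 환자
3rd row: 환자
4th row: 환자
5th row: 환자

Common Values

Value | Count | Frequency (%)
검사 (examination) | 89 | 24.6%
환자 (patient) | 85 | 23.5%
병리 (pathology) | 56 | 15.5%
수술 (surgery) | 50 | 13.8%
치료 (treatment) | 40 | 11.0%
진단 (diagnosis) | 36 | 9.9%
사망 (death) | 6 | 1.7%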
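The 1.000 correlation between 분류ID and 분류명 is what one would expect if the two columns are relabelings of each other. Their value counts are consistent with that. A quick check, where the pairing is an assumption inferred from the Sample rows (PT/환자, TRTM/치료, DEAD/사망) and, for the remaining codes, from matching counts:

```python
# Value counts copied from the two Common Values tables.
id_counts = {"EX": 89, "PT": 85, "PTH": 56, "OPRT": 50, "TRTM": 40, "DG": 36, "DEAD": 6}
name_counts = {"검사": 89, "환자": 85, "병리": 56, "수술": 50, "치료": 40, "진단": 36, "사망": 6}

# Assumed pairing (not stated explicitly in the profile).
id_to_name = {"EX": "검사", "PT": "환자", "PTH": "병리", "OPRT": "수술",
              "TRTM": "치료", "DG": "진단", "DEAD": "사망"}

# One-to-one: no two IDs share a name, and every ID and name is covered.
is_bijection = (len(set(id_to_name.values())) == len(id_to_name)
                and set(id_to_name) == set(id_counts)
                and set(id_to_name.values()) == set(name_counts))
counts_agree = all(id_counts[i] == name_counts[n] for i, n in id_to_name.items())
print(is_bijection, counts_agree)
```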

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values]

테이블ID (table ID)
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

PRE_GSTR_OPRT_NFRM | 50
PRE_GSTR_PT_HLNF | 45
PRE_GSTR_PTH_SRGC | 45
PRE_GSTR_EX_ESD | 41
PRE_GSTR_TRTM_RD | 21
Other values (11) | 160

Length

Max length: 18
Median length: 16
Mean length: 16.455801
Min length: 15

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PRE_GSTR_PT_BSNF
2nd row: PRE_GSTR_PT_BSNF
3rd row: PRE_GSTR_PT_BSNF
4th row: PRE_GSTR_PT_BSNF
5th row: PRE_GSTR_PT_BSNF

Common Values

Value | Count | Frequency (%)
PRE_GSTR_OPRT_NFRM | 50 | 13.8%
PRE_GSTR_PT_HLNF | 45 | 12.4%
PRE_GSTR_PTH_SRGC | 45 | 12.4%
PRE_GSTR_EX_ESD | 41 | 11.3%
PRE_GSTR_TRTM_RD | 21 | 5.8%
PRE_GSTR_PT_BSNF | 20 | 5.5%
PRE_GSTR_PT_FMHT | 20 | 5.5%
PRE_GSTR_TRTM_CASB | 19 | 5.2%
PRE_GSTR_EX_ENDS | 17 | 4.7%
PRE_GSTR_EX_DIAG | 16 | 4.4%
Other values (6) | 68 | 18.8%
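Each 테이블ID appears to embed its 분류ID as the third underscore-separated segment (PRE_GSTR_PT_BSNF is in category PT, PRE_GSTR_TRTM_RD in TRTM, PRE_GSTR_DEAD_NFRM in DEAD). That would make the category a function of the table, which is consistent with the perfect 테이블ID/분류ID correlation. A minimal sketch, assuming the PRE_GSTR_<category>_<suffix> pattern holds for all 16 tables; `category_of` is a hypothetical helper:

```python
import re

# Assumed naming pattern: PRE_GSTR_<분류ID>_<suffix>, uppercase letters only.
TABLE_ID = re.compile(r"^PRE_GSTR_(?P<category>[A-Z]+)_(?P<suffix>[A-Z]+)$")

def category_of(table_id: str) -> str:
    """Return the 분류ID segment embedded in a 테이블ID."""
    m = TABLE_ID.match(table_id)
    if m is None:
        raise ValueError(f"unexpected table ID: {table_id}")
    return m.group("category")

for tid in ["PRE_GSTR_OPRT_NFRM", "PRE_GSTR_PT_HLNF", "PRE_GSTR_PTH_SRGC",
            "PRE_GSTR_EX_ESD", "PRE_GSTR_TRTM_RD", "PRE_GSTR_DEAD_NFRM"]:
    print(tid, "->", category_of(tid))
```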

Length

[Figure: histogram of lengths of the category]

테이블명 (table name; 위암 = gastric cancer)
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

PRE_위암_수술_정보 | 50
PRE_위암_환자_건강정보 | 45
PRE_위암_병리_외과 | 45
PRE_위암_검사_ ESD | 41
PRE_위암_치료_방사선 | 21
Other values (11) | 160

Length

Max length: 16
Median length: 14
Mean length: 12.953039
Min length: 12

Unique

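Unique: 0
Unique (%): 0.0%

Sample

1st row: PRE_위암_환자_기본정보
2nd row: PRE_위암_환자_기본정보
3rd row: PRE_위암_환자_기본정보
4th row: PRE_위암_환자_기본정보
5th row: PRE_위암_환자_기본정보

Common Values

Value | Count | Frequency (%)
PRE_위암_수술_정보 (surgery information) | 50 | 13.8%
PRE_위암_환자_건강정보 (patient health information) | 45 | 12.4%
PRE_위암_병리_외과 (surgical pathology) | 45 | 12.4%
PRE_위암_검사_ ESD (examination: ESD) | 41 | 11.3%
PRE_위암_치료_방사선 (treatment: radiotherapy) | 21 | 5.8%
PRE_위암_환자_기본정보 (patient basic information) | 20 | 5.5%
PRE_위암_환자_가족력 (patient family history) | 20 | 5.5%
PRE_위암_치료_항암제 (treatment: anticancer drugs) | 19 | 5.2%
PRE_위암_검사_내시경 (examination: endoscopy) | 17 | 4.7%
PRE_위암_검사_진단 (examination: diagnosis) | 16 | 4.4%
Other values (6) | 68 | 18.8%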

Length

[Figure: histogram of lengths of the category]
Word frequencies (lowercased)
Value | Count | Frequency (%)
pre_위암_수술_정보 | 50 | 12.4%
pre_위암_환자_건강정보 | 45 | 11.2%
pre_위암_병리_외과 | 45 | 11.2%
pre_위암_검사 | 41 | 10.2%
esd | 41 | 10.2%
pre_위암_치료_방사선 | 21 | 5.2%
pre_위암_환자_기본정보 | 20 | 5.0%
pre_위암_환자_가족력 | 20 | 5.0%
pre_위암_치료_항암제 | 19 | 4.7%
pre_위암_검사_내시경 | 17 | 4.2%
Other values (7) | 84 | 20.8%
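The percentages in this word table are taken over words, not rows. The value PRE_위암_검사_ ESD contains a space, so each of its 41 rows contributes two words, giving 362 + 41 = 403 words in total; that is why PRE_위암_수술_정보 drops from 13.8% of rows to 50/403 ≈ 12.4% of words. A sketch of the count, assuming plain lowercase-and-split-on-whitespace tokenization and that the six tables not listed contain no spaces (the totals imply this). The profile shows the first token as pre_위암_검사, so its tokenizer evidently also trims the trailing underscore, which this sketch does not:

```python
from collections import Counter

# Top 테이블명 values and counts, copied from the Common Values table.
counts = {
    "PRE_위암_수술_정보": 50, "PRE_위암_환자_건강정보": 45, "PRE_위암_병리_외과": 45,
    "PRE_위암_검사_ ESD": 41, "PRE_위암_치료_방사선": 21, "PRE_위암_환자_기본정보": 20,
    "PRE_위암_환자_가족력": 20, "PRE_위암_치료_항암제": 19, "PRE_위암_검사_내시경": 17,
    "PRE_위암_검사_진단": 16,
}
other_rows = 68  # "Other values (6)": assumed to be one word per row

words = Counter()
for value, count in counts.items():
    for token in value.lower().split():  # lowercase, split on whitespace
        words[token] += count

total_words = sum(words.values()) + other_rows
print(total_words, words["esd"], round(100 * 50 / total_words, 1))
```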
컬럼ID (column ID)
Text

Distinct: 302
Distinct (%): 83.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

Length

Max length: 29
Median length: 24
Mean length: 15.41989
Min length: 7

Characters and Unicode

Total characters: 5582
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
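The category names in these tables correspond to Unicode general categories (Lu, Pc, Nd and so on). A sketch of how per-category counts can be produced with Python's `unicodedata` module; the name mapping covers only the categories that occur in this profile, and this is an illustration, not necessarily how ydata-profiling implements it:

```python
import unicodedata
from collections import Counter

# Unicode general-category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "No": "Other Number", "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Sm": "Math Symbol", "Zs": "Space Separator",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# The five sample rows of 컬럼ID.
sample = ["CENTER_CD", "IRB_APRV_NO", "PT_SBST_NO", "BSPT_IDGN_AGE", "BSPT_SEX_CD"]
print(category_counts(sample))
```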

Unique

Unique: 298
Unique (%): 82.3%

Sample

1st row: CENTER_CD
2nd row: IRB_APRV_NO
3rd row: PT_SBST_NO
4th row: BSPT_IDGN_AGE
5th row: BSPT_SEX_CD

Word frequencies (lowercased)
Value | Count | Frequency (%)
center_cd | 16 | 4.4%
irb_aprv_no | 16 | 4.4%
pt_sbst_no | 16 | 4.4%
crtn_dt | 16 | 4.4%
dead_ymd | 1 | 0.3%
sgpt_lymp_inva_nm | 1 | 0.3%
sgpt_lymp_inva_cd | 1 | 0.3%
sgpt_srmg_dstl_cncr_txsz_vl | 1 | 0.3%
sgpt_srmg_prox_cncr_txsz_vl | 1 | 0.3%
sgpt_oprt_rmrg_nm | 1 | 0.3%
Other values (292) | 292 | 80.7%

Most occurring characters

Value | Count | Frequency (%)
_ | 984 | 17.6%
T | 536 | 9.6%
C | 436 | 7.8%
S | 385 | 6.9%
D | 380 | 6.8%
N | 363 | 6.5%
R | 328 | 5.9%
E | 294 | 5.3%
P | 286 | 5.1%
M | 246 | 4.4%
Other values (18) | 1344 | 24.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 4597 | 82.4%
Connector Punctuation | 984 | 17.6%
Decimal Number | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 536 | 11.7%
C | 436 | 9.5%
S | 385 | 8.4%
D | 380 | 8.3%
N | 363 | 7.9%
R | 328 | 7.1%
E | 294 | 6.4%
P | 286 | 6.2%
M | 246 | 5.4%
O | 176 | 3.8%
Other values (16) | 1167 | 25.4%

Connector Punctuation
Value | Count | Frequency (%)
_ | 984 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4597 | 82.4%
Common | 985 | 17.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 536 | 11.7%
C | 436 | 9.5%
S | 385 | 8.4%
D | 380 | 8.3%
N | 363 | 7.9%
R | 328 | 7.1%
E | 294 | 6.4%
P | 286 | 6.2%
M | 246 | 5.4%
O | 176 | 3.8%
Other values (16) | 1167 | 25.4%

Common
Value | Count | Frequency (%)
_ | 984 | 99.9%
1 | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5582 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 984 | 17.6%
T | 536 | 9.6%
C | 436 | 7.8%
S | 385 | 6.9%
D | 380 | 6.8%
N | 363 | 6.5%
R | 328 | 5.9%
E | 294 | 5.3%
P | 286 | 5.1%
M | 246 | 4.4%
Other values (18) | 1344 | 24.1%
컬럼명 (column name)
Text

Distinct: 302
Distinct (%): 83.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

Length

Max length: 18
Median length: 14
Mean length: 9.3259669
Min length: 3

Characters and Unicode

Total characters: 3376
Distinct characters: 191
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
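Script and block counts separate the Korean (Hangul) part of a name from its ASCII part, as in IRB승인번호. Python's standard library has no direct accessor for the Unicode Script property, so the sketch below approximates it from character names (`rough_script` and `is_hangul_syllable` are hypothetical helpers); treat it as an illustration under that assumption, not as the profiler's method:

```python
import unicodedata

def rough_script(ch: str) -> str:
    """Rough script guess from the character's Unicode name.

    An approximation, not a full implementation of the Unicode Script property.
    """
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"  # digits, punctuation, spaces, and anything unnamed

def is_hangul_syllable(ch: str) -> bool:
    # Precomposed Hangul syllables occupy U+AC00..U+D7A3.
    return 0xAC00 <= ord(ch) <= 0xD7A3

name = "IRB승인번호"
print([rough_script(c) for c in name])
print(sum(is_hangul_syllable(c) for c in name), sum(c.isascii() for c in name))
```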

Unique

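Unique: 298
Unique (%): 82.3%

Sample

1st row: 센터코드 (center code)
2nd row: IRB승인번호 (IRB approval number)
3rd row: 환자대체번호 (patient substitute number)
4th row: 기본환자진단시연령 (basic patient info: age at diagnosis)
5th row: 기본환자성별코드 (basic patient info: sex code)

Word frequencies (lowercased)
Value | Count | Frequency (%)
센터코드 (center code) | 16 | 4.4%
irb승인번호 (IRB approval number) | 16 | 4.4%
환자대체번호 (patient substitute number) | 16 | 4.4%
생성일시 (creation date and time) | 16 | 4.4%
사망일자 (date of death) | 1 | 0.3%
외과병리림프성침윤명 (surgical pathology: lymphatic invasion name) | 1 | 0.3%
외과병리림프성침윤코드 (surgical pathology: lymphatic invasion code) | 1 | 0.3%
외과병리수술절제면원위암조직크기값 (surgical pathology: distal resection margin, cancer tissue size value) | 1 | 0.3%
외과병리수술절제면근위암조직크기값 (surgical pathology: proximal resection margin, cancer tissue size value) | 1 | 0.3%
외과병리수술절제면명 (surgical pathology: resection margin name) | 1 | 0.3%
Other values (292) | 292 | 80.7%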

Most occurring characters

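Value | Count | Frequency (%)
(unreadable) | 124 | 3.7%
(unreadable) | 123 | 3.6%
(unreadable) | 119 | 3.5%
(unreadable) | 107 | 3.2%
(unreadable) | 94 | 2.8%
(unreadable) | 93 | 2.8%
(unreadable) | 87 | 2.6%
(unreadable) | 84 | 2.5%
(unreadable) | 79 | 2.3%
(unreadable) | 78 | 2.3%
Other values (181) | 2388 | 70.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3177 | 94.1%
Uppercase Letter | 198 | 5.9%
Decimal Number | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unreadable) | 124 | 3.9%
(unreadable) | 123 | 3.9%
(unreadable) | 119 | 3.7%
(unreadable) | 107 | 3.4%
(unreadable) | 94 | 3.0%
(unreadable) | 93 | 2.9%
(unreadable) | 87 | 2.7%
(unreadable) | 84 | 2.6%
(unreadable) | 79 | 2.5%
(unreadable) | 78 | 2.5%
Other values (167) | 2189 | 68.9%

Uppercase Letter
Value | Count | Frequency (%)
E | 45 | 22.7%
D | 42 | 21.2%
S | 38 | 19.2%
I | 22 | 11.1%
B | 17 | 8.6%
R | 16 | 8.1%
G | 3 | 1.5%
M | 3 | 1.5%
C | 3 | 1.5%
O | 3 | 1.5%
Other values (3) | 6 | 3.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3177 | 94.1%
Latin | 198 | 5.9%
Common | 1 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(unreadable) | 124 | 3.9%
(unreadable) | 123 | 3.9%
(unreadable) | 119 | 3.7%
(unreadable) | 107 | 3.4%
(unreadable) | 94 | 3.0%
(unreadable) | 93 | 2.9%
(unreadable) | 87 | 2.7%
(unreadable) | 84 | 2.6%
(unreadable) | 79 | 2.5%
(unreadable) | 78 | 2.5%
Other values (167) | 2189 | 68.9%

Latin
Value | Count | Frequency (%)
E | 45 | 22.7%
D | 42 | 21.2%
S | 38 | 19.2%
I | 22 | 11.1%
B | 17 | 8.6%
R | 16 | 8.1%
G | 3 | 1.5%
M | 3 | 1.5%
C | 3 | 1.5%
O | 3 | 1.5%
Other values (3) | 6 | 3.0%

Common
Value | Count | Frequency (%)
1 | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3177 | 94.1%
ASCII | 199 | 5.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(unreadable) | 124 | 3.9%
(unreadable) | 123 | 3.9%
(unreadable) | 119 | 3.7%
(unreadable) | 107 | 3.4%
(unreadable) | 94 | 3.0%
(unreadable) | 93 | 2.9%
(unreadable) | 87 | 2.7%
(unreadable) | 84 | 2.6%
(unreadable) | 79 | 2.5%
(unreadable) | 78 | 2.5%
Other values (167) | 2189 | 68.9%

ASCII
Value | Count | Frequency (%)
E | 45 | 22.6%
D | 42 | 21.1%
S | 38 | 19.1%
I | 22 | 11.1%
B | 17 | 8.5%
R | 16 | 8.0%
G | 3 | 1.5%
M | 3 | 1.5%
C | 3 | 1.5%
O | 3 | 1.5%
Other values (4) | 7 | 3.5%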

데이터타입 (data type)
Categorical

Distinct: 21
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

VARCHAR(20) | 123
CLOB | 53
VARCHAR(100) | 53
VARCHAR(8) | 30
VARCHAR(10) | 17
Other values (16) | 86

Length

Max length: 13
Median length: 12
Mean length: 9.8895028
Min length: 4

Unique

Unique: 5
Unique (%): 1.4%

Sample

1st row: VARCHAR(20)
2nd row: VARCHAR(50)
3rd row: VARCHAR(10)
4th row: NUMBER(4)
5th row: VARCHAR(20)

Common Values

Value | Count | Frequency (%)
VARCHAR(20) | 123 | 34.0%
CLOB | 53 | 14.6%
VARCHAR(100) | 53 | 14.6%
VARCHAR(8) | 30 | 8.3%
VARCHAR(10) | 17 | 4.7%
DATETIME | 16 | 4.4%
VARCHAR(50) | 16 | 4.4%
VARCHAR(200) | 14 | 3.9%
NUMBER(3) | 14 | 3.9%
VARCHAR(400) | 5 | 1.4%
Other values (11) | 21 | 5.8%
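Each declared type combines a base type with an optional size, so grouping by base type gives a coarser view than the 21 distinct strings. A sketch, assuming every value has the form NAME or NAME(n); `parse_type` is a hypothetical helper, and only the ten listed values are covered (the 21 rows under "Other values" are left out):

```python
import re
from collections import Counter

# Assumed shape: an uppercase type name with an optional "(size)".
TYPE_RE = re.compile(r"^(?P<base>[A-Z]+)(?:\((?P<size>\d+)\))?$")

def parse_type(declared: str):
    """Split 'VARCHAR(20)' into ('VARCHAR', 20); 'CLOB' gives ('CLOB', None)."""
    m = TYPE_RE.match(declared.strip())
    if m is None:
        raise ValueError(f"unrecognised type: {declared!r}")
    size = m.group("size")
    return m.group("base"), int(size) if size is not None else None

# The ten most common types and their counts, from the Common Values table.
common = {"VARCHAR(20)": 123, "CLOB": 53, "VARCHAR(100)": 53, "VARCHAR(8)": 30,
          "VARCHAR(10)": 17, "DATETIME": 16, "VARCHAR(50)": 16, "VARCHAR(200)": 14,
          "NUMBER(3)": 14, "VARCHAR(400)": 5}

by_base = Counter()
for declared, count in common.items():
    base, _size = parse_type(declared)
    by_base[base] += count
print(by_base)
```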

Length

[Figure: histogram of lengths of the category]
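Word frequencies (lowercased; the closing parenthesis does not appear in these tokens)
Value | Count | Frequency (%)
varchar(20 | 123 | 34.0%
clob | 53 | 14.6%
varchar(100 | 53 | 14.6%
varchar(8 | 30 | 8.3%
varchar(10 | 17 | 4.7%
datetime | 16 | 4.4%
varchar(50 | 16 | 4.4%
varchar(200 | 14 | 3.9%
number(3 | 14 | 3.9%
varchar(1000 | 5 | 1.4%
Other values (11) | 21 | 5.8%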
컬럼설명 (column description)
Text

Distinct: 294
Distinct (%): 81.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

Length

Max length: 301
Median length: 96
Mean length: 39.91989
Min length: 7

Characters and Unicode

Total characters: 14451
Distinct characters: 327
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

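Unique: 282
Unique (%): 77.9%

Sample

1st row: 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030
  (Center code (5 digits: XXXXX) / 00030: National Cancer Center, e.g. 00030)
2nd row: 센터별 기준에 따라 생성
  (Generated according to each center's criteria)
3rd row: 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678
  (Personal unique number (10 characters) / assigned separately by each center, e.g. RN12345678)
4th row: 환자의 위암 진단 당시 나이 / 예) 45
  (Patient's age at gastric cancer diagnosis / e.g. 45)
5th row: 환자의 성별코드 / M : 남성, F : 여성, E : ETC 예) M
  (Patient's sex code / M: male, F: female, E: ETC, e.g. M)

Word frequencies (lowercased)
Value | Count | Frequency (%)
(unreadable) | 464 | 14.7%
(unreadable) | 178 | 5.7%
환자의 (patient's) | 53 | 1.7%
free | 47 | 1.5%
text | 47 | 1.5%
1 | 41 | 1.3%
2 | 34 | 1.1%
n | 33 | 1.0%
센터별 (per center) | 32 | 1.0%
00030 | 32 | 1.0%
Other values (820) | 2189 | 69.5%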
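Many descriptions follow a loose "meaning / format 예) example" pattern, where 예) means "e.g.". If one wanted to pull the example values out, a regular expression along these lines would work on the sample rows; `example_value` is a hypothetical helper, and descriptions without the marker simply yield no example:

```python
import re

# "예)" marks an example value; capture everything after it, trimmed.
EXAMPLE_RE = re.compile(r"예\)\s*(.+?)\s*$")

def example_value(description: str):
    """Return the text following '예)', or None if there is no example."""
    m = EXAMPLE_RE.search(description)
    return m.group(1) if m else None

samples = [
    "센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030",
    "센터별 기준에 따라 생성",
    "개인고유번호(10자리) / 센터별 별도부여 예) RN12345678",
    "환자의 위암 진단 당시 나이 / 예) 45",
    "환자의 성별코드 / M : 남성, F : 여성, E : ETC 예) M",
]
for s in samples:
    print(example_value(s))
```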

Most occurring characters

Value | Count | Frequency (%)
" " | 2796 | 19.3%
e | 450 | 3.1%
0 | 444 | 3.1%
t | 376 | 2.6%
/ | 330 | 2.3%
) | 315 | 2.2%
a | 282 | 2.0%
r | 267 | 1.8%
1 | 229 | 1.6%
n | 229 | 1.6%
Other values (317) | 8733 | 60.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4852 | 33.6%
Lowercase Letter | 3320 | 23.0%
Space Separator | 2796 | 19.3%
Decimal Number | 1238 | 8.6%
Uppercase Letter | 1095 | 7.6%
Other Punctuation | 651 | 4.5%
Close Punctuation | 325 | 2.2%
Open Punctuation | 114 | 0.8%
Dash Punctuation | 22 | 0.2%
Math Symbol | 20 | 0.1%
Other values (2) | 18 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unreadable) | 211 | 4.3%
(unreadable) | 157 | 3.2%
(unreadable) | 148 | 3.1%
(unreadable) | 141 | 2.9%
(unreadable) | 128 | 2.6%
(unreadable) | 127 | 2.6%
(unreadable) | 110 | 2.3%
(unreadable) | 104 | 2.1%
(unreadable) | 96 | 2.0%
(unreadable) | 92 | 1.9%
Other values (239) | 3538 | 72.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 450 | 13.6%
t | 376 | 11.3%
a | 282 | 8.5%
r | 267 | 8.0%
n | 229 | 6.9%
o | 225 | 6.8%
i | 215 | 6.5%
s | 186 | 5.6%
c | 148 | 4.5%
l | 146 | 4.4%
Other values (15) | 796 | 24.0%

Uppercase Letter
Value | Count | Frequency (%)
Y | 155 | 14.2%
D | 129 | 11.8%
M | 116 | 10.6%
N | 84 | 7.7%
X | 82 | 7.5%
E | 74 | 6.8%
A | 56 | 5.1%
I | 48 | 4.4%
S | 47 | 4.3%
C | 45 | 4.1%
Other values (14) | 259 | 23.7%

Decimal Number
Value | Count | Frequency (%)
0 | 444 | 35.9%
1 | 229 | 18.5%
2 | 163 | 13.2%
3 | 107 | 8.6%
9 | 74 | 6.0%
5 | 67 | 5.4%
4 | 50 | 4.0%
6 | 38 | 3.1%
7 | 33 | 2.7%
8 | 33 | 2.7%

Other Punctuation
Value | Count | Frequency (%)
/ | 330 | 50.7%
: | 163 | 25.0%
, | 106 | 16.3%
. | 36 | 5.5%
# | 10 | 1.5%
& | 4 | 0.6%
; | 1 | 0.2%
% | 1 | 0.2%

Math Symbol
Value | Count | Frequency (%)
"|" | 13 | 65.0%
~ | 4 | 20.0%
+ | 3 | 15.0%

Close Punctuation
Value | Count | Frequency (%)
) | 315 | 96.9%
] | 10 | 3.1%

Open Punctuation
Value | Count | Frequency (%)
( | 104 | 91.2%
[ | 10 | 8.8%

Space Separator
Value | Count | Frequency (%)
" " | 2796 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 22 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 17 | 100.0%

Other Number
Value | Count | Frequency (%)
² | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5184 | 35.9%
Hangul | 4852 | 33.6%
Latin | 4415 | 30.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(unreadable) | 211 | 4.3%
(unreadable) | 157 | 3.2%
(unreadable) | 148 | 3.1%
(unreadable) | 141 | 2.9%
(unreadable) | 128 | 2.6%
(unreadable) | 127 | 2.6%
(unreadable) | 110 | 2.3%
(unreadable) | 104 | 2.1%
(unreadable) | 96 | 2.0%
(unreadable) | 92 | 1.9%
Other values (239) | 3538 | 72.9%

Latin
Value | Count | Frequency (%)
e | 450 | 10.2%
t | 376 | 8.5%
a | 282 | 6.4%
r | 267 | 6.0%
n | 229 | 5.2%
o | 225 | 5.1%
i | 215 | 4.9%
s | 186 | 4.2%
Y | 155 | 3.5%
c | 148 | 3.4%
Other values (39) | 1882 | 42.6%

Common
Value | Count | Frequency (%)
" " | 2796 | 53.9%
0 | 444 | 8.6%
/ | 330 | 6.4%
) | 315 | 6.1%
1 | 229 | 4.4%
: | 163 | 3.1%
2 | 163 | 3.1%
3 | 107 | 2.1%
, | 106 | 2.0%
( | 104 | 2.0%
Other values (19) | 427 | 8.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9598 | 66.4%
Hangul | 4852 | 33.6%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 2796 | 29.1%
e | 450 | 4.7%
0 | 444 | 4.6%
t | 376 | 3.9%
/ | 330 | 3.4%
) | 315 | 3.3%
a | 282 | 2.9%
r | 267 | 2.8%
1 | 229 | 2.4%
n | 229 | 2.4%
Other values (67) | 3880 | 40.4%

Hangul
Value | Count | Frequency (%)
(unreadable) | 211 | 4.3%
(unreadable) | 157 | 3.2%
(unreadable) | 148 | 3.1%
(unreadable) | 141 | 2.9%
(unreadable) | 128 | 2.6%
(unreadable) | 127 | 2.6%
(unreadable) | 110 | 2.3%
(unreadable) | 104 | 2.1%
(unreadable) | 96 | 2.0%
(unreadable) | 92 | 1.9%
Other values (239) | 3538 | 72.9%

None
Value | Count | Frequency (%)
² | 1 | 100.0%

컬럼데이터수 (number of data values per column)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 120
Distinct (%): 33.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61272.967
Minimum: 0
Maximum: 593455
Zeros: 17
Zeros (%): 4.7%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 0
5th percentile: 7.45
Q1: 1089
Median: 4189
Q3: 13610
95th percentile: 555037
Maximum: 593455
Range: 593455
Interquartile range (IQR): 12521
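A 5th percentile of 7.45 looks odd for integer counts, but it follows from linear interpolation: with n = 362 the (0-based) position is (362 - 1) × 0.05 = 18.05, which falls between the 19th smallest value (7) and the 20th (16), so 7 + 0.05 × 9 = 7.45. A sketch assuming that interpolation rule, using only the smallest values shown in this profile (17 zeros followed by 2, 7, 16, ...); `percentile_linear` is a hypothetical helper:

```python
import math

def percentile_linear(sorted_values, p, n):
    """Percentile by linear interpolation at 0-based position (n - 1) * p.

    Only the order statistics around that position are needed, so a sorted
    prefix of the data suffices when p is small; n is the full sample size.
    """
    h = (n - 1) * p
    lo = math.floor(h)
    frac = h - lo
    return sorted_values[lo] + frac * (sorted_values[lo + 1] - sorted_values[lo])

# Smallest 컬럼데이터수 values: 17 zeros, then the next distinct values in order.
prefix = [0] * 17 + [2, 7, 16, 53, 94, 101, 112, 179, 217]
value = percentile_linear(prefix, 0.05, n=362)
print(round(value, 2))
```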

Descriptive statistics

Standard deviation: 156778.23
Coefficient of variation (CV): 2.5586851
Kurtosis: 6.2687166
Mean: 61272.967
Median Absolute Deviation (MAD): 3100
Skewness: 2.7990386
Sum: 22180814
Variance: 2.4579413 × 10^10
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
4189 | 29 | 8.0%
1089 | 24 | 6.6%
5632 | 23 | 6.4%
0 | 17 | 4.7%
1570 | 16 | 4.4%
39892 | 14 | 3.9%
758 | 13 | 3.6%
5899 | 12 | 3.3%
555037 | 12 | 3.3%
593455 | 11 | 3.0%
Other values (110) | 191 | 52.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 17 | 4.7%
2 | 1 | 0.3%
7 | 1 | 0.3%
16 | 1 | 0.3%
53 | 1 | 0.3%
94 | 1 | 0.3%
101 | 1 | 0.3%
112 | 1 | 0.3%
179 | 1 | 0.3%
217 | 1 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
593455 | 11 | 3.0%
591950 | 1 | 0.3%
589023 | 1 | 0.3%
588863 | 1 | 0.3%
555037 | 12 | 3.3%
554975 | 1 | 0.3%
497982 | 1 | 0.3%
479066 | 1 | 0.3%
475950 | 1 | 0.3%
196419 | 11 | 3.0%
표시형식 (display format)
Text

Distinct: 57
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB

Length

Max length: 324
Median length: 173
Mean length: 15.444751
Min length: 2

Characters and Unicode

Total characters: 5591
Distinct characters: 215
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

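Unique: 39
Unique (%): 10.8%

Sample

1st row: 문자(5) : XXXXX (character(5): XXXXX)
2nd row: 텍스트 (text)
3rd row: 문자(10) : XXXXXXXXXX (character(10): XXXXXXXXXX)
4th row: 숫자 (number)
5th row: M 남성 | F 여성 | E ETC (M male | F female | E ETC)

Word frequencies (lowercased)
Value | Count | Frequency (%)
(unreadable) | 283 | 19.7%
텍스트 (text) | 165 | 11.5%
free | 53 | 3.7%
1 | 33 | 2.3%
2 | 33 | 2.3%
yyyymmdd | 30 | 2.1%
무응답 (no response) | 30 | 2.1%
y | 29 | 2.0%
(unreadable) | 29 | 2.0%
n | 29 | 2.0%
Other values (251) | 724 | 50.3%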
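Some 표시형식 values enumerate coded answers as "CODE label" pairs separated by "|", for example M 남성 | F 여성 | E ETC (M male, F female, E ETC). A hypothetical parser for that shape, assuming a single space between each code and its label:

```python
def parse_code_list(display_format: str) -> dict:
    """Parse 'CODE label | CODE label | ...' into {CODE: label}.

    Assumes each entry is a code, one space, then its label.
    """
    codes = {}
    for entry in display_format.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing separators
        code, _, label = entry.partition(" ")
        codes[code] = label.strip()
    return codes

print(parse_code_list("M 남성 | F 여성 | E ETC"))
```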

Most occurring characters

Value | Count | Frequency (%)
" " | 1077 | 19.3%
"|" | 252 | 4.5%
e | 250 | 4.5%
X | 241 | 4.3%
Y | 217 | 3.9%
(unreadable) | 167 | 3.0%
(unreadable) | 165 | 3.0%
(unreadable) | 165 | 3.0%
r | 158 | 2.8%
M | 144 | 2.6%
Other values (205) | 2755 | 49.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1331 | 23.8%
Other Letter | 1269 | 22.7%
Space Separator | 1077 | 19.3%
Uppercase Letter | 1016 | 18.2%
Decimal Number | 437 | 7.8%
Math Symbol | 254 | 4.5%
Other Punctuation | 67 | 1.2%
Open Punctuation | 50 | 0.9%
Close Punctuation | 50 | 0.9%
Dash Punctuation | 39 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unreadable) | 167 | 13.2%
(unreadable) | 165 | 13.0%
(unreadable) | 165 | 13.0%
(unreadable) | 61 | 4.8%
(unreadable) | 33 | 2.6%
(unreadable) | 31 | 2.4%
(unreadable) | 31 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
Other values (139) | 526 | 41.4%

Lowercase Letter
Value | Count | Frequency (%)
e | 250 | 18.8%
r | 158 | 11.9%
a | 114 | 8.6%
t | 110 | 8.3%
o | 103 | 7.7%
n | 98 | 7.4%
i | 82 | 6.2%
l | 67 | 5.0%
s | 61 | 4.6%
c | 52 | 3.9%
Other values (14) | 236 | 17.7%

Uppercase Letter
Value | Count | Frequency (%)
X | 241 | 23.7%
Y | 217 | 21.4%
M | 144 | 14.2%
D | 105 | 10.3%
F | 57 | 5.6%
S | 43 | 4.2%
N | 40 | 3.9%
H | 39 | 3.8%
I | 28 | 2.8%
E | 18 | 1.8%
Other values (11) | 84 | 8.3%

Decimal Number
Value | Count | Frequency (%)
0 | 120 | 27.5%
1 | 91 | 20.8%
9 | 61 | 14.0%
2 | 52 | 11.9%
3 | 37 | 8.5%
5 | 28 | 6.4%
4 | 16 | 3.7%
8 | 13 | 3.0%
6 | 11 | 2.5%
7 | 8 | 1.8%

Math Symbol
Value | Count | Frequency (%)
"|" | 252 | 99.2%
+ | 2 | 0.8%

Other Punctuation
Value | Count | Frequency (%)
: | 66 | 98.5%
/ | 1 | 1.5%

Open Punctuation
Value | Count | Frequency (%)
( | 43 | 86.0%
[ | 7 | 14.0%

Close Punctuation
Value | Count | Frequency (%)
) | 43 | 86.0%
] | 7 | 14.0%

Space Separator
Value | Count | Frequency (%)
" " | 1077 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 39 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2347 | 42.0%
Common | 1975 | 35.3%
Hangul | 1269 | 22.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(unreadable) | 167 | 13.2%
(unreadable) | 165 | 13.0%
(unreadable) | 165 | 13.0%
(unreadable) | 61 | 4.8%
(unreadable) | 33 | 2.6%
(unreadable) | 31 | 2.4%
(unreadable) | 31 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
Other values (139) | 526 | 41.4%

Latin
Value | Count | Frequency (%)
e | 250 | 10.7%
X | 241 | 10.3%
Y | 217 | 9.2%
r | 158 | 6.7%
M | 144 | 6.1%
a | 114 | 4.9%
t | 110 | 4.7%
D | 105 | 4.5%
o | 103 | 4.4%
n | 98 | 4.2%
Other values (35) | 807 | 34.4%

Common
Value | Count | Frequency (%)
" " | 1077 | 54.5%
"|" | 252 | 12.8%
0 | 120 | 6.1%
1 | 91 | 4.6%
: | 66 | 3.3%
9 | 61 | 3.1%
2 | 52 | 2.6%
( | 43 | 2.2%
) | 43 | 2.2%
- | 39 | 2.0%
Other values (11) | 131 | 6.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4322 | 77.3%
Hangul | 1269 | 22.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 1077 | 24.9%
"|" | 252 | 5.8%
e | 250 | 5.8%
X | 241 | 5.6%
Y | 217 | 5.0%
r | 158 | 3.7%
M | 144 | 3.3%
0 | 120 | 2.8%
a | 114 | 2.6%
t | 110 | 2.5%
Other values (56) | 1639 | 37.9%

Hangul
Value | Count | Frequency (%)
(unreadable) | 167 | 13.2%
(unreadable) | 165 | 13.0%
(unreadable) | 165 | 13.0%
(unreadable) | 61 | 4.8%
(unreadable) | 33 | 2.6%
(unreadable) | 31 | 2.4%
(unreadable) | 31 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
(unreadable) | 30 | 2.4%
Other values (139) | 526 | 41.4%

Interactions

[Figure: interaction plots between the numeric variables 순번 and 컬럼데이터수 (4 panels)]

Correlations

Correlation table (8 variables)
Variable | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 데이터타입 | 컬럼데이터수 | 표시형식
순번 | 1.000 | 0.909 | 0.909 | 0.952 | 0.952 | 0.322 | 0.702 | 0.404
분류ID | 0.909 | 1.000 | 1.000 | 1.000 | 1.000 | 0.340 | 0.441 | 0.327
분류명 | 0.909 | 1.000 | 1.000 | 1.000 | 1.000 | 0.340 | 0.441 | 0.327
테이블ID | 0.952 | 1.000 | 1.000 | 1.000 | 1.000 | 0.430 | 0.883 | 0.261
테이블명 | 0.952 | 1.000 | 1.000 | 1.000 | 1.000 | 0.430 | 0.883 | 0.261
데이터타입 | 0.322 | 0.340 | 0.340 | 0.430 | 0.430 | 1.000 | 0.353 | 0.910
컬럼데이터수 | 0.702 | 0.441 | 0.441 | 0.883 | 0.883 | 0.353 | 1.000 | 0.679
표시형식 | 0.404 | 0.327 | 0.327 | 0.261 | 0.261 | 0.910 | 0.679 | 1.000

Correlation table (categorical variables only)
Variable | 데이터타입 | 분류명 | 테이블ID | 분류ID | 테이블명
데이터타입 | 1.000 | 0.131 | 0.145 | 0.131 | 0.145
분류명 | 0.131 | 1.000 | 0.987 | 1.000 | 0.987
테이블ID | 0.145 | 0.987 | 1.000 | 0.987 | 1.000
분류ID | 0.131 | 1.000 | 0.987 | 1.000 | 0.987
테이블명 | 0.145 | 0.987 | 1.000 | 0.987 | 1.000

Correlation table (7 variables)
Variable | 순번 | 컬럼데이터수 | 분류ID | 분류명 | 테이블ID | 테이블명 | 데이터타입
순번 | 1.000 | -0.284 | 0.768 | 0.768 | 0.787 | 0.787 | 0.122
컬럼데이터수 | -0.284 | 1.000 | 0.300 | 0.300 | 0.689 | 0.689 | 0.177
분류ID | 0.768 | 0.300 | 1.000 | 1.000 | 0.987 | 0.987 | 0.131
분류명 | 0.768 | 0.300 | 1.000 | 1.000 | 0.987 | 0.987 | 0.131
테이블ID | 0.787 | 0.689 | 0.987 | 0.987 | 1.000 | 1.000 | 0.145
테이블명 | 0.787 | 0.689 | 0.987 | 0.987 | 1.000 | 1.000 | 0.145
데이터타입 | 0.122 | 0.177 | 0.131 | 0.131 | 0.145 | 0.145 | 1.000
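The High correlation alerts in the Overview can be reproduced from the last correlation table (7 variables): flag every other variable whose absolute coefficient exceeds a threshold, then report the first partner plus the number of others. The sketch assumes a threshold of 0.5; that value is inferred because it reproduces all six alerts, not read from the downloaded config.json:

```python
# Last correlation table (7 variables), values copied from the report.
NAMES = ["순번", "컬럼데이터수", "분류ID", "분류명", "테이블ID", "테이블명", "데이터타입"]
MATRIX = [
    [1.000, -0.284, 0.768, 0.768, 0.787, 0.787, 0.122],
    [-0.284, 1.000, 0.300, 0.300, 0.689, 0.689, 0.177],
    [0.768, 0.300, 1.000, 1.000, 0.987, 0.987, 0.131],
    [0.768, 0.300, 1.000, 1.000, 0.987, 0.987, 0.131],
    [0.787, 0.689, 0.987, 0.987, 1.000, 1.000, 0.145],
    [0.787, 0.689, 0.987, 0.987, 1.000, 1.000, 0.145],
    [0.122, 0.177, 0.131, 0.131, 0.145, 0.145, 1.000],
]
THRESHOLD = 0.5  # assumed alert threshold

def high_correlation_partners(i):
    """Other variables whose |coefficient| with variable i exceeds THRESHOLD."""
    return [NAMES[j] for j, r in enumerate(MATRIX[i]) if j != i and abs(r) > THRESHOLD]

alerts = {}
for i, name in enumerate(NAMES):
    partners = high_correlation_partners(i)
    if partners:
        n_other = len(partners) - 1
        noun = "field" if n_other == 1 else "fields"
        alerts[name] = (f"{name} is highly overall correlated with "
                        f"{partners[0]} and {n_other} other {noun}")

for message in alerts.values():
    print(message)
```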

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Row | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 컬럼ID | 컬럼명 | 데이터타입 | 컬럼설명 | 컬럼데이터수 | 표시형식
0 | 1 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | CENTER_CD | 센터코드 | VARCHAR(20) | 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030 | 13610 | 문자(5) : XXXXX
1 | 2 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | IRB_APRV_NO | IRB승인번호 | VARCHAR(50) | 센터별 기준에 따라 생성 | 13610 | 텍스트
2 | 3 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | PT_SBST_NO | 환자대체번호 | VARCHAR(10) | 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678 | 13610 | 문자(10) : XXXXXXXXXX
3 | 4 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_IDGN_AGE | 기본환자진단시연령 | NUMBER(4) | 환자의 위암 진단 당시 나이 / 예) 45 | 13610 | 숫자
4 | 5 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_SEX_CD | 기본환자성별코드 | VARCHAR(20) | 환자의 성별코드 / M : 남성, F : 여성, E : ETC 예) M | 13610 | "M 남성 | F 여성 | E ETC"
5 | 6 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_SEX_CD_ETC_CONT | 기본환자성별코드기타내용 | CLOB | 환자성별코드기타내용(환자성별코드 기타 : E 인 경우) / free text | 0 | Free 텍스트
6 | 7 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_ANCN_TRTM_STRT_YMD | 기본환자최초항암치료시작일자 | VARCHAR(8) | 환자가 위암으로 인해 처방받은 최초의 항암제 처방일자 / YYYYMMDD 예)20200101 | 11214 | YYYYMMDD
7 | 8 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_YMD | 기본환자최초수술일자 | VARCHAR(8) | 위암으로 진단 받은 후 첫번째 수술일자 / YYYYMMDD 예)20200101 | 5014 | YYYYMMDD
8 | 9 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_CD | 기본환자최초수술코드 | VARCHAR(20) | 위암으로 진단 받은 후 첫번째 수술코드 / 예) 20006299 | 5015 | 센터내 수술코드
9 | 10 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_NM | 기본환자최초수술명 | VARCHAR(1000) | 위암으로 진단 받은 후 첫번째 수술명 / 예) Laparoscopy Assisted Distal Gastrectomy(림프절 청소 포함) | 5015 | 텍스트

Last rows
Row | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 컬럼ID | 컬럼명 | 데이터타입 | 컬럼설명 | 컬럼데이터수 | 표시형식
352 | 353 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_TOTL_CGY | 방사선치료총선량 | NUMBER(5) | 방사선 치료 시 총 누적선량 / 정수값 예) 6500 | 758 | 숫자
353 | 354 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_TOTL_TRTM_NT | 방사선치료총치료횟수 | NUMBER(5) | 방사선 치료 총 실시 횟수 / 정수값 예) 18 | 758 | 숫자
354 | 355 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_SMNT_CONT | 방사선치료평가내용 | CLOB | 방사선치료 평가내용 / free text | 0 | Free 텍스트
355 | 356 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | CRTN_DT | 생성일시 | DATETIME | 생성일시 DEFAULT current_timestamp() | 758 | YYYY-MM-DD HH:MI:SS
356 | 357 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | CENTER_CD | 센터코드 | VARCHAR(20) | 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030 | 665 | 문자(5) : XXXXX
357 | 358 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | IRB_APRV_NO | IRB승인번호 | VARCHAR(50) | 센터별 기준에 따라 생성 | 665 | 텍스트
358 | 359 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | PT_SBST_NO | 환자대체번호 | VARCHAR(10) | 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678 | 665 | 문자(10) : XXXXXXXXXX
359 | 360 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | DEAD_YMD | 사망일자 | VARCHAR(8) | 사망진단서에 기재된 사망일자 / YYYYMMDD 예)20200101 | 665 | YYYYMMDD
360 | 361 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | DEAD_MAIN_CAUS_CONT | 사망주원인내용 | CLOB | 사망진단서에 기재된 사망원인 / free text 예) Advanced Gastric Cancer | 665 | Free 텍스트
361 | 362 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | CRTN_DT | 생성일시 | DATETIME | 생성일시 DEFAULT current_timestamp() | 665 | YYYY-MM-DD HH:MI:SS