Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 362 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 31.9 KiB |
Average record size in memory | 90.4 B |
Variable types
Numeric | 2 |
---|---|
Categorical | 5 |
Text | 4 |
Dataset
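Description | Provides the PRE gastric cancer (위암) library metadata: the data items to be provided, their types, sizes, per-item record counts, and so on. |
---|---|
Author | 국립암센터 (National Cancer Center, Korea) |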
URL | https://www.data.go.kr/data/15073862/fileData.do |
분류명 is highly overall correlated with 순번 and 3 other fields | High correlation |
테이블ID is highly overall correlated with 순번 and 4 other fields | High correlation |
분류ID is highly overall correlated with 순번 and 3 other fields | High correlation |
테이블명 is highly overall correlated with 순번 and 4 other fields | High correlation |
순번 is highly overall correlated with 분류ID and 3 other fields | High correlation |
컬럼데이터수 is highly overall correlated with 테이블ID and 1 other field | High correlation |
순번 has unique values | Unique |
컬럼데이터수 has 17 (4.7%) zeros | Zeros |
Reproduction
Analysis started | 2023-12-12 08:18:51.791257 |
---|---|
Analysis finished | 2023-12-12 08:18:53.439825 |
Duration | 1.65 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
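For readers who want to regenerate a report of this kind, the following is a minimal sketch, assuming the file behind the URL above has been downloaded as a CSV. The helper name `build_report`, the default output name, and the `cp949` encoding are assumptions made for illustration and are not stated in the report. `ProfileReport` and `to_file` are the ydata-profiling entry points.

```python
from pathlib import Path


def build_report(csv_path: str, out_html: str = "report.html", encoding: str = "cp949") -> Path:
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it if the file is UTF-8.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(out_html)
    return Path(out_html)
```

Calling `build_report("metadata.csv")` (a hypothetical local file name) would write `report.html` next to the script. Timings and memory figures will differ from those listed here.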
순번 (sequence number)
Real number (ℝ)
HIGH CORRELATION, UNIQUE
Distinct | 362 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 181.5 |
Minimum | 1 |
---|---|
Maximum | 362 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 3.3 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 19.05 |
Q1 | 91.25 |
median | 181.5 |
Q3 | 271.75 |
95-th percentile | 343.95 |
Maximum | 362 |
Range | 361 |
Interquartile range (IQR) | 180.5 |
Descriptive statistics
Standard deviation | 104.64464 |
---|---|
Coefficient of variation (CV) | 0.57655447 |
Kurtosis | -1.2 |
Mean | 181.5 |
Median Absolute Deviation (MAD) | 90.5 |
Skewness | 0 |
Sum | 65703 |
Variance | 10950.5 |
Monotonicity | Strictly increasing |
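Because 순번 is unique, strictly increasing, and runs from 1 to 362, the figures above can be checked by hand. The sketch below assumes the column is exactly the integers 1..362 and uses only the Python standard library. The sample (n - 1) variance and the "inclusive" quantile method are the choices that reproduce the reported numbers, not something the report states.

```python
import statistics

# Assumption: 순번 holds the integers 1..362, one per observation.
values = list(range(1, 363))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
stdev = statistics.stdev(values)
cv = stdev / mean                        # coefficient of variation

# Linear-interpolation quartiles over the closed range [min, max].
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

for name, value in [
    ("sum", total), ("mean", mean), ("variance", variance),
    ("stdev", stdev), ("cv", cv), ("q1", q1), ("median", median),
    ("q3", q3), ("iqr", iqr), ("mad", mad),
]:
    print(f"{name:<8} {value}")
```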
Value | Count | Frequency (%) |
1 | 1 | 0.3% |
250 | 1 | 0.3% |
248 | 1 | 0.3% |
247 | 1 | 0.3% |
246 | 1 | 0.3% |
245 | 1 | 0.3% |
244 | 1 | 0.3% |
243 | 1 | 0.3% |
242 | 1 | 0.3% |
241 | 1 | 0.3% |
Other values (352) | 352 |
Value | Count | Frequency (%) |
1 | 1 | |
2 | 1 | |
3 | 1 | |
4 | 1 | |
5 | 1 | |
6 | 1 | |
7 | 1 | |
8 | 1 | |
9 | 1 | |
10 | 1 |
Value | Count | Frequency (%) |
362 | 1 | |
361 | 1 | |
360 | 1 | |
359 | 1 | |
358 | 1 | |
357 | 1 | |
356 | 1 | |
355 | 1 | |
354 | 1 | |
353 | 1 |
분류ID (category ID)
Categorical
HIGH CORRELATION
 
Distinct | 7 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.6850829 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | PT |
---|---|
2nd row | PT |
3rd row | PT |
4th row | PT |
5th row | PT |
Common Values
Value | Count | Frequency (%) |
EX | 89 | 24.6% |
PT | 85 | 23.5% |
PTH | 56 | 15.5% |
OPRT | 50 | 13.8% |
TRTM | 40 | 11.0% |
DG | 36 | 9.9% |
DEAD | 6 | 1.7% |
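The Frequency (%) column is each count divided by the 362 observations, and the mean length reported for this variable follows from the same counts. A minimal sketch of that arithmetic, using the counts from the table above (the variable names are illustrative):

```python
# Counts copied from the Common Values table for 분류ID.
counts = {"EX": 89, "PT": 85, "PTH": 56, "OPRT": 50, "TRTM": 40, "DG": 36, "DEAD": 6}

total = sum(counts.values())  # should match the number of observations

# Frequency (%) = count / total, rounded to one decimal place.
frequency_pct = {value: round(100 * n / total, 1) for value, n in counts.items()}

# Mean length weights each value's character length by how often it occurs.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

for value, n in counts.items():
    print(f"{value:<5} {n:>3} {frequency_pct[value]:>5.1f}%")
print(f"mean length = {mean_length:.7f}")
```

The same calculation applies to every Value / Count table in this report where the frequency cell is blank.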
Words
Value | Count | Frequency (%) |
ex | 89 | |
pt | 85 | |
pth | 56 | |
oprt | 50 | |
trtm | 40 | |
dg | 36 | |
dead | 6 | 1.7% |
분류명 (category name)
Categorical
HIGH CORRELATION
 
Distinct | 7 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 환자 |
---|---|
2nd row | 환자 |
3rd row | 환자 |
4th row | 환자 |
5th row | 환자 |
Common Values
Value | Count | Frequency (%) |
검사 | 89 | 24.6% |
환자 | 85 | 23.5% |
병리 | 56 | 15.5% |
수술 | 50 | 13.8% |
치료 | 40 | 11.0% |
진단 | 36 | 9.9% |
사망 | 6 | 1.7% |
Words
Value | Count | Frequency (%) |
검사 | 89 | |
환자 | 85 | |
병리 | 56 | |
수술 | 50 | |
치료 | 40 | |
진단 | 36 | |
사망 | 6 | 1.7% |
테이블ID (table ID)
Categorical
HIGH CORRELATION
 
Distinct | 16 |
---|---|
Distinct (%) | 4.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 18 |
---|---|
Median length | 16 |
Mean length | 16.455801 |
Min length | 15 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | PRE_GSTR_PT_BSNF |
---|---|
2nd row | PRE_GSTR_PT_BSNF |
3rd row | PRE_GSTR_PT_BSNF |
4th row | PRE_GSTR_PT_BSNF |
5th row | PRE_GSTR_PT_BSNF |
Common Values
Value | Count | Frequency (%) |
PRE_GSTR_OPRT_NFRM | 50 | 13.8% |
PRE_GSTR_PT_HLNF | 45 | 12.4% |
PRE_GSTR_PTH_SRGC | 45 | 12.4% |
PRE_GSTR_EX_ESD | 41 | 11.3% |
PRE_GSTR_TRTM_RD | 21 | 5.8% |
PRE_GSTR_PT_BSNF | 20 | 5.5% |
PRE_GSTR_PT_FMHT | 20 | 5.5% |
PRE_GSTR_TRTM_CASB | 19 | 5.2% |
PRE_GSTR_EX_ENDS | 17 | 4.7% |
PRE_GSTR_EX_DIAG | 16 | 4.4% |
Other values (6) | 68 |
Words
Value | Count | Frequency (%) |
pre_gstr_oprt_nfrm | 50 | |
pre_gstr_pt_hlnf | 45 | |
pre_gstr_pth_srgc | 45 | |
pre_gstr_ex_esd | 41 | |
pre_gstr_trtm_rd | 21 | 5.8% |
pre_gstr_pt_bsnf | 20 | 5.5% |
pre_gstr_pt_fmht | 20 | 5.5% |
pre_gstr_trtm_casb | 19 | 5.2% |
pre_gstr_ex_ends | 17 | 4.7% |
pre_gstr_ex_diag | 16 | 4.4% |
Other values (6) | 68 |
테이블명 (table name)
Categorical
HIGH CORRELATION
 
Distinct | 16 |
---|---|
Distinct (%) | 4.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 16 |
---|---|
Median length | 14 |
Mean length | 12.953039 |
Min length | 12 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | PRE_위암_환자_기본정보 |
---|---|
2nd row | PRE_위암_환자_기본정보 |
3rd row | PRE_위암_환자_기본정보 |
4th row | PRE_위암_환자_기본정보 |
5th row | PRE_위암_환자_기본정보 |
Common Values
Value | Count | Frequency (%) |
PRE_위암_수술_정보 | 50 | 13.8% |
PRE_위암_환자_건강정보 | 45 | 12.4% |
PRE_위암_병리_외과 | 45 | 12.4% |
PRE_위암_검사_ ESD | 41 | 11.3% |
PRE_위암_치료_방사선 | 21 | 5.8% |
PRE_위암_환자_기본정보 | 20 | 5.5% |
PRE_위암_환자_가족력 | 20 | 5.5% |
PRE_위암_치료_항암제 | 19 | 5.2% |
PRE_위암_검사_내시경 | 17 | 4.7% |
PRE_위암_검사_진단 | 16 | 4.4% |
Other values (6) | 68 |
Words
Value | Count | Frequency (%) |
pre_위암_수술_정보 | 50 | |
pre_위암_환자_건강정보 | 45 | |
pre_위암_병리_외과 | 45 | |
pre_위암_검사 | 41 | |
esd | 41 | |
pre_위암_치료_방사선 | 21 | 5.2% |
pre_위암_환자_기본정보 | 20 | 5.0% |
pre_위암_환자_가족력 | 20 | 5.0% |
pre_위암_치료_항암제 | 19 | 4.7% |
pre_위암_검사_내시경 | 17 | 4.2% |
Other values (7) | 84 |
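The lowercase table above reads as a word-frequency view. The value `PRE_위암_검사_ ESD` contains a space, so it is counted as two tokens, `pre_위암_검사` and `esd`, each appearing 41 times. The sketch below reproduces that split under an assumed tokenization (lowercase, split on whitespace, strip punctuation from token edges). This rule is inferred from the table and is not confirmed as the profiler's actual implementation; `tokenize` is a hypothetical helper.

```python
import string
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip punctuation from token edges."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


# Two of the 16 table names with their row counts from the Common Values table.
rows = {"PRE_위암_검사_ ESD": 41, "PRE_위암_치료_방사선": 21}

words = Counter()
for name, n in rows.items():
    for tok in tokenize(name):
        words[tok] += n

# The extra 41 tokens raise the denominator from 362 rows to 403 words.
total_words = 362 + 41
share = round(100 * rows["PRE_위암_치료_방사선"] / total_words, 1)

print(words)
print(f"pre_위암_치료_방사선: {share}% of {total_words} words")
```

This also explains why the percentages here are lower than in the Common Values table: 21 occurrences are 5.8% of 362 rows but 5.2% of 403 words.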
컬럼ID (column ID)
Text
Distinct | 302 |
---|---|
Distinct (%) | 83.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 29 |
---|---|
Median length | 24 |
Mean length | 15.41989 |
Min length | 7 |
Characters and Unicode
Total characters | 5582 |
---|---|
Distinct characters | 28 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 298 |
---|---|
Unique (%) | 82.3% |
Sample
1st row | CENTER_CD |
---|---|
2nd row | IRB_APRV_NO |
3rd row | PT_SBST_NO |
4th row | BSPT_IDGN_AGE |
5th row | BSPT_SEX_CD |
Value | Count | Frequency (%) |
center_cd | 16 | 4.4% |
irb_aprv_no | 16 | 4.4% |
pt_sbst_no | 16 | 4.4% |
crtn_dt | 16 | 4.4% |
dead_ymd | 1 | 0.3% |
sgpt_lymp_inva_nm | 1 | 0.3% |
sgpt_lymp_inva_cd | 1 | 0.3% |
sgpt_srmg_dstl_cncr_txsz_vl | 1 | 0.3% |
sgpt_srmg_prox_cncr_txsz_vl | 1 | 0.3% |
sgpt_oprt_rmrg_nm | 1 | 0.3% |
Other values (292) | 292 |
Most occurring characters
Value | Count | Frequency (%) |
_ | 984 | |
T | 536 | 9.6% |
C | 436 | 7.8% |
S | 385 | 6.9% |
D | 380 | 6.8% |
N | 363 | 6.5% |
R | 328 | 5.9% |
E | 294 | 5.3% |
P | 286 | 5.1% |
M | 246 | 4.4% |
Other values (18) | 1344 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 4597 | |
Connector Punctuation | 984 | 17.6% |
Decimal Number | 1 | < 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
T | 536 | |
C | 436 | 9.5% |
S | 385 | 8.4% |
D | 380 | 8.3% |
N | 363 | 7.9% |
R | 328 | 7.1% |
E | 294 | 6.4% |
P | 286 | 6.2% |
M | 246 | 5.4% |
O | 176 | 3.8% |
Other values (16) | 1167 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 984 |
Decimal Number
Value | Count | Frequency (%) |
1 | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 4597 | |
Common | 985 | 17.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
T | 536 | |
C | 436 | 9.5% |
S | 385 | 8.4% |
D | 380 | 8.3% |
N | 363 | 7.9% |
R | 328 | 7.1% |
E | 294 | 6.4% |
P | 286 | 6.2% |
M | 246 | 5.4% |
O | 176 | 3.8% |
Other values (16) | 1167 |
Common
Value | Count | Frequency (%) |
_ | 984 | |
1 | 1 | 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5582 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
_ | 984 | |
T | 536 | 9.6% |
C | 436 | 7.8% |
S | 385 | 6.9% |
D | 380 | 6.8% |
N | 363 | 6.5% |
R | 328 | 5.9% |
E | 294 | 5.3% |
P | 286 | 5.1% |
M | 246 | 4.4% |
Other values (18) | 1344 |
컬럼명 (column name)
Text
Distinct | 302 |
---|---|
Distinct (%) | 83.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Value | Count | Frequency (%) |
센터코드 | 16 | 4.4% |
irb승인번호 | 16 | 4.4% |
환자대체번호 | 16 | 4.4% |
생성일시 | 16 | 4.4% |
사망일자 | 1 | 0.3% |
외과병리림프성침윤명 | 1 | 0.3% |
외과병리림프성침윤코드 | 1 | 0.3% |
외과병리수술절제면원위암조직크기값 | 1 | 0.3% |
외과병리수술절제면근위암조직크기값 | 1 | 0.3% |
외과병리수술절제면명 | 1 | 0.3% |
Other values (292) | 292 |
Most occurring characters
Value | Count | Frequency (%) |
드 | 124 | 3.7% |
코 | 123 | 3.6% |
자 | 119 | 3.5% |
사 | 107 | 3.2% |
병 | 94 | 2.8% |
환 | 93 | 2.8% |
기 | 87 | 2.6% |
검 | 84 | 2.5% |
술 | 79 | 2.3% |
수 | 78 | 2.3% |
Other values (181) | 2388 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 3177 | |
Uppercase Letter | 198 | 5.9% |
Decimal Number | 1 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
드 | 124 | 3.9% |
코 | 123 | 3.9% |
자 | 119 | 3.7% |
사 | 107 | 3.4% |
병 | 94 | 3.0% |
환 | 93 | 2.9% |
기 | 87 | 2.7% |
검 | 84 | 2.6% |
술 | 79 | 2.5% |
수 | 78 | 2.5% |
Other values (167) | 2189 |
Uppercase Letter
Value | Count | Frequency (%) |
E | 45 | |
D | 42 | |
S | 38 | |
I | 22 | |
B | 17 | 8.6% |
R | 16 | 8.1% |
G | 3 | 1.5% |
M | 3 | 1.5% |
C | 3 | 1.5% |
O | 3 | 1.5% |
Other values (3) | 6 | 3.0% |
Decimal Number
Value | Count | Frequency (%) |
1 | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 3177 | |
Latin | 198 | 5.9% |
Common | 1 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
드 | 124 | 3.9% |
코 | 123 | 3.9% |
자 | 119 | 3.7% |
사 | 107 | 3.4% |
병 | 94 | 3.0% |
환 | 93 | 2.9% |
기 | 87 | 2.7% |
검 | 84 | 2.6% |
술 | 79 | 2.5% |
수 | 78 | 2.5% |
Other values (167) | 2189 |
Latin
Value | Count | Frequency (%) |
E | 45 | |
D | 42 | |
S | 38 | |
I | 22 | |
B | 17 | 8.6% |
R | 16 | 8.1% |
G | 3 | 1.5% |
M | 3 | 1.5% |
C | 3 | 1.5% |
O | 3 | 1.5% |
Other values (3) | 6 | 3.0% |
Common
Value | Count | Frequency (%) |
1 | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 3177 | |
ASCII | 199 | 5.9% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
드 | 124 | 3.9% |
코 | 123 | 3.9% |
자 | 119 | 3.7% |
사 | 107 | 3.4% |
병 | 94 | 3.0% |
환 | 93 | 2.9% |
기 | 87 | 2.7% |
검 | 84 | 2.6% |
술 | 79 | 2.5% |
수 | 78 | 2.5% |
Other values (167) | 2189 |
ASCII
Value | Count | Frequency (%) |
E | 45 | |
D | 42 | |
S | 38 | |
I | 22 | |
B | 17 | 8.5% |
R | 16 | 8.0% |
G | 3 | 1.5% |
M | 3 | 1.5% |
C | 3 | 1.5% |
O | 3 | 1.5% |
Other values (4) | 7 | 3.5% |
데이터타입 (data type)
Categorical
Distinct | 21 |
---|---|
Distinct (%) | 5.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 13 |
---|---|
Median length | 12 |
Mean length | 9.8895028 |
Min length | 4 |
Unique
Unique | 5 |
---|---|
Unique (%) | 1.4% |
Sample
1st row | VARCHAR(20) |
---|---|
2nd row | VARCHAR(50) |
3rd row | VARCHAR(10) |
4th row | NUMBER(4) |
5th row | VARCHAR(20) |
Common Values
Value | Count | Frequency (%) |
VARCHAR(20) | 123 | |
CLOB | 53 | |
VARCHAR(100) | 53 | |
VARCHAR(8) | 30 | 8.3% |
VARCHAR(10) | 17 | 4.7% |
DATETIME | 16 | 4.4% |
VARCHAR(50) | 16 | 4.4% |
VARCHAR(200) | 14 | 3.9% |
NUMBER(3) | 14 | 3.9% |
VARCHAR(400) | 5 | 1.4% |
Other values (11) | 21 | 5.8% |
Words
Value | Count | Frequency (%) |
varchar(20 | 123 | |
clob | 53 | |
varchar(100 | 53 | |
varchar(8 | 30 | 8.3% |
varchar(10 | 17 | 4.7% |
datetime | 16 | 4.4% |
varchar(50 | 16 | 4.4% |
varchar(200 | 14 | 3.9% |
number(3 | 14 | 3.9% |
varchar(1000 | 5 | 1.4% |
Other values (11) | 21 | 5.8% |
컬럼설명 (column description)
Text
Distinct | 294 |
---|---|
Distinct (%) | 81.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 301 |
---|---|
Median length | 96 |
Mean length | 39.91989 |
Min length | 7 |
Characters and Unicode
Total characters | 14451 |
---|---|
Distinct characters | 327 |
Distinct categories | 12 |
Distinct scripts | 3 |
Distinct blocks | 3 |
Unique
Unique | 282 |
---|---|
Unique (%) | 77.9% |
Sample
1st row | 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030 |
---|---|
2nd row | 센터별 기준에 따라 생성 |
3rd row | 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678 |
4th row | 환자의 위암 진단 당시 나이 / 예) 45 |
5th row | 환자의 성별코드 / M : 남성, F : 여성, E : ETC 예) M |
Value | Count | Frequency (%) |
(empty) | 464 | 14.7% |
예 | 178 | 5.7% |
환자의 | 53 | 1.7% |
free | 47 | 1.5% |
text | 47 | 1.5% |
1 | 41 | 1.3% |
2 | 34 | 1.1% |
n | 33 | 1.0% |
센터별 | 32 | 1.0% |
00030 | 32 | 1.0% |
Other values (820) | 2189 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 2796 | 19.3% |
e | 450 | 3.1% |
0 | 444 | 3.1% |
t | 376 | 2.6% |
/ | 330 | 2.3% |
) | 315 | 2.2% |
a | 282 | 2.0% |
r | 267 | 1.8% |
1 | 229 | 1.6% |
n | 229 | 1.6% |
Other values (317) | 8733 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 4852 | |
Lowercase Letter | 3320 | |
Space Separator | 2796 | |
Decimal Number | 1238 | 8.6% |
Uppercase Letter | 1095 | 7.6% |
Other Punctuation | 651 | 4.5% |
Close Punctuation | 325 | 2.2% |
Open Punctuation | 114 | 0.8% |
Dash Punctuation | 22 | 0.2% |
Math Symbol | 20 | 0.1% |
Other values (2) | 18 | 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
예 | 211 | 4.3% |
자 | 157 | 3.2% |
기 | 148 | 3.1% |
사 | 141 | 2.9% |
드 | 128 | 2.6% |
코 | 127 | 2.6% |
의 | 110 | 2.3% |
검 | 104 | 2.1% |
암 | 96 | 2.0% |
부 | 92 | 1.9% |
Other values (239) | 3538 |
Lowercase Letter
Value | Count | Frequency (%) |
e | 450 | |
t | 376 | |
a | 282 | 8.5% |
r | 267 | 8.0% |
n | 229 | 6.9% |
o | 225 | 6.8% |
i | 215 | 6.5% |
s | 186 | 5.6% |
c | 148 | 4.5% |
l | 146 | 4.4% |
Other values (15) | 796 |
Uppercase Letter
Value | Count | Frequency (%) |
Y | 155 | |
D | 129 | |
M | 116 | |
N | 84 | 7.7% |
X | 82 | 7.5% |
E | 74 | 6.8% |
A | 56 | 5.1% |
I | 48 | 4.4% |
S | 47 | 4.3% |
C | 45 | 4.1% |
Other values (14) | 259 |
Decimal Number
Value | Count | Frequency (%) |
0 | 444 | |
1 | 229 | |
2 | 163 | 13.2% |
3 | 107 | 8.6% |
9 | 74 | 6.0% |
5 | 67 | 5.4% |
4 | 50 | 4.0% |
6 | 38 | 3.1% |
7 | 33 | 2.7% |
8 | 33 | 2.7% |
Other Punctuation
Value | Count | Frequency (%) |
/ | 330 | |
: | 163 | |
, | 106 | 16.3% |
. | 36 | 5.5% |
# | 10 | 1.5% |
& | 4 | 0.6% |
; | 1 | 0.2% |
% | 1 | 0.2% |
Math Symbol
Value | Count | Frequency (%) |
\| | 13 | |
~ | 4 | 20.0% |
+ | 3 | 15.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 315 | |
] | 10 | 3.1% |
Open Punctuation
Value | Count | Frequency (%) |
( | 104 | |
[ | 10 | 8.8% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2796 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 22 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 17 |
Other Number
Value | Count | Frequency (%) |
² | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 5184 | |
Hangul | 4852 | |
Latin | 4415 |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
예 | 211 | 4.3% |
자 | 157 | 3.2% |
기 | 148 | 3.1% |
사 | 141 | 2.9% |
드 | 128 | 2.6% |
코 | 127 | 2.6% |
의 | 110 | 2.3% |
검 | 104 | 2.1% |
암 | 96 | 2.0% |
부 | 92 | 1.9% |
Other values (239) | 3538 |
Latin
Value | Count | Frequency (%) |
e | 450 | 10.2% |
t | 376 | 8.5% |
a | 282 | 6.4% |
r | 267 | 6.0% |
n | 229 | 5.2% |
o | 225 | 5.1% |
i | 215 | 4.9% |
s | 186 | 4.2% |
Y | 155 | 3.5% |
c | 148 | 3.4% |
Other values (39) | 1882 |
Common
Value | Count | Frequency (%) |
(space) | 2796 | |
0 | 444 | 8.6% |
/ | 330 | 6.4% |
) | 315 | 6.1% |
1 | 229 | 4.4% |
: | 163 | 3.1% |
2 | 163 | 3.1% |
3 | 107 | 2.1% |
, | 106 | 2.0% |
( | 104 | 2.0% |
Other values (19) | 427 | 8.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 9598 | |
Hangul | 4852 | |
None | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 2796 | |
e | 450 | 4.7% |
0 | 444 | 4.6% |
t | 376 | 3.9% |
/ | 330 | 3.4% |
) | 315 | 3.3% |
a | 282 | 2.9% |
r | 267 | 2.8% |
1 | 229 | 2.4% |
n | 229 | 2.4% |
Other values (67) | 3880 |
Hangul
Value | Count | Frequency (%) |
예 | 211 | 4.3% |
자 | 157 | 3.2% |
기 | 148 | 3.1% |
사 | 141 | 2.9% |
드 | 128 | 2.6% |
코 | 127 | 2.6% |
의 | 110 | 2.3% |
검 | 104 | 2.1% |
암 | 96 | 2.0% |
부 | 92 | 1.9% |
Other values (239) | 3538 |
None
Value | Count | Frequency (%) |
² | 1 |
컬럼데이터수 (column data count)
Real number (ℝ)
HIGH CORRELATION, ZEROS
Distinct | 120 |
---|---|
Distinct (%) | 33.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 61272.967 |
Minimum | 0 |
---|---|
Maximum | 593455 |
Zeros | 17 |
Zeros (%) | 4.7% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 3.3 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 7.45 |
Q1 | 1089 |
median | 4189 |
Q3 | 13610 |
95-th percentile | 555037 |
Maximum | 593455 |
Range | 593455 |
Interquartile range (IQR) | 12521 |
Descriptive statistics
Standard deviation | 156778.23 |
---|---|
Coefficient of variation (CV) | 2.5586851 |
Kurtosis | 6.2687166 |
Mean | 61272.967 |
Median Absolute Deviation (MAD) | 3100 |
Skewness | 2.7990386 |
Sum | 22180814 |
Variance | 2.4579413 × 10^10 |
Monotonicity | Not monotonic |
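Several of the figures above are derived from one another, which gives a quick consistency check on the table. The sketch below copies the reported values and recomputes the mean, coefficient of variation, variance, IQR, and range. The variable names are illustrative.

```python
# Figures copied from the 컬럼데이터수 statistics above.
n = 362
total = 22_180_814
std = 156_778.23
q1, q3 = 1_089, 13_610
minimum, maximum = 0, 593_455

mean = total / n               # Sum / number of observations
cv = std / mean                # coefficient of variation
variance = std ** 2            # square of the standard deviation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(f"mean     = {mean:.3f}")
print(f"cv       = {cv:.7f}")
print(f"variance = {variance:.7e}")
print(f"iqr      = {iqr}")
print(f"range    = {value_range}")
```

A mean of about 61,273 against a median of 4,189 is consistent with the strong right skew reported: a few tables with more than 500,000 records pull the mean far above the typical column.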
Value | Count | Frequency (%) |
4189 | 29 | 8.0% |
1089 | 24 | 6.6% |
5632 | 23 | 6.4% |
0 | 17 | 4.7% |
1570 | 16 | 4.4% |
39892 | 14 | 3.9% |
758 | 13 | 3.6% |
5899 | 12 | 3.3% |
555037 | 12 | 3.3% |
593455 | 11 | 3.0% |
Other values (110) | 191 |
Value | Count | Frequency (%) |
0 | 17 | |
2 | 1 | 0.3% |
7 | 1 | 0.3% |
16 | 1 | 0.3% |
53 | 1 | 0.3% |
94 | 1 | 0.3% |
101 | 1 | 0.3% |
112 | 1 | 0.3% |
179 | 1 | 0.3% |
217 | 1 | 0.3% |
Value | Count | Frequency (%) |
593455 | 11 | |
591950 | 1 | 0.3% |
589023 | 1 | 0.3% |
588863 | 1 | 0.3% |
555037 | 12 | |
554975 | 1 | 0.3% |
497982 | 1 | 0.3% |
479066 | 1 | 0.3% |
475950 | 1 | 0.3% |
196419 | 11 |
표시형식 (display format)
Text
Distinct | 57 |
---|---|
Distinct (%) | 15.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.0 KiB |
Length
Max length | 324 |
---|---|
Median length | 173 |
Mean length | 15.444751 |
Min length | 2 |
Characters and Unicode
Total characters | 5591 |
---|---|
Distinct characters | 215 |
Distinct categories | 11 |
Distinct scripts | 3 |
Distinct blocks | 2 |
Unique
Unique | 39 |
---|---|
Unique (%) | 10.8% |
Sample
1st row | 문자(5) : XXXXX |
---|---|
2nd row | 텍스트 |
3rd row | 문자(10) : XXXXXXXXXX |
4th row | 숫자 |
5th row | M 남성 \| F 여성 \| E ETC |
Value | Count | Frequency (%) |
(empty) | 283 | 19.7% |
텍스트 | 165 | 11.5% |
free | 53 | 3.7% |
1 | 33 | 2.3% |
2 | 33 | 2.3% |
yyyymmdd | 30 | 2.1% |
무응답 | 30 | 2.1% |
y | 29 | 2.0% |
여 | 29 | 2.0% |
n | 29 | 2.0% |
Other values (251) | 724 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 1077 | 19.3% |
\| | 252 | 4.5% |
e | 250 | 4.5% |
X | 241 | 4.3% |
Y | 217 | 3.9% |
스 | 167 | 3.0% |
텍 | 165 | 3.0% |
트 | 165 | 3.0% |
r | 158 | 2.8% |
M | 144 | 2.6% |
Other values (205) | 2755 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1331 | |
Other Letter | 1269 | |
Space Separator | 1077 | |
Uppercase Letter | 1016 | |
Decimal Number | 437 | 7.8% |
Math Symbol | 254 | 4.5% |
Other Punctuation | 67 | 1.2% |
Open Punctuation | 50 | 0.9% |
Close Punctuation | 50 | 0.9% |
Dash Punctuation | 39 | 0.7% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
스 | 167 | 13.2% |
텍 | 165 | 13.0% |
트 | 165 | 13.0% |
자 | 61 | 4.8% |
문 | 33 | 2.6% |
부 | 31 | 2.4% |
무 | 31 | 2.4% |
여 | 30 | 2.4% |
응 | 30 | 2.4% |
답 | 30 | 2.4% |
Other values (139) | 526 |
Lowercase Letter
Value | Count | Frequency (%) |
e | 250 | |
r | 158 | |
a | 114 | |
t | 110 | |
o | 103 | |
n | 98 | 7.4% |
i | 82 | 6.2% |
l | 67 | 5.0% |
s | 61 | 4.6% |
c | 52 | 3.9% |
Other values (14) | 236 |
Uppercase Letter
Value | Count | Frequency (%) |
X | 241 | |
Y | 217 | |
M | 144 | |
D | 105 | |
F | 57 | 5.6% |
S | 43 | 4.2% |
N | 40 | 3.9% |
H | 39 | 3.8% |
I | 28 | 2.8% |
E | 18 | 1.8% |
Other values (11) | 84 | 8.3% |
Decimal Number
Value | Count | Frequency (%) |
0 | 120 | |
1 | 91 | |
9 | 61 | |
2 | 52 | |
3 | 37 | 8.5% |
5 | 28 | 6.4% |
4 | 16 | 3.7% |
8 | 13 | 3.0% |
6 | 11 | 2.5% |
7 | 8 | 1.8% |
Math Symbol
Value | Count | Frequency (%) |
\| | 252 | |
+ | 2 | 0.8% |
Other Punctuation
Value | Count | Frequency (%) |
: | 66 | |
/ | 1 | 1.5% |
Open Punctuation
Value | Count | Frequency (%) |
( | 43 | |
[ | 7 | 14.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 43 | |
] | 7 | 14.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1077 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 39 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 2347 | |
Common | 1975 | |
Hangul | 1269 |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
스 | 167 | 13.2% |
텍 | 165 | 13.0% |
트 | 165 | 13.0% |
자 | 61 | 4.8% |
문 | 33 | 2.6% |
부 | 31 | 2.4% |
무 | 31 | 2.4% |
여 | 30 | 2.4% |
응 | 30 | 2.4% |
답 | 30 | 2.4% |
Other values (139) | 526 |
Latin
Value | Count | Frequency (%) |
e | 250 | 10.7% |
X | 241 | 10.3% |
Y | 217 | 9.2% |
r | 158 | 6.7% |
M | 144 | 6.1% |
a | 114 | 4.9% |
t | 110 | 4.7% |
D | 105 | 4.5% |
o | 103 | 4.4% |
n | 98 | 4.2% |
Other values (35) | 807 |
Common
Value | Count | Frequency (%) |
(space) | 1077 | |
\| | 252 | 12.8% |
0 | 120 | 6.1% |
1 | 91 | 4.6% |
: | 66 | 3.3% |
9 | 61 | 3.1% |
2 | 52 | 2.6% |
( | 43 | 2.2% |
) | 43 | 2.2% |
- | 39 | 2.0% |
Other values (11) | 131 | 6.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4322 | |
Hangul | 1269 | 22.7% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 1077 | |
\| | 252 | 5.8% |
e | 250 | 5.8% |
X | 241 | 5.6% |
Y | 217 | 5.0% |
r | 158 | 3.7% |
M | 144 | 3.3% |
0 | 120 | 2.8% |
a | 114 | 2.6% |
t | 110 | 2.5% |
Other values (56) | 1639 |
Hangul
Value | Count | Frequency (%) |
스 | 167 | 13.2% |
텍 | 165 | 13.0% |
트 | 165 | 13.0% |
자 | 61 | 4.8% |
문 | 33 | 2.6% |
부 | 31 | 2.4% |
무 | 31 | 2.4% |
여 | 30 | 2.4% |
응 | 30 | 2.4% |
답 | 30 | 2.4% |
Other values (139) | 526 |
Variable | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 데이터타입 | 컬럼데이터수 | 표시형식 |
---|---|---|---|---|---|---|---|---|
순번 | 1.000 | 0.909 | 0.909 | 0.952 | 0.952 | 0.322 | 0.702 | 0.404 |
분류ID | 0.909 | 1.000 | 1.000 | 1.000 | 1.000 | 0.340 | 0.441 | 0.327 |
분류명 | 0.909 | 1.000 | 1.000 | 1.000 | 1.000 | 0.340 | 0.441 | 0.327 |
테이블ID | 0.952 | 1.000 | 1.000 | 1.000 | 1.000 | 0.430 | 0.883 | 0.261 |
테이블명 | 0.952 | 1.000 | 1.000 | 1.000 | 1.000 | 0.430 | 0.883 | 0.261 |
데이터타입 | 0.322 | 0.340 | 0.340 | 0.430 | 0.430 | 1.000 | 0.353 | 0.910 |
컬럼데이터수 | 0.702 | 0.441 | 0.441 | 0.883 | 0.883 | 0.353 | 1.000 | 0.679 |
표시형식 | 0.404 | 0.327 | 0.327 | 0.261 | 0.261 | 0.910 | 0.679 | 1.000 |
Variable | 데이터타입 | 분류명 | 테이블ID | 분류ID | 테이블명 |
---|---|---|---|---|---|
데이터타입 | 1.000 | 0.131 | 0.145 | 0.131 | 0.145 |
분류명 | 0.131 | 1.000 | 0.987 | 1.000 | 0.987 |
테이블ID | 0.145 | 0.987 | 1.000 | 0.987 | 1.000 |
분류ID | 0.131 | 1.000 | 0.987 | 1.000 | 0.987 |
테이블명 | 0.145 | 0.987 | 1.000 | 0.987 | 1.000 |
Variable | 순번 | 컬럼데이터수 | 분류ID | 분류명 | 테이블ID | 테이블명 | 데이터타입 |
---|---|---|---|---|---|---|---|
순번 | 1.000 | -0.284 | 0.768 | 0.768 | 0.787 | 0.787 | 0.122 |
컬럼데이터수 | -0.284 | 1.000 | 0.300 | 0.300 | 0.689 | 0.689 | 0.177 |
분류ID | 0.768 | 0.300 | 1.000 | 1.000 | 0.987 | 0.987 | 0.131 |
분류명 | 0.768 | 0.300 | 1.000 | 1.000 | 0.987 | 0.987 | 0.131 |
테이블ID | 0.787 | 0.689 | 0.987 | 0.987 | 1.000 | 1.000 | 0.145 |
테이블명 | 0.787 | 0.689 | 0.987 | 0.987 | 1.000 | 1.000 | 0.145 |
데이터타입 | 0.122 | 0.177 | 0.131 | 0.131 | 0.145 | 0.145 | 1.000 |
Index | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 컬럼ID | 컬럼명 | 데이터타입 | 컬럼설명 | 컬럼데이터수 | 표시형식 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | CENTER_CD | 센터코드 | VARCHAR(20) | 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030 | 13610 | 문자(5) : XXXXX |
1 | 2 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | IRB_APRV_NO | IRB승인번호 | VARCHAR(50) | 센터별 기준에 따라 생성 | 13610 | 텍스트 |
2 | 3 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | PT_SBST_NO | 환자대체번호 | VARCHAR(10) | 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678 | 13610 | 문자(10) : XXXXXXXXXX |
3 | 4 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_IDGN_AGE | 기본환자진단시연령 | NUMBER(4) | 환자의 위암 진단 당시 나이 / 예) 45 | 13610 | 숫자 |
4 | 5 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_SEX_CD | 기본환자성별코드 | VARCHAR(20) | 환자의 성별코드 / M : 남성, F : 여성, E : ETC 예) M | 13610 | M 남성 \| F 여성 \| E ETC |
5 | 6 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_SEX_CD_ETC_CONT | 기본환자성별코드기타내용 | CLOB | 환자성별코드기타내용(환자성별코드 기타 : E 인 경우) / free text | 0 | Free 텍스트 |
6 | 7 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_ANCN_TRTM_STRT_YMD | 기본환자최초항암치료시작일자 | VARCHAR(8) | 환자가 위암으로 인해 처방받은 최초의 항암제 처방일자 / YYYYMMDD 예)20200101 | 11214 | YYYYMMDD |
7 | 8 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_YMD | 기본환자최초수술일자 | VARCHAR(8) | 위암으로 진단 받은 후 첫번째 수술일자 / YYYYMMDD 예)20200101 | 5014 | YYYYMMDD |
8 | 9 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_CD | 기본환자최초수술코드 | VARCHAR(20) | 위암으로 진단 받은 후 첫번째 수술코드 / 예) 20006299 | 5015 | 센터내 수술코드 |
9 | 10 | PT | 환자 | PRE_GSTR_PT_BSNF | PRE_위암_환자_기본정보 | BSPT_FRST_OPRT_NM | 기본환자최초수술명 | VARCHAR(1000) | 위암으로 진단 받은 후 첫번째 수술명 / 예) Laparoscopy Assisted Distal Gastrectomy(림프절 청소 포함) | 5015 | 텍스트 |
Index | 순번 | 분류ID | 분류명 | 테이블ID | 테이블명 | 컬럼ID | 컬럼명 | 데이터타입 | 컬럼설명 | 컬럼데이터수 | 표시형식 |
---|---|---|---|---|---|---|---|---|---|---|---|
352 | 353 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_TOTL_CGY | 방사선치료총선량 | NUMBER(5) | 방사선 치료 시 총 누적선량 / 정수값 예) 6500 | 758 | 숫자 |
353 | 354 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_TOTL_TRTM_NT | 방사선치료총치료횟수 | NUMBER(5) | 방사선 치료 총 실시 횟수 / 정수값 예) 18 | 758 | 숫자 |
354 | 355 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | RDT_SMNT_CONT | 방사선치료평가내용 | CLOB | 방사선치료 평가내용 / free text | 0 | Free 텍스트 |
355 | 356 | TRTM | 치료 | PRE_GSTR_TRTM_RD | PRE_위암_치료_방사선 | CRTN_DT | 생성일시 | DATETIME | 생성일시 DEFAULT current_timestamp() | 758 | YYYY-MM-DD HH:MI:SS |
356 | 357 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | CENTER_CD | 센터코드 | VARCHAR(20) | 센터코드 (5자리 : XXXXX) / 00030 : 국립암센터 예) 00030 | 665 | 문자(5) : XXXXX |
357 | 358 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | IRB_APRV_NO | IRB승인번호 | VARCHAR(50) | 센터별 기준에 따라 생성 | 665 | 텍스트 |
358 | 359 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | PT_SBST_NO | 환자대체번호 | VARCHAR(10) | 개인고유번호(10자리) / 센터별 별도부여 예) RN12345678 | 665 | 문자(10) : XXXXXXXXXX |
359 | 360 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | DEAD_YMD | 사망일자 | VARCHAR(8) | 사망진단서에 기재된 사망일자 / YYYYMMDD 예)20200101 | 665 | YYYYMMDD |
360 | 361 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | DEAD_MAIN_CAUS_CONT | 사망주원인내용 | CLOB | 사망진단서에 기재된 사망원인 / free text 예) Advanced Gastric Cancer | 665 | Free 텍스트 |
361 | 362 | DEAD | 사망 | PRE_GSTR_DEAD_NFRM | PRE_위암_사망_정보 | CRTN_DT | 생성일시 | DATETIME | 생성일시 DEFAULT current_timestamp() | 665 | YYYY-MM-DD HH:MI:SS |