Overview

Dataset statistics

Number of variables: 11
Number of observations: 295
Missing cells: 333
Missing cells (%): 10.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.1 KiB
Average record size in memory: 90.4 B
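The missing-cell percentage is the number of empty cells divided by the total number of cells (variables × observations). The sketch below reproduces the 333 / 10.3% figures from the per-column missing counts reported under Variables (295 for colCnt, 36 for dispFormat and 2 for colDesc). The function name is illustrative and is not part of ydata-profiling.

```python
def missing_cells_pct(missing_per_column, n_vars, n_obs):
    """Missing cells as a percentage of all cells (variables x observations)."""
    total_cells = n_vars * n_obs
    return 100.0 * sum(missing_per_column.values()) / total_cells

# Per-column missing counts from this report; the other 8 columns have none.
missing = {"colCnt": 295, "dispFormat": 36, "colDesc": 2}

print(sum(missing.values()))                                       # 333
print(round(missing_cells_pct(missing, n_vars=11, n_obs=295), 1))  # 10.3
```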

Variable types

Numeric: 1
Categorical: 5
Text: 4
Unsupported: 1

Dataset

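Description: Provides kidney cancer registry metadata (the data items to be provided, their types, sizes, record counts per item, sample data, etc.).
Author: 국립암센터 (National Cancer Center, Korea)
URL: https://www.data.go.kr/data/15048684/fileData.do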

Alerts

tblNm is highly overall correlated with NUM and 3 other fields [High correlation]
tblId is highly overall correlated with NUM and 3 other fields [High correlation]
gpNm is highly overall correlated with NUM and 3 other fields [High correlation]
gpId is highly overall correlated with NUM and 3 other fields [High correlation]
NUM is highly overall correlated with gpId and 3 other fields [High correlation]
dataType is highly imbalanced (58.5%) [Imbalance]
colCnt has 295 (100.0%) missing values [Missing]
dispFormat has 36 (12.2%) missing values [Missing]
NUM has unique values [Unique]
colCnt is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
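The 58.5% imbalance reported for dataType is consistent with an entropy-based score: one minus the Shannon entropy of the category shares divided by its maximum possible value, log(k) for k categories. The sketch assumes that definition (it reproduces the reported figure from the dataType counts 240, 35, 12, 4 and 4); treat it as an illustration, not as ydata-profiling's own code.

```python
import math

def imbalance(counts):
    """1 - H / log(k): 0 for evenly spread categories, approaching 1 as one category dominates."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two non-empty categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(k)

# dataType counts: String, Date, Float, FLOAT, INTEGER
print(f"{imbalance([240, 35, 12, 4, 4]):.1%}")  # 58.5%
```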

Reproduction

Analysis started: 2024-04-19 06:27:37.316049
Analysis finished: 2024-04-19 06:27:38.636324
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 295
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 148
Minimum: 1
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 15.7
Q1: 74.5
Median: 148
Q3: 221.5
95th percentile: 280.3
Maximum: 295
Range: 294
Interquartile range (IQR): 147

Descriptive statistics

Standard deviation: 85.30338
Coefficient of variation (CV): 0.57637419
Kurtosis: -1.2
Mean: 148
Median Absolute Deviation (MAD): 74
Skewness: 0
Sum: 43660
Variance: 7276.6667
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 1 | 0.3%
204 | 1 | 0.3%
202 | 1 | 0.3%
201 | 1 | 0.3%
200 | 1 | 0.3%
199 | 1 | 0.3%
198 | 1 | 0.3%
197 | 1 | 0.3%
196 | 1 | 0.3%
195 | 1 | 0.3%
Other values (285) | 285 | 96.6%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.3%
2 | 1 | 0.3%
3 | 1 | 0.3%
4 | 1 | 0.3%
5 | 1 | 0.3%
6 | 1 | 0.3%
7 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
10 | 1 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
295 | 1 | 0.3%
294 | 1 | 0.3%
293 | 1 | 0.3%
292 | 1 | 0.3%
291 | 1 | 0.3%
290 | 1 | 0.3%
289 | 1 | 0.3%
288 | 1 | 0.3%
287 | 1 | 0.3%
286 | 1 | 0.3%
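Because NUM is just the row number 1 to 295, every statistic above can be checked by hand. This sketch recomputes them with the standard library, assuming the sample (n - 1) variance and linear-interpolation percentiles; both assumptions reproduce the reported values.

```python
import math
import statistics

values = list(range(1, 296))  # NUM is the row number: 1, 2, ..., 295

def percentile(sorted_vals, q):
    """Linear interpolation between closest ranks (NumPy's default 'linear' method)."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 in the denominator)
std = math.sqrt(variance)
cv = std / mean
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)
q1, q3 = percentile(values, 0.25), percentile(values, 0.75)
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(f"mean={mean} var={variance:.4f} std={std:.5f} cv={cv:.8f}")
print(f"p5={percentile(values, 0.05):.1f} Q1={q1} median={median} Q3={q3} p95={percentile(values, 0.95):.1f}")
print(f"IQR={q3 - q1} MAD={mad} sum={sum(values)} strictly increasing={strictly_increasing}")
```

For a gap-free sequence 1..n the variance has the closed form n(n + 1)/12, so 295 · 296 / 12 = 7276.67, matching the report.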

gpId
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Value | Count
KDNY_HLTH | 69
KDNY_KUOS | 44
KDNY_COMP | 24
KDNY_OPRT | 23
KDNY_SPR | 22
Other values (10) | 113

Length

Max length: 18
Median length: 9
Mean length: 10.979661
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: KDNY_SUMMARY_PTIF
2nd row: KDNY_SUMMARY_PTIF
3rd row: KDNY_SUMMARY_PTIF
4th row: KDNY_SUMMARY_PTIF
5th row: KDNY_SUMMARY_PTIF

Common Values

Value | Count | Frequency (%)
KDNY_HLTH | 69 | 23.4%
KDNY_KUOS | 44 | 14.9%
KDNY_COMP | 24 | 8.1%
KDNY_OPRT | 23 | 7.8%
KDNY_SPR | 22 | 7.5%
KDNY_FLUP_DEAD | 19 | 6.4%
KDNY_SUMMARY_PTIF | 17 | 5.8%
KDNY_FLUP_CST_RLPS | 15 | 5.1%
KDNY_CHMO_HRSK | 14 | 4.7%
KDNY_CEXM_DTPA | 12 | 4.1%
Other values (5) | 36 | 12.2%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
kdny_hlth | 69 | 23.4%
kdny_kuos | 44 | 14.9%
kdny_comp | 24 | 8.1%
kdny_oprt | 23 | 7.8%
kdny_spr | 22 | 7.5%
kdny_flup_dead | 19 | 6.4%
kdny_summary_ptif | 17 | 5.8%
kdny_flup_cst_rlps | 15 | 5.1%
kdny_chmo_hrsk | 14 | 4.7%
kdny_cexm_dtpa | 12 | 4.1%
Other values (5) | 36 | 12.2%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Value | Count
기타건강정보 | 69
비뇨기종양학회 | 44
합병증 | 24
수술정보 | 23
외과병리보고서 | 22
Other values (10) | 113

Length

Max length: 12
Median length: 10
Mean length: 6.3525424
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기본정보
2nd row: 기본정보
3rd row: 기본정보
4th row: 기본정보
5th row: 기본정보

Common Values

Value | Count | Frequency (%)
기타건강정보 | 69 | 23.4%
비뇨기종양학회 | 44 | 14.9%
합병증 | 24 | 8.1%
수술정보 | 23 | 7.8%
외과병리보고서 | 22 | 7.5%
사망 및 치료평가 | 19 | 6.4%
기본정보 | 17 | 5.8%
전이 및 재발 | 15 | 5.1%
항암화학요법 | 14 | 4.7%
진단검사(검체) | 12 | 4.1%
Other values (5) | 36 | 12.2%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
기타건강정보 | 69 | 17.7%
비뇨기종양학회 | 44 | 11.3%
및 | 34 | 8.7%
진단검사(검체 | 24 | 6.2%
합병증 | 24 | 6.2%
수술정보 | 23 | 5.9%
외과병리보고서 | 22 | 5.7%
사망 | 19 | 4.9%
치료평가 | 19 | 4.9%
기본정보 | 17 | 4.4%
Other values (8) | 94 | 24.2%
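The Words table counts lowercased, whitespace-separated tokens rather than whole values. That is why 사망 and 치료평가 each appear 19 times (both come from the single group name 사망 및 치료평가), and why the connective 및, shared by 사망 및 치료평가 (19 rows) and 전이 및 재발 (15 rows), reaches 34. The sketch assumes plain whitespace splitting; ydata-profiling's actual tokenizer may also handle punctuation differently.

```python
from collections import Counter

def word_counts(values):
    """Count lowercased whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

# A subset of gpNm values, repeated by their row counts from Common Values
gp_names = ["사망 및 치료평가"] * 19 + ["전이 및 재발"] * 15 + ["기타건강정보"] * 69
words = word_counts(gp_names)
print(words["및"], words["사망"], words["기타건강정보"])  # 34 19 69
```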

tblId
Categorical

HIGH CORRELATION 

Distinct: 39
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Value | Count
MR_KDNY_HLTH_4 | 34
PE_KDNY_COMP | 24
PE_KDNY_SPR_V | 22
MR_KDNY_HLTH_2 | 15
PE_KDNY_KUOS_OPRT_2 | 15
Other values (34) | 185

Length

Max length: 19
Median length: 17
Mean length: 14.291525
Min length: 11

Unique

Unique: 2
Unique (%): 0.7%
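Distinct counts how many different values occur; Unique counts the values that occur in exactly one row. For tblId that means 39 different table IDs, of which 2 appear only once. A minimal sketch on a hypothetical column:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical column: "A" repeats, "B" and "C" occur once each
print(distinct_and_unique(["A", "A", "A", "B", "C"]))  # (3, 2)

# The percentages in the report divide by the 295 rows
print(f"{39 / 295:.1%} {2 / 295:.1%}")  # 13.2% 0.7%
```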

Sample

1st row: PT_KDNY_TRGT
2nd row: PT_KDNY_TRGT
3rd row: RG_KDNY_CNDX_V
4th row: RG_KDNY_CNDX_V
5th row: RG_KDNY_CNDX_V

Common Values

Value | Count | Frequency (%)
MR_KDNY_HLTH_4 | 34 | 11.5%
PE_KDNY_COMP | 24 | 8.1%
PE_KDNY_SPR_V | 22 | 7.5%
MR_KDNY_HLTH_2 | 15 | 5.1%
PE_KDNY_KUOS_OPRT_2 | 15 | 5.1%
MR_KDNY_HLTH_3 | 14 | 4.7%
PE_KDNY_KUOS_OPRT_1 | 13 | 4.4%
PE_KDNY_CHMO | 12 | 4.1%
PE_KDNY_RTX | 10 | 3.4%
PE_KDNY_FLUP | 10 | 3.4%
Other values (29) | 126 | 42.7%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
mr_kdny_hlth_4 | 34 | 11.5%
pe_kdny_comp | 24 | 8.1%
pe_kdny_spr_v | 22 | 7.5%
mr_kdny_hlth_2 | 15 | 5.1%
pe_kdny_kuos_oprt_2 | 15 | 5.1%
mr_kdny_hlth_3 | 14 | 4.7%
pe_kdny_kuos_oprt_1 | 13 | 4.4%
pe_kdny_chmo | 12 | 4.1%
pe_kdny_flup | 10 | 3.4%
pe_kdny_rtx | 10 | 3.4%
Other values (29) | 126 | 42.7%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 36
Distinct (%): 12.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Value | Count
가족력 | 34
외과병리보고서 | 27
합병증 | 24
Initial 영상검사 | 19
입원정보 | 15
Other values (31) | 176

Length

Max length: 22
Median length: 19
Mean length: 7.1254237
Min length: 2

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: 기본정보
2nd row: 기본정보
3rd row: 진단정보
4th row: 진단정보
5th row: 진단정보

Common Values

Value | Count | Frequency (%)
가족력 | 34 | 11.5%
외과병리보고서 | 27 | 9.2%
합병증 | 24 | 8.1%
Initial 영상검사 | 19 | 6.4%
입원정보 | 15 | 5.1%
PADUA/RENAL Score | 15 | 5.1%
과거력 | 14 | 4.7%
항암화학요법 | 12 | 4.1%
RT | 10 | 3.4%
추적관찰 | 10 | 3.4%
Other values (26) | 115 | 39.0%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
initial | 47 | 11.9%
가족력 | 34 | 8.6%
외과병리보고서 | 27 | 6.8%
합병증 | 24 | 6.1%
영상검사 | 22 | 5.6%
입원정보 | 15 | 3.8%
padua/renal | 15 | 3.8%
score | 15 | 3.8%
f/u | 15 | 3.8%
과거력 | 14 | 3.5%
Other values (31) | 168 | 42.4%

colId
Text

Distinct: 270
Distinct (%): 91.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Length

Max length: 22
Median length: 17
Mean length: 12.786441
Min length: 5
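Mean length is the total number of characters divided by the number of non-missing values. The report's figures are consistent with that: 3772 / 295 ≈ 12.786441 for colId, 5153 / 293 ≈ 17.587031 for colDesc (2 missing) and 5308 / 259 ≈ 20.494208 for dispFormat (36 missing). A minimal sketch, treating None as missing:

```python
import statistics

def length_stats(values):
    """Character-length summary over non-missing (not None) string values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_chars": sum(lengths),
    }

# First five colId values from the Sample, plus one hypothetical missing entry
print(length_stats(["FRMD_YMD", "OPRT_AGE", "DIAG_YMD", "DIAG_CD", "DIAG_ENM", None]))

# Cross-check the reported means from the character totals in this report
print(round(3772 / 295, 6), round(5153 / 293, 6), round(5308 / 259, 6))
```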

Characters and Unicode

Total characters: 3772
Distinct characters: 31
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
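Each character's general category (Lu, Ll, Lo, Pc and so on) comes from the Unicode Character Database, and the category tables in this section are tallies of those. Python's unicodedata.category returns the two-letter code; the long names in the mapping below are the Unicode Standard's category names, which are the labels the report shows. For colId, whose values use only capital letters, underscores and digits, this yields the three categories reported.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "No": "Other Number", "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Sm": "Math Symbol", "Zs": "Space Separator",
}

def category_counts(values):
    """Tally characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["FRMD_YMD", "OPRT_AGE"]))
```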

Unique

Unique: 246
Unique (%): 83.4%

Sample

1st row: FRMD_YMD
2nd row: OPRT_AGE
3rd row: DIAG_YMD
4th row: DIAG_CD
5th row: DIAG_ENM

Words
Value | Count | Frequency (%)
miex_ymd | 3 | 1.0%
diur_t_half_l_cmnt | 2 | 0.7%
diur_t_half_r_cmnt | 2 | 0.7%
miex_nm | 2 | 0.7%
ancd_nm | 2 | 0.7%
t_max_l_cmnt | 2 | 0.7%
relt_func_r_cmnt | 2 | 0.7%
cexm_nm | 2 | 0.7%
cexm_ymd | 2 | 0.7%
t_half_r_cmnt | 2 | 0.7%
Other values (260) | 274 | 92.9%

Most occurring characters

Value | Count | Frequency (%)
_ | 624 | 16.5%
T | 358 | 9.5%
M | 348 | 9.2%
N | 340 | 9.0%
C | 296 | 7.8%
S | 208 | 5.5%
R | 199 | 5.3%
D | 167 | 4.4%
P | 118 | 3.1%
L | 118 | 3.1%
Other values (21) | 996 | 26.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3139 | 83.2%
Connector Punctuation | 624 | 16.5%
Decimal Number | 9 | 0.2%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 358 | 11.4%
M | 348 | 11.1%
N | 340 | 10.8%
C | 296 | 9.4%
S | 208 | 6.6%
R | 199 | 6.3%
D | 167 | 5.3%
P | 118 | 3.8%
L | 118 | 3.8%
H | 116 | 3.7%
Other values (16) | 871 | 27.7%

Decimal Number
Value | Count | Frequency (%)
1 | 5 | 55.6%
2 | 2 | 22.2%
3 | 1 | 11.1%
4 | 1 | 11.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 624 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3139 | 83.2%
Common | 633 | 16.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 358 | 11.4%
M | 348 | 11.1%
N | 340 | 10.8%
C | 296 | 9.4%
S | 208 | 6.6%
R | 199 | 6.3%
D | 167 | 5.3%
P | 118 | 3.8%
L | 118 | 3.8%
H | 116 | 3.7%
Other values (16) | 871 | 27.7%

Common
Value | Count | Frequency (%)
_ | 624 | 98.6%
1 | 5 | 0.8%
2 | 2 | 0.3%
3 | 1 | 0.2%
4 | 1 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3772 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 624 | 16.5%
T | 358 | 9.5%
M | 348 | 9.2%
N | 340 | 9.0%
C | 296 | 7.8%
S | 208 | 5.5%
R | 199 | 5.3%
D | 167 | 4.4%
P | 118 | 3.1%
L | 118 | 3.1%
Other values (21) | 996 | 26.4%

colNm
Text

Distinct: 238
Distinct (%): 80.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Length

Max length: 61
Median length: 31
Mean length: 8.2644068
Min length: 1

Characters and Unicode

Total characters: 2438
Distinct characters: 215
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
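colNm mixes English and Korean names, which is why its block table splits into ASCII and Hangul. The sketch below is a deliberately coarse lookup that assumes only two ranges matter: ASCII (U+0000 to U+007F) and the Hangul Syllables block (U+AC00 to U+D7A3), labelled "Hangul" to match the report. Real block assignment covers many more ranges; colDesc, for instance, also has characters in the Arrows and Mathematical Operators blocks.

```python
from collections import Counter

def block_of(ch):
    """Coarse Unicode block lookup covering only the ranges relevant to colNm."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7A3:  # Hangul Syllables block, shown as "Hangul" in the report
        return "Hangul"
    return "Other"

def block_counts(values):
    return Counter(block_of(ch) for value in values for ch in value)

# Two colNm values from this report: 6 Hangul syllables + 2 spaces, and 17 ASCII characters
print(block_counts(["수술 당시 연령", "Vascular invasion"]))
```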

Unique

Unique: 212
Unique (%): 71.9%

Sample

1st row: 초진일
2nd row: 수술 당시 연령
3rd row: 진단일
4th row: 진단코드
5th row: 진단명

Words
Value | Count | Frequency (%)
기타 | 14 | 2.9%
t | 10 | 2.1%
invasion | 9 | 1.9%
상세내용 | 9 | 1.9%
of | 8 | 1.7%
오른쪽 | 8 | 1.7%
왼쪽 | 8 | 1.7%
당뇨 | 6 | 1.3%
검사명 | 5 | 1.0%
간질환 | 5 | 1.0%
Other values (289) | 395 | 82.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 182 | 7.5%
i | 94 | 3.9%
a | 91 | 3.7%
o | 88 | 3.6%
t | 81 | 3.3%
e | 79 | 3.2%
n | 76 | 3.1%
s | 66 | 2.7%
r | 58 | 2.4%
(not preserved) | 53 | 2.2%
Other values (205) | 1570 | 64.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1041 | 42.7%
Lowercase Letter | 954 | 39.1%
Space Separator | 182 | 7.5%
Uppercase Letter | 158 | 6.5%
Open Punctuation | 39 | 1.6%
Close Punctuation | 39 | 1.6%
Other Punctuation | 12 | 0.5%
Decimal Number | 10 | 0.4%
Dash Punctuation | 2 | 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 53 | 5.1%
(not preserved) | 33 | 3.2%
(not preserved) | 33 | 3.2%
(not preserved) | 28 | 2.7%
(not preserved) | 22 | 2.1%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 19 | 1.8%
(not preserved) | 19 | 1.8%
Other values (150) | 774 | 74.4%

Lowercase Letter
Value | Count | Frequency (%)
i | 94 | 9.9%
a | 91 | 9.5%
o | 88 | 9.2%
t | 81 | 8.5%
e | 79 | 8.3%
n | 76 | 8.0%
s | 66 | 6.9%
r | 58 | 6.1%
c | 52 | 5.5%
m | 48 | 5.0%
Other values (15) | 221 | 23.2%

Uppercase Letter
Value | Count | Frequency (%)
T | 17 | 10.8%
S | 16 | 10.1%
C | 13 | 8.2%
A | 13 | 8.2%
R | 10 | 6.3%
L | 10 | 6.3%
N | 10 | 6.3%
D | 9 | 5.7%
P | 9 | 5.7%
M | 9 | 5.7%
Other values (10) | 42 | 26.6%

Other Punctuation
Value | Count | Frequency (%)
/ | 10 | 83.3%
' | 1 | 8.3%
. | 1 | 8.3%

Decimal Number
Value | Count | Frequency (%)
2 | 5 | 50.0%
1 | 5 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 182 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 39 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 39 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Math Symbol
Value | Count | Frequency (%)
< | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1112 | 45.6%
Hangul | 1041 | 42.7%
Common | 285 | 11.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 53 | 5.1%
(not preserved) | 33 | 3.2%
(not preserved) | 33 | 3.2%
(not preserved) | 28 | 2.7%
(not preserved) | 22 | 2.1%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 19 | 1.8%
(not preserved) | 19 | 1.8%
Other values (150) | 774 | 74.4%

Latin
Value | Count | Frequency (%)
i | 94 | 8.5%
a | 91 | 8.2%
o | 88 | 7.9%
t | 81 | 7.3%
e | 79 | 7.1%
n | 76 | 6.8%
s | 66 | 5.9%
r | 58 | 5.2%
c | 52 | 4.7%
m | 48 | 4.3%
Other values (35) | 379 | 34.1%

Common
Value | Count | Frequency (%)
(space) | 182 | 63.9%
( | 39 | 13.7%
) | 39 | 13.7%
/ | 10 | 3.5%
2 | 5 | 1.8%
1 | 5 | 1.8%
- | 2 | 0.7%
' | 1 | 0.4%
. | 1 | 0.4%
< | 1 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1397 | 57.3%
Hangul | 1041 | 42.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 182 | 13.0%
i | 94 | 6.7%
a | 91 | 6.5%
o | 88 | 6.3%
t | 81 | 5.8%
e | 79 | 5.7%
n | 76 | 5.4%
s | 66 | 4.7%
r | 58 | 4.2%
c | 52 | 3.7%
Other values (45) | 530 | 37.9%

Hangul
Value | Count | Frequency (%)
(not preserved) | 53 | 5.1%
(not preserved) | 33 | 3.2%
(not preserved) | 33 | 3.2%
(not preserved) | 28 | 2.7%
(not preserved) | 22 | 2.1%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 20 | 1.9%
(not preserved) | 19 | 1.8%
(not preserved) | 19 | 1.8%
Other values (150) | 774 | 74.4%

dataType
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Value | Count
String | 240
Date | 35
Float | 12
FLOAT | 4
INTEGER | 4

Length

Max length: 7
Median length: 6
Mean length: 5.7220339
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Date
2nd row: FLOAT
3rd row: Date
4th row: String
5th row: String

Common Values

Value | Count | Frequency (%)
String | 240 | 81.4%
Date | 35 | 11.9%
Float | 12 | 4.1%
FLOAT | 4 | 1.4%
INTEGER | 4 | 1.4%
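Each Frequency (%) entry is the value's count divided by the number of rows (295; dataType has no missing values), rounded to one decimal. The report lists up to ten values and lumps the rest into "Other values (n)". A minimal sketch with collections.Counter, using top=3 only so the lumping is visible on this five-value column:

```python
from collections import Counter

def frequency_table(values, top=3):
    """(value, count, percent) rows for the `top` most common values plus an 'Other values' row."""
    counts = Counter(values)
    n = len(values)
    ranked = counts.most_common()
    rows = [(v, c, round(100 * c / n, 1)) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / n, 1)))
    return rows

data_types = ["String"] * 240 + ["Date"] * 35 + ["Float"] * 12 + ["FLOAT"] * 4 + ["INTEGER"] * 4
for row in frequency_table(data_types):
    print(*row, sep=" | ")
```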

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
string | 240 | 81.4%
date | 35 | 11.9%
float | 16 | 5.4%
integer | 4 | 1.4%
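The Words table folds case, so Float (12 rows) and FLOAT (4 rows) merge into float (16). If the spelling difference carries no meaning, normalizing the column before analysis removes the duplicate category. A minimal sketch, assuming letter case is the only difference to reconcile:

```python
from collections import Counter

# dataType values as counted in Common Values
raw = ["String"] * 240 + ["Date"] * 35 + ["Float"] * 12 + ["FLOAT"] * 4 + ["INTEGER"] * 4

before = Counter(raw)
after = Counter(v.lower() for v in raw)  # str.casefold() is the stricter choice for non-ASCII text

print(len(before), len(after))  # 5 4
print(after["float"])           # 16
```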

colDesc
Text

Distinct: 287
Distinct (%): 98.0%
Missing: 2
Missing (%): 0.7%
Memory size: 2.4 KiB

Length

Max length: 83
Median length: 38
Mean length: 17.587031
Min length: 5

Characters and Unicode

Total characters: 5153
Distinct characters: 334
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 285
Unique (%): 97.3%

Sample

1st row: KCD 분류가 C64 C65인 최초 진단 등록일
2nd row: 신장암 수술 당시 나이
3rd row: 환자가 진단받은 암 진단일
4th row: KCD 분류 모든 등록 진단 코드 (하위코드 포함)
5th row: 환자가 진단받은 암 진단명

Words
Value | Count | Frequency (%)
여부 | 66 | 5.0%
환자의 | 52 | 4.0%
기타 | 30 | 2.3%
종양의 | 29 | 2.2%
시행한 | 28 | 2.1%
ncc에서 | 25 | 1.9%
합병증 | 21 | 1.6%
dtpa | 18 | 1.4%
(not preserved) | 18 | 1.4%
f/u | 16 | 1.2%
Other values (472) | 1013 | 77.0%

Most occurring characters

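Value | Count | Frequency (%)
(space) | 1025 | 19.9%
(not preserved) | 105 | 2.0%
(not preserved) | 98 | 1.9%
(not preserved) | 90 | 1.7%
(not preserved) | 86 | 1.7%
(not preserved) | 78 | 1.5%
(not preserved) | 77 | 1.5%
(not preserved) | 75 | 1.5%
C | 74 | 1.4%
(not preserved) | 71 | 1.4%
Other values (324) | 3374 | 65.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3321 | 64.4%
Space Separator | 1025 | 19.9%
Uppercase Letter | 377 | 7.3%
Lowercase Letter | 242 | 4.7%
Other Punctuation | 64 | 1.2%
Open Punctuation | 38 | 0.7%
Close Punctuation | 38 | 0.7%
Decimal Number | 33 | 0.6%
Math Symbol | 8 | 0.2%
Dash Punctuation | 6 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 105 | 3.2%
(not preserved) | 98 | 3.0%
(not preserved) | 90 | 2.7%
(not preserved) | 86 | 2.6%
(not preserved) | 78 | 2.3%
(not preserved) | 77 | 2.3%
(not preserved) | 75 | 2.3%
(not preserved) | 71 | 2.1%
(not preserved) | 69 | 2.1%
(not preserved) | 63 | 1.9%
Other values (260) | 2509 | 75.5%

Lowercase Letter
Value | Count | Frequency (%)
i | 26 | 10.7%
a | 26 | 10.7%
o | 24 | 9.9%
e | 20 | 8.3%
t | 19 | 7.9%
s | 16 | 6.6%
l | 15 | 6.2%
r | 14 | 5.8%
n | 12 | 5.0%
c | 10 | 4.1%
Other values (12) | 60 | 24.8%

Uppercase Letter
Value | Count | Frequency (%)
C | 74 | 19.6%
N | 43 | 11.4%
T | 40 | 10.6%
M | 28 | 7.4%
F | 25 | 6.6%
D | 24 | 6.4%
A | 22 | 5.8%
P | 22 | 5.8%
U | 17 | 4.5%
E | 17 | 4.5%
Other values (10) | 65 | 17.2%

Decimal Number
Value | Count | Frequency (%)
1 | 14 | 42.4%
2 | 7 | 21.2%
3 | 3 | 9.1%
6 | 3 | 9.1%
0 | 3 | 9.1%
4 | 2 | 6.1%
5 | 1 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 35 | 54.7%
: | 18 | 28.1%
, | 5 | 7.8%
. | 4 | 6.2%
* | 2 | 3.1%

Math Symbol
Value | Count | Frequency (%)
= | 2 | 25.0%
(not preserved) | 2 | 25.0%
~ | 2 | 25.0%
< | 1 | 12.5%
(not preserved) | 1 | 12.5%

Space Separator
Value | Count | Frequency (%)
(space) | 1025 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 38 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 38 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 6 | 100.0%

Other Number
Value | Count | Frequency (%)
² | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3321 | 64.4%
Common | 1213 | 23.5%
Latin | 619 | 12.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 105 | 3.2%
(not preserved) | 98 | 3.0%
(not preserved) | 90 | 2.7%
(not preserved) | 86 | 2.6%
(not preserved) | 78 | 2.3%
(not preserved) | 77 | 2.3%
(not preserved) | 75 | 2.3%
(not preserved) | 71 | 2.1%
(not preserved) | 69 | 2.1%
(not preserved) | 63 | 1.9%
Other values (260) | 2509 | 75.5%

Latin
Value | Count | Frequency (%)
C | 74 | 12.0%
N | 43 | 6.9%
T | 40 | 6.5%
M | 28 | 4.5%
i | 26 | 4.2%
a | 26 | 4.2%
F | 25 | 4.0%
D | 24 | 3.9%
o | 24 | 3.9%
A | 22 | 3.6%
Other values (32) | 287 | 46.4%

Common
Value | Count | Frequency (%)
(space) | 1025 | 84.5%
( | 38 | 3.1%
) | 38 | 3.1%
/ | 35 | 2.9%
: | 18 | 1.5%
1 | 14 | 1.2%
2 | 7 | 0.6%
- | 6 | 0.5%
, | 5 | 0.4%
. | 4 | 0.3%
Other values (12) | 23 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3321 | 64.4%
ASCII | 1828 | 35.5%
Math Operators | 2 | < 0.1%
None | 1 | < 0.1%
Arrows | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1025 | 56.1%
C | 74 | 4.0%
N | 43 | 2.4%
T | 40 | 2.2%
( | 38 | 2.1%
) | 38 | 2.1%
/ | 35 | 1.9%
M | 28 | 1.5%
i | 26 | 1.4%
a | 26 | 1.4%
Other values (51) | 455 | 24.9%

Hangul
Value | Count | Frequency (%)
(not preserved) | 105 | 3.2%
(not preserved) | 98 | 3.0%
(not preserved) | 90 | 2.7%
(not preserved) | 86 | 2.6%
(not preserved) | 78 | 2.3%
(not preserved) | 77 | 2.3%
(not preserved) | 75 | 2.3%
(not preserved) | 71 | 2.1%
(not preserved) | 69 | 2.1%
(not preserved) | 63 | 1.9%
Other values (260) | 2509 | 75.5%

Math Operators
Value | Count | Frequency (%)
(not preserved) | 2 | 100.0%

None
Value | Count | Frequency (%)
² | 1 | 100.0%

Arrows
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%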

colCnt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 295
Missing (%): 100.0%
Memory size: 2.7 KiB
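colCnt is empty in all 295 rows, so it carries no information and is a candidate for removal before further analysis. A minimal sketch over rows stored as dictionaries (a hypothetical layout), treating None as missing; in pandas the equivalent step is `DataFrame.dropna(axis=1, how="all")`.

```python
def all_null_columns(rows):
    """Names of columns whose value is None in every row."""
    columns = rows[0].keys() if rows else []
    return [col for col in columns if all(row.get(col) is None for row in rows)]

def drop_columns(rows, names):
    """Copy of rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]

# Two rows shaped like this dataset (values taken from the Sample section)
rows = [
    {"NUM": 1, "colId": "FRMD_YMD", "colCnt": None, "dispFormat": "YYYY-MM-DD"},
    {"NUM": 2, "colId": "OPRT_AGE", "colCnt": None, "dispFormat": "숫자"},
]
empty = all_null_columns(rows)
print(empty)                               # ['colCnt']
print(drop_columns(rows, set(empty))[0])   # {'NUM': 1, 'colId': 'FRMD_YMD', 'dispFormat': 'YYYY-MM-DD'}
```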

dispFormat
Text

MISSING 

Distinct: 74
Distinct (%): 28.6%
Missing: 36
Missing (%): 12.2%
Memory size: 2.4 KiB

Length

Max length: 587
Median length: 118
Mean length: 20.494208
Min length: 2

Characters and Unicode

Total characters: 5308
Distinct characters: 157
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 22.0%

Sample

1st row: YYYY-MM-DD
2nd row: 숫자
3rd row: YYYY-MM-DD
4th row: ex) C64
5th row: ex) Mlignant neoplasms of kidney, except renal pelvis

Words
Value | Count | Frequency (%)
(not preserved) | 194 | 14.3%
n | 61 | 4.5%
y | 61 | 4.5%
1 | 55 | 4.1%
2 | 55 | 4.1%
텍스트 | 54 | 4.0%
grade | 54 | 4.0%
(not preserved) | 49 | 3.6%
(not preserved) | 49 | 3.6%
3 | 45 | 3.3%
Other values (258) | 675 | 49.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1118 | 21.1%
'|' | 298 | 5.6%
, | 250 | 4.7%
e | 225 | 4.2%
Y | 201 | 3.8%
a | 167 | 3.1%
r | 146 | 2.8%
i | 145 | 2.7%
n | 144 | 2.7%
o | 109 | 2.1%
Other values (147) | 2505 | 47.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1633 | 30.8%
Space Separator | 1118 | 21.1%
Uppercase Letter | 761 | 14.3%
Other Letter | 593 | 11.2%
Decimal Number | 433 | 8.2%
Math Symbol | 303 | 5.7%
Other Punctuation | 302 | 5.7%
Dash Punctuation | 75 | 1.4%
Close Punctuation | 58 | 1.1%
Open Punctuation | 32 | 0.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 53 | 8.9%
(not preserved) | 50 | 8.4%
(not preserved) | 21 | 3.5%
(not preserved) | 16 | 2.7%
(not preserved) | 15 | 2.5%
(not preserved) | 15 | 2.5%
(not preserved) | 14 | 2.4%
Other values (77) | 247 | 41.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 225 | 13.8%
a | 167 | 10.2%
r | 146 | 8.9%
i | 145 | 8.9%
n | 144 | 8.8%
o | 109 | 6.7%
t | 92 | 5.6%
l | 88 | 5.4%
d | 80 | 4.9%
s | 61 | 3.7%
Other values (14) | 376 | 23.0%

Uppercase Letter
Value | Count | Frequency (%)
Y | 201 | 26.4%
M | 92 | 12.1%
D | 84 | 11.0%
N | 77 | 10.1%
G | 60 | 7.9%
C | 39 | 5.1%
R | 26 | 3.4%
U | 26 | 3.4%
T | 25 | 3.3%
B | 21 | 2.8%
Other values (12) | 110 | 14.5%

Decimal Number
Value | Count | Frequency (%)
1 | 86 | 19.9%
2 | 83 | 19.2%
3 | 81 | 18.7%
4 | 57 | 13.2%
5 | 36 | 8.3%
6 | 25 | 5.8%
7 | 20 | 4.6%
8 | 18 | 4.2%
9 | 16 | 3.7%
0 | 11 | 2.5%

Other Punctuation
Value | Count | Frequency (%)
, | 250 | 82.8%
* | 38 | 12.6%
: | 7 | 2.3%
% | 4 | 1.3%
/ | 2 | 0.7%
. | 1 | 0.3%

Math Symbol
Value | Count | Frequency (%)
'|' | 298 | 98.3%
+ | 2 | 0.7%
> | 2 | 0.7%
~ | 1 | 0.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1118 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 75 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 58 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 32 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2394 | 45.1%
Common | 2321 | 43.7%
Hangul | 593 | 11.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 53 | 8.9%
(not preserved) | 50 | 8.4%
(not preserved) | 21 | 3.5%
(not preserved) | 16 | 2.7%
(not preserved) | 15 | 2.5%
(not preserved) | 15 | 2.5%
(not preserved) | 14 | 2.4%
Other values (77) | 247 | 41.7%

Latin
Value | Count | Frequency (%)
e | 225 | 9.4%
Y | 201 | 8.4%
a | 167 | 7.0%
r | 146 | 6.1%
i | 145 | 6.1%
n | 144 | 6.0%
o | 109 | 4.6%
t | 92 | 3.8%
M | 92 | 3.8%
l | 88 | 3.7%
Other values (36) | 985 | 41.1%

Common
Value | Count | Frequency (%)
(space) | 1118 | 48.2%
'|' | 298 | 12.8%
, | 250 | 10.8%
1 | 86 | 3.7%
2 | 83 | 3.6%
3 | 81 | 3.5%
- | 75 | 3.2%
) | 58 | 2.5%
4 | 57 | 2.5%
* | 38 | 1.6%
Other values (14) | 177 | 7.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4715 | 88.8%
Hangul | 593 | 11.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1118 | 23.7%
'|' | 298 | 6.3%
, | 250 | 5.3%
e | 225 | 4.8%
Y | 201 | 4.3%
a | 167 | 3.5%
r | 146 | 3.1%
i | 145 | 3.1%
n | 144 | 3.1%
o | 109 | 2.3%
Other values (60) | 1912 | 40.6%

Hangul
Value | Count | Frequency (%)
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 54 | 9.1%
(not preserved) | 53 | 8.9%
(not preserved) | 50 | 8.4%
(not preserved) | 21 | 3.5%
(not preserved) | 16 | 2.7%
(not preserved) | 15 | 2.5%
(not preserved) | 15 | 2.5%
(not preserved) | 14 | 2.4%
Other values (77) | 247 | 41.7%

Interactions


Correlations

Variable | NUM | gpId | gpNm | tblId | tblNm | dataType | dispFormat
NUM | 1.000 | 0.955 | 0.955 | 0.983 | 0.975 | 0.504 | 0.844
gpId | 0.955 | 1.000 | 1.000 | 1.000 | 0.999 | 0.698 | 0.910
gpNm | 0.955 | 1.000 | 1.000 | 1.000 | 0.999 | 0.698 | 0.910
tblId | 0.983 | 1.000 | 1.000 | 1.000 | 1.000 | 0.723 | 0.955
tblNm | 0.975 | 0.999 | 0.999 | 1.000 | 1.000 | 0.697 | 0.949
dataType | 0.504 | 0.698 | 0.698 | 0.723 | 0.697 | 1.000 | 0.913
dispFormat | 0.844 | 0.910 | 0.910 | 0.955 | 0.949 | 0.913 | 1.000

Variable | tblNm | dataType | tblId | gpNm | gpId
tblNm | 1.000 | 0.389 | 0.994 | 0.947 | 0.947
dataType | 0.389 | 1.000 | 0.408 | 0.368 | 0.368
tblId | 0.994 | 0.408 | 1.000 | 0.956 | 0.956
gpNm | 0.947 | 0.368 | 0.956 | 1.000 | 1.000
gpId | 0.947 | 0.368 | 0.956 | 1.000 | 1.000

Variable | NUM | gpId | gpNm | tblId | tblNm | dataType
NUM | 1.000 | 0.754 | 0.754 | 0.821 | 0.798 | 0.231
gpId | 0.754 | 1.000 | 1.000 | 0.956 | 0.947 | 0.368
gpNm | 0.754 | 1.000 | 1.000 | 0.956 | 0.947 | 0.368
tblId | 0.821 | 0.956 | 0.956 | 1.000 | 0.994 | 0.408
tblNm | 0.798 | 0.947 | 0.947 | 0.994 | 1.000 | 0.389
dataType | 0.231 | 0.368 | 0.368 | 0.408 | 0.389 | 1.000
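gpId and gpNm correlate at 1.000 in every matrix, which is what one would expect if each group ID maps to exactly one group name and vice versa. That can be checked directly instead of being inferred from a rounded coefficient. A sketch using (gpId, gpNm) pairs from the Sample rows; the second call uses hypothetical values to show a failing case.

```python
def is_one_to_one(pairs):
    """True if every left value maps to a single right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True

pairs = [
    ("KDNY_SUMMARY_PTIF", "기본정보"),
    ("KDNY_SUMMARY_PTIF", "기본정보"),
    ("KDNY_KUOS", "비뇨기종양학회"),
]
print(is_one_to_one(pairs))                                        # True
print(is_one_to_one([("T1", "Same name"), ("T2", "Same name")]))  # False (hypothetical IDs)
```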

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
# | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_TRGT | 기본정보 | FRMD_YMD | 초진일 | Date | KCD 분류가 C64 C65인 최초 진단 등록일 | <NA> | YYYY-MM-DD
1 | 2 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_TRGT | 기본정보 | OPRT_AGE | 수술 당시 연령 | FLOAT | 신장암 수술 당시 나이 | <NA> | 숫자
2 | 3 | KDNY_SUMMARY_PTIF | 기본정보 | RG_KDNY_CNDX_V | 진단정보 | DIAG_YMD | 진단일 | Date | 환자가 진단받은 암 진단일 | <NA> | YYYY-MM-DD
3 | 4 | KDNY_SUMMARY_PTIF | 기본정보 | RG_KDNY_CNDX_V | 진단정보 | DIAG_CD | 진단코드 | String | KCD 분류 모든 등록 진단 코드 (하위코드 포함) | <NA> | ex) C64
4 | 5 | KDNY_SUMMARY_PTIF | 기본정보 | RG_KDNY_CNDX_V | 진단정보 | DIAG_ENM | 진단명 | String | 환자가 진단받은 암 진단명 | <NA> | ex) Mlignant neoplasms of kidney, except renal pelvis
5 | 6 | KDNY_SUMMARY_PTIF | 기본정보 | RG_KDNY_CNDX_V | 진단정보 | ETC_CNCR_YN | 기타암여부 | String | 신장암 혹은 신장암을 제외한 기타부위의 암종 여부 | <NA> | "Y, 유 | N, 무"
6 | 7 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_BDMS | 신체계측 | WT_MSRM_YMD | 체중(kg) | Date | 환자의 몸무게 | <NA> | YYYY-MM-DD
7 | 8 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_BDMS | 신체계측 | WT_VL | 체중측정일 | FLOAT | 환자의 몸무게 측정일 | <NA> | 숫자
8 | 9 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_BDMS | 신체계측 | HT_MSRM_YMD | 신장(cm) | Date | 환자의 신장 | <NA> | YYYY-MM-DD
9 | 10 | KDNY_SUMMARY_PTIF | 기본정보 | PT_KDNY_BDMS | 신체계측 | HT_VL | 신장측정일 | FLOAT | 환자의 신장 측정일 | <NA> | 숫자

Last rows
# | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
285 | 286 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_SPR | 외과병리보고서 | SP_ACPT_YMD | 접수일자 | Date | 수술후 외과병리 검체접수일자 | <NA> | YYYY-MM-DD
286 | 287 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_SPR | 외과병리보고서 | VSCL_INVS_CMNT | Vascular invasion | String | 종양의 혈관 침범 여부 | <NA> | ex) vascular invasion: not identified
287 | 288 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_SPR | 외과병리보고서 | SRG_MRGN_CMNT | Surgical margin | String | 수술적으로 제거된 조직의 가장자리 전이 여부 | <NA> | ex) Surgical margins: not identified
288 | 289 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_SPR | 외과병리보고서 | CPSL_INVS_CMNT | Capsular invasion | String | 종양의 피막 침범 여부 | <NA> | ex) capsular invasion: not identified
289 | 290 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_SPR | 외과병리보고서 | SCMT_DIFF_CMNT | Sarcomatoid differentiation | String | 종양의 육종 분화 정도 | <NA> | <NA>
290 | 291 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_HLTH | 기타건강정보 | ACMP_DISS_CMNT | 동반질환 | String | 입원 당시 환자가 가지고 있었던 기존 질환 | <NA> | "1, DM | 2, HTN | 3, ETC"
291 | 292 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_HLTH | 기타건강정보 | CHRN_RNLF_YN | 만성신부전 | String | 만성신부전 여부 | <NA> | "Y, 유 | N, 무"
292 | 293 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_HLTH | 기타건강정보 | DALY_YN | 투석 | String | 투석 여부 | <NA> | "Y, 유 | N, 무"
293 | 294 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_HLTH | 기타건강정보 | CHRL_CMBD_INDX_CMNT | Charlson comorbidity index | String | 환자가 앓고 있는 다른 상병들이 환자의 사망에 미치는 영향을 알기 위한 지표 | <NA> | Myocardial infarct (1점)
294 | 295 | KDNY_KUOS | 비뇨기종양학회 | PE_KDNY_KUOS_HLTH | 기타건강정보 | CHRL_CMBD_INDX_TPNT | Charison comorbidity index Total (점) | Float | Charison comorbidity index을 모두 합산한 점수 | <NA> | 정수로 표현
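Several dispFormat entries encode the allowed codes as "code, label" pairs separated by "|", for example "Y, 유 | N, 무" or "1, DM | 2, HTN | 3, ETC". Other entries are free-form (YYYY-MM-DD, 숫자, "ex) ..."), so a parser should accept only strings that fit the pattern. A sketch under that assumption; the format is inferred from the sample rows, not from a published specification.

```python
def parse_code_list(fmt):
    """Parse 'code, label | code, label' into a dict; return None if fmt does not fit that shape."""
    if fmt is None or "|" not in fmt:
        return None
    mapping = {}
    for item in fmt.split("|"):
        code, sep, label = item.partition(",")
        if not sep:
            return None  # an item without a comma means this is not a code list
        mapping[code.strip()] = label.strip()
    return mapping

print(parse_code_list("Y, 유 | N, 무"))             # {'Y': '유', 'N': '무'}
print(parse_code_list("1, DM | 2, HTN | 3, ETC"))  # {'1': 'DM', '2': 'HTN', '3': 'ETC'}
print(parse_code_list("YYYY-MM-DD"))               # None
```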