Overview

Dataset statistics

Number of variables: 11
Number of observations: 253
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 22.4 KiB
Average record size in memory: 90.5 B

Variable types

Numeric: 2
Categorical: 6
Text: 3

Dataset

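Description: Provides metadata for the lung cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, and so on).
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048700/fileData.do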

Alerts

tblNm is highly overall correlated with NUM and 4 other fields (High correlation)
gpId is highly overall correlated with NUM and 4 other fields (High correlation)
gpNm is highly overall correlated with NUM and 4 other fields (High correlation)
tblId is highly overall correlated with NUM and 4 other fields (High correlation)
NUM is highly overall correlated with gpId and 3 other fields (High correlation)
colCnt is highly overall correlated with gpId and 3 other fields (High correlation)
dataType is highly overall correlated with dispFormat (High correlation)
dispFormat is highly overall correlated with dataType (High correlation)
NUM has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 23:00:44.278263
Analysis finished: 2023-12-12 23:00:45.905236
Duration: 1.63 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 253
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127
Minimum: 1
Maximum: 253
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 13.6
Q1: 64
median: 127
Q3: 190
95-th percentile: 240.4
Maximum: 253
Range: 252
Interquartile range (IQR): 126

Descriptive statistics

Standard deviation: 73.179004
Coefficient of variation (CV): 0.57621263
Kurtosis: -1.2
Mean: 127
Median Absolute Deviation (MAD): 63
Skewness: 0
Sum: 32131
Variance: 5355.1667
Monotonicity: Strictly increasing
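
The figures reported for NUM are what one would expect if the column is simply a row counter running from 1 to 253 (it is unique, strictly increasing, and spans 1 to 253). The following is a minimal sketch under that assumption, further assuming linearly interpolated percentiles and the sample (n - 1) variance; the helper `percentile` is illustrative and is not taken from the report's tooling.

```python
import statistics

# Assumption: NUM is the sequence 1..253 (unique, strictly increasing, min 1, max 253).
values = list(range(1, 254))


def percentile(sorted_values, q):
    """Linearly interpolated percentile for q in [0, 1] (hypothetical helper)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


median = statistics.median(values)
summary = {
    "sum": sum(values),
    "mean": statistics.mean(values),
    "variance": statistics.variance(values),  # sample variance (n - 1)
    "std": statistics.stdev(values),
    "p5": percentile(values, 0.05),
    "q1": percentile(values, 0.25),
    "q3": percentile(values, 0.75),
    "p95": percentile(values, 0.95),
    # Median absolute deviation around the median.
    "mad": statistics.median(abs(v - median) for v in values),
}
summary["cv"] = summary["std"] / summary["mean"]
summary["iqr"] = summary["q3"] - summary["q1"]

for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

Under these assumptions the printed values can be compared line by line with the quantile and descriptive statistics listed for NUM.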
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | 0.4%
175 | 1 | 0.4%
162 | 1 | 0.4%
163 | 1 | 0.4%
164 | 1 | 0.4%
165 | 1 | 0.4%
166 | 1 | 0.4%
167 | 1 | 0.4%
168 | 1 | 0.4%
169 | 1 | 0.4%
Other values (243) | 243 | 96.0%

Value | Count | Frequency (%)
1 | 1 | 0.4%
2 | 1 | 0.4%
3 | 1 | 0.4%
4 | 1 | 0.4%
5 | 1 | 0.4%
6 | 1 | 0.4%
7 | 1 | 0.4%
8 | 1 | 0.4%
9 | 1 | 0.4%
10 | 1 | 0.4%

Value | Count | Frequency (%)
253 | 1 | 0.4%
252 | 1 | 0.4%
251 | 1 | 0.4%
250 | 1 | 0.4%
249 | 1 | 0.4%
248 | 1 | 0.4%
247 | 1 | 0.4%
246 | 1 | 0.4%
245 | 1 | 0.4%
244 | 1 | 0.4%

gpId
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

LUNG_HLTH | 74
LUNG_SPR | 32
LUNG_CHMO_FLST | 26
LUNG_BRON | 18
LUNG_OPRT | 17
Other values (10) | 86

Length

Max length: 14
Median length: 9
Mean length: 10.079051
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LUNG_TRGT
2nd row: LUNG_TRGT
3rd row: LUNG_TRGT
4th row: LUNG_TRGT
5th row: LUNG_TRGT

Common Values

Value | Count | Frequency (%)
LUNG_HLTH | 74 | 29.2%
LUNG_SPR | 32 | 12.6%
LUNG_CHMO_FLST | 26 | 10.3%
LUNG_BRON | 18 | 7.1%
LUNG_OPRT | 17 | 6.7%
LUNG_CHMO | 14 | 5.5%
LUNG_TRGT | 13 | 5.1%
LUNG_CNDX_BDMS | 12 | 4.7%
LUNG_EVAL_DEAD | 10 | 4.0%
LUNG_RTX | 9 | 3.6%
Other values (5) | 28 | 11.1%
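
The Frequency (%) column in the gpId Common Values table is each count divided by the number of observations. A small sketch of that arithmetic, using the counts from the table; the function name `frequency_percent` is made up for illustration, and rounding to one decimal place is an assumption based on how the report displays the figures.

```python
# Counts copied from the gpId "Common Values" table.
counts = {
    "LUNG_HLTH": 74,
    "LUNG_SPR": 32,
    "LUNG_CHMO_FLST": 26,
    "LUNG_BRON": 18,
    "LUNG_OPRT": 17,
    "LUNG_CHMO": 14,
    "LUNG_TRGT": 13,
    "LUNG_CNDX_BDMS": 12,
    "LUNG_EVAL_DEAD": 10,
    "LUNG_RTX": 9,
    "Other values (5)": 28,
}

# The counts should add up to the number of observations (253).
total = sum(counts.values())


def frequency_percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


for value, count in counts.items():
    print(f"{value}: {count} ({frequency_percent(count, total)}%)")
```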

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
lung_hlth | 74 | 29.2%
lung_spr | 32 | 12.6%
lung_chmo_flst | 26 | 10.3%
lung_bron | 18 | 7.1%
lung_oprt | 17 | 6.7%
lung_chmo | 14 | 5.5%
lung_trgt | 13 | 5.1%
lung_cndx_bdms | 12 | 4.7%
lung_eval_dead | 10 | 4.0%
lung_rtx | 9 | 3.6%
Other values (5) | 28 | 11.1%

gpNm
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

환자건강정보 | 74
외과병리 | 32
항암 FlowSheet | 26
기관지내시경검사 | 18
수술 | 17
Other values (10) | 86

Length

Max length: 12
Median length: 11
Mean length: 6.8853755
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Summary
2nd row: Summary
3rd row: Summary
4th row: Summary
5th row: Summary

Common Values

Value | Count | Frequency (%)
환자건강정보 | 74 | 29.2%
외과병리 | 32 | 12.6%
항암 FlowSheet | 26 | 10.3%
기관지내시경검사 | 18 | 7.1%
수술 | 17 | 6.7%
항암치료 | 14 | 5.5%
Summary | 13 | 5.1%
진단 및 신체 | 12 | 4.7%
치료평가 및 사망정보 | 10 | 4.0%
방사선 치료 | 9 | 3.6%
Other values (5) | 28 | 11.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
환자건강정보 74
21.1%
외과병리 32
 
9.1%
항암 26
 
7.4%
flowsheet 26
 
7.4%
22
 
6.3%
initial 19
 
5.4%
기관지내시경검사 18
 
5.1%
수술 17
 
4.8%
항암치료 14
 
4.0%
summary 13
 
3.7%
Other values (11) 90
25.6%

tblId
Categorical

HIGH CORRELATION 

Distinct: 29
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

LUNG_PE_SPR | 32
LUNG_PE_CHMO_FLST | 26
LUNG_PE_BRON | 18
LUNG_MR_HLTH_10 | 14
LUNG_PT_TRGT | 13
Other values (24) | 150

Length

Max length: 17
Median length: 15
Mean length: 13.549407
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LUNG_PT_TRGT
2nd row: LUNG_PT_TRGT
3rd row: LUNG_PT_TRGT
4th row: LUNG_PT_TRGT
5th row: LUNG_PT_TRGT

Common Values

Value | Count | Frequency (%)
LUNG_PE_SPR | 32 | 12.6%
LUNG_PE_CHMO_FLST | 26 | 10.3%
LUNG_PE_BRON | 18 | 7.1%
LUNG_MR_HLTH_10 | 14 | 5.5%
LUNG_PT_TRGT | 13 | 5.1%
LUNG_PE_RTX | 9 | 3.6%
LUNG_PE_CHMO | 9 | 3.6%
LUNG_MR_HLTH_8 | 9 | 3.6%
LUNG_MR_HLTH_7 | 9 | 3.6%
LUNG_PE_OPRT | 9 | 3.6%
Other values (19) | 105 | 41.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
lung_pe_spr | 32 | 12.6%
lung_pe_chmo_flst | 26 | 10.3%
lung_pe_bron | 18 | 7.1%
lung_mr_hlth_10 | 14 | 5.5%
lung_pt_trgt | 13 | 5.1%
lung_pe_rtx | 9 | 3.6%
lung_pe_chmo | 9 | 3.6%
lung_mr_hlth_8 | 9 | 3.6%
lung_mr_hlth_7 | 9 | 3.6%
lung_pe_oprt | 9 | 3.6%
Other values (19) | 105 | 41.5%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 29
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

수술 후 결과 | 32
Flow Sheet | 26
기관지내시경검사 결과 | 18
과거력 | 14
기본정보 | 13
Other values (24) | 150

Length

Max length: 13
Median length: 10
Mean length: 7.0474308
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기본정보
2nd row: 기본정보
3rd row: 기본정보
4th row: 기본정보
5th row: 기본정보

Common Values

Value | Count | Frequency (%)
수술 후 결과 | 32 | 12.6%
Flow Sheet | 26 | 10.3%
기관지내시경검사 결과 | 18 | 7.1%
과거력 | 14 | 5.5%
기본정보 | 13 | 5.1%
방사선치료정보 | 9 | 3.6%
항암치료정보 | 9 | 3.6%
가족력(형제/자매) | 9 | 3.6%
가족력(자녀) | 9 | 3.6%
수술정보 | 9 | 3.6%
Other values (19) | 105 | 41.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
결과 67
16.4%
수술 32
 
7.8%
32
 
7.8%
flow 26
 
6.4%
sheet 26
 
6.4%
initial 19
 
4.6%
기관지내시경검사 18
 
4.4%
과거력 14
 
3.4%
기본정보 13
 
3.2%
방사선치료정보 9
 
2.2%
Other values (26) 153
37.4%

colId
Text

Distinct: 221
Distinct (%): 87.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 22
Median length: 17
Mean length: 12.027668
Min length: 5

Characters and Unicode

Total characters: 3043
Distinct characters: 30
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
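
As a rough illustration of the character properties mentioned here, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a few colId values taken from this section's sample rows; the helper name `category_counts` is an assumption for illustration, and the report's own figures come from the profiler rather than from this code. The two-letter codes correspond to the names used in the report, for example `Lu` is Uppercase Letter, `Pc` is Connector Punctuation, and `Nd` is Decimal Number.

```python
import unicodedata
from collections import Counter

# A few colId values from the report's sample rows.
samples = ["PT_SBST_NO", "SEX_CD", "BRTH_YMD", "FRST_DIAG_CD", "FRST_DIAG_YMD"]


def category_counts(strings):
    """Count Unicode general categories (e.g. 'Lu', 'Pc') over all characters."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)


counts = category_counts(samples)
for code, count in counts.most_common():
    print(code, count)
```

The standard library does not expose script or block properties, so those parts of the report are not covered by this sketch.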

Unique

Unique: 207
Unique (%): 81.8%

Sample

1st row: PT_SBST_NO
2nd row: SEX_CD
3rd row: BRTH_YMD
4th row: FRST_DIAG_CD
5th row: FRST_DIAG_YMD

Value | Count | Frequency (%)
pt_sbst_no | 19 | 7.5%
chmo_strt_ymd | 3 | 1.2%
dead_ymd | 2 | 0.8%
cexm_clsf_nm | 2 | 0.8%
cexm_ymd | 2 | 0.8%
oprt_ymd | 2 | 0.8%
cexm_rslt_cmnt | 2 | 0.8%
cexm_nm | 2 | 0.8%
chmo_prps_nm | 2 | 0.8%
rtx_strt_ymd | 2 | 0.8%
Other values (211) | 215 | 85.0%

Most occurring characters

ValueCountFrequency (%)
_ 505
16.6%
T 297
 
9.8%
N 268
 
8.8%
M 258
 
8.5%
C 196
 
6.4%
S 192
 
6.3%
D 149
 
4.9%
R 123
 
4.0%
H 121
 
4.0%
Y 99
 
3.3%
Other values (20) 835
27.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2535
83.3%
Connector Punctuation 505
 
16.6%
Decimal Number 3
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 297
11.7%
N 268
 
10.6%
M 258
 
10.2%
C 196
 
7.7%
S 192
 
7.6%
D 149
 
5.9%
R 123
 
4.9%
H 121
 
4.8%
Y 99
 
3.9%
E 97
 
3.8%
Other values (16) 735
29.0%
Decimal Number
ValueCountFrequency (%)
3 1
33.3%
1 1
33.3%
2 1
33.3%
Connector Punctuation
ValueCountFrequency (%)
_ 505
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2535
83.3%
Common 508
 
16.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 297
11.7%
N 268
 
10.6%
M 258
 
10.2%
C 196
 
7.7%
S 192
 
7.6%
D 149
 
5.9%
R 123
 
4.9%
H 121
 
4.8%
Y 99
 
3.9%
E 97
 
3.8%
Other values (16) 735
29.0%
Common
ValueCountFrequency (%)
_ 505
99.4%
3 1
 
0.2%
1 1
 
0.2%
2 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3043
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 505
16.6%
T 297
 
9.8%
N 268
 
8.8%
M 258
 
8.5%
C 196
 
6.4%
S 192
 
6.3%
D 149
 
4.9%
R 123
 
4.0%
H 121
 
4.0%
Y 99
 
3.3%
Other values (20) 835
27.4%

colNm
Text

Distinct: 205
Distinct (%): 81.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 46
Median length: 33
Mean length: 9.6758893
Min length: 2

Characters and Unicode

Total characters: 2448
Distinct characters: 187
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 187
Unique (%): 73.9%

Sample

1st row: 환자대체번호
2nd row: 성별 코드
3rd row: 생년월일
4th row: 최초 진단 코드
5th row: 최초 진단일

ValueCountFrequency (%)
내용 49
 
10.4%
환자대체번호 19
 
4.0%
검사 14
 
3.0%
9
 
1.9%
기타 7
 
1.5%
to 7
 
1.5%
상세 7
 
1.5%
결과 7
 
1.5%
검사일 6
 
1.3%
n:무 6
 
1.3%
Other values (225) 342
72.3%

Most occurring characters

ValueCountFrequency (%)
220
 
9.0%
a 74
 
3.0%
62
 
2.5%
i 59
 
2.4%
58
 
2.4%
) 58
 
2.4%
( 58
 
2.4%
55
 
2.2%
54
 
2.2%
53
 
2.2%
Other values (177) 1697
69.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1261
51.5%
Lowercase Letter 598
24.4%
Space Separator 220
 
9.0%
Uppercase Letter 217
 
8.9%
Close Punctuation 58
 
2.4%
Open Punctuation 58
 
2.4%
Other Punctuation 28
 
1.1%
Decimal Number 4
 
0.2%
Dash Punctuation 3
 
0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
62
 
4.9%
58
 
4.6%
55
 
4.4%
54
 
4.3%
53
 
4.2%
38
 
3.0%
38
 
3.0%
38
 
3.0%
37
 
2.9%
37
 
2.9%
Other values (120) 791
62.7%
Lowercase Letter
ValueCountFrequency (%)
a 74
12.4%
i 59
9.9%
e 53
8.9%
n 51
8.5%
o 49
 
8.2%
t 48
 
8.0%
s 40
 
6.7%
c 35
 
5.9%
l 34
 
5.7%
r 32
 
5.4%
Other values (14) 123
20.6%
Uppercase Letter
ValueCountFrequency (%)
I 22
 
10.1%
N 21
 
9.7%
C 18
 
8.3%
T 17
 
7.8%
A 16
 
7.4%
S 15
 
6.9%
B 15
 
6.9%
P 13
 
6.0%
E 10
 
4.6%
L 10
 
4.6%
Other values (12) 60
27.6%
Other Punctuation
ValueCountFrequency (%)
/ 14
50.0%
: 12
42.9%
. 2
 
7.1%
Decimal Number
ValueCountFrequency (%)
1 2
50.0%
3 1
25.0%
2 1
25.0%
Space Separator
ValueCountFrequency (%)
220
100.0%
Close Punctuation
ValueCountFrequency (%)
) 58
100.0%
Open Punctuation
ValueCountFrequency (%)
( 58
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 1261
51.5%
Latin 815
33.3%
Common 372
 
15.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
62
 
4.9%
58
 
4.6%
55
 
4.4%
54
 
4.3%
53
 
4.2%
38
 
3.0%
38
 
3.0%
38
 
3.0%
37
 
2.9%
37
 
2.9%
Other values (120) 791
62.7%
Latin
ValueCountFrequency (%)
a 74
 
9.1%
i 59
 
7.2%
e 53
 
6.5%
n 51
 
6.3%
o 49
 
6.0%
t 48
 
5.9%
s 40
 
4.9%
c 35
 
4.3%
l 34
 
4.2%
r 32
 
3.9%
Other values (36) 340
41.7%
Common
ValueCountFrequency (%)
220
59.1%
) 58
 
15.6%
( 58
 
15.6%
/ 14
 
3.8%
: 12
 
3.2%
- 3
 
0.8%
1 2
 
0.5%
. 2
 
0.5%
3 1
 
0.3%
2 1
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 1261
51.5%
ASCII 1187
48.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
220
18.5%
a 74
 
6.2%
i 59
 
5.0%
) 58
 
4.9%
( 58
 
4.9%
e 53
 
4.5%
n 51
 
4.3%
o 49
 
4.1%
t 48
 
4.0%
s 40
 
3.4%
Other values (47) 477
40.2%
Hangul
ValueCountFrequency (%)
62
 
4.9%
58
 
4.6%
55
 
4.4%
54
 
4.3%
53
 
4.2%
38
 
3.0%
38
 
3.0%
38
 
3.0%
37
 
2.9%
37
 
2.9%
Other values (120) 791
62.7%

dataType
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

String(1) | 49
DATE | 34
String(10) | 28
String(100) | 26
String(200) | 20
Other values (23) | 96

Length

Max length: 13
Median length: 12
Mean length: 9.4624506
Min length: 4

Unique

Unique: 8
Unique (%): 3.2%

Sample

1st row: String(10)
2nd row: String(code)
3rd row: DATE
4th row: String(code)
5th row: DATE

Common Values

Value | Count | Frequency (%)
String(1) | 49 | 19.4%
DATE | 34 | 13.4%
String(10) | 28 | 11.1%
String(100) | 26 | 10.3%
String(200) | 20 | 7.9%
String(50) | 16 | 6.3%
String(400) | 13 | 5.1%
String(20) | 12 | 4.7%
Integer(code) | 9 | 3.6%
Integer(4) | 6 | 2.4%
Other values (18) | 40 | 15.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
string(1 | 49 | 19.4%
date | 34 | 13.4%
string(10 | 28 | 11.1%
string(100 | 26 | 10.3%
string(200 | 20 | 7.9%
string(50 | 16 | 6.3%
string(400 | 13 | 5.1%
string(20 | 12 | 4.7%
integer(code | 9 | 3.6%
integer(4 | 6 | 2.4%
Other values (18) | 40 | 15.8%

colDesc
Text

Distinct: 222
Distinct (%): 87.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

Length

Max length: 43
Median length: 32
Mean length: 9.9920949
Min length: 2

Characters and Unicode

Total characters: 2528
Distinct characters: 199
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 208
Unique (%): 82.2%

Sample

1st row: 환자대체번호
2nd row: 성별코드
3rd row: 생년월일
4th row: 최초진단코드
5th row: 최초진단일자

ValueCountFrequency (%)
환자대체번호 19
 
4.3%
내용 11
 
2.5%
상세 10
 
2.3%
to 7
 
1.6%
병리 7
 
1.6%
항암화학요법치료 6
 
1.4%
기타 6
 
1.4%
명칭 6
 
1.4%
stage 5
 
1.1%
5
 
1.1%
Other values (262) 357
81.3%

Most occurring characters

ValueCountFrequency (%)
186
 
7.4%
a 74
 
2.9%
i 71
 
2.8%
63
 
2.5%
) 59
 
2.3%
( 59
 
2.3%
e 54
 
2.1%
o 53
 
2.1%
53
 
2.1%
n 53
 
2.1%
Other values (189) 1803
71.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1346
53.2%
Lowercase Letter 628
24.8%
Uppercase Letter 217
 
8.6%
Space Separator 186
 
7.4%
Close Punctuation 59
 
2.3%
Open Punctuation 59
 
2.3%
Other Punctuation 25
 
1.0%
Decimal Number 4
 
0.2%
Dash Punctuation 3
 
0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
63
 
4.7%
53
 
3.9%
53
 
3.9%
50
 
3.7%
41
 
3.0%
40
 
3.0%
38
 
2.8%
36
 
2.7%
35
 
2.6%
31
 
2.3%
Other values (133) 906
67.3%
Lowercase Letter
ValueCountFrequency (%)
a 74
11.8%
i 71
11.3%
e 54
8.6%
o 53
8.4%
n 53
8.4%
t 48
 
7.6%
s 44
 
7.0%
c 37
 
5.9%
l 34
 
5.4%
r 33
 
5.3%
Other values (13) 127
20.2%
Uppercase Letter
ValueCountFrequency (%)
N 21
 
9.7%
B 19
 
8.8%
C 18
 
8.3%
I 18
 
8.3%
T 17
 
7.8%
S 15
 
6.9%
A 15
 
6.9%
P 12
 
5.5%
L 11
 
5.1%
E 10
 
4.6%
Other values (12) 61
28.1%
Other Punctuation
ValueCountFrequency (%)
/ 13
52.0%
: 10
40.0%
. 2
 
8.0%
Decimal Number
ValueCountFrequency (%)
1 2
50.0%
3 1
25.0%
2 1
25.0%
Space Separator
ValueCountFrequency (%)
186
100.0%
Close Punctuation
ValueCountFrequency (%)
) 59
100.0%
Open Punctuation
ValueCountFrequency (%)
( 59
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 1346
53.2%
Latin 845
33.4%
Common 337
 
13.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
63
 
4.7%
53
 
3.9%
53
 
3.9%
50
 
3.7%
41
 
3.0%
40
 
3.0%
38
 
2.8%
36
 
2.7%
35
 
2.6%
31
 
2.3%
Other values (133) 906
67.3%
Latin
ValueCountFrequency (%)
a 74
 
8.8%
i 71
 
8.4%
e 54
 
6.4%
o 53
 
6.3%
n 53
 
6.3%
t 48
 
5.7%
s 44
 
5.2%
c 37
 
4.4%
l 34
 
4.0%
r 33
 
3.9%
Other values (35) 344
40.7%
Common
ValueCountFrequency (%)
186
55.2%
) 59
 
17.5%
( 59
 
17.5%
/ 13
 
3.9%
: 10
 
3.0%
- 3
 
0.9%
1 2
 
0.6%
. 2
 
0.6%
3 1
 
0.3%
2 1
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 1346
53.2%
ASCII 1182
46.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
186
 
15.7%
a 74
 
6.3%
i 71
 
6.0%
) 59
 
5.0%
( 59
 
5.0%
e 54
 
4.6%
o 53
 
4.5%
n 53
 
4.5%
t 48
 
4.1%
s 44
 
3.7%
Other values (46) 481
40.7%
Hangul
ValueCountFrequency (%)
63
 
4.7%
53
 
3.9%
53
 
3.9%
50
 
3.7%
41
 
3.0%
40
 
3.0%
38
 
2.8%
36
 
2.7%
35
 
2.6%
31
 
2.3%
Other values (133) 906
67.3%

colCnt
Real number (ℝ)

HIGH CORRELATION 

Distinct: 147
Distinct (%): 58.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60406.569
Minimum: 10
Maximum: 755482
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 KiB

Quantile statistics

Minimum: 10
5-th percentile: 422.4
Q1: 5599
median: 19267
Q3: 39381
95-th percentile: 286468.6
Maximum: 755482
Range: 755472
Interquartile range (IQR): 33782

Descriptive statistics

Standard deviation: 129867.02
Coefficient of variation (CV): 2.1498824
Kurtosis: 15.683781
Mean: 60406.569
Median Absolute Deviation (MAD): 13848
Skewness: 3.8602949
Sum: 15282862
Variance: 1.6865443 × 10^10
Monotonicity: Not monotonic
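
Several of the colCnt figures are arithmetically tied to one another: the mean is the sum over the 253 observations, the CV is the standard deviation over the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. The following sketch re-derives them from the reported numbers as a consistency check. The dictionary keys, the function `derived`, and the relative tolerance are illustrative choices, not part of the report.

```python
import math

# Figures copied from the colCnt statistics (rounded as reported).
reported = {
    "n": 253,
    "sum": 15282862,
    "mean": 60406.569,
    "std": 129867.02,
    "cv": 2.1498824,
    "variance": 1.6865443e10,
    "min": 10,
    "max": 755482,
    "range": 755472,
    "q1": 5599,
    "q3": 39381,
    "iqr": 33782,
}


def derived(stats):
    """Recompute the dependent statistics from the independent ones."""
    mean = stats["sum"] / stats["n"]
    return {
        "mean": mean,
        "cv": stats["std"] / mean,
        "variance": stats["std"] ** 2,
        "range": stats["max"] - stats["min"],
        "iqr": stats["q3"] - stats["q1"],
    }


for name, value in derived(reported).items():
    # A loose tolerance absorbs the rounding in the reported inputs.
    agrees = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived {value:.8g}, reported {reported[name]:.8g}, agree={agrees}")
```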
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
128430 | 12 | 4.7%
31410 | 10 | 4.0%
13393 | 9 | 3.6%
24809 | 7 | 2.8%
22501 | 7 | 2.8%
18640 | 7 | 2.8%
23008 | 7 | 2.8%
16395 | 7 | 2.8%
23358 | 7 | 2.8%
60299 | 6 | 2.4%
Other values (137) | 174 | 68.8%

Value | Count | Frequency (%)
10 | 1 | 0.4%
26 | 1 | 0.4%
41 | 1 | 0.4%
47 | 1 | 0.4%
67 | 1 | 0.4%
123 | 1 | 0.4%
128 | 1 | 0.4%
130 | 1 | 0.4%
145 | 1 | 0.4%
197 | 2 | 0.8%

Value | Count | Frequency (%)
755482 | 4 | 1.6%
700110 | 1 | 0.4%
500357 | 1 | 0.4%
499632 | 1 | 0.4%
467196 | 1 | 0.4%
445627 | 4 | 1.6%
412393 | 1 | 0.4%
202519 | 1 | 0.4%
202517 | 2 | 0.8%
202514 | 1 | 0.4%

dispFormat
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB

텍스트 | 112
Y : 유 / N : 무 | 47
YYYY-MM-DD | 34
숫자 | 25
RN+비식별숫자(8) | 19
Other values (4) | 16

Length

Max length: 15
Median length: 13
Mean length: 6.6679842
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: RN+비식별숫자(8)
2nd row: M 남 | F 여
3rd row: YYYY-MM-DD
4th row: 원내검사 코드
5th row: YYYY-MM-DD

Common Values

Value | Count | Frequency (%)
텍스트 | 112 | 44.3%
Y : 유 / N : 무 | 47 | 18.6%
YYYY-MM-DD | 34 | 13.4%
숫자 | 25 | 9.9%
RN+비식별숫자(8) | 19 | 7.5%
Free 텍스트 | 11 | 4.3%
원내검사 코드 | 2 | 0.8%
Y : 내부 / N : 외부 | 2 | 0.8%
"M 남 | F 여" | 1 | 0.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]

ValueCountFrequency (%)
148
26.2%
텍스트 123
21.8%
y 49
 
8.7%
n 49
 
8.7%
47
 
8.3%
47
 
8.3%
yyyy-mm-dd 34
 
6.0%
숫자 25
 
4.4%
rn+비식별숫자(8 19
 
3.4%
free 11
 
2.0%
Other values (8) 12
 
2.1%

Interactions

[Figures: interaction plots for the numeric variables NUM and colCnt]

Correlations

Variable | NUM | gpId | gpNm | tblId | tblNm | dataType | colCnt | dispFormat
NUM | 1.000 | 0.954 | 0.954 | 0.985 | 0.985 | 0.776 | 0.751 | 0.585
gpId | 0.954 | 1.000 | 1.000 | 1.000 | 1.000 | 0.791 | 0.955 | 0.631
gpNm | 0.954 | 1.000 | 1.000 | 1.000 | 1.000 | 0.791 | 0.955 | 0.631
tblId | 0.985 | 1.000 | 1.000 | 1.000 | 1.000 | 0.709 | 0.986 | 0.652
tblNm | 0.985 | 1.000 | 1.000 | 1.000 | 1.000 | 0.709 | 0.986 | 0.652
dataType | 0.776 | 0.791 | 0.791 | 0.709 | 0.709 | 1.000 | 0.541 | 0.958
colCnt | 0.751 | 0.955 | 0.955 | 0.986 | 0.986 | 0.541 | 1.000 | 0.000
dispFormat | 0.585 | 0.631 | 0.631 | 0.652 | 0.652 | 0.958 | 0.000 | 1.000

Variable | dataType | tblNm | dispFormat | gpId | gpNm | tblId
dataType | 1.000 | 0.233 | 0.751 | 0.359 | 0.359 | 0.233
tblNm | 0.233 | 1.000 | 0.297 | 0.970 | 0.970 | 1.000
dispFormat | 0.751 | 0.297 | 1.000 | 0.312 | 0.312 | 0.297
gpId | 0.359 | 0.970 | 0.312 | 1.000 | 1.000 | 0.970
gpNm | 0.359 | 0.970 | 0.312 | 1.000 | 1.000 | 0.970
tblId | 0.233 | 1.000 | 0.297 | 0.970 | 0.970 | 1.000

Variable | NUM | colCnt | gpId | gpNm | tblId | tblNm | dataType | dispFormat
NUM | 1.000 | 0.155 | 0.750 | 0.750 | 0.853 | 0.853 | 0.394 | 0.312
colCnt | 0.155 | 1.000 | 0.810 | 0.810 | 0.878 | 0.878 | 0.262 | 0.000
gpId | 0.750 | 0.810 | 1.000 | 1.000 | 0.970 | 0.970 | 0.359 | 0.312
gpNm | 0.750 | 0.810 | 1.000 | 1.000 | 0.970 | 0.970 | 0.359 | 0.312
tblId | 0.853 | 0.878 | 0.970 | 0.970 | 1.000 | 1.000 | 0.233 | 0.297
tblNm | 0.853 | 0.878 | 0.970 | 0.970 | 1.000 | 1.000 | 0.233 | 0.297
dataType | 0.394 | 0.262 | 0.359 | 0.359 | 0.233 | 0.233 | 1.000 | 0.751
dispFormat | 0.312 | 0.000 | 0.312 | 0.312 | 0.297 | 0.297 | 0.751 | 1.000
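
The coefficients of exactly 1.000 between gpId and gpNm, and between tblId and tblNm, are what an association measure yields when each value of one column maps to exactly one value of the other (an identifier and its display name). This section does not name the measure behind each matrix, so the sketch below uses the uncorrected Cramér's V purely as one common choice for categorical pairs. That choice is an assumption, the profiler's own computation may differ (for example by applying a bias correction), and `cramers_v` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    if smaller_dim == 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(chi2 / (n * smaller_dim))


# Toy data (not taken from the dataset): ids map one-to-one onto names.
ids = ["A", "A", "B", "B", "C"]
names = ["alpha", "alpha", "beta", "beta", "gamma"]
print(cramers_v(ids, names))
```

A one-to-one mapping drives the statistic to its maximum, while two columns whose values co-occur independently drive it toward zero.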

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | PT_SBST_NO | 환자대체번호 | String(10) | 환자대체번호 | 24809 | RN+비식별숫자(8)
1 | 2 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | SEX_CD | 성별 코드 | String(code) | 성별코드 | 24791 | "M 남 | F 여"
2 | 3 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | BRTH_YMD | 생년월일 | DATE | 생년월일 | 24791 | YYYY-MM-DD
3 | 4 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRST_DIAG_CD | 최초 진단 코드 | String(code) | 최초진단코드 | 24809 | 원내검사 코드
4 | 5 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRST_DIAG_YMD | 최초 진단일 | DATE | 최초진단일자 | 24809 | YYYY-MM-DD
5 | 6 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRST_DIAG_NM | 최초 진단명 | String(256) | 최초진단명 | 24809 | 텍스트
6 | 7 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | DIAG_ATT_AGE | 진단 시 나이 | Integer(3) | 진단 시 나이 | 24791 | 숫자
7 | 8 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRMD_YMD | 초진일 | DATE | 초진일자 | 24391 | YYYY-MM-DD
8 | 9 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRST_OPRT_YMD | 최초 수술일 | DATE | 최초 수술일자 | 5307 | YYYY-MM-DD
9 | 10 | LUNG_TRGT | Summary | LUNG_PT_TRGT | 기본정보 | FRST_OPRT_NM | 최초 수술명 | String(256) | 최초 수술명 | 5307 | 텍스트

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
243 | 244 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | FATG_CMNT | FATIGUE 내용 | String(50) | FATIGUE | 128430 | 텍스트
244 | 245 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | NV_CMNT | NV 내용 | String(50) | NV | 128430 | 텍스트
245 | 246 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | CSTP_CMNT | CONSTIPATION 내용 | String(50) | CONSTIPATION | 128430 | 텍스트
246 | 247 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | DIAR_CMNT | DIARRHEA 내용 | String(50) | DIARRHEA | 128430 | 텍스트
247 | 248 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | SKIN_RASH_CMNT | SKINRASH 내용 | String(50) | SKINRASH | 128430 | 텍스트
248 | 249 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | MCST_CMNT | MUCOSITIS 내용 | String(50) | MUCOSITIS | 128430 | 텍스트
249 | 250 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | NURO_PTHY_CMNT | NEUROPATHY 내용 | String(50) | NEUROPATHY | 128430 | 텍스트
250 | 251 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | ECOG_CD | ECOG 코드 | Integer(code) | ECOG 전신상태평가 | 70183 | 숫자
251 | 252 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | WT_VL | 체중 (kg) | String(8) | 체중 | 121580 | 텍스트
252 | 253 | LUNG_CHMO_FLST | 항암 FlowSheet | LUNG_PE_CHMO_FLST | Flow Sheet | BSA_VL | BSA | Float(102) | 체표면적 | 121542 | 숫자
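
The dataType values in these rows follow a recognisable shape: a base type name, optionally followed by a parenthesised argument such as a length (`String(10)`) or the word `code` (`String(code)`). A minimal parsing sketch, assuming that shape holds for every row; the pattern and the function names `parse_data_type` and `max_length` are hypothetical and not part of the dataset or the profiler. The argument of `Float(102)` is kept verbatim because the source does not say how it should be read.

```python
import re

# Assumed shape: a base type name, optionally followed by "(argument)".
DATA_TYPE_PATTERN = re.compile(r"^(?P<base>[A-Za-z]+)(?:\((?P<arg>[^)]*)\))?$")


def parse_data_type(text):
    """Split e.g. 'String(10)' into ('String', '10') and 'DATE' into ('DATE', None)."""
    match = DATA_TYPE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised dataType: {text!r}")
    return match.group("base"), match.group("arg")


def max_length(text):
    """Return the declared length as an int when the argument is numeric, else None."""
    _, arg = parse_data_type(text)
    return int(arg) if arg is not None and arg.isdigit() else None


for raw in ["String(10)", "String(code)", "DATE", "Integer(3)", "Float(102)"]:
    print(raw, parse_data_type(raw), max_length(raw))
```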