Overview

Dataset statistics

Number of variables: 11
Number of observations: 309
Missing cells: 309
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.6 KiB
Average record size in memory: 91.4 B
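
The missing-cell percentage follows from the counts above: 309 missing cells out of 309 rows times 11 variables. A minimal sketch of that arithmetic, with illustrative variable names that do not come from the profiling tool:

```python
# Figures copied from the dataset statistics above.
n_variables = 11
n_observations = 309
missing_cells = 309

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

The alerts attribute all 309 missing cells to dispFormat, which is empty in every row.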

Variable types

Numeric: 2
Categorical: 5
Text: 3
Unsupported: 1

Dataset

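Description: Provides metadata for the gastric cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048687/fileData.do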

Alerts

tblNm is highly overall correlated with NUM and 4 other fields [High correlation]
gpId is highly overall correlated with NUM and 3 other fields [High correlation]
gpNm is highly overall correlated with NUM and 3 other fields [High correlation]
tblId is highly overall correlated with NUM and 4 other fields [High correlation]
NUM is highly overall correlated with gpId and 3 other fields [High correlation]
colCnt is highly overall correlated with tblId and 1 other field [High correlation]
dispFormat has 309 (100.0%) missing values [Missing]
NUM has unique values [Unique]
dispFormat is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
colCnt has 113 (36.6%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 05:20:51.941086
Analysis finished: 2023-12-12 05:20:53.299697
Duration: 1.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 309
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 155
Minimum: 1
Maximum: 309
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 16.4
Q1: 78
Median: 155
Q3: 232
95th percentile: 293.6
Maximum: 309
Range: 308
Interquartile range (IQR): 154

Descriptive statistics

Standard deviation: 89.344838
Coefficient of variation (CV): 0.57641831
Kurtosis: -1.2
Mean: 155
Median Absolute Deviation (MAD): 77
Skewness: 0
Sum: 47895
Variance: 7982.5
Monotonicity: Strictly increasing

[Figure: histogram with fixed-size bins (bins=50)]

Common values

Value  Count  Frequency (%)
1      1      0.3%
205    1      0.3%
212    1      0.3%
211    1      0.3%
210    1      0.3%
209    1      0.3%
208    1      0.3%
207    1      0.3%
206    1      0.3%
204    1      0.3%
Other values (299)  299  96.8%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.3%
2      1      0.3%
3      1      0.3%
4      1      0.3%
5      1      0.3%
6      1      0.3%
7      1      0.3%
8      1      0.3%
9      1      0.3%
10     1      0.3%

Maximum 10 values

Value  Count  Frequency (%)
309    1      0.3%
308    1      0.3%
307    1      0.3%
306    1      0.3%
305    1      0.3%
304    1      0.3%
303    1      0.3%
302    1      0.3%
301    1      0.3%
300    1      0.3%
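
Because NUM is a strictly increasing counter from 1 to 309 with no gaps, its statistics can be reproduced from the sequence alone. The sketch below uses only the Python standard library. The choice of the "inclusive" quantile method is an assumption: it matches the percentiles reported above, but this report does not state which interpolation rule was used.

```python
import statistics

# NUM as described in this section: minimum 1, maximum 309, 309 distinct values.
num = list(range(1, 310))

mean = statistics.mean(num)               # 155
variance = statistics.variance(num)       # sample variance (n - 1 denominator)
std_dev = statistics.stdev(num)
cv = std_dev / mean                       # coefficient of variation

# "inclusive" interpolates linearly between order statistics at position (n - 1) * p.
cuts = statistics.quantiles(num, n=20, method="inclusive")
p5, q1, median, q3, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]

print(mean, variance, round(std_dev, 6), round(cv, 8))
print(p5, q1, median, q3, p95, q3 - q1)
```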

gpId
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

_GSTR_OPRT: 110
_GSTR_HLTH: 80
_GSTR_CEXM: 62
_GSTR_FLUP: 37
_GSTR_Summary: 20

Length

Max length: 13
Median length: 10
Mean length: 10.194175
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _GSTR_Summary
2nd row: _GSTR_Summary
3rd row: _GSTR_Summary
4th row: _GSTR_Summary
5th row: _GSTR_Summary

Common Values

Value          Count  Frequency (%)
_GSTR_OPRT     110    35.6%
_GSTR_HLTH     80     25.9%
_GSTR_CEXM     62     20.1%
_GSTR_FLUP     37     12.0%
_GSTR_Summary  20     6.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

Value         Count  Frequency (%)
gstr_oprt     110    35.6%
gstr_hlth     80     25.9%
gstr_cexm     62     20.1%
gstr_flup     37     12.0%
gstr_summary  20     6.5%
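
The Frequency (%) column and the mean length reported for gpId can both be derived from the five category counts. A short sketch using the standard library, with illustrative variable names:

```python
from collections import Counter

# Category counts as listed in the Common Values table for gpId.
counts = Counter({
    "_GSTR_OPRT": 110,
    "_GSTR_HLTH": 80,
    "_GSTR_CEXM": 62,
    "_GSTR_FLUP": 37,
    "_GSTR_Summary": 20,
})
total = sum(counts.values())   # number of observations

# Frequency (%): each count as a share of all rows, rounded to one decimal.
freq_pct = {k: round(100 * v / total, 1) for k, v in counts.most_common()}

# Mean string length, weighting each category by how often it occurs.
mean_length = sum(len(k) * v for k, v in counts.items()) / total

print(freq_pct)
print(round(mean_length, 6))
```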

gpNm
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

수술기록 (surgical records): 110
기타건강정보 (other health information): 80
진단검사 (diagnostic tests): 62
추적관찰 (follow-up): 37
Patient info: 20

Length

Max length: 12
Median length: 4
Mean length: 5.0355987
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Patient info
2nd row: Patient info
3rd row: Patient info
4th row: Patient info
5th row: Patient info

Common Values

Value         Count  Frequency (%)
수술기록       110    35.6%
기타건강정보    80     25.9%
진단검사       62     20.1%
추적관찰       37     12.0%
Patient info  20     6.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

Value        Count  Frequency (%)
수술기록      110    33.4%
기타건강정보   80     24.3%
진단검사      62     18.8%
추적관찰      37     11.2%
patient      20     6.1%
info         20     6.1%
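
The percentages in this last table differ from the Common Values table because they count words, not rows: "Patient info" contributes two tokens, so the denominator grows from 309 to 329. The sketch below reproduces the figures under an assumed tokenization (lowercase, then split on whitespace); the tool's actual tokenizer is not shown in this report.

```python
from collections import Counter

# gpNm category counts from the Common Values table.
category_counts = {
    "수술기록": 110,
    "기타건강정보": 80,
    "진단검사": 62,
    "추적관찰": 37,
    "Patient info": 20,
}

# Assumed tokenization: lowercase each value, then split on whitespace.
word_counts = Counter()
for value, n in category_counts.items():
    for word in value.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())   # 309 rows plus 20 extra tokens
word_pct = {w: round(100 * n / total_words, 1) for w, n in word_counts.items()}

print(total_words, word_pct)
```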

tblId
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

_GSTR_MR_HLTH: 80
_GSTR_PE_OPRT: 37
_GSTR_PE_BIOP: 35
_GSTR_PE_CEXM: 23
_GSTR_PE_RTX: 15
Other values (16): 119

Length

Max length: 18
Median length: 13
Mean length: 13.375405
Min length: 12

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: _GSTR_PT_TRGT
2nd row: _GSTR_PT_TRGT
3rd row: _GSTR_PT_TRGT
4th row: _GSTR_PT_TRGT
5th row: _GSTR_PT_TRGT

Common Values

Value               Count  Frequency (%)
_GSTR_MR_HLTH       80     25.9%
_GSTR_PE_OPRT       37     12.0%
_GSTR_PE_BIOP       35     11.3%
_GSTR_PE_CEXM       23     7.4%
_GSTR_PE_RTX        15     4.9%
_GSTR_PE_SPR        15     4.9%
_GSTR_PE_CHMO       12     3.9%
_GSTR_PE_MIEX_FLUP  11     3.6%
_GSTR_PT_TRGT       9      2.9%
_GSTR_PE_ESD        9      2.9%
Other values (11)   63     20.4%

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
gstr_mr_hlth       80     25.9%
gstr_pe_oprt       37     12.0%
gstr_pe_biop       35     11.3%
gstr_pe_cexm       23     7.4%
gstr_pe_rtx        15     4.9%
gstr_pe_spr        15     4.9%
gstr_pe_chmo       12     3.9%
gstr_pe_miex_flup  11     3.6%
gstr_pe_esd        9      2.9%
gstr_pe_sten       9      2.9%
Other values (11)  63     20.4%

tblNm
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

위암 환자건강정보: 80
위암 수술기록: 37
위암 조직검사: 35
위암 진단검사: 23
위암 방사선치료: 15
Other values (16): 119

Length

Max length: 25
Median length: 15
Mean length: 8.6569579
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 위암 대상자
2nd row: 위암 대상자
3rd row: 위암 대상자
4th row: 위암 대상자
5th row: 위암 대상자

Common Values

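Value  Count  Frequency (%)
위암 환자건강정보 (gastric cancer patient health information)  80  25.9%
위암 수술기록 (gastric cancer surgical records)  37  12.0%
위암 조직검사 (gastric cancer biopsy)  35  11.3%
위암 진단검사 (gastric cancer diagnostic tests)  23  7.4%
위암 방사선치료 (gastric cancer radiotherapy)  15  4.9%
위암 ESD 외과병리결과 (gastric cancer ESD surgical pathology results)  15  4.9%
위암 항암치료 (gastric cancer chemotherapy)  12  3.9%
위암 영상기능검사 (gastric cancer imaging and functional tests)  11  3.6%
위암 대상자 (gastric cancer subjects)  9  2.9%
위암 ESD 검사 (gastric cancer ESD examination)  9  2.9%
Other values (11)  63  20.4%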

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
위암                309    46.1%
환자건강정보         80     11.9%
수술기록            37     5.5%
조직검사            35     5.2%
진단검사            29     4.3%
esd                24     3.6%
방사선치료           15     2.2%
외과병리결과         15     2.2%
initial            15     2.2%
항암치료            12     1.8%
Other values (15)  100    14.9%

colId
Text

Distinct: 284
Distinct (%): 91.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 18
Median length: 15
Mean length: 11.498382
Min length: 5

Characters and Unicode

Total characters: 3553
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 261
Unique (%): 84.5%

Sample

1st row: DIAG_AGE
2nd row: SEX_CD
3rd row: FRMD_YMD
4th row: DRTR_YMD
5th row: OPRT_YMD

Value               Count  Frequency (%)
comp_yn             3      1.0%
oprt_ymd            3      1.0%
mult_cnt            2      0.6%
diag_cmnt           2      0.6%
cexm_nm             2      0.6%
path_no             2      0.6%
ct_rslt_cmnt        2      0.6%
ct_ymd              2      0.6%
pa_rslt_cmnt        2      0.6%
pa_ymd              2      0.6%
Other values (274)  287    92.9%

Most occurring characters

ValueCountFrequency (%)
_ 565
15.9%
T 345
 
9.7%
M 309
 
8.7%
N 297
 
8.4%
C 294
 
8.3%
S 203
 
5.7%
D 192
 
5.4%
R 152
 
4.3%
E 129
 
3.6%
H 125
 
3.5%
Other values (23) 942
26.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2970
83.6%
Connector Punctuation 565
 
15.9%
Decimal Number 18
 
0.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 345
11.6%
M 309
 
10.4%
N 297
 
10.0%
C 294
 
9.9%
S 203
 
6.8%
D 192
 
6.5%
R 152
 
5.1%
E 129
 
4.3%
H 125
 
4.2%
Y 111
 
3.7%
Other values (16) 813
27.4%
Decimal Number
ValueCountFrequency (%)
1 8
44.4%
2 5
27.8%
3 2
 
11.1%
5 1
 
5.6%
8 1
 
5.6%
0 1
 
5.6%
Connector Punctuation
ValueCountFrequency (%)
_ 565
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2970
83.6%
Common 583
 
16.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 345
11.6%
M 309
 
10.4%
N 297
 
10.0%
C 294
 
9.9%
S 203
 
6.8%
D 192
 
6.5%
R 152
 
5.1%
E 129
 
4.3%
H 125
 
4.2%
Y 111
 
3.7%
Other values (16) 813
27.4%
Common
ValueCountFrequency (%)
_ 565
96.9%
1 8
 
1.4%
2 5
 
0.9%
3 2
 
0.3%
5 1
 
0.2%
8 1
 
0.2%
0 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3553
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 565
15.9%
T 345
 
9.7%
M 309
 
8.7%
N 297
 
8.4%
C 294
 
8.3%
S 203
 
5.7%
D 192
 
5.4%
R 152
 
4.3%
E 129
 
3.6%
H 125
 
3.5%
Other values (23) 942
26.5%
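
The category breakdown above (Uppercase Letter, Connector Punctuation, Decimal Number) corresponds to the Unicode general category of each character. A minimal sketch with Python's standard unicodedata module, applied only to the five sample colId values shown in this section; the list name is illustrative:

```python
import unicodedata
from collections import Counter

# The five colId values from the Sample list above.
col_ids = ["DIAG_AGE", "SEX_CD", "FRMD_YMD", "DRTR_YMD", "OPRT_YMD"]

# unicodedata.category returns two-letter general-category codes:
# "Lu" = Uppercase Letter, "Pc" = Connector Punctuation, "Nd" = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for value in col_ids for ch in value
)

# Every character in these identifiers falls inside the ASCII block.
all_ascii = all(value.isascii() for value in col_ids)

print(category_counts, all_ascii)
```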

colNm
Text

Distinct: 289
Distinct (%): 93.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 300
Median length: 25
Mean length: 11.245955
Min length: 2

Characters and Unicode

Total characters: 3475
Distinct characters: 204
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 269
Unique (%): 87.1%

Sample

1st row: 진단시나이 (age at diagnosis)
2nd row: 성별코드 (sex code)
3rd row: 첫진료일자 (date of first visit)
4th row: 약물치료시작일 (drug treatment start date)
5th row: 수술일자 (surgery date)

Value               Count  Frequency (%)
egd                 9      1.8%
결과                 8      1.6%
location            7      1.4%
stage               7      1.4%
invasion            6      1.2%
type                6      1.2%
y:유                 6      1.2%
n:무                 6      1.2%
검사일               6      1.2%
상세내용             6      1.2%
Other values (318)  446    86.9%

Most occurring characters

ValueCountFrequency (%)
500
 
14.4%
i 145
 
4.2%
e 137
 
3.9%
t 133
 
3.8%
o 113
 
3.3%
n 102
 
2.9%
s 98
 
2.8%
a 96
 
2.8%
( 72
 
2.1%
) 72
 
2.1%
Other values (194) 2007
57.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1287
37.0%
Other Letter 1216
35.0%
Space Separator 500
 
14.4%
Uppercase Letter 265
 
7.6%
Open Punctuation 72
 
2.1%
Close Punctuation 72
 
2.1%
Other Punctuation 37
 
1.1%
Decimal Number 13
 
0.4%
Dash Punctuation 7
 
0.2%
Connector Punctuation 4
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Lowercase Letter
ValueCountFrequency (%)
i 145
11.3%
e 137
10.6%
t 133
10.3%
o 113
8.8%
n 102
 
7.9%
s 98
 
7.6%
a 96
 
7.5%
r 67
 
5.2%
c 65
 
5.1%
l 54
 
4.2%
Other values (13) 277
21.5%
Uppercase Letter
ValueCountFrequency (%)
D 30
11.3%
C 26
 
9.8%
E 23
 
8.7%
S 20
 
7.5%
N 18
 
6.8%
I 18
 
6.8%
G 17
 
6.4%
A 14
 
5.3%
M 13
 
4.9%
L 12
 
4.5%
Other values (11) 74
27.9%
Decimal Number
ValueCountFrequency (%)
1 6
46.2%
2 3
23.1%
3 1
 
7.7%
0 1
 
7.7%
5 1
 
7.7%
8 1
 
7.7%
Other Punctuation
ValueCountFrequency (%)
/ 17
45.9%
: 12
32.4%
. 6
 
16.2%
% 2
 
5.4%
Letter Number
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
500
100.0%
Open Punctuation
ValueCountFrequency (%)
( 72
100.0%
Close Punctuation
ValueCountFrequency (%)
) 72
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1554
44.7%
Hangul 1216
35.0%
Common 705
20.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Latin
ValueCountFrequency (%)
i 145
 
9.3%
e 137
 
8.8%
t 133
 
8.6%
o 113
 
7.3%
n 102
 
6.6%
s 98
 
6.3%
a 96
 
6.2%
r 67
 
4.3%
c 65
 
4.2%
l 54
 
3.5%
Other values (36) 544
35.0%
Common
ValueCountFrequency (%)
500
70.9%
( 72
 
10.2%
) 72
 
10.2%
/ 17
 
2.4%
: 12
 
1.7%
- 7
 
1.0%
. 6
 
0.9%
1 6
 
0.9%
_ 4
 
0.6%
2 3
 
0.4%
Other values (5) 6
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2257
64.9%
Hangul 1216
35.0%
Number Forms 2
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
500
22.2%
i 145
 
6.4%
e 137
 
6.1%
t 133
 
5.9%
o 113
 
5.0%
n 102
 
4.5%
s 98
 
4.3%
a 96
 
4.3%
( 72
 
3.2%
) 72
 
3.2%
Other values (49) 789
35.0%
Hangul
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Number Forms
ValueCountFrequency (%)
1
50.0%
1
50.0%

dataType
Categorical

Distinct: 3
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

STRING: 252
DATE: 36
INTEGER: 21

Length

Max length: 7
Median length: 6
Mean length: 5.8349515
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: INTEGER
2nd row: STRING
3rd row: DATE
4th row: DATE
5th row: DATE

Common Values

Value    Count  Frequency (%)
STRING   252    81.6%
DATE     36     11.7%
INTEGER  21     6.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

Value    Count  Frequency (%)
string   252    81.6%
date     36     11.7%
integer  21     6.8%

colDesc
Text

Distinct: 289
Distinct (%): 93.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 1024
Median length: 25
Mean length: 13.588997
Min length: 2

Characters and Unicode

Total characters: 4199
Distinct characters: 204
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 269
Unique (%): 87.1%

Sample

1st row: 진단시나이 (age at diagnosis)
2nd row: 성별코드 (sex code)
3rd row: 첫진료일자 (date of first visit)
4th row: 약물치료시작일 (drug treatment start date)
5th row: 수술일자 (surgery date)

Value               Count  Frequency (%)
egd                 9      1.8%
결과                 8      1.6%
location            7      1.4%
stage               7      1.4%
invasion            6      1.2%
type                6      1.2%
y:유                 6      1.2%
n:무                 6      1.2%
검사일               6      1.2%
상세내용             6      1.2%
Other values (318)  446    86.9%

Most occurring characters

ValueCountFrequency (%)
1224
29.1%
i 145
 
3.5%
e 137
 
3.3%
t 133
 
3.2%
o 113
 
2.7%
n 102
 
2.4%
s 98
 
2.3%
a 96
 
2.3%
( 72
 
1.7%
) 72
 
1.7%
Other values (194) 2007
47.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1287
30.7%
Space Separator 1224
29.1%
Other Letter 1216
29.0%
Uppercase Letter 265
 
6.3%
Open Punctuation 72
 
1.7%
Close Punctuation 72
 
1.7%
Other Punctuation 37
 
0.9%
Decimal Number 13
 
0.3%
Dash Punctuation 7
 
0.2%
Connector Punctuation 4
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Lowercase Letter
ValueCountFrequency (%)
i 145
11.3%
e 137
10.6%
t 133
10.3%
o 113
8.8%
n 102
 
7.9%
s 98
 
7.6%
a 96
 
7.5%
r 67
 
5.2%
c 65
 
5.1%
l 54
 
4.2%
Other values (13) 277
21.5%
Uppercase Letter
ValueCountFrequency (%)
D 30
11.3%
C 26
 
9.8%
E 23
 
8.7%
S 20
 
7.5%
N 18
 
6.8%
I 18
 
6.8%
G 17
 
6.4%
A 14
 
5.3%
M 13
 
4.9%
L 12
 
4.5%
Other values (11) 74
27.9%
Decimal Number
ValueCountFrequency (%)
1 6
46.2%
2 3
23.1%
3 1
 
7.7%
0 1
 
7.7%
5 1
 
7.7%
8 1
 
7.7%
Other Punctuation
ValueCountFrequency (%)
/ 17
45.9%
: 12
32.4%
. 6
 
16.2%
% 2
 
5.4%
Letter Number
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
1224
100.0%
Open Punctuation
ValueCountFrequency (%)
( 72
100.0%
Close Punctuation
ValueCountFrequency (%)
) 72
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1554
37.0%
Common 1429
34.0%
Hangul 1216
29.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Latin
ValueCountFrequency (%)
i 145
 
9.3%
e 137
 
8.8%
t 133
 
8.6%
o 113
 
7.3%
n 102
 
6.6%
s 98
 
6.3%
a 96
 
6.2%
r 67
 
4.3%
c 65
 
4.2%
l 54
 
3.5%
Other values (36) 544
35.0%
Common
ValueCountFrequency (%)
1224
85.7%
( 72
 
5.0%
) 72
 
5.0%
/ 17
 
1.2%
: 12
 
0.8%
- 7
 
0.5%
. 6
 
0.4%
1 6
 
0.4%
_ 4
 
0.3%
2 3
 
0.2%
Other values (5) 6
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2981
71.0%
Hangul 1216
29.0%
Number Forms 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1224
41.1%
i 145
 
4.9%
e 137
 
4.6%
t 133
 
4.5%
o 113
 
3.8%
n 102
 
3.4%
s 98
 
3.3%
a 96
 
3.2%
( 72
 
2.4%
) 72
 
2.4%
Other values (49) 789
26.5%
Hangul
ValueCountFrequency (%)
69
 
5.7%
58
 
4.8%
51
 
4.2%
44
 
3.6%
42
 
3.5%
39
 
3.2%
37
 
3.0%
29
 
2.4%
29
 
2.4%
28
 
2.3%
Other values (133) 790
65.0%
Number Forms
ValueCountFrequency (%)
1
50.0%
1
50.0%

colCnt
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 97
Distinct (%): 31.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37756.388
Minimum: 0
Maximum: 908669
Zeros: 113
Zeros (%): 36.6%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1499
Q3: 8697
95th percentile: 214531.2
Maximum: 908669
Range: 908669
Interquartile range (IQR): 8697

Descriptive statistics

Standard deviation: 136944.26
Coefficient of variation (CV): 3.6270487
Kurtosis: 28.133127
Mean: 37756.388
Median Absolute Deviation (MAD): 1499
Skewness: 5.2092516
Sum: 11666724
Variance: 1.875373 × 10^10
Monotonicity: Not monotonic

[Figure: histogram with fixed-size bins (bins=50)]

Common values

Value   Count  Frequency (%)
0       113    36.6%
8697    42     13.6%
1148    11     3.6%
23174   7      2.3%
3196    7      2.3%
908669  5      1.6%
68620   4      1.3%
7627    4      1.3%
90737   4      1.3%
226585  3      1.0%
Other values (87)  109  35.3%

Minimum 10 values

Value  Count  Frequency (%)
0      113    36.6%
7      1      0.3%
41     1      0.3%
42     1      0.3%
103    1      0.3%
177    1      0.3%
259    1      0.3%
432    1      0.3%
530    1      0.3%
588    1      0.3%

Maximum 10 values

Value   Count  Frequency (%)
908669  5      1.6%
607968  3      1.0%
607933  1      0.3%
226585  3      1.0%
217029  3      1.0%
216976  1      0.3%
210864  3      1.0%
111067  2      0.6%
100622  2      0.6%
90737   4      1.3%
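
Several of the colCnt figures are simple functions of one another. The sketch below recomputes the zero share, the coefficient of variation, the variance, and the IQR from the reported values. It is a consistency check on the published numbers, not a recomputation from the raw data, and the variable names are illustrative.

```python
# Figures copied from the colCnt statistics in this section.
n_rows = 309
zeros = 113
mean = 37756.388
std_dev = 136944.26
q1, q3 = 0, 8697

zeros_pct = 100 * zeros / n_rows   # share of rows whose colCnt is 0
cv = std_dev / mean                # coefficient of variation
variance = std_dev ** 2            # on the order of 10^10
iqr = q3 - q1                      # interquartile range

print(f"{zeros_pct:.1f}% zeros, CV={cv:.4f}, variance={variance:.4e}, IQR={iqr}")
```

A standard deviation more than three times the mean, together with a median (1499) far below the mean, is consistent with the strong right skew reported above.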

dispFormat
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 309
Missing (%): 100.0%
Memory size: 2.8 KiB

Interactions

[Figure: four interaction plots]

Correlations

[Figure: correlation heatmap]

          NUM    gpId   gpNm   tblId  tblNm  dataType  colCnt
NUM       1.000  0.987  0.987  0.967  0.967  0.456     0.726
gpId      0.987  1.000  1.000  1.000  1.000  0.343     0.621
gpNm      0.987  1.000  1.000  1.000  1.000  0.343     0.621
tblId     0.967  1.000  1.000  1.000  1.000  0.649     0.883
tblNm     0.967  1.000  1.000  1.000  1.000  0.649     0.883
dataType  0.456  0.343  0.343  0.649  0.649  1.000     0.114
colCnt    0.726  0.621  0.621  0.883  0.883  0.114     1.000

          tblNm  gpId   dataType  gpNm   tblId
tblNm     1.000  0.973  0.374     0.973  1.000
gpId      0.973  1.000  0.275     1.000  0.973
dataType  0.374  0.275  1.000     0.275  0.374
gpNm      0.973  1.000  0.275     1.000  0.973
tblId     1.000  0.973  0.374     0.973  1.000

          NUM    colCnt  gpId   gpNm   tblId  tblNm  dataType
NUM       1.000  0.080   0.833  0.833  0.804  0.804  0.304
colCnt    0.080  1.000   0.274  0.274  0.661  0.661  0.085
gpId      0.833  0.274   1.000  1.000  0.973  0.973  0.275
gpNm      0.833  0.274   1.000  1.000  0.973  0.973  0.275
tblId     0.804  0.661   0.973  0.973  1.000  1.000  0.374
tblNm     0.804  0.661   0.973  0.973  1.000  1.000  0.374
dataType  0.304  0.085   0.275  0.275  0.374  0.374  1.000
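
This section does not say which association measure produced each matrix. For pairs of categorical variables, one commonly used measure is Cramér's V, and the sketch below illustrates why a pair such as gpId and gpNm can reach 1.000. Both the choice of measure and the one-to-one mapping between gpId codes and gpNm labels are assumptions here; the mapping is suggested by the identical category counts (110, 80, 62, 37, 20) in the two variables. The function name `cramers_v` is hypothetical.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical gpId x gpNm table: a one-to-one mapping puts every count on the diagonal.
sizes = [110, 80, 62, 37, 20]
table = [[size if i == j else 0 for j in range(len(sizes))]
         for i, size in enumerate(sizes)]

print(round(cramers_v(table), 3))
```

Under this reading, values slightly below 1 (such as 0.973 between gpId and tblId) would reflect a many-to-one relationship or a bias correction, but the report does not confirm either.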

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
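
Nullity by column is the fraction of rows in which a field is empty. A minimal sketch on two illustrative records built from values in this report, with None standing in for a missing cell; only a subset of the eleven columns is included for brevity:

```python
# Two illustrative records; None marks a missing cell.
rows = [
    {"NUM": 1, "colId": "DIAG_AGE", "dataType": "INTEGER", "colCnt": 20886, "dispFormat": None},
    {"NUM": 2, "colId": "SEX_CD", "dataType": "STRING", "colCnt": 20886, "dispFormat": None},
]

# Nullity by column: fraction of rows in which each field is missing.
columns = list(rows[0].keys())
nullity = {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

print(nullity)
```

In the full dataset the same calculation gives 100% for dispFormat and 0% for every other column, matching the alerts.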

Sample

First rows

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | DIAG_AGE | 진단시나이 | INTEGER | 진단시나이 | 20886 | <NA>
1 | 2 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | SEX_CD | 성별코드 | STRING | 성별코드 | 20886 | <NA>
2 | 3 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | FRMD_YMD | 첫진료일자 | DATE | 첫진료일자 | 19927 | <NA>
3 | 4 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | DRTR_YMD | 약물치료시작일 | DATE | 약물치료시작일 | 16915 | <NA>
4 | 5 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | OPRT_YMD | 수술일자 | DATE | 수술일자 | 7624 | <NA>
5 | 6 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | OPRT_CD | 수술코드 | STRING | 수술코드 | 7627 | <NA>
6 | 7 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | OPRT_NM | 수술명 | STRING | 수술명 | 7627 | <NA>
7 | 8 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | OPDR_ID | 집도의ID | STRING | 집도의ID | 7627 | <NA>
8 | 9 | _GSTR_Summary | Patient info | _GSTR_PT_TRGT | 위암 대상자 | OPDR_NM | 집도의명 | STRING | 집도의명 | 7627 | <NA>
9 | 10 | _GSTR_Summary | Patient info | _GSTR_RG_CNDX | 위암 진단정보 | INIT_DIAG_YMD | 위암 최초 진단일 | DATE | 위암 최초 진단일 | 9669 | <NA>

Last rows

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
299 | 300 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_INSM_YN | 과거병력불면증여부 | STRING | 과거병력불면증여부 | 8697 | <NA>
300 | 301 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_CADZ_YN | 과거병력심장질환여부 | STRING | 과거병력심장질환여부 | 8697 | <NA>
301 | 302 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_ETC_YN | 과거병력기타여부 | STRING | 과거병력기타여부 | 8697 | <NA>
302 | 303 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_HTN_CMNT | 과거병력고혈압내용 | STRING | 과거병력고혈압내용 | 1419 | <NA>
303 | 304 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_DM_CMNT | 과거병력당뇨내용 | STRING | 과거병력당뇨내용 | 653 | <NA>
304 | 305 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_CADZ_CMNT | 과거병력심장질환내용 | STRING | 과거병력심장질환내용 | 177 | <NA>
305 | 306 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | PHIS_ETC_CMNT | 과거병력기타내용 | STRING | 과거병력기타내용 | 3376 | <NA>
306 | 307 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | MAIN_SYMP_YN | 주증상유무 | STRING | 주증상유무 | 8697 | <NA>
307 | 308 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | MAIN_SYMP_CMNT | 주증상내용 | STRING | 주증상내용 | 5500 | <NA>
308 | 309 | _GSTR_HLTH | 기타건강정보 | _GSTR_MR_HLTH | 위암 환자건강정보 | OUTS_DIAG_TRANS_YN | 타병원진단후전원여부 | STRING | 타병원진단후전원여부 | 0 | <NA>