Overview

Dataset statistics

Number of variables              11
Number of observations           613
Missing cells                    613
Missing cells (%)                9.1%
Duplicate rows                   0
Duplicate rows (%)               0.0%
Total size in memory             54.0 KiB
Average record size in memory    90.2 B
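
The two derived figures above can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total memory divided by the number of observations; the variable names are illustrative.

```python
# Figures copied from the Dataset statistics block.
n_variables = 11
n_observations = 613
missing_cells = 613
total_size_kib = 54.0

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the reported 9.1% and 90.2 B, which is consistent with a single fully empty column (colCnt) out of eleven.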

Variable types

Numeric        1
Categorical    4
Text           5
Unsupported    1

Dataset

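Description    Provides metadata for the uterine cancer registry (the data items to be provided, their types and sizes, per-item record counts, sample data, and so on).
Author         국립암센터 (National Cancer Center)
URL            https://www.data.go.kr/data/15048702/fileData.do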

Alerts

gpNm is highly overall correlated with NUM and 1 other field (High correlation)
gpId is highly overall correlated with NUM and 1 other field (High correlation)
NUM is highly overall correlated with gpId and 1 other field (High correlation)
dataType is highly overall correlated with dispFormat (High correlation)
dispFormat is highly overall correlated with dataType (High correlation)
colCnt has 613 (100.0%) missing values (Missing)
NUM has unique values (Unique)
colCnt is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
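
To illustrate why gpId and gpNm are flagged as perfectly associated, the sketch below computes a plain (uncorrected) Cramér's V. This is an assumption: the report does not state which estimator produced the alert, and the pairing of each gpId with a gpNm is inferred from the matching counts in the two Common Values tables. Only the three most frequent groups are used.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Assumed pairing: the i-th most common gpId goes with the i-th most common gpNm.
pairs = (
    [("UTRN_OPRT_STOC", "수술정보(SOTOC)")] * 210
    + [("UTRN_HLTH", "환자건강정보")] * 93
    + [("UTRN_OPRT", "수술")] * 85
)
gp_id = [p[0] for p in pairs]
gp_nm = [p[1] for p in pairs]
v = cramers_v(gp_id, gp_nm)
print(round(v, 3))
```

A one-to-one mapping between an identifier and its display name yields the maximum association, so one of the two columns carries no extra information.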

Reproduction

Analysis started          2024-04-19 06:23:20.886571
Analysis finished         2024-04-19 06:23:21.958002
Duration                  1.07 seconds
Software version          ydata-profiling v4.5.1
Download configuration    config.json
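
The reported duration follows from the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-04-19 06:23:20.886571")
finished = datetime.fromisoformat("2024-04-19 06:23:21.958002")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```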

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct        613
Distinct (%)    100.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            307
Minimum         1
Maximum         613
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     5.5 KiB

Quantile statistics

Minimum                      1
5-th percentile              31.6
Q1                           154
median                       307
Q3                           460
95-th percentile             582.4
Maximum                      613
Range                        612
Interquartile range (IQR)    306
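
Because NUM is simply the sequence 1 to 613, the quantiles can be reproduced by hand. The sketch below assumes linear interpolation between the two nearest ranks, which matches the reported values; the helper name `quantile` is illustrative.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

num = list(range(1, 614))  # NUM runs from 1 to 613 without gaps

p5, q1, med, q3, p95 = (quantile(num, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = max(num) - min(num)
print(round(p5, 1), q1, med, q3, round(p95, 1), iqr, value_range)
```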

Descriptive statistics

Standard deviation                 177.10214
Coefficient of variation (CV)      0.57687992
Kurtosis                           -1.2
Mean                               307
Median Absolute Deviation (MAD)    153
Skewness                           0
Sum                                188191
Variance                           31365.167
Monotonicity                       Strictly increasing
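
Most of these descriptive statistics can be recomputed from the sequence 1 to 613. The sketch below assumes the sample variance (divisor n - 1), which reproduces the reported figure; kurtosis and skewness are left out.

```python
from math import sqrt
from statistics import median

num = list(range(1, 614))
n = len(num)

total = sum(num)
mean = total / n
variance = sum((x - mean) ** 2 for x in num) / (n - 1)  # sample variance
std = sqrt(variance)
cv = std / mean                                         # coefficient of variation
center = median(num)
mad = median(abs(x - center) for x in num)              # median absolute deviation
strictly_increasing = all(a < b for a, b in zip(num, num[1:]))

print(total, mean, round(variance, 3), round(std, 5), round(cv, 8), mad)
```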
Histogram with fixed size bins (bins=50)
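
A fixed-size binning of this kind can be sketched as follows. This is an illustration of equal-width bins over the observed range, not the plotting library's exact implementation; the function name is hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge; keep it in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

counts = fixed_width_histogram(range(1, 614), bins=50)
print(len(counts), sum(counts), min(counts), max(counts))
```

For a gap-free sequence such as NUM, every bin receives nearly the same count, so the histogram is flat.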
Common values

Value                 Count    Frequency (%)
1                     1        0.2%
413                   1        0.2%
406                   1        0.2%
407                   1        0.2%
408                   1        0.2%
409                   1        0.2%
410                   1        0.2%
411                   1        0.2%
412                   1        0.2%
414                   1        0.2%
Other values (603)    603      98.4%

Minimum 10 values

Value    Count    Frequency (%)
1        1        0.2%
2        1        0.2%
3        1        0.2%
4        1        0.2%
5        1        0.2%
6        1        0.2%
7        1        0.2%
8        1        0.2%
9        1        0.2%
10       1        0.2%

Maximum 10 values

Value    Count    Frequency (%)
613      1        0.2%
612      1        0.2%
611      1        0.2%
610      1        0.2%
609      1        0.2%
608      1        0.2%
607      1        0.2%
606      1        0.2%
605      1        0.2%
604      1        0.2%

gpId
Categorical

HIGH CORRELATION 

Distinct24
Distinct (%)3.9%
Missing0
Missing (%)0.0%
Memory size4.9 KiB
UTRN_OPRT_STOC       210
UTRN_HLTH            93
UTRN_OPRT            85
UTRN_SPR             45
UTRN_CHMO_FLST       26
Other values (19)    154

Length

Max length14
Median length10
Mean length11.231648
Min length7

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowUTRN_TRGT
2nd rowUTRN_TRGT
3rd rowUTRN_TRGT
4th rowUTRN_TRGT
5th rowUTRN_TRGT

Common Values

Value                Count    Frequency (%)
UTRN_OPRT_STOC       210      34.3%
UTRN_HLTH            93       15.2%
UTRN_OPRT            85       13.9%
UTRN_SPR             45       7.3%
UTRN_CHMO_FLST       26       4.2%
UTRN_TRGT            16       2.6%
UTRN_IMNL            13       2.1%
UTRN_CNDX_BDMS       12       2.0%
UTRN_CHMO            12       2.0%
UTRN_RTX             11       1.8%
Other values (14)    90       14.7%

Length

Histogram of lengths of the category
Value                Count    Frequency (%)
utrn_oprt_stoc       210      34.3%
utrn_hlth            93       15.2%
utrn_oprt            85       13.9%
utrn_spr             45       7.3%
utrn_chmo_flst       26       4.2%
utrn_trgt            16       2.6%
utrn_imnl            13       2.1%
utrn_cndx_bdms       12       2.0%
utrn_chmo            12       2.0%
utrn_rtx             11       1.8%
Other values (14)    90       14.7%

gpNm
Categorical

HIGH CORRELATION 

Distinct24
Distinct (%)3.9%
Missing0
Missing (%)0.0%
Memory size4.9 KiB
수술정보(SOTOC)      210
환자건강정보         93
수술                 85
외과병리             45
항암 FlowSheet       26
Other values (19)    154

Length

Max length12
Median length11
Mean length7.4241436
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSummary
2nd rowSummary
3rd rowSummary
4th rowSummary
5th rowSummary

Common Values

Value                Count    Frequency (%)
수술정보(SOTOC)      210      34.3%
환자건강정보         93       15.2%
수술                 85       13.9%
외과병리             45       7.3%
항암 FlowSheet       26       4.2%
Summary              16       2.6%
면역병리검사         13       2.1%
진단 및 신체         12       2.0%
항암치료             12       2.0%
방사선치료           11       1.8%
Other values (14)    90       14.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
수술정보(sotoc 210
29.7%
환자건강정보 93
13.2%
수술 85
12.0%
외과병리 45
 
6.4%
항암 26
 
3.7%
flowsheet 26
 
3.7%
22
 
3.1%
summary 16
 
2.3%
영상검사 14
 
2.0%
면역병리검사 13
 
1.8%
Other values (18) 157
22.2%

tblId
Text

Distinct62
Distinct (%)10.1%
Missing0
Missing (%)0.0%
Memory size4.9 KiB

Length

Max length20
Median length18
Mean length15.513866
Min length10

Characters and Unicode

Total characters9510
Distinct characters32
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
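
As a small illustration of these character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The input is the first tblId sample value from this report; the mapping from category codes to long names covers only the three categories reported for this variable.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in tblId.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Nd": "Decimal Number",
}

value = "UTRN_PT_TRGT"  # first tblId sample row
categories = Counter(unicodedata.category(ch) for ch in value)

for code, count in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Summing such tallies over every row would give a per-category character count of the kind shown in this section.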

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowUTRN_PT_TRGT
2nd rowUTRN_PT_TRGT
3rd rowUTRN_PT_TRGT
4th rowUTRN_PT_TRGT
5th rowUTRN_PT_TRGT
Value                  Count    Frequency (%)
utrn_pe_oprt           74       12.1%
utrn_pe_spr            45       7.3%
utrn_pe_chmo_flst      26       4.2%
utrn_pe_oprt_stoc_5    24       3.9%
utrn_pe_oprt_stoc_9    23       3.8%
utrn_pe_oprt_stoc_6    23       3.8%
utrn_mr_hlth_2         19       3.1%
utrn_pe_oprt_stoc_3    17       2.8%
utrn_pt_trgt           16       2.6%
utrn_pe_oprt_stoc_4    15       2.4%
Other values (52)      331      54.0%

Most occurring characters

ValueCountFrequency (%)
_ 1828
19.2%
T 1334
14.0%
R 1090
11.5%
P 861
9.1%
N 695
 
7.3%
U 625
 
6.6%
E 587
 
6.2%
O 537
 
5.6%
S 310
 
3.3%
C 287
 
3.0%
Other values (22) 1356
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 7238
76.1%
Connector Punctuation 1828
 
19.2%
Decimal Number 444
 
4.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 1334
18.4%
R 1090
15.1%
P 861
11.9%
N 695
9.6%
U 625
8.6%
E 587
8.1%
O 537
7.4%
S 310
 
4.3%
C 287
 
4.0%
H 244
 
3.4%
Other values (11) 668
9.2%
Decimal Number
ValueCountFrequency (%)
1 107
24.1%
2 91
20.5%
6 44
9.9%
9 36
 
8.1%
5 36
 
8.1%
3 34
 
7.7%
4 34
 
7.7%
8 28
 
6.3%
7 23
 
5.2%
0 11
 
2.5%
Connector Punctuation
ValueCountFrequency (%)
_ 1828
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7238
76.1%
Common 2272
 
23.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 1334
18.4%
R 1090
15.1%
P 861
11.9%
N 695
9.6%
U 625
8.6%
E 587
8.1%
O 537
7.4%
S 310
 
4.3%
C 287
 
4.0%
H 244
 
3.4%
Other values (11) 668
9.2%
Common
ValueCountFrequency (%)
_ 1828
80.5%
1 107
 
4.7%
2 91
 
4.0%
6 44
 
1.9%
9 36
 
1.6%
5 36
 
1.6%
3 34
 
1.5%
4 34
 
1.5%
8 28
 
1.2%
7 23
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9510
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 1828
19.2%
T 1334
14.0%
R 1090
11.5%
P 861
9.1%
N 695
 
7.3%
U 625
 
6.6%
E 587
 
6.2%
O 537
 
5.6%
S 310
 
3.3%
C 287
 
3.0%
Other values (22) 1356
14.3%

tblNm
Text

Distinct62
Distinct (%)10.1%
Missing0
Missing (%)0.0%
Memory size4.9 KiB

Length

Max length36
Median length27
Mean length8.1076672
Min length2

Characters and Unicode

Total characters4970
Distinct characters115
Distinct categories9 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
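
Python's standard library does not expose Unicode blocks directly, so the sketch below approximates the two blocks reported for tblNm with code-point ranges. The ranges (Basic Latin up to U+007F, Hangul Syllables from U+AC00 to U+D7AF) and the labels "ASCII" and "Hangul" are assumptions chosen to mirror this report; any other character is labelled "Other".

```python
from collections import Counter

def block_of(ch):
    """Rough Unicode block lookup covering only the blocks seen in tblNm."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables block
    return "Other"

# Two tblNm values that appear in the sample rows of this report.
values = ["기본정보", "Flow Sheet"]
blocks = Counter(block_of(ch) for value in values for ch in value)
print(blocks)
```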

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row기본정보
2nd row기본정보
3rd row기본정보
4th row기본정보
5th row기본정보
ValueCountFrequency (%)
결과 79
 
7.7%
수술정보 74
 
7.2%
ln 71
 
6.9%
66
 
6.4%
수술 45
 
4.4%
45
 
4.4%
sheet 26
 
2.5%
flow 26
 
2.5%
ruq 24
 
2.3%
pelvis 23
 
2.2%
Other values (71) 552
53.5%

Most occurring characters

ValueCountFrequency (%)
466
 
9.4%
a 213
 
4.3%
L 188
 
3.8%
l 180
 
3.6%
i 174
 
3.5%
e 173
 
3.5%
158
 
3.2%
r 157
 
3.2%
153
 
3.1%
t 148
 
3.0%
Other values (105) 2960
59.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1761
35.4%
Other Letter 1569
31.6%
Uppercase Letter 804
16.2%
Space Separator 466
 
9.4%
Dash Punctuation 102
 
2.1%
Decimal Number 88
 
1.8%
Other Punctuation 68
 
1.4%
Open Punctuation 56
 
1.1%
Close Punctuation 56
 
1.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
158
 
10.1%
153
 
9.8%
147
 
9.4%
130
 
8.3%
124
 
7.9%
114
 
7.3%
65
 
4.1%
45
 
2.9%
41
 
2.6%
38
 
2.4%
Other values (52) 554
35.3%
Lowercase Letter
ValueCountFrequency (%)
a 213
12.1%
l 180
10.2%
i 174
9.9%
e 173
9.8%
r 157
8.9%
t 148
8.4%
o 131
 
7.4%
n 96
 
5.5%
c 73
 
4.1%
s 72
 
4.1%
Other values (12) 344
19.5%
Uppercase Letter
ValueCountFrequency (%)
L 188
23.4%
N 125
15.5%
P 66
 
8.2%
S 60
 
7.5%
U 53
 
6.6%
C 48
 
6.0%
Q 41
 
5.1%
F 38
 
4.7%
R 36
 
4.5%
O 33
 
4.1%
Other values (8) 116
14.4%
Decimal Number
ValueCountFrequency (%)
2 29
33.0%
1 15
17.0%
6 12
13.6%
8 10
 
11.4%
5 6
 
6.8%
4 6
 
6.8%
3 6
 
6.8%
7 4
 
4.5%
Space Separator
ValueCountFrequency (%)
466
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 68
100.0%
Open Punctuation
ValueCountFrequency (%)
( 56
100.0%
Close Punctuation
ValueCountFrequency (%)
) 56
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2565
51.6%
Hangul 1569
31.6%
Common 836
 
16.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
158
 
10.1%
153
 
9.8%
147
 
9.4%
130
 
8.3%
124
 
7.9%
114
 
7.3%
65
 
4.1%
45
 
2.9%
41
 
2.6%
38
 
2.4%
Other values (52) 554
35.3%
Latin
ValueCountFrequency (%)
a 213
 
8.3%
L 188
 
7.3%
l 180
 
7.0%
i 174
 
6.8%
e 173
 
6.7%
r 157
 
6.1%
t 148
 
5.8%
o 131
 
5.1%
N 125
 
4.9%
n 96
 
3.7%
Other values (30) 980
38.2%
Common
ValueCountFrequency (%)
466
55.7%
- 102
 
12.2%
/ 68
 
8.1%
( 56
 
6.7%
) 56
 
6.7%
2 29
 
3.5%
1 15
 
1.8%
6 12
 
1.4%
8 10
 
1.2%
5 6
 
0.7%
Other values (3) 16
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3401
68.4%
Hangul 1569
31.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
466
 
13.7%
a 213
 
6.3%
L 188
 
5.5%
l 180
 
5.3%
i 174
 
5.1%
e 173
 
5.1%
r 157
 
4.6%
t 148
 
4.4%
o 131
 
3.9%
N 125
 
3.7%
Other values (43) 1446
42.5%
Hangul
ValueCountFrequency (%)
158
 
10.1%
153
 
9.8%
147
 
9.4%
130
 
8.3%
124
 
7.9%
114
 
7.3%
65
 
4.1%
45
 
2.9%
41
 
2.6%
38
 
2.4%
Other values (52) 554
35.3%

colId
Text

Distinct539
Distinct (%)87.9%
Missing0
Missing (%)0.0%
Memory size4.9 KiB

Length

Max length31
Median length22
Mean length14.176183
Min length5

Characters and Unicode

Total characters8690
Distinct characters35
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique508 ?
Unique (%)82.9%
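
Distinct and Unique measure different things: Distinct counts the different values, while Unique counts the values that occur exactly once. The sketch below shows the difference on a small made-up list of column identifiers (the list is illustrative, not taken from the dataset) and then derives two figures from the colId numbers reported above.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Illustrative list: two identifiers repeat, two occur once.
sample = ["PT_SBST_NO", "SEX_CD", "BRTH_YMD", "PT_SBST_NO", "EXAM_YMD", "EXAM_YMD"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)

# From the reported colId figures: 613 rows, 539 distinct, 508 unique.
repeated_values = 539 - 508   # identifiers used in more than one row
rows_with_repeats = 613 - 508 # rows holding one of those repeated identifiers
print(repeated_values, rows_with_repeats)
```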

Sample

1st rowPT_SBST_NO
2nd rowSEX_CD
3rd rowBRTH_YMD
4th rowFRST_DIAG_YMD
5th rowFRST_DIAG_CD
Value                 Count    Frequency (%)
pt_sbst_no            28       4.6%
exam_ymd              6        1.0%
exam_nm               6        1.0%
exam_yn               6        1.0%
oprt_ymd              5        0.8%
chmo_strt_ymd         3        0.5%
oprt_nm               3        0.5%
gene_muta_cd          2        0.3%
chmo_prps_nm          2        0.3%
cexm_nm               2        0.3%
Other values (529)    550      89.7%

Most occurring characters

ValueCountFrequency (%)
_ 1562
18.0%
N 761
 
8.8%
T 737
 
8.5%
M 558
 
6.4%
L 556
 
6.4%
S 481
 
5.5%
C 453
 
5.2%
R 431
 
5.0%
P 380
 
4.4%
E 370
 
4.3%
Other values (25) 2401
27.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 7067
81.3%
Connector Punctuation 1562
 
18.0%
Decimal Number 61
 
0.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 761
 
10.8%
T 737
 
10.4%
M 558
 
7.9%
L 556
 
7.9%
S 481
 
6.8%
C 453
 
6.4%
R 431
 
6.1%
P 380
 
5.4%
E 370
 
5.2%
Y 345
 
4.9%
Other values (16) 1995
28.2%
Decimal Number
ValueCountFrequency (%)
2 14
23.0%
1 10
16.4%
3 9
14.8%
6 8
13.1%
4 6
9.8%
5 6
9.8%
7 4
 
6.6%
8 4
 
6.6%
Connector Punctuation
ValueCountFrequency (%)
_ 1562
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7067
81.3%
Common 1623
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 761
 
10.8%
T 737
 
10.4%
M 558
 
7.9%
L 556
 
7.9%
S 481
 
6.8%
C 453
 
6.4%
R 431
 
6.1%
P 380
 
5.4%
E 370
 
5.2%
Y 345
 
4.9%
Other values (16) 1995
28.2%
Common
ValueCountFrequency (%)
_ 1562
96.2%
2 14
 
0.9%
1 10
 
0.6%
3 9
 
0.6%
6 8
 
0.5%
4 6
 
0.4%
5 6
 
0.4%
7 4
 
0.2%
8 4
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 1562
18.0%
N 761
 
8.8%
T 737
 
8.5%
M 558
 
6.4%
L 556
 
6.4%
S 481
 
5.5%
C 453
 
5.2%
R 431
 
5.0%
P 380
 
4.4%
E 370
 
4.3%
Other values (25) 2401
27.6%

colNm
Text

Distinct519
Distinct (%)84.7%
Missing0
Missing (%)0.0%
Memory size4.9 KiB

Length

Max length50
Median length41
Mean length17.154976
Min length2

Characters and Unicode

Total characters10516
Distinct characters213
Distinct categories10 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique491 ?
Unique (%)80.1%

Sample

1st row환자대체번호
2nd row성별 코드
3rd row생년월일
4th row최초 진단일
5th row최초 진단 코드
ValueCountFrequency (%)
여부 190
 
10.1%
내용 135
 
7.2%
57
 
3.0%
size 43
 
2.3%
right 36
 
1.9%
left 34
 
1.8%
lnd 33
 
1.8%
other 32
 
1.7%
환자대체번호 28
 
1.5%
lns 27
 
1.4%
Other values (444) 1267
67.3%

Most occurring characters

ValueCountFrequency (%)
1270
 
12.1%
e 616
 
5.9%
i 510
 
4.8%
t 503
 
4.8%
a 453
 
4.3%
o 451
 
4.3%
r 448
 
4.3%
c 270
 
2.6%
l 265
 
2.5%
s 260
 
2.5%
Other values (203) 5470
52.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5307
50.5%
Other Letter 2111
 
20.1%
Uppercase Letter 1348
 
12.8%
Space Separator 1270
 
12.1%
Close Punctuation 114
 
1.1%
Open Punctuation 114
 
1.1%
Dash Punctuation 105
 
1.0%
Other Punctuation 83
 
0.8%
Decimal Number 61
 
0.6%
Math Symbol 3
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
248
 
11.7%
226
 
10.7%
150
 
7.1%
146
 
6.9%
58
 
2.7%
54
 
2.6%
53
 
2.5%
53
 
2.5%
50
 
2.4%
49
 
2.3%
Other values (139) 1024
48.5%
Lowercase Letter
ValueCountFrequency (%)
e 616
11.6%
i 510
 
9.6%
t 503
 
9.5%
a 453
 
8.5%
o 451
 
8.5%
r 448
 
8.4%
c 270
 
5.1%
l 265
 
5.0%
s 260
 
4.9%
n 256
 
4.8%
Other values (14) 1275
24.0%
Uppercase Letter
ValueCountFrequency (%)
L 253
18.8%
P 161
11.9%
N 146
10.8%
S 130
9.6%
R 91
 
6.8%
O 89
 
6.6%
C 67
 
5.0%
A 54
 
4.0%
D 53
 
3.9%
U 48
 
3.6%
Other values (14) 256
19.0%
Decimal Number
ValueCountFrequency (%)
2 13
21.3%
1 10
16.4%
3 9
14.8%
6 8
13.1%
5 7
11.5%
4 6
9.8%
7 4
 
6.6%
8 4
 
6.6%
Other Punctuation
ValueCountFrequency (%)
/ 58
69.9%
% 13
 
15.7%
: 12
 
14.5%
Space Separator
ValueCountFrequency (%)
1270
100.0%
Close Punctuation
ValueCountFrequency (%)
) 114
100.0%
Open Punctuation
ValueCountFrequency (%)
( 114
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105
100.0%
Math Symbol
ValueCountFrequency (%)
+ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6655
63.3%
Hangul 2111
 
20.1%
Common 1750
 
16.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
248
 
11.7%
226
 
10.7%
150
 
7.1%
146
 
6.9%
58
 
2.7%
54
 
2.6%
53
 
2.5%
53
 
2.5%
50
 
2.4%
49
 
2.3%
Other values (139) 1024
48.5%
Latin
ValueCountFrequency (%)
e 616
 
9.3%
i 510
 
7.7%
t 503
 
7.6%
a 453
 
6.8%
o 451
 
6.8%
r 448
 
6.7%
c 270
 
4.1%
l 265
 
4.0%
s 260
 
3.9%
n 256
 
3.8%
Other values (38) 2623
39.4%
Common
ValueCountFrequency (%)
1270
72.6%
) 114
 
6.5%
( 114
 
6.5%
- 105
 
6.0%
/ 58
 
3.3%
2 13
 
0.7%
% 13
 
0.7%
: 12
 
0.7%
1 10
 
0.6%
3 9
 
0.5%
Other values (6) 32
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8405
79.9%
Hangul 2111
 
20.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1270
 
15.1%
e 616
 
7.3%
i 510
 
6.1%
t 503
 
6.0%
a 453
 
5.4%
o 451
 
5.4%
r 448
 
5.3%
c 270
 
3.2%
l 265
 
3.2%
s 260
 
3.1%
Other values (54) 3359
40.0%
Hangul
ValueCountFrequency (%)
248
 
11.7%
226
 
10.7%
150
 
7.1%
146
 
6.9%
58
 
2.7%
54
 
2.6%
53
 
2.5%
53
 
2.5%
50
 
2.4%
49
 
2.3%
Other values (139) 1024
48.5%

dataType
Categorical

HIGH CORRELATION 

Distinct29
Distinct (%)4.7%
Missing0
Missing (%)0.0%
Memory size4.9 KiB
String(1)            238
String(50)           76
String(100)          62
DATE                 49
String(10)           40
Other values (24)    148

Length

Max length13
Median length12
Mean length9.4159869
Min length4

Unique

Unique6 ?
Unique (%)1.0%

Sample

1st rowString(10)
2nd rowString(code)
3rd rowDATE
4th rowDATE
5th rowString(code)

Common Values

Value                Count    Frequency (%)
String(1)            238      38.8%
String(50)           76       12.4%
String(100)          62       10.1%
DATE                 49       8.0%
String(10)           40       6.5%
String(200)          28       4.6%
String(20)           24       3.9%
Integer(code)        15       2.4%
String(4000)         11       1.8%
String(256)          10       1.6%
Other values (19)    60       9.8%

Length

Histogram of lengths of the category
Value                Count    Frequency (%)
string(1             238      38.8%
string(50            76       12.4%
string(100           62       10.1%
date                 49       8.0%
string(10            40       6.5%
string(200           28       4.6%
string(20            24       3.9%
integer(code         15       2.4%
string(4000          11       1.8%
string(256           10       1.6%
Other values (19)    60       9.8%

colDesc
Text

Distinct537
Distinct (%)87.6%
Missing0
Missing (%)0.0%
Memory size4.9 KiB

Length

Max length52
Median length41
Mean length16.877651
Min length2

Characters and Unicode

Total characters10346
Distinct characters222
Distinct categories11 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique504 ?
Unique (%)82.2%

Sample

1st row환자대체번호
2nd row성별코드
3rd row생년월일
4th row최초진단일자
5th row최초진단코드
ValueCountFrequency (%)
여부 102
 
5.8%
내용 90
 
5.1%
59
 
3.3%
size 43
 
2.4%
right 36
 
2.0%
left 34
 
1.9%
lnd 33
 
1.9%
other 32
 
1.8%
환자대체번호 28
 
1.6%
lns 27
 
1.5%
Other values (490) 1278
72.5%

Most occurring characters

ValueCountFrequency (%)
1151
 
11.1%
e 611
 
5.9%
i 515
 
5.0%
t 499
 
4.8%
o 451
 
4.4%
a 449
 
4.3%
r 443
 
4.3%
c 276
 
2.7%
L 262
 
2.5%
l 260
 
2.5%
Other values (212) 5429
52.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5283
51.1%
Other Letter 2083
 
20.1%
Uppercase Letter 1375
 
13.3%
Space Separator 1151
 
11.1%
Open Punctuation 107
 
1.0%
Close Punctuation 107
 
1.0%
Dash Punctuation 104
 
1.0%
Other Punctuation 67
 
0.6%
Decimal Number 62
 
0.6%
Connector Punctuation 4
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
167
 
8.0%
144
 
6.9%
114
 
5.5%
110
 
5.3%
76
 
3.6%
65
 
3.1%
64
 
3.1%
53
 
2.5%
53
 
2.5%
51
 
2.4%
Other values (148) 1186
56.9%
Lowercase Letter
ValueCountFrequency (%)
e 611
11.6%
i 515
9.7%
t 499
 
9.4%
o 451
 
8.5%
a 449
 
8.5%
r 443
 
8.4%
c 276
 
5.2%
l 260
 
4.9%
n 259
 
4.9%
s 258
 
4.9%
Other values (14) 1262
23.9%
Uppercase Letter
ValueCountFrequency (%)
L 262
19.1%
P 161
11.7%
N 150
10.9%
S 131
9.5%
O 91
 
6.6%
R 89
 
6.5%
C 69
 
5.0%
D 54
 
3.9%
A 53
 
3.9%
U 48
 
3.5%
Other values (14) 267
19.4%
Decimal Number
ValueCountFrequency (%)
2 13
21.0%
1 11
17.7%
3 9
14.5%
6 8
12.9%
5 7
11.3%
4 6
9.7%
8 4
 
6.5%
7 4
 
6.5%
Other Punctuation
ValueCountFrequency (%)
/ 57
85.1%
: 10
 
14.9%
Space Separator
ValueCountFrequency (%)
1151
100.0%
Open Punctuation
ValueCountFrequency (%)
( 107
100.0%
Close Punctuation
ValueCountFrequency (%)
) 107
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 104
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Math Symbol
ValueCountFrequency (%)
+ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6658
64.4%
Hangul 2083
 
20.1%
Common 1605
 
15.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
167
 
8.0%
144
 
6.9%
114
 
5.5%
110
 
5.3%
76
 
3.6%
65
 
3.1%
64
 
3.1%
53
 
2.5%
53
 
2.5%
51
 
2.4%
Other values (148) 1186
56.9%
Latin
ValueCountFrequency (%)
e 611
 
9.2%
i 515
 
7.7%
t 499
 
7.5%
o 451
 
6.8%
a 449
 
6.7%
r 443
 
6.7%
c 276
 
4.1%
L 262
 
3.9%
l 260
 
3.9%
n 259
 
3.9%
Other values (38) 2633
39.5%
Common
ValueCountFrequency (%)
1151
71.7%
( 107
 
6.7%
) 107
 
6.7%
- 104
 
6.5%
/ 57
 
3.6%
2 13
 
0.8%
1 11
 
0.7%
: 10
 
0.6%
3 9
 
0.6%
6 8
 
0.5%
Other values (6) 28
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8263
79.9%
Hangul 2083
 
20.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1151
 
13.9%
e 611
 
7.4%
i 515
 
6.2%
t 499
 
6.0%
o 451
 
5.5%
a 449
 
5.4%
r 443
 
5.4%
c 276
 
3.3%
L 262
 
3.2%
l 260
 
3.1%
Other values (54) 3346
40.5%
Hangul
ValueCountFrequency (%)
167
 
8.0%
144
 
6.9%
114
 
5.5%
110
 
5.3%
76
 
3.6%
65
 
3.1%
64
 
3.1%
53
 
2.5%
53
 
2.5%
51
 
2.4%
Other values (148) 1186
56.9%

colCnt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing613
Missing (%)100.0%
Memory size5.5 KiB

dispFormat
Categorical

HIGH CORRELATION 

Distinct11
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Memory size4.9 KiB
텍스트               242
Y : 유 / N : 무      234
YYYY-MM-DD           49
숫자                 41
RN+비식별숫자(8)     28
Other values (6)     19

Length

Max length15
Median length13
Mean length7.8531811
Min length2

Unique

Unique3 ?
Unique (%)0.5%

Sample

1st rowRN+비식별숫자(8)
2nd rowM 남 | F 여
3rd rowYYYY-MM-DD
4th rowYYYY-MM-DD
5th row원내검사 코드

Common Values

Value                   Count    Frequency (%)
텍스트                  242      39.5%
Y : 유 / N : 무         234      38.2%
YYYY-MM-DD              49       8.0%
숫자                    41       6.7%
RN+비식별숫자(8)        28       4.6%
Free 텍스트             11       1.8%
Y : 내부 / N : 외부     3        0.5%
원내검사 코드           2        0.3%
M 남 | F 여             1        0.2%
FREE 텍스트             1        0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
712
34.7%
텍스트 254
 
12.4%
y 237
 
11.5%
n 237
 
11.5%
234
 
11.4%
234
 
11.4%
yyyy-mm-dd 49
 
2.4%
숫자 41
 
2.0%
rn+비식별숫자(8 28
 
1.4%
free 12
 
0.6%
Other values (9) 15
 
0.7%

Interactions


Correlations

              NUM      gpId     gpNm     tblId    tblNm    dataType    dispFormat
NUM           1.000    0.933    0.933    0.993    0.993    0.683       0.491
gpId          0.933    1.000    1.000    1.000    1.000    0.773       0.664
gpNm          0.933    1.000    1.000    1.000    1.000    0.773       0.664
tblId         0.993    1.000    1.000    1.000    1.000    0.754       0.711
tblNm         0.993    1.000    1.000    1.000    1.000    0.754       0.711
dataType      0.683    0.773    0.773    0.754    0.754    1.000       0.956
dispFormat    0.491    0.664    0.664    0.711    0.711    0.956       1.000

              dataType    gpNm     gpId     dispFormat
dataType      1.000       0.295    0.295    0.743
gpNm          0.295       1.000    1.000    0.299
gpId          0.295       1.000    1.000    0.299
dispFormat    0.743       0.299    0.299    1.000

              NUM      gpId     gpNm     dataType    dispFormat
NUM           1.000    0.692    0.692    0.313       0.233
gpId          0.692    1.000    1.000    0.295       0.299
gpNm          0.692    1.000    1.000    0.295       0.299
dataType      0.313    0.295    0.295    1.000       0.743
dispFormat    0.233    0.299    0.299    0.743       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
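
The idea behind both displays can be sketched in plain Python: compute the share of missing values per column, then draw one text row per record with a mark for present and a dot for missing. The three records below are a reduced subset of the first sample rows (four of the eleven columns), with `None` standing in for `<NA>`; the function names are illustrative.

```python
# Reduced subset of the first sample rows; None stands in for <NA>.
rows = [
    {"NUM": 1, "colId": "PT_SBST_NO", "dataType": "String(10)", "colCnt": None},
    {"NUM": 2, "colId": "SEX_CD", "dataType": "String(code)", "colCnt": None},
    {"NUM": 3, "colId": "BRTH_YMD", "dataType": "DATE", "colCnt": None},
]

def missing_share(rows):
    """Fraction of missing values per column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

def nullity_rows(rows):
    """One string per record: '#' for a present value, '.' for a missing one."""
    return ["".join("." if r[c] is None else "#" for c in r) for r in rows]

share = missing_share(rows)
matrix = nullity_rows(rows)
print(share)
print("\n".join(matrix))
```

In this dataset the pattern is trivial: colCnt is empty in every row and all other columns are complete.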

Sample

First rows

     NUM  gpId       gpNm     tblId         tblNm     colId          colNm          dataType      colDesc       colCnt  dispFormat
0    1    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  PT_SBST_NO     환자대체번호   String(10)    환자대체번호  <NA>    RN+비식별숫자(8)
1    2    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  SEX_CD         성별 코드      String(code)  성별코드      <NA>    M 남 | F 여
2    3    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  BRTH_YMD       생년월일       DATE          생년월일      <NA>    YYYY-MM-DD
3    4    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRST_DIAG_YMD  최초 진단일    DATE          최초진단일자  <NA>    YYYY-MM-DD
4    5    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRST_DIAG_CD   최초 진단 코드 String(code)  최초진단코드  <NA>    원내검사 코드
5    6    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRST_DIAG_NM   최초 진단명    String(256)   최초진단명    <NA>    텍스트
6    7    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  DIAG_ATT_AGE   진단 시 나이   Integer(3)    진단 시 나이  <NA>    숫자
7    8    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRMD_YMD       초진일         DATE          초진일자      <NA>    YYYY-MM-DD
8    9    UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRMD_DEPT_NM   초진 부서명    String(20)    초진 부서     <NA>    텍스트
9    10   UTRN_TRGT  Summary  UTRN_PT_TRGT  기본정보  FRST_OPRT_YMD  최초 수술일    DATE          최초 수술일자 <NA>    YYYY-MM-DD

Last rows

     NUM  gpId            gpNm            tblId              tblNm       colId           colNm              dataType       colDesc            colCnt  dispFormat
603  604  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  FATG_CMNT       FATIGUE 내용       String(50)     FATIGUE            <NA>    텍스트
604  605  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  NV_CMNT         NV 내용            String(50)     NV                 <NA>    텍스트
605  606  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  CSTP_CMNT       CONSTIPATION 내용  String(50)     CONSTIPATION       <NA>    텍스트
606  607  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  DIAR_CMNT       DIARRHEA 내용      String(50)     DIARRHEA           <NA>    텍스트
607  608  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  SKIN_RASH_CMNT  SKINRASH 내용      String(50)     SKINRASH           <NA>    텍스트
608  609  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  MCST_CMNT       MUCOSITIS 내용     String(50)     MUCOSITIS          <NA>    텍스트
609  610  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  NURO_PTHY_CMNT  NEUROPATHY 내용    String(50)     NEUROPATHY         <NA>    텍스트
610  611  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  ECOG_CD         ECOG 코드          Integer(code)  ECOG 전신상태평가  <NA>    숫자
611  612  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  WT_VL           체중 (kg)          Float(5,2)     체중               <NA>    숫자
612  613  UTRN_CHMO_FLST  항암 FlowSheet  UTRN_PE_CHMO_FLST  Flow Sheet  BSA_VL          BSA                Float(10,2)    체표면적           <NA>    숫자