Overview

Dataset statistics

Number of variables: 11
Number of observations: 489
Missing cells: 572
Missing cells (%): 10.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 43.1 KiB
Average record size in memory: 90.3 B
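The headline figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it copies the per-column missing counts from the Alerts section and assumes that the missing-cell percentage is taken over variables × observations, which reproduces the reported values.

```python
# Figures copied from the Overview and Alerts sections of this report.
n_variables = 11
n_observations = 489
missing_per_column = {"colDesc": 8, "colCnt": 489, "dispFormat": 75}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

# Per-column missing share, relative to the number of observations.
missing_pct_by_column = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_per_column.items()
}

# 43.1 KiB spread over 489 rows. The reported 90.3 B was presumably
# computed from the unrounded byte count, so expect approximate agreement.
avg_record_bytes = 43.1 * 1024 / n_observations

print(total_cells, missing_cells, round(missing_pct, 1))
print(missing_pct_by_column)
print(round(avg_record_bytes, 1))
```

The three missing counts add up to the 572 missing cells listed above, so no other column contributes missing values.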

Variable types

Numeric: 1
Categorical: 3
Text: 6
Unsupported: 1

Dataset

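Description: Provides metadata for the ovarian cancer registry (the data items to be provided, their types, sizes, per-item record counts, sample data, etc.)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15048685/fileData.do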

Alerts

gpId is highly correlated overall with NUM and 1 other field [High correlation]
gpNm is highly correlated overall with NUM and 1 other field [High correlation]
NUM is highly correlated overall with gpId and 1 other field [High correlation]
dataType is highly imbalanced (75.7%) [Imbalance]
colDesc has 8 (1.6%) missing values [Missing]
colCnt has 489 (100.0%) missing values [Missing]
dispFormat has 75 (15.3%) missing values [Missing]
NUM has unique values [Unique]
colCnt is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-17 18:53:38.623977
Analysis finished: 2024-04-17 18:53:40.501479
Duration: 1.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NUM
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 489
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 245
Minimum: 1
Maximum: 489
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 25.4
Q1: 123
Median: 245
Q3: 367
95-th percentile: 464.6
Maximum: 489
Range: 488
Interquartile range (IQR): 244

Descriptive statistics

Standard deviation: 141.3064
Coefficient of variation (CV): 0.57676084
Kurtosis: -1.2
Mean: 245
Median Absolute Deviation (MAD): 122
Skewness: 0
Sum: 119805
Variance: 19967.5
Monotonicity: Strictly increasing
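The quantile and descriptive statistics for NUM are consistent with a plain row counter. As a minimal sketch, assuming NUM holds exactly the integers 1 through 489 (suggested by the minimum, maximum, distinct count and strict monotonicity, but not stated outright) and assuming percentiles use linear interpolation, the figures can be recomputed with the standard library. The `percentile` helper is written for this illustration.

```python
import math
import statistics

# Assumption: NUM is exactly 1, 2, ..., 489.
values = list(range(1, 490))

def percentile(sorted_values, p):
    """Percentile p (0-100) with linear interpolation between neighbours."""
    position = (len(sorted_values) - 1) * p / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std_dev = math.sqrt(variance)
cv = std_dev / mean                     # coefficient of variation
mad = statistics.median([abs(v - median) for v in values])
iqr = percentile(values, 75) - percentile(values, 25)
strictly_increasing = all(b > a for a, b in zip(values, values[1:]))

print(total, mean, median, variance, round(std_dev, 4), mad, iqr)
print(percentile(values, 5), percentile(values, 95), strictly_increasing)
```

Under these assumptions the sample variance (dividing by n - 1) matches the reported 19967.5, which suggests the report uses the sample rather than the population formula.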
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
1                    1       0.2%
337                  1       0.2%
335                  1       0.2%
334                  1       0.2%
333                  1       0.2%
332                  1       0.2%
331                  1       0.2%
330                  1       0.2%
329                  1       0.2%
328                  1       0.2%
Other values (479)   479     98.0%

Value                Count   Frequency (%)
1                    1       0.2%
2                    1       0.2%
3                    1       0.2%
4                    1       0.2%
5                    1       0.2%
6                    1       0.2%
7                    1       0.2%
8                    1       0.2%
9                    1       0.2%
10                   1       0.2%

Value                Count   Frequency (%)
489                  1       0.2%
488                  1       0.2%
487                  1       0.2%
486                  1       0.2%
485                  1       0.2%
484                  1       0.2%
483                  1       0.2%
482                  1       0.2%
481                  1       0.2%
480                  1       0.2%

gpId
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

OVRY_OPRT_SOTOC      190
OVRY_OPRT            83
OVRY_HLTH            79
OVRY_SPR             33
OVRY_FLUP_CST_FLUP   24
Other values (9)     80

Length

Max length: 18
Median length: 17
Mean length: 11.903885
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OVRY_SUMMARY_PTIF
2nd row: OVRY_SUMMARY_PTIF
3rd row: OVRY_SUMMARY_PTIF
4th row: OVRY_SUMMARY_PTIF
5th row: OVRY_SUMMARY_PTIF

Common Values

Value                Count   Frequency (%)
OVRY_OPRT_SOTOC      190     38.9%
OVRY_OPRT            83      17.0%
OVRY_HLTH            79      16.2%
OVRY_SPR             33      6.7%
OVRY_FLUP_CST_FLUP   24      4.9%
OVRY_BX              17      3.5%
OVRY_SUMMARY_PTIF    16      3.3%
OVRY_RTX             10      2.0%
OVRY_CHMO            9       1.8%
OVRY_BRCA            9       1.8%
Other values (4)     19      3.9%

Length

Histogram of lengths of the category
Value                Count   Frequency (%)
ovry_oprt_sotoc      190     38.9%
ovry_oprt            83      17.0%
ovry_hlth            79      16.2%
ovry_spr             33      6.7%
ovry_flup_cst_flup   24      4.9%
ovry_bx              17      3.5%
ovry_summary_ptif    16      3.3%
ovry_rtx             10      2.0%
ovry_chmo            9       1.8%
ovry_brca            9       1.8%
Other values (4)     19      3.9%

gpNm
Categorical

HIGH CORRELATION 

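Distinct: 14
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

수술정보(SOTOC) [Surgery information (SOTOC)]   190
수술정보 [Surgery information]                  83
기타건강정보 [Other health information]         79
외과병리결과 [Surgical pathology results]       33
추적관찰 [Follow-up]                            24
Other values (9)                                80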

Length

Max length: 18
Median length: 17
Mean length: 7.7484663
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

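1st row: 기본/진단정보 [Basic/diagnosis information]
2nd row: 기본/진단정보 [Basic/diagnosis information]
3rd row: 기본/진단정보 [Basic/diagnosis information]
4th row: 기본/진단정보 [Basic/diagnosis information]
5th row: 기본/진단정보 [Basic/diagnosis information]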

Common Values

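Value                                           Count   Frequency (%)
수술정보(SOTOC) [Surgery information (SOTOC)]   190     38.9%
수술정보 [Surgery information]                  83      17.0%
기타건강정보 [Other health information]         79      16.2%
외과병리결과 [Surgical pathology results]       33      6.7%
추적관찰 [Follow-up]                            24      4.9%
Bx                                              17      3.5%
기본/진단정보 [Basic/diagnosis information]     16      3.3%
방사선치료 [Radiation therapy]                  10      2.0%
항암화학요법 [Chemotherapy]                     9       1.8%
BRCA 검사 [BRCA test]                           9       1.8%
Other values (4)                                19      3.9%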

Length

Histogram of lengths of the category
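Value                                         Count   Frequency (%)
수술정보(sotoc [Surgery information (SOTOC]   190     36.5%
수술정보 [Surgery information]                83      16.0%
기타건강정보 [Other health information]       79      15.2%
외과병리결과 [Surgical pathology results]     33      6.3%
추적관찰 [Follow-up]                          24      4.6%
bx                                            17      3.3%
기본/진단정보 [Basic/diagnosis information]   16      3.1%
방사선치료 [Radiation therapy]                10      1.9%
brca                                          9       1.7%
검사 [test]                                   9       1.7%
Other values (10)                             50      9.6%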

tblId
Text

Distinct: 66
Distinct (%): 13.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Length

Max length: 21
Median length: 20
Mean length: 16.257669
Min length: 10

Characters and Unicode

Total characters: 7950
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
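
The category names used in these tables correspond to Unicode general categories. As a rough illustration (not necessarily how the profiler computes them internally), Python's standard `unicodedata` module returns the two-letter category code for a character: `Lu` is Uppercase Letter, `Pc` is Connector Punctuation, `Nd` is Decimal Number. The two strings below are tblId values that appear in this report's sample rows; the `category_counts` helper is introduced only for this sketch.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two tblId values taken from the sample rows of this report.
samples = ["PT_OVRY_TRGT", "RG_OVRY_CNDX_V"]

counts = Counter()
for value in samples:
    counts.update(category_counts(value))

print(dict(counts))
# A few individual characters that occur elsewhere in the report.
for ch in ["_", "1", "|", "²", "난"]:
    print(repr(ch), unicodedata.category(ch))
```

The same lookup explains why `|` is counted as a Math Symbol (`Sm`), `²` as Other Number (`No`), and Hangul syllables as Other Letter (`Lo`) in the later variables.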

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: PT_OVRY_TRGT
2nd row: PT_OVRY_TRGT
3rd row: RG_OVRY_CNDX_V
4th row: RG_OVRY_CNDX_V
5th row: RG_OVRY_CNDX_V

Value                  Count   Frequency (%)
pe_ovry_oprt_4         50      10.2%
pe_ovry_spr_1          28      5.7%
pe_ovry_oprt_sotoc_4   23      4.7%
pe_ovry_oprt_sotoc_8   22      4.5%
pe_ovry_oprt_sotoc_5   22      4.5%
pe_ovry_oprt_1         18      3.7%
pe_ovry_oprt_sotoc_2   16      3.3%
mr_ovry_hlth_10        14      2.9%
pe_ovry_oprt_sotoc_3   14      2.9%
pe_ovry_rtx_v          10      2.0%
Other values (56)      272     55.6%

Most occurring characters

Value               Count   Frequency (%)
_                   1607    20.2%
O                   1145    14.4%
R                   905     11.4%
P                   728     9.2%
T                   600     7.5%
V                   520     6.5%
Y                   494     6.2%
E                   410     5.2%
S                   251     3.2%
C                   243     3.1%
Other values (22)   1047    13.2%

Most occurring categories

Value                   Count   Frequency (%)
Uppercase Letter        5875    73.9%
Connector Punctuation   1607    20.2%
Decimal Number          468     5.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 1145
19.5%
R 905
15.4%
P 728
12.4%
T 600
10.2%
V 520
8.9%
Y 494
8.4%
E 410
 
7.0%
S 251
 
4.3%
C 243
 
4.1%
H 167
 
2.8%
Other values (11) 412
 
7.0%
Decimal Number
ValueCountFrequency (%)
1 128
27.4%
4 92
19.7%
2 64
13.7%
3 39
 
8.3%
5 37
 
7.9%
8 35
 
7.5%
6 22
 
4.7%
0 22
 
4.7%
7 21
 
4.5%
9 8
 
1.7%
Connector Punctuation
ValueCountFrequency (%)
_ 1607
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    5875    73.9%
Common   2075    26.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
O 1145
19.5%
R 905
15.4%
P 728
12.4%
T 600
10.2%
V 520
8.9%
Y 494
8.4%
E 410
 
7.0%
S 251
 
4.3%
C 243
 
4.1%
H 167
 
2.8%
Other values (11) 412
 
7.0%
Common
ValueCountFrequency (%)
_ 1607
77.4%
1 128
 
6.2%
4 92
 
4.4%
2 64
 
3.1%
3 39
 
1.9%
5 37
 
1.8%
8 35
 
1.7%
6 22
 
1.1%
0 22
 
1.1%
7 21
 
1.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   7950    100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 1607
20.2%
O 1145
14.4%
R 905
11.4%
P 728
9.2%
T 600
 
7.5%
V 520
 
6.5%
Y 494
 
6.2%
E 410
 
5.2%
S 251
 
3.2%
C 243
 
3.1%
Other values (22) 1047
13.2%

tblNm
Text

Distinct: 65
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Length

Max length: 58
Median length: 26
Mean length: 9.2331288
Min length: 2

Characters and Unicode

Total characters: 4515
Distinct characters: 123
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.4%

Sample

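1st row: 기본정보 [Basic information]
2nd row: 기본정보 [Basic information]
3rd row: 진단정보 [Diagnosis information]
4th row: 진단정보 [Diagnosis information]
5th row: 진단정보 [Diagnosis information]

Value                            Count   Frequency (%)
ln                               64      7.5%
(blank)                          58      6.8%
name                             50      5.8%
op                               50      5.8%
postop                           33      3.9%
bx                               28      3.3%
ruq                              23      2.7%
pelvis                           22      2.6%
colon                            22      2.6%
수술정보 [Surgery information]   18      2.1%
Other values (81)                487     57.0%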

Most occurring characters

Value                Count   Frequency (%)
(space)              370     8.2%
a                    308     6.8%
e                    221     4.9%
t                    201     4.5%
o                    185     4.1%
i                    182     4.0%
l                    174     3.9%
r                    173     3.8%
L                    160     3.5%
P                    154     3.4%
Other values (113)   2387    52.9%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    2282    50.5%
Uppercase Letter    815     18.1%
Other Letter        731     16.2%
Space Separator     370     8.2%
Dash Punctuation    94      2.1%
Other Punctuation   61      1.4%
Close Punctuation   60      1.3%
Open Punctuation    60      1.3%
Decimal Number      42      0.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
72
 
9.8%
41
 
5.6%
38
 
5.2%
26
 
3.6%
26
 
3.6%
25
 
3.4%
24
 
3.3%
21
 
2.9%
21
 
2.9%
21
 
2.9%
Other values (61) 416
56.9%
Lowercase Letter
ValueCountFrequency (%)
a 308
13.5%
e 221
9.7%
t 201
8.8%
o 185
 
8.1%
i 182
 
8.0%
l 174
 
7.6%
r 173
 
7.6%
n 152
 
6.7%
s 121
 
5.3%
m 99
 
4.3%
Other values (13) 466
20.4%
Uppercase Letter
ValueCountFrequency (%)
L 160
19.6%
P 154
18.9%
N 110
13.5%
O 75
9.2%
C 50
 
6.1%
U 39
 
4.8%
Q 39
 
4.8%
B 37
 
4.5%
R 36
 
4.4%
S 32
 
3.9%
Other values (6) 83
10.2%
Decimal Number
ValueCountFrequency (%)
1 7
16.7%
5 6
14.3%
3 6
14.3%
6 6
14.3%
4 6
14.3%
8 4
9.5%
7 4
9.5%
2 3
7.1%
Space Separator
ValueCountFrequency (%)
370
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 94
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 61
100.0%
Close Punctuation
ValueCountFrequency (%)
) 60
100.0%
Open Punctuation
ValueCountFrequency (%)
( 60
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    3097    68.6%
Hangul   731     16.2%
Common   687     15.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
72
 
9.8%
41
 
5.6%
38
 
5.2%
26
 
3.6%
26
 
3.6%
25
 
3.4%
24
 
3.3%
21
 
2.9%
21
 
2.9%
21
 
2.9%
Other values (61) 416
56.9%
Latin
ValueCountFrequency (%)
a 308
 
9.9%
e 221
 
7.1%
t 201
 
6.5%
o 185
 
6.0%
i 182
 
5.9%
l 174
 
5.6%
r 173
 
5.6%
L 160
 
5.2%
P 154
 
5.0%
n 152
 
4.9%
Other values (29) 1187
38.3%
Common
ValueCountFrequency (%)
370
53.9%
- 94
 
13.7%
/ 61
 
8.9%
) 60
 
8.7%
( 60
 
8.7%
1 7
 
1.0%
5 6
 
0.9%
3 6
 
0.9%
6 6
 
0.9%
4 6
 
0.9%
Other values (3) 11
 
1.6%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    3784    83.8%
Hangul   731     16.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
370
 
9.8%
a 308
 
8.1%
e 221
 
5.8%
t 201
 
5.3%
o 185
 
4.9%
i 182
 
4.8%
l 174
 
4.6%
r 173
 
4.6%
L 160
 
4.2%
P 154
 
4.1%
Other values (42) 1656
43.8%
Hangul
ValueCountFrequency (%)
72
 
9.8%
41
 
5.6%
38
 
5.2%
26
 
3.6%
26
 
3.6%
25
 
3.4%
24
 
3.3%
21
 
2.9%
21
 
2.9%
21
 
2.9%
Other values (61) 416
56.9%

colId
Text

Distinct: 478
Distinct (%): 97.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Length

Max length: 27
Median length: 23
Mean length: 14.437628
Min length: 5

Characters and Unicode

Total characters: 7060
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 468
Unique (%): 95.7%

Sample

1st row: FRMD_YMD
2nd row: DIAG_AGE
3rd row: DIAG_YMD
4th row: DIAG_ENM
5th row: ETC_CNCR_YN

Value                Count   Frequency (%)
figo_stag            3       0.6%
ancd_ingr_nm         2       0.4%
clnc_m_stag          2       0.4%
clnc_n_stag          2       0.4%
ord_ymd              2       0.4%
ancd_nm              2       0.4%
oprt_ymd             2       0.4%
clnc_stag            2       0.4%
ancd_ord_seq         2       0.4%
clnc_t_stag          2       0.4%
Other values (468)   468     95.7%

Most occurring characters

Value               Count   Frequency (%)
_                   1267    17.9%
N                   706     10.0%
T                   567     8.0%
M                   470     6.7%
L                   432     6.1%
C                   421     6.0%
R                   376     5.3%
S                   359     5.1%
P                   295     4.2%
Y                   275     3.9%
Other values (25)   1892    26.8%

Most occurring categories

Value                   Count   Frequency (%)
Uppercase Letter        5735    81.2%
Connector Punctuation   1267    17.9%
Decimal Number          58      0.8%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 706
12.3%
T 567
 
9.9%
M 470
 
8.2%
L 432
 
7.5%
C 421
 
7.3%
R 376
 
6.6%
S 359
 
6.3%
P 295
 
5.1%
Y 275
 
4.8%
D 263
 
4.6%
Other values (16) 1571
27.4%
Decimal Number
ValueCountFrequency (%)
1 11
19.0%
4 9
15.5%
3 9
15.5%
5 8
13.8%
2 7
12.1%
6 6
10.3%
7 4
 
6.9%
8 4
 
6.9%
Connector Punctuation
ValueCountFrequency (%)
_ 1267
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    5735    81.2%
Common   1325    18.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 706
12.3%
T 567
 
9.9%
M 470
 
8.2%
L 432
 
7.5%
C 421
 
7.3%
R 376
 
6.6%
S 359
 
6.3%
P 295
 
5.1%
Y 275
 
4.8%
D 263
 
4.6%
Other values (16) 1571
27.4%
Common
ValueCountFrequency (%)
_ 1267
95.6%
1 11
 
0.8%
4 9
 
0.7%
3 9
 
0.7%
5 8
 
0.6%
2 7
 
0.5%
6 6
 
0.5%
7 4
 
0.3%
8 4
 
0.3%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   7060    100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 1267
17.9%
N 706
 
10.0%
T 567
 
8.0%
M 470
 
6.7%
L 432
 
6.1%
C 421
 
6.0%
R 376
 
5.3%
S 359
 
5.1%
P 295
 
4.2%
Y 275
 
3.9%
Other values (25) 1892
26.8%

colNm
Text

Distinct: 313
Distinct (%): 64.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

Length

Max length: 43
Median length: 34
Mean length: 11.740286
Min length: 1

Characters and Unicode

Total characters: 5741
Distinct characters: 191
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 254
Unique (%): 51.9%

Sample

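1st row: 초진일 [First visit date]
2nd row: 진단시나이 [Age at diagnosis]
3rd row: 진단일 [Diagnosis date]
4th row: 진단명 [Diagnosis name]
5th row: 난소암/기타암 구분 [Ovarian cancer / other cancer classification]

Value                Count   Frequency (%)
right                34      3.5%
left                 32      3.3%
lnd                  30      3.1%
size                 29      3.0%
op                   27      2.8%
of                   26      2.7%
lns                  24      2.5%
resection            16      1.7%
stage                15      1.6%
기타 [other]         15      1.6%
Other values (325)   718     74.3%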

Most occurring characters

Value                Count   Frequency (%)
(space)              478     8.3%
e                    469     8.2%
t                    383     6.7%
i                    338     5.9%
o                    334     5.8%
r                    275     4.8%
a                    275     4.8%
c                    209     3.6%
s                    175     3.0%
n                    166     2.9%
Other values (181)   2639    46.0%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    3707    64.6%
Uppercase Letter    739     12.9%
Other Letter        714     12.4%
Space Separator     478     8.3%
Close Punctuation   28      0.5%
Open Punctuation    28      0.5%
Decimal Number      17      0.3%
Other Punctuation   14      0.2%
Dash Punctuation    13      0.2%
Math Symbol         3       0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
30
 
4.2%
26
 
3.6%
23
 
3.2%
23
 
3.2%
20
 
2.8%
19
 
2.7%
17
 
2.4%
16
 
2.2%
14
 
2.0%
14
 
2.0%
Other values (122) 512
71.7%
Lowercase Letter
ValueCountFrequency (%)
e 469
12.7%
t 383
10.3%
i 338
 
9.1%
o 334
 
9.0%
r 275
 
7.4%
a 275
 
7.4%
c 209
 
5.6%
s 175
 
4.7%
n 166
 
4.5%
l 145
 
3.9%
Other values (14) 938
25.3%
Uppercase Letter
ValueCountFrequency (%)
L 114
15.4%
S 106
14.3%
P 89
12.0%
N 72
9.7%
R 58
7.8%
D 52
7.0%
O 50
6.8%
A 36
 
4.9%
C 30
 
4.1%
I 29
 
3.9%
Other values (12) 103
13.9%
Decimal Number
ValueCountFrequency (%)
2 4
23.5%
1 4
23.5%
5 3
17.6%
4 3
17.6%
3 3
17.6%
Other Punctuation
ValueCountFrequency (%)
/ 10
71.4%
, 2
 
14.3%
& 2
 
14.3%
Space Separator
ValueCountFrequency (%)
478
100.0%
Close Punctuation
ValueCountFrequency (%)
) 28
100.0%
Open Punctuation
ValueCountFrequency (%)
( 28
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 13
100.0%
Math Symbol
ValueCountFrequency (%)
+ 3
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    4446    77.4%
Hangul   714     12.4%
Common   581     10.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
30
 
4.2%
26
 
3.6%
23
 
3.2%
23
 
3.2%
20
 
2.8%
19
 
2.7%
17
 
2.4%
16
 
2.2%
14
 
2.0%
14
 
2.0%
Other values (122) 512
71.7%
Latin
ValueCountFrequency (%)
e 469
 
10.5%
t 383
 
8.6%
i 338
 
7.6%
o 334
 
7.5%
r 275
 
6.2%
a 275
 
6.2%
c 209
 
4.7%
s 175
 
3.9%
n 166
 
3.7%
l 145
 
3.3%
Other values (36) 1677
37.7%
Common
ValueCountFrequency (%)
478
82.3%
) 28
 
4.8%
( 28
 
4.8%
- 13
 
2.2%
/ 10
 
1.7%
2 4
 
0.7%
1 4
 
0.7%
+ 3
 
0.5%
5 3
 
0.5%
4 3
 
0.5%
Other values (3) 7
 
1.2%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    5027    87.6%
Hangul   714     12.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
478
 
9.5%
e 469
 
9.3%
t 383
 
7.6%
i 338
 
6.7%
o 334
 
6.6%
r 275
 
5.5%
a 275
 
5.5%
c 209
 
4.2%
s 175
 
3.5%
n 166
 
3.3%
Other values (49) 1925
38.3%
Hangul
ValueCountFrequency (%)
30
 
4.2%
26
 
3.6%
23
 
3.2%
23
 
3.2%
20
 
2.8%
19
 
2.7%
17
 
2.4%
16
 
2.2%
14
 
2.0%
14
 
2.0%
Other values (122) 512
71.7%

dataType
Categorical

IMBALANCE 

Distinct: 9
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB

String             433
Date               26
Integer            10
Float              9
DATE               4
Other values (4)   7

Length

Max length: 9
Median length: 6
Mean length: 5.8916155
Min length: 4

Unique

Unique: 3
Unique (%): 0.6%

Sample

1st row: <NA>
2nd row: Float()
3rd row: DATE
4th row: DATE
5th row: String

Common Values

Value       Count   Frequency (%)
String      433     88.5%
Date        26      5.3%
Integer     10      2.0%
Float       9       1.8%
DATE        4       0.8%
INTEGER     4       0.8%
<NA>        1       0.2%
Float()     1       0.2%
Float(51)   1       0.2%
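
The 75.7% imbalance figure in the Alerts section can be reproduced from the counts above. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for the same number of classes. That definition is an assumption made here because it matches the reported value; the report itself does not spell out the formula. The `imbalance_score` helper is written for this illustration.

```python
import math

# Value counts copied from the Common Values table for dataType.
counts = {
    "String": 433, "Date": 26, "Integer": 10, "Float": 9, "DATE": 4,
    "INTEGER": 4, "<NA>": 1, "Float()": 1, "Float(51)": 1,
}

def imbalance_score(class_counts):
    """Return 1 - H / H_max, where H is the Shannon entropy in bits.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    total = sum(class_counts)
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

Note that the nine classes include case and punctuation variants of the same type (Date/DATE, Integer/INTEGER, Float/Float()/Float(51)), so the score describes the raw labels rather than four or five underlying types.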

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count   Frequency (%)
string     433     88.5%
date       30      6.1%
integer    14      2.9%
float      10      2.0%
na         1       0.2%
float(51   1       0.2%

colDesc
Text

MISSING 

Distinct: 475
Distinct (%): 98.8%
Missing: 8
Missing (%): 1.6%
Memory size: 3.9 KiB

Length

Max length: 436
Median length: 39
Mean length: 23.328482
Min length: 5

Characters and Unicode

Total characters: 11221
Distinct characters: 283
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 469
Unique (%): 97.5%
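
For columns with missing values, the percentages and the mean length appear to use the number of non-missing rows as the denominator rather than all 489 observations. The following sketch checks that reading against the figures reported for colDesc and dispFormat. The denominator choice is inferred from the numbers, not documented in the report, and the `summarize` helper is hypothetical.

```python
n_observations = 489

# Figures copied from the colDesc and dispFormat sections of this report.
columns = {
    "colDesc": {"missing": 8, "distinct": 475, "unique": 469, "total_chars": 11221},
    "dispFormat": {"missing": 75, "distinct": 105, "unique": 81, "total_chars": 6795},
}

def summarize(stats):
    """Recompute percentages and mean length over non-missing rows only."""
    present = n_observations - stats["missing"]
    return {
        "present": present,
        "distinct_pct": round(100 * stats["distinct"] / present, 1),
        "unique_pct": round(100 * stats["unique"] / present, 1),
        "mean_length": round(stats["total_chars"] / present, 6),
    }

results = {name: summarize(stats) for name, stats in columns.items()}
for name, summary in results.items():
    print(name, summary)
```

Dividing by 489 instead would give 97.1% distinct for colDesc, which does not match the reported 98.8%.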

Sample

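1st row: 자궁암센터 외래 초진일 [First outpatient visit date at the uterine cancer center]
2nd row: 환자의 진단시 나이 [Patient's age at diagnosis]
3rd row: KCD가 C48 C56 C57 중 하나 이상인 최초 진단 등록일 [Registration date of the first diagnosis with a KCD code of C48, C56, or C57]
4th row: KCD가 C48 C56 C57 중 하나 이상인 모든 등록 진단 정보 (하위코드 포함) [All registered diagnosis information with a KCD code of C48, C56, or C57 (including subcodes)]
5th row: 난소암 or 기타암 구분 [Ovarian cancer or other cancer classification]

Value                        Count   Frequency (%)
여부 [whether (yes/no)]      171     8.0%
내용 [details]               85      4.0%
환자의 [patient's]           53      2.5%
(blank)                      52      2.4%
유무 [presence or absence]   43      2.0%
환자 [patient]               39      1.8%
right                        39      1.8%
left                         37      1.7%
lnd                          30      1.4%
size                         28      1.3%
Other values (480)           1569    73.1%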

Most occurring characters

Value                Count   Frequency (%)
(space)              1684    15.0%
e                    571     5.1%
t                    490     4.4%
i                    481     4.3%
a                    441     3.9%
r                    434     3.9%
o                    414     3.7%
c                    267     2.4%
l                    261     2.3%
L                    251     2.2%
Other values (273)   5927    52.8%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    5016    44.7%
Other Letter        2887    25.7%
Space Separator     1684    15.0%
Uppercase Letter    1217    10.8%
Dash Punctuation    103     0.9%
Decimal Number      90      0.8%
Open Punctuation    78      0.7%
Close Punctuation   78      0.7%
Other Punctuation   64      0.6%
Math Symbol         3       < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
199
 
6.9%
177
 
6.1%
133
 
4.6%
120
 
4.2%
103
 
3.6%
100
 
3.5%
100
 
3.5%
72
 
2.5%
64
 
2.2%
53
 
1.8%
Other values (208) 1766
61.2%
Lowercase Letter
ValueCountFrequency (%)
e 571
11.4%
t 490
9.8%
i 481
9.6%
a 441
 
8.8%
r 434
 
8.7%
o 414
 
8.3%
c 267
 
5.3%
l 261
 
5.2%
s 244
 
4.9%
n 234
 
4.7%
Other values (14) 1179
23.5%
Uppercase Letter
ValueCountFrequency (%)
L 251
20.6%
P 143
11.8%
N 140
11.5%
S 98
 
8.1%
R 92
 
7.6%
O 76
 
6.2%
C 69
 
5.7%
D 53
 
4.4%
A 52
 
4.3%
U 46
 
3.8%
Other values (12) 197
16.2%
Decimal Number
ValueCountFrequency (%)
4 17
18.9%
3 16
17.8%
5 13
14.4%
1 12
13.3%
2 11
12.2%
6 9
10.0%
8 6
 
6.7%
7 6
 
6.7%
Other Punctuation
ValueCountFrequency (%)
/ 59
92.2%
. 2
 
3.1%
& 2
 
3.1%
, 1
 
1.6%
Math Symbol
ValueCountFrequency (%)
= 2
66.7%
+ 1
33.3%
Space Separator
ValueCountFrequency (%)
1684
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 103
100.0%
Open Punctuation
ValueCountFrequency (%)
( 78
100.0%
Close Punctuation
ValueCountFrequency (%)
) 78
100.0%
Other Number
ValueCountFrequency (%)
² 1
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    6233    55.5%
Hangul   2887    25.7%
Common   2101    18.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
199
 
6.9%
177
 
6.1%
133
 
4.6%
120
 
4.2%
103
 
3.6%
100
 
3.5%
100
 
3.5%
72
 
2.5%
64
 
2.2%
53
 
1.8%
Other values (208) 1766
61.2%
Latin
ValueCountFrequency (%)
e 571
 
9.2%
t 490
 
7.9%
i 481
 
7.7%
a 441
 
7.1%
r 434
 
7.0%
o 414
 
6.6%
c 267
 
4.3%
l 261
 
4.2%
L 251
 
4.0%
s 244
 
3.9%
Other values (36) 2379
38.2%
Common
ValueCountFrequency (%)
1684
80.2%
- 103
 
4.9%
( 78
 
3.7%
) 78
 
3.7%
/ 59
 
2.8%
4 17
 
0.8%
3 16
 
0.8%
5 13
 
0.6%
1 12
 
0.6%
2 11
 
0.5%
Other values (9) 30
 
1.4%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    8333    74.3%
Hangul   2887    25.7%
None     1       < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1684
20.2%
e 571
 
6.9%
t 490
 
5.9%
i 481
 
5.8%
a 441
 
5.3%
r 434
 
5.2%
o 414
 
5.0%
c 267
 
3.2%
l 261
 
3.1%
L 251
 
3.0%
Other values (54) 3039
36.5%
Hangul
ValueCountFrequency (%)
199
 
6.9%
177
 
6.1%
133
 
4.6%
120
 
4.2%
103
 
3.6%
100
 
3.5%
100
 
3.5%
72
 
2.5%
64
 
2.2%
53
 
1.8%
Other values (208) 1766
61.2%
None
ValueCountFrequency (%)
² 1
100.0%

colCnt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 489
Missing (%): 100.0%
Memory size: 4.4 KiB

dispFormat
Text

MISSING 

Distinct: 105
Distinct (%): 25.4%
Missing: 75
Missing (%): 15.3%
Memory size: 3.9 KiB

Length

Max length: 328
Median length: 207
Mean length: 16.413043
Min length: 2

Characters and Unicode

Total characters: 6795
Distinct characters: 101
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 81
Unique (%): 19.6%

Sample

1st row: YYYY-MM-DD
2nd row: 숫자 [Number]
3rd row: YYYY-MM-DD
4th row: OVARIAN CANCER UNSPECIFIED SIDE
5th row: Y기타암 |N난소암 [Y other cancer | N ovarian cancer]

Value                Count   Frequency (%)
y                    176     11.5%
(blank)              140     9.2%
n                    112     7.3%
텍스트 [text]        74      4.8%
숫자 [number]        61      4.0%
no                   60      3.9%
yes                  60      3.9%
(blank)              45      2.9%
y유 [Y present]      44      2.9%
yyyy-mm-dd           30      2.0%
Other values (230)   727     47.5%

Most occurring characters

Value               Count   Frequency (%)
(space)             1265    18.6%
e                   376     5.5%
Y                   373     5.5%
o                   301     4.4%
t                   273     4.0%
,                   269     4.0%
N                   253     3.7%
i                   227     3.3%
a                   195     2.9%
s                   191     2.8%
Other values (91)   3072    45.2%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    2865    42.2%
Space Separator     1265    18.6%
Uppercase Letter    1165    17.1%
Other Letter        486     7.2%
Other Punctuation   396     5.8%
Decimal Number      299     4.4%
Math Symbol         200     2.9%
Dash Punctuation    77      1.1%
Close Punctuation   27      0.4%
Open Punctuation    15      0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
74
15.2%
74
15.2%
74
15.2%
62
12.8%
61
12.6%
46
9.5%
45
9.3%
6
 
1.2%
4
 
0.8%
3
 
0.6%
Other values (23) 37
7.6%
Lowercase Letter
ValueCountFrequency (%)
e 376
13.1%
o 301
10.5%
t 273
 
9.5%
i 227
 
7.9%
a 195
 
6.8%
s 191
 
6.7%
r 157
 
5.5%
c 147
 
5.1%
n 132
 
4.6%
m 125
 
4.4%
Other values (13) 741
25.9%
Uppercase Letter
ValueCountFrequency (%)
Y 373
32.0%
N 253
21.7%
L 140
 
12.0%
D 94
 
8.1%
M 58
 
5.0%
R 52
 
4.5%
S 44
 
3.8%
C 28
 
2.4%
A 20
 
1.7%
P 18
 
1.5%
Other values (13) 85
 
7.3%
Decimal Number
ValueCountFrequency (%)
1 71
23.7%
2 62
20.7%
3 44
14.7%
4 36
12.0%
0 25
 
8.4%
8 18
 
6.0%
6 13
 
4.3%
7 13
 
4.3%
5 9
 
3.0%
9 8
 
2.7%
Other Punctuation
ValueCountFrequency (%)
, 269
67.9%
: 54
 
13.6%
/ 47
 
11.9%
. 26
 
6.6%
Math Symbol
ValueCountFrequency (%)
| 188
94.0%
+ 6
 
3.0%
= 4
 
2.0%
> 2
 
1.0%
Space Separator
ValueCountFrequency (%)
1265
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 77
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27
100.0%
Open Punctuation
ValueCountFrequency (%)
( 15
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    4030    59.3%
Common   2279    33.5%
Hangul   486     7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 376
 
9.3%
Y 373
 
9.3%
o 301
 
7.5%
t 273
 
6.8%
N 253
 
6.3%
i 227
 
5.6%
a 195
 
4.8%
s 191
 
4.7%
r 157
 
3.9%
c 147
 
3.6%
Other values (36) 1537
38.1%
Hangul
ValueCountFrequency (%)
74
15.2%
74
15.2%
74
15.2%
62
12.8%
61
12.6%
46
9.5%
45
9.3%
6
 
1.2%
4
 
0.8%
3
 
0.6%
Other values (23) 37
7.6%
Common
ValueCountFrequency (%)
1265
55.5%
, 269
 
11.8%
| 188
 
8.2%
- 77
 
3.4%
1 71
 
3.1%
2 62
 
2.7%
: 54
 
2.4%
/ 47
 
2.1%
3 44
 
1.9%
4 36
 
1.6%
Other values (12) 166
 
7.3%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    6309    92.8%
Hangul   486     7.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1265
20.1%
e 376
 
6.0%
Y 373
 
5.9%
o 301
 
4.8%
t 273
 
4.3%
, 269
 
4.3%
N 253
 
4.0%
i 227
 
3.6%
a 195
 
3.1%
s 191
 
3.0%
Other values (58) 2586
41.0%
Hangul
ValueCountFrequency (%)
74
15.2%
74
15.2%
74
15.2%
62
12.8%
61
12.6%
46
9.5%
45
9.3%
6
 
1.2%
4
 
0.8%
3
 
0.6%
Other values (23) 37
7.6%

Interactions


Correlations

           NUM     gpId    gpNm    tblId   tblNm   dataType
NUM        1.000   0.891   0.891   0.992   0.993   0.353
gpId       0.891   1.000   1.000   1.000   1.000   0.630
gpNm       0.891   1.000   1.000   1.000   1.000   0.630
tblId      0.992   1.000   1.000   1.000   1.000   0.863
tblNm      0.993   1.000   1.000   1.000   1.000   0.864
dataType   0.353   0.630   0.630   0.863   0.864   1.000

           gpId    gpNm    dataType
gpId       1.000   1.000   0.337
gpNm       1.000   1.000   0.337
dataType   0.337   0.337   1.000

           NUM     gpId    gpNm    dataType
NUM        1.000   0.637   0.637   0.176
gpId       0.637   1.000   1.000   0.337
gpNm       0.637   1.000   1.000   0.337
dataType   0.176   0.337   0.337   1.000
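
The 1.000 entries between gpId and gpNm are what one would expect if every group ID maps to exactly one group name. The report does not say which association measure each matrix uses, so the sketch below is only an illustration with plain (uncorrected) Cramér's V, which may differ from the profiler's implementation. The first two ID/name pairs are taken from the sample rows of this report; the third pairing is inferred from the matching counts (83) and is an assumption.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(chi2 / (n * k))

pairs = (
    [("OVRY_SUMMARY_PTIF", "기본/진단정보")] * 16  # pairing seen in the sample rows
    + [("OVRY_HLTH", "기타건강정보")] * 79         # pairing seen in the sample rows
    + [("OVRY_OPRT", "수술정보")] * 83             # assumed from matching counts
)
group_ids = [gid for gid, _ in pairs]
group_names = [name for _, name in pairs]

v_one_to_one = cramers_v(group_ids, group_names)
# Break the mapping by rotating the names; the association drops below 1.
v_rotated = cramers_v(group_ids, group_names[40:] + group_names[:40])
print(round(v_one_to_one, 3), round(v_rotated, 3))
```

A perfect score between an identifier and its label column usually means one of the two is redundant for modelling purposes.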

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
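
As a rough sketch of what nullity correlation means, the example below converts each column into a 0/1 "is missing" indicator and computes the Pearson correlation between two indicators. The six rows are hypothetical and exist only to make the example runnable; they are not taken from the dataset. The `pearson` helper is written here for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows (None marks a missing value).
rows = [
    {"colDesc": "a", "colCnt": None, "dispFormat": "x"},
    {"colDesc": None, "colCnt": None, "dispFormat": None},
    {"colDesc": "b", "colCnt": None, "dispFormat": None},
    {"colDesc": "c", "colCnt": None, "dispFormat": "y"},
    {"colDesc": None, "colCnt": None, "dispFormat": None},
    {"colDesc": "d", "colCnt": None, "dispFormat": "z"},
]

is_null = {
    col: [int(row[col] is None) for row in rows]
    for col in ("colDesc", "colCnt", "dispFormat")
}

r_desc_format = pearson(is_null["colDesc"], is_null["dispFormat"])
r_desc_cnt = pearson(is_null["colDesc"], is_null["colCnt"])
print(r_desc_format, r_desc_cnt)
```

A column that is missing in every row, as colCnt is in this dataset, has a constant indicator and therefore no defined nullity correlation, which is why the helper returns `None` for it.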

Sample

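(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
0 | 1 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_TRGT | 기본정보 [Basic information] | FRMD_YMD | 초진일 [First visit date] | <NA> | 자궁암센터 외래 초진일 [First outpatient visit date at the uterine cancer center] | <NA> | YYYY-MM-DD
1 | 2 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_TRGT | 기본정보 [Basic information] | DIAG_AGE | 진단시나이 [Age at diagnosis] | Float() | 환자의 진단시 나이 [Patient's age at diagnosis] | <NA> | 숫자 [Number]
2 | 3 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | RG_OVRY_CNDX_V | 진단정보 [Diagnosis information] | DIAG_YMD | 진단일 [Diagnosis date] | DATE | KCD가 C48 C56 C57 중 하나 이상인 최초 진단 등록일 [Registration date of the first diagnosis with a KCD code of C48, C56, or C57] | <NA> | YYYY-MM-DD
3 | 4 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | RG_OVRY_CNDX_V | 진단정보 [Diagnosis information] | DIAG_ENM | 진단명 [Diagnosis name] | DATE | KCD가 C48 C56 C57 중 하나 이상인 모든 등록 진단 정보 (하위코드 포함) [All registered diagnosis information with a KCD code of C48, C56, or C57 (including subcodes)] | <NA> | OVARIAN CANCER UNSPECIFIED SIDE
4 | 5 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | RG_OVRY_CNDX_V | 진단정보 [Diagnosis information] | ETC_CNCR_YN | 난소암/기타암 구분 [Ovarian cancer / other cancer classification] | String | 난소암 or 기타암 구분 [Ovarian cancer or other cancer classification] | <NA> | Y기타암 |N난소암 [Y other cancer | N ovarian cancer]
5 | 6 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_BDMS | 신체계측 [Body measurements] | WT_MSRM_YMD | 체중 측정일 [Weight measurement date] | DATE | 첫번째 간호기록의 입원일(측정일) [Admission date (measurement date) in the first nursing record] | <NA> | YYYY-MM-DD
6 | 7 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_BDMS | 신체계측 [Body measurements] | PT_OVRY_BDMS_WT_VL | 체중(KG) [Weight (KG)] | Float(51) | 첫번째 간호기록의 입원일에 측정한 체중 측정 [Weight measured on the admission date of the first nursing record] | <NA> | 숫자 [Number]
7 | 8 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_BDMS | 신체계측 [Body measurements] | HT_MSRM_YMD | 신장 측정일 [Height measurement date] | DATE | 첫번째 간호기록의 입원일(측정일) [Admission date (measurement date) in the first nursing record] | <NA> | YYYY-MM-DD
8 | 9 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_BDMS | 신체계측 [Body measurements] | HT_VL | 신장(cm) [Height (cm)] | Float | 첫번째 간호기록의 입원일에 측정한 신장 측정 [Height measured on the admission date of the first nursing record] | <NA> | 숫자 [Number]
9 | 10 | OVRY_SUMMARY_PTIF | 기본/진단정보 [Basic/diagnosis information] | PT_OVRY_BDMS | 신체계측 [Body measurements] | BMI_VL | BMI | Float | 자동계산 = 환자의 체중/(환자의 신장)² [Auto-calculated = patient's weight / (patient's height)²] | <NA> | 숫자 [Number]

(index) | NUM | gpId | gpNm | tblId | tblNm | colId | colNm | dataType | colDesc | colCnt | dispFormat
479 | 480 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_INSM_YN | 불면 [Insomnia] | String | 환자의 과거 불면증 유무 [Whether the patient has a history of insomnia] | <NA> | Y유 | N 무 [Y present | N absent]
480 | 481 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_CADZ_YN | 심장질환 [Heart disease] | String | 환자의 과거 심장질환 유무 [Whether the patient has a history of heart disease] | <NA> | Y유 | N 무 [Y present | N absent]
481 | 482 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_ETC_YN | 기타 [Other] | String | 환자의 과거 기타 병력 유무 [Whether the patient has any other past medical history] | <NA> | Y유 | N 무 [Y present | N absent]
482 | 483 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_HTN_CMNT | 고혈압 상세내용 [Hypertension details] | String | 환자의 과거 고혈압 관련 기타내용 [Other notes on the patient's past hypertension] | <NA> | 텍스트 [Text]
483 | 484 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_DM_CMNT | 당뇨내용 [Diabetes details] | String | 환자의 과거 당뇨 관련 기타 내용 [Other notes on the patient's past diabetes] | <NA> | 텍스트 [Text]
484 | 485 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_CADZ_CMNT | 심장질환 상세내용 [Heart disease details] | String | 환자의 과거 심장질환 관련 기타 내용 [Other notes on the patient's past heart disease] | <NA> | 텍스트 [Text]
485 | 486 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_10 | 과거력 [Past history] | PHIS_ETC_CMNT | 기타 상세내용 [Other details] | String | 환자의 과거 기타 병력 내용 [Details of the patient's other past medical history] | <NA> | 텍스트 [Text]
486 | 487 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_11 | 증상/전원 정보 [Symptom/transfer information] | MAIN_SYMP_YN | 증상 [Symptoms] | String | 환자의 주 증상 유무 [Whether the patient has a main symptom] | <NA> | Y유 | N 무 [Y present | N absent]
487 | 488 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_11 | 증상/전원 정보 [Symptom/transfer information] | MAIN_SYMP_CMNT | 증상 상세내용 [Symptom details] | String | 환자의 증상 상세내용 [Details of the patient's symptoms] | <NA> | 텍스트 [Text]
488 | 489 | OVRY_HLTH | 기타건강정보 [Other health information] | MR_OVRY_HLTH_11 | 증상/전원 정보 [Symptom/transfer information] | OUTS_DIAG_TRANS_YN | 타 병원 진단 후 전원 [Transferred after diagnosis at another hospital] | String | 환자의 타 병원 진단 후 전원 여부 [Whether the patient was transferred after diagnosis at another hospital] | <NA> | Y유 | N 무 [Y present | N absent]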