Overview

Dataset statistics

Number of variables: 10
Number of observations: 104
Missing cells: 6
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.4 KiB
Average record size in memory: 82.3 B
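The two derived figures above follow from the raw counts. The missing-cell share is missing cells divided by rows × columns, and the average record size is total memory divided by rows. Below is a minimal sketch with the report's values typed in by hand. Because 8.4 KiB is itself rounded, the recomputed record size only approximates the reported 82.3 B.

```python
# Values copied from the "Dataset statistics" block above.
n_vars = 10
n_obs = 104
missing_cells = 6
total_kib = 8.4  # rounded in the report


def missing_cells_pct(missing: int, rows: int, cols: int) -> float:
    """Share of all cells (rows x cols) that are missing, in percent."""
    return 100.0 * missing / (rows * cols)


def avg_record_bytes(total_kib: float, rows: int) -> float:
    """Total in-memory size spread evenly over the rows, in bytes."""
    return total_kib * 1024 / rows


print(f"Missing cells (%): {missing_cells_pct(missing_cells, n_obs, n_vars):.1f}%")
print(f"Average record size: ~{avg_record_bytes(total_kib, n_obs):.1f} B")
```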

Variable types

Numeric: 1
Categorical: 5
Text: 1
DateTime: 1
Boolean: 2

Dataset

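Description: Sequence codes by cancer type for the information published by the National Cancer Information Center (국가암정보센터), provided by the National Cancer Center (국립암센터) as of October 7, 2022. Structure: cancer type > cancer type category > sequence code.
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15049621/fileData.do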

Alerts

암종분류(CANCER_PART) is highly overall correlated with 첨부파일명(FILE_ORG) and 3 other fields [High correlation]
첨부파일저장명(FILE_NM) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields [High correlation]
첨부파일경로(FILE_PATH) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields [High correlation]
암종연령대(CANCER_AGE) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields [High correlation]
삭제여부(DEL_YN) is highly overall correlated with 암종분류(CANCER_PART) and 3 other fields [High correlation]
사용여부(USE_YN) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields [High correlation]
첨부파일명(FILE_ORG) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields [High correlation]
순번(BOARD_SEQ) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields [High correlation]
암종연령대(CANCER_AGE) is highly imbalanced (70.4%) [Imbalance]
첨부파일명(FILE_ORG) is highly imbalanced (92.2%) [Imbalance]
첨부파일경로(FILE_PATH) is highly imbalanced (92.2%) [Imbalance]
첨부파일저장명(FILE_NM) is highly imbalanced (92.2%) [Imbalance]
사용여부(USE_YN) is highly imbalanced (81.0%) [Imbalance]
삭제여부(DEL_YN) is highly imbalanced (92.1%) [Imbalance]
순번(BOARD_SEQ) has 2 (1.9%) missing values [Missing]
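The imbalance percentages can be reproduced as one minus the normalized Shannon entropy of a column's value counts. That this is how ydata-profiling scores imbalance is an assumption here. Still, plugging in the counts from the per-variable sections (with missing values excluded for the Boolean columns) matches every imbalance alert above:

```python
import math


def imbalance(counts: list[int]) -> float:
    """1 - H/log(k): 0.0 for perfectly balanced counts, approaching 1.0 as one value dominates."""
    k = len(counts)
    if k < 2:
        return 1.0  # a single class is treated as fully imbalanced here (a choice, not from the report)
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Value counts copied from the per-variable sections (missing values excluded).
reported = {
    "암종연령대(CANCER_AGE)": [95, 8, 1],  # reported 70.4%
    "첨부파일명(FILE_ORG)": [103, 1],      # reported 92.2% (same for FILE_PATH and FILE_NM)
    "사용여부(USE_YN)": [100, 3],          # reported 81.0%
    "삭제여부(DEL_YN)": [102, 1],          # reported 92.1%
}
for name, counts in reported.items():
    print(f"{name}: {100 * imbalance(counts):.1f}%")
```

A score near 100% means almost every row holds the same value. This is why the three attachment columns, with 103 identical entries out of 104, sit at the top.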

Reproduction

Analysis started: 2023-12-12 21:29:42.143316
Analysis finished: 2023-12-12 21:29:43.260594
Duration: 1.12 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순번(BOARD_SEQ)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 102
Distinct (%): 100.0%
Missing: 2
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1382199.6
Minimum: 3293
Maximum: 8723941
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 3293
5th percentile: 3414.2
Q1: 3947
Median: 4601
Q3: 5231
95th percentile: 8665581.8
Maximum: 8723941
Range: 8720648
Interquartile range (IQR): 1284
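The IQR and the range follow directly from this table. Tukey's 1.5 × IQR rule is a standard convention that this report does not apply itself, but it gives a quick way to see how far the upper tail sits from the bulk of the values:

```python
# Quartiles and extremes copied from the quantile table above.
q1, q3 = 3947, 5231
minimum, maximum = 3293, 8723941

iqr = q3 - q1
value_range = maximum - minimum

# Tukey's fences: points more than 1.5 * IQR beyond the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest BOARD_SEQ values, as listed in this variable's extreme-values table.
largest = [8723941, 8723940, 8723939, 8723938, 8687522,
           8676000, 8467636, 7882456, 7880257, 7573253]
flagged = [v for v in largest if v > upper_fence]

print(iqr, value_range)          # 1284 8720648
print(lower_fence, upper_fence)  # 2021.0 7157.0
print(len(flagged))              # 10
```

Every one of the ten largest values lies far above the upper fence. This heavy right tail is what pushes the mean (1382199.6) so far above the median (4601).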

Descriptive statistics

Standard deviation: 2970715.1
Coefficient of variation (CV): 2.1492664
Kurtosis: 1.4072384
Mean: 1382199.6
Median absolute deviation (MAD): 648
Skewness: 1.798444
Sum: 1.4098436 × 10^8
Variance: 8.825148 × 10^12
Monotonicity: Strictly increasing
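The coefficient of variation, variance and sum are derived from the mean, standard deviation and count. Recomputing them from the rounded values shown here reproduces the report to the displayed precision, which makes a useful sanity check on the table:

```python
n = 102            # non-missing values: 104 rows minus 2 missing
mean = 1382199.6
std = 2970715.1    # as reported (rounded)

cv = std / mean      # coefficient of variation
variance = std ** 2  # squaring the reported standard deviation
total = mean * n     # sum of all non-missing values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.6e}")
print(f"Sum      = {total:.7e}")
```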
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
4949 | 1 | 1.0%
5213 | 1 | 1.0%
5189 | 1 | 1.0%
5141 | 1 | 1.0%
5117 | 1 | 1.0%
5093 | 1 | 1.0%
5069 | 1 | 1.0%
5045 | 1 | 1.0%
5021 | 1 | 1.0%
4997 | 1 | 1.0%
Other values (92) | 92 | 88.5%
(Missing) | 2 | 1.9%

Minimum 10 values
Value | Count | Frequency (%)
3293 | 1 | 1.0%
3317 | 1 | 1.0%
3341 | 1 | 1.0%
3365 | 1 | 1.0%
3389 | 1 | 1.0%
3413 | 1 | 1.0%
3437 | 1 | 1.0%
3461 | 1 | 1.0%
3485 | 1 | 1.0%
3509 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
8723941 | 1 | 1.0%
8723940 | 1 | 1.0%
8723939 | 1 | 1.0%
8723938 | 1 | 1.0%
8687522 | 1 | 1.0%
8676000 | 1 | 1.0%
8467636 | 1 | 1.0%
7882456 | 1 | 1.0%
7880257 | 1 | 1.0%
7573253 | 1 | 1.0%
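Monotonicity refers to row order, not to the sorted lists above. Below is a minimal check that uses the BOARD_SEQ values of the first and last ten rows from the Sample section at the end of this report. Skipping missing values is an assumption about how the report treats them.

```python
from typing import Iterable, Optional


def is_strictly_increasing(values: Iterable[Optional[int]]) -> bool:
    """True if the non-missing values strictly increase in the given order."""
    present = [v for v in values if v is not None]
    return all(a < b for a, b in zip(present, present[1:]))


first_rows = [3293, 3317, 3341, 3365, 3389, 3413, 3437, 3461, 3485, 3509]  # rows 0-9
last_rows = [7880257, 7882456, 8467636, 8676000, 8687522,
             8723938, 8723939, 8723940, 8723941, None]  # rows 94-103

print(is_strictly_increasing(first_rows + last_rows))  # True
# Consecutive early records are spaced a constant 24 apart.
print({b - a for a, b in zip(first_rows, first_rows[1:])})  # {24}
```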

암종분류(CANCER_PART)
Categorical

HIGH CORRELATION

Distinct: 23
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
뇌/척수: 10
두경부: 10
비뇨생식기: 9
부인과: 8
[label not recovered]: 8
Other values (18): 59

Length

Max length: 11
Median length: 5
Mean length: 3.9038462
Min length: 1

Unique

Unique (values occurring exactly once): 6
Unique (%): 5.8%

Sample

1st row: 간/담즙/췌장
2nd row: 간/담즙/췌장
3rd row: 내분비
4th row: 결장/직장/항문/소장
5th row: 비뇨생식기

Common Values

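Value | Count | Frequency (%)
뇌/척수 (brain/spinal cord) | 10 | 9.6%
두경부 (head and neck) | 10 | 9.6%
비뇨생식기 (genitourinary) | 9 | 8.7%
부인과 (gynecologic) | 8 | 7.7%
[label not recovered] | 8 | 7.7%
간/담즙/췌장 (liver/bile/pancreas) | 7 | 6.7%
결장/직장/항문/소장 (colon/rectum/anus/small intestine) | 7 | 6.7%
골수및혈액 (bone marrow and blood) | 7 | 6.7%
근골격 (musculoskeletal) | 6 | 5.8%
식도/위 (esophagus/stomach) | 5 | 4.8%
Other values (13) | 27 | 26.0%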
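Each frequency is the count divided by all 104 rows, and the "Other values" row absorbs every category outside the ten listed. The arithmetic can be sketched with the counts from the table above:

```python
total_rows = 104
distinct = 23
top10 = [10, 10, 9, 8, 8, 7, 7, 7, 6, 5]  # counts of the ten listed categories

other_count = total_rows - sum(top10)   # rows not covered by the top ten
other_distinct = distinct - len(top10)  # categories folded into "Other values"
freqs = [round(100 * c / total_rows, 1) for c in top10]
other_pct = round(100 * other_count / total_rows, 1)
distinct_pct = round(100 * distinct / total_rows, 1)

print(other_count, other_distinct, other_pct)  # 27 13 26.0
print(distinct_pct)                            # 22.1
```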

Length

[Figure: histogram of lengths of the category]

암종명(CANCER_NAME)
Text

Distinct: 103
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.0%
Memory size: 964.0 B

Length

Max length: 17
Median length: 15
Mean length: 4.9902913
Min length: 2

Characters and Unicode

Total characters: 514
Distinct characters: 145
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
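The general categories listed for this column are standard Unicode properties, so Python's `unicodedata` module can reproduce them for any single value. The mapping from two-letter codes to long names below covers only the eight categories seen in this report. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator",
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Po": "Other Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category, using the long name where known."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


# One value of this column (the longest one, 17 characters, is a close variant).
print(category_counts("구인두-편도암(HPV 관련)"))
```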

Unique

Unique: 103
Unique (%): 100.0%

Sample

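1st row: 간내 담도암 (intrahepatic bile duct cancer)
2nd row: 간암 (liver cancer)
3rd row: 갑상선암 (thyroid cancer)
4th row: 결장암 (colon cancer)
5th row: 고환암 (testicular cancer)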
ValueCountFrequency (%)
소아청소년 3
 
2.6%
뇌종양 3
 
2.6%
담도암 2
 
1.7%
육종 2
 
1.7%
전이성 2
 
1.7%
직장 1
 
0.9%
폐암 1
 
0.9%
폐선암 1
 
0.9%
편평상피세포암 1
 
0.9%
침샘암 1
 
0.9%
Other values (98) 98
85.2%
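The exact tokenizer behind the word counts is not shown in the report. One rule consistent with the tables is to lowercase each value and keep runs of word characters; this also reproduces the lowercase `n` and `na` entries reported for the values `\N` and `<NA>` in other columns. Treat it as an assumption rather than ydata-profiling's implementation.

```python
import re
from collections import Counter


def words(value: str) -> list[str]:
    """Lowercase the value and keep runs of word characters (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", value.lower())


# A few values taken from this report, including the two placeholder strings.
names = ["간내 담도암", "구인두-편도암(HPV 관련)", r"\N", "<NA>"]
counts = Counter(w for name in names for w in words(name))
print(counts)
```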

Most occurring characters

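Value | Count | Frequency (%)
[not recovered] | 58 | 11.3%
[not recovered] | 38 | 7.4%
[not recovered] | 19 | 3.7%
[not recovered] | 13 | 2.5%
[not recovered] | 12 | 2.3%
[not recovered] | 12 | 2.3%
[not recovered] | 12 | 2.3%
[not recovered] | 11 | 2.1%
[not recovered] | 10 | 1.9%
[not recovered] | 9 | 1.8%
Other values (135) | 320 | 62.3%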

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 484 | 94.2%
Space Separator | 12 | 2.3%
Uppercase Letter | 7 | 1.4%
Lowercase Letter | 4 | 0.8%
Dash Punctuation | 2 | 0.4%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%
Other Punctuation | 1 | 0.2%

Most frequent character per category

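Other Letter
Value | Count | Frequency (%)
[not recovered] | 58 | 12.0%
[not recovered] | 38 | 7.9%
[not recovered] | 19 | 3.9%
[not recovered] | 13 | 2.7%
[not recovered] | 12 | 2.5%
[not recovered] | 12 | 2.5%
[not recovered] | 11 | 2.3%
[not recovered] | 10 | 2.1%
[not recovered] | 9 | 1.9%
[not recovered] | 8 | 1.7%
Other values (123) | 294 | 60.7%

Uppercase Letter
Value | Count | Frequency (%)
V | 2 | 28.6%
P | 2 | 28.6%
H | 2 | 28.6%
B | 1 | 14.3%

Lowercase Letter
Value | Count | Frequency (%)
t | 2 | 50.0%
e | 1 | 25.0%
s | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
' ' (space) | 12 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
· | 1 | 100.0%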

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 484 | 94.2%
Common | 19 | 3.7%
Latin | 11 | 2.1%

Most frequent character per script

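Hangul
Value | Count | Frequency (%)
[not recovered] | 58 | 12.0%
[not recovered] | 38 | 7.9%
[not recovered] | 19 | 3.9%
[not recovered] | 13 | 2.7%
[not recovered] | 12 | 2.5%
[not recovered] | 12 | 2.5%
[not recovered] | 11 | 2.3%
[not recovered] | 10 | 2.1%
[not recovered] | 9 | 1.9%
[not recovered] | 8 | 1.7%
Other values (123) | 294 | 60.7%

Latin
Value | Count | Frequency (%)
V | 2 | 18.2%
P | 2 | 18.2%
H | 2 | 18.2%
t | 2 | 18.2%
B | 1 | 9.1%
e | 1 | 9.1%
s | 1 | 9.1%

Common
Value | Count | Frequency (%)
' ' (space) | 12 | 63.2%
- | 2 | 10.5%
( | 2 | 10.5%
) | 2 | 10.5%
· | 1 | 5.3%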

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 484 | 94.2%
ASCII | 29 | 5.6%
None | 1 | 0.2%
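The middle dot `·` (U+00B7) belongs to the Latin-1 Supplement block, outside ASCII, which presumably explains why it is the single character reported under "None". A coarse block classifier, simplified here to three buckets (the Hangul range covers only precomposed Hangul syllables), shows the split for one value of this column:

```python
from collections import Counter


def block_of(ch: str) -> str:
    """Coarse block label: ASCII, Hangul (syllables U+AC00 to U+D7A3), or Other."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7A3:
        return "Hangul"
    return "Other"


# 담낭·담도암 is one of the cancer names in this column.
print(Counter(block_of(ch) for ch in "담낭·담도암"))
print(hex(ord("·")))  # 0xb7
```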

Most frequent character per block

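Hangul
Value | Count | Frequency (%)
[not recovered] | 58 | 12.0%
[not recovered] | 38 | 7.9%
[not recovered] | 19 | 3.9%
[not recovered] | 13 | 2.7%
[not recovered] | 12 | 2.5%
[not recovered] | 12 | 2.5%
[not recovered] | 11 | 2.3%
[not recovered] | 10 | 2.1%
[not recovered] | 9 | 1.9%
[not recovered] | 8 | 1.7%
Other values (123) | 294 | 60.7%

ASCII
Value | Count | Frequency (%)
' ' (space) | 12 | 41.4%
V | 2 | 6.9%
- | 2 | 6.9%
( | 2 | 6.9%
P | 2 | 6.9%
H | 2 | 6.9%
t | 2 | 6.9%
) | 2 | 6.9%
B | 1 | 3.4%
e | 1 | 3.4%

None
Value | Count | Frequency (%)
· | 1 | 100.0%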

암종연령대(CANCER_AGE)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
성인 (adult): 95
소아 (pediatric): 8
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0192308
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 성인
2nd row: 성인
3rd row: 성인
4th row: 성인
5th row: 성인

Common Values

Value | Count | Frequency (%)
성인 | 95 | 91.3%
소아 | 8 | 7.7%
<NA> | 1 | 1.0%

Length

[Figure: histogram of lengths of the category]

[Figure: common values (plot)]

Words
Value | Count | Frequency (%)
성인 | 95 | 91.3%
소아 | 8 | 7.7%
na | 1 | 1.0%

작성일(REG_DATE)
DateTime

Distinct: 103
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.0%
Memory size: 964.0 B
Minimum: 2012-08-23 15:45:53
Maximum: 2019-08-06 12:04:14
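The reported minimum and maximum bound the registration dates. The span between them can be computed with the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # matches the timestamps shown in this report
first = datetime.strptime("2012-08-23 15:45:53", FMT)
last = datetime.strptime("2019-08-06 12:04:14", FMT)

span = last - first
print(span)  # 2538 days, 20:18:21
```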
[Figure: histogram with fixed-size bins (bins=50)]

첨부파일명(FILE_ORG)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
\N: 103
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0192308
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: \N
2nd row: \N
3rd row: \N
4th row: \N
5th row: \N

Common Values

Value | Count | Frequency (%)
\N | 103 | 99.0%
<NA> | 1 | 1.0%
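This column reports Missing 0 because the literal strings `\N` and `<NA>` are counted as ordinary categories. `\N` is also the marker MySQL writes for NULL in text exports, so in practice the column is empty. A minimal sketch that maps such sentinels to `None` while reading CSV text follows. The sentinel set and the two-row sample are hypothetical and are not taken from the published file.

```python
import csv
import io

# Strings treated as missing: an assumption based on the values seen in this report.
NULL_SENTINELS = {r"\N", "<NA>", ""}


def read_rows(text: str) -> list[dict]:
    """Parse CSV text, replacing sentinel strings with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (None if v in NULL_SENTINELS else v) for k, v in row.items()}
            for row in reader]


# Hypothetical excerpt shaped like the Sample section (column names shortened).
sample = "BOARD_SEQ,FILE_ORG,USE_YN\n3293,\\N,Y\n,<NA>,\n"
rows = read_rows(sample)
missing_file_org = sum(r["FILE_ORG"] is None for r in rows)
print(rows)
print(missing_file_org)  # 2
```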

Length

[Figure: histogram of lengths of the category]

[Figure: common values (plot)]

Words
Value | Count | Frequency (%)
n | 103 | 99.0%
na | 1 | 1.0%

첨부파일경로(FILE_PATH)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
\N: 103
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0192308
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: \N
2nd row: \N
3rd row: \N
4th row: \N
5th row: \N

Common Values

Value | Count | Frequency (%)
\N | 103 | 99.0%
<NA> | 1 | 1.0%

Length

[Figure: histogram of lengths of the category]

[Figure: common values (plot)]

Words
Value | Count | Frequency (%)
n | 103 | 99.0%
na | 1 | 1.0%

첨부파일저장명(FILE_NM)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
\N: 103
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0192308
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: \N
2nd row: \N
3rd row: \N
4th row: \N
5th row: \N

Common Values

Value | Count | Frequency (%)
\N | 103 | 99.0%
<NA> | 1 | 1.0%

Length

[Figure: histogram of lengths of the category]

[Figure: common values (plot)]

Words
Value | Count | Frequency (%)
n | 103 | 99.0%
na | 1 | 1.0%

사용여부(USE_YN)
Boolean

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 1
Missing (%): 1.0%
Memory size: 340.0 B
True: 100
False: 3
(Missing): 1

Value | Count | Frequency (%)
True | 100 | 96.2%
False | 3 | 2.9%
(Missing) | 1 | 1.0%
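The Boolean columns come from Y/N flags (see the Sample section), with the empty last row counted as missing. The sketch below rebuilds a synthetic column from the reported counts, not from the file, to show how the percentages are formed. The parsing rule is an assumption.

```python
from collections import Counter
from typing import Optional


def parse_yn(value: Optional[str]) -> Optional[bool]:
    """Map 'Y' to True and 'N' to False; anything else, including None, to None (missing)."""
    return {"Y": True, "N": False}.get(value)


# Synthetic 사용여부(USE_YN) column rebuilt from the reported counts.
raw = ["Y"] * 100 + ["N"] * 3 + [None]
counts = Counter(parse_yn(v) for v in raw)
pct = {k: round(100 * c / len(raw), 1) for k, c in counts.items()}
print(counts)
print(pct)
```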

삭제여부(DEL_YN)
Boolean

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 1
Missing (%): 1.0%
Memory size: 340.0 B
False: 102
True: 1
(Missing): 1

Value | Count | Frequency (%)
False | 102 | 98.1%
True | 1 | 1.0%
(Missing) | 1 | 1.0%

Interactions


Correlations

Variable | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종연령대(CANCER_AGE) | 사용여부(USE_YN) | 삭제여부(DEL_YN)
순번(BOARD_SEQ) | 1.000 | 0.522 | 0.000 | 0.302 | 0.209
암종분류(CANCER_PART) | 0.522 | 1.000 | 0.513 | 0.000 | 1.000
암종연령대(CANCER_AGE) | 0.000 | 0.513 | 1.000 | 0.000 | 0.000
사용여부(USE_YN) | 0.302 | 0.000 | 0.000 | 1.000 | 0.000
삭제여부(DEL_YN) | 0.209 | 1.000 | 0.000 | 0.000 | 1.000

Variable | 암종분류(CANCER_PART) | 첨부파일저장명(FILE_NM) | 첨부파일경로(FILE_PATH) | 암종연령대(CANCER_AGE) | 삭제여부(DEL_YN) | 사용여부(USE_YN) | 첨부파일명(FILE_ORG)
암종분류(CANCER_PART) | 1.000 | 1.000 | 1.000 | 0.363 | 0.896 | 0.000 | 1.000
첨부파일저장명(FILE_NM) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
첨부파일경로(FILE_PATH) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
암종연령대(CANCER_AGE) | 0.363 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000
삭제여부(DEL_YN) | 0.896 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000
사용여부(USE_YN) | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000
첨부파일명(FILE_ORG) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종연령대(CANCER_AGE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN)
순번(BOARD_SEQ) | 1.000 | 0.263 | 0.000 | 1.000 | 1.000 | 1.000 | 0.363 | 0.252
암종분류(CANCER_PART) | 0.263 | 1.000 | 0.363 | 1.000 | 1.000 | 1.000 | 0.000 | 0.896
암종연령대(CANCER_AGE) | 0.000 | 0.363 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
첨부파일명(FILE_ORG) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
첨부파일경로(FILE_PATH) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
첨부파일저장명(FILE_NM) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
사용여부(USE_YN) | 0.363 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000
삭제여부(DEL_YN) | 0.252 | 0.896 | 0.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000
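The report does not name the coefficient behind each matrix. For pairs of categorical columns, a common choice is Cramér's V, which equals 1.0 whenever one column determines the other. The plain-Python sketch below computes it without bias correction (an assumption). It illustrates why a near-constant attachment column can score 1.000 against 암종연령대(CANCER_AGE): the only non-`\N` value in each attachment column, `<NA>`, falls on the same row as the `<NA>` of 암종연령대(CANCER_AGE), namely row 103 in the Sample section.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table (no bias correction)."""
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Rows: 첨부파일명(FILE_ORG) = \N, <NA>. Columns: 암종연령대(CANCER_AGE) = 성인, 소아, <NA>.
file_org_vs_age = [[95, 8, 0],
                   [0, 0, 1]]
print(round(cramers_v(file_org_vs_age), 3))  # 1.0
```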

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion]
[Figure: correlation heatmap of nullity, measuring how strongly the presence or absence of one variable affects the presence of another]
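Nullity correlation is typically computed as the ordinary Pearson correlation between two columns' missing-value indicators (1 if missing, 0 otherwise). That the heatmap follows this convention, as in the missingno library, is an assumption. The columns below are hypothetical.

```python
import math


def nullity_correlation(a: list, b: list) -> float:
    """Pearson correlation between the missing-value indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        # A column that is never (or always) missing has no defined nullity correlation.
        return float("nan")
    return cov / (sx * sy)


# Hypothetical columns: identical gaps correlate at 1.0, complementary gaps at -1.0.
seq = [3293, None, 3341, None]
use_yn = ["Y", None, "N", None]
other = [None, "x", None, "y"]
print(nullity_correlation(seq, use_yn))  # 1.0
print(nullity_correlation(seq, other))   # -1.0
```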

Sample

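First rows
Row | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종명(CANCER_NAME) | 암종연령대(CANCER_AGE) | 작성일(REG_DATE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN)
0 | 3293 | 간/담즙/췌장 | 간내 담도암 | 성인 | 2012-08-23 15:45:53 | \N | \N | \N | Y | N
1 | 3317 | 간/담즙/췌장 | 간암 | 성인 | 2012-08-23 15:46:21 | \N | \N | \N | Y | N
2 | 3341 | 내분비 | 갑상선암 | 성인 | 2012-08-23 15:46:39 | \N | \N | \N | Y | N
3 | 3365 | 결장/직장/항문/소장 | 결장암 | 성인 | 2012-08-23 15:47:04 | \N | \N | \N | Y | N
4 | 3389 | 비뇨생식기 | 고환암 | 성인 | 2012-08-23 15:47:56 | \N | \N | \N | Y | N
5 | 3413 | 골수및혈액 | 골수이형성증후군 | 성인 | 2012-08-23 15:48:43 | \N | \N | \N | Y | N
6 | 3437 | 뇌/척수 | 교모세포종 | 성인 | 2012-08-23 15:49:35 | \N | \N | \N | Y | N
7 | 3461 | 두경부 | 구강암 | 성인 | 2012-08-23 15:50:04 | \N | \N | \N | Y | N
8 | 3485 | 피부 | 균상식육종 | 성인 | 2012-08-23 15:51:01 | \N | \N | \N | Y | N
9 | 3509 | 골수및혈액 | 급성골수성백혈병 | 성인 | 2012-08-23 15:51:29 | \N | \N | \N | Y | N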
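
Last rows
Row | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종명(CANCER_NAME) | 암종연령대(CANCER_AGE) | 작성일(REG_DATE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN)
94 | 7880257 | 비뇨생식기 | 신우암 | 성인 | 2014-11-12 09:48:48 | \N | \N | \N | Y | N
95 | 7882456 | 비뇨생식기 | 요관암 | 성인 | 2014-11-12 12:49:46 | \N | \N | \N | Y | N
96 | 8467636 | 심장종격동암 (암종분류 and 암종명 fused; split not recoverable) | 성인 | 2014-12-24 10:51:31 | \N | \N | \N | Y | N
97 | 8676000 | 간모세포종 (암종분류 and 암종명 fused; split not recoverable) | 소아 | 2015-01-12 14:39:23 | \N | \N | \N | Y | N
98 | 8687522 | 근골격 | 횡문근육종 | 소아 | 2015-01-13 10:10:50 | \N | \N | \N | Y | N
99 | 8723938 | 간/담즙/췌장 | 담낭·담도암 | 성인 | 2015-01-15 15:27:44 | \N | \N | \N | Y | N
100 | 8723939 | 두경부 | 구인두-편도암(HPV 관련) | 성인 | 2019-08-01 09:06:11 | \N | \N | \N | Y | N
101 | 8723940 | 두경부 | 구인두-하인두암(HPV 비관련) | 성인 | 2019-08-01 14:51:08 | \N | \N | \N | Y | N
102 | 8723941 | test | test | 성인 | 2019-08-06 12:04:14 | \N | \N | \N | Y | Y
103 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>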