Dataset statistics
Number of variables | 10 |
---|---|
Number of observations | 104 |
Missing cells | 6 |
Missing cells (%) | 0.6% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 8.4 KiB |
Average record size in memory | 82.3 B |
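The missing-cell figures above can be cross-checked against the per-variable "Missing" counts reported in the variable sections further down. The following is a minimal Python sketch, not part of the original report; the dictionary transcribes those counts, and the variable names in the code are illustrative.

```python
# Per-variable missing counts, transcribed from the variable sections of this report.
missing_per_variable = {
    "순번(BOARD_SEQ)": 2,
    "암종명(CANCER_NAME)": 1,
    "작성일(REG_DATE)": 1,
    "사용여부(USE_YN)": 1,
    "삭제여부(DEL_YN)": 1,
}
n_observations = 104
n_variables = 10

# Missing cells are summed over variables; the percentage is taken over all cells.
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(missing_cells, f"{missing_pct:.1f}%")  # prints: 6 0.6%
```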
Variable types
Numeric | 1 |
---|---|
Categorical | 5 |
Text | 1 |
DateTime | 1 |
Boolean | 2 |
Dataset
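Description | Sequence codes by cancer type from the National Cancer Information Center (국가암정보센터), provided by the National Cancer Center (국립암센터), current as of October 7, 2022. Hierarchy: cancer type > cancer type category > sequence code. |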
---|---|
Author | National Cancer Center (국립암센터) |
URL | https://www.data.go.kr/data/15049621/fileData.do |
암종분류(CANCER_PART) is highly overall correlated with 첨부파일명(FILE_ORG) and 3 other fields | High correlation |
첨부파일저장명(FILE_NM) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields | High correlation |
첨부파일경로(FILE_PATH) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields | High correlation |
암종연령대(CANCER_AGE) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields | High correlation |
삭제여부(DEL_YN) is highly overall correlated with 암종분류(CANCER_PART) and 3 other fields | High correlation |
사용여부(USE_YN) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields | High correlation |
첨부파일명(FILE_ORG) is highly overall correlated with 순번(BOARD_SEQ) and 6 other fields | High correlation |
순번(BOARD_SEQ) is highly overall correlated with 첨부파일명(FILE_ORG) and 2 other fields | High correlation |
암종연령대(CANCER_AGE) is highly imbalanced (70.4%) | Imbalance |
첨부파일명(FILE_ORG) is highly imbalanced (92.2%) | Imbalance |
첨부파일경로(FILE_PATH) is highly imbalanced (92.2%) | Imbalance |
첨부파일저장명(FILE_NM) is highly imbalanced (92.2%) | Imbalance |
사용여부(USE_YN) is highly imbalanced (81.0%) | Imbalance |
삭제여부(DEL_YN) is highly imbalanced (92.1%) | Imbalance |
순번(BOARD_SEQ) has 2 (1.9%) missing values | Missing |
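The imbalance percentages in the alerts above are consistent with one minus the normalized Shannon entropy of each variable's category counts. The sketch below illustrates that relationship under this assumption; it is not a statement of how ydata-profiling is implemented. `imbalance_score` is a hypothetical helper, and the counts are taken from the Common Values tables in this report (missing values excluded for the Boolean variables).

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (bits) of the counts
    and k is the number of classes. 0 means perfectly balanced, 1 means one class only."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention: a single class cannot be normalised
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Category counts transcribed from this report.
category_counts = {
    "암종연령대(CANCER_AGE)": [95, 8, 1],
    "첨부파일명(FILE_ORG)": [103, 1],
    "사용여부(USE_YN)": [100, 3],
    "삭제여부(DEL_YN)": [102, 1],
}

for name, counts in category_counts.items():
    print(name, f"{100 * imbalance_score(counts):.1f}%")
```

Under this assumption the scores come out close to the reported 70.4%, 92.2%, 81.0%, and 92.1%.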
Reproduction
Analysis started | 2023-12-12 21:29:42.143316 |
---|---|
Analysis finished | 2023-12-12 21:29:43.260594 |
Duration | 1.12 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
순번(BOARD_SEQ)
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 102 |
---|---|
Distinct (%) | 100.0% |
Missing | 2 |
Missing (%) | 1.9% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1382199.6 |
Minimum | 3293 |
---|---|
Maximum | 8723941 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 3293 |
---|---|
5-th percentile | 3414.2 |
Q1 | 3947 |
median | 4601 |
Q3 | 5231 |
95-th percentile | 8665581.8 |
Maximum | 8723941 |
Range | 8720648 |
Interquartile range (IQR) | 1284 |
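The 5-th and 95-th percentiles above are consistent with linear interpolation between neighbouring order statistics of the 102 non-missing values. The sketch below illustrates that calculation using only the ten smallest and ten largest values listed in this report. `interpolated_quantile` is a hypothetical helper, and the interpolation rule is an assumption (it is a common default in numerical libraries), not something the report states.

```python
import math

def interpolated_quantile(q, n, order_stats):
    """Quantile by linear interpolation between order statistics.

    order_stats maps a 0-based rank in the sorted data to its value; only the
    two ranks surrounding position q * (n - 1) need to be present.
    """
    pos = q * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return order_stats[lo] + frac * (order_stats[hi] - order_stats[lo])

n = 102  # non-missing values of 순번(BOARD_SEQ)
lowest = [3293, 3317, 3341, 3365, 3389, 3413, 3437, 3461, 3485, 3509]
highest = [7573253, 7880257, 7882456, 8467636, 8676000,
           8687522, 8723938, 8723939, 8723940, 8723941]

# Ranks 0..9 hold the smallest values, ranks 92..101 the largest.
order_stats = {rank: value for rank, value in enumerate(lowest)}
order_stats.update({n - len(highest) + i: value for i, value in enumerate(highest)})

p05 = interpolated_quantile(0.05, n, order_stats)  # between ranks 5 and 6
p95 = interpolated_quantile(0.95, n, order_stats)  # between ranks 95 and 96
print(round(p05, 1), round(p95, 1))
```

The quartiles and the median cannot be checked the same way because the middle of the distribution is not listed in the report.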
Descriptive statistics
Standard deviation | 2970715.1 |
---|---|
Coefficient of variation (CV) | 2.1492664 |
Kurtosis | 1.4072384 |
Mean | 1382199.6 |
Median Absolute Deviation (MAD) | 648 |
Skewness | 1.798444 |
Sum | 1.4098436 × 10^8 |
Variance | 8.825148 × 10^12 |
Monotonicity | Strictly increasing |
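Several of the figures above are derived from one another, which allows a quick consistency check. The following Python sketch uses only numbers printed in this report (102 non-missing values); the variable names are illustrative.

```python
# Values transcribed from the statistics tables for 순번(BOARD_SEQ).
n = 102                      # non-missing observations
mean = 1382199.6
std = 2970715.1
minimum, maximum = 3293, 8723941
q1, q3 = 3947, 5231

value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range = Q3 - Q1
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance = standard deviation squared
total = mean * n                  # sum = mean * count

print(value_range, iqr, round(cv, 4), f"{variance:.4e}", f"{total:.4e}")
```

The mean lies far above the median (4601) because a small number of very large sequence values pull it upward, which matches the positive skewness reported.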
Common Values
Value | Count | Frequency (%) |
4949 | 1 | 1.0% |
5213 | 1 | 1.0% |
5189 | 1 | 1.0% |
5141 | 1 | 1.0% |
5117 | 1 | 1.0% |
5093 | 1 | 1.0% |
5069 | 1 | 1.0% |
5045 | 1 | 1.0% |
5021 | 1 | 1.0% |
4997 | 1 | 1.0% |
Other values (92) | 92 | 88.5% |
(Missing) | 2 | 1.9% |
Minimum 10 values
Value | Count | Frequency (%) |
3293 | 1 | 1.0% |
3317 | 1 | 1.0% |
3341 | 1 | 1.0% |
3365 | 1 | 1.0% |
3389 | 1 | 1.0% |
3413 | 1 | 1.0% |
3437 | 1 | 1.0% |
3461 | 1 | 1.0% |
3485 | 1 | 1.0% |
3509 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
8723941 | 1 | 1.0% |
8723940 | 1 | 1.0% |
8723939 | 1 | 1.0% |
8723938 | 1 | 1.0% |
8687522 | 1 | 1.0% |
8676000 | 1 | 1.0% |
8467636 | 1 | 1.0% |
7882456 | 1 | 1.0% |
7880257 | 1 | 1.0% |
7573253 | 1 | 1.0% |
암종분류(CANCER_PART)
Categorical
HIGH CORRELATION
Distinct | 23 |
---|---|
Distinct (%) | 22.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 964.0 B |
뇌/척수 | 10 |
---|---|
두경부 | 10 |
비뇨생식기 | 9 |
부인과 | 8 |
폐 | 8 |
Other values (18) | 59 |
Length
Max length | 11 |
---|---|
Median length | 5 |
Mean length | 3.9038462 |
Min length | 1 |
Unique
Unique | 6 |
---|---|
Unique (%) | 5.8% |
Sample
1st row | 간/담즙/췌장 |
---|---|
2nd row | 간/담즙/췌장 |
3rd row | 내분비 |
4th row | 결장/직장/항문/소장 |
5th row | 비뇨생식기 |
Common Values
Value | Count | Frequency (%) |
뇌/척수 | 10 | 9.6% |
두경부 | 10 | 9.6% |
비뇨생식기 | 9 | 8.7% |
부인과 | 8 | 7.7% |
폐 | 8 | 7.7% |
간/담즙/췌장 | 7 | 6.7% |
결장/직장/항문/소장 | 7 | 6.7% |
골수및혈액 | 7 | 6.7% |
근골격 | 6 | 5.8% |
식도/위 | 5 | 4.8% |
Other values (13) | 27 | 26.0% |
암종명(CANCER_NAME)
Text
Distinct | 103 |
---|---|
Distinct (%) | 100.0% |
Missing | 1 |
Missing (%) | 1.0% |
Memory size | 964.0 B |
Value | Count | Frequency (%) |
소아청소년 | 3 | 2.6% |
뇌종양 | 3 | 2.6% |
담도암 | 2 | 1.7% |
육종 | 2 | 1.7% |
전이성 | 2 | 1.7% |
직장 | 1 | 0.9% |
폐암 | 1 | 0.9% |
폐선암 | 1 | 0.9% |
편평상피세포암 | 1 | 0.9% |
침샘암 | 1 | 0.9% |
Other values (98) | 98 | 85.2% |
Most occurring characters
Value | Count | Frequency (%) |
암 | 58 | 11.3% |
종 | 38 | 7.4% |
성 | 19 | 3.7% |
포 | 13 | 2.5% |
양 | 12 | 2.3% |
(space) | 12 | 2.3% |
세 | 12 | 2.3% |
소 | 11 | 2.1% |
장 | 10 | 1.9% |
부 | 9 | 1.8% |
Other values (135) | 320 | 62.3% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 484 | 94.2% |
Space Separator | 12 | 2.3% |
Uppercase Letter | 7 | 1.4% |
Lowercase Letter | 4 | 0.8% |
Dash Punctuation | 2 | 0.4% |
Open Punctuation | 2 | 0.4% |
Close Punctuation | 2 | 0.4% |
Other Punctuation | 1 | 0.2% |
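The category names in the table above correspond to Unicode general categories (for example, Hangul syllables fall under "Other Letter", code `Lo`). As an illustration, Python's standard `unicodedata` module can produce the same kind of tally. The sketch below runs on two values taken from the sample rows of this report and is not the profiler's own implementation; `category_counts` is a hypothetical helper, and the middle dot is assumed to be U+00B7.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Sample values from 암종명(CANCER_NAME); the middle dot is written as an escape
# because its exact code point is an assumption.
samples = ["구인두-편도암(HPV 관련)", "담낭\u00b7담도암"]
counts = category_counts(samples)
for name, count in counts.most_common():
    print(name, count)
```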
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
암 | 58 | 12.0% |
종 | 38 | 7.9% |
성 | 19 | 3.9% |
포 | 13 | 2.7% |
양 | 12 | 2.5% |
세 | 12 | 2.5% |
소 | 11 | 2.3% |
장 | 10 | 2.1% |
부 | 9 | 1.9% |
신 | 8 | 1.7% |
Other values (123) | 294 | 60.7% |
Uppercase Letter
Value | Count | Frequency (%) |
V | 2 | 28.6% |
P | 2 | 28.6% |
H | 2 | 28.6% |
B | 1 | 14.3% |
Lowercase Letter
Value | Count | Frequency (%) |
t | 2 | 50.0% |
e | 1 | 25.0% |
s | 1 | 25.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 12 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 2 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 2 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
· | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 484 | 94.2% |
Common | 19 | 3.7% |
Latin | 11 | 2.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
암 | 58 | 12.0% |
종 | 38 | 7.9% |
성 | 19 | 3.9% |
포 | 13 | 2.7% |
양 | 12 | 2.5% |
세 | 12 | 2.5% |
소 | 11 | 2.3% |
장 | 10 | 2.1% |
부 | 9 | 1.9% |
신 | 8 | 1.7% |
Other values (123) | 294 | 60.7% |
Latin
Value | Count | Frequency (%) |
V | 2 | 18.2% |
P | 2 | 18.2% |
H | 2 | 18.2% |
t | 2 | 18.2% |
B | 1 | 9.1% |
e | 1 | 9.1% |
s | 1 | 9.1% |
Common
Value | Count | Frequency (%) |
(space) | 12 | 63.2% |
- | 2 | 10.5% |
( | 2 | 10.5% |
) | 2 | 10.5% |
· | 1 | 5.3% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 484 | 94.2% |
ASCII | 29 | 5.6% |
None | 1 | 0.2% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
암 | 58 | 12.0% |
종 | 38 | 7.9% |
성 | 19 | 3.9% |
포 | 13 | 2.7% |
양 | 12 | 2.5% |
세 | 12 | 2.5% |
소 | 11 | 2.3% |
장 | 10 | 2.1% |
부 | 9 | 1.9% |
신 | 8 | 1.7% |
Other values (123) | 294 | 60.7% |
ASCII
Value | Count | Frequency (%) |
(space) | 12 | 41.4% |
V | 2 | 6.9% |
- | 2 | 6.9% |
( | 2 | 6.9% |
P | 2 | 6.9% |
H | 2 | 6.9% |
t | 2 | 6.9% |
) | 2 | 6.9% |
B | 1 | 3.4% |
e | 1 | 3.4% |
None
Value | Count | Frequency (%) |
· | 1 | 100.0% |
암종연령대(CANCER_AGE)
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 3 |
---|---|
Distinct (%) | 2.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 964.0 B |
성인 | 95 |
---|---|
소아 | 8 |
<NA> | 1 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0192308 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 성인 |
---|---|
2nd row | 성인 |
3rd row | 성인 |
4th row | 성인 |
5th row | 성인 |
Common Values
Value | Count | Frequency (%) |
성인 | 95 | 91.3% |
소아 | 8 | 7.7% |
<NA> | 1 | 1.0% |
작성일(REG_DATE)
Date
Distinct | 103 |
---|---|
Distinct (%) | 100.0% |
Missing | 1 |
Missing (%) | 1.0% |
Memory size | 964.0 B |
Minimum | 2012-08-23 15:45:53 |
---|---|
Maximum | 2019-08-06 12:04:14 |
첨부파일명(FILE_ORG)
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 964.0 B |
\N | 103 |
---|---|
<NA> | 1 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0192308 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | \N |
---|---|
2nd row | \N |
3rd row | \N |
4th row | \N |
5th row | \N |
Common Values
Value | Count | Frequency (%) |
\N | 103 | 99.0% |
<NA> | 1 | 1.0% |
첨부파일경로(FILE_PATH)
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 964.0 B |
\N | 103 |
---|---|
<NA> | 1 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0192308 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | \N |
---|---|
2nd row | \N |
3rd row | \N |
4th row | \N |
5th row | \N |
Common Values
Value | Count | Frequency (%) |
\N | 103 | 99.0% |
<NA> | 1 | 1.0% |
첨부파일저장명(FILE_NM)
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 1.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 964.0 B |
\N | 103 |
---|---|
<NA> | 1 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0192308 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | \N |
---|---|
2nd row | \N |
3rd row | \N |
4th row | \N |
5th row | \N |
Common Values
Value | Count | Frequency (%) |
\N | 103 | 99.0% |
<NA> | 1 | 1.0% |
사용여부(USE_YN)
Boolean
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 1.9% |
Missing | 1 |
Missing (%) | 1.0% |
Memory size | 340.0 B |
True | 100 |
---|---|
False | 3 |
(Missing) | 1 |
Value | Count | Frequency (%) |
True | 100 | 96.2% |
False | 3 | 2.9% |
(Missing) | 1 | 1.0% |
삭제여부(DEL_YN)
Boolean
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 1.9% |
Missing | 1 |
Missing (%) | 1.0% |
Memory size | 340.0 B |
False | 102 |
---|---|
True | 1 |
(Missing) | 1 |
Value | Count | Frequency (%) |
False | 102 | 98.1% |
True | 1 | 1.0% |
(Missing) | 1 | 1.0% |
 | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종연령대(CANCER_AGE) | 사용여부(USE_YN) | 삭제여부(DEL_YN) |
---|---|---|---|---|---|
순번(BOARD_SEQ) | 1.000 | 0.522 | 0.000 | 0.302 | 0.209 |
암종분류(CANCER_PART) | 0.522 | 1.000 | 0.513 | 0.000 | 1.000 |
암종연령대(CANCER_AGE) | 0.000 | 0.513 | 1.000 | 0.000 | 0.000 |
사용여부(USE_YN) | 0.302 | 0.000 | 0.000 | 1.000 | 0.000 |
삭제여부(DEL_YN) | 0.209 | 1.000 | 0.000 | 0.000 | 1.000 |
 | 암종분류(CANCER_PART) | 첨부파일저장명(FILE_NM) | 첨부파일경로(FILE_PATH) | 암종연령대(CANCER_AGE) | 삭제여부(DEL_YN) | 사용여부(USE_YN) | 첨부파일명(FILE_ORG) |
---|---|---|---|---|---|---|---|
암종분류(CANCER_PART) | 1.000 | 1.000 | 1.000 | 0.363 | 0.896 | 0.000 | 1.000 |
첨부파일저장명(FILE_NM) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
첨부파일경로(FILE_PATH) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
암종연령대(CANCER_AGE) | 0.363 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 |
삭제여부(DEL_YN) | 0.896 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 |
사용여부(USE_YN) | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 |
첨부파일명(FILE_ORG) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
 | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종연령대(CANCER_AGE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN) |
---|---|---|---|---|---|---|---|---|
순번(BOARD_SEQ) | 1.000 | 0.263 | 0.000 | 1.000 | 1.000 | 1.000 | 0.363 | 0.252 |
암종분류(CANCER_PART) | 0.263 | 1.000 | 0.363 | 1.000 | 1.000 | 1.000 | 0.000 | 0.896 |
암종연령대(CANCER_AGE) | 0.000 | 0.363 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 |
첨부파일명(FILE_ORG) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
첨부파일경로(FILE_PATH) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
첨부파일저장명(FILE_NM) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
사용여부(USE_YN) | 0.363 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 |
삭제여부(DEL_YN) | 0.252 | 0.896 | 0.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 |
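The report does not say which association measure produced each matrix. As one illustration of why the attachment columns show 1.000 against 암종연령대(CANCER_AGE), the sketch below computes an uncorrected Cramér's V (chosen here as an assumption) from counts reconstructed from this report: the 95 성인 rows and 8 소아 rows all carry `\N` in 첨부파일명(FILE_ORG), while the single `<NA>` row carries `<NA>` in both columns. `cramers_v` is a hypothetical helper, not the profiler's implementation.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (a, b) category pairs.

    Assumes both variables take at least two distinct values.
    """
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# (CANCER_AGE, FILE_ORG) pairs reconstructed from the Common Values tables.
pairs = [("성인", r"\N")] * 95 + [("소아", r"\N")] * 8 + [("<NA>", "<NA>")]
v = cramers_v(pairs)
print(round(v, 3))
```

Because FILE_ORG has only two values and the rare one occurs on exactly the row where CANCER_AGE is also `<NA>`, a single placeholder row is enough to drive this statistic to its maximum.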
First rows
 | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종명(CANCER_NAME) | 암종연령대(CANCER_AGE) | 작성일(REG_DATE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN) |
---|---|---|---|---|---|---|---|---|---|---|
0 | 3293 | 간/담즙/췌장 | 간내 담도암 | 성인 | 2012-08-23 15:45:53 | \N | \N | \N | Y | N |
1 | 3317 | 간/담즙/췌장 | 간암 | 성인 | 2012-08-23 15:46:21 | \N | \N | \N | Y | N |
2 | 3341 | 내분비 | 갑상선암 | 성인 | 2012-08-23 15:46:39 | \N | \N | \N | Y | N |
3 | 3365 | 결장/직장/항문/소장 | 결장암 | 성인 | 2012-08-23 15:47:04 | \N | \N | \N | Y | N |
4 | 3389 | 비뇨생식기 | 고환암 | 성인 | 2012-08-23 15:47:56 | \N | \N | \N | Y | N |
5 | 3413 | 골수및혈액 | 골수이형성증후군 | 성인 | 2012-08-23 15:48:43 | \N | \N | \N | Y | N |
6 | 3437 | 뇌/척수 | 교모세포종 | 성인 | 2012-08-23 15:49:35 | \N | \N | \N | Y | N |
7 | 3461 | 두경부 | 구강암 | 성인 | 2012-08-23 15:50:04 | \N | \N | \N | Y | N |
8 | 3485 | 피부 | 균상식육종 | 성인 | 2012-08-23 15:51:01 | \N | \N | \N | Y | N |
9 | 3509 | 골수및혈액 | 급성골수성백혈병 | 성인 | 2012-08-23 15:51:29 | \N | \N | \N | Y | N |
Last rows
 | 순번(BOARD_SEQ) | 암종분류(CANCER_PART) | 암종명(CANCER_NAME) | 암종연령대(CANCER_AGE) | 작성일(REG_DATE) | 첨부파일명(FILE_ORG) | 첨부파일경로(FILE_PATH) | 첨부파일저장명(FILE_NM) | 사용여부(USE_YN) | 삭제여부(DEL_YN) |
---|---|---|---|---|---|---|---|---|---|---|
94 | 7880257 | 비뇨생식기 | 신우암 | 성인 | 2014-11-12 09:48:48 | \N | \N | \N | Y | N |
95 | 7882456 | 비뇨생식기 | 요관암 | 성인 | 2014-11-12 12:49:46 | \N | \N | \N | Y | N |
96 | 8467636 | 심장 | 종격동암 | 성인 | 2014-12-24 10:51:31 | \N | \N | \N | Y | N |
97 | 8676000 | 간 | 간모세포종 | 소아 | 2015-01-12 14:39:23 | \N | \N | \N | Y | N |
98 | 8687522 | 근골격 | 횡문근육종 | 소아 | 2015-01-13 10:10:50 | \N | \N | \N | Y | N |
99 | 8723938 | 간/담즙/췌장 | 담낭·담도암 | 성인 | 2015-01-15 15:27:44 | \N | \N | \N | Y | N |
100 | 8723939 | 두경부 | 구인두-편도암(HPV 관련) | 성인 | 2019-08-01 09:06:11 | \N | \N | \N | Y | N |
101 | 8723940 | 두경부 | 구인두-하인두암(HPV 비관련) | 성인 | 2019-08-01 14:51:08 | \N | \N | \N | Y | N |
102 | 8723941 | test | test | 성인 | 2019-08-06 12:04:14 | \N | \N | \N | Y | Y |
103 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |