Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 1561
Missing cells (%): 2.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
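
The two derived figures above can be reproduced from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over variables × observations and the average record size is total memory divided by the number of rows; the report itself does not state these formulas.

```python
# Figures copied from the Dataset statistics block.
n_variables = 7
n_observations = 10_000
missing_cells = 1_561
total_memory_kib = 644.5

# Assumption: percentage of missing cells is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total memory split evenly over rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")        # Missing cells (%): 2.2%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 66.0 B
```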

Variable types

Categorical: 5
Numeric: 2

Dataset

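Description: [2021 cancer registration statistics] collected by the Korea Central Cancer Registry (designated at the National Cancer Center) through the national cancer registration statistics program. The data provide cancer incidence statistics from 1999 through 2021. (Units: persons; incidence rate per 100,000 population)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/3039563/fileData.do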

Alerts

국제질병분류 is highly overall correlated with 암종 (High correlation)
암종 is highly overall correlated with 국제질병분류 (High correlation)
발생자수 is highly overall correlated with 조발생률 (High correlation)
조발생률 is highly overall correlated with 발생자수 (High correlation)
조발생률 has 1561 (15.6%) missing values (Missing)
발생자수 is highly skewed (γ1 = 69.98564945) (Skewed)
발생자수 has 1561 (15.6%) zeros (Zeros)
조발생률 has 189 (1.9%) zeros (Zeros)
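
The Missing alert for 조발생률 and the Zeros alert for 발생자수 report the same count (1561), and the rows in the Sample section with 0 cases show <NA> for the rate. The sketch below checks that pattern on four rows taken from the Sample section. The helper name `zero_iff_missing` is hypothetical, and whether the pattern holds for all 10000 rows is an assumption this report cannot confirm.

```python
# Four rows from the Sample section (only the relevant columns are kept).
rows = [
    {"연령군": "00-04세", "발생자수": 0, "조발생률": None},
    {"연령군": "10-14세", "발생자수": 0, "조발생률": None},
    {"연령군": "15-19세", "발생자수": 8, "조발생률": 0.2},
    {"연령군": "15-19세", "발생자수": 6, "조발생률": 0.4},
]

def zero_iff_missing(records):
    """True when the rate is missing exactly on the rows with zero cases."""
    return all((r["발생자수"] == 0) == (r["조발생률"] is None) for r in records)

# Counts copied from the Alerts list.
zeros_reported = 1561
missing_reported = 1561

print(zero_iff_missing(rows))              # True
print(zeros_reported == missing_reported)  # True
```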

Reproduction

Analysis started: 2024-03-14 19:33:53.122428
Analysis finished: 2024-03-14 19:33:55.728793
Duration: 2.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

발생연도 (year of incidence)
Categorical

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
2020 | 444
2001 | 442
2005 | 437
1999 | 432
2014 | 432
Other values (19) | 7813

Length

Max length: 9
Median length: 4
Mean length: 4.203
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2009
2nd row: 2015
3rd row: 2009
4th row: 2017
5th row: 2011

Common Values

Value | Count | Frequency (%)
2020 | 444 | 4.4%
2001 | 442 | 4.4%
2005 | 437 | 4.4%
1999 | 432 | 4.3%
2014 | 432 | 4.3%
2004 | 431 | 4.3%
2003 | 425 | 4.2%
2013 | 422 | 4.2%
2002 | 422 | 4.2%
2008 | 420 | 4.2%
Other values (14) | 5693 | 56.9%

Length

Histogram of lengths of the category

성별 (sex)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
남자 | 3387
남녀전체 | 3324
여자 | 3289

Length

Max length: 4
Median length: 2
Mean length: 2.6648
Min length: 2
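
The mean length can be checked against the category counts reported for 성별. This is a minimal sketch, assuming the length of a value is its number of Unicode characters, which is what Python's `len` returns for a `str`.

```python
# Category counts copied from the 성별 overview.
counts = {"남자": 3387, "남녀전체": 3324, "여자": 3289}

n_rows = sum(counts.values())
total_chars = sum(len(value) * count for value, count in counts.items())
mean_length = total_chars / n_rows

print(n_rows)                 # 10000
print(f"{mean_length:.4f}")   # 2.6648
```

The result matches the reported mean of 2.6648, so the three categories account for every row.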

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남자
2nd row: 남자
3rd row: 남녀전체
4th row: 남자
5th row: 여자

Common Values

Value | Count | Frequency (%)
남자 | 3387 | 33.9%
남녀전체 | 3324 | 33.2%
여자 | 3289 | 32.9%

Length

Histogram of lengths of the category


국제질병분류 (ICD code)
Categorical

Alerts: High correlation

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
18. C70-C72 | 427
17. C67 | 424
20. C81 | 420
03. C16 | 417
00. All cancers | 416
Other values (20) | 7896

Length

Max length: 21
Median length: 7
Mean length: 9.2062
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 14. C61
2nd row: 05. C22
3rd row: 15. C62
4th row: 15. C62
5th row: 00. All cancers

Common Values

Value | Count | Frequency (%)
18. C70-C72 | 427 | 4.3%
17. C67 | 424 | 4.2%
20. C81 | 420 | 4.2%
03. C16 | 417 | 4.2%
00. All cancers | 416 | 4.2%
24. All other cancers | 415 | 4.2%
01. C00-C14 | 411 | 4.1%
16. C64 | 405 | 4.0%
22. C90 | 405 | 4.0%
04. C18-C20 | 405 | 4.0%
Other values (15) | 5855 | 58.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
all | 831 | 3.9%
cancers | 831 | 3.9%
18 | 427 | 2.0%
c70-c72 | 427 | 2.0%
c67 | 424 | 2.0%
17 | 424 | 2.0%
20 | 420 | 2.0%
c81 | 420 | 2.0%
03 | 417 | 2.0%
c16 | 417 | 2.0%
Other values (41) | 16208 | 76.3%

암종 (cancer site)
Categorical

Alerts: High correlation

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
뇌 및 중추신경계 | 427
방광 | 424
호지킨림프종 | 420
위 | 417
모든암 | 416
Other values (20) | 7896

Length

Max length: 11
Median length: 9
Mean length: 3.7994
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전립선
2nd row: 간
3rd row: 고환
4th row: 고환
5th row: 모든암

Common Values

Value | Count | Frequency (%)
뇌 및 중추신경계 | 427 | 4.3%
방광 | 424 | 4.2%
호지킨림프종 | 420 | 4.2%
위 | 417 | 4.2%
모든암 | 416 | 4.2%
기타 암 | 415 | 4.2%
입술, 구강 및 인두 | 411 | 4.1%
신장 | 405 | 4.0%
다발성 골수종 | 405 | 4.0%
대장 | 405 | 4.0%
Other values (15) | 5855 | 58.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
및 | 1239 | 9.0%
뇌 | 427 | 3.1%
중추신경계 | 427 | 3.1%
방광 | 424 | 3.1%
호지킨림프종 | 420 | 3.1%
위 | 417 | 3.0%
모든암 | 416 | 3.0%
기타 | 415 | 3.0%
암 | 415 | 3.0%
입술 | 411 | 3.0%
Other values (22) | 8698 | 63.4%

연령군 (age group)
Categorical

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
50-54세 | 558
25-29세 | 545
80-84세 | 544
70-74세 | 542
05-09세 | 536
Other values (14) | 7275

Length

Max length: 6
Median length: 6
Mean length: 5.8437
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 60-64세
2nd row: 40-44세
3rd row: 15-19세
4th row: 25-29세
5th row: 60-64세

Common Values

Value | Count | Frequency (%)
50-54세 | 558 | 5.6%
25-29세 | 545 | 5.5%
80-84세 | 544 | 5.4%
70-74세 | 542 | 5.4%
05-09세 | 536 | 5.4%
40-44세 | 529 | 5.3%
35-39세 | 529 | 5.3%
15-19세 | 528 | 5.3%
30-34세 | 528 | 5.3%
45-49세 | 527 | 5.3%
Other values (9) | 4634 | 46.3%

Length

Histogram of lengths of the category

발생자수 (number of incident cases)
Real number (ℝ)

Alerts: High correlation, Skewed, Zeros

Distinct: 2230
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2469.8342
Minimum: 0
Maximum: 4382233
Zeros: 1561
Zeros (%): 15.6%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 72
Q3: 404
95-th percentile: 4791.3
Maximum: 4382233
Range: 4382233
Interquartile range (IQR): 399

Descriptive statistics

Standard deviation: 51776.306
Coefficient of variation (CV): 20.963475
Kurtosis: 5519.0499
Mean: 2469.8342
Median Absolute Deviation (MAD): 72
Skewness: 69.985649
Sum: 24698342
Variance: 2.6807859 × 10^9
Monotonicity: Not monotonic
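
Several of the 발생자수 statistics are derived from one another. The sketch below recomputes them from the values printed in this report, using the standard definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1). That ydata-profiling applies exactly these definitions is an assumption; the inputs are rounded, so the results agree only to the printed precision.

```python
# Values copied from the 발생자수 statistics.
n_rows = 10_000
total = 24_698_342
std_dev = 51_776.306
q1, q3 = 5, 404

mean = total / n_rows
cv = std_dev / mean
variance = std_dev ** 2
iqr = q3 - q1

print(f"Mean: {mean:.4f}")          # Mean: 2469.8342
print(f"CV: {cv:.3f}")              # CV: 20.963
print(f"Variance: {variance:.4e}")  # Variance: 2.6808e+09
print(f"IQR: {iqr}")                # IQR: 399
```

A CV near 21 together with a median of 72 and a maximum above four million reflects how strongly a few aggregate rows dominate the mean.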
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1561 | 15.6%
1 | 379 | 3.8%
2 | 231 | 2.3%
3 | 171 | 1.7%
4 | 123 | 1.2%
5 | 110 | 1.1%
6 | 100 | 1.0%
11 | 82 | 0.8%
9 | 82 | 0.8%
7 | 80 | 0.8%
Other values (2220) | 7081 | 70.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1561 | 15.6%
1 | 379 | 3.8%
2 | 231 | 2.3%
3 | 171 | 1.7%
4 | 123 | 1.2%
5 | 110 | 1.1%
6 | 100 | 1.0%
7 | 80 | 0.8%
8 | 74 | 0.7%
9 | 82 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
4382233 | 1 | < 0.1%
2306970 | 1 | < 0.1%
634529 | 1 | < 0.1%
531576 | 1 | < 0.1%
485175 | 1 | < 0.1%
429086 | 1 | < 0.1%
423824 | 1 | < 0.1%
356435 | 1 | < 0.1%
315660 | 1 | < 0.1%
267463 | 1 | < 0.1%

조발생률 (crude incidence rate)
Real number (ℝ)

Alerts: High correlation, Missing, Zeros

Distinct: 1711
Distinct (%): 20.3%
Missing: 1561
Missing (%): 15.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.398625
Minimum: 0
Maximum: 3259.9
Zeros: 189
Zeros (%): 1.9%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1
Q1: 1.2
Median: 7.1
Q3: 30.9
95-th percentile: 250.55
Maximum: 3259.9
Range: 3259.9
Interquartile range (IQR): 29.7

Descriptive statistics

Standard deviation: 233.04569
Coefficient of variation (CV): 3.6758792
Kurtosis: 77.636249
Mean: 63.398625
Median Absolute Deviation (MAD): 6.8
Skewness: 8.0053721
Sum: 535021
Variance: 54310.294
Monotonicity: Not monotonic
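
For 조발생률 the denominators matter, because 1561 of the 10000 rows are missing. The sketch below is a consistency check on the printed figures and is not taken from the ydata-profiling source. It suggests that the mean and the distinct percentage are computed over the non-missing rows, while the missing percentage uses all rows.

```python
# Values copied from the 조발생률 statistics.
n_rows = 10_000
n_missing = 1_561
n_distinct = 1_711
total = 535_021
std_dev = 233.04569

n_valid = n_rows - n_missing

mean_valid = total / n_valid                      # mean over non-missing rows
distinct_pct_valid = 100 * n_distinct / n_valid   # distinct share of non-missing rows
distinct_pct_all = 100 * n_distinct / n_rows      # distinct share of all rows
missing_pct = 100 * n_missing / n_rows
cv = std_dev / mean_valid

print(n_valid)                                    # 8439
print(f"Mean: {mean_valid:.3f}")                  # Mean: 63.399
print(f"Distinct (%): {distinct_pct_valid:.1f}")  # Distinct (%): 20.3
print(f"Over all rows: {distinct_pct_all:.1f}")   # Over all rows: 17.1
print(f"Missing (%): {missing_pct:.1f}")          # Missing (%): 15.6
print(f"CV: {cv:.3f}")                            # CV: 3.676
```

Only the non-missing denominator reproduces the reported 20.3%; dividing by all rows would give 17.1%.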
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.1 | 497 | 5.0%
0.2 | 266 | 2.7%
0.3 | 220 | 2.2%
0.0 | 189 | 1.9%
0.4 | 162 | 1.6%
0.5 | 153 | 1.5%
0.7 | 128 | 1.3%
0.6 | 104 | 1.0%
0.8 | 95 | 0.9%
0.9 | 94 | 0.9%
Other values (1701) | 6531 | 65.3%
(Missing) | 1561 | 15.6%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 189 | 1.9%
0.1 | 497 | 5.0%
0.2 | 266 | 2.7%
0.3 | 220 | 2.2%
0.4 | 162 | 1.6%
0.5 | 153 | 1.5%
0.6 | 104 | 1.0%
0.7 | 128 | 1.3%
0.8 | 95 | 0.9%
0.9 | 94 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
3259.9 | 1 | < 0.1%
3193.7 | 1 | < 0.1%
3188.1 | 1 | < 0.1%
3187.6 | 1 | < 0.1%
3166.4 | 1 | < 0.1%
3116.0 | 1 | < 0.1%
3059.5 | 1 | < 0.1%
3000.5 | 1 | < 0.1%
2973.0 | 1 | < 0.1%
2955.9 | 1 | < 0.1%

Interactions


Correlations

Variable | 발생연도 | 성별 | 국제질병분류 | 암종 | 연령군 | 발생자수 | 조발생률
발생연도 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.085 | 0.000
성별 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.015 | 0.161
국제질병분류 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.036 | 0.527
암종 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.036 | 0.527
연령군 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.220
발생자수 | 0.085 | 0.015 | 0.036 | 0.036 | 0.000 | 1.000 | 0.189
조발생률 | 0.000 | 0.161 | 0.527 | 0.527 | 0.220 | 0.189 | 1.000

Variable | 연령군 | 발생연도 | 국제질병분류 | 암종 | 성별
연령군 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
발생연도 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
국제질병분류 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
암종 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
성별 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 발생자수 | 조발생률 | 발생연도 | 성별 | 국제질병분류 | 암종 | 연령군
발생자수 | 1.000 | 0.857 | 0.041 | 0.014 | 0.019 | 0.019 | 0.000
조발생률 | 0.857 | 1.000 | 0.000 | 0.097 | 0.214 | 0.214 | 0.084
발생연도 | 0.041 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
성별 | 0.014 | 0.097 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
국제질병분류 | 0.019 | 0.214 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
암종 | 0.019 | 0.214 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
연령군 | 0.000 | 0.084 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000
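
The 1.000 between 국제질병분류 and 암종 is what any association measure for categorical variables returns when each ICD code maps to exactly one cancer site. The report does not name the measure behind each matrix. The sketch below uses plain (uncorrected) Cramér's V as one common choice, which is an assumption, and `cramers_v` is a hypothetical helper written for this illustration. The paired values come from rows in the Sample section.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# ICD code and cancer site pairs as they appear in the Sample section.
icd = ["14. C61", "15. C62", "15. C62", "00. All cancers", "19. C73", "19. C73"]
site = ["전립선", "고환", "고환", "모든암", "갑상선", "갑상선"]

v_mapped = cramers_v(icd, site)
# Contrast: two variables whose categories are evenly mixed.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])

print(round(v_mapped, 6), round(v_independent, 6))
```

Because one column fully determines the other, keeping both adds no information to a model; either can be dropped or used as a label for the other.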

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

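Index | 발생연도 | 성별 | 국제질병분류 | 암종 | 연령군 | 발생자수 | 조발생률
19753 | 2009 | 남자 | 14. C61 | 전립선 | 60-64세 | 958 | 94.3
7779 | 2015 | 남자 | 05. C22 | 간 | 40-44세 | 438 | 19.5
21093 | 2009 | 남녀전체 | 15. C62 | 고환 | 15-19세 | 12 | 0.3
21570 | 2017 | 남자 | 15. C62 | 고환 | 25-29세 | 71 | 4.2
734 | 2011 | 여자 | 00. All cancers | 모든암 | 60-64세 | 9903 | 855.8
26082 | 2000 | 남자 | 19. C73 | 갑상선 | 70-74세 | 19 | 5.8
26948 | 2015 | 여자 | 19. C73 | 갑상선 | 30-34세 | 1698 | 91.6
15447 | 2006 | 남녀전체 | 11. C53 | 자궁경부 | 00-04세 | 0 | <NA>
5931 | 2007 | 남녀전체 | 04. C18-C20 | 대장 | 15-19세 | 8 | 0.2
7223 | 2005 | 여자 | 05. C22 | 간 | 15-19세 | 6 | 0.4

Index | 발생연도 | 성별 | 국제질병분류 | 암종 | 연령군 | 발생자수 | 조발생률
1916 | 2008 | 남자 | 01. C00-C14 | 입술, 구강 및 인두 | 80-84세 | 52 | 35.2
21549 | 2017 | 남녀전체 | 15. C62 | 고환 | 15-19세 | 6 | 0.2
6132 | 2010 | 남자 | 04. C18-C20 | 대장 | 70-74세 | 2480 | 377.2
27768 | 2006 | 남녀전체 | 20. C81 | 호지킨림프종 | 45-49세 | 14 | 0.3
8784 | 2009 | 남녀전체 | 06. C23-C24 | 담낭 및 기타담도 | 30-34세 | 13 | 0.3
11693 | 2012 | 남녀전체 | 08. C32 | 후두 | 40-44세 | 10 | 0.2
13059 | 2012 | 남녀전체 | 09. C33-C34 | 폐 | 30-34세 | 48 | 1.2
29971 | 2020 | 여자 | 21. C82-C86,C96 | 비호지킨림프종 | 40-44세 | 93 | 4.9
22649 | 2012 | 남자 | 16. C64 | 신장 | 05-09세 | 1 | 0.1
9350 | 2019 | 남녀전체 | 06. C23-C24 | 담낭 및 기타담도 | 10-14세 | 0 | <NA>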