Overview

Dataset statistics

Number of variables: 7
Number of observations: 1725
Missing cells: 230
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 101.2 KiB
Average record size in memory: 60.1 B
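
The two derived figures above (missing cells in percent and average record size) follow from the other rows by simple arithmetic. The sketch below recomputes them. The helper names are illustrative, and the conversion 1 KiB = 1024 bytes is an assumption about how the report rounds.

```python
# Figures copied from the Dataset statistics table.
n_variables = 7
n_observations = 1725
missing_cells = 230
total_size_kib = 101.2


def missing_cell_pct(missing, rows, cols):
    """Share of all cells (rows x cols) that are missing, in percent."""
    return 100.0 * missing / (rows * cols)


def avg_record_size_bytes(total_kib, rows):
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / rows


pct = round(missing_cell_pct(missing_cells, n_observations, n_variables), 1)
record_size = round(avg_record_size_bytes(total_size_kib, n_observations), 1)
print(pct, record_size)
```

Both rounded results agree with the reported 1.9% and 60.1 B.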

Variable types

Numeric: 4
Categorical: 3

Dataset

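Description: Cancer incidence data for 24 cancer types in Korea, taken from the [2021 Cancer Registration Statistics] published in December 2023. Because historical data were updated, the figures for 1999-2020 have changed. (Units: persons; incidence rate per 100,000 persons)
Author: National Cancer Center (국립암센터)
URL: https://www.data.go.kr/data/15009644/fileData.do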

Alerts

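국제질병분류 (ICD code) is highly overall correlated with 암종 (cancer type) [High correlation]
암종 (cancer type) is highly overall correlated with 국제질병분류 (ICD code) [High correlation]
발생자수 (number of cases) is highly overall correlated with 조발생률 (crude incidence rate) and 1 other field [High correlation]
조발생률 (crude incidence rate) is highly overall correlated with 발생자수 (number of cases) and 1 other field [High correlation]
연령표준화발생률 (age-standardized incidence rate) is highly overall correlated with 발생자수 (number of cases) and 1 other field [High correlation]
조발생률 (crude incidence rate) has 115 (6.7%) missing values [Missing]
연령표준화발생률 (age-standardized incidence rate) has 115 (6.7%) missing values [Missing]
발생자수 (number of cases) has 115 (6.7%) zeros [Zeros]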

Reproduction

Analysis started: 2024-03-14 23:14:27.450923
Analysis finished: 2024-03-14 23:14:33.541664
Duration: 6.09 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

발생연도 (year of incidence)
Real number (ℝ)

Distinct: 23
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010
Minimum: 1999
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 KiB

Quantile statistics

Minimum: 1999
5-th percentile: 2000
Q1: 2004
Median: 2010
Q3: 2016
95-th percentile: 2020
Maximum: 2021
Range: 22
Interquartile range (IQR): 12
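
Because every year from 1999 to 2021 occurs exactly 75 times, the quartiles above can be recomputed from the frequency table alone. The sketch below uses the standard library and assumes linear interpolation between order statistics (`method="inclusive"`). Whether the profiling tool uses exactly this interpolation rule is an assumption.

```python
import statistics

# Rebuild the column from the frequency table: each year 1999..2021 appears 75 times.
years = sorted(year for year in range(1999, 2022) for _ in range(75))

# "inclusive" interpolates linearly between order statistics of the observed data.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")

iqr = q3 - q1
value_range = max(years) - min(years)
print(q1, median, q3, iqr, value_range)
```

Under that assumption the results match the reported Q1, median, Q3, IQR, and range.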

Descriptive statistics

Standard deviation: 6.6351731
Coefficient of variation (CV): 0.0033010811
Kurtosis: -1.2045578
Mean: 2010
Median Absolute Deviation (MAD): 6
Skewness: 0
Sum: 3467250
Variance: 44.025522
Monotonicity: Increasing
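
Several of the descriptive statistics above can be cross-checked from the frequency table, since each of the 23 years appears 75 times. The sketch below assumes the report uses the sample variance (dividing by n - 1), which is what reproduces the printed value. The variable names are illustrative.

```python
import statistics

# Each year 1999..2021 appears 75 times (1725 rows in total).
years = [year for year in range(1999, 2022) for _ in range(75)]

mean = statistics.fmean(years)
variance = statistics.variance(years)  # sample variance, divides by n - 1
std_dev = statistics.stdev(years)      # square root of the sample variance
cv = std_dev / mean                    # coefficient of variation
total = sum(years)

# Median absolute deviation: median of |x - median(x)|.
center = statistics.median(years)
mad = statistics.median(abs(y - center) for y in years)

print(mean, variance, std_dev, cv, total, mad)
```

The population variance of 23 equally frequent consecutive integers is (23² - 1) / 12 = 44, and the n / (n - 1) correction with n = 1725 gives the reported 44.025522.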
Histogram with fixed size bins (bins=23)
Value | Count | Frequency (%)
1999 | 75 | 4.3%
2000 | 75 | 4.3%
2021 | 75 | 4.3%
2020 | 75 | 4.3%
2019 | 75 | 4.3%
2018 | 75 | 4.3%
2017 | 75 | 4.3%
2016 | 75 | 4.3%
2015 | 75 | 4.3%
2014 | 75 | 4.3%
Other values (13) | 975 | 56.5%

Value | Count | Frequency (%)
1999 | 75 | 4.3%
2000 | 75 | 4.3%
2001 | 75 | 4.3%
2002 | 75 | 4.3%
2003 | 75 | 4.3%
2004 | 75 | 4.3%
2005 | 75 | 4.3%
2006 | 75 | 4.3%
2007 | 75 | 4.3%
2008 | 75 | 4.3%

Value | Count | Frequency (%)
2021 | 75 | 4.3%
2020 | 75 | 4.3%
2019 | 75 | 4.3%
2018 | 75 | 4.3%
2017 | 75 | 4.3%
2016 | 75 | 4.3%
2015 | 75 | 4.3%
2014 | 75 | 4.3%
2013 | 75 | 4.3%
2012 | 75 | 4.3%

성별 (sex)
Categorical

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB
남녀전체 (both sexes): 575
남자 (male): 575
여자 (female): 575

Length

Max length: 4
Median length: 2
Mean length: 2.6666667
Min length: 2
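
The length statistics above are computed over the character lengths of the category values. As a rough check, the sketch below rebuilds the column from the three categories (575 rows each) and measures lengths in Unicode code points, which is an assumption about how the tool counts characters.

```python
import statistics

# The three categories of 성별, each occurring 575 times.
values = ["남녀전체"] * 575 + ["남자"] * 575 + ["여자"] * 575

# len() on a Python str counts Unicode code points, so "남녀전체" has length 4.
lengths = [len(v) for v in values]

max_len = max(lengths)
min_len = min(lengths)
median_len = statistics.median(lengths)
mean_len = statistics.fmean(lengths)
print(max_len, median_len, mean_len, min_len)
```

The mean works out to (4 + 2 + 2) / 3, in line with the reported 2.6666667.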

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남녀전체
2nd row: 남녀전체
3rd row: 남녀전체
4th row: 남녀전체
5th row: 남녀전체

Common Values

Value | Count | Frequency (%)
남녀전체 (both sexes) | 575 | 33.3%
남자 (male) | 575 | 33.3%
여자 (female) | 575 | 33.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values of 성별; 남녀전체, 남자, and 여자 each account for 575 rows (33.3%)]

국제질병분류 (ICD code)
Categorical

HIGH CORRELATION

Distinct: 25
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB
C00-C96: 69
C00-C14: 69
C15: 69
C16: 69
C18-C20: 69
Other values (20): 1380

Length

Max length: 11
Median length: 3
Mean length: 4.72
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C00-C96
2nd row: C00-C14
3rd row: C15
4th row: C16
5th row: C18-C20

Common Values

Value | Count | Frequency (%)
C00-C96 | 69 | 4.0%
C00-C14 | 69 | 4.0%
C15 | 69 | 4.0%
C16 | 69 | 4.0%
C18-C20 | 69 | 4.0%
C22 | 69 | 4.0%
C23-C24 | 69 | 4.0%
C25 | 69 | 4.0%
C32 | 69 | 4.0%
C33-C34 | 69 | 4.0%
Other values (15) | 1035 | 60.0%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
c00-c96 | 69 | 4.0%
c56 | 69 | 4.0%
c91-c95 | 69 | 4.0%
c90 | 69 | 4.0%
c82-c86,c96 | 69 | 4.0%
c81 | 69 | 4.0%
c73 | 69 | 4.0%
c70-c72 | 69 | 4.0%
c67 | 69 | 4.0%
c64 | 69 | 4.0%
Other values (15) | 1035 | 60.0%

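암종 (cancer type)
Categorical

HIGH CORRELATION

Distinct: 25
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB
모든암 (all cancers): 69
입술, 구강 및 인두 (lip, oral cavity and pharynx): 69
식도 (esophagus): 69
위 (stomach): 69
대장 (colorectum): 69
Other values (20): 1380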

Length

Max length: 11
Median length: 9
Mean length: 3.76
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 모든암 (all cancers)
2nd row: 입술, 구강 및 인두 (lip, oral cavity and pharynx)
3rd row: 식도 (esophagus)
4th row: 위 (stomach)
5th row: 대장 (colorectum)

Common Values

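Value | Count | Frequency (%)
모든암 (all cancers) | 69 | 4.0%
입술, 구강 및 인두 (lip, oral cavity and pharynx) | 69 | 4.0%
식도 (esophagus) | 69 | 4.0%
위 (stomach) | 69 | 4.0%
대장 (colorectum) | 69 | 4.0%
간 (liver) | 69 | 4.0%
담낭 및 기타담도 (gallbladder and other biliary tract) | 69 | 4.0%
췌장 (pancreas) | 69 | 4.0%
후두 (larynx) | 69 | 4.0%
폐 (lung) | 69 | 4.0%
Other values (15) | 1035 | 60.0%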

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
및 (and) | 207 | 8.8%
모든암 (all cancers) | 69 | 2.9%
난소 (ovary) | 69 | 2.9%
기타 (other) | 69 | 2.9%
백혈병 (leukemia) | 69 | 2.9%
골수종 (myeloma) | 69 | 2.9%
다발성 (multiple) | 69 | 2.9%
비호지킨림프종 (non-Hodgkin lymphoma) | 69 | 2.9%
호지킨림프종 (Hodgkin lymphoma) | 69 | 2.9%
갑상선 (thyroid) | 69 | 2.9%
Other values (22) | 1518 | 64.7%

발생자수 (number of cases)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 1369
Distinct (%): 79.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10161.7
Minimum: 0
Maximum: 277523
Zeros: 115
Zeros (%): 6.7%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 906
Median: 2463
Q3: 7992
95-th percentile: 30526.6
Maximum: 277523
Range: 277523
Interquartile range (IQR): 7086

Descriptive statistics

Standard deviation: 27561.421
Coefficient of variation (CV): 2.7122845
Kurtosis: 38.225033
Mean: 10161.7
Median Absolute Deviation (MAD): 2077
Skewness: 5.7387966
Sum: 17528932
Variance: 7.5963191 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 115 | 6.7%
62 | 4 | 0.2%
177 | 4 | 0.2%
155 | 4 | 0.2%
182 | 4 | 0.2%
76 | 4 | 0.2%
154 | 4 | 0.2%
328 | 3 | 0.2%
2808 | 3 | 0.2%
9458 | 3 | 0.2%
Other values (1359) | 1577 | 91.4%

Value | Count | Frequency (%)
0 | 115 | 6.7%
35 | 1 | 0.1%
37 | 1 | 0.1%
41 | 1 | 0.1%
44 | 1 | 0.1%
46 | 1 | 0.1%
48 | 2 | 0.1%
51 | 2 | 0.1%
53 | 1 | 0.1%
57 | 1 | 0.1%

Value | Count | Frequency (%)
277523 | 1 | 0.1%
258121 | 1 | 0.1%
250521 | 1 | 0.1%
247251 | 1 | 0.1%
237181 | 1 | 0.1%
233426 | 1 | 0.1%
229659 | 1 | 0.1%
228956 | 1 | 0.1%
222996 | 1 | 0.1%
221356 | 1 | 0.1%

조발생률 (crude incidence rate)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 551
Distinct (%): 34.2%
Missing: 115
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.622174
Minimum: 0.1
Maximum: 561.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.4
Q1: 3.7
Median: 8.2
Q3: 29.275
95-th percentile: 91.795
Maximum: 561.7
Range: 561.6
Interquartile range (IQR): 25.575

Descriptive statistics

Standard deviation: 79.326079
Coefficient of variation (CV): 2.4316613
Kurtosis: 20.692893
Mean: 32.622174
Median Absolute Deviation (MAD): 6.2
Skewness: 4.4973279
Sum: 52521.7
Variance: 6292.6269
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.3 | 40 | 2.3%
0.4 | 33 | 1.9%
0.7 | 24 | 1.4%
0.5 | 19 | 1.1%
0.6 | 18 | 1.0%
0.2 | 18 | 1.0%
3.1 | 16 | 0.9%
2.8 | 15 | 0.9%
4.2 | 15 | 0.9%
4.3 | 15 | 0.9%
Other values (541) | 1397 | 81.0%
(Missing) | 115 | 6.7%

Value | Count | Frequency (%)
0.1 | 1 | 0.1%
0.2 | 18 | 1.0%
0.3 | 40 | 2.3%
0.4 | 33 | 1.9%
0.5 | 19 | 1.1%
0.6 | 18 | 1.0%
0.7 | 24 | 1.4%
0.8 | 10 | 0.6%
0.9 | 6 | 0.3%
1.0 | 11 | 0.6%

Value | Count | Frequency (%)
561.7 | 1 | 0.1%
540.6 | 1 | 0.1%
530.9 | 1 | 0.1%
519.7 | 1 | 0.1%
515.2 | 1 | 0.1%
509.9 | 1 | 0.1%
502.8 | 1 | 0.1%
487.9 | 2 | 0.1%
482.0 | 1 | 0.1%
479.1 | 1 | 0.1%

연령표준화발생률 (age-standardized incidence rate)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 619
Distinct (%): 38.4%
Missing: 115
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.623043
Minimum: 0.2
Maximum: 686.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 0.5
Q1: 4.5
Median: 11.7
Q3: 36.525
95-th percentile: 131.03
Maximum: 686.9
Range: 686.7
Interquartile range (IQR): 32.025

Descriptive statistics

Standard deviation: 104.35027
Coefficient of variation (CV): 2.3920905
Kurtosis: 18.697302
Mean: 43.623043
Median Absolute Deviation (MAD): 8.5
Skewness: 4.3125338
Sum: 70233.1
Variance: 10888.978
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.5 | 36 | 2.1%
0.4 | 31 | 1.8%
0.6 | 25 | 1.4%
0.3 | 22 | 1.3%
3.5 | 19 | 1.1%
0.7 | 16 | 0.9%
3.6 | 15 | 0.9%
3.9 | 15 | 0.9%
3.4 | 15 | 0.9%
5.8 | 13 | 0.8%
Other values (609) | 1403 | 81.3%
(Missing) | 115 | 6.7%

Value | Count | Frequency (%)
0.2 | 11 | 0.6%
0.3 | 22 | 1.3%
0.4 | 31 | 1.8%
0.5 | 36 | 2.1%
0.6 | 25 | 1.4%
0.7 | 16 | 0.9%
0.8 | 6 | 0.3%
0.9 | 7 | 0.4%
1.0 | 12 | 0.7%
1.1 | 7 | 0.4%

Value | Count | Frequency (%)
686.9 | 1 | 0.1%
675.8 | 1 | 0.1%
672.5 | 1 | 0.1%
672.1 | 1 | 0.1%
660.3 | 1 | 0.1%
651.2 | 1 | 0.1%
646.9 | 1 | 0.1%
636.0 | 1 | 0.1%
631.9 | 1 | 0.1%
626.5 | 1 | 0.1%

Interactions

[Figures: 16 pairwise interaction plots (4 × 4 numeric variables)]

Correlations

[Figure: correlation heatmap]

(variable) | 발생연도 | 성별 | 국제질병분류 | 암종 | 발생자수 | 조발생률 | 연령표준화발생률
발생연도 | 1.000 | 0.000 | 0.000 | 0.000 | 0.188 | 0.308 | 0.067
성별 | 0.000 | 1.000 | 0.000 | 0.000 | 0.243 | 0.125 | 0.420
국제질병분류 | 0.000 | 0.000 | 1.000 | 1.000 | 0.696 | 0.744 | 0.740
암종 | 0.000 | 0.000 | 1.000 | 1.000 | 0.696 | 0.744 | 0.740
발생자수 | 0.188 | 0.243 | 0.696 | 0.696 | 1.000 | 0.938 | 0.794
조발생률 | 0.308 | 0.125 | 0.744 | 0.744 | 0.938 | 1.000 | 0.876
연령표준화발생률 | 0.067 | 0.420 | 0.740 | 0.740 | 0.794 | 0.876 | 1.000

[Figure: correlation heatmap, categorical variables]

(variable) | 성별 | 국제질병분류 | 암종
성별 | 1.000 | 0.000 | 0.000
국제질병분류 | 0.000 | 1.000 | 1.000
암종 | 0.000 | 1.000 | 1.000

[Figure: correlation heatmap]

(variable) | 발생연도 | 발생자수 | 조발생률 | 연령표준화발생률 | 성별 | 국제질병분류 | 암종
발생연도 | 1.000 | 0.175 | 0.192 | 0.060 | 0.000 | 0.000 | 0.000
발생자수 | 0.175 | 1.000 | 0.968 | 0.946 | 0.149 | 0.325 | 0.325
조발생률 | 0.192 | 0.968 | 1.000 | 0.983 | 0.075 | 0.368 | 0.368
연령표준화발생률 | 0.060 | 0.946 | 0.983 | 1.000 | 0.204 | 0.392 | 0.392
성별 | 0.000 | 0.149 | 0.075 | 0.204 | 1.000 | 0.000 | 0.000
국제질병분류 | 0.000 | 0.325 | 0.368 | 0.392 | 0.000 | 1.000 | 1.000
암종 | 0.000 | 0.325 | 0.368 | 0.392 | 0.000 | 1.000 | 1.000
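
The value 1.000 between 국제질병분류 and 암종 is what one would expect when every ICD code maps to exactly one cancer-type name. One common association measure for two categorical variables is Cramér's V. The sketch below implements it without bias correction on a small toy sample. Whether the report uses this exact statistic is an assumption. The toy rows are hypothetical, although the three code/name pairs are taken from the sample rows of the report.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n          # count expected under independence
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: three ICD codes, each paired with exactly one cancer-type name.
labels = {"C15": "식도", "C25": "췌장", "C32": "후두"}
codes = ["C15", "C25", "C32"] * 4
names = [labels[c] for c in codes]
sexes = ["남자", "여자"] * 6  # balanced across codes, so unrelated to them

v_code_name = cramers_v(codes, names)
v_code_sex = cramers_v(codes, sexes)
print(v_code_name, v_code_sex)
```

A one-to-one mapping yields V = 1, and a perfectly balanced design yields V = 0, mirroring the 1.000 and 0.000 entries in the matrices above.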

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
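
As a rough illustration of nullity correlation, the sketch below converts each column into a presence indicator (1 if present, 0 if missing) and takes the Pearson correlation of the indicators. The function names and the toy columns are hypothetical, and the report's own implementation may differ in detail.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; undefined if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def nullity_correlation(col_a, col_b):
    """Correlation between the presence indicators of two columns (None = missing)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    return pearson(a, b)


# Hypothetical toy columns: the first two are missing in exactly the same rows.
crude_rate = [216.0, 3.7, None, 8.2, None]
age_std_rate = [402.7, 6.6, None, 7.7, None]
other = [1.0, None, None, 2.0, 3.0]  # only partly overlapping missingness

same_rows = nullity_correlation(crude_rate, age_std_rate)
partial = nullity_correlation(crude_rate, other)
print(same_rows, partial)
```

Columns that are always missing together score 1, while a column with no missing values at all has a constant indicator and cannot be correlated this way.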

Sample

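First rows

Index | 발생연도 | 성별 | 국제질병분류 | 암종 | 발생자수 | 조발생률 | 연령표준화발생률
0 | 1999 | 남녀전체 (both sexes) | C00-C96 | 모든암 (all cancers) | 101857 | 216.0 | 402.7
1 | 1999 | 남녀전체 (both sexes) | C00-C14 | 입술, 구강 및 인두 (lip, oral cavity and pharynx) | 1739 | 3.7 | 6.6
2 | 1999 | 남녀전체 (both sexes) | C15 | 식도 (esophagus) | 1861 | 3.9 | 8.2
3 | 1999 | 남녀전체 (both sexes) | C16 | 위 (stomach) | 20901 | 44.3 | 86.0
4 | 1999 | 남녀전체 (both sexes) | C18-C20 | 대장 (colorectum) | 9780 | 20.7 | 40.8
5 | 1999 | 남녀전체 (both sexes) | C22 | 간 (liver) | 13262 | 28.1 | 52.4
6 | 1999 | 남녀전체 (both sexes) | C23-C24 | 담낭 및 기타담도 (gallbladder and other biliary tract) | 3047 | 6.5 | 14.0
7 | 1999 | 남녀전체 (both sexes) | C25 | 췌장 (pancreas) | 2614 | 5.5 | 11.7
8 | 1999 | 남녀전체 (both sexes) | C32 | 후두 (larynx) | 1101 | 2.3 | 4.8
9 | 1999 | 남녀전체 (both sexes) | C33-C34 | 폐 (lung) | 13230 | 28.1 | 59.8

Last rows

Index | 발생연도 | 성별 | 국제질병분류 | 암종 | 발생자수 | 조발생률 | 연령표준화발생률
1715 | 2021 | 여자 (female) | C62 | 고환 (testis) | 0 | <NA> | <NA>
1716 | 2021 | 여자 (female) | C64 | 신장 (kidney) | 2108 | 8.2 | 7.7
1717 | 2021 | 여자 (female) | C67 | 방광 (bladder) | 968 | 3.8 | 3.2
1718 | 2021 | 여자 (female) | C70-C72 | 뇌 및 중추신경계 (brain and central nervous system) | 939 | 3.6 | 3.4
1719 | 2021 | 여자 (female) | C73 | 갑상선 (thyroid) | 26532 | 103.1 | 104.0
1720 | 2021 | 여자 (female) | C81 | 호지킨림프종 (Hodgkin lymphoma) | 112 | 0.4 | 0.4
1721 | 2021 | 여자 (female) | C82-C86,C96 | 비호지킨림프종 (non-Hodgkin lymphoma) | 2434 | 9.5 | 8.7
1722 | 2021 | 여자 (female) | C90 | 다발성 골수종 (multiple myeloma) | 923 | 3.6 | 3.2
1723 | 2021 | 여자 (female) | C91-C95 | 백혈병 (leukemia) | 1686 | 6.5 | 6.2
1724 | 2021 | 여자 (female) | Re.C00-C96 | 기타 암 (other cancers) | 11995 | 46.6 | 41.2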
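
The dataset description states that rates are given per 100,000 persons. Assuming the usual definition of a crude rate (cases divided by population, times 100,000), the sample rows can be cross-checked against each other. The sketch below takes the first 1999 row as parsed from the sample (101857 cases, crude rate 216.0) to back out an implied population, then recomputes the crude rate for two other 1999 rows. The function names are illustrative, and the implied population is a derived estimate, not a figure from the dataset.

```python
def crude_rate(cases, population, per=100_000):
    """Crude incidence rate: cases per `per` persons."""
    return cases / population * per


def implied_population(cases, rate, per=100_000):
    """Population implied by a case count and a crude rate."""
    return cases / rate * per


# Row 0 (1999, both sexes, all cancers): 101857 cases, crude rate 216.0.
pop_1999 = implied_population(101857, 216.0)

# Rows 1 and 2 (1999, both sexes): 1739 and 1861 cases.
rate_row1 = round(crude_rate(1739, pop_1999), 1)
rate_row2 = round(crude_rate(1861, pop_1999), 1)
print(round(pop_1999), rate_row1, rate_row2)
```

The recomputed rates agree with the 3.7 and 3.9 shown in the sample, which suggests the columns were split correctly when the fused rows were separated.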