Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
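
As a quick consistency check on the two memory figures, the average record size can be recomputed from the total size and the row count. The sketch below assumes 1 KiB = 1024 bytes; the function name `average_record_size` is illustrative and is not part of ydata-profiling.

```python
def average_record_size(total_kib: float, n_rows: int) -> float:
    """Return the mean number of bytes per row, assuming 1 KiB = 1024 bytes."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return total_kib * 1024 / n_rows


# Figures taken from the dataset statistics above.
size_b = average_record_size(488.3, 10_000)
print(f"{size_b:.1f} B")
```

With the reported 488.3 KiB and 10000 observations this comes to roughly 50 bytes per record, in line with the value shown in the report.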

Variable types

Categorical: 3
Text: 1
Numeric: 1

Dataset

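Description:
1. Registration year (등록연도): 2018 to 2022.
2. Number of registrants per cancer diagnosis (KCD diagnosis code, up to 3 characters), including newly registered, re-registered, and duplicate cancers.
   * Cancer: every case that falls under Category 1 of the severe-illness special-calculation targets in [Attached Table 3] of the Ministry of Health and Welfare notice 「본인일부부담금 산정특례에 관한 기준」 (Standards for Special Calculation of Copayments).
   ** Deleted records are excluded.
3. Sex (성별): male, female.
4. Age (연령): full age as of year-end, in 5-year bands, except that age 0 and ages 1~4 are separate bands (0, 1~4, 5~9, …, 100 and over).
※ Provided in response to a request from a member of the public; extracted 2023-07-14.
URL: https://www.data.go.kr/data/15116694/fileData.do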
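
The age banding rule in the description (5-year bands, with age 0 and ages 1~4 kept separate, and a single open band from 100 upward) can be sketched as a small function. This is a minimal illustration, not the data provider's code: the name `age_band` is hypothetical, and the label `"100+"` for the top band is an assumption because the exact label used in the source file is not shown in this report.

```python
def age_band(age: int) -> str:
    """Map a full age in years to the band labels described above."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age == 0:
        return "0"
    if age <= 4:
        return "1~4"
    if age >= 100:
        return "100+"  # assumed label; the real file may spell this band differently
    low = (age // 5) * 5  # lower edge of the 5-year band
    return f"{low}~{low + 4}"


# Every age from 0 to 120 should fall into one of 22 bands,
# which matches the 22 distinct values reported for 연령.
bands = {age_band(a) for a in range(0, 121)}
print(len(bands))
```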

Alerts

등록자수 has 3492 (34.9%) zeros (Zeros)
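
The figure in this alert is simply the share of rows whose value is exactly zero. A minimal sketch of that calculation follows; `zero_share` is an illustrative helper, not a ydata-profiling function, and the toy column only reproduces the reported counts rather than the real data.

```python
def zero_share(values) -> tuple:
    """Return (number of zeros, percentage of zeros) for a numeric column."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100 * zeros / len(values)


# Toy column with the same zero count as the report: 3492 zeros out of 10000 rows.
toy_column = [0] * 3492 + [1] * 6508
n_zeros, pct_zeros = zero_share(toy_column)
print(n_zeros, f"{pct_zeros:.1f}%")
```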

Reproduction

Analysis started: 2023-12-12 21:16:22.533671
Analysis finished: 2023-12-12 21:16:23.029660
Duration: 0.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
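
The reported duration follows directly from the two timestamps. The sketch below recomputes it with the standard library; the helper name `run_duration_seconds` is illustrative, and the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed from the values shown above


def run_duration_seconds(started: str, finished: str) -> float:
    """Return the elapsed time in seconds between two report timestamps."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


elapsed = run_duration_seconds(
    "2023-12-12 21:16:22.533671",
    "2023-12-12 21:16:23.029660",
)
print(f"{elapsed:.1f} seconds")
```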

Variables

등록연도
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
2021 | 2017
2022 | 2014
2018 | 2004
2020 | 1990
2019 | 1975

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021
2nd row: 2022
3rd row: 2019
4th row: 2019
5th row: 2018

Common Values

Value | Count | Frequency (%)
2021 | 2017 | 20.2%
2022 | 2014 | 20.1%
2018 | 2004 | 20.0%
2020 | 1990 | 19.9%
2019 | 1975 | 19.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]
등록 상병(KCD 분류 기준)
Text

Distinct: 112
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 30000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
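
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point (`Lu` for Uppercase Letter, `Nd` for Decimal Number). The sketch below counts categories over the five sample codes listed in this section; `category_counts` is an illustrative helper, and script and block lookups are left out because the standard library does not provide them.

```python
import unicodedata
from collections import Counter


def category_counts(strings):
    """Count Unicode general categories over every character in the strings."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)


# Sample codes shown in this report; each is one letter followed by two digits.
sample_codes = ["C18", "C65", "C08", "D47", "C33"]
counts = category_counts(sample_codes)
print(counts["Lu"], counts["Nd"])
```

Each code contributes one uppercase letter and two decimal digits, which is the same 1:2 split the report shows for the full column (33.3% versus 66.7%).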

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C18
2nd row: C65
3rd row: C08
4th row: D47
5th row: C33
Value | Count | Frequency (%)
c88 | 111 | 1.1%
c81 | 106 | 1.1%
c66 | 105 | 1.1%
c20 | 104 | 1.0%
c68 | 104 | 1.0%
c07 | 102 | 1.0%
d05 | 102 | 1.0%
c11 | 101 | 1.0%
c85 | 101 | 1.0%
c46 | 100 | 1.0%
Other values (102) | 8964 | 89.6%

Most occurring characters

Value | Count | Frequency (%)
C | 7982 | 26.6%
0 | 2795 | 9.3%
4 | 2606 | 8.7%
3 | 2323 | 7.7%
D | 2018 | 6.7%
1 | 1933 | 6.4%
6 | 1899 | 6.3%
7 | 1867 | 6.2%
5 | 1781 | 5.9%
8 | 1695 | 5.7%
Other values (2) | 3101 | 10.3%
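
The "Other values (n)" row aggregates everything outside the ten most frequent entries. A minimal sketch of that roll-up is shown below; `top_n_with_other` is an illustrative name rather than a ydata-profiling function, and the counts for the characters `2` and `9` are taken from the per-category table elsewhere in this report.

```python
from collections import Counter


def top_n_with_other(counts: Counter, n: int = 10):
    """Return the n most common (label, count, percent) rows plus an 'Other values' row."""
    total = sum(counts.values())
    ranked = counts.most_common()
    rows = [(str(k), c, round(100 * c / total, 1)) for k, c in ranked[:n]]
    rest = ranked[n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


# Character counts as reported for the 3-character diagnosis codes.
char_counts = Counter({
    "C": 7982, "0": 2795, "4": 2606, "3": 2323, "D": 2018, "1": 1933,
    "6": 1899, "7": 1867, "5": 1781, "8": 1695, "2": 1656, "9": 1445,
})
rows = top_n_with_other(char_counts)
for label, count, pct in rows:
    print(label, count, f"{pct}%")
```

The twelve counts add up to the 30000 total characters, and the two characters outside the top ten account for the 3101 in the last row.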

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20000 | 66.7%
Uppercase Letter | 10000 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2795 | 14.0%
4 | 2606 | 13.0%
3 | 2323 | 11.6%
1 | 1933 | 9.7%
6 | 1899 | 9.5%
7 | 1867 | 9.3%
5 | 1781 | 8.9%
8 | 1695 | 8.5%
2 | 1656 | 8.3%
9 | 1445 | 7.2%

Uppercase Letter
Value | Count | Frequency (%)
C | 7982 | 79.8%
D | 2018 | 20.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 20000 | 66.7%
Latin | 10000 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2795 | 14.0%
4 | 2606 | 13.0%
3 | 2323 | 11.6%
1 | 1933 | 9.7%
6 | 1899 | 9.5%
7 | 1867 | 9.3%
5 | 1781 | 8.9%
8 | 1695 | 8.5%
2 | 1656 | 8.3%
9 | 1445 | 7.2%

Latin
Value | Count | Frequency (%)
C | 7982 | 79.8%
D | 2018 | 20.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 7982 | 26.6%
0 | 2795 | 9.3%
4 | 2606 | 8.7%
3 | 2323 | 7.7%
D | 2018 | 6.7%
1 | 1933 | 6.4%
6 | 1899 | 6.3%
7 | 1867 | 6.2%
5 | 1781 | 5.9%
8 | 1695 | 5.7%
Other values (2) | 3101 | 10.3%

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
5040 
4960 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 5040 | 50.4%
 | 4960 | 49.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

연령
Categorical

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
55~59 | 500
90~94 | 484
5~9 | 482
70~74 | 471
35~39 | 464
Other values (17) | 7599

Length

Max length: 6
Median length: 5
Mean length: 4.6809
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 75~79
2nd row: 90~94
3rd row: 60~64
4th row: 45~49
5th row: 45~49

Common Values

Value | Count | Frequency (%)
55~59 | 500 | 5.0%
90~94 | 484 | 4.8%
5~9 | 482 | 4.8%
70~74 | 471 | 4.7%
35~39 | 464 | 4.6%
95~99 | 463 | 4.6%
40~44 | 462 | 4.6%
1~4 | 460 | 4.6%
60~64 | 459 | 4.6%
20~24 | 458 | 4.6%
Other values (12) | 5297 | 53.0%

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
55~59 | 500 | 4.8%
90~94 | 484 | 4.6%
5~9 | 482 | 4.6%
70~74 | 471 | 4.5%
35~39 | 464 | 4.4%
95~99 | 463 | 4.4%
40~44 | 462 | 4.4%
1~4 | 460 | 4.4%
60~64 | 459 | 4.4%
20~24 | 458 | 4.4%
Other values (13) | 5726 | 54.9%

등록자수
Real number (ℝ)

ZEROS 

Distinct: 649
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.4386
Minimum: 0
Maximum: 5262
Zeros: 3492
Zeros (%): 34.9%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 3
Q3: 22
95-th percentile: 266
Maximum: 5262
Range: 5262
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 292.48421
Coefficient of variation (CV): 4.4023235
Kurtosis: 110.21405
Mean: 66.4386
Median Absolute Deviation (MAD): 3
Skewness: 9.4246897
Sum: 664386
Variance: 85547.014
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
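
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below recomputes them from the reported figures as a sanity check. `derived_stats` is an illustrative helper, and the check does not depend on whether the profiler used a sample or population standard deviation, which the report does not state.

```python
def derived_stats(n: int, total: float, std: float) -> dict:
    """Recompute mean, coefficient of variation, and variance from summary figures."""
    if n <= 0:
        raise ValueError("n must be positive")
    mean = total / n
    return {
        "mean": mean,
        "cv": std / mean,        # coefficient of variation
        "variance": std ** 2,    # variance as the square of the standard deviation
    }


# Reported values: 10000 observations, Sum 664386, Standard deviation 292.48421.
stats = derived_stats(n=10_000, total=664_386, std=292.48421)
for name, value in stats.items():
    print(name, round(value, 4))
```

The recomputed values agree with the reported mean, CV, and variance up to the rounding of the standard deviation. The mean (about 66.4) sitting far above the median (3) is consistent with the strong right skew the report lists.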

Common Values

Value | Count | Frequency (%)
0 | 3492 | 34.9%
1 | 772 | 7.7%
2 | 494 | 4.9%
3 | 385 | 3.9%
4 | 297 | 3.0%
5 | 266 | 2.7%
6 | 225 | 2.2%
7 | 188 | 1.9%
8 | 173 | 1.7%
9 | 156 | 1.6%
Other values (639) | 3552 | 35.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 3492 | 34.9%
1 | 772 | 7.7%
2 | 494 | 4.9%
3 | 385 | 3.9%
4 | 297 | 3.0%
5 | 266 | 2.7%
6 | 225 | 2.2%
7 | 188 | 1.9%
8 | 173 | 1.7%
9 | 156 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
5262 | 1 | < 0.1%
5189 | 1 | < 0.1%
5129 | 1 | < 0.1%
4911 | 1 | < 0.1%
4412 | 1 | < 0.1%
4398 | 1 | < 0.1%
4147 | 1 | < 0.1%
3967 | 1 | < 0.1%
3860 | 1 | < 0.1%
3837 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

 | 등록연도 | 성별 | 연령 | 등록자수
등록연도 | 1.000 | 0.000 | 0.000 | 0.000
성별 | 0.000 | 1.000 | 0.000 | 0.026
연령 | 0.000 | 0.000 | 1.000 | 0.180
등록자수 | 0.000 | 0.026 | 0.180 | 1.000

 | 등록연도 | 연령 | 성별
등록연도 | 1.000 | 0.000 | 0.000
연령 | 0.000 | 1.000 | 0.000
성별 | 0.000 | 0.000 | 1.000

 | 등록자수 | 등록연도 | 성별 | 연령
등록자수 | 1.000 | 0.000 | 0.020 | 0.068
등록연도 | 0.000 | 1.000 | 0.000 | 0.000
성별 | 0.020 | 0.000 | 1.000 | 0.000
연령 | 0.068 | 0.000 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

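Index | 등록연도 | 등록 상병(KCD 분류 기준) | 성별 | 연령 | 등록자수
15592 | 2021 | C18 |  | 75~79 | 1282
22305 | 2022 | C65 |  | 90~94 | 7
5293 | 2019 | C08 |  | 60~64 | 23
9778 | 2019 | D47 |  | 45~49 | 73
1330 | 2018 | C33 |  | 45~49 | 2
19308 | 2021 | D39 |  | 65~69 | 39
17575 | 2021 | C70 |  | 90~94 | 0
19845 | 2022 | C03 |  | 1~4 | 0
21818 | 2022 | C53 |  | 75~79 | 144
6176 | 2019 | C31 |  | 75~79 | 13

Index | 등록연도 | 등록 상병(KCD 분류 기준) | 성별 | 연령 | 등록자수
10773 | 2020 | C20 |  | 70~74 | 266
1442 | 2018 | C37 |  | 55~59 | 49
3463 | 2018 | C85 |  | 40~44 | 53
2516 | 2018 | C64 |  | 35~39 | 168
18020 | 2021 | C80 |  | 5~9 | 1
9285 | 2019 | D33 |  | 1~4 | 4
11462 | 2020 | C41 |  | 0 | 3
306 | 2018 | C06 |  | 95~99 | 2
4292 | 2018 | D09 |  | 5~9 | 0
17601 | 2021 | C71 |  | 1~4 | 7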