Overview

Dataset statistics

Number of variables: 9
Number of observations: 21
Missing cells: 97
Missing cells (%): 51.3%
Duplicate rows: 1
Duplicate rows (%): 4.8%
Total size in memory: 1.6 KiB
Average record size in memory: 78.3 B

Variable types

Unsupported: 2
Text: 5
Categorical: 1
Boolean: 1

Dataset

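Description: Distribution of major diseases among health insurance subscribers (건강보험 가입자의 주요질환 분포 현황)
Author: National Health Insurance Service (국민건강보험공단)
URL: https://www.vworld.kr/dtmk/dtmk_ntads_s002.do?dsId=30026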

Alerts

Unnamed: 5 has constant value "" (Constant)
Unnamed: 8 has constant value "" (Constant)
Dataset has 1 (4.8%) duplicate rows (Duplicates)
테이블정의서 ("table definition document") has 1 (4.8%) missing values (Missing)
Unnamed: 1 has 11 (52.4%) missing values (Missing)
Unnamed: 2 has 7 (33.3%) missing values (Missing)
Unnamed: 4 has 10 (47.6%) missing values (Missing)
Unnamed: 5 has 12 (57.1%) missing values (Missing)
Unnamed: 6 has 18 (85.7%) missing values (Missing)
Unnamed: 7 has 18 (85.7%) missing values (Missing)
Unnamed: 8 has 20 (95.2%) missing values (Missing)
테이블정의서 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
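
The overview totals can be cross-checked against the per-column missing counts in the alerts. The following is a minimal sketch and not part of the generated report: the dictionary and the helper name `missing_summary` are illustrative, and the zero for `Unnamed: 3` is taken from that variable's own profile, since it raises no alert.

```python
# Per-column missing counts as listed in the alerts above.
# "Unnamed: 3" raises no alert; its profile reports 0 missing values.
missing_per_column = {
    "테이블정의서": 1,
    "Unnamed: 1": 11,
    "Unnamed: 2": 7,
    "Unnamed: 3": 0,
    "Unnamed: 4": 10,
    "Unnamed: 5": 12,
    "Unnamed: 6": 18,
    "Unnamed: 7": 18,
    "Unnamed: 8": 20,
}
n_rows = 21  # "Number of observations" in the overview


def missing_summary(counts, n_rows):
    """Return (missing cells, missing cells as a percentage of all cells)."""
    total_cells = len(counts) * n_rows
    missing_cells = sum(counts.values())
    return missing_cells, round(100 * missing_cells / total_cells, 1)


missing_cells, missing_pct = missing_summary(missing_per_column, n_rows)
print(missing_cells, missing_pct)
```

With 9 variables and 21 observations there are 189 cells, so the 97 missing cells correspond to the 51.3% shown in the overview.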

Reproduction

Analysis started: 2024-04-22 00:14:15.998427
Analysis finished: 2024-04-22 00:14:16.688117
Duration: 0.69 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
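
As a small consistency check, the reported duration can be recomputed from the two timestamps. This sketch uses only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2024-04-22 00:14:15.998427", FMT)
finished = datetime.strptime("2024-04-22 00:14:16.688117", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```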

Variables

테이블정의서
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): 4.8%
Memory size: 300.0 B

Unnamed: 1
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 11
Missing (%): 52.4%
Memory size: 300.0 B

Length

Max length: 10
Median length: 9
Mean length: 8.3
Min length: 4

Characters and Unicode

Total characters: 83
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
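
To illustrate how such category counts can be derived, here is a minimal sketch using Python's standard `unicodedata` module. It is not the profiler's own implementation. The ten values are taken from this variable's frequency table, with upper-case spelling assumed from the sample rows (`SUM_CANCER` does not appear in the displayed sample, so its casing is an assumption), and `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

# The ten non-missing values of "Unnamed: 1" (casing assumed, see lead-in).
values = [
    "컬럼ID", "COORD_X_RE", "COORD_Y_RE", "COORD_XY", "SUM_HP",
    "SUM_DIB", "SUM_HYPER", "SUM_CANCER", "SUM_HEART", "SUM_STROKE",
]


def category_counts(strings):
    """Count characters by Unicode general category (two-letter code)."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)


counts = category_counts(values)
total = sum(counts.values())
for code, n in counts.most_common():
    print(code, n, f"{100 * n / total:.1f}%")
```

The two-letter codes map to the names used in the report: `Lu` is Uppercase Letter, `Pc` is Connector Punctuation, and `Lo` is Other Letter (here, the two Hangul syllables).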

Unique

Unique: 10
Unique (%): 100.0%

Sample

1st row: 컬럼ID ("column ID")
2nd row: COORD_X_RE
3rd row: COORD_Y_RE
4th row: COORD_XY
5th row: SUM_HP

Value | Count | Frequency (%)
컬럼id | 1 | 10.0%
coord_x_re | 1 | 10.0%
coord_y_re | 1 | 10.0%
coord_xy | 1 | 10.0%
sum_hp | 1 | 10.0%
sum_dib | 1 | 10.0%
sum_hyper | 1 | 10.0%
sum_cancer | 1 | 10.0%
sum_heart | 1 | 10.0%
sum_stroke | 1 | 10.0%

Most occurring characters

Value | Count | Frequency (%)
_ | 11 | 13.3%
R | 9 | 10.8%
O | 7 | 8.4%
S | 7 | 8.4%
U | 6 | 7.2%
E | 6 | 7.2%
M | 6 | 7.2%
D | 5 | 6.0%
C | 5 | 6.0%
Y | 3 | 3.6%
Other values (11) | 18 | 21.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 70 | 84.3%
Connector Punctuation | 11 | 13.3%
Other Letter | 2 | 2.4%

Most frequent character per category

Uppercase Letter

Value | Count | Frequency (%)
R | 9 | 12.9%
O | 7 | 10.0%
S | 7 | 10.0%
U | 6 | 8.6%
E | 6 | 8.6%
M | 6 | 8.6%
D | 5 | 7.1%
C | 5 | 7.1%
Y | 3 | 4.3%
H | 3 | 4.3%
Other values (8) | 13 | 18.6%

Other Letter

Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Connector Punctuation

Value | Count | Frequency (%)
_ | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 70 | 84.3%
Common | 11 | 13.3%
Hangul | 2 | 2.4%

Most frequent character per script

Latin

Value | Count | Frequency (%)
R | 9 | 12.9%
O | 7 | 10.0%
S | 7 | 10.0%
U | 6 | 8.6%
E | 6 | 8.6%
M | 6 | 8.6%
D | 5 | 7.1%
C | 5 | 7.1%
Y | 3 | 4.3%
H | 3 | 4.3%
Other values (8) | 13 | 18.6%

Hangul

Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Common

Value | Count | Frequency (%)
_ | 11 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 81 | 97.6%
Hangul | 2 | 2.4%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
_ | 11 | 13.6%
R | 9 | 11.1%
O | 7 | 8.6%
S | 7 | 8.6%
U | 6 | 7.4%
E | 6 | 7.4%
M | 6 | 7.4%
D | 5 | 6.2%
C | 5 | 6.2%
Y | 3 | 3.7%
Other values (9) | 16 | 19.8%

Hangul

Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Unnamed: 2
Text

MISSING 

Distinct: 14
Distinct (%): 100.0%
Missing: 7
Missing (%): 33.3%
Memory size: 300.0 B

Length

Max length: 31
Median length: 21.5
Mean length: 16
Min length: 3

Characters and Unicode

Total characters: 224
Distinct characters: 89
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 100.0%

Sample

1st row: 김재안 (a personal name)
2nd row: 건강보험통계 ("health insurance statistics")
3rd row: 건강보험 가입자의 주요질환 분포 현황 ("distribution of major diseases among health insurance subscribers")
4th row: 컬럼명 ("column name")
5th row: X좌표의 200m단위 grid ("200 m grid of the X coordinate")

Value | Count | Frequency (%)
환자수(상병기호 | 3 | 7.9%
200m단위 | 3 | 7.9%
grid | 3 | 7.9%
e10~e14 | 1 | 2.6%
고지혈증(이상지질혈증 | 1 | 2.6%
e78 | 1 | 2.6%
악성신생물 | 1 | 2.6%
환자수(암환자 | 1 | 2.6%
김재안 | 1 | 2.6%
당뇨병 | 1 | 2.6%
Other values (22) | 22 | 57.9%

Most occurring characters

ValueCountFrequency (%)
24
 
10.7%
9
 
4.0%
0 9
 
4.0%
) 8
 
3.6%
8
 
3.6%
( 8
 
3.6%
7
 
3.1%
2 6
 
2.7%
1 6
 
2.7%
I 6
 
2.7%
Other values (79) 133
59.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 120 | 53.6%
Decimal Number | 30 | 13.4%
Space Separator | 24 | 10.7%
Lowercase Letter | 15 | 6.7%
Uppercase Letter | 14 | 6.2%
Close Punctuation | 8 | 3.6%
Open Punctuation | 8 | 3.6%
Math Symbol | 4 | 1.8%
Dash Punctuation | 1 | 0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9
 
7.5%
8
 
6.7%
7
 
5.8%
5
 
4.2%
5
 
4.2%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
3
 
2.5%
Other values (54) 68
56.7%
Decimal Number
ValueCountFrequency (%)
0 9
30.0%
2 6
20.0%
1 6
20.0%
4 2
 
6.7%
6 2
 
6.7%
9 1
 
3.3%
3 1
 
3.3%
8 1
 
3.3%
5 1
 
3.3%
7 1
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
I 6
42.9%
E 3
21.4%
X 2
 
14.3%
Y 2
 
14.3%
V 1
 
7.1%
Lowercase Letter
ValueCountFrequency (%)
d 3
20.0%
i 3
20.0%
r 3
20.0%
g 3
20.0%
m 3
20.0%
Space Separator
ValueCountFrequency (%)
24
100.0%
Close Punctuation
ValueCountFrequency (%)
) 8
100.0%
Open Punctuation
ValueCountFrequency (%)
( 8
100.0%
Math Symbol
ValueCountFrequency (%)
~ 4
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 120 | 53.6%
Common | 75 | 33.5%
Latin | 29 | 12.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9
 
7.5%
8
 
6.7%
7
 
5.8%
5
 
4.2%
5
 
4.2%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
3
 
2.5%
Other values (54) 68
56.7%
Common
ValueCountFrequency (%)
24
32.0%
0 9
 
12.0%
) 8
 
10.7%
( 8
 
10.7%
2 6
 
8.0%
1 6
 
8.0%
~ 4
 
5.3%
4 2
 
2.7%
6 2
 
2.7%
9 1
 
1.3%
Other values (5) 5
 
6.7%
Latin
ValueCountFrequency (%)
I 6
20.7%
d 3
10.3%
i 3
10.3%
r 3
10.3%
g 3
10.3%
m 3
10.3%
E 3
10.3%
X 2
 
6.9%
Y 2
 
6.9%
V 1
 
3.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 120 | 53.6%
ASCII | 104 | 46.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
24
23.1%
0 9
 
8.7%
) 8
 
7.7%
( 8
 
7.7%
2 6
 
5.8%
1 6
 
5.8%
I 6
 
5.8%
~ 4
 
3.8%
d 3
 
2.9%
i 3
 
2.9%
Other values (15) 27
26.0%
Hangul
ValueCountFrequency (%)
9
 
7.5%
8
 
6.7%
7
 
5.8%
5
 
4.2%
5
 
4.2%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
3
 
2.5%
Other values (54) 68
56.7%

Unnamed: 3
Categorical

Distinct: 5
Distinct (%): 23.8%
Missing: 0
Missing (%): 0.0%
Memory size: 300.0 B

Length

Max length: 9
Median length: 7
Mean length: 5.3333333
Min length: 2

Unique

Unique: 3
Unique (%): 14.3%
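
The report distinguishes "distinct" (number of different values) from "unique" (values that occur exactly once). The sketch below reproduces both figures from the value counts shown for this variable. It is an illustration with the standard library, not the profiler's code, and it treats `<NA>` as an ordinary category label because the report lists 0 missing values for this column.

```python
from collections import Counter

# Value counts for "Unnamed: 3" as reported in its Common Values table.
values = ["<NA>"] * 10 + ["Numeric"] * 8 + ["테이블ID", "타입", "Character"]

counts = Counter(values)
n = len(values)

distinct = len(counts)                                # different values
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

print(f"Distinct: {distinct} ({100 * distinct / n:.1f}%)")
print(f"Unique: {unique} ({100 * unique / n:.1f}%)")
```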

Sample

1st row: <NA>
2nd row: 테이블ID ("table ID")
3rd row: <NA>
4th row: 타입 ("type")
5th row: Numeric

Common Values

Value | Count | Frequency (%)
<NA> | 10 | 47.6%
Numeric | 8 | 38.1%
테이블ID | 1 | 4.8%
타입 | 1 | 4.8%
Character | 1 | 4.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 10 | 47.6%
numeric | 8 | 38.1%
테이블id | 1 | 4.8%
타입 | 1 | 4.8%
character | 1 | 4.8%

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10
Missing (%): 47.6%
Memory size: 300.0 B

Unnamed: 5
Boolean

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 11.1%
Missing: 12
Missing (%): 57.1%
Memory size: 174.0 B

Value | Count | Frequency (%)
True | 9 | 42.9%
(Missing) | 12 | 57.1%

Unnamed: 6
Text

MISSING 

Distinct: 3
Distinct (%): 100.0%
Missing: 18
Missing (%): 85.7%
Memory size: 300.0 B

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Characters and Unicode

Total characters: 12
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 100.0%

Sample

1st row: 작성일 ("date written")
2nd row: 테이블명 ("table name")
3rd row: PK/FK

Value | Count | Frequency (%)
작성일 | 1 | 33.3%
테이블명 | 1 | 33.3%
pk/fk | 1 | 33.3%

Most occurring characters

ValueCountFrequency (%)
K 2
16.7%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
P 1
8.3%
/ 1
8.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7 | 58.3%
Uppercase Letter | 4 | 33.3%
Other Punctuation | 1 | 8.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Uppercase Letter
ValueCountFrequency (%)
K 2
50.0%
P 1
25.0%
F 1
25.0%
Other Punctuation
ValueCountFrequency (%)
/ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7 | 58.3%
Latin | 4 | 33.3%
Common | 1 | 8.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Latin
ValueCountFrequency (%)
K 2
50.0%
P 1
25.0%
F 1
25.0%
Common
ValueCountFrequency (%)
/ 1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7 | 58.3%
ASCII | 5 | 41.7%
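
A block count like the one above can be approximated by testing code point ranges. The sketch below is an assumption-laden illustration rather than the profiler's method: Python's standard library does not expose Unicode blocks, so `block_of` is a hypothetical helper that recognises only two ranges, ASCII (U+0000 to U+007F) and Hangul Syllables (U+AC00 to U+D7AF), and labels the latter "Hangul" to match the report.

```python
from collections import Counter

# The three non-missing values of "Unnamed: 6", from its sample rows.
values = ["작성일", "테이블명", "PK/FK"]


def block_of(ch):
    """Map a character to a coarse block label by code point range."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # the Hangul Syllables block
    return "Other"


block_counts = Counter(block_of(ch) for s in values for ch in s)
total_chars = sum(block_counts.values())
for block, n in block_counts.most_common():
    print(block, n, f"{100 * n / total_chars:.1f}%")
```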

Most frequent character per block

ASCII
ValueCountFrequency (%)
K 2
40.0%
P 1
20.0%
/ 1
20.0%
F 1
20.0%
Hangul
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%

Unnamed: 7
Text

MISSING 

Distinct: 3
Distinct (%): 100.0%
Missing: 18
Missing (%): 85.7%
Memory size: 300.0 B

Length

Max length: 11
Median length: 10
Mean length: 9.3333333
Min length: 7
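
The length statistics count characters (code points), not bytes, which matters for Hangul text. A minimal sketch with the standard `statistics` module, using the three values shown in this variable's sample rows, reproduces the figures; the variable names are illustrative.

```python
import statistics

# The three non-missing values of "Unnamed: 7", from its sample rows.
values = ["2020.02.05", "전국민 주요질환 통계", "Default"]

# len() on a Python str counts code points, so each Hangul syllable is 1.
lengths = [len(v) for v in values]

max_len = max(lengths)
median_len = statistics.median(lengths)
mean_len = statistics.mean(lengths)
min_len = min(lengths)

print(max_len, median_len, round(mean_len, 7), min_len)
```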

Characters and Unicode

Total characters: 28
Distinct characters: 21
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 100.0%

Sample

1st row: 2020.02.05
2nd row: 전국민 주요질환 통계 ("nationwide major disease statistics")
3rd row: Default

Value | Count | Frequency (%)
2020.02.05 | 1 | 20.0%
전국민 | 1 | 20.0%
주요질환 | 1 | 20.0%
통계 | 1 | 20.0%
default | 1 | 20.0%

Most occurring characters

ValueCountFrequency (%)
0 4
 
14.3%
2 3
 
10.7%
. 2
 
7.1%
2
 
7.1%
1
 
3.6%
l 1
 
3.6%
u 1
 
3.6%
a 1
 
3.6%
f 1
 
3.6%
e 1
 
3.6%
Other values (11) 11
39.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 9 | 32.1%
Decimal Number | 8 | 28.6%
Lowercase Letter | 6 | 21.4%
Other Punctuation | 2 | 7.1%
Space Separator | 2 | 7.1%
Uppercase Letter | 1 | 3.6%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Lowercase Letter
ValueCountFrequency (%)
l 1
16.7%
u 1
16.7%
a 1
16.7%
f 1
16.7%
e 1
16.7%
t 1
16.7%
Decimal Number
ValueCountFrequency (%)
0 4
50.0%
2 3
37.5%
5 1
 
12.5%
Other Punctuation
ValueCountFrequency (%)
. 2
100.0%
Space Separator
ValueCountFrequency (%)
2
100.0%
Uppercase Letter
ValueCountFrequency (%)
D 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 12 | 42.9%
Hangul | 9 | 32.1%
Latin | 7 | 25.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Latin
ValueCountFrequency (%)
l 1
14.3%
u 1
14.3%
a 1
14.3%
f 1
14.3%
e 1
14.3%
D 1
14.3%
t 1
14.3%
Common
ValueCountFrequency (%)
0 4
33.3%
2 3
25.0%
. 2
16.7%
2
16.7%
5 1
 
8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 19 | 67.9%
Hangul | 9 | 32.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4
21.1%
2 3
15.8%
. 2
10.5%
2
10.5%
l 1
 
5.3%
u 1
 
5.3%
a 1
 
5.3%
f 1
 
5.3%
e 1
 
5.3%
D 1
 
5.3%
Other values (2) 2
10.5%
Hangul
ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%

Unnamed: 8
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 20
Missing (%): 95.2%
Memory size: 300.0 B

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 9
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: 참조테이블명/비고 ("referenced table name / remarks")

Value | Count | Frequency (%)
참조테이블명/비고 | 1 | 100.0%

Most occurring characters

ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
/ 1
11.1%
1
11.1%
1
11.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8 | 88.9%
Other Punctuation | 1 | 11.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Other Punctuation
ValueCountFrequency (%)
/ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
Common | 1 | 11.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Common
ValueCountFrequency (%)
/ 1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
ASCII | 1 | 11.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
ASCII
ValueCountFrequency (%)
/ 1
100.0%

Correlations

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 6 | Unnamed: 7
Unnamed: 1 | 1.000 | 1.000 | 1.000 | NaN | NaN
Unnamed: 2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Unnamed: 3 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
Unnamed: 6 | NaN | 1.000 | 0.000 | 1.000 | 1.000
Unnamed: 7 | NaN | 1.000 | 0.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
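
Nullity correlation is commonly computed as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The sketch below follows that common definition and is an assumption about the method, not a copy of the profiler's code. The row positions come from the sample table in this report: `Unnamed: 6` and `Unnamed: 7` are filled in rows 0, 1 and 3, and `Unnamed: 8` only in row 3. Row 10 is not displayed there and is assumed missing for all three columns, which agrees with their missing counts. The helper names are hypothetical.

```python
from math import sqrt

N_ROWS = 21

# Row indices where each column holds a value (see lead-in for the source).
present_rows = {
    "Unnamed: 6": {0, 1, 3},
    "Unnamed: 7": {0, 1, 3},
    "Unnamed: 8": {3},
}


def indicator(rows_present, n_rows=N_ROWS):
    """1 where the value is present, 0 where it is missing."""
    return [1 if i in rows_present else 0 for i in range(n_rows)]


def pearson(xs, ys):
    """Plain Pearson correlation; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def nullity_correlation(col_a, col_b):
    return pearson(indicator(present_rows[col_a]), indicator(present_rows[col_b]))


corr_6_7 = nullity_correlation("Unnamed: 6", "Unnamed: 7")
corr_6_8 = nullity_correlation("Unnamed: 6", "Unnamed: 8")
print(round(corr_6_7, 3), round(corr_6_8, 3))
```

Columns that are filled in exactly the same rows, such as `Unnamed: 6` and `Unnamed: 7`, have a nullity correlation of 1; `Unnamed: 8` overlaps only partly with them and scores about 0.55 under this definition.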

Sample

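First rows

Index | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 작성자 | <NA> | 김재안 | <NA> | NaN | <NA> | 작성일 | 2020.02.05 | <NA>
1 | 주제영역명 | <NA> | 건강보험통계 | 테이블ID | NHIS_2018_SICK | <NA> | 테이블명 | 전국민 주요질환 통계 | <NA>
2 | 테이블설명 | <NA> | 건강보험 가입자의 주요질환 분포 현황 | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
3 | No | 컬럼ID | 컬럼명 | 타입 | 길이(Byte) | <NA> | PK/FK | Default | 참조테이블명/비고
4 | 1 | COORD_X_RE | X좌표의 200m단위 grid | Numeric | 8 | Y | <NA> | <NA> | <NA>
5 | 2 | COORD_Y_RE | Y좌표의 200m단위 grid | Numeric | 8 | Y | <NA> | <NA> | <NA>
6 | 3 | COORD_XY | X-Y좌표의 200m단위 grid | Character | 22 | Y | <NA> | <NA> | <NA>
7 | 4 | SUM_HP | 고혈압 환자 수(상병기호 I10~I15) | Numeric | 8 | Y | <NA> | <NA> | <NA>
8 | 5 | SUM_DIB | 당뇨병 환자수(상병기호 E10~E14) | Numeric | 8 | Y | <NA> | <NA> | <NA>
9 | 6 | SUM_HYPER | 고지혈증(이상지질혈증) 환자수(상병기호 E78) | Numeric | 8 | Y | <NA> | <NA> | <NA>

Last rows

Index | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
11 | 8 | SUM_HEART | 심근경색 환자수(상병기호 I21~I22) | Numeric | 8 | Y | <NA> | <NA> | <NA>
12 | 9 | SUM_STROKE | 뇌졸중 환자수(I60~I64) | Numeric | 8 | Y | <NA> | <NA> | <NA>
13 | 10 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
14 | 11 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
15 | 12 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
16 | 13 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
17 | 14 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
18 | 인덱스명 | <NA> | 인덱스키 | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
19 | NaN | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
20 | 업무규칙 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>

English glosses for the Korean cell values (the values themselves are kept verbatim because the statistics above are computed on them):
테이블정의서: table definition document; 작성자: author; 김재안: a personal name; 작성일: date written; 주제영역명: subject area name; 건강보험통계: health insurance statistics; 테이블ID: table ID; 테이블명: table name; 전국민 주요질환 통계: nationwide major disease statistics; 테이블설명: table description; 건강보험 가입자의 주요질환 분포 현황: distribution of major diseases among health insurance subscribers; 컬럼ID: column ID; 컬럼명: column name; 타입: type; 길이(Byte): length (bytes); 참조테이블명/비고: referenced table name / remarks; X좌표의 200m단위 grid: 200 m grid of the X coordinate (likewise for Y and X-Y); 고혈압 환자 수(상병기호 I10~I15): number of hypertension patients (diagnosis codes I10 to I15); 당뇨병 환자수: number of diabetes patients; 고지혈증(이상지질혈증) 환자수: number of hyperlipidemia (dyslipidemia) patients; 심근경색 환자수: number of myocardial infarction patients; 뇌졸중 환자수: number of stroke patients; 인덱스명: index name; 인덱스키: index key; 업무규칙: business rules.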

Duplicate rows

Most frequently occurring

Index | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 7
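
The duplicate table omits the two unsupported columns (테이블정의서 and Unnamed: 4) and groups the remaining seven. One plausible reading, consistent with the numbers in this report, is that "Duplicate rows: 1" counts distinct repeated row patterns while "# duplicates" counts the rows sharing a pattern. The sketch below illustrates that reading with the last ten sample rows (indices 11 to 20); it is not the profiler's implementation, `None` stands in for `<NA>`, and every earlier row shown in the sample contains at least one value, so none of them can match the all-missing pattern.

```python
from collections import Counter

NA = None
ALL_NA = (NA,) * 7

# Rows 11-20 of the sample, restricted to the seven supported columns:
# Unnamed: 1, 2, 3, 5, 6, 7, 8.
last_rows = [
    ("SUM_HEART", "심근경색 환자수(상병기호 I21~I22)", "Numeric", "Y", NA, NA, NA),
    ("SUM_STROKE", "뇌졸중 환자수(I60~I64)", "Numeric", "Y", NA, NA, NA),
    ALL_NA, ALL_NA, ALL_NA, ALL_NA, ALL_NA,   # rows 13-17
    (NA, "인덱스키", NA, NA, NA, NA, NA),       # row 18
    ALL_NA,                                    # row 19
    ALL_NA,                                    # row 20
]

pattern_counts = Counter(last_rows)
duplicated = {row: n for row, n in pattern_counts.items() if n > 1}

print(len(duplicated), duplicated[ALL_NA])
```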