Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 4
Missing cells (%): < 0.1%
Duplicate rows: 22
Duplicate rows (%): 0.2%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
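
The derived figures above can be checked with a few lines of arithmetic. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations), that the duplicate percentage is relative to the number of observations, and that 1 KiB is 1024 bytes. These are assumptions about how the report computes them, not something the report states.

```python
# Figures copied from the dataset statistics above.
n_variables = 3
n_observations = 10_000
missing_cells = 4
duplicate_rows = 22
total_size_kib = 312.5

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.3f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing-cell share is well below 0.1%, which is consistent with the report showing "< 0.1%" rather than an exact figure.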

Variable types

Text: 3

Dataset

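Description: Code master information built in the database of the Health Insurance Review & Assessment Service (건강보험심사평가원)
Author: Health Insurance Review & Assessment Service (건강보험심사평가원)
URL: https://www.data.go.kr/data/15067468/fileData.do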

Alerts

Dataset has 22 (0.2%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-12 05:23:22.341138
Analysis finished: 2023-12-12 05:23:23.237438
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

코드유형 (code type)
Text

Distinct: 2806
Distinct (%): 28.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 24
Mean length: 14.3301
Min length: 4

Characters and Unicode

Total characters: 143301
Distinct characters: 116
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
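
As a rough illustration of the per-character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation. The standard library returns two-letter codes (`Lu` for Uppercase Letter, `Pc` for Connector Punctuation, `Lo` for Other Letter, `Nd` for Decimal Number) and does not expose script or block names, so only categories are shown. The helper name `category_counts` is made up for this example.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two values taken from the sample rows of this variable.
sample = ["DUR_CVAPL_PRS_DIV_CD", "PPR_REFM_ERR_CD"]
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```

On these two values only uppercase letters and underscores appear, which mirrors the dominance of Uppercase Letter and Connector Punctuation in the tables for this variable.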

Unique

Unique: 1323
Unique (%): 13.2%
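
The report lists "Distinct" and "Unique" separately. A minimal sketch of the difference, assuming "distinct" means the number of different values and "unique" means the number of values that occur exactly once (this matches the usual reading of these fields but is stated here as an assumption). The function name and the toy list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Toy data, not taken from the dataset.
values = ["ERR_CD", "ERR_CD", "SCT_CD", "WK_STAT_CD", "SCT_CD", "IX_CD"]
n_distinct, n_unique = distinct_and_unique(values)
print(n_distinct, n_unique)
```

With this reading, the unique count can never exceed the distinct count, which holds for all three variables in this report.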

Sample

1st row: DUR_CVAPL_PRS_DIV_CD
2nd row: PPR_REFM_ERR_CD
3rd row: CVAPL_APL_STAT_CD
4th row: DATA_TRSMRCV_CD
5th row: PAY_CND_TY_CD

Value | Count | Frequency (%)
err_cd | 328 | 3.3%
extr_snd_univ_cd | 304 | 3.0%
fct_model_dtl_cd | 185 | 1.8%
drg_exm_cd | 165 | 1.7%
ppr_refm_err_cd | 142 | 1.4%
refm_czitm_cd | 139 | 1.4%
fct_model_cd | 86 | 0.9%
bm_diag_cd | 86 | 0.9%
mdiv_dstr_cd | 84 | 0.8%
sct_cd | 83 | 0.8%
Other values (2796) | 8398 | 84.0%
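
The frequency tables in this report all follow the same pattern: the ten most common values, then a single "Other values (n)" row that aggregates the rest. A minimal sketch of that layout, using a hypothetical helper `frequency_table` and toy input rather than the profiler's own code:

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Return (value, count, percent) rows for the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    rows = [(value, count, 100 * count / total) for value, count in top]
    n_rest = len(counts) - len(top)
    if n_rest > 0:
        rest_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_rest})", rest_count, 100 * rest_count / total))
    return rows


# Toy data, not taken from the dataset.
values = ["err_cd"] * 3 + ["sct_cd"] * 2 + ["ix_cd", "clfy_cd"]
rows = frequency_table(values, top_n=2)
for value, count, pct in rows:
    print(f"{value} | {count} | {pct:.1f}%")
```

The number in parentheses is the count of remaining distinct values, which is why 2796 in the table above equals the 2806 distinct values minus the 10 that are listed.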

Most occurring characters

Value | Count | Frequency (%)
_ | 27964 | 19.5%
D | 16601 | 11.6%
C | 16223 | 11.3%
T | 11556 | 8.1%
R | 8661 | 6.0%
P | 6990 | 4.9%
S | 6921 | 4.8%
M | 5612 | 3.9%
E | 4713 | 3.3%
A | 4373 | 3.1%
Other values (106) | 33687 | 23.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 112032 | 78.2%
Connector Punctuation | 27964 | 19.5%
Other Letter | 3151 | 2.2%
Decimal Number | 154 | 0.1%

Most frequent character per category

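Other Letter
Value | Count | Frequency (%)
(not preserved) | 422 | 13.4%
(not preserved) | 422 | 13.4%
(not preserved) | 224 | 7.1%
(not preserved) | 131 | 4.2%
(not preserved) | 126 | 4.0%
(not preserved) | 100 | 3.2%
(not preserved) | 100 | 3.2%
(not preserved) | 92 | 2.9%
(not preserved) | 88 | 2.8%
(not preserved) | 88 | 2.8%
Other values (70) | 1358 | 43.1%

Uppercase Letter
Value | Count | Frequency (%)
D | 16601 | 14.8%
C | 16223 | 14.5%
T | 11556 | 10.3%
R | 8661 | 7.7%
P | 6990 | 6.2%
S | 6921 | 6.2%
M | 5612 | 5.0%
E | 4713 | 4.2%
A | 4373 | 3.9%
N | 4291 | 3.8%
Other values (16) | 26091 | 23.3%

Decimal Number
Value | Count | Frequency (%)
1 | 52 | 33.8%
4 | 32 | 20.8%
2 | 21 | 13.6%
9 | 14 | 9.1%
5 | 13 | 8.4%
3 | 10 | 6.5%
0 | 8 | 5.2%
7 | 3 | 1.9%
8 | 1 | 0.6%

Connector Punctuation
Value | Count | Frequency (%)
_ | 27964 | 100.0%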

Most occurring scripts

Value | Count | Frequency (%)
Latin | 112032 | 78.2%
Common | 28118 | 19.6%
Hangul | 3151 | 2.2%

Most frequent character per script

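Hangul
Value | Count | Frequency (%)
(not preserved) | 422 | 13.4%
(not preserved) | 422 | 13.4%
(not preserved) | 224 | 7.1%
(not preserved) | 131 | 4.2%
(not preserved) | 126 | 4.0%
(not preserved) | 100 | 3.2%
(not preserved) | 100 | 3.2%
(not preserved) | 92 | 2.9%
(not preserved) | 88 | 2.8%
(not preserved) | 88 | 2.8%
Other values (70) | 1358 | 43.1%

Latin
Value | Count | Frequency (%)
D | 16601 | 14.8%
C | 16223 | 14.5%
T | 11556 | 10.3%
R | 8661 | 7.7%
P | 6990 | 6.2%
S | 6921 | 6.2%
M | 5612 | 5.0%
E | 4713 | 4.2%
A | 4373 | 3.9%
N | 4291 | 3.8%
Other values (16) | 26091 | 23.3%

Common
Value | Count | Frequency (%)
_ | 27964 | 99.5%
1 | 52 | 0.2%
4 | 32 | 0.1%
2 | 21 | 0.1%
9 | 14 | < 0.1%
5 | 13 | < 0.1%
3 | 10 | < 0.1%
0 | 8 | < 0.1%
7 | 3 | < 0.1%
8 | 1 | < 0.1%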

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 140150 | 97.8%
Hangul | 3151 | 2.2%

Most frequent character per block

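ASCII
Value | Count | Frequency (%)
_ | 27964 | 20.0%
D | 16601 | 11.8%
C | 16223 | 11.6%
T | 11556 | 8.2%
R | 8661 | 6.2%
P | 6990 | 5.0%
S | 6921 | 4.9%
M | 5612 | 4.0%
E | 4713 | 3.4%
A | 4373 | 3.1%
Other values (26) | 30536 | 21.8%

Hangul
Value | Count | Frequency (%)
(not preserved) | 422 | 13.4%
(not preserved) | 422 | 13.4%
(not preserved) | 224 | 7.1%
(not preserved) | 131 | 4.2%
(not preserved) | 126 | 4.0%
(not preserved) | 100 | 3.2%
(not preserved) | 100 | 3.2%
(not preserved) | 92 | 2.9%
(not preserved) | 88 | 2.8%
(not preserved) | 88 | 2.8%
Other values (70) | 1358 | 43.1%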

코드 (code)
Text

Distinct: 3207
Distinct (%): 32.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 8
Mean length: 4.2084
Min length: 1

Characters and Unicode

Total characters: 42084
Distinct characters: 58
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 2588
Unique (%): 25.9%

Sample

1st row: 10
2nd row: 68601
3rd row: 9
4th row: 3
5th row: G

Value | Count | Frequency (%)
1 | 857 | 8.6%
2 | 715 | 7.1%
3 | 496 | 5.0%
4 | 348 | 3.5%
5 | 261 | 2.6%
0 | 212 | 2.1%
6 | 168 | 1.7%
9 | 151 | 1.5%
7 | 142 | 1.4%
10 | 112 | 1.1%
Other values (3180) | 6538 | 65.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 15939 | 37.9%
0 | 5570 | 13.2%
1 | 4423 | 10.5%
2 | 3106 | 7.4%
3 | 2059 | 4.9%
4 | 1560 | 3.7%
5 | 1337 | 3.2%
6 | 1037 | 2.5%
9 | 943 | 2.2%
7 | 898 | 2.1%
Other values (48) | 5212 | 12.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 21780 | 51.8%
Space Separator | 15939 | 37.9%
Uppercase Letter | 4266 | 10.1%
Currency Symbol | 60 | 0.1%
Lowercase Letter | 26 | 0.1%
Connector Punctuation | 5 | < 0.1%
Other Punctuation | 5 | < 0.1%
Other Letter | 2 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 407 | 9.5%
A | 390 | 9.1%
N | 344 | 8.1%
R | 319 | 7.5%
D | 289 | 6.8%
B | 278 | 6.5%
M | 264 | 6.2%
H | 222 | 5.2%
U | 185 | 4.3%
X | 180 | 4.2%
Other values (16) | 1388 | 32.5%

Lowercase Letter
Value | Count | Frequency (%)
a | 4 | 15.4%
c | 4 | 15.4%
b | 3 | 11.5%
i | 3 | 11.5%
n | 2 | 7.7%
r | 2 | 7.7%
g | 2 | 7.7%
p | 1 | 3.8%
h | 1 | 3.8%
m | 1 | 3.8%
Other values (3) | 3 | 11.5%

Decimal Number
Value | Count | Frequency (%)
0 | 5570 | 25.6%
1 | 4423 | 20.3%
2 | 3106 | 14.3%
3 | 2059 | 9.5%
4 | 1560 | 7.2%
5 | 1337 | 6.1%
6 | 1037 | 4.8%
9 | 943 | 4.3%
7 | 898 | 4.1%
8 | 847 | 3.9%

Other Punctuation
Value | Count | Frequency (%)
* | 2 | 40.0%
# | 2 | 40.0%
, | 1 | 20.0%

Other Letter
Value | Count | Frequency (%)
(not preserved) | 1 | 50.0%
(not preserved) | 1 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 15939 | 100.0%

Currency Symbol
Value | Count | Frequency (%)
$ | 60 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 5 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 37790 | 89.8%
Latin | 4292 | 10.2%
Hangul | 2 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 407 | 9.5%
A | 390 | 9.1%
N | 344 | 8.0%
R | 319 | 7.4%
D | 289 | 6.7%
B | 278 | 6.5%
M | 264 | 6.2%
H | 222 | 5.2%
U | 185 | 4.3%
X | 180 | 4.2%
Other values (29) | 1414 | 32.9%

Common
Value | Count | Frequency (%)
(space) | 15939 | 42.2%
0 | 5570 | 14.7%
1 | 4423 | 11.7%
2 | 3106 | 8.2%
3 | 2059 | 5.4%
4 | 1560 | 4.1%
5 | 1337 | 3.5%
6 | 1037 | 2.7%
9 | 943 | 2.5%
7 | 898 | 2.4%
Other values (7) | 918 | 2.4%

Hangul
Value | Count | Frequency (%)
(not preserved) | 1 | 50.0%
(not preserved) | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42082 | > 99.9%
Hangul | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 15939 | 37.9%
0 | 5570 | 13.2%
1 | 4423 | 10.5%
2 | 3106 | 7.4%
3 | 2059 | 4.9%
4 | 1560 | 3.7%
5 | 1337 | 3.2%
6 | 1037 | 2.5%
9 | 943 | 2.2%
7 | 898 | 2.1%
Other values (46) | 5210 | 12.4%

Hangul
Value | Count | Frequency (%)
(not preserved) | 1 | 50.0%
(not preserved) | 1 | 50.0%

코드명 (code name)
Text

Distinct: 7626
Distinct (%): 76.3%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 118
Median length: 105
Mean length: 8.6283513
Min length: 1
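
The mean lengths reported for the three variables are consistent with dividing the total character count by the number of non-missing values. The sketch below checks this using figures from this report. The column names are taken from the sample table header, and the "non-missing denominator" rule is an inference from the numbers, not a documented formula.

```python
n_observations = 10_000

# (total characters, missing values) per variable, copied from the report.
columns = {
    "코드유형": (143301, 0),
    "코드": (42084, 0),
    "코드명": (86249, 4),
}

# Assumption: mean length = total characters / number of non-missing values.
mean_lengths = {
    name: total_chars / (n_observations - n_missing)
    for name, (total_chars, n_missing) in columns.items()
}

for name, mean_length in mean_lengths.items():
    print(f"{name}: {mean_length:.7f}")
```

For the variable with 4 missing values, dividing by 10000 instead of 9996 would give 8.6249, which does not match the reported 8.6283513.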

Characters and Unicode

Total characters: 86249
Distinct characters: 753
Distinct categories: 16
Distinct scripts: 4
Distinct blocks: 8

Unique

Unique: 6715
Unique (%): 67.2%

Sample

1st row: 우편
2nd row: 사용기간_19일반오류
3rd row: 점검오류 9차
4th row: UNIX(대구지원2)
5th row: 급여특정내역 or 급여연령

Value | Count | Frequency (%)
인사 | 304 | 1.9%
기타 | 213 | 1.3%
(not preserved) | 145 | 0.9%
(not preserved) | 73 | 0.5%
(not preserved) | 68 | 0.4%
또는 | 68 | 0.4%
관련 | 58 | 0.4%
조정 | 58 | 0.4%
해당사항없음 | 57 | 0.4%
경우 | 56 | 0.4%
Other values (9103) | 14873 | 93.1%

Most occurring characters

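Value | Count | Frequency (%)
(space) | 5994 | 6.9%
_ | 1874 | 2.2%
(not preserved) | 1652 | 1.9%
(not preserved) | 1647 | 1.9%
(not preserved) | 1337 | 1.6%
(not preserved) | 1323 | 1.5%
(not preserved) | 1122 | 1.3%
(not preserved) | 1067 | 1.2%
) | 1050 | 1.2%
( | 1049 | 1.2%
Other values (743) | 68134 | 79.0%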

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66518 | 77.1%
Space Separator | 5994 | 6.9%
Decimal Number | 3148 | 3.6%
Lowercase Letter | 2642 | 3.1%
Uppercase Letter | 2345 | 2.7%
Connector Punctuation | 1874 | 2.2%
Close Punctuation | 1103 | 1.3%
Open Punctuation | 1102 | 1.3%
Other Punctuation | 868 | 1.0%
Dash Punctuation | 402 | 0.5%
Other values (6) | 253 | 0.3%

Most frequent character per category

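Other Letter
Value | Count | Frequency (%)
(not preserved) | 1652 | 2.5%
(not preserved) | 1647 | 2.5%
(not preserved) | 1337 | 2.0%
(not preserved) | 1323 | 2.0%
(not preserved) | 1122 | 1.7%
(not preserved) | 1067 | 1.6%
(not preserved) | 1025 | 1.5%
(not preserved) | 950 | 1.4%
(not preserved) | 874 | 1.3%
(not preserved) | 874 | 1.3%
Other values (639) | 54647 | 82.2%

Lowercase Letter
Value | Count | Frequency (%)
e | 279 | 10.6%
o | 258 | 9.8%
i | 233 | 8.8%
a | 200 | 7.6%
r | 194 | 7.3%
s | 185 | 7.0%
n | 173 | 6.5%
l | 158 | 6.0%
t | 156 | 5.9%
m | 144 | 5.5%
Other values (18) | 662 | 25.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 193 | 8.2%
T | 183 | 7.8%
R | 175 | 7.5%
I | 175 | 7.5%
D | 162 | 6.9%
A | 147 | 6.3%
Z | 133 | 5.7%
P | 126 | 5.4%
E | 125 | 5.3%
S | 122 | 5.2%
Other values (16) | 804 | 34.3%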

Other Punctuation
Value | Count | Frequency (%)
, | 381 | 43.9%
. | 151 | 17.4%
/ | 101 | 11.6%
% | 79 | 9.1%
: | 57 | 6.6%
' | 45 | 5.2%
· | 39 | 4.5%
* | 7 | 0.8%
& | 5 | 0.6%
(not preserved) | 1 | 0.1%
Other values (2) | 2 | 0.2%

Decimal Number
Value | Count | Frequency (%)
1 | 710 | 22.6%
2 | 591 | 18.8%
0 | 587 | 18.6%
5 | 234 | 7.4%
3 | 224 | 7.1%
4 | 220 | 7.0%
8 | 182 | 5.8%
7 | 149 | 4.7%
6 | 135 | 4.3%
9 | 116 | 3.7%

Math Symbol
Value | Count | Frequency (%)
~ | 92 | 42.4%
+ | 44 | 20.3%
> | 30 | 13.8%
< | 21 | 9.7%
= | 19 | 8.8%
(not preserved) | 6 | 2.8%
× | 4 | 1.8%
(not preserved) | 1 | 0.5%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 10 | 41.7%
(not preserved) | 9 | 37.5%
(not preserved) | 2 | 8.3%
(not preserved) | 1 | 4.2%
(not preserved) | 1 | 4.2%
(not preserved) | 1 | 4.2%

Close Punctuation
Value | Count | Frequency (%)
) | 1050 | 95.2%
] | 52 | 4.7%
(not preserved) | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 1049 | 95.2%
[ | 52 | 4.7%
(not preserved) | 1 | 0.1%

Initial Punctuation
Value | Count | Frequency (%)
(not preserved) | 3 | 75.0%
(not preserved) | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 5994 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1874 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 402 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
^ | 4 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(not preserved) | 3 | 100.0%

Currency Symbol
Value | Count | Frequency (%)
$ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66518 | 77.1%
Common | 14720 | 17.1%
Latin | 5008 | 5.8%
Greek | 3 | < 0.1%

Most frequent character per script

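Hangul
Value | Count | Frequency (%)
(not preserved) | 1652 | 2.5%
(not preserved) | 1647 | 2.5%
(not preserved) | 1337 | 2.0%
(not preserved) | 1323 | 2.0%
(not preserved) | 1122 | 1.7%
(not preserved) | 1067 | 1.6%
(not preserved) | 1025 | 1.5%
(not preserved) | 950 | 1.4%
(not preserved) | 874 | 1.3%
(not preserved) | 874 | 1.3%
Other values (639) | 54647 | 82.2%

Latin
Value | Count | Frequency (%)
e | 279 | 5.6%
o | 258 | 5.2%
i | 233 | 4.7%
a | 200 | 4.0%
r | 194 | 3.9%
C | 193 | 3.9%
s | 185 | 3.7%
T | 183 | 3.7%
R | 175 | 3.5%
I | 175 | 3.5%
Other values (48) | 2933 | 58.6%

Common
Value | Count | Frequency (%)
(space) | 5994 | 40.7%
_ | 1874 | 12.7%
) | 1050 | 7.1%
( | 1049 | 7.1%
1 | 710 | 4.8%
2 | 591 | 4.0%
0 | 587 | 4.0%
- | 402 | 2.7%
, | 381 | 2.6%
5 | 234 | 1.6%
Other values (34) | 1848 | 12.6%

Greek
Value | Count | Frequency (%)
δ | 2 | 66.7%
β | 1 | 33.3%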

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66510 | 77.1%
ASCII | 19644 | 22.8%
None | 48 | 0.1%
Number Forms | 24 | < 0.1%
Compat Jamo | 8 | < 0.1%
Punctuation | 8 | < 0.1%
Arrows | 6 | < 0.1%
Math Operators | 1 | < 0.1%

Most frequent character per block

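ASCII
Value | Count | Frequency (%)
(space) | 5994 | 30.5%
_ | 1874 | 9.5%
) | 1050 | 5.3%
( | 1049 | 5.3%
1 | 710 | 3.6%
2 | 591 | 3.0%
0 | 587 | 3.0%
- | 402 | 2.0%
, | 381 | 1.9%
e | 279 | 1.4%
Other values (76) | 6727 | 34.2%

Hangul
Value | Count | Frequency (%)
(not preserved) | 1652 | 2.5%
(not preserved) | 1647 | 2.5%
(not preserved) | 1337 | 2.0%
(not preserved) | 1323 | 2.0%
(not preserved) | 1122 | 1.7%
(not preserved) | 1067 | 1.6%
(not preserved) | 1025 | 1.5%
(not preserved) | 950 | 1.4%
(not preserved) | 874 | 1.3%
(not preserved) | 874 | 1.3%
Other values (636) | 54639 | 82.2%

None
Value | Count | Frequency (%)
· | 39 | 81.2%
× | 4 | 8.3%
δ | 2 | 4.2%
(not preserved) | 1 | 2.1%
(not preserved) | 1 | 2.1%
β | 1 | 2.1%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 10 | 41.7%
(not preserved) | 9 | 37.5%
(not preserved) | 2 | 8.3%
(not preserved) | 1 | 4.2%
(not preserved) | 1 | 4.2%
(not preserved) | 1 | 4.2%

Compat Jamo
Value | Count | Frequency (%)
(not preserved) | 6 | 75.0%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%

Arrows
Value | Count | Frequency (%)
(not preserved) | 6 | 100.0%

Punctuation
Value | Count | Frequency (%)
(not preserved) | 3 | 37.5%
(not preserved) | 3 | 37.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%

Math Operators
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%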

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
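
The two nullity views described above can be approximated in a few lines. This is a minimal standard-library sketch, not the plotting code behind the report's figures. The first two rows are taken from the sample table. The third row has its 코드명 value set to `None` purely as a hypothetical example of a missing cell.

```python
def nullity_by_column(rows, columns):
    """Count missing (None) cells per column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where the cell is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]


columns = ["코드유형", "코드", "코드명"]
rows = [
    {"코드유형": "WK_STAT_CD", "코드": "5", "코드명": "휴가"},
    {"코드유형": "REGT_DTL_TP_CD", "코드": "3", "코드명": "신설"},
    {"코드유형": "CLFY_CD", "코드": "B3", "코드명": None},  # hypothetical missing name
]

missing = nullity_by_column(rows, columns)
matrix = nullity_matrix(rows, columns)
print(missing)
for line in matrix:
    print("".join("#" if present else "." for present in line))
```

In this report all 4 missing cells fall in the third variable, so the real per-column view would show two complete columns and one with a very small gap.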

Sample

Index | 코드유형 (code type) | 코드 (code) | 코드명 (code name)
11332 | DUR_CVAPL_PRS_DIV_CD | 10 | 우편
25071 | PPR_REFM_ERR_CD | 68601 | 사용기간_19일반오류
7655 | CVAPL_APL_STAT_CD | 9 | 점검오류 9차
8446 | DATA_TRSMRCV_CD | 3 | UNIX(대구지원2)
23923 | PAY_CND_TY_CD | G | 급여특정내역 or 급여연령
28067 | REGT_DTL_TP_CD | 3 | 신설
33894 | WK_STAT_CD | 5 | 휴가
33028 | TMCAT_PRS_PRG_STAT_CD | 61 | 독립적검토 신청 접수
34046 | WRC_ISUD_RLTN_CD | B8 | 처외조부
11896 | EMY_RDP_TP_CD | 11 | 외국인-소재확인자

Index | 코드유형 (code type) | 코드 (code) | 코드명 (code name)
28202 | REQ_PRS_RST_CD | 40 | 접수중
23721 | OVL_INFM_SVC_MGMT_RS_CD | 20399 | 기타
18224 | IX_CD | b3 | 정기검사 실시주기 충족률<INVST_CD=09>, 정신과 간호인력 1인당 1일 입원 환자수 <INVST_CD=07>
17979 | INZ_TP_CD | 5 | 전라권
5607 | CLFY_CD | B3 | 임시직(주 30시간 이상)
31705 | SRCH_TP_CD | 4 | 기관소개
5594 | CLFY_CD | B13 | 정규직 단시간(주 36~40시간 미만)
7752 | CVAPL_DATA_CHG_TY_CD | 12 | 핸드폰번호
31828 | SRCMT_SPC_FLD_CD | 10803 | 종양_간
31788 | SRCMT_DTL_SBJT_CD | 2300 | 가정의학과

Duplicate rows

Most frequently occurring

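Index | 코드유형 (code type) | 코드 (code) | 코드명 (code name) | # duplicates
0 | ANCE_DESC_CD | 1 | 본인일부 | 2
1 | ANCE_DESC_CD | 2 | 100/100 | 2
2 | MDIV_DSTR_CD | 141 | 성남 광주 하남 | 2
3 | MDIV_DSTR_CD | 202 | 양구군 인제군 | 2
4 | MDIV_DSTR_CD | 204 | 삼척시 동해시 | 2
5 | MDIV_DSTR_CD | 304 | 진천군 | 2
6 | MDIV_DSTR_CD | 307 | 충주시 | 2
7 | MDIV_DSTR_CD | 308 | 제천시 | 2
8 | MDIV_DSTR_CD | 414 | 아산시 | 2
9 | MDIV_DSTR_CD | 424 | 연기군 | 2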
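
A table like the one above can be produced by counting identical rows and keeping those seen more than once. The sketch below uses a hypothetical helper `duplicate_rows` and a handful of rows modelled on the table. It illustrates the idea only and makes no claim about how the report derives its figure of 22 duplicate rows.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return (row, occurrences) for every row seen more than once, most frequent first."""
    counts = Counter(rows)
    dups = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(dups, key=lambda item: item[1], reverse=True)


# Rows are (코드유형, 코드, 코드명) tuples; the repetition here is for illustration.
rows = [
    ("ANCE_DESC_CD", "1", "본인일부"),
    ("MDIV_DSTR_CD", "307", "충주시"),
    ("ANCE_DESC_CD", "1", "본인일부"),
    ("MDIV_DSTR_CD", "308", "제천시"),
    ("MDIV_DSTR_CD", "307", "충주시"),
]

dups = duplicate_rows(rows)
for row, n in dups:
    print(" | ".join(row), "|", n)
```

Rows must match on all three columns to count as duplicates, so two entries that share a code type and code but differ in name would not appear.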