Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 410.2 KiB
Average record size in memory: 42.0 B
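The average record size is the total in-memory size divided by the number of rows. The quick arithmetic check below assumes that KiB means 1024 bytes. Both reported inputs are rounded, so the match is approximate.

```python
# Reproduce "Average record size in memory" from the reported totals.
# Assumption: KiB = 1024 bytes; both inputs are rounded in the report.
TOTAL_SIZE_KIB = 410.2
N_OBSERVATIONS = 10_000


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Mean in-memory size of one row, in bytes."""
    return total_kib * 1024 / n_rows


print(f"{average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS):.1f} B")  # → 42.0 B
```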

Variable types

Text: 2
Categorical: 1
Numeric: 1

Dataset

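Description: Disease codes of rare and intractable diseases among the officially designated diseases registered for Medical Aid (의료급여) beneficiaries, classified by group and by serial number within each group. The columns are 상병기호 (disease code), 그룹 (group), 그룹내 일련번호 (serial number within group), and 순번 (sequence number).
URL: https://www.data.go.kr/data/15121404/fileData.do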
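The following is a minimal sketch for loading the CSV with Python's standard library and checking its header against the four columns named in the description. The `cp949` default encoding is an assumption. Many Korean public-data CSVs use it, but this file may be UTF-8. The function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Column names as given in the dataset description.
EXPECTED_COLUMNS = ["상병기호", "그룹", "그룹내 일련번호", "순번"]


def load_rows(path: Path, encoding: str = "cp949") -> List[Dict[str, str]]:
    """Read the CSV into a list of dicts keyed by column name.

    The cp949 default is an assumption; pass "utf-8-sig" if the file is UTF-8.
    """
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


def has_expected_columns(rows: List[Dict[str, str]]) -> bool:
    """True if the header matches the four documented columns, in order."""
    return bool(rows) and list(rows[0].keys()) == EXPECTED_COLUMNS
```

Usage: `rows = load_rows(Path("<downloaded file>.csv"))`. Substitute the actual file name from the portal. All values are read as strings, so convert `순번` with `int()` before doing numeric work.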

Alerts

상병기호 has unique values (Unique)
순번 has 2558 (25.6%) zeros (Zeros)
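The two alerts can be approximated with the sketch below. The zeros threshold is a hypothetical parameter chosen for illustration. It is not necessarily the value that ydata-profiling is configured with.

```python
from typing import Optional, Sequence


def has_unique_values(values: Sequence) -> bool:
    """'Unique' alert: no value occurs more than once."""
    return len(set(values)) == len(values)


def zeros_alert(values: Sequence[float], threshold: float = 0.10) -> Optional[str]:
    """'Zeros' alert: fires when the share of zeros exceeds `threshold`.

    The 0.10 default is a hypothetical cut-off for illustration only.
    """
    n_zeros = sum(1 for v in values if v == 0)
    share = n_zeros / len(values)
    if share > threshold:
        return f"has {n_zeros} ({share:.1%}) zeros"
    return None
```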

Reproduction

Analysis started: 2023-12-12 23:00:13.196514
Analysis finished: 2023-12-12 23:00:14.154768
Duration: 0.96 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상병기호
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 4
Mean length: 3.8584
Min length: 3
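The length statistics are the max, median, mean and min of the per-value string lengths. The mean also equals total characters divided by rows: 38584 / 10000 = 3.8584. The sketch below computes these figures using only the standard library. The function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def length_stats(values: Sequence[str]) -> Dict[str, float]:
    """Max, median, mean and min of the string lengths of `values`."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# The five sample rows listed for this variable.
print(length_stats(["M872", "T71", "M354", "M813", "Z891"]))
```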

Characters and Unicode

Total characters: 38584
Distinct characters: 35
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
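Python's standard `unicodedata.category` returns these general-category codes directly, so the category counts in this report can be approximated as shown below. The standard library has no script or block lookup, so this sketch covers categories only. The long names are the Unicode names for the codes that appear in this report.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Unicode long names for the general categories seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts
```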

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: M872
2nd row: T71
3rd row: M354
4th row: M813
5th row: Z891
Value                 Count   Frequency (%)
m872                  1       < 0.1%
q922                  1       < 0.1%
x995                  1       < 0.1%
v31                   1       < 0.1%
t631                  1       < 0.1%
x108                  1       < 0.1%
g620                  1       < 0.1%
g618                  1       < 0.1%
a666                  1       < 0.1%
y497                  1       < 0.1%
Other values (9990)   9990    99.9%
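Each frequency table lists the ten most common values and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values. The sketch below builds that layout. The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def top_n_table(values: Sequence[Hashable], n: int = 10) -> List[Tuple[object, int, float]]:
    """Top-n (value, count, share) rows plus an 'Other values (k)' row for the rest."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()  # ties keep first-seen order
    rows = [(value, c, c / total) for value, c in ranked[:n]]
    rest = ranked[n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, other / total))
    return rows
```

In this column, each of the 10000 codes occurs exactly once. The ten listed rows therefore each have count 1, and the remaining 9990 codes form "Other values (9990)". When counts are tied, `Counter.most_common` orders elements by first appearance.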

Most occurring characters

Value               Count   Frequency (%)
0                   3531    9.2%
1                   3289    8.5%
2                   3155    8.2%
3                   3028    7.8%
8                   2880    7.5%
4                   2826    7.3%
9                   2654    6.9%
5                   2585    6.7%
6                   2393    6.2%
7                   2243    5.8%
Other values (25)   10000   25.9%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     28584   74.1%
Uppercase Letter   10000   25.9%

Most frequent character per category

Uppercase Letter
Value               Count   Frequency (%)
X                   736     7.4%
W                   655     6.6%
V                   608     6.1%
Y                   558     5.6%
T                   538     5.4%
Q                   494     4.9%
Z                   492     4.9%
S                   492     4.9%
M                   455     4.5%
D                   386     3.9%
Other values (15)   4586    45.9%

Decimal Number
Value   Count   Frequency (%)
0       3531    12.4%
1       3289    11.5%
2       3155    11.0%
3       3028    10.6%
8       2880    10.1%
4       2826    9.9%
9       2654    9.3%
5       2585    9.0%
6       2393    8.4%
7       2243    7.8%
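The same count appears with different percentages in different tables because the denominator changes. The overall character table divides by all 38584 characters. A per-category table divides by that category's total, here 28584 decimal digits. The check below uses the figures for the character "0":

```python
def pct(count: int, total: int) -> str:
    """Format count/total the way the report does: one decimal place."""
    return f"{count / total:.1%}"


ZERO_COUNT = 3_531          # occurrences of the character "0" in 상병기호
TOTAL_CHARACTERS = 38_584   # all characters in 상병기호
DECIMAL_NUMBERS = 28_584    # characters in the Decimal Number category

print(pct(ZERO_COUNT, TOTAL_CHARACTERS))  # → 9.2%
print(pct(ZERO_COUNT, DECIMAL_NUMBERS))   # → 12.4%
```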

Most occurring scripts

Value    Count   Frequency (%)
Common   28584   74.1%
Latin    10000   25.9%

Most frequent character per script

Latin
Value               Count   Frequency (%)
X                   736     7.4%
W                   655     6.6%
V                   608     6.1%
Y                   558     5.6%
T                   538     5.4%
Q                   494     4.9%
Z                   492     4.9%
S                   492     4.9%
M                   455     4.5%
D                   386     3.9%
Other values (15)   4586    45.9%

Common
Value   Count   Frequency (%)
0       3531    12.4%
1       3289    11.5%
2       3155    11.0%
3       3028    10.6%
8       2880    10.1%
4       2826    9.9%
9       2654    9.3%
5       2585    9.0%
6       2393    8.4%
7       2243    7.8%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   38584   100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   3531    9.2%
1                   3289    8.5%
2                   3155    8.2%
3                   3028    7.8%
8                   2880    7.5%
4                   2826    7.3%
9                   2654    6.9%
5                   2585    6.7%
6                   2393    6.2%
7                   2243    5.8%
Other values (25)   10000   25.9%

그룹
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 33
2nd row: 33
3rd row: 22
4th row: 33
5th row: 33

Common Values

Value   Count   Frequency (%)
33      8382    83.8%
11      916     9.2%
22      702     7.0%

Length

[Figure: Histogram of lengths of the category]


그룹내 일련번호
Text

Distinct: 8474
Distinct (%): 84.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 34
Median length: 22
Mean length: 15.8445
Min length: 1

Characters and Unicode

Total characters: 158445
Distinct characters: 715
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8267
Unique (%): 82.7%
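Distinct and Unique measure different things. Distinct counts how many different values occur. Unique counts the values that occur exactly once. This column has 8474 distinct values, of which 8267 occur once, so 207 values repeat. The function below computes both counts. Its name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def distinct_and_unique(values: Sequence[Hashable]) -> Tuple[int, int]:
    """Return (distinct, unique): different values, and values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


print(distinct_and_unique(["기타 명시된 전도 장애", "질식", "기타 명시된 전도 장애"]))  # → (2, 1)
```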

Sample

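1st row: 이전의 외상에 의한 골괴사증 (osteonecrosis due to previous trauma)
2nd row: 질식 (asphyxiation)
3rd row: 결합조직의기타전신성침습 (other systemic involvement of connective tissue)
4th row: 수술후 흡수불량성 골다공증 (postsurgical malabsorption osteoporosis)
5th row: 손 및 손목의 후천성 부재 (acquired absence of hand and wrist)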
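Value                               Count   Frequency (%)
                                    2640    6.5%
기타 (other)                        2146    5.3%
상세불명의 (unspecified)            1041    2.6%
또는 (or)                           598     1.5%
의한 (due to)                       579     1.4%
명시된 (specified)                  504     1.2%
다친 (injured)                      472     1.2%
충돌로 (in a collision)             417     1.0%
악성신생물 (malignant neoplasm)     404     1.0%
장애 (disorder)                     393     1.0%
Other values (6501)                 31220   77.3%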
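The word table can be roughly approximated by lowercasing each value, splitting on whitespace and counting the tokens. ydata-profiling's exact tokenization and filtering rules may differ, so counts from this sketch may not match the table exactly. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def word_counts(values: Sequence[str], n: int = 10) -> List[Tuple[str, int, float]]:
    """Approximate word table: lowercase, split on whitespace, count tokens."""
    counter: Counter = Counter()
    for text in values:
        counter.update(text.lower().split())
    total = sum(counter.values())
    return [(word, c, c / total) for word, c in counter.most_common(n)]


print(word_counts(["기타 명시된 전도 장애", "기타 뼈의 파젯병"], n=1))
```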

Most occurring characters

Value                Count   Frequency (%)
(space)              31067   19.6%
                     5724    3.6%
                     3761    2.4%
                     3632    2.3%
                     3369    2.1%
                     2909    1.8%
                     2827    1.8%
                     2451    1.5%
                     2427    1.5%
                     2130    1.3%
Other values (705)   98148   61.9%

Most occurring categories

Value               Count    Frequency (%)
Other Letter        125317   79.1%
Space Separator     31067    19.6%
Open Punctuation    608      0.4%
Close Punctuation   597      0.4%
Uppercase Letter    549      0.3%
Decimal Number      163      0.1%
Dash Punctuation    118      0.1%
Other Punctuation   25       < 0.1%
Math Symbol         1        < 0.1%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
                     5724    4.6%
                     3761    3.0%
                     3632    2.9%
                     3369    2.7%
                     2909    2.3%
                     2827    2.3%
                     2451    2.0%
                     2427    1.9%
                     2130    1.7%
                     1999    1.6%
Other values (664)   94088   75.1%

Uppercase Letter
Value               Count   Frequency (%)
N                   124     22.6%
S                   109     19.9%
O                   99      18.0%
C                   37      6.7%
E                   30      5.5%
B                   24      4.4%
I                   24      4.4%
X                   15      2.7%
A                   14      2.6%
V                   13      2.4%
Other values (12)   60      10.9%

Decimal Number
Value   Count   Frequency (%)
1       38      23.3%
2       32      19.6%
3       21      12.9%
0       16      9.8%
4       15      9.2%
9       13      8.0%
6       11      6.7%
5       7       4.3%
7       6       3.7%
8       4       2.5%

Other Punctuation
Value   Count   Frequency (%)
%       15      60.0%
.       8       32.0%
·       1       4.0%
/       1       4.0%

Space Separator
Value     Count   Frequency (%)
(space)   31067   100.0%

Open Punctuation
Value   Count   Frequency (%)
(       608     100.0%

Close Punctuation
Value   Count   Frequency (%)
)       597     100.0%

Dash Punctuation
Value   Count   Frequency (%)
-       118     100.0%

Math Symbol
Value   Count   Frequency (%)
+       1       100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul   125317   79.1%
Common   32579    20.6%
Latin    549      0.3%

Most frequent character per script

Hangul
Value                Count   Frequency (%)
                     5724    4.6%
                     3761    3.0%
                     3632    2.9%
                     3369    2.7%
                     2909    2.3%
                     2827    2.3%
                     2451    2.0%
                     2427    1.9%
                     2130    1.7%
                     1999    1.6%
Other values (664)   94088   75.1%

Latin
Value               Count   Frequency (%)
N                   124     22.6%
S                   109     19.9%
O                   99      18.0%
C                   37      6.7%
E                   30      5.5%
B                   24      4.4%
I                   24      4.4%
X                   15      2.7%
A                   14      2.6%
V                   13      2.4%
Other values (12)   60      10.9%

Common
Value              Count   Frequency (%)
(space)            31067   95.4%
(                  608     1.9%
)                  597     1.8%
-                  118     0.4%
1                  38      0.1%
2                  32      0.1%
3                  21      0.1%
0                  16      < 0.1%
%                  15      < 0.1%
4                  15      < 0.1%
Other values (9)   52      0.2%

Most occurring blocks

Value    Count    Frequency (%)
Hangul   125317   79.1%
ASCII    33127    20.9%
None     1        < 0.1%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             31067   93.8%
(                   608     1.8%
)                   597     1.8%
N                   124     0.4%
-                   118     0.4%
S                   109     0.3%
O                   99      0.3%
1                   38      0.1%
C                   37      0.1%
2                   32      0.1%
Other values (30)   298     0.9%

Hangul
Value                Count   Frequency (%)
                     5724    4.6%
                     3761    3.0%
                     3632    2.9%
                     3369    2.7%
                     2909    2.3%
                     2827    2.3%
                     2451    2.0%
                     2427    1.9%
                     2130    1.7%
                     1999    1.6%
Other values (664)   94088   75.1%

None
Value   Count   Frequency (%)
·       1       100.0%

순번
Real number (ℝ)

ZEROS 

Distinct: 255
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.1933
Minimum: 0
Maximum: 298
Zeros: 2558
Zeros (%): 25.6%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 99
Q3: 244
95th percentile: 289
Maximum: 298
Range: 298
Interquartile range (IQR): 244
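Because 25.6% of the values are zero, both the 5th percentile and Q1 fall on 0. The IQR is therefore Q3 - Q1 = 244 - 0 = 244. The sketch below computes quantiles by linear interpolation, which is the default method in pandas and NumPy. That ydata-profiling used this method here is an assumption. The function names are hypothetical.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def iqr(values: Sequence[float]) -> float:
    """Interquartile range, Q3 - Q1."""
    return quantile(values, 0.75) - quantile(values, 0.25)
```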

Descriptive statistics

Standard deviation: 117.41211
Coefficient of variation (CV): 0.96087188
Kurtosis: -1.6823419
Mean: 122.1933
Median Absolute Deviation (MAD): 99
Skewness: 0.20072777
Sum: 1221933
Variance: 13785.602
Monotonicity: Not monotonic
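The descriptive statistics can be reproduced with the standard library. This sketch assumes pandas conventions:

- sample variance with an n - 1 denominator
- bias-corrected skewness and excess kurtosis
- MAD taken around the median

As a consistency check, CV = 117.41211 / 122.1933 ≈ 0.9609 and 117.41211² ≈ 13785.60, both of which match the figures above. The function name is hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics with pandas-style estimators.

    Assumes at least 4 values that are not all equal.
    """
    n = len(values)
    mean = statistics.mean(values)
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev)
    m3 = sum(d ** 3 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    variance = m2 / (n - 1)  # sample variance
    std = math.sqrt(variance)
    # Bias-corrected sample skewness.
    skewness = n / ((n - 1) * (n - 2)) * m3 / std ** 3
    # Bias-corrected excess kurtosis (0 for a normal distribution).
    kurtosis = (n * (n + 1) * (n - 1)) / ((n - 2) * (n - 3)) * m4 / m2 ** 2 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "mad": mad,
    }
```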
[Figure: Histogram with fixed size bins (bins=50)]
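With a minimum of 0, a maximum of 298 and 50 bins, each fixed-size bin is 298 / 50 = 5.96 wide. The sketch below shows NumPy-style fixed-width binning, in which the last bin also includes its right edge. The function name is hypothetical.

```python
from typing import List, Sequence


def fixed_width_histogram(values: Sequence[float], bins: int = 50) -> List[int]:
    """Counts per equal-width bin spanning [min, max]; the last bin includes max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1  # clamp x == max into the last bin
    return counts


print(298 / 50)  # bin width for 순번 → 5.96
```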
Common values
Value                Count   Frequency (%)
0                    2558    25.6%
7                    397     4.0%
1                    320     3.2%
281                  320     3.2%
270                  258     2.6%
199                  251     2.5%
298                  246     2.5%
2                    236     2.4%
11                   177     1.8%
96                   158     1.6%
Other values (245)   5079    50.8%

Minimum 10 values
Value   Count   Frequency (%)
0       2558    25.6%
1       320     3.2%
2       236     2.4%
3       42      0.4%
4       54      0.5%
5       46      0.5%
6       78      0.8%
7       397     4.0%
8       46      0.5%
9       55      0.5%

Maximum 10 values
Value   Count   Frequency (%)
298     246     2.5%
297     90      0.9%
296     3       < 0.1%
295     5       0.1%
294     19      0.2%
293     6       0.1%
292     52      0.5%
291     1       < 0.1%
290     70      0.7%
289     34      0.3%

Interactions


Correlations

        그룹     순번
그룹    1.000    0.542
순번    0.542    1.000

        순번     그룹
순번    1.000    0.386
그룹    0.386    1.000
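The report does not say which coefficient produced these two matrices. As one illustration, Cramér's V measures the association between two categorical variables. Applying it to 순번 would require binning that variable first. The sketch below is the plain version without bias correction, and it is not claimed to reproduce the values above. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Cramér's V from the chi-squared statistic of the xs-by-ys contingency table."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
```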

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

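Index | 상병기호 | 그룹 | 그룹내 일련번호 | 순번 | English gloss
4919 | M872 | 33 | 이전의 외상에 의한 골괴사증 | 210 | Osteonecrosis due to previous trauma
14195 | T71 | 33 | 질식 | 287 | Asphyxiation
13188 | M354 | 22 | 결합조직의기타전신성침습 | 65 | Other systemic involvement of connective tissue
13808 | M813 | 33 | 수술후 흡수불량성 골다공증 | 208 | Postsurgical malabsorption osteoporosis
10154 | Z891 | 33 | 손 및 손목의 후천성 부재 | 298 | Acquired absence of hand and wrist
5718 | Q961 | 22 | 터저증후군 | 98 | Turner syndrome
13442 | Q402 | 33 | 기타 명시된 위의 선천성 기형 | 259 | Other specified congenital malformations of stomach
584 | Y27 | 33 | 의도 미확인의 증기 및 고온물체에 접촉 | 0 | Contact with steam and hot objects, undetermined intent
11291 | Y183 | 33 | 운동 및 경기장에서 살충제에 의한 의도 미확인의 중 | 0 | Undetermined-intent … by pesticides, sports and athletics area (truncated in source)
5151 | O31 | 33 | 다태 임신에 특이한 합병증 | 239 | Complications specific to multiple gestation

Index | 상병기호 | 그룹 | 그룹내 일련번호 | 순번 | English gloss
4060 | E35 | 33 | 달리 분류된 질환에서의 내분비선 장애 | 111 | Disorders of endocrine glands in diseases classified elsewhere
11256 | W010 | 33 | 주거지에서 미끌림 걸림 및 헛디딤에 의한 동일 면상 | 0 | Same-level … from slipping, tripping and stumbling, home (truncated in source)
6341 | Z655 | 33 | 재앙 전쟁 및 기타 적대행위에 노출 | 298 | Exposure to disaster, war and other hostilities
7516 | K350 | 33 | 전신성 복막염을 동반한 급성 충수염 | 186 | Acute appendicitis with generalized peritonitis
14191 | T703 | 33 | 잠함병(감압병) | 287 | Caisson disease (decompression sickness)
12408 | I458 | 11 | 기타 명시된 전도 장애 | 11 | Other specified conduction disorders
1475 | I23 | 11 | 급성 심근경색증에 의한 특정 현재 합병증 | 11 | Certain current complications following acute myocardial infarction
10560 | C939 | 22 | 악성신생물(암) | 6 | Malignant neoplasm (cancer)
13066 | S620 | 33 | 손의 주상골의 골절 | 274 | Fracture of scaphoid bone of hand
4925 | M888 | 33 | 기타 뼈의 파젯병 | 210 | Paget disease of other bones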