Overview

Dataset statistics

Number of variables: 5
Number of observations: 1141
Missing cells: 1141
Missing cells (%): 20.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 48.0 KiB
Average record size in memory: 43.1 B
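
The missing-cell percentage follows from the other figures in this table: 5 variables times 1141 observations gives 5705 cells, of which 1141 are missing. A minimal Python sketch of that arithmetic is shown here; the function name is illustrative and is not part of ydata-profiling.

```python
def missing_cell_percentage(n_variables, n_observations, n_missing_cells):
    """Share of missing cells among all cells of the table, in percent."""
    total_cells = n_variables * n_observations
    return 100.0 * n_missing_cells / total_cells


# Figures taken from the dataset statistics of this report.
pct = missing_cell_percentage(5, 1141, 1141)
print(f"{pct:.1f}%")
```

One column that is empty in every row accounts for exactly one fifth of a five-column table, which matches the alert on Unnamed: 4.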

Variable types

Text: 2
Numeric: 2
Unsupported: 1

Dataset

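Description: Statistics on the International Classification of Diseases codes of patients discharged from Daegu Veterans Hospital, operated by the Korea Veterans Health Service. Figures are split into government-funded patients (국비) and self-paying patients (사비).
URL: https://www.data.go.kr/data/15066475/fileData.do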

Alerts

국비 is highly overall correlated with 사비 (High correlation)
사비 is highly overall correlated with 국비 (High correlation)
Unnamed: 4 has 1141 (100.0%) missing values (Missing)
사비 is highly skewed (γ1 = 31.93406768) (Skewed)
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
사비 has 678 (59.4%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 22:19:25.792033
Analysis finished: 2023-12-12 22:19:26.999815
Duration: 1.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상병코드 (diagnosis code)
Text

Distinct: 1135
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 9.0 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.3751096
Min length: 3

Characters and Unicode

Total characters: 4992
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
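
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories for the five sample codes listed in this section, using Python's standard `unicodedata` module. This is not the report's own implementation; the function name is illustrative. The two-letter codes `Lu` and `Nd` correspond to the "Uppercase Letter" and "Decimal Number" categories named in the tables.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows shown for this variable.
sample = ["A020", "A047", "A084", "A090", "A099"]
counts = category_counts(sample)
for category, count in counts.most_common():
    print(category, count)
```

The standard library exposes general categories but not script or block properties, so the script and block breakdowns would need an additional data source.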

Unique

Unique: 1129
Unique (%): 98.9%

Sample

1st row: A020
2nd row: A047
3rd row: A084
4th row: A090
5th row: A099
Value | Count | Frequency (%)
k621 | 2 | 0.2%
j129 | 2 | 0.2%
e1164 | 2 | 0.2%
x5999 | 2 | 0.2%
s32890 | 2 | 0.2%
n401 | 2 | 0.2%
m7979 | 1 | 0.1%
m751 | 1 | 0.1%
m8159 | 1 | 0.1%
m7938 | 1 | 0.1%
Other values (1125) | 1125 | 98.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 628 | 12.6%
1 | 517 | 10.4%
2 | 468 | 9.4%
9 | 459 | 9.2%
8 | 379 | 7.6%
4 | 320 | 6.4%
5 | 312 | 6.2%
3 | 280 | 5.6%
6 | 249 | 5.0%
7 | 239 | 4.8%
Other values (24) | 1141 | 22.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3851 | 77.1%
Uppercase Letter | 1141 | 22.9%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 140 | 12.3%
K | 118 | 10.3%
I | 111 | 9.7%
S | 104 | 9.1%
C | 89 | 7.8%
J | 62 | 5.4%
H | 53 | 4.6%
E | 52 | 4.6%
R | 51 | 4.5%
G | 43 | 3.8%
Other values (14) | 318 | 27.9%

Decimal Number
Value | Count | Frequency (%)
0 | 628 | 16.3%
1 | 517 | 13.4%
2 | 468 | 12.2%
9 | 459 | 11.9%
8 | 379 | 9.8%
4 | 320 | 8.3%
5 | 312 | 8.1%
3 | 280 | 7.3%
6 | 249 | 6.5%
7 | 239 | 6.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3851 | 77.1%
Latin | 1141 | 22.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 140 | 12.3%
K | 118 | 10.3%
I | 111 | 9.7%
S | 104 | 9.1%
C | 89 | 7.8%
J | 62 | 5.4%
H | 53 | 4.6%
E | 52 | 4.6%
R | 51 | 4.5%
G | 43 | 3.8%
Other values (14) | 318 | 27.9%

Common
Value | Count | Frequency (%)
0 | 628 | 16.3%
1 | 517 | 13.4%
2 | 468 | 12.2%
9 | 459 | 11.9%
8 | 379 | 9.8%
4 | 320 | 8.3%
5 | 312 | 8.1%
3 | 280 | 7.3%
6 | 249 | 6.5%
7 | 239 | 6.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4992 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 628 | 12.6%
1 | 517 | 10.4%
2 | 468 | 9.4%
9 | 459 | 9.2%
8 | 379 | 7.6%
4 | 320 | 6.4%
5 | 312 | 6.2%
3 | 280 | 5.6%
6 | 249 | 5.0%
7 | 239 | 4.8%
Other values (24) | 1141 | 22.9%

상병명 (diagnosis name)
Text

Distinct: 1134
Distinct (%): 99.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.0 KiB

Length

Max length: 71
Median length: 49
Mean length: 16.850131
Min length: 1

Characters and Unicode

Total characters: 19226
Distinct characters: 511
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1127
Unique (%): 98.8%

Sample

1st row: 살모넬라장염
2nd row: 클로스트리듐 디피실리에 의한 장결장염
3rd row: 상세불명의 바이러스성 장감염
4th row: 감염성 기원의 기타 및 상세불명의 위장염 및 결장염
5th row: 상세불명 기원의 위장염 및 결장염
Value | Count | Frequency (%)
상세불명의 | 279 | 5.9%
기타 | 213 | 4.5%
 | 190 | 4.0%
신생물 | 105 | 2.2%
상세불명 | 100 | 2.1%
악성 | 88 | 1.9%
동반한 | 82 | 1.7%
폐쇄성 | 74 | 1.6%
또는 | 70 | 1.5%
골절 | 62 | 1.3%
Other values (1365) | 3432 | 73.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 3554 | 18.5%
 | 822 | 4.3%
 | 555 | 2.9%
 | 533 | 2.8%
 | 442 | 2.3%
 | 411 | 2.1%
 | 407 | 2.1%
, | 370 | 1.9%
 | 325 | 1.7%
 | 241 | 1.3%
Other values (501) | 11566 | 60.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 14851 | 77.2%
Space Separator | 3554 | 18.5%
Other Punctuation | 419 | 2.2%
Decimal Number | 189 | 1.0%
Close Punctuation | 73 | 0.4%
Open Punctuation | 73 | 0.4%
Uppercase Letter | 66 | 0.3%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 822 | 5.5%
 | 555 | 3.7%
 | 533 | 3.6%
 | 442 | 3.0%
 | 411 | 2.8%
 | 407 | 2.7%
 | 325 | 2.2%
 | 241 | 1.6%
 | 236 | 1.6%
 | 232 | 1.6%
Other values (463) | 10647 | 71.7%

Uppercase Letter
Value | Count | Frequency (%)
T | 9 | 13.6%
G | 8 | 12.1%
I | 8 | 12.1%
B | 6 | 9.1%
E | 6 | 9.1%
L | 5 | 7.6%
N | 4 | 6.1%
C | 3 | 4.5%
H | 3 | 4.5%
S | 3 | 4.5%
Other values (7) | 11 | 16.7%

Decimal Number
Value | Count | Frequency (%)
0 | 49 | 25.9%
2 | 38 | 20.1%
1 | 32 | 16.9%
3 | 21 | 11.1%
9 | 15 | 7.9%
4 | 10 | 5.3%
5 | 8 | 4.2%
6 | 6 | 3.2%
8 | 5 | 2.6%
7 | 5 | 2.6%

Other Punctuation
Value | Count | Frequency (%)
, | 370 | 88.3%
. | 25 | 6.0%
* | 15 | 3.6%
 | 8 | 1.9%
/ | 1 | 0.2%

Close Punctuation
Value | Count | Frequency (%)
) | 54 | 74.0%
] | 19 | 26.0%

Open Punctuation
Value | Count | Frequency (%)
( | 54 | 74.0%
[ | 19 | 26.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3554 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14842 | 77.2%
Common | 4309 | 22.4%
Latin | 66 | 0.3%
Han | 9 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 822 | 5.5%
 | 555 | 3.7%
 | 533 | 3.6%
 | 442 | 3.0%
 | 411 | 2.8%
 | 407 | 2.7%
 | 325 | 2.2%
 | 241 | 1.6%
 | 236 | 1.6%
 | 232 | 1.6%
Other values (454) | 10638 | 71.7%

Common
Value | Count | Frequency (%)
(space) | 3554 | 82.5%
, | 370 | 8.6%
) | 54 | 1.3%
( | 54 | 1.3%
0 | 49 | 1.1%
2 | 38 | 0.9%
1 | 32 | 0.7%
. | 25 | 0.6%
3 | 21 | 0.5%
] | 19 | 0.4%
Other values (11) | 93 | 2.2%

Latin
Value | Count | Frequency (%)
T | 9 | 13.6%
G | 8 | 12.1%
I | 8 | 12.1%
B | 6 | 9.1%
E | 6 | 9.1%
L | 5 | 7.6%
N | 4 | 6.1%
C | 3 | 4.5%
H | 3 | 4.5%
S | 3 | 4.5%
Other values (7) | 11 | 16.7%

Han
Value | Count | Frequency (%)
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%
 | 1 | 11.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 14842 | 77.2%
ASCII | 4367 | 22.7%
Punctuation | 8 | < 0.1%
CJK | 8 | < 0.1%
CJK Compat Ideographs | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 3554 | 81.4%
, | 370 | 8.5%
) | 54 | 1.2%
( | 54 | 1.2%
0 | 49 | 1.1%
2 | 38 | 0.9%
1 | 32 | 0.7%
. | 25 | 0.6%
3 | 21 | 0.5%
] | 19 | 0.4%
Other values (27) | 151 | 3.5%

Hangul
Value | Count | Frequency (%)
 | 822 | 5.5%
 | 555 | 3.7%
 | 533 | 3.6%
 | 442 | 3.0%
 | 411 | 2.8%
 | 407 | 2.7%
 | 325 | 2.2%
 | 241 | 1.6%
 | 236 | 1.6%
 | 232 | 1.6%
Other values (454) | 10638 | 71.7%

Punctuation
Value | Count | Frequency (%)
 | 8 | 100.0%

CJK
Value | Count | Frequency (%)
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%
 | 1 | 12.5%

CJK Compat Ideographs
Value | Count | Frequency (%)
 | 1 | 100.0%

국비
Real number (ℝ)

HIGH CORRELATION 

Distinct: 68
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.7984224
Minimum: 1
Maximum: 503
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 4
95-th percentile: 27
Maximum: 503
Range: 502
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 22.481018
Coefficient of variation (CV): 3.3067992
Kurtosis: 221.78086
Mean: 6.7984224
Median Absolute Deviation (MAD): 1
Skewness: 12.066437
Sum: 7757
Variance: 505.39617
Monotonicity: Not monotonic
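
Several of the figures reported for 국비 are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. The sketch below checks those relations in Python using only numbers printed in this report; the variable names are illustrative.

```python
# Figures copied from the report for the variable 국비.
n_observations = 1141
total = 7757
std_dev = 22.481018
q1, q3 = 1, 4
minimum, maximum = 1, 503

mean = total / n_observations   # reported as 6.7984224
cv = std_dev / mean             # reported as 3.3067992
variance = std_dev ** 2         # reported as 505.39617
iqr = q3 - q1                   # reported as 3
value_range = maximum - minimum # reported as 502

print(f"mean={mean:.7f} cv={cv:.7f} variance={variance:.5f}")
print(f"iqr={iqr} range={value_range}")
```

A coefficient of variation above 3 means the standard deviation is more than three times the mean, consistent with a distribution dominated by small counts and a few very large ones.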
Histogram with fixed size bins (bins=50)
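
Fixed-size binning splits the interval from the minimum to the maximum into equally wide bins and counts the values falling into each. The following Python sketch illustrates the idea with the standard library only; it is not the plotting code used by the report, and the function name and toy values are hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Return (edges, counts) for equally wide bins spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the last edge, so clamp it into the final bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Toy values (hypothetical) spanning the same 1..503 range as 국비.
edges, counts = fixed_width_histogram([1, 1, 2, 3, 503], bins=50)
print(round(edges[1] - edges[0], 2), counts[0], counts[-1])
```

With a range of 502 and 50 bins, each bin is about 10 units wide, so nearly all observations of a heavily right-skewed variable land in the first bin.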
Most common values
Value | Count | Frequency (%)
1 | 555 | 48.6%
2 | 170 | 14.9%
3 | 98 | 8.6%
4 | 62 | 5.4%
5 | 44 | 3.9%
6 | 28 | 2.5%
8 | 19 | 1.7%
7 | 19 | 1.7%
12 | 11 | 1.0%
9 | 10 | 0.9%
Other values (58) | 125 | 11.0%

Smallest values
Value | Count | Frequency (%)
1 | 555 | 48.6%
2 | 170 | 14.9%
3 | 98 | 8.6%
4 | 62 | 5.4%
5 | 44 | 3.9%
6 | 28 | 2.5%
7 | 19 | 1.7%
8 | 19 | 1.7%
9 | 10 | 0.9%
10 | 9 | 0.8%

Largest values
Value | Count | Frequency (%)
503 | 1 | 0.1%
181 | 1 | 0.1%
171 | 1 | 0.1%
156 | 1 | 0.1%
152 | 1 | 0.1%
150 | 2 | 0.2%
131 | 1 | 0.1%
116 | 1 | 0.1%
115 | 1 | 0.1%
109 | 1 | 0.1%

사비
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 47
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6091148
Minimum: 0
Maximum: 1788
Zeros: 678
Zeros (%): 59.4%
Negative: 0
Negative (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2
95-th percentile: 18
Maximum: 1788
Range: 1788
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 53.87127
Coefficient of variation (CV): 11.687986
Kurtosis: 1056.2116
Mean: 4.6091148
Median Absolute Deviation (MAD): 0
Skewness: 31.934068
Sum: 5259
Variance: 2902.1137
Monotonicity: Not monotonic
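
The skewness and zero-share figures for 사비 can be illustrated with a short Python sketch. It uses the population (uncorrected) moment formula γ1 = m3 / m2^1.5; the report's tool may apply a bias correction, so this is an approximation of the idea rather than a reproduction of the reported 31.934068. The function names and the toy data are hypothetical.

```python
def population_skewness(values):
    """Third standardized moment using population (biased) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def zero_share(values):
    """Percentage of values that are exactly zero."""
    return 100.0 * sum(1 for v in values if v == 0) / len(values)


# Toy data (hypothetical): mostly zeros plus one large outlier, like 사비.
toy = [0, 0, 0, 0, 0, 0, 1, 1, 2, 50]
toy_skew = population_skewness(toy)
toy_zeros = zero_share(toy)
print(f"skewness={toy_skew:.2f} zeros={toy_zeros:.1f}%")

# The zero share reported for 사비: 678 zeros out of 1141 rows.
reported_zero_share = 100.0 * 678 / 1141
print(f"{reported_zero_share:.1f}%")
```

A single extreme value such as the maximum of 1788 is enough to push skewness far above zero when more than half of the rows are zero.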
Histogram with fixed size bins (bins=47)
Most common values
Value | Count | Frequency (%)
0 | 678 | 59.4%
1 | 170 | 14.9%
2 | 73 | 6.4%
3 | 50 | 4.4%
4 | 31 | 2.7%
5 | 15 | 1.3%
8 | 11 | 1.0%
7 | 11 | 1.0%
6 | 10 | 0.9%
9 | 8 | 0.7%
Other values (37) | 84 | 7.4%

Smallest values
Value | Count | Frequency (%)
0 | 678 | 59.4%
1 | 170 | 14.9%
2 | 73 | 6.4%
3 | 50 | 4.4%
4 | 31 | 2.7%
5 | 15 | 1.3%
6 | 10 | 0.9%
7 | 11 | 1.0%
8 | 11 | 1.0%
9 | 8 | 0.7%

Largest values
Value | Count | Frequency (%)
1788 | 1 | 0.1%
156 | 1 | 0.1%
145 | 1 | 0.1%
86 | 1 | 0.1%
82 | 1 | 0.1%
76 | 1 | 0.1%
73 | 1 | 0.1%
58 | 1 | 0.1%
55 | 2 | 0.2%
54 | 2 | 0.2%

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1141
Missing (%): 100.0%
Memory size: 10.2 KiB

Interactions

[Figures: four interaction plots for the numeric variables 국비 and 사비]

Correlations

 | 국비 | 사비
국비 | 1.000 | 0.408
사비 | 0.408 | 1.000

 | 국비 | 사비
국비 | 1.000 | 0.623
사비 | 0.623 | 1.000
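
The report does not say which correlation method produced each matrix. Purely as an illustration of why two coefficients can differ on the same pair of columns, the sketch below computes a linear (Pearson) and a rank-based (Spearman) coefficient in plain Python. The choice of these two methods is an assumption, and the toy data is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (hypothetical): perfectly monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
r_linear = pearson(xs, ys)
r_rank = spearman(xs, ys)
print(f"pearson={r_linear:.3f} spearman={r_rank:.3f}")
```

Rank-based measures are less sensitive to extreme values, which matters for columns with outliers such as 503 in 국비 or 1788 in 사비.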

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
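
To make the idea of a nullity matrix concrete, the sketch below builds one in plain Python for the first three rows of the Sample section: each cell becomes True when a value is present and False when it is missing. This is a minimal text-based stand-in for the plotted matrix, not the report's implementation; the function names are illustrative.

```python
COLUMNS = ["상병코드", "상병명", "국비", "사비", "Unnamed: 4"]

# First three sample rows of this report; None stands for <NA>.
rows = [
    ("A020", "살모넬라장염", 1, 1, None),
    ("A047", "클로스트리듐 디피실리에 의한 장결장염", 2, 0, None),
    ("A084", "상세불명의 바이러스성 장감염", 6, 0, None),
]


def nullity_matrix(rows):
    """True where a cell holds a value, False where it is missing."""
    return [[cell is not None for cell in row] for row in rows]


def missing_per_column(rows, columns):
    """Number of missing cells in each column."""
    return {
        name: sum(1 for row in rows if row[i] is None)
        for i, name in enumerate(columns)
    }


matrix = nullity_matrix(rows)
missing = missing_per_column(rows, COLUMNS)
for flags in matrix:
    # '#' marks a present value, '.' a missing one.
    print("".join("#" if present else "." for present in flags))
print(missing)
```

A zero count is a present value, not a missing one, so the 사비 column stays fully populated while Unnamed: 4 is empty in every row.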

Sample

 | 상병코드 | 상병명 | 국비 | 사비 | Unnamed: 4
0 | A020 | 살모넬라장염 | 1 | 1 | <NA>
1 | A047 | 클로스트리듐 디피실리에 의한 장결장염 | 2 | 0 | <NA>
2 | A084 | 상세불명의 바이러스성 장감염 | 6 | 0 | <NA>
3 | A090 | 감염성 기원의 기타 및 상세불명의 위장염 및 결장염 | 8 | 5 | <NA>
4 | A099 | 상세불명 기원의 위장염 및 결장염 | 37 | 27 | <NA>
5 | A1500 | 배양 유무에 관계없이 가래 현미경 검사로 확인된 공동이 있는 폐결핵 | 1 | 0 | <NA>
6 | A1501 | 배양 유무에 관계없이 가래 현미경 검사로 확인된 공동이 없거나 상세불명의 폐결핵 | 1 | 2 | <NA>
7 | A1621 | 세균학적 또는 조직학적 확인에 대한 언급이 없는 공동이 없거나 상세불명의 폐결핵 | 1 | 0 | <NA>
8 | A1681 | 세균학적 또는 조직학적 확인에 대한 언급이 없는 공동이 없거나 상세불명의 기타 호흡기결핵 | 1 | 0 | <NA>
9 | A169 | 세균학적 또는 조직학적 확인에 대한 언급이 없는 상세불명의 호흡기결핵 | 1 | 0 | <NA>

상병코드상병명국비사비Unnamed: 4
1131Z904소화관의 기타 부분의 후천성 결여40<NA>
1132Z933결장조루상태21<NA>
1133Z942폐이식상태10<NA>
1134Z944간이식상태40<NA>
1135Z950심장전자장치의 존재11<NA>
1136Z951대동맥관상동맥우회로이식편의 존재63<NA>
1137Z955관상동맥혈관성형 삽입물 및 이식편의 존재1710<NA>
1138Z961안구내렌즈의 존재5935<NA>
1139Z9664무릎관절삽입물의 존재20<NA>
1140Z988기타 명시된 수술후 상태80<NA>