Overview

Dataset statistics

Number of variables: 4
Number of observations: 1069
Missing cells: 661
Missing cells (%): 15.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.6 KiB
Average record size in memory: 34.1 B
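The overview figures can be cross-checked against each other. The following is a minimal Python sketch that uses only numbers reported in this profile: 1069 rows, 4 columns, 268 missing 국비 values and 393 missing 사비 values (the per-column counts from the Alerts list), and 35.6 KiB in memory. It assumes that KiB means 1024 bytes.

```python
# Recompute the derived overview figures from counts reported in this profile.
N_ROWS = 1069
N_COLS = 4
MISSING_PER_COLUMN = {"상병코드": 0, "상병명": 0, "국비": 268, "사비": 393}
TOTAL_BYTES = 35.6 * 1024  # 35.6 KiB, assuming 1 KiB = 1024 bytes

missing_cells = sum(MISSING_PER_COLUMN.values())
missing_pct = 100 * missing_cells / (N_ROWS * N_COLS)
avg_record_bytes = TOTAL_BYTES / N_ROWS

print(missing_cells)                # 661
print(f"{missing_pct:.1f}%")        # 15.5%
print(f"{avg_record_bytes:.1f} B")  # 34.1 B
```

Every missing cell comes from the two numeric columns. The two text columns are complete.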

Variable types

Text: 2
Numeric: 2

Dataset

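Description: Open data on disease and surgery statistics released by Daejeon Veterans Hospital (대전보훈병원). It contains the disease code (상병코드), disease name (상병명), state-funded (국비), and self-paid (사비) fields.
URL: https://www.data.go.kr/data/15066476/fileData.do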

Alerts

국비 is highly overall correlated with 사비 (High correlation)
사비 is highly overall correlated with 국비 (High correlation)
국비 has 268 (25.1%) missing values (Missing)
사비 has 393 (36.8%) missing values (Missing)
상병코드 has unique values (Unique)
상병명 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 20:14:22.931593
Analysis finished: 2023-12-12 20:14:23.957858
Duration: 1.03 seconds
Software version: ydata-profiling v4.5.1

Variables

상병코드 (disease code)
Text

UNIQUE 

Distinct: 1069
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.362956
Min length: 3
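The length statistics above are ordinary summary statistics of the string lengths. This sketch computes them for the five codes listed under Sample below. It also confirms that the reported mean length multiplied by the row count gives the reported total of 4664 characters.

```python
import statistics

# First five 상병코드 values shown in the Sample section of this profile.
sample_codes = ["A047", "A048", "A050", "A0838", "A084"]
lengths = [len(code) for code in sample_codes]

print(min(lengths), max(lengths))  # 4 5
print(statistics.median(lengths))  # 4
print(statistics.mean(lengths))    # 4.2

# Over the full column: mean length x row count = total characters.
print(round(4.362956 * 1069))      # 4664
```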

Characters and Unicode

Total characters: 4664
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
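The properties mentioned here include the Unicode general category. Python's standard `unicodedata.category` returns its two-letter code, for example `Lu` for Uppercase Letter, `Nd` for Decimal Number, and `Lo` for Other Letter. The sketch below tallies categories over the five sample codes. The mapping to the long names used in this report is written by hand and covers only the categories that occur in this dataset.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# 16 Decimal Number and 5 Uppercase Letter characters
print(category_counts(["A047", "A048", "A050", "A0838", "A084"]))
```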

Unique

Unique: 1069
Unique (%): 100.0%

Sample

1st row: A047
2nd row: A048
3rd row: A050
4th row: A0838
5th row: A084
Most frequent words
Value                Count   Frequency (%)
a047                 1       0.1%
m8619                1       0.1%
n179                 1       0.1%
m8000                1       0.1%
m8088                1       0.1%
m8099                1       0.1%
m8185                1       0.1%
m8189                1       0.1%
m8195                1       0.1%
m8199                1       0.1%
Other values (1059)  1059    99.1%

Most occurring characters

Value              Count   Frequency (%)
0                  591     12.7%
1                  497     10.7%
2                  425     9.1%
9                  422     9.0%
8                  346     7.4%
5                  313     6.7%
3                  278     6.0%
4                  267     5.7%
7                  235     5.0%
6                  221     4.7%
Other values (24)  1069    22.9%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    3595    77.1%
Uppercase Letter  1069    22.9%
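The column contains exactly 1069 uppercase letters, one for each of the 1069 rows, and only these two categories occur. This is consistent with every code being a single uppercase letter followed by digits, but it does not prove it. The sketch below tests that hypothesis. The pattern (one letter, then 2 to 5 ASCII digits, giving the reported length range of 3 to 6) is an assumption, not a documented format.

```python
import re

# Hypothesised code shape: one uppercase Latin letter, then 2-5 ASCII digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{2,5}")

def nonconforming_codes(codes):
    """Return the codes that do not fully match the hypothesised pattern."""
    return [c for c in codes if not CODE_PATTERN.fullmatch(c)]

# Codes taken from the first and last sample rows of this profile.
sample = ["A047", "A048", "A050", "A0838", "A084", "Z952", "U8280", "U99"]
print(nonconforming_codes(sample))  # []
```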

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
M                  139     13.0%
K                  119     11.1%
S                  112     10.5%
C                  90      8.4%
I                  81      7.6%
E                  58      5.4%
W                  53      5.0%
J                  52      4.9%
R                  48      4.5%
D                  38      3.6%
Other values (14)  279     26.1%

Decimal Number
Value   Count   Frequency (%)
0       591     16.4%
1       497     13.8%
2       425     11.8%
9       422     11.7%
8       346     9.6%
5       313     8.7%
3       278     7.7%
4       267     7.4%
7       235     6.5%
6       221     6.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  3595    77.1%
Latin   1069    22.9%

Most frequent character per script

Latin
Value              Count   Frequency (%)
M                  139     13.0%
K                  119     11.1%
S                  112     10.5%
C                  90      8.4%
I                  81      7.6%
E                  58      5.4%
W                  53      5.0%
J                  52      4.9%
R                  48      4.5%
D                  38      3.6%
Other values (14)  279     26.1%

Common
Value   Count   Frequency (%)
0       591     16.4%
1       497     13.8%
2       425     11.8%
9       422     11.7%
8       346     9.6%
5       313     8.7%
3       278     7.7%
4       267     7.4%
7       235     6.5%
6       221     6.1%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   4664    100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
0                  591     12.7%
1                  497     10.7%
2                  425     9.1%
9                  422     9.0%
8                  346     7.4%
5                  313     6.7%
3                  278     6.0%
4                  267     5.7%
7                  235     5.0%
6                  221     4.7%
Other values (24)  1069    22.9%

상병명 (disease name)
Text

UNIQUE 

Distinct: 1069
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Length

Max length: 63
Median length: 44
Mean length: 16.888681
Min length: 2

Characters and Unicode

Total characters: 18054
Distinct characters: 489
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
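For this column the report also groups characters by Unicode block. Python's standard library exposes general categories but not blocks, so the sketch below assigns blocks from hand-written code-point ranges: ASCII, General Punctuation, CJK Unified Ideographs, and Hangul Syllables. The ranges are standard Unicode block boundaries. The short labels, chosen to mirror this report's names, and the "Other" fallback are assumptions, and the result may not match ydata-profiling's own grouping in every case.

```python
from collections import Counter

# (first code point, last code point, label): standard Unicode block ranges,
# with coarse labels chosen to mirror the names used in this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x2000, 0x206F, "Punctuation"),  # General Punctuation
    (0x4E00, 0x9FFF, "CJK"),          # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),       # Hangul Syllables
]

def block_of(ch):
    """Return the coarse block label for one character."""
    cp = ord(ch)
    for start, end, label in BLOCKS:
        if start <= cp <= end:
            return label
    return "Other"

def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for text in values for ch in text)

# 12 Hangul syllables and 6 ASCII characters (3 spaces, '-', '1', '9')
print(block_counts(["상세불명의 코로나-19 이후 병태"]))
```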

Unique

Unique: 1069
Unique (%): 100.0%

Sample

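1st row: 클로스트리듐 디피실리에 의한 장결장염 (enterocolitis due to Clostridium difficile)
2nd row: 기타 명시된 세균성 장감염 (other specified bacterial intestinal infections)
3rd row: 음식매개포도알균중독 (foodborne staphylococcal intoxication)
4th row: 기타 바이러스장염 (other viral enteritis)
5th row: 상세불명의 바이러스성 장감염 (viral intestinal infection, unspecified)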
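Most frequent words
Value                          Count   Frequency (%)
상세불명의 (unspecified)       263     5.9%
기타 (other)                   212     4.8%
[not preserved]                202     4.5%
상세불명 (unspecified)         113     2.5%
신생물 (neoplasm)              107     2.4%
악성 (malignant)               90      2.0%
동반한 (with, accompanied by)  79      1.8%
폐쇄성 (obstructive)           67      1.5%
또는 (or)                      61      1.4%
골절 (fracture)                61      1.4%
Other values (1281)            3201    71.8%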
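The word table appears to count whitespace-separated tokens after lowercasing. The 상병코드 table above lists `a047` rather than `A047`. The sketch below applies that kind of count to the five sample disease names. ydata-profiling's exact tokenisation and any stop-word handling are not shown in this report, so this is an approximation.

```python
from collections import Counter

sample_names = [
    "클로스트리듐 디피실리에 의한 장결장염",
    "기타 명시된 세균성 장감염",
    "음식매개포도알균중독",
    "기타 바이러스장염",
    "상세불명의 바이러스성 장감염",
]

def word_counts(texts):
    """Lowercase each string, split on whitespace, and count the tokens."""
    return Counter(word for text in texts for word in text.lower().split())

counts = word_counts(sample_names)
print(counts["기타"], counts["장감염"])  # 2 2
```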

Most occurring characters

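Value               Count   Frequency (%)
(space)             3387    18.8%
[not preserved]     770     4.3%
[not preserved]     559     3.1%
[not preserved]     481     2.7%
[not preserved]     437     2.4%
[not preserved]     400     2.2%
[not preserved]     399     2.2%
,                   355     2.0%
[not preserved]     297     1.6%
[not preserved]     231     1.3%
Other values (479)  10738   59.5%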

Most occurring categories

Value              Count   Frequency (%)
Other Letter       13872   76.8%
Space Separator    3387    18.8%
Other Punctuation  404     2.2%
Decimal Number     166     0.9%
Close Punctuation  68      0.4%
Open Punctuation   68      0.4%
Uppercase Letter   62      0.3%
Dash Punctuation   26      0.1%
Math Symbol        1       < 0.1%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
[not preserved]     770     5.6%
[not preserved]     559     4.0%
[not preserved]     481     3.5%
[not preserved]     437     3.2%
[not preserved]     400     2.9%
[not preserved]     399     2.9%
[not preserved]     297     2.1%
[not preserved]     231     1.7%
[not preserved]     229     1.7%
[not preserved]     225     1.6%
Other values (442)  9844    71.0%

Uppercase Letter
Value             Count   Frequency (%)
G                 8       12.9%
B                 7       11.3%
L                 7       11.3%
E                 6       9.7%
T                 6       9.7%
N                 6       9.7%
I                 4       6.5%
S                 3       4.8%
H                 3       4.8%
M                 3       4.8%
Other values (6)  9       14.5%

Decimal Number
Value   Count   Frequency (%)
2       41      24.7%
1       36      21.7%
3       21      12.7%
0       19      11.4%
9       16      9.6%
4       11      6.6%
5       8       4.8%
8       6       3.6%
6       4       2.4%
7       4       2.4%

Other Punctuation
Value            Count   Frequency (%)
,                355     87.9%
.                26      6.4%
*                16      4.0%
[not preserved]  7       1.7%

Close Punctuation
Value   Count   Frequency (%)
)       45      66.2%
]       23      33.8%

Open Punctuation
Value   Count   Frequency (%)
(       45      66.2%
[       23      33.8%

Space Separator
Value    Count   Frequency (%)
(space)  3387    100.0%

Dash Punctuation
Value   Count   Frequency (%)
-       26      100.0%

Math Symbol
Value   Count   Frequency (%)
+       1       100.0%

Most occurring scripts

Value   Count   Frequency (%)
Hangul  13867   76.8%
Common  4120    22.8%
Latin   62      0.3%
Han     5       < 0.1%
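The category and script tables for 상병명 reconcile exactly with the 18054 total characters. The arithmetic check below uses only numbers reported in this section:

- Common-script characters are exactly the non-letter categories.
- Latin matches the uppercase letters.
- Other Letter splits into Hangul plus the five Han characters.

```python
# Counts copied from the 상병명 category and script tables.
categories = {
    "Other Letter": 13872, "Space Separator": 3387, "Other Punctuation": 404,
    "Decimal Number": 166, "Close Punctuation": 68, "Open Punctuation": 68,
    "Uppercase Letter": 62, "Dash Punctuation": 26, "Math Symbol": 1,
}
scripts = {"Hangul": 13867, "Common": 4120, "Latin": 62, "Han": 5}

total = sum(categories.values())
non_letters = total - categories["Other Letter"] - categories["Uppercase Letter"]

print(total == sum(scripts.values()) == 18054)                           # True
print(non_letters == scripts["Common"])                                  # True
print(categories["Other Letter"] == scripts["Hangul"] + scripts["Han"])  # True
print(categories["Uppercase Letter"] == scripts["Latin"])                # True
```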

Most frequent character per script

Hangul
Value               Count   Frequency (%)
[not preserved]     770     5.6%
[not preserved]     559     4.0%
[not preserved]     481     3.5%
[not preserved]     437     3.2%
[not preserved]     400     2.9%
[not preserved]     399     2.9%
[not preserved]     297     2.1%
[not preserved]     231     1.7%
[not preserved]     229     1.7%
[not preserved]     225     1.6%
Other values (437)  9839    71.0%

Common
Value              Count   Frequency (%)
(space)            3387    82.2%
,                  355     8.6%
)                  45      1.1%
(                  45      1.1%
2                  41      1.0%
1                  36      0.9%
-                  26      0.6%
.                  26      0.6%
]                  23      0.6%
[                  23      0.6%
Other values (11)  113     2.7%

Latin
Value             Count   Frequency (%)
G                 8       12.9%
B                 7       11.3%
L                 7       11.3%
E                 6       9.7%
T                 6       9.7%
N                 6       9.7%
I                 4       6.5%
S                 3       4.8%
H                 3       4.8%
M                 3       4.8%
Other values (6)  9       14.5%

Han
Value            Count   Frequency (%)
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%

Most occurring blocks

Value        Count   Frequency (%)
Hangul       13867   76.8%
ASCII        4175    23.1%
Punctuation  7       < 0.1%
CJK          5       < 0.1%

Most frequent character per block

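ASCII
Value              Count   Frequency (%)
(space)            3387    81.1%
,                  355     8.5%
)                  45      1.1%
(                  45      1.1%
2                  41      1.0%
1                  36      0.9%
-                  26      0.6%
.                  26      0.6%
]                  23      0.6%
[                  23      0.6%
Other values (26)  168     4.0%

Hangul
Value               Count   Frequency (%)
[not preserved]     770     5.6%
[not preserved]     559     4.0%
[not preserved]     481     3.5%
[not preserved]     437     3.2%
[not preserved]     400     2.9%
[not preserved]     399     2.9%
[not preserved]     297     2.1%
[not preserved]     231     1.7%
[not preserved]     229     1.7%
[not preserved]     225     1.6%
Other values (437)  9839    71.0%

Punctuation
Value            Count   Frequency (%)
[not preserved]  7       100.0%

CJK
Value            Count   Frequency (%)
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%
[not preserved]  1       20.0%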

국비 (state-funded)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 64
Distinct (%): 8.0%
Missing: 268
Missing (%): 25.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.7178527
Minimum: 1
Maximum: 734
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 5
95th percentile: 33
Maximum: 734
Range: 733
Interquartile range (IQR): 4
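The quantiles here are interpolated between order statistics, which is why a percentile can fall between observed integers. For example, the 95th percentile of 사비 is 20.25. The sketch below assumes linear interpolation, the default of pandas `Series.quantile`. Python's `statistics.quantiles(..., method="inclusive")` uses the same rule. The data are hypothetical toy values, not the 국비 column.

```python
import statistics

def quartiles_and_iqr(values):
    """Q1, median, Q3 and IQR using linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3, q3 - q1

# Hypothetical toy data with a long right tail, loosely shaped like 국비.
toy = [1, 1, 1, 2, 2, 3, 5, 8, 13]
print(quartiles_and_iqr(toy))  # (1.0, 2.0, 5.0, 4.0)
```

In 국비, 348 of the 801 non-missing values are 1, which is more than a quarter. As a result, both the 5th percentile and Q1 land on 1.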

Descriptive statistics

Standard deviation: 43.965672
Coefficient of variation (CV): 4.5242167
Kurtosis: 192.762
Mean: 9.7178527
Median Absolute Deviation (MAD): 1
Skewness: 12.996441
Sum: 7784
Variance: 1932.9803
Monotonicity: Not monotonic
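Several of the figures above are derived from simpler ones:

- The mean is the sum divided by the non-missing count (1069 - 268 = 801).
- Distinct (%) is taken over those 801 values.
- The coefficient of variation is the standard deviation divided by the mean.
- The variance is the square of the standard deviation.

The sketch below rebuilds them from the reported inputs. The function name `derived_stats` is illustrative.

```python
def derived_stats(total, n_rows, n_missing, n_distinct, std):
    """Rebuild summary figures that the report derives from simpler ones."""
    n = n_rows - n_missing  # non-missing observations
    mean = total / n
    return {
        "n": n,
        "mean": mean,
        "distinct_pct": 100 * n_distinct / n,
        "cv": std / mean,  # coefficient of variation
        "variance": std ** 2,
    }

# 국비 figures reported above.
s = derived_stats(total=7784, n_rows=1069, n_missing=268, n_distinct=64, std=43.965672)
print(s["n"], round(s["mean"], 4), round(s["distinct_pct"], 1))  # 801 9.7179 8.0
print(round(s["cv"], 4), round(s["variance"], 1))                 # 4.5242 1933.0
```

The same call with the 사비 figures (sum 4675, 393 missing, 45 distinct, standard deviation 31.135931) gives 676 observations, a mean of about 6.9157, and a distinct share of about 6.7%.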
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value              Count   Frequency (%)
1                  348     32.6%
2                  107     10.0%
3                  68      6.4%
4                  45      4.2%
5                  41      3.8%
7                  23      2.2%
8                  18      1.7%
6                  14      1.3%
9                  13      1.2%
11                 10      0.9%
Other values (54)  114     10.7%
(Missing)          268     25.1%
Minimum 10 values
Value   Count   Frequency (%)
1       348     32.6%
2       107     10.0%
3       68      6.4%
4       45      4.2%
5       41      3.8%
6       14      1.3%
7       23      2.2%
8       18      1.7%
9       13      1.2%
10      10      0.9%
Maximum 10 values
Value   Count   Frequency (%)
734     1       0.1%
694     1       0.1%
545     1       0.1%
228     1       0.1%
161     1       0.1%
156     1       0.1%
132     1       0.1%
125     1       0.1%
119     1       0.1%
107     1       0.1%

사비 (self-paid)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 45
Distinct (%): 6.7%
Missing: 393
Missing (%): 36.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.9156805
Minimum: 1
Maximum: 569
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 4
95th percentile: 20.25
Maximum: 569
Range: 568
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 31.135931
Coefficient of variation (CV): 4.5022224
Kurtosis: 219.79712
Mean: 6.9156805
Median Absolute Deviation (MAD): 1
Skewness: 13.894564
Sum: 4675
Variance: 969.44621
Monotonicity: Not monotonic
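The very large skewness (13.89) and kurtosis (219.8) reflect a few extreme counts sitting on top of a mass of 1s and 2s. The sketch below computes the bias-adjusted sample skewness (G1) and excess kurtosis (G2). It assumes these match the pandas `Series.skew()` and `Series.kurt()` definitions, which the report does not state. Only toy data are used.

```python
import math

def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(round(skewness([1, 2, 3, 4]), 6))         # 0.0 (symmetric)
print(round(excess_kurtosis([1, 2, 3, 4]), 6))  # -1.2
print(skewness([1, 1, 1, 2, 2, 3, 50]) > 0)     # True: long right tail
```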
[Figure: Histogram with fixed size bins (bins=45)]
Common values
Value              Count   Frequency (%)
1                  316     29.6%
2                  116     10.9%
3                  48      4.5%
4                  28      2.6%
5                  27      2.5%
6                  19      1.8%
9                  15      1.4%
7                  14      1.3%
10                 9       0.8%
11                 8       0.7%
Other values (35)  76      7.1%
(Missing)          393     36.8%
Minimum 10 values
Value   Count   Frequency (%)
1       316     29.6%
2       116     10.9%
3       48      4.5%
4       28      2.6%
5       27      2.5%
6       19      1.8%
7       14      1.3%
8       5       0.5%
9       15      1.4%
10      9       0.8%
Maximum 10 values
Value   Count   Frequency (%)
569     1       0.1%
438     1       0.1%
275     1       0.1%
148     1       0.1%
102     1       0.1%
91      1       0.1%
69      1       0.1%
61      1       0.1%
58      1       0.1%
56      1       0.1%

Interactions

[Figure: pairwise interaction plots of 국비 and 사비]

Correlations

[Figure: correlation heatmap]
       국비    사비
국비   1.000   0.931
사비   0.931   1.000

[Figure: correlation heatmap]
       국비    사비
국비   1.000   0.659
사비   0.659   1.000
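The two matrices have lost their method labels, so this page does not show which coefficient produced 0.931 and which produced 0.659. To illustrate why two methods can disagree on the same columns, the sketch below computes Pearson and Spearman coefficients on rows where both values are present. Handling missing values by pairwise deletion is an assumption. Spearman is Pearson applied to ranks, with tied values given their average rank. The toy columns are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def complete_pairs(xs, ys):
    """Keep only rows where neither value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Hypothetical toy columns with missing values, not the real data.
a = [1, 2, None, 3, 4, 40]
b = [1, 3, 2, None, 5, 6]
x, y = complete_pairs(a, b)
print(x, y)                      # [1, 2, 4, 40] [1, 3, 5, 6]
print(round(spearman(x, y), 3))  # 1.0
print(pearson(x, y) < 1)         # True
```

A single extreme value such as 40 pulls Pearson well below 1 while leaving the rank order, and therefore Spearman, intact.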

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
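Nullity correlation is commonly defined as the Pearson correlation between the 0/1 missingness indicators of two columns. The missingno package's heatmap uses this definition, and this sketch assumes the report does too. The example uses the missingness pattern of 국비 and 사비 in the first ten sample rows, where `<NA>` becomes `None`.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return float("nan")  # a column that is never (or always) missing has no variance
    return cov / (sa * sb)

# 국비 and 사비 for the first ten sample rows (None = <NA>).
gukbi = [2, 1, None, None, 2, 7, 25, None, None, None]
sabi = [10, None, 1, 1, 9, 9, 16, 1, 1, 3]
print(round(nullity_correlation(gukbi, sabi), 4))  # -0.3333
```

In these ten rows the two columns are never missing together, so the correlation is negative.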

Sample

First rows
Row | 상병코드 | 상병명 | 국비 | 사비
0 | A047 | 클로스트리듐 디피실리에 의한 장결장염 | 2 | 10
1 | A048 | 기타 명시된 세균성 장감염 | 1 | <NA>
2 | A050 | 음식매개포도알균중독 | <NA> | 1
3 | A0838 | 기타 바이러스장염 | <NA> | 1
4 | A084 | 상세불명의 바이러스성 장감염 | 2 | 9
5 | A090 | 감염성 기원의 기타 및 상세불명의 위장염 및 결장염 | 7 | 9
6 | A099 | 상세불명 기원의 위장염 및 결장염 | 25 | 16
7 | A1501 | 배양 유무에 관계없이 가래 현미경 검사로 확인된 공동이 없거나 상세불명의 폐결핵 | <NA> | 1
8 | A1591 | 세균학적 및 조직학적으로 확인된 공동이 없거나 상세불명의 상세불명의 호흡기결핵 | <NA> | 1
9 | A1651 | 세균학적 또는 조직학적 확인에 대한 언급이 없는 공동이 없거나 상세불명의 결핵성 흉막염 | <NA> | 3

Last rows
Row | 상병코드 | 상병명 | 국비 | 사비
1059 | Z952 | 인공심장판막의 존재 | 2 | <NA>
1060 | Z955 | 관상동맥혈관성형 삽입물 및 이식편의 존재 | 48 | 7
1061 | Z961 | 안구내렌즈의 존재 | 41 | 28
1062 | Z988 | 기타 명시된 수술후 상태 | 5 | 6
1063 | U071 | 바이러스가 확인된 코로나바이러스 질환 2019 [바이러스가 확인된 코로나-19] | 132 | 569
1064 | U072 | 바이러스가 확인되지 않은 코로나바이러스 질환 2019 [바이러스가 확인되지 않은 코로나-19] | 1 | 3
1065 | U089 | 상세불명의 코로나-19의 개인력 | 18 | 12
1066 | U099 | 상세불명의 코로나-19 이후 병태 | 1 | 2
1067 | U8280 | 카바페넴계내성 | 3 | <NA>
1068 | U99 | 재발한 악성 신생물 | 514 (국비 and 사비 digits run together; split not recoverable)
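In the sample, `<NA>` marks a missing 국비 or 사비 value. The loader below is a minimal sketch built on two assumptions that this report does not confirm: the downloaded file is a CSV with the header `상병코드,상병명,국비,사비`, and missing counts are stored as empty fields. The loader parses the counts to `int` and blank fields to `None`.

```python
import csv
import io

def load_rows(text_stream):
    """Parse rows, converting 국비/사비 to int and empty fields to None."""
    rows = []
    for record in csv.DictReader(text_stream):
        for column in ("국비", "사비"):
            raw = record[column].strip()
            record[column] = int(raw) if raw else None
        rows.append(record)
    return rows

# Hypothetical in-memory file holding two rows from the Sample section.
demo = io.StringIO(
    "상병코드,상병명,국비,사비\n"
    "A048,기타 명시된 세균성 장감염,1,\n"
    "U071,바이러스가 확인된 코로나바이러스 질환 2019 [바이러스가 확인된 코로나-19],132,569\n"
)
rows = load_rows(demo)
missing_gukbi = sum(r["국비"] is None for r in rows)
missing_sabi = sum(r["사비"] is None for r in rows)
print(missing_gukbi, missing_sabi)  # 0 1
```

For a real file, pass the stream from `open(path, newline="", encoding=...)`. The correct encoding for this dataset is not stated here.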