Overview

Dataset statistics

Number of variables: 4
Number of observations: 220
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.4 KiB
Average record size in memory: 34.6 B

Variable types

Text: 2
Numeric: 2

Dataset

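Description: Primary diagnoses of patients who visited Naju National Hospital (국립나주병원), Ministry of Health and Welfare. Each record consists of a diagnosis code, a diagnosis name, the number of outpatients given that diagnosis, and the number of inpatients given that diagnosis.
Author: Naju National Hospital, Ministry of Health and Welfare (보건복지부 국립나주병원)
URL: https://www.data.go.kr/data/15042749/fileData.do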

Alerts

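인원수(외래) (number of outpatients) is highly correlated overall with 인원수(입원) (number of inpatients) [High correlation]
인원수(입원) (number of inpatients) is highly correlated overall with 인원수(외래) (number of outpatients) [High correlation]
진단코드 (diagnosis code) has unique values [Unique]
진단명 (diagnosis name) has unique values [Unique]
인원수(입원) (number of inpatients) has 185 (84.1%) zeros [Zeros]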

Reproduction

Analysis started: 2024-04-06 08:22:33.213187
Analysis finished: 2024-04-06 08:22:34.873291
Duration: 1.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
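
The reported duration is the difference between the two analysis timestamps. As a minimal sketch using only the Python standard library (the variable names are illustrative and not part of the report), the arithmetic can be checked like this:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2024-04-06 08:22:33.213187")
finished = datetime.fromisoformat("2024-04-06 08:22:34.873291")

# Subtracting two datetimes yields a timedelta; convert it to seconds.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")  # prints "1.66 seconds"
```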

Variables

진단코드 (diagnosis code)
Text

UNIQUE

Distinct: 220
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Length

Max length: 5
Median length: 4
Mean length: 4.0636364
Min length: 3

Characters and Unicode

Total characters: 894
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
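
As a minimal sketch of what these character properties look like in practice, the following uses Python's standard `unicodedata` module to count the Unicode general category of every character in the five sample codes listed in this report. The helper name `category_counts` is an assumption made for illustration; the report does not state how ydata-profiling computes its own counts.

```python
import unicodedata
from collections import Counter

# The five sample values of 진단코드 shown in this report.
sample_codes = ["E031", "E079", "E139", "E784", "E785"]


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(sample_codes)
# 'Lu' is Uppercase Letter and 'Nd' is Decimal Number, the two
# categories the report lists for this variable.
for category, count in sorted(counts.items()):
    print(category, count)
```

Each sample code is one uppercase letter followed by three digits, so only two categories appear, which agrees with the "Distinct categories" figure for this variable.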

Unique

Unique: 220
Unique (%): 100.0%

Sample

1st row: E031
2nd row: E079
3rd row: E139
4th row: E784
5th row: E785

Words

Value | Count | Frequency (%)
e031 | 1 | 0.5%
f912 | 1 | 0.5%
g218 | 1 | 0.5%
f781 | 1 | 0.5%
f788 | 1 | 0.5%
f798 | 1 | 0.5%
f840 | 1 | 0.5%
f841 | 1 | 0.5%
f845 | 1 | 0.5%
f849 | 1 | 0.5%
Other values (210) | 210 | 95.5%

Most occurring characters

Value | Count | Frequency (%)
F | 158 | 17.7%
0 | 133 | 14.9%
1 | 90 | 10.1%
3 | 80 | 8.9%
2 | 79 | 8.8%
9 | 67 | 7.5%
4 | 60 | 6.7%
8 | 56 | 6.3%
5 | 42 | 4.7%
7 | 38 | 4.3%
Other values (11) | 91 | 10.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 674 | 75.4%
Uppercase Letter | 220 | 24.6%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
F | 158 | 71.8%
G | 21 | 9.5%
Z | 13 | 5.9%
K | 8 | 3.6%
E | 5 | 2.3%
I | 4 | 1.8%
R | 4 | 1.8%
J | 3 | 1.4%
H | 2 | 0.9%
M | 1 | 0.5%

Decimal Number
Value | Count | Frequency (%)
0 | 133 | 19.7%
1 | 90 | 13.4%
3 | 80 | 11.9%
2 | 79 | 11.7%
9 | 67 | 9.9%
4 | 60 | 8.9%
8 | 56 | 8.3%
5 | 42 | 6.2%
7 | 38 | 5.6%
6 | 29 | 4.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 674 | 75.4%
Latin | 220 | 24.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 158 | 71.8%
G | 21 | 9.5%
Z | 13 | 5.9%
K | 8 | 3.6%
E | 5 | 2.3%
I | 4 | 1.8%
R | 4 | 1.8%
J | 3 | 1.4%
H | 2 | 0.9%
M | 1 | 0.5%

Common
Value | Count | Frequency (%)
0 | 133 | 19.7%
1 | 90 | 13.4%
3 | 80 | 11.9%
2 | 79 | 11.7%
9 | 67 | 9.9%
4 | 60 | 8.9%
8 | 56 | 8.3%
5 | 42 | 6.2%
7 | 38 | 5.6%
6 | 29 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 894 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 158 | 17.7%
0 | 133 | 14.9%
1 | 90 | 10.1%
3 | 80 | 8.9%
2 | 79 | 8.8%
9 | 67 | 7.5%
4 | 60 | 6.7%
8 | 56 | 6.3%
5 | 42 | 4.7%
7 | 38 | 4.3%
Other values (11) | 91 | 10.2%

진단명 (diagnosis name)
Text

UNIQUE

Distinct: 220
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Length

Max length: 143
Median length: 75
Mean length: 41.909091
Min length: 8

Characters and Unicode

Total characters: 9220
Distinct characters: 64
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
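
For this variable the less common categories come from a handful of non-ASCII characters. The sketch below, which assumes a hypothetical helper named `non_ascii_characters`, uses Python's standard `unicodedata` module to list such characters in two diagnosis names taken from the sample rows of this report. It is an illustration of the idea, not the method the profiling tool uses.

```python
import unicodedata

# Two diagnosis names copied from the sample rows of this report.
sample_names = [
    "Dementia in Alzheimer’s disease with early onset(G30.0†)",
    "Dementia in Alzheimer´s disease/unspecified(G30.9†)",
]


def non_ascii_characters(text):
    """Return (character, general category, Unicode name) for non-ASCII characters."""
    return [
        (ch, unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
        if ord(ch) > 127
    ]


for name in sample_names:
    print(non_ascii_characters(name))
```

The first name contains a typographic apostrophe (category `Pf`, Final Punctuation), while the second uses an acute accent in the same position (category `Sk`, Modifier Symbol). That inconsistency in the source data explains the single Modifier Symbol character counted for this variable.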

Unique

Unique: 220
Unique (%): 100.0%

Sample

1st row: Congenital hypothyroidism without goitre
2nd row: Disorder of thyroid/unspecified
3rd row: Other specified diabetes mellitus/without complications
4th row: Other hyperlipidaemia
5th row: Hyperlipidaemia/unspecified

Words

Value | Count | Frequency (%)
of | 49 | 4.7%
other | 49 | 4.7%
disorder | 31 | 3.0%
and | 29 | 2.8%
with | 29 | 2.8%
mental | 22 | 2.1%
disorders | 21 | 2.0%
behaviour | 17 | 1.6%
without | 17 | 1.6%
retardation | 15 | 1.4%
Other values (327) | 756 | 73.0%

Most occurring characters

Value | Count | Frequency (%)
e | 1013 | 11.0%
i | 878 | 9.5%
(space) | 815 | 8.8%
r | 620 | 6.7%
t | 611 | 6.6%
o | 597 | 6.5%
s | 586 | 6.4%
a | 510 | 5.5%
n | 486 | 5.3%
d | 460 | 5.0%
Other values (54) | 2644 | 28.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 7986 | 86.6%
Space Separator | 815 | 8.8%
Uppercase Letter | 241 | 2.6%
Other Punctuation | 84 | 0.9%
Open Punctuation | 27 | 0.3%
Close Punctuation | 27 | 0.3%
Decimal Number | 19 | 0.2%
Dash Punctuation | 13 | 0.1%
Final Punctuation | 7 | 0.1%
Modifier Symbol | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1013 | 12.7%
i | 878 | 11.0%
r | 620 | 7.8%
t | 611 | 7.7%
o | 597 | 7.5%
s | 586 | 7.3%
a | 510 | 6.4%
n | 486 | 6.1%
d | 460 | 5.8%
c | 323 | 4.0%
Other values (16) | 1902 | 23.8%

Uppercase Letter
Value | Count | Frequency (%)
O | 42 | 17.4%
A | 29 | 12.0%
D | 22 | 9.1%
S | 22 | 9.1%
M | 18 | 7.5%
P | 17 | 7.1%
C | 13 | 5.4%
B | 12 | 5.0%
G | 11 | 4.6%
H | 10 | 4.1%
Other values (10) | 45 | 18.7%

Decimal Number
Value | Count | Frequency (%)
0 | 6 | 31.6%
3 | 5 | 26.3%
8 | 3 | 15.8%
1 | 2 | 10.5%
2 | 2 | 10.5%
9 | 1 | 5.3%

Other Punctuation
Value | Count | Frequency (%)
/ | 72 | 85.7%
. | 6 | 7.1%
† | 5 | 6.0%
* | 1 | 1.2%

Open Punctuation
Value | Count | Frequency (%)
( | 17 | 63.0%
[ | 10 | 37.0%

Close Punctuation
Value | Count | Frequency (%)
) | 17 | 63.0%
] | 10 | 37.0%

Space Separator
Value | Count | Frequency (%)
(space) | 815 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
’ | 7 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
´ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8227 | 89.2%
Common | 993 | 10.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 1013 | 12.3%
i | 878 | 10.7%
r | 620 | 7.5%
t | 611 | 7.4%
o | 597 | 7.3%
s | 586 | 7.1%
a | 510 | 6.2%
n | 486 | 5.9%
d | 460 | 5.6%
c | 323 | 3.9%
Other values (36) | 2143 | 26.0%

Common
Value | Count | Frequency (%)
(space) | 815 | 82.1%
/ | 72 | 7.3%
( | 17 | 1.7%
) | 17 | 1.7%
- | 13 | 1.3%
] | 10 | 1.0%
[ | 10 | 1.0%
’ | 7 | 0.7%
. | 6 | 0.6%
0 | 6 | 0.6%
Other values (8) | 20 | 2.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9207 | 99.9%
Punctuation | 12 | 0.1%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 1013 | 11.0%
i | 878 | 9.5%
(space) | 815 | 8.9%
r | 620 | 6.7%
t | 611 | 6.6%
o | 597 | 6.5%
s | 586 | 6.4%
a | 510 | 5.5%
n | 486 | 5.3%
d | 460 | 5.0%
Other values (51) | 2631 | 28.6%

Punctuation
Value | Count | Frequency (%)
’ | 7 | 58.3%
† | 5 | 41.7%

None
Value | Count | Frequency (%)
´ | 1 | 100.0%

인원수(외래) (number of outpatients)
Real number (ℝ)

HIGH CORRELATION

Distinct: 40
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.645455
Minimum: 1
Maximum: 382
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 8
95th percentile: 62
Maximum: 382
Range: 381
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 33.408523
Coefficient of variation (CV): 2.6419393
Kurtosis: 70.076797
Mean: 12.645455
Median Absolute Deviation (MAD): 1
Skewness: 7.1742011
Sum: 2782
Variance: 1116.1294
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)

Common values

Value | Count | Frequency (%)
1 | 87 | 39.5%
2 | 24 | 10.9%
3 | 16 | 7.3%
4 | 14 | 6.4%
6 | 10 | 4.5%
5 | 7 | 3.2%
14 | 6 | 2.7%
8 | 5 | 2.3%
7 | 5 | 2.3%
11 | 3 | 1.4%
Other values (30) | 43 | 19.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 87 | 39.5%
2 | 24 | 10.9%
3 | 16 | 7.3%
4 | 14 | 6.4%
5 | 7 | 3.2%
6 | 10 | 4.5%
7 | 5 | 2.3%
8 | 5 | 2.3%
10 | 3 | 1.4%
11 | 3 | 1.4%

Maximum 10 values

Value | Count | Frequency (%)
382 | 1 | 0.5%
152 | 1 | 0.5%
113 | 1 | 0.5%
100 | 2 | 0.9%
98 | 2 | 0.9%
97 | 1 | 0.5%
68 | 1 | 0.5%
64 | 1 | 0.5%
62 | 2 | 0.9%
59 | 1 | 0.5%
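
Several of the summary figures for 인원수(외래) are derived from one another, so they can be cross-checked. The following is a minimal sketch using only values printed in this report; the dictionary name `checks` and the tolerance are choices made for illustration.

```python
import math

# Values copied from the statistics reported for 인원수(외래).
n = 220                # number of observations
total = 2782           # Sum
mean = 12.645455       # Mean
std = 33.408523        # Standard deviation
variance = 1116.1294   # Variance
cv = 2.6419393         # Coefficient of variation

# The reported figures are rounded, so compare with a relative tolerance.
checks = {
    "mean == sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance == std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv == std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "range == max - min": 382 - 1 == 381,
    "iqr == q3 - q1": 8 - 1 == 7,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All five relations hold for the reported numbers. A coefficient of variation above 2 and a median of 2 against a mean of about 12.6 both reflect how strongly a few high-count diagnoses dominate the distribution.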

인원수(입원) (number of inpatients)
Real number (ℝ)

HIGH CORRELATION  ZEROS

Distinct: 10
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49545455
Minimum: 0
Maximum: 22
Zeros: 185
Zeros (%): 84.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 3
Maximum: 22
Range: 22
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.9479476
Coefficient of variation (CV): 3.9316373
Kurtosis: 72.6008
Mean: 0.49545455
Median Absolute Deviation (MAD): 0
Skewness: 7.5629252
Sum: 109
Variance: 3.7944998
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
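
Common values

Value | Count | Frequency (%)
0 | 185 | 84.1%
1 | 16 | 7.3%
2 | 7 | 3.2%
3 | 3 | 1.4%
4 | 3 | 1.4%
6 | 2 | 0.9%
22 | 1 | 0.5%
5 | 1 | 0.5%
8 | 1 | 0.5%
11 | 1 | 0.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 185 | 84.1%
1 | 16 | 7.3%
2 | 7 | 3.2%
3 | 3 | 1.4%
4 | 3 | 1.4%
5 | 1 | 0.5%
6 | 2 | 0.9%
8 | 1 | 0.5%
11 | 1 | 0.5%
22 | 1 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
22 | 1 | 0.5%
11 | 1 | 0.5%
8 | 1 | 0.5%
6 | 2 | 0.9%
5 | 1 | 0.5%
4 | 3 | 1.4%
3 | 3 | 1.4%
2 | 7 | 3.2%
1 | 16 | 7.3%
0 | 185 | 84.1%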
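
Because 인원수(입원) has only 10 distinct values, the frequency table above is complete, and the descriptive statistics can be recomputed from it directly. This is a minimal sketch with the standard library; the variable names are illustrative, and the use of the sample variance (dividing by n - 1) is inferred from the fact that it reproduces the reported figure.

```python
import math

# Complete frequency table for 인원수(입원) from this report: value -> count.
freq = {0: 185, 1: 16, 2: 7, 3: 3, 4: 3, 5: 1, 6: 2, 8: 1, 11: 1, 22: 1}

n = sum(freq.values())                           # number of observations
total = sum(v * c for v, c in freq.items())      # Sum
mean = total / n                                 # Mean

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)                        # Standard deviation
cv = std / mean                                  # Coefficient of variation
zeros_pct = 100 * freq[0] / n                    # Zeros (%)

print(n, total)
print(round(mean, 8), round(variance, 7), round(std, 7), round(cv, 7))
print(round(zeros_pct, 1))
```

The results agree with the reported Sum (109), Mean (0.49545455), Variance (3.7944998), Standard deviation (1.9479476), CV (3.9316373), and Zeros (84.1%).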

Interactions

[Figures: pairwise scatter plots of 인원수(외래) and 인원수(입원)]

Correlations

 | 인원수(외래) | 인원수(입원)
인원수(외래) | 1.000 | 0.748
인원수(입원) | 0.748 | 1.000

 | 인원수(외래) | 인원수(입원)
인원수(외래) | 1.000 | 0.511
인원수(입원) | 0.511 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Row | 진단코드 | 진단명 | 인원수(외래) | 인원수(입원)
0 | E031 | Congenital hypothyroidism without goitre | 1 | 0
1 | E079 | Disorder of thyroid/unspecified | 1 | 0
2 | E139 | Other specified diabetes mellitus/without complications | 1 | 0
3 | E784 | Other hyperlipidaemia | 1 | 0
4 | E785 | Hyperlipidaemia/unspecified | 4 | 0
5 | F000 | Dementia in Alzheimer’s disease with early onset(G30.0†) | 1 | 0
6 | F001 | Dementia in Alzheimer’s disease with late onset(G30.1†) | 5 | 0
7 | F002 | Dementia in Alzheimer’s disease/atypical or mixed type(G30.8†) | 2 | 0
8 | F009 | Dementia in Alzheimer´s disease/unspecified(G30.9†) | 24 | 0
9 | F019 | Vascular dementia/unspecified | 1 | 0

Last rows

Row | 진단코드 | 진단명 | 인원수(외래) | 인원수(입원)
210 | Z246 | Need for immunization against viral hepatitis | 1 | 0
211 | Z559 | Problems related to education and literacy/unspecified | 1 | 0
212 | Z637 | Other stressful life events affecting family and household | 1 | 0
213 | Z718 | Persons encountering health services for other specified counselling | 1 | 0
214 | Z721 | Problems related to alcohol use | 4 | 0
215 | Z7280 | Problems related to using internet | 1 | 0
216 | Z7288 | Other problems related to lifestyle | 1 | 0
217 | Z730 | Problems related to burn-out | 1 | 0
218 | Z768 | Persons encountering health services in other specified circumstances | 14 | 0
219 | Z810 | Family history of mental retardation | 1 | 0