Overview

Dataset statistics

Number of variables: 6
Number of observations: 62
Missing cells: 78
Missing cells (%): 21.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.3 KiB
Average record size in memory: 54.1 B
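
The missing-cell percentage can be checked from the counts in this overview: the total number of cells is variables times observations, and the 78 missing cells are the 39 missing values in each of the two count columns listed under Alerts. A minimal sketch of that arithmetic, using only figures reported here (the variable names are illustrative):

```python
# Figures taken from the Dataset statistics and Alerts sections of this report.
n_variables = 6
n_observations = 62
missing_cells = 39 + 39  # 39 missing in 건수 and 39 missing in 건수.1

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```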

Variable types

Categorical: 3
Text: 1
Numeric: 2

Dataset

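Description: Weekly incidence of class 1 to 3 notifiable infectious diseases reported in Gangwon Special Self-Governing Province (major diseases include Middle East respiratory syndrome, chickenpox, measles, cholera, and typhoid fever)
URL: https://www.data.go.kr/data/15064351/fileData.do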

Alerts

해당연도 has constant value "2022" (Constant)
해당연도.1 has constant value "2021" (Constant)
건수 is highly overall correlated with 건수.1 (High correlation)
건수.1 is highly overall correlated with 건수 (High correlation)
건수 has 39 (62.9%) missing values (Missing)
건수.1 has 39 (62.9%) missing values (Missing)
감염병명 has unique values (Unique)
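
The Constant, Unique, and Missing alerts describe simple per-column properties. The sketch below shows one way such flags could be derived, applied to the last ten rows shown in the Sample section (reduced to three columns, with `None` standing in for `<NA>`). The helper `column_flags` is hypothetical and written for illustration; it is not how ydata-profiling implements its alerts.

```python
# Last ten rows of the Sample section: (감염병명, 해당연도, 건수).
rows = [
    ("황열", "2022", None),
    ("뎅기열", "2022", 4),
    ("큐열", "2022", 1),
    ("웨스트나일열", "2022", None),
    ("라임병", "2022", None),
    ("진드기매개뇌염", "2022", None),
    ("유비저", "2022", None),
    ("치쿤구니아열", "2022", None),
    ("중증열성혈소판감소증후군", "2022", 28),
    ("지카바이러스감염증", "2022", None),
]
columns = {
    "감염병명": [r[0] for r in rows],
    "해당연도": [r[1] for r in rows],
    "건수": [r[2] for r in rows],
}


def column_flags(values):
    """Return illustrative alert labels for one column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    flags = []
    if len(distinct) == 1:
        flags.append("Constant")  # a single distinct non-missing value
    if present and len(distinct) == len(values):
        flags.append("Unique")    # no missing values and no repeats
    if len(present) < len(values):
        flags.append("Missing")   # at least one missing value
    return flags


for name, values in columns.items():
    print(name, column_flags(values))
```

On this ten-row excerpt the flags agree with the alerts reported for the full dataset.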

Reproduction

Analysis started: 2023-12-12 04:25:35.706455
Analysis finished: 2023-12-12 04:25:36.686698
Duration: 0.98 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

질병군 (disease class)
Categorical

Distinct: 3
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Memory size: 628.0 B

3급: 25
2급: 22
1급: 15

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1급
2nd row: 1급
3rd row: 1급
4th row: 1급
5th row: 1급

Common Values

Value | Count | Frequency (%)
3급 | 25 | 40.3%
2급 | 22 | 35.5%
1급 | 15 | 24.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; the counts are the same as in the Common Values table]

감염병명 (disease name)
Text

UNIQUE

Distinct: 62
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 628.0 B

Length

Max length: 12
Median length: 9
Mean length: 5.3225806
Min length: 2

Characters and Unicode

Total characters: 330
Distinct characters: 149
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
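
As a rough illustration of that idea, Python's standard `unicodedata` module exposes the General_Category of each code point. The sketch below counts categories for one value that appears in this column's word table, `b형간염`. The mapping to long names covers only the eight categories listed in this report, and `category_counts` is a hypothetical helper, not part of ydata-profiling.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter General_Category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count characters of `text` by Unicode general category (illustrative)."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


counts = category_counts("b형간염")
print(counts)
```

Hangul syllables fall under Other Letter because the script has no upper or lower case, which is why that category dominates the tables for this column.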

Unique

Unique: 62
Unique (%): 100.0%

Sample

1st row: 에볼라바이러스
2nd row: 마버그열
3rd row: 라싸열
4th row: 크리미안콩고출혈열
5th row: 리프트밸리열

Value | Count | Frequency (%)
에볼라바이러스 | 1 | 1.6%
디프테리아 | 1 | 1.6%
한센병 | 1 | 1.6%
중증열성혈소판감소증후군 | 1 | 1.6%
성홍열 | 1 | 1.6%
vrsa | 1 | 1.6%
cre | 1 | 1.6%
e형간염 | 1 | 1.6%
파상풍 | 1 | 1.6%
b형간염 | 1 | 1.6%
Other values (54) | 54 | 84.4%

Most occurring characters

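Value | Count | Frequency (%)
(not preserved) | 16 | 4.8%
(not preserved) | 13 | 3.9%
(not preserved) | 12 | 3.6%
(not preserved) | 9 | 2.7%
(not preserved) | 9 | 2.7%
(not preserved) | 8 | 2.4%
(not preserved) | 8 | 2.4%
(not preserved) | 6 | 1.8%
(not preserved) | 6 | 1.8%
(not preserved) | 5 | 1.5%
Other values (139) | 238 | 72.1%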

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 298 | 90.3%
Uppercase Letter | 17 | 5.2%
Decimal Number | 4 | 1.2%
Open Punctuation | 3 | 0.9%
Close Punctuation | 3 | 0.9%
Space Separator | 2 | 0.6%
Lowercase Letter | 2 | 0.6%
Other Punctuation | 1 | 0.3%

Most frequent character per category

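Other Letter
Value | Count | Frequency (%)
(not preserved) | 16 | 5.4%
(not preserved) | 13 | 4.4%
(not preserved) | 12 | 4.0%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
(not preserved) | 5 | 1.7%
Other values (120) | 206 | 69.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 4 | 23.5%
A | 2 | 11.8%
D | 2 | 11.8%
E | 2 | 11.8%
J | 2 | 11.8%
R | 2 | 11.8%
V | 1 | 5.9%
B | 1 | 5.9%
S | 1 | 5.9%

Decimal Number
Value | Count | Frequency (%)
8 | 1 | 25.0%
1 | 1 | 25.0%
0 | 1 | 25.0%
2 | 1 | 25.0%

Lowercase Letter
Value | Count | Frequency (%)
v | 1 | 50.0%
b | 1 | 50.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%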

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 298 | 90.3%
Latin | 19 | 5.8%
Common | 13 | 3.9%

Most frequent character per script

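Hangul
Value | Count | Frequency (%)
(not preserved) | 16 | 5.4%
(not preserved) | 13 | 4.4%
(not preserved) | 12 | 4.0%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
(not preserved) | 5 | 1.7%
Other values (120) | 206 | 69.1%

Latin
Value | Count | Frequency (%)
C | 4 | 21.1%
A | 2 | 10.5%
D | 2 | 10.5%
E | 2 | 10.5%
J | 2 | 10.5%
R | 2 | 10.5%
V | 1 | 5.3%
B | 1 | 5.3%
S | 1 | 5.3%
v | 1 | 5.3%

Common
Value | Count | Frequency (%)
( | 3 | 23.1%
) | 3 | 23.1%
(space) | 2 | 15.4%
/ | 1 | 7.7%
8 | 1 | 7.7%
1 | 1 | 7.7%
0 | 1 | 7.7%
2 | 1 | 7.7%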

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 298 | 90.3%
ASCII | 32 | 9.7%

Most frequent character per block

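Hangul
Value | Count | Frequency (%)
(not preserved) | 16 | 5.4%
(not preserved) | 13 | 4.4%
(not preserved) | 12 | 4.0%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
(not preserved) | 5 | 1.7%
Other values (120) | 206 | 69.1%

ASCII
Value | Count | Frequency (%)
C | 4 | 12.5%
( | 3 | 9.4%
) | 3 | 9.4%
A | 2 | 6.2%
D | 2 | 6.2%
E | 2 | 6.2%
J | 2 | 6.2%
R | 2 | 6.2%
(space) | 2 | 6.2%
V | 1 | 3.1%
Other values (9) | 9 | 28.1%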

해당연도 (year)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 628.0 B

2022: 62

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value | Count | Frequency (%)
2022 | 62 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; the counts are the same as in the Common Values table]

건수 (case count)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 16
Distinct (%): 69.6%
Missing: 39
Missing (%): 62.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.521739
Minimum: 1
Maximum: 706
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 690.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 11
Q3: 33
95-th percentile: 459
Maximum: 706
Range: 705
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 174.32046
Coefficient of variation (CV): 2.2780515
Kurtosis: 8.5703034
Mean: 76.521739
Median Absolute Deviation (MAD): 10
Skewness: 2.962307
Sum: 1760
Variance: 30387.625
Monotonicity: Not monotonic
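
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of non-missing observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. A short consistency check for 건수, using only numbers reported in this section (the variable names are illustrative):

```python
import math

# Reported values for 건수.
n_rows, n_missing = 62, 39
total, std = 1760, 174.32046
minimum, maximum = 1, 706
q1, q3 = 2, 33

n = n_rows - n_missing        # non-missing observations
mean = total / n              # mean = sum / n
cv = std / mean               # coefficient of variation
variance = std ** 2           # variance = std squared
value_range = maximum - minimum
iqr = q3 - q1

print(n, round(mean, 6), round(cv, 7), round(variance, 2), value_range, iqr)
```

The mean and standard deviation sit far above the median of 11 because a few diseases with hundreds of cases dominate the sum, which is consistent with the reported skewness.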
[Figure: histogram with fixed size bins (bins=16)]

Common values

Value | Count | Frequency (%)
1 | 5 | 8.1%
6 | 2 | 3.2%
13 | 2 | 3.2%
2 | 2 | 3.2%
15 | 1 | 1.6%
28 | 1 | 1.6%
4 | 1 | 1.6%
25 | 1 | 1.6%
10 | 1 | 1.6%
486 | 1 | 1.6%
Other values (6) | 6 | 9.7%
(Missing) | 39 | 62.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 5 | 8.1%
2 | 2 | 3.2%
4 | 1 | 1.6%
6 | 2 | 3.2%
10 | 1 | 1.6%
11 | 1 | 1.6%
13 | 2 | 3.2%
15 | 1 | 1.6%
25 | 1 | 1.6%
28 | 1 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
706 | 1 | 1.6%
486 | 1 | 1.6%
216 | 1 | 1.6%
126 | 1 | 1.6%
48 | 1 | 1.6%
38 | 1 | 1.6%
28 | 1 | 1.6%
25 | 1 | 1.6%
15 | 1 | 1.6%
13 | 2 | 3.2%

해당연도.1 (year, second column)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 628.0 B

2021: 62

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021
2nd row: 2021
3rd row: 2021
4th row: 2021
5th row: 2021

Common Values

Value | Count | Frequency (%)
2021 | 62 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; the counts are the same as in the Common Values table]

건수.1 (case count, second column)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 17
Distinct (%): 73.9%
Missing: 39
Missing (%): 62.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 89.347826
Minimum: 1
Maximum: 608
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 690.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 8
Q3: 37
95-th percentile: 487.6
Maximum: 608
Range: 607
Interquartile range (IQR): 35
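
The fractional 95-th percentile (487.6) is what linear interpolation between neighbouring sorted values produces. The sketch below assumes that interpolation rule and rebuilds the 23 non-missing values of 건수.1 from the frequency tables in this report (the ten smallest and ten largest values together cover all 17 distinct values, and their sum matches the reported 2055). `percentile_linear` is a hypothetical helper, not the profiler's own code.

```python
import math

# Non-missing values of 건수.1, reconstructed from the report's frequency tables.
values = sorted(
    [1] * 5 + [2] * 2 + [4] + [6] * 2
    + [7, 8, 15, 18, 19, 29, 32, 42, 158, 218, 376, 500, 608]
)


def percentile_linear(sorted_values, q):
    """Percentile by linear interpolation at position q * (n - 1)."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = pos - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


p95 = percentile_linear(values, 0.95)
q1 = percentile_linear(values, 0.25)
q3 = percentile_linear(values, 0.75)
print(round(p95, 1), q1, q3, q3 - q1)
```

With 23 values the 95-th percentile sits at position 20.9, that is, nine tenths of the way from 376 to 500.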

Descriptive statistics

Standard deviation: 172.50444
Coefficient of variation (CV): 1.9307067
Kurtosis: 3.8705125
Mean: 89.347826
Median Absolute Deviation (MAD): 7
Skewness: 2.1867273
Sum: 2055
Variance: 29757.783
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=17)]

Common values

Value | Count | Frequency (%)
1 | 5 | 8.1%
2 | 2 | 3.2%
6 | 2 | 3.2%
4 | 1 | 1.6%
158 | 1 | 1.6%
376 | 1 | 1.6%
18 | 1 | 1.6%
500 | 1 | 1.6%
608 | 1 | 1.6%
15 | 1 | 1.6%
Other values (7) | 7 | 11.3%
(Missing) | 39 | 62.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 5 | 8.1%
2 | 2 | 3.2%
4 | 1 | 1.6%
6 | 2 | 3.2%
7 | 1 | 1.6%
8 | 1 | 1.6%
15 | 1 | 1.6%
18 | 1 | 1.6%
19 | 1 | 1.6%
29 | 1 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
608 | 1 | 1.6%
500 | 1 | 1.6%
376 | 1 | 1.6%
218 | 1 | 1.6%
158 | 1 | 1.6%
42 | 1 | 1.6%
32 | 1 | 1.6%
29 | 1 | 1.6%
19 | 1 | 1.6%
18 | 1 | 1.6%

Interactions

[Figures: four interaction plots for the numeric variables 건수 and 건수.1]

Correlations

[Figure: correlation heatmap]

Variable | 질병군 | 감염병명 | 건수 | 건수.1
질병군 | 1.000 | 1.000 | 0.189 | 0.000
감염병명 | 1.000 | 1.000 | 1.000 | 1.000
건수 | 0.189 | 1.000 | 1.000 | 1.000
건수.1 | 0.000 | 1.000 | 1.000 | 1.000

[Figure: correlation heatmap]

Variable | 건수 | 건수.1 | 질병군
건수 | 1.000 | 0.943 | 0.194
건수.1 | 0.943 | 1.000 | 0.000
질병군 | 0.194 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
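
Nullity correlation can be read as an ordinary Pearson correlation computed on presence indicators (1 where a value is present, 0 where it is missing). The sketch below applies that reading to 건수 and 건수.1 for rows 52 to 61 of the Sample section only, so the result describes that ten-row excerpt, not the full dataset. `pearson` is a hypothetical helper written for illustration.

```python
import math

# 1 = value present, 0 = <NA>, for rows 52-61 of the Sample section.
present_count = [0, 1, 1, 0, 0, 0, 0, 0, 1, 0]    # 건수
present_count_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # 건수.1


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


nullity_corr = pearson(present_count, present_count_1)
print(round(nullity_corr, 3))
```

A value near 1 means the two columns tend to be present or missing in the same rows; here the only disagreement is 큐열, which has a count in 건수 but not in 건수.1.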

Sample

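First rows

Index | 질병군 | 감염병명 | 해당연도 | 건수 | 해당연도.1 | 건수.1
0 | 1급 | 에볼라바이러스 | 2022 | <NA> | 2021 | <NA>
1 | 1급 | 마버그열 | 2022 | <NA> | 2021 | <NA>
2 | 1급 | 라싸열 | 2022 | <NA> | 2021 | <NA>
3 | 1급 | 크리미안콩고출혈열 | 2022 | <NA> | 2021 | <NA>
4 | 1급 | 리프트밸리열 | 2022 | <NA> | 2021 | <NA>
5 | 1급 | 두창 | 2022 | <NA> | 2021 | <NA>
6 | 1급 | 페스트 | 2022 | <NA> | 2021 | <NA>
7 | 1급 | 탄저 | 2022 | <NA> | 2021 | <NA>
8 | 1급 | 보툴리눔독소증 | 2022 | <NA> | 2021 | <NA>
9 | 1급 | 야토병 | 2022 | <NA> | 2021 | <NA>

Last rows

Index | 질병군 | 감염병명 | 해당연도 | 건수 | 해당연도.1 | 건수.1
52 | 3급 | 황열 | 2022 | <NA> | 2021 | <NA>
53 | 3급 | 뎅기열 | 2022 | 4 | 2021 | 1
54 | 3급 | 큐열 | 2022 | 1 | 2021 | <NA>
55 | 3급 | 웨스트나일열 | 2022 | <NA> | 2021 | <NA>
56 | 3급 | 라임병 | 2022 | <NA> | 2021 | <NA>
57 | 3급 | 진드기매개뇌염 | 2022 | <NA> | 2021 | <NA>
58 | 3급 | 유비저 | 2022 | <NA> | 2021 | <NA>
59 | 3급 | 치쿤구니아열 | 2022 | <NA> | 2021 | <NA>
60 | 3급 | 중증열성혈소판감소증후군 | 2022 | 28 | 2021 | 19
61 | 3급 | 지카바이러스감염증 | 2022 | <NA> | 2021 | <NA>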
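
The `.1` suffixes on 해당연도.1 and 건수.1 are consistent with how pandas renames repeated header names when it reads a file, which suggests the source file carries two year/count column pairs under identical headers. A minimal sketch of that renaming rule follows; `dedupe_headers` is a hypothetical helper that imitates the visible behaviour and is not the pandas implementation.

```python
def dedupe_headers(names):
    """Append .1, .2, ... to repeated header names, keeping the first as-is."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result


# Assumed raw header row: the year/count pair appears twice.
raw_header = ["질병군", "감염병명", "해당연도", "건수", "해당연도", "건수"]
print(dedupe_headers(raw_header))
```

Under that assumption, 건수 would hold the counts for the year in 해당연도 and 건수.1 the counts for the year in 해당연도.1.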