Overview

Dataset statistics

Number of variables: 5
Number of observations: 332
Missing cells: 1
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.1 KiB
Average record size in memory: 40.4 B
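
The derived figures in these statistics follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes and that the missing percentage is taken over all cells (variables × observations); the variable names are illustrative only:

```python
# Raw figures copied from the dataset statistics.
n_variables = 5
n_observations = 332
missing_cells = 1
total_size_kib = 13.1

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of missing cells, as a percentage of all cells.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the reported 0.1% and 40.4 B.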

Variable types

Text: 1
Categorical: 3
Boolean: 1

Dataset

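Description: Under the Quarantine Act (검역법), areas affected by quarantinable infectious diseases are designated on the basis of outbreak information from the World Health Organization (WHO), local diplomatic missions, and similar sources. This dataset lists the countries so designated (country, disease, designation date, release date).
Author: Korea Disease Control and Prevention Agency (질병관리청)
URL: https://www.data.go.kr/data/3074726/fileData.do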

Alerts

지정일자 is highly overall correlated with 오염감염병 and 2 other fields [High correlation]
지속여부 is highly overall correlated with 오염감염병 and 2 other fields [High correlation]
해지일자 is highly overall correlated with 오염감염병 and 2 other fields [High correlation]
오염감염병 is highly overall correlated with 지정일자 and 2 other fields [High correlation]
오염감염병 is highly imbalanced (61.4%) [Imbalance]
지정일자 is highly imbalanced (67.8%) [Imbalance]
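
The report does not state how the imbalance percentage is computed. One definition that reproduces the 61.4% reported for 오염감염병 is one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition; `imbalance_score` is a hypothetical helper, not necessarily the tool's own function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k) for class counts (0 = balanced, 1 = a single class)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalize (sketch convention)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Class counts of 오염감염병 as listed under its Common Values.
disease_counts = [260, 41, 11, 9, 8, 1, 1, 1]
score = imbalance_score(disease_counts)
print(f"{score:.1%}")
```

The same check cannot be repeated for 지정일자 from this report alone, because 11 of its 21 classes are only shown in aggregate.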

Reproduction

Analysis started: 2023-12-12 05:21:02.775542
Analysis finished: 2023-12-12 05:21:03.262555
Duration: 0.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

국가 (country)
Text

Distinct: 259
Distinct (%): 78.2%
Missing: 1
Missing (%): 0.3%
Memory size: 2.7 KiB

Length

Max length: 15
Median length: 11
Mean length: 4.3746224
Min length: 1

Characters and Unicode

Total characters: 1448
Distinct characters: 236
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
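
As an illustration of the sentence above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over the five sample values of 국가 shown in this report. `category_counts` is a hypothetical helper name, and scripts and blocks are left out because the standard library does not expose them directly.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Lu', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample values of 국가 listed in this report.
sample = ["MI", "OTHER", "가나", "가나", "가봉"]
counts = category_counts(sample)

# 'Lu' = Uppercase Letter, 'Lo' = Other Letter (Hangul syllables fall here).
print(counts["Lu"], counts["Lo"])
```

The two-letter codes map to the category names used in the tables: Lo is Other Letter, Lu is Uppercase Letter, Zs is Space Separator, Po/Ps/Pe/Pd are Other, Open, Close, and Dash Punctuation.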

Unique

Unique: 199
Unique (%): 60.1%
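
Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. The sketch below shows the difference on a small illustrative list and then checks the reported percentages. It assumes the percentages are taken over non-missing values, which is consistent with the figures (259/331 rounds to 78.2%, whereas 259/332 would round to 78.0%); `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) counts, ignoring missing values (None)."""
    freq = Counter(v for v in values if v is not None)
    distinct = len(freq)                               # different values seen
    unique = sum(1 for c in freq.values() if c == 1)   # values seen exactly once
    return distinct, unique

# Illustrative list: the five sample values of 국가 plus one missing entry.
sample = ["MI", "OTHER", "가나", "가나", "가봉", None]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)

# Reported figures for 국가: 332 observations, 1 missing.
non_missing = 332 - 1
distinct_pct = 100 * 259 / non_missing
unique_pct = 100 * 199 / non_missing
print(f"{distinct_pct:.1f}% {unique_pct:.1f}%")
```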

Sample

1st row: MI
2nd row: OTHER
3rd row: 가나
4th row: 가나
5th row: 가봉
Value               Count  Frequency (%)
콩고민주공화국      6      1.6%
(not shown)         5      1.3%
제도                5      1.3%
기니                4      1.0%
세인트              4      1.0%
에티오피아          4      1.0%
영국령              4      1.0%
나이지리아          4      1.0%
지역                3      0.8%
니제르              3      0.8%
Other values (267)  339    89.0%

Most occurring characters

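Value               Count  Frequency (%)
(not shown)         84     5.8%
(space)             50     3.5%
(not shown)         47     3.2%
(not shown)         45     3.1%
(not shown)         38     2.6%
(not shown)         38     2.6%
(not shown)         38     2.6%
(not shown)         36     2.5%
(not shown)         34     2.3%
(not shown)         30     2.1%
Other values (226)  1008   69.6%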

Most occurring categories

Value               Count  Frequency (%)
Other Letter        1379   95.2%
Space Separator     50     3.5%
Uppercase Letter    7      0.5%
Other Punctuation   5      0.3%
Close Punctuation   3      0.2%
Open Punctuation    3      0.2%
Dash Punctuation    1      0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(not shown)         84     6.1%
(not shown)         47     3.4%
(not shown)         45     3.3%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         36     2.6%
(not shown)         34     2.5%
(not shown)         30     2.2%
(not shown)         27     2.0%
Other values (214)  962    69.8%

Uppercase Letter

Value               Count  Frequency (%)
I                   1      14.3%
M                   1      14.3%
H                   1      14.3%
R                   1      14.3%
E                   1      14.3%
T                   1      14.3%
O                   1      14.3%

Space Separator

Value               Count  Frequency (%)
(space)             50     100.0%

Other Punctuation

Value               Count  Frequency (%)
,                   5      100.0%

Close Punctuation

Value               Count  Frequency (%)
)                   3      100.0%

Open Punctuation

Value               Count  Frequency (%)
(                   3      100.0%

Dash Punctuation

Value               Count  Frequency (%)
-                   1      100.0%

Most occurring scripts

Value               Count  Frequency (%)
Hangul              1379   95.2%
Common              62     4.3%
Latin               7      0.5%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
(not shown)         84     6.1%
(not shown)         47     3.4%
(not shown)         45     3.3%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         36     2.6%
(not shown)         34     2.5%
(not shown)         30     2.2%
(not shown)         27     2.0%
Other values (214)  962    69.8%

Latin

Value               Count  Frequency (%)
I                   1      14.3%
M                   1      14.3%
H                   1      14.3%
R                   1      14.3%
E                   1      14.3%
T                   1      14.3%
O                   1      14.3%

Common

Value               Count  Frequency (%)
(space)             50     80.6%
,                   5      8.1%
)                   3      4.8%
(                   3      4.8%
-                   1      1.6%

Most occurring blocks

Value               Count  Frequency (%)
Hangul              1379   95.2%
ASCII               69     4.8%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
(not shown)         84     6.1%
(not shown)         47     3.4%
(not shown)         45     3.3%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         38     2.8%
(not shown)         36     2.6%
(not shown)         34     2.5%
(not shown)         30     2.2%
(not shown)         27     2.0%
Other values (214)  962    69.8%

ASCII

Value               Count  Frequency (%)
(space)             50     72.5%
,                   5      7.2%
)                   3      4.3%
(                   3      4.3%
I                   1      1.4%
M                   1      1.4%
-                   1      1.4%
H                   1      1.4%
R                   1      1.4%
E                   1      1.4%
Other values (2)    2      2.9%

오염감염병 (quarantinable disease)
Categorical

Alerts: High correlation, Imbalance

Distinct: 8
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB
Value               Count
코로나19            260
황열                41
중동호흡기증후군    11
폴리오              9
콜레라              8
Other values (3)    3

Length

Max length: 12
Median length: 5
Mean length: 4.6746988
Min length: 2

Unique

Unique: 3
Unique (%): 0.9%

Sample

1st row: 코로나19
2nd row: 코로나19
3rd row: 황열
4th row: 코로나19
5th row: 코로나19

Common Values

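Value                     Count  Frequency (%)  English
코로나19                  260    78.3%          COVID-19
황열                      41     12.3%          Yellow fever
중동호흡기증후군          11     3.3%           Middle East respiratory syndrome (MERS)
폴리오                    9      2.7%           Polio
콜레라                    8      2.4%           Cholera
동물인플루엔자인체감염증  1      0.3%           Human infection with animal influenza
페스트                    1      0.3%           Plague
에볼라바이러스병          1      0.3%           Ebola virus disease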

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: chart of the common values; the counts are the same as those listed under Common Values]

지정일자 (designation date)
Categorical

Alerts: High correlation, Imbalance

Distinct: 21
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB
Value               Count
2020-07-01          257
2005-05-01          40
2019-01-01          7
2011-09-28          3
2017-02-10          2
Other values (16)   23

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 9
Unique (%): 2.7%

Sample

1st row: 2020-07-01
2nd row: 2020-07-01
3rd row: 2005-05-01
4th row: 2020-07-01
5th row: 2020-07-01

Common Values

Value               Count  Frequency (%)
2020-07-01          257    77.4%
2005-05-01          40     12.0%
2019-01-01          7      2.1%
2011-09-28          3      0.9%
2017-02-10          2      0.6%
2018-01-01          2      0.6%
2020-01-01          2      0.6%
2020-03-11          2      0.6%
2018-07-01          2      0.6%
2013-05-16          2      0.6%
Other values (11)   13     3.9%

Length

[Figure: histogram of lengths of the category]

Value               Count  Frequency (%)
2020-07-01          257    77.4%
2005-05-01          40     12.0%
2019-01-01          7      2.1%
2011-09-28          3      0.9%
2018-07-01          2      0.6%
2020-02-12          2      0.6%
2013-05-16          2      0.6%
2017-06-30          2      0.6%
2020-03-11          2      0.6%
2020-01-01          2      0.6%
Other values (11)   13     3.9%

해지일자 (release date)
Categorical

Alerts: High correlation

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB
Value               Count
2023-07-15          261
9999-12-31          71

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-07-15
2nd row: 2023-07-15
3rd row: 9999-12-31
4th row: 2023-07-15
5th row: 2023-07-15

Common Values

Value               Count  Frequency (%)
2023-07-15          261    78.6%
9999-12-31          71     21.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: chart of the common values; the counts are the same as those listed under Common Values]

지속여부 (still in effect)
Boolean

Alerts: High correlation

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 464.0 B

Value               Count  Frequency (%)
False               261    78.6%
True                71     21.4%

Correlations

            오염감염병  지정일자  해지일자  지속여부
오염감염병  1.000       0.974     1.000     1.000
지정일자    0.974       1.000     0.982     0.982
해지일자    1.000       0.982     1.000     1.000
지속여부    1.000       0.982     1.000     1.000

            지정일자  지속여부  해지일자  오염감염병
지정일자    1.000     0.952     0.952     0.865
지속여부    0.952     1.000     0.991     0.983
해지일자    0.952     0.991     1.000     0.983
오염감염병  0.865     0.983     0.983     1.000

            오염감염병  지정일자  해지일자  지속여부
오염감염병  1.000       0.865     0.983     0.983
지정일자    0.865       1.000     0.952     0.952
해지일자    0.983       0.952     1.000     0.991
지속여부    0.983       0.952     0.991     1.000
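
The correlation values in this section are not labelled with the association measure that produced them. For pairs of categorical columns, one common choice is Cramér's V. The sketch below is an uncorrected version in plain Python, applied to the first ten rows of the sample in this report. That the report used a related measure is an assumption, and its values (computed on all 332 rows, possibly with a bias correction) need not match this sketch.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined for a constant column; 0.0 is a sketch convention
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# 해지일자 and 지속여부 for sample rows 0 to 9.
release = ["2023-07-15", "2023-07-15", "9999-12-31", "2023-07-15", "2023-07-15",
           "9999-12-31", "9999-12-31", "2023-07-15", "2023-07-15", "2023-07-15"]
ongoing = ["N", "N", "Y", "N", "N", "Y", "Y", "N", "N", "N"]
v = cramers_v(release, ongoing)
print(round(v, 3))
```

On these ten rows each release date maps to exactly one flag value, so the association is perfect.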

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix: a data-dense display that lets you quickly pick out patterns in data completion.

Sample

     국가      오염감염병  지정일자    해지일자    지속여부
0    MI        코로나19    2020-07-01  2023-07-15  N
1    OTHER     코로나19    2020-07-01  2023-07-15  N
2    가나      황열        2005-05-01  9999-12-31  Y
3    가나      코로나19    2020-07-01  2023-07-15  N
4    가봉      코로나19    2020-07-01  2023-07-15  N
5    가봉      황열        2005-05-01  9999-12-31  Y
6    가이아나  황열        2005-05-01  9999-12-31  Y
7    가이아나  코로나19    2020-07-01  2023-07-15  N
8    감비아    코로나19    2020-07-01  2023-07-15  N
9    과들루프  코로나19    2020-07-01  2023-07-15  N

     국가                 오염감염병  지정일자    해지일자    지속여부
322  프랑스령 폴리네시아  코로나19    2020-07-01  2023-07-15  N
323  피지                 코로나19    2020-07-01  2023-07-15  N
324  핀란드               코로나19    2020-07-01  2023-07-15  N
325  필리핀               코로나19    2020-07-01  2023-07-15  N
326  핏카인도             코로나19    2020-07-01  2023-07-15  N
327  하드 앤 맥도날드     코로나19    2020-07-01  2023-07-15  N
328  헝가리               코로나19    2020-07-01  2023-07-15  N
329  호주                 코로나19    2020-07-01  2023-07-15  N
330  홍콩                 코로나19    2020-02-12  2023-07-15  N
331  <NA>                 코로나19    2020-07-01  2023-07-15  N
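
In the sample rows, every row whose 해지일자 is 9999-12-31 has 지속여부 = Y, and the two columns split the 332 rows identically (71 versus 261). A minimal sketch of that relationship, assuming 9999-12-31 is a placeholder meaning "not yet released" (the report does not say so explicitly); `is_ongoing` is a hypothetical helper.

```python
from datetime import date

# Assumed sentinel for "no release date yet"; not documented in the report.
OPEN_ENDED = date(9999, 12, 31)

def is_ongoing(release_date: str) -> bool:
    """Treat the sentinel release date as 'designation still in effect'."""
    return date.fromisoformat(release_date) == OPEN_ENDED

# (국가, 오염감염병, 지정일자, 해지일자, 지속여부) copied from the sample rows.
rows = [
    ("가나", "황열", "2005-05-01", "9999-12-31", "Y"),
    ("가나", "코로나19", "2020-07-01", "2023-07-15", "N"),
    ("홍콩", "코로나19", "2020-02-12", "2023-07-15", "N"),
]

# Check that the flag agrees with the sentinel rule on every copied row.
consistent = all(is_ongoing(released) == (flag == "Y")
                 for _, _, _, released, flag in rows)
print(consistent)
```

If the assumption holds for the full dataset, 지속여부 carries no information beyond 해지일자, which would explain the correlation of 1.000 between the two.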