Overview

Dataset statistics

Number of variables: 10
Number of observations: 236
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 48
Duplicate rows (%): 20.3%
Total size in memory: 18.6 KiB
Average record size in memory: 80.6 B

Variable types

Categorical: 10

Dataset

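Description: Test information on specimens collected from passengers during quarantine, covering 구분 (category), 채취일자 (collection date), 가검물분류 (specimen class), 상세가검물 (detailed specimen), 검사기관 (testing institution), 검사종류 (test type), 검출일자 (detection date), 검출균 (detected pathogen), 법정감염병 (notifiable infectious disease), and 법정군 (legal disease group).
Author: Korea Disease Control and Prevention Agency (질병관리청)
URL: https://www.data.go.kr/data/3074717/fileData.do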

Alerts

Dataset has 48 (20.3%) duplicate rows (Duplicates)
법정감염병 is highly overall correlated with 구분 and 6 other fields (High correlation)
검출균 is highly overall correlated with 구분 and 6 other fields (High correlation)
가검물분류 is highly overall correlated with 구분 and 8 other fields (High correlation)
검사종류 is highly overall correlated with 구분 and 6 other fields (High correlation)
법정군 is highly overall correlated with 가검물분류 and 4 other fields (High correlation)
검출일자 is highly overall correlated with 구분 and 4 other fields (High correlation)
상세가검물 is highly overall correlated with 구분 and 5 other fields (High correlation)
구분 is highly overall correlated with 채취일자 and 6 other fields (High correlation)
채취일자 is highly overall correlated with 구분 and 3 other fields (High correlation)
검사기관 is highly overall correlated with 채취일자 and 4 other fields (High correlation)
구분 is highly imbalanced (87.6%) (Imbalance)
가검물분류 is highly imbalanced (76.6%) (Imbalance)
상세가검물 is highly imbalanced (86.2%) (Imbalance)
검사기관 is highly imbalanced (77.8%) (Imbalance)
검사종류 is highly imbalanced (58.1%) (Imbalance)
법정군 is highly imbalanced (84.4%) (Imbalance)
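
The imbalance percentages in these alerts are consistent with one minus the normalized Shannon entropy of each variable's value counts. The sketch below works under that assumption; the function name `imbalance_score` is hypothetical and is not a ydata-profiling API. The counts are taken from the Common Values tables in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Assumed formula, inferred from the reported percentages.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Counts from the report: 구분 (232, 4) and 검사기관 (220, 8, 7, 1)
print(f"{imbalance_score([232, 4]):.1%}")
print(f"{imbalance_score([220, 8, 7, 1]):.1%}")
```

With these inputs the score comes out at 87.6% for 구분 and 77.8% for 검사기관, matching the alerts. A score this high means almost every row falls into a single category.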

Reproduction

Analysis started: 2023-12-13 00:58:09.355454
Analysis finished: 2023-12-13 00:58:10.169155
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

구분
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 항공 | 232 |
| 선박 | 4 |

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 항공
2nd row: 항공
3rd row: 항공
4th row: 항공
5th row: 항공

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 항공 | 232 | 98.3% |
| 선박 | 4 | 1.7% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 항공 | 232 | 98.3% |
| 선박 | 4 | 1.7% |

채취일자
Categorical

HIGH CORRELATION

Distinct: 42
Distinct (%): 17.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 2020-01-15 | 27 |
| 2020-01-14 | 18 |
| 2020-01-16 | 18 |
| 2020-01-23 | 13 |
| 2020-01-24 | 12 |
| Other values (37) | 148 |

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 11
Unique (%): 4.7%

Sample

1st row: 2020-01-01
2nd row: 2020-01-02
3rd row: 2020-01-02
4th row: 2020-01-02
5th row: 2020-01-02

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 2020-01-15 | 27 | 11.4% |
| 2020-01-14 | 18 | 7.6% |
| 2020-01-16 | 18 | 7.6% |
| 2020-01-23 | 13 | 5.5% |
| 2020-01-24 | 12 | 5.1% |
| 2020-01-18 | 11 | 4.7% |
| 2020-01-12 | 11 | 4.7% |
| 2020-01-21 | 11 | 4.7% |
| 2020-01-20 | 11 | 4.7% |
| 2020-01-17 | 10 | 4.2% |
| Other values (32) | 94 | 39.8% |

Length

[Figure: histogram of lengths of the category]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 2020-01-15 | 27 | 11.4% |
| 2020-01-16 | 18 | 7.6% |
| 2020-01-14 | 18 | 7.6% |
| 2020-01-23 | 13 | 5.5% |
| 2020-01-24 | 12 | 5.1% |
| 2020-01-18 | 11 | 4.7% |
| 2020-01-12 | 11 | 4.7% |
| 2020-01-21 | 11 | 4.7% |
| 2020-01-20 | 11 | 4.7% |
| 2020-01-03 | 10 | 4.2% |
| Other values (32) | 94 | 39.8% |
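
Distinct counts the different values a column takes, while Unique counts only the values that occur exactly once. That is why 채취일자 can report 42 distinct values but just 11 unique ones. The sketch below illustrates the distinction with the 검사기관 counts from this report; the helper name `distinct_and_unique` is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# 검사기관 column rebuilt from the counts reported for it
values = (
    ["수도권질병대응센터"] * 220
    + ["경남권질병대응센터"] * 8
    + ["호남권질병대응센터"] * 7
    + ["경북권질병대응센터"] * 1
)
n_distinct, n_unique = distinct_and_unique(values)
print(n_distinct, n_unique)
print(f"{n_unique / len(values):.1%}")
```

For 검사기관 this gives 4 distinct values and 1 unique value (0.4% of 236 rows), in line with the figures reported for that variable.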

가검물분류
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 채변 | 227 |
| 상기도 | 9 |

Length

Max length: 3
Median length: 2
Mean length: 2.0381356
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상기도
2nd row: 채변
3rd row: 채변
4th row: 채변
5th row: 채변

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 채변 | 227 | 96.2% |
| 상기도 | 9 | 3.8% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 채변 | 227 | 96.2% |
| 상기도 | 9 | 3.8% |
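
The mean length reported for 가검물분류 can be checked as a count-weighted average of the string lengths: 227 rows of length 2 and 9 rows of length 3. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def mean_length(value_counts):
    """Average string length over all rows, given a {value: row count} mapping."""
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    total_rows = sum(value_counts.values())
    return total_chars / total_rows


# (227 * 2 + 9 * 3) / 236
print(round(mean_length({"채변": 227, "상기도": 9}), 7))
```

The result, 2.0381356, agrees with the Mean length shown for this variable.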

상세가검물
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| <NA> | 229 |
| 구인두 | 5 |
| 구인두+비인두 | 2 |

Length

Max length: 7
Median length: 4
Mean length: 4.0042373
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 구인두
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| <NA> | 229 | 97.0% |
| 구인두 | 5 | 2.1% |
| 구인두+비인두 | 2 | 0.8% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| na | 229 | 97.0% |
| 구인두 | 5 | 2.1% |
| 구인두+비인두 | 2 | 0.8% |

검사기관
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 수도권질병대응센터 | 220 |
| 경남권질병대응센터 | 8 |
| 호남권질병대응센터 | 7 |
| 경북권질병대응센터 | 1 |

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: 호남권질병대응센터
2nd row: 수도권질병대응센터
3rd row: 수도권질병대응센터
4th row: 수도권질병대응센터
5th row: 수도권질병대응센터

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 수도권질병대응센터 | 220 | 93.2% |
| 경남권질병대응센터 | 8 | 3.4% |
| 호남권질병대응센터 | 7 | 3.0% |
| 경북권질병대응센터 | 1 | 0.4% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 수도권질병대응센터 | 220 | 93.2% |
| 경남권질병대응센터 | 8 | 3.4% |
| 호남권질병대응센터 | 7 | 3.0% |
| 경북권질병대응센터 | 1 | 0.4% |

검사종류
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| Escherichia coli | 196 |
| Campylobacter | 15 |
| Vibrio | 12 |
| Virus | 9 |
| Salmonella&Shigella | 4 |

Length

Max length: 19
Median length: 16
Mean length: 14.932203
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Virus
2nd row: Escherichia coli
3rd row: Escherichia coli
4th row: Escherichia coli
5th row: Escherichia coli

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| Escherichia coli | 196 | 83.1% |
| Campylobacter | 15 | 6.4% |
| Vibrio | 12 | 5.1% |
| Virus | 9 | 3.8% |
| Salmonella&Shigella | 4 | 1.7% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| escherichia | 196 | 45.4% |
| coli | 196 | 45.4% |
| campylobacter | 15 | 3.5% |
| vibrio | 12 | 2.8% |
| virus | 9 | 2.1% |
| salmonella&shigella | 4 | 0.9% |
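
The last table for 검사종류 lists escherichia and coli separately at 45.4% each, while the Common Values table gives Escherichia coli 83.1%. The figures fit a word-level count: values are lowercased, split on whitespace, stripped of surrounding punctuation, and each word's share is taken over the total number of words (432 here) rather than the number of rows. The sketch below assumes that behaviour, inferred from the output in this report; `word_frequencies` is a hypothetical helper, not a library function.

```python
import string
from collections import Counter


def word_frequencies(value_counts):
    """Map each word to (count, share of all words), most frequent first."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            token = token.strip(string.punctuation)  # trims only the ends
            if token:
                words[token] += count
    total = sum(words.values())
    return {word: (n, n / total) for word, n in words.most_common()}


# 검사종류 value counts from the Common Values table
test_types = {
    "Escherichia coli": 196,
    "Campylobacter": 15,
    "Vibrio": 12,
    "Virus": 9,
    "Salmonella&Shigella": 4,
}
for word, (n, share) in word_frequencies(test_types).items():
    print(f"{word} {n} {share:.1%}")
```

The same rule would explain why `<NA>` shows up as `na` and why 장독소성대장균 (ETEC)감염증 is split into 장독소성대장균 and etec)감염증 in the word tables of other variables.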

검출일자
Categorical

HIGH CORRELATION

Distinct: 44
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 2020-01-17 | 24 |
| 2020-01-16 | 20 |
| 2020-01-25 | 18 |
| 2020-01-18 | 14 |
| 2020-01-13 | 14 |
| Other values (39) | 146 |

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 14
Unique (%): 5.9%

Sample

1st row: 2020-01-03
2nd row: 2020-01-03
3rd row: 2020-01-03
4th row: 2020-01-03
5th row: 2020-01-04

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 2020-01-17 | 24 | 10.2% |
| 2020-01-16 | 20 | 8.5% |
| 2020-01-25 | 18 | 7.6% |
| 2020-01-18 | 14 | 5.9% |
| 2020-01-13 | 14 | 5.9% |
| 2020-01-24 | 10 | 4.2% |
| 2020-01-22 | 10 | 4.2% |
| 2020-01-15 | 10 | 4.2% |
| 2020-01-19 | 10 | 4.2% |
| 2020-01-23 | 9 | 3.8% |
| Other values (34) | 97 | 41.1% |

Length

[Figure: histogram of lengths of the category]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 2020-01-17 | 24 | 10.2% |
| 2020-01-16 | 20 | 8.5% |
| 2020-01-25 | 18 | 7.6% |
| 2020-01-18 | 14 | 5.9% |
| 2020-01-13 | 14 | 5.9% |
| 2020-01-24 | 10 | 4.2% |
| 2020-01-22 | 10 | 4.2% |
| 2020-01-15 | 10 | 4.2% |
| 2020-01-19 | 10 | 4.2% |
| 2020-01-23 | 9 | 3.8% |
| Other values (34) | 97 | 41.1% |

검출균
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| EPEC | 94 |
| ETEC LT, ST | 61 |
| ETEC ST | 26 |
| ETEC LT | 14 |
| Vibrio parahaemolyticus | 11 |
| Other values (11) | 30 |

Length

Max length: 26
Median length: 23
Mean length: 8.9915254
Min length: 4

Unique

Unique: 5
Unique (%): 2.1%

Sample

1st row: Influenza A/H3N2
2nd row: EPEC
3rd row: ETEC LT
4th row: EPEC
5th row: ETEC ST

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| EPEC | 94 | 39.8% |
| ETEC LT, ST | 61 | 25.8% |
| ETEC ST | 26 | 11.0% |
| ETEC LT | 14 | 5.9% |
| Vibrio parahaemolyticus | 11 | 4.7% |
| Campylobacter jejuni | 10 | 4.2% |
| Campylobacter coli | 5 | 2.1% |
| Influenza A/H3N2 | 3 | 1.3% |
| Influenza A/H1N1 | 3 | 1.3% |
| Shigella sonnei | 2 | 0.8% |
| Other values (6) | 7 | 3.0% |

Length

[Figure: histogram of lengths of the category]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| etec | 101 | 22.7% |
| epec | 94 | 21.2% |
| st | 87 | 19.6% |
| lt | 75 | 16.9% |
| campylobacter | 15 | 3.4% |
| vibrio | 12 | 2.7% |
| parahaemolyticus | 11 | 2.5% |
| jejuni | 10 | 2.3% |
| influenza | 6 | 1.4% |
| coli | 5 | 1.1% |
| Other values (16) | 28 | 6.3% |

법정감염병
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 장독소성대장균 (ETEC)감염증 | 101 |
| 장병원성대장균(EPEC)감염증 | 94 |
| 캄필로박터균 감염증 | 15 |
| 장염비브리오균 감염증 | 11 |
| 인플루엔자 | 6 |
| Other values (5) | 9 |

Length

Max length: 17
Median length: 16
Mean length: 15.207627
Min length: 4

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: 인플루엔자
2nd row: 장병원성대장균(EPEC)감염증
3rd row: 장독소성대장균 (ETEC)감염증
4th row: 장병원성대장균(EPEC)감염증
5th row: 장독소성대장균 (ETEC)감염증

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 장독소성대장균 (ETEC)감염증 | 101 | 42.8% |
| 장병원성대장균(EPEC)감염증 | 94 | 39.8% |
| 캄필로박터균 감염증 | 15 | 6.4% |
| 장염비브리오균 감염증 | 11 | 4.7% |
| 인플루엔자 | 6 | 2.5% |
| 살모넬라균 감염증 | 2 | 0.8% |
| 세균성이질 | 2 | 0.8% |
| <NA> | 2 | 0.8% |
| 리노바이러스 감염증 | 2 | 0.8% |
| 아데노바이러스 감염증 | 1 | 0.4% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 장독소성대장균 | 101 | 27.4% |
| etec)감염증 | 101 | 27.4% |
| 장병원성대장균(epec)감염증 | 94 | 25.5% |
| 감염증 | 31 | 8.4% |
| 캄필로박터균 | 15 | 4.1% |
| 장염비브리오균 | 11 | 3.0% |
| 인플루엔자 | 6 | 1.6% |
| 살모넬라균 | 2 | 0.5% |
| 세균성이질 | 2 | 0.5% |
| na | 2 | 0.5% |
| Other values (2) | 3 | 0.8% |

법정군
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

| Value | Count |
|---|---|
| 지정 | 226 |
| 3군 | 6 |
| 1군 | 2 |
| <NA> | 2 |

Length

Max length: 4
Median length: 2
Mean length: 2.0169492
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3군
2nd row: 지정
3rd row: 지정
4th row: 지정
5th row: 지정

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 지정 | 226 | 95.8% |
| 3군 | 6 | 2.5% |
| 1군 | 2 | 0.8% |
| <NA> | 2 | 0.8% |

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

| Value | Count | Frequency (%) |
|---|---|---|
| 지정 | 226 | 95.8% |
| 3군 | 6 | 2.5% |
| 1군 | 2 | 0.8% |
| na | 2 | 0.8% |

Correlations

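[Figure: correlation heatmap]

| | 구분 | 채취일자 | 가검물분류 | 상세가검물 | 검사기관 | 검사종류 | 검출일자 | 검출균 | 법정감염병 | 법정군 |
|---|---|---|---|---|---|---|---|---|---|---|
| 구분 | 1.000 | 0.921 | 0.782 | 0.782 | 0.699 | 0.538 | 0.950 | 0.987 | 0.785 | 0.237 |
| 채취일자 | 0.921 | 1.000 | 0.837 | 1.000 | 0.828 | 0.715 | 0.992 | 0.843 | 0.811 | 0.731 |
| 가검물분류 | 0.782 | 0.837 | 1.000 | NaN | 0.940 | 1.000 | 0.681 | 1.000 | 1.000 | 0.542 |
| 상세가검물 | 0.782 | 1.000 | NaN | 1.000 | 0.000 | NaN | 1.000 | 1.000 | 1.000 | 0.270 |
| 검사기관 | 0.699 | 0.828 | 0.940 | 0.000 | 1.000 | 0.618 | 0.751 | 0.992 | 0.793 | 0.429 |
| 검사종류 | 0.538 | 0.715 | 1.000 | NaN | 0.618 | 1.000 | 0.802 | 1.000 | 1.000 | 0.753 |
| 검출일자 | 0.950 | 0.992 | 0.681 | 1.000 | 0.751 | 0.802 | 1.000 | 0.885 | 0.845 | 0.847 |
| 검출균 | 0.987 | 0.843 | 1.000 | 1.000 | 0.992 | 1.000 | 0.885 | 1.000 | 1.000 | 1.000 |
| 법정감염병 | 0.785 | 0.811 | 1.000 | 1.000 | 0.793 | 1.000 | 0.845 | 1.000 | 1.000 | 1.000 |
| 법정군 | 0.237 | 0.731 | 0.542 | 0.270 | 0.429 | 0.753 | 0.847 | 1.000 | 1.000 | 1.000 |

[Figure: correlation heatmap]

| | 법정감염병 | 검출균 | 가검물분류 | 검사종류 | 구분 | 검사기관 | 채취일자 | 법정군 | 검출일자 | 상세가검물 |
|---|---|---|---|---|---|---|---|---|---|---|
| 법정감염병 | 1.000 | 0.989 | 0.985 | 0.991 | 0.793 | 0.644 | 0.418 | 0.987 | 0.464 | 0.894 |
| 검출균 | 0.989 | 1.000 | 0.970 | 0.976 | 0.877 | 0.859 | 0.381 | 0.976 | 0.439 | 0.775 |
| 가검물분류 | 0.985 | 0.970 | 1.000 | 0.994 | 0.571 | 0.776 | 0.638 | 0.808 | 0.500 | 1.000 |
| 검사종류 | 0.991 | 0.976 | 0.994 | 1.000 | 0.648 | 0.545 | 0.391 | 0.751 | 0.475 | 1.000 |
| 구분 | 0.793 | 0.877 | 0.571 | 0.648 | 1.000 | 0.493 | 0.729 | 0.386 | 0.764 | 0.554 |
| 검사기관 | 0.644 | 0.859 | 0.776 | 0.545 | 0.493 | 1.000 | 0.532 | 0.421 | 0.416 | 0.000 |
| 채취일자 | 0.418 | 0.381 | 0.638 | 0.391 | 0.729 | 0.532 | 1.000 | 0.427 | 0.762 | 0.447 |
| 법정군 | 0.987 | 0.976 | 0.808 | 0.751 | 0.386 | 0.421 | 0.427 | 1.000 | 0.586 | 0.050 |
| 검출일자 | 0.464 | 0.439 | 0.500 | 0.475 | 0.764 | 0.416 | 0.762 | 0.586 | 1.000 | 1.000 |
| 상세가검물 | 0.894 | 0.775 | 1.000 | 1.000 | 0.554 | 0.000 | 0.447 | 0.050 | 1.000 | 1.000 |

[Figure: correlation heatmap]

| | 구분 | 채취일자 | 가검물분류 | 상세가검물 | 검사기관 | 검사종류 | 검출일자 | 검출균 | 법정감염병 | 법정군 |
|---|---|---|---|---|---|---|---|---|---|---|
| 구분 | 1.000 | 0.729 | 0.571 | 0.554 | 0.493 | 0.648 | 0.764 | 0.877 | 0.793 | 0.386 |
| 채취일자 | 0.729 | 1.000 | 0.638 | 0.447 | 0.532 | 0.391 | 0.762 | 0.381 | 0.418 | 0.427 |
| 가검물분류 | 0.571 | 0.638 | 1.000 | 1.000 | 0.776 | 0.994 | 0.500 | 0.970 | 0.985 | 0.808 |
| 상세가검물 | 0.554 | 0.447 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.775 | 0.894 | 0.050 |
| 검사기관 | 0.493 | 0.532 | 0.776 | 0.000 | 1.000 | 0.545 | 0.416 | 0.859 | 0.644 | 0.421 |
| 검사종류 | 0.648 | 0.391 | 0.994 | 1.000 | 0.545 | 1.000 | 0.475 | 0.976 | 0.991 | 0.751 |
| 검출일자 | 0.764 | 0.762 | 0.500 | 1.000 | 0.416 | 0.475 | 1.000 | 0.439 | 0.464 | 0.586 |
| 검출균 | 0.877 | 0.381 | 0.970 | 0.775 | 0.859 | 0.976 | 0.439 | 1.000 | 0.989 | 0.976 |
| 법정감염병 | 0.793 | 0.418 | 0.985 | 0.894 | 0.644 | 0.991 | 0.464 | 0.989 | 1.000 | 0.987 |
| 법정군 | 0.386 | 0.427 | 0.808 | 0.050 | 0.421 | 0.751 | 0.586 | 0.976 | 0.987 | 1.000 |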
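
The report does not say which statistic fills these matrices. One common measure of association between two categorical variables is Cramér's V, which ranges from 0 (no association) to 1 (one variable fully determines the other). The sketch below is a plain, uncorrected version offered only as an illustration; profiling tools may apply bias corrections, so its output should not be expected to reproduce the values in the tables. The function name and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    k = min(len(x_totals), len(y_totals))
    if k < 2:
        return float("nan")  # undefined when either variable is constant
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Toy data: the second column is fully determined by the first
print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))
# Toy data: the two columns are independent
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))
```

A value of 1.000 between two variables, as seen for several pairs above, would then mean that knowing one category fixes the other within this sample of 236 rows.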

Missing values

[Figure: missing values count] A simple visualization of nullity by column.
[Figure: missing values matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
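
The report counts 0 missing cells, yet `<NA>` appears as an ordinary category (229 times in 상세가검물, 2 times each in 법정감염병 and 법정군). This suggests the placeholder is stored as text rather than as a true null. The sketch below, which assumes that reading, counts such placeholders as missing; the `PLACEHOLDERS` set and the function name are illustrative, and the rows are abridged from the first sample rows of this report.

```python
PLACEHOLDERS = {"<NA>", ""}


def count_missing(rows, placeholders=PLACEHOLDERS):
    """Count, per column, cells that are None or a placeholder string."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or str(value).strip() in placeholders
            missing[column] = missing.get(column, 0) + int(is_missing)
    return missing


rows = [
    {"가검물분류": "상기도", "상세가검물": "구인두"},
    {"가검물분류": "채변", "상세가검물": "<NA>"},
    {"가검물분류": "채변", "상세가검물": "<NA>"},
]
print(count_missing(rows))
```

Treating the placeholder as missing before profiling would move those cells from the category tables into the Missing statistics.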

Sample

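First rows

| | 구분 | 채취일자 | 가검물분류 | 상세가검물 | 검사기관 | 검사종류 | 검출일자 | 검출균 | 법정감염병 | 법정군 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 항공 | 2020-01-01 | 상기도 | 구인두 | 호남권질병대응센터 | Virus | 2020-01-03 | Influenza A/H3N2 | 인플루엔자 | 3군 |
| 1 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-03 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |
| 2 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-03 | ETEC LT | 장독소성대장균 (ETEC)감염증 | 지정 |
| 3 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-03 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |
| 4 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-04 | ETEC ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 5 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Campylobacter | 2020-01-04 | Campylobacter coli | 캄필로박터균 감염증 | 지정 |
| 6 | 항공 | 2020-01-02 | 채변 | <NA> | 수도권질병대응센터 | Campylobacter | 2020-01-04 | Campylobacter coli | 캄필로박터균 감염증 | 지정 |
| 7 | 항공 | 2020-01-03 | 채변 | <NA> | 수도권질병대응센터 | Salmonella&Shigella | 2020-01-05 | Salmonella sp. Serogroup d | 살모넬라균 감염증 | 지정 |
| 8 | 항공 | 2020-01-03 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-05 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 9 | 항공 | 2020-01-03 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-05 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |

Last rows

| | 구분 | 채취일자 | 가검물분류 | 상세가검물 | 검사기관 | 검사종류 | 검출일자 | 검출균 | 법정감염병 | 법정군 |
|---|---|---|---|---|---|---|---|---|---|---|
| 226 | 항공 | 2020-02-18 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-02-19 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 227 | 항공 | 2020-02-19 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-02-20 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 228 | 항공 | 2020-02-24 | 채변 | <NA> | 수도권질병대응센터 | Vibrio | 2020-02-29 | Vibrio parahaemolyticus | 장염비브리오균 감염증 | 지정 |
| 229 | 항공 | 2020-02-24 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-02-26 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |
| 230 | 항공 | 2020-02-25 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-02-27 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 231 | 항공 | 2020-03-06 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-03-08 | ETEC ST | 장독소성대장균 (ETEC)감염증 | 지정 |
| 232 | 항공 | 2020-03-06 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-03-08 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |
| 233 | 항공 | 2020-03-15 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-03-17 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 |
| 234 | 선박 | 2020-03-23 | 상기도 | 구인두+비인두 | 호남권질병대응센터 | Virus | 2020-03-23 | Human Rhinovirus | 리노바이러스 감염증 | 지정 |
| 235 | 선박 | 2020-06-13 | 상기도 | 구인두+비인두 | 호남권질병대응센터 | Virus | 2020-06-13 | Human Rhinovirus | 리노바이러스 감염증 | 지정 |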

Duplicate rows

Most frequently occurring

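| | 구분 | 채취일자 | 가검물분류 | 상세가검물 | 검사기관 | 검사종류 | 검출일자 | 검출균 | 법정감염병 | 법정군 | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 항공 | 2020-01-16 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-17 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 | 8 |
| 20 | 항공 | 2020-01-15 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-17 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 7 |
| 43 | 항공 | 2020-01-23 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-25 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 | 6 |
| 10 | 항공 | 2020-01-12 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-13 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 5 |
| 17 | 항공 | 2020-01-15 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-16 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 5 |
| 21 | 항공 | 2020-01-15 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-17 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 | 5 |
| 44 | 항공 | 2020-01-24 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-25 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 5 |
| 13 | 항공 | 2020-01-14 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-15 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 4 |
| 15 | 항공 | 2020-01-14 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-16 | EPEC | 장병원성대장균(EPEC)감염증 | 지정 | 4 |
| 16 | 항공 | 2020-01-14 | 채변 | <NA> | 수도권질병대응센터 | Escherichia coli | 2020-01-16 | ETEC LT, ST | 장독소성대장균 (ETEC)감염증 | 지정 | 4 |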
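
The row index in this table reaches 44, which suggests the report numbers groups of identical rows, and that the headline figure of 48 duplicate rows may count such groups rather than the individual redundant copies. The sketch below computes both quantities so the two readings can be compared; it is an illustration under that assumption, the helper name is hypothetical, and the toy rows are made up for the example.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (groups of identical rows seen more than once, redundant copies)."""
    counts = Counter(tuple(row) for row in rows)
    groups = {row: n for row, n in counts.items() if n > 1}
    n_groups = len(groups)
    n_redundant = sum(n - 1 for n in groups.values())  # copies beyond the first
    return n_groups, n_redundant


# Toy rows (구분, 검출균): one pattern seen 3 times, one seen twice, one seen once
rows = (
    [("항공", "EPEC")] * 3
    + [("항공", "ETEC ST")] * 2
    + [("선박", "Human Rhinovirus")]
)
print(duplicate_summary(rows))
```

Here there are 2 duplicated patterns but 3 redundant copies, so a deduplication step would drop 3 rows even though only 2 groups are reported.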