Overview

Dataset statistics

Number of variables: 6
Number of observations: 265
Missing cells: 94
Missing cells (%): 5.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.6 KiB
Average record size in memory: 48.5 B
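The missing-cell share is taken over every cell in the table: the 87 missing values in 시료유형2 plus the 7 in 검사항목 give 94 of 265 × 6 = 1,590 cells, or about 5.9%. The sketch below recomputes these overview figures from rows stored as Python dictionaries, using only the standard library. The function name `dataset_stats`, the row representation, and treating `None` as missing are illustrative assumptions, not ydata-profiling internals.

```python
from collections import Counter


def dataset_stats(rows: list[dict], columns: list[str]) -> dict:
    """Recompute overview statistics for a table given as a list of dicts.

    Assumption for this sketch: a cell is missing when the key is absent or its value is None.
    """
    n_rows = len(rows)
    n_cells = n_rows * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # A row counts as a duplicate when an identical tuple of values appeared earlier.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(c - 1 for c in counts.values())
    return {
        "variables": len(columns),
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


# The report's own figures: 87 + 7 missing cells out of 265 rows x 6 columns.
print(round(100 * (87 + 7) / (265 * 6), 1))  # 5.9
```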

Variable types

Categorical: 3
Text: 3

Dataset

Description: 대구광역시_시료유형 및 검사항목_20210316 (Daegu Metropolitan City: sample types and test items, 2021-03-16)
Author: 대구광역시 (Daegu Metropolitan City)
URL: http://data.daegu.go.kr/open/data/dataView.do?dataSetId=15062514&dataSetDetailId=150625141b5eba396deab&provdMethod=FILE

Alerts

시료유형1 (sample type 1) is highly overall correlated with 구분 (category) and 1 other field [High correlation]
구분 is highly overall correlated with 시료유형1 and 1 other field [High correlation]
비고 (remarks) is highly overall correlated with 구분 and 1 other field [High correlation]
비고 is highly imbalanced (73.2%) [Imbalance]
시료유형2 (sample type 2) has 87 (32.8%) missing values [Missing]
검사항목 (test items) has 7 (2.6%) missing values [Missing]
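The imbalance percentage for 비고 appears to follow an entropy-based score: one minus the Shannon entropy of the value counts divided by its maximum possible value, log(k) for k distinct values. It is 0 for a perfectly even distribution and approaches 1 when a single value dominates. The sketch below is a standard-library reimplementation under that assumption (the function name is hypothetical). Fed the 14 value counts listed for 비고 later in this report, it gives 0.732, which matches the 73.2% alert.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - H(p) / log(k): 0 for a uniform distribution, close to 1 when one value dominates."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Dividing by log(k) normalizes the entropy to [0, 1]; the log base cancels out.
    return 1 - entropy / math.log(k)


# Value counts of 비고 from this report; "Other values (4)" totals 4, so each occurs once.
remarks_counts = [229, 9, 5, 4, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1]
print(round(imbalance_score(remarks_counts), 3))  # 0.732
```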

Reproduction

Analysis started: 2024-04-16 15:51:05.700217
Analysis finished: 2024-04-16 15:51:07.528155
Duration: 1.83 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

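구분 (category)
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
식품 (food) | 95
의약품 (pharmaceuticals) | 40
하천수, 호소수 (river water, lake water) | 37
토양 (soil) | 34
환경검사 (environmental testing) | 23
Other values (3) | 36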

Length

Max length: 8
Median length: 2
Mean length: 3.3735849
Min length: 2
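These length statistics count characters per value, one value per row. As a check, weighting each 구분 value's character length by its row count reproduces the reported figures; for example, "하천수, 호소수" is 8 characters long once the comma and space are counted, and the total of 894 characters over 265 rows gives the mean of 3.3735849. A standard-library sketch, with a hypothetical helper name:

```python
from statistics import median


def length_stats(value_counts: dict[str, int]) -> dict[str, float]:
    """Max, median, mean and min character length, weighting each value by its row count."""
    lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }


# Value counts of 구분 as listed under "Common Values" in this report.
category_counts = {
    "식품": 95, "의약품": 40, "하천수, 호소수": 37, "토양": 34,
    "환경검사": 23, "축산": 15, "미생물": 14, "하수(오수)검사": 7,
}
stats = length_stats(category_counts)
print(stats["max"], stats["median"], round(stats["mean"], 7), stats["min"])  # 8 2 3.3735849 2
```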

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미생물 (microorganisms)
2nd row: 미생물 (microorganisms)
3rd row: 미생물 (microorganisms)
4th row: 미생물 (microorganisms)
5th row: 미생물 (microorganisms)

Common Values

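Value | Count | Frequency (%)
식품 (food) | 95 | 35.8%
의약품 (pharmaceuticals) | 40 | 15.1%
하천수, 호소수 (river water, lake water) | 37 | 14.0%
토양 (soil) | 34 | 12.8%
환경검사 (environmental testing) | 23 | 8.7%
축산 (livestock) | 15 | 5.7%
미생물 (microorganisms) | 14 | 5.3%
하수(오수)검사 (sewage/wastewater testing) | 7 | 2.6%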
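Each frequency here is the value's count divided by the number of non-missing rows, which for 구분 is all 265 rows. The sketch below builds the same kind of table from raw values and folds everything beyond the top n into a single "Other values (k)" row, mirroring the layout of the overview list above. The function name, the `top_n` parameter, and the choice to exclude `None` from the denominator are illustrative assumptions.

```python
from collections import Counter


def frequency_table(values: list, top_n: int = 5) -> list[tuple[str, int, float]]:
    """Return (value, count, percent) rows; values beyond top_n are merged into one row.

    In this sketch, missing values (None) are excluded from both counts and the denominator.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present).most_common()
    total = len(present)
    rows = [(str(v), n, 100 * n / total) for v, n in counts[:top_n]]
    rest = counts[top_n:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / total))
    return rows


category_values = (["식품"] * 95 + ["의약품"] * 40 + ["하천수, 호소수"] * 37
                   + ["토양"] * 34 + ["환경검사"] * 23 + ["축산"] * 15
                   + ["미생물"] * 14 + ["하수(오수)검사"] * 7)
for value, count, pct in frequency_table(category_values):
    print(f"{value} | {count} | {pct:.1f}%")
```

With the default `top_n=5`, the last row printed is "Other values (3) | 36", the same grouping the overview list uses.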

Length

[Figure: histogram of lengths of the category]

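Common Values (Plot)

[Figure: bar chart of the most common values]

Words

At the word level, "하천수, 호소수" splits into two words, so these percentages are taken over 302 words (265 values plus 37 extra words) rather than 265 rows.

Value | Count | Frequency (%)
식품 (food) | 95 | 31.5%
의약품 (pharmaceuticals) | 40 | 13.2%
하천수 (river water) | 37 | 12.3%
호소수 (lake water) | 37 | 12.3%
토양 (soil) | 34 | 11.3%
환경검사 (environmental testing) | 23 | 7.6%
축산 (livestock) | 15 | 5.0%
미생물 (microorganisms) | 14 | 4.6%
하수(오수)검사 (sewage/wastewater testing) | 7 | 2.3%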
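These word counts appear to come from lower-casing each value, splitting it on whitespace, and stripping leading and trailing ASCII punctuation from each token. That reading fits other word tables in this report: "<NA>" becomes the word "na" under 비고, and "검사항목(22항목)" becomes "검사항목(22항목" under 시료유형1. The standard-library sketch below implements that assumed rule (the function name is hypothetical) and reproduces the 302-word total for 구분.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Lower-case, split on whitespace, strip ASCII punctuation from both ends of each token."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were nothing but punctuation
                counts[word] += 1
    return counts


category_counts = {
    "식품": 95, "의약품": 40, "하천수, 호소수": 37, "토양": 34,
    "환경검사": 23, "축산": 15, "미생물": 14, "하수(오수)검사": 7,
}
values = [v for v, n in category_counts.items() for _ in range(n)]
words = word_counts(values)
total = sum(words.values())
print(total, words["하천수"], round(100 * words["식품"] / total, 1))  # 302 37 31.5
```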

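시료유형1 (sample type 1)
Categorical

HIGH CORRELATION

Distinct: 35
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
식품별 규격 확인 시험법 (test methods for verifying food-specific standards) | 34
성분시험법 (component test methods) | 25
일반오염물질 (general pollutants) | 20
검사항목(22항목) (test items, 22 items) | 19
특정유해물질 (specific hazardous substances) | 17
Other values (30) | 150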

Length

Max length: 18
Median length: 11
Mean length: 7.6981132
Min length: 2

Unique

Unique: 11
Unique (%): 4.2%

Sample

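1st row: 가공식품 및 조리식품 등 (processed and cooked foods, etc.)
2nd row: 가공식품 및 조리식품 등 (processed and cooked foods, etc.)
3rd row: 가공식품 및 조리식품 등 (processed and cooked foods, etc.)
4th row: 가공식품 및 조리식품 등 (processed and cooked foods, etc.)
5th row: 가공식품 및 조리식품 등 (processed and cooked foods, etc.)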

Common Values

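Value | Count | Frequency (%)
식품별 규격 확인 시험법 (test methods for verifying food-specific standards) | 34 | 12.8%
성분시험법 (component test methods) | 25 | 9.4%
일반오염물질 (general pollutants) | 20 | 7.5%
검사항목(22항목) (test items, 22 items) | 19 | 7.2%
특정유해물질 (specific hazardous substances) | 17 | 6.4%
축산물 (livestock products) | 15 | 5.7%
가공식품 및 조리식품 등 (processed and cooked foods, etc.) | 14 | 5.3%
의약품 및 의약외품 (pharmaceuticals and quasi-drugs) | 12 | 4.5%
건강기능식품 (health functional foods) | 11 | 4.2%
유해물질 (hazardous substances) | 10 | 3.8%
Other values (25) | 88 | 33.2%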

Length

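[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
식품별 (per food) | 34 | 7.3%
시험법 (test method) | 34 | 7.3%
규격 (standard) | 34 | 7.3%
확인 (verification) | 34 | 7.3%
[illegible] | 26 | 5.6%
성분시험법 (component test method) | 25 | 5.3%
일반오염물질 (general pollutants) | 20 | 4.3%
검사항목(22항목 (test items, 22 items) | 19 | 4.1%
특정유해물질 (specific hazardous substances) | 17 | 3.6%
축산물 (livestock products) | 15 | 3.2%
Other values (39) | 210 | 44.9%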

시료유형2 (sample type 2)
Text

MISSING

Distinct: 60
Distinct (%): 33.7%
Missing: 87
Missing (%): 32.8%
Memory size: 2.2 KiB

Length

Max length: 36
Median length: 30
Mean length: 6.1179775
Min length: 2

Characters and Unicode

Total characters: 1089
Distinct characters: 155
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns properties to every code point, including its general category, script and block; the tables below count this variable's characters by each of these properties.
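Python's standard `unicodedata` module exposes one of these properties, the general category, as a two-letter code: `Lo` (Other Letter) covers Hangul syllables, `Zs` is Space Separator, `Nd` is Decimal Number, and so on. The sketch below counts characters by general category with `unicodedata.category`. The mapping from codes to the long names used in this report is written out by hand and covers only the categories that appear here; the sample strings are illustrative.

```python
import unicodedata
from collections import Counter

# Long names used in this report, keyed by Unicode general-category code.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values: list[str]) -> Counter:
    """Count every character in values by its Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["식중독균 검사", "HACCP검사(1항목당)"]))
```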

Unique

Unique: 30
Unique (%): 16.9%
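Distinct counts how many different values occur; Unique counts the values that occur exactly once. In this report both percentages are taken over non-missing rows only. For 시료유형2, 60 / (265 - 87) ≈ 33.7% and 30 / 178 ≈ 16.9%. A minimal sketch (hypothetical function name, with `None` marking a missing value):

```python
from collections import Counter


def distinct_and_unique(values: list) -> dict[str, float]:
    """Distinct and unique counts, with percentages over the non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100 * unique / n if n else 0.0,
    }


print(distinct_and_unique(["a", "a", "b", None, "c"]))
```

The same rule reproduces the 검사항목 figures further down: 215 / 258 ≈ 83.3% distinct and 186 / 258 ≈ 72.1% unique.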

Sample

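1st row: 식품규격 (food standards)
2nd row: 식품규격 (food standards)
3rd row: 식중독균 검사 (foodborne pathogen testing)
4th row: 식중독균 검사 (foodborne pathogen testing)
5th row: 식중독균 검사 (foodborne pathogen testing)

Words

Value | Count | Frequency (%)
[illegible] | 21 | 7.8%
의약품 (pharmaceuticals) | 12 | 4.5%
의약외품 (quasi-drugs) | 12 | 4.5%
식중독균 (foodborne pathogens) | 11 | 4.1%
검사 (testing) | 11 | 4.1%
일반시험법 (general test methods) | 10 | 3.7%
조미식품 (seasoning foods) | 9 | 3.4%
위생용품 (hygiene products) | 9 | 3.4%
또는 (or) | 7 | 2.6%
개별성분시험법 (individual component test methods) | 7 | 2.6%
Other values (79) | 159 | 59.3%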

Most occurring characters

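Value | Count | Frequency (%)
(space) | 90 | 8.3%
[illegible] | 71 | 6.5%
[illegible] | 40 | 3.7%
[illegible] | 26 | 2.4%
[illegible] | 25 | 2.3%
[illegible] | 24 | 2.2%
[illegible] | 24 | 2.2%
[illegible] | 22 | 2.0%
[illegible] | 22 | 2.0%
[illegible] | 22 | 2.0%
Other values (145) | 723 | 66.4%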

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 940 | 86.3%
Space Separator | 90 | 8.3%
Other Punctuation | 22 | 2.0%
Close Punctuation | 14 | 1.3%
Open Punctuation | 14 | 1.3%
Uppercase Letter | 5 | 0.5%
Decimal Number | 4 | 0.4%

Most frequent character per category

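Other Letter
Value | Count | Frequency (%)
[illegible] | 71 | 7.6%
[illegible] | 40 | 4.3%
[illegible] | 26 | 2.8%
[illegible] | 25 | 2.7%
[illegible] | 24 | 2.6%
[illegible] | 24 | 2.6%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 21 | 2.2%
Other values (134) | 643 | 68.4%

Uppercase Letter
Value | Count | Frequency (%)
C | 2 | 40.0%
H | 1 | 20.0%
A | 1 | 20.0%
P | 1 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
, | 15 | 68.2%
· | 7 | 31.8%

Decimal Number
Value | Count | Frequency (%)
1 | 3 | 75.0%
7 | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 90 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 14 | 100.0%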

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 940 | 86.3%
Common | 144 | 13.2%
Latin | 5 | 0.5%

Most frequent character per script

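Hangul
Value | Count | Frequency (%)
[illegible] | 71 | 7.6%
[illegible] | 40 | 4.3%
[illegible] | 26 | 2.8%
[illegible] | 25 | 2.7%
[illegible] | 24 | 2.6%
[illegible] | 24 | 2.6%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 21 | 2.2%
Other values (134) | 643 | 68.4%

Common
Value | Count | Frequency (%)
(space) | 90 | 62.5%
, | 15 | 10.4%
) | 14 | 9.7%
( | 14 | 9.7%
· | 7 | 4.9%
1 | 3 | 2.1%
7 | 1 | 0.7%

Latin
Value | Count | Frequency (%)
C | 2 | 40.0%
H | 1 | 20.0%
A | 1 | 20.0%
P | 1 | 20.0%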

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 940 | 86.3%
ASCII | 142 | 13.0%
None | 7 | 0.6%

Most frequent character per block

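ASCII
Value | Count | Frequency (%)
(space) | 90 | 63.4%
, | 15 | 10.6%
) | 14 | 9.9%
( | 14 | 9.9%
1 | 3 | 2.1%
C | 2 | 1.4%
H | 1 | 0.7%
7 | 1 | 0.7%
A | 1 | 0.7%
P | 1 | 0.7%

Hangul
Value | Count | Frequency (%)
[illegible] | 71 | 7.6%
[illegible] | 40 | 4.3%
[illegible] | 26 | 2.8%
[illegible] | 25 | 2.7%
[illegible] | 24 | 2.6%
[illegible] | 24 | 2.6%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 22 | 2.3%
[illegible] | 21 | 2.2%
Other values (134) | 643 | 68.4%

None
Value | Count | Frequency (%)
· | 7 | 100.0%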

검사항목 (test items)
Text

MISSING

Distinct: 215
Distinct (%): 83.3%
Missing: 7
Missing (%): 2.6%
Memory size: 2.2 KiB

Length

Max length: 65
Median length: 45
Mean length: 9.3682171
Min length: 1

Characters and Unicode

Total characters: 2417
Distinct characters: 294
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns properties to every code point, including its general category, script and block; the tables below count this variable's characters by each of these properties.

Unique

Unique: 186
Unique (%): 72.1%

Sample

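1st row: 대장균군, 일반세균수, 유산균수, 대장균, 진균수(1항목당) (coliforms, total bacterial count, lactic acid bacteria count, E. coli, fungal count; per item)
2nd row: 대장균군, 일반세균수, 대장균(1항목당) (coliforms, total bacterial count, E. coli; per item)
3rd row: 살모넬라, 장출혈성 대장균, 장염비브리오, 황색포도상구균, 리스테리아 모노사이토제네스(1항목당) (Salmonella, enterohemorrhagic E. coli, Vibrio parahaemolyticus, Staphylococcus aureus, Listeria monocytogenes; per item)
4th row: 살모넬라, 장출혈성 대장균, 장염비브리오, 황색포도상구균, 리스테리아 모노사이토제네스(1항목당) (same as the 3rd row)
5th row: 여시니아 엔테로콜리티카 (Yersinia enterocolitica)

Words

Value | Count | Frequency (%)
항목 (item) | 16 | 3.7%
[illegible] | 7 | 1.6%
시험 (test) | 7 | 1.6%
[illegible] | 6 | 1.4%
카드뮴 (cadmium) | 6 | 1.4%
살모넬라 (Salmonella) | 6 | 1.4%
그밖의 (other) | 5 | 1.2%
타르색소 (tar dyes) | 5 | 1.2%
대장균 (E. coli) | 5 | 1.2%
비소 (arsenic) | 5 | 1.2%
Other values (288) | 365 | 84.3%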

Most occurring characters

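Value | Count | Frequency (%)
(space) | 176 | 7.3%
) | 100 | 4.1%
( | 100 | 4.1%
, | 98 | 4.1%
[illegible] | 53 | 2.2%
1 | 52 | 2.2%
[illegible] | 48 | 2.0%
[illegible] | 44 | 1.8%
[illegible] | 43 | 1.8%
[illegible] | 43 | 1.8%
Other values (284) | 1660 | 68.7%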

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1681 | 69.5%
Space Separator | 176 | 7.3%
Uppercase Letter | 132 | 5.5%
Other Punctuation | 101 | 4.2%
Close Punctuation | 100 | 4.1%
Open Punctuation | 100 | 4.1%
Decimal Number | 89 | 3.7%
Lowercase Letter | 21 | 0.9%
Dash Punctuation | 16 | 0.7%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[illegible] | 53 | 3.2%
[illegible] | 48 | 2.9%
[illegible] | 44 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 39 | 2.3%
[illegible] | 38 | 2.3%
[illegible] | 37 | 2.2%
[illegible] | 29 | 1.7%
[illegible] | 29 | 1.7%
Other values (238) | 1278 | 76.0%

Uppercase Letter
Value | Count | Frequency (%)
D | 14 | 10.6%
C | 14 | 10.6%
B | 13 | 9.8%
P | 12 | 9.1%
N | 11 | 8.3%
H | 11 | 8.3%
T | 10 | 7.6%
O | 9 | 6.8%
A | 9 | 6.8%
E | 9 | 6.8%
Other values (7) | 20 | 15.2%

Decimal Number
Value | Count | Frequency (%)
1 | 52 | 58.4%
2 | 11 | 12.4%
6 | 9 | 10.1%
5 | 5 | 5.6%
0 | 3 | 3.4%
4 | 3 | 3.4%
9 | 2 | 2.2%
3 | 2 | 2.2%
7 | 1 | 1.1%
8 | 1 | 1.1%

Lowercase Letter
Value | Count | Frequency (%)
n | 5 | 23.8%
α | 3 | 14.3%
p | 3 | 14.3%
a | 2 | 9.5%
r | 2 | 9.5%
i | 2 | 9.5%
e | 1 | 4.8%
b | 1 | 4.8%
g | 1 | 4.8%
u | 1 | 4.8%

Other Punctuation
Value | Count | Frequency (%)
, | 98 | 97.0%
# | 1 | 1.0%
/ | 1 | 1.0%
· | 1 | 1.0%

Space Separator
Value | Count | Frequency (%)
(space) | 176 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 100 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 100 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 16 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1681 | 69.5%
Common | 583 | 24.1%
Latin | 150 | 6.2%
Greek | 3 | 0.1%
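Python's standard library has no direct lookup for a character's Unicode Script property, so the sketch below uses a rough, clearly heuristic stand-in: it reads the script from the start of the character's Unicode name (`unicodedata.name`), so "HANGUL SYLLABLE ..." maps to Hangul and "GREEK SMALL LETTER ALPHA" to Greek, and it labels everything else, including digits, spaces and punctuation, as Common. This is not the real Script property and will misclassify characters whose names do not begin with a script name. The function names, the prefix list, and the sample strings are hypothetical.

```python
import unicodedata
from collections import Counter

# Name prefixes treated as scripts in this heuristic; anything else counts as "Common".
SCRIPT_PREFIXES = ("HANGUL", "LATIN", "GREEK")


def approx_script(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic, not the Script property)."""
    name = unicodedata.name(ch, "")  # empty string for characters without a name
    for prefix in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return prefix.title()
    return "Common"


def script_counts(values: list[str]) -> Counter:
    """Count characters across all values by their approximate script."""
    return Counter(approx_script(ch) for value in values for ch in value)


print(script_counts(["납(Pb)", "α-토코페롤"]))
```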

Most frequent character per script

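Hangul
Value | Count | Frequency (%)
[illegible] | 53 | 3.2%
[illegible] | 48 | 2.9%
[illegible] | 44 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 39 | 2.3%
[illegible] | 38 | 2.3%
[illegible] | 37 | 2.2%
[illegible] | 29 | 1.7%
[illegible] | 29 | 1.7%
Other values (238) | 1278 | 76.0%

Latin
Value | Count | Frequency (%)
D | 14 | 9.3%
C | 14 | 9.3%
B | 13 | 8.7%
P | 12 | 8.0%
N | 11 | 7.3%
H | 11 | 7.3%
T | 10 | 6.7%
O | 9 | 6.0%
A | 9 | 6.0%
E | 9 | 6.0%
Other values (16) | 38 | 25.3%

Common
Value | Count | Frequency (%)
(space) | 176 | 30.2%
) | 100 | 17.2%
( | 100 | 17.2%
, | 98 | 16.8%
1 | 52 | 8.9%
- | 16 | 2.7%
2 | 11 | 1.9%
6 | 9 | 1.5%
5 | 5 | 0.9%
0 | 3 | 0.5%
Other values (9) | 13 | 2.2%

Greek
Value | Count | Frequency (%)
α | 3 | 100.0%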

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1681 | 69.5%
ASCII | 732 | 30.3%
None | 4 | 0.2%

Most frequent character per block

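ASCII
Value | Count | Frequency (%)
(space) | 176 | 24.0%
) | 100 | 13.7%
( | 100 | 13.7%
, | 98 | 13.4%
1 | 52 | 7.1%
- | 16 | 2.2%
D | 14 | 1.9%
C | 14 | 1.9%
B | 13 | 1.8%
P | 12 | 1.6%
Other values (34) | 137 | 18.7%

Hangul
Value | Count | Frequency (%)
[illegible] | 53 | 3.2%
[illegible] | 48 | 2.9%
[illegible] | 44 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 43 | 2.6%
[illegible] | 39 | 2.3%
[illegible] | 38 | 2.3%
[illegible] | 37 | 2.2%
[illegible] | 29 | 1.7%
[illegible] | 29 | 1.7%
Other values (238) | 1278 | 76.0%

None
Value | Count | Frequency (%)
α | 3 | 75.0%
· | 1 | 25.0%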

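비고 (remarks)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 14
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Value | Count
<NA> | 229
n=5 | 9
기기분석 (instrumental analysis) | 5
상기외의 시험 (tests other than the above) | 4
참고용 (for reference only) | 4
Other values (9) | 14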

Length

Max length: 24
Median length: 4
Mean length: 4.2867925
Min length: 3

Unique

Unique: 6
Unique (%): 2.3%

Sample

1st row: <NA>
2nd row: n=5
3rd row: <NA>
4th row: n=5
5th row: n=5

Common Values

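Value | Count | Frequency (%)
<NA> | 229 | 86.4%
n=5 | 9 | 3.4%
기기분석 (instrumental analysis) | 5 | 1.9%
상기외의 시험 (tests other than the above) | 4 | 1.5%
참고용 (for reference only) | 4 | 1.5%
출장비: 40,000원 별도 (travel fee of KRW 40,000 charged separately) | 3 | 1.1%
살균, n=5 (sterilized, n=5) | 3 | 1.1%
비살균, n=5 (non-sterilized, n=5) | 2 | 0.8%
박층크로마토그래프법 (thin-layer chromatography) | 1 | 0.4%
적정법 (titration) | 1 | 0.4%
Other values (4) | 4 | 1.5%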
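Although 229 of the 265 values of 비고 are "<NA>", the report lists 0 missing values for this column. That is consistent with "<NA>" being stored as a literal text value rather than a true missing marker; the word table for this variable also tokenizes it as "na". If those entries should count as missing, they can be converted before profiling. The standard-library sketch below does this; the set of placeholder strings it treats as missing is an assumption to adjust for the actual file, and the function name is hypothetical.

```python
# Placeholder strings treated as missing; this set is an assumption, not part of the dataset spec.
MISSING_MARKERS = {"<NA>", "NA", "N/A", ""}


def normalize_missing(values: list) -> list:
    """Replace placeholder strings (compared after trimming whitespace) with None."""
    return [
        None if isinstance(v, str) and v.strip() in MISSING_MARKERS else v
        for v in values
    ]


remarks = ["<NA>", "n=5", "<NA>", "n=5", "n=5", " <NA> "]
cleaned = normalize_missing(remarks)
print(sum(v is None for v in cleaned), "missing of", len(cleaned))
```

When loading the file with pandas instead, passing `na_values=["<NA>"]` to `pandas.read_csv` has a similar effect at read time.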

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
na | 229 | 80.9%
n=5 | 14 | 4.9%
기기분석 (instrumental analysis) | 5 | 1.8%
상기외의 (other than the above) | 4 | 1.4%
시험 (test) | 4 | 1.4%
참고용 (for reference only) | 4 | 1.4%
출장비 (travel fee) | 4 | 1.4%
비살균 (non-sterilized) | 3 | 1.1%
살균 (sterilized) | 3 | 1.1%
별도 (charged separately) | 3 | 1.1%
Other values (8) | 10 | 3.5%

수수료(원) (fee, KRW)
Text

Distinct: 167
Distinct (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 11
Median length: 5
Mean length: 4.9886792
Min length: 3

Characters and Unicode

Total characters: 1322
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns properties to every code point, including its general category, script and block; the tables below count this variable's characters by each of these properties.

Unique

Unique: 129
Unique (%): 48.7%

Sample

1st row: 19900
2nd row: 43600
3rd row: 29000
4th row: 64900
5th row: 47700

Words

Value | Count | Frequency (%)
44200 | 22 | 7.9%
6900 | 8 | 2.9%
8600 | 7 | 2.5%
30000 | 7 | 2.5%
유사시험 (similar test) | 6 | 2.2%
항목에 (to the item) | 6 | 2.2%
준함 (applies accordingly) | 6 | 2.2%
3400 | 5 | 1.8%
26000 | 5 | 1.8%
2800 | 4 | 1.4%
Other values (159) | 201 | 72.6%
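Six entries of 수수료(원) contain the phrase "유사시험 항목에 준함" (charged as for the corresponding similar test) instead of an amount, which is why the column is profiled as Text rather than as a number. Every other entry consists only of digits: the column's 20 distinct characters are the ten digits, the space, and the nine Hangul syllables of that phrase. A sketch that converts each entry to an integer amount in won and returns `None` for the non-numeric ones (the helper name is hypothetical):

```python
from typing import Optional


def parse_fee(raw: str) -> Optional[int]:
    """Return the fee in won as an int, or None when the entry is not a plain ASCII digit string."""
    text = raw.strip()
    # isascii() guards against non-ASCII digit characters that int() would reject.
    return int(text) if text.isascii() and text.isdigit() else None


fees = ["19900", "43600", "유사시험 항목에 준함", "225700"]
parsed = [parse_fee(f) for f in fees]
print(parsed)  # [19900, 43600, None, 225700]
```

Keeping `None` rather than dropping those rows preserves the row count, so the numeric column can still be joined back to the other five.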

Most occurring characters

Value | Count | Frequency (%)
0 | 622 | 47.0%
4 | 104 | 7.9%
1 | 90 | 6.8%
2 | 88 | 6.7%
3 | 79 | 6.0%
6 | 76 | 5.7%
5 | 55 | 4.2%
7 | 50 | 3.8%
8 | 47 | 3.6%
9 | 45 | 3.4%
Other values (10) | 66 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1256 | 95.0%
Other Letter | 54 | 4.1%
Space Separator | 12 | 0.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 622 | 49.5%
4 | 104 | 8.3%
1 | 90 | 7.2%
2 | 88 | 7.0%
3 | 79 | 6.3%
6 | 76 | 6.1%
5 | 55 | 4.4%
7 | 50 | 4.0%
8 | 47 | 3.7%
9 | 45 | 3.6%

Other Letter
Value | Count | Frequency (%)
유 | 6 | 11.1%
사 | 6 | 11.1%
시 | 6 | 11.1%
험 | 6 | 11.1%
항 | 6 | 11.1%
목 | 6 | 11.1%
에 | 6 | 11.1%
준 | 6 | 11.1%
함 | 6 | 11.1%

Space Separator
Value | Count | Frequency (%)
(space) | 12 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1268 | 95.9%
Hangul | 54 | 4.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 622 | 49.1%
4 | 104 | 8.2%
1 | 90 | 7.1%
2 | 88 | 6.9%
3 | 79 | 6.2%
6 | 76 | 6.0%
5 | 55 | 4.3%
7 | 50 | 3.9%
8 | 47 | 3.7%
9 | 45 | 3.5%

Hangul
Value | Count | Frequency (%)
유 | 6 | 11.1%
사 | 6 | 11.1%
시 | 6 | 11.1%
험 | 6 | 11.1%
항 | 6 | 11.1%
목 | 6 | 11.1%
에 | 6 | 11.1%
준 | 6 | 11.1%
함 | 6 | 11.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1268 | 95.9%
Hangul | 54 | 4.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 622 | 49.1%
4 | 104 | 8.2%
1 | 90 | 7.1%
2 | 88 | 6.9%
3 | 79 | 6.2%
6 | 76 | 6.0%
5 | 55 | 4.3%
7 | 50 | 3.9%
8 | 47 | 3.7%
9 | 45 | 3.5%

Hangul
Value | Count | Frequency (%)
유 | 6 | 11.1%
사 | 6 | 11.1%
시 | 6 | 11.1%
험 | 6 | 11.1%
항 | 6 | 11.1%
목 | 6 | 11.1%
에 | 6 | 11.1%
준 | 6 | 11.1%
함 | 6 | 11.1%

Correlations

[Figure: correlation heatmap]

Variable | 구분 | 시료유형1 | 시료유형2 | 비고
구분 | 1.000 | 1.000 | 1.000 | 1.000
시료유형1 | 1.000 | 1.000 | 0.999 | 0.849
시료유형2 | 1.000 | 0.999 | 1.000 | 0.955
비고 | 1.000 | 0.849 | 0.955 | 1.000

[Figure: correlation heatmap]

Variable | 시료유형1 | 구분 | 비고
시료유형1 | 1.000 | 0.946 | 0.513
구분 | 0.946 | 1.000 | 0.848
비고 | 0.513 | 0.848 | 1.000

[Figure: correlation heatmap]

Variable | 구분 | 시료유형1 | 비고
구분 | 1.000 | 0.946 | 0.848
시료유형1 | 0.946 | 1.000 | 0.513
비고 | 0.848 | 0.513 | 1.000
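The report does not state here which coefficient each matrix above uses. For pairs of categorical columns, one common association measure is Cramér's V, defined as V = sqrt(χ² / (n · (min(r, c) - 1))), where χ² is Pearson's chi-square statistic on the r × c contingency table of the two columns and n is the number of rows. The standard-library sketch below computes that uncorrected form. It illustrates categorical association in general, not necessarily the method behind the figures above, and the function name is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs: list, ys: list) -> float:
    """Cramér's V (no bias correction) between two equally long categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    chi2 = 0.0
    # Sum (observed - expected)^2 / expected over every cell of the contingency table.
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# A one-to-one mapping between the two columns gives V = 1.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
```

A value near 1, as between 구분 and 시료유형1, means that knowing one column almost determines the other; here that is expected, since each 시료유형1 value appears to sit under a single 구분 value.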

Missing values

[Figure: nullity count by column]
A simple visualization of nullity (missing values) by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that makes patterns in data completeness easy to spot.
[Figure: nullity correlation heatmap]
The heatmap shows nullity correlation: how strongly the presence or absence of one variable is associated with the presence of another.
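In the missingno library, nullity correlation is the Pearson correlation between two columns' missing-value indicators (1 where a value is missing, 0 where it is present). A standard-library sketch under that definition, with hypothetical function names; it returns `None` when either column is always or never missing, because the correlation is undefined for a constant indicator:

```python
import math
from typing import Optional


def pearson(a: list[float], b: list[float]) -> Optional[float]:
    """Pearson correlation; None if either sequence is empty or has zero variance."""
    n = len(a)
    if n == 0:
        return None
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def nullity_correlation(col_a: list, col_b: list) -> Optional[float]:
    """Correlate the is-missing indicators (None -> 1.0, present -> 0.0) of two columns."""
    return pearson([float(v is None) for v in col_a], [float(v is None) for v in col_b])


print(nullity_correlation([None, 1, None, 2], ["x", 3, "y", None]))
```

In this dataset only 시료유형2 and 검사항목 have missing values, so their pair is the only one with a defined nullity correlation.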

Sample

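First rows

Row | 구분 (category) | 시료유형1 (sample type 1) | 시료유형2 (sample type 2) | 검사항목 (test items) | 비고 (remarks) | 수수료(원) (fee, KRW)
0 | 미생물 | 가공식품 및 조리식품 등 | 식품규격 | 대장균군, 일반세균수, 유산균수, 대장균, 진균수(1항목당) | <NA> | 19900
1 | 미생물 | 가공식품 및 조리식품 등 | 식품규격 | 대장균군, 일반세균수, 대장균(1항목당) | n=5 | 43600
2 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 살모넬라, 장출혈성 대장균, 장염비브리오, 황색포도상구균, 리스테리아 모노사이토제네스(1항목당) | <NA> | 29000
3 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 살모넬라, 장출혈성 대장균, 장염비브리오, 황색포도상구균, 리스테리아 모노사이토제네스(1항목당) | n=5 | 64900
4 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 여시니아 엔테로콜리티카 | n=5 | 47700
5 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 클로스트리디움 퍼프린젠스 | n=5 | 40200
6 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 캠필로박터 제주니/콜리 | n=5 | 46500
7 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 크로노박터 | n=5 | 38300
8 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 바실루스 세레우스(정량) | <NA> | 52800
9 | 미생물 | 가공식품 및 조리식품 등 | 식중독균 검사 | 바실루스 세레우스(정량) | n=5 | 225700

Last rows

Row | 구분 (category) | 시료유형1 (sample type 1) | 시료유형2 (sample type 2) | 검사항목 (test items) | 비고 (remarks) | 수수료(원) (fee, KRW)
255 | 축산 | 축산물 | 알가공품 | 세균수, 대장균군, 살모넬라, 리스테리아모노사이토제네스 | 살균, n=5 | 217000
256 | 축산 | 축산물 | 식육 (학교급식) | 항생제검사(정성, 단성분정량검사) | <NA> | 60000
257 | 축산 | 축산물 | HACCP검사 | 세균수, 대장균, 살모넬라 | <NA> | 68800
258 | 축산 | 축산물 | 기구류, 낙하세균 | 세균수 | <NA> | 19900
259 | 축산 | 축산물 | 축산물 | 잔류농약(다성분분석법) | 참고용 | 278400
260 | 축산 | 축산물 | 축산물 | 잔류농약(단성분분석법) | 참고용 | 81400
261 | 축산 | 축산물 | 축산물 | 잔류동물용의약품(정성시험) | 참고용 | 20000
262 | 축산 | 축산물 | 축산물 | 잔류동물용의약품(정량시험) | 참고용 | 40000
263 | 축산 | 축산물 | 식용란 | 잔류물질 | 자가품질검사 | 231400
264 | 축산 | 축산물 | 소고기 | 한우확인검사 | <NA> | 80000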