Overview

Dataset statistics

Number of variables: 6
Number of observations: 5641
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 372
Duplicate rows (%): 6.6%
Total size in memory: 275.6 KiB
Average record size in memory: 50.0 B
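
The duplicate-row percentage and the average record size follow directly from the raw counts. A minimal Python sketch of that arithmetic, using only the figures reported above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
n_rows = 5641
duplicate_rows = 372
total_size_kib = 275.6

# Share of duplicate rows as a percentage, rounded to one decimal place.
duplicate_pct = round(100 * duplicate_rows / n_rows, 1)

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(duplicate_pct)     # 6.6
print(avg_record_bytes)  # 50.0
```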

Variable types

Numeric: 2
Categorical: 4

Dataset

Description: Status of livestock product microbiological test results (축산물 미생물 검사결과 현황)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=ZP00NKB44H9ZJXUSY11Z11622002&infSeq=1

Alerts

Dataset has 372 (6.6%) duplicate rows [Duplicates]
세균구분명 is highly overall correlated with 검사세균분포구분명 [High correlation]
검사세균분포구분명 is highly overall correlated with 세균구분명 [High correlation]
검사결과수(건) has 2161 (38.3%) zeros [Zeros]

Reproduction

Analysis started: 2024-04-20 18:36:00.484412
Analysis finished: 2024-04-20 18:36:02.487091
Duration: 2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
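
The reported duration is the rounded difference between the two timestamps. A small sketch that recomputes it with the Python standard library (the format string is an assumption that matches the timestamps as printed):

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-04-20 18:36:00.484412", FMT)
finished = datetime.strptime("2024-04-20 18:36:02.487091", FMT)

# Elapsed wall-clock time in seconds.
elapsed = (finished - started).total_seconds()

print(elapsed)         # 2.002679
print(round(elapsed))  # 2
```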

Variables

집계년도 (aggregation year)
Real number (ℝ)

Distinct: 15
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2017.0207
Minimum: 2010
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.7 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2013
Median: 2016
Q3: 2021
95-th percentile: 2023
Maximum: 2024
Range: 14
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.4742085
Coefficient of variation (CV): 0.0022182263
Kurtosis: -1.4372532
Mean: 2017.0207
Median Absolute Deviation (MAD): 4
Skewness: -0.041293385
Sum: 11378014
Variance: 20.018541
Monotonicity: Decreasing
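
Several of the statistics above are derived from one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are simple differences. A minimal sketch that recomputes them from the reported values (inputs are the rounded figures shown above, so results agree only to the printed precision):

```python
# Values copied from the 집계년도 statistics above.
n = 5641
mean = 2017.0207
std = 4.4742085
minimum, maximum = 2010, 2024
q1, q3 = 2013, 2021

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
total = mean * n                 # sum, up to rounding of the mean
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(value_range, iqr)  # 14 8
```

Because the mean is printed to four decimals, `total` lands within about one unit of the reported sum rather than matching it exactly.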
Figure: Histogram with fixed size bins (bins=15)

Common values
Value | Count | Frequency (%)
2023 | 600 | 10.6%
2022 | 600 | 10.6%
2013 | 528 | 9.4%
2021 | 480 | 8.5%
2014 | 444 | 7.9%
2010 | 405 | 7.2%
2015 | 396 | 7.0%
2020 | 386 | 6.8%
2011 | 384 | 6.8%
2012 | 338 | 6.0%
Other values (5) | 1080 | 19.1%

Minimum 10 values
Value | Count | Frequency (%)
2010 | 405 | 7.2%
2011 | 384 | 6.8%
2012 | 338 | 6.0%
2013 | 528 | 9.4%
2014 | 444 | 7.9%
2015 | 396 | 7.0%
2016 | 337 | 6.0%
2017 | 66 | 1.2%
2018 | 263 | 4.7%
2019 | 264 | 4.7%

Maximum 10 values
Value | Count | Frequency (%)
2024 | 150 | 2.7%
2023 | 600 | 10.6%
2022 | 600 | 10.6%
2021 | 480 | 8.5%
2020 | 386 | 6.8%
2019 | 264 | 4.7%
2018 | 263 | 4.7%
2017 | 66 | 1.2%
2016 | 337 | 6.0%
2015 | 396 | 7.0%

축종명 (livestock species name)
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.2 KiB

(label missing): 1414
돼지 (pig): 1384
닭고기 (chicken meat): 1253
오리고기 (duck meat): 1147
염소 (goat): 270
Other values (2): 173

Length

Max length: 4
Median length: 3
Mean length: 2.348697
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 오리고기
2nd row: (label missing)
3rd row: 염소
4th row: 닭고기
5th row: 돼지

Common Values

Value | Count | Frequency (%)
(label missing) | 1414 | 25.1%
돼지 | 1384 | 24.5%
닭고기 | 1253 | 22.2%
오리고기 | 1147 | 20.3%
염소 | 270 | 4.8%
(label missing) | 166 | 2.9%
오리 (duck) | 7 | 0.1%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: Plot of the common values

세균구분명 (bacteria category name)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.2 KiB

일반세균수 (general bacterial count): 3020
대장균수 (E. coli count): 2621

Length

Max length: 5
Median length: 5
Mean length: 4.5353661
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반세균수
2nd row: 일반세균수
3rd row: 일반세균수
4th row: 일반세균수
5th row: 일반세균수

Common Values

Value | Count | Frequency (%)
일반세균수 | 3020 | 53.5%
대장균수 | 2621 | 46.5%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: Plot of the common values

검사년월 (test year-month)
Categorical

Distinct: 12
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 44.2 KiB

2024-03: 563
2024-01: 539
2024-09: 538
2024-02: 515
2024-12: 479
Other values (7): 3007

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024-03
2nd row: 2024-03
3rd row: 2024-03
4th row: 2024-03
5th row: 2024-03

Common Values

Value | Count | Frequency (%)
2024-03 | 563 | 10.0%
2024-01 | 539 | 9.6%
2024-09 | 538 | 9.5%
2024-02 | 515 | 9.1%
2024-12 | 479 | 8.5%
2024-08 | 463 | 8.2%
2024-06 | 442 | 7.8%
2024-11 | 436 | 7.7%
2024-04 | 421 | 7.5%
2024-10 | 419 | 7.4%
Other values (2) | 826 | 14.6%
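
Every 검사년월 value carries the year 2024, including on sample rows whose 집계년도 is 2010, so the column appears to distinguish only the month. The sketch below assumes (this is not stated anywhere in the report) that 2024 is a placeholder and that the intended period is the 집계년도 year combined with the month part of 검사년월. The function name `to_period` is hypothetical.

```python
def to_period(aggregation_year: int, test_year_month: str) -> str:
    """Combine a 집계년도 value with the month part of a 검사년월 value.

    Assumption: the year prefix in 검사년월 is a placeholder and only
    the month is meaningful.
    """
    _, month = test_year_month.split("-")
    return f"{aggregation_year:04d}-{int(month):02d}"


print(to_period(2010, "2024-01"))  # 2010-01
print(to_period(2024, "2024-03"))  # 2024-03
```

If the year prefix turns out to be genuine, this transformation should not be applied.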

Length

Figure: Histogram of lengths of the category

검사세균분포구분명 (tested bacteria distribution category name)
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 44.2 KiB

10의2승초과~10의3승이하 (over 10^2, up to 10^3): 1154
10의3승초과~10의4승이하 (over 10^3, up to 10^4): 1023
10의4승초과~10의5승이하 (over 10^4, up to 10^5): 789
10의1승초과~10의2승이하 (over 10^1, up to 10^2): 590
10의5승초과~10의6승이하 (over 10^5, up to 10^6): 461
Other values (20): 1624

Length

Max length: 15
Median length: 15
Mean length: 12.843999
Min length: 5

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 10의5승초과~10의6승이하
2nd row: 10의5승초과~10의6승이하
3rd row: 10의5승초과~10의6승이하
4th row: 10의5승초과~10의6승이하
5th row: 10의5승초과~10의6승이하

Common Values

Value | Count | Frequency (%)
10의2승초과~10의3승이하 | 1154 | 20.5%
10의3승초과~10의4승이하 | 1023 | 18.1%
10의4승초과~10의5승이하 | 789 | 14.0%
10의1승초과~10의2승이하 | 590 | 10.5%
10의5승초과~10의6승이하 | 461 | 8.2%
10의1승이하 | 362 | 6.4%
10의2승이하 | 355 | 6.3%
10의2승 이하 | 223 | 4.0%
10의1승 이하 | 223 | 4.0%
10의4승초과 | 153 | 2.7%
Other values (15) | 308 | 5.5%
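
The Common Values table lists "10의2승이하" and "10의2승 이하" (and likewise the 10의1승 pair) as separate categories that differ only by a space. Assuming these spacing variants denote the same range, which the report itself does not confirm, a minimal sketch of merging them by stripping whitespace could look like this (`normalize_label` is a hypothetical helper):

```python
import re
from collections import Counter


def normalize_label(label: str) -> str:
    """Remove all whitespace so spacing variants collapse to one label."""
    return re.sub(r"\s+", "", label)


# Counts copied from the Common Values table above.
counts = Counter({
    "10의1승이하": 362,
    "10의2승이하": 355,
    "10의2승 이하": 223,
    "10의1승 이하": 223,
})

# Re-aggregate the counts under the normalized labels.
merged = Counter()
for label, count in counts.items():
    merged[normalize_label(label)] += count

print(merged["10의1승이하"])  # 585
print(merged["10의2승이하"])  # 578
```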

Length

Figure: Histogram of lengths of the category

Words (frequency of whitespace-separated tokens)

Value | Count | Frequency (%)
10의2승초과~10의3승이하 | 1154 | 18.9%
10의3승초과~10의4승이하 | 1023 | 16.8%
10의4승초과~10의5승이하 | 789 | 12.9%
10의1승초과~10의2승이하 | 590 | 9.7%
10의5승초과~10의6승이하 | 461 | 7.6%
이하 | 452 | 7.4%
10의1승이하 | 362 | 5.9%
10의2승이하 | 355 | 5.8%
10의2승 | 223 | 3.7%
10의1승 | 223 | 3.7%
Other values (16) | 461 | 7.6%

검사결과수(건) (number of test results, in cases)
Real number (ℝ)

ZEROS 

Distinct: 497
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.706435
Minimum: 0
Maximum: 1548
Zeros: 2161
Zeros (%): 38.3%
Negative: 0
Negative (%): 0.0%
Memory size: 49.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 8
Q3: 89
95-th percentile: 337
Maximum: 1548
Range: 1548
Interquartile range (IQR): 89

Descriptive statistics

Standard deviation: 128.87329
Coefficient of variation (CV): 1.8757091
Kurtosis: 22.998167
Mean: 68.706435
Median Absolute Deviation (MAD): 8
Skewness: 3.7376973
Sum: 387573
Variance: 16608.324
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2161 | 38.3%
1 | 173 | 3.1%
3 | 94 | 1.7%
2 | 84 | 1.5%
5 | 75 | 1.3%
4 | 72 | 1.3%
10 | 62 | 1.1%
12 | 62 | 1.1%
6 | 59 | 1.0%
7 | 53 | 0.9%
Other values (487) | 2746 | 48.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 2161 | 38.3%
1 | 173 | 3.1%
2 | 84 | 1.5%
3 | 94 | 1.7%
4 | 72 | 1.3%
5 | 75 | 1.3%
6 | 59 | 1.0%
7 | 53 | 0.9%
8 | 51 | 0.9%
9 | 43 | 0.8%

Maximum 10 values
Value | Count | Frequency (%)
1548 | 1 | < 0.1%
1528 | 1 | < 0.1%
1395 | 1 | < 0.1%
1392 | 1 | < 0.1%
1304 | 1 | < 0.1%
1295 | 1 | < 0.1%
1261 | 1 | < 0.1%
1252 | 1 | < 0.1%
1202 | 1 | < 0.1%
1190 | 1 | < 0.1%

Interactions

Figures: interaction plots (4 panels)

Correlations

Variable | 집계년도 | 축종명 | 세균구분명 | 검사년월 | 검사세균분포구분명 | 검사결과수(건)
집계년도 | 1.000 | 0.510 | 0.074 | 0.247 | 0.623 | 0.266
축종명 | 0.510 | 1.000 | 0.000 | 0.063 | 0.141 | 0.228
세균구분명 | 0.074 | 0.000 | 1.000 | 0.000 | 0.784 | 0.264
검사년월 | 0.247 | 0.063 | 0.000 | 1.000 | 0.224 | 0.115
검사세균분포구분명 | 0.623 | 0.141 | 0.784 | 0.224 | 1.000 | 0.743
검사결과수(건) | 0.266 | 0.228 | 0.264 | 0.115 | 0.743 | 1.000

Variable | 축종명 | 검사세균분포구분명 | 검사년월 | 세균구분명
축종명 | 1.000 | 0.060 | 0.030 | 0.000
검사세균분포구분명 | 0.060 | 1.000 | 0.076 | 0.701
검사년월 | 0.030 | 0.076 | 1.000 | 0.000
세균구분명 | 0.000 | 0.701 | 0.000 | 1.000

Variable | 집계년도 | 검사결과수(건) | 축종명 | 세균구분명 | 검사년월 | 검사세균분포구분명
집계년도 | 1.000 | -0.176 | 0.289 | 0.053 | 0.111 | 0.274
검사결과수(건) | -0.176 | 1.000 | 0.117 | 0.202 | 0.048 | 0.369
축종명 | 0.289 | 0.117 | 1.000 | 0.000 | 0.030 | 0.060
세균구분명 | 0.053 | 0.202 | 0.000 | 1.000 | 0.000 | 0.701
검사년월 | 0.111 | 0.048 | 0.030 | 0.000 | 1.000 | 0.076
검사세균분포구분명 | 0.274 | 0.369 | 0.060 | 0.701 | 0.076 | 1.000
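
The report flags 세균구분명 and 검사세균분포구분명 as highly correlated, but the extracted text does not say which association measure produced each matrix. One measure commonly used for two categorical variables is Cramér's V. The sketch below is an uncorrected textbook version run on made-up toy data; it is not the report's implementation, and the toy pairing of labels is purely illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    row = Counter(xs)             # marginal counts of x
    col = Counter(ys)             # marginal counts of y

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data: each bacteria category uses its own, disjoint range label.
bacteria = ["일반세균수", "일반세균수", "대장균수", "대장균수"]
bins = ["10의2승이하", "10의2승이하", "10의1승이하", "10의1승이하"]
print(cramers_v(bacteria, bins))  # 1.0
```

A value near 1 means one variable is almost determined by the other; a value of 0 means the observed cell counts match what independence would predict.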

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

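First rows
Index | 집계년도 | 축종명 | 세균구분명 | 검사년월 | 검사세균분포구분명 | 검사결과수(건)
0 | 2024 | 오리고기 | 일반세균수 | 2024-03 | 10의5승초과~10의6승이하 | 0
1 | 2024 | (label missing) | 일반세균수 | 2024-03 | 10의5승초과~10의6승이하 | 0
2 | 2024 | 염소 | 일반세균수 | 2024-03 | 10의5승초과~10의6승이하 | 0
3 | 2024 | 닭고기 | 일반세균수 | 2024-03 | 10의5승초과~10의6승이하 | 0
4 | 2024 | 돼지 | 일반세균수 | 2024-03 | 10의5승초과~10의6승이하 | 0
5 | 2024 | (label missing) | 대장균수 | 2024-03 | 10의4승초과~10의5승이하 | 0
6 | 2024 | 돼지 | 일반세균수 | 2024-03 | 10의4승초과~10의5승이하 | 3
7 | 2024 | 오리고기 | 일반세균수 | 2024-03 | 10의4승초과~10의5승이하 | 0
8 | 2024 | 닭고기 | 대장균수 | 2024-03 | 10의4승초과~10의5승이하 | 0
9 | 2024 | 염소 | 일반세균수 | 2024-03 | 10의4승초과~10의5승이하 | 0

Last rows
Index | 집계년도 | 축종명 | 세균구분명 | 검사년월 | 검사세균분포구분명 | 검사결과수(건)
5631 | 2010 | 오리고기 | 대장균수 | 2024-01 | 10의1승초과~10의2승이하 | 2
5632 | 2010 | (label missing) | 대장균수 | 2024-01 | 10의1승초과~10의2승이하 | 39
5633 | 2010 | 돼지 | 대장균수 | 2024-01 | 10의1승초과~10의2승이하 | 84
5634 | 2010 | 돼지 | 대장균수 | 2024-01 | 10의1승초과~10의2승이하 | 84
5635 | 2010 | 닭고기 | 대장균수 | 2024-01 | 10의1승초과~10의2승이하 | 73
5636 | 2010 | 돼지 | 대장균수 | 2024-01 | 10의1승이하 | 214
5637 | 2010 | 닭고기 | 대장균수 | 2024-01 | 10의1승이하 | 60
5638 | 2010 | (label missing) | 대장균수 | 2024-01 | 10의1승이하 | 219
5639 | 2010 | 오리고기 | 대장균수 | 2024-01 | 10의1승이하 | 17
5640 | 2010 | 돼지 | 대장균수 | 2024-01 | 10의1승이하 | 214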

Duplicate rows

Most frequently occurring

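Index | 집계년도 | 축종명 | 세균구분명 | 검사년월 | 검사세균분포구분명 | 검사결과수(건) | # duplicates
0 | 2010 | 닭고기 | 대장균수 | 2024-03 | 10의1승이하 | 34 | 2
1 | 2010 | 닭고기 | 대장균수 | 2024-03 | 10의1승초과~10의2승이하 | 147 | 2
2 | 2010 | 닭고기 | 대장균수 | 2024-03 | 10의2승초과~10의3승이하 | 14 | 2
3 | 2010 | 닭고기 | 대장균수 | 2024-03 | 10의3승초과~10의4승이하 | 0 | 2
4 | 2010 | 닭고기 | 대장균수 | 2024-04 | 10의1승이하 | 34 | 2
5 | 2010 | 닭고기 | 대장균수 | 2024-04 | 10의1승초과~10의2승이하 | 123 | 2
6 | 2010 | 닭고기 | 대장균수 | 2024-04 | 10의2승초과~10의3승이하 | 13 | 2
7 | 2010 | 닭고기 | 대장균수 | 2024-04 | 10의3승초과~10의4승이하 | 0 | 2
8 | 2010 | 닭고기 | 일반세균수 | 2024-01 | 10의2승이하 | 1 | 2
9 | 2010 | 닭고기 | 일반세균수 | 2024-01 | 10의2승초과~10의3승이하 | 69 | 2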
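
Exact repeats are already visible in the last rows of the Sample section: rows 5633 and 5634 are identical, as are rows 5636 and 5640. A minimal standard-library sketch of detecting such repeats, using rows 5633 to 5640 of that sample (the row with a missing 축종명 label is left out). How the report itself counts duplicates is not stated, so the two counts below are illustrative definitions only.

```python
from collections import Counter

# Rows copied from the "last rows" of the Sample section.
rows = [
    (2010, "돼지", "대장균수", "2024-01", "10의1승초과~10의2승이하", 84),
    (2010, "돼지", "대장균수", "2024-01", "10의1승초과~10의2승이하", 84),
    (2010, "닭고기", "대장균수", "2024-01", "10의1승초과~10의2승이하", 73),
    (2010, "돼지", "대장균수", "2024-01", "10의1승이하", 214),
    (2010, "닭고기", "대장균수", "2024-01", "10의1승이하", 60),
    (2010, "오리고기", "대장균수", "2024-01", "10의1승이하", 17),
    (2010, "돼지", "대장균수", "2024-01", "10의1승이하", 214),
]

# Count how often each distinct row occurs.
occurrences = Counter(rows)

# Distinct rows that appear more than once.
repeated = {row: count for row, count in occurrences.items() if count > 1}

# Number of surplus copies beyond the first occurrence of each row.
extra_copies = sum(count - 1 for count in repeated.values())

print(len(repeated))  # 2
print(extra_copies)   # 2
```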