Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 957
Duplicate rows (%): 9.6%
Total size in memory: 732.4 KiB
Average record size in memory: 75.0 B

Variable types

Numeric: 2
DateTime: 3
Categorical: 2
Boolean: 1

Dataset

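Description: Records registered as manure transport handover forms (transporter ID, receipt date, quantity received, etc.) for the livestock manure managed in the Livestock Manure Electronic Handover Management System (가축분뇨 전자인계관리시스템).
Author: Korea Environment Corporation (한국환경공단)
URL: https://www.data.go.kr/data/15041899/fileData.do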

Alerts

Dataset has 957 (9.6%) duplicate rows [Duplicates]
인계량입력업체구분 is highly imbalanced (61.8%) [Imbalance]
인계서입력구분 is highly imbalanced (64.5%) [Imbalance]
보관장소 경유여부 is highly imbalanced (59.6%) [Imbalance]
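The imbalance percentages can be checked against the value counts reported for each variable further down. The sketch below assumes the score is 1 minus the Shannon entropy of the category counts normalised by log(k), where k is the number of categories; that formula is an assumption about ydata-profiling's internals, but with the reported counts it reproduces 61.8%, 64.5% and 59.6% to within rounding.

```python
import math


def imbalance_score(counts):
    """Assumed imbalance score: 1 - (Shannon entropy / log(number of categories)).

    0 means perfectly balanced categories; values near 1 mean one category dominates.
    A single category is treated as 0.0 (assumed convention).
    """
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts)
    # Shannon entropy in nats; dividing by log(k) normalises it to [0, 1]
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Value counts copied from the variable sections of this report
for name, counts in [
    ("인계량입력업체구분", [9255, 745]),
    ("인계서입력구분", [8198, 1762, 35, 5]),
    ("보관장소 경유여부", [9196, 804]),
]:
    print(name, f"{imbalance_score(counts):.1%}")
```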

Reproduction

Analysis started: 2023-12-12 01:17:39.742582
Analysis finished: 2023-12-12 01:17:41.088087
Duration: 1.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

운반업체번호 (transporter ID)
Real number (ℝ)

Distinct: 443
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0156297 × 10⁹
Minimum: 2.0130004 × 10⁹
Maximum: 2.0200008 × 10⁹
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2.0130004 × 10⁹
5th percentile: 2.0130006 × 10⁹
Q1: 2.0150003 × 10⁹
Median: 2.0160008 × 10⁹
Q3: 2.0160024 × 10⁹
95th percentile: 2.0180107 × 10⁹
Maximum: 2.0200008 × 10⁹
Range: 7000382
Interquartile range (IQR): 1002126
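The quantile rows can be recomputed from the raw column. This is a minimal standard-library sketch, assuming the report interpolates linearly between order statistics (the NumPy/pandas default); `statistics.quantiles(..., method="inclusive")` follows the same rule. As a check on the figures above, Range = Maximum − Minimum = 2020000751 − 2013000369 = 7000382, using the extreme values listed in this section. The helper name `quantile_summary` is hypothetical.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics using linear interpolation between order statistics.

    statistics.quantiles(method="inclusive") uses the same rule as the NumPy/pandas
    default; that the report used this rule is an assumption.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    cuts = statistics.quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "p5": cuts[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "p95": cuts[-1],
        "max": hi,
        "range": hi - lo,
        "IQR": q3 - q1,
    }


print(quantile_summary([1, 2, 3, 4, 5]))
```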

Descriptive statistics

Standard deviation: 1275700.3
Coefficient of variation (CV): 0.00063290407
Kurtosis: 1.5342188
Mean: 2.0156297 × 10⁹
Median absolute deviation (MAD): 999784
Skewness: 0.30450119
Sum: 2.0156297 × 10¹³
Variance: 1.6274111 × 10¹²
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
2015000322 | 428 | 4.3%
2017000168 | 180 | 1.8%
2016002219 | 169 | 1.7%
2019000255 | 136 | 1.4%
2016001736 | 128 | 1.3%
2016002849 | 126 | 1.3%
2018010804 | 122 | 1.2%
2013000369 | 113 | 1.1%
2014000273 | 109 | 1.1%
2017000803 | 98 | 1.0%
Other values (433) | 8391 | 83.9%
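Each row of the common-values table is a value, its count, and the count divided by the 10000 observations as a percentage; everything outside the top ten is folded into a single "Other values (k)" row. A stdlib sketch of that layout follows; `common_values` is a hypothetical helper, and ties are ordered by first appearance, which may differ from the report.

```python
from collections import Counter


def common_values(values, top=10):
    """Top-`top` values by count, plus an 'Other values (k)' row for the rest."""
    n = len(values)
    counts = Counter(values)
    rows = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    rest = counts.most_common()[top:]  # all remaining (value, count) pairs
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    return rows


for value, count, pct in common_values(["a"] * 5 + ["b"] * 3 + ["c", "d"], top=2):
    print(value, count, f"{pct:.1f}%")
```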

Minimum 10 values

Value | Count | Frequency (%)
2013000369 | 113 | 1.1%
2013000371 | 30 | 0.3%
2013000374 | 15 | 0.1%
2013000377 | 9 | 0.1%
2013000378 | 7 | 0.1%
2013000379 | 5 | 0.1%
2013000380 | 11 | 0.1%
2013000384 | 40 | 0.4%
2013000385 | 53 | 0.5%
2013000406 | 72 | 0.7%

Maximum 10 values

Value | Count | Frequency (%)
2020000751 | 3 | < 0.1%
2020000750 | 6 | 0.1%
2020000737 | 18 | 0.2%
2020000601 | 38 | 0.4%
2020000531 | 2 | < 0.1%
2020000451 | 2 | < 0.1%
2020000413 | 12 | 0.1%
2020000294 | 15 | 0.1%
2020000284 | 1 | < 0.1%
2020000183 | 3 | < 0.1%
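The two extreme-value tables list the smallest and largest distinct values with their counts, rather than the most frequent ones. A short sketch of that selection, with `extreme_values` as a hypothetical helper:

```python
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    ordered = sorted(Counter(values).items())  # (value, count), ascending by value
    smallest = ordered[:k]
    largest = ordered[::-1][:k]  # descending by value
    return smallest, largest


low, high = extreme_values([3, 1, 2, 2, 5, 4], k=2)
print(low, high)
```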

인수일자 (receipt date)
DateTime

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-01-01 00:00:00
Maximum: 2021-03-01 00:00:00
[Figure: Histogram with fixed size bins (bins=3)]
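The date columns hold year-month values: the sample rows show strings such as `2021-03`, which the report renders as the first day of the month (for example 2021-03-01 00:00:00). A sketch of the Distinct/Minimum/Maximum summary, assuming that raw `YYYY-MM` format; `month_summary` is a hypothetical name.

```python
from datetime import datetime


def month_summary(raw):
    """Parse assumed 'YYYY-MM' strings and return (distinct count, minimum, maximum)."""
    parsed = [datetime.strptime(s, "%Y-%m") for s in raw]  # day defaults to 1, time to 00:00
    return len(set(parsed)), min(parsed), max(parsed)


distinct, first, last = month_summary(["2021-03", "2021-01", "2021-02", "2021-03"])
print(distinct, first, last)
```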

인수량(톤) (quantity received, tonnes)
Real number (ℝ)

Distinct: 1819
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.439636
Minimum: 0.5
Maximum: 170
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.5
5th percentile: 5
Q1: 7.55
Median: 15
Q3: 22.99
95th percentile: 25
Maximum: 170
Range: 169.5
Interquartile range (IQR): 15.44

Descriptive statistics

Standard deviation: 8.298592
Coefficient of variation (CV): 0.53748623
Kurtosis: 24.441574
Mean: 15.439636
Median absolute deviation (MAD): 7.58
Skewness: 2.0434509
Sum: 154396.36
Variance: 68.86663
Monotonicity: Not monotonic
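The coefficient of variation is the standard deviation divided by the mean: 8.298592 / 15.439636 ≈ 0.5375, matching the reported 0.53748623. The MAD is the median of the absolute deviations from the median. A stdlib sketch of both, assuming a sample standard deviation (n − 1 denominator, pandas' default); `dispersion` is a hypothetical name.

```python
import statistics


def dispersion(values):
    """Standard deviation, coefficient of variation and median absolute deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)  # median of |x - median|
    return {"std": std, "cv": std / mean, "mad": mad}


print(dispersion([1, 2, 3, 4, 10]))
```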
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
23.0 | 532 | 5.3%
15.0 | 465 | 4.7%
24.0 | 399 | 4.0%
20.0 | 357 | 3.6%
5.0 | 324 | 3.2%
25.0 | 300 | 3.0%
8.0 | 258 | 2.6%
6.0 | 242 | 2.4%
21.0 | 169 | 1.7%
22.0 | 147 | 1.5%
Other values (1809) | 6807 | 68.1%

Minimum 10 values

Value | Count | Frequency (%)
0.5 | 7 | 0.1%
1.0 | 54 | 0.5%
1.15 | 1 | < 0.1%
1.29 | 1 | < 0.1%
1.5 | 11 | 0.1%
1.7 | 1 | < 0.1%
1.73 | 1 | < 0.1%
1.87 | 1 | < 0.1%
2.0 | 38 | 0.4%
2.04 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
170.0 | 1 | < 0.1%
135.0 | 1 | < 0.1%
110.0 | 1 | < 0.1%
100.0 | 1 | < 0.1%
90.0 | 1 | < 0.1%
87.0 | 7 | 0.1%
85.0 | 1 | < 0.1%
78.0 | 1 | < 0.1%
73.0 | 1 | < 0.1%
70.0 | 1 | < 0.1%

인계일자 (handover date)
DateTime

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-01-01 00:00:00
Maximum: 2021-04-01 00:00:00
[Figure: Histogram with fixed size bins (bins=4)]

마감처리일자 (closing date)
DateTime

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-01-01 00:00:00
Maximum: 2022-10-01 00:00:00
[Figure: Histogram with fixed size bins (bins=8)]

인계량입력업체구분 (type of company that entered the handover quantity)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
T1: 9255
E1: 745

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%
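The Length block summarises the character length of each category label, and "Unique" is read here as the number of values that occur exactly once; that reading of the report's "Unique" is an assumption. Under it, both category columns report 0 because every code appears many times. A stdlib sketch with the hypothetical helper `category_summary`:

```python
import statistics
from collections import Counter


def category_summary(values):
    """Length statistics of category labels and the count of values seen exactly once."""
    lengths = [len(str(v)) for v in values]
    counts = Counter(values)
    n_unique = sum(1 for c in counts.values() if c == 1)  # assumed meaning of "Unique"
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
        "unique": n_unique,
        "unique_pct": 100.0 * n_unique / len(values),
    }


print(category_summary(["T1", "T1", "T1", "E1"]))
```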

Sample

1st row: T1
2nd row: T1
3rd row: T1
4th row: T1
5th row: T1

Common Values

Value | Count | Frequency (%)
T1 | 9255 | 92.5%
E1 | 745 | 7.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
t1 | 9255 | 92.5%
e1 | 745 | 7.4%

인계서입력구분 (handover form entry type)
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2: 8198
1: 1762
4: 35
5: 5

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 8198 | 82.0%
1 | 1762 | 17.6%
4 | 35 | 0.4%
5 | 5 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
2 | 8198 | 82.0%
1 | 1762 | 17.6%
4 | 35 | 0.4%
5 | 5 | < 0.1%

보관장소 경유여부 (routed via a storage site)
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB
False: 9196
True: 804
Value | Count | Frequency (%)
False | 9196 | 92.0%
True | 804 | 8.0%
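The sample rows store this flag as `Y`/`N`, while the profile reports it as a Boolean. A sketch of that mapping, assuming only those two codes occur (case and surrounding spaces are normalised; any other value raises `KeyError`). `parse_yn` is a hypothetical helper.

```python
from collections import Counter


def parse_yn(raw):
    """Map the 'Y'/'N' codes seen in the sample rows to booleans."""
    mapping = {"Y": True, "N": False}
    return [mapping[v.strip().upper()] for v in raw]  # KeyError on unexpected codes


flags = parse_yn(["N", "N", "Y", "N"])
print(Counter(flags))
```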

Interactions


Correlations

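Variable | 운반업체번호 | 인수일자 | 인수량(톤) | 인계일자 | 마감처리일자 | 인계량입력업체구분 | 인계서입력구분 | 보관장소 경유여부
운반업체번호 | 1.000 | 0.022 | 0.204 | 0.038 | 0.160 | 0.142 | 0.260 | 0.517
인수일자 | 0.022 | 1.000 | 0.035 | 0.966 | 0.875 | 0.000 | 0.000 | 0.012
인수량(톤) | 0.204 | 0.035 | 1.000 | 0.000 | 0.184 | 0.183 | 0.079 | 0.151
인계일자 | 0.038 | 0.966 | 0.000 | 1.000 | 0.944 | 0.000 | 0.000 | 0.027
마감처리일자 | 0.160 | 0.875 | 0.184 | 0.944 | 1.000 | 0.079 | 0.127 | 0.029
인계량입력업체구분 | 0.142 | 0.000 | 0.183 | 0.000 | 0.079 | 1.000 | 0.013 | 0.030
인계서입력구분 | 0.260 | 0.000 | 0.079 | 0.000 | 0.127 | 0.013 | 1.000 | 0.063
보관장소 경유여부 | 0.517 | 0.012 | 0.151 | 0.027 | 0.029 | 0.030 | 0.063 | 1.000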
Variable | 인계서입력구분 | 보관장소 경유여부 | 인계량입력업체구분
인계서입력구분 | 1.000 | 0.042 | 0.009
보관장소 경유여부 | 0.042 | 1.000 | 0.019
인계량입력업체구분 | 0.009 | 0.019 | 1.000
Variable | 운반업체번호 | 인수량(톤) | 인계량입력업체구분 | 인계서입력구분 | 보관장소 경유여부
운반업체번호 | 1.000 | -0.182 | 0.111 | 0.121 | 0.390
인수량(톤) | -0.182 | 1.000 | 0.183 | 0.051 | 0.151
인계량입력업체구분 | 0.111 | 0.183 | 1.000 | 0.009 | 0.019
인계서입력구분 | 0.121 | 0.051 | 0.009 | 1.000 | 0.042
보관장소 경유여부 | 0.390 | 0.151 | 0.019 | 0.042 | 1.000
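The report does not name the coefficient behind each matrix. For pairs of categorical columns, such as the three-column table, ydata-profiling can use Cramér's V; the sketch below computes the plain version (no bias correction) from two equal-length sequences, so it is an illustration of the technique and may not reproduce the report's values exactly. `cramers_v` is a hypothetical name.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V (no bias correction) between two categorical sequences of equal length."""
    n = len(x)
    pair_counts = Counter(zip(x, y))
    x_counts = Counter(x)
    y_counts = Counter(y)
    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for a, na in x_counts.items():
        for b, nb in y_counts.items():
            expected = na * nb / n
            observed = pair_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # undefined for a constant column; 0.0 by convention here
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["T1", "T1", "E1", "E1"], ["2", "2", "1", "1"]))
```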

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]
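Both nullity plots summarise, per column, how many cells are missing; here every count is 0. A sketch of the per-column count, assuming records are dictionaries and that `None` or an empty string marks a missing cell (a column absent from a record is also counted as missing); `missing_counts` is a hypothetical helper, and float NaN is not handled.

```python
def missing_counts(records, columns):
    """Per-column count of missing cells (None or empty string, an assumed convention)."""
    return {col: sum(1 for r in records if r.get(col) in (None, "")) for col in columns}


rows = [{"인수량(톤)": 24.0, "인수일자": "2021-03"}, {"인수량(톤)": None, "인수일자": "2021-01"}]
print(missing_counts(rows, ["인수량(톤)", "인수일자"]))
```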

Sample

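Index | 운반업체번호 | 인수일자 | 인수량(톤) | 인계일자 | 마감처리일자 | 인계량입력업체구분 | 인계서입력구분 | 보관장소 경유여부
80091 | 2013000588 | 2021-03 | 24.0 | 2021-03 | 2021-03 | T1 | 2 | N
19370 | 2016002265 | 2021-01 | 5.46 | 2021-01 | 2021-01 | T1 | 1 | N
22414 | 2016001787 | 2021-01 | 6.73 | 2021-01 | 2021-01 | T1 | 2 | N
68462 | 2016001550 | 2021-03 | 15.0 | 2021-03 | 2021-03 | T1 | 2 | N
71292 | 2015000316 | 2021-03 | 25.0 | 2021-03 | 2021-03 | T1 | 1 | N
7901 | 2014000321 | 2021-01 | 6.41 | 2021-01 | 2021-01 | E1 | 2 | Y
58550 | 2016000951 | 2021-02 | 19.38 | 2021-02 | 2021-03 | T1 | 2 | N
20404 | 2016000981 | 2021-01 | 15.0 | 2021-01 | 2021-01 | T1 | 2 | N
87307 | 2015000275 | 2021-03 | 23.45 | 2021-03 | 2021-03 | T1 | 1 | N
62024 | 2015000298 | 2021-02 | 16.0 | 2021-02 | 2021-03 | T1 | 2 | N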
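
Index | 운반업체번호 | 인수일자 | 인수량(톤) | 인계일자 | 마감처리일자 | 인계량입력업체구분 | 인계서입력구분 | 보관장소 경유여부
89025 | 2016002360 | 2021-03 | 8.14 | 2021-03 | 2021-03 | T1 | 2 | N
43659 | 2016003248 | 2021-02 | 16.0 | 2021-02 | 2021-02 | T1 | 2 | N
19502 | 2015000354 | 2021-01 | 19.7 | 2021-01 | 2021-01 | T1 | 2 | N
9174 | 2016000973 | 2021-01 | 23.5 | 2021-01 | 2021-01 | T1 | 2 | N
40383 | 2018010463 | 2021-02 | 24.0 | 2021-02 | 2021-02 | T1 | 2 | N
14294 | 2019000206 | 2021-01 | 12.91 | 2021-01 | 2021-01 | T1 | 2 | N
23768 | 2016002370 | 2021-01 | 24.8 | 2021-01 | 2021-01 | T1 | 2 | N
86790 | 2015000322 | 2021-03 | 14.1 | 2021-03 | 2021-03 | T1 | 2 | Y
37203 | 2017000128 | 2021-02 | 22.33 | 2021-02 | 2021-02 | T1 | 2 | N
13827 | 2016000745 | 2021-01 | 14.76 | 2021-01 | 2021-01 | T1 | 2 | N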

Duplicate rows

Most frequently occurring

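Index | 운반업체번호 | 인수일자 | 인수량(톤) | 인계일자 | 마감처리일자 | 인계량입력업체구분 | 인계서입력구분 | 보관장소 경유여부 | # duplicates
943 | 2019000255 | 2021-01 | 5.0 | 2021-01 | 2021-01 | T1 | 2 | N | 44
945 | 2019000255 | 2021-02 | 5.0 | 2021-02 | 2021-02 | T1 | 2 | N | 42
947 | 2019000255 | 2021-03 | 5.0 | 2021-03 | 2021-03 | T1 | 2 | N | 30
878 | 2017000803 | 2021-01 | 20.0 | 2021-01 | 2021-01 | T1 | 2 | N | 27
437 | 2015000394 | 2021-01 | 25.0 | 2021-01 | 2021-01 | T1 | 2 | N | 25
84 | 2013000588 | 2021-02 | 24.0 | 2021-02 | 2021-02 | T1 | 2 | N | 21
631 | 2016001680 | 2021-01 | 6.0 | 2021-01 | 2021-01 | T1 | 2 | N | 21
636 | 2016001680 | 2021-03 | 6.0 | 2021-03 | 2021-03 | T1 | 2 | N | 21
881 | 2017000803 | 2021-02 | 20.0 | 2021-02 | 2021-02 | T1 | 2 | N | 21
3 | 2013000369 | 2021-02 | 15.0 | 2021-02 | 2021-02 | T1 | 2 | N | 20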
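The duplicate-row table groups identical records and lists the most frequent ones first, with `# duplicates` giving how often each record occurs. A stdlib sketch of that grouping, assuming each record is a hashable tuple of its eight fields and that the count includes every occurrence of the record; `most_frequent_duplicates` is a hypothetical name.

```python
from collections import Counter


def most_frequent_duplicates(rows, head=10):
    """Group identical rows; return (row, occurrences) for rows seen more than once."""
    counts = Counter(rows)
    dups = [(row, c) for row, c in counts.items() if c > 1]
    dups.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep first-seen order
    return dups[:head]


records = [
    ("2019000255", "2021-01", 5.0),
    ("2019000255", "2021-01", 5.0),
    ("2013000369", "2021-02", 15.0),
]
print(most_frequent_duplicates(records))
```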