Overview

Dataset statistics

Number of variables: 4
Number of observations: 1424
Missing cells: 1127
Missing cells (%): 19.8%
Duplicate rows: 249
Duplicate rows (%): 17.5%
Total size in memory: 47.4 KiB
Average record size in memory: 34.1 B
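The percentages and the average record size follow from the raw counts. A minimal sketch that recomputes them, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 bytes:

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 4
n_observations = 1424
missing_cells = 1127
duplicate_rows = 249
total_size_kib = 47.4

# Assumption: the missing-cell share is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The three results round to the values reported above.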

Variable types

Numeric: 2
Text: 1
Boolean: 1

Dataset

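Description: Equipment installation records (business number, model name, usage status, etc.) registered in the Livestock Manure Electronic Handover Management System (가축분뇨 전자인계관리시스템).
Author: 한국환경공단 (Korea Environment Corporation)
URL: https://www.data.go.kr/data/15041948/fileData.do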

Alerts

Duplicates: Dataset has 249 (17.5%) duplicate rows
Imbalance: 사용여부 is highly imbalanced (99.2%)
Missing: 모델 명 has 1127 (79.1%) missing values
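The 99.2% imbalance figure is not the share of the majority class (which is 99.9%). It is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. The sketch below is an illustration under that assumption, not a copy of the profiler's code; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced, values near 1.0 mean one class dominates.
    Hypothetical helper; the formula is an assumption that reproduces the
    reported figure.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # convention chosen for this sketch
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# 사용여부: 1423 True and 1 False.
score = imbalance_score([1423, 1])
print(f"Imbalance: {100 * score:.1f}%")
```

With the 1423/1 split this evaluates to roughly 0.992, matching the alert.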

Reproduction

Analysis started: 2023-12-12 21:08:17.379792
Analysis finished: 2023-12-12 21:08:18.159929
Duration: 0.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

업체번호
Real number (ℝ)

Distinct: 723
Distinct (%): 50.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0159192 × 10^9
Minimum: 2.0130002 × 10^9
Maximum: 2.0220003 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 2.0130002 × 10^9
5-th percentile: 2.0130004 × 10^9
Q1: 2.0150003 × 10^9
Median: 2.016001 × 10^9
Q3: 2.0160035 × 10^9
95-th percentile: 2.0200005 × 10^9
Maximum: 2.0220003 × 10^9
Range: 9000113
Interquartile range (IQR): 1003154.8

Descriptive statistics

Standard deviation: 1734943.3
Coefficient of variation (CV): 0.00086062146
Kurtosis: 2.6195974
Mean: 2.0159192 × 10^9
Median Absolute Deviation (MAD): 1000418
Skewness: 1.2007762
Sum: 2.8706689 × 10^12
Variance: 3.0100283 × 10^12
Monotonicity: Not monotonic
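Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations, and range = maximum − minimum. A short consistency check, using the rounded values printed in the report and the exact smallest and largest 업체번호 values from the value tables (so agreement is only expected to the printed precision):

```python
import math

# Rounded figures as printed in the report.
mean = 2.0159192e9
std = 1734943.3
n_observations = 1424
# Exact extremes taken from the smallest/largest value listings.
minimum = 2013000160
maximum = 2022000273

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
total = mean * n_observations      # sum of all values
value_range = maximum - minimum

print(f"CV: {cv:.8g}")
print(f"Variance: {variance:.8g}")
print(f"Sum: {total:.8g}")
print(f"Range: {value_range}")
```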
[Figure] Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2016000829 | 13 | 0.9%
2013000410 | 13 | 0.9%
2017000168 | 10 | 0.7%
2022000001 | 9 | 0.6%
2013000588 | 9 | 0.6%
2015000203 | 8 | 0.6%
2013000406 | 8 | 0.6%
2013000572 | 7 | 0.5%
2015000394 | 7 | 0.5%
2015000376 | 7 | 0.5%
Other values (713) | 1333 | 93.6%
Smallest 10 values
Value | Count | Frequency (%)
2013000160 | 4 | 0.3%
2013000196 | 5 | 0.4%
2013000217 | 1 | 0.1%
2013000240 | 1 | 0.1%
2013000259 | 1 | 0.1%
2013000277 | 2 | 0.1%
2013000315 | 1 | 0.1%
2013000319 | 1 | 0.1%
2013000350 | 1 | 0.1%
2013000355 | 1 | 0.1%
Largest 10 values
Value | Count | Frequency (%)
2022000273 | 2 | 0.1%
2022000237 | 2 | 0.1%
2022000190 | 1 | 0.1%
2022000173 | 1 | 0.1%
2022000156 | 1 | 0.1%
2022000142 | 1 | 0.1%
2022000112 | 1 | 0.1%
2022000074 | 1 | 0.1%
2022000054 | 1 | 0.1%
2022000042 | 1 | 0.1%

모델 명
Text

MISSING 

Distinct: 241
Distinct (%): 81.1%
Missing: 1127
Missing (%): 79.1%
Memory size: 11.3 KiB

Length

Max length: 10
Median length: 8
Mean length: 7.7239057
Min length: 4

Characters and Unicode

Total characters: 2294
Distinct characters: 34
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
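As an illustration of how such a breakdown can be derived, the sketch below classifies characters by Unicode general category with Python's standard `unicodedata` module. The three sample strings are values that appear in this report (the upper-case spelling of XV-CA100 is an assumption), and the code-to-name mapping covers only the five categories listed for this variable.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes seen in this variable.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Pd": "Dash Punctuation",
}

# Sample values from the report; casing of the second one is assumed.
samples = ["CA17-099", "XV-CA100", "혁신제품"]

# Count characters per general category across all sample values.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in category_counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {count}")
```

Latin letters fall under Uppercase or Lowercase Letter, digits under Decimal Number, the hyphen under Dash Punctuation, and Hangul syllables under Other Letter.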

Unique

Unique: 232
Unique (%): 78.1%
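The percentages for this variable appear to be computed over the non-missing values only, not over all 1424 rows. A quick check under that assumption, using the counts reported for 모델 명:

```python
# Counts copied from the 모델 명 statistics.
n_observations = 1424
n_missing = 1127
n_distinct = 241
n_unique = 232
total_characters = 2294

# Assumption: distinct/unique shares and mean length use non-missing rows.
n_present = n_observations - n_missing
distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present
mean_length = total_characters / n_present

print(f"Non-missing values: {n_present}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```

The results agree with the reported 81.1%, 78.1%, and 7.7239057.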

Sample

1st row: CA17-099
2nd row: CA17-090
3rd row: CA17-075
4th row: CA17-093
5th row: CA17-023
Value | Count | Frequency (%)
xv-ca100 | 27 | 9.1%
혁신제품 | 18 | 6.1%
혁신장비 | 5 | 1.7%
vpn장비 | 4 | 1.3%
ca17-075 | 3 | 1.0%
ca18-032 | 2 | 0.7%
ca17-098 | 2 | 0.7%
ca18-064 | 2 | 0.7%
ca17-097 | 2 | 0.7%
ca18-050 | 1 | 0.3%
Other values (231) | 231 | 77.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 337 | 14.7%
0 | 334 | 14.6%
- | 264 | 11.5%
C | 257 | 11.2%
A | 235 | 10.2%
8 | 162 | 7.1%
7 | 139 | 6.1%
2 | 103 | 4.5%
9 | 50 | 2.2%
3 | 47 | 2.0%
Other values (24) | 366 | 16.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1299 | 56.6%
Uppercase Letter | 621 | 27.1%
Dash Punctuation | 264 | 11.5%
Other Letter | 106 | 4.6%
Lowercase Letter | 4 | 0.2%

Most frequent character per category

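Decimal Number
Value | Count | Frequency (%)
1 | 337 | 25.9%
0 | 334 | 25.7%
8 | 162 | 12.5%
7 | 139 | 10.7%
2 | 103 | 7.9%
9 | 50 | 3.8%
3 | 47 | 3.6%
5 | 44 | 3.4%
4 | 43 | 3.3%
6 | 40 | 3.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 257 | 41.4%
A | 235 | 37.8%
V | 31 | 5.0%
X | 27 | 4.3%
S | 23 | 3.7%
T | 20 | 3.2%
R | 17 | 2.7%
N | 4 | 0.6%
P | 4 | 0.6%
M | 3 | 0.5%

Other Letter
Value | Count | Frequency (%)
[missing] | 23 | 21.7%
[missing] | 23 | 21.7%
[missing] | 18 | 17.0%
[missing] | 18 | 17.0%
[missing] | 10 | 9.4%
[missing] | 10 | 9.4%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%

Lowercase Letter
Value | Count | Frequency (%)
t | 2 | 50.0%
e | 1 | 25.0%
s | 1 | 25.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 264 | 100.0%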

Most occurring scripts

Value | Count | Frequency (%)
Common | 1563 | 68.1%
Latin | 625 | 27.2%
Hangul | 106 | 4.6%

Most frequent character per script

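Latin
Value | Count | Frequency (%)
C | 257 | 41.1%
A | 235 | 37.6%
V | 31 | 5.0%
X | 27 | 4.3%
S | 23 | 3.7%
T | 20 | 3.2%
R | 17 | 2.7%
N | 4 | 0.6%
P | 4 | 0.6%
M | 3 | 0.5%
Other values (3) | 4 | 0.6%

Common
Value | Count | Frequency (%)
1 | 337 | 21.6%
0 | 334 | 21.4%
- | 264 | 16.9%
8 | 162 | 10.4%
7 | 139 | 8.9%
2 | 103 | 6.6%
9 | 50 | 3.2%
3 | 47 | 3.0%
5 | 44 | 2.8%
4 | 43 | 2.8%

Hangul
Value | Count | Frequency (%)
[missing] | 23 | 21.7%
[missing] | 23 | 21.7%
[missing] | 18 | 17.0%
[missing] | 18 | 17.0%
[missing] | 10 | 9.4%
[missing] | 10 | 9.4%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%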

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2188 | 95.4%
Hangul | 106 | 4.6%

Most frequent character per block

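ASCII
Value | Count | Frequency (%)
1 | 337 | 15.4%
0 | 334 | 15.3%
- | 264 | 12.1%
C | 257 | 11.7%
A | 235 | 10.7%
8 | 162 | 7.4%
7 | 139 | 6.4%
2 | 103 | 4.7%
9 | 50 | 2.3%
3 | 47 | 2.1%
Other values (14) | 260 | 11.9%

Hangul
Value | Count | Frequency (%)
[missing] | 23 | 21.7%
[missing] | 23 | 21.7%
[missing] | 18 | 17.0%
[missing] | 18 | 17.0%
[missing] | 10 | 9.4%
[missing] | 10 | 9.4%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%
[missing] | 1 | 0.9%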

사용여부
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value | Count | Frequency (%)
True | 1423 | 99.9%
False | 1 | 0.1%

관할지사
Real number (ℝ)

Distinct: 11
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 906.11657
Minimum: 900
Maximum: 910
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 900
5-th percentile: 901
Q1: 905
Median: 906
Q3: 908
95-th percentile: 910
Maximum: 910
Range: 10
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.4653108
Coefficient of variation (CV): 0.0027207436
Kurtosis: -0.59500847
Mean: 906.11657
Median Absolute Deviation (MAD): 2
Skewness: -0.32937089
Sum: 1290310
Variance: 6.0777575
Monotonicity: Not monotonic
[Figure] Histogram with fixed size bins (bins=11)
Common values
Value | Count | Frequency (%)
906 | 267 | 18.8%
905 | 231 | 16.2%
908 | 175 | 12.3%
907 | 165 | 11.6%
909 | 157 | 11.0%
910 | 123 | 8.6%
902 | 80 | 5.6%
903 | 80 | 5.6%
901 | 75 | 5.3%
904 | 70 | 4.9%
Smallest 10 values
Value | Count | Frequency (%)
900 | 1 | 0.1%
901 | 75 | 5.3%
902 | 80 | 5.6%
903 | 80 | 5.6%
904 | 70 | 4.9%
905 | 231 | 16.2%
906 | 267 | 18.8%
907 | 165 | 11.6%
908 | 175 | 12.3%
909 | 157 | 11.0%
Largest 10 values
Value | Count | Frequency (%)
910 | 123 | 8.6%
909 | 157 | 11.0%
908 | 175 | 12.3%
907 | 165 | 11.6%
906 | 267 | 18.8%
905 | 231 | 16.2%
904 | 70 | 4.9%
903 | 80 | 5.6%
902 | 80 | 5.6%
901 | 75 | 5.3%
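Because 관할지사 takes only 11 distinct values, its summary statistics can be rebuilt from the frequency tables alone. The sketch below expands the value counts into a full list and recomputes the sum, mean, median, and spread with the standard library, assuming the report uses the sample variance (n − 1 denominator):

```python
import math
import statistics

# Value -> count pairs copied from the 관할지사 frequency tables.
counts = {
    900: 1, 901: 75, 902: 80, 903: 80, 904: 70, 905: 231,
    906: 267, 907: 165, 908: 175, 909: 157, 910: 123,
}

# Expand the table back into one value per observation.
values = [value for value, count in counts.items() for _ in range(count)]

n = len(values)
total = sum(values)
mean = total / n
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)

print(f"n={n}, sum={total}, mean={mean:.5f}, median={median}")
print(f"variance={variance:.7f}, std={std:.7f}")
```

The recomputed figures agree with the reported sum (1290310), mean (906.11657), variance (6.0777575), and standard deviation (2.4653108) to the printed precision.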

Interactions

[Figures: four interaction plots, not reproduced]

Correlations

 | 업체번호 | 사용여부 | 관할지사
업체번호 | 1.000 | 0.000 | 0.849
사용여부 | 0.000 | 1.000 | 0.097
관할지사 | 0.849 | 0.097 | 1.000

 | 업체번호 | 관할지사 | 사용여부
업체번호 | 1.000 | -0.296 | 0.000
관할지사 | -0.296 | 1.000 | 0.074
사용여부 | 0.000 | 0.074 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

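First rows
Index | 업체번호 | 모델 명 | 사용여부 | 관할지사
0 | 2013000315 | CA17-099 | Y | 910
1 | 2013000319 | CA17-090 | Y | 910
2 | 2013000350 | <NA> | Y | 910
3 | 2013000196 | CA17-075 | Y | 910
4 | 2013000196 | CA17-093 | Y | 910
5 | 2013000196 | CA17-023 | Y | 910
6 | 2013000196 | CA17-036 | Y | 910
7 | 2013000196 | CA17-094 | Y | 910
8 | 2013000217 | CA17-083 | Y | 910
9 | 2013000259 | CA17-091 | Y | 910

Last rows
Index | 업체번호 | 모델 명 | 사용여부 | 관할지사
1414 | 2018010669 | <NA> | Y | 909
1415 | 2018010669 | <NA> | Y | 909
1416 | 2020000750 | <NA> | Y | 902
1417 | 2018010423 | <NA> | Y | 902
1418 | 2019000257 | <NA> | Y | 906
1419 | 2019000257 | <NA> | Y | 906
1420 | 2019000281 | <NA> | Y | 905
1421 | 2019000206 | <NA> | Y | 908
1422 | 2019000206 | <NA> | Y | 908
1423 | 2020000183 | <NA> | Y | 908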

Duplicate rows

Most frequently occurring

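Index | 업체번호 | 모델 명 | 사용여부 | 관할지사 | # duplicates
95 | 2016000829 | <NA> | Y | 905 | 11
205 | 2017000168 | <NA> | Y | 901 | 10
245 | 2022000001 | <NA> | Y | 906 | 8
8 | 2015000203 | <NA> | Y | 905 | 7
45 | 2015000376 | <NA> | Y | 909 | 7
47 | 2015000380 | <NA> | Y | 909 | 7
53 | 2015000394 | <NA> | Y | 907 | 7
116 | 2016000978 | <NA> | Y | 907 | 7
123 | 2016001370 | <NA> | Y | 907 | 7
4 | 2013000572 | <NA> | Y | 910 | 6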
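A duplicate-row listing of this kind can be produced by grouping identical rows and keeping the groups that occur more than once. The sketch below does this with `collections.Counter` on the ten last rows shown in the Sample section; it only illustrates the grouping idea and is not how the profiler itself computes its duplicate counts. `None` stands in for `<NA>`.

```python
from collections import Counter

# Last ten rows of the dataset as (업체번호, 모델 명, 사용여부, 관할지사).
tail_rows = [
    (2018010669, None, "Y", 909),
    (2018010669, None, "Y", 909),
    (2020000750, None, "Y", 902),
    (2018010423, None, "Y", 902),
    (2019000257, None, "Y", 906),
    (2019000257, None, "Y", 906),
    (2019000281, None, "Y", 905),
    (2019000206, None, "Y", 908),
    (2019000206, None, "Y", 908),
    (2020000183, None, "Y", 908),
]

# Count how often each full row occurs, then keep repeated rows only.
group_sizes = Counter(tail_rows)
duplicate_groups = {row: size for row, size in group_sizes.items() if size > 1}

for row, size in sorted(duplicate_groups.items(), key=lambda kv: -kv[1]):
    print(row, "occurs", size, "times")
```

In this small excerpt three rows occur twice each; rows that share 업체번호 but differ in 모델 명 would not count as duplicates.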