Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 11528
Missing cells (%): 11.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 908.2 KiB
Average record size in memory: 93.0 B
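
The derived figures in this block can be cross-checked from the raw counts. The sketch below is only a consistency check, not the profiler's own code: it takes the per-column missing counts from the Alerts section of this report and recomputes the missing-cell total, the missing percentage, and the average record size. All variable names are illustrative.

```python
# Figures copied from this report; per-column missing counts come from the Alerts section.
n_variables = 10
n_observations = 10_000
total_size_kib = 908.2
missing_by_column = {
    "폐기물정보중분류": 311,
    "폐기물정보소분류": 2938,
    "연소여부": 6617,
    "의료폐기물여부": 1662,
}

total_cells = n_variables * n_observations            # 100000 cells
missing_cells = sum(missing_by_column.values())       # 11528
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)                 # 11528
print(f"{missing_pct:.1f}%")         # 11.5%
print(f"{avg_record_bytes:.1f} B")   # 93.0 B
```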

Variable types

Numeric: 4
Text: 1
Categorical: 2
Boolean: 3

Dataset

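Description: Data registered in the Waste Disposal Charge Management System (폐기물처분부담금관리시스템). It covers the waste classification codes registered for waste disposal charge declarations and the SMS transmission history for the declarations filed under them.
Author: 한국환경공단 (Korea Environment Corporation)
URL: https://www.data.go.kr/data/15092767/fileData.do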

Alerts

실적년도 has constant value "2022" (Constant)
폐기물정보대분류 is highly overall correlated with 폐기물정보중분류 and 2 other fields (High correlation)
폐기물정보중분류 is highly overall correlated with 폐기물정보대분류 (High correlation)
폐기물정보소분류 is highly overall correlated with 폐기물분류코드 (High correlation)
폐기물분류코드 is highly overall correlated with 폐기물정보대분류 and 1 other fields (High correlation)
연소여부 is highly overall correlated with 의료폐기물여부 (High correlation)
지정폐기물여부 is highly overall correlated with 폐기물정보대분류 (High correlation)
의료폐기물여부 is highly overall correlated with 연소여부 (High correlation)
폐기물분류코드 is highly imbalanced (50.1%) (Imbalance)
의료폐기물여부 is highly imbalanced (75.8%) (Imbalance)
폐기물정보중분류 has 311 (3.1%) missing values (Missing)
폐기물정보소분류 has 2938 (29.4%) missing values (Missing)
연소여부 has 6617 (66.2%) missing values (Missing)
의료폐기물여부 has 1662 (16.6%) missing values (Missing)
폐기물정보소분류 has 1411 (14.1%) zeros (Zeros)
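
The two imbalance percentages are consistent with an entropy-based score, 1 - H / log(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below assumes that definition (it is not taken from the report itself) and applies it to the class counts listed later in the report. The function name and the handling of a single class are choices made for this sketch.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k) for a list of class counts.

    0 means perfectly balanced classes, values near 1 mean one class dominates.
    A single class is scored 0.0 here (an assumption of this sketch).
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# 폐기물분류코드: 사업장폐기물, 생활폐기물, 건설폐기물
code_score = imbalance_score([8312, 1277, 411])
# 의료폐기물여부: False, True (missing values excluded)
medical_score = imbalance_score([8005, 333])

print(f"{code_score:.1%}")     # close to the reported 50.1%
print(f"{medical_score:.1%}")  # close to the reported 75.8%
```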

Reproduction

Analysis started: 2023-12-12 06:41:17.865107
Analysis finished: 2023-12-12 06:41:20.788239
Duration: 2.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
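
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used: the function name, the file paths, the report title, and the `cp949` encoding are all assumptions, and the settings stored in config.json are not reproduced here. It relies on `pandas.read_csv`, `ydata_profiling.ProfileReport`, and `ProfileReport.to_file`.

```python
def build_report(csv_path, html_path="report.html", encoding="cp949"):
    """Profile a CSV file and write the HTML report (hypothetical helper).

    The encoding default is a guess that often fits Korean public data files;
    change it if the file is UTF-8.
    """
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Waste classification code profile")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("waste_codes.csv")` with a local copy of the dataset (the file name is a placeholder) would write `report.html` next to the script.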

Variables

시퀀스
Real number (ℝ)

Distinct: 273
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1458439.8
Minimum: 1367640
Maximum: 1576405
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1367640
5-th percentile: 1379902
Q1: 1418528
median: 1460016
Q3: 1479476
95-th percentile: 1558089
Maximum: 1576405
Range: 208765
Interquartile range (IQR): 60948

Descriptive statistics

Standard deviation: 51636.283
Coefficient of variation (CV): 0.035405151
Kurtosis: -0.51673777
Mean: 1458439.8
Median Absolute Deviation (MAD): 31115
Skewness: 0.35401196
Sum: 1.4584398 × 10^10
Variance: 2.6663058 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1381007 | 57 | 0.6%
1450828 | 54 | 0.5%
1453803 | 52 | 0.5%
1478713 | 51 | 0.5%
1530218 | 49 | 0.5%
1428901 | 49 | 0.5%
1549925 | 49 | 0.5%
1467487 | 48 | 0.5%
1448608 | 47 | 0.5%
1419828 | 47 | 0.5%
Other values (263) | 9497 | 95.0%

Value | Count | Frequency (%)
1367640 | 42 | 0.4%
1368888 | 38 | 0.4%
1370501 | 30 | 0.3%
1370544 | 36 | 0.4%
1371038 | 36 | 0.4%
1372788 | 36 | 0.4%
1372808 | 44 | 0.4%
1373470 | 33 | 0.3%
1373472 | 43 | 0.4%
1374300 | 36 | 0.4%

Value | Count | Frequency (%)
1576405 | 38 | 0.4%
1567487 | 41 | 0.4%
1564181 | 30 | 0.3%
1562983 | 39 | 0.4%
1562521 | 32 | 0.3%
1561820 | 34 | 0.3%
1560845 | 37 | 0.4%
1560575 | 30 | 0.3%
1560415 | 44 | 0.4%
1558361 | 38 | 0.4%

폐기물분류번호
Text

Distinct: 367
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 8
Mean length: 7.0253
Min length: 2

Characters and Unicode

Total characters: 70253
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 02-01-99
2nd row: 51-18-02
3rd row: 51-08-05
4th row: 91-10
5th row: 51-42
Value | Count | Frequency (%)
03-01-03 | 43 | 0.4%
51-08-05 | 41 | 0.4%
51-20-05 | 39 | 0.4%
51-38-01 | 39 | 0.4%
06-01 | 39 | 0.4%
51-18 | 38 | 0.4%
51-29 | 38 | 0.4%
03-08-02 | 37 | 0.4%
91-15 | 37 | 0.4%
06-01-07 | 37 | 0.4%
Other values (357) | 9612 | 96.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 17331 | 24.7%
- | 16751 | 23.8%
1 | 12279 | 17.5%
5 | 6085 | 8.7%
2 | 4210 | 6.0%
3 | 3762 | 5.4%
9 | 3638 | 5.2%
4 | 2209 | 3.1%
7 | 1411 | 2.0%
6 | 1409 | 2.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 53502 | 76.2%
Dash Punctuation | 16751 | 23.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 17331 | 32.4%
1 | 12279 | 23.0%
5 | 6085 | 11.4%
2 | 4210 | 7.9%
3 | 3762 | 7.0%
9 | 3638 | 6.8%
4 | 2209 | 4.1%
7 | 1411 | 2.6%
6 | 1409 | 2.6%
8 | 1168 | 2.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 16751 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 70253 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 17331 | 24.7%
- | 16751 | 23.8%
1 | 12279 | 17.5%
5 | 6085 | 8.7%
2 | 4210 | 6.0%
3 | 3762 | 5.4%
9 | 3638 | 5.2%
4 | 2209 | 3.1%
7 | 1411 | 2.0%
6 | 1409 | 2.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 70253 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 17331 | 24.7%
- | 16751 | 23.8%
1 | 12279 | 17.5%
5 | 6085 | 8.7%
2 | 4210 | 6.0%
3 | 3762 | 5.4%
9 | 3638 | 5.2%
4 | 2209 | 3.1%
7 | 1411 | 2.0%
6 | 1409 | 2.0%

실적년도
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2022: 10000

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value | Count | Frequency (%)
2022 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2022 | 10000 | 100.0%

폐기물분류코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
사업장폐기물: 8312
생활폐기물: 1277
건설폐기물: 411

Length

Max length: 6
Median length: 6
Mean length: 5.8312
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 사업장폐기물
2nd row: 사업장폐기물
3rd row: 사업장폐기물
4th row: 생활폐기물
5th row: 사업장폐기물

Common Values

Value | Count | Frequency (%)
사업장폐기물 | 8312 | 83.1%
생활폐기물 | 1277 | 12.8%
건설폐기물 | 411 | 4.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
사업장폐기물 | 8312 | 83.1%
생활폐기물 | 1277 | 12.8%
건설폐기물 | 411 | 4.1%

폐기물정보대분류
Real number (ℝ)

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.8767
Minimum: 1
Maximum: 91
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 7
median: 51
Q3: 51
95-th percentile: 91
Maximum: 91
Range: 90
Interquartile range (IQR): 44

Descriptive statistics

Standard deviation: 28.631791
Coefficient of variation (CV): 0.71800804
Kurtosis: -0.88077618
Mean: 39.8767
Median Absolute Deviation (MAD): 11
Skewness: 0.10959072
Sum: 398767
Variance: 819.77948
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value | Count | Frequency (%)
51 | 4908 | 49.1%
91 | 1277 | 12.8%
3 | 942 | 9.4%
2 | 416 | 4.2%
40 | 411 | 4.1%
1 | 402 | 4.0%
6 | 386 | 3.9%
10 | 273 | 2.7%
8 | 250 | 2.5%
7 | 228 | 2.3%
Other values (3) | 507 | 5.1%

Value | Count | Frequency (%)
1 | 402 | 4.0%
2 | 416 | 4.2%
3 | 942 | 9.4%
4 | 141 | 1.4%
5 | 187 | 1.9%
6 | 386 | 3.9%
7 | 228 | 2.3%
8 | 250 | 2.5%
9 | 179 | 1.8%
10 | 273 | 2.7%

Value | Count | Frequency (%)
91 | 1277 | 12.8%
51 | 4908 | 49.1%
40 | 411 | 4.1%
10 | 273 | 2.7%
9 | 179 | 1.8%
8 | 250 | 2.5%
7 | 228 | 2.3%
6 | 386 | 3.9%
5 | 187 | 1.9%
4 | 141 | 1.4%

폐기물정보중분류
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 46
Distinct (%): 0.5%
Missing: 311
Missing (%): 3.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.46269
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 8
Q3: 18
95-th percentile: 41
Maximum: 99
Range: 98
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 15.707709
Coefficient of variation (CV): 1.1667586
Kurtosis: 9.9793014
Mean: 13.46269
Median Absolute Deviation (MAD): 6
Skewness: 2.6169287
Sum: 130440
Variance: 246.73212
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Value | Count | Frequency (%)
1 | 1369 | 13.7%
3 | 910 | 9.1%
2 | 906 | 9.1%
4 | 505 | 5.1%
17 | 490 | 4.9%
6 | 417 | 4.2%
8 | 350 | 3.5%
5 | 315 | 3.1%
12 | 304 | 3.0%
9 | 257 | 2.6%
Other values (36) | 3866 | 38.7%
(Missing) | 311 | 3.1%

Value | Count | Frequency (%)
1 | 1369 | 13.7%
2 | 906 | 9.1%
3 | 910 | 9.1%
4 | 505 | 5.1%
5 | 315 | 3.1%
6 | 417 | 4.2%
7 | 252 | 2.5%
8 | 350 | 3.5%
9 | 257 | 2.6%
10 | 206 | 2.1%

Value | Count | Frequency (%)
99 | 104 | 1.0%
90 | 44 | 0.4%
46 | 55 | 0.5%
45 | 123 | 1.2%
44 | 60 | 0.6%
43 | 46 | 0.5%
42 | 47 | 0.5%
41 | 63 | 0.6%
40 | 11 | 0.1%
38 | 105 | 1.1%

폐기물정보소분류
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 23
Distinct (%): 0.3%
Missing: 2938
Missing (%): 29.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.708581
Minimum: 0
Maximum: 99
Zeros: 1411
Zeros (%): 14.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 5
95-th percentile: 99
Maximum: 99
Range: 99
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 27.546837
Coefficient of variation (CV): 2.352705
Kurtosis: 5.9119469
Mean: 11.708581
Median Absolute Deviation (MAD): 2
Skewness: 2.7772046
Sum: 82686
Variance: 758.82825
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Value | Count | Frequency (%)
0 | 1411 | 14.1%
1 | 1204 | 12.0%
2 | 1189 | 11.9%
3 | 818 | 8.2%
99 | 607 | 6.1%
4 | 467 | 4.7%
5 | 320 | 3.2%
6 | 252 | 2.5%
7 | 187 | 1.9%
8 | 127 | 1.3%
Other values (13) | 480 | 4.8%
(Missing) | 2938 | 29.4%

Value | Count | Frequency (%)
0 | 1411 | 14.1%
1 | 1204 | 12.0%
2 | 1189 | 11.9%
3 | 818 | 8.2%
4 | 467 | 4.7%
5 | 320 | 3.2%
6 | 252 | 2.5%
7 | 187 | 1.9%
8 | 127 | 1.3%
9 | 78 | 0.8%

Value | Count | Frequency (%)
99 | 607 | 6.1%
90 | 24 | 0.2%
29 | 26 | 0.3%
24 | 29 | 0.3%
23 | 28 | 0.3%
22 | 19 | 0.2%
21 | 28 | 0.3%
19 | 53 | 0.5%
14 | 26 | 0.3%
13 | 35 | 0.4%

연소여부
Boolean

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): 0.1%
Missing: 6617
Missing (%): 66.2%
Memory size: 97.7 KiB

Value | Count | Frequency (%)
False | 2954 | 29.5%
True | 429 | 4.3%
(Missing) | 6617 | 66.2%

지정폐기물여부
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB

Value | Count | Frequency (%)
False | 7709 | 77.1%
True | 2291 | 22.9%

의료폐기물여부
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1662
Missing (%): 16.6%
Memory size: 97.7 KiB

Value | Count | Frequency (%)
False | 8005 | 80.0%
True | 333 | 3.3%
(Missing) | 1662 | 16.6%

Interactions


Correlations

 | 시퀀스 | 폐기물분류코드 | 폐기물정보대분류 | 폐기물정보중분류 | 폐기물정보소분류 | 연소여부 | 지정폐기물여부 | 의료폐기물여부
시퀀스 | 1.000 | 0.000 | 0.020 | 0.034 | 0.000 | 0.018 | 0.026 | 0.013
폐기물분류코드 | 0.000 | 1.000 | 1.000 | 0.508 | 0.504 | 0.099 | 0.149 | 0.409
폐기물정보대분류 | 0.020 | 1.000 | 1.000 | 0.515 | 0.490 | 0.899 | 0.631 | 0.996
폐기물정보중분류 | 0.034 | 0.508 | 0.515 | 1.000 | 0.263 | 0.320 | 0.549 | 0.412
폐기물정보소분류 | 0.000 | 0.504 | 0.490 | 0.263 | 1.000 | 0.056 | 0.246 | 0.117
연소여부 | 0.018 | 0.099 | 0.899 | 0.320 | 0.056 | 1.000 | 0.699 | 0.893
지정폐기물여부 | 0.026 | 0.149 | 0.631 | 0.549 | 0.246 | 0.699 | 1.000 | 0.288
의료폐기물여부 | 0.013 | 0.409 | 0.996 | 0.412 | 0.117 | 0.893 | 0.288 | 1.000

 | 폐기물분류코드 | 의료폐기물여부 | 지정폐기물여부 | 연소여부
폐기물분류코드 | 1.000 | 0.269 | 0.245 | 0.063
의료폐기물여부 | 0.269 | 1.000 | 0.186 | 0.703
지정폐기물여부 | 0.245 | 0.186 | 1.000 | 0.493
연소여부 | 0.063 | 0.703 | 0.493 | 1.000

 | 시퀀스 | 폐기물정보대분류 | 폐기물정보중분류 | 폐기물정보소분류 | 폐기물분류코드 | 연소여부 | 지정폐기물여부 | 의료폐기물여부
시퀀스 | 1.000 | -0.006 | -0.032 | 0.013 | 0.000 | 0.018 | 0.026 | 0.013
폐기물정보대분류 | -0.006 | 1.000 | 0.556 | -0.194 | 1.000 | 0.485 | 0.759 | 0.355
폐기물정보중분류 | -0.032 | 0.556 | 1.000 | -0.134 | 0.243 | 0.391 | 0.397 | 0.297
폐기물정보소분류 | 0.013 | -0.194 | -0.134 | 1.000 | 0.504 | 0.093 | 0.164 | 0.078
폐기물분류코드 | 0.000 | 1.000 | 0.243 | 0.504 | 1.000 | 0.063 | 0.245 | 0.269
연소여부 | 0.018 | 0.485 | 0.391 | 0.093 | 0.063 | 1.000 | 0.493 | 0.703
지정폐기물여부 | 0.026 | 0.759 | 0.397 | 0.164 | 0.245 | 0.493 | 1.000 | 0.186
의료폐기물여부 | 0.013 | 0.355 | 0.297 | 0.078 | 0.269 | 0.703 | 0.186 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
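
Nullity correlation is commonly computed as the Pearson correlation between per-column missing-value indicators (1 where the value is missing, 0 otherwise). The sketch below assumes that definition and applies it to two columns from the first ten sample rows of this report, so the result only illustrates the calculation and says nothing reliable about the full 10000-row dataset. The helper name `pearson` and the variable names are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns None when either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# (폐기물정보소분류, 연소여부) for the first ten sample rows; None stands for <NA>.
rows = [(99, None), (2, None), (5, "N"), (None, None), (None, None),
        (None, None), (6, None), (None, None), (6, None), (5, None)]

minor_missing = [int(minor is None) for minor, _ in rows]
burn_missing = [int(burn is None) for _, burn in rows]

nullity_corr = pearson(minor_missing, burn_missing)
print(nullity_corr)  # positive: 연소여부 is missing in every row where 폐기물정보소분류 is missing
```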

Sample

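(index) | 시퀀스 | 폐기물분류번호 | 실적년도 | 폐기물분류코드 | 폐기물정보대분류 | 폐기물정보중분류 | 폐기물정보소분류 | 연소여부 | 지정폐기물여부 | 의료폐기물여부
86724 | 1398549 | 02-01-99 | 2022 | 사업장폐기물 | 2 | 1 | 99 | <NA> | Y | N
77788 | 1375085 | 51-18-02 | 2022 | 사업장폐기물 | 51 | 18 | 2 | <NA> | N | N
79569 | 1473556 | 51-08-05 | 2022 | 사업장폐기물 | 51 | 8 | 5 | N | N | N
18541 | 1423773 | 91-10 | 2022 | 생활폐기물 | 91 | 10 | <NA> | <NA> | N | <NA>
29964 | 1403544 | 51-42 | 2022 | 사업장폐기물 | 51 | 42 | <NA> | <NA> | N | N
18550 | 1378655 | 91-10 | 2022 | 생활폐기물 | 91 | 10 | <NA> | <NA> | N | <NA>
48935 | 1454821 | 51-03-06 | 2022 | 사업장폐기물 | 51 | 3 | 6 | <NA> | N | N
20394 | 1399542 | 07-01 | 2022 | 사업장폐기물 | 7 | 1 | <NA> | <NA> | N | N
14658 | 1399975 | 06-01-06 | 2022 | 사업장폐기물 | 6 | 1 | 6 | <NA> | Y | N
2113 | 1396761 | 51-03-05 | 2022 | 사업장폐기물 | 51 | 3 | 5 | <NA> | N | N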
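
(index) | 시퀀스 | 폐기물분류번호 | 실적년도 | 폐기물분류코드 | 폐기물정보대분류 | 폐기물정보중분류 | 폐기물정보소분류 | 연소여부 | 지정폐기물여부 | 의료폐기물여부
51874 | 1528854 | 51-17-21 | 2022 | 사업장폐기물 | 51 | 17 | 21 | <NA> | N | N
19965 | 1547873 | 07 | 2022 | 사업장폐기물 | 7 | <NA> | <NA> | <NA> | N | N
33895 | 1487722 | 51-13-02 | 2022 | 사업장폐기물 | 51 | 13 | 2 | N | N | N
72018 | 1418643 | 51-26-00 | 2022 | 사업장폐기물 | 51 | 26 | 0 | N | N | N
7507 | 1478709 | 40-02-06 | 2022 | 건설폐기물 | 40 | 2 | 6 | <NA> | N | <NA>
52860 | 1470698 | 51-17-29 | 2022 | 사업장폐기물 | 51 | 17 | 29 | <NA> | N | N
29210 | 1370544 | 51-04-02 | 2022 | 사업장폐기물 | 51 | 4 | 2 | N | N | N
44079 | 1473556 | 91-15-00 | 2022 | 생활폐기물 | 91 | 15 | 0 | <NA> | N | <NA>
47822 | 1551218 | 10-11-00 | 2022 | 사업장폐기물 | 10 | 11 | 0 | Y | Y | Y
71862 | 1454887 | 51-26-00 | 2022 | 사업장폐기물 | 51 | 26 | 0 | N | N | N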
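
In the sample rows, the hyphen-separated parts of 폐기물분류번호 match the three classification columns: 폐기물정보대분류, 폐기물정보중분류, and 폐기물정보소분류, with absent levels shown as <NA>. The sketch below assumes that relationship holds for every row, which the report does not state. The function name `split_waste_code` is hypothetical.

```python
def split_waste_code(code):
    """Split a code such as '51-18-02' into (major, middle, minor) integers.

    Levels that are absent from the code are returned as None.
    """
    parts = [int(part) for part in code.split("-")]
    parts += [None] * (3 - len(parts))
    return tuple(parts[:3])

print(split_waste_code("02-01-99"))  # (2, 1, 99)
print(split_waste_code("91-10"))     # (91, 10, None)
print(split_waste_code("07"))        # (7, None, None)
```

A split like this would also account for the high correlation reported between 폐기물분류코드 and the classification columns, since the sample rows pair 91 with 생활폐기물 and 40 with 건설폐기물.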