Overview

Dataset statistics

Number of variables: 7
Number of observations: 9312
Missing cells: 12924
Missing cells (%): 19.8%
Duplicate rows: 100
Duplicate rows (%): 1.1%
Total size in memory: 545.8 KiB
Average record size in memory: 60.0 B
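
The percentages in this block follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics block.
n_variables = 7
n_observations = 9312
missing_cells = 12924
duplicate_rows = 100
total_size_kib = 545.8

# Assumption: the denominator for missing cells is every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three results agree with the reported values.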

Variable types

Categorical: 3
Text: 2
Numeric: 2

Dataset

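Description: Packaging-unit data for volume-based waste disposal bags (쓰레기종량제봉투) in Seo-gu, Incheon Metropolitan City. It includes the bag name (봉투명), category (구분), expiration date (만료기간), quantity (수량), and related fields.
Author: Seo-gu, Incheon Metropolitan City (인천광역시 서구)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15090818&srcSe=7661IVAWM27C61E190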

Alerts

Dataset has 100 (1.1%) duplicate rows [Duplicates]
데이터기준일자 is highly overall correlated with 수량 and 3 other fields [High correlation]
지정코드 is highly overall correlated with 수량 and 3 other fields [High correlation]
구분 is highly overall correlated with 지정코드 and 1 other field [High correlation]
수량 is highly overall correlated with 지정코드 and 1 other field [High correlation]
Unnamed: 6 is highly overall correlated with 지정코드 and 1 other field [High correlation]
봉투명 has 3231 (34.7%) missing values [Missing]
만료기간 has 3231 (34.7%) missing values [Missing]
수량 has 3231 (34.7%) missing values [Missing]
Unnamed: 6 has 3231 (34.7%) missing values [Missing]

Reproduction

Analysis started: 2024-03-18 02:00:40.440144
Analysis finished: 2024-03-18 02:00:42.943878
Duration: 2.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

지정코드
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 72.9 KiB

Value  Count
110308  6081
<NA>  3231

Length

Max length: 6
Median length: 6
Mean length: 5.3060567
Min length: 4
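
The mean length is consistent with the missing marker being counted as the four-character string `<NA>` rather than as a true null. This would also explain why the column reports 0 missing values. A short sketch of that check, using the value counts from this section:

```python
# Value counts reported for 지정코드; "<NA>" is treated here as a literal string,
# which is an inference from the length statistics, not something the report states.
value_counts = {"110308": 6081, "<NA>": 3231}

total_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / total_rows

print(total_rows)             # number of rows covered by the two values
print(round(mean_length, 7))  # compare with the reported mean length
```

The same calculation reproduces the mean lengths reported for 구분 (2.0409149) and 데이터기준일자 (7.9181701).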

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 110308
2nd row: 110308
3rd row: 110308
4th row: 110308
5th row: 110308

Common Values

Value  Count  Frequency (%)
110308  6081  65.3%
<NA>  3231  34.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values: 110308 (6081, 65.3%), <NA> (3231, 34.7%)]

봉투명
Text

MISSING 

Distinct: 61
Distinct (%): 1.0%
Missing: 3231
Missing (%): 34.7%
Memory size: 72.9 KiB

Length

Max length: 14
Median length: 11
Mean length: 9.3601381
Min length: 5

Characters and Unicode

Total characters: 56919
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 불연성 10L
2nd row: 불연성 10L
3rd row: 불연성 10L
4th row: 불연성 10L
5th row: 불연성 10L

Value  Count  Frequency (%)
일반용  1236  10.7%
스티커  1206  10.4%
재활용  753  6.5%
음식물  696  6.0%
필증  501  4.3%
불연성  462  4.0%
10l  444  3.8%
20l  438  3.8%
재사용  336  2.9%
50l  321  2.8%
Other values (48)  5166  44.7%
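
The table above counts individual words, not whole bag names, and volume tokens appear in lowercase ("10l") although the sample rows show "10L". A minimal sketch of how such a table could be produced, assuming whitespace tokenization and lowercasing; the report does not document its tokenizer, and `word_counts` is a hypothetical helper:

```python
from collections import Counter

def word_counts(values):
    """Count lowercase, whitespace-separated tokens, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(value.lower().split())
    return counts

# Illustrative input only; these are not the real column contents.
sample = ["불연성 10L", "불연성 10L", "불연성 20L", None]
counts = word_counts(sample)
for word, count in counts.most_common():
    print(word, count)
```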

Most occurring characters

Value  Count  Frequency (%)
0  7155  12.6%
(space)  6486  11.4%
L  5013  8.8%
[Hangul, unrecovered]  3135  5.5%
)  2775  4.9%
(  2775  4.9%
1  2037  3.6%
5  1800  3.2%
[Hangul, unrecovered]  1551  2.7%
2  1458  2.6%
Other values (45)  22734  39.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  26043  45.8%
Decimal Number  13791  24.2%
Space Separator  6486  11.4%
Uppercase Letter  5025  8.8%
Close Punctuation  2775  4.9%
Open Punctuation  2775  4.9%
Other Punctuation  12  < 0.1%
Math Symbol  12  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[Hangul, unrecovered]  3135  12.0%
[Hangul, unrecovered]  1551  6.0%
[Hangul, unrecovered]  1332  5.1%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1089  4.2%
[Hangul, unrecovered]  1056  4.1%
Other values (29)  11790  45.3%

Decimal Number
Value  Count  Frequency (%)
0  7155  51.9%
1  2037  14.8%
5  1800  13.1%
2  1458  10.6%
3  1029  7.5%
6  276  2.0%
8  12  0.1%
9  12  0.1%
4  12  0.1%

Uppercase Letter
Value  Count  Frequency (%)
L  5013  99.8%
E  12  0.2%

Space Separator
Value  Count  Frequency (%)
(space)  6486  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  2775  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  2775  100.0%

Other Punctuation
Value  Count  Frequency (%)
.  12  100.0%

Math Symbol
Value  Count  Frequency (%)
+  12  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  26043  45.8%
Common  25851  45.4%
Latin  5025  8.8%

Most frequent character per script

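Hangul
Value  Count  Frequency (%)
[Hangul, unrecovered]  3135  12.0%
[Hangul, unrecovered]  1551  6.0%
[Hangul, unrecovered]  1332  5.1%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1089  4.2%
[Hangul, unrecovered]  1056  4.1%
Other values (29)  11790  45.3%

Common
Value  Count  Frequency (%)
0  7155  27.7%
(space)  6486  25.1%
)  2775  10.7%
(  2775  10.7%
1  2037  7.9%
5  1800  7.0%
2  1458  5.6%
3  1029  4.0%
6  276  1.1%
8  12  < 0.1%
Other values (4)  48  0.2%

Latin
Value  Count  Frequency (%)
L  5013  99.8%
E  12  0.2%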

Most occurring blocks

Value  Count  Frequency (%)
ASCII  30876  54.2%
Hangul  26043  45.8%

Most frequent character per block

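ASCII
Value  Count  Frequency (%)
0  7155  23.2%
(space)  6486  21.0%
L  5013  16.2%
)  2775  9.0%
(  2775  9.0%
1  2037  6.6%
5  1800  5.8%
2  1458  4.7%
3  1029  3.3%
6  276  0.9%
Other values (6)  72  0.2%

Hangul
Value  Count  Frequency (%)
[Hangul, unrecovered]  3135  12.0%
[Hangul, unrecovered]  1551  6.0%
[Hangul, unrecovered]  1332  5.1%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1236  4.7%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1206  4.6%
[Hangul, unrecovered]  1089  4.2%
[Hangul, unrecovered]  1056  4.1%
Other values (29)  11790  45.3%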

구분
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 72.9 KiB

Value  Count
<NA>  3231
1  2027
2  2027
3  2027

Length

Max length: 4
Median length: 1
Mean length: 2.0409149
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 3
4th row: 1
5th row: 2

Common Values

Value  Count  Frequency (%)
<NA>  3231  34.7%
1  2027  21.8%
2  2027  21.8%
3  2027  21.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values: <NA> (3231, 34.7%); 1, 2, and 3 (2027 each, 21.8%)]

만료기간
Text

MISSING 

Distinct: 60
Distinct (%): 1.0%
Missing: 3231
Missing (%): 34.7%
Memory size: 72.9 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 60810
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2002-01-02
2nd row: 2002-01-02
3rd row: 2002-01-02
4th row: 2003-03-02
5th row: 2003-03-02

Value  Count  Frequency (%)
2022-04-13  186  3.1%
9999-99-99  186  3.1%
2022-04-12  186  3.1%
2021-12-23  186  3.1%
2020-11-12  159  2.6%
2020-08-04  159  2.6%
2020-08-03  159  2.6%
2021-12-07  159  2.6%
2021-01-31  159  2.6%
2020-11-15  159  2.6%
Other values (50)  4383  72.1%

Most occurring characters

Value  Count  Frequency (%)
0  14337  23.6%
-  12162  20.0%
2  11955  19.7%
1  9813  16.1%
9  4263  7.0%
3  1902  3.1%
7  1485  2.4%
4  1443  2.4%
6  1239  2.0%
8  1110  1.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  48648  80.0%
Dash Punctuation  12162  20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  14337  29.5%
2  11955  24.6%
1  9813  20.2%
9  4263  8.8%
3  1902  3.9%
7  1485  3.1%
4  1443  3.0%
6  1239  2.5%
8  1110  2.3%
5  1101  2.3%

Dash Punctuation
Value  Count  Frequency (%)
-  12162  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  60810  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  14337  23.6%
-  12162  20.0%
2  11955  19.7%
1  9813  16.1%
9  4263  7.0%
3  1902  3.1%
7  1485  2.4%
4  1443  2.4%
6  1239  2.0%
8  1110  1.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60810  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  14337  23.6%
-  12162  20.0%
2  11955  19.7%
1  9813  16.1%
9  4263  7.0%
3  1902  3.1%
7  1485  2.4%
4  1443  2.4%
6  1239  2.0%
8  1110  1.8%

수량
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 13
Distinct (%): 0.2%
Missing: 3231
Missing (%): 34.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.439895
Minimum: 1
Maximum: 1000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 82.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 10
Q3: 50
95-th percentile: 200
Maximum: 1000
Range: 999
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 71.641466
Coefficient of variation (CV): 1.9660174
Kurtosis: 84.783041
Mean: 36.439895
Median Absolute Deviation (MAD): 9
Skewness: 7.0907156
Sum: 221591
Variance: 5132.4997
Monotonicity: Not monotonic
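
Several of these statistics are derived from one another. The sketch below checks the relationships with the standard definitions (mean = sum / non-missing count, CV = standard deviation / mean, variance = standard deviation squared). All inputs are copied from this section; the non-missing count is the number of observations minus the missing count.

```python
# Values reported for 수량.
n_observations = 9312
n_missing = 3231
total = 221591
reported_mean = 36.439895
reported_std = 71.641466
q1, q3 = 1, 50
minimum, maximum = 1, 1000

n_present = n_observations - n_missing
mean = total / n_present                     # mean over non-missing rows
cv = reported_std / reported_mean            # coefficient of variation
variance = reported_std ** 2                 # variance from the standard deviation
iqr = q3 - q1                                # interquartile range
value_range = maximum - minimum

print(n_present, round(mean, 6), round(cv, 4), round(variance, 2), iqr, value_range)
```

A CV close to 2 with a median of 10 and a maximum of 1000 points to a strongly right-skewed distribution, which matches the reported skewness of 7.09.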
[Figure: histogram with fixed-size bins (bins=13)]

Common values
Value  Count  Frequency (%)
1  2041  21.9%
10  1603  17.2%
100  856  9.2%
20  580  6.2%
50  444  4.8%
200  329  3.5%
5  142  1.5%
30  27  0.3%
1000  16  0.2%
15  16  0.2%
Other values (3)  27  0.3%
(Missing)  3231  34.7%

Minimum 10 values
Value  Count  Frequency (%)
1  2041  21.9%
4  10  0.1%
5  142  1.5%
10  1603  17.2%
15  16  0.2%
20  580  6.2%
25  16  0.2%
30  27  0.3%
50  444  4.8%
100  856  9.2%

Maximum 10 values
Value  Count  Frequency (%)
1000  16  0.2%
200  329  3.5%
120  1  < 0.1%
100  856  9.2%
50  444  4.8%
30  27  0.3%
25  16  0.2%
20  580  6.2%
15  16  0.2%
10  1603  17.2%

데이터기준일자
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 72.9 KiB

Value  Count
2022-09-06  6081
<NA>  3231

Length

Max length: 10
Median length: 10
Mean length: 7.9181701
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-09-06
2nd row: 2022-09-06
3rd row: 2022-09-06
4th row: 2022-09-06
5th row: 2022-09-06

Common Values

Value  Count  Frequency (%)
2022-09-06  6081  65.3%
<NA>  3231  34.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values: 2022-09-06 (6081, 65.3%), <NA> (3231, 34.7%)]

Unnamed: 6
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 20
Distinct (%): 0.3%
Missing: 3231
Missing (%): 34.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2261.3222
Minimum: 2000
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 82.0 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2003
Q1: 2017
Median: 2019
Q3: 2021
95-th percentile: 2022
Maximum: 9999
Range: 7999
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1374.5615
Coefficient of variation (CV): 0.60785746
Kurtosis: 27.747976
Mean: 2261.3222
Median Absolute Deviation (MAD): 2
Skewness: 5.4532872
Sum: 13751100
Variance: 1889419.4
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=20)]

Common values
Value  Count  Frequency (%)
2019  1701  18.3%
2020  1107  11.9%
2021  981  10.5%
2022  372  4.0%
2017  369  4.0%
2003  267  2.9%
2012  210  2.3%
9999  186  2.0%
2013  168  1.8%
2016  123  1.3%
Other values (10)  597  6.4%
(Missing)  3231  34.7%

Minimum 10 values
Value  Count  Frequency (%)
2000  15  0.2%
2001  81  0.9%
2002  36  0.4%
2003  267  2.9%
2004  93  1.0%
2005  48  0.5%
2006  48  0.5%
2009  48  0.5%
2010  105  1.1%
2012  210  2.3%

Maximum 10 values
Value  Count  Frequency (%)
9999  186  2.0%
2022  372  4.0%
2021  981  10.5%
2020  1107  11.9%
2019  1701  18.3%
2017  369  4.0%
2016  123  1.3%
2015  39  0.4%
2014  84  0.9%
2013  168  1.8%
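
The value 9999 occurs 186 times, the same count as the 9999-99-99 entry in 만료기간, and the sample rows suggest this column holds the year of 만료기간. If 9999 is a placeholder (for example, "no expiration"), it inflates the mean and standard deviation. The sketch below recomputes the mean without it from the reported sum and counts. The placeholder interpretation is an assumption, not something the report states.

```python
# Values reported for "Unnamed: 6".
total = 13751100
n_present = 9312 - 3231          # non-missing rows
sentinel = 9999                  # assumed placeholder value
sentinel_count = 186

mean_all = total / n_present
mean_without_sentinel = (total - sentinel * sentinel_count) / (n_present - sentinel_count)

print(round(mean_all, 4))
print(round(mean_without_sentinel, 2))
```

With the placeholder excluded, the mean falls into the 2017 to 2018 range, close to the reported median of 2019 and far below the reported mean of 2261.3222.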

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]
Variable  봉투명  구분  만료기간  수량  Unnamed: 6
봉투명  1.000  0.000  0.100  0.649  0.183
구분  0.000  1.000  0.000  0.478  0.000
만료기간  0.100  0.000  1.000  0.139  1.000
수량  0.649  0.478  0.139  1.000  0.000
Unnamed: 6  0.183  0.000  1.000  0.000  1.000

[Figure: correlation heatmap]
Variable  데이터기준일자  지정코드  구분
데이터기준일자  1.000  1.000  1.000
지정코드  1.000  1.000  1.000
구분  1.000  1.000  1.000

[Figure: correlation heatmap]
Variable  수량  Unnamed: 6  지정코드  구분  데이터기준일자
수량  1.000  0.017  1.000  0.190  1.000
Unnamed: 6  0.017  1.000  1.000  0.000  1.000
지정코드  1.000  1.000  1.000  1.000  1.000
구분  0.190  0.000  1.000  1.000  1.000
데이터기준일자  1.000  1.000  1.000  1.000  1.000
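
The 1.000 entries between 지정코드 and 데이터기준일자 are plausibly an artifact: each column holds one real value on 6081 rows and `<NA>` on the remaining 3231 rows. The report does not state which association measure produced these matrices. The sketch below assumes a plain (uncorrected) Cramér's V to show how two such columns reach a perfect score; `cramers_v` is an illustrative helper, not the profiler's implementation.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return float("nan")  # undefined when a column is constant
    return math.sqrt(chi2 / (n * k))

# Reconstructed from the reported counts, assuming the <NA> rows coincide.
codes = ["110308"] * 6081 + ["<NA>"] * 3231
dates = ["2022-09-06"] * 6081 + ["<NA>"] * 3231
v_perfect = cramers_v(codes, dates)

# Two independent columns for contrast.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(round(v_perfect, 3), round(v_independent, 3))
```

If the empty rows are removed first, both columns become constant and the measure is undefined. This suggests the high-correlation alerts for these fields mostly reflect the block of empty rows.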

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
Index  지정코드  봉투명  구분  만료기간  수량  데이터기준일자  Unnamed: 6
0  110308  불연성 10L  1  2002-01-02  20  2022-09-06  2002
1  110308  불연성 10L  2  2002-01-02  100  2022-09-06  2002
2  110308  불연성 10L  3  2002-01-02  1  2022-09-06  2002
3  110308  불연성 10L  1  2003-03-02  20  2022-09-06  2003
4  110308  불연성 10L  2  2003-03-02  100  2022-09-06  2003
5  110308  불연성 10L  3  2003-03-02  1  2022-09-06  2003
6  110308  불연성 10L  1  2003-04-07  20  2022-09-06  2003
7  110308  불연성 10L  2  2003-04-07  100  2022-09-06  2003
8  110308  불연성 10L  3  2003-04-07  1  2022-09-06  2003
9  110308  불연성 20L  1  2003-03-02  10  2022-09-06  2003

Last rows
Index  지정코드  봉투명  구분  만료기간  수량  데이터기준일자  Unnamed: 6
9302  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9303  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9304  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9305  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9306  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9307  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9308  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9309  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9310  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
9311  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
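
The last rows are empty in every column, and the same 3231 count appears as the missing total for each variable. The file therefore seems to carry a block of blank rows. A minimal sketch of filtering them out before analysis, assuming rows are dictionaries and that a missing cell is spelled as `None`, an empty string, or `<NA>`; the helper names are illustrative:

```python
MISSING_MARKERS = {"", "<NA>"}  # assumed spellings of a missing cell

def is_empty_row(row):
    """Return True when every field in the row is missing."""
    return all(value is None or value in MISSING_MARKERS for value in row.values())

def drop_empty_rows(rows):
    """Keep only rows that have at least one non-missing field."""
    return [row for row in rows if not is_empty_row(row)]

# Illustrative rows only; the real file has seven columns.
rows = [
    {"지정코드": "110308", "봉투명": "불연성 10L", "수량": "20"},
    {"지정코드": "<NA>", "봉투명": "<NA>", "수량": "<NA>"},
    {"지정코드": "", "봉투명": None, "수량": ""},
]
cleaned = drop_empty_rows(rows)
print(len(cleaned))
```

Applied to the full dataset, this kind of filter would be expected to leave 9312 - 3231 = 6081 rows, which matches the count of non-missing values reported for each variable.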

Duplicate rows

Most frequently occurring

Index  지정코드  봉투명  구분  만료기간  수량  데이터기준일자  Unnamed: 6  # duplicates
99  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  3231
0  110308  스티커 1000원권(청라)  1  2015-12-01  50  2022-09-06  2015  2
1  110308  스티커 1000원권(청라)  1  2016-10-17  50  2022-09-06  2016  2
2  110308  스티커 1000원권(청라)  1  2017-08-15  50  2022-09-06  2017  2
3  110308  스티커 1000원권(청라)  1  2017-10-16  50  2022-09-06  2017  2
4  110308  스티커 1000원권(청라)  1  2017-12-12  50  2022-09-06  2017  2
5  110308  스티커 1000원권(청라)  1  2019-05-12  50  2022-09-06  2019  2
6  110308  스티커 1000원권(청라)  1  2019-05-13  50  2022-09-06  2019  2
7  110308  스티커 1000원권(청라)  1  2019-05-14  50  2022-09-06  2019  2
8  110308  스티커 1000원권(청라)  1  2019-09-04  50  2022-09-06  2019  2
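
The table index runs from 0 to 99, which is consistent with 100 distinct rows that occur more than once; the all-missing row alone accounts for 3231 occurrences. A small sketch of counting such groups, assuming each row is represented as a tuple of its field values (`duplicate_groups` is a hypothetical helper):

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Illustrative rows with a reduced set of columns.
rows = [
    ("110308", "스티커 1000원권(청라)", "1", "2015-12-01", "50"),
    ("110308", "스티커 1000원권(청라)", "1", "2015-12-01", "50"),
    ("110308", "불연성 10L", "1", "2002-01-02", "20"),
    (None, None, None, None, None),
    (None, None, None, None, None),
    (None, None, None, None, None),
]
groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(n, row)
```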