Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 9312 |
Missing cells | 12924 |
Missing cells (%) | 19.8% |
Duplicate rows | 100 |
Duplicate rows (%) | 1.1% |
Total size in memory | 545.8 KiB |
Average record size in memory | 60.0 B |
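The percentages in the table above follow directly from the raw counts. A quick arithmetic check, using only figures reported in the table (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 7
n_observations = 9312
missing_cells = 12924
duplicate_rows = 100
total_kib = 545.8

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations
bytes_per_record = total_kib * 1024 / n_observations  # KiB -> bytes, per row

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {bytes_per_record:.1f} B")
```

Note that 12924 equals 4 × 3231, so the missing cells are concentrated in the four columns that each report 3231 missing values.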
Variable types
Categorical | 3 |
---|---|
Text | 2 |
Numeric | 2 |
Dataset
Description | Packaging-unit data for volume-based waste bags (쓰레기종량제봉투) in Seo-gu, Incheon Metropolitan City, including bag name, category, expiration date, and quantity. |
---|---|
Author | Seo-gu, Incheon Metropolitan City (인천광역시 서구) |
URL | https://data.incheon.go.kr/findData/publicDataDetail?dataId=15090818&srcSe=7661IVAWM27C61E190 |
Alerts
Dataset has 100 (1.1%) duplicate rows | Duplicates |
데이터기준일자 is highly overall correlated with 수량 and 3 other fields | High correlation |
지정코드 is highly overall correlated with 수량 and 3 other fields | High correlation |
구분 is highly overall correlated with 지정코드 and 1 other field | High correlation |
수량 is highly overall correlated with 지정코드 and 1 other field | High correlation |
Unnamed: 6 is highly overall correlated with 지정코드 and 1 other field | High correlation |
봉투명 has 3231 (34.7%) missing values | Missing |
만료기간 has 3231 (34.7%) missing values | Missing |
수량 has 3231 (34.7%) missing values | Missing |
Unnamed: 6 has 3231 (34.7%) missing values | Missing |
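All four Missing alerts report the same count (3231), and the duplicate-row listing shows an all-`<NA>` row occurring 3231 times, which suggests a block of fully empty rows rather than scattered gaps. A minimal sketch for dropping such rows before analysis, using only the standard library. `is_missing` and `drop_empty_rows` are hypothetical helpers, and treating `""` and `"<NA>"` as missing markers is an assumption about how blanks appear after loading.

```python
from typing import Optional

# Assumption: blank cells arrive as None, "", or the literal string "<NA>".
MISSING_MARKERS = {"", "<NA>"}

def is_missing(value: Optional[str]) -> bool:
    """True when a cell carries no usable value."""
    return value is None or value.strip() in MISSING_MARKERS

def drop_empty_rows(rows: list[dict]) -> list[dict]:
    """Keep only rows that have at least one non-missing cell."""
    return [row for row in rows if not all(is_missing(v) for v in row.values())]

rows = [
    {"봉투명": "불연성 10L", "구분": "1", "만료기간": "2002-01-02", "수량": "20"},
    {"봉투명": "<NA>", "구분": "<NA>", "만료기간": "<NA>", "수량": "<NA>"},
    {"봉투명": "불연성 20L", "구분": "1", "만료기간": "2003-03-02", "수량": "10"},
]
cleaned = drop_empty_rows(rows)
print(len(cleaned))
```

Rows with only some cells missing are kept, so partial records are not silently discarded.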
Reproduction
Analysis started | 2024-03-18 02:00:40.440144 |
---|---|
Analysis finished | 2024-03-18 02:00:42.943878 |
Duration | 2.5 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
지정코드
Categorical
HIGH CORRELATION
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 72.9 KiB |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 5.3060567 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 110308 |
---|---|
2nd row | 110308 |
3rd row | 110308 |
4th row | 110308 |
5th row | 110308 |
Common Values
Value | Count | Frequency (%) |
110308 | 6081 | 65.3% |
<NA> | 3231 | 34.7% |
봉투명
Text
MISSING
Distinct | 61 |
---|---|
Distinct (%) | 1.0% |
Missing | 3231 |
Missing (%) | 34.7% |
Memory size | 72.9 KiB |
Value | Count | Frequency (%) |
일반용 | 1236 | 10.7% |
스티커 | 1206 | 10.4% |
재활용 | 753 | 6.5% |
음식물 | 696 | 6.0% |
필증 | 501 | 4.3% |
불연성 | 462 | 4.0% |
10l | 444 | 3.8% |
20l | 438 | 3.8% |
재사용 | 336 | 2.9% |
50l | 321 | 2.8% |
Other values (48) | 5166 | 44.7% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 7155 | 12.6% |
(space) | 6486 | 11.4% |
L | 5013 | 8.8% |
용 | 3135 | 5.5% |
) | 2775 | 4.9% |
( | 2775 | 4.9% |
1 | 2037 | 3.6% |
5 | 1800 | 3.2% |
라 | 1551 | 2.7% |
2 | 1458 | 2.6% |
Other values (45) | 22734 | 39.9% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 26043 | 45.8% |
Decimal Number | 13791 | 24.2% |
Space Separator | 6486 | 11.4% |
Uppercase Letter | 5025 | 8.8% |
Close Punctuation | 2775 | 4.9% |
Open Punctuation | 2775 | 4.9% |
Other Punctuation | 12 | < 0.1% |
Math Symbol | 12 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
용 | 3135 | 12.0% |
라 | 1551 | 6.0% |
청 | 1332 | 5.1% |
반 | 1236 | 4.7% |
일 | 1236 | 4.7% |
스 | 1206 | 4.6% |
커 | 1206 | 4.6% |
티 | 1206 | 4.6% |
재 | 1089 | 4.2% |
권 | 1056 | 4.1% |
Other values (29) | 11790 | 45.3% |
Decimal Number
Value | Count | Frequency (%) |
0 | 7155 | 51.9% |
1 | 2037 | 14.8% |
5 | 1800 | 13.1% |
2 | 1458 | 10.6% |
3 | 1029 | 7.5% |
6 | 276 | 2.0% |
8 | 12 | 0.1% |
9 | 12 | 0.1% |
4 | 12 | 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
L | 5013 | 99.8% |
E | 12 | 0.2% |
Space Separator
Value | Count | Frequency (%) |
(space) | 6486 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 2775 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 2775 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
. | 12 | 100.0% |
Math Symbol
Value | Count | Frequency (%) |
+ | 12 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 26043 | 45.8% |
Common | 25851 | 45.4% |
Latin | 5025 | 8.8% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
용 | 3135 | 12.0% |
라 | 1551 | 6.0% |
청 | 1332 | 5.1% |
반 | 1236 | 4.7% |
일 | 1236 | 4.7% |
스 | 1206 | 4.6% |
커 | 1206 | 4.6% |
티 | 1206 | 4.6% |
재 | 1089 | 4.2% |
권 | 1056 | 4.1% |
Other values (29) | 11790 | 45.3% |
Common
Value | Count | Frequency (%) |
0 | 7155 | 27.7% |
(space) | 6486 | 25.1% |
) | 2775 | 10.7% |
( | 2775 | 10.7% |
1 | 2037 | 7.9% |
5 | 1800 | 7.0% |
2 | 1458 | 5.6% |
3 | 1029 | 4.0% |
6 | 276 | 1.1% |
8 | 12 | < 0.1% |
Other values (4) | 48 | 0.2% |
Latin
Value | Count | Frequency (%) |
L | 5013 | 99.8% |
E | 12 | 0.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 30876 | 54.2% |
Hangul | 26043 | 45.8% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 7155 | 23.2% |
(space) | 6486 | 21.0% |
L | 5013 | 16.2% |
) | 2775 | 9.0% |
( | 2775 | 9.0% |
1 | 2037 | 6.6% |
5 | 1800 | 5.8% |
2 | 1458 | 4.7% |
3 | 1029 | 3.3% |
6 | 276 | 0.9% |
Other values (6) | 72 | 0.2% |
Hangul
Value | Count | Frequency (%) |
용 | 3135 | 12.0% |
라 | 1551 | 6.0% |
청 | 1332 | 5.1% |
반 | 1236 | 4.7% |
일 | 1236 | 4.7% |
스 | 1206 | 4.6% |
커 | 1206 | 4.6% |
티 | 1206 | 4.6% |
재 | 1089 | 4.2% |
권 | 1056 | 4.1% |
Other values (29) | 11790 | 45.3% |
구분
Categorical
HIGH CORRELATION
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 72.9 KiB |
Length
Max length | 4 |
---|---|
Median length | 1 |
Mean length | 2.0409149 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 2 |
3rd row | 3 |
4th row | 1 |
5th row | 2 |
Common Values
Value | Count | Frequency (%) |
<NA> | 3231 | 34.7% |
1 | 2027 | 21.8% |
2 | 2027 | 21.8% |
3 | 2027 | 21.8% |
만료기간
Text
MISSING
Distinct | 60 |
---|---|
Distinct (%) | 1.0% |
Missing | 3231 |
Missing (%) | 34.7% |
Memory size | 72.9 KiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 60810 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2002-01-02 |
---|---|
2nd row | 2002-01-02 |
3rd row | 2002-01-02 |
4th row | 2003-03-02 |
5th row | 2003-03-02 |
Value | Count | Frequency (%) |
2022-04-13 | 186 | 3.1% |
9999-99-99 | 186 | 3.1% |
2022-04-12 | 186 | 3.1% |
2021-12-23 | 186 | 3.1% |
2020-11-12 | 159 | 2.6% |
2020-08-04 | 159 | 2.6% |
2020-08-03 | 159 | 2.6% |
2021-12-07 | 159 | 2.6% |
2021-01-31 | 159 | 2.6% |
2020-11-15 | 159 | 2.6% |
Other values (50) | 4383 | 72.1% |
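The value `9999-99-99` in the table above is not a valid calendar date, so it presumably serves as a "no expiration" placeholder; that reading is an assumption, since the report does not say what the value means. A minimal sketch of parsing this column while mapping the placeholder to `None` (`parse_expiry` and `NO_EXPIRY` are hypothetical names):

```python
from datetime import date
from typing import Optional

# Assumption: this literal marks "no expiration date" in the source data.
NO_EXPIRY = "9999-99-99"

def parse_expiry(text: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None for the placeholder value."""
    if text == NO_EXPIRY:
        return None
    # date.fromisoformat raises ValueError on any other malformed input.
    return date.fromisoformat(text)

print(parse_expiry("2022-04-13"))
print(parse_expiry("9999-99-99"))
```

Handling the placeholder explicitly keeps other malformed dates visible as errors instead of silently coercing them.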
Most occurring characters
Value | Count | Frequency (%) |
0 | 14337 | 23.6% |
- | 12162 | 20.0% |
2 | 11955 | 19.7% |
1 | 9813 | 16.1% |
9 | 4263 | 7.0% |
3 | 1902 | 3.1% |
7 | 1485 | 2.4% |
4 | 1443 | 2.4% |
6 | 1239 | 2.0% |
8 | 1110 | 1.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 48648 | 80.0% |
Dash Punctuation | 12162 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 14337 | 29.5% |
2 | 11955 | 24.6% |
1 | 9813 | 20.2% |
9 | 4263 | 8.8% |
3 | 1902 | 3.9% |
7 | 1485 | 3.1% |
4 | 1443 | 3.0% |
6 | 1239 | 2.5% |
8 | 1110 | 2.3% |
5 | 1101 | 2.3% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 12162 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 60810 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 14337 | 23.6% |
- | 12162 | 20.0% |
2 | 11955 | 19.7% |
1 | 9813 | 16.1% |
9 | 4263 | 7.0% |
3 | 1902 | 3.1% |
7 | 1485 | 2.4% |
4 | 1443 | 2.4% |
6 | 1239 | 2.0% |
8 | 1110 | 1.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 60810 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 14337 | 23.6% |
- | 12162 | 20.0% |
2 | 11955 | 19.7% |
1 | 9813 | 16.1% |
9 | 4263 | 7.0% |
3 | 1902 | 3.1% |
7 | 1485 | 2.4% |
4 | 1443 | 2.4% |
6 | 1239 | 2.0% |
8 | 1110 | 1.8% |
수량
Real number (ℝ)
HIGH CORRELATION
MISSING
Distinct | 13 |
---|---|
Distinct (%) | 0.2% |
Missing | 3231 |
Missing (%) | 34.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 36.439895 |
Minimum | 1 |
---|---|
Maximum | 1000 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 82.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 10 |
Q3 | 50 |
95-th percentile | 200 |
Maximum | 1000 |
Range | 999 |
Interquartile range (IQR) | 49 |
Descriptive statistics
Standard deviation | 71.641466 |
---|---|
Coefficient of variation (CV) | 1.9660174 |
Kurtosis | 84.783041 |
Mean | 36.439895 |
Median Absolute Deviation (MAD) | 9 |
Skewness | 7.0907156 |
Sum | 221591 |
Variance | 5132.4997 |
Monotonicity | Not monotonic |
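Several of the descriptive statistics above are derived from one another, which gives a cheap consistency check. The sketch below recomputes them from the reported figures only; the variable names are illustrative.

```python
import math

# Figures copied from the 수량 statistics tables.
mean = 36.439895
std = 71.641466
total = 221591
minimum, q1, q3, maximum = 1, 1, 50, 1000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum
iqr = q3 - q1                    # interquartile range
non_missing = total / mean       # sum / mean recovers the non-missing count

print(f"CV = {cv:.7f}")
print(f"variance = {variance:.4f}")
print(f"range = {value_range}, IQR = {iqr}")
print(f"non-missing rows = {non_missing:.0f}")
```

The recovered count matches 9312 observations minus 3231 missing values, so the mean is computed over non-missing rows only.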
Value | Count | Frequency (%) |
1 | 2041 | 21.9% |
10 | 1603 | 17.2% |
100 | 856 | 9.2% |
20 | 580 | 6.2% |
50 | 444 | 4.8% |
200 | 329 | 3.5% |
5 | 142 | 1.5% |
30 | 27 | 0.3% |
1000 | 16 | 0.2% |
15 | 16 | 0.2% |
Other values (3) | 27 | 0.3% |
(Missing) | 3231 | 34.7% |
Value | Count | Frequency (%) |
1 | 2041 | 21.9% |
4 | 10 | 0.1% |
5 | 142 | 1.5% |
10 | 1603 | 17.2% |
15 | 16 | 0.2% |
20 | 580 | 6.2% |
25 | 16 | 0.2% |
30 | 27 | 0.3% |
50 | 444 | 4.8% |
100 | 856 | 9.2% |
Value | Count | Frequency (%) |
1000 | 16 | 0.2% |
200 | 329 | 3.5% |
120 | 1 | < 0.1% |
100 | 856 | 9.2% |
50 | 444 | 4.8% |
30 | 27 | 0.3% |
25 | 16 | 0.2% |
20 | 580 | 6.2% |
15 | 16 | 0.2% |
10 | 1603 | 17.2% |
데이터기준일자
Categorical
HIGH CORRELATION
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 72.9 KiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 7.9181701 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2022-09-06 |
---|---|
2nd row | 2022-09-06 |
3rd row | 2022-09-06 |
4th row | 2022-09-06 |
5th row | 2022-09-06 |
Common Values
Value | Count | Frequency (%) |
2022-09-06 | 6081 | 65.3% |
<NA> | 3231 | 34.7% |
Unnamed: 6
Real number (ℝ)
HIGH CORRELATION
MISSING
Distinct | 20 |
---|---|
Distinct (%) | 0.3% |
Missing | 3231 |
Missing (%) | 34.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2261.3222 |
Minimum | 2000 |
---|---|
Maximum | 9999 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 82.0 KiB |
Quantile statistics
Minimum | 2000 |
---|---|
5-th percentile | 2003 |
Q1 | 2017 |
median | 2019 |
Q3 | 2021 |
95-th percentile | 2022 |
Maximum | 9999 |
Range | 7999 |
Interquartile range (IQR) | 4 |
Descriptive statistics
Standard deviation | 1374.5615 |
---|---|
Coefficient of variation (CV) | 0.60785746 |
Kurtosis | 27.747976 |
Mean | 2261.3222 |
Median Absolute Deviation (MAD) | 2 |
Skewness | 5.4532872 |
Sum | 13751100 |
Variance | 1889419.4 |
Monotonicity | Not monotonic |
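The mean (2261.3) sits far above the median (2019) because 186 rows hold the value 9999, the same count as the `9999-99-99` entries in 만료기간. Assuming 9999 is a placeholder rather than a real year, the sketch below shows how much it distorts the mean, using only figures reported above (variable names are illustrative):

```python
# Figures copied from the report.
total_sum = 13751100
n_observations = 9312
n_missing = 3231
sentinel = 9999          # assumption: placeholder, not a real year
sentinel_count = 186

non_missing = n_observations - n_missing

# Mean as reported, with the placeholder rows included.
mean_with = total_sum / non_missing

# Mean after removing the placeholder rows from both sum and count.
mean_without = (total_sum - sentinel * sentinel_count) / (non_missing - sentinel_count)

print(f"mean including 9999: {mean_with:.4f}")
print(f"mean excluding 9999: {mean_without:.2f}")
```

With the placeholder rows excluded, the mean falls to about 2017, in line with the quartiles (2017 to 2021).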
Value | Count | Frequency (%) |
2019 | 1701 | 18.3% |
2020 | 1107 | 11.9% |
2021 | 981 | 10.5% |
2022 | 372 | 4.0% |
2017 | 369 | 4.0% |
2003 | 267 | 2.9% |
2012 | 210 | 2.3% |
9999 | 186 | 2.0% |
2013 | 168 | 1.8% |
2016 | 123 | 1.3% |
Other values (10) | 597 | 6.4% |
(Missing) | 3231 | 34.7% |
Value | Count | Frequency (%) |
2000 | 15 | 0.2% |
2001 | 81 | 0.9% |
2002 | 36 | 0.4% |
2003 | 267 | 2.9% |
2004 | 93 | 1.0% |
2005 | 48 | 0.5% |
2006 | 48 | 0.5% |
2009 | 48 | 0.5% |
2010 | 105 | 1.1% |
2012 | 210 | 2.3% |
Value | Count | Frequency (%) |
9999 | 186 | 2.0% |
2022 | 372 | 4.0% |
2021 | 981 | 10.5% |
2020 | 1107 | 11.9% |
2019 | 1701 | 18.3% |
2017 | 369 | 4.0% |
2016 | 123 | 1.3% |
2015 | 39 | 0.4% |
2014 | 84 | 0.9% |
2013 | 168 | 1.8% |
Correlations
Variable | 봉투명 | 구분 | 만료기간 | 수량 | Unnamed: 6 |
---|---|---|---|---|---|
봉투명 | 1.000 | 0.000 | 0.100 | 0.649 | 0.183 |
구분 | 0.000 | 1.000 | 0.000 | 0.478 | 0.000 |
만료기간 | 0.100 | 0.000 | 1.000 | 0.139 | 1.000 |
수량 | 0.649 | 0.478 | 0.139 | 1.000 | 0.000 |
Unnamed: 6 | 0.183 | 0.000 | 1.000 | 0.000 | 1.000 |
Variable | 데이터기준일자 | 지정코드 | 구분 |
---|---|---|---|
데이터기준일자 | 1.000 | 1.000 | 1.000 |
지정코드 | 1.000 | 1.000 | 1.000 |
구분 | 1.000 | 1.000 | 1.000 |
Variable | 수량 | Unnamed: 6 | 지정코드 | 구분 | 데이터기준일자 |
---|---|---|---|---|---|
수량 | 1.000 | 0.017 | 1.000 | 0.190 | 1.000 |
Unnamed: 6 | 0.017 | 1.000 | 1.000 | 0.000 | 1.000 |
지정코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
구분 | 0.190 | 0.000 | 1.000 | 1.000 | 1.000 |
데이터기준일자 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
First rows
Index | 지정코드 | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자 | Unnamed: 6 |
---|---|---|---|---|---|---|---|
0 | 110308 | 불연성 10L | 1 | 2002-01-02 | 20 | 2022-09-06 | 2002 |
1 | 110308 | 불연성 10L | 2 | 2002-01-02 | 100 | 2022-09-06 | 2002 |
2 | 110308 | 불연성 10L | 3 | 2002-01-02 | 1 | 2022-09-06 | 2002 |
3 | 110308 | 불연성 10L | 1 | 2003-03-02 | 20 | 2022-09-06 | 2003 |
4 | 110308 | 불연성 10L | 2 | 2003-03-02 | 100 | 2022-09-06 | 2003 |
5 | 110308 | 불연성 10L | 3 | 2003-03-02 | 1 | 2022-09-06 | 2003 |
6 | 110308 | 불연성 10L | 1 | 2003-04-07 | 20 | 2022-09-06 | 2003 |
7 | 110308 | 불연성 10L | 2 | 2003-04-07 | 100 | 2022-09-06 | 2003 |
8 | 110308 | 불연성 10L | 3 | 2003-04-07 | 1 | 2022-09-06 | 2003 |
9 | 110308 | 불연성 20L | 1 | 2003-03-02 | 10 | 2022-09-06 | 2003 |
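In the rows shown above, `Unnamed: 6` equals the year portion of 만료기간, which would explain the 1.000 correlation between the two columns. This is only a spot check on displayed rows, not a claim about the whole dataset. A minimal sketch, with `year_of` as a hypothetical helper and the pairs copied from the table:

```python
# (만료기간, Unnamed: 6) pairs copied from the rows displayed above.
sample = [
    ("2002-01-02", 2002),
    ("2003-03-02", 2003),
    ("2003-04-07", 2003),
]

def year_of(expiry: str) -> int:
    """Extract the leading four-digit year from a YYYY-MM-DD string."""
    return int(expiry[:4])

# Collect any pair where the year prefix and the numeric column disagree.
mismatches = [(expiry, year) for expiry, year in sample if year_of(expiry) != year]
print(mismatches)
```

An empty list means every sampled pair agrees. Slicing the first four characters also works for the `9999-99-99` placeholder, which a strict date parser would reject.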
Last rows
Index | 지정코드 | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자 | Unnamed: 6 |
---|---|---|---|---|---|---|---|
9302 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9303 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9304 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9305 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9306 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9307 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9308 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9309 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9310 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9311 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Most frequently occurring
Index | 지정코드 | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자 | Unnamed: 6 | # duplicates |
---|---|---|---|---|---|---|---|---|
99 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3231 |
0 | 110308 | 스티커 1000원권(청라) | 1 | 2015-12-01 | 50 | 2022-09-06 | 2015 | 2 |
1 | 110308 | 스티커 1000원권(청라) | 1 | 2016-10-17 | 50 | 2022-09-06 | 2016 | 2 |
2 | 110308 | 스티커 1000원권(청라) | 1 | 2017-08-15 | 50 | 2022-09-06 | 2017 | 2 |
3 | 110308 | 스티커 1000원권(청라) | 1 | 2017-10-16 | 50 | 2022-09-06 | 2017 | 2 |
4 | 110308 | 스티커 1000원권(청라) | 1 | 2017-12-12 | 50 | 2022-09-06 | 2017 | 2 |
5 | 110308 | 스티커 1000원권(청라) | 1 | 2019-05-12 | 50 | 2022-09-06 | 2019 | 2 |
6 | 110308 | 스티커 1000원권(청라) | 1 | 2019-05-13 | 50 | 2022-09-06 | 2019 | 2 |
7 | 110308 | 스티커 1000원권(청라) | 1 | 2019-05-14 | 50 | 2022-09-06 | 2019 | 2 |
8 | 110308 | 스티커 1000원권(청라) | 1 | 2019-09-04 | 50 | 2022-09-06 | 2019 | 2 |