Overview

Dataset statistics

Number of variables: 5
Number of observations: 7221
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 20
Duplicate rows (%): 0.3%
Total size in memory: 296.3 KiB
Average record size in memory: 42.0 B
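
The two derived figures above (average record size and duplicate percentage) follow from the raw counts. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and one-decimal rounding as displayed in the report; the variable names are illustrative only:

```python
# Figures taken from the dataset statistics above.
n_observations = 7221
duplicate_rows = 20
total_size_kib = 296.3  # assumption: 1 KiB = 1024 bytes

# Average record size = total bytes / number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Share of duplicate rows as a percentage of all rows.
duplicate_pct = duplicate_rows / n_observations * 100

print(f"{avg_record_bytes:.1f} B")  # 42.0 B
print(f"{duplicate_pct:.1f}%")      # 0.3%
```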

Variable types

Text: 2
Categorical: 1
Numeric: 1
DateTime: 1

Dataset

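Description: Data on packaging units for volume-based waste disposal bags (쓰레기종량제봉투) in Seo-gu, Incheon Metropolitan City. It includes the bag name, category, expiration period, and quantity.
Author: 인천광역시 서구 (Seo-gu, Incheon Metropolitan City)
URL: https://www.data.go.kr/data/15090818/fileData.do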

Alerts

Constant: 데이터기준일자 (data reference date) has constant value "2023-12-06"
Duplicates: Dataset has 20 (0.3%) duplicate rows

Reproduction

Analysis started: 2024-03-14 17:25:51.713968
Analysis finished: 2024-03-14 17:25:52.834550
Duration: 1.12 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

봉투명 (bag name)
Text

Distinct: 64
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 56.5 KiB

Length

Max length: 14
Median length: 12
Mean length: 9.2966348
Min length: 5

Characters and Unicode

Total characters: 67131
Distinct characters: 51
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
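
As an illustration of those character properties, the sketch below tallies Unicode general categories for one sample value using Python's standard `unicodedata` module. This is an assumption about how such counts can be obtained, not necessarily how the profiling tool computes them; the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample value taken from the 봉투명 column.
counts = category_counts("불연성 10L")
print(counts)
```

In this output, `Lo` corresponds to Other Letter (the Hangul syllables), `Nd` to Decimal Number, `Zs` to Space Separator, and `Lu` to Uppercase Letter.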

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 불연성 10L
2nd row: 불연성 10L
3rd row: 불연성 10L
4th row: 불연성 10L
5th row: 불연성 10L

Value | Count | Frequency (%)
일반용 | 1398 | 9.8%
스티커 | 1368 | 9.5%
재활용 | 1059 | 7.4%
필증 | 969 | 6.8%
음식물 | 885 | 6.2%
불연성 | 516 | 3.6%
10l | 498 | 3.5%
사업계용 | 498 | 3.5%
20l | 492 | 3.4%
재사용 | 414 | 2.9%
Other values (48) | 6231 | 43.5%

Most occurring characters

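Value | Count | Frequency (%)
0 | 8598 | 12.8%
(space) | 7107 | 10.6%
L | 5991 | 8.9%
(Hangul character, missing) | 3807 | 5.7%
) | 3480 | 5.2%
( | 3480 | 5.2%
1 | 2415 | 3.6%
5 | 2082 | 3.1%
(Hangul character, missing) | 1878 | 2.8%
2 | 1692 | 2.5%
Other values (41) | 26601 | 39.6%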

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 30621 | 45.6%
Decimal Number | 16452 | 24.5%
Space Separator | 7107 | 10.6%
Uppercase Letter | 5991 | 8.9%
Close Punctuation | 3480 | 5.2%
Open Punctuation | 3480 | 5.2%

Most frequent character per category

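Other Letter
Value | Count | Frequency (%)
(Hangul character, missing) | 3807 | 12.4%
(Hangul character, missing) | 1878 | 6.1%
(Hangul character, missing) | 1605 | 5.2%
(Hangul character, missing) | 1473 | 4.8%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1200 | 3.9%
Other values (29) | 13758 | 44.9%

Decimal Number
Value | Count | Frequency (%)
0 | 8598 | 52.3%
1 | 2415 | 14.7%
5 | 2082 | 12.7%
2 | 1692 | 10.3%
3 | 1293 | 7.9%
6 | 312 | 1.9%
8 | 30 | 0.2%
9 | 30 | 0.2%

Space Separator
Value | Count | Frequency (%)
(space) | 7107 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
L | 5991 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3480 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3480 | 100.0%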

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 30621 | 45.6%
Common | 30519 | 45.5%
Latin | 5991 | 8.9%

Most frequent character per script

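Hangul
Value | Count | Frequency (%)
(Hangul character, missing) | 3807 | 12.4%
(Hangul character, missing) | 1878 | 6.1%
(Hangul character, missing) | 1605 | 5.2%
(Hangul character, missing) | 1473 | 4.8%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1200 | 3.9%
Other values (29) | 13758 | 44.9%

Common
Value | Count | Frequency (%)
0 | 8598 | 28.2%
(space) | 7107 | 23.3%
) | 3480 | 11.4%
( | 3480 | 11.4%
1 | 2415 | 7.9%
5 | 2082 | 6.8%
2 | 1692 | 5.5%
3 | 1293 | 4.2%
6 | 312 | 1.0%
8 | 30 | 0.1%

Latin
Value | Count | Frequency (%)
L | 5991 | 100.0%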

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 36510 | 54.4%
Hangul | 30621 | 45.6%

Most frequent character per block

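ASCII
Value | Count | Frequency (%)
0 | 8598 | 23.5%
(space) | 7107 | 19.5%
L | 5991 | 16.4%
) | 3480 | 9.5%
( | 3480 | 9.5%
1 | 2415 | 6.6%
5 | 2082 | 5.7%
2 | 1692 | 4.6%
3 | 1293 | 3.5%
6 | 312 | 0.9%
Other values (2) | 60 | 0.2%

Hangul
Value | Count | Frequency (%)
(Hangul character, missing) | 3807 | 12.4%
(Hangul character, missing) | 1878 | 6.1%
(Hangul character, missing) | 1605 | 5.2%
(Hangul character, missing) | 1473 | 4.8%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1398 | 4.6%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1368 | 4.5%
(Hangul character, missing) | 1200 | 3.9%
Other values (29) | 13758 | 44.9%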

구분 (category)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.5 KiB

Value | Count
1 | 2407
2 | 2407
3 | 2407

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 3
4th row: 1
5th row: 2

Common Values

Value | Count | Frequency (%)
1 | 2407 | 33.3%
2 | 2407 | 33.3%
3 | 2407 | 33.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values; 1, 2 and 3 each account for 2407 rows (33.3%)]

만료기간 (expiration period)
Text

Distinct: 66
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 56.5 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 72210
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2002-01-02
2nd row: 2002-01-02
3rd row: 2002-01-02
4th row: 2003-03-02
5th row: 2003-03-02

Value | Count | Frequency (%)
9999-99-99 | 195 | 2.7%
2022-10-24 | 189 | 2.6%
2022-10-18 | 189 | 2.6%
2023-09-14 | 189 | 2.6%
2022-12-11 | 189 | 2.6%
2022-10-25 | 189 | 2.6%
2022-10-11 | 186 | 2.6%
2022-04-13 | 186 | 2.6%
2022-04-12 | 186 | 2.6%
2021-12-23 | 186 | 2.6%
Other values (56) | 5337 | 73.9%
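
The most frequent value, 9999-99-99, is not a valid calendar date, which is a likely reason this column was profiled as Text rather than DateTime. A minimal parsing sketch follows. It assumes the placeholder means "no expiration", which the source does not confirm, and the helper name `parse_expiry` is hypothetical.

```python
from datetime import date
from typing import Optional

# Assumption: this placeholder marks bags without an expiration date.
SENTINEL = "9999-99-99"


def parse_expiry(value: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string, mapping the placeholder to None."""
    if value == SENTINEL:
        return None
    # Raises ValueError for any other malformed date string.
    return date.fromisoformat(value)


print(parse_expiry("2022-10-24"))
print(parse_expiry("9999-99-99"))
```

Mapping the placeholder to `None` keeps it out of date arithmetic while still letting the remaining values be treated as real dates.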

Most occurring characters

Value | Count | Frequency (%)
0 | 16410 | 22.7%
2 | 15726 | 21.8%
- | 14442 | 20.0%
1 | 11883 | 16.5%
9 | 4524 | 6.3%
3 | 2091 | 2.9%
4 | 1821 | 2.5%
7 | 1485 | 2.1%
8 | 1299 | 1.8%
5 | 1290 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 57768 | 80.0%
Dash Punctuation | 14442 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 16410 | 28.4%
2 | 15726 | 27.2%
1 | 11883 | 20.6%
9 | 4524 | 7.8%
3 | 2091 | 3.6%
4 | 1821 | 3.2%
7 | 1485 | 2.6%
8 | 1299 | 2.2%
5 | 1290 | 2.2%
6 | 1239 | 2.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 14442 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 72210 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 16410 | 22.7%
2 | 15726 | 21.8%
- | 14442 | 20.0%
1 | 11883 | 16.5%
9 | 4524 | 6.3%
3 | 2091 | 2.9%
4 | 1821 | 2.5%
7 | 1485 | 2.1%
8 | 1299 | 1.8%
5 | 1290 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 72210 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 16410 | 22.7%
2 | 15726 | 21.8%
- | 14442 | 20.0%
1 | 11883 | 16.5%
9 | 4524 | 6.3%
3 | 2091 | 2.9%
4 | 1821 | 2.5%
7 | 1485 | 2.1%
8 | 1299 | 1.8%
5 | 1290 | 1.8%

수량 (quantity)
Real number (ℝ)

Distinct: 13
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.558371
Minimum: 1
Maximum: 1000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 63.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 10
Q3: 50
95-th percentile: 200
Maximum: 1000
Range: 999
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 68.650515
Coefficient of variation (CV): 1.9306428
Kurtosis: 85.290681
Mean: 35.558371
Median Absolute Deviation (MAD): 9
Skewness: 6.9367767
Sum: 256767
Variance: 4712.8932
Monotonicity: Not monotonic
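
Several of these statistics are simple functions of the others. The sketch below recomputes them from the rounded values printed in the report, so small rounding differences are expected. The variable names are illustrative.

```python
import math

# Rounded values as printed in the report.
mean = 35.558371
std = 68.650515
q1, q3 = 1, 50
minimum, maximum = 1, 1000

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(round(cv, 4), round(variance, 2), iqr, value_range)
```

A CV close to 2 together with a skewness near 7 is consistent with a heavily right-skewed column: most rows hold small pack sizes, while a few rows reach 1000.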
[Figure: histogram with fixed size bins (bins=13)]

Common values

Value | Count | Frequency (%)
1 | 2421 | 33.5%
10 | 1867 | 25.9%
100 | 983 | 13.6%
20 | 732 | 10.1%
50 | 511 | 7.1%
200 | 387 | 5.4%
5 | 160 | 2.2%
15 | 40 | 0.6%
25 | 40 | 0.6%
30 | 39 | 0.5%
Other values (3) | 41 | 0.6%

Minimum 10 values

Value | Count | Frequency (%)
1 | 2421 | 33.5%
4 | 24 | 0.3%
5 | 160 | 2.2%
10 | 1867 | 25.9%
15 | 40 | 0.6%
20 | 732 | 10.1%
25 | 40 | 0.6%
30 | 39 | 0.5%
50 | 511 | 7.1%
100 | 983 | 13.6%

Maximum 10 values

Value | Count | Frequency (%)
1000 | 16 | 0.2%
200 | 387 | 5.4%
120 | 1 | < 0.1%
100 | 983 | 13.6%
50 | 511 | 7.1%
30 | 39 | 0.5%
25 | 40 | 0.6%
20 | 732 | 10.1%
15 | 40 | 0.6%
10 | 1867 | 25.9%
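
Because 수량 has only 13 distinct values, the frequency tables for this variable are enough to rebuild the headline statistics. A minimal sketch using only the counts listed in those tables; the dictionary name `value_counts` is illustrative:

```python
from statistics import median

# Value -> count, combined from the frequency tables for 수량.
value_counts = {
    1: 2421, 4: 24, 5: 160, 10: 1867, 15: 40, 20: 732, 25: 40,
    30: 39, 50: 511, 100: 983, 120: 1, 200: 387, 1000: 16,
}

n = sum(value_counts.values())                       # number of observations
total = sum(v * c for v, c in value_counts.items())  # sum of all quantities
mean = total / n

# Expand the table back into a sorted list to take the median.
values = [v for v, c in sorted(value_counts.items()) for _ in range(c)]
med = median(values)

print(n, total, round(mean, 6), med)
```

The recomputed count, sum, mean and median agree with the figures reported for this variable (7221, 256767, 35.558371 and 10).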

데이터기준일자 (data reference date)
Date

Constant

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.5 KiB
Minimum: 2023-12-06 00:00:00
Maximum: 2023-12-06 00:00:00

[Figure: histogram with fixed size bins (bins=1)]

Interactions

[Figure: interactions plot]

Correlations

 | 봉투명 | 구분 | 만료기간 | 수량
봉투명 | 1.000 | 0.000 | 0.000 | 0.644
구분 | 0.000 | 1.000 | 0.000 | 0.483
만료기간 | 0.000 | 0.000 | 1.000 | 0.142
수량 | 0.644 | 0.483 | 0.142 | 1.000

 | 수량 | 구분
수량 | 1.000 | 0.192
구분 | 0.192 | 1.000

Missing values

[Figure: bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

Index | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자
0 | 불연성 10L | 1 | 2002-01-02 | 20 | 2023-12-06
1 | 불연성 10L | 2 | 2002-01-02 | 100 | 2023-12-06
2 | 불연성 10L | 3 | 2002-01-02 | 1 | 2023-12-06
3 | 불연성 10L | 1 | 2003-03-02 | 20 | 2023-12-06
4 | 불연성 10L | 2 | 2003-03-02 | 100 | 2023-12-06
5 | 불연성 10L | 3 | 2003-03-02 | 1 | 2023-12-06
6 | 불연성 10L | 1 | 2003-04-07 | 20 | 2023-12-06
7 | 불연성 10L | 2 | 2003-04-07 | 100 | 2023-12-06
8 | 불연성 10L | 3 | 2003-04-07 | 1 | 2023-12-06
9 | 불연성 20L | 1 | 2003-03-02 | 10 | 2023-12-06

Last rows

Index | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자
7211 | 재사용 10L(청라) | 3 | 9999-99-99 | 1 | 2023-12-06
7212 | 재사용 20L(청라) | 1 | 9999-99-99 | 10 | 2023-12-06
7213 | 재사용 20L(청라) | 2 | 9999-99-99 | 100 | 2023-12-06
7214 | 재사용 20L(청라) | 3 | 9999-99-99 | 1 | 2023-12-06
7215 | 재사용 5L | 1 | 9999-99-99 | 20 | 2023-12-06
7216 | 재사용 5L | 2 | 9999-99-99 | 100 | 2023-12-06
7217 | 재사용 5L | 3 | 9999-99-99 | 1 | 2023-12-06
7218 | 재사용 5L(청라) | 1 | 9999-99-99 | 20 | 2023-12-06
7219 | 재사용 5L(청라) | 2 | 9999-99-99 | 100 | 2023-12-06
7220 | 재사용 5L(청라) | 3 | 9999-99-99 | 1 | 2023-12-06

Duplicate rows

Most frequently occurring

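Index | 봉투명 | 구분 | 만료기간 | 수량 | 데이터기준일자 | # duplicates
0 | 재활용 30L(주황) | 2 | 2021-12-23 | 20 | 2023-12-06 | 2
1 | 재활용 30L(주황) | 2 | 2022-04-12 | 20 | 2023-12-06 | 2
2 | 재활용 30L(주황) | 2 | 2022-04-13 | 20 | 2023-12-06 | 2
3 | 재활용 30L(주황) | 2 | 2022-10-11 | 20 | 2023-12-06 | 2
4 | 재활용 30L(주황) | 2 | 2022-10-18 | 20 | 2023-12-06 | 2
5 | 재활용 30L(주황) | 2 | 2022-10-24 | 20 | 2023-12-06 | 2
6 | 재활용 30L(주황) | 2 | 2022-10-25 | 20 | 2023-12-06 | 2
7 | 재활용 30L(주황) | 2 | 2022-12-11 | 20 | 2023-12-06 | 2
8 | 재활용 30L(주황) | 2 | 2022-12-11 | 20 | 2023-12-06 | 2
9 | 재활용 30L(주황) | 2 | 9999-99-99 | 20 | 2023-12-06 | 2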
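
A duplicate-row listing like this can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below uses only the standard library and is not the profiling tool's own implementation; the helper name `duplicate_groups` is hypothetical. The first two input rows come from the duplicate listing and the third from the sample of first rows.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(rows)
    return {row: c for row, c in counts.items() if c > 1}


# Columns: 봉투명, 구분, 만료기간, 수량, 데이터기준일자
rows = [
    ("재활용 30L(주황)", 2, "2021-12-23", 20, "2023-12-06"),
    ("재활용 30L(주황)", 2, "2021-12-23", 20, "2023-12-06"),
    ("불연성 10L", 1, "2002-01-02", 20, "2023-12-06"),
]

groups = duplicate_groups(rows)
for row, count in groups.items():
    print(row, count)
```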