Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 197
Duplicate rows (%): 2.0%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
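
The two derived figures in this block can be checked from the raw counts. A minimal sketch, assuming the percentage is duplicate rows divided by observations and the average record size is total memory divided by observations (the variable names are illustrative and not part of the report):

```python
# Values copied from the dataset statistics in this report.
n_observations = 10_000
n_duplicate_rows = 197
total_size_kib = 556.6

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```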

Variable types

Numeric: 1
Text: 1
DateTime: 2
Categorical: 2

Dataset

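Description: Bulky household waste collection record data for Gwangjin-gu, Seoul (수거실적처리전표번호 [collection record slip number], 품목 일련번호 [item serial number], 최초신청시간 [initial request time], 최초신청자 [initial requester], 최종작업시간 [final work time], 최종작업자 [final worker])
Author: Gwangjin-gu, Seoul (서울특별시 광진구)
URL: https://www.data.go.kr/data/15069870/fileData.do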

Alerts

Dataset has 197 (2.0%) duplicate rows (Duplicates)
최초신청자 is highly imbalanced (58.2%) (Imbalance)
최종작업자 is highly imbalanced (75.5%) (Imbalance)
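
The imbalance percentages can be read as one minus the normalized entropy of the class distribution. The sketch below assumes that definition; the report itself does not state the formula, and `imbalance_score` is a hypothetical helper. The report lists only the ten most common values of 최종작업자, so the remaining 234 rows are split almost evenly over the 8 unlisted categories here, which is also an assumption.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform column, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    # Shannon entropy of the class distribution (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Normalize by the maximum possible entropy for k classes.
    return 1.0 - entropy / math.log(k)


# 최종작업자: ten most common counts from the report, followed by an assumed
# near-even split of the 234 remaining rows over the 8 unlisted categories.
final_worker_counts = [8514, 783, 132, 66, 56, 49, 45, 42, 40, 39,
                       30, 30, 29, 29, 29, 29, 29, 29]

score = imbalance_score(final_worker_counts)
print(f"{score:.1%}")
```

Under this assumed split the score lands close to the 75.5% shown in the alert; a different split of the unlisted categories would shift it slightly.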

Reproduction

Analysis started: 2023-12-12 22:40:28.712789
Analysis finished: 2023-12-12 22:40:29.350282
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
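
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the exact command used for this report: `build_report`, the file paths, the report title and the encoding are all hypothetical, and the configuration stored in config.json is not reproduced.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write an HTML report (sketch)."""
    # Imported lazily so that defining this function does not require
    # pandas or ydata-profiling to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The source file's encoding is an assumption; adjust it as needed.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Build the profile and write it out as a standalone HTML page.
    report = ProfileReport(df, title="Collection records profile")
    report.to_file(html_path)
    return report
```

Calling `build_report("records.csv", "report.html")` with a local copy of the data would write the HTML file; both file names are placeholders.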

Variables

수거실적처리전표번호
Numeric

Distinct: 4752
Distinct (%): 47.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.02073 × 10^11
Minimum: 2.0201 × 10^11
Maximum: 2.022021 × 10^11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2.0201 × 10^11
5-th percentile: 2.0201 × 10^11
Q1: 2.02011 × 10^11
Median: 2.02102 × 10^11
Q3: 2.0210726 × 10^11
95-th percentile: 2.0211227 × 10^11
Maximum: 2.022021 × 10^11
Range: 1.9210025 × 10^8
Interquartile range (IQR): 96260252
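
The range and the IQR are simple differences of the figures above. A small check, using the unrounded minimum and maximum from the extreme-value listings in this report and the quartiles as displayed (so the IQR only matches to the displayed precision):

```python
# Extremes as listed in the report's extreme-value tables.
minimum = 202010000000
maximum = 202202100252

# Quartiles as displayed in the report (already rounded for display).
q1 = 2.02011e11
q3 = 2.0210726e11

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range)  # → 192100252
print(iqr)
```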

Descriptive statistics

Standard deviation: 53244981
Coefficient of variation (CV): 0.00026349379
Kurtosis: -0.72172815
Mean: 2.02073 × 10^11
Median Absolute Deviation (MAD): 9055286.5
Skewness: 0.13114652
Sum: 2.02073 × 10^15
Variance: 2.835028 × 10^15
Monotonicity: Not monotonic
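
Two of these figures follow directly from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A quick check with the displayed (rounded) values, so agreement is only expected to the shown precision:

```python
std = 53244981     # standard deviation as displayed in the report
mean = 2.02073e11  # mean as displayed (six significant digits)

cv = std / mean      # coefficient of variation: dispersion relative to the mean
variance = std ** 2  # variance is the square of the standard deviation

print(f"CV = {cv:.8g}")
print(f"Variance = {variance:.6e}")
```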
Histogram with fixed size bins (bins=50)
Common Values

Value                Count  Frequency (%)
202011000000         1814   18.1%
202010000000         1097   11.0%
202012000000         987    9.9%
202101000000         951    9.5%
202102000000         334    3.3%
202103080190         4      < 0.1%
202110010131         3      < 0.1%
202106180110         2      < 0.1%
202105240122         2      < 0.1%
202108140116         2      < 0.1%
Other values (4742)  4804   48.0%

Minimum 10 values

Value         Count  Frequency (%)
202010000000  1097   11.0%
202011000000  1814   18.1%
202012000000  987    9.9%
202101000000  951    9.5%
202102000000  334    3.3%
202102090133  1      < 0.1%
202102090149  1      < 0.1%
202102090173  1      < 0.1%
202102090176  1      < 0.1%
202102090268  1      < 0.1%

Maximum 10 values

Value         Count  Frequency (%)
202202100252  1      < 0.1%
202202100124  1      < 0.1%
202202090652  1      < 0.1%
202202090633  1      < 0.1%
202202090607  1      < 0.1%
202202090563  1      < 0.1%
202202090532  1      < 0.1%
202202090510  1      < 0.1%
202202090457  1      < 0.1%
202202090438  1      < 0.1%

품목 일련번호
Text

Distinct: 429
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 7
Mean length: 7.1237
Min length: 6

Characters and Unicode

Total characters: 71237
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
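
As an illustration of these character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample values listed for this text variable in the report, so the counts are for that small sample and not for the full column.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
samples = ["3015001", "2-20-16", "1035006", "2-20-20", "2-20-6"]

# Tally the Unicode general category of every character.
# "Nd" is Decimal Number and "Pd" is Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

# Length statistics over the same sample.
total_chars = sum(len(value) for value in samples)
mean_length = total_chars / len(samples)

print(category_counts)
print(total_chars, mean_length)
```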

Unique

Unique: 59
Unique (%): 0.6%

Sample

1st row: 3015001
2nd row: 2-20-16
3rd row: 1035006
4th row: 2-20-20
5th row: 2-20-6

Value               Count  Frequency (%)
2-20-20             616    6.2%
03-30-49            515    5.1%
4-40-18             411    4.1%
03-30-65            240    2.4%
1017001             228    2.3%
2-20-12             216    2.2%
1035002             202    2.0%
1009001             163    1.6%
1017003             137    1.4%
3015001             128    1.3%
Other values (419)  7144   71.4%

Most occurring characters

Value  Count  Frequency (%)
0      23275  32.7%
-      10366  14.6%
1      9383   13.2%
2      8920   12.5%
3      7461   10.5%
4      2813   3.9%
5      2501   3.5%
6      2089   2.9%
7      1833   2.6%
9      1813   2.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    60871  85.4%
Dash Punctuation  10366  14.6%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      23275  38.2%
1      9383   15.4%
2      8920   14.7%
3      7461   12.3%
4      2813   4.6%
5      2501   4.1%
6      2089   3.4%
7      1833   3.0%
9      1813   3.0%
8      783    1.3%

Dash Punctuation
Value  Count  Frequency (%)
-      10366  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  71237  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      23275  32.7%
-      10366  14.6%
1      9383   13.2%
2      8920   12.5%
3      7461   10.5%
4      2813   3.9%
5      2501   3.5%
6      2089   2.9%
7      1833   2.6%
9      1813   2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  71237  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      23275  32.7%
-      10366  14.6%
1      9383   13.2%
2      8920   12.5%
3      7461   10.5%
4      2813   3.9%
5      2501   3.5%
6      2089   2.9%
7      1833   2.6%
9      1813   2.5%

최초신청시간
DateTime

Distinct: 8165
Distinct (%): 81.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2020-10-01 14:39:00
Maximum: 2022-02-10 12:52:00
Histogram with fixed size bins (bins=50)

최초신청자
Categorical

IMBALANCE 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
guest              7636
dong14             206
dong03             202
dong04             190
dong07             181
Other values (12)  1585

Length

Max length: 6
Median length: 5
Mean length: 5.2363
Min length: 5

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: dong13
2nd row: guest
3rd row: guest
4th row: guest
5th row: guest

Common Values

Value             Count  Frequency (%)
guest             7636   76.4%
dong14            206    2.1%
dong03            202    2.0%
dong04            190    1.9%
dong07            181    1.8%
dong10            165    1.7%
dong09            165    1.7%
dong15            165    1.7%
dong02            162    1.6%
dong12            155    1.6%
Other values (7)  773    7.7%

Length

Histogram of lengths of the category

최종작업시간
DateTime

Distinct: 8008
Distinct (%): 80.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2020-10-02 11:04:00
Maximum: 2022-02-23 23:45:00
Histogram with fixed size bins (bins=50)

최종작업자
Categorical

IMBALANCE 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
pda01              8514
guest              783
env01              132
dong04             66
dong07             56
Other values (13)  449

Length

Max length: 6
Median length: 5
Mean length: 5.0571
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: pda01
2nd row: pda01
3rd row: pda01
4th row: pda01
5th row: pda01

Common Values

Value             Count  Frequency (%)
pda01             8514   85.1%
guest             783    7.8%
env01             132    1.3%
dong04            66     0.7%
dong07            56     0.6%
dong03            49     0.5%
dong02            45     0.4%
dong05            42     0.4%
dong09            40     0.4%
dong14            39     0.4%
Other values (8)  234    2.3%

Length

Histogram of lengths of the category

Interactions


Correlations

                      수거실적처리전표번호  최초신청자  최종작업자
수거실적처리전표번호  1.000                 0.440       0.292
최초신청자            0.440                 1.000       0.758
최종작업자            0.292                 0.758       1.000

            최초신청자  최종작업자
최초신청자  1.000       0.346
최종작업자  0.346       1.000

                      수거실적처리전표번호  최초신청자  최종작업자
수거실적처리전표번호  1.000                 0.278       0.175
최초신청자            0.278                 1.000       0.346
최종작업자            0.175                 0.346       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

       수거실적처리전표번호  품목 일련번호  최초신청시간      최초신청자  최종작업시간      최종작업자
83780  202110280426          3015001        2021-10-28 16:00  dong13      2021-10-30 07:30  pda01
13481  202011000000          2-20-16        2020-11-09 15:33  guest       2020-11-13 12:25  pda01
77091  202108300204          1035006        2021-08-30 10:12  guest       2021-09-01 12:50  pda01
31480  202012000000          2-20-20        2020-12-11 16:07  guest       2020-12-14 09:15  pda01
5206   202010000000          2-20-6         2020-10-16 10:23  guest       2020-10-17 06:05  pda01
10887  202011000000          03-30-49       2020-11-02 13:51  guest       2020-11-03 05:35  pda01
4774   202010000000          03-30-53       2020-10-15 12:04  guest       2020-10-15 12:09  pda01
30555  202012000000          2-20-32        2020-12-08 16:20  guest       2020-12-11 09:52  pda01
2946   202010000000          03-30-49       2020-10-12 10:42  guest       2020-10-12 16:14  pda01
61559  202105190276          1035006        2021-05-19 20:21  guest       2021-05-21 08:06  pda01

       수거실적처리전표번호  품목 일련번호  최초신청시간      최초신청자  최종작업시간      최종작업자
55839  202103270050          1019001        2021-03-27 11:28  guest       2021-03-29 14:24  pda01
171    202010000000          2-20-62        2020-10-05 08:59  guest       2020-10-13 12:45  pda01
39056  202101000000          1-20-14        2021-01-06 13:49  guest       2021-01-07 08:53  pda01
59678  202104190224          1003001        2021-04-19 10:46  dong13      2021-04-20 09:17  pda01
37105  202012000000          03-30-53       2020-12-30 11:51  guest       2021-01-04 09:58  pda01
81639  202110040088          6003003        2021-10-04 11:05  guest       2021-10-06 13:04  pda01
70343  202107130003          7041001        2021-07-13 00:13  guest       2021-07-14 13:14  pda01
50728  202102160428          6045003        2021-02-16 15:54  dong04      2021-02-18 12:56  pda01
63110  202105140069          1017002        2021-05-14 09:42  dong09      2021-05-14 12:58  pda01
838    202010000000          03-30-49       2020-10-05 15:09  guest       2020-10-06 10:33  pda01

Duplicate rows

Most frequently occurring

     수거실적처리전표번호  품목 일련번호  최초신청시간      최초신청자  최종작업시간      최종작업자  # duplicates
82   202011000000          2-20-20        2020-11-14 00:23  guest       2020-11-14 00:23  pda01       384
98   202011000000          4-40-18        2020-11-14 00:23  guest       2020-11-14 00:23  pda01       238
102  202011000000          4-40-18        2020-11-14 00:24  guest       2020-11-14 00:24  pda01       146
11   202010000000          03-30-49       2020-10-12 10:42  guest       2020-10-12 16:14  pda01       6
156  202101000000          03-30-49       2021-01-07 14:05  guest       2021-01-08 06:11  pda01       5
14   202010000000          03-30-49       2020-10-20 17:45  guest       2020-10-23 10:41  pda01       4
29   202010000000          2-20-16        2020-10-05 12:24  guest       2020-10-05 13:50  pda01       4
80   202011000000          2-20-17        2020-11-19 10:04  guest       2020-11-20 10:32  pda01       4
109  202012000000          03-30-49       2020-12-11 16:21  guest       2020-12-14 09:17  pda01       4
154  202101000000          03-30-49       2021-01-06 13:23  guest       2021-01-07 10:53  pda01       4
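
A duplicate table of this shape can be built by counting identical rows and keeping those that occur more than once. The sketch below shows one simple way to do that with the standard library; the rows are hypothetical, reduced to three columns, and this is not necessarily how ydata-profiling computes the table internally.

```python
from collections import Counter

# Hypothetical rows (품목 일련번호, 최초신청자, 최종작업자); not taken from the dataset.
rows = [
    ("2-20-20", "guest", "pda01"),
    ("2-20-20", "guest", "pda01"),
    ("4-40-18", "guest", "pda01"),
    ("2-20-20", "guest", "pda01"),
    ("03-30-49", "dong04", "pda01"),
    ("4-40-18", "guest", "pda01"),
]

# Count identical rows, then keep only those that occur more than once.
row_counts = Counter(rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

# List the duplicated rows, most frequent first.
for row, n in sorted(duplicates.items(), key=lambda item: item[1], reverse=True):
    print(row, n)
```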