Overview

Dataset statistics

Number of variables: 6
Number of observations: 2729
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 16
Duplicate rows (%): 0.6%
Total size in memory: 133.4 KiB
Average record size in memory: 50.0 B
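
The percentage rows follow directly from the counts. A minimal sketch of that arithmetic, assuming missing cells are measured against variables × observations and duplicate rows against observations (the helper name `pct` is illustrative, not part of the report):

```python
# Counts copied from the dataset statistics above.
n_variables = 6
n_observations = 2729
missing_cells = 0
duplicate_rows = 16


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


total_cells = n_variables * n_observations
missing_pct = pct(missing_cells, total_cells)
duplicate_pct = pct(duplicate_rows, n_observations)

print(total_cells)    # 16374
print(missing_pct)    # 0.0%
print(duplicate_pct)  # 0.6%
```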

Variable types

DateTime: 2
Categorical: 1
Text: 1
Numeric: 2

Dataset

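Description: Bulky-waste collection date, receipt date, payment method, waste type, quantity, and amount data provided by the Pocheon City bulky-waste system.
Author: Pocheon City, Gyeonggi Province (경기도 포천시)
URL: https://www.data.go.kr/data/15061837/fileData.do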

Alerts

Dataset has 16 (0.6%) duplicate rows [Duplicates]
수량 (quantity) is highly correlated overall with 금액 (amount) [High correlation]
금액 (amount) is highly correlated overall with 수량 (quantity) [High correlation]

Reproduction

Analysis started: 2023-12-12 18:11:28.814769
Analysis finished: 2023-12-12 18:11:29.742566
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
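
A report of this kind can be regenerated with ydata-profiling. The sketch below is not the author's original script: the function name `build_report`, the CSV file name in the usage comment, and the `cp949` encoding are assumptions, and the library defaults may differ from the `config.json` used for this run.

```python
from pathlib import Path


def build_report(csv_path: str, html_path: str = "report.html") -> Path:
    """Profile a CSV export of the dataset and write an HTML report."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(
        csv_path,
        encoding="cp949",  # assumption: adjust to the encoding of the downloaded file
        parse_dates=["접수일자", "수거일자"],  # the two DateTime variables
    )
    report = ProfileReport(df, title="Pocheon bulky-waste collection")
    report.to_file(html_path)
    return Path(html_path)


# Usage (hypothetical file name; use the CSV downloaded from the dataset URL):
# build_report("pocheon_bulky_waste.csv")
```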

Variables

접수일자 (receipt date)
DateTime

Distinct: 1272
Distinct (%): 46.6%
Missing: 0
Missing (%): 0.0%
Memory size: 21.4 KiB
Minimum: 2014-06-23 00:00:00
Maximum: 2021-09-24 00:00:00
Histogram with fixed size bins (bins=50)

수거일자 (collection date)
DateTime

Distinct: 622
Distinct (%): 22.8%
Missing: 0
Missing (%): 0.0%
Memory size: 21.4 KiB
Minimum: 2014-12-22 00:00:00
Maximum: 2021-09-30 00:00:00
Histogram with fixed size bins (bins=50)

결재방법 (payment method)
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 21.4 KiB

카드 (card): 2107
이체 (bank transfer): 609
현금 (cash): 13

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 현금 (cash)
2nd row: 현금 (cash)
3rd row: 현금 (cash)
4th row: 이체 (bank transfer)
5th row: 이체 (bank transfer)

Common Values

Value                 Count  Frequency (%)
카드 (card)           2107   77.2%
이체 (bank transfer)  609    22.3%
현금 (cash)           13     0.5%

Length

Histogram of lengths of the category


폐기물종류 (waste type)
Text

Distinct: 351
Distinct (%): 12.9%
Missing: 0
Missing (%): 0.0%
Memory size: 21.4 KiB

Length

Max length: 16
Median length: 12
Mean length: 5.815317
Min length: 2

Characters and Unicode

Total characters: 15870
Distinct characters: 159
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
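
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one value of this variable; it assumes the middle dot in that value is U+00B7, and the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One value of this variable; the middle dot is written as U+00B7 (assumed).
sample = "인형\u00b7장난감류 외 1종"
counts = category_counts(sample)

# Lo = Other Letter, Po = Other Punctuation, Zs = Space Separator, Nd = Decimal Number
print(dict(counts))  # {'Lo': 8, 'Po': 1, 'Zs': 2, 'Nd': 1}
```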

Unique

Unique: 132
Unique (%): 4.8%

Sample

1st row: 응접세트 (sofa set)
2nd row: 매트리스 (mattress)
3rd row: 매트리스 (mattress)
4th row: 인형·장난감류 외 1종 (dolls and toys, plus 1 other type)
5th row: 쌀통 (rice container)

Value                                      Count  Frequency (%)
(value missing)                            1214   23.3%
1종 (1 type)                               502    9.6%
매트리스 (mattress)                        342    6.6%
응접세트 (sofa set)                        301    5.8%
의자 (chair)                               285    5.5%
2종 (2 types)                              250    4.8%
서랍장 (chest of drawers)                  235    4.5%
교자상 (low dining table)                  136    2.6%
3종 (3 types)                              125    2.4%
침대(매트리스포함 (bed (including mattress)  121    2.3%
Other values (90)                          1695   32.6%

Most occurring characters

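Value              Count  Frequency (%)
(space)            2480   15.6%
(value missing)    1214   7.6%
(value missing)    1214   7.6%
(value missing)    782    4.9%
1                  579    3.6%
(value missing)    574    3.6%
(value missing)    522    3.3%
(value missing)    496    3.1%
(value missing)    479    3.0%
(value missing)    463    2.9%
Other values (149) 7067   44.5%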

Most occurring categories

Value              Count  Frequency (%)
Other Letter       11630  73.3%
Space Separator    2480   15.6%
Decimal Number     1283   8.1%
Open Punctuation   194    1.2%
Close Punctuation  194    1.2%
Other Punctuation  45     0.3%
Uppercase Letter   44     0.3%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(value missing)    1214   10.4%
(value missing)    1214   10.4%
(value missing)    782    6.7%
(value missing)    574    4.9%
(value missing)    522    4.5%
(value missing)    496    4.3%
(value missing)    479    4.1%
(value missing)    463    4.0%
(value missing)    316    2.7%
(value missing)    301    2.6%
Other values (131) 5269   45.3%

Decimal Number
Value  Count  Frequency (%)
1      579    45.1%
2      263    20.5%
3      129    10.1%
4      82     6.4%
5      73     5.7%
6      54     4.2%
7      37     2.9%
8      25     1.9%
9      22     1.7%
0      19     1.5%

Other Punctuation
Value  Count  Frequency (%)
,      24     53.3%
/      11     24.4%
·      10     22.2%

Uppercase Letter
Value  Count  Frequency (%)
V      22     50.0%
T      22     50.0%

Space Separator
Value    Count  Frequency (%)
(space)  2480   100.0%

Open Punctuation
Value  Count  Frequency (%)
(      194    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      194    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  11630  73.3%
Common  4196   26.4%
Latin   44     0.3%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(value missing)    1214   10.4%
(value missing)    1214   10.4%
(value missing)    782    6.7%
(value missing)    574    4.9%
(value missing)    522    4.5%
(value missing)    496    4.3%
(value missing)    479    4.1%
(value missing)    463    4.0%
(value missing)    316    2.7%
(value missing)    301    2.6%
Other values (131) 5269   45.3%

Common
Value             Count  Frequency (%)
(space)           2480   59.1%
1                 579    13.8%
2                 263    6.3%
(                 194    4.6%
)                 194    4.6%
3                 129    3.1%
4                 82     2.0%
5                 73     1.7%
6                 54     1.3%
7                 37     0.9%
Other values (6)  111    2.6%

Latin
Value  Count  Frequency (%)
V      22     50.0%
T      22     50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  11630  73.3%
ASCII   4230   26.7%
None    10     0.1%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
(space)           2480   58.6%
1                 579    13.7%
2                 263    6.2%
(                 194    4.6%
)                 194    4.6%
3                 129    3.0%
4                 82     1.9%
5                 73     1.7%
6                 54     1.3%
7                 37     0.9%
Other values (7)  145    3.4%

Hangul
Value              Count  Frequency (%)
(value missing)    1214   10.4%
(value missing)    1214   10.4%
(value missing)    782    6.7%
(value missing)    574    4.9%
(value missing)    522    4.5%
(value missing)    496    4.3%
(value missing)    479    4.1%
(value missing)    463    4.0%
(value missing)    316    2.7%
(value missing)    301    2.6%
Other values (131) 5269   45.3%

None
Value  Count  Frequency (%)
·      10     100.0%

수량 (quantity)
Real number (ℝ)

HIGH CORRELATION

Distinct: 48
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.913155
Minimum: 1
Maximum: 101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 24.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 4
95th percentile: 15
Maximum: 101
Range: 100
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.3370165
Coefficient of variation (CV): 1.6194136
Kurtosis: 46.596451
Mean: 3.913155
Median Absolute Deviation (MAD): 1
Skewness: 5.3621959
Sum: 10679
Variance: 40.157778
Monotonicity: Not monotonic
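
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A small consistency check using the values reported for 수량 (variable names are illustrative):

```python
import math

# Values copied from the 수량 (quantity) statistics.
n = 2729
total = 10679
std = 6.3370165
q1, q3 = 1, 4
minimum, maximum = 1, 101

mean = total / n                 # sum / number of observations
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

# Each derived value agrees with the report to the printed precision.
print(math.isclose(mean, 3.913155, abs_tol=1e-5))
print(math.isclose(cv, 1.6194136, abs_tol=1e-5))
print(math.isclose(variance, 40.157778, abs_tol=1e-4))
print(iqr, value_range)  # 3 100
```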
Histogram with fixed size bins (bins=48)

Common values

Value              Count  Frequency (%)
1                  1245   45.6%
2                  488    17.9%
3                  257    9.4%
4                  146    5.3%
5                  107    3.9%
6                  71     2.6%
7                  68     2.5%
8                  51     1.9%
9                  34     1.2%
11                 31     1.1%
Other values (38)  231    8.5%

Minimum 10 values

Value  Count  Frequency (%)
1      1245   45.6%
2      488    17.9%
3      257    9.4%
4      146    5.3%
5      107    3.9%
6      71     2.6%
7      68     2.5%
8      51     1.9%
9      34     1.2%
10     30     1.1%

Maximum 10 values

Value  Count  Frequency (%)
101    1      < 0.1%
80     1      < 0.1%
71     1      < 0.1%
69     1      < 0.1%
53     2      0.1%
48     2      0.1%
47     1      < 0.1%
44     1      < 0.1%
42     1      < 0.1%
41     1      < 0.1%

금액 (amount)
Real number (ℝ)

HIGH CORRELATION

Distinct: 132
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16147.307
Minimum: 2000
Maximum: 277000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 24.1 KiB

Quantile statistics

Minimum: 2000
5th percentile: 2000
Q1: 4000
Median: 7000
Q3: 15000
95th percentile: 63000
Maximum: 277000
Range: 275000
Interquartile range (IQR): 11000

Descriptive statistics

Standard deviation: 25810.916
Coefficient of variation (CV): 1.5984657
Kurtosis: 26.221466
Mean: 16147.307
Median Absolute Deviation (MAD): 4000
Skewness: 4.3234057
Sum: 44066000
Variance: 6.6620337 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
2000                362    13.3%
5000                359    13.2%
4000                228    8.4%
3000                200    7.3%
8000                189    6.9%
6000                144    5.3%
10000               142    5.2%
15000               86     3.2%
7000                77     2.8%
12000               68     2.5%
Other values (122)  874    32.0%

Minimum 10 values

Value  Count  Frequency (%)
2000   362    13.3%
3000   200    7.3%
4000   228    8.4%
5000   359    13.2%
6000   144    5.3%
7000   77     2.8%
8000   189    6.9%
9000   64     2.3%
10000  142    5.2%
11000  39     1.4%

Maximum 10 values

Value   Count  Frequency (%)
277000  2      0.1%
257000  1      < 0.1%
241000  1      < 0.1%
240000  1      < 0.1%
233000  1      < 0.1%
212000  1      < 0.1%
198000  1      < 0.1%
197000  1      < 0.1%
196000  1      < 0.1%
186000  1      < 0.1%

Interactions

[Figure: four pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

          결재방법  수량    금액
결재방법  1.000     0.044   0.049
수량      0.044     1.000   0.838
금액      0.049     0.838   1.000

[Figure: correlation heatmap]

          수량    금액    결재방법
수량      1.000   0.824   0.025
금액      0.824   1.000   0.033
결재방법  0.025   0.033   1.000
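
The report text does not say which coefficient each matrix uses. As one common choice for a numeric pair such as 수량 and 금액, the sketch below computes a Spearman rank correlation in plain Python. It runs on only the first ten sample rows of the dataset, so its result is not expected to match either matrix; all function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# 수량 and 금액 from the first ten sample rows of the dataset.
quantity = [1, 1, 1, 4, 1, 1, 5, 32, 29, 8]
amount = [12000, 5000, 5000, 8000, 3000, 5000, 11000, 105000, 98000, 40000]
rho = spearman(quantity, amount)
print(round(rho, 3))
```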

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
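
Nullity by column is simply the number of missing entries per variable. A minimal sketch in plain Python, assuming rows are dictionaries and `None` marks a missing value (the function name is illustrative); the two rows are taken from the sample of this dataset, which has no missing cells:

```python
def nullity_by_column(rows):
    """Count missing (None) values per column across a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


rows = [
    {"접수일자": "2014-11-27", "수거일자": "2014-12-22", "결재방법": "현금",
     "폐기물종류": "응접세트", "수량": 1, "금액": 12000},
    {"접수일자": "2014-11-27", "수거일자": "2014-12-22", "결재방법": "현금",
     "폐기물종류": "매트리스", "수량": 1, "금액": 5000},
]
nullity = nullity_by_column(rows)
print(nullity)  # every column maps to 0
```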

Sample

First rows

Columns: 접수일자 (receipt date), 수거일자 (collection date), 결재방법 (payment method), 폐기물종류 (waste type), 수량 (quantity), 금액 (amount)

      접수일자    수거일자    결재방법  폐기물종류            수량  금액
0     2014-11-27  2014-12-22  현금      응접세트              1     12000
1     2014-11-27  2014-12-22  현금      매트리스              1     5000
2     2014-11-27  2014-12-22  현금      매트리스              1     5000
3     2014-12-20  2014-12-24  이체      인형·장난감류 외 1종  4     8000
4     2014-12-20  2014-12-24  이체      쌀통                  1     3000
5     2014-12-19  2014-12-24  현금      매트리스              1     5000
6     2014-12-18  2014-12-24  이체      TV 다이 외 3종        5     11000
7     2014-12-20  2014-12-31  현금      서랍장 외 17종        32    105000
8     2014-12-20  2014-12-31  현금      서랍장 외 15종        29    98000
9     2015-01-06  2015-01-09  이체      장식장 외 7종         8     40000

Last rows

      접수일자    수거일자    결재방법  폐기물종류            수량  금액
2719  2021-09-18  2021-09-29  이체      응접세트 외 1종       3     15000
2720  2021-09-24  2021-09-30  카드      식탁                  7     28000
2721  2021-09-12  2021-09-30  이체      의자 외 4종           11    37000
2722  2021-09-17  2021-09-30  카드      의자 외 4종           11    33000
2723  2021-09-16  2021-09-30  카드      청소기                1     2000
2724  2021-09-13  2021-09-30  카드      컴퓨터 외 1종         3     7000
2725  2021-09-12  2021-09-30  이체      의자 외 4종           11    37000
2726  2021-09-01  2021-09-30  카드      응접세트              1     3000
2727  2021-09-09  2021-09-30  카드      매트리스              1     5000
2728  2021-09-19  2021-09-30  카드      장식장 외 1종         3     13000

Duplicate rows

Most frequently occurring

    접수일자    수거일자    결재방법  폐기물종류     수량  금액   # duplicates
11  2021-04-26  2021-05-07  카드      매트리스       1     8000   9
5   2020-07-13  2020-07-21  카드      서랍장         1     3000   6
10  2021-04-26  2021-05-07  카드      매트리스       1     5000   6
8   2021-01-06  2021-02-17  이체      의자           1     2000   3
0   2014-11-27  2014-12-22  현금      매트리스       1     5000   2
1   2017-03-27  2017-04-10  카드      응접세트       1     3000   2
2   2017-11-07  2017-11-15  카드      컴퓨터 외 4종  5     13000  2
3   2018-12-04  2018-12-05  카드      문갑           1     4000   2
4   2020-06-23  2020-08-05  카드      의자           1     2000   2
6   2020-07-16  2020-09-04  카드      변기           1     5000   2
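
The ten groups listed here already account for 36 rows (9 + 6 + 6 + 3 + 6 × 2), more than the 16 duplicate rows reported in the overview. That suggests the 16 counts distinct duplicated row patterns rather than individual rows. A minimal sketch of such grouping in plain Python, using rows from the sample of this dataset (the function name is illustrative and this is not the profiling library's own code):

```python
from collections import Counter

# Rows from the dataset sample:
# (접수일자, 수거일자, 결재방법, 폐기물종류, 수량, 금액)
rows = [
    ("2014-11-27", "2014-12-22", "현금", "응접세트", 1, 12000),
    ("2014-11-27", "2014-12-22", "현금", "매트리스", 1, 5000),
    ("2014-11-27", "2014-12-22", "현금", "매트리스", 1, 5000),
    ("2014-12-20", "2014-12-24", "이체", "쌀통", 1, 3000),
]


def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    return {row: n for row, n in Counter(rows).items() if n > 1}


groups = duplicate_groups(rows)

# "# duplicates" column of the table above.
shown_counts = [9, 6, 6, 3, 2, 2, 2, 2, 2, 2]

print(len(groups))        # 1 duplicated pattern in this small sample
print(sum(shown_counts))  # 36 rows covered by the ten groups shown
```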