Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 140
Duplicate rows (%): 1.4%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
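
The two derived figures in this table follow from the raw counts. The check below is a minimal sketch that assumes 1 KiB = 1024 bytes; the variable names are illustrative and do not come from the report.

```python
# Raw figures copied from the "Dataset statistics" table.
n_rows = 10_000
n_duplicate_rows = 140
total_kib = 312.5

# Share of rows flagged as duplicates, as a percentage.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```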

Variable types

Categorical: 1
Text: 2

Dataset

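Description: The Gangnam-gu illegal waste dumping enforcement records cover cases of illegally dumped waste and provide, for each case, the type of waste involved, the location of the violation, and the date and time of the violation.
Author: Gangnam-gu, Seoul Metropolitan City (서울특별시 강남구)
URL: https://www.data.go.kr/data/15127416/fileData.do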

Alerts

Duplicates: Dataset has 140 (1.4%) duplicate rows

Reproduction

Analysis started: 2024-04-21 11:27:29.031459
Analysis finished: 2024-04-21 11:27:29.965013
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
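
The reported duration can be recomputed from the two timestamps. This is a small sketch using only the standard library; it assumes both timestamps are in the same time zone, which the report does not state.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2024-04-21 11:27:29.031459")
finished = datetime.fromisoformat("2024-04-21 11:27:29.965013")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```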

Variables

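위반쓰레기 (type of waste in violation)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
담배꽁초 (cigarette butts) | 7520
혼합배출 (mixed, unsorted disposal) | 2480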

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 담배꽁초
2nd row: 담배꽁초
3rd row: 담배꽁초
4th row: 담배꽁초
5th row: 담배꽁초

Common Values

Value | Count | Frequency (%)
담배꽁초 | 7520 | 75.2%
혼합배출 | 2480 | 24.8%
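
A frequency table of this kind can be produced with a counter over the column values. The sketch below rebuilds the column from the published counts only, since the raw data is not part of this report; the list `values` is therefore a stand-in, not the original column.

```python
from collections import Counter

# Stand-in column reconstructed from the counts in the Common Values table.
values = ["담배꽁초"] * 7520 + ["혼합배출"] * 2480

counts = Counter(values)
total = sum(counts.values())

# Print each value with its count and share of all rows, most common first.
for value, count in counts.most_common():
    print(value, count, f"{100 * count / total:.1f}%")
```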

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values: 담배꽁초 7520 (75.2%), 혼합배출 2480 (24.8%)]

위반장소 (violation location)
Text

Distinct: 3694
Distinct (%): 36.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 23
Mean length: 12.6727
Min length: 3

Characters and Unicode

Total characters: 126727
Distinct characters: 389
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
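
As an illustration of how such character properties can be read, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It covers categories only (the standard library does not expose scripts or blocks), and the mapping `CATEGORY_NAMES` and the function name are illustrative choices, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Nd": "Decimal Number",
    "Zs": "Space Separator", "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Po": "Other Punctuation", "Sm": "Math Symbol",
    "Pc": "Connector Punctuation",
}

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# One of the sample values shown for this variable.
print(category_counts(["강남역 1번출구"]))
```

Hangul syllables fall under "Other Letter", which is why that category and the Hangul script share the same totals in the tables below.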

Unique

Unique: 2650
Unique (%): 26.5%
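
"Distinct" and "Unique" measure different things. The sketch below assumes the usual reading of these terms in profiling reports: distinct is the number of different values, and unique is the number of values that occur exactly once. The function name and the toy data are illustrative.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

# Toy data: one location repeated, two locations seen once.
toy = ["강남역 1번출구", "강남역 1번출구", "대치동 농협", "삼성동 하동관옆"]
print(distinct_and_unique(toy))
```

Under this reading, a column with 2 distinct values and 0 unique values, like 위반쓰레기 above, simply has every value occurring more than once.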

Sample

1st row: 강남역 1번출구
2nd row: 삼성동 하동관옆
3rd row: 대치동 농협
4th row: 역삼동 테헤란로1길40
5th row: 삼성동 삼성로96길20

Value | Count | Frequency (%)
역삼동 | 3958 | 16.7%
강남역 | 1246 | 5.3%
대치동 | 1210 | 5.1%
논현동 | 1201 | 5.1%
삼성동 | 983 | 4.2%
11번출구 | 712 | 3.0%
강남대로406 | 591 | 2.5%
1번출구 | 308 | 1.3%
신사동 | 283 | 1.2%
강남대로408 | 199 | 0.8%
Other values (3672) | 12971 | 54.8%
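
The counts in this table sum to 23,662, more than the 10,000 rows, so the frequencies appear to be computed over words rather than whole values. The sketch below assumes plain whitespace splitting, which the report does not confirm; it is run on the five sample rows only.

```python
from collections import Counter

def word_counts(values):
    """Count whitespace-separated tokens across all strings."""
    counts = Counter()
    for text in values:
        counts.update(text.split())
    return counts

# The five sample rows listed for this variable.
sample = [
    "강남역 1번출구",
    "삼성동 하동관옆",
    "대치동 농협",
    "역삼동 테헤란로1길40",
    "삼성동 삼성로96길20",
]
counts = word_counts(sample)
total = sum(counts.values())
for word, count in counts.most_common(3):
    print(word, count, f"{100 * count / total:.1f}%")
```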

Most occurring characters

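Value | Count | Frequency (%)
(space) | 13751 | 10.9%
(Hangul character not recovered) | 8829 | 7.0%
1 | 8137 | 6.4%
(Hangul character not recovered) | 6241 | 4.9%
(Hangul character not recovered) | 6035 | 4.8%
(Hangul character not recovered) | 5641 | 4.5%
(Hangul character not recovered) | 4454 | 3.5%
2 | 4093 | 3.2%
4 | 3663 | 2.9%
3 | 3116 | 2.5%
Other values (379) | 62767 | 49.5%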

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 70313 | 55.5%
Decimal Number | 34306 | 27.1%
Space Separator | 13751 | 10.9%
Dash Punctuation | 2730 | 2.2%
Open Punctuation | 2479 | 2.0%
Close Punctuation | 2476 | 2.0%
Uppercase Letter | 637 | 0.5%
Other Punctuation | 33 | < 0.1%
Math Symbol | 1 | < 0.1%
Connector Punctuation | 1 | < 0.1%

Most frequent character per category

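Other Letter
Value | Count | Frequency (%)
(Hangul character not recovered) | 8829 | 12.6%
(Hangul character not recovered) | 6241 | 8.9%
(Hangul character not recovered) | 6035 | 8.6%
(Hangul character not recovered) | 5641 | 8.0%
(Hangul character not recovered) | 4454 | 6.3%
(Hangul character not recovered) | 3087 | 4.4%
(Hangul character not recovered) | 2761 | 3.9%
(Hangul character not recovered) | 2749 | 3.9%
(Hangul character not recovered) | 1886 | 2.7%
(Hangul character not recovered) | 1886 | 2.7%
Other values (337) | 26744 | 38.0%

Uppercase Letter
Value | Count | Frequency (%)
G | 162 | 25.4%
C | 125 | 19.6%
F | 92 | 14.4%
T | 52 | 8.2%
K | 51 | 8.0%
S | 36 | 5.7%
V | 25 | 3.9%
A | 20 | 3.1%
M | 11 | 1.7%
U | 9 | 1.4%
Other values (13) | 54 | 8.5%

Decimal Number
Value | Count | Frequency (%)
1 | 8137 | 23.7%
2 | 4093 | 11.9%
4 | 3663 | 10.7%
3 | 3116 | 9.1%
6 | 2980 | 8.7%
0 | 2856 | 8.3%
5 | 2606 | 7.6%
8 | 2517 | 7.3%
7 | 2361 | 6.9%
9 | 1977 | 5.8%

Other Punctuation
Value | Count | Frequency (%)
& | 30 | 90.9%
. | 2 | 6.1%
: | 1 | 3.0%

Space Separator
Value | Count | Frequency (%)
(space) | 13751 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2730 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2479 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2476 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%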

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 70313 | 55.5%
Common | 55777 | 44.0%
Latin | 637 | 0.5%

Most frequent character per script

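Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 8829 | 12.6%
(Hangul character not recovered) | 6241 | 8.9%
(Hangul character not recovered) | 6035 | 8.6%
(Hangul character not recovered) | 5641 | 8.0%
(Hangul character not recovered) | 4454 | 6.3%
(Hangul character not recovered) | 3087 | 4.4%
(Hangul character not recovered) | 2761 | 3.9%
(Hangul character not recovered) | 2749 | 3.9%
(Hangul character not recovered) | 1886 | 2.7%
(Hangul character not recovered) | 1886 | 2.7%
Other values (337) | 26744 | 38.0%

Latin
Value | Count | Frequency (%)
G | 162 | 25.4%
C | 125 | 19.6%
F | 92 | 14.4%
T | 52 | 8.2%
K | 51 | 8.0%
S | 36 | 5.7%
V | 25 | 3.9%
A | 20 | 3.1%
M | 11 | 1.7%
U | 9 | 1.4%
Other values (13) | 54 | 8.5%

Common
Value | Count | Frequency (%)
(space) | 13751 | 24.7%
1 | 8137 | 14.6%
2 | 4093 | 7.3%
4 | 3663 | 6.6%
3 | 3116 | 5.6%
6 | 2980 | 5.3%
0 | 2856 | 5.1%
- | 2730 | 4.9%
5 | 2606 | 4.7%
8 | 2517 | 4.5%
Other values (9) | 9328 | 16.7%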

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 70313 | 55.5%
ASCII | 56414 | 44.5%

Most frequent character per block

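ASCII
Value | Count | Frequency (%)
(space) | 13751 | 24.4%
1 | 8137 | 14.4%
2 | 4093 | 7.3%
4 | 3663 | 6.5%
3 | 3116 | 5.5%
6 | 2980 | 5.3%
0 | 2856 | 5.1%
- | 2730 | 4.8%
5 | 2606 | 4.6%
8 | 2517 | 4.5%
Other values (32) | 9965 | 17.7%

Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 8829 | 12.6%
(Hangul character not recovered) | 6241 | 8.9%
(Hangul character not recovered) | 6035 | 8.6%
(Hangul character not recovered) | 5641 | 8.0%
(Hangul character not recovered) | 4454 | 6.3%
(Hangul character not recovered) | 3087 | 4.4%
(Hangul character not recovered) | 2761 | 3.9%
(Hangul character not recovered) | 2749 | 3.9%
(Hangul character not recovered) | 1886 | 2.7%
(Hangul character not recovered) | 1886 | 2.7%
Other values (337) | 26744 | 38.0%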

위반일시 (violation date and time)
Text

Distinct: 9058
Distinct (%): 90.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 16.0019
Min length: 16

Characters and Unicode

Total characters: 160019
Distinct characters: 16
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8242
Unique (%): 82.4%

Sample

1st row: 2022-04-21 11:06
2nd row: 2022-06-27 09:13
3rd row: 2023-11-08 09:55
4th row: 2022-02-23 13:50
5th row: 2022-08-12 10:28

Value | Count | Frequency (%)
10:30 | 211 | 1.1%
10:50 | 202 | 1.0%
10:40 | 192 | 1.0%
09:50 | 184 | 0.9%
11:00 | 177 | 0.9%
10:10 | 171 | 0.9%
10:00 | 171 | 0.9%
10:20 | 169 | 0.8%
09:40 | 159 | 0.8%
11:20 | 153 | 0.8%
Other values (927) | 18210 | 91.1%

Most occurring characters

Value | Count | Frequency (%)
2 | 35455 | 22.2%
0 | 32714 | 20.4%
1 | 21277 | 13.3%
- | 19998 | 12.5%
: | 10000 | 6.2%
(space) | 9999 | 6.2%
3 | 9074 | 5.7%
5 | 5377 | 3.4%
4 | 4601 | 2.9%
9 | 4211 | 2.6%
Other values (6) | 7313 | 4.6%

Most occurring categories

ValueCountFrequency (%)
Value | Count | Frequency (%)
Decimal Number | 120019 | 75.0%
Dash Punctuation | 19998 | 12.5%
Other Punctuation | 10000 | 6.2%
Space Separator | 9999 | 6.2%
Uppercase Letter | 2 | < 0.1%
Math Symbol | 1 | < 0.1%
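
The category counts hint that not every value follows the `YYYY-MM-DD HH:MM` pattern seen in the samples: there are 9,999 spaces for 10,000 rows, plus single occurrences of T, X and +. The sketch below is one way to locate such values with the standard library. The format string is an assumption taken from the sample rows, and the malformed example value is hypothetical, since the report does not show the actual outlier.

```python
from datetime import datetime

# Assumed from the sample rows; the report does not state the intended format.
EXPECTED_FORMAT = "%Y-%m-%d %H:%M"

def find_malformed(values, fmt=EXPECTED_FORMAT):
    """Return (position, value) pairs that strptime cannot parse with fmt."""
    bad = []
    for i, text in enumerate(values):
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            bad.append((i, text))
    return bad

values = [
    "2022-04-21 11:06",   # sample row from the report
    "2022-06-27 09:13",   # sample row from the report
    "20220627T0913+X",    # hypothetical malformed value, for illustration only
]
print(find_malformed(values))
```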

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 35455 | 29.5%
0 | 32714 | 27.3%
1 | 21277 | 17.7%
3 | 9074 | 7.6%
5 | 5377 | 4.5%
4 | 4601 | 3.8%
9 | 4211 | 3.5%
8 | 2589 | 2.2%
7 | 2520 | 2.1%
6 | 2201 | 1.8%

Uppercase Letter
Value | Count | Frequency (%)
T | 1 | 50.0%
X | 1 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 19998 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 10000 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 9999 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 160017 | > 99.9%
Latin | 2 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 35455 | 22.2%
0 | 32714 | 20.4%
1 | 21277 | 13.3%
- | 19998 | 12.5%
: | 10000 | 6.2%
(space) | 9999 | 6.2%
3 | 9074 | 5.7%
5 | 5377 | 3.4%
4 | 4601 | 2.9%
9 | 4211 | 2.6%
Other values (4) | 7311 | 4.6%

Latin
Value | Count | Frequency (%)
T | 1 | 50.0%
X | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 160019 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 35455 | 22.2%
0 | 32714 | 20.4%
1 | 21277 | 13.3%
- | 19998 | 12.5%
: | 10000 | 6.2%
(space) | 9999 | 6.2%
3 | 9074 | 5.7%
5 | 5377 | 3.4%
4 | 4601 | 2.9%
9 | 4211 | 2.6%
Other values (6) | 7313 | 4.6%

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
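
The per-column counts behind a nullity chart can be computed directly. The sketch below works on rows held as dictionaries and treats both `None` and the empty string as missing; that definition is an assumption made for this example, and the function name is illustrative.

```python
# Column names as they appear in the report.
COLUMNS = ["위반쓰레기", "위반장소", "위반일시"]

def missing_per_column(rows, columns=COLUMNS):
    """Return {column: number of cells that are None or an empty string}."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

# Two rows taken from the Sample section of the report.
rows = [
    {"위반쓰레기": "담배꽁초", "위반장소": "강남역 1번출구", "위반일시": "2022-04-21 11:06"},
    {"위반쓰레기": "담배꽁초", "위반장소": "삼성동 하동관옆", "위반일시": "2022-06-27 09:13"},
]
print(missing_per_column(rows))
```

For this dataset every count would be zero, matching the "Missing cells: 0" figure in the overview.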

Sample

(index) | 위반쓰레기 | 위반장소 | 위반일시
5051 | 담배꽁초 | 강남역 1번출구 | 2022-04-21 11:06
7829 | 담배꽁초 | 삼성동 하동관옆 | 2022-06-27 09:13
19815 | 담배꽁초 | 대치동 농협 | 2023-11-08 09:55
2455 | 담배꽁초 | 역삼동 테헤란로1길40 | 2022-02-23 13:50
9821 | 담배꽁초 | 삼성동 삼성로96길20 | 2022-08-12 10:28
7967 | 담배꽁초 | 강남역 1번출구 | 2022-07-01 13:10
3511 | 담배꽁초 | 역삼동 강남대로406 | 2022-03-22 10:15
11872 | 담배꽁초 | 역삼동 건강보험센터옆 | 2022-10-07 12:15
1839 | 담배꽁초 | 대치동 테헤란로78길8 | 2022-02-10 11:00
12732 | 담배꽁초 | 역삼동 테헤란로205 | 2022-10-27 10:43

(index) | 위반쓰레기 | 위반장소 | 위반일시
25647 | 혼합배출 | 삼성로71길 27-5 (대치동) | 2023-05-17 10:04
2087 | 담배꽁초 | 역삼동 테헤란로129 | 2022-02-15 10:40
2384 | 담배꽁초 | 강남역 1번출구 | 2022-02-21 12:44
17975 | 담배꽁초 | 삼성동 삼성역7번출구 | 2023-09-14 11:23
12955 | 담배꽁초 | 역삼동 테헤란로2길27 | 2022-11-01 13:18
21043 | 담배꽁초 | 논현동 선릉로129길5 | 2023-12-20 13:35
23521 | 혼합배출 | 테헤란로53길 60-8 (역삼동 693-11) | 2022-11-03 10:32
16486 | 담배꽁초 | 압구정동 CGV | 2023-08-03 10:40
15525 | 담배꽁초 | 역삼동 강남대로406 | 2023-01-06 12:10
8465 | 담배꽁초 | 역삼동 강남대로406 | 2022-07-13 12:12

Duplicate rows

Most frequently occurring

(index) | 위반쓰레기 | 위반장소 | 위반일시 | # duplicates
16 | 담배꽁초 | 강남역 1번출구 | 2022-07-11 10:10 | 3
59 | 담배꽁초 | 수서역 | 2022-11-23 10:30 | 3
116 | 담배꽁초 | 역삼동 패스트파이브앞길 | 2022-08-31 11:10 | 3
0 | 담배꽁초 | 강남역 11번출구 | 2022-01-05 13:34 | 2
1 | 담배꽁초 | 강남역 11번출구 | 2022-01-05 13:50 | 2
2 | 담배꽁초 | 강남역 11번출구 | 2022-01-06 13:48 | 2
3 | 담배꽁초 | 강남역 11번출구 | 2022-01-13 13:20 | 2
4 | 담배꽁초 | 강남역 11번출구 | 2022-01-18 12:30 | 2
5 | 담배꽁초 | 강남역 11번출구 | 2022-02-03 13:05 | 2
6 | 담배꽁초 | 강남역 11번출구 | 2022-05-30 11:50 | 2
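
A table like this can be built by counting identical rows. The report does not say whether its figure of 140 counts distinct repeated rows or the surplus copies, so the sketch below returns both; the function name is illustrative and the input is a small toy subset modelled on the rows above, not the full dataset.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (repeated rows with counts, number of repeated groups, surplus copies)."""
    counts = Counter(rows)
    # Keep only rows that occur more than once.
    groups = {row: n for row, n in counts.items() if n > 1}
    # Surplus copies are the rows that would be dropped when deduplicating.
    surplus = sum(n - 1 for n in groups.values())
    return groups, len(groups), surplus

# Toy subset: one row seen three times, one seen twice, one seen once.
a = ("담배꽁초", "강남역 1번출구", "2022-07-11 10:10")
b = ("담배꽁초", "강남역 11번출구", "2022-01-05 13:34")
c = ("담배꽁초", "대치동 농협", "2023-11-08 09:55")
rows = [a, a, a, b, b, c]

groups, n_groups, n_surplus = duplicate_summary(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(*row, n, sep=" | ")
print("repeated groups:", n_groups, "surplus copies:", n_surplus)
```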