Overview

Dataset statistics

Number of variables: 7
Number of observations: 5376
Missing cells: 5640
Missing cells (%): 15.0%
Duplicate rows: 761
Duplicate rows (%): 14.2%
Total size in memory: 299.4 KiB
Average record size in memory: 57.0 B
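
The two percentages can be checked from the counts above. The following is a minimal sketch: the variable names are mine, and it assumes the missing-cell share is taken over the full grid of variables times observations and the duplicate share over the number of observations.

```python
# Counts copied from the dataset statistics above.
n_variables = 7
n_observations = 5376
missing_cells = 5640
duplicate_rows = 761

# Assumption: the missing-cell share is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate share is taken over the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"{missing_pct:.1f}% missing, {duplicate_pct:.1f}% duplicates")
```

Under those assumptions both figures round to the values shown in the report (15.0% and 14.2%).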

Variable types

Unsupported: 3
Categorical: 2
Text: 2

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15069/F/1/datasetView.do

Alerts

Dataset has 761 (14.2%) duplicate rows [Duplicates]
Unnamed: 2 is highly overall correlated with Unnamed: 6 [High correlation]
Unnamed: 6 is highly overall correlated with Unnamed: 2 [High correlation]
Unnamed: 0 has 5376 (100.0%) missing values [Missing]
Unnamed: 3 has 232 (4.3%) missing values [Missing]
Unnamed: 0 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
서울시 가로쓰레기통 현황 (2021.6월 기준) is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-03-13 06:32:29.776233
Analysis finished: 2024-03-13 06:32:30.521909
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 5376
Missing (%): 100.0%
Memory size: 47.4 KiB

서울시 가로쓰레기통 현황 (2021.6월 기준)
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 42.1 KiB

Unnamed: 2
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 42.1 KiB
강남구 | 976
구로구 | 292
서대문구 | 290
강서 | 287
도봉구 | 280
Other values (21) | 3251

Length

Max length: 4
Median length: 3
Mean length: 3.0228795
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 자치구명
2nd row: 종로구
3rd row: 종로구
4th row: 종로구
5th row: 종로구

Common Values

Value | Count | Frequency (%)
강남구 | 976 | 18.2%
구로구 | 292 | 5.4%
서대문구 | 290 | 5.4%
강서 | 287 | 5.3%
도봉구 | 280 | 5.2%
용산구 | 269 | 5.0%
서초구 | 250 | 4.7%
강북구 | 247 | 4.6%
마포구 | 239 | 4.4%
은평구 | 233 | 4.3%
Other values (16) | 2013 | 37.4%

Length

Histogram of lengths of the category

Unnamed: 3
Text

MISSING 

Distinct: 587
Distinct (%): 11.4%
Missing: 232
Missing (%): 4.3%
Memory size: 42.1 KiB

Length

Max length: 24
Median length: 3
Mean length: 3.8028771
Min length: 2

Characters and Unicode

Total characters: 19562
Distinct characters: 272
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
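
As an illustration of how such a category tally can be produced, here is a minimal sketch using Python's standard `unicodedata` module. The function name and the sample strings are mine and are not taken from the report's implementation. The two-letter codes correspond to the category names used in this report, for example `Lo` for Other Letter, `Nd` for Decimal Number and `Zs` for Space Separator.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Tally Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        if value is None:  # skip missing cells
            continue
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# Illustrative strings only; None stands in for a missing value.
sample = ["사직로", "자하문로", "노해로62길", None]
counts = category_counts(sample)
print(counts)
```

Hangul syllables fall under `Lo`, which is consistent with Other Letter dominating the tables for this column.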

Unique

Unique: 179
Unique (%): 3.5%

Sample

1st row: 도로명
2nd row: 사직로
3rd row: 사직로
4th row: 자하문로
5th row: 자하문로
Value | Count | Frequency (%)
도봉로 | 150 | 2.8%
남부순환로 | 116 | 2.2%
영동대로 | 108 | 2.0%
통일로 | 94 | 1.8%
천호대로 | 81 | 1.5%
테헤란로 | 80 | 1.5%
삼성로 | 78 | 1.5%
경인로 | 71 | 1.3%
봉은사로 | 70 | 1.3%
한강대로 | 68 | 1.3%
Other values (584) | 4447 | 82.9%

Most occurring characters

ValueCountFrequency (%)
4943
25.3%
742
 
3.8%
585
 
3.0%
337
 
1.7%
327
 
1.7%
324
 
1.7%
279
 
1.4%
252
 
1.3%
251
 
1.3%
227
 
1.2%
Other values (262) 11295
57.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 18177 | 92.9%
Decimal Number | 1049 | 5.4%
Space Separator | 251 | 1.3%
Control | 23 | 0.1%
Open Punctuation | 20 | 0.1%
Close Punctuation | 20 | 0.1%
Dash Punctuation | 15 | 0.1%
Math Symbol | 6 | < 0.1%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4943
27.2%
742
 
4.1%
585
 
3.2%
337
 
1.9%
327
 
1.8%
324
 
1.8%
279
 
1.5%
252
 
1.4%
227
 
1.2%
213
 
1.2%
Other values (245) 9948
54.7%
Decimal Number
ValueCountFrequency (%)
1 223
21.3%
2 152
14.5%
5 112
10.7%
3 98
9.3%
0 85
 
8.1%
9 84
 
8.0%
7 78
 
7.4%
8 77
 
7.3%
4 75
 
7.1%
6 65
 
6.2%
Space Separator
ValueCountFrequency (%)
251
100.0%
Control
ValueCountFrequency (%)
23
100.0%
Open Punctuation
ValueCountFrequency (%)
( 20
100.0%
Close Punctuation
ValueCountFrequency (%)
) 20
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15
100.0%
Math Symbol
ValueCountFrequency (%)
~ 6
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 18177 | 92.9%
Common | 1385 | 7.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4943
27.2%
742
 
4.1%
585
 
3.2%
337
 
1.9%
327
 
1.8%
324
 
1.8%
279
 
1.5%
252
 
1.4%
227
 
1.2%
213
 
1.2%
Other values (245) 9948
54.7%
Common
ValueCountFrequency (%)
251
18.1%
1 223
16.1%
2 152
11.0%
5 112
8.1%
3 98
 
7.1%
0 85
 
6.1%
9 84
 
6.1%
7 78
 
5.6%
8 77
 
5.6%
4 75
 
5.4%
Other values (7) 150
10.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 18177 | 92.9%
ASCII | 1385 | 7.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4943
27.2%
742
 
4.1%
585
 
3.2%
337
 
1.9%
327
 
1.8%
324
 
1.8%
279
 
1.5%
252
 
1.4%
227
 
1.2%
213
 
1.2%
Other values (245) 9948
54.7%
ASCII
ValueCountFrequency (%)
251
18.1%
1 223
16.1%
2 152
11.0%
5 112
8.1%
3 98
 
7.1%
0 85
 
6.1%
9 84
 
6.1%
7 78
 
5.6%
8 77
 
5.6%
4 75
 
5.4%
Other values (7) 150
10.8%

Unnamed: 4
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 42.1 KiB

Unnamed: 5
Text

Distinct: 332
Distinct (%): 6.2%
Missing: 32
Missing (%): 0.6%
Memory size: 42.1 KiB

Length

Max length: 23
Median length: 21
Mean length: 10.427208
Min length: 1

Characters and Unicode

Total characters: 55723
Distinct characters: 336
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 274
Unique (%): 5.1%

Sample

1st row: 설치 지점
2nd row: 지하철역 입구
3rd row: 지하철역 입구
4th row: 도로(가로)변
5th row:  도로(가로)변
ValueCountFrequency (%)
2107
16.9%
택시 1880
15.1%
정류소(버스 1425
11.4%
도로변(횡단보도 954
 
7.7%
포함 954
 
7.7%
도로(가로)변 542
 
4.4%
입구 530
 
4.3%
정류장(버스 455
 
3.7%
지하철역 452
 
3.6%
394
 
3.2%
Other values (375) 2756
22.1%

Most occurring characters

ValueCountFrequency (%)
7250
 
13.0%
) 3543
 
6.4%
( 3543
 
6.4%
2593
 
4.7%
2548
 
4.6%
2537
 
4.6%
2395
 
4.3%
2365
 
4.2%
2358
 
4.2%
2119
 
3.8%
Other values (326) 24472
43.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 38415 | 68.9%
Space Separator | 7250 | 13.0%
Close Punctuation | 3543 | 6.4%
Open Punctuation | 3543 | 6.4%
Other Punctuation | 1854 | 3.3%
Other Number | 924 | 1.7%
Decimal Number | 150 | 0.3%
Uppercase Letter | 38 | 0.1%
Lowercase Letter | 6 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2593
 
6.7%
2548
 
6.6%
2537
 
6.6%
2395
 
6.2%
2365
 
6.2%
2358
 
6.1%
2119
 
5.5%
2024
 
5.3%
2021
 
5.3%
1624
 
4.2%
Other values (287) 15831
41.2%
Uppercase Letter
ValueCountFrequency (%)
S 7
18.4%
U 6
15.8%
G 4
10.5%
C 4
10.5%
A 3
7.9%
K 3
7.9%
T 3
7.9%
W 2
 
5.3%
B 2
 
5.3%
Y 2
 
5.3%
Other values (2) 2
 
5.3%
Decimal Number
ValueCountFrequency (%)
1 45
30.0%
2 26
17.3%
0 24
16.0%
3 16
 
10.7%
4 12
 
8.0%
9 6
 
4.0%
6 6
 
4.0%
5 5
 
3.3%
8 5
 
3.3%
7 5
 
3.3%
Other Number
ValueCountFrequency (%)
394
42.6%
219
23.7%
104
 
11.3%
103
 
11.1%
37
 
4.0%
36
 
3.9%
31
 
3.4%
Lowercase Letter
ValueCountFrequency (%)
e 1
16.7%
w 1
16.7%
o 1
16.7%
r 1
16.7%
l 1
16.7%
d 1
16.7%
Space Separator
ValueCountFrequency (%)
7250
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3543
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3543
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1854
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 38415 | 68.9%
Common | 17264 | 31.0%
Latin | 44 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2593
 
6.7%
2548
 
6.6%
2537
 
6.6%
2395
 
6.2%
2365
 
6.2%
2358
 
6.1%
2119
 
5.5%
2024
 
5.3%
2021
 
5.3%
1624
 
4.2%
Other values (287) 15831
41.2%
Common
ValueCountFrequency (%)
7250
42.0%
) 3543
20.5%
( 3543
20.5%
, 1854
 
10.7%
394
 
2.3%
219
 
1.3%
104
 
0.6%
103
 
0.6%
1 45
 
0.3%
37
 
0.2%
Other values (11) 172
 
1.0%
Latin
ValueCountFrequency (%)
S 7
15.9%
U 6
13.6%
G 4
9.1%
C 4
9.1%
A 3
 
6.8%
K 3
 
6.8%
T 3
 
6.8%
W 2
 
4.5%
B 2
 
4.5%
Y 2
 
4.5%
Other values (8) 8
18.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 38415 | 68.9%
ASCII | 16384 | 29.4%
Enclosed Alphanum | 924 | 1.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7250
44.3%
) 3543
21.6%
( 3543
21.6%
, 1854
 
11.3%
1 45
 
0.3%
2 26
 
0.2%
0 24
 
0.1%
3 16
 
0.1%
4 12
 
0.1%
S 7
 
< 0.1%
Other values (22) 64
 
0.4%
Hangul
ValueCountFrequency (%)
2593
 
6.7%
2548
 
6.6%
2537
 
6.6%
2395
 
6.2%
2365
 
6.2%
2358
 
6.1%
2119
 
5.5%
2024
 
5.3%
2021
 
5.3%
1624
 
4.2%
Other values (287) 15831
41.2%
Enclosed Alphanum
ValueCountFrequency (%)
394
42.6%
219
23.7%
104
 
11.3%
103
 
11.1%
37
 
4.0%
36
 
3.9%
31
 
3.4%

Unnamed: 6
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 42.1 KiB
일반쓰레기 | 1896
재활용쓰레기 | 586
재활용 | 522
일반쓰레기 수거용 | 380
① 일반쓰레기 | 329
Other values (21) | 1663

Length

Max length: 27
Median length: 15
Mean length: 6.2613467
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 수거 쓰레기 종류(일반 쓰레기 / 재활용 쓰레기)
2nd row: 일반쓰레기
3rd row: 재활용
4th row: 일반쓰레기
5th row: 재활용

Common Values

Value | Count | Frequency (%)
일반쓰레기 | 1896 | 35.3%
재활용쓰레기 | 586 | 10.9%
재활용 | 522 | 9.7%
일반쓰레기 수거용 | 380 | 7.1%
① 일반쓰레기 | 329 | 6.1%
일반쓰레기수거용 | 250 | 4.7%
재활용쓰레기 | 205 | 3.8%
일반쓰레기 | 205 | 3.8%
① 정류장(버스 등) | 159 | 3.0%
② 재활용품 수거용 | 153 | 2.8%
Other values (16) | 691 | 12.9%
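
Several of the labels above (for example 일반쓰레기, 일반쓰레기 수거용, ① 일반쓰레기 and 일반쓰레기수거용) appear to name the same waste type, and the labels that are listed twice may differ only in whitespace that is not visible here. One possible clean-up is sketched below. The function name, the two output classes and the matching rules are my assumptions, inferred only from the labels listed in this table; they are not part of the dataset or the report.

```python
import re

# Leading circled numerals (U+2460 to U+2473) followed by optional whitespace.
_CIRCLED_PREFIX = re.compile(r"^[\u2460-\u2473]\s*")

def normalize_waste_type(label):
    """Collapse label variants into 'general', 'recyclable', or None.

    Hypothetical mapping: prefixes are matched after removing a leading
    circled numeral and all whitespace.
    """
    if label is None:
        return None
    text = _CIRCLED_PREFIX.sub("", label.strip())
    text = "".join(text.split())  # drop every remaining whitespace character
    if text.startswith("일반쓰레기"):
        return "general"
    if text.startswith("재활용"):
        return "recyclable"
    # Anything else, such as header text or a value from another column.
    return None

print(normalize_waste_type("① 일반쓰레기"))
print(normalize_waste_type("② 재활용품 수거용"))
print(normalize_waste_type("① 정류장(버스 등)"))
```

Values such as ① 정류장(버스 등) look like installation points rather than waste types, so the sketch returns None for them instead of guessing.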

Length

Histogram of lengths of the category
ValueCountFrequency (%)
일반쓰레기 2947
39.6%
재활용쓰레기 886
 
11.9%
수거용 760
 
10.2%
628
 
8.4%
재활용 613
 
8.2%
260
 
3.5%
일반쓰레기수거용 250
 
3.4%
168
 
2.3%
정류장(버스 162
 
2.2%
재활용품 153
 
2.1%
Other values (19) 622
 
8.4%

Correlations

Variable | Unnamed: 2 | Unnamed: 6
Unnamed: 2 | 1.000 | 0.973
Unnamed: 6 | 0.973 | 1.000

Variable | Unnamed: 6 | Unnamed: 2
Unnamed: 6 | 1.000 | 0.561
Unnamed: 2 | 0.561 | 1.000

Variable | Unnamed: 2 | Unnamed: 6
Unnamed: 2 | 1.000 | 0.561
Unnamed: 6 | 0.561 | 1.000
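
The extracted text does not say which coefficient each matrix reports. To my understanding, ydata-profiling uses Cramér's V for pairs of categorical variables, so the sketch below shows how that statistic is computed from paired category values. This is the plain, uncorrected form, the function name is mine, and it should not be expected to reproduce 0.973 or 0.561 exactly.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Uncorrected Cramér's V for an iterable of (x, y) category pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return float("nan")  # undefined with no data or a single level
    chi2 = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Toy inputs: one perfectly associated pairing, one independent pairing.
perfect = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(cramers_v(perfect), cramers_v(independent))
```

The statistic ranges from 0 (no association) to 1 (one variable fully determines the other). A high value between a district column and a label column would be consistent with each district using its own labelling convention.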

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Index | Unnamed: 0 | 서울시 가로쓰레기통 현황 (2021.6월 기준) | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
0 | <NA> | 연번 | 자치구명 | 도로명 | 세부 위치(상세 주소) | 설치 지점 | 수거 쓰레기 종류(일반 쓰레기 / 재활용 쓰레기)
1 | <NA> | 1 | 종로구 | 사직로 | 경복궁역 4번출구 | 지하철역 입구 | 일반쓰레기
2 | <NA> | 2 | 종로구 | 사직로 | 경복궁역 4번출구 | 지하철역 입구 | 재활용
3 | <NA> | 3 | 종로구 | 자하문로 | 자하문로 28 | 도로(가로)변 | 일반쓰레기
4 | <NA> | 4 | 종로구 | 자하문로 | 자하문로 28 | 도로(가로)변 | 재활용
5 | <NA> | 5 | 종로구 | 자하문로 | 자하문로 44 | 도로(가로)변 | 일반쓰레기
6 | <NA> | 6 | 종로구 | 자하문로 | 자하문로 44 | 도로(가로)변 | 재활용
7 | <NA> | 7 | 종로구 | 자하문로 | 자하문로 68(효자동 정류소) | 정류장(버스, 택시 등) | 일반쓰레기
8 | <NA> | 8 | 종로구 | 자하문로 | 자하문로 68(효자동 정류소) | 정류장(버스, 택시 등) | 재활용
9 | <NA> | 9 | 종로구 | 효자로 | 청와대 분수대(사랑채) | 광장 등 다중집합장소 | 일반쓰레기

Last rows

Index | Unnamed: 0 | 서울시 가로쓰레기통 현황 (2021.6월 기준) | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
5366 | <NA> | 5366 | 도봉구 | 해등로 | 해등로 48 | 창1동주민센터 앞 | ⑥ 횡단보도 입구
5367 | <NA> | 5367 | 도봉구 | 해등로 | 해등로 139 | 미소애아파트 앞 버스정류장 | ① 정류장(버스, 택시 등)
5368 | <NA> | 5368 | 도봉구 | 해등로 | 창동 347-5 | 창1동주민센터 버스정류장 | ① 정류장(버스, 택시 등)
5369 | <NA> | 5369 | 도봉구 | 노해로 | 노해로62길 36 | 창북중학교 앞 횡단보도 | ⑥ 횡단보도 입구
5370 | <NA> | 5370 | 도봉구 | 노해로 | 창동 135-26 | 창동지하차도 이마트 건너편 횡단보도 | ⑥ 횡단보도 입구
5371 | <NA> | 5371 | 도봉구 | 해등로 | 해등로3길 41 | 삼환빌라 앞 횡단보도 | ⑥ 횡단보도 입구
5372 | <NA> | 5372 | 도봉구 | 해등로 | 해등로 103 | 창원초등학교 앞 횡단보도 | ⑥ 횡단보도 입구
5373 | <NA> | 5373 | 도봉구 | 도봉산길 | 도봉산길 27 | 도봉고등학교 건너편 횡단보도 | ⑥ 횡단보도 입구
5374 | <NA> | 5374 | 도봉구 | 도봉로 | 도봉동 620-23 | 신도봉사거리 도봉중학교방향 횡단보도 | ⑥ 횡단보도 입구
5375 | <NA> | 5375 | 도봉구 | 해등로 | 해등로 32 | 창1동 서울가든아파트 버스정류장 | ① 정류장(버스, 택시 등)
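
The first sample row (index 0) holds what look like the real column names (연번, 자치구명, 도로명 and so on), and Unnamed: 0 is missing in every row. That pattern suggests the file was read with its title row as the header. The sketch below shows one way to promote the first data row to a header and drop the nameless column. It works on plain Python lists, `promote_header` is a hypothetical helper, and the `raw` rows are copied from the sample above rather than read from the file.

```python
def promote_header(rows):
    """Use the first row as column names; drop columns whose header is missing.

    `rows` is a list of equal-length lists, with None marking a missing cell.
    Returns one dict per remaining row.
    """
    header, *body = rows
    # Keep only the column positions that have a name in the header row.
    keep = [i for i, name in enumerate(header) if name is not None]
    names = [header[i] for i in keep]
    return [dict(zip(names, (row[i] for i in keep))) for row in body]

raw = [
    [None, "연번", "자치구명", "도로명", "세부 위치(상세 주소)", "설치 지점",
     "수거 쓰레기 종류(일반 쓰레기 / 재활용 쓰레기)"],
    [None, "1", "종로구", "사직로", "경복궁역 4번출구", "지하철역 입구", "일반쓰레기"],
    [None, "2", "종로구", "사직로", "경복궁역 4번출구", "지하철역 입구", "재활용"],
]
records = promote_header(raw)
print(records[0]["자치구명"], records[0]["도로명"])
```

Re-profiling after a step like this would presumably remove the 100% missing alert and the header strings that currently appear as the first sample value of each variable.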

Duplicate rows

Most frequently occurring

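Index | Unnamed: 2 | Unnamed: 3 | Unnamed: 5 | Unnamed: 6 | # duplicates
473 | 서초구 | <NA> | 정류소(버스, 택시 등) | 재활용쓰레기 | 95
474 | 서초구 | <NA> | 지하철역 입구 | 재활용쓰레기 | 72
135 | 강동구 | 천호대로 | 버스중앙차로 | 일반쓰레기 | 37
203 | 강서 | 양천로 | 버스정류장 | 일반쓰레기 수거용 | 36
472 | 서초구 | <NA> | 상가지역 | 재활용쓰레기 | 36
324 | 금천구 | 시흥대로 | 정류소(버스, 택시 등) | 일반쓰레기 | 33
330 | 노원구 | 동일로 | 정류소(버스, 택시 등) | 일반쓰레기 | 31
454 | 서대문구 | 연세로 | ③ 도로(가로)변 | ② 재활용품 수거용 | 30
631 | 용산구 | 한강대로 | 도로(가로)변 | 일반쓰레기수거용 | 29
475 | 서초구 | <NA> | <NA> | 재활용쓰레기 | 28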
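
The table lists only Unnamed: 2, Unnamed: 3, Unnamed: 5 and Unnamed: 6, which suggests rows were compared on those four columns alone, presumably because the unsupported columns were left out. Under that assumption, a table of this shape could be rebuilt roughly as follows. `duplicate_groups` is a hypothetical helper and the rows are illustrative values, not an extract of the real file.

```python
from collections import Counter

def duplicate_groups(rows, key_columns):
    """Return (key, count) pairs for keys seen more than once, largest first."""
    counts = Counter(tuple(row.get(col) for col in key_columns) for row in rows)
    groups = [(key, n) for key, n in counts.items() if n > 1]
    return sorted(groups, key=lambda item: item[1], reverse=True)

# Assumption: duplicates are keyed on the four columns shown in the table.
KEYS = ["Unnamed: 2", "Unnamed: 3", "Unnamed: 5", "Unnamed: 6"]

def make_row(district, road, spot, waste):
    return dict(zip(KEYS, (district, road, spot, waste)))

# Illustrative rows only; None stands in for a missing value.
rows = (
    [make_row("서초구", None, "상가지역", "재활용쓰레기")] * 3
    + [make_row("강동구", "천호대로", "버스중앙차로", "일반쓰레기")] * 2
    + [make_row("용산구", "한강대로", "도로(가로)변", "일반쓰레기수거용")]
)
groups = duplicate_groups(rows, KEYS)
for key, n in groups:
    print(key, n)
```

Because the sequence number (연번) and the detailed address are not part of the key, two distinct bins on the same road with the same installation type and waste type count as duplicates here. The 761 figure should therefore be read with care before any rows are dropped.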