Overview

Dataset statistics

Number of variables: 6
Number of observations: 5383
Missing cells: 8
Missing cells (%): < 0.1%
Duplicate rows: 366
Duplicate rows (%): 6.8%
Total size in memory: 252.5 KiB
Average record size in memory: 48.0 B
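The percentage and average figures above follow from the raw counts. The short check below recomputes them. It assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 KiB = 1024 bytes. Because the memory total is itself rounded, the average record size only comes out approximately.

```python
# Recompute the derived "Dataset statistics" figures from the raw counts above.
# Assumptions: missing-cell percentage is over all cells (variables x observations),
# and 1 KiB = 1024 bytes.
n_variables = 6
n_observations = 5383
missing_cells = 8
duplicate_rows = 366
total_bytes = 252.5 * 1024  # "252.5 KiB" is itself a rounded figure

missing_pct = 100 * missing_cells / (n_variables * n_observations)
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_bytes / n_observations

print(f"Missing cells (%): {missing_pct:.3f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

This gives about 0.025% missing cells, 6.8% duplicate rows and roughly 48.0 B per record, which is consistent with the table.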

Variable types

Unsupported: 1
Categorical: 3
Text: 2

Dataset

Description: 파일 다운로드 (file download)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15069/F/1/datasetView.do

Alerts

Dataset has 366 (6.8%) duplicate rows (Duplicates)
Unnamed: 1 is highly overall correlated with Unnamed: 5 (High correlation)
Unnamed: 4 is highly overall correlated with Unnamed: 5 (High correlation)
Unnamed: 5 is highly overall correlated with Unnamed: 1 and 1 other field (High correlation)
Unnamed: 5 is highly imbalanced (53.8%) (Imbalance)
가로 쓰레기통 현황 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2024-03-13 06:32:36.103303
Analysis finished: 2024-03-13 06:32:37.055739
Duration: 0.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
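The reported duration is the difference between the two timestamps. A quick check using Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2024-03-13 06:32:36.103303")
finished = datetime.fromisoformat("2024-03-13 06:32:37.055739")

# timedelta -> float seconds
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```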

Variables

가로 쓰레기통 현황 (street trash bin status; in the Sample rows this column holds the serial number, 연번)
Unsupported

REJECTED  UNSUPPORTED

Missing: 1
Missing (%): < 0.1%
Memory size: 42.2 KiB

Unnamed: 1 (header-row value: 자치구명, district name)
Categorical

HIGH CORRELATION

Distinct: 27
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 42.2 KiB
강남구 | 974
강서구 | 315
은평구 | 308
도봉구 | 303
용산구 | 278
Other values (22) | 3205

Length

Max length: 4
Median length: 3
Mean length: 3.0572172
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 자치구명 (district name)
4th row: 종로구
5th row: 종로구

Common Values

Value | Count | Frequency (%)
강남구 | 974 | 18.1%
강서구 | 315 | 5.9%
은평구 | 308 | 5.7%
도봉구 | 303 | 5.6%
용산구 | 278 | 5.2%
종로구 | 272 | 5.1%
마포구 | 254 | 4.7%
서초구 | 246 | 4.6%
동작구 | 230 | 4.3%
구로구 | 221 | 4.1%
Other values (17) | 1982 | 36.8%

Length

[Figure: Histogram of lengths of the category]
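
Unnamed: 2 (header-row value: 설치위치(도로명 주소), installation location as a road-name address)
Text

Distinct: 3341
Distinct (%): 62.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 42.2 KiB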

Length

Max length: 19
Median length: 18
Mean length: 8.0795391
Min length: 4

Characters and Unicode

Total characters: 43476
Distinct characters: 286
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
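The category names in the tables below (Other Letter, Decimal Number and so on) are the long names of Unicode General_Category values. As a minimal sketch, assuming the profiler tallies the general category of every character, the same kind of count can be reproduced with Python's `unicodedata` on a few addresses from the Sample rows. The name mapping is a hypothetical helper that covers only the codes seen for this variable.

```python
import unicodedata
from collections import Counter

# Long names for the General_Category codes that appear for this variable.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Lo" for a Hangul syllable
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# A few road-name addresses taken from the Sample rows of this report.
addresses = ["사직로 125", "자하문로 28", "고덕동 313-1"]
print(category_counts(addresses))
```

Hangul syllables fall under Other Letter (`Lo`), which is why that category and the Hangul script/block report the same total (20071) for this variable.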

Unique

Unique: 1734
Unique (%): 32.2%

Sample

1st row: 설치위치(도로명 주소) (installation location, road-name address)
2nd row: 사직로 125
3rd row: 사직로 125
4th row: 자하문로 28
5th row: 자하문로 28

Most occurring words

Value | Count | Frequency (%)
도봉로 | 111 | 1.1%
테헤란로 | 90 | 0.9%
영동대로 | 86 | 0.9%
삼성로 | 82 | 0.8%
봉은사로 | 74 | 0.8%
통일로 | 67 | 0.7%
압구정로 | 64 | 0.7%
학동로 | 62 | 0.6%
논현로 | 62 | 0.6%
도산대로 | 60 | 0.6%
Other values (2770) | 9081 | 92.3%

Most occurring characters

Value | Count | Frequency (%)
(not recovered) | 4654 | 10.7%
(space) | 4622 | 10.6%
1 | 3445 | 7.9%
2 | 2432 | 5.6%
3 | 1998 | 4.6%
4 | 1604 | 3.7%
5 | 1515 | 3.5%
0 | 1433 | 3.3%
(not recovered) | 1432 | 3.3%
6 | 1374 | 3.2%
Other values (276) | 18967 | 43.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 20071 | 46.2%
Decimal Number | 17157 | 39.5%
Space Separator | 4622 | 10.6%
Dash Punctuation | 1158 | 2.7%
Open Punctuation | 234 | 0.5%
Close Punctuation | 233 | 0.5%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

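Other Letter
Value | Count | Frequency (%)
(not recovered) | 4654 | 23.2%
(not recovered) | 1432 | 7.1%
(not recovered) | 676 | 3.4%
(not recovered) | 590 | 2.9%
(not recovered) | 358 | 1.8%
(not recovered) | 357 | 1.8%
(not recovered) | 333 | 1.7%
(not recovered) | 322 | 1.6%
(not recovered) | 288 | 1.4%
(not recovered) | 263 | 1.3%
Other values (261) | 10798 | 53.8%

Decimal Number
Value | Count | Frequency (%)
1 | 3445 | 20.1%
2 | 2432 | 14.2%
3 | 1998 | 11.6%
4 | 1604 | 9.3%
5 | 1515 | 8.8%
0 | 1433 | 8.4%
6 | 1374 | 8.0%
7 | 1287 | 7.5%
8 | 1070 | 6.2%
9 | 999 | 5.8%

Space Separator
Value | Count | Frequency (%)
(space) | 4622 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1158 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 234 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 233 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1 | 100.0%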

Most occurring scripts

Value | Count | Frequency (%)
Common | 23405 | 53.8%
Hangul | 20071 | 46.2%

Most frequent character per script

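Hangul
Value | Count | Frequency (%)
(not recovered) | 4654 | 23.2%
(not recovered) | 1432 | 7.1%
(not recovered) | 676 | 3.4%
(not recovered) | 590 | 2.9%
(not recovered) | 358 | 1.8%
(not recovered) | 357 | 1.8%
(not recovered) | 333 | 1.7%
(not recovered) | 322 | 1.6%
(not recovered) | 288 | 1.4%
(not recovered) | 263 | 1.3%
Other values (261) | 10798 | 53.8%

Common
Value | Count | Frequency (%)
(space) | 4622 | 19.7%
1 | 3445 | 14.7%
2 | 2432 | 10.4%
3 | 1998 | 8.5%
4 | 1604 | 6.9%
5 | 1515 | 6.5%
0 | 1433 | 6.1%
6 | 1374 | 5.9%
7 | 1287 | 5.5%
- | 1158 | 4.9%
Other values (5) | 2537 | 10.8%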

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 23405 | 53.8%
Hangul | 20071 | 46.2%

Most frequent character per block

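Hangul
Value | Count | Frequency (%)
(not recovered) | 4654 | 23.2%
(not recovered) | 1432 | 7.1%
(not recovered) | 676 | 3.4%
(not recovered) | 590 | 2.9%
(not recovered) | 358 | 1.8%
(not recovered) | 357 | 1.8%
(not recovered) | 333 | 1.7%
(not recovered) | 322 | 1.6%
(not recovered) | 288 | 1.4%
(not recovered) | 263 | 1.3%
Other values (261) | 10798 | 53.8%

ASCII
Value | Count | Frequency (%)
(space) | 4622 | 19.7%
1 | 3445 | 14.7%
2 | 2432 | 10.4%
3 | 1998 | 8.5%
4 | 1604 | 6.9%
5 | 1515 | 6.5%
0 | 1433 | 6.1%
6 | 1374 | 5.9%
7 | 1287 | 5.5%
- | 1158 | 4.9%
Other values (5) | 2537 | 10.8%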

Unnamed: 3 (header-row value: 세부 위치, detailed location)
Text

Distinct: 3524
Distinct (%): 65.5%
Missing: 5
Missing (%): 0.1%
Memory size: 42.2 KiB

Length

Max length: 59
Median length: 36
Mean length: 15.162514
Min length: 2

Characters and Unicode

Total characters: 81544
Distinct characters: 684
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1977
Unique (%): 36.8%

Sample

1st row: 세부 위치 (detailed location)
2nd row: 경복궁역 4번출구
3rd row: 경복궁역 4번출구
4th row: 스타벅스 앞
5th row: 스타벅스 앞
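
Most occurring words

Value | Count | Frequency (%)
(not recovered) | 1702 | 13.7%
버스정류장 (bus stop) | 319 | 2.6%
출구 (exit) | 158 | 1.3%
도로변 (roadside) | 106 | 0.9%
횡단보도 (crosswalk) | 101 | 0.8%
(not recovered) | 91 | 0.7%
건너편 (opposite side) | 54 | 0.4%
방면 (direction) | 49 | 0.4%
1번출구 (Exit 1) | 49 | 0.4%
통일로 | 47 | 0.4%
Other values (4859) | 9731 | 78.4%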

Most occurring characters

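Value | Count | Frequency (%)
(space) | 7238 | 8.9%
1 | 3289 | 4.0%
( | 2862 | 3.5%
) | 2860 | 3.5%
2 | 2619 | 3.2%
0 | 2582 | 3.2%
(not recovered) | 2198 | 2.7%
(not recovered) | 2160 | 2.6%
- | 2130 | 2.6%
(not recovered) | 2061 | 2.5%
Other values (674) | 51545 | 63.2%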

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49034 | 60.1%
Decimal Number | 15830 | 19.4%
Space Separator | 7238 | 8.9%
Open Punctuation | 2862 | 3.5%
Close Punctuation | 2860 | 3.5%
Dash Punctuation | 2130 | 2.6%
Uppercase Letter | 901 | 1.1%
Other Punctuation | 621 | 0.8%
Lowercase Letter | 52 | 0.1%
Math Symbol | 9 | < 0.1%
Other values (2) | 7 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recovered) | 2198 | 4.5%
(not recovered) | 2160 | 4.4%
(not recovered) | 2061 | 4.2%
(not recovered) | 2014 | 4.1%
(not recovered) | 1835 | 3.7%
(not recovered) | 1799 | 3.7%
(not recovered) | 1213 | 2.5%
(not recovered) | 1118 | 2.3%
(not recovered) | 1099 | 2.2%
(not recovered) | 1035 | 2.1%
Other values (610) | 32502 | 66.3%

Uppercase Letter
Value | Count | Frequency (%)
I | 167 | 18.5%
D | 119 | 13.2%
K | 81 | 9.0%
S | 76 | 8.4%
B | 57 | 6.3%
G | 55 | 6.1%
C | 53 | 5.9%
O | 48 | 5.3%
T | 47 | 5.2%
L | 35 | 3.9%
Other values (13) | 163 | 18.1%

Lowercase Letter
Value | Count | Frequency (%)
g | 8 | 15.4%
l | 7 | 13.5%
e | 5 | 9.6%
u | 5 | 9.6%
r | 5 | 9.6%
o | 3 | 5.8%
d | 3 | 5.8%
a | 3 | 5.8%
w | 3 | 5.8%
c | 3 | 5.8%
Other values (4) | 7 | 13.5%

Decimal Number
Value | Count | Frequency (%)
1 | 3289 | 20.8%
2 | 2619 | 16.5%
0 | 2582 | 16.3%
3 | 1881 | 11.9%
4 | 1110 | 7.0%
6 | 1013 | 6.4%
5 | 951 | 6.0%
8 | 797 | 5.0%
7 | 796 | 5.0%
9 | 792 | 5.0%

Other Punctuation
Value | Count | Frequency (%)
. | 277 | 44.6%
, | 222 | 35.7%
: | 102 | 16.4%
/ | 6 | 1.0%
' | 5 | 0.8%
(not recovered) | 4 | 0.6%
& | 4 | 0.6%
· | 1 | 0.2%

Math Symbol
Value | Count | Frequency (%)
+ | 6 | 66.7%
~ | 2 | 22.2%
(not recovered) | 1 | 11.1%

Space Separator
Value | Count | Frequency (%)
(space) | 7238 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2862 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2860 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2130 | 100.0%

Control
Value | Count | Frequency (%)
(not recovered) | 5 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not recovered) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49036 | 60.1%
Common | 31555 | 38.7%
Latin | 953 | 1.2%

Most frequent character per script

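Hangul
Value | Count | Frequency (%)
(not recovered) | 2198 | 4.5%
(not recovered) | 2160 | 4.4%
(not recovered) | 2061 | 4.2%
(not recovered) | 2014 | 4.1%
(not recovered) | 1835 | 3.7%
(not recovered) | 1799 | 3.7%
(not recovered) | 1213 | 2.5%
(not recovered) | 1118 | 2.3%
(not recovered) | 1099 | 2.2%
(not recovered) | 1035 | 2.1%
Other values (611) | 32504 | 66.3%

Latin
Value | Count | Frequency (%)
I | 167 | 17.5%
D | 119 | 12.5%
K | 81 | 8.5%
S | 76 | 8.0%
B | 57 | 6.0%
G | 55 | 5.8%
C | 53 | 5.6%
O | 48 | 5.0%
T | 47 | 4.9%
L | 35 | 3.7%
Other values (27) | 215 | 22.6%

Common
Value | Count | Frequency (%)
(space) | 7238 | 22.9%
1 | 3289 | 10.4%
( | 2862 | 9.1%
) | 2860 | 9.1%
2 | 2619 | 8.3%
0 | 2582 | 8.2%
- | 2130 | 6.8%
3 | 1881 | 6.0%
4 | 1110 | 3.5%
6 | 1013 | 3.2%
Other values (16) | 3971 | 12.6%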

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49034 | 60.1%
ASCII | 32502 | 39.9%
Punctuation | 4 | < 0.1%
None | 3 | < 0.1%
Math Operators | 1 | < 0.1%

Most frequent character per block

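ASCII
Value | Count | Frequency (%)
(space) | 7238 | 22.3%
1 | 3289 | 10.1%
( | 2862 | 8.8%
) | 2860 | 8.8%
2 | 2619 | 8.1%
0 | 2582 | 7.9%
- | 2130 | 6.6%
3 | 1881 | 5.8%
4 | 1110 | 3.4%
6 | 1013 | 3.1%
Other values (50) | 4918 | 15.1%

Hangul
Value | Count | Frequency (%)
(not recovered) | 2198 | 4.5%
(not recovered) | 2160 | 4.4%
(not recovered) | 2061 | 4.2%
(not recovered) | 2014 | 4.1%
(not recovered) | 1835 | 3.7%
(not recovered) | 1799 | 3.7%
(not recovered) | 1213 | 2.5%
(not recovered) | 1118 | 2.3%
(not recovered) | 1099 | 2.2%
(not recovered) | 1035 | 2.1%
Other values (610) | 32502 | 66.3%

Punctuation
Value | Count | Frequency (%)
(not recovered) | 4 | 100.0%

None
Value | Count | Frequency (%)
(not recovered) | 2 | 66.7%
· | 1 | 33.3%

Math Operators
Value | Count | Frequency (%)
(not recovered) | 1 | 100.0%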

Unnamed: 4 (header-row value: 설치 장소 유형, installation site type)
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.2 KiB
정류소(버스,택시 등) | 2901
도로변(횡단보도 포함) | 1829
지하철역 입구 | 410
상가지역 | 111
광장, 공원 | 70
Other values (3) | 62

Length

Max length: 12
Median length: 12
Mean length: 11.262865
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 설치 장소 유형 (installation site type)
4th row: 지하철역 입구 (subway station entrance)
5th row: 지하철역 입구 (subway station entrance)

Common Values

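Value | Count | Frequency (%)
정류소(버스,택시 등) (stops: bus, taxi, etc.) | 2901 | 53.9%
도로변(횡단보도 포함) (roadside, including crosswalks) | 1829 | 34.0%
지하철역 입구 (subway station entrance) | 410 | 7.6%
상가지역 (commercial area) | 111 | 2.1%
광장, 공원 (plaza, park) | 70 | 1.3%
기타 (other) | 59 | 1.1%
<NA> | 2 | < 0.1%
설치 장소 유형 (installation site type) | 1 | < 0.1%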

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
정류소(버스,택시 | 2901 | 27.4%
등 | 2901 | 27.4%
도로변(횡단보도 | 1829 | 17.3%
포함 | 1829 | 17.3%
지하철역 | 410 | 3.9%
입구 | 410 | 3.9%
상가지역 | 111 | 1.0%
광장 | 70 | 0.7%
공원 | 70 | 0.7%
기타 | 59 | 0.6%
Other values (4) | 5 | < 0.1%
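This word table splits each category label into tokens, which is why the label 정류소(버스,택시 등) contributes two entries that both have the count 2901, and why <NA> turns up as na elsewhere in the report. The sketch below assumes a simple rule: lower-case, split on whitespace, then strip surrounding punctuation. That rule is our guess, not taken from ydata-profiling's source, but it reproduces the entries listed here. The `word_counts` helper is hypothetical.

```python
import string
from collections import Counter

def word_counts(values):
    """Lower-case each value, split on whitespace, strip surrounding punctuation.

    Assumed tokenization rule, chosen because it matches the table above;
    it is not taken from the profiler's source code.
    """
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)  # "등)" -> "등", "<na>" -> "na"
            if word:
                counts[word] += 1
    return counts

# A subset of Unnamed: 4 labels with their counts from "Common Values".
label_counts = {
    "정류소(버스,택시 등)": 2901,
    "도로변(횡단보도 포함)": 1829,
    "광장, 공원": 70,
    "<NA>": 2,
}
values = [label for label, n in label_counts.items() for _ in range(n)]
print(word_counts(values).most_common(4))
```

Only leading and trailing punctuation is removed, so interior characters survive: 정류소(버스,택시 keeps its opening parenthesis and comma, exactly as in the table.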

Unnamed: 5 (header-row value: 수거 쓰레기 종류, type of waste collected)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.2 KiB
일반쓰레기 | 3598
재활용쓰레기 | 1782
<NA> | 2
수거 쓰레기 종류 | 1

Length

Max length: 9
Median length: 5
Mean length: 5.3314137
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 수거 쓰레기 종류 (type of waste collected)
4th row: 일반쓰레기 (general waste)
5th row: 재활용쓰레기 (recyclables)

Common Values

Value | Count | Frequency (%)
일반쓰레기 (general waste) | 3598 | 66.8%
재활용쓰레기 (recyclables) | 1782 | 33.1%
<NA> | 2 | < 0.1%
수거 쓰레기 종류 (type of waste collected) | 1 | < 0.1%
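The 53.8% imbalance alert can be reproduced from these four counts if the score is taken as one minus the Shannon entropy normalized by ln k, where k is the number of listed values (here 4, counting <NA> and the stray header text). This formula is our assumption rather than something stated in the report. The sketch below evaluates it with the standard library.

```python
import math

def imbalance_score(counts):
    """One minus Shannon entropy normalized by ln(k) (assumed formula).

    Returns 0 for a perfectly even split across k classes and approaches 1
    as a single class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Unnamed: 5 counts from "Common Values":
# 일반쓰레기, 재활용쓰레기, <NA>, and the stray header text.
score = imbalance_score([3598, 1782, 2, 1])
print(f"{score:.3f}")
```

With all four entries the score is about 0.538. Dropping the <NA> and header entries (k = 2) gives only about 0.08, so the reported figure appears to include them.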

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
일반쓰레기 (general waste) | 3598 | 66.8%
재활용쓰레기 (recyclables) | 1782 | 33.1%
na | 2 | < 0.1%
수거 (collection) | 1 | < 0.1%
쓰레기 (waste) | 1 | < 0.1%
종류 (type) | 1 | < 0.1%

Correlations

[Figure: correlation heatmap]

Variable | Unnamed: 1 | Unnamed: 4 | Unnamed: 5
Unnamed: 1 | 1.000 | 0.789 | 0.912
Unnamed: 4 | 0.789 | 1.000 | 0.771
Unnamed: 5 | 0.912 | 0.771 | 1.000

[Figure: correlation heatmap]

Variable | Unnamed: 1 | Unnamed: 4 | Unnamed: 5
Unnamed: 1 | 1.000 | 0.484 | 0.775
Unnamed: 4 | 0.484 | 1.000 | 0.710
Unnamed: 5 | 0.775 | 0.710 | 1.000

[Figure: correlation heatmap]

Variable | Unnamed: 1 | Unnamed: 4 | Unnamed: 5
Unnamed: 1 | 1.000 | 0.484 | 0.775
Unnamed: 4 | 0.484 | 1.000 | 0.710
Unnamed: 5 | 0.775 | 0.710 | 1.000
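The alerts at the top flag Unnamed: 1 with Unnamed: 5 and Unnamed: 4 with Unnamed: 5, but not Unnamed: 1 with Unnamed: 4. Assuming a cut-off of 0.5 (our assumption; the threshold is not stated in this report), the second matrix reproduces exactly those two pairs, whereas the first matrix would flag all three. A small sketch of that check, with a hypothetical `high_pairs` helper:

```python
from itertools import combinations

COLUMNS = ["Unnamed: 1", "Unnamed: 4", "Unnamed: 5"]

# Second correlation table above (the third table has the same values).
MATRIX = [
    [1.000, 0.484, 0.775],
    [0.484, 1.000, 0.710],
    [0.775, 0.710, 1.000],
]

def high_pairs(matrix, columns, threshold):
    """Return column pairs whose off-diagonal coefficient exceeds the threshold."""
    return [
        (columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)  # upper triangle only
        if matrix[i][j] > threshold
    ]

print(high_pairs(MATRIX, COLUMNS, threshold=0.5))
```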

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]
[Figure: Nullity correlation heatmap, showing how strongly the presence or absence of one variable affects the presence of another.]
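One common way to measure nullity correlation is to turn each column into a missing/present indicator and take the Pearson correlation of those indicators. We assume that is what the heatmap shows; the report does not spell out the formula. The toy columns below are hypothetical and mirror the Sample rows, where Unnamed: 2 and Unnamed: 3 are both missing in rows 0 and 1.

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation between the missing (1) / present (0) indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if a column is never (or always) missing
    return cov / (sx * sy)

# Hypothetical toy columns: both are missing in the same two rows.
a = [None, None, "사직로 125", "자하문로 28"]
b = [None, None, "경복궁역 4번출구", "스타벅스 앞"]
print(nullity_correlation(a, b))
```

Columns that go missing together score 1, unrelated gaps score near 0, and a column with no missing values (such as Unnamed: 1 here) has no defined nullity correlation.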

Sample

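Index | 가로 쓰레기통 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5
0 | NaN | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 기준일자 : '23. 12. 31. | <NA> | <NA> | <NA> | <NA> | <NA>
2 | 연번 | 자치구명 | 설치위치(도로명 주소) | 세부 위치 | 설치 장소 유형 | 수거 쓰레기 종류
3 | 1 | 종로구 | 사직로 125 | 경복궁역 4번출구 | 지하철역 입구 | 일반쓰레기
4 | 2 | 종로구 | 사직로 125 | 경복궁역 4번출구 | 지하철역 입구 | 재활용쓰레기
5 | 3 | 종로구 | 자하문로 28 | 스타벅스 앞 | 도로변(횡단보도 포함) | 일반쓰레기
6 | 4 | 종로구 | 자하문로 28 | 스타벅스 앞 | 도로변(횡단보도 포함) | 재활용쓰레기
7 | 5 | 종로구 | 자하문로 44 | 라파리나 카페 앞 | 도로변(횡단보도 포함) | 일반쓰레기
8 | 6 | 종로구 | 자하문로 44 | 라파리나 카페 앞 | 도로변(횡단보도 포함) | 재활용쓰레기
9 | 7 | 종로구 | 자하문로 68 | 평화제과 앞 | 도로변(횡단보도 포함) | 일반쓰레기

Row 1 reads "reference date: 31 Dec 2023". Row 2 holds the file's actual column headers: 연번 (serial number), 자치구명 (district), 설치위치(도로명 주소) (installation location, road-name address), 세부 위치 (detailed location), 설치 장소 유형 (installation site type) and 수거 쓰레기 종류 (type of waste collected).

Index | 가로 쓰레기통 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5
5373 | 5371 | 강동구 | 동남로71길 | 주양쇼핑 따릉이 대여소(1036) 앞 | 도로변(횡단보도 포함) | 일반쓰레기
5374 | 5372 | 강동구 | 고덕로 276 | 명일동 이마트 앞(상일동 방향) 횡단보도 | 도로변(횡단보도 포함) | 일반쓰레기
5375 | 5373 | 강동구 | 고덕로 276 | 명일동 이마트 앞(고덕동 방향) 횡단보도 | 도로변(횡단보도 포함) | 일반쓰레기
5376 | 5374 | 강동구 | 고덕로 276 | 강동아트센터(25-179) | 정류소(버스,택시 등) | 일반쓰레기
5377 | 5375 | 강동구 | 동남로 832 | 한영중고한영외고 앞(25-181) | 정류소(버스,택시 등) | 일반쓰레기
5378 | 5376 | 강동구 | 명일동 60 | 한영중고한영외고 맞은편(25-180) | 정류소(버스,택시 등) | 일반쓰레기
5379 | 5377 | 강동구 | 고덕로 269 | 고덕역 3번 출구(명덕성결교회 입구 부근) | 지하철역 입구 | 일반쓰레기
5380 | 5378 | 강동구 | 고덕동 313-1 | 배재중고등학교(25-141)(배재고등학교정문 옆) | 정류소(버스,택시 등) | 일반쓰레기
5381 | 5379 | 강동구 | 상일동 440-3 | 상일초교(중)(25-001) | 정류소(버스,택시 등) | 일반쓰레기
5382 | 5380 | 강동구 | 상일동 512 | 강동첨단업무단지,상일여고입구(25-310) | 정류소(버스,택시 등) | 일반쓰레기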
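Rows 0 to 2 of the first sample are file preamble that was read as data, with the real column headers sitting in row 2. This explains the generic Unnamed column names, and why each categorical column counts two <NA> entries plus one header-text value among its distinct values (for example, 25 districts + <NA> + 자치구명 = 27 for Unnamed: 1). A minimal standard-library sketch of promoting that row to the header follows. The CSV excerpt and the `read_with_real_header` helper are hypothetical; the report does not say what format the original download uses.

```python
import csv
import io

# Hypothetical excerpt mimicking the layout in the Sample table:
# a title row, an empty row, a reference-date row, then the real header row.
RAW = """가로 쓰레기통 현황,,,,,
,,,,,
기준일자 : '23. 12. 31.,,,,,
연번,자치구명,설치위치(도로명 주소),세부 위치,설치 장소 유형,수거 쓰레기 종류
1,종로구,사직로 125,경복궁역 4번출구,지하철역 입구,일반쓰레기
2,종로구,사직로 125,경복궁역 4번출구,지하철역 입구,재활용쓰레기
"""

def read_with_real_header(text, header_line=3):
    """Parse CSV text, using the row at index `header_line` as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_line]
    # Every row after the header becomes a dict keyed by the real column names.
    return [dict(zip(header, row)) for row in rows[header_line + 1:]]

records = read_with_real_header(RAW)
print(records[0]["자치구명"], records[0]["수거 쓰레기 종류"])
```

With the header promoted, the preamble rows and the header text no longer appear as category values, so <NA> counts and "Unique" values like 자치구명 drop out of the profile.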

Duplicate rows

Most frequently occurring

Index | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | # duplicates
271 | 송파구 | 장지동 896-1(위례중앙로) | 위례중앙광장(장지동 896-1) | 광장, 공원 | 재활용쓰레기 | 4
295 | 용산구 | 한남동 705-10 | 순천향대학병원(서초IC방면) 버스정류장(03-164) | 정류소(버스,택시 등) | 일반쓰레기 | 4
8 | 강동구 | 천호대로 1120 | 길동사거리,강동세무서(25-010) | 정류소(버스,택시 등) | 일반쓰레기 | 3
9 | 강동구 | 천호대로 1180 | 길동주민센터,둔촌2동주민센터(25-007) | 정류소(버스,택시 등) | 일반쓰레기 | 3
10 | 강동구 | 천호대로 1180 | 길동주민센터,둔촌2동주민센터(25-008) | 정류소(버스,택시 등) | 일반쓰레기 | 3
11 | 강동구 | 천호대로1240 | 강동자이,프라자아파트(25-005) | 정류소(버스,택시 등) | 일반쓰레기 | 3
12 | 강동구 | 천호대로1240 | 강동자이,프라자아파트(25-006) | 정류소(버스,택시 등) | 일반쓰레기 | 3
28 | 광진구 | 강변역로 50 | 동서울터미널 흡연부스 안 | 기타 | 일반쓰레기 | 3
33 | 구로구 | 가마산로 250 | 도로변 | 도로변(횡단보도 포함) | 일반쓰레기 | 3
51 | 동작구 | 노량진로 114-5 | 올리브 영 | 도로변(횡단보도 포함) | 재활용쓰레기 | 3
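The duplicate table compares rows on Unnamed: 1 through Unnamed: 5 only (the serial-number column 가로 쓰레기통 현황 is not listed), and "# duplicates" is how often each identical row occurs. The minimal sketch below assumes that "Duplicate rows" counts every occurrence after the first, as pandas' `DataFrame.duplicated()` does with its default `keep="first"`. Under that reading, a row that occurs 4 times contributes 3 to the total of 366. The toy rows are hypothetical, built from entries above.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (number of repeat rows, {row: occurrences} for rows seen more than once).

    A row counts as a repeat if an identical row appeared earlier,
    i.e. every occurrence after the first.
    """
    counts = Counter(tuple(row) for row in rows)
    n_repeats = sum(n - 1 for n in counts.values())
    repeated = {row: n for row, n in counts.items() if n > 1}
    return n_repeats, repeated

# Hypothetical toy input (Unnamed: 1..5 only, serial number excluded).
stop = ("강동구", "천호대로 1120", "길동사거리,강동세무서(25-010)", "정류소(버스,택시 등)", "일반쓰레기")
other = ("종로구", "사직로 125", "경복궁역 4번출구", "지하철역 입구", "일반쓰레기")
rows = [stop, stop, stop, other]

n_repeats, repeated = duplicate_summary(rows)
print(n_repeats, repeated[stop])
```

Including the serial-number column would make every row unique, which is presumably why the comparison leaves it out.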