Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1356
Duplicate rows (%): 13.6%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B

Variable types

Text: 2
Categorical: 1

Dataset

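Description: Data on the status and contents of business-site waste generator reports filed under Article 17 of the Wastes Control Act (폐기물관리법): business name, business address, and the reported items (waste type) for each business-site waste generator.
URL: https://www.data.go.kr/data/15062394/fileData.do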
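The report does not show how the file was loaded. As a starting point for reproducing it, here is a minimal standard-library sketch that reads the file into rows keyed by the three column names used in this report. It assumes the download is a CSV whose header row contains those names and that it is encoded in cp949 (common for Korean public-data files, but an assumption here); `read_rows` and `load_rows` are hypothetical helper names.

```python
import csv
import io

# Column names as they appear in the profiled table; the real file may
# contain additional columns, which DictReader simply carries along.
COLUMNS = ("상호", "건설폐기물 종류", "지번주소(발주자)")


def read_rows(stream):
    """Parse CSV text from an open text stream into a list of dicts."""
    reader = csv.DictReader(stream)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


def load_rows(path, encoding="cp949"):
    """Open the downloaded file and parse it.

    The cp949 default is an assumption about the file, not a documented fact.
    """
    with open(path, newline="", encoding=encoding) as f:
        return read_rows(f)


if __name__ == "__main__":
    sample = io.StringIO(
        "상호,건설폐기물 종류,지번주소(발주자)\n"
        "(주)터원,건설폐재류:폐콘크리트,서울특별시 동작구 노량진동 119번지 100호\n"
    )
    print(read_rows(sample)[0]["상호"])
```

If decoding fails, `load_rows(path, encoding="utf-8-sig")` is the usual alternative to try.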

Alerts

Dataset has 1356 (13.6%) duplicate rows [Duplicates]
건설폐기물 종류 (construction waste type) is highly imbalanced (61.4%) [Imbalance]

Reproduction

Analysis started: 2023-12-12 07:40:14.625910
Analysis finished: 2023-12-12 07:40:15.243363
Duration: 0.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상호 (business name)
Text

Distinct: 1273
Distinct (%): 12.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 18
Mean length: 6.3866
Min length: 2
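The mean length is total characters divided by rows: 63866 / 10000 = 6.3866. The reported median of 18, however, cannot be the ordinary median of per-row lengths: if at least half of the 10,000 names had 18 or more characters and the rest had at least the minimum of 2, the mean would be at least (18 + 2) / 2 = 10, not 6.39. The same check fails for 지번주소(발주자) (median 43, minimum 1, mean 16.07). The sketch below computes ordinary row-weighted length statistics in plain Python, counting Unicode code points, plus the lower bound used in this check; it is an independent cross-check, not ydata-profiling's implementation.

```python
from statistics import mean, median


def length_summary(values):
    """Ordinary (row-weighted) length statistics, counting Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


def mean_lower_bound(median_len, min_len):
    """Smallest mean compatible with a given median and minimum:
    at least half the rows are >= the median, the rest are >= the minimum."""
    return (median_len + min_len) / 2


print(length_summary(["개인", "(주)터원", "동광건설산업(주)"]))
print(mean_lower_bound(18, 2))  # 10.0, versus the reported mean of 6.3866
```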

Characters and Unicode

Total characters: 63866
Distinct characters: 428
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
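As an illustration of how such a per-category tally can be produced, the sketch below uses Python's standard `unicodedata.category` to map each character to its two-letter general category and then to the long names shown in this report. It is not ydata-profiling's own code, and the `CATEGORY_NAMES` table covers only the categories that appear here.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
    "So": "Other Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters across all values by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["(주)현암도시개발", "LH 1-2"]))
```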

Unique

Unique: 411
Unique (%): 4.1%
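As ydata-profiling uses these terms, Distinct counts the different values (1273 of 10,000 rows, 12.7%) and Unique counts the values that occur exactly once (411, 4.1%). A minimal sketch of both measures with `collections.Counter`, following those definitions (`distinct_and_unique` is a hypothetical name):

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


print(distinct_and_unique(["개인", "개인", "(주)터원", "동작구청"]))
```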

Sample

1st row: 동광건설산업(주)
2nd row: 대림산업(주)
3rd row: 명문종합건설(주)
4th row: 대림산업(주)
5th row: (주)현암도시개발
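
Most occurring words

Value | Count | Frequency (%)
개인 (individual) | 2085 | 19.7%
주)청산건설 | 404 | 3.8%
주)터원 | 352 | 3.3%
동작구청 (Dongjak-gu Office) | 320 | 3.0%
주식회사 (Co., Ltd.) | 281 | 2.7%
윤정이엔씨(주 | 179 | 1.7%
문창토건 | 125 | 1.2%
호림건설(주 | 111 | 1.0%
주언건설(주 | 111 | 1.0%
주)대우건설 | 111 | 1.0%
Other values (1278) | 6493 | 61.4%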
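Tokens in this table such as 주)청산건설 and 윤정이엔씨(주 suggest that each name is split on whitespace and that punctuation is stripped only from the ends of each token, so the marker (주), short for 주식회사 (Co., Ltd.), survives half-attached to the neighbouring word. The sketch below implements that inferred rule; it approximates, and is not, the library's tokenizer.

```python
import string
from collections import Counter


def word_counts(values):
    """Split on whitespace, then strip ASCII punctuation from both ends of
    each token (a rule inferred from the tokens in this report)."""
    counts = Counter()
    for value in values:
        for token in value.split():
            token = token.strip(string.punctuation)
            if token:  # drop tokens that were pure punctuation
                counts[token] += 1
    return counts


print(word_counts(["(주)청산건설", "윤정이엔씨(주)", "개인"]).most_common())
```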

Most occurring characters

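Value | Count | Frequency (%)
[Hangul, not preserved] | 6124 | 9.6%
( | 5441 | 8.5%
) | 5441 | 8.5%
[Hangul, not preserved] | 3657 | 5.7%
[Hangul, not preserved] | 2890 | 4.5%
[Hangul, not preserved] | 2844 | 4.5%
[Hangul, not preserved] | 2219 | 3.5%
[Hangul, not preserved] | 1316 | 2.1%
[Hangul, not preserved] | 1083 | 1.7%
[Hangul, not preserved] | 1064 | 1.7%
Other values (418) | 31787 | 49.8%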

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 51955 | 81.4%
Open Punctuation | 5441 | 8.5%
Close Punctuation | 5441 | 8.5%
Space Separator | 572 | 0.9%
Decimal Number | 290 | 0.5%
Uppercase Letter | 129 | 0.2%
Other Punctuation | 18 | < 0.1%
Other Symbol | 10 | < 0.1%
Lowercase Letter | 8 | < 0.1%
Dash Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
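Value | Count | Frequency (%)
[not preserved] | 6124 | 11.8%
[not preserved] | 3657 | 7.0%
[not preserved] | 2890 | 5.6%
[not preserved] | 2844 | 5.5%
[not preserved] | 2219 | 4.3%
[not preserved] | 1316 | 2.5%
[not preserved] | 1083 | 2.1%
[not preserved] | 1064 | 2.0%
[not preserved] | 956 | 1.8%
[not preserved] | 898 | 1.7%
Other values (383) | 28904 | 55.6%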
Uppercase Letter
Value | Count | Frequency (%)
S | 29 | 22.5%
E | 22 | 17.1%
J | 17 | 13.2%
H | 16 | 12.4%
N | 11 | 8.5%
K | 8 | 6.2%
T | 8 | 6.2%
C | 5 | 3.9%
G | 4 | 3.1%
L | 3 | 2.3%
Other values (4) | 6 | 4.7%
Decimal Number
Value | Count | Frequency (%)
2 | 75 | 25.9%
1 | 75 | 25.9%
0 | 59 | 20.3%
3 | 35 | 12.1%
6 | 20 | 6.9%
5 | 18 | 6.2%
4 | 4 | 1.4%
9 | 2 | 0.7%
7 | 2 | 0.7%
Lowercase Letter
Value | Count | Frequency (%)
o | 3 | 37.5%
s | 2 | 25.0%
c | 1 | 12.5%
m | 1 | 12.5%
e | 1 | 12.5%
Other Punctuation
Value | Count | Frequency (%)
. | 14 | 77.8%
& | 4 | 22.2%
Open Punctuation
Value | Count | Frequency (%)
( | 5441 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 5441 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 572 | 100.0%
Other Symbol
Value | Count | Frequency (%)
[symbol not preserved] | 10 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 51965 | 81.4%
Common | 11764 | 18.4%
Latin | 137 | 0.2%

Most frequent character per script

Hangul
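Value | Count | Frequency (%)
[not preserved] | 6124 | 11.8%
[not preserved] | 3657 | 7.0%
[not preserved] | 2890 | 5.6%
[not preserved] | 2844 | 5.5%
[not preserved] | 2219 | 4.3%
[not preserved] | 1316 | 2.5%
[not preserved] | 1083 | 2.1%
[not preserved] | 1064 | 2.0%
[not preserved] | 956 | 1.8%
[not preserved] | 898 | 1.7%
Other values (384) | 28914 | 55.6%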
Latin
Value | Count | Frequency (%)
S | 29 | 21.2%
E | 22 | 16.1%
J | 17 | 12.4%
H | 16 | 11.7%
N | 11 | 8.0%
K | 8 | 5.8%
T | 8 | 5.8%
C | 5 | 3.6%
G | 4 | 2.9%
o | 3 | 2.2%
Other values (9) | 14 | 10.2%
Common
Value | Count | Frequency (%)
( | 5441 | 46.3%
) | 5441 | 46.3%
(space) | 572 | 4.9%
2 | 75 | 0.6%
1 | 75 | 0.6%
0 | 59 | 0.5%
3 | 35 | 0.3%
6 | 20 | 0.2%
5 | 18 | 0.2%
. | 14 | 0.1%
Other values (5) | 14 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 51955 | 81.4%
ASCII | 11901 | 18.6%
None | 10 | < 0.1%
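The block counts reconcile with the counts above: ASCII 11901 = Common 11764 + Latin 137, the Hangul block's 51955 equals the Other Letter total, and the remaining 10 characters (the 10 Other Symbol characters) fall outside every recognized block and are reported as None; 51955 + 11901 + 10 = 63866, the total character count. The sketch below assigns blocks with a deliberately tiny, hand-written range table (Basic Latin U+0000 to U+007F, reported as ASCII, and Hangul Syllables U+AC00 to U+D7AF); a real implementation would use the full Unicode Blocks.txt data, and the `Other` label is a stand-in for the report's None.

```python
from collections import Counter

# Only the ranges needed for this data; Blocks.txt defines many more.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),   # Basic Latin
    (0xAC00, 0xD7AF, "Hangul"),  # Hangul Syllables
]


def block_of(ch):
    """Return the block label for one character, or "Other" if unlisted."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values):
    """Count characters across all values by (simplified) Unicode block."""
    return Counter(block_of(ch) for value in values for ch in value)


print(block_counts(["(주)터원", "LH"]))
```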

Most frequent character per block

Hangul
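Value | Count | Frequency (%)
[not preserved] | 6124 | 11.8%
[not preserved] | 3657 | 7.0%
[not preserved] | 2890 | 5.6%
[not preserved] | 2844 | 5.5%
[not preserved] | 2219 | 4.3%
[not preserved] | 1316 | 2.5%
[not preserved] | 1083 | 2.1%
[not preserved] | 1064 | 2.0%
[not preserved] | 956 | 1.8%
[not preserved] | 898 | 1.7%
Other values (383) | 28904 | 55.6%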
ASCII
Value | Count | Frequency (%)
( | 5441 | 45.7%
) | 5441 | 45.7%
(space) | 572 | 4.8%
2 | 75 | 0.6%
1 | 75 | 0.6%
0 | 59 | 0.5%
3 | 35 | 0.3%
S | 29 | 0.2%
E | 22 | 0.2%
6 | 20 | 0.2%
Other values (24) | 132 | 1.1%
None
Value | Count | Frequency (%)
[symbol not preserved] | 10 | 100.0%

건설폐기물 종류 (construction waste type)
Categorical

IMBALANCE 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
건설폐재류:폐콘크리트 | 5411
혼합건설폐기물 | 3499
건설폐재류:폐아스팔트콘크리트 | 530
건설폐재류:건설폐토석 | 166
폐합성수지 | 154
Other values (13) | 240

Length

Max length: 43
Median length: 11
Mean length: 9.8865
Min length: 3

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 건설폐재류:폐콘크리트
2nd row: 폐목재(나무의 뿌리ㆍ가지 등 임목폐기물이 5톤 이상인 경우는 제외한다)
3rd row: 혼합건설폐기물
4th row: 건설폐재류:폐콘크리트
5th row: 건설폐재류:폐콘크리트

Common Values

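Value | Count | Frequency (%)
건설폐재류:폐콘크리트 (construction debris: waste concrete) | 5411 | 54.1%
혼합건설폐기물 (mixed construction waste) | 3499 | 35.0%
건설폐재류:폐아스팔트콘크리트 (construction debris: waste asphalt concrete) | 530 | 5.3%
건설폐재류:건설폐토석 (construction debris: waste soil and rock) | 166 | 1.7%
폐합성수지 (waste synthetic resin) | 154 | 1.5%
폐목재(나무의 뿌리ㆍ가지 등 임목폐기물이 5톤 이상인 경우는 제외한다) (waste wood, excluding cases with 5 tonnes or more of forestry waste such as tree roots and branches) | 78 | 0.8%
건설폐재류:폐벽돌 (construction debris: waste brick) | 75 | 0.8%
건설오니 (construction sludge) | 39 | 0.4%
건설폐재류:폐블록 (construction debris: waste block) | 23 | 0.2%
폐보드류 (waste boards) | 13 | 0.1%
Other values (8) | 12 | 0.1%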
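The 61.4% imbalance alert can be reproduced from these counts. Assuming ydata-profiling scores imbalance as one minus the Shannon entropy of the value counts divided by log2 of the number of distinct values (our understanding of the library, stated here as an assumption), the only missing input is the split of the 8 "other" values. They cover 12 rows and 4 values occur exactly once (Unique: 4), so the remaining four must occur exactly twice each. With those inferred counts the formula gives 61.4%, matching the alert.

```python
import math

# Value counts of 건설폐기물 종류 from the table above. The 8 "other values"
# (12 rows, 4 of them unique) are inferred to be four values seen twice
# and four seen once.
COUNTS = [5411, 3499, 530, 166, 154, 78, 75, 39, 23, 13] + [2] * 4 + [1] * 4


def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for perfectly balanced classes,
    approaching 1 when a single class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum(c / total * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


print(f"{imbalance_score(COUNTS):.1%}")  # 61.4%
```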

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
건설폐재류:폐콘크리트 | 5411 | 51.2%
혼합건설폐기물 | 3499 | 33.1%
건설폐재류:폐아스팔트콘크리트 | 530 | 5.0%
건설폐재류:건설폐토석 | 166 | 1.6%
폐합성수지 | 154 | 1.5%
제외한다 | 80 | 0.8%
5톤 | 78 | 0.7%
이상인 | 78 | 0.7%
경우는 | 78 | 0.7%
임목폐기물이 | 78 | 0.7%
Other values (27) | 417 | 3.9%

지번주소(발주자) (lot-number address of the ordering client)
Text

Distinct: 1604
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 50
Median length: 43
Mean length: 16.0707
Min length: 1

Characters and Unicode

Total characters: 160707
Distinct characters: 411
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 423
Unique (%): 4.2%

Sample

1st row: 서울특별시 영등포구 신길동 238번지 33호 네오캐슬
2nd row: (blank)
3rd row: 서울특별시 동작구 노량진동 47번지 2호 동작구청
4th row: (blank)
5th row: 서울특별시 동작구 노량진동 232번지 142호

Most occurring words

Value | Count | Frequency (%)
서울특별시 | 5561 | 17.2%
동작구 | 4506 | 13.9%
상도동 | 1274 | 3.9%
사당동 | 1041 | 3.2%
노량진동 | 787 | 2.4%
신대방동 | 432 | 1.3%
1호 | 431 | 1.3%
2호 | 386 | 1.2%
동작구청 | 286 | 0.9%
대방동 | 281 | 0.9%
Other values (2067) | 17327 | 53.6%

Most occurring characters

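Value | Count | Frequency (%)
(space) | 29892 | 18.6%
[Hangul, not preserved] | 11648 | 7.2%
[Hangul, not preserved] | 6535 | 4.1%
[Hangul, not preserved] | 6358 | 4.0%
[Hangul, not preserved] | 6309 | 3.9%
1 | 6117 | 3.8%
[Hangul, not preserved] | 5880 | 3.7%
[Hangul, not preserved] | 5728 | 3.6%
[Hangul, not preserved] | 5726 | 3.6%
[Hangul, not preserved] | 5169 | 3.2%
Other values (401) | 71345 | 44.4%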

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 99414 | 61.9%
Space Separator | 29892 | 18.6%
Decimal Number | 29385 | 18.3%
Dash Punctuation | 1475 | 0.9%
Uppercase Letter | 206 | 0.1%
Connector Punctuation | 198 | 0.1%
Lowercase Letter | 58 | < 0.1%
Close Punctuation | 29 | < 0.1%
Open Punctuation | 29 | < 0.1%
Other Punctuation | 21 | < 0.1%

Most frequent character per category

Other Letter
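Value | Count | Frequency (%)
[not preserved] | 11648 | 11.7%
[not preserved] | 6535 | 6.6%
[not preserved] | 6358 | 6.4%
[not preserved] | 6309 | 6.3%
[not preserved] | 5880 | 5.9%
[not preserved] | 5728 | 5.8%
[not preserved] | 5726 | 5.8%
[not preserved] | 5169 | 5.2%
[not preserved] | 5050 | 5.1%
[not preserved] | 4859 | 4.9%
Other values (357) | 36152 | 36.4%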
Uppercase Letter
Value | Count | Frequency (%)
K | 25 | 12.1%
T | 24 | 11.7%
B | 21 | 10.2%
S | 21 | 10.2%
H | 18 | 8.7%
C | 13 | 6.3%
G | 11 | 5.3%
P | 11 | 5.3%
A | 10 | 4.9%
L | 9 | 4.4%
Other values (10) | 43 | 20.9%
Decimal Number
Value | Count | Frequency (%)
1 | 6117 | 20.8%
2 | 4936 | 16.8%
4 | 3390 | 11.5%
3 | 3140 | 10.7%
7 | 2326 | 7.9%
5 | 2198 | 7.5%
0 | 2174 | 7.4%
6 | 1955 | 6.7%
9 | 1614 | 5.5%
8 | 1535 | 5.2%
Lowercase Letter
Value | Count | Frequency (%)
e | 20 | 34.5%
t | 7 | 12.1%
s | 7 | 12.1%
b | 6 | 10.3%
o | 6 | 10.3%
n | 6 | 10.3%
l | 6 | 10.3%
Other Punctuation
Value | Count | Frequency (%)
. | 13 | 61.9%
& | 8 | 38.1%
Space Separator
Value | Count | Frequency (%)
(space) | 29892 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 1475 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 198 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 29 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 29 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 99414 | 61.9%
Common | 61029 | 38.0%
Latin | 264 | 0.2%

Most frequent character per script

Hangul
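Value | Count | Frequency (%)
[not preserved] | 11648 | 11.7%
[not preserved] | 6535 | 6.6%
[not preserved] | 6358 | 6.4%
[not preserved] | 6309 | 6.3%
[not preserved] | 5880 | 5.9%
[not preserved] | 5728 | 5.8%
[not preserved] | 5726 | 5.8%
[not preserved] | 5169 | 5.2%
[not preserved] | 5050 | 5.1%
[not preserved] | 4859 | 4.9%
Other values (357) | 36152 | 36.4%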
Latin
Value | Count | Frequency (%)
K | 25 | 9.5%
T | 24 | 9.1%
B | 21 | 8.0%
S | 21 | 8.0%
e | 20 | 7.6%
H | 18 | 6.8%
C | 13 | 4.9%
G | 11 | 4.2%
P | 11 | 4.2%
A | 10 | 3.8%
Other values (17) | 90 | 34.1%
Common
Value | Count | Frequency (%)
(space) | 29892 | 49.0%
1 | 6117 | 10.0%
2 | 4936 | 8.1%
4 | 3390 | 5.6%
3 | 3140 | 5.1%
7 | 2326 | 3.8%
5 | 2198 | 3.6%
0 | 2174 | 3.6%
6 | 1955 | 3.2%
9 | 1614 | 2.6%
Other values (7) | 3287 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 99414 | 61.9%
ASCII | 61293 | 38.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 29892 | 48.8%
1 | 6117 | 10.0%
2 | 4936 | 8.1%
4 | 3390 | 5.5%
3 | 3140 | 5.1%
7 | 2326 | 3.8%
5 | 2198 | 3.6%
0 | 2174 | 3.5%
6 | 1955 | 3.2%
9 | 1614 | 2.6%
Other values (34) | 3551 | 5.8%
Hangul
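Value | Count | Frequency (%)
[not preserved] | 11648 | 11.7%
[not preserved] | 6535 | 6.6%
[not preserved] | 6358 | 6.4%
[not preserved] | 6309 | 6.3%
[not preserved] | 5880 | 5.9%
[not preserved] | 5728 | 5.8%
[not preserved] | 5726 | 5.8%
[not preserved] | 5169 | 5.2%
[not preserved] | 5050 | 5.1%
[not preserved] | 4859 | 4.9%
Other values (357) | 36152 | 36.4%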

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you spot patterns in data completion at a glance.
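The report finds no missing cells, yet the 2nd and 4th sample rows of 지번주소(발주자) look blank and that column's minimum length is 1. If the file was read with pandas defaults, truly empty fields would have become NaN and been counted as missing, so these blanks are plausibly whitespace-only strings; this is an inference, not something the report states. A small sketch that counts empty and whitespace-only values per column over rows stored as dicts (`blank_counts` is a hypothetical helper):

```python
def blank_counts(rows, columns):
    """Per column, return (empty, whitespace_only) counts.

    Whitespace-only strings are not "missing" to a profiler, but in
    practice they usually mean that no value was recorded.
    """
    result = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        empty = sum(1 for v in values if v is None or v == "")
        whitespace = sum(1 for v in values if v and not v.strip())
        result[col] = (empty, whitespace)
    return result


rows = [
    {"지번주소(발주자)": "서울특별시 동작구 노량진동 232번지 142호"},
    {"지번주소(발주자)": " "},
]
print(blank_counts(rows, ["지번주소(발주자)"]))
```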

Sample

Index | 상호 | 건설폐기물 종류 | 지번주소(발주자)
1740 | 동광건설산업(주) | 건설폐재류:폐콘크리트 | 서울특별시 영등포구 신길동 238번지 33호 네오캐슬
7462 | 대림산업(주) | 폐목재(나무의 뿌리ㆍ가지 등 임목폐기물이 5톤 이상인 경우는 제외한다) |
4462 | 명문종합건설(주) | 혼합건설폐기물 | 서울특별시 동작구 노량진동 47번지 2호 동작구청
7455 | 대림산업(주) | 건설폐재류:폐콘크리트 |
1466 | (주)현암도시개발 | 건설폐재류:폐콘크리트 | 서울특별시 동작구 노량진동 232번지 142호
3234 | (주)성현이엔씨 | 혼합건설폐기물 | 서울특별시 동작구 대방동 390번지 5호 서울공업고등학교
4382 | (주)터원 | 건설폐재류:폐콘크리트 | 서울특별시 동작구 노량진동 119번지 100호
9382 | 정대엔지니어링(주) | 건설폐재류:폐아스팔트콘크리트 |
7532 | 주언건설(주) | 혼합건설폐기물 |
9751 | 태라관광주식회사 | 혼합건설폐기물 | 서울특별시 동작구 상도동 353-2번지

Index | 상호 | 건설폐기물 종류 | 지번주소(발주자)
6751 | 롯데건설(주) | 폐합성수지 |
5126 | 삼경건기 | 건설폐재류:폐콘크리트 |
8903 | 개인 | 혼합건설폐기물 |
1886 | 동신물류 | 건설폐재류:폐콘크리트 | 서울특별시 종로구 세종로 211번지 광화문빌딩
2113 | (주)청산건설 | 건설폐재류:폐콘크리트 | 서울특별시 동작구 사당동 432번지 21호
9464 | (주)하오삼건설 | 건설폐재류:폐콘크리트 |
9717 | 인투종합건설(주) | 건설폐재류:폐콘크리트 | 서울특별시 송파구 오금동 152번지 1호
8626 | 개인 | 혼합건설폐기물 |
9025 | (주)엠케이지종합건설 | 혼합건설폐기물 |
9545 | 두리공영(주) | 혼합건설폐기물 | 서울특별시 동작구 사당동 1034번지 33호

Duplicate rows

Most frequently occurring

Index | 상호 | 건설폐기물 종류 | 지번주소(발주자) | # duplicates
455 | 개인 | 건설폐재류:폐콘크리트 | | 695
524 | 개인 | 혼합건설폐기물 | | 515
1179 | 주언건설(주) | 건설폐재류:폐콘크리트 | | 57
318 | (주)터원 | 건설폐재류:폐콘크리트 | | 52
745 | 동작구청 | 건설폐재류:폐콘크리트 | 서울특별시 동작구 노량진동 47-2번지 | 47
348 | (주)터원 | 혼합건설폐기물 | | 45
215 | (주)청산건설 | 건설폐재류:폐콘크리트 | | 41
2 | (주)가람주택건설 | 건설폐재류:폐콘크리트 | | 35
35 | (주)대우건설 | 건설폐재류:폐콘크리트 | | 34
1080 | 윤정이엔씨(주) | 건설폐재류:폐콘크리트 | | 34
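The # duplicates column gives the number of times each row occurs. Surplus copies in just the first five groups already total 694 + 514 + 56 + 51 + 46 = 1361, more than the 1356 reported under Dataset statistics, so that figure appears to count distinct row values that occur more than once, not the rows that deduplication would remove. The sketch below computes both quantities from row tuples so the two readings can be compared (`duplicate_summary` is a hypothetical name).

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (distinct duplicated rows, surplus copies) for a list of row tuples.

    distinct duplicated rows: how many different row values occur more than once.
    surplus copies: rows removed when keeping one copy of each value.
    """
    counts = Counter(rows)
    repeated = [c for c in counts.values() if c > 1]
    return len(repeated), sum(c - 1 for c in repeated)


rows = (
    [("개인", "혼합건설폐기물", "")] * 3
    + [("(주)터원", "건설폐재류:폐콘크리트", "")] * 2
    + [("동작구청", "혼합건설폐기물", "")]
)
print(duplicate_summary(rows))  # (2, 3)
```

With pandas, `df.duplicated().sum()` gives the surplus-copy count, since `keep="first"` is the default.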