Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 29811
Missing cells (%): 59.6%
Duplicate rows: 4
Duplicate rows (%): < 0.1%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
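
The missing-cell figures can be cross-checked with a little arithmetic. The sketch below assumes, based on the per-variable counts reported further down, that each of the three text variables has 9937 missing values and that the two categorical variables contribute none (they report 0 missing and appear to hold `<NA>` as a literal value).

```python
# Figures copied from the report's overview and per-variable statistics.
n_variables = 5
n_observations = 10_000
missing_per_text_column = 9_937  # 상호, 사업장도로명주소, 사업장지번주소
n_text_columns = 3

total_cells = n_variables * n_observations
missing_cells = missing_per_text_column * n_text_columns
missing_pct = round(100 * missing_cells / total_cells, 1)

print(missing_cells, missing_pct)
```

Both values agree with the overview: 29811 missing cells out of 50000, or 59.6%.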

Variable types

Text: 3
Categorical: 2

Dataset

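Description: Registry of industrial waste (사업장폐기물) discharger declarations in Muan-gun, Jeollanam-do. Fields include the business name (상호), road-name address (사업장도로명주소), lot-number address (사업장지번주소), classification (생활계구분), and waste type (폐기물종류).
URL: https://www.data.go.kr/data/15061982/fileData.do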

Alerts

Dataset has 4 (< 0.1%) duplicate rows [Duplicates]
생활계구분 is highly imbalanced (96.8%) [Imbalance]
폐기물 종류 is highly imbalanced (98.3%) [Imbalance]
상호 has 9937 (99.4%) missing values [Missing]
사업장도로명주소 has 9937 (99.4%) missing values [Missing]
사업장지번주소 has 9937 (99.4%) missing values [Missing]
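
The imbalance percentage can be reproduced from the value counts. This is a minimal sketch that assumes the score is one minus the Shannon entropy of the class proportions, normalized by the maximum entropy for that number of classes. The function name `imbalance_score` is hypothetical; the counts are those the report lists for 생활계구분.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class proportions."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Value counts reported for 생활계구분: <NA>, 배출시설계, 비배출시설계, and a blank value.
counts = [9937, 36, 22, 5]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

Under this assumption the result rounds to the 96.8% shown in the alert. A perfectly balanced variable would score 0%.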

Reproduction

Analysis started: 2023-12-12 08:04:57.839514
Analysis finished: 2023-12-12 08:04:58.557073
Duration: 0.72 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상호
Text

MISSING 

Distinct: 47
Distinct (%): 74.6%
Missing: 9937
Missing (%): 99.4%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 15
Mean length: 8.9206349
Min length: 4

Characters and Unicode

Total characters: 562
Distinct characters: 141
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
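
As an illustration of how such character properties can be read, the sketch below counts the Unicode general category of each character in one sample value using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report (`Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation). The standard module does not expose script or block properties, so those are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample value of 상호 from the report.
sample = "(유)남해환경"
counts = category_counts(sample)
print(dict(counts))
```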

Unique

Unique: 38
Unique (%): 60.3%

Sample

1st row: (유)남해환경
2nd row: 서나산업
3rd row: 주식회사 금화
4th row: 천지영농조합법인
5th row: 보울앤플레이트(주)

Value | Count | Frequency (%)
유)남해환경 | 4 | 5.2%
주식회사 | 4 | 5.2%
유한회사 | 4 | 5.2%
드림에너지 | 3 | 3.9%
농업회사법인유한회사성아축산 | 3 | 3.9%
주)동양환경 | 3 | 3.9%
천지영농조합법인 | 3 | 3.9%
금화 | 3 | 3.9%
영농조합법인 | 2 | 2.6%
주)대기산업 | 2 | 2.6%
Other values (44) | 46 | 59.7%

Most occurring characters

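Value | Count | Frequency (%)
( | 26 | 4.6%
) | 26 | 4.6%
(not preserved) | 23 | 4.1%
(not preserved) | 21 | 3.7%
(not preserved) | 18 | 3.2%
(not preserved) | 18 | 3.2%
(not preserved) | 17 | 3.0%
(not preserved) | 17 | 3.0%
(not preserved) | 17 | 3.0%
(not preserved) | 17 | 3.0%
Other values (131) | 362 | 64.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 490 | 87.2%
Open Punctuation | 26 | 4.6%
Close Punctuation | 26 | 4.6%
Space Separator | 14 | 2.5%
Uppercase Letter | 6 | 1.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 23 | 4.7%
(not preserved) | 21 | 4.3%
(not preserved) | 18 | 3.7%
(not preserved) | 18 | 3.7%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 16 | 3.3%
(not preserved) | 15 | 3.1%
Other values (122) | 311 | 63.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 16.7%
F | 1 | 16.7%
R | 1 | 16.7%
P | 1 | 16.7%
E | 1 | 16.7%
N | 1 | 16.7%

Open Punctuation
Value | Count | Frequency (%)
( | 26 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 26 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 14 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 490 | 87.2%
Common | 66 | 11.7%
Latin | 6 | 1.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 23 | 4.7%
(not preserved) | 21 | 4.3%
(not preserved) | 18 | 3.7%
(not preserved) | 18 | 3.7%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 16 | 3.3%
(not preserved) | 15 | 3.1%
Other values (122) | 311 | 63.5%

Latin
Value | Count | Frequency (%)
C | 1 | 16.7%
F | 1 | 16.7%
R | 1 | 16.7%
P | 1 | 16.7%
E | 1 | 16.7%
N | 1 | 16.7%

Common
Value | Count | Frequency (%)
( | 26 | 39.4%
) | 26 | 39.4%
(space) | 14 | 21.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 490 | 87.2%
ASCII | 72 | 12.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
( | 26 | 36.1%
) | 26 | 36.1%
(space) | 14 | 19.4%
C | 1 | 1.4%
F | 1 | 1.4%
R | 1 | 1.4%
P | 1 | 1.4%
E | 1 | 1.4%
N | 1 | 1.4%

Hangul
Value | Count | Frequency (%)
(not preserved) | 23 | 4.7%
(not preserved) | 21 | 4.3%
(not preserved) | 18 | 3.7%
(not preserved) | 18 | 3.7%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 17 | 3.5%
(not preserved) | 16 | 3.3%
(not preserved) | 15 | 3.1%
Other values (122) | 311 | 63.5%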

사업장도로명주소
Text

MISSING 

Distinct: 46
Distinct (%): 73.0%
Missing: 9937
Missing (%): 99.4%
Memory size: 156.2 KiB

Length

Max length: 39
Median length: 35
Mean length: 24.539683
Min length: 19

Characters and Unicode

Total characters: 1546
Distinct characters: 102
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 36
Unique (%): 57.1%

Sample

1st row: 전라남도 무안군 삼향읍 삼향중앙로 140-51 (유)남해환경
2nd row: 전라남도 무안군 삼향읍 영산로 1392-60
3rd row: 전라남도 무안군 청계면 청계공단1길 128
4th row: 전라남도 무안군 청계면 청운로 322-163_ 성아축산
5th row: 전라남도 무안군 청계면 청계공단길 70

Value | Count | Frequency (%)
전라남도 | 63 | 19.1%
무안군 | 63 | 19.1%
청계면 | 31 | 9.4%
삼향읍 | 11 | 3.3%
영산로 | 8 | 2.4%
일로읍 | 7 | 2.1%
청계공단길 | 7 | 2.1%
해안로 | 5 | 1.5%
삼향중앙로 | 5 | 1.5%
청계공단1길 | 4 | 1.2%
Other values (81) | 126 | 38.2%

Most occurring characters

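Value | Count | Frequency (%)
(space) | 267 | 17.3%
(not preserved) | 73 | 4.7%
(not preserved) | 69 | 4.5%
(not preserved) | 68 | 4.4%
(not preserved) | 64 | 4.1%
(not preserved) | 63 | 4.1%
(not preserved) | 63 | 4.1%
(not preserved) | 63 | 4.1%
1 | 60 | 3.9%
(not preserved) | 47 | 3.0%
Other values (92) | 709 | 45.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 963 | 62.3%
Space Separator | 267 | 17.3%
Decimal Number | 258 | 16.7%
Dash Punctuation | 35 | 2.3%
Open Punctuation | 9 | 0.6%
Close Punctuation | 9 | 0.6%
Connector Punctuation | 5 | 0.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 73 | 7.6%
(not preserved) | 69 | 7.2%
(not preserved) | 68 | 7.1%
(not preserved) | 64 | 6.6%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 47 | 4.9%
(not preserved) | 46 | 4.8%
(not preserved) | 43 | 4.5%
Other values (77) | 364 | 37.8%

Decimal Number
Value | Count | Frequency (%)
1 | 60 | 23.3%
2 | 35 | 13.6%
5 | 30 | 11.6%
6 | 26 | 10.1%
0 | 24 | 9.3%
3 | 21 | 8.1%
8 | 20 | 7.8%
4 | 19 | 7.4%
9 | 15 | 5.8%
7 | 8 | 3.1%

Space Separator
Value | Count | Frequency (%)
(space) | 267 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 35 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 9 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 9 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 963 | 62.3%
Common | 583 | 37.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 73 | 7.6%
(not preserved) | 69 | 7.2%
(not preserved) | 68 | 7.1%
(not preserved) | 64 | 6.6%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 47 | 4.9%
(not preserved) | 46 | 4.8%
(not preserved) | 43 | 4.5%
Other values (77) | 364 | 37.8%

Common
Value | Count | Frequency (%)
(space) | 267 | 45.8%
1 | 60 | 10.3%
2 | 35 | 6.0%
- | 35 | 6.0%
5 | 30 | 5.1%
6 | 26 | 4.5%
0 | 24 | 4.1%
3 | 21 | 3.6%
8 | 20 | 3.4%
4 | 19 | 3.3%
Other values (5) | 46 | 7.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 963 | 62.3%
ASCII | 583 | 37.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 267 | 45.8%
1 | 60 | 10.3%
2 | 35 | 6.0%
- | 35 | 6.0%
5 | 30 | 5.1%
6 | 26 | 4.5%
0 | 24 | 4.1%
3 | 21 | 3.6%
8 | 20 | 3.4%
4 | 19 | 3.3%
Other values (5) | 46 | 7.9%

Hangul
Value | Count | Frequency (%)
(not preserved) | 73 | 7.6%
(not preserved) | 69 | 7.2%
(not preserved) | 68 | 7.1%
(not preserved) | 64 | 6.6%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 63 | 6.5%
(not preserved) | 47 | 4.9%
(not preserved) | 46 | 4.8%
(not preserved) | 43 | 4.5%
Other values (77) | 364 | 37.8%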

사업장지번주소
Text

MISSING 

Distinct: 44
Distinct (%): 69.8%
Missing: 9937
Missing (%): 99.4%
Memory size: 156.2 KiB

Length

Max length: 40
Median length: 36
Mean length: 22.809524
Min length: 1

Characters and Unicode

Total characters: 1437
Distinct characters: 98
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 52.4%

Sample

1st row: 전라남도 무안군 삼향읍 유교리 350-1 (유)남해환경
2nd row: (blank)
3rd row: 전라남도 무안군 청계면 송현리 637-2
4th row: 전라남도 무안군 청계면 서호리 922-4 성아축산
5th row: 전라남도 무안군 청계면 청수리 556-2

Value | Count | Frequency (%)
전라남도 | 60 | 19.2%
무안군 | 60 | 19.2%
청계면 | 31 | 9.9%
삼향읍 | 9 | 2.9%
청수리 | 8 | 2.6%
일로읍 | 7 | 2.2%
송현리 | 6 | 1.9%
유교리 | 4 | 1.3%
350-1 | 4 | 1.3%
유)남해환경 | 4 | 1.3%
Other values (81) | 120 | 38.3%

Most occurring characters

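Value | Count | Frequency (%)
(space) | 319 | 22.2%
(not preserved) | 67 | 4.7%
(not preserved) | 63 | 4.4%
(not preserved) | 63 | 4.4%
(not preserved) | 62 | 4.3%
(not preserved) | 61 | 4.2%
(not preserved) | 60 | 4.2%
(not preserved) | 60 | 4.2%
(not preserved) | 60 | 4.2%
(not preserved) | 43 | 3.0%
Other values (88) | 579 | 40.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 850 | 59.2%
Space Separator | 319 | 22.2%
Decimal Number | 221 | 15.4%
Dash Punctuation | 37 | 2.6%
Close Punctuation | 5 | 0.3%
Open Punctuation | 5 | 0.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 67 | 7.9%
(not preserved) | 63 | 7.4%
(not preserved) | 63 | 7.4%
(not preserved) | 62 | 7.3%
(not preserved) | 61 | 7.2%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 43 | 5.1%
(not preserved) | 41 | 4.8%
Other values (74) | 270 | 31.8%

Decimal Number
Value | Count | Frequency (%)
1 | 37 | 16.7%
5 | 37 | 16.7%
3 | 26 | 11.8%
2 | 25 | 11.3%
6 | 20 | 9.0%
4 | 16 | 7.2%
8 | 15 | 6.8%
7 | 15 | 6.8%
9 | 15 | 6.8%
0 | 15 | 6.8%

Space Separator
Value | Count | Frequency (%)
(space) | 319 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 37 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 5 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 850 | 59.2%
Common | 587 | 40.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 67 | 7.9%
(not preserved) | 63 | 7.4%
(not preserved) | 63 | 7.4%
(not preserved) | 62 | 7.3%
(not preserved) | 61 | 7.2%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 43 | 5.1%
(not preserved) | 41 | 4.8%
Other values (74) | 270 | 31.8%

Common
Value | Count | Frequency (%)
(space) | 319 | 54.3%
1 | 37 | 6.3%
- | 37 | 6.3%
5 | 37 | 6.3%
3 | 26 | 4.4%
2 | 25 | 4.3%
6 | 20 | 3.4%
4 | 16 | 2.7%
8 | 15 | 2.6%
7 | 15 | 2.6%
Other values (4) | 40 | 6.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 850 | 59.2%
ASCII | 587 | 40.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 319 | 54.3%
1 | 37 | 6.3%
- | 37 | 6.3%
5 | 37 | 6.3%
3 | 26 | 4.4%
2 | 25 | 4.3%
6 | 20 | 3.4%
4 | 16 | 2.7%
8 | 15 | 2.6%
7 | 15 | 2.6%
Other values (4) | 40 | 6.8%

Hangul
Value | Count | Frequency (%)
(not preserved) | 67 | 7.9%
(not preserved) | 63 | 7.4%
(not preserved) | 63 | 7.4%
(not preserved) | 62 | 7.3%
(not preserved) | 61 | 7.2%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 60 | 7.1%
(not preserved) | 43 | 5.1%
(not preserved) | 41 | 4.8%
Other values (74) | 270 | 31.8%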

생활계구분
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9937
배출시설계 | 36
비배출시설계 | 22
(blank) | 5

Length

Max length: 6
Median length: 4
Mean length: 4.0065
Min length: 1
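
The mean length follows directly from the value counts. The sketch below treats `<NA>` as a literal four-character string, which is consistent with this variable reporting 0 missing values. The fourth value is blank in the report; representing it as a single whitespace character is an assumption based on the reported minimum length of 1.

```python
# Value counts for 생활계구분 taken from the report.
# The " " key is an assumed stand-in for the blank value (reported min length is 1).
value_counts = {"<NA>": 9937, "배출시설계": 36, "비배출시설계": 22, " ": 5}

total_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, mean_length)
```

With these inputs the 10000 rows hold 40065 characters in total, which gives the reported mean of 4.0065.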

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9937 | 99.4%
배출시설계 | 36 | 0.4%
비배출시설계 | 22 | 0.2%
(blank) | 5 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 9937 | 99.4%
배출시설계 | 36 | 0.4%
비배출시설계 | 22 | 0.2%

폐기물 종류
Categorical

IMBALANCE 

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9937
폐합성수지류(폐염화비닐수지류는 제외한다) | 11
그 밖의 식물성잔재물 | 8
가축분뇨처리오니 | 6
폐도자기 조각 | 5
Other values (20) | 33

Length

Max length: 84
Median length: 4
Mean length: 4.057
Min length: 3

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9937 | 99.4%
폐합성수지류(폐염화비닐수지류는 제외한다) | 11 | 0.1%
그 밖의 식물성잔재물 | 8 | 0.1%
가축분뇨처리오니 | 6 | 0.1%
폐도자기 조각 | 5 | 0.1%
폐합성수지류 | 3 | < 0.1%
폐콘크리트 | 3 | < 0.1%
초본류 | 2 | < 0.1%
폐수처리오니 | 2 | < 0.1%
하수처리오니 | 2 | < 0.1%
Other values (15) | 21 | 0.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
na | 9937 | 98.5%
(not preserved) | 16 | 0.2%
밖의 | 16 | 0.2%
제외한다 | 12 | 0.1%
폐합성수지류(폐염화비닐수지류는 | 11 | 0.1%
식물성잔재물 | 8 | 0.1%
가축분뇨처리오니 | 6 | 0.1%
사업장폐기물 | 6 | 0.1%
폐도자기 | 5 | < 0.1%
조각 | 5 | < 0.1%
Other values (46) | 67 | 0.7%

Correlations

Variable | 상호 | 사업장도로명주소 | 사업장지번주소 | 생활계구분 | 폐기물 종류
상호 | 1.000 | 1.000 | 1.000 | 0.885 | 0.000
사업장도로명주소 | 1.000 | 1.000 | 1.000 | 0.896 | 0.000
사업장지번주소 | 1.000 | 1.000 | 1.000 | 0.914 | 0.580
생활계구분 | 0.885 | 0.896 | 0.914 | 1.000 | 0.680
폐기물 종류 | 0.000 | 0.000 | 0.580 | 0.680 | 1.000

Variable | 폐기물 종류 | 생활계구분
폐기물 종류 | 1.000 | 0.327
생활계구분 | 0.327 | 1.000

Variable | 생활계구분 | 폐기물 종류
생활계구분 | 1.000 | 0.327
폐기물 종류 | 0.327 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
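
One common way to compute nullity correlation is to take the Pearson correlation between the missing-value indicators of two columns. The sketch below does this on toy data. The function name `nullity_correlation` and the three example columns are hypothetical, and this is not claimed to be the exact routine the profiling tool runs.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if value is None else 0.0 for value in col_a]
    b = [1.0 if value is None else 0.0 for value in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical): name and address are missing in the same rows,
# while note is missing exactly where the other two are present.
name = ["A", None, None, "B"]
address = ["x", None, None, "y"]
note = [None, "n1", "n2", None]

same = nullity_correlation(name, address)
opposite = nullity_correlation(name, note)
print(same, opposite)
```

A value near 1 means two variables tend to be missing together, and a value near -1 means one tends to be present when the other is missing.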

Sample

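Index | 상호 | 사업장도로명주소 | 사업장지번주소 | 생활계구분 | 폐기물 종류
27044 | <NA> | <NA> | <NA> | <NA> | <NA>
9192 | <NA> | <NA> | <NA> | <NA> | <NA>
59065 | <NA> | <NA> | <NA> | <NA> | <NA>
4071 | <NA> | <NA> | <NA> | <NA> | <NA>
36926 | <NA> | <NA> | <NA> | <NA> | <NA>
48204 | <NA> | <NA> | <NA> | <NA> | <NA>
19905 | <NA> | <NA> | <NA> | <NA> | <NA>
17440 | <NA> | <NA> | <NA> | <NA> | <NA>
22552 | <NA> | <NA> | <NA> | <NA> | <NA>
42295 | <NA> | <NA> | <NA> | <NA> | <NA>

Index | 상호 | 사업장도로명주소 | 사업장지번주소 | 생활계구분 | 폐기물 종류
40297 | <NA> | <NA> | <NA> | <NA> | <NA>
59661 | <NA> | <NA> | <NA> | <NA> | <NA>
21525 | <NA> | <NA> | <NA> | <NA> | <NA>
31945 | <NA> | <NA> | <NA> | <NA> | <NA>
41630 | <NA> | <NA> | <NA> | <NA> | <NA>
61687 | <NA> | <NA> | <NA> | <NA> | <NA>
46671 | <NA> | <NA> | <NA> | <NA> | <NA>
20591 | <NA> | <NA> | <NA> | <NA> | <NA>
30395 | <NA> | <NA> | <NA> | <NA> | <NA>
12120 | <NA> | <NA> | <NA> | <NA> | <NA>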

Duplicate rows

Most frequently occurring

Index | 상호 | 사업장도로명주소 | 사업장지번주소 | 생활계구분 | 폐기물 종류 | # duplicates
3 | <NA> | <NA> | <NA> | <NA> | <NA> | 9937
2 | 천지영농조합법인 | 전라남도 무안군 청계면 청운로 322-163_ 성아축산 | 전라남도 무안군 청계면 서호리 922-4 성아축산 | 배출시설계 | 가축분뇨처리오니 | 3
0 | (유)남해환경 | 전라남도 무안군 삼향읍 삼향중앙로 140-51 (유)남해환경 | 전라남도 무안군 삼향읍 유교리 350-1 (유)남해환경 | 배출시설계 | 사업장폐기물 소각시설 바닥재 | 2
1 | 농업회사법인유한회사성아축산 | 전라남도 무안군 현경면 오류길 287-88 | 전라남도 무안군 현경면 오류리 1091-1 | 배출시설계 | 가축분뇨처리오니 | 2
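
A duplicate summary of this kind can be sketched by counting identical rows and keeping the groups that occur more than once. The helper `duplicate_groups` and the toy rows are hypothetical: the rows are modelled on two of the groups listed here, reduced to two fields, plus one made-up unique row.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Toy rows (hypothetical subset): (상호, 폐기물 종류) pairs.
rows = [
    ("천지영농조합법인", "가축분뇨처리오니"),
    ("천지영농조합법인", "가축분뇨처리오니"),
    ("천지영농조합법인", "가축분뇨처리오니"),
    ("(유)남해환경", "사업장폐기물 소각시설 바닥재"),
    ("(유)남해환경", "사업장폐기물 소각시설 바닥재"),
    ("unique-example", "n/a"),  # made-up row that appears only once
]

groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(n, row)
```

Counted this way, the number of groups (not the number of repeated rows) is what lines up with the "4 duplicate rows" figure in the overview, since the listing here contains four groups.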