Overview

Dataset statistics

Number of variables: 7
Number of observations: 723
Missing cells: 1451
Missing cells (%): 28.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 41.1 KiB
Average record size in memory: 58.2 B
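
The missing-cell percentage and the average record size can be checked against the per-variable missing counts reported in the Variables section. This is a minimal sketch, assuming the percentage is missing cells divided by (observations × variables) and that 1 KiB is 1024 bytes; the name `missing_per_variable` is illustrative and not part of the report.

```python
# Per-variable missing counts as reported in the Variables section of this report.
missing_per_variable = {
    "시설구분": 0,
    "업소명": 2,
    "주소": 1,
    "종별": 0,
    "업종": 25,
    "전화번호": 700,
    "Unnamed: 6": 723,
}

n_observations = 723
n_variables = len(missing_per_variable)

missing_cells = sum(missing_per_variable.values())
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 KiB = 1024 bytes.
avg_record_bytes = 41.1 * 1024 / n_observations

print(missing_cells, f"{missing_pct:.1f}%", f"{avg_record_bytes:.1f} B")
```

Almost all of the missing cells come from two columns, 전화번호 and Unnamed: 6.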

Variable types

Categorical: 2
Text: 4
Unsupported: 1

Dataset

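Description: Business name, site address, industry type, telephone number, and related details of air and wastewater emission facilities in Yangsan-si, Gyeongsangnam-do, so that the status of emission facilities can be checked by eup/myeon/dong (administrative district).
Author: Yangsan-si, Gyeongsangnam-do (경상남도 양산시)
URL: https://www.data.go.kr/data/3040406/fileData.do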

Alerts

시설구분 is highly imbalanced (86.2%) [Imbalance]
업종 has 25 (3.5%) missing values [Missing]
전화번호 has 700 (96.8%) missing values [Missing]
Unnamed: 6 has 723 (100.0%) missing values [Missing]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
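
The last two alerts point at a column that holds no data at all. pandas assigns names of the form `Unnamed: N` to columns whose header cell is empty. One possible cause, assumed here and not confirmed by the source, is a trailing delimiter on each line of the CSV file. The sketch below uses only the standard library and made-up rows (`raw_csv` is hypothetical) to show how such a column could be detected and dropped.

```python
import csv
import io

# Hypothetical two-row extract; the trailing comma on each line is an assumption.
raw_csv = (
    "시설구분,업소명,종별,\n"
    "대기배출시설,A사,4,\n"
    "폐수배출시설,B사,5,\n"
)

rows = list(csv.reader(io.StringIO(raw_csv)))
header, body = rows[0], rows[1:]

# Keep only columns that have a header or at least one non-empty cell.
keep = [
    i for i, name in enumerate(header)
    if name.strip() or any(row[i].strip() for row in body)
]

clean_header = [header[i] for i in keep]
clean_body = [[row[i] for i in keep] for row in body]

print(clean_header)
```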

Reproduction

Analysis started: 2023-12-12 13:32:21.559539
Analysis finished: 2023-12-12 13:32:22.513207
Duration: 0.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
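
The reported duration follows directly from the two timestamps. A small sketch of that arithmetic, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

started = datetime.fromisoformat("2023-12-12 13:32:21.559539")
finished = datetime.fromisoformat("2023-12-12 13:32:22.513207")

# Elapsed wall-clock time between the two report timestamps.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```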

Variables

시설구분 (facility type)
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 KiB

Value | Count
폐수배출시설 (wastewater discharge facility) | 699
대기배출시설 (air emission facility) | 23
<NA> | 1

Length

Max length: 6
Median length: 6
Mean length: 5.9972337
Min length: 4
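
The mean and minimum lengths are consistent with the placeholder `<NA>` being counted as a 4-character string. That reading is inferred from the numbers and is not stated in the report. A sketch of the arithmetic, using the value counts reported for 시설구분:

```python
# Value counts for 시설구분 as reported in this report.
value_counts = {"폐수배출시설": 699, "대기배출시설": 23, "<NA>": 1}

total_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / total_rows
min_length = min(len(value) for value in value_counts)

print(round(mean_length, 7), min_length)
```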

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 대기배출시설
2nd row: 대기배출시설
3rd row: 대기배출시설
4th row: 대기배출시설
5th row: 대기배출시설

Common Values

Value | Count | Frequency (%)
폐수배출시설 | 699 | 96.7%
대기배출시설 | 23 | 3.2%
<NA> | 1 | 0.1%
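
The 86.2% imbalance reported for 시설구분 can be reproduced from these counts as one minus the normalized Shannon entropy of the value distribution. This is a sketch of that calculation and not a copy of the profiler's code; the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy over k classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Counts for 폐수배출시설, 대기배출시설 and <NA>.
score = imbalance_score([699, 23, 1])
print(f"{score:.1%}")
```

A score of 0 corresponds to perfectly balanced classes, and values close to 1 mean that one class dominates.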

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values: 폐수배출시설 699 (96.7%), 대기배출시설 23 (3.2%), na 1 (0.1%)]

업소명 (business name)
Text

Distinct: 714
Distinct (%): 99.0%
Missing: 2
Missing (%): 0.3%
Memory size: 5.8 KiB

Length

Max length: 24
Median length: 19
Mean length: 6.3259362
Min length: 2

Characters and Unicode

Total characters: 4561
Distinct characters: 372
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
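
As an illustration of these character properties, the sketch below counts Unicode general categories for two business names that appear in the Sample of this report, using Python's standard `unicodedata` module. The two-letter codes map to the category names used in the tables: `Lo` is Other Letter, `So` is Other Symbol, `Ps` is Open Punctuation, and `Pe` is Close Punctuation. The helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Lo', 'So', 'Ps') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two sample values taken from this report.
print(category_counts("고려특수선재㈜"))
print(category_counts("넥센타이어(주)"))
```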

Unique

Unique: 707
Unique (%): 98.1%

Sample

1st row: 고려강선(주)양산공장
2nd row: 고려특수선재㈜
3rd row: 넥센타이어(주)
4th row: 동아타이어공업(주)
5th row: (주)디비켐

Value | Count | Frequency (%)
주식회사 | 20 | 2.6%
양산공장 | 5 | 0.6%
2공장 | 3 | 0.4%
진성산업 | 2 | 0.3%
광원산업 | 2 | 0.3%
㈜써테크 | 2 | 0.3%
㈜대원크리닝 | 2 | 0.3%
㈜태일잉크화학 | 2 | 0.3%
양산지점 | 2 | 0.3%
㈜디티알 | 2 | 0.3%
Other values (722) | 733 | 94.6%

Most occurring characters

ValueCountFrequency (%)
489
 
10.7%
145
 
3.2%
127
 
2.8%
115
 
2.5%
99
 
2.2%
80
 
1.8%
78
 
1.7%
76
 
1.7%
75
 
1.6%
72
 
1.6%
Other values (362) 3205
70.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3771 | 82.7%
Other Symbol | 489 | 10.7%
Uppercase Letter | 82 | 1.8%
Space Separator | 76 | 1.7%
Close Punctuation | 45 | 1.0%
Open Punctuation | 45 | 1.0%
Decimal Number | 38 | 0.8%
Other Punctuation | 11 | 0.2%
Dash Punctuation | 3 | 0.1%
Connector Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
145
 
3.8%
127
 
3.4%
115
 
3.0%
99
 
2.6%
80
 
2.1%
78
 
2.1%
75
 
2.0%
72
 
1.9%
65
 
1.7%
64
 
1.7%
Other values (326) 2851
75.6%
Uppercase Letter
Value | Count | Frequency (%)
M | 11 | 13.4%
S | 11 | 13.4%
C | 11 | 13.4%
T | 8 | 9.8%
D | 5 | 6.1%
H | 4 | 4.9%
A | 4 | 4.9%
E | 4 | 4.9%
F | 3 | 3.7%
R | 3 | 3.7%
Other values (11) | 18 | 22.0%
Decimal Number
Value | Count | Frequency (%)
2 | 22 | 57.9%
1 | 9 | 23.7%
3 | 5 | 13.2%
4 | 1 | 2.6%
0 | 1 | 2.6%
Close Punctuation
Value | Count | Frequency (%)
) | 44 | 97.8%
] | 1 | 2.2%
Open Punctuation
Value | Count | Frequency (%)
( | 44 | 97.8%
[ | 1 | 2.2%
Other Punctuation
Value | Count | Frequency (%)
. | 10 | 90.9%
& | 1 | 9.1%
Other Symbol
ValueCountFrequency (%)
489
100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 76 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4260 | 93.4%
Common | 219 | 4.8%
Latin | 82 | 1.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
489
 
11.5%
145
 
3.4%
127
 
3.0%
115
 
2.7%
99
 
2.3%
80
 
1.9%
78
 
1.8%
75
 
1.8%
72
 
1.7%
65
 
1.5%
Other values (327) 2915
68.4%
Latin
Value | Count | Frequency (%)
M | 11 | 13.4%
S | 11 | 13.4%
C | 11 | 13.4%
T | 8 | 9.8%
D | 5 | 6.1%
H | 4 | 4.9%
A | 4 | 4.9%
E | 4 | 4.9%
F | 3 | 3.7%
R | 3 | 3.7%
Other values (11) | 18 | 22.0%
Common
Value | Count | Frequency (%)
(space) | 76 | 34.7%
) | 44 | 20.1%
( | 44 | 20.1%
2 | 22 | 10.0%
. | 10 | 4.6%
1 | 9 | 4.1%
3 | 5 | 2.3%
- | 3 | 1.4%
4 | 1 | 0.5%
0 | 1 | 0.5%
Other values (4) | 4 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3771 | 82.7%
None | 489 | 10.7%
ASCII | 301 | 6.6%

Most frequent character per block

None
ValueCountFrequency (%)
489
100.0%
Hangul
ValueCountFrequency (%)
145
 
3.8%
127
 
3.4%
115
 
3.0%
99
 
2.6%
80
 
2.1%
78
 
2.1%
75
 
2.0%
72
 
1.9%
65
 
1.7%
64
 
1.7%
Other values (326) 2851
75.6%
ASCII
Value | Count | Frequency (%)
(space) | 76 | 25.2%
) | 44 | 14.6%
( | 44 | 14.6%
2 | 22 | 7.3%
M | 11 | 3.7%
S | 11 | 3.7%
C | 11 | 3.7%
. | 10 | 3.3%
1 | 9 | 3.0%
T | 8 | 2.7%
Other values (25) | 55 | 18.3%

주소 (address)
Text

Distinct: 668
Distinct (%): 92.5%
Missing: 1
Missing (%): 0.1%
Memory size: 5.8 KiB

Length

Max length: 22
Median length: 17
Mean length: 10.199446
Min length: 6

Characters and Unicode

Total characters: 7364
Distinct characters: 124
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 625
Unique (%): 86.6%

Sample

1st row: 유산공단7길 15(유산동)
2nd row: 어실로 43(유산동)
3rd row: 충렬로 355(유산동)
4th row: 유산공단11길 11(유산동)
5th row: 새목1길 15(유산동)

Value | Count | Frequency (%)
상북면 | 94 | 6.0%
어실로 | 24 | 1.5%
양산대로 | 23 | 1.5%
충렬로 | 23 | 1.5%
유산공단10길 | 20 | 1.3%
소주로 | 19 | 1.2%
산막공단남11길 | 19 | 1.2%
유산공단3길 | 18 | 1.1%
유산공단8길 | 17 | 1.1%
산막공단북5길 | 16 | 1.0%
Other values (529) | 1304 | 82.7%

Most occurring characters

ValueCountFrequency (%)
1000
 
13.6%
1 546
 
7.4%
498
 
6.8%
429
 
5.8%
405
 
5.5%
322
 
4.4%
2 301
 
4.1%
3 296
 
4.0%
4 254
 
3.4%
226
 
3.1%
Other values (114) 3087
41.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3721 | 50.5%
Decimal Number | 2417 | 32.8%
Space Separator | 1000 | 13.6%
Dash Punctuation | 181 | 2.5%
Open Punctuation | 19 | 0.3%
Close Punctuation | 19 | 0.3%
Other Punctuation | 4 | 0.1%
Uppercase Letter | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
498
13.4%
429
 
11.5%
405
 
10.9%
322
 
8.7%
226
 
6.1%
208
 
5.6%
150
 
4.0%
123
 
3.3%
120
 
3.2%
112
 
3.0%
Other values (95) 1128
30.3%
Decimal Number
Value | Count | Frequency (%)
1 | 546 | 22.6%
2 | 301 | 12.5%
3 | 296 | 12.2%
4 | 254 | 10.5%
5 | 210 | 8.7%
6 | 195 | 8.1%
0 | 180 | 7.4%
7 | 164 | 6.8%
8 | 136 | 5.6%
9 | 135 | 5.6%
Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 33.3%
I | 1 | 33.3%
D | 1 | 33.3%
Other Punctuation
Value | Count | Frequency (%)
, | 3 | 75.0%
: | 1 | 25.0%
Space Separator
Value | Count | Frequency (%)
(space) | 1000 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 181 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 19 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 19 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3721 | 50.5%
Common | 3640 | 49.4%
Latin | 3 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
498
13.4%
429
 
11.5%
405
 
10.9%
322
 
8.7%
226
 
6.1%
208
 
5.6%
150
 
4.0%
123
 
3.3%
120
 
3.2%
112
 
3.0%
Other values (95) 1128
30.3%
Common
Value | Count | Frequency (%)
(space) | 1000 | 27.5%
1 | 546 | 15.0%
2 | 301 | 8.3%
3 | 296 | 8.1%
4 | 254 | 7.0%
5 | 210 | 5.8%
6 | 195 | 5.4%
- | 181 | 5.0%
0 | 180 | 4.9%
7 | 164 | 4.5%
Other values (6) | 313 | 8.6%
Latin
Value | Count | Frequency (%)
C | 1 | 33.3%
I | 1 | 33.3%
D | 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3721 | 50.5%
ASCII | 3643 | 49.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1000 | 27.4%
1 | 546 | 15.0%
2 | 301 | 8.3%
3 | 296 | 8.1%
4 | 254 | 7.0%
5 | 210 | 5.8%
6 | 195 | 5.4%
- | 181 | 5.0%
0 | 180 | 4.9%
7 | 164 | 4.5%
Other values (9) | 316 | 8.7%
Hangul
ValueCountFrequency (%)
498
13.4%
429
 
11.5%
405
 
10.9%
322
 
8.7%
226
 
6.1%
208
 
5.6%
150
 
4.0%
123
 
3.3%
120
 
3.2%
112
 
3.0%
Other values (95) 1128
30.3%

종별 (class)
Categorical

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 KiB

Value | Count
5 | 427
4 | 243
3 | 25
2 | 19
1 | 7

Length

Max length: 4
Median length: 1
Mean length: 1.0082988
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4
2nd row: 3
3rd row: 4
4th row: 5
5th row: 5

Common Values

Value | Count | Frequency (%)
5 | 427 | 59.1%
4 | 243 | 33.6%
3 | 25 | 3.5%
2 | 19 | 2.6%
1 | 7 | 1.0%
<NA> | 2 | 0.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values: 5: 427 (59.1%), 4: 243 (33.6%), 3: 25 (3.5%), 2: 19 (2.6%), 1: 7 (1.0%), na: 2 (0.3%)]

업종 (industry type)
Text

MISSING

Distinct: 304
Distinct (%): 43.6%
Missing: 25
Missing (%): 3.5%
Memory size: 5.8 KiB

Length

Max length: 34
Median length: 26
Mean length: 12.08596
Min length: 3

Characters and Unicode

Total characters: 8436
Distinct characters: 252
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 198
Unique (%): 28.4%

Sample

1st row: 가공 및 재생플라스틱 원료생산업
2nd row: 강주물주조업
3rd row: 기타곡물가공품제조업
4th row: 윤활유 및 그리스제조업
5th row: 합성수지 및 기타 플라스틱 물질 제조업
ValueCountFrequency (%)
194
 
11.7%
제조업 155
 
9.4%
기타 50
 
3.0%
그외 46
 
2.8%
플라스틱 43
 
2.6%
자동차종합수리업 33
 
2.0%
도금업 23
 
1.4%
21
 
1.3%
산업용 21
 
1.3%
물질 21
 
1.3%
Other values (433) 1048
63.3%

Most occurring characters

ValueCountFrequency (%)
962
 
11.4%
725
 
8.6%
633
 
7.5%
517
 
6.1%
348
 
4.1%
293
 
3.5%
246
 
2.9%
209
 
2.5%
140
 
1.7%
130
 
1.5%
Other values (242) 4233
50.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter | 7370 | 87.4%
Space Separator | 962 | 11.4%
Other Punctuation | 77 | 0.9%
Decimal Number | 13 | 0.2%
Open Punctuation | 7 | 0.1%
Close Punctuation | 7 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
725
 
9.8%
633
 
8.6%
517
 
7.0%
348
 
4.7%
293
 
4.0%
246
 
3.3%
209
 
2.8%
140
 
1.9%
130
 
1.8%
126
 
1.7%
Other values (232) 4003
54.3%
Decimal Number
Value | Count | Frequency (%)
1 | 6 | 46.2%
2 | 4 | 30.8%
4 | 2 | 15.4%
3 | 1 | 7.7%
Other Punctuation
Value | Count | Frequency (%)
, | 75 | 97.4%
· | 1 | 1.3%
. | 1 | 1.3%
Space Separator
Value | Count | Frequency (%)
(space) | 962 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 7 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7370 | 87.4%
Common | 1066 | 12.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
725
 
9.8%
633
 
8.6%
517
 
7.0%
348
 
4.7%
293
 
4.0%
246
 
3.3%
209
 
2.8%
140
 
1.9%
130
 
1.8%
126
 
1.7%
Other values (232) 4003
54.3%
Common
Value | Count | Frequency (%)
(space) | 962 | 90.2%
, | 75 | 7.0%
( | 7 | 0.7%
) | 7 | 0.7%
1 | 6 | 0.6%
2 | 4 | 0.4%
4 | 2 | 0.2%
3 | 1 | 0.1%
· | 1 | 0.1%
. | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7370 | 87.4%
ASCII | 1065 | 12.6%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 962 | 90.3%
, | 75 | 7.0%
( | 7 | 0.7%
) | 7 | 0.7%
1 | 6 | 0.6%
2 | 4 | 0.4%
4 | 2 | 0.2%
3 | 1 | 0.1%
. | 1 | 0.1%
Hangul
ValueCountFrequency (%)
725
 
9.8%
633
 
8.6%
517
 
7.0%
348
 
4.7%
293
 
4.0%
246
 
3.3%
209
 
2.8%
140
 
1.9%
130
 
1.8%
126
 
1.7%
Other values (232) 4003
54.3%
None
Value | Count | Frequency (%)
· | 1 | 100.0%

전화번호 (telephone number)
Text

MISSING

Distinct: 23
Distinct (%): 100.0%
Missing: 700
Missing (%): 96.8%
Memory size: 5.8 KiB

Length

Max length: 26
Median length: 12
Mean length: 16.869565
Min length: 12

Characters and Unicode

Total characters: 388
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 100.0%

Sample

1st row: 055-380-3432(055-380-3490)
2nd row: 055-389-1050(055-386-6520)
3rd row: 055-370-5273
4th row: 055-370-7756(055-386-0661)
5th row: 055-388-1500(055-383-0190)

Value | Count | Frequency (%)
055-389-1050(055-386-6520 | 1 | 4.3%
055-380-9928 | 1 | 4.3%
055-372-0180(055-372-0110 | 1 | 4.3%
055-383-9601 | 1 | 4.3%
055-781-0119 | 1 | 4.3%
055-268-0300 | 1 | 4.3%
055-586-2310 | 1 | 4.3%
055-386-4177 | 1 | 4.3%
055-386-1251 | 1 | 4.3%
055-911-3082 | 1 | 4.3%
Other values (13) | 13 | 56.5%
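
Several 전화번호 values hold two numbers, with the second one in parentheses; the report does not say what the second number represents. The following is a minimal sketch, assuming every number follows the digit-group pattern seen in the samples, that splits a raw value into its component numbers. The names `PHONE_PATTERN` and `split_phone_numbers` are illustrative.

```python
import re

# Two or three digits, then three or four digits, then four digits,
# separated by hyphens (pattern inferred from the sample rows).
PHONE_PATTERN = re.compile(r"\d{2,3}-\d{3,4}-\d{4}")

def split_phone_numbers(value):
    """Return all phone numbers found in a raw field value, in order."""
    return PHONE_PATTERN.findall(value)

print(split_phone_numbers("055-380-3432(055-380-3490)"))
print(split_phone_numbers("055-370-5273"))
```

Splitting the field this way would also avoid tokens with an unbalanced parenthesis such as those listed in the table.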

Most occurring characters

ValueCountFrequency (%)
0 | 75 | 19.3%
5 | 73 | 18.8%
- | 62 | 16.0%
3 | 42 | 10.8%
8 | 25 | 6.4%
1 | 21 | 5.4%
7 | 21 | 5.4%
6 | 18 | 4.6%
2 | 16 | 4.1%
9 | 14 | 3.6%
Other values (3) | 21 | 5.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 310 | 79.9%
Dash Punctuation | 62 | 16.0%
Open Punctuation | 8 | 2.1%
Close Punctuation | 8 | 2.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 75 | 24.2%
5 | 73 | 23.5%
3 | 42 | 13.5%
8 | 25 | 8.1%
1 | 21 | 6.8%
7 | 21 | 6.8%
6 | 18 | 5.8%
2 | 16 | 5.2%
9 | 14 | 4.5%
4 | 5 | 1.6%
Dash Punctuation
Value | Count | Frequency (%)
- | 62 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 8 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 8 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 388 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 75 | 19.3%
5 | 73 | 18.8%
- | 62 | 16.0%
3 | 42 | 10.8%
8 | 25 | 6.4%
1 | 21 | 5.4%
7 | 21 | 5.4%
6 | 18 | 4.6%
2 | 16 | 4.1%
9 | 14 | 3.6%
Other values (3) | 21 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 388 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 75 | 19.3%
5 | 73 | 18.8%
- | 62 | 16.0%
3 | 42 | 10.8%
8 | 25 | 6.4%
1 | 21 | 5.4%
7 | 21 | 5.4%
6 | 18 | 4.6%
2 | 16 | 4.1%
9 | 14 | 3.6%
Other values (3) | 21 | 5.4%

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 723
Missing (%): 100.0%
Memory size: 6.5 KiB

Correlations

Variable | 시설구분 | 종별 | 전화번호
시설구분 | 1.000 | 0.076 | NaN
종별 | 0.076 | 1.000 | 1.000
전화번호 | NaN | 1.000 | 1.000

Variable | 시설구분 | 종별
시설구분 | 1.000 | 0.093
종별 | 0.093 | 1.000

Variable | 시설구분 | 종별
시설구분 | 1.000 | 0.093
종별 | 0.093 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
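
Nullity correlation can be sketched as the Pearson correlation between per-column missing-value indicators. The example below uses only the 20 rows shown in the Sample section of this report (rows 0 to 9 and 713 to 722), so the result illustrates the idea and is not the value for the full dataset. The helper name `pearson` is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# 1 = missing, 0 = present, for the 20 sample rows (0-9, then 713-722).
industry_missing = [1] * 10 + [0] * 9 + [1]   # 업종
phone_missing = [0] * 10 + [1] * 9 + [1]      # 전화번호

r = pearson(industry_missing, phone_missing)
print(round(r, 3))
```

A strongly negative value such as this one means that, in these rows, one of the two columns tends to be filled in when the other is empty.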

Sample

Index | 시설구분 | 업소명 | 주소 | 종별 | 업종 | 전화번호 | Unnamed: 6
0 | 대기배출시설 | 고려강선(주)양산공장 | 유산공단7길 15(유산동) | 4 | <NA> | 055-380-3432(055-380-3490) | <NA>
1 | 대기배출시설 | 고려특수선재㈜ | 어실로 43(유산동) | 3 | <NA> | 055-389-1050(055-386-6520) | <NA>
2 | 대기배출시설 | 넥센타이어(주) | 충렬로 355(유산동) | 4 | <NA> | 055-370-5273 | <NA>
3 | 대기배출시설 | 동아타이어공업(주) | 유산공단11길 11(유산동) | 5 | <NA> | 055-370-7756(055-386-0661) | <NA>
4 | 대기배출시설 | (주)디비켐 | 새목1길 15(유산동) | 5 | <NA> | 055-388-1500(055-383-0190) | <NA>
5 | 대기배출시설 | 진흥철강(주) | 어실로 119(유산동) | 4 | <NA> | 055-389-0360(055-382-0620) | <NA>
6 | 대기배출시설 | 코카콜라음료(주) | 충렬로 269(유산동) | 1 | <NA> | 055-370-4405 | <NA>
7 | 대기배출시설 | 한일제관(주) | 유산공단4길 21(유산동) | 5 | <NA> | 055-370-6636(055-385-1327) | <NA>
8 | 대기배출시설 | (주)흥아 양산공장 | 충렬로 327(유산동) | 5 | <NA> | 055-370-3759(055-370-3799) | <NA>
9 | 대기배출시설 | 반도코리아(주) | 어곡공단1길 38(어곡동) | 5 | <NA> | 055-371-9200 | <NA>

Index | 시설구분 | 업소명 | 주소 | 종별 | 업종 | 전화번호 | Unnamed: 6
713 | 폐수배출시설 | ㈜다미온푸드 | 주남산단로 17 | 5 | 천연 및 혼합조제 조미료제조업 | <NA> | <NA>
714 | 폐수배출시설 | 에스엠케이㈜ | 주남산단로 45 | 5 | 선박 구성 부분품 제조업 | <NA> | <NA>
715 | 폐수배출시설 | ㈜웅상현대1급정비 | 주남산단로 7 | 4 | 자동차종합수리업 | <NA> | <NA>
716 | 폐수배출시설 | 부성폴리콤㈜ | 진등1길 9 | 5 | 합성수지및기타플라스틱제품 | <NA> | <NA>
717 | 폐수배출시설 | 영케미칼㈜ | 초동길 31 | 5 | 기타화학제품 | <NA> | <NA>
718 | 폐수배출시설 | 은진자동차종합정비 | 평산남로 18 | 5 | 자동차정비업 | <NA> | <NA>
719 | 폐수배출시설 | 천성산온천레포츠㈜ | 평산북2길 28 | 5 | 서비스업 | <NA> | <NA>
720 | 폐수배출시설 | 영창목재산업 | 평산북2길 30 | 4 | 제재 및 목재가공업 | <NA> | <NA>
721 | 폐수배출시설 | ㈜동흥포장 | 평산회야로 157 | 4 | 종이제품제조 | <NA> | <NA>
722 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>