Overview

Dataset statistics

Number of variables: 4
Number of observations: 460
Missing cells: 1
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.5 KiB
Average record size in memory: 32.3 B
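
The percentage and per-record figures follow from the raw counts. A quick check, as a sketch (the variable names are illustrative and not part of the report):

```python
# Recompute the derived dataset statistics from the raw counts in the report.
n_variables = 4
n_observations = 460
missing_cells = 1
total_size_kib = 14.5

total_cells = n_variables * n_observations        # every row/column position
missing_pct = 100 * missing_cells / total_cells   # about 0.054, displayed as 0.1%
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

With one decimal place, the single missing cell out of 1840 rounds up to 0.1%, which is why the percentage is not shown as zero.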

Variable types

Text: 3
Categorical: 1

Dataset

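Description: Installation status of wastewater discharge facilities in Haman-gun. Includes each facility's business name, location address, industry name, and facility class.
Author: Haman-gun, Gyeongsangnam-do (경상남도 함안군)
URL: https://www.data.go.kr/data/3066728/fileData.do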

Alerts

Imbalance: the categorical variable (its name is blank in the report) is highly imbalanced (85.6%)
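
The 85.6% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class counts. That definition is an assumption here, not something the report states. The sketch below applies it to the counts listed under Common Values (436, 21, 1, 1, 1); `imbalance_score` is an illustrative name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits of the
    class distribution and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # with fewer than two classes there is nothing to compare
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

class_counts = [436, 21, 1, 1, 1]
score = imbalance_score(class_counts)
print(f"{score:.1%}")
```

Under this definition a uniform five-class column would score 0, so a value this high reflects that one class accounts for 436 of the 460 rows.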

Reproduction

Analysis started: 2023-12-13 00:36:39.660887
Analysis finished: 2023-12-13 00:36:40.110489
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
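
The reported duration can be checked against the two timestamps. A minimal sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-13 00:36:39.660887")
finished = datetime.fromisoformat("2023-12-13 00:36:40.110489")

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```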

Variables

사업장명 (business name)
Text

Distinct: 455
Distinct (%): 98.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 19
Median length: 16
Mean length: 7.373913
Min length: 2

Characters and Unicode

Total characters: 3392
Distinct characters: 297
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
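
Python's standard `unicodedata` module exposes the general category of each code point, which is the property behind the category counts in this section. A minimal sketch, using one business name from the Sample table as input (the helper name `category_counts` is illustrative):

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

name = "(주)케이씨피 제5공장"
counts = category_counts(name)

# 'Lo' = Other Letter (Hangul syllables), 'Ps' / 'Pe' = Open / Close Punctuation,
# 'Zs' = Space Separator, 'Nd' = Decimal Number
for category, n in counts.most_common():
    print(category, n)
```

Hangul syllables fall under Other Letter because they have no case, which is why that category dominates the counts for every text variable in this dataset.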

Unique

Unique: 450
Unique (%): 97.8%

Sample

1st row: 팀일레븐
2nd row: 쌍둥이세차장
3rd row: 강남 손 세차장
4th row: 수영 손세차장
5th row: 함안주유소

Value | Count | Frequency (%)
주식회사 | 6 | 1.2%
함안지점 | 5 | 1.0%
2공장 | 4 | 0.8%
제3공장 | 3 | 0.6%
함안공장 | 3 | 0.6%
광진테크(주 | 2 | 0.4%
주)지티씨 | 2 | 0.4%
주)쎄노텍 | 2 | 0.4%
주)삼화대림화학 | 2 | 0.4%
의료법인 | 2 | 0.4%
Other values (471) | 482 | 94.0%
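
Entries such as `주)지티씨` and `광진테크(주` look like whitespace-separated tokens with punctuation stripped from both ends only, so the inner bracket of `(주)` survives. The sketch below assumes that tokenization (it is inferred from the table, not documented in the report) and uses business names from the Sample table as input; `word_counts` is an illustrative name.

```python
import string
from collections import Counter

def word_counts(values):
    """Split each value on whitespace, strip punctuation from both ends of
    every token, and count the resulting tokens."""
    strip_chars = string.punctuation + string.whitespace
    tokens = []
    for value in values:
        for token in value.split():
            token = token.strip(strip_chars)
            if token:  # skip tokens that consisted only of punctuation
                tokens.append(token)
    return Counter(tokens)

names = [
    "(주)케이씨피 제5공장",
    "(주)케이씨피 제3공장",
    "신진물산(주)",
    "(주)오양기업 함안공장",
    "(주)건양메탈 함안지점",
]
counts = word_counts(names)
for token, n in counts.most_common():
    print(token, n)
```

Because only the outer bracket is removed, the same company prefix appears as `주)` at the start of a token and as `(주` at the end of one, which splits what a reader would consider one term into two table entries.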

Most occurring characters

ValueCountFrequency (%)
303
 
8.9%
( 293
 
8.6%
) 293
 
8.6%
80
 
2.4%
80
 
2.4%
65
 
1.9%
62
 
1.8%
56
 
1.7%
54
 
1.6%
53
 
1.6%
Other values (287) 2053
60.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2674 | 78.8%
Open Punctuation | 293 | 8.6%
Close Punctuation | 293 | 8.6%
Space Separator | 53 | 1.6%
Uppercase Letter | 38 | 1.1%
Decimal Number | 37 | 1.1%
Other Punctuation | 4 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
303
 
11.3%
80
 
3.0%
80
 
3.0%
65
 
2.4%
62
 
2.3%
56
 
2.1%
54
 
2.0%
53
 
2.0%
53
 
2.0%
47
 
1.8%
Other values (260) 1821
68.1%
Uppercase Letter
ValueCountFrequency (%)
E 6
15.8%
C 4
10.5%
H 4
10.5%
G 4
10.5%
N 4
10.5%
T 3
7.9%
S 3
7.9%
P 2
 
5.3%
B 2
 
5.3%
M 2
 
5.3%
Other values (4) 4
10.5%
Decimal Number
ValueCountFrequency (%)
2 16
43.2%
1 10
27.0%
3 5
 
13.5%
8 2
 
5.4%
4 1
 
2.7%
0 1
 
2.7%
5 1
 
2.7%
6 1
 
2.7%
Other Punctuation
ValueCountFrequency (%)
& 3
75.0%
. 1
 
25.0%
Open Punctuation
ValueCountFrequency (%)
( 293
100.0%
Close Punctuation
ValueCountFrequency (%)
) 293
100.0%
Space Separator
ValueCountFrequency (%)
53
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2674 | 78.8%
Common | 680 | 20.0%
Latin | 38 | 1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
303
 
11.3%
80
 
3.0%
80
 
3.0%
65
 
2.4%
62
 
2.3%
56
 
2.1%
54
 
2.0%
53
 
2.0%
53
 
2.0%
47
 
1.8%
Other values (260) 1821
68.1%
Latin
ValueCountFrequency (%)
E 6
15.8%
C 4
10.5%
H 4
10.5%
G 4
10.5%
N 4
10.5%
T 3
7.9%
S 3
7.9%
P 2
 
5.3%
B 2
 
5.3%
M 2
 
5.3%
Other values (4) 4
10.5%
Common
ValueCountFrequency (%)
( 293
43.1%
) 293
43.1%
53
 
7.8%
2 16
 
2.4%
1 10
 
1.5%
3 5
 
0.7%
& 3
 
0.4%
8 2
 
0.3%
4 1
 
0.1%
0 1
 
0.1%
Other values (3) 3
 
0.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2674 | 78.8%
ASCII | 718 | 21.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
303
 
11.3%
80
 
3.0%
80
 
3.0%
65
 
2.4%
62
 
2.3%
56
 
2.1%
54
 
2.0%
53
 
2.0%
53
 
2.0%
47
 
1.8%
Other values (260) 1821
68.1%
ASCII
ValueCountFrequency (%)
( 293
40.8%
) 293
40.8%
53
 
7.4%
2 16
 
2.2%
1 10
 
1.4%
E 6
 
0.8%
3 5
 
0.7%
C 4
 
0.6%
H 4
 
0.6%
G 4
 
0.6%
Other values (17) 30
 
4.2%

도로명소재지 (road-name address)
Text

Distinct: 447
Distinct (%): 97.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 40
Median length: 37
Mean length: 22.713043
Min length: 18

Characters and Unicode

Total characters: 10448
Distinct characters: 208
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 434
Unique (%): 94.3%

Sample

1st row: 경상남도 함안군 가야읍 가야11길 13
2nd row: 경상남도 함안군 가야읍 가야16길 11
3rd row: 경상남도 함안군 가야읍 가야로 103-1
4th row: 경상남도 함안군 가야읍 가야로 132
5th row: 경상남도 함안군 가야읍 가야로 64

Value | Count | Frequency (%)
경상남도 | 460 | 19.3%
함안군 | 460 | 19.3%
칠원읍 | 118 | 4.9%
군북면 | 92 | 3.9%
칠서면 | 85 | 3.6%
칠북면 | 38 | 1.6%
법수면 | 33 | 1.4%
가야읍 | 32 | 1.3%
산인면 | 29 | 1.2%
대산면 | 23 | 1.0%
Other values (565) | 1018 | 42.6%

Most occurring characters

ValueCountFrequency (%)
1975
18.9%
552
 
5.3%
538
 
5.1%
525
 
5.0%
482
 
4.6%
470
 
4.5%
463
 
4.4%
462
 
4.4%
1 357
 
3.4%
310
 
3.0%
Other values (198) 4314
41.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6561 | 62.8%
Space Separator | 1975 | 18.9%
Decimal Number | 1619 | 15.5%
Dash Punctuation | 166 | 1.6%
Close Punctuation | 57 | 0.5%
Open Punctuation | 57 | 0.5%
Uppercase Letter | 12 | 0.1%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
552
 
8.4%
538
 
8.2%
525
 
8.0%
482
 
7.3%
470
 
7.2%
463
 
7.1%
462
 
7.0%
310
 
4.7%
262
 
4.0%
231
 
3.5%
Other values (175) 2266
34.5%
Decimal Number
ValueCountFrequency (%)
1 357
22.1%
2 227
14.0%
3 190
11.7%
4 135
 
8.3%
6 132
 
8.2%
5 131
 
8.1%
9 131
 
8.1%
7 128
 
7.9%
0 107
 
6.6%
8 81
 
5.0%
Uppercase Letter
ValueCountFrequency (%)
T 2
16.7%
C 2
16.7%
G 2
16.7%
N 2
16.7%
I 1
8.3%
E 1
8.3%
P 1
8.3%
K 1
8.3%
Space Separator
ValueCountFrequency (%)
1975
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 166
100.0%
Close Punctuation
ValueCountFrequency (%)
) 57
100.0%
Open Punctuation
ValueCountFrequency (%)
( 57
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6561 | 62.8%
Common | 3875 | 37.1%
Latin | 12 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
552
 
8.4%
538
 
8.2%
525
 
8.0%
482
 
7.3%
470
 
7.2%
463
 
7.1%
462
 
7.0%
310
 
4.7%
262
 
4.0%
231
 
3.5%
Other values (175) 2266
34.5%
Common
ValueCountFrequency (%)
1975
51.0%
1 357
 
9.2%
2 227
 
5.9%
3 190
 
4.9%
- 166
 
4.3%
4 135
 
3.5%
6 132
 
3.4%
5 131
 
3.4%
9 131
 
3.4%
7 128
 
3.3%
Other values (5) 303
 
7.8%
Latin
ValueCountFrequency (%)
T 2
16.7%
C 2
16.7%
G 2
16.7%
N 2
16.7%
I 1
8.3%
E 1
8.3%
P 1
8.3%
K 1
8.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6561 | 62.8%
ASCII | 3887 | 37.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1975
50.8%
1 357
 
9.2%
2 227
 
5.8%
3 190
 
4.9%
- 166
 
4.3%
4 135
 
3.5%
6 132
 
3.4%
5 131
 
3.4%
9 131
 
3.4%
7 128
 
3.3%
Other values (13) 315
 
8.1%
Hangul
ValueCountFrequency (%)
552
 
8.4%
538
 
8.2%
525
 
8.0%
482
 
7.3%
470
 
7.2%
463
 
7.1%
462
 
7.0%
310
 
4.7%
262
 
4.0%
231
 
3.5%
Other values (175) 2266
34.5%

대표업종 (main industry)
Text

Distinct: 242
Distinct (%): 52.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.7 KiB

Length

Max length: 25
Median length: 21
Mean length: 12
Min length: 1

Characters and Unicode

Total characters: 5508
Distinct characters: 227
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 171
Unique (%): 37.3%

Sample

1st row: 자동차 세차업
2nd row: 자동차 세차업
3rd row: 자동차 세차업
4th row: 자동차 세차업
5th row: 주유소 운영업

Value | Count | Frequency (%)
제조업 | 221 | 15.9%
(blank) | 136 | 9.8%
기타 | 76 | 5.5%
금속 | 41 | 2.9%
자동차 | 39 | 2.8%
절삭가공 | 28 | 2.0%
유사처리업 | 26 | 1.9%
그외 | 24 | 1.7%
세차업 | 22 | 1.6%
처리업 | 17 | 1.2%
Other values (354) | 764 | 54.8%

Most occurring characters

ValueCountFrequency (%)
965
 
17.5%
425
 
7.7%
345
 
6.3%
338
 
6.1%
228
 
4.1%
142
 
2.6%
126
 
2.3%
121
 
2.2%
113
 
2.1%
107
 
1.9%
Other values (217) 2598
47.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4517 | 82.0%
Space Separator | 965 | 17.5%
Other Punctuation | 18 | 0.3%
Decimal Number | 8 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
425
 
9.4%
345
 
7.6%
338
 
7.5%
228
 
5.0%
142
 
3.1%
126
 
2.8%
121
 
2.7%
113
 
2.5%
107
 
2.4%
95
 
2.1%
Other values (209) 2477
54.8%
Decimal Number
ValueCountFrequency (%)
1 3
37.5%
2 2
25.0%
3 2
25.0%
5 1
 
12.5%
Other Punctuation
ValueCountFrequency (%)
, 16
88.9%
· 1
 
5.6%
. 1
 
5.6%
Space Separator
ValueCountFrequency (%)
965
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4517 | 82.0%
Common | 991 | 18.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
425
 
9.4%
345
 
7.6%
338
 
7.5%
228
 
5.0%
142
 
3.1%
126
 
2.8%
121
 
2.7%
113
 
2.5%
107
 
2.4%
95
 
2.1%
Other values (209) 2477
54.8%
Common
ValueCountFrequency (%)
965
97.4%
, 16
 
1.6%
1 3
 
0.3%
2 2
 
0.2%
3 2
 
0.2%
· 1
 
0.1%
. 1
 
0.1%
5 1
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4517 | 82.0%
ASCII | 990 | 18.0%
None | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
965
97.5%
, 16
 
1.6%
1 3
 
0.3%
2 2
 
0.2%
3 2
 
0.2%
. 1
 
0.1%
5 1
 
0.1%
Hangul
ValueCountFrequency (%)
425
 
9.4%
345
 
7.6%
338
 
7.5%
228
 
5.0%
142
 
3.1%
126
 
2.8%
121
 
2.7%
113
 
2.5%
107
 
2.4%
95
 
2.1%
Other values (209) 2477
54.8%
None
ValueCountFrequency (%)
· 1
100.0%


Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
5종: 436
4종: 21
(blank): 1
2종: 1
3종: 1

Length

Max length: 2
Median length: 2
Mean length: 1.9978261
Min length: 1
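
The mean of 1.9978261 is what results if 459 of the 460 values have two characters and a single value has one, which matches the one blank-looking entry in the value counts. A quick check under that assumption:

```python
# Assumed length distribution: 459 two-character values (such as "5종") and
# one single-character value. This split is inferred, not stated in the report.
length_counts = {2: 459, 1: 1}

n_values = sum(length_counts.values())
total_chars = sum(length * n for length, n in length_counts.items())
mean_length = total_chars / n_values

print(n_values, total_chars, round(mean_length, 7))
```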

Unique

Unique: 3
Unique (%): 0.7%
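
In this report, Distinct counts the different values present, while Unique counts the values that occur exactly once. A sketch using the value counts from Common Values, with `"?"` as a hypothetical stand-in for the one-character value that renders blank in the report:

```python
from collections import Counter

# Counts taken from the Common Values table; "?" is a placeholder key.
value_counts = Counter({"5종": 436, "4종": 21, "?": 1, "2종": 1, "3종": 1})

n_rows = sum(value_counts.values())
distinct = len(value_counts)                               # different values
unique = sum(1 for n in value_counts.values() if n == 1)   # values seen once

print(f"Distinct: {distinct} ({distinct / n_rows:.1%})")
print(f"Unique: {unique} ({unique / n_rows:.1%})")
```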

Sample

1st row: 5종
2nd row: 5종
3rd row: 5종
4th row: 5종
5th row: 5종

Common Values

Value | Count | Frequency (%)
5종 | 436 | 94.8%
4종 | 21 | 4.6%
(blank) | 1 | 0.2%
2종 | 1 | 0.2%
3종 | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
5종 | 436 | 95.0%
4종 | 21 | 4.6%
2종 | 1 | 0.2%
3종 | 1 | 0.2%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 사업장명 | 도로명소재지 | 대표업종 | (unnamed)
0 | 팀일레븐 | 경상남도 함안군 가야읍 가야11길 13 | 자동차 세차업 | 5종
1 | 쌍둥이세차장 | 경상남도 함안군 가야읍 가야16길 11 | 자동차 세차업 | 5종
2 | 강남 손 세차장 | 경상남도 함안군 가야읍 가야로 103-1 | 자동차 세차업 | 5종
3 | 수영 손세차장 | 경상남도 함안군 가야읍 가야로 132 | 자동차 세차업 | 5종
4 | 함안주유소 | 경상남도 함안군 가야읍 가야로 64 | 주유소 운영업 | 5종
5 | 삼진알루늄 | 경상남도 함안군 가야읍 검암리 990-17 |  | 5종
6 | 삼진알미늄 | 경상남도 함안군 가야읍 검암리 990-17 |  | 5종
7 | 함안셀프세차장 | 경상남도 함안군 가야읍 검암천북길 19 ((주)대동공업함안대리점) | 자동차 세차업 | 5종
8 | (주)케이씨피 제5공장 | 경상남도 함안군 가야읍 광정로 312 | 토목공사 및 유사용 기계장비 제조업 | 5종
9 | 고려주유소 | 경상남도 함안군 가야읍 광정리 302 | 주유소 운영업 | 5종

Last rows

Index | 사업장명 | 도로명소재지 | 대표업종 | (unnamed)
450 | 금성열처리 | 경상남도 함안군 함안면 광정로 339-14 | 금속 열처리업 | 5종
451 | (주)오양기업 함안공장 | 경상남도 함안군 함안면 광정로 344-17 | 도장 및 기타 피막처리업 | 4종
452 | (주)건양메탈 함안지점 | 경상남도 함안군 함안면 광정로 372 | 혼성 및 재생플라스틱 소재 물질 제조업 | 5종
453 | 신진물산(주) | 경상남도 함안군 함안면 봉성1길 41 | 음·식료품 제조업 | 3종
454 | 지리산농산 | 경상남도 함안군 함안면 봉수로 715 (지리산농산) | 과실 및 채소 절임식품 제조업 | 5종
455 | (주)케이씨피 제3공장 | 경상남도 함안군 함안면 봉수로 721 | 토목공사 및 유사용 기계장비 제조업 | 5종
456 | 동원ENG | 경상남도 함안군 함안면 봉수로 733 | 금속조립구조재 제조업 | 5종
457 | 태성기업 | 경상남도 함안군 함안면 파수리 465-4 파수농공단지 | 금속조립구조재 제조업 | 5종
458 | 영은금속 | 경상남도 함안군 함안면 파수리 파수농공단지 465-17 | 수동식 식품 가공기기 및 금속주방용기 제조업 | 5종
459 | 칠서제일주유소 | 경상남도 함안군 칠서면 청계3길 1, 공단주유소 | 주유소 운영업 | 5종