Overview

Dataset statistics

Number of variables: 8
Number of observations: 24
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 KiB
Average record size in memory: 69.5 B

Variable types

Categorical: 6
Text: 2

Dataset

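Description: Data on the quality of treated wastewater at the Pyeongdong Wastewater Treatment Plant (평동폐수처리장), operated by the Gwangju Environmental Corporation (광주환경공단). It provides the monthly measured value of each measurement item, broken down by measurement item and measurement point.
URL: https://www.data.go.kr/data/15104635/fileData.do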

Alerts

처리장명 has constant value "평동폐수처리장" [Constant]
측정지점 is highly overall correlated with 시료채취일1(2022-09-22) and 3 other fields [High correlation]
시료채취일3(2023-03-28) is highly overall correlated with 측정지점 and 3 other fields [High correlation]
시료채취일4(2023-05-19) is highly overall correlated with 측정지점 and 3 other fields [High correlation]
시료채취일2(2022-12-08) is highly overall correlated with 측정지점 and 3 other fields [High correlation]
시료채취일1(2022-09-22) is highly overall correlated with 측정지점 and 3 other fields [High correlation]
측정지점 is highly imbalanced (75.0%) [Imbalance]
시료채취일1(2022-09-22) is highly imbalanced (68.6%) [Imbalance]
시료채취일2(2022-12-08) is highly imbalanced (55.0%) [Imbalance]
시료채취일3(2023-03-28) is highly imbalanced (68.6%) [Imbalance]
시료채취일4(2023-05-19) is highly imbalanced (62.9%) [Imbalance]
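
The imbalance percentages in these alerts can be reproduced from the class counts listed under each variable's Common Values. The sketch below assumes the score is one minus the Shannon entropy normalised by log2 of the number of classes. The report does not state its formula, so this is an assumption, although it does yield the five figures shown above. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single-class column is scored 0 here
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Class counts copied from the Common Values tables in this report.
columns = {
    "측정지점": [23, 1],
    "시료채취일1(2022-09-22)": [22, 1, 1],
    "시료채취일2(2022-12-08)": [20, 2, 1, 1],
    "시료채취일3(2023-03-28)": [22, 1, 1],
    "시료채취일4(2023-05-19)": [21, 1, 1, 1],
}
scores = {name: round(100 * imbalance_score(c), 1) for name, c in columns.items()}
for name, score in scores.items():
    print(f"{name}: {score}%")
```

A column split 23 to 1 scores 75.0%, while the four-class column 20/2/1/1 scores lower (55.0%) because its minority rows are spread over more classes.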

Reproduction

Analysis started: 2023-12-12 11:27:59.228618
Analysis finished: 2023-12-12 11:28:00.144632
Duration: 0.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

처리장명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
평동폐수처리장 | 24

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 평동폐수처리장
2nd row: 평동폐수처리장
3rd row: 평동폐수처리장
4th row: 평동폐수처리장
5th row: 평동폐수처리장

Common Values

Value | Count | Frequency (%)
평동폐수처리장 | 24 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

측정항목
Text

Distinct: 23
Distinct (%): 95.8%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Length

Max length: 10
Median length: 8
Mean length: 6.3333333
Min length: 3

Characters and Unicode

Total characters: 152
Distinct characters: 54
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
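
As a minimal sketch of how such character properties can be read, the snippet below counts the Unicode general category of each character in two values taken from this report's Sample rows. It uses only Python's standard `unicodedata` module, which exposes the general category but not the script or block, so only the category breakdown is illustrated. The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two values that appear in the Sample rows of this report.
limit_counts = category_counts("0.02이하")
name_counts = category_counts("i-뷰틸알코올")

print(dict(limit_counts))  # Nd = Decimal Number, Po = Other Punctuation, Lo = Other Letter
print(dict(name_counts))   # Ll = Lowercase Letter, Pd = Dash Punctuation, Lo = Other Letter
```

Hangul syllables fall under `Lo` (Other Letter), digits under `Nd`, the full stop under `Po`, and the hyphen under `Pd`, which matches the category names used in the tables of this report.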

Unique

Unique: 22
Unique (%): 91.7%

Sample

1st row: 복합악취
2nd row: 복합악취
3rd row: 암모니아
4th row: 황화수소
5th row: 메틸메르캅탄

Value | Count | Frequency (%)
복합악취 | 2 | 8.3%
스타이렌 | 1 | 4.2%
i-발레르산 | 1 | 4.2%
n-뷰틸산 | 1 | 4.2%
프로피온산 | 1 | 4.2%
i-뷰틸알코올 | 1 | 4.2%
뷰틸아세테이트 | 1 | 4.2%
메틸아이소뷰틸케톤 | 1 | 4.2%
메틸에틸케톤 | 1 | 4.2%
자일렌 | 1 | 4.2%
Other values (13) | 13 | 54.2%

Most occurring characters

Value | Count | Frequency (%)
(not captured) | 14 | 9.2%
(not captured) | 12 | 7.9%
(not captured) | 7 | 4.6%
(not captured) | 7 | 4.6%
- | 6 | 3.9%
(not captured) | 6 | 3.9%
(not captured) | 5 | 3.3%
(not captured) | 5 | 3.3%
(not captured) | 5 | 3.3%
(not captured) | 5 | 3.3%
Other values (44) | 80 | 52.6%

Most occurring categories

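Value | Count | Frequency (%)
Other Letter | 140 | 92.1%
Dash Punctuation | 6 | 3.9%
Lowercase Letter | 6 | 3.9%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not captured) | 14 | 10.0%
(not captured) | 12 | 8.6%
(not captured) | 7 | 5.0%
(not captured) | 7 | 5.0%
(not captured) | 6 | 4.3%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
Other values (41) | 69 | 49.3%

Lowercase Letter
Value | Count | Frequency (%)
i | 3 | 50.0%
n | 3 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 140 | 92.1%
Common | 6 | 3.9%
Latin | 6 | 3.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not captured) | 14 | 10.0%
(not captured) | 12 | 8.6%
(not captured) | 7 | 5.0%
(not captured) | 7 | 5.0%
(not captured) | 6 | 4.3%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
Other values (41) | 69 | 49.3%

Latin
Value | Count | Frequency (%)
i | 3 | 50.0%
n | 3 | 50.0%

Common
Value | Count | Frequency (%)
- | 6 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 140 | 92.1%
ASCII | 12 | 7.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not captured) | 14 | 10.0%
(not captured) | 12 | 8.6%
(not captured) | 7 | 5.0%
(not captured) | 7 | 5.0%
(not captured) | 6 | 4.3%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
(not captured) | 5 | 3.6%
Other values (41) | 69 | 49.3%

ASCII
Value | Count | Frequency (%)
- | 6 | 50.0%
i | 3 | 25.0%
n | 3 | 25.0%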

측정지점
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
부지경계 | 23
약액세정탑 | 1

Length

Max length: 5
Median length: 4
Mean length: 4.0416667
Min length: 4

Unique

Unique: 1
Unique (%): 4.2%
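
The difference between Distinct and Unique is easy to miss: on a plain reading of the figures above, Distinct counts the different values in the column, while Unique counts the values that occur exactly once. A minimal sketch of that reading, using the 측정지점 counts from this report (the variable names are illustrative only):

```python
from collections import Counter

# 측정지점 as counted in this report: 23 rows of 부지경계 and 1 row of 약액세정탑.
values = ["부지경계"] * 23 + ["약액세정탑"]

counts = Counter(values)
n_rows = len(values)
distinct = len(counts)                               # number of different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(f"Distinct: {distinct} ({100 * distinct / n_rows:.1f}%)")  # Distinct: 2 (8.3%)
print(f"Unique: {unique} ({100 * unique / n_rows:.1f}%)")        # Unique: 1 (4.2%)
```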

Sample

1st row: 부지경계
2nd row: 약액세정탑
3rd row: 부지경계
4th row: 부지경계
5th row: 부지경계

Common Values

Value | Count | Frequency (%)
부지경계 | 23 | 95.8%
약액세정탑 | 1 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

악취기준
Text

Distinct: 18
Distinct (%): 75.0%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Length

Max length: 8
Median length: 7
Mean length: 5.5
Min length: 3

Characters and Unicode

Total characters: 132
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 58.3%

Sample

1st row: 15이하
2nd row: 500이하
3rd row: 1이하
4th row: 0.02이하
5th row: 0.002이하

Value | Count | Frequency (%)
1이하 | 4 | 16.7%
0.05이하 | 2 | 8.3%
0.01이하 | 2 | 8.3%
0.009이하 | 2 | 8.3%
10이하 | 1 | 4.2%
15이하 | 1 | 4.2%
0.4이하 | 1 | 4.2%
0.001이하 | 1 | 4.2%
0.03이하 | 1 | 4.2%
0.9이하 | 1 | 4.2%
Other values (8) | 8 | 33.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 41 | 31.1%
이 | 24 | 18.2%
하 | 24 | 18.2%
. | 16 | 12.1%
1 | 10 | 7.6%
9 | 5 | 3.8%
5 | 5 | 3.8%
2 | 3 | 2.3%
3 | 3 | 2.3%
4 | 1 | 0.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 68 | 51.5%
Other Letter | 48 | 36.4%
Other Punctuation | 16 | 12.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 41 | 60.3%
1 | 10 | 14.7%
9 | 5 | 7.4%
5 | 5 | 7.4%
2 | 3 | 4.4%
3 | 3 | 4.4%
4 | 1 | 1.5%

Other Letter
Value | Count | Frequency (%)
이 | 24 | 50.0%
하 | 24 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
. | 16 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 84 | 63.6%
Hangul | 48 | 36.4%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 41 | 48.8%
. | 16 | 19.0%
1 | 10 | 11.9%
9 | 5 | 6.0%
5 | 5 | 6.0%
2 | 3 | 3.6%
3 | 3 | 3.6%
4 | 1 | 1.2%

Hangul
Value | Count | Frequency (%)
이 | 24 | 50.0%
하 | 24 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 84 | 63.6%
Hangul | 48 | 36.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 41 | 48.8%
. | 16 | 19.0%
1 | 10 | 11.9%
9 | 5 | 6.0%
5 | 5 | 6.0%
2 | 3 | 3.6%
3 | 3 | 3.6%
4 | 1 | 1.2%

Hangul
Value | Count | Frequency (%)
이 | 24 | 50.0%
하 | 24 | 50.0%

시료채취일1(2022-09-22)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
해당없음 | 22
3 | 1
300 | 1

Length

Max length: 4
Median length: 4
Mean length: 3.8333333
Min length: 1

Unique

Unique: 2
Unique (%): 8.3%

Sample

1st row: 3
2nd row: 300
3rd row: 해당없음
4th row: 해당없음
5th row: 해당없음

Common Values

Value | Count | Frequency (%)
해당없음 | 22 | 91.7%
3 | 1 | 4.2%
300 | 1 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시료채취일2(2022-12-08)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
불검출 | 20
0 | 2
3 | 1
448 | 1

Length

Max length: 3
Median length: 3
Mean length: 2.75
Min length: 1

Unique

Unique: 2
Unique (%): 8.3%

Sample

1st row: 3
2nd row: 448
3rd row: 0
4th row: 0
5th row: 불검출

Common Values

Value | Count | Frequency (%)
불검출 | 20 | 83.3%
0 | 2 | 8.3%
3 | 1 | 4.2%
448 | 1 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시료채취일3(2023-03-28)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
해당없음 | 22
3 | 1
173 | 1

Length

Max length: 4
Median length: 4
Mean length: 3.8333333
Min length: 1

Unique

Unique: 2
Unique (%): 8.3%

Sample

1st row: 3
2nd row: 173
3rd row: 해당없음
4th row: 해당없음
5th row: 해당없음

Common Values

Value | Count | Frequency (%)
해당없음 | 22 | 91.7%
3 | 1 | 4.2%
173 | 1 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시료채취일4(2023-05-19)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 324.0 B

Value | Count
불검출 | 21
3 | 1
373 | 1
0.1 | 1

Length

Max length: 3
Median length: 3
Mean length: 2.9166667
Min length: 1

Unique

Unique: 3
Unique (%): 12.5%

Sample

1st row: 3
2nd row: 373
3rd row: 0.1
4th row: 불검출
5th row: 불검출

Common Values

Value | Count | Frequency (%)
불검출 | 21 | 87.5%
3 | 1 | 4.2%
373 | 1 | 4.2%
0.1 | 1 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Correlations

[Figure: correlation plot]

Variable | 측정항목 | 측정지점 | 악취기준 | 시료채취일1(2022-09-22) | 시료채취일2(2022-12-08) | 시료채취일3(2023-03-28) | 시료채취일4(2023-05-19)
측정항목 | 1.000 | 0.000 | 0.909 | 0.000 | 0.000 | 0.000 | 0.000
측정지점 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
악취기준 | 0.909 | 1.000 | 1.000 | 1.000 | 0.893 | 1.000 | 0.533
시료채취일1(2022-09-22) | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시료채취일2(2022-12-08) | 0.000 | 1.000 | 0.893 | 1.000 | 1.000 | 1.000 | 0.994
시료채취일3(2023-03-28) | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시료채취일4(2023-05-19) | 0.000 | 1.000 | 0.533 | 1.000 | 0.994 | 1.000 | 1.000

[Figure: correlation plot]

Variable | 측정지점 | 시료채취일3(2023-03-28) | 시료채취일4(2023-05-19) | 시료채취일2(2022-12-08) | 시료채취일1(2022-09-22)
측정지점 | 1.000 | 0.977 | 0.953 | 0.953 | 0.977
시료채취일3(2023-03-28) | 0.977 | 1.000 | 0.976 | 0.976 | 1.000
시료채취일4(2023-05-19) | 0.953 | 0.976 | 1.000 | 0.894 | 0.976
시료채취일2(2022-12-08) | 0.953 | 0.976 | 0.894 | 1.000 | 0.976
시료채취일1(2022-09-22) | 0.977 | 1.000 | 0.976 | 0.976 | 1.000

[Figure: correlation plot]

Variable | 측정지점 | 시료채취일1(2022-09-22) | 시료채취일2(2022-12-08) | 시료채취일3(2023-03-28) | 시료채취일4(2023-05-19)
측정지점 | 1.000 | 0.977 | 0.953 | 0.977 | 0.953
시료채취일1(2022-09-22) | 0.977 | 1.000 | 0.976 | 1.000 | 0.976
시료채취일2(2022-12-08) | 0.953 | 0.976 | 1.000 | 0.976 | 0.894
시료채취일3(2023-03-28) | 0.977 | 1.000 | 0.976 | 1.000 | 0.976
시료채취일4(2023-05-19) | 0.953 | 0.976 | 0.894 | 0.976 | 1.000
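
The report does not name the coefficient behind the two five-variable tables. As a hedged sketch, the code below assumes they hold bias-corrected Cramér's V values and recomputes three entries in plain Python. The function name `cramers_v_corrected` is hypothetical. The three columns are rebuilt from the Sample rows and the Common Values counts in this report, which together fix every row of those columns.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(xs)
    row_totals, col_totals = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    # Pearson chi-squared over every cell of the contingency table, empty cells included.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Small-sample bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# 24 rows each; rows 0 and 1 carry the only numeric readings on the first sampling date.
point = ["부지경계", "약액세정탑"] + ["부지경계"] * 22
day1 = ["3", "300"] + ["해당없음"] * 22
day2 = ["3", "448", "0", "0"] + ["불검출"] * 20

v_point_day1 = cramers_v_corrected(point, day1)
v_point_day2 = cramers_v_corrected(point, day2)
v_day1_day2 = cramers_v_corrected(day1, day2)
print(round(v_point_day1, 3), round(v_point_day2, 3), round(v_day1_day2, 3))  # 0.977 0.953 0.976
```

Under this assumption the values agree with the tabulated 0.977, 0.953 and 0.976. With only 24 rows and one 약액세정탑 row, the high correlations largely reflect that single row rather than a broad pattern.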

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

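Index | 처리장명 | 측정항목 | 측정지점 | 악취기준 | 시료채취일1(2022-09-22) | 시료채취일2(2022-12-08) | 시료채취일3(2023-03-28) | 시료채취일4(2023-05-19)
0 | 평동폐수처리장 | 복합악취 | 부지경계 | 15이하 | 3 | 3 | 3 | 3
1 | 평동폐수처리장 | 복합악취 | 약액세정탑 | 500이하 | 300 | 448 | 173 | 373
2 | 평동폐수처리장 | 암모니아 | 부지경계 | 1이하 | 해당없음 | 0 | 해당없음 | 0.1
3 | 평동폐수처리장 | 황화수소 | 부지경계 | 0.02이하 | 해당없음 | 0 | 해당없음 | 불검출
4 | 평동폐수처리장 | 메틸메르캅탄 | 부지경계 | 0.002이하 | 해당없음 | 불검출 | 해당없음 | 불검출
5 | 평동폐수처리장 | 다이메틸설파이드 | 부지경계 | 0.01이하 | 해당없음 | 불검출 | 해당없음 | 불검출
6 | 평동폐수처리장 | 다이메틸다이설파이드 | 부지경계 | 0.009이하 | 해당없음 | 불검출 | 해당없음 | 불검출
7 | 평동폐수처리장 | 트라이메틸아민 | 부지경계 | 0.005이하 | 해당없음 | 불검출 | 해당없음 | 불검출
8 | 평동폐수처리장 | 아세트알데하이드 | 부지경계 | 0.05이하 | 해당없음 | 불검출 | 해당없음 | 불검출
9 | 평동폐수처리장 | 프로피온알데하이드 | 부지경계 | 0.05이하 | 해당없음 | 불검출 | 해당없음 | 불검출

Index | 처리장명 | 측정항목 | 측정지점 | 악취기준 | 시료채취일1(2022-09-22) | 시료채취일2(2022-12-08) | 시료채취일3(2023-03-28) | 시료채취일4(2023-05-19)
14 | 평동폐수처리장 | 톨루엔 | 부지경계 | 10이하 | 해당없음 | 불검출 | 해당없음 | 불검출
15 | 평동폐수처리장 | 자일렌 | 부지경계 | 1이하 | 해당없음 | 불검출 | 해당없음 | 불검출
16 | 평동폐수처리장 | 메틸에틸케톤 | 부지경계 | 13이하 | 해당없음 | 불검출 | 해당없음 | 불검출
17 | 평동폐수처리장 | 메틸아이소뷰틸케톤 | 부지경계 | 1이하 | 해당없음 | 불검출 | 해당없음 | 불검출
18 | 평동폐수처리장 | 뷰틸아세테이트 | 부지경계 | 1이하 | 해당없음 | 불검출 | 해당없음 | 불검출
19 | 평동폐수처리장 | i-뷰틸알코올 | 부지경계 | 0.9이하 | 해당없음 | 불검출 | 해당없음 | 불검출
20 | 평동폐수처리장 | 프로피온산 | 부지경계 | 0.03이하 | 해당없음 | 불검출 | 해당없음 | 불검출
21 | 평동폐수처리장 | n-뷰틸산 | 부지경계 | 0.001이하 | 해당없음 | 불검출 | 해당없음 | 불검출
22 | 평동폐수처리장 | i-발레르산 | 부지경계 | 0.01이하 | 해당없음 | 불검출 | 해당없음 | 불검출
23 | 평동폐수처리장 | n-발레르산 | 부지경계 | 0.0009이하 | 해당없음 | 불검출 | 해당없음 | 불검출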
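
Because every column is stored as text, the limits and readings need parsing before they can be compared. The sketch below is one possible approach, not part of the report: it reads the suffix 이하 as "or less", and treats 해당없음 (not applicable) and 불검출 (not detected) as having no numeric value. The helper names are hypothetical, and units are not given in the source, so none are assumed.

```python
def parse_limit(text):
    """Parse a limit such as '0.02이하' ('0.02 or less') into a float."""
    suffix = "이하"
    if not text.endswith(suffix):
        raise ValueError(f"unexpected limit format: {text!r}")
    return float(text[: -len(suffix)])

def parse_reading(text):
    """Return a float, or None for 해당없음 (not applicable) and 불검출 (not detected)."""
    if text in ("해당없음", "불검출"):
        return None
    return float(text)

def within_limit(limit_text, readings):
    """True when every numeric reading is at or below the limit; non-numeric readings are skipped."""
    limit = parse_limit(limit_text)
    values = [v for v in map(parse_reading, readings) if v is not None]
    return all(v <= limit for v in values)

# Rows 0 to 2 of the sample above: (측정항목, 측정지점, 악취기준, four sampling-date readings).
rows = [
    ("복합악취", "부지경계", "15이하", ["3", "3", "3", "3"]),
    ("복합악취", "약액세정탑", "500이하", ["300", "448", "173", "373"]),
    ("암모니아", "부지경계", "1이하", ["해당없음", "0", "해당없음", "0.1"]),
]
results = {(item, point): within_limit(limit, readings)
           for item, point, limit, readings in rows}
for key, ok in results.items():
    print(key, "within limit" if ok else "exceeds limit")
```

Skipping non-numeric readings is a design choice for this sketch. A stricter pipeline might record 불검출 as a value below the detection limit rather than dropping it.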