Overview

Dataset statistics

Number of variables: 12
Number of observations: 203
Missing cells: 377
Missing cells (%): 15.5%
Duplicate rows: 4
Duplicate rows (%): 2.0%
Total size in memory: 19.4 KiB
Average record size in memory: 97.7 B
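The headline figures follow from the per-column counts listed under Alerts. The snippet below is a minimal arithmetic check, assuming 1 KiB = 1024 bytes and taking every count from this report:

```python
# Per-column missing counts, copied from the "Missing" alerts in this report.
missing_per_column = {
    "Unnamed: 0": 203,
    "Unnamed: 3": 11,
    "Unnamed: 4": 35,
    "Unnamed: 5": 23,
    "Unnamed: 6": 35,
    "Unnamed: 8": 35,
    "Unnamed: 9": 35,
}

n_variables = 12
n_observations = 203
total_cells = n_variables * n_observations  # 2,436 cells

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

# Average record size = total memory / number of rows (1 KiB = 1024 bytes).
# 19.4 KiB is itself rounded, so this lands near, not exactly on, 97.7 B.
avg_record_bytes = 19.4 * 1024 / n_observations

print(missing_cells)          # 377
print(f"{missing_pct:.1f}%")  # 15.5%
print(f"{avg_record_bytes:.1f} B")
```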

Variable types

Unsupported: 7
Categorical: 5

Dataset

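Description: Secure baseline water-quality data for responding to algal bloom issues and use it in tap water production. The aim is to supply tap water that the public can drink with confidence, while also raising awareness of how tap water is supplied.
Author: 한국표준과학연구원 (Korea Research Institute of Standards and Science)
URL: https://www.data.go.kr/data/15053555/fileData.do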

Alerts

Dataset has 4 (2.0%) duplicate rows [Duplicates]
( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) ("unit: μg/L, uncertainty: k = 2, confidence level approx. 95 %") is highly correlated with Unnamed: 1 and 3 other fields [High correlation]
Unnamed: 2 is highly correlated with Unnamed: 1 and 1 other field [High correlation]
Unnamed: 10 is highly correlated with Unnamed: 7 and 1 other field [High correlation]
Unnamed: 7 is highly correlated with Unnamed: 10 and 1 other field [High correlation]
Unnamed: 1 is highly correlated with Unnamed: 2 and 1 other field [High correlation]
Unnamed: 0 has 203 (100.0%) missing values [Missing]
Unnamed: 3 has 11 (5.4%) missing values [Missing]
Unnamed: 4 has 35 (17.2%) missing values [Missing]
Unnamed: 5 has 23 (11.3%) missing values [Missing]
Unnamed: 6 has 35 (17.2%) missing values [Missing]
Unnamed: 8 has 35 (17.2%) missing values [Missing]
Unnamed: 9 has 35 (17.2%) missing values [Missing]
Unnamed: 0 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
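The seven Unsupported columns mix several kinds of cell (see the Sample rows):

- numbers,
- the marker 불검출 ("not detected"),
- a "-" placeholder in the uncertainty columns,
- header text from the original spreadsheet.

Below is a minimal cleaning sketch. It assumes those are the only non-numeric markers. The function name and the flag labels are hypothetical.

```python
import math

NOT_DETECTED = "불검출"  # "not detected", as it appears in the Geosmin/2-MIB result columns
NO_VALUE = "-"           # placeholder seen in the uncertainty columns


def parse_measurement(raw):
    """Convert one raw cell into a (value, flag) pair.

    (float, "measured")    numeric cells such as "0.0011" or 7.08
    (None, "not_detected") the marker 불검출
    (None, "missing")      "-", empty strings, None, or NaN
    (None, "non_numeric")  anything else, e.g. header text such as "측정결과"
    """
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None, "missing"
    text = str(raw).strip()
    if text == NOT_DETECTED:
        return None, "not_detected"
    if text in ("", NO_VALUE):
        return None, "missing"
    try:
        return float(text), "measured"
    except ValueError:
        return None, "non_numeric"


for cell in ["0.0011", "불검출", "-", float("nan"), "측정결과", 7.08]:
    print(repr(cell), "->", parse_measurement(cell))
```

Keeping the flag apart from the value means 불검출 is neither silently turned into 0 nor lumped in with genuinely absent cells.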

Reproduction

Analysis started: 2024-04-18 07:23:30.470994
Analysis finished: 2024-04-18 07:23:31.114367
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
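The Duration line is simply the difference between the two timestamps. The report itself was produced by ydata-profiling; this standard-library check does not call it:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-04-18 07:23:30.470994")
finished = datetime.fromisoformat("2024-04-18 07:23:31.114367")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.64 seconds
```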

Variables

Unnamed: 0
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 203
Missing (%): 100.0%
Memory size: 1.9 KiB

Unnamed: 1
Categorical

HIGH CORRELATION

Distinct: 17
Distinct (%): 8.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

<NA>: 35
연초: 12
반송: 12
고령: 12
학야: 12
Other values (12): 120

Length

Max length: 4
Median length: 2
Mean length: 2.4187192
Min length: 2

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: 정수장
2nd row: <NA>
3rd row: <NA>
4th row: 성남
5th row: 고양

Common Values

Value | Count | Frequency (%)
<NA> | 35 | 17.2%
연초 | 12 | 5.9%
반송 | 12 | 5.9%
고령 | 12 | 5.9%
학야 | 12 | 5.9%
화순 | 12 | 5.9%
정수장 | 12 | 5.9%
공주 | 12 | 5.9%
청주 | 12 | 5.9%
황지 | 12 | 5.9%
Other values (7) | 60 | 29.6%
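Each Common Values table lists the ten most frequent values and folds everything else into one "Other values (k)" row. Frequencies are given as a share of all 203 rows. The standard-library sketch below shows that folding, assuming a top-10 cut-off as displayed here.

The test column is hypothetical. It is shaped like Unnamed: 7, whose counts are fully recoverable from its own section: 23 <NA>, twelve dates with 13 rows each, and two header strings with 12 rows each.

```python
from collections import Counter


def top_values(values, n=10):
    """Sketch of a "Common Values" table: the n most frequent values, then a
    single "Other values (k)" row that folds together the remaining k values.
    Frequencies are percentages of all rows, rounded to one decimal."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()
    rows = [(value, count, round(100 * count / total, 1)) for value, count in ranked[:n]]
    rest = ranked[n:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


# Hypothetical column shaped like Unnamed: 7 (203 rows in total).
column = (["<NA>"] * 23
          + [f"date-{i}" for i in range(12) for _ in range(13)]
          + ["측정정보"] * 12 + ["채취일자"] * 12)
for row in top_values(column):
    print(row)
```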

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
na | 35 | 17.2%
연초 | 12 | 5.9%
반송 | 12 | 5.9%
고령 | 12 | 5.9%
학야 | 12 | 5.9%
화순 | 12 | 5.9%
정수장 | 12 | 5.9%
공주 | 12 | 5.9%
청주 | 12 | 5.9%
황지 | 12 | 5.9%
Other values (5) | 60 | 29.6%

Unnamed: 2
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

<NA>: 35
위치: 12
경기도 성남시 수정구 사송동 산88-5: 12
경기도 고양시 일산 동구 산황동 300: 12
강원도 태백시 황연동 산174-2: 12
Other values (10): 120

Length

Max length: 24
Median length: 21
Mean length: 15.349754
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 위치
2nd row: <NA>
3rd row: <NA>
4th row: 경기도 성남시 수정구 사송동 산88-5
5th row: 경기도 고양시 일산 동구 산황동 300

Common Values

Value | Count | Frequency (%)
<NA> | 35 | 17.2%
위치 | 12 | 5.9%
경기도 성남시 수정구 사송동 산88-5 | 12 | 5.9%
경기도 고양시 일산 동구 산황동 300 | 12 | 5.9%
강원도 태백시 황연동 산174-2 | 12 | 5.9%
충북 청주시 흥덕구 성화동 286 | 12 | 5.9%
충북 충주시 용탄동 305 | 12 | 5.9%
충남 공주시 월송동 산42-7번지 | 12 | 5.9%
전북 완주군 고산면 성재리 27 | 12 | 5.9%
전남 화순군 화순읍 일심리 224 | 12 | 5.9%
Other values (5) | 60 | 29.6%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
경남 | 36 | 4.4%
na | 35 | 4.2%
경북 | 24 | 2.9%
충북 | 24 | 2.9%
경기도 | 24 | 2.9%
신현읍 | 12 | 1.5%
거제시 | 12 | 1.5%
화순군 | 12 | 1.5%
화순읍 | 12 | 1.5%
일심리 | 12 | 1.5%
Other values (52) | 624 | 75.5%

Unnamed: 3
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 11
Missing (%): 5.4%
Memory size: 1.7 KiB

Unnamed: 4
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 35
Missing (%): 17.2%
Memory size: 1.7 KiB

Unnamed: 5
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 23
Missing (%): 11.3%
Memory size: 1.7 KiB

Unnamed: 6
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 35
Missing (%): 17.2%
Memory size: 1.7 KiB

Unnamed: 7
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

<NA>: 23
2019.09.23: 13
2019.10.25: 13
2019.11.29: 13
2019.12.31: 13
Other values (10): 128

Length

Max length: 10
Median length: 10
Mean length: 8.6108374
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 측정정보
3rd row: 채취일자
4th row: 2019.09.23
5th row: 2019.09.23

Common Values

Value | Count | Frequency (%)
<NA> | 23 | 11.3%
2019.09.23 | 13 | 6.4%
2019.10.25 | 13 | 6.4%
2019.11.29 | 13 | 6.4%
2019.12.31 | 13 | 6.4%
2020.01.31 | 13 | 6.4%
2020.02.28 | 13 | 6.4%
2020.03.12 | 13 | 6.4%
2020.04.30 | 13 | 6.4%
2020.05.29 | 13 | 6.4%
Other values (5) | 63 | 31.0%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
na | 23 | 11.3%
2019.09.23 | 13 | 6.4%
2019.10.25 | 13 | 6.4%
2019.11.29 | 13 | 6.4%
2019.12.31 | 13 | 6.4%
2020.01.31 | 13 | 6.4%
2020.02.28 | 13 | 6.4%
2020.03.12 | 13 | 6.4%
2020.04.30 | 13 | 6.4%
2020.05.29 | 13 | 6.4%
Other values (5) | 63 | 31.0%
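Unnamed: 7 holds the sampling dates (채취일자) as YYYY.MM.DD strings, mixed with <NA> and two header strings. That mixture is why it is profiled as categorical rather than as dates. Below is a minimal parsing sketch, assuming every real date uses that dotted format. The function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_sampling_date(text) -> Optional[date]:
    """Parse a 채취일자 (sampling date) cell such as "2019.09.23".

    Returns None for cells that are not dates, such as "<NA>", None, or the
    header strings "측정정보" and "채취일자" that also sit in this column.
    """
    try:
        return datetime.strptime(text.strip(), "%Y.%m.%d").date()
    except (AttributeError, ValueError):
        return None


for cell in ["2019.09.23", "채취일자", None]:
    print(repr(cell), "->", parse_sampling_date(cell))
```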

Unnamed: 8
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 35
Missing (%): 17.2%
Memory size: 1.7 KiB

Unnamed: 9
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 35
Missing (%): 17.2%
Memory size: 1.7 KiB

Unnamed: 10
Categorical

HIGH CORRELATION

Distinct: 18
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

<NA>: 35
2020.07.02 ~ 07.05: 13
2020.08.03 ~ 08.05: 13
2020.06.01 ~ 06.03: 13
2020.02.04 ~ 02.07: 13
Other values (13): 116

Length

Max length: 18
Median length: 18
Mean length: 14.758621
Min length: 4

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 시험일자
4th row: 2019.09.25 ~ 09.27
5th row: 2019.09.25 ~ 09.27

Common Values

Value | Count | Frequency (%)
<NA> | 35 | 17.2%
2020.07.02 ~ 07.05 | 13 | 6.4%
2020.08.03 ~ 08.05 | 13 | 6.4%
2020.06.01 ~ 06.03 | 13 | 6.4%
2020.02.04 ~ 02.07 | 13 | 6.4%
2020.02.28 ~ 03.02 | 13 | 6.4%
2019.12.03 ~ 12.06 | 13 | 6.4%
2020.08.27 ~ 08.31 | 13 | 6.4%
2020.01.02 ~ 01.06 | 13 | 6.4%
2020.05.06 ~ 05.08 | 12 | 5.9%
Other values (8) | 52 | 25.6%
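Unnamed: 10 holds test periods (시험일자) such as "2020.07.02 ~ 07.05", where the end date omits the year. The sketch below assumes the end falls in the same year as the start. If that would put the end before the start, it rolls over to the next year. That rule and the function name are assumptions, not something the dataset documents.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_test_period(text) -> Optional[Tuple[date, date]]:
    """Parse a 시험일자 (test period) cell such as "2020.07.02 ~ 07.05".

    Returns (start, end), or None for "<NA>", header text such as "시험일자",
    None, or any cell that does not match the pattern.
    """
    try:
        start_text, end_text = (part.strip() for part in text.split("~"))
        start = datetime.strptime(start_text, "%Y.%m.%d").date()
        end_month, end_day = (int(part) for part in end_text.split("."))
        end = date(start.year, end_month, end_day)
    except (AttributeError, ValueError):
        return None
    if end < start:
        # Assumed New Year rollover, e.g. a hypothetical "2019.12.30 ~ 01.02".
        end = end.replace(year=start.year + 1)
    return start, end


print(parse_test_period("2020.02.28 ~ 03.02"))
```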

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
(blank) | 156 | 30.3%
na | 35 | 6.8%
2020.02.28 | 13 | 2.5%
2020.07.02 | 13 | 2.5%
2020.01.02 | 13 | 2.5%
08.31 | 13 | 2.5%
2020.08.27 | 13 | 2.5%
12.06 | 13 | 2.5%
2019.12.03 | 13 | 2.5%
03.02 | 13 | 2.5%
Other values (25) | 220 | 42.7%
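The second frequency table under each Length heading counts individual words rather than whole values. It is lower-cased: <NA> shows up as na, and K-water as k-water.

Suppose the words are split on whitespace and stripped of surrounding punctuation. That is consistent with these tables but not confirmed here. Under that assumption, the "~" in each of the 156 test periods would shrink to an empty word, which would explain the unlabeled row with 156 occurrences. A standard-library sketch of that tokenization:

```python
import string
from collections import Counter


def word_counts(values):
    """Count lower-cased, whitespace-separated words after stripping leading
    and trailing punctuation from each word."""
    counts = Counter()
    for value in values:
        for word in str(value).lower().split():
            counts[word.strip(string.punctuation)] += 1
    return counts


cells = ["2020.07.02 ~ 07.05", "2020.07.02 ~ 07.05", "<NA>", "K-water"]
counts = word_counts(cells)
print(counts)
```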

( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

K-water: 156
<NA>: 35
분석자: 12

Length

Max length: 7
Median length: 7
Mean length: 6.2463054
Min length: 3
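The length statistics for this column can be reproduced from its three values. The check below assumes missing cells are measured as their four-character display string "<NA>". That is an assumption, but one that reproduces the reported mean to the seven decimals shown.

```python
import statistics

# Value counts for this column, from its Common Values table.
value_counts = {"K-water": 156, "<NA>": 35, "분석자": 12}

# Assumption: missing cells are measured as their 4-character display "<NA>".
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

mean_length = statistics.mean(lengths)
median_length = statistics.median(lengths)
min_length, max_length = min(lengths), max(lengths)

print(f"{mean_length:.7f}", median_length, min_length, max_length)
# 6.2463054 7 3 7
```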

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 분석자
4th row: K-water
5th row: K-water

Common Values

Value | Count | Frequency (%)
K-water | 156 | 76.8%
<NA> | 35 | 17.2%
분석자 | 12 | 5.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value | Count | Frequency (%)
k-water | 156 | 76.8%
na | 35 | 17.2%
분석자 | 12 | 5.9%

Correlations

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 7 | Unnamed: 10 | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %)
Unnamed: 1 | 1.000 | 1.000 | 0.000 | 0.000 | NaN
Unnamed: 2 | 1.000 | 1.000 | 0.000 | 0.000 | NaN
Unnamed: 7 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
Unnamed: 10 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) | NaN | NaN | 1.000 | 1.000 | 1.000

Variable | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) | Unnamed: 2 | Unnamed: 10 | Unnamed: 7 | Unnamed: 1
( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) | 1.000 | 1.000 | 0.954 | 0.966 | 1.000
Unnamed: 2 | 1.000 | 1.000 | 0.000 | 0.000 | 0.993
Unnamed: 10 | 0.954 | 0.000 | 1.000 | 0.987 | 0.000
Unnamed: 7 | 0.966 | 0.000 | 0.987 | 1.000 | 0.000
Unnamed: 1 | 1.000 | 0.993 | 0.000 | 0.000 | 1.000

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 7 | Unnamed: 10 | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %)
Unnamed: 1 | 1.000 | 0.993 | 0.000 | 0.000 | 1.000
Unnamed: 2 | 0.993 | 1.000 | 0.000 | 0.000 | 1.000
Unnamed: 7 | 0.000 | 0.000 | 1.000 | 0.987 | 0.966
Unnamed: 10 | 0.000 | 0.000 | 0.987 | 1.000 | 0.954
( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) | 1.000 | 1.000 | 0.966 | 0.954 | 1.000
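Cramér's V is a standard measure of association between two categorical variables. It runs from 0 (no association) to 1 (each category of one variable determines the other). This page does not say which estimator produced each matrix, so the sketch below is a plain Cramér's V without bias correction, not ydata-profiling's own code. It shows why a site name and its address score 1 when they always appear together.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) between two equal-length categorical sequences."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return float("nan")  # undefined when either variable has a single category
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# A site name that always maps to the same address is perfectly associated.
sites = ["성남", "성남", "고양", "고양"]
addresses = ["산88-5", "산88-5", "산황동 300", "산황동 300"]
print(cramers_v(sites, addresses))  # 1.0
```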

Missing values

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completeness.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
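One common way to compute nullity correlation is the Pearson correlation of 0/1 missingness indicators. This is the approach popularised by the missingno library. The standard-library sketch below works under that assumption and is not taken from ydata-profiling's source.

```python
import math


def nullity_correlation(a, b):
    """Pearson correlation between the missingness indicators of two columns.

    None marks a missing cell. Returns NaN when either column is always or
    never missing (for example Unnamed: 0, which is 100% missing).
    """
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical columns that are missing in exactly the same rows.
print(nullity_correlation([None, 1.0, None, 2.0], [None, 0.5, None, 0.7]))
```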

Sample

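First rows

Index | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %)
0 | <NA> | 정수장 | 위치 | 2019년 9월 | NaN | NaN | NaN | <NA> | NaN | NaN | <NA> | <NA>
1 | <NA> | <NA> | <NA> | Geosmin 측정값 (μg/L) | NaN | 2-MIB 측정값 (μg/L) | NaN | 측정정보 | NaN | NaN | <NA> | <NA>
2 | <NA> | <NA> | <NA> | 측정결과 | 확장불확도 | 측정결과 | 확장불확도 | 채취일자 | pH | 수온(℃) | 시험일자 | 분석자
3 | <NA> | 성남 | 경기도 성남시 수정구 사송동 산88-5 | 불검출 | - | 불검출 | - | 2019.09.23 | 7.7 | 22.6 | 2019.09.25 ~ 09.27 | K-water
4 | <NA> | 고양 | 경기도 고양시 일산 동구 산황동 300 | 0.001 | 0.0001 | 불검출 | - | 2019.09.23 | 7.08 | 22.5 | 2019.09.25 ~ 09.27 | K-water
5 | <NA> | 황지 | 강원도 태백시 황연동 산174-2 | 불검출 | - | 불검출 | - | 2019.09.23 | 7.07 | 16.2 | 2019.09.25 ~ 09.27 | K-water
6 | <NA> | 청주 | 충북 청주시 흥덕구 성화동 286 | 불검출 | - | 불검출 | - | 2019.09.23 | 7.45 | 19.3 | 2019.09.25 ~ 09.27 | K-water
7 | <NA> | 충주 | 충북 충주시 용탄동 305 | 불검출 | - | 불검출 | - | 2019.09.23 | 7.28 | 19 | 2019.09.25 ~ 09.27 | K-water
8 | <NA> | 공주 | 충남 공주시 월송동 산42-7번지 | 불검출 | - | 불검출 | - | 2019.09.23 | 6.99 | 16.2 | 2019.09.25 ~ 09.27 | K-water
9 | <NA> | 고산 | 전북 완주군 고산면 성재리 27 | 0.0011 | 0.0001 | 불검출 | - | 2019.09.23 | 6.89 | 12.4 | 2019.09.25 ~ 09.27 | K-water

Last rows

Index | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %)
193 | <NA> | 청주 | 충북 청주시 흥덕구 성화동 286 | 불검출 | - | 불검출 | - | 2020.08.27 | 6.96 | 20.9 | 2020.08.27 ~ 08.31 | K-water
194 | <NA> | 충주 | 충북 충주시 용탄동 305 | 불검출 | - | 불검출 | - | 2020.08.27 | 7.6 | 19.5 | 2020.08.27 ~ 08.31 | K-water
195 | <NA> | 공주 | 충남 공주시 월송동 산42-7번지 | 0.0012 | 0.0001 | 불검출 | - | 2020.08.27 | 6.81 | 21.2 | 2020.08.27 ~ 08.31 | K-water
196 | <NA> | 고산 | 전북 완주군 고산면 성재리 27 | 불검출 | - | 불검출 | - | 2020.08.27 | 6.83 | 21.9 | 2020.08.27 ~ 08.31 | K-water
197 | <NA> | 화순 | 전남 화순군 화순읍 일심리 224 | 불검출 | - | 불검출 | - | 2020.08.27 | 6.74 | 22.9 | 2020.08.27 ~ 08.31 | K-water
198 | <NA> | 학야 | 경북 포항시 북구 기계면 학야리 730 | 불검출 | - | 불검출 | - | 2020.08.27 | 7.38 | 17.5 | 2020.08.27 ~ 08.31 | K-water
199 | <NA> | 고령 | 경북 고령군 다산면 노곡리 746 | 불검출 | - | 불검출 | - | 2020.08.27 | 7.47 | 27.7 | 2020.08.27 ~ 08.31 | K-water
200 | <NA> | 반송 | 경남 창원시 석영길 12번지 (반림동 25) | 불검출 | - | 불검출 | - | 2020.08.27 | 7.08 | 28.5 | 2020.08.27 ~ 08.31 | K-water
201 | <NA> | 밀양 | 경남 밀양시 산외면 다죽리 402 | 불검출 | - | 불검출 | - | 2020.08.27 | 7.1 | 21.7 | 2020.08.27 ~ 08.31 | K-water
202 | <NA> | 연초 | 경남 거제시 신현읍 삼거리 529-4 | 불검출 | - | 불검출 | - | 2020.08.27 | 6.78 | 27.9 | 2020.08.27 ~ 08.31 | K-water

Korean terms in these rows (values are kept as they appear in the data):
- 정수장 = water treatment plant
- 위치 = location
- 2019년 9월 = September 2019
- Geosmin 측정값 (μg/L) = Geosmin measured value (μg/L)
- 2-MIB 측정값 (μg/L) = 2-MIB measured value (μg/L)
- 측정정보 = measurement information
- 측정결과 = measurement result
- 확장불확도 = expanded uncertainty
- 채취일자 = sampling date
- 수온(℃) = water temperature (℃)
- 시험일자 = test date
- 분석자 = analyst
- 불검출 = not detected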
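Rows 0 to 2 of the sample are the original spreadsheet's stacked header, which is why every column is named "Unnamed: n". One way to rebuild readable names is to forward-fill each header row across the cells a label appears to span (as merged cells would) and then join the parts.

The sketch below works under that assumption. It uses rows 1 and 2 for the measurement columns (Unnamed: 3 onward). The function name and the " / " separator are hypothetical choices.

```python
def combine_header_rows(header_rows):
    """Build one name per column from stacked header rows.

    None marks an empty header cell. Each row is forward-filled from left to
    right, then the non-empty parts of each column are joined with " / ".
    """
    filled_rows = []
    for row in header_rows:
        filled, last = [], None
        for cell in row:
            if cell is not None:
                last = cell
            filled.append(last)
        filled_rows.append(filled)
    n_cols = len(header_rows[0])
    return [
        " / ".join(r[col] for r in filled_rows if r[col] is not None)
        for col in range(n_cols)
    ]


# Rows 1 and 2 of the sample, restricted to Unnamed: 3 through the last column.
row1 = ["Geosmin 측정값 (μg/L)", None, "2-MIB 측정값 (μg/L)", None, "측정정보",
        None, None, None, None]
row2 = ["측정결과", "확장불확도", "측정결과", "확장불확도", "채취일자",
        "pH", "수온(℃)", "시험일자", "분석자"]
for name in combine_header_rows([row1, row2]):
    print(name)
```

Check against the source file whether 측정정보 really spans all of the last five columns. It may cover only some of them.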

Duplicate rows

Most frequently occurring

Index | Unnamed: 1 | Unnamed: 2 | Unnamed: 7 | Unnamed: 10 | ( 단위: μg/L, 불확도: k = 2, 신뢰 수준 약 95 %) | # duplicates
0 | 정수장 | 위치 | <NA> | <NA> | <NA> | 12
1 | <NA> | <NA> | 채취일자 | 시험일자 | 분석자 | 12
2 | <NA> | <NA> | 측정정보 | <NA> | <NA> | 12
3 | <NA> | <NA> | <NA> | <NA> | <NA> | 11
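The # duplicates column counts how many times each repeated row occurs. The headline "Duplicate rows: 4" matches the number of distinct repeated rows listed here, not the total number of repeated copies. The repeats appear consistent with the source spreadsheet repeating its header and blank separator rows for each monthly block. Below is a standard-library sketch of that counting on hypothetical rows.

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows: a repeated header row, repeated blank rows, one data row.
rows = (
    [("정수장", "위치", None, None, None)] * 3
    + [(None, None, None, None, None)] * 2
    + [("성남", "경기도 성남시 수정구 사송동 산88-5", "2019.09.23", "2019.09.25 ~ 09.27", "K-water")]
)
dups = duplicate_rows(rows)
print(len(dups))  # 2 distinct duplicated rows
```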