Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B

Variable types

DateTime: 1
Numeric: 2
Categorical: 4

Dataset

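Description: A service for querying the hourly measurement status of each facility at the Sangdong sewage treatment plant operated by the Gimhae Urban Development Corporation (김해도시개발공사). It provides fields such as the reference date (기준연월일), reference hour (기준시간), sewage treatment plant name (하수처리장구분명), measurement type (계측구분명), and measured value (계측값).
Author: Gimhae Urban Development Corporation (김해시도시개발공사)
URL: https://www.data.go.kr/data/15096555/fileData.do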

Alerts

하수처리장구분명 has constant value "상동 공공하수처리시설" (Constant)
계측단위 is highly overall correlated with 기준시간 and 3 other fields (High correlation)
계측태그명 is highly overall correlated with 계측값 and 2 other fields (High correlation)
계측구분명 is highly overall correlated with 계측값 and 2 other fields (High correlation)
기준시간 is highly overall correlated with 계측단위 (High correlation)
계측값 is highly overall correlated with 계측구분명 and 2 other fields (High correlation)
계측단위 is highly imbalanced (89.3%) (Imbalance)
기준시간 has 396 (4.0%) zeros (Zeros)
계측값 has 775 (7.8%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 14:21:16.927146
Analysis finished: 2023-12-12 14:21:18.248732
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기준연월일 (reference date)
DateTime

Distinct: 1302
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2018-01-01 00:00:00
Maximum: 2021-08-25 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]
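The 기준연월일 range can be checked against the distinct-date count with simple calendar arithmetic. The minimal sketch below (standard library only; the helper name `missing_day_count` is hypothetical and not part of ydata-profiling) counts the calendar days between the reported minimum and maximum, inclusive, and subtracts the 1,302 distinct dates the profile found. It assumes every distinct date lies inside that range.

```python
from datetime import date


def missing_day_count(first: date, last: date, distinct_dates: int) -> int:
    """Calendar days in [first, last] that have no rows.

    Assumes every one of the `distinct_dates` falls inside the range.
    """
    span_days = (last - first).days + 1  # +1 so both endpoints are counted
    return span_days - distinct_dates


# Minimum, maximum, and distinct count reported for 기준연월일
first_day, last_day = date(2018, 1, 1), date(2021, 8, 25)
print((last_day - first_day).days + 1)  # 1333 calendar days
print(missing_day_count(first_day, last_day, 1302))  # 31 days without rows
```

Under that assumption, 31 calendar days in the range have no rows among the profiled observations; the profile alone does not say whether those days are absent from the source file or only from the profiled rows.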

기준시간 (reference hour)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.5807
Minimum: 0
Maximum: 23
Zeros: 396
Zeros (%): 4.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 6
Median: 12
Q3: 18
95th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.9359569
Coefficient of variation (CV): 0.59892381
Kurtosis: -1.2151545
Mean: 11.5807
Median Absolute Deviation (MAD): 6
Skewness: -0.0057697122
Sum: 115807
Variance: 48.107498
Monotonicity: Not monotonic
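The statistics above follow standard definitions: CV = σ/μ, variance = σ², IQR = Q3 - Q1, and MAD = median(|x - median(x)|). The sketch below recomputes them with Python's `statistics` module. It is an illustration, not ydata-profiling's code, and the helper name `describe` is hypothetical; it assumes the sample (n - 1) standard deviation and linearly interpolated quartiles (`method="inclusive"`), which correspond to the pandas defaults.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Recompute a subset of the profile's numeric statistics for `values`."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Linear interpolation between order statistics for the quartiles
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(values),
        "cv": std / mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mad": statistics.median(abs(v - median) for v in values),
        "zeros": sum(1 for v in values if v == 0),
    }


stats = describe([0, 2, 4, 6, 8])
print(f"variance={stats['variance']}, iqr={stats['iqr']}, mad={stats['mad']}")
```

Applied to the reported figures for 기준시간, 6.9359569 / 11.5807 ≈ 0.5989 (the reported CV) and 6.9359569² ≈ 48.1075 (the reported variance).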
[Figure: Histogram with fixed size bins (bins=24)]

Common Values

Value | Count | Frequency (%)
19 | 456 | 4.6%
23 | 455 | 4.5%
6 | 453 | 4.5%
9 | 439 | 4.4%
16 | 437 | 4.4%
12 | 436 | 4.4%
5 | 434 | 4.3%
2 | 429 | 4.3%
15 | 426 | 4.3%
20 | 425 | 4.2%
Other values (14) | 5610 | 56.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 396 | 4.0%
1 | 397 | 4.0%
2 | 429 | 4.3%
3 | 405 | 4.0%
4 | 418 | 4.2%
5 | 434 | 4.3%
6 | 453 | 4.5%
7 | 385 | 3.9%
8 | 406 | 4.1%
9 | 439 | 4.4%

Maximum 10 values

Value | Count | Frequency (%)
23 | 455 | 4.5%
22 | 400 | 4.0%
21 | 414 | 4.1%
20 | 425 | 4.2%
19 | 456 | 4.6%
18 | 394 | 3.9%
17 | 420 | 4.2%
16 | 437 | 4.4%
15 | 426 | 4.3%
14 | 405 | 4.0%

하수처리장구분명 (sewage treatment plant name)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
상동 공공하수처리시설 | 10000

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상동 공공하수처리시설
2nd row: 상동 공공하수처리시설
3rd row: 상동 공공하수처리시설
4th row: 상동 공공하수처리시설
5th row: 상동 공공하수처리시설

Common Values

Value | Count | Frequency (%)
상동 공공하수처리시설 (Sangdong public sewage treatment facility) | 10000 | 100.0%

Length (Plot)

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Word | Count | Frequency (%)
상동 | 10000 | 50.0%
공공하수처리시설 | 10000 | 50.0%

계측구분명 (measurement type)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
유입유량 적산 | 6559
유량조정조 수위 | 3300
반응조PH | 141

Length

Max length: 8
Median length: 7
Mean length: 7.3018
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 유입유량 적산
2nd row: 유입유량 적산
3rd row: 유량조정조 수위
4th row: 유입유량 적산
5th row: 유입유량 적산

Common Values

Value | Count | Frequency (%)
유입유량 적산 (cumulative inflow) | 6559 | 65.6%
유량조정조 수위 (flow equalization tank level) | 3300 | 33.0%
반응조PH (reaction tank pH) | 141 | 1.4%

Length (Plot)

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Word | Count | Frequency (%)
유입유량 | 6559 | 33.0%
적산 | 6559 | 33.0%
유량조정조 | 3300 | 16.6%
수위 | 3300 | 16.6%
반응조ph | 141 | 0.7%
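The word table for 계측구분명 is consistent with splitting each category value on whitespace and lowercasing the tokens (반응조PH appears as 반응조ph), with frequencies taken over all words rather than all rows: 6,559 / 19,859 ≈ 33.0%. The sketch below reproduces those counts under that assumption. The helper `word_frequencies` is hypothetical, and stripping surrounding punctuation is a guess based on `<NA>` being listed as `na` for 계측단위; ydata-profiling's actual tokenizer may differ.

```python
from collections import Counter


def word_frequencies(value_counts: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Split category values into lowercase words, weighted by row count.

    Returns {word: (count, percent of all words)}. Stripping surrounding
    punctuation is an assumption chosen to mimic '<NA>' appearing as 'na'.
    """
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            token = token.strip("<>()[]{}.,;:!?\"'")
            if token:
                words[token] += count
    total = sum(words.values())
    return {word: (n, 100 * n / total) for word, n in words.most_common()}


# Category counts reported for 계측구분명
for word, (n, pct) in word_frequencies(
    {"유입유량 적산": 6559, "유량조정조 수위": 3300, "반응조PH": 141}
).items():
    print(f"{word} {n} {pct:.1f}%")
```

The printed percentages (33.0, 33.0, 16.6, 16.6, 0.7) match the table, which supports the "share of all words" reading of the Frequency column.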

계측태그명 (measurement tag name)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
FIT-101A | 3334
LIT-101 | 3300
FIT-101B | 3225
PHIT-202A | 141

Length

Max length: 9
Median length: 8
Mean length: 7.6841
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FIT-101B
2nd row: FIT-101B
3rd row: LIT-101
4th row: FIT-101B
5th row: FIT-101A

Common Values

Value | Count | Frequency (%)
FIT-101A | 3334 | 33.3%
LIT-101 | 3300 | 33.0%
FIT-101B | 3225 | 32.2%
PHIT-202A | 141 | 1.4%

Length (Plot)

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Word | Count | Frequency (%)
fit-101a | 3334 | 33.3%
lit-101 | 3300 | 33.0%
fit-101b | 3225 | 32.2%
phit-202a | 141 | 1.4%

계측단위 (measurement unit)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA> | 9859
pH | 141

Length

Max length: 4
Median length: 4
Mean length: 3.9718
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9859 | 98.6%
pH | 141 | 1.4%
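Although most 계측단위 values display as `<NA>`, the profile reports 0 missing cells for this column. The length statistics explain why: a mean length of 3.9718 equals (9,859 × 4 + 141 × 2) / 10,000, so these entries are the four-character text `<NA>` rather than nulls. If they should count as missing, one option is to normalize the placeholder before profiling. The sketch below shows this on plain row dictionaries; the helper names `normalize_placeholders` and `missing_count` are hypothetical.

```python
from typing import Optional

PLACEHOLDER = "<NA>"  # text stored in 계측단위 where no unit is recorded


def normalize_placeholders(
    rows: list[dict[str, str]], column: str
) -> list[dict[str, Optional[str]]]:
    """Return copies of `rows` with PLACEHOLDER in `column` replaced by None."""
    cleaned = []
    for row in rows:
        new_row: dict[str, Optional[str]] = dict(row)  # leave the input untouched
        if new_row.get(column) == PLACEHOLDER:
            new_row[column] = None
        cleaned.append(new_row)
    return cleaned


def missing_count(rows: list[dict[str, Optional[str]]], column: str) -> int:
    """Count rows whose `column` is None or absent (a real null)."""
    return sum(1 for row in rows if row.get(column) is None)


rows = [{"계측단위": "<NA>"}, {"계측단위": "pH"}, {"계측단위": "<NA>"}]
print(missing_count(rows, "계측단위"))  # 0: the placeholder is ordinary text
print(missing_count(normalize_placeholders(rows, "계측단위"), "계측단위"))  # 2
```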

Length (Plot)

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Word | Count | Frequency (%)
na | 9859 | 98.6%
ph | 141 | 1.4%
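The 89.3% imbalance reported for 계측단위 in the Alerts is consistent with a score of 1 - H(p) / log(k), where H is the Shannon entropy of the category proportions and k is the number of distinct categories. ydata-profiling appears to compute a score of this form, but treat the formula as an assumption. With proportions 0.9859 and 0.0141 over k = 2 categories, the normalized entropy is about 0.107, giving a score of about 0.893. The sketch below uses only the standard library; `imbalance_score` is a hypothetical name.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 minus the Shannon entropy of `counts`, normalized by log(k).

    0 means perfectly balanced; values near 1 mean one category dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no balance to measure (assumption)
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


print(round(imbalance_score([9859, 141]), 3))  # 0.893 (계측단위: <NA> vs pH)
```

Normalizing by log(k) makes the score independent of the logarithm base and comparable across columns with different numbers of categories.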

계측값 (measured value)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 4592
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 538578.56
Minimum: 0
Maximum: 1164880
Zeros: 775
Zeros (%): 7.8%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2.25
Median: 700074
Q3: 954121.75
95th percentile: 1102982.8
Maximum: 1164880
Range: 1164880
Interquartile range (IQR): 954119.5

Descriptive statistics

Standard deviation: 454152.36
Coefficient of variation (CV): 0.84324256
Kurtosis: -1.7090882
Mean: 538578.56
Median Absolute Deviation (MAD): 372967
Skewness: -0.1965468
Sum: 5.3857856 × 10^9
Variance: 2.0625437 × 10^11
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
0.0 | 775 | 7.8%
2.04 | 34 | 0.3%
2.05 | 32 | 0.3%
2.06 | 30 | 0.3%
2.13 | 29 | 0.3%
2.07 | 29 | 0.3%
1.89 | 29 | 0.3%
2.43 | 29 | 0.3%
1.96 | 27 | 0.3%
2.15 | 26 | 0.3%
Other values (4582) | 8960 | 89.6%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 775 | 7.8%
0.13 | 1 | < 0.1%
0.53 | 1 | < 0.1%
0.67 | 1 | < 0.1%
0.79 | 1 | < 0.1%
0.83 | 1 | < 0.1%
0.86 | 1 | < 0.1%
0.88 | 1 | < 0.1%
0.92 | 1 | < 0.1%
0.98 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1164880.0 | 1 | < 0.1%
1164728.0 | 1 | < 0.1%
1164631.0 | 2 | < 0.1%
1164478.0 | 1 | < 0.1%
1164325.0 | 1 | < 0.1%
1164172.0 | 3 | < 0.1%
1163715.0 | 1 | < 0.1%
1162514.0 | 1 | < 0.1%
1162375.0 | 2 | < 0.1%
1162230.0 | 1 | < 0.1%

Interactions


Correlations

Variable | 기준시간 | 계측구분명 | 계측태그명 | 계측값
기준시간 | 1.000 | 0.000 | 0.000 | 0.000
계측구분명 | 0.000 | 1.000 | 1.000 | 0.714
계측태그명 | 0.000 | 1.000 | 1.000 | 0.670
계측값 | 0.000 | 0.714 | 0.670 | 1.000

Variable | 계측단위 | 계측태그명 | 계측구분명
계측단위 | 1.000 | 1.000 | 1.000
계측태그명 | 1.000 | 1.000 | 1.000
계측구분명 | 1.000 | 1.000 | 1.000

Variable | 기준시간 | 계측값 | 계측구분명 | 계측태그명 | 계측단위
기준시간 | 1.000 | 0.006 | 0.000 | 0.000 | 1.000
계측값 | 0.006 | 1.000 | 0.633 | 0.532 | 1.000
계측구분명 | 0.000 | 0.633 | 1.000 | 1.000 | 1.000
계측태그명 | 0.000 | 0.532 | 1.000 | 1.000 | 1.000
계측단위 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
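The report does not state which coefficient produced each matrix. As one illustration of why categorical pairs such as 계측태그명 and 계측구분명 reach 1.000, the sketch below computes Cramér's V (without bias correction), a standard association measure for two categorical variables, from a contingency table. The table is reconstructed from this report's category counts under an assumption: FIT-101A and FIT-101B always carry 유입유량 적산, LIT-101 always carries 유량조정조 수위, and PHIT-202A always carries 반응조PH. The counts are consistent with that (3,334 + 3,225 = 6,559), and the sample rows agree for the first three tags. When every tag maps to exactly one measurement type, V equals 1. The helper name `cramers_v` is hypothetical.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c table of observed counts (no bias correction)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Rows: FIT-101A, FIT-101B, LIT-101, PHIT-202A
# Columns: 유입유량 적산, 유량조정조 수위, 반응조PH
tag_by_type = [
    [3334, 0, 0],
    [3225, 0, 0],
    [0, 3300, 0],
    [0, 0, 141],
]
print(round(cramers_v(tag_by_type), 3))  # 1.0
```

A deterministic mapping like this carries no information beyond the tag itself, so a coefficient of 1.000 here reflects the data's coding scheme rather than a physical relationship between measurements.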

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

Index | 기준연월일 | 기준시간 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
74434 | 2019-05-22 | 10 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 797040.0
74008 | 2019-05-04 | 16 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 788980.0
25254 | 2020-12-02 | 6 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 2.45
82598 | 2020-04-28 | 14 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 965562.0
35008 | 2018-06-13 | 16 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 693249.0
92210 | 2021-06-19 | 2 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 1086672.0
45958 | 2019-09-14 | 22 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 913444.0
4656 | 2018-07-21 | 0 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 1.6
77973 | 2019-10-16 | 21 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 873780.0
10510 | 2019-03-23 | 22 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 2.39

Index | 기준연월일 | 기준시간 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
46144 | 2019-09-22 | 16 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 917044.0
58523 | 2021-02-24 | 11 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 1109549.0
41946 | 2019-03-31 | 18 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 830981.0
36018 | 2018-07-25 | 18 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 714997.0
27438 | 2021-03-03 | 6 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 1.76
23696 | 2020-09-26 | 8 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 2.66
45837 | 2019-09-09 | 21 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101A | <NA> | 911194.0
81523 | 2020-03-14 | 19 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 945267.0
63459 | 2018-02-15 | 3 | 상동 공공하수처리시설 | 유입유량 적산 | FIT-101B | <NA> | 580704.0
6843 | 2018-10-21 | 3 | 상동 공공하수처리시설 | 유량조정조 수위 | LIT-101 | <NA> | 2.1
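Because 계측값 pools readings from different instruments, its overall statistics describe a mixture. The sample rows show 유량조정조 수위 (LIT-101) values between 1.6 and 2.66 and 유입유량 적산 (FIT-101A/B) values in the hundreds of thousands, which is why Q1 is 2.25 while the median is 700074. Summaries are more meaningful per 계측구분명. The sketch below groups six of the sample rows by measurement type using only the standard library; the helper name `median_by_type` is hypothetical.

```python
import statistics
from collections import defaultdict

# (계측구분명, 계측값) pairs taken from the sample rows
rows = [
    ("유입유량 적산", 797040.0),
    ("유입유량 적산", 788980.0),
    ("유량조정조 수위", 2.45),
    ("유입유량 적산", 965562.0),
    ("유량조정조 수위", 1.6),
    ("유량조정조 수위", 2.39),
]


def median_by_type(pairs: list[tuple[str, float]]) -> dict[str, float]:
    """Median 계측값 for each 계측구분명."""
    groups: dict[str, list[float]] = defaultdict(list)
    for kind, value in pairs:
        groups[kind].append(value)
    return {kind: statistics.median(values) for kind, values in groups.items()}


# The pooled median lands between the two groups and describes neither
print(statistics.median(v for _, v in rows))
print(median_by_type(rows))
```

The pooled median of these six rows falls in the gap between a tank level of a few units and a cumulative inflow near 800,000, which mirrors the jump from Q1 to the median in the full 계측값 statistics.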