Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 905
Missing cells (%): 1.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
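The derived figures in this block can be recomputed from the raw counts. The following is a minimal sketch, assuming a "cell" is one row-column entry and that KiB means 1024 bytes; the variable names are illustrative and do not come from the profiler.

```python
# Raw counts copied from the Dataset statistics block.
n_variables = 6
n_observations = 10_000
missing_cells = 905
total_size_kib = 556.6

# Assumption: a "cell" is one (row, column) entry.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values reported above (1.5% and 57.0 B).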

Variable types

DateTime: 1
Categorical: 2
Text: 2
Numeric: 1

Dataset

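Description: A service for querying daily measurement readings for each sewage treatment facility operated by the Gimhae Urban Development Corporation (김해도시개발공사). It provides the reference date (기준연월일), sewage treatment plant name (하수처리장구분명), measurement name (계측구분명), measured value (계측값), and related fields.
Author: 김해시도시개발공사
URL: https://www.data.go.kr/data/15096571/fileData.do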

Alerts

하수처리장구분명 is highly overall correlated with 계측단위 [High correlation]
계측단위 is highly overall correlated with 하수처리장구분명 [High correlation]
계측태그명 has 605 (6.0%) missing values [Missing]
계측값 has 300 (3.0%) missing values [Missing]
계측값 is highly skewed (γ1 = 56.80990004) [Skewed]
계측값 has 1830 (18.3%) zeros [Zeros]
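To illustrate why a handful of extreme readings can trigger the Skewed alert, here is a minimal sketch of the moment-based skewness coefficient. The data are hypothetical, and the estimator has no bias correction, so it may differ slightly from the one the profiler uses.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical readings: a symmetric sample, and a sample of small
# values plus a single extreme outlier.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [0.0] * 50 + [2.5] * 49 + [1_000_000.0]

print(skewness(symmetric))
print(skewness(with_outlier))
```

The symmetric sample has a skewness of zero, while the single outlier pushes the second sample's skewness far above it.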

Reproduction

Analysis started: 2023-12-12 12:54:39.057428
Analysis finished: 2023-12-12 12:54:39.791445
Duration: 0.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기준연월일
DateTime

Distinct: 200
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2018-01-01 00:00:00
Maximum: 2018-07-19 00:00:00
Histogram with fixed size bins (bins=50)
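A quick check suggests the date column has no gaps: the span from Minimum to Maximum contains exactly as many calendar days as there are distinct values. This is a sketch using only the figures reported above.

```python
from datetime import date

start = date(2018, 1, 1)   # Minimum
end = date(2018, 7, 19)    # Maximum

# Inclusive number of calendar days in the range.
days_in_range = (end - start).days + 1
distinct_dates = 200  # "Distinct" value reported for this variable

print(days_in_range)
print(days_in_range == distinct_dates)
```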

하수처리장구분명
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

진영맑은물사업소 HANT반응조: 2463
장유 하수처리장: 1506
(증설)진례 하수처리장: 1362
진영맑은물사업소: 1347
안하 하수처리장: 1025
Other values (11): 2297

Length

Max length: 16
Median length: 13
Mean length: 10.904
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 진영맑은물사업소
2nd row: 진영맑은물사업소
3rd row: 장유 하수처리장
4th row: 안하 하수처리장
5th row: 진영맑은물사업소 HANT반응조

Common Values

Value | Count | Frequency (%)
진영맑은물사업소 HANT반응조 | 2463 | 24.6%
장유 하수처리장 | 1506 | 15.1%
(증설)진례 하수처리장 | 1362 | 13.6%
진영맑은물사업소 | 1347 | 13.5%
안하 하수처리장 | 1025 | 10.2%
상동 공공하수처리시설 | 972 | 9.7%
진례 하수처리장 | 527 | 5.3%
생림 하수처리장 | 487 | 4.9%
낙산마을 하수처리장 | 65 | 0.7%
용산마을 하수처리장 | 63 | 0.6%
Other values (6) | 183 | 1.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
하수처리장 | 5178 | 27.8%
진영맑은물사업소 | 3810 | 20.4%
hant반응조 | 2463 | 13.2%
장유 | 1506 | 8.1%
증설)진례 | 1362 | 7.3%
안하 | 1025 | 5.5%
상동 | 972 | 5.2%
공공하수처리시설 | 972 | 5.2%
진례 | 527 | 2.8%
생림 | 487 | 2.6%
Other values (9) | 351 | 1.9%

계측구분명
Text

Distinct: 257
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 18
Mean length: 7.9533
Min length: 2

Characters and Unicode

Total characters: 79533
Distinct characters: 166
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 포기조 용존산소량
2nd row: 케잌호퍼 중량
3rd row: 2지슬러지량
4th row: 구연산 공급펌프 토출유량 적산
5th row: 여과막 토출량계
Value | Count | Frequency (%)
수위 | 997 | 5.5%
여과막 | 824 | 4.6%
토출량계 | 824 | 4.6%
슬러지 | 390 | 2.2%
반응조 | 375 | 2.1%
유량 | 365 | 2.0%
분리막 | 345 | 1.9%
압력계 | 345 | 1.9%
흡입 | 345 | 1.9%
공급유량 | 272 | 1.5%
Other values (236) | 12954 | 71.8%

Most occurring characters

ValueCountFrequency (%)
8036
 
10.1%
5034
 
6.3%
3560
 
4.5%
3533
 
4.4%
3012
 
3.8%
2570
 
3.2%
1957
 
2.5%
1861
 
2.3%
1831
 
2.3%
1752
 
2.2%
Other values (156) 46387
58.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 60576 | 76.2%
Space Separator | 8036 | 10.1%
Uppercase Letter | 7996 | 10.1%
Close Punctuation | 949 | 1.2%
Open Punctuation | 949 | 1.2%
Decimal Number | 655 | 0.8%
Lowercase Letter | 314 | 0.4%
Other Punctuation | 40 | 0.1%
Dash Punctuation | 18 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5034
 
8.3%
3560
 
5.9%
3533
 
5.8%
3012
 
5.0%
2570
 
4.2%
1957
 
3.2%
1861
 
3.1%
1831
 
3.0%
1752
 
2.9%
1441
 
2.4%
Other values (122) 34025
56.2%
Uppercase Letter
Value | Count | Frequency (%)
O | 1338 | 16.7%
L | 1101 | 13.8%
S | 828 | 10.4%
H | 750 | 9.4%
P | 682 | 8.5%
M | 524 | 6.6%
R | 483 | 6.0%
D | 407 | 5.1%
A | 390 | 4.9%
N | 378 | 4.7%
Other values (8) | 1115 | 13.9%

Decimal Number
Value | Count | Frequency (%)
2 | 215 | 32.8%
3 | 178 | 27.2%
1 | 149 | 22.7%
4 | 44 | 6.7%
5 | 21 | 3.2%
6 | 18 | 2.7%
7 | 17 | 2.6%
8 | 13 | 2.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 235 | 74.8%
p | 62 | 19.7%
h | 17 | 5.4%

Space Separator
Value | Count | Frequency (%)
(space) | 8036 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 949 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 949 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
# | 40 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 18 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 60576 | 76.2%
Common | 10647 | 13.4%
Latin | 8310 | 10.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5034
 
8.3%
3560
 
5.9%
3533
 
5.8%
3012
 
5.0%
2570
 
4.2%
1957
 
3.2%
1861
 
3.1%
1831
 
3.0%
1752
 
2.9%
1441
 
2.4%
Other values (122) 34025
56.2%
Latin
Value | Count | Frequency (%)
O | 1338 | 16.1%
L | 1101 | 13.2%
S | 828 | 10.0%
H | 750 | 9.0%
P | 682 | 8.2%
M | 524 | 6.3%
R | 483 | 5.8%
D | 407 | 4.9%
A | 390 | 4.7%
N | 378 | 4.5%
Other values (11) | 1429 | 17.2%

Common
Value | Count | Frequency (%)
(space) | 8036 | 75.5%
) | 949 | 8.9%
( | 949 | 8.9%
2 | 215 | 2.0%
3 | 178 | 1.7%
1 | 149 | 1.4%
4 | 44 | 0.4%
# | 40 | 0.4%
5 | 21 | 0.2%
6 | 18 | 0.2%
Other values (3) | 48 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 60576 | 76.2%
ASCII | 18957 | 23.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 8036 | 42.4%
O | 1338 | 7.1%
L | 1101 | 5.8%
) | 949 | 5.0%
( | 949 | 5.0%
S | 828 | 4.4%
H | 750 | 4.0%
P | 682 | 3.6%
M | 524 | 2.8%
R | 483 | 2.5%
Other values (24) | 3317 | 17.5%

Hangul
ValueCountFrequency (%)
5034
 
8.3%
3560
 
5.9%
3533
 
5.8%
3012
 
5.0%
2570
 
4.2%
1957
 
3.2%
1861
 
3.1%
1831
 
3.0%
1752
 
2.9%
1441
 
2.4%
Other values (122) 34025
56.2%

계측태그명
Text

MISSING 

Distinct: 397
Distinct (%): 4.2%
Missing: 605
Missing (%): 6.0%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 10
Mean length: 7.3474188
Min length: 5

Characters and Unicode

Total characters: 69029
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
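As a small illustration of these character properties, the sketch below classifies the characters of one tag from this variable's Sample list using Python's standard `unicodedata` module. It shows the kind of lookup involved; it is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# Sample tag taken from the Sample list of this variable.
tag = "DOIA-101A"

# Map each character to its Unicode general category code,
# e.g. "Lu" (Uppercase Letter), "Nd" (Decimal Number), "Pd" (Dash Punctuation).
categories = Counter(unicodedata.category(ch) for ch in tag)

for code, count in sorted(categories.items()):
    print(code, count)
```

The three codes that appear (Lu, Nd, Pd) correspond to the three categories reported for this variable.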

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DOIA-101A
2nd row: WIA-101
3rd row: FI311
4th row: FIQ-402
5th row: FIT-402L
Value | Count | Frequency (%)
lit-302 | 130 | 1.4%
lit-301 | 113 | 1.2%
lit-401 | 107 | 1.1%
lit-101 | 107 | 1.1%
lit-201a | 83 | 0.9%
lit-201b | 80 | 0.9%
fit-404 | 71 | 0.8%
fit-402h | 55 | 0.6%
mlss-204a | 53 | 0.6%
fit-201 | 52 | 0.6%
Other values (387) | 8544 | 90.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 8716 | 12.6%
- | 8113 | 11.8%
1 | 6340 | 9.2%
I | 5939 | 8.6%
T | 5187 | 7.5%
F | 4485 | 6.5%
2 | 4326 | 6.3%
4 | 3459 | 5.0%
L | 2492 | 3.6%
A | 2478 | 3.6%
Other values (25) | 17494 | 25.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 32441 | 47.0%
Decimal Number | 28475 | 41.3%
Dash Punctuation | 8113 | 11.8%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
I | 5939 | 18.3%
T | 5187 | 16.0%
F | 4485 | 13.8%
L | 2492 | 7.7%
A | 2478 | 7.6%
B | 1809 | 5.6%
P | 1573 | 4.8%
R | 1333 | 4.1%
Q | 1279 | 3.9%
O | 1036 | 3.2%
Other values (14) | 4830 | 14.9%

Decimal Number
Value | Count | Frequency (%)
0 | 8716 | 30.6%
1 | 6340 | 22.3%
2 | 4326 | 15.2%
4 | 3459 | 12.1%
3 | 2246 | 7.9%
5 | 1411 | 5.0%
6 | 659 | 2.3%
9 | 581 | 2.0%
8 | 533 | 1.9%
7 | 204 | 0.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 8113 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 36588 | 53.0%
Latin | 32441 | 47.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
I | 5939 | 18.3%
T | 5187 | 16.0%
F | 4485 | 13.8%
L | 2492 | 7.7%
A | 2478 | 7.6%
B | 1809 | 5.6%
P | 1573 | 4.8%
R | 1333 | 4.1%
Q | 1279 | 3.9%
O | 1036 | 3.2%
Other values (14) | 4830 | 14.9%

Common
Value | Count | Frequency (%)
0 | 8716 | 23.8%
- | 8113 | 22.2%
1 | 6340 | 17.3%
2 | 4326 | 11.8%
4 | 3459 | 9.5%
3 | 2246 | 6.1%
5 | 1411 | 3.9%
6 | 659 | 1.8%
9 | 581 | 1.6%
8 | 533 | 1.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 69029 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 8716 | 12.6%
- | 8113 | 11.8%
1 | 6340 | 9.2%
I | 5939 | 8.6%
T | 5187 | 7.5%
F | 4485 | 6.5%
2 | 4326 | 6.3%
4 | 3459 | 5.0%
L | 2492 | 3.6%
A | 2478 | 3.6%
Other values (25) | 17494 | 25.3%

계측단위
Categorical

HIGH CORRELATION 

Distinct: 42
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

<NA>: 1504
(blank): 1387
㎥/H: 921
M: 801
m: 688
Other values (37): 4699

Length

Max length: 5
Median length: 4
Mean length: 2.6025
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ppm
2nd row: ton
3rd row: ㎥/h
4th row: (blank)
5th row: (blank)

Common Values

Value | Count | Frequency (%)
<NA> | 1504 | 15.0%
(blank) | 1387 | 13.9%
㎥/H | 921 | 9.2%
M | 801 | 8.0%
m | 688 | 6.9%
㎥/hr | 654 | 6.5%
㎥/h | 473 | 4.7%
mmHg | 377 | 3.8%
mV | 324 | 3.2%
% | 317 | 3.2%
Other values (32) | 2554 | 25.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
na | 1504 | 15.0%
m | 1489 | 14.9%
㎥/h | 1394 | 13.9%
(blank) | 1387 | 13.9%
㎥/hr | 654 | 6.5%
mv | 448 | 4.5%
mmhg | 377 | 3.8%
(blank) | 317 | 3.2%
ppm | 305 | 3.0%
mg/l | 274 | 2.7%
Other values (22) | 1851 | 18.5%
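
Comparing this table with the Common Values table suggests that it counts case-folded tokens: M (801) and m (688) merge into m (1489), and ㎥/H (921) and ㎥/h (473) merge into ㎥/h (1394). The sketch below reproduces that merge, assuming plain lower-casing is the only normalization involved; the profiler's actual tokenization is not shown in this report.

```python
from collections import Counter

# Counts copied from the Common Values table for 계측단위.
raw_counts = {"㎥/H": 921, "M": 801, "m": 688, "㎥/hr": 654, "㎥/h": 473}

# Assumption: the variants differ only by letter case.
folded = Counter()
for unit, count in raw_counts.items():
    folded[unit.lower()] += count

print(folded["m"], folded["㎥/h"], folded["㎥/hr"])
```

The same observation points to a possible cleaning step: unit spellings such as ㎥/H, ㎥/h, and ㎥/hr may describe the same physical unit and could be unified before analysis.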

계측값
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct: 3862
Distinct (%): 39.8%
Missing: 300
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60210.967
Minimum: -1500
Maximum: 1.7874988 × 10⁸
Zeros: 1830
Zeros (%): 18.3%
Negative: 765
Negative (%): 7.6%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1500
5-th percentile: -256.271
Q1: 0
median: 2.55
Q3: 24.505
95-th percentile: 2734.1265
Maximum: 1.7874988 × 10⁸
Range: 1.7875138 × 10⁸
Interquartile range (IQR): 24.505
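
The range and the interquartile range follow directly from the other quantiles. This is a short sketch using the reported values, with the exact maximum taken from the largest entry in the extreme-values listing further down.

```python
minimum = -1500.0
maximum = 178_749_875.0   # largest value listed for this variable
q1, q3 = 0.0, 24.505

# Range spans the full extent of the data; IQR spans the middle 50%.
value_range = maximum - minimum
iqr = q3 - q1

print(f"Range: {value_range:.7e}")
print(f"IQR: {iqr}")
```

The contrast between the two (about 1.8 × 10⁸ versus 24.505) shows how strongly a few extreme readings dominate the spread.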

Descriptive statistics

Standard deviation: 3143729.9
Coefficient of variation (CV): 52.211915
Kurtosis: 3227.3437
Mean: 60210.967
Median Absolute Deviation (MAD): 2.55
Skewness: 56.8099
Sum: 5.8404638 × 10⁸
Variance: 9.8830376 × 10¹²
Monotonicity: Not monotonic
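
Several of these statistics are tied together by simple identities, which makes a useful consistency check. The sketch below uses only reported values and assumes the sum is taken over the 9700 non-missing observations.

```python
mean = 60210.967
std = 3143729.9
n_observations = 10_000
n_missing = 300

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# Assumption: the sum covers the non-missing values only.
total = mean * (n_observations - n_missing)

print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.4e}")
print(f"Sum: {total:.4e}")
```

All three agree with the reported figures to within rounding: CV about 52.21, variance about 9.883 × 10¹², and sum about 5.840 × 10⁸.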
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.0 | 1830 | 18.3%
0.01 | 76 | 0.8%
1.0 | 51 | 0.5%
4.8 | 48 | 0.5%
4.7 | 48 | 0.5%
2.5 | 47 | 0.5%
0.5 | 47 | 0.5%
3.2 | 44 | 0.4%
0.02 | 42 | 0.4%
0.04 | 39 | 0.4%
Other values (3852) | 7428 | 74.3%
(Missing) | 300 | 3.0%

Value | Count | Frequency (%)
-1500.0 | 4 | < 0.1%
-855.96 | 1 | < 0.1%
-700.0 | 3 | < 0.1%
-675.67 | 1 | < 0.1%
-672.34 | 1 | < 0.1%
-671.31 | 1 | < 0.1%
-670.82 | 1 | < 0.1%
-665.49 | 1 | < 0.1%
-662.67 | 1 | < 0.1%
-656.82 | 1 | < 0.1%

Value | Count | Frequency (%)
178749875.0 | 1 | < 0.1%
178749348.5 | 1 | < 0.1%
178740852.08 | 1 | < 0.1%
1110303.0 | 23 | 0.2%
712008.12 | 1 | < 0.1%
709004.38 | 1 | < 0.1%
690174.96 | 1 | < 0.1%
672942.75 | 1 | < 0.1%
671262.17 | 1 | < 0.1%
656615.96 | 1 | < 0.1%

Interactions


Correlations

 | 하수처리장구분명 | 계측단위 | 계측값
하수처리장구분명 | 1.000 | 0.898 | 0.000
계측단위 | 0.898 | 1.000 | 0.000
계측값 | 0.000 | 0.000 | 1.000

 | 하수처리장구분명 | 계측단위
하수처리장구분명 | 1.000 | 0.616
계측단위 | 0.616 | 1.000

 | 계측값 | 하수처리장구분명 | 계측단위
계측값 | 1.000 | 0.000 | 0.000
하수처리장구분명 | 0.000 | 1.000 | 0.616
계측단위 | 0.000 | 0.616 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
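
A minimal sketch of the idea behind nullity correlation follows: each column is reduced to a present/missing indicator, and the indicators are compared with the Pearson correlation. The toy columns are hypothetical, and it is an assumption that the heatmap is computed in exactly this way.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

# Hypothetical toy columns; None marks a missing value.
tag = ["LIT-302", None, "FIT-404", None, "LIT-101", "FIT-201"]
value = [1.2, None, 0.0, None, 4.8, None]

# 1 where a value is present, 0 where it is missing.
tag_present = [0 if v is None else 1 for v in tag]
value_present = [0 if v is None else 1 for v in value]

r = pearson(tag_present, value_present)
print(round(r, 3))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.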

Sample

 | 기준연월일 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
22654 | 2018-02-17 | 진영맑은물사업소 | 포기조 용존산소량 | DOIA-101A | ppm | 0.09
15757 | 2018-02-03 | 진영맑은물사업소 | 케잌호퍼 중량 | WIA-101 | ton | 4.6
69334 | 2018-05-30 | 장유 하수처리장 | 2지슬러지량 | FI311 | ㎥/h | 332.9
89362 | 2018-07-08 | 안하 하수처리장 | 구연산 공급펌프 토출유량 적산 | FIQ-402 |  | <NA>
45641 | 2018-04-05 | 진영맑은물사업소 HANT반응조 | 여과막 토출량계 | FIT-402L |  | 28.21
81157 | 2018-06-22 | 안하 하수처리장 | 용존산소저감조 차압 | DPI-204A | mbar | -0.34
8195 | 2018-01-18 | 진영맑은물사업소 HANT반응조 | NaOCL 수위계 | LIT-901E | m | 0.45
7235 | 2018-01-16 | 진영맑은물사업소 HANT반응조 | 반응조 수위 | LIA-401A | m | 5.0
62064 | 2018-05-16 | 안하 하수처리장 | 용존산소 저감조 수위계 | LIT-205B | m | 3.0
48901 | 2018-04-14 | (증설)진례 하수처리장 | 반송펌프 | P-206A | Hz | 46.78

 | 기준연월일 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
53199 | 2018-04-24 | 진영맑은물사업소 | 한트반응조유입유량 | FRQ-101 |  | 614.29
51897 | 2018-04-21 | 진례 하수처리장 | 방류하수량 | FT-302 | ㎥/hr | 0.0
37046 | 2018-03-18 | 진영맑은물사업소 HANT반응조 | 호기조 DO계 | DO-401B | mg/ℓ | 0.17
11533 | 2018-01-25 | 진영맑은물사업소 | 슬러지 저장조액위 | LIA-201 | m | 1.73
76738 | 2018-06-13 | 진영맑은물사업소 HANT반응조 | ALUM주입량계 | FIT-901C |  | 0.0
42053 | 2018-03-29 | 상동 공공하수처리시설 | 반응조 수위설정(LO) | LIT-201B | <NA> | 3.2
49595 | 2018-04-15 | 진영맑은물사업소 HANT반응조 | 여과막 토출량계 | FIT-402E |  | 0.0
78296 | 2018-06-16 | 진영맑은물사업소 HANT반응조 | NaOH주입량계 | FIT-901D | ℓ/min | 0.0
78702 | 2018-06-17 | 장유 하수처리장 | 탈질조 | ORP2301A | mV | -1500.0
70515 | 2018-06-01 | 진영맑은물사업소 | 초침슬러지 반송유량 | FRQ-103A |  | 0.0