Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 713
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B

Variable types

Categorical: 3
Text: 2
Numeric: 1

Dataset

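Description: A service for querying monthly measurement readings for each sewage treatment facility of the Gimhae Urban Development Corporation (김해도시개발공사). It provides fields such as the reference year-month (기준연월), sewage treatment plant name (하수처리장구분명), measurement name (계측구분명), and measured value (계측값).
Author: 김해시도시개발공사 (Gimhae City Urban Development Corporation)
URL: https://www.data.go.kr/data/15096569/fileData.do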

Alerts

하수처리장구분명 is highly overall correlated with 계측단위 (High correlation)
계측단위 is highly overall correlated with 하수처리장구분명 (High correlation)
계측태그명 has 627 (6.3%) missing values (Missing)
계측값 is highly skewed (γ1 = 27.45932954) (Skewed)
계측값 has 1322 (13.2%) zeros (Zeros)
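
The skewness alert can be illustrated with a small sketch. It computes the moment-based skewness g1 = m3 / m2^1.5 on hypothetical readings (not rows from this dataset) to show how a single very large value pulls the statistic far above zero. The profiler's own estimator may apply a bias correction, so this is not expected to reproduce 27.459 exactly.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical readings: mostly small values plus one extreme value
readings = [0.0, 0.25, 3.5, 26.0, 3100.0, 250_000_000.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(sample_skewness(readings))   # strongly positive
print(sample_skewness(symmetric))  # 0.0 for a symmetric sample
```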

Reproduction

Analysis started: 2023-12-12 20:20:33.891537
Analysis finished: 2023-12-12 20:20:34.936493
Duration: 1.04 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기준연월
Categorical

Distinct: 45
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
2021-02 | 278
2021-01 | 274
2020-11 | 273
2020-07 | 262
2020-10 | 262
Other values (40) | 8651

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-03
2nd row: 2018-02
3rd row: 2021-05
4th row: 2019-07
5th row: 2020-02

Common Values

Value | Count | Frequency (%)
2021-02 | 278 | 2.8%
2021-01 | 274 | 2.7%
2020-11 | 273 | 2.7%
2020-07 | 262 | 2.6%
2020-10 | 262 | 2.6%
2021-03 | 261 | 2.6%
2019-07 | 255 | 2.5%
2020-12 | 253 | 2.5%
2020-08 | 253 | 2.5%
2021-08 | 252 | 2.5%
Other values (35) | 7377 | 73.8%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
2021-02 | 278 | 2.8%
2021-01 | 274 | 2.7%
2020-11 | 273 | 2.7%
2020-07 | 262 | 2.6%
2020-10 | 262 | 2.6%
2021-03 | 261 | 2.6%
2019-07 | 255 | 2.5%
2020-12 | 253 | 2.5%
2020-08 | 253 | 2.5%
2020-09 | 252 | 2.5%
Other values (35) | 7377 | 73.8%

하수처리장구분명
Categorical

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
진영맑은물사업소 HANT반응조 | 1933
장유 하수처리장 | 1822
(증설)진례 하수처리장 | 1152
진영맑은물사업소 | 1131
상동 공공하수처리시설 | 808
Other values (19) | 3154

Length

Max length: 16
Median length: 15
Mean length: 10.9258
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상동 공공하수처리시설
2nd row: 안하 하수처리장
3rd row: (증설)상동 공공하수처리시설
4th row: 진영맑은물사업소 HANT반응조
5th row: 진영맑은물사업소 HANT반응조

Common Values

Value | Count | Frequency (%)
진영맑은물사업소 HANT반응조 | 1933 | 19.3%
장유 하수처리장 | 1822 | 18.2%
(증설)진례 하수처리장 | 1152 | 11.5%
진영맑은물사업소 | 1131 | 11.3%
상동 공공하수처리시설 | 808 | 8.1%
안하 하수처리장 | 483 | 4.8%
(증설)한림 하수처리장 | 438 | 4.4%
진례 하수처리장 | 426 | 4.3%
대동 공공하수처리시설 | 346 | 3.5%
생림 하수처리장 | 343 | 3.4%
Other values (14) | 1118 | 11.2%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
하수처리장 | 5379 | 28.5%
진영맑은물사업소 | 3064 | 16.2%
hant반응조 | 1933 | 10.2%
장유 | 1822 | 9.7%
공공하수처리시설 | 1458 | 7.7%
증설)진례 | 1152 | 6.1%
상동 | 808 | 4.3%
안하 | 483 | 2.6%
증설)한림 | 438 | 2.3%
진례 | 426 | 2.3%
Other values (18) | 1906 | 10.1%

계측구분명
Text

Distinct: 389
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 19
Mean length: 7.6949
Min length: 2

Characters and Unicode

Total characters: 76949
Distinct characters: 188
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
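
As a rough illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The input string is the first sample row listed in this section. The helper name `category_counts` is an assumption for illustration and is not part of the profiling tool.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo' = Other Letter)."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "처리수조 수위설정(LO)"  # first sample row of this text variable
counts = category_counts(sample)

# Hangul syllables are 'Lo', the space is 'Zs', brackets are 'Ps'/'Pe',
# and the Latin capitals are 'Lu'.
for category, n in counts.most_common():
    print(category, n)
```

The two-letter codes correspond to the category names used in the tables of this section: Lo is Other Letter, Lu is Uppercase Letter, Zs is Space Separator, Ps is Open Punctuation, and Pe is Close Punctuation.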

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 처리수조 수위설정(LO)
2nd row: 혐기조 ORP
3rd row: 유량조정조 수위설정(H.H.HI)
4th row: 반응조 수위
5th row: 슬러지 저류조 수위

Value | Count | Frequency (%)
수위 | 1019 | 5.9%
여과막 | 682 | 3.9%
토출량계 | 682 | 3.9%
반응조 | 509 | 2.9%
슬러지 | 322 | 1.9%
유량 | 311 | 1.8%
흡입 | 253 | 1.5%
압력계 | 253 | 1.5%
분리막 | 253 | 1.5%
유량조정조 | 247 | 1.4%
Other values (330) | 12807 | 73.9%

Most occurring characters

ValueCountFrequency (%)
7338
 
9.5%
4764
 
6.2%
3787
 
4.9%
3467
 
4.5%
2941
 
3.8%
2275
 
3.0%
1982
 
2.6%
1971
 
2.6%
1952
 
2.5%
1683
 
2.2%
Other values (178) 44789
58.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 57532 | 74.8%
Uppercase Letter | 8471 | 11.0%
Space Separator | 7338 | 9.5%
Decimal Number | 1069 | 1.4%
Close Punctuation | 1018 | 1.3%
Open Punctuation | 1018 | 1.3%
Lowercase Letter | 335 | 0.4%
Other Punctuation | 147 | 0.2%
Dash Punctuation | 15 | < 0.1%
Other Symbol | 6 | < 0.1%
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4764
 
8.3%
3787
 
6.6%
3467
 
6.0%
2941
 
5.1%
2275
 
4.0%
1982
 
3.4%
1971
 
3.4%
1952
 
3.4%
1683
 
2.9%
1336
 
2.3%
Other values (138) 31374
54.5%
Uppercase Letter
ValueCountFrequency (%)
O 1321
15.6%
L 1052
12.4%
S 892
10.5%
H 837
9.9%
P 742
8.8%
M 607
7.2%
D 502
 
5.9%
A 449
 
5.3%
R 426
 
5.0%
N 308
 
3.6%
Other values (9) 1335
15.8%
Decimal Number
ValueCountFrequency (%)
2 347
32.5%
1 296
27.7%
3 261
24.4%
4 92
 
8.6%
6 24
 
2.2%
8 19
 
1.8%
7 16
 
1.5%
5 14
 
1.3%
Lowercase Letter
ValueCountFrequency (%)
a 188
56.1%
p 106
31.6%
h 34
 
10.1%
l 7
 
2.1%
Other Punctuation
ValueCountFrequency (%)
. 69
46.9%
# 40
27.2%
% 32
21.8%
/ 6
 
4.1%
Space Separator
ValueCountFrequency (%)
7338
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1018
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1018
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15
100.0%
Other Symbol
ValueCountFrequency (%)
6
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 57532 | 74.8%
Common | 10611 | 13.8%
Latin | 8806 | 11.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4764
 
8.3%
3787
 
6.6%
3467
 
6.0%
2941
 
5.1%
2275
 
4.0%
1982
 
3.4%
1971
 
3.4%
1952
 
3.4%
1683
 
2.9%
1336
 
2.3%
Other values (138) 31374
54.5%
Latin
ValueCountFrequency (%)
O 1321
15.0%
L 1052
11.9%
S 892
10.1%
H 837
9.5%
P 742
8.4%
M 607
 
6.9%
D 502
 
5.7%
A 449
 
5.1%
R 426
 
4.8%
N 308
 
3.5%
Other values (13) 1670
19.0%
Common
ValueCountFrequency (%)
7338
69.2%
) 1018
 
9.6%
( 1018
 
9.6%
2 347
 
3.3%
1 296
 
2.8%
3 261
 
2.5%
4 92
 
0.9%
. 69
 
0.7%
# 40
 
0.4%
% 32
 
0.3%
Other values (7) 100
 
0.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 57532 | 74.8%
ASCII | 19411 | 25.2%
CJK Compat | 6 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7338
37.8%
O 1321
 
6.8%
L 1052
 
5.4%
) 1018
 
5.2%
( 1018
 
5.2%
S 892
 
4.6%
H 837
 
4.3%
P 742
 
3.8%
M 607
 
3.1%
D 502
 
2.6%
Other values (29) 4084
21.0%
Hangul
ValueCountFrequency (%)
4764
 
8.3%
3787
 
6.6%
3467
 
6.0%
2941
 
5.1%
2275
 
4.0%
1982
 
3.4%
1971
 
3.4%
1952
 
3.4%
1683
 
2.9%
1336
 
2.3%
Other values (138) 31374
54.5%
CJK Compat
ValueCountFrequency (%)
6
100.0%

계측태그명
Text

MISSING 

Distinct: 480
Distinct (%): 5.1%
Missing: 627
Missing (%): 6.3%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 10
Mean length: 7.1176784
Min length: 2

Characters and Unicode

Total characters: 66714
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row: LIT-301
2nd row: ORP-201B
3rd row: LIT-102
4th row: LIA-401D
5th row: LIT-404A

Value | Count | Frequency (%)
lit-401 | 143 | 1.5%
lit-301 | 116 | 1.2%
fit-201 | 112 | 1.2%
lit-302 | 95 | 1.0%
lit-101 | 88 | 0.9%
lit-201a | 70 | 0.7%
lit-201b | 69 | 0.7%
fit-501 | 59 | 0.6%
lt-301 | 56 | 0.6%
fit-401 | 53 | 0.6%
Other values (469) | 8512 | 90.8%

Most occurring characters

ValueCountFrequency (%)
0 8370
12.5%
- 7415
11.1%
1 6300
 
9.4%
I 5946
 
8.9%
T 5078
 
7.6%
2 4231
 
6.3%
F 4136
 
6.2%
4 2856
 
4.3%
3 2630
 
3.9%
L 2528
 
3.8%
Other values (29) 17224
25.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 31523 | 47.3%
Decimal Number | 27686 | 41.5%
Dash Punctuation | 7415 | 11.1%
Other Letter | 48 | 0.1%
Lowercase Letter | 42 | 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I 5946
18.9%
T 5078
16.1%
F 4136
13.1%
L 2528
8.0%
A 2319
 
7.4%
B 1637
 
5.2%
P 1555
 
4.9%
R 1183
 
3.8%
O 1103
 
3.5%
S 1068
 
3.4%
Other values (15) 4970
15.8%
Decimal Number
ValueCountFrequency (%)
0 8370
30.2%
1 6300
22.8%
2 4231
15.3%
4 2856
 
10.3%
3 2630
 
9.5%
5 1337
 
4.8%
6 675
 
2.4%
8 585
 
2.1%
9 497
 
1.8%
7 205
 
0.7%
Other Letter
ValueCountFrequency (%)
24
50.0%
24
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 7415
100.0%
Lowercase Letter
ValueCountFrequency (%)
p 42
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 35101 | 52.6%
Latin | 31565 | 47.3%
Hangul | 48 | 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 5946
18.8%
T 5078
16.1%
F 4136
13.1%
L 2528
8.0%
A 2319
 
7.3%
B 1637
 
5.2%
P 1555
 
4.9%
R 1183
 
3.7%
O 1103
 
3.5%
S 1068
 
3.4%
Other values (16) 5012
15.9%
Common
ValueCountFrequency (%)
0 8370
23.8%
- 7415
21.1%
1 6300
17.9%
2 4231
12.1%
4 2856
 
8.1%
3 2630
 
7.5%
5 1337
 
3.8%
6 675
 
1.9%
8 585
 
1.7%
9 497
 
1.4%
Hangul
ValueCountFrequency (%)
24
50.0%
24
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 66666 | 99.9%
Hangul | 48 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 8370
12.6%
- 7415
11.1%
1 6300
 
9.5%
I 5946
 
8.9%
T 5078
 
7.6%
2 4231
 
6.3%
F 4136
 
6.2%
4 2856
 
4.3%
3 2630
 
3.9%
L 2528
 
3.8%
Other values (27) 17176
25.8%
Hangul
ValueCountFrequency (%)
24
50.0%
24
50.0%

계측단위
Categorical

HIGH CORRELATION 

Distinct: 43
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 2393
(blank) | 1229
M | 805
㎥/H | 694
m | 529
Other values (38) | 4350

Length

Max length: 5
Median length: 4
Mean length: 2.7354
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: mV
3rd row: <NA>
4th row: m
5th row: m

Common Values

Value | Count | Frequency (%)
<NA> | 2393 | 23.9%
(blank) | 1229 | 12.3%
M | 805 | 8.1%
㎥/H | 694 | 6.9%
m | 529 | 5.3%
㎥/h | 529 | 5.3%
㎥/hr | 437 | 4.4%
% | 324 | 3.2%
mmHg | 289 | 2.9%
mV | 252 | 2.5%
Other values (33) | 2519 | 25.2%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
na | 2393 | 23.9%
m | 1334 | 13.3%
(blank) | 1229 | 12.3%
㎥/h | 1223 | 12.2%
㎥/hr | 437 | 4.4%
ppm | 382 | 3.8%
mv | 364 | 3.6%
(blank) | 324 | 3.2%
mmhg | 289 | 2.9%
mg/l | 243 | 2.4%
Other values (23) | 1782 | 17.8%
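
The lower-cased table above lists m at 1334 and ㎥/h at 1223, which equal the sums of the case variants shown under Common Values (M 805 + m 529, and ㎥/H 694 + ㎥/h 529). The sketch below reproduces that merge with plain case folding. Whether M and m really denote the same unit in the source data is an assumption, and `merge_case_variants` is a hypothetical helper rather than anything the report provides.

```python
from collections import Counter

# Counts copied from the Common Values table for 계측단위 (top entries only)
unit_counts = {"M": 805, "㎥/H": 694, "m": 529, "㎥/h": 529, "㎥/hr": 437}


def merge_case_variants(counts):
    """Sum counts of labels that differ only by letter case."""
    merged = Counter()
    for unit, n in counts.items():
        merged[unit.lower()] += n
    return merged


merged = merge_case_variants(unit_counts)
print(merged["m"])     # 1334
print(merged["㎥/h"])   # 1223
print(merged["㎥/hr"])  # 437, left separate because the spelling differs
```

Case folding alone does not unify ㎥/h with ㎥/hr, so a real clean-up would likely need an explicit mapping table on top of this.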

계측값
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 4440
Distinct (%): 44.8%
Missing: 86
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 278690.76
Minimum: -2000
Maximum: 2.5483554 × 10^8
Zeros: 1322
Zeros (%): 13.2%
Negative: 704
Negative (%): 7.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2000
5-th percentile: -196.2755
Q1: 0.25
median: 3.5
Q3: 26.135
95-th percentile: 3101.284
Maximum: 2.5483554 × 10^8
Range: 2.5483754 × 10^8
Interquartile range (IQR): 25.885
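
The range and IQR follow directly from the quantiles listed above. The short check below recomputes them and also derives the 1.5 × IQR fences, a common outlier convention that is an addition here and is not something the report itself applies.

```python
# Values copied from the quantile statistics above
minimum, maximum = -2000.0, 2.5483554e8
q1, q3 = 0.25, 26.135
p05, p95 = -196.2755, 3101.284

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

# Conventional 1.5 * IQR fences (an assumption, not used by the report)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(round(iqr, 3))           # 25.885
print(value_range)             # 254837540.0
print(p05 < lower_fence, p95 > upper_fence)
```

Both the 5th and 95th percentiles fall outside these fences, which is consistent with the heavy tails flagged by the skewness alert.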

Descriptive statistics

Standard deviation: 5768213.4
Coefficient of variation (CV): 20.697541
Kurtosis: 856.401
Mean: 278690.76
Median Absolute Deviation (MAD): 3.5
Skewness: 27.45933
Sum: 2.7629402 × 10^9
Variance: 3.3272286 × 10^13
Monotonicity: Not monotonic
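
Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of non-missing values. The sketch below checks them using the rounded values printed in this report, so agreement is only expected to within rounding.

```python
import math

# Values copied from this report
mean = 278690.76
std = 5768213.4
n_non_missing = 10000 - 86  # observations minus missing 계측값 entries

cv = std / mean                # coefficient of variation
variance = std ** 2
total = mean * n_non_missing   # sum of all non-missing values

print(cv)        # about 20.6975
print(variance)  # about 3.3272e13
print(total)     # about 2.7629e9
```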
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
0.0 | 1322 | 13.2%
0.01 | 61 | 0.6%
0.7 | 44 | 0.4%
4.8 | 38 | 0.4%
0.8 | 37 | 0.4%
0.9 | 36 | 0.4%
0.02 | 34 | 0.3%
2.5 | 32 | 0.3%
0.03 | 28 | 0.3%
3.2 | 28 | 0.3%
Other values (4430) | 8254 | 82.5%
(Missing) | 86 | 0.9%

Value | Count | Frequency (%)
-2000.0 | 7 | 0.1%
-1500.0 | 14 | 0.1%
-1000.0 | 1 | < 0.1%
-837.44 | 1 | < 0.1%
-700.0 | 4 | < 0.1%
-698.59 | 1 | < 0.1%
-698.44 | 1 | < 0.1%
-693.19 | 1 | < 0.1%
-684.02 | 1 | < 0.1%
-658.43 | 1 | < 0.1%

Value | Count | Frequency (%)
254835540.95 | 1 | < 0.1%
184937691.6 | 1 | < 0.1%
167411359.61 | 1 | < 0.1%
161061273.63 | 1 | < 0.1%
161061273.61 | 1 | < 0.1%
138387096.85 | 1 | < 0.1%
138091849.1 | 1 | < 0.1%
132774526.59 | 1 | < 0.1%
125269879.51 | 1 | < 0.1%
119277051.2 | 1 | < 0.1%

Interactions

[Figure: interactions plot, image not preserved]

Correlations

[Figure: correlation heatmap]

 | 기준연월 | 하수처리장구분명 | 계측단위 | 계측값
기준연월 | 1.000 | 0.414 | 0.000 | 0.000
하수처리장구분명 | 0.414 | 1.000 | 0.888 | 0.000
계측단위 | 0.000 | 0.888 | 1.000 | 0.000
계측값 | 0.000 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

 | 계측단위 | 하수처리장구분명 | 기준연월
계측단위 | 1.000 | 0.517 | 0.000
하수처리장구분명 | 0.517 | 1.000 | 0.105
기준연월 | 0.000 | 0.105 | 1.000

[Figure: correlation heatmap]

 | 계측값 | 기준연월 | 하수처리장구분명 | 계측단위
계측값 | 1.000 | 0.000 | 0.000 | 0.000
기준연월 | 0.000 | 1.000 | 0.105 | 0.000
하수처리장구분명 | 0.000 | 0.105 | 1.000 | 0.517
계측단위 | 0.000 | 0.000 | 0.517 | 1.000

Missing values

[Figure: count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
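
One way to read nullity correlation is as the Pearson correlation between per-column "is missing" indicators. The sketch below computes it for hypothetical columns (the values are invented for illustration and are not rows from this dataset). Missing entries are represented as `None`, which is also an assumption of this sketch.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns: tag and unit are missing together, value is not
tag = ["LIT-301", None, "LIT-102", None, "FIT-401", "LT-301"]
unit = ["m", None, "m", None, "㎥/h", "m"]
value = [0.9, 1.01, None, 3.0, 46.83, 1.52]

print(nullity_correlation(tag, unit))   # missing together: correlation near 1
print(nullity_correlation(tag, value))  # missing in different rows: negative
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one is present when the other is absent, and a value near 0 means the missingness patterns are unrelated.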

Sample

(index) | 기준연월 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
22245 | 2021-03 | 상동 공공하수처리시설 | 처리수조 수위설정(LO) | LIT-301 | <NA> | 0.9
689 | 2018-02 | 안하 하수처리장 | 혐기조 ORP | ORP-201B | mV | 30.65
23305 | 2021-05 | (증설)상동 공공하수처리시설 | 유량조정조 수위설정(H.H.HI) | LIT-102 | <NA> | 4.5
9905 | 2019-07 | 진영맑은물사업소 HANT반응조 | 반응조 수위 | LIA-401D | m | 5.02
14048 | 2020-02 | 진영맑은물사업소 HANT반응조 | 슬러지 저류조 수위 | LIT-404A | m | 3.19
24873 | 2021-07 | 장유 하수처리장 | 2지공기량 | FI302 | N㎥/h | 2756.41
4845 | 2018-10 | 장유 하수처리장 | 2차 침전지 잉여슬러지량 | FI401 | ㎥/h | 38.06
24577 | 2021-07 | (증설)상동 공공하수처리시설 | 방류유량 | FIT-401 | ㎥/h | 46.83
14802 | 2020-04 | (증설)한림 하수처리장 | 반송유량 | FT301A | l/min | 6.16
9773 | 2019-07 | 장유 하수처리장 | 호기성 산화조 | LI816 | M | 11.38

(index) | 기준연월 | 하수처리장구분명 | 계측구분명 | 계측태그명 | 계측단위 | 계측값
18587 | 2020-10 | (증설)생림 하수처리장 | 반응조B(온도) | <NA> | <NA> | 21.33
24381 | 2021-06 | 진영맑은물사업소 | 잉여슬러지반송 유량 | FRQ-107 | ㎥/H | 7.2
9420 | 2019-06 | 진영맑은물사업소 HANT반응조 | 호기조 DO계 | DO-401D | mg/ℓ | 0.98
6236 | 2019-01 | 낙산마을 하수처리장 | 중계펌프장1 수위 | <NA> | <NA> | 1.01
19231 | 2020-11 | (증설)상동 공공하수처리시설 | 장내용수공급수조 수위설정(H.HI) | LIT-401 | <NA> | 3.0
8540 | 2019-05 | 장유 하수처리장 | 1지MLSS | MLSS334 | PPM | 3002.97
15509 | 2020-05 | 상동 공공하수처리시설 | 유량조정조교반기 운전설정(RESET LL) | <NA> | <NA> | 0.0
8734 | 2019-05 | 진영맑은물사업소 | 폴리머 용해조액위 | LIA-202 | m | 1.29
4654 | 2018-10 | (증설)진례 하수처리장 | 유입하수량 | FIT-115 | ㎥/hr | 183.76
2374 | 2018-05 | 진례 하수처리장 | 처리수조 수위 | LT-301 | m | 1.52