Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 8
Missing cells (%): < 0.1%
Duplicate rows: 21
Duplicate rows (%): 0.2%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
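
The percentage and per-record figures above follow from the raw counts. The short check below is a sketch only: the variable names are illustrative, and reading "Missing cells (%)" as a share of all cells (rows times columns) is an assumption.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 6
n_observations = 10_000
missing_cells = 8
duplicate_rows = 21
total_size_kib = 556.6

# Assumption: "Missing cells (%)" is relative to all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)
duplicate_pct = 100 * duplicate_rows / n_observations
# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.3f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under that assumption the missing-cell share is well below 0.1%, which agrees with the "< 0.1%" shown in the report.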

Variable types

Text: 2
Categorical: 2
DateTime: 1
Numeric: 1

Dataset

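Description: Measured values and alert history recorded when readings exceeded the thresholds set independently by each monitoring station, collected by easily relocatable mobile measuring instruments installed at sites where water pollution accidents are anticipated or have occurred.
Author: 한국환경공단 (Korea Environment Corporation)
URL: https://www.data.go.kr/data/15065132/fileData.do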

Alerts

Dataset has 21 (0.2%) duplicate rows [Duplicates]
항목명 (item name) is highly overall correlated with 항목코드 (item code) [High correlation]
항목코드 (item code) is highly overall correlated with 항목명 (item name) [High correlation]
항목코드 (item code) is highly imbalanced (62.4%) [Imbalance]
항목명 (item name) is highly imbalanced (62.4%) [Imbalance]
측정값 (measured value) has 1804 (18.0%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 11:21:17.759963
Analysis finished: 2023-12-12 11:21:19.109689
Duration: 1.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
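
The reported duration can be recovered from the two timestamps. A minimal sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 11:21:17.759963")
finished = datetime.fromisoformat("2023-12-12 11:21:19.109689")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```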

Variables

측정소명 (monitoring station name)
Text

Distinct: 55
Distinct (%): 0.6%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 3
Mean length: 2.9539954
Min length: 2
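
The mean length is consistent with the "Total characters" figure reported for this variable (29537) divided by the number of non-missing values (10000 observations minus 1 missing). A short check, assuming that is how the figure is derived:

```python
total_characters = 29537   # "Total characters" reported for this variable
n_observations = 10_000
n_missing = 1

# Mean string length over the non-missing values only.
mean_length = total_characters / (n_observations - n_missing)
print(round(mean_length, 7))
```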

Characters and Unicode

Total characters: 29537
Distinct characters: 81
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
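
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two station names from this report; the helper name `category_counts` is hypothetical. The codes `Lo`, `Nd` and `Pd` correspond to the category names Other Letter, Decimal Number and Dash Punctuation used in this report.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two station names taken from this report's frequency table.
sample = ["석남천", "호남예비3"]
counts = category_counts(sample)
print(counts)
```

Scripts and blocks are not available from `unicodedata`, so this sketch covers categories only.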

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row: 석남천
2nd row: 석남천
3rd row: 석남천
4th row: 여주침사지
5th row: 석남천

Value              Count  Frequency (%)
석남천             8345   83.5%
김해               298    3.0%
구미               187    1.9%
성주               168    1.7%
호남예비3          143    1.4%
한강1              136    1.4%
지석천             109    1.1%
황룡강             104    1.0%
왕숙천             103    1.0%
고령               52     0.5%
Other values (45)  354    3.5%

Most occurring characters

Value              Count  Frequency (%)
[?]                8633   29.2%
[?]                8535   28.9%
[?]                8454   28.6%
[?]                299    1.0%
[?]                298    1.0%
[?]                249    0.8%
[?]                208    0.7%
[?]                198    0.7%
1                  195    0.7%
[?]                190    0.6%
Other values (71)  2278   7.7%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      29081  98.5%
Decimal Number    454    1.5%
Dash Punctuation  2      < 0.1%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
[?]                8633   29.7%
[?]                8535   29.3%
[?]                8454   29.1%
[?]                299    1.0%
[?]                298    1.0%
[?]                249    0.9%
[?]                208    0.7%
[?]                198    0.7%
[?]                190    0.7%
[?]                178    0.6%
Other values (64)  1839   6.3%

Decimal Number

Value  Count  Frequency (%)
1      195    43.0%
3      158    34.8%
2      43     9.5%
0      28     6.2%
8      28     6.2%
4      2      0.4%

Dash Punctuation

Value  Count  Frequency (%)
-      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  29081  98.5%
Common  456    1.5%

Most frequent character per script

Hangul

Value              Count  Frequency (%)
[?]                8633   29.7%
[?]                8535   29.3%
[?]                8454   29.1%
[?]                299    1.0%
[?]                298    1.0%
[?]                249    0.9%
[?]                208    0.7%
[?]                198    0.7%
[?]                190    0.7%
[?]                178    0.6%
Other values (64)  1839   6.3%

Common

Value  Count  Frequency (%)
1      195    42.8%
3      158    34.6%
2      43     9.4%
0      28     6.1%
8      28     6.1%
4      2      0.4%
-      2      0.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  29081  98.5%
ASCII   456    1.5%

Most frequent character per block

Hangul

Value              Count  Frequency (%)
[?]                8633   29.7%
[?]                8535   29.3%
[?]                8454   29.1%
[?]                299    1.0%
[?]                298    1.0%
[?]                249    0.9%
[?]                208    0.7%
[?]                198    0.7%
[?]                190    0.7%
[?]                178    0.6%
Other values (64)  1839   6.3%

ASCII

Value  Count  Frequency (%)
1      195    42.8%
3      158    34.6%
2      43     9.4%
0      28     6.1%
8      28     6.1%
4      2      0.4%
-      2      0.4%

항목코드 (item code)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.9986
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DOW00
2nd row: DOW00
3rd row: DOW00
4th row: ETC
5th row: DOW00

Common Values

Value  Count  Frequency (%)
DOW00  7914   79.1%
PHY00  2073   20.7%
ETC    7      0.1%
CON00  6      0.1%
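
The 62.4% imbalance figure flagged for this variable can be reproduced from the four counts above with a normalized-entropy score: one minus the Shannon entropy of the class distribution divided by its maximum possible value. Whether the profiling tool computes it exactly this way is an assumption; the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the
    class distribution and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Not meaningful for fewer than two classes; 0.0 is a convention here.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts from the Common Values table: DOW00, PHY00, ETC, CON00.
score = imbalance_score([7914, 2073, 7, 6])
print(f"{score:.1%}")
```

A perfectly balanced four-class variable would score 0 under this definition, for example `imbalance_score([25, 25, 25, 25])`.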

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value  Count  Frequency (%)
dow00  7914   79.1%
phy00  2073   20.7%
etc    7      0.1%
con00  6      0.1%

항목명 (item name)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 4
Mean length: 3.5846
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 용존산소
2nd row: 용존산소
3rd row: 용존산소
4th row: 기타
5th row: 용존산소

Common Values

Value                                 Count  Frequency (%)
용존산소 (dissolved oxygen)           7914   79.1%
pH                                    2073   20.7%
기타 (other)                          7      0.1%
전기전도도 (electrical conductivity)  6      0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value       Count  Frequency (%)
용존산소    7914   79.1%
ph          2073   20.7%
기타        7      0.1%
전기전도도  6      0.1%

측정일시 (measurement date and time)
Text

Distinct: 9880
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 190000
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9769
Unique (%): 97.7%

Sample

1st row: 2021-11-26 10:20:00
2nd row: 2022-04-24 03:50:00
3rd row: 2021-05-13 09:50:00
4th row: 2013-04-26 16:46:00
5th row: 2021-05-19 03:50:00

Value                Count  Frequency (%)
05:50:00             132    0.7%
07:20:00             120    0.6%
01:20:00             118    0.6%
06:20:00             117    0.6%
09:20:00             115    0.6%
02:50:00             115    0.6%
05:20:00             114    0.6%
00:50:00             114    0.6%
04:20:00             114    0.6%
03:20:00             113    0.6%
Other values (1141)  18828  94.1%
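
The counts in this table sum to 20000 (1172 for the ten listed values plus 18828 for the rest), twice the number of observations. That is consistent with each value being split on whitespace into a date token and a time token before counting. A minimal sketch of that kind of word count over the five sample values (the splitting rule is an assumption about how the table was produced):

```python
from collections import Counter

# The five sample values shown for this variable.
samples = [
    "2021-11-26 10:20:00",
    "2022-04-24 03:50:00",
    "2021-05-13 09:50:00",
    "2013-04-26 16:46:00",
    "2021-05-19 03:50:00",
]

# Split each value on whitespace and count the resulting tokens.
tokens = [token for value in samples for token in value.split()]
word_counts = Counter(tokens)

# Frequencies are relative to the number of tokens, not the number of rows.
for token, count in word_counts.most_common(3):
    print(token, count, f"{count / len(tokens):.1%}")
```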

Most occurring characters

Value             Count  Frequency (%)
0                 60645  31.9%
2                 27234  14.3%
1                 24142  12.7%
-                 20000  10.5%
:                 20000  10.5%
(space)           10000  5.3%
9                 5370   2.8%
5                 5334   2.8%
4                 5072   2.7%
3                 4947   2.6%
Other values (3)  7256   3.8%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     140000  73.7%
Dash Punctuation   20000   10.5%
Other Punctuation  20000   10.5%
Space Separator    10000   5.3%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      60645  43.3%
2      27234  19.5%
1      24142  17.2%
9      5370   3.8%
5      5334   3.8%
4      5072   3.6%
3      4947   3.5%
6      2756   2.0%
7      2309   1.6%
8      2191   1.6%

Dash Punctuation

Value  Count  Frequency (%)
-      20000  100.0%

Other Punctuation

Value  Count  Frequency (%)
:      20000  100.0%

Space Separator

Value    Count  Frequency (%)
(space)  10000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  190000  100.0%

Most frequent character per script

Common

Value             Count  Frequency (%)
0                 60645  31.9%
2                 27234  14.3%
1                 24142  12.7%
-                 20000  10.5%
:                 20000  10.5%
(space)           10000  5.3%
9                 5370   2.8%
5                 5334   2.8%
4                 5072   2.7%
3                 4947   2.6%
Other values (3)  7256   3.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  190000  100.0%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
0                 60645  31.9%
2                 27234  14.3%
1                 24142  12.7%
-                 20000  10.5%
:                 20000  10.5%
(space)           10000  5.3%
9                 5370   2.8%
5                 5334   2.8%
4                 5072   2.7%
3                 4947   2.6%
Other values (3)  7256   3.8%

경보발생시간 (alert time)
DateTime

Distinct: 9931
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2011-02-23 13:45:00
Maximum: 2022-04-25 09:50:00

[Figure: Histogram with fixed size bins (bins=50)]

측정값 (measured value)
Real number (ℝ)

ZEROS 

Distinct: 15
Distinct (%): 0.2%
Missing: 7
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.698589
Minimum: 0
Maximum: 30
Zeros: 1804
Zeros (%): 18.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 4
95-th percentile: 5
Maximum: 30
Range: 30
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 4.3764069
Coefficient of variation (CV): 1.1832639
Kurtosis: 22.515947
Mean: 3.698589
Median Absolute Deviation (MAD): 1
Skewness: 4.4153555
Sum: 36960
Variance: 19.152937
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=15)]
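
Several of the statistics above are related by simple arithmetic: the mean is the sum over the non-missing values, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. A brief consistency check using the reported figures (variable names are illustrative):

```python
# Figures copied from the statistics reported for this variable.
n_observations = 10_000
n_missing = 7
total = 36960        # Sum
std = 4.3764069      # Standard deviation
q1, q3 = 2, 4

n_valid = n_observations - n_missing
mean = total / n_valid      # mean over non-missing values
cv = std / mean             # coefficient of variation
variance = std ** 2
iqr = q3 - q1

print(f"mean={mean:.6f} cv={cv:.7f} variance={variance:.6f} iqr={iqr}")
```

The recomputed values agree with the reported mean, CV, variance and IQR to within rounding of the printed inputs.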

Common values

Value             Count  Frequency (%)
4                 2761   27.6%
3                 2001   20.0%
5                 1818   18.2%
0                 1804   18.0%
2                 851    8.5%
1                 306    3.1%
28                171    1.7%
9                 136    1.4%
29                75     0.8%
6                 46     0.5%
Other values (5)  24     0.2%

Minimum 10 values

Value  Count  Frequency (%)
0      1804   18.0%
1      306    3.1%
2      851    8.5%
3      2001   20.0%
4      2761   27.6%
5      1818   18.2%
6      46     0.5%
8      5      0.1%
9      136    1.4%
10     12     0.1%

Maximum 10 values

Value  Count  Frequency (%)
30     5      0.1%
29     75     0.8%
28     171    1.7%
24     1      < 0.1%
18     1      < 0.1%
10     12     0.1%
9      136    1.4%
8      5      0.1%
6      46     0.5%
5      1818   18.2%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

          측정소명  항목코드  항목명  측정값
측정소명  1.000     0.882     0.882   0.750
항목코드  0.882     1.000     1.000   0.601
항목명    0.882     1.000     1.000   0.601
측정값    0.750     0.601     0.601   1.000

[Figure: correlation heatmap]

          항목명  항목코드
항목명    1.000   1.000
항목코드  1.000   1.000

[Figure: correlation heatmap]

          측정값  항목코드  항목명
측정값    1.000   0.377     0.377
항목코드  0.377   1.000     1.000
항목명    0.377   1.000     1.000

Missing values

[Figure: A simple visualization of nullity by column.]

[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

[Figure: Correlation heatmap of nullity, showing how strongly the presence or absence of one variable affects the presence of another.]

Sample

Index  측정소명    항목코드  항목명    측정일시             경보발생시간         측정값
1774   석남천      DOW00     용존산소  2021-11-26 10:20:00  2021-11-26 10:20:00  4
53     석남천      DOW00     용존산소  2022-04-24 03:50:00  2022-04-24 03:50:00  3
5947   석남천      DOW00     용존산소  2021-05-13 09:50:00  2021-05-13 09:50:00  5
17502  여주침사지  ETC       기타      2013-04-26 16:46:00  2013-04-26 16:51:00  <NA>
5757   석남천      DOW00     용존산소  2021-05-19 03:50:00  2021-05-19 03:50:00  3
6851   석남천      DOW00     용존산소  2021-04-14 03:50:00  2021-04-14 03:50:00  4
17001  호남예비3   PHY00     pH        2015-06-24 02:00:00  2015-06-24 02:16:00  0
9971   석남천      DOW00     용존산소  2020-10-07 20:40:00  2020-10-07 20:45:00  4
6836   석남천      DOW00     용존산소  2021-04-14 12:40:00  2021-04-14 12:40:00  5
5483   석남천      DOW00     용존산소  2021-05-25 12:20:00  2021-05-25 12:20:00  1

Index  측정소명    항목코드  항목명    측정일시             경보발생시간         측정값
7177   구미        PHY00     pH        2021-03-27 03:10:00  2021-03-27 03:15:00  0
10169  석남천      DOW00     용존산소  2020-09-27 00:40:00  2020-09-27 00:45:00  4
12101  석남천      DOW00     용존산소  2019-11-19 23:20:00  2019-11-19 23:25:00  3
5801   석남천      DOW00     용존산소  2021-05-17 07:20:00  2021-05-17 07:20:00  3
13555  석남천      DOW00     용존산소  2019-09-27 22:40:00  2019-09-27 22:45:00  4
380    석남천      PHY00     pH        2022-04-15 05:40:00  2022-04-15 05:40:00  0
2037   석남천      DOW00     용존산소  2021-11-17 18:20:00  2021-11-17 18:20:00  5
9645   고령        PHY00     pH        2020-10-17 10:30:00  2020-10-17 10:35:00  0
17056  호남예비3   PHY00     pH        2015-06-10 19:00:00  2015-06-10 19:56:00  0
12625  석남천      DOW00     용존산소  2019-11-07 09:40:00  2019-11-07 09:45:00  4

Duplicate rows

Most frequently occurring

Index  측정소명   항목코드  항목명    측정일시             경보발생시간         측정값  # duplicates
3      석남천     DOW00     용존산소  2021-04-30 07:20:00  2021-04-30 10:23:00  0       3
20     호남예비3  PHY00     pH        2015-06-22 17:00:00  2015-06-22 19:10:00  0       3
0      감천       PHY00     pH        2013-05-25 00:20:00  2013-05-25 00:56:00  0       2
1      금남       PHY00     pH        2013-05-20 13:20:00  2013-05-20 13:53:00  0       2
2      노안       PHY00     pH        2012-10-24 14:00:00  2012-10-24 14:20:00  0       2
4      성주       PHY00     pH        2013-03-19 16:50:00  2013-03-19 17:25:00  0       2
5      승촌       PHY00     pH        2013-03-28 11:50:00  2013-03-28 12:17:00  0       2
6      승촌       PHY00     pH        2013-05-06 10:00:00  2013-05-06 10:41:00  3       2
7      승촌       PHY00     pH        2013-05-10 16:50:00  2013-05-10 17:26:00  0       2
8      승촌       PHY00     pH        2013-05-13 15:10:00  2013-05-13 16:04:00  0       2
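
A table like this can be built by counting identical rows. The sketch below uses a handful of illustrative rows (a hypothetical subset, not the full dataset) and the standard library only; it is not the profiling tool's own implementation.

```python
from collections import Counter

# Illustrative rows as (측정소명, 항목코드, 항목명, 측정일시, 경보발생시간, 측정값) tuples.
rows = [
    ("석남천", "DOW00", "용존산소", "2021-04-30 07:20:00", "2021-04-30 10:23:00", 0),
    ("석남천", "DOW00", "용존산소", "2021-04-30 07:20:00", "2021-04-30 10:23:00", 0),
    ("석남천", "DOW00", "용존산소", "2021-04-30 07:20:00", "2021-04-30 10:23:00", 0),
    ("감천", "PHY00", "pH", "2013-05-25 00:20:00", "2013-05-25 00:56:00", 0),
    ("감천", "PHY00", "pH", "2013-05-25 00:20:00", "2013-05-25 00:56:00", 0),
    ("구미", "PHY00", "pH", "2021-03-27 03:10:00", "2021-03-27 03:15:00", 0),
]

row_counts = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in row_counts.most_common() if n > 1]
# Surplus copies: rows that de-duplication would drop.
surplus = sum(n - 1 for _, n in duplicates)

for row, n in duplicates:
    print(row[0], row[3], n)
print("surplus rows:", surplus)
```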