Overview

Dataset statistics

Number of variables: 7
Number of observations: 106
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.3 KiB
Average record size in memory: 61.2 B
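
The two memory figures agree with each other up to rounding. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 B and using only the values reported in this section (the variable names are illustrative):

```python
# Values copied from the "Dataset statistics" section of this report.
n_observations = 106
avg_record_bytes = 61.2    # reported average record size (B)
reported_total_kib = 6.3   # reported total size in memory (KiB)

# Total size implied by the average record size, assuming 1 KiB = 1024 B.
implied_total_kib = n_observations * avg_record_bytes / 1024

print(f"implied total: {implied_total_kib:.1f} KiB")
```

Rounded to one decimal, the implied total matches the reported total size.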

Variable types

Categorical: 1
Text: 2
Numeric: 4

Dataset

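Description: Water-quality management status of valleys in national parks, broken down by park name (공원명), serial number (일련번호), site name (지점명), BOD_1차, BOD_2차, and BOD_3차.
Author: 국립공원공단 (Korea National Park Service)
URL: https://www.data.go.kr/data/15044518/fileData.do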

Alerts

BOD_1차 is highly overall correlated with BOD_2차 and 2 other fields (High correlation)
BOD_2차 is highly overall correlated with BOD_1차 and 3 other fields (High correlation)
BOD_3차 is highly overall correlated with BOD_1차 and 3 other fields (High correlation)
BOD_평균 is highly overall correlated with BOD_1차 and 3 other fields (High correlation)
공원명 is highly overall correlated with BOD_2차 and 2 other fields (High correlation)
일련번호 has unique values (Unique)
BOD_2차 has 9 (8.5%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 19:38:46.331788
Analysis finished: 2023-12-12 19:38:48.729023
Duration: 2.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

공원명
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): 19.8%
Missing: 0
Missing (%): 0.0%
Memory size: 980.0 B

Value | Count
지리산 | 11
계룡산 | 9
지리산남부 | 8
북한산 | 8
북한산도봉 | 6
Other values (16) | 64

Length

Max length: 5
Median length: 3
Mean length: 3.4528302
Min length: 2

Unique

Unique: 1
Unique (%): 0.9%

Sample

1st row: 지리산
2nd row: 지리산
3rd row: 지리산
4th row: 지리산
5th row: 지리산

Common Values

Value | Count | Frequency (%)
지리산 | 11 | 10.4%
계룡산 | 9 | 8.5%
지리산남부 | 8 | 7.5%
북한산 | 8 | 7.5%
북한산도봉 | 6 | 5.7%
월악산 | 6 | 5.7%
덕유산 | 6 | 5.7%
속리산 | 6 | 5.7%
설악산 | 6 | 5.7%
치악산 | 4 | 3.8%
Other values (11) | 36 | 34.0%
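
The Frequency (%) column is each count divided by the 106 observations. A small sketch of that calculation, using only the counts listed above; the helper name `frequency_pct` is illustrative and not part of ydata-profiling:

```python
# Counts copied from the Common Values table for 공원명.
top_counts = {
    "지리산": 11, "계룡산": 9, "지리산남부": 8, "북한산": 8, "북한산도봉": 6,
    "월악산": 6, "덕유산": 6, "속리산": 6, "설악산": 6, "치악산": 4,
}
n_observations = 106


def frequency_pct(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Observations not covered by the ten listed parks.
other_count = n_observations - sum(top_counts.values())

for park, count in top_counts.items():
    print(park, count, f"{frequency_pct(count, n_observations)}%")
print("Other values", other_count, f"{frequency_pct(other_count, n_observations)}%")
```

The remainder computed this way reproduces the "Other values (11)" row.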

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
지리산 | 11 | 10.4%
계룡산 | 9 | 8.5%
지리산남부 | 8 | 7.5%
북한산 | 8 | 7.5%
북한산도봉 | 6 | 5.7%
월악산 | 6 | 5.7%
덕유산 | 6 | 5.7%
속리산 | 6 | 5.7%
설악산 | 6 | 5.7%
오대산 | 4 | 3.8%
Other values (11) | 36 | 34.0%

일련번호
Text

UNIQUE 

Distinct: 106
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 980.0 B

Length

Max length: 5
Median length: 4
Mean length: 4.0188679
Min length: 4

Characters and Unicode

Total characters: 426
Distinct characters: 40
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
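
As an illustration of those character properties, the sketch below counts Unicode general categories for the five sample values of 일련번호 shown in this report. It uses Python's standard `unicodedata` module; the mapping from category codes to the names used in the report is written out by hand here and is not taken from ydata-profiling:

```python
import unicodedata
from collections import Counter

# The five sample values of 일련번호 listed in this report.
samples = ["지리-1", "지리-2", "지리-3", "지리-4", "지리-5"]

# Two-letter general category codes and the names the report uses for them.
category_names = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
}

# Count the general category of every character in every sample value.
categories = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in categories.most_common():
    print(category_names.get(code, code), count)
```

Each sample value contributes two Hangul syllables, one hyphen, and one digit, which is the same three-category split the report shows for the full column.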

Unique

Unique: 106
Unique (%): 100.0%

Sample

1st row: 지리-1
2nd row: 지리-2
3rd row: 지리-3
4th row: 지리-4
5th row: 지리-5

Value | Count | Frequency (%)
지리-1 | 1 | 0.9%
월악-5 | 1 | 0.9%
월악-3 | 1 | 0.9%
월악-2 | 1 | 0.9%
월악-1 | 1 | 0.9%
치악-4 | 1 | 0.9%
치악-3 | 1 | 0.9%
치악-2 | 1 | 0.9%
치악-1 | 1 | 0.9%
주왕-5 | 1 | 0.9%
Other values (96) | 96 | 90.6%

Most occurring characters

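Value | Count | Frequency (%)
- | 106 | 24.9%
1 | 24 | 5.6%
(Hangul character not recovered) | 22 | 5.2%
(Hangul character not recovered) | 21 | 4.9%
3 | 19 | 4.5%
(Hangul character not recovered) | 17 | 4.0%
2 | 17 | 4.0%
(Hangul character not recovered) | 16 | 3.8%
4 | 14 | 3.3%
5 | 11 | 2.6%
Other values (30) | 159 | 37.3%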

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 212 | 49.8%
Decimal Number | 108 | 25.4%
Dash Punctuation | 106 | 24.9%

Most frequent character per category

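Other Letter
Value | Count | Frequency (%)
(Hangul character not recovered) | 22 | 10.4%
(Hangul character not recovered) | 21 | 9.9%
(Hangul character not recovered) | 17 | 8.0%
(Hangul character not recovered) | 16 | 7.5%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
Other values (19) | 85 | 40.1%

Decimal Number
Value | Count | Frequency (%)
1 | 24 | 22.2%
3 | 19 | 17.6%
2 | 17 | 15.7%
4 | 14 | 13.0%
5 | 11 | 10.2%
6 | 10 | 9.3%
7 | 6 | 5.6%
8 | 4 | 3.7%
9 | 2 | 1.9%
0 | 1 | 0.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 106 | 100.0%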

Most occurring scripts

Value | Count | Frequency (%)
Common | 214 | 50.2%
Hangul | 212 | 49.8%

Most frequent character per script

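Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 22 | 10.4%
(Hangul character not recovered) | 21 | 9.9%
(Hangul character not recovered) | 17 | 8.0%
(Hangul character not recovered) | 16 | 7.5%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
Other values (19) | 85 | 40.1%

Common
Value | Count | Frequency (%)
- | 106 | 49.5%
1 | 24 | 11.2%
3 | 19 | 8.9%
2 | 17 | 7.9%
4 | 14 | 6.5%
5 | 11 | 5.1%
6 | 10 | 4.7%
7 | 6 | 2.8%
8 | 4 | 1.9%
9 | 2 | 0.9%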

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 214 | 50.2%
Hangul | 212 | 49.8%

Most frequent character per block

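ASCII
Value | Count | Frequency (%)
- | 106 | 49.5%
1 | 24 | 11.2%
3 | 19 | 8.9%
2 | 17 | 7.9%
4 | 14 | 6.5%
5 | 11 | 5.1%
6 | 10 | 4.7%
7 | 6 | 2.8%
8 | 4 | 1.9%
9 | 2 | 0.9%

Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 22 | 10.4%
(Hangul character not recovered) | 21 | 9.9%
(Hangul character not recovered) | 17 | 8.0%
(Hangul character not recovered) | 16 | 7.5%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 9 | 4.2%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
(Hangul character not recovered) | 8 | 3.8%
Other values (19) | 85 | 40.1%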

지점명
Text

Distinct: 76
Distinct (%): 71.7%
Missing: 0
Missing (%): 0.0%
Memory size: 980.0 B

Length

Max length: 6
Median length: 5.5
Mean length: 4.2264151
Min length: 2

Characters and Unicode

Total characters: 448
Distinct characters: 97
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 54
Unique (%): 50.9%

Sample

1st row: 유평계곡
2nd row: 중산리계곡
3rd row: 백무동계곡
4th row: 쌍계사계곡
5th row: 대성계곡

Value | Count | Frequency (%)
원당천 | 4 | 3.8%
화산계곡 | 3 | 2.8%
내장천 | 3 | 2.8%
홍류동계곡 | 3 | 2.8%
동학사계곡 | 3 | 2.8%
황룡동계곡 | 3 | 2.8%
갑사계곡 | 3 | 2.8%
송계계곡 | 2 | 1.9%
백담계곡 | 2 | 1.9%
오대천 | 2 | 1.9%
Other values (66) | 78 | 73.6%

Most occurring characters

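Value | Count | Frequency (%)
(Hangul character not recovered) | 93 | 20.8%
(Hangul character not recovered) | 91 | 20.3%
(Hangul character not recovered) | 24 | 5.4%
(Hangul character not recovered) | 16 | 3.6%
(Hangul character not recovered) | 15 | 3.3%
(Hangul character not recovered) | 8 | 1.8%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 6 | 1.3%
(Hangul character not recovered) | 6 | 1.3%
Other values (87) | 175 | 39.1%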

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 442 | 98.7%
Decimal Number | 6 | 1.3%

Most frequent character per category

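Other Letter
Value | Count | Frequency (%)
(Hangul character not recovered) | 93 | 21.0%
(Hangul character not recovered) | 91 | 20.6%
(Hangul character not recovered) | 24 | 5.4%
(Hangul character not recovered) | 16 | 3.6%
(Hangul character not recovered) | 15 | 3.4%
(Hangul character not recovered) | 8 | 1.8%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 6 | 1.4%
(Hangul character not recovered) | 6 | 1.4%
Other values (85) | 169 | 38.2%

Decimal Number
Value | Count | Frequency (%)
1 | 4 | 66.7%
2 | 2 | 33.3%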

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 442 | 98.7%
Common | 6 | 1.3%

Most frequent character per script

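Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 93 | 21.0%
(Hangul character not recovered) | 91 | 20.6%
(Hangul character not recovered) | 24 | 5.4%
(Hangul character not recovered) | 16 | 3.6%
(Hangul character not recovered) | 15 | 3.4%
(Hangul character not recovered) | 8 | 1.8%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 6 | 1.4%
(Hangul character not recovered) | 6 | 1.4%
Other values (85) | 169 | 38.2%

Common
Value | Count | Frequency (%)
1 | 4 | 66.7%
2 | 2 | 33.3%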

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 442 | 98.7%
ASCII | 6 | 1.3%

Most frequent character per block

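Hangul
Value | Count | Frequency (%)
(Hangul character not recovered) | 93 | 21.0%
(Hangul character not recovered) | 91 | 20.6%
(Hangul character not recovered) | 24 | 5.4%
(Hangul character not recovered) | 16 | 3.6%
(Hangul character not recovered) | 15 | 3.4%
(Hangul character not recovered) | 8 | 1.8%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 7 | 1.6%
(Hangul character not recovered) | 6 | 1.4%
(Hangul character not recovered) | 6 | 1.4%
Other values (85) | 169 | 38.2%

ASCII
Value | Count | Frequency (%)
1 | 4 | 66.7%
2 | 2 | 33.3%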

BOD_1차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 9.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.38490566
Minimum: 0.1
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.1
Q1: 0.2
Median: 0.4
Q3: 0.5
95-th percentile: 0.7
Maximum: 1
Range: 0.9
Interquartile range (IQR): 0.3

Descriptive statistics

Standard deviation: 0.20414395
Coefficient of variation (CV): 0.53037399
Kurtosis: -0.23137853
Mean: 0.38490566
Median Absolute Deviation (MAD): 0.2
Skewness: 0.55053581
Sum: 40.8
Variance: 0.041674753
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.2 | 26 | 24.5%
0.5 | 18 | 17.0%
0.4 | 17 | 16.0%
0.3 | 12 | 11.3%
0.1 | 11 | 10.4%
0.6 | 11 | 10.4%
0.7 | 6 | 5.7%
0.8 | 3 | 2.8%
1.0 | 1 | 0.9%
0.9 | 1 | 0.9%

Value | Count | Frequency (%)
0.1 | 11 | 10.4%
0.2 | 26 | 24.5%
0.3 | 12 | 11.3%
0.4 | 17 | 16.0%
0.5 | 18 | 17.0%
0.6 | 11 | 10.4%
0.7 | 6 | 5.7%
0.8 | 3 | 2.8%
0.9 | 1 | 0.9%
1.0 | 1 | 0.9%

Value | Count | Frequency (%)
1.0 | 1 | 0.9%
0.9 | 1 | 0.9%
0.8 | 3 | 2.8%
0.7 | 6 | 5.7%
0.6 | 11 | 10.4%
0.5 | 18 | 17.0%
0.4 | 17 | 16.0%
0.3 | 12 | 11.3%
0.2 | 26 | 24.5%
0.1 | 11 | 10.4%
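
Because BOD_1차 takes only ten distinct values, several of its descriptive statistics can be recomputed directly from the frequency table. The sketch below does so with the standard library only. It assumes the sample (n - 1) definition of variance, which the report does not state; the figures it produces agree with the reported ones to the precision shown.

```python
import math

# Value -> count, copied from the BOD_1차 frequency table.
freq = {0.1: 11, 0.2: 26, 0.3: 12, 0.4: 17, 0.5: 18,
        0.6: 11, 0.7: 6, 0.8: 3, 0.9: 1, 1.0: 1}

n = sum(freq.values())                              # number of observations
total = sum(value * count for value, count in freq.items())
mean = total / n

# Sample variance (divides by n - 1); this choice is an assumption.
variance = sum(count * (value - mean) ** 2
               for value, count in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                     # coefficient of variation

print(f"n={n} sum={total:.1f} mean={mean:.8f}")
print(f"variance={variance:.9f} std={std:.8f} cv={cv:.8f}")
```

The same approach applies to BOD_2차, BOD_3차, and BOD_평균 by swapping in their frequency tables.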

BOD_2차
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 10
Distinct (%): 9.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.34339623
Minimum: 0
Maximum: 0.9
Zeros: 9
Zeros (%): 8.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.1
Median: 0.35
Q3: 0.5
95-th percentile: 0.675
Maximum: 0.9
Range: 0.9
Interquartile range (IQR): 0.4

Descriptive statistics

Standard deviation: 0.21996488
Coefficient of variation (CV): 0.64055705
Kurtosis: -0.59863691
Mean: 0.34339623
Median Absolute Deviation (MAD): 0.15
Skewness: 0.23524709
Sum: 36.4
Variance: 0.048384546
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.1 | 19 | 17.9%
0.4 | 17 | 16.0%
0.5 | 16 | 15.1%
0.3 | 15 | 14.2%
0.6 | 14 | 13.2%
0.2 | 10 | 9.4%
0.0 | 9 | 8.5%
0.7 | 2 | 1.9%
0.8 | 2 | 1.9%
0.9 | 2 | 1.9%

Value | Count | Frequency (%)
0.0 | 9 | 8.5%
0.1 | 19 | 17.9%
0.2 | 10 | 9.4%
0.3 | 15 | 14.2%
0.4 | 17 | 16.0%
0.5 | 16 | 15.1%
0.6 | 14 | 13.2%
0.7 | 2 | 1.9%
0.8 | 2 | 1.9%
0.9 | 2 | 1.9%

Value | Count | Frequency (%)
0.9 | 2 | 1.9%
0.8 | 2 | 1.9%
0.7 | 2 | 1.9%
0.6 | 14 | 13.2%
0.5 | 16 | 15.1%
0.4 | 17 | 16.0%
0.3 | 15 | 14.2%
0.2 | 10 | 9.4%
0.1 | 19 | 17.9%
0.0 | 9 | 8.5%
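
The non-round quantiles reported for BOD_2차 (median 0.35, 95-th percentile 0.675) follow from interpolating between neighbouring sorted values. The sketch below rebuilds the 106 values from the frequency table and applies linear interpolation between order statistics. The interpolation method is an assumption, since the report does not name it, and the helper `percentile` is illustrative.

```python
# Value -> count, copied from the BOD_2차 frequency table.
freq = {0.0: 9, 0.1: 19, 0.2: 10, 0.3: 15, 0.4: 17,
        0.5: 16, 0.6: 14, 0.7: 2, 0.8: 2, 0.9: 2}

# Expand the table back into a sorted list of 106 observations.
values = sorted(v for v, count in freq.items() for _ in range(count))


def percentile(sorted_values, q):
    """Percentile for q in [0, 1] using linear interpolation (assumed method)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)                       # index just below the position
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


q1 = percentile(values, 0.25)
median = percentile(values, 0.50)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)

print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} p95={p95:.3f} IQR={q3 - q1:.3f}")
```

With 106 observations the median falls halfway between the 53rd and 54th sorted values (0.3 and 0.4), which is where 0.35 comes from.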

BOD_3차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 8.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.39528302
Minimum: 0.1
Maximum: 0.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.1
Q1: 0.2
Median: 0.4
Q3: 0.6
95-th percentile: 0.775
Maximum: 0.9
Range: 0.8
Interquartile range (IQR): 0.4

Descriptive statistics

Standard deviation: 0.22012208
Coefficient of variation (CV): 0.55687208
Kurtosis: -0.98399725
Mean: 0.39528302
Median Absolute Deviation (MAD): 0.2
Skewness: 0.30671154
Sum: 41.9
Variance: 0.048453729
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
0.2 | 20 | 18.9%
0.6 | 17 | 16.0%
0.3 | 16 | 15.1%
0.1 | 16 | 15.1%
0.5 | 15 | 14.2%
0.7 | 8 | 7.5%
0.4 | 8 | 7.5%
0.8 | 4 | 3.8%
0.9 | 2 | 1.9%

Value | Count | Frequency (%)
0.1 | 16 | 15.1%
0.2 | 20 | 18.9%
0.3 | 16 | 15.1%
0.4 | 8 | 7.5%
0.5 | 15 | 14.2%
0.6 | 17 | 16.0%
0.7 | 8 | 7.5%
0.8 | 4 | 3.8%
0.9 | 2 | 1.9%

Value | Count | Frequency (%)
0.9 | 2 | 1.9%
0.8 | 4 | 3.8%
0.7 | 8 | 7.5%
0.6 | 17 | 16.0%
0.5 | 15 | 14.2%
0.4 | 8 | 7.5%
0.3 | 16 | 15.1%
0.2 | 20 | 18.9%
0.1 | 16 | 15.1%

BOD_평균
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 7.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3745283
Minimum: 0.1
Maximum: 0.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.1
Q1: 0.2
Median: 0.4
Q3: 0.5
95-th percentile: 0.6
Maximum: 0.8
Range: 0.7
Interquartile range (IQR): 0.3

Descriptive statistics

Standard deviation: 0.18976744
Coefficient of variation (CV): 0.50668384
Kurtosis: -1.2948993
Mean: 0.3745283
Median Absolute Deviation (MAD): 0.2
Skewness: -0.11122697
Sum: 39.7
Variance: 0.03601168
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value | Count | Frequency (%)
0.5 | 23 | 21.7%
0.6 | 22 | 20.8%
0.1 | 19 | 17.9%
0.2 | 17 | 16.0%
0.4 | 13 | 12.3%
0.3 | 10 | 9.4%
0.8 | 1 | 0.9%
0.7 | 1 | 0.9%

Value | Count | Frequency (%)
0.1 | 19 | 17.9%
0.2 | 17 | 16.0%
0.3 | 10 | 9.4%
0.4 | 13 | 12.3%
0.5 | 23 | 21.7%
0.6 | 22 | 20.8%
0.7 | 1 | 0.9%
0.8 | 1 | 0.9%

Value | Count | Frequency (%)
0.8 | 1 | 0.9%
0.7 | 1 | 0.9%
0.6 | 22 | 20.8%
0.5 | 23 | 21.7%
0.4 | 13 | 12.3%
0.3 | 10 | 9.4%
0.2 | 17 | 16.0%
0.1 | 19 | 17.9%

Interactions

[16 pairwise interaction plots of the numeric variables; images not included]

Correlations

Variable | 공원명 | 지점명 | BOD_1차 | BOD_2차 | BOD_3차 | BOD_평균
공원명 | 1.000 | 1.000 | 0.749 | 0.892 | 0.869 | 0.848
지점명 | 1.000 | 1.000 | 0.803 | 0.841 | 0.863 | 0.636
BOD_1차 | 0.749 | 0.803 | 1.000 | 0.808 | 0.430 | 0.806
BOD_2차 | 0.892 | 0.841 | 0.808 | 1.000 | 0.561 | 0.794
BOD_3차 | 0.869 | 0.863 | 0.430 | 0.561 | 1.000 | 0.589
BOD_평균 | 0.848 | 0.636 | 0.806 | 0.794 | 0.589 | 1.000

Variable | BOD_1차 | BOD_2차 | BOD_3차 | BOD_평균 | 공원명
BOD_1차 | 1.000 | 0.660 | 0.635 | 0.881 | 0.366
BOD_2차 | 0.660 | 1.000 | 0.572 | 0.843 | 0.565
BOD_3차 | 0.635 | 0.572 | 1.000 | 0.823 | 0.534
BOD_평균 | 0.881 | 0.843 | 0.823 | 1.000 | 0.523
공원명 | 0.366 | 0.565 | 0.534 | 0.523 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

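First rows

Index | 공원명 | 일련번호 | 지점명 | BOD_1차 | BOD_2차 | BOD_3차 | BOD_평균
0 | 지리산 | 지리-1 | 유평계곡 | 0.3 | 0.3 | 0.3 | 0.3
1 | 지리산 | 지리-2 | 중산리계곡 | 0.4 | 0.3 | 0.2 | 0.3
2 | 지리산 | 지리-3 | 백무동계곡 | 0.4 | 0.3 | 0.3 | 0.3
3 | 지리산 | 지리-4 | 쌍계사계곡 | 0.5 | 0.3 | 0.1 | 0.3
4 | 지리산 | 지리-5 | 대성계곡 | 0.5 | 0.4 | 0.5 | 0.5
5 | 지리산 | 지리-6 | 단천계곡 | 0.5 | 0.4 | 0.2 | 0.4
6 | 지리산 | 지리-7 | 내원계곡 | 0.4 | 0.3 | 0.5 | 0.4
7 | 지리산 | 지리-8 | 장당계곡 | 0.7 | 0.5 | 0.2 | 0.5
8 | 지리산 | 지리-9 | 칠선계곡 | 0.4 | 0.4 | 0.5 | 0.4
9 | 지리산 | 지리-10 | 거림계곡 | 0.4 | 0.5 | 0.2 | 0.4

Last rows

Index | 공원명 | 일련번호 | 지점명 | BOD_1차 | BOD_2차 | BOD_3차 | BOD_평균
96 | 소백산 | 소백-3 | 삼가계곡1 | 0.6 | 0.5 | 0.6 | 0.6
97 | 소백산 | 소백-4 | 삼가계곡2 | 0.3 | 0.2 | 0.6 | 0.4
98 | 소백산북부 | 소북-1 | 천동계곡 | 0.2 | 0.5 | 0.3 | 0.3
99 | 소백산북부 | 소북-2 | 죽령계곡 | 0.3 | 0.5 | 0.3 | 0.4
100 | 소백산북부 | 소북-3 | 남천계곡 | 0.2 | 0.2 | 0.3 | 0.2
101 | 소백산북부 | 소북-4 | 어의계곡 | 0.2 | 0.3 | 0.2 | 0.2
102 | 월출산 | 월출-1 | 천황계곡 | 0.1 | 0.5 | 0.3 | 0.3
103 | 월출산 | 월출-2 | 도갑사계곡 | 0.2 | 0.4 | 0.4 | 0.3
104 | 월출산 | 월출-3 | 경포대계곡 | 0.3 | 0.4 | 0.4 | 0.4
105 | 변산반도 | 변산-1 | 직소천 | 0.3 | 0.8 | 0.5 | 0.5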
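
In the rows shown here, BOD_평균 looks like the mean of the three measurements rounded to one decimal place, which would help explain the high-correlation alerts involving that column. The source does not state this, so the sketch below only checks the hypothesis against the 20 sample rows shown in this report. The helper `rounded_mean` is illustrative, and the arithmetic is done in integer tenths to avoid floating-point rounding surprises.

```python
# (일련번호, BOD_1차, BOD_2차, BOD_3차, BOD_평균) copied from the sample rows.
rows = [
    ("지리-1", 0.3, 0.3, 0.3, 0.3), ("지리-2", 0.4, 0.3, 0.2, 0.3),
    ("지리-3", 0.4, 0.3, 0.3, 0.3), ("지리-4", 0.5, 0.3, 0.1, 0.3),
    ("지리-5", 0.5, 0.4, 0.5, 0.5), ("지리-6", 0.5, 0.4, 0.2, 0.4),
    ("지리-7", 0.4, 0.3, 0.5, 0.4), ("지리-8", 0.7, 0.5, 0.2, 0.5),
    ("지리-9", 0.4, 0.4, 0.5, 0.4), ("지리-10", 0.4, 0.5, 0.2, 0.4),
    ("소백-3", 0.6, 0.5, 0.6, 0.6), ("소백-4", 0.3, 0.2, 0.6, 0.4),
    ("소북-1", 0.2, 0.5, 0.3, 0.3), ("소북-2", 0.3, 0.5, 0.3, 0.4),
    ("소북-3", 0.2, 0.2, 0.3, 0.2), ("소북-4", 0.2, 0.3, 0.2, 0.2),
    ("월출-1", 0.1, 0.5, 0.3, 0.3), ("월출-2", 0.2, 0.4, 0.4, 0.3),
    ("월출-3", 0.3, 0.4, 0.4, 0.4), ("변산-1", 0.3, 0.8, 0.5, 0.5),
]


def rounded_mean(measurements):
    """Mean of the measurements rounded to one decimal, computed in tenths."""
    tenths = [round(value * 10) for value in measurements]
    return round(sum(tenths) / len(tenths)) / 10


# Serial numbers of rows where the hypothesis does not hold.
mismatches = [serial for serial, *bod, average in rows
              if rounded_mean(bod) != average]

print("rows checked:", len(rows), "mismatches:", mismatches)
```

No mismatches in these 20 rows is consistent with the hypothesis but does not confirm it for the remaining 86 observations.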