Overview

Dataset statistics

Number of variables: 6
Number of observations: 38
Missing cells: 18
Missing cells (%): 7.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 52.5 B

Variable types

Categorical: 3
Text: 2
Numeric: 1

Dataset

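Description: A file summarizing the water-quality (groundwater, tap water, etc.) test items and fees posted on the website of the Jeollanam-do Health Research Institute (전라남도 보건연구원).
Author: Jeollanam-do (전라남도)
URL: https://www.data.go.kr/data/15041955/fileData.do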

Alerts

High correlation: 검체명 is highly overall correlated with 구분1 and 1 other field
High correlation: 구분1 is highly overall correlated with 검체명
High correlation: 구분2 is highly overall correlated with 검체명
Missing: 비고 has 18 (47.4%) missing values
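
The two missing-value percentages above use different denominators: the overview figure is relative to all cells, while the alert is relative to the rows of a single column. The short check below recomputes both from the reported counts. The variable names are illustrative and not part of the report.

```python
# Figures copied from the report above.
n_rows = 38          # Number of observations
n_cols = 6           # Number of variables
missing_cells = 18   # Missing cells (all of them reported for the 비고 column)

total_cells = n_rows * n_cols
missing_cells_pct = 100 * missing_cells / total_cells   # share of all cells
missing_bigo_pct = 100 * missing_cells / n_rows          # share of 비고 rows

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_cells_pct:.1f}%")
print(f"비고 missing: {missing_bigo_pct:.1f}%")
```

Note that the `<NA>` entries listed for 구분1 and 구분2 are reported as ordinary category values (both columns show 0 missing), so they do not contribute to these counts.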

Reproduction

Analysis started: 2023-12-12 01:52:24.676684
Analysis finished: 2023-12-12 01:52:25.356051
Duration: 0.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

검체명
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 26.3%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

[Bar chart of top categories: 상수도 (16), 지하수 (10), 먹는샘물, 목욕장, 온천수, Other values (5)]

Length

Max length: 12
Median length: 3
Mean length: 3.4473684
Min length: 3

Unique

Unique: 5
Unique (%): 13.2%

Sample

1st row: 지하수
2nd row: 지하수
3rd row: 지하수
4th row: 지하수
5th row: 지하수

Common Values

Value | Count | Frequency (%)
상수도 | 16 | 42.1%
지하수 | 10 | 26.3%
먹는샘물 | 3 | 7.9%
목욕장 | 2 | 5.3%
온천수 | 2 | 5.3%
수영장수 | 1 | 2.6%
수경시설 용수 | 1 | 2.6%
물놀이형 유기시설(수) | 1 | 2.6%
저수조 | 1 | 2.6%
급수관 | 1 | 2.6%
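
The report distinguishes "Distinct" (how many different values occur) from "Unique" (how many values occur exactly once). A minimal sketch that recomputes both for 검체명 from the value counts listed above, using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# Value counts copied from the "Common Values" table for 검체명.
counts = Counter({
    "상수도": 16, "지하수": 10, "먹는샘물": 3, "목욕장": 2, "온천수": 2,
    "수영장수": 1, "수경시설 용수": 1, "물놀이형 유기시설(수)": 1,
    "저수조": 1, "급수관": 1,
})

n_rows = sum(counts.values())                        # number of observations
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(f"rows={n_rows}, distinct={distinct} ({100 * distinct / n_rows:.1f}%)")
print(f"unique={unique} ({100 * unique / n_rows:.1f}%)")
```

Both percentages are taken relative to the 38 observations, which matches the 26.3% and 13.2% shown for this variable.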

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value | Count | Frequency (%)
상수도 | 16 | 40.0%
지하수 | 10 | 25.0%
먹는샘물 | 3 | 7.5%
목욕장 | 2 | 5.0%
온천수 | 2 | 5.0%
수영장수 | 1 | 2.5%
수경시설 | 1 | 2.5%
용수 | 1 | 2.5%
물놀이형 | 1 | 2.5%
유기시설(수 | 1 | 2.5%
Other values (2) | 2 | 5.0%
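
The table directly above counts words rather than whole values, which is why its percentages (for example 40.0% for 상수도) differ from the Common Values table (42.1%). The sketch below reproduces those word counts under an assumption inferred from entries such as `유기시설(수` and `na`: each value is lower-cased, split on whitespace, and stripped of leading and trailing punctuation. This is an illustration of that assumed procedure, not the profiler's confirmed implementation, and `word_counts` is a hypothetical helper name.

```python
import string
from collections import Counter

# Value counts copied from the "Common Values" table for 검체명.
value_counts = {
    "상수도": 16, "지하수": 10, "먹는샘물": 3, "목욕장": 2, "온천수": 2,
    "수영장수": 1, "수경시설 용수": 1, "물놀이형 유기시설(수)": 1,
    "저수조": 1, "급수관": 1,
}

def word_counts(value_counts):
    """Count whitespace-separated tokens, weighting each by its value's count."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            # Assumed cleaning step: drop punctuation at both ends of the token.
            token = token.strip(string.punctuation + string.whitespace)
            if token:
                words[token] += count
    return words

words = word_counts(value_counts)
total = sum(words.values())
for word, count in words.most_common(5):
    print(f"{word}: {count} ({100 * count / total:.1f}%)")
```

With this tokenization the 38 values yield 40 words in total, so every percentage is computed against 40 instead of 38.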

구분1
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 36.8%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

[Bar chart of top categories: 지방 상수도, 소규모급수시설, <NA>, 학교먹는물, 원수, Other values (9): 12]

Length

Max length: 7
Median length: 5.5
Mean length: 5.0526316
Min length: 2

Unique

Unique: 6
Unique (%): 15.8%

Sample

1st row: 음용수
2nd row: 음용수
3rd row: 생활 용수
4th row: 농업 용수
5th row: 매립장 검사정

Common Values

Value | Count | Frequency (%)
지방 상수도 | 9 | 23.7%
소규모급수시설 | 6 | 15.8%
<NA> | 5 | 13.2%
학교먹는물 | 3 | 7.9%
원수 | 3 | 7.9%
음용수 | 2 | 5.3%
먹는물공동시설 | 2 | 5.3%
욕조수 | 2 | 5.3%
생활 용수 | 1 | 2.6%
농업 용수 | 1 | 2.6%
Other values (4) | 4 | 10.5%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
지방 | 9 | 17.6%
상수도 | 9 | 17.6%
소규모급수시설 | 6 | 11.8%
na | 5 | 9.8%
학교먹는물 | 3 | 5.9%
원수 | 3 | 5.9%
음용수 | 2 | 3.9%
먹는물공동시설 | 2 | 3.9%
욕조수 | 2 | 3.9%
용수 | 2 | 3.9%
Other values (8) | 8 | 15.7%

구분2
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 23.7%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

[Bar chart of top categories: <NA> (19), 원수(하천수), 원수(호소수), 정수, 지하수, Other values (4)]

Length

Max length: 10
Median length: 4
Mean length: 4.7894737
Min length: 2

Unique

Unique: 2
Unique (%): 5.3%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 19 | 50.0%
원수(하천수) | 4 | 10.5%
원수(호소수) | 4 | 10.5%
정수 | 3 | 7.9%
지하수 | 2 | 5.3%
수도꼭지 | 2 | 5.3%
원수(지하수) | 2 | 5.3%
정수기 및 냉온수기 | 1 | 2.6%
급수 과정별 | 1 | 2.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value | Count | Frequency (%)
na | 19 | 46.3%
원수(하천수 | 4 | 9.8%
원수(호소수 | 4 | 9.8%
정수 | 3 | 7.3%
지하수 | 2 | 4.9%
수도꼭지 | 2 | 4.9%
원수(지하수 | 2 | 4.9%
정수기 | 1 | 2.4%
및 | 1 | 2.4%
냉온수기 | 1 | 2.4%
Other values (2) | 2 | 4.9%

검사항목
Text

Distinct: 23
Distinct (%): 60.5%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.5526316
Min length: 3

Characters and Unicode

Total characters: 135
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 36.8%

Sample

1st row: 46항목
2nd row: 12항목
3rd row: 20항목
4th row: 15항목
5th row: 25항목

Value | Count | Frequency (%)
6항목 | 7 | 18.4%
15항목 | 3 | 7.9%
46항목 | 2 | 5.3%
1항목 | 2 | 5.3%
31항목 | 2 | 5.3%
11항목 | 2 | 5.3%
59항목 | 2 | 5.3%
4항목 | 2 | 5.3%
5항목 | 2 | 5.3%
2항목 | 1 | 2.6%
Other values (13) | 13 | 34.2%

Most occurring characters

Value | Count | Frequency (%)
항 | 38 | 28.1%
목 | 38 | 28.1%
1 | 15 | 11.1%
5 | 10 | 7.4%
6 | 9 | 6.7%
4 | 6 | 4.4%
2 | 5 | 3.7%
3 | 4 | 3.0%
9 | 4 | 3.0%
0 | 3 | 2.2%
Other values (2) | 3 | 2.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 76 | 56.3%
Decimal Number | 59 | 43.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 15 | 25.4%
5 | 10 | 16.9%
6 | 9 | 15.3%
4 | 6 | 10.2%
2 | 5 | 8.5%
3 | 4 | 6.8%
9 | 4 | 6.8%
0 | 3 | 5.1%
7 | 2 | 3.4%
8 | 1 | 1.7%

Other Letter
Value | Count | Frequency (%)
항 | 38 | 50.0%
목 | 38 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 76 | 56.3%
Common | 59 | 43.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 15 | 25.4%
5 | 10 | 16.9%
6 | 9 | 15.3%
4 | 6 | 10.2%
2 | 5 | 8.5%
3 | 4 | 6.8%
9 | 4 | 6.8%
0 | 3 | 5.1%
7 | 2 | 3.4%
8 | 1 | 1.7%

Hangul
Value | Count | Frequency (%)
항 | 38 | 50.0%
목 | 38 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 76 | 56.3%
ASCII | 59 | 43.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
항 | 38 | 50.0%
목 | 38 | 50.0%

ASCII
Value | Count | Frequency (%)
1 | 15 | 25.4%
5 | 10 | 16.9%
6 | 9 | 15.3%
4 | 6 | 10.2%
2 | 5 | 8.5%
3 | 4 | 6.8%
9 | 4 | 6.8%
0 | 3 | 5.1%
7 | 2 | 3.4%
8 | 1 | 1.7%

수수료
Real number (ℝ)

Distinct: 33
Distinct (%): 86.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132163.16
Minimum: 6200
Maximum: 360000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 474.0 B

Quantile statistics

Minimum: 6200
5-th percentile: 9140
Q1: 30325
Median: 59500
Q3: 263075
95-th percentile: 347535
Maximum: 360000
Range: 353800
Interquartile range (IQR): 232750
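
The range and interquartile range follow directly from the other quantiles. The sketch below recomputes them and, as an added illustration that is not part of the report, derives Tukey's 1.5 × IQR fences, a common rule of thumb for flagging outliers.

```python
# Quantile figures copied from the report for 수수료.
minimum, q1, median, q3, maximum = 6200, 30325, 59500, 263075, 360000

value_range = maximum - minimum   # spread of the full data
iqr = q3 - q1                     # spread of the middle 50% of the data

# Tukey's fences (an illustrative convention, not computed by the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range={value_range}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence})")
print("values outside fences possible:", minimum < lower_fence or maximum > upper_fence)
```

Because the IQR is large relative to the data, both the minimum and the maximum fall inside the fences, so this rule would flag no fee as an outlier.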

Descriptive statistics

Standard deviation: 129174.44
Coefficient of variation (CV): 0.97738618
Kurtosis: -1.2332428
Mean: 132163.16
Median Absolute Deviation (MAD): 50000
Skewness: 0.70445359
Sum: 5022200
Variance: 1.6686037 × 10^10
Monotonicity: Not monotonic
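
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A quick consistency check using the rounded figures from the report (so small rounding differences are expected):

```python
# Figures copied from the report for 수수료.
n = 38                 # Number of observations
total = 5022200        # Sum
std = 129174.44        # Standard deviation (rounded in the report)

mean = total / n       # arithmetic mean
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the squared standard deviation

print(f"mean={mean:.2f}")
print(f"CV={cv:.6f}")
print(f"variance={variance:.4e}")
```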
[Figure: Histogram with fixed size bins (bins=33)]

Common values

Value | Count | Frequency (%)
267700 | 2 | 5.3%
326400 | 2 | 5.3%
30200 | 2 | 5.3%
30700 | 2 | 5.3%
32200 | 2 | 5.3%
358500 | 1 | 2.6%
21600 | 1 | 2.6%
52200 | 1 | 2.6%
62500 | 1 | 2.6%
360000 | 1 | 2.6%
Other values (23) | 23 | 60.5%

Minimum 10 values

Value | Count | Frequency (%)
6200 | 1 | 2.6%
8800 | 1 | 2.6%
9200 | 1 | 2.6%
9400 | 1 | 2.6%
14900 | 1 | 2.6%
16000 | 1 | 2.6%
17000 | 1 | 2.6%
21600 | 1 | 2.6%
30200 | 2 | 5.3%
30700 | 2 | 5.3%

Maximum 10 values

Value | Count | Frequency (%)
360000 | 1 | 2.6%
358500 | 1 | 2.6%
345600 | 1 | 2.6%
334200 | 1 | 2.6%
326400 | 2 | 5.3%
323800 | 1 | 2.6%
306100 | 1 | 2.6%
267700 | 2 | 5.3%
249200 | 1 | 2.6%
247700 | 1 | 2.6%

비고
Text

MISSING 

Distinct: 18
Distinct (%): 90.0%
Missing: 18
Missing (%): 47.4%
Memory size: 436.0 B

Length

Max length: 53
Median length: 23.5
Mean length: 21.35
Min length: 3

Characters and Unicode

Total characters: 427
Distinct characters: 86
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 85.0%

Sample

1st row: 비음용
2nd row: 공업어업용수 동일
3rd row: 생활용수20항목+5항목(BOD,COD,아질산성질소,암모니아성질소,전기전도도)
4th row: 약수터 45항목+여시니아균+우라늄
5th row: 일반세균 등 6개 항목

Value | Count | Frequency (%)
탁도 | 5 | 8.8%
분원성대장균군 | 5 | 8.8%
ph | 4 | 7.0%
총대장균군 | 4 | 7.0%
ph,bod,ss,do,총대장균군 | 3 | 5.3%
잔류염소 | 2 | 3.5%
대장균 | 2 | 3.5%
유리잔류염소 | 2 | 3.5%
대장균군 | 2 | 3.5%
과망간산칼륨소비량 | 2 | 3.5%
Other values (25) | 26 | 45.6%

Most occurring characters

Value | Count | Frequency (%)
, | 57 | 13.3%
(space) | 37 | 8.7%
[?] | 23 | 5.4%
[?] | 19 | 4.4%
[?] | 19 | 4.4%
[?] | 17 | 4.0%
[?] | 13 | 3.0%
[?] | 12 | 2.8%
O | 10 | 2.3%
D | 10 | 2.3%
Other values (76) | 210 | 49.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 266 | 62.3%
Other Punctuation | 57 | 13.3%
Uppercase Letter | 45 | 10.5%
Space Separator | 37 | 8.7%
Lowercase Letter | 9 | 2.1%
Decimal Number | 7 | 1.6%
Math Symbol | 4 | 0.9%
Open Punctuation | 1 | 0.2%
Close Punctuation | 1 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 23 | 8.6%
[?] | 19 | 7.1%
[?] | 19 | 7.1%
[?] | 17 | 6.4%
[?] | 13 | 4.9%
[?] | 12 | 4.5%
[?] | 9 | 3.4%
[?] | 8 | 3.0%
[?] | 7 | 2.6%
[?] | 6 | 2.3%
Other values (57) | 133 | 50.0%

Uppercase Letter
Value | Count | Frequency (%)
O | 10 | 22.2%
D | 10 | 22.2%
H | 10 | 22.2%
S | 8 | 17.8%
B | 4 | 8.9%
C | 2 | 4.4%
P | 1 | 2.2%

Decimal Number
Value | Count | Frequency (%)
5 | 2 | 28.6%
2 | 1 | 14.3%
0 | 1 | 14.3%
6 | 1 | 14.3%
4 | 1 | 14.3%
3 | 1 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
, | 57 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 37 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
p | 9 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 4 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 266 | 62.3%
Common | 107 | 25.1%
Latin | 54 | 12.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 23 | 8.6%
[?] | 19 | 7.1%
[?] | 19 | 7.1%
[?] | 17 | 6.4%
[?] | 13 | 4.9%
[?] | 12 | 4.5%
[?] | 9 | 3.4%
[?] | 8 | 3.0%
[?] | 7 | 2.6%
[?] | 6 | 2.3%
Other values (57) | 133 | 50.0%

Common
Value | Count | Frequency (%)
, | 57 | 53.3%
(space) | 37 | 34.6%
+ | 4 | 3.7%
5 | 2 | 1.9%
2 | 1 | 0.9%
0 | 1 | 0.9%
( | 1 | 0.9%
6 | 1 | 0.9%
) | 1 | 0.9%
4 | 1 | 0.9%

Latin
Value | Count | Frequency (%)
O | 10 | 18.5%
D | 10 | 18.5%
H | 10 | 18.5%
p | 9 | 16.7%
S | 8 | 14.8%
B | 4 | 7.4%
C | 2 | 3.7%
P | 1 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 266 | 62.3%
ASCII | 161 | 37.7%

Most frequent character per block

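ASCII
Value | Count | Frequency (%)
, | 57 | 35.4%
(space) | 37 | 23.0%
O | 10 | 6.2%
D | 10 | 6.2%
H | 10 | 6.2%
p | 9 | 5.6%
S | 8 | 5.0%
+ | 4 | 2.5%
B | 4 | 2.5%
5 | 2 | 1.2%
Other values (9) | 10 | 6.2%

Hangul
Value | Count | Frequency (%)
[?] | 23 | 8.6%
[?] | 19 | 7.1%
[?] | 19 | 7.1%
[?] | 17 | 6.4%
[?] | 13 | 4.9%
[?] | 12 | 4.5%
[?] | 9 | 3.4%
[?] | 8 | 3.0%
[?] | 7 | 2.6%
[?] | 6 | 2.3%
Other values (57) | 133 | 50.0%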

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 검체명 | 구분1 | 구분2 | 검사항목 | 수수료 | 비고
검체명 | 1.000 | 0.849 | 1.000 | 0.883 | 0.000 | 1.000
구분1 | 0.849 | 1.000 | 0.762 | 0.912 | 0.787 | 0.946
구분2 | 1.000 | 0.762 | 1.000 | 0.814 | 0.146 | 0.964
검사항목 | 0.883 | 0.912 | 0.814 | 1.000 | 0.957 | 1.000
수수료 | 0.000 | 0.787 | 0.146 | 0.957 | 1.000 | 1.000
비고 | 1.000 | 0.946 | 0.964 | 1.000 | 1.000 | 1.000

[Figure: correlation heatmap]

Variable | 구분2 | 구분1 | 검체명
구분2 | 1.000 | 0.339 | 0.804
구분1 | 0.339 | 1.000 | 0.563
검체명 | 0.804 | 0.563 | 1.000

[Figure: correlation heatmap]

Variable | 수수료 | 검체명 | 구분1 | 구분2
수수료 | 1.000 | 0.000 | 0.455 | 0.000
검체명 | 0.000 | 1.000 | 0.563 | 0.804
구분1 | 0.455 | 0.563 | 1.000 | 0.339
구분2 | 0.000 | 0.804 | 0.339 | 1.000
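
For readers who want a feel for how an association score between two categorical columns can be computed, here is a minimal sketch of the textbook (uncorrected) Cramér's V, built from a chi-square statistic over the contingency table. It is an assumption that a measure of this family underlies the categorical entries above; the report does not name its method here, and a profiler may apply a bias correction, so this sketch is not expected to reproduce the reported figures. The input uses only the 20 rows shown in the report's Sample section, not the full 38 rows, and `cramers_v` is a hypothetical helper name.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))     # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n   # count expected under independence
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0                   # a constant column carries no association
    return math.sqrt(chi2 / (n * k))

# 검체명 and 구분2 for the 20 rows listed in the Sample section (rows 0-9, 28-37).
sample_type = ["지하수"] * 10 + ["상수도"] * 10
category_2 = (
    ["<NA>"] * 7 + ["정수기 및 냉온수기", "지하수", "지하수"]
    + ["원수(하천수)", "원수(호소수)", "원수(지하수)", "정수", "정수",
       "원수(하천수)", "원수(호소수)", "원수(하천수)", "원수(호소수)", "원수(지하수)"]
)

print(f"Cramér's V on the sample rows: {cramers_v(sample_type, category_2):.3f}")
```

On these 20 rows every 구분2 value occurs with only one 검체명, so the uncorrected score is at its maximum; the full dataset and a corrected estimator can give lower values.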

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

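First rows

Index | 검체명 | 구분1 | 구분2 | 검사항목 | 수수료 | 비고
0 | 지하수 | 음용수 | <NA> | 46항목 | 267700 | <NA>
1 | 지하수 | 음용수 | <NA> | 12항목 | 52800 | <NA>
2 | 지하수 | 생활 용수 | <NA> | 20항목 | 137800 | 비음용
3 | 지하수 | 농업 용수 | <NA> | 15항목 | 109400 | 공업어업용수 동일
4 | 지하수 | 매립장 검사정 | <NA> | 25항목 | 167500 | 생활용수20항목+5항목(BOD,COD,아질산성질소,암모니아성질소,전기전도도)
5 | 지하수 | 먹는물공동시설 | <NA> | 47항목 | 306100 | 약수터 45항목+여시니아균+우라늄
6 | 지하수 | 먹는물공동시설 | <NA> | 6항목 | 30200 | 일반세균 등 6개 항목
7 | 지하수 | 학교먹는물 | 정수기 및 냉온수기 | 2항목 | 9200 | 탁도, 총대장균군
8 | 지하수 | 학교먹는물 | 지하수 | 6항목 | 30200 | <NA>
9 | 지하수 | 학교먹는물 | 지하수 | 46항목 | 267700 | <NA>

Last rows

Index | 검체명 | 구분1 | 구분2 | 검사항목 | 수수료 | 비고
28 | 상수도 | 지방 상수도 | 원수(하천수) | 31항목 | 358500 | <NA>
29 | 상수도 | 지방 상수도 | 원수(호소수) | 31항목 | 360000 | <NA>
30 | 상수도 | 지방 상수도 | 원수(지하수) | 19항목 | 136500 | <NA>
31 | 상수도 | 소규모급 시설 | 정수 | 13항목 | 56500 | <NA>
32 | 상수도 | 소규모급수시설 | 정수 | 59항목 | 326400 | <NA>
33 | 상수도 | 소규모급수시설 | 원수(하천수) | 6항목 | 30700 | pH,BOD,SS,DO,총대장균군, 분원성대장균군
34 | 상수도 | 소규모급수시설 | 원수(호소수) | 6항목 | 32200 | pH,COD,SS,DO,총대장균군, 분원성대장균군
35 | 상수도 | 소규모급수시설 | 원수(하천수) | 15항목 | 247700 | <NA>
36 | 상수도 | 소규모급수시설 | 원수(호소수) | 15항목 | 249200 | <NA>
37 | 상수도 | 소규모급수시설 | 원수(지하수) | 11항목 | 75800 | <NA>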
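
The 검사항목 column stores the number of test items as text such as `46항목`, which is why the report classifies it as Text instead of Numeric. A minimal sketch of turning those labels into integers and relating them to the fee, using the first five sample rows above; `item_count` is a hypothetical helper, and the per-item figure is only an illustration, not a statistic from the report.

```python
import re

# (검사항목, 수수료) pairs copied from the first five sample rows of the report.
rows = [
    ("46항목", 267700),
    ("12항목", 52800),
    ("20항목", 137800),
    ("15항목", 109400),
    ("25항목", 167500),
]

def item_count(label):
    """Extract the integer from a label such as '46항목'."""
    match = re.fullmatch(r"(\d+)항목", label.strip())
    if match is None:
        raise ValueError(f"unexpected 검사항목 label: {label!r}")
    return int(match.group(1))

for label, fee in rows:
    n_items = item_count(label)
    print(f"{label}: {n_items} items, {fee / n_items:.0f} per item")
```

Raising on unexpected labels, instead of silently returning a default, makes it obvious if a row deviates from the `<number>항목` pattern.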