Overview

Dataset statistics

Number of variables: 6
Number of observations: 5670
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 282.5 KiB
Average record size in memory: 51.0 B
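
The derived figures in this block follow from the raw counts. The sketch below recomputes them, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 6
n_observations = 5670
missing_cells = 1
total_size_kib = 282.5

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

A single missing cell out of tens of thousands is why the report shows "< 0.1%" rather than a rounded zero.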

Variable types

Text: 2
Categorical: 4

Dataset

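Description: Specifications of the groundwater quality monitoring network. The dataset provides the address (주소), station name (관측소명), well type (관정구분), groundwater use code (지하수용도코드), potability flag (음용여부), category (구분), and related fields. For groundwater-related information, see www.gims.go.kr.
URL: https://www.data.go.kr/data/15104449/fileData.do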

Alerts

High correlation: 관정구분 is highly overall correlated with 구분
High correlation: 구분 is highly overall correlated with 관정구분
Imbalance: 관정구분 is highly imbalanced (66.5%)
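
The 66.5% imbalance figure can be reproduced from the value counts of 관정구분 listed later in this report. The sketch below assumes the score is one minus the normalized Shannon entropy of the class frequencies. That formula is an assumption about how the profiler defines imbalance, not something this report states, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class frequencies and k is the number of classes.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Assumed definition, see the lead-in.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


# Counts for 관정구분 as listed under "Common Values": <NA>, 1, 2, 5, 6, 4.
well_type_counts = [4764, 594, 199, 43, 35, 35]
score = imbalance_score(well_type_counts)
print(f"imbalance: {score:.1%}")
```

Under this definition the literal `<NA>` category, which covers 84.0% of rows, is what drives the score up.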

Reproduction

Analysis started: 2023-12-11 23:58:20.779827
Analysis finished: 2023-12-11 23:58:21.531512
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

주소
Text

Distinct: 4714
Distinct (%): 83.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Length

Max length: 31
Median length: 29
Mean length: 20.150617
Min length: 4

Characters and Unicode

Total characters: 114254
Distinct characters: 372
Distinct categories: 7
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
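
As a small illustration of the character properties mentioned above, the sketch below counts Unicode general categories for the first sample address using Python's standard `unicodedata` module. The two-letter codes it returns correspond to the category names used in this report (`Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number). Script and block properties are not exposed by the standard library, so only categories are shown. `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample row of the 주소 column.
sample = "전라북도 임실군 덕치면 장암리 산301"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```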

Unique

Unique: 3928
Unique (%): 69.3%
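
"Distinct" and "Unique" measure different things, which is why the two counts differ. The sketch below assumes the usual reading: distinct is the number of different values, while unique is the number of values that occur exactly once. The input list is made-up toy data, and `distinct_and_unique` is a hypothetical helper name.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for an iterable of hashable values.

    distinct: how many different values appear at all.
    unique:   how many values appear exactly once.
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data (not taken from the dataset): B and C repeat, A and D do not.
toy = ["A", "B", "B", "C", "C", "C", "D"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

By this reading, the gap between the two figures for 주소 is the number of addresses that appear more than once.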

Sample

1st row: 전라북도 임실군 덕치면 장암리 산301
2nd row: 충청남도 예산군 예산읍 주교리 420
3rd row: 강원도 홍천군 서면 모곡리 산234-4
4th row: 충청북도 충주시 중앙탑면 가흥리 582
5th row: 충청북도 충주시 동량면 조동리 1370-4
Value | Count | Frequency (%)
경기도 | 805 | 3.1%
경상북도 | 619 | 2.4%
경상남도 | 532 | 2.1%
전라남도 | 501 | 2.0%
강원도 | 458 | 1.8%
전라북도 | 413 | 1.6%
충청남도 | 404 | 1.6%
대구광역시 | 401 | 1.6%
충청북도 | 355 | 1.4%
서울특별시 | 304 | 1.2%
Other values (6734) | 20858 | 81.3%

Most occurring characters

ValueCountFrequency (%)
22910
 
20.1%
4395
 
3.8%
1 4217
 
3.7%
3993
 
3.5%
- 3513
 
3.1%
3381
 
3.0%
3062
 
2.7%
2 2786
 
2.4%
2395
 
2.1%
3 2242
 
2.0%
Other values (362) 61360
53.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67205 | 58.8%
Space Separator | 22910 | 20.1%
Decimal Number | 20604 | 18.0%
Dash Punctuation | 3513 | 3.1%
Uppercase Letter | 10 | < 0.1%
Open Punctuation | 6 | < 0.1%
Close Punctuation | 6 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4395
 
6.5%
3993
 
5.9%
3381
 
5.0%
3062
 
4.6%
2395
 
3.6%
2087
 
3.1%
2078
 
3.1%
2040
 
3.0%
2035
 
3.0%
1970
 
2.9%
Other values (342) 39769
59.2%
Decimal Number
ValueCountFrequency (%)
1 4217
20.5%
2 2786
13.5%
3 2242
10.9%
4 1991
9.7%
5 1917
9.3%
6 1656
 
8.0%
7 1515
 
7.4%
0 1447
 
7.0%
8 1434
 
7.0%
9 1399
 
6.8%
Uppercase Letter
ValueCountFrequency (%)
B 3
30.0%
N 3
30.0%
L 1
 
10.0%
T 1
 
10.0%
P 1
 
10.0%
A 1
 
10.0%
Space Separator
ValueCountFrequency (%)
22910
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3513
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67201 | 58.8%
Common | 47039 | 41.2%
Latin | 10 | < 0.1%
Han | 4 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4395
 
6.5%
3993
 
5.9%
3381
 
5.0%
3062
 
4.6%
2395
 
3.6%
2087
 
3.1%
2078
 
3.1%
2040
 
3.0%
2035
 
3.0%
1970
 
2.9%
Other values (340) 39765
59.2%
Common
ValueCountFrequency (%)
22910
48.7%
1 4217
 
9.0%
- 3513
 
7.5%
2 2786
 
5.9%
3 2242
 
4.8%
4 1991
 
4.2%
5 1917
 
4.1%
6 1656
 
3.5%
7 1515
 
3.2%
0 1447
 
3.1%
Other values (4) 2845
 
6.0%
Latin
ValueCountFrequency (%)
B 3
30.0%
N 3
30.0%
L 1
 
10.0%
T 1
 
10.0%
P 1
 
10.0%
A 1
 
10.0%
Han
ValueCountFrequency (%)
2
50.0%
2
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67201 | 58.8%
ASCII | 47049 | 41.2%
CJK | 4 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22910
48.7%
1 4217
 
9.0%
- 3513
 
7.5%
2 2786
 
5.9%
3 2242
 
4.8%
4 1991
 
4.2%
5 1917
 
4.1%
6 1656
 
3.5%
7 1515
 
3.2%
0 1447
 
3.1%
Other values (10) 2855
 
6.1%
Hangul
ValueCountFrequency (%)
4395
 
6.5%
3993
 
5.9%
3381
 
5.0%
3062
 
4.6%
2395
 
3.6%
2087
 
3.1%
2078
 
3.1%
2040
 
3.0%
2035
 
3.0%
1970
 
2.9%
Other values (340) 39765
59.2%
CJK
ValueCountFrequency (%)
2
50.0%
2
50.0%

관측소명
Text

Distinct: 2916
Distinct (%): 51.4%
Missing: 1
Missing (%): < 0.1%
Memory size: 44.4 KiB

Length

Max length: 13
Median length: 4
Mean length: 4.1718116
Min length: 2
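
The mean length for this variable is consistent with dividing its total character count (23650, listed under "Characters and Unicode" for this variable) by the number of non-missing rows rather than by all rows. The sketch below checks both candidates. The idea that the single missing value is excluded from the mean is an inference from the numbers, not something the report states.

```python
# Figures copied from this report.
total_characters = 23650   # total characters for this variable
n_observations = 5670      # rows in the dataset
n_missing = 1              # missing values for this variable

mean_all_rows = total_characters / n_observations
mean_non_missing = total_characters / (n_observations - n_missing)

print(f"over all rows:         {mean_all_rows:.7f}")
print(f"over non-missing rows: {mean_non_missing:.7f}")
```

Only the second figure agrees with the reported mean length of 4.1718116.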

Characters and Unicode

Total characters: 23650
Distinct characters: 384
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1610
Unique (%): 28.4%

Sample

1st row: 임실덕치_신
2nd row: 예산예산
3rd row: 홍천서면
4th row: 충주가금
5th row: 충주동량
Value | Count | Frequency (%)
자동관측정 | 72 | 1.3%
남구대명 | 15 | 0.3%
달성논공 | 13 | 0.2%
울산서하 | 12 | 0.2%
평창평창 | 12 | 0.2%
순창순창 | 11 | 0.2%
옥천옥천 | 11 | 0.2%
서구평리 | 11 | 0.2%
신안지도 | 11 | 0.2%
여주점동 | 11 | 0.2%
Other values (2928) | 5527 | 96.9%

Most occurring characters

ValueCountFrequency (%)
944
 
4.0%
840
 
3.6%
755
 
3.2%
710
 
3.0%
709
 
3.0%
675
 
2.9%
529
 
2.2%
520
 
2.2%
470
 
2.0%
389
 
1.6%
Other values (374) 17109
72.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 22580 | 95.5%
Decimal Number | 970 | 4.1%
Space Separator | 40 | 0.2%
Connector Punctuation | 25 | 0.1%
Dash Punctuation | 21 | 0.1%
Uppercase Letter | 7 | < 0.1%
Other Symbol | 3 | < 0.1%
Open Punctuation | 2 | < 0.1%
Close Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
944
 
4.2%
840
 
3.7%
755
 
3.3%
710
 
3.1%
709
 
3.1%
675
 
3.0%
529
 
2.3%
520
 
2.3%
470
 
2.1%
389
 
1.7%
Other values (355) 16039
71.0%
Decimal Number
ValueCountFrequency (%)
1 328
33.8%
2 267
27.5%
3 248
25.6%
4 62
 
6.4%
5 27
 
2.8%
6 19
 
2.0%
8 12
 
1.2%
7 4
 
0.4%
9 3
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
S 2
28.6%
K 2
28.6%
C 2
28.6%
A 1
14.3%
Space Separator
ValueCountFrequency (%)
40
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 25
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21
100.0%
Other Symbol
ValueCountFrequency (%)
3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 22583 | 95.5%
Common | 1060 | 4.5%
Latin | 7 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
944
 
4.2%
840
 
3.7%
755
 
3.3%
710
 
3.1%
709
 
3.1%
675
 
3.0%
529
 
2.3%
520
 
2.3%
470
 
2.1%
389
 
1.7%
Other values (356) 16042
71.0%
Common
ValueCountFrequency (%)
1 328
30.9%
2 267
25.2%
3 248
23.4%
4 62
 
5.8%
40
 
3.8%
5 27
 
2.5%
_ 25
 
2.4%
- 21
 
2.0%
6 19
 
1.8%
8 12
 
1.1%
Other values (4) 11
 
1.0%
Latin
ValueCountFrequency (%)
S 2
28.6%
K 2
28.6%
C 2
28.6%
A 1
14.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 22580 | 95.5%
ASCII | 1067 | 4.5%
None | 3 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
944
 
4.2%
840
 
3.7%
755
 
3.3%
710
 
3.1%
709
 
3.1%
675
 
3.0%
529
 
2.3%
520
 
2.3%
470
 
2.1%
389
 
1.7%
Other values (355) 16039
71.0%
ASCII
ValueCountFrequency (%)
1 328
30.7%
2 267
25.0%
3 248
23.2%
4 62
 
5.8%
40
 
3.7%
5 27
 
2.5%
_ 25
 
2.3%
- 21
 
2.0%
6 19
 
1.8%
8 12
 
1.1%
Other values (8) 18
 
1.7%
None
ValueCountFrequency (%)
3
100.0%

관정구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value | Count
<NA> | 4764
1 | 594
2 | 199
5 | 43
6 | 35

Length

Max length: 4
Median length: 4
Mean length: 3.5206349
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
<NA> | 4764 | 84.0%
1 | 594 | 10.5%
2 | 199 | 3.5%
5 | 43 | 0.8%
6 | 35 | 0.6%
4 | 35 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 4764 | 84.0%
1 | 594 | 10.5%
2 | 199 | 3.5%
5 | 43 | 0.8%
6 | 35 | 0.6%
4 | 35 | 0.6%

지하수용도코드
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value | Count
1 | 4246
<NA> | 612
2 | 486
3 | 322
4 | 4

Length

Max length: 4
Median length: 1
Mean length: 1.3238095
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 4246 | 74.9%
<NA> | 612 | 10.8%
2 | 486 | 8.6%
3 | 322 | 5.7%
4 | 4 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 4246 | 74.9%
na | 612 | 10.8%
2 | 486 | 8.6%
3 | 322 | 5.7%
4 | 4 | 0.1%

음용여부
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value | Count
0 | 2777
1 | 1666
<NA> | 1227

Length

Max length: 4
Median length: 1
Mean length: 1.6492063
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
0 | 2777 | 49.0%
1 | 1666 | 29.4%
<NA> | 1227 | 21.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 2777 | 49.0%
1 | 1666 | 29.4%
na | 1227 | 21.6%

구분
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value | Count
일반지역 | 2846
오염우려지역 | 2146
국가관측망 | 678

Length

Max length: 6
Median length: 4
Mean length: 4.8765432
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국가관측망
2nd row: 국가관측망
3rd row: 국가관측망
4th row: 국가관측망
5th row: 국가관측망

Common Values

Value | Count | Frequency (%)
일반지역 | 2846 | 50.2%
오염우려지역 | 2146 | 37.8%
국가관측망 | 678 | 12.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
일반지역 | 2846 | 50.2%
오염우려지역 | 2146 | 37.8%
국가관측망 | 678 | 12.0%

Correlations

Variable | 관정구분 | 지하수용도코드 | 음용여부 | 구분
관정구분 | 1.000 | 0.000 | 0.121 | 0.561
지하수용도코드 | 0.000 | 1.000 | 0.471 | 0.295
음용여부 | 0.121 | 0.471 | 1.000 | 0.180
구분 | 0.561 | 0.295 | 0.180 | 1.000

Variable | 구분 | 관정구분 | 지하수용도코드 | 음용여부
구분 | 1.000 | 0.505 | 0.284 | 0.297
관정구분 | 0.505 | 1.000 | 0.000 | 0.077
지하수용도코드 | 0.284 | 0.000 | 1.000 | 0.318
음용여부 | 0.297 | 0.077 | 0.318 | 1.000

Variable | 관정구분 | 지하수용도코드 | 음용여부 | 구분
관정구분 | 1.000 | 0.000 | 0.077 | 0.505
지하수용도코드 | 0.000 | 1.000 | 0.318 | 0.284
음용여부 | 0.077 | 0.318 | 1.000 | 0.297
구분 | 0.505 | 0.284 | 0.297 | 1.000
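
All four variables in these matrices are categorical, and association between categorical columns is commonly scored with Cramér's V, which ranges from 0 (no association) to 1 (one column fully determines the other). Whether this report used exactly this statistic, or a bias-corrected variant, is an assumption. The sketch below implements the plain, uncorrected form on made-up contingency tables; it does not reproduce the values above, which would need the raw data. `cramers_v` is a hypothetical helper name.

```python
import math


def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table.

    `table` is a list of rows of non-negative counts, where rows are the
    categories of one variable and columns the categories of the other.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * k))


# Toy tables (not from the dataset).
perfect = [[10, 0], [0, 10]]      # each row maps to exactly one column
independent = [[5, 5], [5, 5]]    # rows carry no information about columns

print(cramers_v(perfect))
print(cramers_v(independent))
```

Because the statistic is symmetric in its two variables, the matrices above mirror across the diagonal.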

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Row | 주소 | 관측소명 | 관정구분 | 지하수용도코드 | 음용여부 | 구분
0 | 전라북도 임실군 덕치면 장암리 산301 | 임실덕치_신 | 2 | 1 | <NA> | 국가관측망
1 | 충청남도 예산군 예산읍 주교리 420 | 예산예산 | 1 | 1 | <NA> | 국가관측망
2 | 강원도 홍천군 서면 모곡리 산234-4 | 홍천서면 | 1 | 1 | <NA> | 국가관측망
3 | 충청북도 충주시 중앙탑면 가흥리 582 | 충주가금 | 1 | 1 | <NA> | 국가관측망
4 | 충청북도 충주시 동량면 조동리 1370-4 | 충주동량 | 1 | 1 | <NA> | 국가관측망
5 | 강원도 춘천시 북산면 추곡리 108-1 | 춘천북산 | 1 | 1 | <NA> | 국가관측망
6 | 경상남도 창녕군 영산면 죽사리 1456-41 | 창녕영산 | 1 | 1 | <NA> | 국가관측망
7 | 전라북도 남원시 도통동 554 | 남원도통 | 1 | 1 | <NA> | 국가관측망
8 | 충청북도 옥천군 청성면 묘금리 19-1 | 옥천청성 | 1 | 1 | <NA> | 국가관측망
9 | 경상북도 경주시 외동읍 활성리 948-1 | 경주외동 | 2 | 1 | <NA> | 국가관측망

Row | 주소 | 관측소명 | 관정구분 | 지하수용도코드 | 음용여부 | 구분
5660 | 경상북도 상주시 공검면 양정리 898 | 상주2 | 1 | <NA> | 0 | 오염우려지역
5661 | 태창광산 | | <NA> | 1 | 0 | 오염우려지역
5662 | 경상남도 함양군 병곡면 송평리 628-3 | 함양병곡 | 2 | 1 | 1 | 국가관측망
5663 | 경상남도 창원시 성산구 반림동 6-4 | 창원반림 | <NA> | 1 | 1 | 오염우려지역
5664 | 경상남도 창원시 반림동 6-4 | 창원반림 | <NA> | 1 | 1 | 오염우려지역
5665 | 전라남도 완도군 완도읍 가용리 172 | 완도가용 | 5 | <NA> | <NA> | 오염우려지역
5666 | 울산광역시 울주군 두서면 서하리 86 | 울산서하 | 6 | <NA> | <NA> | 오염우려지역
5667 | 경상북도 청도군 청도읍 신도리 50-2 | 청도신도 | 4 | <NA> | <NA> | 오염우려지역
5668 | 경기도 안성시 미양면 계륵리 268 | 안성미양 | <NA> | 2 | 0 | 오염우려지역
5669 | 경기도 안성시 신건지동 60-1 | 안성신건 | <NA> | 2 | 0 | 오염우려지역