Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 4531
Missing cells (%): 6.5%
Duplicate rows: 415
Duplicate rows (%): 4.2%
Total size in memory: 664.1 KiB
Average record size in memory: 68.0 B
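These overview figures can be checked against the per-column counts reported in the variable sections. The 4531 missing cells are the sum of the per-column missing counts, and 6.5% is that sum divided by all 7 × 10000 cells. The average record size is the total memory divided by the number of rows. The sketch below is a minimal Python version that uses only numbers from this report; the dictionary layout and helper name are illustrative.

```python
# Figures copied from this report; the dict layout and helper are illustrative.
N_VARIABLES = 7
N_OBSERVATIONS = 10_000
MISSING_BY_COLUMN = {
    "년사용량": 538,     # annual usage
    "개발일자": 4,       # development date
    "심도": 908,         # depth
    "양수능력": 681,     # pumping capacity
    "취수계획량": 2400,  # planned intake
}
TOTAL_MEMORY_KIB = 664.1

def missing_cell_summary(missing_by_column, n_vars, n_obs):
    """Return (total missing cells, missing cells as a fraction of all cells)."""
    total = sum(missing_by_column.values())
    return total, total / (n_vars * n_obs)

total_missing, missing_share = missing_cell_summary(
    MISSING_BY_COLUMN, N_VARIABLES, N_OBSERVATIONS
)
avg_record_bytes = TOTAL_MEMORY_KIB * 1024 / N_OBSERVATIONS

print(total_missing, f"{missing_share:.1%}")  # 4531 6.5%
print(f"{avg_record_bytes:.1f} B")            # 68.0 B
```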

Variable types

Text: 1
Categorical: 1
Numeric: 4
DateTime: 1

Dataset

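Description: Information on groundwater wells nationwide: address, groundwater use code, annual usage, development date, depth, pumping capacity, and planned intake volume.
Author: 한국수자원공사 (Korea Water Resources Corporation, K-water)
URL: https://www.data.go.kr/data/3074803/fileData.do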

Alerts

Dataset has 415 (4.2%) duplicate rows [Duplicates]
심도 (depth) is highly overall correlated with 양수능력 (pumping capacity) and 1 other field [High correlation]
양수능력 (pumping capacity) is highly overall correlated with 심도 (depth) and 1 other field [High correlation]
취수계획량 (planned intake) is highly overall correlated with 심도 (depth) and 1 other field [High correlation]
지하수용도 (groundwater use) is highly imbalanced (52.8%) [Imbalance]
년사용량 (annual usage) has 538 (5.4%) missing values [Missing]
심도 has 908 (9.1%) missing values [Missing]
양수능력 has 681 (6.8%) missing values [Missing]
취수계획량 has 2400 (24.0%) missing values [Missing]
년사용량 has 152 (1.5%) zeros [Zeros]
심도 has 191 (1.9%) zeros [Zeros]
취수계획량 has 395 (4.0%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 07:29:14.359644
Analysis finished: 2023-12-12 07:29:18.100818
Duration: 3.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

주소 (address)
Text

Distinct: 1084
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 12.3993
Min length: 10

Characters and Unicode

Total characters: 123993
Distinct characters: 269
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
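Python's standard `unicodedata.category` returns the general-category codes behind the names used in this report: `Lo` (Other Letter, which covers Hangul syllables), `Zs` (Space Separator) and `Nd` (Decimal Number). The sketch below counts characters per category for two sample addresses. It is not ydata-profiling's own implementation, and the name mapping is written by hand.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in the 주소 column.
CATEGORY_NAMES = {"Lo": "Other Letter", "Zs": "Space Separator", "Nd": "Decimal Number"}

def category_counts(texts):
    """Count characters in `texts` by Unicode general category code."""
    counts = Counter()
    for text in texts:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two addresses from the sample rows of this report.
samples = ["대구광역시 달성군 옥포읍", "광주광역시 동구 용연동"]
counts = category_counts(samples)
for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
# Other Letter 21
# Space Separator 4
```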

Unique

Unique: 210
Unique (%): 2.1%

Sample

1st row: 대구광역시 달성군 옥포읍
2nd row: 광주광역시 동구 용연동
3rd row: 인천광역시 강화군 내가면
4th row: 경기도 수원시 천천동
5th row: 대전광역시 서구 둔산동

Word | Count | Frequency (%)
대전광역시 | 2141 | 7.1%
인천광역시 | 1992 | 6.6%
경기도 | 1794 | 6.0%
광주광역시 | 1329 | 4.4%
강화군 | 1211 | 4.0%
동구 | 850 | 2.8%
서울특별시 | 805 | 2.7%
평택시 | 726 | 2.4%
부산광역시 | 696 | 2.3%
중구 | 682 | 2.3%
Other values (1017) | 17774 | 59.2%
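The word counts in this table sum to 30000. That total is consistent with 20000 spaces separating words across 10000 addresses, and each frequency is a share of it (2141 / 30000 ≈ 7.1%). The sketch below shows one way to build such a table by splitting on whitespace. The helper name is hypothetical, and the report's own tokenization may differ.

```python
from collections import Counter

def word_frequencies(addresses):
    """Return (word, count, share of all words) tuples, most common first."""
    words = Counter(word for address in addresses for word in address.split())
    total = sum(words.values())
    return [(word, count, count / total) for word, count in words.most_common()]

# Three addresses from the sample rows of this report.
addresses = [
    "인천광역시 강화군 내가면",
    "인천광역시 옹진군 덕적면",
    "인천광역시 강화군 길상면",
]
rows = word_frequencies(addresses)
for word, count, share in rows[:2]:
    print(word, count, f"{share:.1%}")
# 인천광역시 3 33.3%
# 강화군 2 22.2%
```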

Most occurring characters

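Character | Count | Frequency (%)
[space] | 20000 | 16.1%
(unreadable) | 10011 | 8.1%
(unreadable) | 9665 | 7.8%
(unreadable) | 8759 | 7.1%
(unreadable) | 7415 | 6.0%
(unreadable) | 6762 | 5.5%
(unreadable) | 3536 | 2.9%
(unreadable) | 2759 | 2.2%
(unreadable) | 2387 | 1.9%
(unreadable) | 2354 | 1.9%
Other values (259) | 50345 | 40.6%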

Most occurring categories

Category | Count | Frequency (%)
Other Letter | 103870 | 83.8%
Space Separator | 20000 | 16.1%
Decimal Number | 123 | 0.1%

Most frequent character per category

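Other Letter
Character | Count | Frequency (%)
(unreadable) | 10011 | 9.6%
(unreadable) | 9665 | 9.3%
(unreadable) | 8759 | 8.4%
(unreadable) | 7415 | 7.1%
(unreadable) | 6762 | 6.5%
(unreadable) | 3536 | 3.4%
(unreadable) | 2759 | 2.7%
(unreadable) | 2387 | 2.3%
(unreadable) | 2354 | 2.3%
(unreadable) | 2273 | 2.2%
Other values (251) | 47949 | 46.2%

Decimal Number
Character | Count | Frequency (%)
3 | 35 | 28.5%
2 | 31 | 25.2%
1 | 30 | 24.4%
7 | 10 | 8.1%
4 | 9 | 7.3%
5 | 6 | 4.9%
6 | 2 | 1.6%

Space Separator
Character | Count | Frequency (%)
[space] | 20000 | 100.0%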

Most occurring scripts

Script | Count | Frequency (%)
Hangul | 103870 | 83.8%
Common | 20123 | 16.2%

Most frequent character per script

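Hangul
Character | Count | Frequency (%)
(unreadable) | 10011 | 9.6%
(unreadable) | 9665 | 9.3%
(unreadable) | 8759 | 8.4%
(unreadable) | 7415 | 7.1%
(unreadable) | 6762 | 6.5%
(unreadable) | 3536 | 3.4%
(unreadable) | 2759 | 2.7%
(unreadable) | 2387 | 2.3%
(unreadable) | 2354 | 2.3%
(unreadable) | 2273 | 2.2%
Other values (251) | 47949 | 46.2%

Common
Character | Count | Frequency (%)
[space] | 20000 | 99.4%
3 | 35 | 0.2%
2 | 31 | 0.2%
1 | 30 | 0.1%
7 | 10 | < 0.1%
4 | 9 | < 0.1%
5 | 6 | < 0.1%
6 | 2 | < 0.1%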

Most occurring blocks

Block | Count | Frequency (%)
Hangul | 103870 | 83.8%
ASCII | 20123 | 16.2%

Most frequent character per block

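ASCII
Character | Count | Frequency (%)
[space] | 20000 | 99.4%
3 | 35 | 0.2%
2 | 31 | 0.2%
1 | 30 | 0.1%
7 | 10 | < 0.1%
4 | 9 | < 0.1%
5 | 6 | < 0.1%
6 | 2 | < 0.1%

Hangul
Character | Count | Frequency (%)
(unreadable) | 10011 | 9.6%
(unreadable) | 9665 | 9.3%
(unreadable) | 8759 | 8.4%
(unreadable) | 7415 | 7.1%
(unreadable) | 6762 | 6.5%
(unreadable) | 3536 | 3.4%
(unreadable) | 2759 | 2.7%
(unreadable) | 2387 | 2.3%
(unreadable) | 2354 | 2.3%
(unreadable) | 2273 | 2.2%
Other values (251) | 47949 | 46.2%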

지하수용도 (groundwater use)
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.9977
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 농업용 (agricultural)
2nd row: 생활용 (domestic)
3rd row: 생활용 (domestic)
4th row: 생활용 (domestic)
5th row: 생활용 (domestic)

Common Values

Value | Count | Frequency (%)
생활용 (domestic) | 6002 | 60.0%
농업용 (agricultural) | 3800 | 38.0%
공업용 (industrial) | 171 | 1.7%
기타 (other) | 25 | 0.2%
<NA> | 2 | < 0.1%
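The 52.8% imbalance score can be reproduced from these counts as one minus the normalized Shannon entropy, 1 − H / ln(k), where k = 5 categories. This sketch assumes that is how ydata-profiling defines the score. The assumption rests only on the fact that the formula reproduces the reported value.

```python
import math

def imbalance_score(counts):
    """1 minus the Shannon entropy of `counts`, normalized by log(number of classes)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))

# Category counts for 지하수용도: 생활용, 농업용, 공업용, 기타, <NA>
counts = [6002, 3800, 171, 25, 2]
score = imbalance_score(counts)
print(f"{score:.1%}")  # 52.8%
```

A perfectly balanced column scores 0, and the score approaches 1 as one category dominates.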

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values listed above]

년사용량 (annual usage)
Real number (ℝ)

MISSING  ZEROS 

Distinct: 1683
Distinct (%): 17.8%
Missing: 538
Missing (%): 5.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2504.1976
Minimum: 0
Maximum: 244704
Zeros: 152
Zeros (%): 1.5%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 180
Q1: 360
median: 990
Q3: 1739
95-th percentile: 10800
Maximum: 244704
Range: 244704
Interquartile range (IQR): 1379

Descriptive statistics

Standard deviation: 7608.7786
Coefficient of variation (CV): 3.0384098
Kurtosis: 281.55585
Mean: 2504.1976
Median Absolute Deviation (MAD): 681
Skewness: 13.327442
Sum: 23694718
Variance: 57893512
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
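Several of these statistics follow directly from others, so they can be cross-checked. The mean is Sum divided by the 9462 non-missing values. The CV is standard deviation divided by mean, the variance is the square of the standard deviation, and the IQR is Q3 − Q1. The sketch below uses only figures reported here; the helper name `summary_check` is hypothetical.

```python
def summary_check(n_rows, n_missing, total, std, q1, q3):
    """Recompute mean, CV, variance and IQR from reported summary figures."""
    n_present = n_rows - n_missing  # statistics are over non-missing values
    mean = total / n_present
    return {
        "mean": mean,
        "cv": std / mean,        # coefficient of variation = std / mean
        "variance": std ** 2,    # equals the reported variance up to rounding of std
        "iqr": q3 - q1,          # interquartile range = Q3 - Q1
    }

# 년사용량 (annual usage) figures from this report.
annual = summary_check(n_rows=10_000, n_missing=538, total=23_694_718,
                       std=7608.7786, q1=360, q3=1739)
print(round(annual["mean"], 4), round(annual["cv"], 4), annual["iqr"])
# 2504.1976 3.0384 1379
```

The same helper applies to the other numeric columns. For example, 심도's Sum of 619101 over 9092 non-missing values gives its reported mean of 68.092939.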
Common values
Value | Count | Frequency (%)
277 | 483 | 4.8%
600 | 235 | 2.4%
360 | 232 | 2.3%
285 | 192 | 1.9%
280 | 175 | 1.8%
365 | 160 | 1.6%
0 | 152 | 1.5%
256 | 127 | 1.3%
300 | 122 | 1.2%
673 | 119 | 1.2%
Other values (1673) | 7465 | 74.7%
(Missing) | 538 | 5.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 152 | 1.5%
1 | 18 | 0.2%
2 | 1 | < 0.1%
3 | 4 | < 0.1%
4 | 2 | < 0.1%
6 | 1 | < 0.1%
7 | 4 | < 0.1%
9 | 2 | < 0.1%
10 | 6 | 0.1%
11 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
244704 | 1 | < 0.1%
194910 | 1 | < 0.1%
194180 | 1 | < 0.1%
182500 | 1 | < 0.1%
144000 | 1 | < 0.1%
127750 | 1 | < 0.1%
115705 | 1 | < 0.1%
109500 | 1 | < 0.1%
102200 | 1 | < 0.1%
98806 | 1 | < 0.1%

개발일자 (development date)
DateTime

Distinct: 4191
Distinct (%): 41.9%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB
Minimum: 1900-01-01 00:00:00
Maximum: 2020-12-31 00:00:00
[Figure: histogram with fixed size bins (bins=50)]
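The minimum of 1900-01-01, and the 1901-01-01 that recurs in the sample and duplicate rows, may be placeholder values rather than real development dates. The source does not say so, and treating them as placeholders is an assumption. If an analysis adopts that assumption, the minimal sketch below masks such dates before use. The cutoff year and helper name are hypothetical.

```python
from datetime import date

# Hypothetical cutoff for illustration; the dataset description defines none.
SUSPECT_BEFORE = date(1910, 1, 1)

def mask_suspect_dates(iso_strings, cutoff=SUSPECT_BEFORE):
    """Parse ISO-8601 date strings; replace dates earlier than `cutoff` with None."""
    masked = []
    for s in iso_strings:
        d = date.fromisoformat(s)
        masked.append(d if d >= cutoff else None)
    return masked

# 개발일자 values from this report (sample rows, duplicate rows, reported minimum).
raw = ["2014-10-02", "1901-01-01", "2002-04-29", "1900-01-01", "2020-03-02"]
masked = mask_suspect_dates(raw)
print(sum(d is None for d in masked))  # 2
```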

심도 (depth)
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 214
Distinct (%): 2.4%
Missing: 908
Missing (%): 9.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.092939
Minimum: 0
Maximum: 1015
Zeros: 191
Zeros (%): 1.9%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB
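The percentages in these variable sections use different denominators. Distinct (%) is relative to the non-missing values. Zeros (%) and the value-frequency tables are relative to all 10000 rows. The sketch below checks this with 심도's reported counts; it is plain arithmetic on figures from this report.

```python
N_ROWS = 10_000
# 심도 (depth) counts from this report.
DEPTH_MISSING = 908
DEPTH_DISTINCT = 214
DEPTH_ZEROS = 191

n_present = N_ROWS - DEPTH_MISSING  # 9092 non-missing depth values

distinct_pct = DEPTH_DISTINCT / n_present  # denominator: non-missing values
zeros_pct = DEPTH_ZEROS / N_ROWS           # denominator: all rows
zeros_pct_alt = DEPTH_ZEROS / n_present    # what a non-missing denominator would give

print(f"{distinct_pct:.1%} {zeros_pct:.1%} {zeros_pct_alt:.1%}")  # 2.4% 1.9% 2.1%
```

Only the first two results match the report (2.4% and 1.9%), which shows which denominator each figure uses.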

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 25
median: 33
Q3: 100
95-th percentile: 170
Maximum: 1015
Range: 1015
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 74.461013
Coefficient of variation (CV): 1.0935203
Kurtosis: 31.908849
Mean: 68.092939
Median Absolute Deviation (MAD): 19.5
Skewness: 4.1985705
Sum: 619101
Variance: 5544.4424
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
30.0 | 1976 | 19.8%
100.0 | 1671 | 16.7%
20.0 | 837 | 8.4%
150.0 | 405 | 4.0%
25.0 | 348 | 3.5%
50.0 | 327 | 3.3%
120.0 | 281 | 2.8%
80.0 | 268 | 2.7%
70.0 | 237 | 2.4%
40.0 | 197 | 2.0%
Other values (204) | 2545 | 25.4%
(Missing) | 908 | 9.1%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 191 | 1.9%
1.0 | 8 | 0.1%
1.5 | 1 | < 0.1%
2.0 | 7 | 0.1%
2.2 | 1 | < 0.1%
3.0 | 9 | 0.1%
3.5 | 4 | < 0.1%
3.6 | 1 | < 0.1%
4.0 | 11 | 0.1%
4.5 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1015.0 | 1 | < 0.1%
1000.0 | 5 | 0.1%
900.0 | 1 | < 0.1%
895.0 | 1 | < 0.1%
810.0 | 1 | < 0.1%
800.0 | 1 | < 0.1%
750.0 | 1 | < 0.1%
700.0 | 3 | < 0.1%
540.0 | 3 | < 0.1%
520.0 | 4 | < 0.1%

양수능력 (pumping capacity)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 331
Distinct (%): 3.6%
Missing: 681
Missing (%): 6.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.926133
Minimum: 0
Maximum: 2400
Zeros: 6
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 23.05
median: 32
Q3: 60
95-th percentile: 140
Maximum: 2400
Range: 2400
Interquartile range (IQR): 36.95

Descriptive statistics

Standard deviation: 74.401809
Coefficient of variation (CV): 1.405767
Kurtosis: 234.69839
Mean: 52.926133
Median Absolute Deviation (MAD): 12
Skewness: 11.130637
Sum: 493218.63
Variance: 5535.6292
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
20.0 | 1296 | 13.0%
30.0 | 1038 | 10.4%
40.0 | 705 | 7.0%
25.0 | 704 | 7.0%
50.0 | 601 | 6.0%
29.0 | 314 | 3.1%
10.0 | 293 | 2.9%
60.0 | 288 | 2.9%
80.0 | 249 | 2.5%
90.0 | 212 | 2.1%
Other values (321) | 3619 | 36.2%
(Missing) | 681 | 6.8%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 6 | 0.1%
0.3 | 1 | < 0.1%
0.5 | 1 | < 0.1%
1.0 | 16 | 0.2%
1.2 | 3 | < 0.1%
1.5 | 4 | < 0.1%
2.0 | 63 | 0.6%
2.2 | 2 | < 0.1%
2.5 | 3 | < 0.1%
2.7 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2400.0 | 1 | < 0.1%
2000.0 | 1 | < 0.1%
1859.0 | 1 | < 0.1%
1450.0 | 1 | < 0.1%
1100.0 | 1 | < 0.1%
1000.0 | 2 | < 0.1%
910.0 | 1 | < 0.1%
900.0 | 1 | < 0.1%
870.0 | 1 | < 0.1%
831.0 | 1 | < 0.1%

취수계획량 (planned intake)
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 137
Distinct (%): 1.8%
Missing: 2400
Missing (%): 24.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.249632
Minimum: 0
Maximum: 1500
Zeros: 395
Zeros (%): 4.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 10
median: 20
Q3: 40
95-th percentile: 100
Maximum: 1500
Range: 1500
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 51.185032
Coefficient of variation (CV): 1.5394165
Kurtosis: 181.43061
Mean: 33.249632
Median Absolute Deviation (MAD): 10
Skewness: 9.7165896
Sum: 252697.2
Variance: 2619.9075
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
20.0 | 1308 | 13.1%
10.0 | 1055 | 10.5%
30.0 | 839 | 8.4%
50.0 | 585 | 5.9%
5.0 | 426 | 4.3%
0.0 | 395 | 4.0%
15.0 | 392 | 3.9%
40.0 | 371 | 3.7%
80.0 | 217 | 2.2%
1.0 | 209 | 2.1%
Other values (127) | 1803 | 18.0%
(Missing) | 2400 | 24.0%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 395 | 4.0%
0.1 | 3 | < 0.1%
0.2 | 6 | 0.1%
0.3 | 2 | < 0.1%
0.4 | 4 | < 0.1%
0.5 | 23 | 0.2%
0.6 | 2 | < 0.1%
0.7 | 4 | < 0.1%
0.8 | 6 | 0.1%
0.9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1500.0 | 1 | < 0.1%
1200.0 | 1 | < 0.1%
1000.0 | 1 | < 0.1%
900.0 | 1 | < 0.1%
800.0 | 1 | < 0.1%
770.0 | 1 | < 0.1%
700.0 | 1 | < 0.1%
600.0 | 2 | < 0.1%
520.0 | 1 | < 0.1%
500.0 | 3 | < 0.1%

Interactions

[Figure: 16 pairwise interaction plots between the numeric variables 년사용량, 심도, 양수능력 and 취수계획량]

Correlations

[Figure: correlation heatmap]
Variable | 지하수용도 | 년사용량 | 심도 | 양수능력 | 취수계획량
지하수용도 | 1.000 | 0.325 | 0.244 | 0.125 | 0.135
년사용량 | 0.325 | 1.000 | 0.380 | 0.834 | 0.854
심도 | 0.244 | 0.380 | 1.000 | 0.352 | 0.315
양수능력 | 0.125 | 0.834 | 0.352 | 1.000 | 0.975
취수계획량 | 0.135 | 0.854 | 0.315 | 0.975 | 1.000

[Figure: correlation heatmap]
Variable | 년사용량 | 심도 | 양수능력 | 취수계획량 | 지하수용도
년사용량 | 1.000 | 0.324 | 0.397 | 0.306 | 0.150
심도 | 0.324 | 1.000 | 0.592 | 0.500 | 0.148
양수능력 | 0.397 | 0.592 | 1.000 | 0.604 | 0.080
취수계획량 | 0.306 | 0.500 | 0.604 | 1.000 | 0.086
지하수용도 | 0.150 | 0.148 | 0.080 | 0.086 | 1.000
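The strongest pairs in a matrix like the first one above can be listed by scanning its upper triangle. The extracted report does not say which correlation method produced each matrix. The 0.8 threshold in the sketch below is arbitrary and is not meant to reproduce the report's High correlation alerts.

```python
# Values from the first correlation matrix above.
COLUMNS = ["지하수용도", "년사용량", "심도", "양수능력", "취수계획량"]
MATRIX = [
    [1.000, 0.325, 0.244, 0.125, 0.135],
    [0.325, 1.000, 0.380, 0.834, 0.854],
    [0.244, 0.380, 1.000, 0.352, 0.315],
    [0.125, 0.834, 0.352, 1.000, 0.975],
    [0.135, 0.854, 0.315, 0.975, 1.000],
]

def strong_pairs(columns, matrix, threshold):
    """Return (col_a, col_b, r) for upper-triangle entries with |r| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # skip the diagonal and mirror half
            if abs(matrix[i][j]) >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return sorted(pairs, key=lambda p: -abs(p[2]))

for a, b, r in strong_pairs(COLUMNS, MATRIX, 0.8):
    print(a, b, r)
# 양수능력 취수계획량 0.975
# 년사용량 취수계획량 0.854
# 년사용량 양수능력 0.834
```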

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
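Nullity correlation can be computed as the Pearson correlation between the columns' missing-value indicators, as the missingno library does. That the report's heatmap is computed exactly this way is an assumption. The sketch below uses pandas on a hypothetical toy frame, not data from this report.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 심도 and 양수능력 are always missing together,
# while 취수계획량 is missing in one extra row.
df = pd.DataFrame({
    "심도": [np.nan, 30.0, 35.0, np.nan, 50.0],
    "양수능력": [np.nan, 20.0, 15.0, np.nan, 26.0],
    "취수계획량": [np.nan, np.nan, 5.0, np.nan, 10.0],
})

# isna() gives True where a value is missing; corr() then computes the
# Pearson correlation between those indicator columns.
nullity_corr = df.isna().corr()
print(nullity_corr.round(3))
```

A value of 1 means two columns are always missing together. A column with no missing values has constant indicators, so its correlation is undefined.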

Sample

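First rows
(index) | 주소 | 지하수용도 | 년사용량 | 개발일자 | 심도 | 양수능력 | 취수계획량
19188 | 대구광역시 달성군 옥포읍 | 농업용 | 356 | 2014-10-02 | 100.0 | <NA> | <NA>
39405 | 광주광역시 동구 용연동 | 생활용 | 280 | 1986-03-06 | 30.0 | 20.0 | 1.0
29748 | 인천광역시 강화군 내가면 | 생활용 | 270 | 2016-10-20 | 35.0 | 15.0 | 5.0
80090 | 경기도 수원시 천천동 | 생활용 | 277 | 1988-01-01 | 50.0 | 26.0 | <NA>
61571 | 대전광역시 서구 둔산동 | 생활용 | 1 | 2001-02-21 | 80.0 | 60.0 | 30.0
37522 | 인천광역시 옹진군 덕적면 | 생활용 | 256 | 2012-02-01 | 25.0 | 25.0 | 10.0
73819 | 울산광역시 울주군 두동면 | 농업용 | 1150 | 2019-09-18 | 130.0 | 33.0 | 20.0
10410 | 부산광역시 해운대구 중동 | 생활용 | 2628 | 1994-11-03 | 50.0 | 70.0 | 30.0
79544 | 경기도 수원시 이목동 | 생활용 | 277 | 2002-11-20 | <NA> | <NA> | <NA>
4814 | 서울특별시 서초구 내곡동 | 생활용 | 252 | 1994-11-04 | 18.0 | 30.0 | 10.0

Last rows
(index) | 주소 | 지하수용도 | 년사용량 | 개발일자 | 심도 | 양수능력 | 취수계획량
55396 | 대전광역시 동구 자양동 | 생활용 | 277 | 1980-08-14 | <NA> | 20.0 | <NA>
3734 | 서울특별시 구로구 구로동 | 생활용 | 10 | 1901-01-01 | 20.0 | 29.6 | <NA>
28515 | 인천광역시 강화군 길상면 | 농업용 | 600 | 2002-04-29 | 100.0 | 40.0 | 30.0
92789 | 경기도 평택시 용이동 | 농업용 | 514 | 2003-01-20 | 30.0 | 20.0 | 10.0
20153 | 대구광역시 달성군 화원읍 | 농업용 | 6300 | 2012-08-31 | 100.0 | 90.0 | 90.0
71395 | 대전광역시 대덕구 읍내동 | 생활용 | 365 | 1990-08-01 | 70.0 | 10.0 | <NA>
46474 | 광주광역시 광산구 동호동 | 농업용 | <NA> | 2018-03-09 | <NA> | <NA> | <NA>
81752 | 경기도 성남시 석운동 | 생활용 | 1080 | 2004-12-28 | 100.0 | 65.0 | 5.0
68380 | 대전광역시 유성구 죽동 | 생활용 | 360 | 1901-01-01 | 30.0 | 25.0 | <NA>
13133 | 부산광역시 기장군 기장읍 | 생활용 | 671 | 2015-03-12 | 150.0 | 20.0 | 3.0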

Duplicate rows

Most frequently occurring

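(index) | 주소 | 지하수용도 | 년사용량 | 개발일자 | 심도 | 양수능력 | 취수계획량 | # duplicates
363 | 인천광역시 강화군 선원면 | 농업용 | 2219 | 2020-03-02 | 50.0 | 30.0 | 40.0 | 43
345 | 인천광역시 강화군 불은면 | 농업용 | 2168 | 2020-03-02 | 30.0 | 20.0 | 15.0 | 35
373 | 인천광역시 강화군 송해면 | 생활용 | 1825 | 1901-01-01 | <NA> | <NA> | <NA> | 31
365 | 인천광역시 강화군 선원면 | 생활용 | 2078 | 2020-03-02 | 50.0 | 30.0 | 40.0 | 29
136 | 광주광역시 광산구 유계동 | 농업용 | <NA> | 2018-03-09 | <NA> | <NA> | <NA> | 27
122 | 광주광역시 광산구 연산동 | 농업용 | <NA> | 2018-03-09 | <NA> | <NA> | <NA> | 24
78 | 경기도 평택시 지제동 | 생활용 | 1319 | 2003-01-22 | 30.0 | 20.0 | 10.0 | 22
251 | 대전광역시 유성구 구암동 | 생활용 | 360 | 1901-01-01 | 30.0 | 25.0 | <NA> | 14
331 | 인천광역시 강화군 강화읍 | 생활용 | 1825 | 1901-01-01 | 30.0 | <NA> | <NA> | 13
366 | 인천광역시 강화군 송해면 | 농업용 | 600 | 2000-08-01 | 30.0 | 50.0 | 50.0 | 13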
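A table of this shape can be built with pandas. Keep every row that has an exact duplicate, then count rows per distinct value combination. `dropna=False` keeps groups whose key contains missing values, such as the rows above with <NA> in every numeric column. This needs pandas 1.1 or later. The sketch below runs on a hypothetical toy frame. Whether ydata-profiling counts its "# duplicates" column in exactly this way is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not rows from the dataset).
df = pd.DataFrame({
    "주소": ["인천광역시 강화군 선원면"] * 3 + ["광주광역시 광산구 유계동"] * 2 + ["경기도 평택시 지제동"],
    "년사용량": [2219, 2219, 2219, np.nan, np.nan, 1319],
})

# Rows that repeat an earlier row (duplicated() treats NaN values as equal).
n_repeats = int(df.duplicated().sum())

# Every member of a duplicated group, counted per distinct row value.
dup_counts = (
    df[df.duplicated(keep=False)]
    .groupby(list(df.columns), dropna=False)
    .size()
    .reset_index(name="# duplicates")
    .sort_values("# duplicates", ascending=False)
)
print(n_repeats)  # 3
print(dup_counts)
```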