Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1235
Duplicate rows (%): 12.3%
Total size in memory: 732.4 KiB
Average record size in memory: 75.0 B
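
The percentage and per-record figures above follow from the raw counts by simple arithmetic. A minimal sketch, using only the numbers printed in this table (the variable names are illustrative):

```python
# Figures copied from the Dataset statistics table.
n_rows = 10_000
n_duplicate_rows = 1_235
total_kib = 732.4

# Share of duplicate rows, in percent (12.35; the report prints 12.3%).
duplicate_pct = n_duplicate_rows / n_rows * 100

# Average record size in bytes: total KiB converted to bytes, divided by rows.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"duplicate rows: {duplicate_pct:.2f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```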

Variable types

Categorical: 5
Numeric: 3

Dataset

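Description: This dataset, provided by the Waterworks Division of Hanam-si, Gyeonggi-do, shows the status of water-rate billing, with information such as water usage, water supply category, and billed amount.
Author: Hanam-si, Gyeonggi-do (경기도 하남시)
URL: https://www.data.go.kr/data/15042428/fileData.do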

Alerts

Dataset has 1235 (12.3%) duplicate rows [Duplicates]
상수업종 is highly overall correlated with 하수사용량(톤) and 1 other field [High correlation]
하수업종 is highly overall correlated with 하수사용량(톤) and 1 other field [High correlation]
상수사용량(톤) is highly overall correlated with 부과금액(원) and 1 other field [High correlation]
하수사용량(톤) is highly overall correlated with 상수업종 and 1 other field [High correlation]
부과금액(원) is highly overall correlated with 상수사용량(톤) [High correlation]
지하업종 is highly overall correlated with 상수사용량(톤) [High correlation]
상수업종 is highly imbalanced (57.8%) [Imbalance]
지하업종 is highly imbalanced (96.4%) [Imbalance]
하수사용량(톤) is highly skewed (γ1 = 55.99470905) [Skewed]
상수사용량(톤) has 1218 (12.2%) zeros [Zeros]
하수사용량(톤) has 9976 (99.8%) zeros [Zeros]
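
The two imbalance percentages can be reproduced from the category counts listed later in this report with a normalised-entropy score, 1 - H / log2(k). That this is exactly what the profiling tool computes internally is an assumption; the sketch below only shows that the formula matches the reported 57.8% and 96.4%. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Category counts taken from the Common Values tables of this report.
water_category_counts = [6606, 3316, 74, 2, 2]   # 상수업종
groundwater_category_counts = [9926, 48, 23, 3]  # 지하업종

print(f"{imbalance_score(water_category_counts):.1%}")
print(f"{imbalance_score(groundwater_category_counts):.1%}")
```

A score near 0 means the classes are evenly spread; a score near 1 means a single class dominates, as with 지하업종.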

Reproduction

Analysis started: 2024-04-13 12:24:25.569577
Analysis finished: 2024-04-13 12:24:30.945110
Duration: 5.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
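
A report of this kind can be regenerated roughly as follows. This is a sketch under stated assumptions, not the configuration actually used: the file names are hypothetical, the `cp949` encoding is a guess that often applies to CSV files from Korean public-data portals, and the original `config.json` settings are not reproduced.

```python
def build_report(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write an HTML report (paths and encoding are assumptions)."""
    # Imported lazily so this module loads even where the packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Hanam-si water billing")
    profile.to_file(html_path)
    return profile

# Example call (hypothetical file names):
# build_report("hanam_water_billing.csv", "report.html")
```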

Variables

부과년월 (billing year-month)
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2023-10    2067
2023-11    2030
2023-09    1989
2023-12    1961
2024-01    1891

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-12
2nd row: 2023-10
3rd row: 2023-11
4th row: 2024-01
5th row: 2023-09

Common Values

Value      Count    Frequency (%)
2023-10    2067     20.7%
2023-11    2030     20.3%
2023-09    1989     19.9%
2023-12    1961     19.6%
2024-01    1891     18.9%
2024-02    62       0.6%
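
The Frequency (%) column is each count divided by the total number of observations. A minimal sketch using the counts from this table (the dictionary name is illustrative):

```python
# Counts per billing month, copied from the Common Values table.
month_counts = {
    "2023-10": 2067,
    "2023-11": 2030,
    "2023-09": 1989,
    "2023-12": 1961,
    "2024-01": 1891,
    "2024-02": 62,
}

total = sum(month_counts.values())

# Share of each month, formatted to one decimal place like the report.
frequency = {month: f"{count / total * 100:.1f}%" for month, count in month_counts.items()}

for month, share in frequency.items():
    print(month, month_counts[month], share)
```

The counts sum to the 10000 observations reported in the overview, so no rows are missing from this column.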

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

행정동 (administrative district)
Categorical

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

풍산동              1992
감북동              1547
천현동              1333
덕풍3동             1115
덕풍2동             970
Other values (5)    3043

Length

Max length: 4
Median length: 3
Mean length: 3.4036
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 덕풍2동
2nd row: 천현동
3rd row: 덕풍2동
4th row: 덕풍2동
5th row: 춘궁동

Common Values

Value      Count    Frequency (%)
풍산동     1992     19.9%
감북동     1547     15.5%
천현동     1333     13.3%
덕풍3동    1115     11.2%
덕풍2동    970      9.7%
신장2동    788      7.9%
신장1동    658      6.6%
초이동     605      6.0%
덕풍1동    505      5.1%
춘궁동     487      4.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

상수업종 (water supply category)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

가정용      6606
일반용      3316
<NA>        74
대중탕용    2
산업용      2

Length

Max length: 4
Median length: 3
Mean length: 3.0076
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반용
2nd row: 일반용
3rd row: 가정용
4th row: 일반용
5th row: 일반용

Common Values

Value                     Count    Frequency (%)
가정용 (household)        6606     66.1%
일반용 (general)          3316     33.2%
<NA>                      74       0.7%
대중탕용 (public bath)    2        < 0.1%
산업용 (industrial)       2        < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

하수업종 (sewerage category)
Categorical

Alerts: HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

가정용      6405
일반용      3043
<NA>        550
대중탕용    2

Length

Max length: 4
Median length: 3
Mean length: 3.0552
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반용
2nd row: 일반용
3rd row: 가정용
4th row: 일반용
5th row: 일반용

Common Values

Value                     Count    Frequency (%)
가정용 (household)        6405     64.0%
일반용 (general)          3043     30.4%
<NA>                      550      5.5%
대중탕용 (public bath)    2        < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

지하업종 (groundwater category)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

<NA>        9926
일반용      48
가정용      23
대중탕용    3

Length

Max length: 4
Median length: 4
Mean length: 3.9929
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                     Count    Frequency (%)
<NA>                      9926     99.3%
일반용 (general)          48       0.5%
가정용 (household)        23       0.2%
대중탕용 (public bath)    3        < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

상수사용량(톤) (water usage, tonnes)
Real number (ℝ)

Alerts: HIGH CORRELATION, ZEROS

Distinct: 504
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 130.7903
Minimum: 0
Maximum: 36695
Zeros: 1218
Zeros (%): 12.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 16
Q3: 34
95-th percentile: 136.05
Maximum: 36695
Range: 36695
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 1182.4589
Coefficient of variation (CV): 9.0408758
Kurtosis: 422.51393
Mean: 130.7903
Median Absolute Deviation (MAD): 12
Skewness: 18.200408
Sum: 1307903
Variance: 1398208.9
Monotonicity: Not monotonic
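
Several of these statistics are derived from others in the same block: the mean is the sum over the number of observations, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation over the mean, and the variance is the squared standard deviation. A small sketch that checks those relations against the printed values (small differences come from the report's rounding):

```python
# Values copied from the statistics for 상수사용량(톤).
n_observations = 10_000
total_sum = 1_307_903
std_dev = 1182.4589
q1, q3 = 5, 34
minimum, maximum = 0, 36_695

mean = total_sum / n_observations   # reported: 130.7903
iqr = q3 - q1                       # reported: 29
value_range = maximum - minimum     # reported: 36695
cv = std_dev / mean                 # reported: 9.0408758
variance = std_dev ** 2             # reported: 1398208.9

print(mean, iqr, value_range, round(cv, 4), round(variance, 1))
```

A CV of about 9 alongside a median of 16 and a maximum of 36695 indicates that a small number of very large consumers dominate the spread.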
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count    Frequency (%)
0                     1218     12.2%
1                     292      2.9%
4                     279      2.8%
7                     275      2.8%
5                     270      2.7%
10                    265      2.6%
14                    262      2.6%
6                     256      2.6%
11                    253      2.5%
2                     253      2.5%
Other values (494)    6377     63.8%

Minimum 10 values

Value    Count    Frequency (%)
0        1218     12.2%
1        292      2.9%
2        253      2.5%
3        230      2.3%
4        279      2.8%
5        270      2.7%
6        256      2.6%
7        275      2.8%
8        238      2.4%
9        214      2.1%

Maximum 10 values

Value    Count    Frequency (%)
36695    1        < 0.1%
35742    1        < 0.1%
35651    1        < 0.1%
34327    1        < 0.1%
26643    1        < 0.1%
18695    1        < 0.1%
18530    1        < 0.1%
18207    1        < 0.1%
16786    1        < 0.1%
16693    1        < 0.1%

하수사용량(톤) (sewage usage, tonnes)
Real number (ℝ)

Alerts: HIGH CORRELATION, SKEWED, ZEROS

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8409
Minimum: 0
Maximum: 2530
Zeros: 9976
Zeros (%): 99.8%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 2530
Range: 2530
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 37.687651
Coefficient of variation (CV): 44.818231
Kurtosis: 3306.6906
Mean: 0.8409
Median Absolute Deviation (MAD): 0
Skewness: 55.994709
Sum: 8409
Variance: 1420.359
Monotonicity: Not monotonic
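
Of the three numeric variables, only this one carries the Skewed alert. The sketch below compares the reported skewness values against a cutoff of 20. That cutoff is an assumption made here for illustration, since the report does not state the threshold it used; it happens to be consistent with which variables were flagged.

```python
# Assumed cutoff for the "highly skewed" alert (not stated in the report).
SKEWNESS_THRESHOLD = 20

# Skewness values copied from the Descriptive statistics blocks of this report.
reported_skewness = {
    "상수사용량(톤)": 18.200408,
    "하수사용량(톤)": 55.994709,
    "부과금액(원)": 17.092689,
}

# Keep the variables whose absolute skewness exceeds the cutoff.
flagged = [name for name, g1 in reported_skewness.items() if abs(g1) > SKEWNESS_THRESHOLD]

print(flagged)
```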
[Figure: Histogram with fixed size bins (bins=22)]

Common values

Value                Count    Frequency (%)
0                    9976     99.8%
232                  2        < 0.1%
39                   2        < 0.1%
10                   2        < 0.1%
2026                 1        < 0.1%
40                   1        < 0.1%
8                    1        < 0.1%
68                   1        < 0.1%
13                   1        < 0.1%
3                    1        < 0.1%
Other values (12)    12       0.1%

Minimum 10 values

Value    Count    Frequency (%)
0        9976     99.8%
3        1        < 0.1%
8        1        < 0.1%
10       2        < 0.1%
13       1        < 0.1%
15       1        < 0.1%
16       1        < 0.1%
32       1        < 0.1%
35       1        < 0.1%
39       2        < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
2530     1        < 0.1%
2026     1        < 0.1%
1720     1        < 0.1%
696      1        < 0.1%
326      1        < 0.1%
232      2        < 0.1%
161      1        < 0.1%
68       1        < 0.1%
63       1        < 0.1%
54       1        < 0.1%

부과금액(원) (billed amount, KRW)
Real number (ℝ)

Alerts: HIGH CORRELATION

Distinct: 2717
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 161366.89
Minimum: 0
Maximum: 44590210
Zeros: 74
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 450
Q1: 5950
Median: 15600
Q3: 36940
95-th percentile: 201631.5
Maximum: 44590210
Range: 44590210
Interquartile range (IQR): 30990

Descriptive statistics

Standard deviation: 1314139.5
Coefficient of variation (CV): 8.1437987
Kurtosis: 378.96311
Mean: 161366.89
Median Absolute Deviation (MAD): 11880
Skewness: 17.092689
Sum: 1.6136689 × 10^9
Variance: 1.7269626 × 10^12
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                  Count    Frequency (%)
440                    381      3.8%
1200                   276      2.8%
12920                  172      1.7%
10250                  156      1.6%
9360                   150      1.5%
12030                  149      1.5%
11140                  140      1.4%
15600                  138      1.4%
450                    136      1.4%
13820                  136      1.4%
Other values (2707)    8166     81.7%

Minimum 10 values

Value    Count    Frequency (%)
0        74       0.7%
110      1        < 0.1%
300      1        < 0.1%
440      381      3.8%
450      136      1.4%
460      14       0.1%
470      1        < 0.1%
490      4        < 0.1%
510      1        < 0.1%
520      1        < 0.1%

Maximum 10 values

Value       Count    Frequency (%)
44590210    1        < 0.1%
34344470    1        < 0.1%
32368580    1        < 0.1%
31827000    1        < 0.1%
31514040    1        < 0.1%
31432870    1        < 0.1%
23799590    1        < 0.1%
22514650    1        < 0.1%
21918510    1        < 0.1%
19284220    1        < 0.1%

Interactions

[Figures: nine interaction plots (images not preserved in this text export)]

Correlations

[Figure: correlation heatmap for the table below]

                  부과년월  행정동  상수업종  하수업종  지하업종  상수사용량(톤)  하수사용량(톤)  부과금액(원)
부과년월          1.000     0.168   0.024     0.060     0.000     0.000           0.150           0.000
행정동            0.168     1.000   0.270     0.318     0.288     0.058           0.000           0.052
상수업종          0.024     0.270   1.000     1.000     NaN       0.312           NaN             0.318
하수업종          0.060     0.318   1.000     1.000     NaN       0.043           NaN             0.122
지하업종          0.000     0.288   NaN       NaN       1.000     NaN             0.000           0.000
상수사용량(톤)    0.000     0.058   0.312     0.043     NaN       1.000           0.000           0.957
하수사용량(톤)    0.150     0.000   NaN       NaN       0.000     0.000           1.000           0.115
부과금액(원)      0.000     0.052   0.318     0.122     0.000     0.957           0.115           1.000

[Figure: correlation heatmap for the table below]

            상수업종  지하업종  하수업종  행정동  부과년월
상수업종    1.000     NaN       1.000     0.164   0.015
지하업종    NaN       1.000     NaN       0.180   0.000
하수업종    1.000     NaN       1.000     0.201   0.025
행정동      0.164     0.180     0.201     1.000   0.089
부과년월    0.015     0.000     0.025     0.089   1.000

[Figure: correlation heatmap for the table below]

                  상수사용량(톤)  하수사용량(톤)  부과금액(원)  부과년월  행정동  상수업종  하수업종  지하업종
상수사용량(톤)    1.000           -0.075          0.970         0.000     0.028   0.144     0.027     1.000
하수사용량(톤)    -0.075          1.000           0.029         0.055     0.000   1.000     1.000     0.000
부과금액(원)      0.970           0.029           1.000         0.000     0.025   0.147     0.077     0.000
부과년월          0.000           0.055           0.000         1.000     0.089   0.015     0.025     0.000
행정동            0.028           0.000           0.025         0.089     1.000   0.164     0.201     0.180
상수업종          0.144           1.000           0.147         0.015     0.164   1.000     1.000     0.000
하수업종          0.027           1.000           0.077         0.025     0.201   1.000     1.000     0.000
지하업종          1.000           0.000           0.000         0.000     0.180   0.000     0.000     1.000
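
The High correlation alerts at the top of the report line up with the last correlation table above. The sketch below scans the upper triangle of that table for pairs whose absolute coefficient is at least 0.5. The 0.5 cutoff is an assumption for illustration (the report does not state its threshold), but the five pairs it yields match the fields named in the alerts.

```python
# Variable order and coefficients copied from the last correlation table.
names = [
    "상수사용량(톤)", "하수사용량(톤)", "부과금액(원)", "부과년월",
    "행정동", "상수업종", "하수업종", "지하업종",
]
matrix = [
    [1.000, -0.075, 0.970, 0.000, 0.028, 0.144, 0.027, 1.000],
    [-0.075, 1.000, 0.029, 0.055, 0.000, 1.000, 1.000, 0.000],
    [0.970, 0.029, 1.000, 0.000, 0.025, 0.147, 0.077, 0.000],
    [0.000, 0.055, 0.000, 1.000, 0.089, 0.015, 0.025, 0.000],
    [0.028, 0.000, 0.025, 0.089, 1.000, 0.164, 0.201, 0.180],
    [0.144, 1.000, 0.147, 0.015, 0.164, 1.000, 1.000, 0.000],
    [0.027, 1.000, 0.077, 0.025, 0.201, 1.000, 1.000, 0.000],
    [1.000, 0.000, 0.000, 0.000, 0.180, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report

# Visit each unordered pair once (upper triangle, diagonal excluded).
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(matrix[i][j]) >= THRESHOLD
]

for left, right, value in high_pairs:
    print(f"{left} ~ {right}: {value:.3f}")
```

Coefficients of exactly 1.000 between a category column and a column that is almost entirely zero or `<NA>` deserve a manual check before being read as a real relationship.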

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly pick out patterns in data completion.]

Sample

         부과년월  행정동   상수업종  하수업종  지하업종  상수사용량(톤)  하수사용량(톤)  부과금액(원)
67097    2023-12   덕풍2동  일반용    일반용    <NA>      4               0               7010
21090    2023-10   천현동   일반용    일반용    <NA>      1               0               2080
48149    2023-11   덕풍2동  가정용    가정용    <NA>      25              0               24720
87795    2024-01   덕풍2동  일반용    일반용    <NA>      26              0               33190
18070    2023-09   춘궁동   일반용    일반용    <NA>      0               0               450
2056     2023-09   천현동   일반용    일반용    <NA>      0               0               720
55959    2023-11   감북동   일반용    일반용    <NA>      9               0               15220
93547    2024-01   풍산동   가정용    가정용    <NA>      23              0               22140
46351    2023-11   덕풍2동  일반용    일반용    <NA>      16              0               27480
83822    2024-01   신장2동  가정용    가정용    <NA>      11              0               10250

         부과년월  행정동   상수업종  하수업종  지하업종  상수사용량(톤)  하수사용량(톤)  부과금액(원)
28771    2023-10   덕풍3동  가정용    가정용    <NA>      2               0               2980
4257     2023-09   신장2동  가정용    가정용    <NA>      9               0               8820
21493    2023-10   천현동   일반용    일반용    <NA>      32              0               53770
82626    2024-01   신장1동  일반용    일반용    <NA>      7               0               12180
45555    2023-11   덕풍1동  가정용    가정용    <NA>      11              0               10250
10927    2023-09   풍산동   일반용    <NA>      <NA>      23              0               22840
20077    2023-10   천현동   가정용    가정용    <NA>      66              0               55210
24630    2023-10   신장2동  가정용    가정용    <NA>      0               0               440
44664    2023-11   신장2동  일반용    일반용    <NA>      34              0               31120
92471    2024-01   풍산동   가정용    가정용    <NA>      17              0               15600

Duplicate rows

Most frequently occurring

        부과년월  행정동  상수업종  하수업종  지하업종  상수사용량(톤)  하수사용량(톤)  부과금액(원)  # duplicates
461     2023-10   풍산동  가정용    가정용    <NA>      0               0               1200          38
720     2023-11   풍산동  가정용    가정용    <NA>      0               0               1200          36
1193    2024-01   풍산동  가정용    가정용    <NA>      0               0               1200          31
0       2023-09   감북동  가정용    가정용    <NA>      0               0               440           28
205     2023-09   풍산동  가정용    가정용    <NA>      0               0               1200          24
947     2023-12   풍산동  가정용    가정용    <NA>      0               0               1200          23
244     2023-10   감북동  가정용    가정용    <NA>      0               0               440           22
503     2023-11   감북동  가정용    가정용    <NA>      0               0               440           18
474     2023-10   풍산동  가정용    가정용    <NA>      14              0               12920         16
989     2024-01   감북동  가정용    가정용    <NA>      0               0               440           16