Overview

Dataset statistics

Number of variables: 7
Number of observations: 2622
Missing cells: 7
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 156.3 KiB
Average record size in memory: 61.0 B
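
The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming that 1 KiB is 1024 bytes and that the missing-cell percentage is taken over all cells (variables times observations); the variable names are illustrative and not part of the report.

```python
# Cross-check of the derived dataset statistics, using only values reported above.
n_variables = 7
n_observations = 2622
missing_cells = 7
total_size_kib = 156.3

# Assumption: the percentage is taken over all cells (variables x observations).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.3f}%")             # well below 0.1%
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both derived values agree with the reported "< 0.1%" and "61.0 B".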

Variable types

Numeric: 5
Categorical: 2

Dataset

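Description: Civil-service application data (water supply works, application approval; 급수공사_신청승낙) from the customer information system (수용가정보시스템) that the Busan Metropolitan City Waterworks Headquarters operates to calculate and collect water and sewerage charges.
Author: Busan Metropolitan City Waterworks Headquarters (부산광역시 상수도사업본부)
URL: https://www.data.go.kr/data/15083686/fileData.do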

Alerts

사업소코드 is highly overall correlated with 사업소명 [High correlation]
전수 is highly overall correlated with 상수도업종 [High correlation]
사업소명 is highly overall correlated with 사업소코드 [High correlation]
상수도업종 is highly overall correlated with 전수 [High correlation]
상수도업종 is highly imbalanced (68.6%) [Imbalance]
월사용량 is highly skewed (γ1 = 21.1077881) [Skewed]
전수 is highly skewed (γ1 = 27.07038882) [Skewed]
연번 has unique values [Unique]
월사용량 has 38 (1.4%) zeros [Zeros]
전수 has 27 (1.0%) zeros [Zeros]
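
The 68.6% imbalance figure for 상수도업종 can be reproduced from the category counts listed for that variable. The sketch below assumes the score is one minus the normalized Shannon entropy of the class distribution; that definition is an assumption that happens to match the reported value, and `imbalance_score` is a hypothetical helper, not a function of the profiling library.

```python
import math

# Category counts for 상수도업종, taken from its Common Values table.
counts = [2186, 305, 125, 3, 2, 1]

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class
    distribution and H_max = log(number of classes).

    Assumed definition: 0 means perfectly balanced, 1 means a single class.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))

score = imbalance_score(counts)
print(f"{100 * score:.1f}%")
```

The ratio of two entropies does not depend on the logarithm base, so natural logarithms are used throughout.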

Reproduction

Analysis started: 2024-03-14 19:14:23.894131
Analysis finished: 2024-03-14 19:14:31.567563
Duration: 7.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
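
The reported duration follows directly from the two timestamps. A small sketch of that subtraction, assuming both timestamps share the same clock and time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2024-03-14 19:14:23.894131", FMT)
finished = datetime.strptime("2024-03-14 19:14:31.567563", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```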

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 2622
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1311.5
Minimum: 1
Maximum: 2622
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 132.05
Q1: 656.25
Median: 1311.5
Q3: 1966.75
95-th percentile: 2490.95
Maximum: 2622
Range: 2621
Interquartile range (IQR): 1310.5

Descriptive statistics

Standard deviation: 757.05053
Coefficient of variation (CV): 0.5772402
Kurtosis: -1.2
Mean: 1311.5
Median Absolute Deviation (MAD): 655.5
Skewness: 0
Sum: 3438753
Variance: 573125.5
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)
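
The statistics for 연번 are what one would expect from a plain running index. The sketch below assumes the column is exactly 1, 2, ..., 2622 (consistent with the minimum, maximum, distinct count, and strict monotonicity) and recomputes the sum, mean, sample variance, standard deviation, and coefficient of variation with the standard library.

```python
import math
import statistics

n = 2622
# Assumption: 연번 is the sequence 1..n with no gaps.
values = list(range(1, n + 1))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean

# Closed forms for the sequence 1..n.
assert total == n * (n + 1) // 2
assert variance == n * (n + 1) / 12

print(total, mean, variance)
print(f"std={std:.5f}, cv={cv:.7f}")
```

Because the column carries no information beyond row order, it is usually treated as an identifier rather than a feature.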
Value | Count | Frequency (%)
1 | 1 | < 0.1%
1763 | 1 | < 0.1%
1745 | 1 | < 0.1%
1746 | 1 | < 0.1%
1747 | 1 | < 0.1%
1748 | 1 | < 0.1%
1749 | 1 | < 0.1%
1750 | 1 | < 0.1%
1751 | 1 | < 0.1%
1752 | 1 | < 0.1%
Other values (2612) | 2612 | 99.6%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
2622 | 1 | < 0.1%
2621 | 1 | < 0.1%
2620 | 1 | < 0.1%
2619 | 1 | < 0.1%
2618 | 1 | < 0.1%
2617 | 1 | < 0.1%
2616 | 1 | < 0.1%
2615 | 1 | < 0.1%
2614 | 1 | < 0.1%
2613 | 1 | < 0.1%

사업소코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.18078
Minimum: 201
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 201
5-th percentile: 244
Q1: 303
Median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 111
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 25.699926
Coefficient of variation (CV): 0.08677108
Kurtosis: 1.4139794
Mean: 296.18078
Median Absolute Deviation (MAD): 4
Skewness: -1.7273778
Sum: 776586
Variance: 660.48619
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
244 | 450 | 17.2%
311 | 437 | 16.7%
312 | 422 | 16.1%
306 | 271 | 10.3%
307 | 233 | 8.9%
304 | 228 | 8.7%
308 | 164 | 6.3%
309 | 150 | 5.7%
301 | 105 | 4.0%
303 | 75 | 2.9%
Other values (2) | 87 | 3.3%

Value | Count | Frequency (%)
201 | 20 | 0.8%
244 | 450 | 17.2%
301 | 105 | 4.0%
302 | 67 | 2.6%
303 | 75 | 2.9%
304 | 228 | 8.7%
306 | 271 | 10.3%
307 | 233 | 8.9%
308 | 164 | 6.3%
309 | 150 | 5.7%

Value | Count | Frequency (%)
312 | 422 | 16.1%
311 | 437 | 16.7%
309 | 150 | 5.7%
308 | 164 | 6.3%
307 | 233 | 8.9%
306 | 271 | 10.3%
304 | 228 | 8.7%
303 | 75 | 2.9%
302 | 67 | 2.6%
301 | 105 | 4.0%

사업소명
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 20.6 KiB

동래통합사업소: 450
강서사업소: 437
기장사업소: 422
남부사업소: 271
북부사업소: 233
Other values (7): 809

Length

Max length: 9
Median length: 5
Mean length: 5.82418
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동래통합사업소
2nd row: 남부사업소
3rd row: 동래통합사업소
4th row: 동래통합사업소
5th row: 동래통합사업소

Common Values

Value | Count | Frequency (%)
동래통합사업소 | 450 | 17.2%
강서사업소 | 437 | 16.7%
기장사업소 | 422 | 16.1%
남부사업소 | 271 | 10.3%
북부사업소 | 233 | 8.9%
부산진 사업소 | 228 | 8.7%
해운대사업소 | 164 | 6.3%
사하사업소 | 150 | 5.7%
중동부사업소 | 105 | 4.0%
영도사업소 | 75 | 2.9%
Other values (2) | 87 | 3.3%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
동래통합사업소 | 450 | 15.4%
강서사업소 | 437 | 15.0%
기장사업소 | 422 | 14.5%
사업소 | 295 | 10.1%
남부사업소 | 271 | 9.3%
북부사업소 | 233 | 8.0%
부산진 | 228 | 7.8%
해운대사업소 | 164 | 5.6%
사하사업소 | 150 | 5.1%
중동부사업소 | 105 | 3.6%
Other values (3) | 162 | 5.6%

상수도업종
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 20.6 KiB

<NA>: 2186
일반용: 305
가정용: 125
공업용수: 3
업무용: 2

Length

Max length: 4
Median length: 4
Mean length: 3.8348589
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 2186 | 83.4%
일반용 | 305 | 11.6%
가정용 | 125 | 4.8%
공업용수 | 3 | 0.1%
업무용 | 2 | 0.1%
욕탕용 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 2186 | 83.4%
일반용 | 305 | 11.6%
가정용 | 125 | 4.8%
공업용수 | 3 | 0.1%
업무용 | 2 | 0.1%
욕탕용 | 1 | < 0.1%

월사용량
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 66
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 99.827231
Minimum: 0
Maximum: 26160
Zeros: 38
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 10
Median: 10
Q3: 22.75
95-th percentile: 200
Maximum: 26160
Range: 26160
Interquartile range (IQR): 12.75

Descriptive statistics

Standard deviation: 820.49361
Coefficient of variation (CV): 8.2191362
Kurtosis: 548.28918
Mean: 99.827231
Median Absolute Deviation (MAD): 5
Skewness: 21.107788
Sum: 261747
Variance: 673209.76
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10 | 1280 | 48.8%
1 | 542 | 20.7%
100 | 179 | 6.8%
50 | 175 | 6.7%
30 | 82 | 3.1%
20 | 51 | 1.9%
200 | 43 | 1.6%
500 | 41 | 1.6%
0 | 38 | 1.4%
15 | 28 | 1.1%
Other values (56) | 163 | 6.2%

Value | Count | Frequency (%)
0 | 38 | 1.4%
1 | 542 | 20.7%
2 | 3 | 0.1%
3 | 1 | < 0.1%
4 | 3 | 0.1%
5 | 13 | 0.5%
10 | 1280 | 48.8%
11 | 5 | 0.2%
15 | 28 | 1.1%
20 | 51 | 1.9%

Value | Count | Frequency (%)
26160 | 1 | < 0.1%
19500 | 1 | < 0.1%
10000 | 4 | 0.2%
9780 | 1 | < 0.1%
5235 | 1 | < 0.1%
5000 | 3 | 0.1%
4500 | 1 | < 0.1%
4080 | 1 | < 0.1%
3000 | 2 | 0.1%
2430 | 1 | < 0.1%
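
The "Other values" row of the first 월사용량 frequency table is a remainder: it covers every distinct value outside the ten most frequent ones. The sketch below re-derives that row from the reported counts, assuming percentages are taken over all 2622 observations (the column has no missing values).

```python
# Ten most frequent 월사용량 values and their counts, copied from the table.
top10 = {10: 1280, 1: 542, 100: 179, 50: 175, 30: 82,
         20: 51, 200: 43, 500: 41, 0: 38, 15: 28}
n_observations = 2622
n_distinct = 66

# Whatever is not in the top ten is folded into "Other values".
other_count = n_observations - sum(top10.values())
other_distinct = n_distinct - len(top10)
other_pct = 100 * other_count / n_observations

print(f"Other values ({other_distinct}): {other_count} ({other_pct:.1f}%)")
```

Nearly half of all rows carry the value 10 and another fifth the value 1, which is why the median (10) sits so far below the mean (about 99.8).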

구경
Real number (ℝ)

Distinct: 14
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.488177
Minimum: 13
Maximum: 400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 15
Median: 15
Q3: 25
95-th percentile: 100
Maximum: 400
Range: 387
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 37.117173
Coefficient of variation (CV): 1.1787654
Kurtosis: 23.781453
Mean: 31.488177
Median Absolute Deviation (MAD): 0
Skewness: 4.2921058
Sum: 82562
Variance: 1377.6845
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=14)
Value | Count | Frequency (%)
15 | 1334 | 50.9%
25 | 397 | 15.1%
20 | 256 | 9.8%
40 | 151 | 5.8%
100 | 145 | 5.5%
50 | 132 | 5.0%
32 | 94 | 3.6%
80 | 44 | 1.7%
150 | 29 | 1.1%
300 | 15 | 0.6%
Other values (4) | 25 | 1.0%

Value | Count | Frequency (%)
13 | 3 | 0.1%
15 | 1334 | 50.9%
20 | 256 | 9.8%
25 | 397 | 15.1%
32 | 94 | 3.6%
40 | 151 | 5.8%
50 | 132 | 5.0%
80 | 44 | 1.7%
100 | 145 | 5.5%
150 | 29 | 1.1%

Value | Count | Frequency (%)
400 | 1 | < 0.1%
300 | 15 | 0.6%
250 | 7 | 0.3%
200 | 14 | 0.5%
150 | 29 | 1.1%
100 | 145 | 5.5%
80 | 44 | 1.7%
50 | 132 | 5.0%
40 | 151 | 5.8%
32 | 94 | 3.6%

전수
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 36
Distinct (%): 1.4%
Missing: 7
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4535373
Minimum: 0
Maximum: 1360
Zeros: 27
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 6
Maximum: 1360
Range: 1360
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 43.16906
Coefficient of variation (CV): 12.499955
Kurtosis: 765.58876
Mean: 3.4535373
Median Absolute Deviation (MAD): 0
Skewness: 27.070389
Sum: 9031
Variance: 1863.5678
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=36)
Value | Count | Frequency (%)
1 | 2350 | 89.6%
8 | 43 | 1.6%
2 | 32 | 1.2%
4 | 31 | 1.2%
0 | 27 | 1.0%
5 | 20 | 0.8%
3 | 19 | 0.7%
6 | 12 | 0.5%
7 | 11 | 0.4%
12 | 8 | 0.3%
Other values (26) | 62 | 2.4%

Value | Count | Frequency (%)
0 | 27 | 1.0%
1 | 2350 | 89.6%
2 | 32 | 1.2%
3 | 19 | 0.7%
4 | 31 | 1.2%
5 | 20 | 0.8%
6 | 12 | 0.5%
7 | 11 | 0.4%
8 | 43 | 1.6%
9 | 4 | 0.2%

Value | Count | Frequency (%)
1360 | 1 | < 0.1%
1183 | 1 | < 0.1%
1124 | 1 | < 0.1%
481 | 1 | < 0.1%
322 | 1 | < 0.1%
130 | 1 | < 0.1%
43 | 1 | < 0.1%
42 | 2 | 0.1%
40 | 3 | 0.1%
39 | 1 | < 0.1%
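
Because 전수 has 7 missing values, it matters which denominator each figure uses. The check below suggests that the reported mean is taken over the 2615 non-missing values, while the frequency percentages are taken over all 2622 rows. This is inferred from the reported numbers only, not from the profiler's documentation.

```python
# Values reported for 전수 in this section.
n_observations = 2622
n_missing = 7
total = 9031            # reported Sum
count_of_ones = 2350    # count of the value 1 in the frequency table

n_valid = n_observations - n_missing

# Mean over non-missing rows versus mean over all rows.
mean_valid = total / n_valid
mean_all_rows = total / n_observations

# Frequency of the value 1 over all rows versus over non-missing rows.
freq_all_rows = 100 * count_of_ones / n_observations
freq_valid = 100 * count_of_ones / n_valid

missing_pct = 100 * n_missing / n_observations

print(f"mean over valid rows: {mean_valid:.7f}")
print(f"mean over all rows:   {mean_all_rows:.7f}")
print(f"freq over all rows:   {freq_all_rows:.1f}%")
print(f"freq over valid rows: {freq_valid:.1f}%")
print(f"missing:              {missing_pct:.1f}%")
```

Only the first mean and the first frequency match the report (3.4535373 and 89.6%), so the count column sums to 2615 while the percentage column does not sum to 100%.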

Interactions

[Figure: 25 pairwise interaction plots]

Correlations

 | 연번 | 사업소코드 | 사업소명 | 상수도업종 | 월사용량 | 구경 | 전수
연번 | 1.000 | 0.060 | 0.086 | 0.071 | 0.000 | 0.000 | 0.071
사업소코드 | 0.060 | 1.000 | 1.000 | 0.000 | 0.000 | 0.133 | 0.072
사업소명 | 0.086 | 1.000 | 1.000 | 0.444 | 0.224 | 0.328 | 0.000
상수도업종 | 0.071 | 0.000 | 0.444 | 1.000 | 0.000 | 0.489 | NaN
월사용량 | 0.000 | 0.000 | 0.224 | 0.000 | 1.000 | 0.310 | 0.000
구경 | 0.000 | 0.133 | 0.328 | 0.489 | 0.310 | 1.000 | 0.188
전수 | 0.071 | 0.072 | 0.000 | NaN | 0.000 | 0.188 | 1.000

 | 사업소명 | 상수도업종
사업소명 | 1.000 | 0.262
상수도업종 | 0.262 | 1.000

 | 연번 | 사업소코드 | 월사용량 | 구경 | 전수 | 사업소명 | 상수도업종
연번 | 1.000 | -0.013 | 0.017 | -0.010 | 0.011 | 0.036 | 0.028
사업소코드 | -0.013 | 1.000 | -0.001 | -0.188 | -0.101 | 0.998 | 0.278
월사용량 | 0.017 | -0.001 | 1.000 | 0.092 | 0.043 | 0.089 | 0.000
구경 | -0.010 | -0.188 | 0.092 | 1.000 | -0.240 | 0.145 | 0.340
전수 | 0.011 | -0.101 | 0.043 | -0.240 | 1.000 | 0.000 | 1.000
사업소명 | 0.036 | 0.998 | 0.089 | 0.145 | 0.000 | 1.000 | 0.262
상수도업종 | 0.028 | 0.278 | 0.000 | 0.340 | 1.000 | 0.262 | 1.000
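
The high-correlation alerts in the Overview appear to come from the last of the three matrices in this section. The sketch below scans that matrix for off-diagonal entries whose absolute value reaches a threshold. The threshold of 0.5 is an assumption (the value actually used would be in config.json), and `high_pairs` is a hypothetical helper written for this illustration.

```python
# Last correlation matrix of this section, copied row by row.
labels = ["연번", "사업소코드", "월사용량", "구경", "전수", "사업소명", "상수도업종"]
matrix = [
    [1.000, -0.013, 0.017, -0.010, 0.011, 0.036, 0.028],
    [-0.013, 1.000, -0.001, -0.188, -0.101, 0.998, 0.278],
    [0.017, -0.001, 1.000, 0.092, 0.043, 0.089, 0.000],
    [-0.010, -0.188, 0.092, 1.000, -0.240, 0.145, 0.340],
    [0.011, -0.101, 0.043, -0.240, 1.000, 0.000, 1.000],
    [0.036, 0.998, 0.089, 0.145, 0.000, 1.000, 0.262],
    [0.028, 0.278, 0.000, 0.340, 1.000, 0.262, 1.000],
]

THRESHOLD = 0.5  # assumption, not read from the report's configuration

def high_pairs(labels, matrix, threshold):
    """List each unordered pair of distinct variables with |r| >= threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):   # upper triangle only
            if abs(matrix[i][j]) >= threshold:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs

pairs = high_pairs(labels, matrix, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} <-> {b}: {r:.3f}")
```

With this threshold the scan returns the same two pairs that the alerts report in both directions: 사업소코드 with 사업소명, and 전수 with 상수도업종.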

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

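(index) | 연번 | 사업소코드 | 사업소명 | 상수도업종 | 월사용량 | 구경 | 전수
0 | 1 | 244 | 동래통합사업소 | <NA> | 30 | 15 | 1
1 | 2 | 306 | 남부사업소 | <NA> | 1 | 25 | 1
2 | 3 | 244 | 동래통합사업소 | <NA> | 1 | 15 | 1
3 | 4 | 244 | 동래통합사업소 | <NA> | 1 | 15 | 1
4 | 5 | 244 | 동래통합사업소 | <NA> | 1 | 40 | 1
5 | 6 | 307 | 북부사업소 | <NA> | 30 | 32 | 1
6 | 7 | 312 | 기장사업소 | <NA> | 10 | 15 | 1
7 | 8 | 311 | 강서사업소 | <NA> | 10 | 15 | 1
8 | 9 | 312 | 기장사업소 | <NA> | 10 | 15 | 1
9 | 10 | 303 | 영도사업소 | <NA> | 10 | 15 | 1

(index) | 연번 | 사업소코드 | 사업소명 | 상수도업종 | 월사용량 | 구경 | 전수
2612 | 2613 | 311 | 강서사업소 | <NA> | 100 | 100 | 2
2613 | 2614 | 307 | 북부사업소 | <NA> | 1 | 15 | 6
2614 | 2615 | 304 | 부산진 사업소 | <NA> | 10 | 25 | 1
2615 | 2616 | 312 | 기장사업소 | <NA> | 10 | 15 | 16
2616 | 2617 | 301 | 중동부사업소 | 가정용 | 0 | 15 | 1
2617 | 2618 | 307 | 북부사업소 | <NA> | 10 | 150 | 0
2618 | 2619 | 301 | 중동부사업소 | <NA> | 50 | 20 | 1
2619 | 2620 | 244 | 동래통합사업소 | <NA> | 30 | 20 | 1
2620 | 2621 | 244 | 동래통합사업소 | <NA> | 30 | 15 | 1
2621 | 2622 | 244 | 동래통합사업소 | <NA> | 10 | 25 | 1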