Overview

Dataset statistics

Number of variables: 7
Number of observations: 2067
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 125.3 KiB
Average record size in memory: 62.1 B
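
The derived figures in this block follow from the raw counts by simple arithmetic. The sketch below recomputes them from the values printed in the report. It assumes the total cell count is variables times observations and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 7
n_observations = 2067
missing_cells = 2
total_size_kib = 125.3  # already rounded in the report

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing share is far below 0.1%, which is why the report shows "< 0.1%" rather than a rounded number.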

Variable types

Numeric: 5
Categorical: 2

Dataset

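Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수공사(신청승낙)_20210601 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil service application data, water supply works (application approved), 20210601)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083686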

Alerts

사업소코드 is highly overall correlated with 사업소명 (High correlation)
월사용량 is highly overall correlated with 상수도업종 (High correlation)
사업소명 is highly overall correlated with 사업소코드 (High correlation)
상수도업종 is highly overall correlated with 월사용량 (High correlation)
상수도업종 is highly imbalanced (78.3%) (Imbalance)
월사용량 is highly skewed (γ1 = 44.48573781) (Skewed)
연번 has unique values (Unique)
월사용량 has 226 (10.9%) zeros (Zeros)
전수 has 25 (1.2%) zeros (Zeros)
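
The 78.3% imbalance figure for 상수도업종 is consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the category shares and k is the number of distinct categories. The report does not state the formula, so treat the sketch below as an assumption that happens to reproduce the number. The counts are the ones listed for 상수도업종 under Common Values in this report, and `imbalance_score` is a hypothetical helper name.

```python
import math

# Category counts for 상수도업종 as listed under "Common Values" in this report.
counts = {"<NA>": 1894, "3": 100, "1": 71, "2": 1, "8": 1}


def imbalance_score(category_counts):
    """Return 1 - H / log2(k); 0 for a uniform column, near 1 when one class dominates."""
    total = sum(category_counts.values())
    k = len(category_counts)
    if k < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total)
        for c in category_counts.values()
        if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"imbalance: {score * 100:.1f}%")
```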

Reproduction

Analysis started: 2023-12-10 16:51:16.352831
Analysis finished: 2023-12-10 16:51:20.883509
Duration: 4.53 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
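
The reported duration can be checked directly from the two timestamps. This is a minimal sketch using only the standard library; the timestamps are copied from the lines above.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2023-12-10 16:51:16.352831")
finished = datetime.fromisoformat("2023-12-10 16:51:20.883509")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```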

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 2067
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1034
Minimum: 1
Maximum: 2067
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 104.3
Q1: 517.5
Median: 1034
Q3: 1550.5
95-th percentile: 1963.7
Maximum: 2067
Range: 2066
Interquartile range (IQR): 1033

Descriptive statistics

Standard deviation: 596.83582
Coefficient of variation (CV): 0.57721066
Kurtosis: -1.2
Mean: 1034
Median Absolute Deviation (MAD): 517
Skewness: 0
Sum: 2137278
Variance: 356213
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
1                    1      < 0.1%
1359                 1      < 0.1%
1389                 1      < 0.1%
1388                 1      < 0.1%
1387                 1      < 0.1%
1386                 1      < 0.1%
1385                 1      < 0.1%
1384                 1      < 0.1%
1383                 1      < 0.1%
1382                 1      < 0.1%
Other values (2057)  2057   99.5%

Minimum 10 values
Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
2067   1      < 0.1%
2066   1      < 0.1%
2065   1      < 0.1%
2064   1      < 0.1%
2063   1      < 0.1%
2062   1      < 0.1%
2061   1      < 0.1%
2060   1      < 0.1%
2059   1      < 0.1%
2058   1      < 0.1%
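
The statistics reported for 연번 are what one would expect if the column is simply the running sequence 1, 2, ..., 2067. The sketch below assumes exactly that and recomputes the sum, mean, sample variance, standard deviation and a few percentiles. The `percentile` helper uses linear interpolation between ranks, which is an assumption about how the report derives its quantiles.

```python
import statistics

n = 2067
serial = list(range(1, n + 1))  # assumed: 연번 is exactly 1, 2, ..., n

total = sum(serial)
mean = statistics.mean(serial)
variance = statistics.variance(serial)  # sample variance (n - 1 denominator)
std = statistics.stdev(serial)


def percentile(q):
    """Linear-interpolation percentile of the sequence 1..n for 0 <= q <= 1."""
    return 1 + q * (n - 1)


print(total, mean, variance, round(std, 5))
print(percentile(0.05), percentile(0.25), percentile(0.75), percentile(0.95))
```

Under this assumption the sample variance equals n(n + 1) / 12, which matches the reported 356213.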

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 295.5075
Minimum: 201
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.3 KiB

Quantile statistics

Minimum: 201
5-th percentile: 244
Q1: 302
Median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 111
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 26.03011
Coefficient of variation (CV): 0.088086123
Kurtosis: 0.85003457
Mean: 295.5075
Median Absolute Deviation (MAD): 4
Skewness: -1.591574
Sum: 610814
Variance: 677.56662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values
Value             Count  Frequency (%)
244               385    18.6%
311               371    17.9%
312               311    15.0%
307               227    11.0%
306               185    9.0%
304               165    8.0%
308               119    5.8%
309               97     4.7%
301               87     4.2%
303               61     3.0%
Other values (2)  59     2.9%

Minimum 10 values
Value  Count  Frequency (%)
201    11     0.5%
244    385    18.6%
301    87     4.2%
302    48     2.3%
303    61     3.0%
304    165    8.0%
306    185    9.0%
307    227    11.0%
308    119    5.8%
309    97     4.7%

Maximum 10 values
Value  Count  Frequency (%)
312    311    15.0%
311    371    17.9%
309    97     4.7%
308    119    5.8%
307    227    11.0%
306    185    9.0%
304    165    8.0%
303    61     3.0%
302    48     2.3%
301    87     4.2%

사업소명 (office name)
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 KiB

북부통합사업소: 385
강서 사업소: 371
기장 사업소: 311
북부 사업소: 227
남부 사업소: 185
Other values (7): 588

Length

Max length: 9
Median length: 9
Mean length: 8.4373488
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강서 사업소
2nd row: 부산진 사업소
3rd row: 부산진 사업소
4th row: 강서 사업소
5th row: 기장 사업소

Common Values

Value             Count  Frequency (%)
북부통합사업소    385    18.6%
강서 사업소       371    17.9%
기장 사업소       311    15.0%
북부 사업소       227    11.0%
남부 사업소       185    9.0%
부산진 사업소     165    8.0%
해운대 사업소     119    5.8%
사하 사업소       97     4.7%
중동부 사업소     87     4.2%
영도 사업소       61     3.0%
Other values (2)  59     2.9%

Length

Histogram of lengths of the category

Words
Value             Count  Frequency (%)
사업소            1671   44.7%
북부통합사업소    385    10.3%
강서              371    9.9%
기장              311    8.3%
북부              227    6.1%
남부              185    4.9%
부산진            165    4.4%
해운대            119    3.2%
사하              97     2.6%
중동부            87     2.3%
Other values (3)  120    3.2%

상수도업종 (water service category)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 KiB

<NA>: 1894
3: 100
1: 71
2: 1
8: 1

Length

Max length: 4
Median length: 4
Mean length: 3.7489115
Min length: 1

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   1894   91.6%
3      100    4.8%
1      71     3.4%
2      1      < 0.1%
8      1      < 0.1%
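
The length statistics for 상수도업종 suggest that the placeholder is stored as the literal four-character string `<NA>` rather than as a true missing value, which would explain why the column reports 0 missing cells even though 91.6% of rows show `<NA>`. This is an inference from the numbers, not something the report states. The sketch below recomputes the mean length and the unique-value count from the Common Values table under that assumption.

```python
# Counts from the "Common Values" table for 상수도업종.
# Assumption: each value is stored as the string shown, including "<NA>".
counts = {"<NA>": 1894, "3": 100, "1": 71, "2": 1, "8": 1}

total = sum(counts.values())
mean_length = sum(len(value) * count for value, count in counts.items()) / total

# "Unique" in the report counts values that occur exactly once.
unique_values = [value for value, count in counts.items() if count == 1]

print(total, round(mean_length, 7), unique_values)
```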

Length

Histogram of lengths of the category

Common Values (Plot)


Words
Value  Count  Frequency (%)
na     1894   91.6%
3      100    4.8%
1      71     3.4%
2      1      < 0.1%
8      1      < 0.1%

월사용량 (monthly usage)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 70
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.706821
Minimum: 0
Maximum: 100000
Zeros: 226
Zeros (%): 10.9%
Negative: 0
Negative (%): 0.0%
Memory size: 18.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 10
Q3: 20
95-th percentile: 100
Maximum: 100000
Range: 100000
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 2215.3937
Coefficient of variation (CV): 23.896771
Kurtosis: 2004.9832
Mean: 92.706821
Median Absolute Deviation (MAD): 9
Skewness: 44.485738
Sum: 191625
Variance: 4907969.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value              Count  Frequency (%)
10                 901    43.6%
1                  298    14.4%
100                236    11.4%
0                  226    10.9%
15                 93     4.5%
50                 57     2.8%
20                 43     2.1%
40                 28     1.4%
30                 25     1.2%
25                 13     0.6%
Other values (60)  147    7.1%

Minimum 10 values
Value  Count  Frequency (%)
0      226    10.9%
1      298    14.4%
2      4      0.2%
3      2      0.1%
5      2      0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
10     901    43.6%
11     4      0.2%

Maximum 10 values
Value   Count  Frequency (%)
100000  1      < 0.1%
10000   1      < 0.1%
5000    1      < 0.1%
3120    1      < 0.1%
1500    1      < 0.1%
1000    11     0.5%
800     1      < 0.1%
713     1      < 0.1%
600     1      < 0.1%
510     1      < 0.1%
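
The gap between the mean (92.7) and the median (10) of 월사용량 is driven largely by a single record with the value 100000. The sketch below recomputes the mean and the coefficient of variation from the reported sum, count and standard deviation, and then shows how the mean changes when that one maximum is left out. It is only an illustration of the skew; nothing in the report says the record is an error.

```python
# Figures copied from the 월사용량 section of this report.
n = 2067
total = 191625
maximum = 100000      # occurs once according to the extreme-values table
std = 2215.3937       # as displayed in the report

mean = total / n
cv = std / mean                               # coefficient of variation
mean_without_max = (total - maximum) / (n - 1)

print(f"mean: {mean:.6f}")
print(f"CV: {cv:.4f}")
print(f"mean without the largest record: {mean_without_max:.2f}")
```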

구경 (pipe diameter)
Real number (ℝ)

Distinct: 14
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.707305
Minimum: 13
Maximum: 400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.3 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 15
Median: 15
Q3: 25
95-th percentile: 100
Maximum: 400
Range: 387
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 44.164328
Coefficient of variation (CV): 1.3102302
Kurtosis: 29.331605
Mean: 33.707305
Median Absolute Deviation (MAD): 0
Skewness: 4.6585068
Sum: 69673
Variance: 1950.4879
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Common values
Value             Count  Frequency (%)
15                1089   52.7%
25                286    13.8%
20                224    10.8%
100               201    9.7%
40                72     3.5%
32                66     3.2%
50                52     2.5%
80                28     1.4%
200               13     0.6%
150               13     0.6%
Other values (4)  23     1.1%

Minimum 10 values
Value  Count  Frequency (%)
13     2      0.1%
15     1089   52.7%
20     224    10.8%
25     286    13.8%
32     66     3.2%
40     72     3.5%
50     52     2.5%
80     28     1.4%
100    201    9.7%
150    13     0.6%

Maximum 10 values
Value  Count  Frequency (%)
400    11     0.5%
300    6      0.3%
250    4      0.2%
200    13     0.6%
150    13     0.6%
100    201    9.7%
80     28     1.4%
50     52     2.5%
40     72     3.5%
32     66     3.2%

전수 (number of taps)
Real number (ℝ)

ZEROS 

Distinct: 37
Distinct (%): 1.8%
Missing: 2
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9719128
Minimum: 0
Maximum: 115
Zeros: 25
Zeros (%): 1.2%
Negative: 0
Negative (%): 0.0%
Memory size: 18.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 8
Maximum: 115
Range: 115
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.6972474
Coefficient of variation (CV): 2.3820766
Kurtosis: 185.8225
Mean: 1.9719128
Median Absolute Deviation (MAD): 0
Skewness: 10.505005
Sum: 4072
Variance: 22.064133
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Common values
Value              Count  Frequency (%)
1                  1851   89.6%
8                  36     1.7%
0                  25     1.2%
2                  17     0.8%
12                 16     0.8%
3                  16     0.8%
10                 10     0.5%
16                 9      0.4%
5                  8      0.4%
4                  8      0.4%
Other values (27)  69     3.3%

Minimum 10 values
Value  Count  Frequency (%)
0      25     1.2%
1      1851   89.6%
2      17     0.8%
3      16     0.8%
4      8      0.4%
5      8      0.4%
6      5      0.2%
7      8      0.4%
8      36     1.7%
9      7      0.3%

Maximum 10 values
Value  Count  Frequency (%)
115    1      < 0.1%
49     1      < 0.1%
47     1      < 0.1%
42     1      < 0.1%
41     2      0.1%
35     1      < 0.1%
33     1      < 0.1%
32     2      0.1%
30     2      0.1%
29     1      < 0.1%
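
전수 is the only column with missing cells, so it is worth checking which denominator the reported mean uses. The sketch below compares the sum divided by the non-missing count against the sum divided by all rows. Only the first matches the reported mean of 1.9719128, which suggests the two missing rows are excluded from the statistics.

```python
# Figures copied from the 전수 section of this report.
n_rows = 2067
n_missing = 2
total = 4072

mean_non_missing = total / (n_rows - n_missing)  # missing rows excluded
mean_all_rows = total / n_rows                   # missing rows counted as zero

print(f"excluding missing rows: {mean_non_missing:.7f}")
print(f"over all rows:          {mean_all_rows:.7f}")
```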

Interactions

[Figures: 25 pairwise interaction plots; images not preserved in this extraction]

Correlations

            연번    사업소코드  사업소명  상수도업종  월사용량  구경    전수
연번        1.000   0.000       0.090     0.000       0.000     0.000   0.026
사업소코드  0.000   1.000       1.000     0.000       0.000     0.069   0.101
사업소명    0.090   1.000       1.000     0.000       0.000     0.623   0.196
상수도업종  0.000   0.000       0.000     1.000       NaN       0.000   0.000
월사용량    0.000   0.000       0.000     NaN         1.000     0.654   0.000
구경        0.000   0.069       0.623     0.000       0.654     1.000   0.000
전수        0.026   0.101       0.196     0.000       0.000     0.000   1.000

            상수도업종  사업소명
상수도업종  1.000       0.000
사업소명    0.000       1.000

            연번    사업소코드  월사용량  구경    전수    사업소명  상수도업종
연번        1.000   0.002       0.045     -0.002  -0.029  0.038     0.000
사업소코드  0.002   1.000       -0.174    -0.150  -0.091  0.998     0.000
월사용량    0.045   -0.174      1.000     0.109   0.126   0.000     1.000
구경        -0.002  -0.150      0.109     1.000   -0.261  0.320     0.000
전수        -0.029  -0.091      0.126     -0.261  1.000   0.078     0.000
사업소명    0.038   0.998       0.000     0.320   0.078   1.000     0.000
상수도업종  0.000   0.000       1.000     0.000   0.000   0.000     1.000
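
The four "High correlation" alerts appear to come from the last matrix above. The sketch below scans that matrix for off-diagonal pairs whose absolute value reaches a cut-off. The 0.5 threshold is an assumption, since the report does not state it, but with it the scan returns exactly the two pairs named in the alerts.

```python
# Values copied from the last correlation matrix in this report.
columns = ["연번", "사업소코드", "월사용량", "구경", "전수", "사업소명", "상수도업종"]
matrix = [
    [1.000, 0.002, 0.045, -0.002, -0.029, 0.038, 0.000],
    [0.002, 1.000, -0.174, -0.150, -0.091, 0.998, 0.000],
    [0.045, -0.174, 1.000, 0.109, 0.126, 0.000, 1.000],
    [-0.002, -0.150, 0.109, 1.000, -0.261, 0.320, 0.000],
    [-0.029, -0.091, 0.126, -0.261, 1.000, 0.078, 0.000],
    [0.038, 0.998, 0.000, 0.320, 0.078, 1.000, 0.000],
    [0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report

# Walk the upper triangle only, so each pair is reported once.
pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
    if abs(matrix[i][j]) >= THRESHOLD
]

for left, right, value in pairs:
    print(f"{left} <-> {right}: {value:.3f}")
```

Each pair shows up twice in the Alerts list because the report raises one alert per column involved.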

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows
Index  연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
0      1     311         강서 사업소     <NA>        10        15    1
1      2     304         부산진 사업소   <NA>        1         15    1
2      3     304         부산진 사업소   <NA>        1         25    1
3      4     311         강서 사업소     <NA>        10        15    1
4      5     312         기장 사업소     <NA>        10        15    1
5      6     244         북부통합사업소  <NA>        10        15    1
6      7     312         기장 사업소     3           15        20    1
7      8     312         기장 사업소     1           15        15    1
8      9     312         기장 사업소     <NA>        10        15    1
9      10    307         북부 사업소     <NA>        30        15    1

Last rows
Index  연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
2057   2058  307         북부 사업소     <NA>        20        25    1
2058   2059  308         해운대 사업소   <NA>        15        15    1
2059   2060  244         북부통합사업소  <NA>        11        25    1
2060   2061  244         북부통합사업소  <NA>        10        40    1
2061   2062  307         북부 사업소     <NA>        10        20    1
2062   2063  307         북부 사업소     <NA>        10        20    1
2063   2064  304         부산진 사업소   <NA>        1         15    1
2064   2065  311         강서 사업소     <NA>        0         15    1
2065   2066  303         영도 사업소     <NA>        0         200   0
2066   2067  312         기장 사업소     <NA>        100       100   1