Overview

Dataset statistics

Number of variables: 7
Number of observations: 4802
Missing cells: 8
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 286.2 KiB
Average record size in memory: 61.0 B
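The derived figures in this block follow from the raw counts by simple arithmetic. The sketch below recomputes the missing-cell percentage and the average record size from the numbers listed above. The variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 7
n_observations = 4802
missing_cells = 8
total_size_kib = 286.2

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing share comes out well below 0.1%, which is why the report shows it as "< 0.1%".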

Variable types

Numeric: 5
Categorical: 2

Dataset

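Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수공사(신청승낙)_20220131 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil application records, water supply works (application approved), 20220131)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083686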

Alerts

사업소코드 is highly overall correlated with 사업소명 [High correlation]
월사용량 is highly overall correlated with 상수도업종 [High correlation]
사업소명 is highly overall correlated with 사업소코드 [High correlation]
상수도업종 is highly overall correlated with 월사용량 [High correlation]
상수도업종 is highly imbalanced (81.6%) [Imbalance]
월사용량 is highly skewed (γ1 = 66.19852167) [Skewed]
연번 has unique values [Unique]
월사용량 has 466 (9.7%) zeros [Zeros]
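The 81.6% imbalance figure can be reproduced from the category counts of 상수도업종 with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class shares and k is the number of classes. The sketch below is an illustration under that assumption. It is not taken from the profiling library's source, and the function name `imbalance_score` is hypothetical.

```python
import math

# Category counts for 상수도업종, copied from the report's "Common Values" table.
counts = {"<NA>": 4475, "일반용": 197, "가정용": 128, "공업용수": 1, "사회복지": 1}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    class_counts = list(class_counts)
    k = len(class_counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts.values())
print(f"imbalance: {100 * score:.1f}%")
```

Under this definition the score is driven almost entirely by the `<NA>` category, which holds 4475 of the 4802 rows.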

Reproduction

Analysis started: 2023-12-10 16:51:06.201274
Analysis finished: 2023-12-10 16:51:11.587346
Duration: 5.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
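A report of this kind can be regenerated with ydata-profiling along the following lines. This is a minimal sketch, not the exact procedure used for this report: the function name `build_report`, the file names, and the `cp949` encoding are assumptions, and the original run may have used settings from its config.json that are not shown here.

```python
from pathlib import Path


def build_report(csv_path: str, html_path: str = "report.html", encoding: str = "cp949") -> Path:
    """Profile a CSV file and write the HTML report; returns the output path."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source file is a CSV; adjust the encoding to match the actual file.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title=Path(csv_path).stem)
    report.to_file(html_path)
    return Path(html_path)


# Example call (hypothetical file name):
# build_report("water_supply_applications.csv")
```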

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 4802
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2401.5
Minimum: 1
Maximum: 4802
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 241.05
Q1: 1201.25
median: 2401.5
Q3: 3601.75
95-th percentile: 4561.95
Maximum: 4802
Range: 4801
Interquartile range (IQR): 2400.5

Descriptive statistics

Standard deviation: 1386.3623
Coefficient of variation (CV): 0.57729016
Kurtosis: -1.2
Mean: 2401.5
Median Absolute Deviation (MAD): 1200.5
Skewness: 0
Sum: 11532003
Variance: 1922000.5
Monotonicity: Strictly increasing
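Because 연번 is unique, strictly increasing, and runs from 1 to 4802, its statistics can be checked against a plain integer sequence. The sketch below assumes the column is exactly 1, 2, ..., 4802 and uses only the Python standard library. The quartiles use linear interpolation (`method="inclusive"`), which is an assumption about how the report computes them.

```python
import math
import statistics

n = 4802                          # number of observations in the report
values = list(range(1, n + 1))    # assumption: 연번 is exactly 1, 2, ..., n

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(total, mean, variance)
print(round(std, 4), round(cv, 8))
print(q1, median, q3, q3 - q1)
```

The sample variance (n - 1 denominator) reproduces the reported 1922000.5, whereas the population variance would not.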
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
1                    1      < 0.1%
3209                 1      < 0.1%
3207                 1      < 0.1%
3206                 1      < 0.1%
3205                 1      < 0.1%
3204                 1      < 0.1%
3203                 1      < 0.1%
3202                 1      < 0.1%
3201                 1      < 0.1%
3200                 1      < 0.1%
Other values (4792)  4792   99.8%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
4802   1      < 0.1%
4801   1      < 0.1%
4800   1      < 0.1%
4799   1      < 0.1%
4798   1      < 0.1%
4797   1      < 0.1%
4796   1      < 0.1%
4795   1      < 0.1%
4794   1      < 0.1%
4793   1      < 0.1%

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.3586
Minimum: 101
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.3 KiB

Quantile statistics

Minimum: 101
5-th percentile: 244
Q1: 303
median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 211
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 25.782382
Coefficient of variation (CV): 0.086997244
Kurtosis: 3.0739613
Mean: 296.3586
Median Absolute Deviation (MAD): 4
Skewness: -1.892807
Sum: 1423114
Variance: 664.7312
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=13)]

Common values

Value             Count  Frequency (%)
311               857    17.8%
244               820    17.1%
312               755    15.7%
307               527    11.0%
306               463    9.6%
304               374    7.8%
308               281    5.9%
309               238    5.0%
301               183    3.8%
302               141    2.9%
Other values (3)  163    3.4%

Minimum 10 values

Value  Count  Frequency (%)
101    3      0.1%
201    27     0.6%
244    820    17.1%
301    183    3.8%
302    141    2.9%
303    133    2.8%
304    374    7.8%
306    463    9.6%
307    527    11.0%
308    281    5.9%

Maximum 10 values

Value  Count  Frequency (%)
312    755    15.7%
311    857    17.8%
309    238    5.0%
308    281    5.9%
307    527    11.0%
306    463    9.6%
304    374    7.8%
303    133    2.8%
302    141    2.9%
301    183    3.8%

사업소명 (office name)
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

Value             Count
강서 사업소       857
동래통합사업소    820
기장 사업소       755
북부 사업소       527
남부 사업소       463
Other values (8)  1380

Length

Max length: 9
Median length: 9
Mean length: 8.4702207
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강서 사업소
2nd row: 기장 사업소
3rd row: 남부 사업소
4th row: 부산진 사업소
5th row: 부산진 사업소

Common Values

Value             Count  Frequency (%)
강서 사업소       857    17.8%
동래통합사업소    820    17.1%
기장 사업소       755    15.7%
북부 사업소       527    11.0%
남부 사업소       463    9.6%
부산진 사업소     374    7.8%
해운대 사업소     281    5.9%
사하 사업소       238    5.0%
중동부 사업소     183    3.8%
서부 사업소       141    2.9%
Other values (3)  163    3.4%

Length

[Figure: histogram of lengths of the category]

Value             Count  Frequency (%)
사업소            3952   45.1%
강서              857    9.8%
동래통합사업소    820    9.4%
기장              755    8.6%
북부              527    6.0%
남부              463    5.3%
부산진            374    4.3%
해운대            281    3.2%
사하              238    2.7%
중동부            183    2.1%
Other values (4)  304    3.5%

상수도업종 (water service category)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 37.6 KiB

Value     Count
<NA>      4475
일반용    197
가정용    128
공업용수  1
사회복지  1

Length

Max length: 4
Median length: 4
Mean length: 3.9323199
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: 가정용
3rd row: 가정용
4th row: <NA>
5th row: <NA>

Common Values

Value     Count  Frequency (%)
<NA>      4475   93.2%
일반용    197    4.1%
가정용    128    2.7%
공업용수  1      < 0.1%
사회복지  1      < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value     Count  Frequency (%)
na        4475   93.2%
일반용    197    4.1%
가정용    128    2.7%
공업용수  1      < 0.1%
사회복지  1      < 0.1%

월사용량 (monthly usage)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 107
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.55581
Minimum: 0
Maximum: 100000
Zeros: 466
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 42.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
median: 10
Q3: 20
95-th percentile: 100
Maximum: 100000
Range: 100000
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 1465.7565
Coefficient of variation (CV): 23.062509
Kurtosis: 4504.654
Mean: 63.55581
Median Absolute Deviation (MAD): 7
Skewness: 66.198522
Sum: 305195
Variance: 2148442
Monotonicity: Not monotonic
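The gap between the mean (63.56) and the median (10) is largely due to a single extreme value. The sketch below uses only the reported sum, row count, and maximum to show how much the one observation of 100000 moves the mean. The variable names are illustrative, and whether that observation is a data-entry error cannot be determined from the report.

```python
n = 4802            # observations
total = 305195      # "Sum" from the descriptive statistics
largest = 100000    # "Maximum"; the report lists this value with a count of 1

mean_all = total / n
mean_without_largest = (total - largest) / (n - 1)
share_of_total = largest / total

print(f"mean, all rows:            {mean_all:.2f}")
print(f"mean, largest row dropped: {mean_without_largest:.2f}")
print(f"largest row as share of sum: {100 * share_of_total:.1f}%")
```

A robust summary such as the median or MAD is less sensitive to this single row than the mean, standard deviation, or skewness.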
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
10                 2156   44.9%
1                  716    14.9%
0                  466    9.7%
100                458    9.5%
15                 200    4.2%
50                 175    3.6%
30                 116    2.4%
20                 76     1.6%
40                 55     1.1%
60                 33     0.7%
Other values (97)  351    7.3%

Minimum 10 values

Value  Count  Frequency (%)
0      466    9.7%
1      716    14.9%
2      7      0.1%
3      7      0.1%
5      6      0.1%
6      1      < 0.1%
7      1      < 0.1%
8      6      0.1%
10     2156   44.9%
11     4      0.1%

Maximum 10 values

Value   Count  Frequency (%)
100000  1      < 0.1%
10000   2      < 0.1%
5000    2      < 0.1%
3120    1      < 0.1%
3000    2      < 0.1%
2316    2      < 0.1%
1500    2      < 0.1%
1000    19     0.4%
900     1      < 0.1%
800     2      < 0.1%

구경 (pipe diameter)
Real number (ℝ)

Distinct: 14
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.304456
Minimum: 13
Maximum: 400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.3 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 15
median: 15
Q3: 25
95-th percentile: 100
Maximum: 400
Range: 387
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 41.686786
Coefficient of variation (CV): 1.3316566
Kurtosis: 37.840023
Mean: 31.304456
Median Absolute Deviation (MAD): 0
Skewness: 5.3480935
Sum: 150324
Variance: 1737.7881
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=14)]

Common values

Value             Count  Frequency (%)
15                2576   53.6%
25                697    14.5%
20                550    11.5%
100               351    7.3%
40                175    3.6%
32                150    3.1%
50                141    2.9%
80                64     1.3%
200               28     0.6%
150               27     0.6%
Other values (4)  43     0.9%

Minimum 10 values

Value  Count  Frequency (%)
13     3      0.1%
15     2576   53.6%
20     550    11.5%
25     697    14.5%
32     150    3.1%
40     175    3.6%
50     141    2.9%
80     64     1.3%
100    351    7.3%
150    27     0.6%

Maximum 10 values

Value  Count  Frequency (%)
400    27     0.6%
300    9      0.2%
250    4      0.1%
200    28     0.6%
150    27     0.6%
100    351    7.3%
80     64     1.3%
50     141    2.9%
40     175    3.6%
32     150    3.1%
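Since 구경 has only 14 distinct values, the frequency tables above list every value with its count, so the column can be rebuilt and its summary statistics rechecked. The sketch below does this with the standard library. It also shows why the MAD is 0: more than half of the rows equal the median of 15, so the median absolute deviation collapses to zero. The `counts` dictionary is assembled from the report's tables.

```python
import statistics

# Complete value -> count table for 구경, assembled from the report's frequency tables.
counts = {13: 3, 15: 2576, 20: 550, 25: 697, 32: 150, 40: 175, 50: 141,
          80: 64, 100: 351, 150: 27, 200: 28, 250: 4, 300: 9, 400: 27}

# Expand the table back into one value per row, in ascending order.
values = [v for v, c in sorted(counts.items()) for _ in range(c)]

n = len(values)
total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(n, total, round(mean, 6), median, mad)
```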

전수 (number of taps)
Real number (ℝ)

Distinct: 42
Distinct (%): 0.9%
Missing: 8
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8224864
Minimum: 0
Maximum: 115
Zeros: 47
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 8
Maximum: 115
Range: 115
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.0136928
Coefficient of variation (CV): 2.202317
Kurtosis: 174.50908
Mean: 1.8224864
Median Absolute Deviation (MAD): 0
Skewness: 9.8938747
Sum: 8737
Variance: 16.10973
Monotonicity: Not monotonic
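전수 is the only column with missing values, so it matters which denominator each statistic uses. The sketch below, using only figures from this section, suggests that the reported mean is taken over the non-missing rows, while the missing and zero percentages are taken over all rows. The variable names are illustrative.

```python
n_rows = 4802      # all observations
n_missing = 8      # "Missing"
n_zeros = 47       # "Zeros"
total = 8737       # "Sum"

n_valid = n_rows - n_missing

mean_over_valid = total / n_valid     # matches the reported mean
mean_over_all = total / n_rows        # what the mean would be if missing rows counted as 0

missing_pct = 100 * n_missing / n_rows
zeros_pct = 100 * n_zeros / n_rows

print(f"{mean_over_valid:.7f} {mean_over_all:.7f}")
print(f"missing {missing_pct:.1f}%, zeros {zeros_pct:.1f}%")
```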
[Figure: histogram with fixed size bins (bins=42)]

Common values

Value              Count  Frequency (%)
1                  4353   90.6%
8                  79     1.6%
0                  47     1.0%
12                 37     0.8%
2                  36     0.7%
3                  31     0.6%
10                 22     0.5%
5                  20     0.4%
16                 20     0.4%
6                  17     0.4%
Other values (32)  132    2.7%

Minimum 10 values

Value  Count  Frequency (%)
0      47     1.0%
1      4353   90.6%
2      36     0.7%
3      31     0.6%
4      16     0.3%
5      20     0.4%
6      17     0.4%
7      12     0.2%
8      79     1.6%
9      8      0.2%

Maximum 10 values

Value  Count  Frequency (%)
115    1      < 0.1%
62     1      < 0.1%
49     1      < 0.1%
48     1      < 0.1%
47     1      < 0.1%
42     2      < 0.1%
41     2      < 0.1%
40     1      < 0.1%
35     1      < 0.1%
33     1      < 0.1%

Interactions

[Figures: 25 pairwise interaction plots (5 × 5 grid over the numeric variables); images not preserved]

Correlations

[Figure: correlation heatmap]

            연번        사업소코드  사업소명    상수도업종  월사용량    구경        전수
연번        1.000       0.019       0.000       0.149       0.000       0.000       0.000
사업소코드  0.019       1.000       1.000       0.000       0.030       0.857       0.063
사업소명    0.000       1.000       1.000       0.000       0.000       0.635       0.140
상수도업종  0.149       0.000       0.000       1.000       NaN         0.292       0.000
월사용량    0.000       0.030       0.000       NaN         1.000       0.656       0.000
구경        0.000       0.857       0.635       0.292       0.656       1.000       0.000
전수        0.000       0.063       0.140       0.000       0.000       0.000       1.000

[Figure: correlation heatmap]

            상수도업종  사업소명
상수도업종  1.000       0.000
사업소명    0.000       1.000

[Figure: correlation heatmap]

            연번        사업소코드  월사용량    구경        전수        사업소명    상수도업종
연번        1.000       -0.001      0.029       -0.011      -0.003      0.000       0.088
사업소코드  -0.001      1.000       -0.153      -0.132      -0.091      0.999       0.000
월사용량    0.029       -0.153      1.000       0.087       0.134       0.000       1.000
구경        -0.011      -0.132      0.087       1.000       -0.249      0.352       0.191
전수        -0.003      -0.091      0.134       -0.249      1.000       0.065       0.000
사업소명    0.000       0.999       0.000       0.352       0.065       1.000       0.000
상수도업종  0.088       0.000       1.000       0.191       0.000       0.000       1.000
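The near-perfect association between 사업소코드 and 사업소명 (0.999 to 1.000), together with their identical distinct counts and frequencies, suggests that each name is simply a label for one code. The sketch below shows one way to check such a one-to-one mapping. The helper `is_one_to_one` is hypothetical, and the pairs are only those visible in the report's sample rows, so a full check would need the underlying data.

```python
def is_one_to_one(pairs):
    """Return True if every left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        # setdefault stores the first partner seen and returns it on later visits.
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True


# (사업소코드, 사업소명) pairs as they appear in the report's sample rows.
sample_pairs = [
    (311, "강서 사업소"), (312, "기장 사업소"), (306, "남부 사업소"),
    (304, "부산진 사업소"), (304, "부산진 사업소"), (244, "동래통합사업소"),
    (302, "서부 사업소"), (309, "사하 사업소"), (201, "시설관리사업소"),
]

print(is_one_to_one(sample_pairs))
```

If the mapping holds for the full data, one of the two columns is redundant for modeling purposes.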

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
0     1     311         강서 사업소     <NA>        10        15    1
1     2     312         기장 사업소     가정용      42        20    1
2     3     306         남부 사업소     가정용      21        15    1
3     4     304         부산진 사업소   <NA>        1         15    1
4     5     304         부산진 사업소   <NA>        1         25    1
5     6     311         강서 사업소     <NA>        10        15    1
6     7     312         기장 사업소     <NA>        10        15    1
7     8     244         동래통합사업소  <NA>        10        15    1
8     9     312         기장 사업소     일반용      15        20    1
9     10    312         기장 사업소     가정용      15        15    1

Last rows

      연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
4792  4793  302         서부 사업소     <NA>        100       15    8
4793  4794  306         남부 사업소     <NA>        10        25    1
4794  4795  312         기장 사업소     일반용      60        32    1
4795  4796  244         동래통합사업소  <NA>        1         25    1
4796  4797  309         사하 사업소     <NA>        100       20    3
4797  4798  311         강서 사업소     <NA>        10        80    1
4798  4799  201         시설관리사업소  <NA>        1         200   1
4799  4800  306         남부 사업소     <NA>        26        15    16
4800  4801  311         강서 사업소     <NA>        10        15    1
4801  4802  244         동래통합사업소  <NA>        1         25    1