Overview

Dataset statistics

Number of variables: 7
Number of observations: 3656
Missing cells: 4
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 217.9 KiB
Average record size in memory: 61.0 B
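
The derived figures in this block follow from the raw counts. The sketch below recomputes the missing-cell percentage and the average record size, assuming "cells" means variables times observations and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 7
n_observations = 3656
missing_cells = 4
total_size_kib = 217.9

# Assumption: the cell count is simply variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")      # well under the 0.1% shown
print(f"average record size: {avg_record_bytes:.1f} B")
```

With only 4 of 25592 cells missing, the share is far below 0.1%, which is why the report prints "< 0.1%" rather than a rounded value.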

Variable types

Numeric: 5
Categorical: 2

Dataset

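Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수공사(신청승낙)_20230126 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil application information, water supply works (application approved), 20230126)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083686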

Alerts

High correlation: 사업소코드 is highly overall correlated with 사업소명
High correlation: 월사용량 is highly overall correlated with 상수도업종
High correlation: 사업소명 is highly overall correlated with 사업소코드
High correlation: 상수도업종 is highly overall correlated with 월사용량
Imbalance: 상수도업종 is highly imbalanced (82.0%)
Skewed: 월사용량 is highly skewed (γ1 = 52.82985869)
Unique: 연번 has unique values
Zeros: 월사용량 has 38 (1.0%) zeros

Reproduction

Analysis started: 2023-12-10 16:50:55.255613
Analysis finished: 2023-12-10 16:51:00.191402
Duration: 4.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
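
The reported duration can be checked against the two timestamps. This is a minimal sketch using only the standard library; it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2023-12-10 16:50:55.255613")
finished = datetime.fromisoformat("2023-12-10 16:51:00.191402")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```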

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 3656
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1828.5
Minimum: 1
Maximum: 3656
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 183.75
Q1: 914.75
Median: 1828.5
Q3: 2742.25
95-th percentile: 3473.25
Maximum: 3656
Range: 3655
Interquartile range (IQR): 1827.5

Descriptive statistics

Standard deviation: 1055.5406
Coefficient of variation (CV): 0.57727133
Kurtosis: -1.2
Mean: 1828.5
Median Absolute Deviation (MAD): 914
Skewness: 0
Sum: 6684996
Variance: 1114166
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
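
The 연번 statistics are what one would expect from a plain row counter. The sketch below assumes the column is exactly the sequence 1, 2, ..., 3656 (consistent with the minimum, maximum, distinct count, and strict monotonicity reported above) and recomputes the sum, mean, variance, and standard deviation. It also compares against the closed forms n(n+1)/2 for the sum and n(n+1)/12 for the sample variance.

```python
import math
import statistics

n = 3656                           # number of observations in the report
values = list(range(1, n + 1))     # assumption: 연번 is exactly 1..n

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean

# Closed forms for the integers 1..n.
closed_form_sum = n * (n + 1) // 2
closed_form_variance = n * (n + 1) / 12

print(total, mean, variance)
print(f"std = {std:.4f}, cv = {cv:.8f}")
```

The kurtosis of -1.2 and skewness of 0 are likewise what a uniform spread of values gives, so none of these figures carry information about the data beyond the row count.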
Value                Count  Frequency (%)
1                    1      < 0.1%
2430                 1      < 0.1%
2432                 1      < 0.1%
2433                 1      < 0.1%
2434                 1      < 0.1%
2435                 1      < 0.1%
2436                 1      < 0.1%
2437                 1      < 0.1%
2438                 1      < 0.1%
2439                 1      < 0.1%
Other values (3646)  3646   99.7%

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Value  Count  Frequency (%)
3656   1      < 0.1%
3655   1      < 0.1%
3654   1      < 0.1%
3653   1      < 0.1%
3652   1      < 0.1%
3651   1      < 0.1%
3650   1      < 0.1%
3649   1      < 0.1%
3648   1      < 0.1%
3647   1      < 0.1%

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 297.84382
Minimum: 101
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 101
5-th percentile: 244
Q1: 304
Median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 211
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 25.169585
Coefficient of variation (CV): 0.084505985
Kurtosis: 4.8026857
Mean: 297.84382
Median Absolute Deviation (MAD): 4
Skewness: -2.1798632
Sum: 1088917
Variance: 633.50802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value             Count  Frequency (%)
311               730    20.0%
312               667    18.2%
244               545    14.9%
306               382    10.4%
307               341    9.3%
304               298    8.2%
309               212    5.8%
308               183    5.0%
301               94     2.6%
302               88     2.4%
Other values (3)  116    3.2%

Value  Count  Frequency (%)
101    3      0.1%
201    26     0.7%
244    545    14.9%
301    94     2.6%
302    88     2.4%
303    87     2.4%
304    298    8.2%
306    382    10.4%
307    341    9.3%
308    183    5.0%

Value  Count  Frequency (%)
312    667    18.2%
311    730    20.0%
309    212    5.8%
308    183    5.0%
307    341    9.3%
306    382    10.4%
304    298    8.2%
303    87     2.4%
302    88     2.4%
301    94     2.6%

사업소명 (office name)
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 28.7 KiB

강서 사업소: 730
기장 사업소: 667
동래통합사업소: 545
남부 사업소: 382
북부 사업소: 341
Other values (8): 991

Length

Max length: 9
Median length: 9
Mean length: 8.5270788
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 북부 사업소
2nd row: 강서 사업소
3rd row: 기장 사업소
4th row: 강서 사업소
5th row: 강서 사업소

Common Values

Value             Count  Frequency (%)
강서 사업소        730    20.0%
기장 사업소        667    18.2%
동래통합사업소      545    14.9%
남부 사업소        382    10.4%
북부 사업소        341    9.3%
부산진 사업소      298    8.2%
사하 사업소        212    5.8%
해운대 사업소      183    5.0%
중동부 사업소      94     2.6%
서부 사업소        88     2.4%
Other values (3)  116    3.2%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
사업소             3082   45.7%
강서               730    10.8%
기장               667    9.9%
동래통합사업소      545    8.1%
남부               382    5.7%
북부               341    5.1%
부산진             298    4.4%
사하               212    3.1%
해운대             183    2.7%
중동부             94     1.4%
Other values (4)  204    3.0%

상수도업종 (water service category)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.7 KiB

<NA>: 3417
일반용: 130
가정용: 107
욕탕용: 1
공동수도: 1

Length

Max length: 4
Median length: 4
Mean length: 3.9349015
Min length: 3

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: 일반용

Common Values

Value     Count  Frequency (%)
<NA>      3417   93.5%
일반용     130    3.6%
가정용     107    2.9%
욕탕용     1      < 0.1%
공동수도   1      < 0.1%
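
The 82.0% imbalance figure reported for 상수도업종 can be reproduced from the category counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the maximum possible entropy, log2 of the number of classes. That formula matches the reported value here, but treat it as an assumption about how the tool derives the score; the function name `imbalance_score` is illustrative.

```python
import math

# Category counts copied from the 상수도업종 "Common Values" table.
counts = {"<NA>": 3417, "일반용": 130, "가정용": 107, "욕탕용": 1, "공동수도": 1}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    k = len(class_counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))
print(f"imbalance: {score:.1%}")
```

Note that the dominant "class" is the literal string <NA>, which the report counts as a category rather than as a missing value.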

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
na        3417   93.5%
일반용     130    3.6%
가정용     107    2.9%
욕탕용     1      < 0.1%
공동수도   1      < 0.1%

월사용량 (monthly usage)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 83
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 164.52817
Minimum: 0
Maximum: 270000
Zeros: 38
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 10
Median: 10
Q3: 12.75
95-th percentile: 100
Maximum: 270000
Range: 270000
Interquartile range (IQR): 2.75

Descriptive statistics

Standard deviation: 4718.7392
Coefficient of variation (CV): 28.680433
Kurtosis: 2956.1663
Mean: 164.52817
Median Absolute Deviation (MAD): 0
Skewness: 52.829859
Sum: 601515
Variance: 22266500
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
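
Several of the 월사용량 figures are derived from others, which makes the effect of the few extreme values easy to see. The sketch below recomputes the mean, coefficient of variation, variance, and IQR from the reported sum, standard deviation, and quartiles. It uses only numbers printed in this report, not the raw data.

```python
# Values copied from the 월사용량 statistics in this report.
n_observations = 3656
total = 601515
std = 4718.7392
q1, q3 = 10, 12.75
median = 10

mean = total / n_observations   # mean = sum / count
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range

print(f"mean = {mean:.5f}, median = {median}")
print(f"cv = {cv:.6f}, variance = {variance:.0f}, iqr = {iqr}")
```

The mean is roughly 16 times the median while the IQR is only 2.75, which is consistent with the skewness alert: a handful of very large values (up to 270000) dominate the mean and standard deviation.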
Value              Count  Frequency (%)
10                 1991   54.5%
1                  695    19.0%
100                228    6.2%
30                 176    4.8%
50                 112    3.1%
20                 77     2.1%
15                 62     1.7%
0                  38     1.0%
40                 29     0.8%
500                25     0.7%
Other values (73)  223    6.1%

Value  Count  Frequency (%)
0      38     1.0%
1      695    19.0%
2      3      0.1%
3      1      < 0.1%
4      1      < 0.1%
5      8      0.2%
6      2      0.1%
10     1991   54.5%
11     1      < 0.1%
12     2      0.1%

Value   Count  Frequency (%)
270000  1      < 0.1%
85000   1      < 0.1%
22900   1      < 0.1%
15900   1      < 0.1%
10000   1      < 0.1%
8825    1      < 0.1%
8600    3      0.1%
5000    3      0.1%
4500    1      < 0.1%
2580    1      < 0.1%

구경 (pipe diameter)
Real number (ℝ)

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.740153
Minimum: 15
Maximum: 400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 15
5-th percentile: 15
Q1: 15
Median: 15
Q3: 25
95-th percentile: 100
Maximum: 400
Range: 385
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 37.060099
Coefficient of variation (CV): 1.2461301
Kurtosis: 42.602399
Mean: 29.740153
Median Absolute Deviation (MAD): 0
Skewness: 5.5673095
Sum: 108730
Variance: 1373.4509
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value             Count  Frequency (%)
15                1978   54.1%
25                560    15.3%
20                359    9.8%
100               193    5.3%
50                177    4.8%
40                158    4.3%
32                115    3.1%
80                46     1.3%
150               26     0.7%
200               23     0.6%
Other values (3)  21     0.6%

Value  Count  Frequency (%)
15     1978   54.1%
20     359    9.8%
25     560    15.3%
32     115    3.1%
40     158    4.3%
50     177    4.8%
80     46     1.3%
100    193    5.3%
150    26     0.7%
200    23     0.6%

Value  Count  Frequency (%)
400    13     0.4%
300    7      0.2%
250    1      < 0.1%
200    23     0.6%
150    26     0.7%
100    193    5.3%
80     46     1.3%
50     177    4.8%
40     158    4.3%
32     115    3.1%

전수 (number of service taps)
Real number (ℝ)

Distinct: 39
Distinct (%): 1.1%
Missing: 4
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7779299
Minimum: 0
Maximum: 52
Zeros: 35
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 7
Maximum: 52
Range: 52
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.6143488
Coefficient of variation (CV): 2.0328972
Kurtosis: 66.525133
Mean: 1.7779299
Median Absolute Deviation (MAD): 0
Skewness: 7.1944066
Sum: 6493
Variance: 13.063517
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)
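
전수 is the only column with missing values, so its mean depends on whether the 4 missing rows are counted. The sketch below shows that the reported mean matches the sum divided by the number of non-missing rows. It assumes missing values are skipped when averaging, which is how pandas behaves by default.

```python
# Values copied from the 전수 statistics in this report.
n_observations = 3656
n_missing = 4
total = 6493

n_valid = n_observations - n_missing

mean_valid = total / n_valid              # missing rows excluded
mean_all_rows = total / n_observations    # what you would get by counting them
missing_pct = 100 * n_missing / n_observations

print(f"non-missing rows: {n_valid}")
print(f"mean over non-missing rows: {mean_valid:.7f}")
print(f"mean over all rows: {mean_all_rows:.7f}")
print(f"missing: {missing_pct:.1f}%")
```

Only the first mean agrees with the 1.7779299 shown in the report; dividing by all 3656 rows gives a slightly smaller value.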
Value              Count  Frequency (%)
1                  3291   90.0%
8                  53     1.4%
2                  37     1.0%
0                  35     1.0%
4                  31     0.8%
5                  30     0.8%
3                  25     0.7%
7                  19     0.5%
12                 15     0.4%
16                 14     0.4%
Other values (29)  102    2.8%

Value  Count  Frequency (%)
0      35     1.0%
1      3291   90.0%
2      37     1.0%
3      25     0.7%
4      31     0.8%
5      30     0.8%
6      11     0.3%
7      19     0.5%
8      53     1.4%
9      6      0.2%

Value  Count  Frequency (%)
52     1      < 0.1%
49     1      < 0.1%
48     2      0.1%
45     1      < 0.1%
42     1      < 0.1%
39     1      < 0.1%
38     1      < 0.1%
37     1      < 0.1%
36     2      0.1%
32     1      < 0.1%

Interactions

[Figure: pairwise interaction plots]

Correlations

Variable    연번    사업소코드  사업소명  상수도업종  월사용량  구경    전수
연번        1.000  0.000     0.076    0.174     0.002    0.028  0.036
사업소코드   0.000  1.000     1.000    0.122     0.000    0.757  0.113
사업소명     0.076  1.000     1.000    0.465     0.000    0.545  0.143
상수도업종   0.174  0.122     0.465    1.000     NaN      0.233  0.310
월사용량     0.002  0.000     0.000    NaN       1.000    0.388  0.000
구경        0.028  0.757     0.545    0.233     0.388    1.000  0.000
전수        0.036  0.113     0.143    0.310     0.000    0.000  1.000

Variable    상수도업종  사업소명
상수도업종   1.000     0.292
사업소명     0.292     1.000

Variable    연번     사업소코드  월사용량  구경     전수     사업소명  상수도업종
연번        1.000   -0.016    -0.013   0.007   -0.007  0.031    0.102
사업소코드   -0.016  1.000     -0.024   -0.169  -0.112  0.999    0.080
월사용량     -0.013  -0.024    1.000    0.040   0.105   0.000    1.000
구경        0.007   -0.169    0.040    1.000   -0.276  0.283    0.222
전수        -0.007  -0.112    0.105    -0.276  1.000   0.058    0.125
사업소명     0.031   0.999     0.000    0.283   0.058   1.000    0.292
상수도업종   0.102   0.080     1.000    0.222   0.125   0.292    1.000
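
The two high-correlation alerts in this report line up with the last correlation matrix (the one that contains negative values). The sketch below scans its upper triangle for pairs whose absolute coefficient reaches a cut-off. The cut-off of 0.5 and the helper name `high_pairs` are assumptions for illustration, not settings read from the report's configuration.

```python
# Last correlation matrix from this report, values copied as printed.
names = ["연번", "사업소코드", "월사용량", "구경", "전수", "사업소명", "상수도업종"]
matrix = [
    [1.000, -0.016, -0.013, 0.007, -0.007, 0.031, 0.102],
    [-0.016, 1.000, -0.024, -0.169, -0.112, 0.999, 0.080],
    [-0.013, -0.024, 1.000, 0.040, 0.105, 0.000, 1.000],
    [0.007, -0.169, 0.040, 1.000, -0.276, 0.283, 0.222],
    [-0.007, -0.112, 0.105, -0.276, 1.000, 0.058, 0.125],
    [0.031, 0.999, 0.000, 0.283, 0.058, 1.000, 0.292],
    [0.102, 0.080, 1.000, 0.222, 0.125, 0.292, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, chosen for illustration


def high_pairs(labels, corr, threshold):
    """Return (a, b, r) for each pair above the diagonal with |r| >= threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i][j]) >= threshold:
                pairs.append((labels[i], labels[j], corr[i][j]))
    return pairs


flagged = high_pairs(names, matrix, THRESHOLD)
for a, b, r in flagged:
    print(f"{a} <-> {b}: {r:.3f}")
```

Under that assumed cut-off, the only pairs flagged are 사업소코드 with 사업소명 and 월사용량 with 상수도업종, the same pairs named in the Alerts section. The first pair is expected, since an office code and its office name carry the same information.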

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
0      1     307        북부 사업소     <NA>       10       15    1
1      2     311        강서 사업소     <NA>       10       15    1
2      3     312        기장 사업소     <NA>       10       15    3
3      4     311        강서 사업소     <NA>       10       15    5
4      5     311        강서 사업소     일반용      170      80    1
5      6     244        동래통합사업소   <NA>       10       25    1
6      7     306        남부 사업소     <NA>       10       15    18
7      8     304        부산진 사업소   <NA>       500      25    1
8      9     311        강서 사업소     <NA>       0        100   0
9      10    309        사하 사업소     <NA>       50       15    1

Index  연번  사업소코드  사업소명        상수도업종  월사용량  구경  전수
3646   3647  311        강서 사업소     <NA>       10       15    1
3647   3648  311        강서 사업소     <NA>       10       15    1
3648   3649  312        기장 사업소     <NA>       10       50    1
3649   3650  311        강서 사업소     <NA>       10       15    1
3650   3651  244        동래통합사업소   <NA>       10       15    1
3651   3652  304        부산진 사업소   <NA>       150      25    1
3652   3653  304        부산진 사업소   <NA>       1        15    1
3653   3654  244        동래통합사업소   <NA>       30       25    1
3654   3655  312        기장 사업소     <NA>       10       20    1
3655   3656  244        동래통합사업소   <NA>       100      100   1