Overview

Dataset statistics

Number of variables: 7
Number of observations: 2230
Missing cells: 6
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 130.8 KiB
Average record size in memory: 60.1 B
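
The two derived figures in this table follow from the raw counts. The sketch below is a minimal check, assuming the missing-cell percentage is missing cells divided by all cells (rows times columns) and the average record size is total memory divided by the number of rows. The variable names are illustrative and not part of the report.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 7
n_observations = 2230
missing_cells = 6
total_memory_kib = 130.8

# Share of missing cells among all cells (rows x columns), in percent.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the reported "< 0.1%" and "60.1 B".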

Variable types

Numeric: 4
Categorical: 3

Dataset

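Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_누수탐지비_20230126 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil application records, leak detection costs, 20230126)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083449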

Alerts

[High correlation] 사업소코드 is highly overall correlated with 사업소명 and 1 other field
[High correlation] 사업소명 is highly overall correlated with 사업소코드 and 2 other fields
[High correlation] 신청부서명 is highly overall correlated with 사업소코드 and 1 other field
[High correlation] 건물형태 is highly overall correlated with 사업소명
[Imbalance] 건물형태 is highly imbalanced (66.9%)
[Skewed] 누수탐지지급액 is highly skewed (γ1 = 39.27539373)
[Unique] 연번 has unique values
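
The 66.9% imbalance figure can be reproduced from the category counts of 건물형태. The sketch below assumes the score is one minus the Shannon entropy of the class shares, normalized by the log of the number of classes. That assumption is not stated in the report, but it matches the reported value. The function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 minus the Shannon entropy of the class shares, normalized by log2(k)."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalize
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)

# Category counts of 건물형태 as listed in its Common Values table.
building_type_counts = [1892, 245, 82, 8, 3]
score = imbalance_score(building_type_counts)
print(f"{score:.1%}")  # prints 66.9%
```

A score of 0 would mean perfectly balanced classes, and a score near 1 means a single class dominates, as 단독주택 does here with 84.8% of rows.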

Reproduction

Analysis started: 2023-12-10 16:56:29.786356
Analysis finished: 2023-12-10 16:56:33.084715
Duration: 3.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 2230
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1115.5
Minimum: 1
Maximum: 2230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 112.45
Q1: 558.25
Median: 1115.5
Q3: 1672.75
95-th percentile: 2118.55
Maximum: 2230
Range: 2229
Interquartile range (IQR): 1114.5
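
Fractional quantiles such as 112.45 arise from interpolating between neighbouring ranks. The sketch below assumes 연번 holds the integers 1 to 2230 without gaps, which is consistent with its distinct count, minimum, maximum and sum, and assumes linear interpolation between the two closest ranks. The helper `quantile` is illustrative, not taken from the report.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Assumption: 연번 runs from 1 to 2230 in steps of 1.
serial_numbers = list(range(1, 2231))

p5 = quantile(serial_numbers, 0.05)
q1 = quantile(serial_numbers, 0.25)
median = quantile(serial_numbers, 0.50)
q3 = quantile(serial_numbers, 0.75)
p95 = quantile(serial_numbers, 0.95)
iqr = q3 - q1

print(p5, q1, median, q3, p95, iqr)
```

Under these assumptions the results agree with the quantile statistics listed above, up to floating-point rounding.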

Descriptive statistics

Standard deviation: 643.88987
Coefficient of variation (CV): 0.57722086
Kurtosis: -1.2
Mean: 1115.5
Median Absolute Deviation (MAD): 557.5
Skewness: 0
Sum: 2487565
Variance: 414594.17
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
1                    1      < 0.1%
1491                 1      < 0.1%
1485                 1      < 0.1%
1486                 1      < 0.1%
1487                 1      < 0.1%
1488                 1      < 0.1%
1489                 1      < 0.1%
1490                 1      < 0.1%
1492                 1      < 0.1%
1500                 1      < 0.1%
Other values (2220)  2220   99.6%

Smallest values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Largest values

Value  Count  Frequency (%)
2230   1      < 0.1%
2229   1      < 0.1%
2228   1      < 0.1%
2227   1      < 0.1%
2226   1      < 0.1%
2225   1      < 0.1%
2224   1      < 0.1%
2223   1      < 0.1%
2222   1      < 0.1%
2221   1      < 0.1%

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 289.71883
Minimum: 244
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.7 KiB

Quantile statistics

Minimum: 244
5-th percentile: 244
Q1: 244
Median: 304
Q3: 307
95-th percentile: 311
Maximum: 312
Range: 68
Interquartile range (IQR): 63

Descriptive statistics

Standard deviation: 27.270464
Coefficient of variation (CV): 0.094127342
Kurtosis: -0.82903864
Mean: 289.71883
Median Absolute Deviation (MAD): 3
Skewness: -1.0601262
Sum: 646073
Variance: 743.6782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
244    581    26.1%
307    288    12.9%
306    236    10.6%
304    212    9.5%
303    170    7.6%
301    165    7.4%
309    160    7.2%
308    149    6.7%
302    113    5.1%
312    80     3.6%

Smallest values

Value  Count  Frequency (%)
244    581    26.1%
301    165    7.4%
302    113    5.1%
303    170    7.6%
304    212    9.5%
306    236    10.6%
307    288    12.9%
308    149    6.7%
309    160    7.2%
311    76     3.4%

Largest values

Value  Count  Frequency (%)
312    80     3.6%
311    76     3.4%
309    160    7.2%
308    149    6.7%
307    288    12.9%
306    236    10.6%
304    212    9.5%
303    170    7.6%
302    113    5.1%
301    165    7.4%
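
Because 사업소코드 has only 11 distinct values, its summary statistics can be recomputed from the value tables above without the raw rows. The sketch below combines the smallest-values and largest-values tables into one frequency table and derives the mean, median and MAD from it. The helper `weighted_median` is illustrative, and the MAD is taken as the median of absolute deviations from the median.

```python
# Frequency table for 사업소코드, combined from the value tables above.
code_counts = {244: 581, 301: 165, 302: 113, 303: 170, 304: 212, 306: 236,
               307: 288, 308: 149, 309: 160, 311: 76, 312: 80}

def weighted_median(value_counts):
    """Median of the expanded sample described by a {value: count} table."""
    n = sum(value_counts.values())
    ordered = sorted(value_counts.items())

    def value_at(rank):
        # Value found at a 0-based rank of the sorted, expanded sample.
        seen = 0
        for value, count in ordered:
            seen += count
            if rank < seen:
                return value
        raise IndexError(rank)

    return (value_at((n - 1) // 2) + value_at(n // 2)) / 2

n = sum(code_counts.values())
total = sum(value * count for value, count in code_counts.items())
mean = total / n
median = weighted_median(code_counts)

# MAD: median of absolute deviations from the median, via a second frequency table.
deviation_counts = {}
for value, count in code_counts.items():
    deviation = abs(value - median)
    deviation_counts[deviation] = deviation_counts.get(deviation, 0) + count
mad = weighted_median(deviation_counts)

print(n, total, round(mean, 5), median, mad)
```

The counts add up to the 2230 observations, and the resulting sum, mean, median and MAD agree with the statistics reported for this variable.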

사업소명 (office name)
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 17.6 KiB

동래통합사업소: 581
북부 사업소: 288
남부 사업소: 236
부산진 사업소: 212
영도 사업소: 170
Other values (6): 743

Length

Max length: 9
Median length: 9
Mean length: 8.2430493
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동래통합사업소
2nd row: 강서 사업소
3rd row: 강서 사업소
4th row: 해운대 사업소
5th row: 동래통합사업소

Common Values

Value           Count  Frequency (%)
동래통합사업소    581    26.1%
북부 사업소      288    12.9%
남부 사업소      236    10.6%
부산진 사업소    212    9.5%
영도 사업소      170    7.6%
중동부 사업소    165    7.4%
사하 사업소      160    7.2%
해운대 사업소    149    6.7%
서부 사업소      113    5.1%
기장 사업소      80     3.6%

Length

Histogram of lengths of the category

Words

Value             Count  Frequency (%)
사업소             1649   42.5%
동래통합사업소      581    15.0%
북부               288    7.4%
남부               236    6.1%
부산진             212    5.5%
영도               170    4.4%
중동부             165    4.3%
사하               160    4.1%
해운대             149    3.8%
서부               113    2.9%
Other values (2)  156    4.0%

신청부서명 (requesting department name)
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 17.6 KiB

급수운영팀: 581
요금1: 419
요금: 358
공무1: 238
공무: 228
Other values (4): 406

Length

Max length: 5
Median length: 4
Mean length: 3.2609865
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 급수운영팀
2nd row: 요금
3rd row: 요금
4th row: 공무
5th row: 급수운영팀

Common Values

Value       Count  Frequency (%)
급수운영팀   581    26.1%
요금1       419    18.8%
요금        358    16.1%
공무1       238    10.7%
공무        228    10.2%
요금2       211    9.5%
공무2       185    8.3%
<NA>        8      0.4%
업무        2      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value       Count  Frequency (%)
급수운영팀   581    26.1%
요금1       419    18.8%
요금        358    16.1%
공무1       238    10.7%
공무        228    10.2%
요금2       211    9.5%
공무2       185    8.3%
na          8      0.4%
업무        2      0.1%

건물형태 (building type)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 17.6 KiB

단독주택: 1892
기타: 245
공동주택: 82
<NA>: 8
근린생활시설(상가 등): 3

Length

Max length: 12
Median length: 4
Mean length: 3.7910314
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 단독주택
2nd row: 단독주택
3rd row: 단독주택
4th row: 단독주택
5th row: 단독주택

Common Values

Value                  Count  Frequency (%)
단독주택                1892   84.8%
기타                    245    11.0%
공동주택                82     3.7%
<NA>                    8      0.4%
근린생활시설(상가 등)    3      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value                    Count  Frequency (%)
단독주택                  1892   84.7%
기타                      245    11.0%
공동주택                  82     3.7%
na                        8      0.4%
근린생활시설(상가          3      0.1%
(value not preserved)     3      0.1%

누수탐지소요액 (leak detection cost incurred)
Real number (ℝ)

Distinct: 65
Distinct (%): 2.9%
Missing: 3
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 255423.44
Minimum: 0
Maximum: 3000000
Zeros: 8
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 19.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 100000
Q1: 150000
Median: 200000
Q3: 300000
95-th percentile: 585000
Maximum: 3000000
Range: 3000000
Interquartile range (IQR): 150000

Descriptive statistics

Standard deviation: 207049.72
Coefficient of variation (CV): 0.81061362
Kurtosis: 39.385186
Mean: 255423.44
Median Absolute Deviation (MAD): 100000
Skewness: 4.7080641
Sum: 5.68828 × 10^8
Variance: 4.2869587 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
100000             466    20.9%
300000             393    17.6%
200000             360    16.1%
150000             284    12.7%
250000             200    9.0%
400000             107    4.8%
500000             77     3.5%
350000             71     3.2%
80000              40     1.8%
450000             35     1.6%
Other values (55)  194    8.7%

Smallest values

Value   Count  Frequency (%)
0       8      0.4%
10000   1      < 0.1%
30000   1      < 0.1%
40000   2      0.1%
50000   4      0.2%
60000   1      < 0.1%
70000   3      0.1%
80000   40     1.8%
90000   19     0.9%
100000  466    20.9%

Largest values

Value    Count  Frequency (%)
3000000  1      < 0.1%
2500000  2      0.1%
2020000  1      < 0.1%
2000000  3      0.1%
1800000  1      < 0.1%
1500000  2      0.1%
1430000  1      < 0.1%
1400000  1      < 0.1%
1300000  1      < 0.1%
1250000  1      < 0.1%

누수탐지지급액 (leak detection amount paid)
Real number (ℝ)

SKEWED 

Distinct: 9
Distinct (%): 0.4%
Missing: 3
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 39956.893
Minimum: 0
Maximum: 400000
Zeros: 8
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 19.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 40000
Q1: 40000
Median: 40000
Q3: 40000
95-th percentile: 40000
Maximum: 400000
Range: 400000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8080.3985
Coefficient of variation (CV): 0.2022279
Kurtosis: 1774.9444
Mean: 39956.893
Median Absolute Deviation (MAD): 0
Skewness: 39.275394
Sum: 88984000
Variance: 65292840
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Common values

Value      Count  Frequency (%)
40000      2209   99.1%
0          8      0.4%
35000      3      0.1%
25000      2      0.1%
400000     1      < 0.1%
4000       1      < 0.1%
15000      1      < 0.1%
20000      1      < 0.1%
30000      1      < 0.1%
(Missing)  3      0.1%

Smallest values

Value   Count  Frequency (%)
0       8      0.4%
4000    1      < 0.1%
15000   1      < 0.1%
20000   1      < 0.1%
25000   2      0.1%
30000   1      < 0.1%
35000   3      0.1%
40000   2209   99.1%
400000  1      < 0.1%

Largest values

Value   Count  Frequency (%)
400000  1      < 0.1%
40000   2209   99.1%
35000   3      0.1%
30000   1      < 0.1%
25000   2      0.1%
20000   1      < 0.1%
15000   1      < 0.1%
4000    1      < 0.1%
0       8      0.4%
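
The extreme skewness of 누수탐지지급액 is driven by one payment of 400000 among 2209 payments of 40000. The sketch below recomputes the mean, variance and skewness from the frequency table of non-missing values. It assumes the report uses the sample variance (divisor n - 1) and the bias-adjusted Fisher-Pearson skewness, which is the definition pandas uses for `Series.skew`. Both assumptions are mine rather than statements from the report.

```python
import math

# Frequency table for 누수탐지지급액 (non-missing values only), from the tables above.
paid_counts = {0: 8, 4000: 1, 15000: 1, 20000: 1, 25000: 2,
               30000: 1, 35000: 3, 40000: 2209, 400000: 1}

n = sum(paid_counts.values())
total = sum(value * count for value, count in paid_counts.items())
mean = total / n

def central_moment(k):
    """k-th central moment of the sample (divides by n, not n - 1)."""
    return sum(count * (value - mean) ** k for value, count in paid_counts.items()) / n

m2 = central_moment(2)
m3 = central_moment(3)

sample_variance = m2 * n / (n - 1)                          # divisor n - 1
g1 = m3 / m2 ** 1.5                                         # biased skewness
adjusted_skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)   # bias-adjusted skewness

print(n, total, round(mean, 3), round(sample_variance), round(adjusted_skewness, 3))
```

Dropping the single 400000 entry from `paid_counts` and rerunning shows how much of the skew that one record accounts for.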

Interactions

[Figures: 16 interaction plots, one for each ordered pair of the 4 numeric variables; images not preserved]

Correlations

Variable        연번    사업소코드  사업소명  신청부서명  건물형태  누수탐지소요액  누수탐지지급액
연번            1.000  0.060      0.223    0.203      0.073    0.065          0.000
사업소코드      0.060  1.000      1.000    0.504      0.464    0.041          0.000
사업소명        0.223  1.000      1.000    0.925      0.721    0.116          0.000
신청부서명      0.203  0.504      0.925    1.000      0.628    0.133          0.057
건물형태        0.073  0.464      0.721    0.628      1.000    0.084          0.000
누수탐지소요액  0.065  0.041      0.116    0.133      0.084    1.000          0.000
누수탐지지급액  0.000  0.000      0.000    0.057      0.000    0.000          1.000

Variable    신청부서명  사업소명  건물형태
신청부서명  1.000      0.784    0.321
사업소명    0.784      1.000    0.531
건물형태    0.321      0.531    1.000

Variable        연번     사업소코드  누수탐지소요액  누수탐지지급액  사업소명  신청부서명  건물형태
연번            1.000   0.047      -0.007         -0.020         0.097    0.098      0.044
사업소코드      0.047   1.000      -0.018         -0.025         0.998    0.803      0.279
누수탐지소요액  -0.007  -0.018     1.000          0.135          0.052    0.065      0.054
누수탐지지급액  -0.020  -0.025     0.135          1.000          0.000    0.042      0.000
사업소명        0.097   0.998      0.052          0.000          1.000    0.784      0.531
신청부서명      0.098   0.803      0.065          0.042          0.784    1.000      0.321
건물형태        0.044   0.279      0.054          0.000          0.531    0.321      1.000
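
The high-correlation alerts at the top of the report can be traced back to the last matrix above, the one that covers all seven variables and contains negative entries. The sketch below lists, for each variable, the other variables whose absolute coefficient reaches a cutoff. The cutoff of 0.5 and the helper name `correlated_with` are assumptions made for illustration, chosen because they reproduce the alerts.

```python
names = ["연번", "사업소코드", "누수탐지소요액", "누수탐지지급액",
         "사업소명", "신청부서명", "건물형태"]

# Coefficients copied from the last correlation matrix above.
matrix = [
    [1.000, 0.047, -0.007, -0.020, 0.097, 0.098, 0.044],
    [0.047, 1.000, -0.018, -0.025, 0.998, 0.803, 0.279],
    [-0.007, -0.018, 1.000, 0.135, 0.052, 0.065, 0.054],
    [-0.020, -0.025, 0.135, 1.000, 0.000, 0.042, 0.000],
    [0.097, 0.998, 0.052, 0.000, 1.000, 0.784, 0.531],
    [0.098, 0.803, 0.065, 0.042, 0.784, 1.000, 0.321],
    [0.044, 0.279, 0.054, 0.000, 0.531, 0.321, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report

def correlated_with(matrix, names, threshold):
    """Map each variable to the other variables with |coefficient| >= threshold."""
    result = {}
    for i, row in enumerate(matrix):
        partners = [names[j] for j, value in enumerate(row)
                    if j != i and abs(value) >= threshold]
        if partners:
            result[names[i]] = partners
    return result

alerts = correlated_with(matrix, names, THRESHOLD)
for name, partners in alerts.items():
    print(name, "->", ", ".join(partners))
```

With this cutoff, the four variables flagged are exactly the four named in the alerts, and the number of partners for each matches the "and N other fields" wording.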

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Index  연번  사업소코드  사업소명        신청부서명  건물형태  누수탐지소요액  누수탐지지급액
0      1     244        동래통합사업소  급수운영팀  단독주택  200000         40000
1      2     311        강서 사업소    요금        단독주택  150000         40000
2      3     311        강서 사업소    요금        단독주택  150000         40000
3      4     308        해운대 사업소  공무        단독주택  300000         40000
4      5     244        동래통합사업소  급수운영팀  단독주택  100000         40000
5      6     244        동래통합사업소  급수운영팀  단독주택  250000         40000
6      7     244        동래통합사업소  급수운영팀  단독주택  400000         40000
7      8     244        동래통합사업소  급수운영팀  단독주택  300000         40000
8      9     244        동래통합사업소  급수운영팀  단독주택  150000         40000
9      10    244        동래통합사업소  급수운영팀  단독주택  500000         40000

Last rows

Index  연번  사업소코드  사업소명        신청부서명  건물형태  누수탐지소요액  누수탐지지급액
2220   2221  244        동래통합사업소  급수운영팀  단독주택  300000         40000
2221   2222  244        동래통합사업소  급수운영팀  단독주택  800000         40000
2222   2223  244        동래통합사업소  급수운영팀  단독주택  100000         40000
2223   2224  307        북부 사업소    공무1       단독주택  200000         40000
2224   2225  307        북부 사업소    공무1       단독주택  200000         40000
2225   2226  304        부산진 사업소  요금1       기타      2000000        40000
2226   2227  304        부산진 사업소  요금2       기타      400000         40000
2227   2228  304        부산진 사업소  요금1       기타      300000         40000
2228   2229  311        강서 사업소    요금        단독주택  300000         40000
2229   2230  301        중동부 사업소  공무1       공동주택  150000         40000