Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 595.7 KiB
Average record size in memory: 61.0 B
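The average record size follows from the total memory footprint and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; the function name is hypothetical and not part of the profiling tool:

```python
KIB = 1024  # bytes per kibibyte (assumed unit behind "KiB")

def average_record_size(total_kib: float, n_rows: int) -> float:
    """Return the mean in-memory size of one row, in bytes."""
    return total_kib * KIB / n_rows

# Figures copied from the dataset statistics above.
size_bytes = average_record_size(595.7, 10_000)
print(f"{size_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 61.0 B reported above.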

Variable types

Numeric: 4
Categorical: 2

Dataset

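Description: Busan Metropolitan City Waterworks Headquarters_Customer Information System_Civil Complaint Application Information_Water Supply Construction Cost_20210601 (부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수공사비_20210601)
Author: Busan Metropolitan City Waterworks Headquarters (부산광역시 상수도사업본부)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083685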

Alerts

사업소코드 is highly overall correlated with 사업소명 (High correlation)
분류코드 is highly overall correlated with 예산코드 (High correlation)
사업소명 is highly overall correlated with 사업소코드 (High correlation)
예산코드 is highly overall correlated with 분류코드 (High correlation)
실제공사비 is highly skewed (γ1 = 37.98715735) (Skewed)
연번 has unique values (Unique)
실제공사비 has 8983 (89.8%) zeros (Zeros)
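The skewness and zeros alerts on 실제공사비 are related: a column that is mostly zero with a few very large values has a long right tail. A minimal sketch of the moment-based skewness g1 = m3 / m2^1.5 on hypothetical toy data. This is the population (biased) form, which is an assumption; the report may use an adjusted estimator:

```python
def skewness(values):
    """Population (biased) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical zero-heavy sample, not taken from the dataset.
sample = [0] * 9 + [10]
zero_share = sample.count(0) / len(sample)

print(zero_share, skewness(sample))
```

Symmetric data give a skewness of 0, while the zero-heavy sample gives a clearly positive value.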

Reproduction

Analysis started: 2023-12-10 16:41:14.176311
Analysis finished: 2023-12-10 16:41:17.656753
Duration: 3.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6843.4036
Minimum: 1
Maximum: 13708
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 669.9
Q1: 3419.75
median: 6831.5
Q3: 10283.25
95-th percentile: 13033.05
Maximum: 13708
Range: 13707
Interquartile range (IQR): 6863.5
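The range and IQR are simple differences of the quantiles listed above. The fractional quartiles are what a linearly interpolated percentile would produce. A minimal sketch, assuming linear interpolation between order statistics; the `percentile` helper is hypothetical and the toy list is not from the dataset:

```python
def percentile(values, p):
    """Linearly interpolated percentile, with p given as a fraction in [0, 1]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

# Figures copied from the quantile table for 연번.
q1, q3 = 3419.75, 10283.25
minimum, maximum = 1, 13708

iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range

# Hypothetical toy data, only to show how a fractional quartile arises.
toy_q1 = percentile([1, 2, 3, 4], 0.25)

print(iqr, value_range, toy_q1)
```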

Descriptive statistics

Standard deviation: 3960.7706
Coefficient of variation (CV): 0.57877204
Kurtosis: -1.2056448
Mean: 6843.4036
Median Absolute Deviation (MAD): 3433.5
Skewness: 0.0045214166
Sum: 68434036
Variance: 15687704
Monotonicity: Not monotonic
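Two of these figures can be derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch using the rounded values printed above, so the results match only up to rounding:

```python
# Figures copied from the descriptive statistics for 연번.
mean = 6843.4036
std = 3960.7706

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation

print(f"CV = {cv:.6f}, variance = {variance:.0f}")
```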
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
6125                 1      < 0.1%
9083                 1      < 0.1%
5232                 1      < 0.1%
7399                 1      < 0.1%
12726                1      < 0.1%
2298                 1      < 0.1%
2388                 1      < 0.1%
1423                 1      < 0.1%
13646                1      < 0.1%
11198                1      < 0.1%
Other values (9990)  9990   99.9%
Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
12     1      < 0.1%
13     1      < 0.1%
Value  Count  Frequency (%)
13708  1      < 0.1%
13707  1      < 0.1%
13705  1      < 0.1%
13704  1      < 0.1%
13702  1      < 0.1%
13701  1      < 0.1%
13700  1      < 0.1%
13699  1      < 0.1%
13698  1      < 0.1%
13697  1      < 0.1%

사업소코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.4234
Minimum: 201
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 201
5-th percentile: 244
Q1: 303
median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 111
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 25.379546
Coefficient of variation (CV): 0.085619239
Kurtosis: 0.97964593
Mean: 296.4234
Median Absolute Deviation (MAD): 4
Skewness: -1.6487491
Sum: 2964234
Variance: 644.12134
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value             Count  Frequency (%)
311               2027   20.3%
244               1781   17.8%
312               1661   16.6%
307               957    9.6%
304               821    8.2%
306               815    8.2%
308               558    5.6%
309               452    4.5%
301               375    3.8%
303               289    2.9%
Other values (2)  264    2.6%
Value  Count  Frequency (%)
201    34     0.3%
244    1781   17.8%
301    375    3.8%
302    230    2.3%
303    289    2.9%
304    821    8.2%
306    815    8.2%
307    957    9.6%
308    558    5.6%
309    452    4.5%
Value  Count  Frequency (%)
312    1661   16.6%
311    2027   20.3%
309    452    4.5%
308    558    5.6%
307    957    9.6%
306    815    8.2%
304    821    8.2%
303    289    2.9%
302    230    2.3%
301    375    3.8%

사업소명
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
강서 사업소: 2027
북부통합사업소: 1781
기장 사업소: 1661
북부 사업소: 957
부산진 사업소: 821
Other values (7): 2753

Length

Max length: 9
Median length: 9
Mean length: 8.4616
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강서 사업소
2nd row: 사하 사업소
3rd row: 기장 사업소
4th row: 강서 사업소
5th row: 북부 사업소

Common Values

Value             Count  Frequency (%)
강서 사업소        2027   20.3%
북부통합사업소      1781   17.8%
기장 사업소        1661   16.6%
북부 사업소        957    9.6%
부산진 사업소      821    8.2%
남부 사업소        815    8.2%
해운대 사업소      558    5.6%
사하 사업소        452    4.5%
중동부 사업소      375    3.8%
영도 사업소        289    2.9%
Other values (2)  264    2.6%

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
사업소             8185   45.0%
강서               2027   11.1%
북부통합사업소      1781   9.8%
기장               1661   9.1%
북부               957    5.3%
부산진             821    4.5%
남부               815    4.5%
해운대             558    3.1%
사하               452    2.5%
중동부             375    2.1%
Other values (3)  553    3.0%

예산코드
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
11176: 2926
11151: 2388
11161: 1496
<NA>: 1226
11117: 1196

Length

Max length: 5
Median length: 5
Mean length: 4.8774
Min length: 4
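The mean length of 4.8774 can be reproduced from the category counts if the literal label `<NA>` is counted as a four-character string. That would also be consistent with the minimum length of 4 and the missing count of 0. A minimal sketch of that check, assuming the six counts listed under Common Values for 예산코드; the variable names are hypothetical:

```python
# Category counts for 예산코드, copied from the Common Values table.
counts = {
    "11176": 2926,
    "11151": 2388,
    "11161": 1496,
    "<NA>": 1226,   # assumed to be a literal 4-character string, not a missing value
    "11117": 1196,
    "21333": 768,
}

total = sum(counts.values())
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(total, mean_length)
```

If `<NA>` is meant to mark missing data, it may be worth converting it to a true missing value before profiling, so that it is counted under Missing rather than as a category.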

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 21333
2nd row: 11161
3rd row: 11151
4th row: 11176
5th row: 11176

Common Values

Value  Count  Frequency (%)
11176  2926   29.3%
11151  2388   23.9%
11161  1496   15.0%
<NA>   1226   12.3%
11117  1196   12.0%
21333  768    7.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
11176  2926   29.3%
11151  2388   23.9%
11161  1496   15.0%
na     1226   12.3%
11117  1196   12.0%
21333  768    7.7%

분류코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6243.6609
Minimum: 6111
Maximum: 6983
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 6111
5-th percentile: 6111
Q1: 6121
median: 6141
Q3: 6171
95-th percentile: 6983
Maximum: 6983
Range: 872
Interquartile range (IQR): 50

Descriptive statistics

Standard deviation: 277.27688
Coefficient of variation (CV): 0.044409343
Kurtosis: 3.2227648
Mean: 6243.6609
Median Absolute Deviation (MAD): 23
Skewness: 2.2726703
Sum: 62436609
Variance: 76882.467
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
6983               1226   12.3%
6116               811    8.1%
6111               801    8.0%
6141               664    6.6%
6121               636    6.4%
6131               614    6.1%
6136               569    5.7%
6146               555    5.5%
6191               532    5.3%
6163               213    2.1%
Other values (41)  3379   33.8%
Value  Count  Frequency (%)
6111   801    8.0%
6112   127    1.3%
6113   112    1.1%
6114   199    2.0%
6116   811    8.1%
6117   128    1.3%
6118   96     1.0%
6121   636    6.4%
6123   66     0.7%
6124   153    1.5%
Value  Count  Frequency (%)
6983   1226   12.3%
6191   532    5.3%
6187   3      < 0.1%
6186   25     0.2%
6185   35     0.4%
6184   1      < 0.1%
6183   8      0.1%
6182   1      < 0.1%
6181   162    1.6%
6175   107    1.1%

실제공사비
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 951
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 279383.26
Minimum: 0
Maximum: 1.81208 × 10^8
Zeros: 8983
Zeros (%): 89.8%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 902705
Maximum: 1.81208 × 10^8
Range: 1.81208 × 10^8
Interquartile range (IQR): 0
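The zero quartiles follow from the share of zeros. With 8983 of 10000 values equal to 0 and no negative values, every quantile level comfortably below about 0.898 falls inside the block of zeros, so Q1, the median, Q3 and the IQR are all 0, while the 95-th percentile is not. A minimal sketch of that reasoning; the helper name and the toy column are hypothetical:

```python
from statistics import quantiles

# Counts copied from the report for 실제공사비.
zeros, n = 8983, 10_000
zero_share = zeros / n

def is_zero_quantile(level, zero_share):
    """True when `level` lies below the share of zeros (boundary effects aside)."""
    return level < zero_share

# Hypothetical toy column: nine zeros and one large value.
toy = [0] * 9 + [100]
q1, median, q3 = quantiles(toy, n=4, method="inclusive")
toy_iqr = q3 - q1

print(zero_share, is_zero_quantile(0.75, zero_share), toy_iqr)
```

For a column like this, the IQR and MAD say little about spread, so the non-zero values may be worth summarising separately.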

Descriptive statistics

Standard deviation: 3171113.9
Coefficient of variation (CV): 11.350408
Kurtosis: 1847.9872
Mean: 279383.26
Median Absolute Deviation (MAD): 0
Skewness: 37.987157
Sum: 2.7938326 × 10^9
Variance: 1.0055964 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   8983   89.8%
24410               5      0.1%
1330                4      < 0.1%
299430              4      < 0.1%
1199000             3      < 0.1%
805200              3      < 0.1%
1386000             3      < 0.1%
1551000             3      < 0.1%
275010              2      < 0.1%
392260              2      < 0.1%
Other values (941)  988    9.9%
Value  Count  Frequency (%)
0      8983   89.8%
150    2      < 0.1%
1330   4      < 0.1%
8000   1      < 0.1%
10100  1      < 0.1%
15000  1      < 0.1%
15300  2      < 0.1%
19200  1      < 0.1%
24400  1      < 0.1%
24410  5      0.1%
Value      Count  Frequency (%)
181208000  1      < 0.1%
155000000  1      < 0.1%
118551000  1      < 0.1%
50170000   1      < 0.1%
46085000   1      < 0.1%
45584000   1      < 0.1%
42304900   1      < 0.1%
40744000   1      < 0.1%
38926000   1      < 0.1%
33407000   1      < 0.1%

Interactions

[16 interaction plots]

Correlations

            연번     사업소코드  사업소명  예산코드  분류코드  실제공사비
연번         1.000   0.076     0.202    0.000    0.000    0.012
사업소코드    0.076   1.000     1.000    0.123    0.020    0.013
사업소명      0.202   1.000     1.000    0.276    0.013    0.036
예산코드      0.000   0.123     0.276    1.000    NaN      0.046
분류코드      0.000   0.020     0.013    NaN      1.000    0.000
실제공사비    0.012   0.013     0.036    0.046    0.000    1.000
          예산코드  사업소명
예산코드   1.000    0.156
사업소명   0.156    1.000
            연번     사업소코드  분류코드  실제공사비  사업소명  예산코드
연번         1.000   -0.037    -0.006   0.010     0.086    0.000
사업소코드    -0.037  1.000     -0.163   0.242     1.000    0.111
분류코드      -0.006  -0.163    1.000    -0.094    0.000    1.000
실제공사비    0.010   0.242     -0.094   1.000     0.014    0.031
사업소명      0.086   1.000     0.000    0.014     1.000    0.156
예산코드      0.000   0.111     1.000    0.031     0.156    1.000
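A coefficient of 1.000 between 사업소코드 and 사업소명 is what one would expect when each office code maps to exactly one office name. A minimal sketch of a categorical association measure, plain (uncorrected) Cramér's V, on a few code and name pairs seen in the sample rows. The choice of Cramér's V is an assumption for illustration; the report does not state here which coefficient each matrix uses:

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of xs
    col = Counter(ys)              # marginal counts of ys
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else float("nan")

# Each code determines its name, as in the sample rows of the report.
codes = [311, 311, 244, 244, 312, 312]
names = ["강서 사업소", "강서 사업소", "북부통합사업소",
         "북부통합사업소", "기장 사업소", "기장 사업소"]

print(cramers_v(codes, names))
```

Because one column is a relabelling of the other, only one of the two needs to be kept for modelling.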

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  연번    사업소코드  사업소명        예산코드  분류코드  실제공사비
6124     6125   311       강서 사업소     21333    6146     0
8635     8636   309       사하 사업소     11161    6181     0
8182     8183   312       기장 사업소     11151    6121     240060
7609     7610   311       강서 사업소     11176    6111     0
12023    12024  307       북부 사업소     11176    6116     0
1873     1874   303       영도 사업소     <NA>     6983     0
1975     1976   307       북부 사업소     11161    6173     0
12662    12663  244       북부통합사업소   11151    6121     0
756      757    302       서부 사업소     11161    6171     0
8374     8375   307       북부 사업소     21333    6147     0
(index)  연번    사업소코드  사업소명        예산코드  분류코드  실제공사비
8249     8250   304       부산진 사업소   11176    6118     0
11280    11281  304       부산진 사업소   11161    6181     0
9345     9346   244       북부통합사업소   11176    6156     0
12765    12766  244       북부통합사업소   11117    6141     0
1618     1619   308       해운대 사업소   11117    6141     0
13471    13472  244       북부통합사업소   11161    6174     0
5541     5542   307       북부 사업소     11176    6114     0
12443    12444  304       부산진 사업소   11176    6118     0
3154     3155   244       북부통합사업소   11176    6111     0
6583     6584   303       영도 사업소     21333    6146     0