Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 576.2 KiB
Average record size in memory: 59.0 B
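
The average record size follows from the two figures above it. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Values copied from the dataset statistics above.
KIB = 1024                 # bytes per KiB (assumed binary prefix)
total_size_kib = 576.2     # "Total size in memory"
n_observations = 10_000    # "Number of observations"

# Average bytes per row = total bytes / number of rows.
avg_record_bytes = total_size_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 59.0 B shown in the report.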

Variable types

Numeric: 3
Categorical: 2
Text: 1

Dataset

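Description: Civil petition application records (water supply construction costs) from the customer information system that the Busan Metropolitan City Waterworks Headquarters (부산광역시 상수도사업본부) operates to calculate and collect water and sewerage charges.
Author: 부산광역시 상수도사업본부
URL: https://www.data.go.kr/data/15083685/fileData.do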

Alerts

예산과목 has a high cardinality: 51 distinct values [High cardinality]
사업소코드 is highly overall correlated with 사업소명 [High correlation]
사업소명 is highly overall correlated with 사업소코드 [High correlation]
실제공사비 is highly skewed (γ1 = 78.7471872) [Skewed]
연번 has unique values [Unique]
실제공사비 has 9159 (91.6%) zeros [Zeros]
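
To illustrate why a column dominated by zeros with a few large values triggers the skewness alert, here is a minimal sketch of the moment-based skewness coefficient γ1 = m3 / m2^1.5. The toy list is hypothetical, and the profiler may use a bias-adjusted estimator, so this is not expected to reproduce 78.7471872.

```python
def skewness(values):
    """Population (moment-based) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical zero-inflated sample: nine zeros and one large value.
toy = [0] * 9 + [100]
print(skewness(toy))        # positive: long right tail
print(skewness([1, 2, 3]))  # symmetric data gives 0.0
```

The larger the share of zeros and the more extreme the few non-zero values, the larger γ1 becomes.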

Reproduction

Analysis started: 2024-03-14 19:48:22.477373
Analysis finished: 2024-03-14 19:48:26.128082
Duration: 3.65 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
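
The reported duration can be checked from the two timestamps. A small sketch using only the standard library (the format string is an assumption that matches how the timestamps are printed here):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the printed timestamps

started = datetime.strptime("2024-03-14 19:48:22.477373", FMT)
finished = datetime.strptime("2024-03-14 19:48:26.128082", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```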

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8505.3895
Minimum: 4
Maximum: 17038
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 4
5-th percentile: 852.85
Q1: 4267.25
Median: 8497
Q3: 12750.25
95-th percentile: 16160.05
Maximum: 17038
Range: 17034
Interquartile range (IQR): 8483
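
Range and IQR are derived from the other quantile statistics. A minimal sketch of both definitions, using the values listed for 연번 (variable names are illustrative):

```python
# Quantile statistics copied from the report for 연번.
minimum, q1, q3, maximum = 4, 4267.25, 12750.25, 17038

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1

print(value_range, iqr)
```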

Descriptive statistics

Standard deviation: 4905.3017
Coefficient of variation (CV): 0.57672864
Kurtosis: -1.1960767
Mean: 8505.3895
Median Absolute Deviation (MAD): 4242
Skewness: 2.189905 × 10^-5
Sum: 85053895
Variance: 24061985
Monotonicity: Not monotonic
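
Several of these descriptive statistics are tied together by simple identities. The sketch below checks them against the rounded figures printed for 연번, so small differences in the last digits are expected (names are illustrative):

```python
# Figures copied from the report for 연번.
n = 10_000
total = 85_053_895       # Sum
std = 4905.3017          # Standard deviation (rounded in the report)

mean = total / n         # Mean = Sum / number of observations
cv = std / mean          # Coefficient of variation = std / mean
variance = std ** 2      # Variance = std squared

print(mean, round(cv, 8), round(variance))
```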
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
14098                 1       < 0.1%
664                   1       < 0.1%
6802                  1       < 0.1%
14870                 1       < 0.1%
2651                  1       < 0.1%
10839                 1       < 0.1%
16018                 1       < 0.1%
8030                  1       < 0.1%
7466                  1       < 0.1%
12334                 1       < 0.1%
Other values (9990)   9990    99.9%

Value   Count   Frequency (%)
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
11      1       < 0.1%
12      1       < 0.1%
13      1       < 0.1%
14      1       < 0.1%
16      1       < 0.1%
17      1       < 0.1%
18      1       < 0.1%

Value   Count   Frequency (%)
17038   1       < 0.1%
17037   1       < 0.1%
17036   1       < 0.1%
17034   1       < 0.1%
17031   1       < 0.1%
17028   1       < 0.1%
17025   1       < 0.1%
17024   1       < 0.1%
17023   1       < 0.1%
17022   1       < 0.1%

사업소코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.8412
Minimum: 201
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 201
5-th percentile: 244
Q1: 303
Median: 307
Q3: 311
95-th percentile: 312
Maximum: 312
Range: 111
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 25.328268
Coefficient of variation (CV): 0.085325984
Kurtosis: 1.554399
Mean: 296.8412
Median Absolute Deviation (MAD): 4
Skewness: -1.7730705
Sum: 2968412
Variance: 641.52113
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value              Count   Frequency (%)
311                1869    18.7%
312                1752    17.5%
244                1662    16.6%
306                920     9.2%
304                844     8.4%
307                804     8.0%
308                619     6.2%
309                587     5.9%
301                357     3.6%
303                279     2.8%
Other values (2)   307     3.1%

Value   Count   Frequency (%)
201     66      0.7%
244     1662    16.6%
301     357     3.6%
302     241     2.4%
303     279     2.8%
304     844     8.4%
306     920     9.2%
307     804     8.0%
308     619     6.2%
309     587     5.9%

Value   Count   Frequency (%)
312     1752    17.5%
311     1869    18.7%
309     587     5.9%
308     619     6.2%
307     804     8.0%
306     920     9.2%
304     844     8.4%
303     279     2.8%
302     241     2.4%
301     357     3.6%

사업소명
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

강서사업소          1869
기장사업소          1752
동래통합사업소      1662
남부사업소          920
부산진 사업소       844
Other values (7)    2953

Length

Max length: 9
Median length: 5
Mean length: 5.7928
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 사하사업소
2nd row: 남부사업소
3rd row: 사하사업소
4th row: 동래통합사업소
5th row: 중동부사업소

Common Values

Value              Count   Frequency (%)
강서사업소         1869    18.7%
기장사업소         1752    17.5%
동래통합사업소     1662    16.6%
남부사업소         920     9.2%
부산진 사업소      844     8.4%
북부사업소         804     8.0%
해운대사업소       619     6.2%
사하사업소         587     5.9%
중동부사업소       357     3.6%
영도사업소         279     2.8%
Other values (2)   307     3.1%

Length

Histogram of lengths of the category
Value              Count   Frequency (%)
강서사업소         1869    16.9%
기장사업소         1752    15.8%
동래통합사업소     1662    15.0%
사업소             1085    9.8%
남부사업소         920     8.3%
부산진             844     7.6%
북부사업소         804     7.3%
해운대사업소       619     5.6%
사하사업소         587     5.3%
중동부사업소       357     3.2%
Other values (3)   586     5.3%

예산과목
Categorical

HIGH CARDINALITY 

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

<NA>                        1193
설계료(25mm이하)            813
준공검사료(25mm이하)        802
원인자부담금(시설부담금)    752
공사방수량(신설)            751
Other values (46)           5689

Length

Max length: 17
Median length: 13
Mean length: 10.6889
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 자재비(신설,구경별정액제)
2nd row: <NA>
3rd row: 공사방수량(신설)
4th row: 준공검사수수료(비정액제)
5th row: 시공비(세대별분리공사)

Common Values

Value                        Count   Frequency (%)
<NA>                         1193    11.9%
설계료(25mm이하)             813     8.1%
준공검사료(25mm이하)         802     8.0%
원인자부담금(시설부담금)     752     7.5%
공사방수량(신설)             751     7.5%
시공비(신설,정액제)          695     7.0%
자재비(신설,구경별정액제)    675     6.8%
복구비(신설,구경별정액제)    657     6.6%
공사방수료(개조)             487     4.9%
자재비(이동공사)             275     2.8%
Other values (41)            2900    29.0%

Length

Histogram of lengths of the category
Value                       Count   Frequency (%)
na                          1193    11.9%
설계료(25mm이하             813     8.1%
준공검사료(25mm이하         802     8.0%
원인자부담금(시설부담금     752     7.5%
공사방수량(신설             751     7.5%
시공비(신설,정액제          695     7.0%
자재비(신설,구경별정액제    675     6.8%
복구비(신설,구경별정액제    657     6.6%
공사방수료(개조             487     4.9%
자재비(이동공사             275     2.8%
Other values (41)           2900    29.0%

분류코드
Text

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 14
Mean length: 11.524
Min length: 9

Characters and Unicode

Total characters: 115240
Distinct characters: 66
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
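
As a minimal sketch of how such character properties can be read, the snippet below uses Python's standard `unicodedata` module to count general categories in one value taken from this column. The choice of sample string is illustrative, and the two-letter codes (Lo, Ps, Pe, Nd, Ll) are the standard abbreviations for Other Letter, Open Punctuation, Close Punctuation, Decimal Number, and Lowercase Letter.

```python
import unicodedata
from collections import Counter

# One value that appears in the report's frequency tables.
value = "설계료(25mm이하)"

# Map every character to its Unicode general category and count them.
category_counts = Counter(unicodedata.category(ch) for ch in value)

for category, count in sorted(category_counts.items()):
    print(category, count)
```

Summing such counts over every row is one way the per-category totals in a report like this could be obtained.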

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 자재비(신설,구경별정액제)
2nd row: 물이용부담금(일반용)
3rd row: 공사방수량(신설)
4th row: 준공검사수수료(비정액제)
5th row: 시공비(세대별분리공사)
Value                       Count   Frequency (%)
물이용부담금(일반용         1193    11.9%
설계료(25mm이하             813     8.1%
준공검사료(25mm이하         802     8.0%
원인자부담금(시설부담금     752     7.5%
공사방수량(신설             751     7.5%
시공비(신설,정액제          695     7.0%
자재비(신설,구경별정액제    675     6.8%
복구비(신설,구경별정액제    657     6.6%
공사방수료(개조             487     4.9%
자재비(이동공사             275     2.8%
Other values (41)           2900    29.0%

Most occurring characters

ValueCountFrequency (%)
( 9994
 
8.7%
) 9994
 
8.7%
5734
 
5.0%
5387
 
4.7%
m 4480
 
3.9%
4225
 
3.7%
3961
 
3.4%
3932
 
3.4%
3455
 
3.0%
3138
 
2.7%
Other values (56) 60940
52.9%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        83686   72.6%
Open Punctuation    9994    8.7%
Close Punctuation   9994    8.7%
Lowercase Letter    4480    3.9%
Decimal Number      4480    3.9%
Other Punctuation   2606    2.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5734
 
6.9%
5387
 
6.4%
4225
 
5.0%
3961
 
4.7%
3932
 
4.7%
3455
 
4.1%
3138
 
3.7%
2737
 
3.3%
2737
 
3.3%
2737
 
3.3%
Other values (49) 45643
54.5%
Decimal Number
Value   Count   Frequency (%)
2       2240    50.0%
5       1883    42.0%
3       357     8.0%
Open Punctuation
Value   Count   Frequency (%)
(       9994    100.0%
Close Punctuation
Value   Count   Frequency (%)
)       9994    100.0%
Lowercase Letter
Value   Count   Frequency (%)
m       4480    100.0%
Other Punctuation
Value   Count   Frequency (%)
,       2606    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   83686   72.6%
Common   27074   23.5%
Latin    4480    3.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5734
 
6.9%
5387
 
6.4%
4225
 
5.0%
3961
 
4.7%
3932
 
4.7%
3455
 
4.1%
3138
 
3.7%
2737
 
3.3%
2737
 
3.3%
2737
 
3.3%
Other values (49) 45643
54.5%
Common
Value   Count   Frequency (%)
(       9994    36.9%
)       9994    36.9%
,       2606    9.6%
2       2240    8.3%
5       1883    7.0%
3       357     1.3%
Latin
Value   Count   Frequency (%)
m       4480    100.0%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   83686   72.6%
ASCII    31554   27.4%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
(       9994    31.7%
)       9994    31.7%
m       4480    14.2%
,       2606    8.3%
2       2240    7.1%
5       1883    6.0%
3       357     1.1%
Hangul
ValueCountFrequency (%)
5734
 
6.9%
5387
 
6.4%
4225
 
5.0%
3961
 
4.7%
3932
 
4.7%
3455
 
4.1%
3138
 
3.7%
2737
 
3.3%
2737
 
3.3%
2737
 
3.3%
Other values (49) 45643
54.5%

실제공사비
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 434
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 134933.64
Minimum: 0
Maximum: 2.31572 × 10^8
Zeros: 9159
Zeros (%): 91.6%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 7000
Maximum: 2.31572 × 10^8
Range: 2.31572 × 10^8
Interquartile range (IQR): 0
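
The zero quartiles follow directly from the zero count. The sketch below assumes quantiles are taken by linear interpolation at sorted position q × (n − 1), which is the NumPy default; the report does not state its method, so treat this as an illustration. Because the column has no negative values, the 9159 zeros occupy the first 9159 sorted positions.

```python
import math

N_ROWS = 10_000   # observations in the report
N_ZEROS = 9_159   # zeros reported for 실제공사비

def quantile_is_zero(q, n=N_ROWS, zeros=N_ZEROS):
    """True when both neighbours of the interpolated position are zeros."""
    position = q * (n - 1)   # 0-based position in the sorted column
    return math.ceil(position) <= zeros - 1

for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(q, quantile_is_zero(q))
```

Under this assumption every quantile up to Q3 lands among the zeros, while the 95th percentile falls past them, which is consistent with the non-zero value 7000 reported above.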

Descriptive statistics

Standard deviation: 2516194.4
Coefficient of variation (CV): 18.647643
Kurtosis: 7166.0059
Mean: 134933.64
Median Absolute Deviation (MAD): 0
Skewness: 78.747187
Sum: 1.3493364 × 10^9
Variance: 6.3312345 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
0                    9159    91.6%
7000                 166     1.7%
150                  94      0.9%
1330                 85      0.9%
8000                 21      0.2%
10000                15      0.1%
960200               5       0.1%
330860               4       < 0.1%
6650                 4       < 0.1%
750                  4       < 0.1%
Other values (424)   443     4.4%

Value   Count   Frequency (%)
0       9159    91.6%
150     94      0.9%
750     4       < 0.1%
1300    1       < 0.1%
1330    85      0.9%
1500    2       < 0.1%
6550    1       < 0.1%
6650    4       < 0.1%
7000    166     1.7%
8000    21      0.2%

Value       Count   Frequency (%)
231572000   1       < 0.1%
25983000    1       < 0.1%
24193050    1       < 0.1%
21899000    1       < 0.1%
21098000    1       < 0.1%
19829700    1       < 0.1%
19424900    1       < 0.1%
19305000    1       < 0.1%
19233000    1       < 0.1%
18810550    1       < 0.1%

Interactions

[Figure: nine pairwise interaction plots for the numeric variables 연번, 사업소코드, and 실제공사비]

Correlations

             연번    사업소코드  사업소명  예산과목  분류코드  실제공사비
연번         1.000   0.082       0.163     0.037     0.023     0.000
사업소코드   0.082   1.000       1.000     0.198     0.178     0.011
사업소명     0.163   1.000       1.000     0.471     0.449     0.000
예산과목     0.037   0.198       0.471     1.000     1.000     0.189
분류코드     0.023   0.178       0.449     1.000     1.000     0.204
실제공사비   0.000   0.011       0.000     0.189     0.204     1.000

           예산과목  사업소명
예산과목   1.000     0.165
사업소명   0.165     1.000

             연번    사업소코드  실제공사비  사업소명  예산과목
연번         1.000   0.024       0.045       0.069     0.012
사업소코드   0.024   1.000       0.126       1.000     0.267
실제공사비   0.045   0.126       1.000       0.000     0.092
사업소명     0.069   1.000       0.000       1.000     0.165
예산과목     0.012   0.267       0.092       0.165     1.000
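
A coefficient of 1.000 between 사업소코드 and 사업소명 is what one would expect when each code maps to exactly one name. The sketch below illustrates this with an uncorrected Cramér's V, computed on the (code, name) pairs from the first ten rows of the Sample section. The report does not say which coefficient produced each matrix, so the choice of Cramér's V and the helper `cramers_v` are assumptions made for illustration.

```python
import math
from collections import Counter

# (사업소코드, 사업소명) pairs copied from the first ten sample rows.
PAIRS = [
    (309, "사하사업소"), (306, "남부사업소"), (309, "사하사업소"),
    (244, "동래통합사업소"), (301, "중동부사업소"), (312, "기장사업소"),
    (311, "강서사업소"), (308, "해운대사업소"), (308, "해운대사업소"),
    (244, "동래통합사업소"),
]

def cramers_v(pairs):
    """Uncorrected Cramér's V for two categorical columns given as pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    chi2 = 0.0
    for a, count_a in left.items():
        for b, count_b in right.items():
            expected = count_a * count_b / n   # count under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(left), len(right)) - 1
    return math.sqrt(chi2 / (n * k))

v = cramers_v(PAIRS)
print(round(v, 3))
```

With a one-to-one mapping the statistic reaches its maximum, so one of the two columns carries no extra information and can usually be dropped before modelling.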

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
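
The report counts 0 missing cells, yet 예산과목 contains the literal string <NA> in 1193 rows (11.9%). The sketch below shows one way to count such placeholder strings, using the 예산과목 values from the first ten rows of the Sample section. Treating <NA> as a missing-value marker is an assumption, not something the report states, and `count_placeholders` is a hypothetical helper.

```python
PLACEHOLDER = "<NA>"  # assumed marker for a missing 예산과목 value

# 예산과목 values copied from the first ten sample rows.
budget_items = [
    "자재비(신설,구경별정액제)", "<NA>", "공사방수량(신설)",
    "준공검사수수료(비정액제)", "시공비(세대별분리공사)",
    "자재비(신설,구경별정액제)", "공사방수량(신설)",
    "설계료(25mm이하)", "설계료(비정액제)", "원인자부담금(시설부담금)",
]

def count_placeholders(values, placeholder=PLACEHOLDER):
    """Count entries equal to the placeholder string."""
    return sum(1 for value in values if value == placeholder)

sample_hits = count_placeholders(budget_items)
report_share = 1193 / 10_000   # share of <NA> rows reported for the full column
print(sample_hits, f"{report_share:.1%}")
```

If these strings do stand for missing data, converting them to real nulls before profiling would make the missing-cell statistics reflect them.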

Sample

Index   연번    사업소코드  사업소명        예산과목                    분류코드                    실제공사비
14097   14098   309         사하사업소      자재비(신설,구경별정액제)   자재비(신설,구경별정액제)   0
4142    4143    306         남부사업소      <NA>                        물이용부담금(일반용)        0
1596    1597    309         사하사업소      공사방수량(신설)            공사방수량(신설)            0
4893    4894    244         동래통합사업소  준공검사수수료(비정액제)    준공검사수수료(비정액제)    0
2115    2116    301         중동부사업소    시공비(세대별분리공사)      시공비(세대별분리공사)      0
1748    1749    312         기장사업소      자재비(신설,구경별정액제)   자재비(신설,구경별정액제)   0
7632    7633    311         강서사업소      공사방수량(신설)            공사방수량(신설)            0
9209    9210    308         해운대사업소    설계료(25mm이하)            설계료(25mm이하)            0
5261    5262    308         해운대사업소    설계료(비정액제)            설계료(비정액제)            0
4398    4399    244         동래통합사업소  원인자부담금(시설부담금)    원인자부담금(시설부담금)    0

Index   연번    사업소코드  사업소명        예산과목                    분류코드                    실제공사비
16337   16338   307         북부사업소      시공비(신설,정액제)         시공비(신설,정액제)         0
742     743     244         동래통합사업소  <NA>                        물이용부담금(일반용)        0
8334    8335    307         북부사업소      시공비(이동공사)            시공비(이동공사)            0
16831   16832   306         남부사업소      준공검사료(공동주택)        준공검사료(공동주택)        0
13542   13543   244         동래통합사업소  복구비(신설,구경별정액제)   복구비(신설,구경별정액제)   0
10877   10878   311         강서사업소      자재비(신설,구경별정액제)   자재비(신설,구경별정액제)   0
9174    9175    244         동래통합사업소  자재비(신설,기타)           자재비(신설,기타)           0
10277   10278   312         기장사업소      <NA>                        물이용부담금(일반용)        0
14081   14082   309         사하사업소      설계료(32mm이상)            설계료(32mm이상)            0
1534    1535    244         동래통합사업소  설계료(개조,기타)           설계료(개조,기타)           0