Overview

Dataset statistics

Number of variables: 5
Number of observations: 3459
Missing cells: 130
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 142.0 KiB
Average record size in memory: 42.0 B
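
The two derived figures in this panel can be checked against the raw counts. The sketch below uses only the numbers reported above; the variable names are illustrative, and it assumes that "Missing cells (%)" is the share of all cells (rows × columns) and that the average record size is the total memory divided by the number of rows.

```python
# Figures copied from the dataset statistics panel.
n_variables = 5
n_observations = 3459
missing_cells = 130
total_memory_kib = 142.0

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the reported 0.8% and 42.0 B.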

Variable types

Numeric: 2
Categorical: 2
Text: 1

Dataset

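Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수폐전_20230125 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil service applications, water supply terminations, 2023-01-25)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15100353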

Alerts

High correlation: 사업소코드 is highly overall correlated with 사업소명
High correlation: 사업소명 is highly overall correlated with 사업소코드
Imbalance: 폐전사유 is highly imbalanced (53.5%)
Missing: 폐전일자 has 130 (3.8%) missing values
Unique: 연번 has unique values

Reproduction

Analysis started: 2023-12-10 16:45:57.679043
Analysis finished: 2023-12-10 16:45:58.685686
Duration: 1.01 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE

Distinct: 3459
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1730
Minimum: 1
Maximum: 3459
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 173.9
Q1: 865.5
Median: 1730
Q3: 2594.5
95th percentile: 3286.1
Maximum: 3459
Range: 3458
Interquartile range (IQR): 1729

Descriptive statistics

Standard deviation: 998.67162
Coefficient of variation (CV): 0.57726683
Kurtosis: -1.2
Mean: 1730
Median Absolute Deviation (MAD): 865
Skewness: 0
Sum: 5984070
Variance: 997345
Monotonicity: Strictly increasing
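
These statistics are what a plain running index would produce. As a minimal sketch, assuming 연번 holds exactly the integers 1 through 3459 (which the distinct count, minimum, maximum and strict monotonicity suggest, but which the report does not state outright), the figures can be recomputed with Python's standard library:

```python
import statistics

# Assumption: 연번 is the sequence 1, 2, ..., 3459.
values = list(range(1, 3460))

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)       # square root of the sample variance
cv = std_dev / mean                      # coefficient of variation
# Median absolute deviation: median of |x - median(x)|.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, median, variance, mad)
print(round(std_dev, 5), round(cv, 8))
```

The match with the reported variance of 997345 also indicates that the report uses the sample (n - 1) form rather than the population form.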
Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
1                      1       < 0.1%
2312                   1       < 0.1%
2301                   1       < 0.1%
2302                   1       < 0.1%
2303                   1       < 0.1%
2304                   1       < 0.1%
2305                   1       < 0.1%
2306                   1       < 0.1%
2307                   1       < 0.1%
2308                   1       < 0.1%
Other values (3449)    3449    99.7%

Minimum 10 values

Value   Count   Frequency (%)
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%
10      1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
3459    1       < 0.1%
3458    1       < 0.1%
3457    1       < 0.1%
3456    1       < 0.1%
3455    1       < 0.1%
3454    1       < 0.1%
3453    1       < 0.1%
3452    1       < 0.1%
3451    1       < 0.1%
3450    1       < 0.1%

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 296.24458
Minimum: 244
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.5 KiB

Quantile statistics

Minimum: 244
5th percentile: 244
Q1: 303
Median: 306
Q3: 307
95th percentile: 311
Maximum: 312
Range: 68
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 22.48136
Coefficient of variation (CV): 0.075887835
Kurtosis: 1.5648948
Mean: 296.24458
Median Absolute Deviation (MAD): 2
Skewness: -1.8688063
Sum: 1024710
Variance: 505.41153
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value   Count   Frequency (%)
306     1128    32.6%
244     536     15.5%
304     457     13.2%
307     421     12.2%
308     197     5.7%
302     191     5.5%
311     128     3.7%
301     128     3.7%
309     108     3.1%
303     107     3.1%

Minimum 10 values

Value   Count   Frequency (%)
244     536     15.5%
301     128     3.7%
302     191     5.5%
303     107     3.1%
304     457     13.2%
306     1128    32.6%
307     421     12.2%
308     197     5.7%
309     108     3.1%
311     128     3.7%

Maximum 10 values

Value   Count   Frequency (%)
312     58      1.7%
311     128     3.7%
309     108     3.1%
308     197     5.7%
307     421     12.2%
306     1128    32.6%
304     457     13.2%
303     107     3.1%
302     191     5.5%
301     128     3.7%
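
Because 사업소코드 takes only 11 distinct values, its summary statistics can be rebuilt from the frequency tables alone. The following sketch copies the eleven code/count pairs listed above (the dictionary name is illustrative) and recomputes the observation count, the sum and the mean:

```python
# Code -> count, copied from the value tables for 사업소코드.
code_counts = {
    306: 1128, 244: 536, 304: 457, 307: 421, 308: 197, 302: 191,
    311: 128, 301: 128, 309: 108, 303: 107, 312: 58,
}

n = sum(code_counts.values())                              # observations
total = sum(code * count for code, count in code_counts.items())
mean = total / n

print(n, total, round(mean, 5))
```

The counts add up to the 3459 observations, and the weighted sum and mean agree with the reported Sum and Mean.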

사업소명 (office name)
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 27.2 KiB

Value               Count
남부 사업소          1128
동래통합사업소       536
부산진 사업소        457
북부 사업소          421
해운대 사업소        197
Other values (6)    720

Length

Max length: 9
Median length: 9
Mean length: 8.4640069
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남부 사업소
2nd row: 동래통합사업소
3rd row: 동래통합사업소
4th row: 남부 사업소
5th row: 영도 사업소

Common Values

Value             Count   Frequency (%)
남부 사업소        1128    32.6%
동래통합사업소     536     15.5%
부산진 사업소      457     13.2%
북부 사업소        421     12.2%
해운대 사업소      197     5.7%
서부 사업소        191     5.5%
강서 사업소        128     3.7%
중동부 사업소      128     3.7%
사하 사업소        108     3.1%
영도 사업소        107     3.1%

Length

Histogram of lengths of the category

Value               Count   Frequency (%)
사업소               2923    45.8%
남부                 1128    17.7%
동래통합사업소       536     8.4%
부산진               457     7.2%
북부                 421     6.6%
해운대               197     3.1%
서부                 191     3.0%
강서                 128     2.0%
중동부               128     2.0%
사하                 108     1.7%
Other values (2)    165     2.6%

폐전사유 (reason for service termination)
Categorical

IMBALANCE

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 27.2 KiB

Value                Count
건물철거              2300
불필요등기타          811
도시계획 도로편입     147
직권폐전              118
<NA>                 60
Other values (3)     23

Length

Max length: 10
Median length: 4
Mean length: 4.6837236
Min length: 4

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 건물철거
2nd row: 불필요등기타
3rd row: 불필요등기타
4th row: 건물철거
5th row: 불필요등기타

Common Values

Value                  Count   Frequency (%)
건물철거                2300    66.5%
불필요등기타            811     23.4%
도시계획 도로편입       147     4.2%
직권폐전                118     3.4%
<NA>                   60      1.7%
폐전분실                21      0.6%
중지후 건물신축포기     1       < 0.1%
공동수도폐전            1       < 0.1%
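
The report flags 폐전사유 as imbalanced at 53.5% but does not say how that score is defined. One definition that reproduces the figure from the counts above is one minus the Shannon entropy of the class distribution, normalised by the maximum possible entropy for eight classes. The sketch below is an assumption about the metric, not a statement of how the profiling tool implements it.

```python
import math

# Category counts copied from the Common Values table for 폐전사유.
counts = {
    "건물철거": 2300,
    "불필요등기타": 811,
    "도시계획 도로편입": 147,
    "직권폐전": 118,
    "<NA>": 60,
    "폐전분실": 21,
    "중지후 건물신축포기": 1,
    "공동수도폐전": 1,
}

total = sum(counts.values())
probabilities = [c / total for c in counts.values()]

# Shannon entropy in bits; log2(number of classes) is its upper bound.
entropy = -sum(p * math.log2(p) for p in probabilities)
max_entropy = math.log2(len(counts))

# 0 means perfectly balanced classes, 1 means a single class holds everything.
imbalance = 1 - entropy / max_entropy

print(f"imbalance score: {imbalance:.1%}")
```

With this definition, the dominance of 건물철거 (about two thirds of all rows) is what drives the score above one half.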

Length

Histogram of lengths of the category

Common Values (Plot)

Value           Count   Frequency (%)
건물철거         2300    63.8%
불필요등기타     811     22.5%
도시계획         147     4.1%
도로편입         147     4.1%
직권폐전         118     3.3%
na              60      1.7%
폐전분실         21      0.6%
중지후           1       < 0.1%
건물신축포기     1       < 0.1%
공동수도폐전     1       < 0.1%

폐전일자 (service termination date)
Text

MISSING

Distinct: 295
Distinct (%): 8.9%
Missing: 130
Missing (%): 3.8%
Memory size: 27.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 33290
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 36
Unique (%): 1.1%

Sample

1st row: 2022-12-09
2nd row: 2022-12-19
3rd row: 2022-12-12
4th row: 2022-12-09
5th row: 2022-12-12

Value                Count   Frequency (%)
2022-10-07           37      1.1%
2022-02-16           35      1.1%
2022-10-13           33      1.0%
2022-08-10           33      1.0%
2022-06-02           32      1.0%
2022-09-30           32      1.0%
2022-08-31           32      1.0%
2022-11-22           31      0.9%
2022-03-02           30      0.9%
2022-03-25           29      0.9%
Other values (285)   3005    90.3%

Most occurring characters

Value   Count   Frequency (%)
2       11938   35.9%
0       7333    22.0%
-       6658    20.0%
1       3017    9.1%
3       718     2.2%
8       672     2.0%
5       648     1.9%
7       645     1.9%
6       606     1.8%
9       564     1.7%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     26632   80.0%
Dash Punctuation   6658    20.0%
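
The character totals are consistent with every non-missing 폐전일자 value being a ten-character `YYYY-MM-DD` string. The sketch below checks that arithmetic and parses the five sample values shown earlier; it assumes ISO formatting throughout, which the fixed length of 10 and the digit-and-dash character set suggest but do not prove.

```python
from datetime import date

# Figures copied from the report.
n_observations = 3459
n_missing = 130
n_present = n_observations - n_missing

# Assumption: each present value is "YYYY-MM-DD" (8 digits, 2 dashes).
total_characters = n_present * 10
digit_characters = n_present * 8
dash_characters = n_present * 2

# The five sample rows listed for 폐전일자.
samples = ["2022-12-09", "2022-12-19", "2022-12-12", "2022-12-09", "2022-12-12"]
# fromisoformat raises ValueError if a string is not a valid ISO date.
parsed = [date.fromisoformat(s) for s in samples]

print(n_present, total_characters, digit_characters, dash_characters)
print(min(parsed), max(parsed))
```

Since the variable is typed as Text, converting it to a date type before any time-based analysis is a reasonable preprocessing step.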

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
2       11938   44.8%
0       7333    27.5%
1       3017    11.3%
3       718     2.7%
8       672     2.5%
5       648     2.4%
7       645     2.4%
6       606     2.3%
9       564     2.1%
4       491     1.8%

Dash Punctuation

Value   Count   Frequency (%)
-       6658    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   33290   100.0%

Most frequent character per script

Common

Value   Count   Frequency (%)
2       11938   35.9%
0       7333    22.0%
-       6658    20.0%
1       3017    9.1%
3       718     2.2%
8       672     2.0%
5       648     1.9%
7       645     1.9%
6       606     1.8%
9       564     1.7%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   33290   100.0%

Most frequent character per block

ASCII

Value   Count   Frequency (%)
2       11938   35.9%
0       7333    22.0%
-       6658    20.0%
1       3017    9.1%
3       718     2.2%
8       672     2.0%
5       648     1.9%
7       645     1.9%
6       606     1.8%
9       564     1.7%

Interactions

[Figures: four interaction plots; images not preserved in this text version]

Correlations

             연번     사업소코드   사업소명   폐전사유
연번          1.000  0.152       0.130     0.082
사업소코드     0.152  1.000       1.000     0.136
사업소명       0.130  1.000       1.000     0.511
폐전사유       0.082  0.136       0.511     1.000

             폐전사유   사업소명
폐전사유       1.000     0.282
사업소명       0.282     1.000

             연번     사업소코드   사업소명   폐전사유
연번          1.000  0.006       0.055     0.041
사업소코드     0.006  1.000       0.999     0.206
사업소명       0.055  0.999       1.000     0.282
폐전사유       0.041  0.206       0.282     1.000
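
The near-perfect correlation between 사업소코드 and 사업소명 is what one would expect if each code simply identifies one office. This text version does not say which association measure produced each matrix, so the sketch below uses plain (uncorrected) Cramér's V as one common choice for a pair of categorical columns, applied to the ten first sample rows of the report. The function name `cramers_v` is illustrative.

```python
import math
from collections import Counter

# (사업소코드, 사업소명) pairs from the first ten sample rows of the report.
pairs = [
    (306, "남부 사업소"), (244, "동래통합사업소"), (244, "동래통합사업소"),
    (306, "남부 사업소"), (303, "영도 사업소"), (306, "남부 사업소"),
    (306, "남부 사업소"), (306, "남부 사업소"), (308, "해운대 사업소"),
    (308, "해운대 사업소"),
]

def cramers_v(observations):
    """Uncorrected Cramér's V for a list of (a, b) category pairs."""
    n = len(observations)
    joint = Counter(observations)
    row_totals = Counter(a for a, _ in observations)
    col_totals = Counter(b for _, b in observations)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    # Needs at least two categories on each side, otherwise this divides by zero.
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

v = cramers_v(pairs)
print(round(v, 3))
```

A one-to-one mapping between two columns gives V = 1, which suggests that one of the two carries no extra information and could be dropped before modelling.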

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index   연번   사업소코드   사업소명          폐전사유        폐전일자
0       1     306         남부 사업소       건물철거        2022-12-09
1       2     244         동래통합사업소    불필요등기타    2022-12-19
2       3     244         동래통합사업소    불필요등기타    2022-12-12
3       4     306         남부 사업소       건물철거        2022-12-09
4       5     303         영도 사업소       불필요등기타    2022-12-12
5       6     306         남부 사업소       불필요등기타    2022-12-15
6       7     306         남부 사업소       건물철거        2022-12-21
7       8     306         남부 사업소       건물철거        2022-12-23
8       9     308         해운대 사업소     불필요등기타    2022-02-16
9       10    308         해운대 사업소     불필요등기타    2022-02-16

Last rows

Index   연번   사업소코드   사업소명          폐전사유             폐전일자
3449    3450  304         부산진 사업소     도시계획 도로편입    2022-06-03
3450    3451  306         남부 사업소       건물철거             2022-11-21
3451    3452  306         남부 사업소       건물철거             2022-11-22
3452    3453  302         서부 사업소       건물철거             2022-11-23
3453    3454  307         북부 사업소       건물철거             2022-11-28
3454    3455  306         남부 사업소       건물철거             2022-11-30
3455    3456  306         남부 사업소       건물철거             2022-11-29
3456    3457  306         남부 사업소       건물철거             2022-12-01
3457    3458  244         동래통합사업소    건물철거             2022-01-11
3458    3459  244         동래통합사업소    건물철거             2022-01-11