Overview

Dataset statistics

Number of variables: 5
Number of observations: 3576
Missing cells: 130
Missing cells (%): 0.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 146.8 KiB
Average record size in memory: 42.0 B
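
The derived figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; neither assumption is stated in the report itself:

```python
# Figures copied from the dataset statistics above.
n_variables = 5
n_observations = 3576
missing_cells = 130
total_size_kib = 146.8

# Share of missing cells over all cells (assumed denominator: rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the reported 0.7% and 42.0 B.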

Variable types

Numeric: 2
Categorical: 2
Text: 1

Dataset

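Description: 부산광역시상수도사업본부_수용가정보시스템_민원신청정보_급수폐전_20220131 (Busan Metropolitan City Waterworks Headquarters, customer information system, civil petition applications, water service termination, 2022-01-31)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15100353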

Alerts

사업소코드 is highly overall correlated with 사업소명 (High correlation)
사업소명 is highly overall correlated with 사업소코드 (High correlation)
폐전일자 has 130 (3.6%) missing values (Missing)
연번 has unique values (Unique)
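
The high-correlation alert between 사업소코드 and 사업소명 is what one would expect if each office code identifies exactly one office name. A minimal sketch of that check, using only the (code, name) pairs visible in the sample rows at the end of this report; the helper `is_one_to_one` is a hypothetical name introduced for illustration, and a sample of this size cannot prove the property for the full dataset:

```python
# (사업소코드, 사업소명) pairs copied from the sample rows of this report.
sample_pairs = [
    (304, "부산진 사업소"), (312, "기장 사업소"), (304, "부산진 사업소"),
    (304, "부산진 사업소"), (301, "중동부 사업소"), (306, "남부 사업소"),
    (302, "서부 사업소"), (306, "남부 사업소"), (311, "강서 사업소"),
    (309, "사하 사업소"), (244, "동래통합사업소"),
]


def is_one_to_one(pairs):
    """Return True if every code maps to one name and every name to one code."""
    code_to_name, name_to_code = {}, {}
    for code, name in pairs:
        # setdefault stores the first value seen and returns it afterwards,
        # so a mismatch means the same key appeared with two different partners.
        if code_to_name.setdefault(code, name) != name:
            return False
        if name_to_code.setdefault(name, code) != code:
            return False
    return True


print(is_one_to_one(sample_pairs))
```

If the mapping is one-to-one, one of the two columns is redundant for modelling purposes and can be dropped or kept only as a label.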

Reproduction

Analysis started: 2023-12-10 16:46:02.883812
Analysis finished: 2023-12-10 16:46:04.029624
Duration: 1.15 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 3576
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1788.5
Minimum: 1
Maximum: 3576
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 179.75
Q1: 894.75
median: 1788.5
Q3: 2682.25
95-th percentile: 3397.25
Maximum: 3576
Range: 3575
Interquartile range (IQR): 1787.5

Descriptive statistics

Standard deviation: 1032.4466
Coefficient of variation (CV): 0.57726956
Kurtosis: -1.2
Mean: 1788.5
Median Absolute Deviation (MAD): 894
Skewness: 0
Sum: 6395676
Variance: 1065946
Monotonicity: Strictly increasing
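
These statistics are what a plain row counter produces. A minimal sketch that recomputes them, assuming 연번 is exactly the sequence 1, 2, ..., 3576 (consistent with the minimum, maximum, distinct count, and strict monotonicity reported above) and assuming percentiles are linearly interpolated; the helper `percentile` is a hypothetical name used only here:

```python
import math
import statistics

n = 3576                      # number of observations reported above
values = range(1, n + 1)      # assumption: 연번 is exactly 1, 2, ..., n

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                           # coefficient of variation


def percentile(p):
    """Linearly interpolated percentile of the sorted sequence 1..n."""
    return 1 + (n - 1) * p


print(total, mean, variance, round(std, 4), round(cv, 4))
print(percentile(0.05), percentile(0.25), percentile(0.75), percentile(0.95))
```

For such a sequence the sum is n(n+1)/2 and the sample variance is n(n+1)/12, so the column carries no information beyond row order and is usually excluded from analysis.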
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2390 | 1 | < 0.1%
2379 | 1 | < 0.1%
2380 | 1 | < 0.1%
2381 | 1 | < 0.1%
2382 | 1 | < 0.1%
2383 | 1 | < 0.1%
2384 | 1 | < 0.1%
2385 | 1 | < 0.1%
2386 | 1 | < 0.1%
Other values (3566) | 3566 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3576 | 1 | < 0.1%
3575 | 1 | < 0.1%
3574 | 1 | < 0.1%
3573 | 1 | < 0.1%
3572 | 1 | < 0.1%
3571 | 1 | < 0.1%
3570 | 1 | < 0.1%
3569 | 1 | < 0.1%
3568 | 1 | < 0.1%
3567 | 1 | < 0.1%

사업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 294.00587
Minimum: 244
Maximum: 312
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.6 KiB

Quantile statistics

Minimum: 244
5-th percentile: 244
Q1: 302
median: 304
Q3: 307
95-th percentile: 311
Maximum: 312
Range: 68
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 24.327997
Coefficient of variation (CV): 0.082746636
Kurtosis: 0.44893422
Mean: 294.00587
Median Absolute Deviation (MAD): 3
Skewness: -1.541009
Sum: 1051365
Variance: 591.85143
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common Values

Value | Count | Frequency (%)
304 | 781 | 21.8%
244 | 678 | 19.0%
306 | 521 | 14.6%
309 | 356 | 10.0%
302 | 298 | 8.3%
307 | 247 | 6.9%
311 | 185 | 5.2%
308 | 163 | 4.6%
301 | 146 | 4.1%
303 | 127 | 3.6%

Minimum 10 values

Value | Count | Frequency (%)
244 | 678 | 19.0%
301 | 146 | 4.1%
302 | 298 | 8.3%
303 | 127 | 3.6%
304 | 781 | 21.8%
306 | 521 | 14.6%
307 | 247 | 6.9%
308 | 163 | 4.6%
309 | 356 | 10.0%
311 | 185 | 5.2%

Maximum 10 values

Value | Count | Frequency (%)
312 | 74 | 2.1%
311 | 185 | 5.2%
309 | 356 | 10.0%
308 | 163 | 4.6%
307 | 247 | 6.9%
306 | 521 | 14.6%
304 | 781 | 21.8%
303 | 127 | 3.6%
302 | 298 | 8.3%
301 | 146 | 4.1%

사업소명 (office name)
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.1 KiB

Value | Count
부산진 사업소 | 781
동래통합사업소 | 678
남부 사업소 | 521
사하 사업소 | 356
서부 사업소 | 298
Other values (6) | 942

Length

Max length: 9
Median length: 9
Mean length: 8.3159955
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산진 사업소
2nd row: 기장 사업소
3rd row: 부산진 사업소
4th row: 부산진 사업소
5th row: 중동부 사업소

Common Values

Value | Count | Frequency (%)
부산진 사업소 | 781 | 21.8%
동래통합사업소 | 678 | 19.0%
남부 사업소 | 521 | 14.6%
사하 사업소 | 356 | 10.0%
서부 사업소 | 298 | 8.3%
북부 사업소 | 247 | 6.9%
강서 사업소 | 185 | 5.2%
해운대 사업소 | 163 | 4.6%
중동부 사업소 | 146 | 4.1%
영도 사업소 | 127 | 3.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
사업소 | 2898 | 44.8%
부산진 | 781 | 12.1%
동래통합사업소 | 678 | 10.5%
남부 | 521 | 8.0%
사하 | 356 | 5.5%
서부 | 298 | 4.6%
북부 | 247 | 3.8%
강서 | 185 | 2.9%
해운대 | 163 | 2.5%
중동부 | 146 | 2.3%
Other values (2) | 201 | 3.1%
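
The word-level counts above are consistent with splitting each category on whitespace and weighting every token by its category count, which is why 사업소 appears 2898 times (every row except the 678 rows of 동래통합사업소, which contains no space). A minimal sketch of that tally; the whitespace tokenization is an assumption inferred from the numbers, not confirmed behaviour of the profiling tool, and the count for 기장 사업소 is inferred as the remainder of the 3576 rows:

```python
from collections import Counter

# Category counts copied from the Common Values table for 사업소명.
category_counts = {
    "부산진 사업소": 781, "동래통합사업소": 678, "남부 사업소": 521,
    "사하 사업소": 356, "서부 사업소": 298, "북부 사업소": 247,
    "강서 사업소": 185, "해운대 사업소": 163, "중동부 사업소": 146,
    "영도 사업소": 127,
    # Assumption: the 11th category is 기장 사업소 with the remaining
    # 3576 - 3502 = 74 rows; it is not listed in the Common Values table.
    "기장 사업소": 74,
}

word_counts = Counter()
for category, count in category_counts.items():
    for word in category.split():      # whitespace tokenization (assumed)
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(3):
    print(word, count, f"{100 * count / total_words:.1f}%")
```

Note that the percentages in the word table are shares of all tokens, not of rows, so they are not directly comparable with the Common Values percentages.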

폐전사유 (reason for service termination)
Categorical

Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 28.1 KiB

Value | Count
건물철거 | 2155
불필요등기타 | 729
도시계획 도로편입 | 387
직권폐전 | 243
<NA> | 53
Other values (2) | 9

Length

Max length: 10
Median length: 4
Mean length: 4.9521812
Min length: 4
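
The mean length can be reproduced from the category counts in the Common Values table for this variable. A minimal sketch, assuming length means the number of characters (spaces included) and that the 53 `<NA>` entries are counted as the four-character string `<NA>`; the second assumption would also explain why this variable reports 0 missing values while listing `<NA>` as a category:

```python
# Category counts copied from the Common Values table for 폐전사유.
category_counts = {
    "건물철거": 2155,
    "불필요등기타": 729,
    "도시계획 도로편입": 387,
    "직권폐전": 243,
    "<NA>": 53,              # assumption: treated as a literal 4-character string
    "폐전분실": 7,
    "중지후 건물신축포기": 2,
}

n_rows = sum(category_counts.values())
total_chars = sum(len(value) * count for value, count in category_counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 7))
print(min(map(len, category_counts)), max(map(len, category_counts)))
```

If the `<NA>` entries are indeed placeholder strings, they should probably be converted to real missing values before analysis so that the missing-value counts reflect them.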

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건물철거
2nd row: 건물철거
3rd row: 불필요등기타
4th row: 불필요등기타
5th row: 건물철거

Common Values

Value | Count | Frequency (%)
건물철거 | 2155 | 60.3%
불필요등기타 | 729 | 20.4%
도시계획 도로편입 | 387 | 10.8%
직권폐전 | 243 | 6.8%
<NA> | 53 | 1.5%
폐전분실 | 7 | 0.2%
중지후 건물신축포기 | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)


Words

Value | Count | Frequency (%)
건물철거 | 2155 | 54.4%
불필요등기타 | 729 | 18.4%
도시계획 | 387 | 9.8%
도로편입 | 387 | 9.8%
직권폐전 | 243 | 6.1%
na | 53 | 1.3%
폐전분실 | 7 | 0.2%
중지후 | 2 | 0.1%
건물신축포기 | 2 | 0.1%

폐전일자 (service termination date)
Text

MISSING 

Distinct: 300
Distinct (%): 8.7%
Missing: 130
Missing (%): 3.6%
Memory size: 28.1 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 34460
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
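
Because every non-missing value is a 10-character date such as `2021-10-20`, the character totals in this section can be derived from the row counts alone. A minimal sketch using Python's standard `unicodedata` module, assuming all 3446 non-missing values share the `YYYY-MM-DD` layout (the fixed length of 10 reported above supports this):

```python
import unicodedata
from collections import Counter

non_missing = 3576 - 130          # observations minus missing values
sample = "2021-10-20"             # first sample row; all values have length 10

# Unicode general category of each character in one value:
# "Nd" is Decimal Number, "Pd" is Dash Punctuation.
per_value = Counter(unicodedata.category(ch) for ch in sample)

# Scale the per-value counts up to the whole column.
totals = {category: n * non_missing for category, n in per_value.items()}
total_characters = len(sample) * non_missing

print(total_characters)
print(totals)
```

The results agree with the reported 34460 total characters, 27568 decimal digits, and 6892 dashes.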

Unique

Unique: 43
Unique (%): 1.2%

Sample

1st row: 2021-10-20
2nd row: 2021-12-21
3rd row: 2021-12-21
4th row: 2021-12-21
5th row: 2021-12-28

Value | Count | Frequency (%)
2021-03-08 | 88 | 2.6%
2021-01-05 | 63 | 1.8%
2021-04-21 | 32 | 0.9%
2021-09-23 | 32 | 0.9%
2021-08-31 | 32 | 0.9%
2021-04-07 | 32 | 0.9%
2021-02-04 | 30 | 0.9%
2021-12-24 | 29 | 0.8%
2021-10-20 | 27 | 0.8%
2021-03-15 | 26 | 0.8%
Other values (290) | 3055 | 88.7%

Most occurring characters

Value | Count | Frequency (%)
2 | 8820 | 25.6%
0 | 7769 | 22.5%
- | 6892 | 20.0%
1 | 6311 | 18.3%
3 | 897 | 2.6%
8 | 683 | 2.0%
7 | 630 | 1.8%
9 | 622 | 1.8%
6 | 618 | 1.8%
4 | 613 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 27568 | 80.0%
Dash Punctuation | 6892 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 8820 | 32.0%
0 | 7769 | 28.2%
1 | 6311 | 22.9%
3 | 897 | 3.3%
8 | 683 | 2.5%
7 | 630 | 2.3%
9 | 622 | 2.3%
6 | 618 | 2.2%
4 | 613 | 2.2%
5 | 605 | 2.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 6892 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 34460 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 8820 | 25.6%
0 | 7769 | 22.5%
- | 6892 | 20.0%
1 | 6311 | 18.3%
3 | 897 | 2.6%
8 | 683 | 2.0%
7 | 630 | 1.8%
9 | 622 | 1.8%
6 | 618 | 1.8%
4 | 613 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 34460 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 8820 | 25.6%
0 | 7769 | 22.5%
- | 6892 | 20.0%
1 | 6311 | 18.3%
3 | 897 | 2.6%
8 | 683 | 2.0%
7 | 630 | 1.8%
9 | 622 | 1.8%
6 | 618 | 1.8%
4 | 613 | 1.8%

Correlations

Variable | 연번 | 사업소코드 | 사업소명 | 폐전사유
연번 | 1.000 | 0.086 | 0.102 | 0.081
사업소코드 | 0.086 | 1.000 | 1.000 | 0.350
사업소명 | 0.102 | 1.000 | 1.000 | 0.587
폐전사유 | 0.081 | 0.350 | 0.587 | 1.000

Variable | 폐전사유 | 사업소명
폐전사유 | 1.000 | 0.350
사업소명 | 0.350 | 1.000

Variable | 연번 | 사업소코드 | 사업소명 | 폐전사유
연번 | 1.000 | -0.022 | 0.043 | 0.043
사업소코드 | -0.022 | 1.000 | 0.999 | 0.207
사업소명 | 0.043 | 0.999 | 1.000 | 0.350
폐전사유 | 0.043 | 0.207 | 0.350 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

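First rows

Index | 연번 | 사업소코드 | 사업소명 | 폐전사유 | 폐전일자
0 | 1 | 304 | 부산진 사업소 | 건물철거 | 2021-10-20
1 | 2 | 312 | 기장 사업소 | 건물철거 | 2021-12-21
2 | 3 | 304 | 부산진 사업소 | 불필요등기타 | 2021-12-21
3 | 4 | 304 | 부산진 사업소 | 불필요등기타 | 2021-12-21
4 | 5 | 301 | 중동부 사업소 | 건물철거 | 2021-12-28
5 | 6 | 306 | 남부 사업소 | 건물철거 | 2021-12-31
6 | 7 | 302 | 서부 사업소 | 불필요등기타 | 2021-09-10
7 | 8 | 306 | 남부 사업소 | 건물철거 | 2021-09-10
8 | 9 | 311 | 강서 사업소 | 도시계획 도로편입 | 2021-06-24
9 | 10 | 309 | 사하 사업소 | 건물철거 | 2021-06-23

Last rows

Index | 연번 | 사업소코드 | 사업소명 | 폐전사유 | 폐전일자
3566 | 3567 | 244 | 동래통합사업소 | 건물철거 | 2021-03-29
3567 | 3568 | 311 | 강서 사업소 | 도시계획 도로편입 | 2021-03-30
3568 | 3569 | 311 | 강서 사업소 | 불필요등기타 | 2021-03-31
3569 | 3570 | 312 | 기장 사업소 | 불필요등기타 | 2021-04-01
3570 | 3571 | 244 | 동래통합사업소 | <NA> | <NA>
3571 | 3572 | 304 | 부산진 사업소 | 건물철거 | 2021-03-30
3572 | 3573 | 309 | 사하 사업소 | 불필요등기타 | 2021-04-05
3573 | 3574 | 304 | 부산진 사업소 | 건물철거 | 2021-02-02
3574 | 3575 | 312 | 기장 사업소 | 직권폐전 | 2021-03-15
3575 | 3576 | 312 | 기장 사업소 | 불필요등기타 | 2021-03-18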