Overview

Dataset statistics

Number of variables: 8
Number of observations: 603
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 40.2 KiB
Average record size in memory: 68.2 B

Variable types

Numeric: 4
Text: 1
Categorical: 3

Dataset

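Description: Charge-calculation data (back-billing calculation history) from the customer information system that the Busan Metropolitan City Waterworks Headquarters operates to calculate and collect water and sewerage charges.
Author: Busan Metropolitan City Waterworks Headquarters (부산광역시 상수도사업본부)
URL: https://www.data.go.kr/data/15083669/fileData.do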

Alerts

연번 is highly overall correlated with 추징발생년월 and 2 other fields (High correlation)
추징금액(상) is highly overall correlated with 추징금액(하) and 1 other fields (High correlation)
추징금액(하) is highly overall correlated with 추징금액(상) and 1 other fields (High correlation)
추징금액(물) is highly overall correlated with 추징금액(상) and 1 other fields (High correlation)
추징발생년월 is highly overall correlated with 연번 and 2 other fields (High correlation)
고지년월 is highly overall correlated with 연번 and 2 other fields (High correlation)
계산년월 is highly overall correlated with 연번 and 2 other fields (High correlation)
연번 has unique values (Unique)
추징금액(상) has 170 (28.2%) zeros (Zeros)
추징금액(하) has 97 (16.1%) zeros (Zeros)
추징금액(물) has 193 (32.0%) zeros (Zeros)
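
The zero percentages in the alerts follow directly from the zero counts and the 603 observations. A minimal sketch of that arithmetic, where the helper name `zero_share` is illustrative and not part of the report:

```python
# Zero counts copied from the alerts; 603 is the reported number of observations.
N_OBSERVATIONS = 603
zero_counts = {"추징금액(상)": 170, "추징금액(하)": 97, "추징금액(물)": 193}


def zero_share(count: int, total: int = N_OBSERVATIONS) -> str:
    """Return the share of zeros as a percentage string with one decimal."""
    return f"{100 * count / total:.1f}%"


shares = {name: zero_share(count) for name, count in zero_counts.items()}
for name, share in shares.items():
    print(name, share)
```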

Reproduction

Analysis started: 2024-03-14 09:21:28.541103
Analysis finished: 2024-03-14 09:21:32.370843
Duration: 3.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
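
The reported duration can be checked from the two timestamps. A small sketch using only the standard library; the format string is an assumption about how the timestamps are written:

```python
from datetime import datetime

# Timestamp layout assumed from the "Analysis started/finished" lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-03-14 09:21:28.541103", FMT)
finished = datetime.strptime("2024-03-14 09:21:32.370843", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```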

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 603
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 302
Minimum: 1
Maximum: 603
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 31.1
Q1: 151.5
median: 302
Q3: 452.5
95-th percentile: 572.9
Maximum: 603
Range: 602
Interquartile range (IQR): 301
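
These quantiles are consistent with linear interpolation over the values 1 to 603. The sketch below assumes that interpolation rule and assumes 연번 takes exactly those integer values (inferred from the minimum, maximum and distinct count); `percentile` is an illustrative helper, not a function of the profiling tool.

```python
def percentile(sorted_values, q):
    """Percentile with linear interpolation between neighbouring ranks (q in [0, 1])."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: 연번 is the sequence 1..603 (unique, minimum 1, maximum 603).
serials = list(range(1, 604))

q05, q1, median, q3, q95 = (percentile(serials, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = serials[-1] - serials[0]
print(round(q05, 1), q1, median, q3, round(q95, 1), iqr, value_range)
```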

Descriptive statistics

Standard deviation: 174.21538
Coefficient of variation (CV): 0.57687213
Kurtosis: -1.2
Mean: 302
Median Absolute Deviation (MAD): 151
Skewness: 0
Sum: 182106
Variance: 30351
Monotonicity: Strictly increasing
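
Most of these figures can be reproduced on the assumption that 연번 is simply the sequence 1 to 603. The sketch below uses the sample variance (n - 1 denominator), which is an assumption that happens to match the reported value of 30351:

```python
import statistics

# Assumption: 연번 is the sequence 1..603.
serials = list(range(1, 604))

total = sum(serials)
mean = total / len(serials)
variance = statistics.variance(serials)  # sample variance, n - 1 denominator
std = variance ** 0.5
cv = std / mean                          # coefficient of variation = std / mean
median = statistics.median(serials)
# Median absolute deviation: median of the absolute distances to the median.
mad = statistics.median([abs(x - median) for x in serials])

print(total, mean, variance, round(std, 5), round(cv, 8), mad)
```

A symmetric, evenly spaced sequence like this one also has zero skewness, in line with the reported value.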
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   1      0.2%
398                 1      0.2%
400                 1      0.2%
401                 1      0.2%
402                 1      0.2%
403                 1      0.2%
404                 1      0.2%
405                 1      0.2%
406                 1      0.2%
407                 1      0.2%
Other values (593)  593    98.3%

Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%

Value  Count  Frequency (%)
603    1      0.2%
602    1      0.2%
601    1      0.2%
600    1      0.2%
599    1      0.2%
598    1      0.2%
597    1      0.2%
596    1      0.2%
595    1      0.2%
594    1      0.2%

고객번호
Text

Distinct: 130
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 3618
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
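
As a rough illustration of how such category counts arise, the sketch below classifies the characters of the five sample values with Python's `unicodedata` module. Whether the profiling tool uses this exact module is an assumption; `Nd` is the Unicode short code for Decimal Number and `Po` for Other Punctuation.

```python
import unicodedata
from collections import Counter

# The five masked sample values listed under "Sample" in this report.
sample_ids = ["*11*53", "*19*30", "*74*85", "*77*32", "*59*77"]

# Count Unicode general categories over every character of every value.
category_counts = Counter(
    unicodedata.category(ch) for value in sample_ids for ch in value
)
print(category_counts)
```

Each value has two asterisks and four digits, the same 1:2 ratio as the report's 1206 punctuation characters and 2412 decimal digits across 603 six-character values.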

Unique

Unique: 72
Unique (%): 11.9%

Sample

1st row: *11*53
2nd row: *19*30
3rd row: *74*85
4th row: *77*32
5th row: *59*77
Value               Count  Frequency (%)
20*29               103    17.1%
94*17               53     8.8%
21*88               51     8.5%
87*20               36     6.0%
87*15               27     4.5%
98*78               26     4.3%
11*32               22     3.6%
95*02               17     2.8%
95*98               17     2.8%
53*28               13     2.2%
Other values (120)  238    39.5%

Most occurring characters

Value  Count  Frequency (%)
*      1206   33.3%
2      425    11.7%
9      350    9.7%
8      308    8.5%
1      298    8.2%
0      278    7.7%
7      231    6.4%
5      188    5.2%
3      134    3.7%
4      125    3.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     2412   66.7%
Other Punctuation  1206   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      425    17.6%
9      350    14.5%
8      308    12.8%
1      298    12.4%
0      278    11.5%
7      231    9.6%
5      188    7.8%
3      134    5.6%
4      125    5.2%
6      75     3.1%
Other Punctuation
Value  Count  Frequency (%)
*      1206   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3618   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
*      1206   33.3%
2      425    11.7%
9      350    9.7%
8      308    8.5%
1      298    8.2%
0      278    7.7%
7      231    6.4%
5      188    5.2%
3      134    3.7%
4      125    3.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3618   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
*      1206   33.3%
2      425    11.7%
9      350    9.7%
8      308    8.5%
1      298    8.2%
0      278    7.7%
7      231    6.4%
5      188    5.2%
3      134    3.7%
4      125    3.5%

추징발생년월
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

Value             Count
2023-04           117
2023-03           83
2023-09           83
2023-12           82
2023-10           53
Other values (7)  185

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-01
2nd row: 2023-01
3rd row: 2023-01
4th row: 2023-01
5th row: 2023-01

Common Values

Value             Count  Frequency (%)
2023-04           117    19.4%
2023-03           83     13.8%
2023-09           83     13.8%
2023-12           82     13.6%
2023-10           53     8.8%
2023-06           41     6.8%
2023-05           39     6.5%
2023-08           32     5.3%
2023-02           25     4.1%
2023-11           22     3.6%
Other values (2)  26     4.3%

Length

Histogram of lengths of the category

고지년월
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

Value             Count
2023-12           100
2023-03           79
2023-10           71
2023-09           65
2023-06           59
Other values (7)  229

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-01
2nd row: 2023-01
3rd row: 2023-01
4th row: 2023-01
5th row: 2023-01

Common Values

Value             Count  Frequency (%)
2023-12           100    16.6%
2023-03           79     13.1%
2023-10           71     11.8%
2023-09           65     10.8%
2023-06           59     9.8%
2023-04           55     9.1%
2023-08           51     8.5%
2023-05           42     7.0%
2023-11           36     6.0%
2023-02           23     3.8%
Other values (2)  22     3.6%

Length

Histogram of lengths of the category

계산년월
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.8 KiB

Value             Count
2023-11           84
2023-09           83
2023-03           70
2023-08           54
2023-02           50
Other values (8)  262

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-12
2nd row: 2023-01
3rd row: 2023-01
4th row: 2022-12
5th row: 2023-01

Common Values

Value             Count  Frequency (%)
2023-11           84     13.9%
2023-09           83     13.8%
2023-03           70     11.6%
2023-08           54     9.0%
2023-02           50     8.3%
2023-04           50     8.3%
2023-05           50     8.3%
2023-10           38     6.3%
2023-12           36     6.0%
2023-06           35     5.8%
Other values (3)  53     8.8%

Length

Histogram of lengths of the category

추징금액(상)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 194
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10025.705
Minimum: -2023050
Maximum: 1788990
Zeros: 170
Zeros (%): 28.2%
Negative: 329
Negative (%): 54.6%
Memory size: 5.4 KiB

Quantile statistics

Minimum: -2023050
5-th percentile: -49350
Q1: -10570
median: -1880
Q3: 0
95-th percentile: 141858
Maximum: 1788990
Range: 3812040
Interquartile range (IQR): 10570

Descriptive statistics

Standard deviation: 185538.86
Coefficient of variation (CV): 18.506316
Kurtosis: 64.551054
Mean: 10025.705
Median Absolute Deviation (MAD): 4360
Skewness: 1.8939531
Sum: 6045500
Variance: 3.4424668 × 10^10
Monotonicity: Not monotonic
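
Several of the figures for 추징금액(상) are derived from others in the same block, so they can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers printed in this report, and the variable names are illustrative. Because the standard deviation is itself rounded, the derived values match only to the printed precision.

```python
# Inputs copied from the report for 추징금액(상).
n = 603
total = 6045500
minimum, maximum = -2023050, 1788990
q1, q3 = -10570, 0
std = 185538.86  # rounded value as printed

mean = total / n                  # mean = sum / number of observations
value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range = Q3 - Q1
cv = std / mean                   # coefficient of variation = std / mean
variance = std ** 2               # variance = std squared

print(round(mean, 3), value_range, iqr, round(cv, 4), f"{variance:.4e}")
```

The coefficient of variation of about 18.5 reflects a mean that sits close to zero relative to the spread, since large positive and negative adjustments largely cancel out.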
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   170    28.2%
-4020               22     3.6%
-1880               17     2.8%
-11850              16     2.7%
-49350              16     2.7%
-1200               15     2.5%
-6640               15     2.5%
-20080              15     2.5%
-39730              15     2.5%
-7200               10     1.7%
Other values (184)  292    48.4%

Value     Count  Frequency (%)
-2023050  1      0.2%
-1000000  1      0.2%
-864180   1      0.2%
-494530   1      0.2%
-486140   1      0.2%
-470410   1      0.2%
-412000   1      0.2%
-291440   1      0.2%
-286920   1      0.2%
-214680   1      0.2%

Value    Count  Frequency (%)
1788990  2      0.3%
1648380  1      0.2%
972120   1      0.2%
600000   4      0.7%
500000   2      0.3%
404610   1      0.2%
329580   3      0.5%
329540   1      0.2%
243070   2      0.3%
200000   4      0.7%

추징금액(하)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 306
Distinct (%): 50.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -35328.889
Minimum: -16155750
Maximum: 15645610
Zeros: 97
Zeros (%): 16.1%
Negative: 415
Negative (%): 68.8%
Memory size: 5.4 KiB

Quantile statistics

Minimum: -16155750
5-th percentile: -87888
Q1: -14620
median: -3000
Q3: 0
95-th percentile: 140550
Maximum: 15645610
Range: 31801360
Interquartile range (IQR): 14620

Descriptive statistics

Standard deviation: 1165779.7
Coefficient of variation (CV): -32.997916
Kurtosis: 137.9365
Mean: -35328.889
Median Absolute Deviation (MAD): 4690
Skewness: -1.8703012
Sum: -21303320
Variance: 1.3590423 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   97     16.1%
-2800               17     2.8%
-4500               13     2.2%
-1350               13     2.2%
140550              9      1.5%
-2250               9      1.5%
-9190               8      1.3%
-2920               8      1.3%
-2020               7      1.2%
-2470               7      1.2%
Other values (296)  415    68.8%

Value      Count  Frequency (%)
-16155750  1      0.2%
-10962830  1      0.2%
-8817560   1      0.2%
-5385250   1      0.2%
-1217180   1      0.2%
-1199980   1      0.2%
-835460    1      0.2%
-635250    1      0.2%
-547950    2      0.3%
-502320    1      0.2%

Value     Count  Frequency (%)
15645610  1      0.2%
8817560   1      0.2%
716400    1      0.2%
537080    1      0.2%
502320    1      0.2%
395550    1      0.2%
387000    1      0.2%
376200    1      0.2%
360000    1      0.2%
324050    1      0.2%

추징금액(물)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 172
Distinct (%): 28.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 905.87065
Minimum: -2204390
Maximum: 2204390
Zeros: 193
Zeros (%): 32.0%
Negative: 319
Negative (%): 52.9%
Memory size: 5.4 KiB

Quantile statistics

Minimum: -2204390
5-th percentile: -7650
Q1: -1740
median: -300
Q3: 0
95-th percentile: 17950
Maximum: 2204390
Range: 4408780
Interquartile range (IQR): 1740

Descriptive statistics

Standard deviation: 128989.9
Coefficient of variation (CV): 142.39329
Kurtosis: 283.25408
Mean: 905.87065
Median Absolute Deviation (MAD): 610
Skewness: -0.0084849768
Sum: 546240
Variance: 1.6638395 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   193    32.0%
-610                22     3.6%
-370                17     2.8%
-7650               16     2.7%
-6580               16     2.7%
-2060               16     2.7%
-1170               15     2.5%
-3210               15     2.5%
-450                10     1.7%
-1510               9      1.5%
Other values (162)  274    45.4%

Value     Count  Frequency (%)
-2204390  1      0.2%
-233230   1      0.2%
-97830    1      0.2%
-75900    1      0.2%
-68690    2      0.3%
-57300    1      0.2%
-44420    1      0.2%
-34080    1      0.2%
-33540    1      0.2%
-33390    1      0.2%

Value    Count  Frequency (%)
2204390  1      0.2%
210550   2      0.3%
175100   1      0.2%
141500   1      0.2%
131910   1      0.2%
76070    4      0.7%
46650    1      0.2%
37370    3      0.5%
37360    1      0.2%
35100    1      0.2%

Interactions

[16 interaction plots (a 4 × 4 grid over the numeric variables); images not reproduced here.]

Correlations

              연번     추징발생년월  고지년월  계산년월  추징금액(상)  추징금액(하)  추징금액(물)
연번            1.000  0.940   0.881  0.813  0.145    0.091    0.301
추징발생년월        0.940  1.000   0.987  0.855  0.268    0.124    0.433
고지년월          0.881  0.987   1.000  0.924  0.260    0.133    0.313
계산년월          0.813  0.855   0.924  1.000  0.207    0.102    0.182
추징금액(상)       0.145  0.268   0.260  0.207  1.000    0.451    0.300
추징금액(하)       0.091  0.124   0.133  0.102  0.451    1.000    0.941
추징금액(물)       0.301  0.433   0.313  0.182  0.300    0.941    1.000

              고지년월  계산년월  추징발생년월
고지년월          1.000  0.691  0.783
계산년월          0.691  1.000  0.544
추징발생년월        0.783  0.544  1.000

              연번      추징금액(상)  추징금액(하)  추징금액(물)  추징발생년월  고지년월  계산년월
연번            1.000   -0.019   0.123    0.016    0.777   0.634  0.511
추징금액(상)       -0.019  1.000    0.613    0.893    0.122   0.116  0.082
추징금액(하)       0.123   0.613    1.000    0.661    0.038   0.057  0.041
추징금액(물)       0.016   0.893    0.661    1.000    0.169   0.114  0.062
추징발생년월        0.777   0.122    0.038    0.169    1.000   0.783  0.544
고지년월          0.634   0.116    0.057    0.114    0.783   1.000  0.691
계산년월          0.511   0.082    0.041    0.062    0.544   0.691  1.000
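
The high-correlation alerts near the top of the report can be related to the last matrix above. The sketch below flags, for each variable, the other variables whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption, chosen because it reproduces the alert counts; the names `THRESHOLD` and `high` are illustrative.

```python
THRESHOLD = 0.5  # assumed cutoff; not stated anywhere in this report

names = ["연번", "추징금액(상)", "추징금액(하)", "추징금액(물)",
         "추징발생년월", "고지년월", "계산년월"]

# Coefficients copied row by row from the last correlation matrix.
matrix = [
    [1.000, -0.019, 0.123, 0.016, 0.777, 0.634, 0.511],
    [-0.019, 1.000, 0.613, 0.893, 0.122, 0.116, 0.082],
    [0.123, 0.613, 1.000, 0.661, 0.038, 0.057, 0.041],
    [0.016, 0.893, 0.661, 1.000, 0.169, 0.114, 0.062],
    [0.777, 0.122, 0.038, 0.169, 1.000, 0.783, 0.544],
    [0.634, 0.116, 0.057, 0.114, 0.783, 1.000, 0.691],
    [0.511, 0.082, 0.041, 0.062, 0.544, 0.691, 1.000],
]

# For each variable, list the other variables at or above the cutoff.
high = {
    name: [other for j, other in enumerate(names)
           if j != i and abs(matrix[i][j]) >= THRESHOLD]
    for i, name in enumerate(names)
}
for name, others in high.items():
    print(name, "->", others)
```

With this cutoff, 연번 and the three year-month fields each have three flagged partners, and each amount field has two, which agrees with the "and 2 other fields" and "and 1 other fields" wording of the alerts.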

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index  연번   고객번호    추징발생년월  고지년월     계산년월     추징금액(상)  추징금액(하)  추징금액(물)
0      1    *11*53  2023-01  2023-01  2022-12  -291440  -4800    -44420
1      2    *19*30  2023-01  2023-01  2023-01  600000   0        76070
2      3    *74*85  2023-01  2023-01  2023-01  137700   0        20980
3      4    *77*32  2023-01  2023-01  2022-12  120000   0        0
4      5    *59*77  2023-01  2023-01  2023-01  -115140  -195270  -17720
5      6    *04*06  2023-01  2023-02  2023-02  -33340   -52420   -5120
6      7    *35*02  2023-01  2023-01  2023-01  0        0        -9160
7      8    *41*80  2023-01  2023-01  2022-12  329580   198240   37370
8      9    *30*60  2023-01  2023-01  2023-01  0        -95940   0
9      10   *30*60  2023-01  2023-02  2023-02  0        -109850  0

index  연번   고객번호    추징발생년월  고지년월     계산년월     추징금액(상)  추징금액(하)  추징금액(물)
593    594  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
594    595  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
595    596  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
596    597  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
597    598  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
598    599  *95*98  2023-12  2023-12  2023-11  -1880    -2800    -370
599    600  *34*65  2023-12  2023-12  2023-12  130060   166630   15570
600    601  *12*54  2023-12  2023-12  2023-11  0        537080   141500
601    602  *19*60  2023-12  2023-12  2023-12  -6040    -14840   -510
602    603  *15*16  2023-12  2023-12  2023-12  0        387000   0