Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 742.2 KiB
Average record size in memory: 76.0 B
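
The average record size follows from the total memory size and the number of observations. A minimal check, assuming 1 KiB means 1024 bytes (the report does not state the unit convention):

```python
# Figures copied from the dataset statistics above.
total_kib = 742.2
n_rows = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B")
```

Under that assumption the result rounds to the reported 76.0 B.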

Variable types

Numeric: 4
Text: 1
Categorical: 3

Dataset

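Description: Busan Metropolitan City Waterworks Headquarters, customer information system, billing calculation information, additional-charge calculation history, 20220609 (부산광역시상수도사업본부_수용가정보시스템_요금계산관련정보_추징계산이력_20220609)
Author: Busan Metropolitan City Waterworks Headquarters (부산광역시 상수도사업본부)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15083669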

Alerts

추징금액(하) is highly overall correlated with 추징금액(물) [High correlation]
추징금액(물) is highly overall correlated with 추징금액(하) [High correlation]
추징발생년월 is highly overall correlated with 고지년월 and 1 other field [High correlation]
고지년월 is highly overall correlated with 추징발생년월 and 1 other field [High correlation]
계산년월 is highly overall correlated with 추징발생년월 and 1 other field [High correlation]
추징발생년월 is highly imbalanced (90.9%) [Imbalance]
고지년월 is highly imbalanced (71.3%) [Imbalance]
계산년월 is highly imbalanced (93.3%) [Imbalance]
추징금액(상) is highly skewed (γ1 = -79.55228714) [Skewed]
추징금액(하) is highly skewed (γ1 = -22.92633547) [Skewed]
연번 has unique values [Unique]
추징금액(하) has 9963 (99.6%) zeros [Zeros]
추징금액(물) has 9970 (99.7%) zeros [Zeros]
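
The imbalance percentages appear consistent with a normalized-entropy score, 1 - H / log(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. That formula is an assumption about how the profiler derives the figure; the sketch below only checks it against the value counts this report lists for 추징발생년월.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for a balanced column, close to 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful score
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Value counts reported for 추징발생년월 (8 distinct months, 10000 rows).
counts = [9598, 357, 23, 7, 4, 4, 4, 3]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

For these counts the score rounds to 90.9%, which matches the alert for 추징발생년월.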

Reproduction

Analysis started: 2023-12-10 17:13:49.178704
Analysis finished: 2023-12-10 17:13:55.298172
Duration: 6.12 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
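
The reported duration can be cross-checked from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-10 17:13:49.178704")
finished = datetime.fromisoformat("2023-12-10 17:13:55.298172")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```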

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47590.43
Minimum: 5
Maximum: 95392
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 5
5th percentile: 4804.95
Q1: 23302.5
Median: 47946
Q3: 71434.75
95th percentile: 90594.5
Maximum: 95392
Range: 95387
Interquartile range (IQR): 48132.25

Descriptive statistics

Standard deviation: 27544.536
Coefficient of variation (CV): 0.57878308
Kurtosis: -1.2026119
Mean: 47590.43
Median Absolute Deviation (MAD): 23977
Skewness: -0.0031175675
Sum: 4.759043 × 10^8
Variance: 7.5870145 × 10^8
Monotonicity: Not monotonic
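
Several of the derived figures for 연번 can be recomputed from the other reported statistics. The sketch below uses the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean); the variable names are illustrative.

```python
# Reported statistics for 연번.
minimum, maximum = 5, 95392
q1, q3 = 23302.5, 71434.75
mean, std = 47590.43, 27544.536

value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
cv = std / mean                  # coefficient of variation

print(value_range, iqr, round(cv, 6))
```

The results agree with the reported range (95387), IQR (48132.25) and CV (0.57878308), up to rounding of the inputs.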
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
33739                1      < 0.1%
2204                 1      < 0.1%
8859                 1      < 0.1%
26291                1      < 0.1%
15637                1      < 0.1%
11995                1      < 0.1%
86813                1      < 0.1%
28844                1      < 0.1%
40752                1      < 0.1%
83714                1      < 0.1%
Other values (9990)  9990   99.9%

Minimum 10 values

Value  Count  Frequency (%)
5      1      < 0.1%
7      1      < 0.1%
19     1      < 0.1%
28     1      < 0.1%
44     1      < 0.1%
50     1      < 0.1%
83     1      < 0.1%
85     1      < 0.1%
88     1      < 0.1%
97     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
95392  1      < 0.1%
95390  1      < 0.1%
95389  1      < 0.1%
95359  1      < 0.1%
95348  1      < 0.1%
95325  1      < 0.1%
95308  1      < 0.1%
95303  1      < 0.1%
95290  1      < 0.1%
95284  1      < 0.1%

고객번호
Text

Distinct: 5792
Distinct (%): 57.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 60000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
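
As an illustration of these character properties, Python's standard `unicodedata` module reports the general category of each code point. The sketch below applies it to the five sample values this report shows for the masked text column; it is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# The five sample values listed for the text column in this report.
samples = ["*91*30", "*19*90", "*97*05", "*07*83", "*92*27"]

# Count characters by Unicode general category:
# 'Nd' is Decimal Number, 'Po' is Other Punctuation.
categories = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(categories)
```

Each six-character value contributes four digits and two asterisks, the same 2:1 ratio as the 40000 Decimal Number and 20000 Other Punctuation characters reported for the full column.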

Unique

Unique: 3201
Unique (%): 32.0%

Sample

1st row: *91*30
2nd row: *19*90
3rd row: *97*05
4th row: *07*83
5th row: *92*27

Value                Count  Frequency (%)
93*59                21     0.2%
07*98                8      0.1%
07*88                8      0.1%
07*13                8      0.1%
07*89                8      0.1%
07*16                8      0.1%
89*98                7      0.1%
00*16                7      0.1%
07*75                7      0.1%
21*93                7      0.1%
Other values (5782)  9911   99.1%

Most occurring characters

Value  Count  Frequency (%)
*      20000  33.3%
0      5193   8.7%
9      4402   7.3%
1      4274   7.1%
5      4048   6.7%
3      3898   6.5%
7      3856   6.4%
2      3810   6.3%
8      3715   6.2%
4      3535   5.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     40000  66.7%
Other Punctuation  20000  33.3%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      5193   13.0%
9      4402   11.0%
1      4274   10.7%
5      4048   10.1%
3      3898   9.7%
7      3856   9.6%
2      3810   9.5%
8      3715   9.3%
4      3535   8.8%
6      3269   8.2%

Other Punctuation

Value  Count  Frequency (%)
*      20000  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  60000  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
*      20000  33.3%
0      5193   8.7%
9      4402   7.3%
1      4274   7.1%
5      4048   6.7%
3      3898   6.5%
7      3856   6.4%
2      3810   6.3%
8      3715   6.2%
4      3535   5.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60000  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
*      20000  33.3%
0      5193   8.7%
9      4402   7.3%
1      4274   7.1%
5      4048   6.7%
3      3898   6.5%
7      3856   6.4%
2      3810   6.3%
8      3715   6.2%
4      3535   5.9%

추징발생년월
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value             Count
2021-07-01        9598
2021-08-01        357
2021-03-01        23
2021-02-01        7
2021-06-01        4
Other values (3)  11

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-07-01
2nd row: 2021-07-01
3rd row: 2021-07-01
4th row: 2021-07-01
5th row: 2021-07-01

Common Values

Value       Count  Frequency (%)
2021-07-01  9598   96.0%
2021-08-01  357    3.6%
2021-03-01  23     0.2%
2021-02-01  7      0.1%
2021-06-01  4      < 0.1%
2021-05-01  4      < 0.1%
2021-01-01  4      < 0.1%
2021-04-01  3      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
2021-07-01  9598   96.0%
2021-08-01  357    3.6%
2021-03-01  23     0.2%
2021-02-01  7      0.1%
2021-06-01  4      < 0.1%
2021-05-01  4      < 0.1%
2021-01-01  4      < 0.1%
2021-04-01  3      < 0.1%

고지년월
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
2021-07-01         5387
2021-08-01         4457
2021-09-01         117
2021-03-01         8
2021-05-01         6
Other values (10)  25

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 2021-07-01
2nd row: 2021-07-01
3rd row: 2021-08-01
4th row: 2021-07-01
5th row: 2021-07-01

Common Values

Value             Count  Frequency (%)
2021-07-01        5387   53.9%
2021-08-01        4457   44.6%
2021-09-01        117    1.2%
2021-03-01        8      0.1%
2021-05-01        6      0.1%
2021-04-01        5      0.1%
2022-05-01        4      < 0.1%
2021-02-01        4      < 0.1%
2021-11-01        3      < 0.1%
2022-03-01        3      < 0.1%
Other values (5)  6      0.1%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
2021-07-01        5387   53.9%
2021-08-01        4457   44.6%
2021-09-01        117    1.2%
2021-03-01        8      0.1%
2021-05-01        6      0.1%
2021-04-01        5      < 0.1%
2022-05-01        4      < 0.1%
2021-02-01        4      < 0.1%
2021-11-01        3      < 0.1%
2022-03-01        3      < 0.1%
Other values (5)  6      0.1%

계산년월
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
2021-07-01         9597
2021-08-01         360
2021-03-01         10
2021-02-01         6
2021-05-01         5
Other values (13)  22

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 2021-07-01
2nd row: 2021-07-01
3rd row: 2021-07-01
4th row: 2021-07-01
5th row: 2021-07-01

Common Values

Value             Count  Frequency (%)
2021-07-01        9597   96.0%
2021-08-01        360    3.6%
2021-03-01        10     0.1%
2021-02-01        6      0.1%
2021-05-01        5      0.1%
2021-06-01        4      < 0.1%
2022-02-01        2      < 0.1%
2022-05-01        2      < 0.1%
2021-10-01        2      < 0.1%
2021-11-01        2      < 0.1%
Other values (8)  10     0.1%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
2021-07-01        9597   96.0%
2021-08-01        360    3.6%
2021-03-01        10     0.1%
2021-02-01        6      0.1%
2021-05-01        5      < 0.1%
2021-06-01        4      < 0.1%
2022-03-01        2      < 0.1%
2022-04-01        2      < 0.1%
2021-11-01        2      < 0.1%
2021-10-01        2      < 0.1%
Other values (8)  10     0.1%

추징금액(상)
Real number (ℝ)

SKEWED 

Distinct: 911
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -3093.195
Minimum: -5792950
Maximum: 155990
Zeros: 15
Zeros (%): 0.1%
Negative: 9981
Negative (%): 99.8%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -5792950
5th percentile: -7211.5
Q1: -1200
Median: -430
Q3: -140
95th percentile: -20
Maximum: 155990
Range: 5948940
Interquartile range (IQR): 1060

Descriptive statistics

Standard deviation: 63526.446
Coefficient of variation (CV): -20.537485
Kurtosis: 7013.8682
Mean: -3093.195
Median Absolute Deviation (MAD): 350
Skewness: -79.552287
Sum: -30931950
Variance: 4.0356094 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
-10                 286    2.9%
-30                 247    2.5%
-40                 245    2.5%
-20                 243    2.4%
-90                 231    2.3%
-60                 228    2.3%
-80                 219    2.2%
-70                 211    2.1%
-120                190    1.9%
-110                182    1.8%
Other values (901)  7718   77.2%

Minimum 10 values

Value     Count  Frequency (%)
-5792950  1      < 0.1%
-2040550  1      < 0.1%
-955010   1      < 0.1%
-463250   1      < 0.1%
-449300   1      < 0.1%
-367610   1      < 0.1%
-353690   1      < 0.1%
-347800   1      < 0.1%
-302290   1      < 0.1%
-299040   1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
155990  1      < 0.1%
94630   1      < 0.1%
460     2      < 0.1%
0       15     0.1%
-10     286    2.9%
-20     243    2.4%
-30     247    2.5%
-40     245    2.5%
-60     228    2.3%
-70     211    2.1%

추징금액(하)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 32
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -19.946
Minimum: -356310
Maximum: 212900
Zeros: 9963
Zeros (%): 99.6%
Negative: 34
Negative (%): 0.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -356310
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 212900
Range: 569210
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 5010.3727
Coefficient of variation (CV): -251.19687
Kurtosis: 3246.3427
Mean: -19.946
Median Absolute Deviation (MAD): 0
Skewness: -22.926335
Sum: -199460
Variance: 25103835
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)

Common values

Value              Count  Frequency (%)
0                  9963   99.6%
-1800              3      < 0.1%
-900               3      < 0.1%
-4500              2      < 0.1%
-220               2      < 0.1%
-190               1      < 0.1%
-670               1      < 0.1%
-25050             1      < 0.1%
-1150              1      < 0.1%
-8020              1      < 0.1%
Other values (22)  22     0.2%

Minimum 10 values

Value    Count  Frequency (%)
-356310  1      < 0.1%
-116800  1      < 0.1%
-111690  1      < 0.1%
-25050   1      < 0.1%
-15820   1      < 0.1%
-11770   1      < 0.1%
-8020    1      < 0.1%
-7690    1      < 0.1%
-6200    1      < 0.1%
-5660    1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
212900  1      < 0.1%
208500  1      < 0.1%
88320   1      < 0.1%
0       9963   99.6%
-190    1      < 0.1%
-220    2      < 0.1%
-450    1      < 0.1%
-530    1      < 0.1%
-670    1      < 0.1%
-860    1      < 0.1%

추징금액(물)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 26
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.425
Minimum: -21770
Maximum: 18420
Zeros: 9970
Zeros (%): 99.7%
Negative: 28
Negative (%): 0.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -21770
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 18420
Range: 40190
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 362.61099
Coefficient of variation (CV): -105.87182
Kurtosis: 2388.6316
Mean: -3.425
Median Absolute Deviation (MAD): 0
Skewness: -9.1915201
Sum: -34250
Variance: 131486.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)

Common values

Value              Count  Frequency (%)
0                  9970   99.7%
-70                4      < 0.1%
-820               2      < 0.1%
-1500              2      < 0.1%
-750               1      < 0.1%
-12300             1      < 0.1%
-590               1      < 0.1%
-5210              1      < 0.1%
-290               1      < 0.1%
-1110              1      < 0.1%
Other values (16)  16     0.2%

Minimum 10 values

Value   Count  Frequency (%)
-21770  1      < 0.1%
-12300  1      < 0.1%
-10210  1      < 0.1%
-5210   1      < 0.1%
-2310   1      < 0.1%
-1780   1      < 0.1%
-1640   1      < 0.1%
-1500   2      < 0.1%
-1110   1      < 0.1%
-830    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
18420  1      < 0.1%
14050  1      < 0.1%
0      9970   99.7%
-60    1      < 0.1%
-70    4      < 0.1%
-140   1      < 0.1%
-220   1      < 0.1%
-290   1      < 0.1%
-300   1      < 0.1%
-370   1      < 0.1%

Interactions

(16 pairwise interaction plots of the numeric variables; images not recoverable from the extracted text.)

Correlations

               연번     추징발생년월  고지년월  계산년월  추징금액(상)  추징금액(하)  추징금액(물)
연번           1.000  0.447        0.341     0.470     0.025        0.061        0.061
추징발생년월   0.447  1.000        0.904     0.917     0.000        0.511        0.478
고지년월       0.341  0.904        1.000     0.969     0.000        0.703        0.557
계산년월       0.470  0.917        0.969     1.000     0.000        0.649        0.613
추징금액(상)   0.025  0.000        0.000     0.000     1.000        0.000        0.000
추징금액(하)   0.061  0.511        0.703     0.649     0.000        1.000        0.966
추징금액(물)   0.061  0.478        0.557     0.613     0.000        0.966        1.000

               고지년월  계산년월  추징발생년월
고지년월       1.000     0.796     0.685
계산년월       0.796     1.000     0.711
추징발생년월   0.685     0.711     1.000

               연번     추징금액(상)  추징금액(하)  추징금액(물)  추징발생년월  고지년월  계산년월
연번           1.000  0.037        0.078        0.077        0.232        0.134     0.200
추징금액(상)   0.037  1.000        0.011        0.023        0.000        0.000     0.000
추징금액(하)   0.078  0.011        1.000        0.840        0.345        0.384     0.394
추징금액(물)   0.077  0.023        0.840        1.000        0.362        0.300     0.381
추징발생년월   0.232  0.000        0.345        0.362        1.000        0.685     0.711
고지년월       0.134  0.000        0.384        0.300        0.685        1.000     0.796
계산년월       0.200  0.000        0.394        0.381        0.711        0.796     1.000
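
The high-correlation alerts near the top of this report can be compared against the third matrix above (the one listing the numeric columns first). The sketch below lists every pair at or above a cut-off of 0.5. That threshold is an assumption, since the report does not state it, and the pairing of the alerts with this particular matrix is inferred from the numbers rather than documented.

```python
from itertools import combinations

names = ["연번", "추징금액(상)", "추징금액(하)", "추징금액(물)",
         "추징발생년월", "고지년월", "계산년월"]

# Values copied from the third correlation matrix in this report.
matrix = [
    [1.000, 0.037, 0.078, 0.077, 0.232, 0.134, 0.200],
    [0.037, 1.000, 0.011, 0.023, 0.000, 0.000, 0.000],
    [0.078, 0.011, 1.000, 0.840, 0.345, 0.384, 0.394],
    [0.077, 0.023, 0.840, 1.000, 0.362, 0.300, 0.381],
    [0.232, 0.000, 0.345, 0.362, 1.000, 0.685, 0.711],
    [0.134, 0.000, 0.384, 0.300, 0.685, 1.000, 0.796],
    [0.200, 0.000, 0.394, 0.381, 0.711, 0.796, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

# Collect each unordered pair once (upper triangle only).
high = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if matrix[i][j] >= THRESHOLD
]
for a, b, r in high:
    print(f"{a} ~ {b}: {r:.3f}")
```

With this cut-off the pairs are 추징금액(하) with 추징금액(물), plus the three pairings among 추징발생년월, 고지년월 and 계산년월, which is the same set of relationships the alerts describe.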

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index)  연번   고객번호  추징발생년월  고지년월    계산년월    추징금액(상)  추징금액(하)  추징금액(물)
33738    33739  *91*30    2021-07-01    2021-07-01  2021-07-01  -7290         0             0
69546    69547  *19*90    2021-07-01    2021-07-01  2021-07-01  -80           0             0
81813    81814  *97*05    2021-07-01    2021-08-01  2021-07-01  -90           0             0
87088    87089  *07*83    2021-07-01    2021-07-01  2021-07-01  -2450         0             0
34038    34039  *92*27    2021-07-01    2021-07-01  2021-07-01  -220          0             0
75260    75261  *51*62    2021-07-01    2021-08-01  2021-07-01  -2750         0             0
72570    72571  *35*54    2021-07-01    2021-07-01  2021-07-01  -2000         0             0
1115     1116   *00*79    2021-07-01    2021-07-01  2021-07-01  -40           0             0
73298    73299  *38*76    2021-07-01    2021-08-01  2021-07-01  -340          0             0
6636     6637   *33*02    2021-07-01    2021-07-01  2021-07-01  -1350         0             0

Last rows

(index)  연번   고객번호  추징발생년월  고지년월    계산년월    추징금액(상)  추징금액(하)  추징금액(물)
16585    16586  *93*18    2021-07-01    2021-08-01  2021-07-01  -220          0             0
28029    28030  *55*71    2021-07-01    2021-07-01  2021-07-01  -250          0             0
43293    43294  *42*65    2021-07-01    2021-07-01  2021-07-01  -300          0             0
16059    16060  *89*23    2021-07-01    2021-08-01  2021-07-01  -540          0             0
64875    64876  *99*52    2021-07-01    2021-07-01  2021-07-01  -170          0             0
59401    59402  *13*32    2021-07-01    2021-07-01  2021-07-01  -280          0             0
77808    77809  *90*97    2021-07-01    2021-08-01  2021-07-01  -40           0             0
35366    35367  *98*68    2021-07-01    2021-08-01  2021-07-01  -130          0             0
80960    80961  *96*92    2021-07-01    2021-07-01  2021-07-01  -160          0             0
27573    27574  *53*50    2021-07-01    2021-08-01  2021-07-01  -160          0             0