Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 197
Duplicate rows (%): 2.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
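
The derived figures in this block follow from the raw counts by simple arithmetic. The sketch below recomputes them from the numbers reported above; the variable names are illustrative only and are not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_rows = 10_000
n_variables = 7
missing_cells = 0
duplicate_rows = 197
total_size_kib = 654.3

# Percentages are relative to the number of rows (or cells, for missing values).
duplicate_pct = 100 * duplicate_rows / n_rows
missing_pct = 100 * missing_cells / (n_rows * n_variables)

# 1 KiB = 1024 bytes, so the average record size is total bytes / rows.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, these agree with the 2.0%, 0.0%, and 67.0 B shown in the report.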

Variable types

Numeric: 3
Text: 1
Categorical: 1
DateTime: 2

Dataset

Description: Water and sewerage billing notice data for Buyeo-gun, Chungcheongnam-do (notice number, region, business type, usage, usage fee, payment due date, data reference date).
Author: Chungcheongnam-do (충청남도)
URL: https://alldam.chungnam.go.kr/index.chungnam?menuCd=DOM_000000201001001001&st=&cds=&orgCd=&apiType=&isOpen=Y&pageIndex=410&beforeMenuCd=DOM_000000201001001000&publicdatapk=15040581

Alerts

납기마감일 has constant value "2023-09-30" (Constant)
데이터기준일자 has constant value "2023-08-31" (Constant)
Dataset has 197 (2.0%) duplicate rows (Duplicates)
사용량 is highly overall correlated with 사용료 (High correlation)
사용료 is highly overall correlated with 사용량 (High correlation)
업종 is highly imbalanced (79.7%) (Imbalance)
사용량 is highly skewed (γ1 = 57.11037773) (Skewed)
사용료 is highly skewed (γ1 = 31.59289652) (Skewed)
사용량 has 2116 (21.2%) zeros (Zeros)
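
The report does not spell out how the 79.7% imbalance figure for 업종 is obtained. One formula that reproduces it from the category counts listed in the 업종 section is one minus the ratio of the Shannon entropy to its maximum possible value. The sketch below is a check under that assumption; the function name `imbalance_score` is made up for this example.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Category counts of 업종 as listed under "Common Values" in the report.
counts = [8786, 1165, 27, 14, 6, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

With the seven counts from the report this evaluates to roughly 0.797, which matches the alert.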

Reproduction

Analysis started: 2024-01-09 22:59:40.977185
Analysis finished: 2024-01-09 22:59:42.297948
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
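
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal outline, not the configuration that was actually used: the CSV path, output path, encoding, and report title are hypothetical, and only the two date column names are taken from the report itself.

```python
# Column names taken from the report; both are profiled there as dates.
DATE_COLUMNS = ["납기마감일", "데이터기준일자"]
REPORT_TITLE = "Buyeo-gun water and sewerage billing"  # hypothetical title

def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report.

    csv_path, output_path and encoding are assumptions for illustration;
    the source file may use a different encoding.
    """
    # Imported lazily so the module still loads where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=DATE_COLUMNS)
    profile = ProfileReport(df, title=REPORT_TITLE)
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("billing.csv")` with a real file would write `report.html` next to the script. The exact settings used for this report are in the downloadable config.json, which is not reproduced here.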

Variables

수용가번호 (customer number)
Real number (ℝ)

Distinct: 8228
Distinct (%): 82.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.9846697 × 10^9
Minimum: 1.0110022 × 10^9
Maximum: 2.2007004 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1.0110022 × 10^9
5-th percentile: 1.0320118 × 10^9
Q1: 1.1410387 × 10^9
Median: 2.3380114 × 10^9
Q3: 1.4016003 × 10^10
95-th percentile: 1.9015005 × 10^10
Maximum: 2.2007004 × 10^10
Range: 2.0996002 × 10^10
Interquartile range (IQR): 1.2874965 × 10^10

Descriptive statistics

Standard deviation: 6.599785 × 10^9
Coefficient of variation (CV): 0.9448958
Kurtosis: -1.2485065
Mean: 6.9846697 × 10^9
Median Absolute Deviation (MAD): 1.3069918 × 10^9
Skewness: 0.60590672
Sum: 6.9846697 × 10^13
Variance: 4.3557163 × 10^19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
2333011500           5      0.1%
1366001400           5      0.1%
1366001600           5      0.1%
1366002500           5      0.1%
2338010200           4      < 0.1%
2355014100           4      < 0.1%
1141011200           4      < 0.1%
2338003000           4      < 0.1%
3333002500           4      < 0.1%
2338007800           4      < 0.1%
Other values (8218)  9956   99.6%

Minimum 10 values

Value        Count  Frequency (%)
1011002200   1      < 0.1%
1011003700   1      < 0.1%
1011004900   1      < 0.1%
1011005300   1      < 0.1%
1011005800   1      < 0.1%
1011006200   1      < 0.1%
1011006500   2      < 0.1%
1011006600   1      < 0.1%
1011007300   3      < 0.1%
1011007500   1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
22007004200  1      < 0.1%
22007003900  1      < 0.1%
22007003300  1      < 0.1%
22007001700  1      < 0.1%
22007001600  1      < 0.1%
22007001200  1      < 0.1%
22007000500  1      < 0.1%
22006003600  1      < 0.1%
22006003300  1      < 0.1%
22006002500  1      < 0.1%

지역 (region)
Text

Distinct: 302
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 80000
Distinct characters: 176
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 72
Unique (%): 0.7%

Sample

1st row: 충청남도 부여군
2nd row: 충청남도 부여군
3rd row: 충청남도 부여군
4th row: 충청남도 부여군
5th row: 충청남도 부여군

Value               Count  Frequency (%)
충청남도            7638   38.0%
부여군              7638   38.0%
부여읍              743    3.7%
규암면              287    1.4%
세도면              173    0.9%
석성면              155    0.8%
초촌면              132    0.7%
임천면              129    0.6%
장암면              117    0.6%
구룡면              111    0.6%
Other values (263)  2957   14.7%

Most occurring characters

ValueCountFrequency (%)
12100
15.1%
8476
10.6%
8393
10.5%
7889
9.9%
7841
9.8%
7811
9.8%
7645
9.6%
7640
9.6%
2231
 
2.8%
1622
 
2.0%
Other values (166) 8352
10.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 67815
84.8%
Space Separator 12100
 
15.1%
Decimal Number 85
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8476
12.5%
8393
12.4%
7889
11.6%
7841
11.6%
7811
11.5%
7645
11.3%
7640
11.3%
2231
 
3.3%
1622
 
2.4%
743
 
1.1%
Other values (156) 7524
11.1%
Decimal Number
ValueCountFrequency (%)
2 17
20.0%
7 12
14.1%
1 11
12.9%
4 10
11.8%
3 10
11.8%
5 10
11.8%
6 8
9.4%
9 6
 
7.1%
8 1
 
1.2%
Space Separator
ValueCountFrequency (%)
12100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 67815
84.8%
Common 12185
 
15.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8476
12.5%
8393
12.4%
7889
11.6%
7841
11.6%
7811
11.5%
7645
11.3%
7640
11.3%
2231
 
3.3%
1622
 
2.4%
743
 
1.1%
Other values (156) 7524
11.1%
Common
ValueCountFrequency (%)
12100
99.3%
2 17
 
0.1%
7 12
 
0.1%
1 11
 
0.1%
4 10
 
0.1%
3 10
 
0.1%
5 10
 
0.1%
6 8
 
0.1%
9 6
 
< 0.1%
8 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 67815
84.8%
ASCII 12185
 
15.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12100
99.3%
2 17
 
0.1%
7 12
 
0.1%
1 11
 
0.1%
4 10
 
0.1%
3 10
 
0.1%
5 10
 
0.1%
6 8
 
0.1%
9 6
 
< 0.1%
8 1
 
< 0.1%
Hangul
ValueCountFrequency (%)
8476
12.5%
8393
12.4%
7889
11.6%
7841
11.6%
7811
11.5%
7645
11.3%
7640
11.3%
2231
 
3.3%
1622
 
2.4%
743
 
1.1%
Other values (156) 7524
11.1%

업종 (business type)
Categorical

IMBALANCE

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Category          Count
가정용            8786
일반용            1165
일반겸용          27
일반(교육)용      14
대중탕용          6
Other values (2)  2

Length

Max length: 9
Median length: 3
Mean length: 3.0096
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 일반용
2nd row: 가정용
3rd row: 가정용
4th row: 가정용
5th row: 가정용

Common Values

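Value                                               Count  Frequency (%)
가정용 (household)                                  8786   87.9%
일반용 (general)                                    1165   11.7%
일반겸용 (general, combined use)                    27     0.3%
일반(교육)용 (general, education)                   14     0.1%
대중탕용 (public bathhouse)                         6      0.1%
일반용(가구분할) (general, household split)         1      < 0.1%
가정용2 (household 2)                               1      < 0.1%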

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the category counts listed under Common Values.

사용량 (usage)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS

Distinct: 192
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.402
Minimum: 0
Maximum: 8382
Zeros: 2116
Zeros (%): 21.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 10
Q3: 19
95-th percentile: 43
Maximum: 8382
Range: 8382
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 125.60751
Coefficient of variation (CV): 7.2179928
Kurtosis: 3623.2885
Mean: 17.402
Median Absolute Deviation (MAD): 9
Skewness: 57.110378
Sum: 174020
Variance: 15777.247
Monotonicity: Not monotonic
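
Several of these statistics are derived from others in the same section. The sketch below recomputes the range, IQR, coefficient of variation, and variance for 사용량 from the values reported above; the variable names are illustrative only.

```python
# Values copied from the 사용량 section of the report.
minimum, maximum = 0, 8382
q1, q3 = 2, 19
mean, std = 17.402, 125.60751

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean
variance = std ** 2               # Variance = standard deviation squared

print(value_range, iqr, round(cv, 4), round(variance, 2))
```

A CV above 7 alongside an IQR of only 17 reflects the shape of the data: most readings are small, and a few very large ones dominate the standard deviation.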
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   2116   21.2%
1                   379    3.8%
7                   330    3.3%
5                   329    3.3%
6                   320    3.2%
10                  320    3.2%
4                   310    3.1%
8                   308    3.1%
9                   305    3.0%
11                  301    3.0%
Other values (182)  4982   49.8%

Minimum 10 values

Value  Count  Frequency (%)
0      2116   21.2%
1      379    3.8%
2      286    2.9%
3      292    2.9%
4      310    3.1%
5      329    3.3%
6      320    3.2%
7      330    3.3%
8      308    3.1%
9      305    3.0%

Maximum 10 values

Value  Count  Frequency (%)
8382   1      < 0.1%
7980   1      < 0.1%
3090   1      < 0.1%
1603   1      < 0.1%
1574   1      < 0.1%
1192   1      < 0.1%
903    1      < 0.1%
881    1      < 0.1%
606    1      < 0.1%
590    1      < 0.1%

사용료 (usage fee)
Real number (ℝ)

HIGH CORRELATION  SKEWED

Distinct: 2578
Distinct (%): 25.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27275.047
Minimum: 0
Maximum: 9444190
Zeros: 29
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1140
Q1: 3280
Median: 10310
Q3: 20690
95-th percentile: 66399.5
Maximum: 9444190
Range: 9444190
Interquartile range (IQR): 17410

Descriptive statistics

Standard deviation: 185718.07
Coefficient of variation (CV): 6.8090835
Kurtosis: 1217.6277
Mean: 27275.047
Median Absolute Deviation (MAD): 8020
Skewness: 31.592897
Sum: 2.7275047 × 10^8
Variance: 3.4491202 × 10^10
Monotonicity: Not monotonic
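
The skewness reported here (γ1) measures how asymmetric the distribution is. The report does not say which estimator it uses, so the sketch below assumes the bias-adjusted Fisher-Pearson coefficient, a common default in statistical software. The `fees` list is invented to mimic a few base-rate bills plus one very large one; it is not data from the report.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (assumed estimator, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Hypothetical fees: many small bills and a single extreme one.
fees = [1140] * 8 + [3280, 10310, 20690, 9444190]
print(sample_skewness(fees))          # strongly positive: long right tail
print(sample_skewness([1, 2, 3]))     # symmetric data gives zero
```

A single extreme bill is enough to push the coefficient far above zero, consistent with the large positive skewness reported for 사용료.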
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
1140                 1044   10.4%
1320                 422    4.2%
3280                 88     0.9%
1640                 81     0.8%
1230                 64     0.6%
1360                 62     0.6%
2290                 56     0.6%
1950                 50     0.5%
19540                49     0.5%
2120                 49     0.5%
Other values (2568)  8035   80.3%

Minimum 10 values

Value  Count  Frequency (%)
0      29     0.3%
80     1      < 0.1%
340    2      < 0.1%
510    1      < 0.1%
560    1      < 0.1%
570    1      < 0.1%
590    1      < 0.1%
1030   1      < 0.1%
1110   1      < 0.1%
1120   19     0.2%

Maximum 10 values

Value    Count  Frequency (%)
9444190  1      < 0.1%
7263900  1      < 0.1%
6176270  1      < 0.1%
5302700  1      < 0.1%
4901660  1      < 0.1%
4660090  1      < 0.1%
4198920  1      < 0.1%
3492700  1      < 0.1%
3192700  1      < 0.1%
2502600  1      < 0.1%

납기마감일 (payment due date)
Date

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2023-09-30 00:00:00
Maximum: 2023-09-30 00:00:00
Histogram with fixed size bins (bins=1)

데이터기준일자 (data reference date)
Date

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2023-08-31 00:00:00
Maximum: 2023-08-31 00:00:00
Histogram with fixed size bins (bins=1)

Interactions


Correlations

            수용가번호  업종    사용량  사용료
수용가번호  1.000       0.126   0.000   0.000
업종        0.126       1.000   0.044   0.104
사용량      0.000       0.044   1.000   0.942
사용료      0.000       0.104   0.942   1.000

            수용가번호  사용량  사용료  업종
수용가번호  1.000       -0.147  -0.222  0.066
사용량      -0.147      1.000   0.948   0.030
사용료      -0.222      0.948   1.000   0.055
업종        0.066       0.030   0.055   1.000
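
The report does not label which coefficient fills each of the two matrices above. As a general illustration of why a linear and a rank-based coefficient can differ on heavily skewed columns such as 사용량 and 사용료, the sketch below computes Pearson and Spearman correlation by hand. The `usage` and `fee` lists are invented and the helper names are made up for this example.

```python
import math

def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical usage and fee pairs with one extreme customer.
usage = [0, 1, 5, 10, 19, 43, 8382]
fee = [1140, 1320, 3280, 10310, 20690, 66400, 9444190]
print(round(pearson(usage, fee), 3), round(spearman(usage, fee), 3))
```

Because the fee rises with usage for every pair in this toy data, the rank-based coefficient is 1 regardless of how extreme the largest customer is, while the linear coefficient depends on the actual magnitudes.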

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

수용가번호지역업종사용량사용료납기마감일데이터기준일자
445141021007500충청남도 부여군일반용29665802023-09-302023-08-31
5779412021011500충청남도 부여군가정용25313502023-09-302023-08-31
8422915121000900충청남도 부여군가정용345802023-09-302023-08-31
949091247003400충청남도 부여군가정용665702023-09-302023-08-31
528672085002500충청남도 부여군가정용443202023-09-302023-08-31
8204414012002300충청남도 부여군가정용011402023-09-302023-08-31
6118614039004200충청남도 부여군가정용011402023-09-302023-08-31
692301111018600충청남도 부여군가정용17205902023-09-302023-08-31
479561122035600충청남도 부여군가정용680102023-09-302023-08-31
727111251017600충청남도 부여군가정용011402023-09-302023-08-31
수용가번호지역업종사용량사용료납기마감일데이터기준일자
344697081000800충청남도 부여군가정용1091902023-09-302023-08-31
4214717071000800충청남도 부여군가정용011402023-09-302023-08-31
689491101001100충청남도 부여군가정용40568502023-09-302023-08-31
5908613101002200충청남도 부여군가정용011402023-09-302023-08-31
687551082008900충청남도 부여군가정용011402023-09-302023-08-31
236021051018300충청남도 부여군가정용679802023-09-302023-08-31
5913413102002100충청남도 부여군가정용335702023-09-302023-08-31
8266514027001400충청남도 부여군일반용024702023-09-302023-08-31
4315219019004600충청남도 부여군가정용011402023-09-302023-08-31
517732031006300충청남도 부여군가정용31404702023-09-302023-08-31

Duplicate rows

Most frequently occurring

(index)  수용가번호  지역             업종    사용량  사용료  납기마감일  데이터기준일자  # duplicates
41       1366001600  충청남도 부여군  가정용  0       2720    2023-09-30  2023-08-31      4
44       1366002500  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      4
83       2338003000  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      4
89       2338007800  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      4
60       2333001900  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      3
61       2333007300  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      3
96       2338010500  충청남도 부여군  가정용  0       10000   2023-09-30  2023-08-31      3
110      3333002500  충청남도 부여군  가정용  0       1360    2023-09-30  2023-08-31      3
112      3333004900  충청남도 부여군  가정용  0       3280    2023-09-30  2023-08-31      3
118      3335002600  충청남도 부여군  가정용  0       1640    2023-09-30  2023-08-31      3
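
A "# duplicates" column like the one above can be produced by counting identical rows. The sketch below does this with the standard library on a small hand-built list. The first two row patterns echo entries from the table, the third is a hypothetical unique row, and `duplicate_summary` is a name made up for this example. It returns both the number of duplicated groups and the number of surplus rows, since the report does not state which of these its "Duplicate rows" count refers to.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (groups, n_groups, n_extra) for fully identical rows.

    groups   : {row: count} for rows that occur more than once
    n_groups : number of distinct duplicated rows
    n_extra  : occurrences beyond the first copy of each duplicated row
    """
    counts = Counter(rows)
    groups = {row: n for row, n in counts.items() if n > 1}
    n_extra = sum(n - 1 for n in groups.values())
    return groups, len(groups), n_extra

row_a = (1366001600, "충청남도 부여군", "가정용", 0, 2720, "2023-09-30", "2023-08-31")
row_b = (2333001900, "충청남도 부여군", "가정용", 0, 3280, "2023-09-30", "2023-08-31")
row_c = (1011002200, "충청남도 부여군", "가정용", 5, 4580, "2023-09-30", "2023-08-31")  # hypothetical

rows = [row_a] * 4 + [row_b] * 3 + [row_c]
groups, n_groups, n_extra = duplicate_summary(rows)

# List the duplicated rows, most frequent first, as in the table above.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row[0], n)
```

Rows are compared as whole tuples, so two bills for the same customer number with different usage or fees are not counted as duplicates.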