Overview

Dataset statistics

Number of variables: 7
Number of observations: 437
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 135
Duplicate rows (%): 30.9%
Total size in memory: 26.2 KiB
Average record size in memory: 61.3 B

Variable types

Numeric: 4
Categorical: 3

Alerts

시도명 has constant value "충청남도" (Constant)
Dataset has 135 (30.9%) duplicate rows (Duplicates)
가구수 is highly overall correlated with 시군구명 (High correlation)
평균전력사용량 is highly overall correlated with 평균전기요금 (High correlation)
평균전기요금 is highly overall correlated with 평균전력사용량 (High correlation)
시군구명 is highly overall correlated with 가구수 (High correlation)

Reproduction

Analysis started: 2024-03-13 11:46:02.599919
Analysis finished: 2024-03-13 11:46:04.945965
Duration: 2.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

가구수
Real number (ℝ)

HIGH CORRELATION 

Distinct: 298
Distinct (%): 68.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84723.604
Minimum: 3
Maximum: 375586
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 19674.6
Q1: 35541
Median: 60907
Q3: 110795
95-th percentile: 361791.6
Maximum: 375586
Range: 375583
Interquartile range (IQR): 75254

Descriptive statistics

Standard deviation: 86140.141
Coefficient of variation (CV): 1.0167195
Kurtosis: 4.6813406
Mean: 84723.604
Median Absolute Deviation (MAD): 25573
Skewness: 2.282445
Sum: 37024215
Variance: 7.4201238 × 10^9
Monotonicity: Not monotonic
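
Several of the figures above are derived from others in the same list. The short sketch below recomputes the range, IQR, coefficient of variation, and variance for 가구수 from the reported minimum, maximum, quartiles, mean, and standard deviation. The variable names are illustrative, and because the inputs are the rounded values printed in the report, the results agree only to the printed precision.

```python
# Values copied from the 가구수 statistics in this report.
minimum, maximum = 3, 375586
q1, q3 = 35541, 110795
mean, std = 84723.604, 86140.141

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = squared standard deviation

print(f"range={value_range}, IQR={iqr}, CV={cv:.7f}, variance={variance:.7e}")
```

A CV slightly above 1 means the standard deviation exceeds the mean, which is consistent with the strong right skew reported for this variable.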
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
3                   6      1.4%
21345               3      0.7%
22257               2      0.5%
39768               2      0.5%
63747               2      0.5%
51436               2      0.5%
194451              2      0.5%
116857              2      0.5%
111170              2      0.5%
35182               2      0.5%
Other values (288)  412    94.3%

Minimum 10 values

Value   Count  Frequency (%)
3       6      1.4%
19470   1      0.2%
19508   2      0.5%
19536   2      0.5%
19562   1      0.2%
19579   2      0.5%
19602   2      0.5%
19613   2      0.5%
19632   2      0.5%
19657   2      0.5%

Maximum 10 values

Value   Count  Frequency (%)
375586  1      0.2%
374952  1      0.2%
373490  1      0.2%
372843  1      0.2%
372705  1      0.2%
369085  1      0.2%
368954  1      0.2%
368054  1      0.2%
368023  1      0.2%
366907  2      0.5%

조회월
Real number (ℝ)

Distinct: 12
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.02746
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 8
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.3577483
Coefficient of variation (CV): 0.55707517
Kurtosis: -1.0178662
Mean: 6.02746
Median Absolute Deviation (MAD): 2
Skewness: 0.15940546
Sum: 2634
Variance: 11.274474
Monotonicity: Not monotonic
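
Because 조회월 takes only 12 distinct values, its summary statistics can be rebuilt from the per-month counts listed in the value tables of this section. The sketch below does that as a cross-check. It assumes the reported variance is the sample variance (n - 1 in the denominator), an assumption that happens to reproduce the printed figure; the variable names are illustrative.

```python
# Month counts for 조회월, copied from the value tables in this section.
counts = {1: 47, 2: 47, 3: 16, 4: 45, 5: 45, 6: 45,
          7: 45, 8: 45, 9: 14, 10: 29, 11: 29, 12: 30}

n = sum(counts.values())                               # number of observations
total = sum(month * c for month, c in counts.items())  # the "Sum" statistic
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (month - mean) ** 2 for month, c in counts.items()) / (n - 1)

print(f"n={n}, sum={total}, mean={mean:.5f}, variance={variance:.6f}")
```

The counts also show why the distribution is uneven: months 3 and 9 have far fewer rows than the others.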
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
1                 47     10.8%
2                 47     10.8%
4                 45     10.3%
5                 45     10.3%
6                 45     10.3%
7                 45     10.3%
8                 45     10.3%
12                30     6.9%
10                29     6.6%
11                29     6.6%
Other values (2)  30     6.9%

Minimum 10 values

Value  Count  Frequency (%)
1      47     10.8%
2      47     10.8%
3      16     3.7%
4      45     10.3%
5      45     10.3%
6      45     10.3%
7      45     10.3%
8      45     10.3%
9      14     3.2%
10     29     6.6%

Maximum 10 values

Value  Count  Frequency (%)
12     30     6.9%
11     29     6.6%
10     29     6.6%
9      14     3.2%
8      45     10.3%
7      45     10.3%
6      45     10.3%
5      45     10.3%
4      45     10.3%
3      16     3.7%

조회연도
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB
2022: 316
2023: 105
2021: 16

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021
2nd row: 2021
3rd row: 2021
4th row: 2021
5th row: 2021

Common Values

Value  Count  Frequency (%)
2022   316    72.3%
2023   105    24.0%
2021   16     3.7%
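
The Frequency (%) column is each category's count as a share of the 437 observations. As a small illustration, the sketch below recomputes the percentages for 조회연도 from the counts in the table; the rounding to one decimal place is assumed from the way the report prints them.

```python
# Counts for 조회연도, copied from the Common Values table.
counts = {"2022": 316, "2023": 105, "2021": 16}
total = sum(counts.values())  # equals the number of observations

# Share of rows per category, as a percentage rounded to one decimal place.
frequency_pct = {year: round(100 * n / total, 1) for year, n in counts.items()}

for year, pct in frequency_pct.items():
    print(f"{year}: {pct}%")
```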

Length

Histogram of lengths of the category


시군구명
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB
당진시: 29
보령시: 29
계룡시: 29
공주시: 29
논산시: 29
Other values (11): 292

Length

Max length: 7
Median length: 3
Mean length: 3.0549199
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 천안시
2nd row: 당진시
3rd row: 보령시
4th row: 계룡시
5th row: 공주시

Common Values

Value             Count  Frequency (%)
당진시            29     6.6%
보령시            29     6.6%
계룡시            29     6.6%
공주시            29     6.6%
논산시            29     6.6%
부여군            29     6.6%
태안군            29     6.6%
서산시            29     6.6%
예산군            29     6.6%
청양군            29     6.6%
Other values (6)  147    33.6%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
천안시            34     7.7%
당진시            29     6.5%
보령시            29     6.5%
계룡시            29     6.5%
공주시            29     6.5%
논산시            29     6.5%
부여군            29     6.5%
태안군            29     6.5%
서산시            29     6.5%
예산군            29     6.5%
Other values (6)  148    33.4%

시도명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB
충청남도: 437

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 충청남도
2nd row: 충청남도
3rd row: 충청남도
4th row: 충청남도
5th row: 충청남도

Common Values

Value     Count  Frequency (%)
충청남도  437    100.0%

Length

Histogram of lengths of the category


평균전력사용량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 298
Distinct (%): 68.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 198.5654
Minimum: 89.67
Maximum: 357.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 89.67
5-th percentile: 148.272
Q1: 171.79
Median: 194.93
Q3: 219.13
95-th percentile: 264.42
Maximum: 357.3
Range: 267.63
Interquartile range (IQR): 47.34

Descriptive statistics

Standard deviation: 36.928602
Coefficient of variation (CV): 0.18597702
Kurtosis: 1.2909157
Mean: 198.5654
Median Absolute Deviation (MAD): 23.18
Skewness: 0.65162594
Sum: 86773.08
Variance: 1363.7217
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
212.69              3      0.7%
222.19              2      0.5%
234.1               2      0.5%
166.44              2      0.5%
163.63              2      0.5%
154.0               2      0.5%
196.83              2      0.5%
251.35              2      0.5%
167.17              2      0.5%
226.53              2      0.5%
Other values (288)  416    95.2%

Minimum 10 values

Value   Count  Frequency (%)
89.67   1      0.2%
93.67   1      0.2%
109.33  2      0.5%
112.33  2      0.5%
138.9   1      0.2%
138.96  1      0.2%
140.51  1      0.2%
141.26  2      0.5%
141.48  1      0.2%
141.81  2      0.5%

Maximum 10 values

Value   Count  Frequency (%)
357.3   1      0.2%
315.53  1      0.2%
312.94  2      0.5%
311.33  2      0.5%
301.13  2      0.5%
296.26  1      0.2%
290.04  1      0.2%
282.52  2      0.5%
280.78  2      0.5%
280.19  1      0.2%

평균전기요금
Real number (ℝ)

HIGH CORRELATION 

Distinct: 297
Distinct (%): 68.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24691.121
Minimum: 9067
Maximum: 52257
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 9067
5-th percentile: 18988.2
Q1: 21393
Median: 23680
Q3: 27256
95-th percentile: 33289.8
Maximum: 52257
Range: 43190
Interquartile range (IQR): 5863

Descriptive statistics

Standard deviation: 4983.9309
Coefficient of variation (CV): 0.20185114
Kurtosis: 3.5209319
Mean: 24691.121
Median Absolute Deviation (MAD): 2929
Skewness: 1.0844679
Sum: 10790020
Variance: 24839567
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
27256               4      0.9%
21683               3      0.7%
22366               2      0.5%
30588               2      0.5%
22835               2      0.5%
25390               2      0.5%
28394               2      0.5%
19694               2      0.5%
25932               2      0.5%
25894               2      0.5%
Other values (287)  414    94.7%

Minimum 10 values

Value  Count  Frequency (%)
9067   1      0.2%
9347   1      0.2%
12407  2      0.5%
12996  2      0.5%
18185  2      0.5%
18304  2      0.5%
18312  1      0.2%
18440  1      0.2%
18549  2      0.5%
18566  2      0.5%

Maximum 10 values

Value  Count  Frequency (%)
52257  1      0.2%
45690  1      0.2%
43545  1      0.2%
42422  1      0.2%
41264  1      0.2%
40895  1      0.2%
38936  1      0.2%
38133  1      0.2%
37198  1      0.2%
36492  1      0.2%

Interactions

[16 interaction plots (Matplotlib SVG figures) are not reproduced in this text version.]

Correlations

                가구수  조회월  조회연도  시군구명  평균전력사용량  평균전기요금
가구수          1.000   0.000   0.000     0.982     0.457           0.245
조회월          0.000   1.000   0.547     0.000     0.664           0.585
조회연도        0.000   0.547   1.000     0.000     0.081           0.635
시군구명        0.982   0.000   0.000     1.000     0.759           0.682
평균전력사용량  0.457   0.664   0.081     0.759     1.000           0.854
평균전기요금    0.245   0.585   0.635     0.682     0.854           1.000

          시군구명  조회연도
시군구명  1.000     0.000
조회연도  0.000     1.000

                가구수  조회월  평균전력사용량  평균전기요금  조회연도  시군구명
가구수          1.000   0.045   0.373           0.194         0.000     0.926
조회월          0.045   1.000   -0.108          -0.236        0.388     0.000
평균전력사용량  0.373   -0.108  1.000           0.764         0.047     0.420
평균전기요금    0.194   -0.236  0.764           1.000         0.353     0.360
조회연도        0.000   0.388   0.047           0.353         1.000     0.000
시군구명        0.926   0.000   0.420           0.360         0.000     1.000
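
The report does not state which matrix or cut-off produced the High correlation alerts. As a rough sketch, assuming a hypothetical threshold of 0.5 applied to the absolute values of the last matrix in this section, the pairs that pass are the same two pairs listed under Alerts. The `THRESHOLD` value and the variable names are assumptions made for illustration, not settings read from the report's configuration.

```python
# Last correlation matrix in this section, in the column order it is printed.
columns = ["가구수", "조회월", "평균전력사용량", "평균전기요금", "조회연도", "시군구명"]
matrix = [
    [1.000, 0.045, 0.373, 0.194, 0.000, 0.926],
    [0.045, 1.000, -0.108, -0.236, 0.388, 0.000],
    [0.373, -0.108, 1.000, 0.764, 0.047, 0.420],
    [0.194, -0.236, 0.764, 1.000, 0.353, 0.360],
    [0.000, 0.388, 0.047, 0.353, 1.000, 0.000],
    [0.926, 0.000, 0.420, 0.360, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; the report itself does not state one

# Collect each unordered pair once (upper triangle only, diagonal skipped).
flagged = {
    frozenset((columns[i], columns[j]))
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
    if abs(matrix[i][j]) >= THRESHOLD
}

for pair in flagged:
    print(" <-> ".join(sorted(pair)))
```

Applying the same cut-off to the first matrix would flag more pairs than the Alerts section lists, so the choice of matrix matters as much as the threshold.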

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     가구수  조회월  조회연도  시군구명  시도명    평균전력사용량  평균전기요금
0    361507  12      2021      천안시    충청남도  220.78          23704
1    114755  12      2021      당진시    충청남도  201.5           23050
2    60630   12      2021      보령시    충청남도  203.28          23219
3    21341   12      2021      계룡시    충청남도  240.74          25968
4    64061   12      2021      공주시    충청남도  194.03          22197
5    34089   12      2021      금산군    충청남도  170.86          20728
6    72990   12      2021      논산시    충청남도  187.51          21352
7    38994   12      2021      부여군    충청남도  191.24          22631
8    39483   12      2021      태안군    충청남도  179.11          22009
9    189857  12      2021      아산시    충청남도  213.62          22924

Last rows

     가구수  조회월  조회연도  시군구명  시도명    평균전력사용량  평균전기요금
427  63788   8       2022      홍성군    충청남도  268.3           31014
428  194684  8       2022      아산시    충청남도  301.13          34591
429  61478   8       2022      보령시    충청남도  264.42          30561
430  35146   8       2022      서천군    충청남도  216.22          25372
431  19657   8       2022      청양군    충청남도  193.47          22359
432  39894   8       2022      태안군    충청남도  225.42          28916
433  65825   8       2022      공주시    충청남도  238.9           26972
434  22260   8       2022      계룡시    충청남도  311.33          33926
435  51446   8       2022      예산군    충청남도  231.8           26297
436  116882  8       2022      당진시    충청남도  260.26          29943

Duplicate rows

Most frequently occurring

   가구수  조회월  조회연도  시군구명       시도명    평균전력사용량  평균전기요금  # duplicates
0  3       1       2022      천안시 서북구  충청남도  112.33          12996         2
1  3       2       2022      천안시 서북구  충청남도  109.33          12407         2
2  19508   1       2022      청양군         충청남도  199.16          26079         2
3  19536   2       2022      청양군         충청남도  198.75          26483         2
4  19579   4       2022      청양군         충청남도  168.63          21059         2
5  19602   5       2022      청양군         충청남도  148.47          19449         2
6  19613   6       2022      청양군         충청남도  146.63          18993         2
7  19632   7       2022      청양군         충청남도  167.17          19694         2
8  19657   8       2022      청양군         충청남도  193.47          22359         2
9  19833   10      2022      청양군         충청남도  147.48          18969         2
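
A duplicate here means a row whose values match another row in every column. The sketch below shows one way such a count could be produced with the standard library. It uses a small illustrative subset: the first two duplicated rows from the table, each entered twice as the table reports, plus row 0 of the Sample section entered once. It is not the method the profiling tool uses internally, and the variable names are hypothetical.

```python
from collections import Counter

# Illustrative subset in the report's column order:
# 가구수, 조회월, 조회연도, 시군구명, 시도명, 평균전력사용량, 평균전기요금
rows = [
    (3, 1, "2022", "천안시 서북구", "충청남도", 112.33, 12996),
    (3, 1, "2022", "천안시 서북구", "충청남도", 112.33, 12996),
    (3, 2, "2022", "천안시 서북구", "충청남도", 109.33, 12407),
    (3, 2, "2022", "천안시 서북구", "충청남도", 109.33, 12407),
    (361507, 12, "2021", "천안시", "충청남도", 220.78, 23704),
]

# Count identical tuples, then keep only those seen more than once.
occurrences = Counter(rows)
duplicates = {row: n for row, n in occurrences.items() if n > 1}

for row, n in duplicates.items():
    print(n, row)
```

Both duplicated rows in this subset have 가구수 equal to 3, the same outlying minimum that appears six times in that variable's value table, so they may be worth checking before any deduplication.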