Overview

Dataset statistics

Number of variables: 5
Number of observations: 549
Missing cells: 42
Missing cells (%): 1.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.2 KiB
Average record size in memory: 43.2 B
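
The percentages in this report follow from the raw counts. As a quick cross-check, the sketch below recomputes the share of missing cells (missing cells divided by rows times columns) and the per-column missing share quoted in the Missing alerts. It uses only figures from this report; the variable names are chosen here for illustration.

```python
# Figures copied from the dataset statistics and the Missing alerts.
n_variables = 5
n_observations = 549
missing_cells = 42
missing_per_column = 21  # reported for both 발전원개수 and 발전용량(kW)

# Every cell of the table counts toward the denominator.
total_cells = n_variables * n_observations
missing_cells_pct = round(100 * missing_cells / total_cells, 1)

# Per-column share uses the number of rows as the denominator.
missing_column_pct = round(100 * missing_per_column / n_observations, 1)

print(total_cells, missing_cells_pct, missing_column_pct)
```

The two columns with 21 missing values each account for all 42 missing cells.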

Variable types

Numeric: 3
Categorical: 2

Alerts

High correlation: 년도 is highly overall correlated with 발전원개수 and 1 other field
High correlation: 발전원개수 is highly overall correlated with 년도 and 2 other fields
High correlation: 발전용량(kW) is highly overall correlated with 년도 and 2 other fields
High correlation: 발전원구분명 is highly overall correlated with 발전원개수 and 1 other field
Imbalance: 발전원구분명 is highly imbalanced (71.3%)
Missing: 발전원개수 has 21 (3.8%) missing values
Missing: 발전용량(kW) has 21 (3.8%) missing values

Reproduction

Analysis started: 2024-03-12 23:00:43.143182
Analysis finished: 2024-03-12 23:00:44.542351
Duration: 1.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

년도 (year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.2587
Minimum: 2007
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 2007
5-th percentile: 2008
Q1: 2013
Median: 2017
Q3: 2020
95-th percentile: 2023
Maximum: 2023
Range: 16
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.6161553
Coefficient of variation (CV): 0.0022894658
Kurtosis: -1.0614295
Mean: 2016.2587
Median Absolute Deviation (MAD): 4
Skewness: -0.27751225
Sum: 1106926
Variance: 21.308889
Monotonicity: Decreasing
Histogram with fixed size bins (bins=17)
Value  Count  Frequency (%)
2021  43  7.8%
2023  42  7.7%
2020  42  7.7%
2019  42  7.7%
2022  41  7.5%
2018  38  6.9%
2017  37  6.7%
2016  33  6.0%
2015  33  6.0%
2014  31  5.6%
Other values (7)  167  30.4%

Value  Count  Frequency (%)
2007  11  2.0%
2008  21  3.8%
2009  24  4.4%
2010  24  4.4%
2011  28  5.1%
2012  29  5.3%
2013  30  5.5%
2014  31  5.6%
2015  33  6.0%
2016  33  6.0%

Value  Count  Frequency (%)
2023  42  7.7%
2022  41  7.5%
2021  43  7.8%
2020  42  7.7%
2019  42  7.7%
2018  38  6.9%
2017  37  6.7%
2016  33  6.0%
2015  33  6.0%
2014  31  5.6%
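
The lowest-value and highest-value tables above together list all 17 distinct years, so the summary statistics for 년도 can be re-derived from the counts alone. The following is a minimal sketch using Python's standard library. The dictionary transcribes the counts shown above, and the sample variance (dividing by n - 1) is assumed to be the convention the report uses.

```python
import statistics

# Year -> count, transcribed from the value tables above.
year_counts = {
    2007: 11, 2008: 21, 2009: 24, 2010: 24, 2011: 28, 2012: 29,
    2013: 30, 2014: 31, 2015: 33, 2016: 33, 2017: 37, 2018: 38,
    2019: 42, 2020: 42, 2021: 43, 2022: 41, 2023: 42,
}

# Expand the frequency table back into one value per observation.
years = [year for year, count in year_counts.items() for _ in range(count)]

n = len(years)
total = sum(years)
mean = total / n
median = statistics.median(years)
variance = statistics.variance(years)  # sample variance (divides by n - 1)
std = variance ** 0.5
cv = std / mean  # coefficient of variation

print(n, total, round(mean, 4), median, round(variance, 4), round(cv, 6))
```

The recomputed count, sum, mean, median and variance agree with the figures reported for this variable.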

시군명 (city/county name)
Categorical

Distinct: 37
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 KiB

태양광  31
안산시  27
안성시  26
이천시  22
용인시  22
Other values (32)  421

Length

Max length: 6
Median length: 3
Mean length: 3.0837887
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 바이오에너지
2nd row: 바이오에너지
3rd row: 바이오에너지
4th row: 소수력
5th row: 소수력

Common Values

Value  Count  Frequency (%)
태양광  31  5.6%
안산시  27  4.9%
안성시  26  4.7%
이천시  22  4.0%
용인시  22  4.0%
포천시  21  3.8%
연천군  20  3.6%
구리시  20  3.6%
김포시  17  3.1%
평택시  16  2.9%
Other values (27)  327  59.6%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
태양광  31  5.6%
안산시  27  4.9%
안성시  26  4.7%
이천시  22  4.0%
용인시  22  4.0%
포천시  21  3.8%
연천군  20  3.6%
구리시  20  3.6%
김포시  17  3.1%
수원시  16  2.9%
Other values (27)  327  59.6%

발전원구분명 (power source type)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 37
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 KiB

태양광  445
소수력  24
바이오  14
연료전지  12
풍력  8
Other values (32)  46

Length

Max length: 4
Median length: 3
Mean length: 3.0145719
Min length: 2

Unique

Unique: 22
Unique (%): 4.0%

Sample

1st row: 이천시
2nd row: 연천군
3rd row: 의정부시
4th row: 안성시
5th row: 용인시

Common Values

Value  Count  Frequency (%)
태양광  445  81.1%
소수력  24  4.4%
바이오  14  2.6%
연료전지  12  2.2%
풍력  8  1.5%
폐기물  4  0.7%
용인시  3  0.5%
안산시  3  0.5%
안성시  2  0.4%
화성시  2  0.4%
Other values (27)  32  5.8%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
태양광  445  81.1%
소수력  24  4.4%
바이오  14  2.6%
연료전지  12  2.2%
풍력  8  1.5%
폐기물  4  0.7%
용인시  3  0.5%
안산시  3  0.5%
구리시  2  0.4%
연천군  2  0.4%
Other values (27)  32  5.8%
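
The 71.3% imbalance flagged for this variable can be reproduced from the counts above. The sketch below assumes that the imbalance score is one minus the Shannon entropy of the category counts, normalized by the maximum possible entropy for 37 classes. That reading of the alert is an assumption, not something the report states. The counts outside the listed top values are also inferred: with 37 distinct values, 22 of them unique, and 549 rows in total, the remaining 7 categories must each occur twice.

```python
import math

# Counts listed in the value table above (categories occurring 3+ times).
listed = [445, 24, 14, 12, 8, 4, 3, 3]
# Inferred remainder (an assumption): 7 categories occurring twice, 22 once.
counts = listed + [2] * 7 + [1] * 22

n_rows = sum(counts)
n_classes = len(counts)

# Shannon entropy of the empirical category distribution, in bits.
entropy = -sum((c / n_rows) * math.log2(c / n_rows) for c in counts)

# 0 means perfectly balanced classes, 1 means a single dominant class.
imbalance = 1 - entropy / math.log2(n_classes)

print(n_rows, n_classes, round(imbalance, 3))
```

Under these assumptions the score comes out at roughly 0.713, in line with the alert.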

발전원개수 (number of power sources)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 181
Distinct (%): 34.3%
Missing: 21
Missing (%): 3.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.10985
Minimum: 1
Maximum: 1828
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 14
Q3: 74.75
95-th percentile: 587.15
Maximum: 1828
Range: 1827
Interquartile range (IQR): 72.75

Descriptive statistics

Standard deviation: 217.10062
Coefficient of variation (CV): 2.168624
Kurtosis: 17.378733
Mean: 100.10985
Median Absolute Deviation (MAD): 13
Skewness: 3.7220587
Sum: 52858
Variance: 47132.679
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
1  96  17.5%
2  37  6.7%
5  19  3.5%
4  19  3.5%
6  13  2.4%
9  13  2.4%
7  13  2.4%
3  11  2.0%
14  11  2.0%
11  10  1.8%
Other values (171)  286  52.1%
(Missing)  21  3.8%

Value  Count  Frequency (%)
1  96  17.5%
2  37  6.7%
3  11  2.0%
4  19  3.5%
5  19  3.5%
6  13  2.4%
7  13  2.4%
8  10  1.8%
9  13  2.4%
10  7  1.3%

Value  Count  Frequency (%)
1828  1  0.2%
1571  1  0.2%
1301  1  0.2%
1288  1  0.2%
1024  1  0.2%
1022  1  0.2%
999  1  0.2%
887  1  0.2%
858  1  0.2%
855  1  0.2%

발전용량(kW) (generation capacity, kW)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 413
Distinct (%): 78.2%
Missing: 21
Missing (%): 3.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 9631.4831
Minimum: 3
Maximum: 171284.24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 23.17
Q1: 243.62
Median: 992.8
Q3: 5812.525
95-th percentile: 60548.725
Maximum: 171284.24
Range: 171281.24
Interquartile range (IQR): 5568.905

Descriptive statistics

Standard deviation: 22535.873
Coefficient of variation (CV): 2.3398133
Kurtosis: 14.594658
Mean: 9631.4831
Median Absolute Deviation (MAD): 947.8
Skewness: 3.6158647
Sum: 5085423.1
Variance: 5.0786556 × 10^8
Monotonicity: Not monotonic
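
The descriptive statistics for the two measurement columns, 발전원개수 and 발전용량(kW), are internally consistent, which can be checked from the reported figures alone. The sketch below assumes that the mean is taken over non-missing values only (549 - 21 = 528 rows) and that the coefficient of variation is the standard deviation divided by the mean. The helper name `is_consistent` is illustrative and not part of any library.

```python
import math

N_OBSERVATIONS = 549
N_MISSING = 21  # reported for both columns

# Figures copied from the descriptive statistics of each column.
reported = {
    "발전원개수": {"sum": 52858, "mean": 100.10985, "std": 217.10062,
                  "cv": 2.168624, "variance": 47132.679},
    "발전용량(kW)": {"sum": 5085423.1, "mean": 9631.4831, "std": 22535.873,
                   "cv": 2.3398133, "variance": 5.0786556e8},
}

def is_consistent(stats, n_valid, rel_tol=1e-6):
    """Return True if mean, CV and variance agree with sum, count and std."""
    mean_ok = math.isclose(stats["sum"] / n_valid, stats["mean"], rel_tol=rel_tol)
    cv_ok = math.isclose(stats["std"] / stats["mean"], stats["cv"], rel_tol=rel_tol)
    var_ok = math.isclose(stats["std"] ** 2, stats["variance"], rel_tol=rel_tol)
    return mean_ok and cv_ok and var_ok

# Only rows with a value contribute to the mean (assumption stated above).
n_valid = N_OBSERVATIONS - N_MISSING
results = {name: is_consistent(stats, n_valid) for name, stats in reported.items()}
print(n_valid, results)
```

A coefficient of variation above 2 for both columns, together with skewness above 3.6, points to a strongly right-skewed distribution dominated by a few very large values.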
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
45.0  11  2.0%
20.0  10  1.8%
30.0  7  1.3%
992.8  6  1.1%
360.0  5  0.9%
450.0  5  0.9%
440.0  5  0.9%
250.0  5  0.9%
29.4  5  0.9%
98.28  5  0.9%
Other values (403)  464  84.5%
(Missing)  21  3.8%

Value  Count  Frequency (%)
3.0  2  0.4%
5.58  1  0.2%
9.0  3  0.5%
10.0  3  0.5%
12.0  2  0.4%
18.62  3  0.5%
20.0  10  1.8%
21.0  3  0.5%
27.2  1  0.2%
28.0  2  0.4%

Value  Count  Frequency (%)
171284.24  1  0.2%
144666.08  1  0.2%
131886.97  1  0.2%
118976.29  1  0.2%
110043.36  1  0.2%
108784.23  1  0.2%
107193.86  1  0.2%
105598.43  1  0.2%
103042.06  1  0.2%
97435.8  1  0.2%

Interactions

[Figures: 9 interaction plots]

Correlations

Variable  년도  시군명  발전원구분명  발전원개수  발전용량(kW)
년도  1.000  0.000  0.000  0.378  0.482
시군명  0.000  1.000  0.751  0.263  0.000
발전원구분명  0.000  0.751  1.000  0.874  0.882
발전원개수  0.378  0.263  0.874  1.000  0.930
발전용량(kW)  0.482  0.000  0.882  0.930  1.000

Variable  시군명  발전원구분명
시군명  1.000  0.185
발전원구분명  0.185  1.000

Variable  년도  발전원개수  발전용량(kW)  시군명  발전원구분명
년도  1.000  0.547  0.654  0.025  0.000
발전원개수  0.547  1.000  0.942  0.096  0.539
발전용량(kW)  0.654  0.942  1.000  0.000  0.542
시군명  0.025  0.096  0.000  1.000  0.185
발전원구분명  0.000  0.539  0.542  0.185  1.000
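
The High correlation alerts in the overview line up with the last matrix above if a cutoff of 0.5 is applied to its off-diagonal entries. The sketch below makes that check explicit. The 0.5 threshold is an assumption (the report does not state it), and the matrix is transcribed from the values shown above.

```python
# Threshold above which a pair counts as highly correlated (assumed value).
THRESHOLD = 0.5

columns = ["년도", "발전원개수", "발전용량(kW)", "시군명", "발전원구분명"]
matrix = [
    [1.000, 0.547, 0.654, 0.025, 0.000],
    [0.547, 1.000, 0.942, 0.096, 0.539],
    [0.654, 0.942, 1.000, 0.000, 0.542],
    [0.025, 0.096, 0.000, 1.000, 0.185],
    [0.000, 0.539, 0.542, 0.185, 1.000],
]

# For each column, collect the other columns whose coefficient exceeds the cutoff.
high = {
    name: [other for j, other in enumerate(columns)
           if j != i and matrix[i][j] > THRESHOLD]
    for i, name in enumerate(columns)
}

for name, partners in high.items():
    print(name, partners)
```

Under this assumption 시군명 has no partner above the cutoff, which is consistent with the absence of a High correlation alert for that column.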

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that makes patterns in data completion easy to spot.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index  년도  시군명  발전원구분명  발전원개수  발전용량(kW)
0  2023  바이오에너지  이천시  <NA>  <NA>
1  2023  바이오에너지  연천군  <NA>  <NA>
2  2023  바이오에너지  의정부시  <NA>  <NA>
3  2023  소수력  안성시  <NA>  <NA>
4  2023  소수력  용인시  <NA>  <NA>
5  2023  소수력  구리시  <NA>  <NA>
6  2023  연료전지  안산시  <NA>  <NA>
7  2023  연료전지  포천시  <NA>  <NA>
8  2023  연료전지  화성시  <NA>  <NA>
9  2023  연료전지  용인시  <NA>  <NA>

Index  년도  시군명  발전원구분명  발전원개수발전용량(kW)
539  2007  광주시  태양광  110.0
540  2007  부천시  태양광  13.0
541  2007  수원시  태양광  112.0
542  2007  시흥시  태양광  15.58
543  2007  안성시  태양광  13.0
544  2007  여주시  태양광  2199.0
545  2007  용인시  태양광  199.0
546  2007  의왕시  태양광  150.0
547  2007  평택시  태양광  7160.0
548  2007  화성시  태양광  199.0