Overview

Dataset statistics

Number of variables: 9
Number of observations: 6422
Missing cells: 12345
Missing cells (%): 21.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 483.0 KiB
Average record size in memory: 77.0 B
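
The missing-cell figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this report (the per-column missing counts come from the Alerts and Variables sections); the variable names are illustrative, and it assumes the percentage is taken over all rows times all columns.

```python
# Figures copied from the report; names are illustrative.
n_rows, n_cols = 6422, 9
missing_by_column = {
    "발전량(MWh)": 1451,
    "SOx": 4282,
    "NOx": 4357,
    "먼지(TSP)": 2255,
}

# Assumption: "Missing cells (%)" is missing cells over rows * columns.
total_cells = n_rows * n_cols
missing_cells = sum(missing_by_column.values())
missing_cells_pct = 100 * missing_cells / total_cells

# Per-column share of missing observations, rounded to one decimal.
missing_pct_by_column = {
    name: round(100 * count / n_rows, 1)
    for name, count in missing_by_column.items()
}

print(missing_cells, f"{missing_cells_pct:.1f}%")
print(missing_pct_by_column)
```

The four columns with missing values account for all 12345 missing cells, so the remaining five columns are complete.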

Variable types

Categorical: 3
DateTime: 1
Numeric: 5

Dataset

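Description: Monthly air pollutant emissions and power generation for each generating unit at Korea Western Power's plants. Data period: 2002-01 to 2022-07. Data contents: plant name (발전소), unit name (호기), power generation (MWh), sulfur oxides (SOx), nitrogen oxides (NOx), dust (TSP). Air pollutant quantities are in tonnes, and dust is reported as TSP only.
Author: 한국서부발전(주) (Korea Western Power Co., Ltd.)
URL: https://www.data.go.kr/data/15099592/fileData.do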

Alerts

발전소 is highly overall correlated with 발전용량(MW) and 3 other fields (High correlation)
호기 is highly overall correlated with 발전용량(MW) and 2 other fields (High correlation)
비고 is highly overall correlated with 발전용량(MW) and 2 other fields (High correlation)
발전용량(MW) is highly overall correlated with 발전량(MWh) and 3 other fields (High correlation)
발전량(MWh) is highly overall correlated with 발전용량(MW) and 3 other fields (High correlation)
SOx is highly overall correlated with 발전량(MWh) and 2 other fields (High correlation)
NOx is highly overall correlated with 발전량(MWh) and 2 other fields (High correlation)
먼지(TSP) is highly overall correlated with SOx and 1 other field (High correlation)
비고 is highly imbalanced (92.9%) (Imbalance)
발전량(MWh) has 1451 (22.6%) missing values (Missing)
SOx has 4282 (66.7%) missing values (Missing)
NOx has 4357 (67.8%) missing values (Missing)
먼지(TSP) has 2255 (35.1%) missing values (Missing)
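
The 92.9% imbalance reported for 비고 can be reproduced from its two value counts (6367 and 55). The sketch below assumes the score is one minus the Shannon entropy of the class shares divided by the maximum possible entropy. That definition is an assumption that happens to match the reported figure, and `imbalance_score` is an illustrative helper, not a library function.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class leaves nothing to compare
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Value counts of 비고 as listed in this report.
score = imbalance_score([6367, 55])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0, and the score approaches 1 as a single value dominates.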

Reproduction

Analysis started: 2023-12-12 19:21:47.356799
Analysis finished: 2023-12-12 19:21:51.525173
Duration: 4.17 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
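
A report of this kind can be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used: the `build_report` helper, its arguments, the report title, and the assumption that the source file is a CSV with a 날짜 column that pandas can parse as dates are all illustrative. `ProfileReport`, its `dataset` metadata argument, and `to_file` come from ydata-profiling.

```python
# Dataset metadata as shown in the Dataset section of this report.
DATASET_METADATA = {
    "description": "Monthly air pollutant emissions and power generation by unit",
    "author": "한국서부발전(주)",
    "url": "https://www.data.go.kr/data/15099592/fileData.do",
}

REPORT_TITLE = "Profiling report"  # hypothetical title


def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile the CSV at csv_path and write an HTML report to output_path."""
    # Imported lazily so the module loads even without the dependencies installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["날짜"])
    profile = ProfileReport(df, title=REPORT_TITLE, dataset=DATASET_METADATA)
    profile.to_file(output_path)
```

Calling `build_report("emissions.csv")` with a hypothetical local copy of the file would write `report.html` to the working directory.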

Variables

발전소
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 50.3 KiB

Value    Count
태안     2717
서인천   1976
평택     1482
군산     247

Length

Max length: 3
Median length: 2
Mean length: 2.3076923
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 태안
2nd row: 태안
3rd row: 태안
4th row: 태안
5th row: 태안

Common Values

Value    Count   Frequency (%)
태안     2717    42.3%
서인천   1976    30.8%
평택     1482    23.1%
군산     247     3.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count   Frequency (%)
태안     2717    42.3%
서인천   1976    30.8%
평택     1482    23.1%
군산     247     3.8%

호기
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 50.3 KiB

Value               Count
1호기               247
2호기               247
3호기               247
4호기               247
5호기               247
Other values (21)   5187

Length

Max length: 8
Median length: 7
Mean length: 5.4615385
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호기
2nd row: 1호기
3rd row: 1호기
4th row: 1호기
5th row: 1호기

Common Values

Value               Count   Frequency (%)
1호기               247     3.8%
2호기               247     3.8%
3호기               247     3.8%
4호기               247     3.8%
5호기               247     3.8%
6호기               247     3.8%
7호기               247     3.8%
8호기               247     3.8%
9호기               247     3.8%
10호기              247     3.8%
Other values (16)   3952    61.5%

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
복합                2717    26.8%
기력                988     9.8%
2호기               494     4.9%
2cc                 494     4.9%
1cc                 494     4.9%
1호기               494     4.9%
4호기               494     4.9%
3호기               494     4.9%
7호기               247     2.4%
8호기               247     2.4%
Other values (12)   2964    29.3%

날짜
Date

Distinct: 247
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 50.3 KiB
Minimum: 2002-01-01 00:00:00
Maximum: 2022-07-01 00:00:00
Histogram with fixed size bins (bins=50)
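
The distinct count of 247 is consistent with one value per calendar month between the reported minimum and maximum. The sketch below checks that, and also checks that 26 distinct 호기 labels times 247 months gives the 6422 observations in the overview. `months_inclusive` is an illustrative helper.

```python
from datetime import date


def months_inclusive(start, end):
    """Count calendar months from start to end, including both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


n_months = months_inclusive(date(2002, 1, 1), date(2022, 7, 1))
n_unit_labels = 26  # distinct 호기 values reported in this document

print(n_months, n_unit_labels * n_months)
```

This matches the 호기 value counts, where every label appears exactly 247 times.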

발전용량(MW)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 450.50885
Minimum: 225
Maximum: 1050
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 56.6 KiB

Quantile statistics

Minimum: 225
5-th percentile: 225
Q1: 225
Median: 415
Q3: 500
95-th percentile: 1050
Maximum: 1050
Range: 825
Interquartile range (IQR): 275

Descriptive statistics

Standard deviation: 235.66978
Coefficient of variation (CV): 0.5231191
Kurtosis: 0.93370119
Mean: 450.50885
Median Absolute Deviation (MAD): 85
Skewness: 1.2559296
Sum: 2893167.8
Variance: 55540.246
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Common Values

Value     Count   Frequency (%)
500.0     1976    30.8%
225.0     1976    30.8%
350.0     988     15.4%
1050.0    494     7.7%
346.33    247     3.8%
480.0     247     3.8%
868.5     247     3.8%
718.4     247     3.8%

Minimum values

Value     Count   Frequency (%)
225.0     1976    30.8%
346.33    247     3.8%
350.0     988     15.4%
480.0     247     3.8%
500.0     1976    30.8%
718.4     247     3.8%
868.5     247     3.8%
1050.0    494     7.7%

Maximum values

Value     Count   Frequency (%)
1050.0    494     7.7%
868.5     247     3.8%
718.4     247     3.8%
500.0     1976    30.8%
480.0     247     3.8%
350.0     988     15.4%
346.33    247     3.8%
225.0     1976    30.8%
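
Because 발전용량(MW) takes only eight distinct values, its sum, mean, standard deviation, and coefficient of variation can be recomputed from the value counts alone. The sketch below does so, assuming the report uses the sample standard deviation (divisor n - 1, the pandas default). That choice is an assumption, and the variable names are illustrative.

```python
import math

# Value counts of 발전용량(MW) as listed in this report.
value_counts = {
    500.0: 1976, 225.0: 1976, 350.0: 988, 1050.0: 494,
    346.33: 247, 480.0: 247, 868.5: 247, 718.4: 247,
}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Assumption: sample variance with divisor n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation = standard deviation / mean

print(n, round(total, 1), round(mean, 5), round(cv, 4))
```

The coefficient of variation is dimensionless, which makes it comparable across columns measured in different units such as MW and tonnes.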

발전량(MWh)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 4919
Distinct (%): 99.0%
Missing: 1451
Missing (%): 22.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 187533.35
Minimum: 0
Maximum: 749933
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 56.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5786
Q1: 64856.5
Median: 134443
Q3: 340267.5
95-th percentile: 388889.5
Maximum: 749933
Range: 749933
Interquartile range (IQR): 275411

Descriptive statistics

Standard deviation: 149243.57
Coefficient of variation (CV): 0.79582415
Kurtosis: -0.38194161
Mean: 187533.35
Median Absolute Deviation (MAD): 106729
Skewness: 0.65176916
Sum: 9.3222829 × 10^8
Variance: 2.2273644 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
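
The Distinct (%) values for columns with missing data are consistent with dividing the distinct count by the number of non-missing observations rather than by all 6422 rows. The sketch below checks that reading against the four numeric columns that have missing values. The interpretation is inferred from the arithmetic, not stated in the report, and the names are illustrative.

```python
n_rows = 6422

# (distinct, missing) pairs as reported for each column.
columns = {
    "발전량(MWh)": (4919, 1451),
    "SOx": (1197, 4282),
    "NOx": (1309, 4357),
    "먼지(TSP)": (588, 2255),
}

# Assumption: distinct share is computed over non-missing rows only.
distinct_pct = {
    name: round(100 * distinct / (n_rows - missing), 1)
    for name, (distinct, missing) in columns.items()
}

print(distinct_pct)
```

Dividing by all rows instead would give about 76.6% for 발전량(MWh), which does not match the reported 99.0%.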

Common Values

Value                 Count   Frequency (%)
103453                2       < 0.1%
75762                 2       < 0.1%
27562                 2       < 0.1%
338846                2       < 0.1%
76900                 2       < 0.1%
353544                2       < 0.1%
140432                2       < 0.1%
7235                  2       < 0.1%
31921                 2       < 0.1%
380320                2       < 0.1%
Other values (4909)   4951    77.1%
(Missing)             1451    22.6%

Minimum values

Value     Count   Frequency (%)
0         1       < 0.1%
3         2       < 0.1%
4         1       < 0.1%
33        1       < 0.1%
97        1       < 0.1%
116       1       < 0.1%
120       1       < 0.1%
134       1       < 0.1%
180       1       < 0.1%
189       1       < 0.1%

Maximum values

Value     Count   Frequency (%)
749933    1       < 0.1%
747394    1       < 0.1%
743351    1       < 0.1%
729921    1       < 0.1%
725428    1       < 0.1%
722789    1       < 0.1%
699180    1       < 0.1%
694927    1       < 0.1%
690001    1       < 0.1%
684722    1       < 0.1%

SOx
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 1197
Distinct (%): 55.9%
Missing: 4282
Missing (%): 66.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.110607
Minimum: 0
Maximum: 306.2
Zeros: 6
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 56.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.4
Q1: 26
Median: 60
Q3: 113.45
95-th percentile: 178.905
Maximum: 306.2
Range: 306.2
Interquartile range (IQR): 87.45

Descriptive statistics

Standard deviation: 57.053789
Coefficient of variation (CV): 0.78037636
Kurtosis: -0.14614913
Mean: 73.110607
Median Absolute Deviation (MAD): 40.45
Skewness: 0.74112593
Sum: 156456.7
Variance: 3255.1349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count   Frequency (%)
1.1                   10      0.2%
4.7                   8       0.1%
0.3                   8       0.1%
31.0                  8       0.1%
60.0                  8       0.1%
3.0                   7       0.1%
20.0                  7       0.1%
44.0                  7       0.1%
40.0                  7       0.1%
34.0                  6       0.1%
Other values (1187)   2064    32.1%
(Missing)             4282    66.7%

Minimum values

Value   Count   Frequency (%)
0.0     6       0.1%
0.1     6       0.1%
0.2     3       < 0.1%
0.3     8       0.1%
0.4     3       < 0.1%
0.5     5       0.1%
0.6     4       0.1%
0.7     3       < 0.1%
0.8     4       0.1%
0.9     2       < 0.1%

Maximum values

Value   Count   Frequency (%)
306.2   1       < 0.1%
297.1   1       < 0.1%
294.5   1       < 0.1%
264.2   1       < 0.1%
252.5   1       < 0.1%
249.6   1       < 0.1%
247.0   1       < 0.1%
246.8   1       < 0.1%
243.1   1       < 0.1%
241.3   1       < 0.1%

NOx
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 1309
Distinct (%): 63.4%
Missing: 4357
Missing (%): 67.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.8431
Minimum: 0
Maximum: 346
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 56.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5.4
Q1: 33.8
Median: 75
Q3: 194.9
95-th percentile: 258.18
Maximum: 346
Range: 346
Interquartile range (IQR): 161.1

Descriptive statistics

Standard deviation: 88.734012
Coefficient of variation (CV): 0.80053709
Kurtosis: -1.256433
Mean: 110.8431
Median Absolute Deviation (MAD): 60.2
Skewness: 0.46005842
Sum: 228891
Variance: 7873.7249
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count   Frequency (%)
31.0                  9       0.1%
46.0                  7       0.1%
42.0                  7       0.1%
50.0                  6       0.1%
28.6                  6       0.1%
33.6                  6       0.1%
35.7                  6       0.1%
47.7                  6       0.1%
47.0                  6       0.1%
38.7                  6       0.1%
Other values (1299)   2000    31.1%
(Missing)             4357    67.8%

Minimum values

Value   Count   Frequency (%)
0.0     1       < 0.1%
0.1     4       0.1%
0.2     2       < 0.1%
0.3     3       < 0.1%
0.5     1       < 0.1%
0.6     3       < 0.1%
0.7     2       < 0.1%
0.8     2       < 0.1%
0.9     2       < 0.1%
1.0     3       < 0.1%

Maximum values

Value   Count   Frequency (%)
346.0   1       < 0.1%
324.9   1       < 0.1%
318.1   1       < 0.1%
317.7   1       < 0.1%
316.7   1       < 0.1%
310.0   1       < 0.1%
308.1   1       < 0.1%
306.9   1       < 0.1%
306.7   1       < 0.1%
306.4   1       < 0.1%

먼지(TSP)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 588
Distinct (%): 14.1%
Missing: 2255
Missing (%): 35.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.30144
Minimum: 0
Maximum: 214.8
Zeros: 29
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 56.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2
Q1: 2.1
Median: 8
Q3: 13.85
95-th percentile: 50.34
Maximum: 214.8
Range: 214.8
Interquartile range (IQR): 11.75

Descriptive statistics

Standard deviation: 21.512152
Coefficient of variation (CV): 1.61728
Kurtosis: 23.728474
Mean: 13.30144
Median Absolute Deviation (MAD): 5.9
Skewness: 4.2277072
Sum: 55427.1
Variance: 462.7727
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count   Frequency (%)
0.3                  138     2.1%
0.1                  135     2.1%
0.2                  129     2.0%
0.4                  84      1.3%
0.5                  64      1.0%
1.0                  46      0.7%
3.0                  45      0.7%
0.7                  43      0.7%
0.6                  37      0.6%
2.0                  37      0.6%
Other values (578)   3409    53.1%
(Missing)            2255    35.1%

Minimum values

Value   Count   Frequency (%)
0.0     29      0.5%
0.1     135     2.1%
0.2     129     2.0%
0.3     138     2.1%
0.4     84      1.3%
0.5     64      1.0%
0.6     37      0.6%
0.7     43      0.7%
0.8     26      0.4%
0.9     28      0.4%

Maximum values

Value   Count   Frequency (%)
214.8   1       < 0.1%
205.0   1       < 0.1%
204.2   1       < 0.1%
203.5   1       < 0.1%
200.0   1       < 0.1%
199.5   1       < 0.1%
192.7   1       < 0.1%
191.5   1       < 0.1%
186.0   1       < 0.1%
175.3   1       < 0.1%

비고
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 50.3 KiB

Value                Count
<NA>                 6367
2018-01-01 부 폐지   55

Length

Max length: 15
Median length: 4
Mean length: 4.0942074
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                Count   Frequency (%)
<NA>                 6367    99.1%
2018-01-01 부 폐지   55      0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value        Count   Frequency (%)
na           6367    97.5%
2018-01-01   55      0.8%
부           55      0.8%
폐지         55      0.8%

Interactions

[Figure: pairwise interaction plots of the numeric variables (25 panels)]

Correlations

               발전소   호기    발전용량(MW)   발전량(MWh)   SOx     NOx     먼지(TSP)
발전소         1.000    1.000   0.847          0.724         0.528   0.621   0.526
호기           1.000    1.000   1.000          0.794         0.514   0.528   0.570
발전용량(MW)   0.847    1.000   1.000          0.886         0.524   0.591   0.636
발전량(MWh)    0.724    0.794   0.886          1.000         0.766   0.762   0.561
SOx            0.528    0.514   0.524          0.766         1.000   0.809   0.565
NOx            0.621    0.528   0.591          0.762         0.809   1.000   0.140
먼지(TSP)      0.526    0.570   0.636          0.561         0.565   0.140   1.000

         발전소   호기    비고
발전소   1.000    0.998   1.000
호기     0.998    1.000   1.000
비고     1.000    1.000   1.000

               발전용량(MW)   발전량(MWh)   SOx     NOx     먼지(TSP)   발전소   호기    비고
발전용량(MW)   1.000          0.699         0.452   0.479   -0.252      0.924    0.998   1.000
발전량(MWh)    0.699          1.000         0.724   0.766   0.211       0.529    0.435   0.000
SOx            0.452          0.724         1.000   0.903   0.777       0.406    0.219   0.000
NOx            0.479          0.766         0.903   1.000   0.814       0.482    0.226   0.000
먼지(TSP)      -0.252         0.211         0.777   0.814   1.000       0.342    0.245   0.000
발전소         0.924          0.529         0.406   0.482   0.342       1.000    0.998   1.000
호기           0.998          0.435         0.219   0.226   0.245       0.998    1.000   1.000
비고           1.000          0.000         0.000   0.000   0.000       1.000    1.000   1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
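
Nullity correlation can be illustrated on the last ten sample rows of this report (indices 6412 to 6421). The sketch below assumes it is the Pearson correlation between 0/1 missingness indicators. `nullity_correlation` is an illustrative helper rather than the report's implementation, and the indicator lists are transcribed by hand from those rows.

```python
import math


def nullity_correlation(a, b):
    """Pearson correlation of two 0/1 missingness indicators, or None if undefined."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always (or never) missing has no variance
    return cov / math.sqrt(var_a * var_b)


# 1 = missing, 0 = present, for sample rows 6412..6421.
generation_missing = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # 발전량(MWh)
tsp_missing = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]         # 먼지(TSP)
sox_missing = [1] * 10                               # SOx is missing in all ten rows

print(nullity_correlation(generation_missing, tsp_missing))
print(nullity_correlation(generation_missing, sox_missing))
```

In these ten rows 발전량(MWh) and 먼지(TSP) are missing together, so their indicators are perfectly correlated, while SOx is missing throughout and its correlation is undefined. Ten rows are only an illustration and say nothing about the full dataset.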

Sample

First rows

Index   발전소   호기      날짜      발전용량(MW)   발전량(MWh)   SOx    NOx    먼지(TSP)   비고
0       태안     1호기     2002-01   500.0          360555        <NA>   <NA>   <NA>        <NA>
1       태안     1호기     2002-02   500.0          106134        <NA>   <NA>   <NA>        <NA>
2       태안     1호기     2002-03   500.0          73675         <NA>   <NA>   <NA>        <NA>
3       태안     1호기     2002-04   500.0          339544        <NA>   <NA>   <NA>        <NA>
4       태안     1호기     2002-05   500.0          337666        <NA>   <NA>   <NA>        <NA>
5       태안     1호기     2002-06   500.0          341539        <NA>   <NA>   <NA>        <NA>
6       태안     1호기     2002-07   500.0          331669        <NA>   <NA>   <NA>        <NA>
7       태안     1호기     2002-08   500.0          313827        <NA>   <NA>   <NA>        <NA>
8       태안     1호기     2002-09   500.0          223793        <NA>   <NA>   <NA>        <NA>
9       태안     1호기     2002-10   500.0          354079        <NA>   <NA>   <NA>        <NA>

Last rows

Index   발전소   호기      날짜      발전용량(MW)   발전량(MWh)   SOx    NOx    먼지(TSP)   비고
6412    군산     복합 CC   2021-10   718.4          157349        <NA>   <NA>   11.8        <NA>
6413    군산     복합 CC   2021-11   718.4          147689        <NA>   <NA>   11.1        <NA>
6414    군산     복합 CC   2021-12   718.4          128665        <NA>   <NA>   9.6         <NA>
6415    군산     복합 CC   2022-01   718.4          102682        <NA>   <NA>   8.1         <NA>
6416    군산     복합 CC   2022-02   718.4          59888         <NA>   <NA>   5.0         <NA>
6417    군산     복합 CC   2022-03   718.4          <NA>          <NA>   <NA>   <NA>        <NA>
6418    군산     복합 CC   2022-04   718.4          <NA>          <NA>   <NA>   <NA>        <NA>
6419    군산     복합 CC   2022-05   718.4          <NA>          <NA>   <NA>   <NA>        <NA>
6420    군산     복합 CC   2022-06   718.4          <NA>          <NA>   <NA>   <NA>        <NA>
6421    군산     복합 CC   2022-07   718.4          61954         <NA>   <NA>   5.7         <NA>