Overview

Dataset statistics

Number of variables: 6
Number of observations: 25
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 3
Duplicate rows (%): 12.0%
Total size in memory: 1.4 KiB
Average record size in memory: 57.3 B

Variable types

DateTime: 1
Numeric: 1
Categorical: 4

Dataset

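Description: Water management data from the Korea South-East Power (한국남동발전) environmental chemistry system. It includes figures such as raw water cost, filtered water cost, and drinking water cost for each analysis period.
Author: Korea South-East Power Co., Ltd. (한국남동발전㈜)
URL: https://www.data.go.kr/data/15093003/fileData.do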

Alerts

통화단위 (currency unit) has constant value "KRW" [Constant]
Dataset has 3 (12.0%) duplicate rows [Duplicates]
순수비용 (pure water cost) is highly overall correlated with 여과수비용 (filtered water cost) and 1 other field [High correlation]
음용수비용 (drinking water cost) is highly overall correlated with 원수비용 (raw water cost) and 2 other fields [High correlation]
여과수비용 is highly overall correlated with 음용수비용 and 1 other field [High correlation]
원수비용 is highly overall correlated with 음용수비용 [High correlation]
여과수비용 is highly imbalanced (64.0%) [Imbalance]
음용수비용 is highly imbalanced (75.8%) [Imbalance]
순수비용 is highly imbalanced (64.0%) [Imbalance]
원수비용 has 17 (68.0%) zeros [Zeros]
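
The imbalance percentages in the alerts are consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below is an assumption about how such a score can be computed, not a copy of the profiler's code; the function name imbalance_score and the list variables are hypothetical, and the counts are taken from the Common Values tables in this report.

```python
import math
from collections import Counter


def imbalance_score(values):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the
    category frequencies and k is the number of distinct categories.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no balance to measure
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(k)


# Category counts as listed in this report (values are strings because
# the profiler treated these columns as categorical).
filtered_water_cost = ["0"] * 22 + ["1165887", "2226968", "277"]
drinking_water_cost = ["0"] * 24 + ["268"]

print(f"{imbalance_score(filtered_water_cost):.1%}")  # 64.0%
print(f"{imbalance_score(drinking_water_cost):.1%}")  # 75.8%
```

Both results match the alert values, which suggests, but does not prove, that the report uses this definition.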

Reproduction

Analysis started: 2023-12-12 18:19:02.801680
Analysis finished: 2023-12-12 18:19:03.368244
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

분석일자 (analysis date)
DateTime

Distinct: 4
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B
Minimum: 2020-01-11 00:00:00
Maximum: 2020-07-10 00:00:00

Histogram with fixed size bins (bins=4)

원수비용 (raw water cost)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 9
Distinct (%): 36.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8058855.4
Minimum: 0
Maximum: 65554521
Zeros: 17
Zeros (%): 68.0%
Negative: 0
Negative (%): 0.0%
Memory size: 357.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 6825654
95-th percentile: 43275860
Maximum: 65554521
Range: 65554521
Interquartile range (IQR): 6825654
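
The quantiles above can be reproduced from the 25 values listed in this report with linear interpolation between order statistics. This is a minimal sketch under that assumption; the helper name quantile and the variable names are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest
    order statistics, at position q * (n - 1) in the sorted data."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# 17 zeros plus the eight non-zero values from the value tables.
raw_water_cost = sorted([0] * 17 + [
    42843300, 13721290, 4075826, 43384000,
    12135580, 6825654, 65554521, 12931213,
])

q1 = quantile(raw_water_cost, 0.25)
q3 = quantile(raw_water_cost, 0.75)
p95 = quantile(raw_water_cost, 0.95)

print(q1, q3, round(p95), q3 - q1)
```

With 25 observations the 95th percentile falls at position 22.8, so it is interpolated between 42843300 and 43384000, which gives the reported 43275860.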

Descriptive statistics

Standard deviation: 17026267
Coefficient of variation (CV): 2.1127401
Kurtosis: 5.5162727
Mean: 8058855.4
Median Absolute Deviation (MAD): 0
Skewness: 2.4557808
Sum: 2.0147138 × 10^8
Variance: 2.8989377 × 10^14
Monotonicity: Not monotonic
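
As a cross-check, the sum, mean, standard deviation, coefficient of variation, and zero share can be recomputed from the listed values with Python's statistics module. The sketch assumes the reported standard deviation is the sample estimate (n - 1 denominator), which is what the numbers are consistent with; variable names are hypothetical.

```python
import statistics

# 17 zeros plus the eight non-zero values from the value tables.
raw_water_cost = [0] * 17 + [
    42843300, 13721290, 4075826, 43384000,
    12135580, 6825654, 65554521, 12931213,
]

total = sum(raw_water_cost)
mean = statistics.mean(raw_water_cost)
std = statistics.stdev(raw_water_cost)      # sample standard deviation
cv = std / mean                             # coefficient of variation
zero_share = raw_water_cost.count(0) / len(raw_water_cost)

print(total)                 # 201471384
print(round(mean, 1))        # 8058855.4
print(round(cv, 4))          # 2.1127
print(f"{zero_share:.1%}")   # 68.0%
```

A CV above 2 reflects how strongly the eight non-zero costs dominate a column that is otherwise zero.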
Histogram with fixed size bins (bins=9)
Common values

Value      Count  Frequency (%)
0             17  68.0%
42843300       1   4.0%
13721290       1   4.0%
4075826        1   4.0%
43384000       1   4.0%
12135580       1   4.0%
6825654        1   4.0%
65554521       1   4.0%
12931213       1   4.0%

Smallest values (ascending)

Value      Count  Frequency (%)
0             17  68.0%
4075826        1   4.0%
6825654        1   4.0%
12135580       1   4.0%
12931213       1   4.0%
13721290       1   4.0%
42843300       1   4.0%
43384000       1   4.0%
65554521       1   4.0%

Largest values (descending)

Value      Count  Frequency (%)
65554521       1   4.0%
43384000       1   4.0%
42843300       1   4.0%
13721290       1   4.0%
12931213       1   4.0%
12135580       1   4.0%
6825654        1   4.0%
4075826        1   4.0%
0             17  68.0%

여과수비용 (filtered water cost)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 7
Median length: 1
Mean length: 1.56
Min length: 1

Unique

Unique: 3
Unique (%): 12.0%
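
The report distinguishes Distinct (number of different values) from Unique (values that occur exactly once). The sketch below illustrates that reading, along with the length statistics, using the counts listed for 여과수비용 in this report; variable names are hypothetical, and the values are strings because the column was profiled as categorical.

```python
import statistics
from collections import Counter

filtered_water_cost = ["0"] * 22 + ["1165887", "2226968", "277"]

counts = Counter(filtered_water_cost)
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen once

# Length statistics are computed on the string form of each value.
lengths = [len(v) for v in filtered_water_cost]
mean_length = sum(lengths) / len(lengths)
median_length = statistics.median(lengths)

print(distinct, unique)                               # 4 3
print(max(lengths), median_length, mean_length, min(lengths))
```

Here "0" is distinct but not unique because it appears 22 times, so Distinct is 4 while Unique is 3.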

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1165887
5th row: 2226968

Common Values

Value    Count  Frequency (%)
0           22  88.0%
1165887      1   4.0%
2226968      1   4.0%
277          1   4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
0           22  88.0%
1165887      1   4.0%
2226968      1   4.0%
277          1   4.0%

음용수비용 (drinking water cost)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 3
Median length: 1
Mean length: 1.08
Min length: 1

Unique

Unique: 1
Unique (%): 4.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0         24  96.0%
268        1   4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0         24  96.0%
268        1   4.0%

순수비용 (pure water cost)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 7
Median length: 1
Mean length: 1.56
Min length: 1

Unique

Unique: 3
Unique (%): 12.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1318124
5th row: 2249928

Common Values

Value    Count  Frequency (%)
0           22  88.0%
1318124      1   4.0%
2249928      1   4.0%
467          1   4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
0           22  88.0%
1318124      1   4.0%
2249928      1   4.0%
467          1   4.0%

통화단위 (currency unit)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: KRW
2nd row: KRW
3rd row: KRW
4th row: KRW
5th row: KRW

Common Values

Value  Count  Frequency (%)
KRW       25  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
krw       25  100.0%

Interactions


Correlations

             분석일자  원수비용  여과수비용  음용수비용  순수비용
분석일자     1.000     0.378     0.545       0.345       0.545
원수비용     0.378     1.000     0.540       1.000       0.540
여과수비용   0.545     0.540     1.000       1.000       1.000
음용수비용   0.345     1.000     1.000       1.000       1.000
순수비용     0.545     0.540     1.000       1.000       1.000

             순수비용  음용수비용  여과수비용
순수비용     1.000     0.956       1.000
음용수비용   0.956     1.000       0.956
여과수비용   1.000     0.956       1.000

             원수비용  여과수비용  음용수비용  순수비용
원수비용     1.000     0.449       0.933       0.449
여과수비용   0.449     1.000       0.956       1.000
음용수비용   0.933     0.956       1.000       0.956
순수비용     0.449     1.000       0.956       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

    분석일자    원수비용  여과수비용  음용수비용  순수비용  통화단위
0   2020-07-10  42843300  0           0           0         KRW
1   2020-07-10  13721290  0           0           0         KRW
2   2020-07-10  4075826   0           0           0         KRW
3   2020-07-10  0         1165887     0           1318124   KRW
4   2020-07-10  0         2226968     0           2249928   KRW
5   2020-01-11  0         0           0           0         KRW
6   2020-01-11  0         0           0           0         KRW
7   2020-01-11  0         0           0           0         KRW
8   2020-01-11  0         0           0           0         KRW
9   2020-01-11  0         0           0           0         KRW

Last rows

    분석일자    원수비용  여과수비용  음용수비용  순수비용  통화단위
15  2020-01-12  43384000  0           0           0         KRW
16  2020-01-12  12135580  0           0           0         KRW
17  2020-01-12  6825654   0           0           0         KRW
18  2020-01-12  0         0           0           0         KRW
19  2020-01-12  0         0           0           0         KRW
20  2020-01-16  65554521  277         268         467       KRW
21  2020-01-16  12931213  0           0           0         KRW
22  2020-01-16  0         0           0           0         KRW
23  2020-01-16  0         0           0           0         KRW
24  2020-01-16  0         0           0           0         KRW

Duplicate rows

Most frequently occurring

   분석일자    원수비용  여과수비용  음용수비용  순수비용  통화단위  # duplicates
0  2020-01-11  0         0           0           0         KRW       10
2  2020-01-16  0         0           0           0         KRW       3
1  2020-01-12  0         0           0           0         KRW       2
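
The duplicate counts can be checked by grouping identical rows. The sketch below rebuilds the 25 rows from the Sample and Duplicate rows tables; rows 10 to 14 are not shown in the report and are assumed here to be all-zero 2020-01-11 rows, which is what the duplicate count of 10 for that date implies. Names such as zero_row and duplicate_groups are hypothetical.

```python
from collections import Counter


def zero_row(date):
    """A row with all four cost columns equal to zero."""
    return (date, 0, 0, 0, 0, "KRW")


# Columns: 분석일자, 원수비용, 여과수비용, 음용수비용, 순수비용, 통화단위
rows = [
    ("2020-07-10", 42843300, 0, 0, 0, "KRW"),
    ("2020-07-10", 13721290, 0, 0, 0, "KRW"),
    ("2020-07-10", 4075826, 0, 0, 0, "KRW"),
    ("2020-07-10", 0, 1165887, 0, 1318124, "KRW"),
    ("2020-07-10", 0, 2226968, 0, 2249928, "KRW"),
    *[zero_row("2020-01-11")] * 10,   # rows 5-14 (10-14 are inferred)
    ("2020-01-12", 43384000, 0, 0, 0, "KRW"),
    ("2020-01-12", 12135580, 0, 0, 0, "KRW"),
    ("2020-01-12", 6825654, 0, 0, 0, "KRW"),
    *[zero_row("2020-01-12")] * 2,
    ("2020-01-16", 65554521, 277, 268, 467, "KRW"),
    ("2020-01-16", 12931213, 0, 0, 0, "KRW"),
    *[zero_row("2020-01-16")] * 3,
]

# Keep only the rows that occur more than once, with their counts.
duplicate_groups = {row: n for row, n in Counter(rows).items() if n > 1}

for row, n in sorted(duplicate_groups.items(), key=lambda kv: -kv[1]):
    print(row[0], n)

share = len(duplicate_groups) / len(rows)
print(f"{len(duplicate_groups)} duplicated rows ({share:.1%})")
```

Under these assumptions the three groups have 10, 3, and 2 occurrences, and 3 out of 25 rows gives the 12.0% shown in the dataset statistics.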