Overview

Dataset statistics

Number of variables: 7
Number of observations: 1117
Missing cells: 1096
Missing cells (%): 14.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 67.8 KiB
Average record size in memory: 62.1 B
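
The missing-cell percentage is the number of missing cells divided by the total number of cells (variables × observations). A minimal sketch of that arithmetic, using the per-column missing counts reported in the Variables section of this report; the dictionary layout and the function name `missing_cell_share` are illustrative and not part of ydata-profiling:

```python
# Per-column missing counts as reported in this profile
# (columns without missing values are omitted).
missing_by_column = {
    "총유기탄소 배출량": 902,
    "부유물질 배출량": 1,
    "총질소 배출량": 90,
    "총인 배출량": 103,
}
n_variables = 7
n_observations = 1117


def missing_cell_share(missing_counts, n_vars, n_obs):
    """Return (total missing cells, share of all cells that are missing)."""
    total = sum(missing_counts.values())
    return total, total / (n_vars * n_obs)


total_missing, share = missing_cell_share(
    missing_by_column, n_variables, n_observations
)
print(total_missing, f"{share:.1%}")
```

The per-column counts add up to the 1096 missing cells shown in the overview.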

Variable types

Categorical: 1
Text: 1
Numeric: 5

Dataset

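Description: Emissions of water pollutants, measured in real time at facilities equipped with a water-quality TMS (tele-monitoring system), are compiled into annual statistics and published through the system.
URL: https://www.data.go.kr/data/15106197/fileData.do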

Alerts

연도 has constant value "2022" [Constant]
총유기탄소 배출량 is highly overall correlated with 부유물질 배출량 and 2 other fields [High correlation]
부유물질 배출량 is highly overall correlated with 총유기탄소 배출량 and 2 other fields [High correlation]
총질소 배출량 is highly overall correlated with 총유기탄소 배출량 and 2 other fields [High correlation]
총인 배출량 is highly overall correlated with 총유기탄소 배출량 and 2 other fields [High correlation]
총유기탄소 배출량 has 902 (80.8%) missing values [Missing]
총질소 배출량 has 90 (8.1%) missing values [Missing]
총인 배출량 has 103 (9.2%) missing values [Missing]
부유물질 배출량 has 58 (5.2%) zeros [Zeros]
총인 배출량 has 505 (45.2%) zeros [Zeros]
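
The percentages in the missing-value and zero alerts are each count divided by the 1117 observations. The sketch below rebuilds the alert text from the counts; the helper `describe` and the tuple layout are assumptions made for illustration, not the way ydata-profiling produces its alerts:

```python
N_ROWS = 1117  # number of observations reported in the overview

# (column, count, kind) triples copied from the alerts in this report.
alerts = [
    ("총유기탄소 배출량", 902, "missing values"),
    ("총질소 배출량", 90, "missing values"),
    ("총인 배출량", 103, "missing values"),
    ("부유물질 배출량", 58, "zeros"),
    ("총인 배출량", 505, "zeros"),
]


def describe(column, count, kind, n_rows=N_ROWS):
    """Format one alert line with the count expressed as a share of all rows."""
    return f"{column} has {count} ({count / n_rows:.1%}) {kind}"


messages = [describe(*alert) for alert in alerts]
for message in messages:
    print(message)
```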

Reproduction

Analysis started: 2023-12-13 00:41:45.703216
Analysis finished: 2023-12-13 00:41:48.071833
Duration: 2.37 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
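
The reported duration is the difference between the two timestamps. A small check using only the standard library; the format string is an assumption about how the timestamps are written in this report:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps shown above

started = datetime.strptime("2023-12-13 00:41:45.703216", FMT)
finished = datetime.strptime("2023-12-13 00:41:48.071833", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```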

Variables

연도 (year)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count
2022   1117

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value  Count  Frequency (%)
2022   1117   100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 2022 appears 1117 times (100.0%)]

사업장명 (facility name)
Text

Distinct: 1045
Distinct (%): 93.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Length

Max length: 24
Median length: 20
Mean length: 6.8299015
Min length: 4

Characters and Unicode

Total characters: 7629
Distinct characters: 391
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 993
Unique (%): 88.9%

Sample

1st row: 영천신녕하수
2nd row: 성주폐수
3rd row: 봉화춘양하수
4th row: 함안군북하수
5th row: 사천곤양하수

Value                     Count  Frequency (%)
현대제철(당진              6      0.5%
한국서부발전(태안          5      0.4%
아산디스플레이시티1폐수    4      0.4%
포스코(광양                4      0.4%
풍산안강공장(경주          4      0.4%
화성봉담하수               3      0.3%
포스코인터내셔널(인천      3      0.3%
삼성전자(화성              3      0.3%
대전하수                   3      0.3%
한울원자력본부(울진        3      0.3%
Other values (1042)        1088   96.6%

Most occurring characters

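Value                Count  Frequency (%)
(unreadable)         884    11.6%
(unreadable)         678    8.9%
(                    305    4.0%
)                    305    4.0%
(unreadable)         254    3.3%
(unreadable)         190    2.5%
(unreadable)         189    2.5%
(unreadable)         150    2.0%
(unreadable)         143    1.9%
(unreadable)         126    1.7%
Other values (381)   4405   57.7%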

Most occurring categories

Value               Count  Frequency (%)
Other Letter        6904   90.5%
Open Punctuation    305    4.0%
Close Punctuation   305    4.0%
Decimal Number      48     0.6%
Uppercase Letter    43     0.6%
Space Separator     12     0.2%
Dash Punctuation    6      0.1%
Other Punctuation   3      < 0.1%
Lowercase Letter    3      < 0.1%

Most frequent character per category

Other Letter
Value                Count  Frequency (%)
(unreadable)         884    12.8%
(unreadable)         678    9.8%
(unreadable)         254    3.7%
(unreadable)         190    2.8%
(unreadable)         189    2.7%
(unreadable)         150    2.2%
(unreadable)         143    2.1%
(unreadable)         126    1.8%
(unreadable)         100    1.4%
(unreadable)         91     1.3%
Other values (356)   4099   59.4%

Uppercase Letter
Value  Count  Frequency (%)
C      8      18.6%
S      8      18.6%
L      6      14.0%
K      6      14.0%
I      4      9.3%
O      3      7.0%
D      2      4.7%
P      2      4.7%
A      2      4.7%
J      1      2.3%

Decimal Number
Value  Count  Frequency (%)
1      21     43.8%
2      20     41.7%
3      3      6.2%
4      3      6.2%
5      1      2.1%

Lowercase Letter
Value  Count  Frequency (%)
t      1      33.3%
b      1      33.3%
h      1      33.3%

Other Punctuation
Value  Count  Frequency (%)
#      2      66.7%
.      1      33.3%

Open Punctuation
Value  Count  Frequency (%)
(      305    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      305    100.0%

Space Separator
Value    Count  Frequency (%)
(space)  12     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      6      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  6904   90.5%
Common  679    8.9%
Latin   46     0.6%

Most frequent character per script

Hangul
Value                Count  Frequency (%)
(unreadable)         884    12.8%
(unreadable)         678    9.8%
(unreadable)         254    3.7%
(unreadable)         190    2.8%
(unreadable)         189    2.7%
(unreadable)         150    2.2%
(unreadable)         143    2.1%
(unreadable)         126    1.8%
(unreadable)         100    1.4%
(unreadable)         91     1.3%
Other values (356)   4099   59.4%

Latin
Value             Count  Frequency (%)
C                 8      17.4%
S                 8      17.4%
L                 6      13.0%
K                 6      13.0%
I                 4      8.7%
O                 3      6.5%
D                 2      4.3%
P                 2      4.3%
A                 2      4.3%
J                 1      2.2%
Other values (4)  4      8.7%

Common
Value    Count  Frequency (%)
(        305    44.9%
)        305    44.9%
1        21     3.1%
2        20     2.9%
(space)  12     1.8%
-        6      0.9%
3        3      0.4%
4        3      0.4%
#        2      0.3%
.        1      0.1%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  6904   90.5%
ASCII   725    9.5%

Most frequent character per block

Hangul
Value                Count  Frequency (%)
(unreadable)         884    12.8%
(unreadable)         678    9.8%
(unreadable)         254    3.7%
(unreadable)         190    2.8%
(unreadable)         189    2.7%
(unreadable)         150    2.2%
(unreadable)         143    2.1%
(unreadable)         126    1.8%
(unreadable)         100    1.4%
(unreadable)         91     1.3%
Other values (356)   4099   59.4%

ASCII
Value              Count  Frequency (%)
(                  305    42.1%
)                  305    42.1%
1                  21     2.9%
2                  20     2.8%
(space)            12     1.7%
C                  8      1.1%
S                  8      1.1%
L                  6      0.8%
-                  6      0.8%
K                  6      0.8%
Other values (15)  28     3.9%

방류구 (outlet)
Real number (ℝ)

Distinct: 6
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1056401
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 6
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.44954623
Coefficient of variation (CV): 0.40659364
Kurtosis: 36.909075
Mean: 1.1056401
Median Absolute Deviation (MAD): 0
Skewness: 5.5451571
Sum: 1235
Variance: 0.20209182
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]

Common values
Value  Count  Frequency (%)
1      1039   93.0%
2      53     4.7%
3      14     1.3%
4      8      0.7%
5      2      0.2%
6      1      0.1%

Minimum values
Value  Count  Frequency (%)
1      1039   93.0%
2      53     4.7%
3      14     1.3%
4      8      0.7%
5      2      0.2%
6      1      0.1%

Maximum values
Value  Count  Frequency (%)
6      1      0.1%
5      2      0.2%
4      8      0.7%
3      14     1.3%
2      53     4.7%
1      1039   93.0%
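
Because 방류구 takes only six distinct values, its descriptive statistics can be recomputed from the frequency table alone. The sketch below expands the table into one value per observation and uses Python's `statistics` module. It assumes the report uses the sample variance (n - 1 denominator), which is consistent with the reported figures:

```python
import statistics

# Value -> count, copied from the 방류구 frequency table in this report.
freq = {1: 1039, 2: 53, 3: 14, 4: 8, 5: 2, 6: 1}

# Expand the table back into one value per observation.
values = [value for value, count in freq.items() for _ in range(count)]

n = len(values)
total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation

print(n, total, round(mean, 7), round(variance, 8), round(std, 8), round(cv, 8))
```

The results agree with the Sum, Mean, Variance, Standard deviation and CV entries listed for this variable.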

총유기탄소 배출량 (total organic carbon emissions)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 205
Distinct (%): 95.3%
Missing: 902
Missing (%): 80.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 17626.42
Minimum: 0
Maximum: 464456.4
Zeros: 7
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.47
Q1: 157.85
Median: 918.1
Q3: 5387
95-th percentile: 81215.4
Maximum: 464456.4
Range: 464456.4
Interquartile range (IQR): 5229.15
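
Range and interquartile range follow directly from the quantiles: range = maximum - minimum and IQR = Q3 - Q1. The sketch below applies both formulas to the quantile tables of the four emission columns in this report; the `spread` helper and the tuple layout are illustrative names only:

```python
# (minimum, Q1, Q3, maximum) per column, copied from the quantile tables in this report.
quantiles = {
    "총유기탄소 배출량": (0.0, 157.85, 5387.0, 464456.4),
    "부유물질 배출량": (0.0, 78.95, 5763.275, 1957909.5),
    "총질소 배출량": (0.0, 1204.25, 26217.1, 7059490.7),
    "총인 배출량": (0.0, 0.0, 96.175, 109115.5),
}


def spread(minimum, q1, q3, maximum):
    """Return the range and the interquartile range for one column."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


spreads = {name: spread(*q) for name, q in quantiles.items()}
for name, s in spreads.items():
    print(f"{name}: range={s['range']:.1f}, IQR={s['iqr']:.3f}")
```

In every column the range is several orders of magnitude larger than the IQR, which matches the strong right skew reported in the descriptive statistics.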

Descriptive statistics

Standard deviation: 62782.811
Coefficient of variation (CV): 3.5618583
Kurtosis: 29.193007
Mean: 17626.42
Median Absolute Deviation (MAD): 896.7
Skewness: 5.2230009
Sum: 3789680.3
Variance: 3.9416813 × 10^9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
0.0                 7      0.6%
26.2                2      0.2%
152.8               2      0.2%
117.9               2      0.2%
0.4                 2      0.2%
2402.2              1      0.1%
1054.6              1      0.1%
918.1               1      0.1%
48.0                1      0.1%
152.2               1      0.1%
Other values (195)  195    17.5%
(Missing)           902    80.8%

Minimum values
Value  Count  Frequency (%)
0.0    7      0.6%
0.1    1      0.1%
0.3    1      0.1%
0.4    2      0.2%
0.5    1      0.1%
0.8    1      0.1%
6.7    1      0.1%
7.3    1      0.1%
8.1    1      0.1%
9.1    1      0.1%

Maximum values
Value     Count  Frequency (%)
464456.4  1      0.1%
451185.3  1      0.1%
360009.2  1      0.1%
291364.0  1      0.1%
278032.3  1      0.1%
253415.1  1      0.1%
205769.2  1      0.1%
149978.1  1      0.1%
149676.3  1      0.1%
102796.8  1      0.1%

부유물질 배출량 (suspended solids emissions)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 989
Distinct (%): 88.6%
Missing: 1
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 17976.615
Minimum: 0
Maximum: 1957909.5
Zeros: 58
Zeros (%): 5.2%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 78.95
Median: 991.6
Q3: 5763.275
95-th percentile: 80083.125
Maximum: 1957909.5
Range: 1957909.5
Interquartile range (IQR): 5684.325

Descriptive statistics

Standard deviation: 84442.932
Coefficient of variation (CV): 4.6973767
Kurtosis: 276.85937
Mean: 17976.615
Median Absolute Deviation (MAD): 989.3
Skewness: 14.056499
Sum: 20061902
Variance: 7.1306087 × 10^9
Monotonicity: Not monotonic
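
Two of the descriptive statistics are derived from others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below checks that the figures reported for the four emission columns agree with those identities. The tolerance `rel_tol=1e-6` is an assumption chosen to absorb the rounding of the printed values:

```python
import math

# (mean, standard deviation, reported CV, reported variance), copied from this report.
reported = {
    "총유기탄소 배출량": (17626.42, 62782.811, 3.5618583, 3.9416813e9),
    "부유물질 배출량": (17976.615, 84442.932, 4.6973767, 7.1306087e9),
    "총질소 배출량": (76534.536, 340796.62, 4.4528475, 1.1614234e11),
    "총인 배출량": (1605.5534, 7605.628, 4.7370759, 57845578.0),
}


def is_consistent(mean, std, cv, variance, rel_tol=1e-6):
    """True when CV == std / mean and variance == std ** 2, up to rounding."""
    return (
        math.isclose(std / mean, cv, rel_tol=rel_tol)
        and math.isclose(std ** 2, variance, rel_tol=rel_tol)
    )


checks = {name: is_consistent(*row) for name, row in reported.items()}
print(checks)
```

A CV above 3 in every column means the standard deviation is several times the mean, as expected for emissions dominated by a few very large facilities.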

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
0.0                 58     5.2%
0.1                 8      0.7%
0.2                 7      0.6%
0.9                 4      0.4%
1.6                 3      0.3%
1.5                 3      0.3%
0.4                 3      0.3%
2.3                 3      0.3%
9.5                 3      0.3%
3.3                 3      0.3%
Other values (979)  1021   91.4%

Minimum values
Value  Count  Frequency (%)
0.0    58     5.2%
0.1    8      0.7%
0.2    7      0.6%
0.3    3      0.3%
0.4    3      0.3%
0.5    2      0.2%
0.6    1      0.1%
0.7    1      0.1%
0.8    2      0.2%
0.9    4      0.4%

Maximum values
Value      Count  Frequency (%)
1957909.5  1      0.1%
1075668.4  1      0.1%
537200.4   1      0.1%
499623.2   1      0.1%
482916.9   1      0.1%
458477.3   1      0.1%
442631.9   1      0.1%
426682.8   1      0.1%
376532.3   1      0.1%
361731.5   1      0.1%

총질소 배출량 (total nitrogen emissions)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 1024
Distinct (%): 99.7%
Missing: 90
Missing (%): 8.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 76534.536
Minimum: 0
Maximum: 7059490.7
Zeros: 3
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 147.53
Q1: 1204.25
Median: 4845.7
Q3: 26217.1
95-th percentile: 307782.98
Maximum: 7059490.7
Range: 7059490.7
Interquartile range (IQR): 25012.85

Descriptive statistics

Standard deviation: 340796.62
Coefficient of variation (CV): 4.4528475
Kurtosis: 206.05173
Mean: 76534.536
Median Absolute Deviation (MAD): 4399
Skewness: 12.246752
Sum: 78600969
Variance: 1.1614234 × 10^11
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value                Count  Frequency (%)
0.0                  3      0.3%
43.2                 2      0.2%
3127.7               1      0.1%
8058.0               1      0.1%
750.2                1      0.1%
443.1                1      0.1%
10.7                 1      0.1%
13010.9              1      0.1%
5515.0               1      0.1%
231.7                1      0.1%
Other values (1014)  1014   90.8%
(Missing)            90     8.1%

Minimum values
Value  Count  Frequency (%)
0.0    3      0.3%
0.1    1      0.1%
0.4    1      0.1%
1.3    1      0.1%
2.0    1      0.1%
2.2    1      0.1%
8.2    1      0.1%
9.1    1      0.1%
9.6    1      0.1%
10.7   1      0.1%

Maximum values
Value      Count  Frequency (%)
7059490.7  1      0.1%
4422279.3  1      0.1%
2792729.4  1      0.1%
2424608.6  1      0.1%
1968647.7  1      0.1%
1860595.5  1      0.1%
1719874.8  1      0.1%
1683698.8  1      0.1%
1402254.4  1      0.1%
1303658.1  1      0.1%

총인 배출량 (total phosphorus emissions)
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 396
Distinct (%): 39.1%
Missing: 103
Missing (%): 9.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1605.5534
Minimum: 0
Maximum: 109115.5
Zeros: 505
Zeros (%): 45.2%
Negative: 0
Negative (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.1
Q3: 96.175
95-th percentile: 7259.9
Maximum: 109115.5
Range: 109115.5
Interquartile range (IQR): 96.175

Descriptive statistics

Standard deviation: 7605.628
Coefficient of variation (CV): 4.7370759
Kurtosis: 80.526929
Mean: 1605.5534
Median Absolute Deviation (MAD): 0.1
Skewness: 8.1761724
Sum: 1628031.1
Variance: 57845578
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
0.0                 505    45.2%
0.2                 18     1.6%
0.3                 14     1.3%
0.4                 11     1.0%
0.1                 9      0.8%
0.9                 7      0.6%
2.6                 6      0.5%
0.7                 6      0.5%
1.1                 5      0.4%
1.3                 5      0.4%
Other values (386)  428    38.3%
(Missing)           103    9.2%

Minimum values
Value  Count  Frequency (%)
0.0    505    45.2%
0.1    9      0.8%
0.2    18     1.6%
0.3    14     1.3%
0.4    11     1.0%
0.5    4      0.4%
0.6    2      0.2%
0.7    6      0.5%
0.8    3      0.3%
0.9    7      0.6%

Maximum values
Value     Count  Frequency (%)
109115.5  1      0.1%
86871.0   1      0.1%
67944.0   1      0.1%
66568.0   1      0.1%
61266.6   1      0.1%
59488.3   1      0.1%
54510.7   1      0.1%
51630.4   1      0.1%
48112.9   1      0.1%
47652.6   1      0.1%

Interactions

[Figures: 25 pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

                   방류구  총유기탄소 배출량  부유물질 배출량  총질소 배출량  총인 배출량
방류구              1.000   0.190              0.048            0.083          0.000
총유기탄소 배출량   0.190   1.000              1.000            1.000          0.944
부유물질 배출량     0.048   1.000              1.000            0.896          0.850
총질소 배출량       0.083   1.000              0.896            1.000          0.883
총인 배출량         0.000   0.944              0.850            0.883          1.000

[Figure: correlation heatmap]

                   방류구  총유기탄소 배출량  부유물질 배출량  총질소 배출량  총인 배출량
방류구              1.000   0.214              0.117            0.152          0.127
총유기탄소 배출량   0.214   1.000              0.837            0.835          0.705
부유물질 배출량     0.117   0.837              1.000            0.912          0.742
총질소 배출량       0.152   0.835              0.912            1.000          0.735
총인 배출량         0.127   0.705              0.742            0.735          1.000
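
The "High correlation" alerts can be traced back to the correlation matrix by listing, for each column, the other columns whose coefficient exceeds a cut-off. The sketch below does this for the first matrix in this section. The threshold of 0.5 is an assumption for illustration, since the report does not state the value it used, and `highly_correlated` is a hypothetical helper:

```python
columns = ["방류구", "총유기탄소 배출량", "부유물질 배출량", "총질소 배출량", "총인 배출량"]

# First correlation matrix from this report, row by row.
matrix = [
    [1.000, 0.190, 0.048, 0.083, 0.000],
    [0.190, 1.000, 1.000, 1.000, 0.944],
    [0.048, 1.000, 1.000, 0.896, 0.850],
    [0.083, 1.000, 0.896, 1.000, 0.883],
    [0.000, 0.944, 0.850, 0.883, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated anywhere in the report


def highly_correlated(names, coefficients, threshold):
    """Map each column to the other columns with |coefficient| >= threshold."""
    partners = {}
    for i, name in enumerate(names):
        partners[name] = [
            names[j]
            for j, value in enumerate(coefficients[i])
            if j != i and abs(value) >= threshold
        ]
    return partners


partners = highly_correlated(columns, matrix, THRESHOLD)
for name, others in partners.items():
    print(name, "->", others)
```

With this cut-off each emission column is paired with the three other emission columns, which matches the "and 2 other fields" wording of the alerts, while 방류구 is paired with none.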

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
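
Nullity correlation can be illustrated as the Pearson correlation between missing-value indicators (1 where a value is missing, 0 otherwise). The sketch below applies that definition to the 20 rows listed in the Sample section of this report. Because it uses only those rows rather than all 1117 observations, its results will not match the heatmap. The function `nullity_correlation` is an illustrative helper, not a ydata-profiling API:

```python
import math

# (총유기탄소, 부유물질, 총질소, 총인) emissions for the 20 rows shown in the
# Sample section; None marks a missing value (<NA>).
rows = [
    (None, 26.6, 1446.1, 0.0), (None, 76.6, 1721.1, 0.0),
    (None, 4.3, 704.3, 0.0), (0.4, 0.3, 583.1, 0.0),
    (323.6, 113.6, 2181.4, 3.7), (280.7, 7.6, 1706.0, 0.0),
    (None, 1586.8, 2058.0, 0.0), (1050.1, 29.6, 522.1, 0.0),
    (None, 486.8, 1735.7, 2.6), (75.7, 235.1, 309.4, 0.0),
    (None, 93.5, 8189.5, 9.7), (None, 290.5, 2048.2, 0.0),
    (None, 15459.3, 86176.0, 330.1), (None, 237.5, 815.9, 0.9),
    (None, 1158.8, 814.1, 0.0), (None, 294.1, 1103.8, 0.0),
    (None, 2928.8, 51521.1, 0.0), (835.9, 276.2, 2154.7, 0.0),
    (None, 269.2, None, None), (4338.7, 524.0, 1893.0, 0.8),
]
toc, ss, tn, tp = (list(column) for column in zip(*rows))


def nullity_correlation(a, b):
    """Pearson correlation of the missing-value indicators of two columns.

    Returns None when either column has no variation in missingness.
    """
    x = [1.0 if value is None else 0.0 for value in a]
    y = [1.0 if value is None else 0.0 for value in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


tn_tp = nullity_correlation(tn, tp)
toc_tn = nullity_correlation(toc, tn)
ss_tn = nullity_correlation(ss, tn)
print(tn_tp, toc_tn, ss_tn)
```

In these rows 총질소 and 총인 are missing together, so their indicators correlate perfectly, while 부유물질 has no missing values in the sample and therefore no defined nullity correlation.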

Sample

First rows

      연도  사업장명              방류구  총유기탄소 배출량  부유물질 배출량  총질소 배출량  총인 배출량
0     2022  영천신녕하수          1       <NA>               26.6             1446.1         0.0
1     2022  성주폐수              1       <NA>               76.6             1721.1         0.0
2     2022  봉화춘양하수          1       <NA>               4.3              704.3          0.0
3     2022  함안군북하수          1       0.4                0.3              583.1          0.0
4     2022  사천곤양하수          1       323.6              113.6            2181.4         3.7
5     2022  고창대산하수          1       280.7              7.6              1706.0         0.0
6     2022  김천아포하수          1       <NA>               1586.8           2058.0         0.0
7     2022  중앙특수제지(포천)    1       1050.1             29.6             522.1          0.0
8     2022  한국서부발전(태안)    4       <NA>               486.8            1735.7         2.6
9     2022  괴산대제폐수          1       75.7               235.1            309.4          0.0

Last rows

      연도  사업장명              방류구  총유기탄소 배출량  부유물질 배출량  총질소 배출량  총인 배출량
1107  2022  울진후포하수          1       <NA>               93.5             8189.5         9.7
1108  2022  화천산양하수          1       <NA>               290.5            2048.2         0.0
1109  2022  양주옥정하수          1       <NA>               15459.3          86176.0        330.1
1110  2022  나주산단폐수          1       <NA>               237.5            815.9          0.9
1111  2022  켐트로닉스(세종)      1       <NA>               1158.8           814.1          0.0
1112  2022  화성서신하수          1       <NA>               294.1            1103.8         0.0
1113  2022  하남하수              1       <NA>               2928.8           51521.1        0.0
1114  2022  의성금성하수          1       835.9              276.2            2154.7         0.0
1115  2022  밀양정수(밀양)        1       <NA>               269.2            <NA>           <NA>
1116  2022  화성매송하수          1       4338.7             524.0            1893.0         0.8