Overview

Dataset statistics

Number of variables: 8
Number of observations: 1223
Missing cells: 16
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 83.7 KiB
Average record size in memory: 70.1 B
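
The derived figures in this table follow from the raw counts. The short sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (rows times columns) and that 1 KiB is 1024 bytes; both are assumptions, not statements from the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 8
n_observations = 1223
missing_cells = 16
total_size_kib = 83.7

# Share of missing cells among all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both recomputed values round to the figures shown in the table (0.2% and 70.1 B).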

Variable types

Numeric: 5
Categorical: 2
DateTime: 1

Dataset

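Description: 대장관리번호 (register management number), 자치구명 (district name), 자치구 코드 (district code), 집수면적 (catchment area), 처리용량 (treatment capacity), 시설용량 (facility capacity), 이용량 (usage volume), 설치일자 (installation date)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15646/S/1/datasetView.do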

Alerts

대장관리번호 is highly correlated overall with 집수면적 [High correlation]
자치구 코드 is highly correlated overall with 자치구명 [High correlation]
집수면적 is highly correlated overall with 대장관리번호 [High correlation]
처리용량 is highly correlated overall with 시설용량 [High correlation]
시설용량 is highly correlated overall with 처리용량 [High correlation]
자치구명 is highly correlated overall with 자치구 코드 [High correlation]
이용량 is highly imbalanced (98.6%) [Imbalance]
설치일자 has 16 (1.3%) missing values [Missing]
대장관리번호 has unique values [Unique]
집수면적 has 976 (79.8%) zeros [Zeros]
처리용량 has 302 (24.7%) zeros [Zeros]
시설용량 has 42 (3.4%) zeros [Zeros]
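
The imbalance alert for 이용량 reports a score of 98.6%. One formula that reproduces this figure is one minus the normalized Shannon entropy of the class frequencies. The sketch below uses that formula as an assumption about how the score is defined; the helper name `imbalance_score` is hypothetical, and the class counts are the ones listed for 이용량 in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes. A score of 0 means the
    classes are perfectly balanced; a score near 1 means one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Class counts for 이용량: value 0 occurs 1220 times; 175, 340 and 1300 once each.
usage_counts = [1220, 1, 1, 1]
score = imbalance_score(usage_counts)
print(f"{score:.1%}")
```

With 1220 of 1223 rows in a single class, the entropy is close to zero, so the score is close to 1.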

Reproduction

Analysis started: 2024-05-10 22:15:43.144203
Analysis finished: 2024-05-10 22:15:54.583262
Duration: 11.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

대장관리번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 1223
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3161.1594
Minimum: 14
Maximum: 4162
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.9 KiB

Quantile statistics

Minimum: 14
5th percentile: 430.1
Q1: 3139.5
Median: 3517
Q3: 3850.5
95th percentile: 4097.9
Maximum: 4162
Range: 4148
Interquartile range (IQR): 711

Descriptive statistics

Standard deviation: 1101.3145
Coefficient of variation (CV): 0.34838941
Kurtosis: 1.5353706
Mean: 3161.1594
Median Absolute Deviation (MAD): 354
Skewness: -1.6827366
Sum: 3866098
Variance: 1212893.6
Monotonicity: Strictly decreasing
Histogram with fixed size bins (bins=50)
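
Several of the statistics reported for 대장관리번호 are simple functions of the others, which gives a quick way to check the report for internal consistency. The sketch below recomputes them from the published values, using the standard definitions of each quantity.

```python
# Values copied from the statistics reported for 대장관리번호.
n = 1223
total = 3866098
minimum, maximum = 14, 4162
q1, q3 = 3139.5, 3850.5
std = 1101.3145

mean = total / n                 # Sum / number of observations
value_range = maximum - minimum  # Maximum - Minimum
iqr = q3 - q1                    # Q3 - Q1
cv = std / mean                  # Standard deviation / Mean
variance = std ** 2              # Standard deviation squared

print(f"mean={mean:.4f} range={value_range} iqr={iqr} cv={cv:.6f} variance={variance:.1f}")
```

Each recomputed value agrees with the report to the precision shown there.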

Common values

Value                  Count  Frequency (%)
4162                       1  0.1%
3276                       1  0.1%
3269                       1  0.1%
3270                       1  0.1%
3271                       1  0.1%
3272                       1  0.1%
3273                       1  0.1%
3274                       1  0.1%
3275                       1  0.1%
3277                       1  0.1%
Other values (1213)     1213  99.2%

Minimum 10 values

Value  Count  Frequency (%)
14         1  0.1%
67         1  0.1%
68         1  0.1%
71         1  0.1%
84         1  0.1%
100        1  0.1%
101        1  0.1%
105        1  0.1%
106        1  0.1%
115        1  0.1%

Maximum 10 values

Value  Count  Frequency (%)
4162       1  0.1%
4161       1  0.1%
4160       1  0.1%
4159       1  0.1%
4158       1  0.1%
4157       1  0.1%
4156       1  0.1%
4155       1  0.1%
4154       1  0.1%
4153       1  0.1%

자치구명
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB

Value              Count
서초구               119
은평구                81
송파구                80
도봉구                76
강서구                76
Other values (20)    791

Length

Max length: 4
Median length: 3
Mean length: 3.0801308
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강남구
2nd row: 영등포구
3rd row: 영등포구
4th row: 도봉구
5th row: 관악구

Common Values

Value              Count  Frequency (%)
서초구               119  9.7%
은평구                81  6.6%
송파구                80  6.5%
도봉구                76  6.2%
강서구                76  6.2%
성북구                73  6.0%
광진구                64  5.2%
노원구                58  4.7%
강동구                54  4.4%
동대문구              53  4.3%
Other values (15)    489  40.0%

Length

Histogram of lengths of the category

자치구 코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 25
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11446.672
Minimum: 11110
Maximum: 11740
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.9 KiB

Quantile statistics

Minimum: 11110
5th percentile: 11170
Q1: 11290
Median: 11440
Q3: 11620
95th percentile: 11710
Maximum: 11740
Range: 630
Interquartile range (IQR): 330

Descriptive statistics

Standard deviation: 182.80045
Coefficient of variation (CV): 0.015969746
Kurtosis: -1.303038
Mean: 11446.672
Median Absolute Deviation (MAD): 150
Skewness: 0.053969132
Sum: 13999280
Variance: 33416.003
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)

Common values

Value              Count  Frequency (%)
11650                119  9.7%
11380                 81  6.6%
11710                 80  6.5%
11320                 76  6.2%
11500                 76  6.2%
11290                 73  6.0%
11215                 64  5.2%
11350                 58  4.7%
11740                 54  4.4%
11230                 53  4.3%
Other values (15)    489  40.0%

Minimum 10 values

Value  Count  Frequency (%)
11110     17  1.4%
11140     20  1.6%
11170     30  2.5%
11200     29  2.4%
11215     64  5.2%
11230     53  4.3%
11260     40  3.3%
11290     73  6.0%
11305     33  2.7%
11320     76  6.2%

Maximum 10 values

Value  Count  Frequency (%)
11740     54  4.4%
11710     80  6.5%
11680     50  4.1%
11650    119  9.7%
11620     53  4.3%
11590     19  1.6%
11560     31  2.5%
11545     31  2.5%
11530     40  3.3%
11500     76  6.2%

집수면적
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 112
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 545.62796
Minimum: 0
Maximum: 32140
Zeros: 976
Zeros (%): 79.8%
Negative: 0
Negative (%): 0.0%
Memory size: 10.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2980
Maximum: 32140
Range: 32140
Interquartile range (IQR): 0
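
The zero quartiles of 집수면적 follow directly from its zero count. With no negative values, the 976 zeros occupy the first 976 positions of the sorted column, so any quantile that falls inside that block must be 0. The sketch below checks this, assuming quantiles are taken at the fractional position q * (n - 1) in the sorted data (linear interpolation, which is an assumption about how the report computes them).

```python
# Counts copied from the 집수면적 statistics.
n = 1223
zeros = 976

quantile_levels = {
    "5th percentile": 0.05,
    "Q1": 0.25,
    "Median": 0.50,
    "Q3": 0.75,
    "95th percentile": 0.95,
}

# Sorted positions 0 .. zeros-1 all hold the value 0. A quantile whose
# fractional position lies at or below the last zero is therefore 0.
last_zero_position = zeros - 1
forced_to_zero = [
    name
    for name, q in quantile_levels.items()
    if q * (n - 1) <= last_zero_position
]

print(forced_to_zero)
```

Only the 95th percentile falls beyond the block of zeros, which matches the table: it is the only listed quantile with a non-zero value. The same reasoning explains why the IQR and the MAD are both 0.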

Descriptive statistics

Standard deviation: 2751.5602
Coefficient of variation (CV): 5.0429237
Kurtosis: 56.973135
Mean: 545.62796
Median Absolute Deviation (MAD): 0
Skewness: 7.1361345
Sum: 667303
Variance: 7571083.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                     976  79.8%
20                     31  2.5%
40                     25  2.0%
12                     10  0.8%
200                     8  0.7%
1000                    5  0.4%
7                       5  0.4%
600                     5  0.4%
6                       4  0.3%
60                      4  0.3%
Other values (102)    150  12.3%

Minimum 10 values

Value  Count  Frequency (%)
0        976  79.8%
4          2  0.2%
6          4  0.3%
7          5  0.4%
8          3  0.2%
10         2  0.2%
12        10  0.8%
13         3  0.2%
19         1  0.1%
20        31  2.5%

Maximum 10 values

Value  Count  Frequency (%)
32140      1  0.1%
27140      1  0.1%
26060      1  0.1%
24000      1  0.1%
23964      1  0.1%
23600      1  0.1%
23278      2  0.2%
21040      1  0.1%
19000      2  0.2%
18208      2  0.2%

처리용량
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 123
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.009812
Minimum: 0
Maximum: 1520
Zeros: 302
Zeros (%): 24.7%
Negative: 0
Negative (%): 0.0%
Memory size: 10.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 10
95th percentile: 101.8
Maximum: 1520
Range: 1520
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 113.29591
Coefficient of variation (CV): 4.0448652
Kurtosis: 69.66391
Mean: 28.009812
Median Absolute Deviation (MAD): 2
Skewness: 7.5657625
Sum: 34256
Variance: 12835.964
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                     302  24.7%
1                     293  24.0%
2                     129  10.5%
3                      39  3.2%
9                      31  2.5%
5                      31  2.5%
6                      28  2.3%
7                      24  2.0%
8                      18  1.5%
4                      17  1.4%
Other values (113)    311  25.4%

Minimum 10 values

Value  Count  Frequency (%)
0        302  24.7%
1        293  24.0%
2        129  10.5%
3         39  3.2%
4         17  1.4%
5         31  2.5%
6         28  2.3%
7         24  2.0%
8         18  1.5%
9         31  2.5%

Maximum 10 values

Value  Count  Frequency (%)
1520       1  0.1%
1350       1  0.1%
1260       1  0.1%
1004       1  0.1%
915        1  0.1%
908        1  0.1%
830        1  0.1%
800        1  0.1%
720        1  0.1%
642        1  0.1%

시설용량
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 260
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 135.8937
Minimum: 0
Maximum: 3840
Zeros: 42
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 10.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 13
Q3: 148
95th percentile: 643.8
Maximum: 3840
Range: 3840
Interquartile range (IQR): 147

Descriptive statistics

Standard deviation: 285.8763
Coefficient of variation (CV): 2.1036758
Kurtosis: 37.252051
Mean: 135.8937
Median Absolute Deviation (MAD): 12
Skewness: 4.7199771
Sum: 166198
Variance: 81725.257
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                     340  27.8%
2                     156  12.8%
0                      42  3.4%
10                     21  1.7%
40                     20  1.6%
50                     20  1.6%
30                     20  1.6%
150                    17  1.4%
100                    17  1.4%
5                      17  1.4%
Other values (250)    553  45.2%

Minimum 10 values

Value  Count  Frequency (%)
0         42  3.4%
1        340  27.8%
2        156  12.8%
3          8  0.7%
4          9  0.7%
5         17  1.4%
6          5  0.4%
7          1  0.1%
8          8  0.7%
9          1  0.1%

Maximum 10 values

Value  Count  Frequency (%)
3840       1  0.1%
3000       1  0.1%
2000       1  0.1%
1800       1  0.1%
1607       1  0.1%
1520       1  0.1%
1400       4  0.3%
1357       1  0.1%
1350       1  0.1%
1316       1  0.1%

이용량
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB

Value  Count
0       1220
175        1
340        1
1300       1

Length

Max length: 4
Median length: 1
Mean length: 1.0057236
Min length: 1
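
Because 이용량 is profiled as a categorical variable, the length statistics describe the number of characters in each label, not the numeric values. The sketch below recomputes them from the four labels and their counts as listed in this report, treating each label as a string.

```python
# Category labels of 이용량 and their counts, as listed in the report.
value_counts = {"0": 1220, "175": 1, "340": 1, "1300": 1}

n = sum(value_counts.values())
lengths = [len(label) for label in value_counts]

# Count-weighted mean of the label lengths.
mean_length = sum(len(label) * count for label, count in value_counts.items()) / n

print(f"max={max(lengths)} min={min(lengths)} mean={mean_length:.7f}")
```

The weighted mean is (1220 * 1 + 3 + 3 + 4) / 1223, which agrees with the reported mean length.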

Unique

Unique: 3
Unique (%): 0.2%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0       1220  99.8%
175        1  0.1%
340        1  0.1%
1300       1  0.1%

Length

Histogram of lengths of the category


설치일자
Date

MISSING 

Distinct: 226
Distinct (%): 18.7%
Missing: 16
Missing (%): 1.3%
Memory size: 9.7 KiB
Minimum: 2003-01-01 00:00:00
Maximum: 2024-01-04 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

[25 interaction plots rendered with Matplotlib v3.7.2; images not included]

Correlations

              대장관리번호  자치구명  자치구 코드  집수면적  처리용량  시설용량  이용량
대장관리번호         1.000     0.623        0.408     0.428     0.132     0.060   0.179
자치구명             0.623     1.000        1.000     0.347     0.209     0.255   0.290
자치구 코드          0.408     1.000        1.000     0.263     0.131     0.183   0.132
집수면적             0.428     0.347        0.263     1.000     0.000     0.675   0.480
처리용량             0.132     0.209        0.131     0.000     1.000     0.535   0.000
시설용량             0.060     0.255        0.183     0.675     0.535     1.000   0.237
이용량               0.179     0.290        0.132     0.480     0.000     0.237   1.000

          자치구명  이용량
자치구명     1.000   0.156
이용량       0.156   1.000

              대장관리번호  자치구 코드  집수면적  처리용량  시설용량  자치구명  이용량
대장관리번호         1.000       -0.167    -0.523     0.040    -0.204     0.320   0.124
자치구 코드         -0.167        1.000     0.062    -0.201    -0.035     0.994   0.069
집수면적            -0.523        0.062     1.000    -0.051     0.026     0.128   0.306
처리용량             0.040       -0.201    -0.051     1.000     0.737     0.081   0.000
시설용량            -0.204       -0.035     0.026     0.737     1.000     0.103   0.108
자치구명             0.320        0.994     0.128     0.081     0.103     1.000   0.156
이용량               0.124        0.069     0.306     0.000     0.108     0.156   1.000
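
The high-correlation alerts near the top of the report can be related to the last matrix above. The sketch below scans that matrix for pairs whose absolute coefficient reaches a cutoff of 0.5. The cutoff value and the helper name `high_correlation_pairs` are assumptions made for illustration; the report does not state the threshold it used.

```python
from itertools import combinations

HIGH_CORRELATION_THRESHOLD = 0.5  # assumed cutoff, not stated in the report

# Column order and values copied from the last correlation matrix.
columns = ["대장관리번호", "자치구 코드", "집수면적", "처리용량", "시설용량", "자치구명", "이용량"]
matrix = [
    [1.000, -0.167, -0.523, 0.040, -0.204, 0.320, 0.124],
    [-0.167, 1.000, 0.062, -0.201, -0.035, 0.994, 0.069],
    [-0.523, 0.062, 1.000, -0.051, 0.026, 0.128, 0.306],
    [0.040, -0.201, -0.051, 1.000, 0.737, 0.081, 0.000],
    [-0.204, -0.035, 0.026, 0.737, 1.000, 0.103, 0.108],
    [0.320, 0.994, 0.128, 0.081, 0.103, 1.000, 0.156],
    [0.124, 0.069, 0.306, 0.000, 0.108, 0.156, 1.000],
]


def high_correlation_pairs(names, values, threshold):
    """Return (name_a, name_b, coefficient) for each unordered pair whose
    absolute coefficient is at least the threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(values[i][j]) >= threshold:
            pairs.append((names[i], names[j], values[i][j]))
    return pairs


pairs = high_correlation_pairs(columns, matrix, HIGH_CORRELATION_THRESHOLD)
for name_a, name_b, coefficient in pairs:
    print(f"{name_a} <-> {name_b}: {coefficient:+.3f}")
```

Under this assumed cutoff, the scan returns three pairs. Listed in both directions, they correspond to the six high-correlation alerts in the report.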

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

      대장관리번호  자치구명  자치구 코드  집수면적  처리용량  시설용량  이용량  설치일자
0             4162  강남구          11680         0         0         0       0  2024-01-04 00:00:00.0
1             4161  영등포구        11560         0         1         1       0  2023-05-24 00:00:00.0
2             4160  영등포구        11560         0         1         1       0  2023-05-24 00:00:00.0
3             4159  도봉구          11320         0         1         1       0  2023-05-02 00:00:00.0
4             4158  관악구          11620         0         1         1       0  2023-09-05 00:00:00.0
5             4157  관악구          11620         0         1         1       0  2023-09-05 00:00:00.0
6             4156  강서구          11500         0         1         1       0  2023-05-09 00:00:00.0
7             4155  관악구          11620         0         1         1       0  2023-05-17 00:00:00.0
8             4154  강북구          11305         0         1         1       0  2023-10-26 00:00:00.0
9             4153  강북구          11305         0         1         1       0  2023-05-02 00:00:00.0

Last rows

      대장관리번호  자치구명  자치구 코드  집수면적  처리용량  시설용량  이용량  설치일자
1213           115  성동구          11200      1760         5        88       0  2013-08-01 00:00:00.0
1214           106  도봉구          11320      1100         3        55       0  2013-05-01 00:00:00.0
1215           105  노원구          11350      9260        26       463       0  2013-08-01 00:00:00.0
1216           101  중구            11140      6400        18       320       0  2013-07-01 00:00:00.0
1217           100  종로구          11110        40         0         2       0  2013-06-01 00:00:00.0
1218            84  마포구          11440       600         2        30       0  2013-01-01 00:00:00.0
1219            71  서대문구        11410        80         0         4       0  2013-01-01 00:00:00.0
1220            68  서대문구        11410        40         0         2       0  2013-04-01 00:00:00.0
1221            67  서대문구        11410        20         0         1       0  2013-01-01 00:00:00.0
1222            14  중구            11140     24000        68      1200    1300  2013-03-15 00:00:00.0