Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 722.7 KiB
Average record size in memory: 74.0 B

Variable types

Numeric: 2
Categorical: 6

Dataset

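Description: 일자 (date), 사업소 구분 코드 (office type code), 사업소 구분명 (office type name), 사업소 코드 (office code), 사업소 명 (office name), 유량 구분 코드 (flow type code), 유량 구분 명 (flow type name), 측정값(톤) (measured value, tons)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-20565/S/1/datasetView.do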

Alerts

유량 구분 명 is highly overall correlated with 사업소 구분 코드 and 4 other fields [High correlation]
사업소 구분 코드 is highly overall correlated with 사업소 구분명 and 4 other fields [High correlation]
유량 구분 코드 is highly overall correlated with 사업소 구분 코드 and 4 other fields [High correlation]
사업소 코드 is highly overall correlated with 사업소 구분 코드 and 4 other fields [High correlation]
사업소 구분명 is highly overall correlated with 사업소 구분 코드 and 4 other fields [High correlation]
사업소 명 is highly overall correlated with 사업소 구분 코드 and 4 other fields [High correlation]
측정값(톤) is highly skewed (γ1 = -89.53798489) [Skewed]

Reproduction

Analysis started: 2024-05-10 22:19:04.892323
Analysis finished: 2024-05-10 22:19:08.255163
Duration: 3.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
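
The Duration figure follows directly from the two timestamps above. A minimal sketch of that arithmetic, using only the Python standard library (the variable names are illustrative, not part of the report):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of the report.
started = datetime.fromisoformat("2024-05-10 22:19:04.892323")
finished = datetime.fromisoformat("2024-05-10 22:19:08.255163")

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # prints "3.36 seconds"
```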

Variables

일자 (Date)
Real number (ℝ)

Distinct: 1562
Distinct (%): 15.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20217269
Minimum: 20200101
Maximum: 20240509
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20200101
5-th percentile: 20200317
Q1: 20210112
Median: 20220210
Q3: 20230322
95-th percentile: 20240216
Maximum: 20240509
Range: 40408
Interquartile range (IQR): 20210

Descriptive statistics

Standard deviation: 12680.83
Coefficient of variation (CV): 0.00062722762
Kurtosis: -1.1482819
Mean: 20217269
Median Absolute Deviation (MAD): 10105
Skewness: 0.15482534
Sum: 2.0217269 × 10^11
Variance: 1.6080344 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value | Count | Frequency (%)
20210626 | 14 | 0.1%
20201222 | 13 | 0.1%
20220226 | 12 | 0.1%
20221225 | 12 | 0.1%
20210826 | 12 | 0.1%
20211118 | 12 | 0.1%
20201128 | 12 | 0.1%
20220527 | 12 | 0.1%
20221002 | 12 | 0.1%
20210125 | 12 | 0.1%
Other values (1552) | 9877 | 98.8%

Lowest 10 values

Value | Count | Frequency (%)
20200101 | 8 | 0.1%
20200102 | 6 | 0.1%
20200103 | 7 | 0.1%
20200104 | 9 | 0.1%
20200105 | 8 | 0.1%
20200106 | 9 | 0.1%
20200107 | 4 | < 0.1%
20200108 | 9 | 0.1%
20200109 | 6 | 0.1%
20200110 | 4 | < 0.1%

Highest 10 values

Value | Count | Frequency (%)
20240509 | 7 | 0.1%
20240508 | 8 | 0.1%
20240507 | 5 | 0.1%
20240506 | 5 | 0.1%
20240505 | 5 | 0.1%
20240504 | 4 | < 0.1%
20240503 | 7 | 0.1%
20240502 | 7 | 0.1%
20240501 | 8 | 0.1%
20240430 | 6 | 0.1%
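
The 일자 column is profiled as a real number, but its values are calendar dates encoded as YYYYMMDD integers. The numeric statistics above (Range 40408, IQR 20210, mean, variance) are therefore not measured in days. The sketch below parses the reported minimum and maximum as dates to get the actual calendar span. The helper name `parse_yyyymmdd` is hypothetical, and the YYYYMMDD reading is an assumption based on the listed values.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20200101 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first_day = parse_yyyymmdd(20200101)  # reported Minimum
last_day = parse_yyyymmdd(20240509)   # reported Maximum

# Calendar span covered by the column, counting both endpoints.
span_days = (last_day - first_day).days + 1
print(span_days)  # prints 1591
```

Against this span of 1591 calendar days, the report's 1562 distinct values mean that 29 dates in the period do not occur among these 10000 observations.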

사업소 구분 코드 (Office type code)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
P: 5440
W: 4560

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: W
2nd row: P
3rd row: P
4th row: W
5th row: P

Common Values

Value | Count | Frequency (%)
P | 5440 | 54.4%
W | 4560 | 45.6%

Length

Histogram of lengths of the category


사업소 구분명 (Office type name)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
정수센터 (water purification center): 5440
수도사업소 (waterworks office): 4560

Length

Max length: 5
Median length: 4
Mean length: 4.456
Min length: 4
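
The mean length in the Length block above can be checked from the two category counts: it is the count-weighted average of the label lengths in characters. A minimal sketch, assuming length is counted in Unicode characters, as Python's `len` does for `str`:

```python
# Category counts copied from the report for 사업소 구분명.
counts = {"정수센터": 5440, "수도사업소": 4560}

total = sum(counts.values())

# Weight each label's character length by how often it occurs.
mean_length = sum(len(label) * n for label, n in counts.items()) / total
print(mean_length)  # prints 4.456
```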

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 수도사업소
2nd row: 정수센터
3rd row: 정수센터
4th row: 수도사업소
5th row: 정수센터

Common Values

Value | Count | Frequency (%)
정수센터 | 5440 | 54.4%
수도사업소 | 4560 | 45.6%

Length

Histogram of lengths of the category


사업소 코드 (Office code)
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
PR0055: 926
PR0370: 911
PR0710: 910
PR0407: 906
PR0183: 904
Other values (11): 5443

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: WW0003
2nd row: PR0407
3rd row: PR0065
4th row: WW0011
5th row: PR0055

Common Values

Value | Count | Frequency (%)
PR0055 | 926 | 9.3%
PR0370 | 911 | 9.1%
PR0710 | 910 | 9.1%
PR0407 | 906 | 9.1%
PR0183 | 904 | 9.0%
PR0065 | 883 | 8.8%
WW0010 | 476 | 4.8%
RW0003 | 471 | 4.7%
WW0011 | 470 | 4.7%
WW0007 | 464 | 4.6%
Other values (6) | 2679 | 26.8%

Length

Histogram of lengths of the category

사업소 명 (Office name)
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
광암: 926
암사: 911
강북: 910
영등포: 906
뚝도: 904
Other values (11): 5443

Length

Max length: 7
Median length: 2
Mean length: 2.5431
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동부
2nd row: 영등포
3rd row: 구의
4th row: 강동
5th row: 광암

Common Values

Value | Count | Frequency (%)
광암 | 926 | 9.3%
암사 | 911 | 9.1%
강북 | 910 | 9.1%
영등포 | 906 | 9.1%
뚝도 | 904 | 9.0%
구의 | 883 | 8.8%
강남 | 476 | 4.8%
동부(시계외) | 471 | 4.7%
강동 | 470 | 4.7%
강서 | 464 | 4.6%
Other values (6) | 2679 | 26.8%

Length

Histogram of lengths of the category

유량 구분 코드 (Flow type code)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
T: 4560
C: 2763
S: 2677

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: T
2nd row: S
3rd row: S
4th row: T
5th row: C

Common Values

Value | Count | Frequency (%)
T | 4560 | 45.6%
C | 2763 | 27.6%
S | 2677 | 26.8%

Length

Histogram of lengths of the category


유량 구분 명 (Flow type name)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
공급량 (supply volume): 4560
취수 (water intake): 2763
송수 (water transmission): 2677

Length

Max length: 3
Median length: 2
Mean length: 2.456
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 공급량
2nd row: 송수
3rd row: 송수
4th row: 공급량
5th row: 취수

Common Values

Value | Count | Frequency (%)
공급량 | 4560 | 45.6%
취수 | 2763 | 27.6%
송수 | 2677 | 26.8%

Length

Histogram of lengths of the category


측정값(톤) (Measured value, tons)
Real number (ℝ)

SKEWED 

Distinct: 9279
Distinct (%): 92.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 85253.904
Minimum: -4.2946647 × 10^9
Maximum: 1.0719865 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 15
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.2946647 × 10^9
5-th percentile: 72972.95
Q1: 306026.75
Median: 387587.5
Q3: 468544.75
95-th percentile: 952225
Maximum: 1.0719865 × 10^9
Range: 5.3666512 × 10^9
Interquartile range (IQR): 162518

Descriptive statistics

Standard deviation: 44331493
Coefficient of variation (CV): 519.9937
Kurtosis: 8845.8853
Mean: 85253.904
Median Absolute Deviation (MAD): 81229.5
Skewness: -89.537985
Sum: 8.5253904 × 10^8
Variance: 1.9652813 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value | Count | Frequency (%)
222500 | 11 | 0.1%
218400 | 8 | 0.1%
218700 | 7 | 0.1%
221600 | 7 | 0.1%
215000 | 7 | 0.1%
222000 | 6 | 0.1%
218500 | 6 | 0.1%
217500 | 6 | 0.1%
211500 | 6 | 0.1%
219400 | 6 | 0.1%
Other values (9269) | 9930 | 99.3%

Lowest 10 values

Value | Count | Frequency (%)
-4294664741 | 1 | < 0.1%
-99628626 | 1 | < 0.1%
-89806583 | 1 | < 0.1%
-89615122 | 1 | < 0.1%
-54032502 | 1 | < 0.1%
-52837400 | 1 | < 0.1%
-45677489 | 1 | < 0.1%
-9644589 | 1 | < 0.1%
-9518026 | 1 | < 0.1%
-2860612 | 1 | < 0.1%

Highest 10 values

Value | Count | Frequency (%)
1071986500 | 1 | < 0.1%
103866986 | 1 | < 0.1%
82532072 | 1 | < 0.1%
53711716 | 1 | < 0.1%
27042433 | 1 | < 0.1%
3482411 | 1 | < 0.1%
3420049 | 1 | < 0.1%
1857494 | 1 | < 0.1%
1157000 | 1 | < 0.1%
1154377 | 1 | < 0.1%
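
The extreme skewness and kurtosis of 측정값(톤) come from a handful of values that sit orders of magnitude away from the quartiles. One common way to screen for such values is Tukey-style fences built from the reported Q1 and Q3. The sketch below is illustrative only: the helper name `tukey_fences` and the multiplier k=3.0 are assumptions, not something the report itself applies, and a flagged value is a candidate for review rather than a confirmed error.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the Quantile statistics of 측정값(톤).
Q1, Q3 = 306026.75, 468544.75
lower, upper = tukey_fences(Q1, Q3, k=3.0)  # k=3.0 is an illustrative choice

# A few values taken from the report's extreme-value lists and sample rows.
values = [-4294664741, -2860612, 215800, 934500, 1154377, 1071986500]
flagged = [v for v in values if v < lower or v > upper]

print(lower, upper)  # prints -181527.25 956098.75
print(flagged)       # prints [-4294664741, -2860612, 1154377, 1071986500]
```

Note that the reported 95-th percentile (952225) lies just inside the upper fence at k=3.0 but would fall outside it at the conventional k=1.5, so the multiplier needs to be chosen with the data's legitimate range in mind.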

Interactions


Correlations

Variable | 일자 | 사업소 구분 코드 | 사업소 구분명 | 사업소 코드 | 사업소 명 | 유량 구분 코드 | 유량 구분 명 | 측정값(톤)
일자 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.026
사업소 구분 코드 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.014
사업소 구분명 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.014
사업소 코드 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.848 | 0.848 | 0.016
사업소 명 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.848 | 0.848 | 0.016
유량 구분 코드 | 0.000 | 1.000 | 1.000 | 0.848 | 0.848 | 1.000 | 1.000 | 0.058
유량 구분 명 | 0.000 | 1.000 | 1.000 | 0.848 | 0.848 | 1.000 | 1.000 | 0.058
측정값(톤) | 0.026 | 0.014 | 0.014 | 0.016 | 0.016 | 0.058 | 0.058 | 1.000

Variable | 유량 구분 명 | 사업소 구분 코드 | 유량 구분 코드 | 사업소 코드 | 사업소 구분명 | 사업소 명
유량 구분 명 | 1.000 | 1.000 | 1.000 | 0.706 | 1.000 | 0.706
사업소 구분 코드 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 0.999
유량 구분 코드 | 1.000 | 1.000 | 1.000 | 0.706 | 1.000 | 0.706
사업소 코드 | 0.706 | 0.999 | 0.706 | 1.000 | 0.999 | 1.000
사업소 구분명 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 0.999
사업소 명 | 0.706 | 0.999 | 0.706 | 1.000 | 0.999 | 1.000

Variable | 일자 | 측정값(톤) | 사업소 구분 코드 | 사업소 구분명 | 사업소 코드 | 사업소 명 | 유량 구분 코드 | 유량 구분 명
일자 | 1.000 | -0.047 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
측정값(톤) | -0.047 | 1.000 | 0.024 | 0.024 | 0.015 | 0.015 | 0.016 | 0.016
사업소 구분 코드 | 0.000 | 0.024 | 1.000 | 1.000 | 0.999 | 0.999 | 1.000 | 1.000
사업소 구분명 | 0.000 | 0.024 | 1.000 | 1.000 | 0.999 | 0.999 | 1.000 | 1.000
사업소 코드 | 0.000 | 0.015 | 0.999 | 0.999 | 1.000 | 1.000 | 0.706 | 0.706
사업소 명 | 0.000 | 0.015 | 0.999 | 0.999 | 1.000 | 1.000 | 0.706 | 0.706
유량 구분 코드 | 0.000 | 0.016 | 1.000 | 1.000 | 0.706 | 0.706 | 1.000 | 1.000
유량 구분 명 | 0.000 | 0.016 | 1.000 | 1.000 | 0.706 | 0.706 | 1.000 | 1.000
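
The values of 1.000 between the office type and flow type columns mean that one column fully determines the grouping of the other. As an illustration, the sketch below computes an uncorrected Cramér's V for 사업소 구분 코드 against 유량 구분 코드. Two assumptions apply: the joint counts are inferred from the marginal counts and the sample rows (every W row has flow type T, and every P row has C or S), and the report's own correlation measure may apply corrections that this plain formula does not.

```python
from math import sqrt


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Rows: 사업소 구분 코드 (P, W); columns: 유량 구분 코드 (C, S, T).
# ASSUMPTION: joint counts inferred from the report's marginal counts
# (P=5440, W=4560; C=2763, S=2677, T=4560), not read from the raw data.
table = [
    [2763, 2677, 0],
    [0, 0, 4560],
]

v = cramers_v(table)
print(round(v, 3))  # prints 1.0
```

Under that inferred table the statistic reaches its maximum of 1, which is consistent with the 1.000 entries for this pair of columns.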

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

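Index | 일자 | 사업소 구분 코드 | 사업소 구분명 | 사업소 코드 | 사업소 명 | 유량 구분 코드 | 유량 구분 명 | 측정값(톤)
468 | 20240418 | W | 수도사업소 | WW0003 | 동부 | T | 공급량 | 567048
4610 | 20231005 | P | 정수센터 | PR0407 | 영등포 | S | 송수 | 411227
2504 | 20240110 | P | 정수센터 | PR0065 | 구의 | S | 송수 | 343852
1760 | 20240216 | W | 수도사업소 | WW0011 | 강동 | T | 공급량 | 305925
3255 | 20231206 | P | 정수센터 | PR0055 | 광암 | C | 취수 | 215800
6491 | 20230708 | W | 수도사업소 | WW0010 | 강남 | T | 공급량 | 345171
11914 | 20221030 | P | 정수센터 | PR0407 | 영등포 | C | 취수 | 397370
24188 | 20210407 | P | 정수센터 | PR0710 | 강북 | C | 취수 | 669656
23860 | 20210422 | P | 정수센터 | PR0407 | 영등포 | C | 취수 | 442530
24066 | 20210413 | P | 정수센터 | PR0055 | 광암 | C | 취수 | 218100

Index | 일자 | 사업소 구분 코드 | 사업소 구분명 | 사업소 코드 | 사업소 명 | 유량 구분 코드 | 유량 구분 명 | 측정값(톤)
31713 | 20200430 | P | 정수센터 | PR0710 | 강북 | S | 송수 | 714059
27241 | 20201119 | W | 수도사업소 | WW0005 | 북부 | T | 공급량 | 310177
12093 | 20221022 | P | 정수센터 | PR0370 | 암사 | C | 취수 | 934500
18074 | 20220123 | P | 정수센터 | PR0407 | 영등포 | S | 송수 | 431571
27854 | 20201022 | W | 수도사업소 | WW0009 | 남부 | T | 공급량 | 460016
27110 | 20201125 | W | 수도사업소 | WW0003 | 동부 | T | 공급량 | 479640
14568 | 20220701 | W | 수도사업소 | WW0006 | 서부 | T | 공급량 | 331158
1268 | 20240310 | P | 정수센터 | PR0370 | 암사 | S | 송수 | 803900
16733 | 20220325 | P | 정수센터 | PR0407 | 영등포 | C | 취수 | 416650
26421 | 20201227 | P | 정수센터 | PR0055 | 광암 | S | 송수 | 212400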