Overview

Dataset statistics

Number of variables: 4
Number of observations: 203
Missing cells: 16
Missing cells (%): 2.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.9 KiB
Average record size in memory: 34.7 B

Variable types

Numeric: 2
Categorical: 1
Text: 1

Dataset

Description: Smart Water Grid (remote water meter) aggregate status
Author: 가평군 (Gapyeong-gun)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=UE5LDMA24ZVIE209NQW132062810&infSeq=1

Alerts

설치대수 is highly overall correlated with 시군명 (High correlation)
시군명 is highly overall correlated with 설치대수 (High correlation)
지역 has 16 (7.9%) missing values (Missing)
설치대수 has 10 (4.9%) zeros (Zeros)
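
The percentages in the statistics and alerts above follow from simple ratios. The sketch below rechecks them, assuming that the cell-level percentage divides by rows × columns and the column-level percentages divide by the row count. The helper name `pct` is illustrative and is not part of the report.

```python
# Counts copied from the report; the helper name `pct` is illustrative.
n_rows, n_cols = 203, 4


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_cells_pct = pct(16, n_rows * n_cols)  # missing cells over all cells
region_missing_pct = pct(16, n_rows)          # missing 지역 values over rows
zero_count_pct = pct(10, n_rows)              # zero 설치대수 values over rows

print(missing_cells_pct, region_missing_pct, zero_count_pct)
```

Under these assumptions the three values round to the 2.0%, 7.9%, and 4.9% shown in the report.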

Reproduction

Analysis started: 2023-12-10 22:33:46.296344
Analysis finished: 2023-12-10 22:33:47.003631
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
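
As a small consistency check, the reported duration can be recomputed from the two timestamps. This is only a sketch using the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-10 22:33:46.296344")
finished = datetime.fromisoformat("2023-12-10 22:33:47.003631")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```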

Variables

설치연도
Real number (ℝ)

Distinct: 10
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2020.7389
Minimum: 2013
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 2013
5-th percentile: 2018
Q1: 2020
Median: 2021
Q3: 2022
95-th percentile: 2022
Maximum: 2022
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.4944825
Coefficient of variation (CV): 0.0007395723
Kurtosis: 7.0749081
Mean: 2020.7389
Median Absolute Deviation (MAD): 1
Skewness: -2.3127422
Sum: 410210
Variance: 2.233478
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common values
Value  Count  Frequency (%)
2021  84  41.4%
2022  64  31.5%
2020  27  13.3%
2019  16  7.9%
2017  3  1.5%
2018  3  1.5%
2015  2  1.0%
2016  2  1.0%
2013  1  0.5%
2014  1  0.5%

Minimum 10 values
Value  Count  Frequency (%)
2013  1  0.5%
2014  1  0.5%
2015  2  1.0%
2016  2  1.0%
2017  3  1.5%
2018  3  1.5%
2019  16  7.9%
2020  27  13.3%
2021  84  41.4%
2022  64  31.5%

Maximum 10 values
Value  Count  Frequency (%)
2022  64  31.5%
2021  84  41.4%
2020  27  13.3%
2019  16  7.9%
2018  3  1.5%
2017  3  1.5%
2016  2  1.0%
2015  2  1.0%
2014  1  0.5%
2013  1  0.5%
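
Because 설치연도 has only ten distinct values, its descriptive statistics can be rechecked from the frequency table alone. The sketch below expands the value/count pairs back into 203 observations and recomputes the sum, mean, variance, standard deviation, and coefficient of variation. It assumes the report uses the sample variance (dividing by n - 1), which is what reproduces the reported 2.233478. The variable names are illustrative.

```python
from statistics import mean, median, stdev, variance

# Value -> count pairs copied from the 설치연도 frequency table.
year_counts = {2013: 1, 2014: 1, 2015: 2, 2016: 2, 2017: 3,
               2018: 3, 2019: 16, 2020: 27, 2021: 84, 2022: 64}

# Expand the table back into one value per observation.
years = [year for year, count in year_counts.items() for _ in range(count)]

n = len(years)
total = sum(years)
avg = mean(years)
mid = median(years)
var = variance(years)  # sample variance: divides by n - 1 (assumed convention)
std = stdev(years)     # square root of the sample variance
cv = std / avg         # coefficient of variation

print(n, total, round(avg, 4), mid, round(var, 6), round(std, 7), cv)
```

Dividing by n instead of n - 1 would give roughly 2.2225, which does not match the report, so the n - 1 convention appears to be the one in use.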

시군명
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

양주시  44
용인시  30
이천시  25
의정부시  21
동두천시  11
Other values (21)  72

Length

Max length: 4
Median length: 3
Mean length: 3.1576355
Min length: 3

Unique

Unique: 9
Unique (%): 4.4%

Sample

1st row: 군포시
2nd row: 과천시
3rd row: 수원시
4th row: 수원시
5th row: 수원시

Common Values

Value  Count  Frequency (%)
양주시  44  21.7%
용인시  30  14.8%
이천시  25  12.3%
의정부시  21  10.3%
동두천시  11  5.4%
안산시  11  5.4%
오산시  10  4.9%
가평군  7  3.4%
구리시  7  3.4%
수원시  5  2.5%
Other values (16)  32  15.8%

Length

Histogram of lengths of the category

지역
Text

MISSING 

Distinct: 132
Distinct (%): 70.6%
Missing: 16
Missing (%): 7.9%
Memory size: 1.7 KiB

Length

Max length: 13
Median length: 3
Mean length: 3.6951872
Min length: 1

Characters and Unicode

Total characters: 691
Distinct characters: 129
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
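
To illustrate what these character properties look like in practice, the sketch below classifies characters with Python's standard `unicodedata` module. The two-letter codes it returns correspond to the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Po` is Other Punctuation, and `Pd` is Dash Punctuation. The sample strings are illustrative rather than taken from the dataset, and `script_of` is a rough, hypothetical heuristic based on character names, since the standard library does not expose the Unicode script property.

```python
import unicodedata
from collections import Counter

# Illustrative strings only; they are not rows copied from the dataset.
samples = ["회천2동", "양주1동", "오산시 일대"]


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


def script_of(ch):
    """Rough script label: 'Hangul' if the character name starts with HANGUL."""
    name = unicodedata.name(ch, "")
    return "Hangul" if name.startswith("HANGUL") else "Common"


counts = category_counts(samples)
print(counts)
print(script_of("회"), script_of("2"))
```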

Unique

Unique: 107
Unique (%): 57.2%
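
The percentages and mean length reported for 지역 are consistent with using the non-missing count (203 - 16 = 187) as the denominator rather than the total row count. The sketch below rechecks that reading. It is an inference from the reported numbers, not a statement about the tool's internals, and the variable names are illustrative.

```python
# Counts copied from the 지역 section of the report.
n_rows, n_missing = 203, 16
n_present = n_rows - n_missing  # rows with a non-missing 지역 value

distinct_pct = round(100 * 132 / n_present, 1)  # distinct values
unique_pct = round(100 * 107 / n_present, 1)    # values occurring exactly once
mean_length = 691 / n_present                   # total characters per value

print(n_present, distinct_pct, unique_pct, round(mean_length, 7))
```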

Sample

1st row: 장안구
2nd row: 장안구
3rd row: 팔달구
4th row: 권선구
5th row: 영통구
Value  Count  Frequency (%)
오산시  10  4.7%
일대  10  4.7%
단원구  8  3.7%
회천2동  4  1.9%
장흥면  4  1.9%
은현면  4  1.9%
양주1동  4  1.9%
회천1동  4  1.9%
양주2동  4  1.9%
회천3동  4  1.9%
Other values (130)  159  74.0%

Most occurring characters

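Value  Count  Frequency (%)
(character lost)  130  18.8%
(character lost)  29  4.2%
(space)  28  4.1%
(character lost)  27  3.9%
(character lost)  23  3.3%
(character lost)  18  2.6%
(character lost)  16  2.3%
(character lost)  16  2.3%
(character lost)  15  2.2%
(character lost)  15  2.2%
Other values (119)  374  54.1%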

Most occurring categories

Value  Count  Frequency (%)
Other Letter  636  92.0%
Space Separator  28  4.1%
Decimal Number  24  3.5%
Other Punctuation  2  0.3%
Dash Punctuation  1  0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(character lost)  130  20.4%
(character lost)  29  4.6%
(character lost)  27  4.2%
(character lost)  23  3.6%
(character lost)  18  2.8%
(character lost)  16  2.5%
(character lost)  16  2.5%
(character lost)  15  2.4%
(character lost)  15  2.4%
(character lost)  14  2.2%
Other values (112)  333  52.4%
Decimal Number
Value  Count  Frequency (%)
2  8  33.3%
1  8  33.3%
3  4  16.7%
4  4  16.7%
Space Separator
Value  Count  Frequency (%)
(space)  28  100.0%
Other Punctuation
Value  Count  Frequency (%)
,  2  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  636  92.0%
Common  55  8.0%

Most frequent character per script

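Hangul
Value  Count  Frequency (%)
(character lost)  130  20.4%
(character lost)  29  4.6%
(character lost)  27  4.2%
(character lost)  23  3.6%
(character lost)  18  2.8%
(character lost)  16  2.5%
(character lost)  16  2.5%
(character lost)  15  2.4%
(character lost)  15  2.4%
(character lost)  14  2.2%
Other values (112)  333  52.4%
Common
Value  Count  Frequency (%)
(space)  28  50.9%
2  8  14.5%
1  8  14.5%
3  4  7.3%
4  4  7.3%
,  2  3.6%
-  1  1.8%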

Most occurring blocks

Value  Count  Frequency (%)
Hangul  636  92.0%
ASCII  55  8.0%

Most frequent character per block

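Hangul
Value  Count  Frequency (%)
(character lost)  130  20.4%
(character lost)  29  4.6%
(character lost)  27  4.2%
(character lost)  23  3.6%
(character lost)  18  2.8%
(character lost)  16  2.5%
(character lost)  16  2.5%
(character lost)  15  2.4%
(character lost)  15  2.4%
(character lost)  14  2.2%
Other values (112)  333  52.4%
ASCII
Value  Count  Frequency (%)
(space)  28  50.9%
2  8  14.5%
1  8  14.5%
3  4  7.3%
4  4  7.3%
,  2  3.6%
-  1  1.8%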

설치대수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 147
Distinct (%): 72.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 910.92118
Minimum: 0
Maximum: 28047
Zeros: 10
Zeros (%): 4.9%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 15
Median: 90
Q3: 770
95-th percentile: 4019.3
Maximum: 28047
Range: 28047
Interquartile range (IQR): 755

Descriptive statistics

Standard deviation: 2637.8285
Coefficient of variation (CV): 2.8957813
Kurtosis: 66.524904
Mean: 910.92118
Median Absolute Deviation (MAD): 88
Skewness: 7.3564465
Sum: 184917
Variance: 6958139.2
Monotonicity: Not monotonic
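
Several of the 설치대수 statistics are derived from others, so they can be cross-checked without the raw data. The sketch below recomputes the mean, range, IQR, coefficient of variation, and variance from the reported sum, count, quartiles, extremes, and standard deviation. The inputs are the rounded figures printed in the report, so small rounding differences are expected. The variable names are illustrative.

```python
# Figures copied from the 설치대수 section (rounded as printed in the report).
n = 203
total = 184917
std = 2637.8285
q1, q3 = 15, 770
minimum, maximum = 0, 28047

mean = total / n                 # mean = sum / number of observations
value_range = maximum - minimum  # range = maximum - minimum
iqr = q3 - q1                    # interquartile range = Q3 - Q1
cv = std / mean                  # coefficient of variation = std / mean
variance = std ** 2              # variance = squared standard deviation

print(round(mean, 5), value_range, iqr, round(cv, 7), round(variance, 1))
```

A CV close to 2.9, together with a median of 90 against a mean near 911, reflects how strongly a few very large installation counts pull the mean upward.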
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  10  4.9%
1  7  3.4%
7  5  2.5%
2  5  2.5%
4  4  2.0%
9  4  2.0%
8  3  1.5%
10  3  1.5%
3  3  1.5%
6  3  1.5%
Other values (137)  156  76.8%

Minimum 10 values
Value  Count  Frequency (%)
0  10  4.9%
1  7  3.4%
2  5  2.5%
3  3  1.5%
4  4  2.0%
5  1  0.5%
6  3  1.5%
7  5  2.5%
8  3  1.5%
9  4  2.0%

Maximum 10 values
Value  Count  Frequency (%)
28047  1  0.5%
19293  1  0.5%
7263  1  0.5%
5442  1  0.5%
5231  1  0.5%
5221  1  0.5%
5078  1  0.5%
4751  1  0.5%
4215  1  0.5%
4128  1  0.5%

Interactions


Correlations

          설치연도  시군명  설치대수
설치연도  1.000  0.725  0.000
시군명  0.725  1.000  0.963
설치대수  0.000  0.963  1.000

          설치연도  설치대수  시군명
설치연도  1.000  0.079  0.326
설치대수  0.079  1.000  0.804
시군명  0.326  0.804  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
     설치연도  시군명  지역  설치대수
0  2019  군포시  <NA>  427
1  2021  과천시  <NA>  1830
2  2020  수원시  장안구  426
3  2021  수원시  장안구  96
4  2021  수원시  팔달구  580
5  2021  수원시  권선구  727
6  2021  수원시  영통구  554
7  2019  부천시  부천시  2995
8  2021  부천시  부천시  28047
9  2021  평택시  <NA>  3761

Last rows
     설치연도  시군명  지역  설치대수
193  2019  양주시  은현면  6
194  2019  양주시  양주1동  0
195  2019  양주시  양주2동  1
196  2019  양주시  장흥면  2
197  2019  양주시  회천1동  4
198  2019  양주시  회천2동  3
199  2019  양주시  회천3동  2
200  2019  양주시  회천4동  0
201  2020  양주시  광적면  6
202  2020  양주시  백석읍  21