Overview

Dataset statistics

Number of variables: 6
Number of observations: 33
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.8 KiB
Average record size in memory: 55.0 B

Variable types

Numeric: 3
Text: 2
Categorical: 1

Dataset

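Description: Status of rainwater utilization facilities in Uijeongbu City (의정부시), with the columns 연번 (serial number), 건물명 (building name), 설치위치 (installation location), 용량(세제곱미터) (capacity in cubic meters), 용도 (use), and 설치년도 (installation year). The collected rainwater is used for landscaping (조경용수), cleaning (청소용수), and toilet flushing (화장실세정수).
URL: https://www.data.go.kr/data/15114047/fileData.do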

Alerts

연번 is highly overall correlated with 설치년도 (High correlation)
설치년도 is highly overall correlated with 연번 and 1 other field (High correlation)
용도 is highly overall correlated with 설치년도 (High correlation)
용도 is highly imbalanced (58.9%) (Imbalance)
연번 has unique values (Unique)
건물명 has unique values (Unique)
설치위치 has unique values (Unique)
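
The 58.9% imbalance figure for 용도 can be reproduced from the category counts reported for that variable (28, 3, 1, 1). The sketch below assumes the score is one minus the Shannon entropy of the class shares divided by the maximum entropy for that number of classes. The function name `imbalance_score` is illustrative and not taken from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if len(shares) < 2:
        return 0.0  # assumed convention: a single class is treated as score 0
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(len(shares))

# Counts of 용도 from the report, in order:
# 조경용수, 조경용수+청소용수, 화장실세정수, 화장실세정수+조경용수
score = imbalance_score([28, 3, 1, 1])
print(f"{score:.1%}")  # prints 58.9%
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates.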

Reproduction

Analysis started: 2023-12-12 12:08:32.616127
Analysis finished: 2023-12-12 12:08:34.061417
Duration: 1.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
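
A report like this one could be regenerated roughly as follows. This is a sketch under several assumptions: the CSV has been downloaded from the URL above to a local file, the file is encoded as cp949 (common for files from data.go.kr but not confirmed here), and the report title is illustrative. `build_report` and the file names are hypothetical.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: cp949 encoding; try encoding="utf-8" if decoding fails.
    df = pd.read_csv(csv_path, encoding="cp949")
    profile = ProfileReport(df, title="의정부시 빗물이용시설현황")  # illustrative title
    profile.to_file(html_path)
    return profile

# Usage (hypothetical local file name):
# build_report("rainwater_facilities.csv")
```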

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17
Minimum: 1
Maximum: 33
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 1
5th percentile: 2.6
Q1: 9
Median: 17
Q3: 25
95th percentile: 31.4
Maximum: 33
Range: 32
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 9.6695398
Coefficient of variation (CV): 0.56879646
Kurtosis: -1.2
Mean: 17
Median Absolute Deviation (MAD): 8
Skewness: 0
Sum: 561
Variance: 93.5
Monotonicity: Strictly increasing
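
Because 연번 is simply the sequence 1 to 33, its statistics can be checked by hand. The sketch below uses only the standard library and assumes the sample (n - 1) variance and linearly interpolated percentiles, which is what reproduces the reported values. The helper `percentile` is illustrative.

```python
import statistics

values = list(range(1, 34))  # 연번 runs from 1 to 33 with no gaps

def percentile(sorted_values, q):
    """Linearly interpolated percentile of sorted data, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    step = sorted_values[upper] - sorted_values[lower]
    return sorted_values[lower] + (pos - lower) * step

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

print(sum(values), mean, variance, round(std, 7), round(cv, 8), mad)
print(round(percentile(values, 0.05), 1), round(percentile(values, 0.95), 1))
```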
Histogram with fixed size bins (bins=33)

Value | Count | Frequency (%)
1 | 1 | 3.0%
26 | 1 | 3.0%
20 | 1 | 3.0%
21 | 1 | 3.0%
22 | 1 | 3.0%
23 | 1 | 3.0%
24 | 1 | 3.0%
25 | 1 | 3.0%
27 | 1 | 3.0%
2 | 1 | 3.0%
Other values (23) | 23 | 69.7%

Value | Count | Frequency (%)
1 | 1 | 3.0%
2 | 1 | 3.0%
3 | 1 | 3.0%
4 | 1 | 3.0%
5 | 1 | 3.0%
6 | 1 | 3.0%
7 | 1 | 3.0%
8 | 1 | 3.0%
9 | 1 | 3.0%
10 | 1 | 3.0%

Value | Count | Frequency (%)
33 | 1 | 3.0%
32 | 1 | 3.0%
31 | 1 | 3.0%
30 | 1 | 3.0%
29 | 1 | 3.0%
28 | 1 | 3.0%
27 | 1 | 3.0%
26 | 1 | 3.0%
25 | 1 | 3.0%
24 | 1 | 3.0%

건물명
Text

UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 21
Median length: 15
Mean length: 11.666667
Min length: 4

Characters and Unicode

Total characters: 385
Distinct characters: 147
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
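
As an illustration of these character properties, the sketch below counts Unicode general categories for one building name from this dataset using Python's standard `unicodedata` module. The two-letter codes map to the names used in this report: `Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number, `Pd` is Dash Punctuation, `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Ps` is Open Punctuation, and `Pe` is Close Punctuation. The helper `category_counts` is illustrative, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories of the characters in text."""
    # Normalize so Hangul syllables are counted as single precomposed characters.
    text = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in text)

name = "고산지구 3단지 (계룡건설)"  # one 건물명 value from the sample rows
counts = category_counts(name)
for code, n in counts.most_common():
    print(code, n)
```

For this name the Hangul syllables fall under `Lo`, the spaces under `Zs`, the digit under `Nd`, and the parentheses under `Ps` and `Pe`.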

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: 효자중학교
2nd row: 동암중학교
3rd row: 발곡고등학교
4th row: 의정부공업고등학교
5th row: 의정부역사

Value | Count | Frequency (%)
베르디움 | 3 | 4.2%
호반 | 3 | 4.2%
증축 | 2 | 2.8%
별관 | 2 | 2.8%
1단지 | 2 | 2.8%
직동-롯데캐슬골드파크 | 2 | 2.8%
북부청사 | 2 | 2.8%
효자중학교 | 1 | 1.4%
부속병원 | 1 | 1.4%
대방노블랜드 | 1 | 1.4%
Other values (53) | 53 | 73.6%

Most occurring characters

ValueCountFrequency (%)
43
 
11.2%
12
 
3.1%
10
 
2.6%
8
 
2.1%
- 7
 
1.8%
7
 
1.8%
7
 
1.8%
6
 
1.6%
6
 
1.6%
6
 
1.6%
Other values (137) 273
70.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 311 | 80.8%
Space Separator | 43 | 11.2%
Decimal Number | 12 | 3.1%
Dash Punctuation | 7 | 1.8%
Uppercase Letter | 4 | 1.0%
Open Punctuation | 3 | 0.8%
Close Punctuation | 3 | 0.8%
Lowercase Letter | 2 | 0.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
12
 
3.9%
10
 
3.2%
8
 
2.6%
7
 
2.3%
7
 
2.3%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
Other values (126) 237
76.2%
Decimal Number
ValueCountFrequency (%)
1 5
41.7%
3 3
25.0%
2 3
25.0%
8 1
 
8.3%
Uppercase Letter
ValueCountFrequency (%)
B 2
50.0%
L 2
50.0%
Space Separator
ValueCountFrequency (%)
43
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Lowercase Letter
ValueCountFrequency (%)
e 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 311 | 80.8%
Common | 68 | 17.7%
Latin | 6 | 1.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
12
 
3.9%
10
 
3.2%
8
 
2.6%
7
 
2.3%
7
 
2.3%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
Other values (126) 237
76.2%
Common
ValueCountFrequency (%)
43
63.2%
- 7
 
10.3%
1 5
 
7.4%
3 3
 
4.4%
2 3
 
4.4%
( 3
 
4.4%
) 3
 
4.4%
8 1
 
1.5%
Latin
ValueCountFrequency (%)
B 2
33.3%
L 2
33.3%
e 2
33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 311 | 80.8%
ASCII | 74 | 19.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
43
58.1%
- 7
 
9.5%
1 5
 
6.8%
3 3
 
4.1%
2 3
 
4.1%
( 3
 
4.1%
) 3
 
4.1%
B 2
 
2.7%
L 2
 
2.7%
e 2
 
2.7%
Hangul
ValueCountFrequency (%)
12
 
3.9%
10
 
3.2%
8
 
2.6%
7
 
2.3%
7
 
2.3%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
6
 
1.9%
Other values (126) 237
76.2%

설치위치
Text

UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 17
Median length: 11
Mean length: 8.3333333
Min length: 5

Characters and Unicode

Total characters: 275
Distinct characters: 58
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: 부용로185번길15
2nd row: 장곡로226번길 132
3rd row: 동일로454번길 150
4th row: 가능로 57
5th row: 시민로62
ValueCountFrequency (%)
2
 
3.3%
시민로 2
 
3.3%
1 2
 
3.3%
민락로 2
 
3.3%
용현동 2
 
3.3%
고산 2
 
3.3%
부용로185번길15 1
 
1.6%
금오동 1
 
1.6%
산25-36 1
 
1.6%
19필지 1
 
1.6%
Other values (45) 45
73.8%

Most occurring characters

ValueCountFrequency (%)
30
 
10.9%
23
 
8.4%
1 18
 
6.5%
3 14
 
5.1%
5 14
 
5.1%
2 13
 
4.7%
11
 
4.0%
10
 
3.6%
4 9
 
3.3%
6 8
 
2.9%
Other values (48) 125
45.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 137 | 49.8%
Decimal Number | 102 | 37.1%
Space Separator | 30 | 10.9%
Dash Punctuation | 4 | 1.5%
Uppercase Letter | 2 | 0.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
23
16.8%
11
 
8.0%
10
 
7.3%
8
 
5.8%
7
 
5.1%
6
 
4.4%
5
 
3.6%
4
 
2.9%
4
 
2.9%
3
 
2.2%
Other values (34) 56
40.9%
Decimal Number
ValueCountFrequency (%)
1 18
17.6%
3 14
13.7%
5 14
13.7%
2 13
12.7%
4 9
8.8%
6 8
7.8%
0 8
7.8%
9 8
7.8%
8 6
 
5.9%
7 4
 
3.9%
Uppercase Letter
ValueCountFrequency (%)
S 1
50.0%
C 1
50.0%
Space Separator
ValueCountFrequency (%)
30
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 137 | 49.8%
Common | 136 | 49.5%
Latin | 2 | 0.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
23
16.8%
11
 
8.0%
10
 
7.3%
8
 
5.8%
7
 
5.1%
6
 
4.4%
5
 
3.6%
4
 
2.9%
4
 
2.9%
3
 
2.2%
Other values (34) 56
40.9%
Common
ValueCountFrequency (%)
30
22.1%
1 18
13.2%
3 14
10.3%
5 14
10.3%
2 13
9.6%
4 9
 
6.6%
6 8
 
5.9%
0 8
 
5.9%
9 8
 
5.9%
8 6
 
4.4%
Other values (2) 8
 
5.9%
Latin
ValueCountFrequency (%)
S 1
50.0%
C 1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 138 | 50.2%
Hangul | 137 | 49.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
30
21.7%
1 18
13.0%
3 14
10.1%
5 14
10.1%
2 13
9.4%
4 9
 
6.5%
6 8
 
5.8%
0 8
 
5.8%
9 8
 
5.8%
8 6
 
4.3%
Other values (4) 10
 
7.2%
Hangul
ValueCountFrequency (%)
23
16.8%
11
 
8.0%
10
 
7.3%
8
 
5.8%
7
 
5.1%
6
 
4.4%
5
 
3.6%
4
 
2.9%
4
 
2.9%
3
 
2.2%
Other values (34) 56
40.9%

용량(세제곱미터)
Real number (ℝ)

Distinct: 31
Distinct (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 300.85515
Minimum: 11
Maximum: 1203
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 11
5th percentile: 16.8
Q1: 88
Median: 230
Q3: 453.64
95th percentile: 778.58
Maximum: 1203
Range: 1192
Interquartile range (IQR): 365.64

Descriptive statistics

Standard deviation: 272.81924
Coefficient of variation (CV): 0.90681258
Kurtosis: 2.3616806
Mean: 300.85515
Median Absolute Deviation (MAD): 161
Skewness: 1.3821514
Sum: 9928.22
Variance: 74430.336
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Value | Count | Frequency (%)
20.0 | 2 | 6.1%
300.0 | 2 | 6.1%
12.0 | 1 | 3.0%
1203.0 | 1 | 3.0%
88.0 | 1 | 3.0%
784.4 | 1 | 3.0%
65.0 | 1 | 3.0%
538.0 | 1 | 3.0%
69.0 | 1 | 3.0%
220.0 | 1 | 3.0%
Other values (21) | 21 | 63.6%

Value | Count | Frequency (%)
11.0 | 1 | 3.0%
12.0 | 1 | 3.0%
20.0 | 2 | 6.1%
27.0 | 1 | 3.0%
65.0 | 1 | 3.0%
69.0 | 1 | 3.0%
75.0 | 1 | 3.0%
88.0 | 1 | 3.0%
100.0 | 1 | 3.0%
103.8 | 1 | 3.0%

Value | Count | Frequency (%)
1203.0 | 1 | 3.0%
784.4 | 1 | 3.0%
774.7 | 1 | 3.0%
570.0 | 1 | 3.0%
542.0 | 1 | 3.0%
538.0 | 1 | 3.0%
528.53 | 1 | 3.0%
517.45 | 1 | 3.0%
453.64 | 1 | 3.0%
445.2 | 1 | 3.0%

용도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 12.1%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 11
Median length: 4
Mean length: 4.7272727
Min length: 4

Unique

Unique: 2
Unique (%): 6.1%

Sample

1st row: 조경용수
2nd row: 조경용수
3rd row: 조경용수
4th row: 조경용수
5th row: 화장실세정수

Common Values

Value | Count | Frequency (%)
조경용수 | 28 | 84.8%
조경용수+청소용수 | 3 | 9.1%
화장실세정수 | 1 | 3.0%
화장실세정수+조경용수 | 1 | 3.0%

Length

Histogram of lengths of the category


설치년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.9091
Minimum: 2003
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 2003
5th percentile: 2008.6
Q1: 2016
Median: 2018
Q3: 2020
95th percentile: 2021
Maximum: 2022
Range: 19
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.3615052
Coefficient of variation (CV): 0.0021624699
Kurtosis: 2.2360392
Mean: 2016.9091
Median Absolute Deviation (MAD): 2
Skewness: -1.5440521
Sum: 66558
Variance: 19.022727
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Value | Count | Frequency (%)
2020 | 8 | 24.2%
2018 | 6 | 18.2%
2017 | 4 | 12.1%
2021 | 3 | 9.1%
2014 | 2 | 6.1%
2019 | 2 | 6.1%
2003 | 1 | 3.0%
2008 | 1 | 3.0%
2009 | 1 | 3.0%
2010 | 1 | 3.0%
Other values (4) | 4 | 12.1%

Value | Count | Frequency (%)
2003 | 1 | 3.0%
2008 | 1 | 3.0%
2009 | 1 | 3.0%
2010 | 1 | 3.0%
2012 | 1 | 3.0%
2013 | 1 | 3.0%
2014 | 2 | 6.1%
2016 | 1 | 3.0%
2017 | 4 | 12.1%
2018 | 6 | 18.2%

Value | Count | Frequency (%)
2022 | 1 | 3.0%
2021 | 3 | 9.1%
2020 | 8 | 24.2%
2019 | 2 | 6.1%
2018 | 6 | 18.2%
2017 | 4 | 12.1%
2016 | 1 | 3.0%
2014 | 2 | 6.1%
2013 | 1 | 3.0%
2012 | 1 | 3.0%
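
The smallest-values and largest-values tables for 설치년도 together cover all 14 distinct years, so the full distribution can be rebuilt from them and the summary statistics cross-checked. The sketch below uses only the standard library and assumes the sample (n - 1) variance. The name `year_counts` is illustrative.

```python
import statistics

# Year -> count, merged from the two extreme-value tables for 설치년도.
year_counts = {
    2003: 1, 2008: 1, 2009: 1, 2010: 1, 2012: 1, 2013: 1, 2014: 2,
    2016: 1, 2017: 4, 2018: 6, 2019: 2, 2020: 8, 2021: 3, 2022: 1,
}

# Expand the counts back into one value per observation.
years = sorted(y for y, n in year_counts.items() for _ in range(n))

total = sum(years)
mean = statistics.mean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = statistics.stdev(years)
cv = std / mean
median = statistics.median(years)

print(len(years), total, round(mean, 4), round(variance, 6), median)
```

The rebuilt series has 33 observations and a sum of 66558, matching the figures reported above.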

Interactions


Correlations

 | 연번 | 건물명 | 설치위치 | 용량(세제곱미터) | 용도 | 설치년도
연번 | 1.000 | 1.000 | 1.000 | 0.359 | 0.584 | 0.838
건물명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
설치위치 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
용량(세제곱미터) | 0.359 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
용도 | 0.584 | 1.000 | 1.000 | 0.000 | 1.000 | 0.906
설치년도 | 0.838 | 1.000 | 1.000 | 0.000 | 0.906 | 1.000

 | 연번 | 용량(세제곱미터) | 설치년도 | 용도
연번 | 1.000 | 0.485 | 0.980 | 0.336
용량(세제곱미터) | 0.485 | 1.000 | 0.499 | 0.000
설치년도 | 0.980 | 0.499 | 1.000 | 0.537
용도 | 0.336 | 0.000 | 0.537 | 1.000
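
The three high-correlation alerts at the top of this report line up with the four-variable matrix above if every off-diagonal coefficient greater than 0.5 is flagged. The cutoff of 0.5 is an assumption chosen because it reproduces the alerts; it is not stated in the report. The function `high_correlations` is illustrative.

```python
columns = ["연번", "용량(세제곱미터)", "설치년도", "용도"]
matrix = [
    [1.000, 0.485, 0.980, 0.336],
    [0.485, 1.000, 0.499, 0.000],
    [0.980, 0.499, 1.000, 0.537],
    [0.336, 0.000, 0.537, 1.000],
]
THRESHOLD = 0.5  # assumed cutoff, not taken from the report

def high_correlations(columns, matrix, threshold):
    """Map each column to the other columns whose |coefficient| exceeds threshold."""
    flagged = {}
    for i, name in enumerate(columns):
        partners = [
            columns[j]
            for j, value in enumerate(matrix[i])
            if j != i and abs(value) > threshold
        ]
        if partners:
            flagged[name] = partners
    return flagged

flagged = high_correlations(columns, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```

Under this cutoff 용량(세제곱미터) is not flagged, since its largest coefficient (0.499 with 설치년도) falls just below 0.5.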

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

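index | 연번 | 건물명 | 설치위치 | 용량(세제곱미터) | 용도 | 설치년도
0 | 1 | 효자중학교 | 부용로185번길15 | 12.0 | 조경용수 | 2003
1 | 2 | 동암중학교 | 장곡로226번길 132 | 20.0 | 조경용수 | 2008
2 | 3 | 발곡고등학교 | 동일로454번길 150 | 20.0 | 조경용수 | 2009
3 | 4 | 의정부공업고등학교 | 가능로 57 | 27.0 | 조경용수 | 2010
4 | 5 | 의정부역사 | 시민로62 | 240.0 | 화장실세정수 | 2012
5 | 6 | 송민학교 | 민락로 262 | 198.0 | 조경용수 | 2013
6 | 7 | 코스트코 홀세일 의정부점 | 용민로489번길 9 | 11.0 | 조경용수 | 2014
7 | 8 | 경기도교육청 북부청사 | 동일로700 | 230.0 | 화장실세정수+조경용수 | 2014
8 | 9 | 한국전력공사 경기북부지역본부 | 용민로19번길 80 | 172.5 | 조경용수 | 2016
9 | 10 | 호반 베르디움 1차 | 민락로211 | 774.7 | 조경용수+청소용수 | 2017

index | 연번 | 건물명 | 설치위치 | 용량(세제곱미터) | 용도 | 설치년도
23 | 24 | 송학글래드스톤 앤 그레이스모나코 | 천보로 14 | 453.64 | 조경용수 | 2020
24 | 25 | 을지대 캠퍼스 및 부속병원 | 금오동 439-38 외 35 | 220.0 | 조경용수 | 2020
25 | 26 | 고산 대방노블랜드 | 고산 C5블럭 | 300.0 | 조경용수 | 2020
26 | 27 | 용현산업단지 기업지원센터 | 용현동 524-3번지 | 69.0 | 조경용수 | 2020
27 | 28 | 고산지구 3단지 (계룡건설) | 고산 S-3블럭 | 538.0 | 조경용수 | 2020
28 | 29 | 성암문화 체육비전센터 | 용현동 552번지 | 65.0 | 조경용수 | 2020
29 | 30 | 가능더샵 파크이비뉴 | 가능생활권2구역 | 300.0 | 조경용수 | 2021
30 | 31 | 탑석 센트럴자이 | 송산생활권1구역 | 784.4 | 조경용수 | 2021
31 | 32 | 송산3동 공공복합청사 | 낙양동 750번지 | 88.0 | 조경용수 | 2021
32 | 33 | 의정부역 센트럴 자이 위브캐슬 | 의정부동 380번지 | 1203.0 | 조경용수 | 2022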