Overview

Dataset statistics

Number of variables: 13
Number of observations: 25
Missing cells: 50
Missing cells (%): 15.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.8 KiB
Average record size in memory: 114.3 B
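
The missing-cell percentage follows directly from the counts above: 50 missing cells out of 13 × 25 = 325 cells. A minimal check of that arithmetic, with illustrative variable names:

```python
# Figures taken from the dataset statistics above.
n_variables = 13
n_observations = 25
missing_cells = 50

total_cells = n_variables * n_observations       # 325 cells in the table
missing_pct = 100 * missing_cells / total_cells  # share of empty cells

print(f"{missing_pct:.1f}%")  # prints 15.4%
```

The 50 missing cells correspond to the two fully empty columns (2 × 25) flagged in the alerts.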

Variable types

Text: 2
Unsupported: 2
Categorical: 9

Dataset

Description: File download
Author: Gangnam-gu (강남구)
URL: https://data.seoul.go.kr/dataList/OA-15010/S/1/datasetView.do

Alerts

관리기관전화번호 has constant value "" (Constant)
관리기관명 has constant value "" (Constant)
데이터기준일자 has constant value "" (Constant)
위도 is highly overall correlated with 경도 and 4 other fields (High correlation)
일평균이용인구수 is highly overall correlated with 위도 and 1 other fields (High correlation)
부적합항목 is highly overall correlated with 위도 and 3 other fields (High correlation)
수질검사결과구분 is highly overall correlated with 위도 and 2 other fields (High correlation)
경도 is highly overall correlated with 위도 and 4 other fields (High correlation)
수질검사일자 is highly overall correlated with 위도 and 2 other fields (High correlation)
위도 is highly imbalanced (69.6%) (Imbalance)
경도 is highly imbalanced (69.6%) (Imbalance)
소재지도로명주소 has 25 (100.0%) missing values (Missing)
지정일자 has 25 (100.0%) missing values (Missing)
약수터명 has unique values (Unique)
소재지도로명주소 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
지정일자 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
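
The 69.6% imbalance figure for 위도 and 경도 can be reproduced from the value counts reported for those columns (23, 1, 1) if the score is taken to be one minus the normalized Shannon entropy. That definition is an assumption here, checked only against this report's numbers, and the function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform column, closer to 1 as one value dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # the ratio is undefined with a single class; treated as 0 in this sketch
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)

# 위도: the "<NA>" placeholder appears 23 times, each coordinate once.
score = imbalance_score([23, 1, 1])
print(f"{100 * score:.1f}%")
```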

Reproduction

Analysis started: 2023-12-11 05:29:42.249571
Analysis finished: 2023-12-11 05:29:42.963334
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

약수터명
Text

UNIQUE 

Distinct25
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size332.0 B

Length

Max length4
Median length2
Mean length2.56
Min length2

Characters and Unicode

Total characters64
Distinct characters40
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
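
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for one name from this dataset, 구룡천1. The abbreviations `Lo` and `Nd` correspond to the "Other Letter" and "Decimal Number" labels used in the tables that follow.

```python
import unicodedata
from collections import Counter

name = "구룡천1"  # one 약수터명 value from the sample rows

# General category per character: "Lo" = Other Letter, "Nd" = Decimal Number.
categories = Counter(unicodedata.category(ch) for ch in name)
print(dict(categories))

# Hangul syllables carry Unicode names beginning with "HANGUL SYLLABLE".
is_hangul = [unicodedata.name(ch).startswith("HANGUL SYLLABLE") for ch in name]
print(is_hangul)
```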

Unique

Unique25 ?
Unique (%)100.0%

Sample

1st row개암
2nd row천의
3rd row대룡
4th row대천
5th row구룡산
Value | Count | Frequency (%)
개암 | 1 | 4.0%
대모천 | 1 | 4.0%
불국사 | 1 | 4.0%
용두천 | 1 | 4.0%
인수천 | 1 | 4.0%
임록천 | 1 | 4.0%
못골 | 1 | 4.0%
옥수천 | 1 | 4.0%
실로암 | 1 | 4.0%
성지 | 1 | 4.0%
Other values (15) | 15 | 60.0%

Most occurring characters

ValueCountFrequency (%)
9
 
14.1%
4
 
6.2%
3
 
4.7%
3
 
4.7%
3
 
4.7%
3
 
4.7%
2
 
3.1%
2
 
3.1%
2
 
3.1%
1 2
 
3.1%
Other values (30) 31
48.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 60 | 93.8%
Decimal Number | 4 | 6.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9
 
15.0%
4
 
6.7%
3
 
5.0%
3
 
5.0%
3
 
5.0%
3
 
5.0%
2
 
3.3%
2
 
3.3%
2
 
3.3%
1
 
1.7%
Other values (28) 28
46.7%
Decimal Number
ValueCountFrequency (%)
1 2
50.0%
2 2
50.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 60 | 93.8%
Common | 4 | 6.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9
 
15.0%
4
 
6.7%
3
 
5.0%
3
 
5.0%
3
 
5.0%
3
 
5.0%
2
 
3.3%
2
 
3.3%
2
 
3.3%
1
 
1.7%
Other values (28) 28
46.7%
Common
ValueCountFrequency (%)
1 2
50.0%
2 2
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 60 | 93.8%
ASCII | 4 | 6.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
9
 
15.0%
4
 
6.7%
3
 
5.0%
3
 
5.0%
3
 
5.0%
3
 
5.0%
2
 
3.3%
2
 
3.3%
2
 
3.3%
1
 
1.7%
Other values (28) 28
46.7%
ASCII
ValueCountFrequency (%)
1 2
50.0%
2 2
50.0%

소재지도로명주소
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing25
Missing (%)100.0%
Memory size357.0 B

소재지지번주소
Text

Distinct23
Distinct (%)92.0%
Missing0
Missing (%)0.0%
Memory size332.0 B

Length

Max length20
Median length20
Mean length19.32
Min length17

Characters and Unicode

Total characters483
Distinct characters34
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique21 ?
Unique (%)84.0%
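
The gap between 23 distinct and 21 unique values arises because "distinct" counts every different value, while "unique" counts only values that occur exactly once. The sketch below illustrates the distinction on a short hypothetical list modelled on the sample rows, where 산53-28 and 산63-32 each appear twice. It is not the full column.

```python
from collections import Counter

# Hypothetical excerpt: two lot numbers repeat, two occur once.
lots = ["산53-28", "산63-32", "산53-28", "산63-32", "산53-20", "산63-51"]

counts = Counter(lots)
distinct = len(counts)                              # different values
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(distinct, unique)  # prints: 4 2
```

Applied to the full column, 21 unique values plus 2 values that each occur twice give 23 distinct values over 25 rows.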

Sample

1st row서울특별시 강남구 개포동 산53-20
2nd row서울특별시 강남구 개포동 산53-30
3rd row서울특별시 강남구 개포동 118-21
4th row서울특별시 강남구 개포동 산53-31
5th row서울특별시 강남구 개포동 1017-7
Value | Count | Frequency (%)
서울특별시 | 25 | 25.0%
강남구 | 25 | 25.0%
개포동 | 11 | 11.0%
일원동 | 7 | 7.0%
산53-28 | 2 | 2.0%
산63-32 | 2 | 2.0%
자곡동 | 2 | 2.0%
개포당 | 1 | 1.0%
산63-51 | 1 | 1.0%
산53-32 | 1 | 1.0%
Other values (23) | 23 | 23.0%

Most occurring characters

ValueCountFrequency (%)
75
15.5%
26
 
5.4%
25
 
5.2%
25
 
5.2%
25
 
5.2%
25
 
5.2%
25
 
5.2%
25
 
5.2%
25
 
5.2%
24
 
5.0%
Other values (24) 183
37.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 294 | 60.9%
Decimal Number | 93 | 19.3%
Space Separator | 75 | 15.5%
Dash Punctuation | 21 | 4.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
26
8.8%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
24
8.2%
19
 
6.5%
Other values (12) 50
17.0%
Decimal Number
ValueCountFrequency (%)
3 23
24.7%
1 17
18.3%
2 14
15.1%
5 12
12.9%
4 7
 
7.5%
6 7
 
7.5%
0 4
 
4.3%
9 3
 
3.2%
8 3
 
3.2%
7 3
 
3.2%
Space Separator
ValueCountFrequency (%)
75
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 294 | 60.9%
Common | 189 | 39.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
26
8.8%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
24
8.2%
19
 
6.5%
Other values (12) 50
17.0%
Common
ValueCountFrequency (%)
75
39.7%
3 23
 
12.2%
- 21
 
11.1%
1 17
 
9.0%
2 14
 
7.4%
5 12
 
6.3%
4 7
 
3.7%
6 7
 
3.7%
0 4
 
2.1%
9 3
 
1.6%
Other values (2) 6
 
3.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 294 | 60.9%
ASCII | 189 | 39.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
75
39.7%
3 23
 
12.2%
- 21
 
11.1%
1 17
 
9.0%
2 14
 
7.4%
5 12
 
6.3%
4 7
 
3.7%
6 7
 
3.7%
0 4
 
2.1%
9 3
 
1.6%
Other values (2) 6
 
3.2%
Hangul
ValueCountFrequency (%)
26
8.8%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
25
8.5%
24
8.2%
19
 
6.5%
Other values (12) 50
17.0%

위도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct3
Distinct (%)12.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
<NA> | 23
37.47711586 | 1
37.52079791 | 1

Length

Max length11
Median length4
Mean length4.56
Min length4
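
These length statistics indicate that the 23 `<NA>` entries are stored as four-character strings rather than as missing values, which is consistent with the column reporting 0 missing cells. A short check, using the value counts reported for 위도, reproduces the mean length of 4.56 and shows the missing share if the placeholder were treated as missing:

```python
# Value counts reported for 위도 in this section.
value_counts = {"<NA>": 23, "37.47711586": 1, "37.52079791": 1}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows  # (23 * 4 + 2 * 11) / 25

# Treating the "<NA>" placeholder as truly missing changes the picture.
real_values = {v: c for v, c in value_counts.items() if v != "<NA>"}
missing_pct = 100 * (n_rows - sum(real_values.values())) / n_rows

print(mean_length, missing_pct)
```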

Unique

Unique2 ?
Unique (%)8.0%

Sample

1st row<NA>
2nd row<NA>
3rd row<NA>
4th row<NA>
5th row<NA>

Common Values

Value | Count | Frequency (%)
<NA> | 23 | 92.0%
37.47711586 | 1 | 4.0%
37.52079791 | 1 | 4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 23 | 92.0%
37.47711586 | 1 | 4.0%
37.52079791 | 1 | 4.0%

경도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct3
Distinct (%)12.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
<NA> | 23
127.0518861 | 1
127.0542756 | 1

Length

Max length11
Median length4
Mean length4.56
Min length4

Unique

Unique2 ?
Unique (%)8.0%

Sample

1st row<NA>
2nd row<NA>
3rd row<NA>
4th row<NA>
5th row<NA>

Common Values

Value | Count | Frequency (%)
<NA> | 23 | 92.0%
127.0518861 | 1 | 4.0%
127.0542756 | 1 | 4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 23 | 92.0%
127.0518861 | 1 | 4.0%
127.0542756 | 1 | 4.0%

지정일자
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing25
Missing (%)100.0%
Memory size357.0 B

일평균이용인구수
Categorical

HIGH CORRELATION 

Distinct5
Distinct (%)20.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
150 | 7
100 | 7
200 | 6
300 | 3
250 | 2

Length

Max length3
Median length3
Mean length3
Min length3

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row150
2nd row200
3rd row150
4th row100
5th row150

Common Values

Value | Count | Frequency (%)
150 | 7 | 28.0%
100 | 7 | 28.0%
200 | 6 | 24.0%
300 | 3 | 12.0%
250 | 2 | 8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
150 | 7 | 28.0%
100 | 7 | 28.0%
200 | 6 | 24.0%
300 | 3 | 12.0%
250 | 2 | 8.0%

수질검사일자
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)8.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
2020-07-06 | 22
2020-07-07 | 3

Length

Max length10
Median length10
Mean length10
Min length10

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2020-07-06
2nd row2020-07-06
3rd row2020-07-06
4th row2020-07-06
5th row2020-07-06

Common Values

Value | Count | Frequency (%)
2020-07-06 | 22 | 88.0%
2020-07-07 | 3 | 12.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2020-07-06 | 22 | 88.0%
2020-07-07 | 3 | 12.0%

수질검사결과구분
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)8.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
부적합 | 18
적합 | 7

Length

Max length3
Median length3
Mean length2.72
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row부적합
2nd row부적합
3rd row부적합
4th row적합
5th row부적합

Common Values

Value | Count | Frequency (%)
부적합 | 18 | 72.0%
적합 | 7 | 28.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
부적합 | 18 | 72.0%
적합 | 7 | 28.0%

부적합항목
Categorical

HIGH CORRELATION 

Distinct5
Distinct (%)20.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
총대장균군 검출 | 11
<NA> | 7
미검사 | 3
일반세균,대장균군,총대장균군검출 | 3
총대장균군, 대장균군 검출 | 1

Length

Max length17
Median length14
Mean length7.6
Min length3

Unique

Unique1 ?
Unique (%)4.0%

Sample

1st row총대장균군, 대장균군 검출
2nd row총대장균군 검출
3rd row총대장균군 검출
4th row<NA>
5th row총대장균군 검출

Common Values

Value | Count | Frequency (%)
총대장균군 검출 | 11 | 44.0%
<NA> | 7 | 28.0%
미검사 | 3 | 12.0%
일반세균,대장균군,총대장균군검출 | 3 | 12.0%
총대장균군, 대장균군 검출 | 1 | 4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
총대장균군 | 12 | 31.6%
검출 | 12 | 31.6%
na | 7 | 18.4%
미검사 | 3 | 7.9%
일반세균,대장균군,총대장균군검출 | 3 | 7.9%
대장균군 | 1 | 2.6%

관리기관전화번호
Categorical

CONSTANT 

Distinct1
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
02-3423-6284 | 25

Length

Max length12
Median length12
Mean length12
Min length12

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row02-3423-6284
2nd row02-3423-6284
3rd row02-3423-6284
4th row02-3423-6284
5th row02-3423-6284

Common Values

Value | Count | Frequency (%)
02-3423-6284 | 25 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
02-3423-6284 | 25 | 100.0%

관리기관명
Categorical

CONSTANT 

Distinct1
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
서울특별시 강남구청 | 25

Length

Max length10
Median length10
Mean length10
Min length10

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row서울특별시 강남구청
2nd row서울특별시 강남구청
3rd row서울특별시 강남구청
4th row서울특별시 강남구청
5th row서울특별시 강남구청

Common Values

Value | Count | Frequency (%)
서울특별시 강남구청 | 25 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
서울특별시 | 25 | 50.0%
강남구청 | 25 | 50.0%

데이터기준일자
Categorical

CONSTANT 

Distinct1
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Memory size332.0 B
2023-05-31 | 25

Length

Max length10
Median length10
Mean length10
Min length10

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2023-05-31
2nd row2023-05-31
3rd row2023-05-31
4th row2023-05-31
5th row2023-05-31

Common Values

Value | Count | Frequency (%)
2023-05-31 | 25 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2023-05-31 | 25 | 100.0%

Correlations

Variable | 약수터명 | 소재지지번주소 | 위도 | 경도 | 일평균이용인구수 | 수질검사일자 | 수질검사결과구분 | 부적합항목
약수터명 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000
소재지지번주소 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000
위도 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | NaN | NaN | NaN
경도 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | NaN | NaN | NaN
일평균이용인구수 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.090 | 0.435 | 0.000
수질검사일자 | 1.000 | 1.000 | NaN | NaN | 0.090 | 1.000 | 0.000 | 1.000
수질검사결과구분 | 1.000 | 1.000 | NaN | NaN | 0.435 | 0.000 | 1.000 | NaN
부적합항목 | 1.000 | 1.000 | NaN | NaN | 0.000 | 1.000 | NaN | 1.000

Variable | 위도 | 일평균이용인구수 | 부적합항목 | 수질검사결과구분 | 경도 | 수질검사일자
위도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
일평균이용인구수 | 1.000 | 1.000 | 0.000 | 0.488 | 1.000 | 0.061
부적합항목 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.935
수질검사결과구분 | 1.000 | 0.488 | 1.000 | 1.000 | 1.000 | 0.000
경도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
수질검사일자 | 1.000 | 0.061 | 0.935 | 0.000 | 1.000 | 1.000

Variable | 위도 | 경도 | 일평균이용인구수 | 수질검사일자 | 수질검사결과구분 | 부적합항목
위도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
경도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
일평균이용인구수 | 1.000 | 1.000 | 1.000 | 0.061 | 0.488 | 0.000
수질검사일자 | 1.000 | 1.000 | 0.061 | 1.000 | 0.000 | 0.935
수질검사결과구분 | 1.000 | 1.000 | 0.488 | 0.000 | 1.000 | 1.000
부적합항목 | 1.000 | 1.000 | 0.000 | 0.935 | 1.000 | 1.000
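
The report does not state which association measure each matrix uses. For pairs of categorical columns a common choice is Cramér's V, and the sketch below shows an uncorrected version of it under that assumption. The function name and the toy columns are hypothetical and are not taken from the report's data, so the output is not expected to match the matrices above.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return math.nan  # undefined when either variable is constant
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Hypothetical toy columns, not the report's data.
result = ["부적합", "부적합", "적합", "적합"]
reason = ["검출", "검출", "<NA>", "<NA>"]  # fully determined by result
users = ["100", "200", "100", "200"]       # independent of result

v_dependent = cramers_v(result, reason)
v_independent = cramers_v(result, users)
print(v_dependent, v_independent)
```

A value of 1 means one column fully determines the other, 0 means no association, and the function returns NaN when one column is constant because the statistic is then undefined.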

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 약수터명 | 소재지도로명주소 | 소재지지번주소 | 위도 | 경도 | 지정일자 | 일평균이용인구수 | 수질검사일자 | 수질검사결과구분 | 부적합항목 | 관리기관전화번호 | 관리기관명 | 데이터기준일자
0 | 개암 | <NA> | 서울특별시 강남구 개포동 산53-20 | <NA> | <NA> | <NA> | 150 | 2020-07-06 | 부적합 | 총대장균군, 대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
1 | 천의 | <NA> | 서울특별시 강남구 개포동 산53-30 | <NA> | <NA> | <NA> | 200 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
2 | 대룡 | <NA> | 서울특별시 강남구 개포동 118-21 | <NA> | <NA> | <NA> | 150 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
3 | 대천 | <NA> | 서울특별시 강남구 개포동 산53-31 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
4 | 구룡산 | <NA> | 서울특별시 강남구 개포동 1017-7 | <NA> | <NA> | <NA> | 150 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
5 | 습지원 | <NA> | 서울특별시 강남구 개포동 산53-42 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 부적합 | 미검사 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
6 | 율암 | <NA> | 서울특별시 강남구 세곡동 산52-44 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
7 | 만수정 | <NA> | 서울특별시 강남구 자곡동 산15 | <NA> | <NA> | <NA> | 200 | 2020-07-06 | 부적합 | 미검사 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
8 | 구룡천2 | <NA> | 서울특별시 강남구 개포동 산53-28 | <NA> | <NA> | <NA> | 300 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
9 | 매봉 | <NA> | 서울특별시 강남구 도곡동 산31-3 | <NA> | <NA> | <NA> | 200 | 2020-07-07 | 부적합 | 일반세균,대장균군,총대장균군검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31

Last rows

Index | 약수터명 | 소재지도로명주소 | 소재지지번주소 | 위도 | 경도 | 지정일자 | 일평균이용인구수 | 수질검사일자 | 수질검사결과구분 | 부적합항목 | 관리기관전화번호 | 관리기관명 | 데이터기준일자
15 | 옛2 | <NA> | 서울특별시 강남구 일원동 산63-32 | <NA> | <NA> | <NA> | 250 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
16 | 성지 | <NA> | 서울특별시 강남구 일원동 산63-32 | <NA> | <NA> | <NA> | 250 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
17 | 실로암 | <NA> | 서울특별시 강남구 일원동 산63-51 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
18 | 옥수천 | <NA> | 서울특별시 강남구 개포동 산53-32 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
19 | 못골 | <NA> | 서울특별시 강남구 자곡동 산39-10 | <NA> | <NA> | <NA> | 150 | 2020-07-06 | 부적합 | 미검사 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
20 | 임록천 | <NA> | 서울특별시 강남구 개포당 산53-41 | <NA> | <NA> | <NA> | 100 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
21 | 인수천 | <NA> | 서울특별시 강남구 일원동 산63-1 | <NA> | <NA> | <NA> | 300 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
22 | 용두천 | <NA> | 서울특별시 강남구 개포동 192 | <NA> | <NA> | <NA> | 150 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
23 | 불국사 | <NA> | 서울특별시 강남구 일원동 441 | <NA> | <NA> | <NA> | 200 | 2020-07-06 | 적합 | <NA> | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31
24 | 구룡천1 | <NA> | 서울특별시 강남구 개포동 산53-28 | <NA> | <NA> | <NA> | 300 | 2020-07-06 | 부적합 | 총대장균군 검출 | 02-3423-6284 | 서울특별시 강남구청 | 2023-05-31