Overview

Dataset statistics

Number of variables: 8
Number of observations: 47
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.2 KiB
Average record size in memory: 68.8 B
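
The overview counts above can be recomputed from raw rows. The following is a minimal sketch, not the profiler's own code: the three rows are copied from the first sample rows of this report, the `overview` helper and its key names are hypothetical, and it assumes that a missing cell is represented as `None` and that a duplicate row is any exact repeat of an earlier row.

```python
# Column names as they appear in the report's sample table.
COLUMNS = [
    "집계년도", "정수장수도사업자명", "정수장상수도구분명", "정수장명",
    "정수장설계시설용량(㎥/일)", "정수방법명", "용도구분명", "정수장가동여부",
]

# First three sample rows of the report (the full dataset has 47 rows).
ROWS = [
    ("2024", "안양시", "지방정수장", "포일", 150000, "급속여과", "생활용수", "Y"),
    ("2024", "안양시", "지방정수장", "청계통합", 182000, "급속여과", "생활용수", "Y"),
    ("2024", "평택시", "지방정수장", "유천", 15000, "고도정수처리", "생활용수", "Y"),
]


def overview(columns, rows):
    """Hypothetical helper: recompute the overview counters from raw rows."""
    n_variables = len(columns)
    n_observations = len(rows)
    total_cells = n_variables * n_observations
    # Assumption: a missing cell is stored as None.
    missing_cells = sum(1 for row in rows for cell in row if cell is None)
    # Assumption: a duplicate row is an exact repeat of an earlier row.
    duplicate_rows = n_observations - len(set(rows))
    return {
        "n_variables": n_variables,
        "n_observations": n_observations,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100.0 * missing_cells / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicate_rows,
    }


stats = overview(COLUMNS, ROWS)
print(stats)
```

Run on the full 47-row dataset, the same counters would be expected to match the figures listed above.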

Variable types

Categorical: 4
Text: 2
Numeric: 1
Boolean: 1

Alerts

집계년도 has constant value "" (Constant)
정수장상수도구분명 has constant value "" (Constant)
정수방법명 is highly overall correlated with 용도구분명 (High correlation)
용도구분명 is highly overall correlated with 정수방법명 (High correlation)
용도구분명 is highly imbalanced (51.1%) (Imbalance)
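
The 51.1% imbalance figure is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution divided by its maximum possible value. The sketch below assumes that definition (the report itself does not state the formula) and uses the 용도구분명 counts from this report, 42 and 5; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (0 = balanced, 1 = one class only)."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Counts of 생활용수 and 공업용수 as listed under the 용도구분명 variable.
score = imbalance_score([42, 5])
print(f"{score:.1%}")  # prints 51.1%
```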

Reproduction

Analysis started: 2024-03-12 23:19:07.851847
Analysis finished: 2024-03-12 23:19:08.303835
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1

Variables

집계년도 (aggregation year)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024
2nd row: 2024
3rd row: 2024
4th row: 2024
5th row: 2024

Common Values

Value | Count | Frequency (%)
2024 | 47 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
2024 | 47 | 100.0%

정수장수도사업자명 (plant waterworks operator name)
Text

Distinct: 27
Distinct (%): 57.4%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.106383
Min length: 3

Characters and Unicode

Total characters: 146
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 29.8%

Sample

1st row: 안양시
2nd row: 안양시
3rd row: 평택시
4th row: 평택시
5th row: 파주시

Value | Count | Frequency (%)
안산시 | 4 | 8.5%
가평군 | 3 | 6.4%
양평군 | 3 | 6.4%
포천시 | 3 | 6.4%
안양시 | 3 | 6.4%
광주시 | 3 | 6.4%
성남시 | 2 | 4.3%
남양주시 | 2 | 4.3%
파주시 | 2 | 4.3%
안성시 | 2 | 4.3%
Other values (17) | 20 | 42.6%

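Most occurring characters

Value | Count | Frequency (%)
[unrecovered] | 40 | 27.4%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 8 | 5.5%
[unrecovered] | 8 | 5.5%
[unrecovered] | 5 | 3.4%
[unrecovered] | 5 | 3.4%
[unrecovered] | 4 | 2.7%
Other values (24) | 40 | 27.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 146 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unrecovered] | 40 | 27.4%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 8 | 5.5%
[unrecovered] | 8 | 5.5%
[unrecovered] | 5 | 3.4%
[unrecovered] | 5 | 3.4%
[unrecovered] | 4 | 2.7%
Other values (24) | 40 | 27.4%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 146 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[unrecovered] | 40 | 27.4%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 8 | 5.5%
[unrecovered] | 8 | 5.5%
[unrecovered] | 5 | 3.4%
[unrecovered] | 5 | 3.4%
[unrecovered] | 4 | 2.7%
Other values (24) | 40 | 27.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 146 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[unrecovered] | 40 | 27.4%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 9 | 6.2%
[unrecovered] | 8 | 5.5%
[unrecovered] | 8 | 5.5%
[unrecovered] | 5 | 3.4%
[unrecovered] | 5 | 3.4%
[unrecovered] | 4 | 2.7%
Other values (24) | 40 | 27.4%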

정수장상수도구분명 (plant waterworks category name)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지방정수장
2nd row: 지방정수장
3rd row: 지방정수장
4th row: 지방정수장
5th row: 지방정수장

Common Values

Value | Count | Frequency (%)
지방정수장 | 47 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
지방정수장 | 47 | 100.0%

정수장명 (purification plant name)
Text

Distinct: 44
Distinct (%): 93.6%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.2978723
Min length: 2

Characters and Unicode

Total characters: 108
Distinct characters: 62
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 41
Unique (%): 87.2%

Sample

1st row: 포일
2nd row: 청계통합
3rd row: 유천
4th row: 송탄
5th row: 문산

Value | Count | Frequency (%)
연성 | 2 | 4.3%
안산 | 2 | 4.3%
동두천 | 2 | 4.3%
도곡 | 1 | 2.1%
까치울 | 1 | 2.1%
양동 | 1 | 2.1%
화도 | 1 | 2.1%
포일 | 1 | 2.1%
연천 | 1 | 2.1%
양서 | 1 | 2.1%
Other values (34) | 34 | 72.3%

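Most occurring characters

Value | Count | Frequency (%)
[unrecovered] | 6 | 5.6%
[unrecovered] | 5 | 4.6%
[unrecovered] | 5 | 4.6%
[unrecovered] | 4 | 3.7%
[unrecovered] | 4 | 3.7%
[unrecovered] | 4 | 3.7%
[unrecovered] | 3 | 2.8%
[unrecovered] | 3 | 2.8%
[unrecovered] | 3 | 2.8%
[unrecovered] | 3 | 2.8%
Other values (52) | 68 | 63.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 103 | 95.4%
Decimal Number | 5 | 4.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unrecovered] | 6 | 5.8%
[unrecovered] | 5 | 4.9%
[unrecovered] | 5 | 4.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
Other values (49) | 63 | 61.2%

Decimal Number
Value | Count | Frequency (%)
3 | 2 | 40.0%
2 | 2 | 40.0%
1 | 1 | 20.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 103 | 95.4%
Common | 5 | 4.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[unrecovered] | 6 | 5.8%
[unrecovered] | 5 | 4.9%
[unrecovered] | 5 | 4.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
Other values (49) | 63 | 61.2%

Common
Value | Count | Frequency (%)
3 | 2 | 40.0%
2 | 2 | 40.0%
1 | 1 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 103 | 95.4%
ASCII | 5 | 4.6%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[unrecovered] | 6 | 5.8%
[unrecovered] | 5 | 4.9%
[unrecovered] | 5 | 4.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 4 | 3.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
[unrecovered] | 3 | 2.9%
Other values (49) | 63 | 61.2%

ASCII
Value | Count | Frequency (%)
3 | 2 | 40.0%
2 | 2 | 40.0%
1 | 1 | 20.0%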

정수장설계시설용량(㎥/일) (plant design capacity, m³/day)
Numeric

Distinct: 37
Distinct (%): 78.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73161.702
Minimum: 900
Maximum: 560000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 555.0 B

Quantile statistics

Minimum: 900
5-th percentile: 1150
Q1: 12500
Median: 50000
Q3: 81500
95-th percentile: 234400
Maximum: 560000
Range: 559100
Interquartile range (IQR): 69000

Descriptive statistics

Standard deviation: 100757.28
Coefficient of variation (CV): 1.3771861
Kurtosis: 11.310845
Mean: 73161.702
Median Absolute Deviation (MAD): 35000
Skewness: 2.9491233
Sum: 3438600
Variance: 1.015203 × 10^10
Monotonicity: Not monotonic
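
Several of the figures above are derived from others, so they can be cross-checked with plain arithmetic. The sketch below uses only numbers printed in this report (minimum, maximum, quartiles, sum, row count, standard deviation); the variable names are illustrative.

```python
# Values copied from the quantile and descriptive statistics of this report.
minimum, maximum = 900, 560000
q1, q3 = 12500, 81500
total, n_rows = 3438600, 47
std_dev = 100757.28

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
mean = total / n_rows             # Mean = Sum / number of observations
cv = std_dev / mean               # Coefficient of variation = std / mean
variance = std_dev ** 2           # Variance = std squared

print(value_range, iqr)           # 559100 69000
print(round(mean, 3))             # 73161.702
print(round(cv, 4))               # 1.3772
print(f"{variance:.6e}")          # 1.015203e+10
```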

[Figure: histogram with fixed size bins (bins=37)]

Common Values

Value | Count | Frequency (%)
50000 | 5 | 10.6%
60000 | 3 | 6.4%
150000 | 2 | 4.3%
10000 | 2 | 4.3%
1000 | 2 | 4.3%
15000 | 2 | 4.3%
8000 | 1 | 2.1%
26000 | 1 | 2.1%
72000 | 1 | 2.1%
55000 | 1 | 2.1%
Other values (27) | 27 | 57.4%

Minimum 10 values

Value | Count | Frequency (%)
900 | 1 | 2.1%
1000 | 2 | 4.3%
1500 | 1 | 2.1%
1700 | 1 | 2.1%
2000 | 1 | 2.1%
5000 | 1 | 2.1%
5500 | 1 | 2.1%
6000 | 1 | 2.1%
8000 | 1 | 2.1%
10000 | 2 | 4.3%

Maximum 10 values

Value | Count | Frequency (%)
560000 | 1 | 2.1%
280000 | 1 | 2.1%
235000 | 1 | 2.1%
233000 | 1 | 2.1%
223000 | 1 | 2.1%
182000 | 1 | 2.1%
150000 | 2 | 4.3%
110000 | 1 | 2.1%
100000 | 1 | 2.1%
96000 | 1 | 2.1%

정수방법명 (purification method name)
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 31.9%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 13
Median length: 4
Mean length: 4.893617
Min length: 4

Unique

Unique: 12
Unique (%): 25.5%

Sample

1st row: 급속여과
2nd row: 급속여과
3rd row: 고도정수처리
4th row: 급속여과
5th row: 활성탄여과

Common Values

Value | Count | Frequency (%)
급속여과 | 26 | 55.3%
고도정수처리 | 5 | 10.6%
완속여과 | 4 | 8.5%
활성탄여과 | 1 | 2.1%
고도처리(전오존+활성탄) | 1 | 2.1%
고도처리 | 1 | 2.1%
직접여과 | 1 | 2.1%
소독만의 방식 | 1 | 2.1%
막여과(UF) | 1 | 2.1%
급속여과방식 | 1 | 2.1%
Other values (5) | 5 | 10.6%
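
The Distinct and Unique figures for 정수방법명 differ because they count different things: distinct is the number of different values, while unique is the number of values that occur exactly once. The sketch below rebuilds the counts from the Common Values table. The five values hidden under "Other values (5)" are not listed in the report, so `other_1` to `other_5` are hypothetical placeholders, each assumed to occur once.

```python
from collections import Counter

# Counts taken from the Common Values table of 정수방법명.
counts = Counter({"급속여과": 26, "고도정수처리": 5, "완속여과": 4})

singletons = [
    "활성탄여과", "고도처리(전오존+활성탄)", "고도처리", "직접여과",
    "소독만의 방식", "막여과(UF)", "급속여과방식",
]
# Hypothetical placeholders for the five values the report does not list.
singletons += [f"other_{i}" for i in range(1, 6)]
for name in singletons:
    counts[name] = 1

n_rows = sum(counts.values())
distinct = len(counts)                                # number of different values
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

print(n_rows, distinct, unique)                       # 47 15 12
print(f"{distinct / n_rows:.1%} {unique / n_rows:.1%}")  # 31.9% 25.5%
```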

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
급속여과 | 26 | 54.2%
고도정수처리 | 5 | 10.4%
완속여과 | 4 | 8.3%
활성탄여과 | 1 | 2.1%
고도처리(전오존+활성탄 | 1 | 2.1%
고도처리 | 1 | 2.1%
직접여과 | 1 | 2.1%
소독만의 | 1 | 2.1%
방식 | 1 | 2.1%
막여과(uf | 1 | 2.1%
Other values (6) | 6 | 12.5%

용도구분명 (usage category name)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 508.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 생활용수
2nd row: 생활용수
3rd row: 생활용수
4th row: 생활용수
5th row: 생활용수

Common Values

Value | Count | Frequency (%)
생활용수 | 42 | 89.4%
공업용수 | 5 | 10.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
생활용수 | 42 | 89.4%
공업용수 | 5 | 10.6%

정수장가동여부 (plant operating status)
Boolean

Distinct: 2
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 179.0 B

Value | Count | Frequency (%)
True | 40 | 85.1%
False | 7 | 14.9%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 정수장수도사업자명 | 정수장명 | 정수장설계시설용량(㎥/일) | 정수방법명 | 용도구분명 | 정수장가동여부
정수장수도사업자명 | 1.000 | 1.000 | 0.718 | 0.551 | 0.000 | 0.000
정수장명 | 1.000 | 1.000 | 0.438 | 0.957 | 0.000 | 1.000
정수장설계시설용량(㎥/일) | 0.718 | 0.438 | 1.000 | 0.000 | 0.000 | 0.000
정수방법명 | 0.551 | 0.957 | 0.000 | 1.000 | 0.691 | 0.118
용도구분명 | 0.000 | 0.000 | 0.000 | 0.691 | 1.000 | 0.000
정수장가동여부 | 0.000 | 1.000 | 0.000 | 0.118 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | 용도구분명 | 정수방법명 | 정수장가동여부
용도구분명 | 1.000 | 0.537 | 0.000
정수방법명 | 0.537 | 1.000 | 0.041
정수장가동여부 | 0.000 | 0.041 | 1.000

[Figure: correlation heatmap]

Variable | 정수장설계시설용량(㎥/일) | 정수방법명 | 용도구분명 | 정수장가동여부
정수장설계시설용량(㎥/일) | 1.000 | 0.000 | 0.000 | 0.000
정수방법명 | 0.000 | 1.000 | 0.537 | 0.041
용도구분명 | 0.000 | 0.537 | 1.000 | 0.000
정수장가동여부 | 0.000 | 0.041 | 0.000 | 1.000
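
The report does not state which coefficient each matrix uses. One common association measure for a pair of categorical columns is Cramér's V, which is derived from the chi-squared statistic and ranges from 0 (no association) to 1 (one column determines the other). The sketch below is a plain, uncorrected version on toy data; it is an assumption about the kind of measure involved, not the profiler's implementation, and it will not necessarily reproduce the values above (a profiler may apply a bias correction, and the full 47-row dataset is not included here).

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined when either column is empty or constant
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy data (not from the report): the first pair is perfectly associated,
# the second pair is independent.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)  # 1.0 0.0
```

A constant column such as 집계년도 yields 0.0 under this sketch because the statistic is undefined when one variable has a single category.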

Missing values

[Figure: count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

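First rows

Row | 집계년도 | 정수장수도사업자명 | 정수장상수도구분명 | 정수장명 | 정수장설계시설용량(㎥/일) | 정수방법명 | 용도구분명 | 정수장가동여부
0 | 2024 | 안양시 | 지방정수장 | 포일 | 150000 | 급속여과 | 생활용수 | Y
1 | 2024 | 안양시 | 지방정수장 | 청계통합 | 182000 | 급속여과 | 생활용수 | Y
2 | 2024 | 평택시 | 지방정수장 | 유천 | 15000 | 고도정수처리 | 생활용수 | Y
3 | 2024 | 평택시 | 지방정수장 | 송탄 | 15000 | 급속여과 | 생활용수 | Y
4 | 2024 | 파주시 | 지방정수장 | 문산 | 96000 | 활성탄여과 | 생활용수 | Y
5 | 2024 | 파주시 | 지방정수장 | 금촌 | 6000 | 급속여과 | 생활용수 | N
6 | 2024 | 의정부시 | 지방정수장 | 가능 | 8000 | 급속여과 | 생활용수 | N
7 | 2024 | 김포시 | 지방정수장 | 고촌 | 223000 | 고도처리(전오존+활성탄) | 생활용수 | Y
8 | 2024 | 광주시 | 지방정수장 | 광주1 | 24000 | 급속여과 | 생활용수 | Y
9 | 2024 | 광주시 | 지방정수장 | 광주3 | 80000 | 급속여과 | 생활용수 | Y

Last rows

Row | 집계년도 | 정수장수도사업자명 | 정수장상수도구분명 | 정수장명 | 정수장설계시설용량(㎥/일) | 정수방법명 | 용도구분명 | 정수장가동여부
37 | 2024 | 성남시 | 지방정수장 | 복정2 | 34000 | 급속여과 | 생활용수 | N
38 | 2024 | 성남시 | 지방정수장 | 복정3 | 280000 | 급속여과 | 생활용수 | Y
39 | 2024 | 부천시 | 지방정수장 | 까치울 | 235000 | 급속여과 | 생활용수 | Y
40 | 2024 | 안산시 | 지방정수장 | 안산 | 83000 | 고도정수처리 | 생활용수 | Y
41 | 2024 | 안산시 | 지방정수장 | 안산 | 60000 | 급속여과 | 공업용수 | Y
42 | 2024 | 안산시 | 지방정수장 | 연성 | 233000 | 급속여과 | 생활용수 | Y
43 | 2024 | 안산시 | 지방정수장 | 연성 | 150000 | 응집침전 | 공업용수 | Y
44 | 2024 | 남양주시 | 지방정수장 | 도곡 | 16000 | 고도정수처리 | 생활용수 | Y
45 | 2024 | 남양주시 | 지방정수장 | 화도 | 55000 | 고도정수처리 | 생활용수 | Y
46 | 2024 | 안양시 | 지방정수장 | 비산 | 72000 | 급속여과 | 생활용수 | N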