Overview

Dataset statistics

Number of variables: 9
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 800.8 KiB
Average record size in memory: 82.0 B
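The average record size is the total size divided by the number of observations. Below is a short check in Python. It assumes the usual convention of 1 KiB = 1024 bytes, which the report does not state. Because 800.8 KiB is itself rounded, agreement is only expected to one decimal place.

```python
# Figures copied from the dataset statistics above.
TOTAL_SIZE_KIB = 800.8
N_OBSERVATIONS = 10000


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Mean bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


avg = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg:.1f} B")  # 82.0 B
```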

Variable types

Text: 1
Categorical: 6
Numeric: 2

Dataset

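Description: Basic customer (account) information from the water and sewerage billing management of Sokcho City, Gangwon Province. It contains the management number (관리번호), meter reader code (검침원코드), type code (종류코드), diameter code (구경코드), format code (형식코드), installation category code (설치구분코드), representative business-type code (대표업종코드), and number of households (세대수).
Author: Sokcho City, Gangwon Province (강원도 속초시)
URL: https://www.data.go.kr/data/15093752/fileData.do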

Alerts

겸업종코드 is highly overall correlated with 구경코드 and 5 other fields [High correlation]
형식코드 is highly overall correlated with 구경코드 and 5 other fields [High correlation]
설치구분코드 is highly overall correlated with 형식코드 and 1 other field [High correlation]
대표업종코드 is highly overall correlated with 형식코드 and 1 other field [High correlation]
종류코드 is highly overall correlated with 형식코드 and 1 other field [High correlation]
검침원코드 is highly overall correlated with 형식코드 and 1 other field [High correlation]
구경코드 is highly overall correlated with 형식코드 and 1 other field [High correlation]
세대수 is highly overall correlated with 형식코드 and 1 other field [High correlation]
형식코드 is highly imbalanced (97.5%) [Imbalance]
설치구분코드 is highly imbalanced (74.2%) [Imbalance]
대표업종코드 is highly imbalanced (51.6%) [Imbalance]
겸업종코드 is highly imbalanced (91.6%) [Imbalance]
세대수 is highly skewed (γ1 = 30.21706746) [Skewed]
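The imbalance percentages above match a normalized-entropy score, 1 - H / log(k). Here H is the Shannon entropy of the category frequencies and k is the number of categories. An evenly spread column scores 0%, and a column dominated by one value approaches 100%. The sketch below is a reconstruction that reproduces all four reported percentages from the value counts listed under each variable. It is not taken from ydata-profiling's source, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - H(p) / log(k), with H the Shannon entropy and k the number of categories."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Value counts copied from the variable sections of this report.
examples = {
    "형식코드": [9975, 25],
    "설치구분코드": [8644, 769, 427, 117, 32, 9, 1, 1],
    "대표업종코드": [6665, 3269, 62, 4],
    "겸업종코드": [9895, 105],
}
for name, counts in examples.items():
    print(f"{name}: {imbalance_score(counts):.1%}")
```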

Reproduction

Analysis started: 2023-12-12 05:05:56.040444
Analysis finished: 2023-12-12 05:05:58.247302
Duration: 2.21 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

관리번호 (management number)
Text

Distinct: 9994
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 16
Mean length: 16
Min length: 16
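Every value has exactly 16 characters. The 30000 dash characters reported below also work out to three per value. Together these suggest a fixed NN-NN-NNN-NNNNNN layout of 2, 2, 3 and 6 digits. Below is a minimal format check under that assumption. The pattern is inferred from the sample values, not from any published specification, and the helper name is hypothetical.

```python
import re

# Assumed layout inferred from the sample values: 2-2-3-6 ASCII digits separated by dashes.
MANAGEMENT_NUMBER = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{3}-[0-9]{6}")


def is_valid_management_number(value: str) -> bool:
    """Return True if value matches the assumed 16-character layout exactly."""
    return MANAGEMENT_NUMBER.fullmatch(value) is not None


samples = [
    "05-04-138-000000",
    "01-03-155-290000",
    "06-22-104-000000",
    "09-02-157-030000",
    "13-06-375-000000",
]
print(all(is_valid_management_number(s) for s in samples))  # True
print(is_valid_management_number("05-04-138-00000"))  # False: last group has 5 digits
```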

Characters and Unicode

Total characters: 160000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
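Python's standard `unicodedata` module exposes the same general-category property. Digits are category Nd (Decimal Number) and the hyphen-minus is Pd (Dash Punctuation). The sketch below tallies characters per category for the five sample values shown below. The mapping from two-letter codes to the long names used in this report is a hand-written assumption covering only the two categories that occur here. For these five values it finds 65 Decimal Number and 15 Dash Punctuation characters. That is the same 13:3 ratio per value as the full column (130000 and 30000).

```python
import unicodedata
from collections import Counter

# Long names for the two general categories that occur in 관리번호 (hand-written subset).
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation"}


def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


samples = [
    "05-04-138-000000",
    "01-03-155-290000",
    "06-22-104-000000",
    "09-02-157-030000",
    "13-06-375-000000",
]
print(category_counts(samples))
```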

Unique

Unique: 9988
Unique (%): 99.9%

Sample

1st row: 05-04-138-000000
2nd row: 01-03-155-290000
3rd row: 06-22-104-000000
4th row: 09-02-157-030000
5th row: 13-06-375-000000
Common Values

Value               Count  Frequency (%)
06-12-146-000000        2  < 0.1%
08-04-106-010000        2  < 0.1%
05-01-295-000000        2  < 0.1%
06-21-161-040000        2  < 0.1%
06-10-087-030000        2  < 0.1%
06-22-135-000000        2  < 0.1%
12-00-730-000000        1  < 0.1%
05-04-138-000000        1  < 0.1%
05-04-052-000000        1  < 0.1%
07-05-125-000000        1  < 0.1%
Other values (9984)  9984  99.8%

Most occurring characters

Value  Count  Frequency (%)
0      78810  49.3%
-      30000  18.8%
1      13081  8.2%
2       8689  5.4%
3       5956  3.7%
6       5358  3.3%
4       4724  3.0%
8       3931  2.5%
5       3773  2.4%
7       2954  1.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number    130000  81.2%
Dash Punctuation   30000  18.8%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      78810  60.6%
1      13081  10.1%
2       8689  6.7%
3       5956  4.6%
6       5358  4.1%
4       4724  3.6%
8       3931  3.0%
5       3773  2.9%
7       2954  2.3%
9       2724  2.1%

Dash Punctuation

Value  Count  Frequency (%)
-      30000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  160000  100.0%

Most frequent character per script

Common: identical to the most occurring characters table above, because all 160000 characters belong to the Common script.

Most occurring blocks

Value  Count   Frequency (%)
ASCII  160000  100.0%

Most frequent character per block

ASCII: identical to the most occurring characters table above, because all 160000 characters belong to the ASCII block.

검침원코드 (meter reader code)
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value             Count
강동훈             1181
남기화             1133
정은익             1104
엄석용             1094
최종명             1065
Other values (5)   4423

Length

Max length: 3
Median length: 3
Mean length: 2.9558
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 천상한
2nd row: 엄석용
3rd row: 김창권
4th row: 최종명
5th row: 엄석용

Common Values

Value   Count  Frequency (%)
강동훈   1181  11.8%
남기화   1133  11.3%
정은익   1104  11.0%
엄석용   1094  10.9%
최종명   1065  10.7%
우종환   1038  10.4%
이재성   1010  10.1%
천상한    993  9.9%
김창권    940  9.4%
김욱      442  4.4%
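The mean length of a categorical variable is a count-weighted average of the lengths of its distinct values. The full value list above reproduces the reported 2.9558: nine names have three characters, and 김욱 (442 rows) has two. Python's `len` counts code points, so each precomposed Hangul syllable counts as one character. That matches the report's figures, but it is an assumption about how the profiler measured length. `mean_length` is an illustrative name.

```python
def mean_length(value_counts: dict[str, int]) -> float:
    """Count-weighted mean of len(value) over all rows."""
    total_rows = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    return total_chars / total_rows


# Value counts of 검침원코드, copied from the table above.
meter_readers = {
    "강동훈": 1181, "남기화": 1133, "정은익": 1104, "엄석용": 1094, "최종명": 1065,
    "우종환": 1038, "이재성": 1010, "천상한": 993, "김창권": 940, "김욱": 442,
}
print(round(mean_length(meter_readers), 4))  # 2.9558
```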

Length

[Figure: histogram of lengths of the category]


종류코드 (type code)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value     Count
유니온식   8590
원격식     1410

Length

Max length: 4
Median length: 4
Mean length: 3.859
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 원격식
2nd row: 유니온식
3rd row: 유니온식
4th row: 유니온식
5th row: 유니온식

Common Values

Value     Count  Frequency (%)  Meaning
유니온식   8590  85.9%          union type
원격식     1410  14.1%          remote-reading type

Length

[Figure: histogram of lengths of the category]


구경코드 (diameter code)
Real number (ℝ)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.3887
Minimum: 13
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 13
5th percentile: 13
Q1: 13
Median: 13
Q3: 13
95th percentile: 25
Maximum: 200
Range: 187
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 9.7065665
Coefficient of variation (CV): 0.63075936
Kurtosis: 81.196301
Mean: 15.3887
Median Absolute Deviation (MAD): 0
Skewness: 7.7394672
Sum: 153887
Variance: 94.217434
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=9)]

Common Values

Value  Count  Frequency (%)
13      8704  87.0%
25       525  5.2%
20       458  4.6%
40       136  1.4%
50        87  0.9%
80        52  0.5%
100       25  0.2%
150       12  0.1%
200        1  < 0.1%

Minimum 10 values

Value  Count  Frequency (%)
13      8704  87.0%
20       458  4.6%
25       525  5.2%
40       136  1.4%
50        87  0.9%
80        52  0.5%
100       25  0.2%
150       12  0.1%
200        1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
200        1  < 0.1%
150       12  0.1%
100       25  0.2%
80        52  0.5%
50        87  0.9%
40       136  1.4%
25       525  5.2%
20       458  4.6%
13      8704  87.0%
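The tables above list all nine distinct values of 구경코드, so every descriptive statistic can be recomputed exactly from the counts. The sketch below assumes the bias-corrected sample formulas that pandas applies in `Series.var`, `Series.skew` and `Series.kurt`: ddof = 1 variance, adjusted Fisher-Pearson skewness, and excess kurtosis. It also assumes pandas' default linear-interpolation quantile. Under those assumptions it reproduces the reported mean, variance, standard deviation, CV, skewness, kurtosis and 95th percentile. The names `DIAMETER_COUNTS`, `describe` and `quantile` are illustrative.

```python
import math

# Value counts of 구경코드, copied from the tables above (value -> number of rows).
DIAMETER_COUNTS = {13: 8704, 20: 458, 25: 525, 40: 136, 50: 87,
                   80: 52, 100: 25, 150: 12, 200: 1}


def describe(counts: dict[int, int]) -> dict[str, float]:
    """Sample statistics using bias-corrected formulas (as in pandas' defaults)."""
    n = sum(counts.values())
    mean = sum(x * c for x, c in counts.items()) / n
    # Count-weighted sums of powered deviations from the mean.
    s2 = sum(c * (x - mean) ** 2 for x, c in counts.items())
    s3 = sum(c * (x - mean) ** 3 for x, c in counts.items())
    s4 = sum(c * (x - mean) ** 4 for x, c in counts.items())
    variance = s2 / (n - 1)
    std = math.sqrt(variance)
    skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    kurtosis = ((n * (n + 1) * (n - 1) * s4) / ((n - 2) * (n - 3) * s2 ** 2)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "variance": variance, "std": std,
            "cv": std / mean, "skewness": skewness, "kurtosis": kurtosis}


def quantile(counts: dict[int, int], q: float) -> float:
    """Linear-interpolation quantile over the expanded, sorted values."""
    values = [x for x in sorted(counts) for _ in range(counts[x])]
    pos = q * (len(values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


stats = describe(DIAMETER_COUNTS)
for key, value in stats.items():
    print(f"{key}: {value:.6f}")
print("95th percentile:", quantile(DIAMETER_COUNTS, 0.95))
```

Because 8704 of the 10000 rows equal 13, every quantile from the 5th percentile up to Q3 falls on 13. That is why both the IQR and the MAD are 0.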

형식코드 (format code)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
<NA>     9975
직독식     25

Length

Max length: 4
Median length: 4
Mean length: 3.9975
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value   Count  Frequency (%)  Meaning
<NA>     9975  99.8%          literal text, not counted as missing
직독식     25  0.2%           direct-reading type
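This column reports Missing as 0 even though 99.8% of its rows read <NA>. The length statistics explain why. A Mean length of 3.9975 equals (9975 × 4 + 25 × 3) / 10000, so <NA> is stored as the four-character string "<NA>" rather than as a null. The same holds for 겸업종코드, whose mean length of 3.9895 equals (9895 × 4 + 105 × 3) / 10000. Below is a minimal plain-Python sketch that turns such placeholder strings into `None` so a profiler would count them as missing. The helper `to_missing` and the rebuilt column are hypothetical.

```python
from typing import List, Optional

PLACEHOLDER = "<NA>"  # literal string observed in 형식코드 and 겸업종코드


def to_missing(values: List[str]) -> List[Optional[str]]:
    """Replace the literal placeholder with None; keep every other value unchanged."""
    return [None if v == PLACEHOLDER else v for v in values]


# Hypothetical column rebuilt from the 형식코드 value counts above.
column = [PLACEHOLDER] * 9975 + ["직독식"] * 25
print(sum(len(v) for v in column) / len(column))  # 3.9975, the reported mean length

cleaned = to_missing(column)
missing = sum(v is None for v in cleaned)
print(missing, f"{missing / len(cleaned):.2%}")  # 9975 99.75%
```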

Length

[Figure: histogram of lengths of the category]


설치구분코드 (installation category code)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value             Count
정상조정           8644
가구분할            769
정수                427
중지                117
휴전                 32
Other values (3)     11

Length

Max length: 6
Median length: 4
Mean length: 3.8868
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 정상조정
2nd row: 정상조정
3rd row: 정상조정
4th row: 정상조정
5th row: 정상조정

Common Values

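Value         Count  Frequency (%)  Meaning
정상조정       8644  86.4%          normal adjustment
가구분할        769  7.7%           household split
정수            427  4.3%           water supply cut off
중지            117  1.2%           suspended
휴전             32  0.3%           tap closed
정액료미부과      9  0.1%           flat rate not charged
메인정산          1  < 0.1%         main-meter settlement
가산금미조정      1  < 0.1%         surcharge not adjusted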
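The summary at the top of each categorical section shows the five most frequent values and folds the rest into an "Other values (k)" line. For this column, 휴전 is the fifth value, and the last three values (9 + 1 + 1 rows) become "Other values (3)" with 11 rows. Below is a sketch of that folding. It assumes a cutoff of five, the number shown in this report's summaries, and breaks ties by first occurrence, which may differ from the profiler's own tie-breaking. `top_n_with_other` is an illustrative name.

```python
from collections import Counter


def top_n_with_other(counts: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common values plus an 'Other values (k)' row for the rest."""
    ranked = Counter(counts).most_common()  # ties keep insertion order
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append((f"Other values ({len(tail)})", sum(c for _, c in tail)))
    return head


# Value counts of 설치구분코드, copied from the table above.
install_category = {
    "정상조정": 8644, "가구분할": 769, "정수": 427, "중지": 117,
    "휴전": 32, "정액료미부과": 9, "메인정산": 1, "가산금미조정": 1,
}
for label, count in top_n_with_other(install_category):
    print(label, count)
```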

Length

[Figure: histogram of lengths of the category]


대표업종코드 (representative business-type code)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value     Count
가정용     6665
일반용     3269
산업용       62
대중탕용      4

Length

Max length: 4
Median length: 3
Mean length: 3.0004
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가정용
2nd row: 가정용
3rd row: 가정용
4th row: 가정용
5th row: 가정용

Common Values

Value     Count  Frequency (%)  Meaning
가정용     6665  66.6%          domestic
일반용     3269  32.7%          general
산업용       62  0.6%           industrial
대중탕용      4  < 0.1%         public bathhouse

Length

[Figure: histogram of lengths of the category]


겸업종코드 (secondary business-type code)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
<NA>     9895
가정용    105

Length

Max length: 4
Median length: 4
Mean length: 3.9895
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value   Count  Frequency (%)  Meaning
<NA>     9895  99.0%          literal text, not counted as missing
가정용    105  1.1%           domestic

Length

[Figure: histogram of lengths of the category]


세대수 (number of households)
Real number (ℝ)

HIGH CORRELATION  SKEWED

Distinct: 55
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4076
Minimum: 1
Maximum: 1319
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 1319
Range: 1318
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 26.806819
Coefficient of variation (CV): 11.134249
Kurtosis: 1082.2462
Mean: 2.4076
Median Absolute Deviation (MAD): 0
Skewness: 30.217067
Sum: 24076
Variance: 718.60552
Monotonicity: Not monotonic
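Several of the figures above are derived from one another. The mean is Sum / n, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The check below uses the rounded values printed in this report, so any small discrepancy comes from that rounding. The variable names are illustrative.

```python
import math

# Figures copied from the 세대수 statistics above (already rounded by the report).
N = 10000
TOTAL = 24076
STD = 26.806819
REPORTED_CV = 11.134249
REPORTED_VARIANCE = 718.60552

mean = TOTAL / N       # Sum / n
cv = STD / mean        # coefficient of variation
variance = STD ** 2    # variance from the standard deviation

print(f"mean={mean}, cv={cv:.6f}, variance={variance:.5f}")
print(math.isclose(cv, REPORTED_CV, rel_tol=1e-6),
      math.isclose(variance, REPORTED_VARIANCE, rel_tol=1e-6))
```

A CV above 11 combined with a MAD of 0 describes a column in which 92.2% of rows equal 1, while a handful of accounts report hundreds of households (up to 1319).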
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value              Count  Frequency (%)
1                   9222  92.2%
2                    377  3.8%
3                    168  1.7%
4                     71  0.7%
5                     44  0.4%
6                     20  0.2%
7                     17  0.2%
10                    12  0.1%
8                      8  0.1%
9                      6  0.1%
Other values (45)     55  0.5%

Minimum 10 values

Value  Count  Frequency (%)
1       9222  92.2%
2        377  3.8%
3        168  1.7%
4         71  0.7%
5         44  0.4%
6         20  0.2%
7         17  0.2%
8          8  0.1%
9          6  0.1%
10        12  0.1%

Maximum 10 values

Value  Count  Frequency (%)
1319       1  < 0.1%
945        1  < 0.1%
866        1  < 0.1%
740        1  < 0.1%
733        1  < 0.1%
635        1  < 0.1%
614        1  < 0.1%
579        2  < 0.1%
501        1  < 0.1%
421        1  < 0.1%

Interactions

[Figure: pairwise interaction plots of the numeric variables 구경코드 and 세대수 (4 panels)]

Correlations

              검침원코드  종류코드  구경코드  설치구분코드  대표업종코드  세대수
검침원코드         1.000     0.267     0.065         0.116         0.269   0.000
종류코드           0.267     1.000     0.000         0.117         0.000   0.000
구경코드           0.065     0.000     1.000         0.123         0.230   0.690
설치구분코드       0.116     0.117     0.123         1.000         0.159   0.121
대표업종코드       0.269     0.000     0.230         0.159         1.000   0.000
세대수             0.000     0.000     0.690         0.121         0.000   1.000

              겸업종코드  형식코드  설치구분코드  대표업종코드  종류코드  검침원코드
겸업종코드         1.000       NaN         1.000         1.000     1.000       1.000
형식코드             NaN     1.000         1.000         1.000     1.000       1.000
설치구분코드       1.000     1.000         1.000         0.072     0.088       0.055
대표업종코드       1.000     1.000         0.072         1.000     0.000       0.164
종류코드           1.000     1.000         0.088         0.000     1.000       0.204
검침원코드         1.000     1.000         0.055         0.164     0.204       1.000

              구경코드  세대수  검침원코드  종류코드  형식코드  설치구분코드  대표업종코드  겸업종코드
구경코드         1.000   0.047       0.034     0.000     1.000         0.069         0.150       1.000
세대수           0.047   1.000       0.000     0.000     1.000         0.059         0.000       1.000
검침원코드       0.034   0.000       1.000     0.204     1.000         0.055         0.164       1.000
종류코드         0.000   0.000       0.204     1.000     1.000         0.088         0.000       1.000
형식코드         1.000   1.000       1.000     1.000     1.000         1.000         1.000       0.000
설치구분코드     0.069   0.059       0.055     0.088     1.000         1.000         0.072       1.000
대표업종코드     0.150   0.000       0.164     0.000     1.000         0.072         1.000       1.000
겸업종코드       1.000   1.000       1.000     1.000     0.000         1.000         1.000       1.000
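The High correlation alerts at the top of the report can be read off the last matrix. For each variable, list the other variables whose coefficient reaches a threshold. The sketch below uses a threshold of 0.9 as an illustrative assumption, not the profiler's configured value. Any cutoff above 0.204 and up to 1.0 gives the same pairs for this matrix. With it, 겸업종코드 has six partners (구경코드 and 5 others) and 설치구분코드 has two (형식코드 and 겸업종코드), which matches the alerts. `high_correlations` is an illustrative name.

```python
# Last correlation matrix above, copied row by row (order matches NAMES).
NAMES = ["구경코드", "세대수", "검침원코드", "종류코드",
         "형식코드", "설치구분코드", "대표업종코드", "겸업종코드"]
MATRIX = [
    [1.000, 0.047, 0.034, 0.000, 1.000, 0.069, 0.150, 1.000],
    [0.047, 1.000, 0.000, 0.000, 1.000, 0.059, 0.000, 1.000],
    [0.034, 0.000, 1.000, 0.204, 1.000, 0.055, 0.164, 1.000],
    [0.000, 0.000, 0.204, 1.000, 1.000, 0.088, 0.000, 1.000],
    [1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000],
    [0.069, 0.059, 0.055, 0.088, 1.000, 1.000, 0.072, 1.000],
    [0.150, 0.000, 0.164, 0.000, 1.000, 0.072, 1.000, 1.000],
    [1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000],
]
THRESHOLD = 0.9  # illustrative assumption, not the profiler's configured value


def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables whose coefficient is >= threshold."""
    return {
        name: [names[j] for j, r in enumerate(matrix[i]) if j != i and r >= threshold]
        for i, name in enumerate(names)
    }


for name, partners in high_correlations(NAMES, MATRIX, THRESHOLD).items():
    if not partners:
        continue
    others = len(partners) - 1
    suffix = f" and {others} other field{'s' if others != 1 else ''}" if others else ""
    print(f"{name} is highly overall correlated with {partners[0]}{suffix}")
```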

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion]

Sample

First rows

       관리번호          검침원코드  종류코드  구경코드  형식코드  설치구분코드  대표업종코드  겸업종코드  세대수
6219   05-04-138-000000  천상한      원격식    13        <NA>      정상조정      가정용        <NA>        1
1000   01-03-155-290000  엄석용      유니온식  13        <NA>      정상조정      가정용        <NA>        1
10791  06-22-104-000000  김창권      유니온식  13        <NA>      정상조정      가정용        <NA>        1
15577  09-02-157-030000  최종명      유니온식  13        <NA>      정상조정      가정용        <NA>        1
19094  13-06-375-000000  엄석용      유니온식  13        <NA>      정상조정      가정용        <NA>        1
10622  06-21-190-000000  우종환      유니온식  13        <NA>      정상조정      가정용        <NA>        1
15848  09-03-133-000000  최종명      유니온식  13        <NA>      정상조정      가정용        <NA>        1
415    01-01-231-000000  엄석용      원격식    13        <NA>      가구분할      가정용        <NA>        3
6175   05-04-099-020000  천상한      원격식    13        <NA>      정상조정      일반용        <NA>        1
7205   06-04-141-000000  천상한      유니온식  13        <NA>      정상조정      가정용        <NA>        1

Last rows

       관리번호          검침원코드  종류코드  구경코드  형식코드  설치구분코드  대표업종코드  겸업종코드  세대수
18034  12-00-720-000000  김욱        유니온식  13        <NA>      정상조정      가정용        <NA>        1
7302   06-04-227-000000  천상한      유니온식  13        <NA>      정상조정      일반용        <NA>        1
59     01-01-038-100000  엄석용      유니온식  13        <NA>      정상조정      일반용        <NA>        1
3606   03-04-062-150000  남기화      원격식    13        <NA>      정상조정      일반용        <NA>        1
16465  10-01-134-140000  이재성      유니온식  13        <NA>      정상조정      가정용        <NA>        1
2098   02-03-240-000000  김창권      유니온식  13        <NA>      정상조정      가정용        <NA>        1
17803  11-04-193-000000  이재성      유니온식  13        <NA>      정상조정      가정용        <NA>        1
6182   05-04-103-000000  천상한      원격식    13        <NA>      정상조정      가정용        <NA>        1
12543  08-02-206-001200  강동훈      유니온식  13        <NA>      정상조정      일반용        <NA>        1
3426   03-03-222-000000  남기화      유니온식  13        <NA>      정상조정      가정용        <NA>        1