Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 9996
Missing cells (%): 14.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
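The percentage and per-record figures are derived from the raw counts. Missing cells are divided by all cells (variables × observations), and total memory is divided by the number of observations. Here is a quick arithmetic check, assuming 1 KiB = 1024 bytes:

```python
# Dataset-level figures reported above
n_vars = 7
n_obs = 10_000
missing_cells = 9_996
total_kib = 644.5  # "Total size in memory"; assumes 1 KiB = 1024 bytes

# Share of all cells (variables x observations) that are missing
missing_pct = 100 * missing_cells / (n_vars * n_obs)

# Average in-memory size of one record (row), in bytes
avg_record_bytes = total_kib * 1024 / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")         # Missing cells (%): 14.3%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 66.0 B
```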

Variable types

Numeric: 2
Text: 2
DateTime: 1
Categorical: 2

Dataset

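Description: 부산광역시상수도사업본부_수용가정보시스템_계량기정보_계량기변경이력정보_20230125 (Busan Metropolitan City Waterworks Headquarters, customer information system: meter information, meter change history, 2023-01-25)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15100348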

Alerts

계량기형식 is highly overall correlated with 결재처리문서아이디(ID) [High correlation]
결재처리문서아이디(ID) is highly overall correlated with 계량기형식 [High correlation]
계량기형식 is highly imbalanced (65.7%) [Imbalance]
결재처리문서아이디(ID) is highly imbalanced (97.6%) [Imbalance]
진척번호 has 9996 (> 99.9%) missing values [Missing]
연번 has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 16:15:28.436093
Analysis finished: 2023-12-10 16:15:29.772919
Duration: 1.34 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23636.912
Minimum: 7
Maximum: 47196
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 7
5th percentile: 2553.95
Q1: 11858.25
Median: 23608.5
Q3: 35331.25
95th percentile: 44872.2
Maximum: 47196
Range: 47189
Interquartile range (IQR): 23473

Descriptive statistics

Standard deviation: 13510.792
Coefficient of variation (CV): 0.57159714
Kurtosis: -1.1881096
Mean: 23636.912
Median absolute deviation (MAD): 11734
Skewness: 0.0095790645
Sum: 2.3636912 × 10^8
Variance: 1.8254149 × 10^8
Monotonicity: Not monotonic
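The statistics above follow standard definitions. The sketch below recomputes them with Python's standard library. It assumes pandas-style conventions: quantiles use linear interpolation between order statistics, the standard deviation has n - 1 in the denominator, and MAD is the median of absolute deviations from the median. This is an illustration, not ydata-profiling's own implementation. It omits skewness and kurtosis, and the function name numeric_summary is hypothetical.

```python
import math
import statistics


def numeric_summary(values):
    """Recompute quantile and descriptive statistics for a numeric column.

    Assumed conventions: linear interpolation for quantiles
    (method="inclusive"), sample (n - 1) variance, MAD = median(|x - median|).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # 20 equal slices: index 0 is the 5th percentile, index 18 the 95th
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    mad = statistics.median(abs(x - median) for x in data)
    return {
        "min": data[0], "p5": ventiles[0], "q1": q1, "median": median,
        "q3": q3, "p95": ventiles[18], "max": data[-1],
        "range": data[-1] - data[0], "iqr": q3 - q1,
        "mean": mean, "std": std, "variance": variance,
        "cv": std / mean,  # coefficient of variation
        "mad": mad,
    }


summary = numeric_summary([1, 2, 3, 4])  # toy input, not the 연번 column
print(summary["q1"], summary["median"], summary["q3"], summary["iqr"])  # 1.75 2.5 3.25 1.5
```

Applied to the full 연번 column, the function should give figures close to the ones above, provided the assumed conventions match the profiler's.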
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value                Count  Frequency (%)
37711                1      < 0.1%
28484                1      < 0.1%
13445                1      < 0.1%
17868                1      < 0.1%
25097                1      < 0.1%
39549                1      < 0.1%
20162                1      < 0.1%
4412                 1      < 0.1%
41528                1      < 0.1%
5028                 1      < 0.1%
Other values (9990)  9990   99.9%
Minimum 10 values
Value  Count  Frequency (%)
7      1      < 0.1%
10     1      < 0.1%
16     1      < 0.1%
24     1      < 0.1%
27     1      < 0.1%
35     1      < 0.1%
46     1      < 0.1%
58     1      < 0.1%
59     1      < 0.1%
76     1      < 0.1%
Maximum 10 values
Value  Count  Frequency (%)
47196  1      < 0.1%
47187  1      < 0.1%
47184  1      < 0.1%
47183  1      < 0.1%
47182  1      < 0.1%
47177  1      < 0.1%
47176  1      < 0.1%
47175  1      < 0.1%
47171  1      < 0.1%
47167  1      < 0.1%

고객번호 (customer number)
Text

Distinct: 5514
Distinct (%): 55.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 60000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3089
Unique (%): 30.9%
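Distinct and Unique measure different things. Distinct (5514) counts different values. Unique (3089) counts values that occur exactly once, so 5514 - 3089 = 2425 customer numbers occur more than once. Below is a small sketch with a hypothetical helper and toy values. The repeated entries are invented for illustration.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique):
    distinct = number of different values,
    unique   = number of values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data in the masked customer-number format; the repeats are made up
sample = ["*71*29", "*36*87", "*71*29", "*76*15", "*18*50", "*18*50"]
print(distinct_and_unique(sample))  # (4, 2)
```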

Sample

1st row: *71*29
2nd row: *36*87
3rd row: *76*15
4th row: *18*50
5th row: *56*55

Words
Value                Count  Frequency (%)
52*54                17     0.2%
27*05                16     0.2%
28*78                14     0.1%
80*96                14     0.1%
15*91                12     0.1%
02*78                12     0.1%
54*92                12     0.1%
33*85                11     0.1%
03*28                11     0.1%
34*75                11     0.1%
Other values (5504)  9870   98.7%
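The frequency table above lists tokens such as 52*54 rather than raw six-character values such as *71*29, so the leading * has been dropped. Further down, the categorical word tables also show <NA> as na and SUDO_309_... as sudo_309_.... This fits a tokenizer that lowercases each value, splits it on whitespace, and strips leading and trailing punctuation. The sketch below implements that assumed behaviour. It is inferred from the tables, not taken from ydata-profiling's source, and word_counts is a hypothetical name.

```python
import string
from collections import Counter


def word_counts(values):
    """Assumed tokenization: lowercase, split on whitespace,
    strip leading/trailing ASCII punctuation, drop empty tokens."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)  # "*52*54" -> "52*54"
            if token:
                counts[token] += 1
    return counts


tokens = word_counts(["*52*54", "<NA>", "SUDO_309_20220600016", "*52*54"])
print(tokens.most_common())
# [('52*54', 2), ('na', 1), ('sudo_309_20220600016', 1)]
```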

Most occurring characters

Value  Count  Frequency (%)
*      20000  33.3%
1      4552   7.6%
0      4505   7.5%
9      4479   7.5%
2      4044   6.7%
8      3937   6.6%
7      3886   6.5%
5      3863   6.4%
3      3798   6.3%
4      3695   6.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     40000  66.7%
Other Punctuation  20000  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      4552   11.4%
0      4505   11.3%
9      4479   11.2%
2      4044   10.1%
8      3937   9.8%
7      3886   9.7%
5      3863   9.7%
3      3798   9.5%
4      3695   9.2%
6      3241   8.1%
Other Punctuation
Value  Count  Frequency (%)
*      20000  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  60000  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
*      20000  33.3%
1      4552   7.6%
0      4505   7.5%
9      4479   7.5%
2      4044   6.7%
8      3937   6.6%
7      3886   6.5%
5      3863   6.4%
3      3798   6.3%
4      3695   6.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
*      20000  33.3%
1      4552   7.6%
0      4505   7.5%
9      4479   7.5%
2      4044   6.7%
8      3937   6.6%
7      3886   6.5%
5      3863   6.4%
3      3798   6.3%
4      3695   6.2%

장치일 (installation date)
DateTime

Distinct: 330
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2022-01-03 00:00:00
Maximum: 2022-12-30 00:00:00
[Figure: histogram with fixed-size bins (bins=50)]
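The distinct count can be read against the date range. The 330 distinct dates fall within a 362-day inclusive span, so roughly 91% of the calendar days in that range (weekends and holidays included) carry at least one record. Here is a quick check:

```python
from datetime import date

# Minimum and maximum 장치일 reported above
first, last = date(2022, 1, 3), date(2022, 12, 30)
distinct_dates = 330

span_days = (last - first).days + 1  # inclusive number of calendar days
coverage_pct = 100 * distinct_dates / span_days

print(span_days)               # 362
print(f"{coverage_pct:.1f}%")  # 91.2%
```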

구경 (meter diameter, mm)
Real number (ℝ)

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.1216
Minimum: 15
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 15
5th percentile: 15
Q1: 15
Median: 15
Q3: 15
95th percentile: 32
Maximum: 200
Range: 185
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.021949
Coefficient of variation (CV): 0.66340441
Kurtosis: 77.605085
Mean: 18.1216
Median absolute deviation (MAD): 0
Skewness: 7.6888416
Sum: 181216
Variance: 144.52727
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=10)]
Common values
Value  Count  Frequency (%)
15     8224   82.2%
20     647    6.5%
25     604    6.0%
40     203    2.0%
50     127    1.3%
32     73     0.7%
80     52     0.5%
100    39     0.4%
150    25     0.2%
200    6      0.1%
Minimum 10 values
Value  Count  Frequency (%)
15     8224   82.2%
20     647    6.5%
25     604    6.0%
32     73     0.7%
40     203    2.0%
50     127    1.3%
80     52     0.5%
100    39     0.4%
150    25     0.2%
200    6      0.1%
Maximum 10 values
Value  Count  Frequency (%)
200    6      0.1%
150    25     0.2%
100    39     0.4%
80     52     0.5%
50     127    1.3%
40     203    2.0%
32     73     0.7%
25     604    6.0%
20     647    6.5%
15     8224   82.2%
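Because 구경 takes only 10 distinct values, the frequency table is enough to recompute the headline statistics. The sketch below expands the counts back into individual observations. It assumes linear interpolation for quantiles and a sample (n - 1) variance. Under those assumptions it reproduces the reported mean (18.1216), variance (144.52727), quartiles (all 15) and 95th percentile (32).

```python
import statistics

# Value counts of 구경 taken from the frequency table above
counts = {15: 8224, 20: 647, 25: 604, 32: 73, 40: 203,
          50: 127, 80: 52, 100: 39, 150: 25, 200: 6}

# Expand the counts back into the individual observations
values = sorted(v for v, c in counts.items() for _ in range(c))

n = len(values)
mean = sum(values) / n
variance = statistics.variance(values)  # sample variance (n - 1)
ventiles = statistics.quantiles(values, n=20, method="inclusive")
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(n, mean)                         # 10000 18.1216
print(round(variance, 5))              # 144.52727
print(q1, median, q3, ventiles[18])    # 15.0 15.0 15.0 32.0
```

With 82.2% of meters at 15, every quartile collapses to 15. This is why the IQR and MAD are 0 even though the range is 185.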

계량기형식 (meter type)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
건식 (dry type): 8866
습식 (wet type): 1074
<NA>: 60

Length

Max length: 4
Median length: 2
Mean length: 2.012
Min length: 2
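These lengths are counted in characters (code points), not bytes. The mean length of 2.012 is the count-weighted average of the category lengths, and the four-character <NA> entries pull it slightly above 2. Here is a minimal check, assuming Python's len() matches the profiler's notion of length:

```python
# Category counts for 계량기형식, as listed in this section.
# len() counts Unicode code points: "건식" has length 2, "<NA>" has length 4.
counts = {"건식": 8866, "습식": 1074, "<NA>": 60}

total = sum(counts.values())
mean_length = sum(len(v) * c for v, c in counts.items()) / total
print(total, mean_length)  # 10000 2.012
```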

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건식
2nd row: 습식
3rd row: 건식
4th row: 건식
5th row: 건식

Common Values

Value  Count  Frequency (%)
건식   8866   88.7%
습식   1074   10.7%
<NA>   60     0.6%
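The imbalance percentages in the alerts can be reproduced with a simple formula: one minus the Shannon entropy of the category frequencies, divided by the log of the number of categories. Here <NA> is treated as an ordinary category, which is consistent with Missing being 0 for this variable. The measure below is inferred from the reported numbers rather than taken from ydata-profiling's source, and imbalance is a hypothetical name.

```python
import math


def imbalance(counts):
    """1 - H / ln(k), where H is the Shannon entropy (natural log) of the
    category frequencies and k is the number of categories.
    0 means all categories are equally frequent; values near 1 mean one dominates."""
    k = len(counts)
    if k < 2:
        raise ValueError("imbalance is undefined for fewer than two categories")
    total = sum(counts)
    entropy = -sum(c / total * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


meter_type = [8866, 1074, 60]             # 건식, 습식, <NA>
approval_id = [9932, 55, 6, 2, 2, 2, 1]   # <NA> plus six document IDs
print(f"{imbalance(meter_type):.1%}")     # 65.7%
print(f"{imbalance(approval_id):.1%}")    # 97.6%
```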

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Words
Value  Count  Frequency (%)
건식   8866   88.7%
습식   1074   10.7%
na     60     0.6%

결재처리문서아이디(ID) (approval document ID)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 9932
SUDO_309_20220600016: 55
SUDO_309_20220600017: 6
SUDO_309_20220500002: 2
SUDO_309_20220600014: 2
Other values (2): 3

Length

Max length: 20
Median length: 4
Mean length: 4.1088
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                 Count  Frequency (%)
<NA>                  9932   99.3%
SUDO_309_20220600016  55     0.5%
SUDO_309_20220600017  6      0.1%
SUDO_309_20220500002  2      < 0.1%
SUDO_309_20220600014  2      < 0.1%
SUDO_309_20220600010  2      < 0.1%
SUDO_309_20220400001  1      < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Words
Value                 Count  Frequency (%)
na                    9932   99.3%
sudo_309_20220600016  55     0.5%
sudo_309_20220600017  6      0.1%
sudo_309_20220500002  2      < 0.1%
sudo_309_20220600014  2      < 0.1%
sudo_309_20220600010  2      < 0.1%
sudo_309_20220400001  1      < 0.1%

진척번호 (progress number)
Text

MISSING 

Distinct: 4
Distinct (%): 100.0%
Missing: 9996
Missing (%): > 99.9%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 7
Mean length: 6.75
Min length: 6

Characters and Unicode

Total characters: 27
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 100.0%

Sample

1st row: 22-1921
2nd row: 22-2200
3rd row: 22-1947
4th row: 22-984

Words
Value    Count  Frequency (%)
22-1921  1      25.0%
22-2200  1      25.0%
22-1947  1      25.0%
22-984   1      25.0%

Most occurring characters

Value  Count  Frequency (%)
2      11     40.7%
-      4      14.8%
1      3      11.1%
9      3      11.1%
0      2      7.4%
4      2      7.4%
7      1      3.7%
8      1      3.7%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    23     85.2%
Dash Punctuation  4      14.8%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      11     47.8%
1      3      13.0%
9      3      13.0%
0      2      8.7%
4      2      8.7%
7      1      4.3%
8      1      4.3%
Dash Punctuation
Value  Count  Frequency (%)
-      4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  27     100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      11     40.7%
-      4      14.8%
1      3      11.1%
9      3      11.1%
0      2      7.4%
4      2      7.4%
7      1      3.7%
8      1      3.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  27     100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      11     40.7%
-      4      14.8%
1      3      11.1%
9      3      11.1%
0      2      7.4%
4      2      7.4%
7      1      3.7%
8      1      3.7%
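All four non-missing 진척번호 values are listed, so the character and category statistics above can be reproduced directly. The sketch uses Python's unicodedata module. One part is an assumption about how the report's labels were produced: the mapping from two-letter general-category codes (Nd, Pd, Po) to the long names shown in the report.

```python
import unicodedata
from collections import Counter

# The four non-missing 진척번호 values listed above
values = ["22-1921", "22-2200", "22-1947", "22-984"]

# Unicode general-category codes mapped to the names used in the report
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation",
                  "Po": "Other Punctuation"}

chars = "".join(values)
char_counts = Counter(chars)
category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for ch in chars
)

print(len(chars), len(char_counts))  # 27 8
print(char_counts.most_common(2))    # [('2', 11), ('-', 4)]
print(dict(category_counts))         # {'Decimal Number': 23, 'Dash Punctuation': 4}
```

The same lookup classifies the * in 고객번호 as Po (Other Punctuation), which matches that variable's category table.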

Interactions

[Figures: 4 interaction plots]

Correlations

[Figure: correlation heatmap]
                        연번   구경   계량기형식  결재처리문서아이디(ID)  진척번호
연번                    1.000  0.000  0.087       0.497                   1.000
구경                    0.000  1.000  0.333       0.264                   NaN
계량기형식              0.087  0.333  1.000       0.836                   1.000
결재처리문서아이디(ID)  0.497  0.264  0.836       1.000                   NaN
진척번호                1.000  NaN    1.000       NaN                     1.000
[Figure: correlation heatmap]
                        계량기형식  결재처리문서아이디(ID)
계량기형식              1.000       0.621
결재처리문서아이디(ID)  0.621       1.000
[Figure: correlation heatmap]
                        연번   구경   계량기형식  결재처리문서아이디(ID)
연번                    1.000  0.023  0.067       0.296
구경                    0.023  1.000  0.239       0.181
계량기형식              0.067  0.239  1.000       0.621
결재처리문서아이디(ID)  0.296  0.181  0.621       1.000
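The report does not name the coefficient behind each matrix. For a pair of categorical variables such as 계량기형식 and 결재처리문서아이디(ID), Cramér's V is one standard measure, computed from a contingency table of the two columns. That table is not included in this report, so the sketch below uses toy tables. It is the uncorrected textbook formula, while the profiler may apply a bias correction, so its output need not match the 0.621 or 0.836 shown above. cramers_v is a hypothetical name.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))).

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([[5, 0], [0, 5]]))  # 1.0  (perfect association)
print(cramers_v([[2, 4], [3, 6]]))  # 0.0  (independent)
```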

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

First rows
Index  연번   고객번호  장치일      구경  계량기형식  결재처리문서아이디(ID)  진척번호
37710  37711  *71*29    2022-09-15  15    건식        <NA>                    <NA>
25102  25103  *36*87    2022-02-04  32    습식        <NA>                    <NA>
3974   3975   *76*15    2022-09-29  15    건식        <NA>                    <NA>
1479   1480   *18*50    2022-10-18  15    건식        <NA>                    <NA>
21995  21996  *56*55    2022-05-30  15    건식        <NA>                    <NA>
12977  12978  *18*62    2022-05-17  15    건식        <NA>                    <NA>
2188   2189   *26*92    2022-06-13  40    건식        <NA>                    <NA>
42271  42272  *03*63    2022-10-21  40    습식        <NA>                    <NA>
8099   8100   *95*24    2022-07-02  15    건식        <NA>                    <NA>
40445  40446  *91*57    2022-11-09  15    건식        <NA>                    <NA>

Last rows
Index  연번   고객번호  장치일      구경  계량기형식  결재처리문서아이디(ID)  진척번호
13906  13907  *17*57    2022-03-10  15    건식        <NA>                    <NA>
45129  45130  *19*31    2022-12-12  32    건식        <NA>                    <NA>
36795  36796  *81*89    2022-09-13  15    건식        <NA>                    <NA>
38444  38445  *95*68    2022-12-13  15    건식        <NA>                    <NA>
18457  18458  *01*90    2022-01-11  15    건식        <NA>                    <NA>
3507   3508   *86*94    2022-10-05  40    습식        <NA>                    <NA>
14784  14785  *83*33    2022-07-11  15    건식        <NA>                    <NA>
39650  39651  *75*49    2022-11-23  40    습식        <NA>                    <NA>
28521  28522  *36*36    2022-03-11  15    건식        <NA>                    <NA>
17439  17440  *11*39    2022-05-09  15    건식        <NA>                    <NA>