Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 19953
Missing cells (%): 28.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
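
The derived figures in this block can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB equals 1024 bytes. The per-column missing counts are copied from the Alerts section; all variable names are illustrative and not part of the report.

```python
# Figures copied from the report; the formulas are assumptions.
n_variables = 7
n_observations = 10_000
missing_cells = 19_953
total_size_kib = 644.5

# Per-column missing counts as listed in the Alerts section.
missing_by_column = {"결재처리문서아이디(ID)": 9997, "진척번호": 9956}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-column counts add up:", sum(missing_by_column.values()) == missing_cells)
```

Under these assumptions the two sparse columns account for every missing cell in the dataset.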

Variable types

Numeric: 2
Text: 3
DateTime: 1
Categorical: 1

Dataset

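Description: 부산광역시상수도사업본부_수용가정보시스템_계량기정보_계량기변경이력정보_20220131 (Busan Metropolitan City Waterworks Headquarters, customer information system, meter information, meter change history, 20220131)
Author: 부산광역시 상수도사업본부 (Busan Metropolitan City Waterworks Headquarters)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15100348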

Alerts

계량기형식 (meter type) is highly imbalanced (82.1%) [Imbalance]
결재처리문서아이디(ID) (approval document ID) has 9997 (> 99.9%) missing values [Missing]
진척번호 (progress number) has 9956 (99.6%) missing values [Missing]
연번 (serial number) has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 16:15:35.309549
Analysis finished: 2023-12-10 16:15:36.721380
Duration: 1.41 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
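
The reported duration follows from the two timestamps. A minimal sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-10 16:15:35.309549")
finished = datetime.fromisoformat("2023-12-10 16:15:36.721380")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```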

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36479.525
Minimum: 14
Maximum: 73674
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 14
5th percentile: 3488.75
Q1: 17993
Median: 36206
Q3: 54972.75
95th percentile: 69731.4
Maximum: 73674
Range: 73660
Interquartile range (IQR): 36979.75

Descriptive statistics

Standard deviation: 21260.434
Coefficient of variation (CV): 0.58280458
Kurtosis: -1.2021913
Mean: 36479.525
Median Absolute Deviation (MAD): 18466
Skewness: 0.013469914
Sum: 3.6479525 × 10^8
Variance: 4.5200605 × 10^8
Monotonicity: Not monotonic
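
Several of these statistics are functions of the others, which makes them easy to sanity-check. A minimal sketch, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count); the inputs are the rounded values printed in the report, so the results agree only up to rounding.

```python
# Values copied from the report for the 연번 column.
n = 10_000
mean = 36479.525
std = 21260.434
minimum, maximum = 14, 73674
q1, q3 = 17993, 54972.75

value_range = maximum - minimum   # reported as 73660
iqr = q3 - q1                     # reported as 36979.75
cv = std / mean                   # reported as 0.58280458
variance = std ** 2               # reported as 4.5200605e8
total = mean * n                  # reported as 3.6479525e8

print(value_range, iqr, round(cv, 6), f"{variance:.7e}", f"{total:.7e}")
```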
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
55573 | 1 | < 0.1%
30614 | 1 | < 0.1%
47581 | 1 | < 0.1%
2882 | 1 | < 0.1%
15921 | 1 | < 0.1%
57616 | 1 | < 0.1%
10360 | 1 | < 0.1%
36393 | 1 | < 0.1%
23975 | 1 | < 0.1%
71280 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
14 | 1 | < 0.1%
22 | 1 | < 0.1%
23 | 1 | < 0.1%
40 | 1 | < 0.1%
45 | 1 | < 0.1%
46 | 1 | < 0.1%
54 | 1 | < 0.1%
55 | 1 | < 0.1%
87 | 1 | < 0.1%
88 | 1 | < 0.1%

Value | Count | Frequency (%)
73674 | 1 | < 0.1%
73665 | 1 | < 0.1%
73647 | 1 | < 0.1%
73646 | 1 | < 0.1%
73636 | 1 | < 0.1%
73631 | 1 | < 0.1%
73616 | 1 | < 0.1%
73612 | 1 | < 0.1%
73607 | 1 | < 0.1%
73599 | 1 | < 0.1%

고객번호 (customer number)
Text

Distinct: 3439
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 60000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1237
Unique (%): 12.4%

Sample

1st row: 19**85
2nd row: 45**64
3rd row: 39**18
4th row: 40**36
5th row: 25**77
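
The sample values suggest that the middle two characters of each identifier are masked with asterisks, which is consistent with the fixed length of 6 and with `*` making up a third of all characters. A minimal sketch of a format check, assuming every value follows the "two digits, two asterisks, two digits" shape seen in the sample; the pattern and the helper name `is_masked_id` are illustrative assumptions, not part of the report.

```python
import re

# Assumed shape: two ASCII digits, two asterisks, two ASCII digits (e.g. "19**85").
MASKED_ID = re.compile(r"[0-9]{2}\*{2}[0-9]{2}")

# Sample rows copied from the report.
samples = ["19**85", "45**64", "39**18", "40**36", "25**77"]


def is_masked_id(value: str) -> bool:
    """Return True if the whole string matches the assumed masked format."""
    return MASKED_ID.fullmatch(value) is not None


# Share of characters that are asterisks in the sample (2 of every 6).
star_share = sum(s.count("*") for s in samples) / sum(len(s) for s in samples)

print(all(is_masked_id(s) for s in samples), f"{star_share:.1%}")
```

Because the two middle characters are hidden, distinct customers can share the same masked value, so the distinct count for this column should not be read as a customer count.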

Value | Count | Frequency (%)
49**88 | 20 | 0.2%
49**11 | 20 | 0.2%
49**15 | 18 | 0.2%
50**76 | 17 | 0.2%
49**31 | 17 | 0.2%
49**73 | 17 | 0.2%
51**63 | 16 | 0.2%
50**44 | 16 | 0.2%
50**94 | 16 | 0.2%
50**05 | 16 | 0.2%
Other values (3429) | 9827 | 98.3%

Most occurring characters

Value | Count | Frequency (%)
* | 20000 | 33.3%
4 | 5273 | 8.8%
1 | 5111 | 8.5%
2 | 4985 | 8.3%
0 | 4914 | 8.2%
5 | 4633 | 7.7%
9 | 3764 | 6.3%
3 | 3645 | 6.1%
6 | 2718 | 4.5%
8 | 2552 | 4.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 40000 | 66.7%
Other Punctuation | 20000 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 5273 | 13.2%
1 | 5111 | 12.8%
2 | 4985 | 12.5%
0 | 4914 | 12.3%
5 | 4633 | 11.6%
9 | 3764 | 9.4%
3 | 3645 | 9.1%
6 | 2718 | 6.8%
8 | 2552 | 6.4%
7 | 2405 | 6.0%

Other Punctuation
Value | Count | Frequency (%)
* | 20000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 60000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
* | 20000 | 33.3%
4 | 5273 | 8.8%
1 | 5111 | 8.5%
2 | 4985 | 8.3%
0 | 4914 | 8.2%
5 | 4633 | 7.7%
9 | 3764 | 6.3%
3 | 3645 | 6.1%
6 | 2718 | 4.5%
8 | 2552 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 60000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
* | 20000 | 33.3%
4 | 5273 | 8.8%
1 | 5111 | 8.5%
2 | 4985 | 8.3%
0 | 4914 | 8.2%
5 | 4633 | 7.7%
9 | 3764 | 6.3%
3 | 3645 | 6.1%
6 | 2718 | 4.5%
8 | 2552 | 4.3%

장치일 (installation date)
DateTime

Distinct: 335
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-01-01 00:00:00
Maximum: 2021-12-31 00:00:00
Histogram with fixed size bins (bins=50)

구경 (bore size)
Real number (ℝ)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.4716
Minimum: 15
Maximum: 300
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 15
5th percentile: 15
Q1: 15
Median: 15
Q3: 15
95th percentile: 25
Maximum: 300
Range: 285
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 13.930312
Coefficient of variation (CV): 0.75414757
Kurtosis: 109.53858
Mean: 18.4716
Median Absolute Deviation (MAD): 0
Skewness: 8.8823022
Sum: 184716
Variance: 194.0536
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
15 | 7927 | 79.3%
20 | 987 | 9.9%
25 | 599 | 6.0%
40 | 160 | 1.6%
50 | 109 | 1.1%
80 | 78 | 0.8%
32 | 58 | 0.6%
100 | 44 | 0.4%
150 | 25 | 0.2%
200 | 7 | 0.1%
Other values (2) | 6 | 0.1%

Value | Count | Frequency (%)
15 | 7927 | 79.3%
20 | 987 | 9.9%
25 | 599 | 6.0%
32 | 58 | 0.6%
40 | 160 | 1.6%
50 | 109 | 1.1%
80 | 78 | 0.8%
100 | 44 | 0.4%
150 | 25 | 0.2%
200 | 7 | 0.1%

Value | Count | Frequency (%)
300 | 2 | < 0.1%
250 | 4 | < 0.1%
200 | 7 | 0.1%
150 | 25 | 0.2%
100 | 44 | 0.4%
80 | 78 | 0.8%
50 | 109 | 1.1%
40 | 160 | 1.6%
32 | 58 | 0.6%
25 | 599 | 6.0%
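
Because this column takes only 12 distinct values, its summary statistics can be rebuilt from the frequency counts listed above. A minimal sketch, assuming the reported variance is the sample variance (divisor n − 1); that divisor is an assumption about how the report computes it, and the dictionary name `bore_counts` is illustrative.

```python
# Value -> count, copied from the frequency tables for the 구경 column.
bore_counts = {
    15: 7927, 20: 987, 25: 599, 32: 58, 40: 160, 50: 109,
    80: 78, 100: 44, 150: 25, 200: 7, 250: 4, 300: 2,
}

n = sum(bore_counts.values())
total = sum(value * count for value, count in bore_counts.items())
mean = total / n

# Sample variance (divisor n - 1); an assumption, see the note above.
variance = sum(count * (value - mean) ** 2 for value, count in bore_counts.items()) / (n - 1)
std = variance ** 0.5

print(n, total, mean, round(variance, 4), round(std, 6))
```

With nearly four in five rows at the value 15, the median, both quartiles and the 5th percentile all collapse to 15, which is why the IQR and MAD are reported as 0 while the few large values drive the skewness and kurtosis.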

계량기형식 (meter type)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
건식 (dry type): 9564, 습식 (wet type): 376, <NA>: 60

Length

Max length: 4
Median length: 2
Mean length: 2.012
Min length: 2
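
The maximum length of 4 and the mean length of 2.012 are only reachable if the 60 `<NA>` entries are counted as four-character strings, which suggests the profiler treated `<NA>` as a category value rather than as a missing value (the column reports 0 missing and 3 distinct values). A minimal sketch of that check, with counts copied from the report; the helper `to_missing` is a hypothetical clean-up step, not something the report performs.

```python
# Category -> count, copied from the report for the 계량기형식 column.
category_counts = {"건식": 9564, "습식": 376, "<NA>": 60}

n = sum(category_counts.values())
mean_length = sum(len(value) * count for value, count in category_counts.items()) / n
max_length = max(len(value) for value in category_counts)

print(mean_length, max_length)


def to_missing(value):
    """Hypothetical clean-up: map the literal string "<NA>" to None."""
    return None if value == "<NA>" else value
```

If the placeholder is mapped to a real missing value before profiling, this column would be expected to show 60 missing cells and 2 distinct categories instead.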

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건식
2nd row: 건식
3rd row: 건식
4th row: 건식
5th row: 건식

Common Values

Value | Count | Frequency (%)
건식 (dry type) | 9564 | 95.6%
습식 (wet type) | 376 | 3.8%
<NA> | 60 | 0.6%
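
The report does not state how the 82.1% imbalance figure in the Alerts section is derived. One formula that reproduces it from the counts above is one minus the Shannon entropy of the class distribution, normalised by log2 of the number of classes. The sketch below assumes that definition; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """1 - H(p) / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts copied from the Common Values table (건식, 습식, <NA>).
score = imbalance_score([9564, 376, 60])
print(f"{score:.1%}")
```

Under this definition a perfectly balanced column scores 0, so a value above 0.8 indicates that a single category carries almost all rows.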

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
건식 | 9564 | 95.6%
습식 | 376 | 3.8%
na | 60 | 0.6%

결재처리문서아이디(ID) (approval document ID)
Text

MISSING

Distinct: 2
Distinct (%): 66.7%
Missing: 9997
Missing (%): > 99.9%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 60
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 33.3%

Sample

1st row: SUDO_301_20210300161
2nd row: SUDO_301_20210300161
3rd row: SUDO_301_20210300160

Value | Count | Frequency (%)
sudo_301_20210300161 | 2 | 66.7%
sudo_301_20210300160 | 1 | 33.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 16 | 26.7%
1 | 11 | 18.3%
_ | 6 | 10.0%
3 | 6 | 10.0%
2 | 6 | 10.0%
S | 3 | 5.0%
U | 3 | 5.0%
D | 3 | 5.0%
O | 3 | 5.0%
6 | 3 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 42 | 70.0%
Uppercase Letter | 12 | 20.0%
Connector Punctuation | 6 | 10.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 16 | 38.1%
1 | 11 | 26.2%
3 | 6 | 14.3%
2 | 6 | 14.3%
6 | 3 | 7.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 3 | 25.0%
U | 3 | 25.0%
D | 3 | 25.0%
O | 3 | 25.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 48 | 80.0%
Latin | 12 | 20.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 16 | 33.3%
1 | 11 | 22.9%
_ | 6 | 12.5%
3 | 6 | 12.5%
2 | 6 | 12.5%
6 | 3 | 6.2%

Latin
Value | Count | Frequency (%)
S | 3 | 25.0%
U | 3 | 25.0%
D | 3 | 25.0%
O | 3 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 60 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 16 | 26.7%
1 | 11 | 18.3%
_ | 6 | 10.0%
3 | 6 | 10.0%
2 | 6 | 10.0%
S | 3 | 5.0%
U | 3 | 5.0%
D | 3 | 5.0%
O | 3 | 5.0%
6 | 3 | 5.0%

진척번호 (progress number)
Text

MISSING

Distinct: 44
Distinct (%): 100.0%
Missing: 9956
Missing (%): 99.6%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 6
Mean length: 5.5227273
Min length: 1

Characters and Unicode

Total characters: 243
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 100.0%

Sample

1st row: 21-278
2nd row: 21-33
3rd row: 21-244
4th row: 21-511
5th row: 21-773

Value | Count | Frequency (%)
21-278 | 1 | 2.3%
21-33 | 1 | 2.3%
21-58 | 1 | 2.3%
21-912 | 1 | 2.3%
21-162 | 1 | 2.3%
21-470 | 1 | 2.3%
553 | 1 | 2.3%
21-483 | 1 | 2.3%
683 | 1 | 2.3%
21-588 | 1 | 2.3%
Other values (34) | 34 | 77.3%

Most occurring characters

Value | Count | Frequency (%)
2 | 55 | 22.6%
1 | 49 | 20.2%
- | 39 | 16.0%
3 | 19 | 7.8%
4 | 16 | 6.6%
5 | 14 | 5.8%
7 | 13 | 5.3%
6 | 13 | 5.3%
8 | 10 | 4.1%
0 | 7 | 2.9%
Other values (2) | 8 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 203 | 83.5%
Dash Punctuation | 39 | 16.0%
Other Punctuation | 1 | 0.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 55 | 27.1%
1 | 49 | 24.1%
3 | 19 | 9.4%
4 | 16 | 7.9%
5 | 14 | 6.9%
7 | 13 | 6.4%
6 | 13 | 6.4%
8 | 10 | 4.9%
0 | 7 | 3.4%
9 | 7 | 3.4%

Dash Punctuation
Value | Count | Frequency (%)
- | 39 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 243 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 55 | 22.6%
1 | 49 | 20.2%
- | 39 | 16.0%
3 | 19 | 7.8%
4 | 16 | 6.6%
5 | 14 | 5.8%
7 | 13 | 5.3%
6 | 13 | 5.3%
8 | 10 | 4.1%
0 | 7 | 2.9%
Other values (2) | 8 | 3.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 243 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 55 | 22.6%
1 | 49 | 20.2%
- | 39 | 16.0%
3 | 19 | 7.8%
4 | 16 | 6.6%
5 | 14 | 5.8%
7 | 13 | 5.3%
6 | 13 | 5.3%
8 | 10 | 4.1%
0 | 7 | 2.9%
Other values (2) | 8 | 3.3%

Correlations

Variable | 연번 | 구경 | 계량기형식 | 결재처리문서아이디(ID) | 진척번호
연번 | 1.000 | 0.056 | 0.097 | 1.000 | 1.000
구경 | 0.056 | 1.000 | 0.216 | NaN | NaN
계량기형식 | 0.097 | 0.216 | 1.000 | NaN | 1.000
결재처리문서아이디(ID) | 1.000 | NaN | NaN | 1.000 | NaN
진척번호 | 1.000 | NaN | 1.000 | NaN | 1.000

Variable | 연번 | 구경 | 계량기형식
연번 | 1.000 | 0.057 | 0.074
구경 | 0.057 | 1.000 | 0.155
계량기형식 | 0.074 | 0.155 | 1.000

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.

Sample

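index | 연번 | 고객번호 | 장치일 | 구경 | 계량기형식 | 결재처리문서아이디(ID) | 진척번호
55572 | 55573 | 19**85 | 2021-08-27 | 15 | 건식 | <NA> | <NA>
28604 | 28605 | 45**64 | 2021-08-02 | 15 | 건식 | <NA> | <NA>
4144 | 4145 | 39**18 | 2021-06-07 | 15 | 건식 | <NA> | <NA>
3215 | 3216 | 40**36 | 2021-06-07 | 15 | 건식 | <NA> | <NA>
51040 | 51041 | 25**77 | 2021-10-08 | 15 | 건식 | <NA> | <NA>
16872 | 16873 | 49**37 | 2021-08-30 | 15 | 건식 | <NA> | <NA>
53136 | 53137 | 40**24 | 2021-07-20 | 15 | 건식 | <NA> | <NA>
27094 | 27095 | 45**14 | 2021-10-26 | 15 | 건식 | <NA> | <NA>
47450 | 47451 | 15**71 | 2021-04-22 | 15 | 건식 | <NA> | <NA>
60644 | 60645 | 45**86 | 2021-08-10 | 15 | 건식 | <NA> | <NA>

index | 연번 | 고객번호 | 장치일 | 구경 | 계량기형식 | 결재처리문서아이디(ID) | 진척번호
33872 | 33873 | 13**33 | 2021-07-05 | 25 | 습식 | <NA> | 21-1277
7091 | 7092 | 39**15 | 2021-05-11 | 15 | 건식 | <NA> | <NA>
19541 | 19542 | 41**47 | 2021-11-11 | 15 | 건식 | <NA> | <NA>
63073 | 63074 | 26**67 | 2021-11-29 | 15 | 건식 | <NA> | <NA>
46346 | 46347 | 14**00 | 2021-10-20 | 15 | 건식 | <NA> | <NA>
69313 | 69314 | 30**23 | 2021-12-01 | 25 | 습식 | <NA> | <NA>
69935 | 69936 | 01**49 | 2021-11-05 | 15 | 건식 | <NA> | <NA>
38646 | 38647 | 17**02 | 2021-04-29 | 15 | 건식 | <NA> | <NA>
36923 | 36924 | 29**18 | 2021-06-17 | 15 | 건식 | <NA> | <NA>
760 | 761 | 50**93 | 2021-05-11 | 15 | 건식 | <NA> | <NA>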