Overview

Dataset statistics

Number of variables: 9
Number of observations: 2512
Missing cells: 7536
Missing cells (%): 33.3%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 193.9 KiB
Average record size in memory: 79.1 B
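
The missing-cell figures can be checked by hand. The sketch below assumes, as the alerts report, that every missing cell comes from the three columns that are 100% missing; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics.
n_variables = 9
n_observations = 2512

# Assumption: all missing cells come from the three fully empty columns.
n_empty_columns = 3

total_cells = n_variables * n_observations
missing_cells = n_empty_columns * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 7536
print(f"{missing_pct:.1f}%")  # 33.3%
```

Both results match the reported values, which suggests the six named columns contain no missing cells at all.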

Variable types

Numeric: 3
Categorical: 2
Text: 1
Unsupported: 3

Dataset

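Description: Advance payment settlement (선급금정산) data from the Gyeongsangnam-do construction contract ledger system. It includes fields such as construction year (공사년도), contract type (공사구분), submission date (제출일자), and settlement amount (정산금액).
Author: Gyeongsangnam-do (경상남도)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15049521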

Alerts

부서코드 has constant value "1" (Constant)
Dataset has 1 (< 0.1%) duplicate rows (Duplicates)
공사구분 is highly imbalanced (63.0%) (Imbalance)
Unnamed: 6 has 2512 (100.0%) missing values (Missing)
Unnamed: 7 has 2512 (100.0%) missing values (Missing)
Unnamed: 8 has 2512 (100.0%) missing values (Missing)
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
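
The 63.0% imbalance reported for 공사구분 is consistent with one minus the normalized Shannon entropy of the class shares. The sketch below reproduces the figure from the category counts reported for that variable (2165, 345, 2). The formula is an assumption about how the score was derived, and `imbalance_score` is a hypothetical helper, not part of any library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class shares and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalize
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Category counts for 공사구분 as listed in the report: 공사, 용역, 구매.
counts = [2165, 345, 2]
score = imbalance_score(counts)
print(f"{score:.1%}")  # 63.0%
```

A score of 0 would mean perfectly balanced classes and a score near 1 would mean almost all rows fall in one class.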

Reproduction

Analysis started: 2023-12-11 00:38:12.918941
Analysis finished: 2023-12-11 00:38:14.265340
Duration: 1.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

공사년도
Real number (ℝ)

Distinct: 29
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.6286
Minimum: 1990
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.2 KiB

Quantile statistics

Minimum: 1990
5-th percentile: 1992
Q1: 2003
Median: 2009
Q3: 2011
95-th percentile: 2016
Maximum: 2019
Range: 29
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 7.0183939
Coefficient of variation (CV): 0.0034976049
Kurtosis: -0.040319738
Mean: 2006.6286
Median Absolute Deviation (MAD): 4
Skewness: -0.8594025
Sum: 5040651
Variance: 49.257853
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=29)]
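
Several of the 공사년도 statistics are simple functions of one another. As a quick consistency check, the sketch below recomputes the mean, variance, coefficient of variation, range, and IQR from the other reported figures; the inputs are copied from the report, so small differences are rounding effects.

```python
# Inputs copied from the report for 공사년도.
n = 2512                      # number of observations
total = 5040651               # Sum
std = 7.0183939               # Standard deviation
minimum, q1, q3, maximum = 1990, 2003, 2011, 2019

mean = total / n              # Mean = Sum / n
variance = std ** 2           # Variance = (standard deviation)^2
cv = std / mean               # Coefficient of variation = std / mean
value_range = maximum - minimum
iqr = q3 - q1

print(round(mean, 4))         # 2006.6286
print(value_range, iqr)       # 29 8
print(variance, cv)
```

The recomputed variance and CV agree with the reported 49.257853 and 0.0034976049 to within the precision of the rounded inputs.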
Common values

Value               Count   Frequency (%)
2010                243     9.7%
2009                241     9.6%
2012                193     7.7%
2011                184     7.3%
2008                160     6.4%
2007                134     5.3%
2015                125     5.0%
2003                109     4.3%
2014                92      3.7%
2005                91      3.6%
Other values (19)   940     37.4%

Minimum 10 values

Value   Count   Frequency (%)
1990    61      2.4%
1991    60      2.4%
1992    51      2.0%
1993    73      2.9%
1994    61      2.4%
1995    10      0.4%
1997    1       < 0.1%
1998    1       < 0.1%
1999    60      2.4%
2000    83      3.3%

Maximum 10 values

Value   Count   Frequency (%)
2019    2       0.1%
2018    29      1.2%
2017    26      1.0%
2016    74      2.9%
2015    125     5.0%
2014    92      3.7%
2013    76      3.0%
2012    193     7.7%
2011    184     7.3%
2010    243     9.7%

공사구분
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

공사: 2165
용역: 345
구매: 2

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 공사
2nd row: 공사
3rd row: 공사
4th row: 공사
5th row: 공사

Common Values

Value   Count   Frequency (%)
공사    2165    86.2%
용역    345     13.7%
구매    2       0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

공사번호
Real number (ℝ)

Distinct: 377
Distinct (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.0211
Minimum: 1
Maximum: 617
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 33
Median: 65
Q3: 114
95-th percentile: 416
Maximum: 617
Range: 616
Interquartile range (IQR): 81

Descriptive statistics

Standard deviation: 116.27908
Coefficient of variation (CV): 1.1178413
Kurtosis: 3.771799
Mean: 104.0211
Median Absolute Deviation (MAD): 40
Skewness: 2.0740245
Sum: 261301
Variance: 13520.824
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                Count   Frequency (%)
113                  33      1.3%
42                   30      1.2%
40                   28      1.1%
20                   28      1.1%
39                   28      1.1%
7                    27      1.1%
12                   27      1.1%
36                   27      1.1%
105                  27      1.1%
58                   26      1.0%
Other values (367)   2231    88.8%

Minimum 10 values

Value   Count   Frequency (%)
1       22      0.9%
2       13      0.5%
3       24      1.0%
4       11      0.4%
5       15      0.6%
6       9       0.4%
7       27      1.1%
8       24      1.0%
9       20      0.8%
10      22      0.9%

Maximum 10 values

Value   Count   Frequency (%)
617     1       < 0.1%
596     1       < 0.1%
594     1       < 0.1%
582     1       < 0.1%
564     1       < 0.1%
554     2       0.1%
543     2       0.1%
530     1       < 0.1%
529     1       < 0.1%
527     1       < 0.1%

부서코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

1: 2512

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       2512    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

제출일자
Text

Distinct: 1465
Distinct (%): 58.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.9164013
Min length: 4

Characters and Unicode

Total characters: 24910
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 983
Unique (%): 39.1%

Sample

1st row: 1990-12-15
2nd row: 1990-12-29
3rd row: 1990-12-07
4th row: 199012
5th row: 1990-04-06
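
The sample shows that 제출일자 mixes at least two layouts (`1990-12-15` and `199012`), which is why the column is profiled as text rather than as a date. A minimal normalization sketch follows. It assumes the only layouts are `YYYY-MM-DD`, `YYYYMM`, and a four-character year (suggested by the minimum length of 4 but not confirmed by the report); `parse_submission_date` is a hypothetical helper.

```python
from datetime import date, datetime

# Candidate layouts, tried in order. Only the first two are visible in the
# sample rows; the year-only layout is an assumption based on "Min length: 4".
FORMATS = ("%Y-%m-%d", "%Y%m", "%Y")


def parse_submission_date(text):
    """Parse a 제출일자 value into a date, or return None if no layout matches.

    Missing month/day components default to 1, so "199012" becomes 1990-12-01.
    """
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


for raw in ("1990-12-15", "199012", "not a date"):
    print(raw, "->", parse_submission_date(raw))
```

Returning `None` for unparseable values keeps them visible for manual review instead of silently coercing them.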
Value                 Count   Frequency (%)
2009-12-30            20      0.8%
2010-12-30            17      0.7%
2007-12-28            15      0.6%
2009-12-29            15      0.6%
2012-12-28            14      0.6%
2009-06-25            12      0.5%
2010-06-28            11      0.4%
2010-06-29            11      0.4%
2008-12-29            11      0.4%
2015-12-23            10      0.4%
Other values (1455)   2376    94.6%

Most occurring characters

Value   Count   Frequency (%)
0       6178    24.8%
-       4920    19.8%
2       4493    18.0%
1       3745    15.0%
9       1695    6.8%
6       773     3.1%
3       728     2.9%
5       613     2.5%
8       602     2.4%
7       590     2.4%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     19990   80.2%
Dash Punctuation   4920    19.8%
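
The character-level figures for 제출일자 can be cross-checked against each other. The sketch below uses only numbers reported for this variable and verifies that the two Unicode categories account for every character, that their shares match, and that the mean length equals total characters divided by the number of observations.

```python
# Figures copied from the report for 제출일자.
n_observations = 2512
total_characters = 24910
decimal_numbers = 19990    # Unicode category "Decimal Number"
dash_punctuation = 4920    # Unicode category "Dash Punctuation"

# The two categories should account for every character.
assert decimal_numbers + dash_punctuation == total_characters

digit_share = 100 * decimal_numbers / total_characters
dash_share = 100 * dash_punctuation / total_characters
mean_length = total_characters / n_observations

print(f"{digit_share:.1f}% digits, {dash_share:.1f}% dashes")  # 80.2% digits, 19.8% dashes
print(round(mean_length, 7))                                   # 9.9164013
```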

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
0       6178    30.9%
2       4493    22.5%
1       3745    18.7%
9       1695    8.5%
6       773     3.9%
3       728     3.6%
5       613     3.1%
8       602     3.0%
7       590     3.0%
4       573     2.9%

Dash Punctuation

Value   Count   Frequency (%)
-       4920    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   24910   100.0%

Most frequent character per script

Common

Value   Count   Frequency (%)
0       6178    24.8%
-       4920    19.8%
2       4493    18.0%
1       3745    15.0%
9       1695    6.8%
6       773     3.1%
3       728     2.9%
5       613     2.5%
8       602     2.4%
7       590     2.4%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   24910   100.0%

Most frequent character per block

ASCII

Value   Count   Frequency (%)
0       6178    24.8%
-       4920    19.8%
2       4493    18.0%
1       3745    15.0%
9       1695    6.8%
6       773     3.1%
3       728     2.9%
5       613     2.5%
8       602     2.4%
7       590     2.4%

정산금액
Real number (ℝ)

Distinct: 1889
Distinct (%): 75.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7389131 × 10^8
Minimum: -9.68 × 10^8
Maximum: 8.568216 × 10^9
Zeros: 17
Zeros (%): 0.7%
Negative: 1
Negative (%): < 0.1%
Memory size: 22.2 KiB

Quantile statistics

Minimum: -9.68 × 10^8
5-th percentile: 20000000
Q1: 61000000
Median: 1.3081599 × 10^8
Q3: 2.7979 × 10^8
95-th percentile: 8.424625 × 10^8
Maximum: 8.568216 × 10^9
Range: 9.536216 × 10^9
Interquartile range (IQR): 2.1879 × 10^8

Descriptive statistics

Standard deviation: 5.7660852 × 10^8
Coefficient of variation (CV): 2.1052458
Kurtosis: 81.941892
Mean: 2.7389131 × 10^8
Median Absolute Deviation (MAD): 86815988
Skewness: 7.8059608
Sum: 6.8801496 × 10^11
Variance: 3.3247739 × 10^17
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
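
The high skewness and kurtosis of 정산금액 point to a heavy right tail. One common way to quantify it is Tukey's 1.5 × IQR rule, which the report itself does not apply; the sketch below computes the fences from the reported quartiles and compares them with the reported extremes.

```python
# Quantiles copied from the report for 정산금액 (in won).
q1 = 61_000_000
q3 = 279_790_000
p95 = 842_462_500
minimum = -968_000_000

# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(lower_fence, upper_fence)  # -267185000.0 607975000.0
print(p95 > upper_fence)         # True
print(minimum < lower_fence)     # True
```

Because the 95th percentile already lies above the upper fence, this rule would flag more than 5% of rows as high outliers, and the single negative amount falls below the lower fence.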
Common values

Value                 Count   Frequency (%)
300000000             34      1.4%
200000000             21      0.8%
50000000              18      0.7%
0                     17      0.7%
100000000             15      0.6%
40000000              13      0.5%
80000000              13      0.5%
150000000             12      0.5%
250000000             12      0.5%
120000000             10      0.4%
Other values (1879)   2347    93.4%

Minimum 10 values

Value        Count   Frequency (%)
-968000000   1       < 0.1%
0            17      0.7%
280000       1       < 0.1%
300000       1       < 0.1%
840000       1       < 0.1%
1167910      1       < 0.1%
1760000      1       < 0.1%
2670000      1       < 0.1%
3500000      1       < 0.1%
4074750      1       < 0.1%

Maximum 10 values

Value        Count   Frequency (%)
8568216000   1       < 0.1%
8448000000   1       < 0.1%
7654240000   1       < 0.1%
7345700000   1       < 0.1%
7273200000   1       < 0.1%
5976520000   1       < 0.1%
5941350000   1       < 0.1%
5896000000   1       < 0.1%
5451160000   1       < 0.1%
4531950000   1       < 0.1%

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2512
Missing (%): 100.0%
Memory size: 22.2 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2512
Missing (%): 100.0%
Memory size: 22.2 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2512
Missing (%): 100.0%
Memory size: 22.2 KiB
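
The three `Unnamed` columns are entirely missing and 부서코드 is constant, so none of them carries information. A minimal cleaning sketch using only the standard library is shown below. The row layout (a list of dicts, as `csv.DictReader` would yield) and the helper name `drop_uninformative_columns` are assumptions for illustration; the example rows are abridged from the report's sample.

```python
MISSING = {None, ""}


def drop_uninformative_columns(rows):
    """Drop columns that are entirely missing or hold one constant value."""
    if not rows:
        return []
    keep = []
    for col in rows[0]:
        values = {row[col] for row in rows}
        all_missing = not (values - MISSING)
        constant = len(values) == 1
        if not (all_missing or constant):
            keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


# Three rows abridged from the report's sample (first two and last-but-one).
rows = [
    {"공사년도": 1990, "공사번호": 1, "부서코드": "1", "Unnamed: 6": None},
    {"공사년도": 1990, "공사번호": 4, "부서코드": "1", "Unnamed: 6": None},
    {"공사년도": 2019, "공사번호": 1, "부서코드": "1", "Unnamed: 6": None},
]
cleaned = drop_uninformative_columns(rows)
print(list(cleaned[0].keys()))  # ['공사년도', '공사번호']
```

Whether a constant column should really be dropped depends on the downstream use; it may still matter when this file is merged with data from other departments.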

Interactions

[Figures: nine interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap; image not preserved]

            공사년도   공사구분   공사번호   정산금액
공사년도    1.000      0.318      0.548      0.158
공사구분    0.318      1.000      0.564      0.024
공사번호    0.548      0.564      1.000      0.020
정산금액    0.158      0.024      0.020      1.000

[Figure: correlation heatmap; image not preserved]

            공사년도   공사번호   정산금액   공사구분
공사년도    1.000      0.280      0.062      0.205
공사번호    0.280      1.000      0.011      0.407
정산금액    0.062      0.011      1.000      0.000
공사구분    0.205      0.407      0.000      1.000

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index   공사년도   공사구분   공사번호   부서코드   제출일자     정산금액    Unnamed: 6   Unnamed: 7   Unnamed: 8
0       1990       공사       1          1          1990-12-15   53400000    <NA>         <NA>         <NA>
1       1990       공사       4          1          1990-12-29   40850000    <NA>         <NA>         <NA>
2       1990       공사       2          1          1990-12-07   98000000    <NA>         <NA>         <NA>
3       1990       공사       2          1          199012       61900000    <NA>         <NA>         <NA>
4       1990       공사       1          1          1990-04-06   35600000    <NA>         <NA>         <NA>
5       1990       공사       4          1          1990-09-26   48000000    <NA>         <NA>         <NA>
6       1990       공사       3          1          199012       140090000   <NA>         <NA>         <NA>
7       1990       공사       3          1          1990-04-11   101600000   <NA>         <NA>         <NA>
8       1990       공사       7          1          1990-12-31   59000000    <NA>         <NA>         <NA>
9       1990       공사       7          1          1990-05-03   82000000    <NA>         <NA>         <NA>

Last rows

Index   공사년도   공사구분   공사번호   부서코드   제출일자     정산금액    Unnamed: 6   Unnamed: 7   Unnamed: 8
2502    2018       공사       133        1          2019-04-01   150754000   <NA>         <NA>         <NA>
2503    2018       공사       134        1          2019-03-26   38500000    <NA>         <NA>         <NA>
2504    2018       공사       133        1          2018-12-19   100000000   <NA>         <NA>         <NA>
2505    2018       공사       133        1          2019-01-31   109246000   <NA>         <NA>         <NA>
2506    2018       공사       134        1          2018-12-21   77000000    <NA>         <NA>         <NA>
2507    2018       공사       140        1          2018-12-24   80000000    <NA>         <NA>         <NA>
2508    2018       공사       149        1          2019-05-13   110000000   <NA>         <NA>         <NA>
2509    2018       공사       150        1          2019-03-11   12882000    <NA>         <NA>         <NA>
2510    2019       공사       1          1          2019-05-29   47000000    <NA>         <NA>         <NA>
2511    2019       공사       60         1          2019-07-31   80000000    <NA>         <NA>         <NA>

Duplicate rows

Most frequently occurring

Index   공사년도   공사구분   공사번호   부서코드   제출일자     정산금액   # duplicates
0       2005       용역       77         1          2006-08-31   54000000   2
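
The table lists one record that occurs twice, which matches the single duplicate row counted in the overview (one extra copy). A minimal sketch of that kind of check is shown below; the helper name `duplicate_rows` is hypothetical, and the third row is taken from the end of the sample purely as a non-duplicate for contrast.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    (2005, "용역", 77, 1, "2006-08-31", 54000000),
    (2005, "용역", 77, 1, "2006-08-31", 54000000),  # exact repeat of the row above
    (2019, "공사", 60, 1, "2019-07-31", 80000000),
]
dupes = duplicate_rows(rows)
extra_copies = sum(n - 1 for n in dupes.values())

print(dupes)
print(extra_copies)  # 1
```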