Overview

Dataset statistics

Number of variables: 7
Number of observations: 45
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): 13.3%
Total size in memory: 2.7 KiB
Average record size in memory: 61.9 B

Variable types

DateTime: 3
Categorical: 2
Numeric: 2

Dataset

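Description: Information on forest-tending (숲가꾸기) projects carried out in public and private forests, as recorded in the Private Forest Work Support Portal (사유림업무지원포털), including the contract start date, contract end date, contract deposit, late-penalty rate, and service amount.
Author: Korea Forest Service (산림청)
URL: https://www.data.go.kr/data/15071634/fileData.do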

Alerts

Duplicates: Dataset has 6 (13.3%) duplicate rows
High correlation: 계약보증금 is highly overall correlated with 용역금액 and 1 other field
High correlation: 용역금액 is highly overall correlated with 계약보증금 and 1 other field
High correlation: 지체상금율 is highly overall correlated with 계약보증금 and 1 other field
Imbalance: 지체상금율 is highly imbalanced (84.6%)
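The 84.6% imbalance figure can be reproduced from the category counts reported for 지체상금율 (44 rows at 0.13, 1 row at 0.8). The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. The function name `imbalance_score` is illustrative and does not come from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for class counts (0 = perfectly balanced, 1 = one class only)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no distribution to balance
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Counts taken from the Common Values table for 지체상금율.
score = imbalance_score([44, 1])
print(f"{score:.1%}")  # prints 84.6%
```

Under this assumed definition, a 50/50 split would score 0% and a column with almost all rows in one class approaches 100%.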

Reproduction

Analysis started: 2023-12-12 09:43:10.343657
Analysis finished: 2023-12-12 09:43:11.247270
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
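As a quick consistency check, the reported duration follows from the two timestamps above. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the report timestamps
started = datetime.strptime("2023-12-12 09:43:10.343657", FMT)
finished = datetime.strptime("2023-12-12 09:43:11.247270", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")  # prints 0.9 seconds
```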

Variables

계약일자 (contract date)
DateTime

Distinct: 17
Distinct (%): 37.8%
Missing: 0
Missing (%): 0.0%
Memory size: 492.0 B
Minimum: 2018-06-15 00:00:00
Maximum: 2020-06-10 00:00:00
[Figure: histogram with fixed size bins (bins=17)]

계약시작일 (contract start date)
DateTime

Distinct: 19
Distinct (%): 42.2%
Missing: 0
Missing (%): 0.0%
Memory size: 492.0 B
Minimum: 2018-06-15 00:00:00
Maximum: 2020-06-11 00:00:00
[Figure: histogram with fixed size bins (bins=19)]

계약종료일 (contract end date)
DateTime

Distinct: 16
Distinct (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 492.0 B
Minimum: 2018-06-29 00:00:00
Maximum: 2020-08-12 00:00:00
[Figure: histogram with fixed size bins (bins=16)]

용역업체대표자명 (contractor representative name)
Categorical

Distinct: 16
Distinct (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 492.0 B
Top categories: 이** (13), 김** (13), 오**, 용**, 박**; Other values (11): 12

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 10
Unique (%): 22.2%

Sample

1st row: 신**
2nd row: 이**
3rd row: 이**
4th row: 노**
5th row: 이**

Common Values

Value             Count  Frequency (%)
이**              13     28.9%
김**              13     28.9%
오**              3      6.7%
용**              2      4.4%
박**              2      4.4%
황**              2      4.4%
신**              1      2.2%
노**              1      2.2%
유**              1      2.2%
홍**              1      2.2%
Other values (6)  6      13.3%
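The report distinguishes "Distinct" (number of different values) from "Unique" (values that occur exactly once). The sketch below rebuilds both figures from the Common Values counts. The six "Other values" are not named in the report, so the labels `other_1` to `other_6` are hypothetical placeholders; since six values share a total count of 6, each must occur once.

```python
from collections import Counter

# Counts copied from the Common Values table for 용역업체대표자명.
counts = Counter({"이**": 13, "김**": 13, "오**": 3, "용**": 2, "박**": 2,
                  "황**": 2, "신**": 1, "노**": 1, "유**": 1, "홍**": 1})
# Hypothetical placeholder labels for the six unnamed values, one row each.
counts.update({f"other_{i}": 1 for i in range(1, 7)})

n_rows = sum(counts.values())
n_distinct = len(counts)                                  # different values
n_unique = sum(1 for c in counts.values() if c == 1)      # values seen exactly once

print(n_rows, n_distinct, n_unique)                              # prints 45 16 10
print(f"{n_distinct / n_rows:.1%} {n_unique / n_rows:.1%}")      # prints 35.6% 22.2%
```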

Length

[Figure: histogram of lengths of the category]

계약보증금 (contract deposit)
Real number (ℝ)

HIGH CORRELATION

Distinct: 14
Distinct (%): 31.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 386003.93
Minimum: 50000
Maximum: 3200000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 537.0 B

Quantile statistics

Minimum: 50000
5-th percentile: 50000
Q1: 50000
Median: 50000
Q3: 500000
95-th percentile: 1749580
Maximum: 3200000
Range: 3150000
Interquartile range (IQR): 450000

Descriptive statistics

Standard deviation: 651666.25
Coefficient of variation (CV): 1.6882373
Kurtosis: 7.8982436
Mean: 386003.93
Median Absolute Deviation (MAD): 0
Skewness: 2.6767455
Sum: 17370177
Variance: 4.246689 × 10^11
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=14)]

Common values

Value             Count  Frequency (%)
50000             23     51.1%
100000            7      15.6%
500000            4      8.9%
1547900           1      2.2%
1000000           1      2.2%
459895            1      2.2%
3200000           1      2.2%
245000            1      2.2%
1800000           1      2.2%
2023500           1      2.2%
Other values (4)  4      8.9%

Minimum 10 values

Value    Count  Frequency (%)
50000    23     51.1%
100000   7      15.6%
245000   1      2.2%
459895   1      2.2%
479870   1      2.2%
500000   4      8.9%
611855   1      2.2%
852157   1      2.2%
1000000  1      2.2%
1300000  1      2.2%

Maximum 10 values

Value    Count  Frequency (%)
3200000  1      2.2%
2023500  1      2.2%
1800000  1      2.2%
1547900  1      2.2%
1300000  1      2.2%
1000000  1      2.2%
852157   1      2.2%
611855   1      2.2%
500000   4      8.9%
479870   1      2.2%
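Because the value tables for 계약보증금 together list all 14 distinct values with their counts, the full 45-row column can be rebuilt and the summary statistics re-derived. The sketch below does this with the standard library only. The linear-interpolation percentile rule and the sample (n - 1) standard deviation are assumptions, chosen because they reproduce the reported figures; the helper name `percentile` is illustrative.

```python
import math
import statistics

# Value -> count, read from the minimum/maximum value tables for 계약보증금.
freq = {50000: 23, 100000: 7, 245000: 1, 459895: 1, 479870: 1, 500000: 4,
        611855: 1, 852157: 1, 1000000: 1, 1300000: 1, 1547900: 1,
        1800000: 1, 2023500: 1, 3200000: 1}

# Expand the frequency table back into a sorted list of 45 observations.
values = sorted(v for v, n in freq.items() for _ in range(n))

def percentile(sorted_vals, q):
    """Percentile by linear interpolation between neighbours (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

print(len(values), sum(values))                 # prints 45 17370177
print(round(statistics.mean(values), 2))        # prints 386003.93
print(statistics.median(values))                # prints 50000
print(round(percentile(values, 0.95)))          # prints 1749580
print(round(statistics.stdev(values), 2))       # sample standard deviation
```

More than half of the rows sit at the minimum of 50000, which is why the median, Q1, and the 5-th percentile all coincide and the MAD is 0.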

용역금액 (service amount)
Real number (ℝ)

HIGH CORRELATION

Distinct: 14
Distinct (%): 31.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3790972.7
Minimum: 500000
Maximum: 32000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 537.0 B

Quantile statistics

Minimum: 500000
5-th percentile: 500000
Q1: 500000
Median: 500000
Q3: 4798700
95-th percentile: 17495800
Maximum: 32000000
Range: 31500000
Interquartile range (IQR): 4298700

Descriptive statistics

Standard deviation: 6557976
Coefficient of variation (CV): 1.7298927
Kurtosis: 7.7888979
Mean: 3790972.7
Median Absolute Deviation (MAD): 0
Skewness: 2.6781949
Sum: 1.7059377 × 10^8
Variance: 4.300705 × 10^13
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=14)]

Common values

Value             Count  Frequency (%)
500000            23     51.1%
1000000           8      17.8%
5000000           3      6.7%
15479000          1      2.2%
10000000          1      2.2%
4598950           1      2.2%
32000000          1      2.2%
2450000           1      2.2%
18000000          1      2.2%
20235000          1      2.2%
Other values (4)  4      8.9%

Minimum 10 values

Value     Count  Frequency (%)
500000    23     51.1%
1000000   8      17.8%
2450000   1      2.2%
4598950   1      2.2%
4798700   1      2.2%
5000000   3      6.7%
6118550   1      2.2%
8521570   1      2.2%
10000000  1      2.2%
13892000  1      2.2%

Maximum 10 values

Value     Count  Frequency (%)
32000000  1      2.2%
20235000  1      2.2%
18000000  1      2.2%
15479000  1      2.2%
13892000  1      2.2%
10000000  1      2.2%
8521570   1      2.2%
6118550   1      2.2%
5000000   3      6.7%
4798700   1      2.2%
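Several of the statistics reported for 용역금액 are derived from others: the range from the extremes, the IQR from the quartiles, the coefficient of variation from the standard deviation and mean, and the variance from the standard deviation. The sketch below re-derives them from the printed values. Because the inputs are already rounded by the report, agreement is only expected up to rounding.

```python
# Values as printed in the report for 용역금액 (already rounded by the report).
minimum, maximum = 500_000, 32_000_000
q1, q3 = 500_000, 4_798_700
mean, std = 3_790_972.7, 6_557_976

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(value_range, iqr)           # prints 31500000 4298700
print(round(cv, 4))               # prints 1.7299
print(f"{variance:.4e}")          # prints 4.3007e+13
```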

지체상금율 (late-penalty rate)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 492.0 B
Categories: 0.13 (44), 0.8 (1)

Length

Max length: 4
Median length: 4
Mean length: 3.9777778
Min length: 3

Unique

Unique: 1
Unique (%): 2.2%

Sample

1st row: 0.13
2nd row: 0.13
3rd row: 0.13
4th row: 0.13
5th row: 0.13

Common Values

Value  Count  Frequency (%)
0.13   44     97.8%
0.8    1      2.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Interactions

[Figures: four interaction plots, not preserved in this text version]

Correlations

[Figure: correlation heatmap]

                  계약일자  계약시작일  계약종료일  용역업체대표자명  계약보증금  용역금액  지체상금율
계약일자          1.000     0.996       0.985       0.000             1.000       1.000     1.000
계약시작일        0.996     1.000       0.984       0.000             0.999       0.999     1.000
계약종료일        0.985     0.984       1.000       0.000             0.962       0.945     1.000
용역업체대표자명  0.000     0.000       0.000       1.000             0.000       0.000     0.000
계약보증금        1.000     0.999       0.962       0.000             1.000       0.996     0.635
용역금액          1.000     0.999       0.945       0.000             0.996       1.000     0.635
지체상금율        1.000     1.000       1.000       0.000             0.635       0.635     1.000

[Figure: correlation heatmap]

                  용역업체대표자명  지체상금율
용역업체대표자명  1.000             0.000
지체상금율        0.000             1.000

[Figure: correlation heatmap]

                  계약보증금  용역금액  용역업체대표자명  지체상금율
계약보증금        1.000       0.995     0.000             0.581
용역금액          0.995       1.000     0.000             0.581
용역업체대표자명  0.000       0.000     1.000             0.000
지체상금율        0.581       0.581     0.000             1.000
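The three "High correlation" alerts can be related to the first correlation table by listing the column pairs whose coefficient exceeds a cut-off. The sketch below is an illustration under two assumptions that are not stated in the report: a threshold of 0.5, and the exclusion of the three datetime columns. With those choices it yields exactly the pairs named in the alerts.

```python
from itertools import combinations

# Coefficients copied from the first correlation table, restricted to the
# non-datetime columns (this restriction is an assumption).
columns = ["용역업체대표자명", "계약보증금", "용역금액", "지체상금율"]
matrix = [
    [1.000, 0.000, 0.000, 0.000],
    [0.000, 1.000, 0.996, 0.635],
    [0.000, 0.996, 1.000, 0.635],
    [0.000, 0.635, 0.635, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

# Visit each unordered pair once (upper triangle of the symmetric matrix).
high_pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i, j in combinations(range(len(columns)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]
for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

Each of 계약보증금, 용역금액, and 지체상금율 appears in two of the three resulting pairs, which matches the "and 1 other field" wording of the alerts.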

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

    계약일자    계약시작일  계약종료일  용역업체대표자명  계약보증금  용역금액  지체상금율
0   2018-11-26  2018-11-26  2018-12-20  신**              1547900     15479000  0.13
1   2020-05-26  2020-05-26  2020-05-31  이**              500000      5000000   0.13
2   2020-05-26  2020-05-26  2020-05-31  이**              50000       500000    0.13
3   2020-05-26  2020-05-26  2020-05-31  노**              50000       500000    0.13
4   2020-02-25  2020-03-01  2020-03-31  이**              1000000     10000000  0.13
5   2020-03-13  2020-03-13  2020-04-21  유**              459895      4598950   0.13
6   2020-04-15  2020-04-20  2020-08-12  이**              3200000     32000000  0.13
7   2020-05-26  2020-05-26  2020-05-31  이**              50000       500000    0.13
8   2020-05-27  2020-05-28  2020-06-30  용**              245000      2450000   0.13
9   2020-02-03  2020-02-03  2020-03-31  이**              1800000     18000000  0.13

Last rows

    계약일자    계약시작일  계약종료일  용역업체대표자명  계약보증금  용역금액  지체상금율
35  2020-05-27  2020-05-27  2020-05-31  강**              50000       500000    0.13
36  2020-05-27  2020-05-27  2020-05-31  조**              50000       500000    0.13
37  2020-06-10  2020-06-10  2020-06-30  이**              100000      1000000   0.13
38  2020-06-10  2020-06-10  2020-06-30  황**              100000      1000000   0.13
39  2018-06-15  2018-06-15  2018-06-29  이**              1300000     13892000  0.8
40  2020-05-27  2020-05-27  2020-05-31  오**              50000       500000    0.13
41  2020-05-27  2020-05-27  2020-05-31  김**              50000       500000    0.13
42  2020-05-27  2020-05-27  2020-05-31  김**              50000       500000    0.13
43  2020-05-27  2020-05-27  2020-05-30  박**              50000       500000    0.13
44  2020-05-27  2020-05-27  2020-05-31  오**              50000       500000    0.13
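The strong correlation between 계약보증금 and 용역금액 is visible in the sample rows: in most of them the deposit is exactly one tenth of the service amount. The sketch below checks this on the 20 rows shown above and lists the exceptions. It covers only the displayed rows, so it does not establish a rule for the full dataset; the differing counts in the frequency tables of the two columns suggest at least one row not shown here also deviates.

```python
# (row index, 계약보증금, 용역금액) for the 20 rows shown in the Sample section.
sample = [
    (0, 1547900, 15479000), (1, 500000, 5000000), (2, 50000, 500000),
    (3, 50000, 500000), (4, 1000000, 10000000), (5, 459895, 4598950),
    (6, 3200000, 32000000), (7, 50000, 500000), (8, 245000, 2450000),
    (9, 1800000, 18000000), (35, 50000, 500000), (36, 50000, 500000),
    (37, 100000, 1000000), (38, 100000, 1000000), (39, 1300000, 13892000),
    (40, 50000, 500000), (41, 50000, 500000), (42, 50000, 500000),
    (43, 50000, 500000), (44, 50000, 500000),
]

# Integer comparison avoids floating-point noise:
# deposit * 10 == amount means the deposit is exactly 10% of the amount.
exceptions = [(idx, dep, amt) for idx, dep, amt in sample if dep * 10 != amt]
print(exceptions)  # prints [(39, 1300000, 13892000)]
```

The single exception among the displayed rows, row 39, is also the only row with a 지체상금율 of 0.8.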

Duplicate rows

Most frequently occurring

   계약일자    계약시작일  계약종료일  용역업체대표자명  계약보증금  용역금액  지체상금율  # duplicates
1  2020-05-26  2020-05-26  2020-05-31  이**              50000       500000    0.13        3
2  2020-05-27  2020-05-27  2020-05-31  김**              50000       500000    0.13        3
0  2020-05-26  2020-05-26  2020-05-31  김**              50000       500000    0.13        2
3  2020-05-27  2020-05-27  2020-05-31  오**              50000       500000    0.13        2
4  2020-05-27  2020-05-28  2020-05-31  김**              50000       500000    0.13        2
5  2020-06-10  2020-06-10  2020-06-30  이**              100000      1000000   0.13        2
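The table above lists six duplicated row patterns that together cover 14 rows, while the overview reports "6 (13.3%) duplicate rows". One reading, which is an inference from these numbers rather than something the report states, is that the headline figure counts distinct duplicated patterns. The sketch below expands the table back into rows and counts both quantities.

```python
from collections import Counter

# (row, "# duplicates") pairs copied from the table above. Row fields are:
# 계약일자, 계약시작일, 계약종료일, 용역업체대표자명, 계약보증금, 용역금액, 지체상금율.
table = [
    (("2020-05-26", "2020-05-26", "2020-05-31", "이**", 50000, 500000, "0.13"), 3),
    (("2020-05-27", "2020-05-27", "2020-05-31", "김**", 50000, 500000, "0.13"), 3),
    (("2020-05-26", "2020-05-26", "2020-05-31", "김**", 50000, 500000, "0.13"), 2),
    (("2020-05-27", "2020-05-27", "2020-05-31", "오**", 50000, 500000, "0.13"), 2),
    (("2020-05-27", "2020-05-28", "2020-05-31", "김**", 50000, 500000, "0.13"), 2),
    (("2020-06-10", "2020-06-10", "2020-06-30", "이**", 100000, 1000000, "0.13"), 2),
]

# Expand each pattern into as many identical rows as the table reports.
rows = [row for row, n in table for _ in range(n)]

# Group identical rows and keep the patterns that occur more than once.
group_sizes = Counter(rows)
duplicate_groups = [row for row, n in group_sizes.items() if n > 1]

n_observations = 45  # from Dataset statistics
print(len(duplicate_groups), len(rows))                 # prints 6 14
print(f"{len(duplicate_groups) / n_observations:.1%}")  # prints 13.3%
```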