Overview

Dataset statistics

Number of variables: 5
Number of observations: 2923
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): 0.2%
Total size in memory: 122.9 KiB
Average record size in memory: 43.0 B

Variable types

Numeric: 2
Categorical: 2
Text: 1

Dataset

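Description: Joint-guarantee (연대보증) data from the Gyeongsangnam-do construction contract ledger system. It includes fields such as construction year, construction type, and company name.
Author: Gyeongsangnam-do (경상남도)
URL: https://www.data.go.kr/data/15049519/fileData.do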

Alerts

부서코드 has constant value "1" (Constant)
Dataset has 6 (0.2%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-12 07:32:36.439431
Analysis finished: 2023-12-12 07:32:37.236571
Duration: 0.8 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
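
A report of this kind could plausibly be regenerated along the following lines. This is a minimal sketch, not the configuration actually used for this report: the helper name `build_report`, the input file path, the `cp949` encoding, and the report title are all assumptions made for illustration. Only `pandas.read_csv`, `ProfileReport`, and `ProfileReport.to_file` are real library calls; the exact settings are the ones in config.json.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module still loads where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source CSV is CP949-encoded; change this if it is UTF-8.
    df = pd.read_csv(csv_path, encoding="cp949")

    # Default settings; the original run may have used different options.
    report = ProfileReport(df, title="Joint-guarantee data profile")
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("guarantee.csv")` (a hypothetical file name) would write `report.html` next to the script.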

Variables

공사년도 (construction year)
Real number (ℝ)

Distinct: 27
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2001.6712
Minimum: 1990
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.8 KiB

Quantile statistics

Minimum: 1990
5-th percentile: 1991
Q1: 1997
Median: 2002
Q3: 2006
95-th percentile: 2011
Maximum: 2016
Range: 26
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 6.1424191
Coefficient of variation (CV): 0.0030686453
Kurtosis: -0.84790169
Mean: 2001.6712
Median Absolute Deviation (MAD): 5
Skewness: -0.1657417
Sum: 5850885
Variance: 37.729312
Monotonicity: Not monotonic
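
Several of these figures are simple functions of the others, which makes them easy to cross-check. The sketch below recomputes the mean, variance, coefficient of variation, IQR, and range for 공사년도 from the values reported in this section; the variable names are illustrative, and the inputs are the rounded figures printed in the report, so small differences in the last digits are expected.

```python
import math

# Figures reported for 공사년도 in this section.
n, total = 2923, 5850885
std = 6.1424191
q1, q3 = 1997, 2006
minimum, maximum = 1990, 2016

mean = total / n                 # Mean = Sum / Number of observations
variance = std ** 2              # Variance = (Standard deviation)^2
cv = std / mean                  # Coefficient of variation = Std / Mean
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.4f} variance={variance:.6f} cv={cv:.10f}")
print(f"iqr={iqr} range={value_range}")

# Compare with the reported values, allowing for rounding of the inputs.
assert math.isclose(mean, 2001.6712, abs_tol=1e-4)
assert math.isclose(variance, 37.729312, rel_tol=1e-6)
assert math.isclose(cv, 0.0030686453, rel_tol=1e-6)
```

The very small CV reflects that the values are calendar years clustered within a 26-year window around a mean near 2000.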
Histogram with fixed size bins (bins=27)
Common values
Value              Count  Frequency (%)
2004               281    9.6%
2000               189    6.5%
1999               165    5.6%
2001               164    5.6%
2010               161    5.5%
2003               150    5.1%
2005               148    5.1%
2007               148    5.1%
2006               141    4.8%
2008               124    4.2%
Other values (17)  1252   42.8%

Minimum 10 values
Value  Count  Frequency (%)
1990   88     3.0%
1991   119    4.1%
1992   87     3.0%
1993   99     3.4%
1994   79     2.7%
1995   100    3.4%
1996   106    3.6%
1997   94     3.2%
1998   95     3.3%
1999   165    5.6%

Maximum 10 values
Value  Count  Frequency (%)
2016   9      0.3%
2015   11     0.4%
2014   6      0.2%
2013   23     0.8%
2012   37     1.3%
2011   82     2.8%
2010   161    5.5%
2009   118    4.0%
2008   124    4.2%
2007   148    5.1%

공사구분 (construction type)
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.0 KiB

공사 (construction): 2259
용역 (services): 561
기타 (other): 103

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 공사
2nd row: 공사
3rd row: 공사
4th row: 공사
5th row: 공사

Common Values

Value  Count  Frequency (%)
공사   2259   77.3%
용역   561    19.2%
기타   103    3.5%
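
The Frequency (%) column is each count divided by the number of observations. As a small check, the sketch below recomputes the percentages for 공사구분 from the counts above; the dictionary and variable names are illustrative only.

```python
# Counts reported for 공사구분 in the table above.
counts = {"공사": 2259, "용역": 561, "기타": 103}

# The counts should add up to the number of observations (2923).
n = sum(counts.values())

# Relative frequency in percent, rounded to one decimal as in the report.
freq_pct = {value: round(100 * count / n, 1) for value, count in counts.items()}

for value, pct in freq_pct.items():
    print(f"{value}\t{counts[value]}\t{pct}%")
```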

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the common-value frequencies: 공사 2259 (77.3%), 용역 561 (19.2%), 기타 103 (3.5%)]

공사번호 (construction number)
Real number (ℝ)

Distinct: 373
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.025316
Minimum: 1
Maximum: 629
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 29
Median: 57
Q3: 88
95-th percentile: 331.7
Maximum: 629
Range: 628
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 95.413894
Coefficient of variation (CV): 1.1775813
Kurtosis: 8.8132136
Mean: 81.025316
Median Absolute Deviation (MAD): 29
Skewness: 2.8741572
Sum: 236837
Variance: 9103.8111
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
48                  30     1.0%
56                  30     1.0%
30                  29     1.0%
27                  29     1.0%
50                  29     1.0%
63                  29     1.0%
61                  29     1.0%
20                  28     1.0%
74                  28     1.0%
46                  28     1.0%
Other values (363)  2634   90.1%

Minimum 10 values
Value  Count  Frequency (%)
1      27     0.9%
2      24     0.8%
3      24     0.8%
4      25     0.9%
5      26     0.9%
6      24     0.8%
7      25     0.9%
8      25     0.9%
9      28     1.0%
10     26     0.9%

Maximum 10 values
Value  Count  Frequency (%)
629    1      < 0.1%
623    1      < 0.1%
620    1      < 0.1%
618    1      < 0.1%
566    1      < 0.1%
547    1      < 0.1%
543    1      < 0.1%
530    1      < 0.1%
529    1      < 0.1%
527    1      < 0.1%

부서코드 (department code)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.0 KiB

1: 2923

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      2923   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the common-value frequencies: 1, 2923 (100.0%)]

업체명 (company name)
Text

Distinct: 614
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 23.0 KiB

Length

Max length: 20
Median length: 18
Mean length: 6.6989394
Min length: 1

Characters and Unicode

Total characters: 19581
Distinct characters: 235
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
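
As an illustration of the character properties mentioned above, the sketch below uses Python's standard `unicodedata` module to tally the Unicode general category of each character in the five sample values listed for this variable. It covers only general categories (the "Most occurring categories" view); script and block lookups are not available in `unicodedata` and are left out. The list and variable names are illustrative.

```python
import unicodedata
from collections import Counter

# The five sample values listed for 업체명 in this report.
samples = [
    "(합자)동신종합설비",
    "삼흥종합건설(주)",
    "삼흥종합건설(주)",
    "중앙토건(주)",
    "삼성건설(주)",
]

# General category of every character:
# Lo = Other Letter, Ps = Open Punctuation, Pe = Close Punctuation.
categories = Counter(unicodedata.category(ch) for s in samples for ch in s)

for category, count in categories.most_common():
    print(category, count)
```

On these samples Hangul syllables fall under Other Letter and the parentheses under Open and Close Punctuation, which matches the three largest categories reported for the full column.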

Unique

Unique: 285
Unique (%): 9.8%

Sample

1st row: (합자)동신종합설비
2nd row: 삼흥종합건설(주)
3rd row: 삼흥종합건설(주)
4th row: 중앙토건(주)
5th row: 삼성건설(주)

Value                Count  Frequency (%)
(blank)              482    16.4%
삼성건설(주          42     1.4%
주)도화종합기술공사  40     1.4%
흥한건설(주          34     1.2%
극동건설(주          29     1.0%
주)해동건설          29     1.0%
우람종합건설(주      25     0.8%
한신공영(주          25     0.8%
대창건설(주          24     0.8%
대림산업(주          24     0.8%
Other values (613)   2190   74.4%

Most occurring characters

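Value                     Count  Frequency (%)
(                         2364   12.1%
)                         2364   12.1%
(Hangul, not preserved)   2361   12.1%
(Hangul, not preserved)   1697   8.7%
(Hangul, not preserved)   1479   7.6%
(Hangul, not preserved)   655    3.3%
(Hangul, not preserved)   647    3.3%
-                         482    2.5%
(Hangul, not preserved)   339    1.7%
(Hangul, not preserved)   296    1.5%
Other values (225)        6897   35.2%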

Most occurring categories

Value              Count  Frequency (%)
Other Letter       14317  73.1%
Open Punctuation   2364   12.1%
Close Punctuation  2364   12.1%
Dash Punctuation   482    2.5%
Space Separator    22     0.1%
Decimal Number     12     0.1%
Math Symbol        10     0.1%
Uppercase Letter   6      < 0.1%
Other Punctuation  4      < 0.1%

Most frequent character per category

Other Letter
Value                     Count  Frequency (%)
(Hangul, not preserved)   2361   16.5%
(Hangul, not preserved)   1697   11.9%
(Hangul, not preserved)   1479   10.3%
(Hangul, not preserved)   655    4.6%
(Hangul, not preserved)   647    4.5%
(Hangul, not preserved)   339    2.4%
(Hangul, not preserved)   296    2.1%
(Hangul, not preserved)   291    2.0%
(Hangul, not preserved)   255    1.8%
(Hangul, not preserved)   243    1.7%
Other values (205)        6054   42.3%

Decimal Number
Value  Count  Frequency (%)
1      4      33.3%
0      3      25.0%
9      2      16.7%
6      1      8.3%
5      1      8.3%
2      1      8.3%

Uppercase Letter
Value  Count  Frequency (%)
S      1      16.7%
K      1      16.7%
N      1      16.7%
A      1      16.7%
M      1      16.7%
E      1      16.7%

Other Punctuation
Value  Count  Frequency (%)
.      2      50.0%
#      1      25.0%
?      1      25.0%

Open Punctuation
Value  Count  Frequency (%)
(      2364   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      2364   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      482    100.0%

Space Separator
Value    Count  Frequency (%)
(space)  22     100.0%

Math Symbol
Value                      Count  Frequency (%)
(character not preserved)  10     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  14317  73.1%
Common  5258   26.9%
Latin   6      < 0.1%

Most frequent character per script

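Hangul
Value                     Count  Frequency (%)
(Hangul, not preserved)   2361   16.5%
(Hangul, not preserved)   1697   11.9%
(Hangul, not preserved)   1479   10.3%
(Hangul, not preserved)   655    4.6%
(Hangul, not preserved)   647    4.5%
(Hangul, not preserved)   339    2.4%
(Hangul, not preserved)   296    2.1%
(Hangul, not preserved)   291    2.0%
(Hangul, not preserved)   255    1.8%
(Hangul, not preserved)   243    1.7%
Other values (205)        6054   42.3%

Common
Value                      Count  Frequency (%)
(                          2364   45.0%
)                          2364   45.0%
-                          482    9.2%
(space)                    22     0.4%
(character not preserved)  10     0.2%
1                          4      0.1%
0                          3      0.1%
.                          2      < 0.1%
9                          2      < 0.1%
6                          1      < 0.1%
Other values (4)           4      0.1%

Latin
Value  Count  Frequency (%)
S      1      16.7%
K      1      16.7%
N      1      16.7%
A      1      16.7%
M      1      16.7%
E      1      16.7%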

Most occurring blocks

Value   Count  Frequency (%)
Hangul  14317  73.1%
ASCII   5254   26.8%
Arrows  10     0.1%

Most frequent character per block

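ASCII
Value             Count  Frequency (%)
(                 2364   45.0%
)                 2364   45.0%
-                 482    9.2%
(space)           22     0.4%
1                 4      0.1%
0                 3      0.1%
.                 2      < 0.1%
9                 2      < 0.1%
6                 1      < 0.1%
5                 1      < 0.1%
Other values (9)  9      0.2%

Hangul
Value                     Count  Frequency (%)
(Hangul, not preserved)   2361   16.5%
(Hangul, not preserved)   1697   11.9%
(Hangul, not preserved)   1479   10.3%
(Hangul, not preserved)   655    4.6%
(Hangul, not preserved)   647    4.5%
(Hangul, not preserved)   339    2.4%
(Hangul, not preserved)   296    2.1%
(Hangul, not preserved)   291    2.0%
(Hangul, not preserved)   255    1.8%
(Hangul, not preserved)   243    1.7%
Other values (205)        6054   42.3%

Arrows
Value                      Count  Frequency (%)
(character not preserved)  10     100.0%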

Interactions

[Four interaction plots; images not included in this text version]

Correlations

          공사년도  공사구분  공사번호
공사년도  1.000     0.550     0.653
공사구분  0.550     1.000     0.454
공사번호  0.653     0.454     1.000

          공사년도  공사번호  공사구분
공사년도  1.000     0.330     0.398
공사번호  0.330     1.000     0.307
공사구분  0.398     0.307     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
      공사년도  공사구분  공사번호  부서코드  업체명
0     1990      공사      5         1         (합자)동신종합설비
1     1990      공사      1         1         삼흥종합건설(주)
2     1990      공사      2         1         삼흥종합건설(주)
3     1990      공사      4         1         중앙토건(주)
4     1990      공사      3         1         삼성건설(주)
5     1990      공사      6         1         -
6     1990      공사      7         1         경남종합건설(주)
7     1990      공사      8         1         중앙토건(주)
8     1990      공사      10        1         중앙토건(주)
9     1990      공사      12        1         -

Last rows
      공사년도  공사구분  공사번호  부서코드  업체명
2913  2016      공사      125       1         화성종합건설(주)
2914  2016      공사      218       1         한신공영(주)
2915  2016      공사      70        1         (주)대우건설
2916  2016      공사      159       1         남국종합건설(주)
2917  2016      공사      143       1         한신공영(주)
2918  2016      공사      226       1         대림산업(주)
2919  1996      공사      48        1         금미건설(주)
2920  2004      용역      2         1         없음
2921  2004      용역      27        1         -
2922  2000      용역      56        1         -

Duplicate rows

Most frequently occurring

   공사년도  공사구분  공사번호  부서코드  업체명                # duplicates
0  2005      용역      50        1         (주)도화종합기술공사  2
1  2005      용역      63        1         보람엔지니어링        2
2  2005      용역      76        1         (주)한국종합기술      2
3  2008      용역      221       1         (주)덕성              2
4  2008      용역      223       1         (주)부호산업개발      2
5  2010      공사      463       1         아산건설주식회사      2
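
The duplicate count can be reproduced by tallying identical records. The sketch below is an illustration using only the six records listed above, each entered twice as the "# duplicates" column indicates; it is not the profiler's own implementation. The report does not say whether the figure of 6 counts distinct duplicated records or surplus copies, and on this data both readings give 6.

```python
from collections import Counter

# The six duplicated records listed above:
# (공사년도, 공사구분, 공사번호, 부서코드, 업체명).
duplicated = [
    (2005, "용역", 50, "1", "(주)도화종합기술공사"),
    (2005, "용역", 63, "1", "보람엔지니어링"),
    (2005, "용역", 76, "1", "(주)한국종합기술"),
    (2008, "용역", 221, "1", "(주)덕성"),
    (2008, "용역", 223, "1", "(주)부호산업개발"),
    (2010, "공사", 463, "1", "아산건설주식회사"),
]

# Each record occurs twice in the dataset, so rebuild those 12 rows.
rows = [record for record in duplicated for _ in range(2)]

# Group identical rows and keep the ones that occur more than once.
occurrences = Counter(rows)
groups = {record: count for record, count in occurrences.items() if count > 1}
surplus = sum(count - 1 for count in groups.values())

n_observations = 2923
share = round(100 * len(groups) / n_observations, 1)
print(len(groups), surplus, f"{share}%")
```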