Overview

Dataset statistics

Number of variables: 6
Number of observations: 207
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.4 KiB
Average record size in memory: 51.6 B
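The overview counts can be reproduced on any list of records. The sketch below is a minimal standard-library version. It assumes each row is a dict keyed by column name and that only Python None marks a missing cell. ydata-profiling itself works on a pandas DataFrame, so its internals differ.

```python
from typing import Any


def dataset_statistics(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Overview counts for a table given as a list of row dicts."""
    n_rows = len(rows)
    columns = list(rows[0]) if rows else []
    n_cells = n_rows * len(columns)

    # Only a real None counts as missing; placeholder strings such as "<NA>" do not.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row is a duplicate if an identical row appeared earlier in the list.
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_variables": len(columns),
        "n_observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }
```

Under this rule a literal string such as "<NA>" is an ordinary value, not a missing cell.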

Variable types

Numeric: 3
Categorical: 2
Text: 1

Dataset

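Description: Data on KOSHA_MS certification status provided by 한국산업안전보건공단 (Korea Occupational Safety and Health Agency). It covers the workplace name, site name, certification year, certification decision date, certification number, certification validity period, and progress status.
Author: 한국산업안전보건공단 (Korea Occupational Safety and Health Agency)
URL: https://www.data.go.kr/data/15102716/fileData.do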

Alerts

구분 is highly overall correlated with 비고 [High correlation]
비고 is highly overall correlated with 연번 and 3 other fields [High correlation]
연번 is highly overall correlated with 인증번호 and 2 other fields [High correlation]
인증번호 is highly overall correlated with 연번 and 2 other fields [High correlation]
인증연도 is highly overall correlated with 연번 and 2 other fields [High correlation]
연번 has unique values [Unique]
인증번호 has unique values [Unique]
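The Unique alert applies when every value in a column is distinct, meaning the distinct count equals the number of rows (207 for both 연번 and 인증번호). Here is a minimal check, assuming the same list-of-dicts layout and hashable values:

```python
def unique_columns(rows):
    """Return the names of columns in which no value repeats."""
    if not rows:
        return []
    columns = list(rows[0])  # dict insertion order gives the column order
    n = len(rows)
    # A column is unique when its set of values is as large as the row count.
    return [col for col in columns if len({row[col] for row in rows}) == n]
```

On the first rows of the Sample table, this flags 연번 and 인증번호 but not 구분 or 인증연도.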

Reproduction

Analysis started: 2023-12-12 01:40:13.222680
Analysis finished: 2023-12-12 01:40:14.731078
Duration: 1.51 seconds
Software version: ydata-profiling v4.5.1

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 207
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104
Minimum: 1
Maximum: 207
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 11.3
Q1: 52.5
Median: 104
Q3: 155.5
95th percentile: 196.7
Maximum: 207
Range: 206
Interquartile range (IQR): 103
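These quantiles are consistent with linear interpolation between order statistics at position (n - 1)·p in the sorted data, which is also the default rule in NumPy and pandas. Python's statistics.quantiles with method="inclusive" uses the same rule, so the 연번 figures can be checked directly. Whether ydata-profiling computes them exactly this way is an assumption that happens to match this column.

```python
import statistics


def quantile_statistics(values):
    """Quantile summary using linear interpolation on (n - 1) * p."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Cut points at 5%, 10%, ..., 95%; the first and last give the 5th and 95th percentiles.
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": data[0],
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


serials = range(1, 208)  # 연번 runs from 1 to 207 without gaps
stats = quantile_statistics(serials)
# p5 11.3, Q1 52.5, median 104.0, Q3 155.5, p95 196.7, IQR 103.0
```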

Descriptive statistics

Standard deviation: 59.899917
Coefficient of variation (CV): 0.57596074
Kurtosis: -1.2
Mean: 104
Median Absolute Deviation (MAD): 52
Skewness: 0
Sum: 21528
Variance: 3588
Monotonicity: Strictly increasing
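The coefficient of variation is the standard deviation divided by the mean (59.899917 / 104 ≈ 0.576 for 연번). The MAD reported here is the median of the absolute deviations from the median, with no scaling constant, which gives 52 for the values 1 to 207. The following standard-library sketch assumes the sample (n - 1) variance, which matches the reported 3588:

```python
import statistics


def descriptive_statistics(values):
    data = list(values)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, divisor n - 1
    std = statistics.stdev(data)
    median = statistics.median(data)
    # Median absolute deviation: median of |x - median|, unscaled.
    mad = statistics.median(abs(x - median) for x in data)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation = std / mean
        "mad": mad,
        "sum": sum(data),
    }
```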
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
1                   1      0.5%
2                   1      0.5%
133                 1      0.5%
134                 1      0.5%
135                 1      0.5%
136                 1      0.5%
137                 1      0.5%
138                 1      0.5%
139                 1      0.5%
140                 1      0.5%
Other values (197)  197    95.2%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.5%
2      1      0.5%
3      1      0.5%
4      1      0.5%
5      1      0.5%
6      1      0.5%
7      1      0.5%
8      1      0.5%
9      1      0.5%
10     1      0.5%

Maximum 10 values

Value  Count  Frequency (%)
207    1      0.5%
206    1      0.5%
205    1      0.5%
204    1      0.5%
203    1      0.5%
202    1      0.5%
201    1      0.5%
200    1      0.5%
199    1      0.5%
198    1      0.5%

구분 (category)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
전문건설업체: 144
종합건설업체: 41
발주기관: 22

Length

Max length: 6
Median length: 6
Mean length: 5.7874396
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 종합건설업체
2nd row: 종합건설업체
3rd row: 종합건설업체
4th row: 종합건설업체
5th row: 발주기관

Common Values

Value                                Count  Frequency (%)
전문건설업체 (specialty contractor)  144    69.6%
종합건설업체 (general contractor)    41     19.8%
발주기관 (ordering agency)           22     10.6%
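The frequency column is each count divided by the 207 rows. The mean length is the count-weighted average of the label lengths: (144 × 6 + 41 × 6 + 22 × 4) / 207 ≈ 5.787. Here is a small sketch that rebuilds the 구분 column from the counts above and recomputes both:

```python
from collections import Counter


def frequency_table(values):
    """Value counts with percentages (one decimal), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, round(100 * count / total, 1)) for value, count in counts.most_common()]


def length_summary(values):
    """Max, mean and min string length, counted in code points."""
    lengths = [len(v) for v in values]
    return {"max": max(lengths), "mean": sum(lengths) / len(lengths), "min": min(lengths)}


# 구분 column rebuilt from the counts in the table above.
categories = ["전문건설업체"] * 144 + ["종합건설업체"] * 41 + ["발주기관"] * 22
table = frequency_table(categories)
# [('전문건설업체', 144, 69.6), ('종합건설업체', 41, 19.8), ('발주기관', 22, 10.6)]
lengths = length_summary(categories)
# max 6, mean ≈ 5.787, min 4
```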

Length

[Figure: Histogram of lengths of the category]


인증번호 (certification number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 207
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1549.8599
Minimum: 435
Maximum: 3317
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 435
5th percentile: 445.3
Q1: 613
Median: 1286
Q3: 2422.5
95th percentile: 3041.7
Maximum: 3317
Range: 2882
Interquartile range (IQR): 1809.5

Descriptive statistics

Standard deviation: 921.57404
Coefficient of variation (CV): 0.59461764
Kurtosis: -1.2711936
Mean: 1549.8599
Median Absolute Deviation (MAD): 819
Skewness: 0.31916469
Sum: 320821
Variance: 849298.7
Monotonicity: Not monotonic
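The Monotonicity line classifies a column in its stored row order. 연번 is strictly increasing, 인증연도 is increasing with ties allowed, and 인증번호 is not monotonic even though its first and last rows both ascend. The sketch below assumes the report's labels mean what their names say. The list [435, 436, 1969, 1968] in the test is hypothetical and not taken from the data.

```python
def monotonicity(values):
    """Classify a sequence as strictly/non-strictly increasing or decreasing."""
    pairs = list(zip(values, values[1:]))  # consecutive (previous, next) pairs
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"  # ties allowed, as in 인증연도
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"
```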
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
435                 1      0.5%
436                 1      0.5%
1968                1      0.5%
1969                1      0.5%
2010                1      0.5%
2011                1      0.5%
2163                1      0.5%
2164                1      0.5%
2165                1      0.5%
2166                1      0.5%
Other values (197)  197    95.2%

Minimum 10 values

Value  Count  Frequency (%)
435    1      0.5%
436    1      0.5%
437    1      0.5%
438    1      0.5%
439    1      0.5%
440    1      0.5%
441    1      0.5%
442    1      0.5%
443    1      0.5%
444    1      0.5%

Maximum 10 values

Value  Count  Frequency (%)
3317   1      0.5%
3316   1      0.5%
3315   1      0.5%
3314   1      0.5%
3313   1      0.5%
3312   1      0.5%
3311   1      0.5%
3310   1      0.5%
3044   1      0.5%
3043   1      0.5%

사업장명 (workplace name)
Text

Distinct: 203
Distinct (%): 98.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 20
Median length: 19
Mean length: 7.3913043
Min length: 3

Characters and Unicode

Total characters: 1530
Distinct characters: 202
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
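Python's standard unicodedata module exposes the general-category property. The sketch below tallies characters per category for two 사업장명 values taken from the Sample section. The long names in CATEGORY_NAMES are written out by hand for the codes that appear in this report. Script and block lookups are not part of unicodedata, so they are not covered here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "So": "Other Symbol",
    "Zs": "Space Separator",
}


def category_counts(texts):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)  # e.g. "Lo" for a Hangul syllable
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


names = ["삼성물산㈜ 건설부문", "(주)세광통신"]
counts = category_counts(names)
```

Hangul syllables fall under Other Letter, while the enclosed symbol ㈜ is classed as Other Symbol.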

Unique

Unique: 199
Unique (%): 96.1%

Sample

1st row: 삼성물산㈜ 건설부문
2nd row: 롯데건설㈜
3rd row: ㈜태영건설
4th row: ㈜포스코건설
5th row: 한국도로공사

Word counts

Value               Count  Frequency (%)
건설부문            3      1.4%
삼성물산㈜          2      0.9%
sk건설(주           2      0.9%
현대건설㈜          2      0.9%
㈜신성이엔지        2      0.9%
보림토건(주         2      0.9%
무경설비㈜          1      0.5%
일광전설㈜          1      0.5%
티엔에스㈜          1      0.5%
sk                  1      0.5%
Other values (204)  204    92.3%

Most occurring characters

Value               Count  Frequency (%)
                    122    8.0%
)                   119    7.8%
(                   119    7.8%
㈜                  70     4.6%
                    61     4.0%
                    55     3.6%
                    37     2.4%
                    28     1.8%
                    27     1.8%
                    24     1.6%
Other values (192)  868    56.7%

Most occurring categories

Value               Count  Frequency (%)
Other Letter        1159   75.8%
Close Punctuation   119    7.8%
Open Punctuation    119    7.8%
Other Symbol        70     4.6%
Uppercase Letter    42     2.7%
Space Separator     17     1.1%
Other Punctuation   4      0.3%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                    122    10.5%
                    61     5.3%
                    55     4.7%
                    37     3.2%
                    28     2.4%
                    27     2.3%
                    24     2.1%
                    23     2.0%
                    21     1.8%
                    19     1.6%
Other values (172)  742    64.0%

Uppercase Letter

Value               Count  Frequency (%)
S                   9      21.4%
C                   7      16.7%
E                   6      14.3%
K                   4      9.5%
G                   4      9.5%
M                   2      4.8%
J                   2      4.8%
L                   2      4.8%
D                   1      2.4%
F                   1      2.4%
Other values (4)    4      9.5%

Other Punctuation

Value  Count  Frequency (%)
.      2      50.0%
&      2      50.0%

Close Punctuation

Value  Count  Frequency (%)
)      119    100.0%

Open Punctuation

Value  Count  Frequency (%)
(      119    100.0%

Other Symbol

Value  Count  Frequency (%)
㈜    70     100.0%

Space Separator

Value    Count  Frequency (%)
(space)  17     100.0%
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1229   80.3%
Common  259    16.9%
Latin   42     2.7%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                    122    9.9%
㈜                  70     5.7%
                    61     5.0%
                    55     4.5%
                    37     3.0%
                    28     2.3%
                    27     2.2%
                    24     2.0%
                    23     1.9%
                    21     1.7%
Other values (173)  761    61.9%

Latin

Value               Count  Frequency (%)
S                   9      21.4%
C                   7      16.7%
E                   6      14.3%
K                   4      9.5%
G                   4      9.5%
M                   2      4.8%
J                   2      4.8%
L                   2      4.8%
D                   1      2.4%
F                   1      2.4%
Other values (4)    4      9.5%

Common

Value    Count  Frequency (%)
)        119    45.9%
(        119    45.9%
(space)  17     6.6%
.        2      0.8%
&        2      0.8%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1159   75.8%
ASCII   301    19.7%
None    70     4.6%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                    122    10.5%
                    61     5.3%
                    55     4.7%
                    37     3.2%
                    28     2.4%
                    27     2.3%
                    24     2.1%
                    23     2.0%
                    21     1.8%
                    19     1.6%
Other values (172)  742    64.0%

ASCII

Value               Count  Frequency (%)
)                   119    39.5%
(                   119    39.5%
(space)             17     5.6%
S                   9      3.0%
C                   7      2.3%
E                   6      2.0%
K                   4      1.3%
G                   4      1.3%
M                   2      0.7%
J                   2      0.7%
Other values (9)    12     4.0%

None

Value  Count  Frequency (%)
㈜    70     100.0%

인증연도 (certification year)
Real number (ℝ)

HIGH CORRELATION

Distinct: 21
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.3285
Minimum: 2002
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 2002
5th percentile: 2006.3
Q1: 2010
Median: 2012
Q3: 2017
95th percentile: 2021
Maximum: 2022
Range: 20
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.7895831
Coefficient of variation (CV): 0.0023789377
Kurtosis: -0.67914146
Mean: 2013.3285
Median Absolute Deviation (MAD): 4
Skewness: -0.051684071
Sum: 416759
Variance: 22.940106
Monotonicity: Increasing
[Figure: Histogram with fixed size bins (bins=21)]

Common values

Value               Count  Frequency (%)
2012                25     12.1%
2017                22     10.6%
2010                16     7.7%
2011                16     7.7%
2009                15     7.2%
2018                11     5.3%
2008                11     5.3%
2015                11     5.3%
2020                10     4.8%
2007                10     4.8%
Other values (11)   60     29.0%

Minimum 10 values

Value  Count  Frequency (%)
2002   3      1.4%
2003   3      1.4%
2004   1      0.5%
2005   2      1.0%
2006   2      1.0%
2007   10     4.8%
2008   11     5.3%
2009   15     7.2%
2010   16     7.7%
2011   16     7.7%

Maximum 10 values

Value  Count  Frequency (%)
2022   8      3.9%
2021   8      3.9%
2020   10     4.8%
2019   7      3.4%
2018   11     5.3%
2017   22     10.6%
2016   10     4.8%
2015   11     5.3%
2014   7      3.4%
2013   9      4.3%
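All 21 years appear across the two extreme-value tables above. That means the full 인증연도 column can be rebuilt from them, and the summary statistics can be checked against the report. Here is a sketch using only the standard library:

```python
import statistics

# Certifications per year (인증연도), read off the minimum/maximum tables above.
YEAR_COUNTS = {
    2002: 3, 2003: 3, 2004: 1, 2005: 2, 2006: 2, 2007: 10, 2008: 11,
    2009: 15, 2010: 16, 2011: 16, 2012: 25, 2013: 9, 2014: 7, 2015: 11,
    2016: 10, 2017: 22, 2018: 11, 2019: 7, 2020: 10, 2021: 8, 2022: 8,
}

# Expand the frequency table back into one value per certification, in year order.
years = [year for year, count in sorted(YEAR_COUNTS.items()) for _ in range(count)]

n = len(years)                      # 207
total = sum(years)                  # 416759
mean = statistics.fmean(years)      # ≈ 2013.3285
median = statistics.median(years)   # 2012
std = statistics.stdev(years)       # ≈ 4.7896 (sample standard deviation)
q1, _, q3 = statistics.quantiles(years, n=4, method="inclusive")  # 2010.0, 2017.0
```

The median falls exactly at 2012 because the cumulative count through 2012 reaches 104, the middle of 207 rows.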

비고 (remarks)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
<NA>: 119
취소: 88

Length

Max length: 4
Median length: 4
Mean length: 3.1497585
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 취소
2nd row: <NA>
3rd row: <NA>
4th row: 취소
5th row: <NA>

Common Values

Value            Count  Frequency (%)
<NA>             119    57.5%
취소 (cancelled)  88     42.5%
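The 비고 column reports 0 missing values even though 119 rows show <NA>. The length statistics also treat <NA> as a four-character string: (119 × 4 + 88 × 2) / 207 ≈ 3.1498, which is the reported mean length. That points to the markers being stored as text rather than as nulls, though this is an inference from the numbers, not something the report states. The sketch below maps such placeholders to None before counting. The NULL_MARKERS set is an assumption.

```python
from collections import Counter

# Placeholder strings to treat as missing (an assumption; adjust for the source data).
NULL_MARKERS = {"<NA>", "NA", "nan", ""}


def normalize_missing(values):
    """Replace placeholder strings with None so they count as missing."""
    return [None if isinstance(v, str) and v.strip() in NULL_MARKERS else v for v in values]


def missing_summary(values):
    cleaned = normalize_missing(values)
    missing = sum(v is None for v in cleaned)
    return {
        "missing": missing,
        "missing_pct": round(100 * missing / len(cleaned), 1),
        "counts": Counter(v for v in cleaned if v is not None),
    }


remarks = ["<NA>"] * 119 + ["취소"] * 88  # 비고 rebuilt from the table above
summary = missing_summary(remarks)
# missing 119, missing_pct 57.5, counts Counter({'취소': 88})
```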

Length

[Figure: Histogram of lengths of the category]


Interactions

[Figures: nine pairwise interaction plots of the numeric variables 연번, 인증번호 and 인증연도]

Correlations

           연번   구분   인증번호  인증연도
연번       1.000  0.334  0.976     0.979
구분       0.334  1.000  0.000     0.356
인증번호   0.976  0.000  1.000     0.974
인증연도   0.979  0.356  0.974     1.000

           구분   비고
구분       1.000  1.000
비고       1.000  1.000

           연번   인증번호  인증연도  구분   비고
연번       1.000  1.000     0.997     0.206  1.000
인증번호   1.000  1.000     0.997     0.000  1.000
인증연도   0.997  0.997     1.000     0.254  1.000
구분       0.206  0.000     0.254     1.000  1.000
비고       1.000  1.000     1.000     1.000  1.000
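ydata-profiling can compute several coefficients, including Pearson, Spearman, Kendall, Cramér's V and phi_k. This extract does not say which one produced each matrix above. For numeric columns, Spearman's rank correlation is one common choice. The sketch below implements it with average ranks for ties. It is an illustration of the technique and is not claimed to reproduce the exact figures above.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# 연번 and 인증연도 for the first ten rows of the Sample table.
serial = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
year = [2002, 2002, 2002, 2003, 2003, 2003, 2004, 2005, 2005, 2006]
rho = spearman(serial, year)  # ≈ 0.972
```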

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion visually.]

Sample

First rows

     연번  구분          인증번호  사업장명                         인증연도  비고
0    1     종합건설업체  435       삼성물산㈜ 건설부문              2002      취소
1    2     종합건설업체  436       롯데건설㈜                       2002      <NA>
2    3     종합건설업체  437       ㈜태영건설                       2002      <NA>
3    4     종합건설업체  438       ㈜포스코건설                     2003      취소
4    5     발주기관      439       한국도로공사                     2003      <NA>
5    6     발주기관      440       한국서부발전㈜ 청송양수건설처    2003      취소
6    7     종합건설업체  441       현대건설㈜                       2004      취소
7    8     종합건설업체  442       ㈜한진중공업 건설부문            2005      <NA>
8    9     발주기관      443       한국수력원자력㈜ 예천양수건설처  2005      취소
9    10    발주기관      444       인천국제공항공사                 2006      <NA>

Last rows

     연번  구분          인증번호  사업장명                         인증연도  비고
197  198   전문건설업체  3043      동남통신건설㈜                   2021      <NA>
198  199   전문건설업체  3044      (주)세광통신                     2021      <NA>
199  200   발주기관      3310      국가철도공단                     2022      <NA>
200  201   종합건설업체  3311      금강주택(주)                     2022      <NA>
201  202   종합건설업체  3312      우미건설(주)                     2022      <NA>
202  203   종합건설업체  3313      삼환기업(주)                     2022      <NA>
203  204   전문건설업체  3314      (주)창원기전                     2022      <NA>
204  205   전문건설업체  3315      (유)엠케이지                     2022      <NA>
205  206   전문건설업체  3316      (주)윈하이텍                     2022      <NA>
206  207   전문건설업체  3317      동서통신(주)                     2022      <NA>