Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 8341
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 478.5 KiB
Average record size in memory: 49.0 B
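
The percentages in this block follow directly from the counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes; the helper name `pct` and the variable names are illustrative and not part of the profiler:

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Figures copied from the dataset statistics in the report.
n_variables = 5
n_observations = 10_000
missing_cells = 8_341
total_kib = 478.5

total_cells = n_variables * n_observations      # every cell in the table
missing_pct = pct(missing_cells, total_cells)   # share of cells that are empty
avg_record_bytes = round(total_kib * 1024 / n_observations, 1)

print(missing_pct, avg_record_bytes)
```

Note that the 16.7% figure is relative to all 50,000 cells, whereas the 83.4% in the alerts is relative to the 10,000 rows of a single column.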

Variable types

Numeric: 1
Categorical: 1
Text: 2
DateTime: 1

Dataset

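Description: Lists the outstanding workplaces selected after a review of the risk assessment results and implementation rules of workplaces that submitted a risk assessment recognition application.
Author: 한국산업안전보건공단 (Korea Occupational Safety and Health Agency)
URL: https://www.data.go.kr/data/3038400/fileData.do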

Alerts

글번호 is highly overall correlated with 노동지청명 (High correlation)
노동지청명 is highly overall correlated with 글번호 (High correlation)
공사장명 has 8341 (83.4%) missing values (Missing)
글번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-11 23:25:18.548270
Analysis finished: 2023-12-11 23:25:19.679152
Duration: 1.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
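
A report of this kind can be regenerated from the source CSV. The sketch below assumes the `pandas` and `ydata-profiling` packages are installed. The function name, file paths, title, and the `cp949` default encoding are illustrative assumptions and are not taken from this report.

```python
def build_profile(csv_path, html_path, title="Profiling report", encoding="cp949"):
    """Read a CSV and write a ydata-profiling HTML report (hypothetical helper)."""
    # Imported lazily so this module still loads where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return html_path
```

Calling `build_profile("input.csv", "report.html")` would write the HTML file. Both paths are placeholders.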

Variables

글번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6459.2061
Minimum: 1
Maximum: 12969
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 640.95
Q1: 3215.75
Median: 6438.5
Q3: 9687.25
95-th percentile: 12323.05
Maximum: 12969
Range: 12968
Interquartile range (IQR): 6471.5
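
The fractional quartiles above (for example Q1 = 3215.75) are consistent with linear interpolation between neighbouring order statistics. That the profiler uses exactly this rule is an assumption. The sketch below applies the rule to a small toy list using only the standard library; `quartile_summary` is a hypothetical helper name.

```python
import statistics

def quartile_summary(values):
    """Return Q1, median, Q3, IQR and range using linear interpolation."""
    # method="inclusive" interpolates at position p * (n - 1) in the sorted data.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }

toy = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]  # toy data, not the real column
summary = quartile_summary(toy)
print(summary)
```

For the report's own figures, the IQR is Q3 - Q1 = 9687.25 - 3215.75 = 6471.5 and the range is 12969 - 1 = 12968.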

Descriptive statistics

Standard deviation: 3740.6873
Coefficient of variation (CV): 0.57912493
Kurtosis: -1.1959182
Mean: 6459.2061
Median Absolute Deviation (MAD): 3237.5
Skewness: 0.010053763
Sum: 64592061
Variance: 13992742
Monotonicity: Not monotonic
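
Several of these figures are linked by simple identities: the CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch follows, assuming the sample standard deviation and an unscaled MAD. Whether the profiler uses exactly these conventions is an assumption, and the helper names are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Consistency check against the rounded figures printed in the report.
report_std, report_mean = 3740.6873, 6459.2061
report_cv = round(report_std / report_mean, 4)
report_variance = report_std ** 2  # agrees with 13992742 up to rounding

toy = [1, 2, 3, 4, 100]  # toy data, not the real column
toy_mad = median_absolute_deviation(toy)
print(report_cv, toy_mad)
```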
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
8281 | 1 | < 0.1%
4075 | 1 | < 0.1%
2981 | 1 | < 0.1%
5835 | 1 | < 0.1%
6234 | 1 | < 0.1%
8048 | 1 | < 0.1%
3746 | 1 | < 0.1%
12384 | 1 | < 0.1%
12420 | 1 | < 0.1%
7800 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
12969 | 1 | < 0.1%
12968 | 1 | < 0.1%
12967 | 1 | < 0.1%
12966 | 1 | < 0.1%
12965 | 1 | < 0.1%
12964 | 1 | < 0.1%
12963 | 1 | < 0.1%
12962 | 1 | < 0.1%
12960 | 1 | < 0.1%
12959 | 1 | < 0.1%

노동지청명
Categorical

HIGH CORRELATION 

Distinct: 50
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
경기: 543
안산: 461
성남: 442
양산: 430
울산: 415
Other values (45): 7709

Length

Max length: 7
Median length: 2
Mean length: 2.4954
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 울산
2nd row: 양산
3rd row: 울산
4th row: 대전청
5th row: 서울남부
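
The length statistics for this variable are plain summaries of the string lengths. A minimal sketch over the five sample rows listed for it, assuming length is counted in Unicode code points (one per precomposed Hangul syllable); `length_summary` is a hypothetical helper name.

```python
import statistics

def length_summary(values):
    """Summarise string lengths, counted in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }

# The five sample rows shown for this variable, not the full column.
sample = ["울산", "양산", "울산", "대전청", "서울남부"]
summary = length_summary(sample)
print(summary)
```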

Common Values

Value | Count | Frequency (%)
경기 | 543 | 5.4%
안산 | 461 | 4.6%
성남 | 442 | 4.4%
양산 | 430 | 4.3%
울산 | 415 | 4.2%
창원 | 403 | 4.0%
부천 | 368 | 3.7%
천안 | 334 | 3.3%
대전청 | 331 | 3.3%
대구서부 | 329 | 3.3%
Other values (40) | 5944 | 59.4%

Length

Histogram of lengths of the category

사업장명
Text

Distinct: 8646
Distinct (%): 86.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 38
Median length: 31
Mean length: 8.6164
Min length: 2

Characters and Unicode

Total characters: 86164
Distinct characters: 802
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
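
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories in one of the sample values. The helper name `category_counts` is an assumption, and the standard library does not expose the script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Lo', 'Ps', 'Pe') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# 'Lo' = Other Letter, 'Ps' = Open Punctuation, 'Pe' = Close Punctuation
counts = category_counts("(주)로리텍")
print(counts)
```

Each Hangul syllable falls under Other Letter, which is why that category dominates the tables for the text variables.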

Unique

Unique: 8358
Unique (%): 83.6%

Sample

1st row: (주)로리텍
2nd row: (주)기성
3rd row: 정진기업
4th row: (주)스타벅스커피코리아
5th row: (주)아워홈_본사등
Value | Count | Frequency (%)
주)스타벅스커피코리아 | 158 | 1.6%
주택관리공단(주 | 106 | 1.1%
주)현대그린푸드 | 81 | 0.8%
주)신세계푸드 | 80 | 0.8%
주)현대그린푸드/음식 | 76 | 0.8%
주)케이티 | 60 | 0.6%
한화호텔앤드리조트(주 | 47 | 0.5%
주)kt | 31 | 0.3%
주)한화에스테이트 | 27 | 0.3%
주택관리공단(주)(901)(주 | 26 | 0.3%
Other values (8661) | 9348 | 93.1%

Most occurring characters

ValueCountFrequency (%)
7887
 
9.2%
( 5559
 
6.5%
) 5536
 
6.4%
2652
 
3.1%
2112
 
2.5%
2076
 
2.4%
2006
 
2.3%
1924
 
2.2%
1608
 
1.9%
1400
 
1.6%
Other values (792) 53404
62.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 71227 | 82.7%
Close Punctuation | 6037 | 7.0%
Open Punctuation | 6032 | 7.0%
Uppercase Letter | 1404 | 1.6%
Other Punctuation | 462 | 0.5%
Lowercase Letter | 375 | 0.4%
Decimal Number | 340 | 0.4%
Dash Punctuation | 212 | 0.2%
Space Separator | 57 | 0.1%
Connector Punctuation | 14 | < 0.1%
Other values (2) | 4 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7887
 
11.1%
2652
 
3.7%
2112
 
3.0%
2076
 
2.9%
2006
 
2.8%
1924
 
2.7%
1608
 
2.3%
1400
 
2.0%
1189
 
1.7%
1177
 
1.7%
Other values (707) 47196
66.3%
Uppercase Letter
ValueCountFrequency (%)
T 145
 
10.3%
E 128
 
9.1%
S 126
 
9.0%
C 126
 
9.0%
N 96
 
6.8%
G 90
 
6.4%
K 87
 
6.2%
M 76
 
5.4%
O 73
 
5.2%
L 68
 
4.8%
Other values (16) 389
27.7%
Lowercase Letter
ValueCountFrequency (%)
o 49
13.1%
e 48
12.8%
n 32
 
8.5%
t 32
 
8.5%
r 24
 
6.4%
a 21
 
5.6%
c 21
 
5.6%
d 20
 
5.3%
i 18
 
4.8%
l 16
 
4.3%
Other values (12) 94
25.1%
Decimal Number
ValueCountFrequency (%)
1 102
30.0%
2 80
23.5%
0 45
13.2%
9 37
 
10.9%
3 30
 
8.8%
5 12
 
3.5%
7 11
 
3.2%
4 10
 
2.9%
6 9
 
2.6%
8 2
 
0.6%
Other values (2) 2
 
0.6%
Other Punctuation
ValueCountFrequency (%)
/ 287
62.1%
. 76
 
16.5%
& 36
 
7.8%
23
 
5.0%
, 19
 
4.1%
: 9
 
1.9%
5
 
1.1%
@ 3
 
0.6%
· 2
 
0.4%
# 1
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 5559
92.2%
465
 
7.7%
[ 8
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 5536
91.7%
493
 
8.2%
] 8
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 210
99.1%
2
 
0.9%
Space Separator
ValueCountFrequency (%)
29
50.9%
  28
49.1%
Math Symbol
ValueCountFrequency (%)
< 1
50.0%
> 1
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 14
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 71224 | 82.7%
Common | 13156 | 15.3%
Latin | 1779 | 2.1%
Han | 5 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
7887
 
11.1%
2652
 
3.7%
2112
 
3.0%
2076
 
2.9%
2006
 
2.8%
1924
 
2.7%
1608
 
2.3%
1400
 
2.0%
1189
 
1.7%
1177
 
1.7%
Other values (703) 47193
66.3%
Latin
ValueCountFrequency (%)
T 145
 
8.2%
E 128
 
7.2%
S 126
 
7.1%
C 126
 
7.1%
N 96
 
5.4%
G 90
 
5.1%
K 87
 
4.9%
M 76
 
4.3%
O 73
 
4.1%
L 68
 
3.8%
Other values (38) 764
42.9%
Common
ValueCountFrequency (%)
( 5559
42.3%
) 5536
42.1%
493
 
3.7%
465
 
3.5%
/ 287
 
2.2%
- 210
 
1.6%
1 102
 
0.8%
2 80
 
0.6%
. 76
 
0.6%
0 45
 
0.3%
Other values (26) 303
 
2.3%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 71221 | 82.7%
ASCII | 13914 | 16.1%
None | 1023 | 1.2%
CJK | 5 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
7887
 
11.1%
2652
 
3.7%
2112
 
3.0%
2076
 
2.9%
2006
 
2.8%
1924
 
2.7%
1608
 
2.3%
1400
 
2.0%
1189
 
1.7%
1177
 
1.7%
Other values (701) 47190
66.3%
ASCII
ValueCountFrequency (%)
( 5559
40.0%
) 5536
39.8%
/ 287
 
2.1%
- 210
 
1.5%
T 145
 
1.0%
E 128
 
0.9%
S 126
 
0.9%
C 126
 
0.9%
1 102
 
0.7%
N 96
 
0.7%
Other values (64) 1599
 
11.5%
None
ValueCountFrequency (%)
493
48.2%
465
45.5%
  28
 
2.7%
23
 
2.2%
5
 
0.5%
2
 
0.2%
2
 
0.2%
· 2
 
0.2%
1
 
0.1%
1
 
0.1%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Compat Jamo
ValueCountFrequency (%)
1
100.0%

공사장명
Text

MISSING 

Distinct: 1434
Distinct (%): 86.4%
Missing: 8341
Missing (%): 83.4%
Memory size: 156.2 KiB

Length

Max length: 58
Median length: 37
Mean length: 12.086197
Min length: 2

Characters and Unicode

Total characters: 20051
Distinct characters: 579
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1380
Unique (%): 83.2%

Sample

1st row: 충남대정문
2nd row: 롯데마트김포공항점
3rd row: 마트송도
4th row: SK hynix 자회사형 표준사업장 Project(청주)
5th row: 관악구민종합체육센터
Value | Count | Frequency (%)
주)케이티 | 82 | 3.0%
주)스타벅스커피코리아 | 66 | 2.4%
신축공사 | 51 | 1.8%
이마트 | 35 | 1.3%
주)서울메트로환경 | 24 | 0.9%
한화호텔앤드리조트(주 | 23 | 0.8%
주)미래비엠 | 22 | 0.8%
롯데마트 | 18 | 0.6%
주)현대그린푸드/음식 | 17 | 0.6%
롯데백화점 | 15 | 0.5%
Other values (1898) | 2420 | 87.3%

Most occurring characters

ValueCountFrequency (%)
1114
 
5.6%
1000
 
5.0%
( 794
 
4.0%
) 789
 
3.9%
573
 
2.9%
437
 
2.2%
414
 
2.1%
405
 
2.0%
400
 
2.0%
359
 
1.8%
Other values (569) 13766
68.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15854 | 79.1%
Space Separator | 1114 | 5.6%
Open Punctuation | 813 | 4.1%
Close Punctuation | 808 | 4.0%
Uppercase Letter | 536 | 2.7%
Decimal Number | 417 | 2.1%
Lowercase Letter | 231 | 1.2%
Other Punctuation | 124 | 0.6%
Dash Punctuation | 117 | 0.6%
Other Symbol | 16 | 0.1%
Other values (2) | 21 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1000
 
6.3%
573
 
3.6%
437
 
2.8%
414
 
2.6%
405
 
2.6%
400
 
2.5%
359
 
2.3%
351
 
2.2%
266
 
1.7%
257
 
1.6%
Other values (486) 11392
71.9%
Uppercase Letter
ValueCountFrequency (%)
S 59
11.0%
C 56
 
10.4%
T 48
 
9.0%
K 47
 
8.8%
D 34
 
6.3%
O 31
 
5.8%
P 30
 
5.6%
A 30
 
5.6%
L 28
 
5.2%
I 26
 
4.9%
Other values (14) 147
27.4%
Lowercase Letter
ValueCountFrequency (%)
e 27
11.7%
r 22
 
9.5%
o 22
 
9.5%
n 21
 
9.1%
t 17
 
7.4%
a 17
 
7.4%
i 13
 
5.6%
l 13
 
5.6%
k 10
 
4.3%
p 9
 
3.9%
Other values (14) 60
26.0%
Decimal Number
ValueCountFrequency (%)
1 113
27.1%
2 102
24.5%
0 50
12.0%
3 39
 
9.4%
9 27
 
6.5%
8 20
 
4.8%
4 16
 
3.8%
7 16
 
3.8%
5 15
 
3.6%
6 14
 
3.4%
Other values (3) 5
 
1.2%
Other Punctuation
ValueCountFrequency (%)
/ 69
55.6%
, 24
 
19.4%
& 15
 
12.1%
. 9
 
7.3%
' 2
 
1.6%
: 2
 
1.6%
· 2
 
1.6%
# 1
 
0.8%
Math Symbol
ValueCountFrequency (%)
~ 7
63.6%
+ 2
 
18.2%
> 1
 
9.1%
< 1
 
9.1%
Open Punctuation
ValueCountFrequency (%)
( 794
97.7%
13
 
1.6%
[ 6
 
0.7%
Close Punctuation
ValueCountFrequency (%)
) 789
97.6%
14
 
1.7%
] 5
 
0.6%
Space Separator
ValueCountFrequency (%)
1114
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 117
100.0%
Other Symbol
ValueCountFrequency (%)
16
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 10
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15870 | 79.1%
Common | 3414 | 17.0%
Latin | 767 | 3.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1000
 
6.3%
573
 
3.6%
437
 
2.8%
414
 
2.6%
405
 
2.6%
400
 
2.5%
359
 
2.3%
351
 
2.2%
266
 
1.7%
257
 
1.6%
Other values (487) 11408
71.9%
Latin
ValueCountFrequency (%)
S 59
 
7.7%
C 56
 
7.3%
T 48
 
6.3%
K 47
 
6.1%
D 34
 
4.4%
O 31
 
4.0%
P 30
 
3.9%
A 30
 
3.9%
L 28
 
3.7%
e 27
 
3.5%
Other values (38) 377
49.2%
Common
ValueCountFrequency (%)
1114
32.6%
( 794
23.3%
) 789
23.1%
- 117
 
3.4%
1 113
 
3.3%
2 102
 
3.0%
/ 69
 
2.0%
0 50
 
1.5%
3 39
 
1.1%
9 27
 
0.8%
Other values (24) 200
 
5.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15854 | 79.1%
ASCII | 4147 | 20.7%
None | 50 | 0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1114
26.9%
( 794
19.1%
) 789
19.0%
- 117
 
2.8%
1 113
 
2.7%
2 102
 
2.5%
/ 69
 
1.7%
S 59
 
1.4%
C 56
 
1.4%
0 50
 
1.2%
Other values (66) 884
21.3%
Hangul
ValueCountFrequency (%)
1000
 
6.3%
573
 
3.6%
437
 
2.8%
414
 
2.6%
405
 
2.6%
400
 
2.5%
359
 
2.3%
351
 
2.2%
266
 
1.7%
257
 
1.6%
Other values (486) 11392
71.9%
None
ValueCountFrequency (%)
16
32.0%
14
28.0%
13
26.0%
2
 
4.0%
2
 
4.0%
· 2
 
4.0%
1
 
2.0%

인정일
DateTime

Distinct: 767
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2017-08-01 00:00:00
Maximum: 2020-07-31 00:00:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations

Variable | 글번호 | 노동지청명
글번호 | 1.000 | 1.000
노동지청명 | 1.000 | 1.000

Variable | 글번호 | 노동지청명
글번호 | 1.000 | 0.962
노동지청명 | 0.962 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
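
The per-column nullity behind these charts is a simple count of empty cells. A minimal sketch in plain Python, using a few rows taken from the report's sample with `None` standing in for `<NA>`; the helper name `missing_per_column` and the choice of rows are illustrative.

```python
def missing_per_column(rows):
    """Count None values per column across a list of row dictionaries."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts

# Four rows from the report's sample; None stands in for <NA>.
rows = [
    {"글번호": 8281, "공사장명": None},
    {"글번호": 7637, "공사장명": None},
    {"글번호": 3370, "공사장명": "충남대정문"},
    {"글번호": 5104, "공사장명": "롯데마트김포공항점"},
]
counts = missing_per_column(rows)

# Full-column figure from the report: 8341 missing out of 10000 rows.
report_missing_pct = round(100 * 8341 / 10000, 1)
print(counts, report_missing_pct)
```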

Sample

First rows
Index | 글번호 | 노동지청명 | 사업장명 | 공사장명 | 인정일
8280 | 8281 | 울산 | (주)로리텍 | <NA> | 2017-08-09
7636 | 7637 | 양산 | (주)기성 | <NA> | 2018-11-29
8546 | 8547 | 울산 | 정진기업 | <NA> | 2018-11-13
3369 | 3370 | 대전청 | (주)스타벅스커피코리아 | 충남대정문 | 2018-03-29
5103 | 5104 | 서울남부 | (주)아워홈_본사등 | 롯데마트김포공항점 | 2018-11-14
2489 | 2490 | 대구서부 | 주식회사대원정밀 | <NA> | 2019-09-10
7493 | 7494 | 양산 | 경덕산업주식회사 | <NA> | 2020-01-01
6955 | 6956 | 안산 | 주식회사중앙브레인(경기지점) | <NA> | 2019-10-02
3124 | 3125 | 대전청 | 계룡유통 | <NA> | 2017-11-16
10220 | 10221 | 중부청 | 이조케터링서비스(주) | 마트송도 | 2017-08-10

Last rows
Index | 글번호 | 노동지청명 | 사업장명 | 공사장명 | 인정일
9152 | 9153 | 의정부 | 대흥화공 | <NA> | 2019-03-08
982 | 983 | 경기 | 주식회사디케이텍인더스트리 | <NA> | 2019-08-22
8644 | 8645 | 울산 | 대성기업 | <NA> | 2018-04-25
8807 | 8808 | 원주 | (주)하나플랜트 | <NA> | 2019-10-29
6213 | 6214 | 성남 | 세준푸드농업회사법인주식회사 | <NA> | 2020-01-16
3580 | 3581 | 목포 | 지팸중공업(주) | <NA> | 2017-11-27
10509 | 10510 | 중부청 | 신영산업사 | <NA> | 2019-01-10
12448 | 12449 | 평택 | 주식회사아로 | <NA> | 2019-07-25
4301 | 4302 | 부천 | 성일식품 | <NA> | 2018-01-17
10323 | 10324 | 중부청 | (주)서브엔 | 롯데백화점 인천터미널점 | 2019-12-05