Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
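
The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the constant names are illustrative, not taken from the report):

```python
# Figures copied from the "Dataset statistics" table.
TOTAL_SIZE_KIB = 566.4
N_OBSERVATIONS = 10_000
N_VARIABLES = 6


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean in-memory size of one row in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


avg = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
total_cells = N_VARIABLES * N_OBSERVATIONS  # cells checked for missing values
print(f"{avg:.1f} B per record, {total_cells} cells in total")
```

With the reported figures this gives roughly 58.0 B per record over 60000 cells, which agrees with the table.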

Variable types

Numeric: 2
Text: 2
Categorical: 2

Dataset

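Description: Information on the standard terms used by the Korea Occupational Safety and Health Agency (한국산업안전보건공단). The dataset provides columns such as term name, domain type classification, domain name, data type, and data length.
Author: Korea Occupational Safety and Health Agency (한국산업안전보건공단)
URL: https://www.data.go.kr/data/15091964/fileData.do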

Alerts

데이터타입 is highly imbalanced (56.7%) [Imbalance]
번호 has unique values [Unique]
데이터길이 has 1684 (16.8%) zeros [Zeros]
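
The report does not state how the imbalance percentage is computed. The sketch below assumes it is one minus the normalised Shannon entropy of the class counts; with the 데이터타입 counts listed later in this report (7576, 2319, 96, 9), that definition reproduces the 56.7% shown in the alert. The function name is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts.

    0 means perfectly balanced; values near 1 mean one class dominates.
    Assumed definition: it is not spelled out in the report itself.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts taken from the 데이터타입 frequency table in this report.
dtype_counts = {"VARCHAR": 7576, "NUMERIC": 2319, "DATE": 96, "CLOB": 9}
score = imbalance_score(list(dtype_counts.values()))

# The Zeros alert is a plain share of rows.
zeros_share = 1684 / 10_000
print(f"imbalance = {score:.1%}, zeros = {zeros_share:.1%}")
```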

Reproduction

Analysis started: 2023-12-12 13:02:24.862928
Analysis finished: 2023-12-12 13:02:26.434068
Duration: 1.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
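
The duration can be checked against the two timestamps. A small sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 13:02:24.862928")
finished = datetime.fromisoformat("2023-12-12 13:02:26.434068")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```

The difference is about 1.57 seconds, matching the reported duration.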

Variables

번호
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9693.9196
Minimum: 2
Maximum: 19464
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 958.9
Q1: 4862.75
Median: 9632.5
Q3: 14568.25
95-th percentile: 18491.05
Maximum: 19464
Range: 19462
Interquartile range (IQR): 9705.5
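
Range and IQR are derived from the other quantiles. The sketch below recomputes them and, as an extra step that is not part of the report, applies the common 1.5 x IQR rule to show that neither extreme of 번호 would count as an outlier under that rule.

```python
# Quantiles copied from the table for 번호.
quantiles = {"min": 2, "q1": 4862.75, "median": 9632.5, "q3": 14568.25, "max": 19464}

value_range = quantiles["max"] - quantiles["min"]
iqr = quantiles["q3"] - quantiles["q1"]

# Assumption: Tukey-style fences at 1.5 * IQR (not computed by the report).
lower_fence = quantiles["q1"] - 1.5 * iqr
upper_fence = quantiles["q3"] + 1.5 * iqr
has_outliers = quantiles["min"] < lower_fence or quantiles["max"] > upper_fence

print(value_range, iqr, lower_fence, upper_fence, has_outliers)
```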

Descriptive statistics

Standard deviation: 5614.577
Coefficient of variation (CV): 0.57918544
Kurtosis: -1.1919924
Mean: 9693.9196
Median Absolute Deviation (MAD): 4856.5
Skewness: 0.013073001
Sum: 96939196
Variance: 31523475
Monotonicity: Not monotonic
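
Several of these statistics are related by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. A short check using the rounded figures from the table (so small rounding differences are expected):

```python
# Figures copied from the report for 번호.
n = 10_000
total = 96_939_196
std = 5614.577

mean = total / n        # reported: 9693.9196
cv = std / mean         # reported: 0.57918544
variance = std ** 2     # reported: 31523475

print(f"mean={mean:.4f} cv={cv:.6f} variance={variance:.0f}")
```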
Histogram with fixed size bins (bins=50)
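
For reference, fixed-size binning splits the observed range into equally wide intervals. The sketch below assumes the usual convention that every bin is half-open except the last, which also includes the maximum; the helper names are hypothetical and the report's own plotting code is not shown here.

```python
def bin_edges(lo: float, hi: float, bins: int) -> list:
    """Return bins + 1 equally spaced edges covering [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(x: float, lo: float, hi: float, bins: int) -> int:
    """Return the 0-based bin of x; the last bin is closed on the right."""
    if not lo <= x <= hi:
        raise ValueError("value outside the histogram range")
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)


# Minimum, maximum and bin count taken from the report for 번호.
edges = bin_edges(2, 19464, 50)
bin_width = edges[1] - edges[0]
print(f"bin width = {bin_width:.2f}")
```

With 50 bins over the range 2 to 19464, each bin is about 389.24 wide.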

Common values

Value                  Count   Frequency (%)
4119                   1       < 0.1%
1159                   1       < 0.1%
11500                  1       < 0.1%
5964                   1       < 0.1%
3485                   1       < 0.1%
11995                  1       < 0.1%
13677                  1       < 0.1%
17076                  1       < 0.1%
9862                   1       < 0.1%
17909                  1       < 0.1%
Other values (9990)    9990    99.9%

Minimum 10 values

Value    Count   Frequency (%)
2        1       < 0.1%
3        1       < 0.1%
6        1       < 0.1%
7        1       < 0.1%
8        1       < 0.1%
10       1       < 0.1%
13       1       < 0.1%
14       1       < 0.1%
17       1       < 0.1%
18       1       < 0.1%

Maximum 10 values

Value    Count   Frequency (%)
19464    1       < 0.1%
19461    1       < 0.1%
19459    1       < 0.1%
19456    1       < 0.1%
19450    1       < 0.1%
19448    1       < 0.1%
19447    1       < 0.1%
19445    1       < 0.1%
19440    1       < 0.1%
19439    1       < 0.1%

용어명
Text

Distinct: 9979
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 43
Median length: 35
Mean length: 8.3485
Min length: 2

Characters and Unicode

Total characters: 83485
Distinct characters: 529
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
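
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The mapping to long category names and the function name are choices made for this example, and the tool that produced the report may compute its statistics differently.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample values shown for this variable.
sample = ["저장일자", "절감여부", "교육기관구분코드", "기준일자", "기간시간분류코드"]
print(category_counts(sample))
```

Hangul syllables fall under "Other Letter", which is why that category dominates the totals for this column.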

Unique

Unique: 9958
Unique (%): 99.6%
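
"Distinct" counts different values, while "Unique" counts values that occur exactly once. The sketch below shows the distinction on a toy list (hypothetical data) and then derives what the reported figures imply about repeated values.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy illustration: "b" repeats, so it is distinct but not unique.
toy = distinct_and_unique(["a", "b", "b", "c"])

# Figures from the report for this variable.
n_rows, distinct, unique = 10_000, 9979, 9958
repeated_values = distinct - unique   # values that occur more than once
repeated_rows = n_rows - unique       # rows holding those values
print(toy, repeated_values, repeated_rows)
```

The figures imply 21 repeated values spread over 42 rows, so each repeated term appears twice, in line with the counts of 2 in the frequency table for this variable.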

Sample

1st row: 저장일자
2nd row: 절감여부
3rd row: 교육기관구분코드
4th row: 기준일자
5th row: 기간시간분류코드

Value                    Count   Frequency (%)
s마크기술분야명          2       < 0.1%
강사이메일암호화값       2       < 0.1%
ods처리유형구분코드      2       < 0.1%
건강적합율               2       < 0.1%
강의월또는연평균횟수     2       < 0.1%
강사한자성명             2       < 0.1%
가중사망지분율           2       < 0.1%
거래처담당자직위명       2       < 0.1%
가중치운영3영역값        2       < 0.1%
거절사유구분코드         2       < 0.1%
Other values (9969)      9980    99.8%

Most occurring characters

ValueCountFrequency (%)
2837
 
3.4%
2753
 
3.3%
2487
 
3.0%
1981
 
2.4%
1960
 
2.3%
1722
 
2.1%
1697
 
2.0%
1607
 
1.9%
1534
 
1.8%
1429
 
1.7%
Other values (519) 63478
76.0%

Most occurring categories

Value                Count    Frequency (%)
Other Letter         80548    96.5%
Decimal Number       1838     2.2%
Uppercase Letter     451      0.5%
Open Punctuation     300      0.4%
Close Punctuation    300      0.4%
Dash Punctuation     23       < 0.1%
Lowercase Letter     21       < 0.1%
Other Punctuation    4        < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2837
 
3.5%
2753
 
3.4%
2487
 
3.1%
1981
 
2.5%
1960
 
2.4%
1722
 
2.1%
1697
 
2.1%
1607
 
2.0%
1534
 
1.9%
1429
 
1.8%
Other values (472) 60541
75.2%
Uppercase Letter
ValueCountFrequency (%)
S 77
17.1%
P 37
 
8.2%
M 37
 
8.2%
B 32
 
7.1%
A 28
 
6.2%
R 28
 
6.2%
L 25
 
5.5%
O 23
 
5.1%
D 23
 
5.1%
E 19
 
4.2%
Other values (15) 122
27.1%
Decimal Number
ValueCountFrequency (%)
1 511
27.8%
2 430
23.4%
3 233
12.7%
4 171
 
9.3%
5 129
 
7.0%
0 120
 
6.5%
6 90
 
4.9%
7 60
 
3.3%
8 51
 
2.8%
9 43
 
2.3%
Lowercase Letter
ValueCountFrequency (%)
i 4
19.0%
e 4
19.0%
g 3
14.3%
w 2
9.5%
n 2
9.5%
p 2
9.5%
m 2
9.5%
l 2
9.5%
Open Punctuation
ValueCountFrequency (%)
( 300
100.0%
Close Punctuation
ValueCountFrequency (%)
) 300
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 23
100.0%
Other Punctuation
ValueCountFrequency (%)
% 4
100.0%

Most occurring scripts

Value     Count    Frequency (%)
Hangul    80548    96.5%
Common    2465     3.0%
Latin     472      0.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2837
 
3.5%
2753
 
3.4%
2487
 
3.1%
1981
 
2.5%
1960
 
2.4%
1722
 
2.1%
1697
 
2.1%
1607
 
2.0%
1534
 
1.9%
1429
 
1.8%
Other values (472) 60541
75.2%
Latin
ValueCountFrequency (%)
S 77
16.3%
P 37
 
7.8%
M 37
 
7.8%
B 32
 
6.8%
A 28
 
5.9%
R 28
 
5.9%
L 25
 
5.3%
O 23
 
4.9%
D 23
 
4.9%
E 19
 
4.0%
Other values (23) 143
30.3%
Common
ValueCountFrequency (%)
1 511
20.7%
2 430
17.4%
( 300
12.2%
) 300
12.2%
3 233
9.5%
4 171
 
6.9%
5 129
 
5.2%
0 120
 
4.9%
6 90
 
3.7%
7 60
 
2.4%
Other values (4) 121
 
4.9%

Most occurring blocks

Value     Count    Frequency (%)
Hangul    80548    96.5%
ASCII     2937     3.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2837
 
3.5%
2753
 
3.4%
2487
 
3.1%
1981
 
2.5%
1960
 
2.4%
1722
 
2.1%
1697
 
2.1%
1607
 
2.0%
1534
 
1.9%
1429
 
1.8%
Other values (472) 60541
75.2%
ASCII
ValueCountFrequency (%)
1 511
17.4%
2 430
14.6%
( 300
10.2%
) 300
10.2%
3 233
7.9%
4 171
 
5.8%
5 129
 
4.4%
0 120
 
4.1%
6 90
 
3.1%
S 77
 
2.6%
Other values (37) 576
19.6%

도메인유형구분
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

일반: 6310
코드: 2467
번호: 1223

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 코드
4th row: 일반
5th row: 코드

Common Values

Value    Count   Frequency (%)
일반     6310    63.1%
코드     2467    24.7%
번호     1223    12.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count   Frequency (%)
일반     6310    63.1%
코드     2467    24.7%
번호     1223    12.2%

도메인명
Text

Distinct: 182
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 13
Mean length: 6.4186
Min length: 4

Characters and Unicode

Total characters: 64186
Distinct characters: 199
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 63
Unique (%): 0.6%

Sample

1st row: 일자VC8
2nd row: 여부VC1
3rd row: 구분코드VC20
4th row: 일자VC8
5th row: 코드VC15

Value                 Count   Frequency (%)
명vc200               1337    13.4%
구분코드vc20          1316    13.2%
코드vc15              1123    11.2%
일자vc8               759     7.6%
여부vc1               667     6.7%
금액dec               616     6.2%
내용vc2000            525     5.2%
수dec                 450     4.5%
일련번호dec10         292     2.9%
번호vc11              254     2.5%
Other values (172)    2661    26.6%

Most occurring characters

ValueCountFrequency (%)
C 9902
15.4%
0 7747
 
12.1%
V 7575
 
11.8%
2 3681
 
5.7%
1 3183
 
5.0%
2483
 
3.9%
2470
 
3.8%
D 2320
 
3.6%
E 2319
 
3.6%
5 1826
 
2.8%
Other values (189) 20680
32.2%

Most occurring categories

Value               Count    Frequency (%)
Other Letter        23802    37.1%
Uppercase Letter    22354    34.8%
Decimal Number      18030    28.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2483
 
10.4%
2470
 
10.4%
1436
 
6.0%
1337
 
5.6%
1317
 
5.5%
1286
 
5.4%
1236
 
5.2%
1234
 
5.2%
1039
 
4.4%
830
 
3.5%
Other values (168) 9134
38.4%
Uppercase Letter
ValueCountFrequency (%)
C 9902
44.3%
V 7575
33.9%
D 2320
 
10.4%
E 2319
 
10.4%
T 96
 
0.4%
M 96
 
0.4%
L 17
 
0.1%
U 12
 
0.1%
R 8
 
< 0.1%
N 5
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 7747
43.0%
2 3681
20.4%
1 3183
17.7%
5 1826
 
10.1%
8 798
 
4.4%
3 265
 
1.5%
4 240
 
1.3%
9 158
 
0.9%
6 67
 
0.4%
7 65
 
0.4%

Most occurring scripts

Value     Count    Frequency (%)
Hangul    23802    37.1%
Latin     22354    34.8%
Common    18030    28.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2483
 
10.4%
2470
 
10.4%
1436
 
6.0%
1337
 
5.6%
1317
 
5.5%
1286
 
5.4%
1236
 
5.2%
1234
 
5.2%
1039
 
4.4%
830
 
3.5%
Other values (168) 9134
38.4%
Latin
ValueCountFrequency (%)
C 9902
44.3%
V 7575
33.9%
D 2320
 
10.4%
E 2319
 
10.4%
T 96
 
0.4%
M 96
 
0.4%
L 17
 
0.1%
U 12
 
0.1%
R 8
 
< 0.1%
N 5
 
< 0.1%
Common
ValueCountFrequency (%)
0 7747
43.0%
2 3681
20.4%
1 3183
17.7%
5 1826
 
10.1%
8 798
 
4.4%
3 265
 
1.5%
4 240
 
1.3%
9 158
 
0.9%
6 67
 
0.4%
7 65
 
0.4%

Most occurring blocks

Value     Count    Frequency (%)
ASCII     40384    62.9%
Hangul    23802    37.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 9902
24.5%
0 7747
19.2%
V 7575
18.8%
2 3681
 
9.1%
1 3183
 
7.9%
D 2320
 
5.7%
E 2319
 
5.7%
5 1826
 
4.5%
8 798
 
2.0%
3 265
 
0.7%
Other values (11) 768
 
1.9%
Hangul
ValueCountFrequency (%)
2483
 
10.4%
2470
 
10.4%
1436
 
6.0%
1337
 
5.6%
1317
 
5.5%
1286
 
5.4%
1236
 
5.2%
1234
 
5.2%
1039
 
4.4%
830
 
3.5%
Other values (168) 9134
38.4%

데이터타입
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

VARCHAR: 7576
NUMERIC: 2319
DATE: 96
CLOB: 9

Length

Max length: 7
Median length: 7
Mean length: 6.9685
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: VARCHAR
2nd row: VARCHAR
3rd row: VARCHAR
4th row: VARCHAR
5th row: VARCHAR

Common Values

Value      Count   Frequency (%)
VARCHAR    7576    75.8%
NUMERIC    2319    23.2%
DATE       96      1.0%
CLOB       9       0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count   Frequency (%)
varchar    7576    75.8%
numeric    2319    23.2%
date       96      1.0%
clob       9       0.1%

데이터길이
Real number (ℝ)

ZEROS 

Distinct: 35
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 182.9338
Minimum: 0
Maximum: 4000
Zeros: 1684
Zeros (%): 16.8%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
Median: 15
Q3: 50
95-th percentile: 2000
Maximum: 4000
Range: 4000
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 540.22655
Coefficient of variation (CV): 2.9531259
Kurtosis: 16.853902
Mean: 182.9338
Median Absolute Deviation (MAD): 14
Skewness: 3.9725137
Sum: 1829338
Variance: 291844.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=35)

Common values

Value                Count   Frequency (%)
0                    1684    16.8%
20                   1485    14.8%
200                  1357    13.6%
15                   1145    11.5%
8                    772     7.7%
1                    702     7.0%
2000                 595     5.9%
10                   536     5.4%
50                   325     3.2%
11                   254     2.5%
Other values (25)    1145    11.5%

Minimum 10 values

Value    Count   Frequency (%)
0        1684    16.8%
1        702     7.0%
2        4       < 0.1%
3        11      0.1%
4        118     1.2%
5        192     1.9%
6        63      0.6%
7        1       < 0.1%
8        772     7.7%
9        142     1.4%

Maximum 10 values

Value    Count   Frequency (%)
4000     49      0.5%
2000     595     5.9%
1024     1       < 0.1%
1000     7       0.1%
500      38      0.4%
300      151     1.5%
250      2       < 0.1%
200      1357    13.6%
100      134     1.3%
64       3       < 0.1%

Interactions

[Figure: interaction plots between the numeric variables 번호 and 데이터길이]

Correlations

                  번호     도메인유형구분   데이터타입   데이터길이
번호              1.000    0.264            0.158        0.216
도메인유형구분    0.264    1.000            0.249        0.191
데이터타입        0.158    0.249            1.000        0.106
데이터길이        0.216    0.191            0.106        1.000

                  데이터타입   도메인유형구분
데이터타입        1.000        0.237
도메인유형구분    0.237        1.000

                  번호     데이터길이   도메인유형구분   데이터타입
번호              1.000    0.038        0.163            0.095
데이터길이        0.038    1.000        0.146            0.087
도메인유형구분    0.163    0.146        1.000            0.237
데이터타입        0.095    0.087        0.237            1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index)   번호     용어명                        도메인유형구분   도메인명              데이터타입   데이터길이
4118      4119     저장일자                      일반             일자VC8               VARCHAR      8
10180     10181    절감여부                      일반             여부VC1               VARCHAR      1
10463     10464    교육기관구분코드              코드             구분코드VC20          VARCHAR      20
1742      1743     기준일자                      일반             일자VC8               VARCHAR      8
8075      8076     기간시간분류코드              코드             코드VC15              VARCHAR      15
7550      7551     대표이메일                    일반             이메일VC50            VARCHAR      50
15038     15039    미디어제작일자                일반             일자VC8               VARCHAR      8
16917     16918    사업장공사장지점명            일반             명VC200               VARCHAR      200
13396     13397    문자메시지발송번호            번호             번호VC11              VARCHAR      11
12665     12666    수당종류코드                  코드             코드VC15              VARCHAR      15

Last rows

(index)   번호     용어명                        도메인유형구분   도메인명              데이터타입   데이터길이
16246     16247    출장주관부서코드              코드             코드VC15              VARCHAR      15
9785      9786     제3실태문제점내용            일반             내용VC2000            VARCHAR      2000
1810      1811     관심분야명                    일반             명VC200               VARCHAR      200
4338      4339     제12월계획문자값             일반             문자값VC50            VARCHAR      50
3448      3449     접수종료일자                  일반             일자VC8               VARCHAR      8
16463     16464    점검의견상세내용              일반             상세내용VC4000        VARCHAR      4000
10268     10269    사회적협동조합제품목표금액    일반             금액DEC               NUMERIC      0
10141     10142    제1하청사업자회사명          일반             명VC200               VARCHAR      200
8260      8261     사업장사업자등록번호          번호             사업자등록번호VC20    VARCHAR      20
5503      5504     재발급사유구분코드            코드             구분코드VC20          VARCHAR      20
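
In the sample rows, each 도메인명 value appears to combine a Korean domain label, a type abbreviation, and an optional length (for example, 일자VC8 alongside VARCHAR and 8). The sketch below splits a value on that assumed pattern. The pattern, the abbreviation mapping, and the function name are inferences from the rows shown here, not a documented rule of the dataset, and other abbreviations may exist.

```python
import re

# Assumed mapping, inferred only from the sample rows in this report.
ABBREVIATIONS = {"VC": "VARCHAR", "DEC": "NUMERIC"}

# Label, then uppercase abbreviation, then optional digits.
DOMAIN_PATTERN = re.compile(r"^(?P<label>.+?)(?P<abbr>[A-Z]+)(?P<length>\d*)$")


def parse_domain(name: str):
    """Split a domain name into (label, data type, length or None)."""
    match = DOMAIN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised domain name: {name!r}")
    abbr = match.group("abbr")
    length = int(match.group("length")) if match.group("length") else None
    # Unknown abbreviations are returned unchanged rather than guessed.
    return match.group("label"), ABBREVIATIONS.get(abbr, abbr), length


for value in ["일자VC8", "상세내용VC4000", "금액DEC"]:
    print(value, "->", parse_domain(value))
```

A check like this could be used to compare the parsed type and length against the 데이터타입 and 데이터길이 columns, keeping in mind that the rule is only an observation from twenty rows.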