Overview

Dataset statistics

Number of variables: 7
Number of observations: 2486
Missing cells: 85
Missing cells (%): 0.5%
Duplicate rows: 12
Duplicate rows (%): 0.5%
Total size in memory: 138.5 KiB
Average record size in memory: 57.1 B
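
The two percentages above can be re-derived from the raw counts: the missing-cell share is taken over all cells (variables × observations) and the duplicate share over all rows. The following is a minimal sketch of that arithmetic using only the counts reported here; the helper name `share` is illustrative and is not part of ydata-profiling.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts copied from the "Dataset statistics" block of the report.
n_variables = 7
n_observations = 2486
missing_cells = 85
duplicate_rows = 12

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_cells_pct = share(missing_cells, total_cells)
duplicate_rows_pct = share(duplicate_rows, n_observations)

print(total_cells, missing_cells_pct, duplicate_rows_pct)
```

Both shares round to 0.5%, which is why the two percentage lines look identical even though they have different denominators.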

Variable types

Text: 2
Categorical: 2
Numeric: 1
Boolean: 2

Dataset

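Description: Information on the online and offline training courses operated by 중소벤처기업연수원 (the SME training institute). Columns: 과정명 (course name), 연수방법 (training method), 중분류 (middle category), 소분류 (subcategory), 연수비용 (training fee), 교재유무 (textbook provided), 시험유무 (exam given).
Author: 중소벤처기업진흥공단 (Korea SMEs and Startups Agency)
URL: https://www.data.go.kr/data/15124931/fileData.do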

Alerts

Dataset has 12 (0.5%) duplicate rows [Duplicates]
연수방법 is highly overall correlated with 중분류 and 2 other fields [High correlation]
중분류 is highly overall correlated with 연수방법 and 2 other fields [High correlation]
교재유무(Y_N) is highly overall correlated with 연수방법 and 1 other field [High correlation]
시험유무(Y_N) is highly overall correlated with 연수방법 and 1 other field [High correlation]
연수비용(원) has 85 (3.4%) missing values [Missing]
연수비용(원) has 775 (31.2%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 01:56:10.186345
Analysis finished: 2023-12-12 01:56:11.432696
Duration: 1.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

과정명
Text

Distinct: 2422
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Length

Max length: 76
Median length: 54
Mean length: 24.734916
Min length: 2

Characters and Unicode

Total characters: 61491
Distinct characters: 782
Distinct categories: 16
Distinct scripts: 4
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
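
As an illustration of the character properties mentioned above, the sketch below counts the Unicode general category of each character in one of the sample course names using Python's standard `unicodedata` module. It is a minimal example, not ydata-profiling's own implementation; the function name `category_counts` is illustrative. The two-letter codes map to the long names used in this report, for example `Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number, `Lu` is Uppercase Letter and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of the 과정명 column.
sample = "알기쉬운 3D스캐닝/프린터 실무"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

The standard library exposes general categories but not scripts or blocks, so those two breakdowns are outside the scope of this sketch.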

Unique

Unique: 2363
Unique (%): 95.1%

Sample

1st row: 세무조사 완전준비! 어서와 세무조사는 처음이지?
2nd row: 품질관리 기본
3rd row: 자동차 전장기초
4th row: 알기쉬운 3D스캐닝/프린터 실무
5th row: (금형과 관련된)제품 설계기술

| Value | Count | Frequency (%) |
| 스마트공장 | 323 | 2.7% |
|  | 216 | 1.8% |
| 위한 | 215 | 1.8% |
| 웨비나 | 157 | 1.3% |
| 실무 | 140 | 1.2% |
|  | 135 | 1.1% |
| 2022 | 127 | 1.0% |
| 기업현장 | 118 | 1.0% |
| 2022년 | 107 | 0.9% |
| 기초 | 75 | 0.6% |
| Other values (4140) | 10493 | 86.7% |

Most occurring characters

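| Value | Count | Frequency (%) |
| (space) | 9835 | 16.0% |
|  | 1391 | 2.3% |
| 2 | 1166 | 1.9% |
| [ | 1078 | 1.8% |
| ] | 1078 | 1.8% |
| ) | 1041 | 1.7% |
| ( | 1040 | 1.7% |
|  | 951 | 1.5% |
|  | 850 | 1.4% |
|  | 837 | 1.4% |
| Other values (772) | 42224 | 68.7% |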

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 38654 | 62.9% |
| Space Separator | 9835 | 16.0% |
| Uppercase Letter | 3170 | 5.2% |
| Decimal Number | 2401 | 3.9% |
| Close Punctuation | 2124 | 3.5% |
| Open Punctuation | 2123 | 3.5% |
| Lowercase Letter | 2072 | 3.4% |
| Other Punctuation | 622 | 1.0% |
| Connector Punctuation | 211 | 0.3% |
| Dash Punctuation | 187 | 0.3% |
| Other values (6) | 92 | 0.1% |

Most frequent character per category

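Other Letter
| Value | Count | Frequency (%) |
|  | 1391 | 3.6% |
|  | 951 | 2.5% |
|  | 850 | 2.2% |
|  | 837 | 2.2% |
|  | 755 | 2.0% |
|  | 696 | 1.8% |
|  | 656 | 1.7% |
|  | 631 | 1.6% |
|  | 584 | 1.5% |
|  | 544 | 1.4% |
| Other values (677) | 30759 | 79.6% |

Uppercase Letter
| Value | Count | Frequency (%) |
| C | 335 | 10.6% |
| P | 333 | 10.5% |
| E | 263 | 8.3% |
| S | 263 | 8.3% |
| A | 224 | 7.1% |
| L | 201 | 6.3% |
| D | 185 | 5.8% |
| T | 165 | 5.2% |
| I | 164 | 5.2% |
| M | 143 | 4.5% |
| Other values (16) | 894 | 28.2% |

Lowercase Letter
| Value | Count | Frequency (%) |
| t | 246 | 11.9% |
| e | 204 | 9.8% |
| r | 200 | 9.7% |
| o | 186 | 9.0% |
| n | 166 | 8.0% |
| a | 158 | 7.6% |
| i | 156 | 7.5% |
| s | 92 | 4.4% |
| l | 88 | 4.2% |
| h | 81 | 3.9% |
| Other values (15) | 495 | 23.9% |

Other Punctuation
| Value | Count | Frequency (%) |
| , | 222 | 35.7% |
| ! | 110 | 17.7% |
| / | 102 | 16.4% |
| . | 82 | 13.2% |
| & | 32 | 5.1% |
| · | 28 | 4.5% |
| # | 13 | 2.1% |
| ? | 12 | 1.9% |
| : | 12 | 1.9% |
| " | 4 | 0.6% |
| Other values (3) | 5 | 0.8% |

Decimal Number
| Value | Count | Frequency (%) |
| 2 | 1166 | 48.6% |
| 0 | 515 | 21.4% |
| 1 | 322 | 13.4% |
| 3 | 133 | 5.5% |
| 4 | 98 | 4.1% |
| 9 | 50 | 2.1% |
| 5 | 46 | 1.9% |
| 6 | 36 | 1.5% |
| 7 | 19 | 0.8% |
| 8 | 16 | 0.7% |

Open Punctuation
| Value | Count | Frequency (%) |
| [ | 1078 | 50.8% |
| ( | 1040 | 49.0% |
|  | 4 | 0.2% |
|  | 1 | < 0.1% |

Close Punctuation
| Value | Count | Frequency (%) |
| ] | 1078 | 50.8% |
| ) | 1041 | 49.0% |
|  | 4 | 0.2% |
|  | 1 | < 0.1% |

Letter Number
| Value | Count | Frequency (%) |
|  | 2 | 50.0% |
|  | 1 | 25.0% |
|  | 1 | 25.0% |

Math Symbol
| Value | Count | Frequency (%) |
| + | 58 | 81.7% |
| ~ | 13 | 18.3% |

Other Symbol
| Value | Count | Frequency (%) |
| ® | 7 | 70.0% |
|  | 3 | 30.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 9835 | 100.0% |

Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 211 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 187 | 100.0% |

Final Punctuation
| Value | Count | Frequency (%) |
|  | 3 | 100.0% |

Initial Punctuation
| Value | Count | Frequency (%) |
|  | 3 | 100.0% |

Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 1 | 100.0% |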

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 38644 | 62.8% |
| Common | 17591 | 28.6% |
| Latin | 5246 | 8.5% |
| Han | 10 | < 0.1% |

Most frequent character per script

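Hangul
| Value | Count | Frequency (%) |
|  | 1391 | 3.6% |
|  | 951 | 2.5% |
|  | 850 | 2.2% |
|  | 837 | 2.2% |
|  | 755 | 2.0% |
|  | 696 | 1.8% |
|  | 656 | 1.7% |
|  | 631 | 1.6% |
|  | 584 | 1.5% |
|  | 544 | 1.4% |
| Other values (668) | 30749 | 79.6% |

Latin
| Value | Count | Frequency (%) |
| C | 335 | 6.4% |
| P | 333 | 6.3% |
| E | 263 | 5.0% |
| S | 263 | 5.0% |
| t | 246 | 4.7% |
| A | 224 | 4.3% |
| e | 204 | 3.9% |
| L | 201 | 3.8% |
| r | 200 | 3.8% |
| o | 186 | 3.5% |
| Other values (44) | 2791 | 53.2% |

Common
| Value | Count | Frequency (%) |
| (space) | 9835 | 55.9% |
| 2 | 1166 | 6.6% |
| [ | 1078 | 6.1% |
| ] | 1078 | 6.1% |
| ) | 1041 | 5.9% |
| ( | 1040 | 5.9% |
| 0 | 515 | 2.9% |
| 1 | 322 | 1.8% |
| , | 222 | 1.3% |
| _ | 211 | 1.2% |
| Other values (31) | 1083 | 6.2% |

Han
| Value | Count | Frequency (%) |
|  | 2 | 20.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |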

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 38644 | 62.8% |
| ASCII | 22775 | 37.0% |
| None | 49 | 0.1% |
| CJK | 10 | < 0.1% |
| Punctuation | 6 | < 0.1% |
| Number Forms | 4 | < 0.1% |
| Enclosed Alphanum | 3 | < 0.1% |

Most frequent character per block

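ASCII
| Value | Count | Frequency (%) |
| (space) | 9835 | 43.2% |
| 2 | 1166 | 5.1% |
| [ | 1078 | 4.7% |
| ] | 1078 | 4.7% |
| ) | 1041 | 4.6% |
| ( | 1040 | 4.6% |
| 0 | 515 | 2.3% |
| C | 335 | 1.5% |
| P | 333 | 1.5% |
| 1 | 322 | 1.4% |
| Other values (71) | 6032 | 26.5% |

Hangul
| Value | Count | Frequency (%) |
|  | 1391 | 3.6% |
|  | 951 | 2.5% |
|  | 850 | 2.2% |
|  | 837 | 2.2% |
|  | 755 | 2.0% |
|  | 696 | 1.8% |
|  | 656 | 1.7% |
|  | 631 | 1.6% |
|  | 584 | 1.5% |
|  | 544 | 1.4% |
| Other values (668) | 30749 | 79.6% |

None
| Value | Count | Frequency (%) |
| · | 28 | 57.1% |
| ® | 7 | 14.3% |
|  | 4 | 8.2% |
|  | 4 | 8.2% |
|  | 3 | 6.1% |
|  | 1 | 2.0% |
|  | 1 | 2.0% |
| ´ | 1 | 2.0% |

Punctuation
| Value | Count | Frequency (%) |
|  | 3 | 50.0% |
|  | 3 | 50.0% |

Enclosed Alphanum
| Value | Count | Frequency (%) |
|  | 3 | 100.0% |

CJK
| Value | Count | Frequency (%) |
|  | 2 | 20.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |
|  | 1 | 10.0% |

Number Forms
| Value | Count | Frequency (%) |
|  | 2 | 50.0% |
|  | 1 | 25.0% |
|  | 1 | 25.0% |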

연수방법
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB
집합연수: 1307
이러닝연수: 1179

Length

Max length: 5
Median length: 4
Mean length: 4.4742558
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 집합연수
2nd row: 집합연수
3rd row: 집합연수
4th row: 집합연수
5th row: 집합연수

Common Values

| Value | Count | Frequency (%) |
| 집합연수 | 1307 | 52.6% |
| 이러닝연수 | 1179 | 47.4% |

Length

Histogram of lengths of the category


중분류
Categorical

HIGH CORRELATION 

Distinct: 20
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB
경영: 881
스마트제조: 755
4차 산업혁명: 205
생산품질: 140
뿌리/생산기술: 98
Other values (15): 407

Length

Max length: 10
Median length: 9
Mean length: 4.0884956
Min length: 2

Unique

Unique: 5
Unique (%): 0.2%

Sample

1st row: 경영
2nd row: 생산품질
3rd row: 스마트제조
4th row: 스마트제조
5th row: 뿌리/생산기술

Common Values

| Value | Count | Frequency (%) |
| 경영 | 881 | 35.4% |
| 스마트제조 | 755 | 30.4% |
| 4차 산업혁명 | 205 | 8.2% |
| 생산품질 | 140 | 5.6% |
| 뿌리/생산기술 | 98 | 3.9% |
| 비즈외국어 | 94 | 3.8% |
| 뿌리기술 | 80 | 3.2% |
| 협업연수 | 70 | 2.8% |
| 기타연수 | 46 | 1.9% |
| 직무 | 36 | 1.4% |
| Other values (10) | 81 | 3.3% |

Length

Histogram of lengths of the category

| Value | Count | Frequency (%) |
| 경영 | 881 | 32.3% |
| 스마트제조 | 755 | 27.7% |
| 4차 | 205 | 7.5% |
| 산업혁명 | 205 | 7.5% |
| 생산품질 | 140 | 5.1% |
| 뿌리/생산기술 | 98 | 3.6% |
| 비즈외국어 | 94 | 3.5% |
| 뿌리기술 | 80 | 2.9% |
| 협업연수 | 70 | 2.6% |
| 기타연수 | 46 | 1.7% |
| Other values (12) | 150 | 5.5% |

소분류
Text

Distinct: 62
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Length

Max length: 11
Median length: 4
Mean length: 4.8382944
Min length: 2

Characters and Unicode

Total characters: 12028
Distinct characters: 141
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.3%

Sample

1st row: 경영직무
2nd row: 품질관리
3rd row: 융합기술
4th row: 융합기술
5th row: 기계설계/기계가공

| Value | Count | Frequency (%) |
| 자기계발 | 375 | 15.1% |
| 융합기술 | 223 | 9.0% |
| 도입전략 | 221 | 8.9% |
| 경영직무 | 207 | 8.3% |
| 경영일반 | 186 | 7.5% |
| 운영관리기술 | 131 | 5.3% |
| 정보기술(it | 95 | 3.8% |
| 표면처리/열처리/금형 | 78 | 3.1% |
| 제조현장관리기술 | 72 | 2.9% |
| 인사이트 | 71 | 2.9% |
| Other values (53) | 828 | 33.3% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 1179 | 9.8% |
|  | 643 | 5.3% |
|  | 595 | 4.9% |
|  | 486 | 4.0% |
|  | 448 | 3.7% |
|  | 437 | 3.6% |
|  | 417 | 3.5% |
|  | 390 | 3.2% |
| / | 341 | 2.8% |
|  | 292 | 2.4% |
| Other values (131) | 6800 | 56.5% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 11224 | 93.3% |
| Other Punctuation | 349 | 2.9% |
| Uppercase Letter | 253 | 2.1% |
| Close Punctuation | 98 | 0.8% |
| Open Punctuation | 98 | 0.8% |
| Dash Punctuation | 5 | < 0.1% |
| Space Separator | 1 | < 0.1% |

Most frequent character per category

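Other Letter
| Value | Count | Frequency (%) |
|  | 1179 | 10.5% |
|  | 643 | 5.7% |
|  | 595 | 5.3% |
|  | 486 | 4.3% |
|  | 448 | 4.0% |
|  | 437 | 3.9% |
|  | 417 | 3.7% |
|  | 390 | 3.5% |
|  | 292 | 2.6% |
|  | 273 | 2.4% |
| Other values (114) | 6064 | 54.0% |

Uppercase Letter
| Value | Count | Frequency (%) |
| I | 102 | 40.3% |
| T | 97 | 38.3% |
| E | 10 | 4.0% |
| O | 10 | 4.0% |
| S | 7 | 2.8% |
| M | 5 | 2.0% |
| K | 5 | 2.0% |
| V | 5 | 2.0% |
| A | 5 | 2.0% |
| L | 5 | 2.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| / | 341 | 97.7% |
| · | 8 | 2.3% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 98 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 98 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5 | 100.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 1 | 100.0% |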

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 11224 | 93.3% |
| Common | 551 | 4.6% |
| Latin | 253 | 2.1% |

Most frequent character per script

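Hangul
| Value | Count | Frequency (%) |
|  | 1179 | 10.5% |
|  | 643 | 5.7% |
|  | 595 | 5.3% |
|  | 486 | 4.3% |
|  | 448 | 4.0% |
|  | 437 | 3.9% |
|  | 417 | 3.7% |
|  | 390 | 3.5% |
|  | 292 | 2.6% |
|  | 273 | 2.4% |
| Other values (114) | 6064 | 54.0% |

Latin
| Value | Count | Frequency (%) |
| I | 102 | 40.3% |
| T | 97 | 38.3% |
| E | 10 | 4.0% |
| O | 10 | 4.0% |
| S | 7 | 2.8% |
| M | 5 | 2.0% |
| K | 5 | 2.0% |
| V | 5 | 2.0% |
| A | 5 | 2.0% |
| L | 5 | 2.0% |

Common
| Value | Count | Frequency (%) |
| / | 341 | 61.9% |
| ) | 98 | 17.8% |
| ( | 98 | 17.8% |
| · | 8 | 1.5% |
| - | 5 | 0.9% |
| (space) | 1 | 0.2% |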

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 11224 | 93.3% |
| ASCII | 796 | 6.6% |
| None | 8 | 0.1% |

Most frequent character per block

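Hangul
| Value | Count | Frequency (%) |
|  | 1179 | 10.5% |
|  | 643 | 5.7% |
|  | 595 | 5.3% |
|  | 486 | 4.3% |
|  | 448 | 4.0% |
|  | 437 | 3.9% |
|  | 417 | 3.7% |
|  | 390 | 3.5% |
|  | 292 | 2.6% |
|  | 273 | 2.4% |
| Other values (114) | 6064 | 54.0% |

ASCII
| Value | Count | Frequency (%) |
| / | 341 | 42.8% |
| I | 102 | 12.8% |
| ) | 98 | 12.3% |
| ( | 98 | 12.3% |
| T | 97 | 12.2% |
| E | 10 | 1.3% |
| O | 10 | 1.3% |
| S | 7 | 0.9% |
| M | 5 | 0.6% |
| K | 5 | 0.6% |
| Other values (6) | 23 | 2.9% |

None
| Value | Count | Frequency (%) |
| · | 8 | 100.0% |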

연수비용(원)
Real number (ℝ)

MISSING  ZEROS 

Distinct: 263
Distinct (%): 11.0%
Missing: 85
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 314366.79
Minimum: -1
Maximum: 19500000
Zeros: 775
Zeros (%): 31.2%
Negative: 3
Negative (%): 0.1%
Memory size: 22.0 KiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
Median: 50000
Q3: 107000
95-th percentile: 539000
Maximum: 19500000
Range: 19500001
Interquartile range (IQR): 107000

Descriptive statistics

Standard deviation: 1306195.1
Coefficient of variation (CV): 4.1550034
Kurtosis: 72.507988
Mean: 314366.79
Median Absolute Deviation (MAD): 50000
Skewness: 7.7921535
Sum: 7.5479467 × 10^8
Variance: 1.7061456 × 10^12
Monotonicity: Not monotonic
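
Several of the figures above are tied together by simple identities: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of non-missing values. The sketch below re-derives them from the reported numbers as a consistency check. It assumes that the mean and sum are computed over the non-missing rows only (2486 − 85), which the report does not state explicitly.

```python
import math

# Values copied from the report for 연수비용(원).
n_observations = 2486
n_missing = 85
mean = 314366.79
std = 1306195.1
q1, q3 = 0, 107000
minimum, maximum = -1, 19500000

# Assumption: statistics are computed over non-missing rows only.
n_valid = n_observations - n_missing

value_range = maximum - minimum   # reported as 19500001
iqr = q3 - q1                     # reported as 107000
cv = std / mean                   # reported as 4.1550034
variance = std ** 2               # reported as 1.7061456 x 10^12
total = mean * n_valid            # reported as 7.5479467 x 10^8

print(value_range, iqr, round(cv, 7), f"{variance:.7e}", f"{total:.7e}")
```

A CV above 4 together with a median of 50000 against a mean of about 314000 points to a heavily right-skewed column, consistent with the reported skewness and kurtosis.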
Histogram with fixed size bins (bins=50)
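
To show what fixed-size binning does to a column this skewed, the sketch below splits the reported range [-1, 19500000] into 50 equal-width bins and counts values per bin. The full column is not part of this report, so only the ten most common values and their counts from the report's frequency table are used; the function `fixed_width_histogram` is an illustrative helper and not ydata-profiling's implementation.

```python
def fixed_width_histogram(values, bins, low, high):
    """Count values into `bins` equal-width bins spanning [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - low) / width)
        idx = min(idx, bins - 1)  # the maximum itself belongs to the last bin
        counts[idx] += 1
    return counts


# Ten most common values of 연수비용(원) with their counts, as reported.
common = {0: 775, 330000: 135, 40000: 112, 60000: 91, 341000: 88,
          50000: 87, 100000: 67, 440000: 62, 80000: 57, 90000: 51}
values = [v for v, n in common.items() for _ in range(n)]

counts = fixed_width_histogram(values, bins=50, low=-1, high=19500000)
print(counts[:3])
```

Each bin is about 390000 won wide, so nine of these ten values fall into the first bin and only 440000 reaches the second. With a few outliers above 10 million won stretching the axis, almost all of the histogram's mass sits in its first bar or two.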
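Common Values

| Value | Count | Frequency (%) |
| 0 | 775 | 31.2% |
| 330000 | 135 | 5.4% |
| 40000 | 112 | 4.5% |
| 60000 | 91 | 3.7% |
| 341000 | 88 | 3.5% |
| 50000 | 87 | 3.5% |
| 100000 | 67 | 2.7% |
| 440000 | 62 | 2.5% |
| 80000 | 57 | 2.3% |
| 90000 | 51 | 2.1% |
| Other values (253) | 876 | 35.2% |
| (Missing) | 85 | 3.4% |

Minimum 10 values

| Value | Count | Frequency (%) |
| -1 | 3 | 0.1% |
| 0 | 775 | 31.2% |
| 1 | 18 | 0.7% |
| 20 | 1 | < 0.1% |
| 3500 | 6 | 0.2% |
| 5000 | 1 | < 0.1% |
| 7000 | 3 | 0.1% |
| 9000 | 3 | 0.1% |
| 10000 | 18 | 0.7% |
| 12000 | 2 | 0.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
| 19500000 | 1 | < 0.1% |
| 17031000 | 1 | < 0.1% |
| 15000000 | 1 | < 0.1% |
| 14371000 | 1 | < 0.1% |
| 13111000 | 1 | < 0.1% |
| 13000000 | 1 | < 0.1% |
| 12396000 | 1 | < 0.1% |
| 12000000 | 1 | < 0.1% |
| 11200000 | 1 | < 0.1% |
| 10750000 | 1 | < 0.1% |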

교재유무(Y_N)
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
True: 1290
False: 1196

| Value | Count | Frequency (%) |
| True | 1290 | 51.9% |
| False | 1196 | 48.1% |

시험유무(Y_N)
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB
False: 1816
True: 670

| Value | Count | Frequency (%) |
| False | 1816 | 73.0% |
| True | 670 | 27.0% |

Interactions


Correlations

|  | 연수방법 | 중분류 | 소분류 | 연수비용(원) | 교재유무(Y_N) | 시험유무(Y_N) |
| 연수방법 | 1.000 | 0.885 | 0.915 | 0.237 | 0.894 | 0.838 |
| 중분류 | 0.885 | 1.000 | 0.998 | 0.329 | 0.702 | 0.636 |
| 소분류 | 0.915 | 0.998 | 1.000 | 0.277 | 0.736 | 0.663 |
| 연수비용(원) | 0.237 | 0.329 | 0.277 | 1.000 | 0.227 | 0.138 |
| 교재유무(Y_N) | 0.894 | 0.702 | 0.736 | 0.227 | 1.000 | 0.692 |
| 시험유무(Y_N) | 0.838 | 0.636 | 0.663 | 0.138 | 0.692 | 1.000 |

|  | 교재유무(Y_N) | 시험유무(Y_N) | 중분류 | 연수방법 |
| 교재유무(Y_N) | 1.000 | 0.486 | 0.564 | 0.705 |
| 시험유무(Y_N) | 0.486 | 1.000 | 0.508 | 0.633 |
| 중분류 | 0.564 | 0.508 | 1.000 | 0.743 |
| 연수방법 | 0.705 | 0.633 | 0.743 | 1.000 |

|  | 연수비용(원) | 연수방법 | 중분류 | 교재유무(Y_N) | 시험유무(Y_N) |
| 연수비용(원) | 1.000 | 0.182 | 0.135 | 0.174 | 0.105 |
| 연수방법 | 0.182 | 1.000 | 0.743 | 0.705 | 0.633 |
| 중분류 | 0.135 | 0.743 | 1.000 | 0.564 | 0.508 |
| 교재유무(Y_N) | 0.174 | 0.705 | 0.564 | 1.000 | 0.486 |
| 시험유무(Y_N) | 0.105 | 0.633 | 0.508 | 0.486 | 1.000 |
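
The matrices above are not labelled with the coefficient they use. For pairs of categorical or boolean variables, one common association measure is Cramér's V, which is derived from the chi-squared statistic of the contingency table and ranges from 0 (no association) to 1 (one variable determines the other). The sketch below is the plain, uncorrected form and is offered only as an illustration of that measure; it is an assumption that any matrix here was computed this way, and the contingency tables are made-up counts, not values from this dataset.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical 2x2 tables: one perfectly associated, one independent.
perfect = cramers_v([[50, 0], [0, 50]])
independent = cramers_v([[25, 25], [25, 25]])
print(perfect, independent)
```

Read this way, a value near 0.9 between 연수방법 and 교재유무(Y_N) would mean that knowing the training method almost determines whether a textbook is provided, which is what the High correlation alerts flag.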

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

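|  | 과정명 | 연수방법 | 중분류 | 소분류 | 연수비용(원) | 교재유무(Y_N) | 시험유무(Y_N) |
| 0 | 세무조사 완전준비! 어서와 세무조사는 처음이지? | 집합연수 | 경영 | 경영직무 | 242000 | Y | N |
| 1 | 품질관리 기본 | 집합연수 | 생산품질 | 품질관리 | 528000 | N | N |
| 2 | 자동차 전장기초 | 집합연수 | 스마트제조 | 융합기술 | 440000 | N | N |
| 3 | 알기쉬운 3D스캐닝/프린터 실무 | 집합연수 | 스마트제조 | 융합기술 | 0 | Y | N |
| 4 | (금형과 관련된)제품 설계기술 | 집합연수 | 뿌리/생산기술 | 기계설계/기계가공 | 341000 | N | N |
| 5 | 구매/외주관리 실무 | 집합연수 | 생산품질 | 생산관리 | 330000 | N | N |
| 6 | 알기 쉬운 생산관리 | 집합연수 | 생산품질 | 생산관리 | 330000 | N | N |
| 7 | 자재/재고관리 실무 | 집합연수 | 생산품질 | 생산관리 | 330000 | N | N |
| 8 | 조직을 살리는 파워리더십 | 집합연수 | 경영 | 경영일반 | 330000 | N | N |
| 9 | 원가계산 및 CVP분석 | 집합연수 | 경영 | 경영직무 | 242000 | Y | N |

|  | 과정명 | 연수방법 | 중분류 | 소분류 | 연수비용(원) | 교재유무(Y_N) | 시험유무(Y_N) |
| 2476 | [커뮤니케이션 멘토링] 자신감을 더하는 스피치 | 이러닝연수 | 경영 | 경영일반 | 70000 | N | Y |
| 2477 | (폐강)[모바일][통섭의 지혜] 경영비책 36계 | 이러닝연수 | 경영 | 리더십 | 70000 | N | Y |
| 2478 | (폐강)[모바일][통섭의 지혜] 정치 속 숨겨진 전략을 발견하다 | 이러닝연수 | 경영 | 리더십 | 60000 | N | Y |
| 2479 | (폐강)[모바일][통찰의 숲] 정관정요 제왕을 만드는 15책 | 이러닝연수 | 경영 | 리더십 | 60000 | N | Y |
| 2480 | 시퀀스(PLC)/인버터 제어실무 | 이러닝연수 | 스마트제조 | 융합기술 | 50000 | N | Y |
| 2481 | 공압제어 입문 | 이러닝연수 | 뿌리기술 | 유공압/자동화 | 50000 | N | Y |
| 2482 | [모바일][트렌드 인사이트] 일상 속에서 진짜 트렌드를 찾아라 | 이러닝연수 | 경영 | 경영직무 | 50000 | N | Y |
| 2483 | (폐강)디지털 마케터를 위한 구글애널리틱스(GA) | 이러닝연수 | 직무 | 영업/마케팅/무역 | 70000 | N | Y |
| 2484 | 모두의 포토샵 CC | 이러닝연수 | 4차 산업혁명 | 정보기술(IT) | 90000 | N | Y |
| 2485 | [모바일]성공을 부르는 비즈니스 스피치 | 이러닝연수 | 경영 | 경영일반 | 60000 | N | Y |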

Duplicate rows

Most frequently occurring

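|  | 과정명 | 연수방법 | 중분류 | 소분류 | 연수비용(원) | 교재유무(Y_N) | 시험유무(Y_N) | # duplicates |
| 0 | (사)중소기업융합강원연합회 리더십역량강화 | 집합연수 | 경영 | 최고경영자 | 2400000 | Y | N | 2 |
| 1 | (웨비나)원전 협력 업체 중대재해처벌법 대응하기 | 집합연수 | 중기지원 정책연수 | 중기지원정책연수 | 0 | Y | N | 2 |
| 2 | NX 3차원설계-심화 | 집합연수 | 스마트제조 | 융합기술 | 341000 | N | N | 2 |
| 3 | PLC 프로그래밍 기법과 응용(MELSEC) | 집합연수 | 스마트제조 | 융합기술 | 440000 | Y | N | 2 |
| 4 | [2022 토목기사] 필기 - 측량학 | 이러닝연수 | 경영 | 자기계발 | 104000 | Y | N | 2 |
| 5 | 공정감사와 협력업체 품질관리 | 집합연수 | 생산품질 | 품질관리 | 330000 | Y | N | 2 |
| 6 | 생산계획 및 통제 시스템 | 집합연수 | 생산품질 | 생산관리 | 330000 | N | N | 2 |
| 7 | 스마트공장 구축 및 추진실무 | 집합연수 | 스마트제조 | 도입전략 | 0 | Y | N | 2 |
| 8 | 스마트공장의 생산품질 최적화 | 집합연수 | 스마트제조 | 제조현장관리기술 | 0 | Y | N | 2 |
| 9 | 제조원가의 이해와 활용 | 집합연수 | 생산품질 | 생산관리 | 242000 | Y | N | 2 |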
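
A table like the one above can be produced by counting how often each complete row occurs and keeping the rows seen more than once. The sketch below does this with the standard library on three rows taken from this report (one of them repeated for illustration). The report does not say whether its figure of 12 duplicate rows counts distinct repeated rows or surplus copies, so the illustrative helper `duplicate_summary` returns both.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (surplus copies, {row: occurrences}) for rows seen more than once."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in repeated.values())
    return surplus, repeated


# Rows are tuples so that they are hashable; the first row is repeated on purpose.
rows = [
    ("NX 3차원설계-심화", "집합연수", "스마트제조", "융합기술", 341000, "N", "N"),
    ("품질관리 기본", "집합연수", "생산품질", "품질관리", 528000, "N", "N"),
    ("NX 3차원설계-심화", "집합연수", "스마트제조", "융합기술", 341000, "N", "N"),
]

surplus, repeated = duplicate_summary(rows)
for row, n in repeated.items():
    print(row[0], n)
```

Since every row shown in the table occurs exactly twice, the number of distinct repeated rows and the number of surplus copies coincide for those rows.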