Overview

Dataset statistics

Number of variables: 7
Number of observations: 6616
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 221
Duplicate rows (%): 3.3%
Total size in memory: 374.9 KiB
Average record size in memory: 58.0 B
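
The two derived figures above (duplicate share and average record size) can be re-computed from the raw counts. The following is a minimal sketch; it assumes 1 KiB = 1024 bytes, and the variable names are illustrative rather than taken from the profiling tool.

```python
# Figures copied from the "Dataset statistics" block.
n_rows = 6616
n_duplicate_rows = 221
total_kib = 374.9

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")
print(f"{avg_record_bytes:.1f} B")
```

Both values round to the figures reported in the overview (3.3% and 58.0 B).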

Variable types

Categorical: 4
Text: 2
Numeric: 1

Dataset

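Description: List of company-to-jobseeker employment matches made through the 기업인력애로센터 job-matching platform (https://job.kosmes.or.kr). Small and medium-sized enterprises post job openings on the platform, and job seekers submit résumés in response to those postings.
Author: 중소벤처기업진흥공단 (Korea SMEs and Startups Agency, KOSME)
URL: https://www.data.go.kr/data/15100247/fileData.do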

Alerts

Dataset has 221 (3.3%) duplicate rows (alert type: Duplicates)

Reproduction

Analysis started: 2024-03-23 04:44:46.370726
Analysis finished: 2024-03-23 04:44:50.547356
Duration: 4.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

참여년도 (participation year)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

2023  3325
2022  3291

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023
2nd row: 2023
3rd row: 2023
4th row: 2023
5th row: 2023

Common Values

Value  Count  Frequency (%)
2023  3325  50.3%
2022  3291  49.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

기업명 (company name)
Text

Distinct: 4271
Distinct (%): 64.6%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

Length

Max length: 32
Median length: 30
Mean length: 7.8278416
Min length: 1

Characters and Unicode

Total characters: 51789
Distinct characters: 776
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
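
As an illustration of the character properties mentioned above, the sketch below tallies Unicode general categories for two company names taken from this report's sample rows. It uses Python's standard `unicodedata` module and is not necessarily how the profiling tool computes its tables; script and block lookups are left out because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Two company names taken from the "Sample" rows of this report.
names = ["김앤장법률사무소", "고려해상화재손해사정(주)"]

# Map each character to its Unicode general category code
# ("Lo" = Other Letter, "Ps" = Open Punctuation, "Pe" = Close Punctuation).
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Lo" (Other Letter), which is why that category dominates the tables for this variable.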

Unique

Unique: 3270
Unique (%): 49.4%

Sample

1st row: 김앤장법률사무소
2nd row: 김앤장법률사무소
3rd row: 해피요기즈
4th row: 고려해상화재손해사정(주)
5th row: 대정화금(주)

Value  Count  Frequency (%)
주식회사  211  3.0%
우리엔유  52  0.7%
주)포파코  43  0.6%
로쏘(주  33  0.5%
국토건설(주  30  0.4%
주)휴먼바이오  30  0.4%
한샘리하우스  23  0.3%
주)대주기계  18  0.3%
주)라온디어스  18  0.3%
와이엠씨(주  18  0.3%
Other values (4336)  6516  93.2%

Most occurring characters

Value  Count  Frequency (%)
(unreadable)  5126  9.9%
)  4691  9.1%
(  4690  9.1%
(unreadable)  1921  3.7%
(unreadable)  1756  3.4%
(unreadable)  962  1.9%
(unreadable)  876  1.7%
(unreadable)  724  1.4%
(unreadable)  634  1.2%
(unreadable)  582  1.1%
Other values (766)  29827  57.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  41179  79.5%
Close Punctuation  4700  9.1%
Open Punctuation  4698  9.1%
Uppercase Letter  414  0.8%
Space Separator  401  0.8%
Lowercase Letter  181  0.3%
Decimal Number  111  0.2%
Other Punctuation  73  0.1%
Dash Punctuation  19  < 0.1%
Control  7  < 0.1%
Other values (2)  6  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(unreadable)  5126  12.4%
(unreadable)  1921  4.7%
(unreadable)  1756  4.3%
(unreadable)  962  2.3%
(unreadable)  876  2.1%
(unreadable)  724  1.8%
(unreadable)  634  1.5%
(unreadable)  582  1.4%
(unreadable)  570  1.4%
(unreadable)  547  1.3%
Other values (692)  27481  66.7%

Uppercase Letter
Value  Count  Frequency (%)
C  43  10.4%
S  38  9.2%
E  35  8.5%
N  29  7.0%
L  26  6.3%
O  25  6.0%
M  22  5.3%
T  22  5.3%
K  20  4.8%
R  20  4.8%
Other values (14)  134  32.4%

Lowercase Letter
Value  Count  Frequency (%)
o  24  13.3%
e  19  10.5%
t  16  8.8%
d  15  8.3%
n  14  7.7%
s  13  7.2%
i  11  6.1%
r  9  5.0%
b  8  4.4%
a  8  4.4%
Other values (12)  44  24.3%

Decimal Number
Value  Count  Frequency (%)
2  35  31.5%
1  23  20.7%
3  14  12.6%
5  10  9.0%
0  8  7.2%
6  8  7.2%
7  5  4.5%
9  4  3.6%
4  2  1.8%
8  2  1.8%

Other Punctuation
Value  Count  Frequency (%)
.  29  39.7%
&  18  24.7%
/  13  17.8%
,  10  13.7%
(unreadable)  2  2.7%
(unreadable)  1  1.4%

Close Punctuation
Value  Count  Frequency (%)
)  4691  99.8%
(unreadable)  8  0.2%
]  1  < 0.1%

Open Punctuation
Value  Count  Frequency (%)
(  4690  99.8%
(unreadable)  7  0.1%
[  1  < 0.1%

Space Separator
Value  Count  Frequency (%)
(space)  399  99.5%
(non-ASCII space)  2  0.5%

Dash Punctuation
Value  Count  Frequency (%)
-  19  100.0%

Control
Value  Count  Frequency (%)
(control character)  7  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  5  100.0%

Math Symbol
Value  Count  Frequency (%)
=  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  41178  79.5%
Common  10015  19.3%
Latin  595  1.1%
Han  1  < 0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(unreadable)  5126  12.4%
(unreadable)  1921  4.7%
(unreadable)  1756  4.3%
(unreadable)  962  2.3%
(unreadable)  876  2.1%
(unreadable)  724  1.8%
(unreadable)  634  1.5%
(unreadable)  582  1.4%
(unreadable)  570  1.4%
(unreadable)  547  1.3%
Other values (691)  27480  66.7%

Latin
Value  Count  Frequency (%)
C  43  7.2%
S  38  6.4%
E  35  5.9%
N  29  4.9%
L  26  4.4%
O  25  4.2%
o  24  4.0%
M  22  3.7%
T  22  3.7%
K  20  3.4%
Other values (36)  311  52.3%

Common
Value  Count  Frequency (%)
)  4691  46.8%
(  4690  46.8%
(space)  399  4.0%
2  35  0.3%
.  29  0.3%
1  23  0.2%
-  19  0.2%
&  18  0.2%
3  14  0.1%
/  13  0.1%
Other values (18)  84  0.8%

Han
Value  Count  Frequency (%)
(unreadable)  1  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  41178  79.5%
ASCII  10590  20.4%
None  20  < 0.1%
CJK  1  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(unreadable)  5126  12.4%
(unreadable)  1921  4.7%
(unreadable)  1756  4.3%
(unreadable)  962  2.3%
(unreadable)  876  2.1%
(unreadable)  724  1.8%
(unreadable)  634  1.5%
(unreadable)  582  1.4%
(unreadable)  570  1.4%
(unreadable)  547  1.3%
Other values (691)  27480  66.7%

ASCII
Value  Count  Frequency (%)
)  4691  44.3%
(  4690  44.3%
(space)  399  3.8%
C  43  0.4%
S  38  0.4%
2  35  0.3%
E  35  0.3%
.  29  0.3%
N  29  0.3%
L  26  0.2%
Other values (59)  575  5.4%

None
Value  Count  Frequency (%)
(unreadable)  8  40.0%
(unreadable)  7  35.0%
(non-ASCII space)  2  10.0%
(unreadable)  2  10.0%
(unreadable)  1  5.0%

CJK
Value  Count  Frequency (%)
(unreadable)  1  100.0%

사업자번호 (business registration number)
Real number (ℝ)

Distinct: 4179
Distinct (%): 63.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.7457166 × 10^9
Minimum: 1.0101754 × 10^9
Maximum: 9.0984004 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 58.3 KiB

Quantile statistics

Minimum: 1.0101754 × 10^9
5-th percentile: 1.1081752 × 10^9
Q1: 1.3786267 × 10^9
median: 3.4086006 × 10^9
Q3: 5.5042005 × 10^9
95-th percentile: 7.6986013 × 10^9
Maximum: 9.0984004 × 10^9
Range: 8.0882249 × 10^9
Interquartile range (IQR): 4.1255739 × 10^9

Descriptive statistics

Standard deviation: 2.2002857 × 10^9
Coefficient of variation (CV): 0.58741383
Kurtosis: -0.97805961
Mean: 3.7457166 × 10^9
Median Absolute Deviation (MAD): 2.0304732 × 10^9
Skewness: 0.39755845
Sum: 2.4781661 × 10^13
Variance: 4.8412573 × 10^18
Monotonicity: Not monotonic
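
Several of the descriptive statistics are simple functions of others, so they can be cross-checked. The sketch below recomputes them from the rounded values printed in this report, which means the last digit may differ slightly from the reported figures. The variable names are illustrative.

```python
# Values copied from the quantile and descriptive statistics of 사업자번호.
mean = 3.7457166e9
std = 2.2002857e9
q1, q3 = 1.3786267e9, 5.5042005e9
minimum, maximum = 1.0101754e9, 9.0984004e9

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"IQR      = {iqr:.4e}")
print(f"Range    = {value_range:.4e}")
```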
[Figure: histogram with fixed size bins (bins=50)]

Value  Count  Frequency (%)
1078794004  52  0.8%
5148186049  43  0.6%
3058148738  33  0.5%
1668701542  30  0.5%
4178119176  30  0.5%
3128197106  18  0.3%
2308802024  18  0.3%
5018120018  18  0.3%
4698101454  17  0.3%
1378626704  16  0.2%
Other values (4169)  6341  95.8%

Value  Count  Frequency (%)
1010175441  2  < 0.1%
1013375383  1  < 0.1%
1018111885  1  < 0.1%
1018118476  1  < 0.1%
1018154491  1  < 0.1%
1018168786  1  < 0.1%
1018185790  1  < 0.1%
1018187328  2  < 0.1%
1018190589  2  < 0.1%
1018191588  1  < 0.1%

Value  Count  Frequency (%)
9098400371  1  < 0.1%
8998800811  1  < 0.1%
8998500260  2  < 0.1%
8998102178  1  < 0.1%
8988700198  1  < 0.1%
8978601557  2  < 0.1%
8968800786  3  < 0.1%
8968100297  1  < 0.1%
8948701860  1  < 0.1%
8948601183  1  < 0.1%

지역구분 (region)
Categorical

Distinct: 17
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

경기  1536
서울  1079
경남  708
경북  427
대구  394
Other values (12)  2472

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 서울
3rd row: 대구
4th row: 서울
5th row: 경기

Common Values

Value  Count  Frequency (%)
경기  1536  23.2%
서울  1079  16.3%
경남  708  10.7%
경북  427  6.5%
대구  394  6.0%
충남  347  5.2%
인천  332  5.0%
광주  320  4.8%
전남  309  4.7%
부산  290  4.4%
Other values (7)  874  13.2%

Length

[Figure: histogram of lengths of the category]

업종 (industry)
Categorical

Distinct: 20
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

제조업  3960
도매 및 소매업  544
건설업  497
정보통신업  484
전문, 과학 및 기술 서비스업  326
Other values (15)  805

Length

Max length: 24
Median length: 3
Mean length: 5.7929262
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 전문, 과학 및 기술 서비스업
2nd row: 전문, 과학 및 기술 서비스업
3rd row: 제조업
4th row: 금융 및 보험업
5th row: 제조업

Common Values

Value  Count  Frequency (%)
제조업  3960  59.9%
도매 및 소매업  544  8.2%
건설업  497  7.5%
정보통신업  484  7.3%
전문, 과학 및 기술 서비스업  326  4.9%
사업시설 관리 사업 지원 및 임대 서비스업  219  3.3%
숙박 및 음식점업  110  1.7%
사업시설 관리, 사업 지원 및 임대 서비스업  109  1.6%
운수 및 창고업  81  1.2%
보건업 및 사회복지 서비스업  75  1.1%
Other values (10)  211  3.2%
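
Two of the labels in the table above differ only by a comma ("사업시설 관리 사업 지원 및 임대 서비스업" and "사업시설 관리, 사업 지원 및 임대 서비스업"), which suggests one category recorded with two spellings. The sketch below shows one possible way to merge them, assuming that commas carry no meaning in these labels. The `normalize` helper is hypothetical and not part of the profiling tool or the source dataset.

```python
from collections import Counter

# Counts copied from the "Common Values" table for 업종.
counts = Counter({
    "사업시설 관리 사업 지원 및 임대 서비스업": 219,
    "사업시설 관리, 사업 지원 및 임대 서비스업": 109,
    "제조업": 3960,
})


def normalize(label: str) -> str:
    """Drop commas and collapse whitespace (a hypothetical cleaning rule)."""
    return " ".join(label.replace(",", " ").split())


# Re-aggregate the counts under the normalized labels.
merged = Counter()
for label, n in counts.items():
    merged[normalize(label)] += n

print(merged["사업시설 관리 사업 지원 및 임대 서비스업"])
```

The merged count, 219 + 109 = 328, matches the frequency reported for the words 관리, 임대 and 지원 in this variable's word table.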

Length

[Figure: histogram of lengths of the category]

Value  Count  Frequency (%)
제조업  3960  32.5%
(unreadable)  1613  13.2%
서비스업  863  7.1%
소매업  544  4.5%
도매  544  4.5%
건설업  497  4.1%
정보통신업  484  4.0%
관리  328  2.7%
임대  328  2.7%
지원  328  2.7%
Other values (38)  2692  22.1%

이름 (name)
Text

Distinct: 107
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 19848
Distinct characters: 108
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
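
The character totals for this variable follow from its masked format. The sketch below assumes every value has the same three-character shape as the first sample value (one leading character followed by two asterisks), which is consistent with the minimum and maximum length both being 3. It reproduces the reported total of 19848 characters and the 13232 asterisks listed under "Most occurring characters".

```python
import unicodedata

n_rows = 6616         # observations reported in the overview
masked_name = "안**"  # first sample value of 이름

# General category of each character: "Lo" = Other Letter, "Po" = Other Punctuation.
categories = [unicodedata.category(ch) for ch in masked_name]

# Totals if every row follows the same three-character masked shape.
total_chars = len(masked_name) * n_rows
asterisks = masked_name.count("*") * n_rows

print(categories)
print(total_chars, asterisks, f"{100 * asterisks / total_chars:.1f}%")
```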

Unique

Unique: 22
Unique (%): 0.3%

Sample

1st row: 안**
2nd row: 임**
3rd row: 김**
4th row: 김**
5th row: 안**

Value  Count  Frequency (%)
(unreadable)  1354  20.5%
(unreadable)  983  14.9%
(unreadable)  599  9.1%
(unreadable)  341  5.2%
(unreadable)  330  5.0%
(unreadable)  179  2.7%
(unreadable)  164  2.5%
(unreadable)  132  2.0%
(unreadable)  130  2.0%
(unreadable)  128  1.9%
Other values (97)  2276  34.4%

Most occurring characters

Value  Count  Frequency (%)
*  13232  66.7%
(unreadable)  1354  6.8%
(unreadable)  983  5.0%
(unreadable)  599  3.0%
(unreadable)  341  1.7%
(unreadable)  330  1.7%
(unreadable)  179  0.9%
(unreadable)  164  0.8%
(unreadable)  132  0.7%
(unreadable)  130  0.7%
Other values (98)  2404  12.1%

Most occurring categories

Value  Count  Frequency (%)
Other Punctuation  13232  66.7%
Other Letter  6612  33.3%
Uppercase Letter  4  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(unreadable)  1354  20.5%
(unreadable)  983  14.9%
(unreadable)  599  9.1%
(unreadable)  341  5.2%
(unreadable)  330  5.0%
(unreadable)  179  2.7%
(unreadable)  164  2.5%
(unreadable)  132  2.0%
(unreadable)  130  2.0%
(unreadable)  128  1.9%
Other values (95)  2272  34.4%

Uppercase Letter
Value  Count  Frequency (%)
T  2  50.0%
N  2  50.0%

Other Punctuation
Value  Count  Frequency (%)
*  13232  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  13232  66.7%
Hangul  6612  33.3%
Latin  4  < 0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(unreadable)  1354  20.5%
(unreadable)  983  14.9%
(unreadable)  599  9.1%
(unreadable)  341  5.2%
(unreadable)  330  5.0%
(unreadable)  179  2.7%
(unreadable)  164  2.5%
(unreadable)  132  2.0%
(unreadable)  130  2.0%
(unreadable)  128  1.9%
Other values (95)  2272  34.4%

Latin
Value  Count  Frequency (%)
T  2  50.0%
N  2  50.0%

Common
Value  Count  Frequency (%)
*  13232  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13236  66.7%
Hangul  6612  33.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
*  13232  > 99.9%
T  2  < 0.1%
N  2  < 0.1%

Hangul
Value  Count  Frequency (%)
(unreadable)  1354  20.5%
(unreadable)  983  14.9%
(unreadable)  599  9.1%
(unreadable)  341  5.2%
(unreadable)  330  5.0%
(unreadable)  179  2.7%
(unreadable)  164  2.5%
(unreadable)  132  2.0%
(unreadable)  130  2.0%
(unreadable)  128  1.9%
Other values (95)  2272  34.4%

성별 (gender)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB

(unreadable)  4651
(unreadable)  1965

Length

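Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (unreadable)
2nd row: (unreadable)
3rd row: (unreadable)
4th row: (unreadable)
5th row: (unreadable)

Common Values

Value  Count  Frequency (%)
(unreadable)  4651  70.3%
(unreadable)  1965  29.7%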

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]

          참여년도  사업자번호  지역구분  업종  성별
참여년도  1.000  0.047  0.143  0.330  0.056
사업자번호  0.047  1.000  0.722  0.314  0.067
지역구분  0.143  0.722  1.000  0.415  0.136
업종  0.330  0.314  0.415  1.000  0.280
성별  0.056  0.067  0.136  0.280  1.000

[Figure: correlation plot]

          업종  지역구분  성별  참여년도
업종  1.000  0.139  0.221  0.261
지역구분  0.139  1.000  0.122  0.128
성별  0.221  0.122  1.000  0.036
참여년도  0.261  0.128  0.036  1.000

[Figure: correlation plot]

          사업자번호  참여년도  지역구분  업종  성별
사업자번호  1.000  0.036  0.383  0.104  0.051
참여년도  0.036  1.000  0.128  0.261  0.036
지역구분  0.383  0.128  1.000  0.139  0.122
업종  0.104  0.261  0.139  1.000  0.221
성별  0.051  0.036  0.122  0.221  1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  참여년도  기업명  사업자번호  지역구분  업종  이름  성별
0  2023  김앤장법률사무소  1010175441  서울  전문, 과학 및 기술 서비스업  안**  (unreadable)
1  2023  김앤장법률사무소  1010175441  서울  전문, 과학 및 기술 서비스업  임**  (unreadable)
2  2023  해피요기즈  1013375383  대구  제조업  김**  (unreadable)
3  2023  고려해상화재손해사정(주)  1018111885  서울  금융 및 보험업  김**  (unreadable)
4  2023  대정화금(주)  1018118476  경기  제조업  안**  (unreadable)
5  2023  엘케이테크넷(주)  1018154491  경기  제조업  조**  (unreadable)
6  2023  (주)네오닥터  1018185790  강원  제조업  조**  (unreadable)
7  2023  유아이비손해보험중개  1018187328  서울  금융 및 보험업  이**  (unreadable)
8  2023  유아이비손해보험중개  1018187328  서울  금융 및 보험업  이**  (unreadable)
9  2023  (주)뉴트리케어  1018190589  서울  제조업  김**  (unreadable)

(index)  참여년도  기업명  사업자번호  지역구분  업종  이름  성별
6606  2022  하레하레베이커리카페  3091761464  대전  숙박 및 음식점업  현**  (unreadable)
6607  2022  성이바이오(주)  2308600681  강원  전문, 과학 및 기술 서비스업  김**  (unreadable)
6608  2022  메가엠지씨커피 김포고촌  4165400630  경기  숙박 및 음식점업  김**  (unreadable)
6609  2022  선진로지스틱스 주식회사  1028121052  서울  운수 및 창고업  장**  (unreadable)
6610  2022  (주)에이투지  5098801902  경기  정보통신업  오**  (unreadable)
6611  2022  버슘머트리얼즈에이디엠코리아 유한회사  1348653444  경기  도매 및 소매업  김**  (unreadable)
6612  2022  주식회사모젠컴퍼니  8158501091  대구  사업시설 관리 사업 지원 및 임대 서비스업  이**  (unreadable)
6613  2022  (주)모젠컴퍼니  5038187164  대구  사업시설 관리 사업 지원 및 임대 서비스업  전**  (unreadable)
6614  2022  (주)모젠컴퍼니  5038187164  대구  사업시설 관리 사업 지원 및 임대 서비스업  심**  (unreadable)
6615  2022  주식회사모젠컴퍼니  8158501091  대구  사업시설 관리 사업 지원 및 임대 서비스업  최**  (unreadable)

Duplicate rows

Most frequently occurring

(index)  참여년도  기업명  사업자번호  지역구분  업종  이름  성별  # duplicates
78  2022  우리엔유  1078794004  서울  사업시설 관리 사업 지원 및 임대 서비스업  김**  (unreadable)  8
57  2022  국토건설(주)  4178119176  전남  건설업  김**  (unreadable)  7
200  2023  와이엠씨(주)  3128197106  충남  제조업  이**  (unreadable)  5
86  2022  우리엔유  1078794004  서울  사업시설 관리 사업 지원 및 임대 서비스업  이**  (unreadable)  4
89  2022  우리엔유  1078794004  서울  사업시설 관리 사업 지원 및 임대 서비스업  최**  (unreadable)  4
97  2022  주식회사 브이디에스  5138124017  대구  제조업  박**  (unreadable)  4
134  2023  (주)샤프테크닉스케이  1218608356  인천  제조업  김**  (unreadable)  4
135  2023  (주)샤프테크닉스케이  1218608356  인천  제조업  박**  (unreadable)  4
142  2023  (주)에어릭스  5068161464  경북  제조업  이**  (unreadable)  4
158  2023  (주)포파코  5148186049  대구  도매 및 소매업  김**  (unreadable)  4
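
A table like the one above can be produced by grouping identical rows and keeping the groups that occur more than once. The sketch below does this with the standard library on a handful of illustrative rows shaped like the sample (they are not the actual dataset, and 성별 is left out because its values are not preserved in this export). How the profiling tool itself counts duplicates may differ.

```python
from collections import Counter

# Illustrative rows: (참여년도, 기업명, 사업자번호, 지역구분, 이름).
rows = [
    ("2022", "우리엔유", "1078794004", "서울", "김**"),
    ("2022", "우리엔유", "1078794004", "서울", "김**"),
    ("2022", "우리엔유", "1078794004", "서울", "이**"),
    ("2023", "해피요기즈", "1013375383", "대구", "김**"),
]

# Count how often each exact row occurs, then keep only repeated rows.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

# Print the most frequently repeated rows first.
for row, n in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(row, n)
```

Because the names are masked to a single leading character, rows that look identical here may still refer to different people, so duplicate counts on this dataset are best read as an upper bound.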