Overview

Dataset statistics

Number of variables: 4
Number of observations: 1297
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 76
Duplicate rows (%): 5.9%
Total size in memory: 40.7 KiB
Average record size in memory: 32.1 B
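The percentage and per-record figures follow from the raw counts. A minimal Python check, using only the numbers reported in this section (the variable names are illustrative):

```python
# Recompute the derived dataset statistics from the reported raw figures.
n_rows = 1297        # Number of observations
n_duplicates = 76    # Duplicate rows
total_kib = 40.7     # Total size in memory, in KiB

duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (5.9% and 32.1 B).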

Variable types

Text: 3
Categorical: 1

Dataset

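Description: Provides the three-level job classification codes (직종코드) in file form. The classification is divided into major (대분류), middle (중분류), and minor (소분류) categories; depth 1 codes have one digit, depth 2 codes have three digits, and depth 3 codes have six digits.
URL: https://www.data.go.kr/data/15120487/fileData.do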

Alerts

Duplicates: Dataset has 76 (5.9%) duplicate rows
Imbalance: 대분류 is highly imbalanced (96.9%)
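The 96.9% imbalance figure is consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition (this report does not state it) and uses the counts reported for 대분류: one value occurring 1284 times and 13 values occurring once each. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    counts and k is the number of classes (assumed definition)."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Counts reported for 대분류: 1284 rows with the dominant value,
# plus 13 values seen once each (14 distinct values in total).
counts = [1284] + [1] * 13
score = imbalance_score(counts)
print(f"Imbalance: {score:.1%}")
```

Under this assumption the score rounds to the 96.9% shown in the alert. A score near 100% means a single value dominates the column.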

Reproduction

Analysis started: 2023-12-12 20:53:20.500366
Analysis finished: 2023-12-12 20:53:21.023440
Duration: 0.52 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
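The duration can be checked against the two timestamps. A small sketch using only the standard library, with the timestamp format inferred from the values shown:

```python
from datetime import datetime

# Timestamp layout inferred from the report: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 20:53:20.500366", FMT)
finished = datetime.strptime("2023-12-12 20:53:21.023440", FMT)

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```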

Variables

카테고리 ID
Text

Distinct: 1212
Distinct (%): 93.4%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.4841943
Min length: 1

Characters and Unicode

Total characters: 7113
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
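As a rough illustration of this kind of analysis, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The standard library returns two-letter codes (`Nd` for Decimal Number, `Lu` for Uppercase Letter, `Lo` for Other Letter, `Po` for Other Punctuation) and does not expose script or block, so only categories are shown. The sample values are illustrative, and the code `703A02` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Illustrative values only: a digit code, a hypothetical code containing an
# uppercase letter, and a Hangul label that contains a middle dot.
sample = ["11100", "703A02", "건설·채굴"]
counts = category_counts(sample)
print(counts)
```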

Unique

Unique: 1133
Unique (%): 87.4%
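The Distinct and Unique figures differ because they count different things. The sketch below assumes that Distinct is the number of different values and Unique is the number of values that occur exactly once; under that reading, 1212 - 1133 = 79 values occur more than once. The function name and the toy list are illustrative.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): the number of different values, and the
    number of values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy list: three codes appear once, two codes are repeated.
codes = ["1", "11", "11100", "703902", "703902", "703902", "215200", "215200"]
distinct, unique = distinct_and_unique(codes)
print(distinct, unique)
```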

Sample

1st row: 1
2nd row: 11
3rd row: 11100
4th row: 11200
5th row: 12100
Value | Count | Frequency (%)
703902 | 3 | 0.2%
703200 | 3 | 0.2%
215200 | 3 | 0.2%
231400 | 3 | 0.2%
703101 | 3 | 0.2%
703102 | 3 | 0.2%
24402 | 2 | 0.2%
824107 | 2 | 0.2%
816200 | 2 | 0.2%
550104 | 2 | 0.2%
Other values (1202) | 1271 | 98.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 1929 | 27.1%
1 | 1330 | 18.7%
2 | 1089 | 15.3%
3 | 654 | 9.2%
4 | 489 | 6.9%
5 | 484 | 6.8%
8 | 379 | 5.3%
6 | 336 | 4.7%
9 | 230 | 3.2%
7 | 181 | 2.5%
Other values (4) | 12 | 0.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7101 | 99.8%
Uppercase Letter | 12 | 0.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1929 | 27.2%
1 | 1330 | 18.7%
2 | 1089 | 15.3%
3 | 654 | 9.2%
4 | 489 | 6.9%
5 | 484 | 6.8%
8 | 379 | 5.3%
6 | 336 | 4.7%
9 | 230 | 3.2%
7 | 181 | 2.5%
Uppercase Letter
Value | Count | Frequency (%)
A | 6 | 50.0%
B | 3 | 25.0%
C | 2 | 16.7%
D | 1 | 8.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 7101 | 99.8%
Latin | 12 | 0.2%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1929 | 27.2%
1 | 1330 | 18.7%
2 | 1089 | 15.3%
3 | 654 | 9.2%
4 | 489 | 6.9%
5 | 484 | 6.8%
8 | 379 | 5.3%
6 | 336 | 4.7%
9 | 230 | 3.2%
7 | 181 | 2.5%
Latin
Value | Count | Frequency (%)
A | 6 | 50.0%
B | 3 | 25.0%
C | 2 | 16.7%
D | 1 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7113 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1929 | 27.1%
1 | 1330 | 18.7%
2 | 1089 | 15.3%
3 | 654 | 9.2%
4 | 489 | 6.9%
5 | 484 | 6.8%
8 | 379 | 5.3%
6 | 336 | 4.7%
9 | 230 | 3.2%
7 | 181 | 2.5%
Other values (4) | 12 | 0.2%

대분류
Categorical

IMBALANCE 

Distinct: 14
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Value | Count
(blank) | 1284
경영·사무·금융·보험 | 1
연구 및 공학기술 | 1
교육·법률·사회복지·경찰·소방 및 군인 | 1
보건·의료 | 1
Other values (9) | 9

Length

Max length: 25
Median length: 1
Mean length: 1.1333847
Min length: 1

Unique

Unique: 13
Unique (%): 1.0%

Sample

1st row: 경영·사무·금융·보험
2nd row: (blank)
3rd row: (blank)
4th row: (blank)
5th row: (blank)

Common Values

Value | Count | Frequency (%)
(blank) | 1284 | 99.0%
경영·사무·금융·보험 | 1 | 0.1%
연구 및 공학기술 | 1 | 0.1%
교육·법률·사회복지·경찰·소방 및 군인 | 1 | 0.1%
보건·의료 | 1 | 0.1%
예술·디자인·방송·스포츠 | 1 | 0.1%
미용·여행·숙박·음식·경비·돌봄·청소 | 1 | 0.1%
영업·판매·운전·운송 | 1 | 0.1%
건설·채굴 | 1 | 0.1%
설치·정비·생산-기계·금속·재료 | 1 | 0.1%
Other values (4) | 4 | 0.3%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
(not recoverable) | 3 | 15.0%
경영·사무·금융·보험 | 1 | 5.0%
건설·채굴 | 1 | 5.0%
단순 | 1 | 5.0%
제조 | 1 | 5.0%
설치·정비·생산-인쇄·목재·공예 | 1 | 5.0%
설치·정비·생산-화학·환경·섬유·의복·식품가공 | 1 | 5.0%
설치·정비·생산-전기·전자·정보통신 | 1 | 5.0%
설치·정비·생산-기계·금속·재료 | 1 | 5.0%
영업·판매·운전·운송 | 1 | 5.0%
Other values (8) | 8 | 40.0%

중분류
Text

Distinct: 113
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB

Length

Max length: 34
Median length: 1
Mean length: 2.1935235
Min length: 1

Characters and Unicode

Total characters: 2845
Distinct characters: 247
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 112
Unique (%): 8.6%

Sample

1st row: (blank)
2nd row: 행정·경영·금융·보험 관리직
3rd row: (blank)
4th row: (blank)
5th row: (blank)
Value | Count | Frequency (%)
(not recoverable) | 41 | 12.2%
조작 | 12 | 3.6%
사무 | 8 | 2.4%
(not recoverable) | 8 | 2.4%
기타 | 7 | 2.1%
종사자 | 7 | 2.1%
공학기술 | 6 | 1.8%
연구 | 6 | 1.8%
기계 | 5 | 1.5%
전문가 | 5 | 1.5%
Other values (194) | 232 | 68.8%

Most occurring characters

ValueCountFrequency (%)
1410
49.6%
· 152
 
5.3%
58
 
2.0%
45
 
1.6%
41
 
1.4%
35
 
1.2%
25
 
0.9%
23
 
0.8%
23
 
0.8%
23
 
0.8%
Other values (237) 1010
35.5%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 1410 | 49.6%
Other Letter | 1225 | 43.1%
Other Punctuation | 164 | 5.8%
Open Punctuation | 18 | 0.6%
Close Punctuation | 18 | 0.6%
Uppercase Letter | 9 | 0.3%
Decimal Number | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
58
 
4.7%
45
 
3.7%
41
 
3.3%
35
 
2.9%
25
 
2.0%
23
 
1.9%
23
 
1.9%
23
 
1.9%
21
 
1.7%
20
 
1.6%
Other values (222) 911
74.4%
Uppercase Letter
ValueCountFrequency (%)
U 2
22.2%
I 1
11.1%
X 1
11.1%
D 1
11.1%
R 1
11.1%
A 1
11.1%
M 1
11.1%
T 1
11.1%
Other Punctuation
ValueCountFrequency (%)
· 152
92.7%
, 11
 
6.7%
/ 1
 
0.6%
Space Separator
ValueCountFrequency (%)
1410
100.0%
Open Punctuation
ValueCountFrequency (%)
( 18
100.0%
Close Punctuation
ValueCountFrequency (%)
) 18
100.0%
Decimal Number
ValueCountFrequency (%)
3 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1611 | 56.6%
Hangul | 1225 | 43.1%
Latin | 9 | 0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
58
 
4.7%
45
 
3.7%
41
 
3.3%
35
 
2.9%
25
 
2.0%
23
 
1.9%
23
 
1.9%
23
 
1.9%
21
 
1.7%
20
 
1.6%
Other values (222) 911
74.4%
Latin
ValueCountFrequency (%)
U 2
22.2%
I 1
11.1%
X 1
11.1%
D 1
11.1%
R 1
11.1%
A 1
11.1%
M 1
11.1%
T 1
11.1%
Common
ValueCountFrequency (%)
1410
87.5%
· 152
 
9.4%
( 18
 
1.1%
) 18
 
1.1%
, 11
 
0.7%
/ 1
 
0.1%
3 1
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1468 | 51.6%
Hangul | 1225 | 43.1%
None | 152 | 5.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1410
96.0%
( 18
 
1.2%
) 18
 
1.2%
, 11
 
0.7%
U 2
 
0.1%
I 1
 
0.1%
/ 1
 
0.1%
X 1
 
0.1%
3 1
 
0.1%
D 1
 
0.1%
Other values (4) 4
 
0.3%
None
ValueCountFrequency (%)
· 152
100.0%
Hangul
ValueCountFrequency (%)
58
 
4.7%
45
 
3.7%
41
 
3.3%
35
 
2.9%
25
 
2.0%
23
 
1.9%
23
 
1.9%
23
 
1.9%
21
 
1.7%
20
 
1.6%
Other values (222) 911
74.4%

소분류
Text

Distinct: 1089
Distinct (%): 84.0%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB

Length

Max length: 42
Median length: 31
Mean length: 11.46569
Min length: 1

Characters and Unicode

Total characters: 14871
Distinct characters: 485
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1011
Unique (%): 77.9%

Sample

1st row: (blank)
2nd row: (blank)
3rd row: 의회의원·고위공무원 및 공공단체임원
4th row: 기업 고위임원
5th row: 정부행정 관리자
Value | Count | Frequency (%)
(not recoverable) | 197 | 6.1%
조작원 | 109 | 3.3%
기타 | 104 | 3.2%
기술자 | 65 | 2.0%
(not recoverable) | 56 | 1.7%
사무원 | 52 | 1.6%
연구원 | 47 | 1.4%
종사원 | 40 | 1.2%
전문가 | 35 | 1.1%
포함 | 35 | 1.1%
Other values (1402) | 2516 | 77.3%

Most occurring characters

ValueCountFrequency (%)
2209
 
14.9%
829
 
5.6%
· 527
 
3.5%
502
 
3.4%
462
 
3.1%
309
 
2.1%
) 291
 
2.0%
( 291
 
2.0%
254
 
1.7%
212
 
1.4%
Other values (475) 8985
60.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 11260 | 75.7%
Space Separator | 2209 | 14.9%
Other Punctuation | 683 | 4.6%
Close Punctuation | 291 | 2.0%
Open Punctuation | 291 | 2.0%
Uppercase Letter | 120 | 0.8%
Lowercase Letter | 9 | 0.1%
Dash Punctuation | 4 | < 0.1%
Decimal Number | 4 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
829
 
7.4%
502
 
4.5%
462
 
4.1%
309
 
2.7%
254
 
2.3%
212
 
1.9%
197
 
1.7%
185
 
1.6%
185
 
1.6%
160
 
1.4%
Other values (435) 7965
70.7%
Uppercase Letter
ValueCountFrequency (%)
A 20
16.7%
T 16
13.3%
C 13
10.8%
D 9
 
7.5%
I 8
 
6.7%
S 8
 
6.7%
P 6
 
5.0%
B 5
 
4.2%
R 5
 
4.2%
N 4
 
3.3%
Other values (12) 26
21.7%
Lowercase Letter
ValueCountFrequency (%)
n 2
22.2%
e 2
22.2%
g 1
11.1%
i 1
11.1%
r 1
11.1%
a 1
11.1%
l 1
11.1%
Other Punctuation
ValueCountFrequency (%)
· 527
77.2%
, 150
 
22.0%
/ 5
 
0.7%
& 1
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 2
50.0%
3 1
25.0%
9 1
25.0%
Space Separator
ValueCountFrequency (%)
2209
100.0%
Close Punctuation
ValueCountFrequency (%)
) 291
100.0%
Open Punctuation
ValueCountFrequency (%)
( 291
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 11260 | 75.7%
Common | 3482 | 23.4%
Latin | 129 | 0.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
829
 
7.4%
502
 
4.5%
462
 
4.1%
309
 
2.7%
254
 
2.3%
212
 
1.9%
197
 
1.7%
185
 
1.6%
185
 
1.6%
160
 
1.4%
Other values (435) 7965
70.7%
Latin
ValueCountFrequency (%)
A 20
15.5%
T 16
12.4%
C 13
 
10.1%
D 9
 
7.0%
I 8
 
6.2%
S 8
 
6.2%
P 6
 
4.7%
B 5
 
3.9%
R 5
 
3.9%
N 4
 
3.1%
Other values (19) 35
27.1%
Common
ValueCountFrequency (%)
2209
63.4%
· 527
 
15.1%
) 291
 
8.4%
( 291
 
8.4%
, 150
 
4.3%
/ 5
 
0.1%
- 4
 
0.1%
1 2
 
0.1%
& 1
 
< 0.1%
3 1
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 11260 | 75.7%
ASCII | 3084 | 20.7%
None | 527 | 3.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2209
71.6%
) 291
 
9.4%
( 291
 
9.4%
, 150
 
4.9%
A 20
 
0.6%
T 16
 
0.5%
C 13
 
0.4%
D 9
 
0.3%
I 8
 
0.3%
S 8
 
0.3%
Other values (29) 69
 
2.2%
Hangul
ValueCountFrequency (%)
829
 
7.4%
502
 
4.5%
462
 
4.1%
309
 
2.7%
254
 
2.3%
212
 
1.9%
197
 
1.7%
185
 
1.6%
185
 
1.6%
160
 
1.4%
Other values (435) 7965
70.7%
None
ValueCountFrequency (%)
· 527
100.0%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
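The report shows zero missing cells even though most name cells carry no label. One plausible reading, treated here as an assumption, is that empty cells hold a single space rather than a null, which would match the minimum length of 1 reported for the name columns. The sketch below counts null cells and whitespace-only cells separately; the function name and the rows are illustrative, modelled on the first sample records.

```python
def count_missing_and_blank(rows):
    """Return (missing, blank): cells that are None, and string cells that
    are empty once surrounding whitespace is stripped."""
    missing = blank = 0
    for row in rows:
        for cell in row:
            if cell is None:
                missing += 1
            elif isinstance(cell, str) and cell.strip() == "":
                blank += 1
    return missing, blank

# Assumption: blank cells contain a single space, not a null value.
rows = [
    ("1", "경영·사무·금융·보험", " ", " "),
    ("11", " ", "행정·경영·금융·보험 관리직", " "),
    ("11100", " ", " ", "의회의원·고위공무원 및 공공단체임원"),
]
missing, blank = count_missing_and_blank(rows)
print(missing, blank)
```

If that assumption holds, the nullity plots understate how sparse the name columns are, and a blank-cell count is the more informative measure.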

Sample

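Index | 카테고리 ID | 대분류 | 중분류 | 소분류
0 | 1 | 경영·사무·금융·보험 | |
1 | 11 | | 행정·경영·금융·보험 관리직 |
2 | 11100 | | | 의회의원·고위공무원 및 공공단체임원
3 | 11200 | | | 기업 고위임원
4 | 12100 | | | 정부행정 관리자
5 | 12200 | | | 경영지원 관리자
6 | 12201 | | | 경영기획 부서장
7 | 12202 | | | 인사·노무·교육·총무·감사 부서장
8 | 12203 | | | 자재·구매 부서장
9 | 12204 | | | 재무·회계·경리 부서장

Index | 카테고리 ID | 대분류 | 중분류 | 소분류
1287 | 903100 | | | 조림·산림경영인 및 벌목원
1288 | 903900 | | | 임산물 채취 및 기타 임업 종사원
1289 | 134 | | 어업 종사자 |
1290 | 904100 | | | 양식원
1291 | 904200 | | | 어부 및 해녀
1292 | 135 | | 농림어업 단순 종사자 |
1293 | 905000 | | | 농림어업 단순 종사원
1294 | 905001 | | | 농업 단순 종사원
1295 | 905002 | | | 임업 단순 종사원(산림보호감시, 산불감시원 등)
1296 | 905003 | | | 어업 단순 종사원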
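In the sample rows, each record appears to populate exactly one of the three name columns, so the hierarchy level of a record can be read from which column is filled. The sketch below assumes that pattern holds for the whole file and that empty cells are blank strings. The function name `classification_level` is illustrative.

```python
LEVELS = ("대분류", "중분류", "소분류")

def classification_level(record):
    """Return the name of the single populated level column, or None when
    zero or several level columns are populated."""
    populated = [name for name in LEVELS if record.get(name, "").strip()]
    return populated[0] if len(populated) == 1 else None

# Records modelled on the first three sample rows (blank cells as " ").
records = [
    {"카테고리 ID": "1", "대분류": "경영·사무·금융·보험", "중분류": " ", "소분류": " "},
    {"카테고리 ID": "11", "대분류": " ", "중분류": "행정·경영·금융·보험 관리직", "소분류": " "},
    {"카테고리 ID": "11100", "대분류": " ", "중분류": " ", "소분류": "의회의원·고위공무원 및 공공단체임원"},
]
levels = [classification_level(r) for r in records]
print(levels)
```

Deriving the level from the populated column avoids relying on code length, which varies in the sample (for example `11` and `134` are both 중분류 codes).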

Duplicate rows

Most frequently occurring

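Index | 카테고리 ID | 대분류 | 중분류 | 소분류 | # duplicates
5 | 215200 | | | 대학 교육 조교(TA) 및 연구 조교(RA) | 3
7 | 231400 | | | 직업상담사 | 3
51 | 703101 | | | 건축 배관공(옥내급수관,상하수배관,위생 배관) | 3
52 | 703102 | | | 가스 배관공(가스관 설치원) | 3
53 | 703200 | | | 공업 배관공(플랜트,항공,선박,철도차량) | 3
54 | 703902 | | | 배관 보조원 | 3
0 | 134301 | | | 정보시스템 운영자 | 2
1 | 159100 | | | 제도사 | 2
2 | 214401 | | | 미술 강사 | 2
3 | 214402 | | | 음악 강사 | 2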
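A table like this can be rebuilt by counting identical rows. The sketch below assumes that "# duplicates" is the number of times an identical row occurs, which agrees with code 215200 having a value count of 3 in the 카테고리 ID frequencies. The function name and the shortened two-column rows are illustrative.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return (row, occurrences) pairs for rows seen more than once,
    most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    dups = [(row, n) for row, n in counts.items() if n > 1]
    dups.sort(key=lambda item: item[1], reverse=True)
    return dups

# Shortened rows: (카테고리 ID, 소분류) only.
rows = [
    ("215200", "대학 교육 조교(TA) 및 연구 조교(RA)"),
    ("231400", "직업상담사"),
    ("215200", "대학 교육 조교(TA) 및 연구 조교(RA)"),
    ("134301", "정보시스템 운영자"),
    ("215200", "대학 교육 조교(TA) 및 연구 조교(RA)"),
    ("134301", "정보시스템 운영자"),
]
dups = duplicate_rows(rows)
for row, n in dups:
    print(row[0], n)
```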