Overview

Dataset statistics

Number of variables: 9
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 810.5 KiB
Average record size in memory: 83.0 B

Variable types

Categorical: 6
Numeric: 2
Text: 1

Dataset

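Description: National qualification acquisition statistics for 2013 through 2021. Each record consists of the acquisition year (취득년도), acquisition month (취득월), region name (지역명), age group (연령대), qualification type (자격구분명), grade series (계열명), qualification item name (종목명), and number of acquisitions (취득 수).
Author: 한국산업인력공단 (Human Resources Development Service of Korea)
URL: https://www.data.go.kr/data/15088896/fileData.do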

Alerts

계열명 is highly overall correlated with 자격구분명 (High correlation)
자격구분명 is highly overall correlated with 계열명 (High correlation)
자격구분명 is highly imbalanced (99.9%) (Imbalance)
계열명 is highly imbalanced (52.5%) (Imbalance)
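
The two imbalance percentages can be reproduced from the class counts listed later in this report. The sketch below assumes the score is one minus the Shannon entropy of the class frequencies, normalized by the logarithm of the number of classes. The report itself does not state this formula; the assumption is adopted here because it matches both reported figures. The function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts.

    H is the Shannon entropy (in bits) of the class frequencies and k is the
    number of classes. 0 means perfectly balanced; values close to 1 mean a
    single class dominates. (Assumed definition, see the text above.)
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts copied from the Common Values tables of this report.
qualification_type = [9999, 1]            # 자격구분명
grade_series = [6714, 2953, 199, 133, 1]  # 계열명

print(f"{imbalance_score(qualification_type):.1%}")  # 99.9%
print(f"{imbalance_score(grade_series):.1%}")        # 52.5%
```

A single row out of 10000 in the second class is enough to push 자격구분명 to 99.9%, so this alert reflects one stray record rather than a meaningful second category.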

Reproduction

Analysis started: 2023-12-12 12:49:09.542001
Analysis finished: 2023-12-12 12:49:11.279515
Duration: 1.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
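
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used (that is in config.json, which is not reproduced here). The CSV path, the CP949 encoding, the report title, and the idea that the 10000 observations are a random row sample are all assumptions. `build_report` is a hypothetical helper name.

```python
def build_report(csv_path, description, out_path="report.html",
                 n_rows=10_000, seed=0):
    """Profile a random sample of a CSV file and write an HTML report."""
    # Third-party imports are kept local so the function can be defined
    # even where pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is CP949-encoded, as many data.go.kr CSVs are.
    df = pd.read_csv(csv_path, encoding="cp949")

    # Assumption: the profiled table is a random sample of the full file.
    sample = df.sample(n=min(n_rows, len(df)), random_state=seed)

    report = ProfileReport(
        sample,
        title="국가자격취득자 현황",  # illustrative title
        dataset={
            "description": description,
            "author": "한국산업인력공단",
            "url": "https://www.data.go.kr/data/15088896/fileData.do",
        },
    )
    report.to_file(out_path)
    return out_path
```

Fixing `random_state` keeps the sample, and therefore the statistics in the report, stable between runs.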

Variables

취득년도
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2014: 5801
2015: 4199

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2014
2nd row: 2015
3rd row: 2015
4th row: 2015
5th row: 2014

Common Values

Value | Count | Frequency (%)
2014 | 5801 | 58.0%
2015 | 4199 | 42.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
2014 | 5801 | 58.0%
2015 | 4199 | 42.0%

취득월
Real number (ℝ)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.21
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 5
Median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.8518964
Coefficient of variation (CV): 0.39554736
Kurtosis: -0.89621589
Mean: 7.21
Median Absolute Deviation (MAD): 2
Skewness: 0.1007945
Sum: 72100
Variance: 8.1333133
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value | Count | Frequency (%)
6 | 1443 | 14.4%
8 | 1397 | 14.0%
5 | 1387 | 13.9%
4 | 1157 | 11.6%
12 | 1055 | 10.5%
10 | 887 | 8.9%
9 | 722 | 7.2%
7 | 670 | 6.7%
11 | 586 | 5.9%
2 | 283 | 2.8%
Other values (2) | 413 | 4.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 133 | 1.3%
2 | 283 | 2.8%
3 | 280 | 2.8%
4 | 1157 | 11.6%
5 | 1387 | 13.9%
6 | 1443 | 14.4%
7 | 670 | 6.7%
8 | 1397 | 14.0%
9 | 722 | 7.2%
10 | 887 | 8.9%

Maximum 10 values

Value | Count | Frequency (%)
12 | 1055 | 10.5%
11 | 586 | 5.9%
10 | 887 | 8.9%
9 | 722 | 7.2%
8 | 1397 | 14.0%
7 | 670 | 6.7%
6 | 1443 | 14.4%
5 | 1387 | 13.9%
4 | 1157 | 11.6%
3 | 280 | 2.8%
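
The descriptive statistics reported for 취득월 can be cross-checked from its frequency table alone. The sketch below recomputes the sum, mean, variance, standard deviation, and coefficient of variation. It assumes the report uses the sample variance (n - 1 in the denominator), which is the pandas default; that assumption is consistent with the reported value 8.1333133.

```python
import math

# Value -> count, copied from the 취득월 frequency tables in this report.
month_counts = {1: 133, 2: 283, 3: 280, 4: 1157, 5: 1387, 6: 1443,
                7: 670, 8: 1397, 9: 722, 10: 887, 11: 586, 12: 1055}

n = sum(month_counts.values())                        # number of observations
total = sum(v * c for v, c in month_counts.items())   # "Sum" in the report
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in month_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total, mean)
print(round(variance, 7), round(std, 7), round(cv, 8))
```

The counts add up to 10000 and the weighted sum to 72100, so the frequency table covers every observation.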

지역명
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
경기: 1045
서울: 925
경남: 659
인천: 640
충남: 638
Other values (13): 6093

Length

Max length: 2
Median length: 2
Mean length: 1.985
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경남
2nd row: (blank)
3rd row: 전남
4th row: 전남
5th row: 경북

Common Values

Value | Count | Frequency (%)
경기 | 1045 | 10.4%
서울 | 925 | 9.2%
경남 | 659 | 6.6%
인천 | 640 | 6.4%
충남 | 638 | 6.4%
경북 | 624 | 6.2%
대구 | 611 | 6.1%
부산 | 592 | 5.9%
전북 | 573 | 5.7%
대전 | 524 | 5.2%
Other values (8) | 3169 | 31.7%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
경기 | 1045 | 10.6%
서울 | 925 | 9.4%
경남 | 659 | 6.7%
인천 | 640 | 6.5%
충남 | 638 | 6.5%
경북 | 624 | 6.3%
대구 | 611 | 6.2%
부산 | 592 | 6.0%
전북 | 573 | 5.8%
대전 | 524 | 5.3%
Other values (7) | 3019 | 30.6%
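
The two 지역명 tables give different percentages for the same counts. A mean length of 1.985 with lengths limited to 1 or 2 fixes how many one-character values there are, and removing those rows from the denominator reproduces the second table. The sketch below works through that arithmetic. Reading the one-character value as a blank region name is an inference from the empty sample row, not something the report states.

```python
n_rows = 10_000
mean_length = 1.985          # reported mean length of 지역명
long_len, short_len = 2, 1   # reported max and min length

# Lengths are integers between 1 and 2, so with n_short one-character values:
#   n_short * short_len + (n_rows - n_short) * long_len = mean_length * n_rows
n_short = round((long_len - mean_length) * n_rows / (long_len - short_len))
n_named = n_rows - n_short

print(n_short, n_named)  # 150 9850

# 경남 (659 rows) as a share of all rows vs. rows with a two-character name.
print(f"{659 / n_rows:.1%}", f"{659 / n_named:.1%}")  # 6.6% 6.7%

# "Other values" shrinks by the same 150 rows and by one category.
print(3169 - n_short)  # 3019
```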

연령대
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20대: 2741
30대: 2109
40대: 1839
10대: 1734
50대: 1154
Other values (2): 423

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 40대
2nd row: 20대
3rd row: 20대
4th row: 10대
5th row: 10대

Common Values

Value | Count | Frequency (%)
20대 | 2741 | 27.4%
30대 | 2109 | 21.1%
40대 | 1839 | 18.4%
10대 | 1734 | 17.3%
50대 | 1154 | 11.5%
60대 | 411 | 4.1%
70대 | 12 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
20대 | 2741 | 27.4%
30대 | 2109 | 21.1%
40대 | 1839 | 18.4%
10대 | 1734 | 17.3%
50대 | 1154 | 11.5%
60대 | 411 | 4.1%
70대 | 12 | 0.1%

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
남성: 6594
여성: 3406

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남성
2nd row: 여성
3rd row: 남성
4th row: 남성
5th row: 남성

Common Values

Value | Count | Frequency (%)
남성 | 6594 | 65.9%
여성 | 3406 | 34.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
남성 | 6594 | 65.9%
여성 | 3406 | 34.1%

자격구분명
Categorical

Alerts: High correlation, Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
국가기술자격: 9999
일학습병행자격: 1

Length

Max length: 7
Median length: 6
Mean length: 6.0001
Min length: 6

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 국가기술자격
2nd row: 국가기술자격
3rd row: 국가기술자격
4th row: 국가기술자격
5th row: 국가기술자격

Common Values

Value | Count | Frequency (%)
국가기술자격 | 9999 | > 99.9%
일학습병행자격 | 1 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
국가기술자격 | 9999 | > 99.9%
일학습병행자격 | 1 | < 0.1%

계열명
Categorical

Alerts: High correlation, Imbalance

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
기능사: 6714
기사: 2953
기능장: 199
기술사: 133
L2: 1

Length

Max length: 3
Median length: 3
Mean length: 2.7046
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 기술사
2nd row: 기능사
3rd row: 기사
4th row: 기능사
5th row: 기능사

Common Values

Value | Count | Frequency (%)
기능사 | 6714 | 67.1%
기사 | 2953 | 29.5%
기능장 | 199 | 2.0%
기술사 | 133 | 1.3%
L2 | 1 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
기능사 | 6714 | 67.1%
기사 | 2953 | 29.5%
기능장 | 199 | 2.0%
기술사 | 133 | 1.3%
l2 | 1 | < 0.1%

종목명
Text

Distinct: 415
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 17
Mean length: 7.1553
Min length: 3

Characters and Unicode

Total characters: 71553
Distinct characters: 244
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
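
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation, and `category_counts` is an illustrative name. The category codes map to the names used in this report as follows: `Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation, `Nd` is Decimal Number, `Pc` is Connector Punctuation, and `Lu` is Uppercase Letter.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the strings."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts


# Three values taken from the Sample rows of this variable.
sample = ["건축시공기술사", "미용사(일반)", "위험물기능사"]
print(category_counts(sample))
```

Hangul syllables fall under `Lo`, which is why Other Letter accounts for 97.8% of the 71553 characters.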

Unique

Unique: 64
Unique (%): 0.6%

Sample

1st row: 건축시공기술사
2nd row: 미용사(일반)
3rd row: 폐기물처리산업기사
4th row: 컴퓨터응용선반기능사
5th row: 위험물기능사

Value | Count | Frequency (%)
한식조리기능사 | 366 | 3.7%
양식조리기능사 | 341 | 3.4%
제빵기능사 | 296 | 3.0%
미용사(일반 | 282 | 2.8%
지게차운전기능사 | 255 | 2.5%
중식조리기능사 | 229 | 2.3%
굴삭기운전기능사 | 228 | 2.3%
제과기능사 | 217 | 2.2%
미용사(피부 | 212 | 2.1%
정보처리기능사 | 196 | 2.0%
Other values (405) | 7378 | 73.8%

Most occurring characters

ValueCountFrequency (%)
11300
 
15.8%
10285
 
14.4%
6321
 
8.8%
2039
 
2.8%
1956
 
2.7%
1922
 
2.7%
1658
 
2.3%
1613
 
2.3%
1376
 
1.9%
1352
 
1.9%
Other values (234) 31731
44.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 69959 | 97.8%
Open Punctuation | 719 | 1.0%
Close Punctuation | 719 | 1.0%
Decimal Number | 154 | 0.2%
Connector Punctuation | 1 | < 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
11300
 
16.2%
10285
 
14.7%
6321
 
9.0%
2039
 
2.9%
1956
 
2.8%
1922
 
2.7%
1658
 
2.4%
1613
 
2.3%
1376
 
2.0%
1352
 
1.9%
Other values (228) 30137
43.1%

Decimal Number
Value | Count | Frequency (%)
2 | 146 | 94.8%
1 | 8 | 5.2%

Open Punctuation
Value | Count | Frequency (%)
( | 719 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 719 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
L | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 69959 | 97.8%
Common | 1593 | 2.2%
Latin | 1 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
11300
 
16.2%
10285
 
14.7%
6321
 
9.0%
2039
 
2.9%
1956
 
2.8%
1922
 
2.7%
1658
 
2.4%
1613
 
2.3%
1376
 
2.0%
1352
 
1.9%
Other values (228) 30137
43.1%

Common
Value | Count | Frequency (%)
( | 719 | 45.1%
) | 719 | 45.1%
2 | 146 | 9.2%
1 | 8 | 0.5%
_ | 1 | 0.1%

Latin
Value | Count | Frequency (%)
L | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 69959 | 97.8%
ASCII | 1594 | 2.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
11300
 
16.2%
10285
 
14.7%
6321
 
9.0%
2039
 
2.9%
1956
 
2.8%
1922
 
2.7%
1658
 
2.4%
1613
 
2.3%
1376
 
2.0%
1352
 
1.9%
Other values (228) 30137
43.1%

ASCII
Value | Count | Frequency (%)
( | 719 | 45.1%
) | 719 | 45.1%
2 | 146 | 9.2%
1 | 8 | 0.5%
_ | 1 | 0.1%
L | 1 | 0.1%

취득 수
Real number (ℝ)

Distinct: 159
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.6512
Minimum: 1
Maximum: 1125
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 7
95-th percentile: 37
Maximum: 1125
Range: 1124
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 24.031906
Coefficient of variation (CV): 2.7778696
Kurtosis: 591.07679
Mean: 8.6512
Median Absolute Deviation (MAD): 1
Skewness: 17.219656
Sum: 86512
Variance: 577.53249
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 3577 | 35.8%
2 | 1469 | 14.7%
3 | 939 | 9.4%
4 | 615 | 6.2%
5 | 421 | 4.2%
6 | 345 | 3.5%
7 | 261 | 2.6%
8 | 243 | 2.4%
9 | 197 | 2.0%
10 | 156 | 1.6%
Other values (149) | 1777 | 17.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 3577 | 35.8%
2 | 1469 | 14.7%
3 | 939 | 9.4%
4 | 615 | 6.2%
5 | 421 | 4.2%
6 | 345 | 3.5%
7 | 261 | 2.6%
8 | 243 | 2.4%
9 | 197 | 2.0%
10 | 156 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
1125 | 1 | < 0.1%
720 | 1 | < 0.1%
474 | 1 | < 0.1%
396 | 1 | < 0.1%
389 | 1 | < 0.1%
363 | 1 | < 0.1%
307 | 1 | < 0.1%
278 | 1 | < 0.1%
265 | 1 | < 0.1%
248 | 1 | < 0.1%

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]
Variable | 취득년도 | 취득월 | 지역명 | 연령대 | 성별 | 자격구분명 | 계열명 | 취득 수
취득년도 | 1.000 | 0.556 | 0.009 | 0.000 | 0.012 | 0.000 | 0.038 | 0.000
취득월 | 0.556 | 1.000 | 0.088 | 0.276 | 0.162 | 0.000 | 0.742 | 0.074
지역명 | 0.009 | 0.088 | 1.000 | 0.070 | 0.036 | 0.046 | 0.118 | 0.064
연령대 | 0.000 | 0.276 | 0.070 | 1.000 | 0.084 | 0.000 | 0.270 | 0.055
성별 | 0.012 | 0.162 | 0.036 | 0.084 | 1.000 | 0.000 | 0.083 | 0.004
자격구분명 | 0.000 | 0.000 | 0.046 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
계열명 | 0.038 | 0.742 | 0.118 | 0.270 | 0.083 | 1.000 | 1.000 | 0.000
취득 수 | 0.000 | 0.074 | 0.064 | 0.055 | 0.004 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]
Variable | 계열명 | 지역명 | 자격구분명 | 성별 | 취득년도 | 연령대
계열명 | 1.000 | 0.060 | 1.000 | 0.101 | 0.046 | 0.176
지역명 | 0.060 | 1.000 | 0.036 | 0.028 | 0.007 | 0.031
자격구분명 | 1.000 | 0.036 | 1.000 | 0.000 | 0.000 | 0.000
성별 | 0.101 | 0.028 | 0.000 | 1.000 | 0.008 | 0.090
취득년도 | 0.046 | 0.007 | 0.000 | 0.008 | 1.000 | 0.000
연령대 | 0.176 | 0.031 | 0.000 | 0.090 | 0.000 | 1.000

[Figure: correlation heatmap]
Variable | 취득월 | 취득 수 | 취득년도 | 지역명 | 연령대 | 성별 | 자격구분명 | 계열명
취득월 | 1.000 | -0.006 | 0.429 | 0.034 | 0.143 | 0.124 | 0.000 | 0.400
취득 수 | -0.006 | 1.000 | 0.000 | 0.029 | 0.019 | 0.004 | 0.000 | 0.000
취득년도 | 0.429 | 0.000 | 1.000 | 0.007 | 0.000 | 0.008 | 0.000 | 0.046
지역명 | 0.034 | 0.029 | 0.007 | 1.000 | 0.031 | 0.028 | 0.036 | 0.060
연령대 | 0.143 | 0.019 | 0.000 | 0.031 | 1.000 | 0.090 | 0.000 | 0.176
성별 | 0.124 | 0.004 | 0.008 | 0.028 | 0.090 | 1.000 | 0.000 | 0.101
자격구분명 | 0.000 | 0.000 | 0.000 | 0.036 | 0.000 | 0.000 | 1.000 | 1.000
계열명 | 0.400 | 0.000 | 0.046 | 0.060 | 0.176 | 0.101 | 1.000 | 1.000
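
The coefficient of 1.000 between 자격구분명 and 계열명 is what a categorical association measure returns when one variable fully determines the other. The sketch below computes the plain (uncorrected) Cramér's V on a hypothetical cross-tabulation. The report does not say which coefficient each matrix uses, and it does not show the joint counts. The table below assumes that the single 일학습병행자격 row is also the single L2 row, which is an inference from the marginal counts.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical joint counts (assumption, see the text above).
# Rows: 국가기술자격, 일학습병행자격. Columns: 기능사, 기사, 기능장, 기술사, L2.
table = [
    [6714, 2953, 199, 133, 0],
    [0, 0, 0, 0, 1],
]
print(round(cramers_v(table), 3))  # 1.0
```

Under that assumption the perfect association rests on one record out of 10000, so the High correlation alert says little about the other 9999 rows.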

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

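Index | 취득년도 | 취득월 | 지역명 | 연령대 | 성별 | 자격구분명 | 계열명 | 종목명 | 취득 수
13581 | 2014 | 5 | 경남 | 40대 | 남성 | 국가기술자격 | 기술사 | 건축시공기술사 | 1
57236 | 2015 | 3 | (blank) | 20대 | 여성 | 국가기술자격 | 기능사 | 미용사(일반) | 1
69759 | 2015 | 5 | 전남 | 20대 | 남성 | 국가기술자격 | 기사 | 폐기물처리산업기사 | 1
69610 | 2015 | 5 | 전남 | 10대 | 남성 | 국가기술자격 | 기능사 | 컴퓨터응용선반기능사 | 1
18562 | 2014 | 6 | 경북 | 10대 | 남성 | 국가기술자격 | 기능사 | 위험물기능사 | 2
66158 | 2015 | 5 | 충남 | 30대 | 남성 | 국가기술자격 | 기사 | 에너지관리기사 | 11
46611 | 2014 | 12 | 인천 | 10대 | 여성 | 국가기술자격 | 기능사 | 생산자동화기능사 | 1
79388 | 2015 | 7 | 경북 | 10대 | 여성 | 국가기술자격 | 기능사 | 전기기능사 | 6
54221 | 2014 | 12 | 세종 | 20대 | 남성 | 국가기술자격 | 기능사 | 승강기기능사 | 2
8533 | 2014 | 4 | 광주 | 30대 | 남성 | 국가기술자격 | 기능사 | 천공기운전기능사 | 1

Index | 취득년도 | 취득월 | 지역명 | 연령대 | 성별 | 자격구분명 | 계열명 | 종목명 | 취득 수
88388 | 2015 | 9 | 경기 | 30대 | 여성 | 국가기술자격 | 기능사 | 사진기능사 | 1
49920 | 2014 | 12 | 경북 | 50대 | 남성 | 국가기술자격 | 기능사 | 플라스틱창호기능사 | 1
25229 | 2014 | 7 | (blank) | 10대 | 여성 | 국가기술자격 | 기능사 | 전자기기기능사 | 7
83587 | 2015 | 8 | 충북 | 40대 | 남성 | 국가기술자격 | 기사 | 산업안전기사 | 6
81395 | 2015 | 8 | 서울 | 20대 | 남성 | 국가기술자격 | 기사 | 식품산업기사 | 6
31117 | 2014 | 8 | 전남 | 20대 | 여성 | 국가기술자격 | 기능사 | 전자출판기능사 | 1
92731 | 2015 | 9 | (blank) | 10대 | 남성 | 국가기술자격 | 기능사 | 일식조리기능사 | 1
53587 | 2014 | 12 | 전남 | 30대 | 남성 | 국가기술자격 | 기사 | 침투비파괴검사산업기사 | 1
84639 | 2015 | 8 | 경북 | 40대 | 여성 | 국가기술자격 | 기능사 | 양식조리기능사 | 9
83177 | 2015 | 8 | 강원 | 30대 | 남성 | 국가기술자격 | 기사 | 에너지관리기사 | 3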