Overview

Dataset statistics

Number of variables: 4
Number of observations: 4003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 129.1 KiB
Average record size in memory: 33.0 B
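
The size figures above are consistent with each other. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 B and that the average record size is simply total memory divided by the row count (the helper names are illustrative, not part of the profiling tool):

```python
# Figures copied from the "Dataset statistics" block above.
total_size_kib = 129.1
n_observations = 4003
n_variables = 4
missing_cells = 0


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Convert KiB to bytes (1 KiB = 1024 B) and divide by the row count."""
    return total_kib * 1024 / n_rows


def missing_cell_share(missing: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells among all cells, as a percentage."""
    return 100.0 * missing / (n_rows * n_cols)


avg_bytes = average_record_size_bytes(total_size_kib, n_observations)
missing_pct = missing_cell_share(missing_cells, n_observations, n_variables)
print(f"{avg_bytes:.1f} B")    # prints "33.0 B"
print(f"{missing_pct:.1f}%")   # prints "0.0%"
```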

Variable types

Numeric: 1
Categorical: 1
Text: 2

Dataset

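Description: Provides the set number, set name, type number, and related information, by item and type, for the national qualifications (national technical qualifications and national professional qualifications) administered by the Corporation (Q-Net).
URL: https://www.data.go.kr/data/15120656/fileData.do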

Alerts

Imbalance: 선택분야코드 (elective field code) is highly imbalanced (87.0%)
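
One way to arrive at an imbalance figure of this kind is a normalized Shannon entropy score: 0 for a perfectly uniform column, approaching 1 when a single class dominates. That this is the formula behind the alert is an assumption, and `imbalance_score` is a hypothetical helper written for illustration. The toy lists below are made up; the report does not list every class count, so the 87.0% itself is not reproduced here.

```python
import math
from collections import Counter


def imbalance_score(values) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class frequencies and k is the number of distinct classes."""
    counts = Counter(values)
    k = len(counts)
    if k <= 1:
        # A single class carries no information about balance.
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(k)


# Toy data, not the real column: four equally frequent codes vs. one dominant code.
balanced = ["00", "97", "98", "24"] * 25
skewed = ["00"] * 97 + ["97", "98", "24"]

print(round(imbalance_score(balanced), 3))
print(round(imbalance_score(skewed), 3))
```

A column in which one code covers most rows, as 00 does here, scores close to 1 under this definition.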

Reproduction

Analysis started: 2023-12-12 00:50:06.095854
Analysis finished: 2023-12-12 00:50:06.721075
Duration: 0.63 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
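
The duration follows from the two timestamps. A small sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" block above.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 00:50:06.095854", FORMAT)
finished = datetime.strptime("2023-12-12 00:50:06.721075", FORMAT)

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # prints "0.63 seconds"
```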

Variables

종목코드 (item code)
Real number (ℝ)

Distinct: 665
Distinct (%): 16.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5662.4929
Minimum: 1021
Maximum: 9728
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 35.3 KiB

Quantile statistics

Minimum: 1021
5-th percentile: 1322
Q1: 2450
Median: 6892
Q3: 7932
95-th percentile: 8670
Maximum: 9728
Range: 8707
Interquartile range (IQR): 5482

Descriptive statistics

Standard deviation: 2725.8407
Coefficient of variation (CV): 0.48138528
Kurtosis: -1.4738492
Mean: 5662.4929
Median Absolute Deviation (MAD): 1075
Skewness: -0.46735447
Sum: 22666959
Variance: 7430207.7
Monotonicity: Not monotonic
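
Several of these figures can be derived from others, which gives a quick consistency check. The sketch below uses only numbers printed in this section; the tolerance is an assumption chosen to absorb rounding in the displayed values.

```python
import math

# Figures copied from the 종목코드 statistics above.
n = 4003
minimum, maximum = 1021, 9728
q1, q3 = 2450, 7932
mean, std = 5662.4929, 2725.8407

derived = {
    "range": maximum - minimum,   # max - min
    "iqr": q3 - q1,               # Q3 - Q1
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # square of the standard deviation
    "sum": mean * n,              # mean times number of observations
}
reported = {
    "range": 8707,
    "iqr": 5482,
    "cv": 0.48138528,
    "variance": 7430207.7,
    "sum": 22666959,
}

for name, value in derived.items():
    # A loose relative tolerance absorbs the rounding of the printed figures.
    consistent = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.8g} reported={reported[name]} consistent={consistent}")
```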
Histogram with fixed size bins (bins=50)
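
Fixed-size binning splits the interval from minimum to maximum into equal-width bins; with 50 bins over this column's range of 8707, each bin would be about 174 wide. A minimal sketch of the idea follows. `fixed_width_histogram` is an illustrative helper, not the profiling tool's own routine, and the input list reuses the quantile values from this section purely as toy data.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Returns a list of (left_edge, right_edge, count) tuples.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value is identical, so one bin holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls on the upper edge; clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


# Toy input: the quantile values reported above, not the real 4003 rows.
sample_codes = [1021, 1322, 2450, 6892, 7932, 8670, 9728]
histogram = fixed_width_histogram(sample_codes, bins=5)
for left, right, count in histogram:
    print(f"[{left:8.1f}, {right:8.1f}) {count}")
```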
Value                 Count  Frequency (%)
7937                    229  5.7%
7625                    133  3.3%
7947                    125  3.1%
7957                    100  2.5%
3922                     80  2.0%
7910                     53  1.3%
1530                     52  1.3%
1560                     52  1.3%
6592                     49  1.2%
7620                     44  1.1%
Other values (655)     3086  77.1%
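
A table of this shape can be sketched with `collections.Counter`: take the most frequent values, then fold everything else into one "Other values (k)" row. `common_values` is a hypothetical helper and the toy list is made up; only the final check uses a figure from the table above (229 of 4003 rows).

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, percent) rows for the `top` most frequent values,
    followed by one aggregated "Other values (k)" row when anything is left."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    remaining = len(counts) - len(rows)
    if remaining > 0:
        other = n - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({remaining})", other, 100.0 * other / n))
    return rows


# Toy data, not the real column.
toy = [7937] * 5 + [7625] * 3 + [3922] * 2 + [1530, 1560]
rows = common_values(toy, top=2)
for value, count, pct in rows:
    print(f"{value}  {count}  {pct:.1f}%")

# Cross-check of the first row of the table above.
top_share = 100 * 229 / 4003
print(f"{top_share:.1f}%")  # prints "5.7%"
```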

Value                 Count  Frequency (%)
1021                      1  < 0.1%
1022                      2  < 0.1%
1023                      1  < 0.1%
1024                      1  < 0.1%
1025                      1  < 0.1%
1030                      1  < 0.1%
1040                      1  < 0.1%
1048                      1  < 0.1%
1050                      1  < 0.1%
1051                      1  < 0.1%

Value                 Count  Frequency (%)
9728                      1  < 0.1%
9700                      1  < 0.1%
9699                      1  < 0.1%
9698                      1  < 0.1%
9697                      1  < 0.1%
9696                      1  < 0.1%
9685                      1  < 0.1%
9672                      1  < 0.1%
9545                      9  0.2%
9544                      1  < 0.1%

선택분야코드 (elective field code)
Categorical

IMBALANCE 

Distinct: 49
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

00                     3683
98                       62
97                       62
24                       24
22                       21
Other values (44)       151

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 27
Unique (%): 0.7%
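
"Distinct" and "Unique" measure different things here. The sketch below assumes that distinct means the number of different values and unique means the number of values that occur exactly once, which is how the figures in this section read. `distinct_and_unique` is an illustrative helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): the number of different values, and the
    number of values that appear exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# The five sample rows listed for this variable.
sample = ["71", "00", "00", "20", "00"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)  # prints "3 2"

# The reported percentages are shares of all 4003 rows.
print(f"{100 * 49 / 4003:.1f}%")  # prints "1.2%"
print(f"{100 * 27 / 4003:.1f}%")  # prints "0.7%"
```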

Sample

1st row: 71
2nd row: 00
3rd row: 00
4th row: 20
5th row: 00

Common Values

Value                 Count  Frequency (%)
00                     3683  92.0%
98                       62  1.5%
97                       62  1.5%
24                       24  0.6%
22                       21  0.5%
36                       17  0.4%
38                       16  0.4%
37                       16  0.4%
21                       15  0.4%
23                       13  0.3%
Other values (39)        74  1.8%

Length

Histogram of lengths of the category
Value                 Count  Frequency (%)
00                     3683  92.0%
97                       62  1.5%
98                       62  1.5%
24                       24  0.6%
22                       21  0.5%
36                       17  0.4%
38                       16  0.4%
37                       16  0.4%
21                       15  0.4%
23                       13  0.3%
Other values (39)        74  1.8%

세트번호 (set number)
Text

Distinct: 432
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.2185861
Min length: 1

Characters and Unicode

Total characters: 12884
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
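
The category counts reported for text variables can be approximated with Python's standard `unicodedata` module, which exposes the general category of each code point. This is a sketch of the idea rather than the profiling tool's own implementation; `category_counts` and the name mapping are illustrative, and script and block names are not available from this module.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general categories that appear in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows listed for this variable.
sample = ["2-01", "2-A", "2-A", "2-A", "2-A"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter" (`Lo`), which is why that category dominates in Korean text columns.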

Unique

Unique: 129
Unique (%): 3.2%

Sample

1st row: 2-01
2nd row: 2-A
3rd row: 2-A
4th row: 2-A
5th row: 2-A

Value                 Count  Frequency (%)
2-a                     418  10.4%
3-01                    281  7.0%
3-02                    198  4.9%
3-03                    175  4.4%
3-04                    142  3.5%
3-05                    124  3.1%
1                       106  2.6%
3-06                    102  2.5%
3-07                     84  2.1%
3-08                     70  1.7%
Other values (422)     2303  57.5%

Most occurring characters

Value                 Count  Frequency (%)
3                      3134  24.3%
-                      2378  18.5%
0                      1645  12.8%
2                      1639  12.7%
1                      1285  10.0%
4                       606  4.7%
5                       441  3.4%
A                       437  3.4%
6                       356  2.8%
7                       315  2.4%
Other values (20)       648  5.0%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number         9999  77.6%
Dash Punctuation       2378  18.5%
Uppercase Letter        506  3.9%
Lowercase Letter          1  < 0.1%

Most frequent character per category

Uppercase Letter
Value                 Count  Frequency (%)
A                       437  86.4%
B                        27  5.3%
C                         9  1.8%
D                         7  1.4%
E                         5  1.0%
F                         4  0.8%
H                         3  0.6%
G                         3  0.6%
I                         2  0.4%
Q                         1  0.2%
Other values (8)          8  1.6%

Decimal Number
Value                 Count  Frequency (%)
3                      3134  31.3%
0                      1645  16.5%
2                      1639  16.4%
1                      1285  12.9%
4                       606  6.1%
5                       441  4.4%
6                       356  3.6%
7                       315  3.2%
8                       297  3.0%
9                       281  2.8%

Dash Punctuation
Value                 Count  Frequency (%)
-                      2378  100.0%

Lowercase Letter
Value                 Count  Frequency (%)
k                         1  100.0%

Most occurring scripts

Value                 Count  Frequency (%)
Common                12377  96.1%
Latin                   507  3.9%

Most frequent character per script

Latin
Value                 Count  Frequency (%)
A                       437  86.2%
B                        27  5.3%
C                         9  1.8%
D                         7  1.4%
E                         5  1.0%
F                         4  0.8%
H                         3  0.6%
G                         3  0.6%
I                         2  0.4%
Q                         1  0.2%
Other values (9)          9  1.8%

Common
Value                 Count  Frequency (%)
3                      3134  25.3%
-                      2378  19.2%
0                      1645  13.3%
2                      1639  13.2%
1                      1285  10.4%
4                       606  4.9%
5                       441  3.6%
6                       356  2.9%
7                       315  2.5%
8                       297  2.4%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 12884  100.0%

Most frequent character per block

ASCII
Value                 Count  Frequency (%)
3                      3134  24.3%
-                      2378  18.5%
0                      1645  12.8%
2                      1639  12.7%
1                      1285  10.0%
4                       606  4.7%
5                       441  3.4%
A                       437  3.4%
6                       356  2.8%
7                       315  2.4%
Other values (20)       648  5.0%

세트명 (set name)
Text

Distinct: 983
Distinct (%): 24.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

Length

Max length: 25
Median length: 6
Mean length: 5.3792156
Min length: 1

Characters and Unicode

Total characters: 21533
Distinct characters: 304
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
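
The length statistics and the character total are linked: the mean length equals total characters divided by the number of rows. The sketch below assumes lengths are counted in Unicode code points (as Python's `len` does) and that Hangul is stored as precomposed syllables, so each syllable counts once. `length_summary` is an illustrative helper.

```python
from statistics import mean, median


def length_summary(values):
    """Summarize string lengths, counted in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


# The five sample rows listed for this variable.
sample = ["공통-01형", "공통-A형", "공통-A형", "공통-A형", "공통-A형"]
summary = length_summary(sample)
print(summary)

# Cross-check with the report's own totals: 21533 characters over 4003 rows.
mean_length = 21533 / 4003
print(f"{mean_length:.7f}")  # prints "5.3792156"
```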

Unique

Unique: 714
Unique (%): 17.8%

Sample

1st row: 공통-01형
2nd row: 공통-A형
3rd row: 공통-A형
4th row: 공통-A형
5th row: 공통-A형

Value                 Count  Frequency (%)
공통-a형                439  10.9%
개별-01형               271  6.7%
개별-02형               192  4.8%
개별-03형               174  4.3%
개별-04형               145  3.6%
개별-05형               129  3.2%
개별-06형               103  2.6%
개별-07형                85  2.1%
개별-08형                75  1.9%
개별-09형                67  1.7%
Other values (975)     2344  58.3%

Most occurring characters

Value                 Count  Frequency (%)
-                      3066  14.2%
                       3022  14.0%
                       2276  10.6%
                       2268  10.5%
1                      1652  7.7%
0                      1528  7.1%
2                      1181  5.5%
3                       856  4.0%
A                       724  3.4%
4                       680  3.2%
Other values (294)     4280  19.9%

Most occurring categories

Value                 Count  Frequency (%)
Other Letter           9944  46.2%
Decimal Number         7348  34.1%
Dash Punctuation       3066  14.2%
Uppercase Letter        963  4.5%
Other Punctuation       106  0.5%
Close Punctuation        35  0.2%
Open Punctuation         35  0.2%
Space Separator          21  0.1%
Lowercase Letter          9  < 0.1%
Connector Punctuation     6  < 0.1%

Most frequent character per category

Other Letter
Value                 Count  Frequency (%)
                       3022  30.4%
                       2276  22.9%
                       2268  22.8%
                        603  6.1%
                        599  6.0%
                         66  0.7%
                         65  0.7%
                         46  0.5%
                         41  0.4%
                         36  0.4%
Other values (252)      922  9.3%

Uppercase Letter
Value                 Count  Frequency (%)
A                       724  75.2%
B                        75  7.8%
C                        68  7.1%
D                        52  5.4%
E                         7  0.7%
N                         5  0.5%
F                         4  0.4%
S                         4  0.4%
G                         4  0.4%
H                         4  0.4%
Other values (9)         16  1.7%

Decimal Number
Value                 Count  Frequency (%)
1                      1652  22.5%
0                      1528  20.8%
2                      1181  16.1%
3                       856  11.6%
4                       680  9.3%
5                       418  5.7%
6                       345  4.7%
7                       255  3.5%
8                       233  3.2%
9                       200  2.7%

Lowercase Letter
Value                 Count  Frequency (%)
l                         2  22.2%
d                         2  22.2%
o                         2  22.2%
i                         1  11.1%
r                         1  11.1%
k                         1  11.1%

Other Punctuation
Value                 Count  Frequency (%)
.                       102  96.2%
,                         4  3.8%

Dash Punctuation
Value                 Count  Frequency (%)
-                      3066  100.0%

Close Punctuation
Value                 Count  Frequency (%)
)                        35  100.0%

Open Punctuation
Value                 Count  Frequency (%)
(                        35  100.0%

Space Separator
Value                 Count  Frequency (%)
                         21  100.0%

Connector Punctuation
Value                 Count  Frequency (%)
_                         6  100.0%

Most occurring scripts

Value                 Count  Frequency (%)
Common                10617  49.3%
Hangul                 9944  46.2%
Latin                   972  4.5%

Most frequent character per script

Hangul
Value                 Count  Frequency (%)
                       3022  30.4%
                       2276  22.9%
                       2268  22.8%
                        603  6.1%
                        599  6.0%
                         66  0.7%
                         65  0.7%
                         46  0.5%
                         41  0.4%
                         36  0.4%
Other values (252)      922  9.3%

Latin
Value                 Count  Frequency (%)
A                       724  74.5%
B                        75  7.7%
C                        68  7.0%
D                        52  5.3%
E                         7  0.7%
N                         5  0.5%
F                         4  0.4%
S                         4  0.4%
G                         4  0.4%
H                         4  0.4%
Other values (15)        25  2.6%

Common
Value                 Count  Frequency (%)
-                      3066  28.9%
1                      1652  15.6%
0                      1528  14.4%
2                      1181  11.1%
3                       856  8.1%
4                       680  6.4%
5                       418  3.9%
6                       345  3.2%
7                       255  2.4%
8                       233  2.2%
Other values (7)        403  3.8%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 11589  53.8%
Hangul                 9944  46.2%

Most frequent character per block

ASCII
Value                 Count  Frequency (%)
-                      3066  26.5%
1                      1652  14.3%
0                      1528  13.2%
2                      1181  10.2%
3                       856  7.4%
A                       724  6.2%
4                       680  5.9%
5                       418  3.6%
6                       345  3.0%
7                       255  2.2%
Other values (32)       884  7.6%

Hangul
Value                 Count  Frequency (%)
                       3022  30.4%
                       2276  22.9%
                       2268  22.8%
                        603  6.1%
                        599  6.0%
                         66  0.7%
                         65  0.7%
                         46  0.5%
                         41  0.4%
                         36  0.4%
Other values (252)      922  9.3%

Interactions


Correlations

              종목코드  선택분야코드
종목코드      1.000     0.602
선택분야코드  0.602     1.000

              종목코드  선택분야코드
종목코드      1.000     0.261
선택분야코드  0.261     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

      종목코드  선택분야코드  세트번호  세트명
0     2521      71            2-01      공통-01형
1     9210      00            2-A       공통-A형
2     2104      00            2-A       공통-A형
3     6791      20            2-A       공통-A형
4     2047      00            2-A       공통-A형
5     2264      00            2-A       공통-A형
6     7926      00            2-A       공통-A형
7     1296      00            2-06      공통-06형
8     7864      00            2-A       공통-A형
9     7889      00            2-A       공통-A형

      종목코드  선택분야코드  세트번호  세트명
3993  6176      00            313       개별-13형
3994  6176      00            314       개별-14형
3995  3923      00            340       개별-42
3996  1322      00            308       개별-08형
3997  1322      00            309       개별-09형
3998  6790      00            13        개별-13
3999  2974      00            3         3형
4000  2324      00            3         개별3형
4001  6291      00            341       표준-01형
4002  1581      00            305       개별-05형