gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	4003
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	129.1 KiB
Average record size in memory	33.0 B
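
The average record size can be checked from the two figures above it: total memory divided by the number of observations. A minimal sketch, assuming 1 KiB = 1024 bytes and using only the rounded values shown in the table:

```python
# Values copied from the Dataset statistics table (already rounded in the report).
total_size_kib = 129.1
n_observations = 4003

# Assumption: KiB means 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```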

Variable types

Numeric	1
Categorical	1
Text	2

Dataset

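Description	Provides information such as set numbers, set names, and type numbers, by item and type, for the national qualifications (national technical qualifications and national professional qualifications) administered by the agency (Q-Net).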
URL	https://www.data.go.kr/data/15120656/fileData.do

Alerts

선택분야코드 is highly imbalanced (87.0%)
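
The 87.0% figure is an imbalance score. The sketch below shows one common way to compute such a score, assuming it is one minus the Shannon entropy of the class frequencies, normalized by log2 of the number of classes. The function name `imbalance_score` and the toy counts are hypothetical; the column's full value counts are not reproduced in this report, so the sketch does not attempt to recover 87.0% itself.

```python
import math

def imbalance_score(counts):
    """Hypothetical helper: 1 - H / log2(k) for a list of class counts."""
    k = len(counts)
    if k <= 1:
        return 0.0  # a single class carries no imbalance under this definition
    total = sum(counts)
    # Shannon entropy in bits; zero-count classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Toy data, not the real column: four equally frequent classes vs. one dominant class.
balanced = imbalance_score([25, 25, 25, 25])
skewed = imbalance_score([97, 1, 1, 1])
print(f"balanced={balanced:.3f} skewed={skewed:.3f}")
```

Under this definition a score of 0 means the classes are equally frequent, and a score near 1 means a single class dominates, as the value 00 does here with 92.0% of the rows.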

Reproduction

Analysis started	2023-12-12 00:50:06.095854
Analysis finished	2023-12-12 00:50:06.721075
Duration	0.63 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

종목코드
Real number (ℝ)

Distinct	665
Distinct (%)	16.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	5662.4929

Minimum	1021
Maximum	9728
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	35.3 KiB

Quantile statistics

Minimum	1021
5-th percentile	1322
Q1	2450
median	6892
Q3	7932
95-th percentile	8670
Maximum	9728
Range	8707
Interquartile range (IQR)	5482

Descriptive statistics

Standard deviation	2725.8407
Coefficient of variation (CV)	0.48138528
Kurtosis	-1.4738492
Mean	5662.4929
Median Absolute Deviation (MAD)	1075
Skewness	-0.46735447
Sum	22666959
Variance	7430207.7
Monotonicity	Not monotonic
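
Several of the figures above are derived from others in the same tables. A minimal sketch that recomputes them from the reported values; since the inputs are rounded as printed, the variance in particular matches the report only approximately:

```python
# Inputs copied from the quantile and descriptive statistics tables.
minimum, maximum = 1021, 9728
q1, q3 = 2450, 7932
total, n = 22666959, 4003
std = 2725.8407

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
mean = total / n                  # Mean = Sum / Number of observations
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance, from the rounded standard deviation

print(value_range, iqr, round(mean, 4), round(cv, 6), round(variance, 1))
```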

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
7937	229	5.7%
7625	133	3.3%
7947	125	3.1%
7957	100	2.5%
3922	80	2.0%
7910	53	1.3%
1530	52	1.3%
1560	52	1.3%
6592	49	1.2%
7620	44	1.1%
Other values (655)	3086	77.1%

Minimum 10 values

Value	Count	Frequency (%)
1021	1	< 0.1%
1022	2	< 0.1%
1023	1	< 0.1%
1024	1	< 0.1%
1025	1	< 0.1%
1030	1	< 0.1%
1040	1	< 0.1%
1048	1	< 0.1%
1050	1	< 0.1%
1051	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9728	1	< 0.1%
9700	1	< 0.1%
9699	1	< 0.1%
9698	1	< 0.1%
9697	1	< 0.1%
9696	1	< 0.1%
9685	1	< 0.1%
9672	1	< 0.1%
9545	9	0.2%
9544	1	< 0.1%

선택분야코드
Categorical

Distinct	49
Distinct (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	31.4 KiB

00	3683
98	62
97	62
24	24
22	21
Other values (44)	151

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Unique

Unique	27
Unique (%)	0.7%

Sample

1st row	71
2nd row	00
3rd row	00
4th row	20
5th row	00

Common Values

Value	Count	Frequency (%)
00	3683	92.0%
98	62	1.5%
97	62	1.5%
24	24	0.6%
22	21	0.5%
36	17	0.4%
38	16	0.4%
37	16	0.4%
21	15	0.4%
23	13	0.3%
Other values (39)	74	1.8%

Length

Histogram of lengths of the category


세트번호
Text

Distinct	432
Distinct (%)	10.8%
Missing	0
Missing (%)	0.0%
Memory size	31.4 KiB

Length

Max length	4
Median length	3
Mean length	3.2185861
Min length	1

Characters and Unicode

Total characters	12884
Distinct characters	30
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
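
A minimal sketch of how characters can be grouped by their Unicode general category, using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the two-letter codes it returns correspond to the category names used in the tables of this report (`Nd` = Decimal Number, `Pd` = Dash Punctuation, `Lu` = Uppercase Letter, `Lo` = Other Letter):

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Hypothetical helper: count characters by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Sample values taken from the 세트번호 and 세트명 columns.
set_number = category_counts("2-A")
set_name = category_counts("공통-01형")

print(dict(set_number))
print(dict(set_name))
```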

Unique

Unique	129
Unique (%)	3.2%

Sample

1st row	2-01
2nd row	2-A
3rd row	2-A
4th row	2-A
5th row	2-A

Value	Count	Frequency (%)
2-a	418	10.4%
3-01	281	7.0%
3-02	198	4.9%
3-03	175	4.4%
3-04	142	3.5%
3-05	124	3.1%
1	106	2.6%
3-06	102	2.5%
3-07	84	2.1%
3-08	70	1.7%
Other values (422)	2303	57.5%

Most occurring characters

Value	Count	Frequency (%)
3	3134	24.3%
-	2378	18.5%
0	1645	12.8%
2	1639	12.7%
1	1285	10.0%
4	606	4.7%
5	441	3.4%
A	437	3.4%
6	356	2.8%
7	315	2.4%
Other values (20)	648	5.0%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	9999	77.6%
Dash Punctuation	2378	18.5%
Uppercase Letter	506	3.9%
Lowercase Letter	1	< 0.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	437	86.4%
B	27	5.3%
C	9	1.8%
D	7	1.4%
E	5	1.0%
F	4	0.8%
H	3	0.6%
G	3	0.6%
I	2	0.4%
Q	1	0.2%
Other values (8)	8	1.6%

Decimal Number

Value	Count	Frequency (%)
3	3134	31.3%
0	1645	16.5%
2	1639	16.4%
1	1285	12.9%
4	606	6.1%
5	441	4.4%
6	356	3.6%
7	315	3.2%
8	297	3.0%
9	281	2.8%

Dash Punctuation

Value	Count	Frequency (%)
-	2378	100.0%

Lowercase Letter

Value	Count	Frequency (%)
k	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	12377	96.1%
Latin	507	3.9%

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	437	86.2%
B	27	5.3%
C	9	1.8%
D	7	1.4%
E	5	1.0%
F	4	0.8%
H	3	0.6%
G	3	0.6%
I	2	0.4%
Q	1	0.2%
Other values (9)	9	1.8%

Common

Value	Count	Frequency (%)
3	3134	25.3%
-	2378	19.2%
0	1645	13.3%
2	1639	13.2%
1	1285	10.4%
4	606	4.9%
5	441	3.6%
6	356	2.9%
7	315	2.5%
8	297	2.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	12884	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
3	3134	24.3%
-	2378	18.5%
0	1645	12.8%
2	1639	12.7%
1	1285	10.0%
4	606	4.7%
5	441	3.4%
A	437	3.4%
6	356	2.8%
7	315	2.4%
Other values (20)	648	5.0%

세트명
Text

Distinct	983
Distinct (%)	24.6%
Missing	0
Missing (%)	0.0%
Memory size	31.4 KiB

Length

Max length	25
Median length	6
Mean length	5.3792156
Min length	1

Characters and Unicode

Total characters	21533
Distinct characters	304
Distinct categories	10
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
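
The block counts reported for this column can be approximated by checking code point ranges directly. A minimal sketch, assuming "ASCII" means U+0000 to U+007F and that the report's "Hangul" block corresponds to the Hangul Syllables block (U+AC00 to U+D7AF); the helper names are hypothetical:

```python
from collections import Counter

def block_of(ch):
    """Hypothetical helper: map a character to a coarse Unicode block name."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul Syllables"
    return "Other"  # anything outside the two ranges this sketch knows about

def block_counts(text):
    return Counter(block_of(ch) for ch in text)

# Sample value taken from the 세트명 column.
counts = block_counts("공통-A형")
print(dict(counts))
```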

Unique

Unique	714
Unique (%)	17.8%

Sample

1st row	공통-01형
2nd row	공통-A형
3rd row	공통-A형
4th row	공통-A형
5th row	공통-A형

Value	Count	Frequency (%)
공통-a형	439	10.9%
개별-01형	271	6.7%
개별-02형	192	4.8%
개별-03형	174	4.3%
개별-04형	145	3.6%
개별-05형	129	3.2%
개별-06형	103	2.6%
개별-07형	85	2.1%
개별-08형	75	1.9%
개별-09형	67	1.7%
Other values (975)	2344	58.3%

Most occurring characters

Value	Count	Frequency (%)
-	3066	14.2%
형	3022	14.0%
개	2276	10.6%
별	2268	10.5%
1	1652	7.7%
0	1528	7.1%
2	1181	5.5%
3	856	4.0%
A	724	3.4%
4	680	3.2%
Other values (294)	4280	19.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	9944	46.2%
Decimal Number	7348	34.1%
Dash Punctuation	3066	14.2%
Uppercase Letter	963	4.5%
Other Punctuation	106	0.5%
Close Punctuation	35	0.2%
Open Punctuation	35	0.2%
Space Separator	21	0.1%
Lowercase Letter	9	< 0.1%
Connector Punctuation	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
형	3022	30.4%
개	2276	22.9%
별	2268	22.8%
공	603	6.1%
통	599	6.0%
표	66	0.7%
준	65	0.7%
제	46	0.5%
삭	41	0.4%
이	36	0.4%
Other values (252)	922	9.3%

Uppercase Letter

Value	Count	Frequency (%)
A	724	75.2%
B	75	7.8%
C	68	7.1%
D	52	5.4%
E	7	0.7%
N	5	0.5%
F	4	0.4%
S	4	0.4%
G	4	0.4%
H	4	0.4%
Other values (9)	16	1.7%

Decimal Number

Value	Count	Frequency (%)
1	1652	22.5%
0	1528	20.8%
2	1181	16.1%
3	856	11.6%
4	680	9.3%
5	418	5.7%
6	345	4.7%
7	255	3.5%
8	233	3.2%
9	200	2.7%

Lowercase Letter

Value	Count	Frequency (%)
l	2	22.2%
d	2	22.2%
o	2	22.2%
i	1	11.1%
r	1	11.1%
k	1	11.1%

Other Punctuation

Value	Count	Frequency (%)
.	102	96.2%
,	4	3.8%

Dash Punctuation

Value	Count	Frequency (%)
-	3066	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	35	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	35	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	21	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	10617	49.3%
Hangul	9944	46.2%
Latin	972	4.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
형	3022	30.4%
개	2276	22.9%
별	2268	22.8%
공	603	6.1%
통	599	6.0%
표	66	0.7%
준	65	0.7%
제	46	0.5%
삭	41	0.4%
이	36	0.4%
Other values (252)	922	9.3%

Latin

Value	Count	Frequency (%)
A	724	74.5%
B	75	7.7%
C	68	7.0%
D	52	5.3%
E	7	0.7%
N	5	0.5%
F	4	0.4%
S	4	0.4%
G	4	0.4%
H	4	0.4%
Other values (15)	25	2.6%

Common

Value	Count	Frequency (%)
-	3066	28.9%
1	1652	15.6%
0	1528	14.4%
2	1181	11.1%
3	856	8.1%
4	680	6.4%
5	418	3.9%
6	345	3.2%
7	255	2.4%
8	233	2.2%
Other values (7)	403	3.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	11589	53.8%
Hangul	9944	46.2%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
-	3066	26.5%
1	1652	14.3%
0	1528	13.2%
2	1181	10.2%
3	856	7.4%
A	724	6.2%
4	680	5.9%
5	418	3.6%
6	345	3.0%
7	255	2.2%
Other values (32)	884	7.6%

Hangul

Value	Count	Frequency (%)
형	3022	30.4%
개	2276	22.9%
별	2268	22.8%
공	603	6.1%
통	599	6.0%
표	66	0.7%
준	65	0.7%
제	46	0.5%
삭	41	0.4%
이	36	0.4%
Other values (252)	922	9.3%

Interactions: 종목코드 (x) vs. 종목코드 (y)

Phik (φk)
Auto


	종목코드	선택분야코드
종목코드	1.000	0.602
선택분야코드	0.602	1.000


	종목코드	선택분야코드
종목코드	1.000	0.261
선택분야코드	0.261	1.000

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	종목코드	선택분야코드	세트번호	세트명
0	2521	71	2-01	공통-01형
1	9210	00	2-A	공통-A형
2	2104	00	2-A	공통-A형
3	6791	20	2-A	공통-A형
4	2047	00	2-A	공통-A형
5	2264	00	2-A	공통-A형
6	7926	00	2-A	공통-A형
7	1296	00	2-06	공통-06형
8	7864	00	2-A	공통-A형
9	7889	00	2-A	공통-A형

Last rows

	종목코드	세트번호	세트명
3993	6176	313	개별-13형
3994	6176	314	개별-14형
3995	3923	340	개별-42
3996	1322	308	개별-08형
3997	1322	309	개별-09형
3998	6790	13	개별-13
3999	2974	3	3형
4000	2324	3	개별3형
4001	6291	341	표준-01형
4002	1581	305	개별-05형
