gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	104
Missing cells	100
Missing cells (%)	24.0%
Duplicate rows	1
Duplicate rows (%)	1.0%
Total size in memory	3.7 KiB
Average record size in memory	36.3 B
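
The missing-cell percentage follows from the counts above. A minimal sketch of the arithmetic, assuming the denominator is simply variables × observations (the variable names are illustrative, not taken from the profiler):

```python
# Figures taken from the dataset statistics above.
n_variables = 4
n_observations = 104
missing_cells = 100

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

print(total_cells)
print(f"{missing_pct:.1f}%")
```

Under that assumption the 100 missing cells are a quarter of the table only approximately: they all come from a single column, which is why that column alone is reported as 96.2% missing.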

Variable types

Text	1
Categorical	3

Dataset

Description	Results of the 2022 Daejeon Job Fair (number of participating companies, number of job openings offered by companies, number of job seekers hired)
URL	https://www.data.go.kr/data/15081175/fileData.do

Alerts

Dataset has 1 (1.0%) duplicate row	Duplicates
`참여기업` is highly overall correlated with `구인인원` and 1 other field	High correlation
`구인인원` is highly overall correlated with `참여기업` and 1 other field	High correlation
`취업인원` is highly overall correlated with `참여기업` and 1 other field	High correlation
`참여기업` is highly imbalanced (86.6%)	Imbalance
`구인인원` is highly imbalanced (86.6%)	Imbalance
`취업인원` is highly imbalanced (86.6%)	Imbalance
`구 분` has 100 (96.2%) missing values	Missing
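
The 86.6% imbalance figure can be reproduced from the value counts of any of the three categorical columns. The sketch below assumes the score is one minus the Shannon entropy normalised by the maximum entropy for the number of classes; this definition is an assumption that happens to match the reported value, and `imbalance_score` is an illustrative name.

```python
import math

# Value counts of one categorical column, as listed under Common Values:
# the placeholder "<NA>" occurs 100 times and four other values occur once each.
counts = [100, 1, 1, 1, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits
    and k the number of distinct classes (assumed definition)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A perfectly uniform column would score 0 under this definition, and a column dominated by one value approaches 1.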

Reproduction

Analysis started	2023-12-12 09:38:50.734673
Analysis finished	2023-12-12 09:38:51.191423
Duration	0.46 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

구 분
Text

MISSING

Distinct	4
Distinct (%)	100.0%
Missing	100
Missing (%)	96.2%
Memory size	964.0 B

Length

Max length	7
Median length	6.5
Mean length	5.75
Min length	4

Characters and Unicode

Total characters	23
Distinct characters	19
Distinct categories	4
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
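
As an illustration of that idea, the character totals and general-category counts for this column can be recomputed with Python's standard `unicodedata` module. This is a sketch over the four non-missing values shown in this variable's sample rows, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# The four non-missing values of the column, copied from its sample rows.
values = ["청년특화기업", "공사, 공단", "IT 정보통신", "일반기업"]

chars = "".join(values)

# General category of each code point, e.g. "Lo" (Other Letter),
# "Lu" (Uppercase Letter), "Zs" (Space Separator), "Po" (Other Punctuation).
categories = Counter(unicodedata.category(ch) for ch in chars)

print(len(chars))       # total characters
print(len(set(chars)))  # distinct characters
for code, count in categories.most_common():
    print(code, count, f"{100 * count / len(chars):.1f}%")
```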

Unique

Unique	4
Unique (%)	100.0%

Sample

1st row	청년특화기업
2nd row	공사, 공단
3rd row	IT 정보통신
4th row	일반기업

Value	Count	Frequency (%)
청년특화기업	1	16.7%
공사	1	16.7%
공단	1	16.7%
it	1	16.7%
정보통신	1	16.7%
일반기업	1	16.7%
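
The word counts in the table above are consistent with lowercasing each value and splitting it on whitespace and punctuation. A minimal sketch under that assumption (the profiler's actual tokenizer may differ):

```python
import re
from collections import Counter

# The four non-missing values of the column, copied from its sample rows.
values = ["청년특화기업", "공사, 공단", "IT 정보통신", "일반기업"]

# Lowercase each value and keep runs of word characters as tokens.
words = Counter(
    token
    for value in values
    for token in re.findall(r"\w+", value.lower())
)

total = sum(words.values())
for word, count in words.items():
    print(word, count, f"{100 * count / total:.1f}%")
```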

Most occurring characters

Value	Count	Frequency (%)
(space)	2	8.7%
기	2	8.7%
업	2	8.7%
공	2	8.7%
I	1	4.3%
일	1	4.3%
신	1	4.3%
통	1	4.3%
보	1	4.3%
정	1	4.3%
Other values (9)	9	39.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	18	78.3%
Space Separator	2	8.7%
Uppercase Letter	2	8.7%
Other Punctuation	1	4.3%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
기	2	11.1%
업	2	11.1%
공	2	11.1%
일	1	5.6%
신	1	5.6%
통	1	5.6%
보	1	5.6%
정	1	5.6%
청	1	5.6%
단	1	5.6%
Other values (5)	5	27.8%

Uppercase Letter

Value	Count	Frequency (%)
I	1	50.0%
T	1	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	2	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	18	78.3%
Common	3	13.0%
Latin	2	8.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
기	2	11.1%
업	2	11.1%
공	2	11.1%
일	1	5.6%
신	1	5.6%
통	1	5.6%
보	1	5.6%
정	1	5.6%
청	1	5.6%
단	1	5.6%
Other values (5)	5	27.8%

Common

Value	Count	Frequency (%)
(space)	2	66.7%
,	1	33.3%

Latin

Value	Count	Frequency (%)
I	1	50.0%
T	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	18	78.3%
ASCII	5	21.7%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	2	40.0%
I	1	20.0%
T	1	20.0%
,	1	20.0%

Hangul

Value	Count	Frequency (%)
기	2	11.1%
업	2	11.1%
공	2	11.1%
일	1	5.6%
신	1	5.6%
통	1	5.6%
보	1	5.6%
정	1	5.6%
청	1	5.6%
단	1	5.6%
Other values (5)	5	27.8%

참여기업
Categorical

HIGH CORRELATION IMBALANCE

Distinct	5
Distinct (%)	4.8%
Missing	0
Missing (%)	0.0%
Memory size	964.0 B

<NA>	100
39	1
11	1
12	1
71	1

Length

Max length	4
Median length	4
Mean length	3.9230769
Min length	2

Unique

Unique	4
Unique (%)	3.8%

Sample

1st row	39
2nd row	11
3rd row	12
4th row	71
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	100	96.2%
39	1	1.0%
11	1	1.0%
12	1	1.0%
71	1	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the common values tabulated above.

구인인원
Categorical

HIGH CORRELATION IMBALANCE

Distinct	5
Distinct (%)	4.8%
Missing	0
Missing (%)	0.0%
Memory size	964.0 B

<NA>	100
108	1
31	1
44	1
383	1

Length

Max length	4
Median length	4
Mean length	3.9423077
Min length	2

Unique

Unique	4
Unique (%)	3.8%

Sample

1st row	108
2nd row	31
3rd row	44
4th row	383
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	100	96.2%
108	1	1.0%
31	1	1.0%
44	1	1.0%
383	1	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the common values tabulated above.

취업인원
Categorical

HIGH CORRELATION IMBALANCE

Distinct	5
Distinct (%)	4.8%
Missing	0
Missing (%)	0.0%
Memory size	964.0 B

<NA>	100
27	1
4	1
2	1
131	1

Length

Max length	4
Median length	4
Mean length	3.9134615
Min length	1
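
The mean lengths reported for the three categorical columns are consistent with missing entries having been stored as the literal four-character string `<NA>`, which would also explain why these columns report 0 missing values. The sketch below checks that arithmetic; the placeholder assumption and all names in the code are illustrative.

```python
# Non-missing values per column, copied from the sample rows. The remaining
# 100 rows are assumed to hold the 4-character string "<NA>".
columns = {
    "참여기업": ["39", "11", "12", "71"],
    "구인인원": ["108", "31", "44", "383"],
    "취업인원": ["27", "4", "2", "131"],
}
PLACEHOLDER = "<NA>"
N_PLACEHOLDER_ROWS = 100

mean_lengths = {}
for name, present in columns.items():
    cells = present + [PLACEHOLDER] * N_PLACEHOLDER_ROWS
    mean_lengths[name] = sum(len(cell) for cell in cells) / len(cells)
    print(name, round(mean_lengths[name], 7))
```

If the numbers are needed for analysis, the columns would first have to be converted from text to a numeric type with the placeholder treated as missing.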

Unique

Unique	4
Unique (%)	3.8%

Sample

1st row	27
2nd row	4
3rd row	2
4th row	131
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	100	96.2%
27	1	1.0%
4	1	1.0%
2	1	1.0%
131	1	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Plot of the common values tabulated above.

Correlations

Heatmap
Table

	구 분	참여기업	구인인원	취업인원
구 분	1.000	1.000	1.000	1.000
참여기업	1.000	1.000	1.000	1.000
구인인원	1.000	1.000	1.000	1.000
취업인원	1.000	1.000	1.000	1.000

Heatmap
Table

	참여기업	구인인원	취업인원
참여기업	1.000	1.000	1.000
구인인원	1.000	1.000	1.000
취업인원	1.000	1.000	1.000

Heatmap
Table

	참여기업	구인인원	취업인원
참여기업	1.000	1.000	1.000
구인인원	1.000	1.000	1.000
취업인원	1.000	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	구 분	참여기업	구인인원	취업인원
0	청년특화기업	39	108	27
1	공사, 공단	11	31	4
2	IT 정보통신	12	44	2
3	일반기업	71	383	131
4	<NA>	<NA>	<NA>	<NA>
5	<NA>	<NA>	<NA>	<NA>
6	<NA>	<NA>	<NA>	<NA>
7	<NA>	<NA>	<NA>	<NA>
8	<NA>	<NA>	<NA>	<NA>
9	<NA>	<NA>	<NA>	<NA>

Last rows

	구 분	참여기업	구인인원	취업인원
94	<NA>	<NA>	<NA>	<NA>
95	<NA>	<NA>	<NA>	<NA>
96	<NA>	<NA>	<NA>	<NA>
97	<NA>	<NA>	<NA>	<NA>
98	<NA>	<NA>	<NA>	<NA>
99	<NA>	<NA>	<NA>	<NA>
100	<NA>	<NA>	<NA>	<NA>
101	<NA>	<NA>	<NA>	<NA>
102	<NA>	<NA>	<NA>	<NA>
103	<NA>	<NA>	<NA>	<NA>

Duplicate rows

Most frequently occurring

	구 분	참여기업	구인인원	취업인원	# duplicates
0	<NA>	<NA>	<NA>	<NA>	100
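
The alert of 1 duplicate row (1.0%) and the 100 occurrences shown here can be reconciled if the headline figure counts distinct row patterns that repeat rather than the number of repeated rows. The sketch below illustrates that reading; it is an assumption about how the figure is derived, and `NA` is a stand-in for a missing cell.

```python
from collections import Counter

NA = None  # stand-in for a missing cell

# Rows as shown in the sample: four populated rows followed by 100 empty ones.
rows = [
    ("청년특화기업", "39", "108", "27"),
    ("공사, 공단", "11", "31", "4"),
    ("IT 정보통신", "12", "44", "2"),
    ("일반기업", "71", "383", "131"),
] + [(NA, NA, NA, NA)] * 100

occurrences = Counter(rows)

# Keep only the row patterns that appear more than once.
repeated = {row: n for row, n in occurrences.items() if n > 1}

print(len(repeated))                               # distinct repeated patterns
print(f"{100 * len(repeated) / len(rows):.1f}%")   # share of all rows
print(repeated)
```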
