gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	542
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	41
Duplicate rows (%)	7.6%
Total size in memory	13.4 KiB
Average record size in memory	25.2 B
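
The percentages in this report are counts divided by the number of observations (542), shown to one decimal place. A minimal sketch of that arithmetic follows; the helper name `percent` is an illustration, not part of ydata-profiling.

```python
def percent(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


N_ROWS = 542  # "Number of observations" in the report

print(percent(41, N_ROWS))   # duplicate rows
print(percent(57, N_ROWS))   # distinct values of 강사명
print(percent(100, N_ROWS))  # rows whose name starts with 김
```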

Variable types

Text	1
Numeric	1
Categorical	1

Dataset

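Description	Grade information for instructors registered over the past three years at the SME training institute (중소벤처기업연수원) operated by 중소벤처기업진흥공단 (Korea SMEs and Startups Agency). Columns: 강사명 (instructor name), 나이 (age), 급호 (grade).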
Author	중소벤처기업진흥공단
URL	https://www.data.go.kr/data/15124962/fileData.do

Alerts

Dataset has 41 (7.6%) duplicate rows

Duplicates

Reproduction

Analysis started	2023-12-12 20:10:17.873077
Analysis finished	2023-12-12 20:10:18.264281
Duration	0.39 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
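
The reported duration can be checked against the two timestamps above. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from "Analysis started" and "Analysis finished".
started = datetime.strptime("2023-12-12 20:10:17.873077", FMT)
finished = datetime.strptime("2023-12-12 20:10:18.264281", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```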

강사명
Text

Distinct	57
Distinct (%)	10.5%
Missing	0
Missing (%)	0.0%
Memory size	4.4 KiB

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Characters and Unicode

Total characters	1626
Distinct characters	58
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
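
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below applies it to the five masked names shown in this variable's Sample; it reports the two-letter abbreviations (`Lo` for Other Letter, `Po` for Other Punctuation) rather than the long names used in the report. The standard library does not expose script or block, so those are not covered here.

```python
import unicodedata
from collections import Counter

# The five sample rows of 강사명 from the report.
names = ["박**", "조**", "이**", "조**", "고**"]

# Count characters by Unicode general category.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
print(category_counts)
```

Each name contributes one Hangul letter and two asterisks, which matches the report's split of one third Other Letter to two thirds Other Punctuation.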

Unique

Unique	19
Unique (%)	3.5%
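
Distinct and Unique differ: the sketch below assumes, as these reports are usually read, that Distinct counts the different values present and Unique counts the values that occur exactly once. It uses the five sample rows of 강사명; the variable names are illustrative.

```python
from collections import Counter

# The five sample rows of 강사명 from the report.
names = ["박**", "조**", "이**", "조**", "고**"]

value_counts = Counter(names)
n_distinct = len(value_counts)                               # different values
n_unique = sum(1 for n in value_counts.values() if n == 1)   # values seen once

print(n_distinct, n_unique)
```

Here 조** appears twice, so it counts toward Distinct but not toward Unique.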

Sample

1st row	박**
2nd row	조**
3rd row	이**
4th row	조**
5th row	고**

Value	Count	Frequency (%)
김	100	18.5%
이	80	14.8%
박	43	7.9%
최	31	5.7%
정	19	3.5%
임	17	3.1%
조	15	2.8%
문	14	2.6%
윤	13	2.4%
안	13	2.4%
Other values (47)	197	36.3%

Most occurring characters

Value	Count	Frequency (%)
*	1084	66.7%
김	100	6.2%
이	80	4.9%
박	43	2.6%
최	31	1.9%
정	19	1.2%
임	17	1.0%
조	15	0.9%
문	14	0.9%
윤	13	0.8%
Other values (48)	210	12.9%

Most occurring categories

Value	Count	Frequency (%)
Other Punctuation	1084	66.7%
Other Letter	542	33.3%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
김	100	18.5%
이	80	14.8%
박	43	7.9%
최	31	5.7%
정	19	3.5%
임	17	3.1%
조	15	2.8%
문	14	2.6%
윤	13	2.4%
안	13	2.4%
Other values (47)	197	36.3%

Other Punctuation

Value	Count	Frequency (%)
*	1084	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1084	66.7%
Hangul	542	33.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
김	100	18.5%
이	80	14.8%
박	43	7.9%
최	31	5.7%
정	19	3.5%
임	17	3.1%
조	15	2.8%
문	14	2.6%
윤	13	2.4%
안	13	2.4%
Other values (47)	197	36.3%

Common

Value	Count	Frequency (%)
*	1084	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1084	66.7%
Hangul	542	33.3%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
*	1084	100.0%

Hangul

Value	Count	Frequency (%)
김	100	18.5%
이	80	14.8%
박	43	7.9%
최	31	5.7%
정	19	3.5%
임	17	3.1%
조	15	2.8%
문	14	2.6%
윤	13	2.4%
안	13	2.4%
Other values (47)	197	36.3%

나이
Real number (ℝ)

Distinct	52
Distinct (%)	9.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	48.162362

Minimum	22
Maximum	77
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	4.9 KiB

Quantile statistics

Minimum	22
5-th percentile	32
Q1	40
median	48
Q3	56
95-th percentile	65
Maximum	77
Range	55
Interquartile range (IQR)	16

Descriptive statistics

Standard deviation	10.805259
Coefficient of variation (CV)	0.22435069
Kurtosis	-0.71828918
Mean	48.162362
Median Absolute Deviation (MAD)	8
Skewness	0.089707678
Sum	26104
Variance	116.75363
Monotonicity	Not monotonic
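
Several of the figures above are tied together by simple identities: the mean is the sum over the number of observations, the coefficient of variation is the standard deviation over the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. A short check, using only values printed in this report:

```python
import math

n = 542             # number of observations
total = 26104       # Sum
std = 10.805259     # Standard deviation

mean = total / n
cv = std / mean                 # coefficient of variation
variance = std ** 2
value_range = 77 - 22           # Maximum - Minimum
iqr = 56 - 40                   # Q3 - Q1

print(round(mean, 6), round(cv, 8), round(variance, 4), value_range, iqr)
```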

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
43	29	5.4%
44	23	4.2%
54	23	4.2%
32	20	3.7%
45	18	3.3%
41	18	3.3%
40	18	3.3%
51	17	3.1%
57	17	3.1%
53	17	3.1%
Other values (42)	342	63.1%

Minimum 10 values

Value	Count	Frequency (%)
22	1	0.2%
24	1	0.2%
25	1	0.2%
26	1	0.2%
27	2	0.4%
28	6	1.1%
29	3	0.6%
30	4	0.7%
31	8	1.5%
32	20	3.7%

Maximum 10 values

Value	Count	Frequency (%)
77	1	0.2%
74	2	0.4%
73	2	0.4%
72	1	0.2%
71	2	0.4%
70	2	0.4%
69	2	0.4%
68	6	1.1%
66	5	0.9%
65	10	1.8%

급호
Categorical

Distinct	10
Distinct (%)	1.8%
Missing	0
Missing (%)	0.0%
Memory size	4.4 KiB

2호	122
A등급	98
임시특호	86
1호	71
D등급	46
Other values (5)	119

Length

Max length	4
Median length	3
Mean length	2.7472325
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	S등급
2nd row	S등급
3rd row	S등급
4th row	S등급
5th row	S등급

Common Values

Value	Count	Frequency (%)
2호	122	22.5%
A등급	98	18.1%
임시특호	86	15.9%
1호	71	13.1%
D등급	46	8.5%
특호	37	6.8%
S등급	35	6.5%
B등급	21	3.9%
C등급	19	3.5%
보조강사	7	1.3%
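
The Length statistics reported for 급호 can be recovered from the counts in the table above, taking a label's length to be its number of characters. A minimal sketch with the standard library:

```python
from statistics import mean, median

# Value counts for 급호, copied from the Common Values table.
counts = {
    "2호": 122, "A등급": 98, "임시특호": 86, "1호": 71, "D등급": 46,
    "특호": 37, "S등급": 35, "B등급": 21, "C등급": 19, "보조강사": 7,
}

# One length entry per row of the dataset.
lengths = [len(label) for label, n in counts.items() for _ in range(n)]

print(len(lengths), min(lengths), median(lengths), max(lengths), mean(lengths))
```

The counts sum to the 542 observations, and the mean length works out to 1489 / 542.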

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
2호	122	22.5%
A등급	98	18.1%
임시특호	86	15.9%
1호	71	13.1%
D등급	46	8.5%
특호	37	6.8%
S등급	35	6.5%
B등급	21	3.9%
C등급	19	3.5%
보조강사	7	1.3%

Interactions plot: 나이 against 나이

Phik (φk)
Auto

Heatmap
Table

	강사명	나이	급호
강사명	1.000	0.190	0.000
나이	0.190	1.000	0.612
급호	0.000	0.612	1.000

Heatmap
Table

	나이	급호
나이	1.000	0.231
급호	0.231	1.000

Count
Matrix

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	강사명	나이	급호
0	박**	58	S등급
1	조**	49	S등급
2	이**	63	S등급
3	조**	49	S등급
4	고**	35	S등급
5	이**	37	S등급
6	김**	48	S등급
7	이**	51	S등급
8	오**	52	S등급
9	김**	45	S등급

Last rows

	강사명	나이	급호
532	이**	46	2호
533	천**	54	2호
534	최**	38	2호
535	김**	64	보조강사
536	김**	22	보조강사
537	염**	25	보조강사
538	조**	28	보조강사
539	안**	26	보조강사
540	천**	35	보조강사
541	장**	28	보조강사

Most frequently occurring

	강사명	나이	급호	# duplicates
5	김**	40	2호	3
7	김**	41	1호	3
9	김**	45	A등급	3
26	이**	43	2호	3
0	김**	29	2호	2
1	김**	31	D등급	2
2	김**	32	2호	2
3	김**	32	D등급	2
4	김**	38	임시특호	2
6	김**	40	임시특호	2
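
Duplicate rows are rows that match on all three columns. The sketch below groups identical rows with `collections.Counter`, using the ten rows listed under First rows as input. It illustrates the idea only and does not claim to reproduce how ydata-profiling arrives at its duplicate count.

```python
from collections import Counter

# The first ten rows of the dataset: (강사명, 나이, 급호).
first_rows = [
    ("박**", 58, "S등급"), ("조**", 49, "S등급"), ("이**", 63, "S등급"),
    ("조**", 49, "S등급"), ("고**", 35, "S등급"), ("이**", 37, "S등급"),
    ("김**", 48, "S등급"), ("이**", 51, "S등급"), ("오**", 52, "S등급"),
    ("김**", 45, "S등급"),
]

row_counts = Counter(first_rows)

# Keep only the rows that occur more than once.
duplicate_groups = {row: n for row, n in row_counts.items() if n > 1}
print(duplicate_groups)
```

Because names are masked to a surname followed by `**`, identical rows may describe different people who share a surname, age, and grade.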
