Overview

Dataset statistics

Number of variables: 3
Number of observations: 542
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 41
Duplicate rows (%): 7.6%
Total size in memory: 13.4 KiB
Average record size in memory: 25.2 B

Variable types

Text: 1
Numeric: 1
Categorical: 1

Dataset

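Description: Instructor grade information registered over the past three years at the SME training institute (중소벤처기업연수원) operated by 중소벤처기업진흥공단 (Korea SMEs and Startups Agency). Column names: 강사명 (instructor name), 나이 (age), 급호 (grade).
Author: 중소벤처기업진흥공단 (Korea SMEs and Startups Agency)
URL: https://www.data.go.kr/data/15124962/fileData.do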

Alerts

Duplicates: Dataset has 41 (7.6%) duplicate rows

Reproduction

Analysis started: 2023-12-12 20:10:17.873077
Analysis finished: 2023-12-12 20:10:18.264281
Duration: 0.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

강사명
Text

Distinct: 57
Distinct (%): 10.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1626
Distinct characters: 58
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
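
As a rough illustration of these character properties, the sketch below classifies the characters of the five masked names shown in this variable's sample rows, using Python's standard `unicodedata` module. The helper `in_hangul_syllables` and the block labels are assumptions made for this example (the standard library does not expose block or script names), so this is not necessarily how the profiler computes its counts.

```python
import unicodedata
from collections import Counter

# The five masked names from the report's sample rows.
names = ["박**", "조**", "이**", "조**", "고**"]

def in_hangul_syllables(ch: str) -> bool:
    """Hypothetical helper: True if ch is in the Hangul Syllables block (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

chars = [ch for name in names for ch in name]

# General category per character: 'Lo' = Other Letter, 'Po' = Other Punctuation.
categories = Counter(unicodedata.category(ch) for ch in chars)

# Coarse block label per character (labels chosen for this example only).
blocks = Counter(
    "Hangul" if in_hangul_syllables(ch) else "ASCII" if ord(ch) < 128 else "other"
    for ch in chars
)

print(categories)
print(blocks)
```

Each name has one Hangul letter followed by two asterisks, which is consistent with the 1:2 split between Other Letter and Other Punctuation reported for the full column (542 versus 1084 of 1626 characters).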

Unique

Unique: 19
Unique (%): 3.5%
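
The report distinguishes "Distinct" (57) from "Unique" (19). A minimal sketch of that difference, assuming "unique" means a value that occurs exactly once and using only the five masked names from this variable's sample rows:

```python
from collections import Counter

# The five masked names from the report's sample rows.
names = ["박**", "조**", "이**", "조**", "고**"]

counts = Counter(names)

# Distinct: how many different values appear at all.
distinct = len(counts)

# Unique (assumed definition): values that appear exactly once.
unique = sum(1 for c in counts.values() if c == 1)

print(distinct, unique)
```

In this small sample, 조** appears twice, so it counts as distinct but not as unique.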

Sample

1st row: 박**
2nd row: 조**
3rd row: 이**
4th row: 조**
5th row: 고**
Value              Count  Frequency (%)
(unlabeled)        100    18.5%
(unlabeled)        80     14.8%
(unlabeled)        43     7.9%
(unlabeled)        31     5.7%
(unlabeled)        19     3.5%
(unlabeled)        17     3.1%
(unlabeled)        15     2.8%
(unlabeled)        14     2.6%
(unlabeled)        13     2.4%
(unlabeled)        13     2.4%
Other values (47)  197    36.3%

Most occurring characters

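Value              Count  Frequency (%)
*                  1084   66.7%
(unlabeled)        100    6.2%
(unlabeled)        80     4.9%
(unlabeled)        43     2.6%
(unlabeled)        31     1.9%
(unlabeled)        19     1.2%
(unlabeled)        17     1.0%
(unlabeled)        15     0.9%
(unlabeled)        14     0.9%
(unlabeled)        13     0.8%
Other values (48)  210    12.9%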

Most occurring categories

Value              Count  Frequency (%)
Other Punctuation  1084   66.7%
Other Letter       542    33.3%

Most frequent character per category

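Other Letter
Value              Count  Frequency (%)
(unlabeled)        100    18.5%
(unlabeled)        80     14.8%
(unlabeled)        43     7.9%
(unlabeled)        31     5.7%
(unlabeled)        19     3.5%
(unlabeled)        17     3.1%
(unlabeled)        15     2.8%
(unlabeled)        14     2.6%
(unlabeled)        13     2.4%
(unlabeled)        13     2.4%
Other values (47)  197    36.3%

Other Punctuation
Value  Count  Frequency (%)
*      1084   100.0%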

Most occurring scripts

Value   Count  Frequency (%)
Common  1084   66.7%
Hangul  542    33.3%

Most frequent character per script

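Hangul
Value              Count  Frequency (%)
(unlabeled)        100    18.5%
(unlabeled)        80     14.8%
(unlabeled)        43     7.9%
(unlabeled)        31     5.7%
(unlabeled)        19     3.5%
(unlabeled)        17     3.1%
(unlabeled)        15     2.8%
(unlabeled)        14     2.6%
(unlabeled)        13     2.4%
(unlabeled)        13     2.4%
Other values (47)  197    36.3%

Common
Value  Count  Frequency (%)
*      1084   100.0%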

Most occurring blocks

Value   Count  Frequency (%)
ASCII   1084   66.7%
Hangul  542    33.3%

Most frequent character per block

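ASCII
Value  Count  Frequency (%)
*      1084   100.0%

Hangul
Value              Count  Frequency (%)
(unlabeled)        100    18.5%
(unlabeled)        80     14.8%
(unlabeled)        43     7.9%
(unlabeled)        31     5.7%
(unlabeled)        19     3.5%
(unlabeled)        17     3.1%
(unlabeled)        15     2.8%
(unlabeled)        14     2.6%
(unlabeled)        13     2.4%
(unlabeled)        13     2.4%
Other values (47)  197    36.3%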

나이
Real number (ℝ)

Distinct: 52
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.162362
Minimum: 22
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 KiB

Quantile statistics

Minimum: 22
5-th percentile: 32
Q1: 40
Median: 48
Q3: 56
95-th percentile: 65
Maximum: 77
Range: 55
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 10.805259
Coefficient of variation (CV): 0.22435069
Kurtosis: -0.71828918
Mean: 48.162362
Median Absolute Deviation (MAD): 8
Skewness: 0.089707678
Sum: 26104
Variance: 116.75363
Monotonicity: Not monotonic
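
Several of the 나이 statistics above follow from one another. The sketch below recomputes the derived figures from the reported inputs (sum, observation count, quartiles, extremes, standard deviation), assuming the usual textbook definitions. It is a consistency check on the published numbers, not a reproduction of the profiler's internals.

```python
# Inputs copied from the report.
n = 542                 # number of observations
total = 26104           # Sum
minimum, maximum = 22, 77
q1, q3 = 40, 56
std = 10.805259         # Standard deviation

mean = total / n                 # reported Mean: 48.162362
variance = std ** 2              # reported Variance: 116.75363
cv = std / mean                  # reported CV: 0.22435069
value_range = maximum - minimum  # reported Range: 55
iqr = q3 - q1                    # reported IQR: 16

print(f"mean={mean:.6f} variance={variance:.5f} cv={cv:.8f}")
print(f"range={value_range} iqr={iqr}")
```

Small differences in the last digit are expected, because the report's inputs are themselves rounded.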
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
43                 29     5.4%
44                 23     4.2%
54                 23     4.2%
32                 20     3.7%
45                 18     3.3%
41                 18     3.3%
40                 18     3.3%
51                 17     3.1%
57                 17     3.1%
53                 17     3.1%
Other values (42)  342    63.1%

Minimum 10 values

Value  Count  Frequency (%)
22     1      0.2%
24     1      0.2%
25     1      0.2%
26     1      0.2%
27     2      0.4%
28     6      1.1%
29     3      0.6%
30     4      0.7%
31     8      1.5%
32     20     3.7%

Maximum 10 values

Value  Count  Frequency (%)
77     1      0.2%
74     2      0.4%
73     2      0.4%
72     1      0.2%
71     2      0.4%
70     2      0.4%
69     2      0.4%
68     6      1.1%
66     5      0.9%
65     10     1.8%

급호
Categorical

Distinct: 10
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 KiB

Value             Count
2호               122
A등급             98
임시특호          86
1호               71
D등급             46
Other values (5)  119

Length

Max length: 4
Median length: 3
Mean length: 2.7472325
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S등급
2nd row: S등급
3rd row: S등급
4th row: S등급
5th row: S등급

Common Values

Value     Count  Frequency (%)
2호       122    22.5%
A등급     98     18.1%
임시특호  86     15.9%
1호       71     13.1%
D등급     46     8.5%
특호      37     6.8%
S등급     35     6.5%
B등급     21     3.9%
C등급     19     3.5%
보조강사  7      1.3%
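
The counts in this table are enough to recompute the frequency column and the mean length reported for 급호. The sketch below assumes length is measured in Unicode code points (which is what Python's `len` returns for these precomposed Hangul strings); the variable names are illustrative only.

```python
# Value counts copied from the Common Values table.
counts = {
    "2호": 122, "A등급": 98, "임시특호": 86, "1호": 71, "D등급": 46,
    "특호": 37, "S등급": 35, "B등급": 21, "C등급": 19, "보조강사": 7,
}

n = sum(counts.values())  # should equal the 542 observations

# Frequency (%) per value, rounded to one decimal as in the report.
shares = {value: round(100 * c / n, 1) for value, c in counts.items()}

# Count-weighted mean of the string lengths (reported Mean length: 2.7472325).
mean_length = sum(len(value) * c for value, c in counts.items()) / n

print(n, shares["2호"], round(mean_length, 7))
```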

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the most common 급호 values]

Interactions


Correlations

        강사명  나이   급호
강사명  1.000   0.190  0.000
나이    0.190   1.000  0.612
급호    0.000   0.612  1.000

      나이   급호
나이  1.000  0.231
급호  0.231  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  강사명  나이  급호
0      박**    58    S등급
1      조**    49    S등급
2      이**    63    S등급
3      조**    49    S등급
4      고**    35    S등급
5      이**    37    S등급
6      김**    48    S등급
7      이**    51    S등급
8      오**    52    S등급
9      김**    45    S등급

Last rows

Index  강사명  나이  급호
532    이**    46    2호
533    천**    54    2호
534    최**    38    2호
535    김**    64    보조강사
536    김**    22    보조강사
537    염**    25    보조강사
538    조**    28    보조강사
539    안**    26    보조강사
540    천**    35    보조강사
541    장**    28    보조강사

Duplicate rows

Most frequently occurring

Index  강사명  나이  급호      # duplicates
5      김**    40    2호       3
7      김**    41    1호       3
9      김**    45    A등급     3
26     이**    43    2호       3
0      김**    29    2호       2
1      김**    31    D등급     2
2      김**    32    2호       2
3      김**    32    D등급     2
4      김**    38    임시특호  2
6      김**    40    임시특호  2
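
Because names are masked to a surname plus two asterisks, different instructors can collapse into identical rows. A minimal sketch of how such duplicates can be found, using only the first ten rows from the report's Sample section and treating two rows as duplicates when all three columns match (an assumption about the matching rule):

```python
from collections import Counter

# First ten rows from the Sample section: (강사명, 나이, 급호).
rows = [
    ("박**", 58, "S등급"), ("조**", 49, "S등급"), ("이**", 63, "S등급"),
    ("조**", 49, "S등급"), ("고**", 35, "S등급"), ("이**", 37, "S등급"),
    ("김**", 48, "S등급"), ("이**", 51, "S등급"), ("오**", 52, "S등급"),
    ("김**", 45, "S등급"),
]

# Count identical rows and keep those that occur more than once.
row_counts = Counter(rows)
duplicates = {row: c for row, c in row_counts.items() if c > 1}

# The headline percentage follows from the reported totals.
duplicate_pct = round(100 * 41 / 542, 1)

print(duplicates)
print(duplicate_pct)
```

Even within these ten rows, one combination (조**, 49, S등급) already appears twice, so a duplicate here does not necessarily mean the same person was recorded twice.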