Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 691
Duplicate rows (%): 6.9%
Total size in memory: 322.3 KiB
Average record size in memory: 33.0 B

Variable types

Numeric: 1
Categorical: 1
Text: 1

Dataset

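Description: Certificate issuance information extracted from the 산림복지전문업지원시스템 (Forest Welfare Specialized Business Support System). It consists of fields such as the year the qualification was obtained (자격취득연도), the qualification type (자격종류), and the holder's name (성명).
Author: 한국산림복지진흥원 (Korea Forest Welfare Institute)
URL: https://www.data.go.kr/data/15088850/fileData.do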

Alerts

Duplicates: Dataset has 691 (6.9%) duplicate rows

Reproduction

Analysis started: 2023-12-12 05:50:31.117740
Analysis finished: 2023-12-12 05:50:31.571851
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1

Variables

자격취득연도 (year of qualification)
Real number (ℝ)

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.7249
Minimum: 2013
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2013
5th percentile: 2013
Q1: 2015
Median: 2017
Q3: 2019
95th percentile: 2020
Maximum: 2021
Range: 8
Interquartile range (IQR): 4
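These quantiles can be re-derived from the value counts listed for this variable. The sketch below rebuilds the column from those counts and uses Python's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly at position p * (n - 1), the same rule as the default linear interpolation in NumPy and pandas. That the report used exactly this rule is an assumption; the helper names `expand` and `quantile_summary` are illustrative.

```python
import statistics

# Year counts copied from the Value/Count table for 자격취득연도 in this report.
YEAR_COUNTS = {
    2013: 1602, 2014: 632, 2015: 943, 2016: 1069, 2017: 1308,
    2018: 1655, 2019: 1697, 2020: 917, 2021: 177,
}


def expand(counts):
    """Rebuild the full column (one entry per row) from value counts."""
    return [value for value, count in sorted(counts.items()) for _ in range(count)]


def quantile_summary(values):
    """Quartiles plus 5th/95th percentiles using linear interpolation.

    method="inclusive" interpolates between order statistics at position
    p * (n - 1), matching NumPy/pandas' default "linear" rule.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "min": min(values),
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


years = expand(YEAR_COUNTS)
print(quantile_summary(years))
```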

Descriptive statistics

Standard deviation: 2.3438791
Coefficient of variation (CV): 0.0011622206
Kurtosis: -1.1230527
Mean: 2016.7249
Median absolute deviation (MAD): 2
Skewness: -0.2434831
Sum: 20167249
Variance: 5.4937694
Monotonicity: Not monotonic
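A similar sketch reproduces most of the descriptive statistics from the same counts: the sample variance (denominator n - 1), its square root, CV = standard deviation / mean, and MAD as the median of absolute deviations from the median. Computed from the counts, these match the Mean, Variance, Standard deviation, CV, MAD and Sum rows above. Skewness and kurtosis are left out because the exact bias correction the report applies is not stated.

```python
import math
import statistics

# Year counts copied from the Value/Count table for 자격취득연도 in this report.
YEAR_COUNTS = {
    2013: 1602, 2014: 632, 2015: 943, 2016: 1069, 2017: 1308,
    2018: 1655, 2019: 1697, 2020: 917, 2021: 177,
}

# Rebuild the column: one entry per row.
years = [value for value, count in sorted(YEAR_COUNTS.items()) for _ in range(count)]

mean = statistics.mean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
median = statistics.median(years)
mad = statistics.median(abs(y - median) for y in years)  # median absolute deviation
total = sum(years)

print(f"mean={mean:.4f} var={variance:.7f} std={std:.7f} cv={cv:.10f} mad={mad} sum={total}")
```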
Figure: Histogram with fixed-size bins (bins=9)

Common Values
Value   Count   Frequency (%)
2019    1697    17.0%
2018    1655    16.6%
2013    1602    16.0%
2017    1308    13.1%
2016    1069    10.7%
2015    943     9.4%
2020    917     9.2%
2014    632     6.3%
2021    177     1.8%

Minimum 10 values
Value   Count   Frequency (%)
2013    1602    16.0%
2014    632     6.3%
2015    943     9.4%
2016    1069    10.7%
2017    1308    13.1%
2018    1655    16.6%
2019    1697    17.0%
2020    917     9.2%
2021    177     1.8%

Maximum 10 values
Value   Count   Frequency (%)
2021    177     1.8%
2020    917     9.2%
2019    1697    17.0%
2018    1655    16.6%
2017    1308    13.1%
2016    1069    10.7%
2015    943     9.4%
2014    632     6.3%
2013    1602    16.0%

자격종류 (qualification type)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 4
Mean length: 4.8325
Min length: 4
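These length statistics follow directly from the three category counts. A minimal sketch, assuming the labels are stored as precomposed (NFC) Hangul syllables so that `len()` counts one code point per syllable: 숲해설가 has 4 characters, 유아숲지도사 has 6 and 숲길등산지도사 has 7, giving a mean of (4 × 5998 + 6 × 3681 + 7 × 321) / 10000 = 4.8325.

```python
import statistics

# Category counts taken from the Common Values table of 자격종류.
TYPE_COUNTS = {"숲해설가": 5998, "유아숲지도사": 3681, "숲길등산지도사": 321}

# len() counts code points; each precomposed Hangul syllable is one code point.
lengths = [len(label) for label, count in TYPE_COUNTS.items() for _ in range(count)]

length_stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```

If the labels were stored in decomposed (NFD) form, each syllable would count as two or three code points, so normalizing with `unicodedata.normalize("NFC", ...)` first would be a sensible precaution.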

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 숲해설가
2nd row: 유아숲지도사
3rd row: 숲해설가
4th row: 숲해설가
5th row: 숲해설가

Common Values

Value                                            Count   Frequency (%)
숲해설가 (forest interpreter)                     5998    60.0%
유아숲지도사 (early-childhood forest instructor)   3681    36.8%
숲길등산지도사 (forest trail hiking instructor)     321     3.2%

Length

Figure: Histogram of lengths of the category

성명 (name)
Text

Distinct: 91
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3
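Every value has length 3, which is consistent with a masking format of one Hangul syllable followed by two asterisks (the sample rows for this variable look like 조**). A hedged validation sketch, assuming that format; the pattern `MASKED_NAME` and helper `is_masked_name` are hypothetical names, and the range U+AC00 to U+D7A3 covers the precomposed Hangul syllables.

```python
import re

# Assumed masking format: one precomposed Hangul syllable (U+AC00..U+D7A3)
# followed by exactly two asterisks, e.g. "조**".
MASKED_NAME = re.compile(r"[\uac00-\ud7a3]\*\*")


def is_masked_name(value: str) -> bool:
    """True if value is a single Hangul syllable followed by '**'."""
    return MASKED_NAME.fullmatch(value) is not None


# The five sample rows shown for 성명 in this report.
sample = ["조**", "노**", "최**", "최**", "박**"]
print(all(is_masked_name(name) for name in sample))
```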

Characters and Unicode

Total characters: 30000
Distinct characters: 92
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
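Python's `unicodedata` module exposes the general category of each code point, which is enough to reproduce the category breakdown for this column: `*` is Po (Other Punctuation) and a Hangul syllable is Lo (Other Letter). The standard library has no script or block lookup, so this sketch does not cover the script and block tables. The `CATEGORY_NAMES` mapping is an illustrative assumption that covers only the two categories seen here.

```python
import unicodedata
from collections import Counter

# Readable names for the two general categories that occur in this column.
CATEGORY_NAMES = {"Po": "Other Punctuation", "Lo": "Other Letter"}


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows shown for 성명 in this report.
sample = ["조**", "노**", "최**", "최**", "박**"]
print(category_counts(sample))
```

Applied to all 10,000 masked names, the same loop would give 20,000 Other Punctuation and 10,000 Other Letter characters, matching the category table.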

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 조**
2nd row: 노**
3rd row: 최**
4th row: 최**
5th row: 박**

Value               Count   Frequency (%)
…                   2185    21.9%
…                   1552    15.5%
…                   821     8.2%
…                   478     4.8%
…                   467     4.7%
…                   296     3.0%
…                   261     2.6%
…                   249     2.5%
…                   214     2.1%
…                   207     2.1%
Other values (81)   3270    32.7%
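The last row aggregates everything outside the top ten. The listed counts are internally consistent: the ten shown sum to 6,730, and 10,000 - 6,730 = 3,270 rows fall into the remaining 91 - 10 = 81 distinct values. A minimal sketch of that aggregation with `collections.Counter`; `top_with_other` is a hypothetical helper, not ydata-profiling's own code.

```python
from collections import Counter


def top_with_other(values, top=10):
    """Top-`top` (value, count) rows plus a final ('Other values (k)', count) row."""
    counts = Counter(values)
    rows = counts.most_common(top)
    rest = len(counts) - len(rows)  # distinct values not shown individually
    if rest > 0:
        rest_total = len(values) - sum(count for _, count in rows)
        rows.append((f"Other values ({rest})", rest_total))
    return rows


print(top_with_other(list("aaabbc"), top=2))
```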

Most occurring characters

Value               Count   Frequency (%)
*                   20000   66.7%
…                   2185    7.3%
…                   1552    5.2%
…                   821     2.7%
…                   478     1.6%
…                   467     1.6%
…                   296     1.0%
…                   261     0.9%
…                   249     0.8%
…                   214     0.7%
Other values (82)   3477    11.6%

Most occurring categories

Value               Count   Frequency (%)
Other Punctuation   20000   66.7%
Other Letter        10000   33.3%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
…                   2185    21.9%
…                   1552    15.5%
…                   821     8.2%
…                   478     4.8%
…                   467     4.7%
…                   296     3.0%
…                   261     2.6%
…                   249     2.5%
…                   214     2.1%
…                   207     2.1%
Other values (81)   3270    32.7%

Other Punctuation
Value   Count   Frequency (%)
*       20000   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   20000   66.7%
Hangul   10000   33.3%

Most frequent character per script

Hangul
Same values, counts, and frequencies as the Other Letter table above: every Hangul character in this column falls in the Other Letter category, and both total 10,000 characters.

Common
Value   Count   Frequency (%)
*       20000   100.0%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    20000   66.7%
Hangul   10000   33.3%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
*       20000   100.0%

Hangul
Same values, counts, and frequencies as the Other Letter table above: the Hangul block also totals 10,000 characters.

Correlations

Matrix over all three variables
               자격취득연도   자격종류   성명
자격취득연도   1.000          0.226      0.324
자격종류       0.226          1.000      0.169
성명           0.324          0.169      1.000

Matrix over 자격취득연도 and 자격종류
               자격취득연도   자격종류
자격취득연도   1.000          0.267
자격종류       0.267          1.000
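The report does not state which coefficient produced these matrices, and ydata-profiling can compute several. As one illustration of an association measure for categorical pairs, the sketch below computes Cramér's V, sqrt(chi2 / (n * (min(r, c) - 1))), from a contingency table built with `Counter`, treating every column (including the year) as categorical. It applies no bias correction, so it is not guaranteed to reproduce the numbers above; `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed contingency table
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
```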

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.
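The nullity bar chart amounts to a per-column count of non-missing values. A minimal sketch, assuming rows are dicts and that `None` (or an absent key) marks a missing cell; the two example rows are copied from the Sample section, and `nullity_by_column` is a hypothetical helper.

```python
def nullity_by_column(rows, columns):
    """Per-column (non-missing count, missing count); None or an absent key is missing."""
    summary = {}
    for column in columns:
        missing = sum(1 for row in rows if row.get(column) is None)
        summary[column] = (len(rows) - missing, missing)
    return summary


rows = [
    {"자격취득연도": 2020, "자격종류": "숲해설가", "성명": "조**"},
    {"자격취득연도": 2016, "자격종류": "유아숲지도사", "성명": "노**"},
]
print(nullity_by_column(rows, ["자격취득연도", "자격종류", "성명"]))
```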

Sample

First rows
Index   자격취득연도   자격종류       성명
11391   2020           숲해설가       조**
4084    2016           유아숲지도사   노**
10378   2019           숲해설가       최**
6149    2017           숲해설가       최**
11070   2020           숲해설가       박**
3332    2015           숲해설가       장**
11424   2020           유아숲지도사   이**
2039    2014           숲해설가       정**
8536    2018           숲해설가       이**
2982    2015           숲해설가       이**

Last rows
Index   자격취득연도   자격종류       성명
9090    2019           유아숲지도사   이**
565     2013           숲해설가       조**
5493    2017           숲해설가       정**
2403    2014           숲해설가       김**
10864   2019           숲해설가       김**
3636    2015           숲해설가       정**
5883    2017           숲해설가       송**
1204    2013           숲해설가       이**
5309    2017           유아숲지도사   김**
3518    2015           숲해설가       이**

Duplicate rows

Most frequently occurring

Index   자격취득연도   자격종류       성명   # duplicates
14      2013           숲해설가       김**   341
47      2013           숲해설가       이**   246
541     2019           유아숲지도사   김**   193
440     2018           유아숲지도사   김**   183
350     2017           유아숲지도사   김**   170
497     2019           숲해설가       김**   162
566     2019           유아숲지도사   이**   153
221     2016           숲해설가       김**   139
519     2019           숲해설가       이**   137
420     2018           숲해설가       이**   133
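The # duplicates column counts how many times each identical row occurs. Judging from the numbers, the 691 in the alert appears to count distinct duplicated rows (groups) rather than surplus copies: the ten groups listed already account for 341 + 246 + 193 + 183 + 170 + 162 + 153 + 139 + 137 + 133 = 1,857 rows, or 1,847 copies beyond the first of each. A sketch that computes both quantities, with `duplicate_summary` as a hypothetical helper:

```python
from collections import Counter


def duplicate_summary(rows):
    """Group identical rows and keep only those that occur more than once.

    Returns (groups, n_extra): groups maps each duplicated row to its
    occurrence count (the "# duplicates" column), and n_extra counts the
    rows beyond the first copy of each group.
    """
    counts = Counter(rows)
    groups = {row: n for row, n in counts.items() if n > 1}
    n_extra = sum(n - 1 for n in groups.values())
    return groups, n_extra


rows = (
    [(2013, "숲해설가", "김**")] * 3
    + [(2019, "유아숲지도사", "이**")] * 2
    + [(2020, "숲해설가", "조**")]
)
groups, n_extra = duplicate_summary(rows)
print(len(groups), n_extra)
```

Here `len(groups)` corresponds to the group count and `n_extra` to the number of redundant rows; on this dataset the two would differ substantially.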