Overview

Dataset statistics

Number of variables: 5
Number of observations: 466
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 47
Duplicate rows (%): 10.1%
Total size in memory: 18.3 KiB
Average record size in memory: 40.3 B

Variable types

Text: 1
Categorical: 3
DateTime: 1

Dataset

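Description: Management data on holders of the tree protection technician (수목보호기술자) qualification from 2001 to the present (qualification records registered in the Forest Business Corporation Management System, 산림사업법인관리시스템).
Author: Korea Forest Service (산림청)
URL: https://www.data.go.kr/data/15066523/fileData.do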

Alerts

Dataset has 47 (10.1%) duplicate rows (Duplicates)
등록일시 is highly overall correlated with 자격증번호 and 1 other field (High correlation)
자격증번호 is highly overall correlated with 합격일 and 1 other field (High correlation)
합격일 is highly overall correlated with 자격증번호 and 1 other field (High correlation)

Reproduction

Analysis started: 2023-12-12 05:17:54.348836
Analysis finished: 2023-12-12 05:17:54.906197
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
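
The reported duration follows from the two timestamps above. A minimal sketch of that arithmetic, using only the Python standard library (the variable names are illustrative and not part of the report):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section of this report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 05:17:54.348836", FMT)
finished = datetime.strptime("2023-12-12 05:17:54.906197", FMT)

# Elapsed time in seconds, rounded to two decimals as in the report.
duration_seconds = round((finished - started).total_seconds(), 2)
print(f"Duration: {duration_seconds} seconds")
```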

Variables

이름 (name)
Text

Distinct: 78
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 2.9978541
Min length: 2

Characters and Unicode

Total characters: 1397
Distinct characters: 86
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
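
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It runs on the five masked sample values listed in this section rather than on the full column, and `category_counts` is a hypothetical helper, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# The five sample values of the masked 이름 column shown in this report.
sample_names = ["김**", "최**", "노**", "강**", "고**"]


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(sample_names)
# "Lo" is Other Letter (the Hangul surname); "Po" is Other Punctuation (the "*" mask).
print(counts)
```

The same two categories account for the whole column in the report: 876 Other Punctuation characters and 521 Other Letter characters, 1397 in total.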

Unique

Unique: 38
Unique (%): 8.2%

Sample

1st row: 김**
2nd row: 최**
3rd row: 노**
4th row: 강**
5th row: 고**

Value              Count  Frequency (%)
(label lost)       81     17.4%
(label lost)       78     16.7%
(label lost)       36     7.7%
(label lost)       28     6.0%
(label lost)       26     5.6%
(label lost)       15     3.2%
(label lost)       14     3.0%
(label lost)       14     3.0%
(label lost)       10     2.1%
(label lost)       10     2.1%
Other values (68)  154    33.0%

Most occurring characters

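Value              Count  Frequency (%)
*                  876    62.7%
(label lost)       87     6.2%
(label lost)       86     6.2%
(label lost)       38     2.7%
(label lost)       28     2.0%
(label lost)       26     1.9%
(label lost)       15     1.1%
(label lost)       15     1.1%
(label lost)       14     1.0%
(label lost)       10     0.7%
Other values (76)  202    14.5%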

Most occurring categories

Value              Count  Frequency (%)
Other Punctuation  876    62.7%
Other Letter       521    37.3%

Most frequent character per category

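Other Letter

Value              Count  Frequency (%)
(label lost)       87     16.7%
(label lost)       86     16.5%
(label lost)       38     7.3%
(label lost)       28     5.4%
(label lost)       26     5.0%
(label lost)       15     2.9%
(label lost)       15     2.9%
(label lost)       14     2.7%
(label lost)       10     1.9%
(label lost)       10     1.9%
Other values (75)  192    36.9%

Other Punctuation

Value              Count  Frequency (%)
*                  876    100.0%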

Most occurring scripts

Value              Count  Frequency (%)
Common             876    62.7%
Hangul             521    37.3%

Most frequent character per script

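Hangul

Value              Count  Frequency (%)
(label lost)       87     16.7%
(label lost)       86     16.5%
(label lost)       38     7.3%
(label lost)       28     5.4%
(label lost)       26     5.0%
(label lost)       15     2.9%
(label lost)       15     2.9%
(label lost)       14     2.7%
(label lost)       10     1.9%
(label lost)       10     1.9%
Other values (75)  192    36.9%

Common

Value              Count  Frequency (%)
*                  876    100.0%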

Most occurring blocks

Value              Count  Frequency (%)
ASCII              876    62.7%
Hangul             521    37.3%

Most frequent character per block

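ASCII

Value              Count  Frequency (%)
*                  876    100.0%

Hangul

Value              Count  Frequency (%)
(label lost)       87     16.7%
(label lost)       86     16.5%
(label lost)       38     7.3%
(label lost)       28     5.4%
(label lost)       26     5.0%
(label lost)       15     2.9%
(label lost)       15     2.9%
(label lost)       14     2.7%
(label lost)       10     1.9%
(label lost)       10     1.9%
Other values (75)  192    36.9%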

자격증번호 (certificate number)
Categorical

HIGH CORRELATION

Distinct: 44
Distinct (%): 9.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value              Count
2109***            31
2111***            30
2806***            26
2906***            22
2609***            21
Other values (39)  336

Length

Max length: 7
Median length: 7
Mean length: 6.9935622
Min length: 4

Unique

Unique: 4
Unique (%): 0.9%

Sample

1st row: 2306***
2nd row: 2612***
3rd row: 2612***
4th row: 2506***
5th row: 2609***

Common Values

Value              Count  Frequency (%)
2109***            31     6.7%
2111***            30     6.4%
2806***            26     5.6%
2906***            22     4.7%
2609***            21     4.5%
2306***            20     4.3%
2201***            18     3.9%
2605***            17     3.6%
2912***            17     3.6%
2403***            16     3.4%
Other values (34)  248    53.2%

Length

Figure: Histogram of lengths of the category

Value              Count  Frequency (%)
2109               31     6.7%
2111               30     6.4%
2806               26     5.6%
2906               22     4.7%
2609               21     4.5%
2306               20     4.3%
2201               18     3.9%
2605               17     3.6%
2912               17     3.6%
2106               16     3.4%
Other values (34)  248    53.2%

교부일 (issue date)
DateTime

Distinct: 168
Distinct (%): 36.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB
Minimum: 2001-04-10 00:00:00
Maximum: 2017-12-14 00:00:00
Figure: Histogram with fixed size bins (bins=50)

합격일 (exam pass date)
Categorical

HIGH CORRELATION

Distinct: 47
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value              Count
<NA>               38
2016-08-10         20
2011-08-18         18
2009-12-03         17
2010-11-11         17
Other values (42)  356

Length

Max length: 10
Median length: 10
Mean length: 9.5107296
Min length: 4
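
These length statistics suggest that the 38 `<NA>` entries are stored as literal 4-character strings rather than as nulls, which would also explain why the report counts 0 missing cells. A small sketch of that arithmetic, assuming every other value is a 10-character `YYYY-MM-DD` date (the counts come from this report; the helper `to_missing` is hypothetical):

```python
# Counts taken from this report: 466 observations, 38 of them the literal "<NA>".
n_rows = 466
n_placeholder = 38
n_dates = n_rows - n_placeholder

# Mean string length if dates are 10 characters and "<NA>" is 4 characters.
mean_length = (n_dates * len("2016-08-10") + n_placeholder * len("<NA>")) / n_rows
print(round(mean_length, 7))


def to_missing(value, placeholder="<NA>"):
    """Map the placeholder string to None so it can be treated as missing."""
    return None if value == placeholder else value


# Applied to three values from the report's sample rows.
cleaned = [to_missing(v) for v in ["<NA>", "2015-05-20", "2016-08-10"]]
```

Converting the placeholder before profiling would make these rows show up under Missing instead of as a category.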

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 2015-05-20
5th row: 2016-08-10

Common Values

Value              Count  Frequency (%)
<NA>               38     8.2%
2016-08-10         20     4.3%
2011-08-18         18     3.9%
2009-12-03         17     3.6%
2010-11-11         17     3.6%
2004-03-20         16     3.4%
2008-11-27         16     3.4%
2009-09-23         15     3.2%
2010-06-24         15     3.2%
2005-05-09         14     3.0%
Other values (37)  280    60.1%

Length

Figure: Histogram of lengths of the category

Value              Count  Frequency (%)
na                 38     8.2%
2016-08-10         20     4.3%
2011-08-18         18     3.9%
2009-12-03         17     3.6%
2010-11-11         17     3.6%
2004-03-20         16     3.4%
2008-11-27         16     3.4%
2009-09-23         15     3.2%
2010-06-24         15     3.2%
2005-05-09         14     3.0%
Other values (37)  280    60.1%

등록일시 (registration date)
Categorical

HIGH CORRELATION

Distinct: 32
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value              Count
2010-06-21         191
2014-05-07         66
2016-11-24         62
2010-08-31         32
2011-01-04         29
Other values (27)  86

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 18
Unique (%): 3.9%

Sample

1st row: 2017-12-15
2nd row: 2017-02-07
3rd row: 2016-12-28
4th row: 2016-11-24
5th row: 2016-11-24

Common Values

Value              Count  Frequency (%)
2010-06-21         191    41.0%
2014-05-07         66     14.2%
2016-11-24         62     13.3%
2010-08-31         32     6.9%
2011-01-04         29     6.2%
2011-12-19         28     6.0%
2014-09-25         10     2.1%
2015-01-09         9      1.9%
2016-01-07         7      1.5%
2016-01-08         6      1.3%
Other values (22)  26     5.6%

Length

Figure: Histogram of lengths of the category

Value              Count  Frequency (%)
2010-06-21         191    41.0%
2014-05-07         66     14.2%
2016-11-24         62     13.3%
2010-08-31         32     6.9%
2011-01-04         29     6.2%
2011-12-19         28     6.0%
2014-09-25         10     2.1%
2015-01-09         9      1.9%
2016-01-07         7      1.5%
2016-01-08         6      1.3%
Other values (22)  26     5.6%

Correlations

            이름     자격증번호  합격일   등록일시
이름         1.000  0.798      0.000   0.975
자격증번호    0.798  1.000      1.000   0.962
합격일       0.000  1.000      1.000   1.000
등록일시      0.975  0.962      1.000   1.000

            등록일시  자격증번호  합격일
등록일시      1.000   0.554      0.953
자격증번호    0.554   1.000      0.974
합격일       0.953   0.974      1.000

            자격증번호  합격일   등록일시
자격증번호    1.000      0.974   0.554
합격일       0.974      1.000   0.953
등록일시      0.554      0.953   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     이름   자격증번호  교부일       합격일       등록일시
0    김**  2306***   2017-12-14  <NA>        2017-12-15
1    최**  2612***   2017-02-07  <NA>        2017-02-07
2    노**  2612***   2016-12-28  <NA>        2016-12-28
3    강**  2506***   2015-06-15  2015-05-20  2016-11-24
4    고**  2609***   2016-09-12  2016-08-10  2016-11-24
5    곽**  2506***   2015-09-22  2015-09-09  2016-11-24
6    권**  2506***   2015-09-22  2015-09-09  2016-11-24
7    권**  2609***   2016-09-12  2016-08-10  2016-11-24
8    김**  2306***   2014-06-27  2014-05-28  2016-11-24
9    김**  2609***   2016-09-12  2016-08-10  2016-11-24

     이름   자격증번호  교부일       합격일       등록일시
456  최**  2906***   2009-06-25  2009-05-29  2010-06-21
457  최**  2806***   2008-12-08  2008-11-27  2010-06-21
458  표**  2906***   2009-06-09  2009-05-29  2010-06-21
459  한**  2304***   2003-04-23  2003-04-19  2010-06-21
460  한**  2605***   2006-06-01  2006-05-19  2010-06-21
461  한**  2611***   2006-11-13  2006-11-03  2010-06-21
462  현**  2310***   2003-11-10  2003-10-11  2010-06-21
463  홍**  2906***   2009-09-28  2009-09-23  2010-06-21
464  홍**  2508***   2005-08-02  2005-07-26  2010-06-21
465  황**  2906***   2009-06-15  2009-05-29  2010-06-21

Duplicate rows

Most frequently occurring

    이름   자격증번호  교부일       합격일       등록일시      # duplicates
25  이**  2111***   2010-11-29  2010-11-11  2011-01-04  6
24  이**  2109***   2010-09-13  2010-08-26  2011-01-04  5
3   김**  2109***   2011-09-19  2011-08-18  2011-12-19  3
5   김**  2306***   2014-06-27  2014-05-28  2016-11-24  3
13  김**  2609***   2016-09-12  2016-08-10  2016-11-24  3
27  이**  2209***   2012-09-10  2012-08-22  2014-05-07  3
35  이**  2512***   2015-12-24  2015-12-02  2016-11-24  3
46  최**  2506***   2015-06-15  2015-05-20  2016-11-24  3
0   강**  2311***   2013-12-03  2013-11-20  2014-05-07  2
1   김**  2106***   2010-07-14  2010-06-24  2010-08-31  2
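
As a rough illustration of how such a table can be derived, the sketch below groups identical rows with `collections.Counter`. The rows are a small hand-made subset built from values that appear in this report, not the full dataset, and `duplicate_groups` is a hypothetical helper rather than the profiler's implementation.

```python
from collections import Counter

# Illustrative rows: (이름, 자격증번호, 교부일, 합격일, 등록일시).
rows = [
    ("이**", "2111***", "2010-11-29", "2010-11-11", "2011-01-04"),
    ("이**", "2111***", "2010-11-29", "2010-11-11", "2011-01-04"),
    ("강**", "2311***", "2013-12-03", "2013-11-20", "2014-05-07"),
    ("강**", "2311***", "2013-12-03", "2013-11-20", "2014-05-07"),
    ("김**", "2306***", "2017-12-14", "<NA>", "2017-12-15"),
]


def duplicate_groups(records):
    """Return {row: count} for rows that occur more than once, most frequent first."""
    counts = Counter(records)
    return {row: n for row, n in counts.most_common() if n > 1}


groups = duplicate_groups(rows)
for row, n in groups.items():
    print(*row, n)

# Share of duplicate groups relative to all rows, as a percentage.
duplicate_pct = round(100 * len(groups) / len(rows), 1)
```

Because names and certificate numbers are masked, rows that look identical here may belong to different people, so the duplicate count is best read as an upper bound. The report's own figures fit the same kind of ratio: 47 / 466 is roughly 10.1%.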