Overview

Dataset statistics

Number of variables: 3
Number of observations: 198
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 5
Duplicate rows (%): 2.5%
Total size in memory: 4.8 KiB
Average record size in memory: 24.7 B
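The shape, missing-cell, and memory figures above can be approximated directly with pandas. The sketch below is a minimal illustration, not the report's own code: `overview_stats` is a hypothetical helper, the frame holds only the first three sample rows, and using shallow `memory_usage` (index included) is an assumption about how the report measured memory. Average record size is simply total bytes divided by the number of observations.

```python
import pandas as pd


def overview_stats(df: pd.DataFrame) -> dict:
    """Recompute a few of the 'Dataset statistics' fields for a DataFrame."""
    n_obs, n_vars = df.shape
    missing_cells = int(df.isna().sum().sum())
    # Shallow memory usage with the index included; deep=False counts
    # object pointers, not the size of the strings they point to.
    total_bytes = int(df.memory_usage(index=True, deep=False).sum())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100.0 * missing_cells / (n_obs * n_vars),
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_obs,
    }


# The first three sample rows of this dataset, used as a tiny stand-in.
df = pd.DataFrame({
    "성명": ["감**", "강**", "강**"],
    "분야": ["큐레이터", "IT", "BT"],
    "전공": ["산업기술경영", "전자", "생물학/식물생리학"],
})
stats = overview_stats(df)
print(stats)
```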

Variable types

Categorical: 2
Text: 1

Dataset

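Description: Information on retired scientists and engineers with diverse careers and expertise across industry, academia, and research institutes.
Author: Korea Institute of Science and Technology Information (한국과학기술정보연구원, KISTI)
URL: https://www.data.go.kr/data/3077287/fileData.do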

Alerts

Duplicates: Dataset has 5 (2.5%) duplicate rows

Reproduction

Analysis started: 2023-12-12 06:54:11.466387
Analysis finished: 2023-12-12 06:54:11.745800
Duration: 0.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

성명 (name)
Categorical

Distinct: 43
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
김**  38
이**  23
박**  13
조**  12
정**  10
Other values (38)  102

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 14
Unique (%): 7.1%
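Distinct and Unique measure different things: Distinct counts how many different values occur (43 of 198 rows, about 21.7%), while Unique counts the values that occur exactly once (14, about 7.1%). That is why 분야 below reports Unique 0: every field appears at least three times. A minimal pandas sketch of both counts, where `distinct_unique` is a hypothetical helper and the input is the first five sample names:

```python
import pandas as pd


def distinct_unique(s: pd.Series) -> dict:
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = s.value_counts()
    n = len(s)
    distinct = int(counts.size)
    unique = int((counts == 1).sum())
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
    }


# First five rows of 성명 from the Sample section.
names = pd.Series(["감**", "강**", "강**", "강**", "강**"])
result = distinct_unique(names)
print(result)
```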

Sample

1st row: 감**
2nd row: 강**
3rd row: 강**
4th row: 강**
5th row: 강**

Common Values

Value  Count  Frequency (%)
김**  38  19.2%
이**  23  11.6%
박**  13  6.6%
조**  12  6.1%
정**  10  5.1%
윤**  7  3.5%
강**  7  3.5%
신**  6  3.0%
최**  6  3.0%
고**  5  2.5%
Other values (33)  71  35.9%
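The table lists the ten most frequent values and folds the remaining 33 into a single "Other values (33)" row; each frequency is the count divided by the 198 observations (for example 38 / 198 ≈ 19.2% for 김**). A minimal sketch of that layout, assuming a configurable top-n cutoff; `top_with_other` is a hypothetical helper, and the input is the ten sample rows shown in the Sample section:

```python
import pandas as pd


def top_with_other(s: pd.Series, n: int = 10) -> pd.DataFrame:
    """Top-n value counts, with everything else folded into one 'Other values' row."""
    counts = s.value_counts()
    top, rest = counts.iloc[:n], counts.iloc[n:]
    rows = [(str(value), int(count)) for value, count in top.items()]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))
    table = pd.DataFrame(rows, columns=["Value", "Count"])
    # Frequencies are relative to all rows, not just the rows shown.
    table["Frequency (%)"] = (100 * table["Count"] / len(s)).round(1)
    return table


# Rows 0 to 9 of 성명: one 감**, seven 강**, two 고**.
names = pd.Series(["감**"] + ["강**"] * 7 + ["고**"] * 2)
t = top_with_other(names, n=2)
print(t)
```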

Length

[Figure: Histogram of lengths of the category]

분야 (field)
Categorical

Distinct: 9
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
큐레이터 (curator)  30
화학 (chemistry)  30
IT  28
BT (biotechnology)  27
재료 (materials)  27
Other values (4)  56

Length

Max length: 6
Median length: 2
Mean length: 2.4343434
Min length: 2
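These length statistics follow directly from the value counts in the Common Values table below, since lengths are counted in characters (Unicode code points, so 큐레이터 has length 4 and 과학기술일반 has length 6). The worked check below expands each value by its count and recomputes the summary; the counts are copied from this report, and the code itself is only an illustrative sketch.

```python
import statistics

# Value counts for 분야, copied from the Common Values table of this report.
counts = {
    "큐레이터": 30, "화학": 30, "IT": 28, "BT": 27, "재료": 27,
    "기계": 25, "에너지": 14, "환경": 14, "과학기술일반": 3,
}

# One length per row: each value's character count, repeated by its frequency.
lengths = [len(value) for value, c in counts.items() for _ in range(c)]
n = len(lengths)
mean_len = sum(lengths) / n
median_len = statistics.median(lengths)

print(n, sum(lengths), round(mean_len, 7), median_len, min(lengths), max(lengths))
# 198 482 2.4343434 2.0 2 6
```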

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 큐레이터 (curator)
2nd row: IT
3rd row: BT (biotechnology)
4th row: BT (biotechnology)
5th row: 화학 (chemistry)

Common Values

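Value  Count  Frequency (%)
큐레이터 (curator)  30  15.2%
화학 (chemistry)  30  15.2%
IT  28  14.1%
BT (biotechnology)  27  13.6%
재료 (materials)  27  13.6%
기계 (mechanical)  25  12.6%
에너지 (energy)  14  7.1%
환경 (environment)  14  7.1%
과학기술일반 (general science and technology)  3  1.5%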

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

전공 (major)
Text

Distinct: 125
Distinct (%): 63.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 35
Median length: 19
Mean length: 4.959596
Min length: 1

Characters and Unicode

Total characters: 982
Distinct characters: 151
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
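Python's standard `unicodedata.category` returns the two-letter General Category code of a character (for example `Lo` for Hangul syllables, `Zs` for the space). The sketch below counts categories under the long names used in this report; the mapping table and `category_counts` are illustrative assumptions, and the sample string is row 7 of the Sample section. Script and block are separate Unicode properties that `unicodedata` does not expose.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters of `text` by Unicode General Category (long name if known)."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


c = category_counts("물리화학, Ph.D.")
print(c)
```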

Unique

Unique: 97
Unique (%): 49.0%

Sample

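1st row: 산업기술경영 (industrial technology management)
2nd row: 전자 (electronics)
3rd row: 생물학/식물생리학 (biology/plant physiology)
4th row: 동물분자생리학 (animal molecular physiology)
5th row: 고체물리화학/화학/에너지 (solid-state physical chemistry/chemistry/energy)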
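
Word frequencies (relative to 211 word tokens, not to the 198 rows)

Value  Count  Frequency (%)
기계공학 (mechanical engineering)  10  4.7%
화학공학 (chemical engineering)  10  4.7%
기계 (mechanical)  6  2.8%
식품공학 (food engineering)  6  2.8%
화학 (chemistry)  6  2.8%
금속공학 (metallurgical engineering)  5  2.4%
재료공학 (materials engineering)  5  2.4%
미생물학 (microbiology)  4  1.9%
유기화학 (organic chemistry)  4  1.9%
재료 (materials)  4  1.9%
Other values (122)  151  71.6%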
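The word table counts tokens rather than rows: the ten listed words account for 60 tokens and the other 122 words for 151, so every percentage is relative to 211 tokens. A minimal sketch of such a word count, assuming the tokenizer lowercases, splits on whitespace, and strips surrounding punctuation (an assumption about the report's internals; `word_counts` is hypothetical). Under this rule a slash-joined major such as 생물학/식물생리학 stays a single word.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase each value, split on whitespace, strip surrounding punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


# A few 전공 values taken from the Sample section of this report.
majors = ["화학공학", "물리화학, Ph.D.", "컴퓨터정보학 (박사수료)", "화학공학", "생물학/식물생리학"]
wc = word_counts(majors)
total = sum(wc.values())
for word, count in wc.most_common(3):
    print(word, count, f"{100 * count / total:.1f}%")
```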

Most occurring characters

Value  Count  Frequency (%)
[Hangul]  151  15.4%
[Hangul]  70  7.1%
[space]  41  4.2%
[Hangul]  37  3.8%
[Hangul]  36  3.7%
[Hangul]  26  2.6%
[Hangul]  26  2.6%
[Hangul]  23  2.3%
[Hangul]  21  2.1%
[Hangul]  21  2.1%
Other values (141)  530  54.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  871  88.7%
Space Separator  41  4.2%
Other Punctuation  28  2.9%
Lowercase Letter  25  2.5%
Uppercase Letter  11  1.1%
Open Punctuation  3  0.3%
Close Punctuation  3  0.3%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[Hangul]  151  17.3%
[Hangul]  70  8.0%
[Hangul]  37  4.2%
[Hangul]  36  4.1%
[Hangul]  26  3.0%
[Hangul]  26  3.0%
[Hangul]  23  2.6%
[Hangul]  21  2.4%
[Hangul]  21  2.4%
[Hangul]  20  2.3%
Other values (115)  440  50.5%

Lowercase Letter
Value  Count  Frequency (%)
e  5  20.0%
c  3  12.0%
t  3  12.0%
a  3  12.0%
h  2  8.0%
i  2  8.0%
m  1  4.0%
f  1  4.0%
s  1  4.0%
r  1  4.0%
Other values (3)  3  12.0%

Uppercase Letter
Value  Count  Frequency (%)
T  2  18.2%
C  2  18.2%
M  2  18.2%
I  2  18.2%
D  1  9.1%
P  1  9.1%
S  1  9.1%

Other Punctuation
Value  Count  Frequency (%)
,  17  60.7%
/  9  32.1%
.  2  7.1%

Space Separator
Value  Count  Frequency (%)
[space]  41  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  3  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  3  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  871  88.7%
Common  75  7.6%
Latin  36  3.7%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
[Hangul]  151  17.3%
[Hangul]  70  8.0%
[Hangul]  37  4.2%
[Hangul]  36  4.1%
[Hangul]  26  3.0%
[Hangul]  26  3.0%
[Hangul]  23  2.6%
[Hangul]  21  2.4%
[Hangul]  21  2.4%
[Hangul]  20  2.3%
Other values (115)  440  50.5%

Latin
Value  Count  Frequency (%)
e  5  13.9%
c  3  8.3%
t  3  8.3%
a  3  8.3%
h  2  5.6%
T  2  5.6%
C  2  5.6%
M  2  5.6%
i  2  5.6%
I  2  5.6%
Other values (10)  10  27.8%

Common
Value  Count  Frequency (%)
[space]  41  54.7%
,  17  22.7%
/  9  12.0%
(  3  4.0%
)  3  4.0%
.  2  2.7%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  871  88.7%
ASCII  111  11.3%
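The two blocks partition all 982 characters (871 Hangul + 111 ASCII). A minimal sketch of a block split by code-point range, assuming "ASCII" means U+0000 to U+007F and that "Hangul" groups the Hangul Syllables (U+AC00 to U+D7AF), Hangul Jamo (U+1100 to U+11FF), and Hangul Compatibility Jamo (U+3130 to U+318F) blocks; that grouping, and the helper names, are assumptions rather than the report's documented method.

```python
from collections import Counter


def block_of(ch: str) -> str:
    """Rough Unicode block label for one character (hypothetical grouping)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if (0xAC00 <= cp <= 0xD7AF      # Hangul Syllables
            or 0x1100 <= cp <= 0x11FF   # Hangul Jamo
            or 0x3130 <= cp <= 0x318F):  # Hangul Compatibility Jamo
        return "Hangul"
    return "Other"


def block_counts(values) -> Counter:
    """Count every character across all values by block label."""
    return Counter(block_of(ch) for value in values for ch in value)


# Two 전공 values from the Sample section.
b1 = block_counts(["고온구조용 복합재료"])
b2 = block_counts(["컴퓨터정보학 (박사수료)"])
print(b1, b2)
```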

Most frequent character per block

Hangul
Value  Count  Frequency (%)
[Hangul]  151  17.3%
[Hangul]  70  8.0%
[Hangul]  37  4.2%
[Hangul]  36  4.1%
[Hangul]  26  3.0%
[Hangul]  26  3.0%
[Hangul]  23  2.6%
[Hangul]  21  2.4%
[Hangul]  21  2.4%
[Hangul]  20  2.3%
Other values (115)  440  50.5%

ASCII
Value  Count  Frequency (%)
[space]  41  36.9%
,  17  15.3%
/  9  8.1%
e  5  4.5%
c  3  2.7%
t  3  2.7%
a  3  2.7%
(  3  2.7%
)  3  2.7%
h  2  1.8%
Other values (16)  22  19.8%

Correlations

[Figure: correlation heatmap]
     | 성명  | 분야
성명 | 1.000 | 0.422
분야 | 0.422 | 1.000

[Figure: correlation heatmap]
     | 성명  | 분야
성명 | 1.000 | 0.147
분야 | 0.147 | 1.000

[Figure: correlation heatmap]
     | 성명  | 분야
성명 | 1.000 | 0.147
분야 | 0.147 | 1.000
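The matrices do not state which coefficient each one uses. For two categorical columns, one common association measure is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below computes the plain, uncorrected version with pandas and NumPy; it is an illustration of the measure, not necessarily the method behind these matrices, so it should not be expected to reproduce 0.422 or 0.147. `cramers_v` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Uncorrected Cramér's V between two categorical Series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    if k == 0:  # a single row or column carries no association
        return 0.0
    return float(np.sqrt(chi2 / (n * k)))


a = pd.Series(["x", "x", "y", "y"])
print(cramers_v(a, pd.Series(["p", "p", "q", "q"])))  # perfectly associated
print(cramers_v(a, pd.Series(["p", "q", "p", "q"])))  # independent
```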

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you visually pick out patterns in data completion.

Sample

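First rows

    | 성명 | 분야     | 전공
0   | 감** | 큐레이터 | 산업기술경영
1   | 강** | IT       | 전자
2   | 강** | BT       | 생물학/식물생리학
3   | 강** | BT       | 동물분자생리학
4   | 강** | 화학     | 고체물리화학/화학/에너지
5   | 강** | 재료     | 세라믹
6   | 강** | BT       | 수의내과학(면역학)
7   | 강** | 화학     | 물리화학, Ph.D.
8   | 고** | 큐레이터 | 군사학
9   | 고** | 화학     | 화학공학

Last rows

    | 성명 | 분야     | 전공
188 | 허** | IT       | 컴퓨터정보학 (박사수료)
189 | 허** | 기계     | 불규칙진동
190 | 현** | 화학     | 섬유공학
191 | 홍** | 환경     | 환경공학
192 | 홍** | 환경     | 자원순환공학
193 | 황** | 환경     | 토양환경
194 | 황** | 화학     | 화학공학
195 | 황** | 재료     |
196 | 황** | 환경     | 환경
197 | 황** | 재료     | 고온구조용 복합재료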

Duplicate rows

Most frequently occurring

  | 성명 | 분야 | 전공         | # duplicates
0 | 김** | 재료 | 금속공학     | 3
1 | 김** | 재료 | 재료공학     | 2
2 | 문** | 기계 | 기계         | 2
3 | 심** | 재료 | 금속재료공학 | 2
4 | 이** | 화학 | 화학공학     | 2
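The overview reports 5 duplicate rows, and this table lists exactly five distinct records. That suggests the figure counts distinct records that repeat, not surplus copies: pandas' `duplicated()` with its default `keep="first"` would flag 2 + 1 + 1 + 1 + 1 = 6 rows for these groups. This reading is inferred from the numbers, not from the report's documentation. The sketch below rebuilds the table with pandas; the five duplicated records come from the table above, while the two single rows are taken from the Sample section only as padding.

```python
import pandas as pd

# Duplicate groups from the table above, plus two rows that occur once.
rows = (
    [("김**", "재료", "금속공학")] * 3
    + [("김**", "재료", "재료공학")] * 2
    + [("문**", "기계", "기계")] * 2
    + [("심**", "재료", "금속재료공학")] * 2
    + [("이**", "화학", "화학공학")] * 2
    + [("감**", "큐레이터", "산업기술경영"), ("강**", "IT", "전자")]
)
df = pd.DataFrame(rows, columns=["성명", "분야", "전공"])

# One row per distinct duplicated record, with how often it occurs.
groups = (
    df[df.duplicated(keep=False)]
    .groupby(["성명", "분야", "전공"])
    .size()
    .reset_index(name="# duplicates")
    .sort_values("# duplicates", ascending=False, kind="stable")
)
n_groups = len(groups)               # distinct records that repeat
n_extra = int(df.duplicated().sum())  # copies beyond each first occurrence
print(groups)
print(n_groups, n_extra)
```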