Overview

Dataset statistics

Number of variables: 6
Number of observations: 335
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 12
Duplicate rows (%): 3.6%
Total size in memory: 16.5 KiB
Average record size in memory: 50.4 B
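The overview figures can be recomputed from the raw records. Below is a minimal standard-library sketch. It assumes each record is a dict keyed by the original Korean column names and that a missing cell is stored as None. The three rows are a hypothetical mini-sample modeled on the Sample and Duplicate rows tables, not the full file. As a check on the published numbers, 12 / 335 is 3.58% (shown as 3.6%), and 16.5 KiB / 335 rows is about 50.4 B per record.

```python
from collections import Counter

# Hypothetical mini-sample in the report's column layout; the real file has 335 rows.
COLUMNS = ["가입년도", "회원국적", "연구분야(대)", "연구분야(중)", "성별", "회원수"]
ROWS = [
    dict(zip(COLUMNS, ["2019", "이집트", "화학", "유기 화학", "남성", 1])),
    dict(zip(COLUMNS, ["2020", "대한민국", "보건의료", "보건학", "여성", 1])),
    dict(zip(COLUMNS, ["2020", "대한민국", "보건의료", "보건학", "여성", 1])),
]


def overview(rows, columns):
    """Recompute the overview block: sizes, missing cells and repeated rows."""
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    # A cell counts as missing when the key is absent or its value is None.
    missing = sum(1 for r in rows for c in columns if r.get(c) is None)
    # Rows that repeat an earlier identical row (first occurrences are not counted).
    occurrences = Counter(tuple(r.get(c) for c in columns) for r in rows)
    repeats = sum(n - 1 for n in occurrences.values())
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": repeats,
        "duplicate_pct": 100 * repeats / n_obs if n_obs else 0.0,
    }


print(overview(ROWS, COLUMNS))
```

This sketch counts every row that repeats an earlier identical row as a duplicate. The report does not state which convention it uses, so on the full data the two counts may not agree.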

Variable types

Categorical: 4
Text: 1
Numeric: 1

Dataset

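Description: Yearly sign-up statistics for individual research-lab members of the talent-matching platform (인재매칭플랫폼) system held by the National Research Foundation of Korea (한국연구재단). The data contain the sign-up year (가입년도), member nationality (회원국적), broad research field (연구분야(대)), mid-level research field (연구분야(중)), gender (성별) and member count (회원수).
URL: https://www.data.go.kr/data/15117653/fileData.do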
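For tools that work better with ASCII identifiers, the six Korean column names can be mapped to English names. This is a small sketch. The English names are hypothetical choices and are not part of the published dataset.

```python
# Hypothetical English identifiers for the six published columns.
COLUMN_NAMES_EN = {
    "가입년도": "signup_year",
    "회원국적": "nationality",
    "연구분야(대)": "field_broad",
    "연구분야(중)": "field_mid",
    "성별": "gender",
    "회원수": "member_count",
}


def rename_record(record):
    """Return a copy of one record with English keys; unknown keys are kept as-is."""
    return {COLUMN_NAMES_EN.get(key, key): value for key, value in record.items()}


row = {"가입년도": "2019", "회원국적": "이집트", "연구분야(대)": "화학",
       "연구분야(중)": "유기 화학", "성별": "남성", "회원수": 1}
print(rename_record(row))
```

With pandas, `df.rename(columns=COLUMN_NAMES_EN)` applies the same mapping to a whole DataFrame.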

Alerts

Dataset has 12 (3.6%) duplicate rows. [Alert type: Duplicates]

Reproduction

Analysis started: 2023-12-12 13:19:01.641749
Analysis finished: 2023-12-12 13:19:02.337593
Duration: 0.7 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json
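To regenerate a comparable report, ydata-profiling's `ProfileReport` class can be run on the downloaded file. This sketch makes several assumptions. It treats the download as a CSV encoded as CP949, which is common for Korean public-data files but should be checked against the actual file. The path, title and output name are placeholders. `ProfileReport` also accepts a `config_file` argument that might allow reusing the settings in config.json, but that has not been verified here.

```python
def build_profile(csv_path, output_html="report.html", encoding="cp949"):
    """Profile the dataset CSV and write an HTML report (sketch; paths are placeholders)."""
    # Imported here so the function can be defined without the optional dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the data.go.kr download is a CP949-encoded CSV; try "utf-8" if decoding fails.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="인재매칭플랫폼 연구실 회원가입현황")
    report.to_file(output_html)
    return report
```

Calling `build_profile("signups.csv")` (a hypothetical file name) would write `report.html` next to the script.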

Variables

가입년도 (sign-up year)
Categorical

Distinct: 5
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

2020: 170
2021: 70
2022: 63
2023: 31
2019: 1

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%
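Distinct and Unique measure different things. Distinct is the number of different values (5 years). Unique is the number of values that occur exactly once, which here is only 2019. This sketch rebuilds the column from the published counts and recomputes both figures with `collections.Counter`.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) as defined in the report."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)  # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Rebuild the 가입년도 column from its published value counts.
years = ["2020"] * 170 + ["2021"] * 70 + ["2022"] * 63 + ["2023"] * 31 + ["2019"] * 1
print(distinct_and_unique(years))
```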

Sample

1st row: 2019
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value | Count | Frequency (%)
2020 | 170 | 50.7%
2021 | 70 | 20.9%
2022 | 63 | 18.8%
2023 | 31 | 9.3%
2019 | 1 | 0.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

회원국적 (member nationality)
Categorical

Distinct: 43
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

인도 (India): 105
대한민국 (Republic of Korea): 55
파키스탄 (Pakistan): 28
방글라데시 (Bangladesh): 20
네팔 (Nepal): 12
Other values (38): 115

Length

Max length: 11
Median length: 7
Mean length: 3.2686567
Min length: 2

Unique

Unique: 18
Unique (%): 5.4%

Sample

1st row: 이집트 (Egypt)
2nd row: 네팔 (Nepal)
3rd row: 네팔 (Nepal)
4th row: 네팔 (Nepal)
5th row: 네팔 (Nepal)

Common Values

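Value | Count | Frequency (%)
인도 (India) | 105 | 31.3%
대한민국 (Republic of Korea) | 55 | 16.4%
파키스탄 (Pakistan) | 28 | 8.4%
방글라데시 (Bangladesh) | 20 | 6.0%
네팔 (Nepal) | 12 | 3.6%
베트남 (Vietnam) | 12 | 3.6%
이란 (Iran) | 11 | 3.3%
나이지리아 (Nigeria) | 10 | 3.0%
이집트 (Egypt) | 10 | 3.0%
중화인민공화국 (People's Republic of China) | 9 | 2.7%
Other values (33) | 63 | 18.8%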
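The Count column counts rows, and each percentage is that count divided by the 335 rows. The Other values row collects everything outside the top entries: the ten listed counts plus 63 make 335, and 10 + 33 = 43 distinct values. The full nationality column is not reproduced in the report, so this sketch of the grouping with `collections.Counter` is tested on the 가입년도 counts from the first variable.

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Top-N (value, count, percent) rows plus an 'Other values (k)' row for the rest."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)  # ties keep first-seen order
    rows = [(value, count, round(100 * count / total, 1)) for value, count in top]
    n_rest = len(counts) - len(top)
    if n_rest > 0:
        other = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_rest})", other, round(100 * other / total, 1)))
    return rows


years = ["2020"] * 170 + ["2021"] * 70 + ["2022"] * 63 + ["2023"] * 31 + ["2019"]
for row in frequency_table(years, top_n=3):
    print(row)
```

Values with equal counts (네팔 and 베트남, both 12) come out in first-seen order, which may differ from the report's ordering.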

Length

[Figure: Histogram of lengths of the category]
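
연구분야(대) (broad research field)
Categorical

Distinct: 19
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

생명과학 (life sciences): 56
화학 (chemistry): 44
재료 (materials): 32
농림수산식품 (agriculture, forestry, fisheries and food): 29
물리학 (physics): 25
Other values (14): 149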

Length

Max length: 21
Median length: 11
Mean length: 5.3940299
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

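1st row: 화학 (chemistry)
2nd row: 보건의료 (health and medicine)
3rd row: 보건의료 (health and medicine)
4th row: 보건의료 (health and medicine)
5th row: 생명과학 (life sciences)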

Common Values

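Value | Count | Frequency (%)
생명과학 (life sciences) | 56 | 16.7%
화학 (chemistry) | 44 | 13.1%
재료 (materials) | 32 | 9.6%
농림수산식품 (agriculture, forestry, fisheries and food) | 29 | 8.7%
물리학 (physics) | 25 | 7.5%
보건의료 (health and medicine) | 24 | 7.2%
에너지/ 자원 (energy/ resources) | 22 | 6.6%
화공 (chemical engineering) | 17 | 5.1%
환경 (environment) | 15 | 4.5%
정보/ 통신 (information/ communications) | 14 | 4.2%
Other values (9) | 57 | 17.0%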

Length

[Figure: Histogram of lengths of the category]
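
Words

Value | Count | Frequency (%)
생명과학 (life sciences) | 56 | 13.1%
화학 (chemistry) | 44 | 10.3%
재료 (materials) | 32 | 7.5%
농림수산식품 (agriculture, forestry, fisheries and food) | 29 | 6.8%
물리학 (physics) | 25 | 5.9%
보건의료 (health and medicine) | 24 | 5.6%
에너지 (energy) | 22 | 5.2%
자원 (resources) | 22 | 5.2%
화공 (chemical engineering) | 17 | 4.0%
환경 (environment) | 15 | 3.5%
Other values (17) | 140 | 32.9%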
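This table counts individual words, not whole values. That is why 에너지/ 자원 appears as two entries (에너지 and 자원, 22 each), and why the percentages use a larger denominator: the listed counts sum to 426 words, not 335 rows. The sketch below lower-cases each value, splits it on whitespace and strips punctuation from the ends of each token. This only approximates ydata-profiling's word summary. Stop-word handling and other details may differ.

```python
import string
from collections import Counter


def word_counts(values):
    """Count words: lower-case, split on whitespace, strip punctuation from token ends."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation + string.whitespace)
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return counts


fields = ["에너지/ 자원", "에너지/ 자원", "생명과학", "정보/ 통신"]
print(word_counts(fields))
```

Because only the ends of a token are stripped, a value such as 기타 전기/전자 keeps 전기/전자 as a single word.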

연구분야(중) (mid-level research field)
Text

Distinct: 117
Distinct (%): 34.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

Length

Max length: 17
Median length: 15
Mean length: 8.2149254
Min length: 3

Characters and Unicode

Total characters: 2752
Distinct characters: 159
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
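Python's standard `unicodedata` module exposes the general category behind labels such as Other Letter and Space Separator. This sketch tallies characters by category. The long names are mapped by hand for the five codes that occur in this report (Lo, Zs, Po, Lu, Pd), and any other code is left as its two-letter abbreviation. The input string "IT-" is hypothetical: the report shows the letters U, T, I and a hyphen, but not the values that contain them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Tally every character of every value by its Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["의약품/ 의약품 개발", "IT-"]))
```

Hangul syllables fall under Other Letter (Lo), which is why that category dominates this column.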

Unique

Unique: 54
Unique (%): 16.1%

Sample

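1st row: 유기 화학 (organic chemistry)
2nd row: 기타 보건 의료 (other health and medicine)
3rd row: 의약품/ 의약품 개발 (pharmaceuticals/ drug development)
4th row: 의약품/ 의약품 개발 (pharmaceuticals/ drug development)
5th row: 기타 생명 과학 (other life sciences)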
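
Words

Value | Count | Frequency (%)
화학 (chemistry) | 56 | 7.0%
기타 (other) | 42 | 5.2%
과학 (science) | 39 | 4.9%
생물학 (biology) | 31 | 3.9%
재료 (materials) | 29 | 3.6%
세포 (cell) | 23 | 2.9%
분자 (molecule) | 23 | 2.9%
물리 (physics) | 22 | 2.7%
기술 (technology) | 19 | 2.4%
에너지 (energy) | 15 | 1.9%
Other values (163) | 502 | 62.7%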

Most occurring characters

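Value | Count | Frequency (%)
(space) | 801 | 29.1%
(not preserved) | 176 | 6.4%
(not preserved) | 116 | 4.2%
(not preserved) | 114 | 4.1%
(not preserved) | 97 | 3.5%
/ | 79 | 2.9%
(not preserved) | 64 | 2.3%
(not preserved) | 59 | 2.1%
(not preserved) | 54 | 2.0%
(not preserved) | 45 | 1.6%
Other values (149) | 1147 | 41.7%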

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1864 | 67.7%
Space Separator | 801 | 29.1%
Other Punctuation | 79 | 2.9%
Uppercase Letter | 6 | 0.2%
Dash Punctuation | 2 | 0.1%

Most frequent character per category

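Other Letter
Value | Count | Frequency (%)
(not preserved) | 176 | 9.4%
(not preserved) | 116 | 6.2%
(not preserved) | 114 | 6.1%
(not preserved) | 97 | 5.2%
(not preserved) | 64 | 3.4%
(not preserved) | 59 | 3.2%
(not preserved) | 54 | 2.9%
(not preserved) | 45 | 2.4%
(not preserved) | 44 | 2.4%
(not preserved) | 44 | 2.4%
Other values (143) | 1051 | 56.4%

Uppercase Letter
Value | Count | Frequency (%)
U | 2 | 33.3%
T | 2 | 33.3%
I | 2 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 801 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 79 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%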

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1864 | 67.7%
Common | 882 | 32.0%
Latin | 6 | 0.2%

Most frequent character per script

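Hangul
Value | Count | Frequency (%)
(not preserved) | 176 | 9.4%
(not preserved) | 116 | 6.2%
(not preserved) | 114 | 6.1%
(not preserved) | 97 | 5.2%
(not preserved) | 64 | 3.4%
(not preserved) | 59 | 3.2%
(not preserved) | 54 | 2.9%
(not preserved) | 45 | 2.4%
(not preserved) | 44 | 2.4%
(not preserved) | 44 | 2.4%
Other values (143) | 1051 | 56.4%

Common
Value | Count | Frequency (%)
(space) | 801 | 90.8%
/ | 79 | 9.0%
- | 2 | 0.2%

Latin
Value | Count | Frequency (%)
U | 2 | 33.3%
T | 2 | 33.3%
I | 2 | 33.3%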

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1864 | 67.7%
ASCII | 888 | 32.3%
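The script and block tables agree with the category table. Common (882) is the 801 spaces plus 79 slashes plus 2 hyphens, and the ASCII block (888) adds the 6 Latin capitals to that. Python's standard library has no Unicode Script property, so this sketch approximates scripts from character names. It approximates blocks from code-point ranges, distinguishing only ASCII and Hangul Syllables (U+AC00 to U+D7AF). It is a rough stand-in, not the method ydata-profiling uses.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script guess from the character name (stdlib has no Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"  # spaces, punctuation and everything else, in this sketch only


def block_of(ch):
    """Rough block guess: ASCII, Hangul Syllables (U+AC00..U+D7AF) or Other."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"
    return "Other"


def script_and_block_counts(values):
    """Tally characters by approximate script and by approximate block."""
    scripts, blocks = Counter(), Counter()
    for value in values:
        for ch in value:
            scripts[script_of(ch)] += 1
            blocks[block_of(ch)] += 1
    return scripts, blocks


print(script_and_block_counts(["의약품/ 의약품 개발", "IT-"]))
```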

Most frequent character per block

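ASCII
Value | Count | Frequency (%)
(space) | 801 | 90.2%
/ | 79 | 8.9%
U | 2 | 0.2%
- | 2 | 0.2%
T | 2 | 0.2%
I | 2 | 0.2%

Hangul
Value | Count | Frequency (%)
(not preserved) | 176 | 9.4%
(not preserved) | 116 | 6.2%
(not preserved) | 114 | 6.1%
(not preserved) | 97 | 5.2%
(not preserved) | 64 | 3.4%
(not preserved) | 59 | 3.2%
(not preserved) | 54 | 2.9%
(not preserved) | 45 | 2.4%
(not preserved) | 44 | 2.4%
(not preserved) | 44 | 2.4%
Other values (143) | 1051 | 56.4%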

성별 (gender)
Categorical

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

남성 (male): 274
여성 (female): 61

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남성 (male)
2nd row: 여성 (female)
3rd row: 남성 (male)
4th row: 여성 (female)
5th row: 남성 (male)

Common Values

Value | Count | Frequency (%)
남성 (male) | 274 | 81.8%
여성 (female) | 61 | 18.2%
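The Count column counts rows, not members. Each row also carries a 회원수 (member count) value, and that column sums to 380 across the 335 rows, so member-weighted totals per gender can differ from the row counts shown here. This sketch computes both tallies. It assumes records are dicts, and the three input rows (including the 회원수 value of 3) are hypothetical.

```python
from collections import Counter


def row_and_member_counts(rows, key, weight="회원수"):
    """Return (row counts, member-weighted counts) for the column `key`."""
    by_rows = Counter(r[key] for r in rows)
    by_members = Counter()
    for r in rows:
        by_members[r[key]] += r[weight]  # add the row's member count, not 1
    return by_rows, by_members


sample = [
    {"성별": "남성", "회원수": 1},
    {"성별": "남성", "회원수": 3},
    {"성별": "여성", "회원수": 1},
]
print(row_and_member_counts(sample, "성별"))
```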

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

회원수 (member count)
Real number (ℝ)

Distinct: 6
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1343284
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.65422447
Coefficient of variation (CV): 0.57675052
Kurtosis: 106.7096
Mean: 1.1343284
Median Absolute Deviation (MAD): 0
Skewness: 9.0202348
Sum: 380
Variance: 0.42800965
Monotonicity: Not monotonic
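The column takes only six values, so these statistics can be checked against the value-count tables that follow. The sum is 309·1 + 18·2 + 4·3 + 2·4 + 5 + 10 = 380, and the mean is 380 / 335. This sketch rebuilds the 335 values from those counts and recomputes the figures with the standard library. Variance uses the n - 1 denominator. Skewness and kurtosis use the bias-adjusted sample estimators (the ones pandas uses for `Series.skew` and `Series.kurt`, with kurtosis reported as excess kurtosis), which match the reported 9.0202348 and 106.7096 to the printed precision.

```python
import math
import statistics

# 회원수 rebuilt from its value counts (value: number of rows).
VALUE_COUNTS = {1: 309, 2: 18, 3: 4, 4: 2, 5: 1, 10: 1}
data = sorted(v for v, c in VALUE_COUNTS.items() for _ in range(c))


def describe(xs):
    """Recompute the report's descriptive statistics for a numeric sample."""
    n = len(xs)
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)  # sample variance, n - 1 denominator
    sd = math.sqrt(var)
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    # Bias-adjusted sample skewness (G1) and excess kurtosis (G2).
    skew = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    kurt = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    cuts = statistics.quantiles(xs, n=20, method="inclusive")  # 5th, 10th, ..., 95th
    return {
        "n": n, "sum": sum(xs), "mean": mean, "variance": var, "std": sd,
        "cv": sd / mean, "skewness": skew, "kurtosis": kurt,
        "p5": cuts[0], "median": statistics.median(xs), "p95": cuts[-1],
    }


print(describe(data))
```

The "inclusive" quantile method interpolates linearly between order statistics. Here it puts the 95th percentile among the 18 rows with value 2, which matches the report.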
[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value | Count | Frequency (%)
1 | 309 | 92.2%
2 | 18 | 5.4%
3 | 4 | 1.2%
4 | 2 | 0.6%
10 | 1 | 0.3%
5 | 1 | 0.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 309 | 92.2%
2 | 18 | 5.4%
3 | 4 | 1.2%
4 | 2 | 0.6%
5 | 1 | 0.3%
10 | 1 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
10 | 1 | 0.3%
5 | 1 | 0.3%
4 | 2 | 0.6%
3 | 4 | 1.2%
2 | 18 | 5.4%
1 | 309 | 92.2%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 가입년도 | 회원국적 | 연구분야(대) | 성별 | 회원수
가입년도 | 1.000 | 0.000 | 0.181 | 0.000 | 0.000
회원국적 | 0.000 | 1.000 | 0.384 | 0.395 | 0.000
연구분야(대) | 0.181 | 0.384 | 1.000 | 0.238 | 0.000
성별 | 0.000 | 0.395 | 0.238 | 1.000 | 0.000
회원수 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | 연구분야(대) | 회원국적 | 성별 | 가입년도
연구분야(대) | 1.000 | 0.101 | 0.205 | 0.088
회원국적 | 0.101 | 1.000 | 0.309 | 0.000
성별 | 0.205 | 0.309 | 1.000 | 0.000
가입년도 | 0.088 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | 회원수 | 가입년도 | 회원국적 | 연구분야(대) | 성별
회원수 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
가입년도 | 0.000 | 1.000 | 0.000 | 0.088 | 0.000
회원국적 | 0.000 | 0.000 | 1.000 | 0.101 | 0.309
연구분야(대) | 0.000 | 0.088 | 0.101 | 1.000 | 0.205
성별 | 0.000 | 0.000 | 0.309 | 0.205 | 1.000
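The report does not label which coefficient produced each matrix, and the same pair can score differently across them (회원국적 and 성별: 0.395 in one, 0.309 in the others). For pairs of categorical variables, one common measure that ydata-profiling can compute is Cramér's V. It rescales the chi-square statistic of the contingency table to the range 0 to 1 as V = sqrt(χ² / (n · (k - 1))), where k is the smaller number of categories. The sketch below uses the uncorrected textbook formula. ydata-profiling may apply its own bias correction, so this code will not reproduce the figures above digit for digit.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two categorical sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed contingency-table cells
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0  # undefined when a variable has a single category; report 0 here
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # perfect association
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # no association
```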

Missing values

[Figure: A simple visualization of nullity by column.]

[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

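First rows

Index | 가입년도 | 회원국적 | 연구분야(대) | 연구분야(중) | 성별 | 회원수
0 | 2019 | 이집트 | 화학 | 유기 화학 | 남성 | 1
1 | 2020 | 네팔 | 보건의료 | 기타 보건 의료 | 여성 | 1
2 | 2020 | 네팔 | 보건의료 | 의약품/ 의약품 개발 | 남성 | 1
3 | 2020 | 네팔 | 보건의료 | 의약품/ 의약품 개발 | 여성 | 1
4 | 2020 | 네팔 | 생명과학 | 기타 생명 과학 | 남성 | 1
5 | 2020 | 네팔 | 재료 | 세라믹 재료 | 남성 | 1
6 | 2020 | 네팔 | 전기/ 전자 | 기타 전기/전자 | 남성 | 1
7 | 2020 | 네팔 | 화학 | 전기 화학 | 남성 | 1
8 | 2020 | 노르웨이 | 생명과학 | 분자 세포 생물학 | 여성 | 1
9 | 2020 | 대한민국 | 생명과학 | 분자 세포 생물학 | 남성 | 1

Last rows

Index | 가입년도 | 회원국적 | 연구분야(대) | 연구분야(중) | 성별 | 회원수
325 | 2023 | 인도 | 화학 | 융합 화학 | 남성 | 1
326 | 2023 | 인도네시아 | 화공 | 기타 화공 | 남성 | 1
327 | 2023 | 파키스탄 | 농림수산식품 | 식량 작물 과학 | 남성 | 1
328 | 2023 | 파키스탄 | 물리학 | 응집 물질 물리 | 여성 | 1
329 | 2023 | 파키스탄 | 생명과학 | 분자 세포 생물학 | 남성 | 1
330 | 2023 | 파키스탄 | 전기/ 전자 | 기타 전기/전자 | 남성 | 1
331 | 2023 | 파키스탄 | 화공 | 나노 화학 공정 기술 | 남성 | 1
332 | 2023 | 포르투갈 | 물리학 | 기타 물리학 | 여성 | 1
333 | 2023 | 나이지리아 | 에너지/ 자원 | 신재생 에너지 | 남성 | 1
334 | 2023 | 케냐 | 건설/ 교통 | 물류 기술 | 남성 | 1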

Duplicate rows

Most frequently occurring

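Index | 가입년도 | 회원국적 | 연구분야(대) | 연구분야(중) | 성별 | 회원수 | # duplicates
0 | 2020 | 대한민국 | 보건의료 | 보건학 | 여성 | 1 | 2
1 | 2020 | 대한민국 | 생명과학 | 분자 세포 생물학 | 남성 | 1 | 2
2 | 2020 | 미국 | 생명과학 | 분자 세포 생물학 | 여성 | 1 | 2
3 | 2020 | 방글라데시 | 생명과학 | 분자 세포 생물학 | 여성 | 1 | 2
4 | 2020 | 방글라데시 | 정보/ 통신 | 이동 통신 | 남성 | 1 | 2
5 | 2020 | 베트남 | 생명과학 | 분자 세포 생물학 | 여성 | 1 | 2
6 | 2020 | 베트남 | 재료 | 기타 재료 | 남성 | 1 | 2
7 | 2020 | 인도 | 보건의료 | 의생명 과학 | 남성 | 1 | 2
8 | 2020 | 인도 | 에너지/ 자원 | 신재생 에너지 | 남성 | 1 | 2
9 | 2020 | 인도 | 재료 | 분석/ 물성 평가 기술 | 남성 | 1 | 2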
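Every combination listed above occurs exactly twice, and each carries 회원수 = 1. The report does not say whether these rows are data-entry duplicates or separate sign-ups that happen to share every field. If they are separate sign-ups, dropping them would undercount members, so it may be safer to inspect them before deduplicating. This standard-library sketch lists fully repeated rows with their occurrence counts, in the spirit of the # duplicates column. The input rows are hypothetical, modeled on the first entry above.

```python
from collections import Counter

COLUMNS = ["가입년도", "회원국적", "연구분야(대)", "연구분야(중)", "성별", "회원수"]


def duplicate_groups(rows, columns):
    """Map each fully repeated row (as a tuple) to how many times it occurs."""
    occurrences = Counter(tuple(r[c] for c in columns) for r in rows)
    return {row: n for row, n in occurrences.items() if n > 1}


# Hypothetical input: the first listed duplicate twice plus one distinct row.
rows = [
    dict(zip(COLUMNS, ["2020", "대한민국", "보건의료", "보건학", "여성", 1])),
    dict(zip(COLUMNS, ["2020", "대한민국", "보건의료", "보건학", "여성", 1])),
    dict(zip(COLUMNS, ["2019", "이집트", "화학", "유기 화학", "남성", 1])),
]
for row, n in duplicate_groups(rows, COLUMNS).items():
    print(n, row)
```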