Overview

Dataset statistics

Number of variables: 6
Number of observations: 2110
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 101.1 KiB
Average record size in memory: 49.1 B
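
The derived figures in these statistics can be checked from the raw counts. The Python sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations. Both formulas are assumptions about how the report derives the values, and the variable names are illustrative.

```python
# Values copied from the Dataset statistics.
n_variables = 6
n_observations = 2110
missing_cells = 1
total_size_kib = 101.1

# Assumption: the percentage is taken over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions a single missing cell falls well below 0.1%, which is consistent with the "< 0.1%" shown.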

Variable types

Categorical: 3
Text: 2
Numeric: 1

Dataset

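Description: Personnel allocation roster for technical research personnel (전문연구요원), covering corporate-affiliated research institutes, basic science research institutions, university-affiliated research institutions, defense industry research institutions, natural science graduate schools, regional innovation research institutions, and specified, government-funded, national, and public research institutes. This is the 2024 roster.
Author: Military Manpower Administration (병무청)
URL: https://www.data.go.kr/data/3068274/fileData.do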

Alerts

High correlation: 분야 (field) is highly correlated overall with 24년 배정인원 (2024 allocated personnel)
High correlation: 24년 배정인원 (2024 allocated personnel) is highly correlated overall with 분야 (field)

Reproduction

Analysis started: 2024-03-14 20:06:37.793013
Analysis finished: 2024-03-14 20:06:39.489038
Duration: 1.7 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
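
The reported duration follows from the two timestamps. A minimal Python sketch, assuming both timestamps are in the same time zone (the report does not state one):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-03-14 20:06:37.793013")
finished = datetime.fromisoformat("2024-03-14 20:06:39.489038")

# Elapsed wall-clock time in seconds.
elapsed = (finished - started).total_seconds()
print(f"{elapsed:.1f} seconds")
```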

Variables

분야 (field)
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 KiB

Value | Count
벤처기업부설연구소 | 1065
중소기업부설연구소 | 404
중견기업부설연구소 | 170
자연계대학부설연구기관 | 145
대학원연구기관 | 119
Other values (9) | 207

Length

Max length: 11
Median length: 9
Mean length: 8.8890995
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 과기원
2nd row: 과기원
3rd row: 과기원
4th row: 과기원
5th row: 과기원

Common Values

Value | Count | Frequency (%)
벤처기업부설연구소 | 1065 | 50.5%
중소기업부설연구소 | 404 | 19.1%
중견기업부설연구소 | 170 | 8.1%
자연계대학부설연구기관 | 145 | 6.9%
대학원연구기관 | 119 | 5.6%
정부출연연구소 | 48 | 2.3%
대기업부설연구소 | 43 | 2.0%
국가기관 등 연구소 | 37 | 1.8%
과기원부설연구소 | 28 | 1.3%
특정연구소 | 17 | 0.8%
Other values (4) | 34 | 1.6%
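
The frequency column is each count divided by the number of observations. The Python sketch below recomputes the percentages from the counts listed for 분야. Rounding to one decimal place is an assumption about how the report formats the values.

```python
# Counts copied from the Common Values table for 분야.
counts = {
    "벤처기업부설연구소": 1065,
    "중소기업부설연구소": 404,
    "중견기업부설연구소": 170,
    "자연계대학부설연구기관": 145,
    "대학원연구기관": 119,
    "정부출연연구소": 48,
    "대기업부설연구소": 43,
    "국가기관 등 연구소": 37,
    "과기원부설연구소": 28,
    "특정연구소": 17,
    "Other values (4)": 34,
}

# The counts should add up to the number of observations.
total = sum(counts.values())

# Assumption: percentages are rounded to one decimal place.
percentages = {
    value: round(100 * count / total, 1) for value, count in counts.items()
}

for value, pct in percentages.items():
    print(f"{value}: {pct}%")
```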

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
벤처기업부설연구소 | 1065 | 48.8%
중소기업부설연구소 | 404 | 18.5%
중견기업부설연구소 | 170 | 7.8%
자연계대학부설연구기관 | 145 | 6.6%
대학원연구기관 | 119 | 5.4%
정부출연연구소 | 48 | 2.2%
대기업부설연구소 | 43 | 2.0%
국가기관 | 37 | 1.7%
등 | 37 | 1.7%
연구소 | 37 | 1.7%
Other values (6) | 79 | 3.6%

업체명 (organization name)
Text

Distinct: 2109
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 KiB

Length

Max length: 46
Median length: 34
Mean length: 15.18673
Min length: 5

Characters and Unicode

Total characters: 32044
Distinct characters: 617
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
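
As an illustration of these character properties, the Python sketch below counts the Unicode general category of each character in a string using the standard-library `unicodedata` module. The sample string is hypothetical and is not taken from the dataset. The helper name `category_counts` is likewise illustrative.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Lu', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Hypothetical sample value, not taken from the dataset.
sample = "한국과학기술원 R&D (KAIST)"
counts = category_counts(sample)

for category, n in counts.most_common():
    print(category, n)
```

The two-letter codes map onto the category names used in this report: `Lo` is Other Letter (Hangul syllables fall here), `Lu` is Uppercase Letter, `Zs` is Space Separator, `Ps` and `Pe` are Open and Close Punctuation, and `Po` is Other Punctuation.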

Unique

Unique: 2108
Unique (%): 99.9%

Sample

1st row: 한국과학기술원
2nd row: 광주과학기술원
3rd row: 한국과학기술원경영대학
4th row: 울산과학기술원
5th row: 대구경북과학기술원

Words

Value | Count | Frequency (%)
기업부설연구소 | 628 | 16.3%
기술연구소 | 114 | 3.0%
부설연구소 | 113 | 2.9%
연구소 | 100 | 2.6%
center | 67 | 1.7%
r&d | 62 | 1.6%
중앙연구소 | 26 | 0.7%
서울대학교 | 22 | 0.6%
고려대학교 | 17 | 0.4%
성균관대학교 | 14 | 0.4%
Other values (2444) | 2701 | 69.9%

Most occurring characters

Value | Count | Frequency (%)
 | 1923 | 6.0%
 | 1906 | 5.9%
 | 1776 | 5.5%
 | 1764 | 5.5%
( | 1589 | 5.0%
) | 1589 | 5.0%
 | 1566 | 4.9%
 | 1254 | 3.9%
 | 1038 | 3.2%
 | 1023 | 3.2%
Other values (607) | 16616 | 51.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 25631 | 80.0%
Space Separator | 1764 | 5.5%
Open Punctuation | 1589 | 5.0%
Close Punctuation | 1589 | 5.0%
Uppercase Letter | 664 | 2.1%
Lowercase Letter | 645 | 2.0%
Other Punctuation | 124 | 0.4%
Decimal Number | 26 | 0.1%
Dash Punctuation | 6 | < 0.1%
Other Symbol | 3 | < 0.1%
Other values (2) | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1923 | 7.5%
 | 1906 | 7.4%
 | 1776 | 6.9%
 | 1566 | 6.1%
 | 1254 | 4.9%
 | 1038 | 4.0%
 | 1023 | 4.0%
 | 791 | 3.1%
 | 726 | 2.8%
 | 692 | 2.7%
Other values (540) | 12936 | 50.5%

Uppercase Letter
Value | Count | Frequency (%)
R | 123 | 18.5%
D | 115 | 17.3%
C | 93 | 14.0%
I | 54 | 8.1%
T | 45 | 6.8%
A | 40 | 6.0%
S | 32 | 4.8%
E | 26 | 3.9%
N | 18 | 2.7%
K | 15 | 2.3%
Other values (15) | 103 | 15.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 176 | 27.3%
n | 87 | 13.5%
t | 83 | 12.9%
r | 82 | 12.7%
a | 39 | 6.0%
i | 23 | 3.6%
c | 19 | 2.9%
l | 18 | 2.8%
o | 17 | 2.6%
s | 15 | 2.3%
Other values (13) | 86 | 13.3%

Decimal Number
Value | Count | Frequency (%)
2 | 11 | 42.3%
1 | 8 | 30.8%
3 | 3 | 11.5%
4 | 1 | 3.8%
8 | 1 | 3.8%
7 | 1 | 3.8%
0 | 1 | 3.8%

Other Punctuation
Value | Count | Frequency (%)
& | 107 | 86.3%
. | 8 | 6.5%
· | 4 | 3.2%
, | 3 | 2.4%
/ | 2 | 1.6%

Space Separator
Value | Count | Frequency (%)
 | 1764 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1589 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1589 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 6 | 100.0%

Other Symbol
Value | Count | Frequency (%)
 | 3 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 25634 | 80.0%
Common | 5100 | 15.9%
Latin | 1310 | 4.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1923 | 7.5%
 | 1906 | 7.4%
 | 1776 | 6.9%
 | 1566 | 6.1%
 | 1254 | 4.9%
 | 1038 | 4.0%
 | 1023 | 4.0%
 | 791 | 3.1%
 | 726 | 2.8%
 | 692 | 2.7%
Other values (541) | 12939 | 50.5%

Latin
Value | Count | Frequency (%)
e | 176 | 13.4%
R | 123 | 9.4%
D | 115 | 8.8%
C | 93 | 7.1%
n | 87 | 6.6%
t | 83 | 6.3%
r | 82 | 6.3%
I | 54 | 4.1%
T | 45 | 3.4%
A | 40 | 3.1%
Other values (39) | 412 | 31.5%

Common
Value | Count | Frequency (%)
 | 1764 | 34.6%
( | 1589 | 31.2%
) | 1589 | 31.2%
& | 107 | 2.1%
2 | 11 | 0.2%
. | 8 | 0.2%
1 | 8 | 0.2%
- | 6 | 0.1%
· | 4 | 0.1%
3 | 3 | 0.1%
Other values (7) | 11 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 25631 | 80.0%
ASCII | 6405 | 20.0%
None | 7 | < 0.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 1923 | 7.5%
 | 1906 | 7.4%
 | 1776 | 6.9%
 | 1566 | 6.1%
 | 1254 | 4.9%
 | 1038 | 4.0%
 | 1023 | 4.0%
 | 791 | 3.1%
 | 726 | 2.8%
 | 692 | 2.7%
Other values (540) | 12936 | 50.5%

ASCII
Value | Count | Frequency (%)
 | 1764 | 27.5%
( | 1589 | 24.8%
) | 1589 | 24.8%
e | 176 | 2.7%
R | 123 | 1.9%
D | 115 | 1.8%
& | 107 | 1.7%
C | 93 | 1.5%
n | 87 | 1.4%
t | 83 | 1.3%
Other values (54) | 679 | 10.6%

None
Value | Count | Frequency (%)
· | 4 | 57.1%
 | 3 | 42.9%

Number Forms
Value | Count | Frequency (%)
 | 1 | 100.0%

소재지 (location)
Text

Distinct: 961
Distinct (%): 45.6%
Missing: 1
Missing (%): < 0.1%
Memory size: 16.6 KiB

Length

Max length: 27
Median length: 22
Mean length: 15.772878
Min length: 11

Characters and Unicode

Total characters: 33265
Distinct characters: 339
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 650
Unique (%): 30.8%

Sample

1st row: 대전광역시 유성구 대학로
2nd row: 광주광역시 북구 첨단과기로
3rd row: 서울특별시 동대문구 회기로
4th row: 울산광역시 울주군 언양읍 유니스트길
5th row: 대구광역시 달성군 현풍읍 테크노중앙대로

Words

Value | Count | Frequency (%)
서울특별시 | 843 | 11.8%
경기도 | 637 | 8.9%
성남시 | 233 | 3.3%
강남구 | 200 | 2.8%
대전광역시 | 173 | 2.4%
유성구 | 152 | 2.1%
분당구 | 149 | 2.1%
서초구 | 94 | 1.3%
수원시 | 93 | 1.3%
금천구 | 68 | 1.0%
Other values (1185) | 4480 | 62.9%

Most occurring characters

Value | Count | Frequency (%)
 | 5013 | 15.1%
 | 2131 | 6.4%
 | 2042 | 6.1%
 | 1880 | 5.7%
 | 1142 | 3.4%
 | 923 | 2.8%
 | 911 | 2.7%
 | 891 | 2.7%
 | 889 | 2.7%
 | 736 | 2.2%
Other values (329) | 16707 | 50.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 26784 | 80.5%
Space Separator | 5013 | 15.1%
Decimal Number | 1468 | 4.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2131 | 8.0%
 | 2042 | 7.6%
 | 1880 | 7.0%
 | 1142 | 4.3%
 | 923 | 3.4%
 | 911 | 3.4%
 | 891 | 3.3%
 | 889 | 3.3%
 | 736 | 2.7%
 | 731 | 2.7%
Other values (318) | 14508 | 54.2%

Decimal Number
Value | Count | Frequency (%)
1 | 334 | 22.8%
2 | 236 | 16.1%
5 | 148 | 10.1%
3 | 138 | 9.4%
8 | 123 | 8.4%
6 | 120 | 8.2%
4 | 117 | 8.0%
0 | 90 | 6.1%
9 | 85 | 5.8%
7 | 77 | 5.2%

Space Separator
Value | Count | Frequency (%)
 | 5013 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 26784 | 80.5%
Common | 6481 | 19.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2131 | 8.0%
 | 2042 | 7.6%
 | 1880 | 7.0%
 | 1142 | 4.3%
 | 923 | 3.4%
 | 911 | 3.4%
 | 891 | 3.3%
 | 889 | 3.3%
 | 736 | 2.7%
 | 731 | 2.7%
Other values (318) | 14508 | 54.2%

Common
Value | Count | Frequency (%)
 | 5013 | 77.3%
1 | 334 | 5.2%
2 | 236 | 3.6%
5 | 148 | 2.3%
3 | 138 | 2.1%
8 | 123 | 1.9%
6 | 120 | 1.9%
4 | 117 | 1.8%
0 | 90 | 1.4%
9 | 85 | 1.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 26784 | 80.5%
ASCII | 6481 | 19.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 5013 | 77.3%
1 | 334 | 5.2%
2 | 236 | 3.6%
5 | 148 | 2.3%
3 | 138 | 2.1%
8 | 123 | 1.9%
6 | 120 | 1.9%
4 | 117 | 1.8%
0 | 90 | 1.4%
9 | 85 | 1.3%

Hangul
Value | Count | Frequency (%)
 | 2131 | 8.0%
 | 2042 | 7.6%
 | 1880 | 7.0%
 | 1142 | 4.3%
 | 923 | 3.4%
 | 911 | 3.4%
 | 891 | 3.3%
 | 889 | 3.3%
 | 736 | 2.7%
 | 731 | 2.7%
Other values (318) | 14508 | 54.2%

선정년도 (year of designation)
Real number (ℝ)

Distinct: 44
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.8938
Minimum: 1973
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.7 KiB

Quantile statistics

Minimum: 1973
5th percentile: 1996
Q1: 2016
Median: 2020
Q3: 2022
95th percentile: 2024
Maximum: 2024
Range: 51
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 8.5637159
Coefficient of variation (CV): 0.0042459924
Kurtosis: 4.0139235
Mean: 2016.8938
Median Absolute Deviation (MAD): 2
Skewness: -2.114402
Sum: 4255646
Variance: 73.337231
Monotonicity: Not monotonic
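
Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The Python sketch below recomputes them from the values reported for 선정년도 as a consistency check. The variable names are illustrative.

```python
# Values copied from the statistics reported for 선정년도.
n_observations = 2110
total = 4255646
std = 8.5637159
minimum, maximum = 1973, 2024
q1, q3 = 2016, 2022

mean = total / n_observations   # sum / n
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
value_range = maximum - minimum
iqr = q3 - q1

print(f"mean={mean:.4f} cv={cv:.10f} variance={variance:.6f}")
print(f"range={value_range} iqr={iqr}")
```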
Histogram with fixed size bins (bins=44)

Common values

Value | Count | Frequency (%)
2022 | 360 | 17.1%
2021 | 315 | 14.9%
2020 | 237 | 11.2%
2024 | 168 | 8.0%
2023 | 130 | 6.2%
2019 | 125 | 5.9%
2018 | 122 | 5.8%
2017 | 96 | 4.5%
2016 | 73 | 3.5%
2015 | 60 | 2.8%
Other values (34) | 424 | 20.1%

Minimum 10 values

Value | Count | Frequency (%)
1973 | 1 | < 0.1%
1978 | 1 | < 0.1%
1981 | 1 | < 0.1%
1982 | 7 | 0.3%
1984 | 2 | 0.1%
1985 | 2 | 0.1%
1986 | 6 | 0.3%
1987 | 4 | 0.2%
1988 | 2 | 0.1%
1989 | 2 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2024 | 168 | 8.0%
2023 | 130 | 6.2%
2022 | 360 | 17.1%
2021 | 315 | 14.9%
2020 | 237 | 11.2%
2019 | 125 | 5.9%
2018 | 122 | 5.8%
2017 | 96 | 4.5%
2016 | 73 | 3.5%
2015 | 60 | 2.8%

24년 배정인원 (2024 allocated personnel)
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 KiB

Value | Count
총괄배정 | 1032
미신청 | 507
미배정 | 202
교육부 총괄배정 | 119
1 | 111
Other values (9) | 139

Length

Max length: 8
Median length: 4
Mean length: 3.549763
Min length: 1

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: 328
2nd row: 54
3rd row: KAIST 포함
4th row: 54
5th row: 37

Common Values

Value | Count | Frequency (%)
총괄배정 | 1032 | 48.9%
미신청 | 507 | 24.0%
미배정 | 202 | 9.6%
교육부 총괄배정 | 119 | 5.6%
1 | 111 | 5.3%
2 | 61 | 2.9%
0 | 60 | 2.8%
배정제한 | 7 | 0.3%
5 | 3 | 0.1%
3 | 3 | 0.1%
Other values (4) | 5 | 0.2%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
총괄배정 | 1151 | 51.6%
미신청 | 507 | 22.7%
미배정 | 202 | 9.1%
교육부 | 119 | 5.3%
1 | 111 | 5.0%
2 | 61 | 2.7%
0 | 60 | 2.7%
배정제한 | 7 | 0.3%
5 | 3 | 0.1%
3 | 3 | 0.1%
Other values (5) | 6 | 0.3%

관할청 (regional office with jurisdiction)
Categorical

Distinct: 14
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.6 KiB

Value | Count
서울 | 844
경인 | 566
대전.충남 | 226
인천 | 111
부산.울산 | 85
Other values (9) | 278

Length

Max length: 5
Median length: 2
Mean length: 2.6312796
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대전.충남
2nd row: 광주.전남
3rd row: 서울
4th row: 부산.울산
5th row: 대구.경북

Common Values

Value | Count | Frequency (%)
서울 | 844 | 40.0%
경인 | 566 | 26.8%
대전.충남 | 226 | 10.7%
인천 | 111 | 5.3%
부산.울산 | 85 | 4.0%
대구.경북 | 83 | 3.9%
충북 | 48 | 2.3%
경남 | 39 | 1.8%
광주.전남 | 32 | 1.5%
강원 | 25 | 1.2%
Other values (4) | 51 | 2.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
서울 | 844 | 40.0%
경인 | 566 | 26.8%
대전.충남 | 226 | 10.7%
인천 | 111 | 5.3%
부산.울산 | 85 | 4.0%
대구.경북 | 83 | 3.9%
충북 | 48 | 2.3%
경남 | 39 | 1.8%
광주.전남 | 32 | 1.5%
강원 | 25 | 1.2%
Other values (4) | 51 | 2.4%

Interactions


Correlations

Variable | 분야 | 선정년도 | 24년 배정인원 | 관할청
분야 | 1.000 | 0.545 | 0.946 | 0.497
선정년도 | 0.545 | 1.000 | 0.494 | 0.069
24년 배정인원 | 0.946 | 0.494 | 1.000 | 0.403
관할청 | 0.497 | 0.069 | 0.403 | 1.000

Variable | 분야 | 24년 배정인원 | 관할청
분야 | 1.000 | 0.562 | 0.150
24년 배정인원 | 0.562 | 1.000 | 0.116
관할청 | 0.150 | 0.116 | 1.000

Variable | 선정년도 | 분야 | 24년 배정인원 | 관할청
선정년도 | 1.000 | 0.268 | 0.225 | 0.035
분야 | 0.268 | 1.000 | 0.562 | 0.150
24년 배정인원 | 0.225 | 0.562 | 1.000 | 0.116
관할청 | 0.035 | 0.150 | 0.116 | 1.000
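
The report does not state which association measure produced each matrix. One measure commonly used for pairs of categorical variables such as 분야 and 24년 배정인원 is Cramér's V, derived from the chi-squared statistic of the contingency table. The Python sketch below implements the plain, uncorrected form on hypothetical toy labels. It is an assumption for illustration, not the report's exact computation, and it is not expected to reproduce the figures in these matrices.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    # Degrees-of-freedom term: min(#rows, #cols) - 1.
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical toy labels, not taken from the dataset.
field = ["A", "A", "B", "B"]
perfectly_aligned = ["x", "x", "y", "y"]
unrelated = ["x", "y", "x", "y"]

print(cramers_v(field, perfectly_aligned))
print(cramers_v(field, unrelated))
```

A value near 1 means one variable almost determines the other, and a value near 0 means no association. This matches the reading of the 0.946 entry for 分야 and 24년 배정인원 as a strong association only in a qualitative sense.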

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
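
Nullity by column amounts to counting, for each column, the rows where the value is absent. A minimal Python sketch using only the standard library follows. The helper `missing_per_column`, the treatment of empty strings as missing, and the second example row are assumptions for illustration. The column names come from the report.

```python
def missing_per_column(rows, columns):
    """Return {column: number of rows where the value is None or an empty string}."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }


columns = ["분야", "업체명", "소재지"]
rows = [
    # First row of the dataset, as shown in the Sample section.
    {"분야": "과기원", "업체명": "한국과학기술원", "소재지": "대전광역시 유성구 대학로"},
    # Hypothetical row with a missing location.
    {"분야": "특정연구소", "업체명": "Example Lab", "소재지": None},
]

missing = missing_per_column(rows, columns)
print(missing)
```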

Sample

First rows

Index | 분야 | 업체명 | 소재지 | 선정년도 | 24년 배정인원 | 관할청
0 | 과기원 | 한국과학기술원 | 대전광역시 유성구 대학로 | 1992 | 328 | 대전.충남
1 | 과기원 | 광주과학기술원 | 광주광역시 북구 첨단과기로 | 1996 | 54 | 광주.전남
2 | 과기원 | 한국과학기술원경영대학 | 서울특별시 동대문구 회기로 | 2000 | KAIST 포함 | 서울
3 | 과기원 | 울산과학기술원 | 울산광역시 울주군 언양읍 유니스트길 | 2009 | 54 | 부산.울산
4 | 과기원 | 대구경북과학기술원 | 대구광역시 달성군 현풍읍 테크노중앙대로 | 2011 | 37 | 대구.경북
5 | 과기원부설연구소 | 한국과학기술원자연과학연구소 | 대전광역시 유성구 대학로 | 1990 | 미배정 | 대전.충남
6 | 과기원부설연구소 | 한국과학기술원기계기술연구소 | 대전광역시 유성구 대학로 | 1990 | 미배정 | 대전.충남
7 | 과기원부설연구소 | 한국과학기술원응용과학연구소 | 대전광역시 유성구 대학로 | 1990 | 미배정 | 대전.충남
8 | 과기원부설연구소 | 한국과학기술원정보전자연구소 | 대전광역시 유성구 대학로 | 1990 | 미배정 | 대전.충남
9 | 과기원부설연구소 | 한국과학기술원부설고등과학원 | 서울특별시 동대문구 회기로 | 1997 | 미배정 | 서울

Last rows

Index | 분야 | 업체명 | 소재지 | 선정년도 | 24년 배정인원 | 관할청
2100 | 특정연구소 | 기초과학연구원첨단연성물질연구단 | 울산광역시 울주군 언양읍 유니스트길 | 2016 | 미배정 | 부산.울산
2101 | 특정연구소 | 한국세라믹기술원 이천분원 | 경기도 이천시 신둔면 경충대로 | 2016 | 미배정 | 경인
2102 | 특정연구소 | 기초과학연구원 유전체항상성연구단 | 울산광역시 울주군 언양읍 유니스트길 | 2017 | 미배정 | 부산.울산
2103 | 특정연구소 | 기초과학연구원원자제어저차원전자계연구단 | 경상북도 포항시 남구 청암로 | 2017 | 미배정 | 대구.경북
2104 | 특정연구소 | 기초과학연구원분자활성촉매반응 연구단 | 대전광역시 유성구 대학로 | 2017 | 미배정 | 대전.충남
2105 | 특정연구소 | 한국세라믹기술원 오송융합바이오세라믹 | 충청북도 청주시 흥덕구 오송읍 오송생명1로 | 2018 | 미배정 | 충북
2106 | 특정연구소 | 한국원자력통제기술원 | 대전광역시 유성구 유성대로 | 2022 | 미배정 | 대전.충남
2107 | 특정연구소 | 기초과학연구원 시냅스 뇌질환 연구단 | 대전광역시 유성구 대학로 | 2022 | 미배정 | 대전.충남
2108 | 특정연구소 | 기초과학연구원 중이온가속기연구소 | 대전광역시 유성구 국제과학로 | 2022 | 미배정 | 대전.충남
2109 | 특정연구소 | 한국과학기술원 인공위성연구소 | 대전광역시 유성구 대학로 | 2024 | 미배정 | 대전.충남