Overview

Dataset statistics

Number of variables: 7
Number of observations: 2799
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 254
Duplicate rows (%): 9.1%
Total size in memory: 153.2 KiB
Average record size in memory: 56.0 B
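
The two derived figures in this block follow from the raw counts. A minimal sketch of the arithmetic, using only values quoted above; the variable names are illustrative, and KiB is assumed to mean 1024 bytes:

```python
# Values copied from the Dataset statistics block.
n_rows = 2799
n_duplicate_rows = 254
total_memory_kib = 153.2  # assumption: 1 KiB = 1024 bytes

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average bytes per record, derived from the total in-memory size.
avg_record_bytes = total_memory_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")                  # 9.1%
print(f"Average record size in memory: {avg_record_bytes:.1f} B")  # 56.0 B
```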

Variable types

Categorical: 4
Text: 3

Dataset

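Description: Information on the National Science and Technology Standard Classification (국가과학기술표준분류) as of April 6, 2023. The National Science and Technology Standard Classification is a standard classification framework for science and technology. It is used to manage science and technology information, personnel, and R&D programs efficiently; to plan, evaluate, and manage national R&D programs; to carry out science and technology forecasting and technology-level assessment; and to manage and distribute science and technology information. The dataset contains the following columns: 연구분야 (research field), 대분류코드 (major category code), 대분류명 (major category name), 대분류영문명 (major category English name), 중분류명 (middle category name), 중분류 영문명 (middle category English name).
URL: https://www.data.go.kr/data/15113217/fileData.do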

Alerts

Dataset has 254 (9.1%) duplicate rows [Duplicates]
대분류명 is highly overall correlated with 연구분야 and 2 other fields [High correlation]
연구분야 is highly overall correlated with 대분류코드 and 2 other fields [High correlation]
대분류코드 is highly overall correlated with 연구분야 and 2 other fields [High correlation]
대분류 영문명 is highly overall correlated with 연구분야 and 2 other fields [High correlation]

Reproduction

Analysis started: 2023-12-12 14:53:13.282150
Analysis finished: 2023-12-12 14:53:13.871455
Duration: 0.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
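
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal version under stated assumptions: the function name, the default output path, the default encoding and the report title are all hypothetical. The exact settings used for this report live in the downloadable config.json and are not reproduced here.

```python
def build_report(csv_path: str, out_path: str = "report.html", encoding: str = "utf-8") -> None:
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so this module loads even without the optional packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the downloaded file is a plain CSV; adjust `encoding` if needed.
    df = pd.read_csv(csv_path, encoding=encoding)

    profile = ProfileReport(
        df,
        title="National Science and Technology Standard Classification",  # hypothetical title
        dataset={
            "description": "Classification data as of 2023-04-06",
            "url": "https://www.data.go.kr/data/15113217/fileData.do",
        },
    )
    profile.to_file(out_path)
```

A call such as `build_report("classification.csv")` would write `report.html` next to the script, with `classification.csv` standing in for wherever the downloaded file was saved.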

Variables

연구분야
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Value | Count
인문사회학 | 1090
인공물 | 848
생명 | 445
자연 | 343
인간 과학과 기술 | 73

Length

Max length: 9
Median length: 5
Mean length: 3.6538049
Min length: 2
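
The length statistics can be recomputed from the value counts of this variable. A minimal sketch that expands the counts into one length per row; it assumes length means the number of characters, spaces included:

```python
import statistics

# Value counts copied from the 연구분야 summary in this report.
value_counts = {
    "인문사회학": 1090,
    "인공물": 848,
    "생명": 445,
    "자연": 343,
    "인간 과학과 기술": 73,
}

# One length per row; len() counts characters, including the two spaces.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

max_length = max(lengths)
min_length = min(lengths)
mean_length = sum(lengths) / len(lengths)
median_by_row = statistics.median(lengths)

print(max_length, min_length, round(mean_length, 7), median_by_row)
```

Max, min and mean come out as 9, 2 and 3.6538049, matching the block above. The row-wise median computed this way is 3, whereas the report lists 5. The report does not show how its median is derived, so the two figures may rest on different definitions.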

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 자연
2nd row: 자연
3rd row: 자연
4th row: 자연
5th row: 자연

Common Values

Value | Count | Frequency (%)
인문사회학 | 1090 | 38.9%
인공물 | 848 | 30.3%
생명 | 445 | 15.9%
자연 | 343 | 12.3%
인간 과학과 기술 | 73 | 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
인문사회학 | 1090 | 37.0%
인공물 | 848 | 28.8%
생명 | 445 | 15.1%
자연 | 343 | 11.6%
인간 | 73 | 2.5%
과학과 | 73 | 2.5%
기술 | 73 | 2.5%
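
The percentages in this table differ from those in the Common Values table because the denominator differs: here each whitespace-separated token is counted, so the 73 rows of "인간 과학과 기술" contribute three tokens each. A minimal sketch, assuming the tool lowercases each value and splits it on whitespace (the tokenization rule is inferred from the table, not stated in the report):

```python
from collections import Counter

# Value counts copied from the 연구분야 summary in this report.
value_counts = {
    "인문사회학": 1090,
    "인공물": 848,
    "생명": 445,
    "자연": 343,
    "인간 과학과 기술": 73,
}

# Count every whitespace-separated token once per row it appears in.
word_counts = Counter()
for value, count in value_counts.items():
    for word in value.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())  # 2945 tokens across 2799 rows
for word, count in word_counts.most_common():
    print(f"{word} | {count} | {100 * count / total_words:.1f}%")
```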

대분류코드
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Value | Count
HG | 568
HF | 397
LC | 203
LB | 169
EA | 148
Other values (17) | 1314

Length

Max length: 4
Median length: 2
Mean length: 2.0507324
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>
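
The five sample rows read <NA>, yet the column reports no missing values, and the matching 중분류코드 values start with NA01. One plausible explanation, not confirmed by the report, is that the major category code NA was parsed as a missing-value marker when the file was loaded and later rendered as the 4-character text <NA>. That would also account for the Max length of 4. A minimal pandas sketch of the effect and one way to avoid it, using two rows built from codes that appear in this report:

```python
import io

import pandas as pd

# Two illustrative rows; the real file has seven columns.
csv_text = "대분류코드,중분류코드\nNA,NA01\nHG,HG12\n"

# Default parsing: the literal string "NA" is on pandas' default missing-value list.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["대분류코드"].isna().tolist())  # [True, False]

# Reading every cell as text and disabling the default list keeps the code "NA".
text_df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(text_df["대분류코드"].tolist())  # ['NA', 'HG']
```

Disabling the default list is only safe when the file has no genuinely empty cells that should count as missing, which is what the Missing cells figure of 0 suggests here.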

Common Values

Value | Count | Frequency (%)
HG | 568 | 20.3%
HF | 397 | 14.2%
LC | 203 | 7.3%
LB | 169 | 6.0%
EA | 148 | 5.3%
ED | 126 | 4.5%
HH | 125 | 4.5%
ND | 117 | 4.2%
EC | 109 | 3.9%
NC | 98 | 3.5%
Other values (12) | 739 | 26.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
hg | 568 | 20.3%
hf | 397 | 14.2%
lc | 203 | 7.3%
lb | 169 | 6.0%
ea | 148 | 5.3%
ed | 126 | 4.5%
hh | 125 | 4.5%
nd | 117 | 4.2%
ec | 109 | 3.9%
nc | 98 | 3.5%
Other values (12) | 739 | 26.4%

대분류명
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Value | Count
사회과학 | 568
인문학 | 397
보건의료 | 203
농림수산식품 | 169
기계 | 148
Other values (17) | 1314

Length

Max length: 17
Median length: 11
Mean length: 4.4340836
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 수학
2nd row: 수학
3rd row: 수학
4th row: 수학
5th row: 수학

Common Values

Value | Count | Frequency (%)
사회과학 | 568 | 20.3%
인문학 | 397 | 14.2%
보건의료 | 203 | 7.3%
농림수산식품 | 169 | 6.0%
기계 | 148 | 5.3%
전기/전자 | 126 | 4.5%
문화예술체육학 | 125 | 4.5%
지구과학(지구/대기/해양/천문) | 117 | 4.2%
화공 | 109 | 3.9%
화학 | 98 | 3.5%
Other values (12) | 739 | 26.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
사회과학 | 568 | 19.9%
인문학 | 397 | 13.9%
보건의료 | 203 | 7.1%
농림수산식품 | 169 | 5.9%
기계 | 148 | 5.2%
전기/전자 | 126 | 4.4%
문화예술체육학 | 125 | 4.4%
지구과학(지구/대기/해양/천문 | 117 | 4.1%
화공 | 109 | 3.8%
화학 | 98 | 3.4%
Other values (14) | 788 | 27.7%

대분류 영문명
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Value | Count
Social Science | 568
Humanities | 397
Health Sciences | 203
Agriculture, Fishery and Food | 169
Machinery | 148
Other values (17) | 1314

Length

Max length: 48
Median length: 29
Mean length: 16.892104
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mathematics
2nd row: Mathematics
3rd row: Mathematics
4th row: Mathematics
5th row: Mathematics

Common Values

Value | Count | Frequency (%)
Social Science | 568 | 20.3%
Humanities | 397 | 14.2%
Health Sciences | 203 | 7.3%
Agriculture, Fishery and Food | 169 | 6.0%
Machinery | 148 | 5.3%
Electricity/Electronics | 126 | 4.5%
Culture/Arts/Sports | 125 | 4.5%
Earth Science(Earth/Atmosphere/Marine/Astronomy) | 117 | 4.2%
Chemical Engineering | 109 | 3.9%
Chemistry | 98 | 3.5%
Other values (12) | 739 | 26.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
science | 675 | 14.6%
social | 568 | 12.3%
humanities | 397 | 8.6%
sciences | 242 | 5.2%
and | 218 | 4.7%
health | 203 | 4.4%
agriculture | 169 | 3.6%
fishery | 169 | 3.6%
food | 169 | 3.6%
machinery | 148 | 3.2%
Other values (23) | 1677 | 36.2%

중분류코드
Text

Distinct: 277
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 11196
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
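
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. It does not provide script or block names, so the sketch below covers categories only. The helper name `category_counts` is hypothetical, and the mapping of "Uppercase Letter" to `Lu` and "Decimal Number" to `Nd` follows the Unicode category names.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd')."""
    counts = Counter()
    for value in values:
        for char in value:
            counts[unicodedata.category(char)] += 1
    return counts


# A few codes that appear in this report.
codes = ["NA01", "NA02", "HG12", "OC99"]
counts = category_counts(codes)
print(counts["Lu"], counts["Nd"])  # 8 8
```

Each code consists of two uppercase letters followed by two digits, which is why the report shows the two categories at exactly 50.0% each.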

Unique

Unique: 23
Unique (%): 0.8%

Sample

1st row: NA01
2nd row: NA01
3rd row: NA01
4th row: NA01
5th row: NA01

Words

Value | Count | Frequency (%)
hg12 | 63 | 2.3%
hf01 | 59 | 2.1%
hf02 | 48 | 1.7%
hg11 | 45 | 1.6%
hg04 | 44 | 1.6%
hg15 | 43 | 1.5%
hg19 | 41 | 1.5%
hf12 | 39 | 1.4%
hg01 | 38 | 1.4%
hg20 | 35 | 1.3%
Other values (267) | 2344 | 83.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 1849 | 16.5%
1 | 1446 | 12.9%
H | 1292 | 11.5%
E | 935 | 8.4%
G | 641 | 5.7%
2 | 596 | 5.3%
F | 448 | 4.0%
L | 445 | 4.0%
C | 444 | 4.0%
N | 343 | 3.1%
Other values (12) | 2757 | 24.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 5598 | 50.0%
Uppercase Letter | 5598 | 50.0%

Most frequent character per category

Uppercase Letter

Value | Count | Frequency (%)
H | 1292 | 23.1%
E | 935 | 16.7%
G | 641 | 11.5%
F | 448 | 8.0%
L | 445 | 7.9%
C | 444 | 7.9%
N | 343 | 6.1%
B | 325 | 5.8%
A | 316 | 5.6%
D | 243 | 4.3%
Other values (2) | 166 | 3.0%

Decimal Number

Value | Count | Frequency (%)
0 | 1849 | 33.0%
1 | 1446 | 25.8%
2 | 596 | 10.6%
3 | 315 | 5.6%
4 | 278 | 5.0%
5 | 262 | 4.7%
8 | 239 | 4.3%
9 | 230 | 4.1%
7 | 201 | 3.6%
6 | 182 | 3.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5598 | 50.0%
Latin | 5598 | 50.0%

Most frequent character per script

Latin

Value | Count | Frequency (%)
H | 1292 | 23.1%
E | 935 | 16.7%
G | 641 | 11.5%
F | 448 | 8.0%
L | 445 | 7.9%
C | 444 | 7.9%
N | 343 | 6.1%
B | 325 | 5.8%
A | 316 | 5.6%
D | 243 | 4.3%
Other values (2) | 166 | 3.0%

Common

Value | Count | Frequency (%)
0 | 1849 | 33.0%
1 | 1446 | 25.8%
2 | 596 | 10.6%
3 | 315 | 5.6%
4 | 278 | 5.0%
5 | 262 | 4.7%
8 | 239 | 4.3%
9 | 230 | 4.1%
7 | 201 | 3.6%
6 | 182 | 3.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11196 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
0 | 1849 | 16.5%
1 | 1446 | 12.9%
H | 1292 | 11.5%
E | 935 | 8.4%
G | 641 | 5.7%
2 | 596 | 5.3%
F | 448 | 4.0%
L | 445 | 4.0%
C | 444 | 4.0%
N | 343 | 3.1%
Other values (12) | 2757 | 24.6%

중분류명
Text

Distinct: 276
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Length

Max length: 16
Median length: 13
Mean length: 5.5144695
Min length: 2

Characters and Unicode

Total characters: 15435
Distinct characters: 218
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 0.8%

Sample

1st row: 대수학
2nd row: 대수학
3rd row: 대수학
4th row: 대수학
5th row: 대수학

Words

Value | Count | Frequency (%)
문학 | 214 | 6.2%
기타 | 68 | 2.0%
법학 | 63 | 1.8%
역사학 | 59 | 1.7%
철학 | 48 | 1.4%
교육학 | 45 | 1.3%
경영학 | 44 | 1.3%
지리학 | 43 | 1.3%
심리과학 | 41 | 1.2%
한국어와 | 39 | 1.1%
Other values (301) | 2764 | 80.6%

Most occurring characters

ValueCountFrequency (%)
1609
 
10.4%
734
 
4.8%
629
 
4.1%
/ 435
 
2.8%
338
 
2.2%
328
 
2.1%
315
 
2.0%
288
 
1.9%
284
 
1.8%
255
 
1.7%
Other values (208) 10220
66.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 14195 | 92.0%
Space Separator | 629 | 4.1%
Other Punctuation | 599 | 3.9%
Uppercase Letter | 12 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1609
 
11.3%
734
 
5.2%
338
 
2.4%
328
 
2.3%
315
 
2.2%
288
 
2.0%
284
 
2.0%
255
 
1.8%
246
 
1.7%
241
 
1.7%
Other values (203) 9557
67.3%

Other Punctuation

Value | Count | Frequency (%)
/ | 435 | 72.6%
· | 164 | 27.4%

Uppercase Letter

Value | Count | Frequency (%)
S | 6 | 50.0%
W | 6 | 50.0%

Space Separator

Value | Count | Frequency (%)
(space) | 629 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14195 | 92.0%
Common | 1228 | 8.0%
Latin | 12 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1609
 
11.3%
734
 
5.2%
338
 
2.4%
328
 
2.3%
315
 
2.2%
288
 
2.0%
284
 
2.0%
255
 
1.8%
246
 
1.7%
241
 
1.7%
Other values (203) 9557
67.3%

Common

Value | Count | Frequency (%)
(space) | 629 | 51.2%
/ | 435 | 35.4%
· | 164 | 13.4%

Latin

Value | Count | Frequency (%)
S | 6 | 50.0%
W | 6 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 14195 | 92.0%
ASCII | 1076 | 7.0%
None | 164 | 1.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1609
 
11.3%
734
 
5.2%
338
 
2.4%
328
 
2.3%
315
 
2.2%
288
 
2.0%
284
 
2.0%
255
 
1.8%
246
 
1.7%
241
 
1.7%
Other values (203) 9557
67.3%

ASCII

Value | Count | Frequency (%)
(space) | 629 | 58.5%
/ | 435 | 40.4%
S | 6 | 0.6%
W | 6 | 0.6%

None

Value | Count | Frequency (%)
· | 164 | 100.0%

중분류 영문명
Text

Distinct: 276
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Memory size: 22.0 KiB

Length

Max length: 70
Median length: 40
Mean length: 22.034655
Min length: 3

Characters and Unicode

Total characters: 61675
Distinct characters: 56
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 0.8%

Sample

1st row: Algebra
2nd row: Algebra
3rd row: Algebra
4th row: Algebra
5th row: Algebra

Words

Value | Count | Frequency (%)
and | 478 | 7.0%
science | 267 | 3.9%
linguistics | 237 | 3.5%
literature | 231 | 3.4%
management | 108 | 1.6%
technology | 99 | 1.5%
safety | 96 | 1.4%
information | 85 | 1.3%
administration | 77 | 1.1%
chemistry | 75 | 1.1%
Other values (335) | 5039 | 74.2%

Most occurring characters

Value | Count | Frequency (%)
e | 5619 | 9.1%
i | 5587 | 9.1%
n | 4726 | 7.7%
a | 4241 | 6.9%
(space) | 3993 | 6.5%
t | 3952 | 6.4%
o | 3874 | 6.3%
r | 3484 | 5.6%
c | 3260 | 5.3%
s | 3026 | 4.9%
Other values (46) | 19913 | 32.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 50300 | 81.6%
Uppercase Letter | 6732 | 10.9%
Space Separator | 3993 | 6.5%
Other Punctuation | 603 | 1.0%
Dash Punctuation | 38 | 0.1%
Open Punctuation | 9 | < 0.1%

Most frequent character per category

Lowercase Letter

Value | Count | Frequency (%)
e | 5619 | 11.2%
i | 5587 | 11.1%
n | 4726 | 9.4%
a | 4241 | 8.4%
t | 3952 | 7.9%
o | 3874 | 7.7%
r | 3484 | 6.9%
c | 3260 | 6.5%
s | 3026 | 6.0%
l | 2355 | 4.7%
Other values (15) | 10176 | 20.2%

Uppercase Letter

Value | Count | Frequency (%)
S | 777 | 11.5%
M | 608 | 9.0%
P | 593 | 8.8%
L | 583 | 8.7%
C | 514 | 7.6%
E | 509 | 7.6%
A | 440 | 6.5%
D | 361 | 5.4%
F | 353 | 5.2%
T | 279 | 4.1%
Other values (13) | 1715 | 25.5%

Other Punctuation

Value | Count | Frequency (%)
/ | 483 | 80.1%
& | 46 | 7.6%
· | 36 | 6.0%
, | 24 | 4.0%
' | 14 | 2.3%

Space Separator

Value | Count | Frequency (%)
(space) | 3993 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 38 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 9 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 57032 | 92.5%
Common | 4643 | 7.5%

Most frequent character per script

Latin

Value | Count | Frequency (%)
e | 5619 | 9.9%
i | 5587 | 9.8%
n | 4726 | 8.3%
a | 4241 | 7.4%
t | 3952 | 6.9%
o | 3874 | 6.8%
r | 3484 | 6.1%
c | 3260 | 5.7%
s | 3026 | 5.3%
l | 2355 | 4.1%
Other values (38) | 16908 | 29.6%

Common

Value | Count | Frequency (%)
(space) | 3993 | 86.0%
/ | 483 | 10.4%
& | 46 | 1.0%
- | 38 | 0.8%
· | 36 | 0.8%
, | 24 | 0.5%
' | 14 | 0.3%
( | 9 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 61639 | 99.9%
None | 36 | 0.1%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
e | 5619 | 9.1%
i | 5587 | 9.1%
n | 4726 | 7.7%
a | 4241 | 6.9%
(space) | 3993 | 6.5%
t | 3952 | 6.4%
o | 3874 | 6.3%
r | 3484 | 5.7%
c | 3260 | 5.3%
s | 3026 | 4.9%
Other values (45) | 19877 | 32.2%

None

Value | Count | Frequency (%)
· | 36 | 100.0%

Correlations

Variable | 연구분야 | 대분류코드 | 대분류명 | 대분류 영문명
연구분야 | 1.000 | 1.000 | 1.000 | 1.000
대분류코드 | 1.000 | 1.000 | 1.000 | 1.000
대분류명 | 1.000 | 1.000 | 1.000 | 1.000
대분류 영문명 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 대분류명 | 연구분야 | 대분류코드 | 대분류 영문명
대분류명 | 1.000 | 0.997 | 1.000 | 1.000
연구분야 | 0.997 | 1.000 | 0.997 | 0.997
대분류코드 | 1.000 | 0.997 | 1.000 | 1.000
대분류 영문명 | 1.000 | 0.997 | 1.000 | 1.000

Variable | 연구분야 | 대분류코드 | 대분류명 | 대분류 영문명
연구분야 | 1.000 | 0.997 | 0.997 | 0.997
대분류코드 | 0.997 | 1.000 | 1.000 | 1.000
대분류명 | 0.997 | 1.000 | 1.000 | 1.000
대분류 영문명 | 0.997 | 1.000 | 1.000 | 1.000
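
The report does not label the three matrices, so the measure behind each one is not stated here. As background only, one common association measure for pairs of categorical columns is Cramér's V. The sketch below is a plain, uncorrected version written for illustration; it is not necessarily what the profiling tool computes, and the function name and toy rows are assumptions.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else float("nan")


# Toy rows shaped like this dataset: every code maps to exactly one name.
codes = ["HG", "HG", "HF", "LC", "LC", "LC"]
names = ["사회과학", "사회과학", "인문학", "보건의료", "보건의료", "보건의료"]
print(round(cramers_v(codes, names), 3))  # 1.0
```

When one column fully determines the other, as a code determines its name, this measure reaches 1, which is in line with the values of 1.000 between 대분류코드, 대분류명 and 대분류 영문명.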

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 연구분야 | 대분류코드 | 대분류명 | 대분류 영문명 | 중분류코드 | 중분류명 | 중분류 영문명
0 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
1 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
2 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
3 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
4 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
5 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
6 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
7 | 자연 | <NA> | 수학 | Mathematics | NA01 | 대수학 | Algebra
8 | 자연 | <NA> | 수학 | Mathematics | NA02 | 해석학 | Analysis
9 | 자연 | <NA> | 수학 | Mathematics | NA02 | 해석학 | Analysis

Last rows

Index | 연구분야 | 대분류코드 | 대분류명 | 대분류 영문명 | 중분류코드 | 중분류명 | 중분류 영문명
2789 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC03 | 과학기술정책 사회 | Science and Technology Policy Society
2790 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC04 | 생명 의료윤리 | Bioethics Medical Ethics
2791 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC04 | 생명 의료윤리 | Bioethics Medical Ethics
2792 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC04 | 생명 의료윤리 | Bioethics Medical Ethics
2793 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC04 | 생명 의료윤리 | Bioethics Medical Ethics
2794 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC04 | 생명 의료윤리 | Bioethics Medical Ethics
2795 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC05 | 안전사회/재난관리 | Safe Society/Disaster Management
2796 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC05 | 안전사회/재난관리 | Safe Society/Disaster Management
2797 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC05 | 안전사회/재난관리 | Safe Society/Disaster Management
2798 | 인간 과학과 기술 | OC | 과학기술과 인문사회 | Science, Technology and Society | OC99 | 기타 과학기술과 인문사회 | Other Science, Technology and Society

Duplicate rows

Most frequently occurring

Index | 연구분야 | 대분류코드 | 대분류명 | 대분류 영문명 | 중분류코드 | 중분류명 | 중분류 영문명 | # duplicates
188 | 인문사회학 | HG | 사회과학 | Social Science | HG12 | 법학 | Law | 63
155 | 인문사회학 | HF | 인문학 | Humanities | HF01 | 역사학 | History | 59
156 | 인문사회학 | HF | 인문학 | Humanities | HF02 | 철학 | Philosophy | 48
187 | 인문사회학 | HG | 사회과학 | Social Science | HG11 | 교육학 | Education | 45
180 | 인문사회학 | HG | 사회과학 | Social Science | HG04 | 경영학 | Business Administration | 44
191 | 인문사회학 | HG | 사회과학 | Social Science | HG15 | 지리학 | Geographical | 43
195 | 인문사회학 | HG | 사회과학 | Social Science | HG19 | 심리과학 | Psychology | 41
165 | 인문사회학 | HF | 인문학 | Humanities | HF12 | 한국어와 문학 | Korean Linguistics and Literature | 39
177 | 인문사회학 | HG | 사회과학 | Social Science | HG01 | 정치외교학 | Political Science & Diplomacy | 38
196 | 인문사회학 | HG | 사회과학 | Social Science | HG20 | 생활과학 | Human Ecology | 35
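
The "# duplicates" column gives how often each repeated row pattern occurs. The overview figure of 254 appears to count such patterns rather than surplus rows: 277 distinct 중분류코드 values minus the 23 that occur exactly once leaves 254, assuming each 중분류코드 corresponds to one row pattern. That reading is an inference from the numbers in this report, not a statement of how the tool is implemented. A minimal sketch of the distinction, with a hypothetical helper and toy rows:

```python
from collections import Counter


def duplicate_summary(rows):
    """Summarise exact duplicates in a list of row tuples."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    return {
        "rows": len(rows),
        "distinct_rows": len(counts),
        # Row patterns that occur more than once.
        "repeated_patterns": len(repeated),
        # Rows that would be dropped by keeping one copy of each pattern.
        "surplus_rows": sum(n - 1 for n in repeated.values()),
    }


# Toy rows (code, name); real rows would carry all seven columns.
rows = (
    [("HG12", "법학")] * 3
    + [("HF01", "역사학")] * 2
    + [("OC99", "기타 과학기술과 인문사회")]
)
print(duplicate_summary(rows))
```

For the toy rows this reports 6 rows, 3 distinct rows, 2 repeated patterns and 3 surplus rows, which shows how the two ways of counting duplicates can diverge.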