Overview

Dataset statistics

Number of variables: 13
Number of observations: 6653
Missing cells: 7236
Missing cells (%): 8.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 688.8 KiB
Average record size in memory: 106.0 B
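The overview figures are internally consistent. The quick arithmetic check below uses the per-variable missing counts reported further down: 대분류코드 82, 중분류코드 65, 중분류코드한글명 65, 제목 371 and 내용태그 6653. From these it reproduces the missing-cell total, its percentage and the average record size. It is a sketch that assumes 1 KiB = 1024 bytes.

```python
# Per-variable missing counts as reported in this profile
missing_by_column = {
    "대분류코드": 82,
    "중분류코드": 65,
    "중분류코드한글명": 65,
    "제목": 371,
    "내용태그": 6653,
}
n_rows, n_cols = 6653, 13

# Missing cells are counted over the full rows x columns grid
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Average record size = total in-memory size / number of rows (assuming 1 KiB = 1024 B)
avg_record_bytes = 688.8 * 1024 / n_rows

print(missing_cells, round(missing_pct, 1), round(avg_record_bytes, 1))  # 7236 8.4 106.0
```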

Variable types

Numeric: 1
Categorical: 6
Text: 5
Unsupported: 1

Dataset

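Description: Information on the National Science and Technology Standard Classification (국가과학기술표준분류) as of September 23, 2021. The data supports the Science and Technology Standard Classification service and separates the usable information by type, such as research fields and application fields. It also includes an introduction to and overview of the classification. The dataset has the following columns: 분류정보일련번호 (classification record serial number), 분류구분코드 (classification type code), 분류구분코드한글명 (classification type, Korean name), 분류코드 (classification code), 분류코드 한글명 (classification code, Korean name), 분야코드 (field code), 분야코드 한글명 (field, Korean name), 대분류코드 (major category code), 대분류코드 한글명 (major category, Korean name), 중분류코드 (middle category code), 중분류코드 한글명 (middle category, Korean name), 제목 (title), 내용태그 (content tags).
Author: 한국과학기술기획평가원 (KISTEP, Korea Institute of S&T Evaluation and Planning)
URL: https://www.data.go.kr/data/15065876/fileData.do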

Alerts

분류코드 is highly overall correlated with 분류정보일련번호 and 5 other fields [High correlation]
분류구분코드 is highly overall correlated with 분류정보일련번호 and 3 other fields [High correlation]
분야코드한글명 is highly overall correlated with 분류정보일련번호 and 3 other fields [High correlation]
분류코드한글명 is highly overall correlated with 분류코드 and 2 other fields [High correlation]
분야코드 is highly overall correlated with 분류정보일련번호 and 5 other fields [High correlation]
분류구분코드한글명 is highly overall correlated with 분류정보일련번호 and 3 other fields [High correlation]
분류정보일련번호 is highly overall correlated with 분류구분코드 and 4 other fields [High correlation]
분류구분코드 is highly imbalanced (90.3%) [Imbalance]
분류구분코드한글명 is highly imbalanced (90.3%) [Imbalance]
분류코드 is highly imbalanced (91.1%) [Imbalance]
분류코드한글명 is highly imbalanced (95.5%) [Imbalance]
대분류코드 has 82 (1.2%) missing values [Missing]
제목 has 371 (5.6%) missing values [Missing]
내용태그 has 6653 (100.0%) missing values [Missing]
내용태그 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
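The imbalance percentages match a score of one minus the normalized Shannon entropy of the value counts, 1 - H/ln(k) for k categories. The sketch below assumes that definition rather than quoting the tool's source. Applied to the value counts listed under each variable, it reproduces 90.3%, 91.1% and 95.5%.

```python
import math

def imbalance_score(counts):
    """1 - H/ln(k): 0 for a uniform distribution, close to 1 when one category dominates."""
    n = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single category is reported as not imbalanced
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

print(round(100 * imbalance_score([6570, 83]), 1))      # 분류구분코드: 90.3
print(round(100 * imbalance_score([6537, 83, 33]), 1))  # 분류코드: 91.1
print(round(100 * imbalance_score([6620, 33]), 1))      # 분류코드한글명: 95.5
```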

Reproduction

Analysis started: 2023-12-12 14:23:44.682994
Analysis finished: 2023-12-12 14:23:46.814691
Duration: 2.13 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

분류정보일련번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 3364
Distinct (%): 50.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1664.0439
Minimum: 1
Maximum: 3598
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 58.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 167
Q1: 832
Median: 1664
Q3: 2495
95th percentile: 3160.4
Maximum: 3598
Range: 3597
Interquartile range (IQR): 1663

Descriptive statistics

Standard deviation: 960.86299
Coefficient of variation (CV): 0.57742647
Kurtosis: -1.1973128
Mean: 1664.0439
Median absolute deviation (MAD): 832
Skewness: 0.0018812196
Sum: 11070884
Variance: 923257.68
Monotonicity: Increasing
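Most of these statistics can be recomputed with Python's standard library. The sketch below assumes three conventions: a sample standard deviation (n - 1 denominator), linearly interpolated quartiles and an unscaled MAD. These are consistent with the reported values, for example CV = 960.86299 / 1664.0439 ≈ 0.5774 and IQR = 2495 - 832 = 1663. The monotonicity check is non-strict, which fits this column because most serial numbers occur twice.

```python
import statistics

def describe(values):
    """Recompute selected summary statistics for a numeric column given as a list."""
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        # Unscaled median absolute deviation around the median
        "mad": statistics.median(abs(x - median) for x in data),
        # Non-strict check: repeated values still count as increasing
        "monotonic_increasing": all(a <= b for a, b in zip(values, values[1:])),
    }

stats = describe([1, 1, 2, 2, 3, 3, 4, 4])
print(stats["iqr"], stats["mad"], stats["monotonic_increasing"])  # 1.5 1.0 True
```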
[Histogram of 분류정보일련번호 with fixed-size bins (bins=50); image not reproduced]
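With fixed-size bins, the range from minimum to maximum is split into 50 equal-width intervals, so each bin here spans (3598 - 1) / 50 = 71.94 serial numbers. The sketch below assumes numpy.histogram-style rules, in which the last bin also includes the maximum.

```python
def bin_edges(lo, hi, bins=50):
    """Equally spaced edges from lo to hi (bins + 1 values)."""
    return [lo + i * (hi - lo) / bins for i in range(bins + 1)]

def bin_index(x, lo, hi, bins=50):
    """Index of the bin containing x; the right edge of the last bin is inclusive."""
    if not lo <= x <= hi:
        raise ValueError("value outside histogram range")
    if x == hi:
        return bins - 1
    return int((x - lo) * bins / (hi - lo))

edges = bin_edges(1, 3598)
print(len(edges), edges[-1])  # 51 3598.0
print(bin_index(1, 1, 3598), bin_index(1664, 1, 3598), bin_index(3598, 1, 3598))  # 0 23 49
```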
Common Values

Value | Count | Frequency (%)
1 | 2 | < 0.1%
2198 | 2 | < 0.1%
2188 | 2 | < 0.1%
2189 | 2 | < 0.1%
2190 | 2 | < 0.1%
2191 | 2 | < 0.1%
2192 | 2 | < 0.1%
2193 | 2 | < 0.1%
2194 | 2 | < 0.1%
2195 | 2 | < 0.1%
Other values (3354) | 6633 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 2 | < 0.1%
2 | 2 | < 0.1%
3 | 2 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%
6 | 2 | < 0.1%
7 | 2 | < 0.1%
8 | 2 | < 0.1%
9 | 2 | < 0.1%
10 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3598 | 1 | < 0.1%
3372 | 1 | < 0.1%
3371 | 1 | < 0.1%
3370 | 1 | < 0.1%
3369 | 1 | < 0.1%
3368 | 1 | < 0.1%
3358 | 1 | < 0.1%
3357 | 1 | < 0.1%
3356 | 1 | < 0.1%
3355 | 1 | < 0.1%

분류구분코드
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CL001
2nd row: CL001
3rd row: CL001
4th row: CL001
5th row: CL001

Common Values

Value | Count | Frequency (%)
CL001 | 6570 | 98.8%
CL002 | 83 | 1.2%

분류구분코드한글명
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 분류정보
2nd row: 분류정보
3rd row: 분류정보
4th row: 분류정보
5th row: 분류정보

Common Values

Value | Count | Frequency (%)
분류정보 | 6570 | 98.8%
임시정보 | 83 | 1.2%

(분류정보: classification information; 임시정보: provisional information)

분류코드
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RCSARE
2nd row: RCSARE
3rd row: RCSARE
4th row: RCSARE
5th row: RCSARE

Common Values

Value | Count | Frequency (%)
RCSARE | 6537 | 98.3%
TMPR01 | 83 | 1.2%
APPARE | 33 | 0.5%

분류코드한글명
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 연구분야
2nd row: 연구분야
3rd row: 연구분야
4th row: 연구분야
5th row: 연구분야

Common Values

Value | Count | Frequency (%)
연구분야 | 6620 | 99.5%
적용분야 | 33 | 0.5%

(연구분야: research field; 적용분야: application field)
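The HIGH CORRELATION flags among the code and label columns reflect how codes map to labels:

- 분류구분코드 and 분류구분코드한글명 pair one-to-one. The sample rows show CL001 with 분류정보 and CL002 with 임시정보.
- 분류코드 maps many-to-one onto 분류코드한글명. In the sample rows, RCSARE and TMPR01 both carry 연구분야. The pairing of APPARE with 적용분야 is an inference from their equal counts of 33, not something the sample shows.

The sketch below checks this kind of relationship over (code, label) pairs.

```python
from collections import defaultdict

def mapping_kind(pairs):
    """Classify the relationship between a code column and a label column."""
    labels_per_code = defaultdict(set)
    codes_per_label = defaultdict(set)
    for code, label in pairs:
        labels_per_code[code].add(label)
        codes_per_label[label].add(code)
    code_is_function = all(len(s) == 1 for s in labels_per_code.values())
    label_is_function = all(len(s) == 1 for s in codes_per_label.values())
    if code_is_function and label_is_function:
        return "one-to-one"
    if code_is_function:
        return "many-to-one"
    if label_is_function:
        return "one-to-many"
    return "many-to-many"

print(mapping_kind([("CL001", "분류정보"), ("CL002", "임시정보"), ("CL001", "분류정보")]))  # one-to-one
print(mapping_kind([("RCSARE", "연구분야"), ("TMPR01", "연구분야"), ("APPARE", "적용분야")]))  # many-to-one
```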

분야코드
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 9
Median length: 5
Mean length: 5.0499023
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NAARE
2nd row: NAARE
3rd row: NAARE
4th row: NAARE
5th row: NAARE

Common Values

Value | Count | Frequency (%)
ATARE | 1935 | 29.1%
SOARE | 1444 | 21.7%
HUARE | 1215 | 18.3%
LFARE | 993 | 14.9%
NAARE | 781 | 11.7%
SIARE | 169 | 2.5%
TMPR01L01 | 38 | 0.6%
TMPR01L03 | 34 | 0.5%
INARE | 20 | 0.3%
CMARE | 13 | 0.2%

The table lists 10 of the 11 distinct values; the remaining value accounts for 11 rows (6653 - 6642).

분야코드한글명
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 8
Median length: 2
Mean length: 2.4791823
Min length: 2
Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 자연
2nd row: 자연
3rd row: 자연
4th row: 자연
5th row: 자연

Common Values

Value | Count | Frequency (%)
인공물 | 1946 | 29.2%
사회 | 1444 | 21.7%
인간 | 1215 | 18.3%
생명 | 1027 | 15.4%
자연 | 781 | 11.7%
인간과학과 기술 | 207 | 3.1%
산업 | 20 | 0.3%
공공 | 13 | 0.2%

(인공물: artifacts; 사회: society; 인간: humans; 생명: life; 자연: nature; 인간과학과 기술: human science and technology; 산업: industry; 공공: public sector)

Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
인공물 | 1946 | 28.4%
사회 | 1444 | 21.0%
인간 | 1215 | 17.7%
생명 | 1027 | 15.0%
자연 | 781 | 11.4%
인간과학과 | 207 | 3.0%
기술 | 207 | 3.0%
산업 | 20 | 0.3%
공공 | 13 | 0.2%
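The word-frequency table differs from Common Values only because the multi-word value 인간과학과 기술 splits into two tokens. That raises the denominator from 6653 rows to 6860 words (1946 / 6860 ≈ 28.4%). The sketch below assumes tokens come from lowercasing and splitting on whitespace. This is inferred from the lowercased word tables elsewhere in this report, not taken from the tool's documentation.

```python
from collections import Counter

def word_frequencies(values):
    """Lowercase each value, split on whitespace, and count the resulting words."""
    counts = Counter(word for value in values for word in value.lower().split())
    total = sum(counts.values())
    return {word: (n, round(100 * n / total, 1)) for word, n in counts.most_common()}

value_counts = {"인공물": 1946, "사회": 1444, "인간": 1215, "생명": 1027,
                "자연": 781, "인간과학과 기술": 207, "산업": 20, "공공": 13}
rows = [value for value, n in value_counts.items() for _ in range(n)]
freqs = word_frequencies(rows)
print(freqs["인공물"], freqs["인간과학과"], freqs["기술"])  # (1946, 28.4) (207, 3.0) (207, 3.0)
```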

대분류코드
Text

MISSING

Distinct: 86
Distinct (%): 1.3%
Missing: 82
Missing (%): 1.2%
Memory size: 52.1 KiB

Length

Max length: 12
Median length: 2
Mean length: 2.1313347
Min length: 2

Characters and Unicode

Total characters: 14005
Distinct characters: 31
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
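Python's unicodedata module exposes the same general-category property. The sketch below shows how category counts like the ones in this report can be tallied. Its mapping from two-letter codes to names covers only the categories that appear in this report; any other code is passed through unchanged.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes for the categories seen in this report
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
    "Pf": "Final Punctuation",
}

def category_counts(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["TA", "TA01"]))
print(category_counts(["수학(Mathematics)"]))
```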

Unique

Unique: 33
Unique (%): 0.5%

Sample

1st row: TA
2nd row: TA
3rd row: TA
4th row: TA
5th row: TA

Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
hc | 304 | 4.6%
he | 287 | 4.4%
hd | 284 | 4.3%
sc | 246 | 3.7%
lc | 218 | 3.3%
tg | 216 | 3.3%
sd | 212 | 3.2%
hb | 192 | 2.9%
sb | 192 | 2.9%
tf | 191 | 2.9%
Other values (76) | 4229 | 64.4%

Most occurring characters

Value | Count | Frequency (%)
T | 1939 | 13.8%
H | 1601 | 11.4%
E | 1550 | 11.1%
S | 1444 | 10.3%
C | 1176 | 8.4%
D | 897 | 6.4%
B | 840 | 6.0%
A | 703 | 5.0%
L | 688 | 4.9%
G | 412 | 2.9%
Other values (21) | 2755 | 19.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 13441 | 96.0%
Decimal Number | 564 | 4.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 1939 | 14.4%
H | 1601 | 11.9%
E | 1550 | 11.5%
S | 1444 | 10.7%
C | 1176 | 8.7%
D | 897 | 6.7%
B | 840 | 6.2%
A | 703 | 5.2%
L | 688 | 5.1%
G | 412 | 3.1%
Other values (11) | 2191 | 16.3%

Decimal Number
Value | Count | Frequency (%)
0 | 269 | 47.7%
1 | 210 | 37.2%
3 | 37 | 6.6%
2 | 26 | 4.6%
9 | 7 | 1.2%
7 | 3 | 0.5%
6 | 3 | 0.5%
5 | 3 | 0.5%
4 | 3 | 0.5%
8 | 3 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 13441 | 96.0%
Common | 564 | 4.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 14005 | 100.0%

대분류코드한글명
Text

Distinct: 100
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 KiB

Length

Max length: 68
Median length: 43
Mean length: 15.768225
Min length: 1

Characters and Unicode

Total characters: 104906
Distinct characters: 182
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 0.5%

Sample

1st row: 수학(Mathematics)
2nd row: 수학
3rd row: 수학(Mathematics)
4th row: 수학
5th row: 수학(Mathematics)

Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
and | 439 | 4.5%
sciences | 265 | 2.7%
(token lost in extraction) | 230 | 2.4%
농림수산식품 | 225 | 2.3%
science | 220 | 2.3%
보건의료(health | 218 | 2.2%
보건의료 | 216 | 2.2%
food | 190 | 2.0%
농림수산식품(agriculture | 190 | 2.0%
fishery | 190 | 2.0%
Other values (154) | 7318 | 75.4%

Most occurring characters

Value | Count | Frequency (%)
/ | 6251 | 6.0%
i | 5745 | 5.5%
e | 5252 | 5.0%
o | 4909 | 4.7%
n | 4817 | 4.6%
r | 4297 | 4.1%
t | 4192 | 4.0%
c | 3936 | 3.8%
) | 3678 | 3.5%
( | 3678 | 3.5%
Other values (172) | 58151 | 55.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 52649 | 50.2%
Other Letter | 28320 | 27.0%
Uppercase Letter | 6880 | 6.6%
Other Punctuation | 6516 | 6.2%
Close Punctuation | 3678 | 3.5%
Open Punctuation | 3678 | 3.5%
Space Separator | 3185 | 3.0%

Most frequent character per category

Other Letter (Hangul; characters lost in extraction, counts only)
1761 (6.2%), 1222 (4.3%), 1070 (3.8%), 965 (3.4%), 804 (2.8%), 753 (2.7%), 744 (2.6%), 700 (2.5%), 680 (2.4%), 655 (2.3%); other values (129): 18966 (67.0%)

Lowercase Letter
Value | Count | Frequency (%)
i | 5745 | 10.9%
e | 5252 | 10.0%
o | 4909 | 9.3%
n | 4817 | 9.1%
r | 4297 | 8.2%
t | 4192 | 8.0%
c | 3936 | 7.5%
a | 3571 | 6.8%
s | 2829 | 5.4%
l | 2133 | 4.1%
Other values (11) | 10968 | 20.8%

Uppercase Letter
Value | Count | Frequency (%)
E | 1057 | 15.4%
S | 928 | 13.5%
A | 871 | 12.7%
C | 677 | 9.8%
M | 550 | 8.0%
L | 536 | 7.8%
P | 495 | 7.2%
F | 380 | 5.5%
H | 340 | 4.9%
R | 236 | 3.4%
Other values (6) | 810 | 11.8%

Other Punctuation
Value | Count | Frequency (%)
/ | 6251 | 95.9%
, | 247 | 3.8%
& | 18 | 0.3%

Close Punctuation
Value | Count | Frequency (%)
) | 3678 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3678 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3185 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 59529 | 56.7%
Hangul | 28320 | 27.0%
Common | 17057 | 16.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 5745 | 9.7%
e | 5252 | 8.8%
o | 4909 | 8.2%
n | 4817 | 8.1%
r | 4297 | 7.2%
t | 4192 | 7.0%
c | 3936 | 6.6%
a | 3571 | 6.0%
s | 2829 | 4.8%
l | 2133 | 3.6%
Other values (27) | 17848 | 30.0%

Common
Value | Count | Frequency (%)
/ | 6251 | 36.6%
) | 3678 | 21.6%
( | 3678 | 21.6%
(space) | 3185 | 18.7%
, | 247 | 1.4%
& | 18 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 76586 | 73.0%
Hangul | 28320 | 27.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
/ | 6251 | 8.2%
i | 5745 | 7.5%
e | 5252 | 6.9%
o | 4909 | 6.4%
n | 4817 | 6.3%
r | 4297 | 5.6%
t | 4192 | 5.5%
c | 3936 | 5.1%
) | 3678 | 4.8%
( | 3678 | 4.8%
Other values (33) | 29831 | 39.0%

중분류코드
Text

Distinct: 593
Distinct (%): 9.0%
Missing: 65
Missing (%): 1.0%
Memory size: 52.1 KiB

Length

Max length: 15
Median length: 4
Mean length: 4.1385853
Min length: 4

Characters and Unicode

Total characters: 27265
Distinct characters: 29
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.2%

Sample

1st row: NA01
2nd row: TA01
3rd row: NA01
4th row: TA01
5th row: NA01

Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
hc01 | 46 | 0.7%
sd05 | 42 | 0.6%
he14 | 39 | 0.6%
hc02 | 38 | 0.6%
hc11 | 36 | 0.5%
hd12 | 36 | 0.5%
hb01 | 34 | 0.5%
hd01 | 34 | 0.5%
sd08 | 34 | 0.5%
sa01 | 32 | 0.5%
Other values (583) | 6217 | 94.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 5610 | 20.6%
1 | 2660 | 9.8%
T | 1923 | 7.1%
H | 1594 | 5.8%
E | 1547 | 5.7%
S | 1518 | 5.6%
C | 1173 | 4.3%
2 | 938 | 3.4%
D | 894 | 3.3%
B | 836 | 3.1%
Other values (19) | 8572 | 31.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 13674 | 50.2%
Uppercase Letter | 13591 | 49.8%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 1923 | 14.1%
H | 1594 | 11.7%
E | 1547 | 11.4%
S | 1518 | 11.2%
C | 1173 | 8.6%
D | 894 | 6.6%
B | 836 | 6.2%
A | 780 | 5.7%
L | 687 | 5.1%
N | 469 | 3.5%
Other values (9) | 2170 | 16.0%

Decimal Number
Value | Count | Frequency (%)
0 | 5610 | 41.0%
1 | 2660 | 19.5%
2 | 938 | 6.9%
4 | 800 | 5.9%
3 | 796 | 5.8%
5 | 743 | 5.4%
9 | 596 | 4.4%
6 | 574 | 4.2%
7 | 519 | 3.8%
8 | 438 | 3.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 13674 | 50.2%
Latin | 13591 | 49.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 27265 | 100.0%

중분류코드한글명
Text

Distinct: 753
Distinct (%): 11.4%
Missing: 65
Missing (%): 1.0%
Memory size: 52.1 KiB

Length

Max length: 76
Median length: 57
Mean length: 18.185489
Min length: 2

Characters and Unicode

Total characters: 119806
Distinct characters: 300
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 0.5%

Sample

1st row: 대수학(Algebra)
2nd row: 대수학
3rd row: 대수학(Algebra)
4th row: 대수학
5th row: 대수학(Algebra)

Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
and | 242 | 1.9%
(token lost in extraction) | 204 | 1.6%
science | 192 | 1.5%
기타 | 166 | 1.3%
management | 164 | 1.3%
general | 153 | 1.2%
technology | 135 | 1.1%
기술 | 129 | 1.0%
linguistics | 125 | 1.0%
literature | 125 | 1.0%
Other values (1042) | 11167 | 87.2%

Most occurring characters

Value | Count | Frequency (%)
e | 7416 | 6.2%
i | 6281 | 5.2%
(space) | 6215 | 5.2%
n | 5712 | 4.8%
a | 5168 | 4.3%
o | 4855 | 4.1%
t | 4669 | 3.9%
r | 4305 | 3.6%
c | 3719 | 3.1%
s | 3550 | 3.0%
Other values (290) | 67916 | 56.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 60816 | 50.8%
Other Letter | 34529 | 28.8%
Uppercase Letter | 8429 | 7.0%
Space Separator | 6215 | 5.2%
Open Punctuation | 3269 | 2.7%
Close Punctuation | 3269 | 2.7%
Other Punctuation | 3145 | 2.6%
Dash Punctuation | 134 | 0.1%

Most frequent character per category

Other Letter (Hangul; characters lost in extraction, counts only)
1933 (5.6%), 1816 (5.3%), 813 (2.4%), 808 (2.3%), 681 (2.0%), 680 (2.0%), 604 (1.7%), 573 (1.7%), 561 (1.6%), 536 (1.6%); other values (235): 25524 (73.9%)

Lowercase Letter
Value | Count | Frequency (%)
e | 7416 | 12.2%
i | 6281 | 10.3%
n | 5712 | 9.4%
a | 5168 | 8.5%
o | 4855 | 8.0%
t | 4669 | 7.7%
r | 4305 | 7.1%
c | 3719 | 6.1%
s | 3550 | 5.8%
l | 3272 | 5.4%
Other values (15) | 11869 | 19.5%

Uppercase Letter
Value | Count | Frequency (%)
S | 1032 | 12.2%
P | 789 | 9.4%
M | 715 | 8.5%
C | 649 | 7.7%
E | 591 | 7.0%
A | 561 | 6.7%
L | 451 | 5.4%
T | 448 | 5.3%
F | 433 | 5.1%
R | 381 | 4.5%
Other values (13) | 2379 | 28.2%

Other Punctuation
Value | Count | Frequency (%)
/ | 2912 | 92.6%
, | 231 | 7.3%
& | 2 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 6215 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3269 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3269 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 134 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 69245 | 57.8%
Hangul | 34529 | 28.8%
Common | 16032 | 13.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 7416 | 10.7%
i | 6281 | 9.1%
n | 5712 | 8.2%
a | 5168 | 7.5%
o | 4855 | 7.0%
t | 4669 | 6.7%
r | 4305 | 6.2%
c | 3719 | 5.4%
s | 3550 | 5.1%
l | 3272 | 4.7%
Other values (38) | 20298 | 29.3%

Common
Value | Count | Frequency (%)
(space) | 6215 | 38.8%
( | 3269 | 20.4%
) | 3269 | 20.4%
/ | 2912 | 18.2%
, | 231 | 1.4%
- | 134 | 0.8%
& | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 85277 | 71.2%
Hangul | 34529 | 28.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 7416 | 8.7%
i | 6281 | 7.4%
(space) | 6215 | 7.3%
n | 5712 | 6.7%
a | 5168 | 6.1%
o | 4855 | 5.7%
t | 4669 | 5.5%
r | 4305 | 5.0%
c | 3719 | 4.4%
s | 3550 | 4.2%
Other values (45) | 33387 | 39.2%

제목
Text

MISSING 

Distinct: 6277
Distinct (%): 99.9%
Missing: 371
Missing (%): 5.6%
Memory size: 52.1 KiB

Length

Max length: 140
Median length: 90
Mean length: 39.886501
Min length: 2

Characters and Unicode

Total characters: 250567
Distinct characters: 537
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6272
Unique (%): 99.8%

Sample

1st row: 선형대수(Linear algebra)
2nd row: NA01. 대수학(Algebra)
3rd row: 수리논리학/집합론(Mathematical logic/set theory)
4th row: NA0101. 선형대수(Linear algebra)
5th row: 수론(Number theory)
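The sample rows suggest that 제목 mixes two formats. One is a plain "Korean(English)" title. The other is the same kind of title prefixed with a hierarchical code and a period, such as NA01. or NA0101., sometimes with no following space, as in OY0101.안전시스템과학. The parsing sketch below uses regular expressions inferred from those sample rows; they are an assumption, not a published format specification.

```python
import re

# Patterns inferred from the sample rows (assumption, not a documented format)
CODE_PREFIX = re.compile(r"^(?P<code>[A-Z]{2}\d{2,})\.\s*(?P<rest>.+)$")
BILINGUAL = re.compile(r"^(?P<ko>[^(]+)\((?P<en>.+)\)$")

def parse_title(title):
    """Split a 제목 value into (code prefix, Korean name, English name)."""
    code = None
    match = CODE_PREFIX.match(title)
    if match:
        code, title = match.group("code"), match.group("rest")
    match = BILINGUAL.match(title)
    if match:
        return code, match.group("ko").strip(), match.group("en")
    return code, title.strip(), None

for t in ["NA0101. 선형대수(Linear algebra)", "수론(Number theory)", "OY0101.안전시스템과학"]:
    print(parse_title(t))
```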
Word frequencies (lowercased, split on whitespace)

Value | Count | Frequency (%)
달리 | 748 | 2.9%
않는 | 740 | 2.9%
분류되지 | 719 | 2.8%
of | 523 | 2.0%
and | 462 | 1.8%
technology | 306 | 1.2%
(token lost in extraction) | 251 | 1.0%
management | 210 | 0.8%
system | 171 | 0.7%
literature | 154 | 0.6%
Other values (9451) | 21458 | 83.4%

(달리, 분류되지 and 않는 come from the phrase 달리 분류되지 않는, "not elsewhere classified", as in the sample title "OX9999. 달리 분류되지 않는 인력 및 인프라".)

Most occurring characters

Value | Count | Frequency (%)
(space) | 19473 | 7.8%
e | 14659 | 5.9%
i | 13068 | 5.2%
n | 11659 | 4.7%
o | 11618 | 4.6%
a | 11521 | 4.6%
t | 11385 | 4.5%
r | 9993 | 4.0%
s | 8208 | 3.3%
c | 7291 | 2.9%
Other values (527) | 131692 | 52.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 137837 | 55.0%
Other Letter | 45013 | 18.0%
Space Separator | 19474 | 7.8%
Uppercase Letter | 14671 | 5.9%
Decimal Number | 12735 | 5.1%
Other Punctuation | 7271 | 2.9%
Open Punctuation | 6664 | 2.7%
Close Punctuation | 6636 | 2.6%
Dash Punctuation | 262 | 0.1%
Final Punctuation | 4 | < 0.1%

Most frequent character per category

Other Letter (Hangul; characters lost in extraction, counts only)
1889 (4.2%), 1557 (3.5%), 1511 (3.4%), 1334 (3.0%), 1057 (2.3%), 986 (2.2%), 825 (1.8%), 777 (1.7%), 754 (1.7%), 749 (1.7%); other values (451): 33574 (74.6%)

Lowercase Letter
Value | Count | Frequency (%)
e | 14659 | 10.6%
i | 13068 | 9.5%
n | 11659 | 8.5%
o | 11618 | 8.4%
a | 11521 | 8.4%
t | 11385 | 8.3%
r | 9993 | 7.2%
s | 8208 | 6.0%
c | 7291 | 5.3%
l | 7122 | 5.2%
Other values (16) | 31313 | 22.7%

Uppercase Letter
Value | Count | Frequency (%)
E | 1818 | 12.4%
C | 1484 | 10.1%
S | 1476 | 10.1%
H | 1051 | 7.2%
A | 1005 | 6.9%
O | 949 | 6.5%
L | 816 | 5.6%
D | 798 | 5.4%
B | 757 | 5.2%
N | 649 | 4.4%
Other values (15) | 3868 | 26.4%

Decimal Number
Value | Count | Frequency (%)
0 | 5091 | 40.0%
1 | 1970 | 15.5%
9 | 1216 | 9.5%
2 | 895 | 7.0%
3 | 770 | 6.0%
4 | 750 | 5.9%
5 | 661 | 5.2%
6 | 528 | 4.1%
7 | 466 | 3.7%
8 | 388 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 3691 | 50.8%
. | 3374 | 46.4%
, | 145 | 2.0%
& | 22 | 0.3%
: | 17 | 0.2%
· | 9 | 0.1%
' | 6 | 0.1%
; | 4 | 0.1%
# | 3 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 19473 | > 99.9%
(non-ASCII space character) | 1 | < 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 6664 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6636 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 262 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(character lost in extraction) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 152508 | 60.9%
Common | 53046 | 21.2%
Hangul | 45013 | 18.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 14659 | 9.6%
i | 13068 | 8.6%
n | 11659 | 7.6%
o | 11618 | 7.6%
a | 11521 | 7.6%
t | 11385 | 7.5%
r | 9993 | 6.6%
s | 8208 | 5.4%
c | 7291 | 4.8%
l | 7122 | 4.7%
Other values (41) | 45984 | 30.2%

Common
Value | Count | Frequency (%)
(space) | 19473 | 36.7%
( | 6664 | 12.6%
) | 6636 | 12.5%
0 | 5091 | 9.6%
/ | 3691 | 7.0%
. | 3374 | 6.4%
1 | 1970 | 3.7%
9 | 1216 | 2.3%
2 | 895 | 1.7%
3 | 770 | 1.5%
Other values (15) | 3266 | 6.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 205540 | 82.0%
Hangul | 45012 | 18.0%
None | 10 | < 0.1%
Punctuation | 4 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 19473 | 9.5%
e | 14659 | 7.1%
i | 13068 | 6.4%
n | 11659 | 5.7%
o | 11618 | 5.7%
a | 11521 | 5.6%
t | 11385 | 5.5%
r | 9993 | 4.9%
s | 8208 | 4.0%
c | 7291 | 3.5%
Other values (63) | 86665 | 42.2%

Hangul (characters lost in extraction, counts only)
1889 (4.2%), 1557 (3.5%), 1511 (3.4%), 1334 (3.0%), 1057 (2.3%), 986 (2.2%), 825 (1.8%), 777 (1.7%), 754 (1.7%), 749 (1.7%); other values (450): 33573 (74.6%)

None
Value | Count | Frequency (%)
· | 9 | 90.0%
(non-ASCII space character) | 1 | 10.0%

Punctuation
Value | Count | Frequency (%)
(character lost in extraction) | 4 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
(character lost in extraction) | 1 | 100.0%

내용태그
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 6653
Missing (%): 100.0%
Memory size: 58.6 KiB
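Because 내용태그 is missing in every row, it carries no information and is a natural candidate to drop before analysis. The minimal standard-library sketch below represents rows as dictionaries, with None marking a missing value. In pandas, DataFrame.dropna(axis=1, how="all") performs the equivalent column drop.

```python
def all_missing_columns(rows, columns):
    """Columns whose value is missing (None) in every row."""
    return [c for c in columns if all(row.get(c) is None for row in rows)]

def drop_columns(rows, to_drop):
    """Return copies of the rows without the given columns."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

columns = ["분류정보일련번호", "제목", "내용태그"]
rows = [
    {"분류정보일련번호": 1, "제목": "선형대수(Linear algebra)", "내용태그": None},
    {"분류정보일련번호": 1, "제목": "NA01. 대수학(Algebra)", "내용태그": None},
]
empty = all_missing_columns(rows, columns)
print(empty)  # ['내용태그']
cleaned = drop_columns(rows, empty)
```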

Interactions

[Interaction plot not reproduced]

Correlations

[Correlation heatmaps not reproduced; their values are tabulated below.]

Table 1

Variable | 분류정보일련번호 | 분류구분코드 | 분류구분코드한글명 | 분류코드 | 분류코드한글명 | 분야코드 | 분야코드한글명 | 대분류코드 | 대분류코드한글명
분류정보일련번호 | 1.000 | 0.847 | 0.847 | 0.710 | 0.552 | 0.878 | 0.873 | 0.973 | 0.988
분류구분코드 | 0.847 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.390 | 1.000 | 0.851
분류구분코드한글명 | 0.847 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.390 | 1.000 | 0.851
분류코드 | 0.710 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.808 | 1.000 | 0.971
분류코드한글명 | 0.552 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
분야코드 | 0.878 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.992
분야코드한글명 | 0.873 | 0.390 | 0.390 | 0.808 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
대분류코드 | 0.973 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
대분류코드한글명 | 0.988 | 0.851 | 0.851 | 0.971 | 1.000 | 0.992 | 1.000 | 1.000 | 1.000

Table 2

Variable | 분류코드 | 분류구분코드 | 분야코드한글명 | 분류코드한글명 | 분야코드 | 분류구분코드한글명
분류코드 | 1.000 | 1.000 | 0.737 | 1.000 | 0.999 | 1.000
분류구분코드 | 1.000 | 1.000 | 0.293 | 0.000 | 0.999 | 0.994
분야코드한글명 | 0.737 | 0.293 | 1.000 | 1.000 | 1.000 | 0.293
분류코드한글명 | 1.000 | 0.000 | 1.000 | 1.000 | 0.999 | 0.000
분야코드 | 0.999 | 0.999 | 1.000 | 0.999 | 1.000 | 0.999
분류구분코드한글명 | 1.000 | 0.994 | 0.293 | 0.000 | 0.999 | 1.000

Table 3

Variable | 분류정보일련번호 | 분류구분코드 | 분류구분코드한글명 | 분류코드 | 분류코드한글명 | 분야코드 | 분야코드한글명
분류정보일련번호 | 1.000 | 0.679 | 0.679 | 0.569 | 0.426 | 0.630 | 0.669
분류구분코드 | 0.679 | 1.000 | 0.994 | 1.000 | 0.000 | 0.999 | 0.293
분류구분코드한글명 | 0.679 | 0.994 | 1.000 | 1.000 | 0.000 | 0.999 | 0.293
분류코드 | 0.569 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.737
분류코드한글명 | 0.426 | 0.000 | 0.000 | 1.000 | 1.000 | 0.999 | 1.000
분야코드 | 0.630 | 0.999 | 0.999 | 0.999 | 0.999 | 1.000 | 1.000
분야코드한글명 | 0.669 | 0.293 | 0.293 | 0.737 | 1.000 | 1.000 | 1.000
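The report does not say which coefficient each table uses. For pairs of categorical variables, one common choice is a bias-corrected Cramér's V, computed from a chi-squared statistic with Yates' continuity correction on 2x2 tables. The sketch below implements that combination as an assumption about the method. It uses two contingency tables inferred from the value counts:

- 분류구분코드 x 분류구분코드한글명: a perfect 1:1 mapping of 6570 and 83 rows. The sketch gives 0.994 here, the value in Tables 2 and 3.
- 분류구분코드 x 분류코드한글명: 연구분야 6537 and 적용분야 33 under CL001, and 연구분야 83 under CL002. The sketch gives 0.000 here, also the value in Tables 2 and 3.

```python
import math

def chi2_statistic(table, yates=True):
    """Pearson chi-squared for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero. With yates=True and one
    degree of freedom, each observed count is moved toward its expected count
    by at most 0.5 (Yates' continuity correction).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if yates and dof == 1:
                diff = expected - observed
                observed += math.copysign(min(0.5, abs(diff)), diff)
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table, yates=True):
    """Bias-corrected Cramér's V for an r x k contingency table."""
    chi2, n = chi2_statistic(table, yates)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return 1.0 if denom == 0 else math.sqrt(phi2_corr / denom)

print(round(cramers_v([[6570, 0], [0, 83]]), 3))   # 분류구분코드 x 분류구분코드한글명: 0.994
print(round(cramers_v([[6537, 33], [83, 0]]), 3))  # 분류구분코드 x 분류코드한글명: 0.0
```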

Missing values

[Figure not reproduced] A simple visualization of nullity by column.
[Figure not reproduced] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure not reproduced] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
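Nullity correlation is the Pearson correlation between two columns' missingness indicators, which are 1 where a value is missing and 0 otherwise. The minimal sketch below returns NaN when a column is never missing or always missing, as 내용태그 is here, because such an indicator has zero variance. Nullity heatmaps such as missingno's leave those columns out.

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation between the missingness indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # always or never missing: correlation undefined
    return cov / math.sqrt(var_x * var_y)

code = ["NA01", None, "TA01", None]
name = ["대수학", None, "대수학", None]
tags = [None, None, None, None]
print(nullity_correlation(code, name), nullity_correlation(code, tags))  # 1.0 nan
```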

Sample

First rows

Row | 분류정보일련번호 | 분류구분코드 | 분류구분코드한글명 | 분류코드 | 분류코드한글명 | 분야코드 | 분야코드한글명 | 대분류코드 | 대분류코드한글명 | 중분류코드 | 중분류코드한글명 | 제목 | 내용태그
0 | 1 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | <NA> | 수학(Mathematics) | NA01 | 대수학(Algebra) | 선형대수(Linear algebra) | <NA>
1 | 1 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | TA01 | 대수학 | NA01. 대수학(Algebra) | <NA>
2 | 2 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | <NA> | 수학(Mathematics) | NA01 | 대수학(Algebra) | 수리논리학/집합론(Mathematical logic/set theory) | <NA>
3 | 2 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | TA01 | 대수학 | NA0101. 선형대수(Linear algebra) | <NA>
4 | 3 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | <NA> | 수학(Mathematics) | NA01 | 대수학(Algebra) | 수론(Number theory) | <NA>
5 | 3 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | TA01 | 대수학 | NA0102. 수리논리학/집합론(Mathematical logic/set theory) | <NA>
6 | 4 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | <NA> | 수학(Mathematics) | NA01 | 대수학(Algebra) | 군/표현(Group/representation theory) | <NA>
7 | 4 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | TA01 | 대수학 | NA0103. 수론(Number theory) | <NA>
8 | 5 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | <NA> | 수학(Mathematics) | NA01 | 대수학(Algebra) | 대수기하/가환환(Algebraic geometry/commutative ring theory) | <NA>
9 | 5 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | TA01 | 대수학 | NA0104. 군/표현(Group/representation theory) | <NA>

Last rows

Row | 분류정보일련번호 | 분류구분코드 | 분류구분코드한글명 | 분류코드 | 분류코드한글명 | 분야코드 | 분야코드한글명 | 대분류코드 | 대분류코드한글명 | 중분류코드 | 중분류코드한글명 | 제목 | 내용태그
6643 | 3355 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M01 | 인력 및 인프라 | TMPR01L01M01S04 | 연구 및 기타시설 / 장비 | OX0409. 연구개발 이외 국가 목적의 과학적인 시설 및 장비 | <NA>
6644 | 3356 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M01 | 인력 및 인프라 | TMPR01L01M01S04 | 연구 및 기타시설 / 장비 | OX0499. 기타 과학적인 시설 및 장비 관련 연구와 응용분야 | <NA>
6645 | 3357 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M01 | 인력 및 인프라 | TMPR01L01M01S05 | 기타 인력 및 인프라 | OX99. 기타 인력 및 인프라 | <NA>
6646 | 3358 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M01 | 인력 및 인프라 | TMPR01L01M01S05 | 기타 인력 및 인프라 | OX9999. 달리 분류되지 않는 인력 및 인프라 | <NA>
6647 | 3368 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M02 | 과학기술과 인문사회 | TMPR01L01M02S01 | 과학기술과 재난/안전 | OY0101.안전시스템과학 | <NA>
6648 | 3369 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M02 | 과학기술과 인문사회 | TMPR01L01M02S01 | 과학기술과 재난/안전 | OY0102.재난관리/방재 | <NA>
6649 | 3370 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M02 | 과학기술과 인문사회 | TMPR01L01M02S01 | 과학기술과 재난/안전 | OY0103.소방학 | <NA>
6650 | 3371 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M02 | 과학기술과 인문사회 | TMPR01L01M02S01 | 과학기술과 재난/안전 | OY0104.사회위기 | <NA>
6651 | 3372 | CL002 | 임시정보 | TMPR01 | 연구분야 | TMPR01L01 | 인간과학과 기술 | TMPR01L01M02 | 과학기술과 인문사회 | TMPR01L01M02S01 | 과학기술과 재난/안전 | OY0199.달리 분류되지 않는 과학기술과 재난/안전 | <NA>
6652 | 3598 | CL001 | 분류정보 | RCSARE | 연구분야 | NAARE | 자연 | TA | 수학 | <NA> | <NA> | 수학 | <NA>