Overview

Dataset statistics

Number of variables: 6
Number of observations: 273
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.9 KiB
Average record size in memory: 48.5 B
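
The counts above can be illustrated with a small standard-library sketch. It is not how the profiling tool computes them; the function name `dataset_statistics` is hypothetical, only `None` is assumed to count as a missing cell, and the three records are copied from the sample rows of this report rather than the full 273 observations.

```python
# Three records copied from the sample rows of this report (the real file has 273).
rows = [
    {"단어명": "전문기관평가", "약어": "ENGNEVL", "영문명": "ENGNEVL",
     "단어유형": "수식어", "금칙어여부": "N", "정의": "특정 일에 대한 전문가 집단의 심사"},
    {"단어명": "평가위원회구분", "약어": "CMIT", "영문명": "CMIT",
     "단어유형": "분류어", "금칙어여부": "N", "정의": "평가위원회구분"},
    {"단어명": "부/팀", "약어": "TEAM", "영문명": "TEAM",
     "단어유형": "수식어", "금칙어여부": "N", "정의": "부/팀"},
]


def dataset_statistics(rows):
    """Count variables, observations, missing cells and duplicate rows."""
    columns = list(rows[0].keys()) if rows else []
    n_cells = len(rows) * len(columns)

    # Assumption: only None is treated as a missing cell.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row is a duplicate if an identical row was already seen.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": len(columns),
        "observations": len(rows),
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / len(rows) if rows else 0.0,
    }


stats = dataset_statistics(rows)
print(stats)
```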

Variable types

Text: 4
Categorical: 1
Boolean: 1

Dataset

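Description: A list of institutional standard words used in the research management system of Korea Gas Technology Corporation ((주)한국가스기술공사). It provides fields such as word name (단어명), abbreviation (약어), English name (영문명), word type (단어유형), forbidden word (금칙어), and definition (정의).
URL: https://www.data.go.kr/data/15103149/fileData.do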

Alerts

금칙어여부 has constant value "False" (Constant)
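
A constant-value alert of this kind can be reproduced with a simple check: a column is constant when it holds exactly one distinct value. The sketch below is an illustration under that assumption, not the profiling tool's own implementation; `constant_columns` and the three example rows are hypothetical.

```python
def constant_columns(rows):
    """Return {column: value} for every column that holds a single distinct value."""
    if not rows:
        return {}
    result = {}
    for col in rows[0]:
        values = {row[col] for row in rows}
        if len(values) == 1:
            result[col] = next(iter(values))
    return result


# Hypothetical miniature of two columns from this dataset.
rows = [
    {"단어유형": "수식어", "금칙어여부": False},
    {"단어유형": "분류어", "금칙어여부": False},
    {"단어유형": "수식어", "금칙어여부": False},
]

print(constant_columns(rows))
```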

Reproduction

Analysis started: 2023-12-11 22:49:08.141450
Analysis finished: 2023-12-11 22:49:08.544761
Duration: 0.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
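
A report like this one can presumably be regenerated along the following lines. This is a minimal sketch, assuming the dataset has been downloaded from the URL above as a CSV file and that `pandas` and `ydata-profiling` are installed. The function name `build_report`, its parameters, the report title, and the default encoding are all assumptions and not taken from the original configuration.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write the HTML report to output_path."""
    # Imported lazily so this module loads even without the dependencies installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is a plain CSV; adjust `encoding` if the file uses
    # a different one.
    df = pd.read_csv(csv_path, encoding=encoding)

    report = ProfileReport(df, title="Institutional standard word list")
    report.to_file(output_path)
    return report
```

Calling `build_report("words.csv")` with a hypothetical local file name would write `report.html` next to the script.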

Variables

단어명 (word name)
Text

Distinct: 252
Distinct (%): 92.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 8
Median length: 2
Mean length: 2.5567766
Min length: 1

Characters and Unicode

Total characters: 698
Distinct characters: 212
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
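
As an illustration of how such character properties can be read, the sketch below counts general categories with Python's standard `unicodedata` module. That module exposes the general category but not the script or block, so only the category breakdown is shown. The helper `category_counts`, the name mapping, and the choice of the five sample values of 단어명 as input are assumptions made for the example.

```python
import unicodedata
from collections import Counter

# Readable names for the two-letter general category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Pc": "Connector Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample values of 단어명 shown in this report.
sample = ["전문기관평가", "평가위원회구분", "기술", "기술적", "부/팀"]
counts = category_counts(sample)
print(counts.most_common())
```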

Unique

Unique: 234
Unique (%): 85.7%
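
The difference between the Distinct and Unique figures is easy to miss: Distinct counts the different values that occur, while Unique counts only the values that occur exactly once. A minimal sketch of that distinction follows; `distinct_and_unique` and the five example values are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical column in which "기술" appears twice.
values = ["기술", "기술적", "부/팀", "기술", "전화"]
distinct, unique = distinct_and_unique(values)
print(distinct, unique)

# The percentages in the report are taken over all 273 observations.
distinct_pct = round(100 * 252 / 273, 1)
unique_pct = round(100 * 234 / 273, 1)
```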

Sample

1st row: 전문기관평가
2nd row: 평가위원회구분
3rd row: 기술
4th row: 기술적
5th row: 부/팀

Value | Count | Frequency (%)
과제 | 3 | 1.1%
기술 | 3 | 1.1%
기관 | 3 | 1.1%
코드 | 2 | 0.7%
결과 | 2 | 0.7%
기타 | 2 | 0.7%
참여 | 2 | 0.7%
영문 | 2 | 0.7%
여부 | 2 | 0.7%
예산 | 2 | 0.7%
Other values (242) | 252 | 91.6%

Most occurring characters

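Value | Count | Frequency (%)
(not captured) | 22 | 3.2%
(not captured) | 14 | 2.0%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 11 | 1.6%
(not captured) | 10 | 1.4%
(not captured) | 10 | 1.4%
(not captured) | 10 | 1.4%
Other values (202) | 572 | 81.9%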

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 678 | 97.1%
Uppercase Letter | 16 | 2.3%
Other Punctuation | 2 | 0.3%
Space Separator | 2 | 0.3%

Most frequent character per category

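Other Letter
Value | Count | Frequency (%)
(not captured) | 22 | 3.2%
(not captured) | 14 | 2.1%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 11 | 1.6%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (191) | 552 | 81.4%

Uppercase Letter
Value | Count | Frequency (%)
I | 4 | 25.0%
S | 4 | 25.0%
N | 2 | 12.5%
C | 1 | 6.2%
L | 1 | 6.2%
R | 1 | 6.2%
U | 1 | 6.2%
D | 1 | 6.2%
B | 1 | 6.2%

Other Punctuation
Value | Count | Frequency (%)
/ | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%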

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 678 | 97.1%
Latin | 16 | 2.3%
Common | 4 | 0.6%

Most frequent character per script

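Hangul
Value | Count | Frequency (%)
(not captured) | 22 | 3.2%
(not captured) | 14 | 2.1%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 11 | 1.6%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (191) | 552 | 81.4%

Latin
Value | Count | Frequency (%)
I | 4 | 25.0%
S | 4 | 25.0%
N | 2 | 12.5%
C | 1 | 6.2%
L | 1 | 6.2%
R | 1 | 6.2%
U | 1 | 6.2%
D | 1 | 6.2%
B | 1 | 6.2%

Common
Value | Count | Frequency (%)
/ | 2 | 50.0%
(space) | 2 | 50.0%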

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 678 | 97.1%
ASCII | 20 | 2.9%

Most frequent character per block

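Hangul
Value | Count | Frequency (%)
(not captured) | 22 | 3.2%
(not captured) | 14 | 2.1%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 12 | 1.8%
(not captured) | 11 | 1.6%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (191) | 552 | 81.4%

ASCII
Value | Count | Frequency (%)
I | 4 | 20.0%
S | 4 | 20.0%
/ | 2 | 10.0%
(space) | 2 | 10.0%
N | 2 | 10.0%
C | 1 | 5.0%
L | 1 | 5.0%
R | 1 | 5.0%
U | 1 | 5.0%
D | 1 | 5.0%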

약어 (abbreviation)
Text

Distinct: 251
Distinct (%): 91.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 8
Median length: 6
Mean length: 4.2087912
Min length: 2
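
Length statistics of this kind are computed over the character length of each value. The sketch below shows the arithmetic on the five sample values of 약어 listed in this report, so its results describe those five values only and not the full column. The helper name `length_stats` is hypothetical.

```python
from statistics import mean, median


def length_stats(values):
    """Return max, median, mean and min of the character lengths of the values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# The five sample values of 약어 shown in this report.
sample = ["ENGNEVL", "CMIT", "TCH", "TCHNLGY", "TEAM"]
stats = length_stats(sample)
print(stats)
```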

Characters and Unicode

Total characters: 1149
Distinct characters: 27
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 229
Unique (%): 83.9%

Sample

1st row: ENGNEVL
2nd row: CMIT
3rd row: TCH
4th row: TCHNLGY
5th row: TEAM

Value | Count | Frequency (%)
cntc | 2 | 0.7%
schlshp | 2 | 0.7%
id | 2 | 0.7%
de | 2 | 0.7%
rst | 2 | 0.7%
pssrp | 2 | 0.7%
en | 2 | 0.7%
rsch | 2 | 0.7%
nation | 2 | 0.7%
grad | 2 | 0.7%
Other values (241) | 253 | 92.7%

Most occurring characters

Value | Count | Frequency (%)
T | 127 | 11.1%
R | 120 | 10.4%
N | 96 | 8.4%
C | 88 | 7.7%
S | 87 | 7.6%
E | 83 | 7.2%
P | 79 | 6.9%
A | 59 | 5.1%
M | 50 | 4.4%
D | 44 | 3.8%
Other values (17) | 316 | 27.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1148 | 99.9%
Connector Punctuation | 1 | 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 127 | 11.1%
R | 120 | 10.5%
N | 96 | 8.4%
C | 88 | 7.7%
S | 87 | 7.6%
E | 83 | 7.2%
P | 79 | 6.9%
A | 59 | 5.1%
M | 50 | 4.4%
D | 44 | 3.8%
Other values (16) | 315 | 27.4%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1148 | 99.9%
Common | 1 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 127 | 11.1%
R | 120 | 10.5%
N | 96 | 8.4%
C | 88 | 7.7%
S | 87 | 7.6%
E | 83 | 7.2%
P | 79 | 6.9%
A | 59 | 5.1%
M | 50 | 4.4%
D | 44 | 3.8%
Other values (16) | 315 | 27.4%

Common
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1149 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
T | 127 | 11.1%
R | 120 | 10.4%
N | 96 | 8.4%
C | 88 | 7.7%
S | 87 | 7.6%
E | 83 | 7.2%
P | 79 | 6.9%
A | 59 | 5.1%
M | 50 | 4.4%
D | 44 | 3.8%
Other values (17) | 316 | 27.5%

영문명 (English name)
Text

Distinct: 242
Distinct (%): 88.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 31
Median length: 27
Mean length: 9.8424908
Min length: 2

Characters and Unicode

Total characters: 2687
Distinct characters: 30
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 215
Unique (%): 78.8%

Sample

1st row: ENGNEVL
2nd row: CMIT
3rd row: TECHNOLOGY OF INSTITUDE
4th row: TECHNOLOGY
5th row: TEAM

Value | Count | Frequency (%)
number | 8 | 2.1%
research | 8 | 2.1%
participation | 7 | 1.9%
date | 7 | 1.9%
amount | 5 | 1.3%
practical | 4 | 1.1%
plan | 4 | 1.1%
of | 4 | 1.1%
result | 4 | 1.1%
registration | 4 | 1.1%
Other values (246) | 322 | 85.4%
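
The word table for 영문명 lists lowercase words although the sample rows are uppercase, and its frequencies are shares of all words rather than of rows (8 occurrences correspond to 2.1%). That is consistent with splitting each value on whitespace, lowercasing the pieces, and counting them. The sketch below implements that reading as an assumption about the tool's behaviour, not a description of its code; `word_frequencies` is a hypothetical helper and the input values are taken from the sample rows of this report.

```python
from collections import Counter


def word_frequencies(values):
    """Return (word, count, percent of all words), most frequent first."""
    counts = Counter()
    for value in values:
        # Assumption: words are whitespace-separated and compared case-insensitively.
        counts.update(value.lower().split())
    total = sum(counts.values())
    return [(word, n, round(100 * n / total, 1)) for word, n in counts.most_common()]


# English names taken from the sample rows of this report.
sample = [
    "TECHNOLOGY OF INSTITUDE",
    "TECHNOLOGY",
    "TECHNOLOGICAL OF CLASSIFICATION",
    "TECHNICAL CLASSIFICATION",
]
table = word_frequencies(sample)
for row in table:
    print(row)
```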

Most occurring characters

Value | Count | Frequency (%)
E | 311 | 11.6%
T | 245 | 9.1%
N | 224 | 8.3%
A | 213 | 7.9%
I | 210 | 7.8%
R | 207 | 7.7%
O | 203 | 7.6%
C | 136 | 5.1%
S | 132 | 4.9%
(space) | 104 | 3.9%
Other values (20) | 702 | 26.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2569 | 95.6%
Space Separator | 104 | 3.9%
Open Punctuation | 5 | 0.2%
Close Punctuation | 5 | 0.2%
Connector Punctuation | 4 | 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 311 | 12.1%
T | 245 | 9.5%
N | 224 | 8.7%
A | 213 | 8.3%
I | 210 | 8.2%
R | 207 | 8.1%
O | 203 | 7.9%
C | 136 | 5.3%
S | 132 | 5.1%
P | 104 | 4.0%
Other values (16) | 584 | 22.7%

Space Separator
Value | Count | Frequency (%)
(space) | 104 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 5 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 5 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2569 | 95.6%
Common | 118 | 4.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 311 | 12.1%
T | 245 | 9.5%
N | 224 | 8.7%
A | 213 | 8.3%
I | 210 | 8.2%
R | 207 | 8.1%
O | 203 | 7.9%
C | 136 | 5.3%
S | 132 | 5.1%
P | 104 | 4.0%
Other values (16) | 584 | 22.7%

Common
Value | Count | Frequency (%)
(space) | 104 | 88.1%
( | 5 | 4.2%
) | 5 | 4.2%
_ | 4 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2687 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
E | 311 | 11.6%
T | 245 | 9.1%
N | 224 | 8.3%
A | 213 | 7.9%
I | 210 | 7.8%
R | 207 | 7.7%
O | 203 | 7.6%
C | 136 | 5.1%
S | 132 | 4.9%
(space) | 104 | 3.9%
Other values (20) | 702 | 26.1%

단어유형 (word type)
Categorical

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
수식어 (modifier) | 218
분류어 (classifier) | 55

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 수식어
2nd row: 분류어
3rd row: 수식어
4th row: 수식어
5th row: 수식어

Common Values

Value | Count | Frequency (%)
수식어 | 218 | 79.9%
분류어 | 55 | 20.1%
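
The frequencies in this table follow directly from the counts: each count is divided by the 273 observations. The short sketch below reproduces that arithmetic; `common_values` is a hypothetical helper and the input list is rebuilt from the counts reported above rather than read from the dataset.

```python
from collections import Counter


def common_values(values):
    """Return (value, count, percent of all observations), most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


# Column rebuilt from the reported counts: 218 and 55 of 273 observations.
values = ["수식어"] * 218 + ["분류어"] * 55
table = common_values(values)
print(table)
```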

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
수식어 | 218 | 79.9%
분류어 | 55 | 20.1%

금칙어여부 (forbidden-word flag)
Boolean

CONSTANT 

Distinct: 1
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 405.0 B

Value | Count | Frequency (%)
False | 273 | 100.0%

정의 (definition)
Text

Distinct: 252
Distinct (%): 92.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 19
Median length: 2
Mean length: 2.6043956
Min length: 1

Characters and Unicode

Total characters: 711
Distinct characters: 214
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 234
Unique (%): 85.7%

Sample

1st row: 특정 일에 대한 전문가 집단의 심사
2nd row: 평가위원회구분
3rd row: 기술
4th row: 기술적
5th row: 부/팀

Value | Count | Frequency (%)
과제 | 3 | 1.1%
기술 | 3 | 1.1%
기관 | 3 | 1.1%
코드 | 2 | 0.7%
결과 | 2 | 0.7%
요약 | 2 | 0.7%
기타 | 2 | 0.7%
영문 | 2 | 0.7%
여부 | 2 | 0.7%
예산 | 2 | 0.7%
Other values (247) | 257 | 91.8%

Most occurring characters

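Value | Count | Frequency (%)
(not captured) | 21 | 3.0%
(not captured) | 14 | 2.0%
(not captured) | 13 | 1.8%
(not captured) | 13 | 1.8%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 10 | 1.4%
(not captured) | 10 | 1.4%
(not captured) | 10 | 1.4%
Other values (204) | 584 | 82.1%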

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 686 | 96.5%
Uppercase Letter | 15 | 2.1%
Space Separator | 7 | 1.0%
Other Punctuation | 2 | 0.3%
Lowercase Letter | 1 | 0.1%

Most frequent character per category

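Other Letter
Value | Count | Frequency (%)
(not captured) | 21 | 3.1%
(not captured) | 14 | 2.0%
(not captured) | 13 | 1.9%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (193) | 559 | 81.5%

Uppercase Letter
Value | Count | Frequency (%)
S | 4 | 26.7%
I | 4 | 26.7%
N | 2 | 13.3%
C | 1 | 6.7%
R | 1 | 6.7%
U | 1 | 6.7%
L | 1 | 6.7%
B | 1 | 6.7%

Space Separator
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 2 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
d | 1 | 100.0%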

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 686 | 96.5%
Latin | 16 | 2.3%
Common | 9 | 1.3%

Most frequent character per script

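Hangul
Value | Count | Frequency (%)
(not captured) | 21 | 3.1%
(not captured) | 14 | 2.0%
(not captured) | 13 | 1.9%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (193) | 559 | 81.5%

Latin
Value | Count | Frequency (%)
S | 4 | 25.0%
I | 4 | 25.0%
N | 2 | 12.5%
C | 1 | 6.2%
R | 1 | 6.2%
d | 1 | 6.2%
U | 1 | 6.2%
L | 1 | 6.2%
B | 1 | 6.2%

Common
Value | Count | Frequency (%)
(space) | 7 | 77.8%
/ | 2 | 22.2%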

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 686 | 96.5%
ASCII | 25 | 3.5%

Most frequent character per block

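Hangul
Value | Count | Frequency (%)
(not captured) | 21 | 3.1%
(not captured) | 14 | 2.0%
(not captured) | 13 | 1.9%
(not captured) | 13 | 1.9%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 12 | 1.7%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
(not captured) | 10 | 1.5%
Other values (193) | 559 | 81.5%

ASCII
Value | Count | Frequency (%)
(space) | 7 | 28.0%
S | 4 | 16.0%
I | 4 | 16.0%
N | 2 | 8.0%
/ | 2 | 8.0%
C | 1 | 4.0%
R | 1 | 4.0%
d | 1 | 4.0%
U | 1 | 4.0%
L | 1 | 4.0%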

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

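First rows

Index | 단어명 | 약어 | 영문명 | 단어유형 | 금칙어여부 | 정의
0 | 전문기관평가 | ENGNEVL | ENGNEVL | 수식어 | N | 특정 일에 대한 전문가 집단의 심사
1 | 평가위원회구분 | CMIT | CMIT | 분류어 | N | 평가위원회구분
2 | 기술 | TCH | TECHNOLOGY OF INSTITUDE | 수식어 | N | 기술
3 | 기술적 | TCHNLGY | TECHNOLOGY | 수식어 | N | 기술적
4 | 부/팀 | TEAM | TEAM | 수식어 | N | 부/팀
5 | 기술 | TECH | TECHNOLOGICAL OF CLASSIFICATION | 수식어 | N | 기술
6 | 전화 | TEL | TELEPHONE | 수식어 | N | 전화
7 | 논문 | THESIS | THESIS | 수식어 | N | 논문
8 | 시간 | TM | TIME | 수식어 | N | 시간
9 |  | TOT | TOTAL | 수식어 | N | 

Last rows

Index | 단어명 | 약어 | 영문명 | 단어유형 | 금칙어여부 | 정의
263 | 상태 | STTUS | STATUS | 수식어 | N | 상태
264 | 현황 | STTUS | STATUS | 수식어 | N | 현황
265 | 결과 요약 | SUMRY | SUMMARY | 분류어 | N | 결과 요약
266 | 요약 | SUMRY | SUMMARY | 수식어 | N | 요약
267 | 소프트웨어 | SW | SOFTWARE | 수식어 | N | 소프트웨어
268 | 테이블 | TABLE | TABLE | 수식어 | N | 테이블
269 | 과제 | TAS | TASK | 수식어 | N | 과제
270 | 기술 | TC | TECHNICAL | 수식어 | N | 기술
271 | 기술료 | TC | TECHNICAL | 수식어 | N | 기술료
272 | 과학기술분류 | TCCL | TECHNICAL CLASSIFICATION | 수식어 | N | 과학기술분류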