Overview

Dataset statistics

Number of variables: 5
Number of observations: 468
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.4 KiB
Average record size in memory: 40.3 B
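
The average record size can be cross-checked from the two figures above. This is a minimal sketch, assuming the average is simply the total in-memory size divided by the number of observations; the variable names are illustrative and the total is already rounded in the report, so the result is approximate.

```python
# Figures copied from the dataset statistics above.
n_observations = 468
total_size_kib = 18.4  # rounded in the report, so the result is approximate

# 1 KiB = 1024 bytes; divide by the row count to get bytes per record.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")  # prints "40.3 B"
```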

Variable types

Text: 2
Boolean: 1
DateTime: 1
Categorical: 1

Dataset

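Description: Table of technology-transfer-related research projects from the research management domain of the Korea Institute of Machinery and Materials (한국기계연구원); it manages the project number, main-project flag, technology transfer contract date, principal investigator, creation date, and related fields.
URL: https://www.data.go.kr/data/15078103/fileData.do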

Alerts

작성일 (creation date) has constant value "2023-07-28" (Constant)
메인과제여부 (main-project flag) is highly imbalanced (69.9%) (Imbalance)
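
The 69.9% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class counts (443 True, 25 False, as listed under the variable itself). The sketch below works under that assumption; the report does not state the formula, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0.0 means perfectly balanced classes, 1.0 means a single class holds everything.
    Assumed definition; the report itself does not spell it out.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)


# Counts of True and False for 메인과제여부, taken from the report.
score = imbalance_score([443, 25])
print(f"{score:.1%}")  # prints "69.9%"
```

Under this definition a 50/50 split scores 0%, so a value near 70% reflects how strongly the True class dominates.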

Reproduction

Analysis started: 2023-12-12 13:42:08.545836
Analysis finished: 2023-12-12 13:42:08.897005
Duration: 0.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사업_과제번호
Text

Distinct: 153
Distinct (%): 32.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 2808
Distinct characters: 30
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
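
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories over the five sample project numbers shown in this report. It uses Python's standard `unicodedata` module; this is an illustration, not necessarily how the profiling tool computes its figures.

```python
import unicodedata
from collections import Counter

# First five sample values of the project-number column, copied from the report.
samples = ["NK220D", "NK214B", "NK214S", "NK225C", "NK227A"]

# unicodedata.category returns a two-letter general category per character,
# e.g. "Lu" (Uppercase Letter) or "Nd" (Decimal Number).
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

print(dict(category_counts))  # prints {'Lu': 15, 'Nd': 15}
```

Each sample has three uppercase letters and three digits, which matches the roughly even split between the two categories reported for the full column.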

Unique

Unique: 76
Unique (%): 16.2%
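
The report lists 153 distinct values but only 76 unique ones. The sketch below shows the difference on the project numbers from the first seven rows of the sample table, assuming "distinct" counts different values and "unique" counts values that occur exactly once; that reading is an assumption, not something the report states.

```python
from collections import Counter

# Project numbers from the first seven rows of the report's sample table.
values = ["NK220D", "NK214B", "NK214S", "NK225C", "NK227A", "NK225C", "NK214B"]

counts = Counter(values)
n_distinct = len(counts)  # how many different values appear
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)  # prints "5 3"
```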

Sample

1st row: NK220D
2nd row: NK214B
3rd row: NK214S
4th row: NK225C
5th row: NK227A

Value               Count  Frequency (%)
nk234a              27     5.8%
nk227a              27     5.8%
nk221b              25     5.3%
nk214b              21     4.5%
nk240b              15     3.2%
nk240a              14     3.0%
nk226b              11     2.4%
nk232b              11     2.4%
nk220c              11     2.4%
nk220d              9      1.9%
Other values (143)  297    63.5%

Most occurring characters

Value              Count  Frequency (%)
2                  628    22.4%
K                  443    15.8%
N                  440    15.7%
1                  214    7.6%
3                  177    6.3%
4                  148    5.3%
0                  125    4.5%
B                  107    3.8%
C                  87     3.1%
A                  81     2.9%
Other values (20)  358    12.7%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    1434   51.1%
Uppercase Letter  1374   48.9%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
K                  443    32.2%
N                  440    32.0%
B                  107    7.8%
C                  87     6.3%
A                  81     5.9%
D                  49     3.6%
F                  32     2.3%
S                  31     2.3%
G                  30     2.2%
E                  26     1.9%
Other values (10)  48     3.5%

Decimal Number
Value  Count  Frequency (%)
2      628    43.8%
1      214    14.9%
3      177    12.3%
4      148    10.3%
0      125    8.7%
7      62     4.3%
6      41     2.9%
9      16     1.1%
8      13     0.9%
5      10     0.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  1434   51.1%
Latin   1374   48.9%

Most frequent character per script

Latin
Value              Count  Frequency (%)
K                  443    32.2%
N                  440    32.0%
B                  107    7.8%
C                  87     6.3%
A                  81     5.9%
D                  49     3.6%
F                  32     2.3%
S                  31     2.3%
G                  30     2.2%
E                  26     1.9%
Other values (10)  48     3.5%

Common
Value  Count  Frequency (%)
2      628    43.8%
1      214    14.9%
3      177    12.3%
4      148    10.3%
0      125    8.7%
7      62     4.3%
6      41     2.9%
9      16     1.1%
8      13     0.9%
5      10     0.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2808   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
2                  628    22.4%
K                  443    15.8%
N                  440    15.7%
1                  214    7.6%
3                  177    6.3%
4                  148    5.3%
0                  125    4.5%
B                  107    3.8%
C                  87     3.1%
A                  81     2.9%
Other values (20)  358    12.7%

메인과제여부
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 600.0 B

Value  Count  Frequency (%)
True   443    94.7%
False  25     5.3%

기술이전계약일
DateTime

Distinct: 284
Distinct (%): 60.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB
Minimum: 2018-01-01 00:00:00
Maximum: 2023-06-30 00:00:00

Histogram with fixed size bins (bins=50)

책임자
Text

Distinct: 66
Distinct (%): 14.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1404
Distinct characters: 67
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
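
For the masked names in this column, the block breakdown can be sketched by testing code point ranges. The example below uses the five sample values shown in this report; `block_of` is a hypothetical helper that only knows the two ranges needed here (ASCII up to U+007F, Hangul Syllables U+AC00 to U+D7AF), not a general block lookup.

```python
from collections import Counter

# First five sample values of the masked principal-investigator column.
samples = ["*병*", "*백*", "*용*", "*도*", "*승*"]


def block_of(ch):
    """Classify a character into one of the two blocks seen in this column."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul Syllables"
    return "Other"


block_counts = Counter(block_of(ch) for value in samples for ch in value)
total = sum(block_counts.values())
ascii_share = block_counts["ASCII"] / total

print(dict(block_counts))    # prints {'ASCII': 10, 'Hangul Syllables': 5}
print(f"{ascii_share:.1%}")  # prints "66.7%"
```

Because every value is one Hangul syllable between two asterisks, two of every three characters are ASCII, which lines up with the 66.7% / 33.3% split reported for the full column.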

Unique

Unique: 12
Unique (%): 2.6%

Sample

1st row: *병*
2nd row: *백*
3rd row: *용*
4th row: *도*
5th row: *승*

Value              Count  Frequency (%)
(blank)            36     7.7%
(blank)            26     5.6%
(blank)            25     5.3%
(blank)            23     4.9%
(blank)            22     4.7%
(blank)            20     4.3%
(blank)            17     3.6%
(blank)            17     3.6%
(blank)            15     3.2%
(blank)            13     2.8%
Other values (56)  254    54.3%

Most occurring characters

Value              Count  Frequency (%)
*                  936    66.7%
(blank)            36     2.6%
(blank)            26     1.9%
(blank)            25     1.8%
(blank)            23     1.6%
(blank)            22     1.6%
(blank)            20     1.4%
(blank)            17     1.2%
(blank)            17     1.2%
(blank)            15     1.1%
Other values (57)  267    19.0%

Most occurring categories

Value              Count  Frequency (%)
Other Punctuation  936    66.7%
Other Letter       468    33.3%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(blank)            36     7.7%
(blank)            26     5.6%
(blank)            25     5.3%
(blank)            23     4.9%
(blank)            22     4.7%
(blank)            20     4.3%
(blank)            17     3.6%
(blank)            17     3.6%
(blank)            15     3.2%
(blank)            13     2.8%
Other values (56)  254    54.3%

Other Punctuation
Value  Count  Frequency (%)
*      936    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  936    66.7%
Hangul  468    33.3%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(blank)            36     7.7%
(blank)            26     5.6%
(blank)            25     5.3%
(blank)            23     4.9%
(blank)            22     4.7%
(blank)            20     4.3%
(blank)            17     3.6%
(blank)            17     3.6%
(blank)            15     3.2%
(blank)            13     2.8%
Other values (56)  254    54.3%

Common
Value  Count  Frequency (%)
*      936    100.0%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   936    66.7%
Hangul  468    33.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
*      936    100.0%

Hangul
Value              Count  Frequency (%)
(blank)            36     7.7%
(blank)            26     5.6%
(blank)            25     5.3%
(blank)            23     4.9%
(blank)            22     4.7%
(blank)            20     4.3%
(blank)            17     3.6%
(blank)            17     3.6%
(blank)            15     3.2%
(blank)            13     2.8%
Other values (56)  254    54.3%

작성일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-07-28
2nd row: 2023-07-28
3rd row: 2023-07-28
4th row: 2023-07-28
5th row: 2023-07-28

Common Values

Value       Count  Frequency (%)
2023-07-28  468    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
2023-07-28  468    100.0%

Correlations

              메인과제여부  책임자
메인과제여부  1.000         0.000
책임자        0.000         1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     사업_과제번호  메인과제여부  기술이전계약일  책임자  작성일
0    NK220D         Y             2019-12-03      *병*    2023-07-28
1    NK214B         Y             2018-06-01      *백*    2023-07-28
2    NK214S         Y             2018-09-14      *용*    2023-07-28
3    NK225C         Y             2020-02-10      *도*    2023-07-28
4    NK227A         Y             2020-03-01      *승*    2023-07-28
5    NK225C         Y             2020-06-22      *도*    2023-07-28
6    NK214B         Y             2018-07-01      *성*    2023-07-28
7    NK213B         Y             2018-10-22      *대*    2023-07-28
8    NK214L         Y             2018-11-30      *양*    2023-07-28
9    NK226D         Y             2020-07-09      *준*    2023-07-28

     사업_과제번호  메인과제여부  기술이전계약일  책임자  작성일
458  NK226B         Y             2021-12-03      *의*    2023-07-28
459  NK240A         Y             2022-06-28      *준*    2023-07-28
460  NK234A         Y             2021-07-07      *완*    2023-07-28
461  NK234A         Y             2021-09-01      *광*    2023-07-28
462  NK240A         Y             2022-06-27      *상*    2023-07-28
463  NK240B         N             2022-07-13      *용*    2023-07-28
464  NK236J         Y             2022-11-15      *준*    2023-07-28
465  NK240E         Y             2023-06-30      *찬*    2023-07-28
466  NK240B         Y             2022-10-12      *성*    2023-07-28
467  NK237D         Y             2023-02-27      *범*    2023-07-28