Overview

Dataset statistics

Number of variables: 10
Number of observations: 2043
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 167.7 KiB
Average record size in memory: 84.1 B
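The average record size is the total in-memory size divided by the number of observations. A quick check of that arithmetic, assuming 1 KiB = 1024 bytes. The total is itself displayed rounded to 0.1 KiB, so the check can only agree to the displayed precision:

```python
# Check: average record size = total memory / number of observations.
# Assumes 1 KiB = 1024 bytes; the reported total is rounded to 0.1 KiB.
TOTAL_KIB = 167.7
N_OBSERVATIONS = 2043

avg_bytes = TOTAL_KIB * 1024 / N_OBSERVATIONS
print(f"Average record size: {avg_bytes:.1f} B")
```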

Variable types

Text: 3
Numeric: 3
Categorical: 2
DateTime: 2

Dataset

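Description: Project information for papers published by the Korea Institute of Machinery and Materials (KIMM): 사업과제신청번호 (project application number), 사업_과제번호 (project number), 성과활동관련과제구분 (performance-activity project type), 작성자 (created by), 작성일 (created on), 과제기여율 (project contribution rate), etc.
URL: https://www.data.go.kr/data/15049190/fileData.do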

Alerts

성과활동관련과제구분 has constant value "논문" (paper) [Constant]
수정자 has constant value "9999" [Constant]
과제기여율 is highly overall correlated with 오더번호 [High correlation]
오더번호 is highly overall correlated with 과제기여율 [High correlation]
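A column is flagged Constant when it holds a single distinct value. A minimal sketch of that check in plain Python; `constant_columns` is a hypothetical helper, and the three records are rows 0, 3 and 4 of this report's sample (a few columns only). On a subset this small, other columns could look constant by chance, so a real check should run over all 2043 rows:

```python
# Hypothetical helper: return the columns whose values are all identical.
def constant_columns(rows: list[dict[str, object]]) -> list[str]:
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


# Rows 0, 3 and 4 of the report's sample, restricted to four columns.
sample = [
    {"성과활동신청번호": "T20180177", "성과활동관련과제구분": "논문", "수정자": "9999", "과제기여율": 100},
    {"성과활동신청번호": "T20180726", "성과활동관련과제구분": "논문", "수정자": "9999", "과제기여율": 50},
    {"성과활동신청번호": "T20180726", "성과활동관련과제구분": "논문", "수정자": "9999", "과제기여율": 50},
]
print(constant_columns(sample))
```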

Reproduction

Analysis started: 2023-12-12 11:15:30.909874
Analysis finished: 2023-12-12 11:15:33.392758
Duration: 2.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

성과활동신청번호 (performance-activity application number)
Text

Distinct: 1340
Distinct (%): 65.6%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 18387
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 786
Unique (%): 38.5%
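Distinct counts every different value, whereas Unique counts only values that occur exactly once: 1340 different application numbers occur in the column, and 786 of them appear in a single row. A small sketch of both counts with `collections.Counter`, using the five sample values listed for this variable (the full column would be needed to reproduce 1340 and 786), plus the percentage arithmetic against 2043 rows:

```python
from collections import Counter

# The five sample values listed for 성과활동신청번호.
values = ["T20180177", "T20180186", "T20180210", "T20180726", "T20180726"]

counts = Counter(values)
distinct = len(counts)                               # number of different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
print(distinct, unique)

# Percentages reported for the full column (2043 rows).
report_pct = [round(100 * n / 2043, 1) for n in (1340, 786)]
print(report_pct)
```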

Sample

1st row: T20180177
2nd row: T20180186
3rd row: T20180210
4th row: T20180726
5th row: T20180726
Value | Count | Frequency (%)
t20210427 | 6 | 0.3%
t20220435 | 5 | 0.2%
t20200310 | 5 | 0.2%
t20220408 | 5 | 0.2%
t20220419 | 5 | 0.2%
t20220434 | 5 | 0.2%
t20210377 | 4 | 0.2%
t20210642 | 4 | 0.2%
t20180300 | 4 | 0.2%
t20210287 | 4 | 0.2%
Other values (1330) | 1996 | 97.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 5174 | 28.1%
2 | 4542 | 24.7%
T | 2043 | 11.1%
1 | 1963 | 10.7%
8 | 778 | 4.2%
9 | 746 | 4.1%
3 | 736 | 4.0%
4 | 693 | 3.8%
5 | 623 | 3.4%
6 | 572 | 3.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 16344 | 88.9%
Uppercase Letter | 2043 | 11.1%
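The category names correspond to Unicode general categories: Decimal Number is `Nd` and Uppercase Letter is `Lu`. Python's standard `unicodedata.category` returns these codes, so per-category counts can be sketched like this for the five sample values. Each value is one `Lu` letter followed by eight `Nd` digits, which matches 16344 = 8 × 2043 and 2043 = 1 × 2043 over the full column:

```python
import unicodedata
from collections import Counter

# Map two-letter Unicode general-category codes to the names used in the report.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter"}

# The five sample values listed for 성과활동신청번호.
values = ["T20180177", "T20180186", "T20180210", "T20180726", "T20180726"]

category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in values
    for ch in value
)
print(category_counts)
```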

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5174 | 31.7%
2 | 4542 | 27.8%
1 | 1963 | 12.0%
8 | 778 | 4.8%
9 | 746 | 4.6%
3 | 736 | 4.5%
4 | 693 | 4.2%
5 | 623 | 3.8%
6 | 572 | 3.5%
7 | 517 | 3.2%
Uppercase Letter
Value | Count | Frequency (%)
T | 2043 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16344 | 88.9%
Latin | 2043 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 5174 | 31.7%
2 | 4542 | 27.8%
1 | 1963 | 12.0%
8 | 778 | 4.8%
9 | 746 | 4.6%
3 | 736 | 4.5%
4 | 693 | 4.2%
5 | 623 | 3.8%
6 | 572 | 3.5%
7 | 517 | 3.2%
Latin
Value | Count | Frequency (%)
T | 2043 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18387 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 5174 | 28.1%
2 | 4542 | 24.7%
T | 2043 | 11.1%
1 | 1963 | 10.7%
8 | 778 | 4.2%
9 | 746 | 4.1%
3 | 736 | 4.0%
4 | 693 | 3.8%
5 | 623 | 3.4%
6 | 572 | 3.1%

사업_과제번호 (project number)
Text

Distinct: 833
Distinct (%): 40.8%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 12258
Distinct characters: 29
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 431
Unique (%): 21.1%

Sample

1st row: MO8580
2nd row: NK213D
3rd row: MO5590
4th row: NK211B
5th row: OD1760
Value | Count | Frequency (%)
nk226d | 26 | 1.3%
nk211b | 17 | 0.8%
nm9730 | 14 | 0.7%
nk232a | 13 | 0.6%
nk213e | 13 | 0.6%
nk230c | 13 | 0.6%
nk231f | 13 | 0.6%
nm9440 | 12 | 0.6%
nk220d | 12 | 0.6%
nk238d | 11 | 0.5%
Other values (823) | 1899 | 93.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 2015 | 16.4%
2 | 1359 | 11.1%
N | 1350 | 11.0%
1 | 840 | 6.9%
K | 715 | 5.8%
3 | 696 | 5.7%
M | 550 | 4.5%
9 | 516 | 4.2%
7 | 484 | 3.9%
6 | 435 | 3.5%
Other values (19) | 3298 | 26.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7441 | 60.7%
Uppercase Letter | 4817 | 39.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 1350 | 28.0%
K | 715 | 14.8%
M | 550 | 11.4%
B | 363 | 7.5%
E | 363 | 7.5%
T | 256 | 5.3%
D | 246 | 5.1%
O | 233 | 4.8%
C | 179 | 3.7%
G | 174 | 3.6%
Other values (9) | 388 | 8.1%
Decimal Number
Value | Count | Frequency (%)
0 | 2015 | 27.1%
2 | 1359 | 18.3%
1 | 840 | 11.3%
3 | 696 | 9.4%
9 | 516 | 6.9%
7 | 484 | 6.5%
6 | 435 | 5.8%
4 | 380 | 5.1%
8 | 377 | 5.1%
5 | 339 | 4.6%
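Percentages under "Most occurring characters" use all 12258 characters as the denominator, while those under "Most frequent character per category" use only the characters in that category (4817 uppercase letters, 7441 decimal digits). That is why N is 11.0% in one table and 28.0% in the other. A small check of both denominators, using counts from this report:

```python
# Counts taken from this report for 사업_과제번호 (2043 values x 6 characters).
TOTAL_CHARS = 12258
CATEGORY_TOTALS = {"Uppercase Letter": 4817, "Decimal Number": 7441}
CHAR_COUNTS = {"N": ("Uppercase Letter", 1350), "0": ("Decimal Number", 2015)}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(100 * part / whole, 1)


for ch, (category, count) in CHAR_COUNTS.items():
    overall = pct(count, TOTAL_CHARS)
    within = pct(count, CATEGORY_TOTALS[category])
    print(f"{ch}: {overall}% of all characters, {within}% of {category}")

# The two category totals add up to the total character count.
assert sum(CATEGORY_TOTALS.values()) == TOTAL_CHARS == 2043 * 6
```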

Most occurring scripts

Value | Count | Frequency (%)
Common | 7441 | 60.7%
Latin | 4817 | 39.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 1350 | 28.0%
K | 715 | 14.8%
M | 550 | 11.4%
B | 363 | 7.5%
E | 363 | 7.5%
T | 256 | 5.3%
D | 246 | 5.1%
O | 233 | 4.8%
C | 179 | 3.7%
G | 174 | 3.6%
Other values (9) | 388 | 8.1%
Common
Value | Count | Frequency (%)
0 | 2015 | 27.1%
2 | 1359 | 18.3%
1 | 840 | 11.3%
3 | 696 | 9.4%
9 | 516 | 6.9%
7 | 484 | 6.5%
6 | 435 | 5.8%
4 | 380 | 5.1%
8 | 377 | 5.1%
5 | 339 | 4.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12258 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 2015 | 16.4%
2 | 1359 | 11.1%
N | 1350 | 11.0%
1 | 840 | 6.9%
K | 715 | 5.8%
3 | 696 | 5.7%
M | 550 | 4.5%
9 | 516 | 4.2%
7 | 484 | 3.9%
6 | 435 | 3.5%
Other values (19) | 3298 | 26.9%

사업_과제신청순번 (project application sequence number)
Real number (ℝ)

Distinct: 10
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7963779
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 4
95th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 2
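Because the value counts listed for this variable cover all 2043 rows, the quantiles can be recomputed exactly by expanding the counts back into a column. A sketch using `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics (presumably the same convention as the default of pandas' `Series.quantile`); the 5th and 95th percentiles are the first and last of the 19 cut points produced by `n=20`:

```python
import statistics

# Value counts for 사업_과제신청순번 as listed in this report (they sum to 2043).
COUNTS = {1: 468, 2: 462, 3: 526, 4: 312, 5: 203, 6: 50, 7: 6, 8: 8, 9: 6, 10: 2}

# Expand the counts back into the full, sorted column.
data = [value for value, count in sorted(COUNTS.items()) for _ in range(count)]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
cuts = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
p5, p95 = cuts[0], cuts[-1]

print(f"n={len(data)} Q1={q1} median={median} Q3={q3} IQR={q3 - q1}")
print(f"5th percentile={p5} 95th percentile={p95}")
```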

Descriptive statistics

Standard deviation: 1.4734498
Coefficient of variation (CV): 0.52691371
Kurtosis: 0.93232891
Mean: 2.7963779
Median absolute deviation (MAD): 1
Skewness: 0.79244933
Sum: 5713
Variance: 2.1710545
Monotonicity: Not monotonic
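The same expanded column reproduces the descriptive statistics: the variance is the sample variance (divisor n - 1), CV is the standard deviation divided by the mean, and MAD is the unscaled median of absolute deviations from the median. A sketch with the standard `statistics` module, using the value counts listed for this variable:

```python
import statistics

# Value counts for 사업_과제신청순번 as listed in this report.
COUNTS = {1: 468, 2: 462, 3: 526, 4: 312, 5: 203, 6: 50, 7: 6, 8: 8, 9: 6, 10: 2}
data = [value for value, count in COUNTS.items() for _ in range(count)]

total = sum(data)
mean = statistics.fmean(data)
variance = statistics.variance(data)  # sample variance, divides by n - 1
std = statistics.stdev(data)
cv = std / mean
med = statistics.median(data)
mad = statistics.median(abs(x - med) for x in data)  # unscaled MAD

print(f"Sum={total} Mean={mean:.7f} Variance={variance:.7f}")
print(f"Std={std:.7f} CV={cv:.8f} MAD={mad}")
```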
[Figure: histogram with fixed-size bins (bins=10)]
Common values
Value | Count | Frequency (%)
3 | 526 | 25.7%
1 | 468 | 22.9%
2 | 462 | 22.6%
4 | 312 | 15.3%
5 | 203 | 9.9%
6 | 50 | 2.4%
8 | 8 | 0.4%
7 | 6 | 0.3%
9 | 6 | 0.3%
10 | 2 | 0.1%
Smallest values
Value | Count | Frequency (%)
1 | 468 | 22.9%
2 | 462 | 22.6%
3 | 526 | 25.7%
4 | 312 | 15.3%
5 | 203 | 9.9%
6 | 50 | 2.4%
7 | 6 | 0.3%
8 | 8 | 0.4%
9 | 6 | 0.3%
10 | 2 | 0.1%
Largest values
Value | Count | Frequency (%)
10 | 2 | 0.1%
9 | 6 | 0.3%
8 | 8 | 0.4%
7 | 6 | 0.3%
6 | 50 | 2.4%
5 | 203 | 9.9%
4 | 312 | 15.3%
3 | 526 | 25.7%
2 | 462 | 22.6%
1 | 468 | 22.9%

성과활동관련과제구분 (performance-activity project type)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB
논문 (paper): 2043 rows

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 논문
2nd row: 논문
3rd row: 논문
4th row: 논문
5th row: 논문

Common Values

Value | Count | Frequency (%)
논문 | 2043 | 100.0%

Length

[Figure: histogram of the lengths of the category]


작성자 (created by)
Text

Distinct: 309
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 8172
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 51
Unique (%): 2.5%

Sample

1st row: 0813
2nd row: 0908
3rd row: 0985
4th row: 0973
5th row: 0973
Value | Count | Frequency (%)
0795 | 59 | 2.9%
0934 | 39 | 1.9%
0869 | 35 | 1.7%
0612 | 35 | 1.7%
1066 | 34 | 1.7%
0868 | 33 | 1.6%
0994 | 30 | 1.5%
1043 | 30 | 1.5%
0872 | 29 | 1.4%
1079 | 28 | 1.4%
Other values (299) | 1691 | 82.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 2098 | 25.7%
1 | 1286 | 15.7%
8 | 899 | 11.0%
9 | 892 | 10.9%
6 | 659 | 8.1%
7 | 505 | 6.2%
4 | 491 | 6.0%
3 | 488 | 6.0%
2 | 411 | 5.0%
5 | 356 | 4.4%
Other values (3) | 87 | 1.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8085 | 98.9%
Uppercase Letter | 87 | 1.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2098 | 25.9%
1 | 1286 | 15.9%
8 | 899 | 11.1%
9 | 892 | 11.0%
6 | 659 | 8.2%
7 | 505 | 6.2%
4 | 491 | 6.1%
3 | 488 | 6.0%
2 | 411 | 5.1%
5 | 356 | 4.4%
Uppercase Letter
Value | Count | Frequency (%)
M | 52 | 59.8%
H | 28 | 32.2%
U | 7 | 8.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 8085 | 98.9%
Latin | 87 | 1.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2098 | 25.9%
1 | 1286 | 15.9%
8 | 899 | 11.1%
9 | 892 | 11.0%
6 | 659 | 8.2%
7 | 505 | 6.2%
4 | 491 | 6.1%
3 | 488 | 6.0%
2 | 411 | 5.1%
5 | 356 | 4.4%
Latin
Value | Count | Frequency (%)
M | 52 | 59.8%
H | 28 | 32.2%
U | 7 | 8.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8172 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 2098 | 25.7%
1 | 1286 | 15.7%
8 | 899 | 11.0%
9 | 892 | 10.9%
6 | 659 | 8.1%
7 | 505 | 6.2%
4 | 491 | 6.0%
3 | 488 | 6.0%
2 | 411 | 5.0%
5 | 356 | 4.4%
Other values (3) | 87 | 1.1%

작성일 (created on)
DateTime

Distinct: 710
Distinct (%): 34.8%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB
Minimum: 2018-01-03 00:00:00
Maximum: 2023-02-14 00:00:00
[Figure: histogram with fixed-size bins (bins=50)]
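With bins=50, a fixed-width histogram splits the reported date range into 50 equal intervals. Assuming the bins span exactly the minimum to the maximum (as `numpy.histogram` does by default when no range is given; whether the report bins dates this way is an assumption), each bar covers a little over 37 days:

```python
from datetime import datetime

# Minimum and maximum reported for 작성일.
start = datetime(2018, 1, 3)
end = datetime(2023, 2, 14)
BINS = 50

span = end - start          # timedelta
bin_width = span / BINS     # dividing a timedelta by an int gives a timedelta
days_per_bin = bin_width.total_seconds() / 86400
print(f"span = {span.days} days, bin width ~ {days_per_bin:.2f} days")
```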

수정자 (modified by)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB
9999: 2043 rows

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9999
2nd row: 9999
3rd row: 9999
4th row: 9999
5th row: 9999

Common Values

Value | Count | Frequency (%)
9999 | 2043 | 100.0%

Length

[Figure: histogram of the lengths of the category]


수정일 (modified on)
DateTime

Distinct: 710
Distinct (%): 34.8%
Missing: 0
Missing (%): 0.0%
Memory size: 16.1 KiB
Minimum: 2018-01-03 00:00:00
Maximum: 2023-02-14 00:00:00
[Figure: histogram with fixed-size bins (bins=50)]

과제기여율 (project contribution rate)
Real number (ℝ)

HIGH CORRELATION

Distinct: 29
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.100343
Minimum: 0
Maximum: 100
Zeros: 3
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 18.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 10
Q1: 40
Median: 50
Q3: 100
95th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 31.965291
Coefficient of variation (CV): 0.51473615
Kurtosis: -1.2936384
Mean: 62.100343
Median absolute deviation (MAD): 30
Skewness: -0.0774988
Sum: 126871
Variance: 1021.7798
Monotonicity: Not monotonic
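For 과제기여율 the report groups 19 values under "Other values", so the column cannot be rebuilt from its tables, but the summary figures can still be cross-checked: Mean = Sum / n, Variance = (standard deviation)², CV = standard deviation / Mean, and IQR = Q3 - Q1. A quick arithmetic check from the reported numbers:

```python
# Summary figures reported for 과제기여율.
N = 2043
SUM = 126871
STD = 31.965291
Q1, Q3 = 40, 100

mean = SUM / N        # reported Mean: 62.100343
variance = STD ** 2   # reported Variance: 1021.7798
cv = STD / mean       # reported CV: 0.51473615
iqr = Q3 - Q1         # reported IQR: 60

print(f"mean={mean:.6f} variance={variance:.4f} cv={cv:.8f} IQR={iqr}")
```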
[Figure: histogram with fixed-size bins (bins=29)]
Common values
Value | Count | Frequency (%)
100 | 708 | 34.7%
50 | 526 | 25.7%
30 | 167 | 8.2%
20 | 106 | 5.2%
40 | 106 | 5.2%
70 | 82 | 4.0%
10 | 71 | 3.5%
60 | 58 | 2.8%
80 | 50 | 2.4%
1 | 45 | 2.2%
Other values (19) | 124 | 6.1%
Smallest values
Value | Count | Frequency (%)
0 | 3 | 0.1%
1 | 45 | 2.2%
2 | 4 | 0.2%
3 | 1 | < 0.1%
5 | 24 | 1.2%
10 | 71 | 3.5%
15 | 4 | 0.2%
20 | 106 | 5.2%
25 | 29 | 1.4%
26 | 1 | < 0.1%
Largest values
Value | Count | Frequency (%)
100 | 708 | 34.7%
95 | 4 | 0.2%
90 | 19 | 0.9%
85 | 1 | < 0.1%
80 | 50 | 2.4%
70 | 82 | 4.0%
67 | 1 | < 0.1%
60 | 58 | 2.8%
51 | 2 | 0.1%
50 | 526 | 25.7%

오더번호 (order number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4395497
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 3
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.69744335
Coefficient of variation (CV): 0.48448717
Kurtosis: 3.9729702
Mean: 1.4395497
Median absolute deviation (MAD): 0
Skewness: 1.8071709
Sum: 2941
Variance: 0.48642722
Monotonicity: Not monotonic
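The value counts for 오더번호 cover all 2043 rows, so its moments can be recomputed from them. Assuming the skewness follows pandas' `Series.skew`, i.e. the adjusted Fisher-Pearson coefficient n * sqrt(n - 1) / (n - 2) * S3 / S2**1.5, where S2 and S3 are the sums of squared and cubed deviations from the mean, a sketch:

```python
import math

# Value counts for 오더번호 as listed in this report (they sum to 2043).
COUNTS = {1: 1339, 2: 552, 3: 118, 4: 27, 5: 6, 6: 1}

n = sum(COUNTS.values())
mean = sum(v * c for v, c in COUNTS.items()) / n

# Count-weighted sums of squared and cubed deviations from the mean.
s2 = sum(c * (v - mean) ** 2 for v, c in COUNTS.items())
s3 = sum(c * (v - mean) ** 3 for v, c in COUNTS.items())

variance = s2 / (n - 1)  # sample variance
# Adjusted Fisher-Pearson (bias-corrected) sample skewness.
skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5

print(f"n={n} mean={mean:.7f} variance={variance:.8f} skewness={skewness:.7f}")
```

A strongly positive skew is what the counts suggest: about two thirds of rows have 오더번호 = 1, with a thin tail up to 6.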
[Figure: histogram with fixed-size bins (bins=6)]
Common values
Value | Count | Frequency (%)
1 | 1339 | 65.5%
2 | 552 | 27.0%
3 | 118 | 5.8%
4 | 27 | 1.3%
5 | 6 | 0.3%
6 | 1 | < 0.1%
Smallest values
Value | Count | Frequency (%)
1 | 1339 | 65.5%
2 | 552 | 27.0%
3 | 118 | 5.8%
4 | 27 | 1.3%
5 | 6 | 0.3%
6 | 1 | < 0.1%
Largest values
Value | Count | Frequency (%)
6 | 1 | < 0.1%
5 | 6 | 0.3%
4 | 27 | 1.3%
3 | 118 | 5.8%
2 | 552 | 27.0%
1 | 1339 | 65.5%

Interactions

[Figures: 9 interaction plots between the numeric variables 사업_과제신청순번, 과제기여율 and 오더번호]

Correlations

[Figure: correlation heatmap]
Variable | 사업_과제신청순번 | 과제기여율 | 오더번호
사업_과제신청순번 | 1.000 | 0.000 | 0.016
과제기여율 | 0.000 | 1.000 | 0.583
오더번호 | 0.016 | 0.583 | 1.000
[Figure: correlation heatmap]
Variable | 사업_과제신청순번 | 과제기여율 | 오더번호
사업_과제신청순번 | 1.000 | 0.061 | -0.083
과제기여율 | 0.061 | 1.000 | -0.656
오더번호 | -0.083 | -0.656 | 1.000
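Spearman's rank correlation is one common way to measure a monotonic association like the one flagged between 과제기여율 and 오더번호. This report does not label which method produced each matrix, so treat the following as an illustration rather than the report's computation. A self-contained sketch (tied values get average ranks), applied only to the 20 sample rows of this report; on those rows it gives a negative coefficient, the same direction as the -0.656 entry, but it is not expected to reproduce the full-data values:

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# (과제기여율, 오더번호) for sample rows 0-9 and 2033-2042 of this report.
rate = [100, 100, 100, 50, 50, 100, 100, 50, 50, 100,
        100, 100, 100, 50, 50, 100, 100, 100, 40, 60]
order = [1, 1, 1, 1, 2, 1, 1, 1, 2, 1,
         1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
rho = spearman(rate, order)
print(f"Spearman rho on the sample rows: {rho:.3f}")
```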

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.

Sample

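First rows
Row | 성과활동신청번호 | 사업_과제번호 | 사업_과제신청순번 | 성과활동관련과제구분 | 작성자 | 작성일 | 수정자 | 수정일 | 과제기여율 | 오더번호
0 | T20180177 | MO8580 | 3 | 논문 | 0813 | 2018-01-17 | 9999 | 2018-01-17 | 100 | 1
1 | T20180186 | NK213D | 1 | 논문 | 0908 | 2018-01-22 | 9999 | 2018-01-22 | 100 | 1
2 | T20180210 | MO5590 | 5 | 논문 | 0985 | 2018-02-19 | 9999 | 2018-02-19 | 100 | 1
3 | T20180726 | NK211B | 1 | 논문 | 0973 | 2018-09-14 | 9999 | 2018-09-14 | 50 | 1
4 | T20180726 | OD1760 | 5 | 논문 | 0973 | 2018-09-14 | 9999 | 2018-09-14 | 50 | 2
5 | T20180982 | OD1790 | 3 | 논문 | 0736 | 2018-11-16 | 9999 | 2018-11-16 | 100 | 1
6 | T20180958 | NE6300 | 1 | 논문 | 1037 | 2018-11-13 | 9999 | 2018-11-13 | 100 | 1
7 | T20181141 | NK212D | 1 | 논문 | 1004 | 2018-12-20 | 9999 | 2018-12-20 | 50 | 1
8 | T20181141 | NK215C | 1 | 논문 | 1004 | 2018-12-20 | 9999 | 2018-12-20 | 50 | 2
9 | T20190030 | MO9060 | 4 | 논문 | 0667 | 2019-01-08 | 9999 | 2019-01-08 | 100 | 1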
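Last rows
Row | 성과활동신청번호 | 사업_과제번호 | 사업_과제신청순번 | 성과활동관련과제구분 | 작성자 | 작성일 | 수정자 | 수정일 | 과제기여율 | 오더번호
2033 | T20230055 | NB1730 | 1 | 논문 | 0621 | 2023-01-05 | 9999 | 2023-01-05 | 100 | 1
2034 | T20230057 | NE7970 | 2 | 논문 | 0727 | 2023-01-05 | 9999 | 2023-01-05 | 100 | 1
2035 | T20230059 | NK237A | 5 | 논문 | 0837 | 2023-01-05 | 9999 | 2023-01-05 | 100 | 1
2036 | T20230060 | NK237A | 5 | 논문 | 0727 | 2023-01-05 | 9999 | 2023-01-05 | 50 | 1
2037 | T20230060 | AI3860 | 1 | 논문 | 0727 | 2023-01-05 | 9999 | 2023-01-05 | 50 | 2
2038 | T20230065 | GG3230 | 2 | 논문 | 0736 | 2023-01-05 | 9999 | 2023-01-05 | 100 | 1
2039 | T20230114 | NE7930 | 3 | 논문 | 0947 | 2023-01-06 | 9999 | 2023-01-06 | 100 | 1
2040 | T20230152 | NE8080 | 5 | 논문 | 0951 | 2023-02-14 | 9999 | 2023-02-14 | 100 | 1
2041 | T20230126 | NK213F | 4 | 논문 | 0021 | 2023-01-16 | 9999 | 2023-01-16 | 40 | 1
2042 | T20230126 | MT2130 | 2 | 논문 | 0021 | 2023-01-16 | 9999 | 2023-01-16 | 60 | 2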
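In the sample rows, an application number that appears twice (one paper linked to two projects) has its 과제기여율 split across the rows, while 오더번호 counts 1, 2 within that paper. This would be one plausible reason for the negative association between the two columns. A quick check on these 20 rows only, grouping by 성과활동신청번호; whether contributions sum to 100 for every paper in the full dataset is an assumption this sample cannot confirm:

```python
from collections import defaultdict

# (성과활동신청번호, 과제기여율, 오더번호) for the 20 sample rows in this report.
rows = [
    ("T20180177", 100, 1), ("T20180186", 100, 1), ("T20180210", 100, 1),
    ("T20180726", 50, 1), ("T20180726", 50, 2), ("T20180982", 100, 1),
    ("T20180958", 100, 1), ("T20181141", 50, 1), ("T20181141", 50, 2),
    ("T20190030", 100, 1), ("T20230055", 100, 1), ("T20230057", 100, 1),
    ("T20230059", 100, 1), ("T20230060", 50, 1), ("T20230060", 50, 2),
    ("T20230065", 100, 1), ("T20230114", 100, 1), ("T20230152", 100, 1),
    ("T20230126", 40, 1), ("T20230126", 60, 2),
]

totals = defaultdict(int)    # summed contribution rate per application number
projects = defaultdict(int)  # number of project rows per application number
for app_no, rate, _order in rows:
    totals[app_no] += rate
    projects[app_no] += 1

for app_no in totals:
    print(app_no, projects[app_no], totals[app_no])
```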