Overview

Dataset statistics

Number of variables: 10
Number of observations: 1371
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 112.6 KiB
Average record size in memory: 84.1 B
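
The average record size appears to be the total memory size divided by the number of observations. A minimal check, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics.
total_kib = 112.6        # total size in memory, in KiB
n_observations = 1371    # number of rows

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 84.1 B shown in the report.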

Variable types

Text: 2
Categorical: 1
DateTime: 3
Numeric: 4

Dataset

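Description: Information from the research-management domain of the Korea Institute of Machinery and Materials (한국기계연구원) used to manage researchers participating in project plans (project number, participant, participation type, participation start date, participation end date, participation rate, and so on).
URL: https://www.data.go.kr/data/15078050/fileData.do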

Alerts

작성일 has constant value "" [Constant]
참여개월 is highly overall correlated with 참여일수 [High correlation]
참여일수 is highly overall correlated with 참여개월 [High correlation]
참여율 is highly overall correlated with 참여연구원인건비 [High correlation]
참여연구원인건비 is highly overall correlated with 참여율 [High correlation]
참여형태 is highly imbalanced (64.2%) [Imbalance]
참여율 has 21 (1.5%) zeros [Zeros]
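
The imbalance and zeros figures can be reproduced from counts given elsewhere in this report (1278 and 93 for the two 참여형태 classes; 21 zeros among 1371 rows for 참여율). The sketch below assumes the imbalance score is one minus the normalized Shannon entropy of the class shares. That formula matches the 64.2% shown here, but it is an assumption about how the figure was derived, and `imbalance_score` is an illustrative name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares.

    Assumed formula: 0 means perfectly balanced, 1 means a single class.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts for 참여형태 (연구원, 책임자) as listed in the report.
score = imbalance_score([1278, 93])

# Share of zeros in 참여율: 21 zeros out of 1371 observations.
zeros_pct = 21 / 1371 * 100

print(f"imbalance: {score:.1%}, zeros: {zeros_pct:.1f}%")
```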

Reproduction

Analysis started: 2023-12-12 15:55:56.720699
Analysis finished: 2023-12-12 15:55:59.984914
Duration: 3.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
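
The reported duration is consistent with the difference between the two timestamps. A small check using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 15:55:56.720699")
finished = datetime.fromisoformat("2023-12-12 15:55:59.984914")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```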

Variables

사업_과제번호
Text

Distinct: 91
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 8226
Distinct characters: 24
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NK237C
2nd row: NK237C
3rd row: NK237C
4th row: NK232C
5th row: NK232C
Value  Count  Frequency (%)
nk237b  43  3.1%
nk231b  41  3.0%
nk232d  39  2.8%
nk236i  37  2.7%
nk237a  35  2.6%
nk240a  34  2.5%
nk234a  33  2.4%
nk232f  33  2.4%
nk230c  32  2.3%
nk238f  31  2.3%
Other values (81)  1013  73.9%

Most occurring characters

Value  Count  Frequency (%)
2  1487  18.1%
K  1343  16.3%
N  1340  16.3%
3  1292  15.7%
0  335  4.1%
A  255  3.1%
6  239  2.9%
4  210  2.6%
1  201  2.4%
B  194  2.4%
Other values (14)  1330  16.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  4151  50.5%
Uppercase Letter  4075  49.5%
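
Category counts of this kind can be approximated with Python's standard `unicodedata` module, which returns two-letter general-category codes (`Lu` for Uppercase Letter, `Nd` for Decimal Number). The sketch below runs only on the five sample rows shown in this report, not on the full column, and is not necessarily how the profiler computes its figures.

```python
import unicodedata
from collections import Counter

# The five sample values of 사업_과제번호 shown in the report.
sample = ["NK237C", "NK237C", "NK237C", "NK232C", "NK232C"]

# Count the Unicode general category of every character in the sample.
category_counts = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)
print(dict(category_counts))
```

Each six-character code contributes three uppercase letters and three digits, which is in line with the near-even 49.5% / 50.5% split reported for the full column.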

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
K  1343  33.0%
N  1340  32.9%
A  255  6.3%
B  194  4.8%
C  180  4.4%
F  178  4.4%
D  161  4.0%
E  139  3.4%
G  93  2.3%
I  62  1.5%
Other values (5)  130  3.2%

Decimal Number
Value  Count  Frequency (%)
2  1487  35.8%
3  1292  31.1%
0  335  8.1%
6  239  5.8%
4  210  5.1%
1  201  4.8%
7  188  4.5%
8  143  3.4%
9  56  1.3%

Most occurring scripts

Value  Count  Frequency (%)
Common  4151  50.5%
Latin  4075  49.5%

Most frequent character per script

Latin
Value  Count  Frequency (%)
K  1343  33.0%
N  1340  32.9%
A  255  6.3%
B  194  4.8%
C  180  4.4%
F  178  4.4%
D  161  4.0%
E  139  3.4%
G  93  2.3%
I  62  1.5%
Other values (5)  130  3.2%

Common
Value  Count  Frequency (%)
2  1487  35.8%
3  1292  31.1%
0  335  8.1%
6  239  5.8%
4  210  5.1%
1  201  4.8%
7  188  4.5%
8  143  3.4%
9  56  1.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  8226  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2  1487  18.1%
K  1343  16.3%
N  1340  16.3%
3  1292  15.7%
0  335  4.1%
A  255  3.1%
6  239  2.9%
4  210  2.6%
1  201  2.4%
B  194  2.4%
Other values (14)  1330  16.2%

참여자명
Text

Distinct: 114
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 4113
Distinct characters: 115
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.3%

Sample

1st row: *민*
2nd row: *한*
3rd row: *영*
4th row: *치*
5th row: *재*
ValueCountFrequency (%)
58
 
4.2%
55
 
4.0%
54
 
3.9%
51
 
3.7%
49
 
3.6%
49
 
3.6%
46
 
3.4%
45
 
3.3%
41
 
3.0%
39
 
2.8%
Other values (104) 886
64.5%

Most occurring characters

ValueCountFrequency (%)
* 2742
66.7%
58
 
1.4%
55
 
1.3%
54
 
1.3%
51
 
1.2%
49
 
1.2%
49
 
1.2%
46
 
1.1%
45
 
1.1%
41
 
1.0%
Other values (105) 923
 
22.4%

Most occurring categories

Value  Count  Frequency (%)
Other Punctuation  2742  66.7%
Other Letter  1369  33.3%
Space Separator  2  < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
58
 
4.2%
55
 
4.0%
54
 
3.9%
51
 
3.7%
49
 
3.6%
49
 
3.6%
46
 
3.4%
45
 
3.3%
41
 
3.0%
39
 
2.8%
Other values (103) 882
64.4%
Other Punctuation
ValueCountFrequency (%)
* 2742
100.0%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  2744  66.7%
Hangul  1369  33.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
58
 
4.2%
55
 
4.0%
54
 
3.9%
51
 
3.7%
49
 
3.6%
49
 
3.6%
46
 
3.4%
45
 
3.3%
41
 
3.0%
39
 
2.8%
Other values (103) 882
64.4%
Common
ValueCountFrequency (%)
* 2742
99.9%
2
 
0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2744  66.7%
Hangul  1369  33.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
* 2742
99.9%
2
 
0.1%
Hangul
ValueCountFrequency (%)
58
 
4.2%
55
 
4.0%
54
 
3.9%
51
 
3.7%
49
 
3.6%
49
 
3.6%
46
 
3.4%
45
 
3.3%
41
 
3.0%
39
 
2.8%
Other values (103) 882
64.4%

참여형태
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB
연구원: 1278
책임자: 93

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 책임자
2nd row: 연구원
3rd row: 연구원
4th row: 연구원
5th row: 연구원

Common Values

Value  Count  Frequency (%)
연구원  1278  93.2%
책임자  93  6.8%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure not recovered; the plotted counts are those listed in the Common Values table above.]

참여시작일
Date

Distinct: 48
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB
Minimum: 2021-01-01 00:00:00
Maximum: 2022-12-01 00:00:00
Histogram with fixed size bins (bins=48)

참여종료일
Date

Distinct: 44
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB
Minimum: 2021-01-27 00:00:00
Maximum: 2022-12-31 00:00:00
Histogram with fixed size bins (bins=44)

참여개월
Real number (ℝ)

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.126185
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 12
median: 12
Q3: 12
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.4278148
Coefficient of variation (CV): 0.21820729
Kurtosis: 6.8421714
Mean: 11.126185
Median Absolute Deviation (MAD): 0
Skewness: -2.8277215
Sum: 15254
Variance: 5.8942846
Monotonicity: Not monotonic
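
Several of the 참여개월 descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short consistency check on the reported figures (variable names are illustrative):

```python
# Figures copied from the report for 참여개월.
total = 15254          # Sum
n = 1371               # Number of observations
std = 2.4278148        # Standard deviation

mean = total / n       # arithmetic mean
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as squared standard deviation

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.5f}")
```

The derived values agree with the reported mean, CV, and variance to within rounding of the displayed digits.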
Histogram with fixed size bins (bins=12)

Common Values

Value  Count  Frequency (%)
12  1176  85.8%
2  27  2.0%
10  24  1.8%
4  23  1.7%
6  22  1.6%
9  22  1.6%
3  17  1.2%
7  17  1.2%
8  13  0.9%
5  13  0.9%
Other values (2)  17  1.2%

Minimum 10 values

Value  Count  Frequency (%)
1  10  0.7%
2  27  2.0%
3  17  1.2%
4  23  1.7%
5  13  0.9%
6  22  1.6%
7  17  1.2%
8  13  0.9%
9  22  1.6%
10  24  1.8%

Maximum 10 values

Value  Count  Frequency (%)
12  1176  85.8%
11  7  0.5%
10  24  1.8%
9  22  1.6%
8  13  0.9%
7  17  1.2%
6  22  1.6%
5  13  0.9%
4  23  1.7%
3  17  1.2%

참여일수
Real number (ℝ)

HIGH CORRELATION 

Distinct: 77
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 338.48505
Minimum: 17
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 17
5-th percentile: 122
Q1: 365
median: 365
Q3: 365
95-th percentile: 365
Maximum: 365
Range: 348
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 73.652728
Coefficient of variation (CV): 0.21759522
Kurtosis: 6.9100609
Mean: 338.48505
Median Absolute Deviation (MAD): 0
Skewness: -2.8349037
Sum: 464063
Variance: 5424.7244
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
365  1173  85.6%
59  23  1.7%
275  14  1.0%
306  13  0.9%
184  12  0.9%
122  9  0.7%
243  8  0.6%
107  6  0.4%
187  5  0.4%
91  5  0.4%
Other values (67)  103  7.5%

Minimum 10 values

Value  Count  Frequency (%)
17  1  0.1%
25  1  0.1%
27  1  0.1%
29  1  0.1%
31  2  0.1%
32  4  0.3%
53  1  0.1%
59  23  1.7%
73  2  0.1%
76  1  0.1%

Maximum 10 values

Value  Count  Frequency (%)
365  1173  85.6%
364  1  0.1%
362  2  0.1%
342  1  0.1%
334  2  0.1%
329  1  0.1%
321  3  0.2%
319  1  0.1%
314  1  0.1%
312  1  0.1%

참여율
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 522
Distinct (%): 38.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.827206
Minimum: 0
Maximum: 100
Zeros: 21
Zeros (%): 1.5%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.7
Q1: 22.35
median: 44.1
Q3: 52.05
95-th percentile: 99.35
Maximum: 100
Range: 100
Interquartile range (IQR): 29.7

Descriptive statistics

Standard deviation: 24.956924
Coefficient of variation (CV): 0.61128169
Kurtosis: 0.058112502
Mean: 40.827206
Median Absolute Deviation (MAD): 15.1
Skewness: 0.46911993
Sum: 55974.1
Variance: 622.84804
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
48.0  78  5.7%
100.0  66  4.8%
10.0  32  2.3%
24.0  29  2.1%
20.0  26  1.9%
0.0  21  1.5%
51.0  19  1.4%
19.6  13  0.9%
50.7  13  0.9%
2.0  10  0.7%
Other values (512)  1064  77.6%

Minimum 10 values

Value  Count  Frequency (%)
0.0  21  1.5%
0.5  2  0.1%
0.6  4  0.3%
0.7  1  0.1%
0.8  2  0.1%
0.9  3  0.2%
1.0  5  0.4%
1.1  6  0.4%
1.3  3  0.2%
1.4  9  0.7%

Maximum 10 values

Value  Count  Frequency (%)
100.0  66  4.8%
99.9  1  0.1%
99.7  1  0.1%
99.6  1  0.1%
99.1  2  0.1%
98.8  1  0.1%
98.1  1  0.1%
97.8  1  0.1%
96.9  1  0.1%
96.5  2  0.1%

참여연구원인건비
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1258
Distinct (%): 91.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29630409
Minimum: 57446
Maximum: 1.18655 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 57446
5-th percentile: 1659500
Q1: 11445828
median: 27892000
Q3: 44440000
95-th percentile: 63297152
Maximum: 1.18655 × 10^8
Range: 1.1859755 × 10^8
Interquartile range (IQR): 32994172
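
The range and interquartile range follow directly from the other quantile statistics: range is maximum minus minimum, and IQR is Q3 minus Q1. A quick check with the 참여연구원인건비 figures, taking the exact maximum (118655000) from the extreme values listed for this variable; variable names are illustrative:

```python
# Figures copied from the report for 참여연구원인건비.
minimum, maximum = 57446, 118655000
q1, q3 = 11445828, 44440000

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(value_range, iqr)
```

The range of 118597554 matches the reported 1.1859755 × 10^8 once rounded to the displayed precision.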

Descriptive statistics

Standard deviation: 20958774
Coefficient of variation (CV): 0.70734003
Kurtosis: 0.016081396
Mean: 29630409
Median Absolute Deviation (MAD): 16500344
Skewness: 0.57968134
Sum: 4.0623291 × 10^10
Variance: 4.3927023 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
2000000  19  1.4%
3000000  16  1.2%
1000000  12  0.9%
1400000  8  0.6%
6000000  6  0.4%
1300000  5  0.4%
600000  5  0.4%
1500000  5  0.4%
550000  4  0.3%
2500000  4  0.3%
Other values (1248)  1287  93.9%

Minimum 10 values

Value  Count  Frequency (%)
57446  1  0.1%
300000  1  0.1%
400000  1  0.1%
550000  4  0.3%
599206  1  0.1%
600000  5  0.4%
800000  1  0.1%
860000  1  0.1%
900000  3  0.2%
960000  1  0.1%

Maximum 10 values

Value  Count  Frequency (%)
118655000  1  0.1%
111390228  1  0.1%
100522000  1  0.1%
99535113  1  0.1%
97337000  1  0.1%
96135407  1  0.1%
93790625  1  0.1%
93590000  1  0.1%
92611267  1  0.1%
90490000  1  0.1%

작성일
Date

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.8 KiB
Minimum: 2023-07-28 00:00:00
Maximum: 2023-07-28 00:00:00
Histogram with fixed size bins (bins=1)

Interactions

[Figures: 16 interaction plots; images not recovered]

Correlations

Variable  사업_과제번호  참여형태  참여시작일  참여종료일  참여개월  참여일수  참여율  참여연구원인건비
사업_과제번호  1.000  0.182  0.662  0.478  0.585  0.589  0.780  0.513
참여형태  0.182  1.000  0.000  0.000  0.084  0.091  0.212  0.323
참여시작일  0.662  0.000  1.000  0.000  0.950  0.940  0.591  0.000
참여종료일  0.478  0.000  0.000  1.000  0.912  0.911  0.333  0.000
참여개월  0.585  0.084  0.950  0.912  1.000  0.995  0.445  0.379
참여일수  0.589  0.091  0.940  0.911  0.995  1.000  0.431  0.367
참여율  0.780  0.212  0.591  0.333  0.445  0.431  1.000  0.803
참여연구원인건비  0.513  0.323  0.000  0.000  0.379  0.367  0.803  1.000

Variable  참여개월  참여일수  참여율  참여연구원인건비  참여형태
참여개월  1.000  0.994  -0.111  0.363  0.064
참여일수  0.994  1.000  -0.111  0.361  0.069
참여율  -0.111  -0.111  1.000  0.640  0.178
참여연구원인건비  0.363  0.361  0.640  1.000  0.247
참여형태  0.064  0.069  0.178  0.247  1.000
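
To pick out strongly related pairs from a matrix like the second one above, one simple approach is to scan the upper triangle for coefficients whose absolute value exceeds a cutoff. The sketch below uses a hypothetical cutoff of 0.6 chosen for illustration. It is not the threshold the report used for its alerts, and `strong_pairs` is an illustrative helper.

```python
# Second correlation matrix from the report, row by row.
names = ["참여개월", "참여일수", "참여율", "참여연구원인건비", "참여형태"]
matrix = [
    [1.000, 0.994, -0.111, 0.363, 0.064],
    [0.994, 1.000, -0.111, 0.361, 0.069],
    [-0.111, -0.111, 1.000, 0.640, 0.178],
    [0.363, 0.361, 0.640, 1.000, 0.247],
    [0.064, 0.069, 0.178, 0.247, 1.000],
]

THRESHOLD = 0.6  # hypothetical cutoff, for illustration only

def strong_pairs(names, matrix, threshold):
    """Return (a, b, r) for each pair above the diagonal with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

pairs = strong_pairs(names, matrix, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

With this cutoff the scan returns 참여개월 with 참여일수 and 참여율 with 참여연구원인건비, the same two pairs flagged in the Alerts section.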

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row  사업_과제번호  참여자명  참여형태  참여시작일  참여종료일  참여개월  참여일수  참여율  참여연구원인건비  작성일
0  NK237C  *민*  책임자  2022-01-01  2022-12-31  12  365  43.5  41572220  2023-07-28
1  NK237C  *한*  연구원  2022-01-01  2022-12-31  12  365  35.2  38266465  2023-07-28
2  NK237C  *영*  연구원  2022-01-01  2022-12-31  12  365  44.7  52254718  2023-07-28
3  NK232C  *치*  연구원  2021-01-01  2021-12-31  12  365  48.0  56710000  2023-07-28
4  NK232C  *재*  연구원  2021-01-01  2021-12-31  12  365  47.7  51518000  2023-07-28
5  NK232C  *태*  연구원  2021-01-01  2021-12-31  12  365  48.0  46801000  2023-07-28
6  NK232C  *상*  연구원  2021-01-01  2021-12-31  12  365  48.0  42109000  2023-07-28
7  NK232C  *상*  연구원  2021-01-01  2021-12-31  12  365  48.0  35402000  2023-07-28
8  NK232C  *승*  연구원  2021-01-01  2021-12-31  12  365  49.6  37199540  2023-07-28
9  NK232C  *선*  연구원  2021-01-01  2021-12-31  12  365  48.0  29069000  2023-07-28

Last rows

Row  사업_과제번호  참여자명  참여형태  참여시작일  참여종료일  참여개월  참여일수  참여율  참여연구원인건비  작성일
1361  NK236G  *수*  연구원  2022-01-01  2022-12-31  12  365  22.0  17048197  2023-07-28
1362  NK236G  *준*  연구원  2022-09-14  2022-12-31  4  109  49.2  10230605  2023-07-28
1363  NK236G  *봉*  연구원  2022-01-01  2022-02-22  2  53  92.5  2979063  2023-07-28
1364  NK236G  *명*  연구원  2022-01-01  2022-12-31  12  365  18.2  4993163  2023-07-28
1365  NK236G  *종*  연구원  2022-01-01  2022-12-31  12  365  100.0  7870630  2023-07-28
1366  NK236C  *명*  연구원  2022-01-01  2022-12-31  12  365  100.0  19049125  2023-07-28
1367  NK236C  *구*  연구원  2022-01-01  2022-12-31  12  365  100.0  26700875  2023-07-28
1368  NK236C  *혁*  연구원  2022-01-01  2022-12-31  12  365  35.6  8000000  2023-07-28
1369  NK236C  *규*  연구원  2022-01-01  2022-12-31  12  365  56.1  17000000  2023-07-28
1370  NK236C  *재*  연구원  2022-09-01  2022-12-31  4  122  0.0  6250000  2023-07-28
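
In the sample rows, 참여일수 appears to equal the inclusive number of days between 참여시작일 and 참여종료일. This is an inference from the rows shown, not a documented definition, and `inclusive_days` is an illustrative helper:

```python
from datetime import date

def inclusive_days(start: str, end: str) -> int:
    """Count days from start to end (ISO dates), including both endpoints."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1

# Date pairs taken from the sample rows.
print(inclusive_days("2022-01-01", "2022-12-31"))
print(inclusive_days("2022-09-14", "2022-12-31"))
print(inclusive_days("2022-01-01", "2022-02-22"))
```

The same relationship helps explain why 참여개월 and 참여일수 are so strongly correlated: both are derived from the same participation period.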