Overview

Dataset statistics

Number of variables: 9
Number of observations: 573
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.1 KiB
Average record size in memory: 75.2 B
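
The average record size follows from the total memory size and the number of observations. The sketch below redoes that arithmetic; it assumes 1 KiB = 1024 bytes, and the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
total_kib = 42.1   # total size in memory, in KiB
n_rows = 573       # number of observations

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_rows

print(f"{avg_record_bytes:.1f} B per record")  # prints "75.2 B per record"
```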

Variable types

Categorical: 2
Text: 1
DateTime: 4
Numeric: 2

Dataset

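Description: Table from the research management domain of the Korea Institute of Machinery and Materials (한국기계연구원) that manages the monthly participation rates of researchers listed in project/task plans. It tracks the task number, participant, participation month, participation start date, participation end date, participation rate, and related fields.
URL: https://www.data.go.kr/data/15078048/fileData.do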

Alerts

참여연구원참여적용년월일 has constant value "" (Constant)
작성일 has constant value "" (Constant)
참여율 is highly overall correlated with 참여연구원인건비 (High correlation)
참여연구원인건비 is highly overall correlated with 참여율 (High correlation)
참여일수 is highly imbalanced (97.7%) (Imbalance)
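
The report does not state how the imbalance percentage is computed. A minimal sketch, assuming the score is one minus the Shannon entropy normalized by its maximum for the number of classes: with the value counts reported for 참여일수 (571, 1, and 1), this assumption gives about 0.977, in line with the 97.7% in the alert. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Assumed formula: 1 - H / log2(k), where H is the Shannon entropy (base 2)
    of the class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch; undefined for one class
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts reported for 참여일수: 31 -> 571, 8 -> 1, 25 -> 1
days_counts = [571, 1, 1]
score = imbalance_score(days_counts)
print(f"{score:.1%}")
```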

Reproduction

Analysis started: 2023-12-12 06:40:15.581021
Analysis finished: 2023-12-12 06:40:16.720993
Duration: 1.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
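
The duration can be checked against the two timestamps. This is a small sketch using only the standard library; it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2023-12-12 06:40:15.581021")
finished = datetime.fromisoformat("2023-12-12 06:40:16.720993")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} s")  # prints "1.14 s"
```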

Variables

사업_과제번호 (project/task number)
Categorical

Distinct: 48
Distinct (%): 8.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB

Value | Count
NK236I | 34
NK240A | 31
NK237B | 29
NK237A | 26
NK236C | 25
Other values (43) | 428

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NK240K
2nd row: NK240B
3rd row: NK240B
4th row: NK240K
5th row: NK240B

Common Values

Value | Count | Frequency (%)
NK236I | 34 | 5.9%
NK240A | 31 | 5.4%
NK237B | 29 | 5.1%
NK237A | 26 | 4.5%
NK236C | 25 | 4.4%
NK237G | 24 | 4.2%
NK238B | 23 | 4.0%
NK238F | 22 | 3.8%
NK237C | 21 | 3.7%
NK236F | 21 | 3.7%
Other values (38) | 317 | 55.3%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
nk236i | 34 | 5.9%
nk240a | 31 | 5.4%
nk237b | 29 | 5.1%
nk237a | 26 | 4.5%
nk236c | 25 | 4.4%
nk237g | 24 | 4.2%
nk238b | 23 | 4.0%
nk238f | 22 | 3.8%
nk237c | 21 | 3.7%
nk236f | 21 | 3.7%
Other values (38) | 317 | 55.3%

참여자명 (participant name)
Text

Distinct: 92
Distinct (%): 16.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1719
Distinct characters: 93
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
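
As a rough illustration of this kind of character-property analysis, the sketch below counts Unicode general categories for the five sample values listed for this variable. It uses Python's standard `unicodedata` module, which is an assumption about tooling; the report does not say how its categories are computed. In the Unicode general category codes, `Po` is "Other Punctuation" and `Lo` is "Other Letter".

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable (masked names).
names = ["*정*", "*종*", "*경*", "*동*", "*동*"]

# Count characters by Unicode general category.
chars = "".join(names)
by_category = Counter(unicodedata.category(ch) for ch in chars)

for code, count in by_category.most_common():
    print(code, count, f"{count / len(chars):.1%}")
```

Each masked name contributes two asterisks and one Hangul syllable, so punctuation makes up two thirds of the characters, the same split the report shows for the full column.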

Unique

Unique: 23
Unique (%): 4.0%

Sample

1st row: *정*
2nd row: *종*
3rd row: *경*
4th row: *동*
5th row: *동*

Value | Count | Frequency (%)
[?] | 26 | 4.5%
[?] | 25 | 4.4%
[?] | 23 | 4.0%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 20 | 3.5%
[?] | 20 | 3.5%
[?] | 17 | 3.0%
[?] | 15 | 2.6%
Other values (82) | 361 | 63.0%

Most occurring characters

Value | Count | Frequency (%)
* | 1146 | 66.7%
[?] | 26 | 1.5%
[?] | 25 | 1.5%
[?] | 23 | 1.3%
[?] | 22 | 1.3%
[?] | 22 | 1.3%
[?] | 22 | 1.3%
[?] | 20 | 1.2%
[?] | 20 | 1.2%
[?] | 17 | 1.0%
Other values (83) | 376 | 21.9%

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 1146 | 66.7%
Other Letter | 573 | 33.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 26 | 4.5%
[?] | 25 | 4.4%
[?] | 23 | 4.0%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 20 | 3.5%
[?] | 20 | 3.5%
[?] | 17 | 3.0%
[?] | 15 | 2.6%
Other values (82) | 361 | 63.0%

Other Punctuation
Value | Count | Frequency (%)
* | 1146 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1146 | 66.7%
Hangul | 573 | 33.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 26 | 4.5%
[?] | 25 | 4.4%
[?] | 23 | 4.0%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 20 | 3.5%
[?] | 20 | 3.5%
[?] | 17 | 3.0%
[?] | 15 | 2.6%
Other values (82) | 361 | 63.0%

Common
Value | Count | Frequency (%)
* | 1146 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1146 | 66.7%
Hangul | 573 | 33.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
* | 1146 | 100.0%

Hangul
Value | Count | Frequency (%)
[?] | 26 | 4.5%
[?] | 25 | 4.4%
[?] | 23 | 4.0%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 22 | 3.8%
[?] | 20 | 3.5%
[?] | 20 | 3.5%
[?] | 17 | 3.0%
[?] | 15 | 2.6%
Other values (82) | 361 | 63.0%

참여연구원참여적용년월일 (participation application date)
Date

CONSTANT

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB
Minimum: 2022-01-28 00:00:00
Maximum: 2022-01-28 00:00:00
Histogram with fixed size bins (bins=1)

참여시작일 (participation start date)
Date

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB
Minimum: 2022-01-01 00:00:00
Maximum: 2022-01-24 00:00:00
Histogram with fixed size bins (bins=2)

참여종료일 (participation end date)
Date

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB
Minimum: 2022-01-25 00:00:00
Maximum: 2022-01-31 00:00:00
Histogram with fixed size bins (bins=2)

참여일수 (participation days)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB

Value | Count
31 | 571
8 | 1
25 | 1

Length

Max length: 2
Median length: 2
Mean length: 1.9982548
Min length: 1
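
The mean length is a count-weighted average of the string lengths of the category values. A short sketch, treating the values as strings as the report does for categorical variables, using the counts reported for this variable (31: 571, 8: 1, 25: 1):

```python
# Value counts for 참여일수, with values kept as strings.
value_counts = {"31": 571, "8": 1, "25": 1}

total = sum(value_counts.values())
# Weight each value's string length by how often it occurs.
mean_length = sum(len(value) * count for value, count in value_counts.items()) / total

print(round(mean_length, 7))  # prints 1.9982548
```

Only the single value "8" has length 1, so the mean is 1145 / 573, just under 2.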

Unique

Unique: 2
Unique (%): 0.3%

Sample

1st row: 31
2nd row: 31
3rd row: 31
4th row: 31
5th row: 31

Common Values

Value | Count | Frequency (%)
31 | 571 | 99.7%
8 | 1 | 0.2%
25 | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
31 | 571 | 99.7%
8 | 1 | 0.2%
25 | 1 | 0.2%

참여율 (participation rate)
Real number (ℝ)

HIGH CORRELATION

Distinct: 264
Distinct (%): 46.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.639616
Minimum: 0.5
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 KiB

Quantile statistics

Minimum: 0.5
5-th percentile: 1.66
Q1: 20.9
Median: 49.4
Q3: 52.6
95-th percentile: 63.5
Maximum: 100
Range: 99.5
Interquartile range (IQR): 31.7

Descriptive statistics

Standard deviation: 21.61813
Coefficient of variation (CV): 0.54536679
Kurtosis: -0.20571018
Mean: 39.639616
Median Absolute Deviation (MAD): 8.4
Skewness: -0.18699418
Sum: 22713.5
Variance: 467.34355
Monotonicity: Not monotonic
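
Several of these figures are derived from the others, so they can be cross-checked. The sketch below recomputes the sum, variance, coefficient of variation, IQR, and range for 참여율 from the reported mean, standard deviation, quartiles, and extremes. It uses the standard definitions of these quantities, which is an assumption about how the report computes them; small differences come from rounding in the printed inputs.

```python
# Reported statistics for 참여율 (participation rate).
n = 573
mean, std = 39.639616, 21.61813
q1, q3 = 20.9, 52.6
minimum, maximum = 0.5, 100.0

# Standard definitions, assumed to match the report.
derived = {
    "sum": mean * n,           # reported: 22713.5
    "variance": std ** 2,      # reported: 467.34355
    "cv": std / mean,          # reported: 0.54536679
    "iqr": q3 - q1,            # reported: 31.7
    "range": maximum - minimum # reported: 99.5
}

for name, value in derived.items():
    print(f"{name}: {value:.5f}")
```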
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
10.0 | 13 | 2.3%
51.8 | 11 | 1.9%
20.1 | 9 | 1.6%
52.6 | 9 | 1.6%
52.0 | 8 | 1.4%
100.0 | 8 | 1.4%
52.2 | 8 | 1.4%
50.4 | 7 | 1.2%
52.3 | 7 | 1.2%
51.7 | 7 | 1.2%
Other values (254) | 486 | 84.8%

Minimum 10 values

Value | Count | Frequency (%)
0.5 | 1 | 0.2%
0.6 | 3 | 0.5%
0.9 | 3 | 0.5%
1.0 | 2 | 0.3%
1.1 | 6 | 1.0%
1.3 | 2 | 0.3%
1.4 | 6 | 1.0%
1.5 | 1 | 0.2%
1.6 | 5 | 0.9%
1.7 | 3 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
100.0 | 8 | 1.4%
99.0 | 1 | 0.2%
98.3 | 1 | 0.2%
85.5 | 1 | 0.2%
82.4 | 1 | 0.2%
82.0 | 1 | 0.2%
81.5 | 1 | 0.2%
81.4 | 1 | 0.2%
80.3 | 1 | 0.2%
80.1 | 1 | 0.2%

참여연구원인건비 (participating researcher labor cost)
Real number (ℝ)

HIGH CORRELATION

Distinct: 537
Distinct (%): 93.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3020252.4
Minimum: 46712
Maximum: 10077548
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 KiB

Quantile statistics

Minimum: 46712
5-th percentile: 132085.4
Q1: 1692855
Median: 3260945
Q3: 4260504
95-th percentile: 5352281.4
Maximum: 10077548
Range: 10030836
Interquartile range (IQR): 2567649

Descriptive statistics

Standard deviation: 1743239
Coefficient of variation (CV): 0.57718321
Kurtosis: -0.18733202
Mean: 3020252.4
Median Absolute Deviation (MAD): 1267178
Skewness: 0.031285158
Sum: 1.7306046 × 10^9
Variance: 3.0388821 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
169863 | 15 | 2.6%
84932 | 6 | 1.0%
46712 | 4 | 0.7%
254795 | 3 | 0.5%
76438 | 3 | 0.5%
3212449 | 3 | 0.5%
3903877 | 2 | 0.3%
622973 | 2 | 0.3%
1167044 | 2 | 0.3%
2509896 | 2 | 0.3%
Other values (527) | 531 | 92.7%

Minimum 10 values

Value | Count | Frequency (%)
46712 | 4 | 0.7%
67945 | 1 | 0.2%
73041 | 1 | 0.2%
76438 | 3 | 0.5%
81534 | 1 | 0.2%
84932 | 6 | 1.0%
92151 | 1 | 0.2%
93425 | 1 | 0.2%
103871 | 1 | 0.2%
104466 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
10077548 | 1 | 0.2%
8070022 | 1 | 0.2%
7978466 | 1 | 0.2%
7685452 | 1 | 0.2%
7596189 | 1 | 0.2%
7571219 | 1 | 0.2%
7549732 | 1 | 0.2%
7213742 | 1 | 0.2%
7125074 | 1 | 0.2%
6887436 | 1 | 0.2%

작성일 (creation date)
Date

CONSTANT

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.6 KiB
Minimum: 2023-07-28 00:00:00
Maximum: 2023-07-28 00:00:00
Histogram with fixed size bins (bins=1)

Interactions


Correlations

  | 사업_과제번호 | 참여자명 | 참여시작일 | 참여종료일 | 참여일수 | 참여율 | 참여연구원인건비
사업_과제번호 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.807 | 0.709
참여자명 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.476 | 0.486
참여시작일 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.139 | 0.000
참여종료일 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000
참여일수 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
참여율 | 0.807 | 0.476 | 0.139 | 0.000 | 0.000 | 1.000 | 0.837
참여연구원인건비 | 0.709 | 0.486 | 0.000 | 0.000 | 0.000 | 0.837 | 1.000

  | 참여일수 | 사업_과제번호
참여일수 | 1.000 | 0.000
사업_과제번호 | 0.000 | 1.000

  | 참여율 | 참여연구원인건비 | 사업_과제번호 | 참여일수
참여율 | 1.000 | 0.843 | 0.414 | 0.000
참여연구원인건비 | 0.843 | 1.000 | 0.330 | 0.000
사업_과제번호 | 0.414 | 0.330 | 1.000 | 0.000
참여일수 | 0.000 | 0.000 | 0.000 | 1.000
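
The report does not name the coefficient behind each matrix. As an illustration only, the sketch below computes a plain Pearson correlation between 참여율 and 참여연구원인건비 on the first ten rows of the Sample section. Pearson is swapped in here as a familiar measure, and ten rows are not the full dataset, so the result is not expected to reproduce the 0.837 or 0.843 shown above. The helper `pearson` is hypothetical.

```python
import math

# First ten sample rows: 참여율 (rate) and 참여연구원인건비 (labor cost).
rate = [0.6, 55.5, 31.2, 0.5, 31.9, 21.6, 35.5, 52.2, 51.4, 30.2]
cost = [46712, 5060134, 2305721, 46712, 2497751,
        1683767, 3011162, 5299301, 5033975, 2819726]

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson(rate, cost)
print(f"Pearson r on 10 sample rows: {r:.3f}")
```

The strong positive value on this small sample points the same way as the High correlation alert for these two variables.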

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

  | 사업_과제번호 | 참여자명 | 참여연구원참여적용년월일 | 참여시작일 | 참여종료일 | 참여일수 | 참여율 | 참여연구원인건비 | 작성일
0 | NK240K | *정* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 0.6 | 46712 | 2023-07-28
1 | NK240B | *종* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 55.5 | 5060134 | 2023-07-28
2 | NK240B | *경* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 31.2 | 2305721 | 2023-07-28
3 | NK240K | *동* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 0.5 | 46712 | 2023-07-28
4 | NK240B | *동* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 31.9 | 2497751 | 2023-07-28
5 | NK237B | *명* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 21.6 | 1683767 | 2023-07-28
6 | NK237B | *유* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 35.5 | 3011162 | 2023-07-28
7 | NK237B | *용* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 52.2 | 5299301 | 2023-07-28
8 | NK237B | *필* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 51.4 | 5033975 | 2023-07-28
9 | NK239A | *준* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 30.2 | 2819726 | 2023-07-28

Last rows

  | 사업_과제번호 | 참여자명 | 참여연구원참여적용년월일 | 참여시작일 | 참여종료일 | 참여일수 | 참여율 | 참여연구원인건비 | 작성일
563 | NK240A | *영* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 34.7 | 2746430 | 2023-07-28
564 | NK240A | *현* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 53.6 | 3956874 | 2023-07-28
565 | NK240A | *정* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 34.4 | 2845460 | 2023-07-28
566 | NK240C | *형* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.4 | 131644 | 2023-07-28
567 | NK240C | *재* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.6 | 144553 | 2023-07-28
568 | NK240C | *기* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.4 | 140901 | 2023-07-28
569 | NK240C | *기* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.4 | 131729 | 2023-07-28
570 | NK240C | *순* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.3 | 104466 | 2023-07-28
571 | NK240C | *준* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.4 | 103871 | 2023-07-28
572 | NK240C | *학* | 2022-01-28 | 2022-01-01 | 2022-01-31 | 31 | 1.4 | 92151 | 2023-07-28