Overview

Dataset statistics

Number of variables: 9
Number of observations: 174
Missing cells: 116
Missing cells (%): 7.4%
Duplicate rows: 1
Duplicate rows (%): 0.6%
Total size in memory: 13.4 KiB
Average record size in memory: 78.8 B
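
The percentages in this block follow from the raw counts. The sketch below recomputes them, assuming the 116 missing cells are exactly the two columns that the Alerts section reports as having 58 missing values each; the variable names are illustrative only.

```python
# Figures copied from the report's "Dataset statistics" and "Alerts" sections.
n_rows, n_cols = 174, 9
missing_per_column = {"고용기간_일": 58, "고용인원(명)": 58}
duplicate_rows = 1

total_cells = n_rows * n_cols
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"{missing_cells} missing of {total_cells} cells = {missing_pct:.1f}%")
print(f"{duplicate_rows} duplicate of {n_rows} rows = {duplicate_pct:.1f}%")
```

The categorical columns list `<NA>` as an ordinary value and report 0 missing, so they do not contribute to the missing-cell count here.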

Variable types

Text: 1
Numeric: 3
Categorical: 5

Dataset

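Description: Job-creation outcomes (e.g., employment type, employment period, number of people employed) of the "올해의 신작" (New Work of the Year) support program for the performing arts, one of the 2014-2019 Culture and Arts Promotion Fund (문예진흥기금) open-call programs
Author: 한국문화예술위원회 (Arts Council Korea)
URL: https://www.data.go.kr/data/15076409/fileData.do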

Alerts

Dataset has 1 (0.6%) duplicate rows (Duplicates)
고용유형 is highly overall correlated with 고용기간_주 and 2 other fields (High correlation)
고용기간_주 is highly overall correlated with 고용유형 (High correlation)
고용기간_월 is highly overall correlated with 고용유형 (High correlation)
고용대상 is highly overall correlated with 고용유형 (High correlation)
고용기간_시간 is highly imbalanced (53.0%) (Imbalance)
고용기간_일 has 58 (33.3%) missing values (Missing)
고용인원(명) has 58 (33.3%) missing values (Missing)
고용기간_일 has 95 (54.6%) zeros (Zeros)
고용인원(명) has 44 (25.3%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 20:10:09.736233
Analysis finished: 2023-12-12 20:10:11.967591
Duration: 2.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

공연단체명 (performing group name)
Text

Distinct: 106
Distinct (%): 60.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 870
Distinct characters: 120
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 80
Unique (%): 46.0%

Sample

1st row: *나**단
2nd row: *경**단
3rd row: *러**단
4th row: *울**터
5th row: *선**단

Value              Count  Frequency (%)
이**단             9      5.2%
단**험             7      4.0%
앤**스             7      4.0%
스**n              6      3.4%
로**인             6      3.4%
애**순             6      3.4%
빈**스             5      2.9%
은**단             5      2.9%
페**크             4      2.3%
블**티             3      1.7%
Other values (96)  116    66.7%

Most occurring characters

ValueCountFrequency (%)
* 522
60.0%
65
 
7.5%
22
 
2.5%
14
 
1.6%
8
 
0.9%
8
 
0.9%
7
 
0.8%
7
 
0.8%
7
 
0.8%
n 6
 
0.7%
Other values (110) 204
 
23.4%

Most occurring categories

Value              Count  Frequency (%)
Other Punctuation  523    60.1%
Other Letter       316    36.3%
Lowercase Letter   18     2.1%
Uppercase Letter   12     1.4%
Decimal Number     1      0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
65
 
20.6%
22
 
7.0%
14
 
4.4%
8
 
2.5%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
6
 
1.9%
6
 
1.9%
Other values (94) 166
52.5%
Lowercase Letter
Value  Count  Frequency (%)
n      6      33.3%
o      4      22.2%
y      4      22.2%
i      1      5.6%
r      1      5.6%
t      1      5.6%
a      1      5.6%
Uppercase Letter
Value  Count  Frequency (%)
C      5      41.7%
D      2      16.7%
S      2      16.7%
P      1      8.3%
R      1      8.3%
J      1      8.3%
Other Punctuation
Value  Count  Frequency (%)
*      522    99.8%
!      1      0.2%
Decimal Number
Value  Count  Frequency (%)
1      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  524    60.2%
Hangul  316    36.3%
Latin   30     3.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
65
 
20.6%
22
 
7.0%
14
 
4.4%
8
 
2.5%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
6
 
1.9%
6
 
1.9%
Other values (94) 166
52.5%
Latin
Value             Count  Frequency (%)
n                 6      20.0%
C                 5      16.7%
o                 4      13.3%
y                 4      13.3%
D                 2      6.7%
S                 2      6.7%
P                 1      3.3%
i                 1      3.3%
r                 1      3.3%
R                 1      3.3%
Other values (3)  3      10.0%
Common
Value  Count  Frequency (%)
*      522    99.6%
1      1      0.2%
!      1      0.2%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   554    63.7%
Hangul  316    36.3%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
*                 522    94.2%
n                 6      1.1%
C                 5      0.9%
o                 4      0.7%
y                 4      0.7%
D                 2      0.4%
S                 2      0.4%
1                 1      0.2%
P                 1      0.2%
i                 1      0.2%
Other values (6)  6      1.1%
Hangul
ValueCountFrequency (%)
65
 
20.6%
22
 
7.0%
14
 
4.4%
8
 
2.5%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
6
 
1.9%
6
 
1.9%
Other values (94) 166
52.5%

사업연도 (program year)
Real number (ℝ)

Distinct: 6
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.9368
Minimum: 2014
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 2014
5-th percentile: 2014
Q1: 2016
median: 2017
Q3: 2018
95-th percentile: 2019
Maximum: 2019
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.4904408
Coefficient of variation (CV): 0.00073896259
Kurtosis: -0.73470475
Mean: 2016.9368
Median Absolute Deviation (MAD): 1
Skewness: -0.50524711
Sum: 350947
Variance: 2.2214139
Monotonicity: Increasing
Histogram with fixed size bins (bins=6)

Value  Count  Frequency (%)
2018   52     29.9%
2017   41     23.6%
2019   23     13.2%
2016   22     12.6%
2015   21     12.1%
2014   15     8.6%

Value  Count  Frequency (%)
2014   15     8.6%
2015   21     12.1%
2016   22     12.6%
2017   41     23.6%
2018   52     29.9%
2019   23     13.2%

Value  Count  Frequency (%)
2019   23     13.2%
2018   52     29.9%
2017   41     23.6%
2016   22     12.6%
2015   21     12.1%
2014   15     8.6%
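
The descriptive statistics for 사업연도 can be cross-checked against its frequency table. The following is a minimal sketch using only Python's standard library. It assumes the report's variance and standard deviation use the sample (n - 1) denominator, which is what reproduces the printed figures.

```python
import statistics

# Value counts for 사업연도, copied from the frequency table in this section.
year_counts = {2014: 15, 2015: 21, 2016: 22, 2017: 41, 2018: 52, 2019: 23}

# Expand the counts back into one value per observation.
years = [year for year, count in year_counts.items() for _ in range(count)]

n = len(years)
total = sum(years)
mean = statistics.mean(years)
variance = statistics.variance(years)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(years)      # sample standard deviation
cv = std_dev / mean                    # coefficient of variation

print(n, total)
print(f"mean={mean:.4f} variance={variance:.7f} std={std_dev:.7f}")
```

The tiny coefficient of variation is expected: the years differ by at most 5 while the mean is about 2017, so the ratio says little about the actual spread.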

고용유형 (employment type)
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value         Count
<NA>          58
미분류        44
1개월~12개월  27
1일~1주       21
1주~1개월     17

Length

Max length: 8
Median length: 6
Mean length: 4.7241379
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                           Count  Frequency (%)
<NA>                            58     33.3%
미분류 (unclassified)           44     25.3%
1개월~12개월 (1 to 12 months)   27     15.5%
1일~1주 (1 day to 1 week)       21     12.1%
1주~1개월 (1 week to 1 month)   17     9.8%
1일 이내 (within 1 day)         7      4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
na            58     32.0%
미분류        44     24.3%
1개월~12개월  27     14.9%
1일~1주       21     11.6%
1주~1개월     17     9.4%
1일           7      3.9%
이내          7      3.9%

고용기간_시간 (employment period, hours)
Categorical

IMBALANCE

Distinct: 6
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value  Count
0      109
<NA>   58
9      2
8      2
4      2

Length

Max length: 4
Median length: 1
Mean length: 2
Min length: 1

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
0      109    62.6%
<NA>   58     33.3%
9      2      1.1%
8      2      1.1%
4      2      1.1%
7      1      0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      109    62.6%
na     58     33.3%
9      2      1.1%
8      2      1.1%
4      2      1.1%
7      1      0.6%
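
The 53.0% imbalance figure reported for 고용기간_시간 can be reproduced from the counts in this section. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for that number of classes. The function name `imbalance_score` and the single-class fallback are illustrative choices, not taken from the report.

```python
import math

# Class counts for 고용기간_시간, copied from the Common Values table.
hour_counts = {"0": 109, "<NA>": 58, "9": 2, "8": 2, "4": 2, "7": 1}


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    if len(probs) < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


score = imbalance_score(hour_counts)
print(f"{score:.1%}")
```

Under this definition the score is driven mainly by the two dominant classes (0 and `<NA>`), which together cover about 96% of the rows.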

고용기간_일 (employment period, days)
Real number (ℝ)

MISSING  ZEROS

Distinct: 6
Distinct (%): 5.2%
Missing: 58
Missing (%): 33.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.82758621
Minimum: 0
Maximum: 6
Zeros: 95
Zeros (%): 54.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5.25
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.8522647
Coefficient of variation (CV): 2.2381532
Kurtosis: 2.2399704
Mean: 0.82758621
Median Absolute Deviation (MAD): 0
Skewness: 1.9697096
Sum: 96
Variance: 3.4308846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Value      Count  Frequency (%)
0          95     54.6%
5          7      4.0%
6          6      3.4%
4          3      1.7%
3          3      1.7%
2          2      1.1%
(Missing)  58     33.3%

Value  Count  Frequency (%)
0      95     54.6%
2      2      1.1%
3      3      1.7%
4      3      1.7%
5      7      4.0%
6      6      3.4%

Value  Count  Frequency (%)
6      6      3.4%
5      7      4.0%
4      3      1.7%
3      3      1.7%
2      2      1.1%
0      95     54.6%
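
Two details of the 고용기간_일 summary are easy to misread: the non-integer 95-th percentile (5.25) and the different denominators behind the percentages. The sketch below rebuilds the 116 non-missing values from the frequency table. It assumes the percentile is linearly interpolated between ranks, which Python's `statistics.quantiles` does with `method="inclusive"`.

```python
import statistics

# Counts for the 116 non-missing 고용기간_일 values, from the frequency table.
day_counts = {0: 95, 2: 2, 3: 3, 4: 3, 5: 7, 6: 6}
n_rows = 174  # all observations, including the 58 missing ones

days = sorted(value for value, count in day_counts.items() for _ in range(count))

# 99 cut points; index 94 is the 95-th percentile (linear interpolation).
percentiles = statistics.quantiles(days, n=100, method="inclusive")
p95 = percentiles[94]

mean = sum(days) / len(days)                      # missing rows are excluded
zeros_pct = 100 * day_counts[0] / n_rows          # relative to all rows
distinct_pct = 100 * len(day_counts) / len(days)  # relative to non-missing rows

print(p95, round(mean, 8), round(zeros_pct, 1), round(distinct_pct, 1))
```

The 95-th percentile falls a quarter of the way between the last 5 and the first 6 in sorted order, hence 5.25. Zeros (%) is computed over all 174 rows, whereas Distinct (%) is computed over the 116 non-missing rows.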

고용기간_주 (employment period, weeks)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value  Count
0      99
<NA>   58
2      8
3      7
4      2

Length

Max length: 4
Median length: 1
Mean length: 2
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
0      99     56.9%
<NA>   58     33.3%
2      8      4.6%
3      7      4.0%
4      2      1.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      99     56.9%
na     58     33.3%
2      8      4.6%
3      7      4.0%
4      2      1.1%

고용기간_월 (employment period, months)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value  Count
0      89
<NA>   58
3      17
2      8
4      2

Length

Max length: 4
Median length: 1
Mean length: 2
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
0      89     51.1%
<NA>   58     33.3%
3      17     9.8%
2      8      4.6%
4      2      1.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      89     51.1%
na     58     33.3%
3      17     9.8%
2      8      4.6%
4      2      1.1%

고용인원(명) (number of people employed)
Real number (ℝ)

MISSING  ZEROS

Distinct: 22
Distinct (%): 19.0%
Missing: 58
Missing (%): 33.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.2068966
Minimum: 0
Maximum: 30
Zeros: 44
Zeros (%): 25.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 5
95-th percentile: 18.5
Maximum: 30
Range: 30
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 6.6679209
Coefficient of variation (CV): 1.5849976
Kurtosis: 3.4613659
Mean: 4.2068966
Median Absolute Deviation (MAD): 1
Skewness: 2.0006082
Sum: 488
Variance: 44.461169
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)

Value              Count  Frequency (%)
0                  44     25.3%
1                  21     12.1%
2                  9      5.2%
4                  6      3.4%
3                  6      3.4%
5                  5      2.9%
10                 3      1.7%
15                 2      1.1%
20                 2      1.1%
25                 2      1.1%
Other values (12)  16     9.2%
(Missing)          58     33.3%

Value  Count  Frequency (%)
0      44     25.3%
1      21     12.1%
2      9      5.2%
3      6      3.4%
4      6      3.4%
5      5      2.9%
6      2      1.1%
8      1      0.6%
9      1      0.6%
10     3      1.7%

Value  Count  Frequency (%)
30     1      0.6%
27     1      0.6%
25     2      1.1%
20     2      1.1%
18     1      0.6%
17     2      1.1%
16     2      1.1%
15     2      1.1%
14     1      0.6%
13     1      0.6%

고용대상 (employment target group)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Value   Count
<NA>    58
미분류  44
일반    40
청년    32

Length

Max length: 4
Median length: 3
Mean length: 2.9195402
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value                  Count  Frequency (%)
<NA>                   58     33.3%
미분류 (unclassified)  44     25.3%
일반 (general)         40     23.0%
청년 (youth)           32     18.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
na      58     33.3%
미분류  44     25.3%
일반    40     23.0%
청년    32     18.4%

Interactions


Correlations

               사업연도  고용유형  고용기간_시간  고용기간_일  고용기간_주  고용기간_월  고용인원(명)  고용대상
사업연도       1.000     0.388     0.125          0.000        0.212        0.164        0.363         0.672
고용유형       0.388     1.000     0.840          0.605        0.628        0.628        0.496         0.723
고용기간_시간  0.125     0.840     1.000          0.000        0.000        0.000        0.000         0.150
고용기간_일    0.000     0.605     0.000          1.000        0.000        0.000        0.346         0.482
고용기간_주    0.212     0.628     0.000          0.000        1.000        0.000        0.066         0.199
고용기간_월    0.164     0.628     0.000          0.000        0.000        1.000        0.385         0.278
고용인원(명)   0.363     0.496     0.000          0.346        0.066        0.385        1.000         0.698
고용대상       0.672     0.723     0.150          0.482        0.199        0.278        0.698         1.000

               고용대상  고용기간_월  고용기간_주  고용기간_시간  고용유형
고용대상       1.000     0.264        0.187        0.111          0.707
고용기간_월    0.264     1.000        0.000        0.000          0.554
고용기간_주    0.187     0.000        1.000        0.000          0.554
고용기간_시간  0.111     0.000        0.000        1.000          0.472
고용유형       0.707     0.554        0.554        0.472          1.000

               사업연도  고용기간_일  고용인원(명)  고용유형  고용기간_시간  고용기간_주  고용기간_월  고용대상
사업연도       1.000     -0.075       -0.163        0.315     0.092          0.201        0.154        0.330
고용기간_일    -0.075    1.000        0.329         0.463     0.000          0.000        0.000        0.223
고용인원(명)   -0.163    0.329        1.000         0.308     0.000          0.000        0.325        0.331
고용유형       0.315     0.463        0.308         1.000     0.472          0.554        0.554        0.707
고용기간_시간  0.092     0.000        0.000         0.472     1.000          0.000        0.000        0.111
고용기간_주    0.201     0.000        0.000         0.554     0.000          1.000        0.000        0.187
고용기간_월    0.154     0.000        0.325         0.554     0.000          0.000        1.000        0.264
고용대상       0.330     0.223        0.331         0.707     0.111          0.187        0.264        1.000
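
The four "High correlation" alerts can be traced back to the last matrix in this section. The sketch below flags every off-diagonal pair whose absolute coefficient exceeds 0.5. That cutoff is an assumption; it happens to reproduce exactly the alerts listed in the report. The names `THRESHOLD` and `alerts` are illustrative.

```python
# Last correlation matrix of this section, copied row by row.
names = ["사업연도", "고용기간_일", "고용인원(명)", "고용유형",
         "고용기간_시간", "고용기간_주", "고용기간_월", "고용대상"]
matrix = [
    [1.000, -0.075, -0.163, 0.315, 0.092, 0.201, 0.154, 0.330],
    [-0.075, 1.000, 0.329, 0.463, 0.000, 0.000, 0.000, 0.223],
    [-0.163, 0.329, 1.000, 0.308, 0.000, 0.000, 0.325, 0.331],
    [0.315, 0.463, 0.308, 1.000, 0.472, 0.554, 0.554, 0.707],
    [0.092, 0.000, 0.000, 0.472, 1.000, 0.000, 0.000, 0.111],
    [0.201, 0.000, 0.000, 0.554, 0.000, 1.000, 0.000, 0.187],
    [0.154, 0.000, 0.325, 0.554, 0.000, 0.000, 1.000, 0.264],
    [0.330, 0.223, 0.331, 0.707, 0.111, 0.187, 0.264, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff for a "high" correlation

# Map each variable to the other variables it is highly correlated with.
alerts = {}
for i, name in enumerate(names):
    partners = [names[j] for j, value in enumerate(matrix[i])
                if j != i and abs(value) > THRESHOLD]
    if partners:
        alerts[name] = partners

for name, partners in alerts.items():
    print(name, "->", ", ".join(partners))
```

고용유형 is the only variable with more than one partner (고용기간_주, 고용기간_월, 고용대상), which matches the alert wording "고용기간_주 and 2 other fields".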

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

     공연단체명  사업연도  고용유형  고용기간_시간  고용기간_일  고용기간_주  고용기간_월  고용인원(명)  고용대상
0    *나**단     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
1    *경**단     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
2    *러**단     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
3    *울**터     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
4    *선**단     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
5    *스**창     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
6    *경**영     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
7    *o**r       2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
8    *이**쳐     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>
9    *능**부     2014      <NA>      <NA>           <NA>         <NA>         <NA>         <NA>          <NA>

     공연단체명  사업연도  고용유형  고용기간_시간  고용기간_일  고용기간_주  고용기간_월  고용인원(명)  고용대상
164  *정**옥     2019      미분류    0              0            0            0            0             미분류
165  *단**수     2019      미분류    0              0            0            0            0             미분류
166  *단**고     2019      미분류    0              0            0            0            0             미분류
167  *금**스     2019      미분류    0              0            0            0            0             미분류
168  *처**인     2019      미분류    0              0            0            0            0             미분류
169  *단**리     2019      미분류    0              0            0            0            0             미분류
170  *이**트     2019      미분류    0              0            0            0            0             미분류
171  *컴**니     2019      미분류    0              0            0            0            0             미분류
172  *단**희     2019      미분류    0              0            0            0            0             미분류
173  *단**자     2019      미분류    0              0            0            0            0             미분류

Duplicate rows

Most frequently occurring

   공연단체명  사업연도  고용유형  고용기간_시간  고용기간_일  고용기간_주  고용기간_월  고용인원(명)  고용대상  # duplicates
0  *이**단     2017      1일~1주   0              4            0            0            1             일반      2