Overview

Dataset statistics

Number of variables: 6
Number of observations: 1531
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 237
Duplicate rows (%): 15.5%
Total size in memory: 76.4 KiB
Average record size in memory: 51.1 B
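The derived percentage and the per-record size come straight from the raw counts by division. Below is a minimal sketch of that arithmetic. It assumes 1 KiB = 1024 bytes. Because the 76.4 KiB total is already rounded, the recomputed record size is only approximate. The helper `pct` is a hypothetical name used only here.

```python
# Reproduce the derived overview figures from the raw counts in the report.
N_OBS = 1531          # number of observations
N_DUPLICATES = 237    # duplicate rows reported above
TOTAL_KIB = 76.4      # total size in memory, already rounded to 0.1 KiB


def pct(part: int, whole: int) -> float:
    """Hypothetical helper: percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


duplicate_pct = pct(N_DUPLICATES, N_OBS)
# 1 KiB = 1024 bytes; divide by the row count for the per-record average.
avg_record_bytes = round(TOTAL_KIB * 1024 / N_OBS, 1)

print(f"Duplicate rows (%): {duplicate_pct}%")
print(f"Average record size in memory: {avg_record_bytes} B")
```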

Variable types

Categorical: 2
Numeric: 3
Text: 1

Dataset

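Description: Artworks produced under the literature-field "집필공간운영" (writing space operation) support program of the 2014-2019 Arts Promotion Fund (문예진흥기금) open-call grants, with fields such as production field, residency year, and number of works produced.
Author: 한국문화예술위원회 (Arts Council Korea)
URL: https://www.data.go.kr/data/15076477/fileData.do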

Alerts

Dataset has 237 (15.5%) duplicate rows [Duplicates]
사업연도 (project year) is highly overall correlated with 입주년 (residency year) [High correlation]
입주년 (residency year) is highly overall correlated with 사업연도 (project year) [High correlation]

Reproduction

Analysis started: 2023-12-12 05:19:51.961664
Analysis finished: 2023-12-12 05:19:53.879562
Duration: 1.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

문학단체명 (name of the literary organization; names are masked with *)
Categorical

Distinct: 7
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 KiB
*지**단: 629
*을**집: 399
*버**집: 229
*1**학: 107
*날**날: 86
Other values (2): 81
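The summary above lists the five most frequent values and folds the remaining distinct values into one "Other values (k)" row. The sketch below shows that aggregation with `collections.Counter`. It uses only the counts printed in this report. The function name `top_n_with_other` is hypothetical and is not part of the ydata-profiling API.

```python
from collections import Counter

# Value counts of 문학단체명 as listed in this report (names are masked at the source).
counts = Counter({
    "*지**단": 629, "*을**집": 399, "*버**집": 229, "*1**학": 107,
    "*날**날": 86, "*악**원": 77, "*산**꽃": 4,
})


def top_n_with_other(counter: Counter, n: int):
    """Return the n most common (value, count) pairs, plus an
    ("Other values (k)", total) row that aggregates everything else."""
    ranked = counter.most_common()
    rows = list(ranked[:n])
    rest = ranked[n:]
    if rest:
        rows.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    return rows


for value, count in top_n_with_other(counts, 5):
    print(value, count)
```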

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: *악**원
2nd row: *악**원
3rd row: *악**원
4th row: *악**원
5th row: *악**원

Common Values

Value | Count | Frequency (%)
*지**단 | 629 | 41.1%
*을**집 | 399 | 26.1%
*버**집 | 229 | 15.0%
*1**학 | 107 | 7.0%
*날**날 | 86 | 5.6%
*악**원 | 77 | 5.0%
*산**꽃 | 4 | 0.3%
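Each Frequency (%) value is the count divided by the 1531 observations. "Distinct (%)" applies the same division to the number of distinct values. The short sketch below recomputes both from the counts in the table.

```python
# Frequency (%) column for 문학단체명: each count divided by the 1531 observations.
N_OBS = 1531
counts = {
    "*지**단": 629, "*을**집": 399, "*버**집": 229, "*1**학": 107,
    "*날**날": 86, "*악**원": 77, "*산**꽃": 4,
}

# The counts cover every row, so together they account for all observations.
assert sum(counts.values()) == N_OBS

frequencies = {value: f"{100 * c / N_OBS:.1f}%" for value, c in counts.items()}
distinct_pct = f"{100 * len(counts) / N_OBS:.1f}%"  # the "Distinct (%)" row

for value, freq in frequencies.items():
    print(value, counts[value], freq)
print("Distinct (%):", distinct_pct)
```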

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values of 문학단체명 and their frequencies]

사업연도 (project year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2017.1509
Minimum: 2014
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.6 KiB

Quantile statistics

Minimum: 2014
5th percentile: 2014
Q1: 2016
Median: 2018
Q3: 2019
95th percentile: 2019
Maximum: 2019
Range: 5
Interquartile range (IQR): 3
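The quartiles above match linear interpolation between order statistics at position (n - 1) * q, which is the default method in numpy and pandas. That ydata-profiling uses exactly this convention is an assumption here. The sketch below rebuilds 사업연도 from the value counts reported for this variable and recomputes the quantiles.

```python
import math

# Rebuild the 사업연도 column from its value counts (1531 rows in total).
value_counts = {2014: 170, 2015: 158, 2016: 163, 2017: 258, 2018: 344, 2019: 438}
data = sorted(v for v, c in value_counts.items() for _ in range(c))


def quantile(sorted_values, q):
    """Linear-interpolation quantile at position (n - 1) * q
    (assumed to match the numpy/pandas default)."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


q1, median, q3 = (quantile(data, q) for q in (0.25, 0.5, 0.75))
print("5th percentile:", quantile(data, 0.05))
print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", q3 - q1)
print("95th percentile:", quantile(data, 0.95))
```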

Descriptive statistics

Standard deviation: 1.6931516
Coefficient of variation (CV): 0.00083937776
Kurtosis: -0.95208686
Mean: 2017.1509
Median Absolute Deviation (MAD): 1
Skewness: -0.56084543
Sum: 3088258
Variance: 2.8667623
Monotonicity: Increasing
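These descriptive statistics can be recomputed from the value counts reported for 사업연도. The sketch below uses only the standard library. It makes two assumptions. First, the variance is the sample variance (divisor n - 1). Second, the skewness is the bias-adjusted Fisher-Pearson coefficient, which is the convention pandas uses. With these assumptions, the results agree with the figures above to the printed precision.

```python
import math
import statistics

# Rebuild the 사업연도 column from its value counts (1531 rows in total).
value_counts = {2014: 170, 2015: 158, 2016: 163, 2017: 258, 2018: 344, 2019: 438}
data = [year for year, count in value_counts.items() for _ in range(count)]

n = len(data)
total = sum(data)
mean = statistics.fmean(data)
variance = statistics.variance(data)  # sample variance, divisor n - 1
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

# Bias-adjusted sample skewness (adjusted Fisher-Pearson G1).
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
skewness = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

print(f"Sum {total}, mean {mean:.4f}, variance {variance:.7f}")
print(f"std {std:.7f}, CV {cv:.8f}, skewness {skewness:.8f}")
```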
[Figure: Histogram with fixed size bins (bins=6)]
Common values
Value | Count | Frequency (%)
2019 | 438 | 28.6%
2018 | 344 | 22.5%
2017 | 258 | 16.9%
2014 | 170 | 11.1%
2016 | 163 | 10.6%
2015 | 158 | 10.3%

Minimum values
Value | Count | Frequency (%)
2014 | 170 | 11.1%
2015 | 158 | 10.3%
2016 | 163 | 10.6%
2017 | 258 | 16.9%
2018 | 344 | 22.5%
2019 | 438 | 28.6%

Maximum values
Value | Count | Frequency (%)
2019 | 438 | 28.6%
2018 | 344 | 22.5%
2017 | 258 | 16.9%
2016 | 163 | 10.6%
2015 | 158 | 10.3%
2014 | 170 | 11.1%

구분 (classification)
Categorical

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 KiB
미분류 (unclassified): 1140
발표 (released or presented): 248
출간 (published as a book): 143

Length

Max length: 3
Median length: 3
Mean length: 2.7446114
Min length: 2
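Length statistics for a categorical variable are computed over every row, so the mean length is weighted by the counts: (1140 × 3 + 248 × 2 + 143 × 2) / 1531 = 4202 / 1531 ≈ 2.7446. The sketch below checks this. It assumes lengths are counted in Unicode code points (Python's `len`). For precomposed Hangul syllables, that equals the number of visible characters.

```python
import statistics

# Value counts of 구분 from this report; each value's length is counted in code points.
counts = {"미분류": 1140, "발표": 248, "출간": 143}
lengths = [len(value) for value, c in counts.items() for _ in range(c)]

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", round(statistics.fmean(lengths), 7))
print("Min length:", min(lengths))
```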

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미분류
2nd row: 미분류
3rd row: 미분류
4th row: 미분류
5th row: 미분류

Common Values

Value | Count | Frequency (%)
미분류 | 1140 | 74.5%
발표 | 248 | 16.2%
출간 | 143 | 9.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values of 구분 and their frequencies]

생산분야 (production field)
Text

Distinct: 76
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 KiB

Length

Max length: 9
Median length: 7
Mean length: 2.1730895
Min length: 1

Characters and Unicode

Total characters: 3327
Distinct characters: 90
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
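As an illustration of these properties, Python's standard `unicodedata.category` returns a character's two-letter General Category code: `Lo` for Hangul syllables, `Sm` for `|`, and `Nd` for digits. The sketch below tallies categories over a few values and characters taken from this report. The block lookup is a hand-written simplification. It recognizes only ASCII and the Hangul Syllables block (U+AC00 to U+D7AF) and is not how ydata-profiling resolves blocks.

```python
import unicodedata
from collections import Counter

# Readable names for the General Category codes that appear in this report.
CATEGORY_NAMES = {"Lo": "Other Letter", "Sm": "Math Symbol", "Nd": "Decimal Number"}


def category_counts(values):
    """Count characters across all values by Unicode General Category."""
    counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counter[CATEGORY_NAMES.get(code, code)] += 1
    return counter


def block_of(ch):
    """Simplified block lookup: only distinguishes ASCII and Hangul Syllables."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"
    return "Other"


# Illustrative input built from values and characters shown in this report.
sample = ["장편소설", "번역", "희곡", "|", "3", "1"]
print(category_counts(sample))
print(Counter(block_of(ch) for value in sample for ch in value))
```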

Unique

Unique: 37
Unique (%): 2.4%

Sample

1st row: 장편소설
2nd row: 장편소설
3rd row: 번역
4th row: (not rendered)
5th row: 희곡
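Value | Count | Frequency (%)
(not rendered) | 430 | 28.1%
소설 (fiction) | 166 | 10.8%
산문 (prose) | 136 | 8.9%
미분류 (unclassified) | 107 | 7.0%
동화 (children's story) | 73 | 4.8%
희곡 (drama) | 67 | 4.4%
단편소설 (short story) | 65 | 4.2%
장편소설 (novel) | 64 | 4.2%
평론 (criticism) | 49 | 3.2%
시조 (sijo) | 44 | 2.9%
Other values (66) | 330 | 21.6%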

Most occurring characters

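Value | Count | Frequency (%)
(not rendered) | 537 | 16.1%
(not rendered) | 335 | 10.1%
(not rendered) | 319 | 9.6%
(not rendered) | 211 | 6.3%
(not rendered) | 192 | 5.8%
(not rendered) | 161 | 4.8%
(not rendered) | 140 | 4.2%
(not rendered) | 125 | 3.8%
(not rendered) | 107 | 3.2%
(not rendered) | 107 | 3.2%
Other values (80) | 1093 | 32.9%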

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3311 | 99.5%
Math Symbol | 14 | 0.4%
Decimal Number | 2 | 0.1%

Most frequent character per category

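Other Letter
Value | Count | Frequency (%)
(not rendered) | 537 | 16.2%
(not rendered) | 335 | 10.1%
(not rendered) | 319 | 9.6%
(not rendered) | 211 | 6.4%
(not rendered) | 192 | 5.8%
(not rendered) | 161 | 4.9%
(not rendered) | 140 | 4.2%
(not rendered) | 125 | 3.8%
(not rendered) | 107 | 3.2%
(not rendered) | 107 | 3.2%
Other values (77) | 1077 | 32.5%

Decimal Number
Value | Count | Frequency (%)
3 | 1 | 50.0%
1 | 1 | 50.0%

Math Symbol
Value | Count | Frequency (%)
'|' (U+007C) | 14 | 100.0%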

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3311 | 99.5%
Common | 16 | 0.5%

Most frequent character per script

Hangul
Same ten characters, counts, and percentages as the Other Letter table above; Other values (77): 1077 (32.5%).

Common
Value | Count | Frequency (%)
'|' (U+007C) | 14 | 87.5%
3 | 1 | 6.2%
1 | 1 | 6.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3311 | 99.5%
ASCII | 16 | 0.5%

Most frequent character per block

Hangul
Same ten characters, counts, and percentages as the Other Letter table above; Other values (77): 1077 (32.5%).

ASCII
Value | Count | Frequency (%)
'|' (U+007C) | 14 | 87.5%
3 | 1 | 6.2%
1 | 1 | 6.2%

입주년 (residency start year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.8641
Minimum: 2010
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.6 KiB

Quantile statistics

Minimum: 2010
5th percentile: 2014
Q1: 2015
Median: 2017
Q3: 2019
95th percentile: 2019
Maximum: 2019
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.8671115
Coefficient of variation (CV): 0.00092574978
Kurtosis: -0.84106134
Mean: 2016.8641
Median Absolute Deviation (MAD): 2
Skewness: -0.53794729
Sum: 3087819
Variance: 3.4861055
Monotonicity: Not monotonic
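The median and the MAD can both be checked against the value counts reported for 입주년. The reported MAD values (2 here, 1 for 사업연도) match the unscaled definition, the median of |x - median(x)|. They do not include the 1.4826 normal-consistency factor. Below is a sketch using only the standard library.

```python
import statistics

# Rebuild the 입주년 column from its value counts (1531 rows in total).
value_counts = {2010: 1, 2012: 6, 2013: 51, 2014: 182, 2015: 192,
                2016: 103, 2017: 316, 2018: 294, 2019: 386}
data = [year for year, count in value_counts.items() for _ in range(count)]

median = statistics.median(data)
# Median absolute deviation: median of |x - median|, with no scaling factor.
mad = statistics.median(abs(x - median) for x in data)
value_range = max(data) - min(data)

print("sum:", sum(data), "median:", median, "MAD:", mad, "range:", value_range)
```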
[Figure: Histogram with fixed size bins (bins=9)]
Common values
Value | Count | Frequency (%)
2019 | 386 | 25.2%
2017 | 316 | 20.6%
2018 | 294 | 19.2%
2015 | 192 | 12.5%
2014 | 182 | 11.9%
2016 | 103 | 6.7%
2013 | 51 | 3.3%
2012 | 6 | 0.4%
2010 | 1 | 0.1%

Minimum values
Value | Count | Frequency (%)
2010 | 1 | 0.1%
2012 | 6 | 0.4%
2013 | 51 | 3.3%
2014 | 182 | 11.9%
2015 | 192 | 12.5%
2016 | 103 | 6.7%
2017 | 316 | 20.6%
2018 | 294 | 19.2%
2019 | 386 | 25.2%

Maximum values
Value | Count | Frequency (%)
2019 | 386 | 25.2%
2018 | 294 | 19.2%
2017 | 316 | 20.6%
2016 | 103 | 6.7%
2015 | 192 | 12.5%
2014 | 182 | 11.9%
2013 | 51 | 3.3%
2012 | 6 | 0.4%
2010 | 1 | 0.1%

생산작품수(건) (number of works produced)
Real number (ℝ)

Distinct: 19
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5218811
Minimum: 0
Maximum: 70
Zeros: 5
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 13.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 3
Maximum: 70
Range: 70
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.7745361
Coefficient of variation (CV): 1.8230964
Kurtosis: 344.30039
Mean: 1.5218811
Median Absolute Deviation (MAD): 0
Skewness: 16.082109
Sum: 2330
Variance: 7.6980503
Monotonicity: Not monotonic
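The common-value, minimum, and maximum tables for 생산작품수(건) together list all 19 distinct values, so the full column can be rebuilt. The sketch below uses only the standard library. It assumes the same linear-interpolation quantile convention as the numpy/pandas defaults. It shows why the IQR and MAD are 0: more than 80% of rows equal 1, so both quartiles land on 1 even though the maximum is 70.

```python
import statistics

# Full value counts of 생산작품수(건), combining the common, minimum, and maximum tables.
value_counts = {0: 5, 1: 1252, 2: 178, 3: 33, 4: 5, 5: 18, 6: 7, 7: 3, 8: 8,
                10: 6, 11: 2, 12: 4, 13: 2, 14: 1, 16: 1, 17: 2, 20: 2, 56: 1, 70: 1}
data = sorted(v for v, c in value_counts.items() for _ in range(c))
n = len(data)


def quantile(sorted_values, q):
    """Linear-interpolation quantile at position (n - 1) * q."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
share_of_ones = value_counts[1] / n

print("n:", n, "sum:", sum(data), "variance:", round(statistics.variance(data), 7))
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1, "95th:", quantile(data, 0.95))
print(f"share of rows equal to 1: {share_of_ones:.1%}")
```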
[Figure: Histogram with fixed size bins (bins=19)]
Common values
Value | Count | Frequency (%)
1 | 1252 | 81.8%
2 | 178 | 11.6%
3 | 33 | 2.2%
5 | 18 | 1.2%
8 | 8 | 0.5%
6 | 7 | 0.5%
10 | 6 | 0.4%
0 | 5 | 0.3%
4 | 5 | 0.3%
12 | 4 | 0.3%
Other values (9) | 15 | 1.0%

Minimum values
Value | Count | Frequency (%)
0 | 5 | 0.3%
1 | 1252 | 81.8%
2 | 178 | 11.6%
3 | 33 | 2.2%
4 | 5 | 0.3%
5 | 18 | 1.2%
6 | 7 | 0.5%
7 | 3 | 0.2%
8 | 8 | 0.5%
10 | 6 | 0.4%

Maximum values
Value | Count | Frequency (%)
70 | 1 | 0.1%
56 | 1 | 0.1%
20 | 2 | 0.1%
17 | 2 | 0.1%
16 | 1 | 0.1%
14 | 1 | 0.1%
13 | 2 | 0.1%
12 | 4 | 0.3%
11 | 2 | 0.1%
10 | 6 | 0.4%

Interactions

[Figures: nine pairwise interaction plots (3 × 3 grid) of the numeric variables 사업연도, 입주년, and 생산작품수(건)]

Correlations

Variable | 문학단체명 | 사업연도 | 구분 | 생산분야 | 입주년 | 생산작품수(건)
문학단체명 | 1.000 | 0.405 | 0.600 | 0.765 | 0.482 | 0.077
사업연도 | 0.405 | 1.000 | 0.327 | 0.592 | 0.877 | 0.083
구분 | 0.600 | 0.327 | 1.000 | 0.673 | 0.404 | 0.000
생산분야 | 0.765 | 0.592 | 0.673 | 1.000 | 0.649 | 0.482
입주년 | 0.482 | 0.877 | 0.404 | 0.649 | 1.000 | 0.000
생산작품수(건) | 0.077 | 0.083 | 0.000 | 0.482 | 0.000 | 1.000

Variable | 구분 | 문학단체명
구분 | 1.000 | 0.492
문학단체명 | 0.492 | 1.000

Variable | 사업연도 | 입주년 | 생산작품수(건) | 문학단체명 | 구분
사업연도 | 1.000 | 0.927 | 0.036 | 0.249 | 0.294
입주년 | 0.927 | 1.000 | 0.076 | 0.282 | 0.285
생산작품수(건) | 0.036 | 0.076 | 1.000 | 0.050 | 0.000
문학단체명 | 0.249 | 0.282 | 0.050 | 1.000 | 0.492
구분 | 0.294 | 0.285 | 0.000 | 0.492 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

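First rows
Index | 문학단체명 | 사업연도 | 구분 | 생산분야 | 입주년 | 생산작품수(건)
0 | *악**원 | 2014 | 미분류 | 장편소설 | 2014 | 1
1 | *악**원 | 2014 | 미분류 | 장편소설 | 2014 | 1
2 | *악**원 | 2014 | 미분류 | 번역 | 2014 | 1
3 | *악**원 | 2014 | 미분류 | (not rendered) | 2014 | 1
4 | *악**원 | 2014 | 미분류 | 희곡 | 2014 | 1
5 | *악**원 | 2014 | 미분류 | 장편소설 | 2014 | 1
6 | *악**원 | 2014 | 미분류 | 장편소설 | 2014 | 1
7 | *악**원 | 2014 | 미분류 | 장편소설 | 2013 | 1
8 | *악**원 | 2014 | 미분류 | 장편소설 | 2013 | 1
9 | *악**원 | 2014 | 미분류 | 장편소설 | 2013 | 1

Last rows
Index | 문학단체명 | 사업연도 | 구분 | 생산분야 | 입주년 | 생산작품수(건)
1521 | *악**원 | 2019 | 미분류 | 동시 | 2019 | 1
1522 | *악**원 | 2019 | 미분류 | (not rendered) | 2019 | 12
1523 | *악**원 | 2019 | 미분류 | 연구논문 | 2019 | 1
1524 | *악**원 | 2019 | 미분류 | 소설 | 2019 | 13
1525 | *악**원 | 2019 | 미분류 | 동인시 | 2019 | 1
1526 | *악**원 | 2019 | 미분류 | 동인시 | 2019 | 1
1527 | *악**원 | 2019 | 미분류 | 산문 | 2019 | 1
1528 | *악**원 | 2019 | 미분류 | 소설 | 2019 | 1
1529 | *악**원 | 2019 | 미분류 | 창작동화 | 2019 | 1
1530 | *악**원 | 2019 | 미분류 | 구비문학 | 2019 | 1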

Duplicate rows

Most frequently occurring

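Index | 문학단체명 | 사업연도 | 구분 | 생산분야 | 입주년 | 생산작품수(건) | # duplicates
56 | *버**집 | 2019 | 미분류 | (not rendered) | 2019 | 1 | 60
28 | *날**날 | 2018 | 미분류 | 미분류 | 2018 | 1 | 43
75 | *을**집 | 2014 | 미분류 | 동화책 | 2014 | 1 | 26
114 | *을**집 | 2019 | 미분류 | 산문 | 2019 | 1 | 26
116 | *을**집 | 2019 | 미분류 | (not rendered) | 2019 | 1 | 25
112 | *을**집 | 2019 | 미분류 | 동화 | 2019 | 1 | 21
117 | *을**집 | 2019 | 미분류 | (not rendered) | 2019 | 2 | 21
80 | *을**집 | 2015 | 미분류 | 동화 | 2015 | 1 | 20
93 | *을**집 | 2017 | 미분류 | (not rendered) | 2017 | 2 | 19
219 | *지**단 | 2019 | 발표 | 산문 | 2019 | 1 | 18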
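Finding the most frequently occurring duplicate rows comes down to counting identical rows. The sketch below applies `collections.Counter` to hypothetical toy rows that are not from this dataset. It prints each repeated row with its total number of occurrences, which is one plausible reading of the "# duplicates" column. It also computes two possible totals: distinct repeated rows, and redundant extra copies. This report does not state which convention its 237 figure follows.

```python
from collections import Counter

# Hypothetical toy rows (문학단체명, 사업연도, 구분, 생산분야, 입주년, 생산작품수), not from the dataset.
rows = [
    ("A", 2019, "미분류", "산문", 2019, 1),
    ("A", 2019, "미분류", "산문", 2019, 1),
    ("A", 2019, "미분류", "산문", 2019, 1),
    ("B", 2018, "발표", "소설", 2018, 2),
    ("B", 2018, "발표", "소설", 2018, 2),
    ("C", 2017, "출간", "동화", 2017, 1),
]

occurrences = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicated = [(row, count) for row, count in occurrences.most_common() if count > 1]

# Two possible totals: distinct repeated rows vs. redundant extra copies.
n_duplicated_groups = len(duplicated)
n_redundant_copies = sum(count - 1 for _, count in duplicated)

for row, count in duplicated:
    print(row, "# duplicates:", count)
print(n_duplicated_groups, n_redundant_copies)
```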