Overview

Dataset statistics

Number of variables: 5
Number of observations: 885
Missing cells: 734
Missing cells (%): 16.6%
Duplicate rows: 100
Duplicate rows (%): 11.3%
Total size in memory: 36.4 KiB
Average record size in memory: 42.1 B
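
The percentages above follow from the raw counts. The sketch below reproduces them in Python. It assumes that the missing-cell percentage is taken over all cells (observations times variables), which is consistent with the reported figures but is not stated in the report itself.

```python
# Counts as reported in the dataset statistics.
n_variables = 5
n_observations = 885
missing_cells = 734
duplicate_rows = 100

# Assumption: the denominator for missing cells is the total cell count.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```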

Variable types

Categorical: 2
Numeric: 2
Text: 1

Dataset

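Description: Recipients selected for the "Writing Space Operation" (집필공간운영) support program in the literature category of the 2014-2019 Culture and Arts Promotion Fund (문예진흥기금) open-call projects (e.g., selected field, gender, birth year)
Author: Arts Council Korea (한국문화예술위원회)
URL: https://www.data.go.kr/data/15076478/fileData.do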

Alerts

Duplicates: Dataset has 100 (11.3%) duplicate rows
High correlation: 문학단체명 (literary organization name) is highly overall correlated with 성별 (gender)
High correlation: 성별 (gender) is highly overall correlated with 문학단체명 (literary organization name)
Missing: 선정분야 (selected field) has 36 (4.1%) missing values
Missing: 생년 (birth year) has 698 (78.9%) missing values

Reproduction

Analysis started: 2023-12-12 22:57:19.217256
Analysis finished: 2023-12-12 22:57:20.027171
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
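
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the Python standard library; the format string is an assumption based on how the timestamps are printed above.

```python
from datetime import datetime

# Assumed timestamp layout, matching the values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 22:57:19.217256", FMT)
finished = datetime.strptime("2023-12-12 22:57:20.027171", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```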

Variables

문학단체명 (literary organization name)
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

*지**단: 303
*을**집: 181
*1**학: 162
*악**원: 99
*날**날: 63
Other values (2): 77

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: *악**원
2nd row: *악**원
3rd row: *악**원
4th row: *악**원
5th row: *악**원

Common Values

Value: Count (Frequency %)
*지**단: 303 (34.2%)
*을**집: 181 (20.5%)
*1**학: 162 (18.3%)
*악**원: 99 (11.2%)
*날**날: 63 (7.1%)
*버**집: 50 (5.6%)
*산**꽃: 27 (3.1%)

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: Plot of the common values of 문학단체명, with the same counts as in the Common Values table.

사업연도 (program year)
Real number (ℝ)

Distinct: 6
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.3243
Minimum: 2014
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 2014
5-th percentile: 2014
Q1: 2015
Median: 2016
Q3: 2018
95-th percentile: 2019
Maximum: 2019
Range: 5
Interquartile range (IQR): 3
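
These quantiles can be reproduced from the value counts reported for 사업연도. The sketch below expands the counts into a sorted list and applies linear interpolation between closest ranks. The interpolation rule is an assumption (the report does not state which method the profiler uses), and the helper name `quantile` is purely illustrative.

```python
# Value counts for 사업연도 as reported for this variable.
counts = {2014: 153, 2015: 170, 2016: 167, 2017: 148, 2018: 126, 2019: 121}

# Expand the counts into a sorted list of 885 observations.
values = sorted(year for year, n in counts.items() for _ in range(n))


def quantile(sorted_values, q):
    """Quantile by linear interpolation between closest ranks (assumed method)."""
    pos = (len(sorted_values) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


p05 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1

print(p05, q1, median, q3, p95, iqr)
```

Because the variable takes only six distinct integer values, every quantile lands on a whole year and the interpolation term is zero.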

Descriptive statistics

Standard deviation: 1.6579466
Coefficient of variation (CV): 0.00082226188
Kurtosis: -1.168757
Mean: 2016.3243
Median Absolute Deviation (MAD): 1
Skewness: 0.1579232
Sum: 1784447
Variance: 2.748787
Monotonicity: Increasing
Figure: Histogram with fixed size bins (bins=6)

Common values

Value: Count (Frequency %)
2015: 170 (19.2%)
2016: 167 (18.9%)
2014: 153 (17.3%)
2017: 148 (16.7%)
2018: 126 (14.2%)
2019: 121 (13.7%)

Extreme values, lowest first

Value: Count (Frequency %)
2014: 153 (17.3%)
2015: 170 (19.2%)
2016: 167 (18.9%)
2017: 148 (16.7%)
2018: 126 (14.2%)
2019: 121 (13.7%)

Extreme values, highest first

Value: Count (Frequency %)
2019: 121 (13.7%)
2018: 126 (14.2%)
2017: 148 (16.7%)
2016: 167 (18.9%)
2015: 170 (19.2%)
2014: 153 (17.3%)
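
The descriptive statistics for 사업연도 can be recomputed from the value counts alone. The sketch below does so with the Python standard library. It assumes the sample variance (division by n - 1), which is the convention that reproduces the reported variance of 2.748787; the report does not state the convention explicitly.

```python
import math

# Value counts for 사업연도 as reported for this variable.
counts = {2014: 153, 2015: 170, 2016: 167, 2017: 148, 2018: 126, 2019: 121}

n = sum(counts.values())
total = sum(year * c for year, c in counts.items())
mean = total / n

# Assumption: sample variance, i.e. divide by (n - 1).
variance = sum(c * (year - mean) ** 2 for year, c in counts.items()) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(f"n={n} sum={total} mean={mean:.4f}")
print(f"variance={variance:.6f} std={std:.7f} cv={cv:.11f}")
```

The coefficient of variation is tiny because the values are calendar years: the spread of about 1.7 years is measured against a mean near 2016.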

선정분야 (selected field)
Text

MISSING 

Distinct: 53
Distinct (%): 6.2%
Missing: 36
Missing (%): 4.1%
Memory size: 7.0 KiB

Length

Max length: 13
Median length: 6
Mean length: 2.3804476
Min length: 1

Characters and Unicode

Total characters: 2021
Distinct characters: 60
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
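
The reported mean length is consistent with dividing the total character count by the number of non-missing values. A short sketch of that arithmetic follows; the choice of denominator is an inference from the figures, not something the report states.

```python
# Figures reported for 선정분야.
n_observations = 885
n_missing = 36
total_characters = 2021
hangul_characters = 1936

# Assumption: lengths are averaged over non-missing values only.
n_valid = n_observations - n_missing
mean_length = total_characters / n_valid

# Share of all characters that belong to the Hangul script.
hangul_pct = 100 * hangul_characters / total_characters

print(f"non-missing values: {n_valid}")
print(f"mean length: {mean_length:.7f}")
print(f"Hangul share: {hangul_pct:.1f}%")
```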

Unique

Unique: 23
Unique (%): 2.7%

Sample

1st row: 소설 (fiction)
2nd row: (value not shown)
3rd row: 소설 (fiction)
4th row: 희곡 (drama)
5th row: 소설 (fiction)

Value: Count (Frequency %)
소설 (fiction): 241 (28.3%)
미분류 (unclassified): 222 (26.0%)
(value not shown): 157 (18.4%)
아동문학 (children's literature): 47 (5.5%)
희곡 (drama): 47 (5.5%)
동화 (children's stories): 42 (4.9%)
평론 (criticism): 18 (2.1%)
시조 (sijo): 8 (0.9%)
번역 (translation): 6 (0.7%)
수필 (essay): 6 (0.7%)
Other values (36): 59 (6.9%)

Most occurring characters

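Value: Count (Frequency %)
(character not shown): 256 (12.7%)
(character not shown): 254 (12.6%)
(character not shown): 224 (11.1%)
(character not shown): 223 (11.0%)
(character not shown): 222 (11.0%)
(character not shown): 188 (9.3%)
(character not shown): 92 (4.6%)
(space): 72 (3.6%)
(character not shown): 55 (2.7%)
(character not shown): 50 (2.5%)
Other values (50): 385 (19.0%)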

Most occurring categories

Value: Count (Frequency %)
Other Letter: 1936 (95.8%)
Space Separator: 72 (3.6%)
Math Symbol: 13 (0.6%)
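
The category names in this table are Unicode general categories. The sketch below uses Python's standard `unicodedata` module to look up the category of three sample characters: a Hangul syllable, an ASCII space, and the vertical bar. The sample characters are chosen for illustration; the mapping from two-letter codes to the long names used in the report is written out by hand.

```python
import unicodedata

# Long names for the three general categories that appear in the report.
LONG_NAMES = {"Lo": "Other Letter", "Zs": "Space Separator", "Sm": "Math Symbol"}

# Illustrative sample characters: Hangul syllable, ASCII space, vertical bar.
samples = ["소", " ", "|"]

categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, code in categories.items():
    print(repr(ch), code, LONG_NAMES[code], unicodedata.name(ch))
```

The vertical bar is classified as a math symbol (Sm) in Unicode, which explains why a separator character in the data shows up under "Math Symbol".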

Most frequent character per category

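Other Letter

Value: Count (Frequency %)
(character not shown): 256 (13.2%)
(character not shown): 254 (13.1%)
(character not shown): 224 (11.6%)
(character not shown): 223 (11.5%)
(character not shown): 222 (11.5%)
(character not shown): 188 (9.7%)
(character not shown): 92 (4.8%)
(character not shown): 55 (2.8%)
(character not shown): 50 (2.6%)
(character not shown): 50 (2.6%)
Other values (48): 322 (16.6%)

Space Separator

Value: Count (Frequency %)
(space): 72 (100.0%)

Math Symbol

Value: Count (Frequency %)
|: 13 (100.0%)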

Most occurring scripts

Value: Count (Frequency %)
Hangul: 1936 (95.8%)
Common: 85 (4.2%)

Most frequent character per script

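Hangul

Value: Count (Frequency %)
(character not shown): 256 (13.2%)
(character not shown): 254 (13.1%)
(character not shown): 224 (11.6%)
(character not shown): 223 (11.5%)
(character not shown): 222 (11.5%)
(character not shown): 188 (9.7%)
(character not shown): 92 (4.8%)
(character not shown): 55 (2.8%)
(character not shown): 50 (2.6%)
(character not shown): 50 (2.6%)
Other values (48): 322 (16.6%)

Common

Value: Count (Frequency %)
(space): 72 (84.7%)
|: 13 (15.3%)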

Most occurring blocks

Value: Count (Frequency %)
Hangul: 1936 (95.8%)
ASCII: 85 (4.2%)

Most frequent character per block

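Hangul

Value: Count (Frequency %)
(character not shown): 256 (13.2%)
(character not shown): 254 (13.1%)
(character not shown): 224 (11.6%)
(character not shown): 223 (11.5%)
(character not shown): 222 (11.5%)
(character not shown): 188 (9.7%)
(character not shown): 92 (4.8%)
(character not shown): 55 (2.8%)
(character not shown): 50 (2.6%)
(character not shown): 50 (2.6%)
Other values (48): 322 (16.6%)

ASCII

Value: Count (Frequency %)
(space): 72 (84.7%)
|: 13 (15.3%)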

성별 (gender)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

미분류 (unclassified): 697
(value not shown): 117
(value not shown): 71

Length

Max length: 3
Median length: 3
Mean length: 2.5751412
Min length: 1
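
The mean length can be checked against the value counts reported for 성별. The sketch below assumes that the two categories whose labels are not shown are single-character strings. This is an inference from the reported minimum length of 1 and maximum length of 3 (the length of 미분류), not a value read from the data.

```python
# (string length, count) per category of 성별.
# 미분류 has 3 characters; the other two categories are assumed to have 1.
length_counts = [(3, 697), (1, 117), (1, 71)]

n = sum(count for _, count in length_counts)
total_length = sum(length * count for length, count in length_counts)
mean_length = total_length / n

print(f"n={n} total length={total_length} mean length={mean_length:.7f}")
```

Under that assumption the result matches the reported mean length of 2.5751412.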

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미분류 (unclassified)
2nd row: 미분류 (unclassified)
3rd row: 미분류 (unclassified)
4th row: 미분류 (unclassified)
5th row: 미분류 (unclassified)

Common Values

Value: Count (Frequency %)
미분류 (unclassified): 697 (78.8%)
(value not shown): 117 (13.2%)
(value not shown): 71 (8.0%)

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: Plot of the common values of 성별, with the same counts as in the Common Values table.

생년 (birth year)
Real number (ℝ)

MISSING 

Distinct: 44
Distinct (%): 23.5%
Missing: 698
Missing (%): 78.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1967.4545
Minimum: 1940
Maximum: 1996
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1940
5-th percentile: 1952
Q1: 1961
Median: 1967
Q3: 1972
95-th percentile: 1986.7
Maximum: 1996
Range: 56
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 10.256486
Coefficient of variation (CV): 0.0052130739
Kurtosis: 0.24702732
Mean: 1967.4545
Median Absolute Deviation (MAD): 6
Skewness: 0.46723076
Sum: 367914
Variance: 105.1955
Monotonicity: Not monotonic
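
Several of the figures for 생년 are tied together by simple arithmetic once the missing values are set aside. The sketch below recomputes them from the reported counts, sum, and standard deviation. That the mean and the distinct percentage use the non-missing count as denominator is an inference from the numbers, not something the report states.

```python
# Figures reported for 생년.
n_observations = 885
n_missing = 698
n_distinct = 44
value_sum = 367914
std = 10.256486

# Assumption: statistics are computed over non-missing values only.
n_valid = n_observations - n_missing

mean = value_sum / n_valid
distinct_pct = 100 * n_distinct / n_valid
missing_pct = 100 * n_missing / n_observations
cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation

print(f"non-missing values: {n_valid}")
print(f"mean={mean:.4f} distinct={distinct_pct:.1f}% missing={missing_pct:.1f}%")
print(f"cv={cv:.10f} variance={variance:.4f}")
```

With only 187 birth years present out of 885 rows, the statistics for this variable describe roughly a fifth of the dataset.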
Figure: Histogram with fixed size bins (bins=44)

Common values

Value: Count (Frequency %)
1970: 15 (1.7%)
1969: 12 (1.4%)
1962: 11 (1.2%)
1967: 11 (1.2%)
1959: 10 (1.1%)
1956: 8 (0.9%)
1963: 8 (0.9%)
1961: 7 (0.8%)
1952: 7 (0.8%)
1966: 6 (0.7%)
Other values (34): 92 (10.4%)
(Missing): 698 (78.9%)

Extreme values, lowest first

Value: Count (Frequency %)
1940: 1 (0.1%)
1945: 1 (0.1%)
1949: 2 (0.2%)
1951: 1 (0.1%)
1952: 7 (0.8%)
1953: 2 (0.2%)
1954: 3 (0.3%)
1955: 1 (0.1%)
1956: 8 (0.9%)
1957: 3 (0.3%)

Extreme values, highest first

Value: Count (Frequency %)
1996: 1 (0.1%)
1995: 1 (0.1%)
1993: 2 (0.2%)
1991: 2 (0.2%)
1989: 1 (0.1%)
1988: 2 (0.2%)
1987: 1 (0.1%)
1986: 5 (0.6%)
1984: 2 (0.2%)
1982: 1 (0.1%)

Interactions

Figure: Interaction plots (four panels)

Correlations

Variable, 문학단체명, 사업연도, 선정분야, 성별, 생년
문학단체명, 1.000, 0.496, 0.670, 0.609, 0.331
사업연도, 0.496, 1.000, 0.584, 0.402, 0.431
선정분야, 0.670, 0.584, 1.000, 0.445, 0.407
성별, 0.609, 0.402, 0.445, 1.000, 0.290
생년, 0.331, 0.431, 0.407, 0.290, 1.000

Variable, 성별, 문학단체명
성별, 1.000, 0.503
문학단체명, 0.503, 1.000

Variable, 사업연도, 생년, 문학단체명, 성별
사업연도, 1.000, -0.172, 0.311, 0.299
생년, -0.172, 1.000, 0.202, 0.209
문학단체명, 0.311, 0.202, 1.000, 0.503
성별, 0.299, 0.209, 0.503, 1.000

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.
Figure: Nullity correlation heatmap, which measures how strongly the presence or absence of one variable affects the presence of another.

Sample

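First rows

Index, 문학단체명, 사업연도, 선정분야, 성별, 생년
0, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
1, *악**원, 2014, (value not shown), 미분류 (unclassified), <NA>
2, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
3, *악**원, 2014, 희곡 (drama), 미분류 (unclassified), <NA>
4, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
5, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
6, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
7, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
8, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>
9, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>

Last rows

Index, 문학단체명, 사업연도, 선정분야, 성별, 생년
875, *악**원, 2019, 아동문학 (children's literature), (value not shown), 1966
876, *악**원, 2019, 소설 (fiction), (value not shown), 1953
877, *악**원, 2019, 소설 (fiction), (value not shown), 1949
878, *악**원, 2019, 번역 (translation), (value not shown), 1971
879, *악**원, 2019, (value not shown), (value not shown), 1970
880, *악**원, 2019, 아동문학 (children's literature), (value not shown), 1963
881, *악**원, 2019, 소설 (fiction), (value not shown), 1957
882, *악**원, 2019, 소설 (fiction), (value not shown), 1972
883, *악**원, 2019, 소설 (fiction), (value not shown), 1956
884, *악**원, 2019, 평론 (criticism), (value not shown), 1972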

Duplicate rows

Most frequently occurring

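Index, 문학단체명, 사업연도, 선정분야, 성별, 생년, # duplicates
78, *지**단, 2015, 미분류 (unclassified), 미분류 (unclassified), <NA>, 62
89, *지**단, 2018, 미분류 (unclassified), 미분류 (unclassified), <NA>, 47
60, *을**집, 2015, (value not shown), 미분류 (unclassified), <NA>, 23
39, *버**집, 2018, 미분류 (unclassified), 미분류 (unclassified), <NA>, 21
46, *악**원, 2014, 소설 (fiction), 미분류 (unclassified), <NA>, 20
84, *지**단, 2017, 소설 (fiction), 미분류 (unclassified), <NA>, 20
79, *지**단, 2016, 소설 (fiction), 미분류 (unclassified), <NA>, 19
42, *산**꽃, 2015, 소설 (fiction), 미분류 (unclassified), <NA>, 18
48, *악**원, 2015, 소설 (fiction), 미분류 (unclassified), <NA>, 18
31, *날**날, 2016, 소설 (fiction), 미분류 (unclassified), <NA>, 17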
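
The "# duplicates" values of the ten rows listed here already sum to 265, more than the headline figure of 100 duplicate rows. That suggests the headline counts distinct row patterns that occur more than once, while "# duplicates" is the size of each group; this is an inference from the numbers, not something the report states. The sketch below illustrates the two ways of counting with `collections.Counter`, using the ten rows shown under "Sample" (first rows). The `"(not shown)"` placeholder stands in for a 선정분야 value that is blank in the source.

```python
from collections import Counter

NA = None  # stands in for <NA>

# The first ten rows from the Sample section, as tuples of
# (문학단체명, 사업연도, 선정분야, 성별, 생년).
common_row = ("*악**원", 2014, "소설", "미분류", NA)
rows = [
    common_row,
    ("*악**원", 2014, "(not shown)", "미분류", NA),  # 선정분야 blank in the source
    common_row,
    ("*악**원", 2014, "희곡", "미분류", NA),
] + [common_row] * 6

# Count how often each distinct row occurs.
row_counts = Counter(rows)

# Rows that occur more than once, with their group sizes.
repeated = {row: n for row, n in row_counts.items() if n > 1}

distinct_repeated_rows = len(repeated)                # one way to count duplicates
extra_copies = sum(n - 1 for n in repeated.values())  # copies beyond the first

for row, n in repeated.items():
    print(row, "occurs", n, "times")
print("distinct repeated rows:", distinct_repeated_rows)
print("extra copies:", extra_copies)
```

In this small sample one row pattern repeats eight times, so the first way of counting gives 1 and the second gives 7.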