Overview

Dataset statistics

Number of variables: 5
Number of observations: 1708
Missing cells: 1069
Missing cells (%): 12.5%
Duplicate rows: 201
Duplicate rows (%): 11.8%
Total size in memory: 70.2 KiB
Average record size in memory: 42.1 B
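
The percentage and per-record figures above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming missing cells are measured against all cells (variables × observations) and duplicates against all rows; the variable names are illustrative and not part of the report:

```python
# Counts copied from the dataset statistics in this report.
n_variables = 5
n_observations = 1708
missing_cells = 1069
duplicate_rows = 201
total_memory_kib = 70.2

# Assumption: missing cells are measured against every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)
# Assumption: duplicate rows are measured against the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations
# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"{missing_pct:.1f}% missing cells")
print(f"{duplicate_pct:.1f}% duplicate rows")
print(f"{avg_record_bytes:.1f} B per record")
```

Rounded to one decimal place, these reproduce the reported 12.5%, 11.8% and 42.1 B.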

Variable types

Categorical: 2
Numeric: 2
Text: 1

Dataset

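Description: Applicant records (e.g., selected field, gender, birth year) for the "집필공간운영" (writing space operation) support program in the literature category of the 문예진흥기금 (Culture and Arts Promotion Fund) open-call programs, 2014-2019
Author: 한국문화예술위원회 (Arts Council Korea)
URL: https://www.data.go.kr/data/15076480/fileData.do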

Alerts

Dataset has 201 (11.8%) duplicate rows [Duplicates]
문학단체명 (literary organization name) is highly overall correlated with 성별 (gender) [High correlation]
성별 (gender) is highly overall correlated with 문학단체명 (literary organization name) [High correlation]
생년 (birth year) has 1069 (62.6%) missing values [Missing]

Reproduction

Analysis started: 2024-04-17 11:19:38.899285
Analysis finished: 2024-04-17 11:19:39.526188
Duration: 0.63 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

문학단체명 (literary organization name)
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.5 KiB

Value              Count
*1**학             501
*지**단            470
*을**집            390
*악**원            107
*버**집            100
Other values (2)   140

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: *악**원
2nd row: *악**원
3rd row: *악**원
4th row: *악**원
5th row: *악**원

Common Values

Value     Count  Frequency (%)
*1**학    501    29.3%
*지**단   470    27.5%
*을**집   390    22.8%
*악**원   107    6.3%
*버**집   100    5.9%
*날**날   83     4.9%
*산**꽃   57     3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values and their frequencies]

사업연도 (project year)
Real number (ℝ)

Distinct: 6
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.3343
Minimum: 2014
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.1 KiB

Quantile statistics

Minimum: 2014
5-th percentile: 2014
Q1: 2015
Median: 2016
Q3: 2018
95-th percentile: 2019
Maximum: 2019
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.7339285
Coefficient of variation (CV): 0.000859941
Kurtosis: -1.2685346
Mean: 2016.3343
Median Absolute Deviation (MAD): 2
Skewness: 0.095637094
Sum: 3443899
Variance: 3.0065082
Monotonicity: Increasing
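
The descriptive statistics above can be re-derived from the value counts reported for this variable. A minimal sketch using only the Python standard library; it assumes the reported variance is the sample variance (n - 1 denominator), which is the convention that matches the reported figure:

```python
import statistics

# Value counts for 사업연도 as reported for this variable.
counts = {2014: 368, 2015: 246, 2016: 310, 2017: 272, 2018: 255, 2019: 257}

# Expand the counts back into one value per observation.
values = [year for year, n in counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation

print(len(values), sum(values))
print(round(mean, 4), round(variance, 7), round(std, 7))
```

The coefficient of variation is very small because the standard deviation (under 2 years) is divided by a mean near 2016; it says little about the spread across the six distinct years.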
[Figure: Histogram with fixed size bins (bins=6)]

Values sorted by frequency

Value  Count  Frequency (%)
2014   368    21.5%
2016   310    18.1%
2017   272    15.9%
2019   257    15.0%
2018   255    14.9%
2015   246    14.4%

Values in ascending order

Value  Count  Frequency (%)
2014   368    21.5%
2015   246    14.4%
2016   310    18.1%
2017   272    15.9%
2018   255    14.9%
2019   257    15.0%

Values in descending order

Value  Count  Frequency (%)
2019   257    15.0%
2018   255    14.9%
2017   272    15.9%
2016   310    18.1%
2015   246    14.4%
2014   368    21.5%

신청분야 (application field)
Text

Distinct: 52
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 13.5 KiB

Length

Max length: 8
Median length: 7
Mean length: 2.2183841
Min length: 1

Characters and Unicode

Total characters: 3789
Distinct characters: 57
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 25
Unique (%): 1.5%

Sample

1st row: 소설 (fiction)
2nd row: (unlabeled)
3rd row: 소설 (fiction)
4th row: 희곡|소설 (drama|fiction)
5th row: 소설 (fiction)

Value                             Count  Frequency (%)
소설 (fiction)                    537    31.3%
(unlabeled)                       410    23.9%
미분류 (unclassified)             305    17.8%
아동문학 (children's literature)  147    8.6%
희곡 (drama)                      104    6.1%
동화 (children's story)           49     2.9%
평론 (criticism)                  30     1.7%
번역 (translation)                21     1.2%
시나리오 (screenplay)             20     1.2%
산문 (prose)                      18     1.0%
Other values (28)                 76     4.4%

Most occurring characters

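Value              Count  Frequency (%)
(unlabeled)        552    14.6%
(unlabeled)        550    14.5%
(unlabeled)        459    12.1%
(unlabeled)        306    8.1%
(unlabeled)        305    8.0%
(unlabeled)        305    8.0%
(unlabeled)        201    5.3%
(unlabeled)        170    4.5%
(unlabeled)        152    4.0%
(unlabeled)        148    3.9%
Other values (47)  641    16.9%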

Most occurring categories

Value              Count  Frequency (%)
Other Letter       3721   98.2%
Space Separator    52     1.4%
Math Symbol        12     0.3%
Close Punctuation  2      0.1%
Open Punctuation   2      0.1%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
(unlabeled)        552    14.8%
(unlabeled)        550    14.8%
(unlabeled)        459    12.3%
(unlabeled)        306    8.2%
(unlabeled)        305    8.2%
(unlabeled)        305    8.2%
(unlabeled)        201    5.4%
(unlabeled)        170    4.6%
(unlabeled)        152    4.1%
(unlabeled)        148    4.0%
Other values (43)  573    15.4%

Space Separator

Value    Count  Frequency (%)
(space)  52     100.0%

Math Symbol

Value  Count  Frequency (%)
|      12     100.0%

Close Punctuation

Value  Count  Frequency (%)
)      2      100.0%

Open Punctuation

Value  Count  Frequency (%)
(      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  3721   98.2%
Common  68     1.8%

Most frequent character per script

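Hangul

Value              Count  Frequency (%)
(unlabeled)        552    14.8%
(unlabeled)        550    14.8%
(unlabeled)        459    12.3%
(unlabeled)        306    8.2%
(unlabeled)        305    8.2%
(unlabeled)        305    8.2%
(unlabeled)        201    5.4%
(unlabeled)        170    4.6%
(unlabeled)        152    4.1%
(unlabeled)        148    4.0%
Other values (43)  573    15.4%

Common

Value    Count  Frequency (%)
(space)  52     76.5%
|        12     17.6%
)        2      2.9%
(        2      2.9%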

Most occurring blocks

Value   Count  Frequency (%)
Hangul  3721   98.2%
ASCII   68     1.8%

Most frequent character per block

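Hangul

Value              Count  Frequency (%)
(unlabeled)        552    14.8%
(unlabeled)        550    14.8%
(unlabeled)        459    12.3%
(unlabeled)        306    8.2%
(unlabeled)        305    8.2%
(unlabeled)        305    8.2%
(unlabeled)        201    5.4%
(unlabeled)        170    4.6%
(unlabeled)        152    4.1%
(unlabeled)        148    4.0%
Other values (43)  573    15.4%

ASCII

Value    Count  Frequency (%)
(space)  52     76.5%
|        12     17.6%
)        2      2.9%
(        2      2.9%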

성별 (gender)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 13.5 KiB

Value                  Count
미분류 (unclassified)  951
(unlabeled)            409
(unlabeled)            348

Length

Max length: 3
Median length: 3
Mean length: 2.1135831
Min length: 1
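
The mean length can be checked against the value counts for this variable: one value of length 3 (미분류, 951 rows) and two further values with 409 and 348 rows. A short sketch, assuming each of those two values is a single character, which is consistent with the reported minimum length of 1:

```python
# (length, count) pairs for the three distinct values of 성별.
# Assumption: the two values whose labels are not shown are one character each.
length_counts = [(3, 951), (1, 409), (1, 348)]

total_rows = sum(count for _, count in length_counts)
total_chars = sum(length * count for length, count in length_counts)
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 7))
```

Under that assumption the result agrees with the reported mean length of 2.1135831.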

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미분류 (unclassified)
2nd row: 미분류 (unclassified)
3rd row: 미분류 (unclassified)
4th row: 미분류 (unclassified)
5th row: 미분류 (unclassified)

Common Values

Value                  Count  Frequency (%)
미분류 (unclassified)  951    55.7%
(unlabeled)            409    23.9%
(unlabeled)            348    20.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values and their frequencies]

생년 (birth year)
Real number (ℝ)

MISSING

Distinct: 54
Distinct (%): 8.5%
Missing: 1069
Missing (%): 62.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1966.0501
Minimum: 1940
Maximum: 1996
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.1 KiB

Quantile statistics

Minimum: 1940
5-th percentile: 1950
Q1: 1959
Median: 1965
Q3: 1972
95-th percentile: 1986
Maximum: 1996
Range: 56
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 10.297335
Coefficient of variation (CV): 0.0052375751
Kurtosis: 0.010954117
Mean: 1966.0501
Median Absolute Deviation (MAD): 6
Skewness: 0.36210909
Sum: 1256306
Variance: 106.03511
Monotonicity: Not monotonic
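
Because 생년 is missing for most rows, its statistics describe only the non-missing values. A short sketch of the arithmetic using figures from this report; that the mean and distinct (%) use the non-missing count as denominator is an assumption inferred from the reported numbers:

```python
# Figures copied from the 생년 statistics in this report.
n_observations = 1708
n_missing = 1069
n_distinct = 54
value_sum = 1256306
minimum, q1, q3, maximum = 1940, 1959, 1972, 1996

n_present = n_observations - n_missing          # rows with a birth year
missing_pct = 100 * n_missing / n_observations  # share of rows without one
# Assumption: distinct (%) and the mean are taken over non-missing rows only.
distinct_pct = 100 * n_distinct / n_present
mean = value_sum / n_present
value_range = maximum - minimum
iqr = q3 - q1

print(n_present, round(missing_pct, 1), round(distinct_pct, 1))
print(round(mean, 4), value_range, iqr)
```

With 639 non-missing rows, this reproduces the reported missing share, distinct share, mean, range and interquartile range.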
[Figure: Histogram with fixed size bins (bins=50)]

Values sorted by frequency

Value              Count  Frequency (%)
1961               43     2.5%
1956               42     2.5%
1970               33     1.9%
1963               29     1.7%
1962               26     1.5%
1965               25     1.5%
1967               25     1.5%
1969               25     1.5%
1968               24     1.4%
1959               22     1.3%
Other values (44)  345    20.2%
(Missing)          1069   62.6%

Lowest values

Value  Count  Frequency (%)
1940   1      0.1%
1941   1      0.1%
1943   2      0.1%
1945   4      0.2%
1946   6      0.4%
1947   3      0.2%
1948   3      0.2%
1949   8      0.5%
1950   7      0.4%
1951   6      0.4%

Highest values

Value  Count  Frequency (%)
1996   1      0.1%
1995   2      0.1%
1993   3      0.2%
1992   2      0.1%
1991   4      0.2%
1990   4      0.2%
1989   2      0.1%
1988   4      0.2%
1987   3      0.2%
1986   10     0.6%

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]

            문학단체명  사업연도  신청분야  성별    생년
문학단체명  1.000       0.559     0.723     0.636   0.307
사업연도    0.559       1.000     0.539     0.507   0.171
신청분야    0.723       0.539     1.000     0.529   0.625
성별        0.636       0.507     0.529     1.000   0.201
생년        0.307       0.171     0.625     0.201   1.000

[Figure: correlation heatmap]

            문학단체명  성별
문학단체명  1.000       0.534
성별        0.534       1.000

[Figure: correlation heatmap]

            사업연도  생년    문학단체명  성별
사업연도    1.000     -0.009  0.350       0.395
생년        -0.009    1.000   0.129       0.121
문학단체명  0.350     0.129   1.000       0.534
성별        0.395     0.121   0.534       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

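First rows

Row   문학단체명  사업연도  신청분야     성별    생년
0     *악**원     2014      소설         미분류  <NA>
1     *악**원     2014      (unlabeled)  미분류  <NA>
2     *악**원     2014      소설         미분류  <NA>
3     *악**원     2014      희곡|소설    미분류  <NA>
4     *악**원     2014      소설         미분류  <NA>
5     *악**원     2014      소설         미분류  <NA>
6     *악**원     2014      소설         미분류  <NA>
7     *악**원     2014      소설         미분류  <NA>
8     *악**원     2014      소설         미분류  <NA>
9     *악**원     2014      소설         미분류  <NA>

Last rows

Row   문학단체명  사업연도  신청분야     성별         생년
1698  *악**원     2019      희곡         (unlabeled)  1974
1699  *악**원     2019      평론         (unlabeled)  1952
1700  *악**원     2019      (unlabeled)  (unlabeled)  1962
1701  *악**원     2019      희곡         (unlabeled)  1971
1702  *악**원     2019      아동문학     (unlabeled)  1963
1703  *악**원     2019      소설         (unlabeled)  1972
1704  *악**원     2019      평론         (unlabeled)  1974
1705  *악**원     2019      소설         (unlabeled)  1972
1706  *악**원     2019      소설         (unlabeled)  1972
1707  *악**원     2019      소설         (unlabeled)  1961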

Duplicate rows

Most frequently occurring

Row  문학단체명  사업연도  신청분야 / 성별  생년  # duplicates
0    *1**학     2014      미분류           <NA>  74
1    *1**학     2014      미분류           <NA>  74
159  *지**단    2014      소설 / 미분류    <NA>  43
166  *지**단    2015      소설 / 미분류    <NA>  40
94   *버**집    2018      미분류 / 미분류  <NA>  39
119  *을**집    2014      미분류           <NA>  33
101  *산**꽃    2015      소설 / 미분류    <NA>  31
125  *을**집    2015      미분류           <NA>  31
137  *을**집    2017      미분류           <NA>  31
172  *지**단    2016      소설 / 미분류    <NA>  31