Overview

Dataset statistics

Number of variables: 7
Number of observations: 56
Missing cells: 33
Missing cells (%): 8.4%
Duplicate rows: 2
Duplicate rows (%): 3.6%
Total size in memory: 3.2 KiB
Average record size in memory: 59.4 B
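
The two percentages above follow from the counts. The sketch below is illustrative only: it assumes the missing-cell share is taken over all cells (observations × variables) and the duplicate share over observations, which is consistent with the reported figures. The variable names are hypothetical.

```python
# Counts taken from the dataset statistics above.
n_variables = 7
n_observations = 56
missing_cells = 33
duplicate_rows = 2

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"{missing_pct:.1f}%")    # 8.4%
print(f"{duplicate_pct:.1f}%")  # 3.6%
```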

Variable types

Text: 1
Categorical: 2
Boolean: 3
DateTime: 1

Dataset

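Description: Overview (e.g., project type, project start date, project end date) of the "Literary Events and Research" support program in the literature category of the Culture and Arts Promotion Fund (문예진흥기금) open-call projects for 2014, 2015, 2018, and 2019
Author: Arts Council Korea (한국문화예술위원회)
URL: https://www.data.go.kr/data/15076463/fileData.do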

Alerts

Dataset has 2 (3.6%) duplicate rows [Duplicates]
사업유형_기타 is highly overall correlated with 사업종료일 [High correlation]
사업종료일 is highly overall correlated with 사업연도 and 1 other field [High correlation]
사업연도 is highly overall correlated with 사업종료일 [High correlation]
사업유형_행사 is highly overall correlated with 사업유형_연구조사 [High correlation]
사업유형_연구조사 is highly overall correlated with 사업유형_행사 [High correlation]
사업유형_기타 is highly imbalanced (87.1%) [Imbalance]
사업시작일 has 33 (58.9%) missing values [Missing]
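
The 87.1% imbalance figure for 사업유형_기타 can be reproduced from its value counts (False 55, True 1). The sketch below assumes the score is one minus the Shannon entropy normalized by the maximum entropy for the number of classes. The report itself does not state the formula, so treat this as an assumption that happens to match the reported number; `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Value counts of 사업유형_기타 as reported: False 55, True 1.
score = imbalance_score([55, 1])
print(f"{score:.1%}")  # 87.1%
```

A perfectly balanced boolean column scores 0 under this definition, and a column with a single dominant value approaches 1.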

Reproduction

Analysis started: 2023-12-12 13:03:26.243174
Analysis finished: 2023-12-12 13:03:26.811025
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

문학단체명
Text

Distinct: 28
Distinct (%): 50.0%
Missing: 0
Missing (%): 0.0%
Memory size: 580.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 280
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
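
As an illustration of how such character properties can be read, the sketch below tallies the Unicode general category of each character in a few masked names copied from the sample rows of this report. It uses Python's standard `unicodedata` module and is not the profiler's own implementation. The codes `Po`, `Lo`, and `Lu` correspond to Other Punctuation, Other Letter, and Uppercase Letter.

```python
import unicodedata
from collections import Counter

# Masked organisation names copied from the sample rows of this report.
names = ["*국**회", "*색**원", "*린**회", "*랑**회", "*B**회"]

# Normalize to NFC so each Hangul syllable is a single code point,
# then tally the general category of every character.
categories = Counter(
    unicodedata.category(ch)
    for name in names
    for ch in unicodedata.normalize("NFC", name)
)

for code, count in categories.most_common():
    print(code, count)
# Po 15
# Lo 9
# Lu 1
```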

Unique

Unique: 19
Unique (%): 33.9%

Sample

1st row: *국**회
2nd row: *색**원
3rd row: *린**회
4th row: *랑**회
5th row: *국**회
Value | Count | Frequency (%)
국**회 | 20 | 35.7%
동**회 | 3 | 5.4%
린**회 | 2 | 3.6%
우**터 | 2 | 3.6%
디**원 | 2 | 3.6%
주**의 | 2 | 3.6%
국**관 | 2 | 3.6%
국**의 | 2 | 3.6%
b**회 | 2 | 3.6%
학**사 | 1 | 1.8%
Other values (18) | 18 | 32.1%

Most occurring characters

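Value | Count | Frequency (%)
* | 168 | 60.0%
(Hangul character not preserved) | 33 | 11.8%
(Hangul character not preserved) | 24 | 8.6%
(Hangul character not preserved) | 4 | 1.4%
(Hangul character not preserved) | 3 | 1.1%
(Hangul character not preserved) | 3 | 1.1%
(Hangul character not preserved) | 3 | 1.1%
(Hangul character not preserved) | 3 | 1.1%
(Hangul character not preserved) | 3 | 1.1%
(Hangul character not preserved) | 3 | 1.1%
Other values (26) | 33 | 11.8%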

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 168 | 60.0%
Other Letter | 110 | 39.3%
Uppercase Letter | 2 | 0.7%

Most frequent character per category

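Other Letter
Value | Count | Frequency (%)
(Hangul character not preserved) | 33 | 30.0%
(Hangul character not preserved) | 24 | 21.8%
(Hangul character not preserved) | 4 | 3.6%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
Other values (24) | 28 | 25.5%

Other Punctuation
Value | Count | Frequency (%)
* | 168 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
B | 2 | 100.0%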

Most occurring scripts

Value | Count | Frequency (%)
Common | 168 | 60.0%
Hangul | 110 | 39.3%
Latin | 2 | 0.7%

Most frequent character per script

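Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 33 | 30.0%
(Hangul character not preserved) | 24 | 21.8%
(Hangul character not preserved) | 4 | 3.6%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
Other values (24) | 28 | 25.5%

Common
Value | Count | Frequency (%)
* | 168 | 100.0%

Latin
Value | Count | Frequency (%)
B | 2 | 100.0%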

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 170 | 60.7%
Hangul | 110 | 39.3%

Most frequent character per block

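ASCII
Value | Count | Frequency (%)
* | 168 | 98.8%
B | 2 | 1.2%

Hangul
Value | Count | Frequency (%)
(Hangul character not preserved) | 33 | 30.0%
(Hangul character not preserved) | 24 | 21.8%
(Hangul character not preserved) | 4 | 3.6%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
(Hangul character not preserved) | 3 | 2.7%
Other values (24) | 28 | 25.5%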

사업연도
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Memory size: 580.0 B
2014: 22
2018: 12
2015: 11
2019: 11

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2014
2nd row: 2014
3rd row: 2014
4th row: 2014
5th row: 2014

Common Values

Value | Count | Frequency (%)
2014 | 22 | 39.3%
2018 | 12 | 21.4%
2015 | 11 | 19.6%
2019 | 11 | 19.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

사업유형_행사
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 B

Value | Count | Frequency (%)
False | 30 | 53.6%
True | 26 | 46.4%

사업유형_연구조사
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 B

Value | Count | Frequency (%)
True | 32 | 57.1%
False | 24 | 42.9%

사업유형_기타
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 B

Value | Count | Frequency (%)
False | 55 | 98.2%
True | 1 | 1.8%

사업시작일
Date

MISSING 

Distinct: 17
Distinct (%): 73.9%
Missing: 33
Missing (%): 58.9%
Memory size: 580.0 B
Minimum: 2018-01-01 00:00:00
Maximum: 2019-06-29 00:00:00
[Figure: histogram with fixed size bins (bins=17)]
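
The two percentages for 사업시작일 use different denominators. The sketch below assumes the missing share is taken over all 56 rows and the distinct share over the 23 non-missing values, which is consistent with the reported 58.9% and 73.9%. The variable names are illustrative.

```python
n_rows = 56
n_missing = 33
n_distinct = 17

n_present = n_rows - n_missing           # 23 non-missing start dates
missing_pct = 100 * n_missing / n_rows   # share of rows without a start date
# Assumption: the distinct share uses non-missing values as its denominator.
distinct_pct = 100 * n_distinct / n_present

print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct")
# 58.9% missing, 73.9% distinct
```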

사업종료일
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 580.0 B
<NA>: 33
2018-12-31: 11
2019-12-31: 7
2018-09-16: 1
2020-01-31: 1
Other values (3): 3

Length

Max length: 10
Median length: 4
Mean length: 6.4642857
Min length: 4
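
These length statistics are consistent with the placeholder `<NA>` being counted as an ordinary 4-character string alongside the 10-character dates, which would also explain why this column reports 0 missing values. A short check using the counts from the Common Values table of this variable (illustrative, not the profiler's code):

```python
from statistics import mean, median

# Value counts for 사업종료일; "<NA>" is treated here as a literal 4-character string.
value_counts = {
    "<NA>": 33, "2018-12-31": 11, "2019-12-31": 7, "2018-09-16": 1,
    "2020-01-31": 1, "2019-11-16": 1, "2019-12-04": 1, "2019-06-29": 1,
}

# Expand the counts into one string length per row.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

print(len(lengths))             # 56
print(median(lengths))          # 4.0
print(round(mean(lengths), 7))  # 6.4642857
```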

Unique

Unique: 5
Unique (%): 8.9%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 33 | 58.9%
2018-12-31 | 11 | 19.6%
2019-12-31 | 7 | 12.5%
2018-09-16 | 1 | 1.8%
2020-01-31 | 1 | 1.8%
2019-11-16 | 1 | 1.8%
2019-12-04 | 1 | 1.8%
2019-06-29 | 1 | 1.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

Correlations

[Figure: correlation heatmap]

 | 문학단체명 | 사업연도 | 사업유형_행사 | 사업유형_연구조사 | 사업유형_기타 | 사업시작일 | 사업종료일
문학단체명 | 1.000 | 0.000 | 0.598 | 0.626 | 0.000 | 0.000 | 0.000
사업연도 | 0.000 | 1.000 | 0.413 | 0.478 | 0.000 | 1.000 | 1.000
사업유형_행사 | 0.598 | 0.413 | 1.000 | 0.959 | 0.000 | 0.000 | 0.000
사업유형_연구조사 | 0.626 | 0.478 | 0.959 | 1.000 | 0.000 | 0.000 | 0.000
사업유형_기타 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | NaN | NaN
사업시작일 | 0.000 | 1.000 | 0.000 | 0.000 | NaN | 1.000 | 0.754
사업종료일 | 0.000 | 1.000 | 0.000 | 0.000 | NaN | 0.754 | 1.000

[Figure: correlation heatmap]

 | 사업유형_연구조사 | 사업유형_기타 | 사업유형_행사 | 사업연도 | 사업종료일
사업유형_연구조사 | 1.000 | 0.000 | 0.818 | 0.316 | 0.000
사업유형_기타 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000
사업유형_행사 | 0.818 | 0.000 | 1.000 | 0.271 | 0.000
사업연도 | 0.316 | 0.000 | 0.271 | 1.000 | 0.873
사업종료일 | 0.000 | 1.000 | 0.000 | 0.873 | 1.000

[Figure: correlation heatmap]

 | 사업연도 | 사업유형_행사 | 사업유형_연구조사 | 사업유형_기타 | 사업종료일
사업연도 | 1.000 | 0.271 | 0.316 | 0.000 | 0.873
사업유형_행사 | 0.271 | 1.000 | 0.818 | 0.000 | 0.000
사업유형_연구조사 | 0.316 | 0.818 | 1.000 | 0.000 | 0.000
사업유형_기타 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
사업종료일 | 0.873 | 0.000 | 0.000 | 1.000 | 1.000

Missing values

[Figure: missing-value counts by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

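First rows

Index | 문학단체명 | 사업연도 | 사업유형_행사 | 사업유형_연구조사 | 사업유형_기타 | 사업시작일 | 사업종료일
0 | *국**회 | 2014 | N | Y | N | <NA> | <NA>
1 | *색**원 | 2014 | Y | N | N | <NA> | <NA>
2 | *린**회 | 2014 | Y | Y | N | <NA> | <NA>
3 | *랑**회 | 2014 | Y | N | N | <NA> | <NA>
4 | *국**회 | 2014 | N | Y | N | <NA> | <NA>
5 | *오**촌 | 2014 | Y | N | N | <NA> | <NA>
6 | *서**요 | 2014 | Y | Y | N | <NA> | <NA>
7 | *국**회 | 2014 | N | Y | N | <NA> | <NA>
8 | *우**터 | 2014 | N | Y | N | <NA> | <NA>
9 | *국**회 | 2014 | N | Y | N | <NA> | <NA>

Last rows

Index | 문학단체명 | 사업연도 | 사업유형_행사 | 사업유형_연구조사 | 사업유형_기타 | 사업시작일 | 사업종료일
46 | *국**회 | 2019 | Y | N | N | 2019-02-01 | 2020-01-31
47 | *B**회 | 2019 | N | Y | N | 2019-02-01 | 2019-11-16
48 | *림**회 | 2019 | N | Y | N | 2019-02-01 | 2019-12-31
49 | *주**의 | 2019 | N | Y | N | 2019-03-01 | 2019-12-31
50 | *학**실 | 2019 | Y | N | N | 2019-03-01 | 2019-12-31
51 | *린**대 | 2019 | N | Y | N | 2019-03-01 | 2019-12-31
52 | *동**회 | 2019 | Y | N | N | 2019-04-01 | 2019-12-04
53 | *국**의 | 2019 | Y | N | N | 2019-06-01 | 2019-12-31
54 | *디**원 | 2019 | Y | N | N | 2019-06-01 | 2019-12-31
55 | *지**션 | 2019 | Y | N | N | 2019-06-29 | 2019-06-29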

Duplicate rows

Most frequently occurring

Index | 문학단체명 | 사업연도 | 사업유형_행사 | 사업유형_연구조사 | 사업유형_기타 | 사업시작일 | 사업종료일 | # duplicates
0 | *국**회 | 2014 | N | Y | N | <NA> | <NA> | 6
1 | *국**회 | 2015 | N | Y | N | <NA> | <NA> | 4
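
Duplicate rows can be found by counting identical records. The sketch below does this with `collections.Counter` on only the ten first rows shown in the Sample section, so it finds 4 occurrences of the 2014 record rather than the 6 reported for the full dataset. It is an illustration, not the profiler's implementation, and `None` stands in for `<NA>`.

```python
from collections import Counter

# The ten first rows from the Sample section, one tuple per row (None = <NA>).
rows = [
    ("*국**회", "2014", "N", "Y", "N", None, None),
    ("*색**원", "2014", "Y", "N", "N", None, None),
    ("*린**회", "2014", "Y", "Y", "N", None, None),
    ("*랑**회", "2014", "Y", "N", "N", None, None),
    ("*국**회", "2014", "N", "Y", "N", None, None),
    ("*오**촌", "2014", "Y", "N", "N", None, None),
    ("*서**요", "2014", "Y", "Y", "N", None, None),
    ("*국**회", "2014", "N", "Y", "N", None, None),
    ("*우**터", "2014", "N", "Y", "N", None, None),
    ("*국**회", "2014", "N", "Y", "N", None, None),
]

# Count identical rows and keep only those that occur more than once.
counts = Counter(rows)
repeated = {row: n for row, n in counts.items() if n > 1}

for row, n in repeated.items():
    print(row[0], row[1], n)
# *국**회 2014 4
```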