gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	181
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	12
Duplicate rows (%)	6.6%
Total size in memory	4.5 KiB
Average record size in memory	25.7 B

Variable types

Text	1
Numeric	1
Categorical	1

Dataset

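Description	Whether e-book/webzine services were offered under the "literary magazine publication" (문예지발간) support program, literature category of the Arts Promotion Fund open-call projects, 2014-2019
Author	Arts Council Korea (한국문화예술위원회)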
URL	https://www.data.go.kr/data/15076421/fileData.do

Alerts

Dataset has 12 (6.6%) duplicate rows	Duplicates
`사업연도` is highly overall correlated with `온라인문예지서비스추진여부`	High correlation
`온라인문예지서비스추진여부` is highly overall correlated with `사업연도`	High correlation

Reproduction

Analysis started	2023-12-12 02:35:36.334155
Analysis finished	2023-12-12 02:35:36.689179
Duration	0.36 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

문학단체명
Text

Distinct	62
Distinct (%)	34.3%
Missing	0
Missing (%)	0.0%
Memory size	1.5 KiB

Length

Max length	5
Median length	5
Mean length	5
Min length	5

Characters and Unicode

Total characters	905
Distinct characters	68
Distinct categories	3
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
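As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally general categories for the five sample values shown in this report. The `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced here, and this is not necessarily how ydata-profiling computes its own figures.

```python
import unicodedata
from collections import Counter

# Masked names copied from the report's sample rows.
samples = ["제*부", "국*회", "1*학", "학*네", "학*상"]

# Readable labels for the general-category codes that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(samples)
for name, count in counts.most_common():
    print(name, count)
```

The standard library exposes general categories but not scripts or blocks, so those two breakdowns would need a third-party package.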

Unique

Unique	19
Unique (%)	10.5%

Sample

1st row	제*부
2nd row	국*회
3rd row	1*학
4th row	학*네
5th row	학*상

Value	Count	Frequency (%)
국**회	45	24.9%
대**학	8	4.4%
제**부	5	2.8%
학**네	4	2.2%
학**상	4	2.2%
비**비	4	2.2%
학**사	4	2.2%
음**음	4	2.2%
년**작	3	1.7%
학**당	3	1.7%
Other values (52)	97	53.6%

Most occurring characters

Value	Count	Frequency (%)
*	543	60.0%
국	54	6.0%
회	48	5.3%
학	41	4.5%
사	17	1.9%
음	11	1.2%
대	10	1.1%
시	9	1.0%
서	9	1.0%
비	8	0.9%
Other values (58)	155	17.1%

Most occurring categories

Value	Count	Frequency (%)
Other Punctuation	543	60.0%
Other Letter	360	39.8%
Decimal Number	2	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
국	54	15.0%
회	48	13.3%
학	41	11.4%
사	17	4.7%
음	11	3.1%
대	10	2.8%
시	9	2.5%
서	9	2.5%
비	8	2.2%
린	7	1.9%
Other values (56)	146	40.6%

Other Punctuation

Value	Count	Frequency (%)
*	543	100.0%

Decimal Number

Value	Count	Frequency (%)
1	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	545	60.2%
Hangul	360	39.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
국	54	15.0%
회	48	13.3%
학	41	11.4%
사	17	4.7%
음	11	3.1%
대	10	2.8%
시	9	2.5%
서	9	2.5%
비	8	2.2%
린	7	1.9%
Other values (56)	146	40.6%

Common

Value	Count	Frequency (%)
*	543	99.6%
1	2	0.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	545	60.2%
Hangul	360	39.8%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
*	543	99.6%
1	2	0.4%

Hangul

Value	Count	Frequency (%)
국	54	15.0%
회	48	13.3%
학	41	11.4%
사	17	4.7%
음	11	3.1%
대	10	2.8%
시	9	2.5%
서	9	2.5%
비	8	2.2%
린	7	1.9%
Other values (56)	146	40.6%

사업연도
Real number (ℝ)

HIGH CORRELATION

Distinct	6
Distinct (%)	3.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2016.7624

Minimum	2014
Maximum	2019
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.7 KiB

Quantile statistics

Minimum	2014
5-th percentile	2014
Q1	2014
median	2018
Q3	2019
95-th percentile	2019
Maximum	2019
Range	5
Interquartile range (IQR)	5
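The quantile figures above can be cross-checked from the per-year counts that the report lists for this variable. The sketch below is only a sanity check with the standard `statistics` module; the variable names are illustrative, and the choice of linear interpolation (`method="inclusive"`) is an assumption about how the report derives its quantiles.

```python
import statistics

# Per-year row counts as listed in the report's value table for 사업연도.
year_counts = {2014: 51, 2015: 14, 2016: 6, 2017: 13, 2018: 50, 2019: 47}

# Expand the frequency table back into a sorted list of 181 observations.
years = sorted(y for y, n in year_counts.items() for _ in range(n))

# method="inclusive" interpolates linearly between order statistics.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
twentiles = statistics.quantiles(years, n=20, method="inclusive")
p5, p95 = twentiles[0], twentiles[-1]

value_range = max(years) - min(years)
iqr = q3 - q1
print(p5, q1, median, q3, p95, value_range, iqr)
```

Because half of the rows fall in 2014 or 2019, the quartiles sit at the extremes and the interquartile range equals the full range.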

Descriptive statistics

Standard deviation	2.0395867
Coefficient of variation (CV)	0.0010113173
Kurtosis	-1.5893876
Mean	2016.7624
Median Absolute Deviation (MAD)	1
Skewness	-0.35284142
Sum	365034
Variance	4.1599141
Monotonicity	Increasing
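The descriptive statistics can likewise be recomputed from the per-year counts that the report lists for this variable. This is a minimal sketch using the standard `statistics` module, with illustrative variable names; it uses the sample variance (n - 1 denominator), which is the convention that reproduces the reported value here.

```python
import statistics

# Per-year row counts as listed in the report's value table for 사업연도.
year_counts = {2014: 51, 2015: 14, 2016: 6, 2017: 13, 2018: 50, 2019: 47}

# Expand the frequency table back into a sorted list of 181 observations.
years = sorted(y for y, n in year_counts.items() for _ in range(n))

total = sum(years)
mean = statistics.mean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = statistics.stdev(years)          # square root of the sample variance
cv = std / mean                        # coefficient of variation

# Median absolute deviation: median distance from the median.
median = statistics.median(years)
mad = statistics.median(abs(y - median) for y in years)

print(total, round(mean, 4), round(variance, 7), round(std, 7), cv, mad)
```

The coefficient of variation is tiny only because the values are calendar years near 2000; it says little about how spread out the years are relative to the six-year window.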

[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value	Count	Frequency (%)
2014	51	28.2%
2018	50	27.6%
2019	47	26.0%
2015	14	7.7%
2017	13	7.2%
2016	6	3.3%

Minimum 10 values

Value	Count	Frequency (%)
2014	51	28.2%
2015	14	7.7%
2016	6	3.3%
2017	13	7.2%
2018	50	27.6%
2019	47	26.0%

Maximum 10 values

Value	Count	Frequency (%)
2019	47	26.0%
2018	50	27.6%
2017	13	7.2%
2016	6	3.3%
2015	14	7.7%
2014	51	28.2%

온라인문예지서비스추진여부
Categorical

HIGH CORRELATION

Distinct	4
Distinct (%)	2.2%
Missing	0
Missing (%)	0.0%
Memory size	1.5 KiB

<NA>	134
N	22
Y	17
미응답	8

Length

Max length	4
Median length	4
Mean length	3.3093923
Min length	1
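The length statistics above are reproduced only if `<NA>` is counted as a literal four-character string, which would also be consistent with the column reporting zero missing values. That reading is an inference from the numbers, not something the report states. A small sketch with illustrative variable names:

```python
import statistics

# Value counts as listed in the report for 온라인문예지서비스추진여부.
value_counts = {"<NA>": 134, "N": 22, "Y": 17, "미응답": 8}

# One string length per row, treating "<NA>" as ordinary text.
lengths = sorted(len(v) for v, n in value_counts.items() for _ in range(n))

mean_length = sum(lengths) / len(lengths)
median_length = statistics.median(lengths)
print(min(lengths), median_length, round(mean_length, 7), max(lengths))
```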

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	134	74.0%
N	22	12.2%
Y	17	9.4%
미응답	8	4.4%

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Interactions

[Figure: interaction plot, 사업연도 against 사업연도]

Correlations

Phik (φk)

	문학단체명	사업연도	온라인문예지서비스추진여부
문학단체명	1.000	0.000	0.324
사업연도	0.000	1.000	NaN
온라인문예지서비스추진여부	0.324	NaN	1.000

Auto

	사업연도	온라인문예지서비스추진여부
사업연도	1.000	1.000
온라인문예지서비스추진여부	1.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
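A per-column nullity count can be sketched in a few lines. The example below uses four rows copied from this report's first and last rows and compares a strict count with one that also treats the string `<NA>` as missing. The `nullity_by_column` helper and the `MISSING_MARKERS` set are hypothetical, and treating `<NA>` as a placeholder for missing data is an assumption about the source file.

```python
# A few rows copied from the report's first and last rows.
rows = [
    {"문학단체명": "제*부", "사업연도": 2014, "온라인문예지서비스추진여부": "<NA>"},
    {"문학단체명": "국*회", "사업연도": 2014, "온라인문예지서비스추진여부": "<NA>"},
    {"문학단체명": "서*망", "사업연도": 2019, "온라인문예지서비스추진여부": "N"},
    {"문학단체명": "와*시", "사업연도": 2019, "온라인문예지서비스추진여부": "미응답"},
]

# Assumption: these strings stand in for missing data in the source file.
MISSING_MARKERS = frozenset({"<NA>"})

def nullity_by_column(rows, markers=frozenset()):
    """Count values per column that are None or match a missing marker."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value is None or value in markers:
                counts[col] += 1
    return counts

strict = nullity_by_column(rows)                    # only None is missing
lenient = nullity_by_column(rows, MISSING_MARKERS)  # "<NA>" also missing
print(strict)
print(lenient)
```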

Sample

First rows

	문학단체명	사업연도	온라인문예지서비스추진여부
0	제*부	2014	<NA>
1	국*회	2014	<NA>
2	1*학	2014	<NA>
3	학*네	2014	<NA>
4	학*상	2014	<NA>
5	음*사	2014	<NA>
6	천*학	2014	<NA>
7	행*사	2014	<NA>
8	년*작	2014	<NA>
9	간*선	2014	<NA>

Last rows

	문학단체명	사업연도	온라인문예지서비스추진여부
171	서*망	2019	N
172	와*시	2019	미응답
173	국*회	2019	Y
174	시*아	2019	N
175	국*회	2019	Y
176	행*사	2019	미응답
177	국*연	2019	Y
178	대*학	2019	N
179	국*회	2019	미응답
180	천*학	2019	N

Duplicate rows

Most frequently occurring

	문학단체명	사업연도	온라인문예지서비스추진여부	# duplicates
0	국*회	2014	<NA>	10
3	국*회	2017	<NA>	10
4	국*회	2018	<NA>	10
2	국*회	2016	<NA>	5
6	국*회	2019	Y	4
1	국*회	2015	<NA>	2
5	국*회	2019	N	2
7	국*회	2019	미응답	2
8	대*학	2014	<NA>	2
9	대*학	2015	<NA>	2
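A table of this kind can be approximated by counting identical rows. The sketch below does so for the ten last rows shown in this report, using `collections.Counter`. It assumes that "# duplicates" means the number of times a repeated row occurs, which the report does not spell out, and the variable names are illustrative.

```python
from collections import Counter

# The ten last rows shown in the report, as (name, year, answer) tuples.
rows = [
    ("서*망", 2019, "N"),
    ("와*시", 2019, "미응답"),
    ("국*회", 2019, "Y"),
    ("시*아", 2019, "N"),
    ("국*회", 2019, "Y"),
    ("행*사", 2019, "미응답"),
    ("국*연", 2019, "Y"),
    ("대*학", 2019, "N"),
    ("국*회", 2019, "미응답"),
    ("천*학", 2019, "N"),
]

# Count how often each distinct row occurs, then keep the repeated ones.
occurrences = Counter(rows)
duplicated = {row: n for row, n in occurrences.items() if n > 1}

for row, n in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(*row, n, sep="\t")
```

Because organization names are masked, rows that look identical may belong to different organizations, so repeated rows here are not necessarily data-entry duplicates.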
