gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	145
Missing cells	115
Missing cells (%)	15.9%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	6.1 KiB
Average record size in memory	42.9 B
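The missing-cell percentage follows directly from the counts above. A minimal sketch of the arithmetic, using only figures quoted in this report (the variable names are illustrative, not taken from the profiling tool):

```python
# Figures copied from the "Dataset statistics" table of this report.
n_variables = 5
n_observations = 145
missing_cells = 115

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells overall

# The Alerts section attributes all 115 missing cells to one column (중분류),
# so that column's own missing rate is much higher than the overall rate.
column_missing_pct = 100 * missing_cells / n_observations

print(f"{missing_pct:.1f}%")         # 15.9%
print(f"{column_missing_pct:.1f}%")  # 79.3%
```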

Variable types

Numeric	2
Text	3

Dataset

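Description	A dataset summarizing "User Perceptions of the Overhaul of Naver's Most-Viewed News," published in the Korea Press Foundation's Media Issue (2020, No. 6). See the website for details.
Author	Korea Press Foundation (한국언론진흥재단)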
URL	https://www.data.go.kr/data/15086565/fileData.do

Alerts

`중분류` has 115 (79.3%) missing values	Missing
`번호` has unique values	Unique
`대분류` has unique values	Unique

Reproduction

Analysis started	2023-12-12 21:44:34.580085
Analysis finished	2023-12-12 21:44:35.506185
Duration	0.93 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

번호
Real number (ℝ)

UNIQUE

Distinct	145
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	73

Minimum	1
Maximum	145
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.4 KiB

Quantile statistics

Minimum	1
5-th percentile	8.2
Q1	37
median	73
Q3	109
95-th percentile	137.8
Maximum	145
Range	144
Interquartile range (IQR)	72

Descriptive statistics

Standard deviation	42.001984
Coefficient of variation (CV)	0.57536964
Kurtosis	-1.2
Mean	73
Median Absolute Deviation (MAD)	36
Skewness	0
Sum	10585
Variance	1764.1667
Monotonicity	Strictly increasing
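These figures are what one would expect if 번호 is simply a row counter running from 1 to 145. The sketch below recomputes them with the Python standard library under that assumption; the `percentile` helper is a hypothetical stand-in that uses linear interpolation, and it is not claimed to be the profiling tool's own routine.

```python
import statistics

# Assumption: 번호 holds the integers 1..145 (145 distinct values, min 1, max 145).
values = list(range(1, 146))

def percentile(sorted_values, q):
    """Percentile with linear interpolation between neighbouring ranks."""
    pos = (len(sorted_values) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print("sum:", sum(values))
print("mean:", mean, "median:", median, "MAD:", mad)
print("variance:", round(variance, 4), "std:", round(std, 6), "CV:", round(cv, 8))
print("5%:", percentile(values, 0.05), "Q1:", percentile(values, 0.25),
      "Q3:", percentile(values, 0.75), "95%:", percentile(values, 0.95))
```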

Histogram with fixed size bins (bins=50) (figure not included)

Common values

Value	Count	Frequency (%)
1	1	0.7%
110	1	0.7%
94	1	0.7%
95	1	0.7%
96	1	0.7%
97	1	0.7%
98	1	0.7%
99	1	0.7%
100	1	0.7%
101	1	0.7%
Other values (135)	135	93.1%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.7%
2	1	0.7%
3	1	0.7%
4	1	0.7%
5	1	0.7%
6	1	0.7%
7	1	0.7%
8	1	0.7%
9	1	0.7%
10	1	0.7%

Maximum 10 values

Value	Count	Frequency (%)
145	1	0.7%
144	1	0.7%
143	1	0.7%
142	1	0.7%
141	1	0.7%
140	1	0.7%
139	1	0.7%
138	1	0.7%
137	1	0.7%
136	1	0.7%

대분류
Text

UNIQUE

Distinct	145
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

Length

Max length	44
Median length	26
Mean length	17.248276
Min length	3

Characters and Unicode

Total characters	2501
Distinct characters	129
Distinct categories	10
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
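As a rough illustration of how such category counts can be derived, the sketch below uses Python's standard `unicodedata` module. The `category_counts` helper and the `CATEGORY_NAMES` mapping are illustrative names, not part of the profiling tool; the sample string is one of the 대분류 values shown in the sample rows at the end of this report.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pi": "Initial Punctuation",
    "Pf": "Final Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sk": "Modifier Symbol",
    "Sm": "Math Symbol",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

sample = "월 평균 가구 소득8"
print(category_counts(sample))
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables that follow. The backtick is classified as a Modifier Symbol rather than as punctuation.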

Unique

Unique	145
Unique (%)	100.0%

Sample

1st row	성별1
2nd row	성별2
3rd row	연령1
4th row	연령2
5th row	연령3

Value	Count	Frequency (%)
‘네이버뉴스’가	30	4.9%
생각하는	30	4.9%
개편을	30	4.9%
뉴스	20	3.3%
댓글	16	2.6%
포털	16	2.6%
인터넷	16	2.6%
잘했다고	15	2.5%
대한	15	2.5%
못했다고	15	2.5%
Other values (166)	409	66.8%

Most occurring characters

Value	Count	Frequency (%)
(space)	467	18.7%
이	130	5.2%
스	110	4.4%
뉴	88	3.5%
네	62	2.5%
버	62	2.5%
개	55	2.2%
편	55	2.2%
‘	53	2.1%
1	50	2.0%
Other values (119)	1369	54.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1710	68.4%
Space Separator	467	18.7%
Decimal Number	167	6.7%
Initial Punctuation	53	2.1%
Final Punctuation	48	1.9%
Modifier Symbol	28	1.1%
Math Symbol	10	0.4%
Other Punctuation	8	0.3%
Close Punctuation	5	0.2%
Open Punctuation	5	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
이	130	7.6%
스	110	6.4%
뉴	88	5.1%
네	62	3.6%
버	62	3.6%
개	55	3.2%
편	55	3.2%
가	42	2.5%
는	40	2.3%
지	37	2.2%
Other values (100)	1029	60.2%

Decimal Number

Value	Count	Frequency (%)
1	50	29.9%
2	28	16.8%
3	27	16.2%
4	23	13.8%
5	15	9.0%
6	6	3.6%
8	5	3.0%
7	5	3.0%
0	4	2.4%
9	4	2.4%

Math Symbol

Value	Count	Frequency (%)
<	5	50.0%
>	5	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	467	100.0%

Initial Punctuation

Value	Count	Frequency (%)
‘	53	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	48	100.0%

Modifier Symbol

Value	Count	Frequency (%)
`	28	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	8	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	5	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1710	68.4%
Common	791	31.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
이	130	7.6%
스	110	6.4%
뉴	88	5.1%
네	62	3.6%
버	62	3.6%
개	55	3.2%
편	55	3.2%
가	42	2.5%
는	40	2.3%
지	37	2.2%
Other values (100)	1029	60.2%

Common

Value	Count	Frequency (%)
(space)	467	59.0%
‘	53	6.7%
1	50	6.3%
’	48	6.1%
2	28	3.5%
`	28	3.5%
3	27	3.4%
4	23	2.9%
5	15	1.9%
,	8	1.0%
Other values (9)	44	5.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1710	68.4%
ASCII	690	27.6%
Punctuation	101	4.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	467	67.7%
1	50	7.2%
2	28	4.1%
`	28	4.1%
3	27	3.9%
4	23	3.3%
5	15	2.2%
,	8	1.2%
6	6	0.9%
)	5	0.7%
Other values (7)	33	4.8%

Hangul

Value	Count	Frequency (%)
이	130	7.6%
스	110	6.4%
뉴	88	5.1%
네	62	3.6%
버	62	3.6%
개	55	3.2%
편	55	3.2%
가	42	2.5%
는	40	2.3%
지	37	2.2%
Other values (100)	1029	60.2%

Punctuation

Value	Count	Frequency (%)
‘	53	52.5%
’	48	47.5%

중분류
Text

MISSING

Distinct	30
Distinct (%)	100.0%
Missing	115
Missing (%)	79.3%
Memory size	1.3 KiB

Length

Max length	26
Median length	23
Mean length	20.7
Min length	13

Characters and Unicode

Total characters	621
Distinct characters	78
Distinct categories	6
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
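The block tables in this report can be approximated by looking up each code point in a table of ranges. The sketch below covers only the three blocks that appear here and assumes the short labels ASCII, Hangul and Punctuation correspond to the Basic Latin, Hangul Syllables and General Punctuation blocks; `BLOCKS` and `block_counts` are illustrative names.

```python
from collections import Counter

# (label, first code point, last code point); ranges from the Unicode block list.
# Only the blocks seen in this report are listed.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),        # Basic Latin
    ("Punctuation", 0x2000, 0x206F),  # General Punctuation (curly quotes live here)
    ("Hangul", 0xAC00, 0xD7AF),       # Hangul Syllables
]

def block_of(ch):
    """Return the block label for a character, or "Other" if it is not listed."""
    cp = ord(ch)
    for label, first, last in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"

def block_counts(text):
    """Count the characters of `text` by Unicode block."""
    return dict(Counter(block_of(ch) for ch in text))

print(block_counts("1) 다양한 기사 제공1"))   # a 중분류 sample value
print(block_counts("‘네이버뉴스’가"))          # a token from the 대분류 word table
```

The second call shows why 대분류 reports a separate Punctuation block: the curly quotes U+2018 and U+2019 sit outside the ASCII range.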

Unique

Unique	30
Unique (%)	100.0%

Sample

1st row	1) 다양한 기사 제공1
2nd row	1) 다양한 기사 제공2
3rd row	1) 다양한 기사 제공3
4th row	2) 성별, 세대별로 가르는 부작용1
5th row	2) 성별, 세대별로 가르는 부작용2

Value	Count	Frequency (%)
1	6	3.6%
성별	6	3.6%
경쟁	6	3.6%
클릭수(페이지뷰	6	3.6%
4	6	3.6%
3	6	3.6%
파악	6	3.6%
2	6	3.6%
더	6	3.6%
기사	6	3.6%
Other values (49)	108	64.3%

Most occurring characters

Value	Count	Frequency (%)
(space)	141	22.7%
)	36	5.8%
이	18	2.9%
2	16	2.6%
3	16	2.6%
1	16	2.6%
별	12	1.9%
다	12	1.9%
는	9	1.4%
개	9	1.4%
Other values (68)	336	54.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	372	59.9%
Space Separator	141	22.7%
Decimal Number	60	9.7%
Close Punctuation	36	5.8%
Open Punctuation	6	1.0%
Other Punctuation	6	1.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
이	18	4.8%
별	12	3.2%
다	12	3.2%
는	9	2.4%
개	9	2.4%
지	9	2.4%
하	9	2.4%
성	9	2.4%
함	9	2.4%
사	9	2.4%
Other values (59)	267	71.8%

Decimal Number

Value	Count	Frequency (%)
2	16	26.7%
3	16	26.7%
1	16	26.7%
5	6	10.0%
4	6	10.0%

Space Separator

Value	Count	Frequency (%)
(space)	141	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	36	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	6	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	372	59.9%
Common	249	40.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
이	18	4.8%
별	12	3.2%
다	12	3.2%
는	9	2.4%
개	9	2.4%
지	9	2.4%
하	9	2.4%
성	9	2.4%
함	9	2.4%
사	9	2.4%
Other values (59)	267	71.8%

Common

Value	Count	Frequency (%)
(space)	141	56.6%
)	36	14.5%
2	16	6.4%
3	16	6.4%
1	16	6.4%
(	6	2.4%
5	6	2.4%
4	6	2.4%
,	6	2.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	372	59.9%
ASCII	249	40.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	141	56.6%
)	36	14.5%
2	16	6.4%
3	16	6.4%
1	16	6.4%
(	6	2.4%
5	6	2.4%
4	6	2.4%
,	6	2.4%

Hangul

Value	Count	Frequency (%)
이	18	4.8%
별	12	3.2%
다	12	3.2%
는	9	2.4%
개	9	2.4%
지	9	2.4%
하	9	2.4%
성	9	2.4%
함	9	2.4%
사	9	2.4%
Other values (59)	267	71.8%

소분류
Text

Distinct	106
Distinct (%)	73.1%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

Length

Max length	46
Median length	36
Mean length	14.489655
Min length	3

Characters and Unicode

Total characters	2101
Distinct characters	228
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	91
Unique (%)	62.8%
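For 소분류 the Distinct and Unique counts differ (106 versus 91). This is consistent with Distinct counting the different values and Unique counting only the values that occur exactly once, which would mean 15 values repeat and together cover the remaining 54 rows. A minimal sketch of that distinction, using a hypothetical list of answers rather than rows from the dataset:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Hypothetical answers for illustration only.
answers = ["그렇다", "아니다", "무응답", "그렇다", "1) 서울", "아니다"]
print(distinct_and_unique(answers))  # (4, 2)
```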

Sample

1st row	남자1
2nd row	남자2
3rd row	1) 만20-29세
4th row	2) 만30-39세
5th row	3) 만40-49세

Value	Count	Frequency (%)
1	31	5.8%
3	31	5.8%
2	31	5.8%
4	18	3.4%
그렇다	14	2.6%
아니다	10	1.9%
무응답	10	1.9%
5	10	1.9%
종사자	8	1.5%
미만	7	1.3%
Other values (218)	366	68.3%

Most occurring characters

Value	Count	Frequency (%)
(space)	667	31.7%
)	149	7.1%
다	64	3.0%
1	45	2.1%
2	39	1.9%
3	38	1.8%
0	35	1.7%
이	33	1.6%
만	29	1.4%
4	24	1.1%
Other values (218)	978	46.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1011	48.1%
Space Separator	667	31.7%
Decimal Number	223	10.6%
Close Punctuation	149	7.1%
Dash Punctuation	14	0.7%
Math Symbol	10	0.5%
Other Punctuation	9	0.4%
Open Punctuation	6	0.3%
Initial Punctuation	4	0.2%
Final Punctuation	4	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
다	64	6.3%
이	33	3.3%
만	29	2.9%
지	21	2.1%
기	19	1.9%
스	19	1.9%
렇	18	1.8%
그	18	1.8%
원	17	1.7%
사	17	1.7%
Other values (197)	756	74.8%

Decimal Number

Value	Count	Frequency (%)
1	45	20.2%
2	39	17.5%
3	38	17.0%
0	35	15.7%
4	24	10.8%
5	16	7.2%
6	9	4.0%
9	7	3.1%
8	5	2.2%
7	5	2.2%

Math Symbol

Value	Count	Frequency (%)
>	5	50.0%
<	5	50.0%

Uppercase Letter

Value	Count	Frequency (%)
C	2	50.0%
P	2	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	667	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	149	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	14	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	9	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	6	100.0%

Initial Punctuation

Value	Count	Frequency (%)
‘	4	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1086	51.7%
Hangul	1011	48.1%
Latin	4	0.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
다	64	6.3%
이	33	3.3%
만	29	2.9%
지	21	2.1%
기	19	1.9%
스	19	1.9%
렇	18	1.8%
그	18	1.8%
원	17	1.7%
사	17	1.7%
Other values (197)	756	74.8%

Common

Value	Count	Frequency (%)
(space)	667	61.4%
)	149	13.7%
1	45	4.1%
2	39	3.6%
3	38	3.5%
0	35	3.2%
4	24	2.2%
5	16	1.5%
-	14	1.3%
6	9	0.8%
Other values (9)	50	4.6%

Latin

Value	Count	Frequency (%)
C	2	50.0%
P	2	50.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1082	51.5%
Hangul	1011	48.1%
Punctuation	8	0.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	667	61.6%
)	149	13.8%
1	45	4.2%
2	39	3.6%
3	38	3.5%
0	35	3.2%
4	24	2.2%
5	16	1.5%
-	14	1.3%
6	9	0.8%
Other values (9)	46	4.3%

Hangul

Value	Count	Frequency (%)
다	64	6.3%
이	33	3.3%
만	29	2.9%
지	21	2.1%
기	19	1.9%
스	19	1.9%
렇	18	1.8%
그	18	1.8%
원	17	1.7%
사	17	1.7%
Other values (197)	756	74.8%

Punctuation

Value	Count	Frequency (%)
‘	4	50.0%
’	4	50.0%

사례수
Real number (ℝ)

Distinct	119
Distinct (%)	82.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	259.22759

Minimum	1
Maximum	1202
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.4 KiB

Quantile statistics

Minimum	1
5-th percentile	23.2
Q1	69
median	161
Q3	379
95-th percentile	793.8
Maximum	1202
Range	1201
Interquartile range (IQR)	310

Descriptive statistics

Standard deviation	250.3703
Coefficient of variation (CV)	0.96583202
Kurtosis	1.1221037
Mean	259.22759
Median Absolute Deviation (MAD)	122
Skewness	1.2878763
Sum	37588
Variance	62685.288
Monotonicity	Not monotonic
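Several of the 사례수 figures are tied together by simple identities, so they can be cross-checked without the raw data. The sketch below does this with the values printed in this report; the `checks` dictionary is an illustrative construct, and the tolerances only allow for the rounding of the printed numbers.

```python
import math

# Figures copied from the 사례수 tables in this report.
n = 145
total = 37588
mean = 259.22759
std = 250.3703
variance = 62685.288
cv = 0.96583202
q1, q3, iqr = 69, 379, 310
minimum, maximum, value_range = 1, 1202, 1201

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-5),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "IQR = Q3 - Q1": q3 - q1 == iqr,
    "range = max - min": maximum - minimum == value_range,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

The mean (about 259) sits well above the median (161), which agrees with the positive skewness reported for this variable.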

Histogram with fixed size bins (bins=50) (figure not included)

Common values

Value	Count	Frequency (%)
889	5	3.4%
379	5	3.4%
39	3	2.1%
57	3	2.1%
116	3	2.1%
82	2	1.4%
224	2	1.4%
293	2	1.4%
34	2	1.4%
31	2	1.4%
Other values (109)	116	80.0%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.7%
5	1	0.7%
6	1	0.7%
19	1	0.7%
20	1	0.7%
21	1	0.7%
22	1	0.7%
23	1	0.7%
24	1	0.7%
26	2	1.4%

Maximum 10 values

Value	Count	Frequency (%)
1202	1	0.7%
889	5	3.4%
813	1	0.7%
795	1	0.7%
789	1	0.7%
745	1	0.7%
695	1	0.7%
692	1	0.7%
690	1	0.7%
666	1	0.7%

Interactions (plots of 번호 against 사례수; figures not included)
Correlations

Phik (φk)

	번호	중분류	사례수
번호	1.000	1.000	0.406
중분류	1.000	1.000	1.000
사례수	0.406	1.000	1.000

Auto

	번호	사례수
번호	1.000	-0.033
사례수	-0.033	1.000

Missing values (figures not included)

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

	번호	대분류	중분류	소분류	사례수
0	1	성별1	<NA>	남자1	613
1	2	성별2	<NA>	남자2	589
2	3	연령1	<NA>	1) 만20-29세	262
3	4	연령2	<NA>	2) 만30-39세	267
4	5	연령3	<NA>	3) 만40-49세	260
5	6	연령4	<NA>	4) 만50-59세	268
6	7	연령5	<NA>	5) 만60-69세	145
7	8	거주 지역1	<NA>	1) 서울	217
8	9	거주 지역2	<NA>	2) 부산	76
9	10	거주 지역3	<NA>	3) 대구	57

Last rows

	번호	대분류	중분류	소분류	사례수
135	136	월 평균 가구 소득8	<NA>	8) 800만원 이상	116
136	137	사회계층1	<NA>	1) 하층	115
137	138	사회계층2	<NA>	2) 중하층	450
138	139	사회계층3	<NA>	3) 중간층	561
139	140	사회계층4	<NA>	4) 중상층	76
140	141	정치적 성향1	<NA>	1) 보수	31
141	142	정치적 성향2	<NA>	2) 보수에 가까움	173
142	143	정치적 성향3	<NA>	3) 중도	692
143	144	정치적 성향4	<NA>	4) 진보에 가까움	280
144	145	정치적 성향5	<NA>	5) 진보	26
