Overview

Dataset statistics

Number of variables: 3
Number of observations: 454
Missing cells: 762
Missing cells (%): 55.9%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 11.7 KiB
Average record size in memory: 26.3 B
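
The missing-cell percentage can be checked from the counts above. A minimal sketch in Python; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics; names are illustrative.
n_variables = 3
n_observations = 454
missing_cells = 762

total_cells = n_variables * n_observations        # 3 * 454 = 1362 cells
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# Each variable reports 254 missing values, which accounts for every missing cell.
missing_per_variable = missing_cells // n_variables

print(f"{missing_pct:.1f}% missing, {missing_per_variable} per variable")
```

The result agrees with the reported 55.9% and with the 254 missing values listed for each variable.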

Variable types

Numeric: 2
Text: 1

Dataset

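Description: For each news field, the 200 nouns that appear most often in each month's coverage are extracted and provided with their rank and frequency. This is metadata derived from analyzing news from 54 newspapers and broadcasters in the news database "BIGKinds". More information is available at https://www.bigkinds.or.kr.
Author: Korea Press Foundation (한국언론진흥재단)
URL: https://www.data.go.kr/data/15065411/fileData.do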

Alerts

Dataset has 1 (0.2%) duplicate rows (Duplicates)
순위 is highly overall correlated with 빈도수 (High correlation)
빈도수 is highly overall correlated with 순위 (High correlation)
순위 has 254 (55.9%) missing values (Missing)
키워드 has 254 (55.9%) missing values (Missing)
빈도수 has 254 (55.9%) missing values (Missing)

Reproduction

Analysis started: 2024-03-14 12:56:21.081445
Analysis finished: 2024-03-14 12:56:23.245925
Duration: 2.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순위 (rank)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 200
Distinct (%): 100.0%
Missing: 254
Missing (%): 55.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.5
Minimum: 1
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10.95
Q1: 50.75
Median: 100.5
Q3: 150.25
95-th percentile: 190.05
Maximum: 200
Range: 199
Interquartile range (IQR): 99.5

Descriptive statistics

Standard deviation: 57.879185
Coefficient of variation (CV): 0.57591228
Kurtosis: -1.2
Mean: 100.5
Median Absolute Deviation (MAD): 50
Skewness: 0
Sum: 20100
Variance: 3350
Monotonicity: Strictly increasing
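
These figures are consistent with the non-missing 순위 values being the integers 1 through 200. The sketch below reproduces several of them with Python's standard `statistics` module. That the column holds exactly 1..200 is an assumption inferred from Distinct = 200, Minimum = 1 and Maximum = 200; the "inclusive" quantile method is chosen here because it matches the reported quartiles, not because the report states which method it used.

```python
import statistics

# Assumption: the 200 non-missing 순위 values are exactly the integers 1..200.
ranks = list(range(1, 201))

mean = statistics.mean(ranks)
variance = statistics.variance(ranks)   # sample variance (n - 1 denominator)
std = statistics.stdev(ranks)           # square root of the sample variance
cv = std / mean                         # coefficient of variation

# Quartiles with linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(ranks, n=4, method="inclusive")

print(sum(ranks), mean, variance, round(std, 6), round(cv, 8))
print(q1, median, q3, q3 - q1)
```

Under that assumption the sum, mean, variance, quartiles and IQR match the values listed above.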
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
139                     1  0.2%
129                     1  0.2%
130                     1  0.2%
131                     1  0.2%
132                     1  0.2%
133                     1  0.2%
134                     1  0.2%
135                     1  0.2%
136                     1  0.2%
137                     1  0.2%
Other values (190)    190  41.9%
(Missing)             254  55.9%

Value  Count  Frequency (%)
1          1  0.2%
2          1  0.2%
3          1  0.2%
4          1  0.2%
5          1  0.2%
6          1  0.2%
7          1  0.2%
8          1  0.2%
9          1  0.2%
10         1  0.2%

Value  Count  Frequency (%)
200        1  0.2%
199        1  0.2%
198        1  0.2%
197        1  0.2%
196        1  0.2%
195        1  0.2%
194        1  0.2%
193        1  0.2%
192        1  0.2%
191        1  0.2%

키워드 (keyword)
Text

MISSING 

Distinct: 200
Distinct (%): 100.0%
Missing: 254
Missing (%): 55.9%
Memory size: 3.7 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.54
Min length: 2

Characters and Unicode

Total characters: 708
Distinct characters: 251
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
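
As an illustration of those character properties, the sketch below looks up the Unicode general category of each character in two keywords that appear in this report, using Python's standard `unicodedata` module. It is not how the report itself was computed. Hangul syllables fall under "Lo" (Other Letter) and the underscore under "Pc" (Connector Punctuation), the two largest categories listed for this variable.

```python
import unicodedata
from collections import Counter

# Two keywords taken from the report's value and sample tables.
keywords = ["장관_후보자", "대통령"]

# Count characters by Unicode general category (e.g. "Lo", "Pc").
category_counts = Counter(
    unicodedata.category(ch) for word in keywords for ch in word
)

for category, count in category_counts.most_common():
    print(category, count)
```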

Unique

Unique: 200
Unique (%): 100.0%

Sample

1st row: 대통령
2nd row: 민주당
3rd row: 이재명
4th row: 위원장
5th row: 러시아
Value               Count  Frequency (%)
문재인                  1  0.5%
홍익표                  1  0.5%
선생님                  1  0.5%
장관_후보자             1  0.5%
유인촌                  1  0.5%
선거구                  1  0.5%
중소기업                1  0.5%
해임건의안              1  0.5%
보스토치니              1  0.5%
항공청                  1  0.5%
Other values (190)    190  95.0%

Most occurring characters

Value               Count  Frequency (%)
(Hangul)               19  2.7%
_                      17  2.4%
(Hangul)               16  2.3%
(Hangul)               16  2.3%
(Hangul)               14  2.0%
(Hangul)               14  2.0%
(Hangul)               13  1.8%
(Hangul)               12  1.7%
(Hangul)               12  1.7%
(Hangul)               11  1.6%
Other values (241)    564  79.7%

Most occurring categories

Value                  Count  Frequency (%)
Other Letter             674  95.2%
Connector Punctuation     17  2.4%
Uppercase Letter          10  1.4%
Lowercase Letter           4  0.6%
Decimal Number             2  0.3%
Other Punctuation          1  0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(Hangul)               19  2.8%
(Hangul)               16  2.4%
(Hangul)               16  2.4%
(Hangul)               14  2.1%
(Hangul)               14  2.1%
(Hangul)               13  1.9%
(Hangul)               12  1.8%
(Hangul)               12  1.8%
(Hangul)               11  1.6%
(Hangul)               11  1.6%
Other values (225)    536  79.5%

Uppercase Letter
Value  Count  Frequency (%)
S          3  30.0%
R          1  10.0%
G          1  10.0%
D          1  10.0%
O          1  10.0%
C          1  10.0%
P          1  10.0%
N          1  10.0%

Lowercase Letter
Value  Count  Frequency (%)
y          1  25.0%
t          1  25.0%
r          1  25.0%
a          1  25.0%

Decimal Number
Value  Count  Frequency (%)
0          1  50.0%
2          1  50.0%

Connector Punctuation
Value  Count  Frequency (%)
_         17  100.0%

Other Punctuation
Value  Count  Frequency (%)
&          1  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul    674  95.2%
Common     20  2.8%
Latin      14  2.0%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(Hangul)               19  2.8%
(Hangul)               16  2.4%
(Hangul)               16  2.4%
(Hangul)               14  2.1%
(Hangul)               14  2.1%
(Hangul)               13  1.9%
(Hangul)               12  1.8%
(Hangul)               12  1.8%
(Hangul)               11  1.6%
(Hangul)               11  1.6%
Other values (225)    536  79.5%

Latin
Value             Count  Frequency (%)
S                     3  21.4%
R                     1  7.1%
G                     1  7.1%
D                     1  7.1%
O                     1  7.1%
y                     1  7.1%
C                     1  7.1%
t                     1  7.1%
r                     1  7.1%
a                     1  7.1%
Other values (2)      2  14.3%

Common
Value  Count  Frequency (%)
_         17  85.0%
&          1  5.0%
0          1  5.0%
2          1  5.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul    674  95.2%
ASCII      34  4.8%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
(Hangul)               19  2.8%
(Hangul)               16  2.4%
(Hangul)               16  2.4%
(Hangul)               14  2.1%
(Hangul)               14  2.1%
(Hangul)               13  1.9%
(Hangul)               12  1.8%
(Hangul)               12  1.8%
(Hangul)               11  1.6%
(Hangul)               11  1.6%
Other values (225)    536  79.5%

ASCII
Value             Count  Frequency (%)
_                    17  50.0%
S                     3  8.8%
&                     1  2.9%
R                     1  2.9%
0                     1  2.9%
2                     1  2.9%
G                     1  2.9%
D                     1  2.9%
O                     1  2.9%
y                     1  2.9%
Other values (6)      6  17.6%

빈도수 (frequency)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 159
Distinct (%): 79.5%
Missing: 254
Missing (%): 55.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 608.43
Minimum: 151
Maximum: 13415
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 KiB

Quantile statistics

Minimum: 151
5-th percentile: 154
Q1: 189.25
Median: 297.5
Q3: 504
95-th percentile: 1345.9
Maximum: 13415
Range: 13264
Interquartile range (IQR): 314.75

Descriptive statistics

Standard deviation: 1308.8554
Coefficient of variation (CV): 2.1512013
Kurtosis: 55.326579
Mean: 608.43
Median Absolute Deviation (MAD): 127.5
Skewness: 6.8581703
Sum: 121686
Variance: 1713102.5
Monotonicity: Decreasing
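
Several of the 빈도수 figures are simple functions of the others, so they can be cross-checked by hand. A short sketch with the reported values copied in as constants; the variable names are illustrative:

```python
# Figures copied from the 빈도수 statistics; names are illustrative.
total = 121686                 # Sum
n_valid = 454 - 254            # observations minus missing values = 200
std = 1308.8554                # Standard deviation
q1, median, q3 = 189.25, 297.5, 504
minimum, maximum = 151, 13415

mean = total / n_valid         # arithmetic mean of the non-missing values
cv = std / mean                # coefficient of variation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(mean, round(cv, 4), iqr, value_range)
print("mean / median:", round(mean / median, 2))
```

The mean is roughly twice the median, which fits the strong right skew (6.86) reported for this variable: a few very frequent keywords pull the mean upward.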
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
152                     4  0.9%
207                     3  0.7%
151                     3  0.7%
182                     3  0.7%
159                     3  0.7%
171                     3  0.7%
526                     3  0.7%
190                     2  0.4%
195                     2  0.4%
198                     2  0.4%
Other values (149)    172  37.9%
(Missing)             254  55.9%

Value  Count  Frequency (%)
151        3  0.7%
152        4  0.9%
153        2  0.4%
154        2  0.4%
159        3  0.7%
161        2  0.4%
162        2  0.4%
163        2  0.4%
165        2  0.4%
166        1  0.2%

Value  Count  Frequency (%)
13415      1  0.2%
8429       1  0.2%
6821       1  0.2%
5449       1  0.2%
4921       1  0.2%
3141       1  0.2%
1984       1  0.2%
1806       1  0.2%
1755       1  0.2%
1515       1  0.2%

Interactions

[Figures: four pairwise interaction plots of 순위 and 빈도수]

Correlations

          순위     빈도수
순위      1.000    0.352
빈도수    0.352    1.000

          순위     빈도수
순위      1.000   -1.000
빈도수   -1.000    1.000
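
The labels that said which coefficient each matrix holds did not survive in this text. As a general illustration only, and not a claim about how the report computed either matrix, the sketch below shows how a coefficient of -1 arises when a rank correlation is applied to a strictly monotone relationship, while a linear (Pearson) coefficient on the same data is weaker. It uses the first ten rows of the report's sample table; the helper functions are written for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; assumes no ties, which holds for the rows used here."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# First ten rows of the sample table (순위, 빈도수).
rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freq = [13415, 8429, 6821, 5449, 4921, 3141, 1984, 1806, 1755, 1515]

r_pearson = pearson(rank, freq)
r_spearman = spearman(rank, freq)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

Because 빈도수 falls as 순위 rises in every one of these rows, the rank-based coefficient is -1, whereas the Pearson coefficient stays between -1 and 0 because the decline is not linear.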

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
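
A minimal sketch of the idea behind nullity correlation, using a small hypothetical table that mimics the pattern in this dataset, where a row is either fully populated or fully empty. The rows, helper names and the use of a plain Pearson coefficient on presence indicators are assumptions made for illustration.

```python
import math

# Hypothetical miniature of the profiled table: (순위, 키워드, 빈도수).
rows = [
    (1, "대통령", 13415),
    (2, "민주당", 8429),
    (None, None, None),
    (None, None, None),
    (None, None, None),
]

def presence(column_index):
    """Indicator vector: 1 where the cell is present, 0 where it is missing."""
    return [0 if row[column_index] is None else 1 for row in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Nullity correlation between the first and last columns.
nullity_corr = pearson(presence(0), presence(2))

# Rows that carry at least one value; fully empty rows are dropped.
populated = [row for row in rows if any(cell is not None for cell in row)]

print(round(nullity_corr, 3), len(populated))
```

When columns are always missing together, as the 254 fully empty rows in this report suggest, the nullity correlation between them is 1, and dropping the empty rows removes every missing cell at once.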

Sample

     순위   키워드          빈도수
0    1      대통령          13415
1    2      민주당          8429
2    3      이재명          6821
3    4      위원장          5449
4    5      러시아          4921
5    6      윤석열          3141
6    7      김정은          1984
7    8      대통령실        1806
8    9      더불어민주당    1755
9    10     본회의          1515

     순위   키워드   빈도수
444  <NA>   <NA>     <NA>
445  <NA>   <NA>     <NA>
446  <NA>   <NA>     <NA>
447  <NA>   <NA>     <NA>
448  <NA>   <NA>     <NA>
449  <NA>   <NA>     <NA>
450  <NA>   <NA>     <NA>
451  <NA>   <NA>     <NA>
452  <NA>   <NA>     <NA>
453  <NA>   <NA>     <NA>

Duplicate rows

Most frequently occurring

   순위   키워드   빈도수   # duplicates
0  <NA>   <NA>     <NA>     254