Overview

Dataset statistics

Number of variables: 3
Number of observations: 200
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.2 KiB
Average record size in memory: 26.6 B
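The missing-cell and duplicate-row counts above can be checked with a few lines of Python. This is a minimal standard-library sketch, not ydata-profiling's implementation. It assumes a missing cell is represented as None and counts a duplicate only for repeats after the first occurrence, which is the convention of pandas' DataFrame.duplicated() with its default keep="first". The helper names are hypothetical, and the records are the first three sample rows of this dataset.

```python
from collections import Counter


def missing_cells(rows):
    """Count cells that are None across all rows."""
    return sum(cell is None for row in rows for cell in row)


def duplicate_rows(rows):
    """Count rows that repeat an earlier identical row (first occurrence not counted)."""
    return sum(count - 1 for count in Counter(rows).values())


# The first three sample rows as (순위, 키워드, 빈도수) tuples.
records = [(1, "피해자", 2785), (2, "이재명", 2621), (3, "학부모", 2515)]
n_cells = len(records) * len(records[0])

print(missing_cells(records), 100 * missing_cells(records) / n_cells)  # 0 0.0
print(duplicate_rows(records))                                       # 0
```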

Variable types

Numeric: 2
Text: 1

Dataset

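Description: Metadata from an analysis of news from 54 newspapers and broadcasters in the news database "BIGKinds". For each field of coverage, the 200 nouns that appeared most often in each month are extracted and provided with their rank and frequency. More information is available at https://www.bigkinds.or.kr.
Author: 한국언론진흥재단 (Korea Press Foundation)
URL: https://www.data.go.kr/data/15065434/fileData.do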
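A ranked keyword table of this shape (rank, noun, frequency) can be derived from raw noun counts. The sketch below is an assumption about the general idea, not BIGKinds' actual pipeline, which the source does not describe. The helper top_nouns is hypothetical, and the input counts are five rows taken from this dataset's sample. Ranks are ordinal, so tied frequencies still receive consecutive ranks, as with 청소년 and 근로자 (ranks 9 and 10, both 1330).

```python
from collections import Counter


def top_nouns(noun_counts, n=200):
    """Return (rank, noun, frequency) tuples for the n most frequent nouns.

    Counter.most_common() orders equal counts by first insertion, so ties
    keep their input order here; how BIGKinds orders ties is not stated.
    """
    return [
        (rank, noun, freq)
        for rank, (noun, freq) in enumerate(noun_counts.most_common(n), start=1)
    ]


# Five (noun, frequency) pairs from the sample rows, deliberately unsorted.
counts = Counter({"학부모": 2515, "피해자": 2785, "청소년": 1330, "근로자": 1330, "이재명": 2621})

for row in top_nouns(counts, n=5):
    print(row)
```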

Alerts

순위 is highly overall correlated with 빈도수 (High correlation)
빈도수 is highly overall correlated with 순위 (High correlation)
순위 has unique values (Unique)
키워드 has unique values (Unique)

Reproduction

Analysis started: 2024-03-14 20:49:25.176299
Analysis finished: 2024-03-14 20:49:27.172124
Duration: 2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순위 (rank)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 200
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.5
Minimum: 1
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 10.95
Q1: 50.75
Median: 100.5
Q3: 150.25
95th percentile: 190.05
Maximum: 200
Range: 199
Interquartile range (IQR): 99.5
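Because 순위 is the integers 1 to 200 (200 distinct values, minimum 1, maximum 200, strictly increasing), its quantiles can be reproduced exactly. The sketch below assumes linear interpolation between order statistics, which is pandas' default for Series.quantile and what the standard library's statistics.quantiles(..., method="inclusive") computes. Under that assumption it reproduces every value in the table.

```python
import statistics

# 순위 is assumed to be exactly 1, 2, ..., 200.
ranks = list(range(1, 201))

# Cut points at 25% steps and at 5% steps, with linear interpolation.
q1, median, q3 = statistics.quantiles(ranks, n=4, method="inclusive")
ventiles = statistics.quantiles(ranks, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(p5, q1, median, q3, p95)            # 10.95 50.75 100.5 150.25 190.05
print(max(ranks) - min(ranks), q3 - q1)   # 199 99.5
```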

Descriptive statistics

Standard deviation: 57.879185
Coefficient of variation (CV): 0.57591228
Kurtosis: -1.2
Mean: 100.5
Median Absolute Deviation (MAD): 50
Skewness: 0
Sum: 20100
Variance: 3350
Monotonicity: Strictly increasing
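The descriptive statistics can be checked the same way. This sketch again assumes 순위 = 1..200. It uses the sample (n - 1) variance, which matches the reported 3350, and computes the median absolute deviation as the median of |x - median|. The helper monotonicity is hypothetical; "Strictly increasing" and "Decreasing" appear in this report, while the other label strings are assumed.

```python
import statistics

ranks = list(range(1, 201))

mean = statistics.mean(ranks)                  # 100.5
variance = statistics.variance(ranks)          # sample variance: 3350
std = statistics.stdev(ranks)                  # about 57.879185
cv = std / mean                                # about 0.57591228
med = statistics.median(ranks)
mad = statistics.median(abs(x - med) for x in ranks)   # 50.0


def monotonicity(values):
    """Classify a sequence as strictly/non-strictly increasing or decreasing."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(sum(ranks), monotonicity(ranks))         # 20100 Strictly increasing
```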
Histogram with fixed size bins (bins=50)
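"Fixed size bins" means the range from minimum to maximum is split into equal-width intervals; for 순위 each of the 50 bins is 199 / 50 = 3.98 wide. The sketch below follows the same convention as numpy.histogram (half-open bins, with the last bin closed so the maximum is counted); the report's own plotting code is not shown in the source, and the helper name is hypothetical.

```python
def fixed_size_histogram(values, bins=50):
    """Return (bin width, counts) for equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        index = int((x - lo) / width) if width else 0
        # Clamp so that x == max falls into the last (closed) bin.
        counts[min(index, bins - 1)] += 1
    return width, counts


width, counts = fixed_size_histogram(list(range(1, 201)))
print(width)   # 3.98
```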
Common values
Value                 Count  Frequency (%)
1                     1      0.5%
139                   1      0.5%
129                   1      0.5%
130                   1      0.5%
131                   1      0.5%
132                   1      0.5%
133                   1      0.5%
134                   1      0.5%
135                   1      0.5%
136                   1      0.5%
Other values (190)    190    95.0%
Minimum 10 values
Value  Count  Frequency (%)
1      1      0.5%
2      1      0.5%
3      1      0.5%
4      1      0.5%
5      1      0.5%
6      1      0.5%
7      1      0.5%
8      1      0.5%
9      1      0.5%
10     1      0.5%
Maximum 10 values
Value  Count  Frequency (%)
200    1      0.5%
199    1      0.5%
198    1      0.5%
197    1      0.5%
196    1      0.5%
195    1      0.5%
194    1      0.5%
193    1      0.5%
192    1      0.5%
191    1      0.5%
0.5%

키워드 (keyword)
Text

UNIQUE 

Distinct: 200
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.325
Min length: 3
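Mean length is total characters divided by the number of values (665 / 200 = 3.325). The sketch below computes the same four length statistics for the ten 키워드 values shown in this report's last sample rows; it only illustrates the definitions, since the full keyword list is not reproduced here. The helper length_stats is hypothetical, and length is measured in Unicode code points.

```python
import statistics


def length_stats(texts):
    """Max, median, mean and min string length (in code points)."""
    lengths = [len(t) for t in texts]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# The ten 키워드 values from the last sample rows (ranks 191 to 200).
sample = ["방심위", "서이초_교사", "공모전", "조직원", "보증금",
          "응급실", "사업주", "지원자", "병원장", "임단협"]

print(length_stats(sample))
print(665 / 200)   # 3.325, the reported mean length over all 200 keywords
```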

Characters and Unicode

Total characters: 665
Distinct characters: 243
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
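Python's unicodedata module exposes the General Category property behind the category tables in this section (Lo = Other Letter, Lu = Uppercase Letter, Pc = Connector Punctuation, Nd = Decimal Number). It has no Script or Block lookup, so the rough_script helper in this sketch is a hypothetical approximation based on character names, not the method ydata-profiling uses. The input strings are illustrative.

```python
import unicodedata
from collections import Counter

CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(texts):
    """Count characters by Unicode General Category, using readable names."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def rough_script(ch):
    """Approximate the script from the character name (an assumption, not the Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"


texts = ["서이초_교사", "SNS"]
print(category_counts(texts))
print([rough_script(ch) for ch in "피S_9"])
```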

Unique

Unique: 200
Unique (%): 100.0%
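Distinct and Unique measure different things: Distinct counts different values, while Unique (as this report appears to use it) counts values that occur exactly once. For 키워드 both are 200 because no keyword repeats. 빈도수 shows the difference, since it has only 157 distinct values across 200 rows. A minimal sketch using the 20 빈도수 values from the sample rows; the helper distinct_and_unique is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# 빈도수 of the first ten and last ten sample rows.
freq_sample = [2785, 2621, 2515, 2331, 1684, 1562, 1394, 1366, 1330, 1330,
               173, 171, 171, 169, 168, 166, 166, 165, 165, 164]

print(distinct_and_unique(freq_sample))   # (16, 12)
```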

Sample

1st row: 피해자
2nd row: 이재명
3rd row: 학부모
4th row: 장애인
5th row: 교육부
Value                 Count  Frequency (%)
피해자                1      0.5%
성범죄                1      0.5%
챌린지                1      0.5%
보건소                1      0.5%
공동체                1      0.5%
조우형                1      0.5%
플라스틱              1      0.5%
보호자                1      0.5%
기상청                1      0.5%
sns                   1      0.5%
Other values (190)    190    95.0%

Most occurring characters

Value                 Count  Frequency (%)
                      17     2.6%
                      16     2.4%
                      16     2.4%
                      16     2.4%
                      14     2.1%
                      13     2.0%
                      12     1.8%
                      12     1.8%
                      11     1.7%
                      10     1.5%
Other values (233)    528    79.4%

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           628    94.4%
Uppercase Letter       27     4.1%
Connector Punctuation  7      1.1%
Decimal Number         3      0.5%

Most frequent character per category

Other Letter
Value                 Count  Frequency (%)
                      17     2.7%
                      16     2.5%
                      16     2.5%
                      16     2.5%
                      14     2.2%
                      13     2.1%
                      12     1.9%
                      12     1.9%
                      11     1.8%
                      10     1.6%
Other values (216)    491    78.2%
Uppercase Letter
Value                 Count  Frequency (%)
S                     5      18.5%
C                     4      14.8%
T                     4      14.8%
E                     3      11.1%
B                     2      7.4%
N                     1      3.7%
G                     1      3.7%
R                     1      3.7%
O                     1      3.7%
D                     1      3.7%
Other values (4)      4      14.8%
Decimal Number
Value  Count  Frequency (%)
9      2      66.7%
1      1      33.3%
Connector Punctuation
Value  Count  Frequency (%)
_      7      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  628    94.4%
Latin   27     4.1%
Common  10     1.5%

Most frequent character per script

Hangul
Value                 Count  Frequency (%)
                      17     2.7%
                      16     2.5%
                      16     2.5%
                      16     2.5%
                      14     2.2%
                      13     2.1%
                      12     1.9%
                      12     1.9%
                      11     1.8%
                      10     1.6%
Other values (216)    491    78.2%
Latin
Value                 Count  Frequency (%)
S                     5      18.5%
C                     4      14.8%
T                     4      14.8%
E                     3      11.1%
B                     2      7.4%
N                     1      3.7%
G                     1      3.7%
R                     1      3.7%
O                     1      3.7%
D                     1      3.7%
Other values (4)      4      14.8%
Common
Value  Count  Frequency (%)
_      7      70.0%
9      2      20.0%
1      1      10.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  628    94.4%
ASCII   37     5.6%

Most frequent character per block

Hangul
Value                 Count  Frequency (%)
                      17     2.7%
                      16     2.5%
                      16     2.5%
                      16     2.5%
                      14     2.2%
                      13     2.1%
                      12     1.9%
                      12     1.9%
                      11     1.8%
                      10     1.6%
Other values (216)    491    78.2%
ASCII
Value                 Count  Frequency (%)
_                     7      18.9%
S                     5      13.5%
C                     4      10.8%
T                     4      10.8%
E                     3      8.1%
B                     2      5.4%
9                     2      5.4%
N                     1      2.7%
G                     1      2.7%
R                     1      2.7%
Other values (7)      7      18.9%

빈도수 (frequency)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 157
Distinct (%): 78.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446.04
Minimum: 164
Maximum: 2785
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 164
5th percentile: 173
Q1: 208
Median: 275
Q3: 481.75
95th percentile: 1200.8
Maximum: 2785
Range: 2621
Interquartile range (IQR): 273.75

Descriptive statistics

Standard deviation: 424.85425
Coefficient of variation (CV): 0.95250258
Kurtosis: 11.592418
Mean: 446.04
Median Absolute Deviation (MAD): 89
Skewness: 3.0898
Sum: 89208
Variance: 180501.13
Monotonicity: Decreasing
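Several of these figures are tied together by simple identities, so they can be cross-checked using only the reported values: Mean = Sum / n, CV = standard deviation / mean, Variance = standard deviation squared, Range = Maximum - Minimum, and IQR = Q3 - Q1. The full 빈도수 column is not reproduced in this report, so the sketch works from the summary numbers alone; because the standard deviation is itself rounded, the derived values agree only to the reported precision.

```python
import math

n = 200
total = 89208
std = 424.85425                       # reported standard deviation (rounded)

mean = total / n                      # reported: 446.04
cv = std / mean                       # reported: 0.95250258
variance = std ** 2                   # reported: 180501.13
value_range = 2785 - 164              # reported: 2621
iqr = 481.75 - 208                    # reported: 273.75

print(mean, round(cv, 8), round(variance, 2), value_range, iqr)
```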
Histogram with fixed size bins (bins=50)
Common values
Value                 Count  Frequency (%)
179                   4      2.0%
201                   4      2.0%
208                   4      2.0%
193                   3      1.5%
270                   3      1.5%
197                   3      1.5%
222                   2      1.0%
203                   2      1.0%
205                   2      1.0%
210                   2      1.0%
Other values (147)    171    85.5%
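The common-values table lists the ten most frequent values and folds the rest into one "Other values" row: here 10 listed values plus 147 others give the 157 distinct values, and 29 listed occurrences plus 171 others give the 200 rows. A minimal sketch of that layout; the helper common_values is hypothetical, and the order among equal counts follows Counter.most_common() rather than whatever rule the report uses.

```python
from collections import Counter


def common_values(values, top=10):
    """Rows of (value, count, percent): the `top` most common values plus an 'Other values' row."""
    n = len(values)
    counts = Counter(values)
    rows = [(str(v), c, round(100 * c / n, 1)) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    hidden_distinct = len(counts) - len(rows)
    if hidden_distinct:
        rest = n - shown
        rows.append((f"Other values ({hidden_distinct})", rest, round(100 * rest / n, 1)))
    return rows


for row in common_values([179, 179, 201, 208, 208, 208], top=1):
    print(row)
```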
Minimum 10 values
Value  Count  Frequency (%)
164    1      0.5%
165    2      1.0%
166    2      1.0%
168    1      0.5%
169    1      0.5%
171    2      1.0%
173    2      1.0%
174    2      1.0%
175    1      0.5%
176    1      0.5%
Maximum 10 values
Value  Count  Frequency (%)
2785   1      0.5%
2621   1      0.5%
2515   1      0.5%
2331   1      0.5%
1684   1      0.5%
1562   1      0.5%
1394   1      0.5%
1366   1      0.5%
1330   2      1.0%
1194   1      0.5%
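The minimum and maximum tables list the smallest and largest distinct values together with their counts (165 occurs twice, for example). The sketch applies heapq to the distinct values of the 20 빈도수 values from the sample rows; applied to the full column it should yield the two tables above, but only the sample is reproduced in this report. The helper extreme_values is hypothetical.

```python
import heapq
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values as (value, count) lists."""
    counts = Counter(values)
    smallest = [(v, counts[v]) for v in heapq.nsmallest(k, counts)]
    largest = [(v, counts[v]) for v in heapq.nlargest(k, counts)]
    return smallest, largest


# 빈도수 of the first ten and last ten sample rows.
freq_sample = [2785, 2621, 2515, 2331, 1684, 1562, 1394, 1366, 1330, 1330,
               173, 171, 171, 169, 168, 166, 166, 165, 165, 164]

low, high = extreme_values(freq_sample, k=3)
print(low)    # [(164, 1), (165, 2), (166, 2)]
print(high)   # [(2785, 1), (2621, 1), (2515, 1)]
```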

Interactions


Correlations

        순위    빈도수
순위    1.000   0.788
빈도수  0.788   1.000
        순위    빈도수
순위    1.000   -1.000
빈도수  -1.000  1.000
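The report does not label which coefficient each matrix shows. A rank-based coefficient such as Spearman's should land close to -1 for this pair, because 빈도수 never increases as 순위 increases. The sketch below is a plain-Python Spearman coefficient (the Pearson correlation of average ranks, so tied frequencies share a rank). It uses only the 20 sample rows, so its value need not match either matrix exactly, and the helper names are hypothetical.

```python
import statistics


def average_ranks(values):
    """1-based ranks; tied values get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# 순위 and 빈도수 of the first ten and last ten sample rows.
rank_sample = list(range(1, 11)) + list(range(191, 201))
freq_sample = [2785, 2621, 2515, 2331, 1684, 1562, 1394, 1366, 1330, 1330,
               173, 171, 171, 169, 168, 166, 166, 165, 165, 164]

rho = spearman(rank_sample, freq_sample)
print(round(rho, 4))   # -0.9985
```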

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
    순위  키워드  빈도수
0   1     피해자  2785
1   2     이재명  2621
2   3     학부모  2515
3   4     장애인  2331
4   5     교육부  1684
5   6     서비스  1562
6   7     민주당  1394
7   8     위원장  1366
8   9     청소년  1330
9   10    근로자  1330
Last rows
      순위  키워드       빈도수
190   191   방심위       173
191   192   서이초_교사  171
192   193   공모전       171
193   194   조직원       169
194   195   보증금       168
195   196   응급실       166
196   197   사업주       166
197   198   지원자       165
198   199   병원장       165
199   200   임단협       164