Overview

Dataset statistics

Number of variables: 3
Number of observations: 274
Missing cells: 222
Missing cells (%): 27.0%
Duplicate rows: 1
Duplicate rows (%): 0.4%
Total size in memory: 7.1 KiB
Average record size in memory: 26.5 B

Variable types

Numeric: 2
Text: 1

Dataset

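Description: Metadata from an analysis of news published by 54 newspapers and broadcasters, taken from the news database "BIGKinds". For each coverage category, the 200 nouns that appear most often each month are extracted and provided with their rank and frequency. More information is available at https://www.bigkinds.or.kr.
Author: 한국언론진흥재단 (Korea Press Foundation)
URL: https://www.data.go.kr/data/15068899/fileData.do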

Alerts

Dataset has 1 (0.4%) duplicate rows [Duplicates]
순위 is highly overall correlated with 빈도수 [High correlation]
빈도수 is highly overall correlated with 순위 [High correlation]
순위 has 74 (27.0%) missing values [Missing]
키워드 has 74 (27.0%) missing values [Missing]
빈도수 has 74 (27.0%) missing values [Missing]

Reproduction

Analysis started: 2024-03-14 13:57:21.704112
Analysis finished: 2024-03-14 13:57:23.315574
Duration: 1.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
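
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used (that is in config.json, which is not reproduced here). The CSV path, its encoding, and the report title are assumptions; the dataset metadata is copied from the Dataset section above. `ProfileReport` and `to_file` belong to ydata-profiling, and both pandas and ydata-profiling must be installed for the function to run.

```python
def build_report(csv_path, output_path="report.html", encoding="cp949"):
    """Profile a CSV export of the dataset and write an HTML report.

    csv_path and encoding are assumptions: Korean open-data CSV files are
    often CP949-encoded, but this particular file may differ.
    """
    # Imported lazily so the function can be defined even when the
    # optional dependencies are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="BIGKinds monthly keyword ranking",  # hypothetical title
        dataset={
            "description": "Monthly top-200 nouns by coverage category (BIGKinds)",
            "author": "한국언론진흥재단",
            "url": "https://www.data.go.kr/data/15068899/fileData.do",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("keywords.csv")` (a hypothetical file name) would write `report.html` next to the script.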

Variables

순위 (rank)
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 200
Distinct (%): 100.0%
Missing: 74
Missing (%): 27.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.5
Minimum: 1
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 10.95
Q1: 50.75
Median: 100.5
Q3: 150.25
95th percentile: 190.05
Maximum: 200
Range: 199
Interquartile range (IQR): 99.5

Descriptive statistics

Standard deviation: 57.879185
Coefficient of variation (CV): 0.57591228
Kurtosis: -1.2
Mean: 100.5
Median Absolute Deviation (MAD): 50
Skewness: 0
Sum: 20100
Variance: 3350
Monotonicity: Strictly increasing
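
Because 순위 has 200 distinct non-missing values with minimum 1 and maximum 200, it takes each integer from 1 to 200 exactly once, so the figures above can be checked by hand. The following is a small sketch using only Python's standard library. It assumes that the quantiles use linear interpolation between order statistics (the `inclusive` method of `statistics.quantiles`) and that variance and standard deviation use the sample (n - 1) form; these are assumptions about how the report computes them, chosen because they reproduce the reported numbers.

```python
import statistics

ranks = list(range(1, 201))  # 순위 values implied by the report: 1..200

mean = statistics.fmean(ranks)
variance = statistics.variance(ranks)   # sample variance (n - 1)
std_dev = statistics.stdev(ranks)       # sample standard deviation
cv = std_dev / mean                     # coefficient of variation

# Quartiles and 5th/95th percentiles with linear interpolation.
q1, median, q3 = statistics.quantiles(ranks, n=4, method="inclusive")
twentieths = statistics.quantiles(ranks, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

# Median absolute deviation around the median.
mad = statistics.median(abs(r - median) for r in ranks)

print(mean, variance, round(std_dev, 6), round(cv, 8))
print(p5, q1, median, q3, p95, q3 - q1, mad)
```

Under these assumptions the script yields the same mean (100.5), variance (3350), quartiles (50.75, 100.5, 150.25), IQR (99.5), and MAD (50) as the report.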
Figure: histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
139                 1      0.4%
129                 1      0.4%
130                 1      0.4%
131                 1      0.4%
132                 1      0.4%
133                 1      0.4%
134                 1      0.4%
135                 1      0.4%
136                 1      0.4%
137                 1      0.4%
Other values (190)  190    69.3%
(Missing)           74     27.0%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.4%
2      1      0.4%
3      1      0.4%
4      1      0.4%
5      1      0.4%
6      1      0.4%
7      1      0.4%
8      1      0.4%
9      1      0.4%
10     1      0.4%

Maximum 10 values

Value  Count  Frequency (%)
200    1      0.4%
199    1      0.4%
198    1      0.4%
197    1      0.4%
196    1      0.4%
195    1      0.4%
194    1      0.4%
193    1      0.4%
192    1      0.4%
191    1      0.4%

키워드 (keyword)
Text

Alerts: Missing

Distinct: 200
Distinct (%): 100.0%
Missing: 74
Missing (%): 27.0%
Memory size: 2.3 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.45
Min length: 3

Characters and Unicode

Total characters: 690
Distinct characters: 240
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
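
As an illustration of how such character properties can be tallied, here is a minimal sketch using Python's standard `unicodedata` module. It is not the profiler's own implementation. The general-category codes map to the names used in this report ('Lo' is Other Letter, 'Lu' Uppercase Letter, 'Nd' Decimal Number, 'Pc' Connector Punctuation, 'Po' Other Punctuation). The block tally is a simplification that only distinguishes ASCII from the Hangul Syllables block, and the input is limited to the five sample keywords shown under Sample.

```python
import unicodedata
from collections import Counter


def category_counts(words):
    """Count Unicode general categories (e.g. 'Lo', 'Lu') over all characters."""
    return Counter(unicodedata.category(ch) for word in words for ch in word)


def block_counts(words):
    """Rough block tally covering only the two blocks the report mentions."""
    counts = Counter()
    for word in words:
        for ch in word:
            code = ord(ch)
            if code < 0x80:
                counts["ASCII"] += 1
            elif 0xAC00 <= code <= 0xD7AF:  # Hangul Syllables block
                counts["Hangul"] += 1
            else:
                counts["other"] += 1
    return counts


# The five sample keywords listed in the report's Sample section.
sample = ["서비스", "글로벌", "반도체", "에너지", "소비자"]
print(category_counts(sample))
print(block_counts(sample))
```

All fifteen characters in these five keywords are Hangul syllables, so both tallies contain a single entry; the Latin, digit, and punctuation counts in the report come from keywords not shown in the sample.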

Unique

Unique: 200
Unique (%): 100.0%

Sample

1st row: 서비스
2nd row: 글로벌
3rd row: 반도체
4th row: 에너지
5th row: 소비자

Value               Count  Frequency (%)
디지털              1      0.5%
연체율              1      0.5%
운반선              1      0.5%
금융위              1      0.5%
우크라이나          1      0.5%
환경부              1      0.5%
lng                 1      0.5%
오염수              1      0.5%
하나은행            1      0.5%
관광객              1      0.5%
Other values (190)  190    95.0%

Most occurring characters

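Value               Count  Frequency (%)
(character lost)    21     3.0%
(character lost)    20     2.9%
(character lost)    19     2.8%
(character lost)    11     1.6%
(character lost)    10     1.4%
(character lost)    9      1.3%
(character lost)    8      1.2%
(character lost)    8      1.2%
(character lost)    8      1.2%
(character lost)    8      1.2%
Other values (230)  568    82.3%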

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           625    90.6%
Uppercase Letter       60     8.7%
Connector Punctuation  2      0.3%
Decimal Number         2      0.3%
Other Punctuation      1      0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(character lost)    21     3.4%
(character lost)    20     3.2%
(character lost)    19     3.0%
(character lost)    11     1.8%
(character lost)    10     1.6%
(character lost)    9      1.4%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
Other values (205)  503    80.5%

Uppercase Letter

Value              Count  Frequency (%)
G                  5      8.3%
S                  5      8.3%
O                  5      8.3%
M                  4      6.7%
I                  4      6.7%
D                  4      6.7%
C                  3      5.0%
B                  3      5.0%
P                  3      5.0%
L                  3      5.0%
Other values (11)  21     35.0%

Decimal Number

Value  Count  Frequency (%)
9      1      50.0%
1      1      50.0%

Connector Punctuation

Value  Count  Frequency (%)
_      2      100.0%

Other Punctuation

Value  Count  Frequency (%)
&      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  625    90.6%
Latin   60     8.7%
Common  5      0.7%

Most frequent character per script

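Hangul

Value               Count  Frequency (%)
(character lost)    21     3.4%
(character lost)    20     3.2%
(character lost)    19     3.0%
(character lost)    11     1.8%
(character lost)    10     1.6%
(character lost)    9      1.4%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
Other values (205)  503    80.5%

Latin

Value              Count  Frequency (%)
G                  5      8.3%
S                  5      8.3%
O                  5      8.3%
M                  4      6.7%
I                  4      6.7%
D                  4      6.7%
C                  3      5.0%
B                  3      5.0%
P                  3      5.0%
L                  3      5.0%
Other values (11)  21     35.0%

Common

Value  Count  Frequency (%)
_      2      40.0%
9      1      20.0%
1      1      20.0%
&      1      20.0%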

Most occurring blocks

Value   Count  Frequency (%)
Hangul  625    90.6%
ASCII   65     9.4%

Most frequent character per block

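Hangul

Value               Count  Frequency (%)
(character lost)    21     3.4%
(character lost)    20     3.2%
(character lost)    19     3.0%
(character lost)    11     1.8%
(character lost)    10     1.6%
(character lost)    9      1.4%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
(character lost)    8      1.3%
Other values (205)  503    80.5%

ASCII

Value              Count  Frequency (%)
G                  5      7.7%
S                  5      7.7%
O                  5      7.7%
M                  4      6.2%
I                  4      6.2%
D                  4      6.2%
C                  3      4.6%
B                  3      4.6%
P                  3      4.6%
L                  3      4.6%
Other values (15)  26     40.0%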

빈도수 (frequency)
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 165
Distinct (%): 82.5%
Missing: 74
Missing (%): 27.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 614.07
Minimum: 219
Maximum: 6103
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 219
5th percentile: 225.95
Q1: 267
Median: 349.5
Q3: 535
95th percentile: 1990.9
Maximum: 6103
Range: 5884
Interquartile range (IQR): 268

Descriptive statistics

Standard deviation: 788.6471
Coefficient of variation (CV): 1.2842951
Kurtosis: 20.775606
Mean: 614.07
Median Absolute Deviation (MAD): 95
Skewness: 4.1426905
Sum: 122814
Variance: 621964.25
Monotonicity: Decreasing
Figure: histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
265                 5      1.8%
224                 4      1.5%
271                 3      1.1%
232                 3      1.1%
257                 2      0.7%
317                 2      0.7%
260                 2      0.7%
261                 2      0.7%
751                 2      0.7%
309                 2      0.7%
Other values (155)  173    63.1%
(Missing)           74     27.0%

Minimum 10 values

Value  Count  Frequency (%)
219    1      0.4%
220    2      0.7%
221    1      0.4%
224    4      1.5%
225    2      0.7%
226    1      0.4%
228    2      0.7%
229    1      0.4%
230    1      0.4%
231    1      0.4%

Maximum 10 values

Value  Count  Frequency (%)
6103   1      0.4%
5465   1      0.4%
3906   1      0.4%
3517   1      0.4%
3022   1      0.4%
2911   1      0.4%
2892   1      0.4%
2424   1      0.4%
2347   1      0.4%
2179   1      0.4%

Interactions

Figures: four interaction plots for the numeric variables 순위 and 빈도수.

Correlations

        순위    빈도수
순위    1.000   0.625
빈도수  0.625   1.000

        순위    빈도수
순위    1.000   -1.000
빈도수  -1.000  1.000
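
The value of -1.000 in the second matrix is consistent with a rank-based coefficient such as Spearman's, since 순위 strictly increases while 빈도수 decreases; the report text here does not name the coefficient behind either matrix, so this is an inference. The sketch below computes Pearson and Spearman correlations in plain Python on the ten rows shown in the Sample section only, so its output illustrates the technique and is not expected to match the full-dataset figures.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First ten rows from the report's Sample section.
rank = list(range(1, 11))
freq = [6103, 5465, 3906, 3517, 3022, 2911, 2892, 2424, 2347, 2179]

print(round(spearman(rank, freq), 3))
print(round(pearson(rank, freq), 3))
```

On these ten rows the frequencies fall strictly as the rank rises, so the Spearman coefficient is -1 up to floating-point rounding, while the Pearson coefficient is negative but weaker because the decline is not linear.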

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

     순위  키워드  빈도수
0    1     서비스  6103
1    2     글로벌  5465
2    3     반도체  3906
3    4     에너지  3517
4    5     소비자  3022
5    6     부동산  2911
6    7     전기차  2892
7    8     아파트  2424
8    9     자동차  2347
9    10    투자자  2179

Last rows

     순위  키워드  빈도수
264  <NA>  <NA>    <NA>
265  <NA>  <NA>    <NA>
266  <NA>  <NA>    <NA>
267  <NA>  <NA>    <NA>
268  <NA>  <NA>    <NA>
269  <NA>  <NA>    <NA>
270  <NA>  <NA>    <NA>
271  <NA>  <NA>    <NA>
272  <NA>  <NA>    <NA>
273  <NA>  <NA>    <NA>

Duplicate rows

Most frequently occurring

   순위  키워드  빈도수  # duplicates
0  <NA>  <NA>    <NA>    74
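
Taken together, the alerts and the table above indicate that the 74 missing entries per column are 74 completely empty rows: each column reports 74 missing values, and the only duplicated row is all <NA>. Below is a minimal sketch, in plain Python, of how such rows might be counted and dropped before analysis. The table is a synthetic stand-in with the same shape as the report (200 filled rows plus 74 empty ones); the keyword and frequency values are placeholders, not the real data.

```python
def is_empty(row):
    """True when every field of the row is missing (None)."""
    return all(value is None for value in row)


# Synthetic stand-in: (순위, 키워드, 빈도수) tuples with placeholder values.
filled = [(rank, f"keyword_{rank}", 1000 - rank) for rank in range(1, 201)]
empty = [(None, None, None)] * 74
rows = filled + empty

# Missing cells as a share of all cells (rows x 3 columns).
missing_cells = sum(value is None for row in rows for value in row)
total_cells = len(rows) * 3
missing_pct = 100 * missing_cells / total_cells

# Keep only rows that contain at least one value.
cleaned = [row for row in rows if not is_empty(row)]

print(missing_cells, round(missing_pct, 1), len(cleaned))
```

This reproduces the 222 missing cells (27.0%) from the dataset statistics and leaves 200 usable rows. With pandas, the same filtering would be `df.dropna(how="all")`.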