Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 300 |
Missing cells | 2733 |
Missing cells (%) | 82.8% |
Duplicate rows | 4 |
Duplicate rows (%) | 1.3% |
Total size in memory | 27.7 KiB |
Average record size in memory | 94.4 B |
Variable types
Text | 4 |
---|---|
Numeric | 2 |
Categorical | 1 |
Unsupported | 4 |
Dataset
Description | Sample data (샘플 데이터) |
---|---|
Author | MBN |
URL | https://kdx.kr/data/view/1011 |
Dataset has 4 (1.3%) duplicate rows | Duplicates |
NEWS_NO is highly overall correlated with NWS_CN | High correlation |
NWS_CN is highly overall correlated with NEWS_NO | High correlation |
NWS_CN is highly imbalanced (85.7%) | Imbalance |
BDCT_NO has 112 (37.3%) missing values | Missing |
NEWS_CGR_CD has 281 (93.7%) missing values | Missing |
NEWS_NO has 280 (93.3%) missing values | Missing |
BDCT_DATE has 280 (93.3%) missing values | Missing |
BDCT_TIME has 290 (96.7%) missing values | Missing |
NWS_SJ has 290 (96.7%) missing values | Missing |
NWS_JRNL_NM has 300 (100.0%) missing values | Missing |
REG_DATE has 300 (100.0%) missing values | Missing |
MVP_CRS_NM has 300 (100.0%) missing values | Missing |
Unnamed: 10 has 300 (100.0%) missing values | Missing |
NWS_JRNL_NM is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
REG_DATE is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
MVP_CRS_NM is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
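The missing-cell total in the dataset statistics can be cross-checked against the per-variable counts in the alerts above. The following is a minimal sketch in plain Python; the counts are copied from the alerts, and NWS_CN is assumed to have 0 missing values because it has no "Missing" alert.

```python
# Per-variable missing counts copied from the alerts.
# NWS_CN is assumed to be 0 because no "Missing" alert is raised for it.
missing_by_column = {
    "BDCT_NO": 112,
    "NEWS_CGR_CD": 281,
    "NEWS_NO": 280,
    "BDCT_DATE": 280,
    "BDCT_TIME": 290,
    "NWS_SJ": 290,
    "NWS_CN": 0,
    "NWS_JRNL_NM": 300,
    "REG_DATE": 300,
    "MVP_CRS_NM": 300,
    "Unnamed: 10": 300,
}
n_rows = 300

total_cells = n_rows * len(missing_by_column)   # 300 rows x 11 variables
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")     # 2733 82.8%
```

Four of the eleven variables are entirely empty, which accounts for 1200 of the 2733 missing cells.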
Reproduction
Analysis started | 2023-12-11 21:32:38.163588 |
---|---|
Analysis finished | 2023-12-11 21:32:39.468076 |
Duration | 1.3 seconds |
Software version | ydata-profiling v4.5.1 |
BDCT_NO
Text
MISSING
 
Distinct | 174 |
---|---|
Distinct (%) | 92.6% |
Missing | 112 |
Missing (%) | 37.3% |
Memory size | 2.5 KiB |
Length
Max length | 118 |
---|---|
Median length | 68 |
Mean length | 41.148936 |
Min length | 6 |
Characters and Unicode
Total characters | 7736 |
---|---|
Distinct characters | 565 |
Distinct categories | 10 |
Distinct scripts | 3 |
Distinct blocks | 6 |
Unique
Unique | 170 |
---|---|
Unique (%) | 90.4% |
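The percentages for BDCT_NO appear to be taken over the 188 non-missing values (300 rows minus 112 missing) rather than over all 300 rows. A short check under that assumption, using only figures reported in this section:

```python
n_rows = 300
n_missing = 112
n_present = n_rows - n_missing        # 188 non-missing values

distinct = 174
unique = 170
total_characters = 7736

# All three ratios use the non-missing count as the denominator.
distinct_pct = 100 * distinct / n_present
unique_pct = 100 * unique / n_present
mean_length = total_characters / n_present

print(f"{distinct_pct:.1f}% {unique_pct:.1f}% {mean_length:.6f}")
# 92.6% 90.4% 41.148936
```

Dividing by 300 instead would give 58.0% distinct, which does not match the report.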
Sample
1st row | 1173098 |
---|---|
2nd row | 전국의 해돋이 명소에 수십만 명의 해맞이 인파가 몰렸습니다. 문재인 대통령도 2017년을 빛낸 의인 6명과 함께 북한산 해돋이 산행으로 집권 2년차 공식 일정을 시작했습니다. |
3rd row | ▶ "평창 대표단 파견 용의"…'핵 단추' 위협 |
4th row | 북한 김정은이 평창 올림픽에 대표단을 파견할 용의가 있다고 말했습니다. 하지만, 미국을 향해선 미 본토 전역이 사정권 안에 있다고 하는 등 위협을 이어갔습니다. |
5th row | ▶ 청와대 "북 대표단 파견·당국 대화 용의 환영" |
Value | Count | Frequency (%) |
(empty) | 74 | 4.2% |
▶ | 25 | 1.4% |
인터뷰 | 16 | 0.9% |
기자 | 16 | 0.9% |
북한 | 13 | 0.7% |
김정은 | 12 | 0.7% |
첫 | 10 | 0.6% |
【 | 9 | 0.5% |
】 | 9 | 0.5% |
김정은은 | 9 | 0.5% |
Other values (1187) | 1589 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 1905 | 24.6% |
다 | 174 | 2.2% |
. | 149 | 1.9% |
이 | 144 | 1.9% |
니 | 119 | 1.5% |
에 | 96 | 1.2% |
한 | 94 | 1.2% |
을 | 90 | 1.2% |
은 | 88 | 1.1% |
대 | 83 | 1.1% |
Other values (555) | 4794 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 5130 | |
Space Separator | 1905 | 24.6% |
Other Punctuation | 363 | 4.7% |
Decimal Number | 154 | 2.0% |
Lowercase Letter | 58 | 0.7% |
Uppercase Letter | 38 | 0.5% |
Other Symbol | 27 | 0.3% |
Dash Punctuation | 21 | 0.3% |
Open Punctuation | 20 | 0.3% |
Close Punctuation | 20 | 0.3% |
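The category names in this table are Unicode general categories. As an illustration, the sketch below classifies the characters of the "3rd row" sample value with Python's standard `unicodedata` module; the helper name `category_counts` and the small name mapping are illustrative and not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in the sample below.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "So": "Other Symbol",
}

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        # Fall back to the two-letter code for categories not in the mapping.
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# The "3rd row" sample value of BDCT_NO.
sample = '▶ "평창 대표단 파견 용의"…\'핵 단추\' 위협'
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates a Korean-language text column.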
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
다 | 174 | 3.4% |
이 | 144 | 2.8% |
니 | 119 | 2.3% |
에 | 96 | 1.9% |
한 | 94 | 1.8% |
을 | 90 | 1.8% |
은 | 88 | 1.7% |
대 | 83 | 1.6% |
의 | 78 | 1.5% |
해 | 75 | 1.5% |
Other values (494) | 4089 |
Lowercase Letter
Value | Count | Frequency (%) |
o | 7 | |
m | 6 | |
n | 6 | |
e | 5 | |
r | 4 | 6.9% |
h | 4 | 6.9% |
c | 4 | 6.9% |
a | 4 | 6.9% |
k | 3 | 5.2% |
b | 3 | 5.2% |
Other values (8) | 12 |
Uppercase Letter
Value | Count | Frequency (%) |
N | 10 | |
M | 9 | |
B | 9 | |
S | 2 | 5.3% |
D | 1 | 2.6% |
C | 1 | 2.6% |
L | 1 | 2.6% |
U | 1 | 2.6% |
A | 1 | 2.6% |
E | 1 | 2.6% |
Other values (2) | 2 | 5.3% |
Other Punctuation
Value | Count | Frequency (%) |
. | 149 | |
" | 74 | |
, | 43 | 11.8% |
: | 32 | 8.8% |
' | 22 | 6.1% |
/ | 20 | 5.5% |
… | 11 | 3.0% |
@ | 4 | 1.1% |
· | 4 | 1.1% |
% | 2 | 0.6% |
Decimal Number
Value | Count | Frequency (%) |
1 | 48 | |
0 | 28 | |
3 | 19 | 12.3% |
2 | 17 | 11.0% |
7 | 15 | 9.7% |
6 | 6 | 3.9% |
5 | 6 | 3.9% |
9 | 5 | 3.2% |
8 | 5 | 3.2% |
4 | 5 | 3.2% |
Open Punctuation
Value | Count | Frequency (%) |
【 | 9 | |
( | 7 | |
[ | 4 |
Close Punctuation
Value | Count | Frequency (%) |
】 | 9 | |
) | 7 | |
] | 4 |
Other Symbol
Value | Count | Frequency (%) |
▶ | 25 | |
☎ | 2 | 7.4% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1905 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 21 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 5130 | |
Common | 2510 | |
Latin | 96 | 1.2% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
다 | 174 | 3.4% |
이 | 144 | 2.8% |
니 | 119 | 2.3% |
에 | 96 | 1.9% |
한 | 94 | 1.8% |
을 | 90 | 1.8% |
은 | 88 | 1.7% |
대 | 83 | 1.6% |
의 | 78 | 1.5% |
해 | 75 | 1.5% |
Other values (494) | 4089 |
Common
Value | Count | Frequency (%) |
(space) | 1905 | |
. | 149 | 5.9% |
" | 74 | 2.9% |
1 | 48 | 1.9% |
, | 43 | 1.7% |
: | 32 | 1.3% |
0 | 28 | 1.1% |
▶ | 25 | 1.0% |
' | 22 | 0.9% |
- | 21 | 0.8% |
Other values (21) | 163 | 6.5% |
Latin
Value | Count | Frequency (%) |
N | 10 | 10.4% |
M | 9 | 9.4% |
B | 9 | 9.4% |
o | 7 | 7.3% |
m | 6 | 6.2% |
n | 6 | 6.2% |
e | 5 | 5.2% |
r | 4 | 4.2% |
h | 4 | 4.2% |
c | 4 | 4.2% |
Other values (20) | 32 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 5130 | |
ASCII | 2546 | |
Geometric Shapes | 25 | 0.3% |
None | 22 | 0.3% |
Punctuation | 11 | 0.1% |
Misc Symbols | 2 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 1905 | |
. | 149 | 5.9% |
" | 74 | 2.9% |
1 | 48 | 1.9% |
, | 43 | 1.7% |
: | 32 | 1.3% |
0 | 28 | 1.1% |
' | 22 | 0.9% |
- | 21 | 0.8% |
/ | 20 | 0.8% |
Other values (45) | 204 | 8.0% |
Hangul
Value | Count | Frequency (%) |
다 | 174 | 3.4% |
이 | 144 | 2.8% |
니 | 119 | 2.3% |
에 | 96 | 1.9% |
한 | 94 | 1.8% |
을 | 90 | 1.8% |
은 | 88 | 1.7% |
대 | 83 | 1.6% |
의 | 78 | 1.5% |
해 | 75 | 1.5% |
Other values (494) | 4089 |
Geometric Shapes
Value | Count | Frequency (%) |
▶ | 25 |
Punctuation
Value | Count | Frequency (%) |
… | 11 |
None
Value | Count | Frequency (%) |
【 | 9 | |
】 | 9 | |
· | 4 |
Misc Symbols
Value | Count | Frequency (%) |
☎ | 2 |
NEWS_CGR_CD
Text
MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 57.9% |
Missing | 281 |
Missing (%) | 93.7% |
Memory size | 2.5 KiB |
Value | Count | Frequency (%) |
mbn00006 | 8 | |
mbn00009 | 2 | 10.5% |
유호정 | 1 | 5.3% |
송주영 | 1 | 5.3% |
김한준 | 1 | 5.3% |
차민아 | 1 | 5.3% |
오태윤 | 1 | 5.3% |
주진희 | 1 | 5.3% |
이혁준 | 1 | 5.3% |
이정호 | 1 | 5.3% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 40 | |
m | 10 | 9.3% |
n | 10 | 9.3% |
b | 10 | 9.3% |
6 | 8 | 7.5% |
정 | 2 | 1.9% |
이 | 2 | 1.9% |
주 | 2 | 1.9% |
준 | 2 | 1.9% |
9 | 2 | 1.9% |
Other values (18) | 19 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 50 | |
Lowercase Letter | 30 | |
Other Letter | 27 |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
정 | 2 | 7.4% |
이 | 2 | 7.4% |
주 | 2 | 7.4% |
준 | 2 | 7.4% |
호 | 2 | 7.4% |
송 | 1 | 3.7% |
태 | 1 | 3.7% |
은 | 1 | 3.7% |
최 | 1 | 3.7% |
혁 | 1 | 3.7% |
Other values (12) | 12 |
Decimal Number
Value | Count | Frequency (%) |
0 | 40 | |
6 | 8 | 16.0% |
9 | 2 | 4.0% |
Lowercase Letter
Value | Count | Frequency (%) |
m | 10 | |
n | 10 | |
b | 10 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 50 | |
Latin | 30 | |
Hangul | 27 |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
정 | 2 | 7.4% |
이 | 2 | 7.4% |
주 | 2 | 7.4% |
준 | 2 | 7.4% |
호 | 2 | 7.4% |
송 | 1 | 3.7% |
태 | 1 | 3.7% |
은 | 1 | 3.7% |
최 | 1 | 3.7% |
혁 | 1 | 3.7% |
Other values (12) | 12 |
Common
Value | Count | Frequency (%) |
0 | 40 | |
6 | 8 | 16.0% |
9 | 2 | 4.0% |
Latin
Value | Count | Frequency (%) |
m | 10 | |
n | 10 | |
b | 10 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 80 | |
Hangul | 27 | 25.2% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 40 | |
m | 10 | 12.5% |
n | 10 | 12.5% |
b | 10 | 12.5% |
6 | 8 | 10.0% |
9 | 2 | 2.5% |
Hangul
Value | Count | Frequency (%) |
정 | 2 | 7.4% |
이 | 2 | 7.4% |
주 | 2 | 7.4% |
준 | 2 | 7.4% |
호 | 2 | 7.4% |
송 | 1 | 3.7% |
태 | 1 | 3.7% |
은 | 1 | 3.7% |
최 | 1 | 3.7% |
혁 | 1 | 3.7% |
Other values (12) | 12 |
NEWS_NO
Real number (ℝ)
HIGH CORRELATION
  MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 55.0% |
Missing | 280 |
Missing (%) | 93.3% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 11802334 |
Minimum | 3424555 |
---|---|
Maximum | 20180101 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.8 KiB |
Quantile statistics
Minimum | 3424555 |
---|---|
5-th percentile | 3424556.9 |
Q1 | 3424566.2 |
median | 11802340 |
Q3 | 20180101 |
95-th percentile | 20180101 |
Maximum | 20180101 |
Range | 16755546 |
Interquartile range (IQR) | 16755535 |
Descriptive statistics
Standard deviation | 8595407.9 |
---|---|
Coefficient of variation (CV) | 0.72828037 |
Kurtosis | -2.2352941 |
Mean | 11802334 |
Median Absolute Deviation (MAD) | 8377760.5 |
Skewness | -1.819558 × 10⁻¹² |
Sum | 2.3604667 × 10⁸ |
Variance | 7.3881038 × 10¹³ |
Monotonicity | Not monotonic |
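The descriptive statistics for NEWS_NO can be reproduced from the 20 non-missing values listed in the frequency tables of this section: ten distinct values near 3424555 and ten repeats of 20180101. The sketch below uses Python's standard `statistics` module and assumes the sample (n - 1) standard deviation, which is consistent with the reported figure.

```python
import statistics

# Ten distinct small values plus ten repeats of 20180101, as listed in the
# frequency tables for NEWS_NO.
small = [3424555, 3424557, 3424558, 3424560, 3424561,
         3424568, 3424570, 3424576, 3424578, 3424580]
values = small + [20180101] * 10

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)      # sample standard deviation (n - 1)
cv = stdev / mean                     # coefficient of variation

print(total, round(mean), median, round(cv, 3))
```

Because the values form two equally sized clusters, the mean and median both fall midway between them, and neither is close to any actual observation.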
Value | Count | Frequency (%) |
20180101 | 10 | 3.3% |
3424555 | 1 | 0.3% |
3424570 | 1 | 0.3% |
3424576 | 1 | 0.3% |
3424561 | 1 | 0.3% |
3424578 | 1 | 0.3% |
3424557 | 1 | 0.3% |
3424580 | 1 | 0.3% |
3424558 | 1 | 0.3% |
3424568 | 1 | 0.3% |
(Missing) | 280 |
Value | Count | Frequency (%) |
3424555 | 1 | |
3424557 | 1 | |
3424558 | 1 | |
3424560 | 1 | |
3424561 | 1 | |
3424568 | 1 | |
3424570 | 1 | |
3424576 | 1 | |
3424578 | 1 | |
3424580 | 1 |
Value | Count | Frequency (%) |
20180101 | 10 | |
3424580 | 1 | 0.3% |
3424578 | 1 | 0.3% |
3424576 | 1 | 0.3% |
3424570 | 1 | 0.3% |
3424568 | 1 | 0.3% |
3424561 | 1 | 0.3% |
3424560 | 1 | 0.3% |
3424558 | 1 | 0.3% |
3424557 | 1 | 0.3% |
BDCT_DATE
Text
MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 55.0% |
Missing | 280 |
Missing (%) | 93.3% |
Memory size | 2.5 KiB |
Length
Max length | 82 |
---|---|
Median length | 45 |
Mean length | 45 |
Min length | 8 |
Characters and Unicode
Total characters | 900 |
---|---|
Distinct characters | 37 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 10 |
---|---|
Unique (%) | 50.0% |
Sample
1st row | 20180101 |
---|---|
2nd row | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173098 |
3rd row | 20180101 |
4th row | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173099 |
5th row | 20180101 |
Value | Count | Frequency (%) |
20180101 | 10 | |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173098 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173099 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173100 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173101 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173102 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173103 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173104 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173105 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173106 | 1 | 5.0% |
Most occurring characters
Value | Count | Frequency (%) |
t | 80 | 8.9% |
n | 80 | 8.9% |
1 | 59 | 6.6% |
0 | 51 | 5.7% |
e | 50 | 5.6% |
o | 50 | 5.6% |
c | 50 | 5.6% |
. | 40 | 4.4% |
/ | 40 | 4.4% |
w | 30 | 3.3% |
Other values (27) | 370 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 560 | |
Decimal Number | 170 | 18.9% |
Other Punctuation | 110 | 12.2% |
Connector Punctuation | 30 | 3.3% |
Math Symbol | 20 | 2.2% |
Uppercase Letter | 10 | 1.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
t | 80 | |
n | 80 | |
e | 50 | 8.9% |
o | 50 | 8.9% |
c | 50 | 8.9% |
w | 30 | 5.4% |
m | 30 | 5.4% |
s | 20 | 3.6% |
i | 20 | 3.6% |
d | 20 | 3.6% |
Other values (9) | 130 |
Decimal Number
Value | Count | Frequency (%) |
1 | 59 | |
0 | 51 | |
2 | 21 | 12.4% |
8 | 11 | 6.5% |
7 | 11 | 6.5% |
3 | 11 | 6.5% |
9 | 3 | 1.8% |
4 | 1 | 0.6% |
5 | 1 | 0.6% |
6 | 1 | 0.6% |
Other Punctuation
Value | Count | Frequency (%) |
. | 40 | |
/ | 40 | |
: | 10 | 9.1% |
? | 10 | 9.1% |
& | 10 | 9.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 30 |
Math Symbol
Value | Count | Frequency (%) |
= | 20 |
Uppercase Letter
Value | Count | Frequency (%) |
C | 10 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 570 | |
Common | 330 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
t | 80 | |
n | 80 | |
e | 50 | 8.8% |
o | 50 | 8.8% |
c | 50 | 8.8% |
w | 30 | 5.3% |
m | 30 | 5.3% |
s | 20 | 3.5% |
i | 20 | 3.5% |
d | 20 | 3.5% |
Other values (10) | 140 |
Common
Value | Count | Frequency (%) |
1 | 59 | |
0 | 51 | |
. | 40 | |
/ | 40 | |
_ | 30 | |
2 | 21 | 6.4% |
= | 20 | 6.1% |
8 | 11 | 3.3% |
7 | 11 | 3.3% |
3 | 11 | 3.3% |
Other values (7) | 36 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 900 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
t | 80 | 8.9% |
n | 80 | 8.9% |
1 | 59 | 6.6% |
0 | 51 | 5.7% |
e | 50 | 5.6% |
o | 50 | 5.6% |
c | 50 | 5.6% |
. | 40 | 4.4% |
/ | 40 | 4.4% |
w | 30 | 3.3% |
Other values (27) | 370 |
BDCT_TIME
Real number (ℝ)
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | 100.0% |
Missing | 290 |
Missing (%) | 96.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1935.9 |
Minimum | 1930 |
---|---|
Maximum | 1943 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.8 KiB |
Quantile statistics
Minimum | 1930 |
---|---|
5-th percentile | 1930.45 |
Q1 | 1932.5 |
median | 1935.5 |
Q3 | 1938.75 |
95-th percentile | 1942.1 |
Maximum | 1943 |
Range | 13 |
Interquartile range (IQR) | 6.25 |
Descriptive statistics
Standard deviation | 4.3320511 |
---|---|
Coefficient of variation (CV) | 0.0022377453 |
Kurtosis | -1.0159488 |
Mean | 1935.9 |
Median Absolute Deviation (MAD) | 3.5 |
Skewness | 0.23862779 |
Sum | 19359 |
Variance | 18.766667 |
Monotonicity | Strictly increasing |
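The quantile statistics for BDCT_TIME are consistent with linear interpolation between neighbouring ranks of the ten observed values. The sketch below is an illustrative re-computation; `quantile` is a hypothetical helper written for this example, not a function of the profiling tool.

```python
import statistics

# The ten non-missing BDCT_TIME values, already in ascending order.
values = [1930, 1931, 1932, 1934, 1935, 1936, 1938, 1939, 1941, 1943]

def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
median = statistics.median(values)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(v - median) for v in values)

print(q1, median, q3, q3 - q1, mad)   # 1932.5 1935.5 1938.75 6.25 3.5
```

The same helper gives approximately 1930.45 and 1942.1 for the 5th and 95th percentiles, matching the table.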
Value | Count | Frequency (%) |
1930 | 1 | 0.3% |
1931 | 1 | 0.3% |
1932 | 1 | 0.3% |
1934 | 1 | 0.3% |
1935 | 1 | 0.3% |
1936 | 1 | 0.3% |
1938 | 1 | 0.3% |
1939 | 1 | 0.3% |
1941 | 1 | 0.3% |
1943 | 1 | 0.3% |
(Missing) | 290 |
Value | Count | Frequency (%) |
1930 | 1 | |
1931 | 1 | |
1932 | 1 | |
1934 | 1 | |
1935 | 1 | |
1936 | 1 | |
1938 | 1 | |
1939 | 1 | |
1941 | 1 | |
1943 | 1 |
Value | Count | Frequency (%) |
1943 | 1 | |
1941 | 1 | |
1939 | 1 | |
1938 | 1 | |
1936 | 1 | |
1935 | 1 | |
1934 | 1 | |
1932 | 1 | |
1931 | 1 | |
1930 | 1 |
NWS_SJ
Text
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | 100.0% |
Missing | 290 |
Missing (%) | 96.7% |
Memory size | 2.5 KiB |
Length
Max length | 37 |
---|---|
Median length | 29 |
Mean length | 28.6 |
Min length | 22 |
Characters and Unicode
Total characters | 286 |
---|---|
Distinct characters | 124 |
Distinct categories | 5 |
Distinct scripts | 3 |
Distinct blocks | 4 |
Unique
Unique | 10 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 김주하 앵커가 전하는 1월 1일 MBN 뉴스8 주요뉴스 |
---|---|
2nd row | "황금 개띠의 해 첫 주인공은 나야 나" |
3rd row | 문 대통령, 의인들과 신년맞이 '해돋이 산행'…시민과 '깜짝 통화' |
4th row | 김정은, "평창 올림픽 참가 용의"…당국 대화 시사 |
5th row | 북 김정은 "미 전역이 사정권"…'핵 단추' 위협 |
Value | Count | Frequency (%) |
김정은 | 5 | 7.6% |
북 | 2 | 3.0% |
솔솔 | 1 | 1.5% |
사정권"…'핵 | 1 | 1.5% |
단추 | 1 | 1.5% |
위협 | 1 | 1.5% |
청와대 | 1 | 1.5% |
대화제의 | 1 | 1.5% |
환영"…여야 | 1 | 1.5% |
신년사 | 1 | 1.5% |
Other values (51) | 51 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 56 | 19.6% |
' | 16 | 5.6% |
" | 10 | 3.5% |
정 | 8 | 2.8% |
은 | 8 | 2.8% |
김 | 7 | 2.4% |
대 | 7 | 2.4% |
… | 7 | 2.4% |
이 | 5 | 1.7% |
의 | 5 | 1.7% |
Other values (114) | 157 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 182 | |
Space Separator | 56 | 19.6% |
Other Punctuation | 39 | 13.6% |
Uppercase Letter | 6 | 2.1% |
Decimal Number | 3 | 1.0% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
정 | 8 | 4.4% |
은 | 8 | 4.4% |
김 | 7 | 3.8% |
대 | 7 | 3.8% |
이 | 5 | 2.7% |
의 | 5 | 2.7% |
가 | 4 | 2.2% |
화 | 3 | 1.6% |
주 | 3 | 1.6% |
사 | 3 | 1.6% |
Other values (99) | 129 |
Other Punctuation
Value | Count | Frequency (%) |
' | 16 | |
" | 10 | |
… | 7 | |
, | 3 | 7.7% |
· | 2 | 5.1% |
? | 1 | 2.6% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 1 | |
E | 1 | |
U | 1 | |
N | 1 | |
B | 1 | |
M | 1 |
Decimal Number
Value | Count | Frequency (%) |
1 | 2 | |
8 | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 56 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 182 | |
Common | 98 | |
Latin | 6 | 2.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
정 | 8 | 4.4% |
은 | 8 | 4.4% |
김 | 7 | 3.8% |
대 | 7 | 3.8% |
이 | 5 | 2.7% |
의 | 5 | 2.7% |
가 | 4 | 2.2% |
화 | 3 | 1.6% |
주 | 3 | 1.6% |
사 | 3 | 1.6% |
Other values (99) | 129 |
Common
Value | Count | Frequency (%) |
(space) | 56 | |
' | 16 | 16.3% |
" | 10 | 10.2% |
… | 7 | 7.1% |
, | 3 | 3.1% |
1 | 2 | 2.0% |
· | 2 | 2.0% |
? | 1 | 1.0% |
8 | 1 | 1.0% |
Latin
Value | Count | Frequency (%) |
A | 1 | |
E | 1 | |
U | 1 | |
N | 1 | |
B | 1 | |
M | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 182 | |
ASCII | 95 | |
Punctuation | 7 | 2.4% |
None | 2 | 0.7% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 56 | |
' | 16 | 16.8% |
" | 10 | 10.5% |
, | 3 | 3.2% |
1 | 2 | 2.1% |
? | 1 | 1.1% |
A | 1 | 1.1% |
E | 1 | 1.1% |
U | 1 | 1.1% |
8 | 1 | 1.1% |
Other values (3) | 3 | 3.2% |
Hangul
Value | Count | Frequency (%) |
정 | 8 | 4.4% |
은 | 8 | 4.4% |
김 | 7 | 3.8% |
대 | 7 | 3.8% |
이 | 5 | 2.7% |
의 | 5 | 2.7% |
가 | 4 | 2.2% |
화 | 3 | 1.6% |
주 | 3 | 1.6% |
사 | 3 | 1.6% |
Other values (99) | 129 |
Punctuation
Value | Count | Frequency (%) |
… | 7 |
None
Value | Count | Frequency (%) |
· | 2 |
NWS_CN
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 3 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.5 KiB |
<NA> | |
---|---|
【 앵커멘트 】 | 9 |
▶ '무술년 첫 일출'…전국 해맞이 인파 북적 | 1 |
Length
Max length | 25 |
---|---|
Median length | 4 |
Mean length | 4.19 |
Min length | 4 |
Unique
Unique | 1 |
---|---|
Unique (%) | 0.3% |
Sample
1st row | <NA> |
---|---|
2nd row | ▶ '무술년 첫 일출'…전국 해맞이 인파 북적 |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 290 | |
【 앵커멘트 】 | 9 | 3.0% |
▶ '무술년 첫 일출'…전국 해맞이 인파 북적 | 1 | 0.3% |
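The 85.7% imbalance figure reported for NWS_CN can be reproduced from these three counts with a normalized-entropy score. That the alert is computed this way is an assumption made for this sketch, although the formula does yield the reported value; `imbalance_score` is an illustrative name.

```python
import math

# Value counts of NWS_CN from the "Common Values" table.
counts = {
    "<NA>": 290,
    "【 앵커멘트 】": 9,
    "▶ '무술년 첫 일출'…전국 해맞이 인파 북적": 1,
}

def imbalance_score(value_counts):
    """1 - (Shannon entropy / maximum entropy): 0 is balanced, 1 is a single class."""
    total = sum(value_counts)
    n_classes = len(value_counts)
    if n_classes < 2:
        return 0.0   # assumed convention for a single-class column
    entropy = -sum((c / total) * math.log2(c / total) for c in value_counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

score = imbalance_score(list(counts.values()))
print(f"{100 * score:.1f}%")   # 85.7%
```

Note that `<NA>` is counted here as an ordinary category value, which is consistent with the variable reporting 0 missing values.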
Common Values (Plot)
Value | Count | Frequency (%) |
na | 290 | |
【 | 9 | 2.8% |
앵커멘트 | 9 | 2.8% |
】 | 9 | 2.8% |
▶ | 1 | 0.3% |
무술년 | 1 | 0.3% |
첫 | 1 | 0.3% |
일출'…전국 | 1 | 0.3% |
해맞이 | 1 | 0.3% |
인파 | 1 | 0.3% |
NWS_JRNL_NM
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 300 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.8 KiB |
REG_DATE
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 300 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.8 KiB |
MVP_CRS_NM
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 300 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.8 KiB |
Unnamed: 10
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 300 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.8 KiB |
Variable | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN |
---|---|---|---|---|---|---|
NEWS_CGR_CD | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 |
NEWS_NO | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN |
BDCT_DATE | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN |
BDCT_TIME | 1.000 | NaN | NaN | 1.000 | 1.000 | NaN |
NWS_SJ | 1.000 | NaN | NaN | 1.000 | 1.000 | 1.000 |
NWS_CN | 0.000 | NaN | NaN | NaN | 1.000 | 1.000 |
Variable | NEWS_NO | BDCT_TIME | NWS_CN |
---|---|---|---|
NEWS_NO | 1.000 | 0.018 | 1.000 |
BDCT_TIME | 0.018 | 1.000 | 0.000 |
NWS_CN | 1.000 | 0.000 | 1.000 |
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
1 | 1173098 | mbn00006 | 3424555 | 20180101 | 1930 | 김주하 앵커가 전하는 1월 1일 MBN 뉴스8 주요뉴스 | ▶ '무술년 첫 일출'…전국 해맞이 인파 북적 | <NA> | <NA> | <NA> | <NA> |
2 | 전국의 해돋이 명소에 수십만 명의 해맞이 인파가 몰렸습니다. 문재인 대통령도 2017년을 빛낸 의인 6명과 함께 북한산 해돋이 산행으로 집권 2년차 공식 일정을 시작했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
4 | ▶ "평창 대표단 파견 용의"…'핵 단추' 위협 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
5 | 북한 김정은이 평창 올림픽에 대표단을 파견할 용의가 있다고 말했습니다. 하지만, 미국을 향해선 미 본토 전역이 사정권 안에 있다고 하는 등 위협을 이어갔습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
7 | ▶ 청와대 "북 대표단 파견·당국 대화 용의 환영" | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
8 | 청와대는 북한의 평창올림픽 참가 용의에, 환영의 뜻을 밝혔습니다. 여당도 긍정적으로 평가했지만, 자유한국당은 얄팍한 위장 평화 공세라며 반박했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
290 | 이 전 대통령은 원전 계약을 성사시키는 대신 국방분야 협력을 약속한 것 아니냐는 주장에 대해 "알지 못한다"며 "이면계약은 없었다"고 말했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
291 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
292 | 그러면서 "내가 이야기하면 폭로여서 이야기할 수 없다"며 "문재인 정부가 정신을 차리고 수습한다고 하니 잘 정리될 것"이라고 덧붙였습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
293 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
294 | ▶ 스탠딩 : 최은미 / 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
295 | - "잠시 주춤하는 듯했던 자유한국당도 기다렸다는 듯 국정조사를 촉구하고 나서며 당분간 논란은 계속될 것으로 보입니다. MBN뉴스 최은미입니다." [ cem@mbn.co.kr ] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
296 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
297 | 영상취재 : 정재성, 박상곤 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
298 | 영상편집 : 이소영 | 최은미 | 20180101 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173107 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
299 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Most frequently occurring
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | # duplicates |
---|---|---|---|---|---|---|---|---|
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 112 |
2 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9 |
0 | ▶ 인터뷰 : 김정은 / 북한 노동당 위원장 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 5 |
1 | ▶ 인터뷰 : 문재인 / 대통령 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2 |
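The headline figure of 4 duplicate rows (1.3% of 300) matches the number of distinct repeated row patterns in this table, not the number of rows those patterns cover. A minimal sketch of that distinction follows; the rows are hypothetical stand-ins that keep only the BDCT_NO value of each pattern, plus two made-up non-repeating rows.

```python
from collections import Counter

# Hypothetical stand-in rows: one BDCT_NO value per duplicated pattern,
# repeated by its reported "# duplicates" count, plus two invented unique rows.
rows = (
    [(None,)] * 112
    + [("【 기자 】",)] * 9
    + [("▶ 인터뷰 : 김정은 / 북한 노동당 위원장",)] * 5
    + [("▶ 인터뷰 : 문재인 / 대통령",)] * 2
    + [("unique row A",), ("unique row B",)]
)

pattern_counts = Counter(rows)
# Keep only the patterns that occur more than once.
duplicated = {row: n for row, n in pattern_counts.items() if n > 1}

print(len(duplicated), sum(duplicated.values()))   # 4 128
```

Read this way, 128 of the 300 rows belong to a repeated pattern, and 112 of those are rows in which every profiled column is empty.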