Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 298 |
Missing cells | 2713 |
Missing cells (%) | 82.8% |
Duplicate rows | 5 |
Duplicate rows (%) | 1.7% |
Total size in memory | 27.5 KiB |
Average record size in memory | 94.4 B |
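The percentages in this table can be recomputed from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all observations; the report does not state this, but the figures match under that assumption.

```python
# Counts copied from the "Dataset statistics" table above.
n_variables = 11
n_observations = 298
missing_cells = 2713
duplicate_rows = 5

# Assumption: percentages are relative to the full cell grid / row count.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total cells:    {total_cells}")
print(f"Missing cells:  {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```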
Variable types
Text | 4 |
---|---|
Numeric | 2 |
Categorical | 1 |
Unsupported | 4 |
Dataset
Description | Sample data |
---|---|
Author | MBN |
URL | https://kdx.kr/data/view/173 |
Dataset has 5 (1.7%) duplicate rows | Duplicates |
NEWS_NO is highly overall correlated with NWS_CN | High correlation |
BDCT_TIME is highly overall correlated with NWS_CN | High correlation |
NWS_CN is highly overall correlated with NEWS_NO and 1 other fields | High correlation |
NWS_CN is highly imbalanced (78.8%) | Imbalance |
BDCT_NO has 111 (37.2%) missing values | Missing |
NEWS_CGR_CD has 278 (93.3%) missing values | Missing |
NEWS_NO has 278 (93.3%) missing values | Missing |
BDCT_DATE has 278 (93.3%) missing values | Missing |
BDCT_TIME has 288 (96.6%) missing values | Missing |
NWS_SJ has 288 (96.6%) missing values | Missing |
NWS_JRNL_NM has 298 (100.0%) missing values | Missing |
REG_DATE has 298 (100.0%) missing values | Missing |
MVP_CRS_NM has 298 (100.0%) missing values | Missing |
Unnamed: 10 has 298 (100.0%) missing values | Missing |
NWS_JRNL_NM is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
REG_DATE is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
MVP_CRS_NM is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
Reproduction
Analysis started | 2023-12-11 21:11:43.772057 |
---|---|
Analysis finished | 2023-12-11 21:11:44.825614 |
Duration | 1.05 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
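The reported duration follows from the two timestamps. A minimal sketch using only the standard library, with the timestamp format string being an assumption read off the values above:

```python
from datetime import datetime

# Assumed format of the timestamps shown in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-11 21:11:43.772057", FMT)
finished = datetime.strptime("2023-12-11 21:11:44.825614", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```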
BDCT_NO
Text
MISSING
 
Distinct | 164 |
---|---|
Distinct (%) | 87.7% |
Missing | 111 |
Missing (%) | 37.2% |
Memory size | 2.5 KiB |
Length
Max length | 132 |
---|---|
Median length | 81 |
Mean length | 41.235294 |
Min length | 6 |
Characters and Unicode
Total characters | 7711 |
---|---|
Distinct characters | 552 |
Distinct categories | 10 |
Distinct scripts | 3 |
Distinct blocks | 7 |
Unique
Unique | 159 |
---|---|
Unique (%) | 85.0% |
Sample
1st row | 1144456 |
---|---|
2nd row | 박영수 특별검사팀은 새해 첫날부터 문형표 전 복지부 장관과 김 종 전 문체부 차관 등 '최순실 게이트'의 핵심 인물들을 줄소환했습니다. |
3rd row | 박영수 특검은 열심히 하겠다는 짧은 한마디로 새해 특검의 각오를 밝혔습니다. |
4th row | 한민용 기자입니다. |
5th row | 【 기자 】 |
Value | Count | Frequency (%) |
(blank) | 91 | 5.0% |
▶ | 24 | 1.3% |
기자 | 20 | 1.1% |
박근혜 | 16 | 0.9% |
전 | 16 | 0.9% |
김정은 | 12 | 0.7% |
인터뷰 | 11 | 0.6% |
김정은은 | 11 | 0.6% |
【 | 10 | 0.5% |
】 | 10 | 0.5% |
Other values (1148) | 1600 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 1904 | 24.7% |
다 | 148 | 1.9% |
. | 140 | 1.8% |
이 | 128 | 1.7% |
니 | 111 | 1.4% |
을 | 86 | 1.1% |
의 | 83 | 1.1% |
은 | 81 | 1.1% |
는 | 78 | 1.0% |
고 | 78 | 1.0% |
Other values (542) | 4874 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 5090 | |
Space Separator | 1904 | 24.7% |
Other Punctuation | 362 | 4.7% |
Decimal Number | 142 | 1.8% |
Uppercase Letter | 70 | 0.9% |
Lowercase Letter | 47 | 0.6% |
Other Symbol | 27 | 0.4% |
Dash Punctuation | 25 | 0.3% |
Close Punctuation | 22 | 0.3% |
Open Punctuation | 22 | 0.3% |
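The category names in this table correspond to Unicode general categories. As a rough illustration (not necessarily how the profiler computes them), Python's `unicodedata.category` returns the two-letter code for each character; the `LABELS` mapping and the `category_counts` helper below are hypothetical names introduced for this sketch.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general-category codes mapped to the labels used in
# the report. Only the ten categories listed in the table are covered.
LABELS = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "So": "Other Symbol",
    "Pd": "Dash Punctuation",
    "Pe": "Close Punctuation",
    "Ps": "Open Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters per general category, falling back to the raw code."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(LABELS.get(code, code) for code in codes)


# One of the sample values of BDCT_NO.
counts = category_counts("【 기자 】")
for label, n in counts.most_common():
    print(label, n)
```

Applied to every non-missing value of a column and summed, counts of this kind would give a table with the same shape as the one above.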
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
다 | 148 | 2.9% |
이 | 128 | 2.5% |
니 | 111 | 2.2% |
을 | 86 | 1.7% |
의 | 83 | 1.6% |
은 | 81 | 1.6% |
는 | 78 | 1.5% |
고 | 78 | 1.5% |
에 | 71 | 1.4% |
사 | 69 | 1.4% |
Other values (484) | 4157 |
Lowercase Letter
Value | Count | Frequency (%) |
n | 5 | |
m | 5 | |
a | 4 | 8.5% |
o | 4 | 8.5% |
c | 4 | 8.5% |
t | 3 | 6.4% |
b | 3 | 6.4% |
r | 3 | 6.4% |
g | 2 | 4.3% |
y | 2 | 4.3% |
Other values (9) | 12 |
Decimal Number
Value | Count | Frequency (%) |
4 | 36 | |
1 | 34 | |
3 | 13 | 9.2% |
0 | 13 | 9.2% |
2 | 13 | 9.2% |
6 | 10 | 7.0% |
5 | 10 | 7.0% |
7 | 8 | 5.6% |
8 | 4 | 2.8% |
9 | 1 | 0.7% |
Other Punctuation
Value | Count | Frequency (%) |
. | 140 | |
" | 63 | |
, | 54 | 14.9% |
: | 38 | 10.5% |
' | 30 | 8.3% |
/ | 25 | 6.9% |
… | 7 | 1.9% |
@ | 4 | 1.1% |
· | 1 | 0.3% |
Uppercase Letter
Value | Count | Frequency (%) |
N | 18 | |
B | 11 | |
M | 11 | |
S | 9 | |
C | 8 | |
Y | 8 | |
K | 2 | 2.9% |
D | 2 | 2.9% |
L | 1 | 1.4% |
Other Symbol
Value | Count | Frequency (%) |
▶ | 24 | |
☎ | 2 | 7.4% |
㎜ | 1 | 3.7% |
Close Punctuation
Value | Count | Frequency (%) |
】 | 10 | |
) | 8 | |
] | 4 | 18.2% |
Open Punctuation
Value | Count | Frequency (%) |
【 | 10 | |
( | 8 | |
[ | 4 | 18.2% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1904 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 25 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 5090 | |
Common | 2504 | |
Latin | 117 | 1.5% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
다 | 148 | 2.9% |
이 | 128 | 2.5% |
니 | 111 | 2.2% |
을 | 86 | 1.7% |
의 | 83 | 1.6% |
은 | 81 | 1.6% |
는 | 78 | 1.5% |
고 | 78 | 1.5% |
에 | 71 | 1.4% |
사 | 69 | 1.4% |
Other values (484) | 4157 |
Common
Value | Count | Frequency (%) |
(space) | 1904 | |
. | 140 | 5.6% |
" | 63 | 2.5% |
, | 54 | 2.2% |
: | 38 | 1.5% |
4 | 36 | 1.4% |
1 | 34 | 1.4% |
' | 30 | 1.2% |
- | 25 | 1.0% |
/ | 25 | 1.0% |
Other values (20) | 155 | 6.2% |
Latin
Value | Count | Frequency (%) |
N | 18 | |
B | 11 | 9.4% |
M | 11 | 9.4% |
S | 9 | 7.7% |
C | 8 | 6.8% |
Y | 8 | 6.8% |
n | 5 | 4.3% |
m | 5 | 4.3% |
a | 4 | 3.4% |
o | 4 | 3.4% |
Other values (18) | 34 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 5090 | |
ASCII | 2566 | |
Geometric Shapes | 24 | 0.3% |
None | 21 | 0.3% |
Punctuation | 7 | 0.1% |
Misc Symbols | 2 | < 0.1% |
CJK Compat | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 1904 | |
. | 140 | 5.5% |
" | 63 | 2.5% |
, | 54 | 2.1% |
: | 38 | 1.5% |
4 | 36 | 1.4% |
1 | 34 | 1.3% |
' | 30 | 1.2% |
- | 25 | 1.0% |
/ | 25 | 1.0% |
Other values (41) | 217 | 8.5% |
Hangul
Value | Count | Frequency (%) |
다 | 148 | 2.9% |
이 | 128 | 2.5% |
니 | 111 | 2.2% |
을 | 86 | 1.7% |
의 | 83 | 1.6% |
은 | 81 | 1.6% |
는 | 78 | 1.5% |
고 | 78 | 1.5% |
에 | 71 | 1.4% |
사 | 69 | 1.4% |
Other values (484) | 4157 |
Geometric Shapes
Value | Count | Frequency (%) |
▶ | 24 |
None
Value | Count | Frequency (%) |
】 | 10 | |
【 | 10 | |
· | 1 | 4.8% |
Punctuation
Value | Count | Frequency (%) |
… | 7 |
Misc Symbols
Value | Count | Frequency (%) |
☎ | 2 |
CJK Compat
Value | Count | Frequency (%) |
㎜ | 1 |
NEWS_CGR_CD
Text
MISSING
 
Distinct | 12 |
---|---|
Distinct (%) | 60.0% |
Missing | 278 |
Missing (%) | 93.3% |
Memory size | 2.5 KiB |
Value | Count | Frequency (%) |
mbn00006 | 7 | |
mbn00009 | 3 | |
한민용 | 1 | 5.0% |
전정인 | 1 | 5.0% |
노태현 | 1 | 5.0% |
박통일 | 1 | 5.0% |
윤지원 | 1 | 5.0% |
강영구 | 1 | 5.0% |
이재호 | 1 | 5.0% |
김건훈 | 1 | 5.0% |
Other values (2) | 2 | 10.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 40 | |
m | 10 | 9.1% |
n | 10 | 9.1% |
b | 10 | 9.1% |
6 | 7 | 6.4% |
9 | 3 | 2.7% |
이 | 2 | 1.8% |
훈 | 2 | 1.8% |
지 | 2 | 1.8% |
강 | 1 | 0.9% |
Other values (23) | 23 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 50 | |
Lowercase Letter | 30 | |
Other Letter | 30 |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
이 | 2 | 6.7% |
훈 | 2 | 6.7% |
지 | 2 | 6.7% |
강 | 1 | 3.3% |
영 | 1 | 3.3% |
구 | 1 | 3.3% |
건 | 1 | 3.3% |
재 | 1 | 3.3% |
호 | 1 | 3.3% |
김 | 1 | 3.3% |
Other values (17) | 17 |
Decimal Number
Value | Count | Frequency (%) |
0 | 40 | |
6 | 7 | 14.0% |
9 | 3 | 6.0% |
Lowercase Letter
Value | Count | Frequency (%) |
m | 10 | |
n | 10 | |
b | 10 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 50 | |
Latin | 30 | |
Hangul | 30 |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
이 | 2 | 6.7% |
훈 | 2 | 6.7% |
지 | 2 | 6.7% |
강 | 1 | 3.3% |
영 | 1 | 3.3% |
구 | 1 | 3.3% |
건 | 1 | 3.3% |
재 | 1 | 3.3% |
호 | 1 | 3.3% |
김 | 1 | 3.3% |
Other values (17) | 17 |
Common
Value | Count | Frequency (%) |
0 | 40 | |
6 | 7 | 14.0% |
9 | 3 | 6.0% |
Latin
Value | Count | Frequency (%) |
m | 10 | |
n | 10 | |
b | 10 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 80 | |
Hangul | 30 | 27.3% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 40 | |
m | 10 | 12.5% |
n | 10 | 12.5% |
b | 10 | 12.5% |
6 | 7 | 8.8% |
9 | 3 | 3.8% |
Hangul
Value | Count | Frequency (%) |
이 | 2 | 6.7% |
훈 | 2 | 6.7% |
지 | 2 | 6.7% |
강 | 1 | 3.3% |
영 | 1 | 3.3% |
구 | 1 | 3.3% |
건 | 1 | 3.3% |
재 | 1 | 3.3% |
호 | 1 | 3.3% |
김 | 1 | 3.3% |
Other values (17) | 17 |
NEWS_NO
Real number (ℝ)
HIGH CORRELATION
  MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 55.0% |
Missing | 278 |
Missing (%) | 93.3% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 11637877 |
Minimum | 3105642 |
---|---|
Maximum | 20170101 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.7 KiB |
Quantile statistics
Minimum | 3105642 |
---|---|
5-th percentile | 3105645.8 |
Q1 | 3105653.8 |
median | 11637880 |
Q3 | 20170101 |
95-th percentile | 20170101 |
Maximum | 20170101 |
Range | 17064459 |
Interquartile range (IQR) | 17064447 |
Descriptive statistics
Standard deviation | 8753877.4 |
---|---|
Coefficient of variation (CV) | 0.75218854 |
Kurtosis | -2.2352941 |
Mean | 11637877 |
Median Absolute Deviation (MAD) | 8532220.5 |
Skewness | -5.853214 × 10^-13 |
Sum | 2.3275753 × 10^8 |
Variance | 7.663037 × 10^13 |
Monotonicity | Not monotonic |
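The descriptive statistics can be checked against the 20 non-missing values listed in the frequency tables for this variable. The sketch below assumes the sample (n − 1) definition of variance and standard deviation, which reproduces the reported figures. Note that half of the values are 20170101, which looks like a YYYYMMDD date rather than a news number; this may indicate that some rows were parsed with shifted columns.

```python
import statistics

# The 20 non-missing NEWS_NO values, taken from the frequency tables.
values = [20170101] * 10 + [
    3105642, 3105646, 3105649, 3105652, 3105653,
    3105654, 3105655, 3105656, 3105657, 3105660,
]

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1)
std = statistics.stdev(values)          # sample standard deviation
cv = std / mean                         # coefficient of variation

print(f"Sum:      {total}")
print(f"Mean:     {mean:.1f}")
print(f"Median:   {median}")
print(f"Std:      {std:.1f}")
print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.6e}")
```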
Value | Count | Frequency (%) |
20170101 | 10 | 3.4% |
3105654 | 1 | 0.3% |
3105652 | 1 | 0.3% |
3105649 | 1 | 0.3% |
3105646 | 1 | 0.3% |
3105660 | 1 | 0.3% |
3105657 | 1 | 0.3% |
3105656 | 1 | 0.3% |
3105653 | 1 | 0.3% |
3105642 | 1 | 0.3% |
(Missing) | 278 |
Value | Count | Frequency (%) |
3105642 | 1 | |
3105646 | 1 | |
3105649 | 1 | |
3105652 | 1 | |
3105653 | 1 | |
3105654 | 1 | |
3105655 | 1 | |
3105656 | 1 | |
3105657 | 1 | |
3105660 | 1 |
Value | Count | Frequency (%) |
20170101 | 10 | |
3105660 | 1 | 0.3% |
3105657 | 1 | 0.3% |
3105656 | 1 | 0.3% |
3105655 | 1 | 0.3% |
3105654 | 1 | 0.3% |
3105653 | 1 | 0.3% |
3105652 | 1 | 0.3% |
3105649 | 1 | 0.3% |
3105646 | 1 | 0.3% |
BDCT_DATE
Text
MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 55.0% |
Missing | 278 |
Missing (%) | 93.3% |
Memory size | 2.5 KiB |
Length
Max length | 82 |
---|---|
Median length | 45 |
Mean length | 45 |
Min length | 8 |
Characters and Unicode
Total characters | 900 |
---|---|
Distinct characters | 37 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 10 |
---|---|
Unique (%) | 50.0% |
Sample
1st row | 20170101 |
---|---|
2nd row | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144456 |
3rd row | 20170101 |
4th row | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144457 |
5th row | 20170101 |
Value | Count | Frequency (%) |
20170101 | 10 | |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144456 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144457 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144458 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144459 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144460 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144461 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144462 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144463 | 1 | 5.0% |
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144464 | 1 | 5.0% |
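Half of the non-missing BDCT_DATE values are player URLs rather than dates, and the `content_id` query parameter of the first URL (1144456) equals the first sample value of BDCT_NO. A minimal sketch for extracting that identifier with the standard library, should it be useful for re-linking rows; treating `content_id` as a join key is an assumption, not something the report states.

```python
from urllib.parse import parse_qs, urlparse

# First URL from the BDCT_DATE sample rows.
url = (
    "http://www.mbn.co.kr/player/movieContents.mbn"
    "?content_cls_cd=20&content_id=1144456"
)

# parse_qs returns a dict mapping each parameter name to a list of values.
query = parse_qs(urlparse(url).query)
content_id = query["content_id"][0]
print(content_id)
```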
Most occurring characters
Value | Count | Frequency (%) |
t | 80 | 8.9% |
n | 80 | 8.9% |
1 | 51 | 5.7% |
e | 50 | 5.6% |
o | 50 | 5.6% |
c | 50 | 5.6% |
0 | 41 | 4.6% |
/ | 40 | 4.4% |
. | 40 | 4.4% |
4 | 31 | 3.4% |
Other values (27) | 387 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 560 | |
Decimal Number | 170 | 18.9% |
Other Punctuation | 110 | 12.2% |
Connector Punctuation | 30 | 3.3% |
Math Symbol | 20 | 2.2% |
Uppercase Letter | 10 | 1.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
t | 80 | |
n | 80 | |
e | 50 | 8.9% |
o | 50 | 8.9% |
c | 50 | 8.9% |
m | 30 | 5.4% |
w | 30 | 5.4% |
s | 20 | 3.6% |
d | 20 | 3.6% |
i | 20 | 3.6% |
Other values (9) | 130 |
Decimal Number
Value | Count | Frequency (%) |
1 | 51 | |
0 | 41 | |
4 | 31 | |
2 | 21 | |
7 | 11 | 6.5% |
6 | 7 | 4.1% |
5 | 5 | 2.9% |
8 | 1 | 0.6% |
9 | 1 | 0.6% |
3 | 1 | 0.6% |
Other Punctuation
Value | Count | Frequency (%) |
/ | 40 | |
. | 40 | |
: | 10 | 9.1% |
& | 10 | 9.1% |
? | 10 | 9.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 30 |
Math Symbol
Value | Count | Frequency (%) |
= | 20 |
Uppercase Letter
Value | Count | Frequency (%) |
C | 10 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 570 | |
Common | 330 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
t | 80 | |
n | 80 | |
e | 50 | 8.8% |
o | 50 | 8.8% |
c | 50 | 8.8% |
m | 30 | 5.3% |
w | 30 | 5.3% |
s | 20 | 3.5% |
d | 20 | 3.5% |
i | 20 | 3.5% |
Other values (10) | 140 |
Common
Value | Count | Frequency (%) |
1 | 51 | |
0 | 41 | |
/ | 40 | |
. | 40 | |
4 | 31 | |
_ | 30 | |
2 | 21 | |
= | 20 | 6.1% |
7 | 11 | 3.3% |
: | 10 | 3.0% |
Other values (7) | 35 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 900 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
t | 80 | 8.9% |
n | 80 | 8.9% |
1 | 51 | 5.7% |
e | 50 | 5.6% |
o | 50 | 5.6% |
c | 50 | 5.6% |
0 | 41 | 4.6% |
/ | 40 | 4.4% |
. | 40 | 4.4% |
4 | 31 | 3.4% |
Other values (27) | 387 |
BDCT_TIME
Real number (ℝ)
HIGH CORRELATION
  MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | 100.0% |
Missing | 288 |
Missing (%) | 96.6% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1939 |
Minimum | 1930 |
---|---|
Maximum | 1948 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.7 KiB |
Quantile statistics
Minimum | 1930 |
---|---|
5-th percentile | 1930.9 |
Q1 | 1934.5 |
median | 1939 |
Q3 | 1943.5 |
95-th percentile | 1947.1 |
Maximum | 1948 |
Range | 18 |
Interquartile range (IQR) | 9 |
Descriptive statistics
Standard deviation | 6.0553007 |
---|---|
Coefficient of variation (CV) | 0.0031228988 |
Kurtosis | -1.2 |
Mean | 1939 |
Median Absolute Deviation (MAD) | 5 |
Skewness | 0 |
Sum | 19390 |
Variance | 36.666667 |
Monotonicity | Strictly increasing |
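The quantile statistics are consistent with linear interpolation between order statistics, which is the default in NumPy and pandas. The report does not name the method, so this is an assumption; the `quantile` helper below is a hypothetical name for a sketch of that rule applied to the ten BDCT_TIME values.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)       # fractional index into the data
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# The ten non-missing values: 1930, 1932, ..., 1948.
values = list(range(1930, 1950, 2))

p5 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1

print(p5, q1, median, q3, p95, iqr)
```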
Value | Count | Frequency (%) |
1930 | 1 | 0.3% |
1932 | 1 | 0.3% |
1934 | 1 | 0.3% |
1936 | 1 | 0.3% |
1938 | 1 | 0.3% |
1940 | 1 | 0.3% |
1942 | 1 | 0.3% |
1944 | 1 | 0.3% |
1946 | 1 | 0.3% |
1948 | 1 | 0.3% |
(Missing) | 288 |
Value | Count | Frequency (%) |
1930 | 1 | |
1932 | 1 | |
1934 | 1 | |
1936 | 1 | |
1938 | 1 | |
1940 | 1 | |
1942 | 1 | |
1944 | 1 | |
1946 | 1 | |
1948 | 1 |
Value | Count | Frequency (%) |
1948 | 1 | |
1946 | 1 | |
1944 | 1 | |
1942 | 1 | |
1940 | 1 | |
1938 | 1 | |
1936 | 1 | |
1934 | 1 | |
1932 | 1 | |
1930 | 1 |
NWS_SJ
Text
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | 100.0% |
Missing | 288 |
Missing (%) | 96.6% |
Memory size | 2.5 KiB |
Length
Max length | 35 |
---|---|
Median length | 27.5 |
Mean length | 26 |
Min length | 17 |
Characters and Unicode
Total characters | 260 |
---|---|
Distinct characters | 134 |
Distinct categories | 4 |
Distinct scripts | 3 |
Distinct blocks | 4 |
Unique
Unique | 10 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 특검, 새해 첫날부터 줄소환…'삼성그룹' 겨눈다 |
---|---|
2nd row | 블랙리스트 수사도 속도…김기춘·조윤선 곧 소환 |
3rd row | 헌재, 모레부터 본격 심리 돌입…새해 벽두부터 강행군 |
4th row | 박 대통령 "세월호 의혹, 기가 막히고 어이가 없다" |
5th row | 박 대통령 "뇌물죄? 누구 봐줄 생각 손톱만큼도 없다" |
Value | Count | Frequency (%) |
김정은 | 3 | 4.8% |
없다 | 2 | 3.2% |
대통령 | 2 | 3.2% |
박 | 2 | 3.2% |
특검 | 1 | 1.6% |
담담하고 | 1 | 1.6% |
재개 | 1 | 1.6% |
icbm | 1 | 1.6% |
시험발사 | 1 | 1.6% |
마감 | 1 | 1.6% |
Other values (47) | 47 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 52 | 20.0% |
" | 8 | 3.1% |
… | 6 | 2.3% |
사 | 5 | 1.9% |
' | 4 | 1.5% |
, | 4 | 1.5% |
정 | 4 | 1.5% |
다 | 4 | 1.5% |
? | 4 | 1.5% |
김 | 4 | 1.5% |
Other values (124) | 165 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 177 | |
Space Separator | 52 | 20.0% |
Other Punctuation | 27 | 10.4% |
Uppercase Letter | 4 | 1.5% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
사 | 5 | 2.8% |
정 | 4 | 2.3% |
다 | 4 | 2.3% |
김 | 4 | 2.3% |
시 | 3 | 1.7% |
도 | 3 | 1.7% |
이 | 3 | 1.7% |
고 | 3 | 1.7% |
은 | 3 | 1.7% |
세 | 3 | 1.7% |
Other values (113) | 142 |
Other Punctuation
Value | Count | Frequency (%) |
" | 8 | |
… | 6 | |
' | 4 | |
, | 4 | |
? | 4 | |
· | 1 | 3.7% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 1 | |
C | 1 | |
I | 1 | |
M | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 52 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 177 | |
Common | 79 | |
Latin | 4 | 1.5% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
사 | 5 | 2.8% |
정 | 4 | 2.3% |
다 | 4 | 2.3% |
김 | 4 | 2.3% |
시 | 3 | 1.7% |
도 | 3 | 1.7% |
이 | 3 | 1.7% |
고 | 3 | 1.7% |
은 | 3 | 1.7% |
세 | 3 | 1.7% |
Other values (113) | 142 |
Common
Value | Count | Frequency (%) |
(space) | 52 | |
" | 8 | 10.1% |
… | 6 | 7.6% |
' | 4 | 5.1% |
, | 4 | 5.1% |
? | 4 | 5.1% |
· | 1 | 1.3% |
Latin
Value | Count | Frequency (%) |
B | 1 | |
C | 1 | |
I | 1 | |
M | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 177 | |
ASCII | 76 | |
Punctuation | 6 | 2.3% |
None | 1 | 0.4% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 52 | |
" | 8 | 10.5% |
' | 4 | 5.3% |
, | 4 | 5.3% |
? | 4 | 5.3% |
B | 1 | 1.3% |
C | 1 | 1.3% |
I | 1 | 1.3% |
M | 1 | 1.3% |
Punctuation
Value | Count | Frequency (%) |
… | 6 |
Hangul
Value | Count | Frequency (%) |
사 | 5 | 2.8% |
정 | 4 | 2.3% |
다 | 4 | 2.3% |
김 | 4 | 2.3% |
시 | 3 | 1.7% |
도 | 3 | 1.7% |
이 | 3 | 1.7% |
고 | 3 | 1.7% |
은 | 3 | 1.7% |
세 | 3 | 1.7% |
Other values (113) | 142 |
None
Value | Count | Frequency (%) |
· | 1 |
NWS_CN
Categorical
HIGH CORRELATION
  IMBALANCE
 
Distinct | 2 |
---|---|
Distinct (%) | 0.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.5 KiB |
<NA> | 288 |
---|---|
【 앵커멘트 】 | 10 |
Length
Max length | 8 |
---|---|
Median length | 4 |
Mean length | 4.1342282 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | <NA> |
---|---|
2nd row | 【 앵커멘트 】 |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
<NA> | 288 | |
【 앵커멘트 】 | 10 | 3.4% |
Length
Common Values (Plot)
Value | Count | Frequency (%) |
na | 288 | |
【 | 10 | 3.1% |
앵커멘트 | 10 | 3.1% |
】 | 10 | 3.1% |
NWS_JRNL_NM
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 298 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.7 KiB |
REG_DATE
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 298 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.7 KiB |
MVP_CRS_NM
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 298 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.7 KiB |
Unnamed: 10
Unsupported
MISSING
  REJECTED
  UNSUPPORTED
 
Missing | 298 |
---|---|
Missing (%) | 100.0% |
Memory size | 2.7 KiB |
Variable | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ |
---|---|---|---|---|---|
NEWS_CGR_CD | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
NEWS_NO | 1.000 | 1.000 | 1.000 | NaN | NaN |
BDCT_DATE | 1.000 | 1.000 | 1.000 | NaN | NaN |
BDCT_TIME | 1.000 | NaN | NaN | 1.000 | 1.000 |
NWS_SJ | 1.000 | NaN | NaN | 1.000 | 1.000 |
Variable | NEWS_NO | BDCT_TIME | NWS_CN |
---|---|---|---|
NEWS_NO | 1.000 | 0.091 | 1.000 |
BDCT_TIME | 0.091 | 1.000 | 1.000 |
NWS_CN | 1.000 | 1.000 | 1.000 |
First rows
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
1 | 1144456 | mbn00009 | 3105654 | 20170101 | 1930 | 특검, 새해 첫날부터 줄소환…'삼성그룹' 겨눈다 | 【 앵커멘트 】 | <NA> | <NA> | <NA> | <NA> |
2 | 박영수 특별검사팀은 새해 첫날부터 문형표 전 복지부 장관과 김 종 전 문체부 차관 등 '최순실 게이트'의 핵심 인물들을 줄소환했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
3 | 박영수 특검은 열심히 하겠다는 짧은 한마디로 새해 특검의 각오를 밝혔습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
4 | 한민용 기자입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
5 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
7 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
8 | 새해 첫날도 잊고 핵심 인물들을 줄소환하며 강행군을 이어가고 있는 박영수 특별검사팀. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Last rows
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
288 | - "2013년을 보내고 앞날에 대한 확신과 혁명적 자부심에 넘쳐 새해 2014년을 맞이합니다." | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
289 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
290 | 다만 신년사 화면 구성은 교차편집으로 지난해와 같았습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
291 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
292 | 미사일 도발부터 북한 여자 축구 선수, 각종 공장과 농장 사진과 노동당 청사 정지 화면을 김정은의 신년사 낭독 모습과 번갈아 보여줬습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
293 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
294 | 또 27분짜리 신년사 도중 마치 관중 앞에서 연설 하는 것과 같은 박수 소리를 무려 37차례나 넣었습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
295 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
296 | MBN뉴스 오지예입니다. | 오지예 | 20170101 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144465 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
297 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Most frequently occurring
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | # duplicates |
---|---|---|---|---|---|---|---|---|
4 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 111 |
3 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9 |
1 | ▶ SYNC : 박근혜 / 대통령 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 8 |
2 | ▶ 인터뷰 : 김정은 / 북한 노동당 위원장 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 7 |
0 | 영상취재 : 김인성 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2 |