Overview

Dataset statistics

Number of variables: 11
Number of observations: 275
Missing cells: 2501
Missing cells (%): 82.7%
Duplicate rows: 5
Duplicate rows (%): 1.8%
Total size in memory: 25.4 KiB
Average record size in memory: 94.5 B
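
The percentages above follow from the raw counts. A minimal sketch of the arithmetic, assuming the percentages are taken over all cells (variables × observations) and over all rows respectively; the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 11
n_observations = 275
missing_cells = 2501
duplicate_rows = 5
total_bytes = 25.4 * 1024  # 25.4 KiB, already rounded in the report

total_cells = n_variables * n_observations             # 3025 cells
missing_pct = 100 * missing_cells / total_cells        # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations  # share of duplicate rows
avg_record_bytes = total_bytes / n_observations        # approximate bytes per row

print(f"missing: {missing_pct:.1f}%  duplicates: {duplicate_pct:.1f}%")
print(f"average record size: about {avg_record_bytes:.1f} B")
```

The average record size comes out slightly above the reported 94.5 B because the total size is only given to one decimal place.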

Variable types

Text: 4
Numeric: 2
Categorical: 1
Unsupported: 4

Dataset

Description: Sample data
Author: MBN
URL: https://kdx.kr/data/view/175

Alerts

[Duplicates] Dataset has 5 (1.8%) duplicate rows
[High correlation] NEWS_NO is highly overall correlated with NWS_CN
[High correlation] BDCT_TIME is highly overall correlated with NWS_CN
[High correlation] NWS_CN is highly overall correlated with NEWS_NO and 1 other field
[Imbalance] NWS_CN is highly imbalanced (87.1%)
[Missing] BDCT_NO has 104 (37.8%) missing values
[Missing] NEWS_CGR_CD has 257 (93.5%) missing values
[Missing] NEWS_NO has 255 (92.7%) missing values
[Missing] BDCT_DATE has 255 (92.7%) missing values
[Missing] BDCT_TIME has 265 (96.4%) missing values
[Missing] NWS_SJ has 265 (96.4%) missing values
[Missing] NWS_JRNL_NM has 275 (100.0%) missing values
[Missing] REG_DATE has 275 (100.0%) missing values
[Missing] MVP_CRS_NM has 275 (100.0%) missing values
[Missing] Unnamed: 10 has 275 (100.0%) missing values
[Unsupported] NWS_JRNL_NM is an unsupported type; check whether it needs cleaning or further analysis
[Unsupported] REG_DATE is an unsupported type; check whether it needs cleaning or further analysis
[Unsupported] MVP_CRS_NM is an unsupported type; check whether it needs cleaning or further analysis
[Unsupported] Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis
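
The four "Unsupported" columns are the same four that are 100% missing, and a name like "Unnamed: 10" typically appears when a header row ends with a trailing delimiter. A minimal sketch of detecting such always-empty columns with the standard library; the CSV text and the helper name `all_missing_columns` are hypothetical and not taken from the dataset:

```python
import csv
import io

# Hypothetical CSV text: the trailing comma in the header creates an unnamed,
# always-empty last column, similar to "Unnamed: 10" in the alerts.
RAW = """BDCT_NO,NWS_SJ,REG_DATE,
1201343,headline,,
,,,
body text,,,
"""

def all_missing_columns(text):
    """Return the names of columns whose every data cell is empty."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    empty = []
    for i, name in enumerate(header):
        if all(i >= len(row) or row[i].strip() == "" for row in data):
            # Mimic the pandas-style placeholder for a blank header cell.
            empty.append(name if name else f"Unnamed: {i}")
    return empty

EMPTY_COLUMNS = all_missing_columns(RAW)
print(EMPTY_COLUMNS)
```

Columns flagged this way carry no information and are usually dropped before re-profiling.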

Reproduction

Analysis started: 2023-12-11 22:26:33.571346
Analysis finished: 2023-12-11 22:26:35.108292
Duration: 1.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
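
A report of this kind can be regenerated roughly as follows. This is a sketch under assumptions: the CSV path, output path and report title are hypothetical, and the `dataset` metadata simply mirrors the Description, Author and URL fields shown above. It requires the third-party `pandas` and `ydata-profiling` packages.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (both paths are hypothetical)."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(
        df,
        title="MBN news sample profile",  # hypothetical title
        dataset={
            "description": "Sample data",
            "author": "MBN",
            "url": "https://kdx.kr/data/view/175",
        },
    )
    report.to_file(html_path)
    return report
```

Calling `build_report("mbn_sample.csv")` with a real file path would produce an HTML file with the same sections as this document.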

Variables

BDCT_NO
Text

MISSING 

Distinct: 160
Distinct (%): 93.6%
Missing: 104
Missing (%): 37.8%
Memory size: 2.3 KiB

Length

Max length: 141
Median length: 70
Mean length: 43.263158
Min length: 6

Characters and Unicode

Total characters: 7398
Distinct characters: 519
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
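
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two values taken from this column's sample; the mapping to long category names is written out by hand and covers only the categories that appear in this report. Script and block properties are not available in the standard library, so they are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Po": "Other Punctuation",
    "Nd": "Decimal Number", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "So": "Other Symbol", "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pf": "Final Punctuation",
    "Pc": "Connector Punctuation", "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

SAMPLE = ['▶ 김정은 신년사 "완전한 비핵화 확고"', "1201343"]
COUNTS = category_counts(SAMPLE)
for name, count in COUNTS.most_common():
    print(name, count)
```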

Unique

Unique: 155
Unique (%): 90.6%
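
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. Both percentages appear to be taken over non-missing rows (275 - 104 = 171 here). A small sketch with a hypothetical column:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique, non_missing) for a column; None marks a missing cell."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)                               # different values
    unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    return distinct, unique, len(present)

# Hypothetical column, not taken from the dataset.
column = ["a", "b", "b", None, "c", "c", "c", "d"]
DISTINCT, UNIQUE, NON_MISSING = distinct_and_unique(column)

# Cross-check of the percentages reported for BDCT_NO.
REPORT_NON_MISSING = 275 - 104
DISTINCT_PCT = 100 * 160 / REPORT_NON_MISSING
UNIQUE_PCT = 100 * 155 / REPORT_NON_MISSING
print(DISTINCT, UNIQUE, NON_MISSING, round(DISTINCT_PCT, 1), round(UNIQUE_PCT, 1))
```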

Sample

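1st row: 1201343
2nd row: 60년 만에 찾아온 황금돼지의 해가 밝았습니다. 오늘 전국 곳곳의 해돋이 명소에는 새해 첫 일출을 보며 소원을 비는 인파로 인산인해를 이뤘습니다. [EN: The Year of the Golden Pig, which comes around once every 60 years, has dawned. Today, sunrise spots across the country were packed with crowds making wishes as they watched the first sunrise of the new year.]
3rd row: ▶ 김정은 신년사 "완전한 비핵화 확고" [EN: ▶ Kim Jong-un's New Year's address: "Firm on complete denuclearization"]
4th row: 북한 김정은 위원장이 신년사를 통해 "완전한 비핵화는 불변한 입장"이라며, 한반도 비핵화 의지를 재확인했습니다. 또한 트럼프 대통령과 언제든 다시 만날 준비가 되어 있지만, 북한의 인내심을 오판하면 새로운 길을 모색하지 않을 수 없다고 밝혔습니다. [EN: In his New Year's address, North Korean leader Kim Jong-un reaffirmed his commitment to denuclearizing the Korean Peninsula, saying that "complete denuclearization is an unchanging position." He also said he is ready to meet President Trump again at any time, but that if the North's patience is misjudged he will have no choice but to seek a new path.]
5th row: ▶ 육성으로 첫 언급…"평화 의지 환영" [EN: ▶ First mention in his own voice… "Commitment to peace welcomed"]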
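Value | Count | Frequency (%)
(unrecovered) | 56 | 3.4%
김정은 [EN: Kim Jong-un] | 24 | 1.4%
(unrecovered) | 22 | 1.3%
인터뷰 [EN: interview] | 16 | 1.0%
북한 [EN: North Korea] | 12 | 0.7%
기자 [EN: reporter] | 11 | 0.7%
위원장의 [EN: chairman's] | 10 | 0.6%
(unrecovered) | 9 | 0.5%
(unrecovered) | 9 | 0.5%
문재인 [EN: Moon Jae-in] | 8 | 0.5%
Other values (1115) | 1493 | 89.4%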

Most occurring characters

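Value | Count | Frequency (%)
(space) | 1745 | 23.6%
(unrecovered) | 154 | 2.1%
. | 149 | 2.0%
(unrecovered) | 134 | 1.8%
(unrecovered) | 120 | 1.6%
(unrecovered) | 97 | 1.3%
(unrecovered) | 96 | 1.3%
(unrecovered) | 92 | 1.2%
(unrecovered) | 89 | 1.2%
(unrecovered) | 80 | 1.1%
Other values (509) | 4642 | 62.7%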

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4964 | 67.1%
Space Separator | 1745 | 23.6%
Other Punctuation | 351 | 4.7%
Decimal Number | 202 | 2.7%
Uppercase Letter | 39 | 0.5%
Lowercase Letter | 30 | 0.4%
Other Symbol | 22 | 0.3%
Dash Punctuation | 17 | 0.2%
Open Punctuation | 14 | 0.2%
Close Punctuation | 14 | 0.2%

Most frequent character per category

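Other Letter
Value | Count | Frequency (%)
(unrecovered) | 154 | 3.1%
(unrecovered) | 134 | 2.7%
(unrecovered) | 120 | 2.4%
(unrecovered) | 97 | 2.0%
(unrecovered) | 96 | 1.9%
(unrecovered) | 92 | 1.9%
(unrecovered) | 89 | 1.8%
(unrecovered) | 80 | 1.6%
(unrecovered) | 80 | 1.6%
(unrecovered) | 77 | 1.6%
Other values (456) | 3945 | 79.5%

Lowercase Letter
Value | Count | Frequency (%)
a | 4 | 13.3%
o | 4 | 13.3%
m | 3 | 10.0%
t | 2 | 6.7%
c | 2 | 6.7%
r | 2 | 6.7%
e | 2 | 6.7%
v | 2 | 6.7%
n | 2 | 6.7%
z | 1 | 3.3%
Other values (6) | 6 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
. | 149 | 42.5%
, | 57 | 16.2%
" | 50 | 14.2%
: | 25 | 7.1%
' | 24 | 6.8%
/ | 14 | 4.0%
% | 12 | 3.4%
· | 8 | 2.3%
(unrecovered) | 7 | 2.0%
? | 3 | 0.9%

Decimal Number
Value | Count | Frequency (%)
1 | 47 | 23.3%
0 | 43 | 21.3%
2 | 29 | 14.4%
4 | 20 | 9.9%
3 | 20 | 9.9%
5 | 17 | 8.4%
9 | 9 | 4.5%
6 | 6 | 3.0%
7 | 6 | 3.0%
8 | 5 | 2.5%

Uppercase Letter
Value | Count | Frequency (%)
N | 11 | 28.2%
B | 11 | 28.2%
M | 11 | 28.2%
T | 2 | 5.1%
V | 2 | 5.1%
I | 1 | 2.6%
Q | 1 | 2.6%

Open Punctuation
Value | Count | Frequency (%)
(unrecovered) | 8 | 57.1%
( | 4 | 28.6%
[ | 2 | 14.3%

Close Punctuation
Value | Count | Frequency (%)
(unrecovered) | 8 | 57.1%
) | 4 | 28.6%
] | 2 | 14.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1745 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(unrecovered) | 22 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 17 | 100.0%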

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4962 | 67.1%
Common | 2365 | 32.0%
Latin | 69 | 0.9%
Han | 2 | < 0.1%

Most frequent character per script

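Hangul
Value | Count | Frequency (%)
(unrecovered) | 154 | 3.1%
(unrecovered) | 134 | 2.7%
(unrecovered) | 120 | 2.4%
(unrecovered) | 97 | 2.0%
(unrecovered) | 96 | 1.9%
(unrecovered) | 92 | 1.9%
(unrecovered) | 89 | 1.8%
(unrecovered) | 80 | 1.6%
(unrecovered) | 80 | 1.6%
(unrecovered) | 77 | 1.6%
Other values (454) | 3943 | 79.5%

Common
Value | Count | Frequency (%)
(space) | 1745 | 73.8%
. | 149 | 6.3%
, | 57 | 2.4%
" | 50 | 2.1%
1 | 47 | 2.0%
0 | 43 | 1.8%
2 | 29 | 1.2%
: | 25 | 1.1%
' | 24 | 1.0%
(unrecovered) | 22 | 0.9%
Other values (20) | 174 | 7.4%

Latin
Value | Count | Frequency (%)
N | 11 | 15.9%
B | 11 | 15.9%
M | 11 | 15.9%
a | 4 | 5.8%
o | 4 | 5.8%
m | 3 | 4.3%
T | 2 | 2.9%
V | 2 | 2.9%
t | 2 | 2.9%
c | 2 | 2.9%
Other values (13) | 17 | 24.6%

Han
Value | Count | Frequency (%)
(unrecovered) | 1 | 50.0%
(unrecovered) | 1 | 50.0%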

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4962 | 67.1%
ASCII | 2381 | 32.2%
None | 24 | 0.3%
Geometric Shapes | 22 | 0.3%
Punctuation | 7 | 0.1%
CJK | 2 | < 0.1%

Most frequent character per block

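ASCII
Value | Count | Frequency (%)
(space) | 1745 | 73.3%
. | 149 | 6.3%
, | 57 | 2.4%
" | 50 | 2.1%
1 | 47 | 2.0%
0 | 43 | 1.8%
2 | 29 | 1.2%
: | 25 | 1.0%
' | 24 | 1.0%
4 | 20 | 0.8%
Other values (38) | 192 | 8.1%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 154 | 3.1%
(unrecovered) | 134 | 2.7%
(unrecovered) | 120 | 2.4%
(unrecovered) | 97 | 2.0%
(unrecovered) | 96 | 1.9%
(unrecovered) | 92 | 1.9%
(unrecovered) | 89 | 1.8%
(unrecovered) | 80 | 1.6%
(unrecovered) | 80 | 1.6%
(unrecovered) | 77 | 1.6%
Other values (454) | 3943 | 79.5%

Geometric Shapes
Value | Count | Frequency (%)
(unrecovered) | 22 | 100.0%

None
Value | Count | Frequency (%)
· | 8 | 33.3%
(unrecovered) | 8 | 33.3%
(unrecovered) | 8 | 33.3%

Punctuation
Value | Count | Frequency (%)
(unrecovered) | 7 | 100.0%

CJK
Value | Count | Frequency (%)
(unrecovered) | 1 | 50.0%
(unrecovered) | 1 | 50.0%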

NEWS_CGR_CD
Text

MISSING 

Distinct: 11
Distinct (%): 61.1%
Missing: 257
Missing (%): 93.5%
Memory size: 2.3 KiB

Length

Max length: 8
Median length: 8
Mean length: 5.7777778
Min length: 3

Characters and Unicode

Total characters: 104
Distinct characters: 26
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 50.0%

Sample

1st row: mbn00009
2nd row: mbn00009
3rd row: mbn00007
4th row: 이동훈 [EN: Lee Dong-hoon]
5th row: mbn00006
Value | Count | Frequency (%)
mbn00006 | 7 | 38.9%
mbn00009 | 2 | 11.1%
mbn00007 | 1 | 5.6%
이동훈 [EN: Lee Dong-hoon] | 1 | 5.6%
김근희 [EN: Kim Geun-hee] | 1 | 5.6%
주진희 [EN: Joo Jin-hee] | 1 | 5.6%
이상주 [EN: Lee Sang-joo] | 1 | 5.6%
김경기 [EN: Kim Gyeong-gi] | 1 | 5.6%
최중락 [EN: Choi Jung-rak] | 1 | 5.6%
이정호 [EN: Lee Jeong-ho] | 1 | 5.6%

Most occurring characters

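Value | Count | Frequency (%)
0 | 40 | 38.5%
m | 10 | 9.6%
n | 10 | 9.6%
b | 10 | 9.6%
6 | 7 | 6.7%
(unrecovered) | 3 | 2.9%
(unrecovered) | 2 | 1.9%
(unrecovered) | 2 | 1.9%
(unrecovered) | 2 | 1.9%
9 | 2 | 1.9%
Other values (16) | 16 | 15.4%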

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 50 | 48.1%
Lowercase Letter | 30 | 28.8%
Other Letter | 24 | 23.1%

Most frequent character per category

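Other Letter
Value | Count | Frequency (%)
(unrecovered) | 3 | 12.5%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
Other values (9) | 9 | 37.5%

Decimal Number
Value | Count | Frequency (%)
0 | 40 | 80.0%
6 | 7 | 14.0%
9 | 2 | 4.0%
7 | 1 | 2.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 10 | 33.3%
n | 10 | 33.3%
b | 10 | 33.3%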

Most occurring scripts

Value | Count | Frequency (%)
Common | 50 | 48.1%
Latin | 30 | 28.8%
Hangul | 24 | 23.1%

Most frequent character per script

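Hangul
Value | Count | Frequency (%)
(unrecovered) | 3 | 12.5%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
Other values (9) | 9 | 37.5%

Common
Value | Count | Frequency (%)
0 | 40 | 80.0%
6 | 7 | 14.0%
9 | 2 | 4.0%
7 | 1 | 2.0%

Latin
Value | Count | Frequency (%)
m | 10 | 33.3%
n | 10 | 33.3%
b | 10 | 33.3%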

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 80 | 76.9%
Hangul | 24 | 23.1%

Most frequent character per block

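ASCII
Value | Count | Frequency (%)
0 | 40 | 50.0%
m | 10 | 12.5%
n | 10 | 12.5%
b | 10 | 12.5%
6 | 7 | 8.8%
9 | 2 | 2.5%
7 | 1 | 1.2%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 3 | 12.5%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 2 | 8.3%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
Other values (9) | 9 | 37.5%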

NEWS_NO
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 255
Missing (%): 92.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 11957298
Minimum: 3724486
Maximum: 20190101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 3724486
5-th percentile: 3724487
Q1: 3724493.5
Median: 11957303
Q3: 20190101
95-th percentile: 20190101
Maximum: 20190101
Range: 16465615
Interquartile range (IQR): 16465608

Descriptive statistics

Standard deviation: 8446677.8
Coefficient of variation (CV): 0.70640356
Kurtosis: -2.2352941
Mean: 11957298
Median Absolute Deviation (MAD): 8232798
Skewness: -1.0298138 × 10^-12
Sum: 2.3914596 × 10^8
Variance: 7.1346365 × 10^13
Monotonicity: Not monotonic
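
These figures can be re-derived from the 20 non-missing values, which are recoverable from the frequency tables in this section (ten distinct values near 3724486 to 3724505, plus 20190101 occurring ten times). A minimal check using the standard library, assuming the report uses the sample (n - 1) standard deviation, which is what reproduces the numbers above:

```python
import statistics

# The ten small values come from this section's frequency tables; the large
# value 20190101 occurs ten times.
small = [3724486, 3724487, 3724490, 3724491, 3724492,
         3724494, 3724497, 3724502, 3724504, 3724505]
values = small + [20190101] * 10

total = sum(values)
mean = statistics.fmean(values)
median = statistics.median(values)
std = statistics.stdev(values)        # sample standard deviation (n - 1)
variance = statistics.variance(values)
cv = std / mean                       # coefficient of variation

print(f"sum={total} mean={mean:.1f} median={median}")
print(f"std={std:.1f} variance={variance:.4e} cv={cv:.4f}")
```

The two clusters sit about 16.5 million apart, which is why the standard deviation is close to half the range and the kurtosis is strongly negative.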
Histogram with fixed size bins (bins=11)
Value | Count | Frequency (%)
20190101 | 10 | 3.6%
3724486 | 1 | 0.4%
3724504 | 1 | 0.4%
3724497 | 1 | 0.4%
3724492 | 1 | 0.4%
3724502 | 1 | 0.4%
3724487 | 1 | 0.4%
3724494 | 1 | 0.4%
3724505 | 1 | 0.4%
3724490 | 1 | 0.4%
(Missing) | 255 | 92.7%

Value | Count | Frequency (%)
3724486 | 1 | 0.4%
3724487 | 1 | 0.4%
3724490 | 1 | 0.4%
3724491 | 1 | 0.4%
3724492 | 1 | 0.4%
3724494 | 1 | 0.4%
3724497 | 1 | 0.4%
3724502 | 1 | 0.4%
3724504 | 1 | 0.4%
3724505 | 1 | 0.4%

Value | Count | Frequency (%)
20190101 | 10 | 3.6%
3724505 | 1 | 0.4%
3724504 | 1 | 0.4%
3724502 | 1 | 0.4%
3724497 | 1 | 0.4%
3724494 | 1 | 0.4%
3724492 | 1 | 0.4%
3724491 | 1 | 0.4%
3724490 | 1 | 0.4%
3724487 | 1 | 0.4%

BDCT_DATE
Text

MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 255
Missing (%): 92.7%
Memory size: 2.3 KiB

Length

Max length: 82
Median length: 45
Mean length: 45
Min length: 8

Characters and Unicode

Total characters: 900
Distinct characters: 37
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 50.0%

Sample

1st row: 20190101
2nd row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1201343
3rd row: 20190101
4th row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1201344
5th row: 20190101
Value | Count | Frequency (%)
20190101 | 10 | 50.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201343 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201344 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201345 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201346 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201347 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201348 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201349 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201350 | 1 | 5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1201351 | 1 | 5.0%

Most occurring characters

Value | Count | Frequency (%)
t | 80 | 8.9%
n | 80 | 8.9%
0 | 51 | 5.7%
1 | 51 | 5.7%
e | 50 | 5.6%
o | 50 | 5.6%
c | 50 | 5.6%
. | 40 | 4.4%
/ | 40 | 4.4%
2 | 31 | 3.4%
Other values (27) | 377 | 41.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 560 | 62.2%
Decimal Number | 170 | 18.9%
Other Punctuation | 110 | 12.2%
Connector Punctuation | 30 | 3.3%
Math Symbol | 20 | 2.2%
Uppercase Letter | 10 | 1.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 80 | 14.3%
n | 80 | 14.3%
e | 50 | 8.9%
o | 50 | 8.9%
c | 50 | 8.9%
w | 30 | 5.4%
m | 30 | 5.4%
s | 20 | 3.6%
i | 20 | 3.6%
d | 20 | 3.6%
Other values (9) | 130 | 23.2%

Decimal Number
Value | Count | Frequency (%)
0 | 51 | 30.0%
1 | 51 | 30.0%
2 | 31 | 18.2%
9 | 11 | 6.5%
3 | 11 | 6.5%
4 | 8 | 4.7%
5 | 4 | 2.4%
6 | 1 | 0.6%
7 | 1 | 0.6%
8 | 1 | 0.6%

Other Punctuation
Value | Count | Frequency (%)
. | 40 | 36.4%
/ | 40 | 36.4%
: | 10 | 9.1%
? | 10 | 9.1%
& | 10 | 9.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 30 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 20 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 10 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 570 | 63.3%
Common | 330 | 36.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 80 | 14.0%
n | 80 | 14.0%
e | 50 | 8.8%
o | 50 | 8.8%
c | 50 | 8.8%
w | 30 | 5.3%
m | 30 | 5.3%
s | 20 | 3.5%
i | 20 | 3.5%
d | 20 | 3.5%
Other values (10) | 140 | 24.6%

Common
Value | Count | Frequency (%)
0 | 51 | 15.5%
1 | 51 | 15.5%
. | 40 | 12.1%
/ | 40 | 12.1%
2 | 31 | 9.4%
_ | 30 | 9.1%
= | 20 | 6.1%
9 | 11 | 3.3%
3 | 11 | 3.3%
: | 10 | 3.0%
Other values (7) | 35 | 10.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 900 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 80 | 8.9%
n | 80 | 8.9%
0 | 51 | 5.7%
1 | 51 | 5.7%
e | 50 | 5.6%
o | 50 | 5.6%
c | 50 | 5.6%
. | 40 | 4.4%
/ | 40 | 4.4%
2 | 31 | 3.4%
Other values (27) | 377 | 41.9%

BDCT_TIME
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 265
Missing (%): 96.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 1937.8
Minimum: 1930
Maximum: 1944
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1930
5-th percentile: 1931.35
Q1: 1935.25
Median: 1938
Q3: 1940.75
95-th percentile: 1943.55
Maximum: 1944
Range: 14
Interquartile range (IQR): 5.5
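
The fractional quantiles are consistent with linear interpolation between neighbouring order statistics. The sketch below reproduces them from the ten non-missing values listed in this section's frequency tables; the helper name `percentile` is illustrative and the interpolation rule is inferred from the numbers rather than taken from the tool's documentation.

```python
import math
import statistics

# The ten non-missing BDCT_TIME values, in ascending order.
values = [1930, 1933, 1935, 1936, 1937, 1939, 1940, 1941, 1943, 1944]

def percentile(sorted_values, p):
    """Linear interpolation between closest ranks; p lies in [0, 1]."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

p5 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
median = percentile(values, 0.50)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)
iqr = q3 - q1

# Median absolute deviation: the median of |x - median|.
mad = statistics.median(abs(v - median) for v in values)

print(p5, q1, median, q3, p95, iqr, mad)
```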

Descriptive statistics

Standard deviation: 4.4422217
Coefficient of variation (CV): 0.0022924046
Kurtosis: -0.61551869
Mean: 1937.8
Median Absolute Deviation (MAD): 3
Skewness: -0.30002336
Sum: 19378
Variance: 19.733333
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1930 | 1 | 0.4%
1933 | 1 | 0.4%
1935 | 1 | 0.4%
1936 | 1 | 0.4%
1937 | 1 | 0.4%
1939 | 1 | 0.4%
1940 | 1 | 0.4%
1941 | 1 | 0.4%
1943 | 1 | 0.4%
1944 | 1 | 0.4%
(Missing) | 265 | 96.4%

Value | Count | Frequency (%)
1930 | 1 | 0.4%
1933 | 1 | 0.4%
1935 | 1 | 0.4%
1936 | 1 | 0.4%
1937 | 1 | 0.4%
1939 | 1 | 0.4%
1940 | 1 | 0.4%
1941 | 1 | 0.4%
1943 | 1 | 0.4%
1944 | 1 | 0.4%

Value | Count | Frequency (%)
1944 | 1 | 0.4%
1943 | 1 | 0.4%
1941 | 1 | 0.4%
1940 | 1 | 0.4%
1939 | 1 | 0.4%
1937 | 1 | 0.4%
1936 | 1 | 0.4%
1935 | 1 | 0.4%
1933 | 1 | 0.4%
1930 | 1 | 0.4%

NWS_SJ
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 265
Missing (%): 96.4%
Memory size: 2.3 KiB

Length

Max length: 41
Median length: 30
Mean length: 28.5
Min length: 18

Characters and Unicode

Total characters: 285
Distinct characters: 132
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 100.0%

Sample

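1st row: 김주하 앵커가 전하는 1월 1일 뉴스8 주요뉴스 [EN: Top stories of News 8 for January 1, presented by anchor Kim Ju-ha]
2nd row: [신년영상] 2019 황금돼지 해 밝았다 [EN: [New Year video] 2019, the Year of the Golden Pig, has dawned]
3rd row: 60년 만에 돌아온 황금돼지의 해 [EN: The Year of the Golden Pig, back after 60 years]
4th row: 김정은, '완전한 비핵화' 첫 언급…"약속 안 지키면 새 길” [EN: Kim Jong-un makes first mention of 'complete denuclearization'… "A new path if promises are not kept"]
5th row: 김정은, 직접 개성공단 언급했지만…사실상 남측에 숙제 [EN: Kim Jong-un personally mentioned the Kaesong Industrial Complex, but… effectively homework for the South]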
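Value | Count | Frequency (%)
김정은 [EN: Kim Jong-un] | 4 | 6.0%
대통령 [EN: president] | 2 | 3.0%
(unrecovered) | 2 | 3.0%
mbn | 2 | 3.0%
여론조사 [EN: opinion poll] | 2 | 3.0%
(unrecovered) | 2 | 3.0%
긍정적"…문 [EN: positive"…Moon] | 1 | 1.5%
해결에 [EN: to resolving] | 1 | 1.5%
문제 [EN: problem] | 1 | 1.5%
김주하 [EN: Kim Ju-ha] | 1 | 1.5%
Other values (49) | 49 | 73.1%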

Most occurring characters

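Value | Count | Frequency (%)
(space) | 57 | 20.0%
(unrecovered) | 8 | 2.8%
(unrecovered) | 6 | 2.1%
(unrecovered) | 6 | 2.1%
(unrecovered) | 5 | 1.8%
, | 5 | 1.8%
(unrecovered) | 4 | 1.4%
(unrecovered) | 4 | 1.4%
(unrecovered) | 4 | 1.4%
(unrecovered) | 4 | 1.4%
Other values (122) | 182 | 63.9%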

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 182 | 63.9%
Space Separator | 57 | 20.0%
Other Punctuation | 19 | 6.7%
Decimal Number | 14 | 4.9%
Uppercase Letter | 6 | 2.1%
Open Punctuation | 3 | 1.1%
Close Punctuation | 3 | 1.1%
Final Punctuation | 1 | 0.4%

Most frequent character per category

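Other Letter
Value | Count | Frequency (%)
(unrecovered) | 8 | 4.4%
(unrecovered) | 6 | 3.3%
(unrecovered) | 5 | 2.7%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 3 | 1.6%
(unrecovered) | 3 | 1.6%
Other values (101) | 137 | 75.3%

Decimal Number
Value | Count | Frequency (%)
0 | 3 | 21.4%
1 | 3 | 21.4%
5 | 2 | 14.3%
4 | 2 | 14.3%
6 | 1 | 7.1%
9 | 1 | 7.1%
2 | 1 | 7.1%
8 | 1 | 7.1%

Other Punctuation
Value | Count | Frequency (%)
(unrecovered) | 6 | 31.6%
, | 5 | 26.3%
" | 3 | 15.8%
' | 2 | 10.5%
% | 2 | 10.5%
. | 1 | 5.3%

Uppercase Letter
Value | Count | Frequency (%)
M | 2 | 33.3%
B | 2 | 33.3%
N | 2 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 57 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
[ | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
] | 3 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(unrecovered) | 1 | 100.0%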

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 182 | 63.9%
Common | 97 | 34.0%
Latin | 6 | 2.1%

Most frequent character per script

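Hangul
Value | Count | Frequency (%)
(unrecovered) | 8 | 4.4%
(unrecovered) | 6 | 3.3%
(unrecovered) | 5 | 2.7%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 3 | 1.6%
(unrecovered) | 3 | 1.6%
Other values (101) | 137 | 75.3%

Common
Value | Count | Frequency (%)
(space) | 57 | 58.8%
(unrecovered) | 6 | 6.2%
, | 5 | 5.2%
[ | 3 | 3.1%
] | 3 | 3.1%
0 | 3 | 3.1%
" | 3 | 3.1%
1 | 3 | 3.1%
' | 2 | 2.1%
5 | 2 | 2.1%
Other values (8) | 10 | 10.3%

Latin
Value | Count | Frequency (%)
M | 2 | 33.3%
B | 2 | 33.3%
N | 2 | 33.3%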

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 182 | 63.9%
ASCII | 96 | 33.7%
Punctuation | 7 | 2.5%

Most frequent character per block

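ASCII
Value | Count | Frequency (%)
(space) | 57 | 59.4%
, | 5 | 5.2%
[ | 3 | 3.1%
] | 3 | 3.1%
0 | 3 | 3.1%
" | 3 | 3.1%
1 | 3 | 3.1%
M | 2 | 2.1%
B | 2 | 2.1%
N | 2 | 2.1%
Other values (9) | 13 | 13.5%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 8 | 4.4%
(unrecovered) | 6 | 3.3%
(unrecovered) | 5 | 2.7%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 4 | 2.2%
(unrecovered) | 3 | 1.6%
(unrecovered) | 3 | 1.6%
Other values (101) | 137 | 75.3%

Punctuation
Value | Count | Frequency (%)
(unrecovered) | 6 | 85.7%
(unrecovered) | 1 | 14.3%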

NWS_CN
Categorical

HIGH CORRELATION  IMBALANCE 

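Distinct: 4
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
<NA> | 265
【 앵커멘트 】 [EN: Anchor's remarks] | 8
▶ '황금돼지 해' 밝아…해맞이 인파 북적 [EN: ▶ 'Year of the Golden Pig' dawns… sunrise-watching crowds throng] | 1
2019 기해년, 황금돼지 해가 밝았습니다. [EN: 2019, the year of Gihae, the Year of the Golden Pig, has dawned.] | 1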

Length

Max length: 24
Median length: 4
Mean length: 4.2581818
Min length: 4

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: <NA>
2nd row: ▶ '황금돼지 해' 밝아…해맞이 인파 북적 [EN: ▶ 'Year of the Golden Pig' dawns… sunrise-watching crowds throng]
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

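Value | Count | Frequency (%)
<NA> | 265 | 96.4%
【 앵커멘트 】 [EN: Anchor's remarks] | 8 | 2.9%
▶ '황금돼지 해' 밝아…해맞이 인파 북적 [EN: ▶ 'Year of the Golden Pig' dawns… sunrise-watching crowds throng] | 1 | 0.4%
2019 기해년, 황금돼지 해가 밝았습니다. [EN: 2019, the year of Gihae, the Year of the Golden Pig, has dawned.] | 1 | 0.4%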
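
The 87.1% imbalance figure in the alerts can be reproduced from these four counts as one minus the normalised Shannon entropy. The sketch below is an assumption about the formula, chosen because it matches the reported number; the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of non-empty classes."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # convention in this sketch: a single class is not scored
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

# Counts from the Common Values table: <NA>, the anchor tag, and two singletons.
NWS_CN_COUNTS = [265, 8, 1, 1]
SCORE = imbalance_score(NWS_CN_COUNTS)
print(f"imbalance: {100 * SCORE:.1f}%")
```

A score of 0 means perfectly balanced classes and a score near 1 means one class dominates; here the dominating "class" is the literal string `<NA>`, so the alert mostly reflects missing data.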

Length

Histogram of lengths of the category

Common Values (Plot)

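Value | Count | Frequency (%)
na | 265 | 88.3%
(unrecovered) | 8 | 2.7%
앵커멘트 [EN: anchor's remarks] | 8 | 2.7%
(unrecovered) | 8 | 2.7%
황금돼지 [EN: golden pig] | 2 | 0.7%
(unrecovered) | 1 | 0.3%
(unrecovered) | 1 | 0.3%
밝아…해맞이 [EN: dawns…sunrise-watching] | 1 | 0.3%
인파 [EN: crowd] | 1 | 0.3%
북적 [EN: bustling] | 1 | 0.3%
Other values (4) | 4 | 1.3%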

NWS_JRNL_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 275
Missing (%): 100.0%
Memory size: 2.5 KiB

REG_DATE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 275
Missing (%): 100.0%
Memory size: 2.5 KiB

MVP_CRS_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 275
Missing (%): 100.0%
Memory size: 2.5 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 275
Missing (%): 100.0%
Memory size: 2.5 KiB

Interactions

[Four interaction plots; images not reproduced]

Correlations

Variable | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN
NEWS_CGR_CD | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.898
NEWS_NO | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN
BDCT_DATE | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN
BDCT_TIME | 1.000 | NaN | NaN | 1.000 | 1.000 | 1.000
NWS_SJ | 1.000 | NaN | NaN | 1.000 | 1.000 | 1.000
NWS_CN | 0.898 | NaN | NaN | 1.000 | 1.000 | 1.000

Variable | NEWS_NO | BDCT_TIME | NWS_CN
NEWS_NO | 1.000 | -0.018 | 1.000
BDCT_TIME | -0.018 | 1.000 | 0.655
NWS_CN | 1.000 | 0.655 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
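
Nullity correlation can be illustrated as the Pearson correlation between the present/absent indicators of two columns. The sketch below uses hypothetical columns and a hand-written helper; it shows the idea and is not the plotting code used for the heatmap.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation of presence indicators (1 = present, 0 = missing)."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns: b is missing exactly where a is, c is the opposite.
col_a = [1, None, 2, None]
col_b = ["x", None, "y", None]
col_c = [None, 1, None, 2]

SAME = nullity_correlation(col_a, col_b)      # 1.0: missing together
OPPOSITE = nullity_correlation(col_a, col_c)  # -1.0: one present when the other is not
print(SAME, OPPOSITE)
```

Columns that are entirely missing, such as the four unsupported variables in this report, have no variance in their indicator and therefore no defined nullity correlation.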

Sample

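Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 1201343 | mbn00009 | 3724486 | 20190101 | 1930 | 김주하 앵커가 전하는 1월 1일 뉴스8 주요뉴스 [EN: Top stories of News 8 for January 1, presented by anchor Kim Ju-ha] | ▶ '황금돼지 해' 밝아…해맞이 인파 북적 [EN: ▶ 'Year of the Golden Pig' dawns… sunrise-watching crowds throng] | <NA> | <NA> | <NA> | <NA>
2 | 60년 만에 찾아온 황금돼지의 해가 밝았습니다. 오늘 전국 곳곳의 해돋이 명소에는 새해 첫 일출을 보며 소원을 비는 인파로 인산인해를 이뤘습니다. [EN: The Year of the Golden Pig, which comes around once every 60 years, has dawned. Today, sunrise spots across the country were packed with crowds making wishes as they watched the first sunrise of the new year.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
4 | ▶ 김정은 신년사 "완전한 비핵화 확고" [EN: ▶ Kim Jong-un's New Year's address: "Firm on complete denuclearization"] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | 북한 김정은 위원장이 신년사를 통해 "완전한 비핵화는 불변한 입장"이라며, 한반도 비핵화 의지를 재확인했습니다. 또한 트럼프 대통령과 언제든 다시 만날 준비가 되어 있지만, 북한의 인내심을 오판하면 새로운 길을 모색하지 않을 수 없다고 밝혔습니다. [EN: In his New Year's address, North Korean leader Kim Jong-un reaffirmed his commitment to denuclearizing the Korean Peninsula, saying that "complete denuclearization is an unchanging position." He also said he is ready to meet President Trump again at any time, but that if the North's patience is misjudged he will have no choice but to seek a new path.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | ▶ 육성으로 첫 언급…"평화 의지 환영" [EN: ▶ First mention in his own voice… "Commitment to peace welcomed"] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 김정은 위원장이 '완전한 비핵화'를 북한 주민들에게 육성으로 언급한 건 이번이 처음입니다. 우리 정부는 김정은 위원장의 신년사에 환영의 뜻을 밝혔습니다. [EN: This is the first time Chairman Kim Jong-un has mentioned 'complete denuclearization' to the North Korean people in his own voice. Our government welcomed Chairman Kim's New Year's address.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
265 | 그렇다면, 이 총리와 황 전 총리가 양자대결을 한다면 어떻게 될까. [EN: So what would happen in a head-to-head race between Prime Minister Lee and former Prime Minister Hwang?] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
266 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
267 | 여론조사 결과, 이 총리가 40.4%를 기록하며, 24.5%를 기록한 황 전 총리에 크게 앞서는 것으로 나타났습니다. [EN: According to the poll, Prime Minister Lee scored 40.4%, well ahead of former Prime Minister Hwang at 24.5%.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
268 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
269 | 하지만, 진보후보와 보수후보 가운데 차기 대선 지지후보가 없다고 응답한 비율이 30%가 넘는다는 점에서, 시민들은 좀 더 상황을 지켜보겠다는 신중한 자세를 보이고 있습니다. [EN: However, with more than 30% of respondents saying they support neither a progressive nor a conservative candidate for the next presidential election, citizens are taking a cautious wait-and-see stance.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
270 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
271 | MBN뉴스 오태윤입니다. [EN: This is Oh Tae-yoon, MBN News.] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
272 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
273 | 영상편집 : 이유진 [EN: Video editing: Lee Yu-jin] | 오태윤 [EN: Oh Tae-yoon] | 20190101 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1201352 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
274 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>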

Duplicate rows

Most frequently occurring

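Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | # duplicates
4 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 104
3 | 【 기자 】 [EN: Reporter] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 7
0 | ▶ 인터뷰 : 김정은 / 북한 국무위원장 [EN: ▶ Interview: Kim Jong-un / Chairman of the State Affairs Commission of North Korea] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3
1 | ▶ 인터뷰 : 문재인 대통령 [EN: ▶ Interview: President Moon Jae-in] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
2 | ▶ 인터뷰 : 신범철 / 아산정책연구원 안보통일센터장 [EN: ▶ Interview: Shin Beom-chul / Director of the Center for Security and Unification, Asan Institute for Policy Studies] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2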
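
The table lists five distinct row patterns that occur more than once, which appears to be what the "5 (1.8%) duplicate rows" alert counts (5 / 275 ≈ 1.8%), even though those patterns cover 104 + 7 + 3 + 2 + 2 = 118 rows in total. A minimal sketch of this kind of tally, using hypothetical rows and an illustrative helper name:

```python
from collections import Counter

def duplicate_summary(rows):
    """Map each row pattern that occurs more than once to its occurrence count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical two-column rows; None stands for a missing cell.
rows = [
    (None, None),
    ("a", None),
    (None, None),
    ("b", "x"),
    ("a", None),
    (None, None),
]
DUPLICATES = duplicate_summary(rows)
for pattern, n in sorted(DUPLICATES.items(), key=lambda kv: -kv[1]):
    print(pattern, n)

# Cross-check of the figures in this report.
PATTERN_PCT = 100 * 5 / 275
ROWS_COVERED = 104 + 7 + 3 + 2 + 2
```

Most of the duplication here comes from fully empty rows, so dropping all-missing rows before profiling would remove the largest duplicate group.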