Overview

Dataset statistics

Number of variables: 11
Number of observations: 300
Missing cells: 2733
Missing cells (%): 82.8%
Duplicate rows: 4
Duplicate rows (%): 1.3%
Total size in memory: 27.7 KiB
Average record size in memory: 94.4 B
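
The percentages in this block follow directly from the counts. A minimal sketch of the arithmetic, using only the figures reported in this overview (the variable names are illustrative):

```python
# Figures copied from the dataset statistics in this overview.
n_variables = 11
n_observations = 300
missing_cells = 2733
duplicate_rows = 4

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells:    {total_cells}")
print(f"missing cells:  {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

With 3300 cells in total, roughly five out of six cells are empty, which is worth keeping in mind when reading the per-variable statistics.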

Variable types

Text: 4
Numeric: 2
Categorical: 1
Unsupported: 4

Dataset

Description: Sample data
Author: MBN
URL: https://kdx.kr/data/view/1011

Alerts

Dataset has 4 (1.3%) duplicate rows [Duplicates]
NEWS_NO is highly overall correlated with NWS_CN [High correlation]
NWS_CN is highly overall correlated with NEWS_NO [High correlation]
NWS_CN is highly imbalanced (85.7%) [Imbalance]
BDCT_NO has 112 (37.3%) missing values [Missing]
NEWS_CGR_CD has 281 (93.7%) missing values [Missing]
NEWS_NO has 280 (93.3%) missing values [Missing]
BDCT_DATE has 280 (93.3%) missing values [Missing]
BDCT_TIME has 290 (96.7%) missing values [Missing]
NWS_SJ has 290 (96.7%) missing values [Missing]
NWS_JRNL_NM has 300 (100.0%) missing values [Missing]
REG_DATE has 300 (100.0%) missing values [Missing]
MVP_CRS_NM has 300 (100.0%) missing values [Missing]
Unnamed: 10 has 300 (100.0%) missing values [Missing]
NWS_JRNL_NM is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
REG_DATE is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
MVP_CRS_NM is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
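
As a consistency check, the per-column missing counts in these alerts should add up to the total number of missing cells given in the dataset statistics (2733). The sketch below copies the counts from the alerts; NWS_CN raises no missing alert, and its variable section reports 0 missing values.

```python
# Missing-value counts per column, copied from the alerts above.
# NWS_CN is taken from its variable section, which reports 0 missing values.
missing_by_column = {
    "BDCT_NO": 112,
    "NEWS_CGR_CD": 281,
    "NEWS_NO": 280,
    "BDCT_DATE": 280,
    "BDCT_TIME": 290,
    "NWS_SJ": 290,
    "NWS_CN": 0,
    "NWS_JRNL_NM": 300,
    "REG_DATE": 300,
    "MVP_CRS_NM": 300,
    "Unnamed: 10": 300,
}
n_observations = 300

total_missing = sum(missing_by_column.values())

# Columns with no value at all; these are the ones flagged as unsupported.
fully_empty = [name for name, count in missing_by_column.items()
               if count == n_observations]

print(total_missing)
print(fully_empty)
```

The four completely empty columns are the same four that appear in the "Unsupported" alerts, so dropping them before further analysis is one plausible cleaning step.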

Reproduction

Analysis started: 2023-12-11 21:32:38.163588
Analysis finished: 2023-12-11 21:32:39.468076
Duration: 1.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
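
The reported duration can be recovered from the two timestamps. A small sketch using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction block above.
started = datetime.fromisoformat("2023-12-11 21:32:38.163588")
finished = datetime.fromisoformat("2023-12-11 21:32:39.468076")

duration = finished - started
duration_text = f"{duration.total_seconds():.1f} seconds"
print(duration_text)
```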

Variables

BDCT_NO
Text

MISSING 

Distinct: 174
Distinct (%): 92.6%
Missing: 112
Missing (%): 37.3%
Memory size: 2.5 KiB

Length

Max length: 118
Median length: 68
Mean length: 41.148936
Min length: 6

Characters and Unicode

Total characters: 7736
Distinct characters: 565
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
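
To illustrate how such category counts can be derived, the sketch below classifies each character of one sample value from this column with Python's `unicodedata.category`. It is not the profiler's own implementation, and the quotation marks in the sample are assumed to be plain ASCII.

```python
import unicodedata
from collections import Counter

# One sample value from this column, typed with ASCII quotation marks
# (an assumption about the underlying data).
text = "▶ \"평창 대표단 파견 용의\"…'핵 단추' 위협"

# Two-letter general category codes: Lo = Other Letter, Zs = Space Separator,
# Po = Other Punctuation, So = Other Symbol.
category_counts = Counter(unicodedata.category(ch) for ch in text)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for the Korean text columns.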

Unique

Unique: 170
Unique (%): 90.4%

Sample

1st row: 1173098
2nd row: 전국의 해돋이 명소에 수십만 명의 해맞이 인파가 몰렸습니다. 문재인 대통령도 2017년을 빛낸 의인 6명과 함께 북한산 해돋이 산행으로 집권 2년차 공식 일정을 시작했습니다.
3rd row: ▶ "평창 대표단 파견 용의"…'핵 단추' 위협
4th row: 북한 김정은이 평창 올림픽에 대표단을 파견할 용의가 있다고 말했습니다. 하지만, 미국을 향해선 미 본토 전역이 사정권 안에 있다고 하는 등 위협을 이어갔습니다.
5th row: ▶ 청와대 "북 대표단 파견·당국 대화 용의 환영"
ValueCountFrequency (%)
74
 
4.2%
25
 
1.4%
인터뷰 16
 
0.9%
기자 16
 
0.9%
북한 13
 
0.7%
김정은 12
 
0.7%
10
 
0.6%
9
 
0.5%
9
 
0.5%
김정은은 9
 
0.5%
Other values (1187) 1589
89.2%

Most occurring characters

ValueCountFrequency (%)
1905
 
24.6%
174
 
2.2%
. 149
 
1.9%
144
 
1.9%
119
 
1.5%
96
 
1.2%
94
 
1.2%
90
 
1.2%
88
 
1.1%
83
 
1.1%
Other values (555) 4794
62.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 5130
66.3%
Space Separator 1905
 
24.6%
Other Punctuation 363
 
4.7%
Decimal Number 154
 
2.0%
Lowercase Letter 58
 
0.7%
Uppercase Letter 38
 
0.5%
Other Symbol 27
 
0.3%
Dash Punctuation 21
 
0.3%
Open Punctuation 20
 
0.3%
Close Punctuation 20
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
174
 
3.4%
144
 
2.8%
119
 
2.3%
96
 
1.9%
94
 
1.8%
90
 
1.8%
88
 
1.7%
83
 
1.6%
78
 
1.5%
75
 
1.5%
Other values (494) 4089
79.7%
Lowercase Letter
ValueCountFrequency (%)
o 7
12.1%
m 6
10.3%
n 6
10.3%
e 5
8.6%
r 4
 
6.9%
h 4
 
6.9%
c 4
 
6.9%
a 4
 
6.9%
k 3
 
5.2%
b 3
 
5.2%
Other values (8) 12
20.7%
Uppercase Letter
ValueCountFrequency (%)
N 10
26.3%
M 9
23.7%
B 9
23.7%
S 2
 
5.3%
D 1
 
2.6%
C 1
 
2.6%
L 1
 
2.6%
U 1
 
2.6%
A 1
 
2.6%
E 1
 
2.6%
Other values (2) 2
 
5.3%
Other Punctuation
ValueCountFrequency (%)
. 149
41.0%
" 74
20.4%
, 43
 
11.8%
: 32
 
8.8%
' 22
 
6.1%
/ 20
 
5.5%
11
 
3.0%
@ 4
 
1.1%
· 4
 
1.1%
% 2
 
0.6%
Decimal Number
ValueCountFrequency (%)
1 48
31.2%
0 28
18.2%
3 19
 
12.3%
2 17
 
11.0%
7 15
 
9.7%
6 6
 
3.9%
5 6
 
3.9%
9 5
 
3.2%
8 5
 
3.2%
4 5
 
3.2%
Open Punctuation
ValueCountFrequency (%)
9
45.0%
( 7
35.0%
[ 4
20.0%
Close Punctuation
ValueCountFrequency (%)
9
45.0%
) 7
35.0%
] 4
20.0%
Other Symbol
ValueCountFrequency (%)
25
92.6%
2
 
7.4%
Space Separator
ValueCountFrequency (%)
1905
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 5130
66.3%
Common 2510
32.4%
Latin 96
 
1.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
174
 
3.4%
144
 
2.8%
119
 
2.3%
96
 
1.9%
94
 
1.8%
90
 
1.8%
88
 
1.7%
83
 
1.6%
78
 
1.5%
75
 
1.5%
Other values (494) 4089
79.7%
Common
ValueCountFrequency (%)
1905
75.9%
. 149
 
5.9%
" 74
 
2.9%
1 48
 
1.9%
, 43
 
1.7%
: 32
 
1.3%
0 28
 
1.1%
25
 
1.0%
' 22
 
0.9%
- 21
 
0.8%
Other values (21) 163
 
6.5%
Latin
ValueCountFrequency (%)
N 10
 
10.4%
M 9
 
9.4%
B 9
 
9.4%
o 7
 
7.3%
m 6
 
6.2%
n 6
 
6.2%
e 5
 
5.2%
r 4
 
4.2%
h 4
 
4.2%
c 4
 
4.2%
Other values (20) 32
33.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 5130
66.3%
ASCII 2546
32.9%
Geometric Shapes 25
 
0.3%
None 22
 
0.3%
Punctuation 11
 
0.1%
Misc Symbols 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1905
74.8%
. 149
 
5.9%
" 74
 
2.9%
1 48
 
1.9%
, 43
 
1.7%
: 32
 
1.3%
0 28
 
1.1%
' 22
 
0.9%
- 21
 
0.8%
/ 20
 
0.8%
Other values (45) 204
 
8.0%
Hangul
ValueCountFrequency (%)
174
 
3.4%
144
 
2.8%
119
 
2.3%
96
 
1.9%
94
 
1.8%
90
 
1.8%
88
 
1.7%
83
 
1.6%
78
 
1.5%
75
 
1.5%
Other values (494) 4089
79.7%
Geometric Shapes
ValueCountFrequency (%)
25
100.0%
Punctuation
ValueCountFrequency (%)
11
100.0%
None
ValueCountFrequency (%)
9
40.9%
9
40.9%
· 4
18.2%
Misc Symbols
ValueCountFrequency (%)
2
100.0%

NEWS_CGR_CD
Text

MISSING 

Distinct: 11
Distinct (%): 57.9%
Missing: 281
Missing (%): 93.7%
Memory size: 2.5 KiB

Length

Max length: 8
Median length: 8
Mean length: 5.6315789
Min length: 3

Characters and Unicode

Total characters: 107
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 47.4%

Sample

1st row: mbn00006
2nd row: mbn00009
3rd row: 유호정
4th row: mbn00006
5th row: 송주영
ValueCountFrequency (%)
mbn00006 8
42.1%
mbn00009 2
 
10.5%
유호정 1
 
5.3%
송주영 1
 
5.3%
김한준 1
 
5.3%
차민아 1
 
5.3%
오태윤 1
 
5.3%
주진희 1
 
5.3%
이혁준 1
 
5.3%
이정호 1
 
5.3%

Most occurring characters

ValueCountFrequency (%)
0 40
37.4%
m 10
 
9.3%
n 10
 
9.3%
b 10
 
9.3%
6 8
 
7.5%
2
 
1.9%
2
 
1.9%
2
 
1.9%
2
 
1.9%
9 2
 
1.9%
Other values (18) 19
17.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 50
46.7%
Lowercase Letter 30
28.0%
Other Letter 27
25.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (12) 12
44.4%
Decimal Number
ValueCountFrequency (%)
0 40
80.0%
6 8
 
16.0%
9 2
 
4.0%
Lowercase Letter
ValueCountFrequency (%)
m 10
33.3%
n 10
33.3%
b 10
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common 50
46.7%
Latin 30
28.0%
Hangul 27
25.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (12) 12
44.4%
Common
ValueCountFrequency (%)
0 40
80.0%
6 8
 
16.0%
9 2
 
4.0%
Latin
ValueCountFrequency (%)
m 10
33.3%
n 10
33.3%
b 10
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 80
74.8%
Hangul 27
 
25.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 40
50.0%
m 10
 
12.5%
n 10
 
12.5%
b 10
 
12.5%
6 8
 
10.0%
9 2
 
2.5%
Hangul
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (12) 12
44.4%

NEWS_NO
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 280
Missing (%): 93.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 11802334
Minimum: 3424555
Maximum: 20180101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 3424555
5-th percentile: 3424556.9
Q1: 3424566.2
Median: 11802340
Q3: 20180101
95-th percentile: 20180101
Maximum: 20180101
Range: 16755546
Interquartile range (IQR): 16755535

Descriptive statistics

Standard deviation: 8595407.9
Coefficient of variation (CV): 0.72828037
Kurtosis: -2.2352941
Mean: 11802334
Median Absolute Deviation (MAD): 8377760.5
Skewness: -1.819558 × 10^-12
Sum: 2.3604667 × 10^8
Variance: 7.3881038 × 10^13
Monotonicity: Not monotonic
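
Because only 20 values are present, several of these statistics can be recomputed by hand. The sketch below rebuilds the non-missing values from the value tables of this section (ten distinct identifiers plus 20180101 repeated ten times) and uses Python's `statistics` module. It assumes the report uses the sample (n - 1) standard deviation, which is consistent with the figures shown.

```python
import statistics

# Non-missing NEWS_NO values, copied from the value tables of this section.
news_no = [
    3424555, 3424557, 3424558, 3424560, 3424561,
    3424568, 3424570, 3424576, 3424578, 3424580,
] + [20180101] * 10

total = sum(news_no)
mean = statistics.mean(news_no)
median = statistics.median(news_no)
value_range = max(news_no) - min(news_no)
std = statistics.stdev(news_no)          # sample standard deviation (n - 1)
cv = std / mean                          # coefficient of variation

print(total, round(mean), median, value_range)
print(round(std, 1), round(cv, 4))
```

The values split into two equally sized clusters, one around 3.4 million and one at exactly 20180101, which explains why the mean and median sit halfway between them and describe neither group.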
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
20180101 10
 
3.3%
3424555 1
 
0.3%
3424570 1
 
0.3%
3424576 1
 
0.3%
3424561 1
 
0.3%
3424578 1
 
0.3%
3424557 1
 
0.3%
3424580 1
 
0.3%
3424558 1
 
0.3%
3424568 1
 
0.3%
(Missing) 280
93.3%
ValueCountFrequency (%)
3424555 1
0.3%
3424557 1
0.3%
3424558 1
0.3%
3424560 1
0.3%
3424561 1
0.3%
3424568 1
0.3%
3424570 1
0.3%
3424576 1
0.3%
3424578 1
0.3%
3424580 1
0.3%
ValueCountFrequency (%)
20180101 10
3.3%
3424580 1
 
0.3%
3424578 1
 
0.3%
3424576 1
 
0.3%
3424570 1
 
0.3%
3424568 1
 
0.3%
3424561 1
 
0.3%
3424560 1
 
0.3%
3424558 1
 
0.3%
3424557 1
 
0.3%

BDCT_DATE
Text

MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 280
Missing (%): 93.3%
Memory size: 2.5 KiB

Length

Max length: 82
Median length: 45
Mean length: 45
Min length: 8

Characters and Unicode

Total characters: 900
Distinct characters: 37
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 50.0%

Sample

1st row: 20180101
2nd row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173098
3rd row: 20180101
4th row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173099
5th row: 20180101
ValueCountFrequency (%)
20180101 10
50.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173098 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173099 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173100 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173101 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173102 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173103 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173104 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173105 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1173106 1
 
5.0%
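
The value table shows that BDCT_DATE holds two different kinds of content: eight-digit dates and player URLs. A minimal sketch of how the two could be told apart during cleaning is shown below; the `classify` helper is hypothetical and only covers the two patterns visible in this report.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlparse


def classify(value: str) -> str:
    """Label a raw BDCT_DATE value as 'date', 'url', or 'other'."""
    if len(value) == 8 and value.isdigit():
        try:
            datetime.strptime(value, "%Y%m%d")
            return "date"
        except ValueError:
            return "other"  # eight digits, but not a valid calendar date
    parts = urlparse(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    return "other"


# Two sample values copied from this section.
samples = [
    "20180101",
    "http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173098",
]
labels = [classify(value) for value in samples]

# The query string of the URL carries a content identifier.
content_id = parse_qs(urlparse(samples[1]).query)["content_id"][0]

print(labels, content_id)
```

The extracted `content_id` equals the first sample value of BDCT_NO, which suggests that the URL rows and the identifier rows describe the same broadcast items.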

Most occurring characters

ValueCountFrequency (%)
t 80
 
8.9%
n 80
 
8.9%
1 59
 
6.6%
0 51
 
5.7%
e 50
 
5.6%
o 50
 
5.6%
c 50
 
5.6%
. 40
 
4.4%
/ 40
 
4.4%
w 30
 
3.3%
Other values (27) 370
41.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 560
62.2%
Decimal Number 170
 
18.9%
Other Punctuation 110
 
12.2%
Connector Punctuation 30
 
3.3%
Math Symbol 20
 
2.2%
Uppercase Letter 10
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 80
14.3%
n 80
14.3%
e 50
 
8.9%
o 50
 
8.9%
c 50
 
8.9%
w 30
 
5.4%
m 30
 
5.4%
s 20
 
3.6%
i 20
 
3.6%
d 20
 
3.6%
Other values (9) 130
23.2%
Decimal Number
ValueCountFrequency (%)
1 59
34.7%
0 51
30.0%
2 21
 
12.4%
8 11
 
6.5%
7 11
 
6.5%
3 11
 
6.5%
9 3
 
1.8%
4 1
 
0.6%
5 1
 
0.6%
6 1
 
0.6%
Other Punctuation
ValueCountFrequency (%)
. 40
36.4%
/ 40
36.4%
: 10
 
9.1%
? 10
 
9.1%
& 10
 
9.1%
Connector Punctuation
ValueCountFrequency (%)
_ 30
100.0%
Math Symbol
ValueCountFrequency (%)
= 20
100.0%
Uppercase Letter
ValueCountFrequency (%)
C 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 570
63.3%
Common 330
36.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 80
14.0%
n 80
14.0%
e 50
 
8.8%
o 50
 
8.8%
c 50
 
8.8%
w 30
 
5.3%
m 30
 
5.3%
s 20
 
3.5%
i 20
 
3.5%
d 20
 
3.5%
Other values (10) 140
24.6%
Common
ValueCountFrequency (%)
1 59
17.9%
0 51
15.5%
. 40
12.1%
/ 40
12.1%
_ 30
9.1%
2 21
 
6.4%
= 20
 
6.1%
8 11
 
3.3%
7 11
 
3.3%
3 11
 
3.3%
Other values (7) 36
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 80
 
8.9%
n 80
 
8.9%
1 59
 
6.6%
0 51
 
5.7%
e 50
 
5.6%
o 50
 
5.6%
c 50
 
5.6%
. 40
 
4.4%
/ 40
 
4.4%
w 30
 
3.3%
Other values (27) 370
41.1%

BDCT_TIME
Real number (ℝ)

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 290
Missing (%): 96.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1935.9
Minimum: 1930
Maximum: 1943
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1930
5-th percentile: 1930.45
Q1: 1932.5
Median: 1935.5
Q3: 1938.75
95-th percentile: 1942.1
Maximum: 1943
Range: 13
Interquartile range (IQR): 6.25

Descriptive statistics

Standard deviation: 4.3320511
Coefficient of variation (CV): 0.0022377453
Kurtosis: -1.0159488
Mean: 1935.9
Median Absolute Deviation (MAD): 3.5
Skewness: 0.23862779
Sum: 19359
Variance: 18.766667
Monotonicity: Strictly increasing
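
The quantile figures above are consistent with linear interpolation between neighbouring ranks. The sketch below recomputes them from the ten values listed in this section; the `percentile` helper is written out for illustration and is not taken from the profiler. The values are assumed to appear in the dataset in the order listed, which the "Strictly increasing" label suggests.

```python
import statistics

# The ten non-missing BDCT_TIME values listed in this section.
bdct_time = [1930, 1931, 1932, 1934, 1935, 1936, 1938, 1939, 1941, 1943]


def percentile(sorted_values, q):
    """Percentile q (0 to 100) using linear interpolation between ranks."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


q1 = percentile(bdct_time, 25)
median = percentile(bdct_time, 50)
q3 = percentile(bdct_time, 75)
iqr = q3 - q1
variance = statistics.variance(bdct_time)  # sample variance (n - 1)

# Monotonicity check: every value is larger than the one before it.
strictly_increasing = all(a < b for a, b in zip(bdct_time, bdct_time[1:]))

print(q1, median, q3, iqr)
print(round(variance, 6), strictly_increasing)
```

The values look like HHMM clock times (19:30 to 19:43) rather than quantities, so the mean and variance are of limited use here; that reading is an interpretation of the data, not something the report states.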
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
1930 1
 
0.3%
1931 1
 
0.3%
1932 1
 
0.3%
1934 1
 
0.3%
1935 1
 
0.3%
1936 1
 
0.3%
1938 1
 
0.3%
1939 1
 
0.3%
1941 1
 
0.3%
1943 1
 
0.3%
(Missing) 290
96.7%
ValueCountFrequency (%)
1930 1
0.3%
1931 1
0.3%
1932 1
0.3%
1934 1
0.3%
1935 1
0.3%
1936 1
0.3%
1938 1
0.3%
1939 1
0.3%
1941 1
0.3%
1943 1
0.3%
ValueCountFrequency (%)
1943 1
0.3%
1941 1
0.3%
1939 1
0.3%
1938 1
0.3%
1936 1
0.3%
1935 1
0.3%
1934 1
0.3%
1932 1
0.3%
1931 1
0.3%
1930 1
0.3%

NWS_SJ
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 290
Missing (%): 96.7%
Memory size: 2.5 KiB

Length

Max length: 37
Median length: 29
Mean length: 28.6
Min length: 22

Characters and Unicode

Total characters: 286
Distinct characters: 124
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 100.0%

Sample

1st row: 김주하 앵커가 전하는 1월 1일 MBN 뉴스8 주요뉴스
2nd row: "황금 개띠의 해 첫 주인공은 나야 나"
3rd row: 문 대통령, 의인들과 신년맞이 '해돋이 산행'…시민과 '깜짝 통화'
4th row: 김정은, "평창 올림픽 참가 용의"…당국 대화 시사
5th row: 북 김정은 "미 전역이 사정권"…'핵 단추' 위협
ValueCountFrequency (%)
김정은 5
 
7.6%
2
 
3.0%
솔솔 1
 
1.5%
사정권"…'핵 1
 
1.5%
단추 1
 
1.5%
위협 1
 
1.5%
청와대 1
 
1.5%
대화제의 1
 
1.5%
환영"…여야 1
 
1.5%
신년사 1
 
1.5%
Other values (51) 51
77.3%

Most occurring characters

ValueCountFrequency (%)
56
 
19.6%
' 16
 
5.6%
" 10
 
3.5%
8
 
2.8%
8
 
2.8%
7
 
2.4%
7
 
2.4%
7
 
2.4%
5
 
1.7%
5
 
1.7%
Other values (114) 157
54.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 182
63.6%
Space Separator 56
 
19.6%
Other Punctuation 39
 
13.6%
Uppercase Letter 6
 
2.1%
Decimal Number 3
 
1.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8
 
4.4%
8
 
4.4%
7
 
3.8%
7
 
3.8%
5
 
2.7%
5
 
2.7%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (99) 129
70.9%
Other Punctuation
ValueCountFrequency (%)
' 16
41.0%
" 10
25.6%
7
17.9%
, 3
 
7.7%
· 2
 
5.1%
? 1
 
2.6%
Uppercase Letter
ValueCountFrequency (%)
A 1
16.7%
E 1
16.7%
U 1
16.7%
N 1
16.7%
B 1
16.7%
M 1
16.7%
Decimal Number
ValueCountFrequency (%)
1 2
66.7%
8 1
33.3%
Space Separator
ValueCountFrequency (%)
56
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 182
63.6%
Common 98
34.3%
Latin 6
 
2.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8
 
4.4%
8
 
4.4%
7
 
3.8%
7
 
3.8%
5
 
2.7%
5
 
2.7%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (99) 129
70.9%
Common
ValueCountFrequency (%)
56
57.1%
' 16
 
16.3%
" 10
 
10.2%
7
 
7.1%
, 3
 
3.1%
1 2
 
2.0%
· 2
 
2.0%
? 1
 
1.0%
8 1
 
1.0%
Latin
ValueCountFrequency (%)
A 1
16.7%
E 1
16.7%
U 1
16.7%
N 1
16.7%
B 1
16.7%
M 1
16.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 182
63.6%
ASCII 95
33.2%
Punctuation 7
 
2.4%
None 2
 
0.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
56
58.9%
' 16
 
16.8%
" 10
 
10.5%
, 3
 
3.2%
1 2
 
2.1%
? 1
 
1.1%
A 1
 
1.1%
E 1
 
1.1%
U 1
 
1.1%
8 1
 
1.1%
Other values (3) 3
 
3.2%
Hangul
ValueCountFrequency (%)
8
 
4.4%
8
 
4.4%
7
 
3.8%
7
 
3.8%
5
 
2.7%
5
 
2.7%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (99) 129
70.9%
Punctuation
ValueCountFrequency (%)
7
100.0%
None
ValueCountFrequency (%)
· 2
100.0%

NWS_CN
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

<NA>: 290
【 앵커멘트 】: 9
▶ '무술년 첫 일출'…전국 해맞이 인파 북적: 1

Length

Max length: 25
Median length: 4
Mean length: 4.19
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: <NA>
2nd row: ▶ '무술년 첫 일출'…전국 해맞이 인파 북적
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

ValueCountFrequency (%)
<NA> 290
96.7%
【 앵커멘트 】 9
 
3.0%
▶ '무술년 첫 일출'…전국 해맞이 인파 북적 1
 
0.3%
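
The 85.7% imbalance figure in the alerts can be reproduced from these three counts. One formula that yields the same number is one minus the Shannon entropy of the class distribution divided by its maximum possible value, log2 of the number of classes. The sketch below uses that formula; treating it as the profiler's exact definition is an assumption.

```python
import math

# Category counts for NWS_CN, copied from the Common Values table.
counts = {
    "<NA>": 290,
    "【 앵커멘트 】": 9,
    "▶ '무술년 첫 일출'…전국 해맞이 인파 북적": 1,
}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in class_counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

Note that the dominant category is the literal string `<NA>`, which has length 4. This explains why the column reports 0 missing values and a minimum length of 4 even though 290 of its 300 entries carry no content.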

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
na 290
89.5%
9
 
2.8%
앵커멘트 9
 
2.8%
9
 
2.8%
1
 
0.3%
무술년 1
 
0.3%
1
 
0.3%
일출'…전국 1
 
0.3%
해맞이 1
 
0.3%
인파 1
 
0.3%

NWS_JRNL_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 300
Missing (%): 100.0%
Memory size: 2.8 KiB

REG_DATE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 300
Missing (%): 100.0%
Memory size: 2.8 KiB

MVP_CRS_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 300
Missing (%): 100.0%
Memory size: 2.8 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 300
Missing (%): 100.0%
Memory size: 2.8 KiB

Interactions

[Figure: four interaction plots, not reproduced in this text version]

Correlations

             NEWS_CGR_CD  NEWS_NO  BDCT_DATE  BDCT_TIME  NWS_SJ  NWS_CN
NEWS_CGR_CD        1.000    1.000      1.000      1.000   1.000   0.000
NEWS_NO            1.000    1.000      1.000        NaN     NaN     NaN
BDCT_DATE          1.000    1.000      1.000        NaN     NaN     NaN
BDCT_TIME          1.000      NaN        NaN      1.000   1.000     NaN
NWS_SJ             1.000      NaN        NaN      1.000   1.000   1.000
NWS_CN             0.000      NaN        NaN        NaN   1.000   1.000

           NEWS_NO  BDCT_TIME  NWS_CN
NEWS_NO      1.000      0.018   1.000
BDCT_TIME    0.018      1.000   0.000
NWS_CN       1.000      0.000   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
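
Nullity correlation can be sketched as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The columns below are hypothetical and chosen only to show the two extremes; they are not taken from this dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for always-present or always-missing columns
    return cov / math.sqrt(var_x * var_y)


def nullity(column):
    """1 where a value is present, 0 where it is missing (None)."""
    return [0 if value is None else 1 for value in column]


# Hypothetical columns; None marks a missing cell.
col_a = [1, None, 3, None, 5, None]
col_b = ["x", None, "y", None, "z", None]   # missing exactly where col_a is missing
col_c = [None, 7, None, 8, None, 9]         # present exactly where col_a is missing

same_pattern = pearson(nullity(col_a), nullity(col_b))
opposite_pattern = pearson(nullity(col_a), nullity(col_c))

print(same_pattern, opposite_pattern)
```

A value near 1 means two columns tend to be filled in together, and a value near -1 means one is filled when the other is empty. Columns that are entirely empty or entirely filled have no defined nullity correlation.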

Sample

Row | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 1173098 | mbn00006 | 3424555 | 20180101 | 1930 | 김주하 앵커가 전하는 1월 1일 MBN 뉴스8 주요뉴스 | ▶ '무술년 첫 일출'…전국 해맞이 인파 북적 | <NA> | <NA> | <NA> | <NA>
2 | 전국의 해돋이 명소에 수십만 명의 해맞이 인파가 몰렸습니다. 문재인 대통령도 2017년을 빛낸 의인 6명과 함께 북한산 해돋이 산행으로 집권 2년차 공식 일정을 시작했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
4 | ▶ "평창 대표단 파견 용의"…'핵 단추' 위협 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | 북한 김정은이 평창 올림픽에 대표단을 파견할 용의가 있다고 말했습니다. 하지만, 미국을 향해선 미 본토 전역이 사정권 안에 있다고 하는 등 위협을 이어갔습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | ▶ 청와대 "북 대표단 파견·당국 대화 용의 환영" | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 청와대는 북한의 평창올림픽 참가 용의에, 환영의 뜻을 밝혔습니다. 여당도 긍정적으로 평가했지만, 자유한국당은 얄팍한 위장 평화 공세라며 반박했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
Row | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
290 | 이 전 대통령은 원전 계약을 성사시키는 대신 국방분야 협력을 약속한 것 아니냐는 주장에 대해 "알지 못한다"며 "이면계약은 없었다"고 말했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
291 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
292 | 그러면서 "내가 이야기하면 폭로여서 이야기할 수 없다"며 "문재인 정부가 정신을 차리고 수습한다고 하니 잘 정리될 것"이라고 덧붙였습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
293 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
294 | ▶ 스탠딩 : 최은미 / 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
295 | - "잠시 주춤하는 듯했던 자유한국당도 기다렸다는 듯 국정조사를 촉구하고 나서며 당분간 논란은 계속될 것으로 보입니다. MBN뉴스 최은미입니다." [ cem@mbn.co.kr ] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
296 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
297 | 영상취재 : 정재성, 박상곤 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
298 | 영상편집 : 이소영 | 최은미 | 20180101 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1173107 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
299 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

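Row | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | # duplicates
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 112
2 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9
0 | ▶ 인터뷰 : 김정은 / 북한 노동당 위원장 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 5
1 | ▶ 인터뷰 : 문재인 / 대통령 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2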
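
The table lists four distinct rows that occur more than once, which matches the "Duplicate rows: 4" figure in the overview. A minimal sketch of how such a table can be built is shown below, using a hypothetical two-column mini-dataset rather than the real file.

```python
from collections import Counter

# Hypothetical mini-dataset: each row is a tuple of cell values, with None for
# a missing cell. The contents are illustrative, not taken from the real file.
rows = [
    (None, None),
    ("【 기자 】", None),
    (None, None),
    ("1173098", "mbn00006"),
    ("【 기자 】", None),
    (None, None),
]

# Count how often each complete row occurs, then keep rows seen more than once.
occurrences = Counter(rows)
duplicated = {row: count for row, count in occurrences.items() if count > 1}

# Most frequently occurring duplicated rows first.
most_frequent = sorted(duplicated.items(), key=lambda item: item[1], reverse=True)

for row, count in most_frequent:
    print(row, count)
```

In this dataset the most frequent duplicate is the completely empty row (112 occurrences), so removing all-empty rows first would eliminate most of the duplication.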