Overview

Dataset statistics

Number of variables: 11
Number of observations: 226
Missing cells: 1632
Missing cells (%): 65.6%
Duplicate rows: 2
Duplicate rows (%): 0.9%
Total size in memory: 20.9 KiB
Average record size in memory: 94.6 B
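As a quick consistency check, the missing-cell share can be recomputed from the per-column missing counts reported in the Alerts and Variables sections. 1632 missing cells out of 226 × 11 = 2486 cells gives 65.6%. A minimal sketch:

```python
# Per-column missing counts as listed in the Alerts and Variables sections.
# Columns without a missing-value alert (NEWS_NO, NWS_CN, REG_DATE) report 0.
missing_per_column = {
    "BDCT_NO": 90,
    "NEWS_CGR_CD": 226,
    "NEWS_NO": 0,
    "BDCT_DATE": 208,
    "BDCT_TIME": 216,
    "NWS_SJ": 216,
    "NWS_CN": 0,
    "NWS_JRNL_NM": 226,
    "REG_DATE": 0,
    "MVP_CRS_NM": 224,
    "Unnamed: 10": 226,
}
n_rows = 226

total_cells = n_rows * len(missing_per_column)    # 226 rows x 11 columns
missing_cells = sum(missing_per_column.values())  # sum of per-column counts
missing_pct = 100 * missing_cells / total_cells

print(f"Missing cells: {missing_cells} of {total_cells} ({missing_pct:.1f}%)")
```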

Variable types

Text: 4
Unsupported: 3
Categorical: 3
Numeric: 1

Dataset

Description: Sample data
Author: MBN
URL: https://kdx.kr/data/view/167

Alerts

Dataset has 2 (0.9%) duplicate rows [Duplicates]
REG_DATE is highly overall correlated with BDCT_TIME and 1 other field (NWS_CN) [High correlation]
NWS_CN is highly overall correlated with REG_DATE [High correlation]
BDCT_TIME is highly overall correlated with REG_DATE [High correlation]
NEWS_NO is highly imbalanced (77.9%) [Imbalance]
NWS_CN is highly imbalanced (86.2%) [Imbalance]
REG_DATE is highly imbalanced (92.7%) [Imbalance]
BDCT_NO has 90 (39.8%) missing values [Missing]
NEWS_CGR_CD has 226 (100.0%) missing values [Missing]
BDCT_DATE has 208 (92.0%) missing values [Missing]
BDCT_TIME has 216 (95.6%) missing values [Missing]
NWS_SJ has 216 (95.6%) missing values [Missing]
NWS_JRNL_NM has 226 (100.0%) missing values [Missing]
MVP_CRS_NM has 224 (99.1%) missing values [Missing]
Unnamed: 10 has 226 (100.0%) missing values [Missing]
NEWS_CGR_CD is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
NWS_JRNL_NM is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
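The imbalance percentages above match one minus the normalised Shannon entropy of each column's value counts. The sketch below reproduces all three reported figures from the counts in the Common Values tables. Treat it as a reconstruction that fits these numbers, not as a quotation of ydata-profiling's source code.

```python
import math

def imbalance(counts):
    """1 minus the Shannon entropy of the value counts, normalised by log(k).

    Returns 0.0 for a perfectly balanced column and approaches 1.0 as one
    value dominates. A column with a single distinct value is treated as
    fully imbalanced.
    """
    k = len(counts)
    if k < 2:
        return 1.0
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Value counts taken from each column's Common Values table.
print(round(100 * imbalance([218, 8]), 1))           # NEWS_NO
print(round(100 * imbalance([216, 7, 1, 1, 1]), 1))  # NWS_CN
print(round(100 * imbalance([224, 2]), 1))           # REG_DATE
```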

Reproduction

Analysis started: 2023-12-11 22:28:29.508974
Analysis finished: 2023-12-11 22:28:31.783223
Duration: 2.27 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
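The reported duration is the difference between the two timestamps. You can check it with the standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above.
started = datetime.fromisoformat("2023-12-11 22:28:29.508974")
finished = datetime.fromisoformat("2023-12-11 22:28:31.783223")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```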

Variables

BDCT_NO
Text

MISSING 

Distinct: 130
Distinct (%): 95.6%
Missing: 90
Missing (%): 39.8%
Memory size: 1.9 KiB

Length

Max length: 109
Median length: 68
Mean length: 42.558824
Min length: 1

Characters and Unicode

Total characters: 5788
Distinct characters: 537
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
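The "Most occurring categories" tables group characters by their Unicode general category. Python's `unicodedata.category` returns the two-letter code for each character, and the mapping below translates those codes into the long names used in this report. Scripts and blocks are not covered: the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Sm": "Math Symbol",
    "So": "Other Symbol",
}

def category_counts(texts):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# A value taken from the BDCT_NO column.
print(category_counts(["MBN뉴스 이미혜입니다."]))
```

Hangul syllables fall under "Other Letter" (Lo). Lenticular brackets such as 【 and 】 are Open and Close Punctuation. That accounts for the extra entries in those two tables beyond ( ) [ ].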

Unique

Unique: 129
Unique (%): 94.9%

Sample

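1st row: 1023827
2nd row: 1023828
3rd row: 1023829
4th row: 바람이 불어 다소 쌀쌀했지만 화창한 날씨 덕분에 많은 시민들이 개나리가 활짝 핀 산을 찾는 등, 휴일을 즐겼는데요, (English: "The wind made it a little chilly, but thanks to the sunny weather many citizens enjoyed the holiday, for example by visiting mountains where the forsythias were in full bloom,")
5th row: 다양했던 시민들 표정 김순철 기자가 담았습니다. (English: "Reporter 김순철 captured the citizens' varied expressions.")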
Value | Count | Frequency (%)
| 46 | 3.5%
| 12 | 0.9%
기자 | 9 | 0.7%
인터뷰 | 9 | 0.7%
있습니다 | 9 | 0.7%
| 7 | 0.5%
| 7 | 0.5%
mbn뉴스 | 6 | 0.5%
북한이 | 5 | 0.4%
불법 | 5 | 0.4%
Other values (992) | 1216 | 91.4%
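The sample shows that BDCT_NO holds two kinds of value: numeric IDs (1023827 to 1023829) and sentences of article text. This mix explains the mean length of about 42.6 characters. The sketch below separates the two before further analysis. Its rule that an ID is exactly seven digits is a hypothetical one, based only on the three sample IDs.

```python
import re

# Hypothetical rule: an ID is a run of exactly 7 digits, as in the sample
# values 1023827-1023829. Anything else is treated as spilled article text.
ID_PATTERN = re.compile(r"\d{7}")

def split_ids(values):
    """Separate ID-like values from free text in a column such as BDCT_NO."""
    ids, text = [], []
    for value in values:
        if value is None:
            continue  # skip missing cells
        stripped = value.strip()
        (ids if ID_PATTERN.fullmatch(stripped) else text).append(stripped)
    return ids, text

sample = ["1023827", "1023828", "1023829",
          " 다양했던 시민들 표정 김순철 기자가 담았습니다.", None]
ids, text = split_ids(sample)
print(len(ids), len(text))
```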

Most occurring characters

Value | Count | Frequency (%)
(space) | 1383 | 23.9%
| 127 | 2.2%
| 125 | 2.2%
. | 119 | 2.1%
| 99 | 1.7%
| 71 | 1.2%
| 71 | 1.2%
| 63 | 1.1%
| 62 | 1.1%
| 60 | 1.0%
Other values (527) | 3608 | 62.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3762 | 65.0%
Space Separator | 1383 | 23.9%
Other Punctuation | 252 | 4.4%
Decimal Number | 181 | 3.1%
Lowercase Letter | 92 | 1.6%
Open Punctuation | 31 | 0.5%
Close Punctuation | 31 | 0.5%
Uppercase Letter | 29 | 0.5%
Other Symbol | 14 | 0.2%
Dash Punctuation | 13 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 127 | 3.4%
| 125 | 3.3%
| 99 | 2.6%
| 71 | 1.9%
| 71 | 1.9%
| 63 | 1.7%
| 62 | 1.6%
| 60 | 1.6%
| 58 | 1.5%
| 53 | 1.4%
Other values (465) | 2973 | 79.0%

Lowercase Letter
Value | Count | Frequency (%)
o | 13 | 14.1%
m | 9 | 9.8%
n | 9 | 9.8%
k | 8 | 8.7%
b | 7 | 7.6%
c | 7 | 7.6%
r | 6 | 6.5%
e | 5 | 5.4%
g | 4 | 4.3%
a | 4 | 4.3%
Other values (12) | 20 | 21.7%

Other Punctuation
Value | Count | Frequency (%)
. | 119 | 47.2%
" | 36 | 14.3%
, | 28 | 11.1%
' | 24 | 9.5%
/ | 13 | 5.2%
: | 13 | 5.2%
@ | 5 | 2.0%
? | 4 | 1.6%
· | 4 | 1.6%
! | 3 | 1.2%
Other values (2) | 3 | 1.2%

Decimal Number
Value | Count | Frequency (%)
1 | 51 | 28.2%
2 | 31 | 17.1%
3 | 28 | 15.5%
0 | 21 | 11.6%
8 | 15 | 8.3%
4 | 12 | 6.6%
7 | 8 | 4.4%
9 | 5 | 2.8%
6 | 5 | 2.8%
5 | 5 | 2.8%

Uppercase Letter
Value | Count | Frequency (%)
B | 8 | 27.6%
M | 8 | 27.6%
N | 8 | 27.6%
A | 3 | 10.3%
V | 1 | 3.4%
T | 1 | 3.4%

Open Punctuation
Value | Count | Frequency (%)
( | 10 | 32.3%
| 8 | 25.8%
| 7 | 22.6%
[ | 6 | 19.4%

Close Punctuation
Value | Count | Frequency (%)
) | 10 | 32.3%
| 8 | 25.8%
| 7 | 22.6%
] | 6 | 19.4%

Other Symbol
Value | Count | Frequency (%)
| 12 | 85.7%
| 2 | 14.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1383 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3762 | 65.0%
Common | 1905 | 32.9%
Latin | 121 | 2.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 127 | 3.4%
| 125 | 3.3%
| 99 | 2.6%
| 71 | 1.9%
| 71 | 1.9%
| 63 | 1.7%
| 62 | 1.6%
| 60 | 1.6%
| 58 | 1.5%
| 53 | 1.4%
Other values (465) | 2973 | 79.0%

Common
Value | Count | Frequency (%)
(space) | 1383 | 72.6%
. | 119 | 6.2%
1 | 51 | 2.7%
" | 36 | 1.9%
2 | 31 | 1.6%
3 | 28 | 1.5%
, | 28 | 1.5%
' | 24 | 1.3%
0 | 21 | 1.1%
8 | 15 | 0.8%
Other values (24) | 169 | 8.9%

Latin
Value | Count | Frequency (%)
o | 13 | 10.7%
m | 9 | 7.4%
n | 9 | 7.4%
B | 8 | 6.6%
k | 8 | 6.6%
M | 8 | 6.6%
N | 8 | 6.6%
b | 7 | 5.8%
c | 7 | 5.8%
r | 6 | 5.0%
Other values (18) | 38 | 31.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3762 | 65.0%
ASCII | 1976 | 34.1%
None | 34 | 0.6%
Geometric Shapes | 12 | 0.2%
Punctuation | 2 | < 0.1%
Misc Symbols | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1383 | 70.0%
. | 119 | 6.0%
1 | 51 | 2.6%
" | 36 | 1.8%
2 | 31 | 1.6%
3 | 28 | 1.4%
, | 28 | 1.4%
' | 24 | 1.2%
0 | 21 | 1.1%
8 | 15 | 0.8%
Other values (44) | 240 | 12.1%

Hangul
Value | Count | Frequency (%)
| 127 | 3.4%
| 125 | 3.3%
| 99 | 2.6%
| 71 | 1.9%
| 71 | 1.9%
| 63 | 1.7%
| 62 | 1.6%
| 60 | 1.6%
| 58 | 1.5%
| 53 | 1.4%
Other values (465) | 2973 | 79.0%

Geometric Shapes
Value | Count | Frequency (%)
| 12 | 100.0%

None
Value | Count | Frequency (%)
| 8 | 23.5%
| 8 | 23.5%
| 7 | 20.6%
| 7 | 20.6%
· | 4 | 11.8%

Punctuation
Value | Count | Frequency (%)
| 2 | 100.0%

Misc Symbols
Value | Count | Frequency (%)
| 2 | 100.0%

NEWS_CGR_CD
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 226
Missing (%): 100.0%
Memory size: 2.1 KiB

NEWS_NO
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
Top values:
<NA> (218)
20120407 (8)

Length

Max length: 8
Median length: 4
Mean length: 4.1415929
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 218 | 96.5%
20120407 | 8 | 3.5%
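NEWS_NO reports 0 missing values, yet `<NA>` makes up 96.5% of its values, and the minimum length of 4 equals the length of the text `<NA>`. The same pattern appears in NWS_CN and REG_DATE. The report alone does not settle whether these are literal strings or true missing values. If they are strings, converting them before profiling would move them into the missing-value statistics. A sketch of that hypothetical cleanup in plain Python:

```python
from collections import Counter

# Hypothetical cleanup: treat the literal string "<NA>" as a missing value.
NA_TOKEN = "<NA>"

def normalise_missing(values):
    """Replace literal "<NA>" strings with None, leaving other values as-is."""
    return [None if v == NA_TOKEN else v for v in values]

# Rebuilt from the NEWS_NO Common Values table: 218 x "<NA>", 8 x "20120407".
news_no = [NA_TOKEN] * 218 + ["20120407"] * 8
cleaned = normalise_missing(news_no)

missing = sum(v is None for v in cleaned)
print(missing, Counter(v for v in cleaned if v is not None))
```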

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 218 | 96.5%
20120407 | 8 | 3.5%
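The plot table lists `<NA>` as `na`. That result is consistent with a word count that lowercases each value, splits it on whitespace and strips ASCII punctuation from each token. The same rule would explain why NWS_CN's word table counts 앵커멘트 separately from two unlabelled entries of 7 (the brackets 【 and 】, which are not ASCII punctuation). The sketch below is an approximation inferred from these tables and may differ from ydata-profiling's actual tokenizer.

```python
import string
from collections import Counter

def word_counts(values):
    """Lowercase, split on whitespace, strip ASCII punctuation from each token.

    Assumed approximation of how the word tables were built; it reproduces
    "<NA>" -> "na" but may differ from ydata-profiling in detail.
    """
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts

print(word_counts(["<NA>", "<NA>", "20120407", "【 앵커멘트 】"]))
```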

BDCT_DATE
Text

MISSING 

Distinct: 9
Distinct (%): 50.0%
Missing: 208
Missing (%): 92.0%
Memory size: 1.9 KiB

Length

Max length: 82
Median length: 8
Mean length: 40.888889
Min length: 8

Characters and Unicode

Total characters: 736
Distinct characters: 37
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 8
Unique (%): 44.4%

Sample

1st row: 20120407
2nd row: 20120407
3rd row: 20120407
4th row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023829
5th row: 20120407

Value | Count | Frequency (%)
20120407 | 10 | 55.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023829 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023830 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023831 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023832 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023833 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023834 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023835 | 1 | 5.6%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023836 | 1 | 5.6%
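BDCT_DATE mixes broadcast dates with player URLs. Each URL's `content_id` (1023829 to 1023836) appears to follow the same numbering as the BDCT_NO values. Extracting that parameter is one way to link the shifted URLs back to their IDs, although this pairing is an assumption about the data that the report does not state. A sketch using `urllib.parse`:

```python
from urllib.parse import urlparse, parse_qs

def content_id(value):
    """Return the content_id query parameter of a player URL, or None."""
    query = parse_qs(urlparse(value).query)
    ids = query.get("content_id")
    return ids[0] if ids else None

url = ("http://www.mbn.co.kr/player/movieContents.mbn"
       "?content_cls_cd=20&content_id=1023829")
print(content_id(url))
print(content_id("20120407"))  # a plain date has no query string
```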

Most occurring characters

Value | Count | Frequency (%)
t | 64 | 8.7%
n | 64 | 8.7%
0 | 47 | 6.4%
e | 40 | 5.4%
o | 40 | 5.4%
c | 40 | 5.4%
2 | 38 | 5.2%
. | 32 | 4.3%
/ | 32 | 4.3%
_ | 24 | 3.3%
Other values (27) | 315 | 42.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 448 | 60.9%
Decimal Number | 152 | 20.7%
Other Punctuation | 88 | 12.0%
Connector Punctuation | 24 | 3.3%
Math Symbol | 16 | 2.2%
Uppercase Letter | 8 | 1.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 64 | 14.3%
n | 64 | 14.3%
e | 40 | 8.9%
o | 40 | 8.9%
c | 40 | 8.9%
w | 24 | 5.4%
m | 24 | 5.4%
s | 16 | 3.6%
l | 16 | 3.6%
i | 16 | 3.6%
Other values (9) | 104 | 23.2%

Decimal Number
Value | Count | Frequency (%)
0 | 47 | 30.9%
2 | 38 | 25.0%
1 | 19 | 12.5%
3 | 16 | 10.5%
4 | 11 | 7.2%
7 | 10 | 6.6%
8 | 8 | 5.3%
9 | 1 | 0.7%
5 | 1 | 0.7%
6 | 1 | 0.7%

Other Punctuation
Value | Count | Frequency (%)
. | 32 | 36.4%
/ | 32 | 36.4%
& | 8 | 9.1%
? | 8 | 9.1%
: | 8 | 9.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 24 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 16 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 8 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 456 | 62.0%
Common | 280 | 38.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 64 | 14.0%
n | 64 | 14.0%
e | 40 | 8.8%
o | 40 | 8.8%
c | 40 | 8.8%
w | 24 | 5.3%
m | 24 | 5.3%
s | 16 | 3.5%
l | 16 | 3.5%
i | 16 | 3.5%
Other values (10) | 112 | 24.6%

Common
Value | Count | Frequency (%)
0 | 47 | 16.8%
2 | 38 | 13.6%
. | 32 | 11.4%
/ | 32 | 11.4%
_ | 24 | 8.6%
1 | 19 | 6.8%
= | 16 | 5.7%
3 | 16 | 5.7%
4 | 11 | 3.9%
7 | 10 | 3.6%
Other values (7) | 35 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 736 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 64 | 8.7%
n | 64 | 8.7%
0 | 47 | 6.4%
e | 40 | 5.4%
o | 40 | 5.4%
c | 40 | 5.4%
2 | 38 | 5.2%
. | 32 | 4.3%
/ | 32 | 4.3%
_ | 24 | 3.3%
Other values (27) | 315 | 42.8%

BDCT_TIME
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 216
Missing (%): 95.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2021.1
Minimum: 2000
Maximum: 2028
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 2000
5th percentile: 2008.1
Q1: 2020.25
Median: 2023
Q3: 2025.75
95th percentile: 2027.55
Maximum: 2028
Range: 28
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 8.0753397
Coefficient of variation (CV): 0.0039955171
Kurtosis: 6.0621744
Mean: 2021.1
Median Absolute Deviation (MAD): 3
Skewness: -2.291706
Sum: 20211
Variance: 65.211111
Monotonicity: Not monotonic
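The quantile and dispersion figures above can be reproduced from the ten observed values with Python's standard `statistics` module. `method="inclusive"` interpolates linearly between order statistics, and it matches the reported Q1, median, Q3 and 5th/95th percentiles. Variance and standard deviation use the n - 1 denominator. This sketch leaves out skewness and kurtosis.

```python
import statistics

# The ten non-missing BDCT_TIME values from the Common Values table.
values = [2000, 2018, 2020, 2021, 2022, 2024, 2025, 2026, 2027, 2028]

# Linear interpolation between order statistics ("inclusive" method).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
stdev = statistics.stdev(values)
cv = stdev / mean                       # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)

print(q1, median, q3, p5, p95)
print(mean, variance, stdev, cv, mad)
```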
Figure: Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2028 | 1 | 0.4%
2000 | 1 | 0.4%
2027 | 1 | 0.4%
2026 | 1 | 0.4%
2025 | 1 | 0.4%
2024 | 1 | 0.4%
2022 | 1 | 0.4%
2021 | 1 | 0.4%
2020 | 1 | 0.4%
2018 | 1 | 0.4%
(Missing) | 216 | 95.6%

Minimum 10 values
Value | Count | Frequency (%)
2000 | 1 | 0.4%
2018 | 1 | 0.4%
2020 | 1 | 0.4%
2021 | 1 | 0.4%
2022 | 1 | 0.4%
2024 | 1 | 0.4%
2025 | 1 | 0.4%
2026 | 1 | 0.4%
2027 | 1 | 0.4%
2028 | 1 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
2028 | 1 | 0.4%
2027 | 1 | 0.4%
2026 | 1 | 0.4%
2025 | 1 | 0.4%
2024 | 1 | 0.4%
2022 | 1 | 0.4%
2021 | 1 | 0.4%
2020 | 1 | 0.4%
2018 | 1 | 0.4%
2000 | 1 | 0.4%
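The profile treats BDCT_TIME as a real number. However, all ten values fall between 2000 and 2028, and in the sample rows they sit next to the broadcast date 20120407. Both facts suggest a 24-hour HHMM clock time, so 2028 would mean 20:28. If that reading is right, the mean, skewness and kurtosis above describe a code rather than a quantity, and a timestamp is more useful. A sketch under that assumption (the helper name `broadcast_datetime` is hypothetical):

```python
from datetime import datetime

def broadcast_datetime(bdct_date, bdct_time):
    """Combine a YYYYMMDD date string with an HHMM time stored as a number.

    Assumption: BDCT_TIME encodes a 24-hour clock time, so 2028 means 20:28.
    """
    hhmm = int(bdct_time)               # e.g. 2028.0 -> 2028
    hours, minutes = divmod(hhmm, 100)  # 2028 -> (20, 28)
    day = datetime.strptime(bdct_date, "%Y%m%d")
    return day.replace(hour=hours, minute=minutes)

print(broadcast_datetime("20120407", 2028.0))
```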

NWS_SJ
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 216
Missing (%): 95.6%
Memory size: 1.9 KiB

Length

Max length: 42
Median length: 34.5
Mean length: 27.3
Min length: 3

Characters and Unicode

Total characters: 273
Distinct characters: 125
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 4

Unique

Unique: 10
Unique (%): 100.0%

Sample

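1st row: 주요 뉴스 (English: "Main news")
2nd row: 클로징 (English: "Closing")
3rd row: [김순철 기자] 개나리꽃 보러 왔어요…봄 나들이객 '북적' (English: "[Reporter 김순철] We came to see the forsythias: spring outing crowds 'packed'")
4th row: [김태영 기자] 강풍 피해 속출…상하층 온도 차 원인 (English: "[Reporter 김태영] Gale damage mounts, caused by the temperature gap between upper and lower air layers")
5th row: [김명준 기자] [4·11 총선] 여야, 마지막 주말 '유세 총력전' (English: "[Reporter 김명준] [April 11 general election] Ruling and opposition parties make an 'all-out campaign push' on the final weekend")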
Value | Count | Frequency (%)
기자 | 7 | 10.6%
총선 | 3 | 4.5%
4·11 | 3 | 4.5%
주요 | 1 | 1.5%
경찰 | 1 | 1.5%
김시영 | 1 | 1.5%
선거판에 | 1 | 1.5%
선파라치 | 1 | 1.5%
떴다 | 1 | 1.5%
이성훈 | 1 | 1.5%
Other values (46) | 46 | 69.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 56 | 20.5%
1 | 10 | 3.7%
] | 9 | 3.3%
[ | 9 | 3.3%
' | 8 | 2.9%
| 7 | 2.6%
| 7 | 2.6%
| 5 | 1.8%
| 5 | 1.8%
| 4 | 1.5%
Other values (115) | 153 | 56.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 159 | 58.2%
Space Separator | 56 | 20.5%
Other Punctuation | 22 | 8.1%
Decimal Number | 18 | 6.6%
Close Punctuation | 9 | 3.3%
Open Punctuation | 9 | 3.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 7 | 4.4%
| 7 | 4.4%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
| 4 | 2.5%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
Other values (100) | 115 | 72.3%

Decimal Number
Value | Count | Frequency (%)
1 | 10 | 55.6%
4 | 3 | 16.7%
2 | 2 | 11.1%
6 | 1 | 5.6%
3 | 1 | 5.6%
7 | 1 | 5.6%

Other Punctuation
Value | Count | Frequency (%)
' | 8 | 36.4%
" | 4 | 18.2%
, | 3 | 13.6%
· | 3 | 13.6%
| 3 | 13.6%
! | 1 | 4.5%

Space Separator
Value | Count | Frequency (%)
(space) | 56 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
] | 9 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
[ | 9 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 159 | 58.2%
Common | 114 | 41.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 7 | 4.4%
| 7 | 4.4%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
| 4 | 2.5%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
Other values (100) | 115 | 72.3%

Common
Value | Count | Frequency (%)
(space) | 56 | 49.1%
1 | 10 | 8.8%
] | 9 | 7.9%
[ | 9 | 7.9%
' | 8 | 7.0%
" | 4 | 3.5%
4 | 3 | 2.6%
, | 3 | 2.6%
· | 3 | 2.6%
| 3 | 2.6%
Other values (5) | 6 | 5.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 159 | 58.2%
ASCII | 108 | 39.6%
None | 3 | 1.1%
Punctuation | 3 | 1.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 56 | 51.9%
1 | 10 | 9.3%
] | 9 | 8.3%
[ | 9 | 8.3%
' | 8 | 7.4%
" | 4 | 3.7%
4 | 3 | 2.8%
, | 3 | 2.8%
2 | 2 | 1.9%
! | 1 | 0.9%
Other values (3) | 3 | 2.8%

Hangul
Value | Count | Frequency (%)
| 7 | 4.4%
| 7 | 4.4%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
| 4 | 2.5%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
| 3 | 1.9%
Other values (100) | 115 | 72.3%

None
Value | Count | Frequency (%)
· | 3 | 100.0%

Punctuation
Value | Count | Frequency (%)
| 3 | 100.0%

NWS_CN
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
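Top values:
<NA> (216)
【 앵커멘트 】 (7)
주요 뉴스 (1)
클로징 (1)
경기도 수원 여성 납치 살해 사건의 부실 대응 논란과 관련해 경찰이 112 신고센터와 상황실 운영 체계를 전면 개편하기로 했습니다. (1)
(English: "Following controversy over its poor response to the abduction and murder of a woman in Suwon, Gyeonggi Province, the police have decided to completely overhaul how the 112 emergency call center and situation room are run.")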

Length

Max length: 73
Median length: 4
Mean length: 4.4292035
Min length: 3

Unique

Unique: 3
Unique (%): 1.3%

Sample

1st row: <NA>
2nd row: 주요 뉴스 (English: "Main news")
3rd row: 클로징 (English: "Closing")
4th row: 【 앵커멘트 】 (English: "Anchor remarks")
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 216 | 95.6%
【 앵커멘트 】 | 7 | 3.1%
주요 뉴스 | 1 | 0.4%
클로징 | 1 | 0.4%
경기도 수원 여성 납치 살해 사건의 부실 대응 논란과 관련해 경찰이 112 신고센터와 상황실 운영 체계를 전면 개편하기로 했습니다. | 1 | 0.4%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 216 | 83.4%
앵커멘트 | 7 | 2.7%
| 7 | 2.7%
| 7 | 2.7%
논란과 | 1 | 0.4%
개편하기로 | 1 | 0.4%
전면 | 1 | 0.4%
체계를 | 1 | 0.4%
운영 | 1 | 0.4%
상황실 | 1 | 0.4%
Other values (16) | 16 | 6.2%

NWS_JRNL_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 226
Missing (%): 100.0%
Memory size: 2.1 KiB

REG_DATE
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
Top values:
<NA> (224)
20120407 (2)

Length

Max length: 8
Median length: 4
Mean length: 4.0353982
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 20120407
3rd row: 20120407
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 224 | 99.1%
20120407 | 2 | 0.9%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 224 | 99.1%
20120407 | 2 | 0.9%

MVP_CRS_NM
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 224
Missing (%): 99.1%
Memory size: 1.9 KiB

Length

Max length: 82
Median length: 82
Mean length: 82
Min length: 82

Characters and Unicode

Total characters: 164
Distinct characters: 33
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023827
2nd row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023828

Value | Count | Frequency (%)
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023827 | 1 | 50.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1023828 | 1 | 50.0%

Most occurring characters

Value | Count | Frequency (%)
n | 16 | 9.8%
t | 16 | 9.8%
c | 10 | 6.1%
e | 10 | 6.1%
o | 10 | 6.1%
/ | 8 | 4.9%
. | 8 | 4.9%
2 | 6 | 3.7%
w | 6 | 3.7%
m | 6 | 3.7%
Other values (23) | 68 | 41.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 112 | 68.3%
Other Punctuation | 22 | 13.4%
Decimal Number | 18 | 11.0%
Connector Punctuation | 6 | 3.7%
Math Symbol | 4 | 2.4%
Uppercase Letter | 2 | 1.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 16 | 14.3%
t | 16 | 14.3%
c | 10 | 8.9%
e | 10 | 8.9%
o | 10 | 8.9%
w | 6 | 5.4%
m | 6 | 5.4%
l | 4 | 3.6%
i | 4 | 3.6%
d | 4 | 3.6%
Other values (9) | 26 | 23.2%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 33.3%
0 | 4 | 22.2%
8 | 3 | 16.7%
1 | 2 | 11.1%
3 | 2 | 11.1%
7 | 1 | 5.6%

Other Punctuation
Value | Count | Frequency (%)
/ | 8 | 36.4%
. | 8 | 36.4%
& | 2 | 9.1%
? | 2 | 9.1%
: | 2 | 9.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 6 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 4 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 114 | 69.5%
Common | 50 | 30.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 16 | 14.0%
t | 16 | 14.0%
c | 10 | 8.8%
e | 10 | 8.8%
o | 10 | 8.8%
w | 6 | 5.3%
m | 6 | 5.3%
l | 4 | 3.5%
i | 4 | 3.5%
d | 4 | 3.5%
Other values (10) | 28 | 24.6%

Common
Value | Count | Frequency (%)
/ | 8 | 16.0%
. | 8 | 16.0%
2 | 6 | 12.0%
_ | 6 | 12.0%
= | 4 | 8.0%
0 | 4 | 8.0%
8 | 3 | 6.0%
& | 2 | 4.0%
1 | 2 | 4.0%
3 | 2 | 4.0%
Other values (3) | 5 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 164 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 16 | 9.8%
t | 16 | 9.8%
c | 10 | 6.1%
e | 10 | 6.1%
o | 10 | 6.1%
/ | 8 | 4.9%
. | 8 | 4.9%
2 | 6 | 3.7%
w | 6 | 3.7%
m | 6 | 3.7%
Other values (23) | 68 | 41.5%

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 226
Missing (%): 100.0%
Memory size: 2.1 KiB

Interactions


Correlations

| BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | MVP_CRS_NM
BDCT_DATE | 1.000 | NaN | NaN | NaN | NaN
BDCT_TIME | NaN | 1.000 | 1.000 | 0.000 | NaN
NWS_SJ | NaN | 1.000 | 1.000 | 1.000 | 0.000
NWS_CN | NaN | 0.000 | 1.000 | 1.000 | 0.000
MVP_CRS_NM | NaN | NaN | 0.000 | 0.000 | 1.000

| NEWS_NO | REG_DATE | NWS_CN
NEWS_NO | 1.000 | NaN | NaN
REG_DATE | NaN | 1.000 | 1.000
NWS_CN | NaN | 1.000 | 1.000

| BDCT_TIME | NEWS_NO | NWS_CN | REG_DATE
BDCT_TIME | 1.000 | 0.000 | 0.267 | 1.000
NEWS_NO | 0.000 | 1.000 | 0.000 | 0.000
NWS_CN | 0.267 | 0.000 | 1.000 | 1.000
REG_DATE | 1.000 | 0.000 | 1.000 | 1.000

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
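The sketch below assumes the heatmap follows the missingno convention, which these missing-value plots resemble. Under that convention, nullity correlation is the Pearson correlation between per-column is-missing indicators. A value of +1 means two columns are always missing together, and -1 means one column is present exactly where the other is missing. The value is undefined for columns that are never or always missing, such as NEWS_CGR_CD, NWS_JRNL_NM and Unnamed: 10 here. The columns in the example are toy data.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def nullity_correlation(col_a, col_b):
    """Correlation of the is-missing indicators of two columns.

    Returns None when either column is always or never missing, because
    the correlation of a constant indicator is undefined.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    if len(set(a)) < 2 or len(set(b)) < 2:
        return None
    return pearson(a, b)

x = [1, None, 3, None, 5]
y = ["a", None, "c", None, "e"]  # missing exactly where x is missing
z = [None, 2, None, 4, None]     # missing exactly where x is present
print(nullity_correlation(x, y), nullity_correlation(x, z))
```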

Sample

First rows
Row | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 1023827 | <NA> | <NA> | 20120407 | 2028 | 주요 뉴스 | 주요 뉴스 | <NA> | 20120407 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023827 | <NA>
2 | 1023828 | <NA> | <NA> | 20120407 | 2000 | 클로징 | 클로징 | <NA> | 20120407 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023828 | <NA>
3 | 1023829 | <NA> | <NA> | 20120407 | 2027 | [김순철 기자] 개나리꽃 보러 왔어요…봄 나들이객 '북적' | 【 앵커멘트 】 | <NA> | <NA> | <NA> | <NA>
4 | 바람이 불어 다소 쌀쌀했지만 화창한 날씨 덕분에 많은 시민들이 개나리가 활짝 핀 산을 찾는 등, 휴일을 즐겼는데요, | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | 다양했던 시민들 표정 김순철 기자가 담았습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | 「"날씨가 오늘 기가 막힌다. 하늘 봐, 하늘." | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Last rows
Row | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
216 | 시기적으로도 북한이 14일을 선택할 것이라는 전망이 우세합니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
217 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
218 | 오는 15일은 북한이 전례 없는 대규모 행사를 준비하고 있는 김일성의 100회 생일입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
219 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
220 | 또 13일은 최고인민회의가 예정돼 있어 14일이 로켓 발사의 극적인 효과를 내는 데 가장 좋은 날이기 때문입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
221 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
222 | 군 당국은 북한이 곧 로켓에 연료주입을 시작할 것으로 보고, 발사장 상황을 면밀하게 살피고 있습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
223 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
224 | MBN뉴스 이미혜입니다. | <NA> | 20120407 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1023836 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
225 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Row | BDCT_NO | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | REG_DATE | MVP_CRS_NM | # duplicates
1 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 90
0 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 7
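The overview reports 2 duplicate rows, while the table above lists two duplicated row patterns that occur 90 and 7 times. The figure of 2 therefore appears to count distinct duplicated patterns rather than rows. This is an interpretation of these numbers, not documented behaviour. For comparison, pandas `DataFrame.duplicated()` with its default `keep="first"` flags every repeat after the first occurrence, which would give 89 + 6 = 95. The sketch below computes all three readings on toy rows shaped like the two patterns, using the standard library only.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (duplicated patterns, rows in those patterns, repeats after first)."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = {row: n for row, n in counts.items() if n > 1}
    patterns = len(duplicated)
    rows_involved = sum(duplicated.values())
    repeats = sum(n - 1 for n in duplicated.values())
    return patterns, rows_involved, repeats

# Toy rows: an all-missing pattern (x90), a "【 기자 】" pattern (x7),
# plus two unique rows taken from the sample.
NA = None
rows = ([(NA, NA, NA)] * 90
        + [("【 기자 】", NA, NA)] * 7
        + [("1023827", "20120407", 2028), ("1023828", "20120407", 2000)])
print(duplicate_summary(rows))
```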