Overview

Dataset statistics

Number of variables: 11
Number of observations: 298
Missing cells: 2713
Missing cells (%): 82.8%
Duplicate rows: 5
Duplicate rows (%): 1.7%
Total size in memory: 27.5 KiB
Average record size in memory: 94.4 B
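The two percentages above can be checked by hand. The sketch below sums the per-column missing counts listed under Alerts and in the variable sections, then divides by the total cell count. The variable names are illustrative only and are not part of the report.

```python
# Figures copied from this report; the names below are illustrative only.
n_rows, n_cols = 298, 11
missing_per_column = {
    "BDCT_NO": 111, "NEWS_CGR_CD": 278, "NEWS_NO": 278, "BDCT_DATE": 278,
    "BDCT_TIME": 288, "NWS_SJ": 288, "NWS_CN": 0, "NWS_JRNL_NM": 298,
    "REG_DATE": 298, "MVP_CRS_NM": 298, "Unnamed: 10": 298,
}

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)  # share of all cells
duplicate_pct = 100 * 5 / n_rows                       # 5 duplicate rows reported

print(missing_cells, round(missing_pct, 1), round(duplicate_pct, 1))
```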

Variable types

Text: 4
Numeric: 2
Categorical: 1
Unsupported: 4

Dataset

Description: Sample data
Author: MBN
URL: https://kdx.kr/data/view/173

Alerts

Dataset has 5 (1.7%) duplicate rows [Duplicates]
NEWS_NO is highly overall correlated with NWS_CN [High correlation]
BDCT_TIME is highly overall correlated with NWS_CN [High correlation]
NWS_CN is highly overall correlated with NEWS_NO and 1 other field [High correlation]
NWS_CN is highly imbalanced (78.8%) [Imbalance]
BDCT_NO has 111 (37.2%) missing values [Missing]
NEWS_CGR_CD has 278 (93.3%) missing values [Missing]
NEWS_NO has 278 (93.3%) missing values [Missing]
BDCT_DATE has 278 (93.3%) missing values [Missing]
BDCT_TIME has 288 (96.6%) missing values [Missing]
NWS_SJ has 288 (96.6%) missing values [Missing]
NWS_JRNL_NM has 298 (100.0%) missing values [Missing]
REG_DATE has 298 (100.0%) missing values [Missing]
MVP_CRS_NM has 298 (100.0%) missing values [Missing]
Unnamed: 10 has 298 (100.0%) missing values [Missing]
NWS_JRNL_NM is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
REG_DATE is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
MVP_CRS_NM is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
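
The report does not state how the 78.8% imbalance figure for NWS_CN is derived. A minimal sketch, assuming the score is one minus the Shannon entropy of the class shares normalized by log2 of the number of classes, reproduces that figure from the 288 / 10 split shown in the NWS_CN section. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy of the class shares.

    Assumed formula: 0 means perfectly balanced, 1 means a single class.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# NWS_CN holds 288 "<NA>" values and 10 other values, per the report.
score = imbalance_score([288, 10])
print(f"{score:.1%}")
```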

Reproduction

Analysis started: 2023-12-11 21:11:43.772057
Analysis finished: 2023-12-11 21:11:44.825614
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

BDCT_NO
Text

MISSING 

Distinct: 164
Distinct (%): 87.7%
Missing: 111
Missing (%): 37.2%
Memory size: 2.5 KiB

Length

Max length: 132
Median length: 81
Mean length: 41.235294
Min length: 6

Characters and Unicode

Total characters: 7711
Distinct characters: 552
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
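
As a rough illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lo` for Other Letter, `Zs` for Space Separator). The sketch below counts categories for one value that appears in this column. The helper name `category_counts` is an assumption made for the example, and the report's own implementation may differ.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Zs', 'Po')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "【 기자 】"  # a value that occurs in BDCT_NO
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

Script and block lookups are not offered by `unicodedata`, so reproducing those tables would need an additional data source.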

Unique

Unique: 159
Unique (%): 85.0%

Sample

1st row: 1144456
2nd row: 박영수 특별검사팀은 새해 첫날부터 문형표 전 복지부 장관과 김 종 전 문체부 차관 등 '최순실 게이트'의 핵심 인물들을 줄소환했습니다.
3rd row: 박영수 특검은 열심히 하겠다는 짧은 한마디로 새해 특검의 각오를 밝혔습니다.
4th row: 한민용 기자입니다.
5th row: 【 기자 】
ValueCountFrequency (%)
91
 
5.0%
24
 
1.3%
기자 20
 
1.1%
박근혜 16
 
0.9%
16
 
0.9%
김정은 12
 
0.7%
인터뷰 11
 
0.6%
김정은은 11
 
0.6%
10
 
0.5%
10
 
0.5%
Other values (1148) 1600
87.9%

Most occurring characters

ValueCountFrequency (%)
1904
 
24.7%
148
 
1.9%
. 140
 
1.8%
128
 
1.7%
111
 
1.4%
86
 
1.1%
83
 
1.1%
81
 
1.1%
78
 
1.0%
78
 
1.0%
Other values (542) 4874
63.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 5090
66.0%
Space Separator 1904
 
24.7%
Other Punctuation 362
 
4.7%
Decimal Number 142
 
1.8%
Uppercase Letter 70
 
0.9%
Lowercase Letter 47
 
0.6%
Other Symbol 27
 
0.4%
Dash Punctuation 25
 
0.3%
Close Punctuation 22
 
0.3%
Open Punctuation 22
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
148
 
2.9%
128
 
2.5%
111
 
2.2%
86
 
1.7%
83
 
1.6%
81
 
1.6%
78
 
1.5%
78
 
1.5%
71
 
1.4%
69
 
1.4%
Other values (484) 4157
81.7%
Lowercase Letter
ValueCountFrequency (%)
n 5
10.6%
m 5
10.6%
a 4
 
8.5%
o 4
 
8.5%
c 4
 
8.5%
t 3
 
6.4%
b 3
 
6.4%
r 3
 
6.4%
g 2
 
4.3%
y 2
 
4.3%
Other values (9) 12
25.5%
Decimal Number
ValueCountFrequency (%)
4 36
25.4%
1 34
23.9%
3 13
 
9.2%
0 13
 
9.2%
2 13
 
9.2%
6 10
 
7.0%
5 10
 
7.0%
7 8
 
5.6%
8 4
 
2.8%
9 1
 
0.7%
Other Punctuation
ValueCountFrequency (%)
. 140
38.7%
" 63
17.4%
, 54
 
14.9%
: 38
 
10.5%
' 30
 
8.3%
/ 25
 
6.9%
7
 
1.9%
@ 4
 
1.1%
· 1
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
N 18
25.7%
B 11
15.7%
M 11
15.7%
S 9
12.9%
C 8
11.4%
Y 8
11.4%
K 2
 
2.9%
D 2
 
2.9%
L 1
 
1.4%
Other Symbol
ValueCountFrequency (%)
24
88.9%
2
 
7.4%
1
 
3.7%
Close Punctuation
ValueCountFrequency (%)
10
45.5%
) 8
36.4%
] 4
 
18.2%
Open Punctuation
ValueCountFrequency (%)
10
45.5%
( 8
36.4%
[ 4
 
18.2%
Space Separator
ValueCountFrequency (%)
1904
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 5090
66.0%
Common 2504
32.5%
Latin 117
 
1.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
148
 
2.9%
128
 
2.5%
111
 
2.2%
86
 
1.7%
83
 
1.6%
81
 
1.6%
78
 
1.5%
78
 
1.5%
71
 
1.4%
69
 
1.4%
Other values (484) 4157
81.7%
Common
ValueCountFrequency (%)
1904
76.0%
. 140
 
5.6%
" 63
 
2.5%
, 54
 
2.2%
: 38
 
1.5%
4 36
 
1.4%
1 34
 
1.4%
' 30
 
1.2%
- 25
 
1.0%
/ 25
 
1.0%
Other values (20) 155
 
6.2%
Latin
ValueCountFrequency (%)
N 18
15.4%
B 11
 
9.4%
M 11
 
9.4%
S 9
 
7.7%
C 8
 
6.8%
Y 8
 
6.8%
n 5
 
4.3%
m 5
 
4.3%
a 4
 
3.4%
o 4
 
3.4%
Other values (18) 34
29.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 5090
66.0%
ASCII 2566
33.3%
Geometric Shapes 24
 
0.3%
None 21
 
0.3%
Punctuation 7
 
0.1%
Misc Symbols 2
 
< 0.1%
CJK Compat 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1904
74.2%
. 140
 
5.5%
" 63
 
2.5%
, 54
 
2.1%
: 38
 
1.5%
4 36
 
1.4%
1 34
 
1.3%
' 30
 
1.2%
- 25
 
1.0%
/ 25
 
1.0%
Other values (41) 217
 
8.5%
Hangul
ValueCountFrequency (%)
148
 
2.9%
128
 
2.5%
111
 
2.2%
86
 
1.7%
83
 
1.6%
81
 
1.6%
78
 
1.5%
78
 
1.5%
71
 
1.4%
69
 
1.4%
Other values (484) 4157
81.7%
Geometric Shapes
ValueCountFrequency (%)
24
100.0%
None
ValueCountFrequency (%)
10
47.6%
10
47.6%
· 1
 
4.8%
Punctuation
ValueCountFrequency (%)
7
100.0%
Misc Symbols
ValueCountFrequency (%)
2
100.0%
CJK Compat
ValueCountFrequency (%)
1
100.0%

NEWS_CGR_CD
Text

MISSING 

Distinct: 12
Distinct (%): 60.0%
Missing: 278
Missing (%): 93.3%
Memory size: 2.5 KiB

Length

Max length: 8
Median length: 5.5
Mean length: 5.5
Min length: 3

Characters and Unicode

Total characters: 110
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 50.0%

Sample

1st row: mbn00009
2nd row: 한민용
3rd row: mbn00009
4th row: 전정인
5th row: mbn00009
ValueCountFrequency (%)
mbn00006 7
35.0%
mbn00009 3
15.0%
한민용 1
 
5.0%
전정인 1
 
5.0%
노태현 1
 
5.0%
박통일 1
 
5.0%
윤지원 1
 
5.0%
강영구 1
 
5.0%
이재호 1
 
5.0%
김건훈 1
 
5.0%
Other values (2) 2
 
10.0%

Most occurring characters

ValueCountFrequency (%)
0 40
36.4%
m 10
 
9.1%
n 10
 
9.1%
b 10
 
9.1%
6 7
 
6.4%
9 3
 
2.7%
2
 
1.8%
2
 
1.8%
2
 
1.8%
1
 
0.9%
Other values (23) 23
20.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 50
45.5%
Lowercase Letter 30
27.3%
Other Letter 30
27.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
Other values (17) 17
56.7%
Decimal Number
ValueCountFrequency (%)
0 40
80.0%
6 7
 
14.0%
9 3
 
6.0%
Lowercase Letter
ValueCountFrequency (%)
m 10
33.3%
n 10
33.3%
b 10
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common 50
45.5%
Latin 30
27.3%
Hangul 30
27.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
Other values (17) 17
56.7%
Common
ValueCountFrequency (%)
0 40
80.0%
6 7
 
14.0%
9 3
 
6.0%
Latin
ValueCountFrequency (%)
m 10
33.3%
n 10
33.3%
b 10
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 80
72.7%
Hangul 30
 
27.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 40
50.0%
m 10
 
12.5%
n 10
 
12.5%
b 10
 
12.5%
6 7
 
8.8%
9 3
 
3.8%
Hangul
ValueCountFrequency (%)
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
1
 
3.3%
Other values (17) 17
56.7%

NEWS_NO
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 278
Missing (%): 93.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 11637877
Minimum: 3105642
Maximum: 20170101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 3105642
5th percentile: 3105645.8
Q1: 3105653.8
Median: 11637880
Q3: 20170101
95th percentile: 20170101
Maximum: 20170101
Range: 17064459
Interquartile range (IQR): 17064447

Descriptive statistics

Standard deviation: 8753877.4
Coefficient of variation (CV): 0.75218854
Kurtosis: -2.2352941
Mean: 11637877
Median Absolute Deviation (MAD): 8532220.5
Skewness: -5.853214 × 10^-13
Sum: 2.3275753 × 10^8
Variance: 7.663037 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
20170101 10
 
3.4%
3105654 1
 
0.3%
3105652 1
 
0.3%
3105649 1
 
0.3%
3105646 1
 
0.3%
3105660 1
 
0.3%
3105657 1
 
0.3%
3105656 1
 
0.3%
3105653 1
 
0.3%
3105642 1
 
0.3%
(Missing) 278
93.3%
ValueCountFrequency (%)
3105642 1
0.3%
3105646 1
0.3%
3105649 1
0.3%
3105652 1
0.3%
3105653 1
0.3%
3105654 1
0.3%
3105655 1
0.3%
3105656 1
0.3%
3105657 1
0.3%
3105660 1
0.3%
ValueCountFrequency (%)
20170101 10
3.4%
3105660 1
 
0.3%
3105657 1
 
0.3%
3105656 1
 
0.3%
3105655 1
 
0.3%
3105654 1
 
0.3%
3105653 1
 
0.3%
3105652 1
 
0.3%
3105649 1
 
0.3%
3105646 1
 
0.3%

BDCT_DATE
Text

MISSING 

Distinct: 11
Distinct (%): 55.0%
Missing: 278
Missing (%): 93.3%
Memory size: 2.5 KiB

Length

Max length: 82
Median length: 45
Mean length: 45
Min length: 8

Characters and Unicode

Total characters: 900
Distinct characters: 37
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 50.0%

Sample

1st row: 20170101
2nd row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144456
3rd row: 20170101
4th row: http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144457
5th row: 20170101
ValueCountFrequency (%)
20170101 10
50.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144456 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144457 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144458 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144459 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144460 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144461 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144462 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144463 1
 
5.0%
http://www.mbn.co.kr/player/moviecontents.mbn?content_cls_cd=20&content_id=1144464 1
 
5.0%

Most occurring characters

ValueCountFrequency (%)
t 80
 
8.9%
n 80
 
8.9%
1 51
 
5.7%
e 50
 
5.6%
o 50
 
5.6%
c 50
 
5.6%
0 41
 
4.6%
/ 40
 
4.4%
. 40
 
4.4%
4 31
 
3.4%
Other values (27) 387
43.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 560
62.2%
Decimal Number 170
 
18.9%
Other Punctuation 110
 
12.2%
Connector Punctuation 30
 
3.3%
Math Symbol 20
 
2.2%
Uppercase Letter 10
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 80
14.3%
n 80
14.3%
e 50
 
8.9%
o 50
 
8.9%
c 50
 
8.9%
m 30
 
5.4%
w 30
 
5.4%
s 20
 
3.6%
d 20
 
3.6%
i 20
 
3.6%
Other values (9) 130
23.2%
Decimal Number
ValueCountFrequency (%)
1 51
30.0%
0 41
24.1%
4 31
18.2%
2 21
12.4%
7 11
 
6.5%
6 7
 
4.1%
5 5
 
2.9%
8 1
 
0.6%
9 1
 
0.6%
3 1
 
0.6%
Other Punctuation
ValueCountFrequency (%)
/ 40
36.4%
. 40
36.4%
: 10
 
9.1%
& 10
 
9.1%
? 10
 
9.1%
Connector Punctuation
ValueCountFrequency (%)
_ 30
100.0%
Math Symbol
ValueCountFrequency (%)
= 20
100.0%
Uppercase Letter
ValueCountFrequency (%)
C 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 570
63.3%
Common 330
36.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 80
14.0%
n 80
14.0%
e 50
 
8.8%
o 50
 
8.8%
c 50
 
8.8%
m 30
 
5.3%
w 30
 
5.3%
s 20
 
3.5%
d 20
 
3.5%
i 20
 
3.5%
Other values (10) 140
24.6%
Common
ValueCountFrequency (%)
1 51
15.5%
0 41
12.4%
/ 40
12.1%
. 40
12.1%
4 31
9.4%
_ 30
9.1%
2 21
6.4%
= 20
 
6.1%
7 11
 
3.3%
: 10
 
3.0%
Other values (7) 35
10.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 80
 
8.9%
n 80
 
8.9%
1 51
 
5.7%
e 50
 
5.6%
o 50
 
5.6%
c 50
 
5.6%
0 41
 
4.6%
/ 40
 
4.4%
. 40
 
4.4%
4 31
 
3.4%
Other values (27) 387
43.0%

BDCT_TIME
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 288
Missing (%): 96.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1939
Minimum: 1930
Maximum: 1948
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.7 KiB

Quantile statistics

Minimum: 1930
5th percentile: 1930.9
Q1: 1934.5
Median: 1939
Q3: 1943.5
95th percentile: 1947.1
Maximum: 1948
Range: 18
Interquartile range (IQR): 9
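
The quantiles above are consistent with linear interpolation between neighbouring sorted values, at position p × (n − 1). The sketch below applies that rule to the ten non-missing BDCT_TIME values listed in this section. It is an assumption that the report uses exactly this interpolation, and the `percentile` helper is written only for this example.

```python
def percentile(values, p):
    """Linear-interpolation percentile at position p * (n - 1) of the sorted data."""
    data = sorted(values)
    pos = p * (len(data) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])

# The ten observed values: 1930, 1932, ..., 1948.
bdct_time = list(range(1930, 1950, 2))

for label, p in [("5th", 0.05), ("Q1", 0.25), ("median", 0.5), ("Q3", 0.75), ("95th", 0.95)]:
    print(label, round(percentile(bdct_time, p), 1))

iqr = percentile(bdct_time, 0.75) - percentile(bdct_time, 0.25)
print("IQR", iqr)
```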

Descriptive statistics

Standard deviation: 6.0553007
Coefficient of variation (CV): 0.0031228988
Kurtosis: -1.2
Mean: 1939
Median Absolute Deviation (MAD): 5
Skewness: 0
Sum: 19390
Variance: 36.666667
Monotonicity: Strictly increasing
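
Several of these figures can be reproduced with Python's standard `statistics` module. The sketch assumes the sample (n − 1) definitions of variance and standard deviation, which match the reported values, and assumes the ten non-missing values occur in ascending row order, as the monotonicity line suggests.

```python
import statistics

# The ten non-missing BDCT_TIME values, assumed to be in row order.
bdct_time = list(range(1930, 1950, 2))

mean = statistics.mean(bdct_time)
variance = statistics.variance(bdct_time)  # sample variance, n - 1 denominator
std = statistics.stdev(bdct_time)          # sample standard deviation
cv = std / mean                            # coefficient of variation
median = statistics.median(bdct_time)
# Median absolute deviation: median distance from the median.
mad = statistics.median([abs(x - median) for x in bdct_time])
strictly_increasing = all(a < b for a, b in zip(bdct_time, bdct_time[1:]))

print(mean, round(variance, 6), round(std, 7), round(cv, 10), mad, strictly_increasing)
```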
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
1930 1
 
0.3%
1932 1
 
0.3%
1934 1
 
0.3%
1936 1
 
0.3%
1938 1
 
0.3%
1940 1
 
0.3%
1942 1
 
0.3%
1944 1
 
0.3%
1946 1
 
0.3%
1948 1
 
0.3%
(Missing) 288
96.6%
ValueCountFrequency (%)
1930 1
0.3%
1932 1
0.3%
1934 1
0.3%
1936 1
0.3%
1938 1
0.3%
1940 1
0.3%
1942 1
0.3%
1944 1
0.3%
1946 1
0.3%
1948 1
0.3%
ValueCountFrequency (%)
1948 1
0.3%
1946 1
0.3%
1944 1
0.3%
1942 1
0.3%
1940 1
0.3%
1938 1
0.3%
1936 1
0.3%
1934 1
0.3%
1932 1
0.3%
1930 1
0.3%

NWS_SJ
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 288
Missing (%): 96.6%
Memory size: 2.5 KiB

Length

Max length: 35
Median length: 27.5
Mean length: 26
Min length: 17

Characters and Unicode

Total characters: 260
Distinct characters: 134
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 100.0%

Sample

1st row: 특검, 새해 첫날부터 줄소환…'삼성그룹' 겨눈다
2nd row: 블랙리스트 수사도 속도…김기춘·조윤선 곧 소환
3rd row: 헌재, 모레부터 본격 심리 돌입…새해 벽두부터 강행군
4th row: 박 대통령 "세월호 의혹, 기가 막히고 어이가 없다"
5th row: 박 대통령 "뇌물죄? 누구 봐줄 생각 손톱만큼도 없다"
ValueCountFrequency (%)
김정은 3
 
4.8%
없다 2
 
3.2%
대통령 2
 
3.2%
2
 
3.2%
특검 1
 
1.6%
담담하고 1
 
1.6%
재개 1
 
1.6%
icbm 1
 
1.6%
시험발사 1
 
1.6%
마감 1
 
1.6%
Other values (47) 47
75.8%

Most occurring characters

ValueCountFrequency (%)
52
 
20.0%
" 8
 
3.1%
6
 
2.3%
5
 
1.9%
' 4
 
1.5%
, 4
 
1.5%
4
 
1.5%
4
 
1.5%
? 4
 
1.5%
4
 
1.5%
Other values (124) 165
63.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 177
68.1%
Space Separator 52
 
20.0%
Other Punctuation 27
 
10.4%
Uppercase Letter 4
 
1.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5
 
2.8%
4
 
2.3%
4
 
2.3%
4
 
2.3%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
Other values (113) 142
80.2%
Other Punctuation
ValueCountFrequency (%)
" 8
29.6%
6
22.2%
' 4
14.8%
, 4
14.8%
? 4
14.8%
· 1
 
3.7%
Uppercase Letter
ValueCountFrequency (%)
B 1
25.0%
C 1
25.0%
I 1
25.0%
M 1
25.0%
Space Separator
ValueCountFrequency (%)
52
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 177
68.1%
Common 79
30.4%
Latin 4
 
1.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5
 
2.8%
4
 
2.3%
4
 
2.3%
4
 
2.3%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
Other values (113) 142
80.2%
Common
ValueCountFrequency (%)
52
65.8%
" 8
 
10.1%
6
 
7.6%
' 4
 
5.1%
, 4
 
5.1%
? 4
 
5.1%
· 1
 
1.3%
Latin
ValueCountFrequency (%)
B 1
25.0%
C 1
25.0%
I 1
25.0%
M 1
25.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 177
68.1%
ASCII 76
29.2%
Punctuation 6
 
2.3%
None 1
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
52
68.4%
" 8
 
10.5%
' 4
 
5.3%
, 4
 
5.3%
? 4
 
5.3%
B 1
 
1.3%
C 1
 
1.3%
I 1
 
1.3%
M 1
 
1.3%
Punctuation
ValueCountFrequency (%)
6
100.0%
Hangul
ValueCountFrequency (%)
5
 
2.8%
4
 
2.3%
4
 
2.3%
4
 
2.3%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
3
 
1.7%
Other values (113) 142
80.2%
None
ValueCountFrequency (%)
· 1
100.0%

NWS_CN
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB
<NA>: 288
【 앵커멘트 】: 10

Length

Max length: 8
Median length: 4
Mean length: 4.1342282
Min length: 4

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row: <NA>
2nd row: 【 앵커멘트 】
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

ValueCountFrequency (%)
<NA> 288
96.6%
【 앵커멘트 】 10
 
3.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
na 288
90.6%
10
 
3.1%
앵커멘트 10
 
3.1%
10
 
3.1%

NWS_JRNL_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 298
Missing (%): 100.0%
Memory size: 2.7 KiB

REG_DATE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 298
Missing (%): 100.0%
Memory size: 2.7 KiB

MVP_CRS_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 298
Missing (%): 100.0%
Memory size: 2.7 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 298
Missing (%): 100.0%
Memory size: 2.7 KiB

Interactions


Correlations

            | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ
NEWS_CGR_CD | 1.000       | 1.000   | 1.000     | 1.000     | 1.000
NEWS_NO     | 1.000       | 1.000   | 1.000     | NaN       | NaN
BDCT_DATE   | 1.000       | 1.000   | 1.000     | NaN       | NaN
BDCT_TIME   | 1.000       | NaN     | NaN       | 1.000     | 1.000
NWS_SJ      | 1.000       | NaN     | NaN       | 1.000     | 1.000
          | NEWS_NO | BDCT_TIME | NWS_CN
NEWS_NO   | 1.000   | 0.091     | 1.000
BDCT_TIME | 0.091   | 1.000     | 1.000
NWS_CN    | 1.000   | 1.000     | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
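
To make the idea of nullity correlation concrete, the sketch below converts each column into a 0/1 missing-value mask and computes the Pearson correlation between masks. The four-row table is hypothetical and only borrows column names from this report. A column that is entirely missing (or entirely present) has a constant mask, so its correlation is undefined and the helper returns `None`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mini-table; None marks a missing cell.
rows = [
    {"NEWS_NO": 3105654, "BDCT_DATE": "20170101", "REG_DATE": None},
    {"NEWS_NO": None, "BDCT_DATE": None, "REG_DATE": None},
    {"NEWS_NO": None, "BDCT_DATE": None, "REG_DATE": None},
    {"NEWS_NO": 3105652, "BDCT_DATE": "20170101", "REG_DATE": None},
]

def nullity(column):
    """1 where the column is missing, 0 where it is present."""
    return [1 if row[column] is None else 0 for row in rows]

r_same = pearson(nullity("NEWS_NO"), nullity("BDCT_DATE"))  # missing together
r_const = pearson(nullity("NEWS_NO"), nullity("REG_DATE"))  # constant mask

print(r_same, r_const)
```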

Sample

Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | 1144456 | mbn00009 | 3105654 | 20170101 | 1930 | 특검, 새해 첫날부터 줄소환…'삼성그룹' 겨눈다 | 【 앵커멘트 】 | <NA> | <NA> | <NA> | <NA>
2 | 박영수 특별검사팀은 새해 첫날부터 문형표 전 복지부 장관과 김 종 전 문체부 차관 등 '최순실 게이트'의 핵심 인물들을 줄소환했습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3 | 박영수 특검은 열심히 하겠다는 짧은 한마디로 새해 특검의 각오를 밝혔습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
4 | 한민용 기자입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 새해 첫날도 잊고 핵심 인물들을 줄소환하며 강행군을 이어가고 있는 박영수 특별검사팀. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | NWS_JRNL_NM | REG_DATE | MVP_CRS_NM | Unnamed: 10
288 | - "2013년을 보내고 앞날에 대한 확신과 혁명적 자부심에 넘쳐 새해 2014년을 맞이합니다." | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
289 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
290 | 다만 신년사 화면 구성은 교차편집으로 지난해와 같았습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
291 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
292 | 미사일 도발부터 북한 여자 축구 선수, 각종 공장과 농장 사진과 노동당 청사 정지 화면을 김정은의 신년사 낭독 모습과 번갈아 보여줬습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
293 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
294 | 또 27분짜리 신년사 도중 마치 관중 앞에서 연설 하는 것과 같은 박수 소리를 무려 37차례나 넣었습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
295 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
296 | MBN뉴스 오지예입니다. | 오지예 | 20170101 | http://www.mbn.co.kr/player/movieContents.mbn?content_cls_cd=20&content_id=1144465 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
297 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

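Index | BDCT_NO | NEWS_CGR_CD | NEWS_NO | BDCT_DATE | BDCT_TIME | NWS_SJ | NWS_CN | # duplicates
4 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 111
3 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9
1 | ▶ SYNC : 박근혜 / 대통령 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 8
2 | ▶ 인터뷰 : 김정은 / 북한 노동당 위원장 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 7
0 | 영상취재 : 김인성 기자 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2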
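
A duplicate table of this kind can be sketched with a plain counter over whole rows. The rows below are a hypothetical two-column excerpt, not the real data, and the report's "5 duplicate rows" figure appears to count distinct repeated rows (the five entries listed above) rather than total repetitions.

```python
from collections import Counter

# Hypothetical excerpt: each row is a tuple of cell values, None = missing.
rows = [
    ("【 기자 】", None),
    ("한민용 기자입니다.", None),
    ("【 기자 】", None),
    (None, None),
    (None, None),
    (None, None),
]

counts = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = {row: n for row, n in counts.most_common() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```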