Overview

Dataset statistics

Number of variables: 10
Number of observations: 150
Missing cells: 1195
Missing cells (%): 79.7%
Duplicate rows: 3
Duplicate rows (%): 2.0%
Total size in memory: 12.6 KiB
Average record size in memory: 85.9 B
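
The percentages in this block follow from the raw counts. A minimal sketch of that arithmetic is shown here; the variable names are illustrative, and the byte conversion assumes 1 KiB = 1024 B.

```python
# Recompute the derived overview figures from the raw counts in the report.
n_variables = 10
n_observations = 150
missing_cells = 1195
duplicate_rows = 3
total_size_kib = 12.6  # rounded value as printed in the report

total_cells = n_variables * n_observations             # 1500 cells in total
missing_pct = 100 * missing_cells / total_cells        # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations  # share of duplicate rows
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```

The last figure lands near 86 B rather than exactly on the reported 85.9 B, most likely because the total size is itself rounded to one decimal place.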

Variable types

Text: 4
Categorical: 1
Numeric: 1
Unsupported: 4

Dataset

Description: Sample data
Author: MBN
URL: https://kdx.kr/data/view/26950

Alerts

Duplicates: Dataset has 3 (2.0%) duplicate rows
High correlation: STD_YEAR is highly overall correlated with MDA_CGR_NM
High correlation: MDA_CGR_NM is highly overall correlated with STD_YEAR
Imbalance: MDA_CGR_NM is highly imbalanced (75.8%)
Missing: MBN_MDA_SP_CD has 51 (34.0%) missing values
Missing: MDA_ART_ESSN_NO has 134 (89.3%) missing values
Missing: STD_YEAR has 130 (86.7%) missing values
Missing: ART_SJ_CN has 140 (93.3%) missing values
Missing: ART_CN has 140 (93.3%) missing values
Missing: ATCH_IMG_NM has 150 (100.0%) missing values
Missing: JRNL_NM has 150 (100.0%) missing values
Missing: WRT_DATE has 150 (100.0%) missing values
Missing: Unnamed: 9 has 150 (100.0%) missing values
Unsupported: ATCH_IMG_NM is an unsupported type; check whether it needs cleaning or further analysis
Unsupported: JRNL_NM is an unsupported type; check whether it needs cleaning or further analysis
Unsupported: WRT_DATE is an unsupported type; check whether it needs cleaning or further analysis
Unsupported: Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis

Reproduction

Analysis started: 2023-12-11 21:23:53.540884
Analysis finished: 2023-12-11 21:23:55.829447
Duration: 2.29 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

MBN_MDA_SP_CD
Text

MISSING 

Distinct: 85
Distinct (%): 85.9%
Missing: 51
Missing (%): 34.0%
Memory size: 1.3 KiB

Length

Max length: 442
Median length: 116
Mean length: 76.30303
Min length: 3

Characters and Unicode

Total characters: 7554
Distinct characters: 600
Distinct categories: 14
Distinct scripts: 3
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
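
As a rough illustration of how such per-category counts can be obtained, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation; the helper name `category_counts` is made up for this example, and the input string is one of the ART_SJ_CN sample values from this report.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (for example 'Lo' or 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# A headline taken from the ART_SJ_CN samples in this report.
headline = "원로 연극평론가 구히서 선생 별세"
counts = category_counts(headline)

# Hangul syllables fall under 'Lo' (Other Letter), spaces under 'Zs' (Space Separator).
print(counts)
```

The two-letter codes map onto the category names used in the tables of this report: `Lo` is Other Letter, `Zs` is Space Separator, `Ll` is Lowercase Letter, and so on.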

Unique

Unique: 80
Unique (%): 80.8%
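
The percentages and the mean length for this variable appear to be computed over the 99 non-missing values rather than over all 150 rows. That is inferred from the numbers, not stated in the report; the following sketch checks the inference with illustrative variable names.

```python
# Figures for MBN_MDA_SP_CD as printed in the report.
n_observations = 150
n_missing = 51
n_distinct = 85
n_unique = 80
total_characters = 7554

n_present = n_observations - n_missing        # 99 non-missing values
distinct_pct = 100 * n_distinct / n_present   # distinct values per non-missing value
unique_pct = 100 * n_unique / n_present       # values that occur exactly once
mean_length = total_characters / n_present    # average string length

print(f"{distinct_pct:.1f}% {unique_pct:.1f}% {mean_length:.5f}")
```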

Sample

1st row: MBN
2nd row: 원로 연극평론가 구히서(본명 구희서) 선생이 31일 별세했습니다. 향년 80세입니다.
3rd row: 고인은 수년 전 건강이 악화해 자택에서 투병했으며, 오늘 새벽 3시쯤 서울대병원에서 영면에 들었습니다.
4th row: 고인은 서울에서 태어나 경기여고와 이화여자대학교 사학과를 졸업하고 문화재관리국, 문화재연구소 등지에서 근무하다 1970년부터 1994년까지 한국일보와 일간스포츠에서 연극 전문기자로 활동했습니다. 퇴직 후인 1994∼1998년에는 한국연극평론가협회 회장을 지냈습니다.
5th row: 고인은 여석기, 한상철, 이태주, 이상일 평론가와 함께 한국연극평론가협회의 전신인 서울연극평론가그룹을 이끈 것으로 유명합니다. 서울연극평론가그룹은 공연예술계에서 평론가 집단을 형성한 최초 사례입니다.
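Value | Count | Frequency (%)
(value not preserved) | 22 | 1.8%
mbn | 16 | 1.3%
대한 | 6 | 0.5%
있는 | 5 | 0.4%
(value not preserved) | 5 | 0.4%
조정석은 | 5 | 0.4%
고인은 | 4 | 0.3%
있습니다 | 4 | 0.3%
(value not preserved) | 4 | 0.3%
함께 | 4 | 0.3%
Other values (979) | 1163 | 93.9%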

Most occurring characters

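Value | Count | Frequency (%)
(space) | 1214 | 16.1%
b | 217 | 2.9%
n | 217 | 2.9%
& | 216 | 2.9%
; | 213 | 2.8%
p | 213 | 2.8%
s | 213 | 2.8%
(character not preserved) | 131 | 1.7%
(character not preserved) | 129 | 1.7%
. | 115 | 1.5%
Other values (590) | 4676 | 61.9%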
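
The counts for b, n, p and s (213 to 217 each) closely track those for & and ;, which would be consistent with literal `&nbsp;` entities left in the text. That reading is an assumption, since the report does not show the raw strings. If it holds, a cleaning pass could decode entities before profiling. The sketch below uses Python's standard `html` module on a made-up input; `decode_entities` is a hypothetical helper name.

```python
import html


def decode_entities(text: str) -> str:
    """Decode HTML entities and turn non-breaking spaces into plain spaces."""
    return html.unescape(text).replace("\u00a0", " ")


raw = "MBN&nbsp;news&nbsp;&amp;&nbsp;culture"  # hypothetical example value
clean = decode_entities(raw)
print(clean)  # MBN news & culture
```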

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4310 | 57.1%
Space Separator | 1214 | 16.1%
Lowercase Letter | 905 | 12.0%
Other Punctuation | 706 | 9.3%
Uppercase Letter | 122 | 1.6%
Decimal Number | 109 | 1.4%
Dash Punctuation | 101 | 1.3%
Open Punctuation | 22 | 0.3%
Close Punctuation | 22 | 0.3%
Math Symbol | 15 | 0.2%
Other values (4) | 28 | 0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
131
 
3.0%
129
 
3.0%
91
 
2.1%
89
 
2.1%
86
 
2.0%
75
 
1.7%
74
 
1.7%
68
 
1.6%
61
 
1.4%
59
 
1.4%
Other values (518) 3447
80.0%
Lowercase Letter
ValueCountFrequency (%)
b 217
24.0%
n 217
24.0%
p 213
23.5%
s 213
23.5%
k 8
 
0.9%
m 8
 
0.9%
c 6
 
0.7%
r 6
 
0.7%
o 6
 
0.7%
l 2
 
0.2%
Other values (7) 9
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
M 23
18.9%
B 21
17.2%
N 20
16.4%
S 14
11.5%
O 12
9.8%
P 11
9.0%
C 4
 
3.3%
H 4
 
3.3%
T 4
 
3.3%
V 2
 
1.6%
Other values (6) 7
 
5.7%
Other Punctuation
ValueCountFrequency (%)
& 216
30.6%
; 213
30.2%
. 115
16.3%
' 68
 
9.6%
" 42
 
5.9%
, 36
 
5.1%
: 7
 
1.0%
! 4
 
0.6%
? 2
 
0.3%
@ 2
 
0.3%
Decimal Number
ValueCountFrequency (%)
1 40
36.7%
9 20
18.3%
3 12
 
11.0%
2 11
 
10.1%
0 9
 
8.3%
4 5
 
4.6%
8 4
 
3.7%
5 3
 
2.8%
6 3
 
2.8%
7 2
 
1.8%
Math Symbol
ValueCountFrequency (%)
> 6
40.0%
< 6
40.0%
~ 2
 
13.3%
1
 
6.7%
Open Punctuation
ValueCountFrequency (%)
[ 11
50.0%
( 9
40.9%
2
 
9.1%
Close Punctuation
ValueCountFrequency (%)
] 11
50.0%
) 9
40.9%
2
 
9.1%
Other Symbol
ValueCountFrequency (%)
3
50.0%
2
33.3%
1
 
16.7%
Space Separator
ValueCountFrequency (%)
1214
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 101
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 8
100.0%
Final Punctuation
ValueCountFrequency (%)
7
100.0%
Initial Punctuation
ValueCountFrequency (%)
7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 4310
57.1%
Common 2217
29.3%
Latin 1027
 
13.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
131
 
3.0%
129
 
3.0%
91
 
2.1%
89
 
2.1%
86
 
2.0%
75
 
1.7%
74
 
1.7%
68
 
1.6%
61
 
1.4%
59
 
1.4%
Other values (518) 3447
80.0%
Common
ValueCountFrequency (%)
1214
54.8%
& 216
 
9.7%
; 213
 
9.6%
. 115
 
5.2%
- 101
 
4.6%
' 68
 
3.1%
" 42
 
1.9%
1 40
 
1.8%
, 36
 
1.6%
9 20
 
0.9%
Other values (29) 152
 
6.9%
Latin
ValueCountFrequency (%)
b 217
21.1%
n 217
21.1%
p 213
20.7%
s 213
20.7%
M 23
 
2.2%
B 21
 
2.0%
N 20
 
1.9%
S 14
 
1.4%
O 12
 
1.2%
P 11
 
1.1%
Other values (23) 66
 
6.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 4310
57.1%
ASCII 3219
42.6%
Punctuation 14
 
0.2%
None 4
 
0.1%
Enclosed Alphanum 3
 
< 0.1%
Geometric Shapes 2
 
< 0.1%
Math Operators 1
 
< 0.1%
Misc Symbols 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1214
37.7%
b 217
 
6.7%
n 217
 
6.7%
& 216
 
6.7%
; 213
 
6.6%
p 213
 
6.6%
s 213
 
6.6%
. 115
 
3.6%
- 101
 
3.1%
' 68
 
2.1%
Other values (54) 432
 
13.4%
Hangul
ValueCountFrequency (%)
131
 
3.0%
129
 
3.0%
91
 
2.1%
89
 
2.1%
86
 
2.0%
75
 
1.7%
74
 
1.7%
68
 
1.6%
61
 
1.4%
59
 
1.4%
Other values (518) 3447
80.0%
Punctuation
ValueCountFrequency (%)
7
50.0%
7
50.0%
Enclosed Alphanum
ValueCountFrequency (%)
3
100.0%
None
ValueCountFrequency (%)
2
50.0%
2
50.0%
Geometric Shapes
ValueCountFrequency (%)
2
100.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Misc Symbols
ValueCountFrequency (%)
1
100.0%

MDA_ART_ESSN_NO
Text

MISSING 

Distinct: 16
Distinct (%): 100.0%
Missing: 134
Missing (%): 89.3%
Memory size: 1.3 KiB

Length

Max length: 297
Median length: 7
Mean length: 47.3125
Min length: 7

Characters and Unicode

Total characters: 757
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 16
Unique (%): 100.0%

Sample

1st row: 4023243
2nd row: http://img.mbn.co.kr/filewww/news/2020/01/01/15778451315e0c018b9f1e7.jpg,,,,,,,,,
3rd row: 4023334
4th row: http://img.mbn.co.kr/filewww/news/2020/01/01/15778574355e0c319b1b12a.jpg,,,,,,,,,
5th row: 4023389
Value | Count | Frequency (%)
4023243 | 1 | 6.2%
http://img.mbn.co.kr/filewww/news/2020/01/01/15778451315e0c018b9f1e7.jpg | 1 | 6.2%
4023334 | 1 | 6.2%
http://img.mbn.co.kr/filewww/news/2020/01/01/15778574355e0c319b1b12a.jpg | 1 | 6.2%
4023389 | 1 | 6.2%
http://img.mbn.co.kr/filewww/news/other/2020/01/01/002002012102.png | 1 | 6.2%
4023448 | 1 | 6.2%
4023460 | 1 | 6.2%
4023529 | 1 | 6.2%
4023531 | 1 | 6.2%
Other values (6) | 6 | 37.5%

Most occurring characters

ValueCountFrequency (%)
/ 75
 
9.9%
0 71
 
9.4%
2 61
 
8.1%
, 54
 
7.1%
. 36
 
4.8%
w 36
 
4.8%
3 32
 
4.2%
1 30
 
4.0%
e 30
 
4.0%
5 25
 
3.3%
Other values (24) 307
40.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 300
39.6%
Decimal Number 283
37.4%
Other Punctuation 174
23.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
w 36
 
12.0%
e 30
 
10.0%
t 21
 
7.0%
n 19
 
6.3%
g 18
 
6.0%
m 18
 
6.0%
i 18
 
6.0%
p 18
 
6.0%
b 15
 
5.0%
f 13
 
4.3%
Other values (10) 94
31.3%
Decimal Number
ValueCountFrequency (%)
0 71
25.1%
2 61
21.6%
3 32
11.3%
1 30
10.6%
5 25
 
8.8%
4 19
 
6.7%
7 15
 
5.3%
9 13
 
4.6%
8 10
 
3.5%
6 7
 
2.5%
Other Punctuation
ValueCountFrequency (%)
/ 75
43.1%
, 54
31.0%
. 36
20.7%
: 9
 
5.2%

Most occurring scripts

ValueCountFrequency (%)
Common 457
60.4%
Latin 300
39.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 36
 
12.0%
e 30
 
10.0%
t 21
 
7.0%
n 19
 
6.3%
g 18
 
6.0%
m 18
 
6.0%
i 18
 
6.0%
p 18
 
6.0%
b 15
 
5.0%
f 13
 
4.3%
Other values (10) 94
31.3%
Common
ValueCountFrequency (%)
/ 75
16.4%
0 71
15.5%
2 61
13.3%
, 54
11.8%
. 36
7.9%
3 32
7.0%
1 30
 
6.6%
5 25
 
5.5%
4 19
 
4.2%
7 15
 
3.3%
Other values (4) 39
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 757
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 75
 
9.9%
0 71
 
9.4%
2 61
 
8.1%
, 54
 
7.1%
. 36
 
4.8%
w 36
 
4.8%
3 32
 
4.2%
1 30
 
4.0%
e 30
 
4.0%
5 25
 
3.3%
Other values (24) 307
40.6%

MDA_CGR_NM
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
<NA>: 135
mbn00007: 10
김정은: 2
서주희 인턴: 1
이동훈: 1

Length

Max length: 8
Median length: 4
Mean length: 4.2533333
Min length: 3

Unique

Unique: 3
Unique (%): 2.0%

Sample

1st row: <NA>
2nd row: mbn00007
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 135 | 90.0%
mbn00007 | 10 | 6.7%
김정은 | 2 | 1.3%
서주희 인턴 | 1 | 0.7%
이동훈 | 1 | 0.7%
이기종 | 1 | 0.7%
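
The alert for this variable reports an imbalance of 75.8%. One formula that reproduces that figure from the counts above is one minus the Shannon entropy of the class shares, normalised by the logarithm of the number of classes. Whether ydata-profiling uses exactly this definition is an assumption here; the sketch only shows that the numbers agree, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalise
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts from the Common Values table: <NA>, mbn00007, 김정은, 서주희 인턴, 이동훈, 이기종
counts = [135, 10, 2, 1, 1, 1]
score = imbalance_score(counts)
print(f"{100 * score:.1f}%")  # 75.8%
```

A score of 0 would mean all six classes are equally frequent; the value here is driven almost entirely by the 135 `<NA>` entries.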

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 135 | 89.4%
mbn00007 | 10 | 6.6%
김정은 | 2 | 1.3%
서주희 | 1 | 0.7%
인턴 | 1 | 0.7%
이동훈 | 1 | 0.7%
이기종 | 1 | 0.7%

STD_YEAR
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 50.0%
Missing: 130
Missing (%): 86.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0100051 × 10^13
Minimum: 2020
Maximum: 2.0200102 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 2020
5-th percentile: 2020
Q1: 2020
Median: 1.0100051 × 10^13
Q3: 2.0200101 × 10^13
95-th percentile: 2.0200102 × 10^13
Maximum: 2.0200102 × 10^13
Range: 2.0200102 × 10^13
Interquartile range (IQR): 2.0200101 × 10^13

Descriptive statistics

Standard deviation: 1.0362433 × 10^13
Coefficient of variation (CV): 1.0259784
Kurtosis: -2.2352941
Mean: 1.0100051 × 10^13
Median Absolute Deviation (MAD): 1.0100051 × 10^13
Skewness: 3.6215387 × 10^-15
Sum: 2.0200102 × 10^14
Variance: 1.0738003 × 10^26
Monotonicity: Not monotonic
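
These statistics are dominated by a mix of two kinds of value: the bare year 2020 (10 rows) and 14-digit numbers such as 20200101193035 (10 rows). The sketch below rebuilds the 20 non-missing values from the value counts in this report and recomputes the mean and coefficient of variation. It also parses the long values as `YYYYMMDDHHMMSS` timestamps, which is an assumption based on their shape and not something the report confirms.

```python
import statistics
from datetime import datetime

# Non-missing STD_YEAR values, rebuilt from the value counts in this report.
timestamps = [
    20200101193035, 20200101193035, 20200101112455, 20200101145423,
    20200101161204, 20200102080135, 20200102080235, 20200102090335,
    20200102094144, 20200102094236,
]
values = [2020] * 10 + timestamps

mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
cv = stdev / mean                 # coefficient of variation
print(f"mean={mean:.7e} cv={cv:.7f}")

# Assumption: the 14-digit values encode YYYYMMDDHHMMSS timestamps.
parsed = [datetime.strptime(str(v), "%Y%m%d%H%M%S") for v in timestamps]
print(parsed[0])
```

If the timestamp reading is right, a column named STD_YEAR holding both years and full timestamps may point to values landing in the wrong column during loading, so the mean and variance above are better treated as a symptom than as meaningful measurements.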
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
2020 | 10 | 6.7%
20200101193035 | 2 | 1.3%
20200101112455 | 1 | 0.7%
20200101145423 | 1 | 0.7%
20200101161204 | 1 | 0.7%
20200102080135 | 1 | 0.7%
20200102080235 | 1 | 0.7%
20200102090335 | 1 | 0.7%
20200102094144 | 1 | 0.7%
20200102094236 | 1 | 0.7%
(Missing) | 130 | 86.7%

Value | Count | Frequency (%)
2020 | 10 | 6.7%
20200101112455 | 1 | 0.7%
20200101145423 | 1 | 0.7%
20200101161204 | 1 | 0.7%
20200101193035 | 2 | 1.3%
20200102080135 | 1 | 0.7%
20200102080235 | 1 | 0.7%
20200102090335 | 1 | 0.7%
20200102094144 | 1 | 0.7%
20200102094236 | 1 | 0.7%

Value | Count | Frequency (%)
20200102094236 | 1 | 0.7%
20200102094144 | 1 | 0.7%
20200102090335 | 1 | 0.7%
20200102080235 | 1 | 0.7%
20200102080135 | 1 | 0.7%
20200101193035 | 2 | 1.3%
20200101161204 | 1 | 0.7%
20200101145423 | 1 | 0.7%
20200101112455 | 1 | 0.7%
2020 | 10 | 6.7%

ART_SJ_CN
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 140
Missing (%): 93.3%
Memory size: 1.3 KiB

Length

Max length: 47
Median length: 32.5
Mean length: 28.9
Min length: 18

Characters and Unicode

Total characters: 289
Distinct characters: 144
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 5

Unique

Unique: 10
Unique (%): 100.0%

Sample

1st row: 원로 연극평론가 구히서 선생 별세
2nd row: '사노라면' 쏘가리 부부의 동상이몽..."남편 보면 속이 터져"
3rd row: 조정석, 수상 소감 중 아내 거미에게 "사랑해"
4th row: 짐승 탈을 쓴 배우들…새해 흥행 공식은 '동물'
5th row: 국민MC 송해 입원에 응원 댓글 이어져…"단순 감기몸살"
Value | Count | Frequency (%)
송해 | 2 | 3.3%
원로 | 1 | 1.7%
댓글 | 1 | 1.7%
감기몸살 | 1 | 1.7%
트와이스 | 1 | 1.7%
나연 | 1 | 1.7%
스토커 | 1 | 1.7%
기내 | 1 | 1.7%
소란…jyp | 1 | 1.7%
최고강도 | 1 | 1.7%
Other values (49) | 49 | 81.7%

Most occurring characters

ValueCountFrequency (%)
50
 
17.3%
" 10
 
3.5%
' 8
 
2.8%
6
 
2.1%
5
 
1.7%
, 4
 
1.4%
4
 
1.4%
4
 
1.4%
4
 
1.4%
4
 
1.4%
Other values (134) 190
65.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 183
63.3%
Space Separator 50
 
17.3%
Other Punctuation 38
 
13.1%
Lowercase Letter 12
 
4.2%
Uppercase Letter 5
 
1.7%
Other Symbol 1
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6
 
3.3%
5
 
2.7%
4
 
2.2%
4
 
2.2%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (114) 145
79.2%
Other Punctuation
ValueCountFrequency (%)
" 10
26.3%
' 8
21.1%
, 4
 
10.5%
4
 
10.5%
. 3
 
7.9%
; 3
 
7.9%
& 3
 
7.9%
? 2
 
5.3%
· 1
 
2.6%
Uppercase Letter
ValueCountFrequency (%)
M 1
20.0%
C 1
20.0%
P 1
20.0%
Y 1
20.0%
J 1
20.0%
Lowercase Letter
ValueCountFrequency (%)
b 3
25.0%
p 3
25.0%
s 3
25.0%
n 3
25.0%
Space Separator
ValueCountFrequency (%)
50
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 183
63.3%
Common 89
30.8%
Latin 17
 
5.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6
 
3.3%
5
 
2.7%
4
 
2.2%
4
 
2.2%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (114) 145
79.2%
Common
ValueCountFrequency (%)
50
56.2%
" 10
 
11.2%
' 8
 
9.0%
, 4
 
4.5%
4
 
4.5%
. 3
 
3.4%
; 3
 
3.4%
& 3
 
3.4%
? 2
 
2.2%
1
 
1.1%
Latin
ValueCountFrequency (%)
b 3
17.6%
p 3
17.6%
s 3
17.6%
n 3
17.6%
M 1
 
5.9%
C 1
 
5.9%
P 1
 
5.9%
Y 1
 
5.9%
J 1
 
5.9%

Most occurring blocks

ValueCountFrequency (%)
Hangul 183
63.3%
ASCII 100
34.6%
Punctuation 4
 
1.4%
Misc Symbols 1
 
0.3%
None 1
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
50
50.0%
" 10
 
10.0%
' 8
 
8.0%
, 4
 
4.0%
. 3
 
3.0%
b 3
 
3.0%
; 3
 
3.0%
p 3
 
3.0%
s 3
 
3.0%
& 3
 
3.0%
Other values (7) 10
 
10.0%
Hangul
ValueCountFrequency (%)
6
 
3.3%
5
 
2.7%
4
 
2.2%
4
 
2.2%
4
 
2.2%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
3
 
1.6%
Other values (114) 145
79.2%
Punctuation
ValueCountFrequency (%)
4
100.0%
Misc Symbols
ValueCountFrequency (%)
1
100.0%
None
ValueCountFrequency (%)
· 1
100.0%

ART_CN
Text

MISSING 

Distinct: 8
Distinct (%): 80.0%
Missing: 140
Missing (%): 93.3%
Memory size: 1.3 KiB

Length

Max length: 183
Median length: 81
Mean length: 63.8
Min length: 8

Characters and Unicode

Total characters: 638
Distinct characters: 167
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 6
Unique (%): 60.0%

Sample

1st row: <!------------ PHOTO_POS_0 ------------>
2nd row: <!------------ PHOTO_POS_0 ------------>
3rd row: <!------------ PHOTO_POS_0 ------------> 배우 조정석이 '2019 SBS 연기대상'에서 아내 거미를 언급했다.
4th row: 【 앵커멘트 】
5th row: 【 앵커멘트 】
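Value | Count | Frequency (%)
(value not preserved) | 10 | 13.2%
photo_pos_0 | 5 | 6.6%
(value not preserved) | 2 | 2.6%
앵커멘트 | 2 | 2.6%
(value not preserved) | 2 | 2.6%
교황이 | 1 | 1.3%
끌어당긴 | 1 | 1.3%
움켜쥐고 | 1 | 1.3%
손을 | 1 | 1.3%
자신의 | 1 | 1.3%
Other values (50) | 50 | 65.8%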

Most occurring characters

ValueCountFrequency (%)
- 120
 
18.8%
67
 
10.5%
p 18
 
2.8%
& 18
 
2.8%
n 18
 
2.8%
b 18
 
2.8%
s 18
 
2.8%
; 18
 
2.8%
O 15
 
2.4%
P 10
 
1.6%
Other values (157) 318
49.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 225
35.3%
Dash Punctuation 120
18.8%
Lowercase Letter 72
 
11.3%
Space Separator 67
 
10.5%
Other Punctuation 54
 
8.5%
Uppercase Letter 47
 
7.4%
Decimal Number 21
 
3.3%
Connector Punctuation 10
 
1.6%
Math Symbol 10
 
1.6%
Open Punctuation 6
 
0.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9
 
4.0%
8
 
3.6%
7
 
3.1%
5
 
2.2%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
Other values (122) 172
76.4%
Uppercase Letter
ValueCountFrequency (%)
O 15
31.9%
P 10
21.3%
S 7
14.9%
T 5
 
10.6%
H 5
 
10.6%
B 2
 
4.3%
N 1
 
2.1%
M 1
 
2.1%
X 1
 
2.1%
Decimal Number
ValueCountFrequency (%)
0 8
38.1%
2 5
23.8%
9 2
 
9.5%
1 2
 
9.5%
8 1
 
4.8%
6 1
 
4.8%
5 1
 
4.8%
4 1
 
4.8%
Other Punctuation
ValueCountFrequency (%)
& 18
33.3%
; 18
33.3%
' 7
 
13.0%
. 6
 
11.1%
! 5
 
9.3%
Lowercase Letter
ValueCountFrequency (%)
p 18
25.0%
n 18
25.0%
b 18
25.0%
s 18
25.0%
Math Symbol
ValueCountFrequency (%)
< 5
50.0%
> 5
50.0%
Open Punctuation
ValueCountFrequency (%)
( 4
66.7%
2
33.3%
Close Punctuation
ValueCountFrequency (%)
) 4
66.7%
2
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 120
100.0%
Space Separator
ValueCountFrequency (%)
67
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 294
46.1%
Hangul 225
35.3%
Latin 119
18.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9
 
4.0%
8
 
3.6%
7
 
3.1%
5
 
2.2%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
Other values (122) 172
76.4%
Common
ValueCountFrequency (%)
- 120
40.8%
67
22.8%
& 18
 
6.1%
; 18
 
6.1%
_ 10
 
3.4%
0 8
 
2.7%
' 7
 
2.4%
. 6
 
2.0%
! 5
 
1.7%
2 5
 
1.7%
Other values (12) 30
 
10.2%
Latin
ValueCountFrequency (%)
p 18
15.1%
n 18
15.1%
b 18
15.1%
s 18
15.1%
O 15
12.6%
P 10
8.4%
S 7
 
5.9%
T 5
 
4.2%
H 5
 
4.2%
B 2
 
1.7%
Other values (3) 3
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 409
64.1%
Hangul 225
35.3%
None 4
 
0.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 120
29.3%
67
16.4%
p 18
 
4.4%
& 18
 
4.4%
n 18
 
4.4%
b 18
 
4.4%
s 18
 
4.4%
; 18
 
4.4%
O 15
 
3.7%
P 10
 
2.4%
Other values (23) 89
21.8%
Hangul
ValueCountFrequency (%)
9
 
4.0%
8
 
3.6%
7
 
3.1%
5
 
2.2%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
4
 
1.8%
Other values (122) 172
76.4%
None
ValueCountFrequency (%)
2
50.0%
2
50.0%

ATCH_IMG_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 150
Missing (%): 100.0%
Memory size: 1.4 KiB

JRNL_NM
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 150
Missing (%): 100.0%
Memory size: 1.4 KiB

WRT_DATE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 150
Missing (%): 100.0%
Memory size: 1.4 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 150
Missing (%): 100.0%
Memory size: 1.4 KiB

Interactions

[Figure: interaction plot; image not preserved]

Correlations

Variable | MBN_MDA_SP_CD | MDA_ART_ESSN_NO | MDA_CGR_NM | STD_YEAR | ART_SJ_CN | ART_CN
MBN_MDA_SP_CD | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN
MDA_ART_ESSN_NO | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000
MDA_CGR_NM | 1.000 | 1.000 | 1.000 | NaN | NaN | NaN
STD_YEAR | NaN | NaN | NaN | 1.000 | NaN | NaN
ART_SJ_CN | NaN | 1.000 | NaN | NaN | 1.000 | 1.000
ART_CN | NaN | 1.000 | NaN | NaN | 1.000 | 1.000
Variable | STD_YEAR | MDA_CGR_NM
STD_YEAR | 1.000 | 0.877
MDA_CGR_NM | 0.877 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
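
To make the idea of nullity correlation concrete, the sketch below computes a Pearson correlation between the "is present" indicators of two columns. It is an illustration under stated assumptions rather than the report's implementation: `nullity_correlation` is a hypothetical helper, `None` stands for a missing cell, and the toy columns are invented.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-present indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical values): title and body are missing together.
title = ["a", None, None, "b", None, "c"]
body = ["x", None, None, "y", None, "z"]
year = [2020, None, 2020, None, None, 2020]

print(nullity_correlation(title, body))  # 1.0
print(nullity_correlation(title, year))
```

A value of 1 means two columns are always present or absent together, and a value near 0 means their missingness is unrelated. Columns that are entirely missing, such as ATCH_IMG_NM in this dataset, have no variance and yield NaN in this sketch.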

Sample

Row | MBN_MDA_SP_CD | MDA_ART_ESSN_NO | MDA_CGR_NM | STD_YEAR | ART_SJ_CN | ART_CN | ATCH_IMG_NM | JRNL_NM | WRT_DATE | Unnamed: 9
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | MBN | 4023243 | mbn00007 | 2020 | 원로 연극평론가 구히서 선생 별세 | <!------------ PHOTO_POS_0 ------------> | <NA> | <NA> | <NA> | <NA>
2 | 원로 연극평론가 구히서(본명 구희서) 선생이 31일 별세했습니다. 향년 80세입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
4 | 고인은 수년 전 건강이 악화해 자택에서 투병했으며, 오늘 새벽 3시쯤 서울대병원에서 영면에 들었습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | 고인은 서울에서 태어나 경기여고와 이화여자대학교 사학과를 졸업하고 문화재관리국, 문화재연구소 등지에서 근무하다 1970년부터 1994년까지 한국일보와 일간스포츠에서 연극 전문기자로 활동했습니다. 퇴직 후인 1994∼1998년에는 한국연극평론가협회 회장을 지냈습니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | 고인은 여석기, 한상철, 이태주, 이상일 평론가와 함께 한국연극평론가협회의 전신인 서울연극평론가그룹을 이끈 것으로 유명합니다. 서울연극평론가그룹은 공연예술계에서 평론가 집단을 형성한 최초 사례입니다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
Row | MBN_MDA_SP_CD | MDA_ART_ESSN_NO | MDA_CGR_NM | STD_YEAR | ART_SJ_CN | ART_CN | ATCH_IMG_NM | JRNL_NM | WRT_DATE | Unnamed: 9
140 | MBN | 4023626 | mbn00007 | 2020 | 엑스원 한승우·에이핑크 정은지 열애? 양측 부인 | <!------------ PHOTO_POS_0 ------------> 엑스원 한승우(25)와 에이핑크 정은지(26) 양측이 열애 의혹을 부인했다. | <NA> | <NA> | <NA> | <NA>
141 | 한승우와 정은지 소속사 플레이엠엔터테인먼트 관계자는 1일 "현재 온라인상에 언급되고 있는 내용은 사실무근"이라고 전했다. 이어 "두 사람은 같은 회사 선후배일 뿐 더 이상의 억측은 자제해달라"고 당부했다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
142 | 앞서 온라인 커뮤니티와 SNS를 중심으로 한승우와 정은지로 보이는 두 남녀를 포착한 사진이 빠르게 퍼졌다. 마스크로 얼굴을 가렸지만 언뜻 봐도 한승우와 정은지였다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
143 | 이를 게시한 누리꾼은 "한승우와 정은지가 12월 13일 오후 6시 하남시 스타필드에서 포착됐다. 1월 1일이 곧 지나가는데 왜 디스패치는 아직 발표를 안 하느냐. 답답해서 대신 하나 올린다"고 했다. 팬들은 사진의 진위 여부와 더불어 열애 의혹에 대한 소속사의 입장 발표를 기다렸다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
144 | 한승우는 그룹 시크릿으로 활동한 배우 한선화의 남동생으로, 그룹 빅톤의 멤버로 데뷔했다. 한승우는 Mnet '프로듀스X101'을 통해 프로젝트 그룹 엑스원의 멤버로 발탁됐으나, 현재 엑스원은 제작진의 문자 투표 조작 논란 이후 잠정적으로 활동을 중단했다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
145 | 정은지는 에이핑크 소속 멤버이자 연기돌로 활동하고 있다. 오는 2월 1일~2일 양일간 여섯 번째 단독 콘서트를 개최하고 팬들을 만날 계획이다. | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
146 | [디지털뉴스국 김정은 인턴기자] | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
147 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
148 | [ⓒ 매일경제 & mk.co.kr, 무단전재 및 재배포 금지]<br> | http://img.mbn.co.kr/filewww/news/other/2020/01/02/022439020929.jpg,,,,,,,,, | 김정은 | 20200102094236 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
149 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Row | MBN_MDA_SP_CD | MDA_ART_ESSN_NO | MDA_CGR_NM | STD_YEAR | ART_SJ_CN | ART_CN | # duplicates
2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 51
0 | [디지털뉴스국 김정은 인턴기자] | <NA> | <NA> | <NA> | <NA> | <NA> | 2
1 | 【 기자 】 | <NA> | <NA> | <NA> | <NA> | <NA> | 2
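
The table lists three distinct rows that occur more than once, which matches the "3 duplicate rows" figure in the overview. That suggests the overview counts distinct duplicated rows rather than repeated occurrences, though this is an inference from the numbers. A minimal sketch of that kind of grouping, using only the standard library and a made-up miniature of the data, might look like this (`duplicate_groups` is an illustrative name):

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical miniature of the data: None stands for a missing cell.
rows = [
    ("MBN", "4023243"),
    (None, None),
    ("【 기자 】", None),
    (None, None),
    ("【 기자 】", None),
    (None, None),
]
groups = duplicate_groups(rows)

# Print the most frequent duplicated rows first.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

In the real data the all-missing row dominates with 51 occurrences, so dropping fully empty rows before profiling would remove most of the duplication.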