Overview

Dataset statistics

Number of variables: 13
Number of observations: 2270
Missing cells: 1837
Missing cells (%): 6.2%
Duplicate rows: 2
Duplicate rows (%): 0.1%
Total size in memory: 230.7 KiB
Average record size in memory: 104.1 B
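The percentage and per-record figures above can be re-derived from the raw counts. This is a minimal sketch under three assumptions: missing cells (%) is taken over all cells (observations × variables), duplicate rows (%) is taken over observations, and 1 KiB = 1024 bytes. Because the 230.7 KiB total is itself rounded, the per-record size is only approximate.

```python
"""Re-derive the overview ratios from the raw counts reported above."""


def overview_ratios(n_vars: int, n_obs: int, missing_cells: int,
                    duplicate_rows: int, memory_kib: float) -> dict:
    """Return missing-cell %, duplicate-row % and average bytes per record."""
    total_cells = n_vars * n_obs  # assumption: % of all cells, not of rows
    return {
        "missing_cells_pct": 100 * missing_cells / total_cells,
        "duplicate_rows_pct": 100 * duplicate_rows / n_obs,
        # assumption: 1 KiB = 1024 bytes
        "avg_record_bytes": memory_kib * 1024 / n_obs,
    }


if __name__ == "__main__":
    ratios = overview_ratios(13, 2270, 1837, 2, 230.7)
    for name, value in ratios.items():
        print(f"{name}: {value:.1f}")
```

With the reported counts this prints 6.2, 0.1 and 104.1, matching the overview.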

Variable types

Text: 7
Categorical: 4
Boolean: 2

Dataset

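Description: List of faculty members in Korean studies departments abroad (해외 한국학 관련 학과교수 목록 정보)
Author: The Academy of Korean Studies (한국학중앙연구원)
URL: https://www.data.go.kr/data/15049054/fileData.do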

Alerts

Duplicates: Dataset has 2 (0.1%) duplicate rows
High correlation: ERASER is highly overall correlated with MODIFIER and 1 other field
High correlation: ERASE_YN is highly overall correlated with ERASE_DT and 1 other field
High correlation: ERASE_DT is highly overall correlated with MODIFIER and 1 other field
High correlation: MODIFIER is highly overall correlated with ERASE_DT and 1 other field
Imbalance: CODE_ORDER is highly imbalanced (50.4%)
Imbalance: ERASE_DT is highly imbalanced (84.3%)
Imbalance: ERASER is highly imbalanced (71.1%)
Missing: MEMBER_NUM has 1484 (65.4%) missing values
Missing: MEMBER_ID has 103 (4.5%) missing values
Missing: MODIFY_DT has 210 (9.3%) missing values
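The imbalance percentages are consistent with an entropy-based score, 1 - H/log2(k), where H is the Shannon entropy (in bits) of the category frequencies and k is the number of categories: 0 means perfectly balanced, 1 means a single category dominates. The sketch below assumes that definition (to our understanding it is the one ydata-profiling uses); fed with ERASER's four category counts from its Common Values table it gives 71.1%, matching the alert.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Entropy-based imbalance: 0 for a uniform distribution, near 1 when one class dominates."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; empty classes contribute nothing
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    # Assumption: normalised by the maximum entropy for k classes, log2(k)
    return 1 - entropy / math.log2(n_classes)


if __name__ == "__main__":
    # ERASER counts: <NA>, ksnet@aks.ac.kr, goodday2me@gmail.com, oic@aks.ac.kr
    eraser_counts = [2042, 162, 54, 12]
    print(f"ERASER imbalance: {imbalance_score(eraser_counts):.1%}")
```

Note that "<NA>" is counted as an ordinary category here, as it is in the report.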

Reproduction

Analysis started: 2023-12-12 03:03:59.075526
Analysis finished: 2023-12-12 03:04:01.298183
Duration: 2.22 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

INSTITUTION_FACULTY_ID
Text

Distinct: 2267
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

Length

Max length: 33
Median length: 4
Mean length: 4.0511013
Min length: 4
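The length figures count characters (Unicode code points) per value. A minimal sketch, assuming missing values are skipped; ydata-profiling's internal median computation may differ, so treat this as an approximation rather than its exact method.

```python
import statistics
from typing import Iterable, Optional


def length_summary(values: Iterable[Optional[str]]) -> dict:
    """Max, median, mean and min character length over non-missing values (None = missing)."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        raise ValueError("no non-missing values")
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


if __name__ == "__main__":
    # Korean syllables count as one character each, like ASCII digits
    print(length_summary(["1538", "문법론(Grammar)", None]))
```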

Characters and Unicode

Total characters: 9196
Distinct characters: 44
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
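The category tables are built from each character's Unicode general category. A minimal sketch using Python's standard `unicodedata` module; that module does not expose scripts or blocks, so this covers categories only, and the long-name mapping lists just the categories that appear in this report.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Optional

# Long names for the general-category codes seen in this report
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count characters per Unicode general category across non-missing values."""
    counts: Counter = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts(["1538", "문법론(Grammar)", None]).most_common())
```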

Unique

Unique: 2264
Unique (%): 99.7%
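"Distinct" counts different values, while "Unique" counts values that occur exactly once; both percentages appear to be relative to the non-missing values (for MEMBER_NUM, 756 / (2270 - 1484) = 756 / 786 ≈ 96.2%). For INSTITUTION_FACULTY_ID, 2267 distinct versus 2264 unique therefore means three values occur more than once. A small sketch of both counts, assuming None marks a missing value:

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def distinct_unique(values: Iterable[Optional[Hashable]]) -> dict:
    """Distinct/Unique counts and percentages over non-missing values (None = missing)."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100 * unique / n if n else 0.0,
    }


if __name__ == "__main__":
    print(distinct_unique(["a", "a", "b", "c", None]))
```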

Sample

1st row: 1538
2nd row: 1539
3rd row: 1540
4th row: 1541
5th row: 1543

Words

Value | Count | Frequency (%)
한국어교육학(korean | 2 | 0.1%
education | 2 | 0.1%
문법론(grammar | 2 | 0.1%
번역학(translation | 2 | 0.1%
studies | 2 | 0.1%
language | 2 | 0.1%
3216 | 1 | < 0.1%
3217 | 1 | < 0.1%
1538 | 1 | < 0.1%
3221 | 1 | < 0.1%
Other values (2260) | 2260 | 99.3%
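The word table looks consistent with lower-casing each value, splitting it on whitespace and stripping punctuation from the ends of each token: 번역학(Translation Studies) yields 번역학(translation and studies, and 문법론(Grammar) yields 문법론(grammar. The sketch below reproduces those tokens under that assumption; it is not taken from ydata-profiling's source.

```python
import string
from collections import Counter
from typing import Iterable, Optional


def word_counts(values: Iterable[Optional[str]]) -> Counter:
    """Lower-case, split on whitespace, strip leading/trailing punctuation, then count."""
    counts: Counter = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            word = token.strip(string.punctuation)  # inner "(" is kept, outer ")" removed
            if word:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    print(word_counts(["번역학(Translation Studies)", "문법론(Grammar)", "1538"]))
```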

Most occurring characters

Value | Count | Frequency (%)
2 | 1588 | 17.3%
4 | 1352 | 14.7%
1 | 1051 | 11.4%
3 | 802 | 8.7%
7 | 743 | 8.1%
5 | 726 | 7.9%
9 | 724 | 7.9%
8 | 711 | 7.7%
6 | 711 | 7.7%
0 | 648 | 7.0%
Other values (34) | 140 | 1.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 9056 | 98.5%
Lowercase Letter | 84 | 0.9%
Other Letter | 24 | 0.3%
Uppercase Letter | 12 | 0.1%
Space Separator | 8 | 0.1%
Close Punctuation | 6 | 0.1%
Open Punctuation | 6 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 16 | 19.0%
n | 10 | 11.9%
r | 8 | 9.5%
t | 6 | 7.1%
i | 6 | 7.1%
u | 6 | 7.1%
o | 6 | 7.1%
e | 6 | 7.1%
g | 4 | 4.8%
m | 4 | 4.8%
Other values (4) | 12 | 14.3%

Other Letter
Value | Count | Frequency (%)
| 4 | 16.7%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%

Decimal Number
Value | Count | Frequency (%)
2 | 1588 | 17.5%
4 | 1352 | 14.9%
1 | 1051 | 11.6%
3 | 802 | 8.9%
7 | 743 | 8.2%
5 | 726 | 8.0%
9 | 724 | 8.0%
8 | 711 | 7.9%
6 | 711 | 7.9%
0 | 648 | 7.2%

Uppercase Letter
Value | Count | Frequency (%)
E | 2 | 16.7%
S | 2 | 16.7%
K | 2 | 16.7%
L | 2 | 16.7%
T | 2 | 16.7%
G | 2 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 8 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 9076 | 98.7%
Latin | 96 | 1.0%
Hangul | 24 | 0.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 16 | 16.7%
n | 10 | 10.4%
r | 8 | 8.3%
t | 6 | 6.2%
i | 6 | 6.2%
u | 6 | 6.2%
o | 6 | 6.2%
e | 6 | 6.2%
g | 4 | 4.2%
m | 4 | 4.2%
Other values (10) | 24 | 25.0%

Common
Value | Count | Frequency (%)
2 | 1588 | 17.5%
4 | 1352 | 14.9%
1 | 1051 | 11.6%
3 | 802 | 8.8%
7 | 743 | 8.2%
5 | 726 | 8.0%
9 | 724 | 8.0%
8 | 711 | 7.8%
6 | 711 | 7.8%
0 | 648 | 7.1%
Other values (3) | 20 | 0.2%

Hangul
Value | Count | Frequency (%)
| 4 | 16.7%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9172 | 99.7%
Hangul | 24 | 0.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 1588 | 17.3%
4 | 1352 | 14.7%
1 | 1051 | 11.5%
3 | 802 | 8.7%
7 | 743 | 8.1%
5 | 726 | 7.9%
9 | 724 | 7.9%
8 | 711 | 7.8%
6 | 711 | 7.8%
0 | 648 | 7.1%
Other values (23) | 116 | 1.3%

Hangul
Value | Count | Frequency (%)
| 4 | 16.7%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%
| 2 | 8.3%

INSTITUTION_NUM
Text

Distinct: 291
Distinct (%): 12.8%
Missing: 5
Missing (%): 0.2%
Memory size: 17.9 KiB

Length

Max length: 22
Median length: 6
Mean length: 6.007064
Min length: 6

Characters and Unicode

Total characters: 13606
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 51
Unique (%): 2.3%

Sample

1st row: 100001
2nd row: 100001
3rd row: 100001
4th row: 100001
5th row: 100002

Words

Value | Count | Frequency (%)
100029 | 45 | 2.0%
100212 | 36 | 1.6%
100006 | 35 | 1.5%
100085 | 33 | 1.5%
100426 | 33 | 1.5%
100294 | 32 | 1.4%
100971 | 30 | 1.3%
100653 | 29 | 1.3%
100379 | 28 | 1.2%
100576 | 28 | 1.2%
Other values (281) | 1936 | 85.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 5507 | 40.5%
1 | 2902 | 21.3%
5 | 809 | 5.9%
2 | 793 | 5.8%
4 | 732 | 5.4%
8 | 729 | 5.4%
7 | 581 | 4.3%
6 | 537 | 3.9%
3 | 517 | 3.8%
9 | 485 | 3.6%
Other values (10) | 14 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 13592 | 99.9%
Uppercase Letter | 12 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5507 | 40.5%
1 | 2902 | 21.4%
5 | 809 | 6.0%
2 | 793 | 5.8%
4 | 732 | 5.4%
8 | 729 | 5.4%
7 | 581 | 4.3%
6 | 537 | 4.0%
3 | 517 | 3.8%
9 | 485 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 3 | 25.0%
E | 2 | 16.7%
N | 1 | 8.3%
G | 1 | 8.3%
T | 1 | 8.3%
I | 1 | 8.3%
H | 1 | 8.3%
C | 1 | 8.3%
R | 1 | 8.3%

Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 13594 | 99.9%
Latin | 12 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 5507 | 40.5%
1 | 2902 | 21.3%
5 | 809 | 6.0%
2 | 793 | 5.8%
4 | 732 | 5.4%
8 | 729 | 5.4%
7 | 581 | 4.3%
6 | 537 | 4.0%
3 | 517 | 3.8%
9 | 485 | 3.6%

Latin
Value | Count | Frequency (%)
A | 3 | 25.0%
E | 2 | 16.7%
N | 1 | 8.3%
G | 1 | 8.3%
T | 1 | 8.3%
I | 1 | 8.3%
H | 1 | 8.3%
C | 1 | 8.3%
R | 1 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 13606 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 5507 | 40.5%
1 | 2902 | 21.3%
5 | 809 | 5.9%
2 | 793 | 5.8%
4 | 732 | 5.4%
8 | 729 | 5.4%
7 | 581 | 4.3%
6 | 537 | 3.9%
3 | 517 | 3.8%
9 | 485 | 3.6%
Other values (10) | 14 | 0.1%

PROFILE_ID
Text

Distinct: 326
Distinct (%): 14.4%
Missing: 4
Missing (%): 0.2%
Memory size: 17.9 KiB

Length

Max length: 15
Median length: 10
Mean length: 10.000883
Min length: 7

Characters and Unicode

Total characters: 22662
Distinct characters: 26
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 61
Unique (%): 2.7%

Sample

1st row: 1000001385
2nd row: 1000001385
3rd row: 1000001385
4th row: 1000001385
5th row: 1000001386

Words

Value | Count | Frequency (%)
1000001412 | 40 | 1.8%
1000001390 | 35 | 1.5%
1000001646 | 33 | 1.5%
1000001495 | 32 | 1.4%
1000002514 | 31 | 1.4%
1000002666 | 30 | 1.3%
1000001504 | 28 | 1.2%
1000001544 | 28 | 1.2%
1000002453 | 27 | 1.2%
1000002394 | 25 | 1.1%
Other values (316) | 1957 | 86.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 11820 | 52.2%
1 | 4385 | 19.3%
5 | 1210 | 5.3%
2 | 1178 | 5.2%
4 | 1102 | 4.9%
6 | 860 | 3.8%
3 | 606 | 2.7%
9 | 600 | 2.6%
7 | 573 | 2.5%
8 | 311 | 1.4%
Other values (16) | 17 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 22645 | 99.9%
Lowercase Letter | 9 | < 0.1%
Other Letter | 3 | < 0.1%
Other Punctuation | 2 | < 0.1%
Close Punctuation | 1 | < 0.1%
Open Punctuation | 1 | < 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 11820 | 52.2%
1 | 4385 | 19.4%
5 | 1210 | 5.3%
2 | 1178 | 5.2%
4 | 1102 | 4.9%
6 | 860 | 3.8%
3 | 606 | 2.7%
9 | 600 | 2.6%
7 | 573 | 2.5%
8 | 311 | 1.4%

Lowercase Letter
Value | Count | Frequency (%)
a | 2 | 22.2%
s | 1 | 11.1%
m | 1 | 11.1%
c | 1 | 11.1%
i | 1 | 11.1%
t | 1 | 11.1%
g | 1 | 11.1%
r | 1 | 11.1%

Other Letter
Value | Count | Frequency (%)
| 1 | 33.3%
| 1 | 33.3%
| 1 | 33.3%

Other Punctuation
Value | Count | Frequency (%)
: | 1 | 50.0%
. | 1 | 50.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
P | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 22649 | 99.9%
Latin | 10 | < 0.1%
Hangul | 3 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 11820 | 52.2%
1 | 4385 | 19.4%
5 | 1210 | 5.3%
2 | 1178 | 5.2%
4 | 1102 | 4.9%
6 | 860 | 3.8%
3 | 606 | 2.7%
9 | 600 | 2.6%
7 | 573 | 2.5%
8 | 311 | 1.4%
Other values (4) | 4 | < 0.1%

Latin
Value | Count | Frequency (%)
a | 2 | 20.0%
s | 1 | 10.0%
m | 1 | 10.0%
c | 1 | 10.0%
i | 1 | 10.0%
t | 1 | 10.0%
g | 1 | 10.0%
r | 1 | 10.0%
P | 1 | 10.0%

Hangul
Value | Count | Frequency (%)
| 1 | 33.3%
| 1 | 33.3%
| 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 22659 | > 99.9%
Hangul | 3 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 11820 | 52.2%
1 | 4385 | 19.4%
5 | 1210 | 5.3%
2 | 1178 | 5.2%
4 | 1102 | 4.9%
6 | 860 | 3.8%
3 | 606 | 2.7%
9 | 600 | 2.6%
7 | 573 | 2.5%
8 | 311 | 1.4%
Other values (13) | 14 | 0.1%

Hangul
Value | Count | Frequency (%)
| 1 | 33.3%
| 1 | 33.3%
| 1 | 33.3%

MEMBER_NUM
Text

MISSING 

Distinct: 756
Distinct (%): 96.2%
Missing: 1484
Missing (%): 65.4%
Memory size: 17.9 KiB

Length

Max length: 7
Median length: 6
Mean length: 6.0012723
Min length: 6

Characters and Unicode

Total characters: 4717
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 728
Unique (%): 92.6%

Sample

1st row: 101146
2nd row: 100004
3rd row: 100005
4th row: 100968
5th row: 100007

Words

Value | Count | Frequency (%)
101074 | 3 | 0.4%
101075 | 3 | 0.4%
100777 | 2 | 0.3%
101289 | 2 | 0.3%
100728 | 2 | 0.3%
101052 | 2 | 0.3%
100176 | 2 | 0.3%
100687 | 2 | 0.3%
100058 | 2 | 0.3%
100958 | 2 | 0.3%
Other values (746) | 764 | 97.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 1738 | 36.8%
1 | 1179 | 25.0%
5 | 257 | 5.4%
8 | 249 | 5.3%
6 | 248 | 5.3%
7 | 244 | 5.2%
9 | 236 | 5.0%
2 | 203 | 4.3%
3 | 191 | 4.0%
4 | 170 | 3.6%
Other values (2) | 2 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4715 | > 99.9%
Other Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1738 | 36.9%
1 | 1179 | 25.0%
5 | 257 | 5.5%
8 | 249 | 5.3%
6 | 248 | 5.3%
7 | 244 | 5.2%
9 | 236 | 5.0%
2 | 203 | 4.3%
3 | 191 | 4.1%
4 | 170 | 3.6%

Other Punctuation
Value | Count | Frequency (%)
: | 1 | 50.0%
. | 1 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4717 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1738 | 36.8%
1 | 1179 | 25.0%
5 | 257 | 5.4%
8 | 249 | 5.3%
6 | 248 | 5.3%
7 | 244 | 5.2%
9 | 236 | 5.0%
2 | 203 | 4.3%
3 | 191 | 4.0%
4 | 170 | 3.6%
Other values (2) | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4717 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1738 | 36.8%
1 | 1179 | 25.0%
5 | 257 | 5.4%
8 | 249 | 5.3%
6 | 248 | 5.3%
7 | 244 | 5.2%
9 | 236 | 5.0%
2 | 203 | 4.3%
3 | 191 | 4.0%
4 | 170 | 3.6%
Other values (2) | 2 | < 0.1%

MEMBER_ID
Text

MISSING 

Distinct: 1734
Distinct (%): 80.0%
Missing: 103
Missing (%): 4.5%
Memory size: 17.9 KiB

Length

Max length: 48
Median length: 41
Mean length: 19.010152
Min length: 1

Characters and Unicode

Total characters: 41195
Distinct characters: 82
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1698
Unique (%): 78.4%

Sample

1st row: pori.park@asu.edu
2nd row: youngoh@asu.edu
3rd row: yysys@uchicago.edu
4th row: Cypark@asu.edu
5th row: garethmc@bu.edu

Words

Value | Count | Frequency (%)
ksnet@aks.ac.kr | 295 | 13.4%
temp.aks.ac.kr | 101 | 4.6%
educopar@gmail.com | 5 | 0.2%
unsw.edu.au | 4 | 0.2%
| 4 | 0.2%
edu.au | 4 | 0.2%
cencon@hindustanuniv.ac.in | 3 | 0.1%
princeton.edu | 3 | 0.1%
icfks-msu@yandex.ru | 3 | 0.1%
hegartyn@stjohns.edu | 2 | 0.1%
Other values (1743) | 1780 | 80.8%

Most occurring characters

Value | Count | Frequency (%)
. | 3984 | 9.7%
a | 3936 | 9.6%
e | 2810 | 6.8%
n | 2290 | 5.6%
k | 2262 | 5.5%
@ | 2166 | 5.3%
u | 2045 | 5.0%
s | 2010 | 4.9%
o | 1992 | 4.8%
c | 1885 | 4.6%
Other values (72) | 15815 | 38.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 32997 | 80.1%
Other Punctuation | 6162 | 15.0%
Decimal Number | 1541 | 3.7%
Uppercase Letter | 221 | 0.5%
Dash Punctuation | 128 | 0.3%
Connector Punctuation | 93 | 0.2%
Space Separator | 40 | 0.1%
Other Letter | 10 | < 0.1%
Close Punctuation | 1 | < 0.1%
Open Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 3936 | 11.9%
e | 2810 | 8.5%
n | 2290 | 6.9%
k | 2262 | 6.9%
u | 2045 | 6.2%
s | 2010 | 6.1%
o | 1992 | 6.0%
c | 1885 | 5.7%
m | 1744 | 5.3%
i | 1704 | 5.2%
Other values (16) | 10319 | 31.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 23 | 10.4%
C | 23 | 10.4%
H | 22 | 10.0%
J | 18 | 8.1%
K | 17 | 7.7%
A | 15 | 6.8%
M | 13 | 5.9%
L | 11 | 5.0%
I | 9 | 4.1%
E | 8 | 3.6%
Other values (13) | 62 | 28.1%

Decimal Number
Value | Count | Frequency (%)
2 | 246 | 16.0%
1 | 203 | 13.2%
3 | 193 | 12.5%
0 | 166 | 10.8%
4 | 154 | 10.0%
7 | 136 | 8.8%
8 | 114 | 7.4%
5 | 111 | 7.2%
6 | 111 | 7.2%
9 | 107 | 6.9%

Other Letter
Value | Count | Frequency (%)
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%

Other Punctuation
Value | Count | Frequency (%)
. | 3984 | 64.7%
@ | 2166 | 35.2%
, | 8 | 0.1%
: | 1 | < 0.1%
/ | 1 | < 0.1%
? | 1 | < 0.1%
; | 1 | < 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 128 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 93 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 40 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 33218 | 80.6%
Common | 7967 | 19.3%
Hangul | 10 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 3936 | 11.8%
e | 2810 | 8.5%
n | 2290 | 6.9%
k | 2262 | 6.8%
u | 2045 | 6.2%
s | 2010 | 6.1%
o | 1992 | 6.0%
c | 1885 | 5.7%
m | 1744 | 5.3%
i | 1704 | 5.1%
Other values (39) | 10540 | 31.7%

Common
Value | Count | Frequency (%)
. | 3984 | 50.0%
@ | 2166 | 27.2%
2 | 246 | 3.1%
1 | 203 | 2.5%
3 | 193 | 2.4%
0 | 166 | 2.1%
4 | 154 | 1.9%
7 | 136 | 1.7%
- | 128 | 1.6%
8 | 114 | 1.4%
Other values (13) | 477 | 6.0%

Hangul
Value | Count | Frequency (%)
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 41185 | > 99.9%
Hangul | 10 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 3984 | 9.7%
a | 3936 | 9.6%
e | 2810 | 6.8%
n | 2290 | 5.6%
k | 2262 | 5.5%
@ | 2166 | 5.3%
u | 2045 | 5.0%
s | 2010 | 4.9%
o | 1992 | 4.8%
c | 1885 | 4.6%
Other values (62) | 15805 | 38.4%

Hangul
Value | Count | Frequency (%)
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%
| 1 | 10.0%

CODE_ORDER
Categorical

IMBALANCE 

Distinct: 33
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

0 | 1414
1 | 112
2 | 100
3 | 92
4 | 81
Other values (28) | 471

Length

Max length: 4
Median length: 1
Mean length: 1.1079295
Min length: 1

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 0
2nd row: 1
3rd row: 2
4th row: 3
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1414 | 62.3%
1 | 112 | 4.9%
2 | 100 | 4.4%
3 | 92 | 4.1%
4 | 81 | 3.6%
5 | 70 | 3.1%
6 | 63 | 2.8%
7 | 50 | 2.2%
8 | 38 | 1.7%
9 | 36 | 1.6%
Other values (23) | 214 | 9.4%

Length

Figure: histogram of lengths of the category.

EMAIL_OPEN
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 17
Missing (%): 0.7%
Memory size: 4.6 KiB

True | 1759
False | 494
(Missing) | 17

Value | Count | Frequency (%)
True | 1759 | 77.5%
False | 494 | 21.8%
(Missing) | 17 | 0.7%
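In the EMAIL_OPEN table the percentages use all 2270 observations as the denominator, missing values included (1759 / 2270 ≈ 77.5%). A small sketch that rebuilds such a frequency table from counts; None stands in for missing, which is an assumption of the sketch.

```python
from typing import Dict, Hashable, List, Tuple


def frequency_table(counts: Dict[Hashable, int], n_observations: int) -> List[Tuple[Hashable, int, float]]:
    """Rows of (value, count, % of all observations), most frequent first."""
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, count, round(100 * count / n_observations, 1)) for value, count in rows]


if __name__ == "__main__":
    # EMAIL_OPEN counts from the table above; None = (Missing)
    email_open = {True: 1759, False: 494, None: 17}
    for value, count, pct in frequency_table(email_open, 2270):
        label = "(Missing)" if value is None else value
        print(f"{label} | {count} | {pct}%")
```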

REGISTER_DT
Text

Distinct: 581
Distinct (%): 25.7%
Missing: 7
Missing (%): 0.3%
Memory size: 17.9 KiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 15841
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 518
Unique (%): 22.9%

Sample

1st row: 15:23.8
2nd row: 15:23.8
3rd row: 15:23.8
4th row: 15:23.8
5th row: 15:23.8

Words

Value | Count | Frequency (%)
15:23.8 | 1449 | 64.0%
32:34.7 | 13 | 0.6%
31:31.6 | 12 | 0.5%
52:04.9 | 12 | 0.5%
06:19.7 | 11 | 0.5%
51:27.6 | 11 | 0.5%
33:37.9 | 9 | 0.4%
06:37.7 | 8 | 0.4%
40:37.6 | 8 | 0.4%
28:40.6 | 8 | 0.4%
Other values (571) | 722 | 31.9%

Most occurring characters

Value | Count | Frequency (%)
: | 2263 | 14.3%
. | 2263 | 14.3%
3 | 1999 | 12.6%
5 | 1930 | 12.2%
1 | 1917 | 12.1%
2 | 1915 | 12.1%
8 | 1710 | 10.8%
4 | 503 | 3.2%
0 | 494 | 3.1%
7 | 303 | 1.9%
Other values (2) | 544 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 11315 | 71.4%
Other Punctuation | 4526 | 28.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 1999 | 17.7%
5 | 1930 | 17.1%
1 | 1917 | 16.9%
2 | 1915 | 16.9%
8 | 1710 | 15.1%
4 | 503 | 4.4%
0 | 494 | 4.4%
7 | 303 | 2.7%
6 | 278 | 2.5%
9 | 266 | 2.4%

Other Punctuation
Value | Count | Frequency (%)
: | 2263 | 50.0%
. | 2263 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 15841 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
: | 2263 | 14.3%
. | 2263 | 14.3%
3 | 1999 | 12.6%
5 | 1930 | 12.2%
1 | 1917 | 12.1%
2 | 1915 | 12.1%
8 | 1710 | 10.8%
4 | 503 | 3.2%
0 | 494 | 3.1%
7 | 303 | 1.9%
Other values (2) | 544 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 15841 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
: | 2263 | 14.3%
. | 2263 | 14.3%
3 | 1999 | 12.6%
5 | 1930 | 12.2%
1 | 1917 | 12.1%
2 | 1915 | 12.1%
8 | 1710 | 10.8%
4 | 503 | 3.2%
0 | 494 | 3.1%
7 | 303 | 1.9%
Other values (2) | 544 | 3.4%

MODIFY_DT
Text

MISSING 

Distinct: 110
Distinct (%): 5.3%
Missing: 210
Missing (%): 9.3%
Memory size: 17.9 KiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 14420
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): 1.1%

Sample

1st row: 43:05.1
2nd row: 43:05.1
3rd row: 43:05.1
4th row: 43:05.1
5th row: 00:00.0

Words

Value | Count | Frequency (%)
00:00.0 | 1473 | 71.5%
05:32.3 | 25 | 1.2%
50:47.0 | 19 | 0.9%
24:39.0 | 17 | 0.8%
53:29.6 | 15 | 0.7%
15:51.2 | 14 | 0.7%
29:57.7 | 13 | 0.6%
21:29.6 | 12 | 0.6%
27:17.9 | 12 | 0.6%
20:35.6 | 11 | 0.5%
Other values (100) | 449 | 21.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 7659 | 53.1%
: | 2060 | 14.3%
. | 2060 | 14.3%
2 | 500 | 3.5%
1 | 400 | 2.8%
5 | 394 | 2.7%
3 | 330 | 2.3%
4 | 248 | 1.7%
7 | 230 | 1.6%
6 | 220 | 1.5%
Other values (2) | 319 | 2.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 10300 | 71.4%
Other Punctuation | 4120 | 28.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 7659 | 74.4%
2 | 500 | 4.9%
1 | 400 | 3.9%
5 | 394 | 3.8%
3 | 330 | 3.2%
4 | 248 | 2.4%
7 | 230 | 2.2%
6 | 220 | 2.1%
9 | 215 | 2.1%
8 | 104 | 1.0%

Other Punctuation
Value | Count | Frequency (%)
: | 2060 | 50.0%
. | 2060 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 14420 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 7659 | 53.1%
: | 2060 | 14.3%
. | 2060 | 14.3%
2 | 500 | 3.5%
1 | 400 | 2.8%
5 | 394 | 2.7%
3 | 330 | 2.3%
4 | 248 | 1.7%
7 | 230 | 1.6%
6 | 220 | 1.5%
Other values (2) | 319 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 14420 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 7659 | 53.1%
: | 2060 | 14.3%
. | 2060 | 14.3%
2 | 500 | 3.5%
1 | 400 | 2.8%
5 | 394 | 2.7%
3 | 330 | 2.3%
4 | 248 | 1.7%
7 | 230 | 1.6%
6 | 220 | 1.5%
Other values (2) | 319 | 2.2%

MODIFIER
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

ksnet@aks.ac.kr | 1699
oic@aks.ac.kr | 350
<NA> | 210
goodday2me@gmail.com | 11

Length

Max length: 20
Median length: 15
Mean length: 13.698238
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ksnet@aks.ac.kr
2nd row: ksnet@aks.ac.kr
3rd row: ksnet@aks.ac.kr
4th row: ksnet@aks.ac.kr
5th row: <NA>

Common Values

Value | Count | Frequency (%)
ksnet@aks.ac.kr | 1699 | 74.8%
oic@aks.ac.kr | 350 | 15.4%
<NA> | 210 | 9.3%
goodday2me@gmail.com | 11 | 0.5%
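MODIFIER is reported with 0 missing values even though <NA> appears 210 times as a category, which suggests <NA> is stored here as literal text rather than as a true missing value (the 210 matches MODIFY_DT's missing count). If it should count as missing, a minimal sketch, assuming "<NA>" is the only placeholder string in use:

```python
from typing import Iterable, List, Optional

# Assumption: "<NA>" is the only placeholder string used for missing values
NA_PLACEHOLDERS = frozenset({"<NA>"})


def normalise_missing(values: Iterable[Optional[str]]) -> List[Optional[str]]:
    """Replace placeholder strings with None so they are counted as missing."""
    return [None if v is None or v in NA_PLACEHOLDERS else v for v in values]


def count_missing(values: Iterable[Optional[str]]) -> int:
    return sum(1 for v in values if v is None)


if __name__ == "__main__":
    modifier = ["ksnet@aks.ac.kr", "<NA>", "oic@aks.ac.kr", None]
    print(count_missing(modifier), count_missing(normalise_missing(modifier)))  # 1 2
```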

Length

Figure: histogram of lengths of the category.


ERASE_YN
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.1%
Missing: 7
Missing (%): 0.3%
Memory size: 4.6 KiB

False | 2002
True | 261
(Missing) | 7

Value | Count | Frequency (%)
False | 2002 | 88.2%
True | 261 | 11.5%
(Missing) | 7 | 0.3%

ERASE_DT
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 35
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

<NA> | 2042
00:00.0 | 108
08:33.2 | 16
54:38.4 | 13
06:21.1 | 10
Other values (30) | 81

Length

Max length: 7
Median length: 4
Mean length: 4.3013216
Min length: 4

Unique

Unique: 12
Unique (%): 0.5%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: 00:00.0

Common Values

Value | Count | Frequency (%)
<NA> | 2042 | 90.0%
00:00.0 | 108 | 4.8%
08:33.2 | 16 | 0.7%
54:38.4 | 13 | 0.6%
06:21.1 | 10 | 0.4%
57:29.0 | 8 | 0.4%
54:53.3 | 8 | 0.4%
31:04.2 | 7 | 0.3%
36:55.1 | 5 | 0.2%
17:49.7 | 4 | 0.2%
Other values (25) | 49 | 2.2%

Length

Figure: histogram of lengths of the category.

ERASER
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

<NA> | 2042
ksnet@aks.ac.kr | 162
goodday2me@gmail.com | 54
oic@aks.ac.kr | 12

Length

Max length: 20
Median length: 4
Mean length: 5.2132159
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: oic@aks.ac.kr

Common Values

Value | Count | Frequency (%)
<NA> | 2042 | 90.0%
ksnet@aks.ac.kr | 162 | 7.1%
goodday2me@gmail.com | 54 | 2.4%
oic@aks.ac.kr | 12 | 0.5%

Length

Figure: histogram of lengths of the category.


Correlations

Variable | CODE_ORDER | EMAIL_OPEN | MODIFIER | ERASE_YN | ERASE_DT | ERASER
CODE_ORDER | 1.000 | 0.046 | 0.401 | 0.000 | 0.735 | 0.466
EMAIL_OPEN | 0.046 | 1.000 | 0.049 | 0.000 | 0.397 | 0.163
MODIFIER | 0.401 | 0.049 | 1.000 | 0.092 | 0.813 | 0.344
ERASE_YN | 0.000 | 0.000 | 0.092 | 1.000 | NaN | NaN
ERASE_DT | 0.735 | 0.397 | 0.813 | NaN | 1.000 | 0.523
ERASER | 0.466 | 0.163 | 0.344 | NaN | 0.523 | 1.000

Variable | ERASER | ERASE_YN | CODE_ORDER | MODIFIER | EMAIL_OPEN | ERASE_DT
ERASER | 1.000 | 1.000 | 0.235 | 0.549 | 0.269 | 0.286
ERASE_YN | 1.000 | 1.000 | 0.000 | 0.152 | 0.000 | 1.000
CODE_ORDER | 0.235 | 0.000 | 1.000 | 0.220 | 0.039 | 0.275
MODIFIER | 0.549 | 0.152 | 0.220 | 1.000 | 0.082 | 0.622
EMAIL_OPEN | 0.269 | 0.000 | 0.039 | 0.082 | 1.000 | 0.312
ERASE_DT | 0.286 | 1.000 | 0.275 | 0.622 | 0.312 | 1.000

Variable | CODE_ORDER | EMAIL_OPEN | MODIFIER | ERASE_YN | ERASE_DT | ERASER
CODE_ORDER | 1.000 | 0.039 | 0.220 | 0.000 | 0.275 | 0.235
EMAIL_OPEN | 0.039 | 1.000 | 0.082 | 0.000 | 0.312 | 0.269
MODIFIER | 0.220 | 0.082 | 1.000 | 0.152 | 0.622 | 0.549
ERASE_YN | 0.000 | 0.000 | 0.152 | 1.000 | 1.000 | 1.000
ERASE_DT | 0.275 | 0.312 | 0.622 | 1.000 | 1.000 | 0.286
ERASER | 0.235 | 0.269 | 0.549 | 1.000 | 0.286 | 1.000
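The report does not label which measure each matrix shows. To our understanding, ydata-profiling's default ("auto") correlation uses Cramér's V for pairs of categorical variables, possibly with a bias correction. The sketch below computes the textbook, uncorrected Cramér's V, sqrt(chi2 / (n * (min(r, k) - 1))), from an r x k contingency table of counts, so it is not expected to reproduce the values above exactly.

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Plain (bias-uncorrected) Cramér's V for an r x k contingency table of counts."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    n = sum(sum(row) for row in table)
    if n == 0 or min(n_rows, n_cols) < 2:
        return float("nan")  # undefined without data or with a single category
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # count expected under independence
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


if __name__ == "__main__":
    print(cramers_v([[10, 0], [0, 10]]))  # perfectly associated: 1.0
    print(cramers_v([[5, 5], [5, 5]]))    # independent: 0.0
```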

Missing values

Figure: a simple visualization of nullity by column.
Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion visually.
Figure: nullity correlation heatmap, measuring how strongly the presence or absence of one variable affects the presence of another.
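Nullity correlation is commonly computed as the Pearson correlation between the 0/1 missingness indicators of two columns (to our understanding this is how the missingno library builds its heatmap). A standard-library sketch, assuming None marks a missing cell:

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[object]], b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the 0/1 missingness indicators of two columns."""
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    x = [1.0 if v is None else 0.0 for v in a]  # 1.0 = missing
    y = [1.0 if v is None else 0.0 for v in b]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        # a column that is always (or never) missing has no defined correlation
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    modify_dt = [None, None, "43:05.1", "00:00.0"]  # hypothetical values
    modifier = [None, None, "ksnet@aks.ac.kr", "oic@aks.ac.kr"]
    print(nullity_correlation(modify_dt, modifier))
```

Missing in exactly the same rows gives 1.0; independent missingness gives values near 0.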

Sample

First rows

Row | INSTITUTION_FACULTY_ID | INSTITUTION_NUM | PROFILE_ID | MEMBER_NUM | MEMBER_ID | CODE_ORDER | EMAIL_OPEN | REGISTER_DT | MODIFY_DT | MODIFIER | ERASE_YN | ERASE_DT | ERASER
0 | 1538 | 100001 | 1000001385 | <NA> | pori.park@asu.edu | 0 | Y | 15:23.8 | 43:05.1 | ksnet@aks.ac.kr | N | <NA> | <NA>
1 | 1539 | 100001 | 1000001385 | 101146 | youngoh@asu.edu | 1 | Y | 15:23.8 | 43:05.1 | ksnet@aks.ac.kr | N | <NA> | <NA>
2 | 1540 | 100001 | 1000001385 | <NA> | yysys@uchicago.edu | 2 | Y | 15:23.8 | 43:05.1 | ksnet@aks.ac.kr | N | <NA> | <NA>
3 | 1541 | 100001 | 1000001385 | <NA> | Cypark@asu.edu | 3 | Y | 15:23.8 | 43:05.1 | ksnet@aks.ac.kr | N | <NA> | <NA>
4 | 1543 | 100002 | 1000001386 | 100004 | garethmc@bu.edu | 0 | Y | 15:23.8 | <NA> | <NA> | Y | 00:00.0 | oic@aks.ac.kr
5 | 1544 | 100003 | 1000001387 | 100005 | hye-sook_wang@brown.edu | 0 | Y | 15:23.8 | 00:00.0 | ksnet@aks.ac.kr | N | <NA> | <NA>
6 | 1545 | 100003 | 1000001387 | <NA> | Samuel_Perry@brown.edu | 0 | Y | 15:23.8 | 00:00.0 | ksnet@aks.ac.kr | N | <NA> | <NA>
7 | 1546 | 100003 | 1000001387 | <NA> | James_McClain@brown.edu | 0 | Y | 15:23.8 | 00:00.0 | ksnet@aks.ac.kr | N | <NA> | <NA>
8 | 1548 | 100004 | 1000001388 | 100968 | jkh25@columbia.edu | 0 | Y | 15:23.8 | 36:02.6 | ksnet@aks.ac.kr | N | <NA> | <NA>
9 | 1549 | 100004 | 1000001388 | <NA> | bl355@columbia.edu | 1 | Y | 15:23.8 | 36:02.6 | ksnet@aks.ac.kr | N | <NA> | <NA>

Last rows

Row | INSTITUTION_FACULTY_ID | INSTITUTION_NUM | PROFILE_ID | MEMBER_NUM | MEMBER_ID | CODE_ORDER | EMAIL_OPEN | REGISTER_DT | MODIFY_DT | MODIFIER | ERASE_YN | ERASE_DT | ERASER
2260 | 5016 | 100328 | 1000002222 | <NA> | ksnet@aks.ac.kr | 7 | Y | 31:08.5 | 33:25.3 | ksnet@aks.ac.kr | N | <NA> | <NA>
2261 | 5017 | 100328 | 1000002222 | <NA> | ksnet@aks.ac.kr | 8 | Y | 31:08.5 | 33:25.3 | ksnet@aks.ac.kr | N | <NA> | <NA>
2262 | 5018 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 0 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2263 | 5019 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 1 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2264 | 5020 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 2 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2265 | 5021 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 3 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2266 | 5022 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 4 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2267 | 5023 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 5 | Y | 04:36.6 | <NA> | <NA> | N | <NA> | <NA>
2268 | 5024 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 6 | Y | 04:36.7 | <NA> | <NA> | N | <NA> | <NA>
2269 | 5025 | 100653 | 1000002747 | <NA> | ksnet@aks.ac.kr | 7 | Y | 04:36.7 | <NA> | <NA> | N | <NA> | <NA>

Duplicate rows

Most frequently occurring

Row | INSTITUTION_FACULTY_ID | INSTITUTION_NUM | PROFILE_ID | MEMBER_NUM | MEMBER_ID | CODE_ORDER | EMAIL_OPEN | REGISTER_DT | MODIFY_DT | MODIFIER | ERASE_YN | ERASE_DT | ERASER | # duplicates
0 | 문법론(Grammar) | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
1 | 번역학(Translation Studies) | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
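The duplicate table lists each repeated row pattern together with its number of occurrences (the # duplicates column). A standard-library sketch, assuming rows are tuples and missing cells are None, so two missing cells compare equal; the example rows are hypothetical and shortened to three columns.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Row = Tuple[Hashable, ...]


def duplicate_groups(rows: Iterable[Row]) -> Dict[Row, int]:
    """Map every row pattern that occurs more than once to its number of occurrences."""
    counts = Counter(rows)  # None == None, so rows with missing cells in the same places match
    return {row: n for row, n in counts.items() if n > 1}


if __name__ == "__main__":
    rows = [
        ("문법론(Grammar)", None, None),
        ("문법론(Grammar)", None, None),
        ("1538", "100001", "1000001385"),
    ]
    for row, n in duplicate_groups(rows).items():
        print(row, n)
```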