Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 192
Duplicate rows (%): 1.9%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B

Variable types

Text: 2
Categorical: 1

Dataset

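Description: Provides information on the countries of interest of KnowTBT portal members. Continent and country-name information is provided first, and member IDs have been de-identified.
URL: https://www.data.go.kr/data/15068826/fileData.do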

Alerts

Dataset has 192 (1.9%) duplicate rows [Duplicates]
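
The percentage in the alert follows from the counts in the overview: 192 / 10000 = 1.92%, shown as 1.9%. This section does not state how the duplicate count itself is defined, so the following is only a minimal sketch, in plain Python, of two common definitions: the number of distinct records that occur more than once, and the number of extra copies beyond the first occurrence. The function name `duplicate_summary` and the example rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def duplicate_summary(rows):
    """Summarise exact duplicate rows in a list of hashable records."""
    counts = Counter(rows)
    # Groups: distinct records that occur more than once.
    groups = {row: n for row, n in counts.items() if n > 1}
    # Extra copies: occurrences beyond the first one in each group.
    extra_copies = sum(n - 1 for n in groups.values())
    return {
        "duplicate_groups": len(groups),
        "extra_copies": extra_copies,
        "extra_copies_pct": 100 * extra_copies / len(rows) if rows else 0.0,
    }


# Illustrative rows only (hypothetical, not from the dataset).
rows = [
    ("a****1", "유럽", "독일"),
    ("a****1", "유럽", "독일"),
    ("a****1", "유럽", "독일"),
    ("b****2", "아시아", "일본"),
    ("b****2", "아시아", "일본"),
    ("c****3", "중동", "요르단"),
]
summary = duplicate_summary(rows)
print(summary)
```

Which of the two numbers corresponds to the figure in the alert would need to be checked against the profiling tool's documentation.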

Reproduction

Analysis started: 2023-12-12 08:11:58.135795
Analysis finished: 2023-12-12 08:11:58.578149
Duration: 0.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
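
A report of this kind can in principle be regenerated from the source file. The sketch below assumes the dataset has been downloaded as a CSV file and that pandas and ydata-profiling are installed. The helper name `build_report`, the file names, the report title, and the default encoding are all assumptions for illustration; the configuration actually used for this report is the one in config.json and is not reproduced here.

```python
def build_report(csv_path, output_html="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report (sketch)."""
    # Imported lazily so the helper can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the actual file.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Profiling report")
    report.to_file(output_html)
    return report
```

A call such as `build_report("interest_countries.csv")` (a hypothetical file name) would write `report.html` next to the script.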

Variables

회원아이디 (member ID)
Text

Distinct: 892
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 17
Mean length: 7.3653
Min length: 5
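
The mean length agrees with the character total reported for this variable: 73653 characters / 10000 observations = 7.3653. As a minimal sketch of how such length statistics can be computed, the following uses Python's `len`, which counts Unicode code points; whether the profiling tool counts lengths the same way is an assumption. The helper name `length_stats` is hypothetical, and the input is the five sample values shown in this report, so the results differ from the full-dataset figures.

```python
from statistics import mean, median


def length_stats(values):
    """Return max, median, mean and min string length of the values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# The five sample member IDs listed in the report.
sample_ids = ["f****n", "k****gs2", "n****w", "e****do", "j****k82"]
stats = length_stats(sample_ids)
print(stats)
```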

Characters and Unicode

Total characters: 73653
Distinct characters: 57
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
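
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point as a two-letter code: `Po` is Other Punctuation, `Ll` Lowercase Letter, `Lu` Uppercase Letter, `Nd` Decimal Number, and `Pd` Dash Punctuation. The sketch below counts categories over the five sample values from this report; the helper name `category_counts` is hypothetical. The standard module does not expose scripts or blocks, so those two properties are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the values."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


# The five sample member IDs listed in the report.
sample_ids = ["f****n", "k****gs2", "n****w", "e****do", "j****k82"]
counts = category_counts(sample_ids)
print(counts)
```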

Unique

Unique: 212
Unique (%): 2.1%

Sample

1st row: f****n
2nd row: k****gs2
3rd row: n****w
4th row: e****do
5th row: j****k82
Value                 Count   Frequency (%)
k                     152     1.5%
t                     86      0.9%
j                     72      0.7%
y****s                65      0.7%
m                     65      0.7%
s                     59      0.6%
p****s                56      0.6%
j****635              56      0.6%
h                     54      0.5%
r                     53      0.5%
Other values (878)    9282    92.8%

Most occurring characters

Value                 Count   Frequency (%)
*                     40000   54.3%
s                     2055    2.8%
0                     1791    2.4%
e                     1777    2.4%
n                     1676    2.3%
a                     1499    2.0%
k                     1489    2.0%
1                     1435    1.9%
2                     1358    1.8%
r                     1254    1.7%
Other values (47)     19319   26.2%
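
Each frequency in this table is the character's count divided by the total number of characters; for example, 40000 / 73653 ≈ 54.3% for the asterisk. The count of 40000 is also consistent with four mask characters in each of the 10000 IDs, as in the sample rows. A minimal sketch of the same calculation over the five sample values follows; the helper name `char_frequencies` is hypothetical.

```python
from collections import Counter


def char_frequencies(values, top=3):
    """Return (character, count, percentage) for the most common characters."""
    counts = Counter(ch for v in values for ch in v)
    total = sum(counts.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common(top)]


# The five sample member IDs listed in the report.
sample_ids = ["f****n", "k****gs2", "n****w", "e****do", "j****k82"]
top_chars = char_frequencies(sample_ids, top=1)
print(top_chars)
```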

Most occurring categories

Value                 Count   Frequency (%)
Other Punctuation     40326   54.8%
Lowercase Letter      22782   30.9%
Decimal Number        10258   13.9%
Uppercase Letter      284     0.4%
Dash Punctuation      3       < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count   Frequency (%)
s                     2055    9.0%
e                     1777    7.8%
n                     1676    7.4%
a                     1499    6.6%
k                     1489    6.5%
r                     1254    5.5%
o                     1164    5.1%
g                     1115    4.9%
j                     1102    4.8%
i                     1079    4.7%
Other values (16)     8572    37.6%
Uppercase Letter
Value                 Count   Frequency (%)
D                     61      21.5%
K                     50      17.6%
G                     35      12.3%
O                     28      9.9%
F                     22      7.7%
A                     22      7.7%
S                     14      4.9%
T                     13      4.6%
L                     11      3.9%
E                     7       2.5%
Other values (7)      21      7.4%
Decimal Number
Value                 Count   Frequency (%)
0                     1791    17.5%
1                     1435    14.0%
2                     1358    13.2%
9                     1170    11.4%
7                     952     9.3%
8                     923     9.0%
3                     717     7.0%
4                     687     6.7%
5                     651     6.3%
6                     574     5.6%
Other Punctuation
Value                 Count   Frequency (%)
*                     40000   99.2%
.                     211     0.5%
@                     115     0.3%
Dash Punctuation
Value                 Count   Frequency (%)
-                     3       100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Common                50587   68.7%
Latin                 23066   31.3%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
s                     2055    8.9%
e                     1777    7.7%
n                     1676    7.3%
a                     1499    6.5%
k                     1489    6.5%
r                     1254    5.4%
o                     1164    5.0%
g                     1115    4.8%
j                     1102    4.8%
i                     1079    4.7%
Other values (33)     8856    38.4%
Common
Value                 Count   Frequency (%)
*                     40000   79.1%
0                     1791    3.5%
1                     1435    2.8%
2                     1358    2.7%
9                     1170    2.3%
7                     952     1.9%
8                     923     1.8%
3                     717     1.4%
4                     687     1.4%
5                     651     1.3%
Other values (4)      903     1.8%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 73653   100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
*                     40000   54.3%
s                     2055    2.8%
0                     1791    2.4%
e                     1777    2.4%
n                     1676    2.3%
a                     1499    2.0%
k                     1489    2.0%
1                     1435    1.9%
2                     1358    1.8%
r                     1254    1.7%
Other values (47)     19319   26.2%

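대륙이름 (continent name)
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

유럽 (Europe): 3492
아시아 (Asia): 2222
중남미 (Central and South America): 1616
아프리카 (Africa): 1058
중동 (Middle East): 854
Other values (3): 758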

Length

Max length: 5
Median length: 4
Mean length: 2.7035
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 아시아 (Asia)
2nd row: 유럽 (Europe)
3rd row: 오세아니아 (Oceania)
4th row: 중남미 (Central and South America)
5th row: 아시아 (Asia)

Common Values

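Value  Count  Frequency (%)
유럽 (Europe)  3492  34.9%
아시아 (Asia)  2222  22.2%
중남미 (Central and South America)  1616  16.2%
아프리카 (Africa)  1058  10.6%
중동 (Middle East)  854  8.5%
북미 (North America)  393  3.9%
오세아니아 (Oceania)  358  3.6%
전세계 (Worldwide)  7  0.1%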

Length

[Figure] Histogram of lengths of the category

Common Values (Plot)

[Figure] Plot of the common values

국가이름 (country name)
Text

Distinct: 237
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 11
Mean length: 3.7603
Min length: 1

Characters and Unicode

Total characters: 37603
Distinct characters: 220
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: 일본 (Japan)
2nd row: 스웨덴 (Sweden)
3rd row: 마우리티우스 (Mauritius)
4th row: 과테말라 (Guatemala)
5th row: 인도 (India)
Value  Count  Frequency (%)
미국 (United States)  223  2.2%
중국 (China)  207  2.0%
일본 (Japan)  157  1.5%
독일 (Germany)  147  1.4%
한국 (South Korea)  145  1.4%
영국 (United Kingdom)  144  1.4%
캐나다 (Canada)  136  1.3%
인도 (India)  132  1.3%
프랑스 (France)  122  1.2%
사우디아라비아 (Saudi Arabia)  117  1.1%
Other values (237)  8675  85.0%

Most occurring characters

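Value                 Count   Frequency (%)
(character lost)      2814    7.5%
(character lost)      1569    4.2%
(character lost)      1297    3.4%
(character lost)      1153    3.1%
(character lost)      1083    2.9%
(character lost)      1014    2.7%
(character lost)      917     2.4%
(character lost)      902     2.4%
(character lost)      817     2.2%
(character lost)      757     2.0%
Other values (210)    25280   67.2%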

Most occurring categories

Value                 Count   Frequency (%)
Other Letter          37345   99.3%
Space Separator       205     0.5%
Dash Punctuation      53      0.1%

Most frequent character per category

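Other Letter
Value                 Count   Frequency (%)
(character lost)      2814    7.5%
(character lost)      1569    4.2%
(character lost)      1297    3.5%
(character lost)      1153    3.1%
(character lost)      1083    2.9%
(character lost)      1014    2.7%
(character lost)      917     2.5%
(character lost)      902     2.4%
(character lost)      817     2.2%
(character lost)      757     2.0%
Other values (208)    25022   67.0%
Space Separator
Value                 Count   Frequency (%)
(space)               205     100.0%
Dash Punctuation
Value                 Count   Frequency (%)
-                     53      100.0%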

Most occurring scripts

Value                 Count   Frequency (%)
Hangul                37345   99.3%
Common                258     0.7%

Most frequent character per script

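Hangul
Value                 Count   Frequency (%)
(character lost)      2814    7.5%
(character lost)      1569    4.2%
(character lost)      1297    3.5%
(character lost)      1153    3.1%
(character lost)      1083    2.9%
(character lost)      1014    2.7%
(character lost)      917     2.5%
(character lost)      902     2.4%
(character lost)      817     2.2%
(character lost)      757     2.0%
Other values (208)    25022   67.0%
Common
Value                 Count   Frequency (%)
(space)               205     79.5%
-                     53      20.5%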

Most occurring blocks

Value                 Count   Frequency (%)
Hangul                37345   99.3%
ASCII                 258     0.7%

Most frequent character per block

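Hangul
Value                 Count   Frequency (%)
(character lost)      2814    7.5%
(character lost)      1569    4.2%
(character lost)      1297    3.5%
(character lost)      1153    3.1%
(character lost)      1083    2.9%
(character lost)      1014    2.7%
(character lost)      917     2.5%
(character lost)      902     2.4%
(character lost)      817     2.2%
(character lost)      757     2.0%
Other values (208)    25022   67.0%
ASCII
Value                 Count   Frequency (%)
(space)               205     79.5%
-                     53      20.5%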

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
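
The nullity figures summarise, per column, how many cells are missing; for this dataset the overview reports 0 missing cells. A minimal sketch of the underlying count in plain Python follows. It treats only `None` as missing, which is an assumption about what counts as a missing cell. The helper name `missing_by_column` is hypothetical, the first two rows are sample rows from this report, and the third row is invented to show a non-zero count.

```python
def missing_by_column(rows, columns):
    """Count cells that are None in each column of a list of dict rows."""
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }


columns = ["회원아이디", "대륙이름", "국가이름"]
rows = [
    {"회원아이디": "f****n", "대륙이름": "아시아", "국가이름": "일본"},
    {"회원아이디": "k****gs2", "대륙이름": "유럽", "국가이름": "스웨덴"},
    # Hypothetical row with a missing continent, for illustration only.
    {"회원아이디": "x****", "대륙이름": None, "국가이름": "인도"},
]
missing = missing_by_column(rows, columns)
print(missing)
```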

Sample

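(index)  회원아이디 (member ID)  대륙이름 (continent name)  국가이름 (country name)
9450  f****n  아시아 (Asia)  일본 (Japan)
22079  k****gs2  유럽 (Europe)  스웨덴 (Sweden)
27551  n****w  오세아니아 (Oceania)  마우리티우스 (Mauritius)
8467  e****do  중남미 (Central and South America)  과테말라 (Guatemala)
18178  j****k82  아시아 (Asia)  인도 (India)
35223  s****w  유럽 (Europe)  헝가리 (Hungary)
11690  g****er  유럽 (Europe)  네덜란드 (Netherlands)
36128  s****  유럽 (Europe)  프랑스 (France)
32562  r****  중동 (Middle East)  사우디아라비아 (Saudi Arabia)
16717  j****b  아시아 (Asia)  태국 (Thailand)

(index)  회원아이디 (member ID)  대륙이름 (continent name)  국가이름 (country name)
28758  o****7  중동 (Middle East)  사우디아라비아 (Saudi Arabia)
8547  e****n9021  유럽 (Europe)  폴란드 (Poland)
15898  i****@kaeri.re.kr  중남미 (Central and South America)  아르헨티나 (Argentina)
29151  o****598  오세아니아 (Oceania)  뉴질랜드 (New Zealand)
5150  c****r  유럽 (Europe)  벨기에 (Belgium)
23666  l****39  유럽 (Europe)  아일랜드 (Ireland)
35182  s****w  중동 (Middle East)  요르단 (Jordan)
5351  c****eed  아시아 (Asia)  베트남 (Vietnam)
37755  s****k  유럽 (Europe)  마케도니아 공화국 (Republic of Macedonia)
22012  k****1  중동 (Middle East)  쿠웨이트 (Kuwait)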

Duplicate rows

Most frequently occurring

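(index)  회원아이디 (member ID)  대륙이름 (continent name)  국가이름 (country name)  # duplicates
12  e****6  아프리카 (Africa)  감비아 (Gambia)  7
13  e****6  유럽 (Europe)  건지 (Guernsey)  7
17  e****6  중남미 (Central and South America)  가이아나 (Guyana)  6
137  s****  아시아 (Asia)  중국 (China)  5
11  e****6  북미 (North America)  (not shown)  4
35  j****  중남미 (Central and South America)  브라질 (Brazil)  4
50  k****  아시아 (Asia)  말레이시아 (Malaysia)  4
57  k****  아시아 (Asia)  일본 (Japan)  4
66  k****  유럽 (Europe)  라트비아 (Latvia)  4
71  k****  유럽 (Europe)  스위스 (Switzerland)  4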