Overview

Dataset statistics

Number of variables: 6
Number of observations: 681
Missing cells: 621
Missing cells (%): 15.2%
Duplicate rows: 25
Duplicate rows (%): 3.7%
Total size in memory: 32.7 KiB
Average record size in memory: 49.2 B
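
The percentages and the average record size can be re-derived from the raw counts. The sketch below assumes the missing-cell share is taken over all cells (variables × observations) and the duplicate share over all observations; those denominators are an assumption about how the profiler computes them, not something the report states.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 6
n_observations = 681
missing_cells = 621
duplicate_rows = 25
total_size_kib = 32.7

# Assumed denominator: every cell of the table.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three derived figures round to the values listed in the statistics block.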

Variable types

Text: 4
Categorical: 2

Dataset

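Description: Provides information on Korean public and private organizations operating overseas as of December 31, 2020, including the organization type and public-institution code.
Author: Ministry of Foreign Affairs (외교부)
URL: https://www.data.go.kr/data/15076565/fileData.do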

Alerts

기준년도 has constant value "2020" (Constant)
Dataset has 25 (3.7%) duplicate rows (Duplicates)
공공기관진출내용 has 617 (90.6%) missing values (Missing)
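
The Constant and Missing alerts correspond to simple per-column checks. The following is a minimal sketch in plain Python, not the profiler's own implementation; the helper names `constant_columns` and `missing_share` are hypothetical, missing cells are assumed to be represented as `None`, and the four rows are taken from the Sample table of this report with only some of the columns.

```python
def constant_columns(rows, columns):
    """Columns that hold exactly one distinct value and no missing cells."""
    return [
        col for col in columns
        if None not in [row[col] for row in rows]
        and len({row[col] for row in rows}) == 1
    ]


def missing_share(rows, columns):
    """Fraction of missing (None) cells per column."""
    return {
        col: sum(row[col] is None for row in rows) / len(rows)
        for col in columns
    }


columns = ["국가", "공공기관명", "공공기관진출내용", "기준년도"]
rows = [
    {"국가": "가봉", "공공기관명": "KT", "공공기관진출내용": None, "기준년도": "2020"},
    {"국가": "가봉", "공공기관명": "대사관", "공공기관진출내용": None, "기준년도": "2020"},
    {"국가": "과테말라", "공공기관명": "KOICA", "공공기관진출내용": None, "기준년도": "2020"},
    {"국가": "호주", "공공기관명": "KOTRA", "공공기관진출내용": "시드니, 멜버른", "기준년도": "2020"},
]

print(constant_columns(rows, columns))
print(missing_share(rows, columns)["공공기관진출내용"])
```

On the full dataset the same ratio for 공공기관진출내용 would be 617 / 681, which rounds to the 90.6% shown in the alert.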

Reproduction

Analysis started: 2023-12-12 05:31:48.472695
Analysis finished: 2023-12-12 05:31:49.378628
Duration: 0.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
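
The reported duration follows from the two timestamps. A small sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction block.
started = datetime.strptime("2023-12-12 05:31:48.472695", FMT)
finished = datetime.strptime("2023-12-12 05:31:49.378628", FMT)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```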

Variables

국가
Text

Distinct: 95
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Length

Max length: 8
Median length: 7
Mean length: 3.4581498
Min length: 2

Characters and Unicode

Total characters: 2355
Distinct characters: 125
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
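
As an illustration of the category property, the sketch below counts characters per Unicode general category with Python's standard `unicodedata` module. It is not the profiler's code; the helper name `category_counts` is hypothetical, and the mapping covers only the category names that appear in this report. The standard library does not expose the script or block properties, so those are left out.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sc": "Currency Symbol",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Example values taken from the Sample rows of this report.
sample = ["가봉", "KTencore", "약 170개 업체 진출 (주로 섬유, 봉제업)"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter", which is why a column written entirely in Hangul, such as 국가, reports a single distinct category.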

Unique

Unique: 16
Unique (%): 2.3%

Sample

1st row: 가봉
2nd row: 가봉
3rd row: 가봉
4th row: 가봉
5th row: 과테말라

Value | Count | Frequency (%)
칠레 | 24 | 3.5%
인도네시아 | 22 | 3.2%
프랑스 | 21 | 3.1%
파나마 | 20 | 2.9%
스페인 | 20 | 2.9%
요르단 | 19 | 2.8%
몽골 | 18 | 2.6%
멕시코 | 17 | 2.5%
캐나다 | 16 | 2.3%
벨기에 | 16 | 2.3%
Other values (85) | 488 | 71.7%

Most occurring characters

ValueCountFrequency (%)
148
 
6.3%
105
 
4.5%
101
 
4.3%
83
 
3.5%
69
 
2.9%
66
 
2.8%
62
 
2.6%
59
 
2.5%
52
 
2.2%
48
 
2.0%
Other values (115) 1562
66.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2355
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2355
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2355
100.0%


국가코드(ISO 2자리 코드)
Text

Distinct: 95
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1362
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 2.3%

Sample

1st row: GA
2nd row: GA
3rd row: GA
4th row: GA
5th row: GT

Value | Count | Frequency (%)
cl | 24 | 3.5%
id | 22 | 3.2%
fr | 21 | 3.1%
pa | 20 | 2.9%
es | 20 | 2.9%
jo | 19 | 2.8%
mn | 18 | 2.6%
mx | 17 | 2.5%
ca | 16 | 2.3%
be | 16 | 2.3%
Other values (85) | 488 | 71.7%

Most occurring characters

Value | Count | Frequency (%)
E | 122 | 9.0%
A | 110 | 8.1%
P | 87 | 6.4%
N | 83 | 6.1%
R | 77 | 5.7%
I | 74 | 5.4%
S | 72 | 5.3%
M | 65 | 4.8%
C | 64 | 4.7%
T | 61 | 4.5%
Other values (16) | 547 | 40.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1362
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1362
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1362
100.0%


공공기관유형
Categorical

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

일반기관: 431
정부투자기관: 129
정부기관: 121

Length

Max length: 6
Median length: 4
Mean length: 4.3788546
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반기관
2nd row: 일반기관
3rd row: 정부기관
4th row: 정부기관
5th row: 일반기관

Common Values

Value | Count | Frequency (%)
일반기관 | 431 | 63.3%
정부투자기관 | 129 | 18.9%
정부기관 | 121 | 17.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
일반기관 | 431 | 63.3%
정부투자기관 | 129 | 18.9%
정부기관 | 121 | 17.8%

공공기관명
Text

Distinct: 310
Distinct (%): 45.8%
Missing: 4
Missing (%): 0.6%
Memory size: 5.4 KiB

Length

Max length: 28
Median length: 22
Mean length: 5.1875923
Min length: 2

Characters and Unicode

Total characters: 3512
Distinct characters: 304
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 241
Unique (%): 35.6%

Sample

1st row: KT
2nd row: KTencore
3rd row: FTC
4th row: 대사관
5th row: KOICA

Value | Count | Frequency (%)
대사관 | 70 | 8.8%
kotra | 48 | 6.0%
삼성전자 | 34 | 4.3%
lg전자 | 30 | 3.8%
삼성 | 13 | 1.6%
koica | 12 | 1.5%
현대자동차 | 11 | 1.4%
대우 | 10 | 1.3%
업체 | 10 | 1.3%
한국관광공사 | 10 | 1.3%
Other values (342) | 550 | 68.9%

Most occurring characters

ValueCountFrequency (%)
171
 
4.9%
163
 
4.6%
123
 
3.5%
108
 
3.1%
102
 
2.9%
101
 
2.9%
K 88
 
2.5%
87
 
2.5%
85
 
2.4%
O 78
 
2.2%
Other values (294) 2406
68.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2604 | 74.1%
Uppercase Letter | 588 | 16.7%
Space Separator | 123 | 3.5%
Decimal Number | 72 | 2.1%
Lowercase Letter | 66 | 1.9%
Close Punctuation | 24 | 0.7%
Open Punctuation | 24 | 0.7%
Other Punctuation | 8 | 0.2%
Dash Punctuation | 2 | 0.1%
Other Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
171
 
6.6%
163
 
6.3%
108
 
4.1%
102
 
3.9%
101
 
3.9%
87
 
3.3%
85
 
3.3%
77
 
3.0%
70
 
2.7%
62
 
2.4%
Other values (240) 1578
60.6%
Uppercase Letter
ValueCountFrequency (%)
K 88
15.0%
O 78
13.3%
A 70
11.9%
T 65
11.1%
G 56
9.5%
R 52
8.8%
L 50
8.5%
S 35
 
6.0%
C 25
 
4.3%
I 20
 
3.4%
Other values (11) 49
8.3%
Lowercase Letter
ValueCountFrequency (%)
e 13
19.7%
o 8
12.1%
r 8
12.1%
n 8
12.1%
a 7
10.6%
t 6
9.1%
c 5
 
7.6%
d 2
 
3.0%
s 2
 
3.0%
l 2
 
3.0%
Other values (5) 5
 
7.6%
Decimal Number
ValueCountFrequency (%)
0 25
34.7%
1 14
19.4%
2 10
 
13.9%
3 6
 
8.3%
5 4
 
5.6%
4 4
 
5.6%
7 3
 
4.2%
9 3
 
4.2%
6 2
 
2.8%
8 1
 
1.4%
Other Punctuation
ValueCountFrequency (%)
, 5
62.5%
& 2
 
25.0%
1
 
12.5%
Space Separator
ValueCountFrequency (%)
123
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2599 | 74.0%
Latin | 654 | 18.6%
Common | 253 | 7.2%
Han | 6 | 0.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
171
 
6.6%
163
 
6.3%
108
 
4.2%
102
 
3.9%
101
 
3.9%
87
 
3.3%
85
 
3.3%
77
 
3.0%
70
 
2.7%
62
 
2.4%
Other values (235) 1573
60.5%
Latin
ValueCountFrequency (%)
K 88
13.5%
O 78
11.9%
A 70
10.7%
T 65
9.9%
G 56
8.6%
R 52
8.0%
L 50
7.6%
S 35
 
5.4%
C 25
 
3.8%
I 20
 
3.1%
Other values (26) 115
17.6%
Common
ValueCountFrequency (%)
123
48.6%
0 25
 
9.9%
) 24
 
9.5%
( 24
 
9.5%
1 14
 
5.5%
2 10
 
4.0%
3 6
 
2.4%
, 5
 
2.0%
5 4
 
1.6%
4 4
 
1.6%
Other values (7) 14
 
5.5%
Han
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
貿 1
16.7%
1
16.7%
1
16.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2598 | 74.0%
ASCII | 906 | 25.8%
CJK | 6 | 0.2%
None | 2 | 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
171
 
6.6%
163
 
6.3%
108
 
4.2%
102
 
3.9%
101
 
3.9%
87
 
3.3%
85
 
3.3%
77
 
3.0%
70
 
2.7%
62
 
2.4%
Other values (234) 1572
60.5%
ASCII
ValueCountFrequency (%)
123
13.6%
K 88
 
9.7%
O 78
 
8.6%
A 70
 
7.7%
T 65
 
7.2%
G 56
 
6.2%
R 52
 
5.7%
L 50
 
5.5%
S 35
 
3.9%
0 25
 
2.8%
Other values (42) 264
29.1%
CJK
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
貿 1
16.7%
1
16.7%
1
16.7%
None
ValueCountFrequency (%)
1
50.0%
1
50.0%

공공기관진출내용
Text

Distinct: 57
Distinct (%): 89.1%
Missing: 617
Missing (%): 90.6%
Memory size: 5.4 KiB

Length

Max length: 115
Median length: 35
Mean length: 16.09375
Min length: 2
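
The length statistics of the four text variables can be cross-checked against each other. The sketch below makes two checks on the figures printed in this report: that the mean length equals total characters divided by the number of non-missing values, and that the reported median is compatible with the reported mean. The second check relies on a general bound: at least half of the values are at or above the median and all are at or above the minimum, so the mean cannot be lower than (min + median) / 2. The helper name `check_lengths` is hypothetical, and non-missing counts are assumed to be 681 minus the reported missing count.

```python
# (min, median, mean, total characters, non-missing values) as reported.
reported = {
    "국가": (2, 7, 3.4581498, 2355, 681),
    "국가코드(ISO 2자리 코드)": (2, 2, 2.0, 1362, 681),
    "공공기관명": (2, 22, 5.1875923, 3512, 681 - 4),
    "공공기관진출내용": (2, 35, 16.09375, 1030, 681 - 617),
}


def check_lengths(stats):
    """Return (mean_matches, median_plausible) for one variable."""
    lo, median, mean, total_chars, n = stats
    mean_matches = abs(total_chars / n - mean) < 1e-6
    # A true median m with minimum lo forces mean >= (lo + m) / 2.
    median_plausible = mean >= (lo + median) / 2
    return mean_matches, median_plausible


for name, stats in reported.items():
    print(name, check_lengths(stats))
```

The means agree with the character totals for all four variables, while the median figures for 국가, 공공기관명, and 공공기관진출내용 fall outside the bound. This suggests those "Median length" values may not be medians of the string lengths and are best read with caution; the sketch says nothing about how the profiler arrives at them.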

Characters and Unicode

Total characters: 1030
Distinct characters: 253
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 53
Unique (%): 82.8%

Sample

1st row: 약 170개 업체 진출 (주로 섬유, 봉제업)
2nd row: 대부분 섬유, 봉제업체
3rd row: 도매 및 소매업, 건설공사업, 제조업
4th row: Lagos 소재
5th row: 섬유 ·의류 업체 및 협력업체 진출
ValueCountFrequency (%)
주로 7
 
2.9%
6
 
2.5%
4
 
1.7%
섬유 4
 
1.7%
가전3사 3
 
1.2%
의류 3
 
1.2%
3
 
1.2%
진출 3
 
1.2%
kotra 3
 
1.2%
철수 3
 
1.2%
Other values (187) 202
83.8%
2023-12-12T14:31:52.517277image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/

Most occurring characters

ValueCountFrequency (%)
177
 
17.2%
, 67
 
6.5%
24
 
2.3%
24
 
2.3%
16
 
1.6%
0 13
 
1.3%
12
 
1.2%
12
 
1.2%
1 12
 
1.2%
12
 
1.2%
Other values (243) 661
64.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 662 | 64.3%
Space Separator | 177 | 17.2%
Other Punctuation | 83 | 8.1%
Decimal Number | 65 | 6.3%
Uppercase Letter | 18 | 1.7%
Lowercase Letter | 8 | 0.8%
Close Punctuation | 6 | 0.6%
Open Punctuation | 6 | 0.6%
Dash Punctuation | 4 | 0.4%
Currency Symbol | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
24
 
3.6%
24
 
3.6%
16
 
2.4%
12
 
1.8%
12
 
1.8%
12
 
1.8%
11
 
1.7%
11
 
1.7%
11
 
1.7%
10
 
1.5%
Other values (207) 519
78.4%
Decimal Number
ValueCountFrequency (%)
0 13
20.0%
1 12
18.5%
2 8
12.3%
7 6
9.2%
4 6
9.2%
3 6
9.2%
5 5
 
7.7%
9 5
 
7.7%
6 2
 
3.1%
8 2
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
O 3
16.7%
K 3
16.7%
M 2
11.1%
A 2
11.1%
R 2
11.1%
T 2
11.1%
G 1
 
5.6%
C 1
 
5.6%
U 1
 
5.6%
L 1
 
5.6%
Lowercase Letter
ValueCountFrequency (%)
a 2
25.0%
o 2
25.0%
r 1
12.5%
t 1
12.5%
s 1
12.5%
g 1
12.5%
Other Punctuation
ValueCountFrequency (%)
, 67
80.7%
' 6
 
7.2%
: 5
 
6.0%
· 3
 
3.6%
. 2
 
2.4%
Space Separator
ValueCountFrequency (%)
177
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 662 | 64.3%
Common | 342 | 33.2%
Latin | 26 | 2.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
24
 
3.6%
24
 
3.6%
16
 
2.4%
12
 
1.8%
12
 
1.8%
12
 
1.8%
11
 
1.7%
11
 
1.7%
11
 
1.7%
10
 
1.5%
Other values (207) 519
78.4%
Common
ValueCountFrequency (%)
177
51.8%
, 67
 
19.6%
0 13
 
3.8%
1 12
 
3.5%
2 8
 
2.3%
) 6
 
1.8%
7 6
 
1.8%
4 6
 
1.8%
( 6
 
1.8%
' 6
 
1.8%
Other values (10) 35
 
10.2%
Latin
ValueCountFrequency (%)
O 3
11.5%
K 3
11.5%
a 2
 
7.7%
o 2
 
7.7%
M 2
 
7.7%
A 2
 
7.7%
R 2
 
7.7%
T 2
 
7.7%
G 1
 
3.8%
C 1
 
3.8%
Other values (6) 6
23.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 662 | 64.3%
ASCII | 365 | 35.4%
None | 3 | 0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
177
48.5%
, 67
 
18.4%
0 13
 
3.6%
1 12
 
3.3%
2 8
 
2.2%
) 6
 
1.6%
7 6
 
1.6%
4 6
 
1.6%
( 6
 
1.6%
' 6
 
1.6%
Other values (25) 58
 
15.9%
Hangul
ValueCountFrequency (%)
24
 
3.6%
24
 
3.6%
16
 
2.4%
12
 
1.8%
12
 
1.8%
12
 
1.8%
11
 
1.7%
11
 
1.7%
11
 
1.7%
10
 
1.5%
Other values (207) 519
78.4%
None
ValueCountFrequency (%)
· 3
100.0%

기준년도
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

2020: 681

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value | Count | Frequency (%)
2020 | 681 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2020 | 681 | 100.0%

Correlations

Variable | 국가 | 국가코드(ISO 2자리 코드) | 공공기관유형 | 공공기관진출내용
국가 | 1.000 | 1.000 | 0.657 | 1.000
국가코드(ISO 2자리 코드) | 1.000 | 1.000 | 0.657 | 1.000
공공기관유형 | 0.657 | 0.657 | 1.000 | 0.933
공공기관진출내용 | 1.000 | 1.000 | 0.933 | 1.000
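
The report does not name the association measure behind this matrix. Assuming it is a Cramér's V style statistic, as is common for pairs of categorical variables, the sketch below shows an uncorrected version in plain Python. The function name `cramers_v` is hypothetical, no bias correction is applied, and the figures it produces are not expected to reproduce the matrix exactly.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return float("nan")  # undefined when either variable is constant
    return sqrt(chi2 / (n * k))


# A country name and its ISO code determine each other.
countries = ["가봉", "가봉", "과테말라", "호주", "호주"]
codes = ["GA", "GA", "GT", "AU", "AU"]
print(cramers_v(countries, codes))
```

Two variables that determine each other, like 국가 and its ISO code, reach the maximum of 1, which matches the 1.000 entries between those two columns. A constant column such as 기준년도 leaves the statistic undefined.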

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

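Row | 국가 | 국가코드(ISO 2자리 코드) | 공공기관유형 | 공공기관명 | 공공기관진출내용 | 기준년도
0 | 가봉 | GA | 일반기관 | KT | <NA> | 2020
1 | 가봉 | GA | 일반기관 | KTencore | <NA> | 2020
2 | 가봉 | GA | 정부기관 | FTC | <NA> | 2020
3 | 가봉 | GA | 정부기관 | 대사관 | <NA> | 2020
4 | 과테말라 | GT | 일반기관 | <NA> | 약 170개 업체 진출 (주로 섬유, 봉제업) | 2020
5 | 과테말라 | GT | 정부기관 | KOICA | <NA> | 2020
6 | 과테말라 | GT | 일반기관 | 100여개 업체 | 대부분 섬유, 봉제업체 | 2020
7 | 과테말라 | GT | 정부투자기관 | KOTRA | <NA> | 2020
8 | 과테말라 | GT | 정부기관 | 대사관 | <NA> | 2020
9 | 나이지리아 | NG | 일반기관 | 24개사 | 도매 및 소매업, 건설공사업, 제조업 | 2020

Row | 국가 | 국가코드(ISO 2자리 코드) | 공공기관유형 | 공공기관명 | 공공기관진출내용 | 기준년도
671 | 헝가리 | HU | 일반기관 | 삼성SID | <NA> | 2020
672 | 헝가리 | HU | 일반기관 | 한국타이어 | <NA> | 2020
673 | 호주 | AU | 정부투자기관 | KOTRA | 시드니, 멜버른 | 2020
674 | 호주 | AU | 정부투자기관 | 관광공사 | <NA> | 2020
675 | 호주 | AU | 정부기관 | 대사관 | <NA> | 2020
676 | 호주 | AU | 정부투자기관 | 대한광물자원공사 | <NA> | 2020
677 | 호주 | AU | 정부투자기관 | 외환은행 | <NA> | 2020
678 | 호주 | AU | 정부기관 | 주멜번분관 | <NA> | 2020
679 | 호주 | AU | 정부기관 | 주시드니총영사관 | <NA> | 2020
680 | 호주 | AU | 정부투자기관 | 한국전력공사 | <NA> | 2020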

Duplicate rows

Most frequently occurring

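Row | 국가 | 국가코드(ISO 2자리 코드) | 공공기관유형 | 공공기관명 | 공공기관진출내용 | 기준년도 | # duplicates
0 | 네팔 | NP | 일반기관 | 유신 | <NA> | 2020 | 2
1 | 모로코 | MA | 일반기관 | LG | <NA> | 2020 | 2
2 | 모로코 | MA | 일반기관 | 대우건설 | <NA> | 2020 | 2
3 | 모로코 | MA | 일반기관 | 삼성전자 | <NA> | 2020 | 2
4 | 모로코 | MA | 일반기관 | 현대자동차 | <NA> | 2020 | 2
5 | 베네수엘라 | VE | 일반기관 | 현대건설 | <NA> | 2020 | 2
6 | 브라질 | BR | 일반기관 | LG전자 | <NA> | 2020 | 2
7 | 브라질 | BR | 일반기관 | 삼성전자 | <NA> | 2020 | 2
8 | 브라질 | BR | 일반기관 | 현대자동차 | <NA> | 2020 | 2
9 | 아르헨티나 | AR | 일반기관 | LG전자 | <NA> | 2020 | 2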
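
The "# duplicates" column counts how many times an identical row occurs. A minimal sketch of that grouping with the standard library follows; the helper name `duplicate_groups` is hypothetical, rows are assumed to be tuples with `None` for missing cells, and the five rows are a toy subset assembled from values in this report rather than the full dataset.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) pairs for rows that occur more than once."""
    counts = Counter(rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Toy subset: two repeated rows and one row that occurs once here.
rows = [
    ("모로코", "MA", "일반기관", "LG", None, "2020"),
    ("모로코", "MA", "일반기관", "LG", None, "2020"),
    ("가봉", "GA", "일반기관", "KT", None, "2020"),
    ("네팔", "NP", "일반기관", "유신", None, "2020"),
    ("네팔", "NP", "일반기관", "유신", None, "2020"),
]

groups = duplicate_groups(rows)
# Rows that would disappear if only the first copy of each group were kept.
extra_rows = sum(n - 1 for _, n in groups)

for row, n in groups:
    print(row, n)
print("extra rows:", extra_rows)
```

Whether the headline figure of 25 duplicate rows counts every member of a group or only the extra copies is not stated in the report, so `extra_rows` is one possible reading.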