Overview

Dataset statistics

Number of variables: 6
Number of observations: 1576
Missing cells: 716
Missing cells (%): 7.6%
Duplicate rows: 2
Duplicate rows (%): 0.1%
Total size in memory: 74.0 KiB
Average record size in memory: 48.1 B
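
The derived figures in this table follow from the raw counts by simple arithmetic. The sketch below recomputes them. It assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Recompute the derived dataset statistics from the raw counts reported above.
n_variables = 6
n_observations = 1576
missing_cells = 716
duplicate_rows = 2
total_size_kib = 74.0

# Assumption: percentages are taken over all cells and all rows respectively.
missing_pct = 100 * missing_cells / (n_variables * n_observations)
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the recomputed values round to the reported 7.6%, 0.1% and 48.1 B.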

Variable types

Categorical: 3
Text: 3

Dataset

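Description: Port facility specification codes used in the Shipping and Port Logistics Information System (PORT-MIS), searchable by port office code, facility specification code, facility specification name, and similar keys. The data provides the port office code, facility code, facility sub-code, facility name in Korean, and facility name in English.
URL: https://www.data.go.kr/data/15119666/fileData.do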

Alerts

Dataset has 2 (0.1%) duplicate rows [Duplicates]
시설이용구분 is highly overall correlated with 시설구분 [High correlation]
시설구분 is highly overall correlated with 시설이용구분 [High correlation]
시설이용구분 is highly imbalanced (53.7%) [Imbalance]
선석 구분 has 713 (45.2%) missing values [Missing]
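
The 53.7% imbalance figure can be reproduced from the category counts of 시설이용구분 listed later in this report. The sketch below assumes the score is one minus the normalized Shannon entropy of the class distribution. This definition matches the reported number, but the report itself does not state the formula, and the function name is illustrative.

```python
from math import log


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    distribution and k is the number of classes. A value of 0 means perfectly
    balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * log(c / total) for c in counts if c > 0)
    return 1 - entropy / log(k)


# Counts of 시설이용구분 as listed in this report:
# 계류시설 M, 수역 W, 기타, 박지 A.
counts = [1221, 294, 59, 2]
score = imbalance_score(counts)
print(f"{100 * score:.1f}%")
```

With these counts the score is about 0.537, in line with the alert.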

Reproduction

Analysis started: 2023-12-12 03:58:47.210643
Analysis finished: 2023-12-12 03:58:48.070516
Duration: 0.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
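
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the exact procedure used for this run: the function name, file names, report title and the `cp949` encoding are all assumptions, and the original `config.json` settings are not reproduced.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    # The title is a placeholder chosen for this sketch.
    report = ProfileReport(df, title="Port facility codes")
    report.to_file(html_path)
    return report


# Hypothetical usage; the file names are placeholders, not taken from the report:
# build_report("port_facility_codes.csv", "report.html", encoding="cp949")
```

Korean public-data CSV files are often distributed in a legacy encoding, which is why the usage line passes `cp949`; check the actual file before relying on it.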

Variables

항만
Categorical

Distinct: 32
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.4 KiB

부산항: 305
인천항: 172
울산항: 135
광양여천항: 102
여수항: 79
Other values (27): 783

Length

Max length: 5
Median length: 3
Mean length: 3.268401
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산항
2nd row: 부산항
3rd row: 부산항
4th row: 부산항
5th row: 부산항

Common Values

Value | Count | Frequency (%)
부산항 | 305 | 19.4%
인천항 | 172 | 10.9%
울산항 | 135 | 8.6%
광양여천항 | 102 | 6.5%
여수항 | 79 | 5.0%
포항신항 | 75 | 4.8%
목포항 | 71 | 4.5%
군산항 | 67 | 4.3%
평택당진 | 60 | 3.8%
대산항 | 57 | 3.6%
Other values (22) | 453 | 28.7%

Length

Histogram of lengths of the category

시설이용구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 12.4 KiB

계류시설 M: 1221
수역 W: 294
기타: 59
박지 A: 2

Length

Max length: 6
Median length: 6
Mean length: 5.4746193
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 계류시설 M
2nd row: 계류시설 M
3rd row: 계류시설 M
4th row: 계류시설 M
5th row: 계류시설 M

Common Values

Value | Count | Frequency (%)
계류시설 M | 1221 | 77.5%
수역 W | 294 | 18.7%
기타 | 59 | 3.7%
박지 A | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
계류시설 1221
39.5%
m 1221
39.5%
수역 294
 
9.5%
w 294
 
9.5%
기타 59
 
1.9%
박지 2
 
0.1%
a 2
 
0.1%

시설구분
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 12.4 KiB

선석 B: 814
박지 A: 284
돌핀 D: 105
소형선부두 W: 93
기타: 79
Other values (9): 201

Length

Max length: 7
Median length: 4
Mean length: 4.1110406
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 선석 B
2nd row: 선석 B
3rd row: 선석 B
4th row: 선석 B
5th row: 선석 B

Common Values

Value | Count | Frequency (%)
선석 B | 814 | 51.6%
박지 A | 284 | 18.0%
돌핀 D | 105 | 6.7%
소형선부두 W | 93 | 5.9%
기타 | 79 | 5.0%
잔교 F | 43 | 2.7%
안벽 Q | 41 | 2.6%
선석 K | 40 | 2.5%
선석 S | 31 | 2.0%
조선소 Y | 27 | 1.7%
Other values (4) | 19 | 1.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
선석 885
28.9%
b 814
26.6%
박지 284
 
9.3%
a 284
 
9.3%
돌핀 105
 
3.4%
d 105
 
3.4%
w 94
 
3.1%
소형선부두 93
 
3.0%
기타 79
 
2.6%
잔교 43
 
1.4%
Other values (12) 271
 
8.9%

시설명
Text

Distinct: 833
Distinct (%): 53.0%
Missing: 3
Missing (%): 0.2%
Memory size: 12.4 KiB

Length

Max length: 19
Median length: 15
Mean length: 6.3623649
Min length: 2

Characters and Unicode

Total characters: 10008
Distinct characters: 306
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 649
Unique (%): 41.3%

Sample

1st row: 1부두
2nd row: 1부두
3rd row: 1부두
4th row: 1부두
5th row: 1부두
ValueCountFrequency (%)
부두 104
 
4.1%
정박지 76
 
3.0%
내항 47
 
1.9%
43
 
1.7%
박지 42
 
1.7%
5부두 40
 
1.6%
감천 38
 
1.5%
북항 38
 
1.5%
4부두 34
 
1.3%
7부두 31
 
1.2%
Other values (802) 2044
80.6%

Most occurring characters

ValueCountFrequency (%)
1023
 
10.2%
922
 
9.2%
902
 
9.0%
275
 
2.7%
267
 
2.7%
258
 
2.6%
245
 
2.4%
1 245
 
2.4%
169
 
1.7%
2 166
 
1.7%
Other values (296) 5536
55.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 7479
74.7%
Space Separator 1023
 
10.2%
Decimal Number 839
 
8.4%
Uppercase Letter 449
 
4.5%
Close Punctuation 74
 
0.7%
Open Punctuation 74
 
0.7%
Dash Punctuation 49
 
0.5%
Other Punctuation 14
 
0.1%
Lowercase Letter 5
 
< 0.1%
Connector Punctuation 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
922
 
12.3%
902
 
12.1%
275
 
3.7%
267
 
3.6%
258
 
3.4%
245
 
3.3%
169
 
2.3%
138
 
1.8%
136
 
1.8%
127
 
1.7%
Other values (251) 4040
54.0%
Uppercase Letter
ValueCountFrequency (%)
S 80
17.8%
A 79
17.6%
W 32
 
7.1%
G 29
 
6.5%
K 29
 
6.5%
L 27
 
6.0%
O 25
 
5.6%
C 22
 
4.9%
E 18
 
4.0%
B 13
 
2.9%
Other values (13) 95
21.2%
Decimal Number
ValueCountFrequency (%)
1 245
29.2%
2 166
19.8%
3 91
 
10.8%
5 69
 
8.2%
4 68
 
8.1%
0 51
 
6.1%
6 48
 
5.7%
7 47
 
5.6%
8 43
 
5.1%
9 11
 
1.3%
Other Punctuation
ValueCountFrequency (%)
/ 7
50.0%
, 5
35.7%
· 1
 
7.1%
: 1
 
7.1%
Lowercase Letter
ValueCountFrequency (%)
i 2
40.0%
l 2
40.0%
c 1
20.0%
Space Separator
ValueCountFrequency (%)
1023
100.0%
Close Punctuation
ValueCountFrequency (%)
) 74
100.0%
Open Punctuation
ValueCountFrequency (%)
( 74
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 49
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 7479
74.7%
Common 2075
 
20.7%
Latin 454
 
4.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
922
 
12.3%
902
 
12.1%
275
 
3.7%
267
 
3.6%
258
 
3.4%
245
 
3.3%
169
 
2.3%
138
 
1.8%
136
 
1.8%
127
 
1.7%
Other values (251) 4040
54.0%
Latin
ValueCountFrequency (%)
S 80
17.6%
A 79
17.4%
W 32
 
7.0%
G 29
 
6.4%
K 29
 
6.4%
L 27
 
5.9%
O 25
 
5.5%
C 22
 
4.8%
E 18
 
4.0%
B 13
 
2.9%
Other values (16) 100
22.0%
Common
ValueCountFrequency (%)
1023
49.3%
1 245
 
11.8%
2 166
 
8.0%
3 91
 
4.4%
) 74
 
3.6%
( 74
 
3.6%
5 69
 
3.3%
4 68
 
3.3%
0 51
 
2.5%
- 49
 
2.4%
Other values (9) 165
 
8.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 7479
74.7%
ASCII 2528
 
25.3%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1023
40.5%
1 245
 
9.7%
2 166
 
6.6%
3 91
 
3.6%
S 80
 
3.2%
A 79
 
3.1%
) 74
 
2.9%
( 74
 
2.9%
5 69
 
2.7%
4 68
 
2.7%
Other values (34) 559
22.1%
Hangul
ValueCountFrequency (%)
922
 
12.3%
902
 
12.1%
275
 
3.7%
267
 
3.6%
258
 
3.4%
245
 
3.3%
169
 
2.3%
138
 
1.8%
136
 
1.8%
127
 
1.7%
Other values (251) 4040
54.0%
None
ValueCountFrequency (%)
· 1
100.0%

선석 구분
Text

MISSING 

Distinct: 442
Distinct (%): 51.2%
Missing: 713
Missing (%): 45.2%
Memory size: 12.4 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.8458864
Min length: 1

Characters and Unicode

Total characters: 3319
Distinct characters: 60
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 341
Unique (%): 39.5%

Sample

1st row: 10선석
2nd row: 11선석
3rd row: 12선석
4th row: 13선석
5th row: 14선석
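
The distinct and unique percentages and the mean length reported for 선석 구분 are consistent with ratios taken over the non-missing values only, not over all 1576 rows. The sketch below checks this reading against the figures above; the choice of denominator is an inference from the numbers, not something the report states.

```python
# Figures reported for 선석 구분 in this report.
n_observations = 1576
n_missing = 713
n_distinct = 442
n_unique = 341
total_characters = 3319

# Assumption: missing values are excluded before these ratios are computed.
n_present = n_observations - n_missing

distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present
mean_length = total_characters / n_present

print(f"Non-missing values: {n_present}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```

With 863 non-missing values, the recomputed ratios round to the reported 51.2%, 39.5% and 3.8458864.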
ValueCountFrequency (%)
선석 99
 
9.1%
1선석 36
 
3.3%
2선석 34
 
3.1%
1 32
 
2.9%
2 31
 
2.9%
a 26
 
2.4%
3선석 26
 
2.4%
3 25
 
2.3%
b 25
 
2.3%
4선석 16
 
1.5%
Other values (330) 737
67.8%

Most occurring characters

ValueCountFrequency (%)
589
17.7%
589
17.7%
1 300
 
9.0%
2 262
 
7.9%
225
 
6.8%
3 185
 
5.6%
4 150
 
4.5%
5 111
 
3.3%
6 82
 
2.5%
7 68
 
2.0%
Other values (50) 758
22.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1543
46.5%
Decimal Number 1270
38.3%
Uppercase Letter 246
 
7.4%
Space Separator 225
 
6.8%
Dash Punctuation 19
 
0.6%
Open Punctuation 8
 
0.2%
Close Punctuation 8
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
589
38.2%
589
38.2%
66
 
4.3%
59
 
3.8%
59
 
3.8%
52
 
3.4%
40
 
2.6%
12
 
0.8%
9
 
0.6%
9
 
0.6%
Other values (18) 59
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
A 56
22.8%
B 36
14.6%
W 33
13.4%
E 19
 
7.7%
P 13
 
5.3%
K 13
 
5.3%
T 13
 
5.3%
S 12
 
4.9%
N 10
 
4.1%
M 10
 
4.1%
Other values (8) 31
12.6%
Decimal Number
ValueCountFrequency (%)
1 300
23.6%
2 262
20.6%
3 185
14.6%
4 150
11.8%
5 111
 
8.7%
6 82
 
6.5%
7 68
 
5.4%
0 48
 
3.8%
8 43
 
3.4%
9 21
 
1.7%
Space Separator
ValueCountFrequency (%)
225
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 19
100.0%
Open Punctuation
ValueCountFrequency (%)
( 8
100.0%
Close Punctuation
ValueCountFrequency (%)
) 8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 1543
46.5%
Common 1530
46.1%
Latin 246
 
7.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
589
38.2%
589
38.2%
66
 
4.3%
59
 
3.8%
59
 
3.8%
52
 
3.4%
40
 
2.6%
12
 
0.8%
9
 
0.6%
9
 
0.6%
Other values (18) 59
 
3.8%
Latin
ValueCountFrequency (%)
A 56
22.8%
B 36
14.6%
W 33
13.4%
E 19
 
7.7%
P 13
 
5.3%
K 13
 
5.3%
T 13
 
5.3%
S 12
 
4.9%
N 10
 
4.1%
M 10
 
4.1%
Other values (8) 31
12.6%
Common
ValueCountFrequency (%)
1 300
19.6%
2 262
17.1%
225
14.7%
3 185
12.1%
4 150
9.8%
5 111
 
7.3%
6 82
 
5.4%
7 68
 
4.4%
0 48
 
3.1%
8 43
 
2.8%
Other values (4) 56
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1776
53.5%
Hangul 1543
46.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
589
38.2%
589
38.2%
66
 
4.3%
59
 
3.8%
59
 
3.8%
52
 
3.4%
40
 
2.6%
12
 
0.8%
9
 
0.6%
9
 
0.6%
Other values (18) 59
 
3.8%
ASCII
ValueCountFrequency (%)
1 300
16.9%
2 262
14.8%
225
12.7%
3 185
10.4%
4 150
8.4%
5 111
 
6.2%
6 82
 
4.6%
7 68
 
3.8%
A 56
 
3.2%
0 48
 
2.7%
Other values (22) 289
16.3%

코드
Text

Distinct: 846
Distinct (%): 53.7%
Missing: 0
Missing (%): 0.0%
Memory size: 12.4 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 9456
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 579
Unique (%): 36.7%

Sample

1st row: MB1-00
2nd row: MB1-01
3rd row: MB1-02
4th row: MB1-03
5th row: MB1-04
ValueCountFrequency (%)
waq-01 16
 
1.0%
mb1-01 15
 
1.0%
waa-01 13
 
0.8%
waa-02 13
 
0.8%
mb3-01 12
 
0.8%
waa-04 12
 
0.8%
waa-03 12
 
0.8%
mb2-01 12
 
0.8%
mb6-01 11
 
0.7%
mb5-01 11
 
0.7%
Other values (836) 1449
91.9%

Most occurring characters

ValueCountFrequency (%)
- 1576
16.7%
0 1377
14.6%
M 1264
13.4%
B 889
9.4%
1 810
8.6%
W 466
 
4.9%
A 421
 
4.5%
2 391
 
4.1%
3 260
 
2.7%
4 200
 
2.1%
Other values (28) 1802
19.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4321
45.7%
Decimal Number 3513
37.2%
Dash Punctuation 1576
 
16.7%
Other Punctuation 46
 
0.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 1264
29.3%
B 889
20.6%
W 466
 
10.8%
A 421
 
9.7%
D 163
 
3.8%
S 135
 
3.1%
K 133
 
3.1%
F 89
 
2.1%
Y 78
 
1.8%
Q 75
 
1.7%
Other values (16) 608
14.1%
Decimal Number
ValueCountFrequency (%)
0 1377
39.2%
1 810
23.1%
2 391
 
11.1%
3 260
 
7.4%
4 200
 
5.7%
5 160
 
4.6%
6 113
 
3.2%
7 97
 
2.8%
8 64
 
1.8%
9 41
 
1.2%
Dash Punctuation
ValueCountFrequency (%)
- 1576
100.0%
Other Punctuation
ValueCountFrequency (%)
* 46
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5135
54.3%
Latin 4321
45.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 1264
29.3%
B 889
20.6%
W 466
 
10.8%
A 421
 
9.7%
D 163
 
3.8%
S 135
 
3.1%
K 133
 
3.1%
F 89
 
2.1%
Y 78
 
1.8%
Q 75
 
1.7%
Other values (16) 608
14.1%
Common
ValueCountFrequency (%)
- 1576
30.7%
0 1377
26.8%
1 810
15.8%
2 391
 
7.6%
3 260
 
5.1%
4 200
 
3.9%
5 160
 
3.1%
6 113
 
2.2%
7 97
 
1.9%
8 64
 
1.2%
Other values (2) 87
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9456
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 1576
16.7%
0 1377
14.6%
M 1264
13.4%
B 889
9.4%
1 810
8.6%
W 466
 
4.9%
A 421
 
4.5%
2 391
 
4.1%
3 260
 
2.7%
4 200
 
2.1%
Other values (28) 1802
19.1%

Correlations

Variable | 항만 | 시설이용구분 | 시설구분
항만 | 1.000 | 0.418 | 0.652
시설이용구분 | 0.418 | 1.000 | 0.900
시설구분 | 0.652 | 0.900 | 1.000

Variable | 항만 | 시설이용구분 | 시설구분
항만 | 1.000 | 0.208 | 0.255
시설이용구분 | 0.208 | 1.000 | 0.756
시설구분 | 0.255 | 0.756 | 1.000

Variable | 항만 | 시설이용구분 | 시설구분
항만 | 1.000 | 0.208 | 0.255
시설이용구분 | 0.208 | 1.000 | 0.756
시설구분 | 0.255 | 0.756 | 1.000
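
The report does not name the coefficient behind each matrix. For pairs of categorical variables, a common association measure is Cramér's V, and the sketch below shows the plain (uncorrected) form on hypothetical toy data. It is an illustration of the technique only and is not expected to reproduce the values above, which may use a bias correction or a different measure.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: the second column is fully determined by the first.
use_class = ["계류시설 M", "계류시설 M", "수역 W", "수역 W"]
facility = ["선석 B", "선석 B", "박지 A", "박지 A"]
print(cramers_v(use_class, facility))
```

A value of 1 indicates that one variable fully determines the other, and 0 indicates no association in the sample.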

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

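Index | 항만 | 시설이용구분 | 시설구분 | 시설명 | 선석 구분 | 코드
0 | 부산항 | 계류시설 M | 선석 B | 1부두 | 10선석 | MB1-00
1 | 부산항 | 계류시설 M | 선석 B | 1부두 | 11선석 | MB1-01
2 | 부산항 | 계류시설 M | 선석 B | 1부두 | 12선석 | MB1-02
3 | 부산항 | 계류시설 M | 선석 B | 1부두 | 13선석 | MB1-03
4 | 부산항 | 계류시설 M | 선석 B | 1부두 | 14선석 | MB1-04
5 | 부산항 | 계류시설 M | 선석 B | 1부두 | 15선석 | MB1-05
6 | 부산항 | 계류시설 M | 선석 B | 양곡부두(구 5부두) | 51선석 | MB5-01
7 | 부산항 | 계류시설 M | 선석 B | 양곡부두(구 5부두) | 52선석 | MB5-02
8 | 부산항 | 계류시설 M | 선석 B | 자성대부두(구6부두) | 61선석 | MB6-01
9 | 부산항 | 계류시설 M | 선석 B | 자성대부두(구6부두) | 62선석 | MB6-02

Index | 항만 | 시설이용구분 | 시설구분 | 시설명 | 선석 구분 | 코드
1566 | 서귀포항 | 계류시설 M | 선석 B | 제 7부두 71선석 | <NA> | MB7-01
1567 | 서귀포항 | 계류시설 M | 선석 B | 제8부두 | 81선석 | MB8-01
1568 | 서귀포항 | 계류시설 M | 선석 B | 제8부두 | 82선석 | MB8-02
1569 | 서귀포항 | 계류시설 M | 선석 B | 제8부두 | 83선석 | MB8-03
1570 | 서귀포항 | 계류시설 M | 선석 B | 어선부두 | <NA> | MBB-01
1571 | 서귀포항 | 계류시설 M | 선석 B | 유람선부두 | <NA> | MBC-01
1572 | 서귀포항 | 계류시설 M | 선석 B | 강정지구 제1부두 | 11선석 | MBG-01
1573 | 서귀포항 | 계류시설 M | 선석 B | 강정지구 제1부두 | 12선석 | MBG-02
1574 | 서귀포항 | 계류시설 M | 선석 B | 강정지구 제2부두 | 21선석 | MBH-01
1575 | 서귀포항 | 계류시설 M | 선석 B | 강정지구 제2부두 | 22선석 | MBH-02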

Duplicate rows

Most frequently occurring

Index | 항만 | 시설이용구분 | 시설구분 | 시설명 | 선석 구분 | 코드 | # duplicates
0 | 대산항 | 수역 W | 박지 A | 장안서 대기정박지 | <NA> | WAJ-01 | 2
1 | 부산항 | 계류시설 M | 선석 B | 감만 부두 | 2선석 | MBR-02 | 2
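
Exact duplicate rows like these can be found by counting whole rows. The sketch below uses only the standard library on a few toy rows. The rows follow the column order of the sample table, `None` stands in for `<NA>`, and the repeated row is entered twice by hand to mirror the first entry of the table above.

```python
from collections import Counter

# Toy rows: 항만, 시설이용구분, 시설구분, 시설명, 선석 구분, 코드.
# None stands in for a missing value (<NA>).
rows = [
    ("대산항", "수역 W", "박지 A", "장안서 대기정박지", None, "WAJ-01"),
    ("부산항", "계류시설 M", "선석 B", "1부두", "10선석", "MB1-00"),
    ("대산항", "수역 W", "박지 A", "장안서 대기정박지", None, "WAJ-01"),
]

# Count identical rows and keep those that occur more than once.
row_counts = Counter(rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Before removing such rows from the real data, it is worth checking whether a repeated code is a data-entry error or a legitimate second record.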