Overview

Dataset statistics

Number of variables: 4
Number of observations: 179
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.7 KiB
Average record size in memory: 32.7 B
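
The dataset-level counts above can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: it uses the first five rows from this report's Sample section, treats empty strings and `None` as missing (an assumption; the profiler's own definition may differ), and counts a row as a duplicate each time it repeats an earlier row. The helper name `dataset_statistics` is hypothetical.

```python
from collections import Counter

# First five rows from the Sample section:
# (사업장_상호명, 업종, 사업장_소재지, 사무실_전화번호)
rows = [
    ("인천배낚시", "유선/면허", "인천광역시 중구 항동7가60번지", "개인번호"),
    ("금강스타유선", "유선/면허", "인천광역시 중구 항동7가60번지", "개인번호"),
    ("현주바다낚시", "유선/면허", "인천광역시 중구 항동7가60번지", "개인번호"),
    ("(주)푸른", "유선/면허", "인천광역시 중구 항동7가60번지", "개인번호"),
    ("현대마린개발", "유선/면허", "인천광역시 중구 항동7가60번지", "개인번호"),
]


def dataset_statistics(rows):
    """Hypothetical helper: basic table statistics for a list of row tuples."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    # Assumption: None and "" both count as missing cells.
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    # Every repeat of an already-seen row counts as one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "duplicate_rows": duplicates,
    }


stats = dataset_statistics(rows)
print(stats)
```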

Variable types

Text: 2
Categorical: 2

Dataset

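Description: Status information on domestic excursion ship (유선) and ferry (도선) business establishments.
- An excursion ship business (유선사업) is equipped with excursion ships and an excursion ship dock, and rents out vessels or takes passengers aboard on the water for fishing, sightseeing, or other recreation.
- A ferry business (도선사업) is equipped with ferries and a ferry dock, and transports people, or people and goods, on inland waters or on narrow sea passages (바다목) designated by Presidential Decree.
URL: https://www.data.go.kr/data/15070406/fileData.do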

Alerts

Imbalance: 사무실_전화번호 is highly imbalanced (64.8%)
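
The report does not say how the 64.8% figure is derived. One definition that reproduces it from the counts reported for 사무실_전화번호 (141 rows with 개인번호, one number appearing 3 times, and 35 numbers appearing once) is one minus the Shannon entropy normalized by its maximum. The sketch below assumes that definition; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """1 - H / log2(k), where H is the Shannon entropy over k classes.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts taken from the 사무실_전화번호 section: 37 distinct values, 179 rows.
counts = [141, 3] + [1] * 35
score = imbalance_score(counts)
print(f"{score:.1%}")
```

Under this assumed definition the score rounds to the 64.8% shown in the alert.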

Reproduction

Analysis started: 2023-12-12 22:45:13.907924
Analysis finished: 2023-12-12 22:45:14.329318
Duration: 0.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
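
As a quick check, the reported duration follows directly from the two timestamps above. This is a minimal sketch using only the standard library.

```python
from datetime import datetime

started = datetime.fromisoformat("2023-12-12 22:45:13.907924")
finished = datetime.fromisoformat("2023-12-12 22:45:14.329318")

# Elapsed wall-clock time between the start and end of the analysis.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```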

Variables

사업장_상호명
Text

Distinct: 176
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 21
Median length: 16
Mean length: 7.7206704
Min length: 3

Characters and Unicode

Total characters: 1382
Distinct characters: 230
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
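
To illustrate how such character properties can be read, the sketch below counts Unicode general categories with Python's `unicodedata` module, using the five sample values of this variable. The mapping from two-letter category codes to the names used in this report is a small hand-written subset, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Subset of general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


sample = ["인천배낚시", "금강스타유선", "현주바다낚시", "(주)푸른", "현대마린개발"]
counts = category_counts(sample)
for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables in this section.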

Unique

Unique: 173
Unique (%): 96.6%

Sample

1st row: 인천배낚시
2nd row: 금강스타유선
3rd row: 현주바다낚시
4th row: (주)푸른
5th row: 현대마린개발
Value | Count | Frequency (%)
인천배낚시 | 3 | 1.5%
해진유선 | 2 | 1.0%
광복낚시 | 2 | 1.0%
태원유선 | 2 | 1.0%
도선 | 2 | 1.0%
장승포유람선협회 | 1 | 0.5%
노도도선운영위원회 | 1 | 0.5%
㈜지세포관광 | 1 | 0.5%
유람선협회 | 1 | 0.5%
와현유람선협회 | 1 | 0.5%
Other values (182) | 182 | 91.9%

Most occurring characters

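Value | Count | Frequency (%)
[not captured] | 71 | 5.1%
[not captured] | 66 | 4.8%
[not captured] | 58 | 4.2%
[not captured] | 51 | 3.7%
) | 49 | 3.5%
( | 49 | 3.5%
[not captured] | 43 | 3.1%
[not captured] | 36 | 2.6%
[not captured] | 27 | 2.0%
[not captured] | 27 | 2.0%
Other values (220) | 905 | 65.5%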

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1231 | 89.1%
Close Punctuation | 49 | 3.5%
Open Punctuation | 49 | 3.5%
Space Separator | 19 | 1.4%
Other Symbol | 18 | 1.3%
Decimal Number | 16 | 1.2%

Most frequent character per category

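Other Letter
Value | Count | Frequency (%)
[not captured] | 71 | 5.8%
[not captured] | 66 | 5.4%
[not captured] | 58 | 4.7%
[not captured] | 51 | 4.1%
[not captured] | 43 | 3.5%
[not captured] | 36 | 2.9%
[not captured] | 27 | 2.2%
[not captured] | 27 | 2.2%
[not captured] | 22 | 1.8%
[not captured] | 22 | 1.8%
Other values (210) | 808 | 65.6%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 37.5%
0 | 3 | 18.8%
1 | 3 | 18.8%
3 | 2 | 12.5%
5 | 1 | 6.2%
4 | 1 | 6.2%

Close Punctuation
Value | Count | Frequency (%)
) | 49 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 49 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 19 | 100.0%

Other Symbol
Value | Count | Frequency (%)
[not captured] | 18 | 100.0%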

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1249 | 90.4%
Common | 133 | 9.6%

Most frequent character per script

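Hangul
Value | Count | Frequency (%)
[not captured] | 71 | 5.7%
[not captured] | 66 | 5.3%
[not captured] | 58 | 4.6%
[not captured] | 51 | 4.1%
[not captured] | 43 | 3.4%
[not captured] | 36 | 2.9%
[not captured] | 27 | 2.2%
[not captured] | 27 | 2.2%
[not captured] | 22 | 1.8%
[not captured] | 22 | 1.8%
Other values (211) | 826 | 66.1%

Common
Value | Count | Frequency (%)
) | 49 | 36.8%
( | 49 | 36.8%
(space) | 19 | 14.3%
2 | 6 | 4.5%
0 | 3 | 2.3%
1 | 3 | 2.3%
3 | 2 | 1.5%
5 | 1 | 0.8%
4 | 1 | 0.8%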

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1231 | 89.1%
ASCII | 133 | 9.6%
None | 18 | 1.3%

Most frequent character per block

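Hangul
Value | Count | Frequency (%)
[not captured] | 71 | 5.8%
[not captured] | 66 | 5.4%
[not captured] | 58 | 4.7%
[not captured] | 51 | 4.1%
[not captured] | 43 | 3.5%
[not captured] | 36 | 2.9%
[not captured] | 27 | 2.2%
[not captured] | 27 | 2.2%
[not captured] | 22 | 1.8%
[not captured] | 22 | 1.8%
Other values (210) | 808 | 65.6%

ASCII
Value | Count | Frequency (%)
) | 49 | 36.8%
( | 49 | 36.8%
(space) | 19 | 14.3%
2 | 6 | 4.5%
0 | 3 | 2.3%
1 | 3 | 2.3%
3 | 2 | 1.5%
5 | 1 | 0.8%
4 | 1 | 0.8%

None
Value | Count | Frequency (%)
[not captured] | 18 | 100.0%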

업종
Categorical

Distinct: 5
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
Category counts: 유선/면허 106, 도선/면허 66, 유선/신고 4, 도선/신고 2, 유선/면하 1

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 1
Unique (%): 0.6%
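
The report uses "Distinct" for the number of different values and "Unique" for the number of values that occur exactly once. The sketch below shows the difference using the 업종 counts reported in this section; it is an illustration of the two definitions, not the profiler's own code.

```python
# Value counts for 업종 as reported in this section.
counts = {
    "유선/면허": 106,
    "도선/면허": 66,
    "유선/신고": 4,
    "도선/신고": 2,
    "유선/면하": 1,
}

n_rows = sum(counts.values())
# Distinct: how many different values exist.
distinct = len(counts)
# Unique: how many values appear exactly once.
unique = sum(1 for c in counts.values() if c == 1)

print(f"Distinct: {distinct} ({distinct / n_rows:.1%})")
print(f"Unique: {unique} ({unique / n_rows:.1%})")
```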

Sample

1st row: 유선/면허
2nd row: 유선/면허
3rd row: 유선/면허
4th row: 유선/면허
5th row: 유선/면허

Common Values

Value | Count | Frequency (%)
유선/면허 | 106 | 59.2%
도선/면허 | 66 | 36.9%
유선/신고 | 4 | 2.2%
도선/신고 | 2 | 1.1%
유선/면하 | 1 | 0.6%
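
The single 유선/면하 row differs from 유선/면허 by one character and may be a data-entry typo in the source file. That reading is an assumption, not something the report states. Under that assumption, a cleaning step could merge the two values and split each category into its two parts, as sketched below with the counts from the table above. The `CORRECTIONS` mapping is hypothetical.

```python
from collections import Counter

counts = Counter({
    "유선/면허": 106,
    "도선/면허": 66,
    "유선/신고": 4,
    "도선/신고": 2,
    "유선/면하": 1,
})

# Assumed typo fix: treat "유선/면하" as "유선/면허".
CORRECTIONS = {"유선/면하": "유선/면허"}

cleaned = Counter()
for value, count in counts.items():
    cleaned[CORRECTIONS.get(value, value)] += count

# Split "service/permit" into its two components and total by service type.
by_service = Counter()
for value, count in cleaned.items():
    service, _permit = value.split("/")
    by_service[service] += count

print(dict(cleaned))
print(dict(by_service))
```

After the merge the variable would have four distinct values instead of five.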

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 업종]

사업장_소재지
Text

Distinct: 141
Distinct (%): 78.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB

Length

Max length: 99
Median length: 48
Mean length: 20.128492
Min length: 12

Characters and Unicode

Total characters: 3603
Distinct characters: 190
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 69.3%

Sample

1st row: 인천광역시 중구 항동7가60번지
2nd row: 인천광역시 중구 항동7가60번지
3rd row: 인천광역시 중구 항동7가60번지
4th row: 인천광역시 중구 항동7가60번지
5th row: 인천광역시 중구 항동7가60번지
Value | Count | Frequency (%)
경남 | 45 | 6.5%
전남 | 38 | 5.5%
인천광역시 | 35 | 5.1%
여수시 | 21 | 3.0%
중구 | 21 | 3.0%
거제시 | 16 | 2.3%
항동7가100번지 | 11 | 1.6%
옹진군 | 10 | 1.4%
통영시 | 9 | 1.3%
서귀포시 | 8 | 1.2%
Other values (281) | 478 | 69.1%

Most occurring characters

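Value | Count | Frequency (%)
(space) | 514 | 14.3%
[not captured] | 143 | 4.0%
[not captured] | 124 | 3.4%
1 | 117 | 3.2%
[not captured] | 95 | 2.6%
2 | 91 | 2.5%
- | 83 | 2.3%
[not captured] | 80 | 2.2%
3 | 76 | 2.1%
[not captured] | 74 | 2.1%
Other values (180) | 2206 | 61.2%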

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2303 | 63.9%
Decimal Number | 621 | 17.2%
Space Separator | 514 | 14.3%
Dash Punctuation | 83 | 2.3%
Open Punctuation | 35 | 1.0%
Close Punctuation | 34 | 0.9%
Other Punctuation | 13 | 0.4%

Most frequent character per category

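Other Letter
Value | Count | Frequency (%)
[not captured] | 143 | 6.2%
[not captured] | 124 | 5.4%
[not captured] | 95 | 4.1%
[not captured] | 80 | 3.5%
[not captured] | 74 | 3.2%
[not captured] | 69 | 3.0%
[not captured] | 69 | 3.0%
[not captured] | 68 | 3.0%
[not captured] | 66 | 2.9%
[not captured] | 60 | 2.6%
Other values (165) | 1455 | 63.2%

Decimal Number
Value | Count | Frequency (%)
1 | 117 | 18.8%
2 | 91 | 14.7%
3 | 76 | 12.2%
0 | 65 | 10.5%
5 | 56 | 9.0%
7 | 52 | 8.4%
6 | 46 | 7.4%
4 | 45 | 7.2%
9 | 37 | 6.0%
8 | 36 | 5.8%

Space Separator
Value | Count | Frequency (%)
(space) | 514 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 83 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 35 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 34 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 13 | 100.0%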

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2303 | 63.9%
Common | 1300 | 36.1%

Most frequent character per script

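Hangul
Value | Count | Frequency (%)
[not captured] | 143 | 6.2%
[not captured] | 124 | 5.4%
[not captured] | 95 | 4.1%
[not captured] | 80 | 3.5%
[not captured] | 74 | 3.2%
[not captured] | 69 | 3.0%
[not captured] | 69 | 3.0%
[not captured] | 68 | 3.0%
[not captured] | 66 | 2.9%
[not captured] | 60 | 2.6%
Other values (165) | 1455 | 63.2%

Common
Value | Count | Frequency (%)
(space) | 514 | 39.5%
1 | 117 | 9.0%
2 | 91 | 7.0%
- | 83 | 6.4%
3 | 76 | 5.8%
0 | 65 | 5.0%
5 | 56 | 4.3%
7 | 52 | 4.0%
6 | 46 | 3.5%
4 | 45 | 3.5%
Other values (5) | 155 | 11.9%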

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2303 | 63.9%
ASCII | 1300 | 36.1%

Most frequent character per block

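ASCII
Value | Count | Frequency (%)
(space) | 514 | 39.5%
1 | 117 | 9.0%
2 | 91 | 7.0%
- | 83 | 6.4%
3 | 76 | 5.8%
0 | 65 | 5.0%
5 | 56 | 4.3%
7 | 52 | 4.0%
6 | 46 | 3.5%
4 | 45 | 3.5%
Other values (5) | 155 | 11.9%

Hangul
Value | Count | Frequency (%)
[not captured] | 143 | 6.2%
[not captured] | 124 | 5.4%
[not captured] | 95 | 4.1%
[not captured] | 80 | 3.5%
[not captured] | 74 | 3.2%
[not captured] | 69 | 3.0%
[not captured] | 69 | 3.0%
[not captured] | 68 | 3.0%
[not captured] | 66 | 2.9%
[not captured] | 60 | 2.6%
Other values (165) | 1455 | 63.2%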

사무실_전화번호
Categorical

IMBALANCE 

Distinct: 37
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
Category counts: 개인번호 141, 061-749-4382 3, 051-403-9098 1, 063-461-1116 1, 032-202-8965 1, Other values (32) 32

Length

Max length: 12
Median length: 4
Mean length: 5.698324
Min length: 4
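
These length statistics are consistent with 141 four-character values (개인번호) and 38 twelve-character phone numbers. The sketch below checks that arithmetic; that all 38 remaining values are exactly 12 characters long is an assumption inferred from the min, max, and mean shown above.

```python
from statistics import mean, median

# 141 rows hold the 4-character value "개인번호";
# the other 38 rows are assumed to be 12-character phone numbers.
lengths = [len("개인번호")] * 141 + [len("061-749-4382")] * 38

mean_length = mean(lengths)
median_length = median(lengths)

print(f"Mean length: {mean_length:.6f}")
print(f"Median length: {median_length}")
```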

Unique

Unique: 35
Unique (%): 19.6%

Sample

1st row: 개인번호
2nd row: 개인번호
3rd row: 개인번호
4th row: 개인번호
5th row: 개인번호

Common Values

Value | Count | Frequency (%)
개인번호 | 141 | 78.8%
061-749-4382 | 3 | 1.7%
051-403-9098 | 1 | 0.6%
063-461-1116 | 1 | 0.6%
032-202-8965 | 1 | 0.6%
061-245-3222 | 1 | 0.6%
041-934-6896 | 1 | 0.6%
041-935-8959 | 1 | 0.6%
041-631-0103 | 1 | 0.6%
063-464-1919 | 1 | 0.6%
Other values (27) | 27 | 15.1%
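
The dominant value 개인번호 literally means "personal number" and appears to stand in for a withheld phone number rather than being a number itself. That interpretation is an assumption. Under it, a preprocessing step could separate placeholders from real office numbers before any further analysis, as sketched below. The regular expression is a rough pattern for the hyphenated numbers seen in this report, not a full validator for Korean phone numbers, and `classify` is a hypothetical helper.

```python
import re

PLACEHOLDER = "개인번호"  # assumed to mean "number withheld"
# Rough pattern: area code starting with 0, then 3-4 digits, then 4 digits.
PHONE_RE = re.compile(r"0\d{1,2}-\d{3,4}-\d{4}")


def classify(value):
    """Label a raw cell as 'placeholder', 'phone', or 'other'."""
    if value == PLACEHOLDER:
        return "placeholder"
    if PHONE_RE.fullmatch(value):
        return "phone"
    return "other"


# Counts from the table above: 141 placeholders out of 179 rows.
n_rows = 179
n_placeholder = 141
share_with_number = (n_rows - n_placeholder) / n_rows

print(classify("개인번호"), classify("061-749-4382"))
print(f"Rows with an office number: {share_with_number:.1%}")
```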

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
개인번호 | 141 | 78.8%
061-749-4382 | 3 | 1.7%
064-792-1188 | 1 | 0.6%
064-738-5355 | 1 | 0.6%
064-784-6163 | 1 | 0.6%
064-796-3515 | 1 | 0.6%
064-784-2335 | 1 | 0.6%
064-733-1874 | 1 | 0.6%
064-782-5271 | 1 | 0.6%
064-783-0000 | 1 | 0.6%
Other values (27) | 27 | 15.1%

Correlations

Variable | 업종 | 사무실_전화번호
업종 | 1.000 | 0.794
사무실_전화번호 | 0.794 | 1.000

Variable | 사무실_전화번호 | 업종
사무실_전화번호 | 1.000 | 0.469
업종 | 0.469 | 1.000

Variable | 업종 | 사무실_전화번호
업종 | 1.000 | 0.469
사무실_전화번호 | 0.469 | 1.000
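
The extracted text does not name the coefficient behind each matrix. For two categorical variables, one common association measure is Cramér's V, computed from the chi-squared statistic of their contingency table. The sketch below implements the plain (uncorrected) version on toy tables; it is offered as background on how such a value can arise, not as the method that produced 0.794 or 0.469.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy tables: perfect association and complete independence.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
print(perfect, independent)
```

Values range from 0 (no association) to 1 (each category of one variable determines the other).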

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

Index | 사업장_상호명 | 업종 | 사업장_소재지 | 사무실_전화번호
0 | 인천배낚시 | 유선/면허 | 인천광역시 중구 항동7가60번지 | 개인번호
1 | 금강스타유선 | 유선/면허 | 인천광역시 중구 항동7가60번지 | 개인번호
2 | 현주바다낚시 | 유선/면허 | 인천광역시 중구 항동7가60번지 | 개인번호
3 | (주)푸른 | 유선/면허 | 인천광역시 중구 항동7가60번지 | 개인번호
4 | 현대마린개발 | 유선/면허 | 인천광역시 중구 항동7가60번지 | 개인번호
5 | 국제유선 | 유선/면허 | 인천광역시 중구 항동7가100번지 | 개인번호
6 | 신나라유선 | 유선/면허 | 인천광역시 중구 항동7가100번지 | 개인번호
7 | 연안유선 | 유선/면허 | 인천광역시 중구 항동7가100번지 | 개인번호
8 | 킹콩낚시 | 유선/면허 | 인천광역시 중구 항동7가100번지 | 032-202-8965
9 | 하나유선 | 유선/면허 | 인천광역시 중구 항동7가100번지 | 개인번호

Index | 사업장_상호명 | 업종 | 사업장_소재지 | 사무실_전화번호
169 | 제이에스에이 | 유선/면허 | 서귀포시 서홍동707-5번지(서귀항) | 064-733-1874
170 | 파라다이스 | 유선/면허 | 서귀포시 서홍동707-5번지(서귀항) | 064-732-1717
171 | 호반호텔앤리조트 | 유선/면허 | 서귀포시 색달동 2950-5(성천포구) | 064-738-2111
172 | 제이엠 | 유선/면허 | 서귀포시 대포동 2184-1 | 064-739-7776
173 | 그린크루즈 | 유선/면허 | 서귀포시 안덕면 화순리636-15 | 064-792-1188
174 | 제주씨월드 | 유선/면하 | 서귀포시 성산읍 성산리347-5번지 | 064-784-2337
175 | 제주해양관광 | 유선/면허 | 서귀포시 성산읍 성산리347-5번지 | 064-784-6163
176 | 우도해운 | 도선/면허 | 제주시 우도면 서광리1401-3 | 064-782-5671
177 | 우림해운 | 도선/면허 | 제주시 우도면 서광리2395-5 | 064-784-2335
178 | 우도랜드 | 도선/면허 | 제주시 우도면 연평리1734-14 | 064-782-4210