Overview

Dataset statistics

Number of variables: 9
Number of observations: 3556
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1086
Duplicate rows (%): 30.5%
Total size in memory: 257.1 KiB
Average record size in memory: 74.0 B
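
The two derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming that 1 KiB is 1024 bytes and that the report rounds to one decimal place; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_rows = 3556
n_duplicate_rows = 1086
total_size_kib = 257.1

# Assumption: percentages and sizes are rounded to one decimal place.
duplicate_pct = round(100 * n_duplicate_rows / n_rows, 1)
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(f"Duplicate rows: {duplicate_pct}%")
print(f"Average record size: {avg_record_bytes} B")
```

Both values agree with the reported 30.5% and 74.0 B.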

Variable types

Categorical: 6
Text: 1
Numeric: 2

Dataset

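Description: Ferry fare data for Sinan-gun, Jeollanam-do: vessel information by shipping company, seat class, and, for each origin and destination, general adult and child fares and island (도서) adult and child fares.
Author: Sinan-gun, Jeollanam-do (전라남도 신안군)
URL: https://www.data.go.kr/data/15091724/fileData.do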

Alerts

Dataset has 1086 (30.5%) duplicate rows [Duplicates]
선박 is highly overall correlated with 선사 and 1 other field [High correlation]
선사 is highly overall correlated with 선박 and 1 other field [High correlation]
일반소아 is highly overall correlated with 도서소아 and 2 other fields [High correlation]
도서소아 is highly overall correlated with 일반소아 and 2 other fields [High correlation]
좌석등급 is highly overall correlated with 선사 and 1 other field [High correlation]
출발지 is highly overall correlated with 도서대인 [High correlation]
일반대인 is highly overall correlated with 일반소아 and 2 other fields [High correlation]
도서대인 is highly overall correlated with 일반소아 and 3 other fields [High correlation]
일반대인 is highly imbalanced (62.4%) [Imbalance]
도서대인 is highly imbalanced (59.9%) [Imbalance]
일반소아 has 297 (8.4%) zeros [Zeros]
도서소아 has 297 (8.4%) zeros [Zeros]
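
The Imbalance alerts report a score per categorical column. The sketch below shows one way such a score can be computed, assuming it is one minus the normalized Shannon entropy of the value counts and that a column is flagged above 0.5. Both the formula and the threshold are assumptions about the profiler, not something this report states. It uses the 출발지 counts because that column's five categories are all listed in the report.

```python
from math import log2


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)


# 출발지 value counts as listed in this report.
origin_counts = [2274, 736, 473, 72, 1]
score = imbalance_score(origin_counts)

ASSUMED_THRESHOLD = 0.5  # assumption, not taken from the report
print(f"{score:.3f}", "flagged" if score > ASSUMED_THRESHOLD else "not flagged")
```

Under these assumptions 출발지 scores roughly 0.40, below the assumed threshold, which is consistent with it having no Imbalance alert. The scores for 일반대인 and 도서대인 cannot be recomputed this way because the report lists only their ten most common values.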

Reproduction

Analysis started: 2023-12-12 06:27:30.241129
Analysis finished: 2023-12-12 06:27:31.562031
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
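
The reported duration follows directly from the two timestamps. A minimal sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 06:27:30.241129")
finished = datetime.fromisoformat("2023-12-12 06:27:31.562031")

duration_s = (finished - started).total_seconds()
print(round(duration_s, 2))
```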

Variables

선사 (shipping company)
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
(주)남해고속 | 2048
(주)동양고속훼리 | 656
주식회사 동양훼리 | 492
(주)해광운수 | 167
합자회사 대흥 | 69
Other values (5) | 124

Length

Max length: 11
Median length: 7
Mean length: 7.7097863
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (유)신안해운
2nd row: (유)신안해운
3rd row: (유)신안해운
4th row: (유)신안해운
5th row: (유)신안해운

Common Values

Value | Count | Frequency (%)
(주)남해고속 | 2048 | 57.6%
(주)동양고속훼리 | 656 | 18.4%
주식회사 동양훼리 | 492 | 13.8%
(주)해광운수 | 167 | 4.7%
합자회사 대흥 | 69 | 1.9%
남신안농협철부선사업소 | 55 | 1.5%
(유)해진해운 | 52 | 1.5%
(주)조양운수 | 9 | 0.3%
(유)신안해운 | 6 | 0.2%
(유)해진해운(화물) | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
주)남해고속 | 2048 | 49.7%
주)동양고속훼리 | 656 | 15.9%
주식회사 | 492 | 12.0%
동양훼리 | 492 | 12.0%
주)해광운수 | 167 | 4.1%
합자회사 | 69 | 1.7%
대흥 | 69 | 1.7%
남신안농협철부선사업소 | 55 | 1.3%
유)해진해운 | 52 | 1.3%
주)조양운수 | 9 | 0.2%
Other values (2) | 8 | 0.2%
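
The word counts above differ from the category counts: names containing a space contribute one entry per word, and the leading parenthesis is gone. The sketch below reproduces that pattern under the assumption that values are lowercased, split on whitespace, and stripped of ASCII punctuation at both ends. This is an inference from the table, not a confirmed description of the profiler, and `tokenize` is a hypothetical helper.

```python
import string
from collections import Counter

# 선사 value counts copied from the Common Values table.
company_counts = {
    "(주)남해고속": 2048,
    "(주)동양고속훼리": 656,
    "주식회사 동양훼리": 492,
    "(주)해광운수": 167,
    "합자회사 대흥": 69,
    "남신안농협철부선사업소": 55,
    "(유)해진해운": 52,
    "(주)조양운수": 9,
    "(유)신안해운": 6,
    "(유)해진해운(화물)": 2,
}


def tokenize(value):
    """Hypothetical tokenizer: lowercase, split on whitespace, trim punctuation."""
    tokens = (t.strip(string.punctuation) for t in value.lower().split())
    return [t for t in tokens if t]


word_counts = Counter()
for value, count in company_counts.items():
    for token in tokenize(value):
        word_counts[token] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(4):
    print(word, count, f"{100 * count / total_words:.1f}%")
```

With these assumptions the total is 4117 words, and 2048 / 4117 gives the 49.7% shown for the most frequent entry.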

선박 (vessel)
Categorical

HIGH CORRELATION

Distinct: 34
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
남 해 프 린 스 | 564
남 해 퀸 | 376
남 해 엔 젤 | 356
뉴-골드스타 | 328
유토피아 | 328
Other values (29) | 1604

Length

Max length: 10
Median length: 9
Mean length: 6.151856
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 신안페리1호
2nd row: 신안페리1호
3rd row: 신안페리1호
4th row: 신안페리2호
5th row: 신안페리2호

Common Values

Value | Count | Frequency (%)
남 해 프 린 스 | 564 | 15.9%
남 해 퀸 | 376 | 10.6%
남 해 엔 젤 | 356 | 10.0%
뉴-골드스타 | 328 | 9.2%
유토피아 | 328 | 9.2%
동양골드 | 328 | 9.2%
뉴엔젤호 | 188 | 5.3%
뉴돌핀호 | 188 | 5.3%
뉴 퀸 | 188 | 5.3%
핑크돌핀호 | 188 | 5.3%
Other values (24) | 524 | 14.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
남 | 1296 | 16.4%
해 | 1296 | 16.4%
프 | 564 | 7.1%
린 | 564 | 7.1%
스 | 564 | 7.1%
퀸 | 564 | 7.1%
엔 | 356 | 4.5%
젤 | 356 | 4.5%
유토피아 | 328 | 4.2%
동양골드 | 328 | 4.2%
Other values (31) | 1680 | 21.3%

좌석등급 (seat class)
Categorical

HIGH CORRELATION

Distinct: 21
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
1층 | 899
2층 | 899
2층의자 | 328
1층의자 | 328
2층VIP2 | 188
Other values (16) | 914

Length

Max length: 7
Median length: 2
Mean length: 3.2438133
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3등객실
2nd row: 3등객실
3rd row: 3등객실
4th row: 3등객실
5th row: 3등객실

Common Values

Value | Count | Frequency (%)
1층 | 899 | 25.3%
2층 | 899 | 25.3%
2층의자 | 328 | 9.2%
1층의자 | 328 | 9.2%
2층VIP2 | 188 | 5.3%
2층VIP1 | 188 | 5.3%
일반실 | 176 | 4.9%
2층VIP3 | 94 | 2.6%
2층VIP4 | 94 | 2.6%
2층VIP | 89 | 2.5%
Other values (11) | 273 | 7.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
1층 | 899 | 25.3%
2층 | 899 | 25.3%
2층의자 | 328 | 9.2%
1층의자 | 328 | 9.2%
2층vip2 | 188 | 5.3%
2층vip1 | 188 | 5.3%
일반실 | 176 | 4.9%
2층vip3 | 94 | 2.6%
2층vip4 | 94 | 2.6%
1층석 | 89 | 2.5%
Other values (11) | 273 | 7.7%

출발지 (origin)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
목포 | 2274
홍도 | 736
대흑산도 | 473
비금도 | 72
해남우수영 | 1

Length

Max length: 5
Median length: 2
Mean length: 2.2871204
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 목포
2nd row: 목포
3rd row: 목포
4th row: 목포
5th row: 목포

Common Values

Value | Count | Frequency (%)
목포 | 2274 | 63.9%
홍도 | 736 | 20.7%
대흑산도 | 473 | 13.3%
비금도 | 72 | 2.0%
해남우수영 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
목포 | 2274 | 63.9%
홍도 | 736 | 20.7%
대흑산도 | 473 | 13.3%
비금도 | 72 | 2.0%
해남우수영 | 1 | < 0.1%
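
Because all five 출발지 categories are listed, the length statistics for this column can be recomputed from the value counts. A minimal sketch, assuming length is measured in characters with each precomposed Hangul syllable counting as one:

```python
# 출발지 value counts copied from the Common Values table.
origin_counts = {"목포": 2274, "홍도": 736, "대흑산도": 473, "비금도": 72, "해남우수영": 1}

n_rows = sum(origin_counts.values())
total_chars = sum(len(value) * count for value, count in origin_counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 7))
print(min(map(len, origin_counts)), max(map(len, origin_counts)))
```

The result matches the reported mean length of 2.2871204, with a minimum of 2 and a maximum of 5 characters.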

도착지 (destination)
Text

Distinct: 73
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.5247469
Min length: 2

Characters and Unicode

Total characters: 12534
Distinct characters: 94
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
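
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below classifies a few 도착지 values that appear in this report; the `CATEGORY_NAMES` mapping is added here only to translate the two-letter codes into the labels used by the report.

```python
import unicodedata
from collections import Counter

# A few destination values that appear elsewhere in this report.
samples = ["마진도", "장산도", "상태_신의", "비금-도초", "소흑산_가거도"]

# Readable labels for the general categories seen in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in category_counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Hangul syllables fall under Other Letter, the underscore under Connector Punctuation, and the hyphen under Dash Punctuation, which are three of the four categories the report lists for this column.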

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 마진도
2nd row: 장산도
3rd row: 상태_신의
4th row: 상태_신의
5th row: 장산도

Words

Value | Count | Frequency (%)
대흑산도 | 408 | 11.5%
홍도 | 332 | 9.3%
다물도 | 306 | 8.6%
비금-도초 | 296 | 8.3%
소흑산_가거도 | 274 | 7.7%
도초도 | 273 | 7.7%
만재도 | 234 | 6.6%
하태도 | 234 | 6.6%
중태도 | 234 | 6.6%
상태도 | 234 | 6.6%
Other values (63) | 731 | 20.6%

Most occurring characters

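Value | Count | Frequency (%)
(unreadable) | 3524 | 28.1%
(unreadable) | 734 | 5.9%
(unreadable) | 722 | 5.8%
(unreadable) | 682 | 5.4%
(unreadable) | 572 | 4.6%
(unreadable) | 515 | 4.1%
(unreadable) | 495 | 3.9%
(unreadable) | 420 | 3.4%
_ | 403 | 3.2%
(unreadable) | 332 | 2.6%
Other values (84) | 4135 | 33.0%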

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 11825 | 94.3%
Connector Punctuation | 403 | 3.2%
Dash Punctuation | 300 | 2.4%
Decimal Number | 6 | < 0.1%

Most frequent character per category

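Other Letter
Value | Count | Frequency (%)
(unreadable) | 3524 | 29.8%
(unreadable) | 734 | 6.2%
(unreadable) | 722 | 6.1%
(unreadable) | 682 | 5.8%
(unreadable) | 572 | 4.8%
(unreadable) | 515 | 4.4%
(unreadable) | 495 | 4.2%
(unreadable) | 420 | 3.6%
(unreadable) | 332 | 2.8%
(unreadable) | 306 | 2.6%
Other values (80) | 3523 | 29.8%

Decimal Number
Value | Count | Frequency (%)
1 | 3 | 50.0%
2 | 3 | 50.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 403 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 300 | 100.0%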

Most occurring scripts

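Value | Count | Frequency (%)
Hangul | 11825 | 94.3%
Common | 709 | 5.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(unreadable) | 3524 | 29.8%
(unreadable) | 734 | 6.2%
(unreadable) | 722 | 6.1%
(unreadable) | 682 | 5.8%
(unreadable) | 572 | 4.8%
(unreadable) | 515 | 4.4%
(unreadable) | 495 | 4.2%
(unreadable) | 420 | 3.6%
(unreadable) | 332 | 2.8%
(unreadable) | 306 | 2.6%
Other values (80) | 3523 | 29.8%

Common
Value | Count | Frequency (%)
_ | 403 | 56.8%
- | 300 | 42.3%
1 | 3 | 0.4%
2 | 3 | 0.4%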

Most occurring blocks

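Value | Count | Frequency (%)
Hangul | 11825 | 94.3%
ASCII | 709 | 5.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(unreadable) | 3524 | 29.8%
(unreadable) | 734 | 6.2%
(unreadable) | 722 | 6.1%
(unreadable) | 682 | 5.8%
(unreadable) | 572 | 4.8%
(unreadable) | 515 | 4.4%
(unreadable) | 495 | 4.2%
(unreadable) | 420 | 3.6%
(unreadable) | 332 | 2.8%
(unreadable) | 306 | 2.6%
Other values (80) | 3523 | 29.8%

ASCII
Value | Count | Frequency (%)
_ | 403 | 56.8%
- | 300 | 42.3%
1 | 3 | 0.4%
2 | 3 | 0.4%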

일반대인 (general adult fare)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 21
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
1,500 | 2599
0 | 297
900 | 241
1,250 | 108
1,200 | 70
Other values (16) | 241

Length

Max length: 5
Median length: 5
Mean length: 4.4454443
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 650
2nd row: 650
3rd row: 950
4th row: 950
5th row: 650

Common Values

Value | Count | Frequency (%)
1,500 | 2599 | 73.1%
0 | 297 | 8.4%
900 | 241 | 6.8%
1,250 | 108 | 3.0%
1,200 | 70 | 2.0%
1,000 | 65 | 1.8%
650 | 35 | 1.0%
950 | 22 | 0.6%
800 | 21 | 0.6%
500 | 19 | 0.5%
Other values (11) | 79 | 2.2%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
1,500 | 2599 | 73.1%
0 | 297 | 8.4%
900 | 241 | 6.8%
1,250 | 108 | 3.0%
1,200 | 70 | 2.0%
1,000 | 65 | 1.8%
650 | 35 | 1.0%
950 | 22 | 0.6%
800 | 21 | 0.6%
500 | 19 | 0.5%
Other values (11) | 79 | 2.2%

일반소아 (general child fare)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 533.80202
Minimum: 0
Maximum: 750
Zeros: 297
Zeros (%): 8.4%
Negative: 0
Negative (%): 0.0%
Memory size: 31.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 350
Median: 700
Q3: 750
95-th percentile: 750
Maximum: 750
Range: 750
Interquartile range (IQR): 400

Descriptive statistics

Standard deviation: 249.68447
Coefficient of variation (CV): 0.46774732
Kurtosis: -0.61761589
Mean: 533.80202
Median Absolute Deviation (MAD): 50
Skewness: -0.84633076
Sum: 1898200
Variance: 62342.334
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=13)

Common Values

Value | Count | Frequency (%)
750 | 1393 | 39.2%
700 | 391 | 11.0%
200 | 366 | 10.3%
400 | 348 | 9.8%
0 | 297 | 8.4%
600 | 224 | 6.3%
550 | 223 | 6.3%
350 | 91 | 2.6%
300 | 76 | 2.1%
250 | 69 | 1.9%
Other values (3) | 78 | 2.2%

Minimum 10 values

Value | Count | Frequency (%)
0 | 297 | 8.4%
200 | 366 | 10.3%
250 | 69 | 1.9%
300 | 76 | 2.1%
350 | 91 | 2.6%
400 | 348 | 9.8%
450 | 21 | 0.6%
500 | 54 | 1.5%
550 | 223 | 6.3%
600 | 224 | 6.3%

Maximum 10 values

Value | Count | Frequency (%)
750 | 1393 | 39.2%
700 | 391 | 11.0%
650 | 3 | 0.1%
600 | 224 | 6.3%
550 | 223 | 6.3%
500 | 54 | 1.5%
450 | 21 | 0.6%
400 | 348 | 9.8%
350 | 91 | 2.6%
300 | 76 | 2.1%
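
The value tables for 일반소아 together list all 13 distinct fares, so the descriptive statistics can be recomputed from the frequency table alone. The following is a minimal sketch using the standard `statistics` module, assuming the report uses the sample (n - 1) variance:

```python
import statistics

# Full frequency table for 일반소아, assembled from the value tables above.
child_fare_counts = {
    750: 1393, 700: 391, 200: 366, 400: 348, 0: 297, 600: 224, 550: 223,
    350: 91, 300: 76, 250: 69, 500: 54, 450: 21, 650: 3,
}

# Expand the table back into one value per observation.
values = [fare for fare, count in child_fare_counts.items() for _ in range(count)]

n = len(values)
total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

print(n, total, round(mean, 5), median)
print(round(variance, 3), round(std, 5), round(cv, 8))
```

The recomputed sum (1898200), mean, median, variance, standard deviation, and CV agree with the figures in the report, which also confirms that the three "Other values" are 450, 500, and 650.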

도서대인 (island adult fare)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 14
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 27.9 KiB

Value | Count
750 | 2643
0 | 297
400 | 256
700 | 78
500 | 74
Other values (9) | 208

Length

Max length: 5
Median length: 3
Mean length: 2.8335208
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 300
2nd row: 300
3rd row: 400
4th row: 400
5th row: 300

Common Values

Value | Count | Frequency (%)
750 | 2643 | 74.3%
0 | 297 | 8.4%
400 | 256 | 7.2%
700 | 78 | 2.2%
500 | 74 | 2.1%
600 | 50 | 1.4%
300 | 42 | 1.2%
200 | 35 | 1.0%
350 | 32 | 0.9%
450 | 24 | 0.7%
Other values (4) | 25 | 0.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
750 | 2643 | 74.3%
0 | 297 | 8.4%
400 | 256 | 7.2%
700 | 78 | 2.2%
500 | 74 | 2.1%
600 | 50 | 1.4%
300 | 42 | 1.2%
200 | 35 | 1.0%
350 | 32 | 0.9%
450 | 24 | 0.7%
Other values (4) | 25 | 0.7%

도서소아 (island child fare)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 14
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 445.71147
Minimum: 0
Maximum: 750
Zeros: 297
Zeros (%): 8.4%
Negative: 0
Negative (%): 0.0%
Memory size: 31.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 350
Median: 350
Q3: 650
95-th percentile: 750
Maximum: 750
Range: 750
Interquartile range (IQR): 300

Descriptive statistics

Standard deviation: 233.33987
Coefficient of variation (CV): 0.52352225
Kurtosis: -0.99962554
Mean: 445.71147
Median Absolute Deviation (MAD): 200
Skewness: -0.16547614
Sum: 1584950
Variance: 54447.496
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=14)

Common Values

Value | Count | Frequency (%)
350 | 1147 | 32.3%
750 | 802 | 22.6%
200 | 390 | 11.0%
650 | 390 | 11.0%
0 | 297 | 8.4%
550 | 216 | 6.1%
600 | 108 | 3.0%
250 | 81 | 2.3%
300 | 59 | 1.7%
500 | 39 | 1.1%
Other values (4) | 27 | 0.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 297 | 8.4%
150 | 6 | 0.2%
200 | 390 | 11.0%
250 | 81 | 2.3%
300 | 59 | 1.7%
325 | 2 | 0.1%
350 | 1147 | 32.3%
400 | 13 | 0.4%
450 | 6 | 0.2%
500 | 39 | 1.1%

Maximum 10 values

Value | Count | Frequency (%)
750 | 802 | 22.6%
650 | 390 | 11.0%
600 | 108 | 3.0%
550 | 216 | 6.1%
500 | 39 | 1.1%
450 | 6 | 0.2%
400 | 13 | 0.4%
350 | 1147 | 32.3%
325 | 2 | 0.1%
300 | 59 | 1.7%
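
The quantile statistics for 도서소아 can likewise be recovered from its 14 listed fares. This sketch assumes linear interpolation between order statistics (`method="inclusive"` in the standard library) and a MAD defined as the median of absolute deviations from the median. For this column every quartile falls inside a run of identical values, so the choice of interpolation method does not change the result.

```python
import statistics

# Full frequency table for 도서소아, assembled from the value tables above.
island_child_counts = {
    350: 1147, 750: 802, 200: 390, 650: 390, 0: 297, 550: 216, 600: 108,
    250: 81, 300: 59, 500: 39, 400: 13, 150: 6, 450: 6, 325: 2,
}

values = sorted(
    fare for fare, count in island_child_counts.items() for _ in range(count)
)

# Quartiles with linear interpolation between neighbouring order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values))
print(q1, median, q3, iqr, mad)
```

The quartiles (350, 350, 650), the IQR of 300, the MAD of 200, and the sum of 1584950 all match the report.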

Interactions

[Figures: four interaction plots for the numeric variables 일반소아 and 도서소아]

Correlations

Variable | 선사 | 선박 | 좌석등급 | 출발지 | 도착지 | 일반대인 | 일반소아 | 도서대인 | 도서소아
선사 | 1.000 | 0.988 | 0.936 | 0.320 | 0.901 | 0.768 | 0.434 | 0.615 | 0.592
선박 | 0.988 | 1.000 | 0.962 | 0.514 | 0.878 | 0.762 | 0.535 | 0.729 | 0.573
좌석등급 | 0.936 | 0.962 | 1.000 | 0.293 | 0.798 | 0.707 | 0.429 | 0.612 | 0.468
출발지 | 0.320 | 0.514 | 0.293 | 1.000 | 0.627 | 0.538 | 0.658 | 0.821 | 0.783
도착지 | 0.901 | 0.878 | 0.798 | 0.627 | 1.000 | 0.985 | 0.883 | 0.941 | 0.890
일반대인 | 0.768 | 0.762 | 0.707 | 0.538 | 0.985 | 1.000 | 0.903 | 0.961 | 0.878
일반소아 | 0.434 | 0.535 | 0.429 | 0.658 | 0.883 | 0.903 | 1.000 | 0.898 | 0.919
도서대인 | 0.615 | 0.729 | 0.612 | 0.821 | 0.941 | 0.961 | 0.898 | 1.000 | 0.865
도서소아 | 0.592 | 0.573 | 0.468 | 0.783 | 0.890 | 0.878 | 0.919 | 0.865 | 1.000

Variable | 출발지 | 선박 | 좌석등급 | 선사 | 일반대인 | 도서대인
출발지 | 1.000 | 0.268 | 0.148 | 0.138 | 0.298 | 0.607
선박 | 0.268 | 1.000 | 0.658 | 0.908 | 0.291 | 0.311
좌석등급 | 0.148 | 0.658 | 1.000 | 0.711 | 0.215 | 0.234
선사 | 0.138 | 0.908 | 0.711 | 1.000 | 0.411 | 0.303
일반대인 | 0.298 | 0.291 | 0.215 | 0.411 | 1.000 | 0.721
도서대인 | 0.607 | 0.311 | 0.234 | 0.303 | 0.721 | 1.000

Variable | 일반소아 | 도서소아 | 선사 | 선박 | 좌석등급 | 출발지 | 일반대인 | 도서대인
일반소아 | 1.000 | 0.758 | 0.214 | 0.228 | 0.180 | 0.456 | 0.640 | 0.673
도서소아 | 0.758 | 1.000 | 0.218 | 0.240 | 0.192 | 0.438 | 0.571 | 0.586
선사 | 0.214 | 0.218 | 1.000 | 0.908 | 0.711 | 0.138 | 0.411 | 0.303
선박 | 0.228 | 0.240 | 0.908 | 1.000 | 0.658 | 0.268 | 0.291 | 0.311
좌석등급 | 0.180 | 0.192 | 0.711 | 0.658 | 1.000 | 0.148 | 0.215 | 0.234
출발지 | 0.456 | 0.438 | 0.138 | 0.268 | 0.148 | 1.000 | 0.298 | 0.607
일반대인 | 0.640 | 0.571 | 0.411 | 0.291 | 0.215 | 0.298 | 1.000 | 0.721
도서대인 | 0.673 | 0.586 | 0.303 | 0.311 | 0.234 | 0.607 | 0.721 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

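First rows

Index | 선사 | 선박 | 좌석등급 | 출발지 | 도착지 | 일반대인 | 일반소아 | 도서대인 | 도서소아
0 | (유)신안해운 | 신안페리1호 | 3등객실 | 목포 | 마진도 | 650 | 200 | 300 | 200
1 | (유)신안해운 | 신안페리1호 | 3등객실 | 목포 | 장산도 | 650 | 200 | 300 | 200
2 | (유)신안해운 | 신안페리1호 | 3등객실 | 목포 | 상태_신의 | 950 | 200 | 400 | 200
3 | (유)신안해운 | 신안페리2호 | 3등객실 | 목포 | 상태_신의 | 950 | 200 | 400 | 200
4 | (유)신안해운 | 신안페리2호 | 3등객실 | 목포 | 장산도 | 650 | 200 | 300 | 200
5 | (유)신안해운 | 신안페리2호 | 3등객실 | 목포 | 마진도 | 650 | 200 | 300 | 200
6 | (유)해진해운 | 뉴드림호 | 3등객실 | 대흑산도 | 도초도 | 1,300 | 300 | 500 | 250
7 | (유)해진해운 | 뉴드림호 | 3등객실 | 대흑산도 | 송공 | 1,500 | 500 | 750 | 400
8 | (유)해진해운 | 뉴드림호 | 3등객실 | 대흑산도 | 해남우수영 | 1,500 | 500 | 750 | 400
9 | (유)해진해운 | 뉴드림호 | 3등객실 | 해남우수영 | 대흑산도 | 1,000 | 500 | 1,000 | 500

Last rows

Index | 선사 | 선박 | 좌석등급 | 출발지 | 도착지 | 일반대인 | 일반소아 | 도서대인 | 도서소아
3546 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 수치도 | 800 | 200 | 350 | 200
3547 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 사치도 | 650 | 200 | 300 | 200
3548 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 도초도 | 900 | 200 | 400 | 200
3549 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 비금_수대 | 900 | 200 | 400 | 200
3550 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 비금_가산 | 800 | 200 | 350 | 200
3551 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 암태도 | 0 | 0 | 0 | 0
3552 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 대흑산도 | 1,500 | 500 | 750 | 400
3553 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 안좌_읍동 | 0 | 0 | 0 | 0
3554 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 비금_가산 | 0 | 0 | 0 | 0
3555 | 합자회사 대흥 | 비금농협고속페리2호 | 등급없음 | 목포 | 도초도 | 900 | 200 | 400 | 200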

Duplicate rows

Most frequently occurring

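Index | 선사 | 선박 | 좌석등급 | 출발지 | 도착지 | 일반대인 | 일반소아 | 도서대인 | 도서소아 | # duplicates
6 | (주)남해고속 | 남 해 엔 젤 | 1층 | 목포 | 대흑산도 | 1,500 | 700 | 750 | 650 | 6
22 | (주)남해고속 | 남 해 엔 젤 | 1층 | 목포 | 홍도 | 1,500 | 750 | 750 | 750 | 6
36 | (주)남해고속 | 남 해 엔 젤 | 1층석 | 목포 | 대흑산도 | 1,500 | 700 | 750 | 650 | 6
52 | (주)남해고속 | 남 해 엔 젤 | 1층석 | 목포 | 홍도 | 1,500 | 750 | 750 | 750 | 6
66 | (주)남해고속 | 남 해 엔 젤 | 2층 | 목포 | 대흑산도 | 1,500 | 700 | 750 | 650 | 6
82 | (주)남해고속 | 남 해 엔 젤 | 2층 | 목포 | 홍도 | 1,500 | 750 | 750 | 750 | 6
96 | (주)남해고속 | 남 해 엔 젤 | 2층VIP | 목포 | 대흑산도 | 1,500 | 700 | 750 | 650 | 6
112 | (주)남해고속 | 남 해 엔 젤 | 2층VIP | 목포 | 홍도 | 1,500 | 750 | 750 | 750 | 6
126 | (주)남해고속 | 남 해 퀸 | 1층 | 목포 | 대흑산도 | 1,500 | 700 | 750 | 650 | 6
142 | (주)남해고속 | 남 해 퀸 | 1층 | 목포 | 홍도 | 1,500 | 750 | 750 | 750 | 6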
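
Duplicate rows are rows whose values match in every column. As a small illustration, the sketch below counts exact duplicates among the last ten rows of the dataset (indices 3546 to 3555 in the Sample section). It is not the profiler's own implementation, and the names `SHARED` and `last_rows` are illustrative. The categorical fare columns are kept as strings because the report treats them as categories.

```python
from collections import Counter

# Columns shared by all ten rows: 선사, 선박, 좌석등급, 출발지.
SHARED = ("합자회사 대흥", "비금농협고속페리2호", "등급없음", "목포")

# Remaining columns: 도착지, 일반대인, 일반소아, 도서대인, 도서소아.
last_rows = [
    SHARED + ("수치도", "800", 200, "350", 200),
    SHARED + ("사치도", "650", 200, "300", 200),
    SHARED + ("도초도", "900", 200, "400", 200),
    SHARED + ("비금_수대", "900", 200, "400", 200),
    SHARED + ("비금_가산", "800", 200, "350", 200),
    SHARED + ("암태도", "0", 0, "0", 0),
    SHARED + ("대흑산도", "1,500", 500, "750", 400),
    SHARED + ("안좌_읍동", "0", 0, "0", 0),
    SHARED + ("비금_가산", "0", 0, "0", 0),
    SHARED + ("도초도", "900", 200, "400", 200),
]

# A row is duplicated when its full tuple of values occurs more than once.
row_counts = Counter(last_rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(n, row)
```

Only the 도초도 row (indices 3548 and 3555) repeats. The two 비금_가산 rows differ in their fares, so they do not count as duplicates.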