Overview

Dataset statistics

Number of variables: 4
Number of observations: 163
Missing cells: 169
Missing cells (%): 25.9%
Duplicate rows: 30
Duplicate rows (%): 18.4%
Total size in memory: 5.2 KiB
Average record size in memory: 32.8 B
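The two percentages above follow directly from the raw counts. A minimal check, assuming the per-column missing counts reported under Variables (0, 18, 0 and 151) and that the missing-cell share is missing cells divided by rows x columns:

```python
# Recompute the headline percentages from the counts reported in this profile.
n_vars, n_obs = 4, 163
# Per-column missing counts, in column order, taken from the Variables section.
missing_per_column = {"시외버스 시간표": 0, "Unnamed: 1": 18, "Unnamed: 2": 0, "Unnamed: 3": 151}
duplicate_rows = 30

missing_cells = sum(missing_per_column.values())        # 169
missing_pct = 100 * missing_cells / (n_vars * n_obs)    # share of all 652 cells
duplicate_pct = 100 * duplicate_rows / n_obs            # share of all 163 rows

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Duplicate rows: {duplicate_rows} ({duplicate_pct:.1f}%)")
```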

Variable types

Categorical: 2
Unsupported: 1
Text: 1

Dataset

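Description: Provides the current status of intercity bus services in Hongseong County (홍성군 시외버스 현황 제공)
Author: Hongseong County, Chungcheongnam-do (충청남도 홍성군)
URL: https://www.data.go.kr/data/3073599/fileData.do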

Alerts

Dataset has 30 (18.4%) duplicate rows [Duplicates]
Unnamed: 1 has 18 (11.0%) missing values [Missing]
Unnamed: 3 has 151 (92.6%) missing values [Missing]
Unnamed: 1 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-12-13 00:41:07.031603
Analysis finished: 2023-12-13 00:41:07.378855
Duration: 0.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
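The reported duration is the difference between the two timestamps, rounded to two decimals. For example, with Python's standard `datetime` module (timestamps copied from the lines above):

```python
from datetime import datetime

# Start and finish timestamps as printed in the Reproduction block.
started = datetime.fromisoformat("2023-12-13 00:41:07.031603")
finished = datetime.fromisoformat("2023-12-13 00:41:07.378855")

elapsed = (finished - started).total_seconds()  # 0.347252 seconds
print(f"Duration: {elapsed:.2f} seconds")
```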

Variables

시외버스 시간표 (intercity bus timetable)
Categorical

Distinct: 36
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

천안: 26
보령: 18
태안: 17
홍성: 15
서산: 15
Other values (31): 72

Length

Max length: 17
Median length: 2
Mean length: 3.0613497
Min length: 2

Unique

Unique: 16
Unique (%): 9.8%
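Distinct, Unique and the length statistics can be recomputed with pandas. The sketch below assumes the convention the report appears to use, namely that both percentages are taken over non-missing values (for Unnamed: 3, 6 distinct values out of 12 non-missing ones gives the reported 50.0%). `categorical_summary` is a hypothetical helper, not part of ydata-profiling.

```python
import pandas as pd

def categorical_summary(s: pd.Series) -> dict:
    """Hypothetical helper: profile-style statistics for a text/categorical column."""
    values = s.dropna().astype(str)       # percentages below use non-missing values only
    n = len(values)
    if n == 0:
        raise ValueError("column has no non-missing values")
    counts = values.value_counts()
    n_unique = int((counts == 1).sum())   # values that occur exactly once
    lengths = values.str.len()            # length in characters
    return {
        "distinct": int(counts.size),
        "distinct_pct": 100 * counts.size / n,
        "unique": n_unique,
        "unique_pct": 100 * n_unique / n,
        "max_length": int(lengths.max()),
        "median_length": float(lengths.median()),
        "mean_length": float(lengths.mean()),
        "min_length": int(lengths.min()),
    }

print(categorical_summary(pd.Series(["비고", "비고", "직통", "신양 I.C", None])))
```

For 시외버스 시간표, which has no missing values, this convention gives 36 / 163 = 22.1% distinct and 16 / 163 = 9.8% unique, matching the figures above.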

Sample

1st row: <NA>
2nd row: 천안직행 (충37,금12,한1)
3rd row: 천안직행 시외버스 운행안내
4th row: 시발지
5th row: 홍성

Common Values

Value | Count | Frequency (%)
천안 | 26 | 16.0%
보령 | 18 | 11.0%
태안 | 17 | 10.4%
홍성 | 15 | 9.2%
서산 | 15 | 9.2%
<NA> | 6 | 3.7%
공대전 | 6 | 3.7%
시발지 | 6 | 3.7%
동서울 | 5 | 3.1%
청대전 | 5 | 3.1%
Other values (26) | 44 | 27.0%
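In this table each frequency is the count divided by all 163 rows (26 / 163 = 16.0%), and the last row folds the remaining 26 distinct values together. A pandas sketch of the same layout; `common_values` is a hypothetical helper, and treating missing values as a countable value (`dropna=False`) is an assumption.

```python
import pandas as pd

def common_values(s: pd.Series, top: int = 10) -> list:
    """Hypothetical helper: top-N values plus an 'Other values (k)' summary row."""
    n = len(s)                                # frequencies use all rows as denominator
    counts = s.value_counts(dropna=False)     # sorted by count, most frequent first
    rows = [(str(value), int(count), round(100 * int(count) / n, 1))
            for value, count in counts.head(top).items()]
    rest = counts.iloc[top:]
    if len(rest) > 0:
        other = int(rest.sum())
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / n, 1)))
    return rows

for row in common_values(pd.Series(["천안"] * 3 + ["보령"] * 2 + ["태안", "홍성"]), top=2):
    print(row)
```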

Length

[Figure] Histogram of lengths of the category

Words

Value | Count | Frequency (%)
천안 | 26 | 14.3%
보령 | 18 | 9.9%
태안 | 17 | 9.3%
서산 | 15 | 8.2%
홍성 | 15 | 8.2%
운행안내 | 6 | 3.3%
na | 6 | 3.3%
공대전 | 6 | 3.3%
시발지 | 6 | 3.3%
시외버스 | 6 | 3.3%
Other values (31) | 61 | 33.5%
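This word table differs from Common Values because each value is split into lowercase words before counting: the literal text <NA> reappears as na, and the route-title rows contribute 시외버스 and 운행안내. A sketch of such a tokenization, assuming whitespace splitting and trimming of leading/trailing ASCII punctuation; ydata-profiling's own rules may differ (for example in stop-word handling), and `word_counts` is hypothetical.

```python
import string
from collections import Counter

import pandas as pd

def word_counts(s: pd.Series) -> Counter:
    """Hypothetical helper: lowercase, split on whitespace, trim punctuation, count."""
    words = (
        s.dropna()                      # real missing values are skipped
        .astype(str)
        .str.lower()
        .str.split()                    # one list of words per value
        .explode()                      # one row per word
        .dropna()                       # values that produced no words
        .str.strip(string.punctuation)  # "<na>" -> "na"; inner dots as in "i.c" are kept
    )
    return Counter(w for w in words if w)

print(word_counts(pd.Series(["<NA>", "신양 I.C", "산양I.C", "비고", "비고", None])))
```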

Unnamed: 1
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 18
Missing (%): 11.0%
Memory size: 1.4 KiB
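Unnamed: 1 holds departure times. The Sample section at the end of this report suggests why it is unsupported: most entries look like clock times (06:40:00), some carry a suffix such as 11:09(직), and the embedded header row contains the text 시간. A minimal cleaning sketch, assuming entries are either `datetime.time` objects or strings, and assuming the (직) suffix marks a direct service (an interpretation, not stated in the source). `parse_departure` is hypothetical; it could be applied with `df["Unnamed: 1"].map(parse_departure)`.

```python
import datetime as dt
import re

# Assumed formats: "HH:MM" or "HH:MM:SS", optionally followed by "(직)".
TIME_RE = re.compile(r"^\s*(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(\(직\))?\s*$")

def parse_departure(value):
    """Hypothetical helper: return (time or None, is_direct)."""
    if isinstance(value, dt.time):
        return value, False
    if not isinstance(value, str):
        return None, False            # NaN/None and any other type
    m = TIME_RE.match(value)
    if m is None:
        return None, False            # e.g. the embedded header text "시간"
    hour, minute, second = int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)
    if hour > 23 or minute > 59 or second > 59:
        return None, False            # not a valid clock time
    # Assumption: "(직)" marks a direct service.
    return dt.time(hour, minute, second), m.group(4) is not None

for raw in ["06:40:00", "11:09(직)", "시간", float("nan")]:
    print(raw, parse_departure(raw))
```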

Unnamed: 2
Categorical

Distinct: 20
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

천안: 31
서산: 25
보령: 20
<NA>: 18
태안: 13
Other values (15): 56

Length

Max length: 4
Median length: 2
Mean length: 2.3803681
Min length: 2

Unique

Unique: 3
Unique (%): 1.8%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 행선지
5th row: 천안

Common Values

Value | Count | Frequency (%)
천안 | 31 | 19.0%
서산 | 25 | 15.3%
보령 | 20 | 12.3%
<NA> | 18 | 11.0%
태안 | 13 | 8.0%
공대전 | 9 | 5.5%
청대전 | 8 | 4.9%
행선지 | 6 | 3.7%
군산 | 5 | 3.1%
안산 | 5 | 3.1%
Other values (10) | 23 | 14.1%
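Here <NA> appears to be literal text rather than a missing value: the column reports Missing 0, yet <NA> is its fourth most common value, and the frequency table after the length histogram lists it as lowercase na (genuinely missing values would be skipped there). The same pattern holds for 시외버스 시간표 (6 occurrences). If these strings are placeholders, a sketch like the following turns them into real missing values, assuming the exact spelling <NA>; `restore_missing` is hypothetical. Under that assumption the two columns would then show 6 and 18 missing values.

```python
import pandas as pd

def restore_missing(df: pd.DataFrame, token: str = "<NA>") -> pd.DataFrame:
    """Hypothetical helper: treat cells equal to `token` as missing."""
    return df.mask(df.eq(token))   # matching cells become NaN; others are unchanged

raw = pd.DataFrame({
    "시외버스 시간표": ["<NA>", "천안직행 (충37,금12,한1)", "홍성"],
    "Unnamed: 2": ["<NA>", "<NA>", "천안"],
})
print(restore_missing(raw).isna().sum())
```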

Length

[Figure] Histogram of lengths of the category

Words

Value | Count | Frequency (%)
천안 | 31 | 19.0%
서산 | 25 | 15.3%
보령 | 20 | 12.3%
na | 18 | 11.0%
태안 | 13 | 8.0%
공대전 | 9 | 5.5%
청대전 | 8 | 4.9%
행선지 | 6 | 3.7%
군산 | 5 | 3.1%
안산 | 5 | 3.1%
Other values (10) | 23 | 14.1%

Unnamed: 3
Text

MISSING

Distinct: 6
Distinct (%): 50.0%
Missing: 151
Missing (%): 92.6%
Memory size: 1.4 KiB

Length

Max length: 6
Median length: 2
Mean length: 2.5833333
Min length: 2

Characters and Unicode

Total characters: 31
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
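The category counts in this section can be reproduced with Python's `unicodedata` module. The 12 non-missing values used below are an inference from this section's sample, word and character counts (비고 six times, 직통 twice, and 신양 I.C, 산양I.C, 장항 and 서천 once each); together they give the reported 31 characters and 15 distinct characters. Script names are not available in `unicodedata`, so only categories and a simplified two-range block check (`block_of`, hypothetical) are shown.

```python
import unicodedata
from collections import Counter

# Inferred non-missing values of Unnamed: 3 (see lead-in); not read from the source file.
values = ["비고"] * 6 + ["직통"] * 2 + ["신양 I.C", "산양I.C", "장항", "서천"]

CATEGORY_NAMES = {"Lo": "Other Letter", "Lu": "Uppercase Letter",
                  "Po": "Other Punctuation", "Zs": "Space Separator"}

def block_of(ch: str) -> str:
    """Simplified block lookup: ASCII or the Hangul Syllables range only."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7A3:
        return "Hangul"
    return "Other"

chars = [ch for v in values for ch in v]
categories = Counter(CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
                     for ch in chars)
blocks = Counter(block_of(ch) for ch in chars)

print(len(chars), len(set(chars)))
print(categories.most_common())
print(blocks.most_common())
```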

Unique

Unique: 4
Unique (%): 33.3%

Sample

1st row: 비고
2nd row: 신양 I.C
3rd row: 산양I.C
4th row: 비고
5th row: 비고

Words

Value | Count | Frequency (%)
비고 | 6 | 46.2%
직통 | 2 | 15.4%
신양 | 1 | 7.7%
i.c | 1 | 7.7%
산양i.c | 1 | 7.7%
장항 | 1 | 7.7%
서천 | 1 | 7.7%

Most occurring characters

Value | Count | Frequency (%)
비 | 6 | 19.4%
고 | 6 | 19.4%
직 | 2 | 6.5%
통 | 2 | 6.5%
양 | 2 | 6.5%
I | 2 | 6.5%
. | 2 | 6.5%
C | 2 | 6.5%
 | 1 | 3.2%
 | 1 | 3.2%
Other values (5) | 5 | 16.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 24 | 77.4%
Uppercase Letter | 4 | 12.9%
Other Punctuation | 2 | 6.5%
Space Separator | 1 | 3.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
비 | 6 | 25.0%
고 | 6 | 25.0%
직 | 2 | 8.3%
통 | 2 | 8.3%
양 | 2 | 8.3%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%

Uppercase Letter
Value | Count | Frequency (%)
I | 2 | 50.0%
C | 2 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
. | 2 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 24 | 77.4%
Latin | 4 | 12.9%
Common | 3 | 9.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
비 | 6 | 25.0%
고 | 6 | 25.0%
직 | 2 | 8.3%
통 | 2 | 8.3%
양 | 2 | 8.3%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%

Latin
Value | Count | Frequency (%)
I | 2 | 50.0%
C | 2 | 50.0%

Common
Value | Count | Frequency (%)
. | 2 | 66.7%
(space) | 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 24 | 77.4%
ASCII | 7 | 22.6%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
비 | 6 | 25.0%
고 | 6 | 25.0%
직 | 2 | 8.3%
통 | 2 | 8.3%
양 | 2 | 8.3%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%
 | 1 | 4.2%

ASCII
Value | Count | Frequency (%)
I | 2 | 28.6%
. | 2 | 28.6%
C | 2 | 28.6%
(space) | 1 | 14.3%

Correlations

Variable | 시외버스 시간표 | Unnamed: 2 | Unnamed: 3
시외버스 시간표 | 1.000 | 0.854 | 1.000
Unnamed: 2 | 0.854 | 1.000 | 1.000
Unnamed: 3 | 1.000 | 1.000 | 1.000

Variable | Unnamed: 2 | 시외버스 시간표
Unnamed: 2 | 1.000 | 0.424
시외버스 시간표 | 0.424 | 1.000

Variable | 시외버스 시간표 | Unnamed: 2
시외버스 시간표 | 1.000 | 0.424
Unnamed: 2 | 0.424 | 1.000
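This extract does not name the coefficient behind each matrix. For a pair of categorical columns, one common measure, and one ydata-profiling can be configured to compute, is Cramér's V: V = sqrt(chi2 / (n * (min(r, k) - 1))), where chi2 is Pearson's chi-squared statistic on the r x k contingency table and n is the number of paired observations. The sketch below is the plain, uncorrected form; the report may use a bias-corrected variant or a different coefficient, so it is not expected to reproduce 0.854 or 0.424 exactly. `cramers_v` is hypothetical.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: uncorrected Cramér's V between two categorical series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)    # contingency table; NaN pairs dropped
    n = table.sum()
    r, k = table.shape
    if n == 0 or min(r, k) < 2:
        return float("nan")                            # undefined with a single category
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared statistic
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

origin = pd.Series(["홍성", "홍성", "서산", "서산"])
print(cramers_v(origin, pd.Series(["천안", "천안", "공대전", "공대전"])))  # perfectly associated
print(cramers_v(origin, pd.Series(["천안", "공대전", "천안", "공대전"])))  # independent
```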

Missing values

[Figure] A simple visualization of nullity by column.

[Figure] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
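Nullity correlation is the ordinary (Pearson) correlation between the columns' missing-value indicators. A sketch, assuming the convention of missingno-style heatmaps that columns which are never or always missing are dropped, since their indicators are constant and the correlation is undefined; in this dataset that leaves only Unnamed: 1 and Unnamed: 3. `nullity_correlation` is hypothetical.

```python
import pandas as pd

def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation of per-column missing indicators."""
    mask = df.isna()
    # Keep columns that are partly missing; constant indicators have no defined correlation.
    partial = mask.loc[:, mask.any() & ~mask.all()]
    return partial.astype(float).corr()

df = pd.DataFrame({
    "time": ["06:40", None, "07:10", None],
    "remarks": [None, None, "직통", None],
    "origin": ["홍성", "<NA>", "서산", "<NA>"],   # never missing, so it is dropped
})
print(nullity_correlation(df))
```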

Sample

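Index | 시외버스 시간표 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
0 | <NA> | NaN | <NA> | <NA>
1 | 천안직행 (충37,금12,한1) | NaN | <NA> | <NA>
2 | 천안직행 시외버스 운행안내 | NaN | <NA> | <NA>
3 | 시발지 | 시간 | 행선지 | 비고
4 | 홍성 | 06:40:00 | 천안 | <NA>
5 | 홍성 | 07:10:00 | 천안 | <NA>
6 | 서산 | 07:33:00 | 천안 | <NA>
7 | 홍성 | 07:38:00 | 공대전 | 신양 I.C
8 | 홍성 | 07:43:00 | 청주 | <NA>
9 | 태안 | 08:05:00 | 천안 | <NA>

Index | 시외버스 시간표 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
153 | 동서울 | 21:50:00 | 보령 | <NA>
154 | <NA> | NaN | <NA> | <NA>
155 | 안산행 (금5) | NaN | <NA> | <NA>
156 | 안산행 시외버스 운행안내 | NaN | <NA> | <NA>
157 | 시발지 | 시간 | 행선지 | 비고
158 | 보령 | 08:51:00 | 안산 | <NA>
159 | 보령 | 11:09(직) | 안산 | <NA>
160 | 보령 | 15:24:00 | 안산 | <NA>
161 | 보령 | 16:32:00 | 안산 | <NA>
162 | 보령 | 18:10(직) | 안산 | <NA>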
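The sample shows that the sheet is not a flat table. Each route block starts with a separator row of <NA>, then one or two title rows (for example 천안직행 시외버스 운행안내, "Cheonan direct intercity bus service information"), then an embedded header row 시발지 / 시간 / 행선지 / 비고 (origin / time / destination / remarks), followed by the departures. A sketch that turns this into one row per departure, assuming the four columns arrive in the order shown above, that title rows are exactly those with no time and text other than <NA> in the first column, and that header rows start with 시발지. `tidy_timetable` and the English column names are hypothetical.

```python
import pandas as pd

def tidy_timetable(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per departure, labelled with its route title."""
    df = raw.copy()
    df.columns = ["origin", "time", "destination", "remarks"]   # 시발지, 시간, 행선지, 비고
    first = df["origin"]
    is_separator = first.isna() | first.eq("<NA>")
    is_title = df["time"].isna() & ~is_separator      # e.g. "천안직행 시외버스 운행안내"
    df["route"] = first.where(is_title).ffill()       # latest title applies to rows below it
    is_departure = df["time"].notna() & first.ne("시발지")   # drop embedded header rows
    cols = ["route", "origin", "time", "destination", "remarks"]
    return df.loc[is_departure, cols].reset_index(drop=True)

raw = pd.DataFrame([
    ["<NA>", None, "<NA>", None],
    ["천안직행 (충37,금12,한1)", None, "<NA>", None],
    ["천안직행 시외버스 운행안내", None, "<NA>", None],
    ["시발지", "시간", "행선지", "비고"],
    ["홍성", "06:40:00", "천안", None],
    ["홍성", "07:38:00", "공대전", "신양 I.C"],
])
print(tidy_timetable(raw))
```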

Duplicate rows

Most frequently occurring

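Index | 시외버스 시간표 | Unnamed: 2 | Unnamed: 3 | # duplicates
24 | 태안 | 천안 | <NA> | 11
19 | 천안 | 서산 | <NA> | 10
8 | 보령 | 천안 | <NA> | 6
14 | 시발지 | 행선지 | 비고 | 6
20 | 천안 | 태안 | <NA> | 6
29 | <NA> | <NA> | <NA> | 6
0 | 공대전 | 서산 | <NA> | 5
4 | 동서울 | 보령 | <NA> | 5
6 | 보령 | 안산 | <NA> | 5
10 | 서산 | 공대전 | <NA> | 5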
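The duplicate table lists only 시외버스 시간표, Unnamed: 2 and Unnamed: 3, which suggests that the unsupported time column (Unnamed: 1) was left out when rows were compared; otherwise departures such as 태안 to 천안 at different times would not count as duplicates. A pandas sketch of this kind of table, assuming duplicates are counted over a chosen subset of columns with missing values grouped together; `most_frequent_duplicates` is hypothetical.

```python
import pandas as pd

def most_frequent_duplicates(df: pd.DataFrame, subset: list, top: int = 10) -> pd.DataFrame:
    """Hypothetical helper: value combinations over `subset` that occur more than once."""
    counts = df.groupby(subset, dropna=False).size()   # missing keys form their own group
    repeated = counts[counts > 1].sort_values(ascending=False)
    return repeated.head(top).rename("# duplicates").reset_index()

trips = pd.DataFrame({
    "origin": ["태안", "태안", "태안", "천안", "천안", "보령"],
    "time": ["08:05", "09:10", "12:30", "07:00", "15:00", "10:00"],
    "destination": ["천안", "천안", "천안", "서산", "서산", "안산"],
})
print(most_frequent_duplicates(trips, ["origin", "destination"]))
```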