Overview

Dataset statistics

Number of variables: 6
Number of observations: 114
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.6 KiB
Average record size in memory: 50.2 B

Variable types

Categorical: 1
Numeric: 1
Text: 4

Dataset

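Description: 부산교통공사_도시철도역명정보_20221012 (Busan Transportation Corporation, urban railway station name information, 2022-10-12)
Author: 부산교통공사 (Busan Transportation Corporation)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3077187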

Alerts

역번호 (station number) is highly overall correlated with 호선 (line): High correlation
호선 (line) is highly overall correlated with 역번호 (station number): High correlation
전화번호 (phone number) has unique values: Unique

Reproduction

Analysis started: 2023-12-10 16:20:59.674086
Analysis finished: 2023-12-10 16:21:00.231779
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
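
A report like this one can presumably be regenerated along the following lines. This is a minimal sketch, not the configuration actually used: the function name `build_profile`, the report title, and the `cp949` default encoding are assumptions (Korean public-data CSV files are often distributed in that encoding, but the downloaded file may differ). Only `pandas.read_csv`, `ProfileReport(df, title=...)` and `ProfileReport.to_file` are relied on.

```python
from pathlib import Path


def build_profile(csv_path: str, html_path: str, encoding: str = "cp949") -> Path:
    """Profile a CSV file and write the HTML report; return the output path."""
    # Imported inside the function so this module still loads on machines
    # where pandas or ydata-profiling is not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # The title is a hypothetical label, not taken from the original report.
    report = ProfileReport(df, title="Busan metro station profile")
    report.to_file(html_path)
    return Path(html_path)
```

Calling `build_profile("stations.csv", "stations_profile.html")` with a local copy of the dataset (both file names are hypothetical) would write the HTML report next to the script.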

Variables

호선 (line)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Value | Count
2호선 | 43
1호선 | 40
3호선 | 17
4호선 | 14

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 43 | 37.7%
1호선 | 40 | 35.1%
3호선 | 17 | 14.9%
4호선 | 14 | 12.3%
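
The frequency column is each count divided by the number of observations. The sketch below recomputes it from the four counts reported above; the helper name `frequency_table` is illustrative and not part of ydata-profiling.

```python
from collections import Counter

# Counts copied from the Common Values table for 호선.
line_counts = Counter({"2호선": 43, "1호선": 40, "3호선": 17, "4호선": 14})

# The counts should add up to the number of observations in the dataset.
total = sum(line_counts.values())


def frequency_table(counts):
    """Return (value, count, percent) rows sorted by descending count."""
    n = sum(counts.values())
    return [
        (value, count, round(100 * count / n, 1))
        for value, count in counts.most_common()
    ]


rows = frequency_table(line_counts)
for value, count, pct in rows:
    print(f"{value}\t{count}\t{pct}%")
```

The four counts sum to 114, matching the number of observations in the overview.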

Length

Figure: Histogram of lengths of the category

역번호 (station number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 113
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.04386
Minimum: 95
Maximum: 414
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 95
5-th percentile: 100.65
Q1: 123.25
Median: 217.5
Q3: 302.75
95-th percentile: 408.35
Maximum: 414
Range: 319
Interquartile range (IQR): 179.5
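
Range and IQR follow directly from the other quantiles: range is maximum minus minimum, and IQR is Q3 minus Q1. The sketch below recomputes both from the reported values. The 1.5 × IQR fences at the end are Tukey's common outlier rule, added here only as an illustration; they are not part of the report.

```python
# Quantile values copied from the report for 역번호.
minimum, q1, median, q3, maximum = 95, 123.25, 217.5, 302.75, 414

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range

# Tukey's fences (an illustrative addition, not reported by the profiler).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence

print(value_range, iqr, lower_fence, upper_fence, has_outliers)
```

Under that rule no station number would count as an outlier, since both the minimum and the maximum lie inside the fences.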

Descriptive statistics

Standard deviation: 97.979173
Coefficient of variation (CV): 0.44527111
Kurtosis: -0.67713011
Mean: 220.04386
Median Absolute Deviation (MAD): 90
Skewness: 0.53386789
Sum: 25085
Variance: 9599.9184
Monotonicity: Increasing
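
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks the reported numbers against those identities, using a relative tolerance because the report rounds to about eight significant digits.

```python
import math

# Values copied from the report for 역번호.
n = 114
total = 25085
mean = 220.04386
std = 97.979173
variance = 9599.9184
cv = 0.44527111

# Each check compares a reported figure with the value derived from the others.
mean_ok = math.isclose(total / n, mean, rel_tol=1e-6)
variance_ok = math.isclose(std ** 2, variance, rel_tol=1e-6)
cv_ok = math.isclose(std / mean, cv, rel_tol=1e-6)

print(mean_ok, variance_ok, cv_ok)
```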
Figure: Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
209 | 2 | 1.8%
95 | 1 | 0.9%
234 | 1 | 0.9%
302 | 1 | 0.9%
301 | 1 | 0.9%
243 | 1 | 0.9%
242 | 1 | 0.9%
241 | 1 | 0.9%
240 | 1 | 0.9%
239 | 1 | 0.9%
Other values (103) | 103 | 90.4%

Minimum 10 values

Value | Count | Frequency (%)
95 | 1 | 0.9%
96 | 1 | 0.9%
97 | 1 | 0.9%
98 | 1 | 0.9%
99 | 1 | 0.9%
100 | 1 | 0.9%
101 | 1 | 0.9%
102 | 1 | 0.9%
103 | 1 | 0.9%
104 | 1 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
414 | 1 | 0.9%
413 | 1 | 0.9%
412 | 1 | 0.9%
411 | 1 | 0.9%
410 | 1 | 0.9%
409 | 1 | 0.9%
408 | 1 | 0.9%
407 | 1 | 0.9%
406 | 1 | 0.9%
405 | 1 | 0.9%

역명 (station name)
Text

Distinct: 108
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 11
Median length: 2
Mean length: 2.5438596
Min length: 2

Characters and Unicode

Total characters: 290
Distinct characters: 134
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 89.5%
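
Distinct and Unique measure different things: Distinct counts how many different station names occur, while Unique counts names that occur exactly once. The sketch below illustrates the difference on a toy list (not the full column) and then checks the reported figures for 역명. The helper name `distinct_and_unique` is illustrative, not a ydata-profiling function.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): different values, and values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy sample: two names listed twice, two names listed once.
toy_names = ["서면", "서면", "연산", "연산", "장림", "낫개"]
toy_distinct, toy_unique = distinct_and_unique(toy_names)

# Figures reported for 역명.
n_rows, distinct, unique, total_chars = 114, 108, 102, 290

repeated = distinct - unique                      # names occurring more than once
distinct_pct = round(100 * distinct / n_rows, 1)  # Distinct (%)
unique_pct = round(100 * unique / n_rows, 1)      # Unique (%)
mean_length = total_chars / n_rows                # Mean length

print(toy_distinct, toy_unique, repeated, distinct_pct, unique_pct, mean_length)
```

The gap of six between Distinct and Unique is consistent with six names each appearing twice (102 + 2 × 6 = 114), presumably stations that are listed once per line they serve.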

Sample

1st row: 다대포해수욕장
2nd row: 다대포항
3rd row: 낫개
4th row: 신장림
5th row: 장림

Value | Count | Frequency (%)
연산 | 2 | 1.8%
미남 | 2 | 1.8%
덕천 | 2 | 1.8%
동래 | 2 | 1.8%
수영 | 2 | 1.8%
서면 | 2 | 1.8%
증산 | 1 | 0.9%
동원 | 1 | 0.9%
금곡 | 1 | 0.9%
호포 | 1 | 0.9%
Other values (98) | 98 | 86.0%

Most occurring characters

Value | Count | Frequency (%)
[?] | 18 | 6.2%
[?] | 16 | 5.5%
[?] | 10 | 3.4%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (124) | 196 | 67.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 288 | 99.3%
Other Punctuation | 2 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 18 | 6.2%
[?] | 16 | 5.6%
[?] | 10 | 3.5%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (123) | 194 | 67.4%

Other Punctuation
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 288 | 99.3%
Common | 2 | 0.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 18 | 6.2%
[?] | 16 | 5.6%
[?] | 10 | 3.5%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (123) | 194 | 67.4%

Common
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 288 | 99.3%
None | 2 | 0.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 18 | 6.2%
[?] | 16 | 5.6%
[?] | 10 | 3.5%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (123) | 194 | 67.4%

None
Value | Count | Frequency (%)
· | 2 | 100.0%

영문 (English name)
Text

Distinct: 108
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 37
Median length: 29
Mean length: 8.9210526
Min length: 4

Characters and Unicode

Total characters: 1017
Distinct characters: 52
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 89.5%

Sample

1st row: Dadaepo Beach
2nd row: Dadaepo Harbor
3rd row: Natgae
4th row: Sinjangnim
5th row: Jangnim

Value | Count | Frequency (%)
univ | 6 | 4.1%
nat'l | 4 | 2.7%
busan | 3 | 2.1%
yeonsan | 2 | 1.4%
seomyeon | 2 | 1.4%
of | 2 | 1.4%
dadaepo | 2 | 1.4%
sports | 2 | 1.4%
city | 2 | 1.4%
yangsan | 2 | 1.4%
Other values (114) | 119 | 81.5%

Most occurring characters

Value | Count | Frequency (%)
n | 134 | 13.2%
a | 107 | 10.5%
o | 93 | 9.1%
e | 89 | 8.8%
g | 63 | 6.2%
u | 48 | 4.7%
s | 36 | 3.5%
i | 35 | 3.4%
(space) | 32 | 3.1%
m | 31 | 3.0%
Other values (42) | 349 | 34.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 815 | 80.1%
Uppercase Letter | 151 | 14.8%
Space Separator | 32 | 3.1%
Other Punctuation | 14 | 1.4%
Dash Punctuation | 3 | 0.3%
Close Punctuation | 1 | 0.1%
Open Punctuation | 1 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 134 | 16.4%
a | 107 | 13.1%
o | 93 | 11.4%
e | 89 | 10.9%
g | 63 | 7.7%
u | 48 | 5.9%
s | 36 | 4.4%
i | 35 | 4.3%
m | 31 | 3.8%
y | 24 | 2.9%
Other values (14) | 155 | 19.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 18 | 11.9%
D | 17 | 11.3%
B | 16 | 10.6%
G | 15 | 9.9%
M | 14 | 9.3%
N | 13 | 8.6%
J | 11 | 7.3%
C | 9 | 6.0%
U | 7 | 4.6%
Y | 7 | 4.6%
Other values (11) | 24 | 15.9%

Other Punctuation
Value | Count | Frequency (%)
. | 7 | 50.0%
' | 5 | 35.7%
· | 2 | 14.3%

Space Separator
Value | Count | Frequency (%)
(space) | 32 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 966 | 95.0%
Common | 51 | 5.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 134 | 13.9%
a | 107 | 11.1%
o | 93 | 9.6%
e | 89 | 9.2%
g | 63 | 6.5%
u | 48 | 5.0%
s | 36 | 3.7%
i | 35 | 3.6%
m | 31 | 3.2%
y | 24 | 2.5%
Other values (35) | 306 | 31.7%

Common
Value | Count | Frequency (%)
(space) | 32 | 62.7%
. | 7 | 13.7%
' | 5 | 9.8%
- | 3 | 5.9%
· | 2 | 3.9%
) | 1 | 2.0%
( | 1 | 2.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1015 | 99.8%
None | 2 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 134 | 13.2%
a | 107 | 10.5%
o | 93 | 9.2%
e | 89 | 8.8%
g | 63 | 6.2%
u | 48 | 4.7%
s | 36 | 3.5%
i | 35 | 3.4%
(space) | 32 | 3.2%
m | 31 | 3.1%
Other values (41) | 347 | 34.2%

None
Value | Count | Frequency (%)
· | 2 | 100.0%

주소 (address)
Text

Distinct: 109
Distinct (%): 95.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 25
Median length: 23
Mean length: 20.5
Min length: 16

Characters and Unicode

Total characters: 2337
Distinct characters: 86
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
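
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for the first sample address of this column; the `CATEGORY_NAMES` mapping from two-letter codes to the long names used in this report is written out by hand and covers only the categories that appear here. Scripts and blocks are not available from `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter

# Two-letter general categories mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

# First sample row of 주소, normalised so Hangul syllables are precomposed.
address = unicodedata.normalize("NFC", "부산광역시 사하구 다대로 지하 692(다대동)")

# Count how many characters fall into each general category.
category_counts = Counter(unicodedata.category(ch) for ch in address)

for code, count in category_counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}\t{count}")
```

For this one address the Hangul syllables fall under Other Letter, the digits under Decimal Number, and the parentheses under Open and Close Punctuation, which matches the category names listed for the whole column.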

Unique

Unique: 104
Unique (%): 91.2%

Sample

1st row: 부산광역시 사하구 다대로 지하 692(다대동)
2nd row: 부산광역시 사하구 다대로 지하548(다대동)
3rd row: 부산광역시 사하구 다대로 지하442(다대동)
4th row: 부산광역시 사하구 다대로 지하310(장림동)
5th row: 부산광역시 사하구 다대로 지하230(장림동)

Value | Count | Frequency (%)
부산광역시 | 109 | 20.0%
지하 | 84 | 15.4%
중앙대로 | 21 | 3.8%
북구 | 13 | 2.4%
사하구 | 12 | 2.2%
부산진구 | 10 | 1.8%
동래구 | 10 | 1.8%
해운대구 | 10 | 1.8%
수영로 | 10 | 1.8%
금정구 | 9 | 1.6%
Other values (150) | 258 | 47.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 432 | 18.5%
[?] | 126 | 5.4%
[?] | 119 | 5.1%
[?] | 118 | 5.0%
[?] | 114 | 4.9%
[?] | 111 | 4.7%
[?] | 109 | 4.7%
[?] | 109 | 4.7%
[?] | 102 | 4.4%
[?] | 89 | 3.8%
Other values (76) | 908 | 38.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1540 | 65.9%
Space Separator | 432 | 18.5%
Decimal Number | 348 | 14.9%
Open Punctuation | 6 | 0.3%
Close Punctuation | 6 | 0.3%
Dash Punctuation | 5 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

Decimal Number
Value | Count | Frequency (%)
1 | 65 | 18.7%
2 | 52 | 14.9%
0 | 41 | 11.8%
4 | 36 | 10.3%
7 | 32 | 9.2%
3 | 32 | 9.2%
9 | 25 | 7.2%
6 | 24 | 6.9%
5 | 21 | 6.0%
8 | 20 | 5.7%

Space Separator
Value | Count | Frequency (%)
(space) | 432 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1540 | 65.9%
Common | 797 | 34.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

Common
Value | Count | Frequency (%)
(space) | 432 | 54.2%
1 | 65 | 8.2%
2 | 52 | 6.5%
0 | 41 | 5.1%
4 | 36 | 4.5%
7 | 32 | 4.0%
3 | 32 | 4.0%
9 | 25 | 3.1%
6 | 24 | 3.0%
5 | 21 | 2.6%
Other values (4) | 37 | 4.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1540 | 65.9%
ASCII | 797 | 34.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 432 | 54.2%
1 | 65 | 8.2%
2 | 52 | 6.5%
0 | 41 | 5.1%
4 | 36 | 4.5%
7 | 32 | 4.0%
3 | 32 | 4.0%
9 | 25 | 3.1%
6 | 24 | 3.0%
5 | 21 | 2.6%
Other values (4) | 37 | 4.6%

Hangul
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

전화번호 (phone number)
Text

UNIQUE

Distinct: 114
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1368
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 100.0%
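
Every value has length 12 and the column uses only eleven distinct characters, which is consistent with a fixed `NNN-NNN-NNNN` layout. The sketch below shows one way such a format could be validated. The pattern is an assumption inferred from the sample rows, not something the report states, and it is checked here only against those five samples.

```python
import re

# Assumed layout: three digits, dash, three digits, dash, four digits.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def is_valid_phone(value: str) -> bool:
    """Return True when the whole string matches the assumed layout."""
    return PHONE_PATTERN.fullmatch(value) is not None


# The five sample rows shown in the report for 전화번호.
samples = [
    "051-678-6195",
    "051-678-6196",
    "051-678-6197",
    "051-678-6198",
    "051-678-6199",
]

all_valid = all(is_valid_phone(s) for s in samples)
all_length_12 = all(len(s) == 12 for s in samples)

print(all_valid, all_length_12)
```

Two dashes per value across 114 rows would give 228 dash characters, and 10 digits per value would give 1140 digits, which together make up the 1368 total characters reported above.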

Sample

1st row: 051-678-6195
2nd row: 051-678-6196
3rd row: 051-678-6197
4th row: 051-678-6198
5th row: 051-678-6199

Value | Count | Frequency (%)
051-678-6195 | 1 | 0.9%
051-678-6302 | 1 | 0.9%
051-678-6243 | 1 | 0.9%
051-678-6242 | 1 | 0.9%
051-678-6241 | 1 | 0.9%
051-678-6240 | 1 | 0.9%
051-678-6239 | 1 | 0.9%
051-678-6238 | 1 | 0.9%
051-678-6237 | 1 | 0.9%
051-678-6236 | 1 | 0.9%
Other values (104) | 104 | 91.2%

Most occurring characters

Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1140 | 83.3%
Dash Punctuation | 228 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 239 | 21.0%
1 | 200 | 17.5%
0 | 161 | 14.1%
5 | 125 | 11.0%
7 | 125 | 11.0%
8 | 124 | 10.9%
2 | 76 | 6.7%
3 | 45 | 3.9%
4 | 30 | 2.6%
9 | 15 | 1.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 228 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1368 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1368 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Interactions

Correlations

Variable | 호선 | 역번호
호선 | 1.000 | 1.000
역번호 | 1.000 | 1.000

Variable | 역번호 | 호선
역번호 | 1.000 | 0.991
호선 | 0.991 | 1.000

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index | 호선 | 역번호 | 역명 | 영문 | 주소 | 전화번호
0 | 1호선 | 95 | 다대포해수욕장 | Dadaepo Beach | 부산광역시 사하구 다대로 지하 692(다대동) | 051-678-6195
1 | 1호선 | 96 | 다대포항 | Dadaepo Harbor | 부산광역시 사하구 다대로 지하548(다대동) | 051-678-6196
2 | 1호선 | 97 | 낫개 | Natgae | 부산광역시 사하구 다대로 지하442(다대동) | 051-678-6197
3 | 1호선 | 98 | 신장림 | Sinjangnim | 부산광역시 사하구 다대로 지하310(장림동) | 051-678-6198
4 | 1호선 | 99 | 장림 | Jangnim | 부산광역시 사하구 다대로 지하230(장림동) | 051-678-6199
5 | 1호선 | 100 | 동매 | Dongmae | 부산광역시 사하구 신산로 지하168(신평동) | 051-678-6100
6 | 1호선 | 101 | 신평 | Sinpyeong | 부산광역시 사하구 하신번영로 140 | 051-678-6101
7 | 1호선 | 102 | 하단 | Hadan | 부산광역시 사하구 낙동남로 지하 1415 | 051-678-6102
8 | 1호선 | 103 | 당리 | Dangni | 부산광역시 사하구 낙동대로 지하 405 | 051-678-6103
9 | 1호선 | 104 | 사하 | Saha | 부산광역시 사하구 낙동대로 지하 309 | 051-678-6104

Last rows

Index | 호선 | 역번호 | 역명 | 영문 | 주소 | 전화번호
104 | 4호선 | 405 | 충렬사 | Chungnyeolsa | 부산광역시 동래구 반송로 지하 205 | 051-678-6405
105 | 4호선 | 406 | 명장 | Myeongjang | 부산광역시 동래구 반송로 지하 281 | 051-678-6406
106 | 4호선 | 407 | 서동 | Seo-dong | 부산광역시 금정구 반송로 지하 387 | 051-678-6407
107 | 4호선 | 408 | 금사 | Geumsa | 부산광역시 금정구 반송로 지하 465 | 051-678-6408
108 | 4호선 | 409 | 반여농산물시장 | Banyeo Agricultural Market | 부산광역시 해운대구 반송로 550 | 051-678-6409
109 | 4호선 | 410 | 석대 | Seokdae | 부산광역시 해운대구 석대천로 121 | 051-678-6410
110 | 4호선 | 411 | 영산대 | Youngsan Univ. | 부산광역시 해운대구 반송로 803 | 051-678-6411
111 | 4호선 | 412 | 윗반송 | Witbansong | 부산광역시 해운대구 반송로 917 | 051-678-6412
112 | 4호선 | 413 | 고촌 | Gochon | 부산광역시 기장군 철마면 반송로 991 | 051-678-6413
113 | 4호선 | 414 | 안평 | Anpyeong | 부산광역시 기장군 철마면 반송로 1101 | 051-678-6414