Overview

Dataset statistics

Number of variables: 6
Number of observations: 114
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.6 KiB
Average record size in memory: 50.2 B
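
As a rough illustration of how table-level figures like these can be derived, the sketch below counts variables, observations, missing cells and duplicate rows using only the standard library. The helper name `table_stats` and the three-row excerpt are hypothetical, and counting every repeat after the first occurrence as a duplicate is an assumed convention, not necessarily the one the profiling tool uses.

```python
from collections import Counter

def table_stats(rows):
    """Return basic table statistics for a list of equal-length tuples.

    None marks a missing cell. Every occurrence of a row after its first
    is counted as a duplicate (an assumed convention).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    total_cells = n_rows * n_cols
    missing = sum(cell is None for row in rows for cell in row)
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }

# Hypothetical excerpt built from values shown in the report's sample rows.
rows = [
    ("1호선", 95, "다대포해수욕장", "Dadaepo Beach"),
    ("1호선", 96, "다대포항", "Dadaepo Harbor"),
    ("1호선", 97, "낫개", "Natgae"),
]
stats = table_stats(rows)
print(stats)
```

For the full dataset the same percentages follow from 6 × 114 = 684 cells with 0 missing.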

Variable types

Categorical: 1
Numeric: 1
Text: 4

Dataset

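Description: 부산교통공사_도시철도역명정보_20201021 (Busan Transportation Corporation, urban railway station name information, 2020-10-21)
Author: 부산교통공사 (Busan Transportation Corporation)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3077187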

Alerts

역번호 (station number) is highly overall correlated with 구분 (line): High correlation
구분 (line) is highly overall correlated with 역번호 (station number): High correlation
전화번호 (phone number) has unique values: Unique

Reproduction

Analysis started: 2023-12-10 16:21:13.053527
Analysis finished: 2023-12-10 16:21:13.773712
Duration: 0.72 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
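
A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, not the exact invocation used here: the function name `build_report`, the file paths passed to it, and the `cp949` default encoding are all assumptions, since the report records neither the input file name nor its encoding. It needs the third-party packages pandas and ydata-profiling, which are imported inside the function.

```python
def build_report(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write the result as an HTML report.

    Requires pandas and ydata-profiling to be installed.
    The default encoding is an assumption; adjust it to the actual file.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="부산교통공사_도시철도역명정보_20201021")
    report.to_file(html_path)  # an .html extension selects HTML output
    return report
```

A call such as `build_report("stations.csv", "report.html")` would then write the report, where both file names are placeholders.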

Variables

구분 (line)
Categorical

Alert: HIGH CORRELATION

Distinct: 4
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Value | Count
2호선 (Line 2) | 43
1호선 (Line 1) | 40
3호선 (Line 3) | 17
4호선 (Line 4) | 14

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%
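
The difference between Distinct and Unique is easy to miss. The sketch below assumes that Distinct counts the different values in a column and Unique counts the values that occur exactly once; this reading is consistent with the figures reported for 구분 (4 distinct, 0 unique). The function name `distinct_and_unique` is illustrative, and the list is rebuilt from the reported per-line counts rather than read from the dataset.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Column rebuilt from the counts reported for 구분.
line_values = ["2호선"] * 43 + ["1호선"] * 40 + ["3호선"] * 17 + ["4호선"] * 14
distinct, unique = distinct_and_unique(line_values)
n = len(line_values)
print(distinct, unique, f"{100 * distinct / n:.1f}%")
```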

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 43 | 37.7%
1호선 | 40 | 35.1%
3호선 | 17 | 14.9%
4호선 | 14 | 12.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 구분]

역번호 (station number)
Real number (ℝ)

Alert: HIGH CORRELATION

Distinct: 113
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.04386
Minimum: 95
Maximum: 414
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 95
5-th percentile: 100.65
Q1: 123.25
Median: 217.5
Q3: 302.75
95-th percentile: 408.35
Maximum: 414
Range: 319
Interquartile range (IQR): 179.5

Descriptive statistics

Standard deviation: 97.979173
Coefficient of variation (CV): 0.44527111
Kurtosis: -0.67713011
Mean: 220.04386
Median Absolute Deviation (MAD): 90
Skewness: 0.53386789
Sum: 25085
Variance: 9599.9184
Monotonicity: Increasing
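
Several of these statistics are simple functions of the others, which makes a quick consistency check possible. The sketch below recomputes the mean, range, IQR, coefficient of variation and variance from the reported values only; the variable names are illustrative.

```python
# Values copied from the report for 역번호.
n = 114
total = 25085
minimum, maximum = 95, 414
q1, q3 = 123.25, 302.75
std = 97.979173

mean = total / n               # Sum / number of observations
value_range = maximum - minimum
iqr = q3 - q1                  # interquartile range
cv = std / mean                # coefficient of variation
variance = std ** 2

print(round(mean, 5), value_range, iqr, round(cv, 5), round(variance, 2))
```

The recomputed figures agree with the reported Mean, Range, IQR, CV and Variance up to rounding.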
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
209 | 2 | 1.8%
95 | 1 | 0.9%
234 | 1 | 0.9%
302 | 1 | 0.9%
301 | 1 | 0.9%
243 | 1 | 0.9%
242 | 1 | 0.9%
241 | 1 | 0.9%
240 | 1 | 0.9%
239 | 1 | 0.9%
Other values (103) | 103 | 90.4%

Minimum 10 values

Value | Count | Frequency (%)
95 | 1 | 0.9%
96 | 1 | 0.9%
97 | 1 | 0.9%
98 | 1 | 0.9%
99 | 1 | 0.9%
100 | 1 | 0.9%
101 | 1 | 0.9%
102 | 1 | 0.9%
103 | 1 | 0.9%
104 | 1 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
414 | 1 | 0.9%
413 | 1 | 0.9%
412 | 1 | 0.9%
411 | 1 | 0.9%
410 | 1 | 0.9%
409 | 1 | 0.9%
408 | 1 | 0.9%
407 | 1 | 0.9%
406 | 1 | 0.9%
405 | 1 | 0.9%

한국어 (Korean name)
Text

Distinct: 108
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 11
Median length: 2
Mean length: 2.5614035
Min length: 2

Characters and Unicode

Total characters: 292
Distinct characters: 133
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
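
To make the idea of character properties concrete, the sketch below uses Python's standard `unicodedata` module to count Unicode general categories over two values from the sample rows. The two-letter codes it returns correspond to the category names used in this report: `Lo` is Other Letter, `Lu` Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator and `Po` Other Punctuation. The helper name `category_counts` is illustrative, and treating the "·" in the data as MIDDLE DOT (U+00B7) is an assumption.

```python
import unicodedata
from collections import Counter

def category_counts(strings):
    """Count Unicode general categories over all characters in the strings."""
    counts = Counter()
    for s in strings:
        # Normalize to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", s):
            counts[unicodedata.category(ch)] += 1
    return counts

# Two values taken from the report's sample rows.
print(category_counts(["다대포해수욕장", "Dadaepo Beach"]))

# MIDDLE DOT (U+00B7); assumed to be the "·" that appears in the data.
print(unicodedata.category("\u00b7"), unicodedata.name("\u00b7"))
```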

Unique

Unique: 102
Unique (%): 89.5%

Sample

1st row: 다대포해수욕장
2nd row: 다대포항
3rd row: 낫개
4th row: 신장림
5th row: 장림

Words

Value | Count | Frequency (%)
연산 | 2 | 1.8%
미남 | 2 | 1.8%
덕천 | 2 | 1.8%
동래 | 2 | 1.8%
수영 | 2 | 1.8%
서면 | 2 | 1.8%
증산 | 1 | 0.9%
동원 | 1 | 0.9%
금곡 | 1 | 0.9%
호포 | 1 | 0.9%
Other values (98) | 98 | 86.0%

Most occurring characters

Value | Count | Frequency (%)
[?] | 19 | 6.5%
[?] | 17 | 5.8%
[?] | 11 | 3.8%
[?] | 9 | 3.1%
[?] | 9 | 3.1%
[?] | 8 | 2.7%
[?] | 8 | 2.7%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (123) | 194 | 66.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 290 | 99.3%
Other Punctuation | 2 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 19 | 6.6%
[?] | 17 | 5.9%
[?] | 11 | 3.8%
[?] | 9 | 3.1%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (122) | 192 | 66.2%

Other Punctuation
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 290 | 99.3%
Common | 2 | 0.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 19 | 6.6%
[?] | 17 | 5.9%
[?] | 11 | 3.8%
[?] | 9 | 3.1%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (122) | 192 | 66.2%

Common
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 290 | 99.3%
None | 2 | 0.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 19 | 6.6%
[?] | 17 | 5.9%
[?] | 11 | 3.8%
[?] | 9 | 3.1%
[?] | 9 | 3.1%
[?] | 8 | 2.8%
[?] | 8 | 2.8%
[?] | 6 | 2.1%
[?] | 6 | 2.1%
[?] | 5 | 1.7%
Other values (122) | 192 | 66.2%

None
Value | Count | Frequency (%)
· | 2 | 100.0%

영어 (English name)
Text

Distinct: 108
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 37
Median length: 29
Mean length: 8.9912281
Min length: 4

Characters and Unicode

Total characters: 1025
Distinct characters: 51
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 89.5%

Sample

1st row: Dadaepo Beach
2nd row: Dadaepo Harbor
3rd row: Natgae
4th row: Sinjangnim
5th row: Jangnim

Words

Value | Count | Frequency (%)
univ | 6 | 4.1%
nat'l | 4 | 2.7%
busan | 3 | 2.0%
yeonsan | 2 | 1.4%
seomyeon | 2 | 1.4%
of | 2 | 1.4%
dadaepo | 2 | 1.4%
sports | 2 | 1.4%
city | 2 | 1.4%
pusan | 2 | 1.4%
Other values (115) | 120 | 81.6%

Most occurring characters

Value | Count | Frequency (%)
n | 134 | 13.1%
a | 107 | 10.4%
o | 94 | 9.2%
e | 91 | 8.9%
g | 64 | 6.2%
u | 49 | 4.8%
s | 36 | 3.5%
i | 34 | 3.3%
(space) | 33 | 3.2%
m | 31 | 3.0%
Other values (41) | 352 | 34.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 819 | 79.9%
Uppercase Letter | 153 | 14.9%
Space Separator | 33 | 3.2%
Other Punctuation | 14 | 1.4%
Dash Punctuation | 4 | 0.4%
Open Punctuation | 1 | 0.1%
Close Punctuation | 1 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 134 | 16.4%
a | 107 | 13.1%
o | 94 | 11.5%
e | 91 | 11.1%
g | 64 | 7.8%
u | 49 | 6.0%
s | 36 | 4.4%
i | 34 | 4.2%
m | 31 | 3.8%
l | 25 | 3.1%
Other values (14) | 154 | 18.8%

Uppercase Letter
Value | Count | Frequency (%)
D | 18 | 11.8%
S | 18 | 11.8%
B | 16 | 10.5%
G | 15 | 9.8%
M | 14 | 9.2%
N | 13 | 8.5%
J | 11 | 7.2%
C | 10 | 6.5%
U | 7 | 4.6%
Y | 7 | 4.6%
Other values (10) | 24 | 15.7%

Other Punctuation
Value | Count | Frequency (%)
. | 7 | 50.0%
' | 5 | 35.7%
· | 2 | 14.3%

Space Separator
Value | Count | Frequency (%)
(space) | 33 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 972 | 94.8%
Common | 53 | 5.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 134 | 13.8%
a | 107 | 11.0%
o | 94 | 9.7%
e | 91 | 9.4%
g | 64 | 6.6%
u | 49 | 5.0%
s | 36 | 3.7%
i | 34 | 3.5%
m | 31 | 3.2%
l | 25 | 2.6%
Other values (34) | 307 | 31.6%

Common
Value | Count | Frequency (%)
(space) | 33 | 62.3%
. | 7 | 13.2%
' | 5 | 9.4%
- | 4 | 7.5%
· | 2 | 3.8%
( | 1 | 1.9%
) | 1 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1023 | 99.8%
None | 2 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 134 | 13.1%
a | 107 | 10.5%
o | 94 | 9.2%
e | 91 | 8.9%
g | 64 | 6.3%
u | 49 | 4.8%
s | 36 | 3.5%
i | 34 | 3.3%
(space) | 33 | 3.2%
m | 31 | 3.0%
Other values (40) | 350 | 34.2%

None
Value | Count | Frequency (%)
· | 2 | 100.0%

주소 (address)
Text

Distinct: 109
Distinct (%): 95.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 25
Median length: 23
Mean length: 20.5
Min length: 16

Characters and Unicode

Total characters: 2337
Distinct characters: 86
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 104
Unique (%): 91.2%

Sample

1st row: 부산광역시 사하구 다대로 지하 692(다대동)
2nd row: 부산광역시 사하구 다대로 지하548(다대동)
3rd row: 부산광역시 사하구 다대로 지하442(다대동)
4th row: 부산광역시 사하구 다대로 지하310(장림동)
5th row: 부산광역시 사하구 다대로 지하230(장림동)

Words

Value | Count | Frequency (%)
부산광역시 | 109 | 20.0%
지하 | 84 | 15.4%
중앙대로 | 21 | 3.8%
북구 | 13 | 2.4%
사하구 | 12 | 2.2%
부산진구 | 10 | 1.8%
동래구 | 10 | 1.8%
해운대구 | 10 | 1.8%
수영로 | 10 | 1.8%
금정구 | 9 | 1.6%
Other values (150) | 258 | 47.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 432 | 18.5%
[?] | 126 | 5.4%
[?] | 119 | 5.1%
[?] | 118 | 5.0%
[?] | 114 | 4.9%
[?] | 111 | 4.7%
[?] | 109 | 4.7%
[?] | 109 | 4.7%
[?] | 102 | 4.4%
[?] | 89 | 3.8%
Other values (76) | 908 | 38.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1540 | 65.9%
Space Separator | 432 | 18.5%
Decimal Number | 348 | 14.9%
Open Punctuation | 6 | 0.3%
Close Punctuation | 6 | 0.3%
Dash Punctuation | 5 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

Decimal Number
Value | Count | Frequency (%)
1 | 65 | 18.7%
2 | 52 | 14.9%
0 | 41 | 11.8%
4 | 36 | 10.3%
7 | 32 | 9.2%
3 | 32 | 9.2%
9 | 25 | 7.2%
6 | 24 | 6.9%
5 | 21 | 6.0%
8 | 20 | 5.7%

Space Separator
Value | Count | Frequency (%)
(space) | 432 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1540 | 65.9%
Common | 797 | 34.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

Common
Value | Count | Frequency (%)
(space) | 432 | 54.2%
1 | 65 | 8.2%
2 | 52 | 6.5%
0 | 41 | 5.1%
4 | 36 | 4.5%
7 | 32 | 4.0%
3 | 32 | 4.0%
9 | 25 | 3.1%
6 | 24 | 3.0%
5 | 21 | 2.6%
Other values (4) | 37 | 4.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1540 | 65.9%
ASCII | 797 | 34.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 432 | 54.2%
1 | 65 | 8.2%
2 | 52 | 6.5%
0 | 41 | 5.1%
4 | 36 | 4.5%
7 | 32 | 4.0%
3 | 32 | 4.0%
9 | 25 | 3.1%
6 | 24 | 3.0%
5 | 21 | 2.6%
Other values (4) | 37 | 4.6%

Hangul
Value | Count | Frequency (%)
[?] | 126 | 8.2%
[?] | 119 | 7.7%
[?] | 118 | 7.7%
[?] | 114 | 7.4%
[?] | 111 | 7.2%
[?] | 109 | 7.1%
[?] | 109 | 7.1%
[?] | 102 | 6.6%
[?] | 89 | 5.8%
[?] | 83 | 5.4%
Other values (62) | 460 | 29.9%

전화번호 (phone number)
Text

Alert: UNIQUE

Distinct: 114
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1368
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 100.0%

Sample

1st row: 051-678-6195
2nd row: 051-678-6196
3rd row: 051-678-6197
4th row: 051-678-6198
5th row: 051-678-6199

Words

Value | Count | Frequency (%)
051-678-6195 | 1 | 0.9%
051-678-6302 | 1 | 0.9%
051-678-6243 | 1 | 0.9%
051-678-6242 | 1 | 0.9%
051-678-6241 | 1 | 0.9%
051-678-6240 | 1 | 0.9%
051-678-6239 | 1 | 0.9%
051-678-6238 | 1 | 0.9%
051-678-6237 | 1 | 0.9%
051-678-6236 | 1 | 0.9%
Other values (104) | 104 | 91.2%

Most occurring characters

Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1140 | 83.3%
Dash Punctuation | 228 | 16.7%
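
The character totals for 전화번호 follow directly from the fixed format of the values. The sketch below assumes every one of the 114 values matches the `NNN-NNNN`-style pattern seen in the sample rows (three digits, three digits, four digits, separated by dashes), which is consistent with the constant length of 12 and the two character categories reported. The pattern name `PHONE_RE` is illustrative.

```python
import re

# Assumed format, inferred from the sample rows: 3 digits, 3 digits, 4 digits.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

sample = ["051-678-6195", "051-678-6196", "051-678-6197",
          "051-678-6198", "051-678-6199"]
all_match = all(PHONE_RE.fullmatch(p) for p in sample)

n_rows = 114            # number of observations
length = 12             # every value has length 12
dashes_per_value = 2

total_chars = n_rows * length
dash_chars = n_rows * dashes_per_value
digit_chars = total_chars - dash_chars

print(all_match, total_chars, dash_chars, digit_chars)
```

Under that assumption the totals reproduce the reported 1368 characters, 228 dashes and 1140 digits.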

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 239 | 21.0%
1 | 200 | 17.5%
0 | 161 | 14.1%
5 | 125 | 11.0%
7 | 125 | 11.0%
8 | 124 | 10.9%
2 | 76 | 6.7%
3 | 45 | 3.9%
4 | 30 | 2.6%
9 | 15 | 1.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 228 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1368 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1368 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
6 | 239 | 17.5%
- | 228 | 16.7%
1 | 200 | 14.6%
0 | 161 | 11.8%
5 | 125 | 9.1%
7 | 125 | 9.1%
8 | 124 | 9.1%
2 | 76 | 5.6%
3 | 45 | 3.3%
4 | 30 | 2.2%

Interactions


Correlations

[Figure: correlation heatmap]

 | 구분 | 역번호
구분 | 1.000 | 1.000
역번호 | 1.000 | 1.000

[Figure: correlation heatmap]

 | 역번호 | 구분
역번호 | 1.000 | 0.991
구분 | 0.991 | 1.000

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion]

Sample

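First rows

Index | 구분 (line) | 역번호 (station number) | 한국어 (Korean name) | 영어 (English name) | 주소 (address) | 전화번호 (phone number)
0 | 1호선 | 95 | 다대포해수욕장 | Dadaepo Beach | 부산광역시 사하구 다대로 지하 692(다대동) | 051-678-6195
1 | 1호선 | 96 | 다대포항 | Dadaepo Harbor | 부산광역시 사하구 다대로 지하548(다대동) | 051-678-6196
2 | 1호선 | 97 | 낫개 | Natgae | 부산광역시 사하구 다대로 지하442(다대동) | 051-678-6197
3 | 1호선 | 98 | 신장림 | Sinjangnim | 부산광역시 사하구 다대로 지하310(장림동) | 051-678-6198
4 | 1호선 | 99 | 장림 | Jangnim | 부산광역시 사하구 다대로 지하230(장림동) | 051-678-6199
5 | 1호선 | 100 | 동매 | Dongmae | 부산광역시 사하구 신산로 지하168(신평동) | 051-678-6100
6 | 1호선 | 101 | 신평 | Sinpyeong | 부산광역시 사하구 하신번영로 140 | 051-678-6101
7 | 1호선 | 102 | 하단 | Hadan | 부산광역시 사하구 낙동남로 지하 1415 | 051-678-6102
8 | 1호선 | 103 | 당리 | Dangni | 부산광역시 사하구 낙동대로 지하 405 | 051-678-6103
9 | 1호선 | 104 | 사하 | Saha | 부산광역시 사하구 낙동대로 지하 309 | 051-678-6104

Last rows

Index | 구분 (line) | 역번호 (station number) | 한국어 (Korean name) | 영어 (English name) | 주소 (address) | 전화번호 (phone number)
104 | 4호선 | 405 | 충렬사 | Chungnyeolsa | 부산광역시 동래구 반송로 지하 205 | 051-678-6405
105 | 4호선 | 406 | 명장 | Myeongjang | 부산광역시 동래구 반송로 지하 281 | 051-678-6406
106 | 4호선 | 407 | 서동 | Seo-dong | 부산광역시 금정구 반송로 지하 387 | 051-678-6407
107 | 4호선 | 408 | 금사 | Geumsa | 부산광역시 금정구 반송로 지하 465 | 051-678-6408
108 | 4호선 | 409 | 반여농산물시장 | Banyeo Agricultural Market | 부산광역시 해운대구 반송로 550 | 051-678-6409
109 | 4호선 | 410 | 석대 | Seokdae | 부산광역시 해운대구 석대천로 121 | 051-678-6410
110 | 4호선 | 411 | 영산대 | Youngsan Univ. | 부산광역시 해운대구 반송로 803 | 051-678-6411
111 | 4호선 | 412 | 동부산대학 | Dong-Pusan College | 부산광역시 해운대구 반송로 917 | 051-678-6412
112 | 4호선 | 413 | 고촌 | Gochon | 부산광역시 기장군 철마면 반송로 991 | 051-678-6413
113 | 4호선 | 414 | 안평 | Anpyeong | 부산광역시 기장군 철마면 반송로 1101 | 051-678-6414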