Overview

Dataset statistics

Number of variables: 7
Number of observations: 360
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 17
Duplicate rows (%): 4.7%
Total size in memory: 20.2 KiB
Average record size in memory: 57.4 B
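
The percentages above follow directly from the reported counts. A minimal sketch of that arithmetic, using only the totals printed in this overview (the variable names are illustrative, and nothing is recomputed from the raw file):

```python
# Totals copied from the Dataset statistics block of this report.
n_variables = 7
n_observations = 360
missing_cells = 1
duplicate_rows = 17

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of empty cells and share of duplicated rows, in percent.
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.2f}% of {total_cells} cells")
print(f"Duplicate rows: {duplicate_pct:.1f}% of {n_observations} rows")
```

The single missing cell is well under 0.1% of all cells, which is why the report shows "< 0.1%" rather than a rounded figure.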

Variable types

Categorical: 4
Text: 2
Numeric: 1

Dataset

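Description: Data on the urban and metropolitan railway stations managed by 부산교통공사 (Busan Transportation Corporation), covering the railway operator name, line name, station name, and the station floor, exit number, and detailed location of each air respirator.
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041446/fileData.do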

Alerts

철도운영기관명 has constant value "부산교통공사" [Constant]
Dataset has 17 (4.7%) duplicate rows [Duplicates]
출입구번호 is highly overall correlated with the unnamed numeric variable and 1 other field [High correlation]
지상지하 is highly overall correlated with the unnamed numeric variable and 1 other field [High correlation]
The unnamed numeric variable is highly overall correlated with 선명 and 2 other fields [High correlation]
선명 is highly overall correlated with the unnamed numeric variable [High correlation]
지상지하 is highly imbalanced (97.2%) [Imbalance]
출입구번호 is highly imbalanced (84.0%) [Imbalance]
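
Both imbalance percentages can be reproduced from the class counts listed in the variable sections. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by log2 of the number of classes. That formula is an assumption that happens to match both reported figures, and `imbalance_score` is a hypothetical helper, not a ydata-profiling function.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. A single class returns 0.0 (a choice made for this sketch).
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# 지상지하: 지하 = 359, 중층 = 1
ground_counts = [359, 1]

# 출입구번호: 12 distinct values; the three singletons are "3/4/6" plus the
# two entries grouped under "Other values (2)".
exit_counts = [336, 4, 4, 3, 2, 2, 2, 2, 2, 1, 1, 1]

print(f"지상지하: {imbalance_score(ground_counts):.1%}")
print(f"출입구번호: {imbalance_score(exit_counts):.1%}")
```

A score this close to 1 for 지상지하 reflects that all but one of the 360 rows share the same value, so the column carries almost no information.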

Reproduction

Analysis started: 2023-12-12 12:09:31.460737
Analysis finished: 2023-12-12 12:09:32.133062
Duration: 0.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
부산교통공사: 360

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산교통공사
2nd row: 부산교통공사
3rd row: 부산교통공사
4th row: 부산교통공사
5th row: 부산교통공사

Common Values

Value | Count | Frequency (%)
부산교통공사 | 360 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
부산교통공사 | 360 | 100.0%

선명
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
1호선: 132
2호선: 95
3호선: 66
4호선: 62
3호선 (4-character variant): 5

Length

Max length: 4
Median length: 3
Mean length: 3.0138889
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
1호선 | 132 | 36.7%
2호선 | 95 | 26.4%
3호선 | 66 | 18.3%
4호선 | 62 | 17.2%
3호선 (4-character variant) | 5 | 1.4%
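
The two 3호선 entries look identical, yet the length statistics (max 4, mean 3.0138889 over 360 rows) imply that exactly five rows hold a four-character value. A plausible explanation is a stray whitespace character. The sketch below assumes a trailing space, which the report itself does not confirm, and shows how stripping whitespace collapses the five categories into four:

```python
from collections import Counter

# Counts copied from the Common Values table. The trailing space on the
# last key is an ASSUMPTION about what the extra character is.
reported_counts = {
    "1호선": 132,
    "2호선": 95,
    "3호선": 66,
    "4호선": 62,
    "3호선 ": 5,
}

# Merge labels that differ only by surrounding whitespace.
normalized = Counter()
for label, count in reported_counts.items():
    normalized[label.strip()] += count

for label, count in sorted(normalized.items()):
    print(label, count)
```

After normalization 3호선 totals 71 rows, which matches the merged figure shown in the Common Values (Plot) table.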

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1호선 | 132 | 36.7%
2호선 | 95 | 26.4%
3호선 | 71 | 19.7%
4호선 | 62 | 17.2%

역명
Text

Distinct: 87
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 12
Median length: 3
Mean length: 3.5194444
Min length: 3

Characters and Unicode

Total characters: 1267
Distinct characters: 116
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 다대포해수욕장역
2nd row: 다대포해수욕장역
3rd row: 다대포해수욕장역
4th row: 다대포해수욕장역
5th row: 다대포해수욕장역

Value | Count | Frequency (%)
동래역 | 10 | 2.8%
배산역 | 10 | 2.8%
대티역 | 10 | 2.8%
연산역 | 9 | 2.5%
서대신역 | 8 | 2.2%
명장역 | 8 | 2.2%
충렬사역 | 8 | 2.2%
수안역 | 8 | 2.2%
종합운동장역 | 8 | 2.2%
서동역 | 8 | 2.2%
Other values (77) | 273 | 75.8%

Most occurring characters

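Value | Count | Frequency (%)
(not preserved) | 360 | 28.4%
(not preserved) | 50 | 3.9%
(not preserved) | 45 | 3.6%
(not preserved) | 43 | 3.4%
(not preserved) | 40 | 3.2%
(not preserved) | 30 | 2.4%
(not preserved) | 25 | 2.0%
(not preserved) | 22 | 1.7%
(not preserved) | 21 | 1.7%
(not preserved) | 19 | 1.5%
Other values (106) | 612 | 48.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1265 | 99.8%
Other Punctuation | 2 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 360 | 28.5%
(not preserved) | 50 | 4.0%
(not preserved) | 45 | 3.6%
(not preserved) | 43 | 3.4%
(not preserved) | 40 | 3.2%
(not preserved) | 30 | 2.4%
(not preserved) | 25 | 2.0%
(not preserved) | 22 | 1.7%
(not preserved) | 21 | 1.7%
(not preserved) | 19 | 1.5%
Other values (105) | 610 | 48.2%

Other Punctuation
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1265 | 99.8%
Common | 2 | 0.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 360 | 28.5%
(not preserved) | 50 | 4.0%
(not preserved) | 45 | 3.6%
(not preserved) | 43 | 3.4%
(not preserved) | 40 | 3.2%
(not preserved) | 30 | 2.4%
(not preserved) | 25 | 2.0%
(not preserved) | 22 | 1.7%
(not preserved) | 21 | 1.7%
(not preserved) | 19 | 1.5%
Other values (105) | 610 | 48.2%

Common
Value | Count | Frequency (%)
· | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1265 | 99.8%
None | 2 | 0.2%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 360 | 28.5%
(not preserved) | 50 | 4.0%
(not preserved) | 45 | 3.6%
(not preserved) | 43 | 3.4%
(not preserved) | 40 | 3.2%
(not preserved) | 30 | 2.4%
(not preserved) | 25 | 2.0%
(not preserved) | 22 | 1.7%
(not preserved) | 21 | 1.7%
(not preserved) | 19 | 1.5%
Other values (105) | 610 | 48.2%

None
Value | Count | Frequency (%)
· | 2 | 100.0%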

지상지하
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
지하: 359
중층: 1

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 지하
2nd row: 중층
3rd row: 지하
4th row: 지하
5th row: 지하

Common Values

Value | Count | Frequency (%)
지하 | 359 | 99.7%
중층 | 1 | 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
지하 | 359 | 99.7%
중층 | 1 | 0.3%


Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 2.5%
Missing: 1
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1810585
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 5
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.4562821
Coefficient of variation (CV): 0.66769511
Kurtosis: 7.2244943
Mean: 2.1810585
Median Absolute Deviation (MAD): 1
Skewness: 2.4160847
Sum: 783
Variance: 2.1207575
Monotonicity: Not monotonic
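
The mean, variance, standard deviation, and coefficient of variation above can be checked against the value counts this section reports for the variable. A minimal sketch, assuming the variance is the sample variance (divisor n - 1) over the 359 non-missing rows, which is the convention that reproduces the reported figure:

```python
import math

# Value -> count, copied from this variable's frequency table
# (the one missing row is excluded).
frequencies = {1: 118, 2: 159, 3: 41, 4: 17, 5: 10, 6: 4, 7: 2, 8: 4, 9: 4}

n = sum(frequencies.values())                        # non-missing observations
total = sum(v * c for v, c in frequencies.items())   # the reported "Sum"
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
squared_dev = sum(c * (v - mean) ** 2 for v, c in frequencies.items())
variance = squared_dev / (n - 1)
std_dev = math.sqrt(variance)

# Coefficient of variation: spread relative to the mean.
cv = std_dev / mean

print(n, total, round(mean, 7), round(variance, 7), round(std_dev, 7), round(cv, 8))
```

Dividing by n instead of n - 1 would give a slightly smaller variance that does not match the report, which is why the sample convention is assumed here.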
Histogram with fixed size bins (bins=9)

Value | Count | Frequency (%)
2 | 159 | 44.2%
1 | 118 | 32.8%
3 | 41 | 11.4%
4 | 17 | 4.7%
5 | 10 | 2.8%
6 | 4 | 1.1%
8 | 4 | 1.1%
9 | 4 | 1.1%
7 | 2 | 0.6%
(Missing) | 1 | 0.3%

Value | Count | Frequency (%)
1 | 118 | 32.8%
2 | 159 | 44.2%
3 | 41 | 11.4%
4 | 17 | 4.7%
5 | 10 | 2.8%
6 | 4 | 1.1%
7 | 2 | 0.6%
8 | 4 | 1.1%
9 | 4 | 1.1%

Value | Count | Frequency (%)
9 | 4 | 1.1%
8 | 4 | 1.1%
7 | 2 | 0.6%
6 | 4 | 1.1%
5 | 10 | 2.8%
4 | 17 | 4.7%
3 | 41 | 11.4%
2 | 159 | 44.2%
1 | 118 | 32.8%

출입구번호
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 12
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
<NA>: 336
2/4: 4
2: 4
4: 3
5: 2
Other values (7): 11

Length

Max length: 5
Median length: 4
Mean length: 3.8388889
Min length: 1

Unique

Unique: 3
Unique (%): 0.8%

Sample

1st row: 2/4
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 336 | 93.3%
2/4 | 4 | 1.1%
2 | 4 | 1.1%
4 | 3 | 0.8%
5 | 2 | 0.6%
3 | 2 | 0.6%
7 | 2 | 0.6%
1 | 2 | 0.6%
10 | 2 | 0.6%
3/4/6 | 1 | 0.3%
Other values (2) | 2 | 0.6%
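
Two features of this column stand out. The report counts `<NA>` as an ordinary category (Missing is 0 while `<NA>` covers 336 rows), and some cells pack several exit numbers into one string, such as 2/4 or 3/4/6. A minimal sketch of one way to normalize such values follows. `parse_exits` is a hypothetical helper, and it assumes `/` is the only separator and that non-numeric parts can be dropped. The two values hidden under "Other values (2)" are not shown in the report, so that assumption is untested against them.

```python
def parse_exits(raw):
    """Turn a raw exit-number cell into a list of integers.

    Treats None, an empty string, and the literal "<NA>" as "no exit
    recorded" and returns an empty list for them.
    """
    if raw is None:
        return []
    text = raw.strip()
    if text in ("", "<NA>"):
        return []
    # Split multi-exit cells such as "3/4/6"; keep only purely numeric parts.
    return [int(part) for part in text.split("/") if part.strip().isdigit()]

for cell in ["2/4", "<NA>", "10", "3/4/6"]:
    print(repr(cell), "->", parse_exits(cell))
```

Returning a list for every row keeps single-exit and multi-exit cells in the same shape, which makes later grouping or counting by exit simpler.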

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
na | 336 | 93.3%
2/4 | 4 | 1.1%
2 | 4 | 1.1%
4 | 3 | 0.8%
5 | 2 | 0.6%
3 | 2 | 0.6%
7 | 2 | 0.6%
1 | 2 | 0.6%
10 | 2 | 0.6%
3/4/6 | 1 | 0.3%
Other values (2) | 2 | 0.6%

상세위치
Text

Distinct: 298
Distinct (%): 82.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 69
Median length: 28
Mean length: 16.494444
Min length: 4

Characters and Unicode

Total characters: 5938
Distinct characters: 191
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 279
Unique (%): 77.5%

Sample

1st row: (B1) 역무안전실 출입문 인근
2nd row: (중층) 중층 복도
3rd row: (B2) 기능동 환기실D 복도
4th row: (B2) 상선 2-3 맞은편 소화전 내
5th row: (B2) 상선 7-2 맞은편 소화전 내
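
Many location strings begin with a parenthesized floor tag such as (B1), (B2), or (중층), while others, as the sample rows near the end of the report show, carry no such prefix. A minimal sketch of separating the tag from the rest of the text follows. `split_floor` is a hypothetical helper, and the pattern assumes the tag is always the first thing in the string.

```python
import re

# A leading "(...)" tag, optional whitespace, then the remaining description.
FLOOR_PREFIX = re.compile(r"^\((?P<floor>[^)]+)\)\s*(?P<rest>.*)$")

def split_floor(location):
    """Return (floor_tag, description); floor_tag is None when no prefix exists."""
    match = FLOOR_PREFIX.match(location)
    if match is None:
        return None, location
    return match.group("floor"), match.group("rest")

examples = [
    "(B1) 역무안전실 출입문 인근",
    "(중층) 중층 복도",
    "(B2)상선승강장 중앙",
    "대합실 역무기기실앞(1)",
]
for text in examples:
    print(split_floor(text))
```

Strings whose parentheses appear only at the end, such as 대합실 역무기기실앞(1), are returned unchanged with a floor tag of None, so no information is lost when the prefix is absent.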
Value | Count | Frequency (%)
(not preserved) | 127 | 8.4%
승강장 | 91 | 6.0%
b2 | 88 | 5.8%
역무안전실 | 69 | 4.6%
출입문 | 60 | 4.0%
방향 | 53 | 3.5%
b1 | 53 | 3.5%
(not preserved) | 52 | 3.5%
하선 | 51 | 3.4%
상선 | 49 | 3.3%
Other values (321) | 812 | 54.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1171 | 19.7%
( | 274 | 4.6%
) | 273 | 4.6%
B | 218 | 3.7%
2 | 192 | 3.2%
- | 163 | 2.7%
(not preserved) | 157 | 2.6%
1 | 155 | 2.6%
(not preserved) | 150 | 2.5%
(not preserved) | 145 | 2.4%
Other values (181) | 3040 | 51.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2994 | 50.4%
Space Separator | 1171 | 19.7%
Decimal Number | 673 | 11.3%
Uppercase Letter | 298 | 5.0%
Open Punctuation | 274 | 4.6%
Close Punctuation | 273 | 4.6%
Dash Punctuation | 163 | 2.7%
Other Punctuation | 89 | 1.5%
Lowercase Letter | 3 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 157 | 5.2%
(not preserved) | 150 | 5.0%
(not preserved) | 145 | 4.8%
(not preserved) | 143 | 4.8%
(not preserved) | 138 | 4.6%
(not preserved) | 125 | 4.2%
(not preserved) | 123 | 4.1%
(not preserved) | 110 | 3.7%
(not preserved) | 106 | 3.5%
(not preserved) | 95 | 3.2%
Other values (154) | 1702 | 56.8%

Uppercase Letter
Value | Count | Frequency (%)
B | 218 | 73.2%
E | 22 | 7.4%
S | 18 | 6.0%
L | 10 | 3.4%
D | 9 | 3.0%
P | 8 | 2.7%
A | 7 | 2.3%
V | 3 | 1.0%
G | 2 | 0.7%
M | 1 | 0.3%

Decimal Number
Value | Count | Frequency (%)
2 | 192 | 28.5%
1 | 155 | 23.0%
3 | 119 | 17.7%
4 | 96 | 14.3%
5 | 39 | 5.8%
6 | 34 | 5.1%
8 | 16 | 2.4%
7 | 12 | 1.8%
9 | 6 | 0.9%
0 | 4 | 0.6%

Lowercase Letter
Value | Count | Frequency (%)
s | 2 | 66.7%
o | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1171 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 274 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 273 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 163 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 89 | 100.0%

Most occurring scripts

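Value | Count | Frequency (%)
Hangul | 2994 | 50.4%
Common | 2643 | 44.5%
Latin | 301 | 5.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 157 | 5.2%
(not preserved) | 150 | 5.0%
(not preserved) | 145 | 4.8%
(not preserved) | 143 | 4.8%
(not preserved) | 138 | 4.6%
(not preserved) | 125 | 4.2%
(not preserved) | 123 | 4.1%
(not preserved) | 110 | 3.7%
(not preserved) | 106 | 3.5%
(not preserved) | 95 | 3.2%
Other values (154) | 1702 | 56.8%

Common
Value | Count | Frequency (%)
(space) | 1171 | 44.3%
( | 274 | 10.4%
) | 273 | 10.3%
2 | 192 | 7.3%
- | 163 | 6.2%
1 | 155 | 5.9%
3 | 119 | 4.5%
4 | 96 | 3.6%
/ | 89 | 3.4%
5 | 39 | 1.5%
Other values (5) | 72 | 2.7%

Latin
Value | Count | Frequency (%)
B | 218 | 72.4%
E | 22 | 7.3%
S | 18 | 6.0%
L | 10 | 3.3%
D | 9 | 3.0%
P | 8 | 2.7%
A | 7 | 2.3%
V | 3 | 1.0%
G | 2 | 0.7%
s | 2 | 0.7%
Other values (2) | 2 | 0.7%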

Most occurring blocks

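Value | Count | Frequency (%)
Hangul | 2994 | 50.4%
ASCII | 2944 | 49.6%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1171 | 39.8%
( | 274 | 9.3%
) | 273 | 9.3%
B | 218 | 7.4%
2 | 192 | 6.5%
- | 163 | 5.5%
1 | 155 | 5.3%
3 | 119 | 4.0%
4 | 96 | 3.3%
/ | 89 | 3.0%
Other values (17) | 194 | 6.6%

Hangul
Value | Count | Frequency (%)
(not preserved) | 157 | 5.2%
(not preserved) | 150 | 5.0%
(not preserved) | 145 | 4.8%
(not preserved) | 143 | 4.8%
(not preserved) | 138 | 4.6%
(not preserved) | 125 | 4.2%
(not preserved) | 123 | 4.1%
(not preserved) | 110 | 3.7%
(not preserved) | 106 | 3.5%
(not preserved) | 95 | 3.2%
Other values (154) | 1702 | 56.8%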

Interactions


Correlations

 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호
선명 | 1.000 | 0.990 | 0.000 | 0.705 | 0.776
역명 | 0.990 | 1.000 | 0.000 | 0.733 | 0.925
지상지하 | 0.000 | 0.000 | 1.000 | NaN | NaN
(unnamed) | 0.705 | 0.733 | NaN | 1.000 | NaN
출입구번호 | 0.776 | 0.925 | NaN | NaN | 1.000

 | 선명 | 출입구번호 | 지상지하
선명 | 1.000 | 0.470 | 0.000
출입구번호 | 0.470 | 1.000 | 1.000
지상지하 | 0.000 | 1.000 | 1.000

 | (unnamed) | 선명 | 지상지하 | 출입구번호
(unnamed) | 1.000 | 0.503 | 1.000 | 1.000
선명 | 0.503 | 1.000 | 0.000 | 0.470
지상지하 | 1.000 | 0.000 | 1.000 | 1.000
출입구번호 | 1.000 | 0.470 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

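 | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치
0 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 1 | 2/4 | (B1) 역무안전실 출입문 인근
1 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 중층 | <NA> | <NA> | (중층) 중층 복도
2 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 2 | <NA> | (B2) 기능동 환기실D 복도
3 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 2 | <NA> | (B2) 상선 2-3 맞은편 소화전 내
4 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 2 | <NA> | (B2) 상선 7-2 맞은편 소화전 내
5 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 2 | <NA> | (B2) 하선 2-3 맞은편 소화전
6 | 부산교통공사 | 1호선 | 다대포해수욕장역 | 지하 | 2 | <NA> | (B2) 하선 7-2 맞은편 소화전 내
7 | 부산교통공사 | 1호선 | 다대포항역 | 지하 | 2 | <NA> | (B2) 다대포해수욕장역 방향(상행) 승강장 6-2
8 | 부산교통공사 | 1호선 | 다대포항역 | 지하 | 2 | <NA> | (B2) 낫개역 방향(하행) 승강장 6-2
9 | 부산교통공사 | 1호선 | 다대포항역 | 지하 | 2 | <NA> | (B2) 다대포해수욕장역 방향(상행) 승강장 3-2

 | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치
350 | 부산교통공사 | 4호선 | 금사역 | 지하 | 1 | <NA> | 대합실 역무기기실앞(1)
351 | 부산교통공사 | 4호선 | 금사역 | 지하 | 2 | <NA> | 지하2층 미화원실앞(3)
352 | 부산교통공사 | 4호선 | 금사역 | 지하 | 2 | <NA> | 승강장 하선방향(5)
353 | 부산교통공사 | 4호선 | 금사역 | 지하 | 2 | <NA> | 지하2층 신호통신실앞(4)
354 | 부산교통공사 | 4호선 | 금사역 | 지하 | 2 | <NA> | 승강장 상선방향(6)
355 | 부산교통공사 | 4호선 | 금사역 | 지하 | 1 | <NA> | 대합실 역무실옆(2)
356 | 부산교통공사 | 4호선 | 반여농산물시장역 | 지하 | 2 | <NA> | 상선승강장(금사방면)
357 | 부산교통공사 | 4호선 | 반여농산물시장역 | 지하 | 2 | <NA> | 상선승강장(석대방면)
358 | 부산교통공사 | 4호선 | 반여농산물시장역 | 지하 | 2 | <NA> | 하선승강장(금사방면)
359 | 부산교통공사 | 4호선 | 반여농산물시장역 | 지하 | 2 | <NA> | 하선승강장(석대방면)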

Duplicate rows

Most frequently occurring

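 | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치 | # duplicates
0 | 부산교통공사 | 1호선 | 대티역 | 지하 | 2 | <NA> | (B2) 고객센터앞 | 2
1 | 부산교통공사 | 1호선 | 대티역 | 지하 | 3 | <NA> | (B3) 용역대기실 입구 E/S 1호기 우측 | 2
2 | 부산교통공사 | 1호선 | 대티역 | 지하 | 4 | <NA> | (B4) B환기실 앞 | 2
3 | 부산교통공사 | 1호선 | 범어사역 | 지하 | 1 | <NA> | (B1) 용역대기실 앞 | 2
4 | 부산교통공사 | 1호선 | 범어사역 | 지하 | 2 | <NA> | (B1) 고객안내실 | 2
5 | 부산교통공사 | 1호선 | 서대신역 | 지하 | 2 | <NA> | (B2) 역무실 | 2
6 | 부산교통공사 | 1호선 | 시청역 | 지하 | 1 | <NA> | (B1) 역무안전실 | 2
7 | 부산교통공사 | 1호선 | 신평역 | 지하 | 1 | <NA> | (B1) 역무안전실 내 | 2
8 | 부산교통공사 | 1호선 | 장림역 | 지하 | 1 | <NA> | (B1) 고객센터 내부 종합제어실 안 | 2
9 | 부산교통공사 | 1호선 | 거제역 | 지하 | 2 | <NA> | (B2)상선승강장 중앙 | 2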
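
Duplicate rows like those listed above can be found by counting identical records. The sketch below treats each row as a tuple of its seven field values and keeps the rows that occur more than once. `duplicate_groups` is a hypothetical helper, and the three input rows are a toy excerpt rather than the full dataset.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy input: the first record is repeated, the last one appears once.
rows = [
    ("부산교통공사", "1호선", "대티역", "지하", 2, "<NA>", "(B2) 고객센터앞"),
    ("부산교통공사", "1호선", "대티역", "지하", 2, "<NA>", "(B2) 고객센터앞"),
    ("부산교통공사", "1호선", "대티역", "지하", 4, "<NA>", "(B4) B환기실 앞"),
]

groups = duplicate_groups(rows)
for row, n in groups.items():
    print(n, row)
```

Whether such repeats are data-entry errors or two physical devices stored at the same spot cannot be decided from the report alone, so they are worth reviewing before being dropped.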