Overview

Dataset statistics

Number of variables: 7
Number of observations: 71
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 7
Duplicate rows (%): 9.9%
Total size in memory: 4.2 KiB
Average record size in memory: 59.9 B

Variable types

Categorical: 5
Numeric: 1
Text: 1

Dataset

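Description: For the urban and metropolitan railway stations on Busan Line 3, this dataset lists the railway operator name, line name, station name, and the station floor, exit number, and detailed location of each air respirator.
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041449/fileData.do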

Alerts

철도운영기관명 has constant value "부산교통공사" [Constant]
지상지하 has constant value "지하" [Constant]
Dataset has 7 (9.9%) duplicate rows [Duplicates]
선명 is highly overall correlated with the unnamed numeric variable and 2 other fields [High correlation]
출입구번호 is highly overall correlated with the unnamed numeric variable and 2 other fields [High correlation]
The unnamed numeric variable is highly overall correlated with 선명 and 2 other fields [High correlation]
역명 is highly overall correlated with the unnamed numeric variable and 2 other fields [High correlation]
선명 is highly imbalanced (63.3%) [Imbalance]
출입구번호 is highly imbalanced (71.7%) [Imbalance]
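
The report does not say how the imbalance percentages are derived. As a hedged sketch, a normalized-entropy score (one minus the Shannon entropy of the class frequencies divided by its maximum, log2 of the number of classes) appears to reproduce both figures from the value counts listed for 선명 and 출입구번호. The function name `imbalance_score` is illustrative and not taken from the report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # with a single class the score is defined as 0 here (assumption)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts copied from the Common Values tables of this report.
line_name_counts = [66, 5]             # 선명
exit_number_counts = [64, 2, 2, 2, 1]  # 출입구번호

print(f"선명: {imbalance_score(line_name_counts):.4f}")
print(f"출입구번호: {imbalance_score(exit_number_counts):.4f}")
```

A perfectly balanced variable scores 0 and a variable dominated by one class approaches 1, which is why both variables are flagged.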

Reproduction

Analysis started: 2023-12-12 15:58:41.242793
Analysis finished: 2023-12-12 15:58:41.979584
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
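
A report of this kind can be regenerated along the following lines. This is a minimal sketch, assuming the source file has been downloaded as a CSV; the path, encoding, title, and function name `build_report` are placeholders, and the actual configuration used for this run is the one in config.json, not this snippet.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file with ydata-profiling and write an HTML report.

    All arguments are placeholders; the encoding of the real file is an assumption.
    """
    # Imported lazily so this module loads even without the optional dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="Busan Line 3 air respirator locations",  # hypothetical title
        dataset={
            "description": "Air respirator locations at Busan Line 3 stations",
            "url": "https://www.data.go.kr/data/15041449/fileData.do",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("busan_line3.csv")` with a hypothetical local file name would write `report.html` next to the script.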

Variables

철도운영기관명 (railway operator name)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산교통공사
2nd row: 부산교통공사
3rd row: 부산교통공사
4th row: 부산교통공사
5th row: 부산교통공사

Common Values

Value | Count | Frequency (%)
부산교통공사 | 71 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

선명 (line name)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.0704225
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3호선
2nd row: 3호선
3rd row: 3호선
4th row: 3호선
5th row: 3호선

Common Values

Value | Count | Frequency (%)
3호선 | 66 | 93.0%
3호선 (4-character variant; the extra character is not visible in this export) | 5 | 7.0%
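
The two 선명 values render identically, yet the length statistics (min 3, max 4, mean 3.0704225) show that 5 rows carry one extra character. A hedged sketch of how such labels could be merged, assuming the extra character is trailing whitespace (the export does not show what it actually is):

```python
from collections import Counter

# Reconstructed from the Common Values table. The trailing space in the second
# value is an ASSUMPTION made for illustration; the real character is unknown.
line_names = ["3호선"] * 66 + ["3호선 "] * 5


def normalize_label(value):
    """Trim surrounding whitespace so visually identical labels compare equal."""
    return value.strip()


before = Counter(line_names)
after = Counter(normalize_label(v) for v in line_names)
mean_length = sum(len(v) for v in line_names) / len(line_names)

print(len(before), "distinct values before,", len(after), "after")
```

If the variant turns out to differ by something other than whitespace, the normalization step would need to target that character instead.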

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
3호선 | 71 | 100.0%

역명 (station name)
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): 18.3%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 6
Median length: 3
Mean length: 3.5211268
Min length: 3

Unique

Unique: 2
Unique (%): 2.8%

Sample

1st row: 연산역
2nd row: 연산역
3rd row: 연산역
4th row: 연산역
5th row: 수영역

Common Values

Value | Count | Frequency (%)
배산역 | 10 | 14.1%
물만골역 | 8 | 11.3%
종합운동장역 | 8 | 11.3%
만덕역 | 8 | 11.3%
망미역 | 7 | 9.9%
거제역 | 6 | 8.5%
사직역 | 5 | 7.0%
남산정역 | 5 | 7.0%
연산역 | 4 | 5.6%
미남역 | 4 | 5.6%
Other values (3) | 6 | 8.5%

Length

[Figure: histogram of lengths of the category]

지상지하 (above ground or underground)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지하
2nd row: 지하
3rd row: 지하
4th row: 지하
5th row: 지하

Common Values

Value | Count | Frequency (%)
지하 | 71 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]


(variable name missing in export)
Real number (ℝ)

HIGH CORRELATION

Distinct: 9
Distinct (%): 12.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.5633803
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 771.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 8.5
Maximum: 9
Range: 8
Interquartile range (IQR): 3
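
The quantile statistics can be cross-checked from the value counts listed for this variable. The sketch below rebuilds the 71 values and uses linear interpolation between order statistics (`method="inclusive"` in the standard library), which is an assumption about the interpolation rule that happens to match the reported figures.

```python
import statistics

# Value counts copied from the frequency table of the numeric variable.
value_counts = {1: 15, 2: 18, 3: 8, 4: 8, 5: 8, 6: 4, 7: 2, 8: 4, 9: 4}
values = sorted(v for v, n in value_counts.items() for _ in range(n))

# Cut points with linear interpolation between neighbouring order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

iqr = q3 - q1
value_range = max(values) - min(values)

print(p5, q1, median, q3, p95, iqr, value_range)
```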

Descriptive statistics

Standard deviation: 2.4126594
Coefficient of variation (CV): 0.67707043
Kurtosis: -0.25992961
Mean: 3.5633803
Median Absolute Deviation (MAD): 2
Skewness: 0.87151696
Sum: 253
Variance: 5.8209256
Monotonicity: Not monotonic
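
Several of the descriptive statistics are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A short sketch that recomputes them from the value counts of this variable, assuming the sample (n - 1) form of the variance, which is the form that matches the reported numbers:

```python
import statistics

# Value counts copied from the frequency table of the numeric variable.
value_counts = {1: 15, 2: 18, 3: 8, 4: 8, 5: 8, 6: 4, 7: 2, 8: 4, 9: 4}
values = [v for v, n in value_counts.items() for _ in range(n)]

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation

median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(total, mean, variance, std, cv, mad)
```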
[Figure: histogram with fixed size bins (bins=9)]

Values sorted by frequency

Value | Count | Frequency (%)
2 | 18 | 25.4%
1 | 15 | 21.1%
4 | 8 | 11.3%
3 | 8 | 11.3%
5 | 8 | 11.3%
6 | 4 | 5.6%
8 | 4 | 5.6%
9 | 4 | 5.6%
7 | 2 | 2.8%

Values in ascending order

Value | Count | Frequency (%)
1 | 15 | 21.1%
2 | 18 | 25.4%
3 | 8 | 11.3%
4 | 8 | 11.3%
5 | 8 | 11.3%
6 | 4 | 5.6%
7 | 2 | 2.8%
8 | 4 | 5.6%
9 | 4 | 5.6%

Values in descending order

Value | Count | Frequency (%)
9 | 4 | 5.6%
8 | 4 | 5.6%
7 | 2 | 2.8%
6 | 4 | 5.6%
5 | 8 | 11.3%
4 | 8 | 11.3%
3 | 8 | 11.3%
2 | 18 | 25.4%
1 | 15 | 21.1%

출입구번호 (exit number)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.7042254
Min length: 1

Unique

Unique: 1
Unique (%): 1.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 64 | 90.1%
1 | 2 | 2.8%
2 | 2 | 2.8%
4 | 2 | 2.8%
5 | 1 | 1.4%
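
The report counts 0 missing cells for 출입구번호 even though 64 of the 71 values are `<NA>`. The length statistics (max 4, mean 3.7042254) are consistent with `<NA>` being profiled as a literal 4-character category. A minimal sketch, assuming that reading, of how the placeholder could be turned into a real missing value before computing a missing rate; the helper name `to_optional` is illustrative:

```python
# Reconstructed from the Common Values table of 출입구번호.
exit_numbers = ["<NA>"] * 64 + ["1"] * 2 + ["2"] * 2 + ["4"] * 2 + ["5"]


def to_optional(value, sentinel="<NA>"):
    """Map the placeholder string to None and leave real exit numbers untouched."""
    return None if value == sentinel else value


cleaned = [to_optional(v) for v in exit_numbers]
missing = sum(v is None for v in cleaned)
missing_pct = 100 * missing / len(cleaned)
mean_length = sum(len(v) for v in exit_numbers) / len(exit_numbers)

print(f"{missing} of {len(cleaned)} values missing ({missing_pct:.1f}%)")
```

Under this reading the variable is missing for roughly nine rows in ten, which also explains its imbalance alert.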

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
na | 64 | 90.1%
1 | 2 | 2.8%
2 | 2 | 2.8%
4 | 2 | 2.8%
5 | 1 | 1.4%

상세위치 (detailed location)
Text

Distinct: 62
Distinct (%): 87.3%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Length

Max length: 29
Median length: 23
Mean length: 16.295775
Min length: 5

Characters and Unicode

Total characters: 1157
Distinct characters: 102
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 55
Unique (%): 77.5%

Sample

1st row: (B4) 수영행 승강장 앞
2nd row: (B4) 수영행 승강장 sos 비상구급 구역내
3rd row: (B4) 대저행 승강장 앞
4th row: (B4) 대저행 승강장 뒤
5th row: 상선 미화원실 앞 / 하선 4번 창고 앞

Words

Value | Count | Frequency (%)
승강장 | 26 | 9.5%
(lost in export) | 24 | 8.7%
출입문 | 16 | 5.8%
방향 | 16 | 5.8%
b2 | 10 | 3.6%
역무안전실 | 8 | 2.9%
하선 | 7 | 2.5%
b1 | 7 | 2.5%
근처 | 6 | 2.2%
b5 | 6 | 2.2%
Other values (77) | 149 | 54.2%

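Most occurring characters

Character | Count | Frequency (%)
(space) | 210 | 18.2%
) | 64 | 5.5%
( | 64 | 5.5%
B | 57 | 4.9%
(lost in export) | 38 | 3.3%
(lost in export) | 38 | 3.3%
(lost in export) | 37 | 3.2%
(lost in export) | 37 | 3.2%
1 | 33 | 2.9%
(lost in export) | 28 | 2.4%
Other values (92) | 551 | 47.6%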

Most occurring categories

Category | Count | Frequency (%)
Other Letter | 597 | 51.6%
Space Separator | 210 | 18.2%
Decimal Number | 119 | 10.3%
Uppercase Letter | 70 | 6.1%
Close Punctuation | 64 | 5.5%
Open Punctuation | 64 | 5.5%
Dash Punctuation | 23 | 2.0%
Other Punctuation | 7 | 0.6%
Lowercase Letter | 3 | 0.3%
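
The category names in this table are Unicode general categories. As an illustration, the standard library reports them as two-letter codes, which the sketch below maps back to the names used in the report and applies to the first sample row of 상세위치; the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes and the names shown in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pe": "Close Punctuation",
    "Ps": "Open Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count the characters of `text` per Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "(B4) 수영행 승강장 앞"  # first sample row of 상세위치
counts = category_counts(sample)

for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under Other Letter, which is why that category accounts for just over half of all characters in this variable.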

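Most frequent character per category

Other Letter

Character | Count | Frequency (%)
(lost in export) | 38 | 6.4%
(lost in export) | 38 | 6.4%
(lost in export) | 37 | 6.2%
(lost in export) | 37 | 6.2%
(lost in export) | 28 | 4.7%
(lost in export) | 26 | 4.4%
(lost in export) | 25 | 4.2%
(lost in export) | 24 | 4.0%
(lost in export) | 21 | 3.5%
(lost in export) | 20 | 3.4%
Other values (69) | 303 | 50.8%

Decimal Number

Character | Count | Frequency (%)
1 | 33 | 27.7%
2 | 28 | 23.5%
4 | 22 | 18.5%
3 | 16 | 13.4%
5 | 8 | 6.7%
8 | 5 | 4.2%
9 | 5 | 4.2%
6 | 1 | 0.8%
7 | 1 | 0.8%

Uppercase Letter

Character | Count | Frequency (%)
B | 57 | 81.4%
L | 4 | 5.7%
E | 4 | 5.7%
A | 2 | 2.9%
P | 1 | 1.4%
D | 1 | 1.4%
S | 1 | 1.4%

Lowercase Letter

Character | Count | Frequency (%)
s | 2 | 66.7%
o | 1 | 33.3%

Space Separator

Character | Count | Frequency (%)
(space) | 210 | 100.0%

Close Punctuation

Character | Count | Frequency (%)
) | 64 | 100.0%

Open Punctuation

Character | Count | Frequency (%)
( | 64 | 100.0%

Dash Punctuation

Character | Count | Frequency (%)
- | 23 | 100.0%

Other Punctuation

Character | Count | Frequency (%)
/ | 7 | 100.0%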

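Most occurring scripts

Script | Count | Frequency (%)
Hangul | 597 | 51.6%
Common | 487 | 42.1%
Latin | 73 | 6.3%

Most frequent character per script

Hangul

Character | Count | Frequency (%)
(lost in export) | 38 | 6.4%
(lost in export) | 38 | 6.4%
(lost in export) | 37 | 6.2%
(lost in export) | 37 | 6.2%
(lost in export) | 28 | 4.7%
(lost in export) | 26 | 4.4%
(lost in export) | 25 | 4.2%
(lost in export) | 24 | 4.0%
(lost in export) | 21 | 3.5%
(lost in export) | 20 | 3.4%
Other values (69) | 303 | 50.8%

Common

Character | Count | Frequency (%)
(space) | 210 | 43.1%
) | 64 | 13.1%
( | 64 | 13.1%
1 | 33 | 6.8%
2 | 28 | 5.7%
- | 23 | 4.7%
4 | 22 | 4.5%
3 | 16 | 3.3%
5 | 8 | 1.6%
/ | 7 | 1.4%
Other values (4) | 12 | 2.5%

Latin

Character | Count | Frequency (%)
B | 57 | 78.1%
L | 4 | 5.5%
E | 4 | 5.5%
s | 2 | 2.7%
A | 2 | 2.7%
P | 1 | 1.4%
o | 1 | 1.4%
D | 1 | 1.4%
S | 1 | 1.4%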

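Most occurring blocks

Block | Count | Frequency (%)
Hangul | 597 | 51.6%
ASCII | 560 | 48.4%

Most frequent character per block

ASCII

Character | Count | Frequency (%)
(space) | 210 | 37.5%
) | 64 | 11.4%
( | 64 | 11.4%
B | 57 | 10.2%
1 | 33 | 5.9%
2 | 28 | 5.0%
- | 23 | 4.1%
4 | 22 | 3.9%
3 | 16 | 2.9%
5 | 8 | 1.4%
Other values (13) | 35 | 6.2%

Hangul

Character | Count | Frequency (%)
(lost in export) | 38 | 6.4%
(lost in export) | 38 | 6.4%
(lost in export) | 37 | 6.2%
(lost in export) | 37 | 6.2%
(lost in export) | 28 | 4.7%
(lost in export) | 26 | 4.4%
(lost in export) | 25 | 4.2%
(lost in export) | 24 | 4.0%
(lost in export) | 21 | 3.5%
(lost in export) | 20 | 3.4%
Other values (69) | 303 | 50.8%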

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 선명 | 역명 | (unnamed) | 출입구번호 | 상세위치
선명 | 1.000 | 0.819 | 0.856 | NaN | 1.000
역명 | 0.819 | 1.000 | 0.810 | 0.817 | 0.988
(unnamed) | 0.856 | 0.810 | 1.000 | NaN | 1.000
출입구번호 | NaN | 0.817 | NaN | 1.000 | 0.817
상세위치 | 1.000 | 0.988 | 1.000 | 0.817 | 1.000

[Figure: correlation heatmap]

Variable | 역명 | 선명 | 출입구번호
역명 | 1.000 | 0.727 | 0.612
선명 | 0.727 | 1.000 | 1.000
출입구번호 | 0.612 | 1.000 | 1.000

[Figure: correlation heatmap]

Variable | (unnamed) | 선명 | 역명 | 출입구번호
(unnamed) | 1.000 | 0.841 | 0.503 | 1.000
선명 | 0.841 | 1.000 | 0.727 | 1.000
역명 | 0.503 | 0.727 | 1.000 | 0.612
출입구번호 | 1.000 | 1.000 | 0.612 | 1.000

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

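First rows

Index | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치
0 | 부산교통공사 | 3호선 | 연산역 | 지하 | 4 | <NA> | (B4) 수영행 승강장 앞
1 | 부산교통공사 | 3호선 | 연산역 | 지하 | 4 | <NA> | (B4) 수영행 승강장 sos 비상구급 구역내
2 | 부산교통공사 | 3호선 | 연산역 | 지하 | 4 | <NA> | (B4) 대저행 승강장 앞
3 | 부산교통공사 | 3호선 | 연산역 | 지하 | 4 | <NA> | (B4) 대저행 승강장 뒤
4 | 부산교통공사 | 3호선 | 수영역 | 지하 | 3 | <NA> | 상선 미화원실 앞 / 하선 4번 창고 앞
5 | 부산교통공사 | 3호선 | 덕천역 | 지하 | 3 | <NA> | 상선 4-3 앞 / 하선 1-1 앞
6 | 부산교통공사 | 3호선 | 망미역 | 지하 | 1 | 5 | 역무안전실 내
7 | 부산교통공사 | 3호선 | 망미역 | 지하 | 5 | <NA> | 지하 5층 상측
8 | 부산교통공사 | 3호선 | 망미역 | 지하 | 5 | <NA> | 지하 5층 하측
9 | 부산교통공사 | 3호선 | 망미역 | 지하 | 6 | <NA> | 하선 앞쪽

Last rows

Index | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치
61 | 부산교통공사 | 3호선 | 만덕역 | 지하 | 9 | <NA> | (B9)남산정방면(하선)승강장 3-2 출입문 앞
62 | 부산교통공사 | 3호선 | 남산정역 | 지하 | 1 | 2 | (B1) 2번출구 인근 역무안전실
63 | 부산교통공사 | 3호선 | 남산정역 | 지하 | 2 | <NA> | (B2) 만덕역 방향 상선 승강장 2-1 출입문 근처
64 | 부산교통공사 | 3호선 | 남산정역 | 지하 | 2 | <NA> | (B3) 만덕역 방향 상선 승강장 3-4 출입문 근처
65 | 부산교통공사 | 3호선 | 남산정역 | 지하 | 2 | <NA> | (B2) 숙등역 방향 하선 승강장 1-3 출입문 근처
66 | 부산교통공사 | 3호선 | 남산정역 | 지하 | 2 | <NA> | (B2) 숙등역 방향 하선 승강장 4-1 출입문 근처
67 | 부산교통공사 | 3호선 | 숙등역 | 지하 | 1 | <NA> | (B1)역무안전실
68 | 부산교통공사 | 3호선 | 숙등역 | 지하 | 2 | <NA> | (B2)A/B 환기실
69 | 부산교통공사 | 3호선 | 숙등역 | 지하 | 3 | <NA> | (B3)수영행 승강장
70 | 부산교통공사 | 3호선 | 숙등역 | 지하 | 3 | <NA> | (B3)대저행 승강장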

Duplicate rows

Most frequently occurring

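Index | 철도운영기관명 | 선명 | 역명 | 지상지하 | (unnamed) | 출입구번호 | 상세위치 | # duplicates
0 | 부산교통공사 | 3호선 | 거제역 | 지하 | 2 | <NA> | (B2)상선승강장 중앙 | 2
1 | 부산교통공사 | 3호선 | 거제역 | 지하 | 2 | <NA> | (B2)하선승강장 중앙 | 2
2 | 부산교통공사 | 3호선 | 만덕역 | 지하 | 1 | 4 | (B1) 4번출구 역무안전실 | 2
3 | 부산교통공사 | 3호선 | 미남역 | 지하 | 1 | <NA> | (B1)역무안전실 내 | 2
4 | 부산교통공사 | 3호선 | 배산역 | 지하 | 1 | <NA> | 역무안전실 | 2
5 | 부산교통공사 | 3호선 | 종합운동장역 | 지하 | 2 | <NA> | (B2) 역무실 | 2
6 | 부산교통공사 | 3호선 | 망미역 | 지하 | 6 | <NA> | 하선 뒤쪽 | 2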
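
Each of the 7 rows above occurs twice, so 7 of the 71 observations are surplus copies, which matches the reported 9.9%. A minimal sketch of how such groups can be found, using a handful of rows taken from this report with the two constant columns left out for brevity; the helper name `duplicate_groups` is illustrative:

```python
from collections import Counter

# (역명, floor, 출입구번호, 상세위치) tuples copied from the sample and duplicate tables.
rows = [
    ("거제역", 2, "<NA>", "(B2)상선승강장 중앙"),
    ("거제역", 2, "<NA>", "(B2)상선승강장 중앙"),
    ("망미역", 6, "<NA>", "하선 뒤쪽"),
    ("망미역", 6, "<NA>", "하선 뒤쪽"),
    ("망미역", 6, "<NA>", "하선 앞쪽"),
]


def duplicate_groups(records):
    """Return each record that occurs more than once together with its count."""
    counts = Counter(records)
    return {record: n for record, n in counts.items() if n > 1}


groups = duplicate_groups(rows)
surplus_rows = sum(n - 1 for n in groups.values())  # copies beyond the first

# Share of surplus copies in the full dataset, as reported in the overview.
reported_pct = round(100 * 7 / 71, 1)

print(len(groups), "duplicate groups,", surplus_rows, "surplus rows")
```

Whether these repeats are data-entry duplicates or several respirators stored at the same spot cannot be decided from the report alone, so dropping them is a judgment call.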