Overview

Dataset statistics

Number of variables: 6
Number of observations: 1093
Missing cells: 1029
Missing cells (%): 15.7%
Duplicate rows: 4
Duplicate rows (%): 0.4%
Total size in memory: 54.6 KiB
Average record size in memory: 51.1 B
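
The percentage figures in this table follow from the raw counts. A minimal sketch of that arithmetic, assuming that "Missing cells (%)" is taken over all cells (variables times observations) and "Duplicate rows (%)" over all observations; the variable names are illustrative only:

```python
# Values copied from the "Dataset statistics" table.
n_variables = 6
n_observations = 1093
missing_cells = 1029
duplicate_rows = 4

# Assumption: the missing-cell share is measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the rounded results agree with the 15.7% and 0.4% reported above.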

Variable types

Categorical: 3
Text: 1
Numeric: 2

Dataset

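Description: Data from Busan Transportation Corporation (부산교통공사) on the number of lifts (escalators) in stations on Lines 1 to 4, with the installation year and refurbishment year of each escalator unit. Columns: 호선 (line), 역명 (station name), 호기 (unit number), 설치년도 (installation year), 교체주기(개량년도) (replacement cycle, given as refurbishment year).
URL: https://www.data.go.kr/data/15052663/fileData.do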

Alerts

Dataset has 4 (0.4%) duplicate rows [Duplicates]
설치년도 is highly overall correlated with 교체주기(개량년도) and 1 other field [High correlation]
교체주기(개량년도) is highly overall correlated with 설치년도 and 1 other field [High correlation]
호선 is highly overall correlated with 설치년도 and 1 other field [High correlation]
교체주기(개량년도) has 1029 (94.1%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 18:24:53.553423
Analysis finished: 2023-12-12 18:24:54.404958
Duration: 0.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선 (line)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Value  Count
2      376
1      297
3      242
4      178

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
2      376    34.4%
1      297    27.2%
3      242    22.1%
4      178    16.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

역명 (station name)
Text

Distinct: 114
Distinct (%): 10.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 10
Median length: 2
Mean length: 2.6331199
Min length: 2

Characters and Unicode

Total characters: 2878
Distinct characters: 135
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
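
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `category_counts` and the sample strings are hypothetical and are not rows taken from the dataset; how the report itself derives these figures is not shown here.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)

# Hypothetical values for illustration only.
sample = ["서면", "동래", "가(나)"]
counts = category_counts(sample)

# "Lo" is Other Letter, "Ps" is Open Punctuation, "Pe" is Close Punctuation.
print(counts)
```

Hangul syllables fall under Other Letter, while the parentheses fall under Open Punctuation and Close Punctuation, which are the three categories listed for this variable.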

Unique

Unique: 2
Unique (%): 0.2%

Sample

1st row: 서면
2nd row: 서면
3rd row: 서면
4th row: 서면
5th row: 서면

Value               Count  Frequency (%)
동래                32     2.9%
서면                27     2.5%
만덕                27     2.5%
다대포항            26     2.4%
배산                22     2.0%
수안                21     1.9%
동매                21     1.9%
연산동              20     1.8%
센텀시티            20     1.8%
다대포해수욕장      19     1.7%
Other values (104)  858    78.5%

Most occurring characters

Value               Count  Frequency (%)
[?]                 175    6.1%
[?]                 159    5.5%
[?]                 143    5.0%
[?]                 103    3.6%
[?]                 88     3.1%
[?]                 79     2.7%
[?]                 70     2.4%
[?]                 64     2.2%
[?]                 63     2.2%
[?]                 53     1.8%
Other values (125)  1881   65.4%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       2870   99.7%
Close Punctuation  4      0.1%
Open Punctuation   4      0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[?]                 175    6.1%
[?]                 159    5.5%
[?]                 143    5.0%
[?]                 103    3.6%
[?]                 88     3.1%
[?]                 79     2.8%
[?]                 70     2.4%
[?]                 64     2.2%
[?]                 63     2.2%
[?]                 53     1.8%
Other values (123)  1873   65.3%

Close Punctuation
Value  Count  Frequency (%)
)      4      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  2870   99.7%
Common  8      0.3%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[?]                 175    6.1%
[?]                 159    5.5%
[?]                 143    5.0%
[?]                 103    3.6%
[?]                 88     3.1%
[?]                 79     2.8%
[?]                 70     2.4%
[?]                 64     2.2%
[?]                 63     2.2%
[?]                 53     1.8%
Other values (123)  1873   65.3%

Common
Value  Count  Frequency (%)
)      4      50.0%
(      4      50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  2870   99.7%
ASCII   8      0.3%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
[?]                 175    6.1%
[?]                 159    5.5%
[?]                 143    5.0%
[?]                 103    3.6%
[?]                 88     3.1%
[?]                 79     2.8%
[?]                 70     2.4%
[?]                 64     2.2%
[?]                 63     2.2%
[?]                 53     1.8%
Other values (123)  1873   65.3%

ASCII
Value  Count  Frequency (%)
)      4      50.0%
(      4      50.0%

구분 (type)
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Value         Count
에스컬레이터  649
엘리베이터    444

Length

Max length: 6
Median length: 6
Mean length: 5.5937786
Min length: 5
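
The mean length can be recovered from the two value counts of this variable as a count-weighted average of string lengths. A small sketch, assuming length is measured in Unicode code points (which is what Python's `len` returns for `str`); the variable names are illustrative:

```python
# Counts taken from the 구분 value table in this report.
value_counts = {"에스컬레이터": 649, "엘리베이터": 444}

total = sum(value_counts.values())
# Weighted mean: each value's length counts once per occurrence.
mean_length = sum(len(value) * count for value, count in value_counts.items()) / total

print(total, round(mean_length, 7))
```

With lengths 6 and 5, this gives 6114 / 1093, in line with the mean length reported above.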

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 에스컬레이터
2nd row: 에스컬레이터
3rd row: 에스컬레이터
4th row: 에스컬레이터
5th row: 에스컬레이터

Common Values

Value         Count  Frequency (%)
에스컬레이터  649    59.4%
엘리베이터    444    40.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

호기 (unit number)
Categorical

Distinct: 27
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Value              Count
1                  192
2                  187
3                  168
4                  149
5                  83
Other values (22)  314

Length

Max length: 3
Median length: 1
Mean length: 1.1143641
Min length: 1

Unique

Unique: 7
Unique (%): 0.6%

Sample

1st row: 1
2nd row: 2
3rd row: 3
4th row: 4
5th row: 5

Common Values

Value              Count  Frequency (%)
1                  192    17.6%
2                  187    17.1%
3                  168    15.4%
4                  149    13.6%
5                  83     7.6%
6                  69     6.3%
7                  53     4.8%
8                  45     4.1%
9                  28     2.6%
10                 26     2.4%
Other values (17)  93     8.5%

Length

[Figure: Histogram of lengths of the category]

설치년도 (installation year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 27
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.9003
Minimum: 1985
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.7 KiB

Quantile statistics

Minimum: 1985
5-th percentile: 1998
Q1: 2005
median: 2009
Q3: 2011
95-th percentile: 2017
Maximum: 2023
Range: 38
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 6.398126
Coefficient of variation (CV): 0.003186476
Kurtosis: 1.9583401
Mean: 2007.9003
Median Absolute Deviation (MAD): 4
Skewness: -0.94482953
Sum: 2194635
Variance: 40.936016
Monotonicity: Not monotonic
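
Several of these figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, the IQR is Q3 minus Q1, and the range is maximum minus minimum. A short sketch checking them against the rounded values printed in this report (variable names are illustrative):

```python
# Values copied from the 설치년도 statistics in this report (already rounded).
mean = 2007.9003
std = 6.398126
variance = 40.936016
q1, q3 = 2005, 2011
minimum, maximum = 1985, 2023

cv = std / mean                 # coefficient of variation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

print(f"CV = {cv:.9f}")
print(f"std squared = {std ** 2:.4f} (reported variance {variance})")
print(f"IQR = {iqr}, range = {value_range}")
```

Because the inputs are rounded, the recomputed values match the reported ones only to within rounding error.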
[Figure: Histogram with fixed size bins (bins=27)]

Common values

Value              Count  Frequency (%)
2011               270    24.7%
2005               256    23.4%
2017               104    9.5%
2012               68     6.2%
2001               64     5.9%
1998               54     4.9%
2007               52     4.8%
2008               35     3.2%
2010               28     2.6%
2016               25     2.3%
Other values (17)  137    12.5%

Minimum 10 values

Value  Count  Frequency (%)
1985   13     1.2%
1987   6      0.5%
1988   8      0.7%
1989   7      0.6%
1994   3      0.3%
1998   54     4.9%
2001   64     5.9%
2002   22     2.0%
2003   2      0.2%
2004   1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
2023   1      0.1%
2022   4      0.4%
2021   6      0.5%
2018   6      0.5%
2017   104    9.5%
2016   25     2.3%
2015   9      0.8%
2014   5      0.5%
2013   6      0.5%
2012   68     6.2%

교체주기(개량년도) (replacement cycle, given as refurbishment year)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 19
Distinct (%): 29.7%
Missing: 1029
Missing (%): 94.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010.4062
Minimum: 1998
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.7 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 2004
Q1: 2005
median: 2011
Q3: 2016
95-th percentile: 2019
Maximum: 2022
Range: 24
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 5.9939619
Coefficient of variation (CV): 0.002981468
Kurtosis: -1.3519612
Mean: 2010.4062
Median Absolute Deviation (MAD): 6
Skewness: 0.10356429
Sum: 128666
Variance: 35.927579
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=19)]

Common values

Value             Count  Frequency (%)
2004              11     1.0%
2016              10     0.9%
2005              10     0.9%
2015              5      0.5%
2011              4      0.4%
2006              4      0.4%
2017              3      0.3%
2018              2      0.2%
2019              2      0.2%
2003              2      0.2%
Other values (9)  11     1.0%
(Missing)         1029   94.1%

Minimum 10 values

Value  Count  Frequency (%)
1998   1      0.1%
2003   2      0.2%
2004   11     1.0%
2005   10     0.9%
2006   4      0.4%
2008   1      0.1%
2009   1      0.1%
2010   1      0.1%
2011   4      0.4%
2012   2      0.2%

Maximum 10 values

Value  Count  Frequency (%)
2022   1      0.1%
2021   1      0.1%
2020   1      0.1%
2019   2      0.2%
2018   2      0.2%
2017   3      0.3%
2016   10     0.9%
2015   5      0.5%
2013   2      0.2%
2012   2      0.2%

Interactions

[Figures: interaction plots (4 panels)]

Correlations

[Figure: correlation heatmap]

                    호선   구분   호기   설치년도  교체주기(개량년도)
호선                1.000  0.326  0.201  0.842     0.744
구분                0.326  1.000  0.542  0.560     0.091
호기                0.201  0.542  1.000  0.000     0.000
설치년도            0.842  0.560  0.000  1.000     0.743
교체주기(개량년도)  0.744  0.091  0.000  0.743     1.000

[Figure: correlation heatmap]

      호선   구분   호기
호선  1.000  0.217  0.106
구분  0.217  1.000  0.463
호기  0.106  0.463  1.000

[Figure: correlation heatmap]

                    설치년도  교체주기(개량년도)  호선   구분   호기
설치년도            1.000     0.706               0.687  0.438  0.000
교체주기(개량년도)  0.706     1.000               0.604  0.000  0.000
호선                0.687     0.604               1.000  0.217  0.106
구분                0.438     0.000               0.217  1.000  0.463
호기                0.000     0.000               0.106  0.463  1.000

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
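
The per-column nullity behind these plots is just a count of missing entries in each column. A minimal sketch in plain Python, assuming rows are held as dictionaries and `None` stands for `<NA>`; the helper name `nullity_by_column` and the three rows are hypothetical, not the dataset itself:

```python
# Hypothetical rows for illustration; None stands for a missing value (<NA>).
rows = [
    {"설치년도": 1985, "교체주기(개량년도)": 2004},
    {"설치년도": 2011, "교체주기(개량년도)": None},
    {"설치년도": 2011, "교체주기(개량년도)": None},
]

def nullity_by_column(records):
    """Return {column: number of missing values} for a list of dict records."""
    columns = records[0].keys()
    return {col: sum(1 for r in records if r[col] is None) for col in columns}

missing = nullity_by_column(rows)
print(missing)
```

Applied to the full dataset, a count of this kind would be expected to show 1029 missing values in 교체주기(개량년도) and none elsewhere, as the alerts report.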

Sample

First rows

   호선  역명  구분          호기  설치년도  교체주기(개량년도)
0  1     서면  에스컬레이터  1     1985      2004
1  1     서면  에스컬레이터  2     1985      2004
2  1     서면  에스컬레이터  3     1985      2004
3  1     서면  에스컬레이터  4     1985      2004
4  1     서면  에스컬레이터  5     1985      2004
5  1     서면  에스컬레이터  6     1985      2004
6  1     서면  에스컬레이터  7     1985      2004
7  1     서면  에스컬레이터  8     1985      2004
8  1     서면  에스컬레이터  9     1985      2004
9  1     서면  에스컬레이터  10    1985      2004

Last rows

      호선  역명  구분          호기  설치년도  교체주기(개량년도)
1083  4     고촌  에스컬레이터  5     2011      <NA>
1084  4     고촌  에스컬레이터  6     2011      <NA>
1085  4     고촌  엘리베이터    1     2011      <NA>
1086  4     고촌  엘리베이터    2     2011      <NA>
1087  4     고촌  엘리베이터    3     2011      <NA>
1088  4     수안  엘리베이터    2     2011      <NA>
1089  4     수안  엘리베이터    3     2011      <NA>
1090  4     수안  엘리베이터    4     2011      <NA>
1091  4     반여  엘리베이터    1     2011      <NA>
1092  4     반여  엘리베이터    2     2011      <NA>

Duplicate rows

Most frequently occurring

   호선  역명    구분          호기  설치년도  교체주기(개량년도)  # duplicates
0  1     남포    에스컬레이터  7     2017      <NA>                2
1  4     윗반송  엘리베이터    1     2011      <NA>                2
2  4     윗반송  엘리베이터    2     2011      <NA>                2
3  4     윗반송  엘리베이터    3     2011      <NA>                2
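
A duplicate table of this kind can be approximated by counting how often each full row occurs. The sketch below does this with `collections.Counter` on a small illustrative list of rows (not the full dataset), with `None` standing for `<NA>`. It reports both the number of distinct rows that occur more than once and the number of extra copies. The report does not state which of the two it means by "duplicate rows"; for the table above both would equal 4.

```python
from collections import Counter

# Rows as (호선, 역명, 구분, 호기, 설치년도, 교체주기(개량년도)) tuples.
# A tiny illustrative list; None stands for <NA>.
rows = [
    (1, "남포", "에스컬레이터", 7, 2017, None),
    (1, "남포", "에스컬레이터", 7, 2017, None),
    (4, "윗반송", "엘리베이터", 1, 2011, None),
    (4, "윗반송", "엘리베이터", 1, 2011, None),
    (4, "윗반송", "엘리베이터", 2, 2011, None),
]

occurrences = Counter(rows)
# Keep only rows that appear more than once.
duplicated = {row: n for row, n in occurrences.items() if n > 1}
# Copies beyond the first occurrence of each duplicated row.
extra_copies = sum(n - 1 for n in duplicated.values())

print(len(duplicated), extra_copies)
```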