Overview

Dataset statistics

Number of variables: 5
Number of observations: 644
Missing cells: 584
Missing cells (%): 18.1%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 27.2 KiB
Average record size in memory: 43.2 B
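
The two percentages above can be re-derived from the counts. This is a minimal sketch of that arithmetic. It assumes, as the figures suggest, that the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all rows; the variable names are illustrative.

```python
# Counts copied from the dataset statistics above.
n_variables = 5
n_observations = 644
missing_cells = 584
duplicate_rows = 1

# Assumption: the denominator for missing cells is every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the denominator for duplicates is the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Rounded to one decimal place, both results agree with the reported 18.1% and 0.2%.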

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

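Description: 부산교통공사_승강기연도별설치현황_20210526 (Busan Transportation Corporation_Elevator installation status by year_20210526)
Author: 부산교통공사 (Busan Transportation Corporation)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15052663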

Alerts

Dataset has 1 (0.2%) duplicate row [Duplicates]
설치년도 is highly overall correlated with 교체주기(개량년도) and 1 other field [High correlation]
교체주기(개량년도) is highly overall correlated with 설치년도 and 1 other field [High correlation]
호선 is highly overall correlated with 설치년도 and 1 other field [High correlation]
교체주기(개량년도) has 584 (90.7%) missing values [Missing]

Reproduction

Analysis started: 2023-12-10 16:07:25.198484
Analysis finished: 2023-12-10 16:07:26.183882
Duration: 0.99 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
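
A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, not the command actually used for this report: the CSV file name, the `cp949` encoding, and the output path are assumptions, and `build_report` is a hypothetical helper. Only `ProfileReport` and its `to_file` method come from ydata-profiling.

```python
from pathlib import Path


def build_report(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write the HTML report (hypothetical helper)."""
    # Imported here so the sketch can be loaded without the heavy dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source CSV uses a Korean legacy encoding; adjust as needed.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Use the file name (without extension) as the report title.
    profile = ProfileReport(df, title=Path(csv_path).stem)
    profile.to_file(html_path)
    return profile


# Example call with hypothetical paths:
# build_report("elevators.csv", "elevators_report.html")
```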

Variables

호선 (subway line number)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
2      200    31.1%
3      174    27.0%
1      139    21.6%
4      131    20.3%

Length

Histogram of lengths of the category


역명 (station name)
Text

Distinct: 77
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 10
Median length: 2
Mean length: 2.765528
Min length: 2

Characters and Unicode

Total characters: 1781
Distinct characters: 110
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.5%

Sample

1st row: 노포
2nd row: 범어사
3rd row: 동래
4th row: 동래
5th row: 교대

Value              Count  Frequency (%)
동래               26     4.0%
다대포항           21     3.2%
만덕               20     3.1%
연산동             20     3.1%
배산               16     2.4%
서면               16     2.4%
수안               16     2.4%
센텀시티           14     2.1%
장림               14     2.1%
종합운동장         13     2.0%
Other values (68)  479    73.1%

Most occurring characters

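Value               Count  Frequency (%)
(unrecovered)       120    6.7%
(unrecovered)       104    5.8%
(unrecovered)       97     5.4%
(unrecovered)       70     3.9%
(unrecovered)       57     3.2%
(unrecovered)       48     2.7%
(unrecovered)       47     2.6%
(unrecovered)       43     2.4%
(unrecovered)       37     2.1%
(unrecovered)       37     2.1%
Other values (100)  1121   62.9%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1751   98.3%
Space Separator    22     1.2%
Close Punctuation  4      0.2%
Open Punctuation   4      0.2%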

Most frequent character per category

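Other Letter
Value              Count  Frequency (%)
(unrecovered)      120    6.9%
(unrecovered)      104    5.9%
(unrecovered)      97     5.5%
(unrecovered)      70     4.0%
(unrecovered)      57     3.3%
(unrecovered)      48     2.7%
(unrecovered)      47     2.7%
(unrecovered)      43     2.5%
(unrecovered)      37     2.1%
(unrecovered)      37     2.1%
Other values (97)  1091   62.3%

Space Separator
Value    Count  Frequency (%)
(space)  22     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      4      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      4      100.0%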

Most occurring scripts

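Value   Count  Frequency (%)
Hangul  1751   98.3%
Common  30     1.7%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(unrecovered)      120    6.9%
(unrecovered)      104    5.9%
(unrecovered)      97     5.5%
(unrecovered)      70     4.0%
(unrecovered)      57     3.3%
(unrecovered)      48     2.7%
(unrecovered)      47     2.7%
(unrecovered)      43     2.5%
(unrecovered)      37     2.1%
(unrecovered)      37     2.1%
Other values (97)  1091   62.3%

Common
Value    Count  Frequency (%)
(space)  22     73.3%
)        4      13.3%
(        4      13.3%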

Most occurring blocks

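Value   Count  Frequency (%)
Hangul  1751   98.3%
ASCII   30     1.7%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
(unrecovered)      120    6.9%
(unrecovered)      104    5.9%
(unrecovered)      97     5.5%
(unrecovered)      70     4.0%
(unrecovered)      57     3.3%
(unrecovered)      48     2.7%
(unrecovered)      47     2.7%
(unrecovered)      43     2.5%
(unrecovered)      37     2.1%
(unrecovered)      37     2.1%
Other values (97)  1091   62.3%

ASCII
Value    Count  Frequency (%)
(space)  22     73.3%
)        4      13.3%
(        4      13.3%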

호기 (unit number)
Categorical

Distinct: 27
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 3
Median length: 1
Mean length: 1.1925466
Min length: 1

Unique

Unique: 7
Unique (%): 1.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 1

Common Values

Value              Count  Frequency (%)
1                  79     12.3%
2                  77     12.0%
3                  68     10.6%
4                  67     10.4%
5                  59     9.2%
6                  57     8.9%
7                  49     7.6%
8                  43     6.7%
9                  27     4.2%
10                 25     3.9%
Other values (17)  93     14.4%

Length

Histogram of lengths of the category

설치년도 (installation year)
Real number (ℝ)

HIGH CORRELATION

Distinct: 22
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.4239
Minimum: 1985
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1985
5-th percentile: 1989
Q1: 2002
Median: 2005
Q3: 2011
95-th percentile: 2017
Maximum: 2021
Range: 36
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.32019
Coefficient of variation (CV): 0.0036483766
Kurtosis: 0.79359092
Mean: 2006.4239
Median Absolute Deviation (MAD): 6
Skewness: -0.6852411
Sum: 1292137
Variance: 53.585182
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=22)
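
Several of the statistics above are tied together by simple identities: the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the mean is the sum divided by the number of observations. The sketch below re-derives them from the reported (rounded) figures, so agreement is only expected up to rounding; the variable names are illustrative.

```python
# Figures copied from the 설치년도 statistics in this report (already rounded).
n = 644
total = 1292137
mean = 2006.4239
std = 7.32019
minimum, maximum = 1985, 2021
q1, q3 = 2002, 2011

# Derived quantities, each computed from the definitions named in the report.
derived_mean = total / n      # Mean = Sum / number of observations
variance = std ** 2           # Variance = standard deviation squared
cv = std / mean               # Coefficient of variation = std / mean
value_range = maximum - minimum
iqr = q3 - q1

print(f"mean     {derived_mean:.4f}")
print(f"variance {variance:.6f}")
print(f"CV       {cv:.10f}")
print(f"range    {value_range}")
print(f"IQR      {iqr}")
```

A CV this small is expected for calendar years: the spread of a few years is tiny relative to a mean near 2006, so the CV says little about the variable.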
Common values

Value              Count  Frequency (%)
2005               167    25.9%
2011               135    21.0%
2017               88     13.7%
2001               64     9.9%
1998               52     8.1%
2007               34     5.3%
2008               18     2.8%
1985               13     2.0%
2002               12     1.9%
2006               9      1.4%
Other values (12)  52     8.1%

Minimum 10 values

Value  Count  Frequency (%)
1985   13     2.0%
1987   6      0.9%
1988   8      1.2%
1989   7      1.1%
1994   4      0.6%
1998   52     8.1%
2001   64     9.9%
2002   12     1.9%
2004   1      0.2%
2005   167    25.9%

Maximum 10 values

Value  Count  Frequency (%)
2021   2      0.3%
2018   4      0.6%
2017   88     13.7%
2016   5      0.8%
2015   1      0.2%
2014   4      0.6%
2012   4      0.6%
2011   135    21.0%
2009   6      0.9%
2008   18     2.8%

교체주기(개량년도) (replacement cycle, given as year of upgrade)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 17
Distinct (%): 28.3%
Missing: 584
Missing (%): 90.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.6167
Minimum: 1998
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 2003.95
Q1: 2004.75
Median: 2008.5
Q3: 2016
95-th percentile: 2018
Maximum: 2020
Range: 22
Interquartile range (IQR): 11.25
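
The fractional quantiles above (for example Q1 = 2004.75 for a variable that holds whole years) are what linear interpolation between neighbouring sorted values produces. The sketch below rebuilds the 60 non-missing values from the value/count tables in this report and applies that interpolation. Linear interpolation is an assumption: it is the default in NumPy and pandas and reproduces the reported figures, but the report does not state the method. The `quantile` helper is illustrative.

```python
import math

# Non-missing values of 교체주기(개량년도), rebuilt from the value/count
# tables in this report (17 distinct years, 60 values in total).
counts = {
    1998: 1, 2003: 2, 2004: 12, 2005: 10, 2006: 4, 2008: 1, 2009: 1,
    2010: 1, 2011: 4, 2012: 2, 2013: 2, 2015: 4, 2016: 9, 2017: 3,
    2018: 2, 2019: 1, 2020: 1,
}
values = sorted(year for year, count in counts.items() for _ in range(count))


def quantile(sorted_values, p):
    """Quantile by linear interpolation between the two nearest order statistics."""
    position = (len(sorted_values) - 1) * p
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    low_value, high_value = sorted_values[lower], sorted_values[upper]
    return low_value + (high_value - low_value) * fraction


for label, p in [("5%", 0.05), ("Q1", 0.25), ("median", 0.5), ("Q3", 0.75), ("95%", 0.95)]:
    print(label, round(quantile(values, p), 2))

print("IQR", quantile(values, 0.75) - quantile(values, 0.25))
```

With 60 values, Q1 falls at position 14.75 in the zero-based sorted list, three quarters of the way from 2004 to 2005, which gives 2004.75.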

Descriptive statistics

Standard deviation: 5.66611
Coefficient of variation (CV): 0.0028194979
Kurtosis: -1.4105201
Mean: 2009.6167
Median Absolute Deviation (MAD): 4.5
Skewness: 0.18898412
Sum: 120577
Variance: 32.104802
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)
Common values

Value             Count  Frequency (%)
2004              12     1.9%
2005              10     1.6%
2016              9      1.4%
2006              4      0.6%
2015              4      0.6%
2011              4      0.6%
2017              3      0.5%
2012              2      0.3%
2013              2      0.3%
2003              2      0.3%
Other values (7)  8      1.2%
(Missing)         584    90.7%

Minimum 10 values

Value  Count  Frequency (%)
1998   1      0.2%
2003   2      0.3%
2004   12     1.9%
2005   10     1.6%
2006   4      0.6%
2008   1      0.2%
2009   1      0.2%
2010   1      0.2%
2011   4      0.6%
2012   2      0.3%

Maximum 10 values

Value  Count  Frequency (%)
2020   1      0.2%
2019   1      0.2%
2018   2      0.3%
2017   3      0.5%
2016   9      1.4%
2015   4      0.6%
2013   2      0.3%
2012   2      0.3%
2011   4      0.6%
2010   1      0.2%

Correlations

                    호선   역명   호기   설치년도  교체주기(개량년도)
호선                1.000  0.998  0.109  0.987     1.000
역명                0.998  1.000  0.000  0.997     0.883
호기                0.109  0.000  1.000  0.000     0.000
설치년도            0.987  0.997  0.000  1.000     0.720
교체주기(개량년도)  1.000  0.883  0.000  0.720     1.000

      호기   호선
호기  1.000  0.056
호선  0.056  1.000

                    설치년도  교체주기(개량년도)  호선   호기
설치년도            1.000     0.659               0.941  0.000
교체주기(개량년도)  0.659     1.000               0.938  0.000
호선                0.941     0.938               1.000  0.056
호기                0.000     0.000               0.056  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     호선  역명    호기  설치년도  교체주기(개량년도)
0    1     노포    1     2004      <NA>
1    1     범어사  1     1985      2011
2    1     동래    1     2011      <NA>
3    1     동래    2     2011      <NA>
4    1     교대    1     2016      <NA>
5    1     연산    1     2006      <NA>
6    1     연산    2     2006      <NA>
7    1     연산    3     2008      <NA>
8    1     연산    4     2008      <NA>
9    1     서면    1     1985      2004

     호선  역명    호기  설치년도  교체주기(개량년도)
634  4     고촌    3     2011      <NA>
635  4     고촌    4     2011      <NA>
636  4     고촌    5     2011      <NA>
637  4     고촌    6     2011      <NA>
638  4     안평    1     2011      <NA>
639  4     안평    2     2011      <NA>
640  4     안평    3     2011      <NA>
641  4     안평    4     2011      <NA>
642  4     안평    5     2011      <NA>
643  4     안평    6     2011      <NA>

Duplicate rows

Most frequently occurring

   호선  역명  호기  설치년도  교체주기(개량년도)  # duplicates
0  1     남포  7     2017      <NA>                2
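
The table above lists one distinct row that occurs twice. The sketch below shows one way to find such rows by counting identical tuples. The row values are copied from tables in this report, but the list is an illustrative subset rather than the full dataset, and `None` stands in for `<NA>`.

```python
from collections import Counter

# Columns: 호선, 역명, 호기, 설치년도, 교체주기(개량년도).
# Illustrative subset of rows; None represents a missing value (<NA>).
rows = [
    (1, "노포", 1, 2004, None),
    (1, "범어사", 1, 1985, 2011),
    (1, "남포", 7, 2017, None),
    (1, "남포", 7, 2017, None),  # identical to the previous row
]

# Count how often each complete row occurs and keep those seen more than once.
occurrences = Counter(rows)
duplicates = {row: count for row, count in occurrences.items() if count > 1}

for row, count in duplicates.items():
    print(row, "occurs", count, "times")
```

In this subset there is one distinct duplicated row with two occurrences, which matches the "# duplicates" value of 2 and the single duplicate row reported in the overview.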