Overview

Dataset statistics

Number of variables: 5
Number of observations: 640
Missing cells: 582
Missing cells (%): 18.2%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 27.0 KiB
Average record size in memory: 43.2 B
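
The derived figures in this block can be rechecked from the raw counts. The sketch below is only an arithmetic illustration; it assumes the percentages are taken over all cells (variables times observations) and over all rows respectively, which is consistent with the numbers shown but is not stated in the report itself.

```python
# Values copied from the "Dataset statistics" block.
n_variables = 5
n_observations = 640
missing_cells = 582
duplicate_rows = 1
total_size_kib = 27.0

total_cells = n_variables * n_observations                 # 3200 cells in total
missing_pct = 100 * missing_cells / total_cells            # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations      # share of duplicate rows
avg_record_bytes = total_size_kib * 1024 / n_observations  # KiB -> bytes per row

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```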

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

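Description: 부산교통공사_승강기연도별설치현황_20200527 (Busan Transportation Corporation, elevator installation status by year, 20200527)
Author: 부산교통공사 (Busan Transportation Corporation)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15052663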

Alerts

Dataset has 1 (0.2%) duplicate row [Duplicates]
설치년도 is highly overall correlated with 교체주기(개량년도) and 1 other field [High correlation]
교체주기(개량년도) is highly overall correlated with 설치년도 and 1 other field [High correlation]
호선 is highly overall correlated with 설치년도 and 1 other field [High correlation]
교체주기(개량년도) has 582 (90.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-10 16:07:29.922775
Analysis finished: 2023-12-10 16:07:31.114993
Duration: 1.19 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
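
A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, not the configuration actually used here (that lives in config.json): the function name `build_report`, the CSV path, the `cp949` encoding and the output file name are all assumptions made for illustration. The imports sit inside the function so that the snippet can be loaded even where pandas and ydata-profiling are not installed.

```python
def build_report(csv_path, html_path="report.html", encoding="cp949"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Third-party imports are deferred until the function is called.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source file is a CSV in a Korean legacy encoding.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Build the profile and save it as a standalone HTML file.
    profile = ProfileReport(df, title="부산교통공사_승강기연도별설치현황_20200527")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("elevators.csv")` with a local copy of the dataset (the file name is a placeholder) would write `report.html` next to the script.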

Variables

호선 (subway line number)
Categorical

Alerts: High correlation

Distinct: 4
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
2      198    30.9%
3      174    27.2%
1      137    21.4%
4      131    20.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 호선; the counts are those listed under Common Values]

역명 (station name)
Text

Distinct: 76
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Length

Max length: 10
Median length: 2
Mean length: 2.7671875
Min length: 2

Characters and Unicode

Total characters: 1771
Distinct characters: 109
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.5%

Sample

1st row: 노포
2nd row: 범어사
3rd row: 동래
4th row: 동래
5th row: 교대

Value              Count  Frequency (%)
동래               26     4.0%
다대포항           21     3.2%
연산동             20     3.1%
만덕               20     3.1%
배산               16     2.5%
서면               16     2.5%
수안               16     2.5%
장림               14     2.2%
센텀시티           14     2.2%
낫개               13     2.0%
Other values (67)  475    73.0%

Most occurring characters

Value              Count  Frequency (%)
(lost)             118    6.7%
(lost)             104    5.9%
(lost)             97     5.5%
(lost)             70     4.0%
(lost)             57     3.2%
(lost)             48     2.7%
(lost)             47     2.7%
(lost)             43     2.4%
(lost)             37     2.1%
(lost)             35     2.0%
Other values (99)  1115   63.0%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1741   98.3%
Space Separator    22     1.2%
Close Punctuation  4      0.2%
Open Punctuation   4      0.2%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(lost)             118    6.8%
(lost)             104    6.0%
(lost)             97     5.6%
(lost)             70     4.0%
(lost)             57     3.3%
(lost)             48     2.8%
(lost)             47     2.7%
(lost)             43     2.5%
(lost)             37     2.1%
(lost)             35     2.0%
Other values (96)  1085   62.3%

Space Separator
Value              Count  Frequency (%)
(space)            22     100.0%

Close Punctuation
Value              Count  Frequency (%)
)                  4      100.0%

Open Punctuation
Value              Count  Frequency (%)
(                  4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1741   98.3%
Common  30     1.7%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(lost)             118    6.8%
(lost)             104    6.0%
(lost)             97     5.6%
(lost)             70     4.0%
(lost)             57     3.3%
(lost)             48     2.8%
(lost)             47     2.7%
(lost)             43     2.5%
(lost)             37     2.1%
(lost)             35     2.0%
Other values (96)  1085   62.3%

Common
Value              Count  Frequency (%)
(space)            22     73.3%
)                  4      13.3%
(                  4      13.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1741   98.3%
ASCII   30     1.7%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
(lost)             118    6.8%
(lost)             104    6.0%
(lost)             97     5.6%
(lost)             70     4.0%
(lost)             57     3.3%
(lost)             48     2.8%
(lost)             47     2.7%
(lost)             43     2.5%
(lost)             37     2.1%
(lost)             35     2.0%
Other values (96)  1085   62.3%

ASCII
Value              Count  Frequency (%)
(space)            22     73.3%
)                  4      13.3%
(                  4      13.3%

호기 (elevator unit number)
Categorical

Distinct: 27
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Length

Max length: 3
Median length: 1
Mean length: 1.19375
Min length: 1

Unique

Unique: 7
Unique (%): 1.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 1

Common Values

Value              Count  Frequency (%)
1                  78     12.2%
2                  76     11.9%
3                  68     10.6%
4                  67     10.5%
5                  58     9.1%
6                  56     8.8%
7                  49     7.7%
8                  43     6.7%
9                  27     4.2%
10                 25     3.9%
Other values (17)  93     14.5%

Length

[Figure: Histogram of lengths of the category]

설치년도 (installation year)
Real number (ℝ)

Alerts: High correlation

Distinct: 21
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.3422
Minimum: 1985
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1985
5th percentile: 1989
Q1: 2002
Median: 2005
Q3: 2011
95th percentile: 2017
Maximum: 2018
Range: 33
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.2683916
Coefficient of variation (CV): 0.0036227078
Kurtosis: 0.82369968
Mean: 2006.3422
Median Absolute Deviation (MAD): 6
Skewness: -0.70855837
Sum: 1284059
Variance: 52.829516
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=21)]
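
Several of the descriptive statistics for 설치년도 are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below rechecks the reported figures against those identities. It only uses the rounded values printed in the report, so agreement is expected up to rounding.

```python
import math

# Values copied from the statistics for 설치년도.
mean = 2006.3422
std = 7.2683916
reported_variance = 52.829516
reported_cv = 0.0036227078

variance = std ** 2   # variance = standard deviation squared
cv = std / mean       # coefficient of variation = std / mean

print(f"variance from std: {variance:.6f} (reported {reported_variance})")
print(f"CV from std/mean:  {cv:.10f} (reported {reported_cv})")
print("consistent:", math.isclose(variance, reported_variance, rel_tol=1e-6)
      and math.isclose(cv, reported_cv, rel_tol=1e-6))
```

The very small CV is a consequence of the variable being a calendar year: the spread of about 7 years is tiny relative to a mean near 2006.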

Value              Count  Frequency (%)
2005               167    26.1%
2011               135    21.1%
2017               88     13.8%
2001               64     10.0%
1998               52     8.1%
2007               34     5.3%
2008               18     2.8%
1985               13     2.0%
2002               12     1.9%
2006               9      1.4%
Other values (11)  48     7.5%

Value  Count  Frequency (%)
1985   13     2.0%
1987   6      0.9%
1988   8      1.2%
1989   7      1.1%
1994   4      0.6%
1998   52     8.1%
2001   64     10.0%
2002   12     1.9%
2004   1      0.2%
2005   167    26.1%

Value  Count  Frequency (%)
2018   2      0.3%
2017   88     13.8%
2016   5      0.8%
2015   1      0.2%
2014   4      0.6%
2012   4      0.6%
2011   135    21.1%
2009   6      0.9%
2008   18     2.8%
2007   34     5.3%

교체주기(개량년도) (replacement cycle, year of upgrade)
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 15
Distinct (%): 25.9%
Missing: 582
Missing (%): 90.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.2759
Minimum: 1998
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1998
5th percentile: 2003.85
Q1: 2004.25
Median: 2007
Q3: 2015
95th percentile: 2017
Maximum: 2018
Range: 20
Interquartile range (IQR): 10.75

Descriptive statistics

Standard deviation: 5.4476556
Coefficient of variation (CV): 0.0027112532
Kurtosis: -1.4414379
Mean: 2009.2759
Median Absolute Deviation (MAD): 3.5
Skewness: 0.19633221
Sum: 116538
Variance: 29.676951
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=15)]

Value             Count  Frequency (%)
2004              12     1.9%
2005              10     1.6%
2016              9      1.4%
2011              4      0.6%
2006              4      0.6%
2015              4      0.6%
2017              3      0.5%
2012              2      0.3%
2013              2      0.3%
2003              2      0.3%
Other values (5)  6      0.9%
(Missing)         582    90.9%

Value  Count  Frequency (%)
1998   1      0.2%
2003   2      0.3%
2004   12     1.9%
2005   10     1.6%
2006   4      0.6%
2008   1      0.2%
2009   1      0.2%
2010   1      0.2%
2011   4      0.6%
2012   2      0.3%

Value  Count  Frequency (%)
2018   2      0.3%
2017   3      0.5%
2016   9      1.4%
2015   4      0.6%
2013   2      0.3%
2012   2      0.3%
2011   4      0.6%
2010   1      0.2%
2009   1      0.2%
2008   1      0.2%
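
Between them, the frequency tables for 교체주기(개량년도) list all 15 distinct values and their counts, so the 58 non-missing observations can be rebuilt and the summary statistics recomputed. The sketch below does this with the Python standard library only. Two points are assumptions rather than facts from the report: that the quantiles use linear interpolation between order statistics (the `inclusive` method of `statistics.quantiles`) and that the variance is the sample variance with an n - 1 denominator. Both choices reproduce the reported figures.

```python
import statistics

# Counts read off the frequency tables; the 582 missing rows are excluded.
counts = {
    1998: 1, 2003: 2, 2004: 12, 2005: 10, 2006: 4,
    2008: 1, 2009: 1, 2010: 1, 2011: 4, 2012: 2,
    2013: 2, 2015: 4, 2016: 9, 2017: 3, 2018: 2,
}
# Expand the table into a sorted list of individual observations.
values = sorted(year for year, n in counts.items() for _ in range(n))

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)

# Quartiles and 5th percentile with linear interpolation (assumed method).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
p5 = statistics.quantiles(values, n=20, method="inclusive")[0]

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values), round(mean, 4), median)
print(q1, q3, q3 - q1, round(p5, 2), mad, round(variance, 6))
```

The same reconstruction works for 설치년도, whose 21 distinct values are all covered by its three frequency tables.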

Interactions

[Figures: four interaction plots]

Correlations

                    호선   역명   호기   설치년도  교체주기(개량년도)
호선                1.000  1.000  0.108  0.949     1.000
역명                1.000  1.000  0.000  0.998     0.954
호기                0.108  0.000  1.000  0.000     0.000
설치년도            0.949  0.998  0.000  1.000     0.905
교체주기(개량년도)  1.000  0.954  0.000  0.905     1.000

      호기   호선
호기  1.000  0.056
호선  0.056  1.000

                    설치년도  교체주기(개량년도)  호선   호기
설치년도            1.000     0.635               0.872  0.000
교체주기(개량년도)  0.635     1.000               0.935  0.000
호선                0.872     0.935               1.000  0.056
호기                0.000     0.000               0.056  1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly pick out patterns in data completion.]

Sample

     호선  역명    호기  설치년도  교체주기(개량년도)
0    1     노포    1     2004      <NA>
1    1     범어사  1     1985      2011
2    1     동래    1     2011      <NA>
3    1     동래    2     2011      <NA>
4    1     교대    1     2016      <NA>
5    1     연산    1     2006      <NA>
6    1     연산    2     2006      <NA>
7    1     연산    3     2008      <NA>
8    1     연산    4     2008      <NA>
9    1     서면    1     1985      2004

     호선  역명    호기  설치년도  교체주기(개량년도)
630  4     고촌    3     2011      <NA>
631  4     고촌    4     2011      <NA>
632  4     고촌    5     2011      <NA>
633  4     고촌    6     2011      <NA>
634  4     안평    1     2011      <NA>
635  4     안평    2     2011      <NA>
636  4     안평    3     2011      <NA>
637  4     안평    4     2011      <NA>
638  4     안평    5     2011      <NA>
639  4     안평    6     2011      <NA>

Duplicate rows

Most frequently occurring

     호선  역명    호기  설치년도  교체주기(개량년도)  # duplicates
0    1     남포    7     2017      <NA>                2
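
The two duplicate figures in this report measure different things: the table above counts how many times the repeated row occurs (2), while the "Duplicate rows" statistic in the overview counts the redundant copies (1). The sketch below illustrates that distinction on a handful of rows. The row list is a small illustrative subset, with the 남포 row entered twice as the table reports and the other two rows taken from the Sample section; it is not the full dataset.

```python
from collections import Counter

# Rows in (호선, 역명, 호기, 설치년도, 교체주기(개량년도)) order; None marks <NA>.
rows = [
    (1, "노포", 1, 2004, None),
    (1, "범어사", 1, 1985, 2011),
    (1, "남포", 7, 2017, None),
    (1, "남포", 7, 2017, None),  # exact repeat of the previous row
]

# Count occurrences of each distinct row and keep those seen more than once.
occurrences = Counter(rows)
duplicates = {row: n for row, n in occurrences.items() if n > 1}

# Redundant copies: every occurrence beyond the first.
redundant_rows = sum(n - 1 for n in duplicates.values())

for row, n in duplicates.items():
    print(row, "occurs", n, "times")
print("redundant rows:", redundant_rows)
```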