Overview

Dataset statistics

Number of variables: 5
Number of observations: 646
Missing cells: 585
Missing cells (%): 18.1%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 27.3 KiB
Average record size in memory: 43.2 B
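
The two percentages in this table follow directly from the counts. The sketch below rechecks them, assuming (as the figures suggest) that missing cells are measured against all cells in the table and duplicate rows against the number of observations. The helper name `pct` is illustrative only.

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 646
missing_cells = 585
duplicate_rows = 1


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_cells = n_variables * n_observations  # every cell in the table
missing_cells_pct = pct(missing_cells, total_cells)
duplicate_rows_pct = pct(duplicate_rows, n_observations)

print(f"Missing cells (%): {missing_cells_pct}%")
print(f"Duplicate rows (%): {duplicate_rows_pct}%")
```

Both results agree with the reported 18.1% and 0.2%.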

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

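Description: 부산교통공사_승강기연도별설치현황_20211231 (Busan Transportation Corporation, elevator installations by year, as of 2021-12-31)
Author: 부산교통공사 (Busan Transportation Corporation)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15052663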

Alerts

Dataset has 1 (0.2%) duplicate row [Duplicates]
설치년도 is highly overall correlated with 교체주기(개량년도) and 1 other field [High correlation]
교체주기(개량년도) is highly overall correlated with 설치년도 and 1 other field [High correlation]
호선 is highly overall correlated with 설치년도 and 1 other field [High correlation]
교체주기(개량년도) has 585 (90.6%) missing values [Missing]

Reproduction

Analysis started: 2023-12-10 16:07:21.135968
Analysis finished: 2023-12-10 16:07:21.832486
Duration: 0.7 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선 (line)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
2       202     31.3%
3       174     26.9%
1       139     21.5%
4       131     20.3%
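
The frequency column is each count divided by the 646 observations. A minimal sketch of that calculation, using only the counts listed in this table (the variable names are illustrative):

```python
from collections import Counter

# Per-value counts from the Common Values table for 호선 (line).
line_counts = Counter({"2": 202, "3": 174, "1": 139, "4": 131})

total = sum(line_counts.values())

# most_common() yields values from the largest count to the smallest,
# which is the order the report uses.
frequency_pct = {
    value: round(100 * count / total, 1)
    for value, count in line_counts.most_common()
}

for value, share in frequency_pct.items():
    print(f"{value}: {line_counts[value]} ({share}%)")
```

The four counts sum to 646, so this variable has no missing values and no "Other values" bucket.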

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values 2, 3, 1 and 4 with the counts given under Common Values]

역명 (station name)
Text

Distinct: 77
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 10
Median length: 2
Mean length: 2.7291022
Min length: 2

Characters and Unicode

Total characters: 1763
Distinct characters: 109
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
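
As a rough illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories. It is not the profiler's own code. The names 서면 and 센텀시티 come from this report, while "동래(예시)" is a made-up value included only so that the two punctuation categories appear.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

# The third value is hypothetical and exists only to exercise punctuation.
station_names = ["서면", "센텀시티", "동래(예시)"]

category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for name in station_names
    for ch in name
)
total_characters = sum(category_counts.values())
mean_length = total_characters / len(station_names)

for category, count in category_counts.most_common():
    print(f"{category}: {count} ({100 * count / total_characters:.1f}%)")
```

The same division explains the reported mean length for this column: 1763 total characters over 646 rows is about 2.729.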

Unique

Unique: 3
Unique (%): 0.5%

Sample

1st row: 서면
2nd row: 서면
3rd row: 서면
4th row: 서면
5th row: 서면

Value               Count   Frequency (%)
동래                26      4.0%
다대포항            21      3.3%
만덕                20      3.1%
연산동              20      3.1%
배산                16      2.5%
서면                16      2.5%
수안                16      2.5%
장림                14      2.2%
센텀시티            14      2.2%
낫개                13      2.0%
Other values (67)   470     72.8%

Most occurring characters

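Value               Count   Frequency (%)
[glyph lost]        120     6.8%
[glyph lost]        104     5.9%
[glyph lost]        97      5.5%
[glyph lost]        70      4.0%
[glyph lost]        57      3.2%
[glyph lost]        48      2.7%
[glyph lost]        47      2.7%
[glyph lost]        43      2.4%
[glyph lost]        37      2.1%
[glyph lost]        37      2.1%
Other values (99)   1103    62.6%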

Most occurring categories

Value               Count   Frequency (%)
Other Letter        1755    99.5%
Close Punctuation   4       0.2%
Open Punctuation    4       0.2%

Most frequent character per category

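Other Letter

Value               Count   Frequency (%)
[glyph lost]        120     6.8%
[glyph lost]        104     5.9%
[glyph lost]        97      5.5%
[glyph lost]        70      4.0%
[glyph lost]        57      3.2%
[glyph lost]        48      2.7%
[glyph lost]        47      2.7%
[glyph lost]        43      2.5%
[glyph lost]        37      2.1%
[glyph lost]        37      2.1%
Other values (97)   1095    62.4%

Close Punctuation

Value   Count   Frequency (%)
)       4       100.0%

Open Punctuation

Value   Count   Frequency (%)
(       4       100.0%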

Most occurring scripts

Value    Count   Frequency (%)
Hangul   1755    99.5%
Common   8       0.5%

Most frequent character per script

Hangul

1755 characters. The per-character counts and percentages are identical to the Other Letter table under "Most frequent character per category".

Common

Value   Count   Frequency (%)
)       4       50.0%
(       4       50.0%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   1755    99.5%
ASCII    8       0.5%

Most frequent character per block

Hangul

1755 characters. The per-character counts and percentages are identical to the Other Letter table under "Most frequent character per category".

ASCII

Value   Count   Frequency (%)
)       4       50.0%
(       4       50.0%

호기 (unit number)
Categorical

Distinct: 27
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 3
Median length: 1
Mean length: 1.1919505
Min length: 1

Unique

Unique: 7
Unique (%): 1.1%

Sample

1st row: 1
2nd row: 2
3rd row: 3
4th row: 4
5th row: 5

Common Values

Value               Count   Frequency (%)
1                   79      12.2%
2                   77      11.9%
3                   69      10.7%
4                   68      10.5%
5                   59      9.1%
6                   57      8.8%
7                   49      7.6%
8                   43      6.7%
9                   27      4.2%
10                  25      3.9%
Other values (17)   93      14.4%

Length

[Figure: histogram of lengths of the category]

설치년도 (installation year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 22
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.469
Minimum: 1985
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1985
5-th percentile: 1989
Q1: 2002
Median: 2005
Q3: 2011
95-th percentile: 2017
Maximum: 2021
Range: 36
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.3536239
Coefficient of variation (CV): 0.0036649576
Kurtosis: 0.77774392
Mean: 2006.469
Median Absolute Deviation (MAD): 6
Skewness: -0.66807875
Sum: 1296179
Variance: 54.075784
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=22)]
Common values

Value               Count   Frequency (%)
2005                167     25.9%
2011                135     20.9%
2017                88      13.6%
2001                64      9.9%
1998                52      8.0%
2007                34      5.3%
2008                18      2.8%
1985                13      2.0%
2002                12      1.9%
2006                9       1.4%
Other values (12)   54      8.4%

Minimum 10 values

Value   Count   Frequency (%)
1985    13      2.0%
1987    6       0.9%
1988    8       1.2%
1989    7       1.1%
1994    4       0.6%
1998    52      8.0%
2001    64      9.9%
2002    12      1.9%
2004    1       0.2%
2005    167     25.9%

Maximum 10 values

Value   Count   Frequency (%)
2021    4       0.6%
2018    4       0.6%
2017    88      13.6%
2016    5       0.8%
2015    1       0.2%
2014    4       0.6%
2012    4       0.6%
2011    135     20.9%
2009    6       0.9%
2008    18      2.8%
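
Between them, the three value tables for 설치년도 list all 22 distinct years, so the full distribution can be rebuilt and the reported statistics rechecked. The sketch below does this with Python's `statistics` module. It assumes linear-interpolation quantiles (`method="inclusive"`) and the sample (n - 1) form of variance and standard deviation, which is what reproduces the figures above; the profiler's own implementation is not shown in this report.

```python
import statistics

# Counts per year, merged from the three value tables for 설치년도.
year_counts = {
    1985: 13, 1987: 6, 1988: 8, 1989: 7, 1994: 4, 1998: 52,
    2001: 64, 2002: 12, 2004: 1, 2005: 167, 2006: 9, 2007: 34,
    2008: 18, 2009: 6, 2011: 135, 2012: 4, 2014: 4, 2015: 1,
    2016: 5, 2017: 88, 2018: 4, 2021: 4,
}

# Expand to one value per record, in ascending order.
years = [
    year
    for year, count in sorted(year_counts.items())
    for _ in range(count)
]

q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
value_range = max(years) - min(years)
iqr = q3 - q1

mean = statistics.mean(years)
std = statistics.stdev(years)          # sample standard deviation (n - 1)
variance = statistics.variance(years)  # sample variance (n - 1)
cv = std / mean                        # coefficient of variation

print(f"n={len(years)} sum={sum(years)} mean={mean:.3f}")
print(f"Q1={q1} median={median} Q3={q3} range={value_range} IQR={iqr}")
print(f"std={std:.7f} variance={variance:.6f} CV={cv:.10f}")
```

The very small CV reflects that the spread of about 7 years is tiny relative to calendar-year values near 2006, not that installations are tightly clustered.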

교체주기(개량년도) (replacement cycle, upgrade year)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 18
Distinct (%): 29.5%
Missing: 585
Missing (%): 90.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.8033
Minimum: 1998
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 2004
Q1: 2005
Median: 2009
Q3: 2016
95-th percentile: 2018
Maximum: 2021
Range: 23
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 5.8046524
Coefficient of variation (CV): 0.0028881694
Kurtosis: -1.3602124
Mean: 2009.8033
Median Absolute Deviation (MAD): 5
Skewness: 0.20295419
Sum: 122598
Variance: 33.693989
Monotonicity: Not monotonic
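
With 90.6% of this column missing, it helps to see which denominator each figure uses. The arithmetic below is a sketch based only on the numbers reported for this variable. It suggests that Missing (%) is taken over all 646 rows, while Distinct (%) and the mean are taken over the non-missing values only (18/646 would give 2.8%, not 29.5%).

```python
# Figures reported for 교체주기(개량년도).
n_observations = 646
n_missing = 585
n_distinct = 18
value_sum = 122598

n_present = n_observations - n_missing  # rows that actually carry a year

missing_pct = round(100 * n_missing / n_observations, 1)
distinct_pct = round(100 * n_distinct / n_present, 1)
mean = round(value_sum / n_present, 4)

print(f"non-missing rows: {n_present}")
print(f"Missing (%): {missing_pct}%")
print(f"Distinct (%): {distinct_pct}%")
print(f"Mean: {mean}")
```

Every statistic for this variable therefore describes only 61 records, which is worth keeping in mind when reading its correlations.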
[Figure: histogram with fixed size bins (bins=18)]
Common values

Value              Count   Frequency (%)
2004               12      1.9%
2005               10      1.5%
2016               9       1.4%
2006               4       0.6%
2011               4       0.6%
2015               4       0.6%
2017               3       0.5%
2003               2       0.3%
2018               2       0.3%
2013               2       0.3%
Other values (8)   9       1.4%
(Missing)          585     90.6%

Minimum 10 values

Value   Count   Frequency (%)
1998    1       0.2%
2003    2       0.3%
2004    12      1.9%
2005    10      1.5%
2006    4       0.6%
2008    1       0.2%
2009    1       0.2%
2010    1       0.2%
2011    4       0.6%
2012    2       0.3%

Maximum 10 values

Value   Count   Frequency (%)
2021    1       0.2%
2020    1       0.2%
2019    1       0.2%
2018    2       0.3%
2017    3       0.5%
2016    9       1.4%
2015    4       0.6%
2013    2       0.3%
2012    2       0.3%
2011    4       0.6%

Interactions

[Figures: interaction plots (4 panels)]

Correlations

[Figure: correlation heatmap]

                     호선     역명     호기     설치년도   교체주기(개량년도)
호선                 1.000   0.998   0.115   0.987     0.999
역명                 0.998   1.000   0.000   0.998     0.928
호기                 0.115   0.000   1.000   0.000     0.000
설치년도             0.987   0.998   0.000   1.000     0.722
교체주기(개량년도)   0.999   0.928   0.000   0.722     1.000

[Figure: correlation heatmap]

        호기     호선
호기    1.000   0.059
호선    0.059   1.000

[Figure: correlation heatmap]

                     설치년도   교체주기(개량년도)   호선     호기
설치년도             1.000     0.660                0.941   0.000
교체주기(개량년도)   0.660     1.000                0.912   0.000
호선                 0.941     0.912                1.000   0.059
호기                 0.000     0.000                0.059   1.000
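
The extracted text does not say which coefficient each of these matrices uses. For a pair of categorical variables such as 호기 and 호선, one commonly used measure is Cramér's V, which rescales the chi-squared statistic of a contingency table to the range 0 to 1. The sketch below is the plain, uncorrected textbook form with toy tables; treating it as the report's method is an assumption, and it is not expected to reproduce the values above.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts.

    Assumes every row and column total is positive and that the table
    has at least two rows and two columns.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared: compare observed counts with the counts
    # expected if the two variables were independent.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy tables (not taken from the dataset).
perfect = [[10, 0], [0, 10]]     # each row value maps to one column value
independent = [[5, 5], [5, 5]]   # row value tells nothing about the column

print(cramers_v(perfect))
print(cramers_v(independent))
```

Under this reading, a value such as 0.059 for 호기 and 호선 would indicate that the unit number carries almost no information about the line.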

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

       호선   역명   호기   설치년도   교체주기(개량년도)
0      1      서면   1      1985       2004
1      1      서면   2      1985       2004
2      1      서면   3      1985       2004
3      1      서면   4      1985       2004
4      1      서면   5      1985       2004
5      1      서면   6      1985       2004
6      1      서면   7      1985       2004
7      1      서면   8      1985       2004
8      1      서면   9      1985       2004
9      1      서면   10     1985       2004

Last rows

       호선   역명   호기   설치년도   교체주기(개량년도)
636    4      금사   6      2011       <NA>
637    4      금사   7      2011       <NA>
638    4      금사   8      2011       <NA>
639    4      금사   9      2011       <NA>
640    4      고촌   1      2011       <NA>
641    4      고촌   2      2011       <NA>
642    4      고촌   3      2011       <NA>
643    4      고촌   4      2011       <NA>
644    4      고촌   5      2011       <NA>
645    4      고촌   6      2011       <NA>

Duplicate rows

Most frequently occurring

     호선   역명   호기   설치년도   교체주기(개량년도)   # duplicates
0    1      남포   7      2017       <NA>                 2
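
The relationship between "1 duplicate row" in the overview and "# duplicates 2" here can be illustrated by counting identical rows. The sketch below uses a handful of rows shaped like the report's sample, with the 남포 row entered twice; the other rows are placeholders drawn from the sample tables, and `None` stands in for `<NA>`.

```python
from collections import Counter

# Columns: 호선, 역명, 호기, 설치년도, 교체주기(개량년도).
rows = [
    ("1", "서면", "1", 1985, 2004),
    ("1", "남포", "7", 2017, None),
    ("4", "금사", "6", 2011, None),
    ("1", "남포", "7", 2017, None),  # exact repeat of the second row
]

row_counts = Counter(rows)

# Rows that occur more than once, with their number of occurrences.
duplicates = {row: n for row, n in row_counts.items() if n > 1}

# Each repeated row contributes (occurrences - 1) surplus copies.
n_duplicate_rows = sum(n - 1 for n in duplicates.values())

for row, n in duplicates.items():
    print(row, "occurs", n, "times")
print("duplicate rows:", n_duplicate_rows)
```

A row that occurs twice is reported with "# duplicates 2" but counts as only one surplus row in the dataset statistics.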