Overview

Dataset statistics

Number of variables: 7
Number of observations: 114
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.7 KiB
Average record size in memory: 60.2 B
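
The average record size is the total memory footprint divided by the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 B and starting from the rounded 6.7 KiB figure (so the result matches the report only after rounding); the function name is illustrative:

```python
# Reproduce "Average record size in memory" from the reported totals.
# Assumption: 1 KiB = 1024 B, and 6.7 KiB is itself a rounded figure.
TOTAL_KIB = 6.7
N_OBSERVATIONS = 114


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean number of bytes per row."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(TOTAL_KIB, N_OBSERVATIONS)
print(f"{avg_bytes:.1f} B")  # prints "60.2 B"
```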

Variable types

Categorical: 3
Text: 2
Numeric: 2

Dataset

Description: Busan urban railway station building status (2020) (부산도시철도역사건축현황(2020년))
Author: Busan Transportation Corporation (부산교통공사)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15043687

Alerts

역사면적(㎡) is highly overall correlated with 역형식 [High correlation]
준공년도 is highly overall correlated with 호선 [High correlation]
호선 is highly overall correlated with 준공년도 [High correlation]
역형식 is highly overall correlated with 역사면적(㎡) [High correlation]
역사면적(㎡) has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 17:00:20.297156
Analysis finished: 2023-12-10 17:00:21.916075
Duration: 1.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선 (line number)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Value | Count
2 | 43
1 | 40
3 | 17
4 | 14

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 43 | 37.7%
1 | 40 | 35.1%
3 | 17 | 14.9%
4 | 14 | 12.3%
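
The Frequency (%) column is each count divided by the 114 observations. A minimal sketch that recomputes the percentages from the counts listed for 호선; the helper name is illustrative and not part of the report:

```python
from collections import Counter

# Counts copied from the Common Values table for 호선.
line_counts = Counter({"2": 43, "1": 40, "3": 17, "4": 14})


def frequency_percent(counts: Counter) -> dict:
    """Map each value to its share of all rows, in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.most_common()}


freq = frequency_percent(line_counts)
print(freq)  # {'2': 37.7, '1': 35.1, '3': 14.9, '4': 12.3}
```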

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as listed under Common Values]

역명 (station name)
Text

Distinct: 108
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 11
Median length: 2
Mean length: 2.6315789
Min length: 2

Characters and Unicode

Total characters: 300
Distinct characters: 134
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 89.5%
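
"Distinct" counts different values, while "Unique" counts values that occur exactly once. A small sketch of the distinction, using a hypothetical toy list (not the real column) and then the reported totals; the function name is illustrative:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical toy column: two names repeated, two names seen once.
toy_names = ["서면", "서면", "연산", "연산", "하단", "당리"]
toy_distinct, toy_unique = distinct_and_unique(toy_names)  # (4, 2)

# Reported totals for 역명.
N_ROWS, DISTINCT, UNIQUE = 114, 108, 102
repeated_values = DISTINCT - UNIQUE  # values that occur more than once
rows_in_repeats = N_ROWS - UNIQUE    # rows taken up by those values
print(repeated_values, rows_in_repeats)  # prints "6 12"
```

With 6 repeated values covering 12 rows, each repeated station name appears exactly twice.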

Sample

1st row: 다대포해수욕장역
2nd row: 다대포항역
3rd row: 낫개역
4th row: 신장림역
5th row: 장림역

Words

Value | Count | Frequency (%)
연산 | 2 | 1.7%
미남 | 2 | 1.7%
덕천 | 2 | 1.7%
동래 | 2 | 1.7%
수영 | 2 | 1.7%
서면 | 2 | 1.7%
증산 | 1 | 0.9%
동원 | 1 | 0.9%
금곡 | 1 | 0.9%
호포 | 1 | 0.9%
Other values (100) | 100 | 86.2%

Most occurring characters

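Value | Count | Frequency (%)
(not preserved) | 19 | 6.3%
(not preserved) | 17 | 5.7%
(not preserved) | 11 | 3.7%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 7 | 2.3%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
Other values (124) | 200 | 66.7%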

Most occurring categories

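Value | Count | Frequency (%)
Other Letter | 298 | 99.3%
Space Separator | 2 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 19 | 6.4%
(not preserved) | 17 | 5.7%
(not preserved) | 11 | 3.7%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 7 | 2.3%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
Other values (123) | 198 | 66.4%

Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%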

Most occurring scripts

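Value | Count | Frequency (%)
Hangul | 298 | 99.3%
Common | 2 | 0.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 19 | 6.4%
(not preserved) | 17 | 5.7%
(not preserved) | 11 | 3.7%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 7 | 2.3%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
Other values (123) | 198 | 66.4%

Common
Value | Count | Frequency (%)
(space) | 2 | 100.0%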

Most occurring blocks

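Value | Count | Frequency (%)
Hangul | 298 | 99.3%
ASCII | 2 | 0.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 19 | 6.4%
(not preserved) | 17 | 5.7%
(not preserved) | 11 | 3.7%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 7 | 2.3%
(not preserved) | 6 | 2.0%
(not preserved) | 6 | 2.0%
Other values (123) | 198 | 66.4%

ASCII
Value | Count | Frequency (%)
(space) | 2 | 100.0%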

역사면적(㎡) (station building area, m²)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 114
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8752.0561
Minimum: 2087
Maximum: 20197
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 2087
5th percentile: 4162.95
Q1: 7113.555
Median: 8468
Q3: 10196.865
95th percentile: 14581.3
Maximum: 20197
Range: 18110
Interquartile range (IQR): 3083.31

Descriptive statistics

Standard deviation: 3070.2232
Coefficient of variation (CV): 0.35080022
Kurtosis: 2.4943667
Mean: 8752.0561
Median Absolute Deviation (MAD): 1399.5
Skewness: 0.98595494
Sum: 997734.4
Variance: 9426270.4
Monotonicity: Not monotonic
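
Several of these figures are derived from one another: range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation², and mean = sum / n. A minimal sketch that checks those identities against the reported (rounded) values; the dictionary keys are illustrative names:

```python
# Figures copied from the 역사면적(㎡) summary; all are rounded in the report.
stats = {
    "n": 114,
    "min": 2087.0,
    "max": 20197.0,
    "q1": 7113.555,
    "q3": 10196.865,
    "mean": 8752.0561,
    "std": 3070.2232,
    "sum": 997734.4,
}

value_range = stats["max"] - stats["min"]   # reported: 18110
iqr = stats["q3"] - stats["q1"]             # reported: 3083.31
cv = stats["std"] / stats["mean"]           # reported: 0.35080022
variance = stats["std"] ** 2                # reported: 9426270.4
mean_from_sum = stats["sum"] / stats["n"]   # reported: 8752.0561

print(value_range, round(iqr, 2), round(cv, 8), round(mean_from_sum, 4))
```

Because the inputs are rounded, the recomputed variance agrees with the report only to within about one unit.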
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
17493.14 | 1 | 0.9%
11128.0 | 1 | 0.9%
12350.0 | 1 | 0.9%
4262.0 | 1 | 0.9%
13594.0 | 1 | 0.9%
7934.0 | 1 | 0.9%
9055.0 | 1 | 0.9%
6830.0 | 1 | 0.9%
9999.0 | 1 | 0.9%
6401.0 | 1 | 0.9%
Other values (104) | 104 | 91.2%

Minimum 10 values

Value | Count | Frequency (%)
2087.0 | 1 | 0.9%
2747.0 | 1 | 0.9%
2751.0 | 1 | 0.9%
2851.0 | 1 | 0.9%
2981.0 | 1 | 0.9%
3979.0 | 1 | 0.9%
4262.0 | 1 | 0.9%
4818.0 | 1 | 0.9%
4905.0 | 1 | 0.9%
5295.0 | 1 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
20197.0 | 1 | 0.9%
19011.0 | 1 | 0.9%
17493.14 | 1 | 0.9%
16182.0 | 1 | 0.9%
15524.0 | 1 | 0.9%
14619.0 | 1 | 0.9%
14561.0 | 1 | 0.9%
14216.0 | 1 | 0.9%
13594.0 | 1 | 0.9%
13004.0 | 1 | 0.9%

역형식 (platform type; 상대식 = side platform, 섬식 = island platform)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Value | Count
상대식 | 92
섬식 | 22

Length

Max length: 3
Median length: 3
Mean length: 2.8070175
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상대식
2nd row: 상대식
3rd row: 상대식
4th row: 상대식
5th row: 상대식

Common Values

Value | Count | Frequency (%)
상대식 | 92 | 80.7%
섬식 | 22 | 19.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; counts as listed under Common Values]

층수 (number of floors; 지하 = below ground, 지상 = above ground)
Categorical

Distinct: 17
Distinct (%): 14.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Value | Count
지하 2층 | 43
지하 3층 | 27
지상 2층 | 15
지하 4층 | 8
지하2층 | 4
Other values (12) | 17

Length

Max length: 9
Median length: 5
Mean length: 5.0263158
Min length: 2

Unique

Unique: 8
Unique (%): 7.0%

Sample

1st row: 지하3층
2nd row: 지하2층
3rd row: 지하3층
4th row: 지하2층
5th row: 지하2층

Common Values

Value | Count | Frequency (%)
지하 2층 | 43 | 37.7%
지하 3층 | 27 | 23.7%
지상 2층 | 15 | 13.2%
지하 4층 | 8 | 7.0%
지하2층 | 4 | 3.5%
지상 3층 | 3 | 2.6%
지하 5층 | 2 | 1.8%
지상 5층 | 2 | 1.8%
지하3층 | 2 | 1.8%
지하 8층 | 1 | 0.9%
Other values (7) | 7 | 6.1%
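
The table lists "지하 2층" and "지하2층" (and likewise "지하 3층" and "지하3층") as separate categories that differ only in spacing. A minimal sketch of one way to merge such variants before counting, assuming the spaced form is the canonical one; the function name and that choice are illustrative, not part of the report:

```python
import re
from collections import Counter

# Counts copied from the Common Values table (the 7 rarest labels are omitted).
floor_counts = Counter({
    "지하 2층": 43, "지하 3층": 27, "지상 2층": 15, "지하 4층": 8, "지하2층": 4,
    "지상 3층": 3, "지하 5층": 2, "지상 5층": 2, "지하3층": 2, "지하 8층": 1,
})


def normalize_floor_label(label: str) -> str:
    """Put exactly one space between 지하/지상 and the floor count."""
    spaced = re.sub(r"(지하|지상)\s*(\d+층)", r"\1 \2", label)
    return " ".join(spaced.split())


merged = Counter()
for label, count in floor_counts.items():
    merged[normalize_floor_label(label)] += count

print(merged["지하 2층"], merged["지하 3층"], len(merged))  # prints "47 29 8"
```

Compound labels such as "지하1층 지상3층" are kept as one label, with each part spaced consistently.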

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
지하 | 83 | 37.6%
2층 | 58 | 26.2%
3층 | 30 | 13.6%
지상 | 21 | 9.5%
4층 | 9 | 4.1%
지하2층 | 4 | 1.8%
5층 | 4 | 1.8%
지하1층 | 3 | 1.4%
지하3층 | 2 | 0.9%
8층 | 1 | 0.5%
Other values (6) | 6 | 2.7%

역사심도(지하승강장~지상 m) (station depth from underground platform to ground level, m)
Text

Distinct: 88
Distinct (%): 77.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.2368421
Min length: 2

Characters and Unicode

Total characters: 483
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 84
Unique (%): 73.7%

Sample

1st row: 16.2
2nd row: 20.27
3rd row: 20.16
4th row: 25.31
5th row: 18.61

Words

Value | Count | Frequency (%)
지상 | 24 | 21.1%
23.99 | 2 | 1.8%
11.95 | 2 | 1.8%
18.37 | 2 | 1.8%
12.15 | 1 | 0.9%
12.9 | 1 | 0.9%
13.31 | 1 | 0.9%
22.54 | 1 | 0.9%
20.72 | 1 | 0.9%
13.49 | 1 | 0.9%
Other values (78) | 78 | 68.4%
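
This column is typed as Text because 24 of the 114 entries hold the marker 지상 (above ground) instead of a number. A minimal sketch of one way to convert it to numeric depths, assuming above-ground stations should map to a missing value; the function name and that mapping are illustrative choices, not something the report prescribes:

```python
from typing import Optional


def parse_depth_m(raw: str) -> Optional[float]:
    """Return the depth in metres, or None for the above-ground marker 지상."""
    text = raw.strip()
    if text == "지상":
        return None
    return float(text)


# Values taken from the sample rows of the report.
samples = ["16.2", "20.27", "지상", "4.51"]
depths = [parse_depth_m(s) for s in samples]
print(depths)  # [16.2, 20.27, None, 4.51]
```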

Most occurring characters

Value | Count | Frequency (%)
1 | 92 | 19.0%
. | 89 | 18.4%
2 | 60 | 12.4%
3 | 41 | 8.5%
7 | 25 | 5.2%
지 | 24 | 5.0%
상 | 24 | 5.0%
9 | 24 | 5.0%
5 | 23 | 4.8%
8 | 23 | 4.8%
Other values (3) | 58 | 12.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 346 | 71.6%
Other Punctuation | 89 | 18.4%
Other Letter | 48 | 9.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 92 | 26.6%
2 | 60 | 17.3%
3 | 41 | 11.8%
7 | 25 | 7.2%
9 | 24 | 6.9%
5 | 23 | 6.6%
8 | 23 | 6.6%
6 | 22 | 6.4%
0 | 19 | 5.5%
4 | 17 | 4.9%

Other Letter
Value | Count | Frequency (%)
지 | 24 | 50.0%
상 | 24 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
. | 89 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 435 | 90.1%
Hangul | 48 | 9.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 92 | 21.1%
. | 89 | 20.5%
2 | 60 | 13.8%
3 | 41 | 9.4%
7 | 25 | 5.7%
9 | 24 | 5.5%
5 | 23 | 5.3%
8 | 23 | 5.3%
6 | 22 | 5.1%
0 | 19 | 4.4%

Hangul
Value | Count | Frequency (%)
지 | 24 | 50.0%
상 | 24 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 435 | 90.1%
Hangul | 48 | 9.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 92 | 21.1%
. | 89 | 20.5%
2 | 60 | 13.8%
3 | 41 | 9.4%
7 | 25 | 5.7%
9 | 24 | 5.5%
5 | 23 | 5.3%
8 | 23 | 5.3%
6 | 22 | 5.1%
0 | 19 | 4.4%

Hangul
Value | Count | Frequency (%)
지 | 24 | 50.0%
상 | 24 | 50.0%

준공년도 (year of completion)
Real number (ℝ)

HIGH CORRELATION

Distinct: 11
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1999.5965
Minimum: 1985
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 KiB

Quantile statistics

Minimum: 1985
5th percentile: 1985
Q1: 1994
Median: 2001
Q3: 2005
95th percentile: 2012.75
Maximum: 2016
Range: 31
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 9.1754957
Coefficient of variation (CV): 0.0045886737
Kurtosis: -0.8733611
Mean: 1999.5965
Median Absolute Deviation (MAD): 7
Skewness: -0.20856378
Sum: 227954
Variance: 84.189722
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=11)]

Common Values

Value | Count | Frequency (%)
1999 | 21 | 18.4%
1985 | 17 | 14.9%
2005 | 17 | 14.9%
2011 | 14 | 12.3%
2002 | 9 | 7.9%
2001 | 9 | 7.9%
2016 | 6 | 5.3%
1994 | 6 | 5.3%
1987 | 6 | 5.3%
1990 | 5 | 4.4%

Minimum 10 values

Value | Count | Frequency (%)
1985 | 17 | 14.9%
1987 | 6 | 5.3%
1990 | 5 | 4.4%
1994 | 6 | 5.3%
1999 | 21 | 18.4%
2001 | 9 | 7.9%
2002 | 9 | 7.9%
2005 | 17 | 14.9%
2008 | 4 | 3.5%
2011 | 14 | 12.3%

Maximum 10 values

Value | Count | Frequency (%)
2016 | 6 | 5.3%
2011 | 14 | 12.3%
2008 | 4 | 3.5%
2005 | 17 | 14.9%
2002 | 9 | 7.9%
2001 | 9 | 7.9%
1999 | 21 | 18.4%
1994 | 6 | 5.3%
1990 | 5 | 4.4%
1987 | 6 | 5.3%
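
Taken together, the two extreme-value tables list all 11 distinct years, so the summary statistics can be recomputed from the counts alone. A minimal sketch using Python's `statistics` module, assuming the report's variance uses the sample (n − 1) denominator, which the recomputed value is consistent with:

```python
import statistics

# Year -> number of stations, combined from the frequency tables above.
year_counts = {
    1985: 17, 1987: 6, 1990: 5, 1994: 6, 1999: 21, 2001: 9,
    2002: 9, 2005: 17, 2008: 4, 2011: 14, 2016: 6,
}

# Expand the frequency table back into one value per station.
years = [year for year, count in year_counts.items() for _ in range(count)]

n = len(years)                           # reported: 114 observations
total = sum(years)                       # reported Sum: 227954
mean = total / n                         # reported Mean: 1999.5965
median = statistics.median(years)        # reported median: 2001
variance = statistics.variance(years)    # sample variance; reported: 84.189722
std = statistics.stdev(years)            # reported: 9.1754957

print(n, total, round(mean, 4), median, round(variance, 6), round(std, 7))
```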

Interactions

[Figures: interaction plots]

Correlations

[Figure: correlation heatmap]

Variable | 호선 | 역사면적(㎡) | 역형식 | 층수 | 역사심도(지하승강장~지상 m) | 준공년도
호선 | 1.000 | 0.653 | 0.536 | 0.603 | 0.000 | 1.000
역사면적(㎡) | 0.653 | 1.000 | 0.662 | 0.729 | 0.632 | 0.528
역형식 | 0.536 | 0.662 | 1.000 | 0.000 | 0.000 | 0.301
층수 | 0.603 | 0.729 | 0.000 | 1.000 | 0.000 | 0.777
역사심도(지하승강장~지상 m) | 0.000 | 0.632 | 0.000 | 0.000 | 1.000 | 0.502
준공년도 | 1.000 | 0.528 | 0.301 | 0.777 | 0.502 | 1.000

[Figure: correlation heatmap]

Variable | 층수 | 호선 | 역형식
층수 | 1.000 | 0.357 | 0.000
호선 | 0.357 | 1.000 | 0.362
역형식 | 0.000 | 0.362 | 1.000

[Figure: correlation heatmap]

Variable | 역사면적(㎡) | 준공년도 | 호선 | 역형식 | 층수
역사면적(㎡) | 1.000 | -0.008 | 0.452 | 0.520 | 0.382
준공년도 | -0.008 | 1.000 | 0.977 | 0.317 | 0.450
호선 | 0.452 | 0.977 | 1.000 | 0.362 | 0.357
역형식 | 0.520 | 0.317 | 0.362 | 1.000 | 0.000
층수 | 0.382 | 0.450 | 0.357 | 0.000 | 1.000
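
The extracted text does not name the measure behind each matrix. For two categorical columns such as 호선 and 역형식, one common association measure is Cramér's V, which scales the chi-squared statistic to the range 0 to 1. The sketch below is the plain, uncorrected form on hypothetical toy data; it is an illustration of the idea and is not claimed to reproduce the figures above:

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: platform type fully determined by line number.
lines = ["1", "1", "2", "2"]
platforms = ["상대식", "상대식", "섬식", "섬식"]
v_dependent = cramers_v(lines, platforms)      # 1.0

# Hypothetical toy data: platform type independent of line number.
platforms_mixed = ["상대식", "섬식", "상대식", "섬식"]
v_independent = cramers_v(lines, platforms_mixed)  # 0.0
```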

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

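First rows

Index | 호선 | 역명 | 역사면적(㎡) | 역형식 | 층수 | 역사심도(지하승강장~지상 m) | 준공년도
0 | 1 | 다대포해수욕장역 | 17493.14 | 상대식 | 지하3층 | 16.2 | 2016
1 | 1 | 다대포항역 | 9283.17 | 상대식 | 지하2층 | 20.27 | 2016
2 | 1 | 낫개역 | 10213.82 | 상대식 | 지하3층 | 20.16 | 2016
3 | 1 | 신장림역 | 7118.22 | 상대식 | 지하2층 | 25.31 | 2016
4 | 1 | 장림역 | 7863.45 | 상대식 | 지하2층 | 18.61 | 2016
5 | 1 | 동매역 | 7163.6 | 상대식 | 지하2층 | 16.7 | 2016
6 | 1 | 신평 | 11290.0 | 상대식 | 지하1층 지상3층 | 지상 | 1994
7 | 1 | 하단 | 9767.0 | 상대식 | 지하 2층 | 11.68 | 1994
8 | 1 | 당리 | 7112.0 | 상대식 | 지하 2층 | 12.71 | 1994
9 | 1 | 사하 | 7151.0 | 상대식 | 지하 2층 | 12.45 | 1994

Last rows

Index | 호선 | 역명 | 역사면적(㎡) | 역형식 | 층수 | 역사심도(지하승강장~지상 m) | 준공년도
104 | 4 | 충렬사 | 5379.0 | 상대식 | 지하 3층 | 20.47 | 2011
105 | 4 | 명장 | 5295.0 | 상대식 | 지하 3층 | 19.31 | 2011
106 | 4 | 서동 | 5304.0 | 상대식 | 지하 3층 | 20.69 | 2011
107 | 4 | 금사 | 4818.0 | 섬식 | 지하 3층 | 19.73 | 2011
108 | 4 | 반여농산물시장 | 6841.0 | 상대식 | 지하1층 지상6층 | 4.51 | 2011
109 | 4 | 석대 | 2087.0 | 섬식 | 지상 2층 | 지상 | 2011
110 | 4 | 영산대 | 2851.0 | 섬식 | 지상 2층 | 지상 | 2011
111 | 4 | 동부산대학 | 2747.0 | 섬식 | 지상 2층 | 지상 | 2011
112 | 4 | 고촌 | 2751.0 | 섬식 | 지상 2층 | 지상 | 2011
113 | 4 | 안평 | 2981.0 | 섬식 | 지상 2층 | 지상 | 2011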