Overview

Dataset statistics

Number of variables: 8
Number of observations: 329
Missing cells: 335
Missing cells (%): 12.7%
Duplicate rows: 11
Duplicate rows (%): 3.3%
Total size in memory: 21.7 KiB
Average record size in memory: 67.4 B
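
The percentages in this block follow from the raw counts. A minimal Python sketch that recomputes them, using only the figures reported here (the variable names are illustrative, not part of the report):

```python
# Figures copied from the Dataset statistics block.
n_variables = 8
n_observations = 329
missing_cells = 335
duplicate_rows = 11
avg_record_bytes = 67.4

total_cells = n_variables * n_observations             # every cell in the table
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
total_kib = avg_record_bytes * n_observations / 1024   # bytes to KiB

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```

The missing-cell share is taken over all cells (variables times observations), not over rows.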

Variable types

Categorical: 3
Text: 2
Numeric: 1
DateTime: 2

Dataset

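Description: Facility names, areas, and contract status of leased facilities in Busan Metro stations and related premises (line, station name, shop name, business type, area, quantity, contract start date, contract end date)
Author: Busan Transportation Corporation (부산교통공사)
URL: https://www.data.go.kr/data/3057656/fileData.do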

Alerts

수량 (quantity) has constant value "1" [Constant]
Dataset has 11 (3.3%) duplicate rows [Duplicates]
시설명 (facility name) is highly imbalanced (50.3%) [Imbalance]
계약시작일 (contract start date) has 6 (1.8%) missing values [Missing]
계약종료일 (contract end date) has 6 (1.8%) missing values [Missing]
사업진행단계 (project progress stage) has 323 (98.2%) missing values [Missing]
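
The three missing-value alerts account for every missing cell in the dataset, since the other five variables report zero missing values. A short Python check using the counts from the alerts (the dictionary is just an illustrative container):

```python
n_observations = 329

# Missing counts per column, copied from the alerts.
missing_by_column = {
    "계약시작일": 6,
    "계약종료일": 6,
    "사업진행단계": 323,
}

total_missing = sum(missing_by_column.values())
missing_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_column.items()
}

print(total_missing)   # compare with "Missing cells" in the overview
print(missing_pct)
```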

Reproduction

Analysis started: 2024-03-23 05:37:03.059457
Analysis finished: 2024-03-23 05:37:04.401425
Duration: 1.34 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
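
The reported duration is the difference between the two timestamps. A small sketch with the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-03-23 05:37:03.059457")
finished = datetime.fromisoformat("2024-03-23 05:37:04.401425")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```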

Variables

호선 (line)
Categorical

Distinct: 4
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
2      176    53.5%
1      125    38.0%
4      15     4.6%
3      13     4.0%

Length

Histogram of lengths of the category

역사명 (station name)
Text

Distinct: 85
Distinct (%): 25.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

Length

Max length: 11
Median length: 3
Mean length: 3.3708207
Min length: 3

Characters and Unicode

Total characters: 1109
Distinct characters: 116
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
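
The mean length reported for this text variable is the total character count divided by the number of values. A quick Python check, assuming all 329 observations are counted because the variable has no missing values; the Hangul count of 1103 comes from the script breakdown further down:

```python
total_characters = 1109
n_values = 329            # no missing values in this variable
hangul_characters = 1103  # from the "Most occurring scripts" table

mean_length = total_characters / n_values
hangul_share = 100 * hangul_characters / total_characters

print(f"Mean length: {mean_length:.7f}")
print(f"Hangul share: {hangul_share:.1f}%")
```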

Unique

Unique: 43
Unique (%): 13.1%

Sample

1st row: 장림역
2nd row: 신평역
3rd row: 신평역
4th row: 신평역
5th row: 하단역

Value              Count  Frequency (%)
수영역             37     11.2%
전포역             29     8.8%
양산역             19     5.7%
중앙역             16     4.8%
금련산역           14     4.2%
광안역             13     3.9%
연산역             11     3.3%
자갈치역           10     3.0%
센텀시티역         9      2.7%
중동역             9      2.7%
Other values (74)  164    49.5%

Most occurring characters

Value               Count  Frequency (%)
                    331    29.8%
                    66     6.0%
                    38     3.4%
                    38     3.4%
                    35     3.2%
                    32     2.9%
                    26     2.3%
                    26     2.3%
                    25     2.3%
                    21     1.9%
Other values (106)  471    42.5%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1103   99.5%
Space Separator    2      0.2%
Other Punctuation  2      0.2%
Open Punctuation   1      0.1%
Close Punctuation  1      0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    331    30.0%
                    66     6.0%
                    38     3.4%
                    38     3.4%
                    35     3.2%
                    32     2.9%
                    26     2.4%
                    26     2.4%
                    25     2.3%
                    21     1.9%
Other values (102)  465    42.2%

Space Separator
Value    Count  Frequency (%)
(space)  2      100.0%

Other Punctuation
Value  Count  Frequency (%)
,      2      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      1      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1103   99.5%
Common  6      0.5%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    331    30.0%
                    66     6.0%
                    38     3.4%
                    38     3.4%
                    35     3.2%
                    32     2.9%
                    26     2.4%
                    26     2.4%
                    25     2.3%
                    21     1.9%
Other values (102)  465    42.2%

Common
Value    Count  Frequency (%)
(space)  2      33.3%
,        2      33.3%
(        1      16.7%
)        1      16.7%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1103   99.5%
ASCII   6      0.5%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    331    30.0%
                    66     6.0%
                    38     3.4%
                    38     3.4%
                    35     3.2%
                    32     2.9%
                    26     2.4%
                    26     2.4%
                    25     2.3%
                    21     1.9%
Other values (102)  465    42.2%

ASCII
Value    Count  Frequency (%)
(space)  2      33.3%
,        2      33.3%
(        1      16.7%
)        1      16.7%

시설명 (facility name)
Categorical

IMBALANCE

Distinct: 20
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

Length

Max length: 12
Median length: 2
Mean length: 3.7659574
Min length: 2

Unique

Unique: 9
Unique (%): 2.7%

Sample

1st row: 상가
2nd row: 상가
3rd row: 상가
4th row: 상가
5th row: 상가

Common Values

Value                 Count  Frequency (%)
상가                  206    62.6%
개발상가              28     8.5%
전문상가(화장품)      25     7.6%
전문상가(편의점)      21     6.4%
통로상가              15     4.6%
전문상가(디저트카페)  10     3.0%
커피전문점            5      1.5%
사무실                4      1.2%
약국                  2      0.6%
소극장                2      0.6%
Other values (10)     11     3.3%
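
The report flags this variable as highly imbalanced (50.3%) without showing how the figure is obtained. One formula that reproduces it from the counts in the table is one minus the Shannon entropy normalized by the log of the number of classes. Treating that as the tool's definition is an assumption here, as is the split of the "Other values" row described in the comment. A minimal Python sketch:

```python
import math

# Counts from the Common Values table. The ten values grouped under
# "Other values (10) 11" are assumed to be one value seen twice and nine
# values seen once, which is what Distinct = 20 and Unique = 9 imply.
counts = [206, 28, 25, 21, 15, 10, 5, 4, 2, 2] + [2] + [1] * 9


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): H is the Shannon entropy in bits, k the class count."""
    k = len(class_counts)
    if k < 2:
        return 0.0  # assumption: a single class is scored 0 rather than undefined
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"Imbalance: {100 * score:.1f}%")
```

Under this definition a perfectly uniform distribution scores 0 and a distribution concentrated in one class approaches 1, so 50.3% reflects the dominance of the single value 상가 (206 of 329 rows).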

Length

Histogram of lengths of the category

Value                Count  Frequency (%)
상가                 207    62.9%
개발상가             28     8.5%
전문상가(화장품      25     7.6%
전문상가(편의점      21     6.4%
통로상가             15     4.6%
전문상가(디저트카페  10     3.0%
커피전문점           5      1.5%
사무실               4      1.2%
전문상가(건강식품    2      0.6%
소극장               2      0.6%
Other values (9)     10     3.0%

면적(제곱미터) (area, square meters)
Real number (ℝ)

Distinct: 251
Distinct (%): 76.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.01535
Minimum: 6.05
Maximum: 12258
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 KiB

Quantile statistics

Minimum: 6.05
5-th percentile: 9.958
Q1: 20
Median: 30
Q3: 45.13
95-th percentile: 311.9
Maximum: 12258
Range: 12251.95
Interquartile range (IQR): 25.13

Descriptive statistics

Standard deviation: 867.25542
Coefficient of variation (CV): 6.5693529
Kurtosis: 161.48685
Mean: 132.01535
Median Absolute Deviation (MAD): 12.19
Skewness: 12.506922
Sum: 43433.05
Variance: 752131.96
Monotonicity: Not monotonic
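
Several of these statistics are simple functions of one another. The Python sketch below recomputes the derived ones from the reported base figures; small differences in the last digits are expected because the inputs are already rounded:

```python
# Base figures copied from the quantile and descriptive statistics.
n = 329
total = 43433.05
mean = 132.01535
std = 867.25542
q1, q3 = 20.0, 45.13
minimum, maximum = 6.05, 12258.0

mean_from_sum = total / n        # mean = sum / number of observations
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"mean={mean_from_sum:.5f} cv={cv:.7f} variance={variance:.2f}")
print(f"iqr={iqr:.2f} range={value_range:.2f}")
```

A CV above 6 together with a median of 30 and a maximum of 12258 indicates that the mean and standard deviation are driven by a few very large leased areas.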
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
30.0                24     7.3%
27.36               9      2.7%
20.0                7      2.1%
27.0                6      1.8%
10.0                5      1.5%
29.4                4      1.2%
16.49               3      0.9%
60.0                3      0.9%
32.4                3      0.9%
28.0                3      0.9%
Other values (241)  262    79.6%

Value  Count  Frequency (%)
6.05   1      0.3%
7.0    1      0.3%
7.59   1      0.3%
7.82   1      0.3%
8.0    1      0.3%
8.05   1      0.3%
8.14   1      0.3%
8.58   1      0.3%
8.59   1      0.3%
9.0    1      0.3%

Value    Count  Frequency (%)
12258.0  1      0.3%
9691.0   1      0.3%
1025.33  1      0.3%
948.57   1      0.3%
927.29   1      0.3%
821.8    1      0.3%
820.9    1      0.3%
775.18   1      0.3%
731.85   1      0.3%
691.46   1      0.3%

수량 (quantity)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      329    100.0%

Length

Histogram of lengths of the category


계약시작일 (contract start date)
Date

MISSING

Distinct: 191
Distinct (%): 59.1%
Missing: 6
Missing (%): 1.8%
Memory size: 2.7 KiB
Minimum: 2010-06-01 00:00:00
Maximum: 2024-03-05 00:00:00
Histogram with fixed size bins (bins=50)

계약종료일 (contract end date)
Date

MISSING

Distinct: 196
Distinct (%): 60.7%
Missing: 6
Missing (%): 1.8%
Memory size: 2.7 KiB
Minimum: 2023-07-02 00:00:00
Maximum: 2040-11-30 00:00:00
Histogram with fixed size bins (bins=50)

사업진행단계 (project progress stage)
Text

MISSING

Distinct: 5
Distinct (%): 83.3%
Missing: 323
Missing (%): 98.2%
Memory size: 2.7 KiB

Length

Max length: 11
Median length: 10
Mean length: 7.8333333
Min length: 2

Characters and Unicode

Total characters: 47
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

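Unique: 4
Unique (%): 66.7%

Sample

1st row: 소송 진행 중 (litigation in progress)
2nd row: 개장 준비 중 (preparing to open)
3rd row: 상가 정상화 추진 중 (shop normalization in progress)
4th row: 공실(계약진행 중) (vacant, contract in progress)
5th row: 공실(계약진행 중) (vacant, contract in progress)
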
Value          Count  Frequency (%)
               5      33.3%
공실(계약진행  2      13.3%
소송           1      6.7%
진행           1      6.7%
개장           1      6.7%
준비           1      6.7%
상가           1      6.7%
정상화         1      6.7%
추진           1      6.7%
공실           1      6.7%

Most occurring characters

Value              Count  Frequency (%)
(space)            9      19.1%
                   5      10.6%
                   4      8.5%
                   3      6.4%
                   3      6.4%
                   3      6.4%
(                  2      4.3%
                   2      4.3%
                   2      4.3%
)                  2      4.3%
Other values (11)  12     25.5%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       34     72.3%
Space Separator    9      19.1%
Open Punctuation   2      4.3%
Close Punctuation  2      4.3%

Most frequent character per category

Other Letter
Value             Count  Frequency (%)
                  5      14.7%
                  4      11.8%
                  3      8.8%
                  3      8.8%
                  3      8.8%
                  2      5.9%
                  2      5.9%
                  2      5.9%
                  1      2.9%
                  1      2.9%
Other values (8)  8      23.5%

Space Separator
Value    Count  Frequency (%)
(space)  9      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      2      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  34     72.3%
Common  13     27.7%

Most frequent character per script

Hangul
Value             Count  Frequency (%)
                  5      14.7%
                  4      11.8%
                  3      8.8%
                  3      8.8%
                  3      8.8%
                  2      5.9%
                  2      5.9%
                  2      5.9%
                  1      2.9%
                  1      2.9%
Other values (8)  8      23.5%

Common
Value    Count  Frequency (%)
(space)  9      69.2%
(        2      15.4%
)        2      15.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  34     72.3%
ASCII   13     27.7%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)  9      69.2%
(        2      15.4%
)        2      15.4%

Hangul
Value             Count  Frequency (%)
                  5      14.7%
                  4      11.8%
                  3      8.8%
                  3      8.8%
                  3      8.8%
                  2      5.9%
                  2      5.9%
                  2      5.9%
                  1      2.9%
                  1      2.9%
Other values (8)  8      23.5%

Interactions


Correlations

                호선   역사명  시설명  면적(제곱미터)  사업진행단계
호선            1.000  0.994   0.659   0.000           1.000
역사명          0.994  1.000   0.895   1.000           1.000
시설명          0.659  0.895   1.000   0.000           NaN
면적(제곱미터)  0.000  1.000   0.000   1.000           1.000
사업진행단계    1.000  1.000   NaN     1.000           1.000

        시설명  호선
시설명  1.000   0.359
호선    0.359   1.000

                면적(제곱미터)  호선   시설명
면적(제곱미터)  1.000           0.000  0.000
호선            0.000           1.000  0.359
시설명          0.000           0.359  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

     호선  역사명            시설명  면적(제곱미터)  수량  계약시작일  계약종료일  사업진행단계
0    1     장림역            상가    23.6            1     2022-04-27  2027-04-26  <NA>
1    1     신평역            상가    55.89           1     2020-05-15  2025-05-14  <NA>
2    1     신평역            상가    75.02           1     2023-04-02  2028-04-01  <NA>
3    1     신평역            상가    820.9           1     2024-02-01  2029-01-31  <NA>
4    1     하단역            상가    65.6            1     2020-07-18  2025-07-17  <NA>
5    1     하단역            상가    23.45           1     2023-09-22  2028-09-21  <NA>
6    1     당리역            상가    14.17           1     2021-11-24  2026-11-23  <NA>
7    1     당리역            상가    14.18           1     2021-11-27  2026-11-26  <NA>
8    1     괴정역            상가    28.0            1     2021-06-07  2026-06-06  <NA>
9    1     동대신역          상가    9.12            1     2022-02-24  2027-02-23  <NA>

     호선  역사명            시설명  면적(제곱미터)  수량  계약시작일  계약종료일  사업진행단계
319  4     동래역            상가    60.59           1     2020-06-18  2025-06-17  <NA>
320  4     수안역            상가    61.66           1     2021-09-17  2026-09-16  <NA>
321  4     낙민역            상가    12.63           1     2024-02-19  2029-02-18  <NA>
322  4     충렬사역          상가    9.93            1     2020-03-28  2025-03-27  <NA>
323  4     명장역            상가    10.92           1     2022-06-26  2027-06-25  <NA>
324  4     서동역            상가    12.35           1     2021-05-06  2026-05-05  <NA>
325  4     금사역            상가    16.49           1     2023-01-15  2028-01-14  <NA>
326  4     반여농산물시장역  상가    41.91           1     2022-09-11  2024-09-10  <NA>
327  4     석대역            상가    10.36           1     2021-11-02  2026-11-01  <NA>
328  4     영산대역          상가    9.41            1     2021-08-13  2026-08-12  <NA>

Duplicate rows

Most frequently occurring

   호선  역사명  시설명    면적(제곱미터)  수량  계약시작일  계약종료일  사업진행단계  # duplicates
8  2     전포역  상가      30.0            1     2024-01-01  2024-12-31  <NA>          6
3  2     수영역  상가      27.36           1     2023-06-03  2028-06-02  <NA>          3
4  2     수영역  상가      27.36           1     2023-07-08  2028-07-07  <NA>          3
6  2     전포역  상가      29.4            1     2023-09-25  2025-09-24  <NA>          3
0  1     중앙역  통로상가  20.0            1     2023-06-16  2028-06-15  <NA>          2
1  1     중앙역  통로상가  20.0            1     2023-06-21  2028-06-20  <NA>          2
2  2     수영역  상가      27.36           1     2021-07-08  2026-07-07  <NA>          2
5  2     전포역  상가      23.1            1     2023-10-12  2028-10-11  <NA>          2
7  2     전포역  상가      30.0            1     2023-09-26  2028-09-25  <NA>          2
9  2     전포역  상가      47.52           1     2024-01-01  2024-12-31  <NA>          2
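
The table lists ten duplicate groups, while the overview reports 11 duplicate rows. One reading that reconciles the two, offered here as an assumption rather than a confirmed description of the tool, is that the overview counts distinct duplicated row patterns and the table shows only the ten most frequent. The Python sketch below totals the rows covered by the listed groups:

```python
# "# duplicates" column of the ten listed groups, in table order.
group_sizes = [6, 3, 3, 3, 2, 2, 2, 2, 2, 2]

rows_in_listed_groups = sum(group_sizes)                 # rows that belong to a listed group
surplus_copies = sum(size - 1 for size in group_sizes)   # copies beyond the first in each group

print(f"Listed groups: {len(group_sizes)}")
print(f"Rows covered: {rows_in_listed_groups}")
print(f"Surplus copies: {surplus_copies}")
```

Under that reading, deduplicating the dataset would drop at least the surplus copies counted here, plus those of any group not shown in the table.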