Overview

Dataset statistics

Number of variables: 12
Number of observations: 1942
Missing cells: 1943
Missing cells (%): 8.3%
Duplicate rows: 297
Duplicate rows (%): 15.3%
Total size in memory: 182.2 KiB
Average record size in memory: 96.1 B
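The derived figures in this table follow from the raw counts, and the per-column missing counts listed under Alerts and Variables add up to the reported number of missing cells. The short check below is a sketch of that arithmetic only; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 12
n_observations = 1942
missing_cells = 1943
duplicate_rows = 297
total_size_kib = 182.2

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

# Per-column missing counts as reported for each variable.
missing_by_column = {
    "Unnamed: 5": 208, "Unnamed: 6": 467, "Unnamed: 7": 526,
    "Unnamed: 8": 526, "Unnamed: 9": 215, "Unnamed: 11": 1,
}

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
print(f"sum of per-column missing counts: {sum(missing_by_column.values())}")
```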

Variable types

Unsupported: 6
Categorical: 3
Text: 2
DateTime: 1

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-12927/F/1/datasetView.do

Alerts

Unnamed: 11 has constant value "" (Constant)
Dataset has 297 (15.3%) duplicate rows (Duplicates)
Unnamed: 1 is highly overall correlated with Unnamed: 10 (High correlation)
Unnamed: 2 is highly overall correlated with Unnamed: 10 (High correlation)
Unnamed: 10 is highly overall correlated with Unnamed: 1 and 1 other fields (High correlation)
Unnamed: 10 is highly imbalanced (57.7%) (Imbalance)
Unnamed: 5 has 208 (10.7%) missing values (Missing)
Unnamed: 6 has 467 (24.0%) missing values (Missing)
Unnamed: 7 has 526 (27.1%) missing values (Missing)
Unnamed: 8 has 526 (27.1%) missing values (Missing)
Unnamed: 9 has 215 (11.1%) missing values (Missing)
상가현황(2017.10월) is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2024-04-29 16:39:04.347364
Analysis finished: 2024-04-29 16:39:05.271155
Duration: 0.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상가현황(2017.10월)
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Unnamed: 1
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Category | Count
네트워크(브랜드) [network (brand)] | 426
GS | 406
개별(일반) [individual (general)] | 398
공실 [vacant] | 309
복합 [complex] | 250
Other values (7) | 153

Length

Max length: 12
Median length: 2
Mean length: 4.5942327
Min length: 2

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 상가유형 [store type]
2nd row: 공실 [vacant]
3rd row: 네트워크(브랜드) [network (brand)]
4th row: 네트워크(브랜드) [network (brand)]
5th row: 네트워크(브랜드) [network (brand)]

Common Values

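Value | Count | Frequency (%)
네트워크(브랜드) [network (brand)] | 426 | 21.9%
GS | 406 | 20.9%
개별(일반) [individual (general)] | 398 | 20.5%
공실 [vacant] | 309 | 15.9%
복합 [complex] | 250 | 12.9%
입찰공고중 [tender in progress] | 82 | 4.2%
개별(장기) [individual (long-term)] | 29 | 1.5%
개별(대형) [individual (large)] | 19 | 1.0%
기타 [other] | 19 | 1.0%
개별(일반-무상) [individual (general, rent-free)] | 2 | 0.1%
Other values (2) | 2 | 0.1%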

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
네트워크(브랜드 | 426 | 21.9%
gs | 406 | 20.9%
개별(일반 | 398 | 20.5%
공실 | 309 | 15.9%
복합 | 250 | 12.9%
입찰공고중 | 82 | 4.2%
개별(장기 | 29 | 1.5%
개별(대형 | 19 | 1.0%
기타 | 19 | 1.0%
개별(일반-무상 | 2 | 0.1%
Other values (2) | 2 | 0.1%

Unnamed: 2
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Category | Count
7호선 [Line 7] | 519
5호선 [Line 5] | 358
2호선 [Line 2] | 329
6호선 [Line 6] | 265
4호선 [Line 4] | 195
Other values (4) | 276

Length

Max length: 3
Median length: 3
Mean length: 2.9994851
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 호선 [line]
2nd row: 1호선 [Line 1]
3rd row: 1호선 [Line 1]
4th row: 1호선 [Line 1]
5th row: 1호선 [Line 1]

Common Values

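Value | Count | Frequency (%)
7호선 [Line 7] | 519 | 26.7%
5호선 [Line 5] | 358 | 18.4%
2호선 [Line 2] | 329 | 16.9%
6호선 [Line 6] | 265 | 13.6%
4호선 [Line 4] | 195 | 10.0%
3호선 [Line 3] | 170 | 8.8%
8호선 [Line 8] | 55 | 2.8%
1호선 [Line 1] | 50 | 2.6%
호선 [line] | 1 | 0.1%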

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Unnamed: 3
Text

Distinct: 249
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Length

Max length: 13
Median length: 9
Mean length: 4.530896
Min length: 2

Characters and Unicode

Total characters: 8799
Distinct characters: 212
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 1.5%

Sample

1st row: 역명 [station name]
2nd row: 서울(1)역
3rd row: 서울(1)역
4th row: 서울(1)역
5th row: 서울(1)역
Value | Count | Frequency (%)
오목교역 | 83 | 4.3%
반포역 | 46 | 2.4%
청담역 | 39 | 2.0%
사당(4)역 | 33 | 1.7%
잠실역 | 33 | 1.7%
합정역 | 30 | 1.5%
공덕역 | 29 | 1.5%
천호역 | 28 | 1.4%
고속터미널역 | 27 | 1.4%
이수역 | 25 | 1.3%
Other values (233) | 1569 | 80.8%

Most occurring characters

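Value | Count | Frequency (%)
[glyph missing] | 1965 | 22.3%
[space] | 624 | 7.1%
[glyph missing] | 234 | 2.7%
) | 230 | 2.6%
( | 230 | 2.6%
[glyph missing] | 193 | 2.2%
[glyph missing] | 152 | 1.7%
[glyph missing] | 127 | 1.4%
[glyph missing] | 123 | 1.4%
[glyph missing] | 120 | 1.4%
Other values (202) | 4801 | 54.6%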

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7427 | 84.4%
Space Separator | 624 | 7.1%
Decimal Number | 288 | 3.3%
Close Punctuation | 230 | 2.6%
Open Punctuation | 230 | 2.6%

Most frequent character per category

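Other Letter
Value | Count | Frequency (%)
[glyph missing] | 1965 | 26.5%
[glyph missing] | 234 | 3.2%
[glyph missing] | 193 | 2.6%
[glyph missing] | 152 | 2.0%
[glyph missing] | 127 | 1.7%
[glyph missing] | 123 | 1.7%
[glyph missing] | 120 | 1.6%
[glyph missing] | 113 | 1.5%
[glyph missing] | 106 | 1.4%
[glyph missing] | 103 | 1.4%
Other values (192) | 4191 | 56.4%

Decimal Number
Value | Count | Frequency (%)
4 | 81 | 28.1%
3 | 65 | 22.6%
2 | 52 | 18.1%
7 | 28 | 9.7%
6 | 28 | 9.7%
1 | 23 | 8.0%
5 | 11 | 3.8%

Space Separator
Value | Count | Frequency (%)
[space] | 624 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 230 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 230 | 100.0%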

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7427 | 84.4%
Common | 1372 | 15.6%

Most frequent character per script

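Hangul
Value | Count | Frequency (%)
[glyph missing] | 1965 | 26.5%
[glyph missing] | 234 | 3.2%
[glyph missing] | 193 | 2.6%
[glyph missing] | 152 | 2.0%
[glyph missing] | 127 | 1.7%
[glyph missing] | 123 | 1.7%
[glyph missing] | 120 | 1.6%
[glyph missing] | 113 | 1.5%
[glyph missing] | 106 | 1.4%
[glyph missing] | 103 | 1.4%
Other values (192) | 4191 | 56.4%

Common
Value | Count | Frequency (%)
[space] | 624 | 45.5%
) | 230 | 16.8%
( | 230 | 16.8%
4 | 81 | 5.9%
3 | 65 | 4.7%
2 | 52 | 3.8%
7 | 28 | 2.0%
6 | 28 | 2.0%
1 | 23 | 1.7%
5 | 11 | 0.8%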

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7427 | 84.4%
ASCII | 1372 | 15.6%

Most frequent character per block

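Hangul
Value | Count | Frequency (%)
[glyph missing] | 1965 | 26.5%
[glyph missing] | 234 | 3.2%
[glyph missing] | 193 | 2.6%
[glyph missing] | 152 | 2.0%
[glyph missing] | 127 | 1.7%
[glyph missing] | 123 | 1.7%
[glyph missing] | 120 | 1.6%
[glyph missing] | 113 | 1.5%
[glyph missing] | 106 | 1.4%
[glyph missing] | 103 | 1.4%
Other values (192) | 4191 | 56.4%

ASCII
Value | Count | Frequency (%)
[space] | 624 | 45.5%
) | 230 | 16.8%
( | 230 | 16.8%
4 | 81 | 5.9%
3 | 65 | 4.7%
2 | 52 | 3.8%
7 | 28 | 2.0%
6 | 28 | 2.0%
1 | 23 | 1.7%
5 | 11 | 0.8%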

Unnamed: 4
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 208
Missing (%): 10.7%
Memory size: 15.3 KiB

Unnamed: 6
Text

MISSING 

Distinct: 62
Distinct (%): 4.2%
Missing: 467
Missing (%): 24.0%
Memory size: 15.3 KiB

Length

Max length: 17
Median length: 14
Mean length: 3.6976271
Min length: 2

Characters and Unicode

Total characters: 5454
Distinct characters: 138
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 1.2%

Sample

1st row: 업종 [business type]
2nd row: 커피 [coffee]
3rd row: 커피 [coffee]
4th row: 화장품 [cosmetics]
5th row: 액세서리 [accessories]
Value | Count | Frequency (%)
화장품 [cosmetics] | 203 | 12.9%
편의점 [convenience store] | 199 | 12.6%
의류 [clothing] | 157 | 10.0%
액세서리 [accessories] | 111 | 7.0%
제과 [bakery] | 88 | 5.6%
기타 [other] | 86 | 5.5%
의류(여성 [women's clothing] | 83 | 5.3%
복합상가 [complex arcade] | 81 | 5.1%
커피 [coffee] | 77 | 4.9%
공실 [vacant] | 65 | 4.1%
Other values (54) | 425 | 27.0%

Most occurring characters

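Value | Count | Frequency (%)
[glyph missing] | 462 | 8.5%
[glyph missing] | 260 | 4.8%
[glyph missing] | 260 | 4.8%
[glyph missing] | 217 | 4.0%
[glyph missing] | 204 | 3.7%
[glyph missing] | 201 | 3.7%
[glyph missing] | 200 | 3.7%
[glyph missing] | 149 | 2.7%
[glyph missing] | 144 | 2.6%
[glyph missing] | 138 | 2.5%
Other values (128) | 3219 | 59.0%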

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4982 | 91.3%
Other Punctuation | 142 | 2.6%
Close Punctuation | 113 | 2.1%
Open Punctuation | 113 | 2.1%
Space Separator | 102 | 1.9%
Math Symbol | 2 | < 0.1%

Most frequent character per category

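Other Letter
Value | Count | Frequency (%)
[glyph missing] | 462 | 9.3%
[glyph missing] | 260 | 5.2%
[glyph missing] | 260 | 5.2%
[glyph missing] | 217 | 4.4%
[glyph missing] | 204 | 4.1%
[glyph missing] | 201 | 4.0%
[glyph missing] | 200 | 4.0%
[glyph missing] | 149 | 3.0%
[glyph missing] | 144 | 2.9%
[glyph missing] | 138 | 2.8%
Other values (121) | 2747 | 55.1%

Other Punctuation
Value | Count | Frequency (%)
, | 84 | 59.2%
. | 56 | 39.4%
@ | 2 | 1.4%

Close Punctuation
Value | Count | Frequency (%)
) | 113 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 113 | 100.0%

Space Separator
Value | Count | Frequency (%)
[space] | 102 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 2 | 100.0%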

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4982 | 91.3%
Common | 472 | 8.7%

Most frequent character per script

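Hangul
Value | Count | Frequency (%)
[glyph missing] | 462 | 9.3%
[glyph missing] | 260 | 5.2%
[glyph missing] | 260 | 5.2%
[glyph missing] | 217 | 4.4%
[glyph missing] | 204 | 4.1%
[glyph missing] | 201 | 4.0%
[glyph missing] | 200 | 4.0%
[glyph missing] | 149 | 3.0%
[glyph missing] | 144 | 2.9%
[glyph missing] | 138 | 2.8%
Other values (121) | 2747 | 55.1%

Common
Value | Count | Frequency (%)
) | 113 | 23.9%
( | 113 | 23.9%
[space] | 102 | 21.6%
, | 84 | 17.8%
. | 56 | 11.9%
@ | 2 | 0.4%
+ | 2 | 0.4%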

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4982 | 91.3%
ASCII | 472 | 8.7%

Most frequent character per block

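Hangul
Value | Count | Frequency (%)
[glyph missing] | 462 | 9.3%
[glyph missing] | 260 | 5.2%
[glyph missing] | 260 | 5.2%
[glyph missing] | 217 | 4.4%
[glyph missing] | 204 | 4.1%
[glyph missing] | 201 | 4.0%
[glyph missing] | 200 | 4.0%
[glyph missing] | 149 | 3.0%
[glyph missing] | 144 | 2.9%
[glyph missing] | 138 | 2.8%
Other values (121) | 2747 | 55.1%

ASCII
Value | Count | Frequency (%)
) | 113 | 23.9%
( | 113 | 23.9%
[space] | 102 | 21.6%
, | 84 | 17.8%
. | 56 | 11.9%
@ | 2 | 0.4%
+ | 2 | 0.4%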

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 526
Missing (%): 27.1%
Memory size: 15.3 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 526
Missing (%): 27.1%
Memory size: 15.3 KiB
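
Unnamed: 8 is flagged as unsupported, and the sample rows of this report suggest why: it appears to hold 계약종료 [contract end] values that mix timestamps such as `2017-08-15 00:00:00` with strings that look like `2021-11.17`, plus the header text itself. The sketch below shows one way such cells could be normalized. The helper name `parse_contract_date` is hypothetical, and reading `2021-11.17` as year-month.day is an assumption about the source data, not something the report confirms.

```python
from datetime import date, datetime

# Formats to try, in order. The last one ("2021-11.17") is an assumption
# about how the malformed strings in the sample rows should be read.
CANDIDATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%Y-%m.%d")
MISSING_MARKERS = {"", "nan", "nat", "<na>"}


def parse_contract_date(value):
    """Return a date for a recognised value, or None if missing or unparseable."""
    if value is None:
        return None
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # e.g. the header text "계약종료"


for raw in ["2017-08-15 00:00:00", "2021-11.17", "NaN", "계약종료"]:
    print(repr(raw), "->", parse_contract_date(raw))
```

Returning None for anything unrecognised keeps bad cells visible as missing values instead of silently guessing a date.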

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 215
Missing (%): 11.1%
Memory size: 15.3 KiB
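
Judging by the sample rows of this report, Unnamed: 9 appears to hold 월 임대료(원) [monthly rent (KRW)] and mixes numbers (for example 5386912 and 6254183.333333) with the string 무상 [rent-free], which would explain the unsupported type. The following is a minimal sketch of coercing such cells to numbers. The function name `parse_monthly_rent` is hypothetical, and mapping 무상 to 0.0 is an assumption; a separate flag column may suit some analyses better.

```python
import math

RENT_FREE_MARKER = "무상"  # "rent-free", as seen in the sample rows


def parse_monthly_rent(value):
    """Coerce a raw rent cell to float (KRW), or None when it cannot be read."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        number = float(value)
        return None if math.isnan(number) else number
    text = str(value).strip().replace(",", "")
    if text == RENT_FREE_MARKER:
        return 0.0  # assumption: treat rent-free as zero rent
    try:
        number = float(text)
    except ValueError:
        return None  # header text such as "월 임대료(원)" or other labels
    return None if math.isnan(number) else number


for raw in [5386912, "6254183.333333", "무상", "월 임대료(원)", float("nan")]:
    print(repr(raw), "->", parse_monthly_rent(raw))
```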

Unnamed: 10
Categorical

HIGH CORRELATION  IMBALANCE 

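Distinct: 6
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 KiB

Category | Count
<NA> | 1490
공실 [vacant] | 309
입찰공고중 [tender in progress] | 82
명도거부 [refusal to vacate] | 45
계약만료 [contract expired] | 15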

Length

Max length: 5
Median length: 4
Mean length: 3.722966
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 비고 [remarks]
2nd row: 공실 [vacant]
3rd row: <NA>
4th row: 명도거부 [refusal to vacate]
5th row: <NA>

Common Values

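Value | Count | Frequency (%)
<NA> | 1490 | 76.7%
공실 [vacant] | 309 | 15.9%
입찰공고중 [tender in progress] | 82 | 4.2%
명도거부 [refusal to vacate] | 45 | 2.3%
계약만료 [contract expired] | 15 | 0.8%
비고 [remarks] | 1 | 0.1%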
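
The 57.7% imbalance figure reported for Unnamed: 10 can be reproduced from these counts if the score is taken to be one minus the normalized Shannon entropy of the class distribution. That definition is an assumption here, since the report does not state its formula, and `imbalance_score` is an illustrative name.

```python
import math

# Value counts from the "Common Values" table for Unnamed: 10.
counts = {
    "<NA>": 1490, "공실": 309, "입찰공고중": 82,
    "명도거부": 45, "계약만료": 15, "비고": 1,
}


def imbalance_score(value_counts):
    """1 - (Shannon entropy / maximum entropy): 0 is balanced, 1 is a single class."""
    total = sum(value_counts)
    n_classes = len(value_counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in value_counts if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(list(counts.values()))
print(f"imbalance: {score:.1%}")
```

Under this reading the score is driven mostly by the `<NA>` string, which covers 1490 of 1942 rows; it is counted as an ordinary category because the column reports zero missing values.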

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value | Count | Frequency (%)
na | 1490 | 76.7%
공실 | 309 | 15.9%
입찰공고중 | 82 | 4.2%
명도거부 | 45 | 2.3%
계약만료 | 15 | 0.8%
비고 | 1 | 0.1%

Unnamed: 11
Date

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 1
Missing (%): 0.1%
Memory size: 15.3 KiB
Minimum: 2017-10-11 00:00:00
Maximum: 2017-10-11 00:00:00
[Figure: Histogram with fixed size bins (bins=1)]

Correlations

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 6 | Unnamed: 10
Unnamed: 1 | 1.000 | 0.768 | 0.920 | 0.937
Unnamed: 2 | 0.768 | 1.000 | 0.860 | 0.765
Unnamed: 6 | 0.920 | 0.860 | 1.000 | 0.828
Unnamed: 10 | 0.937 | 0.765 | 0.828 | 1.000

Variable | Unnamed: 10 | Unnamed: 2 | Unnamed: 1
Unnamed: 10 | 1.000 | 0.575 | 0.897
Unnamed: 2 | 0.575 | 1.000 | 0.460
Unnamed: 1 | 0.897 | 0.460 | 1.000

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 10
Unnamed: 1 | 1.000 | 0.460 | 0.897
Unnamed: 2 | 0.460 | 1.000 | 0.575
Unnamed: 10 | 0.897 | 0.575 | 1.000
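
The report does not say which association measure sits behind each matrix. For pairs of categorical columns a common choice is Cramér's V, so the sketch below computes the uncorrected form from first principles as an illustration; it is not claimed to be the exact statistic used here. The rows are hypothetical, modelled on the pattern visible in this report, where 공실 (309) and 입찰공고중 (82) occur with identical counts in Unnamed: 1 and Unnamed: 10. Such a pattern would be consistent with the high coefficients between those two columns.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Uncorrected Cramér's V for an iterable of (x, y) category pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical (store type, remarks) pairs: each type maps to exactly one remark.
dependent = (
    [("공실", "공실")] * 3 + [("GS", "<NA>")] * 4 + [("입찰공고중", "입찰공고중")] * 2
)
# Hypothetical pairs with no association at all.
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

print(round(cramers_v(dependent), 3))
print(round(cramers_v(independent), 3))
```

A value near 1 means one column almost determines the other, which is a reason to check whether both columns carry separate information before modelling.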

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 상가현황(2017.10월) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11
0 | NO | 상가유형 | 호선 | 역명 | 상가번호 | 면적 | 업종 | 계약시작 | 계약종료 | 월 임대료(원) | 비고 | NaT
1 | 1 | 공실 | 1호선 | 서울(1)역 | 150107 | 33 | <NA> | NaN | NaN | 6254183.333333 | 공실 | 2017-10-11
2 | 2 | 네트워크(브랜드) | 1호선 | 서울(1)역 | 150108 | 33 | 커피 | 2012-07-06 00:00:00 | 2017-08-15 00:00:00 | 5386912 | <NA> | 2017-10-11
3 | 3 | 네트워크(브랜드) | 1호선 | 서울(1)역 | 150109 | 12 | 커피 | 2012-05-21 00:00:00 | 2017-07-10 00:00:00 | 5178550 | 명도거부 | 2017-10-11
4 | 4 | 네트워크(브랜드) | 1호선 | 서울(1)역 | 150110 | 41.3 | 화장품 | 2015-09-05 00:00:00 | 2018-11-03 00:00:00 | 16970250 | <NA> | 2017-10-11
5 | 5 | 개별(일반) | 1호선 | 시청(1)역 | 151101 | 19.18 | 액세서리 | 2013-03-18 00:00:00 | 2018-03-17 00:00:00 | 4800900 | <NA> | 2017-10-11
6 | 6 | 공실 | 1호선 | 시청(1)역 | 151102 | 15.03 | <NA> | NaN | NaN | 3077250 | 공실 | 2017-10-11
7 | 7 | 개별(일반-무상) | 1호선 | 시청(1)역 | 151103 | 57.6 | 액세서리 | 2015-02-01 00:00:00 | 2020-01-31 00:00:00 | 무상 | <NA> | 2017-10-11
8 | 8 | 공실 | 1호선 | 시청(1)역 | 151104 | 25 | <NA> | NaN | NaN | 7266625 | 공실 | 2017-10-11
9 | 9 | 네트워크(브랜드) | 1호선 | 시청(1)역 | 151105 | 25 | 커피 | 2012-06-28 00:00:00 | 2017-08-07 00:00:00 | 6164224 | <NA> | 2017-10-11

Index | 상가현황(2017.10월) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11
1932 | 1932 | 네트워크(브랜드) | 8호선 | 남한산성입구역 | 822-2005 | 17 | 음료.제과 | 2014-05-26 00:00:00 | 2019-06-25 00:00:00 | 2031500 | <NA> | 2017-10-11
1933 | 1933 | 공실 | 8호선 | 단대오거리역 | 823-1001 | 42.5 | <NA> | NaN | NaN | 3201000 | 공실 | 2017-10-11
1934 | 1934 | 네트워크(브랜드) | 8호선 | 단대오거리역 | 823-1002 | 36.78 | 화장품 | 2014-01-24 00:00:00 | 2019-03-25 00:00:00 | 15685314.383333 | <NA> | 2017-10-11
1935 | 1935 | 네트워크(브랜드) | 8호선 | 단대오거리역 | 823-2001 | 32.5 | 편의점 | 2016-07-25 00:00:00 | 2021-11.17 | 8712991 | <NA> | 2017-10-11
1936 | 1936 | 네트워크(브랜드) | 8호선 | 단대오거리역 | 823-2002 | 28.97 | 음료.제과 | 2014-10-06 00:00:00 | 2019-11-04 00:00:00 | 5225054.666667 | <NA> | 2017-10-11
1937 | 1937 | 네트워크(브랜드) | 8호선 | 단대오거리역 | 823-2003 | 54.03 | 음료.제과 | 2015-05-21 00:00:00 | 2020-06-20 00:00:00 | 9418666.666667 | <NA> | 2017-10-11
1938 | 1938 | 네트워크(브랜드) | 8호선 | 단대오거리역 | 823-2004 | 75.09 | 액세서리 | 2016-05-23 00:00:00 | 2021-06-22 00:00:00 | 4284455 | <NA> | 2017-10-11
1939 | 1939 | 네트워크(브랜드) | 8호선 | 신흥역 | 824-1001 | 40 | 편의점 | 2016-07-25 00:00:00 | 2021-11.17 | 6124682 | <NA> | 2017-10-11
1940 | 1940 | 네트워크(브랜드) | 8호선 | 수진역 | 825-1001 | 40 | 편의점 | 2016-07-25 00:00:00 | 2021-11.17 | 5575875 | <NA> | 2017-10-11
1941 | 1941 | 네트워크(브랜드) | 8호선 | 모란역 | 826-1001 | 50 | 편의점 | 2016-07-25 00:00:00 | 2021-11.17 | 5831070 | <NA> | 2017-10-11
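
Row 0 of the sample holds what look like the real column names (NO, 상가유형, 호선, and so on), while the profiled header is the sheet title followed by `Unnamed: N` placeholders. That would also explain why the header text shows up as a data value in every variable. A minimal sketch of promoting that row to the header is given below, using a few columns copied from the sample. The `GLOSSES` mapping contains English translations added for readability and is not part of the dataset.

```python
# First rows as the report sees them; only four of the twelve columns
# are reproduced to keep the sketch short.
raw_rows = [
    ["NO", "상가유형", "호선", "역명"],
    [1, "공실", "1호선", "서울(1)역"],
    [2, "네트워크(브랜드)", "1호선", "서울(1)역"],
]

# English glosses for the header cells in row 0 (translations, not source data).
GLOSSES = {
    "NO": "row number",
    "상가유형": "store type",
    "호선": "line",
    "역명": "station name",
    "상가번호": "store number",
    "면적": "area",
    "업종": "business type",
    "계약시작": "contract start",
    "계약종료": "contract end",
    "월 임대료(원)": "monthly rent (KRW)",
    "비고": "remarks",
}

# Promote the first row to column names and build one dict per data row.
header, *data = raw_rows
records = [dict(zip(header, row)) for row in data]

for record in records:
    print({GLOSSES[name]: value for name, value in record.items()})
```

When loading the file with pandas, passing `header=1` to `pandas.read_excel` would typically have the same effect, assuming the title row is the only row above the real header.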

Duplicate rows

Most frequently occurring

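Index | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 6 | Unnamed: 10 | Unnamed: 11 | # duplicates
272 | 복합 | 5호선 | 오목교역 | <NA> | <NA> | 2017-10-11 | 67
296 | 입찰공고중 | 7호선 | 반포역 | <NA> | 입찰공고중 | 2017-10-11 | 40
273 | 복합 | 5호선 | 천호역 | 천호 복합상가 | <NA> | 2017-10-11 | 26
228 | 공실 | 7호선 | 청담역 | <NA> | 공실 | 2017-10-11 | 22
205 | 공실 | 5호선 | 오목교역 | <NA> | 공실 | 2017-10-11 | 15
278 | 복합 | 7호선 | 고속터미널역 | 고속터미널 복합상가 | <NA> | 2017-10-11 | 12
279 | 복합 | 7호선 | 노원역 | 테라피휴 복합상가 | <NA> | 2017-10-11 | 12
270 | 복합 | 5호선 | 공덕역 | 공덕,합정,영등포구청 스트리트몰 | <NA> | 2017-10-11 | 10
285 | 복합 | 8호선 | 잠실역 | 잠실스트리트몰 | <NA> | 2017-10-11 | 9
21 | GS | 6호선 | 석계역 | 공실 | <NA> | 2017-10-11 | 8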
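
The duplicate table lists only six columns, all of them ones the report could type. This suggests that duplicates were judged on those columns alone, in which case stores that differ only in store number, area, contract dates, or rent would still be counted as duplicates. The sketch below illustrates that effect with hypothetical records; the field names and `store_no` values are made up for the example.

```python
from collections import Counter

# Hypothetical records; "store_no" stands in for a column left out of the comparison.
records = [
    {"type": "복합", "line": "5호선", "station": "오목교역", "store_no": "A-1"},
    {"type": "복합", "line": "5호선", "station": "오목교역", "store_no": "A-2"},
    {"type": "복합", "line": "5호선", "station": "오목교역", "store_no": "A-3"},
    {"type": "공실", "line": "7호선", "station": "청담역", "store_no": "B-1"},
]


def duplicate_groups(rows, keys):
    """Count rows per key combination and keep combinations seen more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n > 1}


subset = duplicate_groups(records, ["type", "line", "station"])
everything = duplicate_groups(records, ["type", "line", "station", "store_no"])

print(subset)      # one group covering the three 오목교역 rows
print(everything)  # empty: store_no makes every row distinct
```

If that reading is right, the 297 reported duplicates are a by-product of the unsupported columns and may shrink or vanish once those columns are cleaned and included.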