Overview

Dataset statistics

Number of variables: 5
Number of observations: 2295
Missing cells: 49
Missing cells (%): 0.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 89.8 KiB
Average record size in memory: 40.1 B
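
The percentage and per-record figures follow from simple arithmetic on the reported counts. The sketch below is a minimal check, assuming the missing-cell percentage is taken over all cells (variables × observations) and that the size figure was rounded before display, so the last digit of a recomputed value could differ slightly.

```python
# Figures copied from the Dataset statistics table.
n_variables = 5
n_observations = 2295
missing_cells = 49
total_size_kib = 89.8  # already rounded in the report

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the recomputed values agree with the 0.4% and 40.1 B shown in the table.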

Variable types

Categorical: 2
Text: 3

Dataset

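Description: Store status by district and location for the 25 underground shopping centers managed by the Seoul Facilities Corporation (서울시설공단), together with the business type each store currently operates.
URL: https://www.data.go.kr/data/15003426/fileData.do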

Alerts

권역 is highly overall correlated with 상가명 (High correlation)
상가명 is highly overall correlated with 권역 (High correlation)
업종 has 47 (2.0%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 22:48:50.149757
Analysis finished: 2023-12-12 22:48:51.282597
Duration: 1.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

권역
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.1 KiB
Top values: 터미널 (627), 명동 (566), 강남 (359), 을지로 (306), 영등포 (233)

Length

Max length: 3
Median length: 3
Mean length: 2.508061
Min length: 2
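
The length statistics for 권역 can be recomputed from its value counts. The sketch below is a minimal check, assuming length means the number of Unicode code points in each value (what Python's `len` returns for these strings); the counts are the six values and frequencies this report lists for 권역.

```python
from statistics import median

# Value counts for 권역, as listed in this report's Common Values table.
counts = {"터미널": 627, "명동": 566, "강남": 359, "을지로": 306, "영등포": 233, "종로": 204}

# Expand to one length per observation; len() counts Unicode code points.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

mean_length = sum(lengths) / len(lengths)
print(max(lengths), median(lengths), round(mean_length, 6), min(lengths))
```

With these counts the result matches the reported max 3, median 3, mean 2.508061 and min 2.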

Unique

Unique: 0
Unique (%): 0.0%
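
Distinct counts the different values in a column, while Unique appears to count the values that occur exactly once, which would explain why a six-category column with 2295 rows reports Unique 0. The sketch below illustrates that reading on a small made-up list; the helper name `distinct_and_unique` is hypothetical and is not part of ydata-profiling.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): different values, and values seen exactly once."""
    counts = Counter(v for v in values if v is not None)  # ignore missing entries
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Made-up sample: 강남 repeats, 명동 and 종로 each appear once, None is missing.
sample = ["강남", "강남", "명동", "종로", None]
print(distinct_and_unique(sample))
```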

Sample

1st row: 강남
2nd row: 강남
3rd row: 강남
4th row: 강남
5th row: 강남

Common Values

Value | Count | Frequency (%)
터미널 | 627 | 27.3%
명동 | 566 | 24.7%
강남 | 359 | 15.6%
을지로 | 306 | 13.3%
영등포 | 233 | 10.2%
종로 | 204 | 8.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the value counts listed under Common Values]

상가명
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 18.1 KiB
Top values: 터미널 (627), 강남역 (221), 회현 (221), 을지로 (208), 소공 (139), Other values (13): 879

Length

Max length: 6
Median length: 3
Mean length: 3.0631808
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강남역
2nd row: 강남역
3rd row: 강남역
4th row: 강남역
5th row: 강남역

Common Values

Value | Count | Frequency (%)
터미널 | 627 | 27.3%
강남역 | 221 | 9.6%
회현 | 221 | 9.6%
을지로 | 208 | 9.1%
소공 | 139 | 6.1%
잠실역 | 138 | 6.0%
영등포로터리 | 91 | 4.0%
남대문 | 79 | 3.4%
종각 | 77 | 3.4%
영등포역 | 74 | 3.2%
Other values (8) | 420 | 18.3%

Length

[Figure: histogram of lengths of the category]

점포명
Text

Distinct: 2025
Distinct (%): 88.3%
Missing: 2
Missing (%): 0.1%
Memory size: 18.1 KiB

Length

Max length: 23
Median length: 19
Mean length: 4.8159616
Min length: 1

Characters and Unicode

Total characters: 11043
Distinct characters: 750
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
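
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to count the general categories in one 점포명 value from this report's sample rows. The helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in the tables: `Lo` is Other Letter, `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Zs` is Space Separator, and `Ps`/`Pe` are Open/Close Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# A 점포명 value taken from the sample rows of this report.
counts = category_counts("GS25강남메트로")
print(counts)
```

Summing such counters over every value in a column would give totals of the kind shown under "Most occurring categories"; whether the profiler computes them exactly this way is an assumption.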

Unique

Unique: 1834
Unique (%): 80.0%

Sample

1st row: 바미라운지
2nd row: 바미라운지
3rd row: 월드전자랜드강남역점
4th row: 월드전자랜드강남역점
5th row: 메이디
Value | Count | Frequency (%)
공실 | 20 | 0.8%
강남역점 | 7 | 0.3%
토니모리 | 6 | 0.2%
caseflex | 6 | 0.2%
입점예정 | 5 | 0.2%
(text lost) | 5 | 0.2%
블링박스 | 4 | 0.2%
아리따움 | 4 | 0.2%
갤러리 | 4 | 0.2%
소호 | 4 | 0.2%
Other values (2124) | 2447 | 97.4%

Most occurring characters

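Value | Count | Frequency (%)
(space) | 1000 | 9.1%
(glyph lost) | 324 | 2.9%
(glyph lost) | 247 | 2.2%
(glyph lost) | 219 | 2.0%
) | 143 | 1.3%
( | 143 | 1.3%
(glyph lost) | 139 | 1.3%
(glyph lost) | 139 | 1.3%
(glyph lost) | 132 | 1.2%
(glyph lost) | 105 | 1.0%
Other values (740) | 8452 | 76.5%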

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8595 | 77.8%
Space Separator | 1000 | 9.1%
Uppercase Letter | 547 | 5.0%
Lowercase Letter | 396 | 3.6%
Close Punctuation | 143 | 1.3%
Open Punctuation | 143 | 1.3%
Decimal Number | 143 | 1.3%
Other Punctuation | 50 | 0.5%
Dash Punctuation | 12 | 0.1%
Other Symbol | 9 | 0.1%
Other values (3) | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 324 | 3.8%
(glyph lost) | 247 | 2.9%
(glyph lost) | 219 | 2.5%
(glyph lost) | 139 | 1.6%
(glyph lost) | 139 | 1.6%
(glyph lost) | 132 | 1.5%
(glyph lost) | 105 | 1.2%
(glyph lost) | 104 | 1.2%
(glyph lost) | 102 | 1.2%
(glyph lost) | 99 | 1.2%
Other values (664) | 6985 | 81.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 57 | 10.4%
O | 51 | 9.3%
T | 43 | 7.9%
E | 40 | 7.3%
L | 33 | 6.0%
S | 30 | 5.5%
M | 29 | 5.3%
C | 25 | 4.6%
I | 22 | 4.0%
R | 21 | 3.8%
Other values (16) | 196 | 35.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 58 | 14.6%
a | 37 | 9.3%
o | 36 | 9.1%
l | 29 | 7.3%
i | 28 | 7.1%
s | 24 | 6.1%
n | 22 | 5.6%
r | 19 | 4.8%
u | 17 | 4.3%
c | 17 | 4.3%
Other values (14) | 109 | 27.5%

Decimal Number
Value | Count | Frequency (%)
2 | 38 | 26.6%
1 | 33 | 23.1%
0 | 18 | 12.6%
4 | 12 | 8.4%
3 | 10 | 7.0%
6 | 7 | 4.9%
9 | 7 | 4.9%
8 | 6 | 4.2%
7 | 6 | 4.2%
5 | 6 | 4.2%

Other Punctuation
Value | Count | Frequency (%)
. | 24 | 48.0%
, | 11 | 22.0%
' | 5 | 10.0%
& | 5 | 10.0%
# | 2 | 4.0%
? | 1 | 2.0%
/ | 1 | 2.0%
· | 1 | 2.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1000 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 143 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 143 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 12 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(glyph lost) | 9 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 3 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Letter Number
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8603 | 77.9%
Common | 1495 | 13.5%
Latin | 944 | 8.5%
Han | 1 | < 0.1%

Most frequent character per script

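Hangul
Value | Count | Frequency (%)
(glyph lost) | 324 | 3.8%
(glyph lost) | 247 | 2.9%
(glyph lost) | 219 | 2.5%
(glyph lost) | 139 | 1.6%
(glyph lost) | 139 | 1.6%
(glyph lost) | 132 | 1.5%
(glyph lost) | 105 | 1.2%
(glyph lost) | 104 | 1.2%
(glyph lost) | 102 | 1.2%
(glyph lost) | 99 | 1.2%
Other values (664) | 6993 | 81.3%

Latin
Value | Count | Frequency (%)
e | 58 | 6.1%
A | 57 | 6.0%
O | 51 | 5.4%
T | 43 | 4.6%
E | 40 | 4.2%
a | 37 | 3.9%
o | 36 | 3.8%
L | 33 | 3.5%
S | 30 | 3.2%
M | 29 | 3.1%
Other values (41) | 530 | 56.1%

Common
Value | Count | Frequency (%)
(space) | 1000 | 66.9%
) | 143 | 9.6%
( | 143 | 9.6%
2 | 38 | 2.5%
1 | 33 | 2.2%
. | 24 | 1.6%
0 | 18 | 1.2%
- | 12 | 0.8%
4 | 12 | 0.8%
, | 11 | 0.7%
Other values (14) | 61 | 4.1%

Han
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%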

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8594 | 77.8%
ASCII | 2437 | 22.1%
None | 10 | 0.1%
CJK | 1 | < 0.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

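ASCII
Value | Count | Frequency (%)
(space) | 1000 | 41.0%
) | 143 | 5.9%
( | 143 | 5.9%
e | 58 | 2.4%
A | 57 | 2.3%
O | 51 | 2.1%
T | 43 | 1.8%
E | 40 | 1.6%
2 | 38 | 1.6%
a | 37 | 1.5%
Other values (63) | 827 | 33.9%

Hangul
Value | Count | Frequency (%)
(glyph lost) | 324 | 3.8%
(glyph lost) | 247 | 2.9%
(glyph lost) | 219 | 2.5%
(glyph lost) | 139 | 1.6%
(glyph lost) | 139 | 1.6%
(glyph lost) | 132 | 1.5%
(glyph lost) | 105 | 1.2%
(glyph lost) | 104 | 1.2%
(glyph lost) | 102 | 1.2%
(glyph lost) | 99 | 1.2%
Other values (663) | 6984 | 81.3%

None
Value | Count | Frequency (%)
(glyph lost) | 9 | 90.0%
· | 1 | 10.0%

CJK
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%

Number Forms
Value | Count | Frequency (%)
(glyph lost) | 1 | 100.0%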

호수
Text

Distinct: 1976
Distinct (%): 86.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.1 KiB

Length

Max length: 18
Median length: 16
Mean length: 4.4823529
Min length: 1

Characters and Unicode

Total characters: 10287
Distinct characters: 54
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1804
Unique (%): 78.6%

Sample

1st row: A-1호
2nd row: A-2호
3rd row: A-3호
4th row: A-4호
5th row: A-5호
Value | Count | Frequency (%)
18 | 7 | 0.3%
17 | 7 | 0.3%
4 | 7 | 0.3%
1 | 6 | 0.3%
16 | 6 | 0.3%
5 | 6 | 0.3%
12 | 6 | 0.3%
15 | 5 | 0.2%
45 | 5 | 0.2%
23 | 5 | 0.2%
Other values (1921) | 2254 | 97.4%

Most occurring characters

Value | Count | Frequency (%)
- | 1750 | 17.0%
1 | 1222 | 11.9%
0 | 886 | 8.6%
2 | 777 | 7.6%
3 | 641 | 6.2%
4 | 516 | 5.0%
(space) | 462 | 4.5%
5 | 388 | 3.8%
6 | 388 | 3.8%
7 | 357 | 3.5%
Other values (44) | 2900 | 28.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 5824 | 56.6%
Dash Punctuation | 1750 | 17.0%
Uppercase Letter | 1148 | 11.2%
Other Letter | 738 | 7.2%
Space Separator | 462 | 4.5%
Other Punctuation | 353 | 3.4%
Lowercase Letter | 10 | 0.1%
Math Symbol | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 280 | 37.9%
(glyph lost) | 60 | 8.1%
(glyph lost) | 56 | 7.6%
(glyph lost) | 53 | 7.2%
(glyph lost) | 52 | 7.0%
(glyph lost) | 46 | 6.2%
(glyph lost) | 43 | 5.8%
(glyph lost) | 26 | 3.5%
(glyph lost) | 25 | 3.4%
(glyph lost) | 20 | 2.7%
Other values (11) | 77 | 10.4%

Uppercase Letter
Value | Count | Frequency (%)
B | 243 | 21.2%
D | 226 | 19.7%
C | 203 | 17.7%
A | 201 | 17.5%
E | 115 | 10.0%
F | 58 | 5.1%
T | 46 | 4.0%
G | 36 | 3.1%
S | 9 | 0.8%
I | 5 | 0.4%
Other values (2) | 6 | 0.5%

Decimal Number
Value | Count | Frequency (%)
1 | 1222 | 21.0%
0 | 886 | 15.2%
2 | 777 | 13.3%
3 | 641 | 11.0%
4 | 516 | 8.9%
5 | 388 | 6.7%
6 | 388 | 6.7%
7 | 357 | 6.1%
8 | 334 | 5.7%
9 | 315 | 5.4%

Other Punctuation
Value | Count | Frequency (%)
, | 271 | 76.8%
· | 61 | 17.3%
. | 19 | 5.4%
/ | 2 | 0.6%

Lowercase Letter
Value | Count | Frequency (%)
n | 3 | 30.0%
a | 3 | 30.0%
b | 2 | 20.0%
e | 2 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1750 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 462 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 8391 | 81.6%
Latin | 1158 | 11.3%
Hangul | 738 | 7.2%

Most frequent character per script

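Hangul
Value | Count | Frequency (%)
(glyph lost) | 280 | 37.9%
(glyph lost) | 60 | 8.1%
(glyph lost) | 56 | 7.6%
(glyph lost) | 53 | 7.2%
(glyph lost) | 52 | 7.0%
(glyph lost) | 46 | 6.2%
(glyph lost) | 43 | 5.8%
(glyph lost) | 26 | 3.5%
(glyph lost) | 25 | 3.4%
(glyph lost) | 20 | 2.7%
Other values (11) | 77 | 10.4%

Common
Value | Count | Frequency (%)
- | 1750 | 20.9%
1 | 1222 | 14.6%
0 | 886 | 10.6%
2 | 777 | 9.3%
3 | 641 | 7.6%
4 | 516 | 6.1%
(space) | 462 | 5.5%
5 | 388 | 4.6%
6 | 388 | 4.6%
7 | 357 | 4.3%
Other values (7) | 1004 | 12.0%

Latin
Value | Count | Frequency (%)
B | 243 | 21.0%
D | 226 | 19.5%
C | 203 | 17.5%
A | 201 | 17.4%
E | 115 | 9.9%
F | 58 | 5.0%
T | 46 | 4.0%
G | 36 | 3.1%
S | 9 | 0.8%
I | 5 | 0.4%
Other values (6) | 16 | 1.4%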

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9488 | 92.2%
Hangul | 713 | 6.9%
None | 61 | 0.6%
Compat Jamo | 25 | 0.2%

Most frequent character per block

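ASCII
Value | Count | Frequency (%)
- | 1750 | 18.4%
1 | 1222 | 12.9%
0 | 886 | 9.3%
2 | 777 | 8.2%
3 | 641 | 6.8%
4 | 516 | 5.4%
(space) | 462 | 4.9%
5 | 388 | 4.1%
6 | 388 | 4.1%
7 | 357 | 3.8%
Other values (22) | 2101 | 22.1%

Hangul
Value | Count | Frequency (%)
(glyph lost) | 280 | 39.3%
(glyph lost) | 60 | 8.4%
(glyph lost) | 56 | 7.9%
(glyph lost) | 53 | 7.4%
(glyph lost) | 52 | 7.3%
(glyph lost) | 46 | 6.5%
(glyph lost) | 43 | 6.0%
(glyph lost) | 26 | 3.6%
(glyph lost) | 20 | 2.8%
(glyph lost) | 16 | 2.2%
Other values (10) | 61 | 8.6%

None
Value | Count | Frequency (%)
· | 61 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
(glyph lost) | 25 | 100.0%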

업종
Text

MISSING 

Distinct: 636
Distinct (%): 28.3%
Missing: 47
Missing (%): 2.0%
Memory size: 18.1 KiB

Length

Max length: 31
Median length: 30
Mean length: 4.2864769
Min length: 1

Characters and Unicode

Total characters: 9636
Distinct characters: 313
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 445
Unique (%): 19.8%

Sample

1st row: 의류,화장품,핸드폰악세사리및잡화류
2nd row: 의류,화장품,핸드폰악세사리및잡화류
3rd row: 이동통신
4th row: 이동통신
5th row: 의류잡화속옷
Value | Count | Frequency (%)
의류 | 502 | 21.1%
잡화 | 108 | 4.5%
여성의류 | 92 | 3.9%
의류,잡화 | 84 | 3.5%
식음료 | 74 | 3.1%
액세서리 | 65 | 2.7%
신발 | 62 | 2.6%
화장품 | 48 | 2.0%
스포츠의류 | 39 | 1.6%
가방 | 36 | 1.5%
Other values (564) | 1267 | 53.3%

Most occurring characters

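Value | Count | Frequency (%)
(glyph lost) | 980 | 10.2%
(glyph lost) | 928 | 9.6%
(space) | 698 | 7.2%
(glyph lost) | 578 | 6.0%
, | 567 | 5.9%
(glyph lost) | 427 | 4.4%
(glyph lost) | 243 | 2.5%
(glyph lost) | 241 | 2.5%
(glyph lost) | 189 | 2.0%
(glyph lost) | 146 | 1.5%
Other values (303) | 4639 | 48.1%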

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8291 | 86.0%
Space Separator | 698 | 7.2%
Other Punctuation | 602 | 6.2%
Close Punctuation | 20 | 0.2%
Open Punctuation | 19 | 0.2%
Uppercase Letter | 6 | 0.1%

Most frequent character per category

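Other Letter
Value | Count | Frequency (%)
(glyph lost) | 980 | 11.8%
(glyph lost) | 928 | 11.2%
(glyph lost) | 578 | 7.0%
(glyph lost) | 427 | 5.2%
(glyph lost) | 243 | 2.9%
(glyph lost) | 241 | 2.9%
(glyph lost) | 189 | 2.3%
(glyph lost) | 146 | 1.8%
(glyph lost) | 145 | 1.7%
(glyph lost) | 136 | 1.6%
Other values (293) | 4278 | 51.6%

Uppercase Letter
Value | Count | Frequency (%)
D | 2 | 33.3%
C | 2 | 33.3%
P | 1 | 16.7%
L | 1 | 16.7%

Other Punctuation
Value | Count | Frequency (%)
, | 567 | 94.2%
/ | 19 | 3.2%
. | 16 | 2.7%

Space Separator
Value | Count | Frequency (%)
(space) | 698 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 20 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 19 | 100.0%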

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8291 | 86.0%
Common | 1339 | 13.9%
Latin | 6 | 0.1%

Most frequent character per script

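Hangul
Value | Count | Frequency (%)
(glyph lost) | 980 | 11.8%
(glyph lost) | 928 | 11.2%
(glyph lost) | 578 | 7.0%
(glyph lost) | 427 | 5.2%
(glyph lost) | 243 | 2.9%
(glyph lost) | 241 | 2.9%
(glyph lost) | 189 | 2.3%
(glyph lost) | 146 | 1.8%
(glyph lost) | 145 | 1.7%
(glyph lost) | 136 | 1.6%
Other values (293) | 4278 | 51.6%

Common
Value | Count | Frequency (%)
(space) | 698 | 52.1%
, | 567 | 42.3%
) | 20 | 1.5%
( | 19 | 1.4%
/ | 19 | 1.4%
. | 16 | 1.2%

Latin
Value | Count | Frequency (%)
D | 2 | 33.3%
C | 2 | 33.3%
P | 1 | 16.7%
L | 1 | 16.7%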

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8291 | 86.0%
ASCII | 1345 | 14.0%

Most frequent character per block

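Hangul
Value | Count | Frequency (%)
(glyph lost) | 980 | 11.8%
(glyph lost) | 928 | 11.2%
(glyph lost) | 578 | 7.0%
(glyph lost) | 427 | 5.2%
(glyph lost) | 243 | 2.9%
(glyph lost) | 241 | 2.9%
(glyph lost) | 189 | 2.3%
(glyph lost) | 146 | 1.8%
(glyph lost) | 145 | 1.7%
(glyph lost) | 136 | 1.6%
Other values (293) | 4278 | 51.6%

ASCII
Value | Count | Frequency (%)
(space) | 698 | 51.9%
, | 567 | 42.2%
) | 20 | 1.5%
( | 19 | 1.4%
/ | 19 | 1.4%
. | 16 | 1.2%
D | 2 | 0.1%
C | 2 | 0.1%
P | 1 | 0.1%
L | 1 | 0.1%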

Correlations

Variable | 권역 | 상가명
권역 | 1.000 | 1.000
상가명 | 1.000 | 1.000

Variable | 권역 | 상가명
권역 | 1.000 | 0.997
상가명 | 0.997 | 1.000

Variable | 권역 | 상가명
권역 | 1.000 | 0.997
상가명 | 0.997 | 1.000
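
The report does not name the coefficient behind these matrices. For two categorical columns a common choice is Cramér's V, which is derived from the chi-squared statistic of the contingency table. The sketch below is a plain, uncorrected Cramér's V on made-up labels; the function name `cramers_v` is hypothetical, and values produced this way are not guaranteed to match the figures in the matrices.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long lists of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) pair
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))

# Made-up example: every mall belongs to exactly one district.
malls = ["m1", "m2", "m3", "m4"]
districts = ["d1", "d1", "d2", "d2"]
print(cramers_v(malls, districts))
```

When one column fully determines the other, as each 상가명 plausibly determines its 권역, the coefficient approaches 1, which is consistent with the high-correlation alert for this pair.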

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
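
Nullity correlation can be read as the Pearson correlation between per-column missingness indicators. The sketch below is a minimal version of that idea, assuming missing entries are represented as `None`; the function name `nullity_correlation` and the two small columns are made up for illustration.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is never (or always) missing
    return cov / sqrt(var_a * var_b)

# Made-up columns whose gaps fall on the same rows.
shop = [None, "x", None, "y"]
trade = [None, "p", None, "q"]
print(nullity_correlation(shop, trade))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and columns with no missing values have no defined nullity correlation.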

Sample

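Index | 권역 | 상가명 | 점포명 | 호수 | 업종
0 | 강남 | 강남역 | 바미라운지 | A-1호 | 의류,화장품,핸드폰악세사리및잡화류
1 | 강남 | 강남역 | 바미라운지 | A-2호 | 의류,화장품,핸드폰악세사리및잡화류
2 | 강남 | 강남역 | 월드전자랜드강남역점 | A-3호 | 이동통신
3 | 강남 | 강남역 | 월드전자랜드강남역점 | A-4호 | 이동통신
4 | 강남 | 강남역 | 메이디 | A-5호 | 의류잡화속옷
5 | 강남 | 강남역 | 슬림핏 | A-6호 | 기타섬유,직물및의복액세서리소매업
6 | 강남 | 강남역 | 보니또 | A-7호 | 의류/잡화
7 | 강남 | 강남역 | 더블리스 | A-8·9호 | 의류판매
8 | 강남 | 강남역 | GS25강남메트로 | A-10호 | 편의점
9 | 강남 | 강남역 | GS25강남메트로 | A-11호 | 편의점

Index | 권역 | 상가명 | 점포명 | 호수 | 업종
2285 | 종로 | 동대문2차 | 한국 | 15--2 | 카페트,전기요
2286 | 종로 | 동대문2차 | 혼수백화점 | 16 | 이불
2287 | 종로 | 동대문2차 | 이불나라 | 17 | 이불
2288 | 종로 | 동대문2차 | 이불나라 | 18 | 이불
2289 | 종로 | 동대문2차 | 함지박주단 | 19,20,25 | 한복
2290 | 종로 | 동대문2차 | 영화침구 | 21,22 | 이불
2291 | 종로 | 동대문2차 | 고운한복 | 23 | 한복
2292 | 종로 | 동대문2차 | 스마일커텐 | 24 | 커튼,수예
2293 | 종로 | 동대문2차 | 목화침구 | 26 | 이불
2294 | 종로 | 동대문2차 | 이다유통 | 27 | 양말,잡화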