Overview

Dataset statistics

Number of variables: 8
Number of observations: 73
Missing cells: 103
Missing cells (%): 17.6%
Duplicate rows: 1
Duplicate rows (%): 1.4%
Total size in memory: 4.8 KiB
Average record size in memory: 67.8 B
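The two percentages follow from the raw counts:

- 103 missing cells out of 8 × 73 = 584 cells is 17.6%.
- 1 duplicate row out of 73 observations is 1.4%.

The sketch below reproduces that arithmetic. It assumes both percentages use the whole table as the denominator. `overview_percentages` is an illustrative name, not a ydata-profiling function.

```python
from typing import Tuple


def overview_percentages(n_vars: int, n_obs: int, n_missing: int,
                         n_duplicates: int) -> Tuple[float, float]:
    """Return (missing-cell %, duplicate-row %) rounded to one decimal.

    Assumption: missing cells are divided by all n_vars * n_obs cells,
    duplicate rows by all observations.
    """
    total_cells = n_vars * n_obs
    missing_pct = 100.0 * n_missing / total_cells
    duplicate_pct = 100.0 * n_duplicates / n_obs
    return round(missing_pct, 1), round(duplicate_pct, 1)


if __name__ == "__main__":
    # Figures copied from the overview table above.
    print(overview_percentages(n_vars=8, n_obs=73, n_missing=103, n_duplicates=1))
```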

Variable types

Categorical: 4
Text: 2
Unsupported: 1
DateTime: 1

Dataset

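Description: Data on the status of sidewalk business facilities (보도상영업시설물) in Gwangjin-gu, Seoul. It provides the facility type (가로판매대, street vending stand; 구두수선대, shoe-repair stand), management number, facility address, occupied area, goods handled, and similar fields.
Author: 서울특별시 광진구 (Gwangjin-gu, Seoul)
URL: https://www.data.go.kr/data/15064346/fileData.do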

Alerts

자료기준일 has a constant value (2024-02-13) [Constant]
Dataset has 1 (1.4%) duplicate row [Duplicates]
유형 is highly overall correlated with 면적(제곱미터) and 1 other field [High correlation]
면적(제곱미터) is highly overall correlated with 유형 [High correlation]
취급품목 is highly overall correlated with 유형 [High correlation]
관리번호 has 10 (13.7%) missing values [Missing]
주소 has 10 (13.7%) missing values [Missing]
비고 has 73 (100.0%) missing values [Missing]
자료기준일 has 10 (13.7%) missing values [Missing]
비고 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-03-14 22:46:49.953308
Analysis finished: 2024-03-14 22:46:51.204370
Duration: 1.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

유형 (facility type)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 712.0 B
가로판매대(서울시형): 34
구두수선대: 29
<NA>: 10

Length

Max length: 11
Median length: 5
Mean length: 7.6575342
Min length: 4
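The mean length can be reproduced from the value counts: (34 × 11 + 29 × 5 + 10 × 4) / 73 = 559 / 73 ≈ 7.6575342. This indicates that the 10 `<NA>` entries are measured as 4-character strings. The sketch below uses only the standard library; `length_summary` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def length_summary(values: Sequence[str]) -> Dict[str, float]:
    """Character-length statistics over string values (placeholders included)."""
    lengths = [len(value) for value in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


if __name__ == "__main__":
    # Column rebuilt from the 유형 value counts (34 + 29 + 10 = 73 rows).
    column = ["가로판매대(서울시형)"] * 34 + ["구두수선대"] * 29 + ["<NA>"] * 10
    summary = length_summary(column)
    print(summary["max"], summary["median"], summary["min"])  # 11 5 4
    print(round(summary["mean"], 7))                          # 7.6575342
```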

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가로판매대(서울시형)
2nd row: 가로판매대(서울시형)
3rd row: 가로판매대(서울시형)
4th row: 가로판매대(서울시형)
5th row: 가로판매대(서울시형)

Common Values

Value | Count | Frequency (%)
가로판매대(서울시형) | 34 | 46.6%
구두수선대 | 29 | 39.7%
<NA> | 10 | 13.7%
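유형 reports Missing 0 even though `<NA>` appears 10 times. The overview total of 103 missing cells equals 10 each for 관리번호, 주소 and 자료기준일 plus 73 for 비고. Together, these suggest the placeholder was read as the literal string "<NA>" in the categorical columns.

The sketch below normalizes such placeholders before counting. It assumes the column arrives as a list of strings. The placeholder set and helper names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical set of placeholder strings to treat as missing.
PLACEHOLDERS = {"<NA>", "NA", "nan", ""}


def normalize_missing(values: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Replace placeholder strings (ignoring surrounding whitespace) with None."""
    cleaned: List[Optional[str]] = []
    for value in values:
        if value is None or value.strip() in PLACEHOLDERS:
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned


def count_missing(values: Sequence[Optional[str]]) -> int:
    """Count real missing entries (None), not placeholder strings."""
    return sum(1 for value in values if value is None)


if __name__ == "__main__":
    # Rebuilt from the Common Values counts for 유형 above.
    column = ["가로판매대(서울시형)"] * 34 + ["구두수선대"] * 29 + ["<NA>"] * 10
    print(count_missing(column))                     # literal strings: 0
    print(count_missing(normalize_missing(column)))  # after normalizing: 10
```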

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
가로판매대(서울시형 | 34 | 46.6%
구두수선대 | 29 | 39.7%
na | 10 | 13.7%

관리번호 (management number)
Text

MISSING 

Distinct: 63
Distinct (%): 100.0%
Missing: 10
Missing (%): 13.7%
Memory size: 712.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 504
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
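Every 관리번호 value is five Hangul syllables, a hyphen and two digits; all lengths are 8. The 63 values therefore contain 504 characters in total:

- 63 × 5 = 315 Other Letter characters
- 63 × 2 = 126 Decimal Number characters
- 63 Dash Punctuation characters

These categories are the Unicode general category of each code point, which Python exposes as `unicodedata.category`. The sketch below maps the short codes to long names by hand; the mapping covers only codes that appear in this report, and the helper name is illustrative.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general categories that occur in this report only.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for char in value:
            code = unicodedata.category(char)
            # Fall back to the short code for categories not in the table.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    sample = ["가로판매대-01", "구두수선대-15"]
    # Other Letter: 10, Decimal Number: 4, Dash Punctuation: 2
    print(category_counts(sample))
```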

Unique

Unique: 63
Unique (%): 100.0%

Sample

1st row: 가로판매대-01
2nd row: 가로판매대-02
3rd row: 가로판매대-03
4th row: 가로판매대-04
5th row: 가로판매대-05

Value | Count | Frequency (%)
가로판매대-02 | 1 | 1.6%
구두수선대-15 | 1 | 1.6%
구두수선대-01 | 1 | 1.6%
구두수선대-02 | 1 | 1.6%
구두수선대-03 | 1 | 1.6%
구두수선대-04 | 1 | 1.6%
구두수선대-05 | 1 | 1.6%
구두수선대-06 | 1 | 1.6%
구두수선대-07 | 1 | 1.6%
구두수선대-08 | 1 | 1.6%
Other values (53) | 53 | 84.1%

Most occurring characters

Value | Count | Frequency (%)
대 | 63 | 12.5%
- | 63 | 12.5%
가 | 34 | 6.7%
로 | 34 | 6.7%
판 | 34 | 6.7%
매 | 34 | 6.7%
구 | 29 | 5.8%
두 | 29 | 5.8%
수 | 29 | 5.8%
선 | 29 | 5.8%
Other values (10) | 126 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 315 | 62.5%
Decimal Number | 126 | 25.0%
Dash Punctuation | 63 | 12.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 27 | 21.4%
2 | 25 | 19.8%
0 | 23 | 18.3%
3 | 14 | 11.1%
4 | 7 | 5.6%
6 | 6 | 4.8%
5 | 6 | 4.8%
7 | 6 | 4.8%
8 | 6 | 4.8%
9 | 6 | 4.8%
Other Letter
Value | Count | Frequency (%)
대 | 63 | 20.0%
가 | 34 | 10.8%
로 | 34 | 10.8%
판 | 34 | 10.8%
매 | 34 | 10.8%
구 | 29 | 9.2%
두 | 29 | 9.2%
수 | 29 | 9.2%
선 | 29 | 9.2%
Dash Punctuation
Value | Count | Frequency (%)
- | 63 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 315 | 62.5%
Common | 189 | 37.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 63 | 33.3%
1 | 27 | 14.3%
2 | 25 | 13.2%
0 | 23 | 12.2%
3 | 14 | 7.4%
4 | 7 | 3.7%
6 | 6 | 3.2%
5 | 6 | 3.2%
7 | 6 | 3.2%
8 | 6 | 3.2%
Hangul
Same values as the Other Letter table above: every Hangul character in this column is in the Other Letter category.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 315 | 62.5%
ASCII | 189 | 37.5%

Most frequent character per block

Hangul
Same values as the Other Letter table above.
ASCII
Same values as the Common table under "Most frequent character per script" above.
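The script and block tables split the same 504 characters into Hangul (315) and Common/ASCII (189). ydata-profiling looks these properties up in full Unicode tables. The sketch below is a simplified stand-in that recognises only two ranges, which is enough for this column:

- ASCII: U+0000 to U+007F
- Hangul Syllables block: U+AC00 to U+D7AF

`block_of` and `block_counts` are illustrative names.

```python
from collections import Counter
from typing import Iterable


def block_of(char: str) -> str:
    """Rough Unicode block lookup covering only the ranges seen in this data."""
    code_point = ord(char)
    if code_point <= 0x7F:
        return "ASCII"
    if 0xAC00 <= code_point <= 0xD7AF:
        return "Hangul"  # Hangul Syllables block
    return "Other"       # anything this simplified lookup does not cover


def block_counts(values: Iterable[str]) -> Counter:
    """Count characters per (simplified) Unicode block."""
    return Counter(block_of(char) for value in values for char in value)


if __name__ == "__main__":
    print(block_counts(["가로판매대-01", "구두수선대-15"]))
```

A production version would use a full block table instead of hard-coded ranges.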

주소 (address)
Text

MISSING 

Distinct: 52
Distinct (%): 82.5%
Missing: 10
Missing (%): 13.7%
Memory size: 712.0 B

Length

Max length: 25
Median length: 24
Mean length: 22.396825
Min length: 17

Characters and Unicode

Total characters: 1411
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45
Unique (%): 71.4%

Sample

1st row: 서울특별시 광진구 자양로 95(자양동)
2nd row: 서울특별시 광진구 아차산로 377(구의동)
3rd row: 서울특별시 광진구 아차산로 219(화양동)
4th row: 서울특별시 광진구 아차산로 224(자양동)
5th row: 서울특별시 광진구 아차산로 244(자양동)

Value | Count | Frequency (%)
서울특별시 | 62 | 24.9%
광진구 | 62 | 24.9%
능동로 | 12 | 4.8%
강변역로 | 10 | 4.0%
아차산로 | 10 | 4.0%
천호대로 | 8 | 3.2%
자양로 | 5 | 2.0%
50(구의동 | 4 | 1.6%
53(구의동 | 4 | 1.6%
광나루로 | 4 | 1.6%
Other values (60) | 68 | 27.3%
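Tokens such as `50(구의동`, `가로판매대(서울시형` and `na` (from `<NA>`) suggest how the word tables are built:

1. Lower-case each value.
2. Split it on whitespace.
3. Strip punctuation from the ends of each token only.

The sketch below reproduces that behaviour under this assumption. It is not ydata-profiling's own code.

```python
import string
from collections import Counter
from typing import Iterable


def word_counts(values: Iterable[str]) -> Counter:
    """Count words: lower-case, split on whitespace, strip end punctuation."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            # strip() removes punctuation only at the ends, so "95(자양동)"
            # becomes "95(자양동" and "<na>" becomes "na".
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    addresses = [
        "서울특별시 광진구 자양로 95(자양동)",
        "서울특별시 광진구 아차산로 377(구의동)",
    ]
    for word, count in word_counts(addresses).most_common():
        print(word, count)
```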

Most occurring characters

Value | Count | Frequency (%)
(space) | 186 | 13.2%
 | 84 | 6.0%
 | 75 | 5.3%
 | 70 | 5.0%
 | 63 | 4.5%
 | 63 | 4.5%
 | 63 | 4.5%
 | 63 | 4.5%
 | 63 | 4.5%
 | 63 | 4.5%
Other values (43) | 618 | 43.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 918 | 65.1%
Space Separator | 186 | 13.2%
Decimal Number | 177 | 12.5%
Close Punctuation | 62 | 4.4%
Open Punctuation | 62 | 4.4%
Dash Punctuation | 6 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 84 | 9.2%
 | 75 | 8.2%
 | 70 | 7.6%
 | 63 | 6.9%
 | 63 | 6.9%
 | 63 | 6.9%
 | 63 | 6.9%
 | 63 | 6.9%
 | 63 | 6.9%
 | 63 | 6.9%
Other values (29) | 248 | 27.0%
Decimal Number
Value | Count | Frequency (%)
5 | 39 | 22.0%
2 | 25 | 14.1%
1 | 21 | 11.9%
3 | 19 | 10.7%
4 | 17 | 9.6%
7 | 16 | 9.0%
0 | 13 | 7.3%
6 | 11 | 6.2%
9 | 9 | 5.1%
8 | 7 | 4.0%
Space Separator
Value | Count | Frequency (%)
(space) | 186 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 62 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 62 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 918 | 65.1%
Common | 493 | 34.9%

Most frequent character per script

Hangul
Same values as the Other Letter table above: every Hangul character in this column is in the Other Letter category.
Common
Value | Count | Frequency (%)
(space) | 186 | 37.7%
) | 62 | 12.6%
( | 62 | 12.6%
5 | 39 | 7.9%
2 | 25 | 5.1%
1 | 21 | 4.3%
3 | 19 | 3.9%
4 | 17 | 3.4%
7 | 16 | 3.2%
0 | 13 | 2.6%
Other values (4) | 33 | 6.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 918 | 65.1%
ASCII | 493 | 34.9%

Most frequent character per block

ASCII
Same values as the Common table under "Most frequent character per script" above.
Hangul
Same values as the Other Letter table above.

성 명 (name, masked)
Categorical

Distinct: 23
Distinct (%): 31.5%
Missing: 0
Missing (%): 0.0%
Memory size: 712.0 B
김**: 14
<NA>: 10
박**: 9
정**: 7
이**: 5
Other values (18): 28

Length

Max length: 4
Median length: 3
Mean length: 3.1369863
Min length: 3

Unique

Unique: 10
Unique (%): 13.7%
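Distinct and Unique measure different things:

- Distinct counts every different value, including the literal `<NA>` string: 23, or 31.5% of 73 rows.
- Unique counts values that occur exactly once: 10, or 13.7%.

The two are easy to confuse, so here is a small sketch with an illustrative helper name.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def distinct_and_unique(values: Sequence[Hashable]) -> Tuple[int, int]:
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


if __name__ == "__main__":
    print(distinct_and_unique(["김**", "김**", "박**", "<NA>"]))  # (3, 2)
```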

Sample

1st row: 신**
2nd row: 박**
3rd row: 임**
4th row: 한**
5th row: 박**

Common Values

Value | Count | Frequency (%)
김** | 14 | 19.2%
<NA> | 10 | 13.7%
박** | 9 | 12.3%
정** | 7 | 9.6%
이** | 5 | 6.8%
전** | 3 | 4.1%
임** | 3 | 4.1%
유** | 2 | 2.7%
최** | 2 | 2.7%
송** | 2 | 2.7%
Other values (13) | 16 | 21.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
김 | 14 | 19.2%
na | 10 | 13.7%
박 | 9 | 12.3%
정 | 7 | 9.6%
이 | 5 | 6.8%
 | 3 | 4.1%
 | 3 | 4.1%
 | 2 | 2.7%
 | 2 | 2.7%
 | 2 | 2.7%
Other values (13) | 16 | 21.9%

면적(제곱미터) (area, m²)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Memory size: 712.0 B
3.92: 34
4.48: 27
<NA>: 10
3.5: 2

Length

Max length: 4
Median length: 4
Mean length: 3.9726027
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.92
2nd row: 3.92
3rd row: 3.92
4th row: 3.92
5th row: 3.92

Common Values

Value | Count | Frequency (%)
3.92 | 34 | 46.6%
4.48 | 27 | 37.0%
<NA> | 10 | 13.7%
3.5 | 2 | 2.7%
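면적(제곱미터) holds only four numeric strings, so it is profiled as Categorical. Converting it to floats makes arithmetic possible. From the counts above, the total occupied area of the 63 listed facilities is 34 × 3.92 + 27 × 4.48 + 2 × 3.5 = 261.24 m².

The sketch below assumes `<NA>` and empty strings are the only placeholders. The helper names are hypothetical.

```python
from typing import Dict, Optional


def parse_area(text: str) -> Optional[float]:
    """Parse an area string in square metres; return None for placeholders."""
    text = text.strip()
    if text in {"", "<NA>"}:
        return None
    return float(text)


def total_area(value_counts: Dict[str, int]) -> float:
    """Sum area over all facilities from (area string -> count) pairs."""
    total = 0.0
    for text, count in value_counts.items():
        area = parse_area(text)
        if area is not None:
            total += area * count
    return total


if __name__ == "__main__":
    counts = {"3.92": 34, "4.48": 27, "<NA>": 10, "3.5": 2}
    print(round(total_area(counts), 2))  # 261.24
```

The 34 rows at 3.92 m² match the 34 가로판매대(서울시형) rows, and every sample row of that type shows 3.92. This is consistent with the 0.992 association between 유형 and 면적(제곱미터) in the Correlations section.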

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
3.92 | 34 | 46.6%
4.48 | 27 | 37.0%
na | 10 | 13.7%
3.5 | 2 | 2.7%

취급품목 (goods handled)
Categorical

HIGH CORRELATION 

Distinct: 29
Distinct (%): 39.7%
Missing: 0
Missing (%): 0.0%
Memory size: 712.0 B
구두닦이 및 수선: 29
<NA>: 10
과일: 3
복권, 교통카드, 신문, 음료: 2
교통카드, 담배, 음료: 2
Other values (24): 27

Length

Max length: 16
Median length: 14
Mean length: 7.7671233
Min length: 1

Unique

Unique: 21
Unique (%): 28.8%

Sample

1st row: 잡화
2nd row: 기타
3rd row: 핫도그/제빵
4th row: 핫도그, 토스트
5th row: 양말, 장갑

Common Values

Value | Count | Frequency (%)
구두닦이 및 수선 | 29 | 39.7%
<NA> | 10 | 13.7%
과일 | 3 | 4.1%
복권, 교통카드, 신문, 음료 | 2 | 2.7%
교통카드, 담배, 음료 | 2 | 2.7%
김밥, 토스트 | 2 | 2.7%
잡화 | 2 | 2.7%
핫도그, 김밥, 샌드위치 | 2 | 2.7%
기타 | 1 | 1.4%
토스트, 김밥 | 1 | 1.4%
Other values (19) | 19 | 26.0%
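취급품목 stores several goods in one string. As a result, `김밥, 토스트` and `토스트, 김밥` are counted as different categories even though they list the same items. Splitting on separators before counting avoids that.

The sketch below assumes commas and slashes are the only separators, as in `핫도그, 김밥, 샌드위치` and `핫도그/제빵`. `구두닦이 및 수선` stays a single item, unlike in the whitespace-based word table. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed separators: a comma or slash, with optional surrounding whitespace.
SEPARATORS = re.compile(r"\s*[,/]\s*")


def split_items(value: str) -> List[str]:
    """Split one 취급품목 string into individual goods."""
    return [item for item in SEPARATORS.split(value.strip()) if item]


def item_counts(values: Iterable[str]) -> Counter:
    """Count individual goods across all rows."""
    counts: Counter = Counter()
    for value in values:
        counts.update(split_items(value))
    return counts


if __name__ == "__main__":
    print(item_counts(["김밥, 토스트", "토스트, 김밥", "핫도그/제빵"]))
```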

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
구두닦이 | 29 | 17.3%
수선 | 29 | 17.3%
및 | 29 | 17.3%
음료 | 12 | 7.1%
na | 10 | 6.0%
교통카드 | 8 | 4.8%
잡화 | 6 | 3.6%
김밥 | 5 | 3.0%
담배 | 4 | 2.4%
토스트 | 4 | 2.4%
Other values (21) | 32 | 19.0%

비고 (remarks)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 73
Missing (%): 100.0%
Memory size: 785.0 B
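비고 is empty in all 73 rows, so it carries no information and can be dropped before further analysis. The sketch below assumes the rows are dicts that share the same keys and use `None` for missing values. `drop_empty_columns` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def drop_empty_columns(rows: List[Row]) -> List[Row]:
    """Remove columns whose value is None in every row.

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return []
    keep = [col for col in rows[0] if any(row.get(col) is not None for row in rows)]
    return [{col: row.get(col) for col in keep} for row in rows]


if __name__ == "__main__":
    rows = [{"유형": "구두수선대", "비고": None}, {"유형": None, "비고": None}]
    print(drop_empty_columns(rows))  # [{'유형': '구두수선대'}, {'유형': None}]
```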

자료기준일 (data reference date)
Date

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 1.6%
Missing: 10
Missing (%): 13.7%
Memory size: 712.0 B
Minimum: 2024-02-13 00:00:00
Maximum: 2024-02-13 00:00:00
Histogram with fixed size bins (bins=1)
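The Distinct (%) of 1.6% is 1 distinct value over the 63 non-missing rows. So the percentage is taken over non-missing values, and every non-missing row carries the same reference date, 2024-02-13.

The sketch below is a constant-column check that ignores missing entries. It assumes ISO-formatted date strings, and the helper name is illustrative.

```python
from datetime import date
from typing import Optional, Sequence


def constant_value(values: Sequence[Optional[str]]) -> Optional[date]:
    """Return the single non-missing date if the column is constant, else None."""
    dates = {date.fromisoformat(v) for v in values if v is not None}
    return dates.pop() if len(dates) == 1 else None


if __name__ == "__main__":
    column = ["2024-02-13"] * 63 + [None] * 10
    print(constant_value(column))  # 2024-02-13
```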

Correlations

Variable | 유형 | 관리번호 | 주소 | 성 명 | 면적(제곱미터) | 취급품목
유형 | 1.000 | 1.000 | 0.502 | 0.000 | 1.000 | 1.000
관리번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
주소 | 0.502 | 1.000 | 1.000 | 0.914 | 0.000 | 0.759
성 명 | 0.000 | 1.000 | 0.914 | 1.000 | 0.000 | 0.726
면적(제곱미터) | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.570
취급품목 | 1.000 | 1.000 | 0.759 | 0.726 | 0.570 | 1.000

Variable | 유형 | 성 명 | 면적(제곱미터) | 취급품목
유형 | 1.000 | 0.000 | 0.992 | 0.757
성 명 | 0.000 | 1.000 | 0.000 | 0.227
면적(제곱미터) | 0.992 | 0.000 | 1.000 | 0.258
취급품목 | 0.757 | 0.227 | 0.258 | 1.000

Variable | 유형 | 성 명 | 면적(제곱미터) | 취급품목
유형 | 1.000 | 0.000 | 0.992 | 0.757
성 명 | 0.000 | 1.000 | 0.000 | 0.227
면적(제곱미터) | 0.992 | 0.000 | 1.000 | 0.258
취급품목 | 0.757 | 0.227 | 0.258 | 1.000
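The report does not label which coefficient each matrix shows. The sketch below assumes categorical pairs are scored with Cramér's V, V = sqrt(χ² / (n · (k − 1))), where k is the smaller of the two category counts. ydata-profiling may add a bias correction, so its numbers need not match this plain version exactly.

A column with a different value in every row, such as 관리번호, determines every other column exactly. This is consistent with its row of 1.000 values in the first matrix. The helper below is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plain (uncorrected) Cramér's V between two categorical sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # one variable is constant: V is undefined, report 0
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # A unique identifier versus a two-level type: perfectly determined.
    print(cramers_v(["r1", "r2", "r3", "r4"], ["a", "a", "b", "b"]))
```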

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
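Nullity correlation compares where values are missing rather than what the values are. The sketch below computes it as the Pearson correlation of 0/1 missingness indicators; this is an assumption about the exact method behind the heatmap.

In this dataset, 관리번호, 주소 and 자료기준일 are missing in the same 10 rows, so each pair scores 1.0. 비고 is missing everywhere and has no defined correlation. The helper name is illustrative.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[object]],
                        b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the missingness indicators of two columns."""
    xa = [1.0 if v is None else 0.0 for v in a]
    xb = [1.0 if v is None else 0.0 for v in b]
    n = len(xa)
    mean_a = sum(xa) / n
    mean_b = sum(xb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(xa, xb))
    var_a = sum((p - mean_a) ** 2 for p in xa)
    var_b = sum((q - mean_b) ** 2 for q in xb)
    if var_a == 0 or var_b == 0:
        # A column that is always (or never) missing has no nullity variance.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    address = ["x"] * 63 + [None] * 10
    reference_date = ["2024-02-13"] * 63 + [None] * 10
    print(nullity_correlation(address, reference_date))
```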

Sample

Row | 유형 | 관리번호 | 주소 | 성 명 | 면적(제곱미터) | 취급품목 | 비고 | 자료기준일
0 | 가로판매대(서울시형) | 가로판매대-01 | 서울특별시 광진구 자양로 95(자양동) | 신** | 3.92 | 잡화 | <NA> | 2024-02-13
1 | 가로판매대(서울시형) | 가로판매대-02 | 서울특별시 광진구 아차산로 377(구의동) | 박** | 3.92 | 기타 | <NA> | 2024-02-13
2 | 가로판매대(서울시형) | 가로판매대-03 | 서울특별시 광진구 아차산로 219(화양동) | 임** | 3.92 | 핫도그/제빵 | <NA> | 2024-02-13
3 | 가로판매대(서울시형) | 가로판매대-04 | 서울특별시 광진구 아차산로 224(자양동) | 한** | 3.92 | 핫도그, 토스트 | <NA> | 2024-02-13
4 | 가로판매대(서울시형) | 가로판매대-05 | 서울특별시 광진구 아차산로 244(자양동) | 박** | 3.92 | 양말, 장갑 | <NA> | 2024-02-13
5 | 가로판매대(서울시형) | 가로판매대-06 | 서울특별시 광진구 능동로 92(자양동) | 진** | 3.92 | 과일 | <NA> | 2024-02-13
6 | 가로판매대(서울시형) | 가로판매대-07 | 서울특별시 광진구 능등로 103(화양동) | 문** | 3.92 | 잡화 | <NA> | 2024-02-13
7 | 가로판매대(서울시형) | 가로판매대-08 | 서울특별시 광진구 능동로 107(화양동) | 이** | 3.92 | 핫도그, 김밥, 샌드위치 | <NA> | 2024-02-13
8 | 가로판매대(서울시형) | 가로판매대-09 | 서울특별시 광진구 능동로 115(화양동) | 고** | 3.92 |  | <NA> | 2024-02-13
9 | 가로판매대(서울시형) | 가로판매대-10 | 서울특별시 광진구 능동로 117(화양동) | 양** | 3.92 | 애견용품 | <NA> | 2024-02-13

Row | 유형 | 관리번호 | 주소 | 성 명 | 면적(제곱미터) | 취급품목 | 비고 | 자료기준일
63 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
64 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
65 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
66 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
67 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
68 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
69 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
70 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
71 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
72 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
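Rows 63 to 72 are empty in every column. They account for the 10 missing values in each of 관리번호, 주소 and 자료기준일. The sketch below drops such rows, assuming the literal `<NA>` strings have already been converted to `None`. `drop_empty_rows` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def drop_empty_rows(rows: List[Row]) -> List[Row]:
    """Remove rows in which every field is None."""
    return [row for row in rows if any(value is not None for value in row.values())]


if __name__ == "__main__":
    rows = [{"유형": "구두수선대", "비고": None}] + [
        {"유형": None, "비고": None} for _ in range(10)
    ]
    print(len(drop_empty_rows(rows)))  # 1
```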

Duplicate rows

Most frequently occurring

Row | 유형 | 관리번호 | 주소 | 성 명 | 면적(제곱미터) | 취급품목 | 자료기준일 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 10
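The overview's "Duplicate rows 1 (1.4%)" appears to count duplicated row patterns rather than extra copies: the single all-<NA> pattern occurs 10 times. 비고 is absent from this table's header, which suggests unsupported columns are left out of the comparison. The sketch below groups rows under those assumptions; the helper name is illustrative.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[Any]]


def duplicate_groups(rows: List[Row],
                     columns: Sequence[str]) -> List[Tuple[Tuple[Any, ...], int]]:
    """Return (row values, occurrences) for each pattern seen more than once.

    Only the given columns are compared, so unsupported columns can be excluded.
    """
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return [(key, n) for key, n in counts.most_common() if n > 1]


if __name__ == "__main__":
    cols = ["유형", "관리번호"]
    rows = [{"유형": None, "관리번호": None} for _ in range(10)] + [
        {"유형": "구두수선대", "관리번호": "구두수선대-01"},
        {"유형": "구두수선대", "관리번호": "구두수선대-02"},
    ]
    groups = duplicate_groups(rows, cols)
    print(len(groups), groups)  # 1 [((None, None), 10)]
```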