Overview

Dataset statistics

Number of variables: 4
Number of observations: 153
Missing cells: 95
Missing cells (%): 15.5%
Duplicate rows: 2
Duplicate rows (%): 1.3%
Total size in memory: 4.9 KiB
Average record size in memory: 32.9 B
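The percentages follow directly from the counts. The check below is a minimal sketch that uses only figures reported in this document (the per-column missing counts come from the Alerts and Variables sections); the variable names are illustrative.

```python
# Figures copied from the report; variable names are illustrative.
n_variables = 4
n_observations = 153
missing_per_column = {"업종명": 0, "업소명": 8, "소재지(도로명)": 8, "소재지전화": 79}
duplicate_rows = 2

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())

missing_pct = round(100 * missing_cells / total_cells, 1)
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)

print(total_cells, missing_cells, missing_pct, duplicate_pct)
```

The per-column counts sum to the reported 95 missing cells, and both rounded percentages match the table.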

Variable types

Categorical: 1
Text: 3

Dataset

Description: Status information on food repackaging businesses (식품소분업) located in Anyang-si, Gyeonggi-do, covering four fields: business type (업종명), business name (업소명), road-name address (소재지(도로명)), and telephone number (소재지전화). Some telephone numbers were not collected.
URL: https://www.data.go.kr/data/3080191/fileData.do

Alerts

Dataset has 2 (1.3%) duplicate rows [Duplicates]
업종명 is highly imbalanced (70.4%) [Imbalance]
업소명 has 8 (5.2%) missing values [Missing]
소재지(도로명) has 8 (5.2%) missing values [Missing]
소재지전화 has 79 (51.6%) missing values [Missing]
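The 70.4% imbalance figure for 업종명 is consistent with one minus the normalized Shannon entropy of its two value counts (145 and 8). The sketch below assumes that definition; it is not taken from the profiling tool's source, and the function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum possible entropy).

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Assumed definition, chosen because it reproduces the
    figure in the report.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Value counts reported for 업종명: 식품소분업 = 145, <NA> = 8.
score = imbalance_score([145, 8])
print(f"{score:.1%}")
```

With two classes the maximum entropy is 1 bit, so the score is simply one minus the entropy of the 94.8% / 5.2% split.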

Reproduction

Analysis started: 2023-12-13 00:30:14.819438
Analysis finished: 2023-12-13 00:30:15.504930
Duration: 0.69 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
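The reported duration can be checked against the two timestamps. This is a small sketch using only the standard library; the timestamps are copied from the lines above.

```python
from datetime import datetime

# Timestamps copied from the report.
started = datetime.fromisoformat("2023-12-13 00:30:14.819438")
finished = datetime.fromisoformat("2023-12-13 00:30:15.504930")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```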

Variables

업종명
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
식품소분업: 145
<NA>: 8

Length

Max length: 5
Median length: 5
Mean length: 4.9477124
Min length: 4
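These length statistics are consistent with the eight `<NA>` entries being counted as a literal four-character string rather than as missing values, which would also explain why this variable reports 0 missing. The sketch below rebuilds the column from the reported value counts under that assumption.

```python
import unicodedata
from statistics import mean, median

# Value counts copied from the report. Assumption: "<NA>" is a literal string.
label = unicodedata.normalize("NFC", "식품소분업")  # 5 precomposed syllables
values = [label] * 145 + ["<NA>"] * 8
lengths = [len(v) for v in values]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 7),
    "min": min(lengths),
}
print(stats)
```

The mean works out to (145 × 5 + 8 × 4) / 153, which matches the reported 4.9477124.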

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 식품소분업
2nd row: 식품소분업
3rd row: 식품소분업
4th row: 식품소분업
5th row: 식품소분업

Common Values

Value | Count | Frequency (%)
식품소분업 | 145 | 94.8%
<NA> | 8 | 5.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values: 식품소분업 145 (94.8%), <NA> 8 (5.2%)]

업소명
Text

MISSING 

Distinct: 134
Distinct (%): 92.4%
Missing: 8
Missing (%): 5.2%
Memory size: 1.3 KiB

Length

Max length: 16
Median length: 14
Mean length: 6.5034483
Min length: 2

Characters and Unicode

Total characters: 943
Distinct characters: 240
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
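As an illustration of those character properties, the sketch below tallies the Unicode general category of each character in one business name from the Sample section, using Python's standard `unicodedata` module. The mapping from two-letter codes to the long names used in this report covers only the categories that appear in it, and the helper name is hypothetical.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes and the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}

def category_counts(text):
    """Count characters per Unicode general category (long names where known)."""
    text = unicodedata.normalize("NFC", text)
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

counts = category_counts("(주)보원식품")
print(counts)
```

Hangul syllables fall under Other Letter because Hangul has no upper or lower case, which is why that category dominates the text variables in this report.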

Unique

Unique: 123
Unique (%): 84.8%

Sample

1st row: 해동상사
2nd row: 해동상사
3rd row: 안양농협 하나로마트 비산점
4th row: 안양농협 하나로마트 비산점
5th row: 우주유통
Value | Count | Frequency (%)
주식회사 | 6 | 3.8%
청미식품 | 2 | 1.3%
마루유통 | 2 | 1.3%
유환회사 | 2 | 1.3%
한마음마트 | 2 | 1.3%
월드유통 | 2 | 1.3%
비산점 | 2 | 1.3%
안양농협 | 2 | 1.3%
하나로마트 | 2 | 1.3%
해동상사 | 2 | 1.3%
Other values (130) | 135 | 84.9%

Most occurring characters

ValueCountFrequency (%)
42
 
4.5%
) 37
 
3.9%
( 36
 
3.8%
28
 
3.0%
26
 
2.8%
25
 
2.7%
22
 
2.3%
22
 
2.3%
17
 
1.8%
17
 
1.8%
Other values (230) 671
71.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 845 | 89.6%
Close Punctuation | 37 | 3.9%
Open Punctuation | 36 | 3.8%
Space Separator | 14 | 1.5%
Uppercase Letter | 6 | 0.6%
Lowercase Letter | 4 | 0.4%
Other Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
42
 
5.0%
28
 
3.3%
26
 
3.1%
25
 
3.0%
22
 
2.6%
22
 
2.6%
17
 
2.0%
17
 
2.0%
17
 
2.0%
16
 
1.9%
Other values (216) 613
72.5%
Uppercase Letter
Value | Count | Frequency (%)
F | 1 | 16.7%
S | 1 | 16.7%
B | 1 | 16.7%
C | 1 | 16.7%
M | 1 | 16.7%
K | 1 | 16.7%
Lowercase Letter
Value | Count | Frequency (%)
s | 1 | 25.0%
u | 1 | 25.0%
g | 1 | 25.0%
a | 1 | 25.0%
Close Punctuation
Value | Count | Frequency (%)
) | 37 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 36 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 14 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
& | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 845 | 89.6%
Common | 88 | 9.3%
Latin | 10 | 1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
42
 
5.0%
28
 
3.3%
26
 
3.1%
25
 
3.0%
22
 
2.6%
22
 
2.6%
17
 
2.0%
17
 
2.0%
17
 
2.0%
16
 
1.9%
Other values (216) 613
72.5%
Latin
ValueCountFrequency (%)
F 1
10.0%
S 1
10.0%
s 1
10.0%
u 1
10.0%
g 1
10.0%
a 1
10.0%
B 1
10.0%
C 1
10.0%
M 1
10.0%
K 1
10.0%
Common
ValueCountFrequency (%)
) 37
42.0%
( 36
40.9%
14
 
15.9%
& 1
 
1.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 845 | 89.6%
ASCII | 98 | 10.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
42
 
5.0%
28
 
3.3%
26
 
3.1%
25
 
3.0%
22
 
2.6%
22
 
2.6%
17
 
2.0%
17
 
2.0%
17
 
2.0%
16
 
1.9%
Other values (216) 613
72.5%
ASCII
ValueCountFrequency (%)
) 37
37.8%
( 36
36.7%
14
 
14.3%
F 1
 
1.0%
& 1
 
1.0%
S 1
 
1.0%
s 1
 
1.0%
u 1
 
1.0%
g 1
 
1.0%
a 1
 
1.0%
Other values (4) 4
 
4.1%

소재지(도로명)
Text

MISSING 

Distinct: 143
Distinct (%): 98.6%
Missing: 8
Missing (%): 5.2%
Memory size: 1.3 KiB

Length

Max length: 58
Median length: 48
Mean length: 36.337931
Min length: 22

Characters and Unicode

Total characters: 5269
Distinct characters: 182
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 141
Unique (%): 97.2%

Sample

1st row: 경기도 안양시 만안구 안양로329번길 29 (안양동,지상1층)
2nd row: 안양시 만안구 안양로329번길 29 (안양동,지상1층)
3rd row: 경기도 안양시 동안구 관악대로 82 (비산동)
4th row: 안양시 동안구 관악대로 82 (비산동)
5th row: 경기도 안양시 동안구 갈산로44번길 28 (호계동)
Value | Count | Frequency (%)
안양시 | 146 | 13.7%
경기도 | 137 | 12.8%
동안구 | 84 | 7.9%
만안구 | 61 | 5.7%
안양동 | 35 | 3.3%
호계동 | 33 | 3.1%
관양동 | 25 | 2.3%
1층 | 20 | 1.9%
지상1층 | 15 | 1.4%
지하1층 | 11 | 1.0%
Other values (301) | 500 | 46.9%

Most occurring characters

ValueCountFrequency (%)
930
 
17.7%
384
 
7.3%
245
 
4.6%
241
 
4.6%
1 207
 
3.9%
160
 
3.0%
, 152
 
2.9%
2 151
 
2.9%
148
 
2.8%
147
 
2.8%
Other values (172) 2504
47.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2978 | 56.5%
Space Separator | 930 | 17.7%
Decimal Number | 836 | 15.9%
Other Punctuation | 158 | 3.0%
Close Punctuation | 145 | 2.8%
Open Punctuation | 145 | 2.8%
Uppercase Letter | 42 | 0.8%
Dash Punctuation | 20 | 0.4%
Lowercase Letter | 12 | 0.2%
Math Symbol | 3 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
384
 
12.9%
245
 
8.2%
241
 
8.1%
160
 
5.4%
148
 
5.0%
147
 
4.9%
145
 
4.9%
140
 
4.7%
137
 
4.6%
91
 
3.1%
Other values (139) 1140
38.3%
Uppercase Letter
Value | Count | Frequency (%)
B | 16 | 38.1%
A | 6 | 14.3%
I | 4 | 9.5%
S | 4 | 9.5%
K | 3 | 7.1%
T | 3 | 7.1%
V | 2 | 4.8%
Z | 1 | 2.4%
P | 1 | 2.4%
E | 1 | 2.4%
Decimal Number
Value | Count | Frequency (%)
1 | 207 | 24.8%
2 | 151 | 18.1%
3 | 87 | 10.4%
0 | 71 | 8.5%
5 | 69 | 8.3%
4 | 66 | 7.9%
7 | 51 | 6.1%
9 | 47 | 5.6%
8 | 44 | 5.3%
6 | 43 | 5.1%
Lowercase Letter
Value | Count | Frequency (%)
e | 4 | 33.3%
r | 2 | 16.7%
t | 2 | 16.7%
n | 2 | 16.7%
c | 2 | 16.7%
Other Punctuation
Value | Count | Frequency (%)
, | 152 | 96.2%
. | 6 | 3.8%
Space Separator
Value | Count | Frequency (%)
(space) | 930 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 145 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 145 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 20 | 100.0%
Math Symbol
Value | Count | Frequency (%)
~ | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2978 | 56.5%
Common | 2237 | 42.5%
Latin | 54 | 1.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
384
 
12.9%
245
 
8.2%
241
 
8.1%
160
 
5.4%
148
 
5.0%
147
 
4.9%
145
 
4.9%
140
 
4.7%
137
 
4.6%
91
 
3.1%
Other values (139) 1140
38.3%
Common
ValueCountFrequency (%)
930
41.6%
1 207
 
9.3%
, 152
 
6.8%
2 151
 
6.8%
) 145
 
6.5%
( 145
 
6.5%
3 87
 
3.9%
0 71
 
3.2%
5 69
 
3.1%
4 66
 
3.0%
Other values (7) 214
 
9.6%
Latin
ValueCountFrequency (%)
B 16
29.6%
A 6
 
11.1%
I 4
 
7.4%
S 4
 
7.4%
e 4
 
7.4%
K 3
 
5.6%
T 3
 
5.6%
r 2
 
3.7%
t 2
 
3.7%
n 2
 
3.7%
Other values (6) 8
14.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2978 | 56.5%
ASCII | 2291 | 43.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
930
40.6%
1 207
 
9.0%
, 152
 
6.6%
2 151
 
6.6%
) 145
 
6.3%
( 145
 
6.3%
3 87
 
3.8%
0 71
 
3.1%
5 69
 
3.0%
4 66
 
2.9%
Other values (23) 268
 
11.7%
Hangul
ValueCountFrequency (%)
384
 
12.9%
245
 
8.2%
241
 
8.1%
160
 
5.4%
148
 
5.0%
147
 
4.9%
145
 
4.9%
140
 
4.7%
137
 
4.6%
91
 
3.1%
Other values (139) 1140
38.3%

소재지전화
Text

MISSING 

Distinct: 73
Distinct (%): 98.6%
Missing: 79
Missing (%): 51.6%
Memory size: 1.3 KiB

Length

Max length: 13
Median length: 12
Mean length: 12.094595
Min length: 11

Characters and Unicode

Total characters: 895
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 72
Unique (%): 97.3%

Sample

1st row: 0708-2424-989
2nd row: 070-8811-0804
3rd row: 070-8671-1234
4th row: 070-7776-0172
5th row: 070-4236-5988
Value | Count | Frequency (%)
031-424-3534 | 2 | 2.7%
031-427-4555 | 1 | 1.4%
031-380-6601 | 1 | 1.4%
02-3281-6811 | 1 | 1.4%
02-530-5000 | 1 | 1.4%
031-323-2820 | 1 | 1.4%
031-340-6415 | 1 | 1.4%
031-342-5150 | 1 | 1.4%
031-345-8100 | 1 | 1.4%
031-382-7181 | 1 | 1.4%
Other values (63) | 63 | 85.1%

Most occurring characters

Value | Count | Frequency (%)
- | 148 | 16.5%
0 | 129 | 14.4%
3 | 119 | 13.3%
4 | 106 | 11.8%
1 | 104 | 11.6%
2 | 59 | 6.6%
8 | 54 | 6.0%
6 | 53 | 5.9%
5 | 46 | 5.1%
7 | 46 | 5.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 747 | 83.5%
Dash Punctuation | 148 | 16.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 129 | 17.3%
3 | 119 | 15.9%
4 | 106 | 14.2%
1 | 104 | 13.9%
2 | 59 | 7.9%
8 | 54 | 7.2%
6 | 53 | 7.1%
5 | 46 | 6.2%
7 | 46 | 6.2%
9 | 31 | 4.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 148 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 895 | 100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 148
16.5%
0 129
14.4%
3 119
13.3%
4 106
11.8%
1 104
11.6%
2 59
 
6.6%
8 54
 
6.0%
6 53
 
5.9%
5 46
 
5.1%
7 46
 
5.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 895 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 148
16.5%
0 129
14.4%
3 119
13.3%
4 106
11.8%
1 104
11.6%
2 59
 
6.6%
8 54
 
6.0%
6 53
 
5.9%
5 46
 
5.1%
7 46
 
5.1%

Correlations

[Figure: correlation heatmap]
Correlation matrix: 소재지전화 vs. 소재지전화 = 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
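One way to read nullity correlation is as the Pearson correlation between per-column presence indicators (1 where a value exists, 0 where it is missing). The sketch below assumes that reading; the helper name and the four-row example columns are hypothetical and are not taken from the dataset.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    None marks a missing value. Returns None when either column is
    entirely present or entirely missing, because the correlation is
    undefined for a constant indicator.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns, for illustration only.
names = ["A", None, "B", None]
phones = ["031-000-0000", None, None, None]
r = nullity_correlation(names, phones)
print(r)
```

A value near 1 means the two columns tend to be missing together; a value near -1 means one tends to be present when the other is missing.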

Sample

Row | 업종명 | 업소명 | 소재지(도로명) | 소재지전화
0 | 식품소분업 | 해동상사 | 경기도 안양시 만안구 안양로329번길 29 (안양동,지상1층) | <NA>
1 | 식품소분업 | 해동상사 | 안양시 만안구 안양로329번길 29 (안양동,지상1층) | <NA>
2 | 식품소분업 | 안양농협 하나로마트 비산점 | 경기도 안양시 동안구 관악대로 82 (비산동) | <NA>
3 | 식품소분업 | 안양농협 하나로마트 비산점 | 안양시 동안구 관악대로 82 (비산동) | <NA>
4 | 식품소분업 | 우주유통 | 경기도 안양시 동안구 갈산로44번길 28 (호계동) | <NA>
5 | 식품소분업 | 우주유통 | 안양시 동안구 갈산로44번길 28 (호계동) | <NA>
6 | 식품소분업 | 골드종합식품 | 경기도 안양시 만안구 덕천로127번길 111 (안양동) | <NA>
7 | 식품소분업 | 골드종합식품 | 안양시 만안구 덕천로127번길 111 (안양동) | <NA>
8 | 식품소분업 | (주)보원식품 | 경기도 안양시 만안구 박달로275번길 19, 지상1층 (박달동) | <NA>
9 | 식품소분업 | (주)보원식품 | 안양시 만안구 박달로275번길 19, 지상1층 (박달동) | <NA>

Row | 업종명 | 업소명 | 소재지(도로명) | 소재지전화
143 | 식품소분업 | 홈파티 | 경기도 안양시 동안구 평촌대로211번길 16 (호계동,동아월드 201호) | 031-216-0365
144 | 식품소분업 | 안양슈퍼 | 경기도 안양시 만안구 안양로291번길 20 (안양동) | <NA>
145 | <NA> | <NA> | <NA> | <NA>
146 | <NA> | <NA> | <NA> | <NA>
147 | <NA> | <NA> | <NA> | <NA>
148 | <NA> | <NA> | <NA> | <NA>
149 | <NA> | <NA> | <NA> | <NA>
150 | <NA> | <NA> | <NA> | <NA>
151 | <NA> | <NA> | <NA> | <NA>
152 | <NA> | <NA> | <NA> | <NA>
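In the first ten rows each business appears twice, once with the 경기도 province prefix in the address and once without it, so exact duplicate detection does not catch these pairs. The sketch below shows one possible normalization that would make such pairs comparable; the function name and the choice to strip the province prefix are assumptions, not part of the report.

```python
def normalize_address(address, province="경기도"):
    """Collapse whitespace and drop a leading province name, if present."""
    collapsed = " ".join(address.split())
    prefix = province + " "
    if collapsed.startswith(prefix):
        collapsed = collapsed[len(prefix):]
    return collapsed

# Addresses from rows 2 and 3 of the sample; the second has a leading space.
with_prefix = normalize_address("경기도 안양시 동안구 관악대로 82 (비산동)")
without_prefix = normalize_address(" 안양시 동안구 관악대로 82 (비산동)")
print(with_prefix == without_prefix)
```

Grouping rows on the business name plus the normalized address would then surface these near-duplicates for review.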

Duplicate rows

Most frequently occurring

Row | 업종명 | 업소명 | 소재지(도로명) | 소재지전화 | # duplicates
1 | <NA> | <NA> | <NA> | <NA> | 8
0 | 식품소분업 | 한마음마트 | 경기도 안양시 만안구 박달우회로124번길 43, 1층 (박달동) | <NA> | 2
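The table lists two distinct row values that repeat, which matches the "2 duplicate rows" figure in the overview (2 / 153 ≈ 1.3%). A minimal sketch of that kind of count, using hypothetical rows rather than the actual dataset:

```python
from collections import Counter

def duplicate_summary(rows):
    """Map each row value that occurs more than once to its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical rows as (업종명, 업소명, 소재지(도로명), 소재지전화) tuples.
rows = [
    ("식품소분업", "가게A", "주소1", None),
    ("식품소분업", "가게A", "주소1", None),
    ("식품소분업", "가게B", "주소2", "031-000-0000"),
    (None, None, None, None),
    (None, None, None, None),
    (None, None, None, None),
]

duplicates = duplicate_summary(rows)
for row, n in sorted(duplicates.items(), key=lambda item: -item[1]):
    print(n, row)
```

Rows must be hashable for `Counter`, which is why each row is a tuple rather than a list.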