Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 29554
Missing cells (%): 59.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
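
The headline figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it uses the per-variable missing counts listed in the Variables section of this report, and it assumes the percentages are plain ratios rounded to one decimal place and that 1 KiB is 1024 B.

```python
# Figures copied from this report; the variable names are illustrative.
n_observations = 10_000
missing_per_variable = {
    "업종명": 0,
    "업소명": 9826,
    "주소": 9826,
    "전화번호": 9902,
    "데이터기준일자": 0,
}

# Every variable contributes one cell per observation.
total_cells = n_observations * len(missing_per_variable)
missing_cells = sum(missing_per_variable.values())
missing_pct = round(100 * missing_cells / total_cells, 1)

# Assumption: the total memory size is reported in KiB of 1024 bytes.
avg_record_bytes = round(468.8 * 1024 / n_observations, 1)

print(missing_cells, missing_pct, avg_record_bytes)
```

Three of the five variables are almost entirely empty, which is why well over half of all cells are missing.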

Variable types

Categorical: 2
Text: 3

Dataset

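Description: Provides data on the status of beauty businesses (미용업), a category of public hygiene businesses, within Jeju-si, Jeju Special Self-Governing Province.
Author: Jeju-si, Jeju Special Self-Governing Province (제주특별자치도 제주시)
URL: https://www.data.go.kr/data/15056159/fileData.do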

Alerts

Duplicates: Dataset has 1 (< 0.1%) duplicate rows
High correlation: 데이터기준일자 is highly overall correlated with 업종명
High correlation: 업종명 is highly overall correlated with 데이터기준일자
Imbalance: 업종명 is highly imbalanced (95.5%)
Imbalance: 데이터기준일자 is highly imbalanced (87.3%)
Missing: 업소명 has 9826 (98.3%) missing values
Missing: 주소 has 9826 (98.3%) missing values
Missing: 전화번호 has 9902 (99.0%) missing values
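
The imbalance percentages are consistent with one minus the normalized Shannon entropy of the category counts. The report does not state the formula, so the sketch below treats that definition as an assumption. The counts come from the Common Values tables; the three values grouped under "Other values (3)" cover 4 rows in total, which forces a 2/1/1 split.

```python
import math

def imbalance(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits over k classes."""
    total = sum(counts)
    entropy = -sum(c / total * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(len(counts))

# 업종명: 13 distinct values, "<NA>" included as a category.
industry_counts = [9826, 80, 35, 21, 16, 5, 4, 3, 3, 3, 2, 1, 1]
# 데이터기준일자: "<NA>" and 2021-02-15.
date_counts = [9826, 174]

industry_score = imbalance(industry_counts)
date_score = imbalance(date_counts)
print(f"{industry_score:.4f} {date_score:.4f}")
```

Under this assumption the scores come out at roughly 0.9545 and 0.8734, in line with the 95.5% and 87.3% shown in the alerts. A score near 1 means a single value (here `<NA>`) dominates the column.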

Reproduction

Analysis started: 2023-12-12 08:11:37.579112
Analysis finished: 2023-12-12 08:11:38.458253
Duration: 0.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

업종명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9826
일반미용업 | 80
피부미용업 | 35
미용업 | 21
네일미용업 | 16
Other values (8) | 22

Length

Max length: 21
Median length: 4
Mean length: 4.0274
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9826 | 98.3%
일반미용업 | 80 | 0.8%
피부미용업 | 35 | 0.4%
미용업 | 21 | 0.2%
네일미용업 | 16 | 0.2%
화장ㆍ분장 미용업 | 5 | 0.1%
일반미용업 화장ㆍ분장 미용업 | 4 | < 0.1%
네일미용업 화장ㆍ분장 미용업 | 3 | < 0.1%
종합미용업 | 3 | < 0.1%
피부미용업 네일미용업 | 3 | < 0.1%
Other values (3) | 4 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
na | 9826 | 98.0%
일반미용업 | 87 | 0.9%
피부미용업 | 41 | 0.4%
미용업 | 34 | 0.3%
네일미용업 | 24 | 0.2%
화장ㆍ분장 | 13 | 0.1%
종합미용업 | 3 | < 0.1%

업소명
Text

MISSING 

Distinct: 174
Distinct (%): 100.0%
Missing: 9826
Missing (%): 98.3%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 10
Mean length: 5.183908
Min length: 1

Characters and Unicode

Total characters: 902
Distinct characters: 236
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
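
As a rough illustration of those character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module, using sample values taken from this report. How the profiler itself derives its tables is not shown here; the helper name `category_counts` is hypothetical. The two-letter codes correspond to the report's labels: `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, and `Pd` is Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Sample values copied from the Sample rows of this report.
name_counts = category_counts("반짝반짝네일")
address_counts = category_counts("제주특별자치도 제주시 우정로11길 18")
phone_counts = category_counts("064-742-3634")

print(dict(name_counts))
print(dict(address_counts))
print(dict(phone_counts))
```

Hangul syllables fall under Other Letter, which is why that category dominates the name and address columns, while phone numbers contain only Decimal Number and Dash Punctuation characters.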

Unique

Unique: 174
Unique (%): 100.0%

Sample

1st row: 반짝반짝네일
2nd row: 네일하나
3rd row: 센스클럽헤어샵
4th row: 꼽슬머리
5th row: 동인

Value | Count | Frequency (%)
헤어아트 | 2 | 1.1%
태후사랑 | 2 | 1.1%
스킨존 | 1 | 0.5%
쉼뷰티 | 1 | 0.5%
헤어캄 | 1 | 0.5%
고고살롱 | 1 | 0.5%
깍쟁이헤어 | 1 | 0.5%
라야롬에스테틱 | 1 | 0.5%
미라인 | 1 | 0.5%
설렘주의보 | 1 | 0.5%
Other values (174) | 174 | 93.5%

Most occurring characters

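Value | Count | Frequency (%)
[unrecoverable] | 73 | 8.1%
[unrecoverable] | 64 | 7.1%
[unrecoverable] | 35 | 3.9%
[unrecoverable] | 32 | 3.5%
[unrecoverable] | 22 | 2.4%
[unrecoverable] | 19 | 2.1%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 15 | 1.7%
Other values (226) | 591 | 65.5%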

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 886 | 98.2%
Space Separator | 12 | 1.3%
Decimal Number | 4 | 0.4%

Most frequent character per category

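Other Letter
Value | Count | Frequency (%)
[unrecoverable] | 73 | 8.2%
[unrecoverable] | 64 | 7.2%
[unrecoverable] | 35 | 4.0%
[unrecoverable] | 32 | 3.6%
[unrecoverable] | 22 | 2.5%
[unrecoverable] | 19 | 2.1%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 15 | 1.7%
Other values (222) | 575 | 64.9%

Decimal Number
Value | Count | Frequency (%)
0 | 2 | 50.0%
1 | 1 | 25.0%
9 | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 12 | 100.0%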

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 886 | 98.2%
Common | 16 | 1.8%

Most frequent character per script

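Hangul
Value | Count | Frequency (%)
[unrecoverable] | 73 | 8.2%
[unrecoverable] | 64 | 7.2%
[unrecoverable] | 35 | 4.0%
[unrecoverable] | 32 | 3.6%
[unrecoverable] | 22 | 2.5%
[unrecoverable] | 19 | 2.1%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 15 | 1.7%
Other values (222) | 575 | 64.9%

Common
Value | Count | Frequency (%)
(space) | 12 | 75.0%
0 | 2 | 12.5%
1 | 1 | 6.2%
9 | 1 | 6.2%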

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 886 | 98.2%
ASCII | 16 | 1.8%

Most frequent character per block

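Hangul
Value | Count | Frequency (%)
[unrecoverable] | 73 | 8.2%
[unrecoverable] | 64 | 7.2%
[unrecoverable] | 35 | 4.0%
[unrecoverable] | 32 | 3.6%
[unrecoverable] | 22 | 2.5%
[unrecoverable] | 19 | 2.1%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 17 | 1.9%
[unrecoverable] | 15 | 1.7%
Other values (222) | 575 | 64.9%

ASCII
Value | Count | Frequency (%)
(space) | 12 | 75.0%
0 | 2 | 12.5%
1 | 1 | 6.2%
9 | 1 | 6.2%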

주소
Text

MISSING 

Distinct: 172
Distinct (%): 98.9%
Missing: 9826
Missing (%): 98.3%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 25
Mean length: 19.781609
Min length: 17

Characters and Unicode

Total characters: 3442
Distinct characters: 113
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 170
Unique (%): 97.7%

Sample

1st row: 제주특별자치도 제주시 우정로11길 18
2nd row: 제주특별자치도 제주시 서광로 288
3rd row: 제주특별자치도 제주시 절물1길 32
4th row: 제주특별자치도 제주시 남광북3길 18
5th row: 제주특별자치도 제주시 중앙로26길 2

Value | Count | Frequency (%)
제주특별자치도 | 174 | 24.5%
제주시 | 174 | 24.5%
한림읍 | 8 | 1.1%
2 | 7 | 1.0%
34 | 5 | 0.7%
27 | 4 | 0.6%
5 | 4 | 0.6%
9 | 4 | 0.6%
1 | 4 | 0.6%
동문로 | 4 | 0.6%
Other values (244) | 322 | 45.4%

Most occurring characters

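Value | Count | Frequency (%)
(space) | 536 | 15.6%
[unrecoverable] | 352 | 10.2%
[unrecoverable] | 348 | 10.1%
[unrecoverable] | 178 | 5.2%
[unrecoverable] | 174 | 5.1%
[unrecoverable] | 174 | 5.1%
[unrecoverable] | 174 | 5.1%
[unrecoverable] | 174 | 5.1%
[unrecoverable] | 174 | 5.1%
[unrecoverable] | 111 | 3.2%
Other values (103) | 1047 | 30.4%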

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2390 | 69.4%
Space Separator | 536 | 15.6%
Decimal Number | 490 | 14.2%
Dash Punctuation | 26 | 0.8%

Most frequent character per category

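Other Letter
Value | Count | Frequency (%)
[unrecoverable] | 352 | 14.7%
[unrecoverable] | 348 | 14.6%
[unrecoverable] | 178 | 7.4%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 111 | 4.6%
[unrecoverable] | 109 | 4.6%
Other values (91) | 422 | 17.7%

Decimal Number
Value | Count | Frequency (%)
1 | 102 | 20.8%
2 | 72 | 14.7%
3 | 62 | 12.7%
4 | 54 | 11.0%
5 | 44 | 9.0%
8 | 37 | 7.6%
7 | 33 | 6.7%
9 | 31 | 6.3%
6 | 29 | 5.9%
0 | 26 | 5.3%

Space Separator
Value | Count | Frequency (%)
(space) | 536 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 26 | 100.0%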

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2390 | 69.4%
Common | 1052 | 30.6%

Most frequent character per script

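Hangul
Value | Count | Frequency (%)
[unrecoverable] | 352 | 14.7%
[unrecoverable] | 348 | 14.6%
[unrecoverable] | 178 | 7.4%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 111 | 4.6%
[unrecoverable] | 109 | 4.6%
Other values (91) | 422 | 17.7%

Common
Value | Count | Frequency (%)
(space) | 536 | 51.0%
1 | 102 | 9.7%
2 | 72 | 6.8%
3 | 62 | 5.9%
4 | 54 | 5.1%
5 | 44 | 4.2%
8 | 37 | 3.5%
7 | 33 | 3.1%
9 | 31 | 2.9%
6 | 29 | 2.8%
Other values (2) | 52 | 4.9%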

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2390 | 69.4%
ASCII | 1052 | 30.6%

Most frequent character per block

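ASCII
Value | Count | Frequency (%)
(space) | 536 | 51.0%
1 | 102 | 9.7%
2 | 72 | 6.8%
3 | 62 | 5.9%
4 | 54 | 5.1%
5 | 44 | 4.2%
8 | 37 | 3.5%
7 | 33 | 3.1%
9 | 31 | 2.9%
6 | 29 | 2.8%
Other values (2) | 52 | 4.9%

Hangul
Value | Count | Frequency (%)
[unrecoverable] | 352 | 14.7%
[unrecoverable] | 348 | 14.6%
[unrecoverable] | 178 | 7.4%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 174 | 7.3%
[unrecoverable] | 111 | 4.6%
[unrecoverable] | 109 | 4.6%
Other values (91) | 422 | 17.7%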

전화번호
Text

MISSING 

Distinct: 98
Distinct (%): 100.0%
Missing: 9902
Missing (%): 99.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 12
Mean length: 12.05102
Min length: 12

Characters and Unicode

Total characters: 1181
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 98
Unique (%): 100.0%

Sample

1st row: 064-742-3634
2nd row: 064-721-8844
3rd row: 064-758-3304
4th row: 064-757-3088
5th row: 064-782-1062

Value | Count | Frequency (%)
064-742-8611 | 1 | 1.0%
064-758-6781 | 1 | 1.0%
064-752-0176 | 1 | 1.0%
070-8223-9099 | 1 | 1.0%
064-743-0600 | 1 | 1.0%
064-759-3003 | 1 | 1.0%
064-725-1220 | 1 | 1.0%
064-796-0246 | 1 | 1.0%
064-758-0761 | 1 | 1.0%
064-756-2861 | 1 | 1.0%
Other values (88) | 88 | 89.8%

Most occurring characters

ValueCountFrequency (%)
- | 196 | 16.6%
0 | 158 | 13.4%
4 | 152 | 12.9%
7 | 151 | 12.8%
6 | 146 | 12.4%
2 | 90 | 7.6%
5 | 79 | 6.7%
1 | 61 | 5.2%
8 | 57 | 4.8%
3 | 48 | 4.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number | 985 | 83.4%
Dash Punctuation | 196 | 16.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 158 | 16.0%
4 | 152 | 15.4%
7 | 151 | 15.3%
6 | 146 | 14.8%
2 | 90 | 9.1%
5 | 79 | 8.0%
1 | 61 | 6.2%
8 | 57 | 5.8%
3 | 48 | 4.9%
9 | 43 | 4.4%

Dash Punctuation
Value | Count | Frequency (%)
- | 196 | 100.0%

Most occurring scripts

ValueCountFrequency (%)
Common | 1181 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 196 | 16.6%
0 | 158 | 13.4%
4 | 152 | 12.9%
7 | 151 | 12.8%
6 | 146 | 12.4%
2 | 90 | 7.6%
5 | 79 | 6.7%
1 | 61 | 5.2%
8 | 57 | 4.8%
3 | 48 | 4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII | 1181 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 196 | 16.6%
0 | 158 | 13.4%
4 | 152 | 12.9%
7 | 151 | 12.8%
6 | 146 | 12.4%
2 | 90 | 7.6%
5 | 79 | 6.7%
1 | 61 | 5.2%
8 | 57 | 4.8%
3 | 48 | 4.1%

데이터기준일자
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9826
2021-02-15 | 174

Length

Max length: 10
Median length: 4
Mean length: 4.1044
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9826 | 98.3%
2021-02-15 | 174 | 1.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
na | 9826 | 98.3%
2021-02-15 | 174 | 1.7%

Correlations

 | 업종명 | 전화번호
업종명 | 1.000 | 1.000
전화번호 | 1.000 | 1.000

 | 데이터기준일자 | 업종명
데이터기준일자 | 1.000 | 1.000
업종명 | 1.000 | 1.000

 | 업종명 | 데이터기준일자
업종명 | 1.000 | 1.000
데이터기준일자 | 1.000 | 1.000
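
The report does not say which association measure produces these matrices. For two categorical columns, Cramér's V is a common choice, and the sketch below assumes it, without any bias correction. The pairing of values is inferred from the report: the 9826 `<NA>` rows coincide in both columns, and every other row carries the date 2021-02-15. The labels `other-1` to `other-3` are hypothetical stand-ins for the three values the report groups under "Other values (3)".

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Counts of non-missing 업종명 values, taken from the Common Values table.
industry_counts = {
    "일반미용업": 80, "피부미용업": 35, "미용업": 21, "네일미용업": 16,
    "화장ㆍ분장 미용업": 5, "일반미용업 화장ㆍ분장 미용업": 4,
    "네일미용업 화장ㆍ분장 미용업": 3, "종합미용업": 3, "피부미용업 네일미용업": 3,
    "other-1": 2, "other-2": 1, "other-3": 1,  # hypothetical labels
}
pairs = [("<NA>", "<NA>")] * 9826
for label, count in industry_counts.items():
    pairs += [(label, "2021-02-15")] * count

v = cramers_v(pairs)
print(round(v, 3))
```

Because knowing 업종명 fully determines 데이터기준일자 in this data, the association is perfect. It reflects the shared block of `<NA>` rows rather than any meaningful link between business type and reference date.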

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
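
Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 otherwise). The example below assumes that definition and rebuilds the indicators from the counts in this report: 9826 rows are empty in every column, and 전화번호 is missing in a further 76 rows. The function name `pearson` is illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# 1 = missing, 0 = present; row order does not affect the result.
name_missing = [1] * 9826 + [0] * 174
address_missing = [1] * 9826 + [0] * 174
phone_missing = [1] * 9826 + [1] * 76 + [0] * 98

name_vs_address = pearson(name_missing, address_missing)
name_vs_phone = pearson(name_missing, phone_missing)
print(round(name_vs_address, 3), round(name_vs_phone, 3))
```

업소명 and 주소 are missing in exactly the same rows, so their nullity correlation is 1. 전화번호 is missing in those rows plus 76 more, which gives a positive but weaker correlation.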

Sample

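Index | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자
1951 | <NA> | <NA> | <NA> | <NA> | <NA>
56891 | <NA> | <NA> | <NA> | <NA> | <NA>
35125 | <NA> | <NA> | <NA> | <NA> | <NA>
48212 | <NA> | <NA> | <NA> | <NA> | <NA>
69842 | <NA> | <NA> | <NA> | <NA> | <NA>
23012 | <NA> | <NA> | <NA> | <NA> | <NA>
20049 | <NA> | <NA> | <NA> | <NA> | <NA>
66074 | <NA> | <NA> | <NA> | <NA> | <NA>
75933 | <NA> | <NA> | <NA> | <NA> | <NA>
48478 | <NA> | <NA> | <NA> | <NA> | <NA>

Index | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자
93117 | <NA> | <NA> | <NA> | <NA> | <NA>
76312 | <NA> | <NA> | <NA> | <NA> | <NA>
11237 | <NA> | <NA> | <NA> | <NA> | <NA>
11006 | <NA> | <NA> | <NA> | <NA> | <NA>
40142 | <NA> | <NA> | <NA> | <NA> | <NA>
67015 | <NA> | <NA> | <NA> | <NA> | <NA>
77128 | <NA> | <NA> | <NA> | <NA> | <NA>
52466 | <NA> | <NA> | <NA> | <NA> | <NA>
55079 | <NA> | <NA> | <NA> | <NA> | <NA>
60359 | <NA> | <NA> | <NA> | <NA> | <NA>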

Duplicate rows

Most frequently occurring

Index | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 9826
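
The overview reports 1 duplicate row while this table reports 9826 duplicates. The two numbers count different things: the first is the number of distinct row patterns that repeat, the second is how many times that pattern occurs. The sketch below illustrates the distinction with synthetic rows. The 174 non-empty rows are hypothetical placeholders (`name-0`, `name-1`, ...), chosen to be unique because the report lists 174 distinct 업소명 values among 174 non-missing ones.

```python
from collections import Counter

NA = None
empty_row = (NA, NA, NA, NA, NA)

# 9826 fully empty rows plus 174 placeholder rows with unique names.
rows = [empty_row] * 9826
rows += [("일반미용업", f"name-{i}", NA, NA, "2021-02-15") for i in range(174)]

row_counts = Counter(rows)
repeated = {row: count for row, count in row_counts.items() if count > 1}

n_duplicate_patterns = len(repeated)
n_occurrences = repeated[empty_row]
print(n_duplicate_patterns, n_occurrences)
```

Only the fully empty row repeats, so there is one duplicated pattern, and it accounts for 9826 of the 10000 observations.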