Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 39681
Missing cells (%): 79.4%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
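
The derived figures in this block can be re-computed from the raw counts. The sketch below assumes that the missing percentage is the missing cell count divided by variables times observations, and that 1 KiB is 1024 bytes. Both definitions are assumptions about how the report works, although they do reproduce the printed values.

```python
# Raw counts as printed in the dataset statistics.
n_variables = 5
n_observations = 10_000
missing_cells = 39_681
total_size_kib = 468.8

# Assumed definition: share of all cells (variables x observations) that are missing.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumed definition: total size in bytes divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```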

Variable types

Categorical: 1
Text: 3
DateTime: 1

Dataset

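Description: Provides data on the status of lodging businesses, a category of public sanitation business, within Jeju City, Jeju Special Self-Governing Province.
Author: Jeju City, Jeju Special Self-Governing Province
URL: https://www.data.go.kr/data/15056155/fileData.do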

Alerts

데이터기준일자 has constant value "" (Constant)
Dataset has 1 (< 0.1%) duplicate rows (Duplicates)
업종명 is highly imbalanced (95.3%) (Imbalance)
업소명 has 9918 (99.2%) missing values (Missing)
주소 has 9918 (99.2%) missing values (Missing)
전화번호 has 9927 (99.3%) missing values (Missing)
데이터기준일자 has 9918 (99.2%) missing values (Missing)
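
The 95.3% imbalance figure for 업종명 can be reproduced with an entropy-based score. The sketch below assumes the score is 1 minus the Shannon entropy of the class distribution divided by the maximum possible entropy, log2 of the number of classes. That definition is an assumption about the alert, but with the three category counts listed for 업종명 it yields 95.3%. The function name `imbalance_score` is illustrative only.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, near 1 for a dominant class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits of the observed class distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts for <NA>, 숙박업(일반) and 숙박업(생활) as listed for 업종명.
score = imbalance_score([9918, 64, 18])
print(f"{score:.1%}")
```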

Reproduction

Analysis started: 2023-12-12 08:22:12.873424
Analysis finished: 2023-12-12 08:22:13.670169
Duration: 0.8 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
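
The reported duration follows from the two timestamps. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction block.
started = datetime.fromisoformat("2023-12-12 08:22:12.873424")
finished = datetime.fromisoformat("2023-12-12 08:22:13.670169")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.1f} seconds")
```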

Variables

업종명
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 9918
숙박업(일반): 64
숙박업(생활): 18

Length

Max length: 7
Median length: 4
Mean length: 4.0246
Min length: 4
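
These length statistics are consistent with the 9918 `<NA>` entries being measured as a literal four-character string, which would also explain why this variable reports 0 missing values. The sketch below rests on that assumption and recomputes the figures from the category counts listed for 업종명.

```python
# Category counts as listed for 업종명; "<NA>" is treated as a 4-character string (assumption).
counts = {"<NA>": 9918, "숙박업(일반)": 64, "숙박업(생활)": 18}

total_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())

mean_length = total_chars / total_rows
max_length = max(len(value) for value in counts)
min_length = min(len(value) for value in counts)

print(mean_length, max_length, min_length)
```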

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9918 | 99.2%
숙박업(일반) | 64 | 0.6%
숙박업(생활) | 18 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: common values of 업종명 (<NA> 9918, 99.2%; 숙박업(일반) 64, 0.6%; 숙박업(생활) 18, 0.2%)]

업소명
Text

MISSING 

Distinct: 82
Distinct (%): 100.0%
Missing: 9918
Missing (%): 99.2%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 4.6097561
Min length: 2

Characters and Unicode

Total characters: 378
Distinct characters: 169
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
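
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point (it does not expose script or block). The sketch below counts categories in a made-up string; the string and the `CATEGORY_NAMES` mapping are assumptions for illustration and are not taken from the dataset or from the profiling tool.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}

# Hypothetical sample string, built for illustration only.
text = "호텔 T2 (제주)"

# Count how many characters fall into each general category.
categories = Counter(unicodedata.category(ch) for ch in text)
for code, n in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter" because they have no upper or lower case, which is why that category dominates the text variables in this report.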

Unique

Unique: 82
Unique (%): 100.0%

Sample

1st row: 더오크라
2nd row: 크라운호텔
3rd row: 절물앞동천펜션
4th row: 우도봉
5th row: 나디아호스텔
Value | Count | Frequency (%)
호텔엘린 | 1 | 1.2%
삼해인 | 1 | 1.2%
엠버서더 | 1 | 1.2%
노형호텔 | 1 | 1.2%
브라보 | 1 | 1.2%
노노레타 | 1 | 1.2%
산지물호텔 | 1 | 1.2%
하버 | 1 | 1.2%
케이모텔 | 1 | 1.2%
라온골프클럽휴양콘도미니엄 | 1 | 1.2%
Other values (75) | 75 | 88.2%

Most occurring characters

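Value | Count | Frequency (%)
[not captured] | 26 | 6.9%
[not captured] | 20 | 5.3%
[not captured] | 13 | 3.4%
[not captured] | 12 | 3.2%
[not captured] | 11 | 2.9%
[not captured] | 8 | 2.1%
[not captured] | 7 | 1.9%
[not captured] | 7 | 1.9%
[not captured] | 6 | 1.6%
[not captured] | 6 | 1.6%
Other values (159) | 262 | 69.3%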

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 369 | 97.6%
Space Separator | 3 | 0.8%
Decimal Number | 3 | 0.8%
Close Punctuation | 1 | 0.3%
Uppercase Letter | 1 | 0.3%
Open Punctuation | 1 | 0.3%

Most frequent character per category

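Other Letter
Value | Count | Frequency (%)
[not captured] | 26 | 7.0%
[not captured] | 20 | 5.4%
[not captured] | 13 | 3.5%
[not captured] | 12 | 3.3%
[not captured] | 11 | 3.0%
[not captured] | 8 | 2.2%
[not captured] | 7 | 1.9%
[not captured] | 7 | 1.9%
[not captured] | 6 | 1.6%
[not captured] | 6 | 1.6%
Other values (153) | 253 | 68.6%
Decimal Number
Value | Count | Frequency (%)
2 | 2 | 66.7%
9 | 1 | 33.3%
Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%
Uppercase Letter
Value | Count | Frequency (%)
T | 1 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%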

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 369 | 97.6%
Common | 8 | 2.1%
Latin | 1 | 0.3%

Most frequent character per script

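Hangul
Value | Count | Frequency (%)
[not captured] | 26 | 7.0%
[not captured] | 20 | 5.4%
[not captured] | 13 | 3.5%
[not captured] | 12 | 3.3%
[not captured] | 11 | 3.0%
[not captured] | 8 | 2.2%
[not captured] | 7 | 1.9%
[not captured] | 7 | 1.9%
[not captured] | 6 | 1.6%
[not captured] | 6 | 1.6%
Other values (153) | 253 | 68.6%
Common
Value | Count | Frequency (%)
(space) | 3 | 37.5%
2 | 2 | 25.0%
9 | 1 | 12.5%
) | 1 | 12.5%
( | 1 | 12.5%
Latin
Value | Count | Frequency (%)
T | 1 | 100.0%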

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 369 | 97.6%
ASCII | 9 | 2.4%

Most frequent character per block

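Hangul
Value | Count | Frequency (%)
[not captured] | 26 | 7.0%
[not captured] | 20 | 5.4%
[not captured] | 13 | 3.5%
[not captured] | 12 | 3.3%
[not captured] | 11 | 3.0%
[not captured] | 8 | 2.2%
[not captured] | 7 | 1.9%
[not captured] | 7 | 1.9%
[not captured] | 6 | 1.6%
[not captured] | 6 | 1.6%
Other values (153) | 253 | 68.6%
ASCII
Value | Count | Frequency (%)
(space) | 3 | 33.3%
2 | 2 | 22.2%
9 | 1 | 11.1%
) | 1 | 11.1%
T | 1 | 11.1%
( | 1 | 11.1%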

주소
Text

MISSING 

Distinct: 82
Distinct (%): 100.0%
Missing: 9918
Missing (%): 99.2%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 26
Mean length: 21.073171
Min length: 17

Characters and Unicode

Total characters: 1728
Distinct characters: 95
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 82
Unique (%): 100.0%

Sample

1st row: 제주특별자치도 제주시 신광로4길 38
2nd row: 제주특별자치도 제주시 동문로 138
3rd row: 제주특별자치도 제주시 조천읍 명림로 655-77
4th row: 제주특별자치도 제주시 우도면 영일길 156-32
5th row: 제주특별자치도 제주시 애월읍 애원로 74
Value | Count | Frequency (%)
제주특별자치도 | 82 | 23.2%
제주시 | 82 | 23.2%
애월읍 | 9 | 2.5%
조천읍 | 7 | 2.0%
3 | 6 | 1.7%
한림읍 | 5 | 1.4%
남조로 | 4 | 1.1%
구좌읍 | 3 | 0.8%
12 | 3 | 0.8%
도령로 | 3 | 0.8%
Other values (128) | 150 | 42.4%
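
The counts in this table look like whitespace-separated token frequencies: they sum to 354, which matches 82 addresses plus the 272 space characters reported for this variable. The sketch below is built on that reading, which is an inference rather than documented behaviour, and applies the same tokenization to the five sample rows shown for 주소.

```python
from collections import Counter

# The five sample addresses shown for 주소.
addresses = [
    "제주특별자치도 제주시 신광로4길 38",
    "제주특별자치도 제주시 동문로 138",
    "제주특별자치도 제주시 조천읍 명림로 655-77",
    "제주특별자치도 제주시 우도면 영일길 156-32",
    "제주특별자치도 제주시 애월읍 애원로 74",
]

# Split each address on whitespace and count every token across all rows.
tokens = Counter(tok for address in addresses for tok in address.split())
total = sum(tokens.values())

for tok, n in tokens.most_common(2):
    print(f"{tok} | {n} | {100 * n / total:.1f}%")
```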

Most occurring characters

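Value | Count | Frequency (%)
(space) | 272 | 15.7%
[not captured] | 166 | 9.6%
[not captured] | 164 | 9.5%
[not captured] | 87 | 5.0%
[not captured] | 82 | 4.7%
[not captured] | 82 | 4.7%
[not captured] | 82 | 4.7%
[not captured] | 82 | 4.7%
[not captured] | 82 | 4.7%
1 | 66 | 3.8%
Other values (85) | 563 | 32.6%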

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1181 | 68.3%
Space Separator | 272 | 15.7%
Decimal Number | 252 | 14.6%
Dash Punctuation | 23 | 1.3%

Most frequent character per category

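Other Letter
Value | Count | Frequency (%)
[not captured] | 166 | 14.1%
[not captured] | 164 | 13.9%
[not captured] | 87 | 7.4%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 56 | 4.7%
[not captured] | 45 | 3.8%
Other values (73) | 253 | 21.4%
Decimal Number
Value | Count | Frequency (%)
1 | 66 | 26.2%
3 | 36 | 14.3%
2 | 33 | 13.1%
4 | 26 | 10.3%
7 | 20 | 7.9%
6 | 17 | 6.7%
5 | 16 | 6.3%
8 | 15 | 6.0%
9 | 13 | 5.2%
0 | 10 | 4.0%
Space Separator
Value | Count | Frequency (%)
(space) | 272 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 23 | 100.0%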

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1181 | 68.3%
Common | 547 | 31.7%

Most frequent character per script

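Hangul
Value | Count | Frequency (%)
[not captured] | 166 | 14.1%
[not captured] | 164 | 13.9%
[not captured] | 87 | 7.4%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 56 | 4.7%
[not captured] | 45 | 3.8%
Other values (73) | 253 | 21.4%
Common
Value | Count | Frequency (%)
(space) | 272 | 49.7%
1 | 66 | 12.1%
3 | 36 | 6.6%
2 | 33 | 6.0%
4 | 26 | 4.8%
- | 23 | 4.2%
7 | 20 | 3.7%
6 | 17 | 3.1%
5 | 16 | 2.9%
8 | 15 | 2.7%
Other values (2) | 23 | 4.2%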

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1181 | 68.3%
ASCII | 547 | 31.7%

Most frequent character per block

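ASCII
Value | Count | Frequency (%)
(space) | 272 | 49.7%
1 | 66 | 12.1%
3 | 36 | 6.6%
2 | 33 | 6.0%
4 | 26 | 4.8%
- | 23 | 4.2%
7 | 20 | 3.7%
6 | 17 | 3.1%
5 | 16 | 2.9%
8 | 15 | 2.7%
Other values (2) | 23 | 4.2%
Hangul
Value | Count | Frequency (%)
[not captured] | 166 | 14.1%
[not captured] | 164 | 13.9%
[not captured] | 87 | 7.4%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 82 | 6.9%
[not captured] | 56 | 4.7%
[not captured] | 45 | 3.8%
Other values (73) | 253 | 21.4%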

전화번호
Text

MISSING 

Distinct: 73
Distinct (%): 100.0%
Missing: 9927
Missing (%): 99.3%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 11.958904
Min length: 9

Characters and Unicode

Total characters: 873
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 73
Unique (%): 100.0%

Sample

1st row: 064-748-2280
2nd row: 064-753-8011
3rd row: 064-758-6565
4th row: 064-751-4988
5th row: 064-747-8933
Value | Count | Frequency (%)
064-722-1444 | 1 | 1.4%
064-757-6582 | 1 | 1.4%
064-747-2263 | 1 | 1.4%
064-758-7076 | 1 | 1.4%
064-783-0804 | 1 | 1.4%
064-748-2105 | 1 | 1.4%
064-754-6000 | 1 | 1.4%
064-756-8700 | 1 | 1.4%
064-795-1000 | 1 | 1.4%
064-742-7775 | 1 | 1.4%
Other values (63) | 63 | 86.3%

Most occurring characters

Value | Count | Frequency (%)
- | 145 | 16.6%
0 | 137 | 15.7%
4 | 113 | 12.9%
6 | 105 | 12.0%
7 | 102 | 11.7%
5 | 61 | 7.0%
2 | 50 | 5.7%
9 | 45 | 5.2%
8 | 41 | 4.7%
1 | 39 | 4.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 728 | 83.4%
Dash Punctuation | 145 | 16.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 137 | 18.8%
4 | 113 | 15.5%
6 | 105 | 14.4%
7 | 102 | 14.0%
5 | 61 | 8.4%
2 | 50 | 6.9%
9 | 45 | 6.2%
8 | 41 | 5.6%
1 | 39 | 5.4%
3 | 35 | 4.8%
Dash Punctuation
Value | Count | Frequency (%)
- | 145 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 873 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 145 | 16.6%
0 | 137 | 15.7%
4 | 113 | 12.9%
6 | 105 | 12.0%
7 | 102 | 11.7%
5 | 61 | 7.0%
2 | 50 | 5.7%
9 | 45 | 5.2%
8 | 41 | 4.7%
1 | 39 | 4.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 873 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 145 | 16.6%
0 | 137 | 15.7%
4 | 113 | 12.9%
6 | 105 | 12.0%
7 | 102 | 11.7%
5 | 61 | 7.0%
2 | 50 | 5.7%
9 | 45 | 5.2%
8 | 41 | 4.7%
1 | 39 | 4.5%

데이터기준일자
Date

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 1.2%
Missing: 9918
Missing (%): 99.2%
Memory size: 156.2 KiB
Minimum: 2021-02-15 00:00:00
Maximum: 2021-02-15 00:00:00
Histogram with fixed size bins (bins=1)

Correlations

 | 업종명 | 업소명 | 주소 | 전화번호
업종명 | 1.000 | 1.000 | 1.000 | 1.000
업소명 | 1.000 | 1.000 | 1.000 | 1.000
주소 | 1.000 | 1.000 | 1.000 | 1.000
전화번호 | 1.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
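
Nullity correlation can be sketched as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The rows and column names below are hypothetical toy data, not records from this dataset, and the helper `pearson` is written out by hand for illustration rather than taken from the profiling tool.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"name": "a", "phone": "x"},
    {"name": "b", "phone": None},
    {"name": None, "phone": None},
    {"name": None, "phone": None},
    {"name": "c", "phone": "y"},
]

# Presence indicators: 1 if the cell holds a value, 0 if it is missing.
name_present = [int(r["name"] is not None) for r in rows]
phone_present = [int(r["phone"] is not None) for r in rows]

nullity_corr = pearson(name_present, phone_present)
print(round(nullity_corr, 3))
```

A value near 1 means the two columns tend to be present or missing together. A column that is never missing, or always missing, has zero variance, so the correlation is undefined for it.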

Sample

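 | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자
71007 | <NA> | <NA> | <NA> | <NA> | <NA>
46424 | <NA> | <NA> | <NA> | <NA> | <NA>
31793 | <NA> | <NA> | <NA> | <NA> | <NA>
37901 | <NA> | <NA> | <NA> | <NA> | <NA>
27231 | <NA> | <NA> | <NA> | <NA> | <NA>
963 | <NA> | <NA> | <NA> | <NA> | <NA>
2594 | <NA> | <NA> | <NA> | <NA> | <NA>
34978 | <NA> | <NA> | <NA> | <NA> | <NA>
99616 | <NA> | <NA> | <NA> | <NA> | <NA>
80863 | <NA> | <NA> | <NA> | <NA> | <NA>

 | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자
76922 | <NA> | <NA> | <NA> | <NA> | <NA>
99317 | <NA> | <NA> | <NA> | <NA> | <NA>
96838 | <NA> | <NA> | <NA> | <NA> | <NA>
58241 | <NA> | <NA> | <NA> | <NA> | <NA>
37927 | <NA> | <NA> | <NA> | <NA> | <NA>
40787 | <NA> | <NA> | <NA> | <NA> | <NA>
19582 | <NA> | <NA> | <NA> | <NA> | <NA>
65506 | <NA> | <NA> | <NA> | <NA> | <NA>
42670 | <NA> | <NA> | <NA> | <NA> | <NA>
37962 | <NA> | <NA> | <NA> | <NA> | <NA>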

Duplicate rows

Most frequently occurring

 | 업종명 | 업소명 | 주소 | 전화번호 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 9918
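
A table like this can be approximated by counting identical rows and keeping those that occur more than once. The sketch below uses hypothetical toy rows, with `None` standing in for `<NA>`. It is one plausible way to derive a "# duplicates" column, not necessarily how the report computes it.

```python
from collections import Counter

# Hypothetical toy rows as tuples; None stands in for <NA>.
rows = [
    (None, None, None),
    ("a", "b", "c"),
    (None, None, None),
    (None, None, None),
]

# Count how many times each distinct row occurs, then keep the repeated ones.
row_counts = Counter(rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```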