Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 39956
Missing cells (%): 79.9%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
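The overview figures follow from the per-variable counts. Four columns each report 9989 missing values, 자치구 reports none, and five columns over 10000 rows give 50000 cells. Here is a quick arithmetic check in Python. The dictionary only restates the reported counts; nothing here reads the underlying file.

```python
# Per-variable missing counts as reported in this profile (자치구 reports 0).
missing_per_column = {
    "연번": 9989,
    "자치구": 0,
    "업 소 명": 9989,
    "소 재 지": 9989,
    "객실수": 9989,
}
n_rows = 10_000
n_cells = n_rows * len(missing_per_column)        # 5 columns x 10000 rows

missing_cells = sum(missing_per_column.values())  # 4 x 9989
missing_pct = missing_cells / n_cells

total_bytes = 488.3 * 1024                        # 488.3 KiB, as reported
avg_record_bytes = total_bytes / n_rows

print(missing_cells)                # 39956
print(f"{missing_pct:.1%}")         # 79.9%
print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```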

Variable types

Numeric: 2
Categorical: 1
Text: 2

Dataset

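Description: Status data on the outstanding lodging establishments (크린숙박업소, "clean lodging establishments") designated by Gwangju Metropolitan City (광주광역시). Fields: serial number (연번), business name (업소명), location (소재지), number of rooms (객실수).
URL: https://www.data.go.kr/data/15055845/fileData.do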

Alerts

Dataset has 1 (< 0.1%) duplicate row [Duplicates]
연번 is highly correlated overall with 자치구 [High correlation]
자치구 is highly correlated overall with 연번 [High correlation]
자치구 is highly imbalanced (99.3%) [Imbalance]
연번 has 9989 (99.9%) missing values [Missing]
업 소 명 has 9989 (99.9%) missing values [Missing]
소 재 지 has 9989 (99.9%) missing values [Missing]
객실수 has 9989 (99.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 12:36:10.288813
Analysis finished: 2023-12-12 12:36:11.404923
Duration: 1.12 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 11
Distinct (%): 100.0%
Missing: 9989
Missing (%): 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.727273
Minimum: 9
Maximum: 68
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 9
5th percentile: 15
Q1: 24
Median: 39
Q3: 44.5
95th percentile: 65
Maximum: 68
Range: 59
Interquartile range (IQR): 20.5
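These quantiles match linear interpolation between sorted values at position (n - 1) × q. That is the default rule for pandas' Series.quantile. The stdlib sketch below recomputes them from the 11 non-missing 연번 values listed in this variable's value tables. It assumes those 11 values are the complete non-missing set, which the 9989 missing out of 10000 supports.

```python
import statistics

# The 11 non-missing 연번 values, read off this report's value tables.
values = [9, 21, 22, 26, 27, 39, 41, 44, 45, 62, 68]

# method="inclusive" interpolates linearly between sorted values at
# position (n - 1) * q, the same rule as pandas' default "linear".
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
cuts = statistics.quantiles(values, n=20, method="inclusive")  # 5% steps
p5, p95 = cuts[0], cuts[-1]

print(p5, q1, median, q3, p95)              # 15.0 24.0 39.0 44.5 65.0
print("IQR:", q3 - q1)                      # IQR: 20.5
print("Range:", max(values) - min(values))  # Range: 59
```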

Descriptive statistics

Standard deviation: 17.900229
Coefficient of variation (CV): 0.48738246
Kurtosis: -0.42792545
Mean: 36.727273
Median Absolute Deviation (MAD): 13
Skewness: 0.36675511
Sum: 404
Variance: 320.41818
Monotonicity: Not monotonic
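The descriptive statistics can be reproduced from the same 11 values. Three choices matter: the sample variance (divisor n - 1), the MAD taken about the median with no scaling constant, and the bias-corrected skewness and excess kurtosis that pandas' Series.skew() and Series.kurt() compute. Using those formulas assumes the report delegates to pandas. This stdlib sketch is not the report's own code.

```python
import math
import statistics

# Non-missing 연번 values, read off this report's value tables.
values = [9, 21, 22, 26, 27, 39, 41, 44, 45, 62, 68]
n = len(values)
mean = statistics.fmean(values)

# Sample variance (divisor n - 1) and its square root.
variance = statistics.variance(values)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

# Median absolute deviation about the median, without a scaling constant.
med = statistics.median(values)
mad = statistics.median(abs(x - med) for x in values)

# Sums of powered deviations used by the bias-corrected estimators below.
m2 = sum((x - mean) ** 2 for x in values)
m3 = sum((x - mean) ** 3 for x in values)
m4 = sum((x - mean) ** 4 for x in values)

# Adjusted Fisher-Pearson skewness and bias-corrected excess kurtosis
# (the formulas pandas uses for Series.skew() and Series.kurt()).
skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
kurt = (n * (n + 1) * (n - 1) * m4) / ((n - 2) * (n - 3) * m2 ** 2) - (
    3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"std={std:.6f} var={variance:.5f} cv={cv:.8f}")
print(f"MAD={mad} skewness={skew:.8f} kurtosis={kurt:.8f}")
```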
[Figure: Histogram with fixed size bins (bins=11)]

Common values
Value | Count | Frequency (%)
22 | 1 | < 0.1%
44 | 1 | < 0.1%
62 | 1 | < 0.1%
68 | 1 | < 0.1%
21 | 1 | < 0.1%
39 | 1 | < 0.1%
26 | 1 | < 0.1%
45 | 1 | < 0.1%
41 | 1 | < 0.1%
9 | 1 | < 0.1%
(Missing) | 9989 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
9 | 1 | < 0.1%
21 | 1 | < 0.1%
22 | 1 | < 0.1%
26 | 1 | < 0.1%
27 | 1 | < 0.1%
39 | 1 | < 0.1%
41 | 1 | < 0.1%
44 | 1 | < 0.1%
45 | 1 | < 0.1%
62 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
68 | 1 | < 0.1%
62 | 1 | < 0.1%
45 | 1 | < 0.1%
44 | 1 | < 0.1%
41 | 1 | < 0.1%
39 | 1 | < 0.1%
27 | 1 | < 0.1%
26 | 1 | < 0.1%
22 | 1 | < 0.1%
21 | 1 | < 0.1%

자치구 (district)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

<NA>: 9989
서구: 5
북구: 4
광산구: 2

Length

Max length: 4
Median length: 4
Mean length: 3.998
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9989 | 99.9%
서구 | 5 | 0.1%
북구 | 4 | < 0.1%
광산구 | 2 | < 0.1%
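Several figures here point the same way: Missing is 0 even though <NA> appears 9989 times, Distinct is 4, and the mean length is 3.998. All three are consistent with 자치구 holding "<NA>" as an ordinary four-character string at profiling time rather than as a real missing value, since (9989 × 4 + 5 × 2 + 4 × 2 + 2 × 3) / 10000 = 3.998. The minimal sketch below makes that assumption explicit by rebuilding the column from the reported counts. The report does not show how the placeholder entered the data.

```python
from collections import Counter
from statistics import fmean

# Rebuild 자치구 from the reported value counts, ASSUMING "<NA>" is a
# literal 4-character string rather than a real missing value.
column = ["<NA>"] * 9989 + ["서구"] * 5 + ["북구"] * 4 + ["광산구"] * 2

missing = sum(v is None for v in column)     # no None entries -> 0 missing
distinct = len(Counter(column))              # "<NA>" is a 4th category
mean_length = fmean(len(v) for v in column)  # 39980 characters / 10000 rows

print(missing, distinct, mean_length)  # 0 4 3.998

# Mapping the placeholder to None gives 9989 missing values and 3 districts.
cleaned = [None if v == "<NA>" else v for v in column]
n_missing_cleaned = sum(v is None for v in cleaned)
n_districts = len(Counter(v for v in cleaned if v is not None))
print(n_missing_cleaned, n_districts)  # 9989 3
```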

[Figure: Histogram of lengths of the category]

업 소 명 (business name)
Text

MISSING 

Distinct: 11
Distinct (%): 100.0%
Missing: 9989
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 7
Mean length: 5.0909091
Min length: 3

Characters and Unicode

Total characters: 56
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 100.0%

Sample

1st row: 모텔캐슬
2nd row: 러브스토리모텔
3rd row: 호텔야자
4th row: 샆모텔
5th row: 선샤인

Words
Value | Count | Frequency (%)
모텔캐슬 | 1 | 7.7%
러브스토리모텔 | 1 | 7.7%
호텔야자 | 1 | 7.7%
샆모텔 | 1 | 7.7%
선샤인 | 1 | 7.7%
아리아모텔 | 1 | 7.7%
호텔 | 1 | 7.7%
 | 1 | 7.7%
베네치아모텔 | 1 | 7.7%
주식회사 | 1 | 7.7%
Other values (3) | 3 | 23.1%

Most occurring characters

Value | Count | Frequency (%)
 | 9 | 16.1%
 | 6 | 10.7%
 | 3 | 5.4%
 | 3 | 5.4%
 | 3 | 5.4%
 | 2 | 3.6%
 | 2 | 3.6%
 | 2 | 3.6%
 | 1 | 1.8%
 | 1 | 1.8%
Other values (24) | 24 | 42.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 54 | 96.4%
Space Separator | 2 | 3.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 9 | 16.7%
 | 6 | 11.1%
 | 3 | 5.6%
 | 3 | 5.6%
 | 3 | 5.6%
 | 2 | 3.7%
 | 2 | 3.7%
 | 1 | 1.9%
 | 1 | 1.9%
 | 1 | 1.9%
Other values (23) | 23 | 42.6%
Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 54 | 96.4%
Common | 2 | 3.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 9 | 16.7%
 | 6 | 11.1%
 | 3 | 5.6%
 | 3 | 5.6%
 | 3 | 5.6%
 | 2 | 3.7%
 | 2 | 3.7%
 | 1 | 1.9%
 | 1 | 1.9%
 | 1 | 1.9%
Other values (23) | 23 | 42.6%
Common
Value | Count | Frequency (%)
(space) | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 54 | 96.4%
ASCII | 2 | 3.6%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 9 | 16.7%
 | 6 | 11.1%
 | 3 | 5.6%
 | 3 | 5.6%
 | 3 | 5.6%
 | 2 | 3.7%
 | 2 | 3.7%
 | 1 | 1.9%
 | 1 | 1.9%
 | 1 | 1.9%
Other values (23) | 23 | 42.6%
ASCII
Value | Count | Frequency (%)
(space) | 2 | 100.0%

소 재 지 (location)
Text

MISSING 

Distinct: 11
Distinct (%): 100.0%
Missing: 9989
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 20
Mean length: 19
Min length: 14

Characters and Unicode

Total characters: 209
Distinct characters: 51
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
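Python's standard unicodedata module exposes each character's Unicode general category. That is enough to reproduce a category breakdown like the one reported for this column, applied here to the first sample address listed under Sample. The module does not expose script or block properties, so block_counts below is a crude ASCII versus non-ASCII stand-in. That is an assumption for illustration, not how the report computes blocks.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count characters by Unicode general category, using long names where known."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


def block_counts(text):
    """Crude stand-in for block counts: ASCII versus everything else."""
    return Counter("ASCII" if ord(ch) < 128 else "non-ASCII" for ch in text)


sample = "서구 상무평화로 97-6 (치평동)"  # first sample row of 소 재 지
print(category_counts(sample))
print(block_counts(sample))
```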

Unique

Unique: 11
Unique (%): 100.0%

Sample

1st row: 서구 상무평화로 97-6 (치평동)
2nd row: 북구 무등로218번길 38 (신안동)
3rd row: 광산구 사암로171번길 90-24(우산동)
4th row: 광산구 용아로401번길 31 (하남동)
5th row: 서구 금화로85번길 4-24 (금호동)

Words
Value | Count | Frequency (%)
서구 | 5 | 12.5%
북구 | 4 | 10.0%
치평동 | 3 | 7.5%
광산구 | 2 | 5.0%
상무평화로 | 2 | 5.0%
설죽로217번길 | 1 | 2.5%
154 | 1 | 2.5%
8 | 1 | 2.5%
상무연하로 | 1 | 2.5%
36(오룡동 | 1 | 2.5%
Other values (19) | 19 | 47.5%
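This table counts whitespace-separated words rather than rows, which is why the percentages are small. 서구 accounts for 5 of 40 words (12.5%), and a total of 40 is consistent with 29 single spaces splitting 11 addresses. Entries such as 36(오룡동 suggest that only punctuation at the ends of a word is stripped. The sketch below applies that assumed rule to the five sample rows. It is an illustration, not the report's own code.

```python
import string
from collections import Counter

# The five sample rows listed for 소 재 지; the full column is not in the report.
samples = [
    "서구 상무평화로 97-6 (치평동)",
    "북구 무등로218번길 38 (신안동)",
    "광산구 사암로171번길 90-24(우산동)",
    "광산구 용아로401번길 31 (하남동)",
    "서구 금화로85번길 4-24 (금호동)",
]


def word_counts(texts):
    """Split on whitespace, strip ASCII punctuation from word ends, and count."""
    words = (w.strip(string.punctuation) for t in texts for w in t.lower().split())
    return Counter(w for w in words if w)


counts = word_counts(samples)
total = sum(counts.values())
for word, n in counts.most_common(3):
    print(word, n, f"{n / total:.1%}")
```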

Most occurring characters

Value | Count | Frequency (%)
(space) | 29 | 13.9%
( | 11 | 5.3%
 | 11 | 5.3%
 | 11 | 5.3%
) | 11 | 5.3%
 | 11 | 5.3%
1 | 9 | 4.3%
 | 7 | 3.3%
 | 6 | 2.9%
4 | 6 | 2.9%
Other values (41) | 97 | 46.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 110 | 52.6%
Decimal Number | 44 | 21.1%
Space Separator | 29 | 13.9%
Open Punctuation | 11 | 5.3%
Close Punctuation | 11 | 5.3%
Dash Punctuation | 4 | 1.9%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 11 | 10.0%
 | 11 | 10.0%
 | 11 | 10.0%
 | 7 | 6.4%
 | 6 | 5.5%
 | 6 | 5.5%
 | 5 | 4.5%
 | 5 | 4.5%
 | 4 | 3.6%
 | 4 | 3.6%
Other values (27) | 40 | 36.4%
Decimal Number
Value | Count | Frequency (%)
1 | 9 | 20.5%
4 | 6 | 13.6%
3 | 5 | 11.4%
8 | 5 | 11.4%
2 | 4 | 9.1%
5 | 4 | 9.1%
7 | 3 | 6.8%
0 | 3 | 6.8%
6 | 3 | 6.8%
9 | 2 | 4.5%
Space Separator
Value | Count | Frequency (%)
(space) | 29 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 11 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 11 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 110 | 52.6%
Common | 99 | 47.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 11 | 10.0%
 | 11 | 10.0%
 | 11 | 10.0%
 | 7 | 6.4%
 | 6 | 5.5%
 | 6 | 5.5%
 | 5 | 4.5%
 | 5 | 4.5%
 | 4 | 3.6%
 | 4 | 3.6%
Other values (27) | 40 | 36.4%
Common
Value | Count | Frequency (%)
(space) | 29 | 29.3%
( | 11 | 11.1%
) | 11 | 11.1%
1 | 9 | 9.1%
4 | 6 | 6.1%
3 | 5 | 5.1%
8 | 5 | 5.1%
2 | 4 | 4.0%
5 | 4 | 4.0%
- | 4 | 4.0%
Other values (4) | 11 | 11.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 110 | 52.6%
ASCII | 99 | 47.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 29 | 29.3%
( | 11 | 11.1%
) | 11 | 11.1%
1 | 9 | 9.1%
4 | 6 | 6.1%
3 | 5 | 5.1%
8 | 5 | 5.1%
2 | 4 | 4.0%
5 | 4 | 4.0%
- | 4 | 4.0%
Other values (4) | 11 | 11.1%
Hangul
Value | Count | Frequency (%)
 | 11 | 10.0%
 | 11 | 10.0%
 | 11 | 10.0%
 | 7 | 6.4%
 | 6 | 5.5%
 | 6 | 5.5%
 | 5 | 4.5%
 | 5 | 4.5%
 | 4 | 3.6%
 | 4 | 3.6%
Other values (27) | 40 | 36.4%

객실수 (number of rooms)
Real number (ℝ)

MISSING 

Distinct: 8
Distinct (%): 72.7%
Missing: 9989
Missing (%): 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.090909
Minimum: 30
Maximum: 48
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 30
5th percentile: 30.5
Q1: 32.5
Median: 36
Q3: 40
95th percentile: 45
Maximum: 48
Range: 18
Interquartile range (IQR): 7.5

Descriptive statistics

Standard deviation: 5.4855181
Coefficient of variation (CV): 0.14789387
Kurtosis: -0.1540749
Mean: 37.090909
Median Absolute Deviation (MAD): 4
Skewness: 0.52196395
Sum: 408
Variance: 30.090909
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=8)]

Common values
Value | Count | Frequency (%)
40 | 3 | < 0.1%
36 | 2 | < 0.1%
42 | 1 | < 0.1%
32 | 1 | < 0.1%
48 | 1 | < 0.1%
30 | 1 | < 0.1%
31 | 1 | < 0.1%
33 | 1 | < 0.1%
(Missing) | 9989 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
30 | 1 | < 0.1%
31 | 1 | < 0.1%
32 | 1 | < 0.1%
33 | 1 | < 0.1%
36 | 2 | < 0.1%
40 | 3 | < 0.1%
42 | 1 | < 0.1%
48 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
48 | 1 | < 0.1%
42 | 1 | < 0.1%
40 | 3 | < 0.1%
36 | 2 | < 0.1%
33 | 1 | < 0.1%
32 | 1 | < 0.1%
31 | 1 | < 0.1%
30 | 1 | < 0.1%
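The extreme-value tables list only 8 rows because 객실수 has 8 distinct values. The two percentages use different denominators. Distinct (%) is relative to the 11 non-missing values (8 / 11 = 72.7%), while Frequency (%) is relative to all 10000 rows. The stdlib sketch below uses the values read off the tables above.

```python
from collections import Counter

# Non-missing 객실수 values, as listed in this variable's value tables.
rooms = [30, 31, 32, 33, 36, 36, 40, 40, 40, 42, 48]
n_rows = 10_000
n_missing = n_rows - len(rooms)  # 9989

counts = Counter(rooms)
distinct = len(counts)

print("Distinct:", distinct, f"({distinct / len(rooms):.1%})")  # Distinct: 8 (72.7%)
print("Missing:", n_missing, f"({n_missing / n_rows:.1%})")      # Missing: 9989 (99.9%)

# Minimum and maximum 10 values: distinct values ordered by value, with counts.
print("Min:", sorted(counts.items())[:10])
print("Max:", sorted(counts.items(), reverse=True)[:10])
```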

Correlations

 | 연번 | 자치구 | 업 소 명 | 소 재 지 | 객실수
연번 | 1.000 | 1.000 | 1.000 | 1.000 | 0.525
자치구 | 1.000 | 1.000 | 1.000 | 1.000 | 0.740
업 소 명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
소 재 지 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
객실수 | 0.525 | 0.740 | 1.000 | 1.000 | 1.000

 | 연번 | 객실수 | 자치구
연번 | 1.000 | 0.326 | 0.707
객실수 | 0.326 | 1.000 | 0.250
자치구 | 0.707 | 0.250 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.]
[Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.]

Sample

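First rows
Index | 연번 | 자치구 | 업 소 명 | 소 재 지 | 객실수
35413 | <NA> | <NA> | <NA> | <NA> | <NA>
28049 | <NA> | <NA> | <NA> | <NA> | <NA>
78654 | <NA> | <NA> | <NA> | <NA> | <NA>
8770 | <NA> | <NA> | <NA> | <NA> | <NA>
19934 | <NA> | <NA> | <NA> | <NA> | <NA>
27884 | <NA> | <NA> | <NA> | <NA> | <NA>
93443 | <NA> | <NA> | <NA> | <NA> | <NA>
34793 | <NA> | <NA> | <NA> | <NA> | <NA>
62723 | <NA> | <NA> | <NA> | <NA> | <NA>
83351 | <NA> | <NA> | <NA> | <NA> | <NA>

Last rows
Index | 연번 | 자치구 | 업 소 명 | 소 재 지 | 객실수
35490 | <NA> | <NA> | <NA> | <NA> | <NA>
59438 | <NA> | <NA> | <NA> | <NA> | <NA>
83069 | <NA> | <NA> | <NA> | <NA> | <NA>
89447 | <NA> | <NA> | <NA> | <NA> | <NA>
13427 | <NA> | <NA> | <NA> | <NA> | <NA>
5331 | <NA> | <NA> | <NA> | <NA> | <NA>
98753 | <NA> | <NA> | <NA> | <NA> | <NA>
95954 | <NA> | <NA> | <NA> | <NA> | <NA>
64283 | <NA> | <NA> | <NA> | <NA> | <NA>
71192 | <NA> | <NA> | <NA> | <NA> | <NA>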

Duplicate rows

Most frequently occurring

Index | 연번 | 자치구 | 업 소 명 | 소 재 지 | 객실수 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 9989
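The overview's "Duplicate rows: 1" and the 9989 in this table describe the same fact from two angles: one row pattern, with every column empty, occurs 9989 times. Reading the overview figure as a count of distinct duplicated patterns, rather than of surplus copies, is an inference from these numbers. The stdlib sketch below illustrates it. None marks a missing cell, and "<NA>" is the literal string seen in 자치구. The 11 filled rows are hypothetical placeholders, because the report does not show how the column values pair up.

```python
from collections import Counter


def duplicated_patterns(rows):
    """Map each row pattern that occurs more than once to its number of occurrences."""
    return {row: n for row, n in Counter(rows).items() if n > 1}


# Hypothetical reconstruction: 9989 empty rows plus 11 placeholder rows.
empty_row = (None, "<NA>", None, None, None)
filled_rows = [(i, "placeholder", f"name-{i}", f"address-{i}", i) for i in range(11)]
rows = [empty_row] * 9989 + filled_rows

dups = duplicated_patterns(rows)
print(len(dups))        # 1 distinct duplicated pattern
print(dups[empty_row])  # 9989 occurrences of that pattern
```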