Overview

Dataset statistics

Number of variables: 6
Number of observations: 698
Missing cells: 21
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 33.5 KiB
Average record size in memory: 49.2 B
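
The missing-cell percentage is taken over all cells (variables times observations), not over rows. A minimal sketch of that arithmetic, using only the figures reported in this overview; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 6
n_observations = 698
missing_cells = 21

# Dataset-wide share: missing cells divided by the total number of cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Per-column share, assuming all 21 gaps sit in one column (the report's
# alerts attribute 21 missing values to 객실수).
column_missing_pct = 100 * missing_cells / n_observations

print(f"{missing_pct:.1f}%")         # share of all cells
print(f"{column_missing_pct:.1f}%")  # share within the single affected column
```

This explains why the overview shows 0.5% while the 객실수 column itself shows 3.0% missing.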

Variable types

Text: 3
Categorical: 2
Numeric: 1

Dataset

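Description: 키 (key), 상호 (business name), 행정시 (administrative city), 행정구 (administrative district), 행정동 (administrative neighborhood), 객실수 (number of rooms)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-13074/S/1/datasetView.do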

Alerts

행정시 has constant value "" (Constant)
객실수 has 21 (3.0%) missing values (Missing)
키 has unique values (Unique)

Reproduction

Analysis started: 2024-04-14 07:25:59.404934
Analysis finished: 2024-04-14 07:26:02.949283
Duration: 3.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
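
The reported duration can be checked directly against the two timestamps. A small sketch with the standard library, using the values exactly as printed in this section:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2024-04-14 07:25:59.404934")
finished = datetime.fromisoformat("2024-04-14 07:26:02.949283")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```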

Variables


키
Text

UNIQUE 

Distinct: 698
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 9772
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
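
As a rough illustration of those character properties, the general category of each character in one key can be looked up with Python's standard `unicodedata` module. This is a sketch, not the profiler's own code; the standard library exposes categories but not scripts or blocks, so only the category breakdown is shown:

```python
import unicodedata
from collections import Counter

key = "BE_LiST20-0333"  # first sample row of this column

# General category per character, e.g. "Lu" = Uppercase Letter,
# "Ll" = Lowercase Letter, "Nd" = Decimal Number,
# "Pc" = Connector Punctuation, "Pd" = Dash Punctuation.
categories = Counter(unicodedata.category(ch) for ch in key)

print(len(key))
print(dict(categories))
```

Every key has the same 14-character shape, so multiplying these per-key counts by 698 rows gives the column totals in the category table (for example 6 digits per key gives 4188 decimal numbers).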

Unique

Unique: 698
Unique (%): 100.0%

Sample

1st row: BE_LiST20-0333
2nd row: BE_LiST20-0334
3rd row: BE_LiST20-0335
4th row: BE_LiST20-0336
5th row: BE_LiST20-0337
Value | Count | Frequency (%)
be_list20-0333 | 1 | 0.1%
be_list20-0121 | 1 | 0.1%
be_list20-0154 | 1 | 0.1%
be_list20-0124 | 1 | 0.1%
be_list20-0115 | 1 | 0.1%
be_list20-0116 | 1 | 0.1%
be_list20-0117 | 1 | 0.1%
be_list20-0118 | 1 | 0.1%
be_list20-0119 | 1 | 0.1%
be_list20-0120 | 1 | 0.1%
Other values (688) | 688 | 98.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 1633 | 16.7%
2 | 938 | 9.6%
B | 698 | 7.1%
T | 698 | 7.1%
E | 698 | 7.1%
- | 698 | 7.1%
S | 698 | 7.1%
i | 698 | 7.1%
L | 698 | 7.1%
_ | 698 | 7.1%
Other values (8) | 1617 | 16.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4188 | 42.9%
Uppercase Letter | 3490 | 35.7%
Dash Punctuation | 698 | 7.1%
Lowercase Letter | 698 | 7.1%
Connector Punctuation | 698 | 7.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1633 | 39.0%
2 | 938 | 22.4%
3 | 240 | 5.7%
4 | 240 | 5.7%
1 | 240 | 5.7%
5 | 240 | 5.7%
6 | 239 | 5.7%
7 | 140 | 3.3%
8 | 140 | 3.3%
9 | 138 | 3.3%

Uppercase Letter
Value | Count | Frequency (%)
B | 698 | 20.0%
T | 698 | 20.0%
E | 698 | 20.0%
S | 698 | 20.0%
L | 698 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 698 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
i | 698 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 698 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5584 | 57.1%
Latin | 4188 | 42.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1633 | 29.2%
2 | 938 | 16.8%
- | 698 | 12.5%
_ | 698 | 12.5%
3 | 240 | 4.3%
4 | 240 | 4.3%
1 | 240 | 4.3%
5 | 240 | 4.3%
6 | 239 | 4.3%
7 | 140 | 2.5%
Other values (2) | 278 | 5.0%

Latin
Value | Count | Frequency (%)
B | 698 | 16.7%
T | 698 | 16.7%
E | 698 | 16.7%
S | 698 | 16.7%
i | 698 | 16.7%
L | 698 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9772 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1633 | 16.7%
2 | 938 | 9.6%
B | 698 | 7.1%
T | 698 | 7.1%
E | 698 | 7.1%
- | 698 | 7.1%
S | 698 | 7.1%
i | 698 | 7.1%
L | 698 | 7.1%
_ | 698 | 7.1%
Other values (8) | 1617 | 16.5%

상호
Text

Distinct: 654
Distinct (%): 93.7%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.777937
Min length: 1

Characters and Unicode

Total characters: 3335
Distinct characters: 470
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 630
Unique (%): 90.3%
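
"Distinct" and "Unique" differ here (654 versus 630). The sketch below assumes the usual profiling definitions: distinct counts the different values present, while unique counts only the values that occur exactly once. The function name and the toy input are hypothetical, chosen for illustration:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Toy input (hypothetical): "a" repeats, so it is distinct but not unique.
print(distinct_and_unique(["a", "b", "a", "c"]))

# Percentages as reported for this column, relative to 698 rows.
print(f"{100 * 654 / 698:.1f}% distinct, {100 * 630 / 698:.1f}% unique")
```

Under that assumption, 24 of the 654 distinct names in this column appear more than once.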

Sample

1st row: ?牛山丘民宿
2nd row: ?民宿
3rd row: 芒果民宿
4th row: 十字路口背包旅人
5th row: ?富
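
Several sample values contain a literal `?`, and the character tables for this column report 573 of them among 3335 characters. The report does not say whether these are genuine question marks or characters lost before profiling, so the following is only a sketch of how one might measure the share of such placeholders; the helper name is hypothetical:

```python
def placeholder_share(values, placeholder="?"):
    """Fraction of all characters in `values` that equal `placeholder`."""
    total = sum(len(v) for v in values)
    hits = sum(v.count(placeholder) for v in values)
    return hits / total if total else 0.0

# The five sample rows shown above.
sample = ["?牛山丘民宿", "?民宿", "芒果民宿", "十字路口背包旅人", "?富"]

rows_affected = sum(1 for v in sample if "?" in v)
print(rows_affected, f"{placeholder_share(sample):.1%}")
```

A high share, as in this column, is a reason to inspect the source file's encoding before relying on text statistics.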
Value | Count | Frequency (%)
民宿 | 15 | 2.1%
之家 | 9 | 1.2%
 | 7 | 1.0%
住宿 | 5 | 0.7%
 | 5 | 0.7%
家庭寄宿 | 5 | 0.7%
the | 5 | 0.7%
安民宿 | 4 | 0.6%
人之家 | 3 | 0.4%
的家 | 3 | 0.4%
Other values (636) | 659 | 91.5%

Most occurring characters

Value | Count | Frequency (%)
? | 573 | 17.2%
宿 | 306 | 9.2%
 | 256 | 7.7%
 | 245 | 7.3%
 | 89 | 2.7%
 | 65 | 1.9%
 | 47 | 1.4%
 | 43 | 1.3%
 | 40 | 1.2%
 | 31 | 0.9%
Other values (460) | 1640 | 49.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2287 | 68.6%
Other Punctuation | 589 | 17.7%
Uppercase Letter | 219 | 6.6%
Lowercase Letter | 102 | 3.1%
Decimal Number | 94 | 2.8%
Space Separator | 25 | 0.7%
Dash Punctuation | 9 | 0.3%
Open Punctuation | 3 | 0.1%
Close Punctuation | 3 | 0.1%
Letter Number | 2 | 0.1%
Other values (2) | 2 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
宿 | 306 | 13.4%
 | 256 | 11.2%
 | 245 | 10.7%
 | 89 | 3.9%
 | 65 | 2.8%
 | 47 | 2.1%
 | 43 | 1.9%
 | 40 | 1.7%
 | 31 | 1.4%
 | 26 | 1.1%
Other values (372) | 1139 | 49.8%

Uppercase Letter
Value | Count | Frequency (%)
J | 29 | 13.2%
O | 21 | 9.6%
K | 21 | 9.6%
S | 14 | 6.4%
I | 10 | 4.6%
C | 9 | 4.1%
Y | 9 | 4.1%
A | 9 | 4.1%
U | 9 | 4.1%
N | 9 | 4.1%
Other values (25) | 79 | 36.1%

Lowercase Letter
Value | Count | Frequency (%)
e | 17 | 16.7%
o | 14 | 13.7%
a | 8 | 7.8%
t | 8 | 7.8%
s | 7 | 6.9%
h | 7 | 6.9%
u | 6 | 5.9%
n | 5 | 4.9%
i | 5 | 4.9%
z | 4 | 3.9%
Other values (10) | 21 | 20.6%

Decimal Number
Value | Count | Frequency (%)
2 | 29 | 30.9%
8 | 13 | 13.8%
1 | 11 | 11.7%
4 | 10 | 10.6%
0 | 7 | 7.4%
9 | 6 | 6.4%
7 | 4 | 4.3%
3 | 3 | 3.2%
 | 3 | 3.2%
6 | 3 | 3.2%
Other values (3) | 5 | 5.3%

Other Punctuation
Value | Count | Frequency (%)
? | 573 | 97.3%
. | 6 | 1.0%
@ | 3 | 0.5%
 | 2 | 0.3%
; | 1 | 0.2%
 | 1 | 0.2%
' | 1 | 0.2%
& | 1 | 0.2%
 | 1 | 0.2%

Space Separator
Value | Count | Frequency (%)
 | 22 | 88.0%
 | 3 | 12.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 66.7%
 | 1 | 33.3%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 66.7%
 | 1 | 33.3%

Letter Number
Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 9 | 100.0%

Initial Punctuation
Value | Count | Frequency (%)
 | 1 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 2281 | 68.4%
Common | 725 | 21.7%
Latin | 323 | 9.7%
Hangul | 6 | 0.2%

Most frequent character per script

Han
Value | Count | Frequency (%)
宿 | 306 | 13.4%
 | 256 | 11.2%
 | 245 | 10.7%
 | 89 | 3.9%
 | 65 | 2.8%
 | 47 | 2.1%
 | 43 | 1.9%
 | 40 | 1.8%
 | 31 | 1.4%
 | 26 | 1.1%
Other values (369) | 1133 | 49.7%

Latin
Value | Count | Frequency (%)
J | 29 | 9.0%
O | 21 | 6.5%
K | 21 | 6.5%
e | 17 | 5.3%
o | 14 | 4.3%
S | 14 | 4.3%
I | 10 | 3.1%
C | 9 | 2.8%
Y | 9 | 2.8%
A | 9 | 2.8%
Other values (47) | 170 | 52.6%

Common
Value | Count | Frequency (%)
? | 573 | 79.0%
2 | 29 | 4.0%
 | 22 | 3.0%
8 | 13 | 1.8%
1 | 11 | 1.5%
4 | 10 | 1.4%
- | 9 | 1.2%
0 | 7 | 1.0%
. | 6 | 0.8%
9 | 6 | 0.8%
Other values (21) | 39 | 5.4%

Hangul
Value | Count | Frequency (%)
 | 4 | 66.7%
 | 1 | 16.7%
 | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 2281 | 68.4%
ASCII | 1014 | 30.4%
None | 30 | 0.9%
Hangul | 6 | 0.2%
Number Forms | 2 | 0.1%
Punctuation | 2 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
? | 573 | 56.5%
2 | 29 | 2.9%
J | 29 | 2.9%
 | 22 | 2.2%
O | 21 | 2.1%
K | 21 | 2.1%
e | 17 | 1.7%
o | 14 | 1.4%
S | 14 | 1.4%
8 | 13 | 1.3%
Other values (54) | 261 | 25.7%

CJK
Value | Count | Frequency (%)
宿 | 306 | 13.4%
 | 256 | 11.2%
 | 245 | 10.7%
 | 89 | 3.9%
 | 65 | 2.8%
 | 47 | 2.1%
 | 43 | 1.9%
 | 40 | 1.8%
 | 31 | 1.4%
 | 26 | 1.1%
Other values (369) | 1133 | 49.7%

Hangul
Value | Count | Frequency (%)
 | 4 | 66.7%
 | 1 | 16.7%
 | 1 | 16.7%

None
Value | Count | Frequency (%)
 | 3 | 10.0%
 | 3 | 10.0%
 | 3 | 10.0%
 | 2 | 6.7%
 | 2 | 6.7%
 | 2 | 6.7%
 | 2 | 6.7%
 | 1 | 3.3%
 | 1 | 3.3%
 | 1 | 3.3%
Other values (10) | 10 | 33.3%

Number Forms
Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

Punctuation
Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

행정시
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB
首?特?市: 698

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 首?特?市
2nd row: 首?特?市
3rd row: 首?特?市
4th row: 首?特?市
5th row: 首?特?市

Common Values

Value | Count | Frequency (%)
首?特?市 | 698 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
首?特?市 | 698 | 100.0%

행정구
Categorical

Distinct: 25
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB
麻浦?: 209
中?: 76
?山?: 63
江南?: 49
?路?: 44
Other values (20): 257

Length

Max length: 4
Median length: 3
Mean length: 2.9684814
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 麻浦?
2nd row: 麻浦?
3rd row: 麻浦?
4th row: 麻浦?
5th row: 麻浦?

Common Values

Value | Count | Frequency (%)
麻浦? | 209 | 29.9%
中? | 76 | 10.9%
?山? | 63 | 9.0%
江南? | 49 | 7.0%
?路? | 44 | 6.3%
松坡? | 41 | 5.9%
西大?? | 27 | 3.9%
瑞草? | 25 | 3.6%
冠岳? | 20 | 2.9%
恩平? | 19 | 2.7%
Other values (15) | 125 | 17.9%
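
The "Other values" row is simply what remains after the ten listed districts are subtracted from the 698 rows. A short sketch of that bookkeeping, with labels copied as they appear in the report (including the `?` characters); the variable names are illustrative:

```python
n_rows = 698

# Top ten districts and their counts, as listed under "Common Values".
top10 = {
    "麻浦?": 209, "中?": 76, "?山?": 63, "江南?": 49, "?路?": 44,
    "松坡?": 41, "西大??": 27, "瑞草?": 25, "冠岳?": 20, "恩平?": 19,
}

# Rows not covered by the top ten fall into the remaining 15 districts.
other = n_rows - sum(top10.values())

for name, count in top10.items():
    print(f"{name}: {100 * count / n_rows:.1f}%")
print(f"Other values: {other} ({100 * other / n_rows:.1f}%)")
```

The first district alone accounts for almost 30% of all listings, so the distribution is strongly concentrated.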

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
麻浦 | 209 | 29.9%
 | 76 | 10.9%
 | 63 | 9.0%
江南 | 49 | 7.0%
 | 44 | 6.3%
松坡 | 41 | 5.9%
西大 | 27 | 3.9%
瑞草 | 25 | 3.6%
冠岳 | 20 | 2.9%
恩平 | 19 | 2.7%
Other values (15) | 125 | 17.9%

행정동
Text

Distinct: 195
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Length

Max length: 11
Median length: 3
Mean length: 3.4197708
Min length: 2

Characters and Unicode

Total characters: 2387
Distinct characters: 153
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 87
Unique (%): 12.5%

Sample

1st row: 西?洞
2nd row: 城山1洞
3rd row: 西?洞
4th row: 西?洞
5th row: 西?洞
Value | Count | Frequency (%)
西?洞 | 116 | 16.6%
延南洞 | 61 | 8.7%
 | 40 | 5.7%
明洞 | 22 | 3.2%
梨泰院1洞 | 18 | 2.6%
南洞 | 12 | 1.7%
新沙洞 | 11 | 1.6%
社稷洞 | 9 | 1.3%
化洞 | 9 | 1.3%
大?洞 | 8 | 1.1%
Other values (184) | 392 | 56.2%

Most occurring characters

Value | Count | Frequency (%)
 | 698 | 29.2%
? | 379 | 15.9%
西 | 118 | 4.9%
1 | 104 | 4.4%
2 | 88 | 3.7%
 | 78 | 3.3%
 | 68 | 2.8%
 | 36 | 1.5%
 | 29 | 1.2%
3 | 29 | 1.2%
Other values (143) | 760 | 31.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1749 | 73.3%
Other Punctuation | 388 | 16.3%
Decimal Number | 250 | 10.5%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 698 | 39.9%
西 | 118 | 6.7%
 | 78 | 4.5%
 | 68 | 3.9%
 | 36 | 2.1%
 | 29 | 1.7%
 | 27 | 1.5%
 | 26 | 1.5%
 | 23 | 1.3%
 | 22 | 1.3%
Other values (133) | 624 | 35.7%

Decimal Number
Value | Count | Frequency (%)
1 | 104 | 41.6%
2 | 88 | 35.2%
3 | 29 | 11.6%
4 | 15 | 6.0%
5 | 6 | 2.4%
7 | 4 | 1.6%
6 | 3 | 1.2%
8 | 1 | 0.4%

Other Punctuation
Value | Count | Frequency (%)
? | 379 | 97.7%
. | 9 | 2.3%

Most occurring scripts

Value | Count | Frequency (%)
Han | 1725 | 72.3%
Common | 638 | 26.7%
Hangul | 24 | 1.0%

Most frequent character per script

Han
Value | Count | Frequency (%)
 | 698 | 40.5%
西 | 118 | 6.8%
 | 78 | 4.5%
 | 68 | 3.9%
 | 36 | 2.1%
 | 29 | 1.7%
 | 27 | 1.6%
 | 26 | 1.5%
 | 23 | 1.3%
 | 22 | 1.3%
Other values (131) | 600 | 34.8%

Common
Value | Count | Frequency (%)
? | 379 | 59.4%
1 | 104 | 16.3%
2 | 88 | 13.8%
3 | 29 | 4.5%
4 | 15 | 2.4%
. | 9 | 1.4%
5 | 6 | 0.9%
7 | 4 | 0.6%
6 | 3 | 0.5%
8 | 1 | 0.2%

Hangul
Value | Count | Frequency (%)
 | 21 | 87.5%
 | 3 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 1725 | 72.3%
ASCII | 638 | 26.7%
Hangul | 24 | 1.0%

Most frequent character per block

CJK
Value | Count | Frequency (%)
 | 698 | 40.5%
西 | 118 | 6.8%
 | 78 | 4.5%
 | 68 | 3.9%
 | 36 | 2.1%
 | 29 | 1.7%
 | 27 | 1.6%
 | 26 | 1.5%
 | 23 | 1.3%
 | 22 | 1.3%
Other values (131) | 600 | 34.8%

ASCII
Value | Count | Frequency (%)
? | 379 | 59.4%
1 | 104 | 16.3%
2 | 88 | 13.8%
3 | 29 | 4.5%
4 | 15 | 2.4%
. | 9 | 1.4%
5 | 6 | 0.9%
7 | 4 | 0.6%
6 | 3 | 0.5%
8 | 1 | 0.2%

Hangul
Value | Count | Frequency (%)
 | 21 | 87.5%
 | 3 | 12.5%

객실수
Real number (ℝ)

MISSING 

Distinct: 17
Distinct (%): 2.5%
Missing: 21
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2614476
Minimum: 1
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 4
95-th percentile: 8
Maximum: 21
Range: 20
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.5497197
Coefficient of variation (CV): 0.78177548
Kurtosis: 6.9123379
Mean: 3.2614476
Median Absolute Deviation (MAD): 1
Skewness: 2.0917911
Sum: 2208
Variance: 6.5010707
Monotonicity: Not monotonic
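
Several of these descriptive statistics are tied together by simple identities, which makes the block easy to sanity-check. A minimal sketch using only the reported figures, and assuming the mean is taken over the non-missing rows; variable names are illustrative:

```python
# Reported figures for 객실수.
total = 2208          # Sum
n_rows = 698          # observations in the dataset
n_missing = 21        # missing values in this column
std = 2.5497197       # Standard deviation

n_valid = n_rows - n_missing   # rows that actually carry a room count
mean = total / n_valid         # mean over non-missing rows
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation

print(n_valid, round(mean, 7), round(cv, 5), round(variance, 5))
```

The results agree with the reported Mean, CV and Variance up to rounding of the printed standard deviation.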
Histogram with fixed size bins (bins=17)
Value | Count | Frequency (%)
2 | 199 | 28.5%
1 | 158 | 22.6%
3 | 108 | 15.5%
4 | 54 | 7.7%
5 | 50 | 7.2%
6 | 31 | 4.4%
7 | 28 | 4.0%
8 | 20 | 2.9%
9 | 12 | 1.7%
10 | 6 | 0.9%
Other values (7) | 11 | 1.6%
(Missing) | 21 | 3.0%

Value | Count | Frequency (%)
1 | 158 | 22.6%
2 | 199 | 28.5%
3 | 108 | 15.5%
4 | 54 | 7.7%
5 | 50 | 7.2%
6 | 31 | 4.4%
7 | 28 | 4.0%
8 | 20 | 2.9%
9 | 12 | 1.7%
10 | 6 | 0.9%

Value | Count | Frequency (%)
21 | 1 | 0.1%
18 | 1 | 0.1%
17 | 1 | 0.1%
14 | 1 | 0.1%
13 | 2 | 0.3%
12 | 1 | 0.1%
11 | 4 | 0.6%
10 | 6 | 0.9%
9 | 12 | 1.7%
8 | 20 | 2.9%
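
Taken together, the value tables above list every distinct room count, so the column's non-missing values can be rebuilt and the quantile statistics recomputed. The sketch below uses Python's `statistics` module; the choice of the "inclusive" quantile method and of the sample (n-1) variance are assumptions about how the profiler computes these figures, not something the report states:

```python
import statistics

# Room count -> number of listings, combined from the frequency tables above.
counts = {1: 158, 2: 199, 3: 108, 4: 54, 5: 50, 6: 31, 7: 28, 8: 20,
          9: 12, 10: 6, 11: 4, 12: 1, 13: 2, 14: 1, 17: 1, 18: 1, 21: 1}

# Expand the table back into one value per listing.
values = sorted(v for v, n in counts.items() for _ in range(n))

q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
sample_variance = statistics.variance(values)  # divides by n - 1

print(len(values), len(counts), sum(values))
print(q1, median, q3, q3 - q1)
print(round(sample_variance, 7))
```

Under these assumptions the rebuilt data reproduces the reported count of distinct values, the sum, the quartiles, the IQR and the variance.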

Interactions


Correlations

 | 행정구 | 객실수
행정구 | 1.000 | 0.247
객실수 | 0.247 | 1.000

 | 객실수 | 행정구
객실수 | 1.000 | 0.000
행정구 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

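 | 키 | 상호 | 행정시 | 행정구 | 행정동 | 객실수
0 | BE_LiST20-0333 | ?牛山丘民宿 | 首?特?市 | 麻浦? | 西?洞 | 7
1 | BE_LiST20-0334 | ?民宿 | 首?特?市 | 麻浦? | 城山1洞 | 4
2 | BE_LiST20-0335 | 芒果民宿 | 首?特?市 | 麻浦? | 西?洞 | 4
3 | BE_LiST20-0336 | 十字路口背包旅人 | 首?特?市 | 麻浦? | 西?洞 | 8
4 | BE_LiST20-0337 | ?富 | 首?特?市 | 麻浦? | 西?洞 | 12
5 | BE_LiST20-0338 | ?之家 | 首?特?市 | 麻浦? | 西?洞 | 6
6 | BE_LiST20-0339 | 阿?法民宿 | 首?特?市 | 麻浦? | 大?洞 | 2
7 | BE_LiST20-0340 | ?梁背包旅人 | 首?特?市 | 麻浦? | 西?洞 | 2
8 | BE_LiST20-0341 | 弘大民宿????? | 首?特?市 | 麻浦? | 西?洞 | 5
9 | BE_LiST20-0342 | 24民宿弘大 | 首?特?市 | 麻浦? | 延南洞 | 6

 | 키 | 상호 | 행정시 | 행정구 | 행정동 | 객실수
688 | BE_LiST20-0323 | ?毅之家 | 首?特?市 | 麻浦? | 延南洞 | 9
689 | BE_LiST20-0324 | ?托姆 | 首?特?市 | 麻浦? | 望?2洞 | 6
690 | BE_LiST20-0325 | ?点首? | 首?特?市 | 麻浦? | 西?洞 | 6
691 | BE_LiST20-0326 | ?果民宿 | 首?特?市 | 麻浦? | 延南洞 | 10
692 | BE_LiST20-0327 | KPOP住宿 | 首?特?市 | 麻浦? | 西?洞 | 4
693 | BE_LiST20-0328 | KPOP住宿II | 首?特?市 | 麻浦? | 西?洞 | 3
694 | BE_LiST20-0329 | ??住宿 | 首?特?市 | 麻浦? | 西?洞 | 4
695 | BE_LiST20-0330 | 弘大家人旅?2 | 首?特?市 | 麻浦? | 延南洞 | 2
696 | BE_LiST20-0331 | 逗?先生民宿 | 首?特?市 | 麻浦? | 延南洞 | 3
697 | BE_LiST20-0332 | ?背包旅人2 | 首?特?市 | 麻浦? | 西?洞 | 3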