Overview

Dataset statistics

Number of variables: 5
Number of observations: 147
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.9 KiB
Average record size in memory: 40.9 B
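
The average record size is the total in-memory size divided by the number of observations. A minimal check in Python, assuming 1 KiB = 1024 bytes and that the reported total is itself rounded to one decimal (which would account for the small gap from the reported 40.9 B):

```python
# Figures copied from the dataset statistics; 1 KiB is assumed to be 1024 bytes.
total_kib = 5.9
n_rows = 147

# Average bytes per record, recomputed from the rounded total.
avg_bytes = total_kib * 1024 / n_rows
print(f"{avg_bytes:.1f} B per record")
```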

Variable types

Text: 3
Categorical: 2

Dataset

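Description: 키, 명칭, 행정시, 행정구, 행정동 (key, name, administrative city, administrative district, administrative dong)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-13053/S/1/datasetView.do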

Alerts

행정시 has constant value "首?特?市" (Constant)
키 has unique values (Unique)
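
The two alerts above come down to simple checks on each column: one distinct value overall, or no repeated value at all. A minimal sketch of those checks, assuming a column is a plain list of strings; the function name `column_alerts` and the placeholder city value are illustrative and are not part of ydata-profiling:

```python
from collections import Counter

def column_alerts(values):
    """Return the alert labels that apply to one column's values."""
    counts = Counter(values)
    alerts = []
    if len(counts) == 1:
        # Every row holds the same value.
        alerts.append("Constant")
    if values and len(counts) == len(values):
        # No value occurs more than once.
        alerts.append("Unique")
    return alerts

# The keys are taken from the sample rows; the city value is a placeholder.
keys = ["BE_IW17-0099", "BE_IW17-0100", "BE_IW17-0101"]
city = ["Seoul"] * 3

print(column_alerts(keys))
print(column_alerts(city))
```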

Reproduction

Analysis started: 2023-12-11 07:39:07.583881
Analysis finished: 2023-12-11 07:39:08.791253
Duration: 1.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables


키
Text

UNIQUE 

Distinct: 147
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 1764
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
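
As an illustration of that sentence, the per-category counts for a text column can be approximated with Python's standard `unicodedata` module. This is a sketch, not the report's own implementation; the helper name `category_counts` is illustrative. `unicodedata.category` returns two-letter codes, where `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Pc` is Connector Punctuation and `Pd` is Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two keys taken from the sample rows of this variable.
sample = ["BE_IW17-0099", "BE_IW17-0100"]
counts = category_counts(sample)
total = sum(counts.values())

for code, n in counts.most_common():
    print(f"{code}: {n} ({100 * n / total:.1f}%)")
```

Each 12-character key contributes six digits, four uppercase letters, one underscore and one hyphen, which matches the 50.0% / 33.3% / 8.3% / 8.3% split reported for this variable.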

Unique

Unique: 147
Unique (%): 100.0%

Sample

1st row: BE_IW17-0099
2nd row: BE_IW17-0100
3rd row: BE_IW17-0101
4th row: BE_IW17-0102
5th row: BE_IW17-0103

Value | Count | Frequency (%)
be_iw17-0099 | 1 | 0.7%
be_iw17-0026 | 1 | 0.7%
be_iw17-0052 | 1 | 0.7%
be_iw17-0046 | 1 | 0.7%
be_iw17-0047 | 1 | 0.7%
be_iw17-0048 | 1 | 0.7%
be_iw17-0049 | 1 | 0.7%
be_iw17-0050 | 1 | 0.7%
be_iw17-0051 | 1 | 0.7%
be_iw17-0053 | 1 | 0.7%
Other values (137) | 137 | 93.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 279 | 15.8%
1 | 230 | 13.0%
7 | 172 | 9.8%
B | 147 | 8.3%
E | 147 | 8.3%
_ | 147 | 8.3%
I | 147 | 8.3%
W | 147 | 8.3%
- | 147 | 8.3%
3 | 35 | 2.0%
Other values (6) | 166 | 9.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 882 | 50.0%
Uppercase Letter | 588 | 33.3%
Connector Punctuation | 147 | 8.3%
Dash Punctuation | 147 | 8.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 279 | 31.6%
1 | 230 | 26.1%
7 | 172 | 19.5%
3 | 35 | 4.0%
2 | 35 | 4.0%
4 | 33 | 3.7%
6 | 25 | 2.8%
5 | 25 | 2.8%
9 | 24 | 2.7%
8 | 24 | 2.7%

Uppercase Letter
Value | Count | Frequency (%)
B | 147 | 25.0%
E | 147 | 25.0%
I | 147 | 25.0%
W | 147 | 25.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 147 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 147 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1176 | 66.7%
Latin | 588 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 279 | 23.7%
1 | 230 | 19.6%
7 | 172 | 14.6%
_ | 147 | 12.5%
- | 147 | 12.5%
3 | 35 | 3.0%
2 | 35 | 3.0%
4 | 33 | 2.8%
6 | 25 | 2.1%
5 | 25 | 2.1%
Other values (2) | 48 | 4.1%

Latin
Value | Count | Frequency (%)
B | 147 | 25.0%
E | 147 | 25.0%
I | 147 | 25.0%
W | 147 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1764 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 279 | 15.8%
1 | 230 | 13.0%
7 | 172 | 9.8%
B | 147 | 8.3%
E | 147 | 8.3%
_ | 147 | 8.3%
I | 147 | 8.3%
W | 147 | 8.3%
- | 147 | 8.3%
3 | 35 | 2.0%
Other values (6) | 166 | 9.4%

명칭
Text

Distinct: 91
Distinct (%): 61.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 16
Median length: 14
Mean length: 8.8571429
Min length: 3

Characters and Unicode

Total characters: 1302
Distinct characters: 138
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 83
Unique (%): 56.5%
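
The Distinct and Unique figures measure different things: Distinct counts the different values present, while Unique counts only the values that occur exactly once. That is why this variable can have 91 distinct values but 83 unique ones. A minimal sketch of both counts, using a hypothetical helper `distinct_and_unique` and placeholder names rather than the real column:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for a list of values."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)                               # different values present
    unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n

# Placeholder store names; "E-mart" repeats, so it is distinct but not unique.
names = ["E-mart", "E-mart", "COSTCO", "Hanaro Club"]
distinct, distinct_pct, unique, unique_pct = distinct_and_unique(names)
print(distinct, distinct_pct, unique, unique_pct)
```

Applied to the report's own numbers, 91 / 147 gives the 61.9% shown for Distinct (%) and 83 / 147 gives the 56.5% shown for Unique (%).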

Sample

1st row: (Sun plaza) ?光百?
2nd row: 世?百?店
3rd row: 汝矣?百?店
4th row: 世界免?店
5th row: ?粮?合百?店

Value | Count | Frequency (%)
e-mart | 31 | 14.7%
易?得 | 31 | 14.7%
home | 19 | 9.0%
天百?店 | 9 | 4.3%
hanaro超市 | 7 | 3.3%
代百?店 | 7 | 3.3%
hanaro | 5 | 2.4%
新世界百?商店 | 4 | 1.9%
costco?平店 | 2 | 0.9%
new | 2 | 0.9%
Other values (90) | 94 | 44.5%

Most occurring characters

Value | Count | Frequency (%)
? | 192 | 14.7%
[character lost] | 86 | 6.6%
(space) | 67 | 5.1%
a | 54 | 4.1%
m | 52 | 4.0%
r | 45 | 3.5%
E | 34 | 2.6%
t | 34 | 2.6%
[character lost] | 34 | 2.6%
- | 33 | 2.5%
Other values (128) | 671 | 51.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 439 | 33.7%
Lowercase Letter | 350 | 26.9%
Other Punctuation | 192 | 14.7%
Uppercase Letter | 189 | 14.5%
Space Separator | 67 | 5.1%
Dash Punctuation | 33 | 2.5%
Decimal Number | 30 | 2.3%
Open Punctuation | 1 | 0.1%
Close Punctuation | 1 | 0.1%

Most frequent character per category

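Other Letter
Value | Count | Frequency (%)
[character lost] | 86 | 19.6%
[character lost] | 34 | 7.7%
[character lost] | 31 | 7.1%
[character lost] | 31 | 7.1%
[character lost] | 24 | 5.5%
[character lost] | 16 | 3.6%
[character lost] | 16 | 3.6%
[character lost] | 14 | 3.2%
[character lost] | 12 | 2.7%
[character lost] | 10 | 2.3%
Other values (79) | 165 | 37.6%

Lowercase Letter
Value | Count | Frequency (%)
a | 54 | 15.4%
m | 52 | 14.9%
r | 45 | 12.9%
t | 34 | 9.7%
l | 30 | 8.6%
e | 29 | 8.3%
o | 29 | 8.3%
u | 27 | 7.7%
s | 19 | 5.4%
n | 11 | 3.1%
Other values (10) | 20 | 5.7%

Uppercase Letter
Value | Count | Frequency (%)
E | 34 | 18.0%
H | 31 | 16.4%
P | 21 | 11.1%
O | 17 | 9.0%
C | 17 | 9.0%
A | 16 | 8.5%
N | 11 | 5.8%
R | 7 | 3.7%
S | 7 | 3.7%
T | 7 | 3.7%
Other values (10) | 21 | 11.1%

Decimal Number
Value | Count | Frequency (%)
0 | 14 | 46.7%
2 | 8 | 26.7%
1 | 7 | 23.3%
5 | 1 | 3.3%

Other Punctuation
Value | Count | Frequency (%)
? | 192 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 67 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 33 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
[character lost] | 1 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
[character lost] | 1 | 100.0%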

Most occurring scripts

Value | Count | Frequency (%)
Latin | 539 | 41.4%
Han | 437 | 33.6%
Common | 324 | 24.9%
Hangul | 2 | 0.2%

Most frequent character per script

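Han
Value | Count | Frequency (%)
[character lost] | 86 | 19.7%
[character lost] | 34 | 7.8%
[character lost] | 31 | 7.1%
[character lost] | 31 | 7.1%
[character lost] | 24 | 5.5%
[character lost] | 16 | 3.7%
[character lost] | 16 | 3.7%
[character lost] | 14 | 3.2%
[character lost] | 12 | 2.7%
[character lost] | 10 | 2.3%
Other values (78) | 163 | 37.3%

Latin
Value | Count | Frequency (%)
a | 54 | 10.0%
m | 52 | 9.6%
r | 45 | 8.3%
E | 34 | 6.3%
t | 34 | 6.3%
H | 31 | 5.8%
l | 30 | 5.6%
e | 29 | 5.4%
o | 29 | 5.4%
u | 27 | 5.0%
Other values (30) | 174 | 32.3%

Common
Value | Count | Frequency (%)
? | 192 | 59.3%
(space) | 67 | 20.7%
- | 33 | 10.2%
0 | 14 | 4.3%
2 | 8 | 2.5%
1 | 7 | 2.2%
[character lost] | 1 | 0.3%
[character lost] | 1 | 0.3%
5 | 1 | 0.3%

Hangul
Value | Count | Frequency (%)
[character lost] | 2 | 100.0%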

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 861 | 66.1%
CJK | 437 | 33.6%
Hangul | 2 | 0.2%
None | 2 | 0.2%

Most frequent character per block

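ASCII
Value | Count | Frequency (%)
? | 192 | 22.3%
(space) | 67 | 7.8%
a | 54 | 6.3%
m | 52 | 6.0%
r | 45 | 5.2%
E | 34 | 3.9%
t | 34 | 3.9%
- | 33 | 3.8%
H | 31 | 3.6%
l | 30 | 3.5%
Other values (37) | 289 | 33.6%

CJK
Value | Count | Frequency (%)
[character lost] | 86 | 19.7%
[character lost] | 34 | 7.8%
[character lost] | 31 | 7.1%
[character lost] | 31 | 7.1%
[character lost] | 24 | 5.5%
[character lost] | 16 | 3.7%
[character lost] | 16 | 3.7%
[character lost] | 14 | 3.2%
[character lost] | 12 | 2.7%
[character lost] | 10 | 2.3%
Other values (78) | 163 | 37.3%

Hangul
Value | Count | Frequency (%)
[character lost] | 2 | 100.0%

None
Value | Count | Frequency (%)
[character lost] | 1 | 50.0%
[character lost] | 1 | 50.0%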

행정시
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
首?特?市: 147

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 首?特?市
2nd row: 首?特?市
3rd row: 首?特?市
4th row: 首?特?市
5th row: 首?特?市

Common Values

Value | Count | Frequency (%)
首?特?市 | 147 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
首?特?市 | 147 | 100.0%

행정구
Categorical

Distinct: 25
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
永登浦?: 15
?川?: 10
?原?: 9
中浪?: 8
江西?: 8
Other values (20): 97

Length

Max length: 4
Median length: 3
Mean length: 3.122449
Min length: 2

Unique

Unique: 1
Unique (%): 0.7%

Sample

1st row: 永登浦?
2nd row: ?路?
3rd row: 永登浦?
4th row: ?山?
5th row: ?川?

Common Values

Value | Count | Frequency (%)
永登浦? | 15 | 10.2%
?川? | 10 | 6.8%
?原? | 9 | 6.1%
中浪? | 8 | 5.4%
江西? | 8 | 5.4%
江南? | 8 | 5.4%
江?? | 8 | 5.4%
瑞草? | 7 | 4.8%
?大?? | 7 | 4.8%
松坡? | 7 | 4.8%
Other values (15) | 60 | 40.8%

Length

Histogram of lengths of the category
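
Value | Count | Frequency (%)
永登浦 | 15 | 10.2%
[value lost] | 10 | 6.8%
[value lost] | 9 | 6.1%
中浪 | 8 | 5.4%
江西 | 8 | 5.4%
江南 | 8 | 5.4%
[value lost] | 8 | 5.4%
瑞草 | 7 | 4.8%
[value lost] | 7 | 4.8%
松坡 | 7 | 4.8%
Other values (15) | 60 | 40.8%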

행정동
Text

Distinct: 98
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 11
Median length: 4
Mean length: 3.8503401
Min length: 2

Characters and Unicode

Total characters: 566
Distinct characters: 91
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 66
Unique (%): 44.9%

Sample

1st row: 大林1洞
2nd row: ?路1.2.3.4街洞
3rd row: 汝矣?洞
4th row: 南?洞
5th row: 新亭3洞

Value | Count | Frequency (%)
永登浦洞 | 5 | 3.4%
中?2.3洞 | 5 | 3.4%
木1洞 | 5 | 3.4%
良才2洞 | 4 | 2.7%
加山洞 | 4 | 2.7%
[value lost] | 3 | 2.0%
傍花2洞 | 3 | 2.0%
文井2洞 | 3 | 2.0%
千?2洞 | 3 | 2.0%
月谷1洞 | 2 | 1.4%
Other values (88) | 110 | 74.8%

Most occurring characters

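Value | Count | Frequency (%)
[character lost] | 147 | 26.0%
? | 72 | 12.7%
2 | 38 | 6.7%
1 | 35 | 6.2%
3 | 20 | 3.5%
[character lost] | 11 | 1.9%
[character lost] | 9 | 1.6%
[character lost] | 9 | 1.6%
[character lost] | 8 | 1.4%
. | 8 | 1.4%
Other values (81) | 209 | 36.9%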

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 380 | 67.1%
Decimal Number | 106 | 18.7%
Other Punctuation | 80 | 14.1%

Most frequent character per category

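Other Letter
Value | Count | Frequency (%)
[character lost] | 147 | 38.7%
[character lost] | 11 | 2.9%
[character lost] | 9 | 2.4%
[character lost] | 9 | 2.4%
[character lost] | 8 | 2.1%
[character lost] | 7 | 1.8%
[character lost] | 7 | 1.8%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
Other values (73) | 164 | 43.2%

Decimal Number
Value | Count | Frequency (%)
2 | 38 | 35.8%
1 | 35 | 33.0%
3 | 20 | 18.9%
5 | 6 | 5.7%
4 | 6 | 5.7%
6 | 1 | 0.9%

Other Punctuation
Value | Count | Frequency (%)
? | 72 | 90.0%
. | 8 | 10.0%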

Most occurring scripts

Value | Count | Frequency (%)
Han | 377 | 66.6%
Common | 186 | 32.9%
Hangul | 3 | 0.5%

Most frequent character per script

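Han
Value | Count | Frequency (%)
[character lost] | 147 | 39.0%
[character lost] | 11 | 2.9%
[character lost] | 9 | 2.4%
[character lost] | 9 | 2.4%
[character lost] | 8 | 2.1%
[character lost] | 7 | 1.9%
[character lost] | 7 | 1.9%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
Other values (72) | 161 | 42.7%

Common
Value | Count | Frequency (%)
? | 72 | 38.7%
2 | 38 | 20.4%
1 | 35 | 18.8%
3 | 20 | 10.8%
. | 8 | 4.3%
5 | 6 | 3.2%
4 | 6 | 3.2%
6 | 1 | 0.5%

Hangul
Value | Count | Frequency (%)
[character lost] | 3 | 100.0%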

Most occurring blocks

Value | Count | Frequency (%)
CJK | 377 | 66.6%
ASCII | 186 | 32.9%
Hangul | 3 | 0.5%

Most frequent character per block

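CJK
Value | Count | Frequency (%)
[character lost] | 147 | 39.0%
[character lost] | 11 | 2.9%
[character lost] | 9 | 2.4%
[character lost] | 9 | 2.4%
[character lost] | 8 | 2.1%
[character lost] | 7 | 1.9%
[character lost] | 7 | 1.9%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
[character lost] | 6 | 1.6%
Other values (72) | 161 | 42.7%

ASCII
Value | Count | Frequency (%)
? | 72 | 38.7%
2 | 38 | 20.4%
1 | 35 | 18.8%
3 | 20 | 10.8%
. | 8 | 4.3%
5 | 6 | 3.2%
4 | 6 | 3.2%
6 | 1 | 0.5%

Hangul
Value | Count | Frequency (%)
[character lost] | 3 | 100.0%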

Correlations

 | 명칭 | 행정구 | 행정동
명칭 | 1.000 | 0.701 | 0.597
행정구 | 0.701 | 1.000 | 0.999
행정동 | 0.597 | 0.999 | 1.000
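
The report does not name the measure behind this matrix. For pairs of categorical variables a common choice is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below assumes that measure and implements the plain, uncorrected form; a profiling tool may apply a bias correction, so the numbers need not match the matrix exactly. The function name `cramers_v` is illustrative.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row = Counter(xs)              # marginal counts of x
    col = Counter(ys)              # marginal counts of y

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, row_total in row.items():
        for y, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0                 # a constant column carries no association
    return sqrt(chi2 / (n * k))

# Placeholder data: each dong belongs to exactly one gu, so V is 1.
gu = ["A", "A", "B", "B"]
dong = ["a1", "a1", "b1", "b1"]
print(cramers_v(gu, dong))

# Independent placeholder columns give V = 0.
print(cramers_v(["A", "A", "B", "B"], ["x", "y", "x", "y"]))
```

Under this reading, the 0.999 between 행정구 and 행정동 would reflect that a dong almost always determines its district.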

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
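
Both nullity views are built from a per-column count of missing cells, which is zero for every column in this dataset. A minimal sketch of that count, assuming rows are dictionaries and a missing cell is represented as `None`; the helper `missing_per_column` and the sample rows are placeholders:

```python
def missing_per_column(rows, columns):
    """Count cells that are None in each column of a list of dict rows."""
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }

# Two placeholder rows; the second one lacks a value for 행정동.
rows = [
    {"명칭": "E-mart", "행정동": "A"},
    {"명칭": "COSTCO", "행정동": None},
]
counts = missing_per_column(rows, ["명칭", "행정동"])
print(counts)
```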

Sample

 | 키 | 명칭 | 행정시 | 행정구 | 행정동
0 | BE_IW17-0099 | (Sun plaza) ?光百? | 首?特?市 | 永登浦? | 大林1洞
1 | BE_IW17-0100 | 世?百?店 | 首?特?市 | ?路? | ?路1.2.3.4街洞
2 | BE_IW17-0101 | 汝矣?百?店 | 首?特?市 | 永登浦? | 汝矣?洞
3 | BE_IW17-0102 | 世界免?店 | 首?特?市 | ?山? | 南?洞
4 | BE_IW17-0103 | ?粮?合百?店 | 首?特?市 | ?川? | 新亭3洞
5 | BE_IW17-0104 | ?凉里?代百?店 | 首?特?市 | ?大?? | ?凉里洞
6 | BE_IW17-0105 | 太平百?店 | 首?特?市 | ?雀? | 舍堂2洞
7 | BE_IW17-0106 | 幸福百?店 | 首?特?市 | ?川? | 木1洞
8 | BE_IW17-0107 | 新世界百?商店 | 首?特?市 | 中? | 明洞
9 | BE_IW17-0108 | 新世界百?商店 | 首?特?市 | 永登浦? | 永登浦洞

 | 키 | 명칭 | 행정시 | 행정구 | 행정동
137 | BE_IW17-0089 | Hanaro Club?山店 | 首?特?市 | ?山? | ?江路洞
138 | BE_IW17-0090 | Hanaro Club?洞店 | 首?特?市 | 道峰? | ?4洞
139 | BE_IW17-0091 | COSTCO上?店 | 首?特?市 | 中浪? | 上?2洞
140 | BE_IW17-0092 | COSTCO良才店 | 首?特?市 | 瑞草? | 良才2洞
141 | BE_IW17-0093 | COSTCO?平店 | 首?特?市 | 永登浦? | ?坪1洞
142 | BE_IW17-0094 | COSTCO?平店 | 首?特?市 | 永登浦? | ?坪1洞
143 | BE_IW17-0095 | Techno Mart江?店 | 首?特?市 | ?津? | 九宜3洞
144 | BE_IW17-0096 | Kunyoungomni百?店 | 首?特?市 | ?原? | 中?2.3洞
145 | BE_IW17-0097 | 京坊?代?? | 首?特?市 | 永登浦? | 永登浦洞
146 | BE_IW17-0098 | ?方百?店 | 首?特?市 | ?大?? | 祭基洞