Overview

Dataset statistics

Number of variables: 5
Number of observations: 340
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.4 KiB
Average record size in memory: 40.4 B
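The average record size appears to be the total in-memory size divided by the number of observations. The following is a quick sanity check, assuming 1 KiB = 1024 bytes and using the rounded figures printed above, so the match is only approximate.

```python
# Sanity check of the reported memory figures (values copied from the report).
total_size_kib = 13.4   # "Total size in memory"
n_observations = 340    # "Number of observations"

# Convert KiB to bytes and divide by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```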

Variable types

Text: 3
Categorical: 2

Dataset

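Description: 키 (key), 명칭 (name), 행정시 (administrative city), 행정구 (administrative district), 행정동 (administrative neighborhood)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-13056/S/1/datasetView.do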

Alerts

행정시 has constant value "首?特?市" (Constant)
키 has unique values (Unique)
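The two alerts can be reproduced with a simple per-column value count. This is a minimal sketch, not the profiler's own implementation: it uses only the first five sample rows shown in this report, and it assumes the unnamed first column is the 키 column listed in the Description. The helper name `column_alerts` is hypothetical.

```python
from collections import Counter

# First five sample rows from the report: (키, 명칭, 행정시, 행정구, 행정동).
rows = [
    ("BE_IW18-0273", "日新山庄", "首?特?市", "西大??", "新村洞"),
    ("BE_IW18-0274", "易姆斯庄?酒店", "首?特?市", "?路?", "?路1.2.3.4街洞"),
    ("BE_IW18-0275", "Java旅?", "首?特?市", "?路?", "崇仁2洞"),
    ("BE_IW18-0276", "紫霞???", "首?特?市", "?路?", "平?洞"),
    ("BE_IW18-0277", "第一旅?", "首?特?市", "永登浦?", "新吉6洞"),
]
columns = ["키", "명칭", "행정시", "행정구", "행정동"]


def column_alerts(rows, columns):
    """Return {column: alert} for constant and all-unique columns."""
    alerts = {}
    for i, name in enumerate(columns):
        counts = Counter(row[i] for row in rows)
        if len(counts) == 1:
            alerts[name] = "Constant"   # a single value in every row
        elif len(counts) == len(rows):
            alerts[name] = "Unique"     # every row has a different value
    return alerts


alerts = column_alerts(rows, columns)
print(alerts)
```

On this five-row excerpt, 명칭 and 행정동 also come out as unique. On the full 340 rows they are not, which is why the report raises the Unique alert for one column only.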

Reproduction

Analysis started: 2023-12-11 10:08:01.774225
Analysis finished: 2023-12-11 10:08:02.224792
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
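The reported duration follows from the two timestamps. A small check using only the standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction block.
started = datetime.fromisoformat("2023-12-11 10:08:01.774225")
finished = datetime.fromisoformat("2023-12-11 10:08:02.224792")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```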

Variables


키
Text

UNIQUE 

Distinct: 340
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 4080
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
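As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for the first sample value of this variable. The two-letter codes correspond to the long names used in this report (`Lu` = Uppercase Letter, `Nd` = Decimal Number, `Pc` = Connector Punctuation, `Pd` = Dash Punctuation). The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("BE_IW18-0273")  # first sample row of this variable
print(counts)
```

Every value of this variable has the same 12-character layout, so these per-value proportions (6 digits, 4 uppercase letters, 1 underscore, 1 hyphen) match the column-wide category shares of 50.0%, 33.3%, 8.3% and 8.3%.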

Unique

Unique: 340
Unique (%): 100.0%

Sample

1st row: BE_IW18-0273
2nd row: BE_IW18-0274
3rd row: BE_IW18-0275
4th row: BE_IW18-0276
5th row: BE_IW18-0277
Value | Count | Frequency (%)
be_iw18-0273 | 1 | 0.3%
be_iw18-0156 | 1 | 0.3%
be_iw18-0165 | 1 | 0.3%
be_iw18-0164 | 1 | 0.3%
be_iw18-0163 | 1 | 0.3%
be_iw18-0162 | 1 | 0.3%
be_iw18-0161 | 1 | 0.3%
be_iw18-0160 | 1 | 0.3%
be_iw18-0159 | 1 | 0.3%
be_iw18-0158 | 1 | 0.3%
Other values (330) | 330 | 97.1%

Most occurring characters

Value | Count | Frequency (%)
1 | 514 | 12.6%
0 | 512 | 12.5%
8 | 404 | 9.9%
B | 340 | 8.3%
E | 340 | 8.3%
_ | 340 | 8.3%
I | 340 | 8.3%
W | 340 | 8.3%
- | 340 | 8.3%
2 | 174 | 4.3%
Other values (6) | 436 | 10.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2040 | 50.0%
Uppercase Letter | 1360 | 33.3%
Connector Punctuation | 340 | 8.3%
Dash Punctuation | 340 | 8.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 514 | 25.2%
0 | 512 | 25.1%
8 | 404 | 19.8%
2 | 174 | 8.5%
3 | 115 | 5.6%
4 | 65 | 3.2%
7 | 64 | 3.1%
5 | 64 | 3.1%
6 | 64 | 3.1%
9 | 64 | 3.1%

Uppercase Letter
Value | Count | Frequency (%)
B | 340 | 25.0%
E | 340 | 25.0%
I | 340 | 25.0%
W | 340 | 25.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 340 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 340 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2720 | 66.7%
Latin | 1360 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 514 | 18.9%
0 | 512 | 18.8%
8 | 404 | 14.9%
_ | 340 | 12.5%
- | 340 | 12.5%
2 | 174 | 6.4%
3 | 115 | 4.2%
4 | 65 | 2.4%
7 | 64 | 2.4%
5 | 64 | 2.4%
Other values (2) | 128 | 4.7%

Latin
Value | Count | Frequency (%)
B | 340 | 25.0%
E | 340 | 25.0%
I | 340 | 25.0%
W | 340 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4080 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 514 | 12.6%
0 | 512 | 12.5%
8 | 404 | 9.9%
B | 340 | 8.3%
E | 340 | 8.3%
_ | 340 | 8.3%
I | 340 | 8.3%
W | 340 | 8.3%
- | 340 | 8.3%
2 | 174 | 4.3%
Other values (6) | 436 | 10.7%

명칭
Text

Distinct: 306
Distinct (%): 90.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 16
Median length: 15
Mean length: 4.7794118
Min length: 1

Characters and Unicode

Total characters: 1625
Distinct characters: 288
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
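The mean length reported for a text variable is consistent with its total character count divided by the number of observations. A short check, using figures copied from this report (the dictionary keys are only labels chosen for this sketch):

```python
n_observations = 340

# "Total characters" as reported for two of the text variables.
total_characters = {
    "first text variable": 4080,
    "명칭": 1625,
}

# Mean length = total characters / number of rows.
mean_lengths = {
    name: total / n_observations for name, total in total_characters.items()
}
for name, mean in mean_lengths.items():
    print(f"{name}: {mean:.7f}")
```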

Unique

Unique: 284
Unique (%): 83.5%
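The Distinct and Unique figures differ because they count different things. The sketch below assumes that Distinct is the number of different values and Unique is the number of values that occur exactly once, which would be consistent with 306 distinct and 284 unique values here (22 values would then occur more than once). The input list is hypothetical, not taken from the dataset.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Hypothetical example: two names repeat, two occur once.
names = ["山庄", "山庄", "第一旅?", "日新山庄", "第一旅?", "Java旅?"]
n_distinct, n_unique = distinct_and_unique(names)
print(n_distinct, n_unique)
```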

Sample

1st row: 日新山庄
2nd row: 易姆斯庄?酒店
3rd row: Java旅?
4th row: 紫霞???
5th row: 第一旅?
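Value | Count | Frequency (%)
[value lost in extraction] | 7 | 2.0%
汽?旅 | 6 | 1.7%
山庄 | 5 | 1.4%
[value lost in extraction] | 5 | 1.4%
天空旅 | 4 | 1.2%
大提琴旅 | 3 | 0.9%
主?旅店 | 3 | 0.9%
[value lost in extraction] | 3 | 0.9%
洛代?旅社 | 2 | 0.6%
[value lost in extraction] | 2 | 0.6%
Other values (292) | 305 | 88.4%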

Most occurring characters

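Value | Count | Frequency (%)
? | 466 | 28.7%
[character lost in extraction] | 170 | 10.5%
[character lost in extraction] | 73 | 4.5%
[character lost in extraction] | 72 | 4.4%
[character lost in extraction] | 28 | 1.7%
e | 24 | 1.5%
[character lost in extraction] | 22 | 1.4%
[character lost in extraction] | 22 | 1.4%
a | 17 | 1.0%
i | 15 | 0.9%
Other values (278) | 716 | 44.1%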
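A literal "?" is the single most frequent character in this variable. One possible explanation, stated here only as an assumption, is that some characters were replaced by "?" during an encoding conversion before the data was profiled. The sketch below measures the share of such placeholder characters in the five sample values of this variable. The helper name `placeholder_share` is hypothetical.

```python
# Sample values of this variable, copied from the report.
names = ["日新山庄", "易姆斯庄?酒店", "Java旅?", "紫霞???", "第一旅?"]


def placeholder_share(values, placeholder="?"):
    """Return (placeholder count, total characters, share of placeholders)."""
    total = sum(len(v) for v in values)
    hits = sum(v.count(placeholder) for v in values)
    share = hits / total if total else 0.0
    return hits, total, share


hits, total, share = placeholder_share(names)
print(f"{hits} of {total} characters ({share:.1%})")
```

A high share is a hint to re-read the source file with an explicit encoding and compare, not proof of data loss, since "?" can also be a legitimate character.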

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 896 | 55.1%
Other Punctuation | 466 | 28.7%
Lowercase Letter | 161 | 9.9%
Uppercase Letter | 51 | 3.1%
Close Punctuation | 13 | 0.8%
Open Punctuation | 13 | 0.8%
Space Separator | 12 | 0.7%
Decimal Number | 10 | 0.6%
Dash Punctuation | 3 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
170
 
19.0%
73
 
8.1%
72
 
8.0%
28
 
3.1%
22
 
2.5%
22
 
2.5%
11
 
1.2%
11
 
1.2%
10
 
1.1%
10
 
1.1%
Other values (222) 467
52.1%
Lowercase Letter
Value | Count | Frequency (%)
e | 24 | 14.9%
a | 17 | 10.6%
i | 15 | 9.3%
o | 13 | 8.1%
r | 12 | 7.5%
l | 12 | 7.5%
t | 10 | 6.2%
n | 9 | 5.6%
s | 8 | 5.0%
u | 7 | 4.3%
Other values (14) | 34 | 21.1%

Uppercase Letter
Value | Count | Frequency (%)
C | 7 | 13.7%
M | 6 | 11.8%
T | 4 | 7.8%
S | 4 | 7.8%
L | 3 | 5.9%
G | 3 | 5.9%
O | 3 | 5.9%
P | 3 | 5.9%
W | 2 | 3.9%
A | 2 | 3.9%
Other values (12) | 14 | 27.5%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 60.0%
1 | 3 | 30.0%
5 | 1 | 10.0%

Close Punctuation
Value | Count | Frequency (%)
[character lost in extraction] | 12 | 92.3%
) | 1 | 7.7%

Open Punctuation
Value | Count | Frequency (%)
[character lost in extraction] | 12 | 92.3%
( | 1 | 7.7%

Other Punctuation
Value | Count | Frequency (%)
? | 466 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 12 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 895 | 55.1%
Common | 517 | 31.8%
Latin | 212 | 13.0%
Hangul | 1 | 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
170
 
19.0%
73
 
8.2%
72
 
8.0%
28
 
3.1%
22
 
2.5%
22
 
2.5%
11
 
1.2%
11
 
1.2%
10
 
1.1%
10
 
1.1%
Other values (221) 466
52.1%
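Latin
Value | Count | Frequency (%)
e | 24 | 11.3%
a | 17 | 8.0%
i | 15 | 7.1%
o | 13 | 6.1%
r | 12 | 5.7%
l | 12 | 5.7%
t | 10 | 4.7%
n | 9 | 4.2%
s | 8 | 3.8%
u | 7 | 3.3%
Other values (36) | 85 | 40.1%

Common
Value | Count | Frequency (%)
? | 466 | 90.1%
[character lost in extraction] | 12 | 2.3%
[character lost in extraction] | 12 | 2.3%
[character lost in extraction] | 12 | 2.3%
2 | 6 | 1.2%
- | 3 | 0.6%
1 | 3 | 0.6%
) | 1 | 0.2%
( | 1 | 0.2%
5 | 1 | 0.2%

Hangul
Value | Count | Frequency (%)
[character lost in extraction] | 1 | 100.0%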

Most occurring blocks

Value | Count | Frequency (%)
CJK | 895 | 55.1%
ASCII | 705 | 43.4%
None | 24 | 1.5%
Hangul | 1 | 0.1%
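To illustrate how characters can be grouped into blocks such as the ones in this table, the sketch below classifies characters by code point range. The range table is a simplified, hypothetical one covering only three blocks, and the fallback label "None" merely mirrors the label used in the table above. It is not a description of how the profiler assigns blocks.

```python
from collections import Counter

# Simplified, hypothetical block table: (name, first code point, last code point).
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),
    ("CJK", 0x4E00, 0x9FFF),     # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),  # Hangul Syllables
]


def block_of(ch):
    """Return the block name for one character, or 'None' if no range matches."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "None"


def block_counts(values):
    """Count characters per block across a list of strings."""
    return Counter(block_of(ch) for v in values for ch in v)


# Three values taken from the Sample section of this report.
counts = block_counts(["日新山庄", "Java旅?", "ez旅?"])
print(counts)
```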

Most frequent character per block

ASCII
Value | Count | Frequency (%)
? | 466 | 66.1%
e | 24 | 3.4%
a | 17 | 2.4%
i | 15 | 2.1%
o | 13 | 1.8%
(space) | 12 | 1.7%
r | 12 | 1.7%
l | 12 | 1.7%
t | 10 | 1.4%
n | 9 | 1.3%
Other values (44) | 115 | 16.3%
CJK
ValueCountFrequency (%)
170
 
19.0%
73
 
8.2%
72
 
8.0%
28
 
3.1%
22
 
2.5%
22
 
2.5%
11
 
1.2%
11
 
1.2%
10
 
1.1%
10
 
1.1%
Other values (221) 466
52.1%
None
ValueCountFrequency (%)
12
50.0%
12
50.0%
Hangul
ValueCountFrequency (%)
1
100.0%

행정시
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB
首?特?市: 340

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 首?特?市
2nd row: 首?特?市
3rd row: 首?特?市
4th row: 首?特?市
5th row: 首?特?市

Common Values

Value | Count | Frequency (%)
首?特?市 | 340 | 100.0%

Length

Histogram of lengths of the category


행정구
Categorical

Distinct: 25
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB
冠岳?: 32
江西?: 28
?路?: 22
松坡?: 21
永登浦?: 20
Other values (20): 217

Length

Max length: 4
Median length: 3
Mean length: 3.1058824
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 西大??
2nd row: ?路?
3rd row: ?路?
4th row: ?路?
5th row: 永登浦?

Common Values

Value | Count | Frequency (%)
冠岳? | 32 | 9.4%
江西? | 28 | 8.2%
?路? | 22 | 6.5%
松坡? | 21 | 6.2%
永登浦? | 20 | 5.9%
?津? | 20 | 5.9%
江南? | 19 | 5.6%
中浪? | 19 | 5.6%
九老? | 18 | 5.3%
江北? | 17 | 5.0%
Other values (15) | 124 | 36.5%
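The Frequency (%) column is each count divided by the 340 observations. A short check using the top five counts from the table above and the "Other values" remainder:

```python
n_observations = 340

# Counts copied from the Common Values table.
counts = {"冠岳?": 32, "江西?": 28, "?路?": 22, "松坡?": 21, "永登浦?": 20}

# Percentage of all rows, rounded to one decimal as in the report.
freq = {value: round(100 * c / n_observations, 1) for value, c in counts.items()}
other_freq = round(100 * 124 / n_observations, 1)  # "Other values (15)"

for value, pct in freq.items():
    print(f"{value}: {pct}%")
print(f"Other values: {other_freq}%")
```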

Length

Histogram of lengths of the category

행정동
Text

Distinct: 128
Distinct (%): 37.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 KiB

Length

Max length: 11
Median length: 4
Mean length: 3.9441176
Min length: 2

Characters and Unicode

Total characters: 1341
Distinct characters: 123
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 58
Unique (%): 17.1%

Sample

1st row: 新村洞
2nd row: ?路1.2.3.4街洞
3rd row: 崇仁2洞
4th row: 平?洞
5th row: 新吉6洞
Value | Count | Frequency (%)
[value lost in extraction] | 20 | 5.9%
方荑2洞 | 15 | 4.4%
禾谷1洞 | 15 | 4.4%
新村洞 | 14 | 4.1%
新林洞 | 14 | 4.1%
路1.2.3.4街洞 | 12 | 3.5%
永登浦洞 | 9 | 2.6%
三1洞 | 8 | 2.4%
南?洞 | 7 | 2.1%
大?洞 | 7 | 2.1%
Other values (118) | 219 | 64.4%

Most occurring characters

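Value | Count | Frequency (%)
[character lost in extraction] | 340 | 25.4%
? | 162 | 12.1%
1 | 78 | 5.8%
2 | 77 | 5.7%
[character lost in extraction] | 42 | 3.1%
. | 41 | 3.1%
[character lost in extraction] | 25 | 1.9%
3 | 25 | 1.9%
4 | 24 | 1.8%
[character lost in extraction] | 23 | 1.7%
Other values (113) | 504 | 37.6%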

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 917 | 68.4%
Decimal Number | 221 | 16.5%
Other Punctuation | 203 | 15.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
340
37.1%
42
 
4.6%
25
 
2.7%
23
 
2.5%
21
 
2.3%
19
 
2.1%
18
 
2.0%
17
 
1.9%
15
 
1.6%
15
 
1.6%
Other values (104) 382
41.7%
Decimal Number
Value | Count | Frequency (%)
1 | 78 | 35.3%
2 | 77 | 34.8%
3 | 25 | 11.3%
4 | 24 | 10.9%
6 | 8 | 3.6%
5 | 5 | 2.3%
7 | 4 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
? | 162 | 79.8%
. | 41 | 20.2%

Most occurring scripts

Value | Count | Frequency (%)
Han | 910 | 67.9%
Common | 424 | 31.6%
Hangul | 7 | 0.5%

Most frequent character per script

Han
ValueCountFrequency (%)
340
37.4%
42
 
4.6%
25
 
2.7%
23
 
2.5%
21
 
2.3%
19
 
2.1%
18
 
2.0%
17
 
1.9%
15
 
1.6%
15
 
1.6%
Other values (102) 375
41.2%
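Common
Value | Count | Frequency (%)
? | 162 | 38.2%
1 | 78 | 18.4%
2 | 77 | 18.2%
. | 41 | 9.7%
3 | 25 | 5.9%
4 | 24 | 5.7%
6 | 8 | 1.9%
5 | 5 | 1.2%
7 | 4 | 0.9%

Hangul
Value | Count | Frequency (%)
[character lost in extraction] | 4 | 57.1%
[character lost in extraction] | 3 | 42.9%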

Most occurring blocks

Value | Count | Frequency (%)
CJK | 910 | 67.9%
ASCII | 424 | 31.6%
Hangul | 7 | 0.5%

Most frequent character per block

CJK
ValueCountFrequency (%)
340
37.4%
42
 
4.6%
25
 
2.7%
23
 
2.5%
21
 
2.3%
19
 
2.1%
18
 
2.0%
17
 
1.9%
15
 
1.6%
15
 
1.6%
Other values (102) 375
41.2%
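ASCII
Value | Count | Frequency (%)
? | 162 | 38.2%
1 | 78 | 18.4%
2 | 77 | 18.2%
. | 41 | 9.7%
3 | 25 | 5.9%
4 | 24 | 5.7%
6 | 8 | 1.9%
5 | 5 | 1.2%
7 | 4 | 0.9%

Hangul
Value | Count | Frequency (%)
[character lost in extraction] | 4 | 57.1%
[character lost in extraction] | 3 | 42.9%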

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

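First rows

Index | 키 | 명칭 | 행정시 | 행정구 | 행정동
0 | BE_IW18-0273 | 日新山庄 | 首?特?市 | 西大?? | 新村洞
1 | BE_IW18-0274 | 易姆斯庄?酒店 | 首?特?市 | ?路? | ?路1.2.3.4街洞
2 | BE_IW18-0275 | Java旅? | 首?特?市 | ?路? | 崇仁2洞
3 | BE_IW18-0276 | 紫霞??? | 首?特?市 | ?路? | 平?洞
4 | BE_IW18-0277 | 第一旅? | 首?特?市 | 永登浦? | 新吉6洞
5 | BE_IW18-0278 | 第一山庄 | 首?特?市 | 冠岳? | ??洞
6 | BE_IW18-0279 | ?伊娜 | 首?特?市 | 江西? | 禾谷本洞
7 | BE_IW18-0280 | 朱? | 首?特?市 | ?津? | 中谷3洞
8 | BE_IW18-0281 | 晋?旅? | 首?特?市 | 九老? | 加里峰洞
9 | BE_IW18-0282 | 昌原山庄 | 首?特?市 | 九老? | 九老2洞

Last rows

Index | 키 | 명칭 | 행정시 | 행정구 | 행정동
330 | BE_IW18-0263 | 友利山庄 | 首?特?市 | 永登浦? | 堂山1洞
331 | BE_IW18-0264 | 雨林山庄 | 首?特?市 | 九老? | 九老2洞
332 | BE_IW18-0265 | 月壁城 | 首?特?市 | 江北? | 牛耳洞
333 | BE_IW18-0266 | 月城客? | 首?特?市 | 九老? | 加里峰洞
334 | BE_IW18-0267 | 威尼安 | 首?特?市 | 中浪? | 墨2洞
335 | BE_IW18-0268 | ?星旅? | 首?特?市 | 中浪? | 上?2洞
336 | BE_IW18-0269 | ?星山庄 | 首?特?市 | 永登浦? | 永登浦洞
337 | BE_IW18-0270 | ez旅? | 首?特?市 | 西大?? | 新村洞
338 | BE_IW18-0271 | ?彩旅? | 首?特?市 | 西大?? | 新村洞
339 | BE_IW18-0272 | 因特?旅? | 首?特?市 | ?路? | ?路5.6街洞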