Overview

Dataset statistics

Number of variables: 5
Number of observations: 1104
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 43.3 KiB
Average record size in memory: 40.1 B
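The average record size follows from the two figures above: total size in memory divided by the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; because the total is rounded to 43.3 KiB, the result only matches the reported 40.1 B to within rounding.

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 43.3    # "Total size in memory", already rounded to 0.1 KiB
n_observations = 1104    # "Number of observations"

# Convert KiB to bytes (1 KiB = 1024 bytes), then divide by the row count.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"about {avg_record_bytes:.1f} B per record")
```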

Variable types

Text: 3
Categorical: 2

Dataset

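Description: 키, 명칭, 행정시, 행정구, 행정동 (key, name, administrative city, administrative district, administrative neighborhood)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-13045/S/1/datasetView.do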

Alerts

행정시 has constant value "" (Constant)
키 has unique values (Unique)
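The two alerts can be reproduced on a handful of rows. The sketch below is not the profiler's own implementation; it assumes that "Constant" means a single distinct value and "Unique" means every value occurs exactly once. The rows are copied from the Sample section, the column order is taken from the dataset description, and `alerts_for` is a hypothetical helper name.

```python
from collections import Counter

# Column order assumed from the dataset description.
columns = ["키", "명칭", "행정시", "행정구", "행정동"]

# Four rows copied from the Sample section of this report.
rows = [
    ("BE_IW16-1049", "?村", "首?特?市", "江??", "高?1洞"),
    ("BE_IW16-1050", "?村", "首?特?市", "瑞草?", "瑞草3洞"),
    ("BE_IW16-1057", "兄弟餐?", "首?特?市", "江北?", "松川洞"),
    ("BE_IW16-0504", "松田木炭排骨", "首?特?市", "江北?", "松川洞"),
]


def alerts_for(values):
    """Return the alert names that apply to one column (hypothetical helper)."""
    counts = Counter(values)
    found = []
    if len(counts) == 1:
        found.append("Constant")  # only one distinct value
    if len(values) > 1 and all(c == 1 for c in counts.values()):
        found.append("Unique")    # no value is repeated
    return found


alerts = {
    name: alerts_for([row[i] for row in rows])
    for i, name in enumerate(columns)
}

for name, found in alerts.items():
    print(name, found)
```

On these four rows only 키 and 행정시 trigger an alert, which agrees with the full report.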

Reproduction

Analysis started: 2023-12-11 07:34:53.822531
Analysis finished: 2023-12-11 07:34:54.631657
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
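The reported duration is the difference between the two timestamps. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2023-12-11 07:34:53.822531")
finished = datetime.fromisoformat("2023-12-11 07:34:54.631657")

# Subtracting two datetimes gives a timedelta.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")
```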

Variables


키
Text

UNIQUE 

Distinct: 1104
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 13248
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
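As an illustration of those character properties, Python's standard `unicodedata` module reports the general category of each code point. The sketch below counts the categories in one sample key; the two-letter codes `Nd`, `Lu`, `Pc` and `Pd` correspond to the Decimal Number, Uppercase Letter, Connector Punctuation and Dash Punctuation categories listed in this report.

```python
import unicodedata
from collections import Counter

key = "BE_IW16-0511"  # first sample row of the key column

# unicodedata.category returns the two-letter general category code.
category_counts = Counter(unicodedata.category(ch) for ch in key)

for code, count in category_counts.most_common():
    print(code, count, f"{count / len(key):.1%}")
```

Every key has the same 12-character layout, so the per-key shares (6 of 12 digits, 4 of 12 uppercase letters) carry over to the column-level percentages.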

Unique

Unique: 1104
Unique (%): 100.0%

Sample

1st row: BE_IW16-0511
2nd row: BE_IW16-1049
3rd row: BE_IW16-1050
4th row: BE_IW16-1051
5th row: BE_IW16-1052
Value                 Count  Frequency (%)
be_iw16-0511          1      0.1%
be_iw16-0968          1      0.1%
be_iw16-0963          1      0.1%
be_iw16-0964          1      0.1%
be_iw16-0965          1      0.1%
be_iw16-0966          1      0.1%
be_iw16-0967          1      0.1%
be_iw16-0960          1      0.1%
be_iw16-0970          1      0.1%
be_iw16-0959          1      0.1%
Other values (1094)   1094   99.1%
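The frequency tables in this report list the ten most common values and roll everything else into an "Other values (n)" row. A minimal sketch of that layout, assuming the percentage is simply count divided by total; `frequency_table` is a hypothetical helper, and the generated keys are stand-ins for the real column (only their number, 1104 distinct values, comes from the report).

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (label, count, share) rows with a trailing 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    rows = [(value, count, count / total) for value, count in counts.most_common(top)]
    shown = sum(count for _, count, _ in rows)
    remaining_distinct = len(counts) - len(rows)
    if remaining_distinct > 0:
        other = total - shown
        rows.append((f"Other values ({remaining_distinct})", other, other / total))
    return rows


# Hypothetical stand-in for the key column: 1104 distinct identifiers.
keys = [f"be_iw16-{i:04d}" for i in range(1, 1105)]
table = frequency_table(keys)

for label, count, share in table:
    print(f"{label:<22}{count:>6}{share:>8.1%}")
```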

Most occurring characters

Value              Count  Frequency (%)
1                  1535   11.6%
6                  1424   10.7%
0                  1422   10.7%
B                  1104   8.3%
E                  1104   8.3%
_                  1104   8.3%
I                  1104   8.3%
W                  1104   8.3%
-                  1104   8.3%
2                  321    2.4%
Other values (6)   1922   14.5%

Most occurring categories

Value                   Count  Frequency (%)
Decimal Number          6624   50.0%
Uppercase Letter        4416   33.3%
Connector Punctuation   1104   8.3%
Dash Punctuation        1104   8.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      1535   23.2%
6      1424   21.5%
0      1422   21.5%
2      321    4.8%
4      321    4.8%
3      321    4.8%
5      320    4.8%
9      320    4.8%
8      320    4.8%
7      320    4.8%

Uppercase Letter
Value  Count  Frequency (%)
B      1104   25.0%
E      1104   25.0%
I      1104   25.0%
W      1104   25.0%

Connector Punctuation
Value  Count  Frequency (%)
_      1104   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      1104   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  8832   66.7%
Latin   4416   33.3%

Most frequent character per script

Common
Value              Count  Frequency (%)
1                  1535   17.4%
6                  1424   16.1%
0                  1422   16.1%
_                  1104   12.5%
-                  1104   12.5%
2                  321    3.6%
4                  321    3.6%
3                  321    3.6%
5                  320    3.6%
9                  320    3.6%
Other values (2)   640    7.2%

Latin
Value  Count  Frequency (%)
B      1104   25.0%
E      1104   25.0%
I      1104   25.0%
W      1104   25.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13248  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
1                  1535   11.6%
6                  1424   10.7%
0                  1422   10.7%
B                  1104   8.3%
E                  1104   8.3%
_                  1104   8.3%
I                  1104   8.3%
W                  1104   8.3%
-                  1104   8.3%
2                  321    2.4%
Other values (6)   1922   14.5%

명칭
Text

Distinct: 706
Distinct (%): 63.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB

Length

Max length: 15
Median length: 14
Mean length: 3.745471
Min length: 1

Characters and Unicode

Total characters: 4135
Distinct characters: 521
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 522
Unique (%): 47.3%
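"Distinct" and "Unique" measure different things in this report: 706 different names occur in the column, but only 522 of them occur exactly once. The sketch below illustrates the difference on the ten names from the first rows of the Sample section, assuming that "Unique" counts values with a frequency of exactly one.

```python
from collections import Counter

# 명칭 values of rows 0-9 in the Sample section.
names = [
    "松林亭", "?村", "?村", "?代花?", "?代花?",
    "?代水?", "兄弟", "兄弟?排骨", "兄弟餐?", "兄弟餐?",
]

counts = Counter(names)
distinct = len(counts)                                 # different values
unique = sum(1 for c in counts.values() if c == 1)     # values seen exactly once

print(f"distinct: {distinct} ({distinct / len(names):.1%})")
print(f"unique:   {unique} ({unique / len(names):.1%})")
```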

Sample

1st row: 松林亭
2nd row: ?村
3rd row: ?村
4th row: ?代花?
5th row: ?代花?
ValueCountFrequency (%)
31
 
2.8%
20
 
1.8%
全州餐 19
 
1.7%
11
 
1.0%
村小屋 11
 
1.0%
10
 
0.9%
南原泥 9
 
0.8%
麻浦排骨 8
 
0.7%
老村子 8
 
0.7%
柳?家 7
 
0.6%
Other values (667) 974
87.9%

Most occurring characters

ValueCountFrequency (%)
? 1159
28.0%
111
 
2.7%
93
 
2.2%
87
 
2.1%
86
 
2.1%
67
 
1.6%
54
 
1.3%
46
 
1.1%
36
 
0.9%
36
 
0.9%
Other values (511) 2360
57.1%

Most occurring categories

Value               Count  Frequency (%)
Other Letter        2780   67.2%
Other Punctuation   1159   28.0%
Lowercase Letter    107    2.6%
Uppercase Letter    33     0.8%
Close Punctuation   20     0.5%
Open Punctuation    20     0.5%
Space Separator     10     0.2%
Dash Punctuation    3      0.1%
Decimal Number      3      0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
111
 
4.0%
93
 
3.3%
87
 
3.1%
86
 
3.1%
67
 
2.4%
54
 
1.9%
46
 
1.7%
36
 
1.3%
36
 
1.3%
36
 
1.3%
Other values (468) 2128
76.5%
Lowercase Letter
Value              Count  Frequency (%)
a                  19     17.8%
i                  17     15.9%
r                  9      8.4%
n                  8      7.5%
o                  8      7.5%
m                  6      5.6%
g                  6      5.6%
s                  6      5.6%
u                  5      4.7%
y                  4      3.7%
Other values (8)   19     17.8%

Uppercase Letter
Value              Count  Frequency (%)
G                  6      18.2%
M                  5      15.2%
K                  3      9.1%
D                  3      9.1%
Y                  3      9.1%
P                  2      6.1%
L                  2      6.1%
B                  2      6.1%
A                  2      6.1%
R                  1      3.0%
Other values (4)   4      12.1%

Decimal Number
Value  Count  Frequency (%)
6      1      33.3%
1      1      33.3%
9      1      33.3%
Close Punctuation
ValueCountFrequency (%)
) 17
85.0%
3
 
15.0%
Open Punctuation
ValueCountFrequency (%)
( 17
85.0%
3
 
15.0%
Space Separator
ValueCountFrequency (%)
7
70.0%
  3
30.0%
Other Punctuation
Value  Count  Frequency (%)
?      1159   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Han     2767   66.9%
Common  1215   29.4%
Latin   140    3.4%
Hangul  13     0.3%

Most frequent character per script

Han
ValueCountFrequency (%)
111
 
4.0%
93
 
3.4%
87
 
3.1%
86
 
3.1%
67
 
2.4%
54
 
2.0%
46
 
1.7%
36
 
1.3%
36
 
1.3%
36
 
1.3%
Other values (463) 2115
76.4%
Latin
Value               Count  Frequency (%)
a                   19     13.6%
i                   17     12.1%
r                   9      6.4%
n                   8      5.7%
o                   8      5.7%
m                   6      4.3%
g                   6      4.3%
s                   6      4.3%
G                   6      4.3%
M                   5      3.6%
Other values (22)   50     35.7%
Common
ValueCountFrequency (%)
? 1159
95.4%
) 17
 
1.4%
( 17
 
1.4%
7
 
0.6%
3
 
0.2%
3
 
0.2%
- 3
 
0.2%
  3
 
0.2%
6 1
 
0.1%
1 1
 
0.1%
Hangul
ValueCountFrequency (%)
9
69.2%
1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%

Most occurring blocks

Value                   Count  Frequency (%)
CJK                     2762   66.8%
ASCII                   1346   32.6%
Hangul                  13     0.3%
None                    9      0.2%
CJK Compat Ideographs   5      0.1%
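A Unicode block is a contiguous range of code points, so block membership can be checked by comparing a character's code point against range boundaries. The sketch below covers only the four named blocks in the table above. Mapping the report's short labels to Basic Latin, CJK Unified Ideographs, Hangul Syllables and CJK Compatibility Ideographs is an assumption, and `block_of` is a hypothetical helper; anything outside those ranges falls back to "None".

```python
from collections import Counter

# (label used in this report, first code point, last code point)
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),                  # Basic Latin
    ("CJK", 0x4E00, 0x9FFF),                    # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),                 # Hangul Syllables
    ("CJK Compat Ideographs", 0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


def block_of(ch):
    """Return the block label for one character, or 'None' if not listed."""
    code_point = ord(ch)
    for label, first, last in BLOCKS:
        if first <= code_point <= last:
            return label
    return "None"


# One name from the Sample section; the trailing "?" is an ASCII character.
block_counts = Counter(block_of(ch) for ch in "兄弟餐?")
print(block_counts)
```

The literal "?" characters in the data land in the ASCII block, which is why ASCII accounts for almost a third of the characters in a column of mostly Han names.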

Most frequent character per block

ASCII
ValueCountFrequency (%)
? 1159
86.1%
a 19
 
1.4%
i 17
 
1.3%
) 17
 
1.3%
( 17
 
1.3%
r 9
 
0.7%
n 8
 
0.6%
o 8
 
0.6%
7
 
0.5%
m 6
 
0.4%
Other values (30) 79
 
5.9%
CJK
ValueCountFrequency (%)
111
 
4.0%
93
 
3.4%
87
 
3.1%
86
 
3.1%
67
 
2.4%
54
 
2.0%
46
 
1.7%
36
 
1.3%
36
 
1.3%
36
 
1.3%
Other values (459) 2110
76.4%
Hangul
ValueCountFrequency (%)
9
69.2%
1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%
None
ValueCountFrequency (%)
3
33.3%
3
33.3%
  3
33.3%
CJK Compat Ideographs
ValueCountFrequency (%)
2
40.0%
1
20.0%
1
20.0%
1
20.0%

행정시
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB
首?特?市  1104

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 首?特?市
2nd row: 首?特?市
3rd row: 首?特?市
4th row: 首?特?市
5th row: 首?特?市

Common Values

Value      Count  Frequency (%)
首?特?市  1104   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
首?特?市  1104   100.0%

행정구
Categorical

Distinct: 25
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB
江南?  153
瑞草?  93
?路?  82
江北?  67
中?  58
Other values (20)  651

Length

Max length: 4
Median length: 3
Mean length: 3.0452899
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 中浪?
2nd row: 江??
3rd row: 瑞草?
4th row: ?路?
5th row: 九老?

Common Values

Value               Count  Frequency (%)
江南?              153    13.9%
瑞草?              93     8.4%
?路?               82     7.4%
江北?              67     6.1%
中?                58     5.3%
松坡?              57     5.2%
麻浦?              49     4.4%
永登浦?            45     4.1%
冠岳?              43     3.9%
城北?              42     3.8%
Other values (15)   415    37.6%

Length

Histogram of lengths of the category

행정동
Text

Distinct: 300
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB

Length

Max length: 11
Median length: 4
Mean length: 3.7798913
Min length: 2

Characters and Unicode

Total characters: 4173
Distinct characters: 181
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 92
Unique (%): 8.3%

Sample

1st row: 面牧4洞
2nd row: 高?1洞
3rd row: 瑞草3洞
4th row: 嘉?洞
5th row: 九老1洞
ValueCountFrequency (%)
牛耳洞 31
 
2.8%
三1洞 25
 
2.3%
路1.2.3.4街洞 25
 
2.3%
23
 
2.1%
瑞草3洞 20
 
1.8%
淸潭洞 20
 
1.8%
2洞 20
 
1.8%
谷洞 17
 
1.5%
明洞 16
 
1.4%
1洞 14
 
1.3%
Other values (287) 893
80.9%

Most occurring characters

ValueCountFrequency (%)
1104
26.5%
? 566
 
13.6%
1 254
 
6.1%
2 213
 
5.1%
3 101
 
2.4%
. 85
 
2.0%
4 74
 
1.8%
71
 
1.7%
58
 
1.4%
51
 
1.2%
Other values (171) 1596
38.2%

Most occurring categories

Value               Count  Frequency (%)
Other Letter        2843   68.1%
Decimal Number      679    16.3%
Other Punctuation   651    15.6%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1104
38.8%
71
 
2.5%
58
 
2.0%
51
 
1.8%
41
 
1.4%
40
 
1.4%
39
 
1.4%
39
 
1.4%
36
 
1.3%
32
 
1.1%
Other values (161) 1332
46.9%
Decimal Number
Value  Count  Frequency (%)
1      254    37.4%
2      213    31.4%
3      101    14.9%
4      74     10.9%
5      18     2.7%
6      13     1.9%
7      4      0.6%
8      2      0.3%

Other Punctuation
Value  Count  Frequency (%)
?      566    86.9%
.      85     13.1%

Most occurring scripts

Value   Count  Frequency (%)
Han     2826   67.7%
Common  1330   31.9%
Hangul  17     0.4%

Most frequent character per script

Han
ValueCountFrequency (%)
1104
39.1%
71
 
2.5%
58
 
2.1%
51
 
1.8%
41
 
1.5%
40
 
1.4%
39
 
1.4%
39
 
1.4%
36
 
1.3%
32
 
1.1%
Other values (158) 1315
46.5%
Common
Value  Count  Frequency (%)
?      566    42.6%
1      254    19.1%
2      213    16.0%
3      101    7.6%
.      85     6.4%
4      74     5.6%
5      18     1.4%
6      13     1.0%
7      4      0.3%
8      2      0.2%
Hangul
ValueCountFrequency (%)
11
64.7%
5
29.4%
1
 
5.9%

Most occurring blocks

Value   Count  Frequency (%)
CJK     2826   67.7%
ASCII   1330   31.9%
Hangul  17     0.4%

Most frequent character per block

CJK
ValueCountFrequency (%)
1104
39.1%
71
 
2.5%
58
 
2.1%
51
 
1.8%
41
 
1.5%
40
 
1.4%
39
 
1.4%
39
 
1.4%
36
 
1.3%
32
 
1.1%
Other values (158) 1315
46.5%
ASCII
Value  Count  Frequency (%)
?      566    42.6%
1      254    19.1%
2      213    16.0%
3      101    7.6%
.      85     6.4%
4      74     5.6%
5      18     1.4%
6      13     1.0%
7      4      0.3%
8      2      0.2%
Hangul
ValueCountFrequency (%)
11
64.7%
5
29.4%
1
 
5.9%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
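Behind both plots is a per-cell flag saying whether a value is missing. The sketch below builds that flag matrix and the per-column counts for three rows copied from the Sample section. It assumes that only `None` counts as missing (the literal "?" characters are ordinary data, not missing values), and `is_missing` is a hypothetical helper.

```python
# Column order assumed from the dataset description.
columns = ["키", "명칭", "행정시", "행정구", "행정동"]

# Three rows copied from the Sample section of this report.
rows = [
    ("BE_IW16-0511", "松林亭", "首?特?市", "中浪?", "面牧4洞"),
    ("BE_IW16-1049", "?村", "首?特?市", "江??", "高?1洞"),
    ("BE_IW16-1050", "?村", "首?特?市", "瑞草?", "瑞草3洞"),
]


def is_missing(cell):
    """Treat only None as missing (assumption for this sketch)."""
    return cell is None


# Nullity matrix: one row of booleans per record, True where a cell is missing.
nullity = [[is_missing(cell) for cell in row] for row in rows]

# Column-wise totals, the numbers a nullity bar chart would display.
missing_per_column = {
    name: sum(flags[i] for flags in nullity)
    for i, name in enumerate(columns)
}
missing_cells = sum(missing_per_column.values())

print(missing_per_column)
print("missing cells:", missing_cells)
```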

Sample

Row   키            명칭          행정시     행정구    행정동
0     BE_IW16-0511  松林亭        首?特?市  中浪?    面牧4洞
1     BE_IW16-1049  ?村           首?特?市  江??     高?1洞
2     BE_IW16-1050  ?村           首?特?市  瑞草?    瑞草3洞
3     BE_IW16-1051  ?代花?        首?特?市  ?路?     嘉?洞
4     BE_IW16-1052  ?代花?        首?特?市  九老?    九老1洞
5     BE_IW16-1053  ?代水?        首?特?市  中?      中林洞
6     BE_IW16-1054  兄弟          首?特?市  城??     ?水1街2洞
7     BE_IW16-1055  兄弟?排骨     首?特?市  江??     千?2洞
8     BE_IW16-1056  兄弟餐?       首?特?市  ?大??    踏十里2洞
9     BE_IW16-1057  兄弟餐?       首?特?市  江北?    松川洞

Row   키            명칭          행정시     행정구    행정동
1094  BE_IW16-0501  ?家           首?特?市  城北?    城北洞
1095  BE_IW16-0502  ??餐?         首?特?市  江北?    仁水洞
1096  BE_IW16-0503  松林巷        首?特?市  ?雀?     大方洞
1097  BE_IW16-0504  松田木炭排骨  首?特?市  江北?    松川洞
1098  BE_IW16-0505  松香          首?特?市  永登浦?  ?坪2洞
1099  BE_IW16-0506  松香          首?特?市  ?山?     元?路1洞
1100  BE_IW16-0507  高杆旅?       首?特?市  ?路?     ?化洞
1101  BE_IW16-0508  松潭泥??      首?特?市  麻浦?    延南洞
1102  BE_IW16-0509  ??餐?         首?特?市  ?大??    ?凉里洞
1103  BE_IW16-0510  松林?店       首?特?市  ?津?     紫?3洞