Overview

Dataset statistics

Number of variables: 6
Number of observations: 1494
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 71.6 KiB
Average record size in memory: 49.1 B
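
The derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming that the cell total is variables × observations and that 1 KiB is 1024 bytes; the variable names are illustrative and are not part of the report.

```python
# Values copied from the dataset statistics above.
n_variables = 6
n_observations = 1494
missing_cells = 1
total_size_kib = 71.6  # assumption: 1 KiB = 1024 bytes

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The single missing cell is far below the 0.1% display threshold, which is why the report shows "< 0.1%" rather than a rounded value.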

Variable types

Text: 3
Numeric: 1
Categorical: 2

Dataset

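Description: 키값 (key value), 등록번호 (registration number), 상호 (business name), 행정시 (administrative city), 행정구 (administrative district), 행정동 (administrative neighborhood)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-13041/S/1/datasetView.do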

Alerts

행정구 is highly overall correlated with 행정시 (High correlation)
행정시 is highly overall correlated with 행정구 (High correlation)
행정시 is highly imbalanced (98.6%) (Imbalance)
키값 has unique values (Unique)
등록번호 has unique values (Unique)
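
The imbalance figure can be approximated from the class counts. The sketch below assumes the score is one minus the Shannon entropy of the class frequencies, normalized by the maximum possible entropy for that number of classes. That formula is an assumption about how the profiler defines imbalance, and `imbalance_score` is a hypothetical helper, not a library function.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts.

    H is the base-2 Shannon entropy of the class frequencies and k is
    the number of classes. 0 means perfectly balanced, 1 means a single
    class holds every observation.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Counts for 행정시 as listed under its Common Values: 1491, 2 and 1.
score = imbalance_score([1491, 2, 1])
print(f"{score:.1%}")
```

Under this assumption the three counts for 행정시 give a score that rounds to the 98.6% shown in the alert.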

Reproduction

Analysis started: 2024-04-19 06:17:14.987129
Analysis finished: 2024-04-19 06:17:15.739689
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

키값
Text

UNIQUE 

Distinct: 1494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 20916
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
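
As an illustration of those character properties, the sketch below counts the Unicode general categories in one key using Python's standard `unicodedata` module. It is not the profiler's own implementation, and `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample row of 키값.
counts = category_counts("BE_LiST21-0936")
for code, n in sorted(counts.items()):
    print(code, n)
```

The codes map to the category names used in this report: `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Nd` is Decimal Number, `Pd` is Dash Punctuation and `Pc` is Connector Punctuation.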

Unique

Unique: 1494
Unique (%): 100.0%

Sample

1st row: BE_LiST21-0936
2nd row: BE_LiST21-0937
3rd row: BE_LiST21-0938
4th row: BE_LiST21-0939
5th row: BE_LiST21-0940

Value | Count | Frequency (%)
be_list21-0936 | 1 | 0.1%
be_list21-0673 | 1 | 0.1%
be_list21-0682 | 1 | 0.1%
be_list21-0681 | 1 | 0.1%
be_list21-0680 | 1 | 0.1%
be_list21-0679 | 1 | 0.1%
be_list21-0678 | 1 | 0.1%
be_list21-0677 | 1 | 0.1%
be_list21-0676 | 1 | 0.1%
be_list21-0675 | 1 | 0.1%
Other values (1484) | 1484 | 99.3%

Most occurring characters

Value | Count | Frequency (%)
1 | 2489 | 11.9%
2 | 1994 | 9.5%
0 | 1496 | 7.2%
B | 1494 | 7.1%
T | 1494 | 7.1%
E | 1494 | 7.1%
- | 1494 | 7.1%
S | 1494 | 7.1%
i | 1494 | 7.1%
L | 1494 | 7.1%
Other values (8) | 4479 | 21.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8964 | 42.9%
Uppercase Letter | 7470 | 35.7%
Dash Punctuation | 1494 | 7.1%
Lowercase Letter | 1494 | 7.1%
Connector Punctuation | 1494 | 7.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 2489
27.8%
2 1994
22.2%
0 1496
16.7%
3 500
 
5.6%
4 495
 
5.5%
6 399
 
4.5%
7 399
 
4.5%
5 399
 
4.5%
8 399
 
4.5%
9 394
 
4.4%
Uppercase Letter
ValueCountFrequency (%)
B 1494
20.0%
T 1494
20.0%
E 1494
20.0%
S 1494
20.0%
L 1494
20.0%
Dash Punctuation
ValueCountFrequency (%)
- 1494
100.0%
Lowercase Letter
ValueCountFrequency (%)
i 1494
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1494
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 11952 | 57.1%
Latin | 8964 | 42.9%

Most frequent character per script

Common
ValueCountFrequency (%)
1 2489
20.8%
2 1994
16.7%
0 1496
12.5%
- 1494
12.5%
_ 1494
12.5%
3 500
 
4.2%
4 495
 
4.1%
6 399
 
3.3%
7 399
 
3.3%
5 399
 
3.3%
Other values (2) 793
 
6.6%
Latin
ValueCountFrequency (%)
B 1494
16.7%
T 1494
16.7%
E 1494
16.7%
S 1494
16.7%
i 1494
16.7%
L 1494
16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20916 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 2489
11.9%
2 1994
9.5%
0 1496
 
7.2%
B 1494
 
7.1%
T 1494
 
7.1%
E 1494
 
7.1%
- 1494
 
7.1%
S 1494
 
7.1%
i 1494
 
7.1%
L 1494
 
7.1%
Other values (8) 4479
21.4%

등록번호
Real number (ℝ)

UNIQUE 

Distinct: 1494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2337.7544
Minimum: 1
Maximum: 9998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 184.3
Q1: 1364
Median: 2577.5
Q3: 3369.75
95-th percentile: 3988.7
Maximum: 9998
Range: 9997
Interquartile range (IQR): 2005.75

Descriptive statistics

Standard deviation: 1237.5948
Coefficient of variation (CV): 0.52939472
Kurtosis: -0.19297006
Mean: 2337.7544
Median Absolute Deviation (MAD): 947
Skewness: -0.22293602
Sum: 3492605
Variance: 1531640.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2688 | 1 | 0.1%
2783 | 1 | 0.1%
2349 | 1 | 0.1%
2312 | 1 | 0.1%
2330 | 1 | 0.1%
2326 | 1 | 0.1%
2315 | 1 | 0.1%
2298 | 1 | 0.1%
2279 | 1 | 0.1%
2281 | 1 | 0.1%
Other values (1484) | 1484 | 99.3%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
4 | 1 | 0.1%
16 | 1 | 0.1%
20 | 1 | 0.1%
21 | 1 | 0.1%
22 | 1 | 0.1%
24 | 1 | 0.1%
25 | 1 | 0.1%
27 | 1 | 0.1%

Value | Count | Frequency (%)
9998 | 1 | 0.1%
4138 | 1 | 0.1%
4137 | 1 | 0.1%
4136 | 1 | 0.1%
4135 | 1 | 0.1%
4133 | 1 | 0.1%
4132 | 1 | 0.1%
4127 | 1 | 0.1%
4126 | 1 | 0.1%
4125 | 1 | 0.1%

상호
Text

Distinct: 1384
Distinct (%): 92.6%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.6807229
Min length: 3

Characters and Unicode

Total characters: 11475
Distinct characters: 583
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1317
Unique (%): 88.2%

Sample

1st row: 世美整形外科?院
2nd row: UD江南牙科?院
3rd row: ?挺挺?院
4th row: JS美?院
5th row: 三星?耳鼻咽喉科

Value | Count | Frequency (%)
 | 18 | 1.2%
ud牙科?院 | 12 | 0.8%
整形外科?院 | 11 | 0.7%
牙科?院 | 9 | 0.6%
拉???院 | 5 | 0.3%
熙春??院 | 4 | 0.3%
美皮?科?院 | 4 | 0.3%
微笑?牙科?院 | 4 | 0.3%
挺挺?院 | 4 | 0.3%
the | 4 | 0.3%
Other values (1369) | 1443 | 95.1%

Most occurring characters

ValueCountFrequency (%)
? 3028
26.4%
1284
 
11.2%
915
 
8.0%
349
 
3.0%
332
 
2.9%
332
 
2.9%
291
 
2.5%
125
 
1.1%
108
 
0.9%
e 88
 
0.8%
Other values (573) 4623
40.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter | 6796 | 59.2%
Other Punctuation | 3060 | 26.7%
Uppercase Letter | 971 | 8.5%
Lowercase Letter | 503 | 4.4%
Decimal Number | 59 | 0.5%
Close Punctuation | 26 | 0.2%
Space Separator | 25 | 0.2%
Open Punctuation | 24 | 0.2%
Dash Punctuation | 6 | 0.1%
Math Symbol | 5 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1284
18.9%
915
 
13.5%
349
 
5.1%
332
 
4.9%
332
 
4.9%
291
 
4.3%
125
 
1.8%
108
 
1.6%
84
 
1.2%
84
 
1.2%
Other values (499) 2892
42.6%
Uppercase Letter
ValueCountFrequency (%)
S 81
 
8.3%
E 76
 
7.8%
I 66
 
6.8%
N 63
 
6.5%
A 60
 
6.2%
U 56
 
5.8%
M 50
 
5.1%
D 50
 
5.1%
L 48
 
4.9%
B 40
 
4.1%
Other values (16) 381
39.2%
Lowercase Letter
ValueCountFrequency (%)
e 88
17.5%
a 46
9.1%
i 46
9.1%
n 40
 
8.0%
r 35
 
7.0%
l 35
 
7.0%
o 31
 
6.2%
s 28
 
5.6%
h 22
 
4.4%
u 22
 
4.4%
Other values (15) 110
21.9%
Decimal Number
ValueCountFrequency (%)
1 11
18.6%
5 8
13.6%
6 8
13.6%
3 8
13.6%
8 6
10.2%
2 6
10.2%
4 4
 
6.8%
0 3
 
5.1%
7 3
 
5.1%
9 2
 
3.4%
Other Punctuation
ValueCountFrequency (%)
? 3028
99.0%
& 20
 
0.7%
' 6
 
0.2%
. 3
 
0.1%
: 1
 
< 0.1%
1
 
< 0.1%
, 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
25
96.2%
) 1
 
3.8%
Space Separator
ValueCountFrequency (%)
25
100.0%
Open Punctuation
ValueCountFrequency (%)
24
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%
Math Symbol
ValueCountFrequency (%)
+ 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han | 6788 | 59.2%
Common | 3205 | 27.9%
Latin | 1474 | 12.8%
Hangul | 8 | 0.1%

Most frequent character per script

Han
ValueCountFrequency (%)
1284
18.9%
915
 
13.5%
349
 
5.1%
332
 
4.9%
332
 
4.9%
291
 
4.3%
125
 
1.8%
108
 
1.6%
84
 
1.2%
84
 
1.2%
Other values (494) 2884
42.5%
Latin
ValueCountFrequency (%)
e 88
 
6.0%
S 81
 
5.5%
E 76
 
5.2%
I 66
 
4.5%
N 63
 
4.3%
A 60
 
4.1%
U 56
 
3.8%
M 50
 
3.4%
D 50
 
3.4%
L 48
 
3.3%
Other values (41) 836
56.7%
Common
ValueCountFrequency (%)
? 3028
94.5%
25
 
0.8%
25
 
0.8%
24
 
0.7%
& 20
 
0.6%
1 11
 
0.3%
5 8
 
0.2%
6 8
 
0.2%
3 8
 
0.2%
8 6
 
0.2%
Other values (13) 42
 
1.3%
Hangul
ValueCountFrequency (%)
2
25.0%
2
25.0%
2
25.0%
1
12.5%
1
12.5%

Most occurring blocks

ValueCountFrequency (%)
CJK | 6788 | 59.2%
ASCII | 4629 | 40.3%
None | 50 | 0.4%
Hangul | 8 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
? 3028
65.4%
e 88
 
1.9%
S 81
 
1.7%
E 76
 
1.6%
I 66
 
1.4%
N 63
 
1.4%
A 60
 
1.3%
U 56
 
1.2%
M 50
 
1.1%
D 50
 
1.1%
Other values (61) 1011
 
21.8%
CJK
ValueCountFrequency (%)
1284
18.9%
915
 
13.5%
349
 
5.1%
332
 
4.9%
332
 
4.9%
291
 
4.3%
125
 
1.8%
108
 
1.6%
84
 
1.2%
84
 
1.2%
Other values (494) 2884
42.5%
None
ValueCountFrequency (%)
25
50.0%
24
48.0%
1
 
2.0%
Hangul
ValueCountFrequency (%)
2
25.0%
2
25.0%
2
25.0%
1
12.5%
1
12.5%

행정시
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

首?特?市: 1491
京畿道: 2
<NA>: 1

Length

Max length: 5
Median length: 5
Mean length: 4.9966533
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 首?特?市
2nd row: 首?特?市
3rd row: 首?特?市
4th row: 首?特?市
5th row: 首?特?市

Common Values

Value | Count | Frequency (%)
首?特?市 | 1491 | 99.8%
京畿道 | 2 | 0.1%
<NA> | 1 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
首?特?市 | 1491 | 99.8%
京畿道 | 2 | 0.1%
na | 1 | 0.1%

행정구
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

江南?: 740
瑞草?: 193
中?: 94
永登浦?: 46
松坡?: 45
Other values (23): 376

Length

Max length: 7
Median length: 3
Mean length: 3.0046854
Min length: 2

Unique

Unique: 3
Unique (%): 0.2%

Sample

1st row: 江南?
2nd row: 江南?
3rd row: ?原?
4th row: 江南?
5th row: ?路?

Common Values

Value | Count | Frequency (%)
江南? | 740 | 49.5%
瑞草? | 193 | 12.9%
中? | 94 | 6.3%
永登浦? | 46 | 3.1%
松坡? | 45 | 3.0%
江西? | 40 | 2.7%
麻浦? | 33 | 2.2%
?大?? | 28 | 1.9%
?路? | 24 | 1.6%
冠岳? | 22 | 1.5%
Other values (18) | 229 | 15.3%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
江南 | 740 | 49.5%
瑞草 | 193 | 12.9%
 | 94 | 6.3%
永登浦 | 46 | 3.1%
松坡 | 45 | 3.0%
江西 | 40 | 2.7%
麻浦 | 33 | 2.2%
 | 28 | 1.9%
 | 24 | 1.6%
冠岳 | 22 | 1.5%
Other values (19) | 231 | 15.4%

행정동
Text

Distinct: 234
Distinct (%): 15.7%
Missing: 1
Missing (%): 0.1%
Memory size: 11.8 KiB

Length

Max length: 11
Median length: 4
Mean length: 3.7206966
Min length: 2

Characters and Unicode

Total characters: 5555
Distinct characters: 163
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 108
Unique (%): 7.2%

Sample

1st row: 新沙洞
2nd row: ?三1洞
3rd row: 上?6.7洞
4th row: ?三1洞
5th row: ?路1.2.3.4街洞

Value | Count | Frequency (%)
狎?亭洞 | 140 | 9.4%
三1洞 | 137 | 9.2%
新沙洞 | 123 | 8.2%
1洞 | 91 | 6.1%
淸潭洞 | 90 | 6.0%
瑞草4洞 | 75 | 5.0%
2洞 | 59 | 4.0%
明洞 | 56 | 3.8%
쒧院洞 | 28 | 1.9%
三成1洞 | 21 | 1.4%
Other values (222) | 673 | 45.1%

Most occurring characters

ValueCountFrequency (%)
1492
26.9%
? 869
15.6%
1 404
 
7.3%
2 204
 
3.7%
193
 
3.5%
169
 
3.0%
146
 
2.6%
140
 
2.5%
126
 
2.3%
122
 
2.2%
Other values (153) 1690
30.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter | 3774 | 67.9%
Other Punctuation | 927 | 16.7%
Decimal Number | 854 | 15.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1492
39.5%
193
 
5.1%
169
 
4.5%
146
 
3.9%
140
 
3.7%
126
 
3.3%
122
 
3.2%
122
 
3.2%
90
 
2.4%
90
 
2.4%
Other values (142) 1084
28.7%
Decimal Number
ValueCountFrequency (%)
1 404
47.3%
2 204
23.9%
4 121
 
14.2%
3 76
 
8.9%
6 22
 
2.6%
5 15
 
1.8%
7 10
 
1.2%
8 1
 
0.1%
0 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
? 869
93.7%
. 58
 
6.3%

Most occurring scripts

ValueCountFrequency (%)
Han | 3713 | 66.8%
Common | 1781 | 32.1%
Hangul | 61 | 1.1%

Most frequent character per script

Han
ValueCountFrequency (%)
1492
40.2%
193
 
5.2%
169
 
4.6%
146
 
3.9%
140
 
3.8%
126
 
3.4%
122
 
3.3%
122
 
3.3%
90
 
2.4%
90
 
2.4%
Other values (139) 1023
27.6%
Common
ValueCountFrequency (%)
? 869
48.8%
1 404
22.7%
2 204
 
11.5%
4 121
 
6.8%
3 76
 
4.3%
. 58
 
3.3%
6 22
 
1.2%
5 15
 
0.8%
7 10
 
0.6%
8 1
 
0.1%
Hangul
ValueCountFrequency (%)
53
86.9%
6
 
9.8%
2
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
CJK | 3711 | 66.8%
ASCII | 1781 | 32.1%
Hangul | 61 | 1.1%
CJK Compat Ideographs | 2 | < 0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
1492
40.2%
193
 
5.2%
169
 
4.6%
146
 
3.9%
140
 
3.8%
126
 
3.4%
122
 
3.3%
122
 
3.3%
90
 
2.4%
90
 
2.4%
Other values (138) 1021
27.5%
ASCII
ValueCountFrequency (%)
? 869
48.8%
1 404
22.7%
2 204
 
11.5%
4 121
 
6.8%
3 76
 
4.3%
. 58
 
3.3%
6 22
 
1.2%
5 15
 
0.8%
7 10
 
0.6%
8 1
 
0.1%
Hangul
ValueCountFrequency (%)
53
86.9%
6
 
9.8%
2
 
3.3%
CJK Compat Ideographs
ValueCountFrequency (%)
2
100.0%

Interactions


Correlations

 | 등록번호 | 행정시 | 행정구
등록번호 | 1.000 | 0.000 | 0.177
행정시 | 0.000 | 1.000 | 1.000
행정구 | 0.177 | 1.000 | 1.000

 | 행정구 | 행정시
행정구 | 1.000 | 0.992
행정시 | 0.992 | 1.000

 | 등록번호 | 행정시 | 행정구
등록번호 | 1.000 | 0.000 | 0.078
행정시 | 0.000 | 1.000 | 0.992
행정구 | 0.078 | 0.992 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 키값 | 등록번호 | 상호 | 행정시 | 행정구 | 행정동
0 | BE_LiST21-0936 | 2688 | 世美整形外科?院 | 首?特?市 | 江南? | 新沙洞
1 | BE_LiST21-0937 | 2645 | UD江南牙科?院 | 首?特?市 | 江南? | ?三1洞
2 | BE_LiST21-0938 | 2659 | ?挺挺?院 | 首?特?市 | ?原? | 上?6.7洞
3 | BE_LiST21-0939 | 2637 | JS美?院 | 首?特?市 | 江南? | ?三1洞
4 | BE_LiST21-0940 | 2689 | 三星?耳鼻咽喉科 | 首?特?市 | ?路? | ?路1.2.3.4街洞
5 | BE_LiST21-0941 | 2647 | 江南高??院 | 首?特?市 | 冠岳? | 幸?洞
6 | BE_LiST21-0942 | 2649 | 我的未?皮?科?院 | 首?特?市 | 永登浦? | 汝矣?洞
7 | BE_LiST21-0943 | 2681 | SEBARUN?院 | 首?特?市 | 江西? | 登村1洞
8 | BE_LiST21-0944 | 2690 | ?永?院 | 首?特?市 | 瑞草? | 瑞草4洞
9 | BE_LiST21-0945 | 2686 | 威尼斯牙科?院 | 首?特?市 | 中? | 光熙洞

 | 키값 | 등록번호 | 상호 | 행정시 | 행정구 | 행정동
1484 | BE_LiST21-0464 | 1429 | RAUM整形外科?院 | 首?特?市 | 江南? | 狎?亭洞
1485 | BE_LiST21-0465 | 1431 | UD牙科?院 | 首?特?市 | ?原? | 上?6.7洞
1486 | BE_LiST21-0466 | 1438 | 松坡第一?院 | 首?特?市 | 松坡? | 松坡1洞
1487 | BE_LiST21-0467 | 1448 | S普?普姿整形外科 | 首?特?市 | 江南? | ??1洞
1488 | BE_LiST21-0468 | 1451 | UD牙科?院 | 首?特?市 | 瑞草? | 瑞草2洞
1489 | BE_LiST21-0469 | 1452 | ?熙??院 | 首?特?市 | ?大?? | 祭基洞
1490 | BE_LiST21-0470 | 1456 | ???牙科?院 | 首?特?市 | 中? | 明洞
1491 | BE_LiST21-0471 | 1457 | UD牙科?院 | 首?特?市 | 永登浦? | 汝矣?洞
1492 | BE_LiST21-0472 | 1461 | UD牙科?院 | 首?特?市 | ?大?? | 典?2洞
1493 | BE_LiST21-0473 | 1463 | UD牙科?院 | 首?特?市 | 麻浦? | 西?洞