Overview

Dataset statistics

Number of variables: 1
Number of observations: 429
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 14
Duplicate rows (%): 3.3%
Total size in memory: 3.5 KiB
Average record size in memory: 8.3 B

Variable types

Text: 1

Dataset

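Description: Provides the names, in Chinese, of about 400 medical institutions located in Gangnam-gu. For details, please contact the Tourism Promotion Division of the Gangnam-gu Office.
Author: Gangnam-gu, Seoul Metropolitan City
URL: https://www.data.go.kr/data/15072594/fileData.do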

Alerts

Duplicates: Dataset has 14 (3.3%) duplicate rows

Reproduction

Analysis started: 2023-12-12 02:17:01.772929
Analysis finished: 2023-12-12 02:17:02.083649
Duration: 0.31 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Distinct: 400
Distinct (%): 93.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Length

Max length: 37
Median length: 25
Mean length: 7.0792541
Min length: 1
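The length statistics above can be recomputed from the raw column. The following is a minimal sketch, assuming lengths are counted in Unicode code points; it uses only the first five sample rows shown in this report, not the full 429 rows, and the helper name `length_stats` is hypothetical.

```python
import statistics

# First five sample rows from this report (an illustrative subset only).
names = ["多古整形外科", "光明眼科中心", "代美整形外科所", "迪斯整形外科", "MD 乳腺外科所"]


def length_stats(values):
    """Return max, median, mean and min of len(value), counted in code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


stats = length_stats(names)
print(stats)
```

Running the same function on the full column is a quick way to re-derive the reported figures. The reported median of 25 is worth rechecking in particular: a median of 25 would require at least 215 rows of 25 or more characters, which exceeds the reported total of 3037 characters.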

Characters and Unicode

Total characters: 3037
Distinct characters: 370
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
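As an illustration of how such character properties can be read, the sketch below counts the Unicode general category of each character in one sample row using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures. The standard library exposes general categories (`Lo` = Other Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator) but not scripts or blocks, which would need additional data.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One sample row from this report: two Latin capitals, a space, five Han characters.
counts = category_counts("MD 乳腺外科所")
print(counts)
```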

Unique

Unique: 386
Unique (%): 90.0%
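The percentages reported so far follow from four counts: 429 observations, 400 distinct values, 386 unique values, and 3037 total characters. The sketch below re-derives them; the variable and function names are illustrative. Reading "unique" as "values that occur exactly once" is an assumption, though it is consistent with the 14 duplicate rows reported in the overview.

```python
# Counts taken from this report.
n_rows = 429
distinct = 400
unique = 386
total_chars = 3037


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: "unique" means values occurring exactly once.
duplicated_values = distinct - unique      # distinct values that occur more than once
rows_with_duplicates = n_rows - unique     # rows holding one of those values

distinct_pct = pct(distinct, n_rows)
unique_pct = pct(unique, n_rows)
duplicate_pct = pct(duplicated_values, n_rows)
mean_length = round(total_chars / n_rows, 7)

print(duplicated_values, rows_with_duplicates)
print(distinct_pct, unique_pct, duplicate_pct, mean_length)
```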

Sample

1st row: 多古整形外科
2nd row: 光明眼科中心
3rd row: 代美整形外科所
4th row: 迪斯整形外科
5th row: MD 乳腺外科所
Value                 Count   Frequency (%)
(not shown)           15      2.7%
整形外科              11      2.0%
牙科所                9       1.6%
江南severance院       7       1.2%
整形外科所            6       1.1%
三星首中心            5       0.9%
眼科所                4       0.7%
皮所                  4       0.7%
spa                   3       0.5%
皮科所                3       0.5%
Other values (463)    493     88.0%
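The counts in the table above sum to 560 words, so each percentage is a count divided by 560 (for example, 15 / 560 ≈ 2.7%). A word-frequency table of this kind can be sketched as below, assuming words are obtained by lowercasing each value and splitting on whitespace. That tokenization is an assumption suggested by entries such as "spa" and "江南severance院", not a confirmed description of the tool. The helper name and the three-row input are illustrative.

```python
from collections import Counter


def word_frequencies(values):
    """Return (word, count, percent) tuples, most frequent first.

    Assumption: words are lowercased and split on whitespace.
    """
    words = [w for v in values for w in v.lower().split()]
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, c, round(100 * c / total, 1)) for w, c in counts.most_common()]


# Toy input built from names that appear in this report.
rows = word_frequencies(["MD 乳腺外科所", "整形外科", "整形外科"])
for word, count, percent in rows:
    print(word, count, f"{percent}%")
```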

Most occurring characters

Value                 Count   Frequency (%)
(not shown)           224     7.4%
(not shown)           189     6.2%
(space)               131     4.3%
(not shown)           127     4.2%
(not shown)           126     4.1%
(not shown)           125     4.1%
e                     91      3.0%
a                     58      1.9%
n                     54      1.8%
(not shown)           52      1.7%
Other values (360)    1860    61.2%

Most occurring categories

Value                 Count   Frequency (%)
Other Letter          1816    59.8%
Lowercase Letter      541     17.8%
Uppercase Letter      486     16.0%
Space Separator       131     4.3%
Other Punctuation     23      0.8%
Decimal Number        20      0.7%
Close Punctuation     7       0.2%
Open Punctuation      7       0.2%
Dash Punctuation      3       0.1%
Modifier Symbol       3       0.1%

Most frequent character per category

Other Letter
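Value                 Count   Frequency (%)
(not shown)           224     12.3%
(not shown)           189     10.4%
(not shown)           127     7.0%
(not shown)           126     6.9%
(not shown)           125     6.9%
(not shown)           52      2.9%
(not shown)           52      2.9%
(not shown)           39      2.1%
(not shown)           29      1.6%
(not shown)           28      1.5%
Other values (286)    825     45.4%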
Lowercase Letter
Value                 Count   Frequency (%)
e                     91      16.8%
a                     58      10.7%
n                     54      10.0%
o                     39      7.2%
i                     38      7.0%
r                     36      6.7%
l                     27      5.0%
u                     27      5.0%
m                     24      4.4%
h                     22      4.1%
Other values (16)     125     23.1%
Uppercase Letter
Value                 Count   Frequency (%)
S                     49      10.1%
A                     35      7.2%
E                     33      6.8%
N                     29      6.0%
O                     29      6.0%
M                     26      5.3%
I                     25      5.1%
L                     23      4.7%
B                     23      4.7%
C                     23      4.7%
Other values (16)     191     39.3%
Decimal Number
Value                 Count   Frequency (%)
2                     3       15.0%
1                     3       15.0%
8                     2       10.0%
6                     2       10.0%
0                     2       10.0%
3                     2       10.0%
4                     2       10.0%
9                     2       10.0%
7                     1       5.0%
5                     1       5.0%
Other Punctuation
Value                 Count   Frequency (%)
&                     9       39.1%
.                     7       30.4%
'                     4       17.4%
(not shown)           2       8.7%
/                     1       4.3%
Close Punctuation
Value                 Count   Frequency (%)
(not shown)           6       85.7%
)                     1       14.3%
Open Punctuation
Value                 Count   Frequency (%)
(not shown)           4       57.1%
(                     3       42.9%
Space Separator
Value                 Count   Frequency (%)
(space)               131     100.0%
Dash Punctuation
Value                 Count   Frequency (%)
-                     3       100.0%
Modifier Symbol
Value                 Count   Frequency (%)
`                     3       100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Han                   1775    58.4%
Latin                 1027    33.8%
Common                194     6.4%
Hangul                41      1.4%

Most frequent character per script

Han
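Value                 Count   Frequency (%)
(not shown)           224     12.6%
(not shown)           189     10.6%
(not shown)           127     7.2%
(not shown)           126     7.1%
(not shown)           125     7.0%
(not shown)           52      2.9%
(not shown)           52      2.9%
(not shown)           39      2.2%
(not shown)           29      1.6%
(not shown)           28      1.6%
Other values (264)    784     44.2%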
Latin
Value                 Count   Frequency (%)
e                     91      8.9%
a                     58      5.6%
n                     54      5.3%
S                     49      4.8%
o                     39      3.8%
i                     38      3.7%
r                     36      3.5%
A                     35      3.4%
E                     33      3.2%
N                     29      2.8%
Other values (42)     565     55.0%
Common
Value                 Count   Frequency (%)
(space)               131     67.5%
&                     9       4.6%
.                     7       3.6%
(not shown)           6       3.1%
'                     4       2.1%
(not shown)           4       2.1%
-                     3       1.5%
(                     3       1.5%
2                     3       1.5%
`                     3       1.5%
Other values (12)     21      10.8%
Hangul
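Value                 Count   Frequency (%)
(not shown)           6       14.6%
(not shown)           4       9.8%
(not shown)           4       9.8%
(not shown)           4       9.8%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           1       2.4%
Other values (12)     12      29.3%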

Most occurring blocks

Value                 Count   Frequency (%)
CJK                   1775    58.4%
ASCII                 1209    39.8%
Hangul                41      1.4%
None                  12      0.4%

Most frequent character per block

CJK
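Value                 Count   Frequency (%)
(not shown)           224     12.6%
(not shown)           189     10.6%
(not shown)           127     7.2%
(not shown)           126     7.1%
(not shown)           125     7.0%
(not shown)           52      2.9%
(not shown)           52      2.9%
(not shown)           39      2.2%
(not shown)           29      1.6%
(not shown)           28      1.6%
Other values (264)    784     44.2%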
ASCII
Value                 Count   Frequency (%)
(space)               131     10.8%
e                     91      7.5%
a                     58      4.8%
n                     54      4.5%
S                     49      4.1%
o                     39      3.2%
i                     38      3.1%
r                     36      3.0%
A                     35      2.9%
E                     33      2.7%
Other values (61)     645     53.3%
Hangul
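Value                 Count   Frequency (%)
(not shown)           6       14.6%
(not shown)           4       9.8%
(not shown)           4       9.8%
(not shown)           4       9.8%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           2       4.9%
(not shown)           1       2.4%
Other values (12)     12      29.3%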
None
Value                 Count   Frequency (%)
(not shown)           6       50.0%
(not shown)           4       33.3%
(not shown)           2       16.7%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index   기관명
0       多古整形外科
1       光明眼科中心
2       代美整形外科所
3       迪斯整形外科
4       MD 乳腺外科所
5       半塘整形外科
6       卓越整形外科
7       歌整形外科
8       牙科所SOJOONG
9       For.B整形外科

Last rows
Index   기관명
419     (not shown)
420     江南家庭旅
421     格拉莫斯酒店
422     酒店
423     特里酒店
424     最佳西方精品江南酒店
425     首大使江南富特酒店
426     首思酒店
427     JBIS酒店
428     克伍德豪景世中心

Duplicate rows

Most frequently occurring

Index   기관명                # duplicates
10      江南Severance院       7
1       三星首中心            5
6       整形外科              5
4       (not shown)           3
7       整形外科所            3
9       氏口腔院              3
11      牙科所                3
0       CHA大CHA江南院        2
2       形象整形外科          2
3       我整形外科            2
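
A table like the one above can be approximated by counting how often each value occurs and keeping only values seen more than once. The sketch below uses a small toy list built from names in this report; the list and the helper name `duplicate_table` are illustrative, and the result is not the real data.

```python
from collections import Counter


def duplicate_table(values):
    """Return (value, occurrences) for values seen more than once, most frequent first."""
    counts = Counter(values)
    return [(v, c) for v, c in counts.most_common() if c > 1]


# Toy input, not the real column.
toy = ["整形外科", "整形外科", "整形外科所", "多古整形外科", "整形外科所", "整形外科"]
dupes = duplicate_table(toy)
print(dupes)
```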