Overview

Dataset statistics

Number of variables: 1
Number of observations: 424
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 7
Duplicate rows (%): 1.7%
Total size in memory: 3.4 KiB
Average record size in memory: 8.3 B

Variable types

Text: 1

Dataset

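Description: Provides institution-name data (in Japanese) for some 400 medical institutions located in Gangnam-gu, Seoul. For details, please contact the Tourism Promotion Division of Gangnam-gu, Seoul.
Author: Gangnam-gu, Seoul
URL: https://www.data.go.kr/data/15072593/fileData.do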

Alerts

Dataset has 7 (1.7%) duplicate rows (alert type: Duplicates)

Reproduction

Analysis started: 2023-12-12 18:18:53.250558
Analysis finished: 2023-12-12 18:18:53.496533
Duration: 0.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Distinct: 411
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

Length

Max length: 49
Median length: 26
Mean length: 9.8632075
Min length: 3
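
The mean length can be cross-checked against two other figures in this report: the total character count (4182, listed under Characters and Unicode) divided by the number of observations (424). The sketch below is only that arithmetic check; the function and variable names are illustrative and not part of ydata-profiling.

```python
def mean_length(total_characters: int, n_observations: int) -> float:
    """Average number of characters per value."""
    return total_characters / n_observations


# Figures copied from this report.
total_characters = 4182
n_observations = 424

mean = mean_length(total_characters, n_observations)
print(round(mean, 7))
```

The result agrees with the reported mean length to the precision shown. The median cannot be checked the same way, since it depends on the individual values.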

Characters and Unicode

Total characters: 4182
Distinct characters: 271
Distinct categories: 11
Distinct scripts: 6
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
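
As a rough illustration of how such character properties can be read programmatically, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not how ydata-profiling itself computes these figures; the helper name `category_counts` is hypothetical, and the two input strings are taken from the sample rows of this report. `unicodedata.category` returns two-letter codes, where `Lu` corresponds to "Uppercase Letter", `Ll` to "Lowercase Letter", `Lo` to "Other Letter" and `Po` to "Other Punctuation".

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (two-letter codes) over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two institution names from the sample rows of this report.
sample = ["TAKO美容外科", "For.B美容外科"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

The standard library exposes general categories but not scripts or blocks, so reproducing the script and block breakdowns would need an additional data source.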

Unique

Unique: 404
Unique (%): 95.3%

Sample

1st row: TAKO美容外科
2nd row: バルグン聖母眼科
3rd row: 現代美美容形成外科
4th row: 潭女神美容外科
5th row: MDクリニック 胸整形センタ

Value                Count  Frequency (%)
美容形成外科          31     5.0%
(unreadable)         19     3.0%
科院                 16     2.6%
皮膚科院             12     1.9%
病院                 10     1.6%
韓院                 9      1.4%
gangnam              8      1.3%
眼科院               6      1.0%
kim                  6      1.0%
the                  5      0.8%
Other values (459)   504    80.5%

Most occurring characters

Value                Count  Frequency (%)
(unreadable)         281    6.7%
(space)              217    5.2%
(unreadable)         167    4.0%
(unreadable)         141    3.4%
(unreadable)         133    3.2%
(unreadable)         130    3.1%
(unreadable)         126    3.0%
e                    107    2.6%
n                    85     2.0%
a                    81     1.9%
Other values (261)   2714   64.9%

Most occurring categories

Value                Count  Frequency (%)
Other Letter         2646   63.3%
Lowercase Letter     795    19.0%
Uppercase Letter     442    10.6%
Space Separator      220    5.3%
Other Punctuation    23     0.5%
Decimal Number       20     0.5%
Dash Punctuation     12     0.3%
Open Punctuation     8      0.2%
Close Punctuation    8      0.2%
Modifier Symbol      7      0.2%

Most frequent character per category

Other Letter
Value                Count  Frequency (%)
(unreadable)         281    10.6%
(unreadable)         167    6.3%
(unreadable)         141    5.3%
(unreadable)         133    5.0%
(unreadable)         130    4.9%
(unreadable)         126    4.8%
(unreadable)         74     2.8%
(unreadable)         65     2.5%
(unreadable)         59     2.2%
(unreadable)         56     2.1%
Other values (182)   1414   53.4%

Uppercase Letter
Value                Count  Frequency (%)
S                    45     10.2%
C                    29     6.6%
A                    29     6.6%
M                    23     5.2%
E                    23     5.2%
L                    23     5.2%
O                    22     5.0%
I                    21     4.8%
K                    21     4.8%
B                    20     4.5%
Other values (19)    186    42.1%

Lowercase Letter
Value                Count  Frequency (%)
e                    107    13.5%
n                    85     10.7%
a                    81     10.2%
o                    70     8.8%
i                    53     6.7%
u                    44     5.5%
l                    44     5.5%
g                    43     5.4%
r                    43     5.4%
m                    43     5.4%
Other values (14)    182    22.9%

Decimal Number
Value                Count  Frequency (%)
3                    3      15.0%
1                    3      15.0%
2                    3      15.0%
0                    2      10.0%
6                    2      10.0%
8                    2      10.0%
9                    2      10.0%
7                    1      5.0%
5                    1      5.0%
4                    1      5.0%

Other Punctuation
Value                Count  Frequency (%)
.                    10     43.5%
&                    7      30.4%
/                    2      8.7%
(unreadable)         1      4.3%
·                    1      4.3%
:                    1      4.3%
(unreadable)         1      4.3%

Space Separator
Value                    Count  Frequency (%)
(space)                  217    98.6%
(other space character)  3      1.4%

Open Punctuation
Value                Count  Frequency (%)
(                    7      87.5%
(unreadable)         1      12.5%

Close Punctuation
Value                Count  Frequency (%)
)                    7      87.5%
(unreadable)         1      12.5%

Dash Punctuation
Value                Count  Frequency (%)
-                    12     100.0%

Modifier Symbol
Value                Count  Frequency (%)
`                    7      100.0%

Math Symbol
Value                Count  Frequency (%)
+                    1      100.0%

Most occurring scripts

Value                Count  Frequency (%)
Han                  1348   32.2%
Katakana             1274   30.5%
Latin                1237   29.6%
Common               299    7.1%
Hangul               16     0.4%
Hiragana             8      0.2%

Most frequent character per script

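Han
Value                Count  Frequency (%)
(unreadable)         281    20.8%
(unreadable)         167    12.4%
(unreadable)         141    10.5%
(unreadable)         130    9.6%
(unreadable)         126    9.3%
(unreadable)         56     4.2%
(unreadable)         53     3.9%
(unreadable)         44     3.3%
(unreadable)         38     2.8%
(unreadable)         38     2.8%
Other values (86)    274    20.3%

Katakana
Value                Count  Frequency (%)
(unreadable)         133    10.4%
(unreadable)         74     5.8%
(unreadable)         65     5.1%
(unreadable)         59     4.6%
(unreadable)         51     4.0%
(unreadable)         48     3.8%
(unreadable)         45     3.5%
(unreadable)         40     3.1%
(unreadable)         37     2.9%
(unreadable)         37     2.9%
Other values (64)    685    53.8%

Latin
Value                Count  Frequency (%)
e                    107    8.6%
n                    85     6.9%
a                    81     6.5%
o                    70     5.7%
i                    53     4.3%
S                    45     3.6%
u                    44     3.6%
l                    44     3.6%
g                    43     3.5%
r                    43     3.5%
Other values (43)    622    50.3%

Common
Value                    Count  Frequency (%)
(space)                  217    72.6%
-                        12     4.0%
.                        10     3.3%
(                        7      2.3%
)                        7      2.3%
`                        7      2.3%
&                        7      2.3%
3                        3      1.0%
1                        3      1.0%
(other space character)  3      1.0%
Other values (16)        23     7.7%

Hangul
Value                Count  Frequency (%)
(unreadable)         2      12.5%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
Other values (5)     5      31.2%

Hiragana
Value                Count  Frequency (%)
(unreadable)         2      25.0%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%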

Most occurring blocks

Value                Count  Frequency (%)
ASCII                1524   36.4%
CJK                  1348   32.2%
Katakana             1274   30.5%
Hangul               16     0.4%
None                 12     0.3%
Hiragana             8      0.2%

Most frequent character per block

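CJK
Value                Count  Frequency (%)
(unreadable)         281    20.8%
(unreadable)         167    12.4%
(unreadable)         141    10.5%
(unreadable)         130    9.6%
(unreadable)         126    9.3%
(unreadable)         56     4.2%
(unreadable)         53     3.9%
(unreadable)         44     3.3%
(unreadable)         38     2.8%
(unreadable)         38     2.8%
Other values (86)    274    20.3%

ASCII
Value                Count  Frequency (%)
(space)              217    14.2%
e                    107    7.0%
n                    85     5.6%
a                    81     5.3%
o                    70     4.6%
i                    53     3.5%
S                    45     3.0%
u                    44     2.9%
l                    44     2.9%
g                    43     2.8%
Other values (60)    735    48.2%

Katakana
Value                Count  Frequency (%)
(unreadable)         133    10.4%
(unreadable)         74     5.8%
(unreadable)         65     5.1%
(unreadable)         59     4.6%
(unreadable)         51     4.0%
(unreadable)         48     3.8%
(unreadable)         45     3.5%
(unreadable)         40     3.1%
(unreadable)         37     2.9%
(unreadable)         37     2.9%
Other values (64)    685    53.8%

None
Value                    Count  Frequency (%)
(other space character)  3      25.0%
(unreadable)             2      16.7%
(unreadable)             1      8.3%
(unreadable)             1      8.3%
(unreadable)             1      8.3%
·                        1      8.3%
(unreadable)             1      8.3%
(unreadable)             1      8.3%
(unreadable)             1      8.3%

Hiragana
Value                Count  Frequency (%)
(unreadable)         2      25.0%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%
(unreadable)         1      12.5%

Hangul
Value                Count  Frequency (%)
(unreadable)         2      12.5%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
(unreadable)         1      6.2%
Other values (5)     5      31.2%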

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

Index  기관명 (institution name)
0      TAKO美容外科
1      バルグン聖母眼科
2      現代美美容形成外科
3      潭女神美容外科
4      MDクリニック 胸整形センタ
5      バタン美容外科
6      プリミア美容外科
7      オペラ美容整形外科
8      ソジュン科
9      For.B美容外科

Last rows

Index  기관명 (institution name)
414    Foreheal
415    カンナムファミリホテル
416    Grammos Hotel
417    ホテル.ザ.デザイナス
418    ツリアホテル
419    ベストウエスタンプレミア江南ホテル
420    パクハイヤット
421    ホテルリッツカルトン ソウル
422    JBIS Hotel
423    オクウッドプリミアコエックスセンタ

Duplicate rows

Most frequently occurring

Index  기관명 (institution name)  # duplicates
1      サムソンソウル病院          5
5      江南セブランス病院          4
4      江南セブランス 病院         3
0      グロビ美容外科              2
2      ハヌルチェ韓院              2
3      ラビアン美容外科            2
6      自生韓方病院                2
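
The duplicate table can be reconciled with the headline figures of this report (424 observations, 411 distinct, 404 unique, 7 duplicate rows). The sketch below rebuilds a column with the same multiplicities and recomputes those numbers. It rests on two assumptions that the report does not state: that "Duplicate rows" counts distinct repeated values rather than surplus rows, and that every name not listed in the table occurs exactly once. The names in the code are placeholders, since only the occurrence counts matter here.

```python
from collections import Counter

# Occurrence counts of the seven repeated names, copied from the duplicate table.
repeated_counts = [5, 4, 3, 2, 2, 2, 2]
n_observations = 424

# Placeholder names (hypothetical): one label per repeated value, then singletons.
names = []
for i, count in enumerate(repeated_counts):
    names += [f"repeated_{i}"] * count
names += [f"single_{i}" for i in range(n_observations - len(names))]

freq = Counter(names)
n_distinct = len(freq)                                       # different values
n_unique = sum(1 for c in freq.values() if c == 1)           # values seen once
n_duplicate_groups = sum(1 for c in freq.values() if c > 1)  # values seen more than once
duplicate_pct = round(100 * n_duplicate_groups / n_observations, 1)

print(n_distinct, n_unique, n_duplicate_groups, duplicate_pct)
```

Under these assumptions the recomputed values match the reported Distinct, Unique, and Duplicate rows figures. Two of the listed entries differ only by an internal space, so whitespace normalization before counting would merge them and change these totals.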