Overview

Dataset statistics

Number of variables: 6
Number of observations: 1494
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 71.6 KiB
Average record size in memory: 49.1 B
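
The derived figures in this block follow from the raw counts. The sketch below is a quick sanity check, assuming 1 KiB = 1024 bytes and taking the rounded 71.6 KiB as reported, so the results are approximate. The variable names are illustrative.

```python
# Raw figures as reported in the dataset statistics.
n_variables = 6
n_observations = 1494
missing_cells = 1
total_size_kib = 71.6  # rounded value from the report

# Share of missing cells among all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record (assuming 1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(total_cells)                  # 8964
print(f"{missing_pct:.3f}%")        # 0.011%, reported as "< 0.1%"
print(f"{avg_record_bytes:.1f} B")  # 49.1 B
```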

Variable types

Text: 3
Numeric: 1
Categorical: 2

Dataset

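Description: 키값 (key value), 등록번호 (registration number), 상호 (business name), 행정시 (administrative city), 행정구 (administrative district), 행정동 (administrative neighborhood)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-13040/S/1/datasetView.do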

Alerts

행정구 is highly overall correlated with 행정시 (High correlation)
행정시 is highly overall correlated with 행정구 (High correlation)
행정시 is highly imbalanced (98.6%) (Imbalance)
키값 has unique values (Unique)
등록번호 has unique values (Unique)
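
The 98.6% imbalance figure for 행정시 is consistent with a normalized-entropy score, 1 - H / log2(k), computed over the three category counts listed for that variable in this report (Seoul 1491, Gyeonggi-do 2, <NA> 1). The sketch below assumes that definition. The function name is illustrative, and this is not claimed to be the library's exact implementation.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for a uniform split, close to 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; zero counts contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Category counts for 행정시 as reported: Seoul, Gyeonggi-do, <NA>.
score = imbalance_score([1491, 2, 1])
print(f"{score:.1%}")  # 98.6%
```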

Reproduction

Analysis started: 2024-04-06 11:50:52.545190
Analysis finished: 2024-04-06 11:50:54.858602
Duration: 2.31 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
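
The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps as reported in the Reproduction block.
started = datetime.fromisoformat("2024-04-06 11:50:52.545190")
finished = datetime.fromisoformat("2024-04-06 11:50:54.858602")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 2.31 seconds
```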

Variables

키값
Text

UNIQUE 

Distinct: 1494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 20916
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
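
As an illustration of those character properties, the sketch below tallies the Unicode general category of each character in the first sample key, using Python's standard `unicodedata` module. If every key has the same shape as this one, multiplying by the 1494 rows gives 8964 decimal digits and 7470 uppercase letters, which agrees with the category counts reported for this variable.

```python
import unicodedata
from collections import Counter

value = "BE_LiST21-0595"  # first sample row of 키값

# General category of each character:
# Lu = uppercase letter, Ll = lowercase letter, Nd = decimal number,
# Pd = dash punctuation, Pc = connector punctuation.
categories = Counter(unicodedata.category(ch) for ch in value)
print(dict(categories))  # {'Lu': 5, 'Pc': 1, 'Ll': 1, 'Nd': 6, 'Pd': 1}
```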

Unique

Unique: 1494
Unique (%): 100.0%

Sample

1st row: BE_LiST21-0595
2nd row: BE_LiST21-0596
3rd row: BE_LiST21-0597
4th row: BE_LiST21-0598
5th row: BE_LiST21-0599

Value | Count | Frequency (%)
be_list21-0595 | 1 | 0.1%
be_list21-0403 | 1 | 0.1%
be_list21-0412 | 1 | 0.1%
be_list21-0411 | 1 | 0.1%
be_list21-0410 | 1 | 0.1%
be_list21-0409 | 1 | 0.1%
be_list21-0408 | 1 | 0.1%
be_list21-0407 | 1 | 0.1%
be_list21-0406 | 1 | 0.1%
be_list21-0405 | 1 | 0.1%
Other values (1484) | 1484 | 99.3%

Most occurring characters

Value | Count | Frequency (%)
1 | 2489 | 11.9%
2 | 1994 | 9.5%
0 | 1496 | 7.2%
B | 1494 | 7.1%
T | 1494 | 7.1%
E | 1494 | 7.1%
- | 1494 | 7.1%
S | 1494 | 7.1%
i | 1494 | 7.1%
L | 1494 | 7.1%
Other values (8) | 4479 | 21.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8964 | 42.9%
Uppercase Letter | 7470 | 35.7%
Dash Punctuation | 1494 | 7.1%
Lowercase Letter | 1494 | 7.1%
Connector Punctuation | 1494 | 7.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2489 | 27.8%
2 | 1994 | 22.2%
0 | 1496 | 16.7%
3 | 500 | 5.6%
4 | 495 | 5.5%
5 | 399 | 4.5%
6 | 399 | 4.5%
8 | 399 | 4.5%
7 | 399 | 4.5%
9 | 394 | 4.4%

Uppercase Letter
Value | Count | Frequency (%)
B | 1494 | 20.0%
T | 1494 | 20.0%
E | 1494 | 20.0%
S | 1494 | 20.0%
L | 1494 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1494 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
i | 1494 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1494 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 11952 | 57.1%
Latin | 8964 | 42.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 2489 | 20.8%
2 | 1994 | 16.7%
0 | 1496 | 12.5%
- | 1494 | 12.5%
_ | 1494 | 12.5%
3 | 500 | 4.2%
4 | 495 | 4.1%
5 | 399 | 3.3%
6 | 399 | 3.3%
8 | 399 | 3.3%
Other values (2) | 793 | 6.6%

Latin
Value | Count | Frequency (%)
B | 1494 | 16.7%
T | 1494 | 16.7%
E | 1494 | 16.7%
S | 1494 | 16.7%
i | 1494 | 16.7%
L | 1494 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20916 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 2489 | 11.9%
2 | 1994 | 9.5%
0 | 1496 | 7.2%
B | 1494 | 7.1%
T | 1494 | 7.1%
E | 1494 | 7.1%
- | 1494 | 7.1%
S | 1494 | 7.1%
i | 1494 | 7.1%
L | 1494 | 7.1%
Other values (8) | 4479 | 21.4%

등록번호
Real number (ℝ)

UNIQUE 

Distinct: 1494
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2337.7544
Minimum: 1
Maximum: 9998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 184.3
Q1: 1364
Median: 2577.5
Q3: 3369.75
95-th percentile: 3988.7
Maximum: 9998
Range: 9997
Interquartile range (IQR): 2005.75

Descriptive statistics

Standard deviation: 1237.5948
Coefficient of variation (CV): 0.52939472
Kurtosis: -0.19297006
Mean: 2337.7544
Median Absolute Deviation (MAD): 947
Skewness: -0.22293602
Sum: 3492605
Variance: 1531640.9
Monotonicity: Not monotonic
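
Several of these statistics are related by simple identities: mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, and range = maximum - minimum. The sketch below re-derives them from the reported figures. The inputs are the rounded values printed in the report, so agreement is only up to rounding.

```python
# Figures as reported for 등록번호.
n = 1494
total = 3492605
std = 1237.5948
q1, q3 = 1364, 3369.75
minimum, maximum = 1, 9998

mean = total / n
cv = std / mean
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean:.4f}")          # mean=2337.7544
print(f"cv={cv:.4f}")              # cv=0.5294
print(f"variance={variance:.1f}")  # variance=1531640.9
print(f"iqr={iqr}")                # iqr=2005.75
print(f"range={value_range}")      # range=9997
```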
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1884 | 1 | 0.1%
1160 | 1 | 0.1%
3258 | 1 | 0.1%
3256 | 1 | 0.1%
1202 | 1 | 0.1%
1197 | 1 | 0.1%
1195 | 1 | 0.1%
1186 | 1 | 0.1%
1175 | 1 | 0.1%
1173 | 1 | 0.1%
Other values (1484) | 1484 | 99.3%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
4 | 1 | 0.1%
16 | 1 | 0.1%
20 | 1 | 0.1%
21 | 1 | 0.1%
22 | 1 | 0.1%
24 | 1 | 0.1%
25 | 1 | 0.1%
27 | 1 | 0.1%

Value | Count | Frequency (%)
9998 | 1 | 0.1%
4138 | 1 | 0.1%
4137 | 1 | 0.1%
4136 | 1 | 0.1%
4135 | 1 | 0.1%
4133 | 1 | 0.1%
4132 | 1 | 0.1%
4127 | 1 | 0.1%
4126 | 1 | 0.1%
4125 | 1 | 0.1%

상호
Text

Distinct: 1415
Distinct (%): 94.7%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Length

Max length: 75
Median length: 59
Mean length: 24.661312
Min length: 6

Characters and Unicode

Total characters: 36844
Distinct characters: 70
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1358
Unique (%): 90.9%
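
For 상호 the report gives Distinct as 1415 and Unique as 1358. One reading consistent with those numbers is that Distinct counts the different values, while Unique counts only the values that occur exactly once. The sketch below illustrates that distinction on made-up names; the list is hypothetical and not taken from the dataset.

```python
from collections import Counter

# Hypothetical business names; "A Clinic" appears twice.
names = ["A Clinic", "B Hospital", "A Clinic", "C Dental Clinic"]

counts = Counter(names)
n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)  # 3 2
```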

Sample

1st row: Seoul Lee Geon Dental Clinic
2nd row: Mokhuri Oriental Medicine Hospital
3rd row: My D Dermatology Clinic
4th row: Ever M Dental Clinic
5th row: Seoul Mirae Hospital

Value | Count | Frequency (%)
clinic | 957 | 18.0%
surgery | 358 | 6.7%
plastic | 324 | 6.1%
dental | 286 | 5.4%
medicine | 180 | 3.4%
oriental | 154 | 2.9%
hospital | 151 | 2.8%
dermatology | 109 | 2.0%
seoul | 79 | 1.5%
gangnam | 62 | 1.2%
Other values (1301) | 2666 | 50.1%
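
The table above counts lowercase words rather than whole names. A minimal sketch of that kind of tally, applied to the five sample rows only, is shown below. It assumes simple lowercasing and whitespace splitting, which may differ from the tokenization the profiling tool actually applies.

```python
from collections import Counter

# The five sample rows of 상호 listed in this section.
names = [
    "Seoul Lee Geon Dental Clinic",
    "Mokhuri Oriental Medicine Hospital",
    "My D Dermatology Clinic",
    "Ever M Dental Clinic",
    "Seoul Mirae Hospital",
]

# Lowercase each name, split on whitespace, then tally the tokens.
word_counts = Counter(word for name in names for word in name.lower().split())
print(word_counts.most_common(1))  # [('clinic', 3)]
```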

Most occurring characters

Value | Count | Frequency (%)
(space) | 3954 | 10.7%
i | 3753 | 10.2%
n | 3100 | 8.4%
e | 2855 | 7.7%
l | 2572 | 7.0%
a | 2342 | 6.4%
c | 1714 | 4.7%
r | 1609 | 4.4%
o | 1578 | 4.3%
t | 1483 | 4.0%
Other values (60) | 11884 | 32.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 27091 | 73.5%
Uppercase Letter | 5494 | 14.9%
Space Separator | 3954 | 10.7%
Other Punctuation | 153 | 0.4%
Dash Punctuation | 96 | 0.3%
Decimal Number | 44 | 0.1%
Open Punctuation | 6 | < 0.1%
Close Punctuation | 6 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 3753 | 13.9%
n | 3100 | 11.4%
e | 2855 | 10.5%
l | 2572 | 9.5%
a | 2342 | 8.6%
c | 1714 | 6.3%
r | 1609 | 5.9%
o | 1578 | 5.8%
t | 1483 | 5.5%
g | 1150 | 4.2%
Other values (16) | 4935 | 18.2%

Uppercase Letter
Value | Count | Frequency (%)
C | 1170 | 21.3%
S | 724 | 13.2%
D | 530 | 9.6%
M | 448 | 8.2%
P | 428 | 7.8%
H | 286 | 5.2%
O | 227 | 4.1%
G | 200 | 3.6%
Y | 149 | 2.7%
B | 130 | 2.4%
Other values (15) | 1202 | 21.9%

Decimal Number
Value | Count | Frequency (%)
6 | 9 | 20.5%
3 | 8 | 18.2%
1 | 6 | 13.6%
2 | 5 | 11.4%
5 | 5 | 11.4%
7 | 4 | 9.1%
8 | 3 | 6.8%
0 | 2 | 4.5%
9 | 2 | 4.5%

Other Punctuation
Value | Count | Frequency (%)
' | 53 | 34.6%
& | 49 | 32.0%
. | 32 | 20.9%
, | 11 | 7.2%
? | 7 | 4.6%
: | 1 | 0.7%

Space Separator
Value | Count | Frequency (%)
(space) | 3954 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 96 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 32585 | 88.4%
Common | 4259 | 11.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 3753 | 11.5%
n | 3100 | 9.5%
e | 2855 | 8.8%
l | 2572 | 7.9%
a | 2342 | 7.2%
c | 1714 | 5.3%
r | 1609 | 4.9%
o | 1578 | 4.8%
t | 1483 | 4.6%
C | 1170 | 3.6%
Other values (41) | 10409 | 31.9%

Common
Value | Count | Frequency (%)
(space) | 3954 | 92.8%
- | 96 | 2.3%
' | 53 | 1.2%
& | 49 | 1.2%
. | 32 | 0.8%
, | 11 | 0.3%
6 | 9 | 0.2%
3 | 8 | 0.2%
? | 7 | 0.2%
( | 6 | 0.1%
Other values (9) | 34 | 0.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 36844 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 3954 | 10.7%
i | 3753 | 10.2%
n | 3100 | 8.4%
e | 2855 | 7.7%
l | 2572 | 7.0%
a | 2342 | 6.4%
c | 1714 | 4.7%
r | 1609 | 4.4%
o | 1578 | 4.3%
t | 1483 | 4.0%
Other values (60) | 11884 | 32.3%

행정시
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Seoul: 1491
Gyeonggi-do: 2
<NA>: 1

Length

Max length: 11
Median length: 5
Mean length: 5.0073628
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: Seoul
2nd row: Seoul
3rd row: Seoul
4th row: Seoul
5th row: Seoul

Common Values

Value | Count | Frequency (%)
Seoul | 1491 | 99.8%
Gyeonggi-do | 2 | 0.1%
<NA> | 1 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
seoul | 1491 | 99.8%
gyeonggi-do | 2 | 0.1%
na | 1 | 0.1%

행정구
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 11.8 KiB

Gangnam-gu: 740
Seocho-gu: 193
Jung-gu: 94
Yeongdeungpo-gu: 46
Songpa-gu: 45
Other values (23): 376

Length

Max length: 22
Median length: 10
Mean length: 9.8440428
Min length: 4

Unique

Unique: 3
Unique (%): 0.2%

Sample

1st row: Seocho-gu
2nd row: Gangnam-gu
3rd row: Gangdong-gu
4th row: Gangnam-gu
5th row: Gangnam-gu

Common Values

Value | Count | Frequency (%)
Gangnam-gu | 740 | 49.5%
Seocho-gu | 193 | 12.9%
Jung-gu | 94 | 6.3%
Yeongdeungpo-gu | 46 | 3.1%
Songpa-gu | 45 | 3.0%
Gangseo-gu | 40 | 2.7%
Mapo-gu | 33 | 2.2%
Dongdaemun-gu | 28 | 1.9%
Jongno-gu | 24 | 1.6%
Gwanak-gu | 22 | 1.5%
Other values (18) | 229 | 15.3%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
gangnam-gu | 740 | 49.5%
seocho-gu | 193 | 12.9%
jung-gu | 94 | 6.3%
yeongdeungpo-gu | 46 | 3.1%
songpa-gu | 45 | 3.0%
gangseo-gu | 40 | 2.7%
mapo-gu | 33 | 2.2%
dongdaemun-gu | 28 | 1.9%
jongno-gu | 24 | 1.6%
gwanak-gu | 22 | 1.5%
Other values (19) | 231 | 15.4%

행정동
Text

Distinct: 246
Distinct (%): 16.5%
Missing: 1
Missing (%): 0.1%
Memory size: 11.8 KiB

Length

Max length: 20
Median length: 18
Mean length: 12.606162
Min length: 8

Characters and Unicode

Total characters: 18821
Distinct characters: 49
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 118
Unique (%): 7.9%

Sample

1st row: Seocho4-dong
2nd row: Dogok1-dong
3rd row: Seongnae2-dong
4th row: Nonhyeon1-dong
5th row: Samseong1-dong

Value | Count | Frequency (%)
apgujeong-dong | 140 | 9.4%
yeoksam1-dong | 137 | 9.2%
sinsa-dong | 123 | 8.2%
nonhyeon1-dong | 91 | 6.1%
cheongdam-dong | 90 | 6.0%
seocho4-dong | 75 | 5.0%
myeong-dong | 56 | 3.8%
nonhyeon2-dong | 56 | 3.8%
jamwon-dong | 28 | 1.9%
samseong1-dong | 21 | 1.4%
Other values (236) | 676 | 45.3%

Most occurring characters

Value | Count | Frequency (%)
o | 2990 | 15.9%
n | 2850 | 15.1%
g | 2356 | 12.5%
d | 1625 | 8.6%
- | 1493 | 7.9%
e | 1040 | 5.5%
a | 894 | 4.8%
h | 498 | 2.6%
S | 414 | 2.2%
1 | 405 | 2.2%
Other values (39) | 4256 | 22.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 14923 | 79.3%
Dash Punctuation | 1493 | 7.9%
Uppercase Letter | 1493 | 7.9%
Decimal Number | 854 | 4.5%
Other Punctuation | 58 | 0.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 2990 | 20.0%
n | 2850 | 19.1%
g | 2356 | 15.8%
d | 1625 | 10.9%
e | 1040 | 7.0%
a | 894 | 6.0%
h | 498 | 3.3%
s | 397 | 2.7%
m | 397 | 2.7%
i | 305 | 2.0%
Other values (11) | 1571 | 10.5%

Uppercase Letter
Value | Count | Frequency (%)
S | 414 | 27.7%
Y | 199 | 13.3%
N | 158 | 10.6%
A | 149 | 10.0%
C | 108 | 7.2%
J | 101 | 6.8%
D | 85 | 5.7%
M | 83 | 5.6%
H | 57 | 3.8%
G | 56 | 3.8%
Other values (7) | 83 | 5.6%

Decimal Number
Value | Count | Frequency (%)
1 | 405 | 47.4%
2 | 202 | 23.7%
4 | 121 | 14.2%
3 | 77 | 9.0%
6 | 22 | 2.6%
5 | 15 | 1.8%
7 | 10 | 1.2%
0 | 1 | 0.1%
8 | 1 | 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 1493 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 58 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 16416 | 87.2%
Common | 2405 | 12.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 2990 | 18.2%
n | 2850 | 17.4%
g | 2356 | 14.4%
d | 1625 | 9.9%
e | 1040 | 6.3%
a | 894 | 5.4%
h | 498 | 3.0%
S | 414 | 2.5%
s | 397 | 2.4%
m | 397 | 2.4%
Other values (28) | 2955 | 18.0%

Common
Value | Count | Frequency (%)
- | 1493 | 62.1%
1 | 405 | 16.8%
2 | 202 | 8.4%
4 | 121 | 5.0%
3 | 77 | 3.2%
. | 58 | 2.4%
6 | 22 | 0.9%
5 | 15 | 0.6%
7 | 10 | 0.4%
0 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18821 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 2990 | 15.9%
n | 2850 | 15.1%
g | 2356 | 12.5%
d | 1625 | 8.6%
- | 1493 | 7.9%
e | 1040 | 5.5%
a | 894 | 4.8%
h | 498 | 2.6%
S | 414 | 2.2%
1 | 405 | 2.2%
Other values (39) | 4256 | 22.6%

Interactions


Correlations

Variable | 등록번호 | 행정시 | 행정구
등록번호 | 1.000 | 0.000 | 0.177
행정시 | 0.000 | 1.000 | 1.000
행정구 | 0.177 | 1.000 | 1.000

Variable | 행정구 | 행정시
행정구 | 1.000 | 0.992
행정시 | 0.992 | 1.000

Variable | 등록번호 | 행정시 | 행정구
등록번호 | 1.000 | 0.000 | 0.078
행정시 | 0.000 | 1.000 | 0.992
행정구 | 0.078 | 0.992 | 1.000
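
The extracted text does not say which coefficient each matrix uses. For two categorical columns such as 행정시 and 행정구, one common association measure is Cramér's V. The sketch below implements the plain (uncorrected) version on toy data, as an assumption for illustration only. It is not expected to reproduce the 0.992 above, and "Seongnam-si" is a made-up district value.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    row = Counter(xs)
    col = Counter(ys)
    joint = Counter(zip(xs, ys))
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data (hypothetical): the district fully determines the city.
city = ["Seoul", "Seoul", "Seoul", "Gyeonggi-do"]
district = ["Gangnam-gu", "Seocho-gu", "Gangnam-gu", "Seongnam-si"]
print(round(cramers_v(city, district), 3))  # 1.0
```

When every district belongs to exactly one city, the district column predicts the city column perfectly and V reaches 1, which matches the intuition behind the high-correlation alert.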

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

Index | 키값 | 등록번호 | 상호 | 행정시 | 행정구 | 행정동
0 | BE_LiST21-0595 | 1884 | Seoul Lee Geon Dental Clinic | Seoul | Seocho-gu | Seocho4-dong
1 | BE_LiST21-0596 | 1901 | Mokhuri Oriental Medicine Hospital | Seoul | Gangnam-gu | Dogok1-dong
2 | BE_LiST21-0597 | 1902 | My D Dermatology Clinic | Seoul | Gangdong-gu | Seongnae2-dong
3 | BE_LiST21-0598 | 1904 | Ever M Dental Clinic | Seoul | Gangnam-gu | Nonhyeon1-dong
4 | BE_LiST21-0599 | 1908 | Seoul Mirae Hospital | Seoul | Gangnam-gu | Samseong1-dong
5 | BE_LiST21-0600 | 1910 | Lee Beom-geun Dental Clinic | Seoul | Jung-gu | Myeong-dong
6 | BE_LiST21-0601 | 1915 | Geon Rehabilitation Clinic | Seoul | Seocho-gu | Seocho2-dong
7 | BE_LiST21-0602 | 1920 | Cham Teunteun Hospital | Seoul | Guro-gu | Guro3-dong
8 | BE_LiST21-0603 | 1921 | UD Dental Clinic | Seoul | Seongbuk-gu | Dongseon-dong
9 | BE_LiST21-0604 | 1922 | Yeoreobun Hospital | Seoul | Gangnam-gu | Nonhyeon1-dong

Last rows

Index | 키값 | 등록번호 | 상호 | 행정시 | 행정구 | 행정동
1484 | BE_LiST21-1485 | 4121 | S-Top Plastic Surgery | Seoul | Gangseo-gu | Hwagok3-dong
1485 | BE_LiST21-1486 | 4127 | Seoul Surgical Hospital | Seoul | Songpa-gu | Garakbon-dong
1486 | BE_LiST21-1487 | 4123 | Sebarun Hospital | Seoul | Seocho-gu | Seocho3-dong
1487 | BE_LiST21-1488 | 4124 | Bareuda Yu Oriental Medicine Clinic | Seoul | Jongno-gu | Jongno1.2.3.4ga-dong
1488 | BE_LiST21-1489 | 4125 | CY ENT Center | Seoul | Gangnam-gu | Yeoksam1-dong
1489 | BE_LiST21-1490 | 4132 | System Plastic Surgery | Seoul | Gangnam-gu | Cheongdam-dong
1490 | BE_LiST21-1491 | 4136 | Cheongdam Best Internal Medicine Clinic | Seoul | Gangnam-gu | Cheongdam-dong
1491 | BE_LiST21-1492 | 4133 | Caheum Pain Clinic | Seoul | Mapo-gu | Dohwa-dong
1492 | BE_LiST21-1493 | 4137 | TS Plastic Surgery | Seoul | Gangnam-gu | Nonhyeon1-dong
1493 | BE_LiST21-1494 | 4138 | WidWinDermatology Clinic | Seoul | Gangnam-gu | Apgujeong-dong