Overview

Dataset statistics

Number of variables: 5
Number of observations: 500
Missing cells: 12
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.6 KiB
Average record size in memory: 42.3 B
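
The missing-cell percentage can be checked from the counts above. This is a minimal sketch of the arithmetic, assuming the report divides the number of missing cells by variables × observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 5
n_observations = 500
missing_cells = 12

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of cells that are missing

print(f"{missing_pct:.1f}%")
```

Under that assumption the unrounded share is 0.48%, which the report displays rounded to one decimal place.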

Variable types

Numeric: 2
Categorical: 1
Text: 2

Alerts

CTY_NM has constant value "shanghai" (Constant)
RSTRNT_TEL_NO has 12 (2.4%) missing values (Missing)
OVSEA_RSTRNT_ID has unique values (Unique)
RSTRNT_NM has unique values (Unique)

Reproduction

Analysis started: 2023-12-10 09:47:56.739278
Analysis finished: 2023-12-10 09:47:59.244145
Duration: 2.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

OVSEA_RSTRNT_ID
Real number (ℝ)

UNIQUE 

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 501247.87
Minimum: 500000
Maximum: 502598
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 500000
5-th percentile: 500155.5
Q1: 500628
Median: 501149
Q3: 501890.5
95-th percentile: 502465.65
Maximum: 502598
Range: 2598
Interquartile range (IQR): 1262.5
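
The range and interquartile range follow directly from the quantiles listed above. A short sketch of that arithmetic, using only figures copied from this table (the variable names are illustrative):

```python
# Quantile figures copied from the table above.
minimum, q1, q3, maximum = 500000, 500628, 501890.5, 502598

value_range = maximum - minimum   # spread between the two extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(value_range, iqr)
```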

Descriptive statistics

Standard deviation: 734.62287
Coefficient of variation (CV): 0.001465588
Kurtosis: -1.1529263
Mean: 501247.87
Median Absolute Deviation (MAD): 617
Skewness: 0.1272818
Sum: 2.5062393 × 10^8
Variance: 539670.77
Monotonicity: Strictly increasing
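
Several of the descriptive statistics above are related to one another. The sketch below recomputes the coefficient of variation, the variance, and the sum from the reported mean, standard deviation, and observation count. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). Because the inputs are the rounded figures from the report, the results can differ from the reported values in the last digits.

```python
# Figures copied from the report (already rounded for display).
mean = 501247.87
std = 734.62287
n = 500

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all values

print(f"{cv:.9f}", f"{variance:.2f}", f"{total:.7e}")
```
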
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
500000 | 1 | 0.2%
501638 | 1 | 0.2%
501699 | 1 | 0.2%
501684 | 1 | 0.2%
501683 | 1 | 0.2%
501680 | 1 | 0.2%
501678 | 1 | 0.2%
501662 | 1 | 0.2%
501653 | 1 | 0.2%
501647 | 1 | 0.2%
Other values (490) | 490 | 98.0%

Minimum 10 values
Value | Count | Frequency (%)
500000 | 1 | 0.2%
500001 | 1 | 0.2%
500003 | 1 | 0.2%
500018 | 1 | 0.2%
500019 | 1 | 0.2%
500028 | 1 | 0.2%
500030 | 1 | 0.2%
500046 | 1 | 0.2%
500054 | 1 | 0.2%
500057 | 1 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
502598 | 1 | 0.2%
502595 | 1 | 0.2%
502586 | 1 | 0.2%
502583 | 1 | 0.2%
502581 | 1 | 0.2%
502578 | 1 | 0.2%
502569 | 1 | 0.2%
502567 | 1 | 0.2%
502542 | 1 | 0.2%
502538 | 1 | 0.2%

CTY_NM
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
shanghai: 500

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: shanghai
2nd row: shanghai
3rd row: shanghai
4th row: shanghai
5th row: shanghai

Common Values

Value | Count | Frequency (%)
shanghai | 500 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
shanghai | 500 | 100.0%

RSTRNT_NM
Text

UNIQUE 

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 22
Median length: 16
Mean length: 8.316
Min length: 2

Characters and Unicode

Total characters: 4158
Distinct characters: 748
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
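
As an illustration of those character properties, the sketch below counts Unicode general categories for one of the restaurant names in this report, using Python's standard `unicodedata` module. It is only an approximation of the idea: how ydata-profiling itself derives the category, script, and block figures is not shown in this report.

```python
import unicodedata
from collections import Counter

# One of the sample restaurant names shown in this report.
name = "Wagas 沃歌斯(中信泰富店)"

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter), "Lu" (Uppercase Letter), "Ps" (Open Punctuation).
categories = Counter(unicodedata.category(ch) for ch in name)

for code, count in categories.most_common():
    print(code, count)
```

The Han characters fall under "Lo" (Other Letter), which matches the category that dominates the tables in this section.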

Unique

Unique: 500
Unique (%): 100.0%

Sample

1st row: 阿凡提大饭店
2nd row: Sasha's萨莎
3rd row: 1221餐馆
4th row: Va Bene华万意意大利餐厅
5th row: Wagas 沃歌斯(中信泰富店)
ValueCountFrequency (%)
阿凡提大饭店 1
 
0.2%
避风塘(金玉兰 1
 
0.2%
台园圆圆香云吞(徐汇店 1
 
0.2%
新香园港式茶餐厅(乌鲁木齐中路店 1
 
0.2%
华华川菜馆 1
 
0.2%
苹果花园 1
 
0.2%
新吉士酒楼(虹桥店 1
 
0.2%
恒隆酒楼(春申店 1
 
0.2%
小四川鱼庄(招远路总店 1
 
0.2%
黑三娘(仙霞店 1
 
0.2%
Other values (502) 502
98.0%

Most occurring characters

ValueCountFrequency (%)
320
 
7.7%
( 272
 
6.5%
) 272
 
6.5%
90
 
2.2%
83
 
2.0%
68
 
1.6%
65
 
1.6%
62
 
1.5%
61
 
1.5%
49
 
1.2%
Other values (738) 2816
67.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 3454
83.1%
Open Punctuation 272
 
6.5%
Close Punctuation 272
 
6.5%
Lowercase Letter 82
 
2.0%
Uppercase Letter 51
 
1.2%
Space Separator 12
 
0.3%
Decimal Number 10
 
0.2%
Other Punctuation 4
 
0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
320
 
9.3%
90
 
2.6%
83
 
2.4%
68
 
2.0%
65
 
1.9%
62
 
1.8%
61
 
1.8%
49
 
1.4%
43
 
1.2%
43
 
1.2%
Other values (685) 2570
74.4%
Lowercase Letter
ValueCountFrequency (%)
a 15
18.3%
e 12
14.6%
i 9
11.0%
s 8
9.8%
n 6
 
7.3%
c 4
 
4.9%
l 4
 
4.9%
o 3
 
3.7%
h 3
 
3.7%
t 3
 
3.7%
Other values (12) 15
18.3%
Uppercase Letter
ValueCountFrequency (%)
C 7
13.7%
T 4
 
7.8%
O 4
 
7.8%
D 3
 
5.9%
S 3
 
5.9%
A 3
 
5.9%
N 3
 
5.9%
K 3
 
5.9%
W 3
 
5.9%
L 2
 
3.9%
Other values (11) 16
31.4%
Decimal Number
ValueCountFrequency (%)
5 3
30.0%
0 2
20.0%
2 2
20.0%
1 2
20.0%
8 1
 
10.0%
Open Punctuation
ValueCountFrequency (%)
( 272
100.0%
Close Punctuation
ValueCountFrequency (%)
) 272
100.0%
Space Separator
ValueCountFrequency (%)
12
100.0%
Other Punctuation
ValueCountFrequency (%)
' 4
100.0%
Dash Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 3454
83.1%
Common 571
 
13.7%
Latin 133
 
3.2%

Most frequent character per script

Han
ValueCountFrequency (%)
320
 
9.3%
90
 
2.6%
83
 
2.4%
68
 
2.0%
65
 
1.9%
62
 
1.8%
61
 
1.8%
49
 
1.4%
43
 
1.2%
43
 
1.2%
Other values (685) 2570
74.4%
Latin
ValueCountFrequency (%)
a 15
 
11.3%
e 12
 
9.0%
i 9
 
6.8%
s 8
 
6.0%
C 7
 
5.3%
n 6
 
4.5%
c 4
 
3.0%
T 4
 
3.0%
O 4
 
3.0%
l 4
 
3.0%
Other values (33) 60
45.1%
Common
ValueCountFrequency (%)
( 272
47.6%
) 272
47.6%
12
 
2.1%
' 4
 
0.7%
5 3
 
0.5%
0 2
 
0.4%
2 2
 
0.4%
1 2
 
0.4%
8 1
 
0.2%
1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
CJK 3454
83.1%
ASCII 703
 
16.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
320
 
9.3%
90
 
2.6%
83
 
2.4%
68
 
2.0%
65
 
1.9%
62
 
1.8%
61
 
1.8%
49
 
1.4%
43
 
1.2%
43
 
1.2%
Other values (685) 2570
74.4%
ASCII
ValueCountFrequency (%)
( 272
38.7%
) 272
38.7%
a 15
 
2.1%
e 12
 
1.7%
12
 
1.7%
i 9
 
1.3%
s 8
 
1.1%
C 7
 
1.0%
n 6
 
0.9%
' 4
 
0.6%
Other values (42) 86
 
12.2%
Punctuation
ValueCountFrequency (%)
1
100.0%

RSTRNT_TEL_NO
Real number (ℝ)

MISSING 

Distinct: 480
Distinct (%): 98.4%
Missing: 12
Missing (%): 2.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.5137406 × 10^10
Minimum: 2.1223518 × 10^9
Maximum: 6.8868889 × 10^11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 2.1223518 × 10^9
5-th percentile: 2.1529264 × 10^9
Q1: 2.1621781 × 10^9
Median: 2.1632305 × 10^9
Q3: 2.1646594 × 10^9
95-th percentile: 5.0471235 × 10^11
Maximum: 6.8868889 × 10^11
Range: 6.8656654 × 10^11
Interquartile range (IQR): 2481305

Descriptive statistics

Standard deviation: 1.3610182 × 10^11
Coefficient of variation (CV): 3.8734168
Kurtosis: 14.035169
Mean: 3.5137406 × 10^10
Median Absolute Deviation (MAD): 1160998.5
Skewness: 3.9727987
Sum: 1.7147054 × 10^13
Variance: 1.8523706 × 10^22
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2153965000 | 3 | 0.6%
4008209777 | 3 | 0.6%
4001917917 | 2 | 0.4%
2163739458 | 2 | 0.4%
2163403067 | 2 | 0.4%
504712348908 | 2 | 0.4%
2154370078 | 1 | 0.2%
2162293377 | 1 | 0.2%
2164018048 | 1 | 0.2%
2164012583 | 1 | 0.2%
Other values (470) | 470 | 94.0%
(Missing) | 12 | 2.4%
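
The values in this column have different digit counts, so the mean and variance reported for RSTRNT_TEL_NO describe identifiers rather than a quantity. The sketch below is one possible way to look at the column instead, assuming it is more informative to treat each number as a digit string and group by length. The `values` list is a small selection copied from the tables in this report, and no meaning is assigned here to any particular length.

```python
from collections import defaultdict

# A few RSTRNT_TEL_NO values copied from the tables in this report.
values = [2153965000, 4008209777, 2163739458, 504712348908, 13391285089, 688688888789]

by_length = defaultdict(list)
for v in values:
    digits = str(v)   # treat the number as an identifier, not a quantity
    by_length[len(digits)].append(digits)

for length in sorted(by_length):
    print(length, by_length[length])
```
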
Minimum 10 values
Value | Count | Frequency (%)
2122351753 | 1 | 0.2%
2132070213 | 1 | 0.2%
2133134888 | 1 | 0.2%
2150306659 | 1 | 0.2%
2150471234 | 1 | 0.2%
2150471266 | 1 | 0.2%
2150471917 | 1 | 0.2%
2150478838 | 1 | 0.2%
2150490703 | 1 | 0.2%
2150714876 | 1 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
688688888789 | 1 | 0.2%
688688888728 | 1 | 0.2%
641555882756 | 1 | 0.2%
641555882431 | 1 | 0.2%
641511115217 | 1 | 0.2%
641511115216 | 1 | 0.2%
641511115212 | 1 | 0.2%
640665188608 | 1 | 0.2%
633518887368 | 1 | 0.2%
627588884814 | 1 | 0.2%

RSTRNT_ADDR
Text

Distinct: 481
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 46
Median length: 37
Mean length: 20.3
Min length: 5

Characters and Unicode

Total characters: 10150
Distinct characters: 545
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 473
Unique (%): 94.6%

Sample

1st row: 虹口区曲阳路775号天山宾馆B1楼
2nd row: 徐汇区东平路11号(近衡山路)
3rd row: 长宁区延安西路1221号(近番禺路)
4th row: 卢湾区太仓路181弄新天地北里7号楼2楼(近马当路)
5th row: 静安区南京西路1168号中信泰富B1楼(近陕西北路)
ValueCountFrequency (%)
多家连锁店 11
 
2.2%
浦东新区世纪大道88号金茂君悦大酒店56楼(近二号线陆家嘴站 3
 
0.6%
静安区威海路500号四季酒店2楼(近石门一路 3
 
0.6%
徐汇区汾阳路150号(近桃江路 2
 
0.4%
静安区石门二路19号(近南京西路 2
 
0.4%
徐汇区天平路220号(近康平路 2
 
0.4%
黄浦区豫园路98号(近绿波廊 2
 
0.4%
黄浦区九江路555号王宝和大酒店2楼(近福建中路 2
 
0.4%
2
 
0.4%
闵行区虹梅路3293号(近延安西路 1
 
0.2%
Other values (472) 472
94.0%

Most occurring characters

ValueCountFrequency (%)
938
 
9.2%
512
 
5.0%
496
 
4.9%
) 482
 
4.7%
( 482
 
4.7%
404
 
4.0%
1 373
 
3.7%
2 209
 
2.1%
8 202
 
2.0%
5 182
 
1.8%
Other values (535) 5870
57.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 7345
72.4%
Decimal Number 1755
 
17.3%
Close Punctuation 485
 
4.8%
Open Punctuation 485
 
4.8%
Dash Punctuation 33
 
0.3%
Uppercase Letter 28
 
0.3%
Other Punctuation 17
 
0.2%
Space Separator 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
938
 
12.8%
512
 
7.0%
496
 
6.8%
404
 
5.5%
160
 
2.2%
155
 
2.1%
154
 
2.1%
135
 
1.8%
114
 
1.6%
109
 
1.5%
Other values (504) 4168
56.7%
Decimal Number
ValueCountFrequency (%)
1 373
21.3%
2 209
11.9%
8 202
11.5%
5 182
10.4%
3 169
9.6%
0 168
9.6%
4 119
 
6.8%
7 112
 
6.4%
6 111
 
6.3%
9 110
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
B 12
42.9%
A 5
17.9%
H 2
 
7.1%
O 2
 
7.1%
S 2
 
7.1%
E 2
 
7.1%
T 1
 
3.6%
F 1
 
3.6%
C 1
 
3.6%
Other Punctuation
ValueCountFrequency (%)
, 7
41.2%
4
23.5%
3
17.6%
/ 1
 
5.9%
: 1
 
5.9%
1
 
5.9%
Close Punctuation
ValueCountFrequency (%)
) 482
99.4%
3
 
0.6%
Open Punctuation
ValueCountFrequency (%)
( 482
99.4%
3
 
0.6%
Dash Punctuation
ValueCountFrequency (%)
- 33
100.0%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Han 7345
72.4%
Common 2777
 
27.4%
Latin 28
 
0.3%

Most frequent character per script

Han
ValueCountFrequency (%)
938
 
12.8%
512
 
7.0%
496
 
6.8%
404
 
5.5%
160
 
2.2%
155
 
2.1%
154
 
2.1%
135
 
1.8%
114
 
1.6%
109
 
1.5%
Other values (504) 4168
56.7%
Common
ValueCountFrequency (%)
) 482
17.4%
( 482
17.4%
1 373
13.4%
2 209
7.5%
8 202
7.3%
5 182
 
6.6%
3 169
 
6.1%
0 168
 
6.0%
4 119
 
4.3%
7 112
 
4.0%
Other values (12) 279
10.0%
Latin
ValueCountFrequency (%)
B 12
42.9%
A 5
17.9%
H 2
 
7.1%
O 2
 
7.1%
S 2
 
7.1%
E 2
 
7.1%
T 1
 
3.6%
F 1
 
3.6%
C 1
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
CJK 7345
72.4%
ASCII 2791
 
27.5%
None 14
 
0.1%

Most frequent character per block

CJK
ValueCountFrequency (%)
938
 
12.8%
512
 
7.0%
496
 
6.8%
404
 
5.5%
160
 
2.2%
155
 
2.1%
154
 
2.1%
135
 
1.8%
114
 
1.6%
109
 
1.5%
Other values (504) 4168
56.7%
ASCII
ValueCountFrequency (%)
) 482
17.3%
( 482
17.3%
1 373
13.4%
2 209
7.5%
8 202
7.2%
5 182
 
6.5%
3 169
 
6.1%
0 168
 
6.0%
4 119
 
4.3%
7 112
 
4.0%
Other values (16) 293
10.5%
None
ValueCountFrequency (%)
4
28.6%
3
21.4%
3
21.4%
3
21.4%
1
 
7.1%

Interactions

[Four interaction plots (Matplotlib v3.7.2); images not preserved.]

Correlations

Variable | OVSEA_RSTRNT_ID | RSTRNT_TEL_NO
OVSEA_RSTRNT_ID | 1.000 | 0.161
RSTRNT_TEL_NO | 0.161 | 1.000

Variable | OVSEA_RSTRNT_ID | RSTRNT_TEL_NO
OVSEA_RSTRNT_ID | 1.000 | -0.055
RSTRNT_TEL_NO | -0.055 | 1.000
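
The text above does not say which coefficient each matrix uses. As an illustration only, the sketch below computes two common choices, Pearson and Spearman, in plain Python for the first five rows of the Sample section. Treating these as the coefficients behind the matrices is an assumption, the helper functions are illustrative and not part of ydata-profiling, and five rows cannot reproduce the values reported for all 500 observations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n (assumes no ties, which holds for the rows below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# First five rows of the Sample section (IDs and telephone numbers).
ids = [500000, 500001, 500003, 500018, 500019]
tels = [2165559604, 2164746628, 2162132441, 2163112211, 2152925228]

print(round(pearson(ids, tels), 3), round(spearman(ids, tels), 3))
```

Spearman works on ranks, so it is less sensitive than Pearson to the few very large telephone values seen in this column.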

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
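
Both views can be derived from the same presence/absence flags. This is a minimal sketch using hypothetical rows: the column names follow this report, but the values are invented for illustration and `None` marks a missing value.

```python
# Hypothetical rows for illustration only; None marks a missing value.
rows = [
    {"OVSEA_RSTRNT_ID": 1, "RSTRNT_TEL_NO": 2100000001},
    {"OVSEA_RSTRNT_ID": 2, "RSTRNT_TEL_NO": None},
    {"OVSEA_RSTRNT_ID": 3, "RSTRNT_TEL_NO": 2100000003},
]
columns = ["OVSEA_RSTRNT_ID", "RSTRNT_TEL_NO"]

# Per-column view: number of missing values in each column.
missing_per_column = {c: sum(row[c] is None for row in rows) for c in columns}

# Matrix view: one list of flags per record, True where the value is present.
nullity_matrix = [[row[c] is not None for c in columns] for row in rows]

print(missing_per_column)
for flags in nullity_matrix:
    print(flags)
```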

Sample

First rows
Row | OVSEA_RSTRNT_ID | CTY_NM | RSTRNT_NM | RSTRNT_TEL_NO | RSTRNT_ADDR
0 | 500000 | shanghai | 阿凡提大饭店 | 2165559604 | 虹口区曲阳路775号天山宾馆B1楼
1 | 500001 | shanghai | Sasha's萨莎 | 2164746628 | 徐汇区东平路11号(近衡山路)
2 | 500003 | shanghai | 1221餐馆 | 2162132441 | 长宁区延安西路1221号(近番禺路)
3 | 500018 | shanghai | Va Bene华万意意大利餐厅 | 2163112211 | 卢湾区太仓路181弄新天地北里7号楼2楼(近马当路)
4 | 500019 | shanghai | Wagas 沃歌斯(中信泰富店) | 2152925228 | 静安区南京西路1168号中信泰富B1楼(近陕西北路)
5 | 500028 | shanghai | 阿山饭店 | 2162686583 | 长宁区虹桥路2378号(近动物园)
6 | 500030 | shanghai | 艾迪多慕思 | 2162488499 | 静安区延安西路200号文艺宾馆新楼1楼(近乌鲁木齐北路)
7 | 500046 | shanghai | 白家餐厅 | 2164376915 | 徐汇区宛平路189弄12号(近衡山路)
8 | 500054 | shanghai | 半岛鱼翅海鲜 | 2164189393 | 徐汇区零陵路518号长航宾馆1-2楼(近东安路)
9 | 500057 | shanghai | 宝莱纳餐厅(新天地店) | 2163203935 | 卢湾区太仓路181弄新天地北里19-20号(近马当路)

Last rows
Row | OVSEA_RSTRNT_ID | CTY_NM | RSTRNT_NM | RSTRNT_TEL_NO | RSTRNT_ADDR
490 | 502538 | shanghai | 申申面包房(复兴西路店) | 2164373493 | 徐汇区复兴西路8号(近淮海中路)
491 | 502542 | shanghai | 上雅铁板烧(古北店) | 2162569390 | 徐汇区黄金城道851号
492 | 502567 | shanghai | 荣日本料理(兴义店) | 2162789778 | 长宁区兴义路48号新世纪广场B座1楼(近娄山关路)
493 | 502569 | shanghai | 大娘水饺(天钥桥路店) | 2164649381 | 徐汇区天钥桥路57号(近肇嘉浜路)
494 | 502578 | shanghai | 老姜烧烤 | 13391285089 | 浦东新区东方路蓝村路车站(东方路蓝村路)
495 | 502581 | shanghai | 百味瓦罐煨汤(江西北路店) | 2163574841 | 虹口区江西北路208号(近七浦路)
496 | 502583 | shanghai | 美新点心店 | 2162470030 | 静安区陕西北路105号(近威海路)
497 | 502586 | shanghai | 山梁桂林米粉(仙霞路店) | 13764527052 | 长宁区仙霞路179号(近娄山关路)
498 | 502595 | shanghai | 面包新语(美罗城店) | 2164267307 | 徐汇区肇嘉浜路1111号美罗城1-27店铺(近漕溪北路,东方商厦,港汇广场,六百,汇金百货)
499 | 502598 | shanghai | 德兴面馆(福建中路店) | 2163602866 | 黄浦区福建中路529号(近北京东路)