Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 35.3 B

Variable types

Numeric: 2
Text: 2

Alerts

sake_id has unique values (Unique)
sake_nm has unique values (Unique)
sake_pc has 2 (2.0%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-10 09:44:03.678465
Analysis finished: 2023-12-10 09:44:05.277637
Duration: 1.6 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

sake_id
Real number (ℝ)

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16060.44
Minimum: 15609
Maximum: 29021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 15609
5-th percentile: 15614.95
Q1: 15635.75
Median: 15661.5
Q3: 15686.25
95-th percentile: 15706.05
Maximum: 29021
Range: 13412
Interquartile range (IQR): 50.5
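
The 5-th and 95-th percentiles above are consistent with linear interpolation between neighbouring order statistics, which is the default method in pandas. The following is a minimal sketch under that assumption. It uses only the ten smallest and ten largest sake_id values listed later in this section; the 80 values in between are not listed in the report, so they are left as `None` placeholders. The helper name `linear_percentile` is illustrative, not part of any library.

```python
import math

def linear_percentile(sorted_values, q):
    """Return the q-th quantile (0 <= q <= 1) by linear interpolation."""
    pos = q * (len(sorted_values) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

# Ten smallest and ten largest sake_id values as listed in the report.
lowest = [15609, 15611, 15612, 15613, 15614, 15615, 15617, 15618, 15619, 15620]
highest = [15702, 15703, 15704, 15705, 15706, 15707, 15708, 29019, 29020, 29021]
# The 80 middle values are unknown here; the two percentiles below never read them.
values = lowest + [None] * 80 + highest

p05 = round(linear_percentile(values, 0.05), 2)
p95 = round(linear_percentile(values, 0.95), 2)
print(p05, p95)  # 15614.95 15706.05
```

Both results match the reported percentiles, which suggests (but does not prove) that the profiler used this interpolation rule.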

Descriptive statistics

Standard deviation: 2290.7638
Coefficient of variation (CV): 0.14263394
Kurtosis: 29.88781
Mean: 16060.44
Median Absolute Deviation (MAD): 25.5
Skewness: 5.5932971
Sum: 1606044
Variance: 5247599
Monotonicity: Not monotonic
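
Several of these figures are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A short sketch that checks the reported sake_id values against those identities (all inputs are copied from the report; small differences are expected from rounding):

```python
n = 100              # Number of observations
mean = 16060.44      # Mean
std = 2290.7638      # Standard deviation

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance
total = mean * n     # sum of all values

print(round(cv, 6), round(variance), round(total))  # 0.142634 5247599 1606044
```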
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
15609               1      1.0%
15673               1      1.0%
15683               1      1.0%
15682               1      1.0%
15681               1      1.0%
15680               1      1.0%
15679               1      1.0%
15678               1      1.0%
15677               1      1.0%
15676               1      1.0%
Other values (90)   90     90.0%

Value               Count  Frequency (%)
15609               1      1.0%
15611               1      1.0%
15612               1      1.0%
15613               1      1.0%
15614               1      1.0%
15615               1      1.0%
15617               1      1.0%
15618               1      1.0%
15619               1      1.0%
15620               1      1.0%

Value               Count  Frequency (%)
29021               1      1.0%
29020               1      1.0%
29019               1      1.0%
15708               1      1.0%
15707               1      1.0%
15706               1      1.0%
15705               1      1.0%
15704               1      1.0%
15703               1      1.0%
15702               1      1.0%
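
The three largest values (29019 to 29021) sit far above the rest of the sake_id range, which is consistent with the high skewness and kurtosis reported above. One common way to flag such values is Tukey's rule with fences at 1.5 × IQR beyond the quartiles. The report itself does not apply this rule; the sketch below is only an illustration using the reported Q1 and Q3.

```python
q1, q3 = 15635.75, 15686.25   # Q1 and Q3 reported for sake_id
iqr = q3 - q1                 # 50.5, matching the reported IQR
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Ten largest sake_id values as listed in the report.
largest = [29021, 29020, 29019, 15708, 15707, 15706, 15705, 15704, 15703, 15702]
flagged = [v for v in largest if v > upper_fence]

print(lower_fence, upper_fence)  # 15560.0 15762.0
print(flagged)                   # [29021, 29020, 29019]
```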

sake_nm
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 24
Median length: 19
Mean length: 11.77
Min length: 6

Characters and Unicode

Total characters: 1177
Distinct characters: 304
Distinct categories: 9
Distinct scripts: 5
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
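
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one of the sample sake_nm values, typed here as it appears in this text; the original data may use different code points (for example full-width forms), so the counts need not match the report's tables. The two-letter codes correspond to the category names used below: `Lo` is Other Letter, `Lm` is Modifier Letter, `Lu` and `Ll` are Uppercase and Lowercase Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Ps` and `Pe` are Open and Close Punctuation, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter

# One of the sample sake_nm values shown in this report.
name = "No.6 (ナンバーシックス)"

# Count the Unicode general category of every code point in the string.
categories = Counter(unicodedata.category(ch) for ch in name)
for code, count in categories.most_common():
    print(code, count)
```

The standard library does not expose script or block properties, so reproducing those tables would need an additional data source.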

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: 十四代 (じゅうよんだい)
2nd row: 神聖 純米大吟醸 山田錦氷温囲い 1.8L
3rd row: 而今 (じこん)
4th row: No.6 (ナンバーシックス)
5th row: 花邑 (はなむら)

Value                Count  Frequency (%)
純米大吟醸            3      1.4%
神聖                  3      1.4%
720ml                2      0.9%
山田錦                2      0.9%
十四代                1      0.5%
(みやかんばい)        1      0.5%
(うごのつき)          1      0.5%
羽根屋                1      0.5%
(はねや)              1      0.5%
春霞                  1      0.5%
Other values (195)   195    92.4%
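
The table above appears to count whitespace-separated tokens across all sake_nm values; this is an assumption based on entries such as `(はねや)` being listed as separate values. A sketch of that kind of count, using only the five sample rows shown earlier (so the frequencies differ from the full table):

```python
from collections import Counter

# The five sample sake_nm values shown earlier in this section.
names = [
    "十四代 (じゅうよんだい)",
    "神聖 純米大吟醸 山田錦氷温囲い 1.8L",
    "而今 (じこん)",
    "No.6 (ナンバーシックス)",
    "花邑 (はなむら)",
]

# Split every name on whitespace and count each resulting token.
tokens = Counter(tok for name in names for tok in name.split())
total = sum(tokens.values())
for tok, count in tokens.most_common(3):
    print(tok, count, f"{100 * count / total:.1f}%")
```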

Most occurring characters

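Value                Count  Frequency (%)
(space)              111    9.4%
[not preserved]      102    8.7%
[not preserved]      102    8.7%
[not preserved]      35     3.0%
[not preserved]      31     2.6%
[not preserved]      27     2.3%
[not preserved]      26     2.2%
[not preserved]      23     2.0%
[not preserved]      22     1.9%
[not preserved]      20     1.7%
Other values (294)   678    57.6%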

Most occurring categories

Value                Count  Frequency (%)
Other Letter         838    71.2%
Space Separator      111    9.4%
Close Punctuation    102    8.7%
Open Punctuation     102    8.7%
Decimal Number       9      0.8%
Lowercase Letter     5      0.4%
Uppercase Letter     5      0.4%
Modifier Letter      3      0.3%
Other Punctuation    2      0.2%

Most frequent character per category

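Other Letter
Value                Count  Frequency (%)
[not preserved]      35     4.2%
[not preserved]      31     3.7%
[not preserved]      27     3.2%
[not preserved]      26     3.1%
[not preserved]      23     2.7%
[not preserved]      22     2.6%
[not preserved]      20     2.4%
[not preserved]      19     2.3%
[not preserved]      17     2.0%
[not preserved]      14     1.7%
Other values (275)   604    72.1%

Decimal Number
Value                Count  Frequency (%)
[not preserved]      2      22.2%
[not preserved]      2      22.2%
[not preserved]      2      22.2%
[not preserved]      1      11.1%
[not preserved]      1      11.1%
6                    1      11.1%

Uppercase Letter
Value                Count  Frequency (%)
N                    2      40.0%
D                    1      20.0%
A                    1      20.0%
[not preserved]      1      20.0%

Lowercase Letter
Value                Count  Frequency (%)
[not preserved]      2      40.0%
[not preserved]      2      40.0%
o                    1      20.0%

Other Punctuation
Value                Count  Frequency (%)
[not preserved]      1      50.0%
.                    1      50.0%

Space Separator
Value                Count  Frequency (%)
(space)              111    100.0%

Close Punctuation
Value                Count  Frequency (%)
[not preserved]      102    100.0%

Open Punctuation
Value                Count  Frequency (%)
[not preserved]      102    100.0%

Modifier Letter
Value                Count  Frequency (%)
[not preserved]      3      100.0%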

Most occurring scripts

Value                Count  Frequency (%)
Hiragana             527    44.8%
Common               329    28.0%
Han                  287    24.4%
Katakana             24     2.0%
Latin                10     0.8%

Most frequent character per script

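Han
Value                Count  Frequency (%)
[not preserved]      8      2.8%
[not preserved]      7      2.4%
[not preserved]      6      2.1%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      3      1.0%
[not preserved]      3      1.0%
[not preserved]      3      1.0%
Other values (192)   241    84.0%

Hiragana
Value                Count  Frequency (%)
[not preserved]      35     6.6%
[not preserved]      31     5.9%
[not preserved]      27     5.1%
[not preserved]      26     4.9%
[not preserved]      23     4.4%
[not preserved]      22     4.2%
[not preserved]      20     3.8%
[not preserved]      19     3.6%
[not preserved]      17     3.2%
[not preserved]      14     2.7%
Other values (58)    293    55.6%

Katakana
Value                Count  Frequency (%)
[not preserved]      3      12.5%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      2      8.3%
[not preserved]      1      4.2%
[not preserved]      1      4.2%
Other values (5)     5      20.8%

Common
Value                Count  Frequency (%)
(space)              111    33.7%
[not preserved]      102    31.0%
[not preserved]      102    31.0%
[not preserved]      3      0.9%
[not preserved]      2      0.6%
[not preserved]      2      0.6%
[not preserved]      2      0.6%
[not preserved]      1      0.3%
[not preserved]      1      0.3%
[not preserved]      1      0.3%
Other values (2)     2      0.6%

Latin
Value                Count  Frequency (%)
[not preserved]      2      20.0%
[not preserved]      2      20.0%
N                    2      20.0%
D                    1      10.0%
A                    1      10.0%
[not preserved]      1      10.0%
o                    1      10.0%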

Most occurring blocks

Value                Count  Frequency (%)
Hiragana             527    44.8%
CJK                  287    24.4%
None                 218    18.5%
ASCII                118    10.0%
Katakana             27     2.3%

Most frequent character per block

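ASCII
Value                Count  Frequency (%)
(space)              111    94.1%
N                    2      1.7%
D                    1      0.8%
A                    1      0.8%
o                    1      0.8%
.                    1      0.8%
6                    1      0.8%

None
Value                Count  Frequency (%)
[not preserved]      102    46.8%
[not preserved]      102    46.8%
[not preserved]      2      0.9%
[not preserved]      2      0.9%
[not preserved]      2      0.9%
[not preserved]      2      0.9%
[not preserved]      2      0.9%
[not preserved]      1      0.5%
[not preserved]      1      0.5%
[not preserved]      1      0.5%

Hiragana
Value                Count  Frequency (%)
[not preserved]      35     6.6%
[not preserved]      31     5.9%
[not preserved]      27     5.1%
[not preserved]      26     4.9%
[not preserved]      23     4.4%
[not preserved]      22     4.2%
[not preserved]      20     3.8%
[not preserved]      19     3.6%
[not preserved]      17     3.2%
[not preserved]      14     2.7%
Other values (58)    293    55.6%

CJK
Value                Count  Frequency (%)
[not preserved]      8      2.8%
[not preserved]      7      2.4%
[not preserved]      6      2.1%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      4      1.4%
[not preserved]      3      1.0%
[not preserved]      3      1.0%
[not preserved]      3      1.0%
Other values (192)   241    84.0%

Katakana
Value                Count  Frequency (%)
[not preserved]      3      11.1%
[not preserved]      3      11.1%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      2      7.4%
[not preserved]      1      3.7%
Other values (6)     6      22.2%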

sake_region_nm
Text

Distinct: 92
Distinct (%): 92.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 15
Median length: 13
Mean length: 9.52
Min length: 7

Characters and Unicode

Total characters: 952
Distinct characters: 198
Distinct categories: 4
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 87
Unique (%): 87.0%

Sample

1st row: 山形 | 高木酒造
2nd row: 株式会社山本本家
3rd row: 三重 | 木屋正酒造
4th row: 秋田 | 新政酒造
5th row: 秋田 | 両関酒造
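
Most of these sample values follow a `left | right` pattern, and the character tables below count 97 `|` characters across the 100 rows, while values such as `株式会社山本本家` contain no separator. The sketch below splits a value on that separator. Reading the two parts as prefecture and brewery is an inference from the samples, and `split_region` is a hypothetical helper, not something the report provides.

```python
def split_region(value):
    """Split 'prefecture | brewery'; prefecture is None when there is no separator."""
    left, sep, right = value.partition(" | ")
    if not sep:
        return None, value.strip()
    return left.strip(), right.strip()

# Sample sake_region_nm values shown above.
samples = ["山形 | 高木酒造", "株式会社山本本家", "三重 | 木屋正酒造"]
for s in samples:
    print(split_region(s))
```
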
Value                Count  Frequency (%)
[not preserved]      97     32.9%
秋田                  12     4.1%
福島                  7      2.4%
愛知                  6      2.0%
山形                  6      2.0%
山口                  5      1.7%
宮城                  5      1.7%
新政酒造              4      1.4%
三重                  4      1.4%
長野                  4      1.4%
Other values (117)   145    49.2%

Most occurring characters

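Value                Count  Frequency (%)
(space)              195    20.5%
|                    97     10.2%
[not preserved]      86     9.0%
[not preserved]      79     8.3%
[not preserved]      22     2.3%
[not preserved]      17     1.8%
[not preserved]      14     1.5%
[not preserved]      14     1.5%
[not preserved]      13     1.4%
[not preserved]      13     1.4%
Other values (188)   402    42.2%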

Most occurring categories

Value                Count  Frequency (%)
Other Letter         659    69.2%
Space Separator      195    20.5%
Math Symbol          97     10.2%
Modifier Letter      1      0.1%

Most frequent character per category

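Other Letter
Value                Count  Frequency (%)
[not preserved]      86     13.1%
[not preserved]      79     12.0%
[not preserved]      22     3.3%
[not preserved]      17     2.6%
[not preserved]      14     2.1%
[not preserved]      14     2.1%
[not preserved]      13     2.0%
[not preserved]      13     2.0%
[not preserved]      12     1.8%
[not preserved]      11     1.7%
Other values (185)   378    57.4%

Space Separator
Value                Count  Frequency (%)
(space)              195    100.0%

Math Symbol
Value                Count  Frequency (%)
|                    97     100.0%

Modifier Letter
Value                Count  Frequency (%)
[not preserved]      1      100.0%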

Most occurring scripts

Value                Count  Frequency (%)
Han                  647    68.0%
Common               293    30.8%
Hiragana             8      0.8%
Katakana             4      0.4%

Most frequent character per script

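Han
Value                Count  Frequency (%)
[not preserved]      86     13.3%
[not preserved]      79     12.2%
[not preserved]      22     3.4%
[not preserved]      17     2.6%
[not preserved]      14     2.2%
[not preserved]      14     2.2%
[not preserved]      13     2.0%
[not preserved]      13     2.0%
[not preserved]      12     1.9%
[not preserved]      11     1.7%
Other values (177)   366    56.6%

Hiragana
Value                Count  Frequency (%)
[not preserved]      4      50.0%
[not preserved]      2      25.0%
[not preserved]      1      12.5%
[not preserved]      1      12.5%

Katakana
Value                Count  Frequency (%)
[not preserved]      1      25.0%
[not preserved]      1      25.0%
[not preserved]      1      25.0%
[not preserved]      1      25.0%

Common
Value                Count  Frequency (%)
(space)              195    66.6%
|                    97     33.1%
[not preserved]      1      0.3%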

Most occurring blocks

Value                Count  Frequency (%)
CJK                  647    68.0%
ASCII                292    30.7%
Hiragana             8      0.8%
Katakana             5      0.5%

Most frequent character per block

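ASCII
Value                Count  Frequency (%)
(space)              195    66.8%
|                    97     33.2%

CJK
Value                Count  Frequency (%)
[not preserved]      86     13.3%
[not preserved]      79     12.2%
[not preserved]      22     3.4%
[not preserved]      17     2.6%
[not preserved]      14     2.2%
[not preserved]      14     2.2%
[not preserved]      13     2.0%
[not preserved]      13     2.0%
[not preserved]      12     1.9%
[not preserved]      11     1.7%
Other values (177)   366    56.6%

Hiragana
Value                Count  Frequency (%)
[not preserved]      4      50.0%
[not preserved]      2      25.0%
[not preserved]      1      12.5%
[not preserved]      1      12.5%

Katakana
Value                Count  Frequency (%)
[not preserved]      1      20.0%
[not preserved]      1      20.0%
[not preserved]      1      20.0%
[not preserved]      1      20.0%
[not preserved]      1      20.0%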

sake_pc
Real number (ℝ)

ZEROS 

Distinct: 73
Distinct (%): 73.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24343.11
Minimum: 0
Maximum: 328000
Zeros: 2
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3424.2
Q1: 5971.75
Median: 10800
Q3: 21950
95-th percentile: 108000
Maximum: 328000
Range: 328000
Interquartile range (IQR): 15978.25

Descriptive statistics

Standard deviation: 42797.3
Coefficient of variation (CV): 1.7580868
Kurtosis: 27.044851
Mean: 24343.11
Median Absolute Deviation (MAD): 5515.5
Skewness: 4.6351477
Sum: 2434311
Variance: 1.8316089 × 10^9
Monotonicity: Not monotonic
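
Two of the 100 sake_pc values are zero. If those zeros are placeholders rather than real measurements (an assumption; the report does not say what they mean), the mean over the remaining rows can be derived from the reported sum without access to the raw data, because zeros contribute nothing to the sum:

```python
n = 100            # Number of observations
zeros = 2          # Zeros reported for sake_pc
total = 2434311    # Sum reported for sake_pc

mean_all = total / n                 # mean over every row
mean_nonzero = total / (n - zeros)   # mean over the non-zero rows only

print(round(mean_all, 2), round(mean_nonzero, 2))  # 24343.11 24839.91
```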
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
10800               9      9.0%
32400               5      5.0%
12960               3      3.0%
8640                3      3.0%
50900               2      2.0%
13845               2      2.0%
11000               2      2.0%
21600               2      2.0%
20000               2      2.0%
0                   2      2.0%
Other values (63)   68     68.0%

Value               Count  Frequency (%)
0                   2      2.0%
2354                1      1.0%
2970                1      1.0%
3200                1      1.0%
3436                1      1.0%
3500                1      1.0%
3672                1      1.0%
3780                1      1.0%
3800                1      1.0%
3888                2      2.0%

Value               Count  Frequency (%)
328000              1      1.0%
171951              1      1.0%
148000              1      1.0%
120960              1      1.0%
108000              2      2.0%
76380               1      1.0%
75600               1      1.0%
64800               1      1.0%
54000               1      1.0%
50900               2      2.0%

Interactions

[Four interaction plots; images not preserved.]

Correlations

                 sake_id  sake_nm  sake_region_nm  sake_pc
sake_id          1.000    1.000    1.000           0.626
sake_nm          1.000    1.000    1.000           1.000
sake_region_nm   1.000    1.000    1.000           0.403
sake_pc          0.626    1.000    0.403           1.000

           sake_id  sake_pc
sake_id    1.000    -0.177
sake_pc    -0.177   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)    sake_id    sake_nm    sake_region_nm    sake_pc
0    15609    十四代 (じゅうよんだい)    山形 | 高木酒造    328000
1    29019    神聖 純米大吟醸 山田錦氷温囲い 1.8L    株式会社山本本家    50900
2    15611    而今 (じこん)    三重 | 木屋正酒造    108000
3    15612    No.6 (ナンバーシックス)    秋田 | 新政酒造    17280
4    15613    花邑 (はなむら)    秋田 | 両関酒造    32400
5    15614    川中島 幻舞 (かわなかじま げんぶ)    長野 | 酒千蔵野    3888
6    15615    信州亀齢 (きれい)    長野 | 岡崎酒造    3436
7    29020    神聖 純米大吟醸 山田錦 氷温囲い 720ml    株式会社山本本家    30540
8    15617    陽乃鳥 (ひのとり)    秋田 | 新政酒造    16200
9    15618    鳳凰美田 (ほうおうびでん)    栃木 | 小林酒造    27000

(index)    sake_id    sake_nm    sake_region_nm    sake_pc
90    15699    勝山 (かつやま)    宮城 | 仙台伊澤家 勝山酒造    32400
91    15700    流輝 (るか)    群馬 | 松屋酒造    2970
92    15701    手取川 (てどりがわ)    石川 | 吉田酒造店    12790
93    15702    遊穂 (ゆうほ)    石川 | 御祖酒造    3500
94    15703    田光 (たびか)    三重 | 早川酒造    8802
95    15704    仙介 (せんすけ)    兵庫 | 泉酒造    11016
96    15705    五橋 (ごきょう)    山口 | 酒井酒造    11880
97    15706    豊盃 (ほうはい)    青森 | 三浦酒造店    35999
98    15707    花の香 (はなのか)    熊本 | 花の香酒造    5400
99    15708    にいだしぜんしゅ (にいだしぜんしゅ)    福島 | 仁井田本家    3780