Overview

Dataset statistics

Number of variables: 5
Number of observations: 199
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 17
Duplicate rows (%): 8.5%
Total size in memory: 8.5 KiB
Average record size in memory: 43.7 B
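
The two derived figures in this block follow from the raw counts. The sketch below recomputes them from the reported values; the variable names are illustrative, and the memory total is the already-rounded 8.5 KiB, so the result is only approximate.

```python
# Values copied from the "Dataset statistics" block.
n_rows = 199
n_duplicate_rows = 17
total_size_kib = 8.5  # reported total, already rounded to one decimal

# Share of rows flagged as duplicates, in percent.
duplicate_pct = round(100 * n_duplicate_rows / n_rows, 1)

# Average bytes per record: total bytes divided by the number of rows.
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(f"Duplicate rows (%): {duplicate_pct}%")
print(f"Average record size in memory: {avg_record_bytes} B")
```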

Variable types

Categorical: 2
Text: 1
Numeric: 2

Alerts

Dataset has 17 (8.5%) duplicate rows [Duplicates]
4212 is highly overall correlated with 5 [High correlation]
5 is highly overall correlated with 4212 [High correlation]

Reproduction

Analysis started: 2023-12-10 06:48:27.833759
Analysis finished: 2023-12-10 06:48:28.612371
Duration: 0.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
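
The reported duration is the difference between the two timestamps. A minimal check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-10 06:48:27.833759")
finished = datetime.fromisoformat("2023-12-10 06:48:28.612371")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()

print(f"Duration: {duration_s:.2f} seconds")
```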

Variables

5
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count
6      57
5      49
2      37
1      32
3      24

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5
2nd row: 2
3rd row: 3
4th row: 3
5th row: 5

Common Values

Value  Count  Frequency (%)
6      57     28.6%
5      49     24.6%
2      37     18.6%
1      32     16.1%
3      24     12.1%
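
Each frequency in the table is the count divided by the number of observations. The sketch below reproduces the percentages from the listed counts; the dictionary is simply a transcription of the table, not output from the profiling tool.

```python
# Counts copied from the "Common Values" table for variable "5".
counts = {"6": 57, "5": 49, "2": 37, "1": 32, "3": 24}

# The counts cover every observation, so their sum is the row count.
total = sum(counts.values())

# Relative frequency of each category, in percent, rounded as in the report.
freq = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in freq.items():
    print(f"{value}: {counts[value]} ({pct}%)")
```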

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

2021
Categorical

Distinct: 8
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value             Count
2021              142
2021H1            12
2019H1            10
2020H2            9
2018H1            8
Other values (3)  18

Length

Max length: 6
Median length: 4
Mean length: 4.5728643
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021
2nd row: 2021
3rd row: 2021
4th row: 2021
5th row: 2021

Common Values

Value   Count  Frequency (%)
2021    142    71.4%
2021H1  12     6.0%
2019H1  10     5.0%
2020H2  9      4.5%
2018H1  8      4.0%
2020H1  8      4.0%
2018H2  5      2.5%
2019H2  5      2.5%
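
The length statistics reported for this variable can be recovered from the value counts alone, since every value is either 4 or 6 characters long. A small sketch with the counts transcribed from the table:

```python
from statistics import median

# Counts copied from the "Common Values" table for variable "2021".
counts = {
    "2021": 142, "2021H1": 12, "2019H1": 10, "2020H2": 9,
    "2018H1": 8, "2020H1": 8, "2018H2": 5, "2019H2": 5,
}

n = sum(counts.values())

# Count-weighted mean of the string lengths.
mean_length = sum(len(value) * c for value, c in counts.items()) / n

# Expand to one length per observation to take the median.
lengths = [len(value) for value, c in counts.items() for _ in range(c)]
median_length = median(lengths)

print(f"n = {n}, mean length = {mean_length:.7f}, median length = {median_length}")
```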

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

11305545
Text

Distinct: 126
Distinct (%): 63.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 13
Median length: 8
Mean length: 6.5728643
Min length: 1

Characters and Unicode

Total characters: 1308
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 106
Unique (%): 53.3%

Sample

1st row: 11680720
2nd row: 17032
3rd row: 1109075010013
4th row: 1116073030021
5th row: 11500590

Value               Count  Frequency (%)
a                   24     12.1%
dd                  21     10.6%
o                   12     6.0%
11680531            3      1.5%
11305660            3      1.5%
11500520            2      1.0%
11680750            2      1.0%
11680640            2      1.0%
11680510            2      1.0%
11500593            2      1.0%
Other values (116)  126    63.3%

Most occurring characters

Value             Count  Frequency (%)
0                 308    23.5%
1                 236    18.0%
5                 125    9.6%
6                 110    8.4%
3                 86     6.6%
2                 78     6.0%
4                 68     5.2%
8                 61     4.7%
7                 54     4.1%
D                 42     3.2%
Other values (5)  140    10.7%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    1166   89.1%
Uppercase Letter  78     6.0%
Other Letter      64     4.9%
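
The category names in this table correspond to the Unicode general categories Nd (Decimal Number), Lu (Uppercase Letter) and Lo (Other Letter). The sketch below shows one way to tally them with Python's standard `unicodedata` module. The input strings are illustrative stand-ins rather than rows copied from the dataset, and this is not necessarily how the profiling tool computes the table.

```python
import unicodedata
from collections import Counter

# Illustrative strings only (assumed for this example, not dataset rows).
samples = ["11680720", "DD", "A", "다사3900"]

# Human-readable names for the three general categories seen in the report.
LABELS = {"Nd": "Decimal Number", "Lu": "Uppercase Letter", "Lo": "Other Letter"}

def category_label(ch: str) -> str:
    """Map a character to its general category name (or the raw code)."""
    code = unicodedata.category(ch)
    return LABELS.get(code, code)

# Count characters per category across all strings.
category_counts = Counter(category_label(ch) for s in samples for ch in s)

for label, n in category_counts.most_common():
    print(f"{label}: {n}")
```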

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      308    26.4%
1      236    20.2%
5      125    10.7%
6      110    9.4%
3      86     7.4%
2      78     6.7%
4      68     5.8%
8      61     5.2%
7      54     4.6%
9      40     3.4%

Uppercase Letter

Value  Count  Frequency (%)
D      42     53.8%
A      24     30.8%
O      12     15.4%

Other Letter

Value  Count  Frequency (%)
다     32     50.0%
사     32     50.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1166   89.1%
Latin   78     6.0%
Hangul  64     4.9%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      308    26.4%
1      236    20.2%
5      125    10.7%
6      110    9.4%
3      86     7.4%
2      78     6.7%
4      68     5.8%
8      61     5.2%
7      54     4.6%
9      40     3.4%

Latin

Value  Count  Frequency (%)
D      42     53.8%
A      24     30.8%
O      12     15.4%

Hangul

Value  Count  Frequency (%)
다     32     50.0%
사     32     50.0%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   1244   95.1%
Hangul  64     4.9%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
0                 308    24.8%
1                 236    19.0%
5                 125    10.0%
6                 110    8.8%
3                 86     6.9%
2                 78     6.3%
4                 68     5.5%
8                 61     4.9%
7                 54     4.3%
D                 42     3.4%
Other values (3)  76     6.1%

Hangul

Value  Count  Frequency (%)
다     32     50.0%
사     32     50.0%

4212
Real number (ℝ)

HIGH CORRELATION 

Distinct: 130
Distinct (%): 65.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1601.1809
Minimum: 1
Maximum: 5011
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 16.5
median: 597
Q3: 3033.5
95-th percentile: 4731.4
Maximum: 5011
Range: 5010
Interquartile range (IQR): 3017
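
The last two entries are derived from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A quick check with the reported values (variable names are illustrative):

```python
# Values copied from the "Quantile statistics" block for variable "4212".
minimum, q1, q3, maximum = 1, 16.5, 3033.5, 5011

# Spread of the full data and of the middle 50% of the data.
value_range = maximum - minimum
iqr = q3 - q1

print(f"Range: {value_range}")
print(f"Interquartile range (IQR): {iqr:g}")
```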

Descriptive statistics

Standard deviation: 1810.7965
Coefficient of variation (CV): 1.1309131
Kurtosis: -1.2241557
Mean: 1601.1809
Median Absolute Deviation (MAD): 595
Skewness: 0.63991714
Sum: 318635
Variance: 3278984
Monotonicity: Not monotonic
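
Several of these statistics are linked by simple identities: mean = sum / n, variance = standard deviation squared, and CV = standard deviation / mean. The sketch below checks them against the reported figures. Because the inputs are rounded, the results agree only approximately.

```python
# Values copied from the report for variable "4212".
n = 199            # number of observations
total = 318635     # Sum
std = 1810.7965    # Standard deviation (rounded in the report)

mean = total / n          # arithmetic mean
variance = std ** 2       # variance is the square of the standard deviation
cv = std / mean           # coefficient of variation

print(f"Mean: {mean:.4f}")
print(f"Variance: {variance:.0f}")
print(f"CV: {cv:.5f}")
```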

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
1                   12     6.0%
2                   7      3.5%
8                   5      2.5%
9                   4      2.0%
27                  4      2.0%
3                   4      2.0%
13                  3      1.5%
2716                3      1.5%
6                   3      1.5%
1114                3      1.5%
Other values (120)  151    75.9%

Minimum 10 values

Value  Count  Frequency (%)
1      12     6.0%
2      7      3.5%
3      4      2.0%
4      2      1.0%
5      1      0.5%
6      3      1.5%
7      1      0.5%
8      5      2.5%
9      4      2.0%
10     2      1.0%

Maximum 10 values

Value  Count  Frequency (%)
5011   1      0.5%
4872   1      0.5%
4836   3      1.5%
4812   1      0.5%
4792   1      0.5%
4776   1      0.5%
4753   2      1.0%
4729   3      1.5%
4723   1      0.5%
4615   1      0.5%

9.29
Real number (ℝ)

Distinct: 118
Distinct (%): 59.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.551106
Minimum: 0
Maximum: 66.04
Zeros: 1
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 10.4
Q3: 14.51
95-th percentile: 19.585
Maximum: 66.04
Range: 66.04
Interquartile range (IQR): 12.51

Descriptive statistics

Standard deviation: 9.046495
Coefficient of variation (CV): 0.85739784
Kurtosis: 12.959039
Mean: 10.551106
Median Absolute Deviation (MAD): 5.46
Skewness: 2.6923948
Sum: 2099.67
Variance: 81.839073
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
2.0                 33     16.6%
1.0                 24     12.1%
17.04               3      1.5%
13.17               3      1.5%
9.47                3      1.5%
16.36               3      1.5%
8.93                2      1.0%
54.89               2      1.0%
13.34               2      1.0%
9.9                 2      1.0%
Other values (108)  122    61.3%

Minimum 10 values

Value  Count  Frequency (%)
0.0    1      0.5%
1.0    24     12.1%
2.0    33     16.6%
4.85   1      0.5%
4.92   1      0.5%
5.14   1      0.5%
5.28   1      0.5%
6.78   1      0.5%
8.02   1      0.5%
8.15   1      0.5%

Maximum 10 values

Value  Count  Frequency (%)
66.04  1      0.5%
54.89  2      1.0%
51.47  1      0.5%
23.16  1      0.5%
23.04  1      0.5%
22.19  1      0.5%
21.95  1      0.5%
21.65  1      0.5%
19.81  1      0.5%
19.56  1      0.5%

Interactions

[Figures: 4 interaction plots]

Correlations

[Figure: correlation heatmap]

      5      2021   4212   9.29
5     1.000  0.649  0.873  0.640
2021  0.649  1.000  0.585  0.559
4212  0.873  0.585  1.000  0.545
9.29  0.640  0.559  0.545  1.000
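
The "High correlation" alert for 4212 and 5 matches the 0.873 entry in this matrix. The sketch below scans the matrix for pairs whose absolute coefficient reaches a cut-off. The cut-off of 0.8 is an assumption chosen for illustration; it is not read from the report's config.json and may differ from the threshold the tool actually applied.

```python
from itertools import combinations

# Matrix transcribed from the first correlation table.
names = ["5", "2021", "4212", "9.29"]
matrix = [
    [1.000, 0.649, 0.873, 0.640],
    [0.649, 1.000, 0.585, 0.559],
    [0.873, 0.585, 1.000, 0.545],
    [0.640, 0.559, 0.545, 1.000],
]

THRESHOLD = 0.8  # hypothetical cut-off, assumed for this example only

# Visit each unordered pair once (upper triangle) and keep the strong ones.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} and {b}: {r:.3f}")
```

With this cut-off only the pair (5, 4212) is selected, which is the pair named in the Alerts section.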

[Figure: correlation heatmap]

      2021   5
2021  1.000  0.468
5     0.468  1.000

[Figure: correlation heatmap]

      4212    9.29    5      2021
4212  1.000   -0.473  0.539  0.326
9.29  -0.473  1.000   0.479  0.341
5     0.539   0.479   1.000  0.468
2021  0.326   0.341   0.468  1.000

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion]

Sample

520211130554542129.29
052021116807204458.74
12202117032918.22
23202111090750100139510.61
33202111160730300211317.26
45202111500590304412.7
562018H2A28112.0
622021152861212.95
712021다사3900551014.92
85202111500615167023.16
912021다사592046703522.19
520211130554542129.29
18912021다사435048604216.81
1905202111305595279210.02
191520211168073097813.17
192320211123075010002917.23
19322021264902611.63
19462021H1DD11652.0
19512021다사6060416010.0
19612021다사5630612015.28
197520211130553445329.89
19862021H1O11142.0

Duplicate rows

Most frequently occurring

520211130554542129.29# duplicates
3520211130566048369.473
105202111680531271617.043
0320211123076020001278.882
15202111305595279210.022
2520211130562523369.92
4520211150052087913.342
55202111500540417016.662
65202111500590304412.72
75202111500593209914.62
85202111500611161611.92