Overview

Dataset statistics

Number of variables: 5
Number of observations: 100
Missing cells: 76
Missing cells (%): 15.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.2 KiB
Average record size in memory: 43.3 B
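
The percentages in this table follow directly from the counts. As a quick sanity check, the sketch below recomputes them from the figures printed above; the variable names are illustrative, and the record size is only approximate because the total size is printed to one decimal place.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 5
n_observations = 100
missing_cells = 76
total_size_kib = 4.2  # rounded as printed in the report

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes; this lands close to, not exactly on, the reported
# 43.3 B because the total size above is already rounded.
approx_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"approx. record size: {approx_record_bytes:.1f} B")
```

All 76 missing cells come from a single column (trng_stat), which is why the dataset-level figure is 15.2% while that column alone is 76.0% missing.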

Variable types

Categorical: 2
Numeric: 1
Text: 2

Alerts

High correlation: stnd_year is highly overall correlated with heal_stat
High correlation: heal_stat is highly overall correlated with tms and 1 other field
High correlation: tms is highly overall correlated with heal_stat
Imbalance: stnd_year is highly imbalanced (80.6%)
Missing: trng_stat has 76 (76.0%) missing values
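
The 80.6% imbalance figure for stnd_year can be reproduced from its two value counts (97 and 3). The sketch below assumes the score is one minus the normalized Shannon entropy of the class frequencies; that definition matches the reported number, but the function name and the formula are assumptions rather than something stated in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# stnd_year value counts from this report: 2019 -> 97 rows, 2021 -> 3 rows.
score = imbalance_score([97, 3])
print(f"{score:.1%}")
```

Under this assumption, a 50/50 split would score 0.0, so a 97/3 split scoring about 0.806 is what triggers the alert.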

Reproduction

Analysis started: 2023-12-10 10:07:13.119412
Analysis finished: 2023-12-10 10:07:14.174697
Duration: 1.06 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

stnd_year
Categorical

Alerts: High correlation, Imbalance

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 2019 (97), 2021 (3)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2021
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value  Count  Frequency (%)
2019   97     97.0%
2021   3      3.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 2019: 97 (97.0%), 2021: 3 (3.0%)]

tms
Real number (ℝ)

Alerts: High correlation

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.21
Minimum: 10
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 10
5th percentile: 18
Q1: 28
Median: 30
Q3: 30
95th percentile: 30
Maximum: 32
Range: 22
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.8357568
Coefficient of variation (CV): 0.13597153
Kurtosis: 7.6981488
Mean: 28.21
Median Absolute Deviation (MAD): 0
Skewness: -2.7971704
Sum: 2821
Variance: 14.71303
Monotonicity: Not monotonic
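
Because tms has only six distinct values, most of the statistics above can be rebuilt from its value counts alone. The following is a minimal sketch using Python's standard library. It assumes the sample (n - 1) variance and linearly interpolated quartiles, which agree with the figures reported here; skewness and kurtosis are left out because their exact estimator is not stated in the report.

```python
import statistics

# Value counts copied from the tms frequency table in this report.
value_counts = {30: 55, 28: 33, 18: 7, 32: 3, 15: 1, 10: 1}

# Expand the counts back into the 100 individual observations.
values = sorted(v for v, n in value_counts.items() for _ in range(n))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# "inclusive" interpolates linearly between the observed minimum and maximum.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, round(variance, 5), round(std, 7), round(cv, 8))
print(q1, median, q3, iqr, mad)
```

The MAD of 0 is not an error: 55 of the 100 observations equal the median (30), so more than half of the absolute deviations are zero.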
[Figure: histogram with fixed size bins (bins=6)]

Values by frequency

Value  Count  Frequency (%)
30     55     55.0%
28     33     33.0%
18     7      7.0%
32     3      3.0%
15     1      1.0%
10     1      1.0%

Values in ascending order

Value  Count  Frequency (%)
10     1      1.0%
15     1      1.0%
18     7      7.0%
28     33     33.0%
30     55     55.0%
32     3      3.0%

Values in descending order

Value  Count  Frequency (%)
32     3      3.0%
30     55     55.0%
28     33     33.0%
18     7      7.0%
15     1      1.0%
10     1      1.0%

racer_no
Text

Distinct: 79
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 600
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 58
Unique (%): 58.0%

Sample

1st row: 07-011
2nd row: 02-015
3rd row: 14-011
4th row: 14-008
5th row: 11-007

Value              Count  Frequency (%)
04-007             2      2.0%
03-004             2      2.0%
02-009             2      2.0%
02-016             2      2.0%
14-003             2      2.0%
01-035             2      2.0%
02-031             2      2.0%
02-015             2      2.0%
15-008             2      2.0%
01-030             2      2.0%
Other values (69)  80     80.0%

Most occurring characters

Value  Count  Frequency (%)
0      233    38.8%
-      100    16.7%
1      93     15.5%
2      40     6.7%
3      29     4.8%
4      24     4.0%
5      24     4.0%
7      18     3.0%
9      16     2.7%
6      12     2.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    500    83.3%
Dash Punctuation  100    16.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      233    46.6%
1      93     18.6%
2      40     8.0%
3      29     5.8%
4      24     4.8%
5      24     4.8%
7      18     3.6%
9      16     3.2%
6      12     2.4%
8      11     2.2%

Dash Punctuation
Value  Count  Frequency (%)
-      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  600    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      233    38.8%
-      100    16.7%
1      93     15.5%
2      40     6.7%
3      29     4.8%
4      24     4.0%
5      24     4.0%
7      18     3.0%
9      16     2.7%
6      12     2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  600    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      233    38.8%
-      100    16.7%
1      93     15.5%
2      40     6.7%
3      29     4.8%
4      24     4.0%
5      24     4.0%
7      18     3.0%
9      16     2.7%
6      12     2.0%

heal_stat
Categorical

Alerts: High correlation

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 양호 (good) 67, <NA> 33

Length

Max length: 4
Median length: 2
Mean length: 2.66
Min length: 2
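
The length statistics above are consistent with the 33 `<NA>` entries being counted as an ordinary four-character category rather than as missing data, which would also explain why this variable reports 0 missing values. This is an inference from the numbers, not something the report states. A small sketch of the arithmetic, using the two value counts shown for heal_stat:

```python
import statistics

# Counts copied from the heal_stat frequency table; "<NA>" is treated here
# as a literal 4-character string (an assumption that matches the report).
value_counts = {"양호": 67, "<NA>": 33}

# One length per row: 67 rows of length 2 and 33 rows of length 4.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

mean_length = sum(lengths) / len(lengths)
median_length = statistics.median(lengths)

print(min(lengths), median_length, mean_length, max(lengths))
```

If the `<NA>` rows were true nulls, the mean length over the remaining 67 rows would be exactly 2 and the maximum length would also be 2.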

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 양호 (good)
2nd row: 양호 (good)
3rd row: 양호 (good)
4th row: 양호 (good)
5th row: 양호 (good)

Common Values

Value        Count  Frequency (%)
양호 (good)  67     67.0%
<NA>         33     33.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 양호 (good): 67 (67.0%), <NA>: 33 (33.0%)]

trng_stat
Text

Alerts: Missing

Distinct: 24
Distinct (%): 100.0%
Missing: 76
Missing (%): 76.0%
Memory size: 932.0 B

Length

Max length: 26
Median length: 23
Mean length: 18.75
Min length: 7

Characters and Unicode

Total characters: 450
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
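
The category names used in the character tables of this report (Decimal Number, Dash Punctuation, Other Letter, and so on) are Unicode general categories. As a rough illustration, Python's standard `unicodedata` module exposes the same property as two-letter codes. The sketch below maps those codes to the names seen in this report and counts them for two sample values; `CATEGORY_NAMES` and `category_counts` are illustrative helpers, not part of the profiling tool. Script and block properties are not available in the standard library, so they are not covered.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count the characters of text by Unicode general category name."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        # Fall back to the raw code for categories not listed above.
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# First sample rows of racer_no and trng_stat, as shown in this report.
for sample in ["07-011", "스타트 16회"]:
    print(sample, dict(category_counts(sample)))
```

Hangul syllables fall under Other Letter, which is why trng_stat reports four categories while the purely numeric racer_no codes report only two.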

Unique

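Unique: 24
Unique (%): 100.0%

Sample

1st row: 스타트 16회 (start 16 times)
2nd row: 개인선회 10회, 편대선회 5회, 스타트 10회 (individual turns 10 times, formation turns 5 times, start 10 times)
3rd row: 개인선회 5회, 스타트 24회, 편대 5회 (individual turns 5 times, start 24 times, formation 5 times)
4th row: 개인선회 10회, 스타트 10회, 편대 5회 (individual turns 10 times, start 10 times, formation 5 times)
5th row: 개인선회 7회. 스타트 7회. (individual turns 7 times. start 7 times.)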

Value                        Count  Frequency (%)
개인선회 (individual turns)  19     17.9%
스타트 (start)               18     17.0%
5회 (5 times)                9      8.5%
편대 (formation)             8      7.5%
10회 (10 times)              8      7.5%
8회 (8 times)                5      4.7%
11회 (11 times)              3      2.8%
2회 (2 times)                3      2.8%
1회 (1 time)                 2      1.9%
20회 (20 times)              2      1.9%
Other values (23)            29     27.4%

Most occurring characters

Value              Count  Frequency (%)
(space)            84     18.7%
회                 81     18.0%
,                  28     6.2%
선                 25     5.6%
스                 23     5.1%
타                 23     5.1%
트                 23     5.1%
개                 22     4.9%
인                 22     4.9%
1                  21     4.7%
Other values (12)  98     21.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       247    54.9%
Decimal Number     87     19.3%
Space Separator    84     18.7%
Other Punctuation  32     7.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      21     24.1%
0      14     16.1%
5      13     14.9%
2      13     14.9%
4      7      8.0%
3      7      8.0%
8      6      6.9%
6      3      3.4%
7      2      2.3%
9      1      1.1%

Other Letter
Value  Count  Frequency (%)
회     81     32.8%
선     25     10.1%
스     23     9.3%
타     23     9.3%
트     23     9.3%
개     22     8.9%
인     22     8.9%
편     14     5.7%
대     14     5.7%

Other Punctuation
Value  Count  Frequency (%)
,      28     87.5%
.      4      12.5%

Space Separator
Value    Count  Frequency (%)
(space)  84     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  247    54.9%
Common  203    45.1%

Most frequent character per script

Common
Value             Count  Frequency (%)
(space)           84     41.4%
,                 28     13.8%
1                 21     10.3%
0                 14     6.9%
5                 13     6.4%
2                 13     6.4%
4                 7      3.4%
3                 7      3.4%
8                 6      3.0%
.                 4      2.0%
Other values (3)  6      3.0%

Hangul
Value  Count  Frequency (%)
회     81     32.8%
선     25     10.1%
스     23     9.3%
타     23     9.3%
트     23     9.3%
개     22     8.9%
인     22     8.9%
편     14     5.7%
대     14     5.7%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  247    54.9%
ASCII   203    45.1%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
(space)           84     41.4%
,                 28     13.8%
1                 21     10.3%
0                 14     6.9%
5                 13     6.4%
2                 13     6.4%
4                 7      3.4%
3                 7      3.4%
8                 6      3.0%
.                 4      2.0%
Other values (3)  6      3.0%

Hangul
Value  Count  Frequency (%)
회     81     32.8%
선     25     10.1%
스     23     9.3%
타     23     9.3%
트     23     9.3%
개     22     8.9%
인     22     8.9%
편     14     5.7%
대     14     5.7%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap for the matrix below]

           stnd_year  tms    racer_no  trng_stat
stnd_year  1.000      0.000  0.000     NaN
tms        0.000      1.000  0.000     1.000
racer_no   0.000      0.000  1.000     1.000
trng_stat  NaN        1.000  1.000     1.000

[Figure: correlation heatmap for the matrix below]

           stnd_year  heal_stat
stnd_year  1.000      1.000
heal_stat  1.000      1.000

[Figure: correlation heatmap for the matrix below]

           tms    stnd_year  heal_stat
tms        1.000  0.000      1.000
stnd_year  0.000  1.000      1.000
heal_stat  1.000  1.000      1.000

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that makes patterns in data completion easy to spot]

Sample

First 10 rows

    stnd_year  tms  racer_no  heal_stat  trng_stat
0   2019       15   07-011    양호       스타트 16회
1   2021       32   02-015    양호       <NA>
2   2019       30   14-011    양호       <NA>
3   2019       30   14-008    양호       <NA>
4   2019       30   11-007    양호       <NA>
5   2019       30   11-006    양호       <NA>
6   2019       10   03-004    양호       <NA>
7   2021       32   02-008    양호       <NA>
8   2019       30   09-002    양호       <NA>
9   2019       30   13-001    양호       개인선회 10회, 편대선회 5회, 스타트 10회

Last 10 rows

    stnd_year  tms  racer_no  heal_stat  trng_stat
90  2019       28   01-030    <NA>       <NA>
91  2019       28   01-019    <NA>       <NA>
92  2019       28   01-002    <NA>       <NA>
93  2019       18   12-009    양호       개인선회5회, 스타트 30회 , 편대 1회
94  2019       18   04-006    양호       개인선회 3회,스타트 20회
95  2019       18   01-047    양호       개인선회 2회, 스타트 5회
96  2019       18   02-010    양호       개인선회 4회,스타트32회.
97  2019       18   09-002    양호       개인선회 4회. 스타트 16회
98  2019       18   15-013    양호       스타트 45회
99  2019       18   14-003    양호       <NA>

Korean values: 양호 = good; 개인선회 = individual turns; 편대선회 / 편대 = formation turns; 스타트 = start; N회 = N times.