Overview

Dataset statistics

Number of variables: 5
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.4 KiB
Average record size in memory: 45.3 B

Variable types

Categorical: 3
Numeric: 2

Alerts

py30_avg_value is highly correlated overall with py40_avg_value and 1 other field (High correlation)
py40_avg_value is highly correlated overall with py30_avg_value and 1 other field (High correlation)
base_year is highly correlated overall with py30_avg_value and 1 other field (High correlation)
py30_avg_value has 2 (2.0%) zeros (Zeros)
py40_avg_value has 5 (5.0%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-10 10:07:55.930009
Analysis finished: 2023-12-10 10:07:57.332000
Duration: 1.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
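
A report with this layout is what ydata-profiling writes when a pandas DataFrame is profiled. The following is a minimal sketch of that workflow, not a record of how this particular report was made: the function name `build_report`, the title, and both file paths are hypothetical, because the report does not name its input file. Only the library (ydata-profiling 4.5.1) comes from the section above.

```python
def build_report(csv_path: str, html_path: str) -> None:
    """Profile a CSV file and write an HTML report.

    Sketch only: csv_path and html_path are hypothetical placeholders.
    """
    # Imported lazily so that defining the function does not require the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)

    # The title is an assumption; the original report's title is not shown.
    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(html_path)


# Example call (hypothetical file names):
# build_report("data.csv", "report.html")
```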

Variables

base_year
Categorical

Alerts: High correlation

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top categories: 2015 (61), 2016 (36), 2019 (3)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015
2nd row: 2019
3rd row: 2015
4th row: 2015
5th row: 2015

Common Values

Value  Count  Frequency (%)
2015   61     61.0%
2016   36     36.0%
2019   3      3.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of base_year]

base_month
Categorical

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top categories: 6 (31), 3 (30), 9 (20), 12 (19)

Length

Max length: 2
Median length: 1
Mean length: 1.19
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 12
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value  Count  Frequency (%)
6      31     31.0%
3      30     30.0%
9      20     20.0%
12     19     19.0%
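
The length statistics reported for base_month follow directly from these value counts, on the assumption (suggested by the Categorical type and by the length statistics themselves) that each month is stored as a string such as "3" or "12". A short check of that arithmetic, with variable names chosen here for illustration:

```python
from statistics import mean, median

# Value counts for base_month as reported in this section.
counts = {"6": 31, "3": 30, "9": 20, "12": 19}

# Expand the counts to one string length per observation.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

# 81 one-character values and 19 two-character values:
# mean = (81 * 1 + 19 * 2) / 100
mean_length = mean(lengths)
median_length = median(lengths)

print(round(mean_length, 2), median_length, min(lengths), max(lengths))
```

Only the value "12" has two characters, so the mean length is pulled slightly above 1 while the median stays at 1.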

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of base_month]

gu_dc
Categorical

Distinct: 16
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Top categories: 강서구, 영도구, 기장군, 남구, 중구; Other values (11): 65

Length

Max length: 4
Median length: 3
Mean length: 2.81
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강서구
2nd row: 영도구
3rd row: 기장군
4th row: 남구
5th row: 동구

Common Values

Value             Count  Frequency (%)
강서구               7      7.0%
영도구               7      7.0%
기장군               7      7.0%
남구                7      7.0%
중구                7      7.0%
동구                6      6.0%
동래구               6      6.0%
부산진구              6      6.0%
사상구               6      6.0%
사하구               6      6.0%
Other values (6)  35     35.0%

Length

[Figure: histogram of lengths of the category]

py30_avg_value
Real number (ℝ)

Alerts: High correlation, Zeros

Distinct: 65
Distinct (%): 65.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -4.946
Minimum: -53.7
Maximum: 3.4
Zeros: 2
Zeros (%): 2.0%
Negative: 58
Negative (%): 58.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: -53.7
5th percentile: -42.335
Q1: -3.425
Median: -0.6
Q3: 0.7
95th percentile: 1.815
Maximum: 3.4
Range: 57.1
Interquartile range (IQR): 4.125
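
Range and interquartile range are derived from the other entries in this list. A quick check of the arithmetic with the values reported for py30_avg_value (the variable names are chosen here for illustration):

```python
# Quantile statistics for py30_avg_value, copied from the list above.
minimum, q1, q3, maximum = -53.7, -3.425, 0.7, 3.4

value_range = maximum - minimum  # distance between the two extremes
iqr = q3 - q1                    # spread of the middle 50% of observations

# Round to the precision used in the report to hide floating-point noise.
print(round(value_range, 1), round(iqr, 3))
```

The IQR is small relative to the range, so most of the spread comes from a small number of large negative values; the 5th percentile is already -42.335.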

Descriptive statistics

Standard deviation: 11.926953
Coefficient of variation (CV): -2.4114341
Kurtosis: 7.0791811
Mean: -4.946
Median Absolute Deviation (MAD): 1.85
Skewness: -2.8016749
Sum: -494.6
Variance: 142.25221
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
0.8                5      5.0%
0.3                4      4.0%
-0.1               4      4.0%
0.7                4      4.0%
1.8                3      3.0%
0.2                3      3.0%
0.1                3      3.0%
-2.9               3      3.0%
-2.2               3      3.0%
1.5                3      3.0%
Other values (55)  65     65.0%

Minimum 10 values

Value  Count  Frequency (%)
-53.7  1      1.0%
-46.9  1      1.0%
-45.3  1      1.0%
-43.7  1      1.0%
-43.0  1      1.0%
-42.3  1      1.0%
-34.4  1      1.0%
-33.2  1      1.0%
-14.4  1      1.0%
-13.2  1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
3.4    1      1.0%
3.3    1      1.0%
2.3    1      1.0%
2.2    1      1.0%
2.1    1      1.0%
1.8    3      3.0%
1.7    1      1.0%
1.5    3      3.0%
1.4    2      2.0%
1.2    1      1.0%

py40_avg_value
Real number (ℝ)

Alerts: High correlation, Zeros

Distinct: 56
Distinct (%): 56.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -1.479
Minimum: -20.8
Maximum: 5.2
Zeros: 5
Zeros (%): 5.0%
Negative: 59
Negative (%): 59.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: -20.8
5th percentile: -11.105
Q1: -1.7
Median: -0.3
Q3: 0.425
95th percentile: 2.105
Maximum: 5.2
Range: 26
Interquartile range (IQR): 2.125

Descriptive statistics

Standard deviation: 4.1104522
Coefficient of variation (CV): -2.7792104
Kurtosis: 7.1663891
Mean: -1.479
Median Absolute Deviation (MAD): 1.05
Skewness: -2.4577024
Sum: -147.9
Variance: 16.895817
Monotonicity: Not monotonic
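
Several of these descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks those relations with the values reported for py40_avg_value; the variable names are illustrative.

```python
import math

# Values copied from the report for py40_avg_value.
n = 100                       # number of observations
total = -147.9                # Sum
std = 4.1104522               # Standard deviation

mean = total / n              # expected: -1.479
variance = std ** 2           # expected: about 16.895817
cv = std / mean               # expected: about -2.7792104

print(f"mean={mean:.3f} variance={variance:.6f} cv={cv:.7f}")
```

The CV is negative only because the mean is negative; its magnitude indicates that the standard deviation is roughly 2.8 times the size of the mean.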
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
-0.0               5      5.0%
-0.1               4      4.0%
-0.2               4      4.0%
0.1                4      4.0%
-0.8               4      4.0%
-0.7               3      3.0%
-0.4               3      3.0%
0.2                3      3.0%
0.9                3      3.0%
0.7                3      3.0%
Other values (46)  64     64.0%

Minimum 10 values

Value  Count  Frequency (%)
-20.8  1      1.0%
-16.5  1      1.0%
-14.0  1      1.0%
-13.3  1      1.0%
-13.1  1      1.0%
-11.0  1      1.0%
-9.6   1      1.0%
-6.6   1      1.0%
-5.7   2      2.0%
-5.5   1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
5.2    1      1.0%
4.4    1      1.0%
2.9    1      1.0%
2.7    1      1.0%
2.2    1      1.0%
2.1    1      1.0%
1.9    1      1.0%
1.6    2      2.0%
1.5    2      2.0%
1.4    2      2.0%

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]

                base_year  base_month  gu_dc  py30_avg_value  py40_avg_value
base_year       1.000      0.365       0.000  0.689           0.806
base_month      0.365      1.000       0.000  0.243           0.420
gu_dc           0.000      0.000       1.000  0.573           0.547
py30_avg_value  0.689      0.243       0.573  1.000           0.850
py40_avg_value  0.806      0.420       0.547  0.850           1.000

[Figure: correlation heatmap]

            gu_dc  base_year  base_month
gu_dc       1.000  0.000      0.000
base_year   0.000  1.000      0.352
base_month  0.000  0.352      1.000

[Figure: correlation heatmap]

                py30_avg_value  py40_avg_value  base_year  base_month  gu_dc
py30_avg_value  1.000           0.752           0.587      0.164       0.288
py40_avg_value  0.752           1.000           0.664      0.260       0.223
base_year       0.587           0.664           1.000      0.352       0.000
base_month      0.164           0.260           0.352      1.000       0.000
gu_dc           0.288           0.223           0.000      0.000       1.000
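
The text of this report does not state which coefficient each matrix uses, so the sketch below only illustrates how one common measure, Pearson's r, is computed for the two numeric columns. It uses the 20 rows printed in the Sample section rather than all 100 observations, so its result is not expected to reproduce any entry in the matrices. The helper name `pearson` is chosen here for illustration.

```python
import math

# py30_avg_value and py40_avg_value for the 20 rows shown in the Sample section
# (rows 0-9 followed by rows 90-99).
py30 = [1.1, -46.9, -8.7, -1.6, 2.1, -5.5, 1.2, -45.3, -2.1, 0.1,
        -7.7, 0.7, 1.8, -2.9, -34.4, 3.3, -43.0, -13.2, -4.9, -0.9]
py40 = [2.1, -9.6, -1.0, -0.9, 1.6, -5.2, -1.4, -2.9, -0.7, -0.0,
        -2.8, -1.0, -3.1, -1.8, -13.3, 1.5, -16.5, -5.5, -5.7, 0.4]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(py30, py40)
print(f"Pearson r on the 20 sample rows: {r:.3f}")
```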

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion]

Sample

First rows

    base_year  base_month  gu_dc  py30_avg_value  py40_avg_value
0   2015       3           강서구    1.1             2.1
1   2019       12          영도구    -46.9           -9.6
2   2015       3           기장군    -8.7            -1.0
3   2015       3           남구     -1.6            -0.9
4   2015       3           동구     2.1             1.6
5   2015       3           동래구    -5.5            -5.2
6   2015       3           부산진구   1.2             -1.4
7   2019       12          중구     -45.3           -2.9
8   2015       3           사상구    -2.1            -0.7
9   2015       3           사하구    0.1             -0.0

Last rows

    base_year  base_month  gu_dc  py30_avg_value  py40_avg_value
90  2016       6           서구     -7.7            -2.8
91  2016       6           수영구    0.7             -1.0
92  2016       6           연제구    1.8             -3.1
93  2016       6           영도구    -2.9            -1.8
94  2016       6           중구     -34.4           -13.3
95  2016       6           해운대구   3.3             1.5
96  2016       9           강서구    -43.0           -16.5
97  2016       9           금정구    -13.2           -5.5
98  2016       9           기장군    -4.9            -5.7
99  2016       9           남구     -0.9            0.4