R-Squared/Coefficient of determination

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

This metric is specifically designed for regression-based algorithms where the output is a real value.

Computing Coefficient of determination

Let x, y and ŷ be the input, output and predicted output vectors in linear regression.

Assume there are n points, so that the i-th point has input x_i, output y_i and predicted output ŷ_i.

We can then define the error e_i as e_i = y_i − ŷ_i

Sum of Squares Total

Let the mean of all the outputs in the training data be y_mean, that is,

y_mean = (y₁ + y₂ + … + y_n)/n

Then we can define the sum of squares total as

SS_total = (y₁ − y_mean)² + (y₂ − y_mean)² + … + (y_n − y_mean)²
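The two definitions above can be sketched in plain Python. The function names `mean` and `ss_total` and the sample outputs are assumptions made for illustration only; they do not come from any library.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def ss_total(y):
    """Sum of squared deviations of each output from the mean output."""
    y_mean = mean(y)
    return sum((y_i - y_mean) ** 2 for y_i in y)


# Illustrative outputs, made up for this sketch.
y = [1, 2, 3]
print(mean(y))      # 2.0
print(ss_total(y))  # 2.0
```

Note that SS_total depends only on the observed outputs, not on any model.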

Now consider a very naive model that simply returns the average y_mean for every new query point.

In this case every prediction equals y_mean, so the sum of squared differences between the predictions and y_mean would be 0.

Sum of Squares Residual

Similarly, we can define SS_residue as the sum of squares of the differences between the actual output and the predicted output.

SS_residue = (y₁ − ŷ₁)² + (y₂ − ŷ₂)² + … + (y_n − ŷ_n)² = e₁² + e₂² + … + e_n²

where e_i = y_i − ŷ_i

For the naive model, every predicted value is the mean y_mean, so

SS_residue = SS_total
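As a quick check of the naive-model claim, here is a minimal sketch in plain Python. The helper names `ss_total` and `ss_residue` are hypothetical, and the sample outputs are borrowed from the scikit-learn example later in this post.

```python
import math


def ss_total(y):
    """Sum of squared deviations of the outputs from their mean."""
    y_mean = sum(y) / len(y)
    return sum((y_i - y_mean) ** 2 for y_i in y)


def ss_residue(y, y_hat):
    """Sum of squared differences between actual and predicted outputs."""
    return sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, y_hat))


y = [3, -0.5, 2, 7]

# Naive model: predict the mean output for every point.
y_mean = sum(y) / len(y)
naive_predictions = [y_mean] * len(y)

print(math.isclose(ss_residue(y, naive_predictions), ss_total(y)))  # True
```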

Coefficient of determination

Hence the coefficient of determination is

R² = 1 − SS_residue / SS_total

R² is related to the fraction of variance unexplained (FVU): the second term compares the unexplained variance (the variance of the model's errors) with the total variance (of the data).
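A from-scratch computation may make this concrete. The sketch below assumes the standard definition R² = 1 − SS_residue / SS_total; the function name `r_squared` is hypothetical, and the data are the first pair of vectors from the scikit-learn example later in this post.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_residue / SS_total."""
    y_mean = sum(y) / len(y)
    ss_total = sum((y_i - y_mean) ** 2 for y_i in y)
    ss_residue = sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, y_hat))
    if ss_total == 0:
        # All outputs are identical, so the ratio is undefined in this sketch.
        raise ValueError("SS_total is 0; R-squared is undefined")
    return 1 - ss_residue / ss_total


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(r_squared(y_true, y_pred))  # approximately 0.9486
```

Here SS_residue = 1.5 and SS_total = 29.1875, so the fraction of variance unexplained is about 0.0514.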

There are four cases to consider:

Case 1: SS_residue = 0. In this case R² = 1, which is the best case we could have.

Case 2: SS_residue = SS_total. In this case R² = 0, which is the same as the naive mean model.

Case 3: 0 < SS_residue < SS_total. In this case R² lies between 0 and 1, which is what most fitted models give.

Case 4: SS_residue > SS_total. In this case R² is less than 0. This is the worst case: the model is performing worse than the naive mean model.
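The four cases can be told apart from the value of R² alone. The sketch below is illustrative only: `r_squared` and `which_case` are hypothetical helpers, R² is taken to be 1 − SS_residue / SS_total, and the tolerance `tol` is an assumed value for floating-point comparison.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_residue / SS_total."""
    y_mean = sum(y) / len(y)
    ss_total = sum((y_i - y_mean) ** 2 for y_i in y)
    ss_residue = sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, y_hat))
    return 1 - ss_residue / ss_total


def which_case(y, y_hat, tol=1e-12):
    """Return the case number (1 to 4) that the predictions fall into."""
    r2 = r_squared(y, y_hat)
    if math.isclose(r2, 1.0, abs_tol=tol):
        return 1  # perfect fit, SS_residue = 0
    if math.isclose(r2, 0.0, abs_tol=tol):
        return 2  # no better than predicting the mean
    if r2 > 0:
        return 3  # better than the mean model, but not perfect
    return 4      # worse than the mean model


y = [1, 2, 3]
print(which_case(y, [1, 2, 3]), r_squared(y, [1, 2, 3]))  # 1 1.0
print(which_case(y, [2, 2, 2]), r_squared(y, [2, 2, 2]))  # 2 0.0
print(which_case(y, [1, 2, 4]), r_squared(y, [1, 2, 4]))  # 3 0.5
print(which_case(y, [3, 2, 1]), r_squared(y, [3, 2, 1]))  # 4 -3.0
```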

>>> from sklearn.metrics import r2_score
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> r2_score(y_true, y_pred)  
0.948...
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> r2_score(y_true, y_pred,
...          multioutput='variance_weighted') 
0.938...
>>> y_true = [1, 2, 3]
>>> y_pred = [1, 2, 3]
>>> r2_score(y_true, y_pred)
1.0
>>> y_true = [1, 2, 3]
>>> y_pred = [2, 2, 2]
>>> r2_score(y_true, y_pred)
0.0
>>> y_true = [1, 2, 3]
>>> y_pred = [3, 2, 1]
>>> r2_score(y_true, y_pred)
-3.0

Published by admin on October 20, 2019
