Suppose you have collected the heights and weights of the students in a class and you want to figure out how the two are related.

The relationship between these two variables can be measured mathematically in one of three ways:

1) Covariance

2) Pearson Correlation Coefficient

3) Spearman's rank correlation coefficient

Each of these metrics has its own pros and cons so let's dive deeper into them.

 

Covariance


Covariance is a measure of the joint variability of two random variables.

If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, (i.e., the variables tend to show similar behavior), the covariance is positive.

In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, (i.e., the variables tend to show the opposite behavior), the covariance is negative.

The sign of the covariance, therefore, shows the direction of the linear relationship between the variables.
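In symbols, the usual textbook definition looks roughly like this. The notation is conventional rather than taken from this post: x̄ and ȳ are the sample means and n is the number of paired observations. The sample version below divides by n − 1, though some texts divide by n.

```latex
\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])\,(Y - \mathbb{E}[Y])\big]

s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```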

 


If we take the covariance of a variable with itself, we get the variance of that variable, which is zero only when the variable is constant.
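Here is a minimal sketch of the computation in plain Python. The heights and weights are made-up numbers for illustration, and `covariance` is a helper written for this example (it uses the sample convention of dividing by n − 1). The covariance of the heights with themselves comes out equal to their variance.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance (divides by n - 1) of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical class data: heights in cm, weights in kg.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 55, 60, 66, 80]

cov_hw = covariance(heights_cm, weights_kg)   # positive: taller goes with heavier
var_h = covariance(heights_cm, heights_cm)    # covariance with itself = variance

print("cov(height, weight):", cov_hw)
print("cov(height, height):", var_h)
```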

 

A simple way to understand covariance is to look at a graph like this one:

[Figure: covariance graph showing stock market returns against economic growth, and gasoline prices against world oil production]

Here, stock market returns rise as economic growth rises and fall as it falls, so we can say these two are positively correlated.

Gasoline prices and world oil production, on the other hand, move in opposite directions: one decreases as the other increases, so we can say they are negatively correlated.

The reason an increasing relationship has positive covariance is that most points lie either above both means or below both means. In both cases the product of the two deviations from the mean is positive, which makes the overall covariance positive.
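To see that argument point by point, here is a small sketch with made-up numbers in which y rises with x. Each point contributes one product of deviations, and the covariance is just their sum divided by n − 1.

```python
from statistics import mean

# Hypothetical data in which y increases along with x.
xs = [1, 2, 3, 4, 5]
ys = [10, 20, 25, 40, 55]

mx, my = mean(xs), mean(ys)

# One product of deviations per point: positive when x and y sit on the
# same side of their means, negative when they sit on opposite sides.
products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]

for x, y, p in zip(xs, ys, products):
    print(f"x={x}, y={y}, product of deviations={p}")

# Sample covariance is the sum of these products divided by n - 1.
cov_xy = sum(products) / (len(xs) - 1)
print("covariance:", cov_xy)
```

In this particular dataset no point has a negative product. In general a few points can, as long as the positive products outweigh them.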

Note
1) The magnitude of covariance says nothing about how strong the relationship is. A covariance of 5, taken on its own, doesn't mean anything.

In fact, if we change the units of the heights from centimetres to feet, or of the weights from pounds to kilograms, the covariance for the same dataset will change.

What if we standardize both variables before computing the covariance? The result is the correlation, and that does tell us how strongly the data are related.

2) Covariance is sensitive to outliers. A few extreme points can make the covariance negative even when the rest of the data follows an increasing trend.
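Both notes can be checked with a short sketch. Everything here is an assumption made for illustration: the student data, the outlier, and the helper functions `covariance` and `standardize`. The first part converts the same data to different units, and the second part appends a single extreme point to an otherwise increasing dataset.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def standardize(xs):
    """Rescale to zero mean and unit (sample) standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

# Note 1: the same hypothetical students in two unit systems.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 55, 60, 66, 80]
heights_ft = [h / 30.48 for h in heights_cm]     # 1 ft = 30.48 cm
weights_lb = [w * 2.20462 for w in weights_kg]   # 1 kg is about 2.20462 lb

cov_metric = covariance(heights_cm, weights_kg)
cov_imperial = covariance(heights_ft, weights_lb)    # differs from cov_metric

# Standardizing first removes the units, so both versions agree.
corr_metric = covariance(standardize(heights_cm), standardize(weights_kg))
corr_imperial = covariance(standardize(heights_ft), standardize(weights_lb))

# Note 2: an increasing trend plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 * v for v in x]
cov_clean = covariance(x, y)                      # positive
cov_outlier = covariance(x + [30], y + [-40])     # one outlier flips the sign

print(cov_metric, cov_imperial)
print(corr_metric, corr_imperial)
print(cov_clean, cov_outlier)
```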

Pearson correlation coefficient

The Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.

Owing to the Cauchy–Schwarz inequality, it has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
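Written out, the standard definition is the covariance divided by the product of the two standard deviations. The notation below is the conventional one and is assumed here: σ for the standard deviations, x̄ and ȳ for the sample means, and r for the sample estimate.

```latex
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```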


ρ = 1 when there is a perfect positive linear correlation.

A trivial example of this would be the height of a group of individuals measured in centimetres and in inches.

0 < ρ < 1 means there is some positive correlation, and the closer ρ is to 1, the stronger the linear relationship.

Similarly, −1 < ρ < 0 means the variables are inversely related, and the closer ρ is to −1, the stronger the inverse correlation.

ρ = 0 when there is no linear correlation. The variables may still be related in a nonlinear way.

PCC works well when the relationship is linear, but it doesn't work that well for nonlinear relationships.
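A quick sketch of these cases in plain Python. The `pearson` helper and all the data are written just for this example. Heights in cm against the same heights in inches give ρ = 1, a decreasing straight line gives ρ = −1, and y = x² on a range symmetric around zero gives ρ = 0 even though y is completely determined by x.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Perfect positive linear relation: the same heights in two units.
heights_cm = [150, 160, 165, 170, 180]
heights_in = [h / 2.54 for h in heights_cm]
r_same = pearson(heights_cm, heights_in)

# Perfect negative linear relation.
x = [-2, -1, 0, 1, 2]
r_down = pearson(x, [-2 * v + 3 for v in x])

# Perfect but nonlinear (and non-monotonic) relation: y = x^2.
r_square = pearson(x, [v ** 2 for v in x])

print(r_same, r_down, r_square)
```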

 

Spearman's rank correlation coefficient

Spearman's rank correlation coefficient, or Spearman's rho, is named after Charles Spearman and is often denoted by the Greek letter ρ (rho).

It is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).

It assesses how well the relationship between two variables can be described using a monotonic function.

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables. While Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).

If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Spearman's correlation coefficient does not care whether the relationship is linear or not, only whether it is monotonic.
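The "Pearson on the ranks" description translates almost directly into code. This is a minimal sketch, not a full implementation: the `ranks` helper assumes there are no repeated values (ties would need average ranks), and the data are made up. For y = x³, which is monotonic but not linear, Spearman's rho is 1 while Pearson's coefficient falls short of 1.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no repeated values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]   # monotonic, but not linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

print("Pearson: ", r_pearson)
print("Spearman:", r_spearman)
```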


 

Note

Correlation doesn't imply causation

This means that just because one variable tends to increase or decrease along with another, it doesn't follow that one causes the other.

 

 

 

If you take anything away from this blog post, let it be this: please don't buy the rock 😛

