When data is not normally distributed, or when outliers give a distorted picture of the association between two random variables, Spearman's rank correlation, a non-parametric measure, can be used instead of Pearson's correlation coefficient.
On this webpage we explain the basic concepts of Spearman's correlation and how to calculate it. How to perform hypothesis testing to determine whether Spearman's correlation is statistically equal to zero (i.e. to determine whether the two variables are independent) is explained on a separate page.
Definition 1: Spearman's rank correlation (also called Spearman's rho) is Pearson's correlation coefficient applied to the ranks of the data.
Example 1: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample and the number of hours they listen to rock music per month. Determine the strength of the correlation between IQ and rock music listening using both Pearson's correlation coefficient and Spearman's rank correlation, and compare the results.
Figure 1 – Data for Example 1
To calculate Spearman's rho, we first rank each of the IQ scores and each of the Rock scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), where the final argument 1 ranks the scores in ascending order, and so we put this formula in cell C4 and copy it down to C13. The Rock scores are ranked the same way in D4:D13. If you are using Excel 2007, use the Real Statistics function RANK_AVG instead of RANK.AVG (as explained in Ranking).
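To reproduce the ranking step outside Excel, here is a hypothetical Python helper `rank_avg` that mimics the documented behavior of RANK.AVG(number, ref, order): an order of 0 ranks in descending order (largest = 1), any nonzero order ranks in ascending order, and tied values receive the average of the positions they occupy. The IQ values are made up for illustration; they are not the Figure 1 data.

```python
def rank_avg(number, ref, order=0):
    """Rank of `number` within `ref`, averaging over ties (like Excel's RANK.AVG)."""
    equal = sum(v == number for v in ref)
    if equal == 0:
        raise ValueError("number does not occur in ref")  # Excel returns #N/A
    if order == 0:
        ahead = sum(v > number for v in ref)  # descending: larger values come first
    else:
        ahead = sum(v < number for v in ref)  # ascending: smaller values come first
    # tied values occupy positions ahead+1 .. ahead+equal; return their mean
    return ahead + (equal + 1) / 2

# Hypothetical IQ scores standing in for A4:A13; `ranks` plays the role of column C
iq = [105, 98, 120, 98, 110]
ranks = [rank_avg(v, iq, 1) for v in iq]
print(ranks)  # [3.0, 1.5, 5.0, 1.5, 4.0]
```

The two tied scores of 98 occupy positions 1 and 2, so each receives rank 1.5.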
We now calculate both correlation coefficients as follows:
Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036
Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115
Both coefficients are close to zero, so based on this sample there isn't much correlation between IQ and listening to rock music.
Observation: When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:
Calculate Pearson's correlation coefficient for the sample with and without the outliers. If there isn't much difference, then you can be fairly confident that the outliers are not influencing the results. You can also calculate Spearman's rank correlation. If it is close to Pearson's correlation coefficient, this is another good indicator that the outliers are not substantially influencing the results.
If there are clear differences then you will need to be cautious about how you treat the outliers.
Similarly, if you test the sample data for the x and y variables and find that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you should use Spearman's coefficient rather than Pearson's.
Figure 2 displays a scatter diagram for the data in Example 1. The data is fairly randomly scattered, although there is a potential outlier where rock music listening spikes to 45 hours. This gives some evidence that Spearman's rho might be a better choice.
Figure 2 – Scatter diagram for data from Example 1
Observation: Spearman's rho for the data in ranges R1 and R2 can be calculated in Excel via the array formula
=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))
For versions of Excel prior to Excel 2010, the following formula will do the job.
Real Statistics Function: The Real Statistics Resource Pack supplies the following function:
SCORREL(R1, R2) = Spearman’s rho for the data in ranges R1 and R2
For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.
Observation: When there are no ties in the ranking, there is an alternative way of calculating Spearman's rho using the following property.
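Property 1: When there are no ties, Spearman's rho is equal to

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i = \text{rank}(x_i) - \text{rank}(y_i)$ and $n$ is the number of data pairs.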
Example 2: Calculate Spearman’s rho for the data from Example 1 using Property 1.
Figure 3 – Alternative way of calculating rho for Example 1
Using Property 1 and the data in Figure 3: