When data is not normally distributed or when the presence of outliers gives a distorted picture of the association between two random variables, the Spearman’s rank correlation is a non-parametric test that can be used instead of the Pearson’s correlation coefficient.

On this webpage we will explain the basic concepts about Spearman’s correlation and how to calculate it. Click here to find out how to perform hypothesis testing to determine whether Spearman’s correlation statistically equal to zero (i.e. to determine whether two samples are independent).

**Definition 1**: The **Spearman’s rank correlation** (also called **Spearman’s rho**) is the Pearson’s correlation coefficient on the ranks of the data.

**Example 1**: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample with the number of hours they listen to rap music per month. Determine the strength of the correlation between IQ and rap music using both the Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.

**Figure 1 – Data for Example 1**

To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rap scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the supplemental function RANK_AVG instead of RANK.AVG (as explained in Ranking).

We now calculate both correlation coefficients as follows:

Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036

Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115

We see that there isn’t much of a correlation between IQ and listening to rap music based on the sample, although the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.

**Observation**: When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:

Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to the Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.

If there are clear differences then you will need to be cautious about how you treat the outliers.

Similarly if you test the sample data for the *x* and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use the Spearman’s coefficient rather than Pearson’s.

Figure 2 displays a scatter diagram for the data in Example 1. We see that the data is pretty randomly scattered although there is a potential outlier where the rap music listening spikes to 45. This gives some evidence that the Spearman’s rho might be a better choice.

**Figure 2 – Scatter diagram for data from Example 1**

**Observation**: Spearman’s rho for the data in ranges R1 and R2 can be calculated in Excel via the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

For versions of Excel prior to Excel 2010, the following formula will do the job.

=CORREL(RANK(R1,R1,1)+(COUNTIF(R1,R1)-1)/2,RANK(R2,R2,1)+(COUNTIF(R2,R2)-1)/2)

**Real Statistics Function**: The Real Statistics Resource Pack supplies the following function:

SCORREL(R1, R2) = Spearman’s rho for the data in ranges R1 and R2

For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.

**Observation**: When there are no ties in the ranking, there is alternative way of calculating Spearman’s rho using the following property.

**Property 1**: When there are no ties, Spearman’s rho is equal to

where *d _{i} = rank x_{i} – rank *y

_{i}.**Example 2**: Calculate Spearman’s rho for the data from Example 1 using Property 1.

**Figure 3 – Alternative way of calculating rho for Example 1**

Using Property 1 and the data in Figure 3:

To find out how to test whether Spearman’s correlation coefficent is statistically equal to zero (i.e. to determine whether two samples are independent) click here.

Hi Charles,

I am busy writing a MBA thesis and your pages on a Spearman’s and Multiple correlation have been a lifesaver.

Thank you kindly,

Gregg

your explanation is very helpful. It’d better be utilized by excel file to be down loaded. Thank U

Many thanks Charles. Your explanation is very helpful.

Hello,

Thank you for your description! I have read that for tied ranks there is a specific formula to use when there are tied ranks. If there aren’t any tied ranks, it would generate the same answer as the formula in the example provided here. However, I’ve seen examples from other sources that use the above formula regardless of whether they are dealing with tied ranks, so I’m a little confused.

So, just to clarify, in the excel formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) can the ranks be tied?

Thank you

Lynn,

In a number of these nonparametric approaches there are two levels at which ties are handled. The first level is to use RANK.AVG to handle ties. In this case if two data items are tied with rank 6, then the actual rank used for both is 6.5. This is the approach used on the referenced webpage (using the formula you have included in your comment). This approach works well provided there aren’t too many ties.

If there are a lot of ties then in addition a ties correction factor is employed. I have described such a ties correction factor for Kendall’s tau and for Mann-Whitney for example. I have not done this for Spearman’s rho correlation, although the approach is quite similar.

Charles

Thank you, your information was very helpful. I am Civil engineer and my information about statistic is low. I want to use the first paragraph of this page in my thesis our article, I would appreciate if you give me a reference which includes the meaning of first paragraph.

If you want to cite the webpage use the citation at http://www.real-statistics.com/appendix/citation-real-statistics-software-website/

Charles

Thanks a lot. I see in method 2 above you have ranked the values with the smallest being assigned position 1. I tried the opposite so that the biggest value is number 1 and was getting the same answer. is that right? I have tried with other sets of data and the answer is the same whatever the ranking used.

Kevin,

I believe that you are correct.

Charles