Spearman’s Rank Correlation

When data is not normally distributed or when the presence of outliers gives a distorted picture of the association between two random variables, the Spearman’s rank correlation is a non-parametric test that can be used instead of the Pearson’s correlation coefficient.

On this webpage we will explain the basic concepts about Spearman’s correlation and how to calculate it. Click here to find out how to perform hypothesis testing to determine whether Spearman’s correlation statistically equal to zero (i.e. to determine whether two samples are independent).

Definition 1: The Spearman’s rank correlation (also called Spearman’s rho) is the Pearson’s correlation coefficient on the ranks of the data.

Example 1: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample with the number of hours they listen to rock music per month. Determine the strength of the correlation between IQ and rock music using both the Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.

Spearman's rho

Figure 1 – Data for Example 1

To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rock scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the Real Statistics function RANK_AVG instead of RANK.AVG (as explained in Ranking).

We now calculate both correlation coefficients as follows:

Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036

Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115

We see that there isn’t much of a correlation between IQ and listening to rock music based on the sample.

Observation: When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:

Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to the Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.

If there are clear differences then you will need to be cautious about how you treat the outliers.

Similarly if you test the sample data for the x and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use the Spearman’s coefficient rather than Pearson’s.

Figure 2 displays a scatter diagram for the data in Example 1. We see that the data is pretty randomly scattered although there is a potential outlier where the rap music listening spikes to 45. This gives some evidence that the Spearman’s rho might be a better choice.

Scatter plot Spearman's rho

Figure 2 – Scatter diagram for data from Example 1

Observation: Spearman’s rho for the data in ranges R1 and R2 can be calculated in Excel via the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

For versions of Excel prior to Excel 2010, the following formula will do the job.

=CORREL(RANK(R1,R1,1)+(COUNTIF(R1,R1)-1)/2,RANK(R2,R2,1)+(COUNTIF(R2,R2)-1)/2)

Real Statistics Function: The Real Statistics Resource Pack supplies the following function:

SCORREL(R1, R2) = Spearman’s rho for the data in ranges R1 and R2

For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.

Observation: When there are no ties in the ranking, there is alternative way of calculating Spearman’s rho using the following property.

Property 1: When there are no ties, Spearman’s rho is equal to

Spearman's rho

where di = rank xi – rank yi.

Example 2: Calculate Spearman’s rho for the data from Example 1 using Property 1.

Spearman's rho quick version

Figure 3 – Alternative way of calculating rho for Example 1

Using Property 1 and the data in Figure 3:

image3655

To find out how to test whether Spearman’s correlation coefficient is statistically equal to zero (i.e. to determine whether two samples are independent) click here.

48 Responses to Spearman’s Rank Correlation

  1. Lily says:

    Dear Charles,
    First I want to thank you for your efforts on the website. Now my question:compared to using the formula for calculating spearman,using =CORREL (…) is a very easy way to get the same result. Unfortunately, in some rare cases I achieve the same result for spearman rho through both formula and =CORREL. This happens mostly in a sample of 10 respondents. In all other cases I’ve two different results. Meanwhile I checked every step so many times but unfortunately I’m not able to achieve the same outcome through both ways consistently. I hoop my problem is clear.

    Thank you.
    Lily

    • Charles says:

      Dear Lily,
      I don’t completely understand the problem that you are identifying. Can you send me an Excel file with an example? You can send it to my email addres, which is listed on the Contact Us webpage.
      Charles

    • Charles says:

      Lily,

      Thanks for sending me the Excel file with your data.

      The formula (1-6*sum/n(n^2-1)) is only valid when there are no ties. In my example there were no ties; in your example there are ties. Using CORREL on the ranked data will always work. Using SCORREL on the original data will always work.

      Charles

  2. Dan says:

    Would you mind changing the example above? I often find that rap music is a stand in for race in many circles. Psychology has such a tainted history of manufacturing differences in IQ according to racial lines. I’m sure you didn’t mean to create a racist example, but I think there are so many other scenarios you could use that don’t reinforce the experience of so many people with stereotypes.

  3. Henry says:

    Good afternoon,
    Thank you for the page, it is really helpful.
    I just have comment: how can I deal with missing observations? I mean, I need to calculate a Spearman correlation matrix, and unfortunately I need to deal with missing observations which are different for each of the variables.
    Can you please help?
    Thanks!

    • Charles says:

      Henry,
      When you say that you have a Spearman correlation matrix, are you referring to a square matrix whose elements are pairwise Spearman’s correlation coefficients based on ranks (and not Pearson’s correlation coefficients)? How many variables are there (i.e. what is the dimension of the correlation matrix)? How many subjects are in the sample?
      Charles

      • Henry says:

        Yes I have a set of 5 ordinal variables for which I would calculate all the correlations, something like the normal square correlation matrix but using Spearman rho instead.
        My problem is that each variable has different missing observations (for different units, so the size of the sample is different for each variable, going from 87 to 98). I am not sure on how to deal with this in excel.
        Thanks for your help!
        Henry

        • Charles says:

          Henry,
          You can calculate the correlation coefficient for each pair (with listwise deletion of missing data for each pair). This will give you C(5,2) = 10 correlation coefficients plus 5 variances. Put these into a 5 x 5 matrix (here 10 x 2 + 5 = 25) and you have a pseudo correlation matrix. This matrix won’t have all the properties of a real correlation matrix, but if you don’t have too much missing data it will be pretty close.
          Charles

  4. Faith Modie says:

    Good morning, please give me an example of calculating spearman’s rho when having a tie.

    • Charles says:

      Hello Faith,
      Just change the first IQ value in Figure 1 of the referenced webpage from 99 to 90 and you will get a tied value. You can do this on the Excel worksheet that you can download from Examples Workbooks.
      Charles

  5. Allan Castillo says:

    Hi, Charles.

    Using the Resource Pack, the value Ive got for Spearman for the above unranked data is 0.1244 and NOT -0.115. However, the Spearman value is the same for unranked and ranked dataset in Example 1 of Spearman’s Rank Correlation Hypothesis Testing page (http://www.real-statistics.com/correlation/spearmans-rank-correlation/spearmans-rank-correlation-detailed/). As I understand it, the Resource Pack correlation option does the ranking work for us such that we do not have to rank the dataset by ourselves and we just put the range of the unranked datasets in the correlation menu of the Resource Pack to get the correlation coefficient, is this so? Thanks very much.

    • Allan Castillo says:

      by “above unranked data” I mean Figure 1 of example 1. Thanks.

    • Charles says:

      Allan,

      Unfortunately, there is an error in the SCORREL function. The correct Spearman’s correlation value is -0.115 and not 0.1244. This error occurs when one of the data elements is zero. I will correct this in the next release, which I will issue in the next day or so.

      The Resource Pack does the ranking for you. You don’t need to do it yourself.

      Thanks for your comment. Thanks to following up on your comment I was able to discover that the SCORREL has a bug.

      Charles

  6. lukku says:

    how to find R1 & R2..?

    • Charles says:

      Lukku,
      R1 and R2 represent two data ranges. If you want the correlation coefficient corresponding to the data in A1:A10 and B1:B10, you can think of R1 as representing A1:A10 and R2 as representing B1:B10.
      Charles

  7. Irene says:

    I want to know the correlation of the weighted means of my two set of respondents, the parents and students… I used five-point likert scale. I ranked responses based on the statements given on the two groups of respondents according to the weighted means. Now, my question is how am I going to correlate it? Thank you.

  8. Madeleine says:

    Hello,

    I have tried using =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) but Excel really doesn’t like the (R1,R1,1) or (R2,R2,1) bits.

    I’ve substituted them for (A2,A12,1) and (B2,B12,1) because my data is in the first two columns, 11 pieces each from rows 2-12. It doesn’t help. I don’t know why there are three terms in the brackets anyway – what is the “1” for at the end?

    I’ve also tried to do it the long way round – do a RANK.AVG on each column and then do another column for d and one for d². But I can’t work out what to type after RANK.AVG. The Help function doesn’t really help.

    (I know it would be easier to rank by hand, but ultimately I want to enter 120 pairs of data).

    Thanks to anyone who can help me…

    • Charles says:

      Madeline,

      R1 and R2 are just abbreviations for range 1 and range 2. You can’t enter these letters in the formulas unless you are actually referring to cells R1 and R2.

      More importantly, when you enter a range into any Excel formula you need to specify the range by its end points separated by a colon (not a comma). Thus your two ranges are probably A2:A12 and B2:B12. The formula you are probably trying to use if therefore

      =CORREL(RANK.AVG(A2:A12,A2:A12,1),RANK.AVG(B2:B12,B2:B12,1))

      Charles

  9. Phoenix says:

    Is there any way I can solve two weighted means that do not have the same number of variables using spearman’s?

    • Charles says:

      Phoenix,
      Sorry, but I don’t understand what you mean by “solve two weighted means” nor why you would want to use spearman’s to do this.
      Charles

  10. Diswat says:

    Dear Sir
    What can I do if i have 3 repetitions.According to your table if student have the same IQ score how can you rank them and how can you calculate?
    Best Regard

    • Charles says:

      As described on the referenced webpage, you can calculate Spearman’s rank correlation by using the formula
      =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

      The RANK.AVG function takes care of any ties.

      You can also use the Real Statistics SCORREL(R1,R2) function.

      Charles

  11. Stacey Provan says:

    Hi Charles,

    I am completing an elective project and am trying to establish if there is a correlation between being a member of a certain organisation and whether or not someone thinks a subject matter should be included in training and development (answer options to this are simply yes and no).

    Firstly, is this possible to do? And if so how do I do it?

    Stacey

    • Charles says:

      Stacey,
      You can calculate the correlation using the CORREL function. It is hard to tell from the summary you have given, but a t test may be sufficient for your purposes.
      Charles

  12. Ankit Gandhi says:

    Is it possible to use Spearman’s rank correlation between 3 and 5 point likert scale questions?
    In my survey I am asking question on attitude and purchase behavior, but both of them have questions in different likert scale. Attitude questions are in 5 point likert scale and purchase behavior question are in 3 point likert scale. So is it possible to use Spearman Rank correlation between these two different likert scale question in testing the Hypothesis?

    • Charles says:

      Yes, you can calculate a Spearman’s rank correlation coefficient from different Likert scales. What hypothesis do you want to test?
      Charles

  13. Matthias says:

    Hi Charles,

    what would be an example for the independence case, i.e., r = 0, with ranked values?

  14. malik says:

    Dear Charles!
    Is there any way to find Spearman’s Rank Correlation, when number of variables are not same?
    example X has 35 values and Y has 30 values.

    • Charles says:

      Just like Pearson’s correlation, this statistic performs a calculation based on pairs of data elements from X and Y. Thus the X and Y counts need to be the same.
      Charles

  15. Hoplite says:

    Charles,

    Once again – great website!

    How do you conclude with a spearman rho of -0.115 and a Pearson of -0.036 that ‘the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.’? Isn’t Spearman indicating they are more negatively correlated than Pearson?

    • Charles says:

      Thanks for identifying this mistake. I have removed this phrase, which is clearly in error, from the webpage.
      Charles

  16. Tina says:

    Hello,
    This is a very helpful explanation, thank you.
    You mention that a different approach is needed with a set of data with many ties, and I hope you can help me out, as I am no mathematician, but trying to analyse data from some language tests. I have 46 participants with three test results each (they have scored between 1-6 in each test). I would like to compare the results in two sets of test results to see if they correalate, but there is a large number of ties – what would you suggest that I do?
    Thanks in advance, Tina

    • Charles says:

      Tina,
      Spearman’s handles ties by averaging the ranking values; this is the equivalent of RANK.AVG. I haven’t been able to find a ties correction factor for Spearman’s. In any case, Kendall’s tau is usually recommended over Spearman’s and Kendall’s tau does have a ties correction factor as explained on the webpage Kendall’s Tau with Ties.
      Charles

  17. Hammad Khan says:

    Would you please explain..
    1. In which order we rank the data? Ascending or descending? Which value is labelled as no 1, highest or lowest???

    2. Why it is multiply by 6?

    Thanks.

    • Charles says:

      1. As you can see from Figure 1, the largest data gets the highest ranking.
      2. I don’t know why the formula has a multiplier of 6. I guess the mathematics just worked out that way.
      Charles

  18. Kevin says:

    Thanks a lot. I see in method 2 above you have ranked the values with the smallest being assigned position 1. I tried the opposite so that the biggest value is number 1 and was getting the same answer. is that right? I have tried with other sets of data and the answer is the same whatever the ranking used.

  19. mohsen says:

    Thank you, your information was very helpful. I am Civil engineer and my information about statistic is low. I want to use the first paragraph of this page in my thesis our article, I would appreciate if you give me a reference which includes the meaning of first paragraph.

  20. Lynn says:

    Hello,

    Thank you for your description! I have read that for tied ranks there is a specific formula to use when there are tied ranks. If there aren’t any tied ranks, it would generate the same answer as the formula in the example provided here. However, I’ve seen examples from other sources that use the above formula regardless of whether they are dealing with tied ranks, so I’m a little confused.

    So, just to clarify, in the excel formula
    =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) can the ranks be tied?

    Thank you :)

    • Charles says:

      Lynn,

      In a number of these nonparametric approaches there are two levels at which ties are handled. The first level is to use RANK.AVG to handle ties. In this case if two data items are tied with rank 6, then the actual rank used for both is 6.5. This is the approach used on the referenced webpage (using the formula you have included in your comment). This approach works well provided there aren’t too many ties.

      If there are a lot of ties then in addition a ties correction factor is employed. I have described such a ties correction factor for Kendall’s tau and for Mann-Whitney for example. I have not done this for Spearman’s rho correlation, although the approach is quite similar.

      Charles

  21. Adipati Camma says:

    Many thanks Charles. Your explanation is very helpful.

  22. sumarno says:

    your explanation is very helpful. It’d better be utilized by excel file to be down loaded. Thank U

  23. Hi Charles,

    I am busy writing a MBA thesis and your pages on a Spearman’s and Multiple correlation have been a lifesaver.

    Thank you kindly,
    Gregg

Leave a Reply

Your email address will not be published. Required fields are marked *