When data is not normally distributed or when the presence of outliers gives a distorted picture of the association between two random variables, the Spearman’s rank correlation is a non-parametric test that can be used instead of the Pearson’s correlation coefficient.

On this webpage we will explain the basic concepts about Spearman’s correlation and how to calculate it. Click here to find out how to perform hypothesis testing to determine whether Spearman’s correlation statistically equal to zero (i.e. to determine whether two samples are independent).

**Definition 1**: The **Spearman’s rank correlation** (also called **Spearman’s rho**) is the Pearson’s correlation coefficient on the ranks of the data.

**Example 1**: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample with the number of hours they listen to rock music per month. Determine the strength of the correlation between IQ and rock music using both the Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.

**Figure 1 – Data for Example 1**

To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rock scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the Real Statistics function RANK_AVG instead of RANK.AVG (as explained in Ranking).

We now calculate both correlation coefficients as follows:

Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036

Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115

We see that there isn’t much of a correlation between IQ and listening to rock music based on the sample.

**Observation**: When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:

Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to the Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.

If there are clear differences then you will need to be cautious about how you treat the outliers.

Similarly if you test the sample data for the *x* and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use the Spearman’s coefficient rather than Pearson’s.

Figure 2 displays a scatter diagram for the data in Example 1. We see that the data is pretty randomly scattered although there is a potential outlier where the rap music listening spikes to 45. This gives some evidence that the Spearman’s rho might be a better choice.

**Figure 2 – Scatter diagram for data from Example 1**

**Observation**: Spearman’s rho for the data in ranges R1 and R2 can be calculated in Excel via the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

For versions of Excel prior to Excel 2010, the following formula will do the job.

=CORREL(RANK(R1,R1,1)+(COUNTIF(R1,R1)-1)/2,RANK(R2,R2,1)+(COUNTIF(R2,R2)-1)/2)

**Real Statistics Function**: The Real Statistics Resource Pack supplies the following function:

**SCORREL**(R1, R2) = Spearman’s rho for the data in ranges R1 and R2

For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.

**Observation**: When there are no ties in the ranking, there is alternative way of calculating Spearman’s rho using the following property.

**Property 1**: When there are no ties, Spearman’s rho is equal to

where *d _{i} = rank x_{i} – rank *y

_{i}.**Example 2**: Calculate Spearman’s rho for the data from Example 1 using Property 1.

**Figure 3 – Alternative way of calculating rho for Example 1**

Using Property 1 and the data in Figure 3:

To find out how to test whether Spearman’s correlation coefficient is statistically equal to zero (i.e. to determine whether two samples are independent) click here.

For those who have been asking about how to calculate Spearman’s rho when there are ties, this is how you would do it manually.

1) Rank the values in each of the two groups (call them X and Y) individually, representing tied values by the average of the ranks they would have been assigned had they not been tied.

2) Calculate the average of the ranks for the two groups (X-bar and Y-bar).

3) Calculate (X minus X-bar)*(Y minus Y-bar) for each paired data point and add these values together. Note that some of these products may be negative if one value is above its mean and the other is below it.

4) Divide the total in step 3 by n-1, where n is the number of pairs. This is the covariance of the ranks.

5) Divide the above covariance by the product (Sx * Sy), where Sx and Sy are the sample standard deviations (using n-1) of the ranks of the two groups X and Y.

The result of this division is the Spearman’s rho value. Note that it may not be equal to the value of SR given by the formula with the 6; as a. Matter of fact, if there are extensive ties, there may be quite a difference. But if there are no ties, it is mathematically equivalent to the formula with the 6.

Good luck, everyone!

Hey, I’m running some calculations in Excel and have very strange results: is it possible that for the same dataset Pearson is positive and Spearman is negative?

Yes

Hi All,

If i have such data (Below). How do i calculate the Spear man’s Rank order for the

Statistical Parameter D R A S T I C

Mean 6.5 4.5 7.0 8.0 6.0 8.5 1.5

Std Error 3.5 1.5 3.0 2.0 3.0 1.5 0.5

Median 6.5 4.5 7.0 8.0 6.0 8.5 1.5

Minimum 3.0 3.0 4.0 6.0 3.0 7.0 1.0

Maximum 10.0 6.0 10.0 10.0 9.0 10.0 2.0

SD 4.95 2.12 4.24 2.83 4.24 2.12 0.71

CV (%) 76.15 47.14 60.61 35.36 70.71 24.96 47.14

NB: DRASTIC is a model used for groundwater Vulnerability Assessment.

Any body with an idea kindly assist.

Alex,

I believe that you will need the raw data and not just the summary data that you have provided.

Charles

Hi Charles,

Thanks for your brilliant information.

I have 150 rows of data across 20 variants to compare. I am using ranked data. My plan is to use Spearman Rho, and I need to do the complex method as I have quite a lot of tied data. Your example is very clear and I have this under control.

Can you tell me what is the difference between the Excel CORRELL function, and doing all the calculations needed for Spearman Rho? They do give slightly different answers, but when I looked up the CORRELL function it suggested to me it was doing what I’m doing to get Spearman Rho? Sorry I am probably confused, but would love to know the answer.

Many thanks,

Claire

Claire,

The CORREL function outputs the Pearson’s correlation coefficient. Spearman’s rho is Pearson’s correlation coefficient, not on the original data, but on the ranked data.

Charles

Hi Charles,

Thank you for your explanations.

I need to do a Spearman correlation with correction for multiple testing of 2 variables but instead of getting the rho coeficient of the 2 variables, I need to get rho for each pair of data, i.e., the table should look like this:

gene var1 var2 rho p-value FDR

SFRP1 -0.44 -8.96 -0.0130 0.7130 0.9258

PCDH10 -0.28 -7.21 0.2530 0.3630 0.7256

STMN2 -0.10 -6.76 -0.7050 0.0050 0.0158

LCE3D -0.35 -6.00 0.0590 0.8340 0.9657

LY6K 0.64 -5.64 -0.5970 0.0190 0.6852

By “pair” I mean each individual gene considering the var1 and var2 values.

Any light on how I could do this analysis?

Thank you very much in advance.

Best

Gema

Gemma,

It sounds like you are looking for the correlation matrix using Spearman’s correlation instead of Pearson’s correlation. You can do this by ranking the data (as for Spearman’s correlation). Then create a correlation matrix based on the ranked data. E.g. you can use the Real Statistics CORR function for this.

Charles

Thanks!

I just installed the add-in and I will have a try

Gema

PS. Is there any example of what you suggested in your example data files?

Gema,

I don’t think so, but it is pretty easy to do.

Charles

Hi charles I am wondering because I am doing a likert scale analysis using herzeberge two factor theory, do i use the mean for my central tendency and sd? How about pearson and anova? I am very new at this TIA

Vladimir,

It depends on what you are trying to test.

Charles

Using the Minnesota Satisfaction Questionnaire i want to find out which factor affects the teacher the most. How do i analyze the data? I have seen some works that uses mean and standard deviation is that enough? Are there better test for this? N=15 only. TIA very much appreciated.

Vladimir,

I don’t know which test you are referring to.

The answer depends on how you are doing your evaluation. E.g. if you are using regression, then one way to determine which factors have the largest effect is to use Shapley-Owen.

Charles

thanks =)

hi Charles it’s me again. I am going to compare the job satisfaction and job motivation of new and old teachers. Can I use the T Test for this one. My hypothesis would be there is a relationship between tenure and job motivation and satisfaction. TIA

Vladimir,

If, for example, you can use the t test to compare the job satisfaction scores for teachers with tenure with the job satisfaction scores for those without tenure.

Charles

Yes that is the hypothesis. I have read https://www.researchgate.net/publication/266212127_Five-Point_Likert_Items_t_test_versus_Mann-Whitney-Wilcoxon

now that I am going to use the T test, do i get the total score or mean for each question before doing the test. TIA

Vladimir,

It really depends on what you want to test.

Charles

I was wondering whether it is known as to how/why Spearman’s Rank formula works, rather than how to use it?

Zak,

Spearman’s rank correlation is used because ranked data is normally distributed, and so has some desirable mathematical properties.

Charles

Sorry, I probably wasn’t very good at asking what I meant, I was wondering as to exactly why the formula for Spearman’s Rank itself actually works. Why ρ = 1 − (6 ∑ d^2)/(n ( n 2 − 1 )) is a formula that gives you a correlation between to sets of data and why this digit is between 1 and -1. Thank you, though, for the reply.

Zak,

That formula was popular when people used to do these calculations by hand. Today with computers you don’t need to use this formula, but simply calculate the correlation coefficient of the ranks.

As to why the formula works, you need to do a little algebra to show that this formula is equivalent to calculating the correlation coefficient of the ranks (at least when there are no ties). I haven’t included this algebra because I thought that there wouldn’t be much interest in it.

Charles

What If the given sample gets repeated? How to rank such repeated sample? For example if in rock 45 is repeated twice?

Revathy,

The approach is exactly as explained on the referenced webpage. The RANK.AVG function takes care of ties. If ROCK 45 is repeated then each will have a rank of 9.5.

Charles

Dear Charles,

Thank you for this! I would just like to ask if there is any way this could be done when comparing data across time. Thank you!

Jem,

Can you give me an example?

Charles

Sorry to hijack this, but I would like to do the same. You see, I am comparing data in which the correlation would be lagged eg weather data that effect population further down the line. How would I do that? Also, how does one get a p-value from here?

Chisara,

Please explain what you want to do in more detail. I don’t completely understand. Are you looking for a ranked version of autocorrelation?

Charles

Does ranked autocorrelation allow you to see where the best correlation would occur? I am trying to see if weather affects on earlier generations affect future ones and I need to match up the the variables to see where there is the most correlation.

Chisara,

I still don’t really understand your questions. Why can’t you simply calculate the correlation matrix and see which coefficients are highest?

Charles

Hi

I would like to calculate spearman’s rank order correlation coefficients for 30 variables and then create a matrix to determine which variables are highly correlated with each other. May I have any appropriate example for that?

Yask,

Assuming you are interested in pairwise correlations, I suggest that you calculate the ranks of the data for each of the 30 variables. You can do this as described on the referenced webpage using the RANK.AVG function. Once you have done this you can use the technique shown on the webpage http://www.real-statistics.com/multiple-regression/least-squares-method-multiple-regression/. In particular, you can use the Real Statistics CORR function.

Charles

Hi

I was wondering if this will be still spearman rank order correlation coefficient?

Are you suggesting to use this formuala CORR(R1) = MMULT(TRANSPOSE((R1-R2)/R3),(R1-R2)/R3)/(ROWS(R1)–1)

Yask,

Yes, where R1 contains the ranks of the original data and R2 and R3 are derived from the ranked data.

You can check to see whether this works by using the SCORREL function on any pair of columns in the original data.

Charles

Who to find xi

Sorry, but I don’t understand which xi you are referring to.

Charles

Dear Charles,

First I want to thank you for your efforts on the website. Now my question:compared to using the formula for calculating spearman,using =CORREL (…) is a very easy way to get the same result. Unfortunately, in some rare cases I achieve the same result for spearman rho through both formula and =CORREL. This happens mostly in a sample of 10 respondents. In all other cases I’ve two different results. Meanwhile I checked every step so many times but unfortunately I’m not able to achieve the same outcome through both ways consistently. I hoop my problem is clear.

Thank you.

Lily

Dear Lily,

I don’t completely understand the problem that you are identifying. Can you send me an Excel file with an example? You can send it to my email addres, which is listed on the Contact Us webpage.

Charles

Lily,

Thanks for sending me the Excel file with your data.

The formula (1-6*sum/n(n^2-1)) is only valid when there are no ties. In my example there were no ties; in your example there are ties. Using CORREL on the ranked data will always work. Using SCORREL on the original data will always work.

Charles

Would you mind changing the example above? I often find that rap music is a stand in for race in many circles. Psychology has such a tainted history of manufacturing differences in IQ according to racial lines. I’m sure you didn’t mean to create a racist example, but I think there are so many other scenarios you could use that don’t reinforce the experience of so many people with stereotypes.

Dan,

No offence was intended. I see your point and will change the example shortly.

Charles

Good afternoon,

Thank you for the page, it is really helpful.

I just have comment: how can I deal with missing observations? I mean, I need to calculate a Spearman correlation matrix, and unfortunately I need to deal with missing observations which are different for each of the variables.

Can you please help?

Thanks!

Henry,

When you say that you have a Spearman correlation matrix, are you referring to a square matrix whose elements are pairwise Spearman’s correlation coefficients based on ranks (and not Pearson’s correlation coefficients)? How many variables are there (i.e. what is the dimension of the correlation matrix)? How many subjects are in the sample?

Charles

Yes I have a set of 5 ordinal variables for which I would calculate all the correlations, something like the normal square correlation matrix but using Spearman rho instead.

My problem is that each variable has different missing observations (for different units, so the size of the sample is different for each variable, going from 87 to 98). I am not sure on how to deal with this in excel.

Thanks for your help!

Henry

Henry,

You can calculate the correlation coefficient for each pair (with listwise deletion of missing data for each pair). This will give you C(5,2) = 10 correlation coefficients plus 5 variances. Put these into a 5 x 5 matrix (here 10 x 2 + 5 = 25) and you have a pseudo correlation matrix. This matrix won’t have all the properties of a real correlation matrix, but if you don’t have too much missing data it will be pretty close.

Charles

Good morning, please give me an example of calculating spearman’s rho when having a tie.

Hello Faith,

Just change the first IQ value in Figure 1 of the referenced webpage from 99 to 90 and you will get a tied value. You can do this on the Excel worksheet that you can download from Examples Workbooks.

Charles

Hi, Charles.

Using the Resource Pack, the value Ive got for Spearman for the above unranked data is 0.1244 and NOT -0.115. However, the Spearman value is the same for unranked and ranked dataset in Example 1 of Spearman’s Rank Correlation Hypothesis Testing page (http://www.real-statistics.com/correlation/spearmans-rank-correlation/spearmans-rank-correlation-detailed/). As I understand it, the Resource Pack correlation option does the ranking work for us such that we do not have to rank the dataset by ourselves and we just put the range of the unranked datasets in the correlation menu of the Resource Pack to get the correlation coefficient, is this so? Thanks very much.

by “above unranked data” I mean Figure 1 of example 1. Thanks.

Allan,

Unfortunately, there is an error in the SCORREL function. The correct Spearman’s correlation value is -0.115 and not 0.1244. This error occurs when one of the data elements is zero. I will correct this in the next release, which I will issue in the next day or so.

The Resource Pack does the ranking for you. You don’t need to do it yourself.

Thanks for your comment. Thanks to following up on your comment I was able to discover that the SCORREL has a bug.

Charles

how to find R1 & R2..?

Lukku,

R1 and R2 represent two data ranges. If you want the correlation coefficient corresponding to the data in A1:A10 and B1:B10, you can think of R1 as representing A1:A10 and R2 as representing B1:B10.

Charles

I want to know the correlation of the weighted means of my two set of respondents, the parents and students… I used five-point likert scale. I ranked responses based on the statements given on the two groups of respondents according to the weighted means. Now, my question is how am I going to correlate it? Thank you.

Irene,

You can simply calculate the correlation coefficient using the CORREL function. Now the real questions is how meaningful is this correlation. This depends on what you are trying to accomplish.

Charles

witty

Hello,

I have tried using =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) but Excel really doesn’t like the (R1,R1,1) or (R2,R2,1) bits.

I’ve substituted them for (A2,A12,1) and (B2,B12,1) because my data is in the first two columns, 11 pieces each from rows 2-12. It doesn’t help. I don’t know why there are three terms in the brackets anyway – what is the “1” for at the end?

I’ve also tried to do it the long way round – do a RANK.AVG on each column and then do another column for d and one for d². But I can’t work out what to type after RANK.AVG. The Help function doesn’t really help.

(I know it would be easier to rank by hand, but ultimately I want to enter 120 pairs of data).

Thanks to anyone who can help me…

Madeline,

R1 and R2 are just abbreviations for range 1 and range 2. You can’t enter these letters in the formulas unless you are actually referring to cells R1 and R2.

More importantly, when you enter a range into any Excel formula you need to specify the range by its end points separated by a colon (not a comma). Thus your two ranges are probably A2:A12 and B2:B12. The formula you are probably trying to use if therefore

=CORREL(RANK.AVG(A2:A12,A2:A12,1),RANK.AVG(B2:B12,B2:B12,1))

Charles

Is there any way I can solve two weighted means that do not have the same number of variables using spearman’s?

Phoenix,

Sorry, but I don’t understand what you mean by “solve two weighted means” nor why you would want to use spearman’s to do this.

Charles

Dear Sir

What can I do if i have 3 repetitions.According to your table if student have the same IQ score how can you rank them and how can you calculate?

Best Regard

As described on the referenced webpage, you can calculate Spearman’s rank correlation by using the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

The RANK.AVG function takes care of any ties.

You can also use the Real Statistics SCORREL(R1,R2) function.

Charles

Hi Charles,

I am completing an elective project and am trying to establish if there is a correlation between being a member of a certain organisation and whether or not someone thinks a subject matter should be included in training and development (answer options to this are simply yes and no).

Firstly, is this possible to do? And if so how do I do it?

Stacey

Stacey,

You can calculate the correlation using the CORREL function. It is hard to tell from the summary you have given, but a t test may be sufficient for your purposes.

Charles

Is it possible to use Spearman’s rank correlation between 3 and 5 point likert scale questions?

In my survey I am asking question on attitude and purchase behavior, but both of them have questions in different likert scale. Attitude questions are in 5 point likert scale and purchase behavior question are in 3 point likert scale. So is it possible to use Spearman Rank correlation between these two different likert scale question in testing the Hypothesis?

Yes, you can calculate a Spearman’s rank correlation coefficient from different Likert scales. What hypothesis do you want to test?

Charles

Hi Charles,

what would be an example for the independence case, i.e., r = 0, with ranked values?

I am not sure that I completely understand your question, but you may find the answer on the following webpage:

Hypothesis Testing of Spearman’s Rank Correlation.

Charles

Dear Charles!

Is there any way to find Spearman’s Rank Correlation, when number of variables are not same?

example X has 35 values and Y has 30 values.

Just like Pearson’s correlation, this statistic performs a calculation based on pairs of data elements from X and Y. Thus the X and Y counts need to be the same.

Charles

Charles,

Once again – great website!

How do you conclude with a spearman rho of -0.115 and a Pearson of -0.036 that ‘the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.’? Isn’t Spearman indicating they are more negatively correlated than Pearson?

Thanks for identifying this mistake. I have removed this phrase, which is clearly in error, from the webpage.

Charles

Hello,

This is a very helpful explanation, thank you.

You mention that a different approach is needed with a set of data with many ties, and I hope you can help me out, as I am no mathematician, but trying to analyse data from some language tests. I have 46 participants with three test results each (they have scored between 1-6 in each test). I would like to compare the results in two sets of test results to see if they correalate, but there is a large number of ties – what would you suggest that I do?

Thanks in advance, Tina

Tina,

Spearman’s handles ties by averaging the ranking values; this is the equivalent of RANK.AVG. I haven’t been able to find a ties correction factor for Spearman’s. In any case, Kendall’s tau is usually recommended over Spearman’s and Kendall’s tau does have a ties correction factor as explained on the webpage Kendall’s Tau with Ties.

Charles

Would you please explain..

1. In which order we rank the data? Ascending or descending? Which value is labelled as no 1, highest or lowest???

2. Why it is multiply by 6?

Thanks.

1. As you can see from Figure 1, the largest data gets the highest ranking.

2. I don’t know why the formula has a multiplier of 6. I guess the mathematics just worked out that way.

Charles

Hi Charles,

Here’s the reasoning behind why there is a 6 in the Spearman’s rank formula…it’s actually pretty insightful. If the two rankings are identical in every place, obviously the sum of their squared differences will be 0, the minimum. If the rankings are in reverse order, the sum of squared differences is a maximum, and it can be shown to equal (n*(n^2-1))/3, where n equals the number of pairs. We want a “common” scale to judge these differences by; that is, we do not want this scale to depend on the number of pairs. In addition, we want the scale to range from -1 (perfect negative correlation) to 1 (perfect positive correlation). Therefore, we are looking for a transformation that takes 0 (minimum sum of squares) to 1 and that takes (n*(n^2-1)/3) to -1. If you find the slope of the line connecting the two points (0,1) and (n(n^2-1)/3, -1), you will find that it equals -6/(n(n^2-1)). There’s your 6!

Kevin,

Tanks for your explanations.

Charles

Thanks a lot. I see in method 2 above you have ranked the values with the smallest being assigned position 1. I tried the opposite so that the biggest value is number 1 and was getting the same answer. is that right? I have tried with other sets of data and the answer is the same whatever the ranking used.

Kevin,

I believe that you are correct.

Charles

Yes, this is correct. As long as you are consistent in your rankings of both groups, it doesn’t matter whether 1 corresponds to the highest or lowest value in the group. But if 1 corresponds to the highest value in group 1, it must also correspond to the highest value in group 2, and vice versa.

Thank you, your information was very helpful. I am Civil engineer and my information about statistic is low. I want to use the first paragraph of this page in my thesis our article, I would appreciate if you give me a reference which includes the meaning of first paragraph.

If you want to cite the webpage use the citation at http://www.real-statistics.com/appendix/citation-real-statistics-software-website/

Charles

Hello,

Thank you for your description! I have read that for tied ranks there is a specific formula to use when there are tied ranks. If there aren’t any tied ranks, it would generate the same answer as the formula in the example provided here. However, I’ve seen examples from other sources that use the above formula regardless of whether they are dealing with tied ranks, so I’m a little confused.

So, just to clarify, in the excel formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) can the ranks be tied?

Thank you 🙂

Lynn,

In a number of these nonparametric approaches there are two levels at which ties are handled. The first level is to use RANK.AVG to handle ties. In this case if two data items are tied with rank 6, then the actual rank used for both is 6.5. This is the approach used on the referenced webpage (using the formula you have included in your comment). This approach works well provided there aren’t too many ties.

If there are a lot of ties then in addition a ties correction factor is employed. I have described such a ties correction factor for Kendall’s tau and for Mann-Whitney for example. I have not done this for Spearman’s rho correlation, although the approach is quite similar.

Charles

Many thanks Charles. Your explanation is very helpful.

your explanation is very helpful. It’d better be utilized by excel file to be down loaded. Thank U

Hi Charles,

I am busy writing a MBA thesis and your pages on a Spearman’s and Multiple correlation have been a lifesaver.

Thank you kindly,

Gregg