We show how to calculate Spearman’s rho and how to use it for hypothesis testing.
Example 1: The table in Figure 1 associates the IQ of each adolescent in a sample with the number of hours they listen to rap music per month. Determine the strength of the correlation between IQ and rap music using both the Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.
To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rap scores. E.g. the rank of the first IQ score (cell A4 in Figure E3.1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the supplemental function RANK_AVG instead of RANK.AVG (as explained in Ranking).
We now calculate both correlation coefficients as follows:
Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036
Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115
The above analysis shows there isn’t much of a correlation between IQ and listening to rap music based on the sample, although the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.
Observation: For sufficiently high values of (some say n > 10), Theorem 1 and 2 of One Sample Hypothesis Testing for Correlation can now be used for hypothesis testing using the Spearman’s rank coefficient instead of the correlation coefficient. For lower values, the table of critical values found in Spearman’s Rho Table can be used
When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:
Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to the Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.
If there are clear differences then you will need to be cautious about how you treat the outliers.
Similarly if you test the sample data for the x and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use the Spearman’s coefficient rather than Pearson’s.
Figure 2 displays a scatter diagram for the data in Example 1 We see that the data is pretty randomly scattered although there is a potential outlier where the rap music listening spikes to 45. This gives some evidence that the Spearman’s rho might be a better choice.
Observation: When there are no ties in the ranking, there is alternative way of calculating Spearman’s rho using the following property.
Property 1: When there are no ties, Spearman’s rho =
where di = rank xi – rank yi.
Example 2: Calculate Spearman’s rho for the data from Example 1 using Property 1.
Figure 3 – Alternative way of calculating rho for Example 1
Using Property 1 and the table in Figure 3:
Example 3: Repeat the analysis for Example 1 of One Sample Hypothesis Testing for Correlation using Spearman’s rho.
As for Example 1, Spearman’s rho is the correlation coefficient on the ranked data, namely CORREL(C5:C19,D5:D19) = -.674 (see Figure 4).
We now use the table in Spearman’s Rho Table to find the critical value for the two-tail test where n = 15 and α = .05. Interpolating between the values for n = 14 and 16, we get a critical value of .525. Since the absolute value of rho is larger than the critical value, we reject the null hypothesis that there is no correlation between cigarette smoking and longevity.
Since n = 15 ≥ 10, we can use a t-test instead of the table. By Theorem 1 of One Sample Hypothesis Testing for Correlation, we use the test statistic
Since |t| = 3.29 > 2.16 = tcrit = TINV(.05,13), we again conclude that there is a significant negative correlation between the number of cigarettes smoked and longevity.
Observation: To conduct a one-tail test, use the table in Spearman’s Rho Table with α divided by 2.
Real Statistics Excel Function: The following function is provided in the Real Statistics Resource Pack:
RhoCRIT(n, α, t) = the critical value of the Spearman’s rho test for samples of size n, for the given value of alpha, and t = 1 (one tail) or 2 (two tails).