An alternative to using Fisher’s transformation for one-sample correlation testing is to use resampling techniques, bootstrapping and randomization, as described in Resampling Procedures and Resampling Data Analysis Tool.
Example 1: Repeat Example 5 of One-sample Correlation Hypothesis Testing using bootstrapping. In particular, find the 95% confidence interval for the correlation coefficient of the correlation between Sample 1 and Sample 2 shown in range A4:B11 of Figure 1.
The correlation coefficient for the sample data is CORREL(A5:A11,B5;B11) = .97292 (cell B13). Range A15:B18 shows the results of the analysis using the Fisher transformation. We now explain the rest of Figure 1.
Figure 1 – Bootstrapping of the correlation coefficient
We start by generating 2,000 random sample pairs based on pairs from the original sample and then calculate the correlation coefficient of each of these pairs. Since the original sample consists of 7 pairs, each bootstrap sample should consist of 7 pairs. Since we want to make sure that pairs from the original sample stay together (e.g. if 34 is included in a bootstrap sample then it should be paired with 46 from Sample 1 and not 78 or some other value in Sample 2), we randomly generate 7 numbers between 1 and 7 with replacement and use these values as indices to the two original samples.
Figure 1 shows the first 20 of the 2,000 random bootstrap samples. The first pair of bootstrap samples is shown in ranges M5:S5 and U5:AA5 of Figure 1. The 7 random numbers used to generate this pair is shown in range E5:K5. Each of the cells in this range contain the formula =RANDBETWEEN(1, 7). Cell M5 contains the formula =INDEX($A$5:$A$11,E5) and cell U5 contains the formula =INDEX($B$5:$B$11,E5).
In fact once the formulas in cells E5, M5 and U5 are inserted, all the others can be generated by (1) highlighting range E5:K2004 and pressing Ctrl-R and then Ctrl-D, (2) highlighting range M5:S2004 and pressing Ctrl-R and then Ctrl-D and (3) highlighting range U5:AA2004 and pressing Ctrl-R and then Ctrl-D.
Next we need to calculate the values of the correlation coefficient for each bootstrap sample pair. This can be done by inserting the formula =CORREL(M5:S5,U5:AA5) in cell AC5, highlighting the range AC5:AC2004 and pressing Ctrl-R and Ctrl-D. The result is 2,000 correlation coefficients in range AC5:AC2004. We can now create a frequency table and histogram of these values, as shown in Figure 2.
Figure 2 – Frequency table and histogram
We observe that the bootstrap correlations are not normally distributed. We also observe that the total number of samples (cell AG32) doesn’t come out to 2,000 as expected. This is because one of the pairs of bootstrapped samples has zero standard deviation (since all the sample values are the same) and so the correlation coefficient is undefined.
Since there are 2,000 iterations and α = .05, the lower end of the confidence interval is the 2,000 · .025 = 50th smallest of the bootstrapped correlation coefficients, while the upper end of the confidence interval is the 50th largest of the bootstrapped correlation coefficients. The resulting confidence interval (.894, .999) can be calculated in Excel as shown in Figure 3.
Figure 3 – Confidence interval
Real Statistics Data Analysis Tool: We now show how to perform the above analysis using the Resampling data analysis tool.
Press Ctrl-m and double-click on the Resampling data analysis tool from the menu. Next fill in the dialog box that appears as shown in Figure 4 and click on the OK button.
Figure 4 – Resampling dialog box
The output includes a Frequency Table and Histogram similar to that shown in Figure 2, plus the output shown in Figure 5.
Figure 5 – Bootstrapping for correlation coefficient
The correlation coefficient of the original samples (A5:A11 and B5:B11 of Figure 1) is 0.975292 (cell AQ37). Since the hypothetical population correlation coefficient of .9 is less than .975292, the left-tailed p-value (.024 in cell AQ38) is the count of all bootstrapped correlation coefficients that are less than .9 divided by 2,000 (the total number of bootstrapped correlations generated).
Since .975292 – .9 = .075292, the right-tailed p-value (cell AQ39) is the count of all bootstrapped correlation coefficients that are larger than 0.975292 + .075292 = 1.050584 divided by 2,000. But since 1.050584 > 1, the right-tailed p-value is 0. The two-tailed p-value is the sum of the left and righted tailed, which is .024 + 0 = .024. Since this value is less than α, we conclude that the population correlation coefficient is significantly different from .9, which is different from the conclusion we reached using the Fisher transformation in Example 5 of One-sample Correlation Hypothesis Testing.
Figure 5 also shows the 95% confidence interval of (.901627, .998899) produced by the resampling data analysis tool. The mean of all the bootstrapped correlation coefficients is .972337 (cell AQ44) with standard deviation .039079 (cell AQ45).
Observation: If the Resampling option is chosen instead of the Bootstrapping option, then the hypothesized population correlation coefficient is assumed to be zero and sampling is done without replacement. Each sample pair now consists of all the elements in sample 1 (range A5:A11) in their original order together with the elements in sample 2 (range B5:B11) in random order (equivalent to using the SHUFFLE function).