For samples of any given size n it turns out that r is not normally distributed when ρ ≠ 0 (even when the population has a normal distribution), and so we can’t use Theorem 1 from Correlation Testing via t Test.
There is a simple transformation of r, however, that gets around this problem, and allows us to test whether ρ = ρ0 for some value of ρ0 ≠ 0.
Definition 1: For any r define the Fisher transformation of r as follows:
$$r' = \frac{1}{2}\ln\frac{1+r}{1-r}$$
Theorem 1: If x and y have a joint bivariate normal distribution or n is sufficiently large, then the Fisher transformation r′ of the correlation coefficient r for samples of size n has distribution N(ρ′, sr′) where
$$\rho' = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} \qquad s_{r'} = \frac{1}{\sqrt{n-3}}$$
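Corollary 1: Suppose r1 and r2 are as in the theorem, where r1 and r2 are based on independent samples of sizes n1 and n2, and further suppose that ρ1 = ρ2. If z is defined as follows, then z ~ N(0, 1).
$$z = \frac{r_1' - r_2'}{s} \qquad s = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}$$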
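Proof: From the theorem
$$r_i' \sim N\left(\rho_i', \frac{1}{\sqrt{n_i-3}}\right)$$
for i = 1, 2, where n_i is the size of the i-th sample. By Property 1 and 2 of Basic Characteristics of the Normal Distribution,
$$r_1' - r_2' \sim N(\rho_1' - \rho_2', s)$$
where s is as defined above. Since ρ1 = ρ2, it follows that ρ1′ = ρ2′, and so r1′ – r2′ ~ N(0, s), from which it follows that z ~ N(0, 1).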
Excel Functions: Excel provides the following functions that calculate the Fisher transformation and its inverse.
FISHER(r) = .5 * LN((1 + r) / (1 – r))
FISHERINV(z) = (EXP(2 * z) – 1) / (EXP(2 * z) + 1)
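For readers working outside Excel, the two formulas above can be sketched in Python as follows. The function names fisher and fisher_inv are illustrative choices, not part of any library; the bodies simply transcribe the FISHER and FISHERINV formulas.

```python
import math

def fisher(r: float) -> float:
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_inv(z: float) -> float:
    """Inverse Fisher transformation: (e^(2z) - 1) / (e^(2z) + 1)."""
    e = math.exp(2 * z)
    return (e - 1) / (e + 1)

# Round trip: transforming and inverting should return the original r.
r = 0.7
print(round(fisher(r), 3))
print(round(fisher_inv(fisher(r)), 3))
```

These are the same functions as math.atanh and math.tanh, which could be used instead.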
Observation: We can use Theorem 1 to test the null hypothesis H0: ρ = ρ0. This test is very sensitive to outliers. If outliers are present it may be better to use the Spearman rank correlation test or Kendall’s tau test.
The corollary can be used to test whether two samples are drawn from populations with equal correlations.
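A minimal Python sketch of the two-sample test based on Corollary 1 is shown here. The function name compare_correlations and the input values are hypothetical, chosen only for illustration; the standard error is the usual one for the difference of two independent Fisher-transformed correlations, and the test is taken to be two-tailed.

```python
import math
from statistics import NormalDist

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 = rho2 for independent samples.

    Returns (z, p_value). Assumes each sample size is greater than 3.
    """
    # Fisher-transform each sample correlation (atanh is the Fisher transform).
    r1_prime = math.atanh(r1)
    r2_prime = math.atanh(r2)
    # Standard error of the difference of the transformed correlations.
    s = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (r1_prime - r2_prime) / s
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical inputs, not taken from the text.
z, p = compare_correlations(0.5, 40, 0.3, 60)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```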
Example 1: Suppose we calculate r = .7 for a sample of size n = 100. Test the following null hypothesis and find the 95% confidence interval.
H0: ρ = .6
r′ = FISHER(r) = FISHER(.7) = 0.867
ρ′ = FISHER(ρ) = FISHER(.6) = 0.693
sr′ = 1 / SQRT(n – 3) = 1 / SQRT(100 – 3) = 0.102
Since r′ > ρ′ we are looking at the right tail of a two-tail test
p-value = 2*(1–NORMDIST(r′, ρ′, sr′, TRUE)) = 2*(1–NORMDIST(.867, .693, .102, TRUE)) = .0863 > 0.05 = α
r′-crit = NORMINV(1–α/2, ρ′, sr′) = NORMINV(.975, .693, .102) = .892 > .867 = r′
In either case, we cannot reject the null hypothesis.
The 95% confidence interval for ρ′ is
r′ ± zcrit ∙ sr′ = 0.867 ± 1.96 ∙ 0.102 = (0.668, 1.066)
where zcrit = ABS(NORMSINV(.025)) = 1.96. Applying FISHERINV to the endpoints, the 95% confidence interval for ρ is (FISHERINV(0.668), FISHERINV(1.066)) = (.584, .788). Note that .6 lies in this interval, confirming our conclusion not to reject the null hypothesis.
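The calculations in Example 1 can be checked with a short Python sketch. It uses only the standard library (statistics.NormalDist in place of NORMDIST and NORMSINV); the variable names are illustrative.

```python
import math
from statistics import NormalDist

r, rho0, n, alpha = 0.7, 0.6, 100, 0.05

r_prime = math.atanh(r)        # FISHER(r)
rho_prime = math.atanh(rho0)   # FISHER(rho0)
s = 1 / math.sqrt(n - 3)       # standard error of r'

# Two-tailed p-value for H0: rho = rho0.
z = (r_prime - rho_prime) / s
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for rho: build it for rho', then invert.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower = math.tanh(r_prime - z_crit * s)   # FISHERINV of the lower endpoint
upper = math.tanh(r_prime + z_crit * s)   # FISHERINV of the upper endpoint

print(f"p-value = {p_value:.4f}")
print(f"95% CI for rho = ({lower:.3f}, {upper:.3f})")
```

Because the sketch carries full precision rather than the rounded intermediate values shown above, the last digit of the p-value may differ slightly.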
Example 2: Repeat the analysis of Example 2 of Correlation Testing via t Test using Theorem 1, this time performing a two-tail test (H0: ρ = 0) using the standard normal test z = (r′– ρ′) / sr′
r = CORREL(R1, R2) = .564
r′ = FISHER(r) = FISHER(.564) = .639
ρ′ = FISHER(ρ) = FISHER(0) = 0 (based on the null hypothesis)
sr′ = 1 / SQRT(n – 3) = .146
z = (r′ – ρ′) / sr′ = 4.38
Since z > 0, we perform the standard normal test on the right tail:
p-value = 1 – NORMSDIST(z) = 1 – NORMSDIST(4.38) = 5.9E-06 < 0.025 = α/2
zcrit = NORMSINV(1 – α/2) = NORMSINV(.975) = 1.96 < 4.38 = zobs
In either case we reject the null hypothesis (H0: ρ = 0) and conclude that there is some association between the variables.
We can also calculate the 95% confidence interval for ρ′ as follows:
r′ ± zcrit ∙ sr′ = .639 ± (1.96)(.146) = (.353, .925)
Using FISHERINV we transform this interval to a 95% confidence interval for ρ:
(FISHERINV(.353), FISHERINV(.925)) = (.339, .728)
Since ρ = 0 is outside this interval, once again we reject the null hypothesis.
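Example 2 can be reproduced along the same lines in Python. Note one assumption: the sample size is not stated in this section, so n = 50 is inferred here from sr′ = .146 (since 1 / SQRT(50 – 3) = .146). The helper name fisher_ci is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, alpha=0.05):
    """1 - alpha confidence interval for rho via the Fisher transformation."""
    r_prime = math.atanh(r)
    s = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(r_prime - z_crit * s), math.tanh(r_prime + z_crit * s)

r = 0.564
n = 50   # assumption: inferred from s_r' = .146, not given in the text

# Under H0: rho = 0 we have rho' = 0, so z = r' / s_r'.
z = math.atanh(r) * math.sqrt(n - 3)
p_right_tail = 1 - NormalDist().cdf(z)   # compared against alpha / 2

lower, upper = fisher_ci(r, n)
print(f"z = {z:.2f}, right-tail p-value = {p_right_tail:.1e}")
print(f"95% CI for rho = ({lower:.3f}, {upper:.3f})")
```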
Real Statistics Functions: The following functions are provided in the Real Statistics Resource Pack.
CorrTest(exp, obs, size, tails) = the p-value of the one-sample test of the correlation coefficient using Theorem 1 where exp is the expected population correlation coefficient and obs is the observed correlation coefficient based on a sample of the stated size. If tails = 2 (default) a two-tailed test is employed, while if tails = 1 a one-tailed test is employed.
CorrLower(r, size, alpha) = the lower bound of the 1 – alpha confidence interval of the population correlation coefficient based on a sample correlation coefficient r for a sample of the stated size.
CorrUpper(r, size, alpha) = the upper bound of the 1 – alpha confidence interval of the population correlation coefficient based on a sample correlation coefficient r for a sample of the stated size.
CorrelTest(r, size, rho, alpha, lab, tails): array function which outputs z, p-value, lower and upper (i.e. lower and upper bound of the 1 – alpha confidence interval), where rho, r and size are as described above. If lab = True then output takes the form of a 4 × 2 range with the first column consisting of labels, while if lab = False (default) then output takes the form of a 4 × 1 range without labels.
CorrelTest(R1, R2, rho, alpha, lab, tails) = CorrelTest(r, size, rho, alpha, lab, tails) where r = CORREL(R1, R2) and size = the common sample size, i.e. the number of pairs from R1 and R2 which both contain numeric data.
If alpha is omitted it defaults to .05. If tails = 2 (default) a two-tailed test is employed, while if tails = 1 a one tailed test is employed.
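For readers without the Real Statistics Resource Pack, rough Python analogues of the first three functions might look like the following. This is a sketch based on Theorem 1, not the Resource Pack's actual implementation; the names corr_test, corr_lower and corr_upper are hypothetical, and the one-tailed convention (the tail in the direction of the observed difference) is an assumption.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def corr_test(exp, obs, size, tails=2):
    """p-value for H0: rho = exp, given observed correlation obs."""
    s = 1 / math.sqrt(size - 3)
    z = (math.atanh(obs) - math.atanh(exp)) / s
    # Assumed convention: one-tailed p is half the two-tailed p.
    return tails * (1 - _STD_NORMAL.cdf(abs(z)))

def corr_lower(r, size, alpha=0.05):
    """Lower bound of the 1 - alpha confidence interval for rho."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.tanh(math.atanh(r) - z_crit / math.sqrt(size - 3))

def corr_upper(r, size, alpha=0.05):
    """Upper bound of the 1 - alpha confidence interval for rho."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.tanh(math.atanh(r) + z_crit / math.sqrt(size - 3))

# Inputs from Example 1.
print(round(corr_test(0.6, 0.7, 100), 3))
print(round(corr_lower(0.7, 100), 3), round(corr_upper(0.7, 100), 3))
```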
Observation: For Example 1, CorrTest(.6, .7, 100) = .0864, CorrLower(.7, 100, .05) = .584 and CorrUpper(.7, 100, .05) = .788. Also =CorrelTest(.7, 100, .6, .05, TRUE) generates the following output:
Figure 1 – Hypothesis testing of the correlation coefficient
The correlation coefficient for the two samples is .975 (cell O12), calculated using the formula =CORREL(K12:K18,L12:L18). The two-tailed test is conducted in the range N14:O17 via the array formula =CorrelTest(K12:K18,L12:L18,0.9,0.05,TRUE). Since p-value = .15 > .05 = α, we cannot reject the null hypothesis that the data is taken from a population with correlation .9.