For samples of any given size *n* it turns out that *r* is not normally distributed when *ρ* ≠ 0 (even when the population has a normal distribution), and so we can’t use Theorem 1 from Correlation Testing via t Test.

There is a simple transformation of *r*, however, that gets around this problem and allows us to test whether *ρ = ρ_{0}* for some value of *ρ_{0}* ≠ 0.

**Definition 1**: For any *r* define the **Fisher transformation** of *r* as follows:

$$r' = \frac{1}{2}\ln\frac{1+r}{1-r}$$

**Theorem 1**: If *x* and *y* have a joint bivariate normal distribution or *n* is sufficiently large, then the Fisher transformation *r′* of the correlation coefficient *r* for samples of size *n* has distribution *N*(*ρ′, s_{r′}*) where *ρ′* is the Fisher transformation of *ρ* and

$$s_{r'} = \frac{1}{\sqrt{n-3}}$$

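**Corollary 1**: Suppose *r_{1}* and *r_{2}* are as in the theorem, where *r_{1}* and *r_{2}* are based on independent samples of sizes *n_{1}* and *n_{2}*, and further suppose that *ρ_{1} = ρ_{2}*. If *z* is defined as follows, then *z* ~ *N*(0, 1).

$$z = \frac{r_1' - r_2'}{s} \quad\text{where}\quad s = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}$$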

Proof: From the theorem

$$r_i' \sim N(\rho_i', s_{r_i'})$$

for *i* = 1, 2. By Property 1 and 2 of Basic Characteristics of the Normal Distribution,

$$r_1' - r_2' \sim N(\rho_1' - \rho_2', s)$$

where *s* is as defined above. Since *ρ_{1} = ρ_{2}*, it follows that *ρ′_{1} = ρ′_{2}*, and so *r′_{1} – r′_{2}* ~ *N*(0, *s*), from which it follows that *z* ~ *N*(0, 1).

**Excel Functions**: Excel provides the following functions that calculate the Fisher transformation and its inverse.

**FISHER**(*r*) = .5 * LN((1 + *r*) / (1 – *r*))

**FISHERINV**(*z*) = (EXP(2 * *z*) – 1) / (EXP(2 * *z*) + 1)
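For readers working outside Excel, the two formulas above can be sketched in Python. This is only an illustrative sketch, not Excel's implementation; the function names `fisher` and `fisher_inv` are hypothetical.

```python
import math

def fisher(r):
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_inv(z):
    """Inverse Fisher transformation: (exp(2z) - 1) / (exp(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

print(round(fisher(0.7), 3))              # 0.867
print(round(fisher_inv(fisher(0.7)), 3))  # 0.7
```

The same quantities are available directly as `math.atanh(r)` and `math.tanh(z)`, since the Fisher transformation is the inverse hyperbolic tangent.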

**Observation**: We can use Theorem 1 to test the null hypothesis H_{0}: *ρ = ρ _{0}*. This test is very sensitive to outliers. If outliers are present it may be better to use the Spearman rank correlation test or Kendall’s tau test.

The corollary can be used to test whether two samples are drawn from populations with equal correlations.
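As a rough illustration of the two-sample case, the sketch below assumes the usual form of the statistic, *z* = (*r′_{1}* – *r′_{2}*) / *s* with *s* = SQRT(1/(*n_{1}* – 3) + 1/(*n_{2}* – 3)), and a two-tailed test. The function name `two_sample_corr_test` and the input values are hypothetical and do not come from any example in this article.

```python
import math
from statistics import NormalDist

def two_sample_corr_test(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 = rho2 for two independent samples."""
    r1_prime = math.atanh(r1)   # Fisher transformation of r1
    r2_prime = math.atanh(r2)   # Fisher transformation of r2
    # Standard deviation of the difference of the two transformed correlations
    s = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (r1_prime - r2_prime) / s
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative inputs only (not taken from the article)
z, p_value = two_sample_corr_test(0.7, 100, 0.6, 100)
print(round(z, 3), round(p_value, 3))
```

If the p-value is below the chosen *α*, we would reject the hypothesis that the two population correlations are equal.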

**Example 1**: Suppose we calculate *r* = .7 for a sample of size *n* = 100. Test the following null hypothesis and find the 95% confidence interval.

H_{0}: *ρ* = .6

Observe that

*r′* = FISHER(*r*) = FISHER(.7) = 0.867

*ρ′* = FISHER(*ρ*) = FISHER(.6) = 0.693

*s_{r′}* = 1 / SQRT(*n* – 3) = 1 / SQRT(100 – 3) = 0.102

Since *r′* > *ρ′* we are looking at the right tail of a two-tail test:

p-value = 2*(1–NORMDIST(*r′, ρ′, s_{r′},* TRUE)) = 2*(1–NORMDIST(.867, .693, .102, TRUE)) = .0863 > 0.05 = *α*

*r′-crit* = NORMINV(1–*α*/2, *ρ′, s_{r′}*) = NORMINV(.975, .693, .102) = .892 > .867 = *r′*

In either case, we cannot reject the null hypothesis.
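The test in Example 1 can also be reproduced outside Excel. The following Python sketch mirrors the NORMDIST and NORMINV calculations using the standard library's `statistics.NormalDist`; the variable names are illustrative, and the inputs (*r* = .7, *ρ* = .6, *n* = 100, *α* = .05) are those of the example.

```python
import math
from statistics import NormalDist

r, rho0, n, alpha = 0.7, 0.6, 100, 0.05

r_prime = math.atanh(r)        # same quantity as FISHER(.7)
rho_prime = math.atanh(rho0)   # same quantity as FISHER(.6)
s = 1 / math.sqrt(n - 3)

# Distribution of r' under H0: normal with mean rho' and standard deviation s
null_dist = NormalDist(mu=rho_prime, sigma=s)

p_value = 2 * (1 - null_dist.cdf(r_prime))   # analog of 2*(1-NORMDIST(...))
r_crit = null_dist.inv_cdf(1 - alpha / 2)    # analog of NORMINV(1-alpha/2, ...)

print(round(p_value, 4), round(r_crit, 3))
print("reject H0" if r_prime > r_crit else "cannot reject H0")
```

Because the sketch works with unrounded intermediate values, small differences in the last digit relative to hand calculations with rounded inputs are to be expected.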

The 95% confidence interval for *ρ′* is

*r′ ± z_{crit} ∙ s_{r′}* = 0.867 ± 1.96 ∙ 0.102 = (0.668, 1.066)

since *z_{crit}* = ABS(NORMSINV(.025)) = 1.96. The 95% confidence interval for *ρ* is therefore (FISHERINV(0.668), FISHERINV(1.066)) = (.584, .788). Note that .6 lies in this interval, confirming our conclusion not to reject the null hypothesis.

**Example 2**: Repeat the analysis of Example 2 of Correlation Testing via t Test using Theorem 1, this time performing a two-tail test (H_{0}: *ρ* = 0) using the standard normal test *z = (r′– ρ′) / s _{r′}*

*r* = CORREL(R1, R2) = .564

*r′* = FISHER(r) = FISHER(.564) = .639

*ρ′* = FISHER(*ρ*) = FISHER(0) = 0 (based on the null hypothesis)

*s_{r′}* = 1 / SQRT(*n* – 3) = .146

*z = (r′ – ρ′) / s _{r′} *= 4.38

Since *z* > 0, we perform the standard normal test on the right tail:

p-value = 1 – NORMSDIST(*z*) = 1 – NORMSDIST(4.38) = 5.9E-06 < 0.025 = *α*/2

*z_{crit}* = NORMSINV(1 – *α*/2) = NORMSINV(.975) = 1.96 < 4.38 = *z_{obs}*

In either case we reject the null hypothesis (H_{0}: *ρ* = 0) and conclude that there is some association between the variables.

We can also calculate the 95% confidence interval for *ρ′ *as follows:

*r′ ± z _{crit} ∙ s_{r′} *= .639 ± (1.96)(.146) = (.353, .925)

Using FISHERINV we transform this interval to a 95% confidence interval for *ρ*:

(FISHERINV(.353), FISHERINV(.925)) = (.339, .728)

Since *ρ* = 0 is outside this interval, once again we reject the null hypothesis.
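In Example 2 the right-tail probability is compared with *α*/2 because the test is two-tailed. An equivalent approach is to double the tail probability and compare it with *α*. The short Python sketch below illustrates that the two comparisons give the same decision; it takes *r* = .564 and *s_{r′}* = .146 from the example, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.05
r_prime = math.atanh(0.564)   # same quantity as FISHER(.564)
s = 0.146                     # value of s_r' reported in Example 2
z = (r_prime - 0) / s         # rho' = FISHER(0) = 0 under H0

right_tail = 1 - NormalDist().cdf(z)   # analog of 1 - NORMSDIST(z)
two_tailed = 2 * right_tail            # two-tailed p-value

# Both comparisons express the same two-tailed decision
print(right_tail < alpha / 2, two_tailed < alpha)   # True True
```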

**Real Statistics Functions**: The following functions are provided in the Real Statistics Resource Pack.

**CorrTest**(*exp, obs, size, tails*) = the p-value of the one-sample test of the correlation coefficient using Theorem 1, where *exp* is the expected population correlation coefficient and *obs* is the observed correlation coefficient based on a sample of the stated *size*. If *tails* = 2 (default) a two-tailed test is employed, while if *tails* = 1 a one-tailed test is employed.

**CorrLower**(*r, size, alpha*) = the lower bound of the 1 – *alpha* confidence interval of the population correlation coefficient based on a sample correlation coefficient *r* for a sample of the stated *size*.

**CorrUpper**(*r, size, alpha*) = the upper bound of the 1 – *alpha* confidence interval of the population correlation coefficient based on a sample correlation coefficient *r* for a sample of the stated *size*.

**CorrelTest**(*r, size, rho, alpha, lab, tails*): array function which outputs *z*, p-value, lower and upper (i.e. lower and upper bound of the 1 – alpha confidence interval), where *rho*, *r* and *size* are as described above. If *lab* = True then output takes the form of a 4 × 2 range with the first column consisting of labels, while if *lab* = False (default) then output takes the form of a 4 × 1 range without labels.

**CorrelTest**(R1, R2, *rho, alpha, lab, tails*) = CorrelTest(*r, size, rho, alpha, lab, tails*) where *r* = CORREL(R1, R2) and *size* = the common sample size, i.e. the number of pairs from R1 and R2 which both contain numeric data.

If *alpha* is omitted it defaults to .05. If *tails* = 2 (default) a two-tailed test is employed, while if *tails* = 1 a one tailed test is employed.
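For readers without the Resource Pack, a function with a similar shape can be sketched in Python. This is a hypothetical analog and not the Real Statistics implementation: the name `correl_test`, the default *rho* = 0, and the way the one-tailed p-value is computed (the tail on the side of the observed *z*) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def correl_test(r, size, rho=0.0, alpha=0.05, tails=2):
    """Return (z, p_value, lower, upper) for H0: population correlation = rho."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    s = 1 / math.sqrt(size - 3)
    r_prime = math.atanh(r)                  # Fisher transformation of r
    z = (r_prime - math.atanh(rho)) / s
    # Assumption: a one-tailed test uses the tail on the side of the observed z
    p_value = tails * (1 - NormalDist().cdf(abs(z)))
    # Two-sided 1 - alpha confidence interval, back-transformed to the rho scale
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.tanh(r_prime - z_crit * s)
    upper = math.tanh(r_prime + z_crit * s)
    return z, p_value, lower, upper

# Inputs of Example 1: r = .7, n = 100, H0: rho = .6
z, p_value, lower, upper = correl_test(0.7, 100, rho=0.6)
print(round(z, 3), round(p_value, 4), round(lower, 3), round(upper, 3))
```

Returning a plain tuple keeps the sketch close to the four-value output described above; labels could be added by returning a dictionary instead.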

**Observation**: For Example 1, CorrTest(.6, .7, 100) = .0864, CorrLower(.7, 100, .05) = .584 and CorrUpper(.7, 100, .05) = .788. Also =CorrelTest(.7, 100, .6, .05, TRUE) generates the corresponding *z*, p-value, lower and upper values in a 4 × 2 range with labels.

**Example 3**: Test whether the correlation coefficient for the data in the ranges K12:K18 and L12:L18 of the worksheet in Figure 1 is significantly different from .9.

**Figure 1 – Hypothesis testing of the correlation coefficient**

We calculate that the correlation coefficient for the two samples is .975 (cell O12) using the formula =CORREL(K12:K18,L12:L18). The two-tailed test is conducted in the range N14:O17 via the array formula =CorrelTest(K12:K18,L12:L18,0.9,0.05,TRUE). Since p-value = .15 > .05 = *α*, we cannot reject the null hypothesis that the data are taken from a population with correlation .9.

How can we handle the case where the hypothesis is rho = 1?

Anuj,

It’s a good question since the Fisher test doesn’t work in this case. The best I can think of at this moment is to test instead for a value just under one; e.g. rho = .99.

Charles

Hey Charles,

For Example 2, why is alpha divided by 2 to determine the significance of the p-value? Is it because of the one-tailed test? In the event of a two-tailed test, would you leave the alpha as is?

Thanks.

After a bit more research, it seems like one-tailed tests use the alpha as is, while the two-tailed tests use alpha/2. If that is the case, you may need to change the way you calculated the z-crit value in Example 2.

David,

I believe that I have calculated the z-crit value correctly, using alpha/2.

Charles