Correlation testing via Fisher transformation

For samples of any given size n it turns out that r is not normally distributed when ρ ≠ 0 (even when the population has a normal distribution), and so we can’t use Theorem 1 from Correlation Testing via t Test.

There is a simple transformation of r, however, that gets around this problem, and allows us to test whether ρ = ρ0 for some value of ρ0 ≠ 0.

Definition 1: For any r define the Fisher transformation of r as follows:

\[ r' = \frac{1}{2} \ln \frac{1 + r}{1 - r} \]

Theorem 1: If x and y have a joint bivariate normal distribution or n is sufficiently large, then the Fisher transformation r′ of the correlation coefficient r for samples of size n has distribution N(ρ′, sr′) where ρ′ is the Fisher transformation of ρ and

\[ s_{r'} = \frac{1}{\sqrt{n - 3}} \]

Corollary 1: Suppose r1 and r2 are as in the theorem where r1 and r2 are based on independent samples and further suppose that ρ1 = ρ2. If z is defined as follows, then z ~ N(0, 1).

\[ z = \frac{r'_1 - r'_2}{s} \]

where

\[ s = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} \]

Proof: From the theorem

\[ r'_i \sim N(\rho'_i, s_{r'_i}) \]

for i = 1, 2. By Property 1 and 2 of Basic Characteristics of the Normal Distribution,

\[ r'_1 - r'_2 \sim N(\rho'_1 - \rho'_2, s) \]

where s is as defined above. Since ρ1 = ρ2, it follows that ρ′1 = ρ′2, and so r′1 − r′2 ~ N(0, s), from which it follows that z ~ N(0, 1).

Excel Functions: Excel provides the following functions that calculate the Fisher transformation and its inverse.

FISHER(r) = .5 * LN((1 + r) / (1 - r))

FISHERINV(z) = (EXP(2 * z) - 1) / (EXP(2 * z) + 1)
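For readers working outside Excel, the two formulas can be sketched in Python as follows. The function names fisher and fisher_inv are chosen here for illustration and are not part of any library; the formulas themselves are the ones given for FISHER and FISHERINV.

```python
import math


def fisher(r: float) -> float:
    """Fisher transformation 0.5 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_inv(z: float) -> float:
    """Inverse Fisher transformation (e^(2z) - 1) / (e^(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)


# Transforming and then inverting should return the original value.
r = 0.7
r_prime = fisher(r)
print(round(r_prime, 3), round(fisher_inv(r_prime), 3))
```

These two functions coincide with the inverse hyperbolic tangent and the hyperbolic tangent, so math.atanh and math.tanh can be used in their place.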

Observation: We can use Theorem 1 to test the null hypothesis H0: ρ = ρ0. This test is very sensitive to outliers. If outliers are present it may be better to use the Spearman rank correlation test or Kendall’s tau test.

The corollary can be used to test whether two samples are drawn from populations with equal correlations.
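As a rough illustration of the two-sample test, the following Python sketch assumes the usual form of the statistic, z = (r1′ − r2′) / s with s = SQRT(1/(n1 − 3) + 1/(n2 − 3)). The function name two_sample_corr_test and the input values are hypothetical and are not taken from the text.

```python
import math
from statistics import NormalDist


def two_sample_corr_test(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed z test of H0: rho1 = rho2 for two independent samples.

    Returns the z statistic and the two-tailed p-value.
    """
    # Standard deviation of the difference of the two Fisher-transformed values
    s = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    # math.atanh is the Fisher transformation
    z = (math.atanh(r1) - math.atanh(r2)) / s
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample correlations and sample sizes, for illustration only
z, p = two_sample_corr_test(0.7, 100, 0.6, 80)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

We reject the null hypothesis of equal population correlations when the p-value is less than α.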

Example 1: Suppose we calculate r = .7 for a sample of size n = 100. Test the following null hypothesis and find the 95% confidence interval.

H0: ρ = .6

Observe that

r′ = FISHER(r) = FISHER(.7) = 0.867

ρ′ = FISHER(ρ) = FISHER(.6) = 0.693

sr′ = 1 / SQRT(n – 3) = 1 / SQRT(100 – 3) = 0.102

Since r′ > ρ′ we are looking at the right tail of a two-tail test

p-value = 2*(1–NORMDIST(r′, ρ′, sr′, TRUE)) = 2*(1–NORMDIST(.867, .693, .102, TRUE)) = .0863 > 0.05 = α

r′-crit = NORMINV(1–α/2, ρ′, sr′) = NORMINV(.975, .693, .102) = .892 > .867 = r′

In either case, we cannot reject the null hypothesis.

The 95% confidence interval for ρ′ is

r′ ± zcrit ∙ sr′ = 0.867 ± 1.96 ∙ 0.102 = (0.668, 1.066)

where zcrit = ABS(NORMSINV(.025)) = 1.96. Applying FISHERINV to the endpoints, the 95% confidence interval for ρ is (FISHERINV(0.668), FISHERINV(1.066)) = (.584, .788). Note that .6 lies in this interval, confirming our conclusion not to reject the null hypothesis.
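The calculations of Example 1 can be reproduced outside Excel. The following is a minimal Python sketch using only the standard library; the variable names are chosen here for illustration, with math.atanh and math.tanh standing in for FISHER and FISHERINV and NormalDist standing in for the NORMDIST and NORMSINV functions.

```python
import math
from statistics import NormalDist

r, rho0, n, alpha = 0.7, 0.6, 100, 0.05

r_prime = math.atanh(r)        # Fisher transformation of the sample correlation
rho_prime = math.atanh(rho0)   # Fisher transformation of the hypothesized correlation
s_r = 1 / math.sqrt(n - 3)     # standard deviation of r' from Theorem 1

# Two-tailed p-value based on the distance of r' from rho'
z = (r_prime - rho_prime) / s_r
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Confidence interval for rho', then transformed back to an interval for rho
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower = math.tanh(r_prime - z_crit * s_r)
upper = math.tanh(r_prime + z_crit * s_r)

print(f"p-value = {p_value:.4f}")
print(f"95% confidence interval for rho = ({lower:.3f}, {upper:.3f})")
```

Up to rounding, the results should agree with the p-value and the confidence interval found above.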

Example 2: Repeat the analysis of Example 2 of Correlation Testing via t Test using Theorem 1, this time performing a two-tail test (H0: ρ = 0) using the standard normal test z = (r′– ρ′) / sr′

r = CORREL(R1, R2) = .564

r′ = FISHER(r) = FISHER(.564) = .639

ρ′ = FISHER(ρ) = FISHER(0) = 0 (based on the null hypothesis)

sr′ = 1 / SQRT(n – 3) = .146

z = (r′ – ρ′) / sr′ = 4.38

Since z > 0, we perform the standard normal test on the right tail:

p-value = 1 – NORMSDIST(z) = 1 – NORMSDIST(4.38) = 5.9E-06 < 0.025 = α/2

zcrit = NORMSINV(1 – α/2) = NORMSINV(.975) = 1.96 < 4.38 = zobs

In either case we reject the null hypothesis (H0: ρ = 0) and conclude that there is some association between the variables.
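Comparing the right-tail probability with α/2 is equivalent to comparing the two-tailed p-value with α. The following Python sketch checks this for the observed statistic, taking z = 4.38 from the calculation above; the variable names are illustrative only.

```python
from statistics import NormalDist

z_obs, alpha = 4.38, 0.05

# Probability in the right tail beyond the observed z
p_right_tail = 1 - NormalDist().cdf(z_obs)
# Two-tailed p-value doubles the single-tail probability
p_two_tail = 2 * p_right_tail

# Critical value for a two-tailed test at significance level alpha
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"right-tail p = {p_right_tail:.2e}, two-tail p = {p_two_tail:.2e}")
print("reject H0:", p_two_tail < alpha)
```

All three comparisons (right-tail probability against α/2, two-tailed p-value against α, and z against zcrit) lead to the same decision.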

We can also calculate the 95% confidence interval for ρ′ as follows:

r′ ± zcrit ∙ sr′ = .639 ± (1.96)(.146) = (.353, .925)

Using FISHERINV we transform this interval to a 95% confidence interval for ρ:

(FISHERINV(.353), FISHERINV(.925)) = (.339, .728)

Since ρ = 0 is outside this interval, once again we reject the null hypothesis.

Real Statistics Functions: The following functions are provided in the Real Statistics Resource Pack.

CorrTest(exp, obs, size, tails) = the p-value of the one-sample test of the correlation coefficient using Theorem 1, where exp is the expected population correlation coefficient and obs is the observed correlation coefficient based on a sample of the stated size. If tails = 2 (default) a two-tailed test is employed, while if tails = 1 a one-tailed test is employed.

CorrLower(r, size, alpha) = the lower bound of the 1 – alpha confidence interval of the population correlation coefficient based on a sample correlation coefficient r for a sample of the stated size.

CorrUpper(r, size, alpha) = the upper bound of the 1 – alpha confidence interval of the population correlation coefficient based on a sample correlation coefficient r for a sample of the stated size.

CorrelTest(r, size, rho, alpha, lab, tails): array function which outputs z, p-value, lower and upper (i.e. lower and upper bound of the 1 – alpha confidence interval), where rho, r and size are as described above. If lab = True then output takes the form of a 4 × 2 range with the first column consisting of labels, while if lab = False (default) then output takes the form of a 4 × 1 range without labels.

CorrelTest(R1, R2, rho, alpha, lab, tails) = CorrelTest(r, size, rho, alpha, lab, tails) where r = CORREL(R1, R2) and size = the common sample size, i.e. the number of pairs from R1 and R2 which both contain numeric data.

If alpha is omitted it defaults to .05. If tails = 2 (default) a two-tailed test is employed, while if tails = 1 a one tailed test is employed.

Observation: For Example 1, CorrTest(.6, .7, 100) = .0864, CorrLower(.7, 100, .05) = .584 and CorrUpper(.7, 100, .05) = .788. Also =CorrelTest(.7, 100, .6, .05, TRUE) generates the following output:

[Figure: CorrelTest function output]

Example 3: Test whether the correlation coefficient for the data in the ranges K12:K18 and L12:L18 of the worksheet in Figure 1 is significantly different from .9.

Figure 1 – Hypothesis testing of the correlation coefficient

We calculate that the correlation coefficient for the two samples is .975 (cell O12) using the formula =CORREL(K12:K18,L12:L18). The two-tailed test is conducted in the range N14:O17 via the array formula =CorrelTest(K12:K18,L12:L18,0.9,0.05,TRUE). Since p-value = .15 > .05 = α, we cannot reject the null hypothesis that the data is taken from a population with correlation .9.
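For comparison, a test of this kind can be sketched in Python. The function correl_test below is a hypothetical analogue written for illustration and is not the Real Statistics implementation. It assumes the test is based on Theorem 1 and that the confidence interval is always two-sided, and it takes r = .975 and n = 7 from the example rather than the worksheet data.

```python
import math
from statistics import NormalDist


def correl_test(r: float, size: int, rho: float = 0.0,
                alpha: float = 0.05, tails: int = 2) -> dict:
    """Return z, p-value and the 1 - alpha confidence bounds for rho.

    Assumption: the confidence interval is two-sided regardless of tails.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    s_r = 1 / math.sqrt(size - 3)           # standard deviation of r'
    r_prime = math.atanh(r)                 # Fisher transformation of r
    z = (r_prime - math.atanh(rho)) / s_r
    p_value = tails * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "z": z,
        "p-value": p_value,
        "lower": math.tanh(r_prime - z_crit * s_r),
        "upper": math.tanh(r_prime + z_crit * s_r),
    }


# Example 3: r = .975 from n = 7 pairs, tested against rho = .9
for label, value in correl_test(0.975, 7, rho=0.9).items():
    print(f"{label}: {value:.3f}")
```

Because r is entered here rounded to three decimal places, the output may differ slightly from the worksheet values.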

3 Responses to Correlation testing via Fisher transformation

  1. David says:

    Hey Charles,

    For Example 2, why is alpha divided by 2 to determine the significance of the p-value? Is it because of the one-tailed test? In the event of a two-tailed test, would you leave the alpha as is?

    Thanks.

    • David says:

      After a bit more research, it seems like one-tailed tests use the alpha as is, while the two-tailed tests use alpha/2. If that is the case, you may need to change the way you calculated the z-crit value in Example 2.
