When the conditions for Pearson’s chi-square test are not met, especially when one of more of the cells have *exp _{i}* < 5, an alternative approach with 2 × 2 contingency tables is to use

**Fisher’s exact test**. Since this method is more computationally intense, it is best used for smaller samples.

**Example 1**: Repeat Example 2 from Independence Testing using the data in range A5:D8 of Figure 1; i.e. determine whether the cure rate is independent of the therapy used.

As you can see from Figure 1, the expectation for two of the cells is less than 5. Since we are dealing with a 2 × 2 contingency table with relatively small sample size, it is better to use Fisher’s exact test.

The approach is to determine how many different ways the above marginal frequencies can be achieved and then determine the probability that the above observed cell configuration can be obtained merely by chance.

We can restrict our attention to any one of the cells since once the frequency for one cell is determined the frequencies for the other cells can be determined from the marginal totals. We choose cell B6 since it has the smallest marginal total (namely 9 in cell D6) and it is smaller than the other element that makes up this marginal total (namely 7 in cell C6).

Now cell B6 can take any value between 0 and 9; once this value is set the values of the other three cells can be adjusted to maintain the marginal totals.

The probability that cell B6 takes on a specific value *x* is equivalent to the probability of getting *x* successes in a sample of size 9 (cell D6) taken without replacement from a population of size 21 (cell D8) which contains 11 (cell B8) successful choices. This can be calculated by the hypergeometric distribution. Here cells D6 and B8 are cells with the marginal totals corresponding to cell B6 and cell D8 contains the grand total.

Figure 2 contains a table of the probabilities for each possible value of *x*.

**Figure 2 – Fisher exact test for Example 1**

Thus, e.g., cell L11 contains the formula

=HYPGEOMDIST(K11,$B$8,$D$6,$D$8)

Our test consists of determining whether the probability that at most 2 of those taking therapy 1 are cured (the observed count in cell B6) is less than .05. From Figure 2, we see that the probability of count 0 is 3.4E-05, the probability of count 1 is .001684 and the probability of count 2 is .022454 for a cumulative probability of .024172 < .05 = *α*, and so we reject the null hypothesis and conclude there is a significant difference between the cure rates for the two therapies.

There are one-tail and two-tail versions of the test. The p-value for the one tail test (cell L17) is given by the formula =SUM(L6:L8) or equivalently (for versons of Excel starting with Excel 2010)

= HYPGEOM.DIST(K8,B8,D6,D8,TRUE)

The p-value for the two tail test (cell L18) given by the formula

=SUM(L6:L8)+SUM(L14:L15)

where K14 is the leftmost cell in the right tail that has a pdf value ≤ L8 (since .005614 ≤ .022454, but .050522 > .022454). Equivalently, we can use the formula (for versons of Excel starting with Excel 2010)

= HYPGEOM.DIST(K8,B8,D6,D8,TRUE)+1−HYPGEOM.DIST(K13,B8,D6,D8,TRUE)

**Real Statistics Excel Function**: The following function is provided in the Real Statistics Resource Pack:

**FISHERTEST**(R1, *t*) = the probability calculated by the Fisher exact test for the 2 × 2 contingency table contained in range R1 where *t* = the number of tails = 1 (one-tail) or 2 (two tail).

The range R1 must contain only numeric values.

For Example 1, FISHERTEST(B6:C7,1) = .024172 and FISHERTEST(B6:C7, 2) = .029973.

I downloaded the Real Statistics Resource Pack, but I an unable to use perform a fisher test as advertised. It is not in the list of functions when I open Realstats, and it doesn’t work when I try entering the function directly into the cell either (=FISHERTEST(R1,t)). when I gry this I get #VALUE!. I am selecting only the 2×2 table for my data range and it is definitely only numeric data. Other functions work ( I tested the chi square function). FYI I have the 2007 version of excel.

Any suggestions?

Thank you.

Sir

It is very creative to use hypergometric distribution to solve this problem, you are very cool! But I think there is a typo in the following paragraph. Maybe you should change “with replacement” to “without replacement”. Is that right?

“ The probability that cell B6 takes on a specific value is equivalent to the probability of getting successes in a sample of size 9 (cell D6) taken with replacement from a population of size 21 (cell D8) which contains 11 (cell B8) successful choices. This can be calculated by the hypergeometric distribution. ”

Colin,

Another good catch. I have made the change that you have suggested.

Charles

Sir

You wrote : “The probability that cell B6 takes on a specific value is equivalent to the probability of getting successes in a sample of size 9 (cell D6) taken with replacement from a population of size 21 (cell D8) which contains 11 (cell B8) successful choices. This can be calculated by the hypergeometric distribution. ”

According to above, should the Excel formula “=HYPGEOMDIST(K11,$B$8,$D$6,$D$8)” change to “=HYPGEOMDIST(K11,$D$6,$B$8,$D$8)”

Sir

I’ve checked the Excel Workbook you provided. The formula in this post is a typo.

Colin,

Use of your formulas will yield the same result.

Charles

Sir

You should change ” .023172 < .05 = α" to "0.024172", and "FISHERTEST(B6:C7,1) = .023172" should be "0.024172".

BTW, I cannot understand the two-tailed test formula "=2*MIN(SUM(L6:L8),SUM(L8:L15),.5)."

Colin,

Thanks for catching another error. I have now corrected the typo. The two-tailed test is just twice the one-tailed test; the .5 entry is there just in case in order to make sure that the probability value is not larger than 1.

Charles

Thank you so much for your help and your brilliant add-in!