Definition 1: Under the same assumptions as for the binomial distribution, from a population of size m of which k are successes, a sample of size n is drawn. Let x be a random variable whose value is the number of successes in the sample. The probability density function (pdf) for x, called the hypergeometric distribution, is given by
$$f(x) = \frac{C(k, x)\, C(m-k,\, n-x)}{C(m, n)}$$
where C(a, b) denotes the number of ways of choosing b items from a.
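As a minimal sketch, the pdf can be computed directly from binomial coefficients as C(k, x)·C(m − k, n − x)/C(m, n). The function name hypergeom_pmf is hypothetical (it is not part of Excel or the Real Statistics Resource Pack), and the code assumes Python 3.8 or later for math.comb.

```python
from math import comb

def hypergeom_pmf(x, n, k, m):
    """Probability of exactly x successes in a sample of size n drawn
    without replacement from a population of size m with k successes."""
    # Values of x outside the feasible range have probability zero.
    if x < max(0, n - (m - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(m - k, n - x) / comb(m, n)

# The probabilities over all possible x should sum to 1.
total = sum(hypergeom_pmf(x, 3, 4, 12) for x in range(4))
```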
Observations: Let p = k/m. Then the situation is the same as for the binomial distribution B(n, p) except that in the binomial case after each trial the selection (whether success or failure) is put back in the population, while in the hypergeometric case the selection is not put back and so can’t be drawn again. When the population size m is large relative to the sample size n, the hypergeometric and binomial distributions yield more or less the same result, but this is not necessarily true when the sample makes up a sizeable fraction of the population.
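The following sketch illustrates how the two distributions compare for the same p = k/m. It measures the largest gap between the hypergeometric and binomial probabilities for a small population and for a population 1,000 times larger. The helper names (hypergeom_pmf, binom_pmf, max_gap) and the chosen parameter values are assumptions made for illustration only.

```python
from math import comb

def hypergeom_pmf(x, n, k, m):
    # Sampling without replacement (valid here for 0 <= x <= n <= k).
    return comb(k, x) * comb(m - k, n - x) / comb(m, n)

def binom_pmf(x, n, p):
    # Sampling with replacement.
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_gap(n, k, m):
    """Largest absolute difference between the two pmfs over x = 0..n."""
    p = k / m
    return max(abs(hypergeom_pmf(x, n, k, m) - binom_pmf(x, n, p))
               for x in range(n + 1))

small = max_gap(3, 4, 12)        # sample is a quarter of the population
large = max_gap(3, 4000, 12000)  # same p, much larger population
print(small, large)
```

With these values the gap for the small population is a few percentage points, while for the large population it is negligible.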
Excel Functions: Excel provides the following function:
HYPGEOMDIST(x, n, k, m) = the probability of getting x successes from a sample of size n, where the population has size m of which k are successes; i.e. the pdf of the hypergeometric distribution.
Excel 2010/2013 provide the following additional function: HYPGEOM.DIST(x, n, k, m, cum) where cum takes the value TRUE or FALSE. HYPGEOM.DIST(x, n, k, m, FALSE) = HYPGEOMDIST(x, n, k, m), while HYPGEOM.DIST(x, n, k, m, TRUE) = the probability of getting at most x successes from a sample of size n, where the population has size m of which k are successes; i.e. the cumulative probability function.
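For readers working outside Excel, the behavior described above can be mirrored in a few lines. This is only a sketch of the stated semantics, not Excel's implementation; the name hypgeom_dist is hypothetical, and the argument order simply follows HYPGEOM.DIST(x, n, k, m, cum).

```python
from math import comb

def hypgeom_dist(x, n, k, m, cum):
    """Pdf (cum=False) or cumulative probability (cum=True) of x successes
    in a sample of size n from a population of size m with k successes."""
    def pmf(j):
        # Infeasible counts have probability zero.
        if j < max(0, n - (m - k)) or j > min(n, k):
            return 0.0
        return comb(k, j) * comb(m - k, n - j) / comb(m, n)

    if cum:
        # Probability of at most x successes.
        return sum(pmf(j) for j in range(x + 1))
    return pmf(x)

at_most_one = hypgeom_dist(1, 3, 4, 12, True)
exactly_two = hypgeom_dist(2, 3, 4, 12, False)
```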
Real Statistics Function: Excel doesn’t provide a worksheet function for the inverse of the hypergeometric distribution. Instead you can use the following function provided by the Real Statistics Resource Pack.
HYPGEOM_INV(p, n, k, m) = smallest integer x such that HYPGEOM.DIST(x, n, k, m, TRUE) ≥ p.
Note that the maximum value of x is n. A value higher than this (namely n+1) indicates an error. This function is only available for users of Excel 2010 or later.
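The definition of the inverse can be sketched as a simple search over x = 0, 1, ..., n. This is not the Real Statistics implementation, whose internals are not described here; the name hypgeom_inv is hypothetical, and the choice to return n + 1 when no x satisfies the condition (for example when p > 1) is an assumption based on the note above. Exact fractions are used so that rounding cannot push the cumulative total just below p.

```python
from fractions import Fraction
from math import comb

def hypgeom_inv(p, n, k, m):
    """Smallest integer x whose cumulative hypergeometric probability is >= p.
    Returns n + 1 if no such x exists (assumed error signal)."""
    total = Fraction(0)
    denom = comb(m, n)
    for x in range(n + 1):
        # Add the exact probability of x successes when x is feasible.
        if max(0, n - (m - k)) <= x <= min(n, k):
            total += Fraction(comb(k, x) * comb(m - k, n - x), denom)
        if total >= p:
            return x
    return n + 1

median_like = hypgeom_inv(0.5, 3, 4, 12)
```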
Property 1: The mean of the hypergeometric distribution given above is np where p = k/m.
Proof: Let x_i be the random variable such that x_i = 1 if the ith sample drawn is a success and 0 if it is a failure. Since the mean of each x_i is p and $x = \sum_{i=1}^{n} x_i$, it follows by Property 1 of Expectation that
$$E[x] = E\left[\sum_{i=1}^{n} x_i\right] = \sum_{i=1}^{n} E[x_i] = np$$
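Property 1 can also be checked numerically by summing x·f(x) over all feasible values of x. The sketch below does this for two arbitrary parameter sets; the function name hypergeom_mean and the test values are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_mean(n, k, m):
    """Mean of the hypergeometric distribution, computed from the pdf."""
    lo, hi = max(0, n - (m - k)), min(n, k)  # feasible range of x
    return sum(x * comb(k, x) * comb(m - k, n - x) / comb(m, n)
               for x in range(lo, hi + 1))

# Compare with np, where p = k/m.
mean_a = hypergeom_mean(3, 4, 12)   # np = 3 * 4/12
mean_b = hypergeom_mean(5, 7, 20)   # np = 5 * 7/20
```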
Example 1: A bag contains 12 balls, 8 red and 4 blue. You reach in the bag and pick 3 balls at random (without replacement). What is the probability that at least 2 of the balls will be blue?
At least 2 blue balls means 2 or 3 blue balls in this context and so the answer is 23.6%, calculated as follows:
= HYPGEOMDIST(2, 3, 4, 12) + HYPGEOMDIST(3, 3, 4, 12)
= .218 + .018 = .236
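The same calculation can be reproduced outside Excel. The sketch below, which assumes Python 3.8 or later, evaluates the two terms from binomial coefficients: 4 blue balls, 8 red balls, and C(12, 3) = 220 possible samples of 3.

```python
from math import comb

# P(exactly 2 blue): choose 2 of the 4 blue and 1 of the 8 red.
p2 = comb(4, 2) * comb(8, 1) / comb(12, 3)
# P(exactly 3 blue): choose 3 of the 4 blue and none of the red.
p3 = comb(4, 3) * comb(8, 0) / comb(12, 3)

print(round(p2, 3), round(p3, 3), round(p2 + p3, 3))  # 0.218 0.018 0.236
```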
Example 2: Mary and Jane both attend the same university, but don’t know each other. Each has about 200 friends at the university. Assuming that each of these groups of friends represents a random sample from the 50,000 students who attend the university, what is the probability that Mary and Jane will have at least one friend in common?
It turns out that this problem is equivalent to picking 200 balls at random (representing Mary’s friends) from a bag containing 49,998 balls (representing the 50,000 students less Mary and Jane), 200 of which are blue (representing Jane’s friends), and getting at least one blue ball. We first calculate the probability that none of the balls will be blue as follows:
HYPGEOMDIST(0, 200, 200, 49998) = .448
Thus the answer is 1 – .448 = 55.2%.
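As a cross-check, the probability of no common friends equals C(49798, 200)/C(49998, 200), since all 200 of Mary's friends must come from the 49,798 students who are not Jane's friends. The sketch below relies on Python's exact integer arithmetic, so the very large binomial coefficients cause no overflow; the variable names are illustrative only.

```python
from math import comb

population = 49998      # 50,000 students less Mary and Jane
jane_friends = 200
mary_friends = 200

# All of Mary's friends are drawn from students who are not Jane's friends.
p_none = (comb(population - jane_friends, mary_friends)
          / comb(population, mary_friends))
p_common = 1 - p_none

print(round(p_none, 3), round(p_common, 3))  # 0.448 0.552
```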