The one-sample hypothesis test described in Hypothesis Testing using the Central Limit Theorem, which is based on the normal distribution, is appropriate when the standard deviation of the population is known and either the population is normally distributed or the sample is large enough for the Central Limit Theorem to apply.
The problem is that the standard deviation of the population is generally not known. One approach for addressing this is to use the standard deviation s of the sample as an approximation for the standard deviation σ of the population. As described below, this approach is possible using the t distribution.
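Definition 1: The (Student’s) t distribution with k degrees of freedom, abbreviated T(k), has probability density function given by
$$f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}$$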
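As an illustration, the density in Definition 1 can be evaluated with the Python standard library alone. This is a minimal sketch, not part of the original article: the function name t_pdf is hypothetical, and the closed form used is the standard one for the T(k) density, stated in the docstring.

```python
import math

def t_pdf(x: float, k: float) -> float:
    """Density of T(k) at x, using the standard closed form
    Gamma((k+1)/2) / (sqrt(k*pi) * Gamma(k/2)) * (1 + x^2/k)^(-(k+1)/2).
    """
    if k <= 0:
        raise ValueError("degrees of freedom k must be positive")
    # Work on the log scale so the gamma ratio does not overflow for large k.
    log_coef = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                - 0.5 * math.log(k * math.pi))
    return math.exp(log_coef - (k + 1) / 2 * math.log1p(x * x / k))

# The density is symmetric about 0 and peaks there.
print(t_pdf(0.0, 5), t_pdf(1.5, 5), t_pdf(-1.5, 5))
```

For k = 1 the t distribution reduces to the Cauchy distribution, so t_pdf(0, 1) should equal 1/π, which gives a quick sanity check.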
Observations: Key statistical properties of the t distribution are:
- Mean = 0 for k > 1
- Median = 0
- Mode = 0
- Range = (-∞, ∞)
- Variance = k ⁄ (k – 2) for k > 2
- Skewness = 0 for k > 3
- Kurtosis = 6 ⁄ (k – 4) for k > 4 (excess kurtosis, i.e. measured relative to the normal distribution)
The overall shape of the probability density function of the t distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the t distribution approaches the standard normal distribution, and in fact the approximation is quite close for k ≥ 30.
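The convergence to the standard normal distribution can be checked numerically. The sketch below is illustrative only: the helper names (t_pdf, normal_pdf, max_gap) and the grid on [-6, 6] are assumptions made for this example. It reports the largest absolute difference between the T(k) density and the N(0, 1) density for several values of k.

```python
import math

def t_pdf(x: float, k: float) -> float:
    """Density of the t distribution with k degrees of freedom."""
    log_coef = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                - 0.5 * math.log(k * math.pi))
    return math.exp(log_coef - (k + 1) / 2 * math.log1p(x * x / k))

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def max_gap(k: float, lo: float = -6.0, hi: float = 6.0, steps: int = 1200) -> float:
    """Largest absolute difference between the two densities on an even grid."""
    step = (hi - lo) / steps
    return max(abs(t_pdf(lo + i * step, k) - normal_pdf(lo + i * step))
               for i in range(steps + 1))

for k in (1, 5, 30, 100):
    print(f"k = {k:3d}: max density gap = {max_gap(k):.4f}")
```

The gap shrinks as k grows, and the t density is lower than the normal density at 0 but higher in the tails, which matches the "lower and wider" description above.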
Figure 1 – Chart of t distribution by degrees of freedom
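Theorem 1: If x has normal distribution N(μ, σ), then for samples of size n, the random variable
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
has distribution T(n – 1), where x̄ is the sample mean and s is the sample standard deviation.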
Click here for a proof of Theorem 1.
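Corollary 1: For samples of sufficiently large size n, the random variable
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
has approximately the distribution T(n – 1), even if x is not normally distributed.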
Proof: This follows from the theorem by the Central Limit Theorem.
Observation: The test statistic in the theorem and corollary is the same as
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
from the Central Limit Theorem, with the population standard deviation σ replaced by the sample standard deviation s. What makes this useful is that usually the standard deviation of the population is unknown while the standard deviation of the sample is known.
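To make the statistic concrete, here is a minimal Python sketch that computes t and its degrees of freedom from a sample. The function name t_statistic, the hypothesized mean mu0, and the sample values are all hypothetical and chosen only for illustration.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, df) for a one-sample test of the hypothesized mean mu0.

    t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation
    (n - 1 in the denominator) and df = n - 1.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed to estimate s")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample (not population) standard deviation
    if s == 0:
        raise ValueError("sample standard deviation is zero; t is undefined")
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical data, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5]
t, df = t_statistic(sample, mu0=5.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

Note that statistics.stdev divides by n – 1, which is the sample standard deviation s the theorem calls for; statistics.pstdev would give the population form instead.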
Excel Functions: Excel provides the following functions regarding the t distribution:
TDIST(x, df, tails) = the probability in the right tail of the Student’s t distribution with df degrees of freedom, i.e. P(t > x), when tails = 1 (for a one-tailed test). When tails = 2 (for a two-tailed test), TDIST(x, df, tails) is the sum of the right and left tails, i.e. P(|t| > x).
Since the t distribution is symmetric about x = 0, TDIST(x, df, 2) is simply 2 * TDIST(x, df, 1). Also note that x must be non-negative, but by the same symmetry the left tail at x < 0 is TDIST(-x, df, 1). Thus we can use the formula TDIST(ABS(x), df, tails) for any x. The cumulative distribution function is given by 1 – TDIST(x, df, 1) when x ≥ 0 and by TDIST(-x, df, 1) when x < 0.
TINV(p, df) = x such that TDIST(x, df, 2) = p; i.e. TINV is the inverse of TDIST in the two-tailed case. For the one-tailed case simply double p; i.e. TINV(2*p, df) = x such that TDIST(x, df, 1) = p.
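The relationships among the tails, the cumulative distribution function and the inverse can be mirrored outside Excel. The following is a rough Python sketch under stated assumptions: the names tdist, t_cdf and tinv are hypothetical analogs and not Excel's implementation, the tail area is obtained by Simpson's rule, and the inverse by bisection. It is only intended for moderate arguments, since accuracy degrades for very large x or very small p.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    log_coef = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                - 0.5 * math.log(k * math.pi))
    return math.exp(log_coef - (k + 1) / 2 * math.log1p(x * x / k))

def tdist(x, k, tails, steps=2000):
    """Analog of TDIST: right tail P(t > x) if tails == 1, both tails if tails == 2."""
    if x < 0:
        raise ValueError("x must be non-negative, as with TDIST")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Simpson's rule for the area on [0, x]; steps must be even.
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    right_tail = 0.5 - total * h / 3  # by symmetry, P(t > 0) = 0.5
    return tails * right_tail

def t_cdf(x, k):
    """Cumulative distribution function built from the right tail."""
    return 1 - tdist(x, k, 1) if x >= 0 else tdist(-x, k, 1)

def tinv(p, k):
    """Analog of TINV: the x >= 0 with tdist(x, k, 2) == p, found by bisection."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    lo, hi = 0.0, 1.0
    while tdist(hi, k, 2) > p:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if tdist(mid, k, 2) > p:  # tail still too large, so move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(tdist(2.0, 10, 1), tdist(2.0, 10, 2), t_cdf(-2.0, 10))
print(tinv(0.05, 10), tinv(2 * 0.05, 10))  # two-tailed and one-tailed critical values
```

As in the text, the one-tailed critical value for probability p is obtained by calling tinv with 2 * p.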
With Excel 2010/2013 there are a number of new functions (T.DIST, T.INV, T.DIST.RT, T.INV.RT and T.INV.2T) that provide equivalent functionality to TDIST and TINV, but whose syntax is more consistent with other distribution functions. These functions are described in Built-in Statistical Functions.
In Excel 2010/2013 T.DIST(x, df, TRUE) is the cumulative distribution function for the t distribution with df degrees of freedom and T.DIST(x, df, FALSE) is the pdf for the t distribution. For prior versions of Excel, the supplemental function T_DIST(x, df, cum) contained in the Real Statistics Resource Pack provides the equivalent functionality.