**Definition 1**: The probability density function of the **normal distribution** is defined as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Here *e* is the constant 2.7183… and *π* is the constant 3.1415…, both of which are described in Built-in Excel Functions.

The normal distribution is completely determined by the parameters *µ* and *σ*. It turns out that *µ* is the mean of the normal distribution and *σ* is the standard deviation. We use the abbreviation *N*(*µ, σ*) to refer to a normal distribution with mean *µ* and standard deviation *σ*.

As we shall see, the normal distribution occurs frequently and is very useful in statistics.

**Excel Functions**: Excel provides the following functions regarding the normal distribution:

**NORMDIST**(*x, μ, σ*, cum) where cum takes the value TRUE or FALSE

NORMDIST(*x, μ, σ,* FALSE) = probability density function value *f*(*x*) for the normal distribution

NORMDIST(*x, μ, σ*, TRUE) = cumulative probability distribution value *F*(*x*) for the normal distribution

**NORMINV**(*p, μ, σ*) is the inverse of NORMDIST(*x, μ, σ*, TRUE)

NORMINV(*p, μ, σ*) = the value *x* such that NORMDIST(*x, μ, σ*, TRUE) = *p*

Excel 2010/2013 provide the following additional functions: **NORM.DIST**, which is equivalent to NORMDIST, and **NORM.INV**, which is equivalent to NORMINV.
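For readers who want to cross-check the Excel results outside a spreadsheet, the following is a minimal sketch of rough Python analogues. It assumes Python 3.8+ and its standard-library `statistics.NormalDist` class; the helper names `normdist` and `norminv` are hypothetical wrappers chosen to mirror the Excel names and are not part of Excel or of the original page.

```python
from statistics import NormalDist

def normdist(x, mu, sigma, cum):
    """Rough analogue of Excel's NORMDIST(x, mu, sigma, cum) (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    # cum = True gives the cumulative distribution F(x), cum = False the density f(x)
    return dist.cdf(x) if cum else dist.pdf(x)

def norminv(p, mu, sigma):
    """Rough analogue of Excel's NORMINV(p, mu, sigma) (hypothetical helper)."""
    return NormalDist(mu, sigma).inv_cdf(p)

x = 110
p = normdist(x, 100, 16, True)        # F(110) for N(100, 16)
print(normdist(x, 100, 16, False))    # f(110)
print(p)
print(norminv(p, 100, 16))            # should recover a value very close to 110
```

The last line illustrates the inverse relationship: applying `norminv` to the cumulative probability returns the original *x*, up to floating-point error.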

**Example 1**: Create a graph of the distribution of IQ scores using the Stanford-Binet scale.

This distribution is known to be the normal distribution *N*(100, 16). To create the graph, we first create a table with the values of the probability density function *f*(*x*) for values of *x* = 50, 51, …, 150. This table begins as shown in Figure 1.

The value of *f*(*x*) for each *x* is calculated using the NORMDIST function with cum = FALSE. The probability density curve is created as a line chart using the techniques described in Line Charts. From Figure 2, you can see that the curve in this chart has the characteristic bell shape of the normal distribution.
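The table behind the chart can also be sketched in Python. This is only an illustration under the assumption that the standard-library `statistics.NormalDist` is an acceptable stand-in for NORMDIST with cum = FALSE; the variable names are hypothetical, and no chart is drawn here.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)          # N(100, 16) as in Example 1

# One row per x = 50, 51, ..., 150, holding (x, f(x))
table = [(x, iq.pdf(x)) for x in range(50, 151)]

# Print the first few rows, analogous to the start of the Excel table
for x, fx in table[:5]:
    print(f"{x:4d}  {fx:.6f}")

# The density is largest at the mean, which is where the bell curve peaks
peak_x, peak_fx = max(table, key=lambda row: row[1])
print("peak at x =", peak_x)
```

Plotting the second column against the first with any charting tool should reproduce the bell shape described above.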

**Observation**: As can be seen from Figure 2, the area under the curve to the right of 100 is equal to the area under the curve to the left of 100; this makes 100 the median. Since the normal curve is symmetric about 100, it follows that the mean is also 100. Since the curve reaches its highest point at 100, it follows that the mode is also 100.

**Observation**: The basic parameters of the normal distribution are as follows:

- Mean = median = mode = *µ*
- Standard deviation = *σ*
- Skewness = kurtosis = 0

The function is symmetric about the mean with inflection points (i.e. the points where the curve changes from concave up to concave down or from concave down to concave up) at *x* = *μ* ± *σ*.

As can be seen from Figure 3, the area under the curve in the interval *μ – σ < x < μ + σ* is approximately 68.27% of the total area under the curve. The area under the curve in the interval *μ –* 2*σ < x < μ +* 2*σ* is approximately 95.45% of the total area under the curve, and the area under the curve in the interval *μ –* 3*σ < x < μ +* 3*σ* is approximately 99.73% of the total area under the curve.

Given the symmetry of the curve, this means that the area under the curve where *x > μ + σ* is 15.87%, i.e. (100% – 68.27%) / 2. The area under the curve where *x > μ +* 2*σ* is 2.28% and the area under the curve where *x > μ +* 3*σ* is 0.13%.

It also turns out that 95% of the area under the curve is in the interval *μ –* 1.96*σ < x < μ +* 1.96*σ*, i.e. –1.96 < *x* < 1.96 for the standard normal distribution *N*(0, 1). This will be important when considering the critical value for *α* = .05.
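The areas quoted above can be recomputed with a short sketch. It assumes Python's standard-library `statistics.NormalDist`, and works on the standard normal distribution, since the fraction of area within *k* standard deviations of the mean is the same for every *μ* and *σ*. The helper name `area_within` is hypothetical. Printed values may differ in the last digit from figures taken from rounded tables.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution, mean 0 and standard deviation 1

def area_within(k):
    """Area under the curve between mu - k*sigma and mu + k*sigma."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3, 1.96):
    inside = area_within(k)
    upper_tail = (1 - inside) / 2   # by symmetry, half of the remaining area
    print(f"k = {k}: inside = {inside:.4%}, upper tail = {upper_tail:.4%}")
```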

**Property 1**: If *x* has normal distribution *N*(*μ, σ*) then the linear transform *y = ax + b*, where *a* ≠ 0 and *b* are constants, has normal distribution *N*(*aμ+b*, |*a*|*σ*).

**Property 2**: If *x*_{1} and *x*_{2} are independent random variables, and *x*_{1} has normal distribution *N*(*μ*_{1}*, σ*_{1}) and *x*_{2} has normal distribution *N*(*μ*_{2}*, σ*_{2}) then *x*_{1} + *x*_{2} has normal distribution *N*(*μ*_{1}+*μ*_{2}, *σ*) where

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$$

**Observation**: Click here for additional characteristics of the normal distribution function (using calculus), as well as a proof of Properties 1 and 2.
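Properties 1 and 2 can also be illustrated by simulation. The sketch below is not a proof: it only compares the sample mean and standard deviation with the values the properties predict, and it does not test the shape of the distribution. The constants (*a* = 2, *b* = 3, the second distribution *N*(50, 12), the sample size and the seed) are arbitrary assumptions chosen for illustration, and it relies on Python's standard-library `random` and `statistics` modules.

```python
import random
from math import hypot
from statistics import NormalDist

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50_000               # sample size (arbitrary choice)

# Property 1: y = a*x + b with x ~ N(100, 16) should be close to N(a*100 + b, |a|*16)
a, b = 2, 3
xs = [rng.gauss(100, 16) for _ in range(n)]
ys = [a * x + b for x in xs]
fit_y = NormalDist.from_samples(ys)      # estimates mean and standard deviation
print(fit_y.mean, fit_y.stdev)           # theory: 203 and 32

# Property 2: x1 ~ N(100, 16) and x2 ~ N(50, 12), drawn independently
x1 = [rng.gauss(100, 16) for _ in range(n)]
x2 = [rng.gauss(50, 12) for _ in range(n)]
fit_sum = NormalDist.from_samples([u + v for u, v in zip(x1, x2)])
sigma_theory = hypot(16, 12)             # sqrt(16**2 + 12**2) = 20
print(fit_sum.mean, fit_sum.stdev, sigma_theory)   # theory: mean 150, sd 20
```

The estimates should land close to the theoretical values, with small differences due to sampling error.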

**Example 2**: A charity group prepares sandwiches for the poor. The weights of the sandwiches are distributed normally with mean 150 grams and standard deviation of 25 grams. One sandwich is chosen at random (this is a random sample of size one). What is the probability that this sandwich will weigh between 145 and 155 grams?

NORMDIST(145, 150, 25, TRUE) = .42074 = probability that weight is less than 145 grams

NORMDIST(155, 150, 25, TRUE) = .57926 = probability that weight is less than 155 grams

The answer is therefore .57926 – .42074 = .15852 = 15.85%.
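The same calculation can be sketched in Python as a cross-check, assuming the standard-library `statistics.NormalDist` in place of NORMDIST with cum = TRUE. The variable names are hypothetical.

```python
from statistics import NormalDist

weight = NormalDist(mu=150, sigma=25)   # sandwich weights in grams

p_below_145 = weight.cdf(145)           # probability that weight is less than 145 grams
p_below_155 = weight.cdf(155)           # probability that weight is less than 155 grams
p_between = p_below_155 - p_below_145   # probability that weight is between 145 and 155 grams

print(round(p_below_145, 5), round(p_below_155, 5), round(p_between, 5))
```

Subtracting the two cumulative probabilities works because *F*(155) counts everything below 155 grams, including the portion below 145 grams that must be excluded.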

Thank you very much for the help; the materials are very helpful. But I have a little question about what you stated regarding the mean, median, and mode. I would like to have more information about this statement.

Traore,

I have just added further explanation in the paragraph following Figure 2 on the referenced webpage. I hope this helps.

Charles

Dear Charles, thank you very much; now I can understand. It's really helpful.

The material is very helpful for my studies as I am doing my BBA. Please keep it up.

A bit confused … when you have a set of data, do you first find the average and std deviation of that data and then try to approximate it with a probability distribution and use this for your analysis?

How do you know that the data can be represented by a normal distribution?

Paul,

On the referenced webpage, we are assuming that we already know that the normal distribution with a given mean and std dev is a suitable model.

In general a number of popular statistical tests will assume that the data can be represented by a normal distribution. You can then test whether this assumption is true as described on the webpage http://www.real-statistics.com/tests-normality-and-symmetry/. If this assumption is not true, you will need to find a different test for which the assumptions are met, or use one of the non-parametric tests as described on the webpage http://www.real-statistics.com/non-parametric-tests/.

Charles

Hi Charles, thanks for putting together this really useful and very well put together website. I think this is an error:

In the few lines below Figure 3, the left area of the curve is missing, i.e. it only says x > μ + σ, but shouldn't it say that the area where x > μ + σ or x < μ – σ is 15.87%, i.e. (100% – 68.26%) / 2?

Also, it should be 15.87 instead of 16.13, and similarly for the 2σ and 3σ values.

Thanks!

Kabir,

Good catch. You are 100% correct that the value should be 15.87%. I have just changed the webpage to reflect this. Thanks for finding this error.

Regarding the left area of the curve being missing, I don’t see this. With my browser (Chrome) the complete curve is displayed. I don’t reference the left area of the curve since it is the same as the right.

Charles

In other places, including my course notes and Wikipedia, the normal distribution is represented by N(mu, sigma squared), such that the second parameter is the variance. In your example, in N(100,16), the 16 is the standard deviation as illustrated by the graph. Have you used a different interpretation, and is this the case through all your pages?

Simon,

I have indeed used the standard deviation instead of the variance as the second parameter. I have done this consistently throughout the website.

I chose to use the standard deviation because Excel’s formula NORMDIST(x, mean, stdev, cum) uses the standard deviation. In some sense it is also simpler. The problem with this approach occurs when you look at the multivariate normal distribution, where the second parameter is the covariance matrix, which is really a multivariate version of the variance and not the standard deviation. For this reason, the variance might have been a better choice.

Charles

Charles,

Thanks for explaining your logic. The inconsistency between usual theory and Excel practicality could perhaps be reconciled by writing it as N(100,16^2) rather than N(100,256) or N(100,16)? (Though with a neater squared symbol than ^2.)

Great site, by the way!

Simon