**Definition 1**: The probability density function of the **normal distribution** is defined as:

Here is the constant *e* = 2.7183…, and is the constant *π* = 3.1415… which are described in Built-in Excel Functions.

The normal distribution is completely determined by the parameters *µ *and *σ*. It turns out that *µ* is the mean of the normal distribution and *σ* is the standard deviation. We use the abbreviation *N*(*µ, σ*) to refer to a normal distribution with mean *µ* and standard deviation *σ*.

As we shall see, the normal distribution occurs frequently and is very useful in statistics.

**Excel Functions**: Excel provides the following functions regarding the normal distribution:

**NORMDIST**(*x, μ, σ*, cum) where cum takes the value TRUE or FALSE

NORMDIST(*x, μ, σ,* FALSE) = probability density function value *f*(*x*) for the normal distribution

NORMDIST(*x, μ, σ*, TRUE) = cumulative probability distribution value *F*(*x*) for the normal distribution

**NORMINV**(*p, μ, σ)* is the inverse of NORMDIST(*x, μ, σ*, TRUE)

NORMINV(*p, μ, σ*) = the value *x* such that NORMDIST(*x, μ, σ*, TRUE) = *p*

Excel 2010/2013/2016 provide the following additional functions: **NORM.DIST**, which is equivalent to NORMDIST, and **NORM.INV**, which is equivalent to NORMINV.

**Example 1**: Create a graph of the distribution of IQ scores using the Stanford-Binet scale.

This distribution is known to be the normal distribution *N*(100, 16). To create the graph, we first create a table with the values of the probability density function* f*(*x*) for for values of *x* = 50, 51, …, 150. This table begins as shown in Figure 1.

**Figure 1 – Probability density function for IQ**

The value of *f*(*x*) for each *x* is calculated using the NORMDIST function with cum = FALSE. The probability density curve is created as a line chart using the techniques described in Line Charts. From Figure 2, you can see that the curve in this chart has the characteristic bell shape of the normal distribution.

**Figure 2 – IQ scores as normal curve**

**Observation**: As can be seen from Figure 2, the area under the curve to the right of 100 is equal to the area under the curve to left of 100; this makes 100 the mean. Since the normal curve is symmetric about the mean, it follows that the median is also 100. Since the curve reaches its highest point at 100, it follows that the mode is also 100.

**Observation**: The basic parameters of the normal distribution are as follows:

- Mean = median = mode =
*µ* - Standard deviation =
*σ* - Skewness = kurtosis = 0

The function is symmetric about the mean with inflection points (i.e. the points where there curve changes from concave up to concave down or from concave down to concave up) at *x* = *μ* ± *σ*.

As can be seen from Figure 3, the area under the curve in the interval *μ – σ* < *x* < *μ + σ* is approximately 68.26% of the total area under the curve. The area under the curve in the interval *μ – *2*σ < x < μ + *2*σ* is approximately 95.44% of the total area under the curve and the area under the curve in the interval *μ – *3*σ < x < μ + *3*σ* is approximately 99.74% of the area under the curve.

**Figure 3 – Areas under normal curve**

Given the symmetry of the curve, this means that the area under the curve where *x > μ + σ* is 15.87%, i.e. (100% – 68.26%) / 2. The area under the curve where *x > μ + *2*σ* is 2.28% and the area under the curve where *x > μ + *3*σ* is 0.13%.

It also turns out that 95% of the area under the curve is in the interval -1.96 < *x* < 1.96. This will be important when considering the critical value for *α* = .05.

**Property 1**: If *x* has normal distribution *N*(*μ, σ*) then the linear transform y* = ax + b*, where *a* and *b* are constants, has normal distribution *N*(*aμ+b, aσ*)*.*

**Property 2**: If *x*_{1} and *x*_{2} are independent random variables, and *x*_{1} has normal distribution *N*(*μ*_{1}*, σ*_{1})* *and *x*_{2} has normal distribution *N*(*μ*_{2}*, σ*_{2}) then *x*_{1} + *x*_{2} has normal distribution *N*(*μ*_{1}+*μ*_{2}, *σ*) where

**Observation**: Click here for addition characteristics of the normal distribution function (using calculus), as well as a proof of Property 1 and 2.

**Example 2**: A charity group prepares sandwiches for the poor. The weights of the sandwiches are distributed normally with mean 150 grams and standard deviation of 25 grams. One sandwich is chosen at random (this is a random sample of size one). What is the probability that this sandwich will weigh between 145 and 155 grams?

NORMDIST(145, 150, 25, TRUE) = .42074 = probability that weight is less than 145 grams

NORMDIST(155, 150, 25, TRUE) = .57926 = probability that weight is less than 155 grams

The answer therefore = .57926 – . 42074 = .15852 = 15.85%.

Hi Charles,

few questions:

1. RE your example of the IQ score — Why did you use the standard deviation of 16 instead of the actual value 29.3?

2. I’ve seen many examples where the analysts use the median rank instead of using the excel function Normdist(x, mean, standard deviation, true) for cummulative probability. Why is this?

Tony,

1. If you are referring to Example 1, then I used 16 since that is the population standard deviation for the IQ test.

2. Sorry, but I don’t know what median rank you are referring to?

Charles

Charles, thx for your reply and below is clarification on my previous questions.

1. yes I was referring to your Example1 (IQ test). Using excel formula stdev (50,51,52,…150) produces the standard deviation of 29.3 and not 16 as you used.

2. Cumulative F(x) is calculated by using the median ranking (i-0.3)/(n+0.4), where n=total sample size and i=the rank position of the sample. For example, if n=20, then F(x) of the first sample would be (1-0.3)/(20+0.4) and the last sample F(x)=(20-0.3)/(20+0.4). The cumulative is calculated this way for each sample instead of using the excel normdist(x, mean, stdev, 1).

thanks in advance for your inputs.

Tony,

1. I drew the chart for the population distribution which has a known standard deviation of 16. I am not trying to fit the data to a normal distribution. I could have added values 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 and 151, 152, etc. and the chart would look more or less the same. The values 50, 51, …, 150 are not to be considered to be sample values — there is no sample here; they are merely x values where I am plotting x,y on the chart.

2. Again, the approach you are referring to is used to fit the data to a normal distribution (as is done to create a QQ plot). This is not what I was doing. I was simply plotting (x,y) values where y = f(x) and the function f is the pdf of the normal distribution.

Charles

Dear Chares, i think there is some confusing description about the basic parameters of the normal distribution, eg. Skewness = kurtosis = 0. This is only valid when μ = 0,σ = 1, right?

Chun,

Skewness = kurtosis = 0 for any normal distribution even when μ ≠ 0 or σ ≠ 1.

Charles

oh great

NORMDIST(145, 150, 25, TRUE) = .42074 = probability that weight is less than 145 grams

I’m sorry, but is that probability that weight is less than 145 grams, or probability that weight is less than or equal to 145 grams?

Vladimar,

Since for a continuous distribution (such as the normal distribution) the probability of any specific value x is zero, either interpretation is accurate (i.e. you can use “probability that weight is less than 145 grams” or “probability that weight is less thanor equal to 145 grams”)

Charles

Thanks Charles, I really benefit from this wishing you all the best

thanks charles for you good additional information am benefiting from.

please try to write the basic main uses and properties of normla ditribution, binomial distribution and poisson distribution 🙂

The main uses and properties of the normal distribution are described throughout the website. The main uses and properties of the binomial distribution are described on the webpage Binomial Distribution. Those of the Poisson Distribution on the webpage Poisson Distribution.

Charles

from kenya av learnent alot

Charles,

Thanks for explaining your logic. The inconsistency between usual theory and Excel practicaility could perhaps be reconciled by writing it as N(100,16^2) rather than

N(100,256) or N(100,16) ?. (though with a neater squared symbol than ^2)

Great site by the way !

Simon

In other places, including my course notes and Wikipedia, normal distribution is represented by N(mu,sigma squared), such that the second parameter is the variance. In your example, in N(100,16), the 16 is the standard deviation as illustrated by the graph. Have you used a different interpretation and is this the case through all your pages ?

Simon,

I have indeed used the standard deviation instead of the variance as the second parameter. I have done this consistently throughout the website.

I chose to use the standard deviation because Excel’s formula NORMDIST(x, mean, stdev, cum) uses the standard deviation. In some sense it is also simpler. The problem with this approach occurs when you look at the multivariate normal distribution, where the second parameter is the covariance matrix, which is really a multivariate version of the variance and not the standard deviation. For this reason, the variance might have been a better choice.

Charles

Hi Charles, Thanks for putting together this really useful and very well put together

website. I think this is an error-

In the few lines below figure 3, the left area of the curve is missing i.e. it only says x> sigma but should x > μ + σ or x < μ + σ is 15.87%, i.e. (100% – 68.26%) / 2.

Also, is should be 15.87 instead of 16.13 and for 2σ and σ sigma values.

Thanks!

Kabir,

Good catch. You are 100% correct that the value should be 15.87%. I have just changed the webpage to reflect this. Thanks for finding this error.

Regarding left area of the curve being missing, I don’t see this. With my browser (Crome) the complete curve is displayed. I don’t reference the left area of the curve since it is the same as the right).

Charles

a bit confused … when you have a set of data do you first find the average and std deviation of that data and then try and approximate it with a probability distribution and use this for your analysis?

how do you know that the data can be represented by a normal distribution?

Paul,

On the referenced webpage, we are assuming that we already know that the normal distribution with a given mean and std dev is a suitable model.

In general a number of popular statistical tests will assume that the data can be represented by a normal distribution. You can then test whether this assumption is true as described on the webpage http://www.real-statistics.com/tests-normality-and-symmetry/. If this assumption is not true, you will need to find a different test for which the assumptions are met, or use one of the non-parametric tests as described on the webpage http://www.real-statistics.com/non-parametric-tests/.

Charles

the material is very helpfull for my studies as iam doing my BBA. please keep it up.

Dear Charles. Thank you very much, now i can understand. its really helpful .

Thank you very much for the help, the materials are very helpful. but i have a little question about the that you stated as the mean , median , mode. Please i would like to have more information about this statement.

Traore,

I have just added further explanation in the paragraph following Figure 2 on the referenced webpage. I hope this helps.

Charles