We consider a random variable *x* and a data set *S = *{*x*_{1}*, x*_{2}*, …, x _{n}*} of size

*n*which contains possible values of

*x*. The data set can represent either the population being studied or a sample drawn from the population. The mean is the statistic used most often to characterize the center of the data in

*S*. We now consider the following commonly used measures of variability of the data around the mean, namely the

**standard deviation, variance**,

**squared deviation**and

**average absolute deviation**.

In addition we also explore three other measures of variability that are not linked to the mean, namely the **median absolute deviation**,** range** and **inter-quartile range**.

Of these statistics the variance and standard deviation are most commonly employed.

**Excel Functions**: If R is an Excel range which contains the data elements in *S* then the Excel function which calculates each of these statistics is shown in Figure 1. Functions marked with an asterisk are supplemental functions found in the Real Statistics Resource Pack, although equivalent formulas in standard Excel are described later.

Statistic |
Excel 2007 |
Excel 2010+ |
Symbol |

Population Variance | VARP(R) | VAR.P(R) | σ^{2} |

Sample Variance | VAR(R) | VAR.S(R) | s^{2} |

Population Standard Deviation | STDEVP(R) | STDEV.P(R) | σ |

Sample Standard Deviation | STDEV(R) | STDEV.S(R) | s |

Squared Deviation | DEVSQ(R) | DEVSQ(R) | SS |

Average Absolute Deviation | AVEDEV(R) | AVEDEV(R) | AAD |

Median Absolute Deviation | MAD(R) * | MAD(R) * | MAD |

Range | RNG(R) * | RNG(R) * | |

Inter-quartile Range | IQR(R, b) * |
IQR(R, b) * |
IQR |

Coefficient of Variation | STDEV(R)/AVERAGE(R) | STDEV.S(R)/AVERAGE(R) | V |

**Figure 1 – Measures of Variability**

**Observation**: These functions ignore any empty or non-numeric cells.

**Variance**

**Definition 1**: The variance is a measure of the dispersion of the data around the mean. Where *S* represents a population the **population variance** (symbol ** σ^{2}**) is calculated from the population mean

*µ*as follows:

Where *S* represents a sample the **sample variance** (symbol ** s^{2}**) is calculated from the sample mean

*x̄*as follows:

The reason the expression for the population variance involves division by *n* while that of the sample variance involves division by *n* – 1 is explained in Property 3 of Estimators, where division by *n* – 1 is required to obtained an unbiased estimator of the population variance.

**Excel Function**: The sample variance is calculated in Excel using the worksheet function **VAR**. The population variance is calculated in Excel using the function **VARP**. In Excel 2010/2013 the alternative forms of these functions are **VAR.S** and **VAR.P**.

**Example 1**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2} represents a population, then the variance = 4.25.

This is calculated as follows. First, the mean = (2+5-1+3+4+5+0+2)/8 = 2.5, and so the squared deviation *SS* = (2–2.5)^{2} + (5–2.5)^{2} + (-1–2.5)^{2 }+ (3–2.5)^{2} + (4–2.5)^{2} + (5–2.5)^{2} + (0–2.5)^{2 }+ (2–2.5)^{2} = 34. Thus the variance = *SS*/*n* = 34/8 = 4.25

If instead *S* represents a sample, then the mean is still 2.5, but the variance = *SS*/(*n–*1) = 34/7 = 4.86.

These can be calculated in Excel by the formulas VARP(B3;B10) and VAR(B3:B10), as shown in Figure 2.

**Figure 2 – Examples of measures of variability**

**Observation**: When data is expressed in the form of frequency tables then the following properties are useful. Click here for the proofs of these properties.

**Property 1**: If *x̄* is the mean of the sample *S = *{*x*_{1}*, x*_{2}*, …, x _{n}*}, then the sample variance can be expressed by

**Property 2**: If *µ* is the mean of the population *S = *{*x*_{1}*, x*_{2}*, …, x _{n}*}, then the population variance can be expressed by

**Standard Deviation**

**Definition 2**: The **standard deviation** is the square root of the variance. Thus the population and sample standard deviations are calculated respectively as follows:

** Excel Function**: The sample standard deviation is calculated in Excel using the worksheet function

**STDEV**. The population standard deviation is calculated in Excel using the function

**STDEVP**. In Excel 2010/2013 the alternative forms of these functions are

**STDEV.S**and

**STDEV.P**.

**Example 2**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2} is a population, then the standard deviation = square root of the population variance = = 2.06

If *S* is a sample, then the sample standard deviation = square root of the sample variance = = 2.20

These are the results of the formulas STDEVP(B3:B10) and STDEV(B3:B10), as shown in Figure 2.

**Real Statistics Functions**: The Real Statistics Resource Pack furnishes the following array functions:

**VARCOL**(R1) = a row range which contains the sample standard variances of each of the columns in R1

**STDEVCOL**(R1) = a row range which contains the sample standard deviations of each of the columns in R1

**VARROW**(R1) = a column range which contains the sample standard variances of each of the rows in R1

**STDEVROW**(R1) = a column range which contains the sample standard deviations of each of the rows in R1

**Example 3**: Use the VARCOL and STDEVCOL functions to calculate the sample variance and standard deviation of each of the columns in the range L4:N11 of Figure 3.

The formula =VARCOL(J4:L11) produces the first result (in range J15:L15), while the formula =STDEVCOL(J4:L11) produces the second result (in range J16:L16). Remember that after entering either of these formulas you must press **Ctrl-Shft-Enter**.

**Figure 3 – Sample Variance and Standard Deviation by Column**

**Property 3**: If the population {*x _{1}, x*

_{2}

*, …, x*} has mean

_{n}*µ*and standard deviation

_{x}*σ*and the population {y

_{x}_{1}, y

_{2}, …, y

*} has mean*

_{m}*µ*

_{y }and standard deviation

*σ*

_{y}, then the variance of the combined population is

Thus if *µ _{x}* =

*µ*

_{y }the combined population variance would be

**Property 4**: If the sample {*x*_{1}*, x*_{2}*, …, x _{n}*} has mean

*x̄*and standard deviation

*s*

_{x}and the sample {y

_{1}, y

_{2}, …, y

*} has mean*

_{m}*ȳ*and standard deviation

*s*

_{y}, then the variance of the combined sample is

Thus if *x̄ = ȳ*, the combined sample variance would be

**Example 4**: Find the mean and variance of the sample which results from combining the two samples {3, 4, 6, 7} and {6, 1, 5}.

**Figure 4 – Calculation of combined mean and standard deviation**

The data in the two samples is given in the range B3:C7 of Figure 4. From these, the mean, variance and standard deviation are calculated for each of the two samples (ranges B12:B15 and C12:C15). Using Property 4, we can calculate the mean and variance of the combined sample (D13 and D14).

If we simply combine the two samples we obtain the data in the range F3:F10, from which we can calculate the mean, variance and standard deviation in the normal way (range D12:D18). As we can see the results are the same.

**Observation**: In practice instead of using Property 3 and 4, we use the approach shown in the following example, especially since it can be applied to more than two samples or populations.

**Example 5**: Find the mean and variance of the sample which results from combining the three samples shown in range A3:D6 of Figure 5.

**Figure 5 – Calculation of combined mean and variance**

We have three samples whose total sample size is 58 (cell B7), calculated via =SUM(B4:B6). The sum of the elements in each sample can be calculated from the mean as shown in range F4:F6. E.g. the sum of all the data elements in sample 1 is 276 (cell F4), calculated via the formula =B4*C4. Thus the sum of all the elements in all three sample is 786 (cell F7), calculated via the formula =SUM(F4:F6). The mean of the combined sample is therefore 13.5517 (cell C7), calculated via the formula =F7/B7.

The calculation of the combined variance is similar. The key is to first find the sum of the squares of all the elements in each sample. These are given in range I4:I6. E.g. the sum of the squares of all the elements in sample 1 is 5512 (cell I4), calculated by =G4+H4 (using Property 1), where G4 contains the formula =B4*C4^2 and H4 contains =(B4-1)*D4. Thus the sum of squares of all the elements in the combined sample is 19,832 (cell I7), calculated by =SUM(I4:I6). Finally, the variance for the combined sample is 161.059 (cell D7), calculated by =(I7-B7*C7^2)/(B7-1), based on Property 1. The standard deviation is therefore 12.6909.

**Squared Deviation**

**Definition 3**: The **squared deviation** (symbol ** SS** for

**sum of squares**) is most often used in

**ANOVA**and related tests. It is calculated as

** Excel Function**: The squared deviation is calculated in Excel using the worksheet function

**DEVSQ**.

**Example 6**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2}, the squared deviation = 34. This is the same as the result of the formula DEVSQ(B3:B10) as shown in Figure 2.

**Average Absolute Deviation**

**Definition 4**: The **average absolute deviation** (**AAD**) of data set *S* is calculated as

**Excel Function**: The average absolute deviation is calculated in Excel using the worksheet function

**AVEDEV**.

**Example 7**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2}, the average absolute deviation = 1.75. This is the same as the result of the formula AVEDEV(B3:B10) as shown in Figure 2.

**Median Absolute Deviation**

**Definition 5**: The **median absolute deviation** (**MAD**) of data set *S* is calculated as

Median {|*x _{i}* – | :

*x*in

_{i}*S*}

where = median of the data elements in *S*.

**Excel Formula**: If R is a range which contains the data elements in *S* then the MAD of *S* can be calculated in Excel by the array formula:

=MEDIAN(ABS(R-MEDIAN(R)))

Even though the value is presented in a single cell it is essential that you press **Ctrl-Shft-Enter** to obtain the array value, otherwise the result won’t come out correctly. This function only works properly when R doesn’t contain any empty cell or cell with a non-numeric value.

Alternatively, you can use the supplemental function **MAD**(R) which is contained in the Real Statistics Resource Pack. This function works properly even when R contains empty cells and/or cells with non-numeric values. You don’t need to press **Ctrl-Shft-Enter** to use this function.

**Example 8**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2}, the median absolute deviation = 2 since *S = *{-1, 0, 2, 2, 3, 4, 5, 5}, and so the median of *S* = (2+3)/2 = 2.5. Thus MAD = the median of {3.5, 2.5, 0.5, 0.5, 0.5, 1.5, 2.5, 2.5} = {0.5, 0.5, 0.5, 1.5, 2.5, 2.5, 2.5, 3.5}, i.e. (1.5+2.5)/2 = 2.

You can achieve the same result using the supplemental formula =MAD(E3:E10) as shown in Figure 2.

**Observation**: This metric is less affected by extremes in the tails because the data in the tails have less influence on the calculation of the median than they do on the mean.

**Range**

**Definition 6**: The **range** of a data set *S* is a crude measure of variability and consists simply of the difference between the largest and smallest values in *S*.

**Excel Formula**: If R is a range which contains the data elements in *S* then the range of *S* can be calculated in Excel by the formula:

=MAX(R) – MIN(R)

Alternatively, you can use the supplemental function **RNG**(R) which is contained in the Real Statistics Resource Pack.

**Example 9**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2}, the range = 5 – (-1) = 6. You can achieve the same result using the supplemental formula =RNG(E3:E10) as shown in Figure 2.

**Inter-quartile Range**

**Definition 7**: The** inter-quartile range** (**IQR**) of a data set *S* is calculated as the 75% percentile of *S* minus the 25% percentile. The IQR provides a rough approximation of the variability near the center of the data in *S.*

**Excel Formula**: If R is a range which contains the data elements in *S* then the IQR of *S* can be calculated in Excel by the formula:

=QUARTILE(R, 3) – QUARTILE(R, 1)

In Excel 2010/2013 there is a new version of the quartile function called QUARTILE.EXC. An alternative version of IQR is therefore

=QUARTILE.EXC(R, 3) – QUARTILE.EXC(R, 1)

See Ranking Functions in Excel for further information about the QUARTILE and QUARTILE.EXC functions. Alternatively, you can calculate the inter-quartile range via the supplemental function **IQR**(R, *b*) which is contained in the Real Statistics Resource Pack. When *b* = FALSE (default), the first version of IQR is returned, while when *b* = TRUE the second version is returned.

**Example 10**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2}, then the first version of IQR = 4.25 – 1.5 = 2.75, while the second version is IQR = 4.75 – 0.5 = 4.25. You can achieve the same result using the Real Statistics formulas =IQR(B3:B10) and =IQR(B3:B10,TRUE), as shown in Figure 2.

**Observation**: The variance, standard deviation, average absolute deviation and median absolute deviation measure both the variability near the center and the variability in the tails of the distribution which represents the data. The average absolute deviation and median absolute deviation do not give undue weight to the tails. On the other hand, the range only uses the two most extreme points and the interquartile range only uses the middle portion of the data.

**Coefficient of Variation**

**Definition 8**: The **coefficient of variation** (aka the **coefficient of variability**), * V* (or

*), of the data set*

**CV***S*is calculated as

*V = s/x̄*

Since *s* and *x̄* have the same units of measurement, *V* has no units of measurement. This statistic only makes sense for ratio scale data. The higher the value of *V* the more **dispersion** there is.

Clearly, the coefficient of variation is only defined when the mean is not zero.

**Excel Formula**: If R is a range which contains the data elements in* S* then the coefficient of variation for *S* can be calculated in Excel by the formula:

=VAR.S(R)/AVERAGE(R)

The population version of *V* is *σ/μ* which can be calculated in Excel by the formula

=VAR.P(R)/AVERAGE(R)

**Example 11**: If *S* = {2, 5, -1, 3, 4, 5, 0, 2} represents a sample, then, as we can see from Example 1, the coefficient of variation is

*V = s/x̄ *= 2.203892/2.5 = 88.16%

**Example 12**: Stock A has an expected return of 12% with a standard deviation of 9% and stock B has an expected return of 8% with a standard deviation of 5%. Use the coefficient of variation to determine which is the better investment.

Since *V _{A}* = .09/.12 = .75 and

*V*= .05/.08 = .625, stock B is considered to be the better investment since its relative risk (equal to its coefficient of variation) is lower.

_{B}
Dear Mr. Zaiontz,

I have a question regarding to the IQR example.

First, This is how I calculate the IQR for the data set

given S = {2, 5, -1, 3, 4, 5, 0, 2}

-(1)- reorder: -1, 0, 2, 2, 3, 4, 5, 5

-(2)- median: -1, 0, 2, 2 | 2.5 | 3, 4, 5, 5

-(3)- Q1 as median of {-1, 0, 2, 2} = (0+ 2)/2 = 1

-(4)- Q3 as median of {3, 4, 5, 5} = (4 + 5)/2 = 4.5

-(5)- IQR = Q3 – Q1 = 4.5 – 1 = 3.5

As you can see, the result is totally different with the two versions of IQR calculated using Excel function QUARTILE.EXC (IQR = 4.75 – 0.5 = 4.25) and QUARTILE.INC (IQR = 4.25 – 1.5 = 2.75).

My questions are:

– Which IQR result should I rely on?

——————————————

For people who is curious about why there is a difference in calculation between QUARTILE.EXC and QUARTILE.INC, this article:

http://datapigtechnologies.com/blog/index.php/why-excel-has-multiple-quartile-functions-and-how-to-replicate-the-quartiles-from-r-and-other-statistical-packages/ gives a good explanation.

Dear Mr. Zaiontz,

There is a typo in example 12, on the calculation of VB.

It should be VB =.05/.08 not .5/.08

Thanks.

Le,

Thanks for catching this typo. I have just corrected the mistake.

I appreciate your diligence and help in improving the website.

Charles

Le,

As the article you referenced explains there is no one right answer. The consensus seems to be that it is better to use QUARTILE.EXC though.

Charles

Hi, Charles,

In the Coefficient of Variation section, you said that V = S/x_bar.

S is variance, right? Then how come it has the same unit as the average x_bar?

You said that coefficient of variation only works with ratio data. For ratio data, can we also use standard deviation to measure variability? What’s the difference between the variation coefficient and SD in this case?

Thanks!

Xiaobin,

S stands for standard deviation (not variance). This is why the units are the same.

You can certainly use the standard deviation to measure variability (which is why it is one of the statistics described on the referenced webpage).

The coefficient of variation is the standard deviation divided by the mean.

Charles

Dear Dr Zaiontz,

I’m trying to analyze summary survey data that used a 6 point Likert-type scale (where 1=strongly disagree and 6=strongly agree). I’ve done some of the basic summary stats, but I would like to calculate variance/standard deviation but all my data are in summary form instead of all the examples I’ve seen elsewhere that list each individual response. So for example: for question 1, I have 1 row of data: 50 people chose strongly agree, 11 agree, 4 somewhat agree, and 0 chose somewhat disagree, disagree, or strongly disagree, for a total of 65 responses. I’ve calculated a mean summary score (5.7076) but is there a way to use the stats functions in Excel to calculate SD without manually expanding all the data to be in a format showing each individual response? I’m still pretty new at this so would really appreciate any advice – thanks!!

Jennifer,

Essentially you have a frequency table. You can calculate both the mean and variance via a weighted sum, which in Excel is implemented using the SUMPRODUCT function. This is shown in Figure 4 of the following webpage:

http://www.real-statistics.com/real-statistics-environment/data-conversion/frequency-table-conversion/

For your specific data the variance is .335 as described using two methods in the following image (the standard deviation is the square root): Click here for the image

Charles

I see – thank you so much for the help!!

Hi Charles, I’m having trouble getting MAD to work with an array. Specifically, I have a few groups (Group 1,2,3 in column C) with different values for some variableA in column D. There are different numbers of members in each group (max is specified below with ####), but each group is clustered together in sequence, i.e., the order in the spreadsheet is always 1,1,2,3, not 1,3,2,1. I want to compute MAD in Column B of variableA for Groups 1,2,3 with a formula that relative references the name “1”, “2”, or “3”. Column A is the list of these groups that will be relative referenced: 1 is in A1, 2 is in A2, 3 is in A3.

This is what I’ve got:

B1= {=mad(IF(C$1:C$####=A1,D$1:D$####))}

the formula works for the first entry A1, but not A2:A#### if it’s not an array formula. But if it is, then something that’s clearly not the MAD returns (expected MAD is about 0.01 for the dataset but when the formula is entered as an array it returns a value >5.0). Do you have any insight?

Error in:

the formula works for the first entry A1, but not A2:A#### if it’s not an array formula.

Should read:

the formula works for the first entry A1, but not A2:A3, if it’s not an array formula.

Radek,

Suppose I have the following data in range A1:B6:

1 45

2 56

1 78

2 23

1 45

1 25

The array formula =MAD(IF($A$1:$A$6=1,$B$1:$B$6,””)) will yield the result of 10, but you must press Ctrl-Shft-Enter to get this result.

Charles

How do i compare 2 distributions using median absolute deviation ?

Sindhuja,

You can certainly calculate the MAD for each distribution and look at the ratio between the two. don’t know of an analytic test to determine whether there is a significant difference between the two, but you can certainly use one of the resampling techniques described on the website.

Charles

Dear Dr. Charles,

How to calculate robust standard deviation By Q method (as per DIN 38402 – A45:9/2003 and ISO/TS 20612:2007 respectively. Can you explain with simple example?

Regards.

A.Dasgupta.

Assuming that you are referring to robust standard errors, please see the following webpage

Robust Standard Errors

Charles

Dear Dr Charles,

Many thanks for your immediate response. It seems Q method is very similar to Qn method of Std Dev as given in ISO 13528 ( based first quartile of successive difference in reading and correction factor), but not exactly same.

Regards.

A.Dasgupta.

Sorry, but the only robust standard error approaches I can explain are the ones that appear in the website.

Chatles

Dear Mr. Zaiontz,

How would one go about calculating the Mean Average Deviation of two combined samples utilizing only the summary statistics of the two individual samples (something analogous to Property #4 above as opposed to having to combine the two samples and then re-compute these statistics for the new, combined sample). Or, is there a way to compute a “correlation factor” of some kind using Mean Average Deviations instead of Variances?

Keith,

What is your definition of mean average deviation? Are you referring to median average deviation, mean absolute deviation or some other statistic that can be abbreviated MAD?

I also don’t know what you mean by “correlation factor”?

Charles

Dear Charles Zaiontz,

I am investigating the ratio of two means, let’s say X:Y (as this is a measure of plant leaf dry content). Is there anyway I can combine their standard deviations?

X is 27 (an average from 20, 28, 33) (leaf dry weight)

Y is 156 (an average from 113, 168, 188) (leaf fresh weight)

I am interested in the ratio which is 0.17

St.dev. of X is 5.35

St.dev. of Y is 31.71

Can I calculate a St.dev. for the ratio 0.17?

Kindest regards,

Ditte

Hello Ditte,

First of all, you have two methods of estimating the population mean. The first is the approach you used, namely 27/57. The second is to take the mean of 20/113, 28/168, 33/188. These won’t necessarily be the same. Regarding standard deviations, see the following webpage:

http://www.stat.smu.edu/~hseltman/files/ratio.pdf

Charles

sir,

i have a data sample of 50 with 8 variables of comfort factors affect to the public transit,

those data is in likert scale format,

sir, i wanted to analyse those data with mean, std deviation and variation, and also to do regression analysis

please help me to find it (method) ASAP

thank you!

Dilan,

Sorry, but I don’t understand your question. What do you mean by “analyse those data”? You can calculate the mean, standard deviation and variance (e.g. by using the Excel functions AVERAGE, STDEV and VAR). You can perform regression as described on the Real Statistics website (insert Regression in the Search box).

Charles

Dr. Charles,

I think he has got some ordinal data as he says “those data is in likert scale format”. Is it ok to say “You can calculate the mean, standard deviation and variance (e.g. by using the Excel functions AVERAGE, STDEV and VAR).”?

You can only calculate the mean, standard deviation, etc. if you interpret Likert data as interval data, i.e. the distance between 1 and 2 is the same as between 2 and 3, 3 and 4, and 4 and 5 (assuming as 1,2,3,4,5 scale). In this case you can use the Excel functions.

Charles

Dr. Zaiontz,

Wow, thanks for the quick reply! That definitely worked perfectly, I checked it inductively at first, then deductively to prove it and if I did everything right, your method is exactly what I wanted. Thank you so much!

Immeasurable thanks,

Cave

Dear Dr. Zaiontz,

My gratitude for this great support. I would like to know how to calculate common standard deviation from ttest paired. I need to get it to assess continuous data from crossover trial.

Thanks you very much.

Arturo

Arturo,

The formula for the standard deviation is the one given on the webpage http://www.real-statistics.com/descriptive-statistics/measures-variability/. Or you can use the Excel formula STDEV as usual. In the case of the paired t test you calculate the standard deviation on the difference between the data values for the two samples (as shown in Figure 2 of http://www.real-statistics.com/students-t-distribution/paired-sample-t-test/). The standard error is then this value divided by the square root of the sample size, as explained right after Figure 2.

If you download the Real Statistics Examples Workbook you can access the spreadsheets for all the examples on the website. From these spreadsheets you can see how all the calculations are done.

Charles

Dear Mr. Zaiontz,

I’m looking for a method to evaluate the variability of some kinetic parameters of gait and found MAD more appropriate that coefficient of variation. My question is why you didn’t have used in the MAD formula the multiplier “b” as in the reference http://web.ipac.caltech.edu/staff/fmasci/home/statistics_refs/BetterThanMAD.pdf

[ Alternatives to the Median Absolute Deviation]

Thank you for your time,

Respectfully,

Daniel

Dear Daniel,

The standard calculation for the MAD function is the one that I have used. The b multiplier depends on the distribution. For large samples which are normally distributed, b * MAD is approximately equal to the standard deviation where b = 1.4826. For other distributions b would have a different value.

Charles