Another way to characterize a distribution or a sample is via a **box plot **(aka a **box and whiskers plot**). Specifically, a box plot provides a pictorial representation of the following statistics: maximum, 75^{th} percentile, median (50^{th} percentile), mean, 25^{th} percentile and minimum.

Box plots are especially useful when comparing samples and testing whether data is distributed symmetrically.

**Real Statistics Data Analysis Tool**: To generate a box plot, you can use the **Box Plot **option of the **Descriptive Statistics and Normality** data analysis tool found in the Real Statistics Resource Pack, as described in the following example. See also Special Charting Capabilities for how to create the box plot manually using Excel’s charting capabilities.

**Example 1**: A market research company asks 30 people to evaluate three brands of tablet computers using a questionnaire. The 30 people are divided at random into 3 groups of 10 people each, where the first group evaluates Brand A, the second evaluates Brand B and the third evaluates Brand C. The questionnaire scores from these groups are summarized in Figure 1.

**Figure 1 – Sample data**

To generate the box plots for these three groups, press **Ctrl-m** and select the **Descriptive Statistics and Normality** data analysis tool. A dialog box will appear. Select the **Box Plot** option and insert A3:C13 in the **Input Range.** Check **Headings included with the data **and uncheck **Use exclusive version of quartile**.

The resulting chart is shown in Figure 2.

**Figure 2 – Box Plot**

Note too that the data analysis tool also generates a table, which may be located behind the chart. For those who are interested, this table contains the information in Figure 3, as explained further in Special Charting Capabilities.

For each sample, the box plot consists of a rectangular box with one line extending upward and another extending downward (usually called whiskers). The box itself is divided into two parts. In particular, the meaning of each element in the box plot is described in Figure 3.

Element |
Meaning |

Top of upper whisker | Maximum value of the sample |

Top of box | 75th percentile of the sample |

Line through the box | Median of the sample |

Bottom of the box | 25th percentile of the sample |

Bottom of the lower whisker | Minimum of the sample |

× markers | Mean of the sample |

**Figure 3 – Box Plot elements**

There are two versions of this table, depending on whether you check or uncheck the **Use exclusive version of quartile** field. If checked then the QUARTILE.EXC version of the 25^{th} and 75^{th} percentile is used (or QUARTILE_EXC for Excel 2007 users), while if this field is unchecked then the QUARTILE (or equivalently the QUARTILE.INC) version is used.

From the box plot (see Figure 2) we can see that the scores for Brand C tend to be higher than for the other brands and those for Brand B tend to be lower. We also see that the distribution of Brand A is pretty symmetric at least in the range between the 1^{st} and 3^{rd} quartiles, although there is some asymmetry for higher values (or potentially there is an outlier). Brands B and C look less symmetric. Because of the long upper whisker (especially with respect to the box), Brand B may have an outlier (see Outliers and Robustness for a discussion of outliers).

Another indication of symmetry is whether the × marker for the mean coincides with the median.

We can also convert the box plot to a horizontal representation of the data (as shown in Figure 4) by first deleting the markers for the means (by clicking on any of these markers and pressing the backspace key) and then clicking on the chart and selecting **Insert > Charts|Bar > Stacked Bar**.

**Figure 4 – Horizontal Box Plot**

**Observation**: When a data set has a negative value, the y-axis will be shifted upward by -MIN(R1) where R1 is the data range containing the data. Thus if R1 ranges from -10 to 20, the range in the chart will range from 0 to 30.

**Example 2**: Create the box plot for the data in Figure 5.9.1 where cell B11 is changed to -300 and the exclusive version of the quartile function.

The procedure is the same as for Example 1, except that this time we check the **Use exclusive version of quartile** option. The output is shown in Figure 5.

The key difference is that since the smallest data value is -300 (the value in cell F13), all the box plot values are shifted up by 300. This is evident by noting that the lower tail for Brand B is at 0 instead of -300 (and that cell G6 contains 0 instead of -300).

**Figure 5 – Box plot for negative data**

Note that two y-axes are displayed. The one on left is based on the displacement of 300 units, while the one on the right shows the correct units.

You can remove the y-axis on the left by following the following steps:

- Select the y-axis on the left and then right click.
- Choose the
**Format Axis…**option from the menu that appears. - When the menu of option appears as shown in Figure 5.9.6, change the
**Label Position**option from**Next to Axis**to**None**.

**Figure 6 – Remove left y-axis**

Note that if you change any of the data elements, the box chart will still be correct, although the right y-axis will not change and will still reflect the original data, and so you will need to rely on the left y-axis (you can remove the right y-axis as described above for the left y-axis).

See Special Charting Capabilities for how to create a box plot manually, using only Excel charting capabilities.

See Box Plots with Outliers to see how to generate box plots in Excel which also explicitly show outliers.

should I use inclusive or exclusive median when calculating the IQR?

Duncan,

There isn’t a definitive answer, but generally the exclusive version of the quartiles (not the median) is preferred.

Charles

I am having trouble getting outliers to show for me, even when I check the outliers and missing data option. I see there are fields for Outlier limit and # of outliers – do I need to populate those fields and, if so, how?

Sarah,

You just need to check the Box Plot w/ Outliers option. You don’t need to use the Outlier limit or # of Outliers fields. This boxplot will show the outliers.

The Outliers with Missing Data option is different and is explained at the following webpage

http://www.real-statistics.com/sampling-distributions/identifying-outliers-missing-data/

Charles

I can’t do any of the instructions. I use excel 2013.

Steven,

What problem are you having?

Charles

I use excel 2016. it has box and whisker chart but I cant change y axis scale. I need y axis to be logarithmic scale. can you help me?

One approach is to take the log of the raw data and then draw of the box and whisker chart of this transformed data.

Charles

Hi! How can I show the outliers in the boxplot?

Mjay,

If you are using Excel 2016, you can use Excel’s Boxplot chart capability. If you are using another version of Excel, you can use the approach described on the Real Statistics website (including the Boxplot feature of the Descriptive Statistics data analysis tool). Unfortunately, these don’t yet include outliers.

Charles

Hi Charles,

Thanks for the box plot tutorial. But i am having a problem with using the ‘Descriptive Statistics and Normality’ tool provided with the Real Statistics Resource Pack. When i try to create the box plot, it gives me an error message showing the following message:

‘Compile error in hidden module: Analysis’

pls. help me with this problem.

Thanks

Naveen

In order to help you with this problem, please answer the following questions:

Which version of Excel and Windows are you using?

What language is Excel using (English, French, etc.)?

What is the output when you enter the formula =VER() in any cell?

Are you able to use any of the other Real Statistics data analysis tools? If so, which one works?

Charles

Hi Charles,

I am also having the same problem as that of Naveen. I am using 2007 version of Excel and Windows 8. the language of my Excel is English. And the output of VER()=4.7 Excel 2007. And I am unable to use any of the real stats data analysis tool the same problem ‘Compile error in hidden module: Analysis’ is displayed. Please help to resolve its quite necessary for me.

Thanks and regards

Suman

Hi Suman,

The usual reason for this is that the addin wasn’t installed properly. These instructions are listed on the the following webpage:

Resource Pack for Excel 2007

In particular, you need to make sure that Excel’s Solver is installed before you try to install Real Statistics Resource Pack.

To check to see which addins are installed press Alt-TI and see which have a check mark next to them.

Charles

what if the question did not give the population data? only population size, mean and standard deviation are given. how to find the min,q1,q2,q3 and max?

You can’t calculate min,q1,q2,q3 and max from population size, mean and std dev.

Charles

Hi

There is only descriptive statistics option in the analysis tool pack in that Excel version that I am using. Can you inform me about that version which contain descriptive statistics and normality option on analysis tool pack please?

Thanks a lot.

Tara

Tara,

The standard Excel analysis pack doesn’t contain a normality option. This is included in the Real Statistics Resource Pack, which you can download for free from the webpage Free Download.

Charles

I’m having trouble using the boxplot feature from the Descriptive Statistics and Normality tool. It seems it’s not properly accounting for negative values.

Ryan,

The very last paragraph of the referenced webpage tells you where to get more information about handling negative values. This paragraph states:

“See Special Charting Capabilities for more information about the Box Plot data analysis tool, especially regarding issues that arise when some of the data is negative.”

Unfortunately, handling negative values in boxplots is a challenge using Excel VBA (which is the programming language used to create the Real Statistics Resource Pack). As a result, I had to use a trick: e.g. if the smallest negative number among the data is say -10, then I added 10 to all the data elements; in this way there are no negative values. The result is that the boxplot looks exactly as it should except that it is shifted up 10 units. This won’t matter since you can still determine whether the data is symmetric, whether there is the risk of outliers, whether variance are similar, etc. In fact, if you remove the labels on the y-axis then you won’t be able to tell the difference.

If you want to handle negative values without shifting the boxplots in this way, I give two choices in the website: (1) you can create the boxplot manually or (2) you can use the Real Statistics data analysis tool, but the last steps need to be be done manually. These options are described on the webpage http://www.real-statistics.com/excel-capabilities/special-charting-capabilities.

Charles

Charles,

I recognize what you’re saying – the boxplot itself is accurate, it’s just the axis units shifted. I too have programmed in VBA and understand the complexity of writing a function, such as this one. Nonetheless, I appreciate all your work and enjoy learning from it.

Ryan

Hi Charles.

I have a spreadsheet which will make any number of parallel box plots automatically (well, up to 25 anyway) as you type or paste the categories and values. It doesn’t use macros. You’re welcome to it if it’s of use. Just contact me.

Derek

Hi Derek – would love to utilise the spreadsheet if possible?

Mail me at Derek.Christie@boppoly.ac.nz

Dear Derek,

I would like to make use of the spreadsheet.

Regards,