Another way to characterize a distribution or sample is via a box plot. Specifically, a box plot provides a pictorial representation of the following statistics: maximum, 75th percentile, median (50th percentile), 25th percentile and minimum.
Box plots are especially useful for comparing samples and for checking visually whether the data is symmetric.
Real Statistics Data Analysis Tool: To generate a box plot, you can use the Box Plot option of the Descriptive Statistics and Normality supplemental data analysis tool found in the Real Statistics Resource Pack, as described in the following example. See also Special Charting Capabilities for how to create the box plot manually using Excel’s charting capabilities.
Example 1: A market research company asks 30 people to evaluate three brands of tablet computers using a questionnaire. The 30 people are divided at random into 3 groups of 10 people each, where the first group evaluates Brand A, the second evaluates Brand B and the third evaluates Brand C. The questionnaire scores from these groups are summarized in Figure 1.
Figure 1 – Sample data
To generate the box plots for these three groups, press Ctrl-m and select the Descriptive Statistics and Normality supplemental data analysis tool. A dialog box will appear. Select the Box Plot option and insert A3:C13 in the Input Range. Check Headings included with the data and uncheck Use exclusive version of quartile.
The resulting plot is shown in Figure 2.
Figure 2 – Box Plot
Note that the data analysis tool also generates a table, which may be located behind the chart. For those who are interested, this table contains the information in Figure 3, as explained in Special Charting Capabilities.
For each sample, the box plot consists of a rectangular box with one line extending upward and another extending downward (these lines are usually called whiskers). The box itself is divided into two parts. The meaning of each element in the box plot is described in Figure 3.
| Element | Meaning |
| --- | --- |
| Top of the upper whisker | Maximum value of the sample |
| Top of the box | 75th percentile of the sample |
| Line through the box | Median of the sample |
| Bottom of the box | 25th percentile of the sample |
| Bottom of the lower whisker | Minimum value of the sample |
Figure 3 – Box Plot elements
There are two versions of this table, depending on whether you check or uncheck the Use exclusive version of quartile field. If the field is checked, the QUARTILE.EXC version of the 25th and 75th percentiles is used; if it is unchecked, the QUARTILE (or equivalently QUARTILE.INC) version is used.
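For readers who want to check the five statistics outside Excel, the following is a minimal Python sketch. It assumes that the `inclusive` and `exclusive` methods of the standard library's `statistics.quantiles` follow the same interpolation rules as QUARTILE.INC and QUARTILE.EXC respectively. The function name `five_number_summary` and the scores are hypothetical illustrations; they are not the Figure 1 data and are not part of the Real Statistics Resource Pack.

```python
import statistics


def five_number_summary(sample, exclusive=False):
    """Return (minimum, Q1, median, Q3, maximum) for a sample.

    exclusive=False is meant to mirror QUARTILE / QUARTILE.INC and
    exclusive=True is meant to mirror QUARTILE.EXC. That correspondence
    is an assumption of this sketch, not something taken from the tool.
    """
    method = "exclusive" if exclusive else "inclusive"
    q1, median, q3 = statistics.quantiles(sample, n=4, method=method)
    return (min(sample), q1, median, q3, max(sample))


# Hypothetical scores, not the data from Figure 1.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(five_number_summary(scores))                  # inclusive quartiles
print(five_number_summary(scores, exclusive=True))  # exclusive quartiles
```

For these ten values the inclusive quartiles are 3.25 and 7.75, while the exclusive quartiles are 2.75 and 8.25; the minimum, median and maximum are the same in both versions. In other words, only the top and bottom of the box move when the field is checked or unchecked.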
From the box plot (see Figure 2) we can see that the scores for Brand C tend to be higher than those for the other brands, and the scores for Brand B tend to be lower. We also see that the distribution of Brand A is fairly symmetric, at least in the range between the 1st and 3rd quartiles, although there is some asymmetry for higher values (or potentially an outlier). Brands B and C look less symmetric. Because of its long upper whisker (especially relative to the box), Brand B may have an outlier (see Outliers and Robustness for a discussion of outliers).
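The visual reading above can be roughly quantified by comparing the lengths of the four segments of the box plot. The sketch below is only an illustration under stated assumptions: the helper `box_shape` is hypothetical, the quartiles use Python's `inclusive` method (assumed to correspond to QUARTILE.INC), and the scores are invented rather than taken from Figure 1.

```python
import statistics


def box_shape(sample):
    """Measure the four segments of a box plot for one sample."""
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {
        "lower_whisker": q1 - min(sample),   # minimum up to the box
        "lower_box": median - q1,            # bottom of box up to the median
        "upper_box": q3 - median,            # median up to the top of box
        "upper_whisker": max(sample) - q3,   # top of box up to the maximum
    }


# Hypothetical scores with one unusually high value; not the Figure 1 data.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
shape = box_shape(scores)
for name, length in shape.items():
    print(f"{name}: {length}")
```

Here the two halves of the box are equal (2.25 each), so the middle of the distribution looks symmetric, but the upper whisker (12.25) is much longer than the lower one (2.25). This is the kind of pattern that suggests a possible outlier among the high values.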
We can also convert the box plot to a horizontal representation of the data (as in Figure 4) by clicking on the chart and selecting Insert > Charts|Bar > Stacked Bar.
Figure 4 – Horizontal Box Plot
See Special Charting Capabilities for more information about the Box Plot data analysis tool, especially regarding issues that arise when some of the data is negative.
See Box Plots with Outliers for how to generate box plots in Excel that also explicitly show outliers.