In Box Plots we saw how to create box plots using the Box Plots option of the Real Statistics **Descriptive Statistics and Normality** data analysis tool. We now show how to create these box plots manually using Excel’s charting capabilities.

**Example 1**: Create the box plot for Example 1 of Box Plots using Excel’s charting capabilities.

**Figure 1 – Box Plot data**

Select the range containing the data, including the headings (A3:C13). Now create the table in the range E3:H8. The output in column F corresponds to the raw data from column A. Column G corresponds to column B and column H corresponds to column C. In fact once you construct the formulas for the range F4:F8, you can fill in the rest of the table by highlighting the range F4:H8 and pressing Ctrl-R.

The formulas for the cells in the range F4:F8 are as follows:

**Figure 2 – Box Plot table formulas**

Once you have constructed the table, you can create the corresponding box plot as follows:

- Select the data range E3:H7. Notice that the headings are included on the range, but not the last row.
- Select
**Insert > Charts|Column > Stacked Column** - Select
**Design > Data|Switch Row/Column**if necessary so that the x-axis represents the brands - Select the lowest data series in the chart (i.e.
**Min**) and set fill to No fill (and if necessary set border color to No Line) to remove the lowest boxes. This is done by right clicking on any of the three Min data series boxes in the chart and selecting**Format Data Series…**On the resulting dialog box, choose**Fill|No fill**. - Repeat the previous steps for the lowest visible data series (i.e.
**Q1-Min**); i.e. right click on the Q1-Min data series and select**Format Data Series… > Fill|No fill**. Alternatively right click on the Q1-Min data series and press Ctl-Y. - With the Q1-Min data series still selected, choose
**Layout > Analysis|Error Bars > More Error Bar Options**. On the resulting dialog box (Vertical Error Bars menu), click on the**Minus**and**Percentage**radio buttons and insert a percentage error of**100%.** - Click on the
**Q3-Med**data series (the uppermost one) and choose**Layout > Analysis|Error Bars > More Error Bar Options**. On the resulting dialog box (Vertical Error Bars menu), click on the**Plus**and**Custom**radio buttons and then click on the**Specify Value**button. Now specify the range F8:H8, i.e. the last row of the table you created above, in the dialog box that appears (in the**Positive Error Values**field). - Remove the legend by selecting
**Layout > Labels|Legend > None**.

The resulting box plot is

**Figure 3 – Box Plot**

**Observation**: The approach described above works perfectly for non-negative data. When a data set has a negative value, the axis will be shifted upward by –MIN(R) where R is the data range containing the data. Thus if R ranges from -10 to 20, the range in the chart will range from 0 to 30. This is true of the manual approach described above as well as the box plot produced by the supplemental data analysis tool in the Real Statistics Resource Pack. We now show how to manually produce the box plot when one or more data elements are negative.

**Example 2**: Create the box plot for the data in Figure 4 using Excel’s charting capabilities.

Figure 4 contains the same data as in Figure 1 except that we have changed the value in cell B11 to make it negative.

**Figure 4 – Box plot for negative data**

From the data we construct the modified table in range E3:H8 as above and then carry out the following steps to create the chart:

- Select the data range E3:H6. Notice that the headings are included on the range, but not the last two rows.
- Select
**Insert > Charts|Column > Stacked Column** - Select
**Design > Data|Switch Row/Column**if necessary so that axis represents the brands - Select the lowest data series in the chart (i.e.
**Q1**) and set fill to No fill (and if necessary set border color to No Line) to remove the lowest boxes. This is done by right clicking on any of the three Q1 data series boxes in the chart and selecting**Format Data Series…**On the resulting dialog box, choose**Fill|No fill**. - With the Q1 data series still selected, choose
**Layout > Analysis|Error Bars > More Error Bar Options**. On the resulting dialog box (Vertical Error Bars menu), click on the**Minus**and**Custom**radio buttons and then click on the**Specify Value**button. Now specify the range F7:H7, i.e. the second to last row of the table you created above, in the dialog box that appears (in the**Minus Error Values**field). - Click on the
**Q3-Med**data series (the uppermost one) and choose**Layout > Analysis|Error Bars > More Error Bar Options**. On the resulting dialog box (Vertical Error Bars menu), click on the**Plus**and**Custom**radio buttons and then click on the**Specify Value**button. Now specify the range F8:H8, i.e. the last row of the table you created above, in the dialog box that appears (in the**Positive Error Values**field). - Remove the legend by selecting
**Layout > Labels|Legend > None**.

The resulting chart is as follows:

**Figure 5 – Box Plot: alternative approach**

**Observation**: Unfortunately, the data analysis tool in the Real Statistics Resource Pack cannot produce this box plot automatically. When negative data is present there are two choices:

- Produce the box plot manually as described in Example 2
- Use the
**Box Plot**option of the Real Statistics**Descriptive Statistics and Normality**data analysis tool (as described in Box Plots) and either accept that the y-axis will be displaced upwards or simply remove the labels on the y-axis by right clicking on the labels in the y-axis and selecting**Delete**(or by pressing the**Backspace**key)

I made my boxplot using the directions above for negative data but when I put my error bars in one of the ‘minus’ error bars doesn’t connect to the boxes. Would you know how to fix this?

Sofia,

I would have to see what you have done in order to figure out what has gone wrong and what to do about it.

Charles

As a workaround to the problem with negative values in the source data, would it be possible in your tool to create an automatically scaled artificial Y axis (covering the positive and negative range) using extremes from the calculated data, then put the boxplots and means on the secondary axis with a suitably scaled range but with the values hidden?

Bob,

In the past, I tried to find a way to do this but was not successful.

I see that you have sent me an email with suggestions as to how to do this. I haven’t read your email yet, but will do so shortly since this approach sounds very interesting. Thank you for sending this to me.

Charles

I think you need to change it to Min-Q1 (not Q1 – min) when you have lots of negative data (e.g. negative median, negative minimum and negative Q1)

Kirk,

I just tried that, but it doesn’t seem to work. I like the idea, but I have been able to make it work.

Can you send me an example where you have made the modification(s) that you are suggesting? You can find my email address at Contact Us.

Charles

Hi Charles,

Thank you for your tutorial. I have just tried creating a boxplot for a dataset containing both negative and positive values. Will it only work when there is only a single negative value in the dataset? Or will it not work when Q1 is negative?

My dataset is as follows:

Group 1: -0.22, 0.81, 1.02, 0.89, -0.44

Group 2: 1.09, 1.07, 0.81, 1.58, 0.55, -0.31

For Group 1:

Q1 -0.22

Med-Q1 1.03

Q3-Med 0.08

Q1-Min 0.22

Max-Q3 0.13

For Group 2:

Q1 0.62

Med-Q1 0.32

Q3-Med 0.15

Q1-Min 0.93

Max-Q3 0.50

My boxplot for Group 1 is off; the median (where the Med-Q1 and Q3-Med boxes meet) is higher than it should be, and the bottom error bar does not reach the Med-Q1 box (starts from the bottom of the Q1 box rather than the top like the other error bars).

Thoughts?

Thanks

Catherine,

Are you using the Real Statistics data analysis tool to create the box plot or are you doing it manually?

I just used the Real Statistics data analysis tool for Group 1 and found that it works correctly even with two negative values. The chart is simply raised by .44 (i.e. the negative of the lowest negative value).

If you created the chart manually, then you need to fix how you created the table. The first value is not Q1, but Q1-Min, and this has value +.22.

Charles

Hi Charles,

I am trying to create a boxplot with data that contains negative values. I am confused about the order the table before plotting. In the tutorial you have Q1, Med-Q1, Q3-Med, Q1-Min, and Max-Q3. In your response to Catherine you state that if the table is done manually it will be Q1-Min for the first value. I want to double check the order of the table to check if I am plotting correctly. I am plotting this manually.

Thank you ,

Vicky

Vicky,

The order is

Min

Q1-Min

Med-Q1

Q3-Med

Max-Q3

Charles

Dear Charles

Thank you; I have been struggling to produce boxplots when there is negative data present, so your 2nd exercise here is very promising. I have not yet succeeded completely so I will try a few more permutations along the lines you provide.

In step 1, you write: “1.Select the data range E3:H7. Notice that the headings are included on the range, but not the last two rows”. However E3:H7 omits only the last row, not the last two rows. Which is correct?

Many thanks, Rowan

Got it! It is indeed the last two rows that need to be omitted in the original data selection; makes sense since those two are both used in specifying custom values for the error bars. So, there’s just a minor typo in instruction 1 applying to negative data. In your example it is E3:H6 rather than E3:H7 that should be selected.

A tip for other readers: since the fractionally more complex process needed when there is negative data also works perfectly for data that is all positive, I recommend building your spreadhseet to cope with negative data in the first place.

Many thanks again Charles,

Rowan

Rowan,

Thanks for catching this error. I have just corrected the error on the webpage.

I really appreciated your help in improving the quality of the website.

Charles

Hi,

I want to compare two groups over time (t0,t1,t2,t3) in a graph. Is it possible to group the boxplots of both groups together per timepoint? So, on the x-axis the timepoints and per timepoint 2 boxplots (for each group)?

Hope you can help!

Thanks a lot

Josien,

Yes. Put the data in 8 columns. Use the headings t0A, t0B, t1A, t1B, t2A, t2B, t3A, t3B (or something similar) as the column headings.

Charles

Amazing…..

I want the guidance to prepare the comparative boxplot for various 3 treatment on pain scale. we wish to show effect on pain scale before and after Rx1, before and after Rx2 and Before and After Rx3. How to plot in excel?

D,

The Real Statistics Resource Pack provides a data analysis tool that produces boxplots in Excel. See the webpage

Boxplots

If you have Excel 2016, you can use the new box chart capability.

Charles

thank you very much I enjoy of this site

Thank you very much for your nice guidance. I have tried it, and it work.

ok, got it. thank you very much

i am trying to create the Chart of standard error of the means using excel 2013 and finding difficult to follow the following steps

Highlight the sample means (range B19:E19 of Figure 6) and then select Insert > Charts|Line and choose the Line with Markers option.

Under Chart Tools select Layout > Analysis|Error Bars and then More Error Bar Options > Vertical Error Bars

Now select Both, Cap and Custom radio buttons, and click Specify Value

On the dialog box that is displayed enter the range where the standard error is stored (B21:E21 in Figure 6) in both of the boxes that are displayed, and then click OK and then Close.

please help.