Creating Simple Box Plots in Excel

In Box Plots we saw how to create box plots using the Box Plots option of the Real Statistics Descriptive Statistics and Normality data analysis tool. We now show how to create these box plots manually using Excel’s charting capabilities.

Example 1: Create the box plot for Example 1 of Box Plots using Excel’s charting capabilities.

Box plot data

Figure 1 – Box Plot data

Select the range containing the data, including the headings (A3:C13). Now create the table in the range E3:H8. The output in column F corresponds to the raw data from column A. Column G corresponds to column B and column H corresponds to column C. In fact once you construct the formulas for the range F4:F8, you can fill in the rest of the table by highlighting the range F4:H8 and pressing Ctrl-R.

The formulas for the cells in the range F4:F8 are as follows:

Box plot formulas

Figure 2 – Box Plot table formulas

Once you have constructed the table, you can create the corresponding box plot as follows:

  1. Select the data range E3:H7. Notice that the headings are included on the range, but not the last row.
  2. Select Insert > Charts|Column > Stacked Column
  3. Select Design > Data|Switch Row/Column if necessary so that the x-axis represents the brands
  4. Select the lowest data series in the chart (i.e. Min) and set fill to No fill (and if necessary set border color to No Line) to remove the lowest boxes. This is done by right clicking on any of the three Min data series boxes in the chart and selecting Format Data Series… On the resulting dialog box, choose Fill|No fill.
  5. Repeat the previous steps for the lowest visible data series (i.e. Q1-Min); i.e. right click on the Q1-Min data series and select Format Data Series… > Fill|No fill. Alternatively right click on the Q1-Min data series and press Ctl-Y.
  6. With the Q1-Min data series still selected, choose Layout > Analysis|Error Bars > More Error Bar Options. On the resulting dialog box (Vertical Error Bars menu), click on the Minus and Percentage radio buttons and insert a percentage error of 100%.
  7. Click on the Q3-Med data series (the uppermost one) and choose Layout > Analysis|Error Bars > More Error Bar Options. On the resulting dialog box (Vertical Error Bars menu), click on the Plus and Custom radio buttons and then click on the Specify Value button. Now specify the range F8:H8, i.e. the last row of the table you created above, in the dialog box that appears (in the Positive Error Values field).
  8. Remove the legend by selecting Layout > Labels|Legend > None.

The resulting box plot is

Box plot Excel

Figure 3 – Box Plot

Observation: The approach described above works perfectly for non-negative data. When a data set has a negative value, the  axis will be shifted upward by –MIN(R) where R is the data range containing the data. Thus if R ranges from -10 to 20, the range in the chart will range from 0 to 30. This is true of the manual approach described above as well as the box plot produced by the supplemental data analysis tool in the Real Statistics Resource Pack. We now show how to manually produce the box plot when one or more data elements are negative.

Example 2: Create the box plot for the data in Figure 4 using Excel’s charting capabilities.

Figure 4 contains the same data as in Figure 1 except that we have changed the value in cell B11 to make it negative.

Box plot negative data

Figure 4 – Box plot for negative data

From the data we construct the modified table in range E3:H8 as above and then carry out the following steps to create the chart:

  1. Select the data range E3:H6. Notice that the headings are included on the range, but not the last two rows.
  2. Select Insert > Charts|Column > Stacked Column
  3. Select Design > Data|Switch Row/Column if necessary so that axis represents the brands
  4. Select the lowest data series in the chart (i.e. Q1) and set fill to No fill (and if necessary set border color to No Line) to remove the lowest boxes. This is done by right clicking on any of the three Q1 data series boxes in the chart and selecting Format Data Series… On the resulting dialog box, choose Fill|No fill.
  5. With the Q1 data series still selected, choose Layout > Analysis|Error Bars > More Error Bar Options. On the resulting dialog box (Vertical Error Bars menu), click on the Minus and Custom radio buttons and then click on the Specify Value button. Now specify the range F7:H7, i.e. the second to last row of the table you created above, in the dialog box that appears (in the Minus Error Values field).
  6. Click on the Q3-Med data series (the uppermost one) and choose Layout > Analysis|Error Bars > More Error Bar Options. On the resulting dialog box (Vertical Error Bars menu), click on the Plus and Custom radio buttons and then click on the Specify Value button. Now specify the range F8:H8, i.e. the last row of the table you created above, in the dialog box that appears (in the Positive Error Values field).
  7. Remove the legend by selecting Layout > Labels|Legend > None.

The resulting chart is as follows:

Box Plot: alternative approach

Figure 5 – Box Plot: alternative approach

Observation: Unfortunately, the data analysis tool in the Real Statistics Resource Pack cannot produce this box plot automatically. When negative data is present there are two choices:

  • Produce the box plot manually as described in Example 2
  • Use the Box Plot option of the Real Statistics Descriptive Statistics and Normality data analysis tool (as described in Box Plots) and either accept that the y-axis will be displaced upwards or simply remove the labels on the y-axis by right clicking on the labels in the y-axis and selecting Delete (or by pressing the Backspace key)

24 Responses to Creating Simple Box Plots in Excel

  1. Michael says:

    Is there any way to add a box and whisker chart to an Excel spreadsheet using Visual Studio? I have not been able to find any help on the topic using C#.

  2. sofia says:

    I made my boxplot using the directions above for negative data but when I put my error bars in one of the ‘minus’ error bars doesn’t connect to the boxes. Would you know how to fix this?

  3. Bob says:

    As a workaround to the problem with negative values in the source data, would it be possible in your tool to create an automatically scaled artificial Y axis (covering the positive and negative range) using extremes from the calculated data, then put the boxplots and means on the secondary axis with a suitably scaled range but with the values hidden?

    • Charles says:

      Bob,
      In the past, I tried to find a way to do this but was not successful.
      I see that you have sent me an email with suggestions as to how to do this. I haven’t read your email yet, but will do so shortly since this approach sounds very interesting. Thank you for sending this to me.
      Charles

  4. Kirk Vodopals says:

    I think you need to change it to Min-Q1 (not Q1 – min) when you have lots of negative data (e.g. negative median, negative minimum and negative Q1)

    • Charles says:

      Kirk,
      I just tried that, but it doesn’t seem to work. I like the idea, but I have been able to make it work.
      Can you send me an example where you have made the modification(s) that you are suggesting? You can find my email address at Contact Us.
      Charles

  5. Catherine says:

    Hi Charles,

    Thank you for your tutorial. I have just tried creating a boxplot for a dataset containing both negative and positive values. Will it only work when there is only a single negative value in the dataset? Or will it not work when Q1 is negative?
    My dataset is as follows:
    Group 1: -0.22, 0.81, 1.02, 0.89, -0.44
    Group 2: 1.09, 1.07, 0.81, 1.58, 0.55, -0.31

    For Group 1:
    Q1 -0.22
    Med-Q1 1.03
    Q3-Med 0.08
    Q1-Min 0.22
    Max-Q3 0.13

    For Group 2:
    Q1 0.62
    Med-Q1 0.32
    Q3-Med 0.15
    Q1-Min 0.93
    Max-Q3 0.50

    My boxplot for Group 1 is off; the median (where the Med-Q1 and Q3-Med boxes meet) is higher than it should be, and the bottom error bar does not reach the Med-Q1 box (starts from the bottom of the Q1 box rather than the top like the other error bars).

    Thoughts?
    Thanks

    • Charles says:

      Catherine,
      Are you using the Real Statistics data analysis tool to create the box plot or are you doing it manually?
      I just used the Real Statistics data analysis tool for Group 1 and found that it works correctly even with two negative values. The chart is simply raised by .44 (i.e. the negative of the lowest negative value).
      If you created the chart manually, then you need to fix how you created the table. The first value is not Q1, but Q1-Min, and this has value +.22.
      Charles

      • Vicky says:

        Hi Charles,

        I am trying to create a boxplot with data that contains negative values. I am confused about the order the table before plotting. In the tutorial you have Q1, Med-Q1, Q3-Med, Q1-Min, and Max-Q3. In your response to Catherine you state that if the table is done manually it will be Q1-Min for the first value. I want to double check the order of the table to check if I am plotting correctly. I am plotting this manually.

        Thank you ,
        Vicky

  6. Rowan says:

    Dear Charles
    Thank you; I have been struggling to produce boxplots when there is negative data present, so your 2nd exercise here is very promising. I have not yet succeeded completely so I will try a few more permutations along the lines you provide.
    In step 1, you write: “1.Select the data range E3:H7. Notice that the headings are included on the range, but not the last two rows”. However E3:H7 omits only the last row, not the last two rows. Which is correct?
    Many thanks, Rowan

    • Rowan says:

      Got it! It is indeed the last two rows that need to be omitted in the original data selection; makes sense since those two are both used in specifying custom values for the error bars. So, there’s just a minor typo in instruction 1 applying to negative data. In your example it is E3:H6 rather than E3:H7 that should be selected.
      A tip for other readers: since the fractionally more complex process needed when there is negative data also works perfectly for data that is all positive, I recommend building your spreadhseet to cope with negative data in the first place.
      Many thanks again Charles,
      Rowan

      • Charles says:

        Rowan,
        Thanks for catching this error. I have just corrected the error on the webpage.
        I really appreciated your help in improving the quality of the website.
        Charles

  7. Josien says:

    Hi,
    I want to compare two groups over time (t0,t1,t2,t3) in a graph. Is it possible to group the boxplots of both groups together per timepoint? So, on the x-axis the timepoints and per timepoint 2 boxplots (for each group)?
    Hope you can help!
    Thanks a lot

    • Charles says:

      Josien,
      Yes. Put the data in 8 columns. Use the headings t0A, t0B, t1A, t1B, t2A, t2B, t3A, t3B (or something similar) as the column headings.
      Charles

  8. D says:

    Amazing…..
    I want the guidance to prepare the comparative boxplot for various 3 treatment on pain scale. we wish to show effect on pain scale before and after Rx1, before and after Rx2 and Before and After Rx3. How to plot in excel?

    • Charles says:

      D,

      The Real Statistics Resource Pack provides a data analysis tool that produces boxplots in Excel. See the webpage
      Boxplots

      If you have Excel 2016, you can use the new box chart capability.

      Charles

  9. hojat alaii says:

    thank you very much I enjoy of this site

  10. Silmi says:

    Thank you very much for your nice guidance. I have tried it, and it work.

  11. terry says:

    ok, got it. thank you very much

  12. terry says:

    i am trying to create the Chart of standard error of the means using excel 2013 and finding difficult to follow the following steps

    Highlight the sample means (range B19:E19 of Figure 6) and then select Insert > Charts|Line and choose the Line with Markers option.
    Under Chart Tools select Layout > Analysis|Error Bars and then More Error Bar Options > Vertical Error Bars
    Now select Both, Cap and Custom radio buttons, and click Specify Value
    On the dialog box that is displayed enter the range where the standard error is stored (B21:E21 in Figure 6) in both of the boxes that are displayed, and then click OK and then Close.

    please help.

Leave a Reply

Your email address will not be published. Required fields are marked *