Excel Box and Whisker Diagrams (Box Plots)
Tuesday, June 7, 2011 by Jon Peltier
Peltier Technical Services, Inc. Copyright © 2015 .
Box and Whisker Charts (Box Plots) are commonly used in the display of statistical analyses. Microsoft Excel does not have a built in Box and Whisker chart type, but you can create your own custom Box and Whisker charts, using stacked bar or column charts and error bars. This tutorial shows how to make box plots, in vertical or horizontal orientations, in all modern versions of Excel.
In its simplest form, the box and whisker diagram has a box showing the range from first to third quartiles, and the median divides this large box, the “interquartile range”, into two boxes, for the second and third quartiles. The whiskers span the first quartile, from the second quartile box down to the minimum, and the fourth quartile, from the third quartile box up to the maximum.
Sample Data and Calculations
To play along at home in Excel 2007 or 2010, download the workbook Excel_2007_Box_Plot_Workbook.xlsx .
Let’s use the following simple data set for our tutorial. The values were taken from a normally distributed population with a mean of 10 and standard deviation of 5. There are four sets of 20 values.
All of these values are positive. If your data set has mixed positive and negative values, this technique requires major modifications.
First, insert a bunch of blank rows, and set up a range for calculations. Only the horizontal version of the box plot uses the last calculated row, “Offset”. It will not hurt to include it in the vertical box plot’s calculations.
First, compute some simple statistics, such as the count, mean, and standard deviation. The formulas used in column B are shown in column G of the screen shot.
Now let’s compute the minimum and maximum, median, and first and third quartiles.
Finally, let’s determine which values we need to plot. Our chart has a box for the second quartile, which shows the difference between median and first quartile calculated above. It has a box for third quartile, which show the difference between the third quartile calculation and the median. The bottom of the lower box rests on the first calculated quartile. The down whisker is as long as the first quartile minus the minimum, and the up whisker is as long as the maximum minus the third quartile.
The offset values are calculated as follows: In my example, I have four categories, Alpha through Delta. I can divide my horizontal chart into four horizontal strips, numbered from 0 to 4, each containing one box-and-whisker unit. I need to position my average points in the middle of each 1-unit horizontal strip. These will ultimately go onto a secondary vertical axis which I will have conveniently scaled from 0 to 4. Hence the Y values I will need are 0.5, 1.5, 2.5, and 3.5.
Select the header row of the calculated data, then hold Ctrl while selecting the three rows that include Bottom, 2Q Box, and 3Q Box. This multiple-area range is highlighted in orange below.
With this range selected, insert a stacked column chart or a stacked bar chart. Be sure to use the stacked version, and not the 100% stacked version, of the column or bar chart.
The labels in the bar chart go bottom-to-top. To reverse the labels, select the vertical axis, press Ctrl-1 (numeral one) to open the Format Axis dialog, then check the “Categories in Reverse Order” box, then under “Horizontal Axis Crosses”, select “At maximum category”.
To add the down whisker, select the Bottom series, then in the Chart Tools > Layout tab, click Error Bars, and select More Error Bar Options from the bottom of the menu. Choose the Minus direction, select Custom for Error Amount, and click on Specify Value. Leave the contents of the Positive Error Value box alone (“=<1>”) in the mini dialog that appears, then clear the Negative Error Value box and select the Whisker- row from the table (B14:E14). Click OK and Close to get back to Excel. These “down” error bars (whiskers) extend from the bottom (left) edge of the 2Q Box downward (leftward) into the Bottom series.
To add the up whisker, select the 3Q Box series,then in
the Chart Tools > Layout tab, click Error Bars, and select More Error Bar Options from the bottom of the menu. Choose the Plus direction, select Custom for Error Amount, and click on Specify Value. Leave the contents of the Negative Error Value box alone (“=<1>”) in the mini dialog that appears, then clear the Positive Error Value box and select the Whisker+ row from the table (B15:E15). Click OK and Close to get back to Excel.
These “up” error bars (whiskers) extend upward (rightward) from the top (right) of the 3Q Box.
Now we can format the boxes. Select the Bottom series, and apply no fill and no border, so it is hidden. Then select each of the 2Q Box and 3Q Box series, and apply a dark border and a light fill.
Adding the Mean
To add the mean as a series of markers, select the Mean row in the calculated range (highlighted in blue). If you are making a horizontal box plot, hold Ctrl and also select the Offset row (highlighted in green), so both areas are selected. Copy the selected range.
Select the chart, and use Paste Special to add the data as a new series. If you are making a horizontal box and whisker diagram, check the “Category (X Labels) in First Row” box. The “Series Names in First Column” box should already be checked.
The new series is added as another column or bar stacked on top of the existing ones.
Select this new series, then on the Chart Tools > Design tab, click on Change Chart Type. If you are making a vertical box plot, choose a Line Chart style. If you are making a horizontal box plot, choose an XY Scatter style.
The points in the horizontal box plot are in reverse order. To change the order of points, select the secondary vertical axis (right edge of the chart), press Ctrl-1 (numeral one) to open the Format Axis dialog, then check the “Values in Reverse Order” box.
If you’re making a horizontal box plot in Excel 2003, this last process is a little more involved. Excel draws both secondary axes, but the vertical one is hidden behind the primary axis with the text labels (below left). Double click on the secondary horizontal axis (top of chart), and on the scale tab of the Format Axis dialog, check “Value (Y) Axis Crosses at Maximum Value” (below right).
Excel 2003, continued: Double click the secondary vertical axis (right of chart), and on the scale tab, check “Values in Reverse Order” and uncheck “Value (X) Axis Crosses at Maximum Value” (below left). Finally, select the secondary horizontal axis (top) and click Delete; Excel will now plot the XY series on the primary horizontal axis.
All Versions: Now format the mean series: remove the line, and use an appropriate marker of a contrasting color. If you’ve made a horizontal box plot, hide the secondary Y axis (right edge of the chart) by choosing no tick marks, no tick labels, and no line in the Format Axis dialog.
That was easy and didn’t take too long.
Peltier Tech Chart Utility
This tutorial shows how to create Box and Whisker Charts in Excel, including the calculations and specialized data layout needed, and the detailed combination of chart series and chart types required. This manual process takes a little time, can be prone to error, and becomes tedious.
I have created the Peltier Tech Chart Utility for Excel to create Box Plots (and many other custom charts) automatically from raw data. This utility, a standard Excel add-in, lays out data in the required layout, then constructs a chart with the right combination of chart types. This is a commercial product, tested on hundreds of machines in a wide variety of configurations, which saves time and aggravation.
The Peltier Tech Chart Utility creates box plots in either horizontal or vertical orientation. It provides six different methods for calculating quartiles. This add-in also provides three styles for the box plots, including one that shows outliers.
Please visit the Peltier Tech Chart Utility page for more information.Source: peltiertech.com