How to Make Box and Whisker Plots

Learn what box and whisker plots are, how to make them, and how they can help you interpret data.

By
Jason Marshall, PhD
July 15, 2010
Episode #027

We’ve now talked about several different useful ways for looking at and understanding a group of numbers. The mean, median, and mode are great for determining the average value, and last time we learned that the range and standard deviation are useful for determining how much values are spread out. In this article we’re going to put this all together and see how these quantities can be used to make “box and whisker plots” that will help you visualize how numbers in a group are distributed.

The podcast edition of this article was sponsored by GoToMeeting. With this meeting service, you can hold your meetings over the Internet and give presentations, product demos, and training sessions right from your PC. For a free 45-day trial, visit GoToMeeting.com/podcast.

Review of the Range and Standard Deviation

One important thing to keep in mind about averages is that they are always associated with groups of numbers. This may seem obvious, but it’s an important point for the simple reason that not all groups are alike, even if they have the same average properties. For example, the average age of students in the sophomore class of a high school might be the same as the overall average age of all students in the high school, but the group of just sophomore students is very different than the group of all students—just ask a senior if they’re similar to a sophomore!

Clearly it’s important to distinguish whether or not two groups with similar average properties are, in fact, similar, and that’s where the concepts of range and standard deviation that we talked about last time shine. The range is simply the overall spread in a group of numbers. So, since freshmen typically enter high school when they are 14, and seniors graduate when they are 18 (hopefully!), the overall range of the ages of high schoolers is 18 – 14 = 4. In contrast, sophomores are generally 15 or 16 years old, so the range of the ages of sophomores is 16 – 15 = 1. Clearly, these two groups are different even if their means are nearly the same.
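If you like to check this sort of arithmetic with a computer, here is a minimal Python sketch of the range calculation. The two age lists and the function name `value_range` are made up for illustration; they are simply chosen to match the ages mentioned above.

```python
# Hypothetical age lists, chosen only to match the ages described in the text.
all_students = [14, 15, 16, 17, 18]
sophomores = [15, 16]


def value_range(values):
    """Return the overall spread: the largest value minus the smallest."""
    return max(values) - min(values)


print(value_range(all_students))  # 4
print(value_range(sophomores))    # 1
```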

In the last article, we also talked about an approximation to the standard deviation (which I called the quasi-standard deviation) that’s a bit easier to understand and calculate than the real thing. If you want to see an explanation of how to calculate the true standard deviation, check out last week’s Math Dude “Video Extra!” episode. But what’s important about the standard deviation? Well, the take-away message is that whereas the range tells us about the overall spread of numbers in a group, the standard deviation tells us how far the numbers are spread away from the group’s mean. That’s all we’ll say for now—we’ll talk more about the standard deviation next week.

What Is a Box and Whisker Plot?

And that brings us to the most important question of the day: How many whiskers does a cat have? That’s right—since we’re talking about box and whisker plots, I thought it appropriate to have a feline-focused question. And, after a bit of research, I’ve discovered that cats typically have a total of 24 whiskers—12 on each side of their nose, all arranged in four horizontal rows. But, that’s just a typical number. Some cats have fewer (occasionally they fall out and are replaced), and, for the sake of our argument here, let’s assume that some cats have more. Now, imagine going out into the world and staring into and below the eyes of 9 cats in order to count their whiskers. What does your statistical sample of whisker numbers look like? Well, let’s imagine you find cats with 20, 21, 22, 23, 24, 25, 26, 27, and 28 whiskers. So, the mean and median are both 24, and the range is 8. But wouldn’t it be nice if we could come up with a way to visualize what this distribution of cat whiskers looks like?
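As a quick check on those numbers, here is a small Python sketch that computes the mean, median, and range of the imagined whisker sample using the standard library's `statistics` module. The variable names are my own.

```python
from statistics import mean, median

# The imagined sample of whisker counts from nine cats.
whiskers = [20, 21, 22, 23, 24, 25, 26, 27, 28]

sample_mean = mean(whiskers)
sample_median = median(whiskers)
sample_range = max(whiskers) - min(whiskers)

print(sample_mean, sample_median, sample_range)  # 24 24 8
```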

How to Make Box and Whisker Plots

That’s precisely what a box and whisker plot does. In one glance, the box and whisker plot tells you how the numbers in a distribution are arranged, and where the majority of them are located. And the best part is that it’s pretty easy to make. Here’s how to do it: First, make sure your list of whisker numbers is arranged in order from smallest to largest:

20, 21, 22, 23, 24, 25, 26, 27, 28

Next, find the median value—in our case, it’s 24 whiskers. Now, you need to find what are called the lower and upper medians. The lower median is just the median value of all the numbers below the overall median. So, since the median is 24, the lower median is just the median of 20, 21, 22, 23. When there is an even number of elements in a list, the median is the mean of the two numbers in the middle. In other words, it’s the number between 21 and 22, or 21.5. Similarly, the upper median is the median of the numbers above the overall median: that’s 25, 26, 27, 28. So the upper median is 26.5.
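The lower and upper medians can be sketched in Python as follows. The function name `lower_and_upper_medians` is hypothetical, and the sketch assumes the simple positional split used here: sort the values, leave out the middle value when the count is odd, and take the median of each half. Statistics packages often call these values the first and third quartiles and may use slightly different conventions, so their results can differ a little from this one.

```python
from statistics import median


def lower_and_upper_medians(values):
    """Return (lower median, upper median) of a list of numbers.

    The sorted values are split into a lower half and an upper half.
    With an odd count, the overall median is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)


whiskers = [20, 21, 22, 23, 24, 25, 26, 27, 28]
print(lower_and_upper_medians(whiskers))  # (21.5, 26.5)
```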

And that’s everything we need to calculate in order to make the plot. So here’s how to proceed: Draw a normal number line that includes all the numbers from our sample; that is, it needs to include every whisker count from 20 to 28. Just to give enough room to work, let’s make the number line extend from 18 to 30.

Now, draw a vertical line above the number line at the position of the overall median value, 24. Then draw two similar vertical lines above the number line at the positions of the lower and upper medians, 21.5 and 26.5. Next, create a box by connecting together the two tops and two bottoms of the vertical lines at the upper and lower medians (this is your “box” of the box and whisker plot). Finally, draw your whiskers as horizontal lines that extend from the center of the left and right sides of the box out to the minimum and maximum values, 20 and 28, respectively.
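If you don't have pencil and paper handy, the same drawing steps can be roughed out as a text picture. This is only a sketch: the function name `ascii_box_plot`, the characters used for the box and whiskers, and the choice of two text columns per whisker are all my own assumptions, not a standard plotting method. It flattens the box onto a single line above a number line running from 18 to 30.

```python
def ascii_box_plot(minimum, lower, mid, upper, maximum,
                   axis_start=18, axis_end=30, cells_per_unit=2):
    """Return (plot_row, axis_row) for a one-line text box plot."""

    def col(value):
        # Convert a data value into a text column on the number line.
        return round((value - axis_start) * cells_per_unit)

    width = col(axis_end) + 1
    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                      # whiskers span min to max
    for c in range(col(lower), col(upper) + 1):
        row[c] = "="                      # box spans lower to upper median
    row[col(lower)] = "["                 # left side of the box
    row[col(upper)] = "]"                 # right side of the box
    row[col(mid)] = "|"                   # overall median

    # Number line: a label every 2 units; each label starts at its value's column.
    axis = [" "] * (width + 2)
    for value in range(axis_start, axis_end + 1, 2):
        label = str(value)
        start = col(value)
        axis[start:start + len(label)] = label

    return "".join(row).rstrip(), "".join(axis).rstrip()


plot_row, axis_row = ascii_box_plot(20, 21.5, 24, 26.5, 28)
print(plot_row)  #     ---[====|====]---
print(axis_row)  # 18  20  22  24  26  28  30
```

The median bar lines up with the first digit of 24, and the whiskers run from the first digit of 20 to the first digit of 28.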

And that’s it! You’ve created a box and whisker plot that clearly shows the median (the line at the center of the box), the full range (the extent of the whiskers), and a measure of the typical width of the distribution (the box contains half of all the data points). And all that information is in just one plot! Although this was a simple and somewhat contrived example, I hope it’s clear that the box and whisker plot is a great tool to help you quickly and easily visualize and comprehend large sets of data.
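To see what "half of all the data points" means for our small sample, here is a short Python check that counts how many whisker counts fall inside the box. The variable names are my own, and the lower and upper medians are typed in directly from the example. With only nine cats the answer is a bit more than half; the share tends to get closer to one half as samples grow.

```python
whiskers = [20, 21, 22, 23, 24, 25, 26, 27, 28]
lower_median, upper_median = 21.5, 26.5

# Keep only the whisker counts that land inside the box.
inside_box = [w for w in whiskers if lower_median <= w <= upper_median]
fraction = len(inside_box) / len(whiskers)

print(inside_box)          # [22, 23, 24, 25, 26]
print(round(fraction, 2))  # 0.56
```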

Wrap Up

Last time I promised we’d take a look at some examples of how statistical quantities like the standard deviation are used in the real world, in particular in science and politics. Well, we didn’t end up having time for that this week, so next week’s article will be full of tips to help you understand the statistics you see in media reports about science and politics.

Thanks again to our sponsor this week, GoToMeeting. Visit GoToMeeting.com/podcast and sign up for a free 45-day trial of their online conferencing service.

Please email your math questions and comments to mathdude@quickanddirtytips.com. You can get updates about the Math Dude podcast, the “Video Extra!” episodes on YouTube, and all my other musings about math, science, and life in general by following me on Twitter. And don’t forget to join our great community of social networking math fans by becoming a fan of the Math Dude on Facebook.

Until next time, this is Jason Marshall with The Math Dude’s Quick and Dirty Tips to Make Math Easier. Thanks for reading, math fans!
