Learn how bags of potato chips show that the median value of a sample is often a better measure of its average than the more common mean.
In the last article, we learned that there are several ways to calculate the average value of a sample of data. We focused on one such method, the arithmetic mean, and used it to calculate the average number of potato chips in a bag. Today, we’re going to continue where we left off and talk about a second quantity used to determine average values: the median.
Review of the Arithmetic Mean
Let’s start with a quick recap of the last article on how to calculate mean values. We imagined opening 11 fun-pack sized bags of potato chips and counting and recording the total number of chips in each bag. Our imaginary bags of chips contained 18, 15, 19, 18, 23, 17, 18, 16, 19, 34, and 17 chips. The first big statistical task we undertook was to calculate the mean number of chips in a bag. We calculated the arithmetic mean by adding up the total number of chips contained in all bags, and then dividing this number by the total number of bags. In our example, that’s 214 / 11, giving a mean value of about 19.45. Finally, just before finishing up, we noted that the arithmetic mean is not the only way to calculate an average value and that some of these other ways are more useful for analyzing certain types of problems, and I claimed that our potato chip problem is itself just such a case. But why did I make this claim?
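If you’d like to check that arithmetic on a computer, here is a minimal Python sketch of the same calculation. The variable names (`chips`, `total`, `mean_chips`) are just illustrative choices, not anything from the last article.

```python
# Number of chips counted in each of the 11 imaginary fun-pack bags.
chips = [18, 15, 19, 18, 23, 17, 18, 16, 19, 34, 17]

total = sum(chips)        # add up the chips in all bags
bags = len(chips)         # count the bags
mean_chips = total / bags # arithmetic mean = total / number of bags

print(total, bags, round(mean_chips, 2))  # 214 11 19.45
```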
The Problem with Arithmetic Means as Average Numbers
To find out why, let’s go back and take another look at our sample of data. First, let’s write out the number of chips in each bag in order from smallest to largest: 15, 16, 17, 17, 18, 18, 18, 19, 19, 23, 34. The mean value we calculated earlier, 19.45, logically falls between the minimum of 15 and the maximum of 34, but it’s not really in the middle of the sample as you might expect an average value to be. In fact, only 2 of the 11 numbers (23 and 34) are larger than the mean. Why is that? Well, it’s because the number 34 is much larger than any of the other numbers, and it pulls the mean toward a higher value. Perhaps that particular bag was crushed, breaking the chips into a bunch of small pieces.
But the situation could be even worse. What if that bag was really crushed and the chips were broken into really small pieces, so that instead of 34 small chips the bag contained 100 tiny chips? In that case the total would rise from 214 to 280, and the mean number of chips would jump from 19.45 to about 25.45. Clearly, 25.45 is not a very good representation of the typical number of chips in a fun-pack bag, since this supposedly average value is higher than the number of chips in all but one of the 11 bags. The problem is that the single anomalously high value of 100 chips is throwing off our calculation of the mean. So, we need another way to measure the average value, one that is resistant to this type of outlying value.
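To see how much that one bag matters, here is a small Python sketch that compares the two samples. It assumes the only change is swapping the 34-chip bag for a 100-chip bag, and the names `chips_crushed` and `bags_above` are made up for this illustration.

```python
# Original sample, and the same sample with the 34-chip bag replaced by 100.
chips = [18, 15, 19, 18, 23, 17, 18, 16, 19, 34, 17]
chips_crushed = [100 if c == 34 else c for c in chips]

def bags_above(sample, value):
    """Return the sorted counts in the sample that are larger than value."""
    return sorted(c for c in sample if c > value)

mean_original = sum(chips) / len(chips)
mean_crushed = sum(chips_crushed) / len(chips_crushed)

print(round(mean_original, 2), bags_above(chips, mean_original))        # 19.45 [23, 34]
print(round(mean_crushed, 2), bags_above(chips_crushed, mean_crushed))  # 25.45 [100]
```

In both cases only a bag or two sits above the mean, which is a hint that the mean is being pulled upward by the largest value.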
What is the Median Value?
And that’s exactly what the median value is: an outlier-resistant measure of the average value of a sample of data. In other words, it’s a value that’s similar in interpretation to the arithmetic mean, but that doesn’t get thrown off by a single crazy-big or small data point. So how is the median value actually calculated? It’s remarkably easy. The first step, which we’ve already done, is to write the data in our sample in order from smallest to largest. If we include the extremely crushed fun-pack bag of chips in our sample, this list is: 15, 16, 17, 17, 18, 18, 18, 19, 19, 23, 100. Now, the median value is simply defined to be the number in the middle. In this case, since there are 11 values, the median is the number in the middle with 5 values on either side of it. In other words, it’s 18.
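The sort-then-pick-the-middle recipe can be sketched in a few lines of Python. The helper name `middle_value` is hypothetical. The branch for an even number of values (averaging the two middle numbers) is a common convention that this article does not cover, so treat it as an added assumption. Python’s built-in `statistics.median` is included as a cross-check.

```python
from statistics import median

def middle_value(values):
    """Median by hand: sort the data, then take the middle value."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count (assumption, not discussed above): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# The sample including the extremely crushed 100-chip bag.
chips_crushed = [18, 15, 19, 18, 23, 17, 18, 16, 19, 100, 17]

print(sorted(chips_crushed))        # [15, 16, 17, 17, 18, 18, 18, 19, 19, 23, 100]
print(middle_value(chips_crushed))  # 18
print(median(chips_crushed))        # 18
```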
Why the Median Value Matters in Real Life
Notice that the size of the large number 100 doesn’t impact the median value at all. In fact, that number could have been 1000, 10000, or even larger and the median value would have been exactly the same. That ability to resist outliers is exactly why the median value is such a useful and important statistic for describing many measurable quantities in the real world. Average housing prices are a good example. Why? Well, most cities tend to have lots of mid-priced houses, and a few astronomically expensive properties. Describing the average housing price in a city using the median instead of the mean ensures that these few extremely expensive (and certainly atypical) properties don’t skew the overall average price to a higher value.
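Here is a quick Python sketch of that outlier resistance using the potato chip sample. It keeps the ten ordinary bags fixed and tries several sizes for the crushed bag. The names `others` and `results` are illustrative only.

```python
from statistics import mean, median

# The ten ordinary bags, without the crushed one.
others = [15, 16, 17, 17, 18, 18, 18, 19, 19, 23]

results = []
for crushed in (34, 100, 1000, 10000):
    sample = others + [crushed]
    results.append((crushed, round(mean(sample), 2), median(sample)))

for crushed, sample_mean, sample_median in results:
    print(f"crushed bag = {crushed:>5}: mean = {sample_mean:>6}, median = {sample_median}")
```

The mean climbs from about 19.45 to about 925.45 as the crushed bag grows, while the median stays at 18 every time.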
But that’s not all median values have to offer. Next time, we’ll take a look at a very cool trick that uses median averaging to make people disappear from your photographs! Who knows—it might come in handy after your next vacation, so be sure to check it out. And be sure to watch this week’s Math Dude “Video Extra!” episode on YouTube too—it’ll feature a few more tips and tricks to help you calculate median values in various situations.
Until next time, this is Jason Marshall with The Math Dude’s Quick and Dirty Tips to Make Math Easier. Thanks for reading, math fans!