Learn what statistics and estimators are in preparation for understanding how they helped solve the “German tank problem” in World War II.
Sometimes statistics* seems magical. Of course, there isn't really anything magical going on, but the fact that statistical methods can be used to pull useful information out of a hatful of messy data is, at a minimum, remarkable. And, as we'll talk about today and next time, it's extremely useful too.
Before we get started, I want to point out that the things called statistics that we're going to talk about today are part of, but different from, the field of statistics, which is the science of collecting, sorting, organizing, and generally making sense of data.
A Statistical Thought Experiment
Okay, with that out of the way, let's start off with a thought experiment. Imagine you're handed a bag containing a bunch of tiles that are all carved into the shapes of integers. Someone else prepared the bag, so you have no idea how many tiles are in it. You do, however, know that the first tile put in the bag was shaped like the number 1, the second like the number 2, and so on, and that the last tile put in the bag was, therefore, shaped like the total number of tiles. Your task in this experiment is to randomly pull six integer tiles out of the bag and then use these integers to estimate the total number of tiles in the bag (which, remember, you don't know beforehand). If you think about it for a minute, you'll see that this is the same as estimating the value of the largest integer in the bag. So, how do you do it? Not surprisingly, the answer has something to do with today's main topics: statistics and estimators.
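If it helps to see the setup concretely, here is a minimal Python sketch of the bag of tiles. It assumes the tiles are drawn without replacement (you pull six different tiles out and don't put them back), and the value `hidden_total = 250` is purely hypothetical: in the thought experiment, this is exactly the number you don't know. The helper names `fill_bag` and `draw_tiles` are made up for illustration.

```python
import random

def fill_bag(total_tiles):
    """Hypothetical helper: the bag holds one tile for each integer 1..total_tiles."""
    return list(range(1, total_tiles + 1))

def draw_tiles(bag, k=6, seed=None):
    """Hypothetical helper: pull k distinct tiles at random (no replacement)."""
    rng = random.Random(seed)
    return sorted(rng.sample(bag, k))

# Hypothetical value; in the thought experiment this number is unknown to you.
hidden_total = 250
bag = fill_bag(hidden_total)

# Because the tiles are numbered 1, 2, 3, ..., the largest tile equals the tile count.
assert max(bag) == len(bag)

sample = draw_tiles(bag, k=6, seed=1)
print(sample)  # six distinct integers between 1 and 250
```

The last check in the code is the observation from the paragraph above: since the tiles are numbered consecutively from 1, estimating the number of tiles and estimating the largest integer are the same problem.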
What is a Statistic?
A statistic is a quantity calculated from a sample of data that tells us something about the properties of that sample. To help us better understand what this means, let's go back and think about the bag of integer-shaped tiles. In that example, the entire group of integers in the bag is called the "population," and the six integers you pulled out of the bag are called a "sample." There are, of course, many possible samples besides the one you pulled. For example, you could have pulled six entirely different integers. That still would have been a sample drawn from the population, but it would have been a different sample. So, in this case, a statistic is some number you can calculate from the six integers you pulled out of the bag that tells you something about those numbers.
Okay, so what does an actual statistic look like? Well, the minimum value of your six-integer sample is one example of a statistic, and the maximum value of that sample is another. These statistics tell you something about the six-integer sample. For example, if you subtract the minimum value from the maximum value, you learn the range of the sample. Neither of these statistics, however, is very useful on its own for inferring information about the population of tiles as a whole. For example, the range of the entire population could be very different from the range of the sample, since the bag could contain much higher or lower integers than the ones in the sample. So, is there some way to learn about an entire population from only a sample of its data? There is, and it's called an estimator.
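To make the idea of a statistic concrete, here is a small Python sketch that computes the minimum, maximum, and range of a six-tile sample. The sample values and the 250-tile population are hypothetical, chosen only for illustration, and `sample_statistics` is a made-up helper name. Comparing the sample's range with the population's range shows why these statistics describe the sample rather than the whole bag.

```python
def sample_statistics(sample):
    """Hypothetical helper: compute simple statistics of a sample."""
    lo, hi = min(sample), max(sample)
    return {"min": lo, "max": hi, "range": hi - lo}

# Hypothetical six-tile sample drawn from a hypothetical bag of 250 tiles.
sample = [19, 42, 88, 103, 150, 201]
population = list(range(1, 251))

stats = sample_statistics(sample)
pop_range = max(population) - min(population)

print(stats)      # {'min': 19, 'max': 201, 'range': 182}
print(pop_range)  # 249
```

In this example the sample range (182) falls well short of the population range (249), because the sample happened to miss both the lowest and the highest tiles. A sample's range can never exceed the population's range, but it can fall far short of it.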