Everyday Einstein tackles a topic everyone is thinking about this fall: Can you really trust those election polls? How do you design a reliable poll? How many people do you need to question to predict the results of a national election?
Around here at Everyday Einstein, when we talk about science experiments, we are usually discussing biology, chemistry, or physics experiments. But today I want to take a look at a different kind of science experiment, one that is receiving a lot of attention lately in light of the upcoming presidential election in the United States: the social science and statistical experiment of political polls.
How do you design a reliable poll? How many people do you need to question to predict the results of a national election? Can you trust poll results?
How Do Polls Work?
Large public opinion polls like those we turn to for predictions during an election year are typically conducted by specialized companies that often provide their results for a fee. Polling methods have not changed much in the past several decades: the preferred method of surveying participants is still by phone.
What has changed, however, is the response rate. According to the Pew Research Center, response rates in 1997, even after Caller ID was already quite popular in the US, were still as high as 36%. Thus if 1,000 respondents were required for a poll, a typical number for state polls, at least 2,778 people had to be called. By 2012, however, the response rate had dropped to only ~9%, so the same poll would require more than 11,000 calls. Lower response rates are due in part to the fact that calls from unknown numbers are easier to ignore, but they are also likely connected to growing concerns about privacy and thus a hesitance to share personal information.
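To make that arithmetic concrete, here is a minimal sketch in Python. The function name `calls_needed` is made up for illustration, and it assumes the response rate applies evenly to every call, which real polls cannot count on.

```python
import math

def calls_needed(target_responses: int, response_rate_percent: float) -> int:
    """Estimate how many calls are needed to collect the target number of responses.

    Assumes (for illustration only) that every call has the same chance
    of producing a completed response.
    """
    # Round up: a fraction of a call still means one more call.
    return math.ceil(target_responses * 100 / response_rate_percent)

print(calls_needed(1000, 36))  # 1997 response rate: 2778 calls
print(calls_needed(1000, 9))   # 2012 response rate: 11112 calls
```

The rounding up is why the 1997 figure comes to 2,778 rather than 2,777: 2,777 calls at 36% would fall just short of 1,000 responses.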
The number of people polled, though, is less important than the range of demographics covered by the people included in the poll. Participants are typically selected through a process called Random Digit Dialing. Pollsters pick the first six digits of the phone numbers they will call, often targeting people who live within the same area, and then randomly generate the last four digits. The randomization allows for the inclusion of unlisted numbers and attempts to sample a cross section of the population that will be representative of the whole.
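The Random Digit Dialing idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular polling firm does it: the function name `random_digit_dial` and the prefix `555012` are placeholders, and a real system would also screen out duplicates and numbers that are not in service.

```python
import random

def random_digit_dial(prefix: str, count: int, seed=None) -> list[str]:
    """Generate phone numbers that share a fixed six-digit prefix.

    The prefix stands for the area code plus the next three digits;
    the last four digits are drawn at random.
    """
    if len(prefix) != 6 or not prefix.isdigit():
        raise ValueError("prefix must be exactly six digits")
    rng = random.Random(seed)  # a seed makes the sample reproducible
    # randrange(10_000) yields 0..9999; pad to four digits (e.g. 42 -> "0042").
    return [f"{prefix}{rng.randrange(10_000):04d}" for _ in range(count)]

# Hypothetical prefix, chosen only for the example.
for number in random_digit_dial("555012", 5, seed=1):
    print(number)
```

Because every four-digit ending is equally likely, unlisted numbers have the same chance of being dialed as listed ones.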
So how well do polls typically do at covering their representation bases? According to the Harvard Business Review, the groups that are most commonly under-represented in polling are young people, Spanish speakers, Evangelicals, and African Americans—particularly African American men.
For example, more and more people, myself included, do not use a land line at all. While land lines can be called by computer, federal regulations require that cell phones be dialed by hand, which makes polls that reach cell phone users more expensive. Missing responses from cell-phone-only users would not be a problem if those users covered the same range of demographics as land line users. Unfortunately, that is not the case: people who have given up their land lines tend to skew young, in the 18-30-year-old age group.
Another group entirely missed by phone polls: the ~7 million Americans who live overseas, including soldiers, teachers, missionaries, and students.
Online polls have started to gain traction but do not yet have the benefit of decades of trial and error to hone their effectiveness. Additionally, an estimated 16% of Americans don’t use the internet (while almost everyone has a phone either in their pocket or at least in their house). Online polls also tend to overrepresent men, particularly those who are unemployed.
Can You Trust Poll Results?
So creating a reliable poll first requires sampling a truly representative subset of the group whose opinion you are trying to measure. Like any science experiment, you must consider the potential bias in your chosen methods.
A well-known case study of sample bias is the Literary Digest poll leading up to the 1936 presidential election between Alf Landon and Franklin D. Roosevelt. Previously one of the most well-trusted and accurate polls, the Digest poll failed to consider the sample bias inherent in drawing its respondents largely from telephone directories and automobile registration lists during the Great Depression. At the time, only the upper middle and upper classes could afford telephones, and those classes tended to skew Republican. Thus, while the Digest's statistical conclusions accurately represented the people who were polled, those people did not accurately represent the voting public. The Digest predicted Landon would receive 57% of the vote with Roosevelt earning only 43% when in fact, Roosevelt won with 62% of the vote.
When a certain bias cannot be avoided (for example, a lack of Spanish speakers responding to a poll conducted in English), the members of the underrepresented group who do respond can be given more weight in the final tally.
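Here is a rough sketch of how that kind of weighting could work, written in Python. All of the numbers are invented for illustration: it assumes Spanish speakers make up 20% of the population but only 10% of the respondents. The function name `weighted_support` is also made up, and real pollsters typically weight on several demographic traits at once.

```python
from collections import Counter

def weighted_support(responses, population_share):
    """Estimate support after reweighting each group to its population share.

    responses: list of (group, supports) pairs, where supports is True/False.
    population_share: dict mapping group -> fraction of the whole population.
    """
    n = len(responses)
    sample_count = Counter(group for group, _ in responses)
    # Weight = (share of population) / (share of sample).
    # An underrepresented group gets a weight above 1.
    weight = {g: population_share[g] / (sample_count[g] / n) for g in sample_count}
    total = sum(weight[g] for g, _ in responses)
    in_favor = sum(weight[g] for g, supports in responses if supports)
    return in_favor / total

# Hypothetical poll of 100 people: 90 English speakers (45 in favor)
# and 10 Spanish speakers (7 in favor).
responses = (
    [("english", True)] * 45 + [("english", False)] * 45
    + [("spanish", True)] * 7 + [("spanish", False)] * 3
)
population_share = {"english": 0.8, "spanish": 0.2}  # assumed, not real data

unweighted = sum(supports for _, supports in responses) / len(responses)
print(f"unweighted: {unweighted:.2f}")  # 0.52
print(f"weighted:   {weighted_support(responses, population_share):.2f}")  # 0.54
```

In this made-up example each Spanish-speaking respondent counts twice as much as in the raw tally, which moves the estimate from 52% to 54%. The catch is that those ten people are now standing in for a fifth of the population, so any quirk in their answers is magnified too.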