Saturday, January 21, 2012

Google Knew About Gingrich SC Blowout: Forecasting and Assessing Public Opinion with Google Search Data

Right now, CBS News reports that roughly three quarters of South Carolina's precincts have reported results and that Newt Gingrich has won about 41% of the reported vote compared to 27% for Mitt Romney. Gingrich is winning 60% of the two-candidate, Romney-Gingrich vote (or about 1.5 votes for every vote won by Romney). Though several recent polls had shown Gingrich pulling ahead of Romney during the last few days, none had indicated the extent to which he had displaced Romney as the South Carolina front-runner.

The most pro-Gingrich poll reported by RealClearPolitics in the last week before the election was conducted by Public Policy Polling from January 18-20. It showed Gingrich up by 9 points over Romney (37% to 28%), winning about 57% of the two-candidate vote. (PPP's one-night returns for January 20 showed Gingrich with 40% to Romney's 26%, or 61% of the two-candidate vote.) Other polls showed Gingrich up by 2 to 6 points.
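The two-candidate figures above are simple ratios: a candidate's share of the combined Gingrich-Romney total, plus Gingrich's votes per Romney vote. A minimal Python sketch that reproduces them from the percentages quoted in this post (`two_candidate_share` is just an illustrative helper, not part of any library):

```python
def two_candidate_share(a: float, b: float) -> float:
    """Share of the combined a + b total that goes to candidate a."""
    return a / (a + b)


# Gingrich and Romney percentages as quoted in this post.
figures = {
    "CBS News, ~3/4 of precincts": (41, 27),
    "PPP, Jan 18-20": (37, 28),
    "PPP, Jan 20 only": (40, 26),
}

for label, (gingrich, romney) in figures.items():
    share = two_candidate_share(gingrich, romney)
    ratio = gingrich / romney  # Gingrich votes per Romney vote
    print(f"{label}: Gingrich two-candidate share {share:.0%}, "
          f"{ratio:.2f} votes per Romney vote")
```

Run as-is, it prints two-candidate shares of 60%, 57%, and 61%, matching the rounded figures in the text.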

Conventional polling could have captured the full extent of the Gingrich surge only if it had kept sampling right up until the last minute, but most polling organizations had stopped their rolling samples by January 18. As it turns out, though, Google Insights search index data also captured the Gingrich surge. As I have blogged about elsewhere, I strongly suspect that web search volumes are indicative of affective attachment to political objects (see here, here, here, here, and here). Yesterday, I posted a forecast for the South Carolina primary based on Google Insights search index data through January 18, predicting that Gingrich would win 55% of the Romney-Gingrich vote.

Google Insights data through yesterday, January 20, are now available. They show Gingrich continuing to extend his lead over Romney in the two additional days of data. Using the two-candidate search volume measure I described in yesterday's post, the relative Google Insights standing of Newt Gingrich and Mitt Romney over the preceding eight days (January 13-20) is illustrated below.

The final day's results show Gingrich attracting 64% of the two-candidate web-search volume. Taking that as a forecast for today's primary yields a prediction that outperforms most individual conventional polls, poll averages, or poll-based models. (It is also important to remember that I am using the crudest publicly available web-search data imaginable. Someone working with raw web-use data could generate web-search volumes from users who fit profiles of likely voters, weight data from different types of users, or generally do something much more precise and sophisticated than whatever I can pull off through the regular Google Insights interface.)
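The forecast described above treats Gingrich's share of combined search interest as a proxy for his share of the two-candidate vote and takes the most recent day as the prediction. Here is a rough Python sketch of that calculation. It assumes the daily index values for "Gingrich" and "Romney" were exported from a single Google Insights comparison query, so both series share one normalization and can be added together. The helper names are hypothetical, and the details may differ from the procedure described in yesterday's post.

```python
import math
from typing import List, Sequence


def daily_search_shares(gingrich_index: Sequence[float],
                        romney_index: Sequence[float]) -> List[float]:
    """Gingrich's share of combined search interest, one value per day.

    Assumes both series come from the same comparison query, so their
    index values are on a common scale.
    """
    if len(gingrich_index) != len(romney_index):
        raise ValueError("both series must cover the same days")
    shares = []
    for g, r in zip(gingrich_index, romney_index):
        total = g + r
        # A day with no recorded interest in either name has no defined share.
        shares.append(g / total if total > 0 else math.nan)
    return shares


def final_day_forecast(gingrich_index: Sequence[float],
                       romney_index: Sequence[float]) -> float:
    """Use the most recent day's two-candidate search share as the vote forecast."""
    return daily_search_shares(gingrich_index, romney_index)[-1]
```

Returning NaN for a day on which both index values are zero (Google Insights can report low-volume terms as zero) keeps one empty day from raising a division error in the middle of the series.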

The most important point, for me, is not that Google search volumes might be useful for forecasting, though that is a fun potential implication of these efforts. The important point is that web searches are indicative of public affect or political support (or some related concept). They might therefore be useful for making inferences about the public's support for, or attachment to, candidates or policies for which conventional survey data are simply not available. That is precisely the argument Sylvia Manzano and I advance in our forthcoming paper on Latino support for the nomination of Sonia Sotomayor to the Supreme Court.

