South Carolina Google Insights Forecast Through January 18

I have previously written about using web search volumes as a proxy for survey data on support for political candidates or proposals, either as an alternative or as an adjunct to conventional polling. My tentative position is that web searches indicate support and that relative web search volumes can indicate relative levels of support (especially cross-sectionally). If that holds, web search volumes have potential for forecasting election outcomes. My attempt to use Google Insights index scores to forecast the GOP Iowa Caucuses produced pretty mixed results compared with conventional polling (and poll-based models).

I missed forecasting the New Hampshire primary while I was traveling, but I am back in action for South Carolina. (I also plan to go back and "forecast" New Hampshire over the next few days.) My intuition (and hope) is that search data will do better at forecasting primaries than caucuses (better in open primaries than in closed primaries, and better still in general elections than in primaries).

In any event, I have collected Google Insights scores for South Carolina over the 90 days through January 18 for Romney (the combined indices of "Mitt Romney" and "Romney") and Gingrich (the combined indices of "Newt Gingrich" and "Gingrich") in order to forecast the relative vote totals of those two candidates, i.e. how well they do compared with one another (as opposed to their shares of all votes cast for all candidates).
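A minimal sketch of that bookkeeping, assuming the daily index for each of the four search terms has already been exported as a list of numbers and that all four terms come from a single comparison, so they share one scale. The numbers below are made up purely for illustration and are not the actual South Carolina data. Each candidate's two term indices are summed day by day, and each candidate's share is his combined index divided by the sum of both candidates' combined indices.

```python
def combined_index(*term_series):
    """Sum several search-term index series day by day."""
    return [sum(day) for day in zip(*term_series)]


def two_candidate_share(a, b):
    """Return each candidate's daily share of the two-candidate total.

    Assumes the two combined series are never both zero on the same day.
    """
    share_a = [x / (x + y) for x, y in zip(a, b)]
    share_b = [1 - s for s in share_a]  # the two shares always sum to one
    return share_a, share_b


# Hypothetical daily index values (NOT real Google Insights data).
mitt_romney = [40, 35, 30]
romney = [20, 18, 15]
newt_gingrich = [15, 20, 25]
gingrich = [5, 7, 10]

romney_total = combined_index(mitt_romney, romney)
gingrich_total = combined_index(newt_gingrich, gingrich)
romney_share, gingrich_share = two_candidate_share(romney_total, gingrich_total)
print(romney_share, gingrich_share)
```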

The first figure illustrates the combined index scores for each candidate since December 1.

The second shows the relative proportion of the total Google search volume dedicated to each candidate with the values for January 18 labeled.

As of January 18, two days ago, Romney's commanding lead in expressed interest in South Carolina had dwindled substantially: he led Gingrich by a margin of only 0.53 to 0.47. His proportion of the two-candidate web search volume had shrunk to just over half, down from as high as 0.75 as late as January 13.

Moreover, if the recent trend in the data through January 18 has continued over the last two days, Gingrich's web search prominence has likely overtaken Romney's. Using data for the last seven observed days (January 12-18), a linear model predicts a decline of 0.04 (four percentage points) per day in Romney's proportion of the two-candidate web search volume and, since the two proportions sum to one, an equal daily increase in Gingrich's share. Extrapolating that fitted trend forward through today gives Gingrich a two-percentage-point lead in the head-to-head web search race, and extending it through tomorrow would give him a final advantage of roughly 10 percentage points over Romney.
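A rough sketch of a seven-day linear trend and its extrapolation, assuming an ordinary least-squares fit of share against day number. The daily shares below are hypothetical placeholders, not the observed January 12-18 values. Note that the forecast comes from the fitted line, so it need not start from the last observed value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Day 0 = January 12, ..., day 6 = January 18.
days = list(range(7))
# Hypothetical Gingrich two-candidate shares (NOT the real data).
gingrich_share = [0.25, 0.30, 0.33, 0.38, 0.41, 0.45, 0.47]

intercept, slope = fit_line(days, gingrich_share)
# Extrapolate three days past the last observation (day 9 = January 21).
forecast = intercept + slope * 9
print(f"slope per day: {slope:.3f}, forecast share: {forecast:.3f}")
```

Because the two shares sum to one, fitting Romney's series instead would give the same slope with the opposite sign, and his forecast share would be one minus Gingrich's.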

The Google Insights forecast, then, which I take to be the three-day extrapolation of the seven-day model, predicts that Gingrich will win 55.3% of the two-candidate vote in South Carolina, with Romney claiming the remaining 44.7%. If Paul and Santorum take a combined 30% of the vote in South Carolina, this implies total vote shares of 38.7% for Gingrich and 31.3% for Romney. If Paul and Santorum take 20%, Gingrich's and Romney's predicted vote shares increase to 44.2% and 35.8%.
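The conversion from a two-candidate split to shares of all votes cast is simple scaling: whatever Paul and Santorum do not take is divided between Gingrich and Romney in the forecast proportions. A small sketch, where the function name is hypothetical and the 30% and 20% scenarios are the ones considered above:

```python
def total_vote_shares(two_candidate_share_a, other_candidates_share):
    """Scale a two-candidate split up to shares of all votes cast."""
    remaining = 1.0 - other_candidates_share
    return (two_candidate_share_a * remaining,
            (1.0 - two_candidate_share_a) * remaining)


# Scenarios for the combined Paul + Santorum share of the vote.
for others in (0.30, 0.20):
    gingrich, romney = total_vote_shares(0.553, others)
    print(f"Paul + Santorum = {others:.0%}: "
          f"Gingrich {gingrich:.1%}, Romney {romney:.1%}")
```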

Update 4:05PM: For whatever it's worth, 538's final South Carolina forecast (based on a statistical model of polling data) is:

Gingrich: 35.6%
Romney: 32.5%
Paul: 15.8%
Santorum: 12.8%

For those of you keeping score, that forecast gives Gingrich 52.3% of the two-candidate Gingrich-Romney vote, three percentage points lower than my Google Insights-based forecast of 55.3%.
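The two-candidate figure comes from dividing Gingrich's forecast share by the combined Gingrich and Romney shares. A quick check using the 538 numbers quoted above:

```python
# 538's final forecast as quoted above (percent of all votes cast).
forecast_538 = {"Gingrich": 35.6, "Romney": 32.5, "Paul": 15.8, "Santorum": 12.8}

# Gingrich's share of the Gingrich + Romney vote only.
gingrich_two_way = forecast_538["Gingrich"] / (
    forecast_538["Gingrich"] + forecast_538["Romney"]
)
google_insights_two_way = 0.553

# Difference in percentage points, not percent.
gap_points = (google_insights_two_way - gingrich_two_way) * 100
print(f"538 two-candidate share: {gingrich_two_way:.1%}")  # 52.3%
print(f"Gap: {gap_points:.1f} percentage points")  # 3.0
```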

