Tuesday, January 3, 2012

Political Analysis with Google Search Data: A Roundup

My friend and coauthor Sylvia Manzano pointed out this story from yesterday's Morning Edition on NPR. NPR's science correspondent, Shankar Vedantam, interviewed UNC sociologist Phil Cohen, who discussed how political web searches correlate with cultural and aesthetic web searches. Here's part of the transcript of the conversation between Morning Edition host Linda Wertheimer and Vedantam:
VEDANTAM: There's a sociologist that I spoke with at the University of North Carolina. His name is Phil Cohen. And what he did is he said can we apply this tool [Google Correlate] to politics. And so he said let me search for prominent, liberal and conservative commentators - people like Rachel Maddow and Stephen Colbert, or Rush Limbaugh.
And what he found, unsurprisingly actually, was that the places where people were doing a lot of searches for the liberal commentators tended to be liberal places. They were places that tended to vote for President Obama in the 2008 election.
WERTHEIMER: California.
VEDANTAM: Exactly. But he also found that the places which searched for the liberal commentators also tended to search for very particular kinds of foods.
WERTHEIMER: Now, that is very strange.
VEDANTAM: So let me play you a little bit of what Phil Cohen told me in terms of what the liberals who are searching for Rachel Maddow are also searching for, in terms of their food.
PHIL COHEN: On the liberal list are arugula pasta, beets nutrition, beets urine, fake meat, fennel salad, firm tofu, a variety of vegetarian cooking, vegetarian recipes. Something like a Republican stereotype of what a liberal food diet might be.
And, from my reader comments, I learned about this Italian blog, Studi e Proiezioni Elettorali (Surveys and Election Forecasts), that's been playing around with using Google Trends and Google Insights data to forecast European elections. The blog has been trying different ways of weighting these web search data to correct for some of the selection bias inherent in drawing inferences about the electorate from the population of internet searchers.

Some of my own musings about the value of web search data for political analysis here and here.

  1. http://www.google.com/insights/search/?hl=en-GB#q=%22ron%20paul%22%2Cbachmann%2Bhuntsman%2B%22rick%20perry%22%2Cromney%2Cgingrich%2Csantorum&geo=US-IA&date=today%201-m&cmpt=q

    Without using any weightings and/or corrections, and using a simple 3-day moving average, Google Insights gives the following percentages (updated through the 1st of January):

    Paul 34.74
    Santorum 18.83
    Romney 16.50
    Gingrich 10.45
    Bachmann+Huntsman+Perry: 19.47

    Unfortunately, as discussed in a comment to your previous post, the self-selection bias when using Google data for the Iowa caucus can be very large and can completely invalidate any statistical analysis. Therefore the previous numbers have to be taken with extreme care, and more attention has to be given to the underlying trends instead.

    Regards, Gigi
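
The calculation described in the comment above (a simple 3-day moving average of search interest, converted into percentage shares with no weighting) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the daily values are made-up placeholders rather than actual Google Insights output, and the commenter's exact procedure may differ. Like the comment's figures, it applies no correction for self-selection bias.

```python
def moving_average(values, window=3):
    """Trailing simple moving average over `window` consecutive days."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def latest_shares(series_by_term, window=3):
    """Percentage share of each search term, based on the most recent
    moving-average value of its daily search-interest index."""
    smoothed = {
        term: moving_average(values, window)[-1]
        for term, values in series_by_term.items()
    }
    total = sum(smoothed.values())
    if total == 0:
        raise ValueError("all smoothed values are zero")
    return {term: 100.0 * value / total for term, value in smoothed.items()}


# Hypothetical daily search-interest indices (NOT real Google Insights data).
daily_interest = {
    "paul": [40, 42, 45, 50, 52],
    "santorum": [10, 14, 20, 26, 30],
    "romney": [22, 23, 22, 24, 25],
}

shares = latest_shares(daily_interest)
for term, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{term}: {share:.2f}")
```

Because the shares are normalized to sum to 100, they describe only relative search volume among the listed terms; they say nothing about how representative searchers are of caucus-goers.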