Facebook data: be careful what you ‘Like’

Any doubts about the value of the pile of data upon which Zuckerberg sits were dispelled for me last month with the publication of a short research paper by Michal Kosinski and colleagues from the Cambridge Psychometrics Centre in the Proceedings of the National Academy of Sciences.

The study took the ‘likes’ of 58,000 American Facebook users and built statistical models relating them to the users’ volunteered demographic information and psychometric profiles (all with the users’ permission).

The goal of the study was to see how accurately personal details such as religious beliefs, political leaning and sexual orientation could be predicted using only a user’s Facebook likes.

Using over 700 likes, Kosinski built models with outstanding predictive accuracy from non-explicit ‘likes’, such as music choices, rather than explicit ones such as ‘likes gay marriage’.

Here are some of the results:

  • 88 per cent accurate in determining male sexual orientation
  • 95 per cent accurate in differentiating African-Americans from Caucasian Americans
  • 85 per cent accurate in differentiating Republicans from Democrats
  • 82 per cent accurate in differentiating Christians from Muslims

The models draw their predictive power from inference rather than explicit likes; that is, from large numbers of relatively innocuous likes such as music, TV and food preferences. For example, the best predictors of high intelligence include Thunderstorms, The Colbert Report, Science and Curly Fries, whereas low intelligence was indicated by Sephora, I Love Being A Mom, Harley Davidson, and Lady Antebellum.
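
For the technically curious, here is a rough sketch of how a pile of individually innocuous likes can add up to a prediction. It is not the authors' code: the page names, the trait and the six users are all made up, and a plain logistic regression with one weight per like is my own stand-in for whatever modelling the paper actually used.

```python
import math

# Hypothetical toy data: each user is a set of liked pages, and each label
# says whether that user reported some trait (1) or not (0). All made up.
USERS = [
    {"page_a", "page_b"}, {"page_a", "page_c"}, {"page_a", "page_e"},
    {"page_d", "page_b"}, {"page_d", "page_c"}, {"page_d", "page_e"},
]
LABELS = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, likes):
    """Estimated probability that a user with these likes has the trait."""
    # Likes the model has never seen contribute nothing to the sum.
    return sigmoid(bias + sum(weights.get(like, 0.0) for like in likes))


def train(users, labels, steps=300, lr=0.5):
    """Fit one weight per like using full-batch gradient descent."""
    weights = {like: 0.0 for likes in users for like in likes}
    bias = 0.0
    for _ in range(steps):
        grad_w = dict.fromkeys(weights, 0.0)
        grad_b = 0.0
        for likes, label in zip(users, labels):
            err = score(weights, bias, likes) - label  # prediction error
            grad_b += err
            for like in likes:
                grad_w[like] += err
        bias -= lr * grad_b / len(users)
        for like in weights:
            weights[like] -= lr * grad_w[like] / len(users)
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(USERS, LABELS)
    # A user the model has not seen before, scored from likes alone.
    print(round(score(weights, bias, {"page_a", "page_c", "page_e"}), 2))
```

Each like nudges the score up or down by its learned weight, and the output is a probability rather than a verdict. With tens of thousands of real users and pages, the same idea picks up the faint statistical signals described above.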

Good predictors of male homosexuality included Mac Cosmetics, Wicked The Musical and No H8 Campaign (which is not a surprise, given it’s a charitable campaign opposing Proposition 8, the ban on gay marriage in California). Yes, my favourite TV show is The Colbert Report, but I’ll leave you to make your own inferences.

Whilst there are some bizarre correlations here (curly fries and high intelligence?), many fit comfortably within the boundaries of racial, sexual and political stereotypes. Bear in mind, these are predictors, not guaranteed outcomes.

Interestingly, the models are also accurate in predicting consumption behaviour, particularly alcohol, cigarette and drug use. For example, liking ‘That Spider is More Scared than U’ is a predictor of being a non-smoker.

Clearly, aggregated ‘like’ data can generate a surprisingly accurate picture of the personal traits of millions of Facebook users worldwide, a potential boon for advertisers. Unlike other academics researching this area, the authors don’t decry the dangers of such data being used by online advertisers, and in fact give examples of how online advertising could be positively targeted for the users’ benefit. (Full disclosure: the researchers were in part funded by Microsoft.)

Any Facebook user can try this one-click personality test for themselves at http://www.youarewhatyoulike.com/.

The potential goes beyond advertising, of course. A recent article in the Guardian reported that Facebook has begun to work with the police, using algorithms and historical data to predict which of its users might commit crimes using its services. Warning signs include, for example, an adult male user who chats with under-18s, whose friends are mostly female, and who uses keywords such as ‘sex’ or ‘date’. Has anyone seen Minority Report?

Unfortunately for Facebook, its share price continues to fall despite this big data vindication of what Zuckerberg’s been telling us.

Optimism that the social network has found a way to unlock the value of its mobile users raised the share price from $26 at the end of last year to almost $33 at the end of January. But it has since dropped back down, for a number of reasons, including the comparative success of LinkedIn, data indicating that users spend less time on the site than they did just months ago, and a general sentiment in the investment community that Facebook should have been sold privately to a Google or Yahoo, and not floated to the general public with such limited streams of revenue.

I have tried to find references to the PNAS study in financial blogs, but it seems the investment community hasn’t picked up on what I think is a valuable piece of PR for Facebook.

The best conclusion you can draw from all of this? Don’t take investment advice from me!