Back off man, I’m a scientist: Advice on the use of ‘big data’

It’s en vogue these days for statisticians and business analysts to refer to themselves as ‘scientists’, which is fine, as long as such people bring a scientist’s mindset to the task.

“Our business is infested with idiots who try to impress by using pretentious jargon.” So said David Ogilvy sometime in the 1960s, and the phrase is even more powerful today. Regular readers of my columns will know that I have little patience for jargon, despite the fact (or perhaps because) I’ve worked in buzzword-heavy industries my entire career. As a former scientist, the term I really object to now is ‘data scientist’.

Science is nothing without observational data. Therefore, every scientist who has ever lived is by definition a type of data scientist, but not necessarily vice versa. Please understand I’m not a snob. It’s not that I object to you statisticians or glorified business analysts using the title ‘scientist’. But if you’re going to call yourself one, make sure you behave like one.

So much of our industry is focused on collecting huge quantities of consumer data—behavioural, psycho-demographic, transactional data—combined into massive repositories that, mined by data scientists, will help our clients secure that consumer and increase wallet share.

What many advertisers and marketers seem to overlook is that, in making data your unrelenting focus, you might miss the forest (knowledge) for the trees (data).

Too often digital marketers and their agencies set about gathering data with no clear idea of what they want to do with it. Even Anjul Bhambhri, vice president of big data products at IBM, says,

“A data scientist is somebody who is inquisitive, who can stare at data and spot trends.”

But making observations is only the first step of the scientific method. The critical part is asking an interesting question and framing a hypothesis to test it. And this goes way beyond split testing, which might tell us which creative has a higher conversion rate, but never explains why, and therefore never becomes knowledge that can be re-used.

In his article, “Eyes bigger than stomachs: Data glut”, Jim Meskauskas observes that,

“Data is turned into information, and information is turned into knowledge. Sometimes though, data is meaningless and provides useless information that results in inapplicable knowledge.”

For example, last month I cited Facebook research that correlates a penchant for curly fries with high intelligence. I’m not sure anyone knows how to really monetize this factoid. And it is not solely the role of the data scientist to ask the interesting questions; this responsibility lies with marketers, who could benefit from using the scientific method in their daily practice.

I know this sounds like heresy from a data-driven marketer. But sometimes there are diminishing returns from mining the data vein too deep.

The industry is engaged in, as Meskauskas puts it, “a frenzied gathering of bits, collecting any and all manner of jetsam and flotsam. There is a subconscious belief that the marketer will know what it is that they are looking for once they find it.”

But without some sense of where the data might lead, the resources expended to collect it might be better spent on more media to send more messages, he adds.

Meskauskas recognizes that more data isn’t better, and in fact can be worse. Vast expanses of data have a way of lulling organizations into a false sense of security. It is easy to believe that the answer to virtually any business question will be available because all available data has been collected.

Whilst I was shopping in The Times Book Shop on the weekend, I stumbled upon the latest novel from one of my favourite authors, Will Self. And then I thought, “Why haven’t I seen this on Amazon? And why doesn’t Amazon know my favourite authors and musicians? Why have they never asked me? And why can’t I simply get email notifications from Amazon when their works are released?”

It’s well known that Amazon’s recommendation service accounts for at least 30 per cent of its sales, so I’m not criticizing the company’s approach. My point is that we shouldn’t forget in this world of big data that some simple things like preferences still have relevance and power. If Amazon had the knowledge that I buy everything Will Self and Martin Amis publish, I’m confident within two standard deviations that its share of my wallet would be greater than that of The Times Book Shop.