
Peter Mouncey Blog

'Everything is interesting' - big challenges in big data

31-03-2014
In a keynote interview at the recent MRS annual conference, the renowned author Will Self commented that potentially everything is interesting: humans are passive entities that need to be stimulated.

We have the option of looking at the world from deep and narrow or broad and shallow perspectives. We can experience epiphanies, or insights, seeing the relationship between detail and the bigger picture. 

When asked how he knew when to stop researching the background for his novels, Self replied that it is never possible to have all the available data, and time is not inexhaustible, whilst there is always the temptation to attempt completeness. Therefore, every research project needs a structure. 

Finally, on creating a narrative, Self reminded us that this takes away as much as it gives – it is just one interpretation, our own, of the data, events and so on.


I mused on these points from Self's interview during a fascinating one-day conference on the theme of 'Multidisciplinary Challenges of Big Data', organised by Cancer Research UK and Winton Capital Management at The British Library on April 1st.

As the speakers reminded us throughout the day, we cannot start from the perspective that everything might be interesting when analysing data – the challenge is detecting the real signals in that mass of noise, as underlined by presentations covering the international project on cancer genomics, the Large Hadron Collider, cosmology, international financial markets and Google.

These were speakers who were also clearly highly stimulated by the challenges they face. The amount of data generated in some of these projects is beyond human comprehension, as were some of the figures in an article on neuroscience in National Geographic ('Secrets of the Brain', Vol. 225 No. 2, 2014) that I happened to read on the train home: a mouse brain has 70 million neurons, and the project to scan one completely will take a further two years.

When asked to comment on scanning the entire human brain, Jeff Lichtman (Ramón y Cajal Professor of Arts and Sciences at Harvard) replied: 'I don't dwell on that. It's too painful!'

In fact, the work to date on re-creating just a grain-of-salt-sized piece of the mouse's brain has already produced 100 terabytes of data, the equivalent of 25,000 high-definition movies. An image of the entire human brain would, it is estimated, contain 1.3 billion terabytes of data – just under half of total global digital storage in 2012.
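As a quick sanity check on those comparisons (the byte conversions and the roughly 2.8-zettabyte figure for global digital storage in 2012 are my assumptions, not figures from the conference):

```python
# Back-of-envelope check of the data-scale comparisons quoted above.
TB = 10**12                         # bytes in a terabyte (decimal convention)

mouse_sample = 100 * TB             # grain-of-salt-sized piece of mouse brain
movie = mouse_sample / 25_000       # implied size of one high-definition movie
print(movie / 10**9)                # 4.0 -> about 4 GB per movie, a plausible HD file size

human_brain = 1.3e9 * TB            # estimated image of an entire human brain
global_2012 = 2.8 * 10**21          # ~2.8 zettabytes, a commonly cited 2012 estimate
print(human_brain / global_2012)    # ~0.46 -> "just under half" checks out
```

Both comparisons hold together, on the assumption that an HD movie is around 4 GB.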

At the conference, Harry Cliff (University of Cambridge) described the Large Hadron Collider experiments, which involve 1 billion collisions per second and generate 10 zettabytes of raw data a year – 30 times greater than all current knowledge in the world. This data is distributed via the Worldwide LHC Computing Grid to 170 centres in 40 countries for analysis.
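Taken at face value, those two figures imply a raw data volume per collision; a rough check (the seconds-per-year arithmetic is mine, the headline figures are as reported):

```python
# Implied raw data per LHC collision, from the figures quoted above.
ZB = 10**21
collisions_per_second = 1e9
seconds_per_year = 365 * 24 * 3600      # ~3.15e7 seconds

raw_per_year = 10 * ZB
per_collision = raw_per_year / (collisions_per_second * seconds_per_year)
print(per_collision / 10**6)            # ~0.32 -> roughly a third of a megabyte per collision
```

That order of magnitude is why, as the next paragraph notes, almost all of it has to be filtered out before analysis.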

So, echoing Self, there is indeed a need to create an appropriate structure for projects when data is that big; most of it is in fact not interesting and is discarded, lost forever.

As speakers described, it is vital to identify the question before setting up the analysis – and these huge projects have protocols for doing this to ensure that the interesting data is robustly analysed against a defined aim. 

The challenges facing the International Cancer Genome Consortium (ICGC) are equally complex, with 10 billion items of data generated in 2012 – and its focus is only on the 20 most common cancers. As Jan Korbel (European Molecular Biology Laboratory) described, no two conditions are alike.

For example, Ruth Travis (Oxford University) described how prostate cancer in young men differs from the form found in older men, and emphasised the importance of creating a holistic view of each individual patient – a data-integration challenge.

The eventual aim of the ICGC programme is personalised medicine, where a patient can be scanned and the data analysed at an affordable cost within an appropriate timeframe (hours or days, depending on the diagnosed condition), with this data also integrated with NHS and contextual data on the patient (e.g. lifestyle, demographics) to produce a tailored treatment programme.

Travis also described the issue of representativeness – who agrees to take part in research, why the data can remain valuable even when the sample is not representative, and hence the need for replication studies.

The more complex the data, the greater the chance of bias, especially where data collected for one purpose is later used in other ways. And, as another panel member, John Copas (Warwick University), warned, there is also a difference between what we can measure and what we are interested in.

Don't be fooled by the scale of the available data, John Quackenbush (Dana-Farber Cancer Institute) cautioned, and ensure you know the context.

Self's advice about knowing when to draw the line and stop aiming for completeness seems another big challenge facing researchers in these fields. Whilst the immediate analytic need may be met, it is simply one element in a complex, long-term programme of research.

When human health, or the building blocks of the universe, is the topic, drawing a line in the sand must be an extremely difficult decision to make.

What was particularly interesting about this conference was not just the detail that speakers provided in their presentations, but the interdisciplinary range of experiences covered in the programme, with plenty of opportunities to learn from those working in other fields. 

This was underlined by the Rt Hon David Willetts MP (UK Minister for Universities and Science), who reminded us that George Osborne had announced in the recent Budget statement £42m of funding over five years for a new institute, the Alan Turing Institute (named after the UK WW2 codebreaker and computing pioneer), to ensure Britain leads the way in the use of big data.

Willetts also described how an adaptation of the real time telemetry system used in Formula 1 races by McLaren was being tested as a way to continuously monitor sick children in one UK hospital.

All of this makes the challenges faced in day-to-day market research projects seem rather small beer, but the same principles need to be applied, especially as survey data is increasingly just one source of information in the pursuit of holistic knowledge about consumers and their behaviour.

Researchers need to collaborate across disciplines and borrow methods successful in other fields.

In 'big data', it is the challenges faced by researchers that are the real 'big' – it is not simply about having lots of data. Research and curiosity go hand in hand; you don't become a researcher unless you have an enquiring mind and a fascination for creating new knowledge.

Not everything can be interesting, however – and, as all researchers know, drawing the boundaries between what is and what isn't can be the ultimate challenge.

P.S. Still on the big data theme: if you've not seen it, the UK National Statistician has recently recommended a predominantly online census in 2021, supplemented by further use of administrative and survey data. It is now up to the government to make the final decision.

