
A call to action on social science, big data and privacy

In the digital world, anonymity and informed consent are no longer meaningful. A global effort is needed to devise new protocols and regulations for research, says Julia Lane.

Research in the social sciences has fundamentally changed, because the world of data has changed. The advent of online society and mobile devices has allowed researchers, by studying digital footprints, to measure human behaviour using data unprecedented in their volume, timeliness and variety.

The result is that social scientists and statisticians have gone from being a bit like farmers to being more like fishers. Where once they set out to cultivate and harvest data by surveying a particular population with a particular end in mind, they now cast their nets into a sea of information and haul in data created in the wild.

The opportunities for understanding human behaviour are tremendous. The new world of data is strikingly different from the old world, where the state controlled most information sources. The variety of data makes it possible to link many sources, the volume permits much more granular analysis and the timeliness can help inform policy decisions quickly. 

There is, though, the potential for major errors: in April 2013, for example, social media users analysing photos of the Boston Marathon bombing incorrectly cast suspicion on a missing student whose body was later recovered from a river. And Google’s efforts to track flu outbreaks from search terms have led to well-publicised failures.

Unprecedented quantities of data, then, do not mean that more and better insights are instantly and equally available. Researchers are still crucial to making statistically valid sense of these opportunities.

Yet giving researchers access to big data on human beings raises huge privacy issues. It has become much easier to re-identify subjects, because even if one source has been anonymised, comparing multiple sources—a postcode from here, a date of birth from there—can create a unique fingerprint for an individual. The speed with which data are now collected and analysed also makes them potentially more harmful—the information is like a surveillance video, not a static snapshot.
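The linkage attack described above can be sketched in a few lines. The datasets and names here are entirely hypothetical, invented for illustration: two sources that each look anonymous on their own become identifying once joined on shared quasi-identifiers such as postcode and date of birth.

```python
# Hypothetical illustration of a linkage attack: neither dataset contains
# a name next to a diagnosis, but joining them on (postcode, dob) does.

health_records = [  # "anonymised": names removed, quasi-identifiers kept
    {"postcode": "SW1A 1AA", "dob": "1970-03-14", "diagnosis": "asthma"},
    {"postcode": "EC1V 9HX", "dob": "1985-11-02", "diagnosis": "diabetes"},
]

voter_roll = [  # public register: names alongside the same quasi-identifiers
    {"name": "A. Smith", "postcode": "SW1A 1AA", "dob": "1970-03-14"},
    {"name": "B. Jones", "postcode": "EC1V 9HX", "dob": "1985-11-02"},
]

def reidentify(anon, public):
    """Link records that share a (postcode, dob) fingerprint."""
    index = {(p["postcode"], p["dob"]): p["name"] for p in public}
    matches = []
    for rec in anon:
        key = (rec["postcode"], rec["dob"])
        if key in index:
            # The join attaches a name to a supposedly anonymous record.
            matches.append({"name": index[key], **rec})
    return matches

matches = reidentify(health_records, voter_roll)
```

With only two quasi-identifiers, every record in this toy example is re-identified; real attacks work the same way, just over larger and messier sources.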

The traditional sources for disseminating social science data—national statistical agencies—have been marginalised, and the pillars on which their dissemination rules were founded no longer hold. The first pillar—anonymisation—is no longer possible, because of the volume and variety of data and because the data are found, not made. 

The second pillar—the informed consent of human subjects—is similarly impossible. Quite simply, as the privacy expert Helen Nissenbaum has pointed out, information protocols can be either comprehensible or comprehensive, but not both. And informed consent has become almost irrelevant: people’s attributes can be inferred from data on as few as 20 per cent of their peers, so researchers no longer need to study you to characterise your behaviour.
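The peer-inference point can be made concrete with a toy sketch. The graph, names and attribute below are invented for illustration; the mechanism is simply a majority vote over whatever a person's contacts have disclosed, which is the intuition behind the inference techniques the article alludes to rather than any specific published method.

```python
from collections import Counter

# Toy social graph: the target never discloses anything, but
# 2 of their 10 friends (20 per cent) have disclosed an attribute.
friends_of_target = ["f1", "f2", "f3", "f4", "f5",
                     "f6", "f7", "f8", "f9", "f10"]
disclosed = {"f1": "marathon runner", "f2": "marathon runner"}

def infer_attribute(friends, disclosed):
    """Guess an undisclosed attribute by majority vote among the
    friends who did disclose theirs; None if no friend disclosed."""
    votes = Counter(disclosed[f] for f in friends if f in disclosed)
    return votes.most_common(1)[0][0] if votes else None

guess = infer_attribute(friends_of_target, disclosed)  # → "marathon runner"
```

The target consented to nothing, yet a plausible label is attached to them anyway: consent given (or withheld) by one person no longer bounds what can be learned about them.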

If the old approaches don’t work, what are the emerging options? Several look promising. Instead of using the traditional statistical means to protect data, we could create economic incentives to protect privacy, social incentives to share personal data for the public good, and legal deterrents, or we could rely on business and technical safeguards.

We could create walled gardens for research: closed systems where it is easier to control how data are made and used. Or we could rely on people offering their data as a public good, in the same way they might donate blood. There is a promising line of research called differential privacy, which uses cryptographic approaches to build a theory of privacy from first principles, seeking to make database searches maximally informative and minimally intrusive. The UK has provided leadership in the area of researcher access, thanks particularly to the efforts of data researcher Peter Elias and initiatives such as the UK Data Archive and the administrative data centres funded by the Economic and Social Research Council.
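The differential-privacy idea mentioned above can be illustrated with its canonical primitive, the Laplace mechanism: answer a counting query after adding noise calibrated so that any one person's presence or absence barely changes the answer's distribution. The function name, counts and epsilon below are illustrative choices, not anything specified in the article.

```python
import random

def dp_count(true_count, epsilon, rng=random):
    """Release a count with Laplace(1/epsilon) noise — the standard
    mechanism for a counting query, whose sensitivity is 1 because
    one individual changes the true count by at most 1."""
    # The difference of two Exponential(epsilon) draws is
    # Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

# Smaller epsilon -> stronger privacy guarantee, noisier answer.
true_answer = 1024  # e.g. people in a district with some trait
noisy_answer = dp_count(true_answer, epsilon=0.5)
```

The appeal for researchers is that the guarantee is mathematical rather than procedural: aggregate queries stay useful while no single subject's data can be confidently reconstructed from the released answers.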

But these are scattered efforts. The data, and the companies controlling them, are global, so we need international solutions. No systematic, international approaches to build awareness of the problems, or international actions to address them, have been forthcoming.

We have paths to provide guidelines to promote researcher access, but they are trodden by individual teams or at best by regional and national governments. These must become interstate highways: protocols and regulations that will apply internationally and create a legal context appropriate to modern data. We have the problems and questions, and some partial answers; we now need to build a fully functional international system.


Julia Lane is an institute fellow and senior managing economist at the American Institutes for Research. She holds a Gutenberg chair at the University of Strasbourg and is a University of Melbourne professor. She is an editor of Privacy, Big Data, and the Public Good: Frameworks for engagement (Cambridge University Press, 2014).

This article also appeared in Research Fortnight