
Mine that data

The Commission’s leadership in research data coordination is welcome

Research has been revolutionised, in the last decade or so, by the quite unprecedented volume of data that researchers can generate and store.

Efforts are under way to manage this deluge: many of the infrastructure projects prioritised by the European Strategy Forum on Research Infrastructures (ESFRI), for example, are data management projects, aiming to build reliable databanks for different disciplines.

The arrival of the Research Data Alliance (RDA), which aims to coordinate this process better (see Cover story), is therefore to be warmly welcomed.

Some disciplines are well down the road towards sharing data in standard formats. But with many subdisciplines seeking to reinvent the wheel, more should be done to ensure that discipline-specific efforts proceed along more compatible lines.

The analogy that the RDA’s founders draw with the internet is an appropriate one: where would we be today if there were one set of internet protocols for particle physicists, another for military use, and so on?

The generation and publication of research data were once limited by the logistics of writing and publishing, and, more recently, by the expense of reliable data storage. It is only quite recently that these limits have, to all intents and purposes, disappeared and data are being generated in mind-boggling volumes. Often, the capacity of researchers to store, publish and analyse those data isn’t keeping up.

One area where change is overdue is scientific publishing. Many researchers and research administrators now strongly believe that all research data should be published, whether or not they ‘fit in’ to a paper.

In the highly quantitative disciplines at the forefront of this change, such as genetics and atmospheric science, the shape of datasets is already agreed, and the scope for data sharing is considerable. In other disciplines, ranging from botany to the social sciences, frameworks are seldom in place to allow data to be stored and shared. 

The new alliance will be addressing that and, given the direct involvement of the formidable digital agenda commissioner Neelie Kroes, high-level support from the European Commission is assured.

That is welcome, because global coordination of the kind the RDA could provide is exactly where the EU, as opposed to its member states, should be able to add value. Under Horizon 2020, the EU hasn’t been able to come up with the kind of financial support for the ESFRI projects that some would have liked. But its backing for the RDA can at least help to address some of the challenges that each of the ESFRI databank projects will confront.

Data storage and mining are utterly central to modern research, but their intricacies lie beyond the purview of most scientists and social scientists. The researchers’ primary interest is usually the natural or social phenomena under study: researchers care about molecules or species, not about data structures. Yet their wholehearted participation in the RDA will be essential to its success. Those researchers who put themselves at the cutting edge will, of course, win ample reward in the longer term, as data structures emerge that meet their own needs and interests.