Evaluation reforms shouldn’t throw the baby out with the bathwater, say Lutz Bornmann and colleagues
There is a concerted campaign in many areas of science policy against the inappropriate use of bibliometrics in research evaluation. One stated goal of the Coalition for Advancing Research Assessment is to “reduce the dominance of a narrow set of overly quantitative journal and publication-based metrics”.
In Germany, the German Science and Humanities Council (WR) and the German Research Foundation (DFG) have opposed the quantitative evaluation of research. The WR has said it favours “assessing the quality of individual publications…rather than focusing on the publication venue, or indicators derived from it”. The DFG, meanwhile, has warned that “a narrow focus in the system of attributing academic reputation—for example, based on bibliometric indicators—not only has a detrimental effect on publication behaviour, it also fails to do justice to scholarship in all its diversity”.
The principal concern behind these reform efforts is that using bibliometrics to measure research quality leads to goal displacement: scoring high becomes the end, damaging research in the process. And it is true that improper use of bibliometric data—such as tying funding or salaries to measures of productivity or citation impact—can have undesirable effects on scholarship.
In the evaluation of economic research, for example, there is a focus on just five renowned journals. Consequently, those journals are overwhelmed with submissions, the risk of scientific misconduct in the pursuit of spectacular results rises, and much high-quality research published elsewhere gets overlooked.
But the research assessment baby should not be thrown out with the bibliometric bathwater. It is important to distinguish between amateur and professional bibliometrics, but neither the WR nor the DFG does so.
By amateur bibliometrics, we mean the use of metrics by decision-makers whose lack of expertise in the field puts them at risk of relying on unsuitable indicators, databases and datasets, resulting in harmful incentives and misguided decisions. Professional bibliometrics makes use of large datasets from appropriate sources, and professionally recognised methods.
Bibliometrics is unsuitable for some purposes, such as evaluating junior scientists for doctoral positions. But for others, such as surveying the academic activities of institutions or countries and the cooperation between them, it can yield empirical results of the highest validity. If bibliometrics is to be used in an evaluation, specialists in bibliometrics and experts in the relevant fields should collaborate to decide whether the use is sensible and, if so, how bibliometrics should be applied.
To tackle the inappropriate use of metrics in research evaluation, the reasons that make them popular should first and foremost be addressed. One is speed: evaluating academic work using metrics can be quicker than examining its content, and several studies have shown that bibliometrics and peer review achieve similar results in many cases. Using bibliometrics as a heuristic saves overstretched academics valuable time.
Another reason metrics are popular is the desire to reduce subjectivity in evaluation. As a rule, reviewers of the same work reach different judgments; metrics offer the (claimed) possibility of objectivity. It is much easier to base an argument on a metric than on the content of a work, which could be open to dispute. However, the use of metrics in decision-making may also cause researchers to relinquish some of their own judgment.
Addressing these concerns means reforming peer review processes. Scholars should be given enough time to carry out peer review, and reviewers should be chosen on the basis of expertise, not eminence. Highly regarded scholars are overwhelmed with requests to review papers, grant proposals, job applications and so on, often outside their own fields. Focusing on reputation can result in eminent but unqualified reviewers. In such a situation, bibliometric indicators become more appealing. To avoid their use, only those competent to conduct qualitative peer review should be given the job of evaluating the research, irrespective of their reputation.
Just as the use of bibliometrics in research evaluation can be criticised, so can qualitative, content-focused peer review—in terms of reliability and fairness, for example. Both modes of evaluation have their own pros and cons. Rather than reject one or the other, a specific procedure should be developed for each planned research evaluation to allow for the optimal assessment of that research.
The views of the authors do not necessarily match those of the Max Planck Society
Lutz Bornmann and Georg Botz work at the administrative headquarters of the Max Planck Society in Munich, Germany, and Robin Haunschild is at the Max Planck Institute for Solid State Research in Stuttgart, Germany
This article also appeared in Research Europe