“You must not measure everybody using the same set of criteria”
Paul Wouters is Professor of Scientometrics and director of the Centre for Science and Technology Studies at Leiden University. He specialises in the development of evaluation systems and how these systems are creating new constraints for the development of knowledge. We met him on the occasion of a symposium on “New Models of Research Evaluation”, organised by the University of Barcelona and the Open University of Catalonia.
Why is it important to measure and evaluate science?
A lot of researchers are asking this question, because they are finding they have to spend more and more time on evaluation, and this takes time away from their primary interest: research and projects. I think there are some good reasons for paying attention to it.
One is that old accountability styles (knowing people locally or knowing people in your profession) don’t work anymore. Everything has become distributed and globalized. This means that no single individual can oversee the field anymore. You need a more advanced way of assessing where you are in your own development.
The second reason is also related to accountability: science should explain to the public what it is doing because it is basically funded by public money. In addition, the way people live their lives is strongly influenced by research: communication, media, healthcare... So we can see that research is becoming a very important resource for society as a whole.
And the third reason is perhaps a more positive one: if you use evaluation in a formative sense, then it is actually helpful for you as a researcher. We call it formative because it is meant to detect where your strengths and weaknesses are, and how you can develop. If people are not performing well, this should be said, but a formative assessment is meant to make sure that you improve. The problem is that institutions want summative assessments: “Was my money spent well enough? Why is my university sliding down in the rankings?” They tend to look back, but the main energy should be spent on looking forward and using evaluation in this creative way.
Could you explain what the “Leiden Manifesto” is?
The Leiden Manifesto was published in 2015 in Nature. It was an initiative created at a scientometrics community conference. We discussed which indicators we have to use, especially at the level of individual researchers’ assessments. We decided that we wanted to give a voice to the concerns that many researchers have.
The Manifesto has ten principles, but they could be summarized in two main ideas. One idea is that indicators should always support a judgement and not replace it. The second idea is that you must not measure everybody using the same set of criteria. If you do an evaluation, you start with the mission of the research group that you are evaluating and then the rest follows. For example, a clinical research group is very different from a group that studies traffic in Barcelona.
How do we currently measure the impact of science? What indicators are normally used?
At the moment the most popular indicators are the journal impact factor and the h-index (for individual researchers or group Principal Investigators). Another important indicator is the share of your articles that are highly cited, that is, how many of them fall within the top 10 percent or the top 1 percent; this is perhaps best used at the level of departments. Another indicator is the amount of external funds that you can attract. At the university level, the position in the global university rankings is very popular. All universities are anxiously checking where they are in every ranking. I find it a little strange, but these are the more popular indicators.
How do we measure or capture the quality of research? Is there any problem with these measurements?
In my field, the consensus is that the best way to do this is by informed peer review, where you combine quantitative indicators, citation indicators and your own judgement about the quality of the proposal. In bibliometrics, some people claim that the number of citations is equal to quality, but we don’t believe that. Citations measure feasibility and short-term impact, rather than quality, which requires a more multidimensional approach.
The difficulty with indicators is that they do not measure quality directly: you can only measure a very small set of dimensions of scientific performance. There are two key problems. One is that peer reviews tend to be conservative, because they are based on the current state of expertise. The other problem is with citation rates, because these are based on past performance. So, what we try to do with the scientific system is a mix of peer review and evidence-based indicators, but you can then find yourself with the problem that a proposal is too wild or too radical to be appreciated by peers, or that things like creativity or a good research question are not measured.
How do we detect science that is relevant from a societal point of view?
There is no general answer for that. You really need a case-study approach and in that approach you can incorporate very particular indicators and databases. Impact has to be defined specifically even if the discipline is too general.
For example, we have a project on the impact of research into heart conditions, and it is funded by the Dutch Heart Foundation. There, we look at the impact in professional clinical journals, the impact on healthcare arrangements, on hospital practices and on medical treatments. But we also have a project with a research group in theology. Here what you talk about is very different: you speak about the way problems are discussed in newspapers or the way they influence the general discourse about, for example, the meaning of life.
Can metrics change the way research is conducted? How?
Yes, there are two main mechanisms: through funding and careers. I know several people whose research area was no longer funded because it was no longer seen as interesting by the funding agencies. For example, at universities in the Netherlands, economists have been fired from economics departments because they were not publishing in economics journals, so they moved to sociology or to law and then did economic research in law or sociology. Assessment can have an enormous influence on the short-term way that research is done.
In the long term it also has an influence. It shapes the definition, for people who are being trained and educated, of what a researcher is. The new generation of researchers is entering a system in which these indicators are very important; they know nothing else. For them this is normal practice. That creates the problem that they no longer see research as a way to solve a particular problem, but as a career device based on publishing in a high-impact journal. Then you are actually corrupting the scientific system: you are damaging the very core meaning of why we have the scientific system.
Can indicators change the definition of excellent science?
Yes, actually indicators do this. In the past, excellent science was always recognized by its leading experts. Nowadays excellent science is defined as being published in top journals. In the future it will be changing again, because this journal impact factor is not the best way to assess quality. In the long term, you would want to recognise which piece of knowledge has a big impact on the structure of our knowledge. But we only know that after twenty years, so it is difficult to predict.
This is also why we are so interested in this problem of evaluation. The way you create the indicators influences the definition of the original activity that you wanted to measure, and it changes that activity. So you have a kind of feedback loop.
Open science represents an innovation in the way research is conducted: why is this approach relevant in today’s society?
That is also a question researchers face, because sometimes they see that open science is rather like evaluation, that it is being imposed upon them. They think “What’s the problem? I’m doing well!”
I think opening up the process of knowledge creation is more fitting in a society where lots of citizens are also well educated and can say something useful for researchers. There is no reason anymore to keep it closed: science could become more like a platform on which people can interact with each other. This way you decrease the capital cost involved in entering the scientific enterprise, and that can only be good. This can also accelerate innovation.
How should evaluation change if we shift from current procedures to open science?
Evaluation and peer review systems should also become more transparent and open. It makes sense for peer reviewers to be responsible for their job, to be accountable. An important element is to include stakeholders and users of knowledge in the process of evaluation. For example, we participate in an experiment at Utrecht University, in the Netherlands, at a medical hospital, to include patients in the assessment of research. This is a first step: I think it should not be limited to patients.
What is the “altmetrics” movement about? In which sense are these metrics an alternative?
Altmetrics are measures of communication and social media processes; we do not consider them a substitute for traditional quality or impact indicators. They measure something different from scientific performance in scientific publications. For example, if your publication is tweeted a lot, it may mean very different things. It certainly means there is a lot of communication going on about your article, that you are able to engage people. You can’t say tweeting is about societal impact, or Facebooking is about friends, because it can be very different in each case: it does not always mean quality or a long-term impact.
It should be the context of a particular evaluation that defines how you want to use indicators every time. There is no general formula: indicators are only the last part of the story, so they should not be leading the research you do.
Interview by Chema Arcos and Mireia Pons