Reproducibility crisis in science: Taking down the many-headed monster

Atreyi Chakrabarty (St. Cross College, DPhil in Interdisciplinary Bioscience) in conversation with Professor Dorothy Bishop (Department of Experimental Psychology, Oxford)

More and more scientists are starting to doubt the nature of their own playing field. Their colleagues’ studies are published in acclaimed journals, but the key findings mysteriously never seem to materialise when others try to reproduce them. And with this lack of reproducibility comes an onslaught of aggressive defensiveness, self-doubt, failed funding bids, pressure to conform and the big unreachable cloud of publishing.

The credibility of the scientific method is on the line because of the widespread lack of reproducibility seen across several areas of research, predominantly psychology, biomedicine and the other life sciences. Only in the last decade or so have scientists realised that the scale of the disparity in the literature is embarrassing and needs to change.

Dorothy Bishop, a neurodevelopmental psychologist at the University of Oxford and a pioneer in the movement for reproducible science, was uneasy about some scientific practices from early in her career. While studying the relationship between handedness and neurodevelopmental disorders, she found that everyone in the field analysed their data differently, digging around for the best correlations. This is poor practice: scientists should specify the questions they ask in advance. “Even with completely random data, in a very high percentage of cases you will find something with apparent significance,” Bishop explains, “and that random finding will likely never be reproduced.”
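Bishop’s point about random data can be checked with a quick simulation. The sketch below is illustrative only: the sample size, number of measures and correlation threshold are assumptions for the demonstration, not figures from the interview. It generates 20 columns of pure noise and counts how many of the 190 pairwise correlations cross the conventional significance cut-off (for 30 participants, |r| > 0.361 corresponds to p < 0.05, two-tailed).

```python
import random
import statistics

random.seed(1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

n = 30              # participants per simulated study (assumed for illustration)
n_measures = 20     # unrelated measures "dug through" for correlations
threshold = 0.361   # |r| above this gives p < 0.05 for n = 30 (two-tailed)

# One simulated study: 20 columns of pure noise, correlated pairwise.
data = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_measures)]
n_pairs = n_measures * (n_measures - 1) // 2
hits = sum(
    1
    for i in range(n_measures)
    for j in range(i + 1, n_measures)
    if abs(pearson_r(data[i], data[j])) > threshold
)
print(f"'Significant' correlations found in pure noise: {hits} of {n_pairs}")
```

With 190 comparisons each carrying a 5% false-positive rate, a handful of “discoveries” from pure noise is the expected outcome, which is exactly why dredging for the best correlation is so treacherous.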

The malpractice of hypothesising after results are known (HARKing) is a major player in the current mess of irreproducible studies. It is closely related to another fallacy, p-hacking: trying out multiple statistical tests on your data and reporting the one that gives you a statistically significant ‘p-value’. “This lack of statistical understanding is dangerous, and statisticians are often perceived as killjoys for rethinking interesting results,” Bishop adds. Even Daryl Bem, the famous Cornell University psychologist who has infamously defended his studies on paranormal phenomena, was found to be a perpetrator of p-hacking and HARKing, two of the cardinal sins of science.
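P-hacking can also be simulated. In this hypothetical sketch (the specific “analyses” and group sizes are invented for illustration), both groups are drawn from the same distribution, so there is no real effect to find. Yet an analyst who tries several post-hoc comparisons and keeps the best p-value ends up “discovering” an effect far more often than the nominal 5% of the time.

```python
import math
import random

random.seed(2)

def two_tailed_p(group_a, group_b):
    """Approximate two-sample p-value via a z statistic (adequate for a sketch)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def p_hacked_study(n=40):
    """Both groups come from the SAME distribution: no real effect exists.
    The 'analyst' tries several analyses and keeps the best p-value."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    analyses = [
        two_tailed_p(a, b),                              # planned comparison
        two_tailed_p(a[: n // 2], b[: n // 2]),          # "first-half subgroup"
        two_tailed_p(a[n // 2 :], b[n // 2 :]),          # "second-half subgroup"
        two_tailed_p(sorted(a)[2:-2], sorted(b)[2:-2]),  # "outliers removed"
    ]
    return min(analyses)

studies = 2000
false_positives = sum(p_hacked_study() < 0.05 for _ in range(studies))
print(f"False-positive rate after p-hacking: {false_positives / studies:.1%}")
```

Each individual test honestly controls its error rate at 5%; it is the freedom to pick the winner afterwards that inflates the overall false-positive rate well beyond it.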

The temptation of going astray can be tackled partly if scientists are asked to formulate their experimental and analysis plans before data collection. Chris Chambers, a cognitive neuroscientist at Cardiff University, promoted this approach through a journal. Scientists submitted ‘registered reports’, which were reviewed and revised until deemed sufficiently robust and statistically powerful, and only then did they proceed with their experiments. This does demand a lot of time but results in far more sound science. As an incentive, the journal published the scientists’ work whatever the outcome of the study. ‘Even negative results?’ I hear you say. Most definitely yes.

Publication bias is another culprit for the reproducibility crisis. Scientists only seem to get their work recognised if they find something sensational, not if they find nothing. But if other scientists do not know the circumstances in which you found nothing, they will make the same mistakes in their own studies, draining resources and stalling the progress of science. Science is just as much about finding nothing as it is about finding something. We are in dire need of a cultural shift.

We also need scientists to leave their egos out of the picture. Settling differences between original authors and replicators often becomes a tiresome game of dodgeball. Daniel Kahneman, a cognitive psychologist, proposed a ‘replication etiquette’ whereby replicators must consult the original author throughout the process, as the results might be sensitive to specific conditions. Dorothy suggests that “We need to acknowledge that we all make mistakes – a lot. Lack of replication doesn’t mean the result was wrong, but getting defensive is problematic.” Similarly, attacking the original result is also problematic because the replication itself may have been poor. However, if a result is highly dependent on specific conditions, is it an important finding anyway?

One way the life sciences can learn from their older sibling, the physical sciences, is by collaborating in research rather than competing in it. Tony Weidberg, a physicist at the University of Oxford and the Large Hadron Collider, provided insight into the way physics questions are approached. Physicists often work on projects in large teams, which allows for large sample sizes and high statistical power. Sub-teams develop different experimental methods to test the same hypothesis; if they all reach the same conclusion, the result is considered more robust. “Science seeks to find patterns in nature that have some sort of generalisability,” Dorothy adds, “so we need to be more open to coming at the same question from different angles.” In fact, Kahneman developed a protocol whereby researchers go head to head in an adversarial collaboration. This reduces the likelihood and effects of biases and encourages cross-questioning.
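Why do large sample sizes matter so much? A minimal sketch makes the point about statistical power: here a genuine but modest effect exists (the effect size and group sizes are hypothetical choices for the demonstration), and small studies usually miss it while a large collaborative study almost always finds it.

```python
import math
import random

random.seed(3)

def detects_effect(n, effect=0.3, alpha=0.05):
    """One simulated study: two groups with a TRUE mean difference of `effect`.
    Returns True if a z-test (adequate for these sample sizes) is significant."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(effect, 1) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = abs(ma - mb) / math.sqrt(va / n + vb / n)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))) < alpha

runs = 1000
powers = {}
for n in (20, 200):  # a small lab study vs. a pooled collaborative sample
    powers[n] = sum(detects_effect(n) for _ in range(runs)) / runs
    print(f"n = {n:>3} per group: detected the real effect in {powers[n]:.0%} of studies")
```

An underpowered study that fails to replicate a true effect tells us little; pooling participants across teams raises the power, so both positive and negative results become informative.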

Many research funding bodies are starting to actively endorse initiatives to clean up the field and rebuild scientific credibility. They are investing in training and raising researchers’ awareness of the ethics of good science. Scientists need to be more transparent with their data and analyses, formulate their hypotheses at the start of their research and have a sound statistical understanding. Scientific journals need to shift away from positive-results bias, incentivise replication studies, and be more representative of the field. Dorothy is looking forward to the revolution of reproducible science. “I no longer feel like an outlier recognising the crisis. These are times of great change,” she says. A UK Reproducibility Network (UKRN) was formed in October 2018, and Oxford has its own consortium of professionals from a range of science and humanities divisions dedicated to improving reproducibility and open research.

So science may not be the ultimate truth you expected it to be, but scientists are bravely testing uncharted waters every day. And with so much unknown, there are bound to be irreproducible findings. But with good scientific practice we can ask the right questions to understand why that is without distorting the truth.