
SfN 2014—Failure to Replicate Goes Under the Microscope

1 Dec 2014

December 2, 2014. On Sunday morning, November 16, scientists at the 2014 Society for Neuroscience meeting in Washington, D.C., tore themselves away from the latest research reports for a discussion of one of the most important, albeit uncomfortable, issues in modern biomedical science: the lack of reproducibility of research studies.

Speakers representing different arms of the research enterprise—the National Institutes of Health (NIH), research journals, and academia—presented varying strategies to combat the problem, though several common themes emerged. The session did not deal with research misconduct, the willful and fraudulent misrepresentation of the truth, but instead focused on inadvertent events that undermine the reliability of results.

Recent projects aimed at replicating preclinical findings on spinal cord injury and amyotrophic lateral sclerosis (ALS) have reported dismal news: only about 10 percent of spinal cord injury findings, and none of the 70 ALS drugs for which efficacy had previously been reported, could be replicated (Steward et al., 2012; Scott et al., 2008). No scientific discipline seems immune to the reproducibility problem. For example, only six of 53 "landmark" studies in hematology and oncology could be confirmed (Begley and Ellis, 2012).

Session co-moderator Story Landis of the National Institute of Neurological Disorders and Stroke (NINDS) in nearby Bethesda, Maryland, provided the session's introductory remarks. She suggested that four factors contribute to the reproducibility problem: the limited space in articles to adequately describe methods, a lack of transparency in the methods used, statistical naiveté, and the complexity of the biological systems under study.

A goal, said Landis, is to strike "the right balance between rigor and reproducibility." At stake is not only the progress toward new treatments, but also the public's trust in the scientific endeavor, which has already been shaken due to recent reports of the problem in the media (e.g., The Economist).

View from the NIH

The first featured speaker was another local resident, NIH director Francis Collins, who gave his take on the causes of a reliability problem that, he said, is "casting a cloud over biomedical research." Collins presented an equation in which the combination of deficient experimental procedures, lack of transparency in reporting, and publication bias results in poor reproducibility.

"This session is not an attempt to beat up investigators," he said. Instead, the focus is trying to find a path forward, Collins added. He called for the publication of negative outcomes as well as positive ones. He also highlighted the need for better reporting of experimental procedures, pointing to the fact that very few studies provide critical details such as whether randomization, blinded assessments of outcome, and power calculations of sample size were used.

Collins discussed several recent workshops the NIH has held over the past few years to determine best practices (Landis et al., 2012). Participants included scientists and journal editors as well as representatives from the Pharmaceutical Research and Manufacturers of America (PhRMA), an organization representing drug companies.

Collins also described several specific measures the NIH is taking to increase reproducibility, including exploring the idea of funding replication studies, compiling reviewer checklists to standardize the review process, and creating PubMed Commons, where researchers can share opinions as well as methodological details.

… From a journal

Veronique Kiermer from Nature in New York City described the journal's perspective on the reproducibility problem. She noted that the number of formal corrections to articles has increased in the past 10 years and that many of these could have been eliminated if the research had been more rigorous.

In May 2013, Nature implemented a number of measures that focused on awareness and reporting: a checklist of reporting standards for reviewers; elimination of length limits for methods sections; increased scrutiny of statistics; and a re-emphasis on data sharing. Early assessments indicate that while the measures are not yet changing how experiments are being done, they are increasing how much information is being reported, she said.

According to Kiermer, the role of journals is to raise awareness of the problem and be catalysts and facilitators of discussions. They might also be able to drive some changes by ensuring full reporting, effective review, and measured conclusions. Journals should respond quickly and thoroughly to criticisms of published studies, she added.

However, despite playing a large role, journals are only one component that can effect change. "This is an ecosystem. It will take all of us," she said. Universities and institutions can improve training as well as oversight and compliance. In addition, they need to prioritize putting the infrastructure and support in place for data sharing (including computer code) and replication, and provide incentives and recognition for good laboratory leadership, Kiermer added.

… From a scientist

Huda Zoghbi of Baylor College of Medicine in Houston, Texas, provided the senior investigator's perspective on why translation from preclinical to clinical research is so poor. She emphasized the need for researchers to truly understand the disease they are modeling. This requires more than reading a few reviews or meeting patients socially, she said, and encouraged collaboration with clinicians.

Animal models have great value, but also limitations, said Zoghbi. They must have both construct and face validity, she added, noting that the latter is especially challenging due to differences between human and rodent brains. She then touched on several aspects of study design, data analysis, and interpretation that affect reproducibility, including the type of statistical test used and the appropriate way to calculate sample size (e.g., when 20 neurons are captured from five mice, n = 5 … not 20!).
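
To make the unit-of-analysis point concrete, the sketch below collapses nested measurements (neurons within mice) to one value per animal before testing, so that n reflects the number of mice. The simulated firing-rate data and variable names are illustrative assumptions, not from Zoghbi's talk.

```python
# Illustrative handling of nested data (simulated values): 20 neurons recorded from
# 5 mice per group should be analyzed with n = 5 mice, not n = 20 neurons.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
records = []
for group in ("control", "treated"):
    for mouse in range(5):                      # 5 mice per group
        shift = 1.0 if group == "treated" else 0.0
        for _ in range(4):                      # 4 neurons per mouse, 20 per group in total
            records.append({"group": group,
                            "mouse": f"{group}_{mouse}",
                            "firing_rate": rng.normal(10 + shift, 2)})
df = pd.DataFrame(records)

# Collapse to one mean per animal, then compare groups with n = 5 per group.
per_mouse = df.groupby(["group", "mouse"])["firing_rate"].mean().reset_index()
control = per_mouse.loc[per_mouse["group"] == "control", "firing_rate"]
treated = per_mouse.loc[per_mouse["group"] == "treated", "firing_rate"]
t, p = stats.ttest_ind(treated, control)
print(f"t = {t:.2f}, p = {p:.3f} (degrees of freedom based on mice, not neurons)")
```

Treating each neuron as an independent observation would inflate the effective sample size and understate the p-value; mixed-effects models are another common way to handle this kind of nesting.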

The culture of science is also contributing to the reproducibility problem, said Zoghbi. Most preclinical studies are designed around the short time frames of student or postdoc projects rather than continued long enough to assess long-term outcomes. Study design might also share some of the blame, she continued. Therapies that are effective in young animals may not work in older ones, so "we need to think carefully about the timing of trial initiation and outcome measures."

Turning to journal editors, Zoghbi said that journals are much less likely to publish a study that failed to replicate one of the journal's earlier reports. She advocated for a new system in which a failed replication would appear in the same high-quality journal as the original report; that, she said, would earn the public's respect. On the front end, another potential way to increase replication is to encourage collaborations in which different teams complete studies in parallel and share authorship, Zoghbi added.

… From a dean

The final speaker was John Morrison of Mount Sinai School of Medicine in New York City, who, as a dean of graduate students, postdocs, and faculty, brought an education administration viewpoint to the session. He called for a lab culture in which best practices such as prospective study design and careful methodological record keeping are openly discussed and made a top priority. It's not good science when students in the lab are proving, rather than testing, a hypothesis, he cautioned.

Morrison highlighted a recent paper that he encouraged audience members to use as a syllabus for training students in the responsible conduct of research (Steward and Balice-Gordon, 2014). The paper covers a variety of topics, including a priori power calculations, appropriate use of randomization and blinding, and ways to minimize bias.

In addition, faculty members need to recognize and confront the enormous pressure to publish in a high-profile journal in order to get or keep a job, said Morrison. We are the editors and reviewers of journals, the reviewers of grants, and the members of appointments and promotion committees, he noted; it's up to us to change our standards and expectations, and to evaluate candidates not only on impact but on rigor as well. "We've seen the enemy, and it is us. But we've also seen the solution, and it is us," he concluded.

In the panel discussion that followed, several additional ideas were raised. Session co-chair Tom Insel of the National Institute of Mental Health (NIMH) suggested that journal articles should not be an end product, but rather an "advertisement" for results that points those interested to a more complete description of both the methods and the data located elsewhere. Nora Volkow of the National Institute on Drug Abuse (NIDA) highlighted a source of untapped information, calling for an investigation into the characteristics of the studies that have been successfully replicated.—Allison Marin.