Biomedical science uncovers new disease genes, determines which cancer treatments work and helps us decide what to eat and how much to sleep. But a concern is rising among biomedical leaders: What if the studies are wrong?
An emerging breed of scientists known as “meta-researchers” is taking a close look at how modern science is conducted and reported. Their analyses are sobering. They suggest that despite researchers’ best intentions, much of the published evidence guiding the health advice dispensed by physicians or on the evening news is misleading — or just plain false.
It’s not that the age-old scientific method — in which researchers make predictions about how the world works and collect data to test them — is flawed. If a hypothesis is true, subsequent experiments should confirm it. If it is false, it should falter in further testing and eventually fall out of consideration.
Reproducibility and self-correction are core features of this process. Yet to advance their careers, scientists feel compelled to publish often, and reporting a new discovery looks better than painstakingly confirming what another lab has reported. As a result, unvalidated claims dominate the biomedical literature while null findings, those that fail to support a hypothesis, lurk within file cabinets and hard drives.
Leaders of the U.S. National Institutes of Health acknowledge this problem, and in a January Nature commentary, NIH director Francis Collins announced plans to explore how to enhance research reproducibility. That same month, a special issue of The Lancet featured five papers outlining steps to boost value and cut waste in biomedical research. And meta-researchers around the globe are devoting serious attention and resources toward the cause.
They include John Ioannidis, MD, DSc, and Steven Goodman, MD, PhD, who co-direct the new Meta-Research Innovation Center at Stanford, known as METRICS. “Meta-research is not a recognized field, in a sense. There are no departments, yet many people work in this area,” says Goodman, associate dean for clinical and translational research at Stanford. “Our goal is to help define meta-research as a coherent and cohesive field of scholarship and policy action.”
It’s that second component they hope will drive change. “Right now a lot of what’s done is research. There are fewer groups translating the research into actual research policy,” Goodman says. “We see METRICS as a research-to-action center where we identify key areas in which different policies might help, and work to develop those policies or develop the evidence base behind them.”
METRICS working groups will focus on improving the quality of peer review, educating scientists and trainees on statistical methods, facilitating data sharing and openness, and shifting funder and academic incentives to promote research reproducibility.
A mega meta-researcher
Ioannidis plunged into the “research of research” after seeing dozens of heart patients receive angioplasty during his internal medicine residency at Boston’s New England Deaconess Hospital in the early 1990s. Known as percutaneous coronary intervention, or PCI, the procedure helps clear clogged arteries. Curiously, though, many who received the intervention had arrived at the hospital stable and symptom-free.
“I was often asking myself, ‘Why are we doing this?’ or ‘Why am I being told to do this test or give this treatment?’ It was not clear to me,” says Ioannidis, a former college math whiz in Athens, who turned to medicine “to have direct impact on human beings.” Medical decision-making seemed largely intuitive and expert-based: “mostly gut feeling and a little bit of tradition,” recalls Ioannidis, now a professor of medicine and of health research and policy at Stanford. Today he has more than 700 papers to his name. In his work, he has collaborated with scientists from various disciplines who share similar interests and concerns, including clinical investigators, epidemiologists, neuroscientists, psychologists and geneticists.
The head-scratching angioplasty procedures during his residency prompted Ioannidis (pronounced yo-a-NEE-dees) and colleagues many years later to sift through data from 11 randomized trials comparing PCI with conservative medical treatment. The studies involved about 3,000 people with stable coronary artery disease. The bottom line: Barring the subset of patients who had suffered a recent heart attack, PCI offered no benefit. Meanwhile, millions had undergone this invasive procedure for no good reason, says Ioannidis.
Meta-analyses such as these are the currency of “evidence-based medicine,” a movement that gained traction in the 1990s and still guides medical practice today. It is fueled by the belief that physicians should not advise on instinct, or even on the basis of individual reports, but rather draw objective conclusions by synthesizing data from the wider body of published literature. Evidence-based medicine was embedded in law as part of the Affordable Care Act. Signed by President Barack Obama in 2010, the health-care reform law mandates a shift to reimbursing providers based on health outcomes rather than visits, tests or procedures.
While evidence-based medicine sounds reasonable in theory, in practice the approach is precarious, Ioannidis says. Data from the individual studies that make up a meta-analysis are often unreliable or unavailable.
Years earlier, UC-Riverside psychologist Robert Rosenthal, PhD, called this the “file drawer problem.” In a 1979 Psychological Bulletin paper, he assessed the impact of the predicament, in which journals overflow with research studies claiming effects that are not real while the many studies failing to show a statistically significant effect remain tucked inside file cabinets. In some cases, Rosenthal argued, it wouldn’t take many of these “filed” analyses to make a published result non-significant.
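The arithmetic behind that last point can be sketched in a few lines. The Python below is a minimal illustration in the spirit of the "fail-safe" calculation associated with Rosenthal's paper, not a reproduction of it. It assumes each published study is summarized by a z-score, that studies are pooled with Stouffer's combined z, and that the filed studies average z = 0. The function names and the three z-scores are hypothetical.

```python
import math

Z_CRIT = 1.645  # one-tailed critical z for p = 0.05


def combined_z(z_scores, hidden_nulls=0):
    """Stouffer's combined z, after adding `hidden_nulls` studies that average z = 0."""
    return sum(z_scores) / math.sqrt(len(z_scores) + hidden_nulls)


def fail_safe_n(z_scores):
    """Number of filed z = 0 studies that would pull the combined z down to Z_CRIT."""
    k = len(z_scores)
    return max(0.0, sum(z_scores) ** 2 / Z_CRIT ** 2 - k)


# Hypothetical z-scores from three published, individually significant studies.
published = [2.1, 1.8, 2.4]

print(round(combined_z(published), 2))      # pooled evidence from published work alone
print(round(fail_safe_n(published), 1))     # filed null studies needed to erase it
print(round(combined_z(published, 12), 2))  # pooled z once 12 null studies are added
```

With these made-up inputs, roughly a dozen unpublished null studies would be enough to bring the pooled result below the conventional significance threshold.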
Ioannidis saw from his own work how easily research could go awry. From designing experiments to running them, analyzing results and writing them up, “it was very easy to make errors,” he says. “And many of the errors I saw in other papers I was also seeing in my own work, despite very good intentions.”
Could it be that many reports in the biomedical literature are, in fact, wrong? Ioannidis wanted to see if this longstanding hunch could be evaluated with mathematical rigor. Using a systematic approach, he calculated the likelihood that a biomedical research study would yield a true result, and determined how various factors, such as sample size and conflicts of interest, affect that probability. His decade-long efforts culminated in a 2005 PLoS Medicine essay, “Why Most Published Research Findings Are False.” By April 2014, the freely downloadable paper had received more than 1 million views — the first PLoS article to reach this milestone.
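For readers who want to see the kind of calculation involved, here is a minimal Python sketch of the positive predictive value formula set out in the 2005 essay: the probability that a statistically significant finding reflects a true relationship. The symbols follow the essay (R for pre-study odds, alpha for the type I error rate, beta for the type II error rate, u for bias). The function name and the two sets of inputs are illustrative assumptions, not figures quoted in this article.

```python
def ppv(prior_odds, alpha=0.05, power=0.8, bias=0.0):
    """Probability that a claimed (statistically significant) finding is true.

    prior_odds: R, ratio of true to null relationships among those tested
    alpha:      type I error rate
    power:      1 - beta, chance of detecting a true relationship
    bias:       u, share of would-be-negative analyses reported as positive
    """
    beta = 1.0 - power
    # True relationships reported as positive (detected, or missed but flipped by bias).
    true_positives = (1 - beta) * prior_odds + bias * beta * prior_odds
    # Null relationships reported as positive (by chance, or flipped by bias).
    false_positives = alpha + bias * (1 - alpha)
    return true_positives / (true_positives + false_positives)


# Illustrative inputs: a well-powered trial with even pre-study odds and little bias.
print(round(ppv(1.0, power=0.8, bias=0.1), 2))  # about 0.85

# Illustrative inputs: an underpowered exploratory study with long odds and more bias.
print(round(ppv(0.2, power=0.2, bias=0.2), 2))  # about 0.23
```

Under the second set of assumptions, fewer than one in four "positive" findings would be true, which is the sense in which small samples and bias erode the odds.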
While scientists have long questioned the quality of biomedical research, “what John did is actually put numbers to it,” says Michael Bracken, PhD, a professor of epidemiology at Yale. “That was perhaps the most startling aspect of the PLoS Medicine paper — that so much of what we publish is almost certainly exaggerated or false.” The essay was “an instant classic,” says Bracken, who teaches a course in evidence-based medicine. “It became required reading for my students.”
In his 2005 essay and subsequent studies, Ioannidis used creative approaches to address the scope of the research quality problem. “He applied analytical rigor and provided a conceptual framework to address many of the issues that have been discussed,” says Deborah Zarin, MD, of the National Institutes of Health in Bethesda, Md. She oversees ClinicalTrials.gov, the world’s largest registry and results database for clinical trials and observational studies.
A look at clinical trials
Clinical trials are research studies conducted on people to test the safety and effectiveness of therapies or devices. Often costing hundreds of millions of dollars, these studies are the final crucible in the long, excruciating process by which a tiny subset of experimental treatments reaches the market. On average, fewer than 20 percent of potential medications that enter clinical testing will complete the final stages. For tough fields such as cancer and neurologic diseases, that number hovers at a measly 5 percent to 10 percent.
Making matters worse, the results of nearly a third of clinical trials do not get published. If a research study doesn’t see the light of day, it doesn’t contribute to the knowledge base that guides patient care, says Goodman.
“This is a chronic problem that has been documented since the mid-1980s, and the figure has not changed,” Goodman says. That translates into millions of dollars wasted on research whose outcomes remain hidden from the public domain and unavailable to guide future science.
Part of the issue is that most research yields null results, yet journals tend to publish positive data. However, the bigger problem, according to a study by the U.S. Cochrane Center, lies less with journals rejecting negative data and more with authors not submitting them. It makes sense, psychologically. “You’ve got lots of projects. Some have really exciting positive results and you’ve got a few negative ones over here. Which ones do you bother to take to conferences, write up, present? It’s those exciting positives,” explains Paul Glasziou, PhD, MBBS, professor of evidence-based medicine at Bond University in Australia, in a podcast accompanying the recent Lancet series on research. “But the negatives are important to communicate as well, because they tell people dead ends but also add to the body of research. You get an overestimate of the positivity of the net set of results if you only have the positive studies.”
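The overestimate Glasziou describes can be illustrated with a toy simulation. The Python sketch below assumes a modest true effect, many small studies whose estimates scatter around it, and a rule that only significantly positive results get written up. Every number and name in it is a hypothetical chosen for illustration, not data from any study mentioned here.

```python
import random
import statistics


def simulate(true_effect=0.2, n_per_study=50, n_studies=1000, seed=0):
    """Compare the average effect across all studies with the 'published' subset."""
    rng = random.Random(seed)
    # Standard error of a study's mean, assuming unit variance per participant.
    se = 1 / n_per_study ** 0.5
    # Each study's estimate is the true effect plus sampling noise.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Only studies significant at one-tailed p < 0.05 leave the file drawer.
    published = [e for e in estimates if e / se > 1.645]
    return statistics.mean(estimates), statistics.mean(published), len(published)


all_mean, published_mean, n_published = simulate()
print(f"true effect: 0.2, mean of all studies: {all_mean:.2f}")
print(f"mean of {n_published} published studies: {published_mean:.2f}")
```

Because only estimates above the significance cutoff survive, the published average is bound to sit above the true effect, even though no individual study was done badly.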
And even before research data gets reported and disseminated, waste can occur at many other levels, from a study’s conception to its design, conduct and analysis. “There’s a chain of inefficiency,” explains Ioannidis. In a 2009 Lancet Viewpoint, meta-researchers Glasziou and Iain Chalmers, MBBS, of Oxford’s James Lind Library, examined this “chain” and estimated that more than 85 percent of biomedical research is lost to correctable problems in the production and reporting of evidence.
The problem begins with animal research, which precedes clinical trials. How well an experimental compound performs in animals often determines whether it’s worth testing in people. It’s a high bar. Only 1 in 6,000 new compounds that pharmaceutical companies test in cell and animal models each year moves into the first phase of clinical testing.
But are animal experiments reliable enough to be steering the multimillion-dollar clinical enterprise? In directing many randomized trials, Bracken says “it became increasingly obvious that animal work, which we depend on, has been a very poor predictor of what happens in humans. The methods used in animal research are substantially flawed.”
At the lab bench, researchers seldom apply the safeguards they routinely use in clinical trials. Often they don’t randomize the animals into treatment groups. Nor do they mask the experimenter who’s judging how the animals fare with treatment. Bracken and colleagues raised these issues in a 2004 British Medical Journal commentary and revisited them in a follow-up BMJ analysis published this May. “The trajectory is improving but is still far short of what it needs to be,” Bracken says. Recognizing the need for better education, the NIH is stepping up its efforts to train postdoctoral fellows in designing good experiments and conducting them responsibly. “There is wider recognition that people who do animal work need more training in design and statistical methods,” says Bracken.
Finally, there is a lack of reproducibility, the ability to replicate previously reported findings, which is essential to making science a self-correcting enterprise. In the recent Nature comment, NIH director Collins writes that the “complex system for ensuring the reproducibility of biomedical research is failing and is in need of restructuring.” The ability to reproduce published data can determine whether an experimental compound has a future, and whether a company will spend big money developing it. These high stakes have prompted pharmaceutical researchers to take a hard look at the published data on potential drug targets. They asked a simple question: Is it trustworthy?
Bayer HealthCare scientists analyzed in-house target validation projects in oncology, women’s health and cardiovascular disease. In a 2011 comment in Nature Reviews Drug Discovery, they reported that the company’s validation teams failed to replicate published data about three-quarters of the time. A similar effort by Amgen scientists also yielded dismal results. Hematology and oncology researchers combed the literature for 53 “landmark” papers and found their findings confirmed in only six cases, they reported in Nature in 2012.
Why such difficulty? Some would say science is inherently challenging. It’s an endeavor in which failures are frequent and successes incremental. Others blame misdirected incentives. “There isn’t a culture for replication. The drive is entirely toward innovation,” says Brian Nosek, PhD, a social psychologist at the University of Virginia in Charlottesville. “We are incentivized to make our research look more beautiful than it actually is. As a consequence, the published literature is a skewed representation of reality.”
Plus there’s job competition. Researchers advance their careers by publishing papers. “Whether they’re right or wrong has little consequence, especially in the short term of getting jobs and tenure,” Nosek says. “‘I want to get it right, but I need to survive.’ We’re faced with these sorts of decisions.”
In early 2013, Nosek and one of his graduate students, Jeff Spies, founded the Center for Open Science — a nonprofit tech startup that aims to align scientific practices with scientific values. Headquartered a few miles from the University of Virginia, the center builds open-source tools to improve scientific workflow; works with publishers and societies to develop incentives that encourage transparency; and, along with Stanford’s METRICS, supports meta-science. This includes “reproducibility projects” — large collaborative efforts in which individual studies are crowdsourced for replication by individual research teams. One reproducibility project was launched in psychology in late 2011, another in cancer biology six months ago, and future projects in neurodegeneration and ecology are in the works, Nosek says.
Because cancer biology studies are costly, Science Exchange, a network of verified research labs that run experiments on a fee-for-service basis, is conducting the replications. “Researchers can go online to find an expert to do an experiment for them,” says Elizabeth Iorns, PhD, who founded the Palo Alto, Calif.-based company in 2011. “It is a marketplace. You can search the database for whoever offers your experimental need and order it from them.”
From initiatives that improve the quality of animal work to new policies promoting transparency and reproducibility, meta-science is slowly but surely coming of age. As for where to start and what to prioritize, “I feel a little bit like a child in a candy shop,” says Ioannidis, who receives more than 1,000 invitations each year to lecture on his meta-research. The good thing is that “many are sensitized to these issues — people who can push for transformation that could improve very different fields,” he says. METRICS plans to organize a conference in late 2015 for meta-researchers in biomedicine, as well as those who study the research process in other fields.
“There’s a lot of parallel activity going on and a lot to be learned from things that physicists and social scientists are doing, for instance,” says Goodman. “And they have a lot to learn from what’s going on in biomedicine. That’s part of the beauty of the center. We can reach across these boundaries and see what everybody’s doing and inform each other. The issues are not unique to biomedical science.”