Surrogate endpoints in global health research: searching for silver bullets?
In clinical research, there is widespread acceptance that surrogate endpoints might not translate into long-term benefits (e.g. lives saved). But in global health, we are often stunned when improvements in surrogate endpoints do not save lives. We do this, despite knowing that silver bullets don’t work in global health.
All students who learn about clinical trials and evidence-based medicine are taught about the hazards of surrogate endpoints, which are measurements (e.g. biomarker levels, lab test results, or short-term improvements in health status) that substitute for hard outcomes that are important for patients (e.g. avoiding premature death or severe disability). For example, in cardiovascular research, improvements in parameters such as blood pressure or cholesterol are often used, instead of outcomes such as cardiac deaths. In oncology, tumor shrinkage or reduction in biomarker levels are often used, instead of increased overall survival and improvement of quality of life.
But everyone doing cardiovascular or oncology research will freely acknowledge that such surrogate endpoints may not correlate with real outcomes of interest. In fact, they will tell you that improvements in surrogate endpoints can even increase risk of death. The clinical epidemiology literature is full of famous examples and case studies that illustrate the hazards of using surrogates.
In sharp contrast to clinical researchers, global health researchers do not seem to care much about the hazards of surrogate endpoints. At least, I have not seen much public debate or discussion about this issue.
I am convinced that we should also worry about surrogate endpoints in global health. I offer two examples. The first is the case of a new tuberculosis detection technology called Xpert MTB/RIF, an automated molecular test for TB and drug resistance (image). This tool was first endorsed by WHO in 2010. Since then, the test has been rolled out in many countries, and over 23 million tests have been done over the past 6 years.
Image credit: Madhukar Pai
While the test is rapid, accurate and much superior to what we have been using for decades, pragmatic randomized trials have shown disappointing results on long-term outcomes such as reductions in mortality and in TB incidence. This has prompted media headlines that read “Improved diagnostics fail to halt the rise of tuberculosis” (graphic below).
Image credit: Nature
Another example is the recent large trial in India of the WHO Safe Childbirth Checklist (graphic below), a quality-improvement tool that promotes systematic adherence to practices that have been associated with improved childbirth outcomes.
Image credit: WHO
In a large-scale study in 24 districts in India, birth attendants’ adherence to essential birth practices was higher in facilities that used the coaching-based WHO Safe Childbirth Checklist program than in those that did not, but maternal and perinatal mortality and maternal morbidity did not differ significantly between the two groups. This prompted media headlines that read “A birth checklist fails to reduce deaths in rural India” and “A lifesaving childbirth tool was successfully introduced in India—but saved no lives.”
So, in global health, we seem to have great expectations that new tools and checklists will save lives, and we are stunned and disappointed when they don’t. But everyone involved in global health research is aware of the painful realities of the weak health systems in which we work. A tool or a checklist or a mobile app or a drone might indeed improve some surrogate endpoints, but might fail to meaningfully improve patient outcomes, because long-term outcomes improve only when a whole series of causal events is completed. In other words, the entire cascade of care needs to improve, and merely improving the initial step in the cascade (or causal chain) might not improve overall outcomes or result in sustained benefit.
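The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration: the step names and probabilities below are made up, not taken from any trial, and the sketch assumes each step of the cascade is completed independently of the others, which real health systems will not satisfy exactly.

```python
from math import prod


def cascade_success(step_probs):
    """Chance that a patient completes every step of the cascade.

    Simplifying assumption: steps are independent, so the overall
    probability is the product of the per-step probabilities.
    """
    return prod(step_probs.values())


# Hypothetical per-step completion probabilities (illustration only).
baseline = {
    "correctly_diagnosed": 0.60,
    "started_on_treatment": 0.70,
    "completed_treatment": 0.60,
}

# A better tool improves only the first step of the cascade.
better_tool_only = {**baseline, "correctly_diagnosed": 0.90}

# A strengthened health system improves every step.
stronger_cascade = {step: 0.90 for step in baseline}

for label, cascade in [
    ("baseline", baseline),
    ("better tool only", better_tool_only),
    ("stronger cascade", stronger_cascade),
]:
    print(f"{label}: {cascade_success(cascade):.1%}")
```

With these invented numbers, improving only the first step raises overall success from about 25% to about 38%, while improving every step raises it to about 73%. The weakest downstream steps cap what a better first step can achieve.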
Take the case of a new TB test, for example. I am convinced that tests can never save lives, for any disease. It is timely and correct treatment that saves lives. The purpose of a TB test is to rapidly and accurately identify patients who have TB (which Xpert MTB/RIF does wonderfully well, better than any TB test in history). Once this is done, the TB test fades into the background. Its job is done. Other factors become more prominent. What TB treatment is initiated for the patient, and how quickly does this happen? What systems are put in place to support patients through a long course of anti-TB therapy? How good are the treatment completion rates? What about treatment of comorbid conditions such as HIV/AIDS, diabetes and malnutrition? Taken together, all these interventions should (and do) save lives. But is it fair to expect the initial TB test (at time T0) to influence outcomes after a year (e.g. time T1) or longer, especially when we know the cascades of TB care are broken in so many countries? The graphic below shows the steps between the initial test and patient outcomes of importance. So, it is not as simple as doing a test at diagnosis!
Source: Schumacher et al. PLoS ONE 2016
In the case of the Better Birth Trial, is it fair to expect that ticking boxes on a checklist will automatically save lives? The purpose of any checklist, whether it is for aviation or healthcare, is to ensure that essential tasks are done. During childbirth, this includes tasks such as hand washing and blood pressure monitoring. But if the most vulnerable women do not come to health facilities on time, or if pregnant women referred for urgent hospital care are unable to go to higher levels of care, or if the health centers lack facilities for C-section or blood transfusion, improvements in surrogate endpoints such as checklist completion might not save lives. Indeed, this is probably what happened in the Indian trial. As Atul Gawande, one of the authors of the trial, said: “The improvement in quality was part of the answer, but it was not enough. We need to understand what more needs to be added to get to the endpoint everyone wants.”
So, how should we deal with this problem of surrogate endpoints in global health? We cannot get rid of surrogate endpoints, but we need to get smarter about using and interpreting them. Surrogate endpoints are sometimes necessary, since waiting for long-term patient outcomes could greatly delay the introduction of a good tool or a lifesaving drug or vaccine. On the other hand, we cannot use surrogates naively, without alerting consumers and policy makers to the dangers inherent in such endpoints. Here, we must learn from clinical epidemiologists and clinical trialists.
As Grimes and Schulz have argued, “researchers should avoid surrogate end points unless they have been validated; that requires at least one well done trial using both the surrogate and true outcome.” Kemp and Prasad suggest that “the use of surrogate outcomes should be limited to situations where a surrogate has demonstrated robust ability to predict meaningful benefits, or where cases are dire, rare or with few treatment options. In both cases, surrogates must be used only when continuing studies examining hard endpoints have been fully recruited.”
So, in global health, we need to design stronger studies where we can demonstrate how surrogate endpoints alter subsequent causal events, and how they influence patient outcomes (or not). For example, in the case of TB diagnostic trials, if we care about reducing mortality after TB testing, then we should also ensure TB treatment completion is improved. This requires a shift from thinking about tools to complete, patient-centric solutions. A shift from fixing the initial part of the care cascade to strengthening the entire care cascade. A shift from diagnostic trials to evaluations of the entire ‘test and treat’ strategy.
In addition, when we publish studies with surrogate endpoints, we need to explicitly warn our readers and temper unreasonable expectations. We need to lower our own expectations as researchers, and most definitely avoid hyping up positive results on surrogate endpoints. In the same vein, we must not get too disappointed when improvements in surrogate endpoints don’t save lives. In the case of Xpert MTB/RIF and the WHO childbirth checklist, neither tool should be abandoned just because of the trial results. But we do need to appreciate that tools and checklists can only go so far. These are not silver bullets that can magically fix underlying health system deficiencies. If we care about making a real difference, we need to work on strengthening health systems to ensure complete solutions for our patients. There are no easy shortcuts to success in global health.