Surrogate endpoints in global health research: searching for silver bullets?

In clinical research, there is widespread acceptance that surrogate endpoints might not translate into long-term benefits (e.g. lives saved). But in global health, we are often stunned when improvements in surrogate endpoints do not save lives. We do this, despite knowing that silver bullets don’t work in global health.
Published in Microbiology

All students who learn about clinical trials and evidence-based medicine are taught about the hazards of surrogate endpoints, which are measurements (e.g. biomarker levels, lab test results, or short-term improvements in health status) that substitute for hard outcomes that are important for patients (e.g. avoiding premature death or severe disability). For example, in cardiovascular research, improvements in parameters such as blood pressure or cholesterol are often used, instead of outcomes such as cardiac deaths. In oncology, tumor shrinkage or reduction in biomarker levels are often used, instead of increased overall survival and improvement of quality of life.

But everyone doing cardiovascular or oncology research will freely acknowledge that such surrogate endpoints may not correlate with real outcomes of interest. In fact, they will tell you that improvements in surrogate endpoints can even increase risk of death. The clinical epidemiology literature is full of famous examples and case studies that illustrate the hazards of using surrogates.

In sharp contrast to clinical researchers, global health researchers do not seem to care much about the hazards of surrogate endpoints. At least, I have not seen much public debate or discussion about this issue.

I am convinced that we should also worry about surrogate endpoints in global health. I offer two examples. The first is the case of a new tuberculosis detection technology called Xpert MTB/RIF, an automated molecular test for TB and drug resistance (image). This tool was first endorsed by WHO in 2010. Since then, the test has been rolled out in many countries, and over 23 million tests have been done over the past 6 years.

Image credit: Madhukar Pai

While the test is rapid, accurate and much superior to what we have been using for decades, pragmatic randomized trials have shown disappointing results on improvements in long-term outcomes such as reduction in mortality and reduction in TB incidence. This has prompted media headlines that read “Improved diagnostics fail to halt the rise of tuberculosis” (graphic below).

Image credit: Nature

Another example is the recent large trial in India of the WHO Safe Childbirth Checklist, a quality-improvement tool (graphic below) that promotes systematic adherence to practices that have been associated with improved childbirth outcomes.

Image credit: WHO

In a large-scale study in 24 districts in India, birth attendants’ adherence to essential birth practices was higher in facilities that used the coaching-based WHO Safe Childbirth Checklist program than in those that did not, but maternal and perinatal mortality and maternal morbidity did not differ significantly between the two groups. This prompted media headlines that read “A birth checklist fails to reduce deaths in rural India” and “A lifesaving childbirth tool was successfully introduced in India—but saved no lives.”

So, in global health, we seem to have great expectations that new tools and checklists will save lives, and we are stunned and disappointed when they don’t. But everyone involved in global health research is aware of the painful realities of the weak health systems in which we work. A tool or a checklist or a mobile app or a drone might indeed improve some surrogate endpoints, but might fail to meaningfully improve patient outcomes, because long-term outcomes improve only when a whole series of causal events is completed. In other words, the entire cascade of care needs to improve, and merely improving the initial step in the cascade (or causal chain) might not improve overall outcomes or result in sustained benefit.
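To make the cascade argument concrete, here is a minimal sketch in Python. It assumes, as a deliberate simplification, that a patient benefits only by completing every step in the cascade and that the steps are independent, so the overall yield is the product of the step probabilities. All step names and numbers below are hypothetical and chosen for illustration only; they are not taken from any trial or programme.

```python
from math import prod


def cascade_yield(step_probabilities):
    """Fraction of patients who complete every step of the care cascade.

    Assumes independent steps, so the overall yield is simply the
    product of the per-step completion probabilities.
    """
    return prod(step_probabilities)


# Hypothetical per-step completion probabilities (illustrative only).
baseline = {
    "tested": 0.80,
    "correctly_diagnosed": 0.60,
    "treatment_started": 0.75,
    "treatment_completed": 0.70,
}

# A much better test improves only the diagnostic step.
better_test = dict(baseline, correctly_diagnosed=0.90)

# Strengthening the later steps as well, on top of the better test.
stronger_cascade = dict(
    better_test, treatment_started=0.95, treatment_completed=0.90
)

for label, steps in [
    ("baseline", baseline),
    ("better test only", better_test),
    ("better test + stronger cascade", stronger_cascade),
]:
    print(f"{label}: {cascade_yield(steps.values()):.1%} complete the cascade")
```

Under these made-up numbers, the better test raises the overall yield, but most patients are still lost at the later steps. The larger gain appears only when treatment initiation and completion improve too, which is the point of the paragraph above.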

Take the case of a new TB test, for example. I am convinced that tests can never save lives, for any disease. It is timely and correct treatment that saves lives. The purpose of a TB test is to rapidly and accurately identify patients who have TB (which Xpert MTB/RIF does wonderfully well, better than any TB test in history). Once this is done, the TB test fades into the background. Its job is done. Other factors become more prominent. What TB treatment is initiated on the patient, and how quickly does this happen? What systems are put in place to support patients through a long course of anti-TB therapy? How good are the treatment completion rates? What about treatment of comorbid conditions such as HIV/AIDS, diabetes and malnutrition? Taken together, all these interventions should (and do) save lives. But is it fair to expect the initial TB test (at time T0) to influence outcomes after a year (e.g. time T1) or longer, especially when we know the cascades of TB care are broken in so many countries? The graphic below shows the steps between the initial test and patient outcomes of importance. So, it is not as simple as doing a test at diagnosis!

Source: Schumacher et al. PLoS ONE 2016

In the case of the Better Birth Trial, is it fair to expect that ticking boxes on a checklist will automatically save lives? The purpose of any checklist, whether it is for aviation or healthcare, is to ensure that essential tasks are done. During childbirth, this includes tasks such as hand washing and blood pressure monitoring. But if the most vulnerable women do not come to health facilities on time, or if pregnant women referred for urgent hospital care are unable to go to higher levels of care, or if the health centers lack facilities for C-section or blood transfusion, improvements in surrogate endpoints such as checklist completion might not save lives. Indeed, this is probably what happened in the Indian trial. As Atul Gawande, one of the authors of the trial, said: “The improvement in quality was part of the answer, but it was not enough. We need to understand what more needs to be added to get to the endpoint everyone wants.”

So, how should we deal with this problem of surrogate endpoints in global health? We cannot get rid of surrogate endpoints but need to get smarter about using and interpreting them. Surrogate endpoints are sometimes necessary, since waiting for long-term, patient outcomes could greatly delay the introduction of a good tool or a lifesaving drug/vaccine. On the other hand, we cannot use surrogates naively, without alerting consumers and policy makers of the dangers inherent in such endpoints. Here, we must learn from clinical epidemiologists and clinical trialists.

As Grimes and Schulz have argued, “researchers should avoid surrogate end points unless they have been validated; that requires at least one well done trial using both the surrogate and true outcome.” Kemp and Prasad suggest that “the use of surrogate outcomes should be limited to situations where a surrogate has demonstrated robust ability to predict meaningful benefits, or where cases are dire, rare or with few treatment options. In both cases, surrogates must be used only when continuing studies examining hard endpoints have been fully recruited.”

So, in global health, we need to design stronger studies where we can demonstrate how surrogate endpoints alter subsequent causal events, and how they influence patient outcomes (or not). For example, in the case of TB diagnostic trials, if we care about reducing mortality after TB testing, then we should also ensure TB treatment completion is improved. This requires a shift from thinking about tools to complete, patient-centric solutions. A shift from fixing the initial part of the care cascade to strengthening the entire care cascade. A shift from diagnostic trials to evaluations of the entire ‘test and treat’ strategy.

In addition, we need to explicitly warn our readers when we publish studies with surrogate endpoints and lower unreasonable expectations. We need to lower our own expectations as researchers, and most definitely avoid hyping up positive results with surrogate endpoints. In the same vein, we must not get too disappointed when surrogate endpoints don’t save lives. In the case of Xpert MTB/RIF and WHO childbirth checklist, neither of these tools should be given up, just because of the trial results. But we do need to appreciate that tools and checklists can only go so far. These are not silver bullets that can magically fix underlying health system deficiencies. If we care about making a real difference, we need to work on strengthening health systems to ensure complete solutions for our patients. There are no easy shortcuts to success in global health.

Narender Kumar
about 6 years ago

Very informative and nicely elaborated.

Especially with regard to TB, where treatment regimens are months to years long, evaluating any effect on the final outcome within a year or two would be too early to draw any definitive conclusion.

Samuel G. Schumacher
about 6 years ago

Hi Madhu,

Thank you for starting this conversation — it is such an important topic. It is also a very complex and challenging one though. As you know, I have thought about this a fair bit for the area of infectious disease diagnostics and by now feel like I could write a book about it ;-) I will keep it here to a few comments on three quotes that I picked out from what you wrote above and with the view on infectious disease diagnostics in particular:

#1 Madhu: "We cannot get rid of surrogate endpoints but need to get smarter about using and interpreting them. Surrogate endpoints are sometimes necessary, since waiting for long-term, patient outcomes could greatly delay the introduction of a good tool or a lifesaving drug/vaccine. On the other hand, we cannot use surrogates naively, without alerting consumers and policy makers of the dangers inherent in such endpoints."

I agree. If we start out by saying that “impact studies” are a requirement before new TB diagnostics can be recommended for use, the result would simply be that companies stay away from making any new tool, since this bar is too high and risks stifling innovation. No such requirements exist in any other disease area (whether infectious or non-communicable disease, and whether for the WHO or the FDA). On the other hand, we do need to look beyond accuracy and need to be smarter about use and interpretation of surrogate endpoints. I’ve argued before that there are a number of ways in which we can do better, but I would highlight two very practical and very doable things in particular:
(1) Map out the care pathway into which we plan to insert a new tool and clearly identify how we think it will make a difference. This is also what I argued in the PLoS ONE paper you show in your post (attached again here): it forces you to think through the causal pathway through which a diagnostic tool can have an impact and helps identify the assumptions that need to hold for that potential to be fulfilled. This can also be done as part of a systematic review of diagnostic accuracy and is indeed encouraged by, for example, Cochrane (although often not done or only superficially).
(2) Conduct more operational, implementation and health systems research. This kind of research can help confirm or refute the assumptions identified when mapping out the care pathway. It is also essential when trying to generalise what has been found in prior studies to new settings.
If we do these two things well, I am sure it will greatly strengthen our ability to understand and predict what solution will or will not have an impact and where and what the key drivers are that will contribute to success or failure.

#2 Madhu: "This requires a shift from thinking about tools to complete, patient-centric solutions. A shift from fixing the initial part of the care cascade to strengthening the entire care cascade. A shift from diagnostic trials to evaluations of the entire ‘test and treat’ strategy."

I think we all agree that we need to think about complete solutions rather than just tools or technology. I also agree that doing more work beyond accuracy studies is key. But I don’t think that the answer is necessarily large randomised trials with mortality as the main outcome. This may seem very appealing at first glance, or even necessary when looking at the cancer/CVD literature on surrogate endpoints, but there are a lot of problems with test-treatment RCTs. Compared to trials of vaccines or drugs, this is a relatively new field methodologically and the additional complexities and challenges with it are still poorly understood. I’m attaching a few recent papers by Lavinia Ferrante di Ruffano on this topic for those interested. That is not to say that test-treatment RCTs don’t have a role but that they are not the obvious and simple answer some may think they are — especially when trying to generate broadly generalisable results that apply across wildly varying health systems. The best approach to look beyond accuracy will depend on the specific situation.

#3 Madhu: "In the case of Xpert MTB/RIF and WHO childbirth checklist, neither of these tools should be given up, just because of the trial results. But we do need to appreciate that tools and checklists can only go so far. These are not silver bullets that can magically fix underlying health system deficiencies. If we care about making a real difference, we need to work on strengthening health systems to ensure complete solutions for our patients. There are no easy shortcuts to success in global health."

I just wanted to quote this here and comment no further, other than to say that I wholeheartedly agree. I hope others will share their views as well!
Samuel