Interpreting composite endpoints in health research
Composite or combined endpoints offer advantages but introduce potential problems
Dear Reader,
Composite endpoints show up everywhere in modern clinical trials. Instead of hanging the results on a single outcome, researchers often bundle several clinically related events together and analyze them as one. That can be genuinely helpful, especially when an intervention is expected to affect patients in more than one way, or when the outcomes of interest are infrequent and difficult to study on their own. By pooling events, trials often gain statistical efficiency and power. Nonetheless, composite endpoints solve some real problems while quietly creating a few new ones of their own.
Here are a few familiar examples of composite endpoints across different therapeutic areas:
Cardiology: Major adverse cardiovascular events (MACE) typically combine nonfatal myocardial infarction, nonfatal stroke, and cardiovascular death.
Respirology: Composites often include outcomes such as death, the need for intubation or mechanical ventilation, and escalation to systemic corticosteroid therapy.
Nephrology: Common components include doubling of baseline serum creatinine, progression to end-stage kidney disease, and death.
On paper, these outcomes look closely related. In the real world, they can differ dramatically in how often they occur, how serious they are, and how much they actually matter to patients.
Many excellent papers have explored these issues in depth (see References). Rather than try to summarize that literature, today’s letter focuses on the five things I think are most important to keep in mind about composite endpoints.
1. Composite endpoints combine multiple outcomes that are implicitly weighted equally, even though patients often value the components very differently.
Ideally, the components of a composite endpoint would matter equally to patients. In practice, trials count every event the same regardless of its clinical importance. A single composite result can be driven entirely by outcomes patients consider minor, while masking the absence of benefit on the outcomes they care about most, such as death or stroke.
Consider the following example.
The composite endpoint used in landmark diabetic nephropathy trials including RENAAL (losartan vs placebo) and IDNT (irbesartan vs amlodipine vs placebo) typically combines death from any cause, end-stage kidney disease (ESKD, defined as the need for dialysis or transplant), and doubling of serum creatinine. More recent trials, such as EMPA-KIDNEY, also include sustained ≥40% decline in eGFR as part of the composite.
Importantly, these components span a wide range of clinical severity. Death and ESKD are irreversible, life-altering events that patients would rank as catastrophic. A doubling of serum creatinine is a laboratory threshold: clinically meaningful as a marker of disease progression, but something a patient may be entirely unaware of. A ≥40% eGFR decline is similarly a surrogate measure, one step further removed from what patients actually experience day to day. Yet all components count equally in the composite. A positive trial result driven primarily by slowing eGFR decline, rather than by preventing dialysis or death, represents a meaningfully different clinical benefit than the headline number implies.
Components of a composite endpoint should be clinically homogeneous. In other words, they should be similar in nature, severity, and importance to patients. When they are not, a statistically positive composite result may obscure more than it reveals.
2. The most frequent component, which is often the least clinically important, usually dominates the composite result.
Components should not only be of similar importance; it is also critical to check whether the more and less important ones occurred with similar frequency. Components that occur more frequently, such as hospitalizations or revascularization procedures, tend to dominate the composite signal. This can make a treatment appear more effective than its effect on serious outcomes actually warrants.
Consider the following example.
The TRITON-TIMI 38 trial compared prasugrel to clopidogrel in acute coronary syndrome patients undergoing percutaneous coronary intervention. The primary composite was cardiovascular death, nonfatal MI, or nonfatal stroke. Prasugrel significantly reduced the composite (9.9% vs 12.1%; HR 0.81; p<0.001). However, this benefit was driven almost entirely by a reduction in nonfatal MI, the most frequent component. There was no significant difference in cardiovascular death (2.1% vs 2.4%) or nonfatal stroke (1.0% vs 1.0%) between the groups.
A reader seeing only the headline result might conclude prasugrel prevents death. A reader examining the components understands the drug primarily prevents recurrent myocardial infarction, an important but meaningfully different finding.
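To make the gap between the headline and the components concrete, here is a small arithmetic sketch using only the percentages quoted above. The function names are my own, and the number needed to treat (NNT) is computed as a crude 100 / ARR over the trial's follow-up. Treat the cardiovascular death figure as arithmetic only, since that difference was not statistically significant.

```python
# Event rates (%) quoted above for TRITON-TIMI 38, as (prasugrel, clopidogrel).
rates = {
    "composite": (9.9, 12.1),
    "cardiovascular death": (2.1, 2.4),
    "nonfatal stroke": (1.0, 1.0),
}


def absolute_risk_reduction(treated_pct, control_pct):
    """Absolute risk reduction in percentage points (control minus treated)."""
    return round(control_pct - treated_pct, 1)


def number_needed_to_treat(arr_points):
    """Crude NNT = 100 / ARR; None when there is no absolute reduction."""
    if arr_points <= 0:
        return None
    return round(100 / arr_points)


summary = {}
for name, (treated, control) in rates.items():
    arr = absolute_risk_reduction(treated, control)
    summary[name] = (arr, number_needed_to_treat(arr))

for name, (arr, nnt) in summary.items():
    print(f"{name}: ARR = {arr} percentage points, NNT = {nnt}")
```

The composite works out to an NNT of roughly 45, while the same arithmetic for cardiovascular death gives roughly 333 and nonfatal stroke shows no absolute difference at all.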
Always identify which component drives the composite result. If the effect is concentrated in the most frequent, least important component, the clinical significance of the trial may be substantially overstated.
3. When treatment effects differ across components, especially in direction, the composite obscures meaningful benefit and harm.
Composite endpoints work only if all their parts respond to treatment in the same direction. When they don’t, a trial can look neutral or even positive while quietly worsening the outcomes patients care about most. This has the potential to seriously mislead readers.
Consider the following example.
The ACCORD trial evaluated intensive glucose lowering (target HbA1c <6.0%) versus standard therapy (target 7.0–7.9%) in patients with type 2 diabetes at high cardiovascular risk. The primary composite endpoint had three components: cardiovascular death, nonfatal MI, and nonfatal stroke. Intensive therapy did not significantly reduce it (HR 0.90; 95% CI 0.78–1.04).
However, examining the components reveals a critically important divergence: intensive therapy reduced nonfatal MI (HR 0.76; 95% CI 0.62–0.92) while simultaneously increasing cardiovascular death (HR 1.35; 95% CI 1.04–1.76) and all-cause mortality (HR 1.22; 95% CI 1.01–1.46). The trial was stopped early due to excess mortality in the intensive arm.
Interpreted at face value, the neutral composite result would lead readers to miss an increase in cardiovascular (and all-cause) mortality. In this case, the composite endpoint functioned not as a summary, but as a veil over life‑threatening harm.
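The same check can be written down as a simple rule of thumb. The sketch below uses only the hazard ratios and confidence intervals quoted above; the `classify` helper and its three labels are my own simplification (a 95% CI entirely below 1 reads as benefit, entirely above 1 as harm), not a formal statistical test.

```python
# Hazard ratios with 95% CIs quoted above for ACCORD (intensive vs standard),
# stored as (HR, lower bound, upper bound).
results = {
    "primary composite": (0.90, 0.78, 1.04),
    "nonfatal MI": (0.76, 0.62, 0.92),
    "cardiovascular death": (1.35, 1.04, 1.76),
    "all-cause mortality": (1.22, 1.01, 1.46),
}


def classify(hr, lower, upper):
    """Label an effect by whether its 95% CI sits below, above, or across 1."""
    if upper < 1.0:
        return "benefit"
    if lower > 1.0:
        return "harm"
    return "inconclusive"


labels = {name: classify(*values) for name, values in results.items()}

# Compare the individual outcomes with one another, leaving the composite aside.
individual = {k: v for k, v in labels.items() if k != "primary composite"}
divergent = "benefit" in individual.values() and "harm" in individual.values()

for name, label in labels.items():
    print(f"{name}: {label}")
print("Opposite directions among individual outcomes:", divergent)
```

Run on the ACCORD numbers, the composite is labelled inconclusive while the individual outcomes split between benefit and harm, which is exactly the pattern a composite cannot show.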
A neutral or positive composite can hide serious harm on individual components. Always examine the forest plot or component-level results table before drawing clinical conclusions.
4. A positive composite may reflect improvement only in the most bias‑prone components, not the most important ones.
Statistical significance applies to the composite as a whole, not to each outcome within it. Composite endpoints are particularly vulnerable when components differ in their susceptibility to bias, especially when subjective or practice-dependent outcomes are included alongside hard endpoints like death.
Hard endpoints such as death and biomarker-confirmed MI are objective and consistent across sites. Soft endpoints such as revascularization decisions, urgent outpatient visits, and even the decision to admit to hospital often vary substantially with local practice patterns, institutional culture, and clinician thresholds. A drug that appears to reduce hospitalizations in one country or institution may show no effect elsewhere simply because hospitalization thresholds differ.
Consider the following example.
In the UKPDS-34 trial, a 21-point composite outcome included components ranging from death and stroke (hard, irreversible) to retinal photocoagulation (a procedure whose indication is somewhat practice-dependent) to microalbuminuria (a laboratory threshold that can fluctuate). The hard endpoints anchor clinical meaning; the soft endpoints introduce noise and potential bias.
Composite endpoints often reward what is easiest to move, not necessarily what is hardest to live with.
5. Composite endpoints are most credible when components share a common biology.
When components are biologically related, that is, when they arise from the same underlying pathophysiological process, a composite result is more coherent and interpretable. The treatment is plausibly acting through a single mechanism that produces multiple downstream effects. When components arise from different mechanisms, a positive composite may reflect two distinct drug actions, neither of which can be properly characterized from the composite result alone.
Consider this example of a biologically coherent composite.
The standard three‑component MACE endpoint, comprising myocardial infarction, ischemic stroke, and cardiovascular death, is among the most biologically coherent composite endpoints in clinical medicine. In populations with established atherosclerotic cardiovascular disease, these outcomes arise largely from shared atherothrombotic mechanisms, making concordant treatment effects biologically plausible. Therapies that reduce platelet aggregation or stabilize atherosclerotic plaques can reasonably be expected to reduce all three. For this reason, MACE remains the most widely accepted and regulator‑endorsed composite endpoint in cardiovascular trials, despite ongoing debate about its limitations.
Now, consider this example of a biologically incoherent composite:
The DREAM trial evaluated rosiglitazone versus placebo for prevention of type 2 diabetes in adults with impaired glucose tolerance or impaired fasting glucose. The primary composite was ‘incident diabetes or death.’ Although rosiglitazone markedly reduced the composite endpoint (11.6% vs 26.0%; HR 0.40; 95% CI 0.35–0.46), nearly all of this apparent benefit reflected diabetes prevention (10.6% vs 25.0%). Deaths were uncommon and did not meaningfully differ between groups (1.1% vs 1.3%).
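A quick way to see how lopsided this composite is: compare each component's absolute difference with the composite's. This is a rough sketch using only the percentages quoted above, and the variable names are my own. The shares need not add up to exactly 100, because the composite counts each participant once and the published figures are rounded.

```python
# Event rates (%) quoted above for DREAM, as (rosiglitazone, placebo).
rates = {
    "composite": (11.6, 26.0),
    "incident diabetes": (10.6, 25.0),
    "death": (1.1, 1.3),
}


def absolute_difference(treated_pct, control_pct):
    """Absolute difference in percentage points (placebo minus treated)."""
    return round(control_pct - treated_pct, 1)


composite_diff = absolute_difference(*rates["composite"])

# Each component's absolute difference as a rough share of the composite's.
shares = {}
for name in ("incident diabetes", "death"):
    component_diff = absolute_difference(*rates[name])
    shares[name] = round(100 * component_diff / composite_diff)

print(f"Composite difference: {composite_diff} percentage points")
for name, share in shares.items():
    print(f"{name}: about {share}% of the composite difference")
```

On these numbers, diabetes prevention accounts for essentially the whole 14.4-point difference, and death for about 1% of it.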
The biological problem is transparent: rosiglitazone reduces insulin resistance, a plausible mechanism for preventing diabetes. But death in a population of middle-aged adults without prior cardiovascular disease can result from hundreds of causes unrelated to insulin resistance. There is no mechanistic rationale to expect rosiglitazone to prevent all-cause death at a similar rate. Combining these two fundamentally different outcomes in a single composite conflates a metabolic drug effect with an entirely unrelated endpoint, making the composite result seem more impressive than it is, and obscuring what the drug is actually doing.
To finish off this letter, here is a concise chart of the red flags to watch for when spotting poor composite endpoints.
[Chart: red flags for detecting poor composite endpoints]
In Letters from a Pharmacist, we’ll treat composite endpoints not as summaries, but as arguments that need to be examined before they’re allowed to change practice.
Thanks for reading!
Peace and kindness,
JM
References
Prieto-Merino D, Smeeth L, van Staa TP, et al. Dangers of non-specific composite outcome measures in clinical trials. BMJ. 2013;347:f6782. doi: 10.1136/bmj.f6782
Gidda S, Tejani A. “All for one, one for all”: Evaluating composite endpoints in clinical trials. CPJ. 2008;141:28–31.
Freemantle N, Calvert M. Composite and surrogate outcomes in randomised controlled trials. BMJ. 2007;334:756–7. doi: 10.1136/bmj.39176.461227.80
Montori VM, Permanyer-Miralda G, Ferreira-González I, et al. Validity of composite end points in clinical trials. BMJ. 2005;330:594–6. doi: 10.1136/bmj.330.7491.594
Freemantle N, Calvert M, Wood J, et al. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA. 2003;289:2554–9. doi: 10.1001/jama.289.19.2554

