Abstract
Composite endpoints are commonly used in clinical trials, and time-to-first-event analysis has been the usual standard. Time-to-first-event analysis treats all components of the composite endpoint as having equal severity and is heavily influenced by short-term components. Over the last decade, novel statistical approaches have been introduced to overcome these limitations. We reviewed win ratio analysis, competing risk regression, negative binomial regression, Andersen-Gill regression, and weighted composite endpoint (WCE) analysis. Each method has both advantages and limitations. The advantage of win ratio and WCE analyses is that they take event severity into account by ranking or assigning weights to each component of the composite endpoint. These rankings and weights should be pre-specified because they strongly influence treatment effect estimates. Negative binomial regression and Andersen-Gill analyses consider all events for each patient – rather than only the first event – and tend to have more statistical power than time-to-first-event analysis. Pre-specified novel statistical methods may enhance our understanding of novel therapies when components vary substantially in severity and timing. These methods should consider the specific types of patients, drugs, devices, events, and follow-up duration.
Introduction
Composite endpoints are commonly used in clinical trials. Recently, the Academic Research Consortium-2 consensus stated that patient-oriented composite endpoints – the overall cardiovascular outcomes from the patient perspective, including all-cause death, any type of stroke, any myocardial infarction (MI), and any repeat revascularisation – should constitute the foundation of novel coronary device or pharmacotherapeutic agent assessment1.
The time-to-first-event method has been commonly used for the analysis of composite endpoints; however, it has the inherent limitation of treating all contributory endpoints as having equal severity and only gives weight to the first endpoint encountered in time. Thus, non-fatal events that occur earlier have more impact than more serious events, such as stroke or death, that occur later. Furthermore, death precludes the observation of subsequent non-fatal events.
Over the last decade, several novel statistical methods have been proposed to overcome these limitations. These methods consider all events occurring until the end of follow-up, incorporate the severity of clinical events, and account for the competing risk nature of different events2,3,4,5,6,7,8,9,10,11.
We aimed to review the different statistical methods other than the traditional time-to-first-event analysis, including win ratio analysis, competing risk regression, negative binomial regression, Andersen-Gill regression, and weighted composite endpoint (WCE) analysis (Figure 1).
Statistical approaches
WIN RATIO ANALYSIS
Win ratio analysis was introduced by Pocock et al in 2012 and is a rank-based method, which puts more emphasis on the most clinically important component of the composite endpoints by ranking the constituent components2. This analysis requires four steps: 1) ranking events by their severity, 2) making patient pairs, 3) deciding on a winner in each patient pair, and 4) calculation of the win ratio.
First, the components of the composite endpoint are ranked on the basis of their perceived severity. Second, patients with different treatment assignments are matched on the basis of their individual risk estimates. Pocock et al proposed estimating a composite risk score for each patient based on pre-selected baseline prognostic factors3. Patients in the experimental treatment arm are matched to patients with a similar risk score in the control arm on the condition that the follow-up durations do not differ greatly (Figure 2A-1). When the number of patients in the two groups differs, some patients are randomly excluded to equalise the number of patients in both groups.
The third step is to decide on a winner in each matched patient pair (Figure 2A-2). The comparison of each pair is performed using every type of categorised event – death, or stroke, or MI, or other event. The events of each patient pair are evaluated to decide whether one had the most severe event (usually death is applied). If this is not the case (both patients were alive at the end of follow-up), the remaining pairs are then evaluated for the occurrence of an event ranked second in severity, and so on for each ranking (third, fourth, or fifth rank). If there were no events until the time of last follow-up, the pair is treated as “tied”2. The win ratio emphasises the more severe components when comparing composite endpoints between two groups of patients (Figure 2A-2 and Figure 2B).
Fourth, the win ratio is calculated as the number of winners divided by the number of losers; a 95% confidence interval for the win ratio is easily obtainable2 (Figure 2A-3). Since matched pairings are influenced by which patients are randomly excluded, it may be necessary to repeat the analysis with different randomly excluded patients. Pocock et al have described the formulas for these calculations2; these calculations do not require special software. In addition, Luo et al presented a code for R software (R Foundation for Statistical Computing, Vienna, Austria) which could be helpful12.
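The fourth step can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it assumes a normal approximation to the proportion of winners among decided pairs and transforms that interval to the win ratio scale. The function name and the pair counts are hypothetical.

```python
import math

def win_ratio_matched(n_winners, n_losers, z=1.96):
    """Win ratio with an approximate CI from matched-pair counts.

    Tied pairs are excluded before calling this function. The interval
    uses a normal approximation for the proportion of winners, so it is
    only reasonable when there are many decided pairs.
    """
    if n_winners <= 0 or n_losers <= 0:
        raise ValueError("need at least one winner and one loser")
    n_decided = n_winners + n_losers
    p_win = n_winners / n_decided
    se = math.sqrt(p_win * (1 - p_win) / n_decided)
    p_low, p_high = p_win - z * se, p_win + z * se
    # Convert each proportion p into the ratio p / (1 - p).
    return n_winners / n_losers, p_low / (1 - p_low), p_high / (1 - p_high)

# Hypothetical counts: 300 pairs won by the new treatment, 200 lost.
wr, lower, upper = win_ratio_matched(300, 200)
print(f"win ratio {wr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A win ratio above 1 with an interval excluding 1 would favour the experimental arm under these assumptions.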
Win ratio analysis is a rank-based method. It could reflect the event severity in the analysis of composite endpoints. Therefore, it is valuable when the components of the composite endpoint vary in their clinical severity and importance (e.g., composite endpoint of death, stroke, MI, and revascularisation in an ischaemic heart disease trial; composite endpoint of cardiovascular death and heart failure hospitalisation in a heart failure trial). On the other hand, there are several limitations. Severity ranking of each adverse event affects the result of the composite endpoint and the ranking in itself is debatable without universal consensus (e.g., severity ranking of MI and major bleeding). In addition, it can only be applied to the comparison between two groups. An example used in the EMPHASIS-HF study, which compared eplerenone (n=1,364) and placebo (n=1,373) in patients with New York Heart Association (NYHA) Class II heart failure and ejection fraction ≤35%, is shown in Figure 2C2.
Several options for making pairs have been proposed for comparing patients with similar anatomic and physio-pathological backgrounds. For example, prognostic scores, such as the anatomic SYNTAX score and SYNTAX score II, have been applied, instead of composite relative risk scores4,5.
In long-term event-driven trials, patient follow-up durations vary greatly, and many pairs are often categorised as “tied” (Figure 2D). To reduce this problem, patients can be stratified into several follow-up duration categories: patients are matched in strata of similar follow-up duration2.
When baseline risk factors are not well established, it is more difficult to match patients on the basis of risk. In this case, one can compare every patient in one group with every patient in the other group (unmatched pairs approach)2,3.
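The unmatched pairs approach can be illustrated with a small sketch. It assumes each patient is a record holding a follow-up duration and optional event times in days, and that each pair is compared only over its shared follow-up. The field names (`follow_up`, `death`, `hf_hosp`), the two-tier hierarchy, and the data are hypothetical.

```python
from itertools import product

def compare_pair(treated, control, tiers):
    """Return 'win', 'loss', or 'tie' from the treated patient's view."""
    shared = min(treated["follow_up"], control["follow_up"])
    for tier in tiers:  # tiers are ordered from most to least severe
        t = treated.get(tier)
        c = control.get(tier)
        # Ignore events that fall outside the pair's shared follow-up.
        t = t if t is not None and t <= shared else None
        c = c if c is not None and c <= shared else None
        if t is None and c is None:
            continue  # undecided on this tier, move to the next one
        if t is None:
            return "win"   # only the control patient had the event
        if c is None:
            return "loss"  # only the treated patient had the event
        if t != c:
            return "win" if t > c else "loss"  # the later event wins
    return "tie"

def unmatched_win_ratio(treated_group, control_group, tiers):
    """Compare every treated patient with every control patient."""
    counts = {"win": 0, "loss": 0, "tie": 0}
    for t, c in product(treated_group, control_group):
        counts[compare_pair(t, c, tiers)] += 1
    ratio = counts["win"] / counts["loss"] if counts["loss"] else float("inf")
    return ratio, counts

treated = [{"follow_up": 365, "hf_hosp": 50},
           {"follow_up": 365, "hf_hosp": 200},
           {"follow_up": 365}]
control = [{"follow_up": 365, "death": 300},
           {"follow_up": 365, "hf_hosp": 100},
           {"follow_up": 365}]
ratio, counts = unmatched_win_ratio(treated, control, ["death", "hf_hosp"])
print(ratio, counts)
```

The confidence interval for the unmatched win ratio needs a different variance calculation from the matched case and is not covered by this sketch.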
COMPETING RISK REGRESSION
Events (e.g., non-cardiovascular death) which preclude less severe events or events (e.g., heart transplant) which change the possibility to observe events of interest (e.g., congestive heart failure) are called competing risks (Figure 3A). The competing risk regression method takes these issues into account for composite endpoints and allows disentangling the contribution of an intervention to each type of event. The Fine-Gray model is the most popular model13. In this model, patients experiencing competing risk events remain in the risk set for the event of interest until they experience events of interest or they are censored (Figure 3B, Figure 3C). This analysis can be performed easily using free statistical software (EZR). Kanda has described the method in detail14.
This competing risk within clinical research was first introduced in the field of oncology. In patients who underwent chemotherapy for cancer, failure events commonly studied are relapse of the cancer and treatment-related death. The interest is to estimate the probability of relapse. In this case, treatment-related death is a competing risk event (which would obviously not allow the investigators to observe any relapse of cancer because the patients are dead) and competing risk regression analysis is useful15. When the age of the study population is high, death could be used as a competing risk since the rate of non-treatment-related death is relatively high. In the substudy of prosthetic valve endocarditis from the PARTNER trial16, the age of patients was 83 years and death was used as the competing risk event. The incidence of prosthetic valve endocarditis after transcatheter and surgical aortic valve replacement was assessed using this competing risk regression model (Figure 3D). In the field of cardiology, all-cause death may often be less device- or procedure-specific than deaths adjudicated as cardiovascular death. Non-cardiovascular death could be used as a competing risk, although all-cause death is the most unbiased method to report deaths.
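To show why competing events matter, the sketch below contrasts a nonparametric cumulative incidence estimate with the naive "one minus Kaplan-Meier" estimate that treats competing events as censoring. It is not the Fine-Gray regression model itself, only the descriptive quantity that such a model targets. The event coding (0 = censored, 1 = event of interest, 2 = competing event) and the data are assumptions made for illustration.

```python
def cumulative_incidence(times, codes, cause=1):
    """Cumulative incidence of `cause` in the presence of competing events.

    codes: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns a list of (time, cumulative incidence) pairs.
    """
    cif, event_free, curve = 0.0, 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, k in zip(times, codes) if u == t and k == cause)
        d_any = sum(1 for u, k in zip(times, codes) if u == t and k != 0)
        # Probability of being event-free just before t, then failing from `cause`.
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        curve.append((t, cif))
    return curve

def one_minus_km(times, codes, cause=1):
    """Naive estimate that treats competing events as censored observations."""
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, k in zip(times, codes) if u == t and k == cause)
        surv *= 1 - d_cause / at_risk
    return 1 - surv

times = [1, 2, 3, 4]   # hypothetical follow-up times
codes = [1, 2, 1, 0]   # relapse, competing death, relapse, censored
final_cif = cumulative_incidence(times, codes)[-1][1]
naive = one_minus_km(times, codes)
print(final_cif, naive)
```

In this toy data set the naive estimate is larger than the cumulative incidence, because it implicitly assumes that the patient who died could still have relapsed later.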
NEGATIVE BINOMIAL REGRESSION
The traditional time-to-first-event analysis only evaluates the first adverse event and does not capture the subsequent events. However, in the field of cardiology, some adverse events, such as revascularisation, bleeding, and hospitalisation for heart failure, occur repeatedly. Incorporation of all events is meaningful in terms of the evaluation of patients’ quality of life and medical cost. In addition, an increase in the number of events could yield additional statistical power. A simple method for the assessment of all adverse events between two groups is to compare the number of events.
In a book entitled “The Law of Small Numbers”, Bortkiewicz investigated the annual deaths by horse kicks in the Prussian Army from 1875 to 1894, noting that events with low frequency in a large population follow a Poisson distribution even when the probability varies (Supplementary Figure 1A). The Poisson distribution has commonly been used to model the number of events in an interval of time. However, the variance of clinical events in a trial is usually greater than the mean (Supplementary Figure 1B), whereas a Poisson distribution assumes that the two are equal. In other words, the distribution of the number of clinical events is better represented by an overdispersed Poisson distribution. The negative binomial distribution is often used for modelling overdispersed Poisson data. Negative binomial regression analysis has been used to estimate treatment effect in terms of the rate ratio of a composite endpoint6,7,8,9 (Figure 4A) and is valuable especially in a high-risk population since patients tend to experience repeated adverse events. For this analysis, the “glm.nb” function from the “MASS” package in R software could be helpful17. In the PARADIGM-HF trial8, the primary endpoint (a composite of cardiovascular death or hospitalisation for congestive heart failure) was analysed by a negative binomial regression analysis (Figure 4B). On the other hand, this analysis considers only the total count of events per patient. Therefore, the same follow-up duration should be applied per patient, which sometimes restricts the application of this method.
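Overdispersion can be checked descriptively before any model is fitted. The sketch below compares the sample variance of per-patient event counts with their mean and derives a moment-based estimate of the negative binomial size parameter, assuming the common parameterisation in which the variance equals mu + mu^2/k. It also computes a crude rate ratio. This is not a regression fit, and the function names and counts are hypothetical.

```python
from statistics import mean, variance

def overdispersion_summary(counts):
    """Describe per-patient event counts (equal follow-up assumed)."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Moment estimate of k from Var = mu + mu^2 / k; infinite k means
    # no overdispersion, i.e. the data look Poisson-like.
    k = m * m / (v - m) if v > m else float("inf")
    return {"mean": m, "variance": v, "ratio": v / m, "size_k": k}

def crude_rate_ratio(events_a, time_a, events_b, time_b):
    """Ratio of event rates (events per unit of patient-time)."""
    return (events_a / time_a) / (events_b / time_b)

# Hypothetical event counts for ten patients: most have none, a few have many.
summary = overdispersion_summary([0, 0, 0, 1, 0, 5, 0, 2, 0, 0])
rr = crude_rate_ratio(30, 100.0, 50, 100.0)
print(summary, rr)
```

A variance-to-mean ratio well above 1, as in this toy example, is the pattern that motivates a negative binomial rather than a Poisson model.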
COX-BASED MODELS FOR RECURRENT EVENTS
Negative binomial regression analysis is not applicable if the follow-up duration differs from patient to patient. To overcome this limitation, several time-to-event methods have been proposed for the analysis of repeated events. The Andersen-Gill model is a simple extension of the traditional Cox model and is based on a gap-time approach, in which the clock is reset after an event and the patient is at risk for the next event. This analysis assumes that the risk of an event is not affected by whether another event has already occurred4,5,9. The Wei-Lin-Weissfeld (WLW) model is different from the Andersen-Gill model in that it uses the time from study entry to the first, second and subsequent events (Figure 5A)8,9. In the WLW model, each time-ordered event is analysed on its own time-to-event basis, that is, for the first events in each patient, the second events in each patient, the third events in each patient, and so on. For these analyses, the “coxph” function from the “survival” package in R software could be helpful18. These analyses consider all adverse events and the time to each event. Therefore, like negative binomial regression analysis, they are valuable in a high-risk population. In addition, these analyses are applicable regardless of the follow-up duration of each patient. On the other hand, this methodological approach treats all adverse events as having equal severity; as in time-to-first-event analysis, the impact of severe adverse events, such as death, could be underestimated. In the REDUCE-IT trial, the primary endpoint (a composite of cardiovascular death, MI, stroke, revascularisation, or hospitalisation for unstable angina) – including recurrent events – was analysed using the Andersen-Gill and the WLW approaches (Figure 5B)9.
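The difference between the two time scales is easiest to see in the data layout. The sketch below is only a data-preparation illustration under stated assumptions: it converts one patient's event days into interval rows that carry both the start and stop days and the gap since the previous event, and into WLW-style rows that measure the time from study entry to the k-th event. The function names, field names, and the example patient are hypothetical, and the model fitting itself is not shown.

```python
def interval_rows(patient_id, event_days, follow_up):
    """One row per at-risk interval; `gap` restarts after each event."""
    rows, start = [], 0
    for stop in sorted(event_days):
        rows.append({"id": patient_id, "start": start, "stop": stop,
                     "gap": stop - start, "status": 1})
        start = stop
    if follow_up > start:
        # Final interval without an event, censored at end of follow-up.
        rows.append({"id": patient_id, "start": start, "stop": follow_up,
                     "gap": follow_up - start, "status": 0})
    return rows

def wlw_rows(patient_id, event_days, follow_up, max_events):
    """One row per event order k, with time measured from study entry."""
    ordered = sorted(event_days)
    rows = []
    for k in range(1, max_events + 1):
        if k <= len(ordered):
            rows.append({"id": patient_id, "stratum": k,
                         "time": ordered[k - 1], "status": 1})
        else:
            # The k-th event never happened: censored at end of follow-up.
            rows.append({"id": patient_id, "stratum": k,
                         "time": follow_up, "status": 0})
    return rows

# Hypothetical patient with events on days 100 and 250, followed for 400 days.
ag = interval_rows("P1", [100, 250], 400)
wlw = wlw_rows("P1", [100, 250], 400, max_events=3)
print(ag)
print(wlw)
```

In the first layout the second interval has a gap of 150 days, while in the WLW layout the second event is recorded at day 250 from entry.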
WEIGHTED COMPOSITE ENDPOINT (WCE)
The WCE methodology extends the standard time-to-event methodology by determining a weight for each non-fatal event (event severity) and incorporating all adverse events into the analysis (recurrent events)4,5,10,11. The WCE analysis requires four steps: 1) a decision on event weights, 2) calculation of residual weight at the end of each day in each patient, 3) creation of a modified life table with a weighted number of patients at risk, and 4) comparison of groups (Figure 6A).
In the field of cardiovascular disease, two sets of event weights have been used10,11. The first set gives a weight of 1.0 to death, 0.47 to stroke, 0.38 to MI, and 0.25 to target vessel revascularisation4,5. In the second set, death has a weight of 1.0, shock has a weight of 0.5, congestive heart failure has a weight of 0.3, re-MI has a weight of 0.2, and re-ischaemia has a weight of 0.1. These weights were decided on based on Delphi panels to achieve consensus between clinician-investigators. A Delphi panel is a panel of experts to achieve consensus in solving a problem or deciding on the most appropriate strategy based on the results of multiple rounds of questionnaires.
For calculation of residual weights at each time point, each patient starts with a weight of 1.0, which remains unaltered if no event occurs until the end of follow-up (Figure 6A-2a). Non-fatal events reduce the residual weight of a patient by the weight of the event (Figure 6A-2b, Figure 6A-2c, Figure 6A-2d). From the individual patient data, a modified life table with a weighted number of patients at risk is created, providing estimates of weighted event rates in each group and of a weighted hazard ratio for the reference group (Figure 6A-3). The WCE method allows the incorporation of repeated events in a single patient and distinguishes between the severity of components of the composite endpoint. The indication for this method is the same as that for time-to-first-event analysis. A representative analysis of the WCE in the DELTA registry4 is shown in Figure 6B. This approach may better reflect all event information, but evidently depends on the assigned event weights. Furthermore, weighting events reduces the number of effective events. Therefore, the WCE could limit power and it requires a larger sample size, although statistical power largely depends on severe outcomes, such as death19. To date, commercial statistical software does not support this analysis and there is no R package for this analysis in the Comprehensive R Archive Network or Bioconductor. Therefore, this analysis needs a dedicated program.
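The residual weight calculation (step 2) can be sketched as follows, using the first set of event weights quoted above. The text does not spell out the arithmetic, so this example assumes a proportional reading in which each event removes its weight as a fraction of the weight still remaining; a purely subtractive rule would give different numbers. The function name and the patient history are hypothetical.

```python
# First set of weights quoted in the text (death, stroke, MI, TVR).
WEIGHTS = {"death": 1.0, "stroke": 0.47, "mi": 0.38, "tvr": 0.25}

def residual_weights(events, weights=WEIGHTS):
    """Track a patient's residual weight after each event.

    events: list of (day, event_type) tuples.
    ASSUMPTION: each event multiplies the remaining weight by
    (1 - event weight); death therefore reduces it to zero.
    """
    residual, path = 1.0, []
    for day, kind in sorted(events):
        residual *= 1.0 - weights[kind]
        path.append((day, residual))
    return path

# Hypothetical patient: MI on day 30, target vessel revascularisation on day 200.
path = residual_weights([(30, "mi"), (200, "tvr")])
print(path)
```

Summing residual weights across patients at each time point would then give the weighted number at risk used in the modified life table (step 3), which is not implemented here.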
Comparison of methods – How do we treat a composite endpoint?
The differences in dealing with composite endpoints are shown in Figure 7. These statistical methods have recently been applied to several clinical trials in the field of cardiology (Figure 8, Figure 9). The estimated treatment effect, using multiple statistical methods, showed similar tendencies but, as expected, the significance of the treatment effect estimates was dependent on the statistical method used in the trials. The negative binomial regression and the Andersen-Gill analyses tended to have more statistical power than time-to-first-event analysis, while the statistical power of the WCE method tended to be low. In particular, the WCE method did not demonstrate a significant difference between treatments (Figure 8), in contrast with time-to-first-event analyses.
The method of counting a “series of events” has to be defined in detail for analyses using all adverse events20. Whenever a revascularisation is performed on the same day as MI, the number of serial events would depend on the methodological definition. Two events (MI and revascularisation) occurring on the same day could even be counted as one event4,9. Therefore, the method of event counting could affect the result.
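The effect of the counting rule is easy to demonstrate. The sketch below counts the same hypothetical patient history under two rules: every event counted separately, or all events on the same calendar day collapsed into one. Both the function and the data are illustrative assumptions rather than a definition taken from any trial.

```python
def count_events(events, collapse_same_day=False):
    """Count adverse events given as (day, event_type) tuples."""
    if collapse_same_day:
        # Events sharing a calendar day are counted as a single event.
        return len({day for day, _ in events})
    return len(events)

# Hypothetical history: MI and revascularisation on day 10, another
# revascularisation on day 90.
history = [(10, "mi"), (10, "revascularisation"), (90, "revascularisation")]
separate = count_events(history)
collapsed = count_events(history, collapse_same_day=True)
print(separate, collapsed)
```

Whichever rule is chosen, applying it identically to both treatment groups and stating it in the protocol keeps the comparison interpretable.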
The win ratio and WCE analyses depend on the severity ranking and weighting of event severity, which may induce arbitrariness of the comparison. On the other hand, a universal ranking is not appropriate because the event severity may depend on patient characteristics. For example, the impact of revascularisation is different in the patients with and without a history of percutaneous coronary intervention. The way to determine event severity should be discussed in future trials. Pre-specification of weights is necessary to avoid any arbitrariness.
Conclusion
All methods for the analysis of composite endpoints have strengths and weaknesses (Figure 10). Pre-specified novel statistical methods may enhance our understanding when components vary substantially in severity and timing. These methods should consider the specific types of patients, drugs, devices, events, and follow-up duration.
Guest Editor
This paper was guest edited by Adnan Kastrati, MD; Deutsches Herzzentrum, Munich, Germany.
Conflict of interest statement
H. Hara is supported by a grant for studying overseas from the Japanese Circulation Society and a grant from the Fukuda Foundation for Medical Technology. The other authors have no conflicts of interest to declare. The Guest Editor has no conflicts of interest to declare.
Abbreviations
- NYHA: New York Heart Association
- WCE: weighted composite endpoint
- WLW: Wei-Lin-Weissfeld
Contributor Information
Hironori Hara, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands; Department of Cardiology, National University of Ireland, Galway (NUIG), Galway, Ireland.
David van Klaveren, Department of Public Health, Center for Medical Decision Making, Erasmus MC, Rotterdam, the Netherlands; Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA.
Norihiro Kogame, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.
Ply Chichareon, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.
Rodrigo Modolo, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.
Mariusz Tomaniak, Department of Cardiology, Erasmus Medical Center, Erasmus University, Rotterdam, the Netherlands; First Department of Cardiology, Medical University of Warsaw, Warsaw, Poland.
Masafumi Ono, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands; Department of Cardiology, National University of Ireland, Galway (NUIG), Galway, Ireland.
Hideyuki Kawashima, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands; Department of Cardiology, National University of Ireland, Galway (NUIG), Galway, Ireland.
Kuniaki Takahashi, Department of Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.
Davide Capodanno, Division of Cardiology, Cardio-Thoraco-Vascular and Transplant Department, CAST, Rodolico Hospital, AOU “Policlinico-Vittorio Emanuele”, University of Catania, Catania, Italy.
Yoshinobu Onuma, Department of Cardiology, National University of Ireland, Galway (NUIG), Galway, Ireland.
Patrick W. Serruys, Department of Cardiology, National University of Ireland, Galway (NUIG), Galway, Ireland; NHLI, Imperial College London, London, United Kingdom.
References
- Garcia-Garcia HM, McFadden EP, Farb A, Mehran R, Stone GW, Spertus J, Onuma Y, Morel MA, van Es GA, Zuckerman B, Fearon WF, Taggart D, Kappetein AP, Krucoff MW, Vranckx P, Windecker S, Cutlip D, Serruys PW; Academic Research Consortium. Standardized End Point Definitions for Coronary Intervention Trials: The Academic Research Consortium-2 Consensus Document. Eur Heart J. 2018;39:2192–207. doi: 10.1093/eurheartj/ehy223.
- Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J. 2012;33:176–82. doi: 10.1093/eurheartj/ehr352.
- Milojevic M, Head SJ, Andrinopoulou ER, Serruys PW, Mohr FW, Tijssen JG, Kappetein AP. Hierarchical testing of composite endpoints: applying the win ratio to percutaneous coronary intervention versus coronary artery bypass grafting in the SYNTAX trial. EuroIntervention. 2017;13:106–14. doi: 10.4244/EIJ-D-16-00745.
- Capodanno D, Gargiulo G, Buccheri S, Chieffo A, Meliga E, Latib A, Park SJ, Onuma Y, Capranzano P, Valgimigli M, Narbute I, Makkar RR, Palacios IF, Kim YH, Buszman PE, Chakravarty T, Sheiban I, Mehran R, Naber C, Margey R, Agnihotri A, Marra S, Leon MB, Moses JW, Fajadet J, Lefevre T, Morice MC, Erglis A, Alfieri O, Serruys PW, Colombo A, Tamburino C; DELTA Investigators. Computing Methods for Composite Clinical Endpoints in Unprotected Left Main Coronary Artery Revascularization: A Post Hoc Analysis of the DELTA Registry. JACC Cardiovasc Interv. 2016;9:2280–8. doi: 10.1016/j.jcin.2016.08.025.
- Bakal JA, Roe MT, Ohman EM, Goodman SG, Fox KA, Zheng Y, Westerhout CM, Hochman JS, Lokhnygina Y, Brown EB, Armstrong PW. Applying novel methods to assess clinical outcomes: insights from the TRILOGY ACS trial. Eur Heart J. 2015;36:385–92a. doi: 10.1093/eurheartj/ehu262.
- Rogers JK, McMurray JJ, Pocock SJ, Zannad F, Krum H, Van Veldhuisen DJ, Swedberg K, Shi H, Vincent J, Pitt B. Eplerenone in patients with systolic heart failure and mild symptoms: analysis of repeat hospitalizations. Circulation. 2012;126:2317–23. doi: 10.1161/CIRCULATIONAHA.112.110536.
- Rogers JK, Pocock SJ, McMurray JJ, Granger CB, Michelson EL, Ostergren J, Pfeffer MA, Solomon SD, Swedberg K, Yusuf S. Analysing recurrent hospitalizations in heart failure: a review of statistical methodology, with application to CHARM-Preserved. Eur J Heart Fail. 2014;16:33–40. doi: 10.1002/ejhf.29.
- Mogensen UM, Gong J, Jhund PS, Shen L, Kober L, Desai AS, Lefkowitz MP, Packer M, Rouleau JL, Solomon SD, Claggett BL, Swedberg K, Zile MR, Mueller-Velten G, McMurray JJV. Effect of sacubitril/valsartan on recurrent events in the Prospective comparison of ARNI with ACEI to Determine Impact on Global Mortality and morbidity in Heart Failure trial (PARADIGM-HF). Eur J Heart Fail. 2018;20:760–8. doi: 10.1002/ejhf.1139.
- Bhatt DL, Steg PG, Miller M, Brinton EA, Jacobson TA, Ketchum SB, Doyle RT Jr, Juliano RA, Jiao L, Granowitz C, Tardif JC, Gregson J, Pocock SJ, Ballantyne CM; REDUCE-IT Investigators. Effects of Icosapent Ethyl on Total Ischemic Events: From REDUCE-IT. J Am Coll Cardiol. 2019;73:2791–802. doi: 10.1016/j.jacc.2019.02.032.
- Armstrong PW, Westerhout CM, Van de Werf F, Califf RM, Welsh RC, Wilcox RG, Bakal JA. Refining clinical trial composite outcomes: an application to the Assessment of the Safety and Efficacy of a New Thrombolytic-3 (ASSENT-3) trial. Am Heart J. 2011;161:848–54. doi: 10.1016/j.ahj.2010.12.026.
- Bakal JA, Westerhout CM, Cantor WJ, Fernández-Avilés F, Welsh RC, Fitchett D, Goodman SG, Armstrong PW. Evaluation of early percutaneous coronary intervention vs. standard therapy after fibrinolysis for ST-segment elevation myocardial infarction: contribution of weighting the composite endpoint. Eur Heart J. 2013;34:903–8. doi: 10.1093/eurheartj/ehs438.
- Luo X, Tian H, Mohanty S, Tsai WY. An alternative approach to confidence interval estimation for the win ratio statistic. Biometrics. 2015;71:139–45. doi: 10.1111/biom.12225.
- Wolbers M, Koller MT, Stel VS, Schaer B, Jager KJ, Leffondre K, Heinze G. Competing risks analyses: objectives and approaches. Eur Heart J. 2014;35:2936–41. doi: 10.1093/eurheartj/ehu131.
- Kanda Y. Investigation of the freely available easy-to-use software ‘EZR’ for medical statistics. Bone Marrow Transplant. 2013;48:452–8. doi: 10.1038/bmt.2012.244.
- Scrucca L, Santucci A, Aversa F. Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplant. 2007;40:381–7. doi: 10.1038/sj.bmt.1705727.
- Summers MR, Leon MB, Smith CR, Kodali SK, Thourani VH, Herrmann HC, Makkar RR, Pibarot P, Webb JG, Leipsic J, Alu MC, Crowley A, Hahn RT, Kapadia SR, Tuzcu EM, Svensson L, Cremer PC, Jaber WA. Prosthetic Valve Endocarditis After TAVR and SAVR: Insights From the PARTNER Trials. Circulation. 2019;140:1984–94. doi: 10.1161/CIRCULATIONAHA.119.041399.
- Venables WN, Ripley BD. Modern applied statistics with S. 2002.
- Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. 2000.
- Bakal JA, Westerhout CM, Armstrong PW. Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Stat Methods Med Res. 2015;24:980–8. doi: 10.1177/0962280211436004.
- Granger CB, Nelson AJ, Pagidipati NJ. Risk of Total Events With Icosapent Ethyl: Can We Reduce It? J Am Coll Cardiol. 2019;73:2803–5. doi: 10.1016/j.jacc.2019.03.492.