Abstract
With the recent development of transplant-specific composite endpoints for evaluation of allogeneic hematopoietic cell transplantation (alloHCT) outcomes, the use of these novel endpoints is growing rapidly. By combining multiple endpoints into a single endpoint, these composite endpoints appear simple and can be used as a summary measure for the overall effectiveness of an intervention. However, not all component endpoints have equal clinical significance, and an intervention may not work proportionally in the same direction for all components of a composite endpoint. This may complicate the interpretation of results, particularly if the intervention has opposing effects on different component endpoints. We assess the benefits and limitations of various composite endpoints recently used in alloHCT studies and propose guidelines for their use and interpretation.
Keywords: Composite Endpoint, GRFS, allogeneic hematopoietic cell transplantation
Introduction
Allogeneic hematopoietic cell transplantation (alloHCT) is a long-established therapy for hematologic malignancies and offers an opportunity for cure. The therapeutic benefit and potential cure achieved with alloHCT is mediated by the graft-vs-tumor (GVT) effect that limits relapse based upon anti-neoplastic alloreactivity derived from the donor’s immune system [1-3]. Therefore, harnessing the GVT effect is critical for the success of HCT. However, GVT is closely associated with graft-versus-host disease (GVHD), an immune reaction against normal host tissues that can result in morbid and debilitating complications and compromise quality of life for many survivors. Consequently, attempts to enhance GVT have often led to increased GVHD and interventions to reduce GVHD have compromised GVT, raising risks of malignant relapse [4]. Therefore, although transplant outcomes have been substantially and steadily improving over the past 20 years, enhancing GVT while also controlling GVHD remains elusive.
Choices of donor, graft type, conditioning intensity and GVHD prophylaxis are all parts of the complex strategy needed to maximize complication-free survival. To assess the GVT effect and compare the effectiveness of different HCT strategies, relapse-free or progression-free survival (PFS) has been widely used as an important endpoint in alloHCT studies. Overall survival (OS), while unambiguous, is less appropriate for evaluating alloHCT strategies because salvage therapies following disease relapse can rescue or control the malignancy. Consequently, the net effect of alloHCT strategies can be diluted. PFS, however, does not take GVHD directly into account unless severe GVHD leads to death; GVHD must therefore be assessed separately. In an effort to assess both GVT and GVHD plus other life-threatening or lethal morbidities in a single endpoint, the Blood and Marrow Transplant Clinical Trials Network (BMT CTN) developed a transplant-specific composite endpoint of GVHD-free, relapse-free survival (GRFS) [5], which has been adopted as a primary endpoint in many alloHCT clinical trials [6]. By combining multiple endpoints in an aggregated single endpoint, a composite endpoint can substantially reduce the sample size needed to compare transplant strategies, which would be prohibitive if multiple endpoints were considered simultaneously under a competing risks framework. However, because not all components of a composite endpoint have equal consequence, severity or permanence, these composite endpoints come with limitations and challenges of interpretation. If misused or misinterpreted, they can complicate evaluation of a best strategy for alloHCT and confound clinical decision-making. In this article, we review recently developed composite endpoints in transplant studies, discuss their benefits and limitations, and provide examples where they could be used to compare transplant strategies.
Composite Endpoints for alloHCT Trials
A composite endpoint aggregates two or more distinct endpoints (called ‘components’ [7]) into a single endpoint. Well-known composite endpoints include disease-free survival (DFS), relapse-free survival (RFS), progression-free survival (PFS) and event-free survival (EFS). There is no international consensus standard for the definition of survival endpoints used in cancer clinical trials; according to the National Cancer Institute’s dictionary of cancer terms, DFS and RFS are synonymous. PFS is similarly defined, is most applicable when patients are not in complete remission at the time of transplant, and is widely adopted in transplant studies. Of note, DFS is often termed leukemia-free survival (LFS) in studies of leukemia patients. While DFS, RFS and PFS are similarly defined as time to disease recurrence/progression or death and used almost interchangeably, events considered in EFS may not be limited to disease recurrence and death. EFS has been defined differently in clinical trials depending on the disease. It is generally defined as time to onset of a pre-specified adverse event, institution of new therapy, onset of new malignancy, disease relapse/progression or death, whichever occurs first. Recently developed novel composite endpoints for transplant trials are different types of EFS endpoints and are listed below.
GVHD-free, relapse-free survival (GRFS).
This is the time to first occurrence of grade III-IV acute GVHD, chronic GVHD requiring systemic treatment, relapse, or death, whichever occurs first [5].
Chronic GVHD-free, relapse-free survival (CRFS).
This is the time to first occurrence of chronic GVHD requiring systemic therapy, relapse or death, whichever occurs first [8].
GVHD-free survival.
This is the time to first occurrence of grade III-IV acute GVHD, chronic GVHD requiring systemic treatment or death, whichever occurs first [6,9].
Current GVHD-free, relapse-free survival (cGRFS).
This is the probability of being alive and disease-free without moderate-severe chronic GVHD at a particular time point after HCT. The components of cGRFS are the same as for GRFS. GRFS, CRFS, and GVHD-free survival all decrease as a function of time and thus can be estimated by the Kaplan-Meier method. In contrast, because cGRFS accounts for resolution of chronic GVHD, it may decrease and later increase as a function of time. cGRFS is estimated by linear combinations of five Kaplan-Meier estimates in a multistate model with 10 possible health states, in which chronic GVHD is considered potentially temporary and its resolution is accounted for [10].
Dynamic GVHD-free, relapse-free survival (dGRFS).
dGRFS is similar to cGRFS except that resolution of both relapse and acute/chronic GVHD is included in a multistate model with 13 possible health states [11]. Like cGRFS, dGRFS may decrease and increase as a function of time.
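The three time-to-first-event definitions above (GRFS, CRFS and GVHD-free survival) can be sketched in a few lines of Python. This is only an illustration of the "whichever occurs first" rule: the component names (`agvhd_3_4`, `cgvhd_systemic`) and the function `composite` are hypothetical, and the multistate endpoints cGRFS and dGRFS are not covered.

```python
# Hypothetical component names; the groupings follow the definitions in the text.
COMPONENTS = {
    "GRFS": ("agvhd_3_4", "cgvhd_systemic", "relapse", "death"),
    "CRFS": ("cgvhd_systemic", "relapse", "death"),
    "GVHD-free survival": ("agvhd_3_4", "cgvhd_systemic", "death"),
}

def composite(event_times, last_followup, endpoint):
    """Return (time, event_flag, first_component) for one patient.

    event_times maps a component name to its time (months after HCT);
    a missing key or None means the event was never observed.
    """
    first_name, first_time = None, None
    for name in COMPONENTS[endpoint]:
        t = event_times.get(name)
        if t is None or t > last_followup:
            continue
        if first_time is None or t < first_time:
            first_name, first_time = name, t
    if first_name is None:
        return last_followup, 0, None   # censored at last follow-up
    return first_time, 1, first_name    # only the first event is counted

# Illustrative patient: chronic GVHD at 8 months, relapse at 14, death at 20.
patient = {"cgvhd_systemic": 8.0, "relapse": 14.0, "death": 20.0}
print(composite(patient, 20.0, "GRFS"))                # (8.0, 1, 'cgvhd_systemic')
print(composite(patient, 20.0, "GVHD-free survival"))  # (8.0, 1, 'cgvhd_systemic')
# Relapse is not a component of GVHD-free survival, so death is the first event.
print(composite({"relapse": 5.0, "death": 9.0}, 9.0, "GVHD-free survival"))  # (9.0, 1, 'death')
```

In the first patient, the later relapse and death do not change the GRFS outcome, because the endpoint is reached at the first qualifying event.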
Since cGRFS and dGRFS use data after the first component event, they require rigorous data collection over a prolonged time period to capture resolution dates of relapse and GVHD, which may fluctuate between active and resolved. The challenge of accurately collecting such data and designing trials based on hypothesized treatment effects at various time points has led to reluctance to use these endpoints as primary endpoints, particularly in multicenter trials. Moreover, definitions of the components are not fully standardized. Some studies have defined a patient as having chronic GVHD regardless of whether systemic therapy was required. Others required moderate to severe chronic GVHD by National Institutes of Health (NIH) criteria rather than chronic GVHD requiring systemic immunosuppression. Another [12] counted only severe chronic GVHD, because systemic treatment of chronic GVHD does not always reflect its severity; treatment may also be indicated for patients with moderate disease. In addition, information on the start of systemic therapy is not routinely collected by the major HCT registries.
A shortcoming of composite endpoints such as GRFS or CRFS is that all components are assumed to have equal clinical significance. Furthermore, if a patient experiences multiple events (e.g., GVHD and death), only the patient’s first event (GVHD) is counted. The first event is typically of lesser clinical importance than subsequent events. In response to this shortcoming, modified composite endpoints have been proposed including a weighted composite endpoint [13] or a rank-based composite endpoint [14, 15]. A weighted composite endpoint takes into account the clinical significance of its components by pre-specifying weights for those components. For example, if the weight for death is 1, the weight for GVHD or relapse is less than 1. Determination of weights requires a consensus and must be pre-specified prior to conducting trials. Schoenfeld and Finkelstein [14] and Pocock et al [15] proposed a rank-based approach in which component endpoints are ranked according to their severity. For example, death is worse than relapse which is worse than GVHD. All pairs of patients, one from each treatment arm in each pair, are compared with respect to the severity and timing of endpoints. More severe, earlier endpoints are of greater consequence. The superior treatment corresponds to the treatment with less severe and/or later endpoints in more of the pairs. These modified composite endpoints have not yet been utilized in alloHCT studies.
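The rank-based idea described above can be illustrated with a simplified pairwise comparison. The sketch below is not the published procedure of [14] or [15]: it only shows the core step of comparing every experimental-arm patient with every control-arm patient over their shared follow-up, moving down the severity hierarchy (death, then relapse, then GVHD) until one patient fares worse. The data layout and the function names `compare_pair` and `win_summary` are assumptions made for illustration, and no inference (p-value or confidence interval) is attempted.

```python
from itertools import product

HIERARCHY = ("death", "relapse", "gvhd")  # most to least severe, as in the text

def compare_pair(a, b):
    """Return +1 if patient a fares better than b, -1 if worse, 0 if tied.

    Each patient is a dict with 'followup' (months) and optional event times.
    """
    horizon = min(a["followup"], b["followup"])  # shared follow-up window
    for ev in HIERARCHY:
        ta, tb = a.get(ev), b.get(ev)
        ta = ta if ta is not None and ta <= horizon else None
        tb = tb if tb is not None and tb <= horizon else None
        if ta is None and tb is None:
            continue                  # undecided at this level, go to the next
        if tb is None:
            return -1                 # only a had the event
        if ta is None:
            return 1                  # only b had the event
        if ta != tb:
            return 1 if ta > tb else -1   # the later event is better
    return 0

def win_summary(experimental, control):
    """Count (wins, losses, ties) for the experimental arm over all pairs."""
    scores = [compare_pair(e, c) for e, c in product(experimental, control)]
    return scores.count(1), scores.count(-1), scores.count(0)

# Hypothetical patients, for illustration only.
experimental = [{"followup": 24, "gvhd": 6}, {"followup": 24, "death": 10}]
control = [{"followup": 24, "relapse": 12}, {"followup": 24}]
print(win_summary(experimental, control))  # (1, 3, 0)
```

Unlike a time-to-first-event composite, a death in this scheme always outranks an earlier GVHD event in the comparator, which is the property the rank-based proposals are designed to capture.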
Benefits of Composite Endpoints
Because it combines several clinically relevant endpoints into a single endpoint, a composite endpoint is simple and avoids the issue of controlling competing events when designing randomized clinical trials. Since competing risks are inherent to alloHCT studies, if a single component is a primary endpoint, it is necessary to control competing events in sample size calculation to show that an intervention does not exacerbate the competing events. If an intervention benefits all of the component outcomes, the composite endpoint will potentially have higher statistical power, requiring fewer patients and a shorter follow-up time [16, 17]. The alternative is to calculate sample size within the competing risks framework [18,19]. These approaches typically require a much larger sample size to achieve the same level of statistical power due to the competing event incidence that needs to be controlled. Since conducting a randomized trial in transplant is very resource intensive, with a limited patient pool and often costly interventions, if this assumption is satisfied, this can be an important advantage in trial design.
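The sample-size argument can be made concrete with Schoenfeld's approximation for the number of events required by a two-sided log-rank test with 1:1 allocation, d = 4(z_{1-α/2} + z_{1-β})² / (ln HR)². The sketch below is a rough illustration only: the hazard ratio of 0.7 and the event probabilities of 0.75 and 0.30 are hypothetical, the same hazard ratio is assumed for both endpoints (the very assumption discussed above), and the competing-risks adjustments of [18,19] are ignored.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hr, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a 1:1 two-sided log-rank test."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hr) ** 2)

def required_patients(hr, event_probability, alpha=0.05, power=0.80):
    """Patients needed if each has the given probability of an observed event."""
    return ceil(required_events(hr, alpha, power) / event_probability)

events = required_events(0.7)
print(events)                        # 247
# Hypothetical event probabilities by the end of follow-up:
print(required_patients(0.7, 0.75))  # 330, composite endpoint with frequent events
print(required_patients(0.7, 0.30))  # 824, single component with fewer events
```

The number of events depends only on the hazard ratio, type I error and power, so an endpoint with a higher event rate reaches that number with fewer patients or shorter follow-up.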
Limitations of Composite Endpoints
Composite endpoints lump all component endpoints together and treat each component as having equal clinical severity. This assumption is often not satisfied. If the clinical severity is unequal among component endpoints, use of a composite endpoint as a sole basis for decision-making raises serious concerns and may lead to misuse and misinterpretation of study results. For example, GVHD, relapse and death are treated with equal severity in GRFS although relapse and death are much more serious than GVHD, which frequently responds to treatment and may, in fact, limit the risks of later relapse [6, 20-23]. In a recent retrospective cohort study of 565 adult patients undergoing first-time alloHCT, Magenau et al [24] examined the relative impact of each GRFS component on overall survival in a Cox regression model treating each component as a time-dependent covariate. Relapse conferred the greatest risk for death (hazard ratio [HR], 7.89; 95% confidence interval [CI], 5.83 to 10.69), followed by grade III-IV acute GVHD (HR, 6.16; 95% CI, 1.16 to 2.46) and then chronic GVHD requiring systemic immunosuppressive therapy (HR, 1.69; 95% CI, 1.16 to 2.46). The overall GRFS composite of these three components (excluding death) had an HR of 4.81 (95% CI, 3.61 to 6.41) for death, which was primarily driven by the impact of relapse and grade III-IV acute GVHD on death. The lower risk from chronic GVHD requiring immunosuppressive therapy is not surprising, as many cases of chronic GVHD can be reversed by appropriate treatment. This result is consistent with the report by Solomon et al. [10]. In that retrospective study of 422 patients who received an alloHCT, the proportion of patients living with active moderate-to-severe chronic GVHD decreased from 23% at 1 year to 4% at 4 years post-transplant. Most patients’ chronic GVHD resolved, even if associated with ongoing morbidity and an extended treatment burden.
Other limitations associated with composite endpoints include directionality and proportionality. A composite endpoint is most powerful when the intervention works proportionally and in the same direction for its components. Figure 1A illustrates this scenario when there are three components. A less powerful scenario is shown in Figure 1B, where the intervention reduces the risk of an event of interest (e.g., GVHD) without affecting the other (competing) events (e.g., non-relapse mortality [NRM] and relapse). An example of this is the BMT CTN 1203 study, which tested three novel GVHD prophylactic treatments (TAC/MMF/post-transplant cyclophosphamide (PTCy), TAC/MTX/Bortezomib, TAC/MTX/Maraviroc) against a non-randomized contemporary control [6] (Figure 2A). In this study, cumulative incidences of both grade III-IV acute GVHD and chronic GVHD requiring immunosuppression were found to be significantly lower in the TAC/MMF/PTCy arm compared to the control arm without an increase in relapse or NRM. In other words, higher GRFS in the TAC/MMF/PTCy arm was derived from a lower incidence rate of GVHD without compromising the competing events of relapse or death.
Figure 1.
Hypothetical Scenarios for a composite endpoint which includes GVHD, NRM, relapse as its components. ‘C’ denotes the control arm; ‘E’ denotes the experimental arm. (A) All component endpoints proportionally decrease in the experimental arm. (B) GVHD decreases, but other components are unchanged in the experimental arm. (C) Decrease in GVHD, but increase in relapse in the experimental arm. (D) Decrease in GVHD, but increase in NRM in the experimental arm. (E) Decrease in GVHD, but increase in both NRM and relapse in the experimental arm.
Figure 2.
(A) BMT CTN 1203. Estimates of 1-year cumulative incidences of chronic GVHD requiring IS, NRM and relapse, and 6-month cumulative incidence of grade III-IV acute GVHD. IS: immunosuppression. TAC: tacrolimus. MMF: mycophenolate. MTX: methotrexate. PTCy: post-transplant cyclophosphamide. BORT: bortezomib. MVC: maraviroc. (B) BMT CTN 1301. Estimates of 2-year cumulative incidences of relapse, NRM and moderate-severe chronic GVHD, and 6-month cumulative incidence of grade III-IV acute GVHD. (C) ATLG study [20]. Estimates of 2-year cumulative incidences of moderate-severe chronic GVHD, NRM and relapse, and 6-month cumulative incidence of grade III-IV acute GVHD. Note: incidence rates of these four events were taken from the published reports and stacked for the purpose of presentation. Therefore, the sum of these rates may not equal the complement of GRFS (i.e., 1-GRFS), although the sum of the incidence rates of relapse and NRM is the complement of PFS (i.e., 1-PFS). Cumulative incidence rates of moderate-severe or IS-requiring chronic GVHD and grade III-IV acute GVHD were calculated separately in the competing risks framework. *: P<0.05; **: P<0.01 in comparison with the control cohort.
Since a composite endpoint as a single endpoint does not control the rate of competing events, it is possible for an intervention to decrease the event of interest (e.g., GVHD) yet increase competing events such as NRM or relapse (Figures 1C-1E). This is shown in BMT CTN 1301 [25]. In this study, two calcineurin inhibitor-free GVHD prophylactic strategies (CD34 selection and PTCy) were compared to the control arm (TAC/MTX). The primary endpoint, CRFS, was similar across the three arms, but OS was significantly lower and NRM was significantly higher in the CD34 selection arm compared to the control arm (Figure 2B). Another example is a prospective, randomized, double-blind phase 3 trial of anti-T-lymphocyte globulin (ATLG) designed to assess the impact of ATLG on chronic GVHD [26]. In this study, CRFS was similar between the ATLG and placebo arms (p=0.73) while the cumulative incidence of moderate-to-severe chronic GVHD was significantly lower in the ATLG arm (p<0.001). Although CRFS was higher, PFS (47% vs 65%, respectively, p=0.04) was significantly lower in the ATLG arm (Figure 2C). In Figure 2C, the complement of PFS (i.e., 1-PFS) is broken down into relapse (32% vs 21%, p=0.1) and NRM (21% vs 13.5%, p=0.53).
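The decomposition of 1-PFS into relapse and NRM can be checked directly from the ATLG figures quoted above. The short Python sketch below only redoes that arithmetic with the reported percentages; the variable names are illustrative.

```python
# Reported 2-year estimates from the ATLG trial as quoted in the text.
arms = {
    "ATLG":    {"relapse": 0.32, "nrm": 0.21,  "pfs": 0.47},
    "placebo": {"relapse": 0.21, "nrm": 0.135, "pfs": 0.65},
}

for name, arm in arms.items():
    # Relapse and NRM are the only two ways to fail PFS, so their
    # cumulative incidences should add up to 1 - PFS.
    implied_pfs = 1 - (arm["relapse"] + arm["nrm"])
    print(f"{name}: reported PFS {arm['pfs']:.3f}, implied PFS {implied_pfs:.3f}")
```

The implied values (0.470 and 0.655) agree with the reported PFS up to rounding. The same additivity does not hold for GRFS or CRFS when the GVHD incidences are estimated separately, which is why the stacked bars in Figure 2 need not sum to 1-GRFS.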
If the effects of an intervention on different components point in opposite directions, there will be justifiably less power for the composite endpoint since the superiority of the intervention is questionable. This is particularly true when the intervention is hypothesized to directly affect a less consequential endpoint (e.g., GVHD) rather than the other, more serious components (e.g., death or relapse). In fact, in many transplant studies, the assumption of directionality and proportionality is rarely met because GVT and GVHD are immunologically intertwined with opposing clinical consequences. That is, an immunosuppressive intervention that reduces GVHD risk often comes at the expense of compromising GVT, although newer strategies such as PTCy could potentially decouple GVHD from GVT. Graft lymphodepletion was an early example. It is known that ex vivo or in vivo T-cell depleted allogeneic HCT can reduce GVHD but may negatively affect relapse by delaying immune recovery [4, 27]. Unmodified transplants are associated with a higher risk of GVHD than alloHCT using lymphodepleting regimens [27]. As a result, a bias in favor of lymphodepleting regimens can occur if the composite endpoint includes GVHD. In such a case, the overall efficacy evaluation can lead to potential misinterpretation. CD34-selected transplants with resultant lymphodepletion present an additional issue. CD34-selected transplants have lower GVHD risk but higher infection rates [28], which are not captured in GRFS. It is unclear whether serious infections should be included in a composite endpoint, and it could be difficult to include infections in the primary endpoint since there are many infection types with varying severity. Also, a composite endpoint with too many components may complicate interpretation due to the directionality problems described above.
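The loss of power under opposing effects can be illustrated with a simple calculation. For a time-to-first-event composite, the hazard of the composite is the sum of the cause-specific hazards of its components, so under the simplifying assumption of constant cause-specific hazards the composite hazard ratio is an average of the component hazard ratios weighted by the control-arm hazards. All numbers below are hypothetical and chosen only to mimic the scenario in Figure 1E.

```python
def composite_hazard_ratio(control_hazards, hazard_ratios):
    """Composite HR under constant cause-specific hazards (a simplifying assumption).

    control_hazards: cause-specific hazard per year in the control arm.
    hazard_ratios:   experimental-vs-control HR for each component.
    """
    control_total = sum(control_hazards.values())
    experimental_total = sum(
        control_hazards[k] * hazard_ratios[k] for k in control_hazards
    )
    return experimental_total / control_total

# Hypothetical values: GVHD is frequent and halved by the intervention,
# while relapse and NRM are less frequent and both increase.
control = {"gvhd": 0.40, "relapse": 0.20, "nrm": 0.10}
effects = {"gvhd": 0.50, "relapse": 1.30, "nrm": 1.40}

print(round(composite_hazard_ratio(control, effects), 3))  # 0.857
```

In this illustration the composite appears to favor the intervention (HR below 1) even though both of the more serious components are worse, because the most frequent component dominates the weighted average.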
Example: 8/8 HLA allele MUD vs UCBT
To illustrate the limitations of a composite endpoint, we conducted a retrospective data analysis of 545 patients who underwent 8/8 HLA allele matched unrelated peripheral blood stem cell transplant (8/8 MUD) and 65 patients who underwent double unit umbilical cord blood transplant (UCBT) at the Dana-Farber Cancer Institute between 2010 and 2015. For the entire cohort, the median follow-up time among survivors was 54 months (range 14, 100). Four-year GRFS was 24% in the UCBT cohort and 8% in the 8/8 MUD cohort (p=0.0026) (Figure 3A, Table 1); four-year CRFS was 27% in the UCBT cohort and 9% in the 8/8 MUD cohort (p=0.0085) (Figure 3B, Table 1). However, 4-year OS was 40% in the UCBT cohort and 52% in the 8/8 MUD cohort (p=0.024) (Figure 3C, Table 1). The lower OS in the UCBT cohort was largely due to the higher NRM rate in this cohort: 4-year NRM was 28% vs 15%, respectively (p=0.0026 using the Gray test [29]) (Figure 3D, Table 1). The gain in GRFS and CRFS in the UCBT cohort is largely due to the substantially lower chronic GVHD incidence rate: 2-year cumulative incidence of chronic GVHD was 23% vs 53% in the UCBT and 8/8 MUD cohorts, respectively (p<0.001 using the Gray test) (Figure 3E, Table 1). In contrast, the competing event of relapse or death without developing chronic GVHD was significantly higher in the UCBT cohort (50% vs 32%, respectively, p=0.01) (Figure 3F, Table 1). On examining event times, we noted that many competing events occurred early after UCBT, which is consistent with previous reports that UCBT is associated with an increased risk of early mortality from delayed engraftment or infection [30]. This suggests that the lower GVHD rate for UCBT may in part be due to its higher early death rate. Because composite endpoints do not distinguish the types and times of individual events, the higher GRFS/CRFS outcomes in UCBT, which are driven by a lower chronic GVHD rate, mask the higher relapse and/or death rate.
This example shows that to assess the benefits and risks of an intervention properly, all component endpoints must be evaluated and presented simultaneously. Composite endpoints evaluated without proper attention can lead to incorrect conclusions and thereby undermine the progress of HCT research and ultimately patient care.
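Presenting the components simultaneously usually means estimating a cumulative incidence function for each event type in the competing risks framework, as in Figures 3E and 3F. The sketch below is a minimal pure-Python version of the standard nonparametric (Aalen-Johansen) estimator; it is not the software used for the analysis above, the function name and cause coding are assumptions, and the five-patient data set is hypothetical.

```python
from collections import Counter

def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence of one cause with competing risks.

    causes: 0 = censored, 1, 2, ... = type of first event.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    by_time = {}
    for t, c in zip(times, causes):
        by_time.setdefault(t, Counter())[c] += 1
    at_risk = len(times)
    event_free = 1.0   # probability of no event of any type just before t
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        n_events = sum(n for c, n in counts.items() if c != 0)
        if n_events:
            cif += event_free * counts[cause] / at_risk
            event_free *= 1 - n_events / at_risk
            curve.append((t, cif))
        at_risk -= sum(counts.values())
    return curve

# Hypothetical data: cause 1 = chronic GVHD, cause 2 = relapse or death
# without chronic GVHD, 0 = censored.
times = [1, 2, 3, 4, 5]
causes = [2, 1, 2, 1, 0]
cgvhd = cumulative_incidence(times, causes, 1)[-1][1]
rel_death = cumulative_incidence(times, causes, 2)[-1][1]
print(round(cgvhd, 3), round(rel_death, 3))
```

A patient who relapses or dies early leaves the risk set and can no longer contribute a chronic GVHD event, which is the mechanism by which a high early death rate can lower the observed chronic GVHD incidence.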
Figure 3.
Comparison of endpoints between 545 patients in the 8/8 MUD cohort and 65 patients in the UCBT cohort. Comparison of (A) GRFS, (B) CRFS, (C) OS, (D) cumulative incidence of NRM treating relapse as a competing event, (E) cumulative incidence of chronic GVHD, and (F) cumulative incidence of the competing event of chronic GVHD, which is relapse or death (Rel/Death) without developing chronic GVHD. The log-rank test was used for comparison of GRFS, CRFS and OS, whereas the Gray test [29] was used for comparison of NRM, chronic GVHD, and Rel/Death without chronic GVHD.
Table 1.
Summary of Clinical Outcome
| Cohort | N | 4y GRFS | 4y CRFS | 4y OS | 4y NRM | 4y Relapse | 2y cGVHD | 2y Rel/Death |
|---|---|---|---|---|---|---|---|---|
| UCBT | 65 | 24% (15, 35) | 27% (17, 39) | 40% (28, 52) | 28% (18, 40) | 34% (23, 45) | 23% (14, 34) | 48% (35, 59) |
| 8/8 MUD | 545 | 8% (6, 11) | 9% (7, 12) | 52% (47, 56) | 15% (12, 18) | 39% (35, 43) | 53% (48, 57) | 33% (29, 37) |
| p-value | | 0.026 | 0.009 | 0.024 | 0.0026 | 0.49 | <0.001 | 0.01 |
Values in parentheses are 95% confidence intervals. Log-rank test was used for group comparisons of GRFS, CRFS and OS; Gray test (29) was used for group comparisons of NRM, relapse, chronic GVHD (cGVHD) and relapse/death. Relapse or death without developing cGVHD was a competing event for cGVHD.
Multivariable Regression Analysis for Composite Endpoints
Because composite endpoints combine all failure events together, it is not possible to identify risk factors for individual components, and risk factors for directionally opposing component events may not be identified. Therefore, it is difficult to make any meaningful interpretation of the result from multivariable regression analysis. This is seen in a multivariable analysis of the aforementioned example of 8/8 MUD vs. UCBT. In multivariable analysis, compared with 8/8 MUD, UCBT is associated with significantly shorter survival (hazard ratio (HR) 1.96, p=0.0008) but lower cGVHD (subdistribution HR (sHR) 0.27, p<0.0001, using a Fine and Gray model [31]) (Table 2). When these directionally opposite events are combined in a composite endpoint of GRFS, UCBT is marginally associated with improved GRFS (HR 0.73, p=0.058) despite the clear finding of worse OS. Advanced disease status is a significant risk factor for GRFS (HR 1.33 for no CR/PR, p=0.009), but this is largely due to the effect of disease status on OS (HR 1.45 for no CR/PR, p=0.007) and NRM (sHR 1.75, p=0.017). A higher HCT-CI score is a significant risk factor for OS (HR 1.74 for HCT-CI≥3, p<0.0001) and relapse (sHR 1.5, p=0.01) but is inversely associated with chronic GVHD (sHR 0.68, p=0.006). Due to the opposing effect of HCT-CI on chronic GVHD versus OS and relapse, HCT-CI is not significant in GRFS (HR 1.09, p=0.40). Multivariable regression analysis is important for identification of event-specific risk factors and thereby for developing event-specific interventions. This example shows that multivariable analysis for composite endpoints is not useful for identification of event-specific risk factors and that interpretation of the result can be misleading.
Table 2.
Multivariable Cox model for GRFS, CRFS and OS and multivariable Fine and Gray model for cGVHD, NRM and Relapse.
| Variable | Level | GRFS HR | GRFS 95% CI | GRFS p-val. | CRFS HR | CRFS 95% CI | CRFS p-val. | OS HR | OS 95% CI | OS p-val. | cGVHD sHR | cGVHD 95% CI | cGVHD p-val. | NRM sHR | NRM 95% CI | NRM p-val. | Relapse sHR | Relapse 95% CI | Relapse p-val. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cohort | 8/8 MUD | ref | | | ref | | | ref | | | ref | | | ref | | | ref | | |
| | UCBT | 0.73 | 0.52, 1.01 | 0.058 | 0.68 | 0.48, 0.95 | 0.025 | 1.96 | 1.33, 2.91 | 0.0008 | 0.27 | 0.15, 0.46 | <.0001 | 3.35 | 1.84, 6.12 | <.0001 | 0.83 | 0.49, 1.41 | 0.49 |
| Age | | 1.00 | 0.99, 1.01 | 0.70 | 1.00 | 1.00, 1.01 | 0.55 | 1.01 | 1.00, 1.03 | 0.014 | 1.00 | 0.99, 1.01 | 0.53 | 1.02 | 1.00, 1.04 | 0.046 | 1.01 | 1.00, 1.02 | 0.27 |
| Dz Status | CR | ref | | | ref | | | ref | | | ref | | | ref | | | ref | | |
| | PR | 1.19 | 0.96, 1.48 | 0.12 | 1.15 | 0.93, 1.44 | 0.20 | 0.93 | 0.68, 1.27 | 0.65 | 1.10 | 0.83, 1.44 | 0.51 | 0.97 | 0.57, 1.66 | 0.92 | 1.01 | 0.72, 1.43 | 0.94 |
| | no CR/PR | 1.33 | 1.08, 1.65 | 0.009 | 1.26 | 1.02, 1.57 | 0.035 | 1.45 | 1.11, 1.91 | 0.007 | 0.95 | 0.71, 1.26 | 0.72 | 1.75 | 1.11, 2.77 | 0.017 | 1.07 | 0.77, 1.48 | 0.69 |
| HCT-CI | 0 | ref | | | ref | | | ref | | | ref | | | ref | | | ref | | |
| | 1 | 0.99 | 0.76, 1.29 | 0.92 | 0.99 | 0.75, 1.29 | 0.92 | 1.09 | 0.74, 1.59 | 0.66 | 1.05 | 0.77, 1.44 | 0.76 | 1.34 | 0.75, 2.41 | 0.32 | 0.70 | 0.43, 1.14 | 0.16 |
| | 2 | 1.02 | 0.78, 1.33 | 0.87 | 1.11 | 0.85, 1.45 | 0.44 | 1.36 | 0.94, 1.98 | 0.11 | 0.64 | 0.45, 0.92 | 0.016 | 1.04 | 0.53, 2.04 | 0.92 | 1.37 | 0.93, 2.04 | 0.11 |
| | ≥ 3 | 1.09 | 0.89, 1.34 | 0.40 | 1.12 | 0.91, 1.38 | 0.28 | 1.74 | 1.32, 2.29 | <.0001 | 0.68 | 0.52, 0.90 | 0.006 | 1.12 | 0.70, 1.80 | 0.63 | 1.50 | 1.11, 2.05 | 0.01 |
CI: confidence interval. HR: hazard ratio. sHR: subdistribution hazard ratio from Fine and Gray’s competing risks regression analysis (31). Dz Status: disease status at HCT. CR: complete remission. PR: partial remission. HCT-CI: HCT comorbidity index. All models included cohort, age, patient sex, disease status, HCT-CI, M<-F (male patient with female donor) Y/N, conditioning intensity, and Karnofsky score. Patient sex, M<-F, conditioning intensity, and Karnofsky score are not significant in any of these models. Therefore, these factors are not presented in the table for the purpose of clarity.
Designing Randomized Clinical Trials
Randomized clinical trials (RCTs) have played a pivotal role in evaluating new drugs and providing evidence for decision-making and clinical guidelines. The success of an RCT is highly dependent on the choice of the primary endpoint. The optimal primary endpoint is a single endpoint that comprehensively characterizes the disease under investigation, provides clinical evidence on the safety and efficacy of the intervention, allows clear interpretation of the effect, and can be readily analyzed using simple statistical methods [7]. A composite endpoint would be most appropriate as the primary endpoint when the incidence rate of each component is too low to be considered for a study with adequate power and when individual components have reasonably similar clinical importance. When components with low incidence rates are combined, the composite endpoint can provide a substantially higher overall event rate and thus adequate power. A composite endpoint such as GRFS or CRFS is generally inadequate as the primary endpoint in transplant clinical trials due to the limitations discussed above. This is particularly relevant for GVHD prevention trials. Because the incidence rate of GVHD is high, the effect of an intervention can be entirely driven by GVHD. Even if an intervention is effective for preventing GVHD, it can adversely affect more serious components such as relapse or death. Therefore, if GRFS or CRFS is considered for the primary endpoint in a prospective RCT, plans such as designing a co-primary endpoint or imposing rigorous early stopping rules on critical endpoints should be made to control the incidence rates of critical endpoints [32].
For GVHD treatment trials, the primary endpoint is usually response at a pre-specified time point. In such a trial, it is critical that patients who die without response assessment be counted as treatment failures. If excessive immunosuppression that controls GVHD also augments the risks of infection and consequent NRM, then the GVHD-free survival rate at a specific time point (e.g., day 56 GVHD-free survival) captures both the response and the avoidance of secondary morbid complications.
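The effect of how deaths before response assessment are handled can be shown with a small calculation. The sketch below uses ten hypothetical patients and an assumed three-level status coding; it simply contrasts the response rate that counts unassessed deaths as failures with the rate obtained by excluding them.

```python
def response_rates(patients):
    """Return (rate with deaths counted as failures, rate excluding them).

    Each patient has a 'status' of 'response', 'no_response', or
    'died_unassessed' (hypothetical coding for this illustration).
    """
    n = len(patients)
    responders = sum(1 for p in patients if p["status"] == "response")
    assessed = sum(1 for p in patients if p["status"] != "died_unassessed")
    return responders / n, responders / assessed

# Hypothetical arm: 5 responders, 2 non-responders, 3 deaths before assessment.
arm = (
    [{"status": "response"}] * 5
    + [{"status": "no_response"}] * 2
    + [{"status": "died_unassessed"}] * 3
)
as_failure, excluded = response_rates(arm)
print(round(as_failure, 3), round(excluded, 3))  # 0.5 0.714
```

Excluding the unassessed deaths inflates the apparent response rate, which is why the text recommends counting them as treatment failures.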
dGRFS or cGRFS can be used as secondary endpoints along with quality-of-life (QOL) measures. Although important, QOL applies mostly to the subset of patients who survive a certain length of time, past their acute course. Thus, it is most appropriate to assess QOL separately for those patients as a supplement to defined clinical endpoints. QOL is also closely correlated with the clinical outcome. In other words, if an intervention lowers the incidence of moderate-severe chronic GVHD, the overall QOL in the intervention cohort should also be improved, as it is known that patients with no or mild chronic GVHD generally have better QOL than those with moderate or severe chronic GVHD [33]. Thus, development of better prevention and/or treatment of moderate to severe chronic GVHD should be associated with improved QOL. In QOL studies, complete data capture for patient-reported outcomes is also a major challenge. In a recent study of patient-reported outcomes in association with chronic GVHD [33], among 3027 long-term survivors with a mean survival time of 13.8 years, only 45% of patients (N=1377) returned their survey, indicating that subject compliance is a concern in QOL assessment. In addition, it may be difficult to determine what constitutes a clinically meaningful improvement in patient-reported outcomes.
Data Analysis of Composite Endpoints
If a composite endpoint is the primary endpoint, data analysis of the composite endpoint must be accompanied by analyses of the individual components. This is particularly important for transplant trials with directionally different component effects. Even when efficacy is demonstrated for the composite endpoint, it remains important to examine the individual component effects to obtain a more complete interpretation of the results. This point is well described in the Food and Drug Administration (FDA) draft guidance on regulatory and statistical strategies [32, 34]. According to the FDA draft, the effect on the composite endpoint will not be a reasonable indicator of the effect on all of the components if the clinical importance of different components is substantially different and the effect of an intervention is chiefly on the least important event. Furthermore, it is possible that a component with greater importance may appear to be adversely affected by the treatment, even if one or more event types of lesser importance are favorably affected. Thus, although the overall outcome still has a favorable statistical result, doubt may arise about the treatment’s clinical value. In this case, although the overall statistical analysis indicates the treatment is successful, careful examination of the data may challenge the conclusion. For this reason, as well as for a greater depth of understanding of the treatment’s effects, analyses of the components of the composite endpoint are important [32].
Conclusion
Cancer treatments come with substantial risks of treatment-related morbidity and mortality. While the ultimate goal is to achieve a longer and adverse event-free survival, patients are often treated aggressively with the aim of net clinical benefit and potential cure. GRFS or CRFS is generally inadequate as the primary endpoint, although these composite endpoints can be used as a summary measure for the overall effectiveness of an intervention. Since deleterious but sometimes treatable events such as chronic GVHD occur commonly after alloHCT, these composite endpoints, if used as the primary endpoint, can misrepresent the true efficacy of transplant strategies and mislead their interpretation. As a result, this may confound patient care decisions, since many high-risk patients with hematologic disorders benefit from alloHCT despite the accompanying adverse, yet manageable, complications. Dynamic GRFS and current GRFS could likewise be used as summary measures or for comparison of overall treatment efficacy alongside critical endpoints, but these endpoints are also unsuitable as primary endpoints in prospective (prevention) clinical trials, as the multistate model required for these endpoints may be too complicated to evaluate prospectively and presents challenges for sample size determination and data collection.
It is critical to identify risk factors for each component so that appropriate interventions can be developed. Thus, when a composite endpoint is analyzed, all its components must be analyzed and reported to provide a more complete description of treatment efficacy. Numerous factors must be examined, and all these approaches may have value in advancing the field, but the complexities and possible pitfalls of composite endpoints must be recognized.
Highlights
The use of recently developed composite endpoints for evaluation of allogeneic hematopoietic cell transplantation (alloHCT) outcomes is growing rapidly.
All component endpoints may not have equal clinical significance and an intervention may not work proportionally in the same direction for all components of a composite endpoint. This may complicate the interpretation of results, particularly if there are opposing effects of differing component endpoints.
We assess the benefits and limitations of various novel composite endpoints used in alloHCT studies and propose guidelines for their use and interpretation.
Acknowledgements
This work was supported by research funding from the National Cancer Institute (P01CA229092) (HTK). Support for the BMT CTN 1203 and 1301 trials was provided by grant #U10HL069294 to the Blood and Marrow Transplant Clinical Trials Network from the National Heart, Lung, and Blood Institute and the National Cancer Institute. We thank the Blood and Marrow Transplant Clinical Trials Network for permitting use of the published data. The content is solely the responsibility of the authors and does not necessarily represent the official. We also thank Dr. Robert Soiffer for permitting use of the published data of ATLG [22].
Footnotes
Conflict-of-interest disclosure: None
REFERENCES
- 1. Weiden PL, Flournoy N, Thomas ED, et al. Antileukemic effect of graft-versus-host disease in human recipients of allogeneic-marrow grafts. N Engl J Med. 1979;300:1068–73.
- 2. Horowitz MM, Gale RP, Sondel PM, et al. Graft-versus-leukemia reactions after bone marrow transplantation. Blood. 1990;75:555–562.
- 3. Kolb HJ. Donor leukocyte transfusions for treatment of recurrent chronic myelogenous leukemia in marrow transplant patients. Blood. 1990;76:2462–5.
- 4. Marmont AM, Horowitz MM, Gale RP, et al. T-cell depletion of HLA-identical transplants in leukemia. Blood. 1991;78:2120–30.
- 5. Holtan SG, DeFor TE, Lazaryan A, Bejanyan N, Arora M, Brunstein CG, Blazar BR, MacMillan ML, Weisdorf DJ. Composite end point of graft-versus-host disease-free, relapse-free survival after allogeneic hematopoietic cell transplantation. Blood. 2015;125:1333–8.
- 6. Bolaños-Meade J, Reshef R, Fraser R, et al. Three prophylaxis regimens (tacrolimus, mycophenolate mofetil, and cyclophosphamide; tacrolimus, methotrexate, and bortezomib; or tacrolimus, methotrexate, and maraviroc) versus tacrolimus and methotrexate for prevention of graft-versus-host disease with haemopoietic cell transplantation with reduced-intensity conditioning: a randomised phase 2 trial with a non-randomised contemporaneous control group (BMT CTN 1203). Lancet Haematol. 2019 March;6(3):e132–e143.
- 7. Sankoh AJ, Li H, D'Agostino RB Sr. Use of composite endpoints in clinical trials. Stat Med. 2014 November 30;33(27):4709–14.
- 8. Mehta RS, Holtan SG, Wang T, et al. Composite GRFS and CRFS Outcomes After Adult Alternative Donor HCT. J Clin Oncol. 2020 June 20;38(18):2062–2076.
- 9. Bolaños-Meade J, Logan BR, Alousi AM, et al. Phase 3 clinical trial of steroids/mycophenolate mofetil vs steroids/placebo as therapy for acute GVHD: BMT CTN 0802. Blood. 2014 November 20;124(22):3221–7.
- 10. Solomon SR, Sizemore C, Zhang X, et al. Current graft-versus-host disease-free survival: a dynamic endpoint to better define efficacy after allogenic transplant. Biol Blood Marrow Transplant. 2017;23:1208–1214.
- 11. Holtan SG, Zhang L, DeFor TE, et al. Dynamic Graft-versus-Host Disease-Free, Relapse-Free Survival: Multistate Modeling of the Morbidity and Mortality of Allotransplantation. Biol Blood Marrow Transplant. 2019 September;25(9):1884–1889.
- 12. Ruggeri A, Labopin M, Ciceri F, Mohty M, Nagler A. Definition of GvHD-free, relapse-free survival for registry-based studies: an ALWP–EBMT analysis on patients with AML in remission. Bone Marrow Transplant. 2016;51:610–1.
- 13. Bakal JA, Westerhout CM, Armstrong PW. Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Stat Methods Med Res. 2015 December;24(6):980–8.
- 14. Finkelstein DM, Schoenfeld DA. Combining mortality and longitudinal measures in clinical trials. Stat Med. 1999;18:1341–1354.
- 15. Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal. 2011;33:176–182.
- 16. Armstrong PW, Westerhout CM. Composite End Points in Clinical Research: A Time for Reappraisal. Circulation. 2017 June 6;135(23):2299–2307.
- 17. McCoy CE. Understanding the Use of Composite Endpoints in Clinical Trials. West J Emerg Med. 2018 July;19(4):631–634. doi: 10.5811/westjem.2018.4.38383. Epub 2018 Jun 4. Review.
- 18. Latouche A, Porcher R. Sample size calculations in the presence of competing risks. Stat Med. 2007 December 30;26(30):5370–80. doi: 10.1002/sim.3114. PMID: 17955563.
- 19. Pintilie M. Dealing with competing risks: testing covariates and calculating sample size. Stat Med. 2002 November 30;21(22):3317–24. doi: 10.1002/sim.1271. PMID: 12407674.
- 20. MacMillan ML, Robin M, Harris AC, et al. A refined risk score for acute graft-versus-host disease that predicts response to initial therapy, survival, and transplant-related mortality. Biol Blood Marrow Transplant. 2015 April;21(4):761–7.
- 21. Holtan SG, Pasquini M, Weisdorf DJ. Acute graft-versus-host disease: a bench-to-bedside update. Blood. 2014 July 17;124(3):363–73.
- 22. Holtan SG, Versluis J, Weisdorf DJ, Cornelissen JJ. Optimizing Donor Choice and GVHD Prophylaxis in Allogeneic Hematopoietic Cell Transplantation. J Clin Oncol. 2021 February 10;39(5):373–385.
- 23. Yeshurun M, Weisdorf D, Rowe JM, et al. The impact of the graft-versus-leukemia effect on survival in acute lymphoblastic leukemia. Blood Adv. 2019 February 26;3(4):670–680.
- 24. Magenau J, Braun T, Gatza E, Churay T, Mazzoli A, Chappell G, Brisson J, Runaas L, Anand S, Ghosh M, Riwes M, Pawarode A, Yanik G, Reddy P, Choi SW. Assessment of Individual versus Composite Endpoints of Acute Graft-versus-Host Disease in Determining Long-Term Survival after Allogeneic Transplantation. Biol Blood Marrow Transplant. 2019 August;25(8):1682–1688.
- 25. Pasquini MC, Luznik L, Logan B, et al. Calcineurin Inhibitor-Free GVHD Prophylaxis in HCT with Myeloablative Conditioning Regimens and HLA-Matched Donors: Results of the BMT CTN 1301 Progress II Trial, The 2021 TCT Meetings, Volume 27, Issue 3 Supplement, Pages S1–S488 (March 2021).
- 26. Soiffer RJ, Kim HT, McGuirk J, et al. Prospective, Randomized, Double-Blind, Phase III Clinical Trial of Anti–T-Lymphocyte Globulin to Assess Impact on Chronic Graft-Versus-Host Disease–Free Survival in Patients Undergoing HLA-Matched Unrelated Myeloablative Hematopoietic Cell Transplantation. J Clin Oncol. 2017 December 20;35(36):4003–4011.
- 27. Bayraktar UD, de Lima M, Saliba RM, et al. Ex vivo T cell-depleted versus unmodified allografts in patients with acute myeloid leukemia in first complete remission. Biol Blood Marrow Transplant. 2013 June;19(6):898–903.
- 28. Martino R, Rovira M, Carreras E, Solano C, et al. Severe infections after allogeneic peripheral blood stem cell transplantation: a matched-pair comparison of unmanipulated and CD34+ cell-selected transplantation. Haematologica. 2001 October;86(10):1075–86.
- 29. Gray RJ. A Class of K-Sample Tests for Comparing the Cumulative Incidence of a Competing Risk. The Annals of Statistics. 1988;16:1141–54.
- 30. Eapen M, Rocha V, Sanz G, et al. Effect of graft source on unrelated donor haemopoietic stem-cell transplantation in adults with acute leukaemia: a retrospective analysis. Lancet Oncol. 2010 July;11(7):653–60.
- 31. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of competing risk. J Am Stat Assoc. 1999;94:496–509.
- 32. US FDA. Multiple endpoints in clinical trials guidance for industry, January 2017. Retrieved from FDA: https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM536750.pdf [Accessed on 16 March 2017].
- 33. Lee SJ, Onstad L, Chow EJ, et al. Patient-reported outcomes and health status associated with chronic graft-versus-host disease. Haematologica. 2018;103(9):1535–41.
- 34. Sankoh AJ, Li H, D'Agostino RB Sr. Composite and multicomponent end points in clinical trials. Stat Med. 2017 December 10;36(28):4437–4440.
- 35. MacMillan ML, Robin M, Harris AC, et al. A refined risk score for acute graft-versus-host disease that predicts response to initial therapy, survival, and transplant-related mortality. Biol Blood Marrow Transplant. 2015 April;21(4):761–7.
- 36. Holtan SG, Pasquini M, Weisdorf DJ. Acute graft-versus-host disease: a bench-to-bedside update. Blood. 2014 July 17;124(3):363–73.
- 37. Holtan SG, Versluis J, Weisdorf DJ, Cornelissen JJ. Optimizing Donor Choice and GVHD Prophylaxis in Allogeneic Hematopoietic Cell Transplantation. J Clin Oncol. 2021 February 10;39(5):373–385.
- 38. Yeshurun M, Weisdorf D, Rowe JM, et al. The impact of the graft-versus-leukemia effect on survival in acute lymphoblastic leukemia. Blood Adv. 2019 February 26;3(4):670–680.




