Author manuscript; available in PMC: 2024 Sep 1.
Published in final edited form as: Best Pract Res Clin Haematol. 2023 May 24;36(3):101479. doi: 10.1016/j.beha.2023.101479

Endpoint Selection and Evaluation in Hematology Studies

Ruta Brazauskas 1, Mary Eapen 2, Tao Wang 3
PMCID: PMC10979628  NIHMSID: NIHMS1974081  PMID: 37611997

Abstract

Observational studies and clinical trials in hematology aim to examine treatments for blood disorders. The outcomes being studied must address the goals of the study and provide meaningful information about treatment course, disease progression, patients’ survival experience, and quality of life. Endpoints are the specific measures of these outcomes, and much consideration should be given to their selection. In this review, we describe the outcomes and endpoints frequently used in studying hematologic diseases and provide general guidelines for their statistical analysis. The main focus is on clinical outcomes which are commonly used in establishing treatment safety and efficacy. We also briefly discuss the role surrogate and composite endpoints play in hematology studies. The importance of patient reported outcomes to a comprehensive assessment of treatment effectiveness is highlighted. The practical considerations provided for choosing primary and secondary endpoints may be helpful in designing hematology clinical trials.

Keywords: hematology, clinical trials, endpoints, outcomes, surrogate endpoints, composite endpoints, primary endpoints, patient reported outcomes

Introduction

Observational studies and clinical trials in hematology aim to examine an array of treatments for blood disorders including chemotherapy, radiation therapy, hematopoietic cell transplantation, and, more recently, targeted treatments, gene therapy and immunotherapy. The latter includes chimeric antigen receptor-T cell (CAR T) therapy and immunomodulating drugs. Some treatment modalities are more common in patients diagnosed with non-malignant hematologic disorders while others are frequently used to treat malignant diseases. The relevant outcomes being studied must address the goals of the study and provide meaningful information to physicians, patients, and policymakers. The outcomes that are reported ought to reflect treatment course, disease progression, and survival, as well as patients’ social, emotional and functional well-being [1]. Endpoints are the specific measures of these outcomes, and much consideration should be given to their selection.

Hematologic malignancies such as leukemia, lymphoma, and multiple myeloma pose an imminent threat to a patient’s life. The goals of the treatment for hematologic malignancies include cure (whenever possible), achieving and prolonging remission, treating existing symptoms, preventing complications, and improving patients’ quality of life. Most non-malignant hematologic diseases are not life-threatening, but they may cause symptoms such as fatigue, pain, infections, and blood loss. Treatments for such diseases are often aimed at the reduction of these symptoms.

In this review, we describe the clinical endpoints frequently used in studying hematologic diseases and provide general guidelines for their statistical analysis. Clinical endpoints are often used in establishing treatment safety and efficacy. Non-clinical endpoints such as biomarkers and their correlation to clinical outcomes may be harder for the patients to understand. Nevertheless, non-clinical endpoints are also associated with disease manifestation and may influence treatment targets and decision making [1]. While some outcomes studied are common across malignant and non-malignant hematologic studies (e.g., survival and patient reported outcomes), others are disease or treatment specific (e.g., neurologic toxicity after CAR T cell therapy). We also briefly discuss composite and surrogate endpoints along with considerations for choosing primary and secondary endpoints in a clinical trial design framework.

Treatment Safety Outcomes

Phase I and II clinical trials usually focus on the safety profile of a new treatment. During the early phases of the trials, researchers determine whether a new treatment is safe, identify its side effects, and establish the best dose and schedule of the new treatment [2]. The latter step involves finding a dose whose escalation would result in undesirable side effects surpassing the threshold of acceptable toxicity. Given the nature of the investigation, most early phase studies focus on adverse events and toxicity caused by the experimental treatment. Most studies assess the safety of anticancer treatments by using the National Cancer Institute’s (NCI’s) Common Terminology Criteria for Adverse Events (CTCAE) reporting system [3]. While many adverse events in the CTCAE reflect abnormal clinical and/or laboratory values, such as anemia, thrombocytopenia, or neutropenia, symptoms such as fatigue, pain, loss of appetite, anxiety and depression are designated to represent symptomatic toxicities. The CTCAE reporting system provides guidelines for evaluating the presence of these complications and their grading.

As hematologic disease treatments evolve, there is a need for monitoring and reporting adverse events which are specific to a particular treatment modality and have not been described in CTCAE. For example, during initial CAR T studies, it was observed that immune activation induced by the treatment may result in severe complications such as cytokine release syndrome (CRS) and immune effector cell-associated neurotoxicity syndrome (ICANS) [4]. The features of these complications after CAR T infusion were not adequately reflected in the CTCAE reporting system used at the time of early CAR T studies. To avoid ambiguity in evaluating CRS and ICANS and to make their reporting uniform, a new set of definitions and grading rules for CRS and neurotoxicity was adopted in 2019 [4]. Using established criteria for evaluating toxicities allows researchers to compare the safety of different products across multiple studies.

Statistical considerations.

When designing early-stage trials with toxicity as the primary outcome, it is important to identify the adverse events to be monitored and the toxicity threshold which will be used for stopping the trial. The time window which will be used to observe possible toxicities is usually determined based on clinical experience. Safety assessments are done periodically by monitoring the number of adverse events relative to the number of enrolled patients to ensure that the safety threshold is not crossed.

Descriptive statistics are often used to summarize the frequency and severity of the adverse events. In addition, a cumulative incidence function (CIF) may be used to depict the cumulative incidence of adverse events over time [5]. This approach provides an accurate estimate of the adverse event incidence while accounting for competing risks (e.g., death before toxicity constitutes a competing risk). Gray’s test can be used to compare two or more cumulative incidence curves [6]. Factors associated with toxicity after the initiation of the therapy can be evaluated using the Fine-Gray subdistribution hazard model which accounts for competing events [7].
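As an illustration of the cumulative incidence function described above, the following is a minimal sketch of the standard nonparametric estimator written in plain Python. The function name, the coding of causes (0 = censored, 1 = adverse event of interest, 2 = death before toxicity) and the toy data are assumptions made for this example and are not taken from any study.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Nonparametric cumulative incidence estimate for one cause.

    times  : follow-up time of each patient (time to first event or censoring)
    causes : 0 = censored; 1, 2, ... = type of the first event observed
    Returns a list of (time, CIF) pairs, one per distinct time with any event.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of ANY type just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = d_any = n_here = 0
        # Gather everyone whose follow-up ends at time t (handles ties).
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            d_any += cause != 0
            d_interest += cause == cause_of_interest
            n_here += 1
            i += 1
        if d_any:
            # Increment: P(event-free before t) * P(event of interest at t | at risk)
            cif += event_free * d_interest / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Hypothetical toy data: days to first event and event type.
days = [1, 2, 3, 4, 5]
types = [1, 2, 0, 1, 0]
for t, p in cumulative_incidence(days, types, cause_of_interest=1):
    print(f"day {t}: cumulative incidence of toxicity = {p:.3f}")
```

Deaths before toxicity lower the event-free probability without adding to the toxicity curve. Treating those deaths as censored in a Kaplan–Meier analysis would instead overstate the incidence of toxicity.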

Efficacy Outcomes in Malignant Hematology Studies

A typical treatment regimen for most malignant hematologic diseases consists of chemotherapy with or without immunotherapy, and the treatment duration varies depending on disease type and severity. Patients’ recovery could be lengthy and complicated as they may experience a number of events which may or may not be related to the treatment per se. The objectives of the treatment include treating cancer, preventing or alleviating treatment side effects, preventing a return of cancer, increasing patients’ survival, and improving their quality of life. Considering these objectives, several endpoints such as response to treatment, relapse, disease progression, and survival are often examined in hematology studies to evaluate treatment efficacy. Most of these endpoints are time-to-event outcomes, and survival analysis techniques are required to account for possible censoring or competing risks. Definitions of these time-to-event endpoints are not uniform and may vary across different studies/clinical trials (e.g., best response to treatment). When evaluating an outcome, it is important to identify the potential censoring or competing risk mechanism based on the definition of the outcome. Table 1 summarizes clinical outcomes considered in hematology studies along with statistical considerations such as censoring or competing risks [[8],[9],[10],[11]]. Note that appropriately defining the time scale and time origin in time-to-event studies is very important, though Table 1 does not specify the time origin. In most clinical studies, the time origin is the starting time of the treatment being studied. Alternatively, the time origin can be the time of diagnosis or, in randomized trials, the time of randomization.

Table 1.

Endpoints reported in malignant hematologic disease studies

Outcome | Definition
Overall survival (OS) | Time to death from any cause
Time to treatment failure (TTF) | Time to discontinuation of treatment for any reason, including disease progression or recurrence, treatment toxicity and death
Time to progression or disease recurrence (TTP) | Time to disease progression or disease recurrence
Progression free survival (PFS) | Time to disease progression or death, whichever comes first
Disease free survival (DFS) | Time to relapse or death, whichever comes first. DFS applies only to patients who achieve remission
Overall response rate (ORR) | Proportion of patients who respond to therapy either partially or fully
Duration of response (DOR) | Time to disease progression or death in patients who achieve complete or partial response

Statistical considerations.

Time-to-event outcomes are common in hematology studies, and censored observations are present in most clinical studies. Survival analysis techniques must be used to properly account for censoring or competing risks when they are present [5]. Note that when analyzing time-to-event outcomes defined in Table 1, patients who are alive without experiencing the event of interest are censored at last contact.

Time-to-event outcomes in hematology studies typically fall into one of two types which require different analysis techniques. The first type of data deals with only one possible failure type. For example, when studying overall survival, death is the event of interest and each patient either experiences the event or is censored at the end of the follow-up without the event. DFS and PFS also fall into the category of simple survival outcomes. This type of event can be graphically presented by Kaplan–Meier curves which depict estimates of event probabilities over time [5]. The log-rank test is commonly used for comparing the Kaplan–Meier curves between two or more groups of patients [5]. The second type of data is competing risks data. Competing risks data arise when subjects can potentially fail from multiple causes and experiencing failure from one cause precludes the subject from experiencing any other type of event. Examples of competing risks include relapse and death in remission. Relapse is not observed for those who died in remission before they could experience a relapse. In this case, death in remission prior to relapse is a competing risk for relapse. Here, the proper summary statistic is the cumulative incidence function [5]. Cumulative incidence curves should be produced for all competing risks as that will allow investigators to form a cohesive picture of the treatment outcomes. Gray’s test is used for comparing cumulative incidence curves between two or more groups of patients [6].
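For the first type of data (a single failure type), the Kaplan–Meier estimate and the two-group log-rank test can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical, events are coded 1 = event and 0 = censored, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; events: 1 = event, 0 = censored.

    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # handle tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data (months of follow-up).
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1]))
print(logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1]))
```

The sketch covers only the two-group comparison. Gray’s test for cumulative incidence curves follows a different construction and is not shown here.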

Regression models play an important role in time-to-event data analysis when the goal is to compare two treatments while adjusting for other risk factors or to identify predictors associated with the outcome being studied. For survival outcomes such as OS and DFS, multivariable regression analyses are usually performed using the Cox proportional hazards model [12]. For outcomes with competing risks, both the Cox proportional hazards model and the Fine-Gray subdistribution hazard model can be used in practice [[7], [12]]. When one is interested in investigating a direct causal or biological mechanism of a risk factor on the cause-specific hazard of the event, the Cox model is more suitable. If the research interest is in predicting the cumulative incidence probabilities of an event over time, then the Fine-Gray model is more appropriate [[13], [14]].
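The distinction between the two models can be made concrete by writing out the two hazards they target. The notation below is an assumption introduced for illustration: $T$ is the time to the first event, $D$ its type, $k$ the event type of interest, $Z$ a covariate vector, and $F_k$ the cumulative incidence function.

```latex
% Cause-specific hazard (modeled by Cox regression applied to cause k):
\lambda_k^{\mathrm{cs}}(t)
  = \lim_{\Delta t \to 0}
    \frac{P\left(t \le T < t + \Delta t,\; D = k \mid T \ge t\right)}{\Delta t},
\qquad
\lambda_k^{\mathrm{cs}}(t \mid Z) = \lambda_{k0}^{\mathrm{cs}}(t)\, e^{\beta_k^{\top} Z}

% Subdistribution hazard (modeled by Fine-Gray regression):
\lambda_k^{\mathrm{sd}}(t)
  = \lim_{\Delta t \to 0}
    \frac{P\left(t \le T < t + \Delta t,\; D = k \mid T \ge t \ \text{or}\ (T < t,\, D \ne k)\right)}{\Delta t},
\qquad
\lambda_k^{\mathrm{sd}}(t \mid Z) = \lambda_{k0}^{\mathrm{sd}}(t)\, e^{\gamma_k^{\top} Z}

% Link between the subdistribution hazard and the cumulative incidence:
F_k(t) = P(T \le t,\; D = k)
       = 1 - \exp\left(-\int_0^t \lambda_k^{\mathrm{sd}}(u)\, du\right)
```

Because $F_k$ is a one-to-one function of the subdistribution hazard alone, the Fine-Gray coefficients $\gamma_k$ translate directly into effects on cumulative incidence. The cause-specific coefficients $\beta_k$ describe the event rate among patients still at risk, and the cumulative incidence of event $k$ then depends on the cause-specific hazards of all event types.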

When making conclusions about time-to-event outcomes at any given time point, it is essential that any such conclusion or analysis only uses the information known by that time point, and not any information which becomes available in the future [[5], [15]]. Outcomes such as overall response rate (ORR), best response, and duration of response (DOR) have to be analyzed by adhering to this fundamental principle. One way to summarize ORR would be to present the proportion of patients who have partial or full response to therapy at specific time points, say at 6 months, 1 year and 2 years after start of treatment. These estimates, along with the statistics about patients who died or were lost to follow-up, will provide investigators with a complete picture of treatment effectiveness. In describing the ORR, a cumulative incidence function can be used. However, that requires a clear definition of the outcome whose incidence is being evaluated. For example, defining the outcome as time to achieving at least partial response (e.g., partial or complete remission) while treating death as a competing risk will produce a valid estimate of the cumulative incidence of having a response, partial or complete, to therapy by a particular time point. If refinements are needed, an additional cumulative incidence calculation for complete response (e.g., complete remission) can be produced. Note that the duration of response can be described only among patients who have achieved the response in question. In this case, duration of response becomes a standard time-to-event endpoint which should be analyzed under the competing risks framework. For example, among patients who have achieved complete remission, the duration of response can be summarized via the CIF of relapse, with the time origin being the moment when complete response to therapy was achieved and death treated as a competing risk.
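The principle of using only information known by a given time point can be sketched as a simple tabulation of patient status at a landmark time. The function names, status labels and patient records below are hypothetical, and "responded" is taken here to mean that at least a partial response was achieved by the landmark time.

```python
from collections import Counter


def status_at(landmark, last_contact, response_time=None, death_time=None):
    """Classify one patient using only information available by `landmark`.

    All times are measured from the start of treatment, in the same unit.
    """
    if response_time is not None and response_time <= landmark:
        return "responded"
    if death_time is not None and death_time <= landmark:
        return "died without response"
    if last_contact < landmark:
        return "lost to follow-up"
    return "alive without response"


def summarize(patients, landmark):
    """Count patients in each status category at the landmark time."""
    return Counter(status_at(landmark, **p) for p in patients)


# Hypothetical patients; times in months from start of treatment.
patients = [
    {"last_contact": 24, "response_time": 2},
    {"last_contact": 4, "death_time": 4},
    {"last_contact": 12, "response_time": 8},
    {"last_contact": 5},
    {"last_contact": 24},
]
for month in (6, 12):
    print(month, dict(summarize(patients, month)))
```

A response recorded at month 8 does not count toward the 6-month summary, which is the point of the principle. The tabulation is descriptive only; with substantial loss to follow-up, the cumulative incidence approach described above is the appropriate estimate.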

Outcomes Specific to Hematopoietic Stem Cell Transplantation

Hematopoietic cell transplantation (HCT) is used to treat malignant and nonmalignant diseases affecting blood-forming tissue such as the bone marrow or the cells of the immune system. During an HCT, chemotherapy with or without radiation is used to suppress the patient’s diseased bone marrow responsible for the production of blood cells. Then healthy hematopoietic stem cells from the patient’s own body (autologous transplant) or from a related or unrelated donor (allogeneic transplant) are infused into the blood stream where they travel to the bones and rebuild bone marrow. Typical outcomes considered in HCT studies are listed in Table 2. Events such as neutrophil and platelet recovery reflect early outcomes of the transplantation. Other outcomes including relapse, disease-free survival, and overall survival are similar to those discussed in the previous section and reflect the efficacy of the treatment in regard to disease cure and/or survival. Acute and chronic graft-versus-host disease (aGVHD and cGVHD, respectively) are complications specific to HCT which occur when the donor cells (the graft) recognize the patient’s cells (the host) as foreign and mount an immune mediated response. Both aGVHD and cGVHD can have a debilitating effect on a patient’s health and are important outcomes to consider in HCT research.

Table 2.

Endpoints reported in hematopoietic stem cell transplantation studies

Outcome | Definition | Comments
Neutrophil recovery | Time to neutrophil recovery | Patients are censored at last contact; death or second transplant without neutrophil recovery are competing risks
Platelet recovery | Time to platelet recovery | Patients are censored at last contact; death or second transplant without platelet recovery are competing risks
Primary graft failure | Failure to achieve neutrophil recovery or low donor chimerism (<5%) by Day 28 after infusion of graft | Number of patients who fail to achieve neutrophil recovery or donor chimerism ≥5% within 28 days post HCT
Secondary graft failure | Initial donor engraftment followed by graft loss (sustained decline in neutrophil counts or loss of donor chimerism to <5%) | Secondary graft failure applies only to patients who achieve neutrophil recovery or donor chimerism ≥5%
Grade II-IV acute graft-versus-host disease (aGVHD II-IV) | Time to onset of grade II-IV acute GVHD | Patients are censored at last contact; death or second transplant without grade II-IV aGVHD are competing risks
Grade III-IV acute graft-versus-host disease (aGVHD III-IV) | Time to onset of grade III-IV acute GVHD | Patients are censored at last contact; death or second transplant without grade III-IV aGVHD are competing risks
Chronic graft-versus-host disease (cGVHD) | Time to onset of chronic GVHD | Patients are censored at last contact; death or second transplant without cGVHD are competing risks
Relapse/progression | Time to relapse | Patients surviving without recurrence or progressive disease are censored at last contact. NRM is a competing risk
Non-relapse mortality (NRM) | Time to death while being in remission | Patients are censored at last contact. Relapse is a competing risk
Disease-free survival (DFS) | Time to relapse or death. DFS applies only to patients who achieve remission | Patients are censored at last contact
Overall survival (OS) | Time to death | Patients are censored at last contact

* For patients who receive a transplant for malignant disease and are not in complete remission or do not achieve complete remission by 28 days post-transplant, an accepted approach is to set the time to relapse at one day (Day +1) post-transplant. The interval for a patient who does not relapse or have persistent disease post-transplant is equal to the time from date of transplant to date of death or last contact.

Statistical considerations.

All time-to-event outcomes encountered in HCT studies are handled using survival analysis methods. Careful consideration must be given to the time origin (usually the time of HCT), patient censoring and competing risks [16]. Note that when a patient undergoes second transplantation, the observation window for the outcomes related to the first HCT closes because subsequent events cannot be attributed to the first HCT. Therefore, second transplant constitutes a competing risk for events such as neutrophil and platelet recovery and acute and chronic GVHD. Additional post-HCT procedures such as donor lymphocyte infusion or boost may be considered as competing risks as well. Outcomes such as DFS and OS are summarized using Kaplan-Meier curves. The log-rank test is used for comparing the DFS and OS probabilities between two or more groups of patients. The Cox proportional hazards model is used to evaluate the relationship between predictors and the outcome of interest. Outcomes where the main event is subject to competing risks include neutrophil and platelet recovery, primary and secondary graft failure, acute and chronic GVHD, NRM and relapse. These outcomes are depicted using cumulative incidence curves [5]. Gray’s test is used for comparing cumulative incidence probabilities between two or more groups of patients [6]. Either a Cox regression model or a Fine-Gray model can be used to evaluate the relationship between predictors and the outcome of interest [[7], [12], [14]].
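The censoring and competing-risk rules for an outcome such as neutrophil recovery can be sketched as a small data-preparation step that turns a patient record into a (time, cause) pair suitable for a cumulative incidence analysis. The function name, the cause codes and the tie-breaking rule (the event of interest takes precedence when it falls on the same day as a competing event) are assumptions made for this illustration.

```python
CENSORED, EVENT, COMPETING = 0, 1, 2


def code_outcome(last_contact, event_day=None, second_hct_day=None, death_day=None):
    """Return (time, cause) for one patient, in days from the first HCT.

    event_day      : day of the event of interest (e.g., neutrophil recovery)
    second_hct_day : day of a second transplant, which closes the observation window
    death_day      : day of death
    Patients with none of these are censored at last contact.
    """
    candidates = []
    if event_day is not None:
        candidates.append((event_day, EVENT))
    for day in (second_hct_day, death_day):
        if day is not None:
            candidates.append((day, COMPETING))
    if not candidates:
        return last_contact, CENSORED
    # Earliest occurrence wins; on ties the lower code (EVENT) is chosen.
    return min(candidates)


# Hypothetical patients.
print(code_outcome(100, event_day=14))                    # recovered on day 14
print(code_outcome(60, second_hct_day=40, death_day=60))  # second transplant first
print(code_outcome(365))                                  # censored at last contact
```

Only the first occurrence is kept, so a recovery recorded after a second transplant is not attributed to the first HCT.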

Composite Endpoints

Composite endpoints, in which multiple outcomes are combined, are frequently analyzed in observational and prospective studies [17]. It is common to consider a composite endpoint when the focus is on rare events, which often require lengthy and expensive trials in order to observe a sufficient number of events for adequate statistical power. In these instances, it may be advantageous to define a composite endpoint which aggregates the total benefit of the therapy aimed at preventing several important albeit infrequent clinical events [18]. There have been numerous studies that combine multiple outcomes (such as death and major morbidity events) into a single measurement of treatment effectiveness [1].

The most common composite outcome in hematology studies is event-free survival. Notable examples of event-free survival include disease-free survival (DFS), relapse-free survival (RFS), and progression-free survival (PFS) [19]. Note that, per the National Cancer Institute’s dictionary of cancer terms, DFS and RFS are synonymous [19]. Although the definition of PFS is similar, its concept is most applicable to patients who are not in complete remission, and it is typically applied to studies of lymphoma. However, the definition of a composite outcome depends on the study and is not uniform across studies. It could also vary depending on whether the composite represents a primary or secondary endpoint. Primary composite endpoints relate to the most important questions in the trial or study. For example, if the main objective is keeping the patient alive and in remission, then DFS may be an appropriate composite endpoint to consider. Secondary composite endpoints address secondary objectives in the trial. For example, a drug designed to induce disease remission might also have measures reflecting improvements in patients’ quality of life. In this case, a composite endpoint could combine death and heavy symptom burden.

HCT is an area where composite endpoints play a prominent role in measuring the effectiveness of allogeneic transplantation, especially when complications such as acute and chronic graft-versus-host disease (GVHD) are of interest. Table 3 lists several composite endpoints developed for transplantation trials [[20], [21], [22], [23],[24]].

Table 3.

Composite outcomes in hematopoietic stem cell transplantation studies (adapted from [19]).

Outcome | Definition | Comments
GVHD-free survival | Time to first occurrence of grade III-IV acute GVHD, chronic GVHD requiring systemic treatment or death, whichever occurs first | Patients are censored at last contact
GVHD-free, relapse-free survival (GRFS) | Time to first occurrence of grade III-IV acute GVHD, chronic GVHD requiring systemic treatment, relapse, or death, whichever occurs first | Patients are censored at last contact
Refined GVHD-free, relapse-free survival (refined GRFS) | Time to first occurrence of grade III-IV acute GVHD, severe GVHD, relapse, or death, whichever occurs first | Patients are censored at last contact
Chronic GVHD-free, relapse-free survival (CRFS) | Time to first occurrence of chronic GVHD requiring systemic therapy, relapse or death, whichever occurs first | Patients are censored at last contact
Current GVHD-free, relapse-free survival (CGRFS) | The probability of being alive, disease-free without moderate-severe chronic GVHD at a particular time point after HCT | CGRFS includes resolution of chronic GVHD and may decrease and later increase as a function of time. Analyzed by a multistate model.
Dynamic GVHD-free, relapse-free survival (DGRFS) | The probability of being alive, disease-free without grade III-IV aGVHD and without chronic GVHD requiring systemic therapy at a particular time point after HCT | DGRFS includes resolution of relapse and acute/chronic GVHD and may decrease and later increase as a function of time. Analyzed by a multistate model.
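The "whichever occurs first" definitions in Table 3 can be sketched for GRFS as follows. The function and argument names are hypothetical; times are days from HCT, and a component that was never observed is passed as None.

```python
def grfs(last_contact, agvhd_3_4_day=None, cgvhd_systemic_day=None,
         relapse_day=None, death_day=None):
    """Return (time, event) for GVHD-free, relapse-free survival.

    event = 1 if any component (grade III-IV acute GVHD, chronic GVHD
    requiring systemic treatment, relapse, death) occurred, at the earliest
    of those times; event = 0 if the patient is censored at last contact.
    """
    observed = [d for d in (agvhd_3_4_day, cgvhd_systemic_day,
                            relapse_day, death_day) if d is not None]
    if observed:
        return min(observed), 1
    return last_contact, 0


# Hypothetical patients.
print(grfs(400, cgvhd_systemic_day=180, death_day=400))  # chronic GVHD comes first
print(grfs(730))                                         # event-free, censored
```

Because every component counts as an event, the resulting (time, event) pairs have a single failure type and can be summarized with Kaplan–Meier curves. The current and dynamic variants (CGRFS, DGRFS) allow recovery from a state and need a multistate model, which this sketch does not cover.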

Statistical considerations.

Use of a composite endpoint as a primary outcome measure in randomized trials may increase statistical efficiency and precision and lead to smaller and shorter trials [[17], [1]]. Using a single composite endpoint can also help to avoid the multiple testing issue inherent when multiple endpoints are assessed separately [[18], [25]]. Combining an outcome with its competing risks may help avoid the competing risk problem and simplify the analysis or interpretation. However, composite endpoints should only be used when each endpoint in the composite is meaningful, both to the trial purpose and to the patient. It is also important to quantify the components of a composite endpoint for a better understanding of treatment response. For example, an HCT study may see a reduction in the incidence of relapse only because of an increase in NRM. Individual components of a composite endpoint should be reported (e.g., as secondary endpoints) together with the overall result. In some cases, there may be insufficient power to determine the treatment effect for each component. A multiple testing problem may arise when multiple individual components of a composite endpoint are targeted as primary outcomes. In this case, all the components must be prespecified in the protocol as separate endpoints with an appropriate analysis plan. Although in most cases the components of a composite endpoint are given equal weight, modified composite endpoints consisting of unequally weighted or ranked components may be considered [[26], [27], [28]].

A composite endpoint should be developed at the time the clinical trial protocol is written. Rules for developing composite endpoints depend on substantial experience in the clinical trial setting and empirical evidence of the following: the components are of similar importance to patients, the components occur with similar frequency, and the components are likely to have similar treatment effects [1]. Because the overall treatment effect on a composite endpoint is difficult to interpret, the use of a composite endpoint in confirmatory clinical trials is discouraged when large variations are predicted to exist between its components [29].

Surrogate Endpoints

Clinical trials frequently rely on the utilization of surrogate endpoints serving as substitutes for clinical endpoints [30]. Surrogate endpoints are clinical outcomes or markers (laboratory measurements, physical signs or other measures) that are known or are likely to predict clinical benefit of the treatment [31]. Surrogate endpoints are usually selected to allow for a more rapid assessment of the treatment success in clinical trials [32]. For example, in many oncology studies, the overall survival is the most important outcome reflecting effectiveness of the treatment [8]. However, it may be difficult to assess the relationship between treatment and OS because of long study durations and the impact of subsequent therapies on the outcome [33]. Surrogate endpoints such as response to treatment, progression free survival or disease-free survival are often used as proxies for OS to speed up prospective clinical studies. A clear advantage of using these outcomes instead of OS is that these events often occur earlier and are unaffected by use of later therapies [8].

Another common set of surrogate endpoints are biomarkers. Biomarkers are laboratory or physical measurements or images that can be evaluated and used to indicate a biological process or drug response [34]. Using them may improve the efficiency of a study because biomarkers may respond more quickly to medical interventions and are easier to assess and quantify than true clinical endpoints. This allows investigators to see useful intermediate results from studies that are of shorter duration and require smaller numbers of study participants. The use of biomarkers is particularly promising in translational research, which provides a bridge from basic research to clinical practice. Assessing changes in biomarkers allows for relatively rapid feedback as to whether an intervention is worth testing in a larger trial with clinical endpoints. Thus, using biomarkers in phase II trials can inform the prioritization of phase III trials [35].

The U.S. Food and Drug Administration (FDA) has published a surrogate endpoint table which provides valuable information on endpoints that may be considered and discussed with the FDA for individual drug development programs [36]. Surrogate endpoints used in hematology studies include but are not limited to hematologic response rate, complete remission rate, event-free survival, progression-free survival, and minimal residual disease rate. Robust surrogate outcomes have the potential to reduce the financial and time burden associated with clinical trials and drug development. However, this benefit is only achieved if the surrogate is thoroughly validated as a true surrogate for meaningful clinical outcomes [37]. Unvalidated surrogates may also be used in conditions that are rare, fatal, and with few treatment options [37]. In both cases, studies using surrogates should be accompanied by trials measuring the endpoints for which the surrogate stands in. Recent studies have challenged the assumption that frequently used surrogates can accurately predict the effect of treatments on the most important endpoints such as overall survival and quality of life [38]. Of 55 regulatory approvals made by the FDA based on the improvement in surrogates between 2009 and 2014, 65% had no trial-level validation studies [39]. Where the surrogate endpoint had been studied, it correlated highly with survival in only 16% of cases [39]. Even using PFS as a clinical endpoint is questionable because prolonged PFS does not always result in extended survival [40]. Validation of PFS as a surrogate endpoint needs to be done for each indication and each intervention [40].

Primary and Secondary Endpoints

In order to get a comprehensive picture of the treatment effect, it is important to use multiple endpoints to describe changes in patients’ health resulting from the intervention [1]. Using multiple endpoints can provide valuable information about treatment success, side effects and quality of life and their usage is increasingly prevalent in hematology trials [1]. Clinical endpoints can be categorized depending on the trial goals and objectives. In general, clinical endpoints can be classified as primary, secondary, or exploratory depending on their relationship with the main research question [8]. While primary endpoints are efficacy measures that address the question directly, secondary and exploratory endpoints may be utilized to demonstrate additional effects, investigate disease mechanism, or explore less frequently occurring outcomes [1].

In hematology trials, overall survival (OS) is frequently regarded as the gold standard primary clinical endpoint for determining treatment efficacy [33]. It provides an objective measurement of the treatment effect, and its abstraction will rarely be subject to bias [41]. However, the lengthy studies required to assess OS are a major drawback of having it as the primary outcome of the trial [42]. In addition, using OS can be difficult for new therapies in hematologic malignancies. Studies of novel treatments for hematologic malignancies (e.g., CAR T) often focus on response rate as opposed to OS. Therefore, leading clinicians have worked with the FDA on establishing non-OS endpoints like progression-free survival (PFS) and response rate (RR), which may be used as appropriate endpoints for the regulatory approval of treatments for hematologic malignancies [[43],[44]]. Currently, FDA approvals for new drugs or novel hematologic malignancy indications are often based on endpoints other than OS such as PFS or response rate. A 2017 study reviewed 63 FDA approvals granted between January 2002 and July 2015, which involved 35 drugs and 16 hematologic malignancies [33]. Of the 63 approvals, 71% were based on response rate, 27% included progression-free survival or time to progression, and only 1 approval included OS. Expert recommendations for choosing disease-specific or therapy-specific endpoints provide useful guidelines for determining primary and secondary outcomes [[11], [45], [46]].

Statistical considerations.

The choice of the primary endpoint is important because it will be used as a basis for trial planning and sample size estimation. Sometimes, several primary endpoints may be of interest. In that case, a significant treatment effect on any one of the endpoints may be taken as evidence of efficacy [1]. This approach may be useful in diseases that have multiple sequelae, where improvement in any pre-specified endpoint is clinically meaningful even in the absence of improvement in any other [25]. Because the risk of type I error increases with every additional endpoint assessed, appropriate statistical adjustment for multiple testing is generally required in order to control the overall risk of a false positive trial result. Regulatory agencies pay particular attention to this issue and have provided guidance on managing this risk [[25], [47]]. Multiple primary endpoints become ‘co-primary’ if an effect on every one of them is required to prove efficacy. In this case, there is no need to adjust for multiple testing. However, the power of a study is typically reduced by the requirement to demonstrate significant efficacy against more than one endpoint, unless those endpoints are highly correlated. Another option is to use composite endpoints, as discussed above [1].
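
The contrast between the two decision rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Holm step-down procedure is one of several possible multiplicity adjustments and is not prescribed by the guidance cited here; the function names, the p-values, and the per-endpoint power of 0.9 are hypothetical; and the familywise error and joint power formulas assume independent endpoints, which is the worst case for co-primary power.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def any_endpoint_success(p_values, alpha=0.05):
    """Multiple primary endpoints: success if any adjusted p-value is below alpha."""
    return any(p < alpha for p in holm_adjust(p_values))


def co_primary_success(p_values, alpha=0.05):
    """Co-primary endpoints: every endpoint must be significant at the unadjusted alpha."""
    return all(p < alpha for p in p_values)


def familywise_error_unadjusted(n_endpoints, alpha=0.05):
    """Chance of at least one false positive when independent endpoints are each tested at alpha."""
    return 1.0 - (1.0 - alpha) ** n_endpoints


def co_primary_power_independent(powers):
    """Joint power when all endpoints must succeed, assuming independent test statistics."""
    joint = 1.0
    for power in powers:
        joint *= power
    return joint


# Hypothetical p-values for three endpoints of one trial.
p_values = [0.012, 0.030, 0.200]
print(holm_adjust(p_values))             # adjusted p-values in the original order
print(any_endpoint_success(p_values))    # "any endpoint" rule with adjustment
print(co_primary_success(p_values))      # "all endpoints" rule without adjustment
print(round(familywise_error_unadjusted(3), 4))
print(round(co_primary_power_independent([0.9, 0.9]), 2))
```

With these made-up numbers, the trial succeeds under the adjusted "any endpoint" rule but fails as a co-primary design, and two independent endpoints each powered at 0.9 give a joint power of about 0.81. Positive correlation between endpoints would bring the joint power closer to the single-endpoint value.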

The statistical analysis plan should describe the planned primary and secondary analyses in detail, noting whether each endpoint will be analyzed as a continuous variable (mean scores), dichotomous variable (success or failure), or a graded response; which endpoints are primary and which are secondary; the adjustments for multiple testing used to control the overall type I error rate across primary and secondary endpoints; and the specific statistical methods planned [25].

Patient Reported Outcomes

The US Food and Drug Administration defines a Patient-Reported Outcome (PRO) as a measurement of any aspect of a patient’s health status that comes directly from the patient, without the interpretation of the patient’s responses by a clinician or anyone else [48]. As such, PROs provide a patient’s perspective of treatment effectiveness. The goal of including PROs as a clinical endpoint is to complement the results of quantitative endpoints such as OS [8]. Traditional survival, disease progression, and physiological outcomes reflect physical benefits of the treatment, while a patient’s perspective is helpful for a comprehensive assessment of the benefits of the treatment under investigation. Incorporating PROs into hematologic research is embraced by researchers as well as by funding and regulatory agencies. In the US, the medical community and drug developers should follow the guidance provided by the FDA, which reflects the agency’s view on PRO instruments as effective endpoints in clinical trials [48]. Several high-quality guidelines on choosing PRO endpoints have been published over the last two decades [49]. The 2013 CONSORT-PRO extension provides guidance for reporting results of randomized clinical trials in which PROs are primary or secondary endpoints [50]. In addition, the Patient-Centered Outcomes Research Institute (PCORI) has issued specific methodological standards that could help investigators set up patient-centered outcomes in research studies [51]. The International Society for Quality of Life Research has also published recommended standards for PRO measures, which can be used in patient-centered outcomes research as well as comparative effectiveness research to guide the development of high-quality studies in this area [[52], [53]].

The European Medicines Agency (EMA) also highly values PROs and has recently issued an important guidance document that covers general aspects of the use of PRO endpoints in oncology studies [54]. The European Hematology Association Scientific Working Group (EHA SWG) “Quality of Life and Symptoms” has published comprehensive guidelines for patient-reported outcome assessment in hematology [55]. This document discusses methodological issues in assessing PROs, reviews available PRO instruments, and gives practical recommendations for measuring PROs. PRO components tailored for patients with different hematological disorders or specific treatments (e.g., hematopoietic stem cell transplantation; anticoagulant therapy) are also examined. Special attention is paid to the pediatric PRO measures being used in hematology trials.

Robust methodology is a key issue in patient-centered outcomes research in the field of hematology. In line with other aspects of clinical trial design, PRO study objectives need to be justified and viewed in light of realistic expectations. Careful thought must go into designing and implementing PRO measures in oncology clinical trials that aim to investigate a predefined hypothesis. A variety of PRO instruments can be used to assess a range of concepts including single item symptoms (e.g., fatigue or anxiety), symptom scales (e.g., disease symptom scale consisting of multiple symptoms), functional scales (e.g., physical function), role function (e.g., ability to work and carry out daily activities), and multi-dimensional complex concepts (e.g., health related quality of life).

PROs can be used to inform primary, co-primary, or secondary endpoints in the trial [48]. Implementing a PRO as a co-primary outcome can provide a useful measurement of the impact of an intervention in a way that is relevant to clinicians and patients [56]. When planning a clinical trial, the key steps for incorporating PROs start with the endpoint definition, followed by the selection of an appropriate PRO instrument. In both clinical trials and longitudinal observational studies, the frequency and timing of the PRO assessment should be aligned with the study objective. A baseline assessment should be included as a reference point whenever possible, as it will allow researchers to evaluate changes in patients’ quality of life or symptoms as their treatment progresses. Assessment frequency should take into account the administration schedule of the treatment being studied. Different PRO instruments may be administered at different times if such a schedule is aligned with the study timeline and objectives.

Statistical considerations.

The statistical analysis of PRO endpoints is similar to that of any other endpoint used in medical product development or new treatment evaluation. Patient reported outcome measures usually result in binary responses (e.g., experiencing moderate or severe fatigue), responses on a Likert-type numeric rating scale or verbal rating scale (e.g., none, very mild, moderate, severe, very severe), or continuous measurements. The continuous scores are often obtained from questionnaires in which responses to multiple questions are aggregated into a subscale or summary score. Widely used methods for multivariable analysis include linear regression for continuous responses and logistic regression for binary and ordinal outcomes. Given that most PRO assessments are done multiple times over the course of the study, repeated measures analysis is often needed [57]. Generalized linear mixed models may be employed to properly handle the complexities of longitudinal PRO data.
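
As a rough illustration of how item responses become an analyzable score, the sketch below averages a patient’s answered items, rescales the result to 0-100, and computes the change from baseline at each follow-up visit. Everything specific here is an assumption made for the example and is not taken from any particular instrument: the 1 to 4 item range, the rule that at least half of the items must be answered, the visit labels, and the function names. A real analysis should follow the scoring manual of the chosen instrument and would typically feed such scores into a repeated measures model.

```python
from statistics import mean


def subscale_score(item_responses, item_min=1, item_max=4):
    """Average the answered items and rescale linearly to 0-100.

    Returns None when fewer than half of the items were answered
    (an illustrative missing-data rule, not one from a specific instrument).
    """
    answered = [r for r in item_responses if r is not None]
    if not answered or len(answered) * 2 < len(item_responses):
        return None
    return (mean(answered) - item_min) / (item_max - item_min) * 100.0


def change_from_baseline(visit_scores):
    """Map each scored follow-up visit to its score minus the baseline score."""
    baseline = visit_scores.get("baseline")
    if baseline is None:
        return {}  # no reference point, so no change can be computed
    return {
        visit: score - baseline
        for visit, score in visit_scores.items()
        if visit != "baseline" and score is not None
    }


# Hypothetical fatigue subscale: three items rated 1 (not at all) to 4 (very much).
patient_items = {
    "baseline": [2, 3, 2],
    "month_3": [3, 4, None],  # one item left unanswered
    "month_6": [1, 2, 2],
}
patient_scores = {visit: subscale_score(items) for visit, items in patient_items.items()}
print(patient_scores)
print(change_from_baseline(patient_scores))
```

For this made-up patient the rescaled fatigue score rises at month 3 and falls below baseline at month 6. The example also shows why the baseline assessment matters: without it, no change score can be computed.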

Conclusions

Both prospective and retrospective studies are critical to the development of medical interventions as they assess the safety and efficacy of each new treatment. Carefully chosen endpoints underpin the credibility and validity of a study and generate evidence used to guide clinical decision making. Many trials are designed to examine the effect of a treatment on multiple aspects of the disease and patients’ wellbeing. Therefore, trials involving multiple endpoints are common. Selection of primary, secondary, and exploratory endpoints must take into account study objectives and feasibility of the outcome assessment. In some hematology-oncology studies, using surrogate or composite endpoints may be justified in order to achieve sufficient statistical power in a trial with shorter duration and fewer patients. Finally, patient reported outcomes contribute to the holistic interpretation and comprehensive assessment of the benefits of the treatment under investigation. Qualitative endpoints have become important for assessing clinical benefit and are embraced by research, funding, and regulatory agencies. Understanding the range of endpoints available for use and aligning them with the study objectives will lead to high quality clinical trial design. Appropriate statistical analysis is also crucial for drawing correct inferences and conclusions.

Practice Points

  • Outcomes being studied must address the trial objectives and should be meaningful to clinicians and patients

  • Due to the need to evaluate treatment effect on multiple aspects of the disease and patient wellbeing, trials involving multiple endpoints are common

  • Clinical outcomes in combination with patient reported outcomes will provide comprehensive assessment of the benefits of the treatment under investigation

  • Statistical aspects of the primary and secondary endpoint selection and analysis should be considered when planning a trial

Research Agenda

  • New endpoints emerge as new therapies for hematologic diseases are being developed

  • Definitions and statistical analysis methods for new endpoints need to be established

Footnotes

Conflict of Interest

The authors have no conflicts of interests to disclose.

Contributor Information

Ruta Brazauskas, Division of Biostatistics, Medical College of Wisconsin, Address: 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.

Mary Eapen, Division of Hematology/Oncology, Department of Medicine, Medical College of Wisconsin, Address: 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.

Tao Wang, Division of Biostatistics, Medical College of Wisconsin, Address: 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.

References

  • [1]. McLeod C, Norman R, Litton E, Saville BR, Webb S, Snelling TL. Choosing primary endpoints for clinical trials of health care interventions. Contemp Clin Trials Commun 2019; 16: 100486.
  • [2]. Cook N, Hansen AR, Siu LL, Abdul Razak AR. Early phase clinical trials to identify optimal dosing and safety. Mol Oncol 2015; 9: 997–1007.
  • [3]. National Cancer Institute’s (NCI’s) Common Terminology Criteria for Adverse Events (CTCAE): https://ctep.cancer.gov/protocoldevelopment/electronic_applications/ctc.htm Accessed December 1, 2022.
  • [4]. Lee DW, Santomasso BD, Locke FL, Ghobadi A, Turtle CJ, Brudno JN et al. ASTCT Consensus Grading for Cytokine Release Syndrome and Neurologic Toxicity Associated with Immune Effector Cells. Biol Blood Marrow Transplant 2019; 25: 625–38.
  • [5]. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data (2nd Edition). Springer-Verlag, New York, 2003.
  • [6]. Gray RJ. A Class of K-Sample Tests for Comparing the Cumulative Incidence of a Competing Risk. Ann Statist 1988; 16: 1141–54.
  • [7]. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 1999; 94: 496–509.
  • [8]. Delgado A, Guddati AK. Clinical endpoints in oncology - a primer. Am J Cancer Res 2021; 11: 1121–31.
  • [9]. Villaruz LC, Socinski MA. The clinical viewpoint: definitions, limitations of RECIST, practical considerations of measurement. Clin Cancer Res 2013; 19: 2629–36.
  • [10]. Garnett SA, Martin M, Jerusalem G, Petruzelka L, Torres R, Bondarenko IN et al. Comparing duration of response and duration of clinical benefit between fulvestrant treatment groups in the CONFIRM trial: application of new methodology. Breast Cancer Res Treat 2013; 138: 149–55.
  • [11]. Li Y, Hwang W-T, Maude SL, Teachey DT, Frey NV, Myers RM et al. Statistical Considerations for Analyses of Time-To-Event Endpoints in Oncology Clinical Trials: Illustrations with CAR-T Immunotherapy Studies. Clin Cancer Res 2022; 28: 3940–49.
  • [12]. Cox D. Regression models and life tables (with discussion). J Royal Stat Soc - Series B 1972; 34: 187–220.
  • [13]. Austin PC, Fine JP. Practical recommendations for reporting Fine-Gray model analyses for competing risk data. Stat Med 2017; 36: 4391–400.
  • [14]. Kim S, Fang X, and Ahn KW. The analysis of multiple types of outcomes for hematopoietic cell transplantation. Best Practice & Research Clinical Haematology 2023; In press.
  • [15]. Othus M, Zhang MJ, Gale RP. Clinical trials: design, endpoints and interpretation of outcomes. Bone Marrow Transplant 2022; 57: 338–42.
  • [16]. Iacobelli S; EBMT Statistical Committee. Suggestions on the use of statistical methodologies in studies of the European Group for Blood and Marrow Transplantation. Bone Marrow Transplant 2013; 48 Suppl 1: S1–37.
  • [17]. Freemantle N, Calvert M, Wood J, Eastaugh J, Griffin C. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA 2003; 289: 2554–9.
  • [18]. Sankoh AJ, Li H, D’Agostino RB. Use of composite endpoints in clinical trials. Stat Med 2014; 27: 4709–14.
  • [19]. Kim HT, Logan B, Weisdorf DJ. Novel composite endpoints after allogeneic hematopoietic cell transplantation. Transplant Cell Ther 2021; 27: 650–7.
  • [20]. Mehta RS, Holtan SG, Wang T, Hemmer MT, Spellman SR, Arora M et al. Composite GRFS and CRFS outcomes after adult alternative donor HCT. J Clin Oncol 2020; 38: 2062–76.
  • [21]. Ruggeri A, Labopin M, Ciceri F, Mohty M, Nagler A. Definition of GvHD-free, relapse-free survival for registry-based studies: an ALWP-EBMT analysis on patients with AML in remission. Bone Marrow Transplant 2016; 51: 610–1.
  • [22]. Holtan SG, DeFor TE, Lazaryan A, Bejanyan N, Arora M, Brunstein CG et al. Composite end point of graft-versus-host disease-free, relapse-free survival after allogeneic hematopoietic cell transplantation. Blood 2015; 125: 1333–8.
  • [23]. Solomon SR, Sizemore C, Zhang X, Ridgeway M, Solh M, Morris LE et al. Current graft-versus-host disease-free, relapse-free survival: A dynamic endpoint to better define efficacy after allogenic transplant. Biol Blood Marrow Transplant 2017; 23: 1208–14.
  • [24]. Holtan SG, Zhang L, DeFor TE, Bejanyan N, Arora M, Rashidi A et al. Dynamic Graft-versus-host disease-free, relapse-free survival: Multistate modeling of the morbidity and mortality of allotransplantation. Biol Blood Marrow Transplant 2019; 25: 1884–9.
  • [25]. US Department of Health and Human Services. Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER): Multiple Endpoints for Clinical Trials: Guidance for Industry. https://www.fda.gov/media/162416/download Accessed December 1, 2022.
  • [26]. Bakal JA, Westerhout CM, Armstrong PW. Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Stat Methods Med Res 2015; 24: 980–8.
  • [27]. Finkelstein DM, Schoenfeld DA. Combining mortality and longitudinal measures in clinical trials. Stat Med 1999; 18: 1341–54.
  • [28]. Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J 2011; 33: 176–82.
  • [29]. U.S. Food and Drug Administration (2009) Guidance for industry – patient reported outcome measures: Use in medical product development to support labeling claims. https://www.fda.gov/media/77832/download Accessed December 1, 2022.
  • [30]. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989; 8: 431–40.
  • [31]. U.S. Food and Drug Administration (2022). Accelerated Approval Program. https://www.fda.gov/drugs/nda-and-bla-approvals/accelerated-approval-program Accessed December 1, 2022.
  • [32]. Feigin A. Evidence from biomarkers and surrogate endpoints. NeuroRx 2004; 1: 323–30.
  • [33]. Smith BD, DeZern AE, Bastian AW, Durie BGM. Meaningful endpoints for therapies approved for hematologic malignancies. Cancer 2017; 123: 1689–94.
  • [34]. Dunn BK, Akpa E. Biomarkers as surrogate endpoints in cancer trials. Semin Oncol Nurs 2012; 28: 99–108.
  • [35]. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996; 125: 605–13.
  • [36]. Table of drug surrogates that were the basis of drug approval or licensure. https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DevelopmentResources/ucm613636.htm Accessed December 1, 2022.
  • [37]. Kemp R, Prasad V. Surrogate endpoints in oncology: when are they acceptable for regulatory and clinical decisions, and are they currently overused? BMC Med 2017; 15: 134.
  • [38]. Prasad V, Kim C, Burotto M, Vandross A. The strength of association between surrogate end points and survival in oncology: a systematic review of trial-level meta-analyses. JAMA Intern Med 2015; 175: 1389–98.
  • [39]. Kim C, Prasad V. Strength of Validation for Surrogate End Points Used in the US Food and Drug Administration’s Approval of Oncology Drugs. Mayo Clin Proc 2016; S0025-6196 (16) 00125-7. Epub ahead of print.
  • [40]. Belin L, Tan A, De Rycke Y, Dechartres A. Progression-free survival as a surrogate for overall survival in oncology trials: a methodological systematic review. Br J Cancer 2020; 122: 1707–14.
  • [41]. Driscoll JJ, Rixe O. Overall survival: still the gold standard: why overall survival remains the definitive end point in cancer clinical trials. Cancer J 2009; 15: 401–5.
  • [42]. Saad ED, Buyse M. Statistical controversies in clinical research: end points other than overall survival are vital for regulatory approval of anticancer agents. Ann Oncol 2016; 27: 373–8.
  • [43]. Ellis LM, Bernstein DS, Voest EE, Berlin JD, Sargent D, Cortazar P et al. American Society of Clinical Oncology perspective: Raising the bar for clinical trials by defining clinically meaningful outcomes. J Clin Oncol 2014; 32: 1277–80.
  • [44]. Johnson JR, Williams G, Pazdur R. End points and United States Food and Drug Administration approval of oncology drugs. J Clin Oncol 2003; 21: 1404–11.
  • [45]. Medeiros BC. Interpretation of clinical endpoints in trials of acute myeloid leukemia. Leuk Res 2018; 68: 32–39.
  • [46]. Anderson KC, Kyle RA, Rajkumar SV, Stewart AK, Weber D, Richardson P. Clinically relevant end points and new drug approvals for myeloma. Leukemia 2008; 22: 231–9.
  • [47]. European Medicines Agency. Guidelines on multiplicity issues in clinical trials. https://www.ema.europa.eu/en/documents/scientific-guideline/draft-guideline-multiplicity-issues-clinical-trials_en.pdf Accessed December 1, 2022.
  • [48]. Food and Drug Administration. Guidance for Industry Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. https://www.fda.gov/media/77832/download Accessed December 1, 2022.
  • [49]. Efficace F, Gaidano G, Lo-Coco F. Patient-reported outcomes in hematology: is it time to focus more on them in clinical trials and hematology practice? Blood 2017; 130: 859–866.
  • [50]. Calvert M, Blazeby J, Altman DG, Revicki DA, Moher D, Brundage MD et al. Reporting of Patient-Reported Outcomes in Randomized Trials: The CONSORT PRO Extension. JAMA 2013; 309: 814–22.
  • [51]. Patient-Centered Outcomes Research Institute (PCORI). The PCORI Methodology Report Appendix A: Methodology Standards, 2019. http://www.pcori.org/sites/default/files/PCORI-Methodology-Standards.pdf Accessed December 1, 2022.
  • [52]. Brundage M, Blazeby J, Revicki D, Bass B, de Vet H, Duffy H et al. Patient-reported outcomes in randomized clinical trials: development of ISOQOL reporting standards. Qual Life Res 2013; 22: 1161–1175.
  • [53]. Reeve BB, Wyrwich KW, Wu AW, Velikova G, Terwee CB, Snyder CF et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res 2013; 22: 1889–1905.
  • [54]. European Medicines Agency. The use of patient reported outcome (PRO) measures in oncology studies. Appendix 2 to the guideline on the evaluation of anticancer medicinal products in man. 2016. https://www.ema.europa.eu/en/documents/other/appendix-2-guideline-evaluation-anticancer-medicinal-products-man_en.pdf Accessed December 1, 2022.
  • [55]. European Hematology Association. Guidelines: patient-reported outcomes in hematology. (Eds. Novik A, Salek S, Ionova T) https://ehaweb.org/assets/Uploads/EHA-Guideline-libro.pdf Accessed December 1, 2022.
  • [56]. Fiteni F, Pam A, Anota A, Vernerey D, Paget-Bailly S, Westeel V et al. Health-related quality-of-life as co-primary endpoint in randomized clinical trials in oncology. Expert Rev Anticancer Ther 2015; 15: 885–91.
  • [57]. Qian Y, Walters SJ, Jacques R, Flight L. Comprehensive review of statistical methods for analysing patient-reported outcomes (PROs) used as primary outcomes in randomised controlled trials (RCTs) published by the UK’s Health Technology Assessment (HTA) journal (1997-2020). BMJ Open 2021; 11: e051673.
