Introduction
Myelofibrosis (MF) is a myeloproliferative neoplasm (MPN) associated with bone marrow fibrosis, cytopenias, constitutional symptoms, hepatosplenomegaly, and/or extramedullary hematopoiesis. Patients are at risk for premature death due to disease progression, leukemic transformation, thrombo-hemorrhagic complications, and infections. MF is a rare cancer with a median age of onset of 67 years and can be primary in nature or the result of transformation from polycythemia vera (PV) or essential thrombocythemia (ET) [Mesa 1999]. Median survival in MF from diagnosis ranges from 2.3 to 15.9 years depending on risk category, and thus MF can become chronic in nature for patients with low-risk disease [Cervantes 2009]. The majority of patients have intermediate or high-risk disease and are eligible to receive JAK2 inhibitor treatment as first-line therapy.
Effectiveness of ruxolitinib, a JAK inhibitor, for reduction of splenomegaly and symptom relief in MF was demonstrated in both the COMFORT-I and COMFORT-II phase III trials [Harrison 2012, Verstovsek 2012]. Fedratinib, also a JAK inhibitor, was effective for reducing splenomegaly and symptom burden in more than one-third of patients with myelofibrosis in an international, double-blind, placebo-controlled trial [Pardanani 2015]. FDA approval of both JAK inhibitors as MF treatments has changed the treatment landscape for MF and thus increased the complexity of clinical trial designs in MF. Many patients with MF will have insufficient response, intolerance, or loss of initial response to ruxolitinib [Harrison 2020]. Thus, clinical trials will likely shift to focus on development of novel agents for patients who become resistant to or intolerant of either of these agents. Alternative approaches may also include testing of experimental agents in combination with an approved JAK inhibitor.
Evaluating efficacy and response in clinical trials is challenging as MF response criteria are multi-faceted and incorporate hematologic parameters, bone marrow fibrosis, splenomegaly, transfusion dependency, and symptom measures along with molecular and cytogenetic changes. Considering alternative ways to measure response with newer agents may be warranted. Herein, we focus our discussion on clinical trial designs as related to therapeutics for treatment of MF (primary or post-PV or post-ET types). Non-pharmacologic interventions may represent promising therapeutic strategies and improve MF patient care [Surapaneni 2019, Huberty 2019]. Challenges in designing symptom management trials in cancer with complementary and alternative medicines are discussed in Buchanan et al. [Buchanan 2005].
Phase I
Phase I trials are conducted to understand how well a drug/biologic can be tolerated in a small number of patients and may represent a first-in-human study. Phase I studies can also be used to evaluate standard-of-care treatment combined for the first time with another tested therapy or a new modality (e.g., immunotherapy). Determining the maximum tolerated dose (MTD) for further testing is the goal of a phase I dose-escalation trial. Typically the design starts with the lowest dose and escalates dosing levels until the MTD is reached based on dose-limiting toxicities (DLT). Ethical considerations include minimizing both the number of patients treated at sub-therapeutic doses and the number of patients treated at overly toxic dose levels. Phase I trials can be generally categorized as 1) rule-based, 2) model-based and 3) model-assisted [Table 1]. Rule-based designs assign patients to dose levels according to pre-specified rules based on actual DLT observations. Model-based designs assign patients to dose levels based on estimating the target toxicity level from a statistical model of the dose-toxicity relationship. Model-assisted designs are a newer class of designs that reside partway between rule-based and model-based designs: they are based on underlying statistical models, but their decision rules can be pre-specified.
Table 1:
Selected characteristics of phase I designs
Design characteristics | Rule-based | Model-assisted | Model-based |
---|---|---|---|
Examples | 3+3, Up/down | mTPI, mTPI-2, Keyboard, BOIN | CRM, EWOC, BLRM |
Pre-determined dose escalation rules set up before study | Yes | Yes | No |
Computationally intensive, repeated estimation of dose-toxicity curve | No | No | Yes |
Targets any pre-specified DLT rate | No | Yes | Yes |
Number of patients treated at MTD can be >6 | No | Yes | Yes |
Rapid dose escalation | No | Yes | Yes |
Good operating characteristics relative to sample size | No | Yes | Yes |
Allocates a high percentage of patients to the MTD | No | Yes | Yes |
Provides overdose control | Yes | Yes | Yes |
Won’t escalate the dose when the latest treated patient experiences toxicity; and will never deescalate the dose when the latest treated patient does not experience toxicity | No | Yes | Yes |
DLT=dose-limiting toxicity; MTD=maximum tolerated dose; CRM=continual reassessment method; EWOC= escalation with overdose control; BOIN=Bayesian optimal interval; mTPI=modified toxicity probability interval; BLRM=Bayesian logistic regression model
Rule-based
The standard 3+3 design is considered a rule-based design and is the most commonly used phase I design, though other up-and-down phase I designs exist. Over 90% of published phase I trials used a 3+3 design due to ease of use and simplicity of pre-specifying decisions in advance [Rogatko 2007]. No prior assumptions about the dose-toxicity relationship are needed, other than assuming a non-decreasing dose-toxicity curve. In brief, three patients are treated per dose level to assess for DLT, usually over the first cycle of treatment. If no patients experience DLTs, the dose is escalated for the next cohort of three patients. If one patient has a DLT, an additional three patients are treated at this level with dose escalation only if none of these additional patients experience DLTs. This process continues through the increasing dose levels and if ≥2 patients experience DLTs at a given dose level, the prior dose level is defined as the MTD. Therefore, the MTD is chosen as the highest dose level where six patients are treated with <2 experiencing DLTs. The 3+3 design assumes that the target probability for DLT takes values that are close to either 1/6 or 1/3. Disadvantages of the 3+3 design are that only about one-third of patients are treated at optimal doses [Zhou 2018] and that the design may start far from the target dose, representing a more “conservative approach”. This is especially problematic in rare tumor types, where large numbers of patients may not be available for study and a significant number of patients may be under-dosed and therefore receive sub-therapeutic benefit. Despite these drawbacks, the 3+3 continues to be used due to its ease of use, simplicity and lack of software needed to implement the design. Furthermore, it is well received by institutional review boards (IRB) and regulatory agencies. Simple up-and-down, rolling six and accelerated titration designs are also considered rule-based.
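The escalation rules described above can be written as a small decision function. This is a minimal sketch rather than a protocol-ready implementation: the function name and the action labels are hypothetical, and real protocols add details (for example, expanding the prior dose level to six patients before declaring the MTD) that are omitted here.

```python
def three_plus_three_decision(n_treated: int, n_dlt: int) -> str:
    """Return the 3+3 action at the current dose level.

    n_treated: patients evaluated at this dose level (3 or 6).
    n_dlt: number of those patients with a dose-limiting toxicity.
    Action labels are illustrative, not taken from any standard.
    """
    if n_dlt < 0 or n_dlt > n_treated:
        raise ValueError("n_dlt must be between 0 and n_treated")
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"   # 0/3 DLTs: next cohort goes one level up
        if n_dlt == 1:
            return "expand"     # 1/3 DLTs: treat three more at this level
        return "stop"           # >=2 DLTs: prior dose level is the MTD
    if n_treated == 6:
        # After expansion, escalate only if no additional DLTs occurred.
        return "escalate" if n_dlt <= 1 else "stop"
    raise ValueError("3+3 decisions are made after 3 or 6 patients")


if __name__ == "__main__":
    for n, x in [(3, 0), (3, 1), (6, 1), (6, 2)]:
        print(n, x, three_plus_three_decision(n, x))
```

Because every decision depends only on the counts at the current dose level, the full set of rules can be tabulated in the protocol before the first patient is enrolled, which is the practical appeal noted above.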
Model-based
Model-based designs have seen increasing development during the past decade with many variations. These include the continual reassessment method (CRM) and escalation with overdose control (EWOC) along with other adaptive designs. Model-based designs assume a parametric model for the relationship between dose and toxicity. The general approach is that one starts with an initial estimate of the probability of DLT for the starting dose level and then observes the patient for occurrence of a DLT. After each patient, the probability estimate is revised and the next patient is assigned to the target dose level based on these updated estimates. These steps are repeated until the recommended dose level from the model does not change. The CRM approach does not depend on the starting dose level and the model can result in skipping dose levels (though restrictions are often implemented to moderate escalation in many studies). A benefit of the CRM approach is that fewer patients are typically needed to find the MTD than with rule-based or model-assisted methods [O’Quigley 1990]. However, disadvantages include increased logistical burden and the requirement of ongoing data entry and monitoring after each patient. This can be especially difficult for multicenter trials. Software is required to implement this approach and a strong collaboration with a biostatistician is needed.
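The updating cycle described above can be illustrated with a one-parameter "power" CRM, in which the DLT probability at dose i is skeleton[i] ** exp(a). The sketch below is an assumption-laden illustration, not the method of any cited trial: the skeleton values, the normal prior with standard deviation 1.34, the 25% target, and the grid approximation of the posterior are all choices made here for the example.

```python
import math


def crm_recommend(skeleton, doses_given, dlt_observed,
                  target=0.25, prior_sd=1.34):
    """Recommend the dose whose posterior mean DLT rate is closest to target.

    skeleton: prior guesses of DLT probability per dose (increasing).
    doses_given: dose index received by each treated patient.
    dlt_observed: 1 if that patient had a DLT, else 0.
    """
    # Grid over the model parameter a (illustrative range of -4 to 4).
    grid = [-4.0 + 8.0 * i / 800 for i in range(801)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2          # normal prior (unnormalized)
        for d, y in zip(doses_given, dlt_observed):
            p = skeleton[d] ** math.exp(a)          # power dose-toxicity model
            log_w += math.log(p) if y else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    # Posterior mean DLT probability at each dose level.
    post_tox = [
        sum(w * skeleton[i] ** math.exp(a) for a, w in zip(grid, weights)) / total
        for i in range(len(skeleton))
    ]
    best = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return best, post_tox


if __name__ == "__main__":
    skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]       # hypothetical skeleton
    dose, tox = crm_recommend(skeleton, [2, 2, 2], [1, 1, 1])
    print(dose, [round(t, 2) for t in tox])
```

The function is re-run after every patient or cohort, which is the source of the logistical burden mentioned above. Restrictions such as "do not skip untried dose levels" would be applied to the returned recommendation and are not shown.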
The EWOC design is an extension of the CRM and is a Bayesian adaptive dose-finding design. Similar to the CRM, the estimated dose-toxicity curve is continuously updated. A feasibility bound parameter is pre-specified in order to control concerns about overdosing, and dose escalation does not proceed if the probability of overdosing exceeds this pre-specified value [Babb 1998]. Unlike the CRM, the EWOC design produces consistent sequences of doses without dose skipping. Other modifications of the CRM include the Bayesian logistic regression model (BLRM) [Neuenschwander 2008]. A time-to-event version (TITE-CRM) is also available [Yuan 2018].
Model-assisted
Model-assisted designs represent a newer class of designs, with increasing use due in part to availability of software packages and online applications (RShiny apps, websites, etc.) for implementation. Model-assisted designs are based on underlying statistical models that represent a “middle ground” between the traditional rule-based and model-based designs with decisions that can be pre-specified in advance for ease of use. An initial estimation (prior distribution) of the dose–toxicity curve is used and occurrence of toxicities in patients enrolled at each dose level provides an update to the statistical model, resulting in adjustment to this curve (posterior distribution). At the end of the trial, the posterior distribution is evaluated to identify the dose closest to the targeted toxicity level. Designs include the modified toxicity probability interval (mTPI), keyboard (which has been shown to be the same as the mTPI-2), Bayesian optimal interval design (BOIN) and others. These models have been shown to have good performance and superior operating characteristics as compared to the 3 + 3 design in a variety of scenarios [Zhou 2018]. Another benefit is that the model-assisted designs can handle passive changes in the number of evaluable patients (such as when a patient becomes inevaluable after enrollment for the cohort closes) per dose level as compared to the 3+3 design, which requires reopening in order to enroll additional patients to fill the required cohort of three patients when one or more of the patients become inevaluable.
The mTPI design specifies beforehand (a priori) three intervals corresponding to proper, under and over dosing. A local beta-binomial probability model is used to describe the toxicities at the current dose level being studied. Dose escalation decisions are then based on the unit probability mass (UPM) of the three intervals corresponding to the area under the posterior distribution curve. If the toxicity rate of the current dose level lies within the under-dosing interval, then dose escalation is indicated. If the toxicity rate of the current dose level is within the proper dosing interval, remaining at the current dose level is indicated. If the toxicity rate of the current dose level is within the over-dosing interval, then de-escalation is indicated. Dose escalation decisions can be generated under a range of parameters (i.e., do not have to assume toxicity of 1/3 or 1/6 like a 3+3 design). Recommended sample size for the mTPI design is k × (d+1), where k is cohort size and d is the number of dose levels evaluated [Ji 2013]. At the end of the study, toxicity data across all dose levels are combined to estimate the non-decreasing toxicity probabilities across the dose levels using a pool-adjacent-violators algorithm. The dose level with the toxicity probability closest to the target probability is then selected as the MTD. If no dose level has a toxicity probability within the target probability interval, the highest dose is considered the MTD or, if all doses have toxicity probability greater than the target probability, then no dose is selected. Overdosing is the biggest concern of the mTPI design due to the unequal-width intervals of the UPM distribution. Therefore extensions of the mTPI known as the mTPI-2 and keyboard designs were developed [Yan 2017], and later shown to be identical [Zhou 2018]. These designs overcome the problem of overdosing by dividing the under-dosing and over-dosing intervals (also known as “keys”) into shorter subintervals. Thus dose escalation decisions are defined for the sub-intervals/keys to alleviate the concern of overdosing.
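The UPM comparison at the heart of the mTPI can be sketched in a few lines. The example below assumes a uniform Beta(1, 1) prior, a 30% target and an equivalence interval of target ± 0.05; none of these values come from the text, and the additional safety (dose-exclusion) rule used in practice is omitted. With integer posterior parameters the beta CDF can be computed exactly from a binomial tail, which keeps the sketch free of external libraries.

```python
from math import comb


def beta_cdf_int(q: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at q for integer a, b, via the binomial-tail identity."""
    m = a + b - 1
    return sum(comb(m, j) * q ** j * (1 - q) ** (m - j) for j in range(a, m + 1))


def mtpi_decision(n: int, n_dlt: int, target: float = 0.30,
                  eps1: float = 0.05, eps2: float = 0.05) -> str:
    """Pick the interval with the largest unit probability mass (UPM)."""
    a, b = 1 + n_dlt, 1 + n - n_dlt          # posterior under a Beta(1, 1) prior
    lo, hi = target - eps1, target + eps2
    c_lo, c_hi = beta_cdf_int(lo, a, b), beta_cdf_int(hi, a, b)
    upm = {
        "escalate": c_lo / lo,                   # under-dosing interval (0, lo)
        "stay": (c_hi - c_lo) / (hi - lo),       # proper-dosing interval [lo, hi]
        "de-escalate": (1 - c_hi) / (1 - hi),    # over-dosing interval (hi, 1)
    }
    return max(upm, key=upm.get)


if __name__ == "__main__":
    for x in range(4):
        print(f"{x}/3 DLTs ->", mtpi_decision(3, x))
```

Since the decision depends only on the counts at the current dose, the full decision table for every (n, number of DLTs) pair can be generated before the trial starts, which is what allows model-assisted designs to be pre-specified like a rule-based design.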
Risk of overdosing is also minimized by the BOIN design. Both the mTPI and keyboard designs require calculating the posterior distribution (area under the curve), whereas the BOIN design relies only on comparison of the observed DLT rate at the current dose with fixed, pre-specified escalation/de-escalation boundaries. An overdose control parameter is used to eliminate the current and higher doses from the trial, preventing treatment of future patients at those doses, if the DLT rate at the current dose level is greater than a pre-specified threshold [Yuan 2016]. Similar to the mTPI design, once the pre-specified sample size is exhausted, the MTD is computed based on isotonic regression. The BOIN design can be extended for use with combinations of agents (BOIN-COMB) or late-onset toxicity (TITE-BOIN). U-BOIN is a utility-based seamless phase I/II trial design for finding the optimal biological dose for targeted and immune therapies with the incorporation of a risk-benefit trade-off (between toxicity and efficacy) in order to more realistically reflect clinical practice [Zhou 2019].
Phase I designs of combination regimens
Dose escalation with drug combinations in a phase I setting can be examined in various ways. Depending on the mechanisms of action and potential for overlapping toxicities, consideration of the following approaches is warranted: alternating escalation of the agents in a series of sequential dose levels, simultaneous escalation of both agents, or escalation of one agent to the recommended dose for phase II trials while holding the other agent at a fixed (generally high or low) dose [Le Tourneau 2009]. Bayesian models are also very useful in this setting. The dose–toxicity probability curves are updated after each cohort of patients for both agents by using all toxicity data. Incorporating both toxicity and efficacy endpoints might be useful in the combination setting [Yin 2006, Zhang 2016]. Synergism and interaction between agents should be carefully considered along with pharmacokinetics.
Phase II
Once the MTD has been established in phase I, a drug (or combination) may move to the phase II setting, in which the goal shifts to providing initial estimates of efficacy and ascertaining whether treatment warrants further development in future controlled trials (i.e., phase III setting). Single-arm (non-randomized) Simon two-stage designs are some of the most popular phase II designs [Simon 1989]. The hypothesis is tested in two (or more) “stages” in order to minimize the number of patients treated with a drug of low activity. The design is based on properties of the binomial distribution and requires specifying the largest “success” rate (typically, disease response rate) that would suggest a drug does not warrant further investigation. The design parameters are expressed in terms of the number of “successes” seen from n patients with a boundary (r) cutoff for determining whether to continue on to the second (or next) stage. The Simon optimal two-stage design has the minimum expected sample size under the null hypothesis and will require fewer patients for the first stage, whereas the minimax design will require the smallest total sample size overall.
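The operating characteristics of a given two-stage design follow directly from binomial probabilities. The sketch below evaluates one design, stage 1 boundary 1/10 and overall boundary 5/29, which is often tabulated as the optimal design for an uninteresting response rate of 10% versus a target of 30%; those rates and boundaries are assumptions chosen for illustration and are not taken from the text.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def simon_operating_characteristics(r1: int, n1: int, r: int, n: int, p: float):
    """Stop after stage 1 if responses <= r1; call the drug promising if total > r."""
    n2 = n - n1
    # Probability of early termination at the end of stage 1.
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r + 1 - x1, 0)            # responses still needed in stage 2
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return {"pet": pet, "prob_promising": promising, "expected_n": expected_n}


if __name__ == "__main__":
    null = simon_operating_characteristics(1, 10, 5, 29, p=0.10)
    alt = simon_operating_characteristics(1, 10, 5, 29, p=0.30)
    print({k: round(v, 3) for k, v in null.items()})   # type 1 error, PET, E[N]
    print(round(alt["prob_promising"], 3))             # power
```

Under the 10% response rate the trial stops after 10 patients roughly three times out of four, which is how the design limits exposure to an inactive drug.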
Other phase II designs similar to the Simon two-stage design include Fleming’s version, which makes it possible to stop after the first stage if too few responses or too many responses are observed [Fleming 1982]. Single-arm (non-randomized) three-outcome phase II designs also exist in which the hypotheses tested would also include the conclusion that the observed activity of the drug is “borderline” and the trial is inconclusive [Sargent CCT 2001].
A flexible randomized “pick the winner” design in which patients are randomized to one of two or more experimental regimens may be used as a way to more closely mimic clinical practice where there are many factors that determine a patient’s choice of treatment including adverse events, cost, and treatment schedules [Sargent Stat Med 2001]. In this case the goal is to exclude a substantially inferior treatment for further study and select a substantially superior treatment for further study (continue on to phase III) when a superior treatment exists. If the observed difference in the “success” rate of the treatments (could include multiple arms) studied is larger than a pre-specified difference, then the treatment with the highest success rate is selected. Otherwise, if the observed differences do not meet this requirement, other factors may be selected for the treatment of choice. This design is appropriate for selecting among experimental regimens (i.e., selection design), and not versus a control (i.e., screening design in which one of the randomized arms is a standard-of-care arm) because the selection design is not formally testing superiority (or more accurately, the null hypothesis that the “success” rates are equal). For a discussion of screening designs, see Rubinstein et al. [Rubinstein 2005].
For small sample sizes and rare tumors, a design worth mentioning is the single-arm (non-randomized) Gehan design, which uses the minimum number of patients needed to conclude that a drug is either worthy of further study or unlikely to be effective at a target rate [Gehan 1979]. This design screens out ineffective drugs quickly. Based on the target effectiveness rate, the design calculates the chance of observing a certain number of consecutive failures (under the null hypothesis), and 1 or more “successes” observed in the required sample size would be noteworthy for further study. For example, target effectiveness of 25% would require observing at least 1 “success” out of 9 patients to warrant further study of the drug.
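The first-stage sample size in this example can be reproduced with a one-line calculation: find the smallest n for which n consecutive failures would be unlikely if the drug truly had the target effectiveness. The sketch assumes a 10% chance of wrongly rejecting an effective drug, a value that is not stated in the text but is consistent with the 9-patient example; the function name is hypothetical.

```python
def gehan_first_stage_n(target_rate: float, rejection_error: float = 0.10) -> int:
    """Smallest n with (1 - target_rate) ** n <= rejection_error."""
    if not 0 < target_rate < 1 or not 0 < rejection_error < 1:
        raise ValueError("both arguments must be strictly between 0 and 1")
    n = 1
    while (1 - target_rate) ** n > rejection_error:
        n += 1
    return n


if __name__ == "__main__":
    # 25% target effectiveness, 10% rejection error (assumed).
    print(gehan_first_stage_n(0.25, 0.10))
    # A stricter 5% rejection error with a 20% target needs more patients.
    print(gehan_first_stage_n(0.20, 0.05))
```

If all patients in this first stage fail, the drug is dropped; a single success is enough to continue.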
In the phase II arena, several Bayesian trial designs exist. The Bayesian optimal phase II (BOP2) design is a flexible design that can include several interim analyses, can handle efficacy and toxicity endpoints together, and can apply to single-arm and two-arm trials [Zhou 2017]. Similar to the designs detailed above, it minimizes the expected sample size if the regimen has low activity and controls the type 1 error rate. A time-to-event (TOP) version is also available for real-time interim decisions when patients’ outcomes may still be pending, such as immunotherapy trials [Lin 2020].
Particularly, in the era following ruxolitinib and fedratinib approval for MF, combination trials of novel agents combined with these standard of care drugs will become the norm. In addition, patients who are deemed refractory or resistant to JAK inhibitor therapies will be prime candidates for studying the effects of single-agent novel therapies. Designing studies that target these unique sub-populations of MF patients is crucial for trial conduct and conserving trial resources.
Phase III
Phase III trials are typically designed to assess effectiveness of a new intervention versus standard of care in a randomized setting. Designs can be specified as testing superiority, equivalence or non-inferiority (NI). There has been recent interest in NI studies, in which the hypothesis of interest is that the difference between the experimental treatment and active control (i.e., standard of care) lies within a difference margin of interest, or that the new treatment is “as good as” the current treatment. In 2005, fewer than 100 NI trials were published versus 600 in 2015 [Mauri 2017]. The NI difference margin should be pre-specified in advance of the study and can be estimated based on past performance of the active comparator arm in prior studies. In addition, US Food and Drug Administration (FDA) guidance exists as to how to select this margin [FDA NI guidance]. The NI hypothesis is typically tested by comparing the upper bound of a one-sided 97.5% confidence interval for the treatment difference with the margin of interest. This may decrease the sample size needed as compared to a superiority trial design if superiority of the experimental regimen is expected relative to standard of care.
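A confidence-bound comparison of this kind can be sketched for a binary endpoint. The example below is illustrative only: it assumes a response-type endpoint where higher is better, expresses the difference as control minus experimental so that the upper bound is the relevant one, uses a simple Wald interval, and plugs in hypothetical counts and a hypothetical 10-percentage-point margin.

```python
from math import sqrt
from statistics import NormalDist


def ni_upper_bound(ctrl_success: int, ctrl_n: int,
                   exp_success: int, exp_n: int,
                   margin: float, one_sided_alpha: float = 0.025):
    """Return (upper bound of control - experimental difference, NI concluded?)."""
    p_c = ctrl_success / ctrl_n
    p_e = exp_success / exp_n
    diff = p_c - p_e                                  # >0 means experimental is worse
    se = sqrt(p_c * (1 - p_c) / ctrl_n + p_e * (1 - p_e) / exp_n)
    z = NormalDist().inv_cdf(1 - one_sided_alpha)     # about 1.96 for alpha = 0.025
    upper = diff + z * se
    return upper, upper < margin


if __name__ == "__main__":
    # Hypothetical data: 80/200 responders on control, 84/200 on experimental.
    upper, non_inferior = ni_upper_bound(80, 200, 84, 200, margin=0.10)
    print(round(upper, 4), non_inferior)
```

The same data can fail to establish NI under a tighter margin, which is why the margin must be fixed before the data are seen.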
Adhering to high standards of study conduct in NI trials is crucial, as deviations (e.g., treatment non-adherence, protocol violations, attrition) typically create bias towards an NI result (i.e., make the arms look more similar). Such biases are typically of lesser concern in superiority trials since they are in the direction of the null hypothesis (not to suggest that superiority trials can be carried out in a sloppy fashion!). In addition, NI trials should be analyzed using both intent-to-treat (i.e., patients according to their randomized assignment) and per-protocol (i.e., patients according to their treatment received) approaches. Reporting only intent-to-treat analyses may bias the results toward a false-positive conclusion of non-inferiority, as the difference obtained between treatments will be narrower if substantial amounts of non-adherence, cross-over or loss to follow-up occurred. Caution should be taken when interpreting studies that were initially designed with superiority in mind and then fail to show a difference; one cannot conclude NI based on the absence of a significant treatment difference in a superiority study (i.e., post-hoc analysis is not appropriate). Trials can be designed to test NI first and then test for superiority after NI is established. Figure 1 shows the conclusions that can be made based on the upper bound of the confidence interval for treatment differences in various scenarios. NI trials might be useful for comparing newer agents with standards of care in MF.
Figure 1:
Possible conclusions from a non-inferiority trial design
Adaptive and other designs
Adaptive designs incorporate opportunities to change aspects of the study based on accumulating data and interim analyses. Changes are typically pre-specified and detailed in advance. Changes made to the study can include fluctuating randomization probabilities to increase enrollment in treatment arm(s) that are doing well, sample size re-estimation, and interim methods for early stopping for efficacy and/or futility [FDA AD guidance]. Hypotheses can be changed from NI to superiority, or eligibility criteria can be modified to enrich enrollment of subgroups that are deriving benefit (such as biomarker enrichment designs). Outcomes and analysis methods can also be modified based on accumulating data. Advantages of adaptive designs include increased flexibility and possible efficiency gains, but interim analyses rely on accurate data being quickly available for patients. Caution should be taken when interpreting treatment effects at the conclusion of the study as they are dependent on the adaptations made throughout the study.
Master trials using umbrella or basket designs have been incorporated into clinical trial design in order to accelerate drug development. Basket trials evaluate a targeted therapy in multiple diseases that have a common molecular alteration, whereas umbrella trials evaluate multiple targeted therapies for a single disease, stratified by subgroups based on molecular signatures. The Beat AML trial is an umbrella trial, sponsored by The Leukemia & Lymphoma Society, that is focused on testing novel targeted therapies and combination studies in specific genomic subtypes of acute myeloid leukemia based on results of next-generation sequencing (NCT03013998) [Burd 2019]. As the landscape of MF treatment options and biomarkers evolves, a basket/umbrella approach may be warranted.
Endpoints in MF
As clinical trial designs evolve in MF, so must the endpoints used for evaluating treatment success. Typically, response in MF is assessed via International Working Group (IWG) consensus criteria [Tefferi 2013]. Alternative response criteria are being examined, such as decreased bone marrow fibrosis in patients who have moderate or severe fibrosis, as in the ongoing MPN-RC 118 clinical trial (NCT03895112). Overall survival has been shown to differ by risk, though this endpoint may not be feasible when median survival is expected to be long. Median survival for patients under 60 with primary MF ranged from 2.4 to 13.4 years from diagnosis depending on International Prognostic Scoring System risk level [Vaidya 2009]. These estimates were similar to those reported by Cervantes et al. [Cervantes 2009]. However, a planned phase III trial (NCT04576156) of imetelstat versus best available therapy (BAT) was recently designed using overall survival as the primary endpoint. This trial of 320 patients (randomized 2:1) targets enrollment of patients with JAK-inhibitor refractory MF and is designed to detect a 40% reduction in death (i.e., a hazard ratio of 0.60 or improvement in median survival from 14 months in the BAT arm to 23 months in the imetelstat arm). The timely observation of deaths and strong hazard ratio both contribute to this design maintaining a feasible sample size. Overall survival as a primary endpoint would not likely be feasible in a small to moderately sized trial in earlier stage disease where overall survival is expected to be much longer (e.g., a combined analysis of the COMFORT-I/II trials reported a median survival of 5.3 years for the ruxolitinib arms and 2.3 years for the control arms [Verstovsek 2017]).
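The link between the hazard ratio, the medians and the size of a survival trial can be made concrete with Schoenfeld's approximation for the number of deaths required by a log-rank test. The sketch below is not the actual design calculation for the trial mentioned above: the one-sided alpha of 0.025 and power of 80% are assumptions, exponential survival is assumed when converting medians, and interim analyses, accrual and drop-out are ignored.

```python
from math import log
from statistics import NormalDist


def required_events(hazard_ratio: float, alloc_experimental: float = 2 / 3,
                    one_sided_alpha: float = 0.025, power: float = 0.80) -> float:
    """Schoenfeld approximation to the number of deaths needed."""
    z = NormalDist().inv_cdf
    p1 = alloc_experimental          # 2:1 randomization puts 2/3 on experimental
    p2 = 1 - p1
    return (z(1 - one_sided_alpha) + z(power)) ** 2 / (
        p1 * p2 * log(hazard_ratio) ** 2)


def median_under_hr(control_median: float, hazard_ratio: float) -> float:
    """Experimental-arm median under exponential survival and a constant HR."""
    return control_median / hazard_ratio


if __name__ == "__main__":
    print(round(median_under_hr(14, 0.60), 1))   # months; matches the ~23 quoted
    print(round(required_events(0.60)))          # deaths under the assumed alpha/power
```

The number of events grows with 1 / (ln HR)^2, so a hazard ratio closer to 1, or a population in which deaths accrue slowly, quickly pushes the required sample size and follow-up beyond what is practical. This is the feasibility point made above for earlier-stage disease.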
Splenomegaly and spleen size changes are typically evaluated in clinical trials based on imaging such as computed tomography (CT) or magnetic resonance imaging (MRI). For example, the primary endpoint in the JAKARTA trial (NCT01437787) was the proportion of patients achieving ≥35% reduction from baseline in spleen volume at the end of cycle 6 measured by MRI or CT with a follow-up scan 4 weeks later. However, these endpoints may limit trial enrollment to only patients with enlarged spleens, thus limiting generalizability to the larger MF patient population. In addition, clinical trial costs may be increased due to imaging costs outside standard of care.
To control overall type 1 error (i.e., α) across endpoints in a clinical trial, sequential testing of endpoints can be performed in a hierarchical fashion. In this instance, the primary endpoint serves as the “gatekeeper” for secondary analysis, such that if the primary null hypothesis was rejected then formal statistical testing can be undertaken for the subsequent secondary efficacy end points sequentially in a pre-specified order. If the primary null hypothesis is not rejected, formal sequential testing of secondary endpoints is halted. In the phase III SIMPLIFY-1 trial (NCT01969838), formal sequential testing was stopped after the first secondary endpoint (Total Symptom Score [TSS] response at Week 24) and the subsequent secondary endpoints involving anemia response were not formally evaluated, though would have supported benefit of the experimental therapy [Mesa 2017]. An alternative approach which avoids such a scenario is simultaneous testing, but usually at a cost of increased sample size which is needed in order to split α across hypothesis tests.
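The gatekeeping logic can be sketched as a fixed-sequence procedure: each endpoint is tested at the full α in the pre-specified order, and formal testing stops at the first non-significant result. The endpoint names and p-values below are hypothetical and are not results from any trial.

```python
def fixed_sequence_test(ordered_p_values, alpha: float = 0.05):
    """Test endpoints in order at level alpha; stop at the first failure.

    ordered_p_values: list of (endpoint name, p-value) in pre-specified order.
    Returns a list of (endpoint name, status).
    """
    results = []
    gate_open = True
    for name, p in ordered_p_values:
        if not gate_open:
            status = "not formally tested"
            rejected = False
        else:
            rejected = p <= alpha
            status = "rejected" if rejected else "not rejected"
        results.append((name, status))
        gate_open = gate_open and rejected   # a failure closes the gate for the rest
    return results


if __name__ == "__main__":
    hypothetical = [
        ("primary: spleen response", 0.010),
        ("secondary 1: symptom response", 0.200),
        ("secondary 2: anemia response", 0.001),
    ]
    for name, status in fixed_sequence_test(hypothetical):
        print(f"{name}: {status}")
```

In this hypothetical run the third endpoint has a small nominal p-value but cannot be claimed, because the gate closed at the second endpoint. That is the trade-off against splitting α across simultaneous tests.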
Complex endpoints looking at both toxicity and efficacy combined may be useful, particularly in phase I and II trials. Development of better endpoints using pooled data and consortium-based work in MF is vital to evaluating newer agents and drug classes with differing mechanisms of action. For phase I/II trials, toxicity and efficacy of molecularly targeted agents may not follow traditional dose-toxicity relationships historically seen with cytotoxic therapies. As such, toxicity and efficacy may not be dose dependent in a monotonic fashion. An area of interest for future trials is the investigation of biomarkers that could provide more rapid “efficacy” signals rather than using typical response criteria that require several cycles or more of therapy before response assessment is known. This is an area of ongoing work and still early in development. For trials intended to support drug approval by a regulatory agency, careful discussion prior to protocol finalization with the regulatory agency (e.g., the Food and Drug Administration (FDA) in the U.S.) regarding study design, patient population, and primary/secondary endpoints is needed in order to support successful registration. Recommended endpoints may differ based on earlier stage disease versus the refractory setting and may involve coupling an endpoint demonstrating clinical activity (e.g., spleen response) with an endpoint demonstrating patient benefit (e.g., clinically meaningful improvement in patient-reported symptom burden).
Patient-reported Outcomes (PROs)
Clinical trials have been increasingly incorporating PROs as outcomes. Symptom burden in patients with MPNs is assessed via the MPN Symptom Assessment Form (MPN-SAF) or the abbreviated TSS, which assesses the most clinically relevant symptoms: fatigue, concentration problems, early satiety, inactivity, night sweats, pruritus, bone pain, abdominal discomfort, weight loss, and fever [Emanuel 2012]. Recently, to harmonize MF symptom burden questionnaires across academic and industry partners, the MF-SAF version 4.0 was developed and includes 7 items: fatigue, night sweats, pruritus, abdominal discomfort, pain, early satiety, and bone pain. The MF-SAF v4.0 is recommended for use in MF trials and is available as a 7-day diary (24-hour recall) or one-time (1-day recall) assessment [Gwaltney 2017]. Graphical display of TSS data appears in Figure 2 and follows recommendations for optimal visualization of PROs by Snyder et al. [Snyder 2019]. Statistical analysis should be conducted on the TSS as well as individual symptom items to ensure maximal use of data.
Figure 2.
Mean changes from baseline during treatment for patient-reported outcomes
*MPN-SAF TSS and EORTC QLQ-C30 Global health status/QOL transformed to a 0–10 scale where 10 represents the worst outcome for consistency with other displayed items. MPN-SAF TSS=Myeloproliferative Neoplasm Symptom Assessment Form Total Symptom Score; EORTC QLQ-C30=European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30; GHS/QOL=Global Health Status/Quality of Life
Symptom response for an individual patient in MF trials has historically been defined as a decrease (improvement) from baseline by 50%; however, this may not represent an attainable endpoint particularly in patients with lower baseline symptom burden. Investigating other symptom response definitions is an ongoing area of investigation, with decreases smaller than 50% likely representing meaningful improvements.
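The effect of the response threshold can be seen with a small calculation of percent change in TSS from baseline. The scores, the function names and the alternative 30% threshold below are hypothetical illustrations and are not recommended definitions.

```python
def tss_percent_change(baseline: float, follow_up: float) -> float:
    """Percent change in total symptom score; negative values are improvements."""
    if baseline <= 0:
        raise ValueError("percent change is undefined for a baseline of 0 or less")
    return 100.0 * (follow_up - baseline) / baseline


def is_symptom_responder(baseline: float, follow_up: float,
                         threshold_pct: float = 50.0) -> bool:
    """Responder if the score decreased by at least threshold_pct percent."""
    return tss_percent_change(baseline, follow_up) <= -threshold_pct


if __name__ == "__main__":
    # Hypothetical patients: high versus low baseline symptom burden.
    print(is_symptom_responder(40, 18))                      # 55% decrease
    print(is_symptom_responder(8, 5))                        # 37.5% decrease
    print(is_symptom_responder(8, 5, threshold_pct=30.0))    # alternative cut-off
```

A patient with a low baseline score has little room to improve on a percentage scale, so the same absolute change may or may not count as a response depending on the threshold chosen.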
In addition to assessment of MF symptom burden, other PROs should be used in clinical trials to assess important domains of health-related quality of life during the patient’s treatment course. A possible approach is described by Kluetz et al. [Kluetz 2016].
MF trials during the last decade
Frequency of clinical trials from 2010–2019 for MF treatment is presented in Figure 3. Data were abstracted from ClinicalTrials.gov and reviewed for phase and trial design. A total of 165 treatment interventional studies including MF patients (some with MF as the primary cohort, others including MF along with other MPN/myelodysplastic syndrome or hematologic malignancy-related cohorts) were reviewed. A median of 16.5 (range 13–20) trials were initiated per year; 33% were phase I, 56% phase II, 10% phase III and 1% phase IV; and 47% were industry sponsored. For phase I trials with dose escalation (n=41/54; 76%), the majority proposed a standard 3+3 design (or other rule-based design) or did not specifically detail the dose escalation schema; fewer than 5 dose escalation trials were explicitly described as model-based or model-assisted. Almost half of the phase I trials (23/54; 43%) included combination treatment, with 17/23 (74%) using ruxolitinib or fedratinib. Median number of primary/secondary outcomes was 6.0 (range 1–28) across all studies. Primary and secondary endpoints were reviewed for patient-reported quality of life, symptoms, or other PROs. Overall 63/165 (38%) studies included at least one PRO endpoint as a primary/secondary endpoint; 28% in phase I, 36% in phase II and 82% in phase III. Median target enrollment for the phase III trials (n=16) was 192 patients; only two phase III trials planned to enroll >500 patients.
Figure 3: Clinical trials conducted in myelofibrosis (2010–2019) by phase of study
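The proportions reported in this section can be recomputed from the counts given in the text. The short sketch below is a reader's check rather than part of the original analysis; it uses only the numerators and denominators stated above, and the dictionary labels and the helper name `pct` are hypothetical.

```python
# Counts (numerator, denominator) exactly as reported in the text.
reported_counts = {
    "phase I trials with dose escalation": (41, 54),
    "phase I trials with combination treatment": (23, 54),
    "combination trials using ruxolitinib or fedratinib": (17, 23),
    "studies with at least one PRO endpoint": (63, 165),
}


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


for label, (n, d) in reported_counts.items():
    print(f"{label}: {n}/{d} = {pct(n, d)}%")
```

The rounded values agree with the percentages quoted in the text (76%, 43%, 74%, and 38%).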
Conclusions and Summary
Key design features of phase I, II and III drug trials have been presented and discussed. For phase I trials in MF, incorporation of model-assisted and model-based approaches, which include novel methods of dose escalation, is encouraged in order to maximize therapeutic benefit. Clinical trial designs in MF have shifted in recent years to accommodate new challenges in the post-JAK inhibitor approval era, and trials testing combination agents and/or employing non-inferiority or adaptive designs may become more prevalent in the future. Despite the availability of standardized response criteria, alternative measures of response with newer agents may be warranted. Finally, patient-reported outcomes, including MF symptom burden and other domains of health-related quality of life, should be encouraged as endpoints in clinical trial designs.
Clinical Care Points
Among MF clinical trials conducted from 2010–2019, only a small number of phase I dose-escalation trials used a model-based or model-assisted design; 43% of phase I trials included combination treatment.
Standard and adaptive phase I and II designs are appropriate in MF.
Consideration of combination treatment clinical trial designs in the post-JAK inhibitor approval era with a small patient population is encouraged.
Disease response by International Working Group (IWG) criteria and patient-reported symptom burden assessed using the MF-SAF v4.0 are standard endpoints, but additional endpoints are needed in some settings.
Key Points:
Phase I and II adaptive designs may be useful for clinical trials of myelofibrosis.
Clinical trial designs in myelofibrosis have shifted in recent years to accommodate new challenges in the post-JAK inhibitor approval era.
Despite the availability of standardized response criteria, alternative measures of response in clinical trials evaluating newer agents may be warranted.
Patient-reported symptoms remain a key outcome in myelofibrosis clinical trials, particularly in the phase III setting; a validated questionnaire is available for measurement of patient symptom burden.
Synopsis
Design features of phase I, II and III clinical trials of pharmaceutical interventions in myelofibrosis (MF) are discussed. Model-assisted and model-based designs for phase I trials are useful for maximizing therapeutic benefit and include novel approaches to dose escalation. Trials in MF have shifted in recent years to accommodate new challenges following approval of JAK inhibitor therapies. Standardized response criteria exist; however, alternative measures of response may be needed when evaluating newer agents. Non-inferiority designs may be considered, and adaptive designs can be used to incorporate design changes over time. Patient-reported outcomes, including quality of life and symptom assessment, should be included as outcome measures.
Footnotes
Conflicts of Interest Disclosure(s): Dr. Dueck receives royalties from commercial licensing of the Myeloproliferative Neoplasm Symptom Assessment Form.
Contributor Information
Heidi E. Kosiorek, Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Scottsdale, Arizona.
Amylou C. Dueck, Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Johnson Research Building, 13400 E. Shea Boulevard, Scottsdale, AZ 85259.
References
- 1. Mesa RA, Silverstein MN, Jacobsen SJ, Wollan PC, Tefferi A. Population-based incidence and survival figures in essential thrombocythemia and agnogenic myeloid metaplasia: an Olmsted County Study, 1976–1995. Am J Hematol 1999;61(1):10–15.
- 2. Cervantes F, Dupriez B, Pereira A, et al. New prognostic scoring system for primary myelofibrosis based on a study of the International Working Group for Myelofibrosis Research and Treatment. Blood 2009;113(13):2895–2901.
- 3. Harrison C, Kiladjian JJ, Al-Ali HK, et al. JAK inhibition with ruxolitinib versus best available therapy for myelofibrosis. N Engl J Med 2012;366(9):787–798.
- 4. Verstovsek S, Mesa RA, Gotlib J, Levy RS, Gupta V, DiPersio JF, Catalano JV, Deininger M, Miller C, Silver RT, Talpaz M, Winton EF, Harvey JH Jr, Arcasoy MO, Hexner E, Lyons RM, Paquette R, Raza A, Vaddi K, Erickson-Viitanen S, Koumenis IL, Sun W, Sandor V, Kantarjian HM. A double-blind, placebo-controlled trial of ruxolitinib for myelofibrosis. N Engl J Med 2012;366(9):799–807.
- 5. Pardanani A, Harrison C, Cortes JE, et al. Safety and Efficacy of Fedratinib in Patients With Primary or Secondary Myelofibrosis: A Randomized Clinical Trial. JAMA Oncol 2015;1(5):643–651.
- 6. Harrison CN, Schaap N, Mesa RA. Management of myelofibrosis after ruxolitinib failure. Ann Hematol 2020;99:1177–1191.
- 7. Surapaneni P, Scherber RM. Integrative Approaches to Managing Myeloproliferative Neoplasms: the Role of Nutrition, Exercise, and Psychological Interventions. Curr Hematol Malig Rep 2019;14:164–170.
- 8. Huberty J, Eckert R, Dueck A, et al. Online yoga in myeloproliferative neoplasm patients: results of a randomized pilot trial to inform future research. BMC Complement Altern Med 2019;19:121.
- 9. Buchanan DR, White JD, O’Mara AM, Kelaghan JW, Smith WB, Minasian LM. Research-design issues in cancer-symptom–management trials using complementary and alternative medicine: Lessons from the National Cancer Institute Community Clinical Oncology Program Experience. J Clin Oncol 2005;23(27):6682–6689.
- 10. Rogatko A, Schoeneck D, Jonas W, Tighiouart M, Khuri FR, Porter A. Translation of innovative designs into phase I trials. J Clin Oncol 2007;25(31):4982–4986.
- 11. Zhou H, Yuan Y, Nie L. Accuracy, safety, and reliability of novel phase I trial designs. Clin Cancer Res 2018;24:4357–4364.
- 12. O’Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics 1990;46:33–48.
- 13. Babb J, Rogatko A, Zacks S. Cancer phase I clinical trials: efficient dose escalation with overdose control. Stat Med 1998;17:1103–1120.
- 14. Neuenschwander B, Branson M, Gsponer T. Critical aspects of the Bayesian approach to phase I cancer trials. Stat Med 2008;27(13):2420–2439.
- 15. Yuan Y, Lin R, Li D, Nie L, Warren KE. Time-to-Event Bayesian Optimal Interval Design to Accelerate Phase I Trials. Clin Cancer Res 2018;24(20):4921–4930.
- 16. Ji Y, Wang SJ. Modified toxicity probability interval design: a safer and more reliable method than the 3+3 design for practical phase I trials. J Clin Oncol 2013;31(14):1785–1791.
- 17. Yan F, Mandrekar SJ, Yuan Y. Keyboard: a novel Bayesian toxicity probability interval design for phase I clinical trials. Clin Cancer Res 2017;23:3994–4003.
- 18. Zhou H, Murray TA, Pan H. Comparative review of novel model-assisted designs for phase I clinical trials. Stat Med 2018;37:2208–2222.
- 19. Yuan Y, Hess KR, Hilsenbeck SG, Gilbert MR. Bayesian optimal interval design: a simple and well-performing design for phase I oncology trials. Clin Cancer Res 2016;22(17):4291–4301.
- 20. Zhou Y, Lee JJ, Yuan Y. A utility-based Bayesian optimal interval (U-BOIN) phase I/II design to identify the optimal biological dose for targeted and immune therapies. Stat Med 2019;38(28):5299–5316.
- 21. Le Tourneau C, Lee JJ, Siu LL. Dose escalation methods in phase I cancer clinical trials. J Natl Cancer Inst 2009;101(10):708–720.
- 22. Yin G, Li Y, Ji Y. Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics 2006;62(3):777–787.
- 23. Zhang L, Yuan Y. A practical Bayesian design to identify the maximum tolerated dose contour for drug combination trials. Stat Med 2016;35(27):4924–4936.
- 24. Simon R. Optimal Two-Stage Designs for Phase II Clinical Trials. Control Clin Trials 1989;10:1–10.
- 25. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982;38:143–151.
- 26. Sargent DJ, Chan V, Goldberg RM. A three-outcome design for phase II clinical trials. Control Clin Trials 2001;22(2):117–125.
- 27. Sargent DJ, Goldberg RM. A flexible design for multiple armed screening trials. Stat Med 2001;20:1051–1060.
- 28. Rubinstein LV, Korn EL, Freidlin B, et al. Design issues of randomized phase II trials and a proposal for phase II screening trials. J Clin Oncol 2005;23:7199–7206.
- 29. Gehan E. Clinical trials in cancer research. Environ Health Perspect 1979;32:31–48.
- 31. Zhou H, Lee JJ, Yuan Y. BOP2: Bayesian optimal design for phase II clinical trials with simple and complex endpoints. Stat Med 2017;36(21):3302–3314.
- 32. Lin R, Coleman RL, Yuan Y. TOP: Time-to-Event Bayesian Optimal Phase II Trial Design for Cancer Immunotherapy. J Natl Cancer Inst 2020;112(1):38–45.
- 33. Mauri L, D’Agostino RB. Challenges in the Design and Interpretation of Noninferiority Trials. N Engl J Med 2017;377:1357–1367.
- 34. Non-Inferiority Clinical Trials to Establish Effectiveness, Guidance for Industry. U.S. Department of Health and Human Services. FDA-2010-D-0075, November 2016.
- 35. Adaptive Design Clinical Trials for Drugs and Biologics Guidance for Industry. U.S. Department of Health and Human Services. FDA-2018-D-3124, December 2019.
- 36. Burd A, Schilsky RL, Byrd JC, et al. Challenges and approaches to implementing master/basket trials in oncology. Blood Adv 2019;3(14):2237–2243.
- 37. Tefferi A, Cervantes F, Mesa R, et al. Revised response criteria for myelofibrosis: International Working Group-Myeloproliferative Neoplasms Research and Treatment (IWG-MRT) and European LeukemiaNet (ELN) consensus report. Blood 2013;122(8):1395–1398.
- 38. Vaidya R, Siragusa S, Huang J, et al. Mature survival data for 176 patients younger than 60 years with primary myelofibrosis diagnosed between 1976 and 2005: evidence for survival gains in recent years. Mayo Clin Proc 2009;84(12):1114–1119.
- 39. Verstovsek S, Gotlib J, Mesa RA, et al. Long-term survival in patients treated with ruxolitinib for myelofibrosis: COMFORT-I and -II pooled analyses. J Hematol Oncol 2017;10(1):156.
- 40. Mesa RA, Kiladjian JJ, Catalano JV, et al. SIMPLIFY-1: A Phase III Randomized Trial of Momelotinib Versus Ruxolitinib in Janus Kinase Inhibitor-Naïve Patients With Myelofibrosis. J Clin Oncol 2017;35(34):3844–3850.
- 41. Emanuel RM, Dueck AC, Geyer HL, et al. Myeloproliferative Neoplasm (MPN) Symptom Assessment Form total symptom score: prospective international assessment of an abbreviated symptom burden scoring system among patients with MPNs. J Clin Oncol 2012;30:4098–4103.
- 42. Gwaltney C, Paty J, Kwitkowski VE, et al. Development of a harmonized patient-reported outcome questionnaire to assess myelofibrosis symptoms in clinical trials. Leuk Res 2017;59:26–31.
- 43. Snyder C, Smith K, Holzner B, et al. Making a picture worth a thousand numbers: recommendations for graphically displaying patient-reported outcomes data. Qual Life Res 2019;28(2):345–356.
- 44. Kluetz PG, Slagle A, Papadopoulos EJ, Johnson LL, Donoghue M, Kwitkowski VE, Chen WH, Sridhara R, Farrell AT, Keegan P, Kim G, Pazdur R. Focusing on Core Patient-Reported Outcomes in Cancer Clinical Trials: Symptomatic Adverse Events, Physical Function, and Disease-Related Symptoms. Clin Cancer Res 2016;22(7):1553–1558.