Author manuscript; available in PMC: 2012 Oct 3.
Published in final edited form as: Stroke. 2011 Jun 30;42(8):2356–2362. doi: 10.1161/STROKEAHA.111.619122

Optimal Endpoints for Acute Stroke Therapy Trials: Best Ways to Measure Treatment Effects of Drugs and Devices

Jeffrey L Saver 1
PMCID: PMC3463240  NIHMSID: NIHMS310500  PMID: 21719772

Abstract

Background

Over the past decade, analysis of completed actual trials, model population studies, and theoretical work have improved approaches to selecting and analyzing endpoints in acute stroke treatment trials.

Methods

Narrative review.

Results

As stroke affects persons in their biologic, functional, social, and experiential dimensions, measures of impairment, disability, handicap, and quality of life are all desirable in pivotal trials, with disability most important. Scales that are valid, reliable, responsive, and easy to administer are preferred; consequently, the modified Rankin Scale has become the most widely employed single clinical efficacy measure. Since stroke cripples as well as kills, most outcome scales array patient outcome in ordered ranks, spread over the entire range from normal to disabled to dead. Generally, shift analysis, analyzing all health state transitions concurrently, is the most efficient analytic technique to detect treatment effects, with sliding dichotomy less and fixed dichotomy least efficient, unless treatment effects strongly cluster at one or a few health state transitions that can be prespecified. Test statistics must also take into account interpretability – how well they can be converted into metrics capturing all outcomes the intervention might alter, in proportion to the degree they are valued by the patient; full ordinal analysis is most, sliding dichotomy intermediate, and fixed dichotomy least informative regarding this global outcome.

Conclusion

Stroke trial power and interpretation can be substantially enhanced by adherence to the principles delineated in this review. Full ordinal and sliding dichotomy analysis will most often be advantageous compared with fixed dichotomous approaches.

Keywords: Acute Cerebral Hemorrhage, Acute Cerebral Infarction, Acute Stroke Syndromes, Emergency treatment of Stroke, Neuroprotectors, Thrombolysis


Acute stroke trialists made great strides in the first decade of the 21st century. The more than 125 acute stroke trials performed provided definitive support for three treatment advances (IV TPA in the 3-4.5 hour window, hemicraniectomy for malignant infarction, and coiling for aneurysmal subarachnoid hemorrhage).1 Of equal import, these trials and accompanying theoretical work refined methods for optimizing the design of acute stroke trials, laying a foundation for even more rapid progress in the coming decade. This narrative review will briefly survey important lessons that have been learned regarding the best ways to select study endpoints in acute stroke trials and to analyze them statistically for evidence of benefit, drawing whenever possible upon approaches recommended by consensus groups.2-4

Selecting Endpoints to Measure

In all clinical trials subjecting human persons to experimental intervention, safety endpoints are key measures. Universal safety endpoints across all trials include 1) all cause mortality and 2) serious adverse events. Additional safety endpoints in acute stroke trials should specifically interrogate adverse events expected based on mechanisms of drug or device action. Examples include hemorrhagic transformation for reperfusion treatments in acute cerebral ischemia, thromboembolic events for prothrombotic agents in intracerebral and subarachnoid hemorrhage, and femoral artery access complications in catheter device trials.

Trial phase is a key determinant of which efficacy endpoints should be selected as leading outcomes in acute stroke trials. In drug trials, very early phase studies will focus on pharmacokinetics. Midphase drug trials and early stage device trials seek to rapidly explore and optimize drug dosing or device design and use, to select the most promising approach to move to pivotal phase testing. If available, the best primary endpoints for midphase trials are biomarkers that directly reflect treatment effect, which typically have fewer confounding factors and consequently less noise than clinical endpoints and are therefore more informative for rapid treatment refinement. Biomarkers useful as primary endpoints in midphase trials include: for pharmacologic reperfusion – TCD, CT, and/or MR evidence of early reperfusion; for device reperfusion – angiographic reperfusion scales such as the Thrombolysis in Cerebral Infarction (TICI) and Arterial Occlusive Lesion (AOL) scales; for neuroprotection – salvage of penumbral tissue identified on multimodal CT or MR imaging; for intracerebral hemorrhage medical and surgical treatments – reduction in hematoma growth or hematoma volume on CT or MRI. Some, such as reperfusion scales, are well validated by multiple studies,5 while others, such as reduction in hematoma growth, have not yet been confirmed as valid predictors of clinical response.6 If no biomarker is available that is clearly tied to treatment mechanism and has less variability than clinical endpoints, then clinical outcome measures should be used in midphase trials, deploying analytic techniques that maximize detection of signals of potential efficacy, rather than clinical interpretability.

Pivotal, registration trials must determine whether the intervention alters patient final clinical outcome for good or ill. Candidate metrics to assess clinical outcome in acute stroke are legion – more than 45 different outcome measures have been used in recent trials.7 As a neurologic disease, stroke alters the cardinal domains of human behavior, including language, spatial, executive, affective, motor, and visual function. For acute rehabilitation trials focused on domain-specific interventions, outcome measures confined to one or a few of these domains are appropriate primary metrics. But for the most common acute interventions focused on improving outcomes across all domains, more comprehensive measures are needed.

The World Health Organization (WHO) provided a useful framework for conceptualizing outcome domains for clinical trials, dividing health dimensions into impairments, disabilities, and handicaps.8 (A more recent WHO framework is more complex and adapted to social policy and population health planning, but not as useful for RCTs in which individual patient outcomes are the key concern.) An impairment is a loss or abnormality of anatomic, physiologic, or psychologic function. Disability is a restriction, resulting from an impairment, in the ability to perform an activity in a normal manner. Handicap is a disadvantage for an individual resulting from an impairment or disability, which limits the fulfillment of a sociocultural role. The objective WHO framework is usefully supplemented by patient-reported outcomes. Like all diseases, stroke affects persons in their biologic, functional, social, and experiential dimensions; consequently, to capture all important qualitative aspects of outcome, pivotal clinical trials should consider deploying measures of 1) impairment, 2) disability, 3) handicap, and 4) quality of life.

Among these dimensions of health, the most important in acute stroke RCTs is disability. Ability to perform activities related to self-care, work, and enjoyment is of unquestionable importance to patients, health providers, and society. In contrast, impairments that do not compromise patient functional capacity are of minor significance; handicaps are greatly affected by cultural and social factors beyond the scope of medical therapies to alter; and patient-reported measures of quality of life are confounded by fundamental epistemological issues. For all diseases, the human capacity for psychological adaptation alters patient-reported outcomes over time (adjustment to disease bias). In stroke, between 28% and 78% of individuals at 6 months poststroke demonstrate response shift unrelated to the impact of their stroke on their function.9 A distinctive challenge for neurologic diseases is that they directly alter the brain-mind that generates patient-reported outcomes. Aphasia, anosognosia, and hemisphere emotional valence bias may render patient reports unavailable or unreliable.

Key desirable properties of an outcome scale include validity (agreement between the value of a measurement and the true value), reliability (reproducibility of a measurement), and responsiveness (sensitivity to change). Disability measures include global judgment scales, such as the modified Rankin Scale (mRS) and the Glasgow Outcome Scale (GOS), and activities of daily living scales, such as the Barthel Index (BI) and the Functional Independence Measure (FIM). Among the global scales, the mRS is preferred over the GOS because of its greater sensitivity to change (more levels) and the availability of structured assessments and certification programs that improve its reliability.10,11 These desirable properties have made the modified Rankin Scale the most commonly employed outcome measure in acute stroke trials.7 Among the activities of daily living scales, the Barthel Index is generally not suitable as a sole primary endpoint because of pronounced floor and ceiling effects (Figure 1). The FIM has greater sensitivity to change than global scales and fewer issues with ceiling effects than the BI, but is burdensome to perform. A recent innovation in disability outcome scales is the item bank ordered by item response theory. Item banks enable scoring of a patient's disability on a continuous linear scale with a modest number of queries, potentially increasing sensitivity to change while minimizing measurement burden.12 Item banks may play more important roles in future stroke trials.

Figure 1.

Final 90 day outcome scores in the two NINDS-TPA trials. The NIHSS and Barthel Index both show a markedly skewed, U-shaped distribution, unfavorable for analytic power and clinical interpretation. The Barthel also shows strong ceiling effect. In contrast, the modified Rankin Scale distributes substantial groups of patients among all hierarchical ranks, permitting more robust analysis and interpretation.

An additional challenge for outcome scales in stroke trials is that some patients and practitioners consider severely disabled states (e.g. persistent vegetative state) as worse, not better, than death, challenging the common assumption in construction and interpretation of outcome scales that death is the worst possible health state. The modified Rankin Scale is therefore often best analyzed by collapsing the levels of 5 (severe disability) and 6 (death) into a single worst outcome category.13 The remaining levels of the mRS are all appropriately monotonically ordered and each is a clinically worthwhile distance from its neighbors on a continuous measure of disability weight, though these distances are not uniform.14
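The collapsing step described above can be sketched in a few lines. This is a minimal illustration, assuming the conventional 0 to 6 coding of the mRS; the function name and the sample scores are hypothetical and are not drawn from any trial.

```python
def collapse_worst_outcomes(mrs_score: int) -> int:
    """Map a 0-6 mRS score onto a 0-5 scale in which 5 and 6 share one worst rank."""
    if not 0 <= mrs_score <= 6:
        raise ValueError("mRS scores range from 0 to 6")
    # Levels 0-4 are unchanged; severe disability (5) and death (6) both become 5.
    return min(mrs_score, 5)


# Hypothetical final scores for six patients.
scores = [0, 2, 5, 6, 3, 6]
collapsed = [collapse_worst_outcomes(s) for s in scores]
print(collapsed)
```

After collapsing, rank-based analyses treat severe disability and death as tied, so the analysis no longer assumes that death is strictly worse than mRS 5.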

In addition to these cardinal functional measures of outcome, economic measures may be useful adjunctive endpoints in pivotal trials. The cost of each quality- or disability-adjusted life year gained provides important data for health policy decisions.

Statistical Methods for Analyzing the Primary Endpoint

Since stroke is a condition that cripples as well as kills, final outcome health states in acute stroke trials are arrayed over a spectrum of disability/impairment/handicap. Consequently, in acute trials, the primary outcomes are intrinsically non-binary, and most commonly exist as ordinal scales which array patients among ordered ranks of ascending/descending desirability.

Accordingly, the first decision to be made in the statistical analysis of an acute stroke trial is how to handle the ordinal (multirank) nature of the primary outcome measure – whether to choose a test statistic that reflects all the health state transitions captured in the primary outcome measure, some of the transitions, or only one of the transitions. Analyzing ordinal scales concurrently for benefit at multiple health state transitions has been called shift analysis or analysis over ranks. Multiple test statistics appropriate for shift analysis are available, including the Wilcoxon Rank Sum, the Cochran Mantel Haenszel, and ordered logistic regression. Some require data distributions to behave in tightly ordered ways and others are less restrictive. Analyzing ordinal scales at just one state transition requires dichotomizing the scale at a single score threshold, converting it to a binary good-bad outcome measure, and discarding the remaining outcome information. Intermediate approaches are to use a sliding dichotomy (responder analysis) or reduce the number of levels in a scale, e.g. from 7 to 3 (trichotomizing) or to 4 (tetrachotomizing). Each of these approaches has been employed in major phase 3 stroke trials: polychotomous (shift) analysis (e.g. SAINT, ENOS, FAST-MAG), oligochotomous analysis (e.g. GAIN), responder analysis (e.g. AbESTT 2, PAIS, STICH), and dichotomous (e.g. IST, PROACT 2, ECASS 3).
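To make concrete what dichotomizing or trichotomizing discards, the sketch below collapses two hypothetical 100-patient mRS distributions. The counts, the 0-2 dichotomy, and the 0-1 / 2-3 / 4-6 trichotomy are illustrative assumptions, not data or cut points from any of the trials named above.

```python
def collapse_counts(counts, boundaries):
    """Collapse patient counts indexed by mRS 0..6 into fewer ordered categories.

    boundaries lists the last mRS level of every collapsed category except the
    final one, e.g. [2] gives 0-2 vs 3-6 and [1, 3] gives 0-1, 2-3, 4-6.
    """
    edges = list(boundaries) + [len(counts) - 1]
    collapsed, start = [], 0
    for end in edges:
        collapsed.append(sum(counts[start:end + 1]))
        start = end + 1
    return collapsed


# Hypothetical arms of 100 patients each, counts for mRS 0, 1, ..., 6.
control = [10, 10, 20, 20, 20, 10, 10]
treated = [20, 15, 5, 25, 20, 10, 5]

# Fixed dichotomy at mRS 0-2: both arms look identical.
print(collapse_counts(control, [2]), collapse_counts(treated, [2]))
# Trichotomy 0-1 / 2-3 / 4-6: part of the difference reappears.
print(collapse_counts(control, [1, 3]), collapse_counts(treated, [1, 3]))
```

In this constructed example the two arms differ at several ranks, yet the 0-2 dichotomy reports 40 good outcomes in each arm; the trichotomy recovers some, but not all, of the difference visible in the full seven-level scale.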

The two key considerations in selecting statistical techniques by which to analyze the primary endpoint in any controlled trial are efficiency and interpretability. Efficiency refers to a test's power to detect a difference in treatments when such a difference truly exists. More efficient tests enable trials to detect genuine treatment differences employing smaller sample sizes. Interpretability refers to whether the test is assessing a difference that is clinically intuitable and clinically important.

Statistical Efficiency

Acute stroke trials are particularly expensive and challenging to conduct. The disease strikes patients unexpectedly, deprives individuals of the ability to consent for themselves to research participation, and is most effectively treated within the first minutes or few hours of onset. Centers capable of recruiting large numbers of patients in early time windows are few, and the number of patients enrolled in multicenter acute trials worldwide is currently less than 5000 each year.1 The success rate of acute stroke trials is dismal – less than 2% of drugs entering human testing have achieved regulatory approval.1,15 For these reasons, it is critical to avoid the use of inefficient statistical tests that render trials underpowered to detect moderate, but clinically worthwhile, treatment effects.

The most efficient statistical test metric for an acute stroke trial varies depending on the expected shape of the treatment effect in the population being studied. When the treatment will improve outcomes across several health state transitions, test metrics that sample all ranks will detect the efficacy signal present at each of the transitions, while dichotomized analyses will detect the efficacy signal present at only one rank. Since dichotomized analysis will miss much of the efficacy signal, shift analysis will be more powerful. Discarding outcome information to reduce a continuous to a binary outcome typically reduces a study's power by at least one-third, often more.16,17 In contrast, when the benefit of a treatment clusters at only a single health state transition, test metrics that sample all ranks will squander some power searching for efficacy signals at health state transitions in which they are absent. In this setting, dichotomized analysis prespecified to focus on the health state transition at which the benefit clusters will be more powerful than shift analysis, but dichotomized analysis prespecified to focus on a health state transition at which the benefit does not cluster will be less powerful than shift analysis.18
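The contrast between distributed and clustered treatment effects can be illustrated numerically. The sketch below is an assumption-laden illustration, not an analysis of trial data: the three mRS distributions are hypothetical, the shift test is the Wilcoxon rank-sum test in its normal approximation with tie correction, the dichotomized test is a pooled two-proportion z test, and the size of the z statistic at a fixed sample size is used only as a rough proxy for power.

```python
import math


def rank_sum_z(treated, control):
    """z statistic for the Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Inputs are patient counts indexed by mRS 0..6; positive z favors treatment.
    """
    n1, n2 = sum(treated), sum(control)
    n = n1 + n2
    u = 0.0
    treated_better = 0  # treated patients with a strictly lower mRS than the current level
    for level in range(len(control)):
        # Each control patient at this level is beaten by treated_better patients
        # and tied with treated[level] patients (ties count one half).
        u += control[level] * (treated_better + 0.5 * treated[level])
        treated_better += treated[level]
    tie_term = sum((t + c) ** 3 - (t + c) for t, c in zip(treated, control))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u - n1 * n2 / 2.0) / math.sqrt(variance)


def dichotomy_z(treated, control, good_max):
    """Pooled two-proportion z statistic for good outcome defined as mRS <= good_max."""
    n1, n2 = sum(treated), sum(control)
    g1, g2 = sum(treated[:good_max + 1]), sum(control[:good_max + 1])
    pooled = (g1 + g2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (g1 / n1 - g2 / n2) / se


# Hypothetical arms of 100 patients, counts for mRS 0..6.
control = [10, 15, 15, 20, 20, 10, 10]
distributed = [15, 15, 15, 20, 20, 10, 5]  # 5 patients improve at every transition
clustered = [10, 25, 5, 20, 20, 10, 10]    # 10 patients improve from mRS 2 to mRS 1 only

print(rank_sum_z(distributed, control), dichotomy_z(distributed, control, 2))
print(rank_sum_z(clustered, control), dichotomy_z(clustered, control, 1),
      dichotomy_z(clustered, control, 2))
```

With these made-up numbers, the distributed effect yields a rank-sum z of about 1.11 against about 0.72 for the 0-2 dichotomy. For the clustered effect the ordering reverses when the dichotomy is placed at the right transition (about 1.54 for mRS 0-1 against about 0.37 for the rank-sum test), while a dichotomy placed at the wrong transition (mRS 0-2) gives a z of zero.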

Analysis of model and actual acute stroke clinical trials has clarified when to select between analysis over ranks and dichotomized analysis for a particular trial. Most commonly, beneficial treatments improve outcomes to at least a modest degree at multiple health state transitions simultaneously, and shift analysis is a more powerful technique than dichotomized analysis.4,18,19 The Optimizing Acute Stroke Trials Collaboration analyzed 47 trials testing treatments with likely biologic benefit or harm and found that shift analysis was positive in 26% while dichotomized analysis was positive in only 9%.19 However, in certain settings treatment effects do cluster. Three key variables determine whether and where in the outcome spectrum clustering will occur: onset to treatment time, deficit severity at time of treatment, and type of treatment (Figure 2). In acute ischemic stroke, early after onset, the ischemic field is all or preponderantly salvageable penumbra, with little irreversibly infarcted core tissue yet established, and excellent outcomes are possible. Late after onset, much of the ischemic field is already infarcted, placing a ceiling on the degree of attainable recovery. Patients with mild stroke deficits at treatment start are more likely to attain excellent final outcomes with an effective intervention; patients with severe deficits at treatment start less so. Therapies capable of rescuing all threatened brain tissue can yield excellent outcome clusters; therapies capable of salvaging only fractions of brain tissue will likely provide benefits at multiple health state transitions. As a consequence of these factors, powerful brain-saving therapies applied hyper-early to moderately-severely affected patients, such as recanalization treatments in the first 3 hours of onset, tend to produce benefits clustering at the excellent functional outcome extreme of scales. Therapies applied late to severely affected patients, such as hemicraniectomy for malignant middle cerebral artery infarction, tend to produce benefits clustered at the survival/fair functional outcome extreme of scales.

Figure 2.

Clustering of treatment effect at different health state transitions of the modified Rankin Scale, depending on treatment timing, baseline prognosis, and type of acute stroke intervention. Rows show landmark analyses of three acute ischemic stroke treatments. Final three columns show p values indicating presence or absence of treatment effects at excellent (mRS 0-1), good (mRS 0-2), and fair (mRS 0-4) dichotomizations of the modified Rankin Scale. Cells with p values <0.05 are colored green, 0.06-0.20 yellow, and >0.20 red. All treatments studied, IV recanalization, IA recanalization, and hemicraniectomy, exert powerful biological effects, so that clustered rather than distributed treatment effects may be expected. In the 2 NINDS trials testing a hyperacute treatment in moderately severe patients, dichotomization at excellent outcome is most efficient. In the PROACT 2 trial testing an early, but not hyperacute, treatment in more severe patients, dichotomization at good outcomes is most efficient. In the hemicraniectomy trials testing a late treatment in extremely severe patients, dichotomization at fair outcomes is most efficient.

In exceptional circumstances, enough information will be known in advance of the trial regarding the shape of the expected treatment effect to specifically guide analytic choice. When a treatment is expected to alter outcomes modestly at a number of health state transitions, as is common for neuroprotective therapies, shift analysis is preferred. When the treatment effect can confidently be expected to strongly cluster at a single health state transition and the site of that transition can confidently be prespecified, dichotomization is preferred. However, most commonly, trialists have insufficient data from early and mid-stage trials to predict the shape of the treatment response that will occur in a pivotal trial, and should then use shift analysis since it usually is more powerful.

Between the extremes of analyzing all clinically important health state transitions in an ordinal scale (polychotomous analysis) and only one (fixed dichotomous analysis) are intermediate approaches. These include 1) analyzing the scale with a sliding dichotomy (responder analysis), and 2) collapsing the scale to fewer divisions than in the original, but more than one, e.g. trichotomizing, and performing shift analysis over these fewer ranks (oligochotomous analysis). In sliding dichotomy, a dichotomous good outcome threshold is set at different breakpoints in the scale for different subgroups of patients enrolled in a trial, based on their baseline prognostic features and the expected treatment effect. These intermediate approaches detect signal and expend power at fewer transitions than full shift analyses, but at more transitions (oligochotomous) or more informative transitions (sliding dichotomy) than fixed dichotomous analysis. Accordingly, these approaches have less power than full ordinal analysis but more power than dichotomized analysis to detect treatment effects that exert benefits at multiple health state transitions.20 Conversely, when treatments exert benefits that strongly cluster at different single health state transitions in subgroups of patients, or at only two or a few health state transitions in all patients, and these can be prespecified with high confidence before trial performance, sliding dichotomous and oligochotomous analyses will have more power than either full ordinal or dichotomized analysis. Available evidence suggests that most acute stroke treatments exert their benefits at multiple health state transitions, not just two or three clusters, so that full ordinal analysis will usually be more powerful than sliding dichotomy or oligochotomous analysis, and these, in turn, will usually be more powerful than fixed dichotomous analysis.
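A sliding dichotomy can be sketched as a prognosis-dependent threshold on the final mRS. The severity bands and thresholds below are hypothetical values chosen only to show the mechanics; an actual trial would prespecify its own bands from baseline prognostic data.

```python
def sliding_threshold(baseline_nihss: int) -> int:
    """Highest final mRS counted as a good outcome, given baseline severity.

    The bands are illustrative assumptions: milder strokes must reach a better
    final state to count as responders.
    """
    if baseline_nihss <= 7:
        return 0
    if baseline_nihss <= 14:
        return 1
    return 2


def is_responder(baseline_nihss: int, final_mrs: int) -> bool:
    """Apply the severity-specific good outcome threshold to one patient."""
    return final_mrs <= sliding_threshold(baseline_nihss)


# Hypothetical (baseline NIHSS, final mRS) pairs.
patients = [(5, 0), (5, 1), (12, 1), (20, 2), (20, 3)]
responders = [is_responder(nihss, mrs) for nihss, mrs in patients]
print(responders)
```

The same final mRS of 1 is a failure for the mild patient and a success for the moderate patient, which is how the method places each subgroup's threshold at the transition where a treatment effect is most plausible for that subgroup.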

Interpretability

It is a fundamental tenet of person-centered, evidence-based medicine that treatment decisions should be based upon all outcomes that the intervention might alter, in proportion to the degree they are valued by the patient. Each of the analytic approaches to outcome scales in acute stroke has obstacles to being converted into values that index this global outcome perspective. Full ordinal analyses are best able to be converted into summary metrics, sliding dichotomous and oligochotomous metrics less so, and fixed dichotomous analyses are intrinsically unable to yield the needed information.

Let us first consider why fixed dichotomized analysis has the greatest difficulty in interpretability. From a binary analysis, the benefit or harm at the one analyzed health state transition can be calculated readily. But the computational ease of this derivation masks the fundamental flaw that it reflects change at only one of the several health state transitions at which it is important to assess treatment effect. The metric needed to guide therapy, the sum of benefits/harms across all important health state transitions, cannot be calculated, or even estimated in any way, from a binary analysis. The weakness of dichotomized analysis in this regard can be seen from considering the analogous situation for letter grades in school classrooms. The school letter grades, A, B, C, D, and F, constitute an ordinal outcome scale with which all Americans are familiar from secondary schooling. When a new teaching technique is introduced, teachers and students are interested in knowing how the pedagogical intervention affects student performance across all levels of the grading scale, not just one transition. Focusing on only the transition from D to C, for example, provides a radically insufficient guide to decision-making. If the intervention provides an even shift for students at every grade transition (B to A, C to B, D to C, etc), analyzing just a single transition would substantially underestimate the number of students who benefit, and so overestimate the number needed to treat for one student to benefit.

Similarly, number needed to treat estimates based on dichotomized analysis typically substantially underestimate acute stroke treatment benefits (Table 1). This underestimate has had deleterious consequences for patients. When tissue plasminogen activator was first approved, many physicians failed to appreciate that number needed to treat estimates based on dichotomized analyses captured only one-third of the therapy's benefit, and frequently misinformed patients and families about the degree of benefit to be expected.21

Table 1.

| Treatment | Trial(s) | Benefit per Hundred per Shift Analysis | Benefit per Hundred per Dichotomized Analysis (0-2 v 3-6) | Proportion of Benefit Missed by Dichotomized Analysis |
|---|---|---|---|---|
| IV TPA < 3h | NINDS 1+2 | 29 | 12 | 59% |
| IV TPA 3-4.5h | ECASS 3 | 14 | 5 | 36% |
| IA Pro-UK < 6h | PROACT 2 | 17 | 15 | 14% |
| Coiling in SAH | ISAT | 17 | 7 | 59% |
| Hemicraniectomy | Pooled analysis | 46 | 5 | 88% |

Benefit per Hundred = number of patients who benefit per hundred patients treated. IV = intravenous. IA = intra-arterial. TPA = tissue plasminogen activator. Pro-UK = Pro-urokinase. SAH = subarachnoid hemorrhage.
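The benefit per hundred values in Table 1 translate directly into numbers needed to treat, since the number needed to treat for one patient to benefit is 100 divided by the benefit per hundred. The short sketch below applies that conversion to the two benefit columns of Table 1; the dictionary layout and function name are illustrative choices.

```python
# (benefit per hundred by shift analysis, benefit per hundred by 0-2 v 3-6 dichotomy),
# copied from Table 1.
table_1 = {
    "IV TPA < 3h (NINDS 1+2)": (29, 12),
    "IV TPA 3-4.5h (ECASS 3)": (14, 5),
    "IA Pro-UK < 6h (PROACT 2)": (17, 15),
    "Coiling in SAH (ISAT)": (17, 7),
    "Hemicraniectomy (pooled analysis)": (46, 5),
}


def number_needed_to_treat(benefit_per_hundred: float) -> float:
    """Patients who must be treated for one additional patient to benefit."""
    return 100.0 / benefit_per_hundred


for treatment, (shift_bph, dichotomized_bph) in table_1.items():
    nnt_shift = number_needed_to_treat(shift_bph)
    nnt_dichotomized = number_needed_to_treat(dichotomized_bph)
    print(f"{treatment}: NNT {nnt_shift:.1f} (shift) vs {nnt_dichotomized:.1f} (dichotomized)")
```

For IV TPA under 3 hours, for example, the shift-based value is roughly 3.4 patients treated per patient benefiting, against roughly 8.3 when only the 0-2 v 3-6 transition is counted.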

In contrast, full ordinal scale test statistics do provide a basis for robust estimation, albeit not direct calculation, of the total clinical benefit or harm of a therapy. Since acute stroke trials are parallel group trials in which each trial arm experiences only one treatment, not crossover design trials, they are unable to directly measure the within patient variance, precluding determination of how much of the total group benefit seen occurred through many patients benefiting a little versus a few patients benefiting a lot. However, multiple techniques are available to estimate number needed to treat values from full ordinal analyses, including joint outcome table specification, matched pair analysis, derivation of a proportional odds ratio, and conversion of scale ranks into health adjusted life years gained by use of disability weights or quality weights.14,22
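The "many patients benefiting a little versus a few patients benefiting a lot" ambiguity can be made explicit with a simple bounding argument. The sketch below is not one of the estimation techniques cited above; it is an illustration that assumes equal arm sizes and that no patient is made worse by treatment. Under those assumptions, the number of patients who improve is at least the largest excess of treated over control patients at any cut point of the scale, and at most the total number of level improvements summed over all cut points. The distributions are hypothetical.

```python
def benefit_bounds(treated, control):
    """Lower and upper bounds on how many treated patients improved.

    Inputs are patient counts indexed by mRS 0..6 for arms of equal size.
    Assumes a one-directional effect: no patient does worse on treatment.
    """
    if sum(treated) != sum(control):
        raise ValueError("this sketch assumes equal arm sizes")
    excess = []  # extra treated patients at or below each cut point
    cum_treated = cum_control = 0
    for t, c in zip(treated[:-1], control[:-1]):
        cum_treated += t
        cum_control += c
        excess.append(cum_treated - cum_control)
    if min(excess) < 0:
        raise ValueError("distributions are not compatible with a one-directional effect")
    # Fewest patients: a few patients each cross many cut points.
    # Most patients: every level of improvement belongs to a different patient.
    return max(excess), min(sum(excess), sum(treated))


# Hypothetical arms of 100 patients, counts for mRS 0..6.
control = [10, 15, 15, 20, 20, 10, 10]
treated = [15, 15, 15, 20, 20, 10, 5]
bounds = benefit_bounds(treated, control)
print(bounds)
```

For these made-up arms the same pair of outcome distributions is compatible with anywhere from 5 to 30 patients per hundred benefiting, which is why an explicit model of the joint outcome distribution is needed to settle on a single number needed to treat.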

The sliding dichotomy approach to ordinal scales again has an intermediate result. If correctly calibrated, the sliding dichotomy will capture more of the benefits and harms of a treatment than a fixed dichotomized analysis, but will still substantially underestimate the total benefit or harm of an intervention compared with full ordinal analysis. For example, for TPA under 3 hours, a standard sliding dichotomy analysis captures only 39% of the actual benefit of TPA observed in full ordinal analysis.23

A drawback of all the standard methods for endpoint analysis is that they each assume a treatment exerts an effect in only one direction across all health state transitions, either explicitly (shift analysis) or implicitly (fixed and sliding dichotomy analysis). When this assumption is invalid, none of these analytic approaches will provide a fully informative delineation of treatment impact. A competing win-lose dichotomy analysis can be useful when benefit tends to cluster at one health state transition and harm at another. For example, in later time windows beyond 4.5 hours, intravenous thrombolysis may improve the rate of excellent outcomes but also increase severe disability and death. A dichotomized analysis at an excellent outcome transition (e.g. mRS 1 to 0 or mRS 2 to 1) can capture the benefit and a separate dichotomized analysis at a poor outcome transition (e.g. mRS 4 to 5 or mortality) can capture the harm. Presenting both competing effects simultaneously to the patient and provider can support an informed decision based on patient and clinician risk-taking preferences.24 It is important that the competing outcomes be independent. An incorrect, but unfortunately common, practice in presenting the effects of thrombolytics is to state benefit using a comprehensive final functional outcome scale, but state harm in terms of symptomatic hemorrhage. Since the effects of hemorrhage are already captured and summarized (together with the effects of reperfusion) in the functional outcome, it can be misleading to report this as a separate outcome.25
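A competing win-lose presentation can be sketched by computing two separate dichotomized contrasts from the same outcome distributions. The counts below are hypothetical and chosen to mimic a treatment that raises both excellent and very poor outcomes; the definitions of excellent (mRS 0-1) and poor (mRS 5-6) outcome are assumptions for the example.

```python
def per_hundred(counts, levels):
    """Patients per hundred whose final mRS falls in the given levels."""
    return 100.0 * sum(counts[level] for level in levels) / sum(counts)


# Hypothetical arms of 100 patients, counts for mRS 0..6.
control = [10, 15, 15, 20, 20, 10, 10]
treated = [18, 17, 10, 15, 15, 12, 13]

excellent = range(0, 2)  # mRS 0-1
poor = range(5, 7)       # mRS 5-6

benefit_per_hundred = per_hundred(treated, excellent) - per_hundred(control, excellent)
harm_per_hundred = per_hundred(treated, poor) - per_hundred(control, poor)
print(benefit_per_hundred, harm_per_hundred)
```

Here treatment adds 10 excellent outcomes and 5 severely disabled or fatal outcomes per hundred patients treated. Because both figures come from the same final functional scale and describe different ends of it, they can be presented side by side without counting any event twice.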

When treatment effects are unidirectional, considerations of both efficiency and interpretability favor full ordinal analysis; accordingly, analysis over ranks should generally be the preferred analytic approach in acute stroke trials (Table 2). Sliding dichotomy is less preferred and fixed dichotomy least preferred – both should be reserved for exceptional circumstances in which the treatment effect is confidently expected to be clustered at a single or a few health state transitions. At all times for decision-making in practice at the bedside, when outcome distributions are compatible with unidirectional treatment effects, number needed to treat values based on full ordinal analysis are preferable over dichotomized and oligochotomized approaches.

Table 2. Strengths and Weaknesses of Analytic Strategies for Ordinal Scale Outcomes.

Fixed Dichotomy Sliding Dichotomy Full Ordinal Analysis
Power + ++ +++
Appropriate for Broad Trial Population - ++ +++
Consistent Effect Assumption - - -
Calibration independent - - - +++
Ease of Calculation of Partial NNT +++ +++ ++
Ease of Calculation of Total NNT - - - ++

Accounting for Baseline Heterogeneity and Improving Endpoint Measurement Precision

Additional important statistical design steps to consider in endpoint analysis in acute stroke trials are: 1) accounting for baseline patient heterogeneity, and 2) improving precision of endpoint assessment.

Several patient characteristics exert strong prognostic effects on patient outcome after acute stroke. In acute cerebral ischemia, patient age and initial stroke deficit severity are the two most important clinical prognostic factors, and ischemic lesion volume and presence and site of large artery occlusion are the two most important readily available imaging prognostic factors, but many others play a role.26-29 The influence of the leading prognostic factors upon outcome typically exceeds the influence of the treatment effect acute stroke trials seek to detect. Analyses that fail to adjust for baseline patient heterogeneity have several vulnerabilities, including 1) reduced power to detect treatment effects (typically by 10-30%),30,31 2) underestimation of the magnitude of the true treatment effect when using nonlinear effect measures such as odds ratios (due to non-collapsibility of within-strata effects),32 and 3) false positive/false negative results if confounding prognostic variables are imbalanced across treatment arms.26 Consequently, acute stroke is a condition for which statistical adjustment for baseline differences in prognostic variables should almost always be performed in the primary trial analysis. Unadjusted analyses are desirable as secondary analyses, to probe the robustness of the signal detected, but are less reliable than adjusted analyses.
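Non-collapsibility of the odds ratio can be seen in a small worked example. The numbers are hypothetical: two equally sized prognostic strata, perfectly balanced between arms, each with a within-stratum odds ratio of 4 for good outcome.

```python
def odds(probability: float) -> float:
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


def odds_ratio(p_treated: float, p_control: float) -> float:
    """Odds ratio for good outcome, treated versus control."""
    return odds(p_treated) / odds(p_control)


# Hypothetical probability of good outcome as (treated, control) in each stratum.
strata = {
    "mild baseline deficit": (0.8, 0.5),
    "severe baseline deficit": (0.5, 0.2),
}

within_stratum = {name: odds_ratio(pt, pc) for name, (pt, pc) in strata.items()}

# Pool the strata (equal sizes, same mix in both arms), ignoring baseline severity.
pooled_treated = sum(pt for pt, _ in strata.values()) / len(strata)
pooled_control = sum(pc for _, pc in strata.values()) / len(strata)
unadjusted = odds_ratio(pooled_treated, pooled_control)

print(within_stratum, unadjusted)
```

Both strata show an odds ratio of 4, yet the pooled, unadjusted odds ratio is about 3.45, even though the arms are perfectly balanced and no confounding is present. The unadjusted analysis therefore understates the effect experienced within each prognostic group.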

The ordinal scales used in endpoint assessment in acute stroke trials typically have moderate inter-rater reliability. Nonetheless, any one scale administered at one time by one rater is somewhat imprecise, due to residual inter-rater variation, patient variation in function over time, and variation in the intrinsic accuracy of different scales at different score levels. Imprecision in measurement of the primary endpoint introduces noise that reduces study power. Validated techniques to improve inter-rater reliability in assessment of functional outcomes include the use of structured interviews, certified training programs, and central raters.10,11,33 Repeating measures over time can also be useful, allowing several assessments of the target outcome state, rather than just one.34

An additional approach that has proved helpful in selected stroke trials is to measure the target outcome with several similar scales, and statistically combine the measures using a generalized estimating equation. Though theoretically conceived as mapping different dimensions of outcome, measures of neurologic deficit (e.g. the NIHSS), activities of daily living (e.g. the BI), and global disability (e.g. the mRS and the GOS) are all strongly correlated with one another, indicating they can also be conceived of as mapping a single latent trait, which has been termed stroke recovery. When these scales are assessed at the same visit, the precision of measurement of the latent trait of stroke recovery is increased over measurement with just one scale. However, the increase in study power provided by the generalized estimating equation comes with a cost in result interpretability. The latent variable being assessed, e.g. favorable recovery, is itself not fully measured on any individual scale. As a result, regulatory agencies often discourage use of the generalized estimating equation in primary endpoint analysis of a pivotal trial.

The cumulative gains in study power are substantial for the three key statistical strategies reviewed: 1) use all the outcome information in an ordinal scale, 2) adjust for baseline prognostic heterogeneity, and 3) simultaneously incorporate information from multiple recovery scales. Each alone will increase study power compared with unadjusted analysis of a crude dichotomy on a single scale. But these techniques are not mutually exclusive, and can be combined, in pairs or all together. In an analysis of a model treatment effect applied to placebo data from the pooled citicoline trial dataset, when all three techniques were used concurrently, study power increased 3- to 6-fold. When full shift analysis and baseline prognosis adjustment were used, leaving out the generalized estimating equation to heighten result interpretability, study power increased 2- to 5-fold (Torres JV et al, Stroke clinical trials efficiency can be improved. International Society for Clinical Biostatistics 31st Annual Meeting, Montpellier, France, September 2010).
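
The first strategy, using all the information in an ordinal scale, can be illustrated with a sample size approximation for ordinal outcomes under proportional odds, commonly attributed to Whitehead. The sketch below is not the analysis cited above: the control-arm mRS distribution and the common odds ratio of 1.5 are hypothetical, equal allocation is assumed, and the formula holds only if the same odds ratio applies at every cut of the scale.

```python
import math
from statistics import NormalDist


def shift_distribution(control, odds_ratio):
    """Apply a common odds ratio to every cumulative cut of an ordinal
    distribution ordered from best (mRS 0) to worst (mRS 6)."""
    treated, prev_cum, cum = [], 0.0, 0.0
    for p in control[:-1]:
        cum += p
        shifted_odds = cum / (1 - cum) * odds_ratio
        new_cum = shifted_odds / (1 + shifted_odds)
        treated.append(new_cum - prev_cum)
        prev_cum = new_cum
    treated.append(1 - prev_cum)
    return treated


def total_sample_size(control, odds_ratio, alpha=0.05, power=0.80):
    """Approximate total N (1:1 allocation) under proportional odds:
    N = 12 (z_a + z_b)^2 / (log(OR)^2 * (1 - sum(mean_p^3)))."""
    treated = shift_distribution(control, odds_ratio)
    mean = [(c + t) / 2 for c, t in zip(control, treated)]
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    information = 1 - sum(p ** 3 for p in mean)
    return math.ceil(12 * z ** 2 / (math.log(odds_ratio) ** 2 * information))


def dichotomize(dist, cut):
    """Collapse an ordinal distribution at `cut` (e.g. 1 for mRS 0-1 vs 2-6)."""
    good = sum(dist[:cut + 1])
    return [good, 1 - good]


if __name__ == "__main__":
    # Hypothetical control-arm distribution over mRS 0..6.
    control = [0.10, 0.16, 0.14, 0.16, 0.18, 0.08, 0.18]
    print("full ordinal:", total_sample_size(control, 1.5))
    print("mRS 0-1 dichotomy:", total_sample_size(dichotomize(control, 1), 1.5))
```

Because collapsing categories can only increase the sum of cubed mean proportions, this approximation always requires at least as many patients for a dichotomy as for the full scale when the proportional odds assumption holds. When treatment effects cluster at a single health state transition, that assumption fails and the comparison no longer applies.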

Conclusion

In the last decade, several major advances occurred in endpoint analysis of acute stroke trials. Investigators can now knowledgeably select outcome scales that are valid, reliable, and responsive, like the modified Rankin Scale, and analytic techniques that are efficient and interpretable, like full ordinal analysis and sliding dichotomy analysis, to optimize study design and maximize chances of success in finding new treatments for the leading cause of combined death and disability worldwide.

Acknowledgments

Funding: This study was sponsored in part by NIH-NINDS Awards U01 NS 44364 and P50 NS044378 and an American Heart Association Pharmacy Roundtable Health Outcomes Research Center Award.

Footnotes

Disclosures: The University of California Regents receive funding for Dr. Saver's services as a scientific consultant regarding trial design and conduct to BrainsGate, CoAxia, ev3, Talecris, PhotoThera, and Sygnis (all modest). Dr. Saver is an investigator in the NIH FAST-MAG, MR RESCUE, ICES, CUFFS, CLEAR-ER and IMS 3 multicenter clinical trials for which the UC Regents receive payments based on clinical trial performance; has served as an unpaid site investigator in multicenter trials run by Lundbeck and Mitsubishi for which the UC Regents received payments based on the clinical trial contracts for the number of subjects enrolled; is a site investigator in a multicenter registry run by Concentric for which the UC Regents received payments based on the clinical trial contracts for the number of subjects enrolled; is an employee of the University of California, which holds a patent on retriever devices for stroke; and is funded by NIH-NINDS Awards P50 NS044378 and U01 NS 44364.

References

  • 1. Hong KS, Lee SJ, Hao Q, Liebeskind DS, Saver JL. Acute stroke trials in the 1st decade of the 21st century. Stroke. 2011;42:e314.
  • 2. Fisher M, Albers GW, Donnan GA, Furlan AJ, Grotta JC, Kidwell CS, et al. Enhancing the development and approval of acute stroke therapies: Stroke Therapy Academic Industry Roundtable. Stroke. 2005;36:1808–13. doi: 10.1161/01.STR.0000173403.60553.27.
  • 3. Higashida RT, Furlan AJ, Roberts H, Tomsick T, Connors B, Barr J, et al. Trial design and reporting standards for intra-arterial cerebral thrombolysis for acute ischemic stroke. Stroke. 2003;34:e109–37. doi: 10.1161/01.STR.0000082721.62796.09.
  • 4. Optimising Analysis of Stroke Trials Collaboration. Calculation of sample size for stroke trials assessing functional outcome: comparison of binary and ordinal approaches. Int J Stroke. 2008;3:78–84. doi: 10.1111/j.1747-4949.2008.00184.x.
  • 5. Rha JH, Saver JL. The impact of recanalization on ischemic stroke outcome: a meta-analysis. Stroke. 2007;38:967–73. doi: 10.1161/01.STR.0000258112.14918.24.
  • 6. Mayer SA, Brun NC, Begtrup K, Broderick J, Davis S, Diringer MN, et al. Efficacy and safety of recombinant activated factor VII for acute intracerebral hemorrhage. N Engl J Med. 2008;358:2127–37. doi: 10.1056/NEJMoa0707534.
  • 7. Quinn TJ, Dawson J, Walters MR, Lees KR. Functional outcome measures in contemporary stroke trials. Int J Stroke. 2009;4:200–5. doi: 10.1111/j.1747-4949.2009.00271.x.
  • 8. World Health Organization. The International Classification of Impairments, Disabilities and Handicaps. Geneva: 1980.
  • 9. Barclay-Goddard R, Epstein JD, Mayo NE. Response shift: a brief overview and proposed research priorities. Qual Life Res. 2009;18:335–46. doi: 10.1007/s11136-009-9450-x.
  • 10. Saver JL, Filip B, Hamilton S, Yanes A, Craig S, Cho M, et al. Improving the reliability of stroke disability grading in clinical trials and clinical practice: the Rankin Focused Assessment (RFA). Stroke. 2010;41:992–5. doi: 10.1161/STROKEAHA.109.571364.
  • 11. Quinn TJ, Lees KR, Hardemark HG, Dawson J, Walters MR. Initial experience of a digital training resource for modified Rankin scale assessment in clinical trials. Stroke. 2007;38:2257–61. doi: 10.1161/STROKEAHA.106.480723.
  • 12. Weisscher N, Vermeulen M, Roos YB, de Haan RJ. What should be defined as good outcome in stroke trials; a modified Rankin score of 0-1 or 0-2? J Neurol. 2008;255:867–74. doi: 10.1007/s00415-008-0796-8.
  • 13. Samsa G, Matchar D, Goldstein L, Bonito A, Duncan PW, Lipscomb J, et al. Utilities for major stroke: results from a survey of preferences among persons at increased risk for stroke. Am Heart J. 1998;136:703–13. doi: 10.1016/s0002-8703(98)70019-5.
  • 14. Hong KS, Saver JL. Quantifying the value of stroke disability outcomes: WHO global burden of disease project disability weights for each level of the modified Rankin Scale. Stroke. 2009;40:3828–33. doi: 10.1161/STROKEAHA.109.561365.
  • 15. Kidwell CS, Liebeskind DS, Starkman S, Saver JL. Trends in acute ischemic stroke trials through the 20th century. Stroke. 2001;32:1349–59. doi: 10.1161/01.str.32.6.1349.
  • 16. Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332:1080. doi: 10.1136/bmj.332.7549.1080.
  • 17. Federov V, Mannino F, Zhang R. Consequences of dichotomization. Pharm Stat. 2009;8:50–61. doi: 10.1002/pst.331.
  • 18. Saver JL, Gornbein J. Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology. 2009;72:1310–5. doi: 10.1212/01.wnl.0000341308.73506.b7.
  • 19. Bath PM, Gray LJ, Collier T, Pocock S, Carpenter J. Can we improve the statistical analysis of stroke trials? Statistical reanalysis of functional outcomes in stroke trials. Stroke. 2007;38:1911–5. doi: 10.1161/STROKEAHA.106.474080.
  • 20. McHugh GS, Butcher I, Steyerberg EW, et al. A simulation study evaluating approaches to the analysis of ordinal outcome data in randomized controlled trials in traumatic brain injury: results from the IMPACT Project. Clin Trials. 2010;7:44–57. doi: 10.1177/1740774509356580.
  • 21. Gadhia J, Starkman S, Ovbiagele B, Ali L, Liebeskind D, Saver JL. Assessment and improvement of figures to visually convey benefit and risk of stroke thrombolysis. Stroke. 2010;41:300–6. doi: 10.1161/STROKEAHA.109.566935.
  • 22. Lansberg MG, Schrooten M, Bluhmki E, Thijs VN, Saver JL. Treatment time-specific number needed to treat estimates for tissue plasminogen activator therapy in acute stroke based on shifts over the entire range of the modified Rankin Scale. Stroke. 2009;40:2079–84. doi: 10.1161/STROKEAHA.108.540708.
  • 23. Saver JL, Yafeh B. Confirmation of tPA treatment effect by baseline severity-adjusted end point reanalysis of the NINDS-tPA stroke trials. Stroke. 2007;38:414–6. doi: 10.1161/01.STR.0000254580.39297.3c.
  • 24. Kent DM, Selker HP, Ruthazer R, Bluhmki E, Hacke W. Can Multivariable Risk-Benefit Profiling Be Used to Select Treatment-Favorable Patients for Thrombolysis in Stroke in the 3- to 6-Hour Time Window? Stroke. 2006;37:2963–9. doi: 10.1161/01.STR.0000249005.37120.9f.
  • 25. Saver JL. Hemorrhage after thrombolytic therapy for stroke: the clinically relevant number needed to harm. Stroke. 2007;38:2279–83. doi: 10.1161/STROKEAHA.107.487009.
  • 26. Mandava P, Kent TA. A method to determine stroke trial success using multidimensional pooled control functions. Stroke. 2009;40:1803–10. doi: 10.1161/STROKEAHA.108.532820.
  • 27. Konig IR, Ziegler A, Bluhmki E, et al. Predicting long-term outcome after acute ischemic stroke: a simple index works in patients from controlled clinical trials. Stroke. 2008;39:1821–6. doi: 10.1161/STROKEAHA.107.505867.
  • 28. Johnston KC, Wagner DP, Wang XQ, et al. Validation of an acute ischemic stroke model: does diffusion-weighted imaging lesion volume offer a clinically significant improvement in prediction of outcome? Stroke. 2007;38:1820–5. doi: 10.1161/STROKEAHA.106.479154.
  • 29. Qureshi AI. New grading system for angiographic evaluation of arterial occlusions and recanalization response to intra-arterial thrombolysis in acute ischemic stroke. Neurosurgery. 2002;50:1405–14. doi: 10.1097/00006123-200206000-00049.
  • 30. Gray LJ, Bath PM, Collier T. Should stroke trials adjust functional outcome for baseline prognostic factors? Stroke. 2009;40:888–94. doi: 10.1161/STROKEAHA.108.519207.
  • 31. Hernandez AV, Steyerberg EW, Habbema JD. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J Clin Epidemiol. 2004;57:454–60. doi: 10.1016/j.jclinepi.2003.09.014.
  • 32. Kent DM, Trikalinos TA, Hill MD. Are unadjusted analyses of clinical trials inappropriately biased toward the null? Stroke. 2009;40:672–3. doi: 10.1161/STROKEAHA.108.532051.
  • 33. Lyden P, Raman R, Liu L, Emr M, Warren M, Marler J. National Institutes of Health Stroke Scale certification is reliable across multiple venues. Stroke. 2009;40:2507–11. doi: 10.1161/STROKEAHA.108.532069.
  • 34. Li N, Elashoff RM, Li G, Saver J. Joint modeling of longitudinal ordinal data and competing risks survival times and analysis of the NINDS rt-PA stroke trial. Stat Med. 2010;29:546–57. doi: 10.1002/sim.3798.
