Abstract
Developing biomarkers that can predict whether patients are likely to benefit from an intervention is a pressing objective in many areas of medicine. Recent guidance documents have recommended that the accuracy of predictive biomarkers, ie, sensitivity, specificity, and positive and negative predictive values, should be assessed. We clarify the meanings of these entities for predictive markers and demonstrate that generally they cannot be estimated from data without making strong untestable assumptions. Language suggesting that predictive biomarkers can identify patients who benefit from an intervention is also widespread. We show that in general one cannot estimate the chance that a patient will benefit from treatment. We recommend instead that predictive biomarkers be evaluated with respect to their ability to predict clinical outcomes among patients treated and among patients receiving standard of care, and the population impact of treatment rules based on those predictions. Ideally these entities are estimated from a randomized trial comparing the experimental intervention with standard of care.
Biomarkers that predict the likelihood that a patient will benefit from an intervention are highly sought after in many areas of medicine. In early-stage breast cancer treatment, for example, the development of biomarkers for identifying which women can be spared chemotherapy has been called the highest translational research priority (1). In advanced colorectal cancer, a RAS mutation in the tumor has been found to predict lack of benefit from anti–epidermal growth factor receptor (EGFR) monoclonal antibodies (2–4). Research on such biomarkers, called “predictive,” “prescriptive,” or “treatment selection” biomarkers (5–11), abounds in cancer therapeutics where interventions are often marginally efficacious, toxic, and costly, so that sparing patients ineffective treatments is expected to improve outcomes and decrease medical costs.
An active research agenda is defining what it means to clinically validate predictive biomarkers. Several guidance documents that address biomarkers of multiple types—predictive biomarkers as well as diagnostic biomarkers, intended to identify patients with and without a clinical condition, and prognostic biomarkers that predict clinical outcomes regardless of intervention—suggest that the clinical accuracy of the biomarkers, ie, sensitivity and specificity, must be assessed (12,13). However, these measures of accuracy need careful definition for predictive biomarkers—the measures defined for diagnostic and prognostic markers do not apply. Unfortunately it is quite common for biomarkers of different types to be discussed as a group in the literature.
Consider a biomarker to identify patients likely to benefit from an experimental treatment instead of the current standard of care. The standard of care might be an existing treatment or no treatment; for simplicity, we refer to standard of care as “no treatment.” Suppose that a “good” and “bad” clinical outcome have been defined, eg, survival to a landmark time following treatment or not. The clinical sensitivity of the biomarker is the proportion of patients with “positive” biomarker results among those who would benefit from the treatment (14–16). This is defined more precisely if we consider that each patient has two potential outcomes, their clinical outcome if given the treatment and their outcome if not given the treatment (17–19). A patient is benefitted by treatment if they would have a good outcome with treatment but not without treatment. The sensitivity of the biomarker can then be defined as the proportion of such patients who test positive with the biomarker. On the other hand, patients whose outcomes would be the same regardless of treatment, or who would have good outcomes without treatment but not with treatment, are not benefitted by treatment. The specificity of the test is the proportion of such patients who are biomarker negative.
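These definitions can be written compactly in potential-outcomes notation. The symbols are our own shorthand and are assumptions for illustration, not notation taken from the cited sources: $M \in \{+,-\}$ is the biomarker result, and $Y(1)$ and $Y(0)$ are the patient's outcomes with and without treatment, coded 1 for a good outcome and 0 for a bad one.

```latex
\begin{align*}
\text{Benefit} &: \; Y(1) = 1 \text{ and } Y(0) = 0 \\
\text{Sensitivity} &= P\bigl(M = + \mid Y(1) = 1,\ Y(0) = 0\bigr) \\
\text{Specificity} &= P\bigl(M = - \mid \text{not } \{Y(1) = 1,\ Y(0) = 0\}\bigr)
\end{align*}
```

Both quantities condition on the pair $(Y(1), Y(0))$, which is never observed jointly for any one patient.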
While these definitions are compelling, they raise a fundamental problem: Almost never can patients’ outcomes under both treatment options be observed. Rather, for each patient, we only observe the outcome under the treatment he or she received. Whether or not a patient benefits from treatment is not known and not directly measurable. Therefore, in general, the sensitivity and specificity defined above cannot be estimated from data—even data from a randomized clinical trial. Figure 1 shows an example of two biomarkers with very different sensitivities and specificities, but identical data observed in a randomized trial. We note that in very restrictive settings, and under assumptions that cannot be verified (see the Supplementary Materials, available online), statistical methods exist for estimating predictive biomarker sensitivity and specificity (14–16). In general, however, predictive biomarker sensitivity and specificity cannot be estimated.
Figure 1.
Two biomarkers with different accuracy, but identical observed randomized trial data. A) Unobservable data for two binary biomarkers. Patients are categorized according to their potential outcomes with and without an experimental treatment. The sensitivity of the biomarker is the proportion of biomarker-positive patients among those who benefit from the treatment, ie, have good outcomes with treatment but not without treatment. The specificity is the proportion of biomarker-negative patients among those who do not benefit from treatment, ie, have the same clinical outcome regardless of treatment or have a bad outcome with treatment but not without treatment. The data observed in a clinical trial that randomizes patients with equal probability to treatment or standard of care are shown in (B) (see Supplementary Materials, available online, for calculations). The observed treatment effect is null, whereas the probability of treatment benefit is 20%. The two biomarkers have very different sensitivities and specificities, but this cannot be determined based on the observable data. PPV = positive predictive value; NPV = negative predictive value.
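The following minimal sketch illustrates the same point numerically. The cell percentages are hypothetical values chosen by us (they are not the entries of Figure 1), although they respect the caption's summary: 20% of patients benefit and the overall treatment effect is null. The category names and function names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical joint distribution (percent of the population) of biomarker result
# and potential-outcome category. Illustrative only; not the values in Figure 1.
MARKER_A = {"+": {"benefit": 20, "harm": 10, "always_good": 5, "always_bad": 15},
            "-": {"benefit": 0, "harm": 10, "always_good": 25, "always_bad": 15}}
MARKER_B = {"+": {"benefit": 10, "harm": 0, "always_good": 15, "always_bad": 25},
            "-": {"benefit": 10, "harm": 20, "always_good": 15, "always_bad": 5}}


def sensitivity(marker):
    """P(biomarker positive | patient benefits from treatment)."""
    total_benefit = marker["+"]["benefit"] + marker["-"]["benefit"]
    return Fraction(marker["+"]["benefit"], total_benefit)


def specificity(marker):
    """P(biomarker negative | patient does not benefit from treatment)."""
    negative = sum(v for k, v in marker["-"].items() if k != "benefit")
    everyone = sum(v for cells in marker.values()
                   for k, v in cells.items() if k != "benefit")
    return Fraction(negative, everyone)


def observed_trial_data(marker):
    """Quantities a randomized trial can estimate: marker prevalence and the
    good-outcome rate in each arm, separately by biomarker result."""
    observed = {}
    for result, cells in marker.items():
        n = sum(cells.values())
        observed[result] = {
            "prevalence": Fraction(n, 100),
            # Good outcome if treated: those who benefit or always do well.
            "good_rate_treated": Fraction(cells["benefit"] + cells["always_good"], n),
            # Good outcome if untreated: those harmed by treatment or always well.
            "good_rate_untreated": Fraction(cells["harm"] + cells["always_good"], n),
        }
    return observed


for name, marker in (("A", MARKER_A), ("B", MARKER_B)):
    print(name, "sensitivity", sensitivity(marker), "specificity", specificity(marker))
print("identical observable data:",
      observed_trial_data(MARKER_A) == observed_trial_data(MARKER_B))
```

In this constructed example, marker A has sensitivity 1 and specificity 5/8 while marker B has sensitivity 1/2 and specificity 1/2, yet every quantity estimable from the trial is the same for both.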
Similar arguments apply to positive and negative predictive values. The positive predictive value is the probability that a patient with a positive biomarker result will benefit from treatment, ie, will have a good outcome under treatment but not under standard of care. A patient with a positive biomarker result would like to know this probability when deciding on treatment. The negative predictive value is the probability of no benefit from treatment for a biomarker-negative patient. Unfortunately, Figure 1 demonstrates that biomarkers with very different positive and negative predictive values can yield the same observed data, so neither predictive value can be estimated from data.
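What the observable data can do is bound the predictive values. The sketch below applies the standard probability bounds on a joint probability given its two margins; this is a general fact about probabilities that we bring in for illustration, not a method proposed in this article, and the function names and subgroup rates are hypothetical.

```python
from fractions import Fraction


def benefit_probability_bounds(p_good_treated, p_good_untreated):
    """Bounds on P(good outcome with treatment AND bad outcome without it)
    when only the two arm-specific good-outcome rates are known."""
    lower = max(Fraction(0), p_good_treated - p_good_untreated)
    upper = min(p_good_treated, 1 - p_good_untreated)
    return lower, upper


def ppv_bounds(p_good_treated_pos, p_good_untreated_pos):
    """PPV = P(benefit | biomarker positive); uses rates among marker positives."""
    return benefit_probability_bounds(p_good_treated_pos, p_good_untreated_pos)


def npv_bounds(p_good_treated_neg, p_good_untreated_neg):
    """NPV = 1 - P(benefit | biomarker negative); uses rates among marker negatives."""
    lower, upper = benefit_probability_bounds(p_good_treated_neg, p_good_untreated_neg)
    return 1 - upper, 1 - lower


# Hypothetical trial estimates of good-outcome rates (assumed, for illustration):
# marker positives: 50% on treatment, 30% on standard of care;
# marker negatives: 50% on treatment, 70% on standard of care.
ppv_lo, ppv_hi = ppv_bounds(Fraction(1, 2), Fraction(3, 10))
npv_lo, npv_hi = npv_bounds(Fraction(1, 2), Fraction(7, 10))
print(f"PPV is between {float(ppv_lo):.2f} and {float(ppv_hi):.2f}")
print(f"NPV is between {float(npv_lo):.2f} and {float(npv_hi):.2f}")
```

With these assumed rates, the data are compatible with any positive predictive value from 0.20 to 0.50 and any negative predictive value from 0.70 to 1.00, which is why a single estimate is not available without further assumptions.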
Therefore, guidance documents should not be asking for assessments of accuracy for predictive markers. In our view, instead they should be asking for assessment of the clinical impact of the markers on patient outcomes. This can be done in several steps. First, plots such as Figure 2 can be used to study whether and how the probability of a bad outcome varies with the biomarker for the two treatment groups (5,20–26). Such displays help interpret biomarker results for individual patients: Given my biomarker value, what is the probability I will experience a bad outcome with treatment and without treatment? The difference between the probabilities of a bad outcome with vs without treatment is often called the “treatment effect;” however, it is important to distinguish this from the probability of treatment benefit, which is more compelling but cannot be estimated from data.
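The distinction between the treatment effect and the probability of benefit can be made explicit with a short derivation. The notation is ours and is assumed for illustration: $Y(1)$ and $Y(0)$ are a patient's outcomes with and without treatment, coded 1 for good and 0 for bad; "benefit" means $Y(1)=1, Y(0)=0$ and "harm" means $Y(1)=0, Y(0)=1$.

```latex
\begin{align*}
\Delta &= P\{Y(1)=0\} - P\{Y(0)=0\} \\
       &= \bigl[P\{Y(1)=0,\,Y(0)=0\} + P\{Y(1)=0,\,Y(0)=1\}\bigr]
        - \bigl[P\{Y(1)=0,\,Y(0)=0\} + P\{Y(1)=1,\,Y(0)=0\}\bigr] \\
       &= P(\text{harm}) - P(\text{benefit})
\end{align*}
```

The treatment effect therefore determines only the difference between the probabilities of harm and benefit, not either one separately; the same identity holds conditional on a biomarker value. In the setting of Figure 1, a null treatment effect together with a 20% probability of benefit implies that 20% of patients are harmed.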
Figure 2.
Ideal evaluation of a biomarker given randomized trial data. Rates of bad clinical outcomes under an experimental treatment and under standard of care as a function of a hypothetical continuous biomarker. High biomarker values predict higher rates of bad outcomes under standard of care, and low biomarker values predict higher rates of bad outcomes under treatment. A rule that recommends treatment if the rate of bad outcomes is lower with treatment vs without treatment is indicated; 50% of patients would be recommended treatment, and the treatment effect among those patients is a 10% reduction in the rate of bad outcomes. Thus the population impact of treating based on biomarker value is a 5% reduction in the rate of bad outcomes (0.50 × 0.10 = 0.05); the 50% rate of bad outcomes under standard of care is reduced to 45% under biomarker-based treatment.
Displays such as Figure 2 motivate natural biomarker-based rules for recommending for or against treatment, commonly of the form: Treat if the probability of a bad outcome is lower on treatment than off treatment (5,9,26–31). The second step is to estimate the population impact of using the biomarker-based rule to recommend treatment. What is the reduction in the rate of bad outcomes among those recommended treatment by the rule? What proportion of patients have treatment recommended to them by the rule? And, of ultimate interest (5,9,26,28–32), what is the rate of bad outcomes under biomarker-based treatment, in contrast to the rate under standard of care? Note that a statistical interaction between the biomarker and treatment, while a necessary condition, does not guarantee that the biomarker has population impact (5,11). Ideally, a trial that randomizes participants to the experimental treatment vs standard of care is used to estimate these quantities; the randomization ensures that the treatment groups are comparable.
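The three questions above can be answered mechanically once the rates of bad outcomes with and without treatment have been estimated as a function of the biomarker. The sketch below is a minimal illustration under assumptions of our own: the biomarker is summarized as four equally common strata, and the risks are hypothetical numbers chosen to reproduce the arithmetic in the Figure 2 caption. The function and field names are illustrative.

```python
from fractions import Fraction

# Hypothetical estimated risks of a bad outcome by biomarker stratum, ordered
# from low to high biomarker values. Each stratum holds 25% of patients.
STRATA = [
    {"weight": Fraction(1, 4), "risk_untreated": Fraction(30, 100), "risk_treated": Fraction(65, 100)},
    {"weight": Fraction(1, 4), "risk_untreated": Fraction(40, 100), "risk_treated": Fraction(60, 100)},
    {"weight": Fraction(1, 4), "risk_untreated": Fraction(60, 100), "risk_treated": Fraction(55, 100)},
    {"weight": Fraction(1, 4), "risk_untreated": Fraction(70, 100), "risk_treated": Fraction(55, 100)},
]


def population_impact(strata):
    """Evaluate the rule 'treat if the risk of a bad outcome is lower on treatment'."""
    recommended = [s for s in strata if s["risk_treated"] < s["risk_untreated"]]
    prop_recommended = sum((s["weight"] for s in recommended), Fraction(0))

    # Rate of bad outcomes if nobody is treated (standard of care).
    rate_standard = sum(s["weight"] * s["risk_untreated"] for s in strata)
    # Rate of bad outcomes if treatment follows the biomarker-based rule.
    rate_rule = sum(
        s["weight"] * (s["risk_treated"] if s in recommended else s["risk_untreated"])
        for s in strata
    )
    impact = rate_standard - rate_rule
    # Average risk reduction among the patients the rule recommends for treatment.
    effect_in_recommended = impact / prop_recommended if prop_recommended else Fraction(0)
    return {
        "prop_recommended": prop_recommended,
        "effect_in_recommended": effect_in_recommended,
        "rate_standard": rate_standard,
        "rate_rule": rate_rule,
        "impact": impact,
    }


summary = population_impact(STRATA)
for key, value in summary.items():
    print(f"{key}: {float(value):.2f}")
```

With these assumed risks, half of patients are recommended treatment, the risk reduction among them is 0.10, and the rate of bad outcomes falls from 0.50 under standard of care to 0.45 under biomarker-based treatment, an impact of 0.05.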
In summary, accuracy measures such as sensitivity, specificity, and positive and negative predictive values must be defined carefully for predictive biomarkers. In most settings, these measures cannot be estimated. Validation studies should focus on the ability of the biomarker to predict outcomes for patients with and without treatment, and on the population impact of basing treatment recommendations on those predictions. We should recognize that individual-level treatment effects are not generally observable. That is, despite what is desired, typically we cannot assess the ability of a biomarker to predict whether a patient will benefit from treatment—we can only assess its ability to predict differences in outcomes between treated patients and untreated patients.
Funding
R01 CA152089 (PI: Janes); R01 GM54438 (PI: Pepe); R01 CA174779 (PI: Sargent); R01 HL072966 (PI: Heagerty); and UL1TR000423 (PI: Disis).
The study sponsor had no role in the design of the study, the collection, analysis, or interpretation of the data, the writing of the manuscript, nor the decision to submit the manuscript for publication.
References
- 1. Dowsett M, Goldhirsch A, Hayes DF, et al. International Web-based consultation on priorities for translational breast cancer research. Breast Cancer Res. 2007;9(6):R81.
- 2. Karapetis CS, Khambata-Ford S, Jonker DJ, et al. K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008;359(17):1757–1765.
- 3. Allegra CJ, Jessup JM, Somerfield MR, et al. American Society of Clinical Oncology provisional clinical opinion: testing for KRAS gene mutations in patients with metastatic colorectal carcinoma to predict response to anti-epidermal growth factor receptor monoclonal antibody therapy. J Clin Oncol. 2009;27(12):2091–2096.
- 4. Douillard JY, Oliner KS, Siena S, et al. Panitumumab-FOLFOX4 treatment and RAS mutations in colorectal cancer. N Engl J Med. 2013;369(11):1023–1034.
- 5. Janes H, Pepe MS, Bossuyt PM, et al. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med. 2011;154(4):253–259.
- 6. Sargent D, Allegra C. Issues in clinical trial design for tumor marker studies. Semin Oncol. 2002;29(3):222–230.
- 7. Simon R, Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clin Cancer Res. 2004;10(20):6759–6763.
- 8. Simon R. Lost in translation: problems and pitfalls in translating laboratory observations to clinical utility. Eur J Cancer. 2008;44(18):2707–2713.
- 9. Gunter L, Zhu J, Murphy S. Variable selection for optimal decision making. In: Proceedings of the 11th Conference on Artificial Intelligence in Medicine. 2007:149–154.
- 10. Taube SE, Clark GM, Dancey JE, et al. A perspective on challenges and issues in biomarker development and drug and biomarker codevelopment. J Natl Cancer Inst. 2009;101(21):1453–1463.
- 11. Polley MY, Freidlin B, Korn EL, et al. Statistical and practical considerations for clinical evaluation of predictive biomarkers. J Natl Cancer Inst. 2013;105(22):1677–1683.
- 12. US Department of Health and Human Services, Food and Drug Administration. Guidance for Industry: Enrichment Strategies for Clinical Trials to Support Approval of Human Drugs and Biological Products: Draft Guidance. 2012.
- 13. Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Institute of Medicine. Lessons Learned and the Path Forward. 2012.
- 14. Huang Y, Gilbert PB, Janes H. Assessing treatment-selection markers using a potential outcomes framework. Biometrics. 2012;68(3):687–696.
- 15. Sitlani CM, Heagerty PJ. Analyzing longitudinal data to characterize the accuracy of markers used to select treatment. Stat Med. 2014;33(17):2881–2896.
- 16. Zhang Z, Nie L, Soon G, et al. The use of covariates and random effects in evaluating predictive biomarkers under a potential outcomes framework. Ann Appl Stat. 2014;8:2336–2355.
- 17. Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (1923). Translated in: Stat Sci. 1990;5:465–480.
- 18. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701.
- 19. Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58.
- 20. Albain KS, Barlow WE, Shak S, et al. Prognostic and predictive value of the 21-gene recurrence score assay in postmenopausal women with node-positive, oestrogen-receptor-positive breast cancer on chemotherapy: a retrospective analysis of a randomised trial. Lancet Oncol. 2010;11(1):55–65.
- 21. Bonetti M, Gelber RD. Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics. 2004;5(3):465–481.
- 22. Royston P, Sauerbrei W. A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Stat Med. 2004;23(16):2509–2525.
- 23. Lazar AA, Cole BF, Bonetti M, et al. Evaluation of treatment-effect heterogeneity using biomarkers measured on a continuous scale: subpopulation treatment effect pattern plot. J Clin Oncol. 2010;28(29):4539–4544.
- 24. Cai T, Tian L, Wong PH, et al. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2011;12(2):270–282.
- 25. Royston P, Sauerbrei W. Interaction of treatment with a continuous variable: simulation study of power for several methods of analysis. Stat Med. 2014;33(27):4695–4708.
- 26. Janes H, Brown MD, Huang Y, et al. An approach to evaluating and comparing biomarkers for patient treatment selection. Int J Biostat. 2014;10(1):99–121.
- 27. Lu W, Zhang HH, Zeng D. Variable selection for optimal treatment decision. Stat Methods Med Res. 2013;22(5):493–504.
- 28. Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann Stat. 2011;39(2):1180–1210.
- 29. Zhang B, Tsiatis AA, Laber EB, et al. A robust method for estimating optimal treatment regimes. Biometrics. 2012;68(4):1010–1018.
- 30. Zhao Y, Zeng D, Rush AJ, et al. Estimating individualized treatment rules using outcome weighted learning. J Am Stat Assoc. 2012;107(449):1106–1118.
- 31. Matsouaka RA, Li J, Cai T. Evaluating marker-guided treatment selection strategies. Biometrics. 2014; In press.
- 32. Song X, Pepe MS. Evaluating markers for selecting a patient’s treatment. Biometrics. 2004;60(4):874–883.