Annals of Oncology. 2017 Mar 6;28(8):1730–1733. doi: 10.1093/annonc/mdx064

Statistical controversies in cancer research: using standardized effect size graphs to enhance interpretability of cancer-related clinical trials with patient-reported outcomes

M L Bell 1,*, M H Fiero 2, H M Dhillon 3, V J Bray 4, J L Vardy 5,6
PMCID: PMC5834129  PMID: 28327975

Abstract

Patient-reported outcomes (PROs) are becoming increasingly important in cancer studies, particularly with the emphasis on patient-centered outcomes research. However, a single trial often uses multiple PROs with different scales and different directions of favorability, making interpretation difficult. To enhance interpretability, we propose the use of a standardized effect size graph, which shows all PROs from a study in the same figure, on the same scale. Plotting standardized effects with their 95% confidence intervals (CIs) on a single graph that clearly shows the null value conveys a comprehensive picture of trial results. We demonstrate how to create such a graph using data from a randomized controlled trial that measured 12 PROs at two time points. The 24 effect sizes and CIs are shown on one graph and clearly indicate that the intervention is effective and that its effect is sustained.

Keywords: patient-reported outcomes, effect size, graphs, cancer, quality of life

Introduction

With the emphasis on patient-centered outcomes research [1], patient-reported outcomes (PROs) are increasingly important in oncology studies [2]. PROs are reports completed by patients to measure their perception of health status, such as quality of life (QoL), cognitive functioning, fatigue, and depression: concepts requiring the patient’s assessment of the effect of the disease or intervention [3]. PROs can be used in support of medical product labeling when the instrument is well defined and reliable [4], and in routine cancer clinical care to improve early symptom detection and communication between clinicians and patients [5]. Thus, the interpretability of PROs is essential for designing studies, assessing interventions, and decision-making [3, 4]. However, most studies use multiple PROs, often on different scales (with various ranges) and with different minimum important differences [6], making interpretation difficult and time-consuming [7]. This can severely limit the utility of PROs in the assessment of an intervention.

Graphs of effect sizes with confidence intervals (CIs), known as forest plots, have become fundamental tools for displaying results of multiple studies in meta-analyses [8]. Similarly, graphs can be used to display PRO estimates of intervention effect and their 95% CIs, standardized to a common scale. This allows a comprehensive visual description of overall patterns and associations of study results. Readers can examine the magnitude of differences found on a standardized scale, while simultaneously assessing likely statistical significance (a caveat is discussed under Standardized effect sizes), based on whether the null value is included in the 95% CI. Using an example, we demonstrate how standardized effect size (SES) graphs for PROs can enhance interpretation of results in an oncology trial.

Example

We used data from a randomized controlled trial evaluating a web-based program to improve cognition (described elsewhere) [9]. Briefly, 242 cancer survivors who had completed adjuvant chemotherapy 6–60 months previously, but still reported cognitive symptoms, were randomized to cognitive rehabilitation or usual care. All outcomes were measured at baseline and at 3 and 6 months post-intervention. The primary outcome was self-reported cognitive function, measured by the perceived cognitive impairments subscale of the Functional Assessment of Cancer Therapy-Cognitive Function (FACT-Cog) [10]. Eleven other PROs were used to measure QoL, anxiety, depression, stress, fatigue and other aspects of cognition. These PROs had variable ranges, with higher scores indicating a better outcome for some PROs and a worse outcome for others.

Standardized effect sizes

Intervention effect size measures for binary outcomes (e.g. odds ratios and risk ratios) lie on a scale from 0 to ∞, with a null value of 1. Effect measures for continuous outcomes, such as differences in means, need to be standardized to be on a common scale. An SES for a continuous outcome is the difference between the means of the arms divided by a standardizing factor, such as the pooled standard deviation. Cohen’s d formula for an SES is

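$$d = \frac{\bar{X}_{\mathrm{int}} - \bar{X}_{\mathrm{ctl}}}{SD_{\mathrm{pooled}}},$$

where $\bar{X}$ is the average of the intervention (int) or control (ctl) arm. The pooled standard deviation is

$$SD_{\mathrm{pooled}} = \sqrt{\frac{(n_{\mathrm{int}} - 1)\,s_{\mathrm{int}}^{2} + (n_{\mathrm{ctl}} - 1)\,s_{\mathrm{ctl}}^{2}}{n_{\mathrm{int}} + n_{\mathrm{ctl}}}},$$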

where n is the number of subjects and s is the standard deviation of the intervention or control arm [11]. See reference [12] for other SES formulas, including for a single arm study. This transforms each outcome to a z-score, with a mean of zero and variance of one. SESs can be created for any test of differences in continuous outcomes, including t-tests or contrasts from a linear mixed model [13]. CIs are less straightforward, but can be calculated using non-central t-distributions [12, 14] (see supplementary Appendix, available at Annals of Oncology online). A subtlety is that the exclusion of 0 from a 95% CI for an SES does not always indicate statistical significance at the 0.05 level for a two-sided test. However, with the larger sample sizes typical of RCTs, it is rare that P < 0.05 does not correspond to a 95% CI for the standardized effect that excludes 0.
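The two formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the function names and the scores are hypothetical, and the pooled SD uses the denominator n_int + n_ctl as written in the text (many references subtract 2). The interval shown is a common large-sample normal approximation, swapped in here because it needs only the standard library; it is not the non-central t method cited in the text and will differ from it, especially in small samples.

```python
import math
from statistics import mean, stdev


def pooled_sd(int_scores, ctl_scores):
    """Pooled SD with denominator n_int + n_ctl, following the text.

    Note: many references use n_int + n_ctl - 2 instead.
    """
    n_int, n_ctl = len(int_scores), len(ctl_scores)
    s_int, s_ctl = stdev(int_scores), stdev(ctl_scores)  # sample SDs
    numerator = (n_int - 1) * s_int**2 + (n_ctl - 1) * s_ctl**2
    return math.sqrt(numerator / (n_int + n_ctl))


def cohens_d(int_scores, ctl_scores):
    """Difference in arm means divided by the pooled SD."""
    diff = mean(int_scores) - mean(ctl_scores)
    return diff / pooled_sd(int_scores, ctl_scores)


def approx_ci(d, n_int, n_ctl, z=1.96):
    """Large-sample normal approximation to a 95% CI for d.

    ASSUMPTION: a simplification for illustration, not the
    non-central t interval described in the text.
    """
    total = n_int + n_ctl
    se = math.sqrt(total / (n_int * n_ctl) + d**2 / (2 * total))
    return d - z * se, d + z * se


# Hypothetical scores, invented for illustration (not trial data).
intervention = [52, 60, 58, 65, 49, 61, 57, 63]
control = [50, 48, 55, 47, 53, 51, 46, 54]

d = cohens_d(intervention, control)
low, high = approx_ci(d, len(intervention), len(control))
print(f"d = {d:.2f}, approximate 95% CI = ({low:.2f}, {high:.2f})")
```

For a published analysis, the non-central t interval from references [12, 14] would be the method to use; the approximation here is only meant to make the arithmetic concrete.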

In the absence of a clear interpretation of important differences or changes, Cohen suggested guidelines for small, medium, and large effect sizes of d = 0.2, 0.5, and 0.8 [11]. Minimum important differences for PROs are usually in the range of d = 0.2–0.5 [15]. We used mixed models to estimate the difference between arms at both post-baseline time points for each of the 12 PROs. All scores were coded so that positive values indicate favorability for the intervention arm. More details are given in the supplementary Appendix, available at Annals of Oncology online.
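The direction coding and Cohen's guidelines described above might be applied as in the following sketch. The helper names are hypothetical, the numbers are invented, and treating 0.2, 0.5, and 0.8 as lower cut-offs for labeled bins is an assumption made for illustration; Cohen offered them as rough reference points. The input estimate is assumed to be intervention minus control on the standardized scale.

```python
def orient(estimate, low, high, higher_is_better):
    """Recode an SES and its CI so positive values favor the intervention.

    When higher scores are worse, the sign flips and the CI bounds swap.
    """
    if higher_is_better:
        return estimate, low, high
    return -estimate, -high, -low


def cohen_label(d):
    """Label |d| using Cohen's 0.2 / 0.5 / 0.8 guidelines as cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical example: a symptom scale where higher scores are worse,
# so a negative raw difference favors the intervention.
est, lo, hi = orient(-0.4, -0.7, -0.1, higher_is_better=False)
print(est, lo, hi, cohen_label(est))
```

Flipping the interval together with the estimate keeps every outcome readable in the same direction once they are placed on one graph.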

SES graphs

Figure 1 shows the SES and 95% CI for each PRO in the trial. The 95% CI indicates likely statistical significance for each outcome (when the null value of 0 is excluded) and allows quick evaluation of the intervention effect as a whole. Each PRO has two symbols, a circle and a square, indicating SES estimates at 3 and 6 months post-intervention. This allows assessment of sustained intervention effect. Direction of SESs was labeled to indicate whether the result favored the control or treatment arm. Outcomes were grouped by assessment scales, with the primary outcome at the top. From this graph the reader can easily determine that the intervention was effective, with sustained effects (although slightly decreasing), for most of the PROs measured, with effect sizes that are small to medium. Unstandardized results (in this case, mean differences between arms) and/or the standardized effect values can be shown in a table or as text on the graph. Another possible addition to the graph is vertical reference lines indicating small, medium, and large effect sizes. For more on this trial, see reference [9].

Figure 1.

Standardized effect sizes and their corresponding 95% confidence intervals of each patient reported outcome for 242 participants in an RCT. FACT-COG, Functional Assessment of Cancer Therapy-Cognitive Function; QoL, quality of life; FACT-G, Functional Assessment of Cancer Therapy-General; GHQ, General Health Questionnaire; PSS, Perceived Stress Scale; FACT-F, Functional Assessment of Cancer Therapy-Fatigue.
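To make the layout of an SES graph concrete, the sketch below renders a plain-text version using only the Python standard library. It is not the code used to produce Figure 1: a real figure would normally be drawn with a plotting library, the function names are hypothetical, and the outcome labels and numbers are invented for illustration rather than taken from the trial. Each row shows the CI as dashes, the null value of 0 as a vertical bar, and the estimate as `o` or `s` for the two time points.

```python
def render_row(label, d, low, high, marker="o",
               lo_axis=-1.0, hi_axis=1.0, width=41):
    """Return one text row of an SES graph for a single estimate."""
    def col(x):
        x = min(max(x, lo_axis), hi_axis)  # clip to the axis range
        return round((x - lo_axis) / (hi_axis - lo_axis) * (width - 1))

    cells = [" "] * width
    for i in range(col(low), col(high) + 1):
        cells[i] = "-"          # confidence interval
    cells[col(0.0)] = "|"       # null value
    cells[col(d)] = marker      # point estimate, drawn last
    plot = "".join(cells)
    return f"{label:<12}{plot}  {d:+.2f} [{low:+.2f}, {high:+.2f}]"


def excludes_null(low, high):
    """True when the interval lies entirely on one side of 0."""
    return low > 0 or high < 0


# Hypothetical outcomes and values (not results from the trial).
# Tuples: (label, time-point marker, SES, CI lower, CI upper).
rows = [
    ("Outcome A", "o", 0.45, 0.20, 0.70),
    ("Outcome A", "s", 0.35, 0.10, 0.60),
    ("Outcome B", "o", 0.15, -0.10, 0.40),
    ("Outcome B", "s", 0.10, -0.15, 0.35),
]

print(f"{'':<12}{'favors control':<21}{'favors intervention':>20}")
for label, marker, d, low, high in rows:
    flag = "*" if excludes_null(low, high) else ""
    print(render_row(label, d, low, high, marker) + flag)
```

The asterisk marks intervals that exclude 0, which, as noted in the text, indicates likely rather than guaranteed statistical significance.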

Discussion

We aimed to illustrate the use of SES graphs for PROs to enhance interpretation of cancer study results. In SES graphs, statistical significance and effect size are visualized together, which avoids dichotomizing the result into a simple positive or negative finding based on a P-value [16]. Others have considered how to make PROs more interpretable [17–19]. For example, Cappelleri and Bushmakin give an overview of strategies [20] and Anota et al. discuss time-to-deterioration (TTD) [21]. However, these approaches are not designed for simultaneous assessment of multiple PROs, and TTD requires frequent assessment of the PRO. We recommend using SES graphs when multiple PROs are used in oncology trials to provide an overview of trial results. Clinicians, regulatory agencies, patients, and families can use this visual display to make decisions regarding different treatments based on outcomes important to patients.

Funding

Dr Bell is supported by the University of Arizona Cancer Center, through NCI grant P30CA023074.

Disclosure

The authors have declared no conflicts of interest. This article reflects the views of the authors and should not be construed to represent FDA’s views or policies.

References

  • 1. Patient Centered Outcomes Research Institute. PCORI Methodology Standards. Washington, DC: 2012; 1–16.
  • 2. Friedlander ML, King MT. Patient-reported outcomes in ovarian cancer clinical trials. Ann Oncol 2013; 24: x64–x68.
  • 3. Fairclough DL. Patient reported outcomes as endpoints in medical research. Stat Methods Med Res 2004; 13: 115–138.
  • 4. Food and Drug Administration. Guidance for industry patient-reported outcome measures: use in medical product development to support labeling claims. Fed Regist 2009; 74: 65132–65133.
  • 5. Howell D, Molloy S, Wilkinson K et al. Patient-reported outcomes in routine cancer clinical practice: a scoping review of use, impact on health outcomes, and implementation factors. Ann Oncol 2015; 26(9): 1846–1858.
  • 6. Revicki DA, Cella D, Hays RD et al. Responsiveness and minimal important differences for patient reported outcomes. Health Qual Life Outcomes 2006; 4(1): 70.
  • 7. Calvert M, Blazeby J, Altman DG et al. Reporting of patient-reported outcomes in randomized trials: the CONSORT PRO extension. JAMA 2013; 309: 814–822.
  • 8. Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Wiley Online Library, 2008.
  • 9. Bray VJ, Dhillon HM, Bell ML et al. Evaluation of a web-based cognitive rehabilitation program in cancer survivors reporting cognitive symptoms after chemotherapy. J Clin Oncol 2017; 35: 217–225.
  • 10. Wagner LI, Sweet J, Butt Z et al. Measuring patient self-reported cognitive function: development of the functional assessment of cancer therapy-cognitive function instrument. J Support Oncol 2009; 7: W32–W39.
  • 11. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
  • 12. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc 2007; 82: 591–605.
  • 13. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Hoboken, NJ: Wiley, 2011.
  • 14. Kelley K. Confidence intervals for standardized effect sizes: theory, application, and implementation. J Stat Soft 2007; 20(8): 1–24.
  • 15. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008; 61: 102–109.
  • 16. Schmidt K, Schmidtke J, Kohl C et al. Enhancing the interpretation of statistical P values in toxicology studies: implementation of linear mixed models (LMMs) and standardized effect sizes (SESs). Arch Toxicol 2016; 90(3): 731–751.
  • 17. Marquis P, Chassany O, Abetz L. A comprehensive strategy for the interpretation of quality-of-life data based on existing methods. Value Health 2004; 7: 93–104.
  • 18. Revicki DA, Erickson PA, Sloan JA et al. Interpreting and reporting results based on patient-reported outcomes. Value Health 2007; 10: S116–S124.
  • 19. Schünemann HJ, Akl EA, Guyatt GH. Interpreting the results of patient reported outcome measures in clinical trials: the clinician's perspective. Health Qual Life Outcomes 2006; 4: 62.
  • 20. Cappelleri JC, Bushmakin AG. Interpretation of patient-reported outcomes. Stat Methods Med Res 2014; 23: 460–483.
  • 21. Anota A, Hamidou Z, Paget-Bailly S et al. Time to health-related quality of life score deterioration as a modality of longitudinal analysis for health-related quality of life studies in oncology: do we need RECIST for quality of life to achieve standardization? Qual Life Res 2015; 24: 5–18.

Articles from Annals of Oncology are provided here courtesy of Oxford University Press
