Author manuscript; available in PMC: 2025 Jun 1.
Published in final edited form as: Ann Surg. 2024 Feb 23;279(6):907–912. doi: 10.1097/SLA.0000000000006250

Distinguishing Clinical from Statistical Significances in Contemporary Comparative Effectiveness Research

Ajami Gikandi 1, Julie Hallet 2, Bas Groot Koerkamp 3, Clancy J Clark 4, Keith D Lillemoe 5, Raja R Narayan 6, Harvey J Mamon 7, Marco A Zenati 8, Nabil Wasif 9, Dana Gelb Safran 10, Marc G Besselink 11, David C Chang 5, Lara N Traeger 12, Joel S Weissman 13, Zhi Ven Fong 9
PMCID: PMC11087199  NIHMSID: NIHMS1968537  PMID: 38390761

Structured Abstract

Objective:

To determine the prevalence of clinical significance reporting in contemporary comparative effectiveness research (CER).

Background:

In CER, a statistically significant difference between study groups may or may not be clinically significant. Misinterpreting statistically significant results could lead to inappropriate recommendations that increase healthcare costs and treatment toxicity.

Methods:

CER studies from 2022 issues of Annals of Surgery, Journal of the American Medical Association, Journal of Clinical Oncology, Journal of Surgical Research, and Journal of the American College of Surgeons were systematically reviewed by two different investigators. The primary outcome of interest was whether authors specified what they considered to be a clinically significant difference in the Methods.

Results:

Of 307 reviewed studies, 162 were clinical trials and 145 were observational studies. Authors specified what they considered to be a clinically significant difference in 26 studies (8.5%). Clinical significance was defined using clinically validated standards in 25 studies and subjectively in 1 study. Seven studies (2.3%) recommended a change in clinical decision-making, all with primary outcomes achieving statistical significance. Five (71.4%) of these studies did not have clinical significance defined in their methods. In randomized controlled trials with statistically significant results, sample size was inversely correlated with effect size (r = −0.30, p=.038).

Conclusion:

In contemporary CER, most authors do not specify what they consider to be a clinically significant difference in study outcome. Most studies recommending a change in clinical decision-making did so based on statistical significance alone, and clinical significance, when specified, was usually defined with clinically validated standards.

Keywords: Clinical significance, statistical significance, comparative effectiveness research, clinical trials

Clinical significance mini-abstract

In contemporary comparative effectiveness research, most authors do not specify what they consider to be a clinically significant difference in study outcome, and most studies recommending a change in clinical decision-making did so based on statistical significance alone. In randomized controlled trials with statistically significant results, sample size was inversely correlated with effect size.

INTRODUCTION

In clinical research, studies may not specify whether "significant" results are statistically significant and/or clinically significant. Statistical significance, if defined by a p-value of less than 0.05, denotes that the probability of obtaining test results at least as extreme as the observed difference is less than 5% under the assumption that the null hypothesis is true1. Clinical significance, in contrast, denotes a difference in outcomes that is deemed important enough to create a lasting impact for pertinent stakeholders such as patients, clinicians, or policy-makers2. Statistically significant results may or may not be clinically significant, and vice versa2–7.

As a study's sample size increases, statistical power increases such that smaller between-group differences are more likely to achieve a statistically significant p-value. Since p-values only reveal whether a treatment effect exists, not the magnitude of the effect, statistically significant results may or may not be associated with clinically significant differences in treatment effect. One concept used to capture clinically meaningful change is the minimal clinically important difference (MCID). Introduced more than 30 years ago to facilitate interpretation of quality-of-life instruments in patients with chronic heart disease and lung disease, MCID represents "the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management8." Investigators have argued that MCID should be called the "minimal important difference" to emphasize the patient's perspective rather than the physician's9. Hereafter, we use the term M(C)ID to represent the smallest change in outcome that is meaningful to patients and/or clinicians. M(C)ID now routinely informs sample size calculations for clinical trials and is often used to interpret scores from patient-reported outcome (PRO) instruments. Despite its importance, M(C)ID remains poorly utilized, particularly in the surgical literature10–12, likely due to the dearth of PRO studies in the field. Accordingly, the 2023 Zurich-Danish conference included defining and utilizing clinically meaningful measures as a consensus recommendation for improving the quality of surgical outcomes research13.
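The sample-size effect described above can be sketched numerically. This is an illustrative calculation, not from the study: an invented 1-percentage-point difference in a binary outcome, tested with a simple two-proportion z-test, crosses the p<0.05 threshold purely because the sample grows.

```python
# Illustration (invented numbers): the same absolute between-group
# difference becomes "statistically significant" by enlarging n alone.
from math import sqrt, erfc

def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided z-test p-value for two groups of equal size n."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))  # two-sided normal tail probability

# A 1-percentage-point difference in a complication rate (10% vs 11%):
small_trial = two_proportion_p_value(0.10, 0.11, n=500)     # p > 0.05
registry = two_proportion_p_value(0.10, 0.11, n=50_000)     # p < 0.05

print(f"n=500 per arm:    p = {small_trial:.3f}")
print(f"n=50,000 per arm: p = {registry:.2e}")
```

The absolute difference, and arguably its clinical meaning, is identical in both cases; only the p-value changes.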

With the increasing availability and utilization of large, administrative databases in research as well as large cooperative group clinical trials14, determining what constitutes a clinically significant difference is critical, especially in comparative effectiveness research (CER), which is designed to inform treatment decision-making and policy implementation15. As such, the primary goal of this study was to quantify how well authors specify what they consider to be clinical significance in contemporary CER studies from the medical and surgical literature. Secondary goals were to determine how clinical significance was defined and whether studies recommended a change in clinical decision-making without explaining what difference they consider to be clinically significant.

METHODS

Study design

This was a cross-sectional study of CER studies published in 2022 issues of five journals: Annals of Surgery, Journal of the American Medical Association, Journal of Clinical Oncology, Journal of Surgical Research, and Journal of the American College of Surgeons. We hypothesized that these journals would adequately represent clinical significance reporting in contemporary medical and surgical literature since they are highly regarded in their respective fields. Two of the journals have also been included in previous studies evaluating clinical significance reporting16,17.

Search strategy and study selection

Inclusion criteria were CER studies using observational or experimental methods that reported a p-value for the primary outcome. Non-inferiority studies were excluded. All abstracts from 2022 issues of the five journals of interest were manually screened for inclusion. Eligible studies were subsequently reviewed in full by two investigators (A.G. and Z.F.) to abstract variables of interest. Any discrepancies in abstracted data were reconciled by committee.

Data abstraction

The primary outcome of interest was whether authors specified what they considered to be clinical significance or a related concept (e.g., clinical importance, clinically meaningful difference, or clinical relevance) in the Methods section of the published manuscript. Whether clinical significance was defined using clinically validated and established standards, such as M(C)ID8, was also abstracted. Other outcomes of interest included study type, power calculations, whether the primary outcome reached statistical significance, and whether studies explicitly recommended a change in clinical decision-making. Study type was categorized as clinical trial or observational study. Observational studies were further categorized into registry-based studies, patient-reported outcome (PRO) studies, single-institution clinical outcome analyses, or multi-institutional clinical outcome analyses. If power calculations were reported in the published manuscript, they were categorized as based on an underlying sampling distribution (e.g., standard deviation or standard error), based on published incidence rates, or omitting a clear rationale. Primary outcomes were not distinguished based on whether they were individual, composite, or surrogate outcomes. Primary outcomes expressed as percentages that allowed direct comparison of study groups (e.g., 3-year overall survival or complication rate) were also collected.

Statistical analyses

Statistical analysis was performed using STATA software, version 18 (StataCorp, College Station, TX). Descriptive statistics (frequencies and proportions for categorical data, median and interquartile ranges (IQR) for non-parametric continuous data) were used to summarize study characteristics. Comparative analyses for categorical and continuous data were performed using the Chi-squared and Kruskal-Wallis tests respectively. Pearson’s correlation coefficient was used to determine the association between continuous variables. Scatterplots were generated by graphing the magnitude of absolute difference in primary outcome (% study group 1 with primary outcome - % study group 2 with primary outcome) versus sample size. Statistical significance was defined at the p<0.05 level.
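As an illustration of the correlation analysis described above, the sketch below computes a Pearson correlation coefficient on invented data mimicking Figure 2's axes. The study itself used Stata 18; every number and variable name here is hypothetical.

```python
# Sketch of the paper's correlation analysis with made-up numbers.
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical positive trials: sample size vs. absolute difference
# in primary outcome (percentage points).
sample_size = [150, 300, 500, 900, 2000, 5000]
abs_difference = [24.0, 18.5, 15.0, 11.0, 7.5, 4.0]

r = pearson_r(sample_size, abs_difference)
print(f"r = {r:.2f}")  # negative: larger trials, smaller effects
```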

RESULTS

Study characteristics

We reviewed 307 CER studies published in 2022 issues of Journal of the American Medical Association (n=90), Annals of Surgery (n=62), Journal of Clinical Oncology (n=58), Journal of Surgical Research (n=55), and Journal of the American College of Surgeons (n=42) (Figure 1). Of these studies, 162 (52.8%) were clinical trials and 145 (47.2%) were observational studies (82 registry-based studies, 40 single-institution analyses, 14 multi-institution analyses, and 9 PRO studies) (Table 1). Median sample size for each study category was as follows: registry-based 13,379 (IQR 1,264–62,574, range 35–3,900,000), PRO 169 (IQR 105–388, range 56–1,163), single-institution 259 (IQR 118–900, range 42–13,104), multi-institution 900 (IQR 242–1,341, range 89–64,642), and clinical trials 360 (IQR 164–850, range 25–14,104).

Figure 1. Flow chart of study outcomes

Table 1.

Distribution of study type by journal

Study type Annals of Surgery (n=62) JAMA (n=90) JCO (n=58) JSR (n=55) JACS (n=42) Total
Registry-based 22 (35.5) 19 (21.1) 2 (3.4) 22 (40.0) 17 (40.5) 82 (26.7)
Patient-reported outcomes 2 (3.2) 0 (0) 3 (5.2) 3 (5.5) 1 (2.4) 9 (2.9)
Single-institution analyses 2 (3.2) 1 (1.1) 0 (0) 25 (45.5) 13 (31.0) 40 (13.0)
Multi-institution analyses 3 (4.8) 5 (5.6) 0 (0) 2 (3.6) 4 (9.5) 14 (4.6)
Clinical trials 33 (53.2) 66 (73.3) 53 (91.4) 3 (5.5) 7 (16.7) 162 (52.8)

Data reported as n (%). JAMA, Journal of the American Medical Association; JCO, Journal of Clinical Oncology; JSR, Journal of Surgical Research; JACS, Journal of the American College of Surgeons

Statistical significance

The primary outcome reached statistical significance (p<0.05) in 168 studies (54.7%). This included 47/82 (57.3%) registry-based studies, 4/9 (44.4%) PRO studies, 28/40 (70.0%) single-institution analyses, 10/14 (71.4%) multi-institution analyses, and 79/162 (48.8%) clinical trials. Median sample sizes for studies that did and did not reach statistical significance were 508 (IQR 184–2,934) and 510 (IQR 256–1,483), respectively.

Clinical significance

Authors specified what they considered to be a clinically significant difference in the Methods section of 26 studies (8.5%), including 17 clinical trials, 7 PRO studies, 1 registry-based study, and 1 single-institution analysis. Twenty-five of these studies specified clinical significance using clinically validated standards and one did so subjectively (Table 2). A change in clinical decision-making was recommended in seven studies (2.3%), including five clinical trials and two registry-based studies, all with primary outcomes reaching statistical significance (Supplementary Table 1). Clinical significance was not defined in the Methods of five (71.4%) of these studies.

Table 2.

Methods for defining clinical significance in 26 studies

Method N
Clinically validated standard 25
 EORTC QLQ-C3033 3
 6 min Walk Test34 3
 Breast-Q Score35 2
 Quality of Recovery Score36 2
 COPD Assessment Test37 1
 Duchenne Muscular Dystrophy Tests38 1
 Low Anterior Resection Syndrome Score39 1
 Wexner Incontinence Score40 1
 MRC Breathlessness Scale27 1
 Numerical Pain Rating Scale41 1
 Medical Outcomes Study SF-3642 1
 Oswestry Disability Index43 1
 Female Sexual Function Index44 1
 State-Trait Anxiety Inventory45 1
 MD Anderson Dysphagia Inventory46 1
 Expanded Prostate Cancer Index Composite47 1
 Functional Assessment of Cancer Therapy48 1
 Colorectal Functional Outcome Questionnaire49 1
 Hospital Anxiety and Depression Scale50 1
Subjectively 1

Data reported as n.

Subgroup analysis: clinical trials and registry-based studies

Of the 138 clinical trials reporting power calculations in the published manuscripts, 117 (84.8%) based the calculation on published incidence rates, 14 (10.1%) on underlying sampling distributions, and 7 (5.1%) omitted a clear rationale. Authors discussed what they considered to be a clinically significant difference in 21/117 (18.0%) of the power calculations based on published incidence rates. Median sample size in studies that reached statistical significance versus those that did not was 338 (IQR 151–600) vs. 400 (IQR 191–1,030) for clinical trials and 18,288 (IQR 2,676–62,574) vs. 4,723 (IQR 678–39,358) for registry-based studies. Among studies with statistically significant group differences, the median between-group difference was 14 percentage points (IQR 9.2–23.0) for clinical trials and 5.7 percentage points (IQR 3.1–13.7) for registry-based studies. In positive clinical trials, there was an inverse correlation between sample size and effect size (r = −0.30, p=0.038) (Figure 2).

Figure 2. Scatterplot demonstrating the correlation between magnitude of absolute difference in primary outcome and sample size. Only studies with primary outcomes reaching statistical significance are shown. A fitted trend line indicates the correlation between variables. r, Pearson's correlation coefficient.

DISCUSSION

In this review of contemporary CER, most authors did not specify what they consider to be a clinically significant difference. This finding corroborates results from two studies published more than 20 years ago. One found clinical significance documented in 20/102 (19%) clinical trials reviewed from three journals between 1975 and 199016. The other found that authors explicitly discussed the clinical importance of their results in 10/27 (37%) clinical trials reviewed from five journals between 1998 and 199917. In comparison, clinical significance was defined in 17/162 (10.5%) of the clinical trials that we reviewed. Together, these results demonstrate that clinical significance reporting has remained poor over the last few decades.

Another main finding was that most studies recommending a change in clinical decision-making did so based on statistical significance alone. This is problematic because, when clinical significance is not well-defined, readers may erroneously presume statistically significant results to be clinically significant. At worst, misinterpreting results with trivial effect sizes as clinically significant could lead to flawed recommendations that increase healthcare costs, treatment toxicity, and surgical harm. One notable example is a phase 3 trial that randomized 569 patients with advanced pancreatic cancer to erlotinib plus gemcitabine or gemcitabine alone18. The trial demonstrated a statistically significant improvement in survival in the study arm (6.2 months vs. 5.9 months, p=0.038). Does a 10-day difference in survival warrant the additional toxicities and costs associated with an extra chemotherapeutic agent? People will differ on what constitutes clinical significance and will weigh other relevant outcomes, such as side effects and treatment costs, differently. Nonetheless, authors should, at a minimum, discuss what they consider to be a clinically significant difference in order to help readers interpret study results.

Objective measures may help resolve controversy over whether differences are clinically significant. Various working groups have attempted to set benchmarks for clinical significance, but their conclusions require validation and are limited to certain disease outcomes and settings19–21. Among the studies we reviewed that defined clinical significance, most did so using clinically validated and established standards rather than subjectively. M(C)ID, which represents the smallest change in outcome that is meaningful to patients and/or clinicians8, was the most commonly used measure. Frequently applied to PRO instruments, the M(C)ID is most often estimated by triangulating anchor-based and distribution-based methods22. In the anchor-based approach, a quality-of-life instrument is administered longitudinally, and the changes in scores between time points are validated by the same patients in order to assign clinical meaning to the quantitative differences (e.g., 0 to 5 indicates "no change", 5 to 10 "a little change", 10 to 20 "moderate change")23. Studies may then report the proportion of individual patients achieving a M(C)ID, or whether the mean difference between study group scores reaches the M(C)ID. Because the distribution of anchor-based estimates is highly variable regardless of the methodology used24, the standard approach is to ascertain M(C)ID by triangulating anchor-based with distribution-based methods25.
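The anchor- and distribution-based estimation described above can be sketched as follows. The score changes, anchor labels, anchor category chosen, and the simple averaging used for triangulation are all invented for illustration; real M(C)ID work uses validated instruments and more careful methods25.

```python
# Hypothetical M(C)ID triangulation sketch; all data are invented.
from statistics import mean, stdev

# Change in a quality-of-life score, paired with each patient's own
# anchor rating of how much they felt they changed.
changes = [2, 6, 7, 9, 12, 15, 1, 8, 11, 18]
anchors = ["none", "little", "little", "little", "moderate",
           "moderate", "none", "little", "moderate", "moderate"]

# Anchor-based estimate: mean change among patients reporting
# "a little" improvement (the smallest change they call meaningful).
anchor_mcid = mean(c for c, a in zip(changes, anchors) if a == "little")

# Distribution-based estimate: a common convention is 0.5 SD
# (here, of the observed changes, for simplicity).
dist_mcid = 0.5 * stdev(changes)

# Triangulated value: e.g., average the two estimates.
mcid = (anchor_mcid + dist_mcid) / 2
print(f"anchor-based: {anchor_mcid:.1f}, distribution-based: {dist_mcid:.1f}")
```

A trial could then report either the mean between-group difference against `mcid`, or the proportion of patients whose individual change exceeds it.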

A recent example of M(C)ID informing the interpretation of trial results was a randomized trial evaluating the effect of extended-release morphine on breathlessness in people with COPD26. The primary outcome was change in the intensity of worst breathlessness, which has a M(C)ID of 1.1 points27. The mean change in breathlessness was −0.3 (95% CI, −0.9 to 0.4), well short of the M(C)ID, and the authors appropriately concluded that their findings did not support the use of extended-release morphine to relieve breathlessness. Limitations of M(C)ID are that it is difficult to apply to non-PRO research, its value varies depending on the method used to calculate it, and it does not take into account baseline levels, perspectives of stakeholders beyond the patient, or treatment costs22.

Lastly, a noteworthy finding from this study was the inverse association between sample size and effect size in positive studies. Although expected, this finding provides real-world evidence for why it is appealing to design studies with large sample sizes28,29. Larger trials have a higher probability of achieving statistical significance, one of the main criteria regulatory agencies use to evaluate pivotal clinical trials and grant new approvals.

Two main strategies have been proposed to distinguish clinical from statistical differences. The first involves lowering the p-value threshold for statistical significance from, for example, 0.05 to 0.005 for claims of new discoveries, in an effort to reduce the non-reproducibility and false-positive rates of studies30. Operationally, this increases the number of standard errors by which group means must differ before statistical significance is reached. However, it may not be practical for randomized controlled trials to satisfy the larger sample sizes required to detect effects at these lower p-value thresholds. Moreover, this strategy does not clarify what constitutes a clinically significant difference, since any p-value can be associated with a large or small effect size. The second approach is to use standardized differences to define a clinically significant difference. The standardized difference is obtained by dividing the difference in the mean of a variable between two groups by an estimate of the standard deviation of that variable. The American Statistical Association recommends the use of standardized differences to qualitatively assess effect sizes, with 0.2, 0.5, and 0.8 denoting small, medium, and large effect sizes, respectively3,31. One limitation of standardized differences is that they cannot be applied to survival outcomes, which predominate among randomized controlled trials in oncology, unless an arbitrary point on the survival curve is selected. Importantly, both the modified p-value threshold and standardized differences are influenced by the size and variability of the study population. For example, if the study population consists of a Medicare population with 200,000 patients, the gross difference in values needed to produce a p-value of 0.005, or a standardized difference of 0.8, might still be small enough to not be clinically meaningful. One option is to report point estimates and confidence intervals without p-values.
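A minimal sketch of the standardized difference (Cohen's d with a pooled standard deviation), using invented numbers, shows the point made above: the effect size is independent of sample size, so a difference can fall below the 0.2 "small" threshold even when a very large sample would make its p-value far smaller than 0.005.

```python
# Sketch (invented numbers) of the standardized difference (Cohen's d).
from math import sqrt

def standardized_difference(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d with a pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                     / (n1 + n2 - 2))
    return abs(mean1 - mean2) / pooled_sd

# A 2-point difference in some score, small relative to the
# patient-level spread of 15 points:
d = standardized_difference(mean1=72.0, mean2=70.0, sd1=15.0, sd2=15.0,
                            n1=10_000, n2=10_000)
print(f"d = {d:.2f}")  # ~0.13, below the 0.2 "small" threshold
```

With 10,000 patients per arm this difference would be highly statistically significant, yet d stays at ~0.13 whether n is 100 or 100,000.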

Taken together, there is a dire need to standardize and define clinically significant differences. One potential solution is establishing a technical blueprint that can be applied across fields to achieve consensus on the definition of clinical significance. This technical blueprint would require collaboration among biostatisticians, epidemiologists, mixed-methods researchers, clinicians, patients, economists, and clinical content experts. It would serve as the basis for numerous working groups that could then use it to determine clinical significance in their respective fields of medicine. These efforts would have several important implications. Academically, a blueprint would standardize the interpretation of data and conclusions in the literature, especially in the era of big data. In the real world, standardizing the definition of clinical significance can guide objective sample size calculations for clinical trials. Consequently, in studies with sample size calculations based on clinically significant minimum differences, any result reaching statistical significance would also be clinically significant. It may also lend clinical meaning to quantitative differences that guide government and clinical policy-making, such as the Centers for Medicare and Medicaid Services' negotiation of drug prices with pharmaceutical companies. Eventually, methodology for defining clinical significance could be incorporated into reporting guidelines and checklists prior to manuscript submission, with results deemed "significant" only if both statistical and consensus-defined clinical significance are achieved. In the interim, journals could, like The Journal of the National Cancer Institute, require authors to qualify whether the term "significant" refers to statistical and/or clinical significance32. Table 3 summarizes our suggestions for clinical significance reporting in CER.

Table 3.

Target areas to improve clinical significance reporting in contemporary comparative effectiveness research

Suggestion
Authors should qualify whether “significance” refers to statistical and/or clinical significance when the term is being used
Authors should provide rationale for what they consider to be a clinically significant difference in the Methods
Authors should discuss the clinical significance of findings in the Discussion
Authors and journals should remind readers that statistically significant results may or may not be clinically significant, and vice-versa

LIMITATIONS

The results of this study need to be interpreted in the context of its limitations. First, evaluating whether a study specifies clinical significance is inherently subjective. To minimize subjectivity, strict criteria were used to determine whether studies met this primary outcome. Second, this study does not capture efforts to specify clinical significance that did not survive the editorial process, such as prespecified minimum differences used during sample size calculations, which may be clinically significant. Both of these limitations likely led this study to underestimate the proportion of studies specifying clinical significance. Third, this study reviewed studies published in five high-impact journals over a one-year period. This may not represent contemporary clinical significance reporting in the broader literature, or reporting during the interval between the initial studies describing clinical significance reporting more than 20 years ago and the present study. Fourth, half the clinical trials in this study had positive results, which may reflect publication bias by journals and/or submission bias by authors. Additionally, non-inferiority studies were excluded, even though demonstrating non-inferiority may be sufficient to satisfy FDA regulatory needs. Lastly, we acknowledge that superiority clinical trials can have "negative" outcomes that still provide clinically significant information, such as by helping to avoid unnecessary procedures.

CONCLUSIONS

In contemporary CER, most authors do not specify what they consider to be a clinically significant difference in study outcome, most studies recommending a change in clinical decision-making did so based on statistical significance alone, and clinical significance was usually defined with clinically validated standards like M(C)ID. A potential solution to this long-standing issue would be a consensus-based technical blueprint for defining a clinically significant difference. Additional target areas to improve clinical significance reporting include 1) authors qualifying whether "significance" refers to statistical and/or clinical significance, 2) authors providing rationale for what they consider to be a clinically significant difference in their Methods, 3) authors discussing the clinical significance of their findings in their Discussion, and 4) authors and journals reminding readers that statistically significant results may or may not be clinically significant, and vice versa.

Supplementary Material

Supplementary table 1

Footnotes

Conflicts of Interest and Source of Funding: None

Data Availability:

The datasets generated during and/or analyzed during the current study are not publicly available, but are available from the corresponding author on reasonable request.

REFERENCES

1. McShane BB, Gal D. Rejoinder: Statistical significance and the dichotomization of evidence. J Am Stat Assoc. 2017;112:904–908.
2. Ranganathan P, Pramesh CS, Buyse M. Common pitfalls in statistical analysis: Clinical versus statistical significance. Perspect Clin Res. 2015;6:169–170.
3. Wasserstein RL, Lazar NA. The ASA statement on p-values: Context, process, and purpose. Am Stat. 2016;70:129–133.
4. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–350.
5. Fingerhut A. Probability, P values, and statistical significance: instructions for use by surgeons. Br J Surg. Epub ahead of print December 21, 2022. DOI: 10.1093/bjs/znac440.
6. van Zwet E, Gelman A, Greenland S, et al. A New Look at P Values for Randomized Clinical Trials. NEJM Evidence. 2023;3:EVIDoa2300003.
7. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567:305–307. DOI: 10.1038/d41586-019-00857-9.
8. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415.
9. Schünemann HJ, Guyatt GH. Commentary--goodbye M(C)ID! Hello MID, where do you come from? Health Serv Res. 2005;40:593–597.
10. Peterson AM, Miller B, Ioerger P, et al. Most-Cited Patient-Reported Outcome Measures Within Otolaryngology-Revisiting the Minimal Clinically Important Difference: A Review. JAMA Otolaryngol Head Neck Surg. 2023;149:261–276.
11. Magouliotis DE, Bareka M, Rad AA, et al. Demystifying the Value of Minimal Clinically Important Difference in the Cardiothoracic Surgery Context. Life. 2023;13. Epub ahead of print March 6, 2023. DOI: 10.3390/life13030716.
12. Embry TW, Piccirillo JF. Minimal Clinically Important Difference Reporting in Randomized Clinical Trials. JAMA Otolaryngol Head Neck Surg. 2020;146:862–863.
13. Domenghino A, Walbert C, Birrer DL, et al. Consensus recommendations on how to assess the quality of surgical interventions. Nat Med. 2023;29:811–822.
14. Alluri RK, Leland H, Heckmann N. Surgical research using national databases. Ann Transl Med. 2016;4:393.
15. Sox HC, Greenfield S. Comparative effectiveness research: a report from the Institute of Medicine. Ann Intern Med. 2009;151:203–205.
16. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272:122–124.
17. Chan KB, Man-Son-Hing M, Molnar FJ, et al. How well is the clinical importance of study results reported? An assessment of randomized controlled trials. CMAJ. 2001;165:1197–1202.
18. Moore MJ, Goldstein D, Hamm J, et al. Erlotinib plus gemcitabine compared with gemcitabine alone in patients with advanced pancreatic cancer: a phase III trial of the National Cancer Institute of Canada Clinical Trials Group. J Clin Oncol. 2007;25:1960–1966.
19. Cherny NI, Sullivan R, Dafni U, et al. A standardised, generic, validated approach to stratify the magnitude of clinical benefit that can be anticipated from anti-cancer therapies: the European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS). Ann Oncol. 2015;26:1547–1573.
20. Ellis LM, Bernstein DS, Voest EE, et al. American Society of Clinical Oncology perspective: Raising the bar for clinical trials by defining clinically meaningful outcomes. J Clin Oncol. 2014;32:1277–1280.
21. Jenei K, Aziz Z, Booth C, et al. Cancer medicines on the WHO Model List of Essential Medicines: processes, challenges, and a way forward. Lancet Glob Health. 2022;10:e1860–e1866.
22. Copay AG, Subach BR, Glassman SD, et al. Understanding the minimum clinically important difference: a review of concepts and methods. Spine J. 2007;7:541–546.
23. Osoba D, Rodrigues G, Myles J, et al. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998;16:139–144.
24. Tsujimoto Y, Fujii T, Tsutsumi Y, et al. Minimal important changes in standard deviation units are highly variable and no universally applicable value can be determined. J Clin Epidemiol. 2022;145:92–100.
25. Revicki D, Hays RD, Cella D, et al. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61:102–109.
26. Ekström M, Ferreira D, Chang S, et al. Effect of Regular, Low-Dose, Extended-release Morphine on Chronic Breathlessness in Chronic Obstructive Pulmonary Disease: The BEAMS Randomized Clinical Trial. JAMA. 2022;328:2022–2032.
27. Ekström M, Johnson MJ, Huang C, et al. Minimal clinically important differences in average, best, worst and current intensity and unpleasantness of chronic breathlessness. Eur Respir J. 2020;56. DOI: 10.1183/13993003.02202-2019.
28. Ocana A, Tannock IF. When are "positive" clinical trials in oncology truly positive? J Natl Cancer Inst. 2011;103:16–20.
29. Nathan H, Pawlik TM. Limitations of claims and registry data in surgical oncology research. Ann Surg Oncol. 2008;15:415–423.
30. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav. 2018;2:6–10.
31. Muller K, Cohen J. Statistical power analysis for the behavioral sciences. Technometrics. 1989;31:499.
32. General Instructions. Oxford Academic. Available from: https://academic.oup.com/jnci/pages/general_instructions. Accessed December 23, 2022.
33. Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst. 1993;85:365–376.
34. Bohannon RW, Crouch R. Minimal clinically important difference for change in 6-minute walk test distance of adults with pathology: a systematic review. J Eval Clin Pract. 2017;23:377–381.
  • 35.Stolpner I, Heil J, Feißt M, et al. Clinical Validation of the BREAST-Q Breast-Conserving Therapy Module. Ann Surg Oncol. 2019;26:2759–2767. [DOI] [PubMed] [Google Scholar]
  • 36.Myles PS, Myles DB, Galagher W, et al. Minimal Clinically Important Difference for Three Quality of Recovery Scales. Anesthesiology. 2016;125:39–45. [DOI] [PubMed] [Google Scholar]
  • 37.Kon SSC, Canavan JL, Jones SE, et al. Minimum clinically important difference for the COPD Assessment Test: a prospective analysis. Lancet Respir Med. 2014;2:195–203. [DOI] [PubMed] [Google Scholar]
  • 38.Arora H, Willcocks RJ, Lott DJ, et al. Longitudinal timed function tests in Duchenne muscular dystrophy: ImagingDMD cohort natural history. Muscle Nerve. 2018;58:631–638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Carpelan A, Elamo E, Karvonen J, et al. Validation of the low anterior resection syndrome score in finnish patients: preliminary results on quality of life in different lars severity groups. Scand J Surg. 2021;110:414–419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Rothbarth J, Bemelman WA, Meijerink WJ, et al. What is the impact of fecal incontinence on quality of life? Dis Colon Rectum. 2001;44:67–71. [DOI] [PubMed] [Google Scholar]
  • 41.Salaffi F, Stancati A, Silvestri CA, et al. Minimal clinically important changes in chronic musculoskeletal pain intensity measured on a numerical rating scale. Eur J Pain. 2004;8:283–291. [DOI] [PubMed] [Google Scholar]
  • 42.Ogura K, Yakoub MA, Christ AB, et al. What Are the Minimum Clinically Important Differences in SF-36 Scores in Patients with Orthopaedic Oncologic Conditions? Clin Orthop Relat Res. 2020;478:2148–2158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Copay AG, Glassman SD, Subach BR, et al. Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry Disability Index, Medical Outcomes Study questionnaire Short Form 36, and pain scales. Spine J. 2008;8:968–974. [DOI] [PubMed] [Google Scholar]
  • 44.Krychman M, Rowan CG, Allan BB, et al. Effect of Single-Session, Cryogen-Cooled Monopolar Radiofrequency Therapy on Sexual Function in Women with Vaginal Laxity: The VIVEVE I Trial. J Womens Health. 2018;27:297–304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Kvaal K, Ulstein I, Nordhus IH, et al. The Spielberger State-Trait Anxiety Inventory (STAI): the state scale in detecting mental disorders in geriatric patients. Int J Geriatr Psychiatry. 2005;20:629–634. [DOI] [PubMed] [Google Scholar]
  • 46.Hutcheson KA, Barrow MP, Lisec A, et al. What is a clinically relevant difference in MDADI scores between groups of head and neck cancer patients? Laryngoscope. 2016;126:1108–1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Skolarus TA, Dunn RL, Sanda MG, et al. Minimally important difference for the Expanded Prostate Cancer Index Composite Short Form. Urology. 2015;85:101–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Mah K, Swami N, Le LW, et al. Validation of the 7-item Functional Assessment of Cancer Therapy-General (FACT-G7) as a short measure of quality of life in patients with advanced cancer. Cancer. 2020;126:3750–3757. [DOI] [PubMed] [Google Scholar]
  • 49.Liapi A, Mavrantonis C, Lazaridis P, et al. Validation and comparative assessment of low anterior resection syndrome questionnaires in Greek rectal cancer patients. Ann Gastroenterol Hepatol. 2019;32:185–192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.White D, Leach C, Sims R, et al. Validation of the Hospital Anxiety and Depression Scale for use with adolescents. Br J Psychiatry. 1999;175:452–454. [DOI] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Supplementary Table 1

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available, but are available from the corresponding author on reasonable request.
