[Preprint]. 2023 Apr 6:2023.04.05.23288189. [Version 1] doi: 10.1101/2023.04.05.23288189

Experiment aversion among clinicians and the public — an obstacle to evidence-based medicine and public health

Randi L Vogt 1,*, Patrick R Heck 1,*, Rebecca M Mestechkin 1, Pedram Heydari 1,2, Christopher F Chabris 1,†, Michelle N Meyer 1,†,§
PMCID: PMC10104223  PMID: 37066423

Abstract

Background:

Randomized controlled trials (RCTs) are essential for determining the safety and efficacy of healthcare interventions. However, both laypeople and clinicians often demonstrate experiment aversion: preferring to implement either of two interventions for everyone rather than comparing them to determine which is best. We studied whether clinician and layperson views of pragmatic RCTs, for Covid-19 and other interventions, became more positive early in the pandemic, a period that heightened both the urgency of RCTs and public discussion of them.

Methods:

We conducted several survey studies with laypeople (total n=2,909) and two with clinicians (n=895; n=1,254) in 2020 and 2021. Participants read vignettes in which a hypothetical decision-maker who sought to improve health could choose to implement intervention A for all, implement intervention B for all, or experimentally compare A and B and implement the superior intervention. Participants rated and ranked the appropriateness of each decision.

Results:

Compared to our pre-pandemic results, we found no decrease in laypeople’s aversion to non-Covid-19 experiments involving catheterization checklists and hypertension drugs. Nor, on average, were laypeople or clinicians less averse to Covid-19 RCTs (concerning corticosteroid drugs, vaccines, intubation checklists, proning, school reopening, and mask protocols). Across all vignettes and samples, levels of experiment aversion ranged from 28% to 57%, while levels of experiment appreciation (in which the RCT is rated higher than the participant’s highest-rated intervention) ranged from only 6% to 35%.

Conclusions:

Advancing evidence-based medicine through pragmatic RCTs will require anticipating and addressing experiment aversion among both patients and healthcare professionals.

Introduction

Randomized controlled trials (RCTs) are crucial for understanding how to safely, effectively, and equitably prevent and treat disease and deliver healthcare. They have repeatedly upended conventional clinical wisdom and the results of observational studies,1 and are urgently needed to evaluate new technologies.2 However, RCTs often prove controversial, even when they compare interventions that are within the standard of care or otherwise unobjectionable, and about which the relevant expert community is in equipoise.3 Prestigious medical journals have recently published several trials—including SUPPORT4, FIRST5, and iCOMPARE6—that have received considerable criticism from physician-scientists, ethicists, and regulators in those journals7,8 and the public square.9–12 Although criticisms of RCTs can be complex and nuanced, many reflect a rejection of the very idea that an experiment was conducted, as opposed to simply giving everyone the allegedly superior intervention.

In prior studies—inspired by several “notorious RCTs,” including technology industry “A/B tests”13–15—we confirmed that substantial shares of both laypeople and clinicians can be averse to randomized evaluation of efforts to improve health. People rated a pragmatic RCT designed to compare the effectiveness of two interventions significantly lower than the average rating of implementing either one, untested, for everyone, a phenomenon we call the “A/B effect.”16 In some cases, the lower average rating of an experiment could be driven not by dislike of experiments, per se, but by the fact that many people believe one of its arms is inferior to the other,16,17 a belief that is often not evidence-based. We therefore also documented “experiment aversion”: rating an RCT comparing two interventions as worse than even one’s own least-preferred intervention.17 Both patterns of negative sentiments about experiments—including experiments judged to compare two unobjectionable interventions—can impede efforts to identify what does and does not work to improve health outcomes.

The Covid-19 pandemic presented a potential inflection point in attitudes towards health experimentation. In April 2020, 72 Covid-19 drug trials were already underway18 and RCTs became daily, front-page news. That sustained exposure might have educated people about RCTs, or made RCTs more normative. Separately, our previous research suggests that one cause of experiment aversion is an illusion of knowledge—a (mis)perception that experts already must know what works best, and should simply implement that. But Covid-19 was a novel disease, and—at least in the case of pharmaceutical interventions—no sensible person thought the correct treatments were already obvious. People therefore may be less averse to Covid-19 RCTs than to RCTs that test interventions against longstanding conditions or problems. On the other hand, because of the urgency attached to Covid-19, people may be more averse to Covid-19 RCTs, being even less inclined to risk giving someone a treatment that might turn out to “lose” in a comparison study.19,20 Finally, even if the pandemic did not affect public attitudes towards RCTs, it could have affected the attitudes of clinicians, many of whom were involved in Covid-19 research. Because clinicians strongly influence whether particular RCTs are conducted, their attitudes matter.

We investigated attitudes towards experimentation in the first year of the pandemic by conducting a series of preregistered studies between August 2020 and February 2021. First, we used decision-making vignettes from our previous work to ask whether the extraordinary publicity around Covid-19 RCTs reduced general healthcare experiment aversion by the public. Next, we adapted these vignettes to determine whether the public was averse to experimentation on pharmaceutical and/or non-pharmaceutical interventions (NPIs) for Covid-19. Finally, we recruited two large clinician samples to investigate how their attitudes compared to those of laypeople. All three studies were randomized survey experiments in which participants first read about a decision-maker faced with a problem who either implemented one of two interventions (A or B) or ran an experiment to compare them (and then implemented the superior one). Participants then evaluated how appropriate each of those three decisions was.

Methods

Lay Sentiments About Healthcare Experimentation

In August 2020, we used the CloudResearch service to recruit 700 crowd workers on Amazon Mechanical Turk to participate in a brief online survey. These services provide samples that are broadly representative of the U.S. population and are well accepted in social science research as yielding data of quality as good as or better than convenience samples such as student volunteers, with results similar to those of probability sampling methods.21,22

Each participant first read a vignette that described a problem that the decision-maker could address in three ways (see Table 1 for examples; see the Supplemental Appendix [SA] for text and motivations for all vignettes): by implementing intervention A for all patients (A); by implementing intervention B for all patients (B); or by conducting an experiment in which patients are randomly assigned to A or B and the superior intervention is then implemented for all (A/B). (Our vignettes are silent about whether consent will be obtained, but IRBs customarily waive consent when it would make low-risk pragmatic RCTs impracticable23; in separate work, we found that substantial shares of people object to such experiments even when we specify that consent will be obtained.24) Next, following standard methods in social and moral psychology for evaluating decisions,25 participants rated each option on a scale of appropriateness from 1 (“very inappropriate”) to 5 (“very appropriate”), with 3 as a neutral midpoint. Participants then rank-ordered the options from best to worst. We also collected demographic information, but found no substantial associations with it in any of our studies (Tables S8–S11).

Table 1.

Vignette text for Catheterization Safety Checklist and Ventilator Proning

(A) Catheterization Safety Checklist

Background: Some medical treatments require a doctor to insert a plastic tube into a large vein. These treatments can save lives, but they can also lead to deadly infections.
Intervention A: A hospital director wants to reduce these infections, so he decides to give each doctor who performs this procedure a new ID badge with a list of standard safety precautions for the procedure printed on the back. All patients having this procedure will then be treated by doctors with this list attached to their clothing.
Intervention B: A hospital director wants to reduce these infections, so he decides to hang a poster with a list of standard safety precautions for this procedure in all procedure rooms. All patients having this procedure will then be treated in rooms with this list posted on the wall.
A/B test: A hospital director thinks of two different ways to reduce these infections, so he decides to run an experiment by randomly assigning patients to one of two test conditions. Half of patients will be treated by doctors who have received a new ID badge with a list of standard safety precautions for the procedure printed on the back. The other half will be treated in rooms with a poster listing the same precautions hanging on the wall. After a year, the director will have all patients treated in whichever way turns out to have the highest survival rate.

(B) Ventilator Proning

Background: Some coronavirus (Covid-19) patients have to be sedated and placed on a ventilator to help them breathe. Even with a ventilator, these patients can have dangerously low blood oxygenation levels, which can result in death. Current standards suggest that laying ventilated patients on their stomach for 12–16 hours per day can reduce pressure on the lungs and might increase blood oxygen levels and improve survival rates.
Intervention A: A hospital director wants to save as many ventilated Covid-19 patients as possible, so he decides that all of these patients will be placed on their stomach for 12–13 hours per day.
Intervention B: A hospital director wants to save as many ventilated Covid-19 patients as possible, so he decides that all of these patients will be placed on their stomach for 15–16 hours per day.
A/B test: A hospital director thinks of two different ways to save as many ventilated Covid-19 patients as possible, so he decides to run an experiment by randomly assigning ventilated Covid-19 patients to one of two test conditions. Half of these patients will be placed on their stomach for 12–13 hours per day. The other half of these patients will be placed on their stomach for 15–16 hours per day. After one month, the director will have all ventilated Covid-19 patients treated in whichever way turns out to have the highest survival rate.

Participants were randomly assigned to one of two vignettes. In Best Anti-Hypertensive Drug, some doctors in a walk-in clinic prescribe “Drug A” while others prescribe “Drug B” (both of which are affordable, tolerable, and FDA approved) and Dr. Jones prescribes either A or B for all his hypertensive patients, or runs a randomized experiment to compare their effectiveness. In Catheterization Safety Checklist, a hospital director similarly considers two locations where he might display a safety checklist for clinicians (see Table 1A).

We define the “A/B Effect” as the degree to which participants’ ratings of the A/B test were lower than the average of their ratings of implementing A and B.16 “Experiment aversion” is the degree to which participants rated the A/B test lower than their own lowest-rated intervention (either A or B for each person). “Experiment appreciation” is the opposite: the degree to which the experiment is rated higher than each participant’s highest-rated intervention. (See the SA for full details on power analyses and sample sizes, statistical analyses, materials, preregistrations, and data availability.)
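The three measures defined above are simple arithmetic contrasts over each participant's three ratings. The following is a minimal illustrative sketch (not the authors' analysis code; the function name and return structure are our own) of how they can be computed:

```python
def sentiment_measures(rating_a, rating_b, rating_ab):
    """Per-participant experiment-sentiment measures.

    rating_a, rating_b, rating_ab: appropriateness ratings (1-5) the
    participant gave to intervention A, intervention B, and the A/B test.
    Positive values of each measure indicate the named sentiment.
    """
    mean_intervention = (rating_a + rating_b) / 2
    return {
        # A/B Effect: A/B test rated below the average of A and B.
        "ab_effect": mean_intervention - rating_ab,
        # Experiment aversion: A/B test rated below one's lowest-rated intervention.
        "aversion": min(rating_a, rating_b) - rating_ab,
        # Experiment appreciation: A/B test rated above one's highest-rated intervention.
        "appreciation": rating_ab - max(rating_a, rating_b),
    }

# Example: a participant rating A = 4, B = 3, and the A/B test = 2
m = sentiment_measures(4, 3, 2)
# ab_effect = 3.5 - 2 = 1.5; aversion = 3 - 2 = 1; appreciation = 2 - 4 = -2
```

Note that aversion is a stricter criterion than the A/B Effect: a participant can contribute to the A/B Effect merely by disliking one arm, but counts as experiment averse only if the RCT is rated below even the arm they prefer least.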

Lay Sentiments About Covid-19 Healthcare Experimentation

Between August 2020 and January 2021, we recruited 2,209 additional laypeople in the same manner described above. They read, rated, and ranked six new vignettes involving Covid-19 interventions (N = 339–450 per vignette). Four vignettes were based on Covid-19-related interventions that were discussed, tested, and/or implemented at the time: Masking Rules (which described two masking policies, of varying scope); School Reopening (two school schedules designed to increase social distancing); Best Vaccine (two types of vaccine—mRNA versus inactivated virus); and Ventilator Proning (two protocols for positioning ventilated Covid-19 patients; see Table 1B). The other two vignettes—Intubation Safety Checklist and Best Corticosteroid Drug—were adapted from the first study to apply to Covid-19.

Clinician Sentiments About Covid-19 Healthcare Experimentation

Between November 2020 and February 2021, clinicians (including physicians, physician assistants, and nurse practitioners) in a large health system in the Northeastern U.S. were recruited by email to read, rate, and rank one of four Covid-19-related vignettes from the second study (Masking Rules: n = 349; Intubation Safety Checklist: n = 271; Best Corticosteroid Drug: n = 275; Best Vaccine: n = 1,254).

Results

Lay Sentiments About Healthcare Experimentation

We found substantial negative reactions to A/B testing in both vignettes (Table 2A), replicating our pre-pandemic findings.16,17 In Catheterization Safety Checklist (Figure 1A), we found evidence of the A/B Effect: participants rated the A/B test significantly below the average ratings they gave to implementing interventions A and B (d = 0.69, 95% CI: (0.53, 0.85)). Here, 41% ± 5% (95% CI) of participants expressed experiment aversion (rating the A/B test lower than their own lowest-rated intervention; d = 0.25, 95% CI: (0.11, 0.39)). When ranking the three options from best to worst, only 32% placed the A/B test first, while 48% placed it last.
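The effect sizes and margins reported throughout (e.g., d = 0.25 and 41% ± 5%) follow standard formulas; the SA specifies the exact analyses. As a hedged sketch of those conventions (our own helper names, assuming a paired Cohen's d and a normal-approximation interval for proportions):

```python
import math

def paired_cohens_d(diffs):
    """Standardized mean difference for paired data: mean(d_i) / sd(d_i),
    where d_i is each participant's within-subject rating difference."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var)

def proportion_ci_halfwidth(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion p
    estimated from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

# With roughly 350 respondents per vignette, an observed aversion rate of
# 41% yields a margin close to the reported +/- 5 percentage points.
margin = proportion_ci_halfwidth(0.41, 350)  # ~0.052
```

This is only meant to make the reported quantities concrete; it is not a reconstruction of the preregistered analysis pipeline.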

Table 2.

Sentiments about experiments by vignette and population

Negative sentiment | Positive sentiment
Experiment Aversion | A/B Effect | More people averse than appreciative? | More people rank A/B test worst than best? | More people rank A/B test best than worst? | More people appreciative than averse? | Reverse A/B Effect | Experiment Appreciation
(A) Lay Sentiments About Healthcare Experimentation
 Catheterization Safety Checklist
 Best Anti-Hypertensive Drug
(B) Lay Sentiments About Covid-19 Healthcare Experimentation
 Ventilator Proning
 School Reopening
 Masking Rules
 Intubation Safety Checklist
 Best Corticosteroid Drug
 Best Vaccine
(C) Clinician Sentiments About Covid-19 Healthcare Experimentation
 Masking Rules
 Intubation Safety Checklist
 Best Corticosteroid Drug
 Best Vaccine ✓*

Note. The A/B Effect refers to the difference between the average rating of the two interventions and the rating of the A/B test. Experiment Aversion refers to the difference between the lowest-rated intervention and the rating of the A/B test. The Reverse A/B Effect refers to the difference between the rating of the A/B test and the average rating of the two interventions. Experiment Appreciation refers to the difference between the rating of the A/B test and the rating of the highest-rated intervention.

Checkmarks (✓) represent a statistically significant effect at p < .05. In one case, the checkmark is followed by an asterisk (*). This indicates that while the effect reaches statistical significance, the effect size is very small and might have only reached significance due to the large sample size (three times as large as that for other vignettes).

Variables to the right of the thick vertical line are the reverse of those on the left. If no checkmark appears in either of the corresponding columns to the left and right of the thick vertical line (e.g., “More people rank A/B test worst than best?” and “More people rank A/B test best than worst?”), that means that there is no significant difference (e.g., there is no statistically significant difference between the proportion of people who ranked the A/B test worst and the proportion of people who ranked the A/B test best).

Figure 1.


Lay Sentiments About Healthcare Experimentation

Note. (A) Percentages of participants objecting to implementing intervention A, intervention B, and the A/B test (objecting was defined as assigning a rating of 1 or 2, “very inappropriate” or “somewhat inappropriate,” on a 1–5 scale). (B) Mean appropriateness ratings, on a 1–5 scale, with SEs, for intervention A, intervention B, the highest-rated intervention, the average intervention, the lowest-rated intervention, and the A/B test.

We also observed an A/B Effect in Best Anti-Hypertensive Drug (Figure 1B; d = 0.52, 95% CI: (0.36, 0.68)), where 44% ± 5% also expressed experiment aversion (d = 0.46, 95% CI: (0.30, 0.52)). Notably, participants were averse to this experiment even though there is no reason to prefer “Drug A” to “Drug B,” and patients are effectively already randomized to A or B based on which clinician happens to see them—which occurs wherever unwarranted variation in practice determines treatments, such as walk-in clinics and emergency departments. Here, however, similar proportions of people ranked the A/B test best and worst (50% vs. 45%; p = 0.16).

These levels of experiment aversion near the height of the pandemic were slightly (but not significantly) higher than those we observed among similar laypeople in 2019 (41% ± 5% in 2020 vs. 37% ± 6% in 2019 for Catheterization Safety Checklist, p = 0.31; 44% ± 5% in 2020 vs. 40% ± 6% in 2019 for Best Anti-Hypertensive Drug, p = 0.32).17

Lay Sentiments About Covid-19 Specific Healthcare Experimentation

In all six Covid-19 vignettes, we found evidence of the A/B Effect (Table 2B). In three, however, we did not find experiment aversion: Best Vaccine, Best Corticosteroid Drug, and School Reopening. In the first two of these, participants rated the two interventions very similarly and the experiment only slightly lower (Figure 2B). These vignettes also elicited the largest proportion of participants (65% in Best Vaccine and 56% in Best Corticosteroid Drug) in any vignette who ranked the A/B test best among the three options, compared to 31–34% of participants who ranked it worst. In School Reopening, experiment aversion was not observed because participants on average clearly preferred intervention B to A and rated the experiment similarly to intervention A.26,27 A majority (53%) of participants ranked intervention B as the best of the three options (compared to 17% choosing intervention A and 30% choosing the A/B test).

Figure 2.


Lay Sentiments About Covid-19 Specific Healthcare Experimentation

Note. (A) Percentages of participants objecting to implementing intervention A, intervention B, and the A/B test (objecting was defined as assigning a rating of 1 or 2, “very inappropriate” or “somewhat inappropriate,” on a 1–5 scale). (B) Mean appropriateness ratings, on a 1–5 scale, with SEs, for intervention A, intervention B, the highest-rated intervention, the average intervention, the lowest-rated intervention, and the A/B test.

In the other three vignettes, participants rated the A/B test condition as significantly less appropriate than their lowest-rated intervention (Masking Rules: d = 0.56, 95% CI: (0.41, 0.71); Ventilator Proning: d = 0.17, 95% CI: (0.04, 0.30); Intubation Safety Checklist: d = 0.36, 95% CI: (0.21, 0.49)). These levels of aversion to Covid-19 RCTs are similar to the levels of aversion to non-Covid-19 RCTs both before17 and during the pandemic (see above).

Clinician Sentiments About Covid-19 Specific Healthcare Experimentation

We observed an A/B effect in all four vignettes. In two, clinicians, like laypeople, were also significantly experiment averse (Masking Rules: d = 0.74, 95% CI: (0.57, 0.91); Intubation Safety Checklist: d = 0.30, 95% CI: (0.15, 0.45)). In Best Vaccine, clinicians, like laypeople, did not show any significant difference in their ratings of the A/B test and their lowest-rated intervention (d = −0.03, 95% CI: (−0.10, 0.04)). Again, like laypeople, 58% of clinicians ranked the vaccine A/B test as the best of the three options, the highest proportion of any clinician-rated vignette.

Clinicians differed from laypeople in their response to Best Corticosteroid Drug. Laypeople did not show experiment aversion, but clinicians rated the A/B test as significantly less appropriate than their lowest-rated intervention (d = 0.49, 95% CI: (0.32, 0.66)). This difference may be due to clinicians’ greater familiarity with the treatment of Covid-19. Clinicians may also have seen an urgent need for any drugs to treat Covid-1920 and thus rated adopting a clear treatment intervention as more appropriate than an RCT.

Discussion

We found no diminution in general experiment aversion among laypeople during the first year of the Covid-19 pandemic, despite increased exposure to the nature and purpose of RCTs. Neither laypeople nor clinicians were overall less averse to Covid-19 experiments, despite the fact that confidence in anyone’s knowledge of what works should have been even more circumscribed than in the everyday contexts of hypertension and catheter infections. To the contrary, we found an A/B effect (the average rating of the RCT was lower than the average rating of the two policies) in all vignettes and samples. Most Covid-19 vignettes were met with experiment aversion (on average, participants rated the RCT lower than each participant’s lowest-rated intervention). This is consistent with an emphasis during the pandemic that we must “do” instead of “learn,” a false dichotomy that fails to recognize that implementing an untested intervention is itself a nonconsensual experiment from which, unlike an RCT, little or nothing can be learned.2830 Similarly, across all vignettes and samples, between 28% and 57% of participants demonstrated experiment aversion, while only 6%–35% demonstrated experiment appreciation (by rating the RCT higher than their highest-rated intervention).

In none of our 12 studies were more people appreciative of than averse to the RCT, in none was the average RCT rating higher than the average intervention rating, and in none was the RCT rating higher than each participant’s highest-rated intervention, on average. Notably, unlike trials with placebo or no-contact controls, the A/B tests in our vignettes compared two active, plausible interventions, neither of which was obviously known ex ante to be superior. Yet substantial shares of participants still preferred that one intervention simply be implemented without bothering to determine which (if either) worked best.

There is one bright spot in our results: the most positive sentiment towards experiments was observed in both laypeople and clinicians in the vignettes involving Covid-19 drugs and vaccines. Here we observed the highest proportions of participants who demonstrated experiment appreciation (31%–46%) and who ranked the RCT first (49%–65%). This result is consistent with our previous findings that the illusion of knowledge—here, the belief that either the participant herself or some expert already does or should know the right thing to do and should simply do it—biases people to prefer universal intervention implementation to RCTs.16,17 Rightly or wrongly, both laypeople and clinicians might (a) appropriately recognize that near the start of a pandemic, no one knows which existing drugs, if any, are safe and effective in treating a novel disease, and that new vaccines need to be tested, yet (b) fail to sufficiently appreciate the level of uncertainty around NPIs like masking, proning, and social distancing, which can also benefit from rigorous evaluation. This is consistent with the dearth of RCTs of Covid-19 NPIs:31 of the more than 4,000 Covid-19 trials registered worldwide as of August 2021, only 41 tested NPIs.32 Explaining critical concepts like clinical equipoise or unwarranted variation in medical and NPI practice alike might diminish experiment aversion.

Critics note that RCTs have limited external validity when they employ overly selective inclusion/exclusion criteria or are executed in ways that deviate from how interventions would be operationalized in diverse, real-world settings. However, the solution is not to abandon randomized evaluation, but to incorporate it into routine clinical care and healthcare delivery via pragmatic RCTs.1,33 It has been many years since the Institute of Medicine urged research of many varieties to be embedded in care.34 More recently, the FDA established a Real-World Evidence Program that promotes pragmatic RCTs to support post-marketing monitoring and other regulatory decision-making.35,36 Pragmatic RCTs have been fielded successfully and informed healthcare practice and policy,3739 but they remain far from ubiquitous and they require buy-in to be successful, as shown by the case of a Denmark school reopening trial that was abandoned due to lack of such support.40 Wider use of pragmatic RCTs will require not only redoubling investment in interoperable electronic health records and recalibrating regulators’ views of the comparative risks of research versus idiosyncratic practice variation,1 but also anticipating and addressing experiment aversion among patients and healthcare professionals.

Supplementary Material


Figure 3.


Clinician Sentiments About Covid-19 Specific Healthcare Experimentation

Note. (A) Percentages of participants objecting to implementing intervention A, intervention B, and the A/B test (objecting was defined as assigning a rating of 1 or 2, “very inappropriate” or “somewhat inappropriate,” on a 1–5 scale). (B) Mean appropriateness ratings, on a 1–5 scale, with SEs, for intervention A, intervention B, the highest-rated intervention, the average intervention, the lowest-rated intervention, and the A/B test.

Acknowledgements

Supported by Office of the Director, National Institutes of Health (NIH) (3P30AG034532-13S1) and funded by the Food and Drug Administration (FDA). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the FDA. We thank Daniel Rosica and Tamara Gjorgjieva for excellent research assistance. Vogt and Heck contributed equally to this work. Meyer and Chabris contributed equally to this work.

Data availability

Participant response data, preregistrations, materials, and analysis code have been deposited in Open Science Framework and will be released upon final publication of this paper.

References


