Abstract
Despite persistent critiques of the rigor of surgical research, surgeons have actually pursued careful empirical studies for centuries. Their work has enriched not only surgical science but also the development of evidence-based medicine. From conducting landmark controlled trials, to using statistics, alternate patient allocation, randomization, and sham controls, surgeons have long embraced innovative trial approaches and played important roles in the development of key methods of randomized controlled trials (RCTs). However, historical contexts unique to surgery have shaped the implementation of RCTs in this field. Unlike the history of pharmaceuticals, in which substantial research funding has been devoted to testing new drugs before their approval, surgical trials have followed a different trajectory. New operations have repeatedly come into wide use in the absence of an RCT. On many occasions, when established procedures have become controversial, surgeons have then marshalled the resources to conduct RCTs reassessing the operations. Such trials have triggered powerful debates in which proponents of surgical RCTs battled against ingrained practices and preferences. In such cases, RCTs often were not decisive factors in determining the fate of surgical practices but supporting tools that followed and reflected changes in surgical judgment already underway. Surgical trialists also have encountered specific, recurring challenges, especially with the methodological and ethical complexity of blinded and sham-controlled trials. The history of surgical trials thus reveals major contributions from surgeons to the advancement of evidence-based medicine, as well as ongoing challenges. Strengthened and systematic support could advance the future of surgical RCTs.
MINI-ABSTRACT
Despite recurrent concerns about the rigor of surgical research, surgeons have made important contributions to the historical development of randomized controlled trials (RCTs). RCTs, however, have faced unique obstacles in surgery. As a result, surgeons have deployed them differently than did other physicians. RCTs have increasingly been used to reassess established but contested surgical procedures.
INTRODUCTION
Surgeons have a long history of critiquing the quality of their research. In his 1947 president’s address to the surgical section of the Royal Society of Medicine, British surgeon Sir Max Page declared that “in the course of its rapid development modern surgery has been over-dependent on judgments tinctured by the emotional reactions common to mankind, and it has largely failed to utilize statistical research.”1 Sixty years later, McGill University surgeon Jonathan Meakins noted the paucity of surgical RCTs: “Unlike in medicine, the randomized trial has not been the hallmark of excellence in the appraisal of surgical innovation—despite prominent advocates in the discipline.”2
Recent evidence has supported some of these critiques. A 2003 review, by concerned surgeons, of the five leading surgical journals found that only 3.4 per cent of published articles were RCTs.3 A 2012 review found methodological concerns in many surgical RCTs: over one-third did not identify their funding source, only 55 per cent included a power calculation, fewer than half were randomized, and only 40 per cent reported blinding. The authors concluded, “the methodologic quality of surgical trials, and their reporting, are in need of significant improvement.”4 These critiques suggest that surgeons have not fully embraced RCTs.
The history of clinical trials in surgery, however, actually tells a more complex story. As we will demonstrate, surgeons have long pursued rigorous investigative methods, implementing some of the earliest attempts at quantitative analysis and standardized controls. Their experiences have revealed compromises that often must be made between methodological ideals and clinical research realities. For instance, surgeons have contended with limited funding for surgical research (as compared with pharmaceutical research), difficult choices about allocating time to surgical research or practice, and ethical concerns about randomization and sham controls. By working through these challenges, surgeons have made substantial contributions to the methodological development of RCTs. Their work has occurred in distinct regulatory contexts: although RCTs have been required for the approval of new pharmaceuticals, RCTs have been encouraged but not legally mandated for new operations. As a result, RCTs have often been used at later stages in surgical research, for instance to challenge procedures that had already come into wide use. Such trials have not always had the desired results. While there are cases in which negative RCTs have debunked controversial operations, there are other cases in which negative results have failed to diminish enthusiasm for popular procedures.
Seen in historical perspective, the perceived modest uptake of RCTs by surgeons has not been due to indifference to methodological rigor but instead reflects the distinct challenges and limitations of surgical RCTs. A deeper understanding of the history of surgical trials offers important insights into the nature of clinical research and opportunities for enhancing the role of RCTs in surgery. We focus on the work of twentieth century surgeons in the US and to some extent the UK, where many key methodological innovations in surgery and groundbreaking trials occurred during this period. Even within this limited scope, we have had to be selective. Comparative analyses of the contributions of surgical researchers in other countries would be valuable.
THE LONG SEARCH FOR RIGOR IN SURGICAL RESEARCH
RCTs took shape in the mid-twentieth century as one result of a longer history of clinical trial innovation influenced by specific intellectual and social contexts. Surgical researchers played important roles in this process. The development of surgical research shows the distinct trajectory of multiple components of modern RCTs: controls, statistical analysis, and randomization.
Any examination of the long history of surgical research must cope with inevitable ambiguities about what counts as surgical research. While the status of “surgeon” is clearly delineated today, this was not the case before the twentieth century. What counts as “surgery” has also varied historically: while surgeons have long performed recognizable operations (e.g., amputation), they also performed less invasive procedures (e.g., smallpox inoculation or wound care) and worked to improve medical aspects of surgical interventions (e.g., adjuvant antibiotics). While we focus on trials of major operations by credentialed surgeons, our search for historical antecedents to surgical RCTs sometimes includes broader roles that surgeons have played in trial development, such as in studies of medical interventions by surgeons.
Deliberate analyses of the efficacy of surgical procedures date at least to the sixteenth century. In 1537, French surgeon Ambroise Paré conducted a battlefield comparison of two treatments for gunshot wounds: conventional treatment of cautery with elderberry oil versus a concoction of egg yolk, oil of roses, and turpentine. Although his study was somewhat accidental, as he resorted to the latter remedy only because he had exhausted his elderberry oil supply, he carefully observed and documented the comparative outcomes, finding the egg yolk concoction more effective. In the 1580s Seville surgeon Bartolomé Hidalgo de Agüero compared cleaning and closing wounds to the traditional approach of keeping wounds open to encourage “laudable pus.” The former reduced mortality; he appealed to “disinterested and dispassionate” observers to heed his results.5 Scottish surgeon James Lind has been celebrated for a carefully designed prospective, controlled study of remedies to prevent scurvy (Fig. 1). He tested six remedies—cider; dilute sulfuric acid; vinegar; sea water; lemons and oranges; and a paste of garlic, mustard, radish, and myrrh—among six pairs of sailors on the H.M.S. Salisbury in May 1747. Citrus performed best.6
FIGURE 1.
Portrait of James Lind (1716–1794), by G. Chalmers, c 1783. Wellcome Collection, available at https://wellcomecollection.org/works/q763qfax#licenseInformation. License via https://creativecommons.org/licenses/by/4.0/legalcode.
Surgeons also made important statistical research advances. Boston surgeon Zabdiel Boylston compared mortality statistics to argue that inoculation with smallpox was safer than acquiring the disease naturally (6 deaths among 242 inoculations versus 844 deaths among 5889 cases during a 1721–1722 epidemic).7 Nineteenth century surgeons quantified outcomes of amputations, surgical drainage of empyema, and tracheostomy for croup.6 Statistical debates were rarely conclusive. Skeptics of aggregated statistics inevitably emphasized the variability of individual patients. Different counters also often produced different counts, especially when protagonists omitted data or presented one-sided analyses, as occurred in fierce debates over Joseph Lister’s introduction of antiseptic surgical techniques.8
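Boylston's comparison can be reproduced as a short calculation. The sketch below uses only the counts quoted above; the function and variable names are illustrative, and the ratio is a crude, unadjusted comparison of the kind his contemporaries could make, not a modern analysis. By this arithmetic, mortality was about 2.5 per cent among the inoculated and about 14.3 per cent among those who acquired smallpox naturally, a difference of nearly sixfold.

```python
# Counts as quoted in the text for the 1721-1722 Boston epidemic.
inoculated_deaths, inoculated_total = 6, 242
natural_deaths, natural_total = 844, 5889


def fatality_rate(deaths, total):
    """Crude fatality rate: deaths divided by the number of people in the group."""
    return deaths / total


inoculated_rate = fatality_rate(inoculated_deaths, inoculated_total)
natural_rate = fatality_rate(natural_deaths, natural_total)

# How many times higher mortality was among natural cases than among the inoculated.
risk_ratio = natural_rate / inoculated_rate

print(f"Inoculated: {inoculated_rate:.1%}")
print(f"Natural:    {natural_rate:.1%}")
print(f"Risk ratio: {risk_ratio:.1f}")
```

Such a crude ratio cannot show whether the two groups were comparable to begin with, which is one reason aggregated statistics of this kind remained open to dispute.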
Eighteenth and nineteenth century scientists and clinicians used blinded controls to test therapeutic interventions, for example with trials of mesmerism, homeopathy, or testicular extracts. However, we have not found blinded surgical trials before 1953.9 Nonetheless, it is clear that surgeons took empirical analysis seriously. Surgery makes major demands on patients. They must accept the promise that cutting into their bodies will relieve suffering. Decisions to undergo surgery were especially fraught before anesthesia and asepsis. Surgeons felt the burden of advising patients to proceed. As Edward Alanson wrote in his 1782 text on amputations, “When we attempt to introduce any new and important deviations from the common mode of practice into general use, and particularly in a point of such consequence, as the directing almost a total change in the mode of performing and after-treating one of the principal operations in surgery, the public have a right to be fully acquainted with the author’s reasons and motives for such attempt; and such trials should likewise previously have been made, as are sufficient to demonstrate, that the doctrine recommended will bear the test of general experience.”6
HOSPITALS AND THE GROWTH OF SURGICAL CASE SERIES
Changes in the site and scale of nineteenth century surgical practice led to shifts in standards of surgical knowledge. Before 1850 most surgery took place outside of hospitals. Surgeons, often informally trained, operated in homes, on battlefields, or other sites of dire injuries. Most surgical knowledge derived from individual case reports or small case series. This changed in the late nineteenth century. The development and implementation of antisepsis and asepsis necessitated the hospitalization of surgery. Hospitals built dedicated operating theaters with sterile equipment, anesthesia apparatus, electric lights, and access to x-rays and laboratory tests (Fig. 2).5,10 As more patients underwent and survived hospital surgery, surgeons increased the volume and diversity of their procedures. Some took advantage of these changes to increase the scale and sophistication of their case series. The Mayo brothers, for instance, developed an elaborate case report system and published case series of unprecedented size (e.g., William Mayo’s 1906 account of 1500 gallbladder operations). Surgeons established their authority not based on elaborate trial design but on the magnitude of their experience.
FIGURE 2.
“The New Operating Theatre,” Great Northern Central Hospital, London, circa 1888–1911. Wellcome Collection, available at https://wellcomecollection.org/works/gcqafahs. License via https://creativecommons.org/licenses/by/4.0/legalcode.
Some surgeons recognized the limits of case series. When case studies focused narrowly on the experiences of hospitalized patients, they missed long-term consequences of operations. Theodor Kocher, for instance, did not recognize the problem of myxedema in thyroidectomy patients until it was reported by another surgeon.11 Different surgeons published competing case series. William Halsted, for instance, used case series to promote radical mastectomy in the 1880s-1890s, while skeptics soon published case series against the procedure.12 Australian surgeons used competing case series to debate prostate procedures in the 1930s without reaching consensus.13 Case series, in the end, proved too permissive: they were used to popularize many operations in the early twentieth century that later fell into disrepute, including operations for ptosis, constipation, or on endocrine glands and autonomic nerves.14 Alton Ochsner and Michael DeBakey warned about the risk of misplaced enthusiasm in 1937: “The proper evaluation of any new therapeutic procedure is always difficult, because one must always rely upon the published data. Obviously, the advocates and originators of a method will be most prolific in their writings, and their pardonable enthusiasm may unwittingly dull their critical judgment.”15
FROM ALTERNATE ALLOCATION TO RANDOMIZATION
Some surgeons recognized that high hospital operative volumes provided opportunities to conduct deliberate trials comparing two or more procedures.5 Standards for this kind of research were changing in the early twentieth century. Aware of how easily bias entered trials, physicians began to implement alternate allocation, a simple technique assigning patients who presented for care to either the experimental or the baseline treatment in alternating order. Investigators increasingly published alternate allocation trials from the 1890s onward.16 Surgeons adopted this technique as well. In the 1910s and 1920s, for instance, surgeons debated the work of New Jersey psychiatrist Henry Cotton, who claimed to root out occult infections that he believed caused mental illness by removing his patients’ teeth, tonsils, colons, and other organs (Fig. 3). George Kirby, director of the New York Psychiatric Institute, was skeptical of Cotton’s operations. He and colleagues conducted a trial alternately allocating 120 asylum patients to surgery or standard therapy. As they explained, this reduced “the study as nearly as possible to the terms of an experiment.” They saw no benefit from surgery.17
FIGURE 3.
One of the many popular press articles that promoted Henry Cotton’s theories. From The evening world (New York, NY), 24 July 1919. Available at Chronicling America: Historic American Newspapers, Library of Congress. https://chroniclingamerica.loc.gov/lccn/sn83030193/1919-07-24/ed-1/seq-16/. In the public domain.
Despite improvements in surgical trial design, many obstacles remained. Unlike drugs or vaccines, operations were difficult to standardize from one patient to the next (e.g., because of variations in patient anatomy and pathology), or from one surgeon to another (e.g., because of variations in skill and experience). This complicated consistency within alternate allocation trials and thus the generalizability of trial results. Nonetheless, relative to case series, alternate allocation trials offered improved internal accountability and bias control.
Carefully controlled trials, often with alternate allocation, appeared increasingly in surgical research. In 1936 Australian physician David Rosenthal tested whether calcium gluconate improved outcomes in tuberculosis patients after artificial pneumothorax, an intervention that sought to treat the infection by collapsing the most severely affected portions of the lung.18 Swedish surgeon Inga Lindgren began a prospective study of sympathetic ganglionectomy for angina pectoris in 1944. She compared 76 operated patients to 78 non-operated controls, finding similar results in both groups.19 In 1947 surgeons used alternate allocation to test penicillin in surgery.20 In 1950 six US Veterans Administration Hospitals began a prospective study of lobotomy versus psychoanalysis in patients with severe mental illness. The researchers concluded that lobotomy produced better results.21
It is not clear whether patients were aware of their participation in these alternate allocation trials. Late nineteenth and early twentieth century surgeons often obtained patient consent for surgery.13 However, when publishing trials, researchers did not indicate whether patients had consented to be research subjects. Documented informed consent only became standard practice in the second half of the twentieth century after regulatory bodies began to require it.5 While it is possible that early surgical trialists obtained participant consent without explicitly mentioning so in their publications, the absence of any indication of consent in trial reporting is revealing.
If implemented rigorously, alternate allocation approximates randomization. By the 1930s, however, researchers realized that selection bias sometimes compromised alternate allocation schemes. Some trialists subconsciously or intentionally manipulated patient allocation, for example by diverting the sickest patients to intervention groups.16 To remove this bias, pioneering researchers turned to randomization. In the 1940s, Austin Bradford Hill campaigned for RCTs, and while numerous researchers initially resisted the turn toward statistical rather than clinician research authority, others were enthused by the possibilities of more rigorous methods. In the 1940s-1950s, numerous large-scale landmark RCTs began to appear, including a number of important surgical RCTs.5 In March 1953, for instance, a Brooklyn team began a study of surgical and medical management of upper GI bleeding, comparing conservative management (bedrest, sedation, and a liquid diet), immediate intervention (transfusion and gastrectomy), and selective intervention (transfusion, followed by gastrectomy if shock persisted). They started with alternate allocation but settled in 1955 on “a random sampling technic using randomized cards.” They explained, “the pattern of therapy to be provided to any individual patient in the study must be truly random to achieve statistically sound conclusions.” Results from 403 patients found no statistically significant mortality differences among the groups.22
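The vulnerability of alternate allocation can be illustrated with a brief sketch. It assumes a two-arm trial and uses simple randomization (an independent fair draw for each patient) as a stand-in for random assignment; this is an illustrative assumption, not a reconstruction of the card-based method the Brooklyn team used. The function names and the seed are hypothetical. The point is that under strict alternation an enrolling clinician always knows the next assignment, which is what makes diverting particular patients possible.

```python
import random

ARMS = ("experimental", "control")


def alternate_allocation(n_patients):
    """Assign arms in strict alternating order of presentation."""
    return [ARMS[i % 2] for i in range(n_patients)]


def random_allocation(n_patients, seed=None):
    """Assign each patient by an independent fair draw (simple randomization)."""
    rng = random.Random(seed)
    return [rng.choice(ARMS) for _ in range(n_patients)]


def predictable_from_previous(assignments):
    """Fraction of assignments an enroller would guess correctly by assuming
    each patient receives the arm opposite to the previous patient's."""
    guesses = [ARMS[1 - ARMS.index(prev)] for prev in assignments[:-1]]
    hits = sum(g == actual for g, actual in zip(guesses, assignments[1:]))
    return hits / len(guesses)


alternating = alternate_allocation(100)
randomized = random_allocation(100, seed=1955)  # arbitrary seed for repeatability

print(predictable_from_previous(alternating))  # 1.0: the next arm is always known
print(predictable_from_previous(randomized))   # near one half on average
```

Foreknowledge of the next assignment is the opening for selection bias; random assignment removes it only if the sequence is also kept hidden from those enrolling patients.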
Four noteworthy surgical RCTs launched in 1958. A New Haven study randomized patients to prophylactic portocaval anastomosis or medical management for esophageal varices.23 A research group in Cambridge, England, randomized subjects to simple or radical mastectomy; all patients also received radiation treatment.24 A team in Seattle and one in Kansas City used random allocation and sham controls to test internal mammary artery ligation.25,26 This operation, which had dubious physiological rationale, was developed in Italy in 1939. When it was first performed in the United States in 1956, it met with popular press fanfare and skepticism from many surgeons. The RCTs found that IMA ligation and the sham surgery (a simple skin incision) both provided modest angina relief.27
Other important early surgical RCTs assessed operations for breast cancer and ear infections. The study often (erroneously) cited as the first surgical RCT28 began in 1959. Over the course of nearly a decade, nine investigators in Leeds and York randomly allocated 375 patients to one of three elective duodenal ulcer operations: vagotomy with gastroenterostomy, vagotomy with antrectomy, and subtotal gastrectomy. Surgical opinion was divided over the comparative efficacy of the procedures. Patients were enrolled who qualified for all three procedures, and prior to operating, surgeons received sealed envelopes randomly indicating which operation to perform. Independent clinical panels conducted post-operative patient evaluations, blinded as to which operation a patient had received. Uniform assessment criteria were also established. The trial ultimately found each operation equally unsatisfying.29
While randomization improved upon alternate allocation, some familiar problems persisted. Could the experiences of trialists be generalized and interpreted as a judgment of the absolute value of a procedure? Could surgeons reasonably dismiss unfavorable trial results, asserting that their own skills or techniques were superior to those of the trialists? Trials performed by different surgeons at multiple sites, such as those at York and Leeds, partially addressed these concerns. However, individual surgeons could still object that their unique techniques might produce different results.
FOUR LESSONS FROM THE EARLY HISTORY OF SURGICAL RCTS
This history raises important questions about the origins, scope, methodological challenges, and policy contexts of early surgical trials. First, despite different claims, it is unclear who deserves credit for the first surgical RCT. RCTs did not appear fully formed in the 1940s. They developed gradually as investigators implemented increasingly careful methods. Credit is due for each step, whether alternate allocation, randomization, or sham controls. Though we have sought the earliest examples of surgical RCTs, earlier trials might remain buried in the historical record.
Second, while surgeons recognized the value of blinding in principle, blinded trials have often been unrealistic in practice. Most medication trials are easily blinded: researchers prepare a control tablet with identical look, mouth feel, and, ideally, side effect profile as the experimental drug. However, blinded sham controls are only ethically possible for a narrow range of surgical interventions. The sham procedure should match the active operation in all outward appearances (e.g., anesthesia, incision, post-operative experiences) without posing significant patient risk. In a 1953 lobotomy variant study, a sham only required a skin incision and excision of a small piece of skull.30 Researchers conducted sham-controlled IMA ligation RCTs because they believed the sham procedure of local anesthesia with a small skin incision was an acceptable risk; it is not clear whether the patients were informed of the sham controls.25,26 Most other early surgical RCTs did not use blinded shams, as realistic sham controls for major, invasive procedures have never been considered ethical. Only in the past two decades have surgeons increasingly adopted blinding, deploying it in cleverly designed trials of carefully selected procedures.30
Third, surgeons did not exclusively study surgical procedures. Surgeons joined other physicians in trials of medications as adjuncts for surgery, including anesthesia, antibiotics, and other drugs for intra- and post-operative management. In 1949 investigators at the Birmingham Accident Hospital in England reported results of an RCT assessing penicillin in patients with infected fingers.31 A 1955 study randomized patients to three different antiemetics for post-operative vomiting management.32 A 1958 RCT found that adrenochrome monosemicarbazone did not reduce operative bleeding. The authors noted that the “results are disappointing, but the method by which they were collected and assessed did indicate the value of the blind controlled trial… It could be applied to the investigation of many other forms of treatment which are at present prescribed, without any factual basis for their value, both in surgery and in internal medicine.”33 If trials of medications used in surgery are considered surgical RCTs, the number of RCTs credited to surgeons would increase substantially.
Fourth, surgical RCTs developed in a distinct policy context. The 1962 Kefauver-Harris Amendments to the Food, Drug, and Cosmetic Act empowered the FDA to require evidence of drug efficacy from “well-controlled investigations.” By 1970 this was interpreted to mean RCTs. A veritable explosion of pharmaceutical RCTs soon followed.5 The FDA asserted authority over medical devices more slowly. Only in 1976 did it begin to require approval of new devices.34 The FDA has never regulated surgical procedures except, indirectly, those procedures that require new devices. Oversight of surgical innovation has been left to hospital review committees and professional societies. To the extent that surgeons have performed RCTs, they have acted on their own sense of scientific and professional obligation rather than in compliance with governmental regulatory requirements. While surgeons have deployed RCTs effectively in numerous settings, this has not been done systematically: although it is hard to quantify, it is likely that most surgical procedures now in use have not been tested by an RCT. This has fueled persistent calls for more and larger surgical trials.28
IMPACT OF RCTS ON SURGICAL PRACTICE
As RCTs rose to prominence, physicians and surgeons used them to resolve debates about therapeutic practice. RCTs became tools of knowledge production and professional self-regulation. In the 1970s, surgical and health services researchers began to study this history to assess the impact of surgical RCTs. In 1977, for instance, Boston surgeon Ernest Barsamian reflected on the RCTs of IMA ligation: “Rarely has any operation had its usefulness questioned at the zenith of its popularity in as decisive a test as that to which the mammary artery operation was subjected.” The rapid rise and fall of the procedure between 1956 and 1960 was “a vivid demonstration of the efficacy of a properly designed study in answering difficult questions about the value of a surgical procedure.” As a result of this trial, “the demand for properly controlled studies gained impetus and spread to other surgical and medical procedures.”27
The history of RCTs for other surgical procedures was more complex. Owen Wangensteen introduced gastric freezing to treat ulcers in 1961. By the end of 1963, over 10,000 gastric freezing procedures had been done. Many surgeons were skeptical. Some argued that the procedure froze nothing more than gastric contents; if it did freeze the stomach wall, it would produce dangerous necrosis. Surgeons published critical editorials and case series in 1963 and 1964. They also launched 20 comparative studies, six of which used sham controls and double-blinding. The American Gastroenterological Association sponsored the largest RCT, enrolling 160 patients. Published in 1969, it provided conclusive evidence against gastric freezing. When statistician Lillian Miao reviewed this history in 1977, she, like Barsamian, saw it as a triumph of surgical trials: “Through the collaborative effort on a carefully randomized investigation, the physicians reached a consensus whereupon the use of gastric freezing for the treatment of duodenal ulcer was discontinued.”35 However, competing histories soon emerged. Physician-scientist Harvey Fineberg noted that gastric freezing had already been in steep decline by 1965, before publication of the major RCTs. Fineberg felt that the gastric freezing saga illuminated not the power of RCTs, but their limitations. The 1969 RCT was “of little practical consequence, as if a marble tombstone were erected over the grave of a patient already several years deceased.”36
The history of radical mastectomy trials was even more contentious. In 1942 Philadelphia surgeon Isidor Ravdin called for a randomized assessment of radical mastectomy, but American surgeons did not pursue the challenge (Fig. 4). When postwar surgeons introduced more invasive extended radical mastectomies, skeptics renewed their calls for RCTs. Passionate debates ensued. Many surgeons who were already convinced of a preferable procedure characterized RCTs as entirely unethical.12 Roald Grant, a Bellevue Hospital surgeon who worked with the American Cancer Society, described RCTs as “Scientific Russian Roulette,” comparing them to Nazi experimentation.37 Some critics of radical mastectomy also rejected RCTs, arguing that existing case series already provided sufficient evidence against the procedure. Opponents of RCTs directed much vitriol toward Bernard Fisher, who led trials for the National Surgical Adjuvant Breast and Bowel Project. Fisher responded in remarks at the American Cancer Society in 1970: “I believe that all of us must get these clinical trials done as quickly as possible and not sit on our butts and continue year after year to go through this same type of masturbation.”12 As US surgeons debated the merits of trials, researchers in Denmark and England conducted RCTs finding no benefit of radical mastectomy over simple mastectomy plus radiation.38,39 Surgeons remained divided. As late as 1970, American surgeons still performed radical mastectomy on 80 per cent of women with breast cancer.
Writing in 1977, Oxford epidemiologist Klim McPherson and MIT molecular biologist Maurice Fox identified factors influencing high radical mastectomy rates, including a strong faith that the most extensive operations maximized the chance of removing all cancer cells and fee schedules providing much higher reimbursements for radical mastectomy.40 American researchers who questioned whether women were receiving optimal care eventually launched large trials of radical versus simple mastectomy in 1971, and of lumpectomy versus mastectomy in 1976. Rates of radical mastectomy in the United States, meanwhile, fell dramatically during the 1970s, before the American trials were published. This decline reflected the impact of European data, changing theories of disease, changing patient preferences, and patient empowerment.12 As had happened with gastric freezing, major RCT findings followed changes in surgical practice already underway.
FIGURE 4.
Breast cancer awareness poster produced by the American Society for the Control of Cancer, c. 1936. Source: National Museum of American History. Available at https://visualsonline.cancer.gov/details.cfm?imageid=1817. In the public domain.
The history of RCTs for coronary artery bypass grafting (CABG) offered a different lesson. After a sixty-year gestation, CABG burst onto the surgical scene in 1968. By 1977 over 100,000 procedures were performed annually in the United States.41 Skeptics, especially cardiologists, called for RCTs. Thomas Chalmers, who had directed the NIH Clinical Center in the 1960s, became an outspoken advocate for RCTs, popularizing his mantra “randomize the first patient.” He contended that limited comparative data existed, observing that of 152 published trials of pre-CABG coronary artery surgery, only two had been controlled: the RCTs debunking IMA ligation.42 Surgeons, meanwhile, acknowledged the potential of RCTs but explained why their confidence in CABG was justified without them. Everyone agreed that a sham controlled test (e.g., place but then ligate a saphenous graft) would be unethical, despite “strong scientific interest” in such a study.27
When the first large RCT comparing medication alone versus medication and CABG for chronic stable angina showed that the operation only provided a survival benefit for patients with severe coronary disease, many surgeons dismissed the results. They challenged methodological imperfections in the trial and argued, at a fundamental level, that RCTs simply were not appropriate for surgery. Lawrence Bonchek, for instance, explained that operations required refinement over many years and surgeons had to gain skill through many procedures before a new operation could be fairly tested: the call to randomize the first patient was inappropriate for surgery. Since only worthwhile procedures would survive this process of refinement and scrutiny, it would be inappropriate to conduct an RCT once the procedure had been perfected. As Bonchek concluded, “We should resist the almost religious fervor of those who would sanctify randomized studies as the only means of learning the truth … Modern medical therapy is sufficiently sophisticated so that only physiologically sound operations achieve wide use.”43 The history of CABG reveals the layered complexity of certain surgical RCTs. Some surgeons dismissed their necessity while others embraced them only to encounter a fundamental methodological paradox: surgeons did not want to participate in an RCT of a new operation until they had acquired sufficient expertise with that operation, but by the end of this initial learning curve it was likely that some surgeons would have either abandoned or adopted the operation, undermining both the rationale for an RCT and their willingness to participate in one.
Similar challenges persisted in more recent decades. For instance, surgeons at McGill University conducted an RCT of laparoscopic cholecystectomy in 1990 and 1991. They compared patients’ convalescence after laparoscopic surgery and mini-cholecystectomy and found significantly improved experiences in the laparoscopy group.44 The surgical profession valued the trial for quantifying the comparative outcomes and side effects of the two procedures. Yet the shift in surgeons’ preferences toward laparoscopic cholecystectomy had already taken place by the time trial results were published.45
LEGACIES OF SURGICAL RCTS
The expansion of surgical RCTs between the 1950s and 1990s left a mixed legacy. In cases such as IMA ligation, influential RCTs clarified the safety and efficacy of therapies that had launched based on expert testimonial rather than rigorous experimental validation. Many thought such trials should be done more often. In 1976 Seymour Perry, chair of the NIH Clinical Trials Committee, argued that more RCTs were needed to challenge unverified procedures. He listed many once-popular surgeries that had since been abandoned, including colectomy for epilepsy, bilateral hypogastric-artery ligation for pelvic hemorrhage, portacaval shunt for hepatic cirrhosis, renal-capsule stripping for acute renal failure, sympathectomy for asthma, adrenalectomy for essential hypertension, and wiring for aortic aneurysm.46 This was part of a broader intellectual transition in medical practice: authority increasingly shifted away from the opinions of revered expert practitioners toward the data of evidence-based medicine.5
However, RCT impacts were less clear in other cases. Definitive RCTs of gastric freezing and radical mastectomy were published after the procedures had fallen from favor: the trials added evidence in support of changes already underway. Similarly, publication of the laparoscopic cholecystectomy trial, after years of planning and conduct, confirmed existing shifts in procedure preference rather than instigating such shifts. CABG demonstrated that a new operation could thrive without, and even despite, RCTs. As Walsh McDermott wrote of surgical evidence in 1978, “If the condition to be treated presents an immediate threat to life, if the proposed treatment appears to be physiologically sound, if the treatment appears to be dramatically successful, and if no reasonable therapeutic alternative is available, its efficacy is apt to be considered self-evident.”47
Surgeons widely adopted RCTs as the gold standard for surgical research by the 1980s.5 RCTs became especially relevant as surgeons tackled subtler clinical problems associated with historical shifts in disease burdens toward chronic diseases. While RCTs might not be needed to demonstrate the efficacy of some self-evident procedures in acute settings (e.g., some operations for trauma or hemorrhage), they are especially valuable for operations in which the relevant outcomes include long-term survival, quality of life, or subjective experience.48 Countless surgical RCTs have now been performed. CABG, for instance, has become one of the most scrutinized of all operations, with scores of trials comparing different conduits, minimally invasive techniques, and the relative indications of CABG versus catheter-based interventions such as balloon angioplasty or coronary stents. Surgeons have continued to pursue sham-controlled trials, for example with important RCTs of neuron implants for Parkinson’s disease and of orthopedic procedures.30 The empirical basis for surgical practice is stronger than ever before.
Critical challenges remain. Numerous operations now in wide use have never been rigorously tested, which has raised concerns regarding the evidence base for these surgeries. A 2014 review of 53 placebo-controlled trials found that surgical procedures performed no better than placebos in just over half of the trials.49 This finding, however, may reflect selection bias: surgeons might only have been willing to submit operations to placebo-controlled trials when they had already become skeptical of their value. The RCTs simply confirmed what researchers had suspected. The authors of the 2014 review also noted another problem. Negative trials often had little impact on surgeons’ approaches: “most of the trials did not result in a major change in practice.”49
Surgical researchers, in principle, increasingly shifted away from case series and expert opinion and toward the RCT as the gold standard of evidence in the twentieth century. In practice, however, the transition was less decisive. Even as surgeons implemented RCTs, they recognized with ever-greater sophistication the obstacles that surgical trials face.2,3,48,50,51 Surgical procedures, which often promise a direct, mechanical fix, have self-evident efficacy for many patients and doctors. This makes it difficult for surgical researchers to maintain the equipoise that RCTs require. Financial conflicts of interest also have produced interesting problems and disincentives for RCTs. Pioneering surgeons historically used large case series to publicize new operations.5 These case series served as advertisements of the surgeons’ skills, attracting patients who could travel for surgery. Elite surgical centers such as the Mayo Clinic, the Cleveland Clinic, or the Texas Heart Institute certainly benefitted. Under traditional reimbursement systems, more publicity led to more patients, higher institutional revenues, and higher surgeon salaries. Given such contexts, surgical investigators often faced competing financial interests when assessing novel operations. Financial interests also could create subtle disincentives for surgeons to undertake RCTs of popular procedures if they feared that results might be worse than desired. Finally, financial interests could color how surgeons responded to trial results. Any trial could be critiqued, and critiques have been especially vociferous when RCTs reported unfavorable findings for popular and well-reimbursed procedures. Aligning reimbursement systems with evidence-based practice could help address these conflicts.
Methodological limitations have continued to complicate, and often undermine, the evidentiary power of surgical RCTs. Blinding has remained an often unachievable ideal. The surgical learning curve complicates when trials should be done. If done too early, RCTs might underestimate the potential efficacy of a new procedure that is promising but requires further mastery. If begun too late, surgeons and patients may be unwilling to participate.3 Additionally, since individual surgeons differ in skill, questions of generalizability have remained. Operations are complex interventions, with outcomes depending not just on the surgeon and procedure, but also on the patient’s anatomy and pathology; the surgical team of nurses, assistants, and anesthetists; post-operative management; and rehabilitation programs.51 Further, since operations, unlike drug compounds, are unfixed, they present a moving target for research. Long-term trial results can become obsolete before completion if surgeons have continued to refine the procedure in the meantime.52 Moreover, it can be challenging for trials to capture rare surgical complications. Even when RCTs are done, large registry studies covering thousands of patients can help augment their results on rare complications.
As surgeons have grappled with these methodological challenges, they have faced less pressure to do RCTs: no law requires RCTs of new operations. While third-party payers might someday demand rigorous evidence as a precondition for reimbursement,50 this has not happened systematically. Meanwhile, funding for surgical research has not matched that provided by the pharmaceutical industry for the trials required for new drug approval. Any attempt to expand surgical RCTs will require the investment of more time, funding, and other resources. Notwithstanding these challenges, many surgeons continue to conduct new trials and develop collaborative multicenter trial platforms that will facilitate future work.53
CONCLUSIONS
Despite frequent insinuations that surgeons have resisted evidence-based medicine, surgeons have actually played key roles for centuries in controlled trial development. As surgery itself has advanced since the nineteenth century with asepsis, hemostasis, new approaches, and meticulous techniques, surgical research also became more refined, evolving from case series to alternate allocation studies to modern RCTs by the mid-twentieth century. Although fewer RCTs have been done in surgery than in other areas of medicine, surgical trials have made significant contributions to surgical knowledge. Surgeons have also played critical roles in the methodological refinement of RCTs. Surgeons, for instance, helped clarify thinking on the importance of sham controls, even in difficult research conditions.
However, as a result of intellectual, pragmatic, and regulatory differences, surgical RCTs have had a unique history distinct from medical trials. In the absence of a regulatory mandate for surgical RCTs, they have been applied less systematically than in pharmaceutical research. This has not been because surgeons somehow have been anti-intellectual. As Harvard researchers wrote in 1977, “The reason is not that surgeons have been slow to accept new patterns of thought, but rather the very real conceptual, practical, ethical, and economic difficulties of carrying out in adequate numbers and sizes experiments involving complex surgical procedures in human beings.”54
Surgical RCTs certainly have limitations. Nonetheless, the fact that many surgical RCTs have been done, even in the absence of a legal mandate, demonstrates the utility of controlled trials for surgeons. Surgical RCTs have frequently functioned not to bring new operations into use but to consolidate consensus against increasingly contested techniques. Historical analysis has shown, and surgeons know too well, that past patients were exposed to the risks of operations that ultimately proved unnecessary, such as radical mastectomy, gastric freezing, or gastrectomy for ulcers. Thousands of patients could have been spared discomfort, pain, and other adverse consequences had robust RCTs (or other compelling evidence) been produced earlier in the histories of these procedures. This awareness contributes to the enthusiasm that surgeons now have for performing RCTs of contested procedures.
The history of surgical trials also suggests that RCTs are just one of several modes of knowledge production, and surgeons will continue to introduce new interventions in the absence of RCTs. This is sometimes justified. However, particularly when surgical interventions target subjective responses, such as relief of symptoms, or actuarial outcomes, such as improved long-term survival, RCTs can clarify the actual impacts of surgical interventions, distinguish these from placebo effects, and reduce bias. The financial conflicts of interest of surgery demand that surgeons remain scrupulous in assessments of the rationale and outcomes of procedures. Trialists also must think deliberately about the possibility of blinding and contend with surgical learning curves and the potential for ongoing innovation in technique over the course of a trial. While history reveals some seemingly intractable challenges for surgical RCTs, understanding the practical, intellectual, political, and economic contexts of historical surgical research also indicates potential areas for future progress. These include critically assessing the role for RCTs at the outset of novel operations, developing better funding mechanisms, aligning reimbursement with evidence-based outcomes, broadening discussions about conflicts of interest, and fostering a rigorous commitment to evidence within surgical research.
Sources of funding:
Bothwell: none
Jones: Funding for this Scholarly Works project was made possible by grant 1G13LM012053 from the National Library of Medicine, NIH, DHHS. The views expressed in any written publication, or other media, do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
Contributor Information
Laura E. Bothwell, Health Sciences Department, Worcester State University, Worcester, MA.
David S. Jones, Department of Global Health and Social Medicine, Harvard Medical School, Boston, MA.
REFERENCES
- 1. Page M. President’s address. Proc Roy Soc Med. 1948;41:113–118.
- 2. Meakins JL. Surgical research: act 3, answers. Lancet. 2009;374:1039–40.
- 3. Wente MN, Seiler CM, Uhl W, et al. Perspectives of evidence-based surgery. Digestive Surgery. 2003;20:263–269.
- 4. Wenner DM, Brody BA, Jarman AF, et al. Do surgical trials meet the scientific standards for clinical trials? J Am Coll Surg. 2012;215:722–730.
- 5. Bothwell L. The emergence of the randomized controlled trial: origins to 1980. Diss. Columbia University. New York: Academic Commons; 2014.
- 6. Tröhler U. “To improve the evidence of medicine”: the 18th century British origins of a critical approach. Edinburgh: Royal College of Physicians of Edinburgh; 2000.
- 7. Blake JB. The inoculation controversy in Boston, 1721–1722. New Engl Q. 1952;25:489–506.
- 8. Tröhler U. Statistics and the British controversy about the effects of Joseph Lister’s system of antisepsis for surgery, 1867–1890. JLL Bulletin: Commentaries on the history of treatment evaluation [serial online]. 2014. Available from: http://www.jameslindlibrary.org/articles/statistics-and-the-british-controversy-about-the-effects-of-joseph-listers-system-of-antisepsis-for-surgery-1867-1890/. Accessed January 30, 2019.
- 9. Podolsky SH, Jones DS, Kaptchuk TJ. From trials to trials: blinding, medicine, and honest adjudication. In: Robertson CT, Kesselheim AS, eds. Blinding as a solution to bias: strengthening biomedical science, forensic science, and law. Amsterdam: Elsevier; 2016:45–58.
- 10. Schlich T. Surgery, science and modernity: operating rooms and laboratories as spaces of control. Hist of Sci. 2007;45:231–256.
- 11. Schlich T. The origins of organ transplantation: surgery and laboratory science, 1880–1930. Rochester: University of Rochester Press; 2010.
- 12. Lerner BH. The breast cancer wars: fear, hope, and the pursuit of a cure in twentieth-century America. New York: Oxford University Press; 2011.
- 13. Wilde S. See one, do one, modify one: prostate surgery in the 1930s. Med Hist. 2004;48:351–366.
- 14. Barnes BA. Discarded operations: surgical innovation by trial and error. In: Barnes BA, Mosteller F, eds. Costs, risks, and benefits of surgery. New York: Oxford University Press; 1977:109–123.
- 15. Ochsner A, DeBakey M. The surgical treatment of coronary disease. Surgery. 1937;2:428–455.
- 16. Bothwell L, Podolsky S. The emergence of the randomized, controlled trial. NEJM. 2016;375:501–504.
- 17. Wessely S. Surgery for the treatment of psychiatric illness: the need to test untested theories. JLL Bulletin: Commentaries on the history of treatment evaluation [serial online]. 2009. Available from: http://www.jameslindlibrary.org/articles/surgery-for-the-treatment-of-psychiatric-illness-the-need-to-test-untested-theories/. Accessed January 30, 2019.
- 18. Rosenthal DB. The incidence of pleural effusion in artificial pneumothorax. BMJ. 1936;1:95–99.
- 19. Berglund E. Historical notes: the first control study on treatment of angina pectoris - Inga Lindgren in 1950. Int J Card. 1999;70:191–193.
- 20. Mitchell GAG. The value of penicillin in surgery. BMJ. 1947;1:41–45.
- 21. Pressman J. Last resort: psychosurgery and the limits of medicine. Cambridge: Cambridge University Press; 1998.
- 22. Enquist IF, Karlson KE, Dennis C, et al. Statistically valid ten-year comparative evaluation of three methods of management of massive gastroduodenal hemorrhage. Ann Surg. 1965;162:550–560.
- 23. Conn HO, Lindenmuth WW. Prophylactic portacaval anastomosis in cirrhotic patients with esophageal varices. NEJM. 1962;266:743–749.
- 24. Brinkley D, Haybittle JL. Treatment of stage-II carcinoma of the female breast. Lancet. 1966;288:291–295.
- 25. Cobb LA, Thomas GI, Dillard DD, et al. An evaluation of internal-mammary-artery ligation by a double-blind technic. NEJM. 1959;260:1115–1118.
- 26. Dimond EG, Kittle CF, Crockett JF. Comparison of internal mammary artery ligation and sham operation for angina pectoris. Am J Card. 1960;5:483–486.
- 27. Barsamian EM. The rise and fall of internal mammary artery ligation in the treatment of angina pectoris and the lessons learned. In: Barnes BA, Mosteller F, eds. Costs, risks, and benefits of surgery. New York: Oxford University Press; 1977:213–220.
- 28. Rahbari NN, Diener MK, Wente MN, Seiler CM. Development and perspectives of randomized controlled trials. Am J Surg. 2007;194(suppl):S148–S152.
- 29. Goligher JC, Pulvertaft CN, Watkinson G. Controlled trial of vagotomy and gastroenterostomy, vagotomy and antrectomy, and subtotal gastrectomy in elective treatment of duodenal ulcer. BMJ. 1964;1(5381):455–460.
- 30. Wartolowska KA, Beard DJ, Carr AJ. The use of placebos in controlled trials of surgical interventions: a brief history. J Roy Soc Med. 2018;111:177–182.
- 31. Harrison SH, Topley E, Lennard-Jones J. The value of systemic penicillin in finger pulp infections: a controlled trial of 169 cases. Lancet. 1949;1(6550):425–430.
- 32. Knapp MR, Beecher HK. Postanesthetic nausea, vomiting, and retching: evaluation of the antiemetic drugs dimenhydrinate (Dramamine), chlorpromazine, and pentobarbital sodium. JAMA. 1956;160:376–385.
- 33. Calnan J, Innes FLF. Control of bleeding at operation: a trial of adrenochrome monosemicarbazone (adrenoxyl). Brit J Plast Surg. 1958;11:87–96.
- 34. Carpenter D. Reputation and power: organizational image and pharmaceutical regulation at the FDA. Princeton: Princeton University Press; 2010.
- 35. Miao LL. Gastric freezing: an example of the evaluation of medical therapy by randomized clinical trials. In: Barnes BA, Mosteller F, eds. Costs, risks, and benefits of surgery. New York: Oxford University Press; 1977:198–211.
- 36. Fineberg HV. Gastric freezing--a study of diffusion of a medical innovation. In: National Research Council, Medical Technology and the Health Care System. Washington, DC: National Academy of Sciences; 1979:173–200.
- 37. Grant RN. Scientific Russian roulette. Ca. 1963;13:44–45.
- 38. Kaae S, Johansen H. Simple mastectomy plus postoperative irradiation by the method of McWhirter for mammary carcinoma. Prog Clin Cancer. 1965;1:453–461.
- 39. Brinkley D, Haybittle JL. Treatment of stage-II carcinoma of the female breast. Lancet. 1966;288:291–295.
- 40. McPherson K, Fox MS. Treatment of breast cancer. In: Barnes BA, Mosteller F, eds. Costs, risks, and benefits of surgery. New York: Oxford University Press; 1977:308–322.
- 41. Jones DS. Visions of a cure: visualization, clinical trials, and controversies in cardiac therapeutics, 1968–1998. Isis. 2000;91:504–541.
- 42. Chalmers TC. Randomization and coronary artery surgery. Ann Thor Surg. 1972;14:323–327.
- 43. Bonchek LI. Are randomized trials appropriate for evaluating new operations? NEJM. 1979;301:44–45.
- 44. Barkun JS, Barkun AN, Sampalis JS, et al. Randomised controlled trial of laparoscopic versus mini cholecystectomy. The McGill Gallstone Treatment Group. Lancet. 1992;340:1116–1119.
- 45. Tang CL, Schlich T. Surgical innovation and the multiple meanings of randomized controlled trials: the first RCT on minimally invasive cholecystectomy (1980–2000). J Hist Med All Sci. 2016;72(2):117–141.
- 46. Perry S. Clinical trials and the public’s interest. NIH Clinical Trials Committee, NIH Office of the Director Files. January 16, 1976.
- 47. McDermott W. Surgical Innovation and Its Evaluation. Science. 1978;200:937–941.
- 48. Meakins JL. Innovation in surgery: the rules of evidence. Am J Surg. 2002;183:399–405.
- 49. Wartolowska K, Judge A, Hopewell S, et al. Use of placebo controls in the evaluation of surgery: systematic review. BMJ. 2014;348:g3253.
- 50. McLeod RS, Wright JG, Solomon MJ, Hu X, Walters BC, Lossing A. Randomized controlled trials in surgery: issues and problems. Surgery. 1996;119:483–486.
- 51. Ergina PL, Cook JA, Blazeby JM, et al. Challenges in evaluating surgical innovation. Lancet. 2009;374:1097–1104.
- 52. Bothwell L, Greene J, Podolsky S, et al. Assessing the gold standard—lessons from the history of RCTs. NEJM. 2016;374:2175–2181.
- 53. Kron IL, LaPar DJ, Horvath KA. Cardiothoracic surgical trials network: evidence-based surgery. J Thor CV Surg. 2016;151:28–29.
- 54. Bunker JP, Barnes BA, Mosteller F, et al. Summary, conclusions, and recommendations. In: Barnes BA, Mosteller F, eds. Costs, risks, and benefits of surgery. New York: Oxford University Press; 1977:387–394.