In January of 1927, Dr. Richard D. Mudd of Detroit published a letter in the Journal of the American Medical Association, seeking to vindicate his grandfather, Dr. Samuel A. Mudd, against charges of conspiring in a murder [1]. The victim was U.S. President Abraham Lincoln; the murderer, actor John Wilkes Booth (see Appendix). In this editorial, I, an erstwhile actor, would like to vindicate my own grandfather, Dr. John Rosslyn Earp, for a letter he published on the same day, just one column over, in the very same issue of the journal [2]. But I mean “vindicate” in its other sense—to prove correct—as we shall see.
I never knew my grandfather. He died in 1941 at the age of 49, more than four decades before I was born. My father, his son, hardly knew him either: he was only 7 when “Ros” passed away from longstanding health problems, leaving him and his siblings to the care of their mother. I had been told that Grandpa Earp—no relation to Wyatt—was at one point the Director of Public Health for the State of New Mexico [3]. I knew that he'd emigrated from somewhere in England around the turn of the last century. That, and an impression I had from an old photographic proof balanced atop a bookcase in my childhood home, was about it (Figure 1).
Figure 1. Photograph of John Rosslyn Earp, taken circa 1930.
In 2013, I took a break from my acting career to study the history and philosophy of science at the University of Cambridge. My preoccupation at the time, which has not abated, was the public and professional “crisis of confidence” affecting, among other fields, medicine and social psychology [4-6]. The term “crisis of confidence” refers to the “unprecedented level of doubt” experienced by many contemporary scientists about the reliability of reported findings in the literature [7].
Why all the doubt? There are several reasons. Anonymous surveys of practicing scientists have shown widespread use of “questionable research practices,” including “p-hacking,” selective reporting of measures or outcomes, and HARKing—hypothesizing after the results are known—all of which increase the likelihood of generating Type I errors [8-11]. Moreover, critiques have been raised about the reward structure of science, which favors non-stop “productivity” and headline-grabbing conclusions over painstaking methodology [12-15]. And a series of high-profile apparent failures to replicate major findings from prior studies has sent shockwaves through the scientific community [16,17].
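To make the inflation of Type I errors concrete, here is a minimal simulation sketch, written in Python purely for illustration (the outcome count, trial count, and independence assumption are my own choices, not figures from the cited surveys). It models one questionable practice: measuring several outcomes in an experiment where no true effect exists, then reporting only whichever outcome happens to reach p < 0.05. The sketch exploits the fact that, under a true null hypothesis, the p-value of a continuous test is uniformly distributed on [0, 1].

```python
# Illustrative simulation (assumed setup, not from any cited study):
# how reporting only the "best" of several outcomes inflates the
# Type I error rate above the nominal 5%.
import random

ALPHA = 0.05          # nominal significance threshold
N_OUTCOMES = 5        # outcomes measured per experiment (assumed independent)
N_EXPERIMENTS = 100_000

false_positives = 0
for _ in range(N_EXPERIMENTS):
    # Under a true null, each outcome's p-value is uniform on [0, 1].
    p_values = [random.random() for _ in range(N_OUTCOMES)]
    if min(p_values) < ALPHA:  # report whichever outcome "worked"
        false_positives += 1

print(f"Nominal Type I error rate: {ALPHA:.1%}")
print(f"Simulated rate with selective reporting: "
      f"{false_positives / N_EXPERIMENTS:.1%}")
print(f"Analytic value: {1 - (1 - ALPHA) ** N_OUTCOMES:.1%}")  # ~22.6%
```

Under these assumptions, the effective false-positive rate climbs from the nominal 5% to roughly 23%; running several analyses and reporting only the one that “worked” has the same structure.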
All of this has combined to create a sense of genuine worry: how much of what we think we know do we actually know? Controversially, at least one prominent meta-scientist, John Ioannidis, has estimated that “most published research findings are false” [18].
The hardest-hit field seems to be psychology (which to its credit has also taken up the vanguard for reform) [19,20], with biomedicine and related disciplines trailing not so far behind [21-23]. Since I had studied the former subject as an undergraduate student, I was familiar with an eerily similar crisis in that field from the 1970s, as a result of which leading practitioners sought to root out problems in the way they conducted, evaluated, and published their empirical research [24]. One of the biggest problems to get spotlight treatment was the failure of most journals to publish “negative” results.
In a now-famous article published in 1975, Professor Anthony Greenwald, then of Ohio State University, discussed what he called the “Consequences of prejudice against the null hypothesis” [25]. As he wrote, the lack of a dependable “home” for negative findings creates “a dysfunctional research-publication system.” Not only are there “relatively few publications on problems for which the null hypothesis is (at least to a reasonable approximation) true,” but, even among those, “a high proportion will erroneously reject the null hypothesis.”
In short, Greenwald identified what is now termed “publication bias” in favor of “statistically significant” findings—a bias that has featured prominently in contemporary discussions about the potential causes of the so-called “replication crisis” [26-28].
The idea is simple. If 20 labs, say, run essentially the same experiment, and only one of them gets it to “work,” chances are good that the apparent finding from this one “lucky” lab is actually a statistical fluke. But since journals—and especially high-impact journals—have had a historical tendency to publish only positive findings, it is this probably-a-fluke result that will end up enshrined in the scientific record [29].
The “negative” results, by contrast, from the 19 other labs in our dummy example—or perhaps the 19 previous versions of the same study from the original lab, recast as “pilot” experiments when they didn't pan out—won't typically be written up and submitted, much less published in a prominent journal. Instead, they get “filed away” in the researcher's bottom drawer (the so-called “file drawer” problem), never to be seen again [30,31].
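The arithmetic behind this 20-lab thought experiment is easy to verify. Here is a back-of-the-envelope sketch in Python (the figures, 20 independent labs and a 0.05 significance threshold, are simply the assumptions of the hypothetical above, not data from any cited study):

```python
# Back-of-the-envelope check of the hypothetical 20-lab scenario:
# a truly absent effect, with each lab testing independently at alpha.
alpha = 0.05
n_labs = 20

# Probability that at least one lab obtains a "significant" fluke:
p_at_least_one = 1 - (1 - alpha) ** n_labs
print(f"P(>=1 false positive across {n_labs} labs) = {p_at_least_one:.1%}")  # ~64.2%

# Expected number of labs with a publishable-looking fluke:
print(f"Expected false positives = {alpha * n_labs:.1f}")  # 1.0
```

In other words, under these assumptions the odds are nearly two to one that some lab gets a “positive” result by chance alone; if journals then publish only that result, the record preserves exactly the wrong study.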
The literature, then, gets skewed in the direction of impressive-looking errors, which, for obvious reasons, can't be replicated later on. In a clinical context, this “skew” may have serious ethical implications for the protection of patient health and well-being. As the editor-in-chief of this journal notes, “selective publication [of] trials can skew the apparent risk-benefit ratio of the drug towards the latter and generate an unrealistic bias, thereby potentially slanting the accuracy of evidence-based medicine” [32].
Needless to say, medical treatments need to be based on accurate research. Basing them on something else is not only unethical (because of the unjustified risk it poses to patients and study participants); it is also an extraordinary waste of resources [33]. Selectively publishing “positive” findings makes these problems worse.
So what can be done? In the course of researching this issue, I stumbled across a paper with a pertinent title that I thought might offer a solution: “The Need for Reporting Negative Results.” The source? Journal of the American Medical Association—volume 88, number 2. The year? 1927. The author? J. R. Earp, my grandfather [2].
I had no idea he had ever written on the subject (to speak of chills and spines is to get it right). What follows, then, is his prophetic letter in full, with a few minor edits for ease of reading:
To the Editor:—One of the things we practitioners sometimes neglect is the reporting of failures. In THE JOURNAL, Oct. 2, 1926, Dr. Richard L. Sutton, with proper scientific reserve, reported the treatment of six consecutive cases of warts with intramuscular injections of sulpharsphenamine. As a result of this communication, I venture to guess that not less than a hundred physicians, perhaps several hundred, injected sulpharsphenamine into patients with warts. Supposing that 99 per cent get negative results, what happens? Each of them gives up the method as a failure and does not say anything more about it, and the treatment remains on record as an undisputed success. Possibly 1 per cent who meet with success will communicate with Dr. Sutton, so that by and by he will have quite an impressive series of cases, comparable with the mercurochrome successes published in a recent number of THE JOURNAL. ...
To practice what I am preaching, let me now report that on November 30, I injected 0.4 g of sulpharsphenamine [into] the left buttock of E. M. B., a girl, aged 18, who was at that date complaining of the presence of twenty-four warts distributed mostly over the hands and arms. At the present date, there are twenty-eight warts, and evidence of regressive changes in the original twenty-four has not been seen.
The problem is plain to see; the “need for reporting negative results” is equally apparent [34]. But one-off letters to the editor by conscientious doctors like my grandfather will not suffice to address the root of the problem. What is needed is top-down leadership from journals themselves: not only passively allowing for the submission of negative findings, but actively welcoming them and even seeking them out. In fact, it should be no harder to publish a high-quality study with “null” results—including unsuccessful attempts at replication—than a high-quality study that purports to show an effect.
There are some signs of progress. Articles with “replication” in the title are now being published on a regular basis [35-42]; there is even a dedicated Journal of Articles in Support of the Null Hypothesis (although it is not especially well-known). But there is still a lot of room for improvement. In a recent review of 1151 journals, researchers found that only 3% explicitly stated that they accepted replications; 63% did not state as much but also did not discourage them; 33% discouraged them implicitly by stressing novelty in solicited submissions; and 1% actively frowned on replications by stating that they did not publish them [43].
Against this backdrop, where does the Journal of Clinical and Translational Research (JCTR) stand? In the founding editorial for this journal, the editor states that JCTR encourages the publication of negative results for two main reasons, in addition to counteracting the “skewing” problem already mentioned [32]:
(1) publication of negative data, especially when obtained in a technically sound study ... provides cues as to why a certain procedure or process did not work and steers research efforts away from failure. In that sense, something not working can be considered ‘part’ of the mechanism.
(2) negative results prevent colleagues from conducting redundant work, saving animals and valuable resources in the process. An expedient trajectory to the clinical setting, during which redundancy is minimized, is ultimately beneficial for everyone involved in translational and clinical research as well as the target group (i.e., patients).
It is with these points in mind that I am happy to introduce, on behalf of my co-editors Emma Bruns and Michal Heger— as well as the entire journal staff—this special issue dedicated entirely to the publication of negative results. Though I never had a chance to meet him, something tells me Grandpa would be proud.
APPENDIX
Letter from Dr. Mudd. JAMA. 1927;88:119.
References
- [1] Mudd RD. Dr. Mudd and the death of Lincoln. JAMA. 1927;88:119.
- [2] Earp JR. The need for reporting negative results. JAMA. 1927;88:119.
- [3] Editor. News from the field. Am J Public Health. 1937;27:755–758.
- [4] Baker M. Is there a reproducibility crisis? Nature. 2016;533:452–454. doi: 10.1038/533452a.
- [5] Earp BD, Trafimow D. Replication, falsification, and the crisis of confidence in social psychology. Front Psychol. 2015;6:1–11. doi: 10.3389/fpsyg.2015.00621.
- [6] Nosek BA, Errington TM. Making sense of replications. eLife. 2017;6:e23383. doi: 10.7554/eLife.23383.
- [7] Pashler H, Wagenmakers E. Editors' introduction to the special section on replicability in psychological science: a crisis of confidence? Perspect Psychol Sci. 2012;7:528–530. doi: 10.1177/1745691612465253.
- [8] John LK, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci. 2012;23:524–532. doi: 10.1177/0956797611430953.
- [9] Kerr NL. HARKing: hypothesizing after the results are known. Personal Soc Psychol Rev. 1998;2:196–217. doi: 10.1207/s15327957pspr0203_4.
- [10] Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLOS Biol. 2015;13:e1002106. doi: 10.1371/journal.pbio.1002106.
- [11] Trafimow D, Earp BD. Null hypothesis significance testing and Type I error: the domain problem. New Ideas Psychol. 2017;45:19–27.
- [12] Nosek BA, Spies JR, Motyl M. Scientific utopia II: restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci. 2012;7:615–631. doi: 10.1177/1745691612459058.
- [13] Munafo MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, du Sert NP, Simonsohn U, Wagenmakers E, Ware JJ, Ioannidis JPA. A manifesto for reproducible science. Nat Hum Behav. 2017;1:1–9. doi: 10.1038/s41562-016-0021.
- [14] Everett JAC, Earp BD. A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Front Psychol. 2015;6:1–4. doi: 10.3389/fpsyg.2015.01152.
- [15] Earp BD. The unbearable asymmetry of bullshit. Health Watch. 2016;Spring(101):4–5.
- [16] Yong E. Replication studies: bad copy. Nat News. 2012;485:298–300. doi: 10.1038/485298a.
- [17] Earp BD. What did the OSC replication initiative reveal about the crisis in psychology? BMC Psychol. 2016;4:1–19.
- [18] Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
- [19] Chambers C. The changing face of psychology. The Guardian. 2014 Jan 24. https://www.theguardian.com/science/head-quarters/2014/jan/24/the-changing-face-of-psychology
- [20] LeBel EP, Vanpaemel W, McCarthy RJ, Earp BD, Elson M. A unified framework to quantify the trustworthiness of empirical research. PsyArXiv. 2017. https://osf.io/preprints/psyarxiv/uwmr8
- [21] Engber D. Cancer research is broken. Slate. 2016 Apr 19. http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html
- [22] Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature. 2014;505:612–613. doi: 10.1038/505612a.
- [23] Lose G, Klarskov N. Why published research is untrustworthy. Int Urogynecol J. 2017; in press. doi: 10.1007/s00192-017-3389-1.
- [24] Elms AC. The crisis of confidence in social psychology. Am Psychol. 1975;30:967–976.
- [25] Greenwald AG. Consequences of prejudice against the null hypothesis. Psychol Bull. 1975;82:1–20.
- [26] Easterbrook PJ, Gopalan R, Berlin JA, Matthews DR. Publication bias in clinical research. The Lancet. 1991;337:867–872. doi: 10.1016/0140-6736(91)90201-y.
- [27] Francis G. Replication, statistical consistency, and publication bias. J Math Psychol. 2013;57:153–169.
- [28] Bakker M, van Dijk A, Wicherts JM. The rules of the game called psychological science. Perspect Psychol Sci. 2012;7:543–554. doi: 10.1177/1745691612459060.
- [29] Earp BD, Wilkinson D. The publication symmetry test: a simple editorial heuristic to combat publication bias. J Clin Transl Res. 2017;3: in press.
- [30] Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86:638–641.
- [31] Pautasso M. Worsening file-drawer problem in the abstracts of natural, medical and social science databases. Scientometrics. 2010;85:193–202.
- [32] Heger M. Editor's inaugural issue foreword: perspectives on translational and clinical research. J Clin Transl Res. 2015;1:1–5.
- [33] Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, Michie S, Moher D, Wager E. Reducing waste from incomplete or unusable reports of biomedical research. The Lancet. 2014;383:267–276. doi: 10.1016/S0140-6736(13)62228-X.
- [34] Earp BD, Everett JAC. How to fix psychology's replication crisis. The Chronicle of Higher Education. 2015 Oct 25. http://www.chronicle.com/article/How-to-Fix-psychologys/233857
- [35] Boekel W, Wagenmakers EJ, Belay L, Verhagen J, Brown S, Forstmann BU. A purely confirmatory replication study of structural brain-behavior correlations. Cortex. 2015;66:115–133. doi: 10.1016/j.cortex.2014.11.019.
- [36] Bostyn DH, Roets A. Trust, trolleys and social dilemmas: a replication study. J Exp Psychol Gen. 2017;146:e1–7. doi: 10.1037/xge0000295.
- [37] Castro VM, Kong SW, Clements CC, Brady R, Kaimal AJ, Doyle AE, Robinson EB, Churchill SE, Kohane IS, Perlis RH. Absence of evidence for increase in risk for autism or attention-deficit hyperactivity disorder following antidepressant exposure during pregnancy: a replication study. Transl Psychiatry. 2016;6:e708. doi: 10.1038/tp.2015.190.
- [38] Earp BD, Everett JAC, Madva EN, Hamlin JK. Out, damned spot: Can the “Macbeth Effect” be replicated? Basic Appl Soc Psychol. 2014;36:91–98.
- [39] Radke S, de Bruijn ERA. Does oxytocin affect mind-reading? A replication study. Psychoneuroendocrinology. 2015;60:75–81. doi: 10.1016/j.psyneuen.2015.06.006.
- [40] Renes RA, van der Weiden A, Prikken M, Kahn RS, Aarts H, van Haren NEM. Abnormalities in the experience of self-agency in schizophrenia: a replication study. Schizophr Res. 2015;164:210–213. doi: 10.1016/j.schres.2015.03.015.
- [41] Simeoni S, Hannah R, Daisuke S, Kawakami M, Gigli GL, Rothwell JC. Effects of quadripulse stimulation on human motor cortex excitability: a replication study. Brain Stimul. 2016;9:148–150. doi: 10.1016/j.brs.2015.10.007.
- [42] Gil-Gómez de Liaño B, Stablum F, Umiltà C. Can concurrent memory load reduce distraction? A replication study and beyond. J Exp Psychol Gen. 2016;145:e1. doi: 10.1037/xge0000131.
- [43] Martin GN, Clarke RM. Are psychology journals anti-replication? A snapshot of editorial practices. Front Psychol. 2017;8:1–6. doi: 10.3389/fpsyg.2017.00523.