Clinical and Translational Science. 2021 May 7;14(4):1210–1221. doi: 10.1111/cts.13050

Reimagining the peer‐review system for translational health science journals

Elise M Smith 1
PMCID: PMC8301572  PMID: 33963670

Abstract

Retractions of coronavirus disease 2019 (COVID‐19) papers in high impact journals, such as The Lancet and the New England Journal of Medicine, have been portrayed in the public media as major scientific fraud. The initial reaction to this news was to seek out scapegoats and blame individual authors, peer‐reviewers, editors, and journals for wrongdoing. This paper suggests that scapegoating a few individuals for faulty science is a myopic approach to the more profound problem with peer‐review. Peer‐review in its current limited form cannot be expected to adequately address the scope and complexity of large interdisciplinary research collaborations, which are central to translational research. In addition, empirical studies on the effectiveness of traditional peer‐review reveal its very real potential for bias and groupthink; as such, expectations regarding the capacity and effectiveness of the current peer‐review process are unrealistic. This paper proposes a new vision of peer‐review in translational science that would allow for early release of a manuscript to ensure expediency, while also creating a forum or collective of various experts to actively comment on, scrutinize, and even build on the research under review. The aim would be not only to generate open discussion and oversight respecting the quality and limitations of the research, but also to assess the extent to which, and the means by which, that knowledge can translate into social benefit.

INTRODUCTION

Given the urgency and severity of the pandemic, coronavirus disease 2019 (COVID‐19) research has been prioritized for increased public and private funding, new collaborations, and greater sharing of research resources (data, plasmids, repositories, and reagents). 1 , 2 , 3 There have also been systemic/procedural changes to promote the translation of science into practical clinical policy applications. The modifications include overlapping clinical trial phases, expedited institutional review board review procedures, increased use of US Food and Drug Administration (FDA) emergency use authorizations (EUAs), 4 a focus on repurposing already existing drugs, 5 and the lifting of journal paywalls to increase accessibility to research. 6 From December 2019 to May 2020, a total of 7440 manuscripts were made available online (published and as preprints), a veritable deluge of research studies. 7

Advances in COVID‐19 research have been highly mediatized, further amplifying the public's expectations of research for a cure and relief from the pandemic. The media has also reported on research's shortcomings as well as its breakthroughs; this includes notable retractions of COVID‐19 papers in high impact journals, such as The Lancet and the New England Journal of Medicine. Headlined as major scientific fraud, these retractions have undermined public trust in scientific knowledge. 8 , 9 According to Retraction Watch—which catalogues all retractions in peer‐reviewed journals—39 papers on COVID‐19 had been retracted as of November 23, 2020. 10 The system of quality control that serves as the gatekeeper of research integrity seems to have failed quite publicly during the COVID‐19 crisis; the term "Lancet‐Gate" has been invoked to cast as scandalous the retractions of hydroxychloroquine research during the COVID‐19 pandemic. 11 , 12 , 13 In order to root out and hold the guilty accountable, scapegoating ensued; authors, editors, and peer‐reviewers related to retracted articles were closely scrutinized.

This paper argues that scapegoating is not particularly effective in addressing the more systemic issues that account for peer‐review shortcomings. In considering the socio‐historical context of the peer‐review process, one recognizes that it was originally designed to assess small scale, single‐authored work. The assessment of today's complex ideas and Big Science studies often requires a more diversified and comprehensive skillset to ensure the required rigor and avoid bias. Expectations of traditional peer‐review are unrealistically high for contemporary large scale translational science, especially during a pandemic. This paper will compare expectations of the peer‐review process to those of a clinical trial to underscore the importance of establishing realistic expectations in research development and the testing of ideas. Recognizing and acknowledging that traditional peer‐review is not a "fail‐safe" method to ensure scientific quality, especially during a pandemic, is an important first step in exploring alternative approaches to increase the rigor of science. Ultimately, focus should shift away from traditional peer‐review toward a more continuous system of collaborative multistakeholder peer‐review embedded with values specific to translational research.

Any change to the status quo could be rejected as too time‐consuming—that it would prolong an already lengthy peer‐review process. In addition, it could be argued that it would be an ethical travesty to delay the public sharing of important knowledge that could directly save lives during a pandemic. However, this paper does not envisage or promote a longer process; rather, it identifies shortcomings of the existing peer‐review process and suggests modernizing it in a manner consistent with the values inherent to translational health science journals. Although this proposal may be rudimentary, especially with regard to its recommendations, its goal is to show the need for, and the possibility of, moving beyond the status quo. As such, it is an invitation to all scholars to think about novel quality control mechanisms to ensure that research products are safe, effective, and readily applicable.

FINDING THE SCAPEGOAT IN COVID‐19 HYDROXYCHLOROQUINE STUDIES

At the beginning of the pandemic, hydroxychloroquine was heavily mediatized as a readily available miracle drug to alleviate the symptoms of COVID‐19 or prevent the infection altogether. 14 Although the news was based on the unverified claims of politicians, it sparked great hope, and, subsequently, scientific experts were called upon to verify, confirm, or refute this claim. On May 22, 2020, The Lancet published a manuscript demonstrating an increased risk of ventricular arrhythmias after taking hydroxychloroquine for COVID‐19; this justified the suspension of various clinical trials, including sections of the World Health Organization Solidarity clinical trial. 15

At first, this conveyed the impression that science had prevailed to correct unfounded, expeditious, and possibly politically motivated opinion. However, The Lancet hydroxychloroquine manuscript itself was based on unverifiable and likely fabricated data. To make matters worse, these data had been used more broadly in various other peer‐reviewed papers, resulting in not one but three retractions in 1 week. Another paper, published in the New England Journal of Medicine on May 1, 2020, contradicted previous hypotheses by demonstrating that there was no increase in hospital‐death rates linked to the use of angiotensin‐converting enzyme (ACE) inhibitors and angiotensin‐receptor blockers (ARBs). 16 , 17 This was not only a blow to the prestige of the journals; it also fomented broader public doubt in the integrity and legitimacy of scientific knowledge. There was an urgent call to find the culprit(s) responsible for such shortcomings, ascribe accountability, and to some extent, "make things right." Because science is self‐policing, it was science's responsibility to take corrective action.

The first people to blame for a lack of oversight are logically the authors of the paper. The journal Science published a commentary by Charles Piller titled "Who's to blame? These three scientists are at the heart of the Surgisphere COVID‐19 scandal." 18 Indeed, as the named authors, they were responsible for the veracity and reliability of the research while also being accountable to publicly defend their work. 19 They had developed the hydroxychloroquine study using a registry of observational data from a private company named Surgisphere, owned by the main author, Sapan Desai. More than 100,000 COVID‐19 medical records from 671 hospitals were included in this study. Not only was this an unusually high number of records at the time of publication, but the number of deaths per country also did not match up with other sources. 20 The veracity of the data was called into question. Furthermore, when scholars and editors inquired, the data could not be provided to an auditor. Desai explained that the lack of transparency was justified due to confidentiality concerns. All in all, there was little evidence to support that the data source was reliable and transparent. Ultimately, in no way could the authors transparently demonstrate that their work was conducted reliably and credibly.

Public and scholarly media interviewed Desai's colleagues about his medical career. Critics mentioned that his integrity had been brought into question often over 12 years, stating for example that "his research patient data did not always match charts." 21 It was also stated that his "unreliability was an open secret." 21 Colleagues reported that Desai demonstrated questionable behavior and did not properly follow directives for treating patients. 22 Although his questionable integrity seemed to be common knowledge in various organizations, few would openly or publicly criticize Desai because of the power dynamics at play. There were other colleagues who praised him as a talented surgeon who performed with integrity. 18 As the central character and designated culprit in this affair, Desai ultimately had his academic affiliation terminated.

As the owner of Surgisphere, Desai would be the main target of scrutiny and blame, but other authors were also questioned. This included Mandeep Mehra (Brigham and Women's Hospital and Harvard Medical School) and Amit Patel (University of Utah). In light of criticisms of their paper, they decided to ask for an audit to review the Surgisphere data, confirm its completeness, and replicate the study. However, their request for an audit was refused by Desai, who stated that this would violate client agreements, including confidentiality concerns. 23 Basically, Desai's co‐authors—Mehra and Patel—appeared to be uninformed and somewhat removed from the situation. Although some may criticize this as irresponsible, they are surely not the first authors not to have access to raw data.

The case of Dr. Hwang Woo‐Suk (Seoul National University) provides an interesting comparison. Dr. Hwang published two articles in Science claiming to have effectively transferred DNA from research subjects' somatic cells to create embryonic stem cells 24 ; both articles were retracted due to falsification and unethical conduct with regard to human subjects. Dr. Hwang's co‐author, Dr. Gerald Schatten (University of Pittsburgh), did not face the same accusations and consequences. The investigation at Schatten's institution suggested that, although his behavior was questionable, his "unknowingness," or lack of knowledge of the actual misconduct, absolved him of any significant negative repercussions. Unknowingness of misconduct or problematic behavior in both cases—the hydroxychloroquine case and the stem cell case—protects co‐authors from condemnation for misconduct, specifically because, as required by US federal regulation (2 CFR § 910.132), they were deemed not to have acted intentionally, knowingly, or recklessly. Although co‐authors may effectively dodge any legal responsibility, their close association with a high‐profile case of misconduct for a published work can negatively impact a scholar's career. 25 As is the case with hydroxychloroquine, the damage may be felt throughout the system of science.

A share of the blame could also befall the gatekeepers of the scientific process, including the editors and peer‐reviewers who review the quality and importance of a manuscript. A recent study found that the length of time spent in the publication process for coronavirus articles decreased on average by 49% (or 57 days). 26 Although there is some concern as to the quality of research that has undergone more rapid peer‐review, there is no known correlation between the speed of the publication process and the quality of review. The increased number of COVID‐19 papers, as well as the public attention and the urgency of the topic itself, have certainly brought to bear extraordinary pressure on editors. Unfortunately, some very problematic publications have indeed made it through the publication process, including a paper by Fioranelli and colleagues on 5G technology and induction of coronavirus in skin cells. 27 This paper was quickly retracted by the editor for faulty peer‐review and its obvious scientific shortcomings.

Individual blame for egregious and intentional acts of misconduct may be useful to focus accountability, quickly censure bad actors, and also manage public perception. The notion that "heads did roll" may also be convenient in achieving closure and moving on. However, where multidisciplinary teamwork is involved, blaming one individual is akin to applying a "band aid" to deal with an isolated incident while overlooking the more complex collaborative relationships and responsibilities of other stakeholders in the scientific process. As such, it will fail to adequately "diagnose and remedy" the problem. Granted, the development of misconduct policies and laws that establish the legal liability of individual wrongdoers is necessary. However, this emphasis on individual action and responsibility is limited and myopic, and it is insufficient for resolving causal and systemic issues.

Historically, in the scholarship regarding responsible conduct of research, misconduct was often perceived as the actions of a few “bad apples.” 28 , 29 However, retraction rates due to misconduct are generally seen as the “tip of the iceberg,” an indication of further incidents of scientific mistakes, misconduct, and questionable ethical conduct. 28 The implications of this are significant. Retracted papers are those for which someone has identified an issue and journals have taken the time to correct the research record. However, there remains a significant amount of problematic science published and cited.

Blaming and shaming a few individuals without tackling the complex systemic issues is a cop‐out of sorts. Many have argued for a "beyond the bad apple approach," 29 which would imply a broader review of environmental and systemic considerations and, as such, a multidimensional approach to examining the research environment, its institutions, and stakeholders, including journals and knowledge users. Studies in the behavioral sciences generally demonstrate that problematic environmental pressures within the laboratory, team, and institutional research environment, as well as broader funding and workforce dynamics, can have significant impacts on ethical behavior in science. 30 It would truly take a collective commitment to improve the current peer‐review process.

UNREALISTIC EXPECTATIONS OF PEER‐REVIEW

Peer‐review has been held up as the "gold‐standard" quality control mechanism in the publication of contemporary science. 31 In this process, an editor selects independent experts in a given field to judge if research is deemed sufficiently novel, scientifically rigorous, and ethically sound. When a high impact journal has published a paper, its symbolic value to the author, to science, and to society increases markedly. Although an argument can be made that "good science" is inherently valuable, it is its publication that allows a paper to be read, shared, and embraced by the scientific community. The publication and translation of knowledge to clinical application is central to the raison d'être of translational science. Papers published in reputable, high impact journals will get more traction and thus will dictate the "knowledge" and "evidence" upon which we base our translational medicine best practices, policy decisions, and future scientific goals.

In the 1980s, various authors in scholarly research perceived peer‐review as a "black box." 32 The traditional Mertonian norms of impartiality were at the core of this process, in which a few independent reviewers were assumed to provide quality control that is impartial and free of bias. 33 It was not until the late 1990s, and more so after the 2000s, that peer‐review underwent greater empirical scrutiny by researchers who highlighted significant limitations.

Carole Lee and colleagues 34 have provided an extensive review and categorization of research on biases in peer‐review. Although their work includes peer‐review in a broader spectrum of activities (e.g., grant and fellowship application review, evaluation of book proposals, and assessment of teaching ability), the basic categorization of biases is present in the journal peer‐review process. Biases are divided into four categories: (1) bias regarding the quality of submission, (2) bias regarding the social characteristics of authors, (3) bias regarding the social characteristics of the editor, and (4) bias regarding the content of the study. 34 Bias regarding quality has been studied by assessing the consistency of reviews. Studies generally suggest that peer‐reviewers interpret and apply scientific criteria in an inconsistent manner. 35 , 36 The second and third categories consider bias linked to social characteristics; empirical studies have demonstrated significant bias linked to the country of origin of the author. 37 The fourth bias category is linked to content, which includes prioritizing positive outcomes as opposed to no‐difference outcome papers. 38 Content bias has also been noted to include the tendency of peer‐reviewers to promote very conservative papers that provide incremental change as opposed to creative papers that may seem different from what is traditionally published. 39 In sum, scholars looking at the "science of science" have concluded that peer‐review is far from fail‐proof. 35

Although there have not been any revolutionary shifts in peer‐review, there have been minor modifications and improvements. Editors and scholars have reflected on ways to increase training, minimize certain peer‐review biases, increase courtesy, manage conflicts of interest, and promote open review and transparency. 40 For example, to reduce bias, some journals have gone from single‐blind review (making peer‐reviewers anonymous) to double‐blind review (making both peer‐reviewers and authors anonymous). Anonymity of the reviewer is said to allow for open criticism without fear of retribution. However, many have criticized this notion, pointing to the lack of transparency and accountability of the reviewer. Anonymity of the author is said to allow peer‐review to be free of bias linked to gender, race, institutional affiliation, country, or discipline. However, the specificity of fields and expertise allows most scholars to ascertain who works on what and with what resources; as such, many researchers can actually identify the authors of a paper. Given the disadvantages and limitations of blinding (mainly double‐blinding), certain journals have chosen to increase transparency by using "open review" to provide the names of both authors and peer‐reviewers online; many also choose to post the full review online.

PLoS journals, often referred to as "open access mega journals," have promoted the notion of "soundness‐only peer‐review." In short, this refers to a very limited peer‐review scope, with scrutiny focused on the methodological trustworthiness and soundness of the article. Editors who hold this view do not consider the importance of the topic or readership, applicability, or novelty. Indeed, good science in many fields should be published even if it leads to a null result, an incremental advancement, or a small mechanistic development. Postpublication peer‐review, as undertaken by the F1000 journals 41 and at Frontiers, 42 avoids prepublication peer‐review altogether by allowing publication before peer‐review, which speeds up the research process. 43 All comments by reviewers are made public; to some extent, this transparency is similar to publishing on preprint servers like arXiv.org and bioRxiv.org, with the added feature of actually being considered a completed "publication."

Although small modifications have been made to the peer‐review process, these have not kept pace with the more radical changes that have occurred, and are occurring, in science more generally. The significant growth and evolution of Big Science has promoted more expansive projects (e.g., multisite, larger studies) that require greater diversification of expertise. Interdisciplinary collaborations allow researchers to more fully understand the contributions of colleagues from other disciplines in achieving good science. In exceptional cases involving a smaller group, one person may understand all parts of the project and introduce collaborators only as a means to reduce the workload, complete the research more quickly, or allow for broader demographic inclusion (e.g., multisite studies). But more often, as in the case of interdisciplinary translational health research, the work requires the collaboration and skills of researchers organized in a less traditional, less hierarchical distribution of labor, which draws upon different knowledge sets across various disciplines.

However, at this time, we still rely on a rather narrow peer‐review system, designed in 1731 for single‐authored work, to verify the integrity of research and identify any shortcomings of large‐scale translational work. The "age" of the system is not at issue, but rather the fact that it has been outstripped by the evolution of science, which is much more complex and epistemologically diverse than it once was. Scholars Stahel and Moore offer the following comparison: "This is analogous to considering a modern 21st century information technology company running its operations on first‐generation 4 kB Apple computers from 1976." 44 Stahel and Moore remark convincingly that the increase in the number of papers has created such an increased workload that peer‐reviewers are presently at a breaking point. 44 Notably, they made this observation in a paper written before the COVID‐19 pandemic, which has only further intensified the pressure to peer‐review papers more urgently and quickly.

The very nature and scope of severe acute respiratory syndrome‐coronavirus 2 (SARS‐CoV‐2) demands nothing less than an "all‐hands‐on‐deck" collaborative response, both in terms of research and peer‐review. Researchers and clinicians quickly found that this novel coronavirus, initially thought to be a respiratory virus, could also present as a heterogeneous multi‐organ or systemic illness. 45 Patients with COVID‐19 have displayed cardiovascular and neuropsychiatric symptoms alongside a widespread hyper‐inflammatory state, which has forced researchers from various fields to work in collaboration. Alongside the biomedical and clinical research, public health and behavioral studies have contributed to finding novel ways to implement preventive behaviors (e.g., social distancing, confinement, hand‐washing, and travel restrictions). 46 Additionally, studies on health disparities have shown us the inequitable manner in which populations are impacted by COVID‐19. 47 , 48

Although we can acknowledge the importance of diverse contributions, we still need a way to ensure that the work is of high quality. The more interdisciplinary and diverse a research team becomes, the harder it is for a primary author to truly be responsible for the work as a whole. In the literature dealing with authorship accountability, the notion of the "authorless paper" has been used by Rebecca Kukla to categorize cases in which there are many authors on a paper but none who could actually take responsibility for the paper as a whole. 49 The argument in support of such scientific collaboration would be that a group of people may form a collaborative entity bonded by social and professional ties, which, in turn, would constitute a new responsible collective entity often called a "group author." The counter‐argument that Kukla supports is that there is no collective unity cohesive enough to serve as a "group author" and that the coordination of each part and how it fits together will never be as strong as with an individual author. 49 Although this may be true, the fact remains that the complexity of science requires a diversity of contributions to answer the translational issues at hand. If that makes a translational science project less of a coherent story, perhaps we have no choice in the matter and simply need to find ways to engage in relational scientific groups that, while imperfect, allow for some level of coherence. Although collaboration ethics or relational ethics is a field that requires further development, it seems nonetheless feasible to ensure that teams create the trust, respect, and communication needed for collective epistemic coherence and accountability (what Kukla would call a "group author").

The complexity of broad translational interdisciplinary work and, notably, the requirement for collective epistemic coherence may well outstrip the current peer‐review format. If it is all but impossible to identify one or a few individuals capable of being responsible for the totality of a study or publication, can an editor really be expected to find two peer‐reviewers capable of providing in‐depth critical review of a paper that is the product of multiple disciplines and areas of expertise? An individual may be a content expert but not an expert on the methodological approach. In this instance, a second individual with the methodological expertise would complement the content expert. However, where different disciplines, methodological approaches, and values are involved, it would seem logical to expect that several more reviewers may be required. Notably as well, it is not a matter of simply adding more individual experts, but rather of establishing a group of reviewers who share an epistemic coherence similar to the collective epistemic coherence of the research. Simply adding reviewers without that shared coherence would accomplish little; there is no point in reviewing the specifics of a project where there is no generalized epistemic coherence.

To further demonstrate the importance of epistemic coherence, consider the epistemic justification for limitations. For example, in interdisciplinary research, there are various acceptable standards or criteria for evidence and rigor pursuant to the epistemic justification of the differing fields and disciplines. In an interdisciplinary team environment, various justifications and methods are brought to bear during collaborative research. Mention of these decisions may be found in the limitations sections of scientific papers; however, rarely is there space to go into detail as to why standards from one discipline may or may not have been prioritized. A peer‐reviewer may be assessing a paper with standards from another discipline, which, although interesting, may not ensure the actual quality of what the paper is trying to do. In order to broaden peer‐review to include or reflect a greater diversity of competencies, it may well be necessary to add resources beyond the traditional two peer‐reviewers.

TESTING KNOWLEDGE

We can make an interesting, albeit imperfect, comparison between the clinical trial process and knowledge development via publication. In clinical trials, a drug is tested on subjects in four different phases to assess its safety and then its efficacy. With each phase, a larger test sample is studied to understand the effects of the treatment on diverse research subjects. Similarly, in knowledge development, we test knowledge in phases. Phase I of knowledge development starts with a group of scientists who design a project based on their knowledge and assumptions grounded in the scientific literature as well as their experiential knowledge. They continuously test and if necessary, modify project methodology, hypotheses, findings, and conclusions. They reflect critically on their work throughout the research process.

Phase II of knowledge development kicks in when the study is submitted to a journal for publication. During this publication phase, peer‐reviewers and the editor then “test” the knowledge; however, contrary to a clinical trial in which the effect of a drug is tested on a larger group of individuals, peer‐review is restricted to feedback from two or three people. Understandably, recruiting a large number of people to review a paper could well be infeasible. One could argue that it is unnecessary and that a few experts with the relevant knowledge should be selected and should suffice. However, as demonstrated previously in this paper, many studies on peer‐review have pointed out the important biases and limitations of peer‐reviewers. 34 , 50 In addition, the interdisciplinary breadth of translational research may be beyond the capacity of two or three reviewers. Hence, the notion of a broader, more diversified, and inclusive review group warrants serious consideration.

According to the FDA, 70% of phase I medications will move to the next phase, ~ 33% of phase II medications will move to the next phase, and ~ 25%–30% pass phase III, 51 resulting in a total success rate of roughly 5%–6%. Furthermore, postmarket review (phase IV) may also reveal unforeseen long‐term effects. When a medication is tested on a large group of individuals in phase III, it is neither unusual nor shocking to discover that, although it successfully passed the first and second phases, it is not sufficiently effective or poses serious side‐effects or health risks. There is no equivalent tolerance afforded to the peer‐review process; if a paper is retracted following review, we are appalled, even though it has only been "tested" or reviewed by authors, peer‐reviewers, and editors. Expectations of traditional peer‐review as a fail‐safe process may well have been unrealistic from the outset.
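As a quick sanity check, the overall success rate follows from multiplying the per‐phase pass rates quoted above; the short sketch below (using only those cited figures, rounded) reproduces the ballpark estimate.

```python
# Minimal check of the cumulative clinical trial success rate, using the
# approximate per-phase pass rates cited above (FDA figures, rounded).
phase1_pass = 0.70            # ~70% of phase I medications advance
phase2_pass = 0.33            # ~33% of phase II medications advance
phase3_pass_low, phase3_pass_high = 0.25, 0.30  # ~25%-30% pass phase III

low = phase1_pass * phase2_pass * phase3_pass_low
high = phase1_pass * phase2_pass * phase3_pass_high
print(f"Cumulative success rate: {low:.1%} to {high:.1%}")  # roughly 6%, in line with the figure above
```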

If we were to expand the review or “testing” of a paper by making it accessible to a greater diversity of stakeholders, this would facilitate the identification of additional issues from various standpoints, disciplines, and social characteristics that impact scientific judgment. Herron et al. demonstrated through computer modeling that a broader group of informed individuals (readers of the journal) are more effective reviewers when compared to two or three subject experts (typical peer‐reviewers).
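Herron's finding is intuitive from a wisdom‐of‐crowds standpoint. The toy Monte Carlo sketch below is not Herron's actual model; it simply illustrates, with purely assumed noise parameters, how averaging many noisier reader judgments can estimate a paper's quality more accurately than averaging two expert judgments.

```python
import random
import statistics

# Toy illustration (not Herron's model): compare the averaged quality judgment
# of a small expert panel against a larger pool of informed readers.
# TRUE_QUALITY and the noise levels are illustrative assumptions only.
random.seed(42)

TRUE_QUALITY = 0.6   # hypothetical "true" quality of a manuscript (0-1 scale)
EXPERT_NOISE = 0.15  # experts judge with less individual error...
READER_NOISE = 0.30  # ...readers with more error, but there are many of them
N_TRIALS = 10_000

def panel_error(n_judges: int, noise_sd: float) -> float:
    """Mean absolute error of a panel's averaged quality judgment."""
    errors = []
    for _ in range(N_TRIALS):
        judgments = [random.gauss(TRUE_QUALITY, noise_sd) for _ in range(n_judges)]
        errors.append(abs(statistics.mean(judgments) - TRUE_QUALITY))
    return statistics.mean(errors)

print(f"2 experts:  mean error = {panel_error(2, EXPERT_NOISE):.3f}")   # ~0.08
print(f"30 readers: mean error = {panel_error(30, READER_NOISE):.3f}")  # ~0.04
```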

The idea of challenging the peer‐review process, which has been with us since 1731, may raise concern: dare we challenge the status quo? Although we prioritize novelty and openness to solve complex problems in science, somewhere along the way, scientists become unintentionally entrenched within disciplinary parameters and historical dogma. Conformity of thought has been central in theories of scientific development. For example, Thomas Kuhn suggested that research was generally done within the same paradigm shared by colleagues with the same epistemic and social values. 52 Although this may allow for collegial collaborative research within a specific niche, it resists any notion of challenging the status quo and any "paradigm shift." It is difficult for a new idea to dislodge years of previous research, which Kuhn would equate with "normal science." 52 The most common example of a paradigm shift is the replacement of Newtonian physics with Einstein's theory of relativity, which was resisted for decades. 53

Although Kuhn's complete theory may not have been formally adopted, the current system is known to promote groupthink in scientific teams; like‐minded people from similar disciplines end up in teams that share or adhere to the same manner of thinking. 54 The concept of groupthink, developed by Irving Janis, describes how inordinately high group cohesion can undermine or hinder rational decision making. Indeed, a team's cohesion and homogeneous train of thought may overwhelm and silence any team member who would dare to stray from the status quo. Groupthink is generally perceived to be a direct result of power differentials and group socialization; as a result, it increases the likelihood of bias in science and could skew peer‐review. 54

The Abilene Paradox, a narrative developed by Jerry Harvey, illustrates well a paradox often referred to as a "crisis of agreement." 55 It is a variation of sorts on groupthink dynamics. To summarize the story: on a very hot day, a family is comfortable at home playing dominoes when the father suggests that they go to Abilene for dinner. Not one individual really looked forward to the 4‐hour drive to Abilene on a hot day, but they all went along with the idea. After the long, hot car ride and a mediocre supper, the family members admitted that not one of them had actually wanted to go to Abilene in the first place. They blamed each other for the decision to make the journey.

This absurd situation, a "crisis of agreement," occurs when members refrain from expressing their true beliefs and instead abide by a new collective reality that does not reflect their personal individual views and, in some instances, may be irrational and even dangerous. Future decisions can also be seriously destructive as members feel increasingly distraught and frustrated or lapse into resignation. This frustration may be shared by team members who react by blaming others for the collective decisions and any resulting problems. According to Harvey, 55 the Abilene paradox is created because of "action anxiety," in which an individual fears acting in line with their own thoughts. Saying anything that would threaten or change the collective entity would create uncertainty. Often, uncertainty or "fear of the unknown" creates greater anxiety or stress than the problematic situation with which one is familiar. Importantly as well, team members do not want to alienate themselves from the group or be ostracized by other members.

Could the narrative and any critical discussion of peer‐review itself be something of an Abilene paradox? When mistakes or misconduct are uncovered and lead to retractions in journals, the reaction is to publicly blame individuals, while individually, there is growing awareness that the peer‐review system itself cannot realistically meet public expectations. Publicly promoting the narrative that peer‐review is the gold standard, while privately rejecting that same narrative as unrealistic, is certainly absurd. Peer‐review in its current form is somewhat of an Abilene paradox in which peer‐reviewers function close to the point of failure. It is time to consider viable alternatives to the current system in order to promote quality control that the public can trust.

FINDING MOTIVATION TO “FIX” THE PEER‐REVIEW PROCESS

A main objection to modifying the status quo is that researchers simply do not have the time or interest for review. Even within individual fields of expertise, scholars often find fault with the findings and conclusions of a paper, and yet most do not feel the impetus or "take the time" to correct the research record. 56 However, during COVID‐19, this apathy toward correcting the published record has given way to a greater willingness to scrutinize and critique the research record. This may be explained in part by the immediacy and direct application of translational research to public health. For example, the Lancet paper on hydroxychloroquine was quickly and publicly criticized by a group of scholars led by James Watson; their argument was then considered by the Lancet editor, who further assessed the alleged shortcomings. 57 Watson also wrote a similar expression of concern regarding the integrity of another paper using Surgisphere data published in the New England Journal of Medicine. 58

Another case—that of the importance of mask‐wearing—also brought to light the increased scrutiny of researchers. On June 11, 2020, the Proceedings of the National Academy of Sciences (PNAS) published a manuscript written by Renyi Zhang, Yixin Li, Annie L. Zhang, Yuan Wang, and Mario J. Molina called "Identifying airborne transmission as the dominant route for the spread of COVID‐19." 59 Molina, the senior author of this group, is a Nobel Prize winner of considerable prominence. In this paper, Zhang and colleagues conclude that wearing a face mask in public is the most effective means to prevent transmission, and this was subsequently used to justify various policies regarding mask‐wearing. On June 18, 2020, 45 epidemiologists, including Noah Haber, commented publicly in a paper that although they agreed with the benefits stated in other studies regarding mask‐wearing, Zhang and colleagues relied on "easily falsifiable claims and methodological design flaws." 60 The media used this debate to illustrate the messiness of science. 61 The paper was corrected on October 5, 2020, with the editors citing an oversight in the proofing system whereby the second round of edits made by the authors was never included in the final paper. 62 On October 13, 2020, Günter Kampf published a letter suggesting that relevant variables with a likely impact on the outcome were still not considered, which put into question many of the conclusions made by the authors. 63

Although some claim that COVID‐19 retraction rates are higher than average at about 1%, 64 others think that it is still too soon to tell. 65 Even before COVID‐19, retractions had been increasing over the last few decades, especially in high impact journals. 66 , 67 , 68 , 69 , 70 , 71 Before 2000, there were fewer than 100 retractions per year; this number grew over time to more than 1000 retractions per year in 2014. 72 It should also be noted that the actual percentage of all papers retracted leveled off in 2014 at ~ 4 of 10,000 papers. 72 Generally, retractions are made within 3 years following publication, so it is early in the process to arrive at conclusions regarding COVID‐19 retraction rates. The increased scrutiny of COVID‐19 papers may not be proof of some unusual increase in substandard or problematic science. Rather, the rate of retractions may simply reflect increased scrutiny proportional to the increase in scientific work as a result of the pandemic.
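To put these two figures on the same scale, the short sketch below compares the claimed ~1% COVID‐19 retraction rate against the ~4-in-10,000 baseline; both numbers are taken directly from the sources cited in this paragraph.

```python
# Compare the claimed early COVID-19 retraction rate (~1%) with the general
# baseline (~4 retractions per 10,000 papers), both cited above.
baseline_rate = 4 / 10_000        # ~0.04% of all published papers
covid_rate_claimed = 0.01         # ~1%, per the early (and contested) estimate

print(f"Baseline retraction rate: {baseline_rate:.2%}")
print(f"Claimed COVID-19 rate:    {covid_rate_claimed:.0%}")
print(f"Ratio:                    {covid_rate_claimed / baseline_rate:.0f}x")  # ~25x the baseline
```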

Retractions can impact reputations, and thus it takes a significant amount of humility and integrity for an individual to retract their own work. Sometimes scholars do provide critical feedback to journal editors privately (e.g., directly through email) or publicly, as in the two previous examples of hydroxychloroquine and mask‐wearing. Not only does this take time and energy, it is not valued much in the system of science when compared with numbers of grants or publications. Similar to whistle‐blowing, it may be morally and ethically important, but it may also create a strain on professional relationships. It is unrealistic to expect researchers to go out of their way to identify scientific shortcomings in published work when doing so is not truly valued within the system. Valuing peer‐review as an important contribution to the literature could be a great way to incentivize researchers to actively participate in this process.

Ideally, the "self‐correcting" and "self‐policing" of the scientific process should occur prior to publication. However, the increasing number of retractions suggests that peer‐reviewers are simply not catching all mistakes or questionable conduct and probably never will. 70 Some issues would be almost impossible for a peer‐reviewer to catch without fully replicating a study. Retractions are a necessary evil to ensure continued correction of the scientific record. In practice, science is reviewed, modified, published, and in some instances retracted. Knowing that authors do make errors and commit misconduct, and knowing that peer‐review is not a fail‐proof system, we are left with the need for a more diversified set of researchers to correct the research record.

There remains a concern that public awareness of retractions is "bad news" or the beginning of a loss of public trust in science. We have sold peer‐review as the "gold standard" that will ensure the integrity of scientific publication. The hype surrounding research has whipped up a frenzy of public expectation that COVID‐19 vaccines or treatments will be available quickly, safely, and expeditiously for all 73 ; any delay or inconclusive finding is much more likely to have negative effects on public trust than any increase in correcting publications before they are translated into practical knowledge. However, a more transparent and realistic process that also detects and retracts errors or substandard research from the scientific record relatively quickly may help to clarify that science is not infallible. Typically, the findings of a single study would not warrant its immediate clinical application; other studies would be conducted to test hypotheses, replicate, and corroborate or disprove findings, as well as to better understand the translational implications. As previously discussed, the paper by Zhang and colleagues is but one of several on mask‐wearing. Any decision‐making process should consider the entire body of knowledge and the diversity of evidence available and not focus solely on one recently published paper. If the public were made aware that postpublication review and retraction is not an alarming event per se but a part of the process, their expectations might be tempered and more realistic. Authors and peer‐reviewers do have limitations, and this should be acknowledged to temper wild expectations.

BUILDING AN OPEN COLLABORATIVE MULTISTAKEHOLDER PEER‐REVIEW SYSTEM

The following open collaborative model is proposed as an alternative peer‐review system. When a research team submits their work to a journal, the typical peer‐review would ensue and the authors would, at the same time, upload their paper onto a prepublication server. This would inform the public more broadly as to what research topic is undergoing review, and also update policy makers and clinicians as to what new research is being considered, without actually advocating for or changing any practice.

Peer‐reviewers would accept or reject a paper based on its accuracy, quality, and rigor. As prerequisites, the paper must be complete and its methods and processes completely transparent. When applicable, authors should follow standardized reporting procedures, including the Consolidated Standards of Reporting Trials (CONSORT), the Standards for Reporting of Diagnostic Accuracy (STARD), the Strengthening the Reporting of Observational studies in Epidemiology (STROBE), and Animal Research: Reporting In Vivo Experiments (ARRIVE). Providing data and any other materials to ensure transparency of the review is essential. By ensuring that all peer‐review documents are openly available, we can see how a researcher can provide a critical lens on another team's work. In addition, by publicly valuing peer‐review as an important contribution to research, already busy researchers may be incentivized to contribute to the process.

With fair, honest, and constructive peer‐review of another's work, researchers may promote further collaboration and a more productive and open exchange of ideas among researchers. Peer‐review would be acknowledged and valued as a contribution to a field or topic of research, which should be recognized in the merit‐based research system. Although this is similar to "soundness‐only peer‐review," it has an important distinction. Peer‐reviewers would be the ones who decide not only whether the accuracy, quality, and rigor are sufficient but also what should be further considered in the collaborative peer‐review process. For example, the peer‐reviewer may identify an important modification which the author(s) can include within reason. Authors may in turn share valid reasons for excluding modifications by explaining certain limitations of a study. The open discussion during the collaborative stage may yield ways to reduce said limitations. The first reviewers would help create space for discussion among a broader constituency.

Editors would also adhere to ethics guidance regarding proper institutional review board review, conflict of interest, and public availability of data (in accordance with confidentiality of research participants). Based on the reviews, the editor would either accept or refuse to publish a paper, or, if possible, ask for modifications prior to further consideration. Once the paper is published, the journal editor would make it available online in a “multistakeholder review mode” and invite various interested parties to share their comments and critiques.

This inclusive multistakeholder process aligns with the application goals of translational research. Involved stakeholders might include interdisciplinary scholars, policy makers, ethicists, humanists, patients, and community members or representatives. Notably, there need not be universal agreement among the various contributors. This last step of collaborative peer review will focus on (A) quality control and limitations, (B) challenges in application, and (C) social benefits and future needs (Table 1).

TABLE 1.

Phases of the collaborative review process model

Phase 1 – Science development
  • Openness: Internal discussions
  • Tasks: Team develops the science and writes the manuscript
  • Goal: Test new ideas

Phase 2 – Peer review
  • Openness: Placed online in prepublication format, with the disclosure that the paper has not completed all steps of the review process
  • Tasks: Manuscript is sent to the journal; editor review; peer‐review
  • Goal: First review for accuracy, quality, and rigor

Phase 3 – Multistakeholder review
  • Openness: Placed online on the journal webpage with a caveat disclosure that the paper has not completed all steps of the collaborative review process
  • Tasks: Manuscript is published in a scholarly journal in a "multistakeholder review mode"; this publication step lasts about 3 months (more if needed) and is open to comment and debate from various stakeholders
  • Goal: Stakeholder review for (1) quality, (2) applicability, (3) social benefit, and (4) need

Phase 4 – Published
  • Openness: Final publication considered as evidence accepted by a collective entity
  • Tasks: Application in policies, best practices, etc.
  • Goal: First steps in knowledge translation and integration

  1. Quality control and limitations: This logical extension of typical expert peer‐review aims to critically discuss methodological, statistical, and analytical considerations from a variety of research fields to broaden the debate. Quality also includes considerations of the limitations of a study and the level of certainty and generalizability of its conclusions.

  2. Challenges in application: Collaborative peer‐review should also consider the practical application of knowledge and highlight implementation challenges. Whereas a researcher may conclude that their published work can be made readily applicable, those living and working in different settings with different patients or clients may have useful insight as to the contextual limitations, possible contraindications, or obstacles to individual and community behavioral adaptation and modification.

  3. Social benefit and future needs: If we consider quality control (A) together with the challenges in application (B), stakeholders can engage in a discussion of the practical implications of the science and its real‐world benefits to individuals and populations. This would allow policy makers to focus on conclusions specific to the practical application of research and determine, as well, any areas of research that merit or warrant further work. Identifying the beneficiaries of research also allows us to determine those individuals or groups who could or will be excluded or who may actually be harmed by resulting secondary or adverse effects.

Although this proposed alternative process should be completely open and transparent, the editor could refuse to publish a review that is disrespectful or nonsensical. There would need to be a minimal level of quality and curation within this peer‐review process. Although unlikely, a paper could fail the multistakeholder review if a significant issue renders the research conclusions false. Identifying justifiable scientific limitations or minor flaws in a paper should not be construed as a reason to refuse final acceptance of a paper. Interestingly, after a paper goes through this process, it is no longer a single research team's work. Rather, it is adopted and considered by a multistakeholder community not only for its rigor but also for its meaning for clinical application. Meaning for clinical application should not be construed only as a binary (i.e., effective or not effective). Rather, meaning spans the impacts that the research might have on different individuals, communities, and societies. This may include discussions regarding cost‐effectiveness, socio‐behavioral limitations, comorbidity complications, long‐term effects, and perhaps even the broader environmental impact of a research application.
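To make the proposed workflow more concrete, the sketch below represents the four phases of Table 1 as a simple state machine. The class names, fields, and disclosure strings are illustrative assumptions only; they are not part of the author's proposal or of any existing journal platform.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Illustrative sketch of the four-phase collaborative review model (Table 1).
class Phase(Enum):
    SCIENCE_DEVELOPMENT = auto()      # internal team discussions; manuscript drafted
    PEER_REVIEW = auto()              # preprint posted; editor and peer-review for rigor
    MULTISTAKEHOLDER_REVIEW = auto()  # published in "review mode"; ~3 months of open comment
    PUBLISHED = auto()                # accepted as evidence by a collective; translation begins

@dataclass
class Manuscript:
    title: str
    phase: Phase = Phase.SCIENCE_DEVELOPMENT
    disclosures: list[str] = field(default_factory=list)
    stakeholder_comments: list[str] = field(default_factory=list)

    def advance(self) -> None:
        """Move to the next phase, attaching the caveat disclosure the model calls for."""
        if self.phase is Phase.SCIENCE_DEVELOPMENT:
            self.phase = Phase.PEER_REVIEW
            self.disclosures.append("This paper has not completed all steps of the review process.")
        elif self.phase is Phase.PEER_REVIEW:
            self.phase = Phase.MULTISTAKEHOLDER_REVIEW
            self.disclosures.append("This paper has not completed all steps of the collaborative review process.")
        elif self.phase is Phase.MULTISTAKEHOLDER_REVIEW:
            self.phase = Phase.PUBLISHED

# Example walk-through of a single manuscript:
ms = Manuscript(title="Example translational study")
ms.advance()  # enters peer review; preprint disclosure attached
ms.advance()  # enters multistakeholder review (open comment period)
ms.stakeholder_comments.append("Applicability concern raised by a community clinic")
ms.advance()  # final publication; knowledge translation can begin
```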

CONCLUSION

In this paper, the argument has been made that traditional peer‐review is no longer an adequate system to assess multidisciplinary translational health science publications. Furthermore, assigning individual blame whenever retractions occur may fail to fully address more complex and systemic underlying issues. It may unfairly stigmatize authors who choose to retract a paper as being de facto guilty of misconduct or some more extreme wrongdoing when, in fact, a retraction may simply be the act of a courageous author who wishes to correct an error.

Gatekeepers ensure the quality of research, and the integrity of the researcher, with the understanding that mistakes are made and misconduct does occur. Science’s gatekeepers are editors and peer‐reviewers with the disciplinary, methodological, and content expertise to ensure quality control. However, the current peer‐review system was designed centuries ago when science consisted of single authored projects. Today’s more complex, translational health science environment calls for a review process with a more expansive and diversified expertise—one that is commensurate with the diversified skills and knowledge of multidisciplinary teams of authors. No matter how well‐intentioned, two peer‐reviewers are unlikely to have that capacity.

In this paper, peer‐review of knowledge was compared to the process of clinical trials to further argue that the "testing" of theories or knowledge requires a process similar to that of clinical trials and more extensive than what is currently offered by two people conducting one review. This process should include diversity in peer‐review to avoid the potential pitfalls of groupthink. At this time, the scientific community publicly continues to uphold the existing peer‐review process, while personally harboring significant reservations about its relevance or effectiveness. As such, there is a collective decision to accept peer‐review in spite of its shortcomings and our individual reservations—we are living in an Abilene paradox of sorts. Further research is necessary to break with the status quo and explore viable alternatives to the current system.

This paper presents a rudimentary collaborative multistakeholder peer‐review process, which incorporates considerations of application, social benefit, and future research needs to existing standards of scientific quality and rigor. Some may argue that researchers simply do not have the incentive nor do they wish to contribute to this type of process. However, we have already witnessed researchers dedicate increased effort to scrutinize COVID‐19 research and we have also seen stakeholders criticize sloppy research. Moreover, adding more incentive to participate in peer‐review is long overdue and should be embraced by departments and institutions within the research system.

The proposed model basically reorganizes the peer‐review process to increase rigor and enable faster application of science in practice. Ideally, this process should not take more time and may actually decrease the gap between research and practice because multiple stakeholders would already be engaged in and play a role in the scientific process. Future research should include the development and pilot testing of various types of peer‐review models to assess their quality control, feasibility, impact, value, and capacity to truly engage stakeholders in reviewing research. Future research should also include the further elaboration of this type of model with different stakeholders as it relates to translational health science. The inclusion of both clinical and translational sciences in a new quality control process would be a significant step forward.

CONFLICT OF INTEREST

The author declared no competing interests for this work.

ACKNOWLEDGEMENTS

I would like to thank Drs. David Resnik and Kevin Wooten for their critical feedback of an earlier version of this manuscript.

Funding information

This study was supported in part from the Clinical and Translational Science Award (UL1TR001439) from the National Center for Advancing Translational Sciences, National Institutes of Health.

REFERENCES


