Medscape General Medicine
Editorial. 2005 Feb 16;7(1):11.

Standardization vs Diversity: How Can We Push Peer Review Research Forward?

Karen Shashok
PMCID: PMC1681382  PMID: 16369316

Introduction

Peer review as a means of quality assurance has long been known to have serious limitations. Although research evaluators and promotion committees still consider it a guarantee of scientific quality and relevance, they seem unaware that they may be expecting too much from it. Journal editors, in contrast, have become aware of its failings, but they rightly point out that no acceptable substitute has yet been developed. A realistic goal right now is therefore to look for ways to improve the process and make it more reliable – for example, by strengthening its internal validity, its predictive power, or other indicators of the quality of the process and its outcomes.

Many of the indicators that have been tested thus far are derived from the quantitative statistical methods currently preferred in biomedical science. As noted by Overbeke and Wager[1]:

Most research on peer review in biomedical journals has been carried out by journal editors with a background in medical research. They therefore tend to use methods that have been developed for clinical trials. These methods…are widely accepted as valid ways of producing robust evidence about medicines and medical techniques, but they may be less appropriate for the complex psychosocial interactions involved in peer review.

Despite the apparent progress made since the 1980s in turning biomedical peer review research into a scientifically grounded endeavor, most “serious” studies to date have yielded inconclusive findings or only weak trends toward statistical significance that neither confirm nor refute the basic assumptions regarding the usefulness of peer review. In short, this research has not produced enough solid evidence to establish which steps, which criteria, and which outcomes are most likely to help editors decide what to publish and what to reject. In fact, there is precious little scientific evidence[2] for the effectiveness of the specific practices that have been investigated thus far.

The Need for a More Broadly Grounded Research Agenda

What do we mean when we say we want to “improve” peer review? It all depends on what we expect peer review to accomplish. As noted by Drummond Rennie[3] in the second edition of Peer Review in Health Sciences, “[g]iven the disparity between the apparent enthusiasm for peer review on the part of editors, and the data, it may be that researchers are studying the wrong factors.” Here, I try to explain how methods and approaches from other areas of knowledge may be used to design research aimed at identifying the factors that influence the outcome of peer review as it is used in the health sciences.

Standard Practice

One way to make peer review research yield useful information may be to distinguish clearly between features of peer review that are “givens” and those that vary between disciplines, journals, and individual editors. The givens – features that can be said to have become more or less standardized – turn out to be surprisingly few: The material is read by persons who are presumed to be experts and whose feedback on the scientific content will be useful to the authors, and these persons are presumed not to have any “formal relationship” with the journal or publisher. The editor uses the reviews to reach a decision about whether to accept, reject, or request revisions to the material. Aside from these basic features, every other practical and professional aspect of the peer review process is characterized by diversity, from the number of reviewers consulted down to the level of editing applied to accepted papers. So what we have is a complex process involving varying numbers of persons playing various roles, with various degrees of responsibility, accountability, and professionalism. As a result, expectations, practices, and assumptions differ markedly across journals. This has made it hard to generalize findings from peer review research across journals, even within the field of biomedical sciences.

The “Quantitative” vs “Qualitative” Schism

Because reviewers and editors are humans, their behavior, whether performing their salaried duties, enjoying their leisure time, or reading manuscripts and writing reviews, is influenced by factors that cannot be predicted, controlled, or standardized. Many factors related to human skills, attitudes, and behaviors influence how peer review is performed, and we know that expectations for peer review vary between disciplines and between editors. There is no standard definition of the roles, steps in the process, or minimum competencies. This makes it important to be aware that in peer review research, as in any other area of science, values and assumptions may easily bias research design and data interpretation. Using only quantitative methods while neglecting qualitative methods, in an attempt to ensure that the research is “scientific,” will not eliminate these biases. But some peer review researchers have found it hard to accept that qualitative methods (also called nonexperimental, descriptive, or observational methods) are no less rigorous than quantitative methods. This narrow view may be holding back a potentially fruitful flow of information across disciplinary boundaries.

At a colloquium on end-of-life care held in Amsterdam, The Netherlands, in 1999, 3 physicians noted how misconceptions about qualitative research methods interfered with the ability of general medical journal reviewers to appropriately interpret and critique their manuscripts.[4] They noted that “quantitative research strives for generalizability and controls intervening or confounding variables using sampling and statistical methods. Qualitative research seeks to understand the particular characteristics of the phenomena under study and admits the influence of all intervening variables as data to be described and analyzed.[4]” A few years previously, an editorial in BMJ [5] reminded readers of one of the fundamental notions in research design: “the question being asked determines the appropriate research architecture, strategy and tactics to be used.” The message for peer review researchers is clear: First, the right question needs to be asked, and then the right type of research method needs to be chosen as the one most likely to yield understandable answers that can be used to develop or modify policies. “Quantitative” is not necessarily synonymous with “more rigorous,” nor is “qualitative” a synonym for “less rigorous.”

Some researchers in the human, social, and behavioral sciences have argued that the divide between quantitative and qualitative is a fallacy that may be holding back useful research in many human endeavors. At the recent IFSE-12 Congress of Science Editors, a panel of sociologists explained the relevance of sociologic research methods for both qualitative and quantitative research. Russell Bernard, editor of the journal Research Methods, explained that the ostensible schism between quantitative and qualitative methods has been greatly exaggerated by some scientists. He noted that many who claim that sociology is “only” a qualitative science are apparently unaware that research in sociology (as in other human sciences) is based on rigorous methods of data collection and analysis. There was no doubt that some members of the audience (consisting largely of biomedical journal editors) were surprised to learn that several quantitative statistical methods now commonplace in the “hard” and biomedical sciences were first developed for data analysis in sociology, eg, factor analysis, regression analysis, probability theory, and Bayesian logic.[6] Alan Singleton, an academic publisher in the United Kingdom, recently pointed out that peer review has been perpetuated in part to sustain Merton's “norms of science” – a set of general principles developed “as part of a sociology of science.[7]” How ironic then that modern-day researchers in biomedical peer review have for the most part turned their backs on sociology as a potential source of research methods.

Defining the Functions of Peer Review

Gatekeeping

Overbeke and Wager[1] put their fingers on one of the most important challenges in peer review research: “Given the lack of consensus about the primary function of peer review, it is perhaps not surprising that a review of the literature reveals little agreement about the outcomes that should be measured.” So deciding which function of peer review is to be tested is an important step. For example, when the aim of peer review is to aid the editor in tending the gates (ie, in deciding what information “deserves” dissemination and what information is deemed not to be of sufficient “priority”), research methods from the behavioral, social, and anthropological sciences can be useful. Methods in these sciences are likely to help editors find out how peers reach a consensus about what is “relevant to the field” and about how motivation and decision making influence the outcomes of peer review. If the questions we are asking are “What makes some reviewers better than others?” or “How can I motivate reviewers to do a good job?,” then advice from ethnographers (ie, scientists who study behavior in specific communities and cultures) and other experts in human behavior may well be useful. The potential contributions from the methods used in these areas will depend on the degree to which the editor relies on the reviewers' opinions in reaching a decision on imprecisely defined variables, such as “relevance,” “importance,” or “timeliness.”

Legitimizing

Another common function of peer review, whether intended or not, is as a “stamp of legitimacy.” As a result, the fallacy that peer review guarantees the scientific validity of the material unfortunately remains widespread. The potentially disastrous consequences of this unscientific assumption for both journals and readers were pointed out forcefully in a recent Letter to the Editor in Nature.[8] This aspect of the peer review process is crying out for a closer look by sociologists, psychologists, and other scientists whose findings may well be of interest not only to editors, but also to the productivity or performance evaluators who are searching for the best criteria for rewarding certain research activities over others.

Improving the Science, the Writing, or Both?

Confusion persists over whether editing to improve the writing, reporting, or readability of texts falls under the purview of peer review – a confusion reflected in the variety of practices across journals. It may therefore be helpful to distinguish between “reviewing interventions” (ie, review of the scientific content) and “editing interventions” (that affect the use of language but not the actual scientific content). This division allows the review process to be broken down into components that can be investigated separately (eg, blinding, types of support provided to reviewers, rewards offered for prompt or better-than-average reviewing, or various types of text editing). Focusing on specific interventions will make it easier to design studies that can be adapted to individual journal and editor practices.

For example, if a goal of peer review is to identify the “best” science, researchers should ask themselves whether this is assumed to mean:

  • Screening out unsuitable manuscripts only and eliminating them from the selection process;

  • Identifying a gradient of manuscripts potentially suitable for publication after varying degrees of revision; or

  • Skimming off the cream and discarding everything else.

Each reviewing task implies different levels of expertise in critical reading skills, and as any editor knows, reviewers' performance in this area varies widely. Cohort studies, before-after or diachronic studies, and other designs that aim to produce “humble” frequency distributions may shed light on what reviewers and editors are actually doing, and how changes in editorial policies affect these outcomes.
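
To illustrate what such a “humble” descriptive design might look like in practice, the sketch below tabulates hypothetical editorial decisions before and after a change in reviewing policy and applies a chi-square test of independence. The period labels, decision categories, and counts are invented for illustration only; real samples would need to be far larger for the test to be meaningful.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical editorial decisions before and after a change in reviewing policy.
decisions = pd.DataFrame({
    "period":   ["before"] * 6 + ["after"] * 6,
    "decision": ["accept", "revise", "revise", "reject", "reject", "reject",
                 "accept", "accept", "revise", "revise", "revise", "reject"],
})

# The "humble" frequency distribution: decision counts in each period.
table = pd.crosstab(decisions["period"], decisions["decision"])
print(table)

# A chi-square test of independence asks whether the distribution of decisions shifted.
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```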

On the other hand, if the question we wish to answer is “How can I get reviewers to perform consistently and reliably?,” then observational, descriptive methods are probably a necessary first step in the attempt to identify features in their backgrounds and personalities that influence their critical reading skills and their motivation to produce rigorous, useful reviews. Given the lack of consensus on how to recruit good reviewers or how to train reviewers to perform better, it may be worthwhile here to examine the literature on education and training methods. Management science is another potentially fruitful area to investigate for research methods, given that management methods and peer review methods share some basic features: Both seek to optimize the results obtained with the analytical skills that need to be applied to practical tasks, and both are based to a large extent on decision-making and choice-making processes.

Writing and Text Quality

As noted, it is useful to keep reviewing interventions separate from editing interventions. Editing, however – like reviewing – can mean very different things to different people, so it is important to know exactly what kind of editing is to be studied. Journals and publishers differ in the resources at their disposal, so the scope and quality of the editing cannot be assumed to be standardized across all journals. Are we talking about substantive editing (also known as subediting and editing to improve readability)? This type of text editing seems to be a task that few journals can afford these days. And even among those journals that can afford substantive editing, the final quality of the “scientific English” shows considerable diversity.

If the journal values good writing and good editing, and if these elements are to be studied to determine which editing interventions are helpful to readers, the advice of experts in writing and editing should be sought. If the aim is to find out how to improve the effectiveness of the text in communicating the authors' messages, possible outcomes that should be targeted for intervention include:

  • Improving “the English” (grammar, spelling, punctuation, and other superficial elements);

  • Improving the reporting (especially with regard to details of the experimental design and statistical analysis as well as the choice of figures and tables used to support the findings);

  • Improving the writing (choice of what information is included and how it is organized); and

  • Improving the argumentation and internal logic (although such improvement may not be possible if the study was poorly designed or carelessly performed).

Research published by an expert in the writing and reading of science texts[9] has suggested that only a minority of reviewers and editors are able to help authors with the last 2 points above. It is here where research methods from the areas of languages for specific purposes, English for specific purposes, communication, and education are likely to be helpful in designing research.

Readers – the Forgotten Element

The issue of editing to improve readability brings us face to face with the readers – whose participation in peer review research has been largely neglected. One variable that has been overlooked in much peer review research is how actual readers judge the usefulness of what is published. Yet the ultimate success of peer review can only be judged by members of the population whose interests it claims to serve: the consumers of scientific information. Do peer review and editing make texts more “useful” to their intended readers? How we go about trying to answer this question depends on how we define useful.

The clarity of the internal logic, argumentation, and rhetoric can differ between articles in the same journal, and these features of the writing and editing (as distinct from the articles' scientific validity) may affect how useful the material is to different readers. “Readability,” as measured by the Gunning-Fog index or the Flesch Reading Ease score, has been tried as a surrogate measure of the quality of the writing and editing, but these measures are based mainly on word or syllable counts and are less than ideal indicators of how easy or difficult readers find the material to comprehend.
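
As a concrete reminder of how little such indices actually capture, here is a minimal sketch of the Flesch Reading Ease calculation in Python. The formula itself is standard; the syllable counter is a crude vowel-group heuristic (an approximation, not a validated tokenizer), which is precisely the kind of superficial counting the paragraph above cautions against.

```python
import re

def count_syllables(word: str) -> int:
    """Rough approximation: count groups of consecutive vowels in the word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

sample = ("Peer review as a means of quality assurance has long been known "
          "to have serious limitations. Editors are aware of its failings.")
print(round(flesch_reading_ease(sample), 1))
```

Note that nothing in the calculation reflects logic, argumentation, or the reader's comprehension – only counts of sentences, words, and syllables.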

The most glaring methodological flaw of peer review, as it is currently practiced, is probably that decisions on how the material should be changed and when it is ready to be published are based on subjective judgments issued by a small, nonrandom, nonsignificant, and therefore nonrepresentative sample of all potential readers. Some surveys have asked authors what they expect from peer review in general, but thus far there have been few attempts to ask a “natural” population of readers how well peer review fulfilled their expectations with regard to specific texts that have undergone review and been published. Surveying readers is labor- and resource-intensive, and the resulting data are “merely” descriptive statistics. But this need not be considered a methodological limitation if what we need to know is how well peer review is fulfilling its function of providing scientists with access to the best possible information in the best possible way.

The only way to find out how useful or readable published texts are is to survey real readers. Their definitions of readability, “flow,” and “internal coherence” may well differ from the definitions assumed by editors. So, instead of asking readers to rate some poorly defined variables on a visual analog scale, it may be necessary to distribute versions of the same text in various states of revision and editing to a large sample of readers; ask them to take a “reading comprehension” test; and compare the scores between groups who received different versions of the text. Unfortunately, such surveys require resources that not all journals have at their disposal. Help and inspiration, however, may come from research done by editors who work with authors on texts before they are submitted for peer review. A few examples of how research by authors' editors can shed light on the peer review process are given elsewhere (Shashok, 2005, in preparation).
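
By way of illustration, the scores from such a comprehension survey could be compared with something as simple as a one-way ANOVA across the groups that received different versions of the text. The sketch below uses invented scores and hypothetical group labels; it shows the shape of the analysis, not results from any actual study.

```python
from scipy import stats

# Hypothetical comprehension-test scores (0-100) for readers who each
# received one of three versions of the same article.
scores = {
    "submitted":     [62, 58, 71, 65, 60, 55, 68],  # as submitted by the authors
    "peer_reviewed": [70, 66, 74, 69, 72, 61, 75],  # revised after peer review
    "copyedited":    [78, 74, 81, 70, 76, 73, 80],  # peer reviewed and substantively edited
}

# One-way ANOVA: do mean comprehension scores differ across text versions?
f_stat, p_value = stats.f_oneway(*scores.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")
```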

Cultural Diversity in Scientific Writing

Joy Burrough-Boenisch, a translator and authors' editor with a particular interest in language interference, has observed that “cultural perceptions of appropriate writing style (a cultural preference for inductive over deductive argumentation, for example, or for author backgrounding) and of the reviser's remit and perceived responsibilities (to the author? to the readers – also taking account of their culture and mother tongue?) will be influential.[9]” Her research suggests the likelihood that “the reviser's mother tongue influences which linguistic, semantic, lexical and orthographic ‘errors’ or infelicities the reviser detects and remedies in a text.[9]” In an article published in The Write Stuff (the journal of the European Medical Writers Association),[10] she provides a concise description of how expectations and preferences for scientific writing style differ across cultural backgrounds.

Evidence of cultural differences in reviewers' comprehension of texts was again found in Burrough-Boenisch's study of how readers from 8 different countries corrected the hedging (degree of caution or uncertainty in a text, indicated by words and phrases such as “may,” “could,” “probably,” “suggest,” or “appear to show”) in sample texts from research papers written by Dutch scientists.[11] As many editors know, reviewers often complain that the statements in a manuscript need to be expressed more cautiously or less assertively. Because of the international nature of peer review, how confidently or gingerly reviewers from different cultural backgrounds expect facts and opinions to be stated may not always match what the authors themselves believe is appropriate or justified on the strength of the evidence. Burrough-Boenisch noted that the differences in responses to her sample texts meant that in some cases, “readers disagreed about what was ‘appropriate’ hedging in a given context.” There were even differences between US and UK readers in how hedging was corrected – a finding that shows that even among native speakers of English, there may be discrepancies over the degree of caution or confidence that authors should use to report their findings.

Editors should be aware of these and other culturally related differences in how readers interpret texts, as they may well affect how reviewers react to manuscripts from “international authors” and how these authors interpret and implement comments provided by “international reviewers.” This is an area of peer review that calls for additional research with methods (both qualitative and quantitative) used widely in applied linguistics and one of its academic subspecialties, languages for specific purposes. Low tolerance for ways of defending an argument or emphasizing a point that departs from how the reviewer expects these elements of scientific writing to be handled may in some cases skew the reviewer's judgment of the scientific content and messages.

The implications for peer review research are clear. Studies that intend to test interventions to improve peer review (as a method of scientific quality assurance) should be designed, performed, and analyzed separately from studies that aim to test editing interventions (as a method of quality assurance in the use of language). The research methods likely to be most useful in each case will probably be different, and peer review researchers would do well to investigate the methods used in the human and social sciences to discover whether other specialists have already devised methods that may prove fruitful.

Conclusion: Time for Greater Diversity

In contemplating research on peer review, researchers should first ask themselves what function they expect or assume peer review to perform for their journal before they decide whether they need an experimental design or a nonexperimental design. Then they should think carefully about what questions they are trying to answer, and after that they should think even more carefully about how they choose, define, and measure what they want to study. Some variables may be discrete and readily quantifiable, but there are many others that may be observable and analyzable only in qualitative or descriptive terms.

When questions are being asked about people's perceptions and behaviors – which undoubtedly play a large role in the peer review process – qualitative research methods are advisable. The research methods used in social and human sciences may be more appropriate for behavior-related elements (attitudes, motivations, and potential biases, for example) and language-related elements (reporting, technical editing, and readability, for example) that come into play in peer review, whereas quantitative methods may be suitable for studying (for example) the effectiveness of specific reviewing or editing interventions in reducing the number of outright errors in published articles.

How can we bring different research communities closer together and facilitate communication so that their members can begin to share their expertise?

As a common ground, compliance with at least part of the MOOSE Checklist for meta-analyses of observational studies[12] may be a way for researchers in the human and social sciences to satisfy the expectations of biomedical editors for methodological rigor. The recommendations in the section on “reporting of methods” offer plenty of clues about the information that readers accustomed to quantitative methods will expect to see so that they can judge the soundness of the information-gathering techniques. For their part, biomedical editors should be prepared to remove their methodological blinkers and take a look around. Excessive faith in strictly quantitative methods intended to test hypotheses rather than to generate them may have blinded some earnest researchers to the existence and potential usefulness of alternative research designs and methods from other areas of knowledge.

There is a risk that strictly quantitative methods may yield numerical data for samples (of manuscripts, journals, or reviewers, for example) that end up being too small to produce significant results. Another risk is that such data may be obtained at the expense of ignoring potentially useful information about reviewers', editors', or readers' expectations for the roles of peer review and scientific communication. Instruments, such as the CONSORT checklist,[13] intended for use in a highly structured situation (the randomized, controlled trial) amenable to methodological standardization are probably unsuitable in important ways as a methodological yardstick for complex processes, such as peer review, beset as it is by fuzzy concepts, biases, and unproved assumptions. Peer review does not need to become an exact science. But peer review research can be made more productive by incorporating methods from sciences that are comfortable with qualitative methods alongside quantitative ones.
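
One way to gauge the first of these risks before committing to a strictly quantitative design is a routine power calculation. The sketch below uses statsmodels with an assumed “medium” standardized effect size and conventional thresholds – all illustrative assumptions – to estimate how many manuscripts or reviewers would be needed in each arm of a simple two-group comparison.

```python
from statsmodels.stats.power import TTestIndPower

# Assumed parameters (illustrative only): a "medium" standardized effect size,
# a 5% two-sided significance level, and 80% power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"manuscripts (or reviewers) needed per arm: {n_per_group:.0f}")  # roughly 64 per group
```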

Standardization is useful only if it makes processes and outcomes more efficient and more effective. But standardization should not be assumed to be a virtue in itself. If it stifles potentially useful, creative thinking, it is time to make room for diversity. Just as there is diversity in how peer review is implemented, how manuscripts are edited, and how journals are managed, there is also diversity in how authors argue their points and how readers read and understand scientific texts. Researchers should allow alternative models of peer review that are best suited to each scientific community to flourish, rather than try to force this complex process into a standard mold. Greater diversity in designing studies to investigate peer review, achievable by drawing on methods that have been tested and validated in other disciplines, may help move biomedical peer review research out of its becalmed state and set it on course again toward deeper, bluer seas of insight.

Summary Points

  • There is no standard definition of peer review, and practitioners do not agree on the primary functions of peer review;

  • Research in biomedical peer review may have reached an impasse because of judgments that place higher value on quantitative than on qualitative research methods;

  • There is a need for research based on data from the ultimate consumers of the products of peer review, ie, the readers;

  • Disciplines that can provide potentially useful research methods include (but are not limited to) sociology, psychology, anthropology, ethnography, education, management science, and linguistics; and

  • Research methods used to study biomedical peer review may benefit from greater diversity and less insistence on standardization.

Acknowledgments

The author owes her rediscovery of and respect for the scientific validity of qualitative research methods to Mary Ellen Kerans, a teacher of scientific writing in Barcelona, Spain. The author's thinking about peer review research has been influenced by many colleagues, including members of the team that produced the Cochrane methodology reviews on peer review and technical editing (especially Liz Wager and Tom Jefferson), members of the WAME email discussion list, and experts in academic writing and editing (especially Joy Burrough-Boenisch). The author dedicates this article to Ana Marusic, a biomedical journal editor who has been calling for an interdisciplinary approach to peer review research for several years.

References

1. Overbeke J, Wager E. The state of evidence: what we know and what we don't know about journal peer review. In: Godlee F, Jefferson T, editors. Peer Review in Health Sciences. 2nd ed. London, United Kingdom: BMJ Books; 2003. pp. 45–61.
2. Jefferson TO, Alderson P, Davidoff F, Wager E. Editorial peer-review for improving the quality of reports of biomedical studies (Cochrane Methodology Review). The Cochrane Library, Issue 1. Oxford, United Kingdom: Update Software; 2003.
3. Rennie D. Editorial peer review: its development and rationale. In: Godlee F, Jefferson T, editors. Peer Review in Health Sciences. 2nd ed. London, United Kingdom: BMJ Books; 2003. pp. 1–13.
4. Martin DK, Lavery JV, Singer PA. Qualitative research on end-of-life care: unrealized potential. In: van der Heide A, Onwuteaka Philipsen B, Emanuel EJ, van der Maas PJ, van der Wal G, editors. Clinical and Epidemiological Aspects of End-of-Life Decision-Making: Proceedings of the Colloquium; 7–9 October 1999; Amsterdam, The Netherlands. Amsterdam, The Netherlands: Royal Netherlands Academy of Arts and Sciences; 2001. pp. 77–87.
5. Sackett DL, Wennberg JE. Choosing the best research design for each question (editorial). BMJ. 1997;315:1636. doi: 10.1136/bmj.315.7123.1636. Available at: http://www.bmj.com/cgi/contents/full/315/7123/1636 Accessed October 27, 2004.
6. Bernard HR. Publishing social sciences: balancing qualitative and quantitative in a social science journal. The 12th International Conference for Science Editors (IFSE-12): Future trends in science editing and publishing. Bringing science to society; October 10–14, 2004; Merida, Mexico. Abstract p. 20.
7. Singleton A. Data protection and peer review. Learned Publishing. 2004;17:195–198.
8. Svetlov V. The real dirty secret of academic publishing (correspondence). Nature. 2004;431:897. doi: 10.1038/431897a.
9. Burrough-Boenisch J. Culture and Conventions: Writing and Reading Dutch Scientific English. Utrecht, The Netherlands: LOT Netherlands Graduate School of Linguistics (dissertation no. 59); 2002. Available at: http://www.lotpublications.nl/publish/issues/Burrough/index.html Accessed February 2, 2005.
10. Burrough-Boenisch J. A bit of culture. The Write Stuff. 2004;13:41–42.
11. Burrough-Boenisch J. NS and NNS scientists' amendments of Dutch scientific English and their impact on hedging. English for Specific Purposes. 2005;24:35–39. Available at: http://authors.elsevier.com/sd/article/S0889490603000632 Accessed November 8, 2003.
12. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283:2008–2012. doi: 10.1001/jama.283.15.2008. Available at: www.consort-statement.org/MOOSE.pdf Accessed November 7, 2004.
13. Altman DG, Schulz KF, Moher D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663–694. doi: 10.7326/0003-4819-134-8-200104170-00012.
