Author manuscript; available in PMC: 2020 Dec 1.
Published in final edited form as: Res Synth Methods. 2019 Jul 18;10(4):539–545. doi: 10.1002/jrsm.1369

The Value of a Second Reviewer for Study Selection in Systematic Reviews

Carolyn Stoll 1, Sonya Izadi 1, Susan Fowler 2, Paige Green 3, Jerry Suls 3, Graham A Colditz 1
PMCID: PMC6989049  NIHMSID: NIHMS1040153  PMID: 31272125

Abstract

Background:

Although dual independent review of search results is generally recommended for systematic reviews, recommendations are inconsistent regarding when the second reviewer should be used. This study compared a complete dual review approach, with two reviewers in both the title/abstract screening stage and the full-text screening stage, to a limited dual review approach, with two reviewers only in the full-text stage.

Methods:

This study was performed within the context of a large systematic review. Two reviewers performed a complete dual review of 15,000 search results and a limited dual review of 15,000 search results. The number of relevant studies mistakenly excluded by highly experienced reviewers in the complete dual review was compared to the number mistakenly excluded during the full-text stage of the limited dual review.

Results:

In the complete dual review approach, an additional 6.6% to 9.1% of eligible studies were identified during the title/abstract stage by using two reviewers, and an additional 6.6% to 11.9% of eligible studies were identified during the full-text stage by using two reviewers. In the limited dual review approach, an additional 4.4% to 5.3% of eligible studies were identified with the use of two reviewers.

Conclusions:

Using a second reviewer throughout the entire study screening process can increase the number of relevant studies identified for use in a systematic review. Systematic review performers should consider using a complete dual review process to ensure all relevant studies are included in their review.


When performing a systematic review, the importance of study selection cannot be overstated. Decisions about which studies to include are considered among the most significant decisions made during the review process.1,2 The quality of the study selection process depends on two factors: the formation of specific and clear eligibility criteria, and the systematic application of these criteria to each record found in the search process.

As a comprehensive search strategy can result in thousands of results that must be screened, the process of screening search results against eligibility criteria can require significant time and resources. The best method for study screening is one that allows for a high level of accuracy, ensuring that no relevant studies are mistakenly excluded, with as much efficiency as possible. Studies can be mistakenly excluded during the screening process due to a misapplication or misunderstanding of eligibility criteria or due to random error of the screener. In order to reduce this potential for missed studies, it is commonly recommended that two (or more) screeners undertake the screening process. The Agency for Healthcare Research and Quality (AHRQ), the Centre for Reviews and Dissemination (CRD), the Institute of Medicine (IOM), and the Cochrane Collaboration all recommend using two or more members of the review team, working independently, to screen studies.1–4 The IOM specifically notes that doubling the number of screeners requires significant time and resources, but that the additional expense is justified in order to reduce bias and errors.2 A previous study explored the impact of using dual reviewers as compared to a single reviewer, and found that the average increase in eligible studies identified using two reviewers was 9%, ranging from 0% to 32%, suggesting a notable impact by the second reviewer.5

Additionally, these groups are consistent in recommending that study screening be performed in a two-stage process, in which titles and abstracts are screened first, followed by full-text study reports. However, these groups do not explicitly state whether the additional reviewer should be involved in both stages. Cochrane and AHRQ address this briefly. Cochrane suggests that adding a second reviewer at the full-text stage may be sufficient, saying that “Authors must first decide if more than one of them will assess the titles and abstracts of records retrieved from the search…It is most important that the final selection of studies into the review is undertaken by more than one author.”1 AHRQ states that “Some form of dual review should be done at each stage”; however, it suggests alternatives to dual review in the title/abstract stage, such as having the second reviewer review only the first reviewer’s exclusions, or conducting dual review on only a small percentage of the records in a pilot phase in order to resolve any confusion and then proceeding with single review for the remainder of the title/abstract phase.3 However, while performing a pilot phase may help to reduce error due to unclear or misunderstood eligibility criteria, it is unlikely to prevent all error, and it will not prevent random error by the reviewer, so it is unlikely to be sufficient6 to improve accuracy and prevent mistakenly excluded studies.

Using a second reviewer in the screening process requires a significant amount of resources. Using a second reviewer only in the full-text stage of screening may be a way to reduce the resources necessary while still maintaining a lower level of bias. A previous study explored various methods of study selection using a cost-effectiveness analysis and found that using two reviewers in the title/abstract stage and moving all records marked by at least one reviewer as eligible, rather than resolving disagreements, was equally effective and less costly than traditional double screening in a systematic review of effects of undergraduate medical education in UK general practice settings.6 However, the authors conclude that the effectiveness of different screening methods is likely to vary between systematic reviews.

We set out to compare two methods of study selection, using dual independent reviewers throughout the title/abstract and full-text stages of the screening process versus using dual independent reviewers only in the full-text stage, in the context of a systematic review exploring representation of multimorbidity in behavioral intervention randomized controlled trials.7 The objective of this study was to determine whether using dual reviewers throughout the entire study screening process produces a clear benefit over using dual reviewers only at the full-text screening stage in a large systematic review.

Methods

Study Setting

This study took place in the context of a large systematic review evaluating the inclusion of participants with multiple chronic conditions in randomized trials of behavioral health interventions.7 The methods and results of this systematic review are reported separately,7 but are summarized briefly here.

The eligibility criteria of the systematic review were: (1) the report is the primary report of an RCT testing the efficacy or effectiveness of behavioral interventions; (2) the study reports original data (protocols, post-trial follow-up studies, and secondary or separate subgroup analyses were excluded); (3) the RCT targets chronic illness; (4) the RCT applied eligibility criteria at the individual level; (5) the trial was published in English; and (6) the RCT enrolled only adult subjects (≥18 years).

The search strategy of the systematic review was designed to be broad in order to identify all published RCTs in adults that test behavioral health interventions and target chronic illness. Due to this broad search strategy, this systematic review involved a large number of search results which provided the ideal setting for the current study. This search produced 343,123 records of potentially relevant reports. After removing duplicate records, 190,555 records remained.

After the search was performed, a sampling strategy was used to produce a representative sample of literature of behavioral intervention RCTs targeting participants with chronic conditions published from 2000 to 2014. This was done by randomly ordering search results (within three time periods, 2000–2004, 2005–2009, 2010–2014) using the RAND function in Microsoft Excel and performing study selection on the randomly ordered results within each time period until the target sample size (200 studies per time period, 600 studies total) was reached.
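The sampling procedure described above can be sketched in a few lines. This is a minimal illustration and not the authors' workflow: the study used the RAND function in Microsoft Excel, whereas the sketch substitutes Python's `random` module. The function name `screen_until_target`, the `is_eligible` callback, and the toy record IDs are all hypothetical.

```python
import random


def screen_until_target(records_by_period, is_eligible, target_per_period, seed=0):
    """Randomly order records within each period, then screen them in that
    order until the target number of eligible studies is reached
    (or the period runs out of records)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    results = {}
    for period, records in records_by_period.items():
        order = list(records)
        rng.shuffle(order)  # stand-in for sorting on an Excel RAND() column
        included = []
        screened = 0
        for record in order:
            if len(included) >= target_per_period:
                break
            screened += 1
            if is_eligible(record):
                included.append(record)
        results[period] = {"included": included, "screened": screened}
    return results


# Toy data (an assumption): integer IDs stand in for search records,
# and every 25th ID is treated as "eligible".
toy_records = {
    "2000-2004": range(0, 1000),
    "2005-2009": range(1000, 2000),
    "2010-2014": range(2000, 3000),
}
toy_result = screen_until_target(
    toy_records, lambda record: record % 25 == 0, target_per_period=5
)
```

Because screening stops as soon as the target is met, the number of records screened per period depends on the random order, which is why the study describes its samples by position in the randomly ordered list.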

For purposes of the current study, the first 15,000 records from the randomly ordered search results (5,000 per time period) were used for the complete dual review approach and the next 15,000 records of the randomly ordered search results (5,000 per time period) were used for the limited dual review approach. Two experienced reviewers (CS, SI) took part in this study. Both reviewers were involved in the study design of the systematic review and the definition of the eligibility criteria. The reviewers went through a pilot process with the eligibility criteria prior to starting the study to ensure they had a similar understanding of the eligibility criteria.

Complete Dual Review Approach

During the first approach of the study (Fig 1), reviewers performed a full dual independent review of all records. Reviewers first independently screened studies by title/abstract and compared results. Records were excluded if both reviewers had excluded them. Records were moved to full-text screening if both reviewers indicated they should be kept. Records for which reviewers had opposing decisions were reviewed again together, and a consensus decision was made to exclude them or move them to full-text screening.

Fig 1. Comparison of the complete dual review approach and the limited dual review approach

Reviewers then independently screened identical lists of studies by reading the full-text of each study report and applying eligibility criteria. After the screenings were complete, results were compared. Records were excluded if both reviewers had excluded them. Records were included in the review if both reviewers had included them. Records for which reviewers had opposing decisions were reviewed again together and a consensus was formed.
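The reconciliation rule applied at both stages of the complete dual review (agree to exclude, agree to keep, otherwise discuss) can be written as a small function. This is only a sketch: the names `Outcome`, `reconcile`, and `reconcile_all` and the record IDs are hypothetical, and the consensus discussion itself is a human step that the code merely flags.

```python
from enum import Enum


class Outcome(Enum):
    EXCLUDE = "exclude"   # both reviewers excluded the record
    ADVANCE = "advance"   # both kept it: on to full text, or into the review
    DISCUSS = "discuss"   # opposing decisions: resolve by consensus


def reconcile(keep_a, keep_b):
    """Combine two independent keep/exclude decisions for one record."""
    if keep_a and keep_b:
        return Outcome.ADVANCE
    if not keep_a and not keep_b:
        return Outcome.EXCLUDE
    return Outcome.DISCUSS


def reconcile_all(decisions_a, decisions_b):
    """Apply the rule to every record screened by both reviewers."""
    return {
        record_id: reconcile(decisions_a[record_id], decisions_b[record_id])
        for record_id in decisions_a
    }


# Hypothetical decisions (True = keep) from two reviewers.
reviewer_a = {"rec-1": True, "rec-2": False, "rec-3": True}
reviewer_b = {"rec-1": True, "rec-2": False, "rec-3": False}
outcomes = reconcile_all(reviewer_a, reviewer_b)
to_discuss = [rid for rid, outcome in outcomes.items() if outcome is Outcome.DISCUSS]
```

Only the records in `to_discuss` need a joint second look, which is where a second reviewer can rescue a study that one reviewer would have dropped.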

This review process was completed in three sets of 5,000 records (total of 15,000 records), with comparison between the results of the two reviewers at the end of each set.

Limited Dual Review Approach

In the second approach of the study (Fig 1), reviewers performed a limited dual review of records. Records were assigned to each reviewer in alternating groups of 2,500 (total of 15,000 records) such that each title/abstract was reviewed by only one reviewer. Decisions made by the sole reviewer regarding exclusion or moving of the record to full-text screening were considered final.
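The alternating assignment can be illustrated as follows. This is a sketch under assumptions: the text does not say which reviewer screened the first group, so the order of the initials below is arbitrary, and the function name `assign_single_reviewer` is hypothetical.

```python
from collections import Counter


def assign_single_reviewer(n_records, block_size=2500, reviewers=("CS", "SI")):
    """Assign each record position to one reviewer in alternating blocks.

    Which reviewer takes the first block is an assumption; the source
    only states that groups of 2,500 alternated between reviewers.
    """
    return [
        reviewers[(position // block_size) % len(reviewers)]
        for position in range(n_records)
    ]


assignments = assign_single_reviewer(15000)
workload = Counter(assignments)  # title/abstract records per reviewer
```

With 15,000 records and groups of 2,500 there are six groups, so under this scheme each reviewer screens three groups at the title/abstract stage.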

Studies moved to full-text review by the single reviewer were then reviewed independently by both reviewers, following the same full-text review process as in the first approach of the study.

Analysis

Complete Dual Review

In order to quantify the benefit of the use of a second reviewer in each stage, analysis was performed to identify records in each stage that were mistakenly excluded by a reviewer. Screening results from the title/abstract stage were appraised to identify records that were originally selected for inclusion by one reviewer and exclusion by the other reviewer, and then after discussion between reviewers were moved to full-text screening. Only records that were excluded by one reviewer but eventually included in the systematic review were considered to have been mistakenly excluded. Records that were originally disagreed on but were not eventually included in the review were not counted. The full-text screening stage was appraised in a similar way to identify which studies that were eventually included in the review had been mistakenly excluded by each reviewer. The records mistakenly excluded in both the title/abstract stage and full-text stage were identified and subtracted from the total number of records mistakenly excluded to determine the total number of unique records mistakenly excluded by a reviewer during the complete dual review approach.
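The de-duplication step described above amounts to taking the union of the records missed at each stage. The sketch below shows this with sets; the function name `unique_mistaken_exclusions` and the record IDs are hypothetical, and each input is assumed to hold the IDs of eventually included records that one reviewer had excluded at that stage.

```python
def unique_mistaken_exclusions(title_abstract_misses, full_text_misses):
    """Return the unique records mistakenly excluded at either stage,
    plus the records that were mistakenly excluded at both stages."""
    ta = set(title_abstract_misses)
    ft = set(full_text_misses)
    in_both = ta & ft
    unique = ta | ft
    # Subtracting the overlap from the stage totals gives the same count.
    assert len(unique) == len(ta) + len(ft) - len(in_both)
    return unique, in_both


# Toy example: record "c" was missed by a reviewer at both stages.
unique, in_both = unique_mistaken_exclusions({"a", "b", "c"}, {"c", "d"})
```

Counting a record once, however many stages it was missed at, keeps the unit of analysis the record rather than the screening decision.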

Limited Dual Review

The full-text screening results of the limited dual review approach were analyzed in the same way as the full-text screening results of the complete dual review to identify which studies that were eventually included in the review had been mistakenly excluded by each reviewer.

Comparison of Approaches

The total numbers of unique records mistakenly excluded by reviewers (mistakenly excluded vs. not mistakenly excluded) in each approach (complete dual vs. limited dual) were compared using a chi-square test of independence. Only unique records mistakenly excluded in the complete dual review approach were counted so that a record that was mistakenly excluded in both the title/abstract screening stage and the full-text screening stage was only counted once.

Comparison of Results within Complete Dual Screening Approach

The number of unique records mistakenly excluded in each set of 5,000 records screened during the complete dual review approach was calculated to explore bias over time, i.e., whether reviewers improved their screening accuracy over the course of the approach. The total numbers of unique records mistakenly excluded by reviewers (mistakenly excluded vs. not mistakenly excluded) in each set of 5,000 results screened using the complete dual review approach (first set of 5,000 vs. second set of 5,000 vs. third set of 5,000) were compared using a chi-square test of independence.

Results

Figure 2 summarizes the results of the analysis.

Fig 2. Results using the complete dual review approach and the limited dual review approach

Complete Dual Review Approach

Using the complete dual review approach, a total of 15,000 title/abstract records were screened. Of these records, 242 study reports were ultimately included in the systematic review. During the title/abstract screening, a total of 810 records were identified by at least one reviewer as potentially meeting eligibility criteria. After comparison of results and discussion of disagreements, 507 records were moved to the full-text review stage. During the title/abstract stage, the two reviewers mistakenly excluded 22 (9.1%) and 16 (6.6%), respectively, of the 242 study reports that eventually were included in the systematic review.

During the full-text review stage, 507 records were screened. Of these, 242 were ultimately included in the systematic review. During this stage the reviewers mistakenly excluded 29 (11.9%) and 16 (6.6%) records of the 242 study reports that eventually were included in the systematic review. Overall, 51 (0.3%) and 32 (0.2%) of 15,000 records were identified as being mistakenly excluded in at least one stage by a reviewer, for a total of 83 across both stages. However, 17 of these were mistakenly excluded in both the title/abstract stage and the full-text stage, leaving 66 (0.4%) unique records mistakenly excluded of 15,000 screened using the complete dual review approach.
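The tallies in the two paragraphs above can be checked with a few lines of arithmetic. The counts are taken from the text; pairing the title/abstract and full-text counts by the order in which they are listed is an assumption, and the variable names are illustrative.

```python
# Mistaken exclusions per reviewer and stage, as reported in the text.
misses = {
    "first-listed reviewer": {"title_abstract": 22, "full_text": 29},
    "second-listed reviewer": {"title_abstract": 16, "full_text": 16},
}

per_reviewer = {name: sum(stages.values()) for name, stages in misses.items()}
per_stage = {
    stage: sum(stages[stage] for stages in misses.values())
    for stage in ("title_abstract", "full_text")
}
total_events = sum(per_reviewer.values())

# Records reported as mistakenly excluded in both stages are counted once.
counted_in_both_stages = 17
unique_records = total_events - counted_in_both_stages

print(per_reviewer, per_stage, total_events, unique_records)
```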

Limited Dual Review Approach

During the limited dual review approach, a total of 15,000 title/abstract records were screened. Of these 15,000 records, 515 records were moved to the full-text review stage by a single reviewer. These 515 records were then reviewed independently by both reviewers, and ultimately 226 full-text study reports were included in the review. During the full-text review stage, the two reviewers mistakenly excluded 10 (4.4%) and 12 (5.3%), respectively, of the 226 study reports that eventually were included in the systematic review. Overall, 22 (0.2%) of 15,000 records were mistakenly excluded in the full-text stage by a reviewer using the limited dual review approach.

Comparison of Selection Approaches

A chi-square test of independence was performed to examine the relation between review process (complete dual vs. limited dual) and number of mistakenly excluded study reports (mistakenly excluded vs. not mistakenly excluded). The relation between screening approach and number of mistakenly excluded study reports was significant (χ²(1) = 22.065, p < 0.05). Reviewers were more likely to identify mistakenly excluded relevant study reports through the complete dual review process (0.4%) than the limited dual review process (0.2%).
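The reported statistic can be reproduced from the counts given in the Results with an uncorrected Pearson chi-square test on a 2×2 table. This is a sketch under assumptions: the text does not name the software used or say whether a continuity correction was applied, and the function name `chi_square_independence` is illustrative. The uncorrected statistic rounds to the reported value.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: approach; columns: mistakenly excluded vs. not mistakenly excluded.
approach_table = [
    [66, 15000 - 66],  # complete dual review
    [22, 15000 - 22],  # limited dual review
]
statistic, df = chi_square_independence(approach_table)

# For 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(statistic / 2))
print(f"chi2({df}) = {statistic:.3f}, p = {p_value:.2e}")
```

The p-value comes out far below the 0.05 threshold quoted in the text.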

Comparison of Results within Complete Dual Review Approach

When considering each set of 5,000 results used in the complete dual screening approach, reviewers mistakenly excluded 16 (0.3%) of 5,000, 22 (0.4%) of 5,000, and 26 (0.5%) of 5,000 results. A chi-square test of independence was performed to examine the relation between set of 5,000 results screened (first set of 5,000 vs. second set of 5,000 vs. third set of 5,000) and number of mistakenly excluded unique records by reviewers (mistakenly excluded vs. not mistakenly excluded) using the complete dual review approach. The relation between set of 5,000 and number of mistakenly excluded unique study reports was not statistically significant (χ²(2) = 2.385, p = 0.30).
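The same kind of check works for the three-set comparison, now on a 3×2 table with 2 degrees of freedom. As before, this is a sketch under assumptions rather than the authors' analysis code: the helper name `pearson_chi_square` is illustrative, and the per-set counts are used exactly as reported in the paragraph above.

```python
import math


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square statistic for a table of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )


# Mistaken exclusions in the first, second, and third sets of 5,000 records.
per_set_misses = [16, 22, 26]
set_table = [[missed, 5000 - missed] for missed in per_set_misses]

set_statistic = pearson_chi_square(set_table)
set_df = (len(set_table) - 1) * (len(set_table[0]) - 1)

# For 2 degrees of freedom the upper-tail probability is exp(-x / 2).
set_p_value = math.exp(-set_statistic / 2)
print(f"chi2({set_df}) = {set_statistic:.3f}, p = {set_p_value:.2f}")
```

The statistic and p-value round to the values reported in the text, so the uncorrected Pearson test appears to be the calculation behind both comparisons.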

Discussion

This study compared the use of a complete dual review screening process, with independent dual review at both the title/abstract stage and the full-text stage, to a limited dual review screening process, with single review at the title/abstract stage and dual review at the full-text stage, within the context of a large systematic review of randomized controlled trials of behavioral interventions in participants with chronic conditions.7 Results show that a complete dual review screening process can result in identifying a larger number of eligible studies than a limited dual screening process. Our results confirm the conclusions of a previous study that including a second reviewer in the screening process can result in significant impact,5 and contribute to the evidence from previous studies6 of the effects of the timing of the use of the second reviewer.

It is reasonable that a complete dual review process would result in increased thoroughness in selection of studies for a systematic review. Random error, particularly when screening such a large number of search results, can occur at either stage in the screening process so it is logical that a complete dual review process would help to prevent this error from affecting the inclusion of relevant study reports in a systematic review. Given the potential for random error, other strategies recommended to improve completeness of study selection in a systematic review, such as conducting dual review on a small percentage of search results only in a pilot phase before proceeding to single review of title/abstract,3 are unlikely to result in the level of precision that can be reached with a complete dual review process.

Our results differ from those of Shemilt et al.,6 who reported similar rates of effectiveness (number of inappropriate exclusions avoided) across each method of study screening, including what they refer to as double screening and single screening, which are similar to our two approaches. There are several possible explanations for this different result. Potentially the two systematic reviews that provided the context for these studies differed in complexity of eligibility criteria, or the different research questions resulted in different types of title/abstract records or full-text reports that may be more or less difficult to screen properly. The systematic review that provided the context for this study had a broad study question and sought to include a diverse set of studies both in type of behavioral intervention being tested and participant population targeted.7 Other systematic reviews on more focused questions aim to include a more specific set of study reports, which will greatly affect how the eligibility criteria are defined, what type and variety of records are found through the search strategy, and therefore the entire study selection process. Continued investigation of these screening methods in other systematic reviews differing in type or context is necessary to provide additional evidence guiding systematic reviewers on which screening methods should be used.

This study adds to the limited evidence regarding the effectiveness of different study screening approaches in systematic reviews. The systematic review during which this study was performed provided a unique opportunity to perform this study given the large number of search results that we needed to screen.

However, there are several limitations of the study. A major limitation is that our study only focuses on effectiveness of the two approaches in reducing the number of records mistakenly excluded, and does not directly address the cost-effectiveness of each method.

There are also limitations of our methods. If studies were misclassified by both reviewers, it is possible that the number of studies mistakenly excluded by reviewers is underestimated. Additionally, the complete dual review process approach was performed before the limited dual review process approach, creating the possibility of bias over time as reviewers became more experienced in the screening process and improved their precision in identifying relevant studies. This is one possible explanation for the higher number of studies mistakenly excluded by reviewers in the full-text screening of the complete dual review process approach as compared to the limited dual review process approach. However, in order to reduce this potential bias, the complete dual review process approach was performed in three sets of 5,000 search records. After each set of 5,000 records, reviewers compared titles/abstracts chosen, agreed on the records for full-text screening, performed the full-text screening stage, and agreed on the studies to be included in the systematic review. This allowed the two reviewers to improve their process and potentially improve their precision within the complete dual review process approach, which would not have been possible had we performed the complete dual review process on the entire 15,000 records before comparing between reviewers. Our results comparing across sets of 5,000 records used in the complete dual review approach show no statistically significant difference in the number of mistakenly excluded study reports, suggesting that although reviewers may improve at study screening over time, it is not at a level that would impact our results. This adds to our confidence in our overall conclusion, that the complete dual review approach identifies more mistakenly excluded study reports than the limited dual review approach, even while acknowledging the potential bias by performing one approach before the other.

It is also possible that dual screening in the title/abstract stage could affect the number of studies mistakenly excluded by dual review in the full-text review stage. Potentially some of the records that were prevented from being mistakenly excluded by the use of a second reviewer in the title/abstract stage may have been study reports that were less straightforward in meeting the eligibility criteria, creating a more complicated process for full-text review if records were more difficult to assess using the eligibility criteria. A reader may be more likely to have rejected these mistakenly during the full-text stage, providing more opportunity for the second reviewer to identify mistakenly excluded records as compared to the limited dual review process which may have started with a group of records that more obviously fit eligibility criteria.

However, the study design also allows us to compare independent dual review at the title/abstract stage to the full-text stage within the 15,000 records used in the complete dual review process, which is less likely to be impacted by these limitations. Because the second reviewer recovered additional eligible studies at the title/abstract stage, beyond those recovered at the full-text stage alone, this comparison supports the conclusion that dual review at the title/abstract stage results in finding additional relevant studies.

Further studies should be performed to confirm the findings of this study in the context of other systematic reviews on different topics and with varying complexity of eligibility criteria. Exploration of whether there are certain characteristics of study reports that make them more likely to be mistakenly excluded, such as publication in non-English journals, could provide helpful context for how to improve the study selection process. Additionally, studies should evaluate other strategies recommended to improve precision of study selection in a systematic review, such as having a second reviewer confirm the decisions of the first reviewer in the title/abstract stage, as opposed to performing fully independent reviews.3 Novel methods for performing study selection in systematic reviews, such as automating the process,8–12 have been proposed and have shown promise in providing a reduction in work load but also a reduction in precision.6,13 These methods need further comparison in precision to the complete dual review process.

Although it may be tempting for systematic review performers to use a limited dual review process to save time and resources, this study provides evidence that a complete dual review process will increase the precision of study selection in systematic reviews. Given the importance of a precise study selection process for the quality and results of a systematic review, review performers should carefully consider the potential impact of performing anything other than a complete dual review process for study selection.

Funding

This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. Stoll, Izadi, and Colditz are supported, in part, by the Foundation for Barnes-Jewish Hospital, St Louis. Colditz is also supported by the Alvin J. Siteman Cancer Center Biostatistics Shared Resource, P30 CA091842.

Data Sharing

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Footnotes

Conflict of Interest

The authors declare no conflict of interest.

References

  • 1. Higgins JP, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011.
  • 2. Morton S, Berg A, Levit L, Eden J. Finding what works in health care: standards for systematic reviews. National Academies Press; 2011.
  • 3. McDonagh M, Peterson K, Raina P, Chang S, Shekelle P. Avoiding bias in selecting studies. Rockville (MD): Agency for Health Care Research and Quality; 2013.
  • 4. University of York Centre for Reviews and Dissemination. Systematic reviews: CRD’s guidance for undertaking reviews in health care. University of York, Centre for Reviews & Dissemination; 2009.
  • 5. Edwards P, Clarke M, DiGuiseppi C, Pratap S, Roberts I, Wentz R. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Statistics in medicine. 2002;21(11):1635–1640.
  • 6. Shemilt I, Khan N, Park S, Thomas J. Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Systematic reviews. 2016;5(1):140.
  • 7. Stoll CR, Izadi S, Fowler S, et al. Multimorbidity in Randomized Controlled Trials of Behavioral Interventions: A Systematic Review. Health Psychology. In press.
  • 8. Olofsson H, Brolund A, Hellberg C, et al. Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan. Research synthesis methods. 2017;8(3):275–280.
  • 9. Paynter R, Bañez LL, Berliner E, et al. EPC methods: an exploration of the use of text-mining software in systematic reviews. Rockville (MD): Agency for Healthcare Research and Quality; 2016. 16-EHC023-EF.
  • 10. Przybyła P, Brockmeier AJ, Kontonatsios G, et al. Prioritising for systematic reviews with RobotAnalyst: a user study. Research synthesis methods. 2018.
  • 11. Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Research Synthesis Methods. 2011;2(1):1–14.
  • 12. Wallace BC, Small K, Brodley CE, et al. Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining. Genetics In Medicine. 2012;14:663.
  • 13. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic reviews. 2015;4(1):5.
