Cochrane Evidence Synthesis and Methods. 2025 Nov 24;3(6):e70063. doi: 10.1002/cesm.70063

Responsible Integration of Artificial Intelligence in Rapid Reviews: A Position Statement From the Cochrane Rapid Reviews Methods Group

Gerald Gartlehner 1,2, Barbara Nussbaumer‐Streit 1, Candyce Hamel 3,4, Chantelle Garritty 4, Ursula Griebler 1, Valerie Jean King 5, Declan Devane 6,7, Chris Kamel 8
PMCID: PMC12644243  PMID: 41306645

1. Background

Rapidly evolving artificial intelligence (AI) technologies are increasingly used to accelerate literature review processes. A recent review and evidence map identified almost 100 studies published since 2021 assessing AI applications in evidence synthesis [1]. These technologies range from machine‐learning classifiers to generative large‐language models (LLMs). Recently, a preprint reported that a tool powered by LLMs autonomously reproduced and updated 12 Cochrane reviews in just 2 days [2], sparking debate about when and how AI can be used safely and effectively to support systematic and rapid reviews.

In this position statement, the Cochrane Rapid Reviews Methods Group outlines its stance on the use of AI in rapid reviews. Rapid reviews encompass various types of evidence synthesis, and while some AI tools have been developed for specific review types, such as qualitative evidence syntheses, most are designed for more general application across review methodologies.

The main recommendations are summarized in Textbox 1. They complement a recently released position statement by Cochrane and other evidence synthesis organizations on the use of AI in evidence synthesis [3].

Textbox 1. Summary of Recommendations for AI Use in Cochrane Rapid Reviews.

  • Do not rely on AI to fully automate any step of a rapid review. Human oversight is essential to maintain methodological rigor and accountability.

  • Use AI for quality assurance where it complements human judgment and helps mitigate risks associated with accelerated methods, such as single‐reviewer workflows, by flagging missed studies, spotting inconsistencies, or detecting extraction errors.

  • Be transparent about and justify any use of AI. Specify in the protocol which tools will be used, the tasks they will perform, and how their outputs will be verified. Clearly describe AI use in the Cochrane rapid review's “AI Use Disclosure” section. If using a generative large language model, document the model version and prompts applied.

  • Demonstrate that the selected AI tool upholds the methodological rigor and integrity of the rapid review.

  • Maintain author responsibility. AI is a support tool, but authors remain fully accountable for the accuracy, interpretation, and overall validity of the rapid review.

  • Respect copyright and intellectual property. Ensure that AI tools are used in compliance with licensing terms.

  • Adhere to RAISE principles and the Cochrane position statement on AI use in evidence synthesis to ensure that AI applications are ethical, transparent, and fit‐for‐purpose.

Abbreviations: AI = artificial intelligence, RAISE = Responsible AI use in Systematic Evidence Synthesis.

2. Semi‐Automation in Rapid Reviews

Semi‐automation of discrete steps in the evidence synthesis process—where algorithms assist but do not replace human reviewers—is not new. Cochrane, for instance, was an early adopter with the development of the randomized controlled trial (RCT) Classifier, a machine learning tool that identifies RCTs during abstract screening [4]. Semi‐automation plays a different role in rapid reviews than in traditional systematic reviews, where methodological certainty is typically prioritized. Because rapid reviews already balance rigor and timeliness, teams may be more willing to adopt efficiency‐enhancing tools sooner.

The advent of generative LLMs, such as ChatGPT [5] or Gemini [6], has substantially expanded the potential for AI to support tasks in evidence synthesis. Unlike earlier machine learning tools that required extensive task‐specific training data, LLMs can be deployed in zero‐shot settings—meaning they can be applied without prior training or fine‐tuning to a given task. This dramatically lowers the barrier to entry, offering a more accessible pathway for integrating AI into review workflows. Multiple studies have assessed the utility of generative LLMs to support the development of search strategies [7], literature screening [8, 9, 10], risk of bias assessment [11, 12], and data extraction [8, 13, 14, 15]. However, findings to date indicate highly variable performance ranging from high accuracy in some tasks to concerning errors in others [1]. In parallel, developers of literature review software have begun integrating LLMs into their products.
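
To make the zero‐shot setting concrete, the sketch below shows how a single abstract might be screened against review criteria with an off‐the‐shelf LLM API. It is a minimal illustration under stated assumptions, not an endorsed workflow: the model name, prompt wording, and eligibility criteria are hypothetical, and, in line with the recommendations above, every suggestion would need human verification and the exact prompt and model version would need to be documented.

```python
# Minimal sketch of zero-shot abstract screening with a generative LLM.
# Illustrative only: the model name, prompt wording, and eligibility
# criteria are hypothetical; real use requires validation and human
# verification of every decision.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

ELIGIBILITY_CRITERIA = """\
Population: adults with type 2 diabetes
Intervention: structured exercise programs
Study design: randomized controlled trials
"""

def screen_abstract(title: str, abstract: str, model: str = "gpt-4o") -> str:
    """Ask the model for an include/exclude suggestion on one record.

    No task-specific training data is needed; the eligibility criteria
    are supplied directly in the prompt (zero-shot).
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favor deterministic, reproducible outputs
        messages=[
            {"role": "system",
             "content": "You screen abstracts for a rapid review. "
                        "Answer INCLUDE, EXCLUDE, or UNSURE, then give "
                        "a one-sentence justification."},
            {"role": "user",
             "content": f"Criteria:\n{ELIGIBILITY_CRITERIA}\n"
                        f"Title: {title}\nAbstract: {abstract}"},
        ],
    )
    return response.choices[0].message.content
```

Setting the temperature to 0 and logging the prompt text alongside the model version keeps outputs as reproducible as possible, which simplifies the documentation that transparent reporting requires.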

Importantly, in rapid reviews, AI has the potential not only to enhance efficiency but also to improve quality. Many rapid reviews rely on a single reviewer to perform key tasks—such as study selection, data extraction, risk of bias assessment, or certainty of evidence ratings—which heightens the risk of undetected errors. In these settings, AI can function as a scalable quality control tool, helping to identify inconsistencies, flag missing data, or suggest overlooked studies. For instance, Cochrane guidance for rapid reviews recommends switching to single‐reviewer screening if inter‐rater agreement during dual screening of abstracts is high [16, 17], an approach that will likely miss some eligible studies [18]. In this context, AI may complement human judgment and mitigate the risks associated with single‐reviewer workflows. AI‐integrated review software could re‐examine abstracts excluded during single‐reviewer screening, reducing the likelihood of erroneous exclusions. Another error‐prone step in the review process that could benefit from AI support is data extraction. Research shows that up to 50% of data elements extracted by humans contain errors, depending on reviewer experience and topic complexity [19, 20]. The use of AI as a secondary reviewer for data extraction may improve data quality and reduce errors [21]. Ultimately, however, reviewers must decide whether the additional effort of using AI is feasible for their rapid review.
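
As a concrete illustration of this safety‐net role, the sketch below re‐examines records excluded by a single human reviewer and flags disagreements for a second human look. It is a sketch under stated assumptions: the record fields are our own illustration, and the LLM call is passed in as a callable (for example, the hypothetical screen_abstract helper from the previous sketch), so the model's suggestion never overrides the human decision.

```python
# Sketch of AI as a quality-assurance layer over single-reviewer
# screening: the model audits human exclusions and queues disagreements
# for human re-check; it never makes final decisions. Field names and
# the workflow are illustrative, not a prescribed Cochrane method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Record:
    record_id: str
    title: str
    abstract: str
    human_decision: str          # "include" or "exclude"
    needs_recheck: bool = False

def audit_exclusions(
    records: list[Record],
    screen_abstract: Callable[[str, str], str],  # e.g., the LLM helper above
) -> list[Record]:
    """Return excluded records that the model suggests including.

    Flagged records are queued for a second human look, mirroring the
    quality-assurance role described in the text.
    """
    flagged = []
    for rec in records:
        if rec.human_decision != "exclude":
            continue  # only audit the error-prone single-reviewer exclusions
        suggestion = screen_abstract(rec.title, rec.abstract)
        if suggestion.upper().startswith("INCLUDE"):
            rec.needs_recheck = True
            flagged.append(rec)
    return flagged
```

Keeping the model behind a plain callable also makes the audit step tool‐agnostic and easy to test, so a team could swap in a different LLM or classifier without changing the workflow.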

Despite its promising potential, AI also introduces risks. Specifically, generative LLMs can produce incorrect responses, fabricate data or references, perpetuate biases, and spread misinformation. A recent review on the use of AI in evidence synthesis found that incorrect inclusion decisions by AI tools during screening ranged from 0% to 29% (median = 10%) and incorrect data extractions from 4% to 31% (median = 14%) [22]. To ensure that AI integration strengthens rather than undermines the credibility of rapid reviews, sustained human oversight, despite its own fallibility, must remain the core principle of any AI‐supported evidence synthesis effort.

3. Responsible Use of AI

The Responsible AI use in Systematic Evidence Synthesis (RAISE) guidance, developed through a multi‐stakeholder consensus process, provides foundational principles for the transparent, ethical, and scientifically sound integration of AI in evidence synthesis [23]. Cochrane has been actively involved in RAISE, both as a contributor and as an implementing organization, reflecting its commitment to ensuring methodological rigor in the face of rapid technological advancement. Members of the Cochrane Rapid Reviews Methods Group also participated in the development of RAISE, bringing expertise from the rapid review context where the pressure for speed makes AI adoption especially appealing—and potentially risky. RAISE emphasizes key principles such as transparency in reporting, human oversight, reproducibility, and fit‐for‐purpose evaluation. It cautions against overreliance on AI systems without robust validation and highlights the importance of disclosing when, how, and for which tasks automation has been used. As generative LLMs and other AI tools become increasingly accessible, adhering to RAISE principles will be essential to safeguard the trustworthiness and utility of AI‐assisted evidence synthesis.

When using AI tools, researchers also need to verify that any uploaded or processed material is permissible under fair use or equivalent scholarly exceptions and that the AI model itself complies with copyright and data protection standards. For instance, some professional or enterprise versions of generative LLMs explicitly guarantee that uploaded material will not be used for model training or redistribution, offering greater assurance of confidentiality and legal compliance. Transparent documentation of AI tool usage—including model version, purpose, and data inputs—should be maintained to uphold reproducibility, accountability, and ethical integrity in evidence synthesis.

4. What Does This Mean for Authors of Cochrane Rapid Reviews?

Cochrane, together with the Campbell Collaboration, JBI, and the Collaboration for Environmental Evidence, has endorsed RAISE in a position statement on the use of AI in evidence synthesis [3]. The statement emphasizes that evidence synthesists remain fully accountable for their work and must ensure ethical and legal compliance when using AI or automation. Any use of AI should be clearly justified, and the tool must be methodologically sound, ensuring that it does not compromise the trustworthiness or reliability of a review's findings. Importantly, all AI‐assisted tasks require human oversight, and any AI‐generated or AI‐informed judgments must be reported transparently in the final synthesis.

Authors should not use AI to fully automate an entire rapid review or any of its methodological steps. Doing so risks introducing errors, bias, and a lack of transparency, ultimately undermining the trustworthiness and reproducibility of the rapid review. Furthermore, such approaches violate established Cochrane methodological standards. Additionally, authors must continue to follow existing methodological guidance for Cochrane rapid reviews [16] and uphold Cochrane's standards of transparency, conflicts of interest, accountability, and scientific rigor [24].

The use of AI tools is acceptable—and even encouraged—when it serves to improve review quality. When resource constraints necessitate that a task, such as study selection or data extraction, is carried out by a single reviewer, an AI tool may be used to provide a secondary check or offer an independent suggestion. In this way, rapid review authors can introduce an additional layer of quality assurance with minimal extra effort. However, in all cases of AI use, a human reviewer must remain responsible for verifying all AI outputs and making final decisions.

Human reviewers must continue to resolve ambiguity, thoughtfully apply inclusion criteria, and interpret findings within the broader context of clinical relevance or policy implications. AI tools cannot be listed as authors, nor can they be accountable for their errors. Therefore, transparency regarding the use of AI is essential. The review protocol must document the intended use of AI during review production. The review methods and the new “AI Use Disclosure” section of Cochrane reports must clearly specify which tools were used, how they were applied, and what role they played in the review process. If reviewers used a generative LLM, the model version and prompts need to be documented, along with a description of the extent of human oversight and any validation steps taken. It is critical that human reviewers retain all accountability, as AI systems lack the capacity for independent judgment and responsibility. While AI can support efficiency and quality assurance, human authors must be responsible for the final interpretation of evidence and all methodological judgments.
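
One pragmatic way to meet these documentation requirements is to record every AI use in a structured form as the review proceeds, so the “AI Use Disclosure” section can be compiled directly from the log. The sketch below shows one possible record; the field names are our illustration and not a Cochrane‐mandated schema, and all example values are hypothetical.

```python
# Sketch of a structured log entry for the "AI Use Disclosure" section.
# The schema is illustrative: the aim is simply to capture tool, exact
# model version, task, prompt, oversight, and validation in one place
# so the disclosure can be reported verbatim and reproduced later.
import json
from dataclasses import dataclass, asdict

@dataclass
class AIUseRecord:
    tool: str                 # name of the LLM or review software
    model_version: str        # exact version or date, as recommended above
    task: str                 # which review step the tool supported
    prompt: str               # full prompt text, for generative LLMs
    human_oversight: str      # how outputs were verified by authors
    validation: str           # any validation steps taken

# Hypothetical example entry.
disclosure = AIUseRecord(
    tool="Hypothetical LLM",
    model_version="model-2025-06-01",
    task="second check of abstracts excluded by a single reviewer",
    prompt="You screen abstracts for a rapid review. ...",
    human_oversight="All model suggestions reviewed by a human author.",
    validation="Piloted on 100 abstracts against dual human screening.",
)

print(json.dumps(asdict(disclosure), indent=2))
```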

5. Future Possibilities

Fully autonomous evidence synthesis with AI remains a distant prospect. Retaining human oversight is not a shortcoming of AI, but a methodological imperative to safeguard the trustworthiness, transparency, and applicability of synthesized evidence.

Looking ahead, we are likely to see an increase in fully automated outputs—reviews that appear well‐written and methodologically sound on the surface but lack the depth of critical evaluation required for high‐stakes decision‐making. Without proper human oversight, this could unleash a flood of low‐quality reviews that are difficult to detect. Vigilance by peer reviewers and editors will be paramount. While increased automation in rapid reviews is likely inevitable and welcome, the transition must be carefully managed with strict adherence to rigorous conduct and reporting standards. To ensure that rapid reviews serve the needs of both science and society, active participation of human expertise will remain indispensable for the foreseeable future.

Research in this area is advancing rapidly, and the views expressed here reflect the position of the Cochrane Rapid Reviews Methods Group at the time of writing. As the evidence base evolves, this statement will be reviewed at least biannually and updated as needed. We strongly encourage rapid review authors to acquaint themselves with the most recent versions of RAISE [23] and the Cochrane position statement on AI use in evidence synthesis [3].

Author Contributions

Gerald Gartlehner: conceptualization, writing – review and editing, writing – original draft, project administration. Barbara Nussbaumer‐Streit: conceptualization, writing – review and editing. Candyce Hamel: conceptualization, writing – review and editing. Chantelle Garritty: writing – review and editing, conceptualization. Ursula Griebler: writing – review and editing, conceptualization. Valerie Jean King: conceptualization, writing – review and editing. Declan Devane: conceptualization, writing – review and editing. Chris Kamel: conceptualization, writing – review and editing.

Conflicts of Interest

All authors are co‐convenors of the Cochrane Rapid Reviews Methods Group; G.G. and C.H. are co‐convenors of the Cochrane Artificial Intelligence Methods Group.

Acknowledgments

We would like to thank Sandra Hummel from Cochrane Austria for administrative support. Open Access funding provided by Universität für Weiterbildung Krems/KEMÖ.

Gartlehner G., Nussbaumer‐Streit B., Hamel C., et al., “Responsible Integration of Artificial Intelligence in Rapid Reviews: A Position Statement From the Cochrane Rapid Reviews Methods Group,” Cochrane Evidence Synthesis and Methods 3, no. 6 (2025): e70063, 10.1002/cesm.70063.

Data Availability Statement

Data sharing is not applicable to this article, as no new data were created or analyzed in this study.

References

1. Adam G. P., Davies M., George J., et al., Machine Learning Tools To (Semi‐)Automate Evidence Synthesis, Version 2. White Paper (Prepared by the Brown Evidence‐based Practice Center [contract no. 75Q80120D00001], Kaiser Permanente Research Affiliates Evidence‐based Practice Center [contract no. 75Q80120D00004], Pacific Northwest Evidence‐based Practice Center [contract no. 75Q80120D00006], and the Scientific Resource Center [contract no. 75Q80122C00002]). AHRQ Publication No. 25‐EHC038. Rockville, MD: Agency for Healthcare Research and Quality, 2025, 10.23970/AHRQEPCWHITEPAPERMACHINE2.
2. Cao C., Arora R., Cento P., et al., “Automation of Systematic Reviews With Large Language Models,” medRxiv (2025), 10.1101/2025.06.13.25329541.
3. Flemyng E., Noel‐Storr A., Macura B., et al., “Position Statement on AI Use in Evidence Synthesis Across Cochrane, the Campbell Collaboration, JBI and the Collaboration for Environmental Evidence 2025 [Editorial],” Cochrane Database of Systematic Reviews, no. 10 (2025): ED000178, 10.1002/14651858.ED000178.
4. Thomas J., McDonald S., Noel‐Storr A., et al., “Machine Learning Reduced Workload With Minimal Risk of Missing Studies: Development and Evaluation of a Randomized Controlled Trial Classifier for Cochrane Reviews,” Journal of Clinical Epidemiology 133 (2021): 140–151, 10.1016/j.jclinepi.2020.11.003.
5. OpenAI. ChatGPT, accessed July 20, 2025, https://openai.com/chatgpt.
6. Google. Gemini, accessed July 20, 2025, https://gemini.google.com/.
7. Adam G. P., DeYoung J., Paul A., et al., “Literature Search Sandbox: A Large Language Model That Generates Search Queries for Systematic Reviews,” JAMIA Open 7, no. 3 (2024): ooae098, 10.1093/jamiaopen/ooae098.
8. Khraisha Q., Put S., Kappenberg J., Warraitch A., and Hadfield K., “Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT‐4's Efficacy in Screening and Extracting Data From Peer‐Reviewed and Grey Literature in Multiple Languages,” Research Synthesis Methods 15, no. 4 (2024): 616–626, 10.1002/jrsm.1715.
9. Guo E., Gupta M., Deng J., Park Y.‐J., Paget M., and Naugler C., “Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study,” Journal of Medical Internet Research 26 (2024): e48996, 10.2196/48996.
10. Tran V.‐T., Gartlehner G., Yaacoub S., et al., “Sensitivity and Specificity of Using GPT‐3.5 Turbo Models for Title and Abstract Screening in Systematic Reviews and Meta‐Analyses,” Annals of Internal Medicine 177, no. 6 (2024): 791–799, 10.7326/M23-3389.
11. Lai H., Ge L., Sun M., et al., “Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models,” JAMA Network Open 7, no. 5 (2024): e2412687, 10.1001/jamanetworkopen.2024.12687.
12. Hasan B., Saadi S., Rajjoub N. S., et al., “Integrating Large Language Models in Systematic Reviews: A Framework and Case Study Using ROBINS‐I for Risk of Bias Assessment,” BMJ Evidence‐Based Medicine 29, no. 6 (2024): 394–398, 10.1136/bmjebm-2023-112597.
13. Gartlehner G., Kugley S., Crotty K., et al., “AI‐Assisted Data Extraction With a Large Language Model: A Study Within Reviews,” Annals of Internal Medicine (2025), 10.7326/ANNALS-25-00739.
14. Motzfeldt Jensen M., Brix Danielsen M., Riis J., et al., “ChatGPT‐4o Can Serve as the Second Rater for Data Extraction in Systematic Reviews,” PLoS One 20, no. 1 (2025): e0313401, 10.1371/journal.pone.0313401.
15. Gartlehner G., Kahwati L., Hilscher R., et al., “Data Extraction for Evidence Synthesis Using a Large Language Model: A Proof‐of‐Concept Study,” Research Synthesis Methods 15, no. 4 (2024): 576–589, 10.1002/jrsm.1710.
16. Garritty C., Hamel C., Trivella M., et al., “Updated Recommendations for the Cochrane Rapid Review Methods Guidance for Rapid Reviews of Effectiveness,” BMJ 384 (2024): e076335, 10.1136/bmj-2023-076335.
17. Nussbaumer‐Streit B., Sommer I., Hamel C., et al., “Rapid Reviews Methods Series: Guidance on Team Considerations, Study Selection, Data Extraction and Risk of Bias Assessment,” BMJ Evidence‐Based Medicine 28, no. 6 (2023): 418–423, 10.1136/bmjebm-2022-112185.
18. Gartlehner G., Affengruber L., Titscher V., et al., “Single‐Reviewer Abstract Screening Missed 13 Percent of Relevant Studies: A Crowd‐Based, Randomized Controlled Trial,” Journal of Clinical Epidemiology 121 (2020): 20–28, 10.1016/j.jclinepi.2020.01.005.
19. Lieberum J.‐L., Toews M., Metzendorf M.‐I., et al., “Large Language Models for Conducting Systematic Reviews: On the Rise, but Not Yet Ready for Use—A Scoping Review,” Journal of Clinical Epidemiology 181 (2025): 111746, 10.1016/j.jclinepi.2025.111746.
20. Mathes T., Klaßen P., and Pieper D., “Frequency of Data Extraction Errors and Methods to Increase Data Extraction Quality: A Methodological Review,” BMC Medical Research Methodology 17, no. 1 (2017): 152, 10.1186/s12874-017-0431-4.
21. Helms Andersen T., Marcussen T. M., Termannsen A. D., Lawaetz T. W. H., and Nørgaard O., “Using Artificial Intelligence Tools as Second Reviewers for Data Extraction in Systematic Reviews: A Performance Comparison of Two AI Tools Against Human Reviewers,” Cochrane Evidence Synthesis and Methods 3, no. 4 (2025): e70036, 10.1002/cesm.70036.
22. Clark J., Barton B., Albarqouni L., et al., “Generative Artificial Intelligence Use in Evidence Synthesis: A Systematic Review,” Research Synthesis Methods 16, no. 4 (2025): 601–619, 10.1017/rsm.2025.16.
23. Thomas J., Flemyng E., and Noel‐Storr A., “Responsible AI in Evidence SynthEsis (RAISE): Guidance and Recommendations (Version 2; Updated 3 June 2025),” Open Science Framework (OSF) (2024), 10.17605/OSF.IO/FWAUD.
24. Cochrane Resources. Principles of Collaboration: Working Together for Cochrane, accessed November 2, 2025, https://resources.cochrane.org/policies/principles-collaboration-working-together-cochrane.


