Systematic reviews are positioned as the highest form of evidence in health and medical sciences, and have become central to informing clinical guidelines, health policy, modelling and advocacy [1]. They first gained prominence in the 1980s as a key tool of the evidence‐based medicine movement. Scientific evidence from randomised controlled trials of novel medical interventions increased greatly in volume throughout the latter half of the 20th century. The fragmented nature of the literature, however, meant that clinical trials continued to be undertaken long after sufficient evidence had been collected to prove treatment effectiveness, contributing to research waste as well as delayed and inconsistent implementation of the most effective interventions. By exhaustively identifying, appraising and synthesising all relevant trials of a given intervention, systematic reviews sought to provide definitive answers regarding the most effective approach to manage a given condition [2]. This was a major achievement of evidence‐based medicine, contributing to more efficient translation of clinical research into practice in areas including breast cancer, vascular disease and neonatal health [2, 3, 4]. Systematic review methods designed for clinical trials were rapidly adapted and applied to other research questions, including observational studies of prevalence and risk factors for health behaviours and diseases, as well as implementation issues such as barriers to care and intervention coverage. The number of systematic reviews published annually skyrocketed in the first decades of the 21st century [5], alongside methodological guidance [6] and a plethora of reporting checklists [7, 8].
At the same time, some researchers have raised concerns that systematic reviews have proliferated in ways that are unhelpful, increasing research waste and redundancy—the very problems they were, in part, developed to address. Systematic reviews are typically cited more frequently than primary studies in a given area [9], and this, combined with the use of citation metrics to measure individual researcher achievement and rank scientific journals, may incentivise researchers and journal editors to prioritise reviews over primary research [5, 10]. The result is the ‘mass production’ of reviews that are redundant, of dubious quality or lacking in rationale or potential benefit [1, 11, 12, 13].
This pattern is clearly visible in addiction science. Consider the clinical question of the impacts of methadone versus buprenorphine on post‐natal outcomes for babies born to patients being treated for opioid use disorder. In 2013, a Cochrane review identified three relevant trials, involving 227 patients, and concluded that there was insufficient evidence to declare a superior treatment and a need for further controlled trials on this question [14]. In 2014, Brogly and colleagues, employing less stringent inclusion criteria, reviewed 12 studies and concluded that buprenorphine was associated with improved birthweight and gestational age, but urged that more evidence was needed [15]. A review of reviews published in 2015 concluded that there were no significant differences in physical parameters and Apgar scores at birth in neonates exposed to methadone versus buprenorphine [16]. A 2016 review by Zedler and colleagues included the same three clinical trials as the 2013 Cochrane review plus 15 observational studies, and concluded that there was moderately strong evidence to support buprenorphine over methadone in relation to several neonatal outcomes [17]. In 2020, the 2013 Cochrane review was updated, including the very same three studies as in the previous Cochrane review and the Zedler review, with conclusions unchanged from 2013: insufficient evidence to declare either methadone or buprenorphine superior, and calling for more clinical trials [18]. In 2022, a review of 20 studies reached the same conclusion as the Brogly review from 2014: buprenorphine was associated with improved birthweight and gestational age, but more evidence was needed [19]. 
Critically, these duplicative efforts appear to have been largely uninformative for clinical practice: major clinical guidelines unreservedly recommend either methadone or buprenorphine during pregnancy [20, 21, 22], as there is sufficient evidence that the benefits of either far outweigh the risks relative to no opioid agonist therapy.
This is just one example, but other sets of duplicative and arguably uninformative reviews can be identified in studies of substance use interventions and epidemiology, representing significant redundancy. Measures devised to minimise duplication and redundancy, such as decision frameworks to determine whether an update of an existing review is needed [23, 24], and the PROSPERO registry of planned, in‐progress and completed reviews [25], appear to have had limited success in curbing these issues. Furthermore, all of this comes at a significant cost: one analysis estimated that the average review takes at least a year to complete [26], while another estimated the cost of a systematic review at approximately US$141,000 in 2019 [27].
This situation is already concerning, but questions about redundancy and the limited usefulness of many systematic reviews gain new dimensions when considered in light of advancements in large language models (LLMs) such as ChatGPT. The potential for LLMs to increase the speed with which reviews are completed by performing certain tasks, particularly the more rote processes such as study selection and data extraction, has been widely discussed [28, 29, 30, 31, 32]. A review of the use of LLMs in the production of systematic reviews published in February 2025 declared that these approaches were promising but not yet adequately validated [29]. In June 2025, however, a pre‐print was published describing an LLM with astonishingly accurate ('superhuman') performance in updating Cochrane reviews [33]. Furthermore, the authors argued, LLMs may be able to generate de novo reviews, provided that a clear, detailed protocol is first developed by researchers to prompt the model [33]. If trends in the use of LLMs in the peer‐reviewed literature are any guide, though [34], it can be assumed that even protocol writing will be outsourced in full to an LLM in at least some cases. That is, we are reaching a point where systematic reviews may not only be duplicative and of questionable utility; they may also be generated in short order, with minimal intellectual forethought or human input.
Setting aside the potential for questionable practices and assuming good faith use of LLMs in systematic reviews, the further integration of artificial intelligence into systematic review production also adds to questions about the epistemological underpinnings of this method [35, 36, 37]. Publishers and editors of scientific journals have declared that AI tools cannot be credited with authorship because they cannot be held accountable for the accuracy and integrity of published work [38, 39]. It is conceivable, however, that in the near future, reviews will be produced in which no humans meet all four authorship requirements of the International Committee of Medical Journal Editors [39]; in particular, the requirement to make a substantial contribution to the acquisition, analysis or interpretation of the data, and perhaps even that of drafting the work or critically reviewing it for intellectual content. Assuming a research question for which a review is needed (these do indeed exist), and a protocol developed by humans with relevant expertise to ensure some level of intellectual oversight, if an LLM can produce a review that is of the same standard as one produced by humans, is it necessary or appropriate to require humans to invest scarce research resources to meet these authorship guidelines? And if a complete and high‐quality review can be produced by an LLM in a fraction of the time of a team of researchers, what does this mean for how reviews are evaluated as scientific outputs? Can researchers who develop a review protocol, but do not perform any of the tasks of the review, reasonably claim authorship or intellectual ownership of the work, or credit for any resulting impact? Most speculatively, can—or should—LLMs that generate valid systematic reviews be considered epistemic agents in their own right [40]?
A shift in this direction would mean reconfiguring our understanding of a systematic review: no longer the research output of individual researchers, but a scientifically valid contribution that arises outside our current systems of knowledge production.
Systematic reviews of the evidence remain an essential aspect of addiction science, as with all health research. The emergence of tools capable (or soon‐to‐be capable) of producing systematic reviews rapidly and with minimal human input threatens to overwhelm the literature with unnecessary and redundant work. Somewhat paradoxically, though, a potential shift away from viewing systematic reviews as research outputs that bolster the profile and impact of individual researchers may itself act as a brake on this trend. The integration of LLM technology into systematic reviews does more, however, than just reshape how reviews are conducted and evaluated as knowledge products. It raises significant epistemological questions that unsettle long‐held scientific norms of knowledge generation and epistemic agency. Grappling with these questions will be just as important as ensuring that LLM use in systematic reviews is beneficial for science as a whole.
Author Contributions
S.L. conceived of the idea and wrote the manuscript.
Funding
S.L. is supported by a Research Scholar award from Fonds de recherche du Québec ‐ Santé.
Conflicts of Interest
The author declares no conflicts of interest.
Larney S., “Research Waste, Redundancy and the Rise of the Machines: The Questionable Future of Systematic Reviews,” Drug and Alcohol Review 45, no. 2 (2026): e70128, 10.1111/dar.70128.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
1. Uttley L., Weng Y., and Falzon L., “Yet Another Problem With Systematic Reviews: A Living Review Update,” Journal of Clinical Epidemiology 177 (2025): 111608.
2. Chalmers I., “The Cochrane Collaboration: Preparing, Maintaining, and Disseminating Systematic Reviews of the Effects of Health Care,” Annals of the New York Academy of Sciences 703 (1993): 156–163.
3. Antiplatelet Trialists' Collaboration, “Secondary Prevention of Vascular Disease by Prolonged Antiplatelet Treatment,” British Medical Journal (Clinical Research Edition) 296, no. 6618 (1988): 320–331.
4. Early Breast Cancer Trialists' Collaborative Group, “Systemic Treatment of Early Breast Cancer by Hormonal, Cytotoxic, or Immune Therapy. 133 Randomised Trials Involving 31,000 Recurrences and 24,000 Deaths Among 75,000 Women,” Lancet 339, no. 8784 (1992): 1–15.
5. Gurevitch J., Koricheva J., Nakagawa S., and Stewart G., “Meta-Analysis and the Science of Research Synthesis,” Nature 555, no. 7695 (2018): 175–182.
6. Kolaski K., Logan L. R., and Ioannidis J. P. A., “Guidance to Best Tools and Practices for Systematic Reviews,” Systematic Reviews 12, no. 1 (2023): 96.
7. Page M. J., McKenzie J. E., Bossuyt P. M., et al., “The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews,” Journal of Clinical Epidemiology 134 (2021): 178–189.
8. EQUATOR Network, “Systematic Reviews/Meta-Analyses/Reviews/HTA/Overviews | Study Designs,” https://www.equator-network.org/reporting-guidelines-study-design/systematic-reviews-and-meta-analyses/.
9. Royle P., Kandala N. B., Barnard K., and Waugh N., “Bibliometrics of Systematic Reviews: Analysis of Citation Rates and Journal Impact Factors,” Systematic Reviews 2 (2013): 74.
10. Cohnstaedt L. W. and Poland J., “Review Articles: The Black-Market of Scientific Currency,” Annals of the Entomological Society of America 110, no. 1 (2017): 90.
11. Ioannidis J. P. A., “The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-Analyses,” Milbank Quarterly 94, no. 3 (2016): 485–514.
12. Lund H., Robinson K. A., Gjerland A., et al., “Meta-Research Evaluating Redundancy and Use of Systematic Reviews When Planning New Studies in Health Research: A Scoping Review,” Systematic Reviews 11 (2022): 241.
13. Kwok W., Dallant T., Martin G., et al., “Systematic Reviews on the Same Topic Are Common but Often Fail to Meet Key Methodological Standards: A Research-on-Research Study,” Journal of Clinical Epidemiology 189 (2025): 112018.
14. Minozzi S., Amato L., Bellisario C., Ferri M., and Davoli M., “Maintenance Agonist Treatments for Opiate-Dependent Pregnant Women,” Cochrane Database of Systematic Reviews 12 (2013): 1–38, https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD006318.pub3/full.
15. Brogly S. B., Saia K. A., Walley A. Y., Du H. M., and Sebastiani P., “Prenatal Buprenorphine Versus Methadone Exposure and Neonatal Outcomes: Systematic Review and Meta-Analysis,” American Journal of Epidemiology 180, no. 7 (2014): 673–686.
16. Holbrook A. M. and Nguyen V. H., “Medication-Assisted Treatment for Pregnant Women: A Systematic Review of the Evidence and Implications for Social Work Practice,” Journal of the Society for Social Work and Research 6, no. 1 (2015): 1–19.
17. Zedler B. K., Mann A. L., Kim M. M., et al., “Buprenorphine Compared With Methadone to Treat Pregnant Women With Opioid Use Disorder: A Systematic Review and Meta-Analysis of Safety in the Mother, Fetus and Child,” Addiction 111, no. 12 (2016): 2115–2128.
18. Minozzi S., Amato L., Jahanfar S., Bellisario C., Ferri M., and Davoli M., “Maintenance Agonist Treatments for Opiate-Dependent Pregnant Women,” Cochrane Database of Systematic Reviews 11 (2020): CD006318, https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD006318.pub4/full.
19. Kinsella M., Halliday L. O. E., Shaw M., Capel Y., Nelson S. M., and Kearns R. J., “Buprenorphine Compared With Methadone in Pregnancy: A Systematic Review and Meta-Analysis,” Substance Use and Misuse 57, no. 9 (2022): 1400–1416.
20. Clinical Guidelines on Drug Misuse and Dependence Update 2017 Independent Expert Working Group, Drug Misuse and Dependence: UK Guidelines on Clinical Management (Department of Health, 2017).
21. American Society of Addiction Medicine, “The ASAM National Practice Guideline for the Treatment of Opioid Use Disorder: 2020 Focused Update” (American Society of Addiction Medicine, 2020), 1–91, 10.1097/ADM.0000000000000633.
22. Yakovenko I., Mukaneza Y., Germé K., et al., “Management of Opioid Use Disorder: 2024 Update to the National Clinical Practice Guideline,” CMAJ 196, no. 38 (2024): E1280–E1290.
23. Cumpston M. and Flemyng E., “Chapter IV: Updating a Review,” in Cochrane Handbook for Systematic Reviews of Interventions (Cochrane Collaboration, 2024), https://www.cochrane.org/authors/handbooks-and-manuals/handbook/current/chapter-iv#anchor-iv2-deciding-whether-and-when-to-update.
24. Garner P., Hopewell S., Chandler J., et al., “When and How to Update Systematic Reviews: Consensus and Checklist,” BMJ 354 (2016): i3507.
25. PROSPERO, https://www.crd.york.ac.uk/prospero/.
26. Borah R., Brown A. W., Capers P. L., and Kaiser K. A., “Analysis of the Time and Workers Needed to Conduct Systematic Reviews of Medical Interventions Using Data From the PROSPERO Registry,” BMJ Open 7, no. 2 (2017): e012545.
27. Michelson M. and Reuter K., “The Significant Cost of Systematic Reviews and Meta-Analyses: A Call for Greater Involvement of Machine Learning to Assess the Promise of Clinical Trials,” Contemporary Clinical Trials Communications 16 (2019): 100443.
28. Li M., Sun J., and Tan X., “Evaluating the Effectiveness of Large Language Models in Abstract Screening: A Comparative Analysis,” Systematic Reviews 13, no. 1 (2024): 219.
29. Lieberum J. L., Toews M., Metzendorf M. I., et al., “Large Language Models for Conducting Systematic Reviews: On the Rise, but Not Yet Ready for Use—A Scoping Review,” Journal of Clinical Epidemiology 181 (2025): 111746.
30. Trad F., Yammine R., Charafeddine J., et al., “Streamlining Systematic Reviews With Large Language Models Using Prompt Engineering and Retrieval Augmented Generation,” BMC Medical Research Methodology 25, no. 1 (2025): 130.
31. Gartlehner G., Kahwati L., Hilscher R., et al., “Data Extraction for Evidence Synthesis Using a Large Language Model: A Proof-of-Concept Study,” Research Synthesis Methods 15, no. 4 (2024): 576–589.
32. Oami T., Okada Y., and Nakada T. A., “Performance of a Large Language Model in Screening Citations,” JAMA Network Open 7, no. 7 (2024): e2420496.
33. Cao C., Arora R., Cento P., et al., “Automation of Systematic Reviews With Large Language Models,” medRxiv (2025), 10.1101/2025.06.13.25329541.
34. Kobak D., González-Márquez R., Horvát E.-Á., and Lause J., “Delving Into LLM-Assisted Writing in Biomedical Publications Through Excess Vocabulary,” Science Advances 11, no. 27 (2025): eadt3813.
35. Gond J. P., Mena S., and Mosonyi S., “The Performativity of Literature Reviewing: Constituting the Corporate Social Responsibility Literature Through Re-Presentation and Intervention,” Organizational Research Methods 26, no. 2 (2023): 195–228.
36. MacLure M., “‘Clarity Bordering on Stupidity’: Where's the Quality in Systematic Review?,” Journal of Education Policy 20, no. 4 (2005): 393–416.
37. Greenhalgh T., Thorne S., and Malterud K., “Time to Challenge the Spurious Hierarchy of Systematic Over Narrative Reviews?,” European Journal of Clinical Investigation 48, no. 6 (2018): e12931.
38. Elsevier, “The Use of AI and AI-Assisted Technologies in Writing for Elsevier,” https://www.elsevier.com/about/policies-and-standards/the-use-of-generative-ai-and-ai-assisted-technologies-in-writing-for-elsevier.
39. ICMJE, “Recommendations: Defining the Role of Authors and Contributors,” https://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html#four.
40. Lissack M. and Meagher B., “LLMs as Epistemic Tools: Exformation and the Architecture of Machine Explanation,” SSRN (2025), https://papers.ssrn.com/abstract=5595850.
