ABSTRACT
Introduction
While artificial intelligence (AI) tools have been utilized for individual stages within the systematic literature review (SLR) process, no tool has previously been shown to support each critical SLR step. In addition, the need for expert oversight has been recognized to ensure the quality of SLR findings. Here, we describe a complete methodology for utilizing our AI SLR tool with human‐in‐the‐loop curation workflows, as well as AI validations, time savings, and approaches to ensure compliance with best review practices.
Methods
SLRs require completing Search, Screening, and Extraction from relevant studies, with meta‐analysis and critical appraisal as relevant. We present a full methodological framework for completing SLRs utilizing our AutoLit software (Nested Knowledge). This system integrates AI models into the central steps in SLR: Search strategy generation, Dual Screening of Titles/Abstracts and Full Texts, and Extraction of qualitative and quantitative evidence. The system also offers manual Critical Appraisal and Insight drafting and fully‐automated Network Meta‐analysis. We report validations comparing AI performance to experts and, where relevant, time savings and ‘rapid review’ alternatives to the SLR workflow.
Results
Search strategy generation with the Smart Search AI can turn a Research Question into full Boolean strings with 76.8% and 79.6% Recall in two validation sets. Supervised machine learning tools can achieve 82–97% Recall in reviewer‐level Screening. Population, Interventions/Comparators, and Outcomes (PICOs) extraction achieved F1 of 0.74; accuracy for study type, location, and size were 74%, 78%, and 91%, respectively. Time savings of 50% in Abstract Screening and 70–80% in qualitative extraction were reported. Extraction of user‐specified qualitative and quantitative tags and data elements remains exploratory and requires human curation for SLRs.
Conclusion
AI systems can support high‐quality, human‐in‐the‐loop execution of key SLR stages. Transparency, replicability, and expert oversight are central to the use of AI SLR tools.
Keywords: artificial intelligence, evidence synthesis, human‐in‐the‐loop, meta‐analysis, systematic literature review
1. Introduction
Systematic literature reviews (SLRs) are crucial to understanding the efficacy, safety, and cost‐effectiveness of therapies and are used to support evidence‐based decision‐making across healthcare, policy‐making, guidelines, regulatory and reimbursement decisions, and scientific publishing. As the demand for more efficient and reliable evidence generation and reporting has grown, artificial intelligence (AI) technologies have been incorporated to automate various stages of the SLR process. However, despite extensive calls for innovation and tool development [1] and many exploratory reviews integrating machine learning (ML) or Large Language Models (LLMs) for individual review steps [2], AI tools for SLR remain underdeveloped and have not replaced conventional SLR methods to date.
Additionally, a critical aspect to ensure the efficacy and trustworthiness of AI‐powered SLR tools is their validation via comparison to expert‐based methods. The main goal of AI integration in SLR is to save expert time, given the extraordinary labor involved in reviews; such tools are generally used to assist experts rather than to replace them [2]. In addition, guidelines and frameworks from government agencies and Health Technology Assessment (HTA) bodies such as the National Institute for Health and Care Excellence (NICE) set standards for transparency, reproducibility, and rigor in the systematic review process, and users and developers of AI SLR tools alike are responsible for ensuring that their methodologies comply with HTA standards. While these AI‐powered SLR tools aim to enhance the efficiency of the SLR process, human‐in‐the‐loop oversight at each step remains key to ensuring high‐quality, accurate findings [3]. Thus, leading guidance recommends leveraging the strengths of such tools where there is demonstrable value, but with strict curation of AI findings, enhancing human expertise instead of replacing it [4]. Optimally, AI tools should improve the efficiency and accuracy of reviews while maintaining high‐quality outputs via human‐in‐the‐loop oversight.
Here, we detail the AI methodologies and expert‐oversight mechanisms of our single software solution (AutoLit®, Nested Knowledge), which provides AI assistance for each core step in SLR, namely Search, Screening, Qualitative and Quantitative Extraction, and Export of data [5]. Where available, we document all validations of each AI tool against expert gold standards. While the methods are presented in the context of SLRs with full expert curation, we also detail modes that may be appropriate for more rapid evidence generation, including rapid scoping reviews, targeted literature reviews, gap analyses, and other accelerated review types [6]. Lastly, we describe and emphasize the importance of implementing validated, integrated tools with transparency, reproducibility, and oversight mechanisms in finding, extracting, and presenting synthesized evidence.
2. Methods
To both present and test the full workflow of AI‐assisted SLR, we documented the AutoLit system's methods for each stage of review, with a focus on disclosing the available AI models and the results of validation studies against expert‐driven findings where relevant. Manual actions were summarized from the online Documentation [7] and AI methods from the Model Cards [8], in full compliance with the PALISADE Checklist for AI tools [9].
3. Literature Search
3.1. Software Workflow
3.1.1. Direct Queries to Databases
The literature search stage of the review process is intended to comprehensively identify all relevant studies that address a predefined Research Question based on records from databases (such as MEDLINE/PubMed, Embase, and the Cochrane Central Register of Controlled Trials [CENTRAL]) and other sources, often requiring extensive time and effort. In short, the Research Question is used to generate Boolean queries, which are in turn run across the databases, with the final output of records (metadata and abstracts) to be screened for relevance.
In AutoLit, as of May 31st, 2025, Boolean queries may be drafted in or copied into a query editor/builder on the Literature Search page and run via Application Programming Interface (API) on PubMed, ClinicalTrials.gov, the MAUDE database, the Directory of Open Access Journals, and/or Europe PMC.
3.1.2. Import and Other Sources Upload
In addition, records may be imported from bibliographic files (RIS, nBIB, or TXT files) or from spreadsheets containing metadata (Excel and CSV files). If a record contains a PubMed ID or Digital Object Identifier (DOI), its metadata may be checked and ‘synced’ against PubMed or CrossRef [10], respectively. To date, XML and other file types cannot be imported into AutoLit.
Non‐indexed sources from grey literature searches may be added individually or in bulk by DOI, by manual metadata entry, or by extraction from the PDF of a primary study; note, however, that AutoLit does not provide direct searching of grey literature sources. Lastly, records may be added via “Bibliomining,” the extraction of References from the bibliographies of previously‐published systematic reviews. For this module, users must provide the existing SLR; its reference section is then identified and all citation data are extracted using the open‐source CERMINE model [11]. The user can correct or override all bibliomining outputs.
Notably, any API‐based search may be scheduled for automatic re‐execution; the software automatically de‐duplicates both these scheduled search results and any records imported or uploaded after the initial searches, enabling immediate screening and the maintenance of ‘living’ SLRs, an established practice in SLR [12].
3.1.3. Duplicate Management
All searched and imported records are deduplicated by parsing and comparing metadata fields with a multi‐field matching algorithm that considers DOIs, PubMed IDs, and other IDs such as the National Clinical Trial (NCT) identifier, as well as Jaro‐Winkler similarity scores between normalized Titles. Records are matched when they share an exact ID or when their Title similarity score exceeds 0.95. Conference abstracts, as identified by Publication Type or journal/conference, are deduplicated separately from full‐text articles, so that abstracts and articles sharing a title are matched only within their own source type; pre‐prints receive no specific or separate treatment. Users can merge records or override automatic deduplication decisions from a Duplicate queue.
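The matching rule above can be sketched in Python. This is a minimal illustration, assuming records are plain dictionaries with hypothetical `doi`, `pmid`, `nct`, `title`, and `is_conference_abstract` keys and assuming a simple title normalization (lowercase, punctuation stripped); AutoLit's actual normalization and field handling are not specified here. The Jaro‐Winkler function follows the standard definition with a prefix scale of 0.1 and a maximum prefix length of 4.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed normalization)."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())

def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_matched = [c for c, used in zip(s1, m1) if used]
    b_matched = [c for c, used in zip(s2, m2) if used]
    transpositions = sum(a != b for a, b in zip(a_matched, b_matched)) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def is_duplicate(a: dict, b: dict, threshold: float = 0.95) -> bool:
    # Assumption: conference abstracts are only compared with other conference abstracts.
    if a.get("is_conference_abstract", False) != b.get("is_conference_abstract", False):
        return False
    for key in ("doi", "pmid", "nct"):  # any exact shared identifier is a match
        if a.get(key) and b.get(key) and str(a[key]).lower() == str(b[key]).lower():
            return True
    ta, tb = normalize_title(a.get("title", "")), normalize_title(b.get("title", ""))
    if not ta or not tb:
        return False  # never match on empty titles
    return jaro_winkler(ta, tb) > threshold
```

Titles that differ only in punctuation or capitalization normalize to the same string and score 1.0, while unrelated titles fall far below the 0.95 cutoff.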
3.2. AI Tools
To support the literature search process, search strategy generation and automatic classification of studies are generally accepted as areas where AI can be suitably implemented [13, 14, 15, 16]. Two AI tools assist in generating Boolean search strings: Smart Search, a fully AI‐driven query builder, and Search Exploration, an iterative system for user‐refined query development.
3.2.1. Smart Search
Smart Search is an LLM‐based reasoning agent that builds Boolean queries using a Generator‐Critic loop [17]. After the user provides a Research Question, Smart Search drafts follow‐up questions prompting the user to supply all Population, Interventions, Comparators, and Outcomes (PICOs) and any other search constraints, after which the model generates three candidate queries. The user can select one or more queries, edit them using the manual query builder or Search Exploration, and run them against the API‐connected databases.
Validation Methods: Ten Cochrane reviews and nineteen reviews with manually‐constructed searches (previously published as a validation set) [18] were identified. All included records from these reviews were extracted, as well as the Aims/Objective statement from each review. The Aims/Objective statements were given to Smart Search, and the included records returned by the broadest query generated were used to calculate Recall (percentage of the gold‐standard included records found by the model) and Precision (percentage of all results returned by the model that were gold‐standard included records). For comparison against foundational models, GPT 4.0 was also tested for its ability to identify included studies from the ten Cochrane reviews. Notably, the Cochrane reviews were based on multi‐database searches, while Smart Search was run solely on PubMed; Smart Search's Recall may therefore be underestimated relative to a multi‐database run, but it reflects realistic coverage against a multi‐database gold standard.
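Once PICOs have been elicited, a common convention for assembling a Boolean string is to OR together the synonyms within each concept and AND the concept blocks together. The sketch below illustrates only that assembly step; it is not Smart Search's Generator‐Critic loop, and the dictionary layout and phrase‐quoting rule are assumptions for illustration.

```python
def build_boolean_query(concepts: dict) -> str:
    """OR synonyms within each PICO concept, then AND the concept blocks together.

    `concepts` maps a concept label (e.g. "population") to a list of terms;
    multi-word terms are wrapped in double quotes as phrase searches.
    """
    blocks = []
    for terms in concepts.values():
        if not terms:
            continue  # skip concepts with no terms rather than emitting "()"
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)

query = build_boolean_query({
    "population": ["multiple sclerosis", "MS"],
    "intervention": ["ocrelizumab"],
    "outcome": [],
})
print(query)  # ("multiple sclerosis" OR MS) AND (ocrelizumab)
```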
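The two validation metrics can be computed directly on sets of record identifiers. A minimal sketch, with hypothetical identifiers standing in for PubMed IDs:

```python
def search_recall_precision(gold_included: set, retrieved: set) -> tuple:
    """Recall = gold-standard included records retrieved / all gold-standard included records.
    Precision = gold-standard included records retrieved / all records retrieved."""
    hits = gold_included & retrieved
    recall = len(hits) / len(gold_included) if gold_included else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 4 included records in the gold standard and a query
# returning 200 records, 3 of which are gold-standard inclusions.
gold = {"pmid1", "pmid2", "pmid3", "pmid4"}
retrieved = {"pmid1", "pmid2", "pmid3"} | {f"other{i}" for i in range(197)}
recall, precision = search_recall_precision(gold, retrieved)
print(f"Recall {recall:.1%}, Precision {precision:.1%}")  # Recall 75.0%, Precision 1.5%
```

Because broad searches return many irrelevant records, Precision in the low single digits is expected even when Recall is high.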
3.2.2. Search Exploration
Search Exploration is a multi‐model system for human‐in‐the‐loop query editing. When a Boolean query is entered and Explored, the system extracts PICOs (Figure 1), study type, size, and location using a multi‐model system [19] that includes the BioELECTRA Natural Language Processing (NLP) algorithm [20]; it also identifies Acronyms and word clusters and derives ‘fuzzy’ Topics from the underlying abstracts using an open‐source fork of Carrot2 [21]. For larger searches, Search Exploration samples 250 records and reports the frequency of each of these concepts across the search results. The user may then adopt any concept or term and add it to the Boolean query, so that the final result is a user‐constructed query refined on the basis of the model's feedback and suggested terms.
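The frequency breakdown can be sketched as a sample‐and‐count step. This assumes simple random sampling (the sampling method is not specified above), and `extract_concepts` is a hypothetical stand‐in for the PICO, acronym, and topic models; each concept is counted at most once per record.

```python
import random
from collections import Counter

def concept_frequencies(records, extract_concepts, sample_size=250, seed=0):
    """Share of sampled records mentioning each concept, most frequent first.

    `extract_concepts` is a hypothetical stand-in for the concept models:
    any callable returning the concepts found in one record.
    """
    if not records:
        return {}
    rng = random.Random(seed)
    sample = records if len(records) <= sample_size else rng.sample(records, sample_size)
    counts = Counter()
    for record in sample:
        counts.update(set(extract_concepts(record)))  # count each concept once per record
    return {concept: n / len(sample) for concept, n in counts.most_common()}

abstracts = ["aspirin reduced stroke", "aspirin versus placebo", "statin and stroke"]
freqs = concept_frequencies(abstracts, lambda text: text.split())
```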
Figure 1.

Search Exploration predicts and maps Populations, Interventions, Comparators, and Outcomes from underlying abstracts resulting from Boolean PubMed queries, including reporting frequency and enabling adoption of new search terms.
In addition to the published validations of Carrot2 and BioELECTRA cited above, the PICOs and study type, size, and location extraction were validated by comparison against expert‐extracted datasets of each content type [19].
4. Screening
Screening represents the filtering of candidate records from search results to identify those that match the pre‐specified Inclusion Criteria (and exclude those that match Exclusion Criteria). AutoLit reduces this practice to a streamlined workflow for user identification of Included and Excluded records.
4.1. Software Workflow
4.1.1. Screening Configuration and Modes
Screening is configured by entering all Exclusion Reasons, with user‐defined hierarchical organization of reasons. User Keywords may also be created, which are then highlighted in color in candidate abstracts. Users also select the Screening Mode, choosing between Standard mode, in which Title/Abstract (TIAB) and Full Text review are completed in one stage, and Two‐Pass mode, in which TIAB and Full Text screening are completed in two separate stages. Users also choose between Single mode, where each screening stage requires only one decision (as may be used in rapid reviews or targeted literature reviews), and Dual mode, wherein two reviewer‐level screening decisions must be completed, with all conflicts resolved by a third‐party Adjudicator (as is conventional in SLR). Note that, due to higher error rates in Single screening, Dual screening at both the TIAB and Full Text stages is the recommended mode for SLRs [22].
4.1.2. Screening Performance
Screening is completed from a single window containing the TIAB or Full Text PDF, as relevant, and all Exclusion Reasons. Users select an Exclusion Reason or Advance/Include each record; in Dual modes (Standard or Two‐Pass), two reviewer decisions must be made before records proceed to Adjudication. In Two‐Pass modes (Single or Dual), records Advanced at TIAB are sent forward for Full Text screening before final Inclusion. The full Screening History is auditable on a record‐by‐record basis.
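The Dual‐mode flow for a single record can be summarized as a small decision function. This is a simplified sketch of conventional dual screening, assuming each decision is a string such as `"advance"` or `"exclude: wrong population"`, that any difference between the two strings (including different Exclusion Reasons) counts as a conflict, and that an Adjudicator's decision is final; AutoLit's own conflict rules may differ.

```python
from typing import List, Optional

def dual_screening_status(decisions: List[str], adjudication: Optional[str] = None) -> str:
    """Resolve one record's status in a Dual screening stage (simplified sketch)."""
    if not decisions:
        return "unscreened"
    if len(decisions) < 2:
        return "awaiting second reviewer"
    if adjudication is not None:
        return adjudication  # the Adjudicator's decision is final
    first, second = decisions[0], decisions[1]
    if first == second:
        return first  # reviewers agree
    return "conflict: send to adjudicator"
```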
4.1.3. Related Reports Management
During or after Screening, users can identify any ‘linked’ or related reports. As outlined in the Cochrane Handbook [23], the unit of interest in an SLR is the study that was performed, not the published record or records reporting its findings. Users manually identify and link Related Reports by their Title or Authors to prevent inadvertent multiple‐reporting of data on the same population.
4.1.4. PRISMA Flow Diagram Creation and Management
AutoLit enables compliance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta‐Analyses) by structuring the review process into transparent, traceable stages aligned with PRISMA guidelines [24]. The platform logs the search strategies, Exclusion Reasons and screening decisions, and extracted data, ensuring reproducibility and auditability. It generates a PRISMA Flow Diagram and maintains exportable records that support full reporting of methods and results, facilitating adherence to PRISMA 2020 standards for transparency and completeness in systematic reviews. Note that, since the PRISMA Checklist includes items regarding methodological reporting and contents beyond the Flow Diagram, manual checklist completion is necessary for any PRISMA‐compliant review.
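The counts behind a flow diagram follow directly from per‐record outcomes. A minimal single‐stage sketch, assuming each record is a dictionary with a boolean `duplicate` flag and a `decision` that is `"include"`, an Exclusion Reason, or `None` if unscreened (a hypothetical layout); the full PRISMA 2020 diagram has further boxes (e.g., reports sought for retrieval) that this does not model.

```python
from collections import Counter

def prisma_counts(records):
    """Tally PRISMA flow-diagram boxes from per-record screening outcomes (single stage)."""
    unique = [r for r in records if not r["duplicate"]]
    screened = [r for r in unique if r["decision"] is not None]
    excluded = Counter(r["decision"] for r in screened if r["decision"] != "include")
    return {
        "identified": len(records),
        "duplicates_removed": len(records) - len(unique),
        "screened": len(screened),
        "excluded_by_reason": dict(excluded),
        "included": sum(r["decision"] == "include" for r in screened),
    }

counts = prisma_counts([
    {"duplicate": False, "decision": "include"},
    {"duplicate": False, "decision": "wrong population"},
    {"duplicate": True, "decision": None},
    {"duplicate": False, "decision": None},
])
```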
4.2. AI Tools
4.2.1. The Inclusion Prediction Model (Robot Screener)
As users screen records, all TIAB decisions (in Two‐Pass mode) or final decisions (in Standard mode) are available to train the Inclusion Prediction Model in each specific ‘nest’ (each specific review project). The Inclusion Prediction Model is a text‐embedding‐driven machine learning classifier using XGBoost (Extreme Gradient Boosting) [25] that is trained in real time to assign each remaining unscreened article an inclusion‐prediction score, together with a threshold that optimizes the decision on whether the article should be included or excluded. It uses user decisions as labels and each record's metadata and textual content (e.g., title, abstract, keywords) as input features. In each nest, Recall, Precision, F1, Accuracy, and the Area under the Receiver Operating Characteristic curve are calculated in a fivefold Cross‐Validation and displayed to the user to help determine whether employing the Model is appropriate (Figure 2a) [26].
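The reported statistics can be computed from held‐out scores pooled across the five folds. The sketch below does not reproduce the XGBoost classifier; it shows an unshuffled fold split (a simplification) and the metric calculations, assuming a 0.5 threshold and labels of 1 for Advanced/Included and 0 for Excluded. AUC uses the rank (Mann‐Whitney) formulation.

```python
def kfold_indices(n: int, k: int = 5):
    """Split record indices 0..n-1 into k interleaved folds (no shuffling, for illustration)."""
    return [list(range(i, n, k)) for i in range(k)]

def classification_metrics(labels, scores, threshold=0.5):
    """Recall, Precision, F1, Accuracy at a threshold, plus ROC AUC from the raw scores."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y and s >= threshold)
    fp = sum(1 for y, s in pairs if not y and s >= threshold)
    fn = sum(1 for y, s in pairs if y and s < threshold)
    tn = len(pairs) - tp - fp - fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # AUC = probability that a random included record outscores a random excluded one
    # (ties count as half).
    pos = [s for y, s in pairs if y]
    neg = [s for y, s in pairs if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy, "auc": auc}

m = classification_metrics([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.7])
```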
Figure 2.

The Inclusion Prediction Model learns from expert screening on a project‐by‐project basis, with transparent fivefold cross‐validation statistics for researchers to review regularly. a (above): Example of the cross‐validation statistics from an example project in AutoLit. b (below): Robot Screener decisions displayed in the Adjudication workflow, with disagreements between human and Robot decisions available to be adopted or overridden by the independent adjudicator.
The recommended method of employing the Inclusion Prediction Model is as a second reviewer (“Robot Screener”) in TIAB reviewer‐level screening. The Model requires a minimum of 50 decisions, of which 10 must be Advancements/Inclusions, to train initially. From this point forward, automatic re‐training may be configured, and Robot Screener decisions are displayed to the Adjudicator as soon as one human reviewer‐level decision is made (Figure 2b).
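The training gate and the "second reviewer" comparison can be sketched as follows. The 50/10 minimums come from the text above; the threshold value and the `"advance"`/`"exclude"` labels are hypothetical.

```python
MIN_DECISIONS = 50   # minimum screened records before initial training
MIN_INCLUSIONS = 10  # of which at least this many must be Advanced/Included

def model_can_train(labels) -> bool:
    """labels: 1 for Advanced/Included, 0 for Excluded."""
    return len(labels) >= MIN_DECISIONS and sum(labels) >= MIN_INCLUSIONS

def compare_with_robot(score: float, threshold: float, human_decision: str):
    """Turn a prediction score into a Robot decision and flag disagreement for the Adjudicator."""
    robot = "advance" if score >= threshold else "exclude"
    status = "agree" if robot == human_decision else "disagree: adjudicator decides"
    return robot, status
```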
4.2.2. Validation Methods
Largely due to the labor‐intensive nature of manual dual screening, AI is generally accepted by guidelines as a suitable means of supporting the identification of relevant studies [13, 14, 15, 16]. The Inclusion Prediction Model was validated in two separate studies: one internal [18], using a set of nineteen reviews performed using Robot Screener for diverse review topics, and one external [27], using a set of six clinical and economic reviews for HTA purposes. Note that in both validations, more expert decisions were available than Robot decisions, as the initial records screened by experts served as the training set in the performance of Robot‐assisted screening.
5. Tagging (Qualitative Extraction)
SLRs require the extraction of qualitative information on underlying studies, ranging from study designs to populations included to the definitions of outcome variables. To accomplish qualitative extraction, AutoLit offers a Tagging module, wherein both AI‐configured and user‐configured Tags are placed in a hierarchy, with the relationships between concepts represented by placing ‘child tags’ below ‘parent tags’ (e.g., “Overall Survival” may be placed below “Clinical Outcomes,” which in turn may be below “Outcomes”). Then, these tags are applied to underlying studies, with traceable extraction of text segments, numbers, or customized tables of data.
5.1. Software Workflow
5.1.1. Tagging Configuration
Tags are configured with a Tag Name and, optionally, a Description and/or Aliases (i.e., synonyms or acronyms). Parent‐child relationships are created by drag‐and‐drop, defining the position of the tag in the hierarchy. By default, Tags are set for Text extraction, though advanced configuration enables Tags to extract numeric‐only, drop‐down options, or tabular data [28]. Tags may also be imported from user‐configured templates.
In Form‐based Mode, Tags may also be identified as Questions, which represent the questions shown to the user and/or AI models for extraction. In brief, Single Apply Questions are applied or not based on the relevance of an individual tag in the hierarchy, while Select questions are answered with one (Single Select) or several (Multiple Select) child tags of the Tag configured as a Question.
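The Tag configuration described above maps naturally onto a tree. A minimal sketch whose field names are hypothetical (not AutoLit's schema): each Tag carries a name, optional description and aliases, an extraction type, an optional Question type, and child Tags; Select Questions are answered with child Tags.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Tag:
    name: str
    description: str = ""
    aliases: List[str] = field(default_factory=list)    # synonyms or acronyms
    extraction_type: str = "text"                        # "text", "numeric", "options", or "table"
    question_type: Optional[str] = None                  # None, "single_apply", "single_select", "multiple_select"
    children: List["Tag"] = field(default_factory=list)

    def add_child(self, child: "Tag") -> "Tag":
        self.children.append(child)
        return child

    def path_to(self, name: str, _prefix: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
        """Return the parent-to-child path of names leading to `name`, if present."""
        path = _prefix + (self.name,)
        if self.name == name:
            return path
        for child in self.children:
            found = child.path_to(name, path)
            if found:
                return found
        return None

    def valid_selection(self, selected: List[str]) -> bool:
        """Select Questions take child tags: exactly one (Single) or one or more (Multiple)."""
        if not set(selected) <= {c.name for c in self.children}:
            return False
        if self.question_type == "single_select":
            return len(selected) == 1
        if self.question_type == "multiple_select":
            return len(selected) >= 1
        return True

outcomes = Tag("Outcomes")
clinical = outcomes.add_child(Tag("Clinical Outcomes"))
clinical.add_child(Tag("Overall Survival", aliases=["OS"]))
```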
5.1.2. Tagging Individual Studies
Users can apply Tags from a single interface containing Full Text PDFs and a Tagging panel (in Form‐based Mode, the Tagging panel is presented Question‐by‐Question). If using AI‐driven Recommendations, a Recommendations panel displays and populates AI‐extracted information upon user selection; in addition, all applied Tags can be edited or overridden. Date, time, and user (or AI) for each applied Tag are auditable within each record.
5.2. AI Tools
5.2.1. Core Smart Tags
Core Smart Tags (CSTs) use the multi‐model system [19] described above under Search Exploration; however, instead of constructing searches, CSTs build hierarchies of PICOs, study size, type, and location (Figure 3) based on the abstracts of all search results and the user‐supplied Research Question. CSTs generate Tags for all PICOs concepts recognized in over 0.5% of abstracts, a structure of study types (e.g. Randomized Controlled Trial [RCT], Observational), and country and study population size, and they also automatically extract supporting text excerpts. In “Recommend” mode, CSTs provide Recommendations for extraction that must be user‐curated; in “Apply” mode, CSTs are applied directly to underlying records. While curation is enabled in either mode, Applied CSTs are immediately available for interpretation/analysis or export. Due to the high standard for accuracy in extraction, human‐in‐the‐loop curation of Recommendations is recommended for SLRs, though immediate AI‐Applied Tags may be sufficient for rapid review types.
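The 0.5% cutoff can be illustrated as a frequency filter over per‐abstract concept lists. The concept recognition itself is not reproduced; the input layout (one list of recognized concepts per abstract) is an assumption.

```python
from collections import Counter

def concepts_over_share(abstract_concepts, min_share=0.005):
    """Keep concepts recognized in more than `min_share` of abstracts (0.005 = 0.5%).

    Each concept is counted at most once per abstract.
    """
    n = len(abstract_concepts)
    if n == 0:
        return {}
    counts = Counter(c for concepts in abstract_concepts for c in set(concepts))
    return {c: k for c, k in counts.most_common() if k / n > min_share}

# 1,000 abstracts: "A" in 0.6%, "B" in exactly 0.5% (not over the cutoff), "C" in 0.1%.
abstracts = [["A"]] * 6 + [["B"]] * 5 + [["C"]] + [[]] * 988
kept = concepts_over_share(abstracts)
```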
Figure 3.

Core Smart Tags extracts preset elements (PICOs, Study Location, Type, Size) and builds the project's data extraction template/tag hierarchy. These hierarchies can be edited and expanded by the user, such as adding tags for specific sub‐categories, administrative annotation, and concepts outside of PICOs, Study Location, Type, and Size.
5.2.2. Adaptive Smart Tags
Adaptive Smart Tags (ASTs) provide LLM‐driven extraction of evidence for custom/user‐configured Questions and Tags. While CSTs are fundamentally limited to abstracts and to their preset content types, ASTs enable custom extraction from abstracts and Full Text PDFs, including direct quotations, numerical data, drop‐down options, tabular data, and/or generative answers to user questions. Like CSTs, ASTs can be used in Recommend or Apply modes and provide annotation of underlying sources (see Figure 4) and full audit records of any applied Tags.
Figure 4.

Adaptive Smart Tags extracts evidence from abstracts or Full Texts (pictured here) based on custom, project‐specific Questions. Contents extracted are highlighted for full traceability, and can be text, numeric, options/drop‐down, or tabular data (pictured here).
5.2.3. Validation Methods
We validated each element of CSTs against gold‐standard datasets. For CST‐generated PICOs and study size, the underlying model was tested against the open‐source EBM‐NLP PICO Corpus [29]; for study location, we used ClinicalTrials.gov study locations from NCT‐linked studies, assessing every country independently for multi‐country studies; for study type, we hand‐labelled 1,000 PubMed‐sourced records.
ASTs were validated by external researchers in 2024 [30]; however, a second version of ASTs was launched in January 2025, and to date, only prelaunch validation statistics from comparison against six SLRs are available (see below). Note that, while ASTs incorporate LLMs that can extract in multiple languages (tested in development on Romance languages and Chinese), all validation studies were performed on English‐language studies, so the generalizability of these findings to other languages remains to be demonstrated.
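Assessing every country independently means scoring at the level of (study, country) pairs. A minimal sketch of country‐level Precision and Recall, assuming gold and predicted locations are given as sets of country codes keyed by NCT number (hypothetical layout); how the reported Accuracy was defined is not specified here, so it is omitted.

```python
def country_level_scores(gold: dict, predicted: dict):
    """Precision and Recall over (study, country) pairs; each country is scored independently."""
    tp = fp = fn = 0
    for study, gold_countries in gold.items():
        pred = predicted.get(study, set())
        tp += len(gold_countries & pred)   # correctly predicted countries
        fp += len(pred - gold_countries)   # predicted but not in the registry
        fn += len(gold_countries - pred)   # in the registry but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = {"NCT1": {"US", "CA"}, "NCT2": {"FR"}}
pred = {"NCT1": {"US"}, "NCT2": {"FR", "DE"}}
p, r = country_level_scores(gold, pred)
```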
6. Meta‐Analytical (Quantitative) Extraction
6.1. Software Workflow
6.1.1. Meta‐Analytical Configuration
In addition to qualitative extraction, SLRs may require extraction of highly structured quantitative data for use in meta‐analysis. In AutoLit, users configure the Interventions and Data Elements for extraction within the Hierarchy by providing additional statistical context for any Tag. In short, a single hierarchy represents ‘nested’ Interventions, and each individual Data Element can be configured as dichotomous (event rate), categorical (ordinal), or continuous (a central tendency such as mean or median, with standard deviation, inter‐quartile range, or range). Users may configure Dual Meta‐analytical Extraction, wherein two reviewers complete extraction of every variable, with independent adjudication of differences in findings between reviewer‐level extraction.
6.1.2. Meta‐Analytical Extraction
To extract data, users first identify the interventions and arm sizes/populations; then, for each Data Element, users can extract a Baseline timepoint if relevant and any number of Outcome timepoints. For dichotomous variables, users extract the event rate of the chosen outcome; for categorical, the number of occurrences of each ordinal category; for continuous, the central tendency measure and the measure of variance. All arm sizes may be adjusted for loss‐to‐follow‐up.
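The three Data Element types and the loss‐to‐follow‐up adjustment can be sketched as a small data model. Class and field names are hypothetical, and treating the adjusted arm size as enrolled minus lost to follow‐up is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Arm:
    intervention: str
    n_enrolled: int
    lost_to_follow_up: int = 0

    @property
    def n_analyzed(self) -> int:
        """Arm size adjusted for loss to follow-up (assumed: enrolled minus lost)."""
        return self.n_enrolled - self.lost_to_follow_up

@dataclass
class Dichotomous:
    events: int

    def event_rate(self, arm: Arm) -> float:
        return self.events / arm.n_analyzed

@dataclass
class Categorical:
    counts: Dict[str, int]  # occurrences of each ordinal category

    def proportions(self, arm: Arm) -> Dict[str, float]:
        return {k: v / arm.n_analyzed for k, v in self.counts.items()}

@dataclass
class Continuous:
    central: float                  # mean or median
    central_kind: str = "mean"
    sd: Optional[float] = None      # standard deviation, if reported
    lower: Optional[float] = None   # IQR or range bounds, if reported instead
    upper: Optional[float] = None

arm = Arm("Drug A", n_enrolled=100, lost_to_follow_up=20)
```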
6.2. AI Tools
6.2.1. Smart MA Extraction
AutoLit provides Smart MA Extraction, which can either build the full Intervention and Data Element hierarchy (similarly to CSTs) or extract user‐configured Data Elements (more akin to ASTs). This multi‐model approach uses a PDF reader (PDF Miner) [31], LLMs, heuristics, and PICOs ontologies to extract Interventions, Data Elements, and timepoints from tables and text (no data are extracted from figures or supplements). The model then harmonizes the heterogeneous data by comparing and combining ‘like’ elements before ingesting all data from the underlying PDFs, with direct traceability to their sources.
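As a toy illustration of combining ‘like’ elements while preserving traceability, the sketch below groups extracted labels after normalization and a synonym lookup. The synonym map is a hypothetical stand‐in for the PICOs ontologies, and the actual harmonization (LLMs and heuristics) is not reproduced.

```python
import re
from collections import defaultdict

# Hypothetical synonym map standing in for the PICOs ontologies.
SYNONYMS = {"os": "overall survival", "pfs": "progression-free survival"}

def harmonize(elements):
    """Group extracted elements whose normalized labels match, keeping every source location."""
    merged = defaultdict(list)
    for element in elements:
        key = re.sub(r"\s+", " ", element["label"].strip().lower())
        key = SYNONYMS.get(key, key)
        merged[key].append(element["source"])
    return dict(merged)

result = harmonize([
    {"label": "Overall Survival", "source": "p3 table 2"},
    {"label": "OS", "source": "p5 text"},
    {"label": "PFS ", "source": "p3 table 2"},
])
```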
7. Critical Appraisal
AutoLit also offers a fully manual module for Critical Appraisal. In short, users can select a pre‐configured Critical Appraisal survey and execute it on underlying studies, with the system tracking each user answer and generating Traffic Light and Domain Distribution diagrams. As of May 31, 2025, the available options are: Cochrane RoB 2, Version 2; Cochrane ROBINS‐I, Version 1; SIGN, Versions 2011 and 2019; JBI, Version 2020; Newcastle‐Ottawa Scale, Version I; and QUADAS, Version 2. For relevant tools (JBI, SIGN, Newcastle‐Ottawa), multiple study types may be assessed in a single review, and users must answer each question as laid out in each survey's published format, with the option to include comments or text. Like other modules, Critical Appraisal has full audit records and is exportable, but no AI assistance has yet been integrated into it.
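A Domain Distribution diagram summarizes, for each domain, the share of studies receiving each judgment. A minimal sketch, assuming judgments are stored as `judgments[study][domain]` with hypothetical level labels such as "Low", "Some concerns", and "High" (the Traffic Light plot is simply this study‐by‐domain matrix).

```python
from collections import Counter

def domain_distribution(judgments):
    """Proportion of studies at each judgment level, per appraisal domain."""
    per_domain = {}
    for study_domains in judgments.values():
        for domain, level in study_domains.items():
            per_domain.setdefault(domain, Counter())[level] += 1
    return {
        domain: {level: k / sum(counts.values()) for level, k in counts.items()}
        for domain, counts in per_domain.items()
    }

dist = domain_distribution({
    "Study 1": {"D1": "Low", "D2": "Low"},
    "Study 2": {"D1": "High", "D2": "Low"},
})
```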
8. Study Inspector
In addition to the AI‐assisted Search, Screening, Qualitative and Quantitative Extraction, and manual Critical Appraisal, AutoLit offers a curation environment, Study Inspector. In this module, flexible filters are available on all underlying studies (e.g. “Included,” “Tagged as RCT,” “Published after 01/01/2024”). Users can then view filtered results and edit individual records or complete “Bulk Actions” such as bulk inclusion, PDF upload, or Tagging. From Inspector, users can also employ the Search Exploration tools on filtered records, receiving a frequency analysis of PICOs, study type, location, size, date, acronyms, and Topics (see above).
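Filters such as the examples above can be modeled as composable predicates. This sketch assumes that multiple filters combine with AND and uses a hypothetical record layout; it is not AutoLit's filter implementation.

```python
from datetime import date

def included():
    return lambda r: r["status"] == "included"

def tagged_as(tag):
    return lambda r: tag in r["tags"]

def published_after(day):
    return lambda r: r["published"] > day

def apply_filters(records, *predicates):
    """Keep records satisfying every filter (assumed AND combination)."""
    return [r for r in records if all(p(r) for p in predicates)]

records = [
    {"id": 1, "status": "included", "tags": {"RCT"}, "published": date(2024, 3, 1)},
    {"id": 2, "status": "included", "tags": {"Observational"}, "published": date(2024, 5, 1)},
    {"id": 3, "status": "excluded", "tags": {"RCT"}, "published": date(2024, 6, 1)},
    {"id": 4, "status": "included", "tags": {"RCT"}, "published": date(2023, 12, 1)},
]
hits = apply_filters(records, included(), tagged_as("RCT"), published_after(date(2024, 1, 1)))
```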
9. Export and Reporting
9.1. Export
All data can be exported via the Download function from the Inspector page: metadata (in CSV or RIS files), annotated PDFs, Screening Decisions, Tags, Meta‐analytical Data, or Critical Appraisal are all available for rapid export. In addition, users can construct custom Excel/CSV exports containing any combination of metadata, Tags, Interventions, Data Elements, and Appraisal findings, as applicable.
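A custom export of this kind amounts to selecting columns and flattening multi‐valued fields. A minimal sketch using Python's standard `csv` module, assuming records are dictionaries and that multiple Tags are joined with "; " (a hypothetical convention):

```python
import csv
import io

def custom_export(records, columns):
    """Write the chosen columns to CSV text; list values (e.g. several Tags) are joined with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column, "")
            row[column] = "; ".join(value) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()

csv_text = custom_export(
    [{"title": "Trial A", "tags": ["RCT", "Oncology"], "doi": "10.1000/a"}],
    ["title", "tags"],
)
```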
9.2. Qualitative Synthesis
The AutoLit module is connected to an accompanying Synthesis module, which allows users to draft Abstracts and Manuscripts, build customizable Dashboards of tables, images, and text, and drill down on findings. Note that no automated writing systems are integrated into the system; all written findings must be drafted by the user. Automatically‐generated visuals are available for Tagging in the Qualitative Synthesis module, which provides an interactive Sunburst diagram for filtering, ‘zooming in’, and reading qualitative evidence. Synthesis also displays users’ Search Strategies and generates an updatable/living PRISMA 2020 Flow Diagram [24], ensuring full traceability of the searches, records, and findings for any review performed in AutoLit.
9.3. Insights
Users may also draft Insights, which connect traceable evidentiary statements to an overall conclusion or finding. After extracting all relevant Tags with textual or table extractions, the user drafts findings in a text box, and in the Qualitative Synthesis environment, identifies the Tags and studies this Insight is drawn from. When finalized, the Insight becomes clickable to the reader of Qualitative Synthesis, who can view the textual Insight, with direct highlights of relevant tags and a list of Associated Studies. When selected, these Associated Studies display the Tags and textual or table extractions supporting this Insight, providing full traceability of the findings. To date, all Insight drafting is manual.
9.4. Quantitative Synthesis
Then, Quantitative Synthesis provides interactive analytics on all data extracted in Meta‐analytical Extraction. This module presents Summary statistics by Intervention and performs and presents Network Meta‐analysis (NMA). NMA is performed in Quantitative Synthesis using an open‐source model, shukra, a fork of the R “meta” package, for Random‐ or Fixed‐Effects Frequentist analysis [32]. In short, the shukra model is run directly on the Interventions and Data Elements extracted in the Meta‐analytical (Quantitative) Extraction module, using continuous and dichotomous variables together with their timepoints, units, and total population numbers. Indirect treatment comparisons are generated between all Interventions with sufficient connections in the Network; the network itself is constructed by the user or by Smart Meta‐analytical Extraction when configuring Interventions. Based on these connections and the values extracted in the Meta‐analytical module, shukra generates each SLR's Network Diagram, Forest Plots, Funnel Plots, I‐squared calculations, SUCRA Rankings, and Odds Ratios (Figure 5). Note that 95% Confidence Intervals are generated and visualized for all supported variable types, for both I‐squared and Odds Ratio values, and at the study level within all Forest Plots.
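To make the reported statistics concrete, the sketch below computes study‐level log odds ratios, a fixed‐effect inverse‐variance pooled odds ratio with a 95% CI, and I² = max(0, (Q − df)/Q) for a simple pairwise comparison. This is a textbook illustration, not shukra or the R "meta" package: it performs no indirect (network) comparisons or random‐effects estimation, and meta's default estimators for binary outcomes may differ. The 2×2 input layout and the 0.5 continuity correction are assumptions.

```python
import math

def log_odds_ratio(a, b, c, d):
    """2x2 counts: a/b = events/non-events in treatment, c/d = events/non-events in control."""
    if 0 in (a, b, c, d):  # Haldane-Anscombe correction: add 0.5 to every cell
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def fixed_effect_pool(studies, z=1.959964):
    """Inverse-variance pooled OR with 95% CI, and I-squared from Cochran's Q."""
    effects = [log_odds_ratio(*s) for s in studies]
    weights = [1 / se ** 2 for _, se in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    ci = (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se))
    return math.exp(pooled), ci, i2

pooled_or, (ci_low, ci_high), i2 = fixed_effect_pool([(10, 90, 20, 80), (30, 70, 10, 90)])
```

The two hypothetical studies point in opposite directions, so Q far exceeds its degrees of freedom and I² is high.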
Figure 5.

Forest Plot from automated NMA of data extracted in the Meta‐analytical Module in AutoLit. Forest plots, Funnel plots, I‐squared calculations, SUCRA Rankings, and Odds Ratios with 95% Confidence Intervals, as well as the Network Diagram for hierarchical meta‐analysis, are automatically generated in the Quantitative Synthesis module.
Synthesis pages are link‐ and QR‐code shareable, and enable rapid download of data from underlying studies. Notably, Synthesis pages may also be embedded by iFrame, enabling interactive materials to be attached to publications as demonstrated by review authors such as Barbosa et al. [33].
9.5. Rapid Review Approaches
While SLRs require adherence to best practices in oversight [3], AutoLit enables a wide range of rapid review types, including Targeted Literature Reviews, Scoping Reviews, Narrative Review support, and Gap Analyses [6]; each review type may involve a different level of human oversight/curation of outputs. The methods for AI‐driven rapid reviews (with flexible levels of curation enabled, but not required) are documented online [34], including time savings estimates. These rapid review methods in AutoLit, ranging from Single Screening mode to fully AI‐driven extraction with ASTs, enable same‐day delivery of AI Search, Screening, and Qualitative and Quantitative Extraction, though with limits to comprehensiveness and curation/quality.
9.6. Methods Summary
AutoLit enables each step in an SLR in a single software solution, with AI automations for Search, Screening, and Qualitative and Quantitative Extraction. The automated tools are deployed in a manner that enables human‐in‐the‐loop curation of all AI‐generated or extracted contents, ensuring that SLR practitioners may follow best practices identified in published guidance and HTA position statements while also employing AI assistance to accelerate critical review stages.
10. Results
Validation tests have been performed on Smart Search, Robot Screener, Adaptive Smart Tags, and Core Smart Tags; no validation has been performed to date on Smart Meta‐analytical Extraction. All Validation results are summarized in Table 1.
Table 1.
Summary of Validation Tests.
| Review stage | AI tool in AutoLit | Gold standard data set | AI performance in validation test |
|---|---|---|---|
| Search strategy construction | Smart Search (run on PubMed) | Validation 1: 10 Cochrane reviews | 76.8% Recall |
| | | Validation 2: 19 reviews from a previous validation set | 79.6% Recall |
| Abstract screening | Robot Screener (as reviewer in Dual Mode) | Internal Validation: human reviewers in 19 reviews in published validation set (containing 8,580 Advanced records) | 97.1% Recall (vs. 94.4% for humans), 47.3% Precision (vs. 86.4% for humans) |
| | | External Validation: human reviewers in three clinical and three economic reviews (total of 8,729 records) | 82% Recall (vs. 75% for dual experts), 50% Precision (vs. 85% for dual experts) |
| Extracting evidence | Adaptive Smart Tags (Version 1) | Expert annotations in qualitative review | 68.4% Accuracy |
| | Core Smart Tags (on abstracts) | Study Type: 1,000 hand‐annotated PubMed results; Study Location: ClinicalTrials.gov locations for NCT‐linked studies; Study Size and Populations, Interventions/Comparators, and Outcomes (PICOs): open‐source EBM‐NLP PICO Corpus | Study Type: F1 of 0.74 and Accuracy of 74%; Study Location: Accuracy of 78%, Recall of 79%, Precision of 90%; Study Size: Accuracy of 91%; PICOs extraction: F1 of 0.74 |
10.1. Smart Search Validation Results
For the Smart Search Validation [17], the following gold standard Cochrane review topics were selected: multiple sclerosis, non‐small cell lung cancer, renal cell carcinoma, subfertility, nonalcoholic fatty liver disease, epilepsy, human immunodeficiency virus treatments, statins, ischemic conditioning, and prostate cancer. Smart Search had Recall of 76.8% in finding included records from Cochrane reviews; for comparison, GPT 4.0 had 13.0% Recall. Smart Search had 1.81% Precision; the expected Precision in expert‐drafted searches ranges from 1% to 5%, with the largest study to date by Wang et al. finding Precision of 5.3% [22].
When compared against nineteen diverse gold standard reviews from earlier validation studies of Nested Knowledge, Smart Search had Recall of 79.6%. However, in this set, Precision was lower, with 0.47% of results representing records included in the gold standard reviews [17].
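The Recall and Precision figures above compare the records retrieved by a search with the records included in a gold‐standard review. A minimal sketch, assuming records are matched by a shared identifier such as a PubMed ID (the validation's actual matching procedure is not described here); `search_metrics` is a hypothetical helper. The reciprocal of Precision gives the average number of records screened per includable record: roughly 55 at 1.81% Precision and roughly 213 at 0.47%.

```python
import math
from typing import Hashable, Set


def search_metrics(retrieved: Set[Hashable], gold_included: Set[Hashable]) -> dict:
    """Recall and Precision of a search against a gold-standard review's included records."""
    hits = retrieved & gold_included
    recall = len(hits) / len(gold_included) if gold_included else float("nan")
    precision = len(hits) / len(retrieved) if retrieved else float("nan")
    # Records that must be screened, on average, to find one includable record
    per_include = 1 / precision if precision > 0 else math.inf
    return {"recall": recall, "precision": precision, "records_per_include": per_include}


if __name__ == "__main__":
    # Hypothetical identifiers: 100 retrieved records, 5 gold-standard includes
    print(search_metrics(set(range(1, 101)), {1, 2, 3, 4, 200}))
```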
10.2. Robot Screener Validation Results
10.2.1. Internal Validation
In an Internal Validation against nineteen reviews performed using Robot Screener, which contained 8,927 total ‘Advanced’ records, human reviewers correctly Advanced 8,097/8,580 records, for a Recall of 94.4% and a Precision of 86.4% [18]. After training on initial human decisions, Robot Screener correctly Advanced 5,791/5,965 records, for a Recall of 97.1% and a Precision of 47.3%. In two‐sided chi‐squared analyses, Robot Screener had significantly higher Recall than humans (p < .001) but significantly lower Precision (p < .001). In effect, the AI found more includable records but was also more likely than humans to advance excludable records.
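The Recall comparison above can be framed as a 2×2 table of includable records that were Advanced versus missed. A minimal sketch of Pearson's chi‐squared test without continuity correction, using the counts reported above and treating the human and AI decision sets as independent samples; whether the published analysis used the same setup is an assumption, and `chi2_2x2` is a hypothetical helper. For one degree of freedom, the upper‐tail probability equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Rows: humans, Robot Screener; columns: correctly Advanced, missed
    humans = (8097, 8580 - 8097)
    robot = (5791, 5965 - 5791)
    print(chi2_2x2(*humans, *robot))
```

With these counts the statistic is roughly 60, far beyond the p < .001 threshold.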
10.2.2. External Validation
Cichewicz et al. validated Robot Screener in updates of six reviews; after fully expert Dual screening of 80% of records, Robot Screener was tested as a second screener on the final 20% of records [27]. Three clinical reviews containing 3,194 records (testing set = 640) and three economic reviews containing 8,729 records (testing set = 1,729) were used. Mean Robot Screener Recall was 82% (Standard Deviation: 15%) compared to Dual Expert Recall of 75% (23%; p = 0.59). Mean Robot Screener Precision of 50% (15%) was significantly lower than Dual Experts' Precision of 85% (16%; p = 0.008).
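The external validation design above (expert Dual screening on the first 80% of records, AI testing on the final 20%, then mean and standard deviation across reviews) can be sketched as follows. The split assumes records are already in the order used by the study, per‐review metrics are computed against expert decisions, and all function names are hypothetical; the summary uses the sample standard deviation, which may differ from the study's choice.

```python
import statistics
from typing import List, Sequence, Tuple


def holdout_split(records: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """First train_fraction of the (already ordered) records vs. the remainder."""
    cut = round(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


def recall(predicted: List[bool], gold: List[bool]) -> float:
    """Share of expert-included records that the screener also advanced."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    positives = sum(1 for g in gold if g)
    return tp / positives if positives else float("nan")


def precision(predicted: List[bool], gold: List[bool]) -> float:
    """Share of advanced records that the experts also included."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    advanced = sum(1 for p in predicted if p)
    return tp / advanced if advanced else float("nan")


def summarize(per_review: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric across reviews."""
    return statistics.mean(per_review), statistics.stdev(per_review)
```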
10.2.3. Screening Time Savings
Time savings for Robot Screener in Dual Screening compared with manual screening have been reported previously: 46% in simulation [35] and 50% in practice during Abstract screening [36].
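One way to reason about Dual Screening savings of about 50% is to count human decisions. This is a hypothetical effort model, not the method used in the cited reports. It assumes every screening decision takes equal time, the AI replaces the second human reviewer, records used to train the model are still screened by two humans, and each conflict adds one extra human adjudication.

```python
def dual_screening_savings(n_records: int, adjudications: int, training_records: int = 0) -> float:
    """Fraction of human screening decisions saved by an AI second reviewer.

    Hypothetical effort model with equal time per decision:
      manual Dual Screening = 2 human decisions per record;
      AI-assisted = 2 decisions per training record,
                    1 decision per remaining record,
                    + 1 extra decision per adjudicated conflict.
    """
    if not 0 <= training_records <= n_records:
        raise ValueError("training_records must be between 0 and n_records")
    manual = 2 * n_records
    assisted = 2 * training_records + (n_records - training_records) + adjudications
    return 1 - assisted / manual


if __name__ == "__main__":
    print(dual_screening_savings(n_records=2000, adjudications=0))  # 0.5
```

With no training overhead and no conflicts, the model gives exactly 50%; any adjudication or training workload lowers the savings below that ceiling.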
10.3. Tagging Validation Results
10.3.1. Core Smart Tags
CST performance has been reported [19]: for study type, F1 was 0.74 and Accuracy was 74%, though the model achieved 96% Recall for finding RCTs. For study location, Accuracy was 78%, Recall was 79%, and Precision was 90%. Study size Accuracy was 91%. PICOs extraction and hierarchy building had an F1 score of 0.74.
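F1 is the harmonic mean of Precision and Recall, so the Study Location figures above imply an F1 of about 0.84; this is a derived value, not one reported in the validation. A minimal sketch; `f1_score` and `f1_from_counts` are hypothetical helper names.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of Precision and Recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent form from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Study Location: Precision 90%, Recall 79%
    print(round(f1_score(0.90, 0.79), 3))  # 0.841
```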
10.3.2. Adaptive Smart Tags
Version 1 of ASTs was validated in 2024, with 68.4% overlap between ASTs and expert annotations [30]. However, no formal validation study has been undertaken since the launch of Version 2 of ASTs, which included upgraded LLMs and enabled extraction of more content types.
In an internal prelaunch validation study performed on three clinical reviews, one review of methods, a regulatory review, and a real‐world evidence design review, ASTs Version 2 achieved 76% Recall and 69% Precision against uncurated Full‐Text expert annotations. However, these results should be considered preliminary given the small number of reviews and nature of the expert extractions.
10.3.3. Smart MA Extraction
Smart MA Extraction was launched on May 12, 2025, and to date, no postlaunch validation studies have been performed. To ensure data accuracy, best practice is for an expert to review all Interventions and Data Elements configured and/or extracted by the model before they are used in meta‐analysis.
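The curation step above can be enforced as a gate before extracted values reach meta‐analysis. A minimal sketch, assuming a hypothetical schema for one arm‐level dichotomous Data Element (`DichotomousElement`) and a few plausibility checks chosen for illustration. It is not AutoLit's data model, and automated checks only flag values for expert review; they do not replace it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DichotomousElement:
    """One extracted arm-level dichotomous Data Element (hypothetical schema)."""
    study: str
    intervention: str
    timepoint: Optional[str]
    events: int
    total: int
    curated: bool = False  # set True only after an expert has reviewed the value


def review_issues(element: DichotomousElement) -> List[str]:
    """Return plausibility problems that an expert should resolve."""
    issues = []
    if element.total <= 0:
        issues.append("non-positive total")
    if not 0 <= element.events <= element.total:
        issues.append("events outside 0..total")
    if not element.timepoint:
        issues.append("missing timepoint")
    if not element.curated:
        issues.append("not yet curated by an expert")
    return issues


def ready_for_meta_analysis(elements: List[DichotomousElement]) -> List[DichotomousElement]:
    """Keep only elements that are expert-curated and pass every check."""
    return [e for e in elements if not review_issues(e)]
```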
10.3.4. Tagging Time Savings
While curation of AI findings is the best practice for any SLR, AI‐assisted processes can vastly improve the efficiency of qualitative extraction, with 70%–80% time savings reported in independent testing of the Tagging module [36].
11. Discussion
The use of review‐specific AI tools in SLRs has significant implications for the practice, efficiency, and quality of evidence synthesis. As described above, the capability to perform each key stage of review with AI assistance, in the context of human‐in‐the‐loop workflows such as AutoLit, provides both the immediate implementation of AI‐driven review and an approach to ensure that the final review deliverables match the quality of manual SLRs. Furthermore, we present validations for searching, Screening, and Qualitative Extraction automations, as well as time savings of 50%–80% for the most work‐intensive portions of the review process. Lastly, the system presented here enables users to conduct ‘living’, updatable reviews and automated Synthesis of qualitative tags, quantitative evidence and NMA, although no AI writing or interpretation is currently offered. This framework follows established best practices [3, 4], providing a replicable workflow with transparent outputs for all SLRs.
The workflow and AI models reported here overcome a widely‐recognized unmet need. Reviews have been estimated to cost over $141,000 in labor [37] and require 42 weeks of effort for teams of 7 [38]; in addition, work‐intensive review steps were identified as the primary blocker to living HTA [39]. An assessment of published SLRs found that only 10%–17% of oncological treatments were covered by an up‐to‐date review [40], and especially as the publication of primary evidence accelerates, this problem can only be addressed by introducing efficiencies that enable faster delivery while maintaining quality, reproducibility, and transparency in methods. Furthermore, to bridge evidence gaps, rapid/scoping reviews that are less systematic in structure have been called for [7]. The flexibility of AutoLit's workflow to offer systematic review AI tools with full expert workflows, but with the option to skip full curation steps and adopt AI findings in the context of more rapid review types, enhances the value of the methods disclosed here for evidence synthesis generally.
When determining which features are manual and which can be performed with AI or hybrid manual‐plus‐AI tools, a useful reference is a 2022 comparison that identified thirty key SLR features from four previous feature analyses, covering Retrieval, Appraisal, Extraction, Documentation, Administration, and Access and Support [41]. As of 2025, while the majority of listed features involve or require manual input, AutoLit and Synthesis together offer all thirty features, satisfying published SLR feature demands in a single platform.
Ultimately, compliance with publishing guidelines, journal rules, and guidance from HTA bodies such as the UK's NICE is a critical consideration in the use of AI in evidence synthesis. Notably, NICE's guidance [2] identifies three pillars of responsible AI use: (1) Human‐in‐the‐loop expert oversight: “Any use of AI methods should be based on the principle of augmentation, not replacement, of human involvement (i.e., having a capable and informed human in the loop);” (2) Methodological transparency: “When AI is used, the submitting organisation and authors should clearly declare its use, explain the choice of method and report how it was used, including human input,” as well as recommending the PALISADE and other AI checklists; and (3) Replicable practices: “[Submitters should] consider how these methods can be accessibly presented, including appropriate referencing… When available, consider using tools to support the explainability of AI methods and increase transparency of their application.” [2] Nested Knowledge provides human‐in‐the‐loop methods that enable researchers to follow NICE guidelines and ensure full expert curation for any reviews that may be submitted to bodies that do not yet have AI guidance. These practices ensure that AI tools used in evidence synthesis meet the necessary ethical, methodological, and regulatory requirements.
While the principal advantage of AI‐assisted SLR systems is the reduction in time required for tasks that demand extraordinary time contributions from experts, it is important to note that AI tools have inherent limitations and are not a replacement for expert judgment. Rather, they act as assistive tools, streamlining workflows and enabling researchers to focus on more complex tasks, such as analysis and interpretation [2]. For that reason, there is greater skepticism about AI writing and interpretation, and until validation is available, all final interpretations of review data should be performed by trained experts. Beyond AI writing, additional validation opportunities also exist within AutoLit's current modules; note that no validation of Deduplication, Free full‐text sourcing, Version 2 of ASTs, or Smart Meta‐analytical Extraction has been published to date, and certain steps other than writing, such as Critical Appraisal, are not yet automated in AutoLit.
Moving forward, there are opportunities for further advancement of tools, methods, and best practices in AI‐assisted SLR. Researchers should prioritize transparency in reporting use of AI models, and developers should provide validations of AI tools for the specific SLR steps they address. However, validation studies alone cannot demonstrate that model performance generalizes, so users should employ curation and oversight mechanisms to ensure quality. Furthermore, collaborations on AI SLR tools between academia, industry, publishers, HTA bodies, and other stakeholders can help refine these practices and ensure their compliance with emerging standards for AI in evidence synthesis.
The methods reported here represent the first full SLR system for performance of human‐in‐the‐loop AI reviews, with validation against expert practices and time savings shown in addition to the methods that can be used to perform rapid and systematic reviews. Further development may enable AI assistance with stages like Critical Appraisal and Interpretation, with emerging tools held to the best practice standards in reproducible SLRs.
All methods outlined in this publication are detailed in full in the AutoLit online Documentation. All validation studies and statistics are available in the Documentation or upon request to the authors.
Author Contributions
Kevin M. Kallmes: conceptualization, investigation, writing – original draft, methodology, validation, writing – review and editing, formal analysis, data curation, supervision, project administration. Jade Thurnham: conceptualization, investigation, writing – original draft, methodology, validation, visualization, writing – review and editing, formal analysis, data curation. Marius Sauca: conceptualization, investigation, writing – original draft, writing – review and editing, methodology, formal analysis. Ranita Tarchand: conceptualization, investigation, writing – original draft, writing – review and editing, methodology, formal analysis. Keith R. Kallmes: conceptualization, investigation, writing – original draft, writing – review and editing, methodology, formal analysis. Karl J. Holub: conceptualization, investigation, writing – original draft, writing – review and editing, methodology, validation, visualization, software, formal analysis, data curation, supervision.
Peer Review
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/cesm.70059.
Acknowledgments
We acknowledge the support of the full Nested Knowledge team and organization in the creation of this tool and the drafting of this publication. No specific funding was provided for this publication, but all authors have worked on both the tools and disclosure of methods and validations discussed here.
Kallmes K. M., Thurnham J., Sauca M., Tarchand R., Kallmes K. R., and Holub K. J., “Human‐in‐the‐Loop Artificial Intelligence System for Systematic Literature Review: Methods and Validations for the AutoLit Review Software,” Cochrane Evidence Synthesis and Methods 3 (2025): 1‐13, 10.1002/cesm.70059.
Data Availability Statement
The data that support the findings of this study are openly available in Nested Knowledge Documentation at https://about.nested-knowledge.com/docs/autolit/.
References
- 1. Ge L., Agrawal R., Singer M., et al., “Leveraging Artificial Intelligence to Enhance Systematic Reviews in Health Research: Advanced Tools and Challenges,” Systematic Reviews 13 (2024): 269, 10.1186/s13643-024-02682-2.
- 2. Lieberum J. L., Toews M., Metzendorf M. I., et al., “Large Language Models for Conducting Systematic Reviews: on the Rise, but Not yet Ready for Use‐A Scoping Review,” Journal of Clinical Epidemiology 181 (2025): 111746, 10.1016/j.jclinepi.2025.111746.
- 3. Amann J., Blasimme A., Vayena E., Frey D., and Madai V. I., “Explainability for Artificial Intelligence in Healthcare: A Multidisciplinary Perspective,” BMC Medical Informatics and Decision Making 20, no. 1 (2020): 310, 10.1186/s12911-020-01332-6.
- 4. National Institute for Health and Care Excellence (NICE). Use of AI in Evidence Generation: NICE Position Statement. NICE. Published October 2023, accessed May 29, 2025, https://www.nice.org.uk/about/what-we-do/our-research-work/use-of-ai-in-evidence-generation--nice-position-statement.
- 5. Rycroft C. E., Fernandez M., and Copley‐Merriman C., “Systematic Literature Reviews at the Heart of Health Technology Assessment: A Comparison Across Markets,” Value in Health 16, no. 7 (2013): A481, 10.1016/j.jval.2013.08.1236.
- 6. Guidance on Review Types. Nested Knowledge, accessed May 29, 2025, https://about.nested-knowledge.com/docs/guidance-on-review-types/.
- 7. Thurnham J., Holub K., Johnson J., et al. AutoLit Documentation. Nested Knowledge. Last edited May 15, 2025, accessed May 29, 2025, https://about.nested-knowledge.com/docs/autolit/.
- 8. Twaites J., Holub K., Johnson J., et al. Model Cards. Nested Knowledge. Last edited January 17, 2025, accessed May 29, 2025, https://about.nested-knowledge.com/docs-category/model-cards/.
- 9. Padula W. V., Kreif N., Vanness D. J., et al., “Machine Learning Methods in Health Economics and Outcomes Research—The Palisade Checklist: A Good Practices Report of an ISPOR Task Force,” Value in Health 25, no. 7 (2022): 1063–1080.
- 10. Crossref. Crossref Metadata Search. Crossref. Published 2023, accessed May 30, 2025, https://search.crossref.org.
- 11. Tkaczyk D., Szostek P., Fedoryszak M., Dendek P. J., and Bolikowski Ł., “Cermine: Automatic Extraction of Structured Metadata From Scientific Literature,” International Journal on Document Analysis and Recognition (IJDAR) 18, no. 4 (2015): 317–335, 10.1007/s10032-015-0249-8.
- 12. Elliott J. H., Synnot A., Turner T., et al., “Living Systematic Review: 1. Introduction—The Why, What, When, and How,” Journal of Clinical Epidemiology 91 (2017): 23–30, 10.1016/j.jclinepi.2017.08.010.
- 13. Cochrane Training. Artificial Intelligence Technologies in Cochrane. Cochrane. Published January 18, 2023, accessed May 29, 2025, https://training.cochrane.org/resource/artificial-intelligence-technologies-in-cochrane/.
- 14. ISPOR. Revolutionizing Systematic Reviews: Harnessing the Power of AI. ISPOR. Published May 20, 2024, accessed May 29, 2025, https://www.ispor.org/education-training/webinars/webinar/revolutionizing-systematic-reviews--harnessing-the-power-of-ai/.
- 15. Fleurence R., Bian J., Wang X., et al. Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations. arXiv. Published July 9, 2024, accessed May 29, 2025, https://arxiv.org/abs/2407.11054.
- 16. National Institute for Health and Care Excellence (NICE). Developing NICE Guidelines: The Manual (PMG20). NICE. Published October 31, 2014. Last updated May 29, 2024, accessed May 29, 2025, https://www.nice.org.uk/process/pmg20/.
- 17. Twaites J., Kallmes K. M., and Holub K. J., “Assessment of Reasoning Agents for Building Literature Search Strategies,” Value in Health 28, no. Suppl 1 (2025): P22.
- 18. Thurnham J., Kallmes K., and Holub K., “MSR91 Assessing Recall in Abstract Screening: Artificial Intelligence Vs. Human Reviewers,” Value in Health 27, no. 6 (2024): S277.
- 19. Twaites J., Kallmes K., and Holub K., “Repeatable Auto‐Extraction Frameworks in Clinical Systematic Literature Review: Validating a Multi‐Model Human‐In‐The‐Loop Artificial Intelligence System for Extracting Study PICOs, Location, Size, and Type,” Value in Health 28, no. Suppl 1 (2025): SA72.
- 20. Kanakarajan K., Kundumani B., and Sankarasubbu M., “BioELECTRA: Pretrained Biomedical Text Encoder Using Discriminators.” Proceedings of the 20th Workshop on Biomedical Language Processing (Association for Computational Linguistics, 2021), 143–154, 10.18653/v1/2021.bionlp-1.16.
- 21. Osiński S. and Weiss D., “Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework.” in Advances in Web Intelligence. Lecture Notes in Computer Science, vol. 3528, eds. Szczepaniak P. S., Kacprzyk J., and Niewiadomski A. (Berlin, Heidelberg: Springer, 2005), 439–444, 10.1007/11495772_68.
- 22. Wang Z., Nayfeh T., Tetzlaff J., O'Blenis P., and Murad M. H., “Error Rates of Human Reviewers During Abstract Screening in Systematic Reviews,” PLoS One 15, no. 1 (January 2020): e0227742, 10.1371/journal.pone.0227742.
- 23. Li T., Higgins J. P. T., and Deeks J. J. Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.5. Cochrane; 2024, accessed May 31, 2025, https://training.cochrane.org/handbook/current/chapter-05.
- 24. Page M. J., McKenzie J. E., Bossuyt P. M., et al., “The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews,” BMJ 372 (2021): n71, 10.1136/bmj.n71.
- 25. Chen T. and Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM; 2016:785–794, 10.1145/2939672.2939785.
- 26. Sauca M. and Holub K. J. AI for Screening: How Is It Used, and how can We Tell how Accurate It Is? Nested Knowledge, https://about.nested-knowledge.com/2024/08/02/ai-for-screening-how-is-it-used-and-how-can-we-tell-how-accurate-it-is/. Published August 2, 2024, accessed May 31, 2025.
- 27. Cichewicz A., Pande A., Borkowska K., Mittal L., Wittkopf P., and Slim M., “MSR22 Automating Systematic Literature Review (SLR) Updates: A Comparative Validation Study of Artificial Intelligence (AI) Versus Human Screeners,” Value in Health 27, no. 6 (2024): S263.
- 28. Kallmes K. M. Tag Tables, Explained. Nested Knowledge, https://about.nested-knowledge.com/2023/07/10/tag-tables-explained. Published July 10, 2023, accessed May 31, 2025.
- 29. Nye B., Li J. J., Patel R., et al. A Corpus With Multi‐level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics; 2018:197–207, 10.18653/v1/P18-1019.
- 30. Ciapponi A., Bardach A., Glujovsky D., and Tarchand R. The Efficiency of “Nested Knowledge” to Facilitate the Conduction of Systematic Reviews and Meta‐analyses. Poster Presented at: Cochrane Global Evidence Summit; September 4–6, 2024; Prague, Czech Republic, https://abstracts.cochrane.org/2024-prague-global-evidence-summit/efficiency-nested-knowledge-facilitate-conduction-systematic.
- 31. Shinyama Y. PDFMiner: Python PDF Parser and Analyzer. Released June 29, 2008, accessed May 31, 2025, https://pypi.org/project/pdfminer/.
- 32. Holub K. J. Shukra: Tools for Meta‐analytical Statistics, Including Network Meta‐analysis (NMA), in NodeJS. Version 3.2.4. Published May 16, 2023, accessed May 31, 2025, https://github.com/holub008/shukra.
- 33. Barbosa M. F., Canan A., Xi Y., et al., “Comparative Effectiveness of Coronary CT Angiography and Standard of Care for Evaluating Acute Chest Pain: A Living Systematic Review and Meta‐Analysis,” Radiology. Cardiothoracic Imaging 5, no. 4 (2023): e230022, 10.1148/ryct.230022.
- 34. Kallmes K. M. Rapid Reviews in Nested Knowledge. Nested Knowledge, https://about.nested-knowledge.com/docs/rapid-reviews-in-nested-knowledge/. Published March 19, 2025, accessed May 30, 2025.
- 35. Holub K. J. The Data Is In: Deciding When to Automate Screening in Your SLR. Nested Knowledge, https://about.nested-knowledge.com/2023/11/10/the-data-is-in-deciding-when-to-automate-screening-in-your-slr/. Published November 10, 2023, accessed May 30, 2025.
- 36. Grys M., Casciano R., and Pieniazek I., “Comparison of AI‐Enhanced Tools for Automating Scientific Literature Reviews,” Value in Health 28, no. Suppl 1 (2025): MSR56.
- 37. Michelson M. and Reuter K., “The Significant Cost of Systematic Reviews and Meta‐Analyses: A Call for Greater Involvement of Machine Learning to Assess the Promise of Clinical Trials,” Contemporary Clinical Trials Communications 16 (2019): 100443, 10.1016/j.conctc.2019.100443.
- 38. Borah R., Brown A. W., Capers P. L., and Kaiser K. A., “Analysis of the Time and Workers Needed to Conduct Systematic Reviews of Medical Interventions Using Data From the PROSPERO Registry,” BMJ Open 7, no. 2 (2017): e012545, 10.1136/bmjopen-2016-012545.
- 39. Thokala P., Srivastava T., Smith R., et al., “Living Health Technology Assessment: Issues, Challenges and Opportunities,” PharmacoEconomics 41, no. 3 (March 2023): 227–237, 10.1007/s40273-022-01229-4.
- 40. Créquit P., Trinquart L., Yavchitz A., and Ravaud P., “Wasted Research When Systematic Reviews Fail to Provide a Complete and Up‐To‐Date Evidence Synthesis: The Example of Lung Cancer,” BMC Medicine 14 (2016): 8, 10.1186/s12916-016-0555-0.
- 41. Cowie K., Rahmatullah A., Hardy N., Holub K., and Kallmes K., “Web‐Based Software Tools for Systematic Literature Review in Medicine: Systematic Search and Feature Analysis,” JMIR Medical Informatics 10, no. 5 (2022): e33219, 10.2196/33219.
