Cochrane Evidence Synthesis and Methods. 2025 Sep 27;3(6):e70050. doi: 10.1002/cesm.70050

Comparison of Elicit AI and Traditional Literature Searching in Evidence Syntheses Using Four Case Studies

Oscar Lau, Su Golder
PMCID: PMC12483133  PMID: 41035533

ABSTRACT

Background

Elicit AI aims to simplify and accelerate the systematic review process without compromising accuracy. However, research on Elicit's performance is limited.

Objectives

To determine whether Elicit AI is a viable tool for systematic literature searches and title/abstract screening stages.

Methods

We compared the included studies in four evidence syntheses with those identified using the subscription‐based version of Elicit (Elicit Pro) in Review mode. We calculated sensitivity and precision and observed patterns in the performance of Elicit.

Results

The sensitivity of Elicit was poor, averaging 39.5% (25.5–69.2%) compared to 94.5% (91.1–98.0%) in the original reviews. However, Elicit identified some included studies not identified by the original searches and had an average of 41.8% precision (35.6–46.2%) which was higher than the 7.55% average of the original reviews (0.65–14.7%).

Discussion

At the time of this evaluation, Elicit did not search with high enough sensitivity to replace traditional literature searching. However, the high precision of searching in Elicit could prove useful for preliminary searches, and the unique studies identified mean that Elicit can be used by researchers as a useful adjunct.

Conclusion

Whilst Elicit searches are currently not sensitive enough to replace traditional searching, Elicit is continually improving, and further evaluations should be undertaken as new developments take place.

Keywords: artificial intelligence (AI), evidence synthesis, literature searching, research methodology, systematic review

Summary

  • AI tools, such as Elicit, have been developed to improve the efficiency of evidence synthesis processes, including the identification of studies.

  • Across four case study evidence syntheses, Elicit searches had a sensitivity between 25.5% and 69.2% (39.5% average) and a precision between 35.6% and 46.2% (41.8% average).

  • Elicit identified some unique studies that met the inclusion criteria for each of the case study evidence syntheses.

  • Based on our results, Elicit is recommended as a supplementary search tool but is not sufficient to replace traditional search methods.

  • Elicit is constantly improving and developing its systems, so independent researchers should continue to evaluate its performance.

1. Background

Systematic reviews aim to answer a clear question by collating and synthesizing existing research on a specific topic [1, 2]. Searching multiple databases and other sources, and identifying, selecting, and appraising studies, are all part of the traditional methods of systematic reviews and other evidence syntheses; these methods rely on rigorous, time‐consuming, manual processes, often undertaken collaboratively by a team [1, 3, 4]. Elicit AI aims to simplify this process and speed it up by up to 80%, claiming to save 16 h per review [5]. Elicit is an AI‐powered research assistant originally developed by Ought, a nonprofit machine learning (ML) research lab based in the US [6]; it has since become a for‐profit company backed by venture capital investment [7]. The user poses a research question, and Elicit can then automate several key stages of the systematic review process: searching over 126 million papers, generating screening criteria, undertaking data extraction, and producing a first draft of the review in a very short time [8]. By contrast, traditional systematic reviews can take from 6 to 18 months [9]. Because policy makers such as NICE base their guidelines on systematic reviews, there is a trade‐off between quality and time [10]: while a systematic review is being completed, patients may be treated according to potentially out‐of‐date guidelines. Alongside speed and ease of use, Elicit claims to allow more evidence to be considered and to avoid human bias without compromising accuracy [11].

AI is projected to contribute $15.7 trillion to the global economy by 2030 [12], and in the UK alone the number of AI companies has increased by 600% over the last 10 years [13, 14]. As the use of AI in scientific research increases, its impact on research integrity becomes ever more important to consider. Between 2003 and 2024, the number of papers published in PubMed with the term ‘artificial intelligence’ increased by over 15 000 [15]. Of these papers, only three included both the terms ‘artificial intelligence’ and ‘research integrity’. This suggests that, although research integrity is a crucial consideration when using AI tools like Elicit, there is little literature on the topic. Elicit may be able to speed up the systematic review process considerably, but at what cost to the rigour and reliability of traditional methods? Recent evidence reinforces this concern. Clark et al. [16] conducted a systematic review of generative AI tools in evidence synthesis and concluded that they are generally “not yet ready to be used without human oversight.” Likewise, Lieberum et al. [17] carried out a scoping review of large language model workflows in 2024 and reached the same verdict. Crucially, however, neither review evaluated Elicit (both focused mainly on ChatGPT), leaving a clear gap that the present study aims to address.

After formulating an answerable question, the next part of the systematic review process is to create comprehensive searches of multiple databases and other sources. This step is key to the quality of any systematic review, and it is this stage of the review process that the present study evaluates by assessing the value of Elicit's literature search facility.

Our objective was to compare Elicit with traditional search methods in identifying the studies included in evidence syntheses.

2. Methods

We selected four evidence syntheses (two systematic reviews, one umbrella review, and one scoping review) in three areas: public health (two reviews), pharmacology, and surgical procedures. This allowed us to compare Elicit's performance across a range of subject areas, improving the generalizability of our results. Two of the evidence syntheses (the umbrella review and the scoping review) were conducted in 2024 by one of the authors, with the searches undertaken by an experienced information specialist. The other two were recent systematic reviews selected from the Cochrane Database of Systematic Reviews, as Cochrane Reviews tend to be of higher quality than those published in peer‐reviewed journals [18, 19, 20, 21, 22, 23]. All case studies were very recently published evidence syntheses, chosen so that they did not overlap with the GPT‐3 training period, thereby avoiding contamination.

Elicit uses GPT‐3 as its underlying large language model (LLM) and searches the Semantic Scholar database, which covers over 126 million papers, to find publications [6]. Semantic Scholar is an open access, AI‐based search engine that provides ranked citations. Elicit can also help formulate the research question and screen the identified articles against prespecified inclusion criteria, and it offers features to assist with the other key processes in systematic reviews.

For this evaluation, we used the Review feature in the subscription‐based version, Elicit Pro. Elicit is available in three tiers: Basic (free), Plus, and Pro. The Basic version is intended for students and casual exploration and includes access to core Elicit functions such as literature search and basic extraction. Plus and Pro are both subscription‐based, with Plus offering increased usage limits. However, the Review feature is only available in Elicit Pro, which provides the highest usage limits and is the most comprehensive tier. Elicit also offers three modes tailored to different research tasks: Find Papers mode for literature searching, Research Report mode for generating summaries from selected papers, and Review mode, which we chose because it is the dedicated workflow designed to support systematic reviews.

We translated the original research question of each evidence synthesis, based on its PICO elements, into an Elicit query. Elicit then retrieved the 500 most relevant studies for the query, based on their titles and abstracts, and generated automated screening criteria for these 500 studies. Some of the initial criteria did not align with those of the original review; Elicit makes it easy to edit the screening criteria, and doing so does not change the 500 potentially eligible studies. We manually adjusted the screening criteria where necessary to match the PICO elements and inclusion criteria of the original reviews, so that the criteria were aligned before Elicit carried out the screening step. Elicit makes screening transparent by providing a spreadsheet with a column for every inclusion criterion, showing whether each study meets it. It then generates a score for each study based on the number of criteria met, taking into account the importance of each criterion, and uses this score to determine whether the study is included, with the inclusion threshold set automatically by Elicit. Elicit stresses that, although it can fully automate the search, the user can at any point ‘manually override any disagreements’ [24]. Once Elicit had identified the eligible studies, we exported all 500 studies into Microsoft Excel and compared them with the studies included in the original review. We recorded three categories: studies that Elicit did not find, studies that Elicit found but excluded for not meeting the screening criteria, and studies that Elicit found and included. We emailed the authors of the original reviews a list of the studies identified by Elicit that had not been included in the original review and, with their help, assessed whether any of these studies met the original inclusion criteria, which would represent unique additional findings.
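
The comparison of Elicit's output against each review's included studies amounts to a few set operations. The sketch below is a minimal illustration under stated assumptions, not part of Elicit or of our actual workflow (which used Excel): it assumes every record carries a DOI, and the function names and example identifiers are hypothetical. Records without DOIs would, in practice, need matching on title.

```python
def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip a common resolver prefix so IDs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def classify(review_included, elicit_retrieved, elicit_included):
    """Split records into the categories recorded in this evaluation.

    review_included  -- DOIs of studies included in the original evidence synthesis
    elicit_retrieved -- DOIs of the (up to) 500 records Elicit retrieved
    elicit_included  -- DOIs Elicit judged to meet the screening criteria
    """
    review = {normalise_doi(d) for d in review_included}
    retrieved = {normalise_doi(d) for d in elicit_retrieved}
    included = {normalise_doi(d) for d in elicit_included}
    return {
        "not_found": review - retrieved,
        "found_but_excluded": (review & retrieved) - included,
        "found_and_included": review & included,
        # Not in the original review: candidates to check with the review authors
        "elicit_only": included - review,
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only
    result = classify(
        review_included=["10.1000/a", "10.1000/b", "10.1000/c"],
        elicit_retrieved=["https://doi.org/10.1000/A", "10.1000/b", "10.1000/x"],
        elicit_included=["10.1000/a", "10.1000/x"],
    )
    for category, dois in result.items():
        print(category, sorted(dois))
```

Normalising identifiers before comparing matters here because the same study can appear with differently formatted DOIs in an export and in a reference list.
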
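We calculated the sensitivity and precision of each search using the following formulae [25]:

Sensitivity (%) = (number of included records retrieved / total number of included records) × 100

Precision (%) = (number of included records retrieved / total number of records retrieved) × 100

The total number of included records comprises the studies included in the original review plus any additional eligible studies identified only by Elicit. For Elicit, the records retrieved are those Elicit judged to meet its screening criteria; for the traditional searches, they are the records screened at title and abstract stage.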
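
As a worked check of these formulae, the small functions below reproduce the Elicit figures reported for case study 1 in the Results. The function names are our own, and the counts are taken from the Results section rather than from any Elicit output format.

```python
def sensitivity(eligible_retrieved: int, total_eligible: int) -> float:
    """Sensitivity (%): share of all eligible studies that the search retrieved."""
    return eligible_retrieved / total_eligible * 100


def precision(eligible_retrieved: int, total_retrieved: int) -> float:
    """Precision (%): share of retrieved records that were eligible."""
    return eligible_retrieved / total_retrieved * 100


# Case study 1 (vaping harms), Elicit search, counts as reported in the Results:
#   14 reviews also in the original review + 2 eligible Elicit-only reviews = 16
#   56 originally included + 2 Elicit-only = 58 eligible reviews in total
#   38 records judged by Elicit to meet its screening criteria
print(f"Sensitivity: {sensitivity(16, 58):.1f}%")  # Sensitivity: 27.6%
print(f"Precision:   {precision(16, 38):.1f}%")    # Precision:   42.1%
```
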

2.1. Case Study 1: Vaping Harms

This was an umbrella review of the acute and long‐term harms of vaping in young people under the age of 25 [26]. The review authors searched the KSR Evidence database on OVID, with no language restrictions. They also conducted a separate search for umbrella reviews in MEDLINE, Embase, and PsycINFO, as this type of review is not indexed in the KSR Evidence database. The search was limited to publications from 2015 onwards, reflecting the shift in the type of e‐cigarette device used by young people at that time. The searches were last run on the 21st November 2024. In total, 381 records were screened at title and abstract stage, and 56 reviews were included in the final umbrella review. Thus, the precision of the searches was 56/381 (14.7%), with a Number Needed to Read of 7.

For Elicit, the question asked was “What systematic reviews exist that comprehensively document the short‐term and long‐term health risks associated with vaping in populations aged 10‐24 years?” We asked Elicit the question on the 24th February 2025.

2.2. Case Study 2: Physical Activity

This was a scoping review mapping the evidence from economic analyses on the cost‐effectiveness of population‐based interventions that could be funded or provided by local authorities in the UK to increase physical activity, in order to inform decision making at local and national level [27]. The search focused on economic literature from 2015 onwards and contained two main facets: physical activity public health interventions in the UK, and economic analysis as the study design. Fifteen databases were searched (MEDLINE, Embase, CEA Registry, EconLit, Research Papers in Economics (RePEc)‐IDEAS, Social Policy and Practice, Healthcare Management Information Consortium (HMIC), Social Care Online (up to 2022), Social Systems Evidence, ASSIA, SPORTDiscus, BNI, PsycINFO, DoPHER, and TRoPHI), supplemented by citation searching and browsing of key websites and organisations. In total, 4868 records were screened at title and abstract stage, and 50 studies were included in the final scoping review. Thus, the precision of the searches was 50/4868 (1.03%), with a Number Needed to Read of 97.

For Elicit, the question asked was “What is the cost‐effectiveness of population‐based interventions that could be funded or provided by local authorities in the UK to increase physical activity, based on economic analyses published since 2015.” We asked Elicit the question on the 11th March 2025.

2.3. Case Study 3: Fertility Treatment

This was a systematic review published in the Cochrane Library evaluating the effectiveness and safety of vasodilators in women undergoing fertility treatment [28]. The inclusion criteria were: population (women of any age undergoing fertility treatment); intervention (vasodilators administered via any route, with or without other agents); comparator (placebo, no treatment, or another active intervention); and primary outcomes (live birth or ongoing pregnancy, and side effects of vasodilators). The following electronic databases, trial registers, and websites were searched: the Cochrane Gynaecology and Fertility Group (CGF) Specialised Register of controlled trials; the Cochrane Central Register of Controlled Trials, via the Cochrane Register of Studies Online (CRSO); MEDLINE; Embase; PsycINFO; the Cumulative Index to Nursing and Allied Health Literature (CINAHL); Web of Knowledge; the Open System for Information on Grey Literature in Europe (OpenSIGLE); the Latin American and Caribbean Health Science Information Database (LILACS); clinical trial registries; and the reference lists of relevant articles. In total, 347 records were screened at title and abstract stage, with 48 publications representing 45 studies included in the final systematic review. Thus, the precision of the searches was 48/347 (13.8%), with a Number Needed to Read of 7.

For Elicit, the question asked was “Evaluate the effectiveness and safety of vasodilators in women undergoing fertility treatment. Outcomes including (endometrial thickness, adverse drug reactions, live births, multiple pregnancy, ectopic pregnancy, clinical pregnancy, and miscarriage.” We asked Elicit the question on the 7th March 2025.

2.4. Case Study 4: Breast Reconstruction

This was a systematic review published in the Cochrane Library assessing ‘the effects of implants vs autologous tissue flaps for postmastectomy breast reconstruction on women's quality of life, satisfaction, and short‐ and long‐term surgical complications’ [29]. The inclusion criteria were women undergoing primary breast reconstruction after mastectomy for breast cancer treatment or risk reduction, and implant‐based reconstruction compared with any autologous‐tissue reconstruction. The primary outcomes were patient‐reported outcomes (BREAST‐Q, BRECON‐31, and EORTC QLQ BRECON‐23), short‐ and long‐term complications, or oncological outcomes. The Cochrane Breast Cancer Group's Specialised Register, CENTRAL, MEDLINE, Embase, and two trials registries were searched for included studies. In total, 6308 records were screened at title and abstract stage, with 41 publications representing 35 studies included in the final systematic review. Thus, the precision of the searches was 41/6308 (0.65%), with a Number Needed to Read of 154.
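
Each Number Needed to Read reported for the traditional searches is the number of records screened divided by the number of included publications, that is, the reciprocal of precision. The short sketch below reproduces the four reported values; the helper name is our own.

```python
def number_needed_to_read(records_screened: int, studies_included: int) -> int:
    """Records screened per included study (the reciprocal of precision), rounded."""
    return round(records_screened / studies_included)


# (records screened at title/abstract, publications included) per original review
traditional_searches = {
    "Case study 1: vaping harms": (381, 56),
    "Case study 2: physical activity": (4868, 50),
    "Case study 3: fertility treatment": (347, 48),
    "Case study 4: breast reconstruction": (6308, 41),
}

for name, (screened, included) in traditional_searches.items():
    print(f"{name}: precision {included / screened * 100:.2f}%, "
          f"NNR {number_needed_to_read(screened, included)}")
```
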

For Elicit, the question asked was “Assess the effects of implants vs. autologous tissue flaps for post mastectomy breast reconstruction on women's quality of life, satisfaction and short‐ and long‐term surgical complications.” We asked Elicit the question on the 6th March 2025.

3. Results

3.1. Case Study 1: Vaping Harms

Elicit judged 38 reviews to meet the screening criteria. Of these 38, 14 were already included in the original umbrella review, and 22 did not meet the inclusion criteria for the umbrella review: one was a scoping review [30], four were review protocols [31, 32, 33, 34], four were not systematic reviews [35, 36, 37, 38], six did not examine harms [39, 40, 41, 42, 43, 44], three did not analyse young people [45, 46, 47], one was a preprint [48] of a study already included in the umbrella review [49], and three were not available in English [50, 51, 52]. The remaining two reviews identified by Elicit [53, 54] met the inclusion criteria of the umbrella review and would have been included had they been identified. Neither was indexed in the KSR Evidence database at the time of searching; one was a report, and the other was not indexed in MEDLINE, Embase, or PsycINFO.

Whilst Elicit included 14 of the originally included reviews, it did not include the other 42 reviews included by Golder: 31 were not identified, and a further 11 were excluded for reasons such as ‘failure to meet health outcomes requirement’ [55, 56] and ‘uncertainty regarding specific age range’ [57, 58, 59, 60]. Of the 31 unidentified included studies, 26 (83.9%) were indexed in Semantic Scholar, and 18 of these 26 (69.2%) were open access (Figures 1, 2, 3, 4, 5, 6).

Figure 1. Included Studies in Case Study 1.

Figure 2. Included Studies in Case Study 2.

Figure 3. Included Studies in Case Study 3.

Figure 4. Included Studies in Case Study 4.

Figure 5. Sensitivity Comparison of Elicit and Traditional Review for Each Case Study.

Figure 6. Precision Comparison of Elicit and Traditional Review for Each Case Study.

The search in Elicit had a sensitivity of 27.6% (16/58) and a precision of 42.1% (16/38).

3.2. Case Study 2: Physical Activity

Elicit judged 30 studies to meet the screening criteria. Of these 30, 12 were also included in the original scoping review [27]. However, nine studies considered eligible by Elicit were published before 2015, even though the screening criteria explicitly required publication since 2015. A further eight studies identified by Elicit did not meet the inclusion criteria: four were protocols [61, 62, 63, 64], two were not evaluations of an intervention [65, 66], one did not report separate outcome data for physical activity [67], and one did not evaluate a specific intervention in terms of return on investment [68]. Additionally, Elicit found one study [69] that would have been included in the original review had it been identified by the traditional searches. This study was published in the Journal of Transport and Health, which was not indexed in most of the databases searched, such as MEDLINE or Embase; it was indexed in PsycINFO but was not retrieved because of the economic search filter used (although it was retrieved by the other facets of the search).

Elicit did not include 38 of the studies included by Golder: 27 were not identified, and a further 11 were excluded for reasons such as ‘limited by the individual level approach’ [70, 71, 72], ‘narrow employee population and workplace setting limit’ [73], ‘uncertainties around explicit UK context’ [74, 75, 76, 77], and ‘significant uncertainty regarding publication year’ [78, 79, 80]. Of the 27 unidentified included studies, 22 (81.5%) were indexed in Semantic Scholar, and all 22 (100%) of these were open access.

The search in Elicit had a sensitivity of 25.5% (13/51) and a precision of 43.3% (13/30).

3.3. Case Study 3: Fertility Treatment

Elicit judged 78 studies to meet the screening criteria. Of these 78, 32 were included in the original Cochrane Review. Similarly to case studies 1 and 2, Elicit identified four additional studies that would have been included had they been identified at the time of searching [81, 82, 83, 84]. Notably, none of these four RCTs found only by Elicit was indexed in MEDLINE or Embase. The remaining studies did not meet the inclusion criteria for various reasons, such as not being RCTs [85, 86].

Elicit did not include 16 of the RCTs included in the original Cochrane Review: 12 were not identified, and a further four were excluded for reasons such as ‘significant uncertainty exists regarding specific outcomes reported and the precise study design’ [87], ‘doesn't specifically breakdown types of pregnancy’ [88], ‘ambiguity about whether this fully aligns with the purposes of fertility treatment’ [89], and ‘doesn't explicitly report endometrial thickness, live births, adverse reactions, ectopic pregnancy, miscarriage or multiple pregnancy’ [90]. Of the 12 unidentified included studies, 7 (58.3%) were indexed in Semantic Scholar, and 4 of these 7 (57.1%) were open access.

The search in Elicit had a sensitivity of 69.2% (36/52) and a precision of 46.2% (36/78).

3.4. Case Study 4: Breast Reconstruction

Elicit judged 45 studies to meet the screening criteria. Of these 45, 12 were also included in the original Cochrane review. Additionally, Elicit identified four further studies [91, 92, 93, 94] that were not included in the original review but met its inclusion criteria and, in hindsight, would have been included.

Elicit did not include 29 of the studies included in the Cochrane review: 21 were not identified, and a further eight were excluded for reasons such as ‘absence of patient‐centred outcomes’ [95], ‘minor uncertainty regarding direct comparison between reconstruction techniques’ [96], ‘primarily focusing on costs and technical complications’ [97], ‘cross‐sectional study’ [98], ‘quality of life measures and patient satisfaction were not measured’ [99], ‘uncertainty regarding primary vs secondary reconstruction and patient reported measures’ [100], and ‘some population details remain implicit’ [101]. However, one of the eight excluded studies [102] met all the inclusion criteria but was not included because its Elicit score was not high enough (it scored 4.8, with the cut‐off set at greater than 4.8). Of the 21 unidentified included studies, all 21 (100%) were indexed in Semantic Scholar, and 7 (33.3%) of these were open access.

The search in Elicit had a sensitivity of 35.6% (16/45) and a precision of 35.6% (16/45).

Across all case studies, we observed that the AI‐driven selection of the 500 most relevant studies is neither as transparent nor as reproducible as a traditional database search strategy, such as one run in OVID MEDLINE (Table 1).

Table 1.

Studies identified by Elicit and review with traditional searching methods for each case study.

| Case study | Included by Elicit | Also included in traditional review | Not in traditional review, did not meet inclusion criteria | Not in traditional review, met inclusion criteria | Elicit precision | Elicit sensitivity | Traditional searching precision | Traditional searching sensitivity |
|---|---|---|---|---|---|---|---|---|
| Case Study 1: Vaping harms | 38 | 14 | 22 | 2 | 16/38 (42.1%) | 16/58 (27.6%) | 56/381 (14.7%) | 56/58 (96.6%) |
| Case Study 2: Physical activity | 30 | 12 | 17 (9 pre‐2015) | 1 | 13/30 (43.3%) | 13/51 (25.5%) | 50/4868 (1.03%) | 50/51 (98.0%) |
| Case Study 3: Fertility treatment | 78 | 32 | 42 | 4 | 36/78 (46.2%) | 36/52 (69.2%) | 48/347 (13.8%) | 48/52 (92.3%) |
| Case Study 4: Breast reconstruction | 45 | 12 | 33 | 4 | 16/45 (35.6%) | 16/45 (35.6%) | 41/6308 (0.65%) | 41/45 (91.1%) |
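
The averages quoted in the Abstract and Discussion (39.5%, 41.8%, 94.5%, and 7.55%) can be reproduced from Table 1 as unweighted means of the per‐review percentages. The sketch below shows this; the tuple layout and names are our own.

```python
from statistics import mean

# Counts per case study, taken from Table 1:
# (eligible studies included by Elicit, all records Elicit included,
#  all eligible studies, studies included by the original review,
#  records screened at title/abstract in the original review)
TABLE_1 = {
    "Vaping harms": (16, 38, 58, 56, 381),
    "Physical activity": (13, 30, 51, 50, 4868),
    "Fertility treatment": (36, 78, 52, 48, 347),
    "Breast reconstruction": (16, 45, 45, 41, 6308),
}


def summarise(table):
    """Unweighted mean, across case studies, of each per-review percentage."""
    rows = list(table.values())
    return {
        "elicit_sensitivity": mean([hit / elig * 100 for hit, _, elig, _, _ in rows]),
        "elicit_precision": mean([hit / inc * 100 for hit, inc, _, _, _ in rows]),
        "traditional_sensitivity": mean([orig / elig * 100 for _, _, elig, orig, _ in rows]),
        "traditional_precision": mean([orig / scr * 100 for _, _, _, orig, scr in rows]),
    }


for measure, value in summarise(TABLE_1).items():
    print(f"{measure}: {value:.2f}%")
```

Pooling the counts instead (for example 81/206, about 39.3%, for Elicit sensitivity) gives slightly different figures, so the averaging method is worth stating when comparing with other evaluations.
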

4. Discussion

The primary aim of this study was to assess Elicit's ability to identify studies that meet the inclusion criteria for an evidence synthesis. We hypothesized that Elicit might be able to assist with the literature search and screening stages of the evidence synthesis process and, with further improvements, might even replace traditional search methods. We also sought to determine whether Elicit is comparable to traditional searching in terms of recall (sensitivity) and precision.

In all four case studies, Elicit found and included eligible studies that the original authors had not identified. This may be because Elicit searches a wider set of records [11]: it uses Semantic Scholar, which indexes over 126 million articles. Combined with Elicit's speed and high precision, this makes it useful for preliminary searches, for example when costing a grant proposal, providing seed papers to improve traditional searches or test search strategies, or assessing the risk of an empty review. Elicit also provides details for each study and its reasoning for inclusion or exclusion during screening, an advantage over many other AI tools; this lets the user understand, to some extent, what Elicit is doing in real time and make adjustments, strengthening research integrity [15]. Although Elicit may not yet be developed enough to replace traditional search methods, its speed and precision make it a clear aid to the search process. Across the four evidence syntheses, Elicit's precision averaged 41.8%, compared with an average of 7.55% for the original reviews. With such high precision, Elicit could be particularly valuable at the costing and proposal stage of a review. In addition to its potential role in identifying seed articles, Elicit could also serve as a useful tool for supplementary searches once traditional searching and screening have been completed.

Little independent research on the performance of Elicit has been conducted to date. Fenske and Otts [103] conducted a descriptive study comparing Elicit with PubMed and CINAHL. They surveyed 323 graduate nursing students, with the primary outcome being which resource they preferred and their opinions on Elicit. As the study was conducted in the fall of 2023, they used a beta version of Elicit. Of the 26% of students who preferred Elicit, 38.8% listed user‐friendliness as a top strength. This aligns with our experience: Elicit is easy to learn, so it can readily be used as an auxiliary tool alongside a traditional literature search. As our case studies show, using Elicit as an adjunct may allow researchers to find additional studies that would otherwise not have been identified.

However, our study also revealed some key flaws in Elicit. In case study 2, Elicit deemed nine studies published before 2015 eligible, even though we had explicitly added a screening criterion to include only articles published since 2015. Even with manually adjusted screening criteria, therefore, studies can slip through the cracks at the screening stage. This undermines the benefit of being able to edit the criteria by hand and raises the question of how valuable such adjustments are if Elicit does not always apply them. The sensitivity of systematic review searches should reach at least 90% [104]; with an average sensitivity of 39.5%, Elicit cannot currently be used as a one‐stop shop for systematic review searches or other comprehensive reviews. Across all four case studies, an average of 80.9% of the unidentified included studies were indexed in Semantic Scholar, and 64.9% of these were open access. This suggests that the limited sensitivity is more likely due to Elicit's search process than to a lack of access to the relevant journals. Case study 3 [28] had by far the highest sensitivity, at 69.2%, possibly because the PICO structure and inclusion criteria of pharmacology reviews are more straightforward, well‐defined, and binary. Conversely, the two public health reviews had the lowest sensitivities, highlighting potential limitations of Elicit when the PICO is more ambiguous and complex.
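
The 80.9% and 64.9% figures are unweighted means of the per‐case‐study proportions reported in the Results. The sketch below reproduces them from those counts; the variable names are our own.

```python
from statistics import mean

# Per case study: (unidentified included studies,
#                  of which indexed in Semantic Scholar,
#                  of which indexed and open access), as reported in the Results
unidentified = {
    "Vaping harms": (31, 26, 18),
    "Physical activity": (27, 22, 22),
    "Fertility treatment": (12, 7, 4),
    "Breast reconstruction": (21, 21, 7),
}

pct_indexed = [indexed / total * 100 for total, indexed, _ in unidentified.values()]
pct_open = [open_access / indexed * 100 for _, indexed, open_access in unidentified.values()]

print(f"Mean % indexed in Semantic Scholar: {mean(pct_indexed):.1f}")
print(f"Mean % open access (of indexed):    {mean(pct_open):.1f}")
```

Pooling the counts instead (76/91 indexed; 51/76 open access) gives about 83.5% and 67.1%, which does not change the interpretation.
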

Interestingly, in case study 2, Elicit retrieved a pair of true duplicates [105], with the same DOI and the same journal publication, but treated them as distinct studies rather than recognising them as duplicates. They were therefore scored separately and received two different scores: 4.1 (excluded) and 4.8 (included). This highlights possible inconsistencies in the screening process and raises the question of whether Elicit de‐duplicates reliably when a study appears in different sources.
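
Because the screening output can be exported, duplicates of this kind can be caught by grouping records on DOI before relying on Elicit's decisions. A minimal sketch, assuming the export has been loaded as a list of dicts with hypothetical keys `doi`, `score`, and `included` (these are not Elicit's actual column names); the DOIs shown are placeholders.

```python
from collections import defaultdict


def conflicting_duplicates(records):
    """Return DOIs that occur more than once with differing include/exclude decisions.

    Each record is a dict with the hypothetical keys 'doi', 'score' and 'included'.
    """
    by_doi = defaultdict(list)
    for record in records:
        if record.get("doi"):
            by_doi[record["doi"].strip().lower()].append(record)
    return {
        doi: group
        for doi, group in by_doi.items()
        if len(group) > 1 and len({r["included"] for r in group}) > 1
    }


# Hypothetical export rows mirroring the case study 2 pair (scores 4.1 and 4.8)
rows = [
    {"doi": "10.1000/example", "score": 4.1, "included": False},
    {"doi": "10.1000/EXAMPLE", "score": 4.8, "included": True},
    {"doi": "10.1000/other", "score": 5.0, "included": True},
]
for doi, group in conflicting_duplicates(rows).items():
    print(doi, [(r["score"], r["included"]) for r in group])
```
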

Furthermore, in case study 4, the Elicit score threshold (as mentioned above) was set at greater than 4.8, so [102], which scored 4.8, was excluded. However, several other studies, for example [106, 107, 108], also scored 4.8 but were included by Elicit. This highlights potential inconsistencies in Elicit's screening and suggests that human oversight remains essential when interpreting borderline cases. Further investigation of the scoring process may explain why studies with identical scores receive different outcomes and whether supplementary criteria affect inclusion.
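
A simple consistency check on exported scores can flag such borderline cases for human review. The sketch below applies the stated rule (include if score > threshold) and reports records whose recorded decision disagrees with it, plus any score value that received both decisions. The record layout and identifiers are hypothetical. We do not know whether Elicit rounds scores for display; if it does, scores that look identical may differ underneath, and this check would then only flag apparent ties.

```python
from collections import defaultdict


def check_threshold(records, threshold):
    """Flag decisions inconsistent with 'include if score > threshold'.

    Returns (records whose decision disagrees with the rule,
             sorted score values that received both decisions).
    """
    mismatched = [r for r in records if (r["score"] > threshold) != r["included"]]
    decisions = defaultdict(set)
    for r in records:
        decisions[r["score"]].add(r["included"])
    mixed_scores = sorted(s for s, d in decisions.items() if len(d) > 1)
    return mismatched, mixed_scores


# Hypothetical records mirroring case study 4 (cut-off: greater than 4.8)
records = [
    {"id": "A", "score": 4.8, "included": False},
    {"id": "B", "score": 4.8, "included": True},
    {"id": "C", "score": 5.2, "included": True},
    {"id": "D", "score": 3.9, "included": False},
]
mismatched, mixed = check_threshold(records, threshold=4.8)
print([r["id"] for r in mismatched], mixed)
```
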

A recently published study by Bernard et al. [109] evaluated the repeatability, reliability, and accuracy of Elicit in one case study. The authors searched Elicit on the 18th of April 2023 and had findings similar to ours. Elicit identified three unique studies that had not been included in the umbrella review but met its inclusion criteria. Elicit's ability to find studies that would otherwise have been missed is a strong argument for its use as an adjunct. However, as in our study, Bernard et al. found that Elicit's sensitivity was poor: it found only 3 of the 17 studies (17.6%) included in the umbrella review. Using the same method of calculation as Bernard et al. (that is, the proportion of the studies included in each original review that Elicit also included), Elicit found 25.0%, 24.0%, 66.7%, and 29.3% in our four case studies, respectively. Bernard et al. conducted their evaluation about two years before ours (April 2023), so the higher sensitivities in our study may reflect improvements to Elicit over time, although Elicit is still unable to replace a traditional literature search.
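
The two ways of computing sensitivity differ only in the reference set. Inferring from the reported figures (14/56 = 25.0% for case study 1, against 16/58 = 27.6% in Table 1), the Bernard et al.‐style figure appears to use only the studies included in the original review, whereas our figure also counts the eligible studies found solely by Elicit. A sketch with our own function names:

```python
# Per case study: (originally included studies that Elicit also included,
#                  studies included in the original review,
#                  additional eligible studies found only by Elicit)
CASES = {
    "Vaping harms": (14, 56, 2),
    "Physical activity": (12, 50, 1),
    "Fertility treatment": (32, 48, 4),
    "Breast reconstruction": (12, 41, 4),
}


def bernard_style(found, original, _extra):
    """Share (%) of the original review's included studies that Elicit also included."""
    return found / original * 100


def expanded_sensitivity(found, original, extra):
    """Sensitivity (%) against all eligible studies, counting Elicit-only finds."""
    return (found + extra) / (original + extra) * 100


for name, counts in CASES.items():
    print(f"{name}: {bernard_style(*counts):.1f}% vs {expanded_sensitivity(*counts):.1f}%")
```

Adding Elicit‐only finds raises both the numerator and the denominator, so the expanded figure is slightly higher here; the choice of reference set should be reported alongside any sensitivity value.
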

Building on both our study and that of Bernard et al., Tomczyk et al. [110] compared the search performance of Elicit with that of experienced researchers using Scopus and Web of Science, across nine research questions on marketing and e‐commerce. Their findings were consistent with ours: they concluded that Elicit demonstrated “great potential in surfacing rare literature that other means miss.” This suggests that the key question is not whether Elicit can replace traditional search methods, but how it can enhance systematic review processes by complementing conventional approaches.

Responsible AI in Evidence Synthesis (RAISE) is a framework that provides tailored recommendations to ‘ensure the responsible, transparent and ethical use of AI’ [111]. Two principles in the RAISE framework are particularly important for maintaining research integrity. The first is transparency. Although Elicit exposes the prompt, screening criteria, recommendations, and relevance scores, the underlying Semantic Scholar query and the version history of the pre‐edited criteria remain opaque, which limits full reproducibility. A fundamental requirement for systematic reviews and similar evidence syntheses, as set out in methodological guidance such as the Cochrane Handbook [1], the JBI Manual for Evidence Synthesis [112], and the PRISMA reporting guidelines [113], is that search strategies be fully transparent and reproducible. Because Elicit does not provide access to the complete search strategy, its search process cannot be replicated, which precludes its use as the primary search method for these types of evidence synthesis. The second principle is human oversight (also referred to as ‘human‐in‐the‐loop’). Bhaumik [114] emphasises a human‐centric mode of deployment in which the steps are AI‐enabled rather than fully automated. As AI systems lack moral agency, accountability rests with the people who use them. Human oversight is essential to ensure trust in both the evidence synthesis and its conclusions [111], and to guard against a potential responsibility gap if errors do occur. However, RAISE has no legal authority and so cannot guarantee compliance, which highlights the need for formal government regulation of AI tools to protect both patients and researchers. As Bhaumik [114] puts it, “technology, much like a cat, cannot be put back in the bag once it is out. Let us put a bell to ensure that the cat does not run wild.”

Our study has several limitations. It was retrospective, as we evaluated Elicit after the evidence syntheses had been carried out. Although we aimed to account for the difference in timing between the Elicit searches and the traditional searches, this was not always possible. Also, because Elicit does not keep a history of the pre‐edited criteria, we could not quantify the exact change in the number of eligible studies caused by the minor adjustments we made to the inclusion criteria. Additionally, Elicit is constantly updated, and "if you asked once before the update and once after, you might get two different answers from two different version of Elicit" [115]. This is a limitation of any evaluation of Elicit's performance, as results and conclusions may differ as new versions of Elicit are released.

Strengths of our study include the use of four reviews in different areas of health research, which enhanced the generalizability and robustness of our findings. We were also able to identify Elicit's strengths and limitations and to spot patterns in how Elicit performs across different topics. Furthermore, given the lack of prior research on Elicit for systematic searches, this study is important in quantifying the sensitivity and precision of Elicit, which can help in deciding whether Elicit is currently a viable research tool. Further research could include: (1) conducting sensitivity analyses using additional studies identified by Elicit, and (2) exploring whether there is any bias in the types of studies Elicit includes compared with those included by human reviewers.

5. Conclusions

This study highlights the high precision of the current version of Elicit AI, which makes it useful for writing grant proposals and for conducting scoping and preliminary searches. The unique studies identified by Elicit demonstrate that it can be a useful adjunct to traditional searching, but its low sensitivity across the four reviews means that it is not equipped to replace standard systematic searching. Further independent evaluations of Elicit are needed as it is updated and newer versions are released.

Author Contributions

Oscar Lau: investigation, formal analysis, writing – review and editing, writing – original draft. Su Golder: conceptualization, methodology, investigation, writing – original draft, writing – review and editing, supervision.

Ethics Statement

The authors have nothing to report.

Conflicts of Interest

The authors declare no conflicts of interest.

Peer Review

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/cesm.70050.

Acknowledgments

We would like to thank the authors of case studies 3 and 4 for their invaluable contributions to screening the new records identified by Elicit.

Lau O. and Golder S., “Comparison of Elicit AI and Traditional Literature Searching in Evidence Syntheses Using Four Case Studies,” Cochrane Evidence Synthesis and Methods 3 (2025): 1‐12, 10.1002/cesm.70050.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • 1. Cochrane, Cochrane Handbook for Systematic Reviews of Interventions (Cochrane.org, 2024), https://training.cochrane.org/handbook.
  • 2. CASP, What Is a Systematic Review & Why Are They Important? (CASP ‐ Critical Appraisal Skills Programme, 2023), https://casp-uk.net/news/what-is-a-systematic-review/.
  • 3. Young A., LibGuides: Searching for Systematic Reviews: Stages of a Systematic Review (Library‐Guides.ucl.ac.uk, 2024), https://library-guides.ucl.ac.uk/systematic-reviews/stages.
  • 4. Uman L. S., “Systematic Reviews and Meta‐Analyses,” Journal of the Canadian Academy of Child and Adolescent Psychiatry = Journal de l'Academie canadienne de psychiatrie de l'enfant et de l'adolescent 20, no. 1 (2011): 57–59, https://pmc.ncbi.nlm.nih.gov/articles/PMC3024725/.
  • 5. Fortier‐Dubois É., How We Evaluated Elicit Reports (The Elicit Blog, 2025, March 4), https://blog.elicit.com/elicit-reports-eval/.
  • 6. Kung J., “Elicit (Product Review),” Journal of the Canadian Health Libraries Association/Journal de l'Association des bibliothèques de la santé du Canada 44, no. 1 (2023): 15, 10.29173/jchla29657.
  • 7. Stuhlmueller A. and Byun J., Elicit Raises $22M to Build the Most Trusted AI Platform for Evidence‐Backed Decisions (The Elicit Blog, 2025, February 26), https://blog.elicit.com/series-a/.
  • 8. Elicit, Elicit: The AI Research Assistant (Elicit.com, 2023), https://elicit.com/welcome#Features.
  • 9. Wright L., Library Guides: Systematic Reviews: How Long Does a Systematic Review Take? (Libguides.tulane.edu, 2025), https://libguides.tulane.edu/c.php?g=1192346&p=8721961.
  • 10. NICE, How We Develop NICE Guidelines (NICE, 2019), https://www.nice.org.uk/about/what-we-do/our-programmes/nice-guidance/nice-guidelines/how-we-develop-nice-guidelines.
  • 11. Elicit, AI for Systematic Literature Reviews ‐ Elicit (Elicit.com, 2025), https://elicit.com/solutions/systematic-reviews.
  • 12. National University, AI Statistics and Trends (2024) (National University, 2024, March 1), 131, https://www.nu.edu/blog/ai-statistics-trends/.
  • 13. Hooson M., “UK Artificial Intelligence (AI) Statistics and Trends in 2025,” Forbes (2025, January 7), https://www.forbes.com/uk/advisor/business/software/uk-artificial-intelligence-ai-statistics/.
  • 14. Artificial Intelligence (Great.gov.uk, n.d.), https://www.great.gov.uk/campaign-site/uk-na-innovation/sectors/artificial-intelligence/.
  • 15. Woodhams J., Research Your AI Research Tools ‐ UK Research Integrity Office (Ukrio.org, 2024), https://ukrio.org/ukrio-resources/research-your-ai-research-tools/.
  • 16. Clark J., Barton B., Loai A., et al., “Generative Artificial Intelligence Use in Evidence Synthesis: A Systematic Review,” Research Synthesis Methods 16 (2025): 1–19, 10.1017/rsm.2025.16.
  • 17. Lieberum J.‐L., Toews M., Metzendorf M.‐I., et al., “Large Language Models for Conducting Systematic Reviews: On the Rise, but Not Yet Ready for Use—A Scoping Review,” Journal of Clinical Epidemiology 181 (2025): 111746, 10.1016/j.jclinepi.2025.111746.
  • 18. Collier A., Heilig L., Schilling L., Williams H., and Dellavalle R. P., “Cochrane Skin Group Systematic Reviews Are More Methodologically Rigorous Than Other Systematic Reviews in Dermatology,” British Journal of Dermatology 155, no. 6 (2006): 1230–1235, 10.1111/j.1365-2133.2006.07496.x.
  • 19. Windsor B., Popovich I., Jordan V., Showell M., Shea B., and Farquhar C., “Methodological Quality of Systematic Reviews in Subfertility: A Comparison of Cochrane and Non‐Cochrane Systematic Reviews in Assisted Reproductive Technologies,” Human Reproduction 27, no. 12 (2012): 3460–3466, 10.1093/humrep/des342.
  • 20. Fleming P. S., Seehra J., Polychronopoulou A., Fedorowicz Z., and Pandis N., “Cochrane and Non‐Cochrane Systematic Reviews in Leading Orthodontic Journals: A Quality Paradigm,” European Journal of Orthodontics 35, no. 2 (2012): 244–248, 10.1093/ejo/cjs016.
  • 21. Moher D., Tetzlaff J., Tricco A. C., Sampson M., and Altman D. G., “Epidemiology and Reporting Characteristics of Systematic Reviews,” PLoS Medicine 4, no. 3 (2007): e78, 10.1371/journal.pmed.0040078.
  • 22. Wen J., Ren Y., Wang L., et al., “The Reporting Quality of Meta‐Analyses Improves: A Random Sampling Study,” Journal of Clinical Epidemiology 61, no. 8 (2008): 770–775, 10.1016/j.jclinepi.2007.10.008.
  • 23. Delaney A., Bagshaw S. M., Ferland A., Laupland K., Manns B., and Doig C., “The Quality of Reports of Critical Care Meta‐Analyses in the Cochrane Database of Systematic Reviews: An Independent Appraisal,” Critical Care Medicine 35, no. 2 (2007): 589–594, 10.1097/01.ccm.0000253394.15628.fd.
  • 24. Byun J., Introducing Elicit Systematic Review (The Elicit Blog, 2025, February 20), https://blog.elicit.com/systematic-review/.
  • 25. Golder S. and Loke Y. K., “Sensitivity and Precision of Adverse Effects Search Filters in Medline and EMBASE: A Case Study of Fractures With Thiazolidinediones,” Health Information and Libraries Journal 29, no. 1 (2012): 28–38, 10.1111/j.1471-1842.2011.00972.x.
  • 26. Golder S., Hartwell G., Barnett L. M., Nash S. G., Petticrew M., and Glover R. E., “Vaping and Harm in Young People: Umbrella Review,” Tobacco Control (2025): tc–2024–059219, 10.1136/tc-2024-059219.
  • 27. Golder S., Castro A., Dale V., et al., Population‐Based Physical Activity Interventions Potentially Provided by Local Authorities in the UK: A Scoping Review of Economic Analyses (HSR UK Conference: Research Presentation, Newcastle University, 2–3 July 2025).
  • 28. Gutarra‐Vilchez R. B., Vazquez J. C., Glujovsky D., Lizaraso F., Viteri‐García A., and Martinez‐Zapata M. J., “Vasodilators for Women Undergoing Fertility Treatment,” Cochrane Database of Systematic Reviews 2025, no. 3 (2025): CD010001, 10.1002/14651858.cd010001.pub4.
  • 29. Rocco N., Catanuto G. F., Accardo G., et al., “Implants Versus Autologous Tissue Flaps for Breast Reconstruction Following Mastectomy,” Cochrane Database of Systematic Reviews 2024, no. 10 (2024): CD013821, 10.1002/14651858.cd013821.pub2.
  • 30. Javed S., Usmani S., Sarfraz Z., et al., “A Scoping Review of Vaping, E‐Cigarettes and Mental Health Impact: Depression and Suicidality,” Journal of Community Hospital Internal Medicine Perspectives 12, no. 3 (2022): 33–39, 10.55729/2000-9666.1053.
  • 31. Goel S., Shabil M., Kaur J., Chauhan A., and Rinkoo A. V., “Safety, Efficacy and Health Impact of Electronic Nicotine Delivery Systems (ENDS): An Umbrella Review Protocol,” BMJ Open 14, no. 1 (2024): e080274, 10.1136/bmjopen-2023-080274.
  • 32. MacDonald M., O'Leary R., Stockwell T., and Reist D., “Clearing the Air: Protocol for a Systematic Meta‐Narrative Review on the Harms and Benefits of e‐Cigarettes and Vapour Devices,” Systematic Reviews 5, no. 1 (2016): 85, 10.1186/s13643-016-0264-y.
  • 33. Gardner L. A., Rowe A.‐L., Newton N. C., et al., “School‐Based Preventive Interventions Targeting E‐Cigarette Use Among Adolescents: A Systematic Review Protocol,” BMJ Open 12, no. 9 (2022): e065509, 10.1136/bmjopen-2022-065509.
  • 34. Kaltabanis D., Smye V., Oudshoorn A., and Jackson K., Evaluating the Effectiveness of Recovery‐Oriented Interventions for Youth Who Vape Nicotine: A Systematic Review Protocol (Bmj.com, 2017), https://bmjopen.bmj.com/content/14/11/e090112.
  • 35. Aisyah W. N., Rahayu A. C., Jasmine D., Ciptaningrum A. D., and Herbawani C. K., “Factor Influencing E‐Cigarette Use and Its Impact on Adolescent Lung Health: Literature Review,” MIRACLE Journal of Public Health 7, no. 2 (2024): 176–190, 10.36566/mjph.v7i2.372.
  • 36. Banks E., Yazidjoglou A., and Joshy G., “Electronic Cigarettes and Health Outcomes: Epidemiological and Public Health Challenges,” International Journal of Epidemiology 52, no. 4 (2023): 984–992, 10.1093/ije/dyad059.
  • 37. Tobore T. O., “On the Potential Harmful Effects of E‐Cigarettes (EC) on the Developing Brain: The Relationship Between Vaping‐Induced Oxidative Stress and Adolescent/Young Adults Social Maladjustment,” Journal of Adolescence 76, no. 1 (2019): 202–209, 10.1016/j.adolescence.2019.09.004.
  • 38. Rahim F., Toguzbaeva K., Sokolov D., et al., “Vaping Possible Negative Effects on Lungs: State‐Of‐The‐Art From Lung Capacity Alteration to Cancer,” Cureus 16, no. 10 (2024): e72109, 10.7759/cureus.72109.
  • 39. Yan D., Wang Z., Laestadius L., et al., “A Systematic Review for the Impacts of Global Approaches to Regulating Electronic Nicotine Products,” Journal of Global Health 13 (2023): 04076, 10.7189/jogh.13.04076.
  • 40. Mohapatra S., Wisidagama S., and Schifano F., “Exploring Vaping Patterns and Weight Management‐Related Concerns Among Adolescents and Young Adults: A Systematic Review,” Journal of Clinical Medicine 13, no. 10 (2024): 2896, 10.3390/jcm13102896.
  • 41. Hassanein Z. M., Barker A. B., Murray R. L., Britton J., Agrawal S., and Leonardi‐Bee J., “Impact of Smoking and Vaping in Films on Smoking and Vaping Uptake in Adolescents: Systematic Review and Meta‐Analysis,” Health Education & Behavior: The Official Publication of the Society for Public Health Education 49, no. 6 (2022): 1004–1013, 10.1177/10901981221086944.
  • 42. Mylocopos G., Wennberg E., Reiter A., et al., “Interventions for Preventing E‐Cigarette Use Among Children and Youth: A Systematic Review,” American Journal of Preventive Medicine 66, no. 2 (2024): 351–370, 10.1016/j.amepre.2023.09.028.
  • 43. Gardner L. A., Rowe A.‐L., Newton N. C., et al., “A Systematic Review and Meta‐Analysis of School‐Based Preventive Interventions Targeting E‐Cigarette Use Among Adolescents,” Prevention Science 25, no. 7 (2024): 1104–1121, 10.1007/s11121-024-01730-6.
  • 44. Lee S. J., Rees V. W., Yossefy N., Emmons K. M., and Tan A. S. L., “Youth and Young Adult Use of Pod‐Based Electronic Cigarettes From 2015 to 2019,” JAMA Pediatrics 174, no. 7 (2020): 714, 10.1001/jamapediatrics.2020.0259.
  • 45. Simanjuntak A. M., Putra E., Amalia N. P., Hutapea A., Suyanto S., and Siregar I. E., “Lung and Airway Disease Caused by E‐Cigarette (Vape): A Systematic Review,” Siriraj Medical Journal 76, no. 5 (2024): 325–332, 10.33192/smj.v76i5.267185.
  • 46. Sharma H. and Verma S., “‘Vaping’—a Trojan Horse Against Fight Toward Tobacco Use and Cancer: A Systematic Review of the Existing Evidence,” Indian Journal of Medical and Paediatric Oncology 41, no. 3 (2020): 321–327, 10.4103/ijmpo.ijmpo_11_20.
  • 47. Glasser A. M., Collins L., Pearson J. L., et al., “Overview of Electronic Nicotine Delivery Systems: A Systematic Review,” American Journal of Preventive Medicine 52, no. 2 (2017): e33–e66, 10.1016/j.amepre.2016.10.036.
  • 48. Khouja J. N., Suddell S. F., Peters S. E., Taylor A. E., and Munafò M. R., “Is E‐Cigarette Use in Non‐Smoking Young Adults Associated With Later Smoking? A Systematic Review and Meta‐Analysis,” medRxiv (2020): 19007005, 10.1101/19007005.
  • 49. Khouja J. N., Suddell S. F., Peters S. E., Taylor A. E., and Munafò M. R., “Is e‐Cigarette Use in Non‐Smoking Young Adults Associated With Later Smoking? A Systematic Review and Meta‐Analysis,” Tobacco Control 30, no. 1 (2020): 8–15, 10.1136/tobaccocontrol-2019-055433.
  • 50. Cheon E., Kim H., Kang N., Park S., Cho S., and Hwang J., “A Systematic Review on Health Impact of Electronic Cigarettes in South Korea,” Journal of the Korean Society for Research on Nicotine and Tobacco 15, no. 2 (2024): 29–41, 10.25055/jksrnt.2024.15.2.29.
  • 51. Neves F., Aleixo G. H., Lusvarghi M., et al., “Os Riscos do Uso de Cigarros Eletrônicos Entre Jovens [The Risks of Electronic Cigarette Use Among Young People],” Brazilian Journal of Implantology and Health Sciences 6, no. 7 (2024): 218–234, 10.36557/2674-8169.2024v6n7p218-234.
  • 52. Araújo A., Marinho A., de S., et al., O Impacto do Uso de Cigarros Eletrônicos na Saúde Mental dos Adolescentes: Uma Revisão Sistemática [The Impact of Electronic Cigarette Use on the Mental Health of Adolescents: A Systematic Review] (Revista FT, 2024), https://revistaft.com.br/o-impacto-do-uso-de-cigarros-eletronicos-na-saude-mental-dos-adolescentes-uma-revisao-sistematica/.
  • 53. Becker T. D., Arnold M. K., Ro V., Martin L., and Rice T., “42.9 Electronic Cigarette Use (Vaping) and Mental Health Comorbidity: A Systematic Review of Studies Among Adolescents,” Journal of the American Academy of Child & Adolescent Psychiatry 59, no. 10 (2020): S225–S226, 10.1016/j.jaac.2020.08.328.
  • 54. Piras S., Maria De Oliveira Latuf G., Esteves A., et al., “Electronic Cigarette Use and Smoking Initiation in Adolescents and Young Adults: Evidence Synthesis [Uso Eletrônico de Cigarros e Iniciação de Fumo em Adolescentes e Jovens: Síntese de Evidências],” Com. Ciências Saúde 31, no. 2 (2020), https://repositoriobce.fepecs.edu.br/handle/prefix/158.
  • 55. Dautzenberg B., Legleye S., Underner M., Arvers P., Pothegadoo B., and Bensaidi A., “Systematic Review and Critical Analysis of Longitudinal Studies Assessing Effect of E‐Cigarettes on Cigarette Initiation Among Adolescent Never‐Smokers,” International Journal of Environmental Research and Public Health 20, no. 20 (2023): 6936, 10.3390/ijerph20206936.
  • 56. Aladeokin A. and Haighton C., “Is Adolescent E‐Cigarette Use Associated With Smoking in the United Kingdom?: A Systematic Review With Meta‐Analysis,” Tobacco Prevention & Cessation 5 (April 2019): 15, 10.18332/tpc/108553.
  • 57. Bravo‐Gutiérrez O. A., Falfán‐Valencia R., Ramírez‐Venegas A., Sansores R. H., Ponciano‐Rodríguez G., and Pérez‐Rubio G., “Lung Damage Caused by Heated Tobacco Products and Electronic Nicotine Delivery Systems: A Systematic Review,” International Journal of Environmental Research and Public Health 18, no. 8 (2021): 4079, 10.3390/ijerph18084079.
  • 58. Baenziger O. N., Ford L., Yazidjoglou A., Joshy G., and Banks E., “E‐Cigarette Use and Combustible Tobacco Cigarette Smoking Uptake Among Non‐Smokers, Including Relapse in Former Smokers: Umbrella Review, Systematic Review and Meta‐Analysis,” BMJ Open 11, no. 3 (2021): e045603, 10.1136/bmjopen-2020-045603.
  • 59. Bourke M., Sharif N., and Narayan O., “Association Between Electronic Cigarette Use in Children and Adolescents and Coughing: A Systematic Review,” Pediatric Pulmonology 56 (2021): 3402–3409, 10.1002/ppul.25619.
  • 60. Hua M. and Talbot P., “Potential Health Effects of Electronic Cigarettes: A Systematic Review of Case Reports,” Preventive Medicine Reports 4 (2016): 169–178, 10.1016/j.pmedr.2016.06.002.
  • 61. Mansfield L., Anokye N., Fox‐Rushby J., and Kay T., “The Health and Sport Engagement (HASE) Intervention and Evaluation Project: Protocol for the Design, Outcome, Process and Economic Evaluation of a Complex Community Sport Intervention to Increase Levels of Physical Activity,” BMJ Open 5, no. 10 (2015): e009276, 10.1136/bmjopen-2015-009276.
  • 62. Audrey S., Cooper A. R., Hollingworth W., et al., “Study Protocol: The Effectiveness and Cost Effectiveness of an Employer‐Led Intervention to Increase Walking During the Daily Commute: The Travel to Work Randomised Controlled Trial,” BMC Public Health 15, no. 1 (2015): 154, 10.1186/s12889-015-1464-4.
  • 63. Brown H. E., Whittle F., Jong S. T., et al., “A Cluster Randomised Controlled Trial to Evaluate the Effectiveness and Cost‐Effectiveness of the GoActive Intervention to Increase Physical Activity Among Adolescents Aged 13–14 Years,” BMJ Open 7, no. 9 (2017): e014419, 10.1136/bmjopen-2016-014419.
  • 64. Howlett N., Jones A., Bain L., and Chater A., “How Effective Is Community Physical Activity Promotion in Areas of Deprivation for Inactive Adults With Cardiovascular Disease Risk and/or Mental Health Concerns? Study Protocol for a Pragmatic Observational Evaluation of the ‘Active Herts’ Physical Activity Programme,” BMJ Open 7, no. 11 (2017): e017783, 10.1136/bmjopen-2017-017783.
  • 65. Tainio M., Monsivais P., Jones N. R., Brand C., and Woodcock J., “Mortality, Greenhouse Gas Emissions and Consumer Cost Impacts of Combined Diet and Physical Activity Scenarios: A Health Impact Assessment Study,” BMJ Open 7, no. 2 (2017): e014199, 10.1136/bmjopen-2016-014199.
  • 66. Grellier J., White M. P., de Bell S., et al., “Valuing the Health Benefits of Nature‐Based Recreational Physical Activity in England,” Environment International 187 (2024): 108667, 10.1016/j.envint.2024.108667.
  • 67. Gravett N. and Mundaca L., “Assessing the Economic Benefits of Active Transport Policy Pathways: Opportunities From a Local Perspective,” Transportation Research Interdisciplinary Perspectives 11 (2021): 100456, 10.1016/j.trip.2021.100456.
  • 68. Martin A., An Exploration of the Determinants and Health Impacts of Active Commuting (PhD thesis, University of East Anglia, 2015), https://ueaeprints.uea.ac.uk/id/eprint/57410/1/2015MartinAPhD.pdf.
  • 69. Aldred R., Woodcock J., and Goodman A., “Major Investment in Active Travel in Outer London: Impacts on Travel Behaviour, Physical Activity, and Health,” Journal of Transport & Health 20 (2021): 100958, 10.1016/j.jth.2020.100958. [DOI] [Google Scholar]
  • 70. Audrey S., Fisher H., Cooper A., et al., “Evaluation of an Intervention to Promote Walking During the Commute to Work: A Cluster Randomised Controlled Trial,” BMC Public Health 19, no. 1 (2019): 427, 10.1186/s12889-019-6791-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Snowsill T. M., Stathi A., Green C., et al., “Cost‐Effectiveness of a Physical Activity and Behaviour Maintenance Programme on Functional Mobility Decline in Older Adults: An Economic Evaluation of the React (Retirement in Action) Trial,” Lancet Public Health 7, no. 4 (2022): e327–e334, 10.1016/S2468-2667(22)00030-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Hunter R. F., Gough A., Murray J. M., et al., “A Loyalty Scheme to Encourage Physical Activity in Office Workers: A Cluster Rct,” Public Health Research 7, no. 15 (2019): 1–114, 10.3310/phr07150. [DOI] [PubMed] [Google Scholar]
  • 73. Hunter R. F., Murray J. M., Gough A., et al., “Effectiveness and Cost‐Effectiveness of a Loyalty Scheme for Physical Activity Behaviour Change Maintenance: Results From a Cluster Randomised Controlled Trial,” International Journal of Behavioral Nutrition and Physical Activity 15, no. 1 (2018): 127, 10.1186/s12966-018-0758-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Pretty J. and Barton J., “Nature‐Based Interventions and Mind–Body Interventions: Saving Public Health Costs Whilst Increasing Life Satisfaction and Happiness,” International Journal of Environmental Research and Public Health 17, no. 21 (2020): 7769, 10.3390/ijerph17217769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Candio P., Meads D., Hill A. J., and Bojke L., “Taking a Local Government Perspective for Economic Evaluation of a Population‐Level Programme to Promote Exercise,” Health Policy 125, no. 5 (2021): 651–657, 10.1016/j.healthpol.2021.02.012. [DOI] [PubMed] [Google Scholar]
  • 76. Jago R., Tibbitts B., Willis K., et al., “Effectiveness and Cost‐Effectiveness of the Plan‐A Intervention, a Peer Led Physical Activity Program for Adolescent Girls: Results of a Cluster Randomised Controlled Trial,” International Journal of Behavioral Nutrition and Physical Activity 18, no. 1 (2021): 63, 10.1186/s12966-021-01133-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Gc V. S., Suhrcke M., Atkin A. J., van Sluijs E., and Turner D., “Cost‐Effectiveness of Physical Activity Interventions in Adolescents: Model Development and Illustration Using Two Exemplar Interventions,” BMJ Open 9, no. 8 (2019): e027566, 10.1136/bmjopen-2018-027566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Le Gouais A., Panter J. R., Cope A., et al., “A Natural Experimental Study of New Walking and Cycling Infrastructure Across the United Kingdom: The Connect2 Programme,” Journal of Transport & Health 20 (2021): 100968, 10.1016/j.jth.2020.100968. [DOI] [Google Scholar]
  • 79. Galbraith N., Rose C., and Rose P., “The Roles of Motivational Interviewing and Self‐Efficacy on Outcomes and Cost‐Effectiveness of a Community‐Based Exercise Intervention for Inactive Middle‐Older Aged Adults,” Health & Social Care in the Community 30, no. 4 (2022): 1048, 10.1111/hsc.13510. [DOI] [PubMed] [Google Scholar]
  • 80. Kalita N., Cooper K., Baird J., et al., “Cost‐Effectiveness of a Dietary and Physical Activity Intervention in Adolescents: A Prototype Modelling Study Based on the Engaging Adolescents in Changing Behaviour (EACH‐B) Programme,” BMJ Open 12, no. 8 (2022): e052611, 10.1136/bmjopen-2021-052611. [DOI] [Google Scholar]
  • 81. waleed A. E.‐ghany, Ayman A., and Elashmawy A. A., “Effective of Sildenafil Citrate on Pregnancy Outcome in Infertile Women Undergoing Induction of Ovulation by Letrzole and Clomiphene Citrate,” Al‐Azhar International Medical Journal (Print) (2022), 10.21608/aimj.2022.136667.1930. [DOI] [Google Scholar]
  • 82. Mohammed El‐Khaldy D. M., Saeed Khallaf M., Nour Eldin Hashad A. M., and Amen Elshazly I. S. M., “Effect of Sildenafil in Endometrial Ripening With Induction of Ovulation by Clomiphene Citrate in Polycystic Ovarian Syndrome; Double Blinded; Randomized Controlled Trial,” Women Health Care and Issues 2, no. 3 (2019): 01–10, 10.31579/2642-9756/077. [DOI] [Google Scholar]
  • 83. Taher R. A., Kzar Al‐Essami S. A., and Hussein Kadhim B., “Evaluation of the Effect of Sildenafil on Nitric Oxide Secretion and Improvement of Endometrial Receptivity in Fresh ICSI Cycles,” International Journal of Reproduction, Contraception, Obstetrics and Gynecology 12, no. 4 (2023): 848–852, 10.18203/2320-1770.ijrcog20230780. [DOI] [Google Scholar]
  • 84. El‐Maghrabi H. A., Saad El‐Kasar Y., Elbaz Z. M., and Youssef L., “Effect of Sildenafil Citrate on Endometrial Thickness and Pregnancy Rate in Frozen‐Thawed Embryo Transfer Cycles,” Austin Journal of Obstetrics and Gynecology 7 (2020): ajog‐v7‐id1150. [Google Scholar]
  • 85. Reddy L., Madhavi Y., Khan M., and Tutor C. (2016). Role of Sildenafil in ovulation induction ‐A comparative Study of outcomes With Sildenafil in Ovulation Induction Cycles With Clomiphene Citrate. https://www.iaimjournal.com/storage/2016/12/iaim_2016_0312_04.pdf.
  • 86. Guo M., Yan Y., Lv J., et al., “Clinical Outcomes of Sildenafil Application in Patients of Poor Endometrial Development,” Gynecology and Obstetrics Clinical Medicine 2, no. 1 (2022): 14–19, 10.1016/j.gocm.2022.02.001. [DOI] [Google Scholar]
  • 87. Aboelroose A. A., Ibrahim Z. M., Madny E. H., Elmazzahy A. M., and Taha O. T., “A Randomized Clinical Trial of Sildenafil Plus Clomiphene Citrate to Improve the Success Rate of Ovulation Induction in Patients With Unexplained Infertility,” International Journal of Gynecology & Obstetrics 150, no. 1 (2020): 72–76, 10.1002/ijgo.13159. [DOI] [PubMed] [Google Scholar]
  • 88. Mahran A., Abdelmeged A., Shawki H., Moheyelden A., and Ahmed A. M. (2016). NItric Oxide Donors Improve the Ovulation and Pregnancy Rates in Anovulatory Women With Polycystic Ovary Syndrome Treated With Clomiphene Citrate: A RCT. International Journal of Reproductive Biomedicine, 14(1) 9–14. https://pubmed.ncbi.nlm.nih.gov/27141543/. [PMC free article] [PubMed] [Google Scholar]
  • 89. Doaa El Faham A., Adel K., Bibars M., and Azmy O., “Can Amlodipine Improve the Pre‐Ovulatory Follicle Blood Flow in Women With Polycystic Ovarian Syndrome?,” PubMed 20, no. 2 (2019): 89–94. [PMC free article] [PubMed] [Google Scholar]
  • 90. Farnoush F., Mehrafza M., Mirmansouri A., Oudi M., and Hoseeini A., “Administration of NTG Before Embryo Transfer Does Not Increase Pregnancy Rate,” International Journal of Reproductive BioMedicine (IJRM) 3, no. 2 (2005): 95–100. [Google Scholar]
  • 91. Alderman A. K., Wilkins E. G., Kim H. M., and Lowery J. C., “Complications in Postmastectomy Breast Reconstruction: Two‐Year Results of the Michigan Breast Reconstruction Outcome Study,” Plastic and Reconstructive Surgery 109, no. 7 (2002): 2265–2274, 10.1097/00006534-200206000-00015. [DOI] [PubMed] [Google Scholar]
  • 92. Shiraishi M., Sowa Y., Tsuge I., Kodama T., Inafuku N., and Morimoto N., “Long‐Term Patient Satisfaction and Quality of Life Following Breast Reconstruction Using the Breast‐Q: A Prospective Cohort Study,” Frontiers in Oncology 12 (2022): 815498, 10.3389/fonc.2022.815498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Persichetti P., Barone M., Salzillo R., et al., “Impact on Patient's Appearance Perception of Autologous and Implant Based Breast Reconstruction Following Mastectomy Using BREAST‐Q,” Aesthetic Plastic Surgery 46, no. 3 (2022): 1153–1163, 10.1007/s00266-022-02776-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Saulis A. S., Mustoe T. A., and Fine N. A., “A Retrospective Analysis of Patient Satisfaction With Immediate Postmastectomy Breast Reconstruction: Comparison of Three Common Procedures,” Plastic and Reconstructive Surgery 119, no. 6 (2007): 1669–1676, 10.1097/01.prs.0000258827.21635.84. [DOI] [PubMed] [Google Scholar]
  • 95. Ha J. H., Hong K. Y., Lee H.‐B., et al., “Oncologic Outcomes After Immediate Breast Reconstruction Following Mastectomy: Comparison of Implant and Flap Using Propensity Score Matching,” BMC Cancer 20, no. 1 (2020): 78, 10.1186/s12885-020-6568-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Timman R., Gopie J. P., Brinkman J. N., et al., “Most Women Recover From Psychological Distress After Postoperative Complications Following Implant or Diep Flap Breast Reconstruction: A Prospective Long‐Term Follow‐Up Study,” PLOS ONE 12, no. 3 (2017): e0174455, 10.1371/journal.pone.0174455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Aliu O., Zhong L., Chetta M. D., et al., “Comparing Health Care Resource Use Between Implant and Autologous Reconstruction of the Irradiated Breast: A National Claims‐Based Assessment,” Plastic & Reconstructive Surgery 139, no. 6 (2017): 1224e1231e, 10.1097/prs.0000000000003336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Kouwenberg C., de Ligt K. M., Kranenburg L. W., et al., “Long‐Term Health‐Related Quality of Life After Four Common Surgical Treatment Options for Breast Cancer and the Effect of Complications: A Retrospective Patient‐Reported Survey Among 1871 Patients,” Plastic and reconstructive surgery 146, no. 1 (2020): 1–13, 10.1097/prs.0000000000006887. [DOI] [PubMed] [Google Scholar]
  • 99. Mak J. C. and Kwong A., “Complications in Post‐Mastectomy Immediate Breast Reconstruction: A Ten‐Year Analysis of Outcomes,” Clinical Breast Cancer 20, no. 5 (2020): 402–407, 10.1016/j.clbc.2019.12.002. [DOI] [PubMed] [Google Scholar]
  • 100. Mioton L. M., Smetona J. T., Hanwright P. J., et al., “Comparing Thirty‐Day Outcomes in Prosthetic and Autologous Breast Reconstruction: A Multivariate Analysis of 13,082 Patients?,” Journal of plastic, Reconstructive & Aesthetic Surgery: JPRAS 66, no. 7 (2013): 917–925, 10.1016/j.bjps.2013.03.009. [DOI] [PubMed] [Google Scholar]
  • 101. Xu F., Sun H., Zhang C., et al., “Comparison of Surgical Complication Between Immediate Implant and Autologous Breast Reconstruction After Mastectomy: A Multicenter Study of 426 Cases,” Journal of Surgical Oncology 118, no. 6 (2018): 953–958, 10.1002/jso.25238.
  • 102. Qin Q., Tan Q., Lian B., Mo Q., Huang Z., and Wei C., “Postoperative Outcomes of Breast Reconstruction After Mastectomy,” Medicine 97, no. 5 (2018): e9766, 10.1097/md.0000000000009766.
  • 103. Fenske R. F. and Otts J. A. A., “Incorporating Generative AI to Promote Inquiry‐Based Learning: Comparing Elicit AI Research Assistant to PubMed and CINAHL Complete,” Medical Reference Services Quarterly 43, no. 4 (2024): 292–305, 10.1080/02763869.2024.2403272.
  • 104. Beynon R., Leeflang M. M. G., McDonald S., et al., “Search Strategies to Identify Diagnostic Accuracy Studies in Medline and EMBASE,” Cochrane Database of Systematic Reviews 2013 (2013): MR000022, 10.1002/14651858.mr000022.pub3.
  • 105. Charles J. M., Harrington D. M., Davies M. J., et al., “Micro‐Costing and a Cost‐Consequence Analysis of the ‘Girls Active’ Programme: A Cluster Randomised Controlled Trial,” PLOS ONE 14, no. 8 (2019): e0221276, 10.1371/journal.pone.0221276.
  • 106. Hu E. S., Pusic A. L., Waljee J. F., et al., “Patient‐Reported Aesthetic Satisfaction With Breast Reconstruction During the Long‐Term Survivorship Period,” Plastic and Reconstructive Surgery 124, no. 1 (2009): 1–8, 10.1097/prs.0b013e3181ab10b2.
  • 107. Yueh J. H., Slavin S. A., Adesiyun T., et al., “Patient Satisfaction in Postmastectomy Breast Reconstruction: A Comparative Evaluation of DIEP, TRAM, Latissimus Flap, and Implant Techniques,” Plastic and Reconstructive Surgery 125, no. 6 (2010): 1585–1595, 10.1097/prs.0b013e3181cb6351.
  • 108. Kamel G. N., Mehta K., Nash D., et al., “Patient‐Reported Satisfaction and Quality of Life in Obese Patients: A Comparison Between Microsurgical and Prosthetic Implant Recipients,” Plastic and Reconstructive Surgery 144, no. 6 (2019): 960e–966e, 10.1097/PRS.0000000000006201.
  • 109. Bernard N., Jr., Sagawa Y., Jr., Bier N., Lihoreau T., Pazart L., and Tannou T., “Using Artificial Intelligence for Systematic Review: The Example of Elicit,” BMC Medical Research Methodology 25, no. 1 (2025): 75, 10.1186/s12874-025-02528-y.
  • 110. Tomczyk P., Brüggemann P., Mergner N., and Petrescu M., “Are AI Tools Better Than Traditional Tools in Literature Searching? Evidence From E‐Commerce Research,” Journal of Librarianship and Information Science (2024), 10.1177/09610006241295802.
  • 111. Thomas J., Flemyng E., Noel‐Storr A., et al., Responsible AI in Evidence Synthesis (RAISE): Guidance and Recommendations, version 2, updated 3 June (Washington, DC: Center for Open Science, 2025), Open Science Framework, https://osf.io/, 10.17605/OSF.IO/FWAUD.
  • 112. JBI, JBI Manual for Evidence Synthesis (JBI, 2021), https://jbi-global-wiki.refined.site/space/MANUAL.
  • 113. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) (PRISMA, 2020), https://www.prisma-statement.org/.
  • 114. Bhaumik S., “On the Ethical and Moral Dimensions of Using Artificial Intelligence for Evidence Synthesis,” PLOS Global Public Health 5, no. 3 (2025): e0004348, 10.1371/journal.pgph.0004348.
  • 115. “Why Does Elicit Provide a Different Answer When I Ask the Same Question Again?,” Elicit.com (2024), https://support.elicit.com/en/articles/2595329.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Articles from Cochrane Evidence Synthesis and Methods are provided here courtesy of John Wiley & Sons Ltd on behalf of The Cochrane Collaboration