Abstract
Evidence-based medicine (EBM) is at the forefront of modern healthcare, emphasizing the use of the best available scientific evidence to guide clinical decisions. Due to the sheer volume and rapid growth of medical literature and the high cost of curation, there is a critical need to investigate Natural Language Processing (NLP) methods to identify, appraise, synthesize, summarize, and disseminate evidence in EBM. This survey presents an in-depth review of 129 research studies on leveraging NLP for EBM, illustrating its pivotal role in enhancing clinical decision-making processes. The paper systematically explores how NLP supports the five fundamental steps of EBM – Ask, Acquire, Appraise, Apply, and Assess. The review not only identifies current limitations within the field but also proposes directions for future research, emphasizing the potential for NLP to revolutionize EBM by refining evidence extraction, evidence synthesis, appraisal, summarization, enhancing data comprehensibility, and facilitating a more efficient clinical workflow.
1. Introduction
Evidence-based medicine (EBM) is at the forefront of modern healthcare, emphasizing the use of the best available scientific evidence to guide clinical decisions (Sackett et al., 1996). By integrating clinical expertise, patient values, and the most up-to-date research data, EBM facilitates healthcare decisions by patients and the general public, clinicians, guideline developers, administrators, and policy-makers (Mehta et al., 2022; Kwaan and Melton, 2012; Van de Vliet et al., 2023).
The foundation of EBM relies heavily on comprehensive research data from detailed textual sources such as clinical trial publications, cohort studies, and case reports (Blunt, 2022; Ratnani et al., 2023). Navigating this evidence hierarchy necessitates the use of advanced Natural Language Processing (NLP) techniques, which are crucial for streamlining literature searches and extracting PICO (Patient/Population, Intervention, Comparison, Outcomes) elements (Peng et al., 2023; Nye et al., 2018). From the early utilization of statistical machine learning (Arora et al., 2019) and recurrent neural networks (Guan et al., 2019), there has been a significant shift towards more advanced technologies such as transformer-based frameworks and large language models (LLMs). These modern approaches employ self-supervised pretraining and instruction tuning (Rohanian et al., 2024) to capture domain-specific knowledge (Kalyan et al., 2022), enhancing the accuracy and scalability of medical information processing (Thirunavukarasu et al., 2023). In particular, recent advancements in LLMs have further propelled NLP capabilities within EBM, excelling in more complex tasks such as appraising and synthesizing evidence (Górska and Tacconelli, 2024), differentiating and ranking evidence (Datta et al., 2024), generating human-like responses, answering complex clinical questions (Shiraishi et al., 2024), and identifying relevant clinical trials (Devi et al., 2024a).
Despite these significant advancements, a comprehensive review summarizing NLP development and applications in EBM is still lacking. This paper seeks to fill this gap by offering a thorough review of essential NLP tasks in EBM, with a focus on evidence generation, such as evidence retrieval, extraction, synthesis, and summarization, as well as evidence adoption and evidence-based research, such as question-answering, clinical trial design and identification, and other cutting-edge studies across various clinical specialties.
Furthermore, we outline key benchmarks to facilitate the development of future NLP models. Finally, we explore several potential avenues for future research. To better support both clinicians and researchers in making more informed clinical decisions and producing more comprehensive review literature, we have made these resources publicly available.
2. Scope and Literature Selection
Our scoping review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, as illustrated in Figure 1.
Figure 1: PRISMA flow diagram.
2.1. Information sources
We searched four databases: PubMed, IEEE Xplore, ACM Digital Library, and ACL Anthology. The search included studies from the past 5 years, spanning 2019 to 2024.
2.2. Search strategy
Our search strategy was designed to capture the most relevant studies at the intersection of NLP and EBM (Supplementary File A.3). We targeted key NLP concepts and technologies by including terms such as ‘natural language processing’, ‘language model’, ‘large language model’, ‘computational linguistics’, ‘information extraction’, ‘information retrieval’, ‘clinical trial retrieval’, ‘text summarization’, ‘question answering’, ‘sentence segmentation’, ‘named entity recognition’, and ‘tokenization’, along with abbreviations such as ‘NLP’ and ‘LLM’. In the domain of EBM, we included terms such as ‘Evidence-Based Medicine’, ‘Evidence-Based Practice’, and ‘Clinical Trial’, along with abbreviations such as ‘EBM’ and ‘EBP’, also limited to appearances in the title or abstract. We used Boolean operators to combine any term from the NLP domain with any term from the EBM domain in our search terms.
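The combination described above can be sketched as a small query builder. This is a minimal illustration under stated assumptions, not the exact search strings, which are given in Supplementary File A.3: the term lists are a subset of those named in the text, the helper names `or_group` and `build_query` are hypothetical, and the `[Title/Abstract]` field tag follows PubMed syntax (other databases use different field syntax).

```python
# Illustrative subset of the search terms listed above (not the full strategy).
NLP_TERMS = ["natural language processing", "large language model",
             "information extraction", "NLP", "LLM"]
EBM_TERMS = ["Evidence-Based Medicine", "Evidence-Based Practice",
             "Clinical Trial", "EBM", "EBP"]


def or_group(terms, field_tag=""):
    """Join terms with OR, quoting each term and appending an optional field tag."""
    return "(" + " OR ".join(f'"{t}"{field_tag}' for t in terms) + ")"


def build_query(nlp_terms, ebm_terms, field_tag=""):
    """AND the two OR-groups so that a hit needs at least one term from each domain."""
    return f"{or_group(nlp_terms, field_tag)} AND {or_group(ebm_terms, field_tag)}"


# Assumption: PubMed-style field tag restricting matches to title or abstract.
query = build_query(NLP_TERMS, EBM_TERMS, field_tag="[Title/Abstract]")
print(query)
```

Grouping with OR inside each domain and AND across domains keeps the query readable and makes it easy to extend either term list without touching the combination logic.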
2.3. Study selection and metadata extraction
The references of all eligible studies were imported into Covidence, and duplicates were removed. We then screened the articles by title and abstract. Inclusion criteria were defined as (1) studies published in English, (2) research applying NLP techniques specifically for EBM, and (3) studies focusing on applications for humans. Exclusion criteria were defined as (1) articles unrelated to NLP for EBM, (2) non-English publications, and (3) secondary literature such as systematic reviews and survey papers, as well as retracted papers, case studies, and descriptive papers lacking experimental results.
After the screening, the metadata was extracted from each paper, including models, disease, tasks involved, results, and limitations. Two annotators cross-verified the study selection and metadata extraction processes and consulted a third in cases of disagreement.
2.4. Study Statistics
From an initial pool of 601 papers retrieved from databases and 9 additional sources, we removed 8 duplicates. Subsequently, 386 papers were excluded during the initial screening based on predefined exclusion criteria, and 88 more were removed during full-text screening due to misaligned objectives or lack of relevance to EBM tasks. Ultimately, 129 studies met the inclusion criteria and form the basis of this review, with detailed metadata provided in Supplementary Table 1.
Figure 2 illustrates the distribution of research papers across different years (2019–2024) and their corresponding NLP tasks. There has been a rapid growth of papers over the years, peaking in 2023. The most common tasks throughout the years are Entity Extraction, Classification, and Evaluation, showing their foundational role in NLP for EBM research. Emerging tasks like Question Answering and Quality Assessment have appeared more prominently in recent years, reflecting evolving research directions.
Figure 2: Distribution of papers in different EBM tasks over time. The color scheme is the same as in Supplementary Table 1.
3. NLP Techniques for EBM
The entire EBM process consists of five steps, commonly referred to as the ‘5 As’: Ask, Acquire, Appraise, Apply, and Assess (Ratnani et al., 2023). NLP can be leveraged at each step to enhance the process (Table 1). For example, in the Ask step, clinicians or patients formulate precise clinical questions to address specific healthcare concerns. During the Acquire step, NLP can be employed to extract evidence, often leveraging the PICO framework. In the Appraise step, NLP tools can assist in evaluating and ranking the quality, validity, and relevance of the retrieved information to ensure its applicability to clinical decision-making. For the Apply and Assess steps, NLP can streamline the design and identification of relevant clinical trials and facilitate their integration into practice, enabling continuous assessment and refinement of patient care strategies. Detailed trends and advancements for each step of the EBM process are discussed in the following sections.
Table 1: Mapping of EBM cycle to corresponding NLP tasks.
| EBM cycle | Description | NLP tasks |
|---|---|---|
| Ask | Search & select studies | Question answering, Information retrieval |
| Acquire | Collect data | Named entity recognition and normalization, Relation extraction |
| Appraise | Examine relevance, validity, and results | Quality assessment, Evidence ranking and screening, Evidence synthesis, Evidence summarization |
| Apply & Assess | Apply EBM in practice and research and evaluate their effectiveness | Clinical trial identification and design, Question answering, Domain-specific applications |
4. Ask - Searching & Selecting Studies
EBM can help researchers and clinicians draft a successful systematic review. After the scope and questions have been determined, the first step is to search for studies to include in the reviews and ensure they remain up to date.
This step is typically achieved using NLP-based information retrieval techniques, which extract relevant information from large text corpora based on user queries. Early heuristic methods involved structured, keyword-based queries to retrieve articles from repositories like MEDLINE or PubMed. These methods, while foundational, are limited by the high cost of expert annotation, maintenance, and domain sensitivity (Névéol et al., 2011). Despite these limitations, recent methods often rely on predefined rule-based strategies, e.g., SR[pt] and CQrs (Navarro-Ruan and Haynes, 2022), to filter and compare the retrieved results for systematic reviews. In addition, while statistical machine learning and context-aware models (Kamath et al., 2021; Samuel et al., 2021) have been widely adopted, they often lack scalability and struggle with less representative text embeddings.
Recent advancements are leaning towards transformer-based deep learning frameworks (Ramprasad et al., 2023a; Jin et al., 2022) due to their scalability and their ability to integrate medical ontologies, improving domain-specific text representation through self-supervised pretraining. For example, Lokker et al. (2023) used BioBERT’s (Lee et al., 2020) embeddings and attention mechanisms to improve query representation and biomedical literature retrieval in clinical practice. Furthermore, the integration of generative AI models has advanced literature retrieval despite challenges like hallucination. For example, Gwon et al. (2024) compared Microsoft Bing AI and ChatGPT in accelerating the systematic literature search for a clinical review on Peyronie disease treatment, finding that both can speed up the search process.
5. Acquire - Collecting Data
EBM is designed to identify all studies relevant to a given research question and to synthesize data regarding study design, risk of bias, and results. Therefore, the findings of EBM depend heavily on decisions about which data from these studies are presented and analyzed. The data collected should be accurate, complete, and accessible for future review, updates, and data-sharing purposes. Here we describe NLP approaches used to extract data directly from journal articles and other study reports.
5.1. Entity Extraction and Normalization
Initially, entity (e.g., PICO) extraction relied on rule-based approaches, which utilize predefined lexical, syntactic, and contextual rules for extracting entities from clinical trial data (Chen et al., 2019c; Borchert et al., 2022). These methods are simple, transparent, and customizable, making them practical for high-precision tasks in structured contexts. Although they face challenges with complex or ambiguous data, their interpretability and ease of adaptation remain valuable for PICO extraction (Dhrangadhariya and Müller, 2023).
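To make the idea of predefined lexical rules concrete, the sketch below applies two hand-written regular expressions to a sentence. It is a toy illustration only: the patterns, the function name `extract_rule_based`, and the example sentence are hypothetical and are not taken from any of the cited systems, which rely on much larger rule sets and lexicons.

```python
import re

# Hypothetical hand-written rules for illustration; real systems use many more.
SAMPLE_SIZE = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b", re.IGNORECASE
)
COMPARISON = re.compile(
    r"\b(?:versus|vs\.?|compared (?:with|to))\s+([A-Za-z][\w-]*)", re.IGNORECASE
)


def extract_rule_based(sentence):
    """Return the entities matched by the rules above; unmatched entities are None."""
    size = SAMPLE_SIZE.search(sentence)
    comp = COMPARISON.search(sentence)
    return {
        "sample_size": int(size.group(1).replace(",", "")) if size else None,
        "comparison": comp.group(1) if comp else None,
    }


example = "A total of 1,204 patients received metformin compared with placebo."
print(extract_rule_based(example))
```

Each rule is transparent and easy to adjust, which reflects the interpretability noted above, but the same rules fail as soon as the phrasing departs from the anticipated patterns.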
RNN/LSTM-based frameworks lacked long-term memory capabilities. Nevertheless, they have been used for sequential sentence classification to enhance context utilization and improve classification accuracy in unstructured or less structured medical abstracts (Jin and Szolovits, 2018).
The current trend favors transformer-based frameworks because of the benefits of domain-aware pretraining. For instance, models such as SciBERT and PubMedBERT have been applied to extract the ‘Intervention’ element (‘I’ in PICO) (Tsubota et al., 2022), and srBERT (Aum and Choe, 2021) has been used to classify articles into ‘included’ or ‘excluded’ categories based on predefined inclusion criteria.
5.2. Relation Extraction
Following the identification of PICO elements, relation extraction approaches can be used to link these elements within studies.
Initially, rule-based and machine-learning methods were used to extract meaningful relationships from medical literature (Alodadi and Janeja, 2019; Borchert et al., 2022). By 2021, new methodologies had emerged that integrated deep learning frameworks such as BERT with Argument Mining (AM). For example, srBERT (Aum and Choe, 2021) identified key elements and defined their interrelations from article titles. Stylianou and Vlahavas (2021) classified the relationships between argumentative components within texts, such as claims and evidence, labeling each relationship as ‘supporting’ or ‘opposing’.
In the systematic review process, understanding the connections between different study results can influence the review outcomes. Beyond systematic reviews, automated relation extraction has shifted towards more structured approaches, such as schema-based relation extraction. For example, Sanchez-Graillet et al. (2022) utilized a richly annotated corpus that aligns with the C-TrO ontology. Complementing these advances, graph-based approaches offer a novel way to encode complex relationships between clinical entities. A knowledge graph is a structured representation of information in which entities (e.g., symptoms, treatments, drugs) are represented as nodes and their relationships as edges. For example, Pan et al. (2021) used a knowledge graph to organize and visualize relationships among clinical trial entities such as symptoms, treatments, and drug outcomes.
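The node-and-edge structure of a knowledge graph can be illustrated with a few lines of code. The sketch below is a generic labeled directed graph, not the implementation of any cited study; the class name `KnowledgeGraph` and all entity and relation names are hypothetical placeholders.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal directed, labeled graph: nodes are entities, edges carry a relation."""

    def __init__(self):
        # Maps a source entity to a list of (relation, target) pairs.
        self.edges = defaultdict(list)

    def add_relation(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation=None):
        """Targets reachable from `source`, optionally filtered by relation label."""
        return [t for r, t in self.edges.get(source, [])
                if relation is None or r == relation]


# Hypothetical triples for illustration only; not taken from any cited study.
kg = KnowledgeGraph()
kg.add_relation("drug_A", "treats", "symptom_X")
kg.add_relation("drug_A", "has_outcome", "outcome_Y")
kg.add_relation("trial_1", "evaluates", "drug_A")

print(kg.neighbors("drug_A"))
print(kg.neighbors("drug_A", relation="treats"))
```

Storing relations as labeled edges lets a query such as "what does this drug treat?" be answered by filtering on the relation label rather than re-reading the source text.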
6. Appraise, Synthesize, and Summarize Evidence
This task screens the included studies for risk of bias and appraises them for quality to ensure that healthcare decisions are informed by the most reliable and relevant evidence. Once the appraisal is complete, the next step is synthesizing evidence by combining findings from multiple studies, often using meta-analyses. Finally, these synthesized insights are summarized into concise, actionable conclusions.
6.1. Quality Assessment
Tools for assessing evidence are crucial in EBM. Brassey et al. (2021) developed a fully automated tool that combines machine learning and rule-based techniques. It assessed evidence from randomized clinical trials and systematic reviews using sentiment analysis, indications of bias, and sample size calculation, and used these signals to estimate the potential effectiveness of the intervention. In addition, deep learning models such as BERT (Devlin et al., 2019) have been used to evaluate the quality of evidence by analyzing article titles and abstracts. For example, variants such as BioBERT, BlueBERT, and BERT-base were fine-tuned to classify articles by their adherence to methodological quality criteria (Lokker et al., 2023).
6.2. Evidence Ranking and Screening
After the quality assessment, the next step is to screen and rank the evidence. Several ranking methods are available, with statistical methods being among the earliest used. For example, Norman et al. (2019b) developed a method to rank references by their likelihood of relevance. Compared with screening in random order, their study showed that prioritization methods (with technological assistance) allow fewer studies to be screened while still producing reliable results, which reduces both the time and cost of the screening process. Rybinski et al. (2020a) introduced the A2A platform, which ranked documents using two models: Okapi Best Match 25 (BM25), which scores documents based on term frequency and document length, and Divergence from Randomness (DFR), which quantifies informativeness as the divergence of a term’s distribution from a random distribution. Machine learning methods have also been applied. Rybinski et al. (2020b) designed a search system that used a simple query formulation strategy for initial ranking and pre-trained BERT models (SciBERT, BioBERT, and BlueBERT) for re-ranking in clinical trial searches, which improved robustness.
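As a concrete illustration of how BM25 combines term frequency and document length, the following sketch scores a toy corpus. It is a textbook-style implementation under stated assumptions, not the code of the A2A platform: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, whitespace tokenization, and the example documents are all choices made here for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed, non-negative IDF variant (an assumption of this sketch).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, whitespace-tokenized for simplicity.
docs = [
    "statin therapy reduces cardiovascular events".split(),
    "randomized trial of statin therapy in older adults with diabetes".split(),
    "exercise improves sleep quality".split(),
]
scores = bm25_scores("statin therapy".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy corpus the first two documents contain both query terms once, so the shorter one ranks higher because of length normalization, while the third document, which shares no query term, scores zero.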
6.3. Evidence Synthesis
Evidence synthesis combines data from included studies to draw conclusions about a body of evidence. While the most common method used is meta-analysis, which statistically combines results from studies to estimate overall effect sizes, NLP-based approaches have also been applied to synthesize studies or findings. Mutinda et al. (2022b) proposed a method to reproduce meta-analysis, computing summary statistics (e.g., risk ratio) and visualizing results using forest plots by extracting and normalizing PICO elements from breast cancer randomized controlled trials. However, this method is built on a small amount of data. Górska and Tacconelli (2024) developed a system to continuously update summary statistics from key publications, further improving the meta-analysis process. However, only binary outcomes were supported in both methods, limiting the applicability to broader meta-analysis needs. Besides meta-analysis, EvidenceMap (Kang et al., 2023) effectively synthesized medical findings by employing a structured and hierarchical representation comprising Entities, Propositions, and Maps that enhances the interpretability and retrievability of evidence through its sophisticated semantic relational retranslation.
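For binary outcomes, a summary statistic such as the risk ratio can be computed directly from extracted event counts. The sketch below uses the standard risk ratio formula with a Wald-type confidence interval on the log scale; it is a generic illustration, not the method of the cited studies, and the function name `risk_ratio` and the counts are hypothetical.

```python
import math


def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a Wald-type 95% confidence interval on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 15/100 events under treatment vs. 30/100 under control.
rr, lower, upper = risk_ratio(15, 100, 30, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The four inputs correspond to quantities that an extraction pipeline would need to recover from each trial report, which is why accurate extraction and normalization of outcome counts is a prerequisite for automated meta-analysis. The formula assumes nonzero event counts in both arms.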
6.4. Evidence Summarization
Finally, EBM must present a clear statement of findings or conclusions to help people make better-informed decisions and increase usability. This summary should include information on all important outcomes, evidence certainty, and the intervention’s desirable and undesirable consequences.
From an NLP technical perspective, evidence summarization uses extractive and abstractive strategies. Extractive summarization selects the most important sentences from the original text. Gulden et al. (2019) generated a new dataset from clinicaltrials.gov to test various algorithms (e.g., LexRank, TextRank, and Latent Semantic Analysis), identifying TextRank as the best performer in creating summaries directly from the source texts without altering the original wording. However, these algorithms suffered from inefficiency and high computational complexity when processing large datasets. Sarker et al. (2020) developed a lightweight system that leverages Maximal Marginal Relevance (MMR) and pre-trained word embeddings trained on PubMed and PMC texts to integrate semantic relevance and reduce redundancy. Similarly, Xie et al. (2022) proposed a knowledge infusion training framework called KeBioSum, which incorporated PICO into pre-trained language models (PLMs). It utilized lightweight knowledge adapters to reduce computational costs while improving semantic understanding and contextual representation.
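The trade-off that Maximal Marginal Relevance makes between relevance and redundancy can be sketched as a greedy selection loop. This is a minimal illustration, not the system of Sarker et al. (2020): it substitutes bag-of-words cosine similarity for pre-trained word embeddings, and the function names, the weight `lam`, and the example sentences are assumptions made here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.5):
    """Greedily pick k sentence indices, trading query relevance against redundancy."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], qvec) - (1.0 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical sentences; the first two are near-duplicates.
sentences = [
    "aspirin reduced stroke risk in adults",
    "aspirin reduced stroke risk in older adults",
    "bleeding events were more frequent with aspirin",
]
print(mmr_select(sentences, "aspirin stroke risk", k=2))
```

With `lam=0.5` the second pick skips the near-duplicate sentence in favor of the less similar one, whereas `lam=1.0` reduces the procedure to pure relevance ranking and selects both near-duplicates.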
Abstractive summarization identifies the most critical information and generates new text for the summary, which usually requires more advanced techniques. Lalitha et al. (2023) applied neural models such as T5 (Text-to-Text Transfer Transformer), BART (Bidirectional and Auto-Regressive Transformers), and PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence) to address the challenge of obtaining useful information from a vast number of clinical documents.
With the increased demand for interactive summarization, Ramprasad et al. (2023b) presented TrialsSummarizer, a system that helps automate summarizing the most relevant evidence in a set of randomized controlled trials using a multi-headed architecture, enabling each token in the generated summary to be explicitly linked to specific input aspects (e.g., population, intervention, or outcome). It introduces template-infilling capabilities, allowing users to correct or adjust generated summaries dynamically. Moreover, the application of LLMs has evolved to address these tasks with growing precision and depth. Hamed et al. (2023) explored ChatGPT’s capabilities in synthesizing diabetic ketoacidosis (DKA) guidelines by comparing, integrating, and abstracting content. Unlu et al. (2024) employed a Retrieval-Augmented Generation (RAG) framework with GPT-4 to generate responses to clinical trial eligibility questions based on retrieved patient data. Furthermore, TriSum (Jiang et al., 2024) stood out by using structured rationale-based abstractive summarization, where large language models generate aspect-triple rationales that are distilled into smaller models through a dual-scoring selection mechanism and curriculum learning.
7. Apply and Assess: adoption, refinement, and research
Transitioning from the evidence generation and synthesis, the next critical step is its adoption and refinement, facilitated by an ‘Evidence-based Research’ approach. Adoption and refinement are crucial to consistently reassessing and enhancing clinical evidence, particularly when existing evidence gaps lead to unmet needs of clinicians and patients. Evidence-based research further ensures that these gaps inform future clinical studies. Here, we summarize several applications identified from our literature review that align with this topic.
7.1. Specialty-specific adoption
In addition to general applications, we observed that NLP for EBM has been applied within specific medical specialties. Here, we summarize common specialties featured in the papers, such as oncology for conditions like non-small cell lung cancer (NSCLC) and cardiology for cardiovascular conditions such as heart failure. Other diseases are detailed in Supplementary Table 1.
Oncology.
Cancer is a central topic in EBM, as it demands continuous integration of new research findings to guide evidence-based decisions for accurate diagnosis, effective treatment, and long-term patient management. Saiz et al. (2021) introduced Watson Oncology Literature Insights (WOLI), an AI system that automatically identifies, prioritizes, and extracts relevant oncology research, facilitating the translation of evidence into clinical practice. Similarly, the Clinical Trial Matching (CTM) system (Alexander et al., 2020) was evaluated at a cancer center in Australia and achieved an overall accuracy of 92% for screening lung cancer patients. These tools highlight how AI-driven systems are increasingly embedded in hospital workflows.
Cardiology.
Cardiology demands robust evidence to support decisions because of the high prevalence of cardiovascular disease and the critical consequences of diagnostic errors, which can result in severe harm or loss of life. For example, the hybrid model proposed by Tun et al. (2023) automates patient eligibility assessment directly within clinical workflows. In a real-world application on a dataset of 40,000 patients across several clinical care pathways, such as heart failure with reduced and preserved ejection fraction and atrial fibrillation, this model achieved an accuracy of 87.3%.
7.2. Clinical trial design and identification
Not all medical specialties are fully addressed by current research, and even in those with significant focus, the integration of findings into real-world guidance remains insufficient. Automating clinical trial procedures is critical for a rapid response to pandemics or public health emergencies. A crucial step in advancing future clinical trials or experiments is the design phase, where NLP plays a pivotal role. Effective clinical trial design involves structuring and optimizing trials to ensure they align with patient needs and research objectives. NLP tools can enhance the efficiency of clinical trial design by facilitating the automated matching of patients to suitable trials and ensuring trials are aligned with the right patient cohorts. This capability supports a more effective and timely deployment of research resources in emergency health situations.
Eligibility matching and cohort identification refer to matching patients to clinical trials based on their eligibility and identifying groups of patients (cohorts) who meet specific inclusion criteria. Several applications have been reported. Vydiswaran et al. (2019) proposed a hybrid approach to identify patient cohorts for clinical trials, which combines pattern-based, knowledge-intensive, and feature-weighting techniques to determine whether patients meet specific selection criteria. Segura-Bedmar and Raez (2019) explored the use of deep learning models for cohort selection, framing it as a multi-label classification task. By employing CNNs and RNNs to process free-text eligibility criteria, this method learns representations directly from text. Building on these foundations, Liu et al. (2022) developed Criteria2Query (C2Q) to extract and transform free-text eligibility criteria into structured, queryable data for cohort identification. More recently, Murcia et al. (2024) proposed the ‘TrialMatcher’ algorithm to match veterans to clinical trials using existing information within EHRs. It extracted attributes from patient profiles and eligibility criteria from trial profiles and compared them using the Sørensen-Dice Index (SDI). These applications show the potential for streamlining recruitment and improving future clinical trial design. Researchers have also begun to incorporate LLMs into this work. LLMs such as GPT-3.5 and GPT-4 enhance clinical trial workflows by processing complex natural language data, such as patient profiles and trial eligibility criteria. Examples include AutoTrial (Wang et al., 2023b), which focuses on trial design, specifically generating eligibility criteria using multi-step reasoning and hybrid prompting, and TrialGPT (Jin et al., 2024), which implements a comprehensive framework for large-scale patient-trial matching, emphasizing real-world deployment and time savings.
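The Sørensen-Dice Index mentioned above measures the overlap between two sets as twice the size of their intersection divided by the sum of their sizes. The sketch below shows the calculation on two attribute sets; how TrialMatcher actually derives and represents these attributes is not described here, so the set contents and the function name `sorensen_dice` are purely hypothetical.

```python
def sorensen_dice(a, b):
    """Sørensen-Dice index between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen in this sketch for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical attribute sets; the names are illustrative only.
patient = {"age>=18", "type_2_diabetes", "on_metformin", "no_renal_failure"}
trial = {"age>=18", "type_2_diabetes", "no_renal_failure", "hba1c>7"}
print(sorensen_dice(patient, trial))
```

The index ranges from 0 (no shared attributes) to 1 (identical sets), so candidate trials can be ranked for a patient by sorting on this score.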
7.3. Drug repurposing
Another frontier application in this field is drug repurposing, which utilizes NLP to analyze existing medical literature and uncover new therapeutic applications for established drugs. By automating the analysis of large datasets such as clinical trials and research papers, NLP speeds up the identification of potential treatments, offering a faster and more cost-effective alternative to traditional drug discovery methods. During the COVID-19 pandemic, there was an urgent need for drugs for treatment. To meet this need quickly, the CovidX Network Algorithm (Gates and Hamed, 2020) was developed, which used NLP to analyze the vast COVID-19 biomedical literature. It ranked potential drug candidates for repurposing, highlighting NLP’s power in automating and accelerating evidence synthesis during critical times. Alzheimer’s disease (AD), a progressive neurodegenerative disorder, remains a major global health challenge with limited treatment options and no definitive cure. Despite significant investment in drug development, the failure rate for Alzheimer’s-specific drugs in clinical trials remains exceedingly high. To address this, Daluwatumulle et al. (2022) employed knowledge graph embeddings to predict AD drug candidates by linking textual data and generating hypotheses from unstructured information.
7.4. Question Answering
While EBM is taught according to the five steps (ask, acquire, appraise, apply, and assess), a recent trend enabled by advances in LLMs treats the entire process as a question-answering (QA) task. Xie et al. (2023) posed rhinoplasty consultation questions to ChatGPT, which drew on its pre-learned knowledge and summarized texts to respond, testing the potential of LLMs to offer valuable feedback. Moreover, Mohammed and Fiaidhi (2024) added bootstrapping to BioBERT and BioGPT so that they could better understand PICO questions from physicians and find potential answers in publications. Expanding on this trend, Chuan and Morgan (2021) introduced the chatbot SOPHIA, which helps users understand their eligibility for clinical trials by answering questions based on trial criteria. Addressing rare cancers, Jang et al. (2022) fine-tuned SAPBERT for QA and NER tasks, ultimately summarizing potential drugs ranked by relevance, such as bevacizumab, temozolomide, lomustine, and nivolumab.
8. EBM Benchmark dataset
Here, we summarize the benchmarks used in NLP and EBM (Supplementary Table 2). The tasks frequently involved with these benchmarks are Evidence Retrieval, Evidence Extraction, and Clinical Trial Identification. There is a notable gap in datasets specifically tailored for Evidence Synthesis and Appraisal, as well as Question Answering. The existing datasets are often built upon general texts rather than medical-specific content. For example, CNN-DailyMail (See et al., 2017) is used for Evidence Summarization, but it is not medical-related. We also noticed that the primary data sources for these benchmarks are scholarly articles from PubMed and clinical trials.
9. Challenges and Future Directions
EBM is an important, rewarding, and dynamic field that organizes current data to improve healthcare decision-making. By integrating the best available evidence with a healthcare professional’s experience and the patient’s values, EBM aims to optimize health outcomes. Our focus here is on retrieving, extracting, appraising, synthesizing, and summarizing evidence from biomedical literature such as clinical trials, cohort studies, and case reports. However, conducting these analyses can be both demanding and time-consuming. In this study, we explore key NLP techniques that can streamline and facilitate this process.
Our review indicates that NLP-based systems or pipelines have achieved impressive results in EBM, such as extracting entities like PICO, enhancing information retrieval engines, automating evidence synthesis, assessing evidence quality, ranking the evidence with the highest confidence, summarizing information, and answering questions. At the same time, as in any other evolving area, challenges remain. For example, generative models in EBM tasks have demonstrated impressive fluency and scalability, yet their tendency to hallucinate facts, lack of source attribution, and sensitivity to prompt phrasing remain significant limitations for clinical use. A core challenge is the validation and trustworthiness of generated outputs, especially in high-stakes domains like medicine. Mechanisms such as RAG offer potential mitigations but require further development and evaluation.
Another challenge lies in handling diseases with limited literature or annotated data (Ge et al., 2023). Here, few-shot learning holds significant potential, as it enables models to generalize effectively from a small number of examples, reducing the dependency on large, annotated datasets. This data-efficient approach is crucial for EBM tasks in under-researched areas, such as rare diseases, where annotated resources are scarce. Few-shot learning can help these models adapt quickly to specific clinical needs, allowing for more accurate information extraction, question answering, and evidence synthesis, even with minimal training data.
Additionally, there is a pressing need for more benchmark datasets, especially for Evidence Synthesis and Appraisal and for Question Answering. Current resources often rely on general corpora rather than corpora specifically oriented toward medical content, which limits the development of specialized NLP applications; building datasets grounded in medical content is therefore a priority for researchers. Moreover, NLP-based tools have not yet been widely applied in some medical specialties, such as urology and hepatology, indicating room for expansion in these areas.
Another future direction for NLP in EBM involves incorporating real-world data from various sources, such as mobile devices, social media, and genomics. These data sources capture rich and diverse information beyond traditional clinical records, offering valuable insights into patient behaviors, lifestyle, environmental factors, and genetic predispositions. For example, data from mobile health apps and wearable devices can provide real-time health metrics, while social media posts may reveal patient self-reported outcomes or experiences that are often missed in clinical settings. Integrating genomic data adds another layer, allowing family history and individual genomic profiles to inform estimates of disease risk and treatment response.
Furthermore, the “black box” nature of many NLP models limits their interpretability and accountability. Biases within training data can restrict NLP’s effectiveness and fairness across diverse patient demographics. Additionally, the high computational demands of these models and the need for domain expertise in both NLP and healthcare make such systems resource-intensive to develop and deploy.
Real-world clinical workflows often involve interdisciplinary scenarios that span multiple conditions, comorbidities, and patient subpopulations. To fully realize the potential of NLP for EBM in these settings, NLP systems must evolve toward more holistic, adaptable frameworks capable of reasoning across diverse clinical questions and integrating heterogeneous data sources.
Addressing these limitations is important for improving efficiency and ultimately contributing to a safer, more equitable healthcare landscape.
10. Conclusion
Our comprehensive review of over 600 papers resulted in the selection of 129 studies that focus on critical aspects of NLP within EBM. We first provide an overview of EBM, followed by a survey of NLP methods and techniques that address each step of the EBM process. We also explore use cases that demonstrate the application of EBM in various scenarios. Additionally, we review popular datasets and benchmarks. Finally, we present open challenges and future directions for research in this field. As NLP technologies evolve, they offer promising prospects for harnessing vast amounts of unstructured data, thus supporting clinical and research applications.
Limitations
Our study primarily focuses on English-language publications, potentially overlooking important research published in other languages. The inclusion criteria may have excluded studies indirectly related to EBM and NLP that could provide valuable insights. Additionally, our analysis only covers articles published between 2019 and 2024, which may have led to the omission of significant earlier works that contributed to the foundation of this field. Furthermore, the databases and search engines used in this review are limited, and it is possible that some relevant studies on NLP for EBM during the specified period were not identified.
Acknowledgments
This project was sponsored by the National Library of Medicine grants R01LM014344 and R01LM014573.
References
- Al Hafiz Khan Md Abdullah, Shamsuzzaman Md, Hasan Sadid A., Sorower Mohammad S, Liu Joey, Datla Vivek, Milosevic Mladen, Mankovich Gabe, van Ommering Rob, and Dimitrova Nevenka. 2019. Improving disease named entity recognition for clinical trial matching. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2541–2548.
- Alexander Marliese, Solomon Benjamin, Ball David L, Sheerin Mimi, Dankwa-Mullan Irene, Preininger Anita M, Jackson Gretchen Purcell, and Herath Dishan M. 2020. Evaluation of an artificial intelligence clinical trial matching system in Australian lung cancer patients. JAMIA Open, 3(2):209–215.
- Alodadi Mohammad S. and Janeja Vandana P. 2019. Linking knowledge discovery in clinical notes and massive biomedical literature repositories. In 2019 IEEE International Conference on Big Data. IEEE.
- Arora Paul, Boyne Devon, Slater Justin J., Gupta Alind, Brenner Darren R., and Druzdzel Marek J. 2019. Bayesian networks for risk prediction using real-world data: A tool for precision medicine. Value in Health, 22(4):439–445.
- Aum Sungmin and Choe Seon. 2021. srBERT: automatic article classification model for systematic review using BERT. Syst. Rev, 10(1):285.
- Beattie Jacob, Neufeld Sarah, Yang Daniel, Chukwuma Christian, Gul Ahmed, Desai Neil, Jiang Steve, and Dohopolski Michael. 2024. Utilizing large language models for enhanced clinical trial matching: A study on automation in patient screening. Cureus, 16(5):e60044.
- Beck J Thaddeus, Rammage Melissa, Jackson Gretchen P, Preininger Anita M, Dankwa-Mullan Irene, Roebuck M Christopher, Torres Adam, Holtzen Helen, Coverdill Sadie E, Williamson M Paul, Chau Quincy, Rhee Kyu, and Vinegra Michael. 2020. Artificial intelligence tool for optimizing eligibility screening for clinical trials in a large community cancer center. JCO Clin. Cancer Inform, 4(4):50–59.
- Blunt Chris. 2022. The pyramid schema: The origins and impact of evidence pyramids. SSRN Electron. J.
- Borchert Florian, Lohr Christina, Modersohn Luise, Langer Thomas, Follmann Markus, Sachs Jan Philipp, Hahn Udo, and Schapranow Matthieu-P. 2020. GGPONC: A corpus of German medical text with rich metadata based on clinical practice guidelines. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, pages 38–48, Online. Association for Computational Linguistics.
- Borchert Florian, Meister Laura, Langer Thomas, Follmann Markus, Arnrich Bert, and Schapranow Matthieu-P. 2022. Controversial trials first: Identifying disagreement between clinical guidelines and new evidence. In AMIA Annual Symposium Proceedings, pages 237–246.
- Brassey Jon, Price Christopher, Edwards Jonny, Zlabinger Markus, Bampoulidis Alexandros, and Hanbury Allan. 2021. Developing a fully automated evidence synthesis tool for identifying, assessing and collating the evidence. BMJ Evid. Based Med, 26(1):24–27.
- Brockmeier Austin J., Ju Meizhi, Przybyła Piotr, and Ananiadou Sophia. 2019. Improving reference prioritisation with PICO recognition. BMC Medical Informatics and Decision Making, 19:256.
- Cai Tianrun, Cai Fiona, Dahal Kumar P., Cremone Gabrielle, Lam Ethan, Golnik Charlotte, Seyok Thany, Hong Chuan, and Liao Katherine P. 2021. Improving the efficiency of clinical trial recruitment using an ensemble machine learning to assist with eligibility screening. ACR Open Rheumatology, 3:593–600.
- Campillos-Llanos Leonardo, Valverde-Mateos Ana, Capllonch-Carrión Adrián, and Moreno-Sandoval Antonio. 2021. A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine. BMC Medical Informatics and Decision Making, 21(69). This article has been corrected. See BMC Med Inform Decis Mak. 2021 Apr 7;21:118.
- Chen Boyu, Jin Hao, Yang Zhiwen, Qu Yingying, Weng Heng, and Hao Tianyong. 2019a. An approach for transgender population information extraction and summarization from clinical trial text. BMC Med. Inform. Decis. Mak, 19(Suppl 2):62.
- Chen Chi-Jen, Warikoo Neha, Chang Yun Chun, and Chen. 2019b. Medical knowledge infused convolutional neural networks for cohort selection in clinical trials. Journal of the American Medical Informatics Association, 26(11):1227–1236.
- Chen Long, Gu Yu, Ji Xin, Lou Chao, Sun Zhiyong, Li Haodan, Gao Yuan, and Huang Yang. 2019c. Clinical trial cohort selection based on multi-level rule-based natural language processing system. Journal of the American Medical Informatics Association, 26(11):1218–1226.
- Chuan Ching-Hua and Morgan Susan. 2021. Creating and evaluating chatbots as eligibility assistants for clinical trials: An active deep learning approach towards user-centered classification. ACM Trans. Comput. Healthc, 2(1):1–19.
- Cunningham Jonathan W, Singh Pulkit, Reeder Christopher, Claggett Brian, Marti-Castellote Pablo M, Lau Emily S, Khurshid Shaan, Batra Puneet, Lubitz Steven A, Maddah Mahnaz, Philippakis Anthony, Desai Akshay S, Ellinor Patrick T, Vardeny Orly, Solomon Scott D, and Ho Jennifer E. 2024. Natural language processing for adjudication of heart failure in a multicenter clinical trial: A secondary analysis of a randomized clinical trial. JAMA Cardiol., 9(2):174–181.
- Daluwatumulle Geesa, Wijesinghe Rupika, and Weerasinghe Ruvan. 2022. In silico drug repurposing using knowledge graph embeddings for Alzheimer’s disease. In Proceedings of the 9th International Conference on Bioinformatics Research and Applications, pages 61–66, New York, NY, USA. ACM.
- Datta Surabhi, Lee Kyeryoung, Paek Hunki, Manion Frank J., Ofoegbu Nneka, Du Jingcheng, Li Ying, Huang Liang-Chin, Wang Jingqi, Lin Bin, Xu Hua, and Wang Xiaoyan. 2024. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal of the American Medical Informatics Association, 31(2):375–385. Published: 11 November 2023.
- Deng Yang, Li Yaliang, Shen Ying, Du Nan, Fan Wei, Yang Min, and Lei Kai. 2019. Medtruth: A semi-supervised approach to discovering knowledge condition information from multi-source medical data. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ‘19, page 719–728, New York, NY, USA. Association for Computing Machinery.
- Devi Arti, Uttrani Shashank, Singla Aryansh, Jha Sarthak, Dasgupta Nataraj, Natarajan Sayee, Punekar Rajeshwari S, Pickett Larry A, and Dutt Varun. 2024a. Quantitative analysis of GPT-4 model: Optimizing patient eligibility classification for clinical trials and reducing expert judgment dependency. In Proceedings of the 2024 8th International Conference on Medical and Health Informatics, pages 230–237, New York, NY, USA. ACM.
- Devi Arti, Uttrani Shashank, Singla Aryansh, Jha Sarthak, Dasgupta Nataraj, Natarajan Sayee, Punekar Rajeshwari S., Pickett Larry A., and Dutt Varun. 2024b. Automating clinical trial eligibility screening: Quantitative analysis of GPT models versus human expertise. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA. ACM.
- Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- DeYoung Jay, Beltagy Iz, van Zuylen Madeleine, Kuehl Bailey, and Wang Lucy. 2021. MS^2: Multi-document summarization of medical studies. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7494–7513, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Dhayne Houssein, Kilany Rima, Haque Rafiqul, and Taher Yehia. 2021. Emr2vec: Bridging the gap between patient data and clinical trial. Computers & Industrial Engineering, 156:107236.
- Dhrangadhariya Anjani, Hilfiker Roger, Schaer Roger, and Müller Henning. 2020. Machine learning assisted citation screening for systematic reviews. Digital Personalized Health and Medicine.
- Dhrangadhariya Anjani, Manzo Gaetano, and Müller Henning. 2024. PICO to PICOS: Weak supervision to extend datasets with new labels. Digital Health and Informatics Innovations for Sustainable Health Care Systems, 316.
- Dhrangadhariya Anjani and Müller Henning. 2023. Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation. JAMIA Open, 6(1).
- Do Nhan V, Elbers Danne C, Fillmore Nathanael R, Ajjarapu Samuel, Bergstrom Steven J, Bihn John, Corrigan June K, Dhond Rupali, Dipietro Svitlana, Dolgin Arkadiy, Feldman Theodore C, Goryachev Sergey D, Huhmann Linden B, La Jennifer, Marcantonio Paul A, McGrath Kyle M, Miller Stephen J, Nguyen Vinh Q, Schneeloch George R, Sung Feng-Chi, Swinnerton Kaitlin N, Tarren Amelia H, Tosi Hannah M, Valley Danielle, Vo Austin D, Yilsdirim Cenk, Zheng Chunlei, Zwolinski Robert, Sarosy Gisele A, Loose David, Shannon Colleen, and Brophy Mary T. 2024. Matching patients to accelerate clinical trials (MPACT): Enabling technology for oncology clinical trial workflow. Stud. Health Technol. Inform, 310:1086–1090.
- Dobbins Nicholas J., Han Bin, Zhou Weipeng, Lan Kristine F., Kim H. Nina, Harrington Robert, Uzuner Özlem, and Yetisgen Meliha. 2023. LeafAI: query generator for clinical cohort discovery rivaling a human programmer. Journal of the American Medical Informatics Association, 30(12):1954–1964.
- Dobbins Nicholas J, Mullen Tony, Uzuner Özlem, and Yetisgen Meliha. 2022. The leaf clinical trials corpus: a new resource for query generation from clinical trial eligibility criteria. Sci. Data, 9(1):490.
- Du Jingcheng, Wang Qing, Wang Jingqi, Ramesh Prerana, Xiang Yang, Jiang Xiaoqian, and Tao Cui. 2021. COVID-19 trial graph: a linked graph for COVID-19 clinical trials. J. Am. Med. Inform. Assoc, 28(9):1964–1969.
- Eldawlatly Abdelazeem, Alshehri Hussain, Alqahtani Abdullah, Ahmad Abdulaziz, Al-Dammas Fatma, and Marzouk Amir. 2018. Appearance of population, intervention, comparison, and outcome as research question in the title of articles of three different anesthesia journals: A pilot study. Saudi Journal of Anaesthesia, 12(2):283–286.
- Fang Yilu, Kim Jae Hyun, Idnay Betina Ross, Garcia Rebeca Aragon, Castillo Carmen E., Sun Yingcheng, Liu Hao, Liu Cong, Yuan Chi, and Weng Chunhua. 2021. Participatory design of a clinical trial eligibility criteria simplification method. Studies in Health Technology and Informatics, 281:984–988.
- Gates Lyndsey Elaine and Hamed Ahmed Abdeen. 2020. The anatomy of the SARS-CoV-2 biomedical literature: Introducing the CovidX network algorithm for drug repurposing recommendation. J. Med. Internet Res, 22(8):e21169.
- Ge Yao, Guo Yuting, Das Sudeshna, Al-Garadi Mohammed Ali, and Sarker Abeed. 2023. Few-shot learning for medical text: A review of advances, trends, and opportunities. J. Biomed. Inform, 144(104458):104458.
- Ghosh Madhusudan, Mukherjee Shrimon, Ganguly Asmit, Basuchowdhuri Partha, Naskar Sudip Kumar, and Ganguly Debasis. 2024a. AlpaPICO: Extraction of PICO frames from clinical trial documents using LLMs. Methods, 226:78–88.
- Ghosh Madhusudan, Mukherjee Shrimon, Santra Payel, Na Girish, and Basuchowdhuri Partha. 2024b. BLINK_LSTM: BioLinkBERT and LSTM based approach for extraction of PICO frame from clinical trial text. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), New York, NY, USA. ACM.
- Guan Meijian, Cho Samuel, Petro Robin, Zhang Wei, Pasche Boris, and Topaloglu Umit. 2019. Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes. JAMIA Open, 2(1):139–149.
- Gulden Christian, Kirchner Melanie, Schüttler Christina, Hinderer Marc, Kampf Marvin, Prokosch Hans-Ulrich, and Toddenroth Dennis. 2019. Extractive summarization of clinical trial descriptions. Int. J. Med. Inform, 129:114–121.
- Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, and Shim SR. 2024. The use of generative AI for scientific literature searches for systematic reviews: ChatGPT and Microsoft Bing AI performance evaluation. JMIR Medical Informatics, 12:e51187.
- Górska Anna and Tacconelli Evelina. 2024. Towards autonomous living meta-analyses: A framework for automation of systematic review and meta-analyses. Stud. Health Technol. Inform, 316:378–382.
- Hamed Ehab, Eid Ahmad, and Alberry Medhat. 2023. Exploring ChatGPT’s potential in facilitating adaptation of clinical guidelines: A case study of diabetic ketoacidosis guidelines. Cureus, 15(5):e38784.
- Hassanzadeh Hamed, Karimi Sarvnaz, and Nguyen Anthony. 2020. Matching patients to clinical trials using semantically enriched document representation. Journal of Biomedical Informatics, 105:103406.
- Horst Hendrik Ter, Brazda Nicole, Schira-Heinen Jessica, Krebbers Julia, Müller Hans-Werner, and Cimiano Philipp. 2023. Automatic knowledge graph population with model-complete text comprehension for pre-clinical outcomes in the field of spinal cord injury. Artificial Intelligence in Medicine, 137:102491.
- Hu Yan, Keloth Vipina K, Raja Kalpana, Chen Yong, and Xu Hua. 2023. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics, 39(9):btad542.
- Huang Andy S., Hirabayashi Kyle, Barna Laura, Parikh Deep, and Pasquale Louis R. 2024. Assessment of a large language model’s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmology, 142(4):371–375.
- Jang Bum-Sup, Park Andrew J, and Kim In Ah. 2022. Exploration of biomedical knowledge for recurrent glioblastoma using natural language processing deep learning models. BMC Med. Inform. Decis. Mak, 22(1):267.
- Jiang Pengcheng, Xiao Cao, Wang Zifeng, Bhatia Parminder, Sun Jimeng, and Han Jiawei. 2024. TriSum: Learning summarization ability from large language models with structured rationale. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2805–2819, Mexico City, Mexico. Association for Computational Linguistics.
- Jin Di and Szolovits Peter. 2018. PICO element detection in medical text via long short-term memory neural networks. In Proceedings of the BioNLP 2018 workshop, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Jin Qiao, Tan Chuanqi, Chen Mosha, Yan Ming, Zhang Ningyu, Huang Songfang, and Liu Xiaozhong. 2022. State-of-the-art evidence retriever for precision medicine: Algorithm development and validation. JMIR Medical Informatics, 10(12):e40743.
- Jin Qiao, Wang Zifeng, Floudas Charalampos S, Chen Fangyuan, Gong Changlin, Bracken-Clarke Dara, Xue Elisabetta, Yang Yifan, Sun Jimeng, and Lu Zhiyong. 2024. Matching patients to clinical trials with large language models. Nat. Commun, 15(1):9074.
- Johnston Tom H, Lacoste Alix M B, Ravenscroft Paula, Su Jin, Tamadon Sahar, Seifi Mahtab, Lang Anthony E, Fox Susan H, Brotchie Jonathan M, and Visanji Naomi P. 2024. Using artificial intelligence to identify drugs for repurposing to treat l-DOPA-induced dyskinesia. Neuropharmacology, 248(109880):109880.
- Kalyan Katikapalli Subramanyam, Rajasekharan Ajit, and Sangeetha Sivanesan. 2022. AMMU: A survey of transformer-based biomedical pretrained language models. Journal of Biomedical Informatics, 126:103982.
- Kamath Sowmya, Mayya Veena, and Priyadarshini. 2021. A probabilistic precision information retrieval model for personalized clinical trial recommendation based on heterogeneous data. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pages 1–5. IEEE.
- Kanbar Lara J, Wissel Benjamin, Ni Yizhao, Pajor Nathan, Glauser Tracy, Pestian John, and Dexheimer Judith W. 2022. Implementation of machine learning pipelines for clinical practice: Development and validation study. JMIR Medical Informatics, 10(12):e37833.
- Kang Tian, Perotte Adler, Tang Youlan, Ta Casey, and Weng Chunhua. 2021. UMLS-based data augmentation for natural language processing of clinical research literature. Journal of the American Medical Informatics Association, 28(4):812–823.
- Kang Tian, Sun Yingcheng, Kim Jae Hyun, Ta Casey, Perotte Adler, Schiffer Kayla, Wu Mutong, Zhao Yang, Moustafa-Fahmy Nour, Peng Yifan, and Weng Chunhua. 2023. EvidenceMap: a three-level knowledge representation for medical evidence computation and comprehension. J. Am. Med. Inform. Assoc, 30(6):1022–1031.
- Kang Tian, Zou Shirui, and Weng Chunhua. 2019. Pre-training to recognize PICO elements from randomized controlled trial literature. Stud. Health Technol. Inform, 264:188–192.
- Kaskovich Samuel, Wyatt Kirk D., Oliwa Tomasz, Graglia Luca, Furner Brian, Lee Jooho, Mayampurath Anoop, and Volchenboum Samuel L. 2023. Automated matching of patients to clinical trials: A patient-centric natural language processing approach for pediatric leukemia. JCO Clinical Cancer Informatics, 7.
- Kefeli Jenna and Tatonetti Nicholas. 2024. TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models. Patterns, 5(3):100933. Published online February 21, 2024.
- Khan AH, Abbe A, Falissard B, Carita P, Bachert C, Mullol J, Reaney M, Chao J, Mannent LP, Amin N, Mahajan P, Pirozzi G, and Eckert L. 2021. Data mining of free-text responses: An innovative approach to analyzing patient perspectives on chronic rhinosinusitis with nasal polyps in a phase IIa proof-of-concept study for dupilumab. Dove Medical Press, 2021(15):2577–2586.
- Kim Jeongeun, Izower Mitchell, and Quintana Yuri. 2023a. Parsable clinical trial eligibility criteria representation using natural language processing. In AMIA Annual Symposium Proceedings, pages 616–624. American Medical Informatics Association.
- Kim Jeongeun, Izower Mitchell, and Quintana Yuri. 2023b. Parsable clinical trial eligibility criteria representation using natural language processing. In AMIA Annual Symposium Proceedings, pages 616–624. American Medical Informatics Association.
- Kim Su Nam, Martinez David, Cavedon Lawrence, and Yencken Lars. 2024. NICTA-PIBOSO dataset. 10.57702/ne4r48m1. Dataset consists of 1,000 medical abstracts manually annotated with semantic tags based on the PICO criteria to support the automatic classification of sentences.
- Koopman Bevan, Wright Tracey, Omer Natacha, McCabe Veronica, and Zuccon Guido. 2021. Precision medicine search for paediatric oncology. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘21, page 2536–2540, New York, NY, USA. Association for Computing Machinery.
- Koopman Bevan and Zuccon Guido. 2022. Cohort-based clinical trial retrieval. In Proceedings of the 25th Australasian Document Computing Symposium, ADCS ‘21, New York, NY, USA. Association for Computing Machinery.
- Kury Fabrício, Butler Alex, Yuan Chi, Fu Li-Heng, Sun Yingcheng, Liu Hao, Sim Ida, Carini Simona, and Weng Chunhua. 2020. Chia, a large annotated corpus of clinical trial eligibility criteria. Sci. Data, 7(1):281.
- Kwaan Mary R and Melton Genevieve B. 2012. Evidence-based medicine in surgical education. Clin. Colon Rectal Surg, 25(3):151–155.
- Lalitha Evani, Ramani Kasarapu, Shahida Dudekula, Deepak Esikela Venkata Sai, Bindu M Hima, and Shaikshavali Diguri. 2023. Text summarization of medical documents using abstractive techniques. In 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pages 939–943. IEEE.
- Lan Mengfei, Cheng Mandy, Hoang Linh, Riet Gerben Ter, and Kilicoglu Halil. 2024. Automatic categorization of self-acknowledged limitations in randomized controlled trial publications. J. Biomed. Inform, 152:104628.
- Lee Jinhyuk, Yoon Wonjin, Kim Sungdong, Kim Donghyeon, Kim Sunkyu, So Chan Ho, and Kang Jaewoo. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.
- Lee Kyeryoung, Liu Zongzhi, Mai Yun, Jun Tomi, Ma Meng, Wang Tongyu, Ai Lei, Calay Ediz, Oh William, Stolovitzky Gustavo, Schadt Eric, and Wang Xiaoyan. 2024. Optimizing clinical trial eligibility design using natural language processing models and real-world data: Algorithm development and validation. JMIR AI, 3:e50800.
- Li Chao, Gurulingappa Harsha, Karmalkar Prathamesh, Raab Jana, Vij Aastha, Megaro Gerard, and Henke Christian. 2021a. Automate clinical evidence synthesis by linking trials to publications with text analytics. In 2021 International Symposium on Electrical, Electronics and Information Engineering, New York, NY, USA. ACM.
- Li Jianfu, Wei Qiang, Ghiasvand Omid, Chen Miao, Lobanov Victor, Weng Chunhua, and Xu Hua. 2022. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Med. Inform. Decis. Mak, 22(Suppl 3):235.
- Li Xinhang, Liu Hao, Kury Fabrício, Yuan Chi, Butler Alex, Sun Yingcheng, Ostropolets Anna, Xu Hua, and Weng Chunhua. 2021b. A comparison between human and NLP-based annotation of clinical trial eligibility criteria text using the OMOP common data model. In AMIA Joint Summits on Translational Science Proceedings, pages 394–403. AMIA.
- Li Yizhen, Luan Zhongzhi, Liu Yixing, Liu Heyuan, Qi Jiaxing, and Han Dongran. 2024. Automated information extraction model enhancing traditional Chinese medicine RCT evidence extraction (Evi-BERT): algorithm development and validation. Frontiers in Artificial Intelligence, 7:1454945.
- Liu Cong, Liu Hao, Ta Casey, Roger James, Butler Alex, Lee Junghwan, Kim Jaehyun, Shang Ning, and Weng Chunhua. 2022. Evaluation of Criteria2Query: Towards augmented intelligence for cohort identification. Stud. Health Technol. Inform, 290:297–300.
- Liu Cong, Yuan Chi, Butler Alex M, Carvajal Richard D, Li Ziran Ryan, Ta Casey N, and Weng Chunhua. 2019. Dquest: dynamic questionnaire for search of clinical trials. Journal of the American Medical Informatics Association, 26(11):1333–1343.
- Liu Hao, Chi Yuan, Butler Alex, Sun Yingcheng, and Weng Chunhua. 2021. A knowledge base of clinical trial eligibility criteria. Journal of Biomedical Informatics, 117:103771.
- Lokker Cynthia, Bagheri Elham, Abdelkader Wael, Parrish Rick, Afzal Muhammad, Navarro Tamara, Cotoi Chris, Germini Federico, Linkins Lori, Haynes R Brian, Chu Lingyang, and Iorio Alfonso. 2023. Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation. J. Biomed. Inform, 142(104384):104384.
- Malik Khalid Mahmood, Krishnamurthy Madan, Marcinek Pawel, and Malik Ghaus M. 2020. Impact of size, location, symptomatic-nature and gender on the rupture of saccular intracranial aneurysms. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ‘18, page 995–1001. IEEE Press.
- Marshall Iain J, Nye Benjamin, Kuiper Joël, Noel-Storr Anna, Marshall Rachel, Maclean Rory, Soboczenski Frank, Nenkova Ani, Thomas James, and Wallace Byron C. 2020. Trialstreamer: A living, automatically updated database of clinical trial reports. J. Am. Med. Inform. Assoc, 27(12):1903–1912.
- Marshall Iain J, Trikalinos Thomas A, Soboczenski Frank, Yun Hye Sun, Kell Gregory, Marshall Rachel, and Wallace Byron C. 2023. In a pilot study, automated real-time systematic review updates were feasible, accurate, and work-saving. J. Clin. Epidemiol, 153:26–33.
- Mayer Tobias, Marro Santiago, Cabrio Elena, and Villata Serena. 2021. Enhancing evidence-based medicine with natural language argumentative analysis of clinical trials. Artificial Intelligence in Medicine, 118:102098.
- Mehta Chirag, Cohen David, Jaisinghani Priya, and Parikh Payal. 2022. Internal medicine resident adherence to evidence-based practices in management of diabetes mellitus. J. Med. Educ. Curric. Dev, 9:23821205221076659.
- Meystre Stéphane M., Heider Paul M., Cates Andrew, Bastian Grace, Pittman Tara, Gentilin Stephanie, and Kelechi Teresa J. 2023. Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models. BMC Medical Research Methodology, 23(88).
- Mishra Rashmi, Burke Andrea, Gitman Bonnie, Verma Payal, Engelstad Mark, Alevizos Ilias, Gahl William A., Collins Michael T., Lee Janice S., and Sincan Murat. 2019. Data-driven method to enhance craniofacial and oral phenotype vocabularies. The Journal of the American Dental Association, 150(11):933–939.e2.
- Mohammed Sabah and Fiaidhi Jinan. 2023. Investigation into scaling-up the soap problem-oriented medical record into a clinical case study. In 2023 IEEE 11th International Conference. IEEE. [Google Scholar]
- Mohammed Sabah and Fiaidhi Jinan. 2024. Generative AI for evidence-based medicine: A PICO GenAI for synthesizing clinical case reports. In ICC 2024 - IEEE International Conference on Communications, volume 3, pages 1503–1508. IEEE. [Google Scholar]
- Mohammed Sabah, Fiaidhi Jinan, and Kudadiya Rahul. 2023. Integrating a PICO clinical questioning to the QL4POMR framework for building evidence-based clinical case reports. In 2023 IEEE International Conference on Big Data (BigData), volume 4, pages 4940–4947. IEEE. [Google Scholar]
- Murcia Victor M, Aggarwal Vinod, Pesaladinne Nikhil, Thammineni Ram, Do Nhan, Alterovitz Gil, and Fricks Rafael B. 2024. Automating clinical trial matches via natural language processing of synthetic electronic health records and clinical trial eligibility criteria. AMIA Summits Transl. Sci. Proc, 2024:125–134. [PMC free article] [PubMed] [Google Scholar]
- Mutinda Faith, Liew Kongmeng, Yada Shuntaro, Wakamiya Shoko, and Aramaki Eiji. 2022a. PICO corpus: A publicly available corpus to support automatic data extraction from biomedical literature. In Proceedings of the first Workshop on Information Extraction from Scientific Publications, pages 26–31, Online. Association for Computational Linguistics. [Google Scholar]
- Mutinda Faith Wavinya, Liew Kongmeng, Yada Shuntaro, Wakamiya Shoko, and Aramaki Eiji. 2022b. Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer. BMC Med. Inform. Decis. Mak, 22(1):158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Myszewski Joshua J., Klossowski Emily, Meyer Patrick, Bevil Kristin, Klesius Lisa, and Schroeder Kristopher M. 2022. Validating GAN-BioBERT: A methodology for assessing reporting trends in clinical trials. Frontiers in Digital Health, 4. [Google Scholar]
- Navarro-Ruan Tamara and Haynes R. Brian. 2022. Preliminary comparison of the performance of the National Library of Medicine’s systematic review publication type and the sensitive clinical queries filter for systematic reviews in PubMed. Journal of the Medical Library Association, 110(1). [Google Scholar]
- Névéol Aurélie, Doğan Rezarta Islamaj, and Lu Zhiyong. 2011. Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. Journal of biomedical informatics, 44(2):310–318. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Newbury Abigail, Liu Hao, Idnay Betina, and Weng Chunhua. 2023. The suitability of UMLS and SNOMED-CT for encoding outcome concepts. J. Am. Med. Inform. Assoc, 30(12):1895–1903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen Vincent, Karimi Sarvnaz, and Jin Brian. 2019. An experimentation platform for precision medicine. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, pages 1357–1360, New York, NY, USA. Association for Computing Machinery. [Google Scholar]
- Ni Yizhao, Bermudez Monica, Kennebeck Stephanie, Liddy-Hicks Stacey, and Dexheimer Judith. 2019. A real-time automated patient screening system for clinical trials eligibility in an emergency department: Design and evaluation. JMIR Medical Informatics, 7(3):e14185. [Google Scholar]
- Nievas Mauro, Basu Aditya, Wang Yanshan, and Singh Hrituraj. 2024. Distilling large language models for matching patients to clinical trials. Journal of the American Medical Informatics Association, 31(9):1953–1963. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Norman Christopher, Leeflang Mariska, Spijker René, Kanoulas Evangelos, and Névéol Aurélie. 2019a. A distantly supervised dataset for automated data extraction from diagnostic studies. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 105–114, Florence, Italy. Association for Computational Linguistics. [Google Scholar]
- Norman Christopher R, Leeflang Mariska M G, Porcher Raphaël, and Névéol Aurélie. 2019b. Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy. Syst. Rev, 8(1):243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nurmambetova Elvira, Pan Jie, Zhang Zilong, Lee Seungwon, Southern Danielle A, Martin Elliot A, Wu Guosong, Ho Chester, and Eastwood Cathy A. 2023. Developing an inpatient electronic medical record phenotype for hospital-acquired pressure injuries: Case study using natural language processing models. JMIR AI, 2(2023):e41264. [Google Scholar]
- Nye Benjamin, Li Junyi Jessy, Patel Roma, Yang Yinfei, Marshall Iain, Nenkova Ani, and Wallace Byron. 2018. A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 197–207, Melbourne, Australia. Association for Computational Linguistics. [Google Scholar]
- Pan Zhenhe, Jiang Shuang, Su Juntao, Guo Muzhe, and Zhang Yuanlin. 2021. Knowledge graph based platform of COVID-19 drugs and symptoms. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, New York, NY, USA. ACM. [Google Scholar]
- Peng Yifan, Rousseau Justin F, Shortliffe Edward H, and Weng Chunhua. 2023. AI-generated text may have a role in evidence-based medicine. Nat. Med, 29(7):1593–1594. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ramprasad Sanjana, Marshall Iain J., McInerney Denis Jered, and Wallace Byron C. 2023a. Automatically summarizing evidence from clinical trials: A prototype highlighting current challenges. In Proceedings of the Conference of the Association for Computational Linguistics Meeting, pages 236–247. [Google Scholar]
- Ramprasad Sanjana, McInerney Jered, Marshall Iain, and Wallace Byron. 2023b. Automatically summarizing evidence from clinical trials: A prototype highlighting current challenges. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 236–247, Stroudsburg, PA, USA. Association for Computational Linguistics. [Google Scholar]
- Ratnani Iqbal, Fatima Sahar, Abid Muhammad Mohsin, Surani Zehra, and Surani Salim. 2023. Evidence-based medicine: History, review, criticisms, and pitfalls. Cureus, 15(2):e35266. [Google Scholar]
- Rohanian Omid, Nouriborji Mohammadmahdi, Kouchaki Samaneh, Nooralahzadeh Farhad, Clifton Lei, and Clifton David A. 2024. Exploring the effectiveness of instruction tuning in biomedical language processing. Artificial Intelligence in Medicine, 158:103007. [Google Scholar]
- Rony Mohammad Abu Tareq, Islam Mohammad Shariful, Sultan Tipu, Alshathri Samah, and El-Shafai Walid. 2023. MediGPT: Exploring potentials of conventional and large language models on medical data. IEEE Access, 12. [Google Scholar]
- Rybinski Maciej, Karimi Sarvnaz, Nguyen Vincent, and Paris Cecile. 2020a. A2A: a platform for research in biomedical literature search. BMC Bioinformatics, 21(Suppl 19):572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rybinski Maciej, Xu Jerry, and Karimi Sarvnaz. 2020b. Clinical trial search: Using biomedical language understanding models for re-ranking. J. Biomed. Inform, 109(103530):103530. [Google Scholar]
- Sackett David L., Rosenberg William M. C., Gray J. A. Muir, Haynes R. Brian, and Richardson W. Scott. 1996. Evidence based medicine: what it is and what it isn’t. BMJ, 312(7023):71–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sadek Jawad, Inskip Alex, Woltmann James, Wilkins Georgina, Marshall Christopher, Pokora Maria, Vedpathak Amey, Jadrevska Anastasija, Craig Dawn, and Trenell Michael. 2023. ScanMedicine: An online search system for medical innovation. Contemporary Clinical Trials, 125:107042. [Google Scholar]
- Saiz Fernando Suarez, Sanders Corey, Stevens Rick, Nielsen Robert, Britt Michael, Yuravlivker Leemor, Preininger Anita M, and Jackson Gretchen P. 2021. Artificial intelligence clinical evidence engine for automatic identification, prioritization, and extraction of relevant clinical oncology research. JCO Clin. Cancer Inform., 5(5):102–111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Samuel Hamman, Zaiane Osmar, and Bolduc Francois. 2021. Evaluation of applied machine learning for health misinformation detection via survey of medical professionals on controversial topics in pediatrics. In Proceedings of the 5th International Conference on Medical and Health Informatics, pages 1–6. ACM. [Google Scholar]
- Sanchez-Graillet Olivia, Witte Christian, Grimm Frank, and Cimiano Philipp. 2022. An annotated corpus of clinical trial publications supporting schema-based relational information extraction. J. Biomed. Semantics, 13(1):14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sarker Abeed, Yang Yuan-Chi, Al-Garadi Mohammed Ali, and Abbas Aamir. 2020. A light-weight text summarization system for fast access to medical evidence. Front. Digit. Health, 2:585559. [Google Scholar]
- See Abigail, Liu Peter J., and Manning Christopher D. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368. [Google Scholar]
- Segura-Bedmar Isabel and Raez Pablo. 2019. Cohort selection for clinical trials using deep learning models. J. Am. Med. Inform. Assoc, 26(11):1181–1188. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shiraishi Makoto, Tomioka Yoko, Miyakuni Ami, Ishii Saaya, Hori Asei, Park Hwayoung, Ohba Jun, and Okazaki Mutsumi. 2024. Performance of ChatGPT in answering clinical questions on the practical guideline of blepharoptosis. Aesthetic Plast. Surg, 48(13):2389–2398. [DOI] [PubMed] [Google Scholar]
- Spasic Irena, Krzeminski David, Corcoran Paul, and Balinsky Alexander. 2019. Cohort selection for clinical trials from longitudinal patient records: Text mining approach. JMIR Medical Informatics, 7(4):e15980. [Google Scholar]
- Stylianou Nikolaos, Razis Gerasimos, Goulis Dimitrios G., and Vlahavas Ioannis. 2020. EBM+: Advancing evidence-based medicine via two level automatic identification of populations, interventions, outcomes in medical literature. Artificial Intelligence in Medicine, 108:101949. [Google Scholar]
- Stylianou Nikolaos and Vlahavas Ioannis. 2021. TransforMED: End-to-end transformers for evidence-based medicine and argument mining in medical literature. J. Biomed. Inform, 117(103767):103767. [Google Scholar]
- Testa Davide, Chersoni Emmanuele, and Lenci Alessandro. 2023. We understand elliptical sentences, and language models should too: A new dataset for studying ellipsis and its interaction with thematic fit. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3340–3353, Toronto, Canada. Association for Computational Linguistics. [Google Scholar]
- Thirunavukarasu Arun James, Ting Darren Shu Jeng, Elangovan Kabilan, et al. 2023. Large language models in medicine. Nature Medicine, 29:1930–1940. [Google Scholar]
- Tian Shubo, Erdengasileng Arslan, Yang Xi, Guo Yi, Wu Yonghui, Zhang Jinfeng, Bian Jiang, and He Zhe. 2021. Transformer-based named entity recognition for parsing clinical trial eligibility criteria. ACM BCB, 2021. [Google Scholar]
- Tian Shubo, Yin Pengfei, Zhang Hansi, Erdengasileng Arslan, Bian Jiang, and He Zhe. 2023. Parsing clinical trial eligibility criteria for cohort query by a multi-input multi-output sequence labeling model. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 4426–4430. [Google Scholar]
- Tissot Hegler C., Shah Anoop D., Brealey David, Harris Steve, Agbakoba Ruth, and Folarin Amos. 2020. Natural language processing for mimicking clinical trial recruitment in critical care: A semi-automated simulation based on the LeoPARDS trial. IEEE Journal of Biomedical and Health Informatics, 24(10):2950–2959. [DOI] [PubMed] [Google Scholar]
- Tsubota Tadashi, Bollegala Danushka, Zhao Yang, Jin Yingzi, and Kozu Tomotake. 2022. Improvement of intervention information detection for automated clinical literature screening during systematic review. J. Biomed. Inform, 134(104185):104185. [Google Scholar]
- Tun Pyae Phyo, Luo Jiawen, Xie Jiecheng, Wibowo Sandi, and Hao Chen. 2023. Automatic assessment of patient eligibility by utilizing NLP and rule-based analysis. In 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), page 10340494, Sydney, Australia. IEEE. [Google Scholar]
- Turfaha Ali, Liu Hao, Stewart Latoya A, Kang Tian, and Weng Chunhua. 2022. Extending PICO with observation normalization for evidence computing. In MEDINFO 2021: One World, One Health – Global Partnership for Digital Innovation, pages 268–272, New York, New York, USA. International Medical Informatics Association and IOS Press. [Google Scholar]
- Unlu Ozan, Shin Jiyeon, Mailly Charlotte J, Oates Michael F, Tucci Michela R, Varugheese Matthew, Wagholikar Kavishwar, Wang Fei, Scirica Benjamin M, Blood Alexander J, and Aronson Samuel J. 2024. Retrieval augmented generation enabled generative pre-trained transformer 4 (GPT-4) performance for clinical trial screening. medRxiv. [Google Scholar]
- Van de Vliet Peter, Sprenger Tobias, Kampers Linde F C, Makalowski Jennifer, Schirrmacher Volker, Stücker Wilfried, and Van Gool Stefaan W. 2023. The application of evidence-based medicine in individualized medicine. Biomedicines, 11(7). [Google Scholar]
- Vora Bianca, Kuruvilla Denison, Kim Chloe, Wu Michael, Shemesh Colby S., and Roth Gillie A. 2023. Applying natural language processing to ClinicalTrials.gov: mRNA cancer vaccine case study. Clinical and Translational Science, 16:2417–2420. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vydiswaran V G Vinod, Strayhorn Asher, Zhao Xinyan, Robinson Phil, Agarwal Mahesh, Bagazinski Erin, Essiet Madia, Iott Bradley E, Joo Hyeon, Ko Pingjui, Lee Dahee, Lu Jin Xiu, Liu Jinghui, Murali Adharsh, Sasagawa Koki, Wang Tianshi, and Yuan Nalingna. 2019. Hybrid bag of approaches to characterize selection criteria for cohort identification. J. Am. Med. Inform. Assoc, 26(11):1172–1180. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Kunyuan, Cui Hao, Zhu Yun, Hu Xiaoyun, Hong Chang, Guo Yabing, An Lingyao, Zhang Qi, and Liu Li. 2024. Evaluation of an artificial intelligence-based clinical trial matching system in Chinese patients with hepatocellular carcinoma: a retrospective study. BMC Cancer, 24(1):246. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Yu, Wang Yuan, Peng Zhenwan, Zhang Feifan, Zhou Luyao, and Yang Fei. 2023a. Medical text classification based on the discriminative pre-training model and prompt-tuning. Digit. Health, 9:20552076231193213. [Google Scholar]
- Wang Zifeng and Sun Jimeng. 2022. Trial2Vec: Zero-shot clinical trial document similarity search using self-supervision. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6377–6390, Stroudsburg, PA, USA. Association for Computational Linguistics. [Google Scholar]
- Wang Zifeng, Xiao Cao, and Sun Jimeng. 2023b. AutoTrial: Prompting language models for clinical trial design. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12461–12472, Singapore. Association for Computational Linguistics. [Google Scholar]
- Witte Christian, Schmidt David M, and Cimiano Philipp. 2024. Comparing generative and extractive approaches to information extraction from abstracts describing randomized clinical trials. J. Biomed. Semantics, 15(1):3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xie Qianqian, Bishop Jennifer Amy, Tiwari Prayag, and Ananiadou Sophia. 2022. Pre-trained language models with domain knowledge for biomedical extractive summarization. Knowl. Based Syst, 252(109460):109460. [Google Scholar]
- Xie Shiyao, Zhao Wenjing, Deng Guanghui, He Guohua, He Na, Lu Zhenhua, Hu Weihua, Zhao Mingming, and Du Jian. 2024. Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions. Journal of the American Medical Informatics Association, 31(7):1551–1560. Published online: 17 May 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xie Yi, Seth Ishith, Hunter-Smith David J, Rozen Warren M, Ross Richard, and Lee Matthew. 2023. Aesthetic surgery advice and counseling from artificial intelligence: A rhinoplasty consultation with ChatGPT. Aesthetic Plast. Surg, 47(5):1985–1993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu Quan, Liu Yueyue, Sun Dawei, Huang Xiaoqian, Li Feihong, Zhai Jincheng, Li Yang, Zhou Qiming, Qian Niansong, and Niu Beifang. 2023. OncoCT-Miner: streamlining precision oncology trial matching via molecular profile analysis. Database (Oxford), 2023:baad077. [Google Scholar]
- Yang Yumeng, Jayaraj Soumya, Ludmir Ethan, and Roberts Kirk. 2023. Text classification of cancer clinical trial eligibility criteria. AMIA Annu. Symp. Proc, 2023:1304–1313. [PMC free article] [PubMed] [Google Scholar]
- Yao Xiaoxi, Attia Zachi I., Behnken Emma M., Walvatne Kelli, Giblon Rachel E., Liu Sijia, Siontis Konstantinos C., Gersh Bernard J., Graff-Radford Jonathan, Rabinstein Alejandro A., Friedman Paul A., and Noseworthy Peter A. 2021. Batch enrollment for an artificial intelligence-guided intervention to lower neurologic events in patients with undiagnosed atrial fibrillation: rationale and design of a digital clinical trial. American Heart Journal, 239:73–79. [DOI] [PubMed] [Google Scholar]
- Yazi Fatin Syafiqah, Vong Wan-Tze, Raman Valliappan, Then Patrick Hang Hui, and Lunia Mukulraj J. 2021. Towards automated detection of contradictory research claims in medical literature using deep learning approach. In 2021 Fifth International Conference on Information Retrieval and Knowledge Management (CAMP), pages 116–121. [Google Scholar]
- Yuan Jiayi, Tang Ruixiang, Jiang Xiaoqian, and Hu Xia. 2024. Large language models for healthcare data augmentation: An example on patient-trial matching. In AMIA Annual Symposium Proceedings, volume 2024, pages 1324–1333. AMIA. [Google Scholar]
- Zeng Kun, Pan Zhiwei, Xu Yibin, and Qu Yingying. 2020. An ensemble learning strategy for eligibility criteria text classification for clinical trial recruitment: Algorithm development and validation. JMIR Medical Informatics, 8(7):e17832. [Google Scholar]
- Zhang Gongbo, Jin Qiao, Zhou Yiliang, Wang Song, Idnay Betina, Luo Yiming, Park Elizabeth, Nestor Jordan G., Spotnitz Matthew E., Soroush Ali, Campion Thomas R. Jr, Lu Zhiyong, Weng Chunhua, and Peng Yifan. 2024a. Closing the gap between open source and commercial large language models for medical evidence summarization. NPJ Digital Medicine, 7(1):239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Gongbo, Zhou Yiliang, Hu Yan, Xu Hua, Weng Chunhua, and Peng Yifan. 2024b. A span-based model for extracting overlapping PICO entities from RCT publications. J. Am. Med. Inform. Assoc. [Google Scholar]
- Zhang Yixuan, Liu Junzhen, and Lu Wei. 2023. Medict-gp: An accurate entity recognition model combining medical domain knowledge and globalization ideas. In Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, ICCAI ’23, pages 477–483, New York, NY, USA. Association for Computing Machinery. [Google Scholar]
- Zheng Ce, Ye Hongfei, Guo Jinming, Yang Junrui, Fei Ping, Yuan Yuanzhi, Huang Danqing, Huang Yuqiang, Peng Jie, Xie Xiaoling, Xie Meng, Zhao Peiquan, Chen Li, and Zhang Mingzhi. 2024. Development and evaluation of a large language model of ophthalmology in Chinese. Br. J. Ophthalmol, 108(10):1390–1397. [DOI] [PMC free article] [PubMed] [Google Scholar]