Artificial Intelligence-assisted Biomedical Literature Knowledge Synthesis to Support Decision-making in Precision Oncology

Ting He; Kory Kreimeyer; Mimi Najjar; Jonathan Spiker; Maria Fatteh; Valsamo Anagnostou; Taxiarchis Botsis

2025 May 22;2024:513–522.

Ting He ^1,², Kory Kreimeyer ^1,², Mimi Najjar ^1,³, Jonathan Spiker ^1,², Maria Fatteh ^1,³, Valsamo Anagnostou ^1,³, Taxiarchis Botsis ^1,²

PMCID: PMC12099343 PMID: 40417512

Abstract

The delivery of effective targeted therapies requires comprehensive analyses of the molecular profiling of tumors and matching with clinical phenotypes in the context of existing knowledge described in biomedical literature, registries, and knowledge bases. We evaluated the performance of natural language processing (NLP) approaches in supporting knowledge retrieval and synthesis from the biomedical literature. We tested PubTator 3.0, Bidirectional Encoder Representations from Transformers (BERT), and Large Language Models (LLMs) and evaluated their ability to support named entity recognition (NER) and relation extraction (RE) from biomedical texts. PubTator 3.0 and the BioBERT model performed best in the NER task (best F1-score 0.93 and 0.89, respectively), while BioBERT outperformed all other solutions in the RE task (best F1-score 0.79) and a specific use case it was applied to by recognizing nearly all entity mentions and most of the relations. Our findings support the use of AI-assisted approaches in facilitating precision oncology decision-making.

Introduction

Precision oncology aims at delivering patient-tailored cancer therapies by targeting specific molecular alterations after in-depth characterization of the molecular profiling of tumors¹. This task requires thoroughly reviewing several sources, including electronic health records and next-generation sequencing outputs, to accurately define the patient’s clinical-genomic phenotype and information from biomedical literature, several knowledge bases, clinical guidelines, clinical trials, FDA-approved and off-label treatments, and more. In practice, medical experts often collaborate in Molecular Tumor Boards (MTB) to interpret patients’ clinical and molecular profiles and suggest genotype-targeted therapies based on levels of evidence retrieved and ranked from biomedical resources. This process typically demands manual and labor-intensive steps as existing automated approaches are limited and often do not meet the expected level of performance required to assist human experts. Considering the increasing volume of knowledge generated in all the above sources and the evolving landscape of precision oncology, significant automation is necessary to streamline information extraction from various sources, such as biomedical literature, and combine it with existing and new knowledge to support expert review in the MTB and other use cases.

Automated information retrieval from biomedical texts is generally supported by NLP techniques, and numerous studies have been conducted on this end over the last decades. The exciting capabilities of LLMs in language understanding and generation open new opportunities for research in the domain, and they have already been applied to clinical text for NER purposes². Published literature is a major knowledge source in biomedicine, and several researchers have used NLP and advanced methods to process large corpora of published material and extract information of interest. Recent efforts have shown promise in efficiently retrieving cancer-specific genomic alteration and treatment mentions from PubMed and may allow us to successfully incorporate scientific findings in precision oncology. For example, PubTator 3.0 is a tool the National Library of Medicine built to identify several entities and their relations in the literature using state-of-the-art Artificial Intelligence (AI) techniques³. PubTator addresses a major limitation in previous efforts that focused on processing the titles and abstracts rather than the full text of biomedical articles. The importance of this richer text is highlighted in recent efforts that have released annotated full-text corpora and/or conducted relevant Challenges^4-6.

These emerging needs inspired our exploration of automated information retrieval from biomedical literature assisted by modern NLP models and technologies. Our main focus was identifying entities essential in supporting decision-making in precision oncology, including mutations, cancer types, targeted therapies, and the relations between these entities. We evaluated the performance of the selected approaches using an existing reference standard widely used in the community, demonstrated their readiness to automate certain manual and labor-intensive processes, and discussed the remaining challenges that must be addressed to support several use cases in precision oncology more efficiently.

Methods

BioRED Corpus

We used the Biomedical Relation Extraction Dataset (BioRED), a unique resource of 600 biomedical PubMed abstracts that contain several entity types (Gene, Variant, Disease, Chemical, Species, and Cell Line) with their relations at the document level⁷. Most other annotated datasets are limited as they focus on one entity, such as tmVar⁸, they do not include any relation annotations among the annotated entities, such as the BC5CDR corpus⁹, or annotate relations at the sentence level, such as the DDI corpus¹⁰. In most real use cases in precision oncology, it is essential to evaluate the entity relations at the document level, making BioRED the ideal choice for our exploration.

We focused on four entities (Gene, Variant, Disease, and Chemical) that represent a patient’s clinical-genomic phenotype and relevant drugs as well as the relations between these entities to train models that would process biomedical publications and identify genotype-driven therapies for specific cancer types. Table 1 shows the distribution of the selected entities and relations in the BioRED corpus and the training, development, and testing sets; we retained the same split in our study. Per the BioRED annotation guidelines, curators were asked to annotate each relation using one of the following nine labels: Association, Bind, Cause, Comparison, Conversion, Cotreatment, Drug Interaction, Negative Correlation, and Positive Correlation. The detailed statistics for these relation types are shown in Table 2. Some curators also annotated relations between Genes and Variants as well as Variants and Variants; these relations were not described in the guidelines. Considering the sparse distribution in most relation types, we merged the sparsely populated types into the “Association” type, leaving three labels in the end: Association, Negative Correlation, and Positive Correlation.
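The label merging described above can be illustrated with a minimal sketch. It assumes that the five types named later in the Methods (bind, comparison, conversion, cotreatment, and drug interaction) are the ones folded into Association; the function name and the toy labels are hypothetical and are not taken from the study's code or from the corpus.

```python
from collections import Counter

# Assumption: these are the fine-grained types folded into Association,
# following the list given in the Methods text.
MERGED_INTO_ASSOCIATION = {
    "Bind",
    "Comparison",
    "Conversion",
    "Cotreatment",
    "Drug Interaction",
}


def collapse_relation_label(label: str) -> str:
    """Map a fine-grained relation label to the coarser label set."""
    if label in MERGED_INTO_ASSOCIATION:
        return "Association"
    return label


# Toy labels for illustration only (not drawn from BioRED).
toy_labels = [
    "Bind",
    "Positive Correlation",
    "Cotreatment",
    "Association",
    "Negative Correlation",
]
collapsed = [collapse_relation_label(label) for label in toy_labels]
print(Counter(collapsed))
```

Collapsing sparse classes in this way trades label granularity for more training examples per class, which is the motivation the text gives for the merge.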

Table 1.

The distribution of the selected entity types and their relations in the BioRED training, development, and test sets; the numbers in parentheses show unique entity mentions. All statistics were retrieved from the BioRED publication⁷.

Sets	Abstracts	Gene	Variant	Disease	Chemical	Relations
Training	400	4430 (1141)	890 (420)	3646 (576)	2853 (486)	4178
Development	100	1087 (268)	250 (135)	982 (244)	822 (184)	1162
Testing	100	1180 (399)	241 (137)	917 (244)	754 (170)	1163
Total	600	6697 (1643)	1381 (678)	5545 (778)	4429 (651)	6503

Table 2.

Distribution of relation types for each entity pair in the BioRED corpus. The grey-shaded cells represent relation types that do not apply to the corresponding entity pairs, per the BioRED annotation guidelines.

G: Gene; D: Disease; V: Variant; C: Chemical.

Natural Language Processing Solutions

We initially evaluated several BERT models for the NER and the RE tasks. BioBERT and BioLinkBERT were two top-performing BERT models on the Biomedical Language Understanding and Reasoning Benchmark^{11, 12}, and we selected them for both tasks. We also employed two LLMs, the open-source Mixtral-8x7b Instruct and OpenAI’s ChatGPT 4, which have already been investigated in several studies in biomedicine^13-17, and explored their performance in the NER task only. Lastly, we used PubTator 3.0, a tool built by the National Library of Medicine that, as mentioned above, identifies several entities and their relations in biomedical literature³, including those analyzed in our study.

In the BERT exploration, we treated the NER task as a sequence labeling problem, annotating each token using the Beginning-Inside-Outside (BIO) tagging scheme. According to this scheme, the first token in an entity mention is labeled with the B-entity_type tag, subsequent tokens within the same entity are labeled with the I-entity_type tag, and all other tokens are labeled with the O tag. For example, in the sentence “Iodide transport defect (ITD) is a rare disorder”, the tokens were tagged as “B-Disease, I-Disease, I-Disease, B-Disease, O, O, O, O”, respectively, as the ITD abbreviation was considered a new Disease mention. We then trained our BERT models using the training set and assigned each token the tag with the highest predicted probability. For the RE task, we pursued the identification of relations at the document level. However, BERT models can only accept a limited number of tokens, making it impossible to process a full abstract in a single pass. We, therefore, annotated each relation pair if the two entities appeared within nearby sentences (within a 3-sentence window: one before and one after) and used entity markers to highlight the entities. Then, we treated the RE task as a sequence classification problem using the relation type between the two entities as the label to train the model. For example, we converted the sentence “Crocin improves lipid dysregulation in subacute diazinon exposure through ERK1/2 pathway” into “Crocin improves <e1> lipid </e1> dysregulation in subacute <e2> diazinon </e2> exposure through ERK1/2 pathway”, linked the two entities, assigned the Association label and trained the model to learn this relation. As mentioned above, we did not use all the relation types described in the BioRED guidelines and found in the annotated corpus. Instead, we categorized bind, comparison, conversion, cotreatment and drug interaction relations under the broader Association type. The models were trained on a cluster using one Nvidia V100 GPU with 32 GB of RAM.
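The two input encodings described in this paragraph can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study's preprocessing code: the function names are hypothetical, tokens are produced by simple whitespace splitting, and entity spans are given as token indices with an exclusive end.

```python
def bio_tags(tokens, entities):
    """Return one BIO tag per token.

    entities: list of (start, end, entity_type) token spans, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in entities:
        tags[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"
    return tags


def add_entity_markers(tokens, span1, span2):
    """Wrap two non-overlapping token spans in <e1>...</e1> and <e2>...</e2>."""
    out = []
    for i, token in enumerate(tokens):
        if i == span1[0]:
            out.append("<e1>")
        if i == span2[0]:
            out.append("<e2>")
        out.append(token)
        if i == span1[1] - 1:
            out.append("</e1>")
        if i == span2[1] - 1:
            out.append("</e2>")
    return " ".join(out)


# NER example from the text (parentheses dropped for simplicity).
ner_tokens = ["Iodide", "transport", "defect", "ITD", "is", "a", "rare", "disorder"]
ner_entities = [(0, 3, "Disease"), (3, 4, "Disease")]
print(bio_tags(ner_tokens, ner_entities))

# RE example from the text: mark "lipid" and "diazinon" as the entity pair.
re_tokens = (
    "Crocin improves lipid dysregulation in subacute diazinon "
    "exposure through ERK1/2 pathway"
).split()
print(add_entity_markers(re_tokens, (2, 3), (6, 7)))
```

In a BERT pipeline, the marked sentence would then be tokenized into subwords and passed to a sequence classifier, with the relation type as the target label.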

In the LLM analysis, we randomly selected five abstracts to use as instructional examples for the model from the set of BioRED training abstracts that contained all four entities of interest (N=62). As shown in Figure 1, these five abstracts, along with their lists of annotated entities, were fed into the prompt before asking for labeled output on a new abstract provided in the same format. In the case of the Mixtral-8x7b Instruct model, we ran the analysis in the Microsoft Azure Databricks environment, setting the sampling temperature to 0.5 to decrease randomness and the maximum number of generated tokens to 2000 to receive the complete output for long abstracts contained in the BioRED testing set. For the ChatGPT 4 model, we directly used the chat interface from OpenAI to deliver the instructions and examples. In both cases, we asked the LLM to annotate 5 abstracts before resetting the session (to minimize hallucinations observed with longer inputs) and provided the same instructions each time.

Figure 1. Few-shot prompt engineering in the LLM exploration. The Instruction section contained five PubMed records (titles and abstracts) from the BioRED training set with the annotated entities and their exact location in the text. These examples instructed the LLMs to process the new PubMed records provided in the Question section and generate the annotated entities with their location in the text.
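A few-shot prompt of the kind described above could be assembled along the following lines. This is a sketch only: the exact prompt wording and layout used in the study are not given in the text, so the record format, the instruction string, and all function names here are assumptions, and no LLM API is called.

```python
def locate(text, mention, entity_type):
    """Return (start, end, mention, type) using the first occurrence of the mention."""
    start = text.index(mention)
    return (start, start + len(mention), mention, entity_type)


def format_record(text, entities=None):
    """Render one record; a record without entities ends with an open 'Entities:' slot."""
    lines = ["Text: " + text, "Entities:"]
    for start, end, mention, entity_type in entities or []:
        lines.append(f"{start}\t{end}\t{mention}\t{entity_type}")
    return "\n".join(lines)


def build_prompt(instruction, examples, new_text):
    """Concatenate the instruction, the annotated examples, and the new record."""
    blocks = [instruction]
    blocks += [format_record(text, entities) for text, entities in examples]
    blocks.append(format_record(new_text))
    return "\n\n".join(blocks)


def batches(items, size=5):
    """Split the abstracts into groups, one group per fresh session."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical instruction text and a single toy example (the study used five).
instruction = "Annotate Gene, Variant, Disease, and Chemical mentions with their offsets."
example_text = "Iodide transport defect (ITD) is a rare disorder"
examples = [
    (
        example_text,
        [
            locate(example_text, "Iodide transport defect", "Disease"),
            locate(example_text, "ITD", "Disease"),
        ],
    )
]
prompt = build_prompt(instruction, examples, "A novel spontaneous LQTS-3 mutation was identified")
print(prompt)
```

Grouping the test abstracts with `batches(abstracts, size=5)` mirrors the practice of resetting the session after every five abstracts.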

We subsequently used the PubTator 3.0 online user interface to retrieve the entities and relations for all abstracts in the BioRED testing set. At the time of retrieving this information, the PubTator 3.0 Application Programming Interface (API) was rather unstable, making it easier to obtain this data from the user interface.

The performance of all models was evaluated using the standard metrics of recall, precision, and F1-score. For the NER task, we calculated the strict version of these metrics based on the exact matches (same span of text, annotation label, and boundary) between the annotated entities in the reference BioRED testing set and the model output. In the case of the two LLMs, we also used the relaxed versions of the selected metrics by allowing a boundary overlap (at least one token) between the entity mentions as long as they were annotated with the same label. In the RE task, only the strict recall, precision, and F1 were calculated.
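The difference between the strict and relaxed metrics can be made concrete with a short sketch. It assumes entities are represented as (start, end, label) token spans with an exclusive end; the function names and the toy annotations are hypothetical and do not reproduce the study's evaluation script.

```python
def prf(matched_pred, n_pred, matched_gold, n_gold):
    """Precision, recall, and F1 from match counts."""
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_scores(gold, pred):
    """Exact match: same span boundaries and same label."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    return prf(tp, len(pred_set), tp, len(gold_set))


def overlaps(a, b):
    """Same label and at least one shared token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def relaxed_scores(gold, pred):
    """Overlap match: a mention counts if it overlaps a same-label mention."""
    gold_set, pred_set = set(gold), set(pred)
    matched_pred = sum(any(overlaps(p, g) for g in gold_set) for p in pred_set)
    matched_gold = sum(any(overlaps(g, p) for p in pred_set) for g in gold_set)
    return prf(matched_pred, len(pred_set), matched_gold, len(gold_set))


# Toy annotations for illustration only.
gold = [(0, 3, "Disease"), (3, 4, "Disease"), (10, 11, "Gene")]
pred = [(0, 3, "Disease"), (2, 4, "Disease"), (10, 11, "Chemical")]

print("strict :", strict_scores(gold, pred))
print("relaxed:", relaxed_scores(gold, pred))
```

In this toy case the boundary error on the second prediction is penalized under strict matching but forgiven under relaxed matching, while the label error on the third prediction is penalized under both.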

Use Case

Besides published literature in PubMed, other important resources contain information on variant actionability and pathogenicity, such as the PubMed LitVar¹⁸, the Variant Interpretation for Cancer Consortium meta-knowledgebase¹⁹, and the Clinical Interpretation for Variants in Cancer knowledge base²⁰. A commonly used and major resource in precision oncology is OncoKB, a comprehensive knowledge base that contains detailed information about genomic alterations in cancer and clinical actionability based on pre-defined levels of evidence^{21, 22}. The information in OncoKB comes from human curators who process several data sources, including scientific literature, in a comprehensive process to identify relations between key entities (genes, alterations, cancer types, and drugs) and include them in publicly available summarized tables to assist precision oncology experts in their decision-making in MTB and other settings. This labor-intensive process might benefit from applying automated tools and models, such as the ones explored in our work, that may expedite the collection of all scientific findings on the above relations, quickly synthesize the corresponding knowledge, and make it immediately available to the community. We, therefore, sought to evaluate the efficiency of the two BERT models and PubTator 3.0 to detect known relations and findings already described in OncoKB for a specific mutation.

We selected the PIK3CA E545K mutation as a representative example of an actionable mutation associated with FDA-approved targeted therapy and pulled the corresponding webpage in OncoKB on February 23, 2024. This webpage contains a detailed description of this mutation, the cancers it has been found in, the targeted drugs, the levels of evidence, as defined in OncoKB, and other supporting information; these are manually retrieved from twelve PubMed-indexed journal papers and three conference abstracts. OncoKB further summarizes the mutation-cancer-drug relations in a table under the “Therapeutic” tab, which also cites four (of the twelve) and two (of the three) above journal papers and conference abstracts, respectively. We pulled the four papers and two abstracts and processed them with BioBERT and BioLinkBERT to retrieve all relations between the four entities (Gene, Variant, Disease, and Chemical) of interest. In parallel, we queried PubTator 3.0 using the four PubMed IDs to identify the relations between the entities of interest. Subsequently, the medical experts (MN, MF, and VA) of this study evaluated the output from the BERT models and PubTator and compared it with the information listed on OncoKB’s summarized table for PIK3CA E545K, treating it as the reference standard. This analysis helped us determine whether these tools accurately captured the key findings from the corresponding sources without any human intervention.

Results

Model Performance

As shown in Figure 2, PubTator 3.0 achieved the highest performance in the NER task with an F1-score of 0.9 or higher across all entity types, followed by BioBERT, which efficiently retrieved Genes and Drugs with balanced recall and precision between 0.86 and 0.89. When comparing the results between BioBERT and PubTator, we found that many of BioBERT’s incorrect predictions stem from its tendency to rely heavily on the surrounding context of a mention to determine its label. This often results in inconsistencies, where the same mention is assigned different entity types, which is generally not the case in medical texts. For example, in the sentence “Congenital long QT syndrome (LQTS) with in utero onset of the rhythm disturbances…,” BioBERT correctly identified LQTS as a disease. However, in another sentence, “A novel spontaneous LQTS-3 mutation was identified in the…,” LQTS was incorrectly predicted as a gene. We hypothesize that this behavior arises because the BERT model, trained for masked language prediction tasks, tends to prioritize context words over medical accuracy. BioLinkBERT was less efficient than BioBERT and PubTator 3.0 in supporting the NER task, with the highest recall for the Gene entity at 0.71 and other metrics between 0.55 and 0.66 for the remaining entities. Both LLMs performed poorly overall, although ChatGPT 4 demonstrated some promise, particularly in relaxed recall for the Disease and Gene entities at 0.81 and 0.78, respectively.

The RE task was particularly challenging. BioBERT outperformed both BioLinkBERT and PubTator, excelling in identifying Chemical-Disease relations with a recall of 0.77. However, its performance on other relations was less efficient with F1-score around 0.60, and it missed half of the Chemical-Chemical relations with recall at 0.50. The best F1-score for the BioLinkBERT was 0.47 for the Gene-Gene relation, while the lowest was 0.24 for the Chemical-Variant relations, which are essential in precision oncology. Quite interestingly, although PubTator 3.0 identified nearly all entity mentions, it struggled with relation extraction, achieving the best recall at 0.36 for the Chemical-Disease relations and F1-score ranging from 0.12 to 0.37 for other relations. Figure 3 summarizes all findings for the RE task.

Figure 3. The performance of BioBERT, BioLinkBERT, and PubTator 3.0 in the RE task. G: Gene; D: Disease; V: Variant; C: Chemical.

Use Case

As mentioned above, four journal papers and two conference abstracts are cited in OncoKB’s summarized table under the “Therapeutic” tab for the PIK3CA E545K mutation. One of the medical experts retrieved all relations listed in OncoKB’s summarized table and evaluated whether they were included in the BioBERT, BioLinkBERT, and PubTator 3.0 outputs and discussed the findings with the other two medical experts. Only BioBERT identified at least one relation in all papers and abstracts, while BioLinkBERT did not output any relations. As shown in Table 3, BioBERT outperformed the other two solutions by identifying 55% of all findings (21 out of 38), which, on average, did not deviate much from the corresponding numbers in the BioRED testing set. On the other hand, BioLinkBERT and PubTator 3.0 performed poorly, recognizing only 0 and 9 findings, respectively. It should be noted, though, that PubTator 3.0 provides only relations found at the abstract and not full-text level, which partially explains its low performance.

Table 3.

The ability of the BERT models and PubTator 3.0 to support knowledge synthesis in biomedical literature. BioBERT identified several entities in the NER task (shown in blue) but not their corresponding relations. It could not find some of the variants because the corresponding papers cited in OncoKB did not include them (C420R, Q546E, and Q546E, shown in red) or contained the altered codon without further specifying the effect of the mutation on the amino acid sequence (E545 and H1047, shown in brown). P: Paper; A: Abstract; BB: BioBERT; BLB: BioLinkBERT; PT: PubTator 3.0.

Because BioBERT was the best-performing solution in the earlier RE task, we took a deeper dive to evaluate the missed relations on this task and discovered that, in almost half of them (8 out of 17), the model pulled all participating entities correctly but could not associate them (Table 3, lines 24-31). In the remaining relations, the model identified the cancer type, i.e., “breast cancer”, but not the associated variant. This phenomenon was attributed to two reasons: the corresponding papers cited in OncoKB either did not include the variants (C420R, Q546E, and Q546E; Table 3, lines 1-3) or reported the altered codon without further specifying the amino acid change (E545 and H1047; Table 3, lines 4-9). It appears that the OncoKB human curators used their expertise to interpret the published findings, making inferences from relevant knowledge acquired in separate explorations without providing complete references on the PIK3CA E545K webpage. None of these could have been captured by the solutions explored in our study.

Discussion

We explored the ability of selected NLP solutions to efficiently retrieve specific entities (Gene, Variant, Disease, and Chemical) and their relationships from biomedical literature. Additionally, we assessed how these solutions might contribute to decision-making in precision oncology through a particular use case. In the NER task, PubTator 3.0 was the best-performing solution (F1-score close to or above 0.9), followed by BioBERT (F1-score between 0.82 and 0.89), while BioLinkBERT and the two LLMs demonstrated average and poor performance, respectively. In the RE task, BioBERT outperformed BioLinkBERT and PubTator 3.0, but did not reach the required level of performance for routine use, missing several relations in the evaluation with the BioRED testing set and the OncoKB use case. However, it captured all individual entities included in retrieved or missed relations and reported in the OncoKB-cited papers.

Our study has three main limitations. First, the selected BioRED corpus was not originally created with a focus on precision oncology, e.g., the Disease entity did not solely represent cancer types, and the annotated relations did not fully capture all potential associations between the Chemical and the other entities, which is a critical factor in the selection of targeted therapies. Dedicated resources might better support information retrieval and knowledge synthesis from biomedical literature. Second, none of the explored solutions could make inferences from other sections within the same full-text papers or other sources in general, as depicted in the OncoKB use case. This limitation applies to detecting relations at the document level and not simply at the section level, which partly occurred in our approach. It also refers to linking several documents and recognizing entity relations across them. Generative AI may bridge this gap, which may represent an additional challenge, as we elected to use LLMs for annotation rather than knowledge generation purposes.

The presented level of performance in the RE task cannot lead to operationalizing any of the solutions evaluated in our work. We would argue, though, that the domain has not solved this problem yet, especially in precision medicine and oncology. A representative example is the recently developed BioREX model, the core RE engine in PubTator 3.0, that considerably improved RE in several relation types (average F1-score 0.79) by applying deep learning to heterogeneous datasets (BioRED was one of these datasets)²³. However, none of these relations contained the Variant entity, although annotated in several datasets^{24, 25}, which is significant for characterizing a patient’s genomic profile and selecting targeted therapies in precision oncology. BioREX’s inability to efficiently capture this relation type was demonstrated in PubTator’s evaluation in the OncoKB use case. In that sense, although PubTator performed best in the NER task, our BioBERT model offered a more efficient approach by nearing PubTator in the NER and outperforming it in the RE task. It should be clarified, though, that none of these approaches is mature enough to solve the RE problem.

Acknowledging the limitations of existing approaches in accurately detecting certain relations in biomedical literature and clinical texts, we suggest pursuing other strategies in the precision oncology context. Traditional NLP approaches would require annotating new corpora or combining and refining existing labeled datasets and (re)training some of the best-performing state-of-the-art models to improve performance. One might also argue that the strengths of the BERT models have not been fully explored in precision oncology, and we would probably agree with this statement. For example, our BioBERT model, as part of an NLP ensemble pipeline, could accept PubTator’s NER output and efficiently detect the relations in biomedical literature. This approach was not examined in our work but could be investigated in one of our next steps. On the other hand, the research community is trying to move from training models to “zero-shot” learning frameworks²⁶ that promise less labor-intensive processes and efficient implementations in end-to-end systems utilizing LLMs. The current work demonstrated that the selected LLMs could not accurately support the NER task that traditional NLP approaches have (nearly) solved, further suggesting potential major challenges for these models in the more complex RE task. We investigated the RE task with LLMs in a separate analysis and found that they performed very poorly (data not shown).

The application of generative AI to several tasks in biomedicine and precision oncology is inevitable, and we will be seeing numerous studies in this area over the next several years. Our limited analysis of the two LLMs represents a pilot study and, as such, precludes firm conclusions on the use of these methodologies in clinical decision-making. It is paramount to accurately collect the requirements and expectations from the end users before determining the next steps and calibrating these approaches. In precision oncology specifically, several sources are evaluated to make a clinical decision, and biomedical literature is only one of them. Moreover, knowledge synthesis must be coupled with several other processes, including, but not limited to, accurate computable phenotyping and efficient data integration.

Conclusion

We explored several NLP solutions to automatically extract, synthesize and characterize scientific knowledge from biomedical literature that might support clinical decision-making in the context of identifying genotype-driven therapies for cancer patients. Identifying the relations between key entities was the most challenging task in our analysis, as shown in the comparisons with the reference standard and knowledge included in the OncoKB resource. The latter evaluation was very informative and demonstrated that one of the models (BioBERT) successfully identified all entity mentions found in the OncoKB-cited publications and 55% of the relations listed by the human curators. Future research must deliver efficient systems that will accurately process the compendium of information and process knowledge to support decision-making in precision oncology.

Acknowledgments

This study was supported by the National Cancer Institute as part of two research awards (U01CA274631 and P30CA006973). The contents are those of the authors and do not necessarily represent the official views of, nor an endorsement, by NCI or the U.S. Government.

References

1. Schwartzberg L, Kim ES, Liu D, Schrag D. Precision Oncology: Who, How, What, When, and When Not? Am Soc Clin Oncol Educ Book. 2017;37:160–9. doi: 10.1200/EDBK_174176.
2. Hu Y, Chen Q, Du J, Peng X, Keloth VK, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inform Assoc. 2024.
3. Wei C-H, Allot A, Lai P-T, Leaman R, Tian S, Luo L, Jin Q, Wang Z, Chen Q, Lu Z. PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge. arXiv preprint arXiv:2401.11048. 2024.
4. Almeida T, Antunes R, J FS, Almeida JR, Matos S. Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics. Database (Oxford) 2022;2022 doi: 10.1093/database/baac047.
5. Leaman R, Islamaj R, Adams V, Alliheedi MA, Almeida JR, Antunes R, Bevan R, Chang YC, Erdengasileng A, Hodgskiss M, Ida R, Kim H, Li K, Mercer RE, Mertova L, Mobasher G, Shin HC, Sung M, Tsujimura T, Yeh WC, Lu Z. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database (Oxford) 2023;2023 doi: 10.1093/database/baad005.
6. Yang X, Saha S, Venkatesan A, Tirunagari S, Vartak V, McEntyre J. Europe PMC annotated full-text corpus for gene/proteins, diseases and organisms. Sci Data. 2023;10(1):722. doi: 10.1038/s41597-023-02617-x.
7. Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform. 2022;23(5) doi: 10.1093/bib/bbac282.
8. Wei CH, Harris BR, Kao HY, Lu Z. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics. 2013;29(11):1433–9. doi: 10.1093/bioinformatics/btt156.
9. Li J, Sun Y, Johnson RJ, Sciaky D, Wei CH, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database (Oxford) 2016;2016 doi: 10.1093/database/baw068.
10. Herrero-Zazo M, Segura-Bedmar I, Martinez P, Declerck T. The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. J Biomed Inform. 2013;46(5):914–20. doi: 10.1016/j.jbi.2013.07.011.
11. Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH) 2021;3(1):1–23.
12. BLURB Leaderboard. [Available from: https://microsoft.github.io/BLURB/leaderboard.html]
13. Adams L, Busch F, Han T, Excoffier J-B, Ortala M, Löser A, Aerts HJ, Kather JN, Truhn D, Bressem K. LongHealth: A Question Answering Benchmark with Long Clinical Documents. arXiv preprint arXiv:2401.14490. 2024.
14. Chen Y, Couto I, Cai W, FU C, Dorneles B, editors. SoftTiger: A Clinical Foundation Model for Healthcare Workflows. AAAI 2024 Spring Symposium on Clinical Foundation Models. 2024.
15. Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology. 2024;310(1):e232756. doi: 10.1148/radiol.232756.
16. Cheng K, Guo Q, He Y, Lu Y, Gu S, Wu H. Exploring the potential of GPT-4 in biomedical engineering: the dawn of a new era. Annals of Biomedical Engineering. 2023;51(8):1645–53. doi: 10.1007/s10439-023-03221-1.
17. Tian S, Jin Q, Yeganova L, Lai P-T, Zhu Q, Chen X, Yang Y, Chen Q, Kim W, Comeau DC. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Briefings in Bioinformatics. 2024;25(1):bbad493. doi: 10.1093/bib/bbad493.
18. Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic acids research. 2018;46(W1):W530–W6. doi: 10.1093/nar/gky355.
19. Wagner AH, Walsh B, Mayfield G, Tamborero D, Sonkin D, Krysiak K, Deu-Pons J, Duren RP, Gao J, McMurry J, Patterson S, Del Vecchio Fitz C, Pitel BA, Sezerman OU, Ellrott K, Warner JL, Rieke DT, Aittokallio T, Cerami E, Ritter DI, Schriml LM, Freimuth RR, Haendel M, Raca G, Madhavan S, Baudis M, Beckmann JS, Dienstmann R, Chakravarty D, Li XS, Mockus S, Elemento O, Schultz N, Lopez-Bigas N, Lawler M, Goecks J, Griffith M, Griffith OL, Margolin AA. Variant Interpretation for Cancer C. A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer. Nat Genet. 2020;52(4):448–57. doi: 10.1038/s41588-020-0603-8.
20. Griffith M, Spies NC, Krysiak K, McMichael JF, Coffman AC, Danos AM, Ainscough BJ, Ramirez CA, Rieke DT, Kujan L, Barnell EK, Wagner AH, Skidmore ZL, Wollam A, Liu CJ, Jones MR, Bilski RL, Lesurf R, Feng YY, Shah NM, Bonakdar M, Trani L, Matlock M, Ramu A, Campbell KM, Spies GC, Graubert AP, Gangavarapu K, Eldred JM, Larson DE, Walker JR, Good BM, Wu C, Su AI, Dienstmann R, Margolin AA, Tamborero D, Lopez-Bigas N, Jones SJ, Bose R, Spencer DH, Wartman LD, Wilson RK, Mardis ER, Griffith OL. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet. 2017;49(2):170–4. doi: 10.1038/ng.3774.
21. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, Rudolph JE, Yaeger R, Soumerai T, Nissan MH, Chang MT, Chandarlapaty S, Traina TA, Paik PK, Ho AL, Hantash FM, Grupe A, Baxi SS, Callahan MK, Snyder A, Chi P, Danila D, Gounder M, Harding JJ, Hellmann MD, Iyer G, Janjigian Y, Kaley T, Levine DA, Lowery M, Omuro A, Postow MA, Rathkopf D, Shoushtari AN, Shukla N, Voss M, Paraiso E, Zehir A, Berger MF, Taylor BS, Saltz LB, Riely GJ, Ladanyi M, Hyman DM, Baselga J, Sabbatini P, Solit DB, Schultz N. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol. 2017;2017 doi: 10.1200/PO.17.00011.
22.Suehnholz SP, Nissan MH, Zhang H, Kundra R, Nandakumar S, Lu C, Carrero S, Dhaneshwar A, Fernandez N, Xu BW, Arcila ME, Zehir A, Syed A, Brannon AR, Rudolph JE, Paraiso E, Sabbatini PJ, Levine RL, Dogan A, Gao J, Ladanyi M, Drilon A, Berger MF, Solit DB, Schultz N, Chakravarty D. Quantifying the Expanding Landscape of Clinical Actionability for Patients with Cancer. Cancer Discov. 2024;14(1):49–65. doi: 10.1158/2159-8290.CD-23-0467. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Lai PT, Wei CH, Luo L, Chen Q, Lu Z. BioREx: Improving biomedical relation extraction by leveraging heterogeneous datasets. J Biomed Inform. 2023;146:104487. doi: 10.1016/j.jbi.2023.104487. [DOI] [PubMed] [Google Scholar]
24.Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics. 2011;27(3):408–15. doi: 10.1093/bioinformatics/btq667. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Pinero J, Ramirez-Anguita JM, Sauch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–D55. doi: 10.1093/nar/gkz1021. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Xian Y, Lampert CH, Schiele B, Akata Z. Zero-Shot Learning-A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Trans Pattern Anal Mach Intell. 2019;41(9):2251–65. doi: 10.1109/TPAMI.2018.2857768. [DOI] [PubMed] [Google Scholar]

