2025 Jul 16;58(10):314. doi: 10.1007/s10462-025-11316-5

A survey of NLP methods for oncology in the past decade with a focus on cancer registry applications

Isaac Hands 1,2,3, Ramakanth Kavuluru 3,4
PMCID: PMC12267331  PMID: 40688631

Abstract

Clinical texts from pathology and radiology reports provide critical information for cancer diagnosis and staging. This study surveys the application of natural language processing (NLP) in cancer registry operations from 2014 to 2024. A total of 156 articles from Scopus and PubMed were reviewed and were categorized by NLP methods, document types, cancer sites, and research aims. NLP approaches were evenly distributed across rule-based (n=70), machine learning (n=66), and traditional deep learning (n=70), with transformer models (n=29) gaining prominence since 2019. Encoder-only models like BERT and its clinical adaptations (e.g., ClinicalBERT, RadBERT) show significant promise, though methods for increasing context length are needed. Decoder-only models (e.g., GPT-3, GPT-4) are less explored due to privacy concerns and computational demands. Notably, pediatric cancers, melanomas, and lymphomas are underrepresented, as are research areas such as disease progression, clinical trial matching, and patient communication. Multi-modal models, important for precision oncology and cancer screening, are also scarce. Our study highlights the potential of NLP to enhance data abstraction efficiency and accuracy in cancer registries, making greater use of cancer registry data for patient benefit. However, further research is needed to fully leverage transformer-based models, particularly for underrepresented cancer types and outcomes. Addressing these gaps can improve the timeliness, completeness, and accuracy of structured data collection from clinical text, ultimately enhancing cancer research and patient outcomes.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10462-025-11316-5.

Keywords: Natural language processing, Cancer registries, Cancer, Transformers, Clinical text

Introduction

Cancer and clinical text

Cancer is a collection of diseases characterized by the abnormal growth of cells that can evade the immune system and spread to distant parts of the body. It will be responsible for the deaths of an estimated 611,720 Americans in 2024 (https://seer.cancer.gov/statfacts/html/common.html). Most cancer diagnoses are first definitively recorded in clinical text through either a pathology report, based on a biopsy, or a radiology report, when a biopsy is not feasible (https://www.cancer.gov/about-cancer/diagnosis-staging/diagnosis/pathology-reports-fact-sheet). In addition to cancer diagnosis, pathology and radiology reports provide cancer staging information, describing the extent of disease in the body of a cancer patient. Along with observations recorded in clinical text, genomic and radiologic image profiling provide a more detailed characterization of a cancer diagnosis. These details are essential for determining viable treatments, progression of the disease, and eligibility for clinical trials.

Cancer registries

The primary roles of central cancer registries are to count the number of new cancer diagnoses each year and to collect detailed information on cancer patients, such as cancer staging and vital status. Every U.S. state has a central cancer registry, given authority and parameters for data collection by its state legislature. Data are typically first collected at healthcare facilities, which fund and manage their own local cancer registries; the data are then sent to a state central cancer registry to consolidate information for each patient across the state. The most critical sources of diagnosis and staging information for a cancer registry are pathology and radiology reports, which are reviewed by Oncology Data Specialists (https://www.ncra-usa.org/ODS-Credential), previously known as Certified Tumor Registrars, who manually extract structured data from the narrative text. The text is typically found in electronic health records, which must be searched manually, and can be difficult to consolidate because these documents often exist across multiple healthcare systems and each document type can be located in a different part of the medical record.

Evolution of natural language processing methods

The evolution of natural language processing (NLP) methods can be divided into four main stages, outlined in Table 1. Rule-based NLP is characterized by human-derived linguistic rules that apply expert knowledge to text in order to accomplish NLP tasks. In clinical text, pattern matching rules could be designed to extract important dates or medication dosages, or to detect biomarker or medication names. Deriving and maintaining these rules is time-consuming and does not scale well, and the rules are difficult to apply outside their original context. Rule-based NLP shows high precision due to carefully tailored rules, but it lacks sensitivity because it cannot capture all the ways information is expressed in natural language. Machine learning comprises algorithms that allow machines to learn from data, primarily through statistical approaches. Examples of traditional machine learning algorithms include logistic regression, decision trees, and traditional neural networks. Machine learning is typically used to solve classification and regression tasks on small data sets and relies on feature engineering to prepare input data. Examples where these NLP methods excel on clinical text include categorizing clinical document types and identifying named medical entities in text, such as diseases or symptoms.
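To make the rule-based paradigm concrete, a minimal sketch of hand-written extraction rules is shown below; the patterns, the `extract` helper, and the sample note are all hypothetical illustrations, not drawn from any surveyed system:

```python
import re

# Hand-crafted rules of the kind a registry might maintain (illustrative only):
# one pattern for medication dosages, one for dates in MM/DD/YYYY form.
DOSAGE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|mL)\b", re.IGNORECASE)
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def extract(text):
    """Apply the hand-written rules and return (dosages, dates)."""
    dosages = [(float(v), u.lower()) for v, u in DOSAGE.findall(text)]
    dates = ["/".join(m) for m in DATE.findall(text)]
    return dosages, dates

note = "Biopsy on 03/14/2023; started tamoxifen 20 mg daily."
print(extract(note))  # ([(20.0, 'mg')], ['03/14/2023'])
```

The high precision and low sensitivity noted above follow directly from this design: the rules match only the surface forms they anticipate (e.g., "20 mg" but not "twenty milligrams").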

Table 1.

Stages of NLP methods evolution

NLP method Early references Early references in oncology
Rule-based Chomsky (1957) Friedman et al. (1994)
Machine learning Brown et al. (1990) Taira et al. (2001)
Traditional deep learning Mikolov et al. (2013) Yoon et al. (2017)
Transformer-based Vaswani et al. (2017) Zhang et al. (2019)

Traditional deep learning

Traditional deep learning methods extend standard neural networks by incorporating multiple hidden layers to capture complex data representations. Common traditional deep learning architectures include convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which excel at tasks involving structured sequences or spatially structured data. With adequate training data, these methods generally outperform machine learning and rule-based systems, as they automatically learn complex feature representations from raw text, eliminating much of the manual feature engineering required by earlier methods. Traditional deep learning models have demonstrated strong performance in clinical NLP tasks, such as classifying clinical documents, recognizing medical entities, and extracting structured clinical values embedded within narrative text. Despite their strengths, these methods are computationally demanding, sensitive to data quality, susceptible to overfitting, and typically produce predictions through processes that are difficult to interpret.

Transformer architectures

Transformer models represent a significant advancement in deep learning, explicitly designed to address key limitations of traditional deep learning methods, especially their difficulty in capturing long-range dependencies in text. Transformers enhance traditional deep neural networks through the introduction of attention mechanisms and positional encoding. Attention mechanisms allow language models to weigh the importance of different parts of a text sequence regardless of their distance from one another, while positional encoding records the relative order of tokens in the sequence. The original transformer architecture combines deep neural networks, attention, and positional encoding to map a variable-length input into a fixed-size vector representation, called encoding, and then transforms that representation into a new sequence of text during decoding. These encoder-decoder transformers perform well on language translation, summarization, and question-answering tasks, where the length of the input text can vary widely from the output length. For example, the T5 model (Raffel et al. 2020) demonstrates exceptional performance on the GLUE benchmarks, covering a wide range of NLP tasks. A second transformer architecture, utilizing only the encoder portion of the transformer, excels at language understanding tasks that rely on sufficient amounts of labeled data, such as classification and named entity recognition. These encoder-only models use masked-language modeling during pre-training to encode the context of input text. One of the most popular encoder-only models, BERT (Devlin et al. 2018), has spawned multiple clinical domain-specific pre-trained models (Alsentzer et al. 2019; Gu et al. 2021; Peng et al. 2019). The performance of encoder-decoder and encoder-only models depends on an additional training step called fine-tuning, which requires ample labeled training data. This dependence can limit the scenarios where these types of transformers can be deployed.
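The two mechanisms described above can be sketched numerically. The following is a toy, single-head illustration with random inputs and small dimensions; it is not an implementation from any surveyed model:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as defined by Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]          # token positions, shape (seq_len, 1)
    i = np.arange(d_model)[None, :]            # embedding dimensions, shape (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v, weights

x = np.random.default_rng(0).normal(size=(5, 8))     # 5 toy token embeddings, d=8
x = x + positional_encoding(5, 8)                    # inject token order
out, w = attention(x, x, x)                          # self-attention over the sequence
print(out.shape, w.shape)                            # (5, 8) (5, 5)
```

Each row of `w` distributes unit weight over all five positions, which is how every token can attend to every other token regardless of distance.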
A third variation of the transformer, the generative pre-trained transformer (GPT), uses only the decoder portion of the transformer to achieve state-of-the-art results on NLP tasks where labeled data is scarce. GPT-based architectures are trained to predict the next token in a sequence, called auto-regressive training, in which the model is shown the first part of a phrase and must predict the next token. This training objective is efficient to train on large volumes of unlabeled text, allowing GPT models to capture extensive knowledge across billions of parameters and leading to superior performance on generative tasks such as summarization and question answering. Decoder-only models can generate summaries of patient care from clinical notes spanning multiple hospital visits or explain complex medical conditions to non-experts in plain language. Additionally, GPT models have demonstrated excellent performance on classification and information extraction tasks without supervised fine-tuning, utilizing an approach called “in-context learning” (Brown et al. 2020).
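As a concrete illustration of in-context learning, the sketch below assembles a few-shot classification prompt in which the task is specified entirely in the input text, with no fine-tuning; the example reports and labels are invented for illustration:

```python
# Invented few-shot examples: (report text, label) pairs placed in the prompt.
EXAMPLES = [
    ("Infiltrating ductal carcinoma, grade 2.", "positive"),
    ("No evidence of malignancy.", "negative"),
]

def build_prompt(report):
    """Assemble a few-shot prompt ending at the label the model must predict."""
    lines = ["Classify each pathology report as positive or negative for cancer."]
    for text, label in EXAMPLES:
        lines.append(f"Report: {text}\nLabel: {label}")
    lines.append(f"Report: {report}\nLabel:")
    return "\n\n".join(lines)

prompt = build_prompt("Biopsy shows adenocarcinoma of the prostate.")
print(prompt)
```

A decoder-only model completing this prompt performs classification simply by generating the next token, which is why no labeled training set or gradient update is required.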

Objectives

The purpose of this survey is to investigate current published research on NLP methods applied to unstructured narrative text in the cancer domain, as informed by use cases at cancer registries. It aims to characterize the NLP methodologies, data characteristics, study purposes, and clinical outcomes of such research. Notably, we do not attempt to cover artificial intelligence methods that take structured data as input; we focus only on NLP methods that have been applied to unstructured cancer clinical text. Special attention was given to distinguishing modern, transformer-based NLP methods from older methods. Through this approach, we have identified gaps in the ways that NLP methods are applied in the cancer registry setting, suggesting ways to improve cancer registry data collection and, ultimately, patient outcomes.

Related work

In the past five years, there have been a few review articles that investigate natural language processing techniques as they apply to clinical text in general, to specific types of clinical documents, or to the broad field of oncology research. To date, however, there are no review articles that focus on the use of natural language processing of all types of clinical text in cancer registries. For example, one review (Tucker et al. 2019) discussed the usefulness and untapped potential of central cancer registries, mentioning the value of NLP for determining the cancer reportability of pathology reports. Another review (Savova et al. 2019) reported on advances in general clinical NLP up to 2019 with a brief overview of information extraction techniques for pathology reports at cancer registries. We also found a systematic review (Santos et al. 2022) of how NLP is used to classify cancer pathology reports, with the unexpected conclusion that, “... there are no studies adapting BERT models to pathology tasks.” With a short mention of cancer registries, one review (López-Úbeda et al. 2022) catalogued modern NLP methods for processing pathology reports and argued that they can be used to enhance a pathologist’s workflow when creating reports. Another review (Bitterman et al. 2021) reported on the current state of clinical NLP for radiology reports, drawing attention to the potential for NLP to facilitate the creation of a national radiation oncology registry. A scoping review (Saha et al. 2023) described how NLP is used in breast cancer studies, finding that cancer registries were involved in less than 5% of studies and transformer-based architectures were used 13.6% of the time. A more recent review (Lin et al. 2024) looked at the use of NLP for cancer clinical text while noting the evolution of methods from rule-based approaches to transformer architectures, but it was limited to radiology reports and lacked any mention of cancer registries. That study noted that privacy, security, and bias are important challenges that modern NLP methods must overcome.

The process of manually extracting structured data from clinical text at a cancer registry is slow, resource intensive, and becomes more difficult over time as the complexity and number of data elements increase. Even with the advent of electronic medical records, cancer registries have struggled to keep up with the increasing breadth and depth of data needed to characterize cancer (Tangka et al. 2021). A recent article on the evolution of central cancer registry data processing (Wormeli et al. 2021) called out the unsustainable nature of continued manual data collection. On the other hand, central cancer registries provide a unique source of cancer data that is often the best source of external validity for cancer research since they collect data from a large, diverse population (Tucker et al. 2019). Moreover, they are the definitive source of cancer data that can most accurately inform public policy decisions for distributing healthcare resources and bring awareness to weaknesses in current healthcare strategies (White et al. 2017). Methods that can improve the timeliness, completeness, and accuracy of structured data collection from clinical text at central cancer registries can greatly improve cancer research and will ultimately lead to better patient outcomes.

Unlike previous review papers, this survey summarizes NLP methods used in current cancer research while bringing attention to the unmet needs of cancer registries. This work is important for identifying ways that modern NLP techniques can bring efficiencies to cancer registry operations, data collection, and their mission of assisting cancer prevention and control.

Methodology

In this survey, we identified relevant research articles from the last 10 years, filtered out studies that did not meet our eligibility criteria, and then analyzed the included papers by extracting a pre-determined set of data variables using a web-based template. We followed the general flow of information defined by the PRISMA 2020 reporting guidelines (Page et al. 2021), which defines a standardized way to record and report the number of articles identified, included and excluded, and the reasons for exclusions. Reported results are based on the final set of research articles and extracted data. A complete list of articles included in the study can be found in supplementary information.

Article search criteria

Articles were identified for this study by searching two well-known literature search engines, Scopus and PubMed, for publications where the title or abstract included the term “cancer”, at least one keyword denoting a focus on clinical text, and at least one keyword denoting a focus on natural language processing. The keywords were chosen to include a wide range of natural language processing methods, including deep learning and transformer-based architectures, as well as capturing any articles that mentioned cancer registries. In addition, we chose a publication cutoff date of 2014 so that modern NLP methods from approximately the last 10 years were represented, including transformer-based NLP methods that began in 2017 (Vaswani et al. 2017). The full queries for each search engine are shown in Table 2.

Table 2.

Search criteria used in literature search

Scopus:
(TITLE-ABS-KEY(cancer) AND (TITLE-ABS-KEY(“cancer registr*”) OR TITLE-ABS-KEY(“pathology report”) OR TITLE-ABS-KEY(“pathology text”) OR TITLE-ABS-KEY(“radiology report”) OR TITLE-ABS-KEY(“radiology text”) OR TITLE-ABS-KEY(“clinical text”))) AND (TITLE-ABS-KEY(“language model”) OR TITLE-ABS-KEY(“NLP”) OR TITLE-ABS-KEY(“natural language processing”) OR TITLE-ABS-KEY(“AI”) OR TITLE-ABS-KEY(“artificial intelligence”) OR TITLE-ABS-KEY(“deep learning”) OR TITLE-ABS-KEY(“text classif*”))

PubMed:
“cancer”[Title/Abstract] AND (“cancer registr*”[Title/Abstract] OR “pathology report”[Title/Abstract] OR “pathology text”[Title/Abstract] OR “radiology report”[Title/Abstract] OR “radiology text”[Title/Abstract] OR “clinical text”[Title/Abstract]) AND ((“language model”[Title/Abstract] OR “NLP”[Title/Abstract] OR “natural language processing”[Title/Abstract] OR “AI”[Title/Abstract] OR “artificial intelligence” [Title/Abstract] OR (“deep learning”[Title/Abstract]) OR “text classif*”[Title/Abstract]))

Eligibility screening

Articles were de-duplicated and manually filtered to include only articles that met all of the following criteria:

  • Included a research question in the cancer domain

  • Methods included NLP of clinical text

  • Focused on English language text

  • Peer-reviewed original research

  • Not a review article

Data extraction

We extracted structured data elements from our final set of articles for three categories of data elements: NLP Methods, Data Characteristics, and Research Aims. For NLP methods, we identified the NLP algorithm stages as listed in Table 1, along with other details of the language model. For example, we recorded the names of base models when pre-trained transformer models were used and captured the architecture details. When a paper utilized more than one NLP method, we considered these as hybrid NLP systems and counted each method separately. In the Data Characteristics category, we gathered information on the types of clinical documents used, the use of multi-modal models, and the types of cancer represented in the data corpus. For Research Aims, we categorized the clinical outcomes that were the aims of the study and noted mentions or awareness of cancer registries.

Results

From an initial search of both Scopus and PubMed, 840 articles were found that were published in 2014 or later. After removing 235 duplicates, we were left with 605 articles that we screened for eligibility. Many of these articles (n=284) did not describe NLP methods on clinical text from cancer patients, but instead focused on discrete medical data. A smaller (n=87) but significant portion were deemed ineligible because they did not describe peer-reviewed original research, such as papers describing commercial products or challenges in adopting clinical NLP methods. The remaining articles that were excluded focused on applications of NLP to non-English text (n=52), or were review articles (n=26). Overall, 156 articles were left for data extraction and final analysis. Figure 1 shows an overall diagram of the screening and eligibility workflow that produced our final set of publications to analyze.
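The screening counts reported above can be reproduced arithmetically as a simple consistency check:

```python
# Consistency check of the PRISMA flow counts reported in the text.
identified = 840                 # initial Scopus + PubMed results
duplicates = 235
screened = identified - duplicates
excluded = {
    "no NLP on clinical text": 284,
    "not peer-reviewed original research": 87,
    "non-English text": 52,
    "review articles": 26,
}
included = screened - sum(excluded.values())
print(screened, included)        # 605 156
```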

Fig. 1.

Fig. 1

PRISMA diagram showing flow of articles through the screening process

Utilization of NLP methods

As shown in Fig. 2, we found that total NLP research on cancer clinical text since 2014 is spread evenly across rule-based (n=70), machine learning (n=66), and traditional deep learning (n=70) methods, with less than half as much research employing transformer-based architectures (n=29). Note that these numbers total more than the 156 articles in our final cohort since many studies utilized multiple NLP methods.

Fig. 2.

Fig. 2

Total number of articles by NLP method since 2014

Figure 3 shows that rule-based and machine learning approaches were the only methods utilized in the first three years of our study, from 2014 to 2016. These methods increased to a peak in 2019 and then slowly decreased in prevalence through the end of the study period in 2024. Traditional deep learning techniques first appeared in 2017, growing to the highest level of any method in 2021 before steadily decreasing each year thereafter. Transformer architectures appeared in 2019 and continued to increase, matching or exceeding the levels of other methods each year thereafter.

Fig. 3.

Fig. 3

Number of articles utilizing an NLP method in cancer clinical text by year

The chart in Fig. 4 shows that transformer-based architectures were identified in a total of 28 articles, with encoder-only models dominating the literature (n=22), followed by decoder-only (n=6) and encoder-decoder (n=3) variants with much lower counts. Of the six papers that utilized decoder-only architectures, four used radiology reports as clinical text and one used pathology reports. In these studies, two used ChatGPT to summarize radiology reports (Chung et al. 2023; Lyu et al. 2023), one used PubMedGPT to determine disease response from radiology reports (Tan et al. 2023), and one showed how to generate new radiology reports from images using the DistilGPT2 model (Alfarghaly et al. 2021). We found that GPT-3.5 and GPT-4 were used for information extraction (Huang et al. 2024) and for summarization, question answering, and information extraction of structured data from pathology reports (Truhn et al. 2024).

Fig. 4.

Fig. 4

Number of articles utilizing transformers by architecture

Sources of cancer clinical text

As shown in Fig. 5, 68% (n=106) of the articles in our survey reported using pathology report text to train their NLP models. Radiology reports were used in 33% (n=51), while progress notes and other clinical text appeared in only 23% (n=36) of articles. Cancer abstract text was rarely utilized, showing up in only two articles. A minority of research studies (n=28, 18%) used some combination of these text document types, shown in Fig. 6. An even smaller portion (6%) of the research articles we reviewed integrated natural language processing of clinical text with other types of non-textual data. Of the papers that included data types other than text, five looked at ways to combine text and imaging data, three used discrete lab values alongside narrative text, and one utilized genomic variants in addition to the text. Only three of these multi-modal studies relied on transformers, using either the original BERT encoder-only model (Wang et al. 2024a), the DistilGPT2 decoder-only model (Alfarghaly et al. 2021), or a customized architecture called Q-former (Wang et al. 2024b).

Fig. 5.

Fig. 5

Total number of articles by clinical document type

Fig. 6.

Fig. 6

Total number of articles by combinations of clinical document types

Cancer types

We recorded the cancer types that were represented in the text corpus of NLP studies as defined by the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) Program (Fritz et al. 2001). Counts of research articles by SEER cancer site are reported in Fig. 7. The top five most researched cancer sites were breast (n=42), all cancer (n=41), lung (n=35), digestive system (n=27), and male genital system (n=23).

Fig. 7.

Fig. 7

Total number of articles by SEER cancer site

Cancer research aims

Every research paper that was part of this survey had one or more cancer-research related aims, which were counted and reported in Fig. 8. The most common research aim was to determine whether or not a cancer diagnosis could be found in a clinical text document (60%, n=94). The second most common aim was to extract prognostic data items from clinical text (42%, n=65), such as staging and grade. Aims that focused on clinical decision support or determining the best treatment for a cancer patient were found in 41 articles, followed by those articles looking at disease risk (n=20) and then recurrence or progression of disease (n=10). A small number of articles (n=9) attempted to use NLP methods to assist with quality assurance or quality control of data collection, while another small cohort of studies (n=7) attempted to create research-ready datasets from NLP techniques. Only three articles were found with a research aim of patient or physician education and a single article attempted to use NLP for matching cancer patients to clinical trials.

Fig. 8.

Fig. 8

Total number of articles by cancer research aim

Discussion

When looking at counts of papers by NLP method over individual years, as shown in Fig. 3, several interesting patterns emerge. First, rule-based approaches showed significant utilization across all years of the study. Second, total annual publication counts are relatively low from 2014 to 2016. Then, in 2017, traditional deep learning methods start to appear, followed by transformers in 2019, along with an increase in the total annual volume of all NLP methods research on cancer clinical text. These observations follow the general historical trend of traditional deep learning NLP emerging after the publication of the Word2Vec paper (Mikolov et al. 2013) in 2013. Our research indicates that these deep learning methods applied to cancer clinical text started to increase four years after the Word2Vec paper, in 2017. One explanation for this late emergence, and for the general increase in cancer clinical text research around 2017, is that clinical text was historically difficult to obtain in digital form in the United States until the widespread adoption of electronic medical records around 2015 (Henry et al. 2016). In addition, the MIMIC-III dataset was released in 2016 (Johnson et al. 2016), consisting of over 2 million clinical notes and more than 500,000 radiology notes, all labeled with ICD-9 codes. Once clinical text was more readily available in digital form, NLP researchers could take advantage of the data for training traditional deep learning models.

Popularity of rule-based approaches

Rule-based approaches were well represented in both total papers and annual counts. When rule-based methods were mixed with other methods, two patterns emerged. First, rule-based methods are sometimes used as baseline comparators for more sophisticated methods (Rios et al. 2019; Banerjee et al. 2019; Mithun et al. 2023); notably, the rule-based approaches are sometimes reported to outperform the more sophisticated ones (Mithun et al. 2023). Second, rule-based approaches can be combined with machine learning and traditional deep learning methods to create a hybrid NLP system (Wang et al. 2020; Gauthier et al. 2022; Dai et al. 2021), or used both as a baseline comparison and as a component of a larger hybrid model (Huang et al. 2023). The only clear pattern that emerged regarding the selection of rule-based methods was that they were chosen for NLP tasks involving either information extraction or classification.

In about half of the papers that used rule-based methods (n=36), they were the only methods used, which may be a result of their interpretability and the general availability of software tools and high-level programming languages that can implement rule-based approaches. Rule-based NLP on clinical text applies human-derived rules that implement clinical domain knowledge to either extract or classify information from cancer clinical text. Another term for this approach is white-box algorithm development, where parameters are derived from explicit, human-crafted rules that can be inspected and modified through human curation. It stands in contrast to so-called black-box methods, such as deep learning, where the parameters of the model are derived automatically through complex mathematical processes that are not easily interpretable by humans. When the output of an algorithm is used for patient care, white-box methods that are easily understood and controlled by the clinician responsible for a patient will often be preferred. As a drawback, white-box models require manual intervention by clinical domain experts to generate and maintain domain-specific rules, which is time-consuming and burdensome for already overextended clinicians. Black-box NLP methods may be a better match for cancer registries, which are not responsible for patient care and are mostly concerned with data efficiency and reducing manual effort, relying instead on data quality audits to ensure the integrity of cancer registry data sets.

Traditional deep learning approaches in cancer text analysis

Traditional deep learning methods constituted a significant portion of the reviewed literature (n=70) for cancer-related text processing. The most common architectures were convolutional neural networks (CNNs) and recurrent neural networks (RNNs), including variants such as long short-term memory (LSTM) networks. In some cases, researchers combined CNN and LSTM layers to leverage both local features and long-range dependencies. For example, Alawad et al. (2020) trained a multi-task CNN to extract multiple tumor characteristics from pathology reports, while Mithun et al. (2023) leveraged BiLSTM-based architectures for lung cancer report classification. A smaller subset of papers explored other deep learning designs, such as custom feed-forward deep neural networks or autoencoder-based models, but these were far less common.

These traditional deep learning architectures proved well-suited to the complexities of cancer clinical text. In contrast to rule-based systems, CNN and RNN-like models can automatically learn rich latent representations of clinical language, reducing the need for extensive manual feature engineering. CNNs excel at capturing local textual patterns, such as identifying key phrases indicative of tumor attributes or clinical findings, which is particularly valuable in parsing cancer clinical text (Alawad et al. 2020). LSTM networks, conversely, are able to retain long-term context, interpreting clinical details across multiple sentences within documents. These strengths allowed many deep learning models to achieve superior performance compared to earlier methods on tasks such as entity extraction (Gao et al. 2018) and document classification (Yoon et al. 2017; Banerjee et al. 2019). However, the inherent complexity of these models can make their decision-making processes opaque, contrasting sharply with the interpretability of rule-based methods.

Distribution of transformer architectures

Transformer-based architectures began to appear in 2019, with their prevalence steadily increasing throughout the study period. Their emergence shortly after the publication of the original transformer paper (Vaswani et al. 2017), and their gradual increase in representation matching the overall commercial and research interest in transformers, is expected. However, the distribution of transformer architectures, reported in Fig. 4, was notable for its emphasis on encoder-only models and under-representation of decoder-only (n=6) and encoder-decoder (n=3) architectures. Since the development of GPT-3 in 2020, decoder-only models have been shown to perform well not just on generative tasks, but also on text classification and information extraction, through the use of few-shot learning (Brown et al. 2020). Fully supervised fine-tuning of large language models has demonstrated superior performance compared to encoder-only models (Hu et al. 2024). However, this improvement comes with an order of magnitude more compute time and significantly greater GPU resource demands, and the small performance gains may not always justify these substantial resource requirements. Other efforts demonstrate that it is still an open question whether decoder-only models can consistently outperform encoder-only or encoder-decoder models on these non-generative tasks (Chen et al. 2024).

Decoder-only models can perform well on generative tasks when the pretraining data consists of a massive amount of unlabeled text, and since 80% of all clinical data is unstructured (Sedlakova et al. 2023), we would expect robust representation of decoder-only models in NLP research focused on cancer clinical text. Their lack of prevalence in our study may be due to the problem of hallucinations, that is, the generation of plausible but incorrect or nonsensical responses from a GPT model. In a clinical context, generative responses need to be trusted and traceable to source information, something GPT models cannot guarantee without additional safeguards against hallucinations. The problem of protecting against hallucinations is unsolved, but it can be mitigated with techniques such as human-in-the-loop review, rule-based constraints, or Retrieval Augmented Generation (Lewis et al. 2020), which uses external, trusted information to ground generated responses in facts. Privacy is also a significant challenge in the deployment of decoder-only language models for clinical applications. These models require substantial hosting infrastructure that is often available only through commercial providers. Additionally, fine-tuning these models on clinical data poses privacy risks, as private health information may be inadvertently exposed through techniques such as membership inference attacks (Jagannatha et al. 2021; Lukas et al. 2023). To address these concerns, researchers have explored the use of large decoder-only language models for de-identifying clinical data prior to transmission to commercial hosting environments (Wiest et al. 2024).
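The grounding idea behind retrieval augmented generation can be illustrated with a minimal sketch. Here a toy word-overlap scorer stands in for the learned dense retriever used in real systems, and the report contents are illustrative, not drawn from any study in this survey:

```python
import re

def retrieve(query, documents, k=1):
    """Rank trusted source documents by word overlap with the query
    (a toy stand-in for the learned dense retriever used in practice)."""
    tokens = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def grounded_prompt(question, documents):
    """Build a prompt instructing the model to answer only from the
    retrieved source text, keeping responses traceable to trusted records."""
    context = "\n".join(retrieve(question, documents))
    return ("Answer using ONLY the source text below; if the answer is not "
            "present, say so.\n\nSource:\n" + context +
            "\n\nQuestion: " + question)

reports = [
    "Pathology: invasive ductal carcinoma, left breast, grade 2.",
    "Radiology: no evidence of pulmonary metastasis on chest CT.",
]
prompt = grounded_prompt("What is the grade of the breast carcinoma?", reports)
```

Because the generated answer is constrained to the retrieved source passage, a reviewer can trace any claim back to the original document, which is the property clinical deployments require.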

Encoder-only architectures were found in a majority of the research articles that used transformers in our survey (71%, n=22). These models typically excel when labeled text data are available and the NLP tasks of interest can be formulated as classification or named entity recognition. Every encoder-only model we encountered was based on the BERT architecture (Devlin et al. 2018), with most downstream NLP tasks fine-tuned using either pathology (n=10) or radiology (n=12) reports. Only two studies used neither pathology nor radiology reports, opting instead for clinical progress notes. Almost half of the encoder-only studies (n=10) started with a base model that was trained on clinical text, including ClinicalBERT (Huang et al. 2019), BioClinicalBERT (Alsentzer et al. 2019), RadBERT (Yan et al. 2022), and BlueBERT (Peng et al. 2019). Approximately half (n=12) started with either the base BERT model or variants pretrained on non-clinical biomedical text, such as BioBERT and PubMedBERT. One significant limitation of standard BERT-based models is their inability to process inputs longer than 512 WordPiece tokens, a limit imposed by the model's fixed-length positional embeddings. This limit, equivalent to roughly 400 words, is far smaller than the average length of a clinical text document, which typically exceeds this size (Gao et al. 2021). Although some encoder-only models attempt to address the context length weakness of BERT, such as Clinical-Longformer and Clinical-BigBird (Li et al. 2022), they were not found in our survey. When clinical text documents are first reviewed at a cancer registry, they often lack reliable labels that encoder-only models could use. However, once a cancer abstract is completed and linked to its clinical text source documents, the structured data in the abstract can serve as labels for the source text.
These labeled data can then be used for training a variety of NLP models, as demonstrated in the MOSSAIC project (Hsu et al. 2024).
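A common workaround for the 512-token limit splits a long report into overlapping windows that are classified separately and then aggregated into a document-level label. The sketch below uses whitespace tokens as a stand-in for WordPiece subwords; the window and stride sizes are illustrative:

```python
def chunk_tokens(tokens, max_len=512, stride=384):
    """Split a long token sequence into overlapping windows so each fits
    a BERT-style model's input limit; the overlap (max_len - stride)
    preserves context across window boundaries."""
    chunks = []
    for start in range(0, max(len(tokens) - max_len + stride, 1), stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return chunks

# Whitespace tokens stand in for WordPiece subwords here (1000 tokens).
report = ("specimen left breast core biopsy " * 200).split()
windows = chunk_tokens(report)
# Each window is model-sized; per-window predictions can then be
# aggregated (e.g., by majority vote) into a document-level label.
```

Libraries such as Hugging Face tokenizers provide the same behavior natively via stride and overflow options, but the mechanics are as above.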

Types of clinical documents

The large percentage of articles (68%) that reported using pathology reports is unsurprising since the first definitive diagnosis of cancer is often found in those documents. Radiology reports appeared in fewer than half as many articles as pathology reports, which is unfortunate since radiologic imaging is used as an early detector of many cancers, including the four most common: lung, prostate, breast, and colorectal. Narrative text in a radiology report can also be the most definitive source of a cancer diagnosis when surgical pathology is not feasible, as in brain, bone, and certain lung cancers. A primary goal of cancer registries is to record a definitive diagnosis for every cancer in a population, whether from pathology, radiology, or other clinical notes. Progress notes and other clinical text appeared in far fewer of the articles we reviewed, which seems unusual since these documents are abundant throughout the medical record. The rare utilization of cancer registry abstract text is expected due to the widespread use of abbreviations in this data source. These abbreviations may not be found in standard vocabularies and prove difficult to disambiguate. Overall, there were no clear patterns identified for how the selections of document types were made.

Multi-modal models

Very few research articles that utilized multi-modal data were found in our study, which is notable considering that detailed characterization of a cancer diagnosis through a combination of clinical text and other data modalities can be essential for finding the correct diagnosis and the right cancer treatment. As mentioned previously, radiologic imaging in particular is often the only feasible method of cancer diagnosis and characterization for certain tumors. Moreover, these medical images are always accompanied by the narrative text of a radiology report. This scarcity of multi-modal research may be due to the complexity of developing and refining models that combine text and images in a clinical setting, a challenge that would be amplified in an under-resourced cancer registry.
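One common way such models combine modalities is late fusion: each modality is encoded separately and the resulting feature vectors are merged before a final prediction. The sketch below is a minimal illustration with hand-picked numbers; real systems would learn the weights and use CNN or transformer encoders to produce the features:

```python
def late_fusion(image_features, text_features, weights):
    """Late fusion: concatenate per-modality feature vectors and apply a
    single linear scorer. Real systems learn the weights and obtain the
    features from trained image and text encoders."""
    fused = image_features + text_features
    assert len(weights) == len(fused)
    return sum(w * x for w, x in zip(weights, fused))

# Illustrative pooled embeddings (dimensions chosen for readability).
img = [0.2, 0.7, 0.1]   # e.g., output of a CT-image encoder
txt = [0.9, 0.3]        # e.g., output of a radiology-report encoder
weights = [0.5, -0.2, 0.1, 0.4, 0.3]
score = late_fusion(img, txt, weights)  # a single diagnosis/risk score
```

The appeal of late fusion is modularity: the text encoder can be developed and validated on registry reports independently of the imaging pipeline, then combined at this final scoring step.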

Representation of cancer types

The cancer sites most abundantly represented in the articles of our survey cover the top four most common cancers in the U.S. in 2024 (https://seer.cancer.gov/statfacts/html/common.html). However, there was a striking lack of research dedicated to some of the deadliest, lower-incidence cancers. For example, pediatric cancer was addressed in only two articles (Yoon et al. 2022; Huang et al. 2024), even though it is a leading cause of death by disease among children and adolescents (Siegel et al. 2023). Melanoma was found in one study (Malke et al. 2019), even though it is one of the deadliest cancers and can be effectively treated if caught early (Mishra et al. 2018). Lymphomas appeared in only one article (Luo et al. 2014), despite non-Hodgkin lymphoma having the sixth highest cancer mortality (Thandra et al. 2021). These deadly cancers were possibly included in the studies that examined all cancer sites, but it is difficult to tell how well represented they were in the data. Since central cancer registries are tasked with collecting data that represents an entire population, it is essential that they are able to accurately record incidence and mortality for rare, low-incidence cancers, in addition to the most prevalent ones.

Research aims of NLP methods

The most common cancer research aims found in our study, determining diagnosis and prognosis, represent the most important structured data items that anchor a cancer abstract in a cancer registry. It is surprising, then, that of the 115 research articles whose research aim is either diagnosis or prognosis, 43% (n=50) do not mention cancer registries.

One of the few ways that a cancer patient can gain access to cutting-edge therapies is through enrollment in a clinical trial. It is of great concern, therefore, that only one article we reviewed had the research aim of matching eligible patients to clinical trials. In addition, a relatively small number of articles were devoted to determining disease progression, metastasis, or recurrence (n=10). These markers of cancer progression are critical for tracking the effectiveness of therapies and finding new tumors early enough to treat.

One research aim that shows significant potential for improvement through advances in NLP methods is patient and physician communication. In particular, decoder-only transformer models excel at the summarization and question-answering capabilities needed to provide patients and caregivers with clear, simplified explanations of complex medical scenarios. Our review identified only two articles employing decoder-only models to summarize patient information (Chung et al. 2023; Lyu et al. 2023), and only one article using a combination of rule-based and machine learning methods for visualization of patient data (Yuan et al. 2020).

Concluding remarks

Every state in the United States has a centralized cancer registry, tasked with counting and monitoring the incident cases of cancer in the population with the ultimate goal of reducing the burden of this disease. Central cancer registries are perhaps the only institutions with the legal authority to access cancer-reportable data on entire populations of people, crossing the boundaries of local hospital systems. All cancer registries rely on manual abstraction of cancer data from clinical text, which becomes more difficult over time as our understanding of cancer continues to evolve. Cancer registries cannot keep pace with the task of manually turning unstructured narrative text into an ever-increasing catalog of structured data elements needed for timely research. On the clinical side, even with the widespread adoption of electronic medical records, narrative text remains the primary mode of diagnosis, prognosis, therapy determination, and continued monitoring of cancer patients. Moreover, the burden of cancer on society continues to grow, a fact we know in large part because of the data collected by cancer registries. With the efficiencies of NLP methods applied to cancer clinical text, cancer registries could collect a more complete set of structured data elements for research in a timely manner, and even provide new ways to take advantage of the data, as outlined below.

Missed opportunities with transformer models

Although transformer-based architectures have been increasingly applied to the cancer domain, the research has mostly focused on encoder-only BERT-derived models, as shown in Fig. 4. These models require the input text to be relatively short compared to the average length of clinical text documents, and therefore may be missing significant textual context that could boost the performance of NLP tasks. More research should be done to apply long-context encoder-only models to cancer clinical text, particularly in cancer registry settings where the clinical text spans multiple visits and is likely to exceed traditional encoder-only context lengths. In addition, decoder-only models are severely underrepresented in the cancer clinical text literature, and when they do appear, they tend to utilize only radiology reports. These models do not need massive amounts of labeled text for task-specific fine-tuning, and since most medical text is unlabeled, this feature alone should warrant increased interest. In fact, one research paper (Huang et al. 2024) found that it was feasible to use ChatGPT on cancer pathology reports, “...to extract data from free text without extensive task-specific human annotation and model training”. More attention should be paid to the few-shot in-context learning ability that has been shown to arise in decoder-only models, specifically as it applies to structured data extraction from cancer clinical text. Moreover, the language variations used in clinical text across multiple providers can be substantial (Rios et al. 2019), confounding NLP methods that rely on fine-tuning with labeled examples. In the absence of sufficient amounts of labeled text, a generalized decoder-only model with in-context learning might be a good solution for accurately extracting structured data across the longitudinal history of a cancer patient available at a cancer registry.
With better data extraction methodologies, cancer registries could direct manual labor efforts more toward data analysis, quality assurance, and data dissemination.
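Few-shot in-context extraction amounts to showing the model a handful of report-to-structure mappings before the new report. The sketch below builds such a prompt; the field names, example reports, and labels are illustrative inventions, not drawn from any study in this survey:

```python
import json

# Hypothetical labeled examples; in practice these might come from
# completed registry abstracts linked to their source reports.
EXAMPLES = [
    ("Invasive ductal carcinoma, left breast, 2.1 cm, ER positive.",
     {"site": "breast", "histology": "invasive ductal carcinoma",
      "laterality": "left"}),
    ("Adenocarcinoma of the sigmoid colon, moderately differentiated.",
     {"site": "colon", "histology": "adenocarcinoma",
      "laterality": "not applicable"}),
]

def build_few_shot_prompt(report):
    """Compose a few-shot prompt: each labeled example demonstrates the
    report-to-JSON mapping, and the new report is appended for the model
    to complete in the same format."""
    parts = ["Extract site, histology, and laterality as JSON.\n"]
    for text, fields in EXAMPLES:
        parts.append(f"Report: {text}\nJSON: {json.dumps(fields)}\n")
    parts.append(f"Report: {report}\nJSON:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Squamous cell carcinoma of the right lung, upper lobe.")
# `prompt` would then be sent to a decoder-only model, and the
# completion parsed with json.loads for validation against the schema.
```

No gradient updates are involved: only the prompt changes, which is why this approach is attractive when labeled registry text is scarce.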

Given these considerations, there remains significant potential to expand transformer-based solutions for oncology practice, particularly through decoder-only architectures and longer context encoder models. Clinical oncology teams could directly incorporate transformer-based NLP tools within patient care workflows by automating the summarization and extraction of essential clinical information such as diagnoses, cancer staging, and treatment history from complex narrative texts. Establishing closer collaborations between oncology practices and cancer registries would facilitate the refinement of these models using extensive, registry-curated datasets, ultimately enhancing real-time decision-making capabilities and improving patient care outcomes.

Enabling precision oncology with multi-modal models

The advent of next generation sequencing technologies and other advanced molecular profiling techniques has revolutionized the field of precision oncology (Malone et al. 2020). These new technologies, however, produce orders of magnitude more data about a cancer patient that supplement, rather than replace, the clinical narrative text used for cancer diagnosis, staging, and treatment determination. Comprehensive characterization of a cancer diagnosis in the age of precision oncology requires an awareness and integration of clinical text with advanced profiling test results, a problem where multi-modal models could provide a solution. Our survey shows a severe lack of multi-modal approaches utilizing NLP methods for cancer characterization. As cancer registries become a repository for both clinical text and molecular profiling test results, they could provide the most comprehensive collection of multi-modal data about a cancer patient, ripe for training, validation, and testing of multi-modal models.

Enhancing cancer screening with multi-modal models

Lung cancer remains the deadliest cancer and the third most common in the U.S. in 2024 (https://seer.cancer.gov/statfacts/html/common.html), a distinction it has held for many decades. Studies have shown that radiologic image-based screening can detect lung cancer at an early stage and reduce its mortality (Gierada et al. 2020). Multi-modal models are ideal for integrating image and clinical text data as a way to bring efficiencies and greater accuracy to these screening efforts. Cancer registries are often positioned in public health departments or academic medical centers that implement cancer screening efforts and could facilitate the acquisition, linkage, and integration of these data to train and test multi-modal models. Once developed, these models would provide an automated, efficient way for cancer registries to rapidly identify incidence and mortality trends for the deadliest cancer in the U.S.

Such integration of multi-modal data represents an underexplored opportunity for advancing cancer detection, particularly in lung cancer screening. To translate these multi-modal capabilities into meaningful clinical benefits, oncology clinics could directly embed combined image-and-text NLP models into routine cancer screening workflows, enabling swift identification of at-risk individuals and streamlining subsequent diagnostic steps. Developing structured partnerships between oncology practices, radiology departments, and cancer registries would further support continuous validation and refinement of these multi-modal tools with population-based data, thus ensuring reliability and practical utility in clinical settings.

Understudied cancers

Cancer registries collect data on all cancer types, which can prove difficult for rare cancers or cancers that have not been studied adequately. Our research highlights a lack of attention to pediatric cancers, non-Hodgkin lymphoma, and melanomas in NLP methods research. This deficiency is troubling since these cancers are either leading causes of death, in the case of pediatric cancers, or highly prevalent and deadly, in the cases of melanoma and non-Hodgkin lymphoma. Because these cancers are under-represented, they often lack sufficient labeled training data required for traditional machine learning and deep learning methods. On the other hand, decoder-only models that can perform well with little or no labeled data could help identify and characterize these cancers. Cancer registries are uniquely positioned to have access to a population-based data set that includes rare and understudied cancers across medical providers, ripe for utilization by these techniques.

Addressing this critical gap, transformer-based methods, especially those capable of handling missing or limited labeled data, offer a promising approach for improving the reliable capture of understudied cancer data. In clinical oncology practice, implementing these NLP techniques could expedite the identification and characterization of challenging or rare cancer types, such as pediatric malignancies, melanoma, and non-Hodgkin lymphoma, facilitating earlier diagnosis and targeted treatment initiation. By leveraging comprehensive registry-curated datasets tailored to these specific cancers, oncology practitioners could gain immediate access to powerful NLP tools that directly enhance clinical decision-making, patient stratification, and ultimately, treatment outcomes for these vulnerable patient populations.

Novel uses of cancer registries

Very few of the research articles we reviewed focused on patient/caregiver communication, disease progression, or clinical trial matching for cancer patients. These research aims all share a need for data about a patient that sometimes crosses hospital system boundaries, leading to collections of clinical text documents with incompatible formats and varying clinical language. Since central cancer registries can get a complete picture of a cancer patient's journey across healthcare systems, they may be in the best position to help with these needs, taking advantage of NLP methods that can efficiently synthesize disparate data sources. In particular, long-context transformer models that excel at information extraction could provide this synthesis across clinical documents in order to extract key data elements that would help match patients to potentially life-saving drugs in clinical trials. Also, GPT models with appropriate safeguards against hallucinations could be used to provide summaries of cancer diagnoses in language that patients and caregivers can understand, leading to improved quality of life and patient outcomes. And finally, GPT models that can summarize patient trajectories across medical encounters could help track disease progression, metastasis, and recurrence for clinical decision support. These new ways of utilizing data available to cancer registries cannot be realized through more manual abstraction; they must instead be developed as efficient, robust processes that take advantage of state-of-the-art NLP methods.

Limitations

This survey was an attempt to capture the general trends of NLP methods development in the literature for cancer clinical text, not a systematic review. As a result, our work had several limitations worth noting. First, in order to contain the scope of this work, we purposely excluded research articles that were not in the cancer domain or focused on non-English text. Second, we limited our search engines to Scopus and PubMed, leaving out other search engines and preprint servers. Third, our keyword search was human-engineered and could have missed research articles through omission of search terms. For example, adding the phrases “Machine Learning” and “Clinical Report” returned too many articles that had nothing to do with text processing and instead focused mostly on structured data, which is out of scope for this survey. And finally, we did not perform a systematic extraction of the NLP task or performance metrics from each paper. We nevertheless emphasize that, due to the lack of benchmarks with public datasets, performance metrics from different papers are generally not comparable, and direct comparisons could lead to unfair conclusions.

Supplementary Information

Below is the link to the electronic supplementary material.

Acknowledgements

This work is supported by the U.S. NIH National Cancer Institute through grant P30CA177558. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author contributions

IH and RK conceived the study. IH did the literature review and wrote the first draft, which was revised by both authors. RK advised in scoping and framing the review.

Data availability

No datasets were generated or analysed during the current study.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

7/22/2025

Acknowledgements was modified.

References

1. Alawad M, Gao S, Qiu JX et al (2020) Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J Am Med Inform Assoc 27(1):89–98
2. Alfarghaly O, Khaled R, Elkorany A et al (2021) Automated radiology report generation using conditioned transformers. Inf Med Unlocked 24:100557
3. Alsentzer E, Murphy JR, Boag W, et al (2019) Publicly available clinical BERT embeddings. In: Proceedings of the 2nd clinical natural language processing workshop, Association for Computational Linguistics, pp 72–78. arXiv:1904.03323
4. Banerjee I, Bozkurt S, Caswell-Jin JL et al (2019) Natural language processing approaches to detect the timeline of metastatic recurrence of breast cancer. JCO Clin Cancer Inf 3:1–12
5. Bitterman DS, Miller TA, Mak RH et al (2021) Clinical natural language processing for radiation oncology: a review and practical primer. Int J Radiat Oncol Biol Phys 110(3):641–655. 10.1016/j.ijrobp.2021.01.044
6. Brown PF, Cocke J, Della Pietra SA et al (1990) A statistical approach to machine translation. Comput Linguist 16(2):79–85
7. Brown T, Mann B, Ryder N et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
8. Chen S, Li Y, Lu S et al (2024) Evaluating the ChatGPT family of models for biomedical reasoning and classification. J Am Med Inform Assoc 31(4):940–948
9. Chomsky N (1957) Syntactic structures. De Gruyter Mouton, Berlin
10. Chung EM, Zhang SC, Nguyen AT et al (2023) Feasibility and acceptability of ChatGPT generated radiology report summaries for cancer patients. Digital Health 9:20552076231221620
11. Dai HJ, Yang YH, Wang TH et al (2021) Cancer registry coding via hybrid neural symbolic systems in the cross-hospital setting. IEEE Access 9:112081–112096
12. Devlin J, Chang MW, Lee K, et al (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
13. Friedman C, Alderson PO, Austin JH et al (1994) A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1(2):161–174
14. Fritz AG, Rhit C, Hurlbut AA et al (2001) SEER summary staging manual 2000: codes and coding instructions. National Cancer Institute, Bethesda, p 1
15. Gao S, Young MT, Qiu JX et al (2018) Hierarchical attention networks for information extraction from cancer pathology reports. J Am Med Inform Assoc 25(3):321–330
16. Gao S, Alawad M, Young MT et al (2021) Limitations of transformers on clinical text classification. IEEE J Biomed Health Inform 25(9):3596–3607
17. Gauthier MP, Law JH, Le LW et al (2022) Automating access to real-world evidence. JTO Clin Res Rep 3(6):100340
18. Gierada DS, Black WC, Chiles C et al (2020) Low-dose CT screening for lung cancer: evidence from 2 decades of study. Radiol: Imaging Cancer 2(2):e190058
19. Gu Y, Tinn R, Cheng H et al (2021) Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc (HEALTH) 3(1):1–23
20. Henry J, Pylypchuk Y, Searcy T et al (2016) Adoption of electronic health record systems among US non-federal acute care hospitals: 2008–2015. ONC Data Brief 35(35):2008–15
21. Hsu E, Hanson H, Coyle L et al (2024) Machine learning and deep learning tools for the automated capture of cancer surveillance data. JNCI Monographs 65:145–151
22. Huang K, Altosaar J, Ranganath R (2019) ClinicalBERT: modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342
23. Huang H, Lim FXY, Gu GT et al (2023) Natural language processing in urology: automated extraction of clinical information from histopathology reports of uro-oncology procedures. Heliyon 9(4):e14793
24. Huang J, Yang DM, Rong R et al (2024) A critical assessment of using ChatGPT for extracting structured data from clinical notes. NPJ Digit Med 7(1):106
25. Hu Y, Zuo X, Zhou Y, et al (2024) Information extraction from clinical notes: are we ready to switch to large language models? arXiv preprint arXiv:2411.10020
26. Jagannatha A, Rawat BPS, Yu H (2021) Membership inference attack susceptibility of clinical language models. arXiv preprint arXiv:2104.08305
27. Johnson AE, Pollard TJ, Shen L et al (2016) MIMIC-III, a freely accessible critical care database. Sci Data 3(1):1–9
28. López-Úbeda P, Martín-Noguerol T, Aneiros-Fernández J, Luna A et al (2022) Natural language processing in pathology. Am J Pathol 192(11):1486–1495. 10.1016/j.ajpath.2022.07.012
29. Lewis P, Perez E, Piktus A et al (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459–9474
30. Lin H, Ni L, Phuong C et al (2024) Natural language processing for radiation oncology: personalizing treatment pathways. Pharmgenomics Pers Med 17:65–76. 10.2147/PGPM.S396971
31. Li Y, Wehbe RM, Ahmad FS, et al (2022) Clinical-Longformer and Clinical-BigBird: transformers for long clinical sequences. arXiv preprint arXiv:2201.11838
32. Lukas N, Salem A, Sim R, et al (2023) Analyzing leakage of personally identifiable information in language models. In: 2023 IEEE symposium on security and privacy (SP), IEEE, pp 346–363
33. Luo Y, Sohani AR, Hochberg EP et al (2014) Automatic lymphoma classification with sentence subgraph mining from pathology reports. J Am Med Inform Assoc 21(5):824–832
34. Lyu Q, Tan J, Zapadka ME et al (2023) Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential. Vis Comput Ind Biomed Art 6(1):9
35. Malke JC, Jin S, Camp SP et al (2019) Enhancing case capture, quality, and completeness of primary melanoma pathology records via natural language processing. JCO Clin Cancer Inf 3:1–11
36. Malone ER, Oliva M, Sabatini PJ et al (2020) Molecular profiling for precision cancer therapies. Genome Med 12:1–19
37. Mikolov T, Chen K, Corrado G, et al (2013) Efficient estimation of word representations in vector space. In: Proceedings of the international conference on learning representations (ICLR)
38. Mishra H, Mishra PK, Ekielski A et al (2018) Melanoma treatment: from conventional to nanotechnology. J Cancer Res Clin Oncol 144:2283–2302
39. Mithun S, Jha AK, Sherkhane UB et al (2023) Clinical concept-based radiology reports classification pipeline for lung carcinoma. J Digit Imaging 36(3):812–826
40. Page MJ, McKenzie JE, Bossuyt PM et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71
41. Peng Y, Yan S, Lu Z (2019) Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In: Proceedings of the 2019 workshop on biomedical natural language processing (BioNLP 2019), Association for Computational Linguistics, pp 58–65. arXiv:1906.05474
42. Raffel C, Shazeer N, Roberts A et al (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67
43. Rios A, Durbin EB, Hands I et al (2019) Cross-registry neural domain adaptation to extract mutational test results from pathology reports. J Biomed Inform 97:103267
44. Saha A, Burns L, Kulkarni A (2023) A scoping review of natural language processing of radiology reports in breast cancer. Front Oncol. 10.3389/fonc.2023.1160167
45. Santos T, Tariq A, Gichoya JW et al (2022) Automatic classification of cancer pathology reports: a systematic review. J Pathol Inf 13:100003. 10.1016/j.jpi.2022.100003
46. Savova GK, Danciu I, Alamudun F et al (2019) Use of natural language processing to extract clinical cancer phenotypes from electronic medical records. Can Res 79(21):5463–5470. 10.1158/0008-5472.CAN-19-0579
47. Sedlakova J, Daniore P, Horn Wintsch A et al (2023) Challenges and best practices for digital unstructured data enrichment in health research: a systematic narrative review. PLOS Digit Health 2(10):e0000347
48. Siegel DA, King JB, Lupo PJ et al (2023) Counts, incidence rates, and trends of pediatric cancer in the United States, 2003–2019. JNCI: J Natl Cancer Inst 115(11):1337–1354
49. Taira RK, Soderland SG, Jakobovits RM (2001) Automatic structuring of radiology free-text reports. Radiographics 21(1):237–245
50. Tan RSYC, Lin Q, Low GH et al (2023) Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting. J Am Med Inform Assoc 30(10):1657–1664
51. Tangka FKL, Edwards P, Pordell P et al (2021) Factors affecting the adoption of electronic data reporting and outcomes among selected central cancer registries of the national program of cancer registries. JCO Clin Cancer Inf 5:921–932. 10.1200/CCI.21.00083
52. Thandra KC, Barsouk A, Saginala K et al (2021) Epidemiology of non-Hodgkin's lymphoma. Med Sci 9(1):5
53. Truhn D, Loeffler CM, Müller-Franzes G et al (2024) Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol 262(3):310–319
54. Tucker TC, Durbin EB, McDowell JK et al (2019) Unlocking the potential of population-based cancer registries. Cancer 125(21):3729–3737. 10.1002/cncr.32355
55. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30
56. Wang H, Li Y, Khan SA et al (2020) Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network. Artif Intell Med 110:101977
57. Wang H, Wu Y, Sun M et al (2024) Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study. Sci Rep 14(1):8731
58. Wang Z, Zheng C, Han X et al (2024) An innovative and efficient diagnostic prediction flow for head and neck cancer: a deep learning approach for multi-modal survival analysis prediction based on text and multi-center PET/CT images. Diagnostics 14(4):448
59. White MC, Babcock F, Hayes NS et al (2017) The history and use of cancer registry data by public health cancer control programs in the United States. Cancer 123:4969–4976
60. Wiest IC, Leßmann ME, Wolf F, et al (2024) Anonymizing medical documents with local, privacy preserving large language models: the LLM-Anonymizer. medRxiv pp 2024–06
61. Wormeli P, Mazreku J, Pine J et al (2021) Next generation of central cancer registries. JCO Clin Cancer Inf 5:288–294
62. Yan A, McAuley J, Lu X et al (2022) RadBERT: adapting transformer-based language models to radiology. Radiol: Artif Intell 4(4):e210258
63. Yoon HJ, Peluso A, Durbin EB et al (2022) Automatic information extraction from childhood cancer pathology reports. JAMIA Open 5(2):ooac049
  64. Yoon HJ, Ramanathan A, Tourassi G (2017) Multi-task deep neural networks for automated extraction of primary site and laterality information from cancer pathology reports. In: Advances in big data: proceedings of the 2nd INNS conference on big data, October 23–25, 2016, Thessaloniki, Greece 2, Springer, pp 195–204
  65. Yuan Z, Finan S, Warner J et al (2020) Interactive exploration of longitudinal cancer patient histories extracted from clinical text. JCO Clin Cancer Inf 4:412–420 [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Zhang X, Zhang Y, Zhang Q et al (2019) Extracting comprehensive clinical information for breast cancer using deep learning methods. Int J Med Inf 132:103985 [DOI] [PubMed] [Google Scholar]

Associated Data



Data Availability Statement

No datasets were generated or analysed during the current study.


Articles from Artificial Intelligence Review are provided here courtesy of Springer
