Abstract
Digitized healthcare data, high-throughput profiling technologies, and data repositories have facilitated the emergence of a new era of cancer research. Each data stream requires specialized analysis methods for interpretation. The data-driven era of cancer research requires the development, enhancement, and sustainment of informatics technology software infrastructure, including fundamental methodology development in artificial intelligence and data science. We review current and emerging informatics technology developments for cancer research and discovery, spanning molecular and cellular characterizations, image analysis, informatics, and therapeutics. Summarizing the diverse methods and applications of informatics throughout cancer research identifies themes and emerging areas for the next generation of cancer research.
Introduction
Tumors are dynamic, heterogeneous tissues composed of distinct cellular subpopulations containing cells with evolving clonal identities. These clonal identities depend not only on the epigenetic, genetic, transcriptomic, and proteomic characteristics of the tumor itself but also on its contextual relationship to the surrounding tumor microenvironment (TME) and to its host (1). Advances in sequencing, proteomics, single-cell analysis, and imaging technologies have revolutionized our ability to characterize the multiscale regulatory programs that underlie cancer. The resulting insights into the cellular and molecular pathways relevant to cancer pathophysiology are being leveraged to develop personalized treatment strategies for patients. Still, the complexity of tumor evolution and the broader impact of clinical, familial, and socioeconomic factors can modulate treatment response and disease progression (2). The modern digital age provides a wealth of data that lend researchers and clinicians the potential to unravel the complexity of tumors across molecular, cellular, and population scales while meeting new challenges in data standardization, high-throughput analysis, and clinical interpretability and applicability of informatics models (3). As a result, informatics sciences inform both clinical decisions and basic science research.
An unprecedented expansion of repositories of high-throughput data of patients with cancer, containing genomics, imaging, medical informatics, network biology, and population science data (among other data types) has facilitated data-driven discoveries. At the same time, the interrogation of multimodal cancer data in large and diverse patient populations for the extraction of biologically and clinically relevant insights demands advancements in traditional data analysis methods and multiomics data integration approaches (4, 5). Extracting meaningful insights from these data requires the development, enhancement, and sustainment of informatics software infrastructure, including data analysis and artificial intelligence (AI)-based tools (6). Analysis of each data modality requires numerous and distinct algorithms and may require further customized methodology development to address specific biological or clinical questions. The discipline of cancer informatics has supplied numerous methods and algorithms for specific analysis goals, which can then be leveraged for biomedical research at large by encoding them into reusable software. Whereas AI is a critical contributor to the field, holistic cancer informatics relies on a diversity of methods and suite of tools integrated into standardized, broad analysis pipelines. Such pipelines are needed to preprocess, normalize, and facilitate interpretation of each dataset; thus, workflow engines can be used to automate these processes and promote standardization and reproducibility (7). Increasing automation of informatics pipelines and AI tools for development are democratizing access to informatics in the cancer research community, further supported by the widespread culture of open-source software underlying these analyses. However, informatics knowledge and evaluation criteria are needed to determine the suitability of independent algorithms for specific analysis tasks. 
Further partnership with bench and clinical investigators in evaluating the output of such pipelines enables the cancer research community to distill multiscale data into biological and clinical insights (8). Bioinformatics software thus plays an integral role in basic and translational cancer research and patient care.
In this review, we focus on providing a survey of computational methods and software available across informatics domains and the cancer continuum. We disentangle unique methods and describe the relative utility of distinct methodologies and pipelines for each of the diverse scientific and computational domains that comprise the discipline of cancer informatics. We discuss novel algorithms and the criteria for their selection for use at multiple stages of prognostication, treatment selection and personalization, and longitudinal assessment (Fig. 1). Mirroring a patient’s journey, our review spans the analysis of electronic health record (EHR) information, imaging data (radiologic, histologic, and spatial molecular profiling), and raw data from patient biospecimens (genomic, epigenomic, transcriptomic, and proteomic). We review methods for sample use cases leveraging the integration and translation of quantitative results to the clinic with impacts in precision oncology, disease monitoring, and identification of novel immunotherapies as we follow the path of the patient with cancer from diagnosis to treatment. Although taking a translational focus, we note that these methods also empower basic science and close with a discussion of good practices for data and pipeline standardization, knowledge transfer, and data visualization and highlight emerging areas of impact throughout cancer research and discovery. This is the first review to our knowledge to address each class of method in cancer informatics, spanning methods to analyze data at all resolutions (from molecular to population-wide). Moreover, our review spans applications of these methods for research projects and data streams both in and out of the clinic. We provide a broad yet detailed and up-to-date road map of all components of cancer informatics that eventually contribute to the advancement of scientific knowledge in cancer biology and the development of patient-specific therapies.
Figure 1.
Hallmarks of cancer informatics span computational methods and software tools across the array of digitized data leveraged in modern cancer research. Created in BioRender. Noller, K. (2025) https://BioRender.com/1ph6pke.
Computational Approaches Transform Digital Cancer Data into Clinical and Biological Insights
Modern cancer research leverages a breadth of high-throughput measurements spanning population, tissue, cellular, and molecular scales. The health system and clinical records provide a multitude of measurements from laboratory tests, symptoms, therapeutics, patient characteristics, imaging, and clinical genomics to characterize a patient. Molecular and cellular technologies enable more in-depth mechanistic analysis and are applicable broadly in patient biospecimens and preclinical models for basic science investigation. Profiling of patients with cancer is limited to the measurements and profiling of peripheral, biopsy, and surgical biospecimens that are feasible to obtain clinically. Biological models such as murine tumors, organoids, and cell lines can enhance the depth of characterization and even explore the impact of perturbation and longitudinal changes with many of the same molecular and cellular measurement technologies that can be applied to characterize human biospecimens (Fig. 2A). Although computational pipelines are diverse and tailored for each technology and analysis goal, they follow a similar basic workflow for almost every data type (Fig. 2B). First, raw data are preprocessed and normalized to generate high-quality information. Next, these processed data are subject either to statistical testing to compare sample groups or machine learning (ML) and AI methods to develop classifiers aimed at a similar analysis goal. Finally, pipelines leverage data visualization of analysis results to aid in clinically actionable or biologically meaningful interpretation (Fig. 2B). Each data modality has different benchmarks to quantitatively evaluate the performance of the computational methods and tools used for analysis; their reliance on open-source software enables broad evaluation and improvement by users throughout the informatics community. 
This section summarizes the software and methods for several common technologies used in modern cancer research.
Figure 2.
Overview of lifecycle of informatics in cancer research. A, Multiscale data spanning clinical records, imaging representations, and bulk and single-cell multiomics and proteomics assays now comprehensively characterize human tumors and preclinical models. B, Interpretation of these data into biological mechanism and biomarkers of therapeutic response or tumor progression requires technology-specific informatics analysis workflows. For all technologies, these workflows have a common structure of data digitization, preprocessing, and normalization; comparison between groups; visualization; and mechanistic modeling or ML to predict outcomes. Created in BioRender. Noller, K. (2025) https://BioRender.com/m18ljuu.
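The common workflow structure described above (normalization followed by a comparison between groups) can be illustrated with a minimal sketch. The gene names, counts, and function names below are hypothetical and chosen only for illustration; the sketch uses library-size normalization and Welch's t statistic as stand-ins for the technology-specific methods discussed in this review, and it omits the visualization and multiple-testing steps that a real pipeline would include.

```python
import math
from statistics import mean, variance


def log_cpm(counts, pseudocount=1.0):
    """Normalize one sample's raw counts to log2 counts per million."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    va, vb = variance(group_a), variance(group_b)
    se = math.sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical raw counts: one list per sample, one entry per gene.
genes = ["GENE_A", "GENE_B", "GENE_C"]
tumor = [[120, 30, 850], [150, 25, 825], [135, 35, 830]]
normal = [[60, 32, 908], [55, 28, 917], [70, 30, 900]]

# Step 1: preprocess and normalize each sample.
tumor_norm = [log_cpm(sample) for sample in tumor]
normal_norm = [log_cpm(sample) for sample in normal]

# Step 2: compare the two sample groups gene by gene.
for i, gene in enumerate(genes):
    t_stat = welch_t([s[i] for s in tumor_norm], [s[i] for s in normal_norm])
    print(f"{gene}: Welch t = {t_stat:.2f}")
```

In practice, each of these steps would be replaced by a dedicated, benchmarked package for the data modality at hand.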
Clinical informatics
Clinical informatics encompasses data residing in EHRs, including both structured components such as International Classification of Diseases-9/10 codes and medication orders and unstructured components such as clinical report text. As precision medicine strategies in cancer expand and become more common, clinical sequencing and imaging are also increasingly incorporated into these records. One of the primary challenges in working with clinical data is extracting the text of clinical reports and curating it into categorical data suitable for analysis. Manual extraction is painstaking, making informatics tools essential for automating this process.
Since 2015, the ML community has steadily moved into deep learning (DL) methodologies that focus on artificial neural networks with multiple layers of neurons inspired by the way the human brain processes information and that were made feasible by the convergence of plentiful digital data and compute power (9). The field of AI and more specifically natural language processing (NLP) was further reshaped by the introduction of the attention mechanism in a transformer architecture (arXiv 1706.03762). Encoder models, such as Bidirectional Encoder Representations from Transformers (BERT), were adapted for the automated extraction of clinical information into more standardized data formats (arXiv 1810.04805). This advancement has led to improved prognostic prediction and treatment recommendations. Scaling of input data and compute resources has greatly enhanced the capabilities of informatics tools for clinical applications. For example, NLP facilitates the extraction of radiotherapy events (10), whereas the Cancer Deep Phenotype Extraction from electronic medical records (EMR; ref. 11) project combines NLP, summarization, data models, and visual analytics to help understand complex cancer cases. The Cancer Deep Phenotype Extraction from EMR project has clinical applications, including cancer surveillance, cancer registrar case abstraction, and an emerging task that aims to extract chemotherapy treatment timelines from the clinical narratives in EHRs (12). This project’s extensibility underscores the importance of embedding clinical expertise into the development of informatics methods and software.
Advancements in clinical informatics extend to decision support tools, such as methods of identifying population health management cohorts (13), resources for matching patients to phase I or II clinical trials (14), and traditional ML methods for emergency settings and triaging (15). Other advancements include neural methods that identify social determinants of health in the clinical narrative (16) and statistical and neural transformer-based models that identify side effects in patients with lung cancer receiving radiotherapy (17).
NLP and AI are undergoing rapid advances with the current large language models (LLM) trained on massive amounts of general-purpose text data. LLMs with 8B+ parameters have been shown to exhibit emergent abilities (arXiv 2206.07682), thus opening the door to their usability in prompting settings including cancer research (bioRxiv 2025.01.28.634527). Recent LLMs such as DeepSeek-R1 (arXiv 2501.12948) and Mistral/Mixtral (arXiv 2401.04088v1) are an undeniable technological advancement; however, there are still open research, practical, ethical, and regulatory considerations relating to their wide application in healthcare (including cancer research), in which safety and patient privacy are of utmost importance, and hence many non-LLM clinical text extraction tools are still in use. A recent study reports that LLMs struggle even with simple classification or comparison “unless chains of thought are included in both training and inference” (arXiv 2305.13673). Specifically, LLMs face pronounced limitations when attempting to automate therapeutic selection; current studies indicate that LLMs (e.g., GPT-3.5) provide partially incorrect cancer treatment recommendations 34% of the time (18). Still, applications of AI methods including LLMs to clinical records are nascent [only 5% of the LLM studies in the clinical domain for the period of January 1, 2022, to February 19, 2024, used real EHR data for LLM evaluation (19)], and their application to large-scale EHR systems represents the next frontier in clinical practice and patient care.
Imaging data
Radiology
Imaging data, from macroscopic radiology scans to histopathologic tissue representations in digitized images, are essential components of patient management, including diagnosis, prognosis, response assessment through longitudinal monitoring, and surgical planning. Radiology is typically the first-line assessment for patients with suspicion of cancer and provides a macroscopic view of the tumor location, host reaction, and general phenotype. Radiologic cancer imaging can also provide information on the susceptibility of the patient’s normal tissue to cancer, as well as information on intratumor/peritumor heterogeneity and longitudinal tumor changes during therapy (20). Modalities that provide such information include CT, MRI, and PET. The Cancer Imaging Archive, one of the largest centralized repositories of radiology images, has supported the wide dissemination and enabled the broad utilization of large datasets of multimodal cancer imaging data within the research community for tool development and validation (21).
Early computational tools primarily focused on fundamental aspects of medical image analysis, extending the interpretation of imaging data beyond the conventional clinical assessment of imaging based on RECIST. More recent development has focused on DL and other AI applications that utilize large training data sets to create models capable of expert-level organ and lesion segmentation, patient phenotyping, and disease risk prediction (22–24). Most recently, the advent of foundation models portends a future in which AI is tuned to the unique needs and patient populations of specific clinical settings. The high-dimensional feature space underlying these models can drive clinical applications, including radiology report generation and patient similarity searches (arXiv:2406.06512; refs. 25, 26).
Imaging analysis methods span tumor segmentation, visualization, and feature extraction. Among these tools, 3D Slicer enables broad use for image analysis and visualization (27) with an explicit add-on platform for the dissemination of (i) quantitative cancer imaging biomarker discovery data, produced by novel analysis tools, and (ii) quantitative analysis methods (28). Pyradiomics, the RADIomic Spatial TexturAl Descriptor, and the Cancer Imaging Phenomics Toolkit (29) have pioneered and democratized the utilization of radiomics, which describes the high-throughput extraction of quantitative imaging features, conceptually akin to molecular profiling assays, but from multimodal imaging data (30). This has enabled the generation of comprehensive high-dimensional imaging biomarker data that can be used to predict, assess, and monitor treatment response (31) and analyze tumor heterogeneity (32) and vasculature (33).
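To make the notion of quantitative imaging features concrete, the following minimal sketch computes a handful of first-order intensity features within a segmented region. It is an illustration only: the function name, the toy image, the mask, and the bin width are assumptions, and the code does not reproduce the API or the standardized feature definitions of Pyradiomics or the other toolkits named above.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25.0):
    """Compute a few first-order intensity features over the masked voxels."""
    voxels = [value
              for image_row, mask_row in zip(image, mask)
              for value, inside in zip(image_row, mask_row) if inside]
    if not voxels:
        raise ValueError("mask selects no voxels")
    n = len(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n
    energy = sum(v * v for v in voxels)
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance,
            "energy": energy, "entropy": entropy}


# Hypothetical 3 x 3 image slice and a binary tumor mask (1 = inside lesion).
image = [[0, 10, 20],
         [30, 100, 110],
         [40, 120, 130]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]

features = first_order_features(image, mask)
print(features)
```

Radiomics toolkits extend this idea to hundreds of shape, texture, and filter-based features, which is why the standardization efforts discussed next matter for comparing results across tools.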
Given the wide array of radiomic and DL tools developed, standards have also been established (34) that enable informaticians to benchmark the performance of disparate algorithms for radiomic analysis (35). In addition to data standards, advanced visualization platforms, such as the Open Health Imaging Foundation Viewer, have enabled web-based image viewing capabilities. These platforms, integrated with informatics tools, such as XNAT and Imaging Data Commons (36, 37), allow for the extraction and evaluation of imaging biomarker measurements across centers (38, 39). Despite standardization across imaging platforms, specialized cancer imaging informatics tools are still needed for preclinical imaging (40), magnetic resonance spectroscopy (41), diffusion MRI (42), and molecular imaging (43). These new tools are essential for deeper, noninvasive tumor profiling that extends beyond clinical biomarkers to the mechanistic characterization of tumor-level phenotypes. They also enable the systematic development and evaluation of subvisual cues that cannot be perceived by the expert naked eye but that can yield new knowledge. Despite the tremendous progress in the development of clinically relevant AI, its adoption in radiology practice has been tempered by practical challenges, including difficulties ensuring model generalizability across varied imaging equipment, protocols, and patient populations.
Histology
Histopathology is the microscopic evaluation of cells toward understanding morphologic and pathophysiologic properties of disease and providing a definitive diagnosis. Histology relies on glass slides of tissue sections for diagnostic assessments of patients with cancer. Increasingly, these slides are digitized to create high-resolution images, commonly referred to as whole-slide images (WSI), with each containing billions of pixels. Whereas WSIs have been used in research for more than 15 years, improvements in imaging throughput, storage, and FDA approval of WSIs for primary diagnosis have increased clinical adoption. Over the coming decade, hospitals are expected to produce tens of petabytes of WSI data that when linked with clinical and genomic data can be used for scientific inquiry or development of tools to enhance patient care and healthcare operations. Although there are tremendous opportunities, working with data at this scale presents significant analysis and data management challenges. Recent computational advances have made apparent that there is more to these images than meets the eye and that latent patterns can offer information of diagnostic, prognostic, and predictive value. Furthermore, technological advancements in digital pathology have enabled the spatial association of molecular characterization (spatial proteomics and transcriptomics).
Open-source software has played an important role in enabling biomedical researchers to work with WSIs. Tools for tasks like stain normalization (44), segmentation and classification of cells and organelles (45), and semantic segmentation of tissues (46) are often the building blocks of pipelines for tasks like prognostication, treatment response prediction, or diagnostic classification. These tools are often built for analyzing images of standard hematoxylin and eosin–stained tissues but may also handle IHC (47) or immunofluorescence (48), as well as emerging spatial molecular technologies. Prediction of clinical outcomes has been a very active area of research, with many prognostic models from measurements of the TME like cell type abundance, morphology, interactions, and spatial distributions (49, 50). The microscopic scale of WSIs can reveal important biological patterns, and the correlation of histology with genomics has received much attention. Such analyses seek to build joint prognostic models that combine quantitative histologic measurements with genomic signatures or to associate latent patterns with molecular features like pathway activations, somatic mutations, cellular phenotypes, DNA methylation profiles, or even in vitro therapeutic response (bioRxiv 2023.03.22.533810; ref. 51).
Integration of pathology and radiology images can provide insights into the macroscopic radiologic patterns by using pathology as a source of ground truth. Pathology is also an area with many new emerging modalities, including computational staining of unstained tissue (44, 52) and multiplex immunofluorescence that allows deep interrogation of the TME for cancer biology or clinically focused applications (48). Registration of serially sectioned slides is also emerging as a new technology for 3D cellular resolution of tumors (53). Data management is another important theme found in open-source tools for histology imaging. Software tools like Sedeen (54), Digital Slide Archive (55), and DPLab (56) allow end users to view and annotate images and to work with algorithms. Many of these tools feature an application programming interface that facilitates automation of tasks or integration with other tools. These tools may be desktop- or web-based to allow remote access to large, centralized repositories and computer systems.
Despite significant academic research and growth of the digital pathology industry, there are relatively few FDA-approved AI tools (57). Whereas the performance of tools for diagnostic, prognostic, and treatment predictive tasks has steadily increased, due in part to the development of pathology foundation models (58–60), clinical adoption of these tools remains low. Pathology images still exhibit significant variability in practice because of preanalytic factors like tissue processing and staining. Whereas digital pathology adoption in healthcare systems is increasing in the United States, it remains expensive and difficult to implement. Regulation and reimbursement are also significant factors in driving trends in clinical adoption.
Molecular and cellular characterizations
Genetic alterations
Somatic alterations in the genome have long been recognized as a major driver of cancer (61). Somatic alterations found in tumor samples range in size from single-nucleotide variants and small insertions–deletions (small somatic mutations) to copy-number alterations and structural variants (SV) that span large regions of DNA (62). In recent years, advances in next-generation DNA sequencing technologies and bioinformatics tools have improved mutation calling (63). Small somatic mutation callers (64, 65) identify common somatic mutations with high precision by comparing the sequences with those quantified in matched normal samples (63). Copy-number alteration callers (64, 66–68) operate with lower precision (approximately 30%) yet near 100% recall (69). SV calling on short-read sequencing data is even more challenging, with even the highest-performing SV callers having a high FDR (70, 71). Accurate SV identification generally requires whole-genome sequencing data; however, major advances in SV calling are expected as long-read sequencing methods able to sequence across the entire altered region become more prevalent (72).
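The benchmarking terms used above (precision, recall, and FDR) are related through the counts of true-positive, false-positive, and false-negative calls. The short sketch below makes that relationship explicit; the function name and the call counts are hypothetical and were chosen only to mirror a caller with roughly 30% precision and near-complete recall.

```python
def caller_metrics(true_positives, false_positives, false_negatives):
    """Summarize a variant caller's performance against a truth set."""
    called = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / called if called else float("nan")
    recall = true_positives / actual if actual else float("nan")
    # The false discovery rate is the complement of precision.
    return {"precision": precision, "recall": recall, "fdr": 1.0 - precision}


# Hypothetical benchmark: 100 calls, 30 of them real, and no real events missed.
metrics = caller_metrics(true_positives=30, false_positives=70, false_negatives=0)
print(metrics)
```

A caller with this profile rarely misses a true event but requires downstream filtering or orthogonal validation because most of its calls are false discoveries.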
Bioinformatic analyses of large cancer genome cohorts have revealed enormous intratumor and intertumor genetic heterogeneity. The distribution and combination of somatic alterations vary by individual (73), and careful analysis is required to distinguish genetic drivers of clonal expansion and cancer progression from passenger mutations. Large cohorts provide the raw material and statistical power necessary to perform such analyses using purely data-driven approaches. As large-scale analyses promote the accumulation of knowledge about driver mutations, visualization tools and databases curating somatic mutations, such as Open-CRAVAT (74), have been developed to annotate and interpret the functional and clinical relevance of somatic alterations in cancer. Once individual somatic alterations are identified, subclone analysis may be performed to identify the proportions of each genetically distinct subclone in a given tumor and reconstruct the evolutionary path of a given clone (75). Such analysis would shed light on the functional consequences of each somatic alteration and help characterize the genetic component of tumor heterogeneity. Newer single-cell DNA sequencing technologies are emerging to enhance this clonal analysis, but many are limited to targeted gene panels that could miss critical alterations. As these single-cell genetic profiling technologies improve in resolution, new informatics tools will be needed for their alignment, variant inference, and clonal evolution (76).
Whereas somatic alterations are often the focus of cancer genome studies, familial cancers are accompanied by germline mutations, which confer increased disease susceptibility and earlier disease onset as in breast (77), colorectal (78), and lung (79) cancers. Identification of germline variants requires patients to undergo germline testing necessitating increased sensitivity to both patient and family privacy, but results of germline and somatic alteration calling can be integrated to improve discernment of somatic drivers (80). Genome-wide association studies have long been used to identify cancer risk variants and commonalities in the genetic backgrounds of patients with cancer (81); however, they face the challenge of distinguishing between causative and acquired genetic alterations in cancer, unlike in other diseases. Both germline and somatic mutation callers often rely on reference genomes, which can introduce biases in these analyses. Ancestry-aware analysis algorithms are essential to overcome these limitations and accurately map the landscape of somatic and germline mutations in cancer (82).
Information about germline alterations provides critical biomarkers for clinical decision-making, leveraged currently to guide screening practices, preemptive surgeries, and even current clinical trials evaluating the potential of preventative vaccines in high-risk patient cohorts (83). Differences in germline alterations can affect the fundamental biology of a tumor, yielding different response to therapy, with further opportunities for personalization obtained from matching driver genes to targeted therapies. EMR systems such as EPIC are being expanded to incorporate genetic information for this clinical decision-making. The widespread use of clinical genetics still faces challenges, including standardized integration into existing clinical workflows, provider and patient education and interpretability, and manner of presentation in the health record (84). Several consortia arising from public projects and private genomic profiling companies are also starting to collate large-scale clinical genetics data for research for further biomarker discovery (medRxiv 2024.11.05.24316583) and AI-based solutions matching genetics to the clinical records (85). These projects must balance open data for biomarker research and improving clinical care with patient privacy given the sensitive nature of genetics that spans across family history.
Transcriptomics
Whereas information about the genomic alterations that drive tumorigenesis can be largely captured through genome sequencing of tumors, the downstream effects of these alterations on transcription and translation can rewire cellular circuitry toward tumorigenesis and produce significant phenotypic changes (86). The specific applications of RNA sequencing (RNA-seq) for cancer studies are numerous and enable the detection of the following: gene and isoform expression (87), expressed somatic variants (88), posttranscriptional modifications (PTM) and RNA-editing events (89), aberrant transcript splicing (90), fusion transcripts (91), and alterations in expression due to large-scale chromosomal imbalances (92). Unaligned reads can be further processed to estimate the counts of additional features present in the sample, such as viral reads in cancers with viral etiology (93) or immunologic features such as lymphocyte receptor sequences (94). Specialized library preparations can enhance the detection of these features, enable the characterization of specific transcriptional events, and investigate the regulatory role of RNA (95). As is the case for genetic profiling, estimates of these events are being enhanced through the development of long-read sequencing technologies (96). Recent advances in long-read sequencing enable the high-throughput sequencing of full-length isoforms with high accuracy and at single-cell resolution. 
From short- and/or long-read RNA sequences, the Trinity Cancer Transcriptome Analysis Toolkit (CTAT) provides methods for detecting fusion transcripts [TrinityFusion and STAR-Fusion (97), FusionInspector (98), and CTAT-LR-Fusion, bioRxiv 2024.02.24.581862], somatic variants and RNA-editing events (CTAT-Mutations, bioRxiv gr.279281.124), known oncogenic splicing aberrations (CTAT-Splicing), single-cell copy-number aberrations [inferCNV, RRID: SCR_021140; cited 2025 May 22; available at https://github.com/broadinstitute/inferCNV], and oncogenic viral genomic insertions [CTAT-VirusIntegrationFinder; Trinity CTAT (cited 2025 May 22); available at https://github.com/TrinityCTAT/Trinity_CTAT/wiki; ref. 99].
Differential expression analysis is commonly used to compare the transcriptional profiles of normal, dysplastic, primary tumor, and metastatic tissue samples to evaluate multiple stages of cancer progression (100, 101). Web-based tools enable differential expression analyses from the wide array of cancer transcriptomics data available in the public domain (102). In lieu of a priori knowledge of tumor heterogeneity or biological phenotypes, unsupervised learning techniques such as clustering (103, 104) and nonnegative matrix factorization (105, 106) are more suitable tools for defining tumor subtypes. Differential expression and unsupervised learning tools provide gene signatures or ranked gene lists associated with phenotypes of interest. Gene set enrichment analysis (GSEA; ref. 107) provides a way to biologically interpret ranked gene lists and was one of the first methods of expanding transcriptomic analysis beyond single-gene comparisons. Rather than providing lists of differentially expressed genes, GSEA identifies the coordinate activation or repression of groups of genes that share common biological functions, biochemical pathways, chromosomal locations, or regulatory relationships, thereby distinguishing even subtle differences between phenotypes or cellular states. GSEA is now standard practice for interpreting global transcription profiles and elucidating the biological mechanisms associated with disease and other biological phenotypes of interest. It is available along with a companion resource of curated and annotated gene sets, the Molecular Signatures Database (108), which comprises ∼50,000 sets organized into human and mouse collections. These gene signatures can be applied to interpret the lengthy outputs of pairwise differential expression analyses (109) to annotate and characterize samples of interest (110), characterize tissue-specific transcriptomic heterogeneity (111), and perform multiomics pathway integration (112). 
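The running-sum statistic at the core of gene set enrichment analysis can be sketched in a few lines. The example below is a simplified illustration of a weighted enrichment score for a single gene set; the function name, gene names, and scores are hypothetical, and the sketch omits the permutation testing, normalization, and multiple-hypothesis correction that the GSEA software performs.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: list of (gene, score) pairs sorted from the most
    positive to the most negative score.
    """
    hits = [abs(score) ** weight if gene in gene_set else 0.0
            for gene, score in ranked_genes]
    hit_total = sum(hits)
    n_miss = sum(1 for gene, _ in ranked_genes if gene not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, best = 0.0, 0.0
    for (gene, _), hit in zip(ranked_genes, hits):
        if gene in gene_set:
            running += hit / hit_total   # step up, weighted by the score
        else:
            running -= 1.0 / n_miss      # step down by a constant amount
        if abs(running) > abs(best):
            best = running               # keep the largest deviation from zero
    return best


# Hypothetical ranked list, e.g. from a differential expression analysis.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
          ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]

up_score = enrichment_score(ranked, {"G1", "G2"})
down_score = enrichment_score(ranked, {"G5", "G6"})
print(up_score, down_score)
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score, which is how coordinated but individually modest expression changes become detectable.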
A few gene expression signatures have been advanced to the clinic to predict recurrence and inform treatment selection (113, 114), but these have not yet reached the pervasive clinical utility observed with genetic profiling. Limitations to the use of gene signatures in oncology include their limited generalizability and context dependence, unclear underlying biological mechanisms of disease-specific gene signatures or pathways, the biological redundancy of several genes across different signatures, and the lack of standardization in developing and validating gene signatures.
Although limited clinically, transcriptional profiling remains a powerful tool from which to infer phenotypic states of tumors. Even in the early microarray days, clustering analysis of gene expression data distinguished tumor subtypes with distinct clinical outcomes across numerous cancer types (115, 116). Bulk RNA-seq profiling contains the transcriptional signatures of all the cells in a sample, which may represent a mixture of tumor cells and cells in the TME. Bioinformatics techniques have been developed to deconvolve bulk RNA-seq data to estimate cell type proportions in each sample (117), but newer technologies such as single-cell RNA-seq (scRNA-seq) and spatial transcriptomics (ST) provide data from dissociated single cells and higher-resolution information about underlying tissue architecture, respectively (118). When applied to tumor samples, single-cell and spatial technologies have yielded unprecedented insights into intratumor heterogeneity, composition of the TME, and interactions between tumor cells and the TME.
Analysis pipelines for single-cell and spatial data can prove more challenging than those for bulk RNA-seq and can span multiple programming languages; however, user-friendly interfaces such as The Single-Cell Toolkit (119) exist for noncomputational users to perform complex analyses. Limitations to current tools developed for single-cell transcriptomics analysis in the context of cancer informatics include challenges in handling sparse data and dropout events, which may be aided by imputation methods (120), and integration of multiple datasets for population-wide analyses (121). Annotation of rare, transitional, or patient-specific cell types or states presents another challenge, although newer methods such as transfer learning from a reference atlas (122, 123) or copy-number estimation to distinguish tumor-adjacent normal cells from cancerous cells within a given tumor (124) offer improvements on the cluster-based marker gene annotation approach. Once cell type annotation is performed, pseudobulk differential expression and pathway analysis can be performed using tools similar to those for bulk RNA-seq discussed above. In taking advantage of single-cell resolution, downstream analysis may include trajectory inference for the pseudotemporal ordering of cells along a biologically relevant continuum (125, 126) and latent space analysis for the unsupervised identification of transcriptional programs (123, 127). Defining the intracellular and intercellular regulatory programs that underlie these observed phenotypic changes in cells is now a central area for bioinformatics research. The scRNA-seq analysis tools mentioned here, and the limitations they attempt to overcome, are relevant to cancer informatics because characterizing tumor development requires a single-cell resolution understanding of the tumor or microenvironmental transcriptome and its transcriptional dynamics.
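The pseudobulk step mentioned above can be sketched as a simple aggregation: counts from all cells that share a sample and an annotated cell type are summed into one profile, which can then be passed to bulk-style differential expression tools. The function name `pseudobulk`, the input layout, and the toy values below are illustrative assumptions rather than the interface of any particular package.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts into one profile per (sample, cell type).

    cells: iterable of (sample_id, cell_type, {gene: count}) tuples.
    Returns {(sample_id, cell_type): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        bucket = totals[(sample_id, cell_type)]
        for gene, count in counts.items():
            bucket[gene] += count
    # Convert nested defaultdicts to plain dicts for downstream use.
    return {key: dict(genes) for key, genes in totals.items()}


# Toy input: three annotated cells from two hypothetical patients.
toy_cells = [
    ("patient1", "T cell", {"CD3E": 4, "EPCAM": 0}),
    ("patient1", "T cell", {"CD3E": 6}),
    ("patient2", "tumor", {"EPCAM": 9, "CD3E": 1}),
]
profiles = pseudobulk(toy_cells)
```

Aggregating to the sample level in this way treats patients, rather than individual cells, as the unit of replication in downstream statistical tests.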
Epigenetics
Epigenetics refers to the study of gene expression control mechanisms that involve chromosome-bound molecules or reversible covalent DNA modifications. It includes the study of DNA methylation, nucleosome positioning, histone PTMs, genomic binding of transcription factors (TF), cofactors and other chromatin complexes, and regulatory RNA. Cancer genome studies have revealed that many of the most common genetic alterations occur in genes that encode epigenetic regulators of gene expression (128). In addition, epigenetic alterations can silence some tumor suppressors lacking an apparent second-hit mutation. Technologies for measuring the binding patterns of trans-acting factors on a genome-wide scale (referred to as “cistromes”) include chromatin immunoprecipitation sequencing (129), DNase sequencing (130), and Assay for Transposase-Accessible Chromatin using sequencing (131). More refined data about chromatin structure can be inferred from chromosome conformation capture technologies, such as Hi-C (132). DNA methylation, measured first by microarrays and extended by bisulfite sequencing (133), methylated DNA immunoprecipitation, or long-read sequencing, can reveal important aspects of its associated gene regulation in human tumor samples. Cistrome Data Browser (134) is a web resource that contains annotated, processed, and quality-controlled human and mouse cistrome data, simplifying their discovery, visualization, and analysis. Cistrome Data Browser–related web applications, such as Cistrome GO (135) and LISA (136), facilitate tasks such as associating TF-binding sites with genes, inferring TF-regulated gene set functions, identifying “super-enhancer” elements, and inferring TF regulators of differentially expressed gene sets. Cistrome Explorer (137) and Gosling (138) help visualize TF binding and chromatin marks on the genome.
The widespread epigenetic regulation of transcription has led to the hypothesis that epigenetic states are associated with tumor subtypes. Methcon5 is a method for characterizing DNA methylation pattern consistency within and between sample types to identify functional regions of DNA methylation (139). To more directly address the problem of epigenetic regulation of subtypes, ELMER uses differential DNA methylation with gene expression profiles in combination with histone modification data to infer gene regulatory networks in cancer subtypes (140). This inference has been expanded by a wide literature of multiomics analysis methods that infer epigenetic-driven tumor subtypes for pan-cancer analysis (141, 142). These integrative techniques have become the foundation for the next generation of bioinformatics methods for single-cell epigenetics technology, with a particular focus in cancer research on epigenetic regulation of cells in the TME. More recently, epigenetic marks have been shown to be identifiable in cell-free DNA (cfDNA), with new AI methods being developed to detect these events for use as screening tools for cancer risk and progression (143).
Proteomics
Systems-level protein-based characterization of cells and tissues by reverse-phase protein arrays (144) and mass spectrometry (MS) has emerged as an important molecular characterization for cancer, capturing new insights into protein regulation and PTMs that cannot be directly inferred from mRNA or other molecular-level measurements (145–147). Like The Cancer Genome Atlas (TCGA) that preceded it, the Clinical Proteomic Tumor Analysis Consortium in the NCI has made significant advances in developing and applying proteomics and phosphoproteomics to profile cancers (148), currently accounting for 6,000 tumors profiled across 16 cancers, with other PTM-specific profiles like acetylation, glycosylation, and ubiquitination accelerating in their coverage (149). For MS to arrive where it is today, a series of computational advances was required (150). For example, spectral mapping and high-efficiency database search engines were needed to define the proteins and PTMs captured by peptide fragments (151, 152), along with methods for estimating relative abundance from MS experiments (153, 154). Following those first steps of moving spectra to identified molecules with relative quantification, the next phase involves creating storage and access solutions (149, 155, 156) for cancer proteomic datasets as well as comprehensive analysis suites (157–159). In moving from molecular measurements to insights into cancer, existing methods, like enrichment, can be translated to proteomics data. However, two key challenges are created by the uniqueness of MS-based analysis: missing data and lack of annotations for PTMs (160, 161). Two key strategies have been developed for the challenge of data sparsity: data imputation (162, 163) and statistical methods that are robust to missing data (164, 165). One promising approach to addressing the lack of annotations and the high degree of study bias that exist for phosphorylation (162) leverages the unique aspects of phosphoproteomics.
This method shows promise in complementing clinical standard-of-care by predicting kinase activities from tumor samples (166). Two key emerging technologies in this space will present challenges for bioinformatics analysis: spatially resolved proteomics (imaging MS; ref. 167) and one-pot proteomics for coverage of nanoscale (e.g., single cell) sample sizes (168).
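As a concrete illustration of the imputation strategy for missing MS values discussed above, the following minimal sketch filters out proteins with too few observations and fills the remaining gaps with half of each protein's minimum observed intensity. This is one simple heuristic chosen for illustration and is not the method of the cited references; it assumes positive, linear-scale intensities and that missingness mostly reflects low abundance. The function name, the threshold, and the protein labels are hypothetical.

```python
def filter_and_impute(abundance, min_observed_fraction=0.5):
    """Drop sparsely observed proteins, then impute missing values.

    abundance: {protein: [intensity or None per sample]}, linear scale.
    Missing values (None) are replaced by half the protein's minimum
    observed intensity, a common left-censoring heuristic.
    """
    result = {}
    for protein, values in abundance.items():
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing to base an imputation on
        if len(observed) / len(values) < min_observed_fraction:
            continue  # too sparse to impute credibly
        fill = min(observed) / 2.0
        result[protein] = [fill if v is None else v for v in values]
    return result


# Toy matrix: three hypothetical proteins measured in four samples.
toy = {
    "P1": [10.0, None, 8.0, 12.0],   # one gap, kept and imputed
    "P2": [None, None, None, 5.0],   # observed in 25% of samples, dropped
    "P3": [3.0, 4.0, 5.0, 6.0],      # complete, unchanged
}
imputed = filter_and_impute(toy)
```

Because imputed values can bias downstream tests, the alternative strategy noted in the text, statistical methods that tolerate missing values directly, avoids this step altogether.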
Spatial “omics”
Spatial molecular technologies are leading to the convergence of genomics and histology imaging. Initially capturing individual proteins or RNAs selected for study, these technologies now have expanded molecular resolution and multiplexing. Spatial proteomic technologies, the Nature Methods 2024 Method of the Year, use IHC-based assays for single-cell and sometimes subcellular spatial resolution of tens, and in some cases nearly a hundred, protein markers concurrently. The advent of ST has transformed tissue analysis by facilitating the characterization of cell types and states within their spatial context and the identification of regional gene expression programs (118). ST studies follow two primary methodologies that provide differing levels of cellular resolution: (i) sequencing-based methods that enable whole-transcriptome analyses but offer low sensitivity and spatial resolution, and (ii) in situ hybridization–based methods that achieve subcellular resolution and high sensitivity but are limited to predefined gene panels, typically consisting of a few hundred genes. These limitations can be partially circumvented by computational tools that leverage scRNA-seq data from the same tissue. Tools that map scRNA-seq data into ST data (169, 170) can enhance the phenotypic resolution of in situ hybridization–based ST datasets. Additional insights can be gained through computational integration of multiplexed IHC data with related single-cell proteotranscriptomic (e.g., CITE-seq) datasets (171) or by using new spatial multiomics or epigenetics technologies that enable concurrent measurements of ST and other omics data modalities (172). Taken together, spatial omics technologies allow for robust characterizations of the cellular and molecular landscapes of patient-specific samples and preclinical models, with the promise to fully resolve the TME. However, despite their tremendous promise, spatial omics technologies are still in their infancy.
The molecular resolution obtained from tumors is still higher for bulk technologies, and the cost of spatial assays can be prohibitive for large cohorts of samples or in clinical contexts. As a trade-off, some spatially resolved transcriptomics technologies instead perform whole-transcriptome profiling in regions of interest selected based on imaging features or proteomics characterization.
The application of spatial molecular assays to cancer research provides an opportunity for understanding the spatial and cellular organizations of gene and protein expression programs involved in tumor progression, invasion, resistance to therapy, and the immune TME. The first analysis stage involves image processing and AI approaches (e.g., cell segmentation) to identify regional (multicellular, cellular, or nuclear) boundaries from the imaging data and isolate the regional molecular profiles. The performance of these approaches depends on the cellular morphology, density, and inclusion of staining for cellular and nuclear boundaries in the imaging data (173, 174). However, many current technologies for whole-transcriptome spatial profiling do not achieve single-cell resolution and instead represent the composite transcriptional profile of relatively large fixed-size “spots,” which often encompass tens of cells. In this case, tools for deconvolving ST data based on reference scRNA-seq data (175) can be used to estimate the molecular profiles of each cell captured in these regions. When ST is paired with histology, the true single-cell resolution of histology can purify regions to specific cell types for analysis (176) or be integrated to further enhance the deconvolution of molecular profiles for each cell (177, 178). Imaging software for histology is being expanded to enable further human-guided analyses of these spatial omics data and account for their higher-dimensional nature (179, 180).
The interpretation of molecular and cellular features from spatial omics can leverage and build upon many of the well-established bioinformatics techniques developed to analyze bulk and single-cell data. The spatial nature of these assays requires expanding these methods to identify and quantify spatially variable genes (127, 181, 182), spatial domains enriched for specific cell populations and gene expression modules (183, 184), spatial colocalization of cell types and states (185, 186), and alignment of spatial data from multiple tissue slices (187). Recent studies have found associations between spatial omics TME features and many clinical attributes, including progression of precancers to malignancies (188, 189), tumor response to therapy (190, 191), and patient disease-free survival (192–194). These examples highlight the potential impact of these new technologies on patient care. However, given the high cost of spatial molecular profiling, an important consideration to further advance these studies is attaining the required statistical power for establishing associations with clinical variables. Informatics has the potential to play an important role by identifying the TME features that are most useful in understanding cancer biology and predicting clinical outcomes or by annotating cellular function associated with distinct cellular morphologies. Lower-cost spatial omics assays or even AI-based annotations of traditional histology imaging can then be developed to quantify only these features, and these assays could be more widely deployed in clinical settings because of their lower cost.
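One classical statistic for quantifying whether a gene is spatially variable is Moran's I, which measures spatial autocorrelation. The minimal sketch below is offered as a generic illustration and is not the approach of the tools cited above; the binary neighborhood defined by a fixed `radius`, the function name, and the toy coordinates are assumptions.

```python
import math


def morans_i(coords, values, radius=1.0):
    """Moran's I with binary weights: spots within `radius` are neighbors.

    coords: list of (x, y) spot positions.
    values: expression of one gene at each spot, in the same order.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("expression is constant across spots")

    num, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1.0
    if weight_sum == 0:
        raise ValueError("no neighboring spots within the given radius")
    return (n / weight_sum) * (num / denom)


# Four spots on a line: a clustered pattern versus an alternating one.
spots = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(spots, [1.0, 1.0, 0.0, 0.0])
alternating = morans_i(spots, [1.0, 0.0, 1.0, 0.0])
```

Values well above zero indicate that neighboring spots have similar expression, values near zero suggest no spatial structure, and negative values indicate a checkerboard-like pattern. The quadratic loop is adequate for illustration but would need a spatial index for datasets with many thousands of spots.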
Clinical Applications of Cancer Informatics
In many cases, the oncogenic and clinical relevance of a particular gene is well-established and further supported by the remarkable number and diversity of informatics resources available for understanding the molecular characteristics of individual tumors. As a result, the potential applications of bioinformatics to basic cancer biology are nearly boundless. However, translating high-throughput data to clinical practice requires more than just interpretation. It involves selecting precision therapeutic strategies, designing novel therapies, identifying populations for therapeutic interception, and building data infrastructure to incorporate these features as interpretable notes in clinical records. These remain active areas of informatics research (Fig. 3).
Figure 3.
Informatics tools contributing to functional and interpretable distillation of genomics and clinical datasets can improve target identification, biomarker studies, and clinical trials to enhance patient outcomes. Created in BioRender. Noller, K. (2025) https://BioRender.com/oyayfpk.
Informatics techniques enable targeted precision medicine
Complex analyses enabled by tools that blend sequencing data with clinical datasets can be used to implicate patient-specific variants in personalized clinical decision-making. To this end, several cancer variant knowledgebases have begun to systematically curate relevant experimental and clinical data (195–197). These resources have faced many challenges, such as the incompleteness and difficulty in scaling expert-driven curation of complex literature, but computational efforts are beginning to address them (discussed further in “section 2.1”). Predicting the effects of a patient-specific variant on disease progression and treatment outcome is another important component of precision oncology. Despite the ability to leverage existing tools and resources, many variants observed in patient tumors today are classified as variants of unknown significance based on current knowledge, thus limiting their potential clinical significance. Algorithms that computationally predict the functional or oncogenic impact of a given variant, including those for which little or no primary data exist, continue to develop but are not yet widely used in the clinic (198). In addition to these algorithms, the recent development of multiplex assays of variant effects has the potential to accelerate the functional characterization of individual variants and create an avenue for informatics resources that aggregate these data (199), leverage them to train new predictive algorithms (200), and integrate them into existing variant interpretation platforms (bioRxiv 2023.06.20.545702). Resources designed to integrate data such as variant annotations (201), variant frequencies in tumors (75, 202) and control populations (203), expert curated experimental data, algorithmic predictions, multiplex assays of variant effects data, and so on have become critical to handling the complexity of these data (204) and building consensus assertions of clinical relevance (204, 205). 
These integration efforts are increasingly benefiting from the development of centralized variant registries (206), formal specifications for computable representation of variants (207), relevant formats, standards, and ontologies for central concepts (208, 209), and robust guidelines for assessing oncogenicity (209) and clinical tiering (210). However, significant challenges remain in translating these advances into routine clinical practice (211, 212). Many cases still do not yield an actionable biomarker, and the clinical utility of genomic testing remains variable across different tumor types. Lack of access to key precision medicine support systems—including clinical decision support tools, integration with EMRs, and access to molecular tumor boards—continues to hinder widespread adoption. As data complexity increases and novel modalities such as liquid biopsies become more integrated, new strategies are required to ensure equitable access. Precision oncology often relies on results from multiple tests and panels, yet access to domain expertise and standardized best practices remains inconsistent. Testing algorithms are complex and highly dependent on tumor type and stage, underscoring the need for improved physician and patient education in genomic profiling. Although much remains to be done, the ecosystem of tools and resources described above is enabling real-world impact on patients by matching individual patients to targeted therapy or basket clinical trials (213), supplementing information from individual diagnostic laboratories and software to create tumor portraits with molecular-level information (214, 215), and aiding the translation of large-scale precision oncology efforts to routine clinical practice (212). 
Tools such as MatchMiner (Molecular Tumor Board), Platform for Oncogenomic Reporting and Interpretation, and Molecular Tumor Board Portal have facilitated enrollment of hundreds of patients to dozens of trials with shortened time to completed consent (213–215). Beyond clinical trials and molecular tumor boards, the same tools and resources that support these experimental applications are enabling adoption of precision oncology approaches in routine clinical treatment of many tumors and across large health care systems (216–218).
Immunotherapy
As immunotherapy combinations are becoming mainstays of cancer care, bioinformatics algorithms for personalized treatment selection must account for the properties of both tumor cells and immune cells in an individual’s TME. Profiling and targeting of both the innate and adaptive immune systems have become major areas of cancer therapy research, including immune checkpoint blockade by antibodies; cellular therapies such as chimeric antigen receptor T-cell, NK-cell, and adoptive T-cell therapies; TCR-mimic and bispecific antibodies; and cancer vaccines. Informatics tools play a crucial role in this research, aiming to develop and advance these therapies by facilitating analysis, interpretation, and management of large datasets generated in the context of cancer immunotherapy model systems and clinical trials. The field of cancer immunotherapy research now benefits from a rich and rapidly growing ecosystem of informatics tools. Broad themes addressed by these tools include assessing the immune TME, identifying novel molecular targets for immune therapies, predicting response to off-the-shelf immunotherapies, designing personalized immune therapies, and characterizing immune responses and mechanisms of resistance that emerge under treatment. The application of quantum computing has enabled advances in small-molecule design and modeling of drug–target interactions in the cell (219); however, this work has yet to be extended to immunotherapies. Examples of specific analytic applications of these tools are diverse. Several relate to the goal of understanding the composition of immune cells interacting with tumors, including tools for estimating the overall level of immune infiltration as a biomarker for immune checkpoint inhibition or for characterizing the immune cell type composition to inform combination therapeutic selections.
Because neoantigens, or tumor-specific antigens, are key targets of immune recognition, they have become the subject of an additional class of tools for neoantigen identification and analysis. These tools are able to predict neoantigen processing, transport, and presentation (220, 221), estimate tumor neoantigen burden (222), and design personalized neoantigen therapies (223, 224). However, enhanced training sets are needed to improve the accuracy of peptide–MHC binding predictions, and particular attention must be paid to ancestry biases, which may limit the accuracy of antigen prediction on an individual basis.
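A first computational step in many neoantigen workflows is enumerating the candidate peptides that contain a somatic amino acid change, which are then scored by a dedicated presentation or binding predictor. The sketch below covers only this enumeration step for a missense change and makes no binding predictions. The function name, the 0-based `position` convention, the default peptide lengths of 8 to 11 residues, and the toy sequence are assumptions for illustration.

```python
def mutant_peptides(protein, position, alt_residue, lengths=(8, 9, 10, 11)):
    """List every peptide window that spans a missense substitution.

    protein: wild-type amino acid sequence (one-letter codes).
    position: 0-based index of the substituted residue.
    alt_residue: the mutant amino acid at that position.
    """
    if not 0 <= position < len(protein):
        raise ValueError("position lies outside the protein")
    if protein[position] == alt_residue:
        raise ValueError("alternate residue equals the wild-type residue")

    mutant = protein[:position] + alt_residue + protein[position + 1:]
    peptides = []
    for k in lengths:
        # Window starts that keep the mutated residue inside the k-mer
        # without running past either end of the protein.
        first = max(0, position - k + 1)
        last = min(position, len(mutant) - k)
        for start in range(first, last + 1):
            peptides.append(mutant[start:start + k])
    return peptides


# Toy 20-residue sequence with a hypothetical M-to-A change at index 10.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
nine_mers = mutant_peptides(toy_protein, 10, "A", lengths=(9,))
```

For a substitution far from either terminus, each length k contributes k windows; fewer are produced near the ends of the protein. Frameshifts, fusions, and splice variants require different enumeration logic that this sketch does not attempt.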
Finally, tools that characterize the immune response itself are becoming widely used for correlative analyses to evaluate mechanisms of response and resistance to immunotherapies. These analysis methods focus on B- and T-cell immune (VDJ) repertoire analysis (225–227) and phenotyping and interpretation of immune cell states (227). Databases are available to catalog key immune-related genes and pathways (108, 228, 229) and experimentally validated antigens and immune receptors (230, 231). Collectively, these tools are enabling research to advance development of an increasingly diverse array of immune-based cancer therapies and evaluate patient response.
Liquid (noninvasive) biopsy
Informatics tools play a crucial role in the analysis of cfDNA data in tumors, especially in the field of liquid biopsy. Informatics analysis of genetic material released by tumors into the bloodstream can provide valuable information for the early detection and diagnosis (232), subtyping (233), or monitoring (234) of cancers. Specific informatics tools are required to select genetic features of interest in cfDNA, either via de novo identification from the cfDNA data (235, 236) itself or, where feasible, by direct analysis of bulk tumor tissue with exome or whole-genome sequencing followed by the creation of a customized panel, targeted enrichment, and detection of patient-specific variants in cfDNA (234). Relevant classes of genetic features for cfDNA detection are typically identified by sequence alignment of short reads to a reference genome followed by the use of tools designed to interrogate alignments accurately for distinct variant classes such as single-nucleotide variants, small insertions and deletions, and larger SVs and copy-number variants.
An orthogonal approach does not identify specific variants but instead relies on “fragmentomics,” the detection of cfDNA fragments with a size distribution or end motifs distinctive of a tumor cell origin (237). Several tools leverage ML approaches (238) for specific applications, such as using cfDNA fragment or motif patterns to classify tumors with DNA damage repair defects (239) or identify sensitive methylation signatures for specific tumor types (240, 241). Finally, some tools focus on integrating cfDNA-based assessment of minimal residual disease with clinical data and into treatment decision-making processes (242). For example, the integration of informatics tools in cfDNA analysis facilitates the identification of actionable genomic alterations (243), monitoring of response to specific treatments (243, 244), the detection of emerging resistance variants (245), and the development of personalized treatment plans in patients with cancer (246). Integrating liquid biopsy with noninvasive imaging approaches may further enhance the preoperative accuracy of cancer detection, strengthen the effectiveness of longitudinal monitoring, and inform the clinical team without obstructing the patient’s clinical picture. Further study of the variables affecting cfDNA levels and characteristics (such as methylation or fragment size) in the patient and the challenge of detecting tumor-derived cfDNA in the early stage of tumorigenesis will determine the applicability of this technology, particularly to the early stages of cancer (247).
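The size-distribution component of fragmentomics can be illustrated by summarizing fragment lengths into a histogram and a short-to-long fragment ratio. This minimal sketch assumes that fragment lengths have already been derived from paired-end alignments; the size windows of 100 to 150 bp and 151 to 220 bp are illustrative defaults rather than validated thresholds, and the function name and toy lengths are hypothetical.

```python
from collections import Counter


def fragment_size_profile(lengths, short=(100, 150), long=(151, 220)):
    """Summarize cfDNA fragment lengths for one sample or genomic bin.

    lengths: iterable of fragment lengths in base pairs.
    short, long: inclusive (min, max) size windows; illustrative defaults.
    """
    histogram = Counter(lengths)
    n_short = sum(count for size, count in histogram.items()
                  if short[0] <= size <= short[1])
    n_long = sum(count for size, count in histogram.items()
                 if long[0] <= size <= long[1])
    ratio = n_short / n_long if n_long else float("nan")
    return {
        "histogram": dict(histogram),
        "short": n_short,
        "long": n_long,
        "short_long_ratio": ratio,
    }


# Toy fragment lengths; the 300 bp fragment falls outside both windows.
profile = fragment_size_profile([120, 140, 167, 167, 180, 300])
```

Features of this kind, computed per genomic bin, are the sort of input that the ML classifiers mentioned above can consume, although real pipelines also correct for coverage, GC content, and batch effects.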
Computational Infrastructure
Computational infrastructure, the software that supports data analysis tools and visualizations, plays several key roles in cancer informatics. Computational infrastructure connects everything needed for a cancer data analysis, from data management systems to analysis tools and visualizations to computing and hardware resources. Infrastructure also ensures that data are available in standardized formats, privacy and security protocols are enforced, multitool workflows can be created and scaled to run on large datasets, and all data analyses are reproducible and widely accessible. Combining robust software practices with analysis, code, and data access is essential for best practices in reproducible cancer informatics research (248). Computational tools and resources provide critical support for the operations that cancer informatics relies on: robust and efficient data access, including storage, distribution, and annotation in accordance with the findable, accessible, interoperable, and reusable (FAIR) principles (249). However, the vast array of available raw data coupled with the large number of informatics tools necessitates standardization of both the data and the analysis platforms, as well as the development of appropriate infrastructure following best practices.
The first step in standardization is to adopt domain-specific data standards and follow established principles of data storage, transfer, distribution, and later annotation (250). These efforts must include file storage and transfer, metadata management, indexing and searching, and a portal-based user interface providing access to these components. Raw data from each modality should be preprocessed and converted to community formats with a common or translatable syntax. After preprocessing and normalization, data are used in a variety of downstream analyses that rely on tools built for different hardware and software configurations. Solutions to the problem of interoperability include ecosystems such as Bioconductor (251), Galaxy (252), and Jupyter Notebook (253). Another challenge that computational infrastructure addresses is accessibility. Many informatics techniques are developed in command line software, which may render them prohibitive for general use. To this end, user-friendly interfaces for data analysis platforms and visualization methods enable broad access to large-scale and complex datasets yet balance usability with interpretability (254, 255). In addition, whereas it is important to establish platforms for data analysis that incorporate existing tools, the open-source software community also ensures that the underlying code for the tools themselves adheres to good software practices. These practices include version control, unit testing, and maximizing code readability and modularity. Besides documenting software versions used in a workspace, platforms such as Docker (256), MLCube (257), or CodeOcean (256) can be used to containerize software and control version release for reproducibility. We anticipate that computational infrastructure will continue to be a rapidly growing field, especially with the incorporation of LLMs and AI.
Clinical data infrastructure poses the additional challenges of developing user interfaces that can readily be adapted in clinical practice and maintaining patient privacy when data are collated for downstream data science–based or AI-based research designed to improve human health.
Data standards
The cancer informatics age is accelerated by the widespread availability of data. Although data may be seemingly accessible, robust informatics still requires obtaining, curating, and preprocessing the data prior to analysis. Data standardization is a labor-intensive process, as it converts observational and research data to community formats to support several critical processes such as the optimal primary and secondary use of collected information and efficient inter- or intra-institution system interoperability. The community has developed several ontologies, terminologies, taxonomies, and common data models (CDM) to define domain-agnostic or domain-specific standardized data structures, formats, and classifications to cover these needs. Several biomedical consortia have significantly contributed to this frontier in recent years, such as the National Patient-Centered Clinical Research Network (258), the Observational Health Data Sciences and Informatics program (259), the US FDA Sentinel System (260), and the Health Level Seven organization (261). Major efforts in the precision oncology space include the NCI’s Genomic Data Commons data model (262), the American Society of Clinical Oncology’s Minimal Common Oncology Data Elements model (263), the OSIRIS CDM (264), and the Precision Oncology Core Data Model (265). Extracting and formatting data into CDMs can be time-consuming and error-prone. To address these challenges, LLMs are increasingly being used to extract and format structured information from unstructured free text in clinical reports and EHRs (266, 267). CDMs and LLMs are highly synergistic, as LLMs can create CDM-compliant datasets, which can then be analyzed with tools that operate on CDM datasets.
Implementation of domain-specific data standards benefits from informatics tools that can support data standardization tasks or use commonly accepted CDMs to accomplish key objectives. One of the first efforts supported by the NCI through its Informatics Technologies in Cancer Research (ITCR) consortium is a quality assurance system to convert pattern-based expressions from integrated repositories into the Web Ontology Language syntax after identifying duplicates and detecting modeling errors in common data elements (268). Other attempts to promote interoperability include a web-based system developed to map North American Association of Central Cancer Registries–compliant data to the NCI Thesaurus, which utilizes data from cancer registries according to the North American Association of Central Cancer Registries model (269). Current efforts leverage accomplishments in biomedical data standardization and seek to apply existing data standards rather than reconstruct new solutions. One approach uses the Precision Oncology Core Data Model to construct a data repository for a precision oncology decision support platform (NCI, 1U01CA274631-01A1), whereas another approach combines LLMs with advanced AI algorithms to harmonize EMR/EHR data according to the Minimal Common Oncology Data Elements CDM, subsequently processed by a distributed analysis platform developed for cancer research (NCI, 1U01CA274576-01A1).
Other informatics tools targeting different areas may still benefit from the above work by reusing the developed solutions. Whereas genomics datasets are broadly uploaded to common databases with standardized formats, including GEO, dbGaP, and ERA commons, obtaining and standardizing these datasets in specific disease contexts is still labor-intensive. Tools such as cBioPortal (254), WebMeV (102), Xena (255), and recount (270) attempt to overcome this by centralizing analysis and preprocessing of public domain data. The Bioconductor project has developed standardized software structures to store and operate on multiomics datasets (271). The NCI’s Imaging Data Commons and emerging software standards and platforms are providing support for imaging technologies (272). At the same time, new data standards are emerging for the advances in high-throughput research data, including datasets that span clinical, biospecimen, sequencing, and spatial molecular technologies being developed in the Human Tumor Atlas Network, providing further opportunities for learning from multimodal datasets (273). Still, considerable informatics research is required to develop infrastructure and implement analysis methods bridging these diverse data modalities.
Data analysis platforms
As cancer research has evolved, the number of software tools required to curate, preprocess, analyze, and visualize cancer data in all its modalities has exploded alongside the proliferation of data types and volume of data. Because tools are designed for varying hardware and software environments, it is a significant challenge to maintain reproducible analysis workflows and coordinate the flow of data across tools. Accessibility is another challenge as many cancer researchers do not have the informatics expertise to set up and run analysis tools and visualizations. Data analysis platforms address these challenges by providing a large collection of software tools within an accessible and unified visual environment that offers functionality to manage and support the analysis process, such as recording and replaying of analyses, joining of individual tools into workflows, sharing of analyses and results, tutorial vignettes, and support for a wide range of data types. Typically, a platform’s tool collection will span many steps of derivation, from preprocessing to downstream analysis, and offer several choices at each step, allowing scientists to choose the tool that best suits their requirements. Platforms feature open architectures that allow members of the community to contribute new tools easily and run tools on many different software and hardware configurations. The purpose of developing software in compliance with informatics data standards is to allow for interoperability and applicability across various data modalities and settings. Data analysis platforms use standards to connect individual software tools into cohesive analysis pipelines to further aid with reproducibility and automation (274, 275).
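The idea of joining tools into a workflow that can be recorded and replayed can be sketched in a few lines. This is a toy illustration only, not how any particular platform implements provenance: the `run_workflow` helper and the two stand-in "tools" are hypothetical, and real platforms track far more (tool versions, environments, and inputs).

```python
import hashlib
import json
from typing import Any, Callable

# A step is (name, function, keyword parameters).
Step = tuple[str, Callable[..., Any], dict]


def run_workflow(data: Any, steps: list[Step]) -> tuple[Any, list[dict]]:
    """Run steps in order, logging each step's name, parameters, and output digest."""
    log = []
    for name, func, params in steps:
        data = func(data, **params)
        # Hash the JSON-serialized output so a replay can be compared to the record.
        digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
        log.append({"step": name, "params": params, "output_sha256": digest})
    return data, log


# Toy "tools" standing in for real preprocessing and analysis software.
def filter_low(values, minimum):
    return [v for v in values if v >= minimum]


def scale(values, factor):
    return [v * factor for v in values]


steps = [("filter_low", filter_low, {"minimum": 2}), ("scale", scale, {"factor": 10})]
result, log = run_workflow([1, 2, 3], steps)
replayed, replay_log = run_workflow([1, 2, 3], steps)
```

Because each step's parameters and output digest are logged, rerunning the same steps on the same input yields an identical log, which is one simple way to check that an analysis has been reproduced.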
Numerous data analysis platforms exist that collectively provide access to thousands of tools. The Bioconductor (251) ecosystem includes thousands of packages written in the R programming language. The Galaxy (252), GenePattern (276), and WebMeV (102) platforms each provide a collection of analysis methods for a variety of cancer data modalities, as well as general AI approaches. They are available as online resources requiring only a web browser for use, though local installation is also available. The NCI Cancer Cloud Resources, Broad Institute FireCloud, Seven Bridges Cancer Genomics Cloud (277), and the ISB Cancer Gateway in the Cloud (278) are NCI-sponsored platforms that provide access to many large public cancer datasets. However, use of these platforms to run analysis tools and store processed datasets incurs costs because they run on commercial cloud computing platforms. Due to the rise of computational notebooks as a key medium for scientific communication in both R and Python, several platforms have incorporated the notebook metaphor. Galaxy enables users to launch, use, and save Jupyter notebooks within the Galaxy interface (253). GenePattern has released GenePattern Notebooks (279), which allow scientists to insert GenePattern analyses as cells in a Jupyter notebook. In addition to supporting analysis of specific datasets, notebooks can provide the foundation of code templates or well-annotated vignettes to adapt to custom datasets and analyses. Conformance to good software practices is critical to developing high-quality, maintainable, reproducible, and scalable software solutions for cancer informatics research (arXiv 2306.03255; ref. 280). AI developer tools, such as GitHub Copilot, can reduce the coding expertise needed to adapt these templates and software for new analysis tasks.
Using data analysis platforms, it is often possible to run several different analysis tools for the same task. Choosing between different tools for similar analysis tasks is challenging. Each tool performs optimally within a preestablished set of parameters or conditions, which should be documented to enable the user to evaluate the suitability of a given method for their desired application and evaluate method performance. When developing a new method, performance of the method should first be tested in simulated datasets with user-set conditions and a wide range of selected parameters to test method sensitivity (281). Next, the method should be applied to biological datasets with well-defined conditions for which an underlying ground truth is already known. High-performing methods are often identified by comparing method performance in simulated and biological datasets across multiple methods, so long as outcomes are quantifiable and ground truth is clearly defined (282). Independent training and test datasets are needed to avoid model overfitting and improper parameter selection. To quantify performance, simple metrics may be useful; however, for complex biological systems, users must consider the assumptions behind any performance metric and attempt to include multiple approaches for quantitative comparison. To help choose suitable analysis tools, some data analysis platforms generate a summary of analysis outputs from different tools that can then be used to compare and select tools.
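The benchmarking steps described above (simulated data with a known ground truth, a range of user-set conditions, independent training and test sets, and more than one metric) can be sketched as follows. The two "methods" are deliberately simple stand-ins and the simulation model is an assumption chosen for brevity, not a benchmark from the cited studies.

```python
import random
from statistics import mean


def simulate(n, effect, rng):
    """Simulate one feature per sample; class 1 is shifted by `effect` (the known ground truth)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(effect * label, 1.0), label))
    return data


def fit_midpoint(train):
    """Method A: threshold halfway between the two class means learned from training data."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2


def fit_fixed(train):
    """Method B: a naive baseline that ignores the training data."""
    return 0.0


def evaluate(threshold, test):
    """Return accuracy, sensitivity, and specificity on held-out data."""
    tp = sum(1 for x, y in test if y == 1 and x > threshold)
    tn = sum(1 for x, y in test if y == 0 and x <= threshold)
    pos = sum(y for _, y in test)
    neg = len(test) - pos
    return {"accuracy": (tp + tn) / len(test), "sensitivity": tp / pos, "specificity": tn / neg}


rng = random.Random(0)  # fixed seed so the comparison is repeatable
results = {}
for effect in (0.5, 1.0, 2.0, 4.0):  # user-set simulation conditions
    train, test = simulate(500, effect, rng), simulate(500, effect, rng)  # independent sets
    results[effect] = {
        "midpoint": evaluate(fit_midpoint(train), test),
        "fixed": evaluate(fit_fixed(train), test),
    }
```

Reporting sensitivity and specificity alongside accuracy matters here: the fixed-threshold baseline can look acceptable on one metric while performing poorly on another, which a single summary number would hide.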
Data visualization
Data visualization is a crucial complement to statistical analyses in its ability to bring trends, patterns, and discrepancies quickly to the attention of researchers. Visual representations of data not only aid in interpretation but also facilitate the generation of hypotheses and the formulation of experimental strategies. Visualization tools for cancer data are often available as components of an online public data repository, allowing scientists to easily define a cohort or subset of data from a cancer study, identify features of interest, and visualize the resulting data in a variety of ways appropriate to each datatype. The cBioPortal for Cancer Genomics (283) and Xena (255) provide these capabilities for cancer genomics data from TCGA and other initiatives (284).
Whereas portal-based visualization tools offer easy access to large-scale datasets, scientists frequently require additional flexibility in the datasets they support and the way that data can be navigated. These needs are supported by standalone software visualization packages that provide easy access to local datasets. The Integrative Genomics Viewer (285) and JBrowse (286) both are available as local installations or web-based applications. Both base their viewing model on a linear genome, with data laid out as tracks spanning genomic regions. Each adds its own selection of nonlinear viewing options for additional insights, including circular genome, splice junction, 3D genome, discontinuous gene set, dotplot, and synteny views. To allow their tools to be used in other portals and online resources, the developers of Integrative Genomics Viewer and JBrowse have made their tools available as widgets that can be embedded in web pages. For array-type data, the Next-Generation Clustered Heat Map (287) tool provides comprehensive functionality for visualizing, navigating, and clustering datasets in a heatmap format.
Discussion
In an era in which new data are being rapidly generated and becoming increasingly high-dimensional, high-resolution, and multimodal, we anticipate that the computational community will only grow as it addresses the growing need to adapt to such changes. We anticipate the growing importance of AI in the cancer informatics and healthcare spheres, in which AI may aid in both automation of data analysis and software development as well as the prediction of cell behaviors and clinical factors in patients with cancer. Cancer informatics principles and community standards will become ever more important as multidisciplinary research interactions increase. Collaborative analyses will help drive optimal information extraction from patient and experimental data, as well as from the biological application and clinical translation of informatics-derived results. In the cancer biology space, tailoring informatics tools to the complexities of the cancer ecosystem is important, but we do so with a broad applicability in mind, knowing that these tailored tools may pave the way for future advances in biomedical research.
Use of informatics tools in research related to cancer therapeutics remains a rapidly evolving area. In some cases, new areas of biology relevant to cancer open new avenues for data generation and reveal a need for new informatics tools. For example, research on the role of the microbiome in cancer risk and oncogenic mechanisms initially focused on species identification through marker gene surveys but now integrates metabolite, metaproteome, and metatranscriptome profiling. Recent studies have shifted from characterizing microbiome composition to understanding microbiome function, in which informatics software supporting microbiome DNA sequencing data analysis (288) provides tools for the investigation of microbial functional and metabolic networks. The complexity of microbiome data and its potential relationships with cancer has led to AI-based approaches aimed at the typing of cancer based on tissue-specific microbial information (289), cancer diagnosis (288, 290), and cancer therapy selection (291). On the other hand, new data types such as high-resolution ST data can also drive the development of new informatics tools as their value to the interrogation of the cancer microenvironment and spatial tumor phenotypes is realized (191). As more tools and new data modalities become available, benchmarking of tools and standardization of analysis pipelines will become more important. Such benchmarking is enabled by using publicly available benchmark datasets such as TCGA and International Cancer Genome Consortium to test new tools and making tools available for cross-comparison within the informatics community. The NCI ITCR connectivity map (292) developed by NDEx 2.0 (293) depicts current and planned relationships between tools and helps with integration and proper use of benchmarked methods. 
To enable the development of tailored informatics tools for cancer research and clinical care, it is critical to have access to publicly available omics, imaging, and clinical datasets. Data are currently made available at a centralized location, often enabled by consortia focused on certain applications (294–296). However, centralization can be challenging due to various issues including privacy, data ownership, intellectual property, and the need to comply with different regulatory policies, such as the Health Insurance Portability and Accountability Act in the United States (297) and the General Data Protection Regulation in the European Union (298). Moving forward, investigators are adapting to the realities of limitations in data sharing across institutions by introducing infrastructure for the federated training of algorithms. The Federated Tumor Segmentation tool (299), the Generally Nuanced Deep Learning Framework (300), and the Open Federated Learning library (301) were the first open-source tools to demonstrate the feasibility of facilitating real-world, secure, and multi-institutional collaborations without the need to share patient data, therefore alleviating concerns for legal, privacy, and data ownership considerations and paving the way toward accelerating research of cancer radiophenotypes. Of note is the largest real-world federated learning study to date, facilitated by these tools, that developed and validated AI models by leveraging knowledge from patient data from 71 geographically distinct institutions across six continents (302). Data sharing and cross-institutional collaborative efforts will greatly enable the progression of cancer bioinformatics and the contribution of multidisciplinary investigators to its goals. 
Still, the regulatory programs, privacy preservation considerations (303), and infrastructure to obtain research data cohorts across institutions remain active areas of research essential to empower the next generation of cancer informatics.
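To make the federated training described above concrete, the following sketch uses federated averaging on a toy linear model: each site updates the shared model on its own data, and only model weights are sent to the server for aggregation. This is a generic textbook-style illustration under simplifying assumptions (simulated data, a two-parameter model, no encryption or secure aggregation); it is not the implementation used by the Federated Tumor Segmentation tool, the Generally Nuanced Deep Learning Framework, or the Open Federated Learning library.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent passes on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(site_weights, site_sizes):
    """Server step: average the site models, weighted by local sample count."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return w, b


# Simulated private datasets at three sites, all drawn from y = 3x + 1 plus noise.
rng = random.Random(0)
sites = []
for n in (40, 80, 120):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    sites.append([(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs])

global_weights = (0.0, 0.0)
for _ in range(100):  # communication rounds
    # Each site trains locally; only the updated weights leave the site.
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(d) for d in sites])
```

In this toy setting the aggregated weights approach the slope and intercept used to simulate the data even though no site's records are pooled. Real deployments must additionally handle heterogeneous data across sites and the privacy considerations noted above.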
Increased access to digitized data presents the opportunity to advance clinical informatics and healthcare through AI. Whereas informatics tools are often tailored to a specific biological task, many AI methods can be generally applied for informatics analyses and biomarker discovery. This ready access can reduce the barrier to entry for informatics and extend predictive power in cancer informatics even to cases without human annotations. However, the black box nature of many AI methods can yield noninterpretable features that are difficult to trust for clinical decision-making and may be biased based on the composition of the training data or technical artifacts between hospitals. Ultimately, AI, like informatics, is a class of methods whose performance depends on the training data used to inform the algorithm and the specific ML model selected for the analysis. Research in explainable AI for cancer is a developing field and includes methods such as neural networks with internal structures that mimic biological systems (304, 305). These approaches can enhance the trustworthiness of AI methods for cancer informatics while simultaneously enhancing their accuracy. Moreover, the incorporation of prior biological or clinical knowledge into these algorithms can also reduce the amount of training data needed for them to be accurate. As a complement to AI-based methods, mechanistic mathematical models can play a further role in simulating changes to cellular or patient-level phenotypes over time to predict emergent cell behaviors. When integrated with digitized data, both AI methods and mathematical models provide the potential to leverage large-scale cancer datasets to develop digital twins and other virtual clinical trial systems that may reduce the amount of patient or preclinical data needed for biomarker and therapeutic discovery.
AI will also play a role in cancer informatics by automating aspects of software development and data analysis via code generation using LLMs (306). AI is likely to become commonplace in these activities, suggesting code for new features in software as well as writing code to process and visualize datasets to reduce the barrier of entry to AI for cancer researchers.
Whereas AI is democratizing access to data science and informatics, understanding the selection of tools for specific analysis tasks and interpreting the output from these methods still often requires dual expertise in informatics and cancer biology. This challenge is compounded by the rapid advance of new technologies to investigate cancer, which requires still further methods to account for technology-specific effects and the new biological and clinical questions that these technologies unlock. The close connection between clinical and biological questions and algorithm development is making cancer research increasingly multidisciplinary and reliant on large-scale team science involving scientists from diverse disciplines. Bridging the gap between deep algorithmic theory, software, and disease biology requires that all investigators speak the same language, regardless of their background, for effective implementation and development of informatics. Up-to-date and broad educational resources are as essential to realizing the power of informatics as the software tools themselves and as recommendations for standardization, validation, and good clinical practice of AI in oncology (307). Projects like the ITCR Training Network (308) exist to assist the cancer informatics community in both these aspects by wide knowledge dissemination via online training materials and workshops. Software developed by the same group (309) is intended to enable researchers to create courses, websites, and videos about the usage of their tools and other fundamental topics in a manner that is easy to update, enabling content to stay up-to-date with technological innovations. Standardized training on shared resources through networks such as ITCR will ensure that the new and expanding wave of cancer researchers has access to the appropriate methods and best practices suited to specific biological and clinical research questions.
In turn, these questions will motivate the next generation of measurement technologies, analysis methods, and software that will empower biological discovery for cancer and beyond to the broader biomedical research community. Together, this blending of informatics and oncology research marks the start of a new paradigm of a tripartite mission of data science, bench, and bedside in the current generation of cancer research.
Acknowledgments
We thank Juli Klemm for feedback on the manuscript and her tireless advocacy for open-source software development in the NCI ITCR. Her leadership has promoted the culture of open science, empowering democratized access to informatics for modern cancer research. This work was supported by NIH grants U24CA279629 and U01CA242871 (to S. Bakas); U01CA269409 (to P.G. Camara); U01CA220401 and U24CA19436201 (to L.A.D. Cooper); U24CA237719, U24CA258115, U24CA275783, and U01CA248235 (to M. Griffith); U24CA231877, U2CCA233280, and U24HG010263 (to J. Goecks); U24CA269436, 1OT2OD032742-01, 5P41GM103504-12, and 5U24HG012107-02 (to T. Ideker); 1U54CA274502-01 (to T. Ideker as MPI); U24CA258393 (to R. Karchin); 5U24CA189523-05, 5R01CA161749-09, and 5R01CA223816-05 (to D. Kontos); 5U24CA258482, 5R01EB009352, and 5U24CA253531 (to D. Marcus); U24CA237617 (to C.A. Meyer); U24CA248138 (to B. Peters); 1U24CA269436-01A1 (to D. Pratt); U24CA248453 (to B.J. Raphael); U24CA248457 (to M. Reich); U24CA248010 (to G.K. Savova); UE5CA254170 (to C. Wright); and U01CA271273 and U54CA253403 (to E.J. Fertig).
Authors’ Disclosures
J. Goecks reports other support from GalaxyWorks outside the submitted work. M. Griffith reports grants from the NIH during the conduct of the study, as well as personal fees from Pathfinder Oncology, grants and personal fees from Jaime Leandro Foundation, and grants from Natera, Inc., outside the submitted work. T. Ideker reports being a cofounder, member of the advisory board, and having an equity interest in Data4Cure and Serinus Biosciences. T. Ideker is a consultant for and has an equity interest in Eikon Therapeutics and Ideaya Biosciences. The terms of these arrangements have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. D. Kontos reports grants from GenMab, Calico, iCAD, and Hologic outside the submitted work. D. Marcus reports grants from the NIH during the conduct of the study and other support from Embark Laboratories outside the submitted work. B. Peters reports grants from the NCI during the conduct of the study. B.J. Raphael reports grants from the NCI during the conduct of the study and personal fees from Merck outside the submitted work. G.K. Savova reports grants from the NIH during the conduct of the study. C. Wright reports grants from the NCI during the conduct of the study and personal fees from Coursera and Leanpub outside the submitted work. E.J. Fertig reports grants from the NIH/NCI, the National Foundation for Cancer Research, Break Through Cancer, and the Lustgarten Foundation during the conduct of the study, as well as personal fees from Mestag Therapeutics and Resistance Bio and grants from Roche/Genentech and Abbvie, Inc., outside the submitted work. S. Bakas reports grants from the NIH/NCI/ITCR during the conduct of the study and is a member of the Board of Directors of the Medical Image Computing and Computer Assisted Interventions Society; the Chair of the AI-RANO cooperative group; and the Vice Chair for Benchmarking and Clinical Translation in the MLCommons organization. No disclosures were reported by the other authors.
References
- 1. Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol 2018;15:81–94.
- 2. Chaunzwa TL, Del Rey MQ, Bitterman DS. Clinical informatics approaches to understand and address cancer disparities. Yearb Med Inform 2022;31:121–30.
- 3. Hong N, Sun G, Zuo X, Chen M, Liu L, Wang J, et al. Application of informatics in cancer research and clinical practice: opportunities and challenges. Cancer Innov 2022;1:80–91.
- 4. Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 2016;17:628–41.
- 5. Correa-Aguila R, Alonso-Pupo N, Hernández-Rodríguez EW. Multi-omics data integration approaches for precision oncology. Mol Omics 2022;18:469–79.
- 6. Patel SK, George B, Rai V. Artificial intelligence to decode cancer mechanism: beyond patient stratification for precision oncology. Front Pharmacol 2020;11:1177.
- 7. Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 2012;28:2520–2.
- 8. Bonaguro L, Schulte-Schrepping J, Ulas T, Aschenbrenner AC, Beyer M, Schultze JL. A guide to systems-level immunomics. Nat Immunol 2022;23:1412–23.
- 9. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.
- 10. Bitterman DS, Goldner E, Finan S, Harris D, Durbin EB, Hochheiser H, et al. An end-to-end natural language processing system for automatically extracting radiation therapy events from clinical texts. Int J Radiat Oncol Biol Phys 2023;117:262–73.
- 11. Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, et al. DeepPhe: a natural language processing system for extracting cancer phenotypes from clinical records. Cancer Res 2017;77:e115–8.
- 12. Yao J, Hochheiser H, Yoon W, Goldner E, Savova GK, Naumann T, et al. Overview of the 2024 shared task on chemotherapy treatment timeline extraction. In: Proceedings of the 6th Clinical Natural Language Processing Workshop; Mexico City, Mexico: Association for Computational Linguistics; 2024. p. 557–69.
- 13. Bradshaw RL, Kawamoto K, Kaphingst KA, Kohlmann WK, Hess R, Flynn MC, et al. GARDE: a standards-based clinical decision support platform for identifying population health management cohorts. J Am Med Inform Assoc 2022;29:928–36.
- 14. LLS PedAL. Genomic eligibility algorithm for better outcomes. The University of Chicago; 2025.
- 15. Lee S, Park HJ, Hwang J, Lee SW, Han KS, Kim WY, et al. Machine learning-based models for prediction of critical illness at community, paramedic, and hospital stages. Emerg Med Int 2023;2023:1221704.
- 16. Lybarger K, Dobbins NJ, Long R, Singh A, Wedgeworth P, Uzuner Ö, et al. Leveraging natural language processing to augment structured social determinants of health data in the electronic health record. J Am Med Inform Assoc 2023;30:1389–97.
- 17. Chen S, Guevara M, Ramirez N, Murray A, Warner JL, Aerts H, et al. Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy. JCO Clin Cancer Inform 2023;7:e2300048.
- 18. Chen S, Kann BH, Foote MB, Aerts HJWL, Savova GK, Mak RH, et al. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol 2023;9:1459–62.
- 19. Bedi S, Liu Y, Orr-Ewing L, Dash D, Koyejo S, Callahan A, et al. Testing and evaluation of health care applications of large language models: a systematic review. JAMA 2025;333:319–28.
- 20. Napel S, Mu W, Jardim-Perassi BV, Aerts HJWL, Gillies RJ. Quantitative imaging of cancer in the postgenomic era: radio(geno)mics, deep learning, and habitats. Cancer 2018;124:4633–49.
- 21. Prior F, Smith K, Sharma A, Kirby J, Tarbox L, Clark K, et al. The public cancer radiology imaging collections of The Cancer Imaging Archive. Sci Data 2017;4:170124.
- 22. Chakrabarty S, Abidi SA, Mousa M, Mokkarala M, Hren I, Yadav D, et al. Integrative imaging informatics for cancer research: workflow automation for neuro-oncology (I3CR-WANO). JCO Clin Cancer Inform 2023;7:e2200177.
- 23. Kickingereder P, Isensee F, Tursunova I, Petersen J, Neuberger U, Bonekamp D, et al. Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study. Lancet Oncol 2019;20:728–40.
- 24. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88.
- 25. Pai S, Bontempi D, Hadzic I, Prudente V, Sokač M, Chaunzwa TL, et al. Foundation model for cancer imaging biomarkers. Nat Mach Intell 2024;6:354–67.
- 26. Paschali M, Chen Z, Blankemeier L, Varma M, Youssef A, Bluethgen C, et al. Foundation models in radiology: what, how, why, and why not. Radiology 2025;314:e240597.
- 27. Fedorov A, Beichel R, Kalpathy-Cramer J, Finet J, Fillion-Robin J-C, Pujol S, et al. 3D slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging 2012;30:1323–41.
- 28. Kapur T, Pieper S, Fedorov A, Fillion-Robin J-C, Halle M, O’Donnell L, et al. Increasing the impact of medical image computing using community-based open-access hackathons: the NA-MIC and 3D slicer experience. Med Image Anal 2016;33:176–80.
- 29. Davatzikos C, Rathore S, Bakas S, Pati S, Bergman M, Kalarot R, et al. Cancer imaging phenomics toolkit: quantitative imaging analytics for precision diagnostics and predictive modeling of clinical outcome. J Med Imaging (Bellingham) 2018;5:011018.
- 30. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res 2017;77:e104–7.
- 31. Aerts HJWL, Grossmann P, Tan Y, Oxnard GR, Rizvi N, Schwartz LH, et al. Defining a radiomic response phenotype: a pilot study using targeted therapy in NSCLC. Sci Rep 2016;6:33860.
- 32. Antunes JT, Ismail M, Hossain I, Wang Z, Prasanna P, Madabhushi A, et al. RADIomic spatial TexturAl descriptor (RADISTAT): quantifying spatial organization of imaging heterogeneity associated with tumor response to treatment. IEEE J Biomed Health Inform 2022;26:2627–36.
- 33. Braman N, Prasanna P, Bera K, Alilou M, Khorrami M, Leo P, et al. Novel radiomic measurements of tumor-associated vasculature morphology on clinical imaging as a biomarker of treatment response in multiple cancers. Clin Cancer Res 2022;28:4410–24.
- 34. Selim M, Zhang J, Fei B, Zhang G-Q, Chen J. STAN-CT: standardizing CT image using generative adversarial networks. AMIA Annu Symp Proc 2021;2020:1100–9.
- 35. Farahani K, Kalpathy-Cramer J, Chenevert TL, Rubin DL, Sunderland JJ, Nordstrom RJ, et al. Computational challenges and collaborative projects in the NCI quantitative imaging network. Tomography 2016;2:242–9.
- 36. Marcus DS, Olsen TR, Ramaratnam M, Buckner RL. The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics 2007;5:11–34.
- 37. Fedorov A, Longabaugh WJR, Pot D, Clunie DA, Pieper SD, Gibbs DL, et al. National cancer institute imaging data commons: toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 2023;43:e230180.
- 38. Ziegler E, Urban T, Brown D, Petts J, Pieper SD, Lewis R, et al. Open health imaging foundation viewer: an extensible open-source framework for building web-based imaging applications to support cancer research. JCO Clin Cancer Inform 2020;4:336–45.
- 39. Doran SJ, Al Sa’d M, Petts JA, Darcy J, Alpert K, Cho W, et al. Integrating the OHIF viewer into XNAT: achievements, challenges and prospects for quantitative imaging studies. Tomography 2022;8:497–512.
- 40. Moore SM, Quirk JD, Lassiter AW, Laforest R, Ayers GD, Badea CT, et al. Co-clinical imaging metadata information (CIMI) for cancer research to promote open science, standardization, and reproducibility in preclinical imaging. Tomography 2023;9:995–1009.
- 41. Branzoli F, Liserre R, Deelchand DK, Poliani PL, Bielle F, Nichelli L, et al. Neurochemical differences between 1p/19q codeleted and noncodeleted IDH-Mutant gliomas by in vivo MR spectroscopy. Radiology 2023;308:e223255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Zhang F, Noh T, Juvekar P, Frisken SF, Rigolo L, Norton I, et al. SlicerDMRI: diffusion MRI and tractography research software for brain cancer surgery planning and visualization. JCO Clin Cancer Inform 2020;4:299–309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Zhong Z, Kim Y, Plichta K, Allen BG, Zhou L, Buatti J, et al. Simultaneous cosegmentation of tumors in PET-CT images using deep fully convolutional networks. Med Phys 2019;46:619–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Agraz JL, Grenko CM, Chen AA, Viaene AN, Nasrallah MD, Pati S, et al. Robust image population based stain color normalization: how many reference slides are enough? IEEE Open J Eng Med Biol 2022;3:218–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Wang L, Goldwag J, Bouyea M, Barra J, Matteson K, Maharjan N, et al. Spatial topology of organelle is a new breast cancer cell classifier. iScience 2023;26:107229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Mercan E, Mehta S, Bartlett J, Shapiro LG, Weaver DL, Elmore JG. Assessment of machine learning of breast pathology structures for automated differentiation of breast cancer and high-risk proliferative lesions. JAMA Netw Open 2019;2:e198777. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Zhang X, Cornish TC, Yang L, Bennett TD, Ghosh D, Xing F. Generative adversarial domain adaptation for nucleus quantification in images of tissue immunohistochemically stained for Ki-67. JCO Clin Cancer Inform 2020;4:666–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. Uttam S, Stern AM, Sevinsky CJ, Furman S, Pullara F, Spagnolo D, et al. Spatial domain analysis predicts risk of colorectal cancer recurrence and infers associated tumor microenvironment networks. Nat Commun 2020;11:3515. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Rong R, Sheng H, Jin KW, Wu F, Luo D, Wen Z, et al. A deep learning approach for histology-based nucleus segmentation and tumor microenvironment characterization. Mod Pathol 2023;36:100196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Amgad M, Hodge JM, Elsebaie MAT, Bodelon C, Puvanesarajah S, Gutman DA, et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat Med 2024;30:85–97. [DOI] [PubMed] [Google Scholar]
- 51. Bray M-A, Singh S, Han H, Davis CT, Borgeson B, Hartland C, et al. Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 2016;11:1757–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Rivenson Y, Wang H, Wei Z, de Haan K, Zhang Y, Wu Y, et al. Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning. Nat Biomed Eng 2019;3:466–77. [DOI] [PubMed] [Google Scholar]
- 53. Kiemen AL, Braxton AM, Grahn MP, Han KS, Babu JM, Reichel R, et al. CODA: quantitative 3D reconstruction of large tissues at cellular resolution. Nat Methods 2022;19:1490–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Martel AL, Hosseinzadeh D, Senaras C, Zhou Y, Yazdanpanah A, Shojaii R, et al. An image analysis resource for cancer research: Piip-pathology image informatics platform for visualization, analysis, and management. Cancer Res 2017;77:e83–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. Gutman DA, Khalilia M, Lee S, Nalisnik M, Mullen Z, Beezley J, et al. The digital slide archive: a software platform for management, integration, and analysis of histology for cancer research. Cancer Res 2017;77:e75–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Shen A, Wang F, Paul S, Bhuvanapalli D, Alayof J, Farris AB, et al. An integrative web-based software tool for multi-dimensional pathology whole-slide image analytics. Phys Med Biol 2022;67:224001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Aggarwal A, Bharadwaj S, Corredor G, Pathak T, Badve S, Madabhushi A. Artificial intelligence in digital pathology - time for a reality check. Nat Rev Clin Oncol 2025;22:283–91. [DOI] [PubMed] [Google Scholar]
- 58. Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, et al. Towards a general-purpose foundation model for computational pathology. Nat Med 2024;30:850–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Wang X, Zhao J, Marostica E, Yuan W, Jin J, Zhang J, et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 2024;634:970–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60. Xu H, Usuyama N, Bagga J, Zhang S, Rao R, Naumann T, et al. A whole-slide foundation model for digital pathology from real-world data. Nature 2024;630:181–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Nowell PC, Hungerford DA. Chromosome studies on normal and leukemic human leukocytes. J Natl Cancer Inst 1960;25:85–109. [PubMed] [Google Scholar]
- 62. Weir B, Zhao X, Meyerson M. Somatic alterations in the human cancer genome. Cancer Cell 2004;6:433–8. [DOI] [PubMed] [Google Scholar]
- 63. Chen Z, Yuan Y, Chen X, Chen J, Lin S, Li X, et al. Systematic comparison of somatic variant calling performance among different sequencing depth and mutation frequency. Sci Rep 2020;10:3501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. Stahlberg EA, Abdel-Rahman M, Aguilar B, Asadpoure A, Beckman RA, Borkon LL, et al. Exploring approaches for predictive cancer patient digital twins: opportunities for collaboration and innovation. Front Digit Health 2022;4:1007784. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Ji S, Zhu T, Sethia A, Wang W. Accelerated somatic mutation calling for whole-genome and whole-exome sequencing data from heterogenous tumor samples. Genome Res 2024;34:633–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 2011;21:974–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67. Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput Biol 2016;12:e1004873. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Zaccaria S, Raphael BJ. Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data. Nat Commun 2020;11:4301. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Gabrielaite M, Torp MH, Rasmussen MS, Andreu-Sánchez S, Vieira FG, Pedersen CB, et al. A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers (Basel) 2021;13:6283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 2014;15:R84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 2016;32:1220–2. [DOI] [PubMed] [Google Scholar]
- 72. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020;21:597–614. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature 2013;502:333–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74. Pagel KA, Tokheim C, Kim R, Moad K, Busby B, Zheng L, et al. Integrated informatics analysis of cancer-related variants. JCO Clin Cancer Inform 2020;4:310–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Tarabichi M, Salcedo A, Deshwar AG, Ni Leathlobhair M, Wintersinger J, Wedge DC, et al. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021;18:144–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76. Sashittal P, Zhang H, Iacobuzio-Donahue CA, Raphael BJ. ConDoR: tumor phylogeny inference with a copy-number constrained mutation loss model. Genome Biol 2023;24:272. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. Shin H-C, Lee H-B, Yoo T-K, Lee E-S, Kim RN, Park B, et al. Detection of germline mutations in breast cancer patients with clinical features of hereditary cancer syndrome using a multi-gene panel test. Cancer Res Treat 2020;52:697–713. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78. Stoffel EM, Koeppe E, Everett J, Ulintz P, Kiel M, Osborne J, et al. Germline genetic features of young individuals with colorectal cancer. Gastroenterology 2018;154:897–905.e1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79. Liu M, Liu X, Suo P, Gong Y, Qu B, Peng X, et al. The contribution of hereditary cancer-related germline mutations to lung cancer susceptibility. Transl Lung Cancer Res 2020;9:646–58. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80. Rogers MF, Gaunt TR, Campbell C. CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome. Bioinformatics 2020;36:3637–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81. Sato G, Shirai Y, Namba S, Edahiro R, Sonehara K, Hata T, et al. Pan-cancer and cross-population genome-wide association studies dissect shared genetic backgrounds underlying carcinogenesis. Nat Commun 2023;14:3671. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82. Carrot-Zhang J, Chambwe N, Damrauer JS, Knijnenburg TA, Robertson AG, Yau C, et al. Comprehensive analysis of genetic ancestry and its molecular correlates in cancer. Cancer Cell 2020;37:639–54.e6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83. Graciotti M, Kandalaft LE. Vaccines for cancer prevention: exploring opportunities and navigating challenges. Nat Rev Drug Discov 2025;24:134–50. [DOI] [PubMed] [Google Scholar]
- 84. Hicks JK, Howard R, Reisman P, Adashek JJ, Fields KK, Gray JE, et al. Integrating somatic and germline next-generation sequencing into routine clinical oncology practice. JCO Precis Oncol 2021;5:884–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85. Pugh TJ, Bell JL, Bruce JP, Doherty GJ, Galvin M, Green MF, et al. AACR project GENIE: 100,000 cases and beyond. Cancer Discov 2022;12:2044–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74. [DOI] [PubMed] [Google Scholar]
- 87. Deng W, Mou T, Pawitan Y, Vu TN. Quantification of mutant-allele expression at isoform level in cancer from RNA-seq data. NAR Genom Bioinform 2022;4:lqac052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88. Zhang T, Jia H, Song T, Lv L, Gulhan DC, Wang H, et al. De novo identification of expressed cancer somatic mutations from single-cell RNA sequencing data. Genome Med 2023;15:115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89. Wen J, Rusch M, Brady SW, Shao Y, Edmonson MN, Shaw TI, et al. The landscape of coding RNA editing events in pediatric cancer. BMC Cancer 2021;21:1233. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90. Mertes C, Scheller IF, Yépez VA, Çelik MH, Liang Y, Kremer LS, et al. Detection of aberrant splicing events in RNA-seq data using FRASER. Nat Commun 2021;12:529. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91. Dorney R, Dhungel BP, Rasko JEJ, Hebbard L, Schmitz U. Recent advances in cancer fusion transcript detection. Brief Bioinform 2023;24:bbac519. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92. Ozcan Z, San Lucas FA, Wong JW, Chang K, Stopsack KH, Fowler J, et al. Chromosomal imbalances detected via RNA-sequencing in 28 cancers. Bioinformatics 2022;38:1483–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93. Zapatka M, Borozan I, Brewer DS, Iskar M, Grundhoff A, Alawi M, et al. The landscape of viral associations in human cancers. Nat Genet 2020;52:320–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94. Hu X, Zhang J, Wang J, Fu J, Li T, Zheng X, et al. Landscape of B cell immunity and related immune evasion in human cancers. Nat Genet 2019;51:560–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95. Benesova S, Kubista M, Valihrach L. Small RNA-Sequencing: approaches and considerations for miRNA analysis. Diagnostics (Basel) 2021;11:964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96. Al’Khafaji AM, Smith JT, Garimella KV, Babadi M, Popic V, Sade-Feldman M, et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 2024;42:582–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97. Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol 2019;20:213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98. Haas BJ, Dobin A, Ghandi M, Van Arsdale A, Tickle T, Robinson JT, et al. Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector. Cell Rep Methods 2023;3:100467. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99. Van Arsdale A, Turker L, Chang Y-C, Gould J, Harmon B, Maggi EC, et al. Structure and transcription of integrated HPV DNA in vulvar carcinomas. NPJ Genom Med 2024;9:35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100. Kaur J, Chandrashekar DS, Varga Z, Sobottka B, Janssen E, Kowalski J, et al. Distinct gene expression profiles of matched primary and metastatic triple-negative breast cancers. Cancers (Basel) 2022;14:2447. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101. Gao W, Yang J, Zhuo C, Huang S, Lin J, Wu G, et al. A pipeline to call multilevel expression changes between cancer and normal tissues and its applications in repurposing drugs effective for gastric cancer. Biomed Res Int 2020;2020:3451610. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102. Wang YE, Kutnetsov L, Partensky A, Farid J, Quackenbush J. WebMeV: a cloud platform for analyzing and visualizing cancer genomic data. Cancer Res 2017;77:e11–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103. Herman JS, Sagar, Grün D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat Methods 2018;15:379–86. [DOI] [PubMed] [Google Scholar]
- 104. Lin P, Troup M, Ho JWK. CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol 2017;18:59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105. Sherman TD, Gao T, Fertig EJ. CoGAPS 3: bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures. BMC Bioinformatics 2020;21:453. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106. Shao C, Höfer T. Robust classification of single-cell transcriptome data by nonnegative matrix factorization. Bioinformatics 2017;33:235–42. [DOI] [PubMed] [Google Scholar]
- 107. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109. Irizarry RA, Wang C, Zhou Y, Speed TP. Gene set enrichment analysis made simple. Stat Methods Med Res 2009;18:565–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. Systematic RNA interference reveals that oncogenic KRAS-Driven cancers require TBK1. Nature 2009;462:108–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111. Fan J, Salathia N, Liu R, Kaeser GE, Yung YC, Herman JL, et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat Methods 2016;13:241–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112. Meng C, Basunia A, Peters B, Gholami AM, Kuster B, Culhane AC. MOGSA: integrative single sample gene-set analysis of multiple omics data. Mol Cell Proteomics 2019;18(8 suppl 1):S153–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113. Haan JC, Bhaskaran R, Ellappalayam A, Bijl Y, Griffioen CJ, Lujinovic E, et al. MammaPrint and BluePrint comprehensively capture the cancer hallmarks in early-stage breast cancer patients. Genes Chromosomes Cancer 2022;61:148–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114. Schildgen V, Warm M, Brockmann M, Schildgen O. Oncotype DX breast cancer recurrence score resists inter-assay reproducibility with RT2-profiler multiplex RT-PCR. Sci Rep 2019;9:20266. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115. Chung CH, Parker JS, Karaca G, Wu J, Funkhouser WK, Moore D, et al. Molecular classification of head and neck squamous cell carcinomas using patterns of gene expression. Cancer Cell 2004;5:489–500. [DOI] [PubMed] [Google Scholar]
- 116. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52. [DOI] [PubMed] [Google Scholar]
- 117. Chu T, Wang Z, Pe’er D, Danko CG. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat Cancer 2022;3:505–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118. Williams CG, Lee HJ, Asatsuma T, Vento-Tormo R, Haque A. An introduction to spatial transcriptomics for biomedical research. Genome Med 2022;14:68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119. Wang Y, Sarfraz I, Pervaiz N, Hong R, Koga Y, Akavoor V, et al. Interactive analysis of single-cell data using flexible workflows with SCTK2. Patterns (N Y) 2023;4:100814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120. Linderman GC, Zhao J, Roulis M, Bielecki P, Flavell RA, Nadler B, et al. Zero-preserving imputation of single-cell RNA-seq data. Nat Commun 2022;13:192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121. Luecken MD, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller MF, et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 2022;19:41–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122. Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol 2019;20:194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123. Stein-O’Brien GL, Clark BS, Sherman T, Zibetti C, Hu Q, Sealfon R, et al. Decomposing cell identity for transfer learning across cellular measurements, platforms, tissues, and species. Cell Syst 2019;8:395–411.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124. Gao R, Bai S, Henderson YC, Lin Y, Schalck A, Yan Y, et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol 2021;39:599–608. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125. Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 2018;19:477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127. Yang Q, Xu Z, Zhou W, Wang P, Jiang Q, Juan L. An interpretable single-cell RNA sequencing data clustering method based on latent Dirichlet allocation. Brief Bioinform 2023;24:bbad199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128. Van Laere M, Van Goethem M, Bosmans J. Fibrosis of the breast with papillomatosis and calcifications. J Belge Radiol 1990;73:536–7. [PubMed] [Google Scholar]
- 129. Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods 2008;5:829–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 2008;132:311–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-Binding proteins and nucleosome position. Nat Methods 2013;10:1213–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132. Belton J-M, McCord RP, Gibcus JH, Naumova N, Zhan Y, Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 2012;58:268–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009;462:315–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134. Taing L, Dandawate A, L’Yi S, Gehlenborg N, Brown M, Meyer CA. Cistrome Data Browser: integrated search, analysis and visualization of chromatin data. Nucleic Acids Res 2024;52:D61–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135. Li S, Wan C, Zheng R, Fan J, Dong X, Meyer CA, et al. Cistrome-GO: a web server for functional enrichment analysis of transcription factor ChIP-seq peaks. Nucleic Acids Res 2019;47:W206–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136. Qin Q, Fan J, Zheng R, Wan C, Mei S, Wu Q, et al. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol 2020;21:32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137. L’Yi S, Keller MS, Dandawate A, Taing L, Chen C-H, Brown M, et al. Cistrome Explorer: an interactive visual analysis tool for large-scale epigenomic data. Bioinformatics 2023;39:btad018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 138. L’Yi S, Wang Q, Lekschas F, Gehlenborg N. Gosling: a grammar-based toolkit for scalable and interactive genomics data visualization. IEEE Trans Vis Comput Graph 2022;28:140–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139. Hvitfeldt E, Xia C, Siegmund KD, Shibata D, Marjoram P. Epigenetic conservation is a beacon of function: an analysis using Methcon5 software for studying gene methylation. JCO Clin Cancer Inform 2020;4:100–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 140. Silva TC, Coetzee SG, Gull N, Yao L, Hazelett DJ, Noushmehr H, et al. ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles. Bioinformatics 2019;35:1974–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141. Guo X, Feng C, Xing J, Cao Y, Liu T, Yang W, et al. Epigenetic profiling for prognostic stratification and personalized therapy in breast cancer. Front Immunol 2024;15:1510829. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142. Wang N, Li Y, Wang Y, Wang W. Integration of multi-omics profiling reveals an epigenetic-based molecular classification of lung adenocarcinoma: implications for drug sensitivity and immunotherapy response prediction. Front Pharmacol 2025;16:1540477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 2019;570:385–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 144. Chen M-JM, Li J, Wang Y, Akbani R, Lu Y, Mills GB, et al. TCPA v3.0: an integrative platform to explore the pan-cancer analysis of functional proteomic data. Mol Cell Proteomics 2019;18(8 Suppl 1):S15–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145. Wang K, Huang C, Nice E. Recent advances in proteomics: towards the human proteome. Biomed Chromatogr 2014;28:848–57. [DOI] [PubMed] [Google Scholar]
- 146. Fortelny N, Overall CM, Pavlidis P, Freue GVC. Can we predict protein from mRNA levels? Nature 2017;547:E19–20. [DOI] [PubMed] [Google Scholar]
- 147. Ponomarenko EA, Krasnov GS, Kiseleva OI, Kryukova PA, Arzumanian VA, Dolgalev GV, et al. Workability of mRNA sequencing for predicting protein abundance. Genes (Basel) 2023;14:2065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 148. Rodriguez H, Zenklusen JC, Staudt LM, Doroshow JH, Lowy DR. The next horizon in precision oncology: proteogenomics to inform cancer diagnosis and treatment. Cell 2021;184:1661–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 149. Thangudu RR, Rudnick PA, Holck M, Singhal D, MacCoss MJ, Edwards NJ, et al. Abstract LB-242: proteomic data commons: a resource for proteogenomic analysis. Cancer Res 2020;80(Suppl 18):LB-242. [Google Scholar]
- 150. Rudnick PA, Markey SP, Roth J, Mirokhin Y, Yan X, Tchekhovskoi DV, et al. A description of the clinical proteomic tumor analysis consortium (CPTAC) common data analysis pipeline. J Proteome Res 2016;15:1023–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151. Tabb DL. The SEQUEST family tree. J Am Soc Mass Spectrom 2015;26:1814–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152. Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, et al. A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases. J Proteome Res 2020;19:2772–85. [DOI] [PubMed] [Google Scholar]
- 153. Tyanova S, Temu T, Cox J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 2016;11:2301–19. [DOI] [PubMed] [Google Scholar]
- 154. Mehta S, Easterly CW, Sajulga R, Millikin RJ, Argentini A, Eguinoa I, et al. Precursor intensity-based label-free quantification software tools for proteomic and multi-omic analysis within the galaxy platform. Proteomes 2020;8:15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 155. Matlock MK, Holehouse AS, Naegle KM. ProteomeScout: a repository and analysis resource for post-translational modifications and proteins. Nucleic Acids Res 2015;43:D521–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156. Lindgren CM, Adams DW, Kimball B, Boekweg H, Tayler S, Pugh SL, et al. Simplified and unified access to cancer proteogenomic data. J Proteome Res 2021;20:1902–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 157. Mani DR, Maynard M, Kothadia R, Krug K, Christianson KE, Heiman D, et al. PANOPLY: a cloud-based platform for automated and reproducible proteogenomic data analysis. Nat Methods 2021;18:580–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 158. Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, et al. A galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023;20:251–66. [DOI] [PubMed] [Google Scholar]
- 159. Miller RM, Millikin RJ, Rolfs Z, Shortreed MR, Smith LM. Enhanced proteomic data analysis with MetaMorpheus. Methods Mol Biol 2023;2426:35–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160. Krug K, Mertins P, Zhang B, Hornbeck P, Raju R, Ahmad R, et al. A curated resource for phosphosite-specific signature analysis. Mol Cell Proteomics 2019;18:576–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161. Needham EJ, Parker BL, Burykin T, James DE, Humphrey SJ. Illuminating the dark phosphoproteome. Sci Signal 2019;12:eaau8645. [DOI] [PubMed] [Google Scholar]
- 162. Webb-Robertson B-JM, Wiberg HK, Matzke MM, Brown JN, Wang J, McDermott JE, et al. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res 2015;14:1993–2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 163. Bramer LM, Irvahn J, Piehowski PD, Rodland KD, Webb-Robertson B-JM. A review of imputation strategies for isobaric labeling-based shotgun proteomics. J Proteome Res 2021;20:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164. Webb-Robertson B-JM, Bramer LM, Jensen JL, Kobold MA, Stratton KG, White AM, et al. P-MartCancer-interactive online software to enable analysis of shotgun cancer proteomic datasets. Cancer Res 2017;77:e47–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 165. Bramer LM, Stratton KG, White AM, Bleeker AH, Kobold MA, Waters KM, et al. P-Mart: interactive analysis of ion abundance global proteomics data. J Proteome Res 2019;18:1426–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166. Crowl S, Jordan BT, Ahmed H, Ma CX, Naegle KM. KSTAR: an algorithm to predict patient-specific kinase activities from phosphoproteomic data. Nat Commun 2022;13:4283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167. Guo G, Papanicolaou M, Demarais NJ, Wang Z, Schey KL, Timpson P, et al. Automated annotation and visualisation of high-resolution spatial proteomic mass spectrometry imaging data using HIT-MAP. Nat Commun 2021;12:3241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 168. Tsai C-F, Wang Y-T, Hsu C-C, Kitata RB, Chu RK, Velickovic M, et al. A streamlined tandem tip-based workflow for sensitive nanoscale phosphoproteomics. Commun Biol 2023;6:70. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 169. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 2015;33:495–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 170. Abdelaal T, Mourragui S, Mahfouz A, Reinders MJT. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res 2020;48:e107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 171. Govek KW, Troisi EC, Miao Z, Aubin RG, Woodhouse S, Camara PG. Single-cell transcriptomic analysis of mIHC images via antigen mapping. Sci Adv 2021;7:eabc5464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172. Zhang D, Deng Y, Kukanja P, Agirre E, Bartosovic M, Dong M, et al. Spatial epigenome-transcriptome co-profiling of Mammalian tissues. Nature 2023;616:113–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 173. Xing F, Yang L. Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: a comprehensive review. IEEE Rev Biomed Eng 2016;9:234–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 174. Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. Deep learning for cellular image analysis. Nat Methods 2019;16:1233–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 175. Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 2022;40:517–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 176. Bell ATF, Mitchell JT, Kiemen AL, Lyman M, Fujikura K, Lee JW, et al. PanIN and CAF transitions in pancreatic carcinogenesis revealed with spatial data integration. Cell Syst 2024;15:753–69.e5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 177. Zhang D, Schroeder A, Yan H, Yang H, Hu J, Lee MYY, et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat Biotechnol 2024;42:1372–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 178. He S, Jin Y, Nazaret A, Shi L, Chen X, Rampersaud S, et al. Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor-immune hubs. Nat Biotechnol 2025;43:223–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 179. Dries R, Zhu Q, Dong R, Eng C-HL, Li H, Liu K, et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol 2021;22:78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 180. Schapiro D, Sokolov A, Yapp C, Chen Y-A, Muhlich JL, Hess J, et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 2022;19:311–5.
- 181. BinTayyash N, Georgaka S, John ST, Ahmed S, Boukouvalas A, Hensman J, et al. Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics 2021;37:3788–95.
- 182. Zhu Q, Shah S, Dries R, Cai L, Yuan G-C. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol 2018;36:1183–90.
- 183. Arnol D, Schapiro D, Bodenmiller B, Saez-Rodriguez J, Stegle O. Modeling cell-cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep 2019;29:202–11.e6.
- 184. Pham D, Tan X, Balderson B, Xu J, Grice LF, Yoon S, et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun 2023;14:7739.
- 185. Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun 2020;11:2084.
- 186. Tanevski J, Flores ROR, Gabor A, Schapiro D, Saez-Rodriguez J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol 2022;23:97.
- 187. Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods 2022;19:567–75.
- 188. Ospina OE, Wilson CM, Soupir AC, Berglund A, Smalley I, Tsai KY, et al. spatialGE: quantification and visualization of the tumor microenvironment heterogeneity using spatial transcriptomics. Bioinformatics 2022;38:2645–7.
- 189. Zhu J, Fan Y, Xiong Y, Wang W, Chen J, Xia Y, et al. Delineating the dynamic evolution from preneoplasia to invasive lung adenocarcinoma by integrating single-cell RNA sequencing and spatial transcriptomics. Exp Mol Med 2022;54:2060–76.
- 190. Gouin KH 3rd, Ing N, Plummer JT, Rosser CJ, Ben Cheikh B, Oh C, et al. An N-Cadherin 2 expressing epithelial cell subpopulation predicts response to surgery, chemotherapy and immunotherapy in bladder cancer. Nat Commun 2021;12:4906.
- 191. Arora R, Cao C, Kumar M, Sinha S, Chanda A, McNeil R, et al. Spatial transcriptomics reveals distinct and conserved tumor core and edge architectures that predict survival and targeted therapy response. Nat Commun 2023;14:5029.
- 192. Ferri-Borgogno S, Zhu Y, Sheng J, Burks JK, Gomez JA, Wong KK, et al. Spatial transcriptomics depict ligand-receptor cross-talk heterogeneity at the tumor-stroma interface in long-term ovarian cancer survivors. Cancer Res 2023;83:1503–16.
- 193. Toki MI, Merritt CR, Wong PF, Smithy JW, Kluger HM, Syrigos KN, et al. High-plex predictive marker discovery for melanoma immunotherapy-treated patients using digital spatial profiling. Clin Cancer Res 2019;25:5503–12.
- 194. Blise KE, Sivagnanam S, Betts CB, Betre K, Kirchberger N, Tate BJ, et al. Machine learning links T-cell function and spatial localization to neoadjuvant immunotherapy and clinical outcome in pancreatic cancer. Cancer Immunol Res 2024;12:544–58.
- 195. Reardon B, Moore ND, Moore NS, Kofman E, AlDubayan SH, Cheung ATM, et al. Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nat Cancer 2021;2:1102–12.
- 196. Krysiak K, Danos AM, Saliba J, McMichael JF, Coffman AC, Kiwala S, et al. CIViCdb 2022: evolution of an open-access cancer variant interpretation knowledgebase. Nucleic Acids Res 2023;51:D1230–41.
- 197. Yim W-W, Yetisgen M, Harris WP, Kwan SW. Natural language processing in oncology: a review. JAMA Oncol 2016;2:797–804.
- 198. Ghosh R, Oak N, Plon SE. Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol 2017;18:225.
- 199. Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019;20:223.
- 200. Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet 2023;55:1512–22.
- 201. Martin FJ, Amode MR, Aneja A, Austine-Orimoloye O, Azov AG, Barnes I, et al. Ensembl 2023. Nucleic Acids Res 2023;51:D933–41.
- 202. Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, et al. Accelerating discovery of functional mutant alleles in cancer. Cancer Discov 2018;8:174–83.
- 203. Gudmundsson S, Singer-Berk M, Watts NA, Phu W, Goodrich JK, Solomonson M, et al. Variant interpretation using population databases: lessons from gnomAD. Hum Mutat 2022;43:1012–30.
- 204. Griffith M, Spies NC, Krysiak K, McMichael JF, Coffman AC, Danos AM, et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet 2017;49:170–4.
- 205. Preston CG, Wright MW, Madhavrao R, Harrison SM, Goldstein JL, Luo X, et al. ClinGen Variant Curation Interface: a variant classification platform for the application of evidence criteria from ACMG/AMP guidelines. Genome Med 2022;14:6.
- 206. Pawliczek P, Patel RY, Ashmore LR, Jackson AR, Bizon C, Nelson T, et al. ClinGen Allele Registry links information about genetic variants. Hum Mutat 2018;39:1690–701.
- 207. Wagner AH, Babb L, Alterovitz G, Baudis M, Brush M, Cameron DL, et al. The GA4GH Variation Representation Specification: a computational framework for variation representation and federated identification. Cell Genom 2021;1:100027.
- 208. Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res 2021;49:D1207–17.
- 209. Horak P, Griffith M, Danos AM, Pitel BA, Madhavan S, Liu X, et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet Med 2022;24:986–98.
- 210. Li MM, Datto M, Duncavage EJ, Kulkarni S, Lindeman NI, Roy S, et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer: a joint consensus recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn 2017;19:4–23.
- 211. Mardis ER. The impact of next-generation sequencing on cancer genomics: from discovery to clinic. Cold Spring Harb Perspect Med 2019;9:a036269.
- 212. Gibbs SN, Peneva D, Cuyun Carter G, Palomares MR, Thakkar S, Hall DW, et al. Comprehensive review on the clinical impact of next-generation sequencing tests for the management of advanced cancer. JCO Precis Oncol 2023;7:e2200715.
- 213. Klein H, Mazor T, Siegel E, Trukhanov P, Ovalle A, Vecchio Fitz CD, et al. MatchMiner: an open-source platform for cancer precision medicine. NPJ Precis Oncol 2022;6:69.
- 214. Reisle C, Williamson LM, Pleasance E, Davies A, Pellegrini B, Bleile DW, et al. A platform for oncogenomic reporting and interpretation. Nat Commun 2022;13:756.
- 215. Tamborero D, Dienstmann R, Rachid MH, Boekel J, Lopez-Fernandez A, Jonsson M, et al. The Molecular Tumor Board Portal supports clinical decisions and automated reporting for precision oncology. Nat Cancer 2022;3:251–61.
- 216. Dowdell AK, Meng RC, Vita A, Bapat B, Hanes D, Chang S-C, et al. Widespread adoption of precision anticancer therapies after implementation of pathologist-directed comprehensive genomic profiling across a large US health system. JCO Oncol Pract 2024;20:1523–32.
- 217. Dias-Santagata D, Heist RS, Bard AZ, da Silva AFL, Dagogo-Jack I, Nardi V, et al. Implementation and clinical adoption of precision oncology workflows across a healthcare network. Oncologist 2022;27:930–9.
- 218. Horgan D, Hamdi Y, Lal JA, Nyawira T, Meyer S, Kondji D, et al. Framework for adoption of next-generation sequencing (NGS) globally in the oncology area. Healthcare (Basel) 2023;11:431.
- 219. Ghazi Vakili M, Gorgulla C, Snider J, Nigam A, Bezrukov D, Varoli D, et al. Quantum-computing-enhanced algorithm unveils potential KRAS inhibitors. Nat Biotechnol 2025 Jan 22. [Epub ahead of print].
- 220. Lang F, Schrörs B, Löwer M, Türeci Ö, Sahin U. Identification of neoantigens for individualized therapeutic cancer vaccines. Nat Rev Drug Discov 2022;21:261–82.
- 221. Xie N, Shen G, Gao W, Huang Z, Huang C, Fu L. Neoantigens: promising targets for cancer therapy. Signal Transduct Target Ther 2023;8:9.
- 222. Wang P, Chen Y, Wang C. Beyond tumor mutation burden: tumor neoantigen burden as a biomarker for immunotherapy and other types of therapy. Front Oncol 2021;11:672677.
- 223. Fotakis G, Trajanoski Z, Rieder D. Computational cancer neoantigen prediction: current status and recent advances. Immunooncol Technol 2021;12:100052.
- 224. Vensko SP, Olsen K, Bortone D, Smith CC, Chai S, Beckabir W, et al. LENS: landscape of effective neoantigens software. Bioinformatics 2023;39:btad322.
- 225. Haas BJ. Cancer transcriptome analysis toolkit wiki. GitHub; 2023.
- 226. Peng K, Nowicki TS, Campbell K, Vahed M, Peng D, Meng Y, et al. Rigorous benchmarking of T-cell receptor repertoire profiling methods for cancer RNA sequencing. Brief Bioinform 2023;24:bbad220.
- 227. Liu CC, Steen CB, Newman AM. Computational approaches for characterizing the tumor immune microenvironment. Immunology 2019;158:70–84.
- 228. Giudicelli V, Duroux P, Rollin M, Aouinti S, Folch G, Jabado-Michaloud J, et al. IMGT® immunoinformatics tools for standardized V-DOMAIN analysis. Methods Mol Biol 2022;2453:477–531.
- 229. Barker DJ, Maccari G, Georgiou X, Cooper MA, Flicek P, Robinson J, et al. The IPD-IMGT/HLA database. Nucleic Acids Res 2023;51:D1053–60.
- 230. Koşaloğlu-Yalçın Z, Blazeska N, Vita R, Carter H, Nielsen M, Schoenberger S, et al. The cancer epitope database and analysis resource (CEDAR). Nucleic Acids Res 2023;51:D845–52.
- 231. Bagaev DV, Vroomans RMA, Samir J, Stervbo U, Rius C, Dolton G, et al. VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Res 2020;48:D1057–62.
- 232. Medina JE, Dracopoli NC, Bach PB, Lau A, Scharpf RB, Meijer GA, et al. Cell-free DNA approaches for cancer early detection and interception. J Immunother Cancer 2023;11:e006013.
- 233. Doebley A-L, Ko M, Liao H, Cruikshank AE, Santos K, Kikawa C, et al. A framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA. Nat Commun 2022;13:7475.
- 234. Semenkovich NP, Szymanski JJ, Earland N, Chauhan PS, Pellini B, Chaudhuri AA. Genomic approaches to cancer and minimal residual disease detection using circulating tumor DNA. J Immunother Cancer 2023;11:e006284.
- 235. Ganesamoorthy D, Robertson AJ, Chen W, Hall MB, Cao MD, Ferguson K, et al. Whole genome deep sequencing analysis of cell-free DNA in samples with low tumour content. BMC Cancer 2022;22:85.
- 236. Jongbloed EM, Jansen MPHM, de Weerd V, Helmijr JA, Beaufort CM, Reinders MJT, et al. Machine learning-based somatic variant calling in cell-free DNA of metastatic breast cancer patients using large NGS panels. Sci Rep 2023;13:10424.
- 237. Kim J, Hong S-P, Lee S, Lee W, Lee D, Kim R, et al. Multidimensional fragmentomic profiling of cell-free DNA released from patient-derived organoids. Hum Genomics 2023;17:96.
- 238. Moser T, Kühberger S, Lazzeri I, Vlachos G, Heitzer E. Bridging biological cfDNA features and machine learning approaches. Trends Genet 2023;39:285–307.
- 239. Ritch EJ, Herberts C, Warner EW, Ng SWS, Kwan EM, Bacon JVW, et al. A generalizable machine learning framework for classifying DNA repair defects using ctDNA exomes. NPJ Precis Oncol 2023;7:27.
- 240. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV; CCGA Consortium. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol 2020;31:745–59.
- 241. Liang N, Li B, Jia Z, Wang C, Wu P, Zheng T, et al. Ultrasensitive detection of circulating tumour DNA via deep methylation sequencing aided by machine learning. Nat Biomed Eng 2021;5:586–99.
- 242. Kilgour E, Rothwell DG, Brady G, Dive C. Liquid biopsy-based biomarkers of treatment response and resistance. Cancer Cell 2020;37:485–95.
- 243. Christensen E, Birkenkamp-Demtröder K, Sethi H, Shchegrova S, Salari R, Nordentoft I, et al. Early detection of metastatic relapse and monitoring of therapeutic efficacy by ultra-deep sequencing of plasma cell-free DNA in patients with urothelial bladder carcinoma. J Clin Oncol 2019;37:1547–57.
- 244. Kasi PM, Fehringer G, Taniguchi H, Starling N, Nakamura Y, Kotani D, et al. Impact of circulating tumor DNA-based detection of molecular residual disease on the conduct and design of clinical trials for solid tumors. JCO Precis Oncol 2022;6:e2100181.
- 245. Bharde A, Nadagouda S, Dongare M, Hariramani K, Basavalingegowda M, Haldar S, et al. ctDNA-based liquid biopsy reveals wider mutational profile with therapy resistance and metastasis susceptibility signatures in early-stage breast cancer patients. J Liq Biopsy 2025;7:100284.
- 246. Cohen SA, Liu MC, Aleshin A. Practical recommendations for using ctDNA in clinical decision making. Nature 2023;619:259–68.
- 247. Liu S-C. Circulating tumor DNA in liquid biopsy: current diagnostic limitation. World J Gastroenterol 2024;30:2175–8.
- 248. Heil BJ, Hoffman MM, Markowetz F, Lee S-I, Greene CS, Hicks SC. Reproducibility standards for machine learning in the life sciences. Nat Methods 2021;18:1132–5.
- 249. Wilkinson MD, Dumontier M, Aalbersberg IJI, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018.
- 250. Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 2019;177:1873–87.e17.
- 251. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80.
- 252. Galaxy Community. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024;52:W83–94.
- 253. Grüning BA, Rasche E, Rebolledo-Jaramillo B, Eberhard C, Houwaart T, Chilton J, et al. Jupyter and Galaxy: easing entry barriers into complex data analyses for biomedical researchers. PLoS Comput Biol 2017;13:e1005425.
- 254. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2012;2:401–4.
- 255. Goldman MJ, Craft B, Hastie M, Repečka K, McDade F, Kamath A, et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 2020;38:675–8.
- 256. Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux J 2014;239:2.
- 257. Karargyris A, Umeton R, Sheller MJ, Aristizabal A, George J, Wuest A, et al. Federated benchmarking of medical artificial intelligence with MedPerf. Nat Mach Intell 2023;5:799–810.
- 258. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014;21:578–82.
- 259. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015;216:574–8.
- 260. Platt R, Brown JS, Robb M, McClellan M, Ball R, Nguyen MD, et al. The FDA sentinel initiative - an evolving national resource. N Engl J Med 2018;379:2091–3.
- 261. Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, et al. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc 2022;29:1642–53.
- 262. Heath AP, Ferretti V, Agrawal S, An M, Angelakos JC, Arya R, et al. The NCI genomic data commons. Nat Genet 2021;53:257–62.
- 263. Osterman TJ, Terry M, Miller RS. Improving cancer data interoperability: the promise of the minimal common oncology data elements (mCODE) initiative. JCO Clin Cancer Inform 2020;4:993–1001.
- 264. Guérin J, Laizet Y, Le Texier V, Chanas L, Rance B, Koeppel F, et al. OSIRIS: a minimum data set for data sharing and interoperability in oncology. JCO Clin Cancer Inform 2021;5:256–65.
- 265. Botsis T, Murray JC, Ghanem P, Balan A, Kernagis A, Hardart K, et al. Precision oncology core data model to support clinical genomics decision making. JCO Clin Cancer Inform 2023;7:e2200108.
- 266. Omiye JA, Gui H, Rezaei SJ, Zou J, Daneshjou R. Large language models in medicine: the potentials and pitfalls: a narrative review. Ann Intern Med 2024;177:210–20.
- 267. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med 2023;29:1930–40.
- 268. Jiang G, Solbrig HR, Prud’hommeaux E, Tao C, Weng C, Chute CG. Quality assurance of cancer study common data elements using a post-coordination approach. AMIA Annu Symp Proc 2015;2015:659–68.
- 269. Tao S, Zeng N, Hands I, Hurt-Mueller J, Durbin EB, Cui L, et al. Web-based interactive mapping from data dictionaries to ontologies, with an application to cancer registry. BMC Med Inform Decis Mak 2020;20(Suppl 10):271.
- 270. Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, et al. recount3: summaries and queries for large-scale RNA-Seq expression and splicing. Genome Biol 2021;22:323.
- 271. Ramos M, Schiffer L, Re A, Azhar R, Basunia A, Rodriguez C, et al. Software for the integration of multiomics experiments in Bioconductor. Cancer Res 2017;77:e39–42.
- 272. Fedorov A, Longabaugh WJR, Pot D, Clunie DA, Pieper S, Aerts HJWL, et al. NCI imaging data commons. Cancer Res 2021;81:4188–93.
- 273. Rozenblatt-Rosen O, Regev A, Oberdoerffer P, Nawy T, Hupalowska A, Rood JE, et al. The human tumor atlas network: charting tumor transitions across space and time at single-cell resolution. Cell 2020;181:236–49.
- 274. Menzel P. Snakemake workflows for long-read bacterial genome assembly and evaluation. GigaByte 2024;2024:gigabyte116.
- 275. Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol 2017;35:316–9.
- 276. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nat Genet 2006;38:500–1.
- 277. Lau JW, Lehnert E, Sethi A, Malhotra R, Kaushik G, Onder Z, et al. The cancer genomics cloud: collaborative, reproducible, and democratized-A new paradigm in large-scale computational research. Cancer Res 2017;77:e3–6.
- 278. Reynolds SM, Miller M, Lee P, Leinonen K, Paquette SM, Rodebaugh Z, et al. The ISB cancer genomics cloud: a flexible cloud-based platform for cancer genomics research. Cancer Res 2017;77:e7–10.
- 279. Reich M, Tabor T, Liefeld T, Thorvaldsdóttir H, Hill B, Tamayo P, et al. The GenePattern notebook environment. Cell Syst 2017;5:149–51.e1.
- 280. Jiménez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, et al. Four simple recommendations to encourage best practices in research software. F1000Res 2017;6:ELIXIR-876.
- 281. Chen RJ, Lu MY, Chen TY, Williamson DFK, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng 2021;5:493–7.
- 282. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 2020;17:147–54.
- 283. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 2013;6:pl1.
- 284. Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, et al. Defining a cancer dependency map. Cell 2017;170:564–76.e16.
- 285. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24–6.
- 286. Diesh C, Stevens GJ, Xie P, De Jesus Martinez T, Hershberg EA, Leung A, et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol 2023;24:74.
- 287. Ryan MC, Stucky M, Wakefield C, Melott JM, Akbani R, Weinstein JN, et al. Interactive Clustered Heat Map Builder: an easy web-based tool for creating sophisticated clustered heat maps. F1000Res 2019;8:ISCB Comm J-1750.
- 288. Zhou Y-H, Sun G. Improve the colorectal cancer diagnosis using gut microbiome data. Front Mol Biosci 2022;9:921945.
- 289. Freitas P, Silva F, Sousa JV, Ferreira RM, Figueiredo C, Pereira T, et al. Machine learning-based approaches for cancer prediction using microbiome data. Sci Rep 2023;13:11821.
- 290. Xu W, Wang T, Wang N, Zhang H, Zha Y, Ji L, et al. Artificial intelligence-enabled microbiome-based diagnosis models for a broad spectrum of cancer types. Brief Bioinform 2023;24:bbad178.
- 291. Hajeebu S, Ngembus NJ, Bandi PS, Panigrahy PK, Heindl S. Machine learning as a tool in investigating the possible role of microbiome in development and treatment of cancer. Cureus 2021;13:e17415.
- 292. National Cancer Institute. Discover iTCR’s connectivity map: a visual diagram depicting informatics tools and the relationships between them. 2021.
- 293. Pratt D, Chen J, Pillich R, Rynkov V, Gary A, Demchak B, et al. NDEx 2.0: a clearinghouse for research on cancer pathways. Cancer Res 2017;77:e58–61.
- 294. Armato SG 3rd, McLennan G, McNitt-Gray MF, Meyer CR, Yankelevitz D, Aberle DR, et al. Lung image database consortium: developing a resource for the medical imaging research community. Radiology 2004;232:739–48.
- 295. Thompson PM, Stein JL, Medland SE, Hibar DP, Vasquez AA, Renteria ME, et al. The ENIGMA Consortium: large-scale collaborative analyses of neuroimaging and genetic data. Brain Imaging Behav 2014;8:153–82.
- 296. Davatzikos C, Barnholtz-Sloan JS, Bakas S, Colen R, Mahajan A, Quintero CB, et al. AI-based prognostic imaging biomarkers for precision neuro-oncology: the ReSPOND consortium. Neuro Oncol 2020;22:886–8.
- 297. Annas GJ. HIPAA regulations - a new era of medical-record privacy? N Engl J Med 2003;348:1486–90.
- 298. Reuter P. The general data protection regulation (GDPR): an EPSU briefing. 2018.
- 299. Pati S, Baid U, Edwards B, Sheller MJ, Foley P, Anthony Reina G, et al. The federated tumor segmentation (FeTS) tool: an open-source solution to further solid tumor research. Phys Med Biol 2022;67:204002.
- 300. Pati S, Thakur S, Hamamcı İE, Baid U, Baheti B, Bhalerao M, et al. GaNDLF: the generally nuanced deep learning framework for scalable end-to-end clinical workflows. Commun Eng 2023;2:23.
- 301. Foley P, Sheller MJ, Edwards B, Pati S, Riviera W, Sharma M, et al. OpenFL: the open federated learning library. Phys Med Biol 2022;67:214001.
- 302. Pati S, Baid U, Edwards B, Sheller M, Wang S-H, Reina GA, et al. Federated learning enables big data for rare cancer boundary detection. Nat Commun 2022;13:7346.
- 303. Pati S, Kumar S, Varma A, Edwards B, Lu C, Qu L, et al. Privacy preservation for federated learning in health care. Patterns (N Y) 2024;5:100974.
- 304. Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 2020;38:672–84.e6.
- 305. Wang J, Wen Y, Zhang Y, Wang Z, Jiang Y, Dai C, et al. An interpretable artificial intelligence framework for designing synthetic lethality-based anti-cancer combination therapies. J Adv Res 2024;65:329–43.
- 306. Liu Y, Han T, Ma S, Zhang J, Yang Y, Tian J, et al. Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology 2023;1:100017.
- 307. Villanueva-Meyer JE, Bakas S, Tiwari P, Lupo JM, Calabrese E, Davatzikos C, et al. Artificial Intelligence for Response Assessment in Neuro Oncology (AI-RANO), part 1: review of current advancements. Lancet Oncol 2024;25:e581–8.
- 308. National Cancer Institute. ITCR training network.
- 309. Savonen C, Wright C, Hoffman AM, Muschelli J, Cox K, Tan FJ, et al. Open-source tools for training resources - OTTR. J Stat Data Sci Educ 2023;31:57–65.



