BioTech. 2025 Jul 29;14(3):58. doi: 10.3390/biotech14030058

Pathway Analysis Interpretation in the Multi-Omic Era

William G Ryan V 1, Smita Sahay 1, John Vergis 1, Corey Weistuch 2, Jarek Meller 3,4,5,6, Robert E McCullumsmith 1,7,*
Editor: Gerald Lushington
PMCID: PMC12372084  PMID: 40843781

Abstract

In bioinformatics, pathway analyses are used to interpret biological data by mapping measured molecules to known pathways to discover their functional processes and relationships. Pathway analysis has become an essential tool for interpreting large-scale omics data, translating complex gene sets into actionable experimental insights. However, issues inherent to pathway databases and misinterpretations of pathway relevance often result in “pathway fails,” where findings, though statistically significant, lack biological applicability. For example, the Tumor Necrosis Factor (TNF) pathway was originally annotated based on its association with observed tumor necrosis, even though TNF is multifunctional across diverse physiological processes in the body. This review broadly evaluates approaches to pathway analysis interpretation, including embedding-based, semantic similarity-based, and network-based methods, to clarify their ideal use cases. Each method is assessed for its strengths, such as high-quality visualizations and ease of use, and for its limitations, including data redundancy and database compatibility challenges. Despite advances in the field, the principle of “garbage in, garbage out” (GIGO) underscores that input quality and method choice are critical for reliable and biologically meaningful results. Methodological standardization, scalability improvements, and integration with diverse data sources remain areas for further development. By providing critical guidance with contextual examples such as TNF, we aim to help researchers match their objectives to the appropriate method. Advancing pathway analysis interpretation will further enhance the utility of pathway analysis, ultimately propelling progress in systems biology and personalized medicine.

Keywords: omics interpretation, pathway analysis, gene ontology, embeddings, semantic similarity

1. Introduction

The development of high-throughput multi-omics technologies, such as genomics, transcriptomics, and proteomics, has revolutionized biological research, enabling the exploration of cellular processes with unprecedented depth and scale [1]. These advances have catalyzed the rise of systems biology, an integrative approach that focuses on the interactions within biological systems rather than isolated molecular components [2]. By combining diverse omics layers, researchers are now driving personalized medicine forward, tailoring treatments to individual biological signatures derived from patient-specific data [3]. In this context, pathway analysis has become an indispensable tool for translating omic datasets into actionable clinical insights by targeting key pathways and mechanisms for further investigation and manipulation [4]. As multi-omics data continue to expand, pathway analyses offer a crucial method for gleaning meaningful biological conclusions from raw data, linking molecular changes to functional outcomes across various experimental conditions [5]. This approach advances our understanding of disease pathogenesis and also accelerates the development of targeted therapies [6].

Despite the instrumental insights that pathway analyses provide into biological function, inherent challenges often prevent their effective application. Translating these analyses into actionable experimental targets and candidate treatments remains a persistent bottleneck in omics-driven research [7,8]. Further, deciphering clinically relevant mechanisms is difficult given the unpredictable interactions within biological systems [9,10,11,12,13]. Analyses often generate extensive lists of results that, while promising, are difficult to curate manually, increasing the risk of bias during pathway selection [14,15,16]. Moreover, even when results appear promising, validation is frequently hindered by variability among experimental models [17,18,19] and by a potential mismatch between statistical confidence and biological significance [20,21].

Another key challenge is that new and increasingly complex biological data types, such as kinome array data, continually outpace the methods available to analyze them, leaving researchers without techniques to translate their findings in meaningful ways [22,23]. The diversity of available tools further complicates this process: different algorithms and databases may yield inconsistent results, often making it impractical to harmonize their outputs [24]. Thus, researchers carry the responsibility of evaluating the accuracy and practical utility of selected tools to ensure proper integration of multi-omics data and sensible outputs [25,26]. One example is the correct prioritization and validation of selected pathways for targeted therapies [27]. Misinterpretation or incomplete analysis of pathway data may lead to erroneous biological conclusions, undermining the reliability of personalized treatments [28].

Here, we address a critical gap by focusing on the interpretation of pathway analysis results, a focus distinct from previous work [29,30,31,32], which primarily examines different methods for performing pathway analyses. We evaluate the strengths, limitations, and practical applicability of various pathway analysis interpretation methods, offering guidelines to optimize their use across diverse research settings. Ultimately, we argue that the future of pathway analysis lies in enhancing automation, scalability, and the development of tools that produce interpretable outputs, bridging the gap between computational predictions and experimental validation.

2. Key Challenges to Pathway Analysis

2.1. Pathway Annotation

One of the primary issues in pathway annotation arises from how pathways were first identified and named [33]. Pathway names typically reflect the initial experimental conditions in which they were discovered, rather than encompassing their broader roles. A notable example of this is the Tumor Necrosis Factor (TNF) pathway. Despite its name, TNF is not solely a mechanism for tumor necrosis. Early researchers linked TNF with tumor suppression, as necrosis of tumors was observed in vivo under specific pathological experimental conditions [34]. However, subsequent studies revealed TNF as a multipotent cytokine involved in numerous physiological processes, including the innate immune response, inflammation, and apoptosis across many different tissues [35]. This represents an example of a domain-specific anchor bias [36], wherein the initial characterization of a pathway became a fixed reference point influencing subsequent perspectives. With TNF, the original association with tumor necrosis has anchored its perception, overshadowing its other roles in normal physiology. Highlighting this concern, the so-called “TNF pathway” also mediates NMDA receptor activity in neurons and glial cells [37]. Such semantic mismatches obscure a pathway’s true biological functions, perpetuating narrow interpretations and hindering a comprehensive understanding of pathways beyond the conditions in which they were originally discovered [38,39].

Interpreting function is also highly context-dependent, requiring careful consideration of experimental design and biological domain. For instance, in cancer research, enrichment of apoptosis pathways is typically interpreted as programmed cell death, the mechanism for eliminating cancerous cells [40]. However, in the brain, similar activation likely indicates synaptic pruning or neurite retraction, which are critical for neurodevelopment and synaptic plasticity [41,42,43]. Similarly, inflammation activated during injury or infection might signal tissue damage and repair processes [44,45], whereas in the brain, it may reflect immune activation in response to neuroinflammation or other neural stimuli [46,47]. A notable example of this context dependence is the NF-κB pathway, which has distinct canonical and non-canonical activation mechanisms. The canonical NF-κB pathway is typically associated with acute inflammatory responses and innate immunity [48], while the non-canonical pathway governs processes like lymphoid organ development and adaptive immune signaling [49]. Misinterpreting which branch is activated could lead to flawed conclusions, such as conflating an immune response with developmental signaling or vice versa. Without accounting for the biological context, researchers may draw incorrect conclusions about pathways’ roles, potentially leading to flawed experimental designs or misdirected therapeutic strategies.

Bias, redundancy, and overlap in pathway annotation databases also present significant challenges for interpreting enrichment results [47]. Databases like Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and others commonly describe similar biological functions, each using slightly different gene sets or interaction details [50]. Within a single database, a pathway may be annotated as multiple distinct terms that differ only slightly in the genes and regulatory mechanisms involved, complicating interpretation. For example, in GO, “cell adhesion” (GO:0007155) is defined as the attachment of cells to other cells or to the extracellular matrix, while “cell-cell adhesion” (GO:0098609) specifies adhesion between cells only. Similarly, “cell migration” (GO:0016477) refers specifically to directed cell movement, while “cell motility” (GO:0048870) includes directed movement as well as broader forms of movement. These subtle distinctions produce overlapping enrichment results, hindering prioritization of the most relevant pathways [51,52].

Structural differences between annotation databases compound these challenges, obscuring shared biological function and contributing to inconsistencies in pathway coverage. For example, the gene sets labeled “Wnt signaling” in KEGG, Reactome, and WikiPathways diverge substantially: only 73 genes overlap among sets of 148, 312, and 135 genes, respectively, despite annotating the same function [53]. Furthermore, total gene coverage differs widely, with databases like Reactome annotating over 10,000 human genes, while others like WikiPathways cover only about 6000. Study biases further exacerbate this problem, as genes frequently studied in fields like cancer research, where omics is used predominantly, are counterintuitively underrepresented in gene-set annotations [54]. Similarly, GO, originally developed for model organisms, may overemphasize highly conserved cellular processes at the expense of species-specific functions [55]. Even curated GO annotations have reported error rates ranging from 18% to 49% [55].
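The scale of this divergence is easiest to see as simple arithmetic. The short Python sketch below uses only the counts quoted above. It assumes, as one reading of [53], that the 73 genes are those shared by all three definitions, and it reports what fraction of each database's gene set falls inside that shared core.

```python
# Gene-set sizes for "Wnt signaling" as quoted in the text [53].
set_sizes = {"KEGG": 148, "Reactome": 312, "WikiPathways": 135}
# Assumption: the 73 overlapping genes are those shared by all three sets.
SHARED = 73

# Fraction of each database's gene set that lies in the shared core
fractions = {db: SHARED / size for db, size in set_sizes.items()}
# Genes in each definition that fall outside the three-way core
outside_core = {db: size - SHARED for db, size in set_sizes.items()}

for db, size in set_sizes.items():
    print(f"{db:>12}: {SHARED}/{size} = {fractions[db]:.1%} in shared core; "
          f"{outside_core[db]} genes outside it")
```

Under this reading, less than a quarter of the Reactome definition lies in the shared core, compared with roughly half for KEGG and WikiPathways. The same input gene list can therefore score quite differently depending on which “Wnt signaling” definition is tested.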

To explore potential biases in annotations, we surveyed the gene sets currently annotated to GO terms (i.e., pathways), which illustrated how these challenges may manifest as patterns of redundancy and bias. We found that certain genes, like transforming growth factor beta 1, are annotated to over 1000 pathways, while others, like chromosome 6 open reading frame 62, are annotated to just 2 (Table 1, Supplemental Table S1). Moreover, a substantial number of genes, including 611 genes coding for protein products, lack any known annotation entirely (Table 2). A researcher investigating the function of any of these unannotated genes (Supplemental Table S2) with GO would therefore find them excluded from the analysis. These disparities result in highly skewed gene coverage, where a small subset of genes dominates the annotations (Figure 1A). Similarly, a handful of highly annotated pathways account for most known gene–pathway associations (Figure 1B). These patterns illustrate how redundancy and bias play out in practice, obscuring the biological significance of underrepresented genes and hindering researchers’ prioritization of relevant results. Efforts to mitigate these issues, such as set theory-based approaches to reduce overlap, have shown promise but require further refinement to ensure comprehensive and equitable representation within annotation databases [56]. Awareness of these domain-specific challenges is essential to improve the clarity and utility of pathway annotations.
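The survey logic behind Table 1 can be sketched in a few lines. This minimal Python example uses hypothetical gene and term names. It counts how many terms each gene is annotated to, lists the genes in a reference universe that have no annotations, and picks the two genes nearest each value of the five-number summary. The quartiles come from `statistics.quantiles` with `method="inclusive"`. That is an approximation of Tukey's hinges and is not necessarily the exact procedure used for Table 1.

```python
import statistics
from collections import Counter

# Hypothetical toy annotations (term -> genes); a real survey would parse a
# GO gene-set file such as the GOALL w/IEA collection used for Table 1.
term_genes = {
    "term_1": {"GENE_A", "GENE_B", "GENE_C"},
    "term_2": {"GENE_A", "GENE_B"},
    "term_3": {"GENE_A", "GENE_D"},
    "term_4": {"GENE_A"},
}
# Hypothetical gene universe (e.g., all HGNC symbols)
universe = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}

# Number of terms (pathways) each gene is annotated to
counts = Counter(gene for genes in term_genes.values() for gene in genes)
# Genes with zero annotations are silently dropped by GO-based analyses
unannotated = sorted(universe - set(counts))

def five_number_summary(values):
    """Min, Q1, median, Q3, max; quartiles via inclusive interpolation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def nearest_genes(target, k=2):
    """The k annotated genes whose counts are closest to target (ties broken by name)."""
    return sorted(counts, key=lambda g: (abs(counts[g] - target), g))[:k]

labels = ("min", "Q1", "median", "Q3", "max")
summary = five_number_summary(list(counts.values()))
representatives = {label: nearest_genes(v) for label, v in zip(labels, summary)}

print("unannotated:", unannotated)
for label, value in zip(labels, summary):
    print(f"{label:>6} = {value}: {representatives[label]}")
```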

Table 1.

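Representative genes for Tukey’s five-number summary (minimum, lower quartile, median, upper quartile, maximum) of per-gene gene ontology annotation counts; the two genes nearest each summary value are shown (GOALL w/IEA, Bader Lab, October 2024). The complete list is available in the Supplementary Material (Table S1).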

Gene symbol | Gene name | # of pathways
TGFB1 | transforming growth factor beta 1 | 1010
CTNNB1 | catenin beta 1 | 894
ACADL | acyl-CoA dehydrogenase long chain | 120
ACTBL2 | actin beta like 2 | 120
ABCA6 | ATP binding cassette subfamily A member 6 | 72
ACKR1 | atypical chemokine receptor 1 (Duffy blood group) | 72
ABCF3 | ATP binding cassette subfamily F member 3 | 44
ADISSP | adipose secreted signaling protein | 44
C6orf62 | chromosome 6 open reading frame 62 | 2
CTAGE3P | CTAGE family member 3, pseudogene | 2

Figure 1.


Gene ontology gene-term annotations (GOALL w/IEA, Bader Lab, October 2024). (A) Histogram of the number of pathways (i.e., GO terms) annotated to each gene. (B) Bar plot showing representative terms in each GO subontology, selected by percentile ranking all terms by their cumulative gene annotations (bias score) and applying Tukey’s five-number summary. BP: biological process; CC: cellular component; MF: molecular function.

Table 2.

Locus types of all HGNC genes with zero gene ontology annotations (GOALL w/IEA, Bader Lab, October 2024). HGNC: HUGO Gene Nomenclature Committee.

Locus Type Count
pseudogene 13,940
RNA, long non-coding 5640
RNA, micro 1912
gene with protein product 611
RNA, transfer 591
RNA, small nucleolar 568
immunoglobulin pseudogene 202
readthrough 143
RNA, cluster 119
fragile site 116
endogenous retrovirus 92
T cell receptor gene 67
RNA, ribosomal 58
immunoglobulin gene 55
RNA, small nuclear 51
region 46
unknown 46
T cell receptor pseudogene 38
RNA, misc 29
virus integration site 8
complex locus constituent 6
RNA, vault 4
RNA, Y 4

Table 3.

Overview of pathway analysis interpretation tools and their features. This table lists various tools categorized by their method of analysis (e.g., semantic, network, embedding), detailing their year of release, description, access platform, supported databases, and visualization capabilities. This comparison serves as a comprehensive resource for researchers to identify tools best suited for their specific pathway analysis needs.

Tool | Year | Method | Access | Database | Visualization | Description
REVIGO [57] | 2011 | Semantic | Web | GO | Scatterplots, interactive graph, tree maps | Summarizes GO term lists using semantic similarity and clustering
clusterProfiler [58] | 2013 | Semantic | R package | GO, KEGG, DO | Dot plot | Enrichment analysis for GO/KEGG terms and visualization
ReCiPa [52] | 2018 | Semantic | R package | KEGG, Reactome | Data tables | Controls redundancy in pathway databases
GOGO [59] | 2018 | Semantic | Web, Perl | GO | Data tables | Calculates semantic similarity of GO terms using improved algorithms
FunSet [60] | 2019 | Semantic | Web, standalone | GO | 2D plots | Performs GO enrichment analysis with interactive visualizations
GeneSetCluster [61] | 2020 | Semantic | R package | Any | Network graph, dendrogram, heatmap | Groups gene sets post-analysis based on shared genes
GOMCL [62] | 2020 | Semantic | Python | GO | Heatmap, network graph | Clusters GO terms using the Markov clustering algorithm
GOSemSim [63] | 2020 | Semantic | R package | GO | Data tables | Computes semantic similarity among GO terms for comparison
GO-FIGURE! [15] | 2021 | Semantic | Python | GO | Scatterplot | Visualizes GO term similarity with custom scatterplots
simplifyEnrichment [64] | 2022 | Semantic | R package | GO | Heatmap | Clusters terms with a unique binary cut algorithm
iDEP [65] | 2018 | Semantic | Web, R package | GO, KEGG, Reactome | Tree, heatmap, network graph | Web app for transcriptomics and pathway exploration
DAVID [66] | 2009 | Semantic | Web, REST API | KEGG, Any | Tabular, bar chart, network graph | Enrichment analysis with functional annotation clustering
g:Profiler [67] | 2007 | Semantic | Web, R package | KEGG, Reactome, WikiPathways, Any | Dot plot, tabular, network graph | Orthology-aware enrichment analysis and clustering
RICHNET [68] | 2019 | Network | R protocol | MSigDB | Network graph | Automated gene-set network creation
EnrichmentMap [69] | 2019 | Network | Cytoscape | Any | Interactive network | Detailed enrichment mapping
GScluster [70] | 2019 | Network | Web, R package | MSigDB | Interactive network | Network-weighted gene-set clustering integrating PPI data
aPEAR [71] | 2019 | Network | R package | Any | Network graph | Clustering with automated naming
GeneFEAST [72] | 2023 | Network | Web, Python | Any | Heatmap, dot plot, UpSet plot | Highlights multi-enrichment genes
vissE [73,74] | 2023 | Network | R package | MSigDB, Any | Network graph | Visualizes higher-order interactions
pathlinkR [75] | 2024 | Network | R package | Reactome, MSigDB, InnateDB | Network graph, volcano plot, dot plot | Integrated PPI network construction
Pathview [76] | 2017 | Network | Web, R package, REST API | KEGG | Network graph | Visualizes and maps data onto KEGG pathways
PAVER [77] | 2024 | Embedding | Web, R package | Any | UMAP, heatmap, dot plot | Embedding-based clustering with UMAP for clear pathway visualization
MondrianMap [78] | 2024 | Embedding | Python | WikiPathways | Mondrian map | Embedding visualizations highlighting pathway interactions and crosstalk
GOsummaries [79] | 2015 | Word cloud | R package | GO | PCA, boxplot | Visualizes GO analyses as word clouds and overlays results
genesetSV [80] | 2023 | Game theory | Python | KEGG, MSigDB | Scatterplot | Uses Shapley values for ranking and reducing pathway sets
Archetype-Discovery [81] | 2024 | Non-negative matrix factorization | MATLAB | MSigDB, Any | Radar, scatter- and boxplot, heatmap | Derives compact archetypal gene-set patterns and their pathway associations

2.2. Visualizing Pathway Findings

Another critical challenge in pathway analysis is effectively visualizing high-dimensional data, which is essential for interpreting results and communicating findings [7,10]. The complexity of multi-omics datasets often makes it difficult to illustrate informative relationships within the data [25]. Without specialized computational expertise, this complexity may hinder an experimental biologist’s ability to derive meaningful conclusions [82]. Visualization tools must therefore balance simplifying data for clarity against retaining the detail necessary for accurate interpretation [83]. Visualization is further complicated by the redundancy described above, which produces visual clutter that obscures key findings. Heatmaps, Uniform Manifold Approximation and Projection (UMAP) plots, and network-based plots address these challenges by offering more intuitive representations. However, each of these approaches has limitations, such as artifacts introduced by dimensionality reduction or network overcrowding and sparsity [69,84]. The lack of standardized, widely accepted visualization practices exacerbates this challenge, leaving researchers with fragmented outputs. Thus, developing more effective and user-friendly visualization strategies remains a key priority for advancing pathway analysis and making it accessible to a broader research audience.

2.3. Limitations to Pathway Analysis Utility

Despite the potential of pathway analysis to generate new insights, it often falls short of providing actionable leads, yielding uninformative results. In studies using gene expression data, analyses may report canonical pathways, such as “immune function,” as enriched even when they are biologically irrelevant to the experimental conditions [85]. For example, in an RNAseq study of postmortem dorsolateral prefrontal cortex (DLPFC) tissue exploring gene co-expression networks related to schizophrenia [86], eye development was identified within astrocyte modules. Such results offer no readily apparent insight into schizophrenia-specific biological mechanisms. Similarly, pathways associated with learning and memory, highlighted in neuron modules, were difficult to interpret because their relevance to schizophrenia risk and clinical state is extremely vague. Such “pathway fails” are not uncommon. For example, the kynurenine pathway’s role in psychiatric conditions such as schizophrenia and major depressive disorder is well characterized [87]. However, the precise mechanisms by which it contributes to changes in cognitive function remain poorly understood. This lack of clarity complicates the development of targeted therapeutic interventions, exemplifying how even well-studied pathways can yield results that are uninformative for real-world translational applications. These instances highlight a common pitfall in pathway analysis: statistical significance does not necessarily equate to biological relevance [20,21].
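A one-sided hypergeometric over-representation test illustrates the last point. This is a common formulation of pathway enrichment and is not tied to any particular tool discussed here. In the hypothetical scenario below, 2000 of 20,000 genes are differentially expressed. A broad 2000-gene pathway with a modest 1.2-fold enrichment receives a much smaller p-value than a small 20-gene pathway with a 2.5-fold enrichment. The difference arises because larger gene sets give the test more statistical power.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def ora_p_value(overlap, universe, pathway_size, list_size):
    """One-sided hypergeometric p-value P(X >= overlap)."""
    log_total = log_comb(universe, list_size)
    upper = min(pathway_size, list_size)
    return sum(
        exp(log_comb(pathway_size, i)
            + log_comb(universe - pathway_size, list_size - i)
            - log_total)
        for i in range(overlap, upper + 1)
    )

def fold_enrichment(overlap, universe, pathway_size, list_size):
    """Observed overlap divided by the overlap expected by chance."""
    expected = list_size * pathway_size / universe
    return overlap / expected

UNIVERSE, DE_GENES = 20_000, 2_000  # hypothetical experiment

# Broad, generic pathway (hypothetical): 240 hits where 200 are expected
p_large = ora_p_value(240, UNIVERSE, 2_000, DE_GENES)
fold_large = fold_enrichment(240, UNIVERSE, 2_000, DE_GENES)

# Small, specific pathway (hypothetical): 5 hits where 2 are expected
p_small = ora_p_value(5, UNIVERSE, 20, DE_GENES)
fold_small = fold_enrichment(5, UNIVERSE, 20, DE_GENES)

print(f"broad pathway:    fold={fold_large:.2f}, p={p_large:.2g}")
print(f"specific pathway: fold={fold_small:.2f}, p={p_small:.2g}")
```

In this sketch, significance tracks gene-set size as much as effect size. That is one reason large, generic pathways can dominate ranked result lists without being the most biologically informative.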

However, not all analyses fall short; there are notable examples of “pathway successes” that have led to significant clinical advances. For nearly two-thirds of recent FDA-approved drugs, the gene or protein targets had been shown a priori to have a significant phenotypic association with the targeted disease [88]. Anifrolumab, an IFNAR1 antagonist approved for systemic lupus erythematosus (SLE), has a target that lacks direct association with SLE. However, a pathway association study showed that variants in TYK2, a kinase that physically interacts with IFNAR1, play a pathogenic role acting via the TYK2/JAK pathways [89,90]. Similarly, PCSK9, a target for hyperlipidemia therapies, was identified through pathway analysis despite lacking a direct genetic association with the disease; instead, it was implicated via protein interaction networks whose interacting nodes were enriched for hyperlipidemia-related pathways [91]. These successes underscore the potential of pathway analysis to yield actionable insights, even as challenges persist in ensuring biological relevance and clinical utility. Selecting appropriate analysis methods and interpreting results within the appropriate biological context is crucial to avoid uninformative or misleading conclusions.

2.4. Discrepancies in Molecular Biology Mislead Validation

Traditionally, mRNA (i.e., gene) expression levels and protein abundance have been relied on as proxies for biological activity [92,93,94]. While mRNA levels are often used as convenient proxies for protein expression, substantial evidence shows that mRNA and protein abundances are not well correlated [95,96]. Moreover, protein abundance does not necessarily reflect functional activity because of factors such as post-translational modifications, protein–protein interactions, and subcellular localization [97,98]. These discrepancies present significant challenges for pathway analyses based solely on transcriptomics or proteomics data, as such data may not fully reflect the functional states of biological pathways. Researchers unaware of these limitations risk drawing incomplete conclusions about cellular processes and signaling [99]. To address this challenge, there has been a shift toward functionally informed methods [100,101,102,103], such as phosphoproteomics and kinome reporter phosphopeptide arrays. These methods provide a functional view of the genes and pathways being studied, rather than simply showing whether there is too much or too little of a gene product [104]. By capturing the dynamic nature of biological systems, these approaches may mitigate the inherent pitfalls of traditional omics-driven pathway predictions and enhance the overall reliability of pathway analyses [105].

Resolving cell subtype context also presents unique challenges, as scRNAseq datasets are dominated by dropout events that create sparse expression matrices and obscure true signals [106]. Limited mRNA detection from rare cell populations also introduces selection bias into downstream analyses [107]. Enrichment tools originally developed for bulk RNAseq data therefore risk misclassifying dropouts as biological absence or inflating significance [108]. Although imputation strategies can fill in dropouts, they may amplify technical variation and wash out genuine biological heterogeneity [109]. Consequently, dropout-aware methods are more reliable than traditional approaches for single-cell data [110,111]. Even so, these purpose-built methods require careful consideration of input data quality, outliers, and pathway complexity [112,113]. Collectively, these challenges highlight the need for next-generation, single-cell-specific pathway tools that explicitly model technical noise, accommodate rare cells, and exploit multi-omic integration to minimize “pathway fails” [114].

2.5. Research Data Mismanagement

While effective management of datasets is a cornerstone of rigorous and reproducible science, it remains fraught with challenges. Data-management errors, such as coding mistakes and ambiguous documentation, are prevalent, undermine scientific reliability, and have led to numerous article retractions [115]. These errors, which may occur at any stage of the research workflow, waste resources and highlight the vulnerability of current systems to human fallibility [116]. Retractions due to honest mistakes have also caused immense personal stress and professional consequences for researchers, prompting calls to improve data workflows and prevent similar errors in the future [116]. Emerging frameworks like the FAIR (Findable, Accessible, Interoperable, Reusable) data principles aim to address these issues by promoting robust, scalable, and automatable systems that ensure the interoperability, standardization, and accessibility of data [117]. By integrating multi-omics datasets into unified frameworks, these principles can enhance the efficiency and reliability of pathway analysis at scale [7,118,119]. As these frameworks and tools evolve, their capacity to deliver high-confidence outputs that align with real-world applications becomes increasingly important.

3. Methods for Pathway Analysis Interpretation

3.1. Semantic Similarity Based Methods

Semantic similarity-based methods are a foundational approach in pathway analysis interpretation, designed to quantify the functional relationships among complex annotation terms. For example, they are commonly applied to interpret the extensive and often redundant lists of GO terms produced by enrichment analyses [120]. These methods leverage the hierarchical structure of GO, in which terms are organized into parent–child relationships that range from general biological processes to progressively more specific functions [121]. By calculating a “semantic similarity” score between terms, these methods identify and group related GO terms [57], reducing redundancy and enabling researchers to focus on overarching biological themes.

The calculation of semantic similarity typically relies on information content (IC) metrics evaluated at the ancestors that two terms share within the GO hierarchy, reflecting the functional overlap between the terms [122]. For instance, terms that share a highly specific ancestor within the GO tree yield higher similarity scores because of their closer functional relationship. Popular metrics include Resnik’s and Lin’s similarity measures, which quantify similarity by evaluating the specificity of shared ancestors and their positions within the GO structure [59]. Tools like clusterProfiler implement these methods by grouping semantically similar GO terms into clusters. Representative terms are selected based on similarity scores and user-defined thresholds, with results visualized through scatter plots or dot plots [58]. This grouping provides a summarized, non-redundant list of terms, making large result sets more feasible to interpret.
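The sketch below makes the Resnik and Lin measures concrete. It computes them on a hypothetical six-term hierarchy that borrows the adhesion and motility terms discussed earlier; the annotation counts are invented. Information content follows the usual definition, IC(t) = −log p(t), where p(t) is the fraction of annotations falling on t or any of its descendants. Resnik similarity is the IC of the most informative common ancestor. Lin similarity divides twice that value by the sum of the two terms' own IC. Production tools compute IC from full GO annotation corpora and handle many more edge cases than this sketch.

```python
import math

# Hypothetical toy GO-like hierarchy: term -> set of parent terms (is_a edges)
parents = {
    "biological_process": set(),
    "cell adhesion": {"biological_process"},
    "cell-cell adhesion": {"cell adhesion"},
    "cell-matrix adhesion": {"cell adhesion"},
    "cell motility": {"biological_process"},
    "cell migration": {"cell motility"},
}
# Hypothetical numbers of genes annotated directly to each term
direct = {
    "biological_process": 0,
    "cell adhesion": 10,
    "cell-cell adhesion": 30,
    "cell-matrix adhesion": 20,
    "cell motility": 15,
    "cell migration": 25,
}

def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    found, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# True-path rule: a term's count includes annotations to all its descendants
propagated = {t: sum(n for d, n in direct.items() if t in ancestors(d)) for t in parents}
total = propagated["biological_process"]  # the root covers every annotation
# Information content: IC(t) = -log p(t) = log(total / count(t))
ic = {t: math.log(total / propagated[t]) for t in parents}

def resnik(a, b):
    """IC of the most informative common ancestor (MICA)."""
    return max(ic[t] for t in ancestors(a) & ancestors(b))

def lin(a, b):
    """Resnik similarity normalized by the two terms' own IC (0 to 1)."""
    denom = ic[a] + ic[b]
    return 2 * resnik(a, b) / denom if denom > 0 else 1.0

for a, b in [("cell-cell adhesion", "cell-matrix adhesion"),
             ("cell-cell adhesion", "cell migration")]:
    print(f"{a} vs {b}: Resnik={resnik(a, b):.3f}, Lin={lin(a, b):.3f}")
```

The two adhesion subtypes score higher than the adhesion/migration pair. The only ancestor shared by the latter pair is the uninformative root, whose IC is zero.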

However, these methods are inherently tied to the GO framework and can fail to generalize to other databases like KEGG, Reactome, or WikiPathways [121,123]. Additionally, some studies suggest that these similarity scores may not fully capture the nuanced meanings of GO terms, highlighting an area for further refinement [124]. While the combination of functional clustering and visualization described here offers a powerful means of simplifying GO term analysis, new developments are needed to broaden applicability and enhance interpretive precision.

3.2. Network-Based Methods

Network-based methods provide a powerful approach for pathway analysis by visualizing interconnected networks, where pathways are represented as nodes, and edges denote their shared genes or similar functional annotations [75]. These methods rely on graph theory principles to quantify relationships between pathways based on shared content, using metrics like gene overlap, Jaccard index, or semantic similarity [68]. This network structure captures relationships between pathways, identifies related terms, and reveals broad biological themes that may be obscured in list-based approaches.
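A minimal sketch of the first step, building a pathway network from gene overlap, is shown below. It uses hypothetical pathways and the Jaccard index named above, which is the number of shared genes divided by the number of genes in either set. Pairs at or above a cutoff become weighted edges. The 0.25 cutoff is an arbitrary choice made for illustration.

```python
from itertools import combinations

# Hypothetical enriched pathways and their member genes
pathways = {
    "P1": {"A", "B", "C", "D"},
    "P2": {"B", "C", "D", "E"},
    "P3": {"X", "Y", "Z"},
    "P4": {"C", "D", "X"},
}

def jaccard(a, b):
    """Shared genes divided by all genes present in either set."""
    return len(a & b) / len(a | b)

def pathway_edges(sets, cutoff):
    """Weighted edges (name1, name2, Jaccard) for pairs at or above cutoff."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(sorted(sets.items()), 2):
        weight = jaccard(s1, s2)
        if weight >= cutoff:
            edges.append((n1, n2, weight))
    return edges

edges = pathway_edges(pathways, cutoff=0.25)  # illustrative cutoff only
for n1, n2, w in edges:
    print(f"{n1} -- {n2}  Jaccard={w:.2f}")
```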

Clustering algorithms, such as modularity-based or hierarchical clustering, group these pathways into distinct modules based on their connectivity, reducing redundancy and highlighting cohesive functional groups [70]. In pathway networks, nodes with above-average numbers of connections, or “hubs,” represent key biological functions that interact within communities of multiple pathways, potentially indicating regulatory roles [72]. Visualization techniques place related pathways close together, with edge weights reflecting the strength of their relationships, emphasizing direct and indirect associations across broader biological processes [71].
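The module and hub ideas above can be sketched on a small hypothetical pathway graph. Connected components serve here as a deliberately simple stand-in for the modularity-based or hierarchical clustering that real tools use. Hubs follow the definition given in the text: nodes whose degree exceeds the network average. All node names and edges are invented.

```python
from collections import defaultdict, deque

# Hypothetical pathway network (e.g., after thresholding pathway similarity)
nodes = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
edges = [("P1", "P2"), ("P1", "P4"), ("P2", "P4"), ("P4", "P5"), ("P6", "P7")]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

def connected_modules(nodes, adjacency):
    """Group nodes into connected components via breadth-first search."""
    seen, modules = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        module, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            module.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(module)
    return modules

modules = connected_modules(nodes, adjacency)
degree = {n: len(adjacency[n]) for n in nodes}
mean_degree = sum(degree.values()) / len(nodes)
# "Hubs": nodes with above-average connectivity, as defined in the text
hubs = [n for n in nodes if degree[n] > mean_degree]

print("modules:", [sorted(m) for m in modules])
print("hubs:", hubs)
```

Isolated nodes such as P3 form single-pathway modules. On real networks, community-detection or Markov clustering would typically split large connected components further.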

Tools like EnrichmentMap enhance these analyses by constructing pathway networks based on gene overlap, using significance thresholds to reduce visual complexity and emphasize biologically relevant connections [69]. These clustering and visualization approaches allow researchers to render enrichment results as maps that highlight major biological processes and their relationships. Similarly, Visualization of Set Enrichment (vissE) extends the utility of network-based methods by integrating data across modalities, including single-cell and spatial transcriptomics [73,74]. vissE clusters pathways based on their content similarity and links them to specific cellular phenotypes, providing insights into their interactions within complex biological contexts.
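The filter-then-connect idea can be sketched as follows. First, keep only terms that pass a significance threshold. Then link the survivors whose gene sets overlap strongly, measured here by the overlap coefficient (shared genes divided by the size of the smaller set). EnrichmentMap offers gene-overlap metrics of this kind. This sketch mimics only the general idea, not EnrichmentMap's implementation or default settings, and the terms, genes, and both cutoffs are hypothetical.

```python
from itertools import combinations

# Hypothetical enrichment results: term -> (FDR, member genes)
results = {
    "cell adhesion": (0.001, {"A", "B", "C", "D", "E", "F"}),
    "cell-cell adhesion": (0.004, {"A", "B", "C"}),
    "cell migration": (0.02, {"D", "G", "H", "I"}),
    "eye development": (0.30, {"A", "J", "K"}),
}
FDR_CUTOFF = 0.05      # illustrative node filter
OVERLAP_CUTOFF = 0.5   # illustrative edge filter

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))

# Keep only significant terms as network nodes
significant = {t: genes for t, (fdr, genes) in results.items() if fdr <= FDR_CUTOFF}

# Connect significant terms whose gene sets overlap strongly
edges = [
    (t1, t2, overlap_coefficient(significant[t1], significant[t2]))
    for t1, t2 in combinations(sorted(significant), 2)
    if overlap_coefficient(significant[t1], significant[t2]) >= OVERLAP_CUTOFF
]
print(sorted(significant))
print(edges)
```

The only surviving edge links “cell adhesion” to its child term “cell-cell adhesion,” whose genes it fully contains. This is the kind of redundancy that map-based layouts make visible at a glance.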

Together, these approaches offer researchers a dynamic, systems-level framework for interpreting pathway data by revealing functional relationships at the network level, clustering pathways into interpretable communities, and accordingly simplifying large datasets for actionable insights.

3.3. Embedding Based Methods

Embedding-based methods offer a cutting-edge approach to pathway analysis by transforming biological entities, like pathways, into high-dimensional vectors, known as embeddings [125,126]. These embeddings numerically encode the semantic meaning of pathways, capturing complex relationships among genes and pathways within the context of large biomedical datasets. Originating in natural language processing, embeddings are widely used to represent words in hundreds of numerical dimensions, allowing models to mathematically quantify relationships between concepts (e.g., “Proteome − Protein + Kinases = Kinome”) [127]. In biomedical research, this technique has become instrumental for clustering, visualization, and predictive modeling [128].
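The analogy arithmetic above can be reproduced with toy vectors. In the sketch below, the three-dimensional embeddings are invented so that the arithmetic is easy to follow; real models learn hundreds of dimensions from large corpora. The singular “kinase” is used for the vocabulary. The query vector proteome − protein + kinase is compared against every other word by cosine similarity, and the input words are excluded, as is common practice for analogy queries.

```python
import math

# Hypothetical 3-d embeddings; axes loosely mean
# [collection ("-ome"), protein-ness, kinase-ness]
vectors = {
    "protein":  [0.1, 0.9, 0.0],
    "proteome": [0.9, 0.8, 0.1],
    "kinase":   [0.0, 0.8, 0.9],
    "kinome":   [0.8, 0.7, 0.9],
    "genome":   [0.9, 0.1, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

print(analogy("proteome", "protein", "kinase"))
```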

In pathway analysis interpretation, embedding-based methods reduce redundancy and enhance interpretability by grouping pathways based on their semantic similarity [124]. Tools like Pathway Analysis Visualization with Embedding Representations (PAVER) leverage cosine similarity between embeddings to cluster similar pathways and identify a “most representative term” (MRT) for each group based on the average embedding [77,129,130]. This results in streamlined and summarized output, which is especially useful for managing large datasets. PAVER also employs dimensionality reduction techniques, such as UMAP, to convert high-dimensional embeddings into two-dimensional visualizations. These layouts visually group related pathways while preserving semantic relationships, enabling researchers to more easily identify biological themes and generate publication-ready visualizations [131,132].
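A minimal sketch of embedding-based clustering and MRT selection follows. The three-dimensional vectors are hypothetical stand-ins for language-model embeddings. The average-linkage agglomerative clustering with a 0.9 cosine cutoff is an assumption made for illustration; it is not PAVER's actual clustering procedure or parameters. Only the MRT rule follows the description above: the MRT is the member closest to the cluster's average embedding. A 2D layout such as UMAP would be a separate, downstream step.

```python
import math
from itertools import combinations

# Hypothetical low-dimensional term embeddings standing in for model vectors
embeddings = {
    "cell adhesion":        [0.90, 0.10, 0.10],
    "cell-cell adhesion":   [0.90, 0.30, 0.00],
    "cell-matrix adhesion": [0.85, 0.00, 0.30],
    "cell migration":       [0.10, 0.90, 0.10],
    "cell motility":        [0.15, 0.85, 0.05],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_similarity(c1, c2):
    """Mean pairwise cosine similarity between two clusters (average linkage)."""
    total = sum(cosine(embeddings[a], embeddings[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def cluster_terms(threshold):
    """Repeatedly merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[term] for term in embeddings]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]]))
        if average_similarity(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters

def most_representative(cluster):
    """Member whose embedding is most cosine-similar to the cluster's mean embedding."""
    dims = len(embeddings[cluster[0]])
    centroid = [sum(embeddings[t][d] for t in cluster) / len(cluster) for d in range(dims)]
    return max(cluster, key=lambda t: cosine(embeddings[t], centroid))

clusters = cluster_terms(threshold=0.9)  # illustrative threshold
for cluster in clusters:
    print(most_representative(cluster), "<-", sorted(cluster))
```

Choosing a real term as the representative, rather than the centroid itself, keeps the summarized output readable as ordinary pathway names.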

Another tool, MondrianMap, draws inspiration from abstract art to arrange pathways spatially on a grid, where proximity reflects functional similarity and color indicates regulatory state, such as upregulation or downregulation [78]. This visualization facilitates exploration of functional clusters and crosstalk patterns across complex datasets. Going beyond traditional node–edge diagrams and heatmaps, MondrianMap provides an interactive and visually accessible representation of results.

While embedding-based methods are promising, they are limited by the information encoded in annotation descriptions or hierarchical structures [131]. The pre-trained language models that generate these embeddings can capture broad semantic relationships but may lack context-specific details [125]. As these models evolve, embedding-based approaches are likely to become more sophisticated and scalable for interpreting pathway analysis results across diverse datasets.

3.4. Applications of Tools for Pathway Interpretation

The practical utility of exemplar pathway analysis tools is best appreciated through real-world case studies that illustrate how each tool's specific feature set has advanced biological research.

clusterProfiler: clusterProfiler is widely used as a standard tool for pathway enrichment analysis and visualization across many fields. Chen et al. employed clusterProfiler to analyze differentially expressed genes in acute myocardial infarction, finding enrichment of cytokine–cytokine receptor interaction and TNF signaling in GO and KEGG pathway analyses [133]. Jia et al. applied clusterProfiler to distinguish the gene expression profiles of luminal A and basal-like breast cancer subtypes, identifying pathways involved in subtype-specific progression and novel therapeutic targets such as neuromedin U receptor 1, neural cell adhesion molecule 1, and STIL centriolar assembly protein [134]. In neurodegeneration research, Niu et al. used clusterProfiler to study differentially expressed genes at different stages of Alzheimer’s disease, uncovering roles for mitochondrial components and proteasome subunits in disease progression [135]. Additionally, Gamazon et al. employed clusterProfiler in a multi-tissue transcriptome study linking gene expression with neuropsychiatric traits, demonstrating its utility for mapping complex genetic influences across brain and non-brain tissues [136]. In these studies, clusterProfiler’s semantic similarity-based method was valued for its simplicity, ease of use, and ability to identify key pathways in diverse datasets with minimal computational overhead. Despite these strengths, the method remains vulnerable to the bias and redundancy inherent in the annotation databases it draws on, such as GO and KEGG, which can overrepresent certain pathways while underrepresenting others of genuine biological relevance. Additionally, the tool’s reliance on predefined annotation databases limits its applicability in contexts where novel, poorly annotated, or species-specific processes are of interest, potentially overlooking critical insights in less-studied biological systems.

vissE: vissE excels at handling complex multi-omics data, enabling researchers to identify pathway relationships at the network level across diverse contexts. Kulasinghe et al. used vissE to analyze transcriptomic profiles of cardiac tissue from patients who died of SARS-CoV-2 infection; visualizing enriched pathways related to DNA damage and immune responses helped pinpoint the molecular impact of COVID-19 on cardiac health [137]. In immunology, Dalit et al. applied vissE to map divergent cytokine and transcriptional signatures across T follicular helper cell populations, revealing how different signaling environments shape immune cell heterogeneity and B cell output during exposure to various pathogens [138]. In colorectal cancer, Lee et al. used vissE to explore serotonin-mediated signaling, identifying key interactions linked to suppression of tumor growth through ERK signaling [139]. In these studies, vissE’s network-based approach facilitated the identification of pathway–pathway relationships and communities, giving a deeper understanding of the complex interactions observed. Despite its ability to reveal such higher-order phenotypic patterns, however, vissE’s network visualizations can become unwieldy for highly interconnected or very large datasets, potentially causing information overload and obscuring insights for researchers without advanced computational expertise. Moreover, because the tool depends on comprehensive multi-omics input data, incomplete datasets or biases within them may be amplified, affecting the reliability and interpretability of the visualized pathways.

PAVER: PAVER’s embedding-based method, coupled with UMAP visualizations, has proven effective in simplifying complex datasets and revealing critical biological insights. In neuroscience, Nguyen et al. used PAVER to analyze transcriptomic and kinomic data from mice developmentally exposed to pyrethroid pesticides, uncovering disruptions in MAP kinase signaling and circadian rhythm pathways that may underlie neurodevelopmental disorders [140]. Similarly, Curtis et al. used PAVER to integrate metabolomic and transcriptomic data from the brains of male mice developmentally exposed to deltamethrin, visualizing pathway clusters related to folate biosynthesis, dopamine synapses, and MAPK signaling to highlight the multi-modal impact of environmental exposure on adult brain metabolism [141]. O’Donovan et al. characterized transcriptional changes in the orbitofrontal cortex across psychiatric disorders such as schizophrenia and bipolar disorder, identifying immune-related functions and sex-specific gene expression patterns that distinguished diagnoses and advanced understanding of disease mechanisms [142]. In toxicology, Hu et al. applied PAVER to interpret kidney transcriptomics in mice exposed to microcystin-LR, identifying pathways modulated by a probiotic treatment that protected against toxin-induced damage [143]. In these studies, PAVER simplified interpretation by grouping similar pathways into visually distinct clusters and highlighting functional themes in multi-omic datasets. However, PAVER requires pre-computed embeddings, which may limit flexibility for real-time analyses or exploration of novel datasets. Additionally, its effectiveness depends heavily on the quality and diversity of the input datasets, which may restrict its utility in studies where such data are incomplete or unbalanced, potentially weakening the robustness of its pathway clustering and functional interpretations.

Collectively, these tools show how different interpretation methods serve distinct research needs. By choosing the right tool for interpreting and presenting enrichment results, researchers can maximize the contribution of their pathway analyses to understanding the complex biological systems they study.

3.5. Choosing the Right Tool for Your Research

Before selecting any pathway interpretation strategy, researchers must first confirm that the raw data have been rigorously pre-processed (e.g., library-size normalization [144], dropout-aware imputation for zero-inflated single-cell data [145], and cross-study batch-effect correction [146]) to ensure reproducibility and prevent technical noise from propagating into “pathway fails.” Choosing the most appropriate pathway analysis method then depends on aligning one’s research goals with the strengths and limitations of each available approach. The qualitative comparison of semantic similarity-based, network-based, and embedding-based methods in Table 2 may guide researchers in selecting the method that best fits their study’s objectives.
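Of the preprocessing steps listed above, library-size normalization is the simplest to illustrate. The sketch below computes log-transformed counts per million (log2 CPM + 1) on hypothetical counts. CPM is only one option; methods such as TMM (edgeR) or median-of-ratios (DESeq2) are generally more robust to composition effects. Imputation and batch correction are not shown.

```python
import math


def log_cpm(counts):
    """Library-size normalization: counts per million, then log2(CPM + 1).

    counts: dict mapping sample name -> list of raw read counts (same gene order).
    """
    normalized = {}
    for sample, values in counts.items():
        library_size = sum(values)  # total reads assigned to genes in this sample
        normalized[sample] = [math.log2(c / library_size * 1e6 + 1) for c in values]
    return normalized


# Hypothetical counts for three genes; sample s2 was simply sequenced twice as deeply.
GENES = ["TNF", "RELA", "GAPDH"]
COUNTS = {"s1": [10, 90, 900], "s2": [20, 180, 1800]}

norm = log_cpm(COUNTS)
for gene, a, b in zip(GENES, norm["s1"], norm["s2"]):
    print(f"{gene}: s1={a:.2f} s2={b:.2f}")
```

Every raw count in s2 is double its s1 counterpart purely because of sequencing depth. After normalization the two samples are identical, so this technical difference can no longer masquerade as differential expression and flow downstream into enrichment results.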

Visualization quality and usability differ across methods. Semantic similarity-based methods, such as clusterProfiler, provide straightforward, accessible outputs (e.g., bar charts) for researchers seeking simple visualizations. Network-based methods, such as vissE, offer interactive, detailed visualizations that map complex relationships between pathways, making them ideal for exploring interrelations in depth. Embedding-based approaches, such as PAVER, excel at generating high-quality visualizations that summarize clusters for clearer interpretation.

Ease of use varies as well. Semantic similarity-based methods offer simple workflows, appealing to experimental biologists who are new to bioinformatics. In contrast, network-based methods can demand more technical expertise, appealing to researchers who are comfortable navigating intricate visualizations and data relationships. Embedding-based methods generally automate visualization, requiring minimal input, which is beneficial for quick insights and user-friendly experiences.

Effectiveness in handling redundancy and integrating multi-omics data also sets these methods apart. Semantic similarity-based methods, while excellent at reducing GO term redundancy, are more limited when integrating non-GO data. Network-based approaches stand out in integrating multi-omics data by mapping interconnected pathways and revealing functional interactions across different biological layers. Embedding-based methods reduce redundancy effectively by clustering similar pathways, facilitating interpretation of high-dimensional datasets.
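To make GO-term redundancy reduction concrete, the sketch below works on a hypothetical miniature GO-like hierarchy. It computes Lin semantic similarity from information content (IC), then greedily drops any enriched term that is too similar to a more significant term already kept. The idea is similar in spirit to clusterProfiler's `simplify()`, but the hierarchy, gene counts, p-values, the choice of the Lin measure, and the 0.7 cutoff are all illustrative assumptions rather than clusterProfiler's implementation.

```python
import math

# Hypothetical mini ontology: term -> parent terms.
PARENTS = {
    "biological_process": [],
    "immune system process": ["biological_process"],
    "leukocyte activation": ["immune system process"],
    "lymphocyte activation": ["leukocyte activation"],
    "T cell activation": ["lymphocyte activation"],
    "B cell activation": ["lymphocyte activation"],
    "metabolic process": ["biological_process"],
    "oxidative phosphorylation": ["metabolic process"],
}
# Hypothetical number of genes annotated to each term (including descendants).
GENE_COUNTS = {
    "biological_process": 1000, "immune system process": 200,
    "leukocyte activation": 80, "lymphocyte activation": 60,
    "T cell activation": 30, "B cell activation": 20,
    "metabolic process": 400, "oxidative phosphorylation": 40,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found


def information_content(term):
    """IC = -ln(fraction of root-annotated genes that are annotated to the term)."""
    return -math.log(GENE_COUNTS[term] / GENE_COUNTS["biological_process"])


def lin_similarity(a, b):
    """Lin: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    mica_ic = max(information_content(t) for t in ancestors(a) & ancestors(b))
    denom = information_content(a) + information_content(b)
    return 1.0 if denom == 0 else 2 * mica_ic / denom


def reduce_redundancy(pvalues, cutoff=0.7):
    """Keep terms in order of significance, skipping any too similar to a kept term."""
    kept = []
    for term in sorted(pvalues, key=pvalues.get):
        if all(lin_similarity(term, k) <= cutoff for k in kept):
            kept.append(term)
    return kept


ENRICHED = {"T cell activation": 1e-8, "lymphocyte activation": 1e-6,
            "oxidative phosphorylation": 1e-5, "B cell activation": 1e-4}
print(reduce_redundancy(ENRICHED))
```

Both "lymphocyte activation" and "B cell activation" share an informative ancestor with the more significant "T cell activation" and are dropped. "Oxidative phosphorylation" shares only the root and is kept. Because the logic rests entirely on the ontology's structure, it does not transfer directly to non-GO gene sets, which is the limitation noted above.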

Computational efficiency and scalability are further practical considerations. Semantic similarity-based methods are lightweight and run efficiently for smaller-scale analyses. Network-based methods, however, can be more computationally demanding, especially when visualizing extensive networks or handling dense data. Embedding-based methods are often efficient and scalable, making them suitable for large-scale studies involving dimensionality reduction.

Accessibility also varies across methods. Semantic similarity-based methods are widely accessible as R packages or command-line tools with strong community support. Network-based methods might require specific software installations but are supported by active, albeit more niche, user communities. Embedding-based tools are often web-based with minimal setup requirements.

In summary, semantic similarity-based methods are most appropriate for GO-focused studies requiring straightforward analysis. Network-based methods are best suited for complex analyses that need detailed mapping of interactions. Embedding-based methods are ideal for researchers seeking quick, visually intuitive summaries and data-driven redundancy reduction. By understanding the distinctive features of each method type (Table 4), researchers can better align their interpretation strategies with their study goals. This comparison also offers a reference point for future advancements and adaptations in pathway analysis interpretation as the field continues to evolve.

Table 4.

Summary of pathway interpretation tools by methodological category. This table groups representative tools by core strategy, with brief notes on their typical strengths and limitations to guide method selection at a glance. For a complete detailed listing of features, see Table 3.

Category | Representative Tools | Typical Strength | Typical Limitation
Semantic similarity-based | REVIGO, clusterProfiler, ReCiPa | Fast redundancy reduction for GO terms | Tied to GO; limited cross-database scope
Network-based | EnrichmentMap, vissE, GScluster | Visualizes pathway crosstalk as network modules | Computationally heavy for large networks
Embedding-based | PAVER, MondrianMap | Data-driven clustering with intuitive plots | Relies on text descriptions; may miss context

4. Conclusions and Future Directions

This review highlights the strengths and limitations of the three main approaches to pathway analysis interpretation: semantic similarity-, network-, and embedding-based methods. Each offers distinct advantages, ranging from the straightforward, GO-focused analyses of semantic similarity-based techniques and the detailed interaction maps of network-based methods to the high-quality, visually intuitive outputs of embedding-based methods. However, no single approach is without limitations. Challenges remain in integrating diverse data types, minimizing redundancy, and ensuring compatibility across annotation databases. Recognizing these strengths and limitations helps researchers select the most appropriate method for their specific objectives and experimental contexts. The most reliable safeguard against “pathway fails” is to pair rigorous preprocessing with the interpretation method that best fits the research question: semantic tools for concise GO redundancy reduction, network approaches for interaction context, and embedding strategies for rapid, scalable multi-omic summarization.

We have highlighted areas for improvement in interpretation tools. Ultimately, the principle of “garbage in, garbage out” (GIGO) remains paramount. High-quality, standardized datasets are essential for obtaining reliable results and maximizing the value of any analysis tool [147]. Poor input data or inconsistent standards can significantly limit the effectiveness of these methods and lead to erroneous conclusions, highlighting the need for rigorous data curation and adherence to open data standards. Future developments should focus on enhancing scalability to accommodate increasingly large datasets and ensuring compatibility with so-called non-model model organisms. The influence of gene-set size and database choice on enrichment outcomes must be carefully considered, as both can alter the interpretation of results [148]. Furthermore, consistent standards for functional enrichment analysis, such as appropriate multiple-testing (p-value) correction and background gene list selection, are necessary to ensure reliable findings [149,150].
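Two of the practices named above, multiple-testing correction and background gene list selection, can be illustrated with a minimal over-representation test. The sketch below computes a one-sided hypergeometric p-value, a common basis for over-representation analysis, and Benjamini-Hochberg adjusted values in pure Python. All gene counts are hypothetical. The example shows that the same observed overlap looks far more significant against a whole-genome background than against a smaller background of genes actually detected in the experiment.

```python
import math


def hypergeom_pvalue(overlap, background, pathway_size, n_selected):
    """P(X >= overlap) when drawing n_selected genes from `background` genes,
    of which `pathway_size` belong to the pathway (one-sided over-representation)."""
    upper = min(pathway_size, n_selected)
    hits = sum(math.comb(pathway_size, i) * math.comb(background - pathway_size, n_selected - i)
               for i in range(overlap, upper + 1))
    return hits / math.comb(background, n_selected)  # exact integer ratio, divided once


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 100 selected genes, 5 of them in a 20-gene pathway.
p_genome = hypergeom_pvalue(5, background=20000, pathway_size=20, n_selected=100)
p_expressed = hypergeom_pvalue(5, background=2000, pathway_size=20, n_selected=100)
print(f"whole-genome background:   p = {p_genome:.2e}")
print(f"expressed-gene background: p = {p_expressed:.2e}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With a whole-genome background, only about 0.1 pathway genes are expected among 100 selected genes, so an overlap of 5 appears extreme. With a background restricted to 2,000 detected genes, about 1 is expected and the same overlap is far less surprising. The background should therefore reflect the genes that could actually have been detected.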

Looking ahead, researchers are encouraged to adopt tools that streamline and automate analysis, thus enhancing reproducibility and scalability. Emerging AI-driven tools now offer integrative and customizable approaches that may reduce misinterpretation by leveraging domain-specific large language models [151,152,153]. These advancements are particularly important as multi-omics data grow in complexity and demand more accurate interpretation. Continued innovation is essential to bridge the gap between computational predictions and experimental validation, driving deeper insights and supporting the advancement of systems biology and precision medicine. As the field evolves, the development of more adaptable, comprehensive, and user-friendly tools will empower researchers to fully leverage omics data, uncover complex biological relationships, and inform therapeutic strategies. By addressing current challenges and promoting methodological rigor, the pathway analysis field will continue to be a robust and indispensable component of modern biological research.

Glossary

Pathway Analysis: A method for identifying biological pathways enriched with differentially expressed genes or proteins in a dataset.
Dimensionality Reduction: Techniques that project high-dimensional data into a small number of dimensions (often two) so that pathway results can be visualized and interpreted more clearly.
Pathway Redundancy: The occurrence of overlapping or repeated pathways in analysis results, which can complicate interpretation and reduce clarity.
Embedding-Based Methods: Computational approaches that represent biological pathways as high-dimensional numerical vectors for analysis.
Semantic Similarity: A metric that quantifies the functional similarity between biological terms or pathways.
Network-Based Analysis: A method that visualizes relationships between pathways as interconnected networks, highlighting shared functions or genes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biotech14030058/s1.

Author Contributions

Conceptualization, R.E.M.; validation, W.G.R.V., S.S. and R.E.M.; formal analysis, W.G.R.V., S.S., J.V. and R.E.M.; resources, J.M. and R.E.M.; data curation, W.G.R.V.; writing—original draft preparation, W.G.R.V. and S.S.; writing—review and editing, W.G.R.V. and S.S.; supervision, C.W., J.M. and R.E.M.; project administration, R.E.M.; funding acquisition, J.M. and R.E.M. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Key Contribution

We review semantic-similarity, network, and embedding approaches for pathway analysis interpretation and distill clear guidelines to translate multi-omic outputs into actionable biological insights.

Funding Statement

This work was supported by NIH grants 1T32GM144873-01, R01MH107487, R01MH121102, R01AG057598, and R01AG083628.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Manzoni C., Kia D.A., Vandrovcova J., Hardy J., Wood N.W., Lewis P.A., Ferrari R. Genome, transcriptome and proteome: The rise of omics data and their integration in biomedical sciences. Brief. Bioinform. 2016;19:286–302. doi: 10.1093/bib/bbw114.
2. Veenstra T.D. Omics in Systems Biology: Current Progress and Future Outlook. Proteomics. 2021;21:e2000235. doi: 10.1002/pmic.202000235.
3. Herr T.M., Bielinski S.J., Bottinger E., Brautbar A., Brilliant M., Chute C.G., Denny J., Freimuth R.R., Hartzler A., Kannry J., et al. A conceptual model for translating omic data into clinical action. J. Pathol. Inform. 2015;6:46. doi: 10.4103/2153-3539.163985.
4. García-Campos M.A., Espinal-Enríquez J., Hernández-Lemus E. Pathway Analysis: State of the Art. Front. Physiol. 2015;6:383. doi: 10.3389/fphys.2015.00383.
5. Wegman-Points L., Alganem K., Imami A.S., Mathis V., Creeden J.F., McCullumsmith R., Yuan L.-L. Subcellular partitioning of protein kinase activity revealed by functional kinome profiling. Sci. Rep. 2022;12:17300. doi: 10.1038/s41598-022-21026-5.
6. Ramanan V.K., Shen L., Moore J.H., Saykin A.J. Pathway analysis of genomic data: Concepts, methods, and prospects for future development. Trends Genet. 2012;28:323–332. doi: 10.1016/j.tig.2012.03.004.
7. Krassowski M., Das V., Sahu S.K., Misra B.B. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front. Genet. 2020;11:610798. doi: 10.3389/fgene.2020.610798.
8. Sboner A., Mu X.J., Greenbaum D., Auerbach R.K., Gerstein M.B. The real cost of sequencing: Higher than you think! Genome Biol. 2011;12:125. doi: 10.1186/gb-2011-12-8-125.
9. D’Adamo G.L., Widdop J.T., Giles E.M. The future is now? Clinical and translational aspects of “Omics” technologies. Immunol. Cell Biol. 2021;99:168–176. doi: 10.1111/imcb.12404.
10. Denecker T., Lelandais G. In: Omics Analyses: How to Navigate Through a Constant Data Deluge, in Yeast Functional Genomics: Methods and Protocols. Devaux F., editor. Springer; New York, NY, USA: 2022. pp. 457–471.
11. Bell G., Hey T., Szalay A. Computer science. Beyond the data deluge. Science. 2009;323:1297–1298. doi: 10.1126/science.1170411.
12. Stead W.W., Searle J.R., Fessler H.E., Smith J.W., Shortliffe E.H. Biomedical informatics: Changing what physicians need to know and how they learn. Acad. Med. 2011;86:429–434. doi: 10.1097/ACM.0b013e3181f41e8c.
13. Pita-Juárez Y., Altschuler G., Kariotis S., Wei W., Koler K., Green C., Tanzi R.E., Hide W. The Pathway Coexpression Network: Revealing pathway relationships. PLoS Comput. Biol. 2018;14:e1006042. doi: 10.1371/journal.pcbi.1006042.
14. Chicco D., Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput. Biol. 2022;18:e1010348. doi: 10.1371/journal.pcbi.1010348.
15. Reijnders M.J., Waterhouse R.M. Summary visualizations of gene ontology terms with GO-Figure! Front. Bioinform. 2021;1:6. doi: 10.3389/fbinf.2021.638255.
16. Yu C., Woo H.J., Yu X., Oyama T., Wallqvist A., Reifman J. A strategy for evaluating pathway analysis methods. BMC Bioinform. 2017;18:453. doi: 10.1186/s12859-017-1866-7.
17. Searson P.C. The Cancer Moonshot, the role of in vitro models, model accuracy, and the need for validation. Nat. Nanotechnol. 2023;18:1121–1123. doi: 10.1038/s41565-023-01486-0.
18. Durinikova E., Buzo K., Arena S. Preclinical models as patients’ avatars for precision medicine in colorectal cancer: Past and future challenges. J. Exp. Clin. Cancer Res. 2021;40:185. doi: 10.1186/s13046-021-01981-z.
19. Diaz-Uriarte R., de Lope E.G., Giugno R., Fröhlich H., Nazarov P.V., Nepomuceno-Chamorro I.A., Rauschenberger A., Glaab E. Ten quick tips for biomarker discovery and validation analyses using machine learning. PLoS Comput. Biol. 2022;18:e1010357. doi: 10.1371/journal.pcbi.1010357.
20. Grabowski T., Tomczyk A., Wolc A., Gad S.C. Between Biological Relevancy and Statistical Significance—Step for Assessment Harmonization. Am. J. Biomed. Sci. Res. 2021;13. doi: 10.34297/AJBSR.2021.13.001908.
21. Committee E.S., Hardy A., Benford D., Halldorsson T., Jeger M.J., Knutsen H.K., More S., Naegeli H., Noteborn H., Ockleford C., et al. Guidance on the assessment of the biological relevance of data in scientific assessments. EFSA J. 2017;15:e04970. doi: 10.2903/j.efsa.2017.4970.
22. Perez-Riverol Y., Zorin A., Dass G., Vu M.T., Xu P., Glont M., Vizcaino J.A., Jarnuczak A.F., Petryszak R., Ping P., et al. Quantifying the impact of public omics data. Nat. Commun. 2019;10:3512. doi: 10.1038/s41467-019-11461-w.
23. Misra B.B., Langefeld C., Olivier M., Cox L.A. Integrated omics: Tools, advances and future approaches. J. Mol. Endocrinol. 2019;62:R21–R45. doi: 10.1530/JME-18-0055.
24. Domingo-Fernández D., Mubeen S., Marín-Llaó J., Hoyt C.T., Hofmann-Apitius M. PathMe: Merging and exploring mechanistic pathway knowledge. BMC Bioinform. 2019;20:243. doi: 10.1186/s12859-019-2863-9.
25. Wieder C., Cooke J., Frainay C., Poupin N., Bowler R., Jourdan F., Kechris K.J., Lai R.P.J., Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. PLoS Comput. Biol. 2024;20:e1011814. doi: 10.1371/journal.pcbi.1011814.
26. Canzler S., Hackermüller J. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinform. 2020;21:561. doi: 10.1186/s12859-020-03910-x.
27. Ivanisevic T., Sewduth R.N. Multi-Omics Integration for the Design of Novel Therapies and the Identification of Novel Biomarkers. Proteomes. 2023;11:34. doi: 10.3390/proteomes11040034.
28. Mohr A.E., Ortega-Santos C.P., Whisner C.M., Klein-Seetharaman J., Jasbi P. Navigating Challenges and Opportunities in Multi-Omics Integration for Personalized Healthcare. Biomedicines. 2024;12:1496. doi: 10.3390/biomedicines12071496.
29. Khatri P., Sirota M., Butte A.J. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput. Biol. 2012;8:e1002375. doi: 10.1371/journal.pcbi.1002375.
30. Nguyen T.-M., Shafi A., Nguyen T., Draghici S. Identifying significantly impacted pathways: A comprehensive review and assessment. Genome Biol. 2019;20:203. doi: 10.1186/s13059-019-1790-4.
31. Nam D., Kim S.Y. Gene-set approach for expression pattern analysis. Brief. Bioinform. 2008;9:189–197. doi: 10.1093/bib/bbn001.
32. Maghsoudi Z., Nguyen H., Tavakkoli A., Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief. Bioinform. 2022;23:bbac435. doi: 10.1093/bib/bbac435.
33. Winston J.E. Twenty-First Century Biological Nomenclature—The Enduring Power of Names. Integr. Comp. Biol. 2018;58:1122–1131. doi: 10.1093/icb/icy060.
34. Vassalli P. The pathophysiology of tumor necrosis factors. Annu. Rev. Immunol. 1992;10:411–452. doi: 10.1146/annurev.iy.10.040192.002211.
35. Webster J.D., Vucic D. The Balance of TNF Mediated Pathways Regulates Inflammatory Cell Death Signaling in Healthy and Diseased Tissues. Front. Cell Dev. Biol. 2020;8:365. doi: 10.3389/fcell.2020.00365.
36. Wang D., Yang Q., Abdul A., Lim B.Y. Designing Theory-Driven User-Centric Explainable AI; Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems; Glasgow, UK. 4–9 May 2019; Glasgow, UK: Association for Computing Machinery; 2019. p. 601.
37. Jara J.H., Singh B.B., Floden A.M., Combs C.K. Tumor necrosis factor alpha stimulates NMDA receptor activity in mouse cortical neurons resulting in ERK-dependent death. J. Neurochem. 2007;100:1407–1420. doi: 10.1111/j.1471-4159.2006.04330.x.
38. Sebastian-Leon P., Vidal E., Minguez P., Conesa A., Tarazona S., Amadoz A., Armero C., Salavert F., Vidal-Puig A., Montaner D., et al. Understanding disease mechanisms with models of signaling pathway activities. BMC Syst. Biol. 2014;8:121. doi: 10.1186/s12918-014-0121-3.
39. Lee J., Jo K., Lee S., Kang J., Kim S. Prioritizing biological pathways by recognizing context in time-series gene expression data. BMC Bioinform. 2016;17:477. doi: 10.1186/s12859-016-1335-8.
40. Sjöström J., Bergh J. How apoptosis is regulated, and what goes wrong in cancer. BMJ. 2001;322:1538–1539. doi: 10.1136/bmj.322.7301.1538.
41. Nguyen T.T.M., Gillet G., Popgeorgiev N. Caspases in the Developing Central Nervous System: Apoptosis and Beyond. Front. Cell Dev. Biol. 2021;9:702404. doi: 10.3389/fcell.2021.702404.
42. Ryu J.R., Hong C.J., Kim J.Y., Kim E.-K., Sun W., Yu S.-W. Control of adult neurogenesis by programmed cell death in the mammalian brain. Mol. Brain. 2016;9:43. doi: 10.1186/s13041-016-0224-4.
43. Anosike N.L., Adejuwon J.F., Emmanuel G.E., Adebayo O.S., Etti-Balogun H., Nathaniel J.N., Omotosho O.I., Aschner M., Ijomone O.M. Necroptosis in the developing brain: Role in neurodevelopmental disorders. Metab. Brain Dis. 2023;38:831–837. doi: 10.1007/s11011-023-01203-9.
44. Saini S., Kakati P., Singh K. Role of Inflammation in Tissue Regeneration and Repair. In: Tripathi A., Dwivedi A., Gupta S., Poojan S., editors. Inflammation Resolution and Chronic Diseases. Springer Nature; Singapore: 2024. pp. 103–127.
45. Choi B., Lee C., Yu J.-W. Distinctive role of inflammation in tissue repair and regeneration. Arch. Pharmacal Res. 2023;46:78–89. doi: 10.1007/s12272-023-01428-3.
46. Wyss-Coray T., Mucke L. Inflammation in neurodegenerative disease--a double-edged sword. Neuron. 2002;35:419–432. doi: 10.1016/S0896-6273(02)00794-8.
47. Gasque P., Neal J.W., Singhrao S.K., McGreal E.P., Dean Y.D., Van B.J., Morgan B.P. Roles of the complement system in human neurodegenerative disorders: Pro-inflammatory and tissue remodeling activities. Mol. Neurobiol. 2002;25:1–17. doi: 10.1385/MN:25:1:001.
48. Shih R.-H., Wang C.-Y., Yang C.-M. NF-kappaB Signaling Pathways in Neurological Inflammation: A Mini Review. Front. Mol. Neurosci. 2015;8:77. doi: 10.3389/fnmol.2015.00077.
49. Sun S.-C. The non-canonical NF-κB pathway in immunity and inflammation. Nat. Rev. Immunol. 2017;17:545–558. doi: 10.1038/nri.2017.52.
50. Adriaens M.E., Jaillard M., Waagmeester A., Coort S.L.M., Pico A.R., Evelo C.T.A. The public road to high-quality curated biological pathways. Drug Discov. Today. 2008;13:856–862. doi: 10.1016/j.drudis.2008.06.013.
51. Shin M.-G., Pico A.R. Using published pathway figures in enrichment analysis and machine learning. BMC Genom. 2023;24:713. doi: 10.1186/s12864-023-09816-1.
52. Vivar J.C., Pemu P., McPherson R., Ghosh S. Redundancy control in pathway databases (ReCiPa): An application for improving gene-set enrichment analysis in Omics studies and “Big data” biology. Omics. 2013;17:414–422. doi: 10.1089/omi.2012.0083.
53. Pastrello C., Niu Y., Jurisica I. Pathway Enrichment Analysis of Microarray Data. Methods Mol. Biol. 2022;2401:147–159. doi: 10.1007/978-1-0716-1839-4_10.
54. Gable A.L., Szklarczyk D., Lyon D., Rodrigues J.F.M., von Mering C. Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments. Brief. Bioinform. 2022;23:bbac355. doi: 10.1093/bib/bbac355.
55. Maertens A., Tran V.P., Maertens M., Kleensang A., Luechtefeld T.H., Hartung T., Paller C.J. Functionally Enigmatic Genes in Cancer: Using TCGA Data to Map the Limitations of Annotations. Sci. Rep. 2020;10:4106. doi: 10.1038/s41598-020-60456-x.
56. Stoney R.A., Schwartz J.-M., Robertson D.L., Nenadic G. Using set theory to reduce redundancy in pathway sets. BMC Bioinform. 2018;19:386. doi: 10.1186/s12859-018-2355-3.
57. Supek F., Bosnjak M., Skunca N., Smuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011;6:e21800. doi: 10.1371/journal.pone.0021800.
58. Yu G., Wang L.G., Han Y., He Q.Y. clusterProfiler: An R package for comparing biological themes among gene clusters. Omics. 2012;16:284–287. doi: 10.1089/omi.2011.0118.
59. Zhao C., Wang Z. GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms. Sci. Rep. 2018;8:15107. doi: 10.1038/s41598-018-33219-y.
60. Hale M.L., Thapa I., Ghersi D. FunSet: An open-source software and web server for performing and displaying Gene Ontology enrichment analysis. BMC Bioinform. 2019;20:359. doi: 10.1186/s12859-019-2960-9.
61. Ewing E., Planell-Picola N., Jagodic M., Gomez-Cabrero D. GeneSetCluster: A tool for summarizing and integrating gene-set analysis results. BMC Bioinform. 2020;21:443. doi: 10.1186/s12859-020-03784-z.
62. Wang G., Oh D.H., Dassanayake M. GOMCL: A toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. BMC Bioinform. 2020;21:139. doi: 10.1186/s12859-020-3447-4.
63. Yu G. Gene Ontology Semantic Similarity Analysis Using GOSemSim. Methods Mol. Biol. 2020;2117:207–215. doi: 10.1007/978-1-0716-0301-7_11.
64. Gu Z., Hübschmann D. SimplifyEnrichment: A Bioconductor Package for Clustering and Visualizing Functional Enrichment Results. Genom. Proteom. Bioinform. 2022;21:190–202. doi: 10.1016/j.gpb.2022.04.008.
65. Ge S.X., Son E.W., Yao R. iDEP: An integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinform. 2018;19:534. doi: 10.1186/s12859-018-2486-6.
66. Sherman B.T., Hao M., Qiu J., Jiao X., Baseler M.W., Lane H.C., Imamichi T., Chang W. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–W221. doi: 10.1093/nar/gkac194.
  • 67.Kolberg L., Raudvere U., Kuzmin I., Adler P., Vilo J., Peterson H. g:Profiler—Interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update) Nucleic Acids Res. 2023;51:W207–W212. doi: 10.1093/nar/gkad347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Prummer M. Enhancing gene set enrichment using networks. F1000Research. 2019;8:129. doi: 10.12688/f1000research.17824.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Reimand J., Isserlin R., Voisin V., Kucera M., Tannus-Lopes C., Rostamianfar A., Wadi L., Meyer M., Wong J., Xu C., et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat. Protoc. 2019;14:482–517. doi: 10.1038/s41596-018-0103-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Yoon S., Kim J., Kim S.-K., Baik B., Chi S.-M., Kim S.-Y., Nam D. GScluster: Network-weighted gene-set clustering analysis. BMC Genom. 2019;20:352. doi: 10.1186/s12864-019-5738-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Kerseviciute I., Gordevicius J. aPEAR: An R package for autonomous visualization of pathway enrichment networks. Bioinformatics. 2023;39:btad672. doi: 10.1093/bioinformatics/btad672. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Taylor A., Macaulay V.M., Maurya A.K., Miossec M.J., Buffa F.M. GeneFEAST: The pivotal, gene-centric step in functional enrichment analysis interpretation. arXiv. 2023:2309.00061. doi: 10.1093/bioinformatics/btaf100.
  • 73. Bhuva D.D., Tan C.W., Liu N., Whitfield H.J., Papachristos N., Lee S.C., Kharbanda M., Mohamed A., Davis M.J. vissE: A versatile tool to identify and visualise higher-order molecular phenotypes from functional enrichment analysis. BMC Bioinform. 2024;25:64. doi: 10.1186/s12859-024-05676-y.
  • 74. Mohamed A., Bhuva D.D., Lee S., Liu N., Tan C.W., Davis M.J. vissE.cloud: A webserver to visualise higher order molecular phenotypes from enrichment analysis. Nucleic Acids Res. 2023;51:W593–W600. doi: 10.1093/nar/gkad337.
  • 75. Blimkie T.M., An A., Hancock R.E.W. Facilitating pathway and network based analysis of RNA-Seq data with pathlinkR. PLoS Comput. Biol. 2024;20:e1012422. doi: 10.1371/journal.pcbi.1012422.
  • 76. Luo W., Pant G., Bhavnasi Y.K., Blanchard S.G., Brouwer C. Pathview Web: User friendly pathway visualization and data integration. Nucleic Acids Res. 2017;45:W501–W508. doi: 10.1093/nar/gkx372.
  • 77. Ryan V.W., Imami A.S., Sajid H.A., Vergis J., Zhang X., Meller J., Shukla R., McCullumsmith R. Interpreting and visualizing pathway analyses using embedding representations with PAVER. Bioinformation. 2024;20:700–704. doi: 10.6026/973206300200700.
  • 78. Al Abir F., Chen J.Y. Mondrian Abstraction and Language Model Embeddings for Differential Pathway Analysis. bioRxiv. 2024. doi: 10.1101/2024.04.11.589093.
  • 79. Kolde R., Vilo J. GOsummaries: An R Package for Visual Functional Annotation of Experimental Data. F1000Research. 2015;4:574. doi: 10.12688/f1000research.6925.1.
  • 80. Balestra C., Maj C., Muller E., Mayr A. Redundancy-aware unsupervised ranking based on game theory: Ranking pathways in collections of gene sets. PLoS ONE. 2023;18:e0282699. doi: 10.1371/journal.pone.0282699.
  • 81. Weistuch C., Murgas K.A., Zhu J., Norton L., Dill K.A., Tannenbaum A.R., Deasy J.O. Normal tissue transcriptional signatures for tumor-type-agnostic phenotype prediction. Sci. Rep. 2024;14:27230. doi: 10.1038/s41598-024-76625-1.
  • 82. Hanspers K., Kutmon M., Coort S.L., Digles D., Dupuis L.J., Ehrhart F., Hu F., Lopes E.N., Martens M., Pham N., et al. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput. Biol. 2021;17:e1009226. doi: 10.1371/journal.pcbi.1009226.
  • 83. He C., Micallef L., Serim B., Vuong T., Ruotsalo T., Jacucci G. Interactive visual facets to support fluid exploratory search. J. Vis. 2023;26:211–230. doi: 10.1007/s12650-022-00865-4.
  • 84. Ovchinnikova S., Anders S. Exploring dimension-reduced embeddings with Sleepwalk. Genome Res. 2020;30:749–756. doi: 10.1101/gr.251447.119.
  • 85. Li Y., Ge X., Peng F., Li W., Li J.J. Exaggerated false positives by popular differential expression methods when analyzing human population samples. Genome Biol. 2022;23:79. doi: 10.1186/s13059-022-02648-4.
  • 86. Radulescu E., Jaffe A.E., Straub R.E., Chen Q., Shin J.H., Hyde T.M., Kleinman J.E., Weinberger D.R. Identification and prioritization of gene sets associated with schizophrenia risk by co-expression network analysis in human brain. Mol. Psychiatry. 2020;25:791–804. doi: 10.1038/s41380-018-0304-1.
  • 87. Sapienza J., Spangaro M., Guillemin G.J., Comai S., Bosia M. Importance of the dysregulation of the kynurenine pathway on cognition in schizophrenia: A systematic review of clinical studies. Eur. Arch. Psychiatry Clin. Neurosci. 2023;273:1317–1328. doi: 10.1007/s00406-022-01519-0.
  • 88. Rusina P.V., Falaguera M.J., Romero J.M.R., McDonagh E.M., Dunham I., Ochoa D. Genetic support for FDA-approved drugs over the past decade. Nat. Rev. Drug Discov. 2023;22:864. doi: 10.1038/d41573-023-00158-x.
  • 89. Ochoa D., Karim M., Ghoussaini M., Hulcoop D.G., McDonagh E.M., Dunham I. Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nat. Rev. Drug Discov. 2022;21:551. doi: 10.1038/d41573-022-00120-3.
  • 90. Diogo D., Bastarache L., Liao K.P., Graham R.R., Fulton R.S., Greenberg J.D., Eyre S., Bowes J., Cui J., Lee A., et al. TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits. PLoS ONE. 2015;10:e0122271. doi: 10.1371/journal.pone.0122271.
  • 91. MacNamara A., Nakic N., Al Olama A.A., Guo C., Sieber K.B., Hurle M.R., Gutteridge A. Network and pathway expansion of genetic disease associations identifies successful drug targets. Sci. Rep. 2020;10:20970. doi: 10.1038/s41598-020-77847-9.
  • 92. de la Fuente van Bentem S., Mentzen W.I., de la Fuente A., Hirt H. Towards functional phosphoproteomics by mapping differential phosphorylation events in signaling networks. Proteomics. 2008;8:4453–4465. doi: 10.1002/pmic.200800175.
  • 93. Ponomarenko E.A., Krasnov G.S., Kiseleva O.I., Kryukova P.A., Arzumanian V.A., Dolgalev G.V., Ilgisonis E.V., Lisitsa A.V., Poverennaya E.V. Workability of mRNA Sequencing for Predicting Protein Abundance. Genes. 2023;14:2065. doi: 10.3390/genes14112065.
  • 94. Prabahar A., Zamora R., Barclay D., Yin J., Ramamoorthy M., Bagheri A., Johnson S.A., Badylak S., Vodovotz Y., Jiang P. Unraveling the complex relationship between mRNA and protein abundances: A machine learning-based approach for imputing protein levels from RNA-seq data. NAR Genom. Bioinform. 2024;6:lqae019. doi: 10.1093/nargab/lqae019.
  • 95. de Sousa Abreu R., Penalva L.O., Marcotte E.M., Vogel C. Global signatures of protein and mRNA expression levels. Mol. Biosyst. 2009;5:1512–1526. doi: 10.1039/b908315d.
  • 96. Upadhya S.R., Ryan C.J. Experimental reproducibility limits the correlation between mRNA and protein abundances in tumor proteomic profiles. Cell Rep. Methods. 2022;2:100288. doi: 10.1016/j.crmeth.2022.100288.
  • 97. Arshad O.A., Danna V., Petyuk V.A., Piehowski P.D., Liu T., Rodland K.D., McDermott J.E. An Integrative Analysis of Tumor Proteomic and Phosphoproteomic Profiles to Examine the Relationships Between Kinase Activity and Phosphorylation. Mol. Cell. Proteom. 2019;18(Suppl. S1):S26–S36. doi: 10.1074/mcp.RA119.001540.
  • 98. Liu Y., Beyer A., Aebersold R. On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell. 2016;165:535–550. doi: 10.1016/j.cell.2016.03.014.
  • 99. Handly L.N., Yao J., Wollman R. Signal Transduction at the Single-Cell Level: Approaches to Study the Dynamic Nature of Signaling Networks. J. Mol. Biol. 2016;428:3669–3682. doi: 10.1016/j.jmb.2016.07.009.
  • 100. Creeden J.F., Alganem K., Imami A.S., Brunicardi F.C., Liu S.H., Shukla R., Tomar T., Naji F., McCullumsmith R.E. Kinome Array Profiling of Patient-Derived Pancreatic Ductal Adenocarcinoma Identifies Differentially Active Protein Tyrosine Kinases. Int. J. Mol. Sci. 2020;21:8679. doi: 10.3390/ijms21228679.
  • 101. Litichevskiy L., Peckner R., Abelin J.G., Asiedu J.K., Creech A.L., Davis J.F., Davison D., Dunning C.M., Egertson J.D., Egri S., et al. A Library of Phosphoproteomic and Chromatin Signatures for Characterizing Cellular Responses to Drug Perturbations. Cell Syst. 2018;6:424–443.e7. doi: 10.1016/j.cels.2018.03.012.
  • 102. Reinecke M., Heinzlmeir S., Wilhelm M., Médard G., Klaeger S., Kuster B. Kinobeads: A Chemical Proteomic Approach for Kinase Inhibitor Selectivity Profiling and Target Discovery. In: Target Discovery and Validation. Wiley; Hoboken, NJ, USA: 2019. pp. 97–130.
  • 103. Patricelli M.P., Nomanbhoy T.K., Wu J., Brown H., Zhou D., Zhang J., Jagannathan S., Aban A., Okerberg E., Herring C., et al. In situ kinase profiling reveals functionally relevant properties of native kinases. Chem. Biol. 2011;18:699–710. doi: 10.1016/j.chembiol.2011.04.011.
  • 104. Alganem K., Hamoud A.R., Creeden J.F., Henkel N.D., Imami A.S., Joyce A.W., Ryan V.W., Rethman J.B., Shukla R., O’Donovan S.M., et al. The active kinome: The modern view of how active protein kinase networks fit in biological research. Curr. Opin. Pharmacol. 2022;62:117–129. doi: 10.1016/j.coph.2021.11.007.
  • 105. Cowen L., Ideker T., Raphael B.J., Sharan R. Network propagation: A universal amplifier of genetic associations. Nat. Rev. Genet. 2017;18:551–562. doi: 10.1038/nrg.2017.38.
  • 106. Qiu P. Embracing the dropouts in single-cell RNA-seq analysis. Nat. Commun. 2020;11:1169. doi: 10.1038/s41467-020-14976-9.
  • 107. Nguyen A., Khoo W.H., Moran I., Croucher P.I., Phan T.G. Single Cell RNA Sequencing of Rare Immune Cell Populations. Front. Immunol. 2018;9:1553. doi: 10.3389/fimmu.2018.01553.
  • 108. Franchini M., Pellecchia S., Viscido G., Gambardella G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom. Bioinform. 2023;5:lqad024. doi: 10.1093/nargab/lqad024.
  • 109. Ronen J., Akalin A. netSmooth: Network-smoothing based imputation for single cell RNA-seq. F1000Research. 2018;7:8. doi: 10.12688/f1000research.13511.3.
  • 110. Pavel A., Grønberg M.G., Clemmensen L.H. The impact of dropouts in scRNAseq dense neighborhood analysis. Comput. Struct. Biotechnol. J. 2025;27:1278–1285. doi: 10.1016/j.csbj.2025.03.033.
  • 111. Bouland G.A., Mahfouz A., Reinders M.J.T. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol. 2023;24:86. doi: 10.1186/s13059-023-02933-w.
  • 112. Davis D., Wizel A., Drier Y. Accurate estimation of pathway activity in single cells for clustering and differential analysis. Genome Res. 2024;34:925–936. doi: 10.1101/gr.278431.123.
  • 113. Livne D., Efroni S. Pathway metrics accurately stratify T cells to their cells states. BioData Min. 2024;17:60. doi: 10.1186/s13040-024-00416-7.
  • 114. Lähnemann D., Köster J., Szczurek E., McCarthy D.J., Hicks S.C., Robinson M.D., Vallejos C.A., Campbell K.R., Beerenwinkel N., Mahfouz A., et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020;21:31. doi: 10.1186/s13059-020-1926-6.
  • 115. Conroy G. Retractions caused by honest mistakes are extremely stressful, say researchers. Nature. 2025. Online ahead of print.
  • 116. Kovacs M., Varga M.A., Dianovics D., Poldrack R.A., Aczel B. Opening the black box of article retractions: Exploring the causes and consequences of data management errors. R. Soc. Open Sci. 2024;11:240844. doi: 10.1098/rsos.240844.
  • 117. Wilkinson M.D., Dumontier M., Aalbersberg I.J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.W., da Silva Santos L.B., Bourne P.E., et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data. 2016;3:160018. doi: 10.1038/sdata.2016.18.
  • 118. Doniparthi G., Mühlhaus T., Deßloch S. Integrating FAIR Experimental Metadata for Multi-omics Data Analysis. Datenbank-Spektrum. 2024;24:107–115. doi: 10.1007/s13222-024-00473-6.
  • 119. Jan M., Gobet N., Diessler S., Franken P., Xenarios I. A multi-omics digital research object for the genetics of sleep regulation. Sci. Data. 2019;6:258. doi: 10.1038/s41597-019-0171-x.
  • 120. Supek F., Skunca N. Visualizing GO Annotations. Methods Mol. Biol. 2017;1446:207–220. doi: 10.1007/978-1-4939-3743-1_15.
  • 121. Gan M., Dou X., Jiang R. From ontology to semantic similarity: Calculation of ontology-based semantic similarity. Sci. World J. 2013;2013:793091. doi: 10.1155/2013/793091.
  • 122. Pesquita C., Faria D., Falcao A.O., Lord P., Couto F.M. Semantic similarity in biomedical ontologies. PLoS Comput. Biol. 2009;5:e1000443. doi: 10.1371/journal.pcbi.1000443.
  • 123. Galeota E., Kishore K., Pelizzola M. Ontology-driven integrative analysis of omics data through Onassis. Sci. Rep. 2020;10:703. doi: 10.1038/s41598-020-57716-1.
  • 124. Duong D., Ahmad W.U., Eskin E., Chang K.W., Li J.J. Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions. J. Comput. Biol. 2019;26:38–52. doi: 10.1089/cmb.2018.0093.
  • 125. Major V., Surkis A., Aphinyanaphongs Y. Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research. AMIA Annu. Symp. Proc. 2018;2018:1405–1414.
  • 126. Chiu B., Crichton G., Korhonen A., Pyysalo S. How to train good word embeddings for biomedical NLP. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing; Berlin, Germany; 12 August 2016.
  • 127. Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space. arXiv. 2013:1301.3781. doi: 10.48550/arXiv.1301.3781.
  • 128. Ofer D., Brandes N., Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput. Struct. Biotechnol. J. 2021;19:1750–1758. doi: 10.1016/j.csbj.2021.03.022.
  • 129. Xenos A., Malod-Dognin N., Milinkovic S., Przulj N. Linear functional organization of the omic embedding space. Bioinformatics. 2021;37:3839–3847. doi: 10.1093/bioinformatics/btab487.
  • 130. Asgari E., Mofrad M.R. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE. 2015;10:e0141287. doi: 10.1371/journal.pone.0141287.
  • 131. Kulmanov M., Smaili F.Z., Gao X., Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief. Bioinform. 2021;22:bbaa199. doi: 10.1093/bib/bbaa199.
  • 132. Lerman G., Shakhnovich B.E. Defining functional distance using manifold embeddings of gene ontology annotations. Proc. Natl. Acad. Sci. USA. 2007;104:11334–11339. doi: 10.1073/pnas.0702965104.
  • 133. Chen D.-Q., Kong X.-S., Shen X.-B., Huang M.-Z., Zheng J.-P., Sun J., Xu S.-H. Identification of Differentially Expressed Genes and Signaling Pathways in Acute Myocardial Infarction Based on Integrated Bioinformatics Analysis. Cardiovasc. Ther. 2019;2019:8490707. doi: 10.1155/2019/8490707.
  • 134. Jia R., Li Z., Liang W., Ji Y., Weng Y., Liang Y., Ning P. Identification of key genes unique to the luminal a and basal-like breast cancer subtypes via bioinformatic analysis. World J. Surg. Oncol. 2020;18:268. doi: 10.1186/s12957-020-02042-z.
  • 135. Niu Y., Zhang Y., Zha Q., Shi J., Weng Q. Bioinformatics to analyze the differentially expressed genes in different degrees of Alzheimer’s disease and their roles in progress of the disease. J. Appl. Genet. 2024;66:73–85. doi: 10.1007/s13353-024-00827-6.
  • 136. Gamazon E.R., Zwinderman A.H., Cox N.J., Denys D., Derks E.M. Multi-tissue transcriptome analyses identify genetic mechanisms underlying neuropsychiatric traits. Nat. Genet. 2019;51:933–940. doi: 10.1038/s41588-019-0409-8.
  • 137. Kulasinghe A., Liu N., Tan C.W., Monkman J., Sinclair J.E., Bhuva D.D., Godbolt D., Pan L., Nam A., Sadeghirad H., et al. Transcriptomic profiling of cardiac tissues from SARS-CoV-2 patients identifies DNA damage. Immunology. 2023;168:403–419. doi: 10.1111/imm.13577.
  • 138. Dalit L., Tan C.W., Sheikh A.A., Munnings R., Alvarado C., Hussain T., Zaini A., Cooper L., Kirn A., Hailes L., et al. Divergent cytokine and transcriptional signatures control functional T follicular helper cell heterogeneity. bioRxiv. 2024. doi: 10.1101/2024.06.12.598622.
  • 139. Lee J.-Y., Park S., Park E.J., Pagire H.S., Pagire S.H., Choi B.-W., Park M., Fang S., Ahn J.H., Oh C.-M. Inhibition of HTR2B-mediated serotonin signaling in colorectal cancer suppresses tumor growth through ERK signaling. Biomed. Pharmacother. 2024;179:117428. doi: 10.1016/j.biopha.2024.117428.
  • 140. Nguyen J.H., Curtis M.A., Imami A.S., Ryan W.G., Alganem K., Neifer K.L., Saferin N., Nawor C.N., Kistler B.P., Miller G.W., et al. Developmental pyrethroid exposure disrupts molecular pathways for MAP kinase and circadian rhythms in mouse brain. bioRxiv. 2024. doi: 10.1101/2023.08.28.555113.
  • 141. Curtis M.A., Saferin N., Nguyen J.H., Imami A.S., Ryan W.G., Neifer K.L., Miller G.W., Burkett J.P. Developmental pyrethroid exposure in mouse leads to disrupted brain metabolism in adulthood. Neurotoxicology. 2024;103:87–95. doi: 10.1016/j.neuro.2024.06.007.
  • 142. O’Donovan S., Ali S., Deng W., Patti G., Wang J., Eladawi M., Imami A. Shared and unique transcriptional changes in the orbitofrontal cortex in psychiatric disorders and suicide. Transl. Univ. Toledo J. Med. Sci. 2024;12. doi: 10.46570/utjms.vol11-2023-822.
  • 143. Hu Y., Sun I., Tang E., Ryan W., Shrestha U., Gautam J., Lad A., Huntley J.F., Haller S., Kennedy D. Probiotic Protects Kidneys Exposed to Microcystin-LR. Transl. Univ. Toledo J. Med. Sci. 2024;12. doi: 10.46570/utjms.vol12-2024-823.
  • 144. Cuevas-Diaz Duran R., Wei H., Wu J. Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genom. 2024;25:444. doi: 10.1186/s12864-024-10364-5.
  • 145. Andrews T.S., Hemberg M. False signals induced by single-cell imputation. F1000Research. 2018;7:1740. doi: 10.12688/f1000research.16613.1.
  • 146. Yu Y., Mai Y., Zheng Y., Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol. 2024;25:254. doi: 10.1186/s13059-024-03401-9.
  • 147. Hodgman C., French A., Westhead D. BIOS Instant Notes in Bioinformatics. Taylor & Francis; Abingdon, UK: 2009.
  • 148. Karp P.D., Midford P.E., Caspi R., Khodursky A. Pathway size matters: The influence of pathway granularity on over-representation (enrichment analysis) statistics. BMC Genom. 2021;22:191. doi: 10.1186/s12864-021-07502-8.
  • 149. Wijesooriya K., Jadaan S.A., Perera K.L., Kaur T., Ziemann M. Urgent need for consistent standards in functional enrichment analysis. PLoS Comput. Biol. 2022;18:e1009935. doi: 10.1371/journal.pcbi.1009935.
  • 150. Ziemann M., Schroeter B., Bora A. Two subtle problems with over-representation analysis. Bioinform. Adv. 2024;4:vbae159. doi: 10.1093/bioadv/vbae159.
  • 151. Zhao H., Ma C., Xu F., Kong L., Deng Z.-H. BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning. arXiv. 2025:2502.16660.
  • 152. Wong C.-K., Choo A., Cheng E.C., San W.-C., Cheng K.C.-K., Lau Y.-M., Lin M., Li F., Liang W.-H., Liao S.-Y. Lomics: Generation of pathways and gene sets using large language models for transcriptomic analysis. arXiv. 2024:2407.09089. doi: 10.48550/arXiv.2407.09089.
  • 153. Kamya P., Ozerov I.V., Pun F.W., Tretina K., Fokina T., Chen S., Naumov V., Long X., Lin S., Korzinkin M., et al. PandaOmics: An AI-Driven Platform for Therapeutic Target and Biomarker Discovery. J. Chem. Inf. Model. 2024;64:3961–3969. doi: 10.1021/acs.jcim.3c01619.

Associated Data


Supplementary Materials


Articles from BioTech are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
