Keywords: Differentially expressed genes, Molecular mechanisms, Analytical strategies, Biological interpretation, Biological process
Highlights
• The first review to systematically introduce and summarize the tools for maximizing biological information of genes.
• A comprehensive overview of representative tools and algorithms for analyzing differentially expressed genes.
• More than 300 tools, databases, and algorithms are summarized on the website DEGMiner.
• A detailed guideline is provided to help researchers better mine the functions and interactions of genes.
Abstract
Background
Identifying differentially expressed genes (DEGs) is a core task of transcriptome analysis, as DEGs can reveal the molecular mechanisms underlying biological processes. However, interpreting the biological significance of large DEG lists is challenging. Currently, gene ontology, pathway enrichment and protein–protein interaction analysis are common strategies employed by biologists. Additionally, emerging analytical strategies/approaches (such as network module analysis, knowledge graph, drug repurposing, cell marker discovery, trajectory analysis, and cell communication analysis) have been proposed. Despite these advances, comprehensive guidelines for systematically and thoroughly mining the biological information within DEGs remain lacking.
Aim of review
This review aims to provide an overview of essential concepts and methodologies for the biological interpretation of DEGs, enhancing the contextual understanding. It also addresses the current limitations and future perspectives of these approaches, highlighting their broad applications in deciphering the molecular mechanism of complex diseases and phenotypes. To assist users in extracting insights from extensive datasets, especially various DEG lists, we developed DEGMiner (https://www.ciblab.net/DEGMiner/), which integrates over 300 easily accessible databases and tools.
Key scientific concepts of review
This review offers strong support and guidance for exploring DEGs and will accelerate the discovery of hidden biological insights within genomes.
Introduction
Transcriptome analysis plays a crucial role in determining the gene expression changes among individuals with and/or without specific diseases. This analysis helps identify differentially expressed genes (DEGs) that may be linked to the investigated disease as genetic triggers, consequences, or indicators [1]. In biomedicine, computationally exploring DEGs has become an essential strategy, aiding in the unraveling of the underlying mechanisms behind complex diseases such as cancer [2], [3], Alzheimer’s disease [4], and epilepsy [5], or in elucidating the body’s response to drug stimulation [6], aging [7], and other perturbations [8].
Among the approaches used for analyzing and mining DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are widely considered the two most significant [9], [10]. GO focuses on the basic biological functions of genes, and KEGG emphasizes the genes within pathways. Nevertheless, both methods share a common flaw: they overlook the interactions between genes or their products [11], [12]. To overcome this limitation, protein–protein interaction (PPI) networks and gene regulatory networks (GRNs) have been employed to explore the interactions among DEGs. Unfortunately, PPI and GRN approaches often neglect the function of non-coding genes (e.g., microRNA) and metabolites that play significant roles in complex biological processes [13], [14], [15]. To bridge these gaps, emerging approaches such as module/pattern analysis, bio-network analysis, survival analysis, drug repurposing, and knowledge graphs (KGs) have been proposed. These approaches provide tailored analyses to investigate diverse states and conditions that regulate biological responses to specific perturbations. For instance, weighted gene co-expression network analysis (WGCNA) has been successfully applied to identify modules associated with autism spectrum disorder [16]. A growing number of omics studies have demonstrated the importance of these approaches in identifying gene dysfunction in diseases [17], [18] and determining the impact of therapeutic intervention on biological responses [19].
An important task for biologists working in biomedicine is to understand cellular functions, which requires in-depth interpretation of information from DEGs and building accurate cellular models to generate testable hypotheses. Despite advances in methodologies, databases and tools, some researchers still face challenges in fully understanding the specialized bioinformatics techniques for biological interpretation. So far, the scientific community has not reached a consensus on how to comprehensively analyze DEGs to extract effective biological information.
In this systematic review, we summarize popular approaches and provide practical guidelines for deciphering DEGs to uncover the molecular mechanisms underlying gene function. These approaches encompass gene annotation, gene set enrichment analysis, gene-associated networks [including gene regulatory, competing endogenous RNA (ceRNA), and PPI networks], module/pattern analysis (such as co-expression, co-regulation, and other modules), KGs, drug repurposing, clustering analysis, and more. We have compiled numerous related databases and tools on a website named DEGMiner (https://www.ciblab.net/DEGMiner/), which offers convenient access to over 300 databases and tools. To the best of our knowledge, this is the first comprehensive review on deeply mining DEG information across various groups or conditions. It serves as a valuable reference for researchers in the biological field, enabling deeper exploration of DEGs and facilitating the discovery of molecular mechanisms underlying complex diseases, phenotypes, and biological behaviors.
Analytical strategies for deciphering the biological significance of DEGs
Accurate functional annotation of genes
Identification of orthologous genes across species
Functional annotation and enrichment analysis are the pivotal initial steps in inferring gene functions. Existing bioinformatics tools for predicting gene function primarily cater to model organisms, such as Homo sapiens and Mus musculus, which poses significant challenges when analyzing gene function in non-model organisms. Even in medical research involving model organisms like Mus musculus, it is still crucial to identify homologous genes that play key roles in humans, especially in drug development. Therefore, accurately translating genes from model/non-model species into homologs (i.e., orthologous genes) of the common model species is a necessary step before utilizing existing sophisticated tools for gene function analysis. Current ortholog conversion tools fall into two categories: sequence-free and sequence-based methods. Sequence-free tools convert gene Entrez identifiers (IDs) or symbols directly between diverse species, with examples including AllEnricher [20], biomaRt [21], HomoloGene, OMA browser [22], gprofiler2 [23] and KEGG Orthology. While this direct conversion works well for model species, it can be awkward for non-model species where gene Entrez IDs or symbols are not readily available for downstream analysis. Complementary to sequence-free tools, sequence-based methods identify orthologous genes through DNA sequence comparison, effectively overcoming this limitation. Notable examples of such tools and databases include OMA browser, eggNOG [24], OrthoVenn2 [25], OrthoDB [26] and ORCAN [27].
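As a minimal sketch of the sequence-free strategy, the conversion amounts to looking up each gene symbol in an ortholog table and setting aside genes that have no entry. The table below is a tiny illustrative subset, the function name `convert_to_human` is hypothetical, and `Gm12345` is a made-up symbol standing in for a gene without a mapped ortholog; a real analysis would load the full mapping from one of the resources named above.

```python
# Toy mouse-to-human ortholog table (illustrative subset only; a real
# analysis would load the complete mapping from a curated resource).
MOUSE_TO_HUMAN = {
    "Trp53": ["TP53"],
    "Cdkn1a": ["CDKN1A"],
    "Il6": ["IL6"],
}


def convert_to_human(mouse_degs, table=MOUSE_TO_HUMAN):
    """Map mouse gene symbols to human orthologs.

    Returns (converted, unmapped): the human symbols in input order without
    duplicates, and the mouse symbols that had no entry in the table.
    """
    converted, unmapped = [], []
    for symbol in mouse_degs:
        orthologs = table.get(symbol)
        if not orthologs:
            unmapped.append(symbol)
            continue
        for human_symbol in orthologs:
            if human_symbol not in converted:
                converted.append(human_symbol)
    return converted, unmapped


# "Gm12345" is a hypothetical symbol with no ortholog in the toy table.
converted, unmapped = convert_to_human(["Trp53", "Il6", "Gm12345"])
print(converted)
print(unmapped)
```

Reporting the unmapped genes explicitly is a deliberate choice here: for non-model species this list can be long, and it is the set that a sequence-based method would then need to resolve.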
Gene ontology-based annotation
GO is a standardized framework designed to describe the functional attributes of gene products, comprising molecular function (MF), biological process (BP) and cellular component (CC). GO facilitates the integration of annotations from diverse databases, providing a unified approach to characterize gene function [28]. Numerous bioinformatics tools have been developed to annotate and visualize gene functions for DEG lists according to GO terms, and these tools primarily vary in their breadth, scope, and depth of annotation. For instance, WEGO is a tool tailored for GO annotation and visualization of large-scale genomic data across nine model organisms [29]. In contrast, g:Profiler broadens the scope of gene annotation by offering an extensive array of gene and protein functional annotation tools, including GO, pathways, and disease associations [30]. In addition, agriGO specializes in the annotation of plant gene ontology [31]. Instead of using gene lists, Blast2GO, one of the most widely used tools, assigns GO terms to sequences based on similarity search, providing advanced functional analysis in research on non-model species and demonstrating a powerful capability for annotating large high-throughput sequence sets derived from transcriptomics and metagenomics [32].
The standard semantic system provided by GO offers significant advantages in advancing the research of gene functions. However, it also presents some limitations. One major limitation is the potential for these standard descriptions to be overly broad and general, which may lead to a lack of detailed descriptions for unique gene functions within specific biological contexts. Consequently, when utilizing GO for gene function analysis, researchers need to integrate contextual and detailed information to fully comprehend the specific functions and mechanisms of genes. This approach helps avoid an overreliance on standardized descriptions that might overlook the distinct characteristics of individual genes.
Biological pathway annotation
Although GO annotation provides a lot of information about the basic functions of genes, its terms can be too broad and generic, which may obscure specific and nuanced functions of individual genes within particular biological contexts. To fill these gaps, signaling pathways provide a more detailed description of gene function by illustrating how biomolecules cooperate to carry out cellular tasks in various conditions [33].
A pathway is a collection of physically interacting molecules, primarily proteins and metabolites, organized in specific directions with defined upstream and downstream relationships. To fully understand the biological roles these molecules play in various life processes, it is essential to annotate pathways within their broader context, taking into account their interactions and functions from a holistic perspective. To date, at least 30 pathway-related databases have been developed, as shown in Fig. 1A. Over the past three decades, the pathway databases have evolved in four main directions: (1) Species-specific databases, such as FDBC for fungal-related pathways and Plant Reactome for plant-related pathways; (2) The databases emphasizing the function-specific pathways, exemplified by the Human Autophagy Database (HAMDB), which consolidates information on cell autophagy; (3) Interactive pathway visualization tools, such as Reactome and the Small Molecule Pathway Database (SMPDB); and (4) The comprehensive databases that also exhibit particular strengths in specific areas, such as KEGG with its diverse datasets (genomics, metabolic pathways, and diseases), PathBank with its unique collection of pathways (covering over 100,000 pathways), and WikiPathways characterized by its openness. Given these distinct characteristics, pathway annotation is particularly worthwhile for very small DEG sets whose genes are dispersed across different pathways.
Fig. 1.
Overview of gene functional annotation databases and enrichment analysis methodologies. A) Some representative pathway databases and gene functional annotation databases. It is noteworthy that 13 of the aforementioned databases are members of the InterPro Consortium, which include CATH-Gene3D, CDD, HAMAP, MobiDB, PANTHER, Pfam, PIRSF, PRINTS, PROSITE, SFLD, SMART, SUPERFAMILY, and NCBIfam. B) Three types of methods for enrichment analysis. The schematic diagram demonstrates that ORA employs a hypergeometric test to assess whether the number of query genes is significantly higher than expected by chance. FCS method ranks the gene set based on gene expression levels and tests if the hit genes map to the annotated gene set. TB method integrates scores that measure genes' connectivity within the expression level and their position within a network.
Other annotations of genes
Beyond the conventional GO and KEGG pathway annotations, there are many valuable annotations, such as chromosome cytobands or subcellular gene locations, which provide deeper insights into the intricate correlation between molecular mechanisms and phenotypic traits. Additionally, other types of annotations also include terms related to disease-associated genes, human/mouse phenotypes, hallmark genes, oncogenic signatures, immunologic signatures, cell type signatures, developmental stages, gene variants, literature references, and protein domains.
These diverse annotation strategies have been recognized by tool developers, who have integrated them into various platforms to enhance analytical power. As illustrated in Fig. 1A and detailed in Supplementary Material Table S1, comprehensive tools like Gene Set Enrichment Analysis (GSEA [34]) encompass the majority of annotation types mentioned previously, making them widely applicable across multiple research areas. Besides, some tools specialize in specific annotation types. For instance, the Disease Ontology [35], which focuses on disease-associated genes, has gained significant popularity for its relevance in disease research.
Enrichment analysis
Enrichment analysis offers a distinct approach to computationally identify the biologically significant patterns within a given set of features, such as the DEGs that are the focus of this review. Rather than mapping genes to their biological annotations directly, enrichment analysis compares the distribution of terms within a gene set of interest against a background distribution to statistically determine the likelihood of a nonrandom distribution [36]. Enrichment analysis can be carried out against various annotation items, such as gene ontology and biological pathways, making it a prominent computational method for integrating newly identified DEGs into existing biological knowledge [34].
Enrichment analysis tools are categorized into three main types according to their mathematical principles: over-representation analysis (ORA), functional class scoring (FCS), and topology-based (TB) methods (Fig. 1B). ORA is one of the simplest methods, identifying whether annotated gene sets are over-represented within a given gene set. Tools like DAVID [36] and ToppGene [37] typically use hypergeometric or Fisher's exact tests for ORA to assess the significance of gene set enrichment compared to random chance. However, ORA overlooks gene-gene relationships and relies heavily on thresholds, which can result in the loss of important information and reduced reliability and reproducibility. Additionally, the use of inappropriate background gene sets and outdated annotation databases can significantly affect ORA results [38]. To overcome these limitations, the FCS and TB methods were proposed, which reduce reliance on background gene sets. FCS, a threshold-free method, uses all gene expression values (including those with small changes in magnitude) to rank genes and then generates a gene set score that reflects the level of enrichment of genes within the set. GSEA, a representative example of FCS [34], ranks genes based on their differential expression levels and calculates an enrichment score for specific gene sets, indicating activation or inhibition based on their position in the ranked list. TB methods, such as Pathway-Express [40] and SPIA [41], integrate gene expression data with the topological structure of genes and calculate their position, connectivity, and overlap with other gene sets within the network and pathway. Subsequently, they evaluate the association of the gene set with specific biological processes or pathways by comparing these characteristics with those of other gene sets in the entire network [39]. However, the computational complexity and the requirement for precise relationships in the input data make these methods costly to compute and their results more complex to interpret.
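The hypergeometric test underlying ORA can be illustrated with a short sketch. It computes the probability of observing at least the given overlap between a query list and an annotated gene set when the query genes are drawn at random from the background. The function name `ora_p_value` and the toy numbers are assumptions made for illustration; they do not reproduce the exact implementation of any tool named above, and multiple-testing correction across gene sets is omitted.

```python
from math import comb


def ora_p_value(n_background, n_annotated, n_query, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_background: number of genes in the background (N)
    n_annotated:  background genes annotated to the term (K)
    n_query:      number of query genes, e.g. DEGs (n)
    n_overlap:    query genes annotated to the term (k)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n_annotated, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 20 background genes, 5 annotated to a pathway,
# 4 DEGs of which 3 fall in the pathway.
p = ora_p_value(n_background=20, n_annotated=5, n_query=4, n_overlap=3)
print(f"p = {p:.4f}")
```

The sketch makes the dependence on the background explicit: changing `n_background` alone changes the p-value, which is why an inappropriate background gene set can distort ORA results.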
Enrichment analysis is widely used in omics research to characterize the holistic function of a given gene set. Since omics datasets from proteomics, single-cell RNA sequencing (scRNA-seq), genome-wide association studies (GWAS) and epigenomics exhibit different statistical distributions compared to bulk gene expression data [38], the adaptability of enrichment methods to these varying distributions must be fully considered during data analysis. For example, in proteomics, where detection is biased towards highly expressed proteins and quantifying protein complexes is challenging, FCS-based tools can help mitigate variability and identify robustly enriched protein sets. Numerous enrichment tools have been developed based on these methods, as detailed in Supplementary Material Table S1. As yet, there is insufficient evidence to definitively recommend specific methods for each type of omics. Given the distinct characteristics of each method, using both ORA and FCS/TB methods together may provide more comprehensive and reliable results for identifying enriched pathways or functional categories. Further research and comparative studies are necessary to validate this hypothesis and determine optimal enrichment strategies across different omics datasets. An updated list of web-based tools and R/Python packages is provided in Table 1.
Table 1.
Some representative software for gene set enrichment analysis.
| Name | Statistical method | Resource | PMID |
|---|---|---|---|
| DAVID | Fisher's exact test (modified as EASE score) | https://david.ncifcrf.gov | 19131956 |
| GSEA | Kolmogorov-Smirnov-like test | https://www.gsea-msigdb.org | 16199517 |
| clusterProfiler | Hypergeometric test | Bioconductor package | 22455463 |
| Enrichr | Fisher’s exact test | https://maayanlab.cloud/Enrichr/ | 27141961 |
| PANTHER | Binomial test, Fisher's exact test | https://pantherdb.org | 12952881 |
| ClueGO | Hypergeometric test | Cytoscape plugin | 19237447 |
| ToppGene | Fisher’s inverse χ2 test, hypergeometric test | https://toppgene.cchmc.org | 19465376 |
| EnrichmentMap | Fisher’s exact test | Cytoscape plugin | 21085593 |
| Metascape | Hypergeometric test | https://metascape.org/ | 30944313 |
| g:Profiler | Hypergeometric test | https://biit.cs.ut.ee/gprofiler/ and CRAN package | 27098042 |
| GAGE | Meta-test | Bioconductor package | 19473525 |
| PAGE | Z-score | Python module | 15941488 |
| GeneCodis | Hypergeometric test, χ2 test | https://genecodis.genyo.es/ | 17204154 |
| iDEP | Student’s t-test | https://ge-lab.org/idep/ | 30567491 |
| hypeR | Hypergeometric test | https://github.com/montilab/hypeR | 31498385 |
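The Kolmogorov-Smirnov-like statistic listed for GSEA in Table 1 can be sketched as a running sum over a ranked gene list. The version below is the unweighted simplification: each gene in the set raises the sum by a fixed step, each gene outside it lowers the sum, and the enrichment score is the largest deviation from zero. GSEA itself additionally weights hits by their ranking metric and estimates significance by permutation, both of which are omitted here. The function name `enrichment_score` and the gene labels are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running-sum enrichment score.

    ranked_genes: gene identifiers ordered from most up-regulated to most
                  down-regulated (assumed unique).
    gene_set:     identifiers belonging to the annotated set.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
```

A positive score indicates that the set is concentrated among the top-ranked genes and a negative score that it is concentrated at the bottom, which is how the sign is read as activation or inhibition.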
Gene-associated network analyses
Genes are often functionally interdependent to maintain intercellular and intracellular homeostasis, forming a complex network of genetic interactions. It is reported that many complicated diseases are caused by perturbed gene networks rather than a single genetic abnormality [42]. Therefore, it is necessary to unravel the regulatory relationships among genes through network analyses, which generally require a holistic exploration and interpretation from different genomic perspectives. At the transcriptome level, genes are directly or indirectly regulated by transcription factors (TFs), microRNAs (miRNAs), and long non-coding RNAs (lncRNAs) [13], [43], forming a variety of gene-related networks (including gene regulation, TF regulation, miRNA regulation, and ceRNA networks). At the protein level, PPI reflects direct physical interactions, helping understand protein functions, identify drug targets, reveal disease mechanisms, and support systems biology research [44]. Collectively, gene-associated network analyses are of vital importance in predicting gene functions, identifying regulation modules related to diseases, and guiding drug design and screening. Table 2 and Supplementary Material Table S2 summarize current knowledge about gene-associated networks, relevant databases, and popular tools for network analysis and visualization.
Table 2.
The tools used for network analysis and visualization.
| Name | gene-gene | gene → gene | TF → gene | miRNA → gene | Year of Release | PMID |
|---|---|---|---|---|---|---|
| Cytoscape | ✓ | ✓ | ✓ | ✓ | 2003 | 14597658 |
| GeneMANIA | ✓ | ✗ | ✓ | ✓ | 2018 | 29912392 |
| ConsensusPathDB | ✓ | ✓ | ✗ | ✗ | 2016 | 27606777 |
| iRegulon | ✓ | ✓ | ✓ | ✓ | 2014 | 25058159 |
| NetworkAnalyst | ✓ | ✓ | ✓ | ✓ | 2014 | 24861621 |
| NAViGaTOR | ✓ | ✓ | ✗ | ✗ | 2009 | 19837718 |
| BioProfiling.de | ✓ | ✗ | ✗ | ✓ | 2011 | 21609949 |
| FunCoup | ✓ | ✗ | ✓ | ✓ | 2009 | 19246318 |
| miRTargetLink | ✗ | ✗ | ✗ | ✓ | 2016 | 27089332 |
| SNOW | ✓ | ✗ | ✗ | ✗ | 2009 | 19454602 |
| FFLtool | ✗ | ✗ | ✓ | ✓ | 2020 | 31830251 |
Gene regulatory network analysis
A gene regulatory network (GRN) is one of the fundamental structures describing gene expression patterns, consisting of both direct and indirect regulatory interactions [45], [46], [47]. Indirect regulatory interactions occur when a gene (e.g., gene A) regulates the expression of gene B, which in turn regulates the expression of gene C (Fig. 2A). In this scenario, genes A and B are considered to have a direct regulatory interaction, while genes A and C exhibit an indirect regulatory interaction. GRNs have emerged as powerful tools for capturing gene-gene interactions that govern mRNA and protein expression levels, thus providing deep insights into the complex landscape of transcriptional regulation. Constructing a GRN involves leveraging existing knowledge and customizing it to a specific biological context. The information for building GRNs is typically collected manually from empirical research studies and curated in dedicated knowledge bases. These databases for constructing GRNs fall into three categories: interactome databases (e.g., IntAct [48], SIGNOR [49], BIND [50], PID [51] and Spike [52]), specific function-related databases (e.g., InnateDB [53]) and comprehensive open databases (e.g., NDEx [54]).
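The distinction between direct and indirect regulation can be made concrete by treating a GRN as a directed graph. In the sketch below, an edge from a regulator to a target is a direct interaction, and any target reachable only through intermediate genes is an indirect one. The adjacency-list representation and the function name `regulation_type` are assumptions made for illustration and do not correspond to the data model of any database listed above.

```python
from collections import deque


def regulation_type(grn, regulator, target):
    """Classify how `regulator` relates to `target` in a directed GRN.

    grn maps each gene to the genes it directly regulates.
    Returns "direct", "indirect", or "none".
    """
    if target in grn.get(regulator, ()):
        return "direct"

    # Breadth-first search over downstream genes; any hit found here is
    # separated from the regulator by at least one intermediate gene.
    seen = {regulator}
    queue = deque(grn.get(regulator, ()))
    while queue:
        gene = queue.popleft()
        if gene in seen:
            continue
        seen.add(gene)
        if gene == target:
            return "indirect"
        queue.extend(grn.get(gene, ()))
    return "none"


# The A -> B -> C cascade described in the text.
grn = {"A": ["B"], "B": ["C"], "C": []}
print(regulation_type(grn, "A", "B"))
print(regulation_type(grn, "A", "C"))
print(regulation_type(grn, "C", "A"))
```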
Fig. 2.
Gene-associated networks and modules schematic. A) Gene regulatory network. B-C) TF-gene and miRNA-gene interaction networks. The green triangle, circle with two colors and blue cylinder stand for the miRNA, TF and mRNA, respectively. D) The ceRNA network. The triangle, circle and rhombus stand for the miRNA, mRNA and lncRNA, respectively. Gene B interacts with miRNA through miRNA response element (MRE) binding sites, leading to the inhibition of its expression by miRNA binding. miRNA can also be regulated by interacting with Gene C (lncRNA), pseudogenes or Gene A, by MRE site interaction with their seed sequence. ceRNA crosstalk is influenced by the expression levels of all RNA molecules participating in the network, behaving as “competitors” for the same miRNA cluster. E) Protein-protein interaction network and the related functional modules. In PPI network, the nodes indicate the proteins, with the size of the node (degree) indicating the number of links to a given node. Different colors represent the different functions of proteins. F) Co-expression modules. G) Co-regulation modules.
TF-regulation network
The use of higher-order regulatory patterns is also particularly important for gene regulation. The regulators, known as transcription factors (TFs), are proteins that bind to either enhancer or promoter regions of DNA adjacent to the target genes and determine the on/off state of genes. The interactions between TFs and their target genes form a complex TF-regulation network in cells (Fig. 2B), which may be closely associated with many pathological processes, including cellular malfunction and disease pathways [55]. For instance, Rauch et al. identified a diverse transcriptional network comprising pro-osteogenic and antiadipogenic TFs by analyzing the interactions between DEGs and TFs, which sheds new light on disease therapy [56]. TF-gene networks provide valuable knowledge about disease mechanisms and the impact of critical genes on diseases, thereby offering potential clues for clinical diagnosis.
Several databases currently compile detailed information on TF-regulation networks, each with its own emphasis and objectives. AnimalTFDB [57], a repository of animal TF information, supports research on animals. In contrast, investigations into TF regulatory networks in plants or fungi can utilize JASPAR [58], which predicts TF-gene binding sites, and ENCODE [59], which furnishes extensive gene regulatory information. In the field of plant research, PlantCARE is a useful web tool capable of predicting the regulatory relationships between TFs and cis-acting elements from a set of co-regulated genes [60]. For specific diseases, databases such as ChEA [61], focusing on TF-gene relationships in disease progression, are more suitable. Furthermore, comprehensive GRN maps provided by databases like RegNetwork [62] facilitate the comprehension of the intricacies and dynamics inherent in GRNs.
miRNA-regulation network
MicroRNAs (miRNAs) have emerged as another class of regulators that strongly affect the expression levels of genes [63]. The miRNAs bind to the 3′ untranslated region or other regions of target genes, leading to the silencing or degradation of the target genes. These silencing interactions are of great importance in many biological processes, including cell development, differentiation, and homeostasis [64], [65]. Similar to TF-regulation networks, miRNA-regulation networks are created in a combinatorial manner (Fig. 2C): one miRNA may regulate one or multiple target genes, and an individual gene may be regulated by multiple miRNAs [66]. Owing to the significant role of miRNAs in the intricate regulatory network of genes, there has been a surge in publications exploring their novel functions. For instance, Ghini et al. discovered a sophisticated regulatory layer involving specific endogenous targets that mediate miRNA gene expression in mammalian cells [67].
Indeed, the intervention or regulation mechanisms of miRNAs and genes in pathological processes are highly complex. These intricate miRNA regulatory details are documented in various databases. Researchers can select the appropriate database according to their needs to access relevant miRNA-gene regulatory relationships. Databases of experimentally validated miRNA-gene interactions, such as miRTarBase [68] and TarBase [69], are a trustworthy choice. Conversely, databases that predict miRNA-gene target sites based on classical or machine learning algorithms, including TargetScan [70], miRDB [71] and TargetMiner [72], offer insights into unknown miRNA-regulation networks.
ceRNA network
Some non-coding RNAs, such as lncRNAs and circular RNAs (circRNAs), regulate gene expression by competitively binding to proteins or miRNAs, resulting in activation or inhibition [73]. This type of interaction, where molecules compete for binding to shared miRNAs, is referred to as a competing endogenous RNA (ceRNA) network. By sequestering miRNAs, ceRNAs can indirectly regulate the expression of target genes that are also regulated by those miRNAs. At the molecular level, a complex interplay among different RNA transcripts influences gene regulation in various biological processes [74]. Compared to miRNA-gene networks, ceRNA networks are more intricate, involving a larger number of RNA molecules. As shown in Fig. 2D, the nodes within the ceRNA network encompass various types of RNA, including protein-coding mRNAs and non-coding RNAs such as miRNAs, lncRNAs, pseudogenic RNAs and circRNAs [75].
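Because ceRNA crosstalk rests on competition for shared miRNAs, a first step toward a candidate ceRNA network can be sketched as an intersection of miRNA sets. The code below pairs each lncRNA with each mRNA and keeps the pairs that share at least a minimum number of miRNAs. All identifiers (`lncX`, `geneB`, `miR-1`, and so on) and the function name `candidate_cerna_pairs` are hypothetical placeholders; a realistic workflow would take the miRNA-target relationships from curated or predicted interaction data and would typically add further filters, such as a statistical test on the overlap and expression correlation between the paired transcripts.

```python
from itertools import product


def candidate_cerna_pairs(lncrna_mirnas, mrna_mirnas, min_shared=1):
    """List lncRNA-mRNA pairs that share at least `min_shared` miRNAs.

    Both arguments map a transcript name to the set of miRNAs that bind it.
    Returns a list of (lncRNA, mRNA, sorted shared miRNAs) tuples.
    """
    pairs = []
    for (lnc, lnc_mirs), (mrna, mrna_mirs) in product(
        lncrna_mirnas.items(), mrna_mirnas.items()
    ):
        shared = set(lnc_mirs) & set(mrna_mirs)
        if len(shared) >= min_shared:
            pairs.append((lnc, mrna, sorted(shared)))
    return pairs


# Hypothetical binding data for illustration only.
lncrna_mirnas = {"lncX": {"miR-1", "miR-2"}, "lncY": {"miR-3"}}
mrna_mirnas = {"geneB": {"miR-1", "miR-2", "miR-4"}, "geneD": {"miR-5"}}

for lnc, mrna, shared in candidate_cerna_pairs(lncrna_mirnas, mrna_mirnas):
    print(lnc, mrna, shared)
```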
The ceRNA interaction network plays a crucial role in elucidating gene functions and regulatory mechanisms, and several identified ceRNA networks have been implicated in human diseases. For example, Sumazin et al. investigated the ceRNA activity of protein-coding mRNAs in a study on glioblastoma and identified an extensive network of sponge interactions mediating crosstalk between different regulatory pathways [76]. As the potential of the ceRNA network continues to be explored, the number of ceRNA interaction databases has been increasing. Prominent examples include LncACTdb [77] and DIANA-LncBase [78], which are centered around lncRNAs; starBase [79], miRSponge [80] and miRcode [81], which concentrate on miRNAs; and ceRDB [82] and lnCeDB [83], which specialize in the construction and analysis of ceRNA networks.
Protein-protein interaction (PPI) network
Protein-protein interactions form the basis of signaling pathways and bionetworks involved in various physiological processes [84], making PPI network construction a valuable approach for understanding cellular functions [85] as well as disease mechanisms [86], aiding drug design and repurposing [87], and deciphering subcellular gene interactions [88], [89] (as shown in Fig. 2E). To date, many important discoveries have been made through the use of PPI networks. For example, Fernández-Torras et al. demonstrated that gene modules within a PPI network can predict drug response, while Escala-Garcia et al. identified mediators of germline-driven variation in breast cancer prognosis through PPI network analysis [90], [91].
The databases containing PPI information serve as a cornerstone when building corresponding networks. The most prominent database in this regard is STRING, which specializes in constructing PPI networks based on the collected and rigorously assessed information of protein–protein interactions [49]. Another one is the IMEx consortium database, which aggregates protein interaction data from various high-throughput techniques and low-throughput experiments [92]. In addition, databases such as HuRI [93] and HPRD [94] focus on human interactions, while BioGRID [95] and InWeb [96] integrate data from multiple species. Beyond these typical PPI databases, state-of-the-art artificial intelligence techniques, such as AlphaFold [97], Robetta (where RoseTTAFold is deployed) [98], DeepPPI [99], DeepConv-DTI [100] and DPPI [101], have contributed to PPI networks by predicting the complex interactions between proteins. Among these, AlphaFold is the most popular, achieving near-atomic accuracy in protein structure prediction and providing promising opportunities to uncover functional insights into the mechanisms of biological processes.
Module analysis
Molecular networks are known to exhibit a high degree of modularity, with individual modules often consisting of genes (or proteins) that participate in the same biological functions. Modules can enhance functional genome annotation through the principle of guilt-by-association and contribute to a better understanding of disease pathogenesis and progression. As a result, module identification is often a critical step in extracting biological insights from network data. To date, a wide range of methods has been designed to recognize the communities (i.e., modules) in bionetworks, including random walk-based methods, modularity optimization methods, local methods, kernel clustering methods, ensemble clustering methods, and hybrid methods [102], [103]. In terms of functionality, computationally predicted modules can be divided into three types: basic, co-expression and co-regulation modules. Basic modules represent a fundamental characteristic of many biological networks and are determined through high-density or correlation clustering. These modules share similar functions but not necessarily common expression or regulation patterns (Fig. 2E-G). In contrast, co-expression modules consist of genes with similar expression patterns that collectively serve a specific function, while co-regulation modules consist of genes that are regulated together by a shared regulatory program, influencing their behavior or participating in common biological processes.
Basic module analysis
An important property of a module is its ability to function independently of other modules, with its members having more connections to each other than to members of other modules, as reflected in the network topology [104]. This modular independence allows the application of various algorithms for analyzing the modules within networks (Fig. 2E). For module identification, two prevalent approaches are Infomap [105] and MCODE [106]. Of these, Infomap is typically suitable for large-scale networks, exhibiting good robustness in effectively partitioning networks and identifying modules. However, it can be sensitive to parameters and entail longer computation time. Conversely, the MCODE algorithm focuses on discovering locally dense subgraphs, making it effective for identifying functionally related substructures. Nevertheless, it may overlook global network structures and face higher computational complexity when dealing with large networks. Additionally, Louvain community detection [107], based on modularity optimization, is renowned for efficiently identifying community structures in large networks. However, its performance can be influenced by the initial node selection, and it may not perform well in networks with overlapping communities. Collectively, the selection of algorithms should be based on careful consideration of the specific application scenarios and objectives, weighing their respective advantages and disadvantages.
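To make the notion of modularity concrete, the following minimal Python sketch computes Newman's modularity Q for two candidate partitions of a toy network. The network, node names and partitions are hypothetical and serve only as an illustration; the code scores a given partition and does not implement Louvain, Infomap or MCODE themselves.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Map every node to the index of its community
    label = {node: i for i, members in enumerate(communities) for node in members}
    # Count the edges that fall inside each community
    internal = defaultdict(int)
    for u, v in edges:
        if label[u] == label[v]:
            internal[label[u]] += 1
    # Sum the node degrees per community
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[label[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Hypothetical toy network: two triangles joined by a single bridging edge
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
good = [{"A", "B", "C"}, {"D", "E", "F"}]   # follows the two triangles
bad = [{"A", "B", "D"}, {"C", "E", "F"}]    # cuts across the triangles

q_good = modularity(edges, good)
q_bad = modularity(edges, bad)
print(f"Q(good) = {q_good:.3f}, Q(bad) = {q_bad:.3f}")
```

A partition that follows the dense triangles receives a higher Q than one that cuts across them, which is the quantity that modularity optimization methods seek to maximize.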
Gene co-expression analysis
In contrast to algorithm-based module identification, co-expression refers to the phenomenon in which a set of genes share the same or relatively similar expression pattern concurrently (Fig. 2F). This phenomenon plays a pivotal role in shaping biological phenotypes, as the co-expression of genes is governed by intricate and integrated regulatory pathways spanning across multiple molecular levels [108]. In the pursuit of unraveling comprehensive co-expression networks from complex molecular alterations, researchers employ various methodologies. Among these, approaches that construct co-expression networks based on weighted co-expression and clustering are prevalent. For instance, methods such as WGCNA [109], GeCON [110] and Petal [111] model the co-expression relationships between genes as weighted networks, where the weights represent the expression correlation between genes. In contrast to WGCNA's intricate thresholding, Petal identifies a co-expression network using an automatically defined threshold to indicate similar expression patterns. Additionally, CEMiTool facilitates automatic parameter selection and function enrichment analysis of modules, with more optimized parameter configurations than WGCNA and Petal [112]. Using these methodologies, researchers can systematically analyze gene co-expression networks, identify potential gene modules, and delve into their roles in biological processes, thereby providing vital clues for uncovering novel gene functionalities.
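The idea of a weighted co-expression network can be sketched in a few lines of Python. The example below raises the absolute Pearson correlation between gene pairs to a power β, which is the soft-thresholding form of adjacency used in weighted co-expression analysis. The gene names, expression values and the choice β = 6 are illustrative assumptions, and the code is not a substitute for the full WGCNA workflow (e.g., topological overlap and module detection are omitted).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Edge weight |cor(g, h)|**beta for every unordered gene pair."""
    genes = list(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


# Hypothetical expression profiles across five samples
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],   # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to geneA
}
adj = soft_threshold_adjacency(expr)
for pair, weight in adj.items():
    print(pair, round(weight, 4))
```

Raising the correlation to a power keeps strong relationships close to 1 while pushing weak ones towards 0, so no hard cut-off is needed to decide which edges exist.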
Gene co-regulated analysis
The genes that are simultaneously controlled or influenced by common regulators (e.g., transcription factors, enhancers or repressors) are known as co-regulated genes, and they often occupy close positions on chromosomes, displaying associations within the same expression module or signaling pathway (Fig. 2G). Co-regulated genes share similar expression patterns due to shared regulators, which distinguishes them from co-expressed genes, which achieve similar expression patterns through different mechanisms. It is important to note that both co-regulation and independent co-expression may pass statistical tests designed to measure the similarity of expression patterns and/or the tightness of gene clusters relative to other clusters. Therefore, tools that focus solely on detecting clusters of similarly expressed genes may not effectively discriminate co-regulation from co-expression. For the construction of co-regulated modules, NetworkAnalyst and CoMoFinder [113] can be utilized. NetworkAnalyst is an online platform that supports integrative analysis of gene expression data through statistical, visual and regulated network approaches. CoMoFinder strives to discover reliable composite network motifs in co-regulatory networks consisting of miRNAs, TFs and genes.
Whenever possible, the use of diverse networks is recommended when conducting gene co-regulation analysis, as they contain complementary types of modules [103]. After community detection, over-representation analysis can be applied to unveil the functions of individual gene modules. Additionally, the association between module activity and observed phenotypes can be further investigated and clarified using GSEA.
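As an illustration of over-representation analysis applied to a detected module, the sketch below computes a one-sided hypergeometric p-value for the overlap between a module and an annotated gene set. All counts are hypothetical, and in practice the resulting p-values would additionally be corrected for multiple testing across gene sets.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes in a module
    of n genes drawn from N background genes, K of which are annotated."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: a 20-gene module shares 8 genes with a 100-gene
# pathway, against a background of 10,000 genes
p_value = hypergeom_pvalue(k=8, n=20, K=100, N=10_000)
print(f"ORA p-value: {p_value:.3e}")
```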
Knowledge Graph/text mining
Knowledge Graph (KG) has been proposed to discover new associations, patterns and knowledge by performing tasks such as information extraction, attribute definition, and creation of classification summaries against known data and relationships (Fig. 3A) [114]. KGs enhance the comprehensiveness and scientific insight of existing biological network data, enabling the prediction of novel associations between biological factors (such as gene signatures) and phenotypes [115]. This is particularly useful in domains like drug repurposing [116] and tumor research [115]. Currently, there are many popular tools for building KGs, and they are summarized in Supplementary Material Table S3.
Fig. 3.
The analysis and visualization of KGs and potential drug discovery. A) The networks about genes extracted from literature. B) The nodes in graph represent data entities, and the edges represent the relationships between them. The network view depicts original nodes as enriched terms, with node size reflecting the weight of each term. Red nodes indicate up-regulation, green nodes indicate down-regulation, and numbers represent fold changes of DEGs. The size of the letters represents the significance of the P value. C) Gene signature-driven potential drug discovery. Initially, a query signature is prepared by compiling upregulated and downregulated genes associated with a disease state. This signature is then compared to a database of gene expression signatures from known perturbations or disease phenotypes. Compounds that exhibit a similar expression pattern to the disease state (inducing red and suppressing blue) are considered potential side-effect compounds. Conversely, compounds capable of reversing the disease expression pattern (suppressing red and inducing blue) are identified as candidate drugs.
Information extraction
With the rapid growth of scientific literature, manually locating and extracting relevant information is becoming increasingly challenging. Automated text extraction systems have emerged as a more efficient and comprehensive solution. In biomedicine, text mining often focuses on entity–entity interactions/relationships, such as drug-drug interactions (DDIs), PPIs/GRNs, protein-residue associations, or biological processes (such as phosphorylation). Notable software for text mining biomedical literature includes PubTator Central [117] and BEST [118]. PubTator Central specializes in automated annotation and tagging of texts from the PubMed database, while BEST helps users quickly locate information about specific entities within biomedical literature from multiple data sources.
Information assertions
The proliferation of biological data has created significant challenges in integrating and connecting related information from disparate sources. KG and text mining techniques can extract functional relationships and infer new relationships from massive amounts of literature. Additional information in KG can be automatically inferred through graph algorithms and logical reasoning. Knowledge reasoning methods are divided into three main categories: rule-based reasoning, distributed representation-based reasoning, and neural network-based reasoning [119]. Comprehensive application tools such as CROssBAR [120] and BioGraph [121] combine multiple technologies to extract, analyze, and visualize information from large amounts of literature. Another comprehensive tool, Phenolyzer [122], focuses on predicting potential genetic diseases and gene mutations based on genetic variation information and clinical phenotype data.
Information visualization
The word cloud is a popular information visualization method for quickly displaying terms with different frequencies (Fig. 3B), and has several applications in biomedicine [123], [124], [125], including the visualization of GO terms, visualization of pathway analysis results, analysis of literature and text mining, and clustering and annotation visualization. For instance, directly visualizing enrichment terms by their weights helps highlight the main biological processes associated with a group of genes, while filtering out redundant information. Similarly, key gene characteristics, including up- and down-regulation, fold changes between experimental and control groups, and P values, can be displayed using word clouds. Tools like WordCloud [126] and Gephi (https://gephi.org/) are widely used for generating word clouds and creating complex network graphs with various layout and customization options.
Prediction of potential drugs
With the rise of high-throughput experimental techniques and the accumulation of omics data, transcriptome-based methods have become highly promising for drug repurposing [127], [128]. The fundamental idea of drug repurposing is that a specific drug induces unique gene expression signatures in cells, and comparing these gene expression signatures can establish connections with drug- or disease-induced phenotypes (Fig. 3C), thereby uncovering novel indications for existing drugs [129], [130].
Multiple large-scale databases have been developed based on this principle. One of the most famous is Connectivity Map (CMap), which consists of 6,100 gene expression profiles generated by treating five different cell lines with 1,309 compounds at varying doses [131]. Since its inception, CMap has been instrumental in drug repurposing for cancers [132], neurological diseases [133], cardiovascular diseases [90], and other conditions. CMap has since evolved into CMap2, also known as the LINCS-L1000 program [134], which encompasses 591,697 profiles derived from 29,668 compounds and genetic modifications (referred to as “perturbagens”) across 98 diverse cell lines. The substantial expansion in scale and breadth of CMap2 offers promising opportunities for enhanced pharmacogenomics investigations [133], [135].
In addition to the CMap series of databases, the Drug Signatures Database (DSigDB) is a widely utilized repository for drug gene-expression signatures [136]. DSigDB contains over 22,000 gene sets from different drugs, which can be used for drug-repurposing analysis using approaches such as signature similarity-based methods. Furthermore, PharmGKB is the most comprehensive pharmacogenomics knowledgebase, collecting extensive genotype and phenotype information linked to the pharmacogenome [137]. It contains data on 100 clinical dosing guidelines, 498 drug labels, 3,753 clinical annotations, 130 pathways, 65 pharmacogenes, and over 20,000 genetic variations.
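The signature-comparison principle described above can be illustrated with a deliberately simplified score. The sketch below counts how many disease genes a compound moves in the opposite direction minus how many it moves in the same direction, normalised by signature size. The gene and compound names are hypothetical, and this overlap-based score is an assumption made for illustration; it is not the rank-based connectivity score used by CMap or LINCS-L1000.

```python
def reversal_score(disease_up, disease_down, drug_up, drug_down):
    """Score in [-1, 1]: positive values mean the compound tends to reverse
    the disease signature, negative values mean it tends to mimic it."""
    disease_up, disease_down = set(disease_up), set(disease_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    reversed_n = len(disease_up & drug_down) + len(disease_down & drug_up)
    mimicked_n = len(disease_up & drug_up) + len(disease_down & drug_down)
    total = len(disease_up) + len(disease_down)
    return (reversed_n - mimicked_n) / total if total else 0.0


# Hypothetical disease signature (up- and down-regulated DEGs)
disease_up = {"G1", "G2", "G3"}
disease_down = {"G4", "G5"}

# Hypothetical compound signatures: (genes induced, genes suppressed)
compounds = {
    "drugX": ({"G4", "G5"}, {"G1", "G2"}),
    "drugY": ({"G1", "G2", "G3"}, {"G4"}),
}

scores = {name: reversal_score(disease_up, disease_down, up, down)
          for name, (up, down) in compounds.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Compounds at the top of the ranking reverse the disease pattern and would be treated as repurposing candidates, whereas those at the bottom mimic it.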
Bioactive compounds from natural products in traditional Chinese medicine (TCM) represent a diverse and valuable source of potential drugs. To effectively identify candidate small molecules and active compounds that interact with the therapeutic targets and treat disease, a general strategy has been established, supported by various versatile and useful database resources. The process of predicting potential drug candidates typically involves integrating network pharmacology with multi-omics data, such as transcriptomics, proteomics, and metabolomics. The discovery of potentially effective components and active compounds from herbal medicine can be divided into four main steps. The initial step is to decipher the chemical components within the TCM using text mining, database searching, and metabolomics technologies such as liquid chromatography/mass spectrometry (LC/MS), gas chromatography/mass spectrometry (GC/MS) and nuclear magnetic resonance (NMR) spectroscopy. Next, these components are screened against databases such as the Traditional Chinese Medicine Systems Pharmacology Database (TCMSP) [138], based on criteria such as oral bioavailability, drug half-life and drug-likeness. In the third step, potential therapeutic targets for the disease or the chemical components are gathered or predicted through online databases. These include the Therapeutic Target Database (TTD) [139], TCMSP [138], SwissTargetPrediction [140], GeneCards [141], DrugBank [142] and PharmMapper [143], all of which provide target information related to diseases and chemical compounds. Once potential treatment target genes are obtained, a PPI network can be constructed, providing insights into interactions between chemical molecules and the targets.
Finally, active compounds can be evaluated according to the compound-target interaction network using criteria such as the contribution index [144], ingredient efficacy scores [145], enrichment scoring algorithm based on a binomial statistical model [146] and maximal clique centrality algorithm [147].
In this process, transcriptomics data can also contribute to the discovery of potentially active small molecules and compounds with medicinal value. By revealing differential gene expression between control and diseased or treated samples, the statistically significant DEGs identified from transcriptomic analyses, particularly genes that are upregulated or uniquely expressed in patient tissues, may become potential therapeutic targets in future clinical trials [148], [149]. Apart from those predicted by algorithms and retrieved from databases, these potential target genes can also be used to construct compound-target networks and perform enrichment analysis. This helps prioritize hub genes, uncover functional categories, and provide new research perspectives. In summary, by integrating DEGs with network pharmacology, researchers can gain deeper insights into the biological processes affected by diseases or treatments and identify both therapeutic targets and potential drug candidates.
A summary of computational drug prediction tools is listed in Table 3, with additional tools provided in Supplementary Material Table S4.
Table 3.
A summary of computational drug prediction tools.
| Name | Linking to DB | Resource | Year of release | PMID |
|---|---|---|---|---|
| Clue | CMap, LINCS-L1000 | https://clue.io/command | 2017 | 29195078 |
| Integrity | Integrity | https://integrity.clarivate.com/ | 2013 | 23593264 |
| CREEDS | LINCS-L1000 | https://amp.pharm.mssm.edu/CREEDS/ | 2016 | 27667448 |
| L1000CDS2 | LINCS-L1000 | https://maayanlab.cloud/L1000CDS2/ | 2016 | 28413689 |
| DvD | CMap, DrugBank, MeSH | Bioconductor package and Cytoscape plugin | 2013 | 23129297 |
| DeSigN | GDSC | http://design.cancerresearch.my/ | 2017 | 28198666 |
| cogena | CMap, LINCS-L1000, CTD | Bioconductor package | 2016 | 27234029 |
| ksRepo | CTD | https://github.com/adam-sam-brown/ksRepo | 2016 | 26860211 |
| gene2drug | CMap | https://gene2drug.tigem.it/ | 2018 | 29236977 |
| GeneExpressionSignature | CMap | Bioconductor package | 2013 | 23374109 |
| PDOD | CTD, DrugBank, MeSH | http://gto.kaist.ac.kr/pdod/index.php/main | 2016 | 26818006 |
| DTX | KEGG DRUG, DrugBank, NDB Open Data, PMDA JADER, Database relations | https://harrier.nagahama-i-bio.ac.jp/dtx/ | 2021 | 38097606 |
| Phosprof | Reactome, PDB | https://phosprof.medals.jp/ | 2022 | 35994309 |
Condition-specific gene expression analysis
Inferring spatial and temporal-specific gene expression patterns or markers
Although different tissues or developmental stages in organisms may share some common biological processes, their gene expression patterns vary significantly. This variation suggests that different regulatory mechanisms control spatial and temporal specificity [150]. Understanding the specific expression and regulation of genes in these contexts is essential for exploring genetic relationships, the etiology of tissues and developmental stages, and discovering new therapeutic targets (Fig. 4A). For example, SIRT1 regulates glucose and fatty acid metabolism in the liver but inhibits fat mobilization in adipose tissues during fasting [151]. Additionally, SIRT1 expression is high at certain stages of mouse embryonic development but declines with further organogenesis [152]. Such variations underscore the distinct regulatory mechanisms that shape gene expression patterns across different spatial and temporal domains.
Fig. 4.
Overview of condition-specific analysis based on gene signatures. Gene signature with the ability to serve as the marker with A) spatial or temporal specificity, and usage for conducting B) cell deconvolution in spatial and bulk data, C) inferring trajectory, and D) uncovering cell–cell communication.
In research practice, several large-scale bio-projects and initiatives (e.g., GTEx [153], Expression Atlas [154], Human Proteome Map [155], and RNA-Seq Atlas [156]) provide valuable data on gene expression levels and patterns across various tissues and developmental time points (Additional software and tools are listed in Table 4). All these resources enhance our understanding of the spatial and temporal dynamics of gene expression within organisms.
Table 4.
Several query tools for tissue and developmental stage-specific gene expression.
| Name | Statistical method | Resource | Year of release | PMID |
|---|---|---|---|---|
| GEPIA | TPM cut-off | http://gepia.cancer-pku.cn/ | 2017 | 28407145 |
| HumanBase | Bayesian integration | https://hb.flatironinstitute.org | 2015 | 25915600 |
| Expression Atlas | FPKM cut-off | https://www.ebi.ac.uk/gxa | 2010 | 19906730 |
| ToppCluster | Hypergeometric test | https://toppcluster.cchmc.org | 2010 | 20484371 |
| TISSUES | confidence score | http://tissues.jensenlab.org and Cytoscape plugin | 2015 | 26157623 |
| TissueEnrich | Hypergeometric test | https://tissueenrich.gdcb.iastate.edu/ | 2019 | 30346488 |
| ORGANizer | Hypergeometric test | geneorganizer.huji.ac.il | 2017 | 28444223 |
| deTS | Fisher’s exact test, t-test | CRAN package | 2019 | 30824912 |
| TS-GOEA | Hypergeometric test | https://bioinformaticshome.com/tools/rna-seq/descriptions/TS-GOEA.html | 2019 | 31760951 |
| TEnGExA | FPKM cut-off | http://webtom.cabgrid.res.in/tissue_enrich/ and github package | 2021 | 32960209 |
| Dynamic-BM | FPKM cut-off | http://bioinfo.ibp.ac.cn/Dynamic-BM/ | 2018 | 28575155 |
| ADEIP | Two tailed Mann Whitney U test | http://gb.whu.edu.cn/ADEIP/ | 2021 | 34254996 |
| IID | Expression cut-off | https://ophid.utoronto.ca/iid | 2016 | 26516188 |
| GENT | ANOVA or t-test | http://gent2.appex.kr/gent2/ | 2019 | 31296229 |
| diseaseQUEST | Wilcoxon rank sum test | https://github.com/FunctionLab/diseasequest-docker/ | 2018 | 30346941 |
| WebCSEA | Permutation-based cell-type specificity test | https://bioinfo.uth.edu/webcsea/ | 2022 | 35610053 |
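Several of the tools in Table 4 rely on expression cut-offs. A minimal sketch of such a rule is given below: a gene is called tissue-enriched when its expression in the top tissue exceeds a minimum level and is at least a given fold higher than in every other tissue. The thresholds (1 TPM, five-fold), gene names and values are illustrative assumptions and do not reproduce the exact criteria of any listed tool.

```python
def tissue_enriched(expr_by_tissue, min_tpm=1.0, fold=5.0):
    """Return {gene: tissue} for genes passing a simple cut-off rule."""
    result = {}
    for gene, profile in expr_by_tissue.items():
        top_tissue = max(profile, key=profile.get)
        top = profile[top_tissue]
        others = [v for t, v in profile.items() if t != top_tissue]
        # Expressed above background AND at least `fold` times all other tissues
        if top >= min_tpm and all(top >= fold * v for v in others):
            result[gene] = top_tissue
    return result


# Hypothetical TPM values for three genes in three tissues
expr = {
    "GENE_A": {"liver": 120.0, "heart": 3.0, "brain": 2.0},   # liver-enriched
    "GENE_B": {"liver": 40.0, "heart": 35.0, "brain": 38.0},  # broadly expressed
    "GENE_C": {"liver": 0.2, "heart": 0.01, "brain": 0.0},    # below background
}
markers = tissue_enriched(expr)
print(markers)
```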
Discovering tissue-specific gene markers (or cell markers)
When analyzing bulk RNA-seq data, DEGs that are upregulated in specific tissues are typically defined as tissue-specific gene markers. After further thorough experimental validation, these markers can be used for tissue labeling, targeted treatment of disease, and organ development studies. Similarly, in scRNA-seq, numerous cell type-specific DEGs (particularly those that are upregulated) have been identified and recommended as cell markers through experimental studies and scRNA-seq analysis, aiding in cell annotation [157], [158]. The identification of cell markers has been carefully performed and validated in many publications, and relevant databases have been gradually developed, e.g., CellMarker [159] and PlantCellMarker [160]. These cell (or tissue) marker genes, discovered through additional analysis of DEGs, have found wide applications in various fields, including cell and tissue identification, analysis of complex tissue microenvironment, pseudotime analysis of cells, RNA stability analysis, and prediction of cell communication. In this review, we focus on the significance of cell type-specific markers derived from DEGs and their potential biological implications.
(1) Estimation of cell type composition in complex tissue
As shown in Fig. 4B, cell markers can be used for estimating cell type compositions and proportions within complex tissues [161]. There are two primary types of technologies that utilize gene signatures along with expression profile data to deduce the cellular composition of mixed samples: enrichment-based methods and deconvolution algorithms [162]. The former approach typically requires assessing the enrichment scores of individual cell types, where cell type-specific genes are highly expressed in the sample of interest and expressed at lower levels in other samples [163]. Prominent examples of enrichment methods include MCPcounter [164], xCell [162] and ImmuCellAI [165]. However, it is important to note that incomplete or inaccurate gene lists can lead to incorrect estimates of cell type enrichment. Moreover, genes specific to certain cell types may also be expressed in other cell types, introducing errors in enrichment analysis. By mathematically modeling mixed data, deconvolution algorithms can estimate the contribution of each cell type in mixed samples, which helps overcome the challenges in estimating cell type enrichment. Deconvolution algorithms enable quantitative estimation of cell type proportions by utilizing cell type-specific gene expression signatures [166], and can be divided into three main classes: linear regression approaches [167], integer linear programming approaches [168], and machine learning approaches [169]. These deconvolution methods have been developed with different emphases. For instance, CIBERSORT [169] and TIMER2.0 [170] are tailored for the identification and quantification of immune cell types, MuSiC [14] uses single-cell reference data from multiple subjects to account for sample heterogeneity and technical variability when deconvolving bulk data, while EPIC [171] estimates the proportions of immune and cancer cell types from bulk tumor gene expression data. More cell composition analysis tools are summarized in Supplementary Material Table S5.
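The linear regression view of deconvolution can be sketched as follows: the bulk profile is modelled as a non-negative mixture of cell type signatures, and the mixing weights are rescaled to proportions. The signature matrix, the bulk profile and the projected gradient solver are assumptions chosen to keep the example self-contained; the code does not reproduce the algorithm of any specific tool mentioned above.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate cell type proportions p >= 0 minimising ||S p - b||^2.

    signature: list of rows (genes), each a list of per-cell-type expression.
    bulk:      list of bulk expression values, one per gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # trace(S^T S) bounds the largest eigenvalue, so 1/trace is a safe step size
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * q for s, q in zip(row, p)) - b
                    for row, b in zip(signature, bulk)]
        grad = [sum(signature[g][t] * residual[g] for g in range(n_genes))
                for t in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant
        p = [max(0.0, q - step * g) for q, g in zip(p, grad)]
    total = sum(p)
    return [q / total for q in p] if total > 0 else p


# Hypothetical signature matrix: 4 marker genes x 3 cell types
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 1.0],
]
# Bulk profile simulated as a 50% / 30% / 20% mixture of the three cell types
bulk = [5.5, 3.7, 2.8, 4.2]

proportions = deconvolve(signature, bulk)
print([round(q, 3) for q in proportions])
```

Because the bulk profile here is a noise-free mixture, the estimated proportions converge to the simulated ones; with real data, the quality of the signature matrix largely determines the accuracy of the estimates.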
(2) Dynamics inference: pseudotime and RNA velocity analysis
In single-cell transcriptome analysis, trajectory inference seeks to predict the evolving patterns in a single-cell transcriptome landscape by considering each cell's transcriptome as a fixed snapshot at a specific time point within a cellular process. These sequential snapshots form a dynamic trajectory illustrating cells' progression through varying states, commonly referred to as a “pseudo-temporal trajectory” [172]. For trajectory inference, the computational burden is a major constraint. To reduce the computational cost, highly variable features are usually used to carry out trajectory inference (Fig. 4C). In practice, DEGs are often treated as highly variable genes for cell trajectory inference and subsequent analysis.
Currently, there are two main strategies for cell trajectory analysis, including pseudotime analysis [173] and RNA velocity analysis [174]. Pseudotime analysis allows us to reconstruct dynamic biological processes without sampling tissues at different time points, identify critical transition points between distinct cell states, and analyze shifts in cell-type composition and cell synchronization. RNA velocity is a computational technique that estimates the future transcriptional trajectory of individual cells by analyzing the relative abundances of spliced and unspliced mRNA, leveraging the assumption that unspliced mRNA represents nascent transcripts while spliced mRNA reflects mature, stable transcripts [175].
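The spliced/unspliced reasoning behind RNA velocity can be made concrete with the steady-state formulation, in which the velocity of a gene in a cell is u − γs, with u and s the unspliced and spliced abundances and γ a gene-specific ratio fitted across cells. The sketch below fits γ by least squares through the origin using all cells, which is a simplification assumed here for brevity; the counts are hypothetical.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin for u ~ gamma * s."""
    den = sum(s * s for s in spliced)
    return sum(u * s for u, s in zip(unspliced, spliced)) / den if den else 0.0


def velocity(unspliced, spliced, gamma):
    """Per-cell velocity: positive suggests induction, negative repression."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical abundances of one gene across four cells
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [0.5, 1.0, 1.5, 3.0]

gamma = fit_gamma(unspliced, spliced)
v = velocity(unspliced, spliced, gamma)
print(round(gamma, 4), [round(x, 4) for x in v])
```

A cell with more unspliced mRNA than expected from its spliced level (the last cell in this toy example) receives a positive velocity, indicating that expression of the gene is likely increasing.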
In the past few years, several popular trajectory inference tools have been released, such as Monocle [173], Slingshot [176], TIMEOR [177], scVelo [178], CellRank [179], and VeloViz [180]. As a prevailing trajectory inference method, Monocle3 adopts an enhanced principal graph-embedding procedure to refine the details of learned trajectories, reduce the running time and enable the identification of loop-structured cellular development. It is also versatile, offering additional functions for identifying genes with trajectory-dependent expression and allowing users to visualize the analytical results in different ways. In pseudotime analysis, Slingshot primarily focuses on pseudotime inference and transition point identification [176], while TIMEOR is more dedicated to recognizing time information and conducting time-series analysis [177]. To overcome the limitations of the original RNA velocity model [175], scVelo employs a likelihood-based dynamical model, which effectively infers gene-specific transcriptional dynamics and resolves distinct kinetics in heterogeneous subpopulations. CellRank facilitates pseudotime and RNA velocity analyses, especially for handling branching and cyclic cellular developmental pathways [179]. VeloViz is an R package that provides rich visualization capabilities and customization options [180]. More detailed information on these tools can be found in Supplementary Material Table S6.
(3) Cell-cell communication prediction
Cell communication across multiple cell types and tissues extensively relies on interactions between secreted ligands (such as hormones, growth factors, chemokines, cytokines, and neurotransmitters) and cell-surface receptors [181], [182], and plays a critical role in the regulation of early embryonic development, tissue and organ development, tumorigenesis, and cross-cellular metabolic homeostasis [183], [184], [185], [186]. Typically, signaling events between cells are mediated by protein interactions, such as ligand-receptor binding (Fig. 4D). Transcriptome data (e.g., DNA microarray, bulk RNA-seq and scRNA-seq data) are commonly recommended for analyzing cell communication due to their accessibility compared to proteomics. These data are used to infer cellular communication by predicting ligand-receptor interactions based on the differential expression of ligands and receptors between different cell types or samples [181].
To enhance cellular communication analysis, databases of ligand-receptor pairs and corresponding computational tools are continually evolving. These resources provide essential support for investigating intercellular communication. Notably, databases and tools vary in their focus and scope. CellPhoneDB [187] specializes in intercellular signaling, particularly ligand-receptor interactions, while ConnectomeDB [188] focuses on brain connectomics, analyzing connectivity patterns within the brain. Several computational tools have been developed to calculate communication scores between different cells or samples according to gene expression profiles. While a considerable portion of these tools is primarily tailored for single-cell transcriptomic data, CellChat [189], CellCall [189], iTALK [190], SpaOTsc [191] and scTensor [192] are capable of analyzing intercellular communication in bulk transcriptomic data. Among these, SpaOTsc integrates spatial and transcriptomic information, enabling the analysis of spatial distribution and communication patterns between cells. For additional details on these tools and databases, refer to Supplementary Material Table S7.
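A communication score of the kind described above can be sketched as the product of the mean ligand expression in a sender cell type and the mean receptor expression in a receiver cell type. This expression-product score, the cell type labels and the gene names LIG1 and REC1 are assumptions for illustration only; individual tools use their own scoring schemes and significance tests.

```python
from statistics import mean


def communication_score(expr, sender, receiver, ligand, receptor):
    """Mean ligand expression in sender x mean receptor expression in receiver."""
    return mean(expr[sender][ligand]) * mean(expr[receiver][receptor])


# Hypothetical per-cell expression values, grouped by cell type and gene
expr = {
    "fibroblast": {"LIG1": [4.0, 6.0], "REC1": [0.0, 0.2]},
    "T_cell": {"LIG1": [0.0, 0.0], "REC1": [3.0, 5.0]},
}

forward = communication_score(expr, "fibroblast", "T_cell", "LIG1", "REC1")
backward = communication_score(expr, "T_cell", "fibroblast", "LIG1", "REC1")
print(forward, backward)
```

The score is directional: a pair contributes to signaling from fibroblasts to T cells only if the ligand is expressed in the former and the receptor in the latter.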
Sample label prediction
The prediction of sample labels is a critical challenge in scenarios where there is a need to discover new groups or enhance diagnostic accuracy with limited labeled samples. Classification and clustering methods are two types of well-established and effective machine learning approaches to address this problem. As shown in Fig. 5A, classification methods can predict the categories of samples or the types of cells based on gene lists, making them widely applicable in fields such as single-cell genomics and spatial transcriptome analyses. On the other hand, clustering analysis divides data into subsets, grouping similar patterns together based on DEGs [193]. This approach is also valuable for exploring subgroups within a dataset and uncovering new functions of genes within the same cluster. Additionally, clustering analysis can also be used to establish relationships between subgroups and clinical annotations or to assess batch effects in samples. A number of clustering tools offering a variety of algorithms and the ability to visualize the analysis results are listed in Table 5.
Fig. 5.
Illustration of sample label prediction and gene-phenotype association analysis. A) The sample label prediction mainly relies on two strategies, classification prediction and clustering analysis. B) Gene and phenotype association analysis. Based on gene expression level, the interactions between genes and phenotypes (e.g., disease, tissue, cell state, cell or organism morphology) are inferred using machine learning algorithms. Single-cell genomics provides a means to quantitatively annotate cell states on the basis of high-information content and high-throughput measurements according to the gene expression level. The SNP-gene-phenotype association strategy consists of two kinds: (1) Direct model development. This model is based on SNP-gene-phenotype association using data integration algorithms with gene clusters (sets) and expression levels. SNP clusters (sets) corresponding to the selected gene clusters can be identified by eQTL data. (2) Modeling using reference panel: the TWAS strategy. TWAS consists of three steps: (i) Modeling based on a reference panel to establish the relationship between SNPs and gene expression levels. Samples in the reference panel have genotype and expression level data for fitting the relationship between these SNP loci and corresponding gene expression levels (selecting SNP loci within a 500 kb or 1 Mb range upstream and downstream of the gene). (ii) Using the model in step (i) to predict the gene expression levels for another set of individuals with genotype data. (iii) Analyzing the association between genes and phenotype using predicted gene expression levels. C) The principle of time-to-event (survival) analysis.
Table 5.
The popular classification and clustering tools based on gene expression profiles.
| Name | Description | Resource | Remark |
|---|---|---|---|
| Supervised learning: classification model development tools | | | |
| caret | ·streamline the process of model training for classification. | https://topepo.github.io/caret/index.html | https://doi.org/10.18637/jss.v028.i05 |
| Tidymodels | ·building models using tidyverse principles. | https://www.tidymodels.org/packages/ | − |
| mlr3verse | ·data.table and R6 ·parallel computing ·building “graph” flow learners ·unified interface ·advanced machine learning algorithms | https://github.com/mlr-org/mlr3 | https://doi.org/10.21105/joss.01903 |
| MASS | ·providing multiple datasets ·basic models and statistical algorithms | https://cran.r-project.org/web/packages/MASS/index.html | https://www.stats.ox.ac.uk/pub/MASS4/ |
| Unsupervised learning: clustering tools | | | |
| Nbclust | ·30 indexes for determining the optimal number of clusters ·providing the best clustering scheme | https://cran.r-project.org/web/packages/NbClust/ | https://doi.org/10.18637/jss.v061.i06 |
| ClustVis | ·user-friendly and clustering visualization web tool | https://biit.cs.ut.ee/clustvis/ | PMID: 25969447 |
| TimeClust | ·user-friendly software package to cluster genes according to temporal expression profiles. ·two original algorithms expressly designed for clustering short time series together ·freely downloadable for Windows and Linux platforms | http://aimed11.unipv.it/TimeClust/ | PMID: 18065427 |
| Medusa | ·highly interactive and it supports weighted and multi-edged graphs ·a variety of layout and clustering methods for visualization |
https://sites.google.com/site/medusa3visualization | PMID: 21978489 |
| wcd | ·compact memory-large files can be clustered on a single processor and very fast | https://code.google.com/p/wcdest | PMID: 18480101 |
| cola | ·helps users to select optimal parameter values. ·provides rich functionalities to apply multiple partitioning methods in parallel and directly compare their results ·generates a comprehensive HTML report. |
Bioconductor package | PMID: 33275159 |
| hiplot | ·open and advanced one-stop biomedical visualization and analysis platform with various modules | https://hiplot.cn/ | PMID: 35788820 |
Gene-phenotype association analysis
Transcriptome-wide association study (TWAS)
Gene-environment interactions are fundamental in determining individual traits, or phenotypes, which include molecular or cellular characteristics, morphological traits, behaviours, and so on (Fig. 5B). To delve into the specific molecular mechanisms linking expression quantitative trait loci (eQTL) to phenotypes, transcriptome-wide association studies (TWAS), akin to GWAS [194], have emerged as an important new strategy. TWAS uses genotype data to estimate gene expression based on reference transcriptomic datasets (e.g., GTEx) and then links the predicted gene expression with phenotypic traits to identify gene-trait associations. By correlating these estimated expression levels with disease phenotypes, researchers can identify gene expression changes that are associated with diseases and relevant to disease prediction, diagnosis, and treatment. TWAS is particularly useful for identifying functional genes regulated by disease-associated variants, thereby providing insights into disease mechanisms and other phenotypic characteristics [195]. Recently, single-cell genomics has gained increasing traction as a complementary approach, and it has demonstrated superior capability in investigating the correlation between gene expression levels and eQTL strength across diverse cell types [196].
Over the past decade, several methodologies have been developed for conducting TWAS and analyzing the associations between single-nucleotide polymorphisms (SNPs)/genes and phenotypes to uncover genetic variations linked to complex human diseases or traits. These findings have unveiled new connections between genes and traits, enhancing our understanding of the complexity of various traits and finding practical applications across diverse clinical settings [197], [198]. Prominent tools for TWAS analysis include FUSION [199], PrediXcan [200], MetaXcan [201], and SMR [202]. Among these, FUSION [199] integrates various genotype and expression data, providing a robust statistical framework for association analysis. PrediXcan [200] and MetaXcan [201] employ linear regression models to estimate gene expression and correlate it with disease phenotypes, supporting a range of phenotypic combinations and offering enhanced statistical capabilities and visualization tools. SMR identifies significant associations between the expression levels of certain genes and complex traits using summary data from GWAS and eQTL studies [202]. These tools are crucial for TWAS analysis and are widely used in various fields, such as medicine and agriculture. For example, Li et al. conducted a systematic analysis of gene expression, structural variations and alternative splicing in soybeans using SMR and FUSION to investigate the genetic basis of traits at the gene level [203]. You et al. used FUSION to determine the association between gene expression and fiber quality traits in their study on the regulatory controls of duplicated gene expression during fiber development in allotetraploid cotton [204].
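To make the three TWAS steps concrete, the following Python sketch walks through them on simulated data: (i) fit an expression prediction model on a reference panel, (ii) predict expression in a separate genotyped cohort, and (iii) test the association between predicted expression and the phenotype. It is only a minimal illustration under stated assumptions, not the procedure used by FUSION, PrediXcan, MetaXcan, or SMR: the ridge penalty, the three cis-SNPs, the effect sizes, the sample sizes, and all function names (`fit_ridge`, `predict`, `association`) are hypothetical choices made for this example.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_ridge(X, y, lam=0.1):
    """Step (i): fit expression ~ SNP dosages by ridge regression.

    Returns (intercept, weights). The data are centred so that the
    intercept is not penalised; lam=0 gives ordinary least squares.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc'Xc + lam * I) w = Xc'yc
    A = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(p)]
    weights = solve(A, b)
    intercept = y_mean - sum(w * m for w, m in zip(weights, x_mean))
    return intercept, weights


def predict(intercept, weights, X):
    """Step (ii): predict expression from genotypes alone."""
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in X]


def association(x, y):
    """Step (iii): Pearson correlation and its t statistic (n - 2 d.f.)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Toy simulation (all numbers below are assumptions for illustration).
random.seed(0)
TRUE_WEIGHTS = [0.8, -0.5, 0.0]  # hypothetical cis-eQTL effects of 3 SNPs


def simulate_genotypes(n, p):
    """Random SNP dosages (0, 1 or 2 copies of the alternative allele)."""
    return [[random.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]


def genetic_expression(G):
    """Genetically determined part of expression under TRUE_WEIGHTS."""
    return [sum(w * g for w, g in zip(TRUE_WEIGHTS, row)) for row in G]


# (i) Reference panel: genotype plus measured expression.
ref_genotypes = simulate_genotypes(200, 3)
ref_expression = [e + random.gauss(0, 0.5) for e in genetic_expression(ref_genotypes)]
intercept, weights = fit_ridge(ref_genotypes, ref_expression, lam=0.1)

# (ii) Study cohort: genotype only, expression is imputed.
cohort_genotypes = simulate_genotypes(500, 3)
predicted_expr = predict(intercept, weights, cohort_genotypes)

# (iii) Phenotype partly driven by the gene's genetically regulated expression.
phenotype = [0.6 * e + random.gauss(0, 1.0) for e in genetic_expression(cohort_genotypes)]
r, t = association(predicted_expr, phenotype)
print("SNP weights:", [round(w, 2) for w in weights])
print("gene-trait correlation r = %.3f, t = %.2f" % (r, t))
```

In practice, the prediction model would be trained per gene on SNPs within the cis window, typically with cross-validated penalised regression, and the association test would be repeated across all genes with multiple-testing correction; those steps are omitted here for brevity.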
Time-to-event analysis
Time-to-event analysis, also called survival analysis, encompasses a learning framework and a range of techniques employed to estimate the duration until specific events occur based on observed data (Fig. 5C). In biomedicine, time-to-event analysis is widely used to evaluate the influence of gene expression levels on different events (e.g., disease progression, disease recurrence, biological persistence, and animal behavior [205]). Valuable findings can be obtained by constructing survival curves for groups of samples stratified by gene expression level. For instance, upregulation of a gene may indicate a shorter time until death, or upregulation of a gene after treatment could predict a favorable prognosis.
To handle events that are not binary or variables that change over time, multistate modeling has been explored [206]. While the Kaplan-Meier and Cox regression methods are commonly used statistical techniques in time-to-event analysis, the field has progressively shifted towards incorporating different machine learning methods, such as random forest, Naïve Bayes, and K-nearest neighbors.
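As a rough illustration of expression-stratified survival curves, the Python sketch below implements the Kaplan-Meier product-limit estimator, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of events and n_i the number of samples at risk at t_i. The expression values, follow-up times, and event indicators are made-up toy data, and the median split and the function names (`kaplan_meier`, `split_by_median`) are assumptions for this example rather than a method prescribed by any particular tool.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    events[i] is 1 if the event was observed at times[i] and 0 if the
    sample was censored at that time. Only times with at least one
    observed event produce a step in the curve.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    ordered = sorted(expression)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if x > median else "low" for x in expression]


# Toy cohort (hypothetical values): expression of one gene, follow-up in
# months, and whether the event (e.g., death) was observed.
expression = [5.1, 7.9, 6.2, 8.4, 4.8, 9.0, 5.5, 7.2]
months = [30, 8, 26, 5, 34, 3, 28, 12]
observed = [0, 1, 1, 1, 0, 1, 1, 1]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([months[i] for i in idx],
                                 [observed[i] for i in idx])

for label, curve in curves.items():
    steps = ", ".join("t=%d: %.2f" % (t, s) for t, s in curve)
    print("%s expression: %s" % (label, steps))
```

In this toy cohort the high-expression group loses all of its members early, whereas the low-expression curve stays higher for longer. A real analysis would additionally compare the groups with a log-rank test or a Cox model and would need far more samples than shown here.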
DEGMiner: a comprehensive website hub
Given the lack of a comprehensive guide to the analysis of DEGs and their subsequent implications, we have created a robust online platform called DEGMiner (https://www.ciblab.net/DEGMiner/). This platform, developed using shiny and rmarkdown under R (Fig. 6A), deploys a centralized repository of analytical tools and web-based databases that facilitate the interpretation and exploration of biological data associated with DEGs. DEGMiner offers researchers a comprehensive selection of alternative downstream strategies for a specific gene list. With over 300 tools and databases covering nine kinds of strategies for DEG analysis, this website serves as a valuable resource. Additionally, DEGMiner collects features and additional information about the databases, as well as installation instructions, environmental deployment details, and other metadata to provide users with a preliminary understanding. The primary goal of DEGMiner is to support readers in quickly and easily finding tools for analyzing DEGs, thereby reducing the time spent on manual searching. It lays a solid foundation for numerous bioinformatics studies and helps beginners understand the molecular mechanisms underlying the DEGs.
Fig. 6.
DEGMiner website and practical guidelines for users. A) The homepage of the DEGMiner website (https://www.ciblab.net/DEGMiner/). B) Practical guidelines for users.
Risks and challenges in practice
Due to insufficient attention in certain fields, excessive dependence on prior information, intrinsic limitations of analytical approaches, and usage preferences, the downstream analysis of gene lists poses four primary challenges, which we discuss below.
(1) The annotation and enrichment of non-model species have not received adequate attention, and many tools lack support for these species. In bioinformatics analysis, it is essential to facilitate the convenient and rapid conversion of homologous genes from non-model species to model species. Therefore, the development of robust and user-friendly homologous gene conversion tools is an urgent priority.
(2) Network analysis usually depends on established knowledge bases. Since there is little prior knowledge about gene-small molecule interactions, it is difficult to infer such interactions, construct the network, and track the upstream or downstream regulatory molecules of target genes. In such cases, in the absence of specifically curated and summarized interaction information, we suggest searching for conserved motifs among sequences through BLAST [207] and subsequently targeting potential elements that regulate gene activity within commonly conserved regions.
(3) The computational drug repurposing approach based on gene expression changes closely links gene expression to drug treatment. It saves time and money and, to some extent, reduces the likelihood of selecting drugs with high toxicity. However, pharmacologically relevant effects may not be primarily reflected at the transcriptional level. Moreover, current databases cover only a small number of compounds (compared to the large number of drugs currently available), and different treatment durations can also lead to batch effects in the results [208].
(4) When it comes to extensively exploring the information related to DEGs, a major future trend is the creation of comprehensive tools that are more user-friendly, efficient, and accessible, without the need for a coding background. Alternatively, some complex methods (with numerous parameter settings) will be designed to be “simpler”, for example through built-in deep learning capabilities that directly recommend suitable parameters or optimal results based on the characteristics of the data. In many cases, a tool or method that exposes a large number of parameters may create a “complexity trap”, and users will readily abandon it if learning those parameters takes a long time before the tool can be used.
Overall, tools should evolve towards simple interfaces that drive complex operations and return optimal results.
Discussion and suggestion
DEGs are commonly used to characterize genetic differences between two or more biological sample groups, in support of specific hypothesis-driven studies [209], [210]. However, analyzing a large number of DEGs poses challenges and requires careful consideration when applying bioinformatics methods. Therefore, it is essential to have a systematic and directional analysis workflow.
In this review, we not only present a complete methodology for the analysis of DEGs but also offer a practical guide for researchers to choose the appropriate methods (Fig. 6B). First, users should consider whether converting genes into orthologs of model species is necessary to ensure broader applicability in various analysis processes. If researchers are specifically interested in gene interactions, they may opt for module-based methods; otherwise, they can choose module-free methods. Gene function enrichment should be prioritized if the focus is on gene function or roles in particular pathways. For detailed gene interactions, network construction is recommended, and two types of topologies are possible. Known entities from different networks may offer meaningful insights, while unknown entities from knowledge graphs may lead to new discoveries. Transcription or protein-level regulatory interactions can be chosen if the focus is on known coding genes, with miRNA regulation network and ceRNA for regulated interactions of non-coding genes and GRN and TF regulation for regulated interactions of coding genes. If the researchers are interested in module or community interactions of genes, the module analysis would be suitable for downstream projects, followed by additional network analysis.
If researchers focus on a few genes and related molecular mechanisms, the module-free methods should be utilized (Fig. 6B). Gene annotation and condition-specific analysis are effective ways to obtain functional information about genes. In particular, gene annotation can provide basic functional descriptions when the enrichment service returns no significantly enriched terms (i.e., all P values > 0.05), or when there are few genes in the analysis. Researchers can also use other relevant data to mine biological information, such as time and state information for survival analysis, SNP data and gene expression data for gene-phenotype association analysis, and the gene expression matrix for drug repurposing, cluster analysis, cell marker discovery, trajectory analysis, and cell communication. In addition, RNA velocity analysis can be performed with the help of mRNA splicing information, and cell/sample annotation can contribute to cell communication inference.
Conclusion and prospects
To better understand the identified DEGs, various methods and tools have been developed to place these findings within a broader biological context [211]. In this review, we systematically summarized and discussed the strategies for conducting a comprehensive analysis of DEGs, aiming to maximize the biological insights from transcriptome data. The DEG analysis methods discussed here can also be applied to proteomics data to a great extent. This suggests the possibility of integrating various omics datasets, starting from genes, to comprehensively understand biological processes. Specifically, we outline nine strategies for mining biological information from DEGs, with detailed descriptions of relevant tools in their respective sections. We also highlight the advantages and limitations of these methods. For reader convenience, we provide an online resource that consolidates all the mentioned databases or tools. This practical guide, along with the summarized tools and data available on the website, serves as a valuable reference for researchers without a strong background in bioinformatics.
Despite existing methodological and technological challenges, numerous opportunities are available to enhance our understanding of life activities by exploring the wealth of data at hand. In future research examining the biological importance and relevance of DEGs using bioinformatics and computational biology approaches, it will be crucial to take into account the following key considerations: 1) Integrated omics profiling: Integrating diverse omics data types (e.g., genomics, transcriptomics, proteomics) provides a more comprehensive view of DEGs and their functional implications. Developing algorithms and tools for multi-omics data integration is crucial for gaining insights into complex biological processes. 2) Gene-phenotype association: Large cohorts are essential for human phenome studies, but limitations arise when exploring gene-phenotype connections. Future tools should address the challenges posed by small sample sizes. 3) Machine learning and predictive modeling: Leveraging machine learning algorithms and predictive models helps identify patterns in DEG datasets, predict gene functions, and infer regulatory relationships. Customized machine learning methods are crucial for analyzing the complexity of biological data effectively. 4) Dimensionality reduction techniques: Techniques such as principal component analysis or t-distributed stochastic neighbor embedding aid in visualizing and interpreting high-dimensional DEG datasets. These methods help identify key features and reduce noise in complex gene expression data, and are particularly useful for single-cell data analysis.
Ethics Statement.
No clinical trials or animal experiments were performed in this study.
CRediT authorship contribution statement
Huachun Yin: Investigation, Writing – original draft, Writing – review & editing, Visualization. Hongrui Duo: Web Construction, Writing – review & editing. Song Li: Supervision, Funding acquisition. Dan Qin: Writing – review & editing. Lingling Xie: Writing – review & editing. Yingxue Xiao: Writing – review & editing. Jing Sun: Writing – review & editing. Jingxin Tao: Web Construction. Xiaoxi Zhang: Writing – review & editing. Yinghong Li: Resources, Visualization. Yue Zou: Writing – review & editing, Data collection. Qingxia Yang: Web Construction. Xian Yang: Resources, Supervision. Youjin Hao: Writing – review & editing, Supervision, Funding acquisition. Bo Li: Conceptualization, Supervision, Funding acquisition, Project administration, Writing – review & editing.
Funding
This work was sponsored by the Natural Science Foundation of Chongqing, China (No. CSTC2019JCYJ-MSXMX0527), the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202100538, KJQN202100642), the National Natural Science Foundation of China (62101087), the China Postdoctoral Science Foundation (2021MD703942), and the Open Fund of Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology, Yunnan University (YNPRAEC-2023004).
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Biographies

Huachun Yin holds a Master’s degree from Chongqing Normal University and is currently a Ph.D. student in the Department of Neurobiology at the Army Medical University. Her research interests lie in bioinformatics and intracranial tumors, focusing on exploring the pathogenesis of intracranial tumors using bioinformatics technologies.

Hongrui Duo is a Master’s student at College of Life Sciences, Chongqing Normal University. His research interests focus on the evaluation of computational tools and data analysis for single-cell and spatial transcriptomics.

Dr. Song Li received his Ph.D. degree from Army Medical University and is currently an associate professor of neurosurgery at the Second Affiliated Hospital of Army Medical University. He is a member of the Chinese Society of Neuro-oncology, the China Anti-Cancer Association, and the China Pituitary Adenoma Specialist Council. Dr. Li's research focuses on the role of abnormal tumor metabolism in the development of pituitary tumors. As a project leader, he oversees six projects, including those funded by the National Natural Science Foundation of China. He has published over 15 SCI papers.

Dan Qin is a researcher at Genentech, specializing in immunology and single-cell data analysis. She holds a Master's Degree in Bioinformatics from Northeastern University. Her expertise lies in deciphering the immune system's complexities to inform the development of novel therapeutics. At Genentech, Dan applies her bioinformatics acumen to translate complex datasets into meaningful insights, contributing to the field with her research and publications. Her academic and professional endeavors reflect a dedication to advancing personalized medicine through the power of data science.

Lingling Xie, a postgraduate student in the College of Life Sciences of Chongqing Normal University, is a member of the Computational and Integrative Biology Research Group. Her research mainly concerns the benchmarking of scRNA-seq data analysis workflows, with the aim of presenting a comprehensive quantitative summary of the current landscape of single-cell benchmarking studies.

Yingxue Xiao is a member of the Computational and Integrative Biology Research Group, College of Life Sciences, Chongqing Normal University. She works in the fields of single-cell Omics, with a focus on evaluating the bioinformatics methods used for differential abundance analysis. Miss Xiao actively engages in research within this field and aims to achieve significant results.

Jing Sun is currently a Master’s student at Chongqing Normal University, China. She graduated with a Bachelor of Science degree from Yancheng Normal University in 2023. Her research focuses on the field of bioinformatics, with an emphasis on single-cell data analysis and the use of cell deconvolution.

Jingxin Tao graduated from Chongqing Normal University with a Master’s Degree in Biochemistry and Molecular Biology. Currently, she is engaged in bioinformatics, single-cell and spatial genomics, and exploring tissue cell fate and related areas of research.

Xiaoxi Zhang is currently a Master’s student in the College of Life Sciences, Chongqing Normal University, China. Her research interest is the effects of non-nutritive sugars on reproduction and development of Drosophila melanogaster.

Dr. Yinghong Li, an Associate Professor, is a full-time faculty member at the School of Bioinformatics, Chongqing University of Posts and Telecommunications. He is also a member of the Chongqing Key Laboratory of Big Data for Bio Intelligence and serves on the council of the Chongqing Bioinformatics Society. His research focuses on artificial intelligence and drug discovery. Dr. Li has published over 20 papers in journals such as Nucleic Acids Research and Briefings in Bioinformatics.

Yue Zou is a member of the Computational and Integrative Biology Research Group, College of Life Sciences, Chongqing Normal University. She works in the fields of evolutionary biology and bioinformatics, with a focus on the minimal gene sets that reflect the evolutionary relationships of species. Yue Zou actively participates in this research and aims to achieve remarkable results in her field.

Dr. Qingxia Yang is a Research Fellow at the Woman's Hospital of Zhejiang University. Dr. Yang works in the fields of bioinformatics, computational biology and omics data analysis, including (1) mining transcriptomics, proteomics, metabolomics and other omics data based on AI algorithms; (2) developing intelligent analysis tools for bioinformatics and drug target discovery; and (3) conducting biomarker and drug target discovery for complex diseases. She is the author of more than 20 scientific articles published in peer-reviewed journals, such as Nucleic Acids Res, Brief Bioinform and Anal Chem.

Prof. Xian Yang is a faculty member of Chongqing Normal University and holds a Ph.D. degree from Chongqing University, China. He serves as the National Science and Technology Commissioner for the 'Three Regions', a Science and Technology Commissioner for Chongqing Municipality, and a Review Expert for the National Youth Science and Technology Innovation Competition. Prof. Yang's research focuses on the molecular biology of natural medicines, and he has published over 40 academic papers internationally.

Prof. Youjin Hao is a faculty member of the College of Life Sciences, Chongqing Normal University, and currently focuses on exploring the molecular mechanisms by which food additives affect the lifespan of Drosophila melanogaster through mining omics data. So far, Prof. Hao has published 90 original scientific papers in peer-reviewed journals. Aside from being an active writer, Prof. Hao has reviewed more than 60 manuscripts for different journals, such as Briefings in Bioinformatics, Scientific Reports, and Computers in Biology and Medicine.

Dr. Bo Li is an Associate Professor at the College of Life Sciences, Chongqing Normal University, China. With nearly 20 years of experience teaching bioinformatics, he also conducts research in multi-omics and computational biology, focusing on elucidating the molecular mechanisms underlying complex diseases. Dr. Li serves as the director of the Chongqing Society of Bioinformatics and has published 60 academic papers in prestigious journals such as Genome Biology, Nucleic Acids Research, Trends in Food Science & Technology, Bioinformatics, Briefings in Bioinformatics, and Molecular & Cellular Proteomics, accumulating over 2,000 citations. He supervised and critically reviewed this manuscript.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jare.2024.12.004.
Contributor Information
Youjin Hao, Email: haoyoujin@hotmail.com.
Bo Li, Email: libcell@cqnu.edu.cn.
References
- 1. Porcu E., Sadler M.C., Lepik K., Auwerx C., Wood A.R., Weihs A., et al. Differentially expressed genes reflect disease-induced rather than disease-causing changes in the transcriptome. Nat Commun. 2021;12(1):5647. doi: 10.1038/s41467-021-25805-y.
- 2. Cheng J., Wei D., Ji Y., Chen L., Yang L., Li G., et al. Integrative analysis of DNA methylation and gene expression reveals hepatocellular carcinoma-specific diagnostic biomarkers. Genome Med. 2018;10(1):1–11. doi: 10.1186/s13073-018-0548-z.
- 3. Yu S., Li Y., Liao Z., Wang Z., Wang Z., Li Y., et al. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut. 2020;69(3):540–550. doi: 10.1136/gutjnl-2019-318860.
- 4. Wang M., Roussos P., McKenzie A., Zhou X., Kajiwara Y., Brennand K.J., et al. Integrative network analysis of nineteen brain regions identifies molecular signatures and networks underlying selective regional vulnerability to Alzheimer's disease. Genome Med. 2016;8(1):1–21. doi: 10.1186/s13073-016-0355-3.
- 5. Guo M., Cui C., Song X., Jia L., Li D., Wang X., et al. Deletion of FGF9 in GABAergic neurons causes epilepsy. Cell Death Dis. 2021;12(2):196. doi: 10.1038/s41419-021-03478-1.
- 6. Sadegh S., Matschinske J., Blumenthal D.B., Galindez G., Kacprowski T., List M., et al. Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing. Nat Commun. 2020;11(1):3518. doi: 10.1038/s41467-020-17189-2.
- 7. Yuan J., Chang S.Y., Yin S.G., Liu Z.Y., Cheng X., Liu X.J., et al. Two conserved epigenetic regulators prevent healthy ageing. Nature. 2020;579(7797):118–122. doi: 10.1038/s41586-020-2037-y.
- 8. Kour S., Rajan D.S., Fortuna T.R., Anderson E.N., Ward C., Lee Y., et al. Loss of function mutations in GEMIN5 cause a neurodevelopmental disorder. Nat Commun. 2021;12(1):2558. doi: 10.1038/s41467-021-22627-w.
- 9. Huntley R.P., Sawford T., Mutowo-Meullenet P., Shypitsyna A., Bonilla C., Martin M.J., et al. The GOA database: gene Ontology annotation updates for 2015. Nucleic Acids Res. 2015;43(D1):D1057–D1063. doi: 10.1093/nar/gku1113.
- 10. Tian L., Greenberg S.A., Kong S.W., Altschuler J., Kohane I.S., Park P.J. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci USA. 2005;102(38):13544–13549. doi: 10.1073/pnas.0506577102.
- 11. Reimand J., Isserlin R., Voisin V., Kucera M., Tannus-Lopes C., Rostamianfar A., et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc. 2019;14(2):482–517. doi: 10.1038/s41596-018-0103-9.
- 12. Ogris C., Guala D., Helleday T., Sonnhammer E.L. A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. Nucleic Acids Res. 2017;45(2):e8. doi: 10.1093/nar/gkw849.
- 13. Anastasiadou E., Jacob L.S., Slack F.J. Non-coding RNA networks in cancer. Nat Rev Cancer. 2018;18(1):5–18. doi: 10.1038/nrc.2017.99.
- 14. Shen X., Wang R., Xiong X., Yin Y., Cai Y., Ma Z., et al. Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics. Nat Commun. 2019;10(1):1516. doi: 10.1038/s41467-019-09550-x.
- 15. Panir K., Schjenken J.E., Robertson S.A., Hull M.L. Non-coding RNAs in endometriosis: a narrative review. Hum Reprod Update. 2018;24(4):497–515. doi: 10.1093/humupd/dmy014.
- 16. Mariani J., Coppola G., Zhang P., Abyzov A., Provini L., Tomasini L., et al. FOXG1-dependent dysregulation of GABA/glutamate neuron differentiation in Autism spectrum disorders. Cell. 2015;162(2):375–390. doi: 10.1016/j.cell.2015.06.034.
- 17. Dimopoulos M.A., Goldschmidt H., Niesvizky R., Joshua D., Chng W.J., Oriol A., et al. Carfilzomib or bortezomib in relapsed or refractory multiple myeloma (ENDEAVOR): an interim overall survival analysis of an open-label, randomised, phase 3 trial. Lancet Oncol. 2017;18(10):1327–1337. doi: 10.1016/S1470-2045(17)30578-8.
- 18. Mohamed S.K., Nounu A., Novacek V. Biological applications of knowledge graph embedding models. Brief Bioinform. 2020;22(2):1679–1693. doi: 10.1093/bib/bbaa012.
- 19. Luo Y., Zhao X., Zhou J., Yang J., Zhang Y., Kuang W., et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573. doi: 10.1038/s41467-017-00680-8.
- 20. Zhang D., Hu Q., Liu X., Zou K., Sarkodie E.K., Liu X., et al. AllEnricher: a comprehensive gene set function enrichment tool for both model and non-model species. BMC Bioinf. 2020;21(1):106. doi: 10.1186/s12859-020-3408-y.
- 21. Durinck S., Spellman P.T., Birney E., Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4(8):1184–1191. doi: 10.1038/nprot.2009.97.
- 22. Altenhoff A.M., Warwick Vesztrocy A., Bernard C., Train C.-M., Nicheperovich A., Prieto Baños S., et al. OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem. Nucleic Acids Res. 2024;52(D1):D513–D521. doi: 10.1093/nar/gkad1020.
- 23. Kolberg L., Raudvere U., Kuzmin I., Vilo J., Peterson H. gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Res. 2020;9. doi: 10.12688/f1000research.24956.2.
- 24. Huerta-Cepas J., Szklarczyk D., Heller D., Hernandez-Plaza A., Forslund S.K., Cook H., et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47(D1):D309–D314. doi: 10.1093/nar/gky1085.
- 25. Xu L., Dong Z., Fang L., Luo Y., Wei Z., Guo H., et al. OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res. 2019;47(W1):W52–W58. doi: 10.1093/nar/gkz333.
- 26. Zdobnov E.M., Kuznetsov D., Tegenfeldt F., Manni M., Berkeley M., Kriventseva E.V. OrthoDB in 2020: evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2021;49(D1):D389–D393. doi: 10.1093/nar/gkaa1009.
- 27. Zielezinski A., Dziubek M., Sliski J., Karlowski W.M. ORCAN-a web-based meta-server for real-time detection and functional annotation of orthologs. Bioinformatics. 2017;33(8):1224–1226. doi: 10.1093/bioinformatics/btw825.
- 28. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47(D1):D330–D338. doi: 10.1093/nar/gky1055.
- 29. Ye J., Zhang Y., Cui H., Liu J., Wu Y., Cheng Y., et al. WEGO 2.0: a web tool for analyzing and plotting GO annotations, 2018 update. Nucleic Acids Res. 2018;46(W1):W71–W75. doi: 10.1093/nar/gky400.
- 30. Reimand J., Arak T., Adler P., Kolberg L., Reisberg S., Peterson H., et al. g:Profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016;44(W1):W83–W89. doi: 10.1093/nar/gkw199.
- 31. Tian T., Liu Y., Yan H., You Q., Yi X., Du Z., et al. agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res. 2017;45(W1):W122–W129. doi: 10.1093/nar/gkx382.
- 32. Götz S., García-Gómez J.M., Terol J., Williams T.D., Nagaraj S.H., Nueda M.J., et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–3435. doi: 10.1093/nar/gkn176.
- 33. Rahmati S., Abovsky M., Pastrello C., Kotlyar M., Lu R., Cumbaa C.A., et al. pathDIP 4: an extended pathway annotations and enrichment analysis resource for human, model organisms and domesticated species. Nucleic Acids Res. 2020;48(D1):D479–D488. doi: 10.1093/nar/gkz989.
- 34. Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
- 35. Schriml L.M., Munro J.B., Schor M., Olley D., McCracken C., Felix V., et al. The human disease ontology 2022 update. Nucleic Acids Res. 2022;50(D1):D1255–D1261. doi: 10.1093/nar/gkab1063.
- 36. Huang da W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57. doi: 10.1038/nprot.2008.211.
- 37. Chen J., Bardes E.E., Aronow B.J., Jegga A.G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37(suppl_2):W305–W311. doi: 10.1093/nar/gkp427.
- 38. Zhao K., Rhee S.Y. Interpreting omics data with pathway enrichment analysis. Trends Genet. 2023;39(4):308–319. doi: 10.1016/j.tig.2023.01.003.
- 39. Geistlinger L., Csaba G., Kuffner R., Mulder N., Zimmer R. From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems. Bioinformatics. 2011;27(13):366–373. doi: 10.1093/bioinformatics/btr228.
- 40. Draghici S., Khatri P., Tarca A.L., Amin K., Done A., Voichita C., et al. A systems biology approach for pathway level analysis. Genome Res. 2007;17(10):1537–1545. doi: 10.1101/gr.6202607.
- 41.Tarca A.L., Draghici S., Khatri P., Hassan S.S., Mittal P., Kim J.S., et al. A novel signaling pathway impact analysis. Bioinformatics. 2009;25(1):75–82. doi: 10.1093/bioinformatics/btn577. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Barabasi A.L., Gulbahce N., Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. doi: 10.1038/nrg2918. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Ravasi T., Suzuki H., Cannistraci C.V., Katayama S., Bajic V.B., Tan K., et al. An atlas of combinatorial transcriptional regulation in mouse and man. Cell. 2010;140(5):744–752. doi: 10.1016/j.cell.2010.01.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Buda K., Miton C.M., Fan X.C., Tokuriki N. Molecular determinants of protein evolvability. Trends Biochem Sci. 2023;48(9):751–760. doi: 10.1016/j.tibs.2023.05.009. [DOI] [PubMed] [Google Scholar]
- 45.Materna S.C., Oliveri P. A protocol for unraveling gene regulatory networks. Nat Protoc. 2008;3(12):1876–1887. doi: 10.1038/nprot.2008.187. [DOI] [PubMed] [Google Scholar]
- 46.Davidson E.H. Emerging properties of animal gene regulatory networks. Nature. 2010;468(7326):911–920. doi: 10.1038/nature09645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Aalto A., Viitasaari L., Ilmonen P., Mombaerts L., Goncalves J. Gene regulatory network inference from sparsely sampled noisy data. Nat Commun. 2020;11(1):3493. doi: 10.1038/s41467-020-17217-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42(D1):D358–D363. doi: 10.1093/nar/gkt1115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Licata L., Lo Surdo P., Iannuccelli M., Palma A., Micarelli E., Perfetto L., et al. SIGNOR 2.0, the SIGnaling network open resource 2.0: 2019 update. Nucleic Acids Res. 2020;48(D1):D504–D510. doi: 10.1093/nar/gkz949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Bader G.D., Betel D., Hogue C.W. BIND: the biomolecular interaction network database. Nucleic Acids Res. 2003;31(1):248–250. doi: 10.1093/nar/gkg056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Schaefer C.F., Anthony K., Krupa S., Buchoff J., Day M., Hannay T., et al. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009;37(D1):D674–D679. doi: 10.1093/nar/gkn653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Paz A., Brownstein Z., Ber Y., Bialik S., David E., Sagir D., et al. SPIKE: a database of highly curated human signaling pathways. Nucleic Acids Res. 2011;39(D1):D793–D799. doi: 10.1093/nar/gkq1167. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Breuer K., Foroushani A.K., Laird M.R., Chen C., Sribnaia A., Lo R., et al. InnateDB: systems biology of innate immunity and beyond–recent updates and continuing curation. Nucleic Acids Res. 2013;41(D1):D1228–D1233. doi: 10.1093/nar/gks1147. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Pratt D., Chen J., Welker D., Rivas R., Pillich R., Rynkov V., et al. NDEx, the network data exchange. Cell Syst. 2015;1(4):302–305. doi: 10.1016/j.cels.2015.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Lopes-Ramos C.M., Chen C.Y., Kuijjer M.L., Paulson J.N., Sonawane A.R., Fagny M., et al. Sex differences in gene expression and regulatory networks across 29 human tissues. Cell Rep. 2020;31(12) doi: 10.1016/j.celrep.2020.107795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Rauch A., Haakonsson A.K., Madsen J.G.S., Larsen M., Forss I., Madsen M.R., et al. Osteogenesis depends on commissioning of a network of stem cell transcription factors that act as repressors of adipogenesis. Nat Genet. 2019;51(4):716–727. doi: 10.1038/s41588-019-0359-1. [DOI] [PubMed] [Google Scholar]
- 57.Hu H., Miao Y.R., Jia L.H., Yu Q.Y., Zhang Q., Guo A.Y. AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors. Nucleic Acids Res. 2019;47(D1):D33–D38. doi: 10.1093/nar/gky822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Fornes O., Castro-Mondragon J.A., Khan A., van der Lee R., Zhang X., Richmond P.A., et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48(D1):D87–D92. doi: 10.1093/nar/gkz1001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Davis C.A., Hitz B.C., Sloan C.A., Chan E.T., Davidson J.M., Gabdank I., et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 2018;46(D1):D794–D801. doi: 10.1093/nar/gkx1081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Lescot M., Dehais P., Thijs G., Marchal K., Moreau Y., Van de Peer Y., et al. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 2002;30(1):325–327. doi: 10.1093/nar/30.1.325. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Keenan A.B., Torre D., Lachmann A., Leong A.K., Wojciechowicz M.L., Utti V., et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 2019;47(W1):W212–W224. doi: 10.1093/nar/gkz446. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Liu Z.P., Wu C., Miao H., Wu H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015;2015:1–12. doi: 10.1093/database/bav095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Slack F.J., Chinnaiyan A.M. The role of Non-coding RNAs in oncology. Cell. 2019;179(5):1033–1055. doi: 10.1016/j.cell.2019.10.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Wilczynska A., Bushell M. The complexity of miRNA-mediated repression. Cell Death Differ. 2015;22(1):22–33. doi: 10.1038/cdd.2014.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Gebert L.F.R., MacRae I.J. Regulation of microRNA function in animals. Nat Rev Mol Cell Biol. 2019;20(1):21–37. doi: 10.1038/s41580-018-0045-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Bracken C.P., Scott H.S., Goodall G.J. A network-biology perspective of microRNA function and dysfunction in cancer. Nat Rev Genet. 2016;17(12):719–732. doi: 10.1038/nrg.2016.134. [DOI] [PubMed] [Google Scholar]
- 67.Ghini F., Rubolino C., Climent M., Simeone I., Marzi M.J., Nicassio F. Endogenous transcripts control miRNA levels and activity in mammalian cells by target-directed miRNA degradation. Nat Commun. 2018;9(1):3119. doi: 10.1038/s41467-018-05182-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Beer M.A., Shigaki D., Huangfu D. Enhancer predictions and genome-wide regulatory circuits. Annu Rev Genomics Hum Genet. 2020;21:37–54. doi: 10.1146/annurev-genom-121719-010946. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Karagkouni D., Paraskevopoulou M.D., Chatzopoulos S., Vlachos I.S., Tastsoglou S., Kanellos I., et al. DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA-gene interactions. Nucleic Acids Res. 2018;46(D1):D239–D245. doi: 10.1093/nar/gkx1141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Agarwal V., Bell G.W., Nam J.W., Bartel D.P. Predicting effective microRNA target sites in mammalian mRNAs. Elife. 2015;4 doi: 10.7554/eLife.05005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Chen Y., Wang X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020;48(D1):D127–D131. doi: 10.1093/nar/gkz757. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Bandyopadhyay S., Mitra R. TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics. 2009;25(20):2625–2631. doi: 10.1093/bioinformatics/btp503. [DOI] [PubMed] [Google Scholar]
- 73.Nejadi Orang F., Abdoli S.M. Competing endogenous RNA networks and ferroptosis in cancer: novel therapeutic targets. Cell Death Dis. 2024;15(5):357. doi: 10.1038/s41419-024-06732-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Salmena L., Poliseno L., Tay Y., Kats L., Pandolfi PP. A ceRNA hypothesis: the rosetta stone of a hidden RNA language? Cell. 2011;146(3):353–358. doi: 10.1016/j.cell.2011.07.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Tay Y., Rinn J., Pandolfi PP. The multilayered complexity of ceRNA crosstalk and competition. Nature. 2014;505(7483):344–352. doi: 10.1038/nature12986. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Sumazin P., Yang X., Chiu H.S., Chung W.J., Iyer A., Llobet-Navas D., et al. An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell. 2011;147(2):370–381. doi: 10.1016/j.cell.2011.09.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Wang P., Li X., Gao Y., Guo Q., Wang Y., Fang Y., et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res. 2019;47(D1):D121–D127. doi: 10.1093/nar/gky1144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Karagkouni D., Paraskevopoulou M.D., Tastsoglou S., Skoufos G., Karavangeli A., Pierros V., et al. DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts. Nucleic Acids Res. 2020;48(D1):D101–D110. doi: 10.1093/nar/gkz1036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Li J.H., Liu S., Zhou H., Qu L.H., Yang J.H. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42(D1):D92–D97. doi: 10.1093/nar/gkt1248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Wang P., Zhi H., Zhang Y., Liu Y., Zhang J., Gao Y., et al. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs. Database (Oxford) 2015;2015:1–7. doi: 10.1093/database/bav098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Jeggari A., Marks D.S., Larsson E. miRcode: a map of putative microRNA target sites in the long non-coding transcriptome. Bioinformatics. 2012;28(15):2062–2063. doi: 10.1093/bioinformatics/bts344. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Sarver A.L., Subramanian S. Competing endogenous RNA database. Bioinformation. 2012;8(15):731–733. doi: 10.6026/97320630008731. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Das S., Ghosal S., Sen R., Chakrabarti J. lnCeDB: database of human long noncoding RNA acting as competing endogenous RNA. PLoS One. 2014;9(6) doi: 10.1371/journal.pone.0098965. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Hunter T. Signaling–2000 and beyond. Cell. 2000;100(1):113–127. doi: 10.1016/s0092-8674(00)81688-8. [DOI] [PubMed] [Google Scholar]
- 85.Procaccini C., Carbone F., Di Silvestre D., Brambilla F., De Rosa V., Galgani M., et al. The proteomic landscape of human ex vivo regulatory and conventional T cells reveals specific metabolic requirements. Immunity. 2016;44(2):406–421. doi: 10.1016/j.immuni.2016.01.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Cho H., Berger B., Peng J. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 2016;3(6):540–548. doi: 10.1016/j.cels.2016.10.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Gustafsson M., Nestor C.E., Zhang H., Barabasi A.L., Baranzini S., Brunak S., et al. Modules, networks and systems medicine for understanding disease and aiding diagnosis. Genome Med. 2014;6(10):1–11. doi: 10.1186/s13073-014-0082-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Zhu L., Malatras A., Thorley M., Aghoghogbe I., Mer A., Duguez S., et al. CellWhere: graphical display of interaction networks organized on subcellular localizations. Nucleic Acids Res. 2015;43(W1):W571–W575. doi: 10.1093/nar/gkv354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Liu X., Salokas K., Tamene F., Jiu Y., Weldatsadik R.G., Ohman T., et al. An AP-MS- and BioID-compatible MAC-tag enables comprehensive mapping of protein interactions and subcellular localizations. Nat Commun. 2018;9(1):1188. doi: 10.1038/s41467-018-03523-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Cheng F., Lu W., Liu C., Fang J., Hou Y., Handy D.E., et al. A genome-wide positioning systems network algorithm for in silico drug repurposing. Nat Commun. 2019;10(1):3476. doi: 10.1038/s41467-019-10744-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Escala-Garcia M., Abraham J., Andrulis I.L., Anton-Culver H., Arndt V., Ashworth A., et al. A network analysis to identify mediators of germline-driven differences in breast cancer prognosis. Nat Commun. 2020;11(1):312. doi: 10.1038/s41467-019-14100-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Orchard S., Kerrien S., Abbani S., Aranda B., Bhate J., Bidwell S., et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods. 2012;9(4):345–350. doi: 10.1038/nmeth.1931. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Luck K., Kim D.K., Lambourne L., Spirohn K., Begg B.E., Bian W., et al. A reference map of the human binary protein interactome. Nature. 2020;580(7803):402–408. doi: 10.1038/s41586-020-2188-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Keshava Prasad T.S., Goel R., Kandasamy K., Keerthikumar S., Kumar S., Mathivanan S., et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 2009;37(D1):D767–D772. doi: 10.1093/nar/gkn892. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Oughtred R., Stark C., Breitkreutz B.J., Rust J., Boucher L., Chang C., et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47(D1):D529–D541. doi: 10.1093/nar/gky1079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Li T., Wernersson R., Hansen R.B., Horn H., Mercer J., Slodkowicz G., et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods. 2017;14(1):61–64. doi: 10.1038/nmeth.4083. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Kim D.E., Chivian D., Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 2004;32(W1):W526-W531 doi: 10.1093/nar/gkh468. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Du X., Sun S., Hu C., Yao Y., Yan Y., Zhang Y. DeepPPI: Boosting Prediction of Protein-Protein Interactions with Deep Neural Networks. J Chem Inf Model. 2017;57(6):1499–1510. doi: 10.1021/acs.jcim.7b00028. [DOI] [PubMed] [Google Scholar]
- 100.Lee I., Keum J., Nam H. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol. 2019;15(6) doi: 10.1371/journal.pcbi.1007129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Hashemifar S., Neyshabur B., Khan A.A., Xu J. Predicting protein-protein interactions through sequence-based deep learning. Bioinformatics. 2018;34(17):i802–i810. doi: 10.1093/bioinformatics/bty573. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Nie X., Qin D., Zhou X., Duo H., Hao Y., Li B., et al. Clustering ensemble in scRNA-seq data analysis: methods, applications and challenges. Comput Biol Med. 2023;159 doi: 10.1016/j.compbiomed.2023.106939. [DOI] [PubMed] [Google Scholar]
- 103.Choobdar S., Ahsen M.E., Crawford J., Tomasoni M., Fang T., Lamparter D., et al. Assessment of network module identification across complex diseases. Nat Methods. 2019;16(9):843–852. doi: 10.1038/s41592-019-0509-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Tornow S., Mewes H.W. Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res. 2003;31(21):6283–6289. doi: 10.1093/nar/gkg838. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Ahn Y.Y., Bagrow J.P., Lehmann S. Link communities reveal multiscale complexity in networks. Nature. 2010;466(7307):761–764. doi: 10.1038/nature09182. [DOI] [PubMed] [Google Scholar]
- 106.Bader G.D., Hogue C.W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinf. 2003;4:2. doi: 10.1186/1471-2105-4-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Johnson J.S., De Veaux N., Rives A.W., Lahaye X., Lucas S.Y., Perot B.P., et al. A Comprehensive Map of the Monocyte-Derived Dendritic Cell Transcriptional Network Engaged upon Innate Sensing of HIV. Cell Rep. 2020;30(3):914–931. doi: 10.1016/j.celrep.2019.12.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Thompson D., Regev A., Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol. 2015;31:399–428. doi: 10.1146/annurev-cellbio-100913-012908. [DOI] [PubMed] [Google Scholar]
- 109.Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Roy S., Bhattacharyya D.K., Kalita J.K. Reconstruction of gene co-expression network from microarray data using local expression patterns. BMC Bioinf. 2014;15(Suppl 7):S10. doi: 10.1186/1471-2105-15-S7-S10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Petereit J., Smith S., Harris F.C., Jr., Schlauch K.A. petal: co-expression network modelling in R. BMC Syst Biol. 2016;10(Suppl 2):51. doi: 10.1186/s12918-016-0298-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Russo P.S.T., Ferreira G.R., Cardozo L.E., Burger M.C., Arias-Carrasco R., Maruyama S.R., et al. CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses. BMC Bioinf. 2018;19(1):56. doi: 10.1186/s12859-018-2053-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113.Liang C., Li Y., Luo J., Zhang Z. A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human. Bioinformatics. 2015;31(14):2348–2355. doi: 10.1093/bioinformatics/btv159. [DOI] [PubMed] [Google Scholar]
- 114.Callahan T.J., Tripodi I.J., Pielke-Lombardo H., Hunter L.E. Knowledge-based biomedical data science. Annu Rev Biomed Data Sci. 2020;3:23–41. doi: 10.1146/annurev-biodatasci-010820-091627. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Fei H., Ren Y., Zhang Y., Ji D., Liang X. Enriching contextualized language model from knowledge graph for biomedical information extraction. Brief Bioinform. 2021;22(3) doi: 10.1093/bib/bbaa110. [DOI] [PubMed] [Google Scholar]
- 116.Ye Q., Hsieh C.Y., Yang Z., Kang Y., Chen J., Cao D., et al. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun. 2021;12(1):6775. doi: 10.1038/s41467-021-27137-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Wei C.H., Allot A., Leaman R., Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 2019;47(W1):W587–W593. doi: 10.1093/nar/gkz389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Lee S., Kim D., Lee K., Choi J., Kim S., Jeon M., et al. BEST: next-generation biomedical entity search tool for knowledge discovery from biomedical literature. PLoS One. 2016;11(10) doi: 10.1371/journal.pone.0164680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Chen X., Jia S., Xiang Y. A review: Knowledge reasoning over knowledge graph. Expert Syst Appl. 2020;141 doi: 10.1016/j.eswa.2019.112948. [DOI] [Google Scholar]
- 120.Dogan T., Atas H., Joshi V., Atakan A., Rifaioglu A.S., Nalbat E., et al. CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations. Nucleic Acids Res. 2021;49(16):e96. doi: 10.1093/nar/gkab543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Messina A., Fiannaca A., La Paglia L., La Rosa M., Urso A. BioGraph: a web application and a graph database for querying and analyzing bioinformatics resources. BMC Syst Biol. 2018;12(Suppl 5):98. doi: 10.1186/s12918-018-0616-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Yang H., Robinson P.N., Wang K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat Methods. 2015;12(9):841–843. doi: 10.1038/nmeth.3484. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.He J.P., Zhao M., Zhang W.Q., Huang M.Y., Zhu C., Cheng H.Z., et al. Identification of gene expression changes associated with uterine receptivity in mice. Front Physiol. 2019;10:125. doi: 10.3389/fphys.2019.00125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124.Sanders D.A., Ross-Innes C.S., Beraldi D., Carroll J.S., Balasubramanian S. Genome-wide mapping of FOXM1 binding reveals co-binding with estrogen receptor alpha in breast cancer cells. Genome Biol. 2013;14(1):R6. doi: 10.1186/gb-2013-14-1-r6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125.Sonawane A.R., Platig J., Fagny M., Chen C.Y., Paulson J.N., Lopes-Ramos C.M., et al. Understanding tissue-specific gene regulation. Cell Rep. 2017;21(4):1077–1088. doi: 10.1016/j.celrep.2017.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126.Oesper L., Merico D., Isserlin R., Bader G.D. WordCloud: a Cytoscape plugin to create a visual semantic summary of networks. Source Code Biol Med. 2011;6(1):7. doi: 10.1186/1751-0473-6-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127.He H., Duo H., Hao Y., Zhang X., Zhou X., Zeng Y., et al. Computational drug repurposing by exploiting large-scale gene expression data: strategy, methods and applications. Comput Biol Med. 2023;155 doi: 10.1016/j.compbiomed.2023.106671. [DOI] [PubMed] [Google Scholar]
- 128.Nie X., Wei J., Hao Y., Tao J., Li Y., Liu M., et al. Consistent biomarkers and related pathogenesis underlying asthma revealed by systems biology approach. Int J Mol Sci. 2019;20(16):4037. doi: 10.3390/ijms20164037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 129.Hassane D.C., Guzman M.L., Corbett C., Li X., Abboud R., Young F., et al. Discovery of agents that eradicate leukemia stem cells using an in silico screen of public gene expression data. Blood. 2008;111(12):5654–5662. doi: 10.1182/blood-2007-11-126003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130.Zhong Y., Chen E.Y., Liu R., Chuang P.Y., Mallipattu S.K., Tan C.M., et al. Renoprotective effect of combined inhibition of angiotensin-converting enzyme and histone deacetylase. J Am Soc Nephrol. 2013;24(5):801–811. doi: 10.1681/ASN.2012060590. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131.Lamb J., Crawford E.D., Peck D., Modell J.W., Blat I.C., Wrobel M.J., et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–1935. doi: 10.1126/science.1132939. [DOI] [PubMed] [Google Scholar]
- 132.Chen B., Ma L., Paik H., Sirota M., Wei W., Chua M.S., et al. Reversal of cancer gene expression correlates with drug efficacy and reveals therapeutic targets. Nat Commun. 2017;8(1):16022. doi: 10.1038/ncomms16022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Ballard C., Aarsland D., Cummings J., O'Brien J., Mills R., Molinuevo J.L., et al. Drug repositioning and repurposing for Alzheimer disease. Nat Rev Neurol. 2020;16(12):661–673. doi: 10.1038/s41582-020-0397-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134.Keenan A.B., Jenkins S.L., Jagodnik K.M., Koplev S., He E., Torre D., et al. The library of integrated network-based cellular signatures NIH program: aystem-level cataloging of human cells response to perturbations. Cell Syst. 2018;6(1):13–24. doi: 10.1016/j.cels.2017.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135.Mill C.P., Fiskus W., DiNardo C.D., Qian Y., Raina K., Rajapakshe K., et al. RUNX1-targeted therapy for AML expressing somatic or germline mutation in RUNX1. Blood. 2019;134(1):59–73. doi: 10.1182/blood.2018893982. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136.Yoo M., Shin J., Kim J., Ryall K.A., Lee K., Lee S., et al. DSigDB: drug signatures database for gene set analysis. Bioinformatics. 2015;31(18):3069–3071. doi: 10.1093/bioinformatics/btv313. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137.Barbarino J.M., Whirl-Carrillo M., Altman R.B., Klein T.E. PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip Rev Syst Biol Med. 2018;10(4) doi: 10.1002/wsbm.1417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 138.Ru J., Li P., Wang J., Zhou W., Li B., Huang C., et al. TCMSP: a database of systems pharmacology for drug discovery from herbal medicines. J Cheminform. 2014;6:13. doi: 10.1186/1758-2946-6-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139.Zhou Y., Zhang Y., Zhao D., Yu X., Shen X., Zhou Y., et al. TTD: Therapeutic Target Database describing target druggability information. Nucleic Acids Res. 2024;52(D1):D1465–D1477. doi: 10.1093/nar/gkad751. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 140.Daina A., Michielin O., Zoete V. SwissTargetPrediction: updated data and new features for efficient prediction of protein targets of small molecules. Nucleic Acids Res. 2019;47(W1):W357–W364. doi: 10.1093/nar/gkz382. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141.Stelzer G., Rosen N., Plaschkes I., Zimmerman S., Twik M., Fishilevich S., et al. The genecards suite: from gene data mining to disease genome sequence analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1–1.30.33. doi: 10.1002/cpbi.5. [DOI] [PubMed] [Google Scholar]
- 142.Knox C., Wilson M., Klinger C.M., Franklin M., Oler E., Wilson A., et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 2024;52(D1):D1265–D1275. doi: 10.1093/nar/gkad976. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143.Wang X., Shen Y., Wang S., Li S., Zhang W., Liu X., et al. PharmMapper 2017 update: a web server for potential drug target identification with a comprehensive target pharmacophore database. Nucleic Acids Res. 2017;45(W1):W356–W360. doi: 10.1093/nar/gkx374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 144.Yue S.J., Xin L.T., Fan Y.C., Li S.J., Tang Y.P., Duan J.A., et al. Herb pair Danggui-Honghua: mechanisms underlying blood stasis syndrome by system pharmacology approach. Sci Rep. 2017;7:40318. doi: 10.1038/srep40318. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145.Liang X., Li H., Li S. A novel network pharmacology approach to analyse traditional herbal formulae: the Liu-Wei-Di-Huang pill as a case study. Mol Biosyst. 2014;10(5):1014–1022. doi: 10.1039/c3mb70507b. [DOI] [PubMed] [Google Scholar]
- 146.Liu B., Zhang J., Shao L., Yao J. Network pharmacology analysis and molecular docking to unveil the potential mechanisms of San-Huang-Chai-Zhu formula treating cholestasis. PLoS One. 2022;17(2) doi: 10.1371/journal.pone.0264398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 147.Huang H., Xu J., Zhang S., Zhao J., Liu S., Tian L., et al. A network pharmacology-based approach to explore the active ingredients and molecular mechanism of Shen-Kui-Tong-Mai granules on a rat model with chronic heart failure. J Pharm Pharmacol. 2023;75(6):764–783. doi: 10.1093/jpp/rgad009. [DOI] [PubMed] [Google Scholar]