Briefings in Bioinformatics. 2025 Nov 9;26(6):bbaf584. doi: 10.1093/bib/bbaf584

GT-GRN: a graph transformer framework for enhanced gene regulatory network inference via multimodal embedding of expression data and existing network knowledge

Binon Teji 1, Swarup Roy 2,3, Dinabandhu Bhandari 4, Jugal Kalita 5
PMCID: PMC12597036  PMID: 41206950

Abstract

The inference of gene regulatory networks (GRNs) is critical for understanding the regulatory mechanisms underlying cellular development, functional specialization, and disease progression. Predicting regulatory gene interactions, often framed as a link prediction task, is a foundational step toward modeling cellular behavior. However, GRN inference from gene coexpression data alone is limited by noise, low interpretability, and difficulty in capturing indirect regulatory signals. Additionally, challenges such as data sparsity, nonlinearity, and complex gene interactions hinder accurate network reconstruction. To address these issues, we propose GT-GRN, a novel graph transformer (GT)-based framework that enhances GRN inference by integrating multimodal gene embeddings. Our method combines three complementary sources of information: (i) autoencoder-based embeddings, which capture high-dimensional gene expression patterns while preserving biological signals; (ii) structural embeddings, derived from previously inferred GRNs and encoded via random walks and a Bidirectional Encoder Representations from Transformers (BERT)-based language model to learn global gene representations; and (iii) positional encodings, capturing each gene’s role within the network topology. These heterogeneous features are fused and processed using a GT, allowing the joint modeling of both local and global regulatory structures. Experimental results on benchmark datasets show that GT-GRN outperforms existing GRN inference methods in predictive accuracy and robustness. Furthermore, it reconstructs cell-type-specific GRNs with high fidelity and produces gene embeddings that generalize to other tasks such as cell-type annotation.

Keywords: network inference, graph transformer, graph generation, gene expression, single-cell RNA seq, microarray, data fusion, embedding, global embeddings

Introduction

Systems biology seeks to understand the big picture of complex biological systems, focusing on the extraction of relevant biological information within an organism at the cellular level. Biological components, such as genes, interact with each other, and these interactions can be reconstructed as gene regulatory networks (GRNs) from observational gene expression data [1]. This reconstruction unveils a complex web of interactions, shedding light on the underlying patterns that govern gene regulation. Inferring GRNs from gene expression data is crucial for understanding the molecular interaction patterns among genes. A gene network consists of interlinked genes, where the expression of a gene influences the activity of other genes in the network [2]. An effective approach to describing GRNs involves graphical and mathematical modeling, often grounded in graph-theoretic formalism, to capture complex interactions between genes. Formally, a GRN is represented as a network of nodes and edges, where the nodes represent genes and the edges represent the regulatory interactions between them [3]. GRN inference involves predicting the connections among macromolecules by analyzing their relative expression patterns.

Technologies such as DNA microarray [4], single-cell RNA sequencing (scRNA-seq) [5], and single-nucleus RNA sequencing (snRNA-seq) [6] have revolutionized transcriptomics by offering diverse and detailed insights into gene expression. Although each of these technologies has its unique strengths, they also come with certain limitations. Common limitations include noisy gene expression data, which often complicate the inference process. Furthermore, the dynamic and nonlinear nature of gene–gene regulatory interactions presents a significant challenge, as traditional or linear methods often fail to capture the complex relationships comprehensively. In addition, GRNs tend to be sparse, further reducing the overall accuracy of inference methods. In the case of single-cell technologies, significant dropout events introduce a large number of zero counts in the expression matrix, adding another layer of complexity [7]. These challenges highlight the need for more robust and sophisticated approaches to effectively analyze and interpret gene expression data.

A wide range of supervised and unsupervised GRN inference methods has been developed to uncover the intricate relationships within gene networks [8, 9]. Early approaches relied on relatively simple techniques such as correlation analysis, mutual information (MI)-based methods, and differential equation models. Attempts have been made to understand these complex relationships [10–12]. However, many of these methods exhibit inherent limitations. Thus, the development of more reliable GRN inference techniques remains an important research goal, and numerous intelligent computational strategies have been proposed to address this challenge.

Despite these advancements, notable limitations persist. A key concern is that many approaches rely solely on a single source of information, typically gene expression data, to infer GRNs. This alone is often insufficient for accurate and reliable network prediction. In addition, some methods fail to incorporate knowledge from previously inferred GRNs, restricting their ability to build on existing insights. Furthermore, several techniques overlook the integration of topological information, which is essential to capture structural properties critical to robust GRN inference.

Rather than addressing each of the aforementioned issues in isolation, our approach adopts an integrated perspective driven by the intuition that combining multiple complementary sources of information, beyond gene expression alone, can enhance the quality of GRN inference. We propose GT-GRN, a novel approach that integrates the strengths of both unsupervised inference methods and supervised learning frameworks. Our method combines outcomes from the available inference techniques to minimize method-specific biases, ultimately deriving a more realistic and biologically meaningful GRN. To integrate multiple networks, rather than relying on graph neural network (GNN)-based methods, which often suffer from over-smoothing when stacking multiple layers, we adopt a state-of-the-art unsupervised approach based on natural language processing (NLP) that effectively captures and integrates information across networks. GT-GRN leverages recent advances in graph transformer (GT) models to enhance GRN inference. It integrates three distinct representations derived from input expression networks: (i) topological features, which capture the structural properties of the network; (ii) gene expression values, which are crucial for identifying gene interactions; and (iii) the positional importance of genes, which reflects their functional relevance within the network. By fusing the multimodal embeddings from diverse perspectives, our framework improves both the interpretability and predictive power of inferred GRNs, making it a robust solution for various biological applications. The advantages of GT-GRN stem from the following design decisions.

  • (1) Multinetwork integration: A key challenge in supervised GRN inference is the absence of ground-truth networks. True GRNs are often incomplete or unavailable, so we must rely on inferred networks as proxies. However, using a single inferred network can introduce bias or overlook critical interactions. We incorporate multiple networks inferred by different methods, harnessing their complementary strengths. While various inference models exist, each with its own set of advantages and limitations, combining these diverse sources allows us to leverage their shared strengths. This approach helps mitigate methodological bias, ultimately enhancing the confidence and accuracy of our GRN predictions.

  • (2) Gene expression embedding: Capturing meaningful representations of gene expression data through advanced embedding techniques can provide a richer understanding of the underlying regulatory mechanisms and improve GRN inference.

  • (3) GT frameworks: Traditional GNNs rely on local message-passing mechanisms to infer graph structures. However, adopting GT-based frameworks offers a more effective encoding strategy by leveraging global attention mechanisms, enabling better capture of complex regulatory relationships in GRNs.

The contributions of our present work are listed below:

  • We capture the quantitative characteristics of gene expression profiles through an autoencoder that learns biologically meaningful latent representations, effectively summarizing complex gene activity patterns while preserving essential regulatory signals (Section “Gene Expression Feature Encoding”).

  • We introduce a method to consolidate prior knowledge from multiple inferred GRNs by converting networks into text-like sequences, enabling a BERT-based masked language model to learn global gene embeddings that integrate structural information across all networks (Section “Global Embeddings via multinetwork integration of the inferred GRNs”).

  • We propose a novel framework, GT-GRN, which leverages attention mechanisms within a GT model to learn rich gene embeddings by integrating multisource data—including gene expression profiles, structured inferred knowledge, and graph positional encodings from the input graph. These unified gene embeddings effectively capture the underlying biological relationships between genes, facilitating enhanced GRN inference (Section “Graph transformer for GRN inference”).

  • We demonstrate that GT-GRN effectively advances cell-type-specific GRN reconstruction. Moreover, the superior quality of the learned embeddings enables their successful application to cell type annotation tasks, highlighting the model’s robustness and generalizability.

The remainder of the paper is organized as follows. Section “Related work” reviews related work in GRN inference. Section “Materials and methodology” describes the proposed GT-GRN framework, including gene expression embedding, multinetwork integration, and graph positional encoding, along with details of the datasets used. Section “Results and analysis” presents experimental results and analysis. Section “Application of GT-GRN on cell-type classification” demonstrates the classification capabilities of GT-GRN. Finally, Section “Conclusion” concludes the paper by summarizing key findings and describing directions for future research.

Related work

With decades of effort within the research community dedicated to deciphering gene regulatory relationships from gene expression data, numerous methods have been proposed for reconstructing GRNs [13]. Traditional approaches include regression-based techniques [14] and MI-based methods, such as Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [15], Minimum Redundancy Networks (MRNET) [16], and Context Likelihood of Relatedness (CLR) [17], which assess statistical dependencies between genes to infer potential regulatory interactions. Network Structure Controlling-based GRN inference method (NSCGRN) [18] is a global network partitioning and local network motif-based control framework for GRN inference. Global structure dominates the overall network structure to enforce hierarchy and sparsity while the local topology is refined using four known network motifs to adjust the specific patterns to improve biological plausibility.

Efforts from the area of computational biology have addressed data imbalance and noise with innovative modeling strategies based on complex-valued polynomial models [19]. Other approaches based on optimization, such as PGRNIG [20], which combines a parallel whale optimization algorithm with decomposition and regularization strategies, have shown high accuracy and speed in GRN inference from time-series data.

Supervised machine learning methods have also been explored for GRN inference. Support vector machines (SVMs) have been utilized to reconstruct biological networks through local modeling approaches [21]. Extensions such as CompareSVM [22] and GRADIS [23] further leverage classification-based frameworks to enhance network prediction accuracy.

With the advent of deep learning, more powerful and data-driven models have emerged. For instance, Daoudi et al. [24] proposed a deep neural network (DNN) model to infer GRNs from experimental data. Turki et al. [25] integrated both supervised and unsupervised learning techniques to perform link prediction on time-series gene expression data. Mao et al. [26] introduced a 3D convolutional neural network (CNN) model utilizing single-cell transcriptomic data, employing a novel labeling trick to enhance performance. GNE [27], a graph-based deep learning framework, unified known gene interactions and expression profiles to robustly infer GRNs in a scalable manner. Other works, such as Teji et al. [28, 29], use synthetic data to evaluate various embedding models for GRN inference in a link prediction setup.

Significant progress has been made in developing approaches that incorporate machine learning techniques to infer networks from gene expression [30]. A recent trend in network science, powered by graph neural networks (GNNs), has gained significant momentum in modeling graph-based applications. For example, Wang et al. propose the GRGNN [31] method to reconstruct GRNs from gene expression data in supervised and semi-supervised frameworks. The problem is formulated as a graph classification problem for GRN inference on DREAM5 benchmarks. Q-graph attention network (GAT) [32] proposes a quadratic complexity neuronal network using a dual attention mechanism for GRN inference. The model is validated by introducing adversarial perturbations to the gene expression data on Escherichia coli (E. coli) and S. cerevisiae datasets. Huang et al. [33] propose a GNN-based model called MIGGRI for GRN inference using spatial expression images that capture gene regulation from multiple images. DeepRIG [34] emphasizes learning global regulatory structures by embedding entire graphs using a graph autoencoder, thereby capturing comprehensive latent representations. GMFGRN [35] applies graph convolutional networks (GCNs) to factorize scRNA-seq data into gene and cell embeddings, which are then used in a multilayer perceptron (MLP) for interaction prediction. GNNLink [36] frames GRN inference as a link prediction problem by employing a GCN-based interaction graph encoder to capture and infer potential regulatory dependencies between genes. AnomalGRN [37] addresses the challenges of heterogeneity and sparsity in GRNs by reformulating GRN inference as a node prediction task. To tackle the pronounced imbalance between positive and negative links, the authors cast the problem as a graph anomaly detection (GAD) task, enabling the identification of anomalous regulatory patterns within the network.

Another line of work concentrates on Transformer-based architectures for GRN inference. TRENDY [38] leverages transformer models to construct a pseudo-covariance matrix as part of the WENDY [39] framework. Rather than generating GRNs from scratch, it enhances existing inferred GRNs. However, it does not incorporate additional structural side information into the inference process. STGRNS [40] is an interpretable transformer-based method for inferring GRNs from scRNA-seq data. This method considers only pairs of genes when predicting regulation, which excludes the possibility of indirect regulation. GRN-Transformer [41] utilizes multiple statistical features extracted from scRNA-seq data together with a GRN inferred by a single inference algorithm, PIDC [42].

Despite these advancements, many GRN inference methods still rely on a single source of data or even purely topological information. They often emphasize local or pairwise gene interactions. This narrow focus limits the depth and breadth of biological insights. Although deep learning approaches such as MLPs, CNNs, and GNNs have significantly improved inference performance, they frequently process each data modality in isolation, missing opportunities for deeper integration.

The present research takes a step forward by proposing a novel GT framework that integrates multiple sources of biological information for GRN inference. Unlike conventional methods that rely on convolution-based architectures, our approach leverages graph-based attention mechanisms to effectively model complex regulatory relationships. A key strength of this framework lies in its ability to fuse diverse embeddings derived from gene expression data, input graph structures, and both existing and previously inferred regulatory networks. By jointly leveraging these complementary sources within a unified model, it enables more accurate and biologically meaningful inference of GRNs.

Materials and methodology

This section outlines the methodology of the proposed GT-GRN framework for GRN inference, followed by a description of the datasets used for evaluation. The framework is composed of three key modules: (i) encoding gene expression profiles as embedding features using unsupervised deep learning; (ii) extracting global gene embeddings through multinetwork integration; and (iii) capturing graph positional encodings from the input network structure. These complementary representations—gene expression embeddings, prior gene representations, and graph positional encodings—are fused to enhance GRN interaction prediction within a GT model. The effectiveness of our approach is then evaluated using publicly available gene expression datasets.

Gene expression feature encoding

Gene expression data are increasingly complex due to advances in profiling technologies, making traditional linear models insufficient to capture their intricate patterns. Unsupervised deep learning, particularly variational autoencoders (VAEs) [43], offers a powerful way to nonlinearly encode such data into compact, informative representations that better reflect underlying biological structures. Figure 1 gives an overview of the VAE used to encode gene expression. VAEs are a class of probabilistic deep generative neural networks designed to reconstruct input data by learning a compressed, low-dimensional representation that effectively characterizes the input. Using a VAE for gene expression encoding, we can efficiently capture complex expression dynamics and generate compact feature representations for further analysis and downstream tasks. A VAE generally comprises two interconnected components: an encoder and a decoder.

Figure 1.

Alt Text: Conceptual illustration of a VAE showing the encoder mapping gene expression to latent space, latent sampling, and decoder reconstructing the expression matrix.

Variational autoencoder (VAE) for gene expression embedding, where the encoder maps the original gene expression matrix $X$ into latent variables ($\mu$, $\sigma$), and the decoder reconstructs $\hat{X}$ from the latent representation $z$ using a neural network to minimize the reconstruction loss.

  • Encoder: It maps the input gene expression matrix $X$ to a latent representation space $z$. It approximates the posterior distribution $p_\theta(z \mid X)$ using a neural network. The encoder outputs the parameters, i.e. the mean $\mu$ and variance $\sigma^2$, of a multivariate Gaussian distribution $q_\phi(z \mid X)$ that serves as an approximation of the true posterior $p_\theta(z \mid X)$. This process captures the underlying biological variability and regulatory patterns among genes.

  • Decoder: It takes a sample $z$ from the latent space and maps it back to the original gene expression space, generating a reconstructed matrix $\hat{X}$. This is modeled by the likelihood $p_\theta(X \mid z)$, which represents the probability of generating the observed gene expression profiles $X$ given the latent variables $z$. The decoder, implemented via a neural network, learns to reconstruct biologically plausible gene expression patterns from the learned latent representations.

During training, the VAE learns the encoder parameters $\phi$ and the decoder parameters $\theta$ by maximizing the Evidence Lower Bound (ELBO), which is given by:

$$\mathcal{L}(\theta, \phi; X) = \mathbb{E}_{q_\phi(z \mid X)}\big[\log p_\theta(X \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid X) \,\|\, p(z)\big) \tag{1}$$

where $\mathbb{E}_{q_\phi(z \mid X)}[\log p_\theta(X \mid z)]$ is the reconstruction term that reconstructs the input gene expression data $X$ given the latent representation $z$, and $D_{\mathrm{KL}}(q_\phi(z \mid X) \,\|\, p(z))$ is the KL (Kullback–Leibler) divergence that quantifies the distance between the approximate posterior $q_\phi(z \mid X)$ and the prior distribution $p(z)$.

Since sampling from learned distributions is inherently nondifferentiable, it hinders the use of gradient-based optimization during backpropagation. To address this, the reparameterization trick introduces a differentiable transformation by expressing the random variable as Inline graphic, where, Inline graphic is a deterministic function and Inline graphic is an auxiliary noise variable drawn from a fixed, independent distribution. The above problem can be rewritten as:

graphic file with name DmEquation2.gif
graphic file with name DmEquation3.gif

Where Inline graphic, Inline graphic is the element-wise product and Inline graphic is the identity matrix, which serves as the covariance matrix.
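The encoder, reparameterization step, decoder, and ELBO terms described in this section can be sketched in a few lines of NumPy. This is only an illustrative forward pass under stated assumptions: single linear maps stand in for the encoder and decoder networks, rows of the toy matrix are treated as genes, a squared-error term is used as the reconstruction loss, and all function and variable names are hypothetical rather than taken from the GT-GRN implementation.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), one value per row."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)

def negative_elbo(x, x_hat, mu, log_var):
    """Squared-error reconstruction term plus KL term, averaged over rows (genes)."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))

rng = np.random.default_rng(0)
n_genes, n_cells, latent_dim = 6, 10, 3          # toy sizes, chosen arbitrarily
X = rng.random((n_genes, n_cells))               # toy expression matrix (genes x cells)

# Assumption: one linear layer each for the encoder heads and the decoder.
W_mu = rng.normal(scale=0.1, size=(n_cells, latent_dim))
W_log_var = rng.normal(scale=0.1, size=(n_cells, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim, n_cells))

mu, log_var = X @ W_mu, X @ W_log_var            # encoder outputs
z = reparameterize(mu, log_var, rng)             # per-gene latent embedding
X_hat = z @ W_dec                                # decoder reconstruction
loss = negative_elbo(X, X_hat, mu, log_var)      # quantity to minimize
```

In a trained model, the rows of `mu` (or `z`) would serve as the gene expression embeddings passed on to later stages; here the weights are random, so the values carry no biological meaning.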

Global embeddings via multinetwork integration of the inferred gene regulatory networks

We integrate multinetwork information to understand gene interactions as prior knowledge. Integrating data from diverse inference methods, each with unique strengths and limitations, provides a holistic and reliable view of the network. This approach overcomes the shortcomings of relying on a single method, enabling robust downstream analysis. Figure 2 illustrates the workflow.

Figure 2.

Alt Text: Diagram showing how multiple gene networks are sampled into node sequences and processed by a transformer model to produce global gene embeddings across all networks.

Global gene embeddings via multinetwork integration, where gene expression data are processed through inference algorithms to generate networks, sampled via random walks into node sequences beginning with a [CLS] token, tokenized and embedded, and passed through a transformer trained via masked node prediction to produce final gene embeddings.

Unsupervised network integration via random walks and transformers

We present an unsupervised learning approach for integrating multiple networks to generate global embeddings. Let a graph $G = (V, E)$ represent an inferred network, where $V$ denotes the set of nodes and $E$ represents the set of edges. The graph is characterized by its adjacency matrix $A$, where each entry $A_{ij}$ reflects the relationship between nodes $v_i$ and $v_j$. Specifically, $A_{ij} = 1$ if an edge $e_{ij}$ connects $v_i$ and $v_j$, and $A_{ij} = 0$ otherwise, indicating no direct connection.

We consider a collection of Inline graphic networks, Inline graphic, sharing the same set of Inline graphic nodes but differing in the number of edges in each network. We capture the structural information of the networks by converting them into text-like sequences using random walks, similar to node2vec [44]. The walks are encoded through an embedding matrix Inline graphic, where Inline graphic is the size of the vocabulary (total nodes across all networks) and Inline graphic is the desired embedding dimension. Positional encodings are used to account for node order as described in [45]:

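$$PE_{(pos,\, i)} = \begin{cases} \sin\left(pos / 10000^{\,i/d}\right), & i \text{ even} \\ \cos\left(pos / 10000^{\,(i-1)/d}\right), & i \text{ odd} \end{cases} \tag{2}$$

Here, $PE_{(pos,\, i)}$ represents the $i$th coordinate of the position encoding at sequence position $pos$, and $d$ is the embedding dimension. These encodings are concatenated with the original input features or the embedding matrix.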
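As a small illustration, the sinusoidal positional encoding of [45] and its concatenation with token embeddings could be computed as below. The function name, the toy sizes, and the random token embeddings are assumptions made for this sketch only.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, dim):
    """Return a (seq_len, dim) matrix of sine/cosine position encodings."""
    pos = np.arange(seq_len)[:, None]            # sequence positions
    i = np.arange(dim)[None, :]                  # coordinate indices
    # Even coordinate i uses exponent i/dim; odd coordinate i reuses that of i - 1.
    angle = pos / np.power(10000.0, (i - i % 2) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

rng = np.random.default_rng(0)
seq_len, dim = 5, 8                              # toy sizes
token_emb = rng.normal(size=(seq_len, dim))      # stand-in for rows of the embedding matrix
pe = sinusoidal_positional_encoding(seq_len, dim)

# Concatenate along the feature axis, as the text describes.
x = np.concatenate([token_emb, pe], axis=1)
```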

The embedding matrix and the decoding layer are initialized with uniform random values, while the transformer layer is initialized using Xavier’s initialization [46]. During training, all parameters are updated. Each sequence begins with a special classification token [CLS], while other tokens correspond to node-specific vectors from the embedding matrix. The final hidden state of the [CLS] token for a given sequence serves as the sequence representation.
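The conversion of several networks over a shared gene set into text-like sequences can be sketched as follows. This is a simplified illustration, not the authors' code: it uses uniform random walks rather than the biased walks of node2vec, and the gene names, walk length, and number of walks per node are arbitrary placeholders.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk that visits at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:                        # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def networks_to_sequences(networks, walk_length, walks_per_node, seed=0):
    """Sample walks from every network and prepend the [CLS] token to each."""
    rng = random.Random(seed)
    sequences = []
    for adj in networks:
        for node in sorted(adj):
            for _ in range(walks_per_node):
                sequences.append(["[CLS]"] + random_walk(adj, node, walk_length, rng))
    return sequences

# Two toy inferred networks sharing the same genes but differing in their edges.
net_a = {"G1": ["G2", "G3"], "G2": ["G1"], "G3": ["G1"]}
net_b = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"]}
sequences = networks_to_sequences([net_a, net_b], walk_length=4, walks_per_node=2)
```

Because all networks share one vocabulary of gene tokens, walks sampled from different networks can be pooled into a single training corpus for the masked language model.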

Masked language learning with BERT

We utilize the Masked Language Modeling (MLM) approach as implemented in BERT [46]. At its core, this method employs a transformer encoder composed of Inline graphic identical blocks. Each block includes a self-attention mechanism followed by a feedforward neural network (FFN), as described in [45].

Let $X \in \mathbb{R}^{n \times d}$ denote an input sequence of $n$ tokens, where each token is represented by a $d$-dimensional vector. A self-attention layer processes this sequence using the following transformation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{3}$$

where $Q = X W_Q$, $K = X W_K$, and $V = X W_V$. Here, $W_Q$, $W_K$, and $W_V$ are learnable matrices that project the input into the query, key, and value spaces of node sequences, respectively, and $d_k$ is the dimension of the key vectors.
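A minimal NumPy sketch of single-head scaled dot-product self-attention, as introduced in [45], is given below. The weight matrices are random placeholders and the sizes are arbitrary; the sketch only illustrates the computation, not a trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Project tokens to queries, keys, and values, then mix values by attention weights."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))    # (n_tokens, n_tokens), rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 8, 4                 # toy sizes
X = rng.normal(size=(n_tokens, d_model))         # one embedded node sequence
W_q, W_k, W_v = (rng.normal(scale=0.3, size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
```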

The feedforward layer, applied independently to each token, performs the transformation:

graphic file with name DmEquation6.gif (4)

where Inline graphic and Inline graphic are learnable matrices, and Inline graphic and Inline graphic are bias vectors. FFN is the feed-forward network and Inline graphic is the global gene embeddings.

The MLM task involves masking a random subset of input tokens and predicting their identities based on the remaining context. This encourages the model to capture bidirectional contextual relationships within sequences. Specifically, we mask Inline graphic of the tokens (representing nodes) in each sequence and train the model to recover the masked tokens using a cross-entropy loss function:

graphic file with name DmEquation7.gif (5)

where Inline graphic is the batch size, Inline graphic is the sequence length and Inline graphic is the number of classes (total number of possible tokens that can be predicted). Inline graphic is a binary indicator equal to Inline graphic if the correct class of token Inline graphic in batch Inline graphic is Inline graphic, and Inline graphic is the predicted probability for this classification. The final embeddings are extracted from the embedding layer represented as Inline graphic.

This enables the model to learn rich contextual embeddings for nodes, capturing both structural and positional relationships within the networks.
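The masking step and the cross-entropy over masked positions can be sketched as below. Several details are assumptions made for illustration: the mask fraction is a free parameter (the value used here is arbitrary), every selected token is replaced by a [MASK] id, the loss is averaged over masked positions only, and the token ids and helper names are hypothetical.

```python
import numpy as np

def mask_tokens(token_ids, mask_id, mask_fraction, rng, special_ids=(0,)):
    """Replace a random subset of non-special tokens with the [MASK] id."""
    token_ids = np.asarray(token_ids)
    candidates = np.flatnonzero(~np.isin(token_ids, special_ids))
    n_mask = max(1, int(round(mask_fraction * len(candidates))))
    masked_pos = np.sort(rng.choice(candidates, size=n_mask, replace=False))
    corrupted = token_ids.copy()
    corrupted[masked_pos] = mask_id
    return corrupted, masked_pos

def masked_cross_entropy(logits, targets, masked_pos):
    """Mean negative log-probability of the true token at each masked position."""
    targets = np.asarray(targets)
    sel = logits[masked_pos]
    sel = sel - sel.max(axis=1, keepdims=True)   # stabilize the log-softmax
    log_probs = sel - np.log(np.exp(sel).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(masked_pos)), targets[masked_pos]].mean())

rng = np.random.default_rng(0)
vocab_size = 7                                   # ids: 0 = [CLS], 1 = [MASK], 2..6 = genes
sequence = np.array([0, 2, 3, 4, 5, 6])          # [CLS] followed by one walk
corrupted, masked_pos = mask_tokens(sequence, mask_id=1, mask_fraction=0.4, rng=rng)

# Stand-in for the transformer output: one logit vector per token position.
logits = rng.normal(size=(len(sequence), vocab_size))
loss = masked_cross_entropy(logits, sequence, masked_pos)
```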

Graph transformer for gene regulatory network inference

After deriving features from available GRNs and gene expression data, we utilize the GT to learn comprehensive representations by injecting the underlying regulatory structure. Since GT is specifically designed to model complex dependencies in graph-structured data, it effectively captures gene-gene interactions based on attention mechanisms, making it well suited for GRN-based representation learning. GT-GRN is illustrated in Fig. 3.

Figure 3.

Alt Text: Block diagram showing the GT-GRN model where GT layers process input graphs with gene expression, positional, and global embeddings to generate node representations, followed by a link predictor estimating connections between node pairs.

Architecture diagram of GT-GRN, where graph transformer layer processes input graph with gene expression, graph positional, and global embeddings to generate gene representations, followed by a link predictor module that estimates connections between two genes using their embeddings Inline graphic and Inline graphic.

Graph positional encodings

Transformers developed for NLP are supplied with positional encodings. In a GT, graph positional encodings play the analogous role of encoding node positions. From the available graph structure, we compute Laplacian eigenvectors and use them as graph positional encoding information. This encodes distance-aware information, i.e. nearby nodes have similar positional features and distant nodes have dissimilar ones. The eigenvectors are obtained from the factorization of the graph Laplacian matrix:

$$\Delta = I - D^{-1/2} A D^{-1/2} = U^{T} \Lambda U \tag{6}$$

where $A$ is the input adjacency matrix, $I$ is the identity matrix of size $n \times n$ with $n$ the number of nodes, $D$ is the degree matrix, $\Lambda$ contains the eigenvalues, and $U$ contains the eigenvectors. We then use the $k$ smallest significant eigenvectors of a node as its positional encoding, which is denoted by $\lambda_i$ for node $i$.
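Laplacian eigenvector positional encodings can be sketched with NumPy as follows. The sketch assumes a symmetric (undirected) adjacency matrix, so a directed GRN would first need to be symmetrized, and it skips the first, trivial eigenvector; both choices are assumptions of this example, and the function name is hypothetical.

```python
import numpy as np

def laplacian_positional_encoding(A, k):
    """Return the k eigenvectors of the normalized Laplacian following the trivial one."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5   # isolated nodes keep a zero entry
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                   # row i is the encoding of node i

# Toy undirected path graph on four genes: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
pos_enc = laplacian_positional_encoding(A, k=2)
```

Eigenvector signs are arbitrary, so implementations commonly randomize or otherwise fix the sign during training; that step is omitted here.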

Input to GT-GRN

The input to the GT layer is the graph structure Inline graphic and its associated features Inline graphic. The features Inline graphic are constructed as a combination of gene expression embeddings (Inline graphic), global gene embeddings (Inline graphic), and graph positional embeddings (Inline graphic), which are derived from the graph Inline graphic.

  • For gene expression embeddings, each gene Inline graphic is passed through a linear projection layer to embed it into a Inline graphic-dimensional space.
    graphic file with name DmEquation9.gif (7)
    where, Inline graphic is a learnable weight matrix, Inline graphic is the learnable bias vector, and Inline graphic is the projected embedding.
  • For the global gene embeddings, each vector Inline graphic is also embedded into a Inline graphic-dimensional space using a separate linear projection layer. The transformation is defined as:
    graphic file with name DmEquation10.gif (8)
    where, Inline graphic is a learnable weight matrix, Inline graphic is the learnable bias vector.
  • The graph positional encodings are extracted from the input graph Inline graphic. For a particular gene’s positional encoding Inline graphic is embedded into Inline graphic-dimensional space using a linear projection layer which is given by:
    graphic file with name DmEquation11.gif (9)
    where, Inline graphic is a learnable weight matrix, Inline graphic is a learnable bias vector.

Finally, the gene expression embeddings Inline graphic, global gene embeddings Inline graphic, and graph positional embeddings Inline graphic are each projected into a shared Inline graphic-dimensional space through separate linear transformation layers. These projected representations are then summed element-wise (often called fusion by summation) to form the final node features Inline graphic:

graphic file with name DmEquation12.gif (10)

where, Inline graphic is the unified, per-gene embedding that fuses its expression profile, its topological position in the regulatory network, and a dataset-wide global context. This final representation Inline graphic is then injected into the GT layer along with the adjacency matrix Inline graphic of the input graph Inline graphic.
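Fusion by summation can be illustrated with three separate linear projections into a shared space followed by an element-wise sum. The input dimensions, the random weights, and the helper names below are assumptions for this sketch; in the actual model the projection weights are learned.

```python
import numpy as np

def make_linear(in_dim, out_dim, rng):
    """Create a (weight, bias) pair for one linear projection layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def project(x, layer):
    """Apply a linear projection to every row (gene) of x."""
    W, b = layer
    return x @ W + b

def fuse(expr, glob, pos, layers):
    """Project each modality to the shared dimension and sum element-wise."""
    expr_layer, glob_layer, pos_layer = layers
    return project(expr, expr_layer) + project(glob, glob_layer) + project(pos, pos_layer)

rng = np.random.default_rng(0)
n_genes, d = 5, 8                                # toy number of genes and shared dimension
expr = rng.normal(size=(n_genes, 3))             # gene expression embeddings (e.g. VAE latents)
glob = rng.normal(size=(n_genes, 6))             # global gene embeddings (e.g. from the MLM)
pos = rng.normal(size=(n_genes, 2))              # graph positional encodings

layers = (make_linear(3, d, rng), make_linear(6, d, rng), make_linear(2, d, rng))
h0 = fuse(expr, glob, pos, layers)               # initial node features for the GT layer
```

Summation keeps the node feature size fixed at the shared dimension regardless of how many modalities are fused, whereas concatenation would grow it with each added modality.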

Graph transformer layer

The node update in the GT at layer Inline graphic is defined as follows:

graphic file with name DmEquation13.gif (11)
graphic file with name DmEquation14.gif (12)

and Inline graphic, Inline graphic denotes number of attention heads, and Inline graphic denotes the concatenation of the number of heads. Inline graphic, Inline graphic, Inline graphic, Inline graphic, The attention outputs Inline graphic are then passed to an FFN preceded and succeeded by residual connections and normalization layers as:

graphic file with name DmEquation15.gif (13)
graphic file with name DmEquation16.gif (14)
graphic file with name DmEquation17.gif (15)

where Inline graphic, Inline graphic, and Inline graphic, Inline graphic are the intermediate representations. Norm can be either LayerNorm [47] or BatchNorm [48].
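The following is a minimal, dependency-free sketch of one layer of this kind: multi-head attention restricted to each node's graph neighbors, followed by residual connections, normalization, and an FFN. It assumes scaled dot-product attention and LayerNorm, and it does not reproduce the exact form of Eqs. (11)–(15). All names (`gt_layer`, `init_params`, the toy adjacency lists) are hypothetical, and random matrices stand in for learned weights.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

def init_params(d, num_heads, rng):
    dk = d // num_heads  # per-head dimension
    return {
        "Q": [rand_matrix(dk, d, rng) for _ in range(num_heads)],
        "K": [rand_matrix(dk, d, rng) for _ in range(num_heads)],
        "V": [rand_matrix(dk, d, rng) for _ in range(num_heads)],
        "O": rand_matrix(d, d, rng),        # output projection after concatenation
        "W1": rand_matrix(2 * d, d, rng),   # FFN expansion (width is an assumption)
        "W2": rand_matrix(d, 2 * d, rng),   # FFN contraction
    }

def gt_layer(h, neighbors, p):
    """Update every node from its graph neighbors; neighbors[i] must be non-empty."""
    dk = len(p["Q"][0])
    out = []
    for i, h_i in enumerate(h):
        concat = []
        for Q, K, V in zip(p["Q"], p["K"], p["V"]):
            q = matvec(Q, h_i)
            scores = [sum(a * b for a, b in zip(q, matvec(K, h[j]))) / math.sqrt(dk)
                      for j in neighbors[i]]
            top = max(scores)                       # stabilise the softmax
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            head = [0.0] * dk
            for e, j in zip(exps, neighbors[i]):    # attention-weighted sum of values
                head = [a + (e / total) * b for a, b in zip(head, matvec(V, h[j]))]
            concat.extend(head)                     # concatenate the heads
        attn = matvec(p["O"], concat)
        x = layer_norm([a + b for a, b in zip(h_i, attn)])    # residual + norm
        hidden = [max(0.0, v) for v in matvec(p["W1"], x)]    # FFN with ReLU
        ffn = matvec(p["W2"], hidden)
        out.append(layer_norm([a + b for a, b in zip(x, ffn)]))  # residual + norm
    return out

rng = random.Random(0)
d, num_heads = 8, 2
params = init_params(d, num_heads, rng)
h = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(4)]
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}  # toy adjacency lists
h_next = gt_layer(h, neighbors, params)
```

Restricting the softmax to `neighbors[i]` is what ties the attention to the adjacency structure: after one layer, a node's representation depends only on itself and its direct neighbors.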

Link prediction with learned representations

The final module is designed to predict edges between nodes using the learned node representations Inline graphic obtained from the GT layer. The module takes as input the embeddings Inline graphic, where Inline graphic is the number of nodes and Inline graphic is the embedding dimension, along with an edge index representing node pairs. For each edge Inline graphic, the embeddings of the source node Inline graphic and the destination node Inline graphic are extracted and concatenated to form a feature vector. This vector is passed through a decoder network consisting of a multilayer perceptron (MLP) with a hidden layer, ReLU activation, and an output layer, which reduces the concatenated vector to a scalar. The scalar represents the predicted likelihood of an edge between the nodes Inline graphic and Inline graphic. By utilizing the updated node embeddings Inline graphic, this module effectively learns to identify and score potential edges in the graph.
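As a rough illustration of this decoder, the sketch below concatenates the source and destination embeddings of each edge and passes them through a one-hidden-layer MLP with ReLU. The final sigmoid that maps the scalar to a value in (0, 1) is an assumption, as are the dimensions, the random weights, and the names `score_edges` and `init_linear`.

```python
import math
import random

def linear(x, weight, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(in_dim, out_dim, rng):
    """Random weights standing in for learned parameters (hypothetical init)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def score_edges(z, edge_index, hidden_layer, output_layer):
    """Return one score per (source, destination) pair in edge_index."""
    scores = []
    for src, dst in edge_index:
        pair = z[src] + z[dst]                                    # concatenate embeddings
        hidden = [max(0.0, a) for a in linear(pair, *hidden_layer)]  # hidden layer + ReLU
        logit = linear(hidden, *output_layer)[0]                  # reduce to a scalar
        scores.append(1.0 / (1.0 + math.exp(-logit)))             # sigmoid (assumed)
    return scores

rng = random.Random(0)
n_nodes, d, hidden_dim = 5, 4, 8          # illustrative sizes
z = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_nodes)]
hidden_layer = init_linear(2 * d, hidden_dim, rng)
output_layer = init_linear(hidden_dim, 1, rng)
edge_index = [(0, 1), (1, 0), (2, 4)]     # candidate regulator -> target pairs
scores = score_edges(z, edge_index, hidden_layer, output_layer)
```

Because the two embeddings are concatenated in order, the pairs (0, 1) and (1, 0) generally receive different scores, which suits directed regulatory edges.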

Next, we describe the experimental setup used to evaluate GT-GRN.

Experimental setup

The performance of GT-GRN is evaluated on a Linux machine with an NVIDIA RTX A3000 GPU. The libraries used are PyTorch (https://pytorch.org/), DGL (https://www.dgl.ai/), scikit-learn (https://scikit-learn.org/stable/), and PyTorch Geometric (https://pytorch-geometric.readthedocs.io/en/latest/index.html).

Datasets

We establish our findings on scRNA-seq data from two cell types: human embryonic stem cells (hESC) [49] and mouse embryonic stem cells (mESC) [50] from the BEELINE [51] study. The cell-type-specific ChIP-seq ground-truth networks are used as a reference for these datasets. Additionally, we use synthetic expression profiles generated using GeneNetWeaver (GNW) [52], a simulation tool developed for DREAM (Dialogue on Reverse Engineering Assessment and Methods), along with their corresponding ground-truth networks. The details of the datasets are summarized in Table 1.

Table 1.

Dataset statistics

Species/cell types | Type | Source | Genes | Edges
Yeast | Microarray | GNW | 4000 | 11,323
hESC-500 | scRNA-seq | BEELINE | 910 | 3940
mESC-500 | scRNA-seq | BEELINE | 1120 | 20,923
hESC-1000 | scRNA-seq | BEELINE | 1410 | 6139
mESC-1000 | scRNA-seq | BEELINE | 1620 | 30,254

Preprocessing of raw data

We preprocess the raw scRNA-seq data using an established method [51] to handle redundancy. We filter out low-expressed genes and prioritize the variable ones. First, genes expressed in <10% of cells are removed. Then, we compute the variance and P-values for each gene, selecting those with P-values below 0.01 after Bonferroni correction. Gene expression levels are log-transformed for normalization. This yields a feature matrix Inline graphic, where Inline graphic is the number of genes and Inline graphic is the number of cells. Furthermore, we adopt the approach of Pratapa et al. [51] to assess performance across different network sizes. Specifically, we rank genes by variance and select the most variable transcription factors (TFs), along with the top 500 and 1000 genes with the highest variability.
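A simplified sketch of the filtering and ranking steps is given below. It covers only the detection filter, the log transform, and the variance ranking. The P-value test with Bonferroni correction is omitted because the statistical test is not specified here. Using `log1p` and ranking on log-transformed values are assumptions, and `select_variable_genes` and the toy data are hypothetical.

```python
import math
from statistics import pvariance

def select_variable_genes(counts, min_cell_fraction=0.10, top_k=2):
    """counts maps gene name -> list of expression values, one per cell."""
    kept = {}
    for gene, values in counts.items():
        # Fraction of cells in which the gene is detected.
        expressed = sum(1 for v in values if v > 0) / len(values)
        if expressed >= min_cell_fraction:
            kept[gene] = [math.log1p(v) for v in values]   # log-transform (assumed log1p)
    # Rank the surviving genes by variance, most variable first.
    ranked = sorted(kept, key=lambda g: pvariance(kept[g]), reverse=True)
    return {g: kept[g] for g in ranked[:top_k]}

# Toy matrix with 20 cells per gene.
counts = {
    "geneVar": [0, 10] * 10,       # highly variable
    "geneMid": [1, 2] * 10,        # mildly variable
    "geneFlat": [5] * 20,          # constant, variance 0
    "geneLow": [3] + [0] * 19,     # detected in 5% of cells, filtered out
}
selected = select_variable_genes(counts)
print(list(selected))
```

In the setting described above, `top_k` would correspond to the 500 or 1000 most variable genes, with the most variable TFs added separately.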

Baseline methods

We evaluate the efficacy of GT-GRN against existing baseline methods commonly used for inferring GRNs, which are listed in Table 2.

Table 2.

Summary of GRN inference methods classified by category

Category | Method | Description
Graph neural network | GNNLink [36] | Uses a GCN-based interaction graph encoder to capture gene expression patterns.
Graph neural network | GENELink [53] | Leverages a GAT to infer GRNs via attention mechanisms.
Graph neural network | GNE [27] | Uses an MLP to encode gene expression profiles and network topology for predicting gene regulatory links.
MI | ARACNE [15] | Infers networks based on adaptive partitioning (AP) and MI.
MI | BC3NET [54] | An ensemble technique derived from the C3NET algorithm employing bagging.
MI | C3MTC [55] | Infers networks where edge weights are defined by MI values.
MI | C3NET [56] | Uses MI and a maximization step to capture causal structure.
Feature selection | MRNET [16] | Applies supervised gene selection using MRMR (maximum relevance/minimum redundancy).
Ensemble tree-based | GRNBOOST2 [57] | A fast inference algorithm using stochastic gradient boosting regression.
Ensemble tree-based | GENIE3 [58] | A classic inference algorithm using random forest or extra trees regression.

Results and Discussion

We report results on both single-cell and microarray gene expression datasets, comparing GT-GRN against representative methods from each major category of GRN inference techniques: MI-based methods, feature selection approaches, ensemble tree-based models, and graph neural network frameworks. This diverse selection allows a balanced comparison across inference paradigms that highlights the strengths and limitations of each method.

Gene regulatory network inference via full network reconstruction

Fundamentally, GRN inference aims to reconstruct the entire regulatory network, capturing the full complexity of gene interactions. Complete network reconstruction seeks to reveal the regulatory relationships among all involved genes, reflecting the comprehensive regulatory architecture [59]. Following this motivation, we evaluate GT-GRN alongside existing GRN inference methods and report the full network reconstruction results in Fig. 4. For the scRNA-seq datasets, the performance of GT-GRN remains consistently higher than that of the baselines, with minor variations across datasets. This suggests that GT-GRN is robust across different cell types and network sizes.

Figure 4.

Alt Text: Bar plots comparing AUROC scores of different network reconstruction methods across various datasets.

Full network reconstruction performance of various methods for different datasets in terms of AUROC score. (a) BEELINE’s scRNA-seq datasets and (b) GNW’s Yeast dataset.

For the Yeast (microarray) dataset, GT-GRN significantly outperforms all other network inference methods, indicating its superior ability to capture gene interactions. Baseline methods such as ARACNE, BC3NET, C3MTC, C3NET, and MRNET perform at similar levels. In general, GT-GRN appears to be the most effective method for the microarray dataset, while for scRNA-seq expression profiles it demonstrates stable performance across conditions in terms of AUROC score.

Gene regulatory network inference via link prediction

Many benchmark methods treat GRN inference as a link prediction problem, focusing on identifying only a limited subset of interactions. We report results using this conventional approach. Table 3 reports performance on the candidate datasets in terms of the Area Under the Receiver-Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC), comparing GT-GRN with various baseline methods. GT-GRN achieves the highest AUROC on three of the four scRNA-seq datasets (mESC-1000, mESC-500, and hESC-500) and the highest AUPRC on both mESC datasets. GNE emerges as a strong contender on the hESC datasets: on hESC-1000 it achieves the highest AUPRC (0.9042) and the highest AUROC (0.9025), and on hESC-500 it achieves the highest AUPRC (0.8466). GENELink maintains strong performance across most datasets, while GNNLink performs well in AUROC but lags in AUPRC on the hESC datasets. In contrast, GENIE3 and GRNBOOST2 show consistently lower scores, indicating challenges in handling these complex datasets. For the Yeast-4000 dataset, GT-GRN demonstrates superior performance compared to its baseline counterparts. Overall, GT-GRN is the most consistent method across datasets, while GNE stands out in AUPRC on the hESC datasets, making them the top choices for biological network prediction in this study.

Table 3.

AUROC and AUPRC scores for different methods across the various scRNA-seq datasets

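Dataset | Method | AUROC | AUPRC
mESC-1000 | GT-GRN | 0.9483 | 0.8990
mESC-1000 | GNNLink | 0.8833 | 0.8660
mESC-1000 | GENELink | 0.9133 | 0.8103
mESC-1000 | GNE | 0.8984 | 0.8925
mESC-500 | GT-GRN | 0.9402 | 0.8853
mESC-500 | GNNLink | 0.8768 | 0.8331
mESC-500 | GENELink | 0.9057 | 0.8004
mESC-500 | GNE | 0.8378 | 0.8416
hESC-500 | GT-GRN | 0.8793 | 0.5932
hESC-500 | GNNLink | 0.8251 | 0.4542
hESC-500 | GENELink | 0.8618 | 0.5581
hESC-500 | GNE | 0.8402 | 0.8466
hESC-1000 | GT-GRN | 0.8784 | 0.8604
hESC-1000 | GNNLink | 0.8442 | 0.5011
hESC-1000 | GENELink | 0.8657 | 0.5610
hESC-1000 | GNE | 0.9025 | 0.9042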

Bold values indicate the best performance.
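For readers less familiar with the two metrics in Table 3, the sketch below computes them from scratch on a toy set of scored edges. AUROC is computed as the fraction of (positive, negative) pairs ranked correctly. AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used to produce the table. The labels and scores are made up for illustration.

```python
def auroc(labels, scores):
    """Probability that a random true edge outranks a random non-edge (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy example: 1 = edge present in the ground truth, 0 = absent.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(labels, scores), 4), round(average_precision(labels, scores), 4))
```

AUPRC is sensitive to class imbalance, which is one reason a method can rank well on AUROC yet lag on AUPRC for sparse ground-truth networks.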

Impact of hyperparameters

Additionally, we investigate the role of various hyperparameters (HPs) that influence the overall performance of the models. Optimal selection of HPs is a difficult and time-consuming task. We select key HPs for GT-GRN and for each baseline model and assess their impact on the overall performance of each model. Table 4 lists the HP values that we tuned for each learning model. We summarize the overall impact of HP tuning for different models using a boxplot, as shown in Fig. 5. This visualization presents a comprehensive comparison of our model, GT-GRN, against several baseline approaches in terms of AUROC performance across multiple datasets.

Table 4.

Tuned hyperparameters (HPs) for different models, where an epoch represents one full pass over the dataset, the learning rate controls the update pace, the output dimension defines the final prediction layer shape, attention heads compute and aggregate attention over input elements, and layers indicate the number of transformation steps applied sequentially to produce the output and pass to the next layer

Model | Epochs | Learning rate | Output dimensions | Attention heads | Layers
GT-GRN | – | 0.001, 0.003, 0.0005 | – | 2, 4, 8 | 4, 6, 8
GNNLink | 100, 200, 300 | 0.001, 0.005, 0.01 | 128, 256, 512 | – | –
GENELink | 10, 20, 30 | 0.001, 0.003, 0.0005 | 128, 256, 512 | – | –
GNE | 10, 20, 30 | 0.001, 0.005, 0.01 | 128, 256, 512 | – | –
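A search over such a grid can be enumerated with `itertools.product`. The sketch below uses the GT-GRN values from Table 4, reading them as learning rates, attention heads, and layers. That column assignment and the dictionary keys are assumptions of this example, and the training call is left out.

```python
from itertools import product

# GT-GRN search space as read from Table 4 (column assignment assumed).
grid = {
    "learning_rate": [0.001, 0.003, 0.0005],
    "attention_heads": [2, 4, 8],
    "layers": [4, 6, 8],
}

# One dictionary per configuration: 3 x 3 x 3 combinations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))
```

Each configuration would be trained and scored separately, and the spread of the resulting AUROC values is what a boxplot such as Fig. 5 summarizes.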

Figure 5.

Alt Text: Box plots comparing AUROC scores of GENELink, GNE, GNNLink, and GT-GRN models across hESC and mESC datasets, showing variation and consistency under different HP configurations.

Overall HP tuning plot for various models.

A key observation is that GT-GRN consistently achieves higher median AUROC scores with notably lower variance across datasets, highlighting its robustness and reliability under varying experimental conditions. Each model was tuned with its respective optimal HPs, yet GT-GRN exhibits both stability and effectiveness across settings, clearly outperforming the existing baselines on all candidate datasets. In contrast, GNNLink shows a higher variance, particularly in the hESC-500 dataset, where its performance fluctuates significantly. GENELink and GNE display relatively stable performances, though both fall short of the superior AUROC achieved by GT-GRN. In particular, the mESC-1000 dataset highlights the clear dominance of GT-GRN, with its AUROC surpassing that of all other methods by a substantial margin.

Application of GT-GRN on cell-type classification

Cell-type-specific GRNs are crucial for defining transcriptional states during development, with each cell-type being characterized by a unique set of active TFs. These GRNs offer an unbiased method for studying gene regulation, providing valuable insights into the mechanisms driving cellular diversity. In this context, we explore the effectiveness of GT-GRN in reconstructing cell-type-specific GRNs, with the goal of cell-type annotation. To achieve this, we apply GT-GRN to scRNA-seq data from over 8000 human peripheral blood mononuclear cells (PBMCs8k), sourced from 10X Genomics (https://www.10xgenomics.com/datasets/8-k-pbm-cs-from-a-healthy-donor-2-standard-2-1-0). The data are preprocessed using the Scanpy framework [60], ensuring efficient handling and analysis of the single-cell data. For the ground truth network, we utilize the hTFtarget database [61], which integrates ChIP-seq data, TF binding sites, and epigenetic modification information. This comprehensive resource provides detailed insights into gene regulation and TF-target interactions, making it an invaluable tool for studying gene regulatory mechanisms.

In order to perform cell-type annotation, it is essential to first reconstruct the GRN. This involves generating embeddings that represent the cell types based on their GRNs. These embeddings serve as the foundation for cell-type classification, enabling accurate annotation of the cell types based on their unique regulatory patterns. By leveraging the power of GT-GRN in inferring cell-type-specific GRNs, we aim to advance the cell-type annotation in scRNA-seq data, ultimately improving our understanding of the complex regulatory landscapes that define cellular identities.

GT-GRN for PBMC network reconstruction

We investigate GRN inference using GT-GRN by formulating it as a network regeneration problem on the PBMC dataset. First, we filter the data by removing genes expressed in <5% of cells and discarding cells that express <200 genes. Next, we normalize the total counts per cell to 10,000, ensuring comparability across cells. To further refine the data, we apply MAGIC [62] imputation, which reduces noise and improves expression patterns. Finally, we perform a logarithmic transformation to improve interpretability and optimize the data for downstream analysis.
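The filtering and normalization steps can be sketched in plain Python as follows. The MAGIC imputation step is omitted, `log1p` is assumed for the logarithmic transformation, and the cell filter is assumed to count genes before the gene filter is applied. `preprocess_cells` and the toy counts are hypothetical, and the thresholds in the example call are scaled down to fit the toy data.

```python
import math

def preprocess_cells(cells, min_genes=200, min_cell_fraction=0.05, target_sum=10_000):
    """cells: list of dicts mapping gene name -> raw count, one dict per cell."""
    # Count, for every gene, the number of cells in which it is detected.
    detected = {}
    for cell in cells:
        for gene, count in cell.items():
            if count > 0:
                detected[gene] = detected.get(gene, 0) + 1
    keep = {g for g, n in detected.items() if n / len(cells) >= min_cell_fraction}

    processed = []
    for cell in cells:
        # Discard cells that express too few genes.
        if sum(1 for c in cell.values() if c > 0) < min_genes:
            continue
        kept = {g: c for g, c in cell.items() if g in keep}
        total = sum(kept.values())
        if total == 0:
            continue
        # Scale every cell to the same total count, then log-transform.
        processed.append({g: math.log1p(c * target_sum / total) for g, c in kept.items()})
    return processed

toy_cells = [
    {"a": 3, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 5, "b": 0, "c": 0},   # expresses one gene only, so it is discarded
]
toy = preprocess_cells(toy_cells, min_genes=2, min_cell_fraction=0.5, target_sum=100)
print(len(toy), sorted(toy[0]))
```

With the defaults (`min_genes=200`, `min_cell_fraction=0.05`, `target_sum=10_000`) the function mirrors the thresholds stated above.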

After preprocessing, we employ GT-GRN to reconstruct the PBMC GRN and compare its performance against existing baseline methods. Table 5 compares PBMC’s hTFTarget network (Gold standard) with the generated networks, revealing key structural differences and variations in predictive performance. GT-GRN achieves the highest AUROC (0.9852) while maintaining balanced connectivity and clustering, making it the best-performing model. Supplementary Fig. S1 reports the ROC curve of the GT-GRN model, showing high discriminative ability with performance well above the random baseline (dashed line). GENELink and GNNLink exhibit dense connectivity, high clustering, and shorter path lengths but are highly disassortative, indicating a strong preference for high-degree nodes to connect to low-degree ones. GNE, with lower connectivity and clustering, results in longer path lengths and the lowest AUROC (0.7596) but retains some structural similarities to the Gold network. The Gold standard itself maintains moderate connectivity and a sparse clustering structure, serving as a key benchmark. We also summarize how closely the characteristics of each generated network match those of the input network (hTFTarget) with a single score, the Pearson correlation coefficient (PCC). All methods exhibit strong correlation, with GNE (0.9992) achieving the highest agreement, followed by GT-GRN (0.9838); GNNLink and GENELink both score 0.9817. These insights highlight the trade-offs between network structure and predictive performance, guiding model selection for biological network analysis.

Table 5.

Comparison of network characteristics of PBMC’s hTFTarget and generated networks, with AUROC score. Maximum degree is the largest degree over all vertices. Assortativity is the Pearson correlation of the degrees of connected nodes. Triangle count is the number of triangles, i.e. sets of three mutually connected nodes. Clustering coefficient measures the tendency of nodes in a network to form triangles. Characteristic path length is the average shortest path length between all node pairs in a network. PCC is the Pearson correlation coefficient between the characteristics of hTFTarget and those of the generated network

Model | Maximum degree | Assortativity | Triangle count | Clustering coefficient | Characteristic path length | AUROC | PCC
Gold | 1837 | −0.4762 | 9596 | 0.0060 | 2.7392 | – | –
GT-GRN | 2593 | −0.6614 | 207,579 | 0.0255 | 2.1934 | 0.9852 | 0.9838
GENELink | 3999 | −0.9867 | 5,671,530 | 0.0389 | 1.9731 | 0.8810 | 0.9817
GNNLink | 3999 | −0.9867 | 5,671,530 | 0.0389 | 1.9731 | 0.8467 | 0.9817
GNE | 924 | −0.1872 | 4014 | 0.0094 | 2.9621 | 0.7596 | 0.9992
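The PCC column can be approximated directly from the other columns of Table 5. The sketch below assumes that the coefficient is computed over the five raw characteristics (maximum degree, assortativity, triangle count, clustering coefficient, characteristic path length) without rescaling. Under that assumption the results should closely match the reported values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Order: max degree, assortativity, triangle count, clustering coeff., path length.
table5 = {
    "Gold":     [1837, -0.4762, 9596,    0.0060, 2.7392],
    "GT-GRN":   [2593, -0.6614, 207579,  0.0255, 2.1934],
    "GENELink": [3999, -0.9867, 5671530, 0.0389, 1.9731],
    "GNNLink":  [3999, -0.9867, 5671530, 0.0389, 1.9731],
    "GNE":      [924,  -0.1872, 4014,    0.0094, 2.9621],
}

pcc = {name: pearson(table5["Gold"], values)
       for name, values in table5.items() if name != "Gold"}
for name, value in pcc.items():
    print(f"{name}: {value:.4f}")
```

Because the characteristics differ by several orders of magnitude, a coefficient computed this way is dominated by the largest entries (triangle count and maximum degree), so it is best read alongside the individual columns.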

Further, we analyze the degree distributions of the generated networks in comparison to the input PBMC’s hTFTarget network. Figure 6 shows a log–log plot comparing the degree distributions of the input (Gold) and generated networks (GT-GRN, GENELink, GNNLink, and GNE). The Gold network follows the natural decay of a scale-free degree distribution, and GT-GRN shows a similar trend with slight deviations. GNE displays a more scattered pattern but a similarly tailed degree distribution. GENELink and GNNLink exhibit significantly higher maximum degrees, i.e. they produce more high-degree nodes than the input network.

Figure 6.

Alt Text: Line plot on log–log scale showing degree versus frequency distributions for the original PBMC hTFTarget network and networks generated by various models.

Log–log degree distribution plot of the input PBMC’s hTFTarget network and generated network for different models.

To optimize the architecture of the GT-GRN model, we conducted a comprehensive HP search across different numbers of layers and attention heads on the PBMC dataset. We evaluated model performance using the AUROC metric across varying combinations of input modalities: positional, global, and gene expression embeddings individually (unimodal), all pairwise combinations (bimodal), and the full trimodal input. Supplementary Fig. S2 shows the AUROC performance for each configuration, focusing on the number of layers and attention heads. Each line represents a different head configuration (2, 4, and 8). For single-modal embeddings (top row), performance varies moderately with layer depth: positional embeddings show a decline at higher layers, whereas global and gene expression embeddings remain relatively stable. In two-modal combinations (middle row), AUROC is generally higher than in single-modal cases, indicating that combining modalities improves predictive performance. Some combinations benefit from deeper layers, while others peak at intermediate depths. For the three-modal combination (bottom row), integrating all three embeddings achieves the highest AUROC overall, although the optimal layer and head configuration differ slightly, reflecting complex interactions between modalities.

Furthermore, we report the computational efficiency of GT-GRN in comparison to the baselines on the PBMC dataset, as detailed in Supplementary Table T1. It highlights a clear trade-off between predictive performance and resource requirements. GT-GRN incurs a higher computational cost than lightweight methods such as GNNLink and GNE, requiring approximately 1.5 h for execution on the PBMC dataset, while GNNLink and GNE complete in under 12 min and 1 min, respectively. Although GT-GRN is more resource-intensive, this additional cost stems from its GT architecture, which jointly integrates positional, global, and gene expression embeddings. In contrast, baseline methods rely on simpler architectures with limited feature integration, resulting in faster runtimes but reduced representational capacity. Importantly, GT-GRN remains more efficient than GENELink, which exceeds 2 h of runtime, suggesting that our method balances accuracy and computational feasibility.

Next, we assess how well our embeddings capture the community structure by clustering gene embeddings. We use the Leiden algorithm [63] to cluster the resultant gene embeddings. Figure 7 presents a uniform manifold approximation and projection (UMAP) dimensional reduction of gene representations for various methods. The methods compared include Gene Expression data, GENELink, GNNLink, GNE, and GT-GRN. The visualization shows distinct community structures in the GNNLink and GT-GRN embeddings, indicating effective preservation of biological modules. Notably, GT-GRN embeddings exhibit tighter and more biologically coherent clusters that align with functional gene modules, suggesting that the model captures regulatory programs. These preserved clusters provide evidence of pathway-level organization, highlighting the ability of GT-GRN to reveal biologically meaningful communities relevant to cellular processes.

Figure 7.

Alt Text: UMAP visualizations of gene embeddings from different models on the PBMC dataset.

UMAP visualization of gene representations for the PBMC dataset according to different methods.

GT-GRN for cell-type annotation

We further investigate these embeddings for the cell-type annotation task. We manually annotate cell types for the gene expression data and focus on the four cell types with the highest number of cells, which include CD4+ T cells, CD14+ monocytes, and CD8+ T cells. We train a three-layer multilayer perceptron (MLP) classifier for annotating cell types. The classifier is trained in a multiclass classification setting with five-fold cross-validation. We benchmark GT-GRN against the GENELink, GNNLink, and GNE methods. Figure 8 demonstrates that GT-GRN effectively captures the cell types in this gene-representation-based classification setup.

Figure 8.

Alt Text: Bar chart comparing AUROC and AUPRC scores for various models in cell-type classification.

Cell-type classification. AUROC and AUPRC scores of GT-GRN, GENELink, GNNLink, and GNE-based embeddings in annotating cell types using an MLP classifier in a five-fold cross-validation setting.

Next, we delve into the individual contributions of each embedding modality within the GT-GRN framework through a systematic ablation study on the PBMC dataset. This dataset provides a biologically rich and diverse single-cell expression landscape, making it an ideal benchmark for evaluating the role of each embedding component in GRN inference.

Ablation studies

To assess the overall efficacy and robustness of GT-GRN, we conducted the ablation study in two stages: first at the modality level, followed by the GT layer level. At the modality level, we systematically examined the contributions of structural positional encodings, global embeddings, and gene expression embeddings, both individually and in combination. Subsequently, we performed ablation experiments to assess the role of the internal components of the GT layer. We first report the results of the modality-level study, and then present the findings from the GT layer ablation.

Modality-level ablation study

The modality-level ablation study systematically examines the individual and combined contributions of its embedding modalities: structural positional encodings, global embeddings, and gene expression embeddings. This experiment is crucial, given the multicomponent nature of our framework, as it allows us to systematically assess the role and efficacy of each component in the GRN inference process under a link prediction setup. We organize our baselines into three categories: uni-modal, bi-modal, and tri-modal, where the prefixes “uni,” “bi,” and “tri” denote the number of information sources used. This study establishes the necessity of each module in the overall architecture, demonstrating that omitting any single modality leads to a significant performance drop—thereby justifying the integration of all components for optimal GRN reconstruction.

  • Structural positional encodings: In this uni-modal baseline, we use the Inline graphic of the input network, which is then fed into GT-GRN. Here, each gene’s positional encoding has length 512. This vector is fed into the model to predict the likelihood of a link with other gene vectors based on the input link information.

  • Global embeddings: This uni-modal baseline extracts the global knowledge of inferred GRNs in a multinetwork integration framework using a BERT model. The output length of each gene embedding here is 512. The vector is given as input to the model to estimate the likelihood of forming links with other gene vectors for link inference.

  • Gene expression embeddings: In this uni-modal baseline, raw gene expression is converted into an embedding vector that captures the latent information using an autoencoder model. The length of the embedding vector for each gene is 512, and it is used to estimate the likelihood of a link using the GT-GRN framework.

  • Structural positional encodings + Global embeddings: This multimodal approach combines the structural positional encodings of the input network with global embeddings derived from the BERT-based multinetwork integration framework. The fused representation is used in the GT-GRN model to enhance link prediction performance.

  • Structural positional encodings + Gene expression embeddings: This approach integrates the structural positional encodings of the input network with gene expression embeddings obtained through an autoencoder. The combined vector representation is fed into the GT-GRN model to infer potential links between genes.

  • Global embeddings + Gene expression embeddings: In this setup, global embeddings capturing inferred GRN knowledge are combined with gene expression embeddings. The resulting representation is used to predict gene interactions within the GT-GRN framework.

  • Structural positional encodings + Global embeddings + Gene expression embeddings: This comprehensive multimodal approach fuses all three embeddings—structural positional encodings, global embeddings, and gene expression embeddings—to provide a richer representation for link inference. This integrated approach aims to leverage complementary information from multiple modalities to enhance predictive performance.
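The seven configurations listed above are exactly the non-empty subsets of the three modalities, so they can be generated programmatically instead of being written out by hand. The short sketch below does this with `itertools.combinations`; the variable names are illustrative only.

```python
from itertools import combinations

modalities = [
    "Structural positional encodings",
    "Global embeddings",
    "Gene expression embeddings",
]

# All non-empty subsets: 3 uni-modal, 3 bi-modal, and 1 tri-modal configuration.
configs = [combo for size in (1, 2, 3) for combo in combinations(modalities, size)]

for combo in configs:
    print(len(combo), " + ".join(combo))
```

With fusion by summation, running a configuration amounts to summing only the projected embeddings of the selected modalities and leaving the rest of the pipeline unchanged.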

Table 6 reports the ablation study results from different feature sets for GT-GRN using AUROC scores. Among unimodal representations, Global Embeddings achieve the highest AUROC of 0.8860, followed by gene expression embeddings (0.8693) and structural positional encodings with AUROC score of 0.8480, indicating that global information is the most informative for link inference.

Table 6.

Ablation study in terms of different features for PBMC’s dataset for GT-GRN framework

Modality | Feature sets | AUROC
Unimodal | Structural positional encodings | 0.8480
Unimodal | Global embeddings | 0.8860
Unimodal | Gene expression embeddings | 0.8693
Bimodal | Structural positional encodings + Global embeddings | 0.8843
Bimodal | Structural positional encodings + Gene expression embeddings | 0.8666
Bimodal | Global embeddings + Gene expression embeddings | 0.8841
Trimodal | Structural positional encodings + Global embeddings + Gene expression embeddings | 0.9852

The bold value indicates the best performance.

Bimodal combinations perform on par with the unimodal baselines: Global embeddings + Structural positional encodings (0.8843) and Global embeddings + Gene expression embeddings (0.8841) perform best among them, but neither exceeds global embeddings alone (0.8860). The trimodal combination of all three embeddings achieves the highest AUROC of 0.9852, demonstrating that integrating all modalities provides the most effective representation for GRN inference.

Overall, the study highlights the importance of multimodal integration, with global embeddings playing a key role in enhancing predictive performance.

Graph transformer layer ablation study

This study explores how different components of the GT layer impact the model’s performance, specifically in the context of predicting GRNs. The ablation study isolates specific components to assess their importance for the overall performance of the model. These components include attention heads, depth, FFN, normalization, and residual connections.

Table 7 presents the impact of different components of the GT layer on GRN inference performance. It summarizes various model variants, detailing the specific modifications made to each component, and reports the corresponding AUROC scores. This analysis highlights how changes to the GT layer architecture influence model effectiveness, providing insights into the relative importance of each component. Firstly, the full model demonstrates the highest performance, with an AUROC of 0.9852, emphasizing the importance of all components working in harmony to provide the most effective representation for GRN inference. When individual components, such as attention heads, FFN, or layer depth, are removed, the performance decreases, highlighting the critical role each element plays in the model’s effectiveness.

Table 7.

Ablation study of GT Layer components

Model variant | Description of change | Metric (AUROC)
Full model | All components included | 0.9852
Reduced heads | Head = 1 | 0.9741
Reduced depth | Layers = 2 | 0.9619
Feedforward network | Remove FFN | 0.8815
Normalization | Disable normalization | 0.8442
Residual connections | Remove skip-connections between layers | 0.8436

The bold value indicates the best performance.

Among the various ablations, the largest performance drops occur when residual connections are removed (AUROC 0.8436) or normalization is disabled (0.8442), followed by removal of the FFN (0.8815). This indicates that these components are especially vital for the GT layer. Reducing the number of attention heads (0.9741) or the depth (0.9619) has a smaller effect. In general, the ablation study underscores the benefits of incorporating multiple attention heads, a deeper structure, an FFN, normalization, and residual connections within the GT layer. Each of these components improves the predictive capability of the model, and removing any of them reduces performance. Therefore, to achieve optimal results in GRN inference, it is essential to retain all these components.
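The size of each effect can be read directly from Table 7 by subtracting the ablated AUROC from that of the full model. The snippet below does this and sorts the variants by the resulting drop; the dictionary keys are shorthand labels chosen for this example.

```python
full_auroc = 0.9852

# AUROC of each ablated variant, taken from Table 7.
ablated = {
    "Reduced heads (1 head)": 0.9741,
    "Reduced depth (2 layers)": 0.9619,
    "Remove FFN": 0.8815,
    "Disable normalization": 0.8442,
    "Remove residual connections": 0.8436,
}

# Largest drop first.
drops = sorted(
    ((round(full_auroc - auroc, 4), variant) for variant, auroc in ablated.items()),
    reverse=True,
)
for drop, variant in drops:
    print(f"{variant}: -{drop:.4f}")
```

These are single-run differences on one dataset, so small gaps (for example between disabling normalization and removing residual connections) should be interpreted with care.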

Conclusion

In this work, we proposed a novel GRN inference framework, GT-GRN, which leverages GTs to infer regulatory links by incorporating graph-based techniques. Our approach begins by generating gene embeddings through an autoencoder. We then integrate prior network knowledge from known GRNs using an NLP-based BERT model, where these graphs are converted into sequences to extract contextual embeddings. Additionally, we incorporate graph positional information to enhance the inference process.

Through extensive experiments, GT-GRN demonstrates superior link prediction performance compared to baseline methods. We further assess the quality of the generated embeddings by evaluating their community structure, showing that GT-GRN effectively supports cell-type annotation in real PBMC gene expression datasets. Our ablation study reveals that the combination of gene expression data, global gene context, and positional information significantly contributes to improved GRN inference precision. While the current work focuses on accurate reconstruction of GRNs using multimodal embeddings, ongoing efforts are directed toward extending the framework to prioritize disease-associated genes. By leveraging the inferred GRNs, the goal is to identify key regulatory hubs and pathways that may play pivotal roles in disease development and progression.

Key Points

  • GT-GRN is a novel graph transformer framework for enhanced gene regulatory network inference.

  • It leverages multimodal embeddings by integrating gene expression data, prior biological network knowledge, and graph positional encodings.

  • The proposed model outperforms baseline models, achieving higher predictive accuracy and robustness on various datasets.

  • It achieves strong performance on cell-type classification using the peripheral blood mononuclear cell single-cell RNA sequencing dataset and provides a scalable and extensible framework for diverse biological network analysis tasks.

Contributor Information

Binon Teji, Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, 6th Mile, Tadong 737102, Sikkim, India.

Swarup Roy, Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, 6th Mile, Tadong 737102, Sikkim, India; Department of Computer Science and Engineering, Tezpur University, Napaam, Tezpur 784028, Assam, India.

Dinabandhu Bhandari, Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata 700107, West Bengal, India.

Jugal Kalita, Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, United States.

Conflict of interest

None declared.

Funding

This research is supported by the Department of Biotechnology (DBT), Government of India, under project BT/PR51150/NER/95/1996/2023. The work is also partially supported by IDEAS-TIH, ISI-Kolkata.

Data availability

All data and code used in this study are publicly available. The source code for GT-GRN can be accessed at https://github.com/Netralab/GT-GRN.

The datasets used in our experiments are available at the following locations:

References

  • 1. Bellot  P, Olsen  C, Salembier  P. et al.  Netbenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference. BMC Bioinformatics  2015;16:1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. de la Fuente  A. What are gene regulatory networks? In: Handbook of Research on Computational Methodologies in Gene Regulatory Networks, Hershey, PA, USA. pages 1–27. IGI Global, 2010, 10.4018/978-1-60566-685-3.ch001. [DOI] [Google Scholar]
  • 3. Guzzi  PH, Roy  S. Biological Network Analysis: Trends, Approaches, Graph Theory, and Algorithms. USA: Elsevier. 2020. [Google Scholar]
  • 4. Shyamsundar  R, Kim  YH, Higgins  JP. et al.  A DNA microarray survey of gene expression in normal human tissues. Genome Biol  2005;6:404–9. 10.1186/gb-2005-6-9-404 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Kolodziejczyk  AA, Kim  JK, Svensson  V. et al.  The technology and biology of single-cell RNA sequencing. Mol Cell  2015;58:610–20. 10.1016/j.molcel.2015.04.005 [DOI] [PubMed] [Google Scholar]
  • 6. Grindberg  RV, Yee-Greenbaum  JL, McConnell  MJ. et al.  RNA-sequencing from single nuclei. Proc Natl Acad Sci USA  2013;110:19802–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Talwar  D, Mongia  A, Sengupta  D. et al.  Autoimpute: autoencoder based imputation of single-cell RNA-seq data. Sci Rep  2018;8:16329. 10.1038/s41598-018-34688-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Huynh-Thu VA, Sanguinetti G. Gene Regulatory Network Inference: An Introductory Survey. In: Sanguinetti G, Huynh-Thu V. (eds) Gene Regulatory Networks. Methods in Molecular Biology, vol 1883. Humana Press, New York, NY. 2019. 10.1007/978-1-4939-8882-2_1 [DOI] [PubMed] [Google Scholar]
  • 9. Jha  M, Roy  S, Kalita  JK. Prioritizing disease biomarkers using functional module based network analysis: a multilayer consensus driven scheme. Comput Biol Med  2020;126:104023. 10.1016/j.compbiomed.2020.104023 [DOI] [PubMed] [Google Scholar]
  • 10. Roy  S, Bhattacharyya  DK, Kalita  JK. Reconstruction of gene co-expression network from microarray data using local expression patterns. BMC Bioinformatics  2014;15:1–14. 10.1186/1471-2105-15-S7-S10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Sebastian  S, Roy  S, Kalita  J. A generic parallel framework for inferring large-scale gene regulatory networks from expression profiles: application to Alzheimer’s disease network. Brief Bioinform  2023;24:bbac482. [DOI] [PubMed] [Google Scholar]
  • 12. Sebastian  S, Roy  S, Kalita  J. Network-based analysis of Alzheimer’s disease genes using multi-omics network integration with graph diffusion. J Biomed Inform  2025;164: 104797. 10.1016/j.jbi.2025.104797 [DOI] [PubMed] [Google Scholar]
  • 13. Mochida  K, Koda  S, Inoue  K. et al.  Statistical and machine learning approaches to predict gene regulatory networks from transcriptome datasets. Front Plant Sci  2018;9:421043. 10.3389/fpls.2018.01770 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Haury  A-C, Mordelet  F, Vera-Licona  P. et al.  Tigress: trustful inference of gene regulation using stability selection. BMC Syst Biol  2012;6:1–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Margolin  AA, Nemenman  I, Basso  K. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. London, UK: BioMed Central; 2006;7:1–15. 10.1186/1471-2105-7-S1-S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Meyer  PE, Kontos  K, Lafitte  F. et al.  Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol  2007;2007:1–9. 10.1155/2007/79879 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Faith  JJ, Hayete  B, Thaden  JT. et al.  Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol  2007;5:e8. 10.1371/journal.pbio.0050008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Liu  W, Sun  X, Yang  L. et al.  NSCGRN: a network structure control method for gene regulatory network inference. Brief Bioinform  2022;23:1–14. 10.1093/bib/bbac156 [DOI] [PubMed] [Google Scholar]
  • 19. Bao  W, Yang  B. Protein acetylation sites with complex-valued polynomial model. Front Comp Sci  2024;18:183904. 10.1007/s11704-023-2640-9 [DOI] [Google Scholar]
  • 20. Yang  B, Bao  W, Chen  B. PGRNIG: novel parallel gene regulatory network identification algorithm based on GPU. Brief Funct Genomics  2022;21:441–54. 10.1093/bfgp/elac028 [DOI] [PubMed] [Google Scholar]
  • 21. Bleakley  K, Biau  G, Vert  J-P. Supervised reconstruction of biological networks with local models. Bioinformatics  2007;23:i57–65. 10.1093/bioinformatics/btm204 [DOI] [PubMed] [Google Scholar]
  • 22. Gillani  Z, Akash  MSH, Matiur Rahaman  MD. et al.  CompareSVM: supervised, support vector machine (SVM) inference of gene regularity networks. BMC Bioinformatics  2014;15:1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Razaghi-Moghadam  Z, Nikoloski  Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl  2020;6:21. 10.1038/s41540-020-0140-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Daoudi  M, Meshoul  S. Deep neural network for supervised inference of gene regulatory network. In Modelling and Implementation of Complex Systems: Proceedings of the 5th International Symposium, MISC 2018, December 16–18, 2018, Laghouat, Algeria 5, p. 149–157. Springer, 64, 10.1007/978-3-030-05481-6_11. [DOI] [Google Scholar]
  • 25. Turki  T, Wang  JTL, Rajikhan  I. Inferring gene regulatory networks by combining supervised and unsupervised methods. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 140–5. Anaheim, CA, USA: IEEE (Institute of Electrical and Electronics Engineers), 2016. [Google Scholar]
  • 26. Mao  G, Pang  Z, Zuo  K. et al.  Gene regulatory network inference using convolutional neural networks from scRNA-seq data. J Comput Biol  2023;30:619–31. 10.1089/cmb.2022.0355 [DOI] [PubMed] [Google Scholar]
  • 27. Kc  K, Li  R, Cui  F. et al.  GNE: a deep learning framework for gene network inference by aggregating biological information. BMC Syst Biol  2019;13:1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Teji  B, Das  JK, Roy  S. et al.  Predicting missing links in gene regulatory networks using network embeddings: a qualitative assessment of selective embedding techniques. In: Intelligent Systems: Proceedings of ICMIB 2021, pp. 143–154. IGIT Sarang; Sarang, Odisha, India: Springer, 2022. [Google Scholar]
  • 29. Teji  B, Roy  S, Dhami  DS. et al.  Graph embedding techniques for predicting missing links in biological networks: an empirical evaluation.  IEEE Trans Emerg Top Comput  2023;12:190–201. [Google Scholar]
  • 30. Dewey  GT, Galas  DJ. Gene regulatory networks. In: Madame Curie Bioscience Database [Internet]. Austin, TX, USA: Landes Bioscience, 2013. [Google Scholar]
  • 31. Wang  J, Ma  A, Ma  Q. et al.  Inductive inference of gene regulatory network using supervised and semi-supervised graph neural networks. Comput Struct Biotechnol J  2020;18:3335–43. 10.1016/j.csbj.2020.10.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Zhang  H, An  X, He  Q. et al.  Quadratic graph attention network (q-GAT) for robust construction of gene regulatory networks. arXiv, arXiv:2303.14193. 2023, preprint: not peer reviewed. https://arxiv.org/pdf/2303.14193
  • 33. Huang  Y, Gufeng  Y, Yang  Y. Miggri: a multi-instance graph neural network model for inferring gene regulatory networks for drosophila from spatial expression images. PLoS Comput Biol  2023;19:e1011623. 10.1371/journal.pcbi.1011623 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Wang  J, Chen  Y, Zou  Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet  2023;19:e1010942. 10.1371/journal.pgen.1010942 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Li  S, Liu  Y, Shen  L-C. et al.  GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform  2024;25:bbad529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Mao  G, Pang  Z, Zuo  K. et al.  Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform  2023;24:1–11. 10.1093/bib/bbad414 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Zhou  Z, Wei  J, Liu  M. et al.  AnomalGRN: deciphering single-cell gene regulation network with graph anomaly detection. BMC Biol  2025;23:73. 10.1186/s12915-025-02177-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Tian  X, Patel  Y, Wang  Y. Trendy: gene regulatory network inference enhanced by transformer. bioRxiv, 2024, preprint: not peer reviewed, 2024–10. https://www.biorxiv.org/content/10.1101/2024.10.14.618189v1 [DOI] [PMC free article] [PubMed]
  • 39. Wang  Y, Zheng  P, Cheng  Y-C. et al.  Wendy: covariance dynamics based gene regulatory network inference. Math Biosci  2024;377:109284. 10.1016/j.mbs.2024.109284 [DOI] [PubMed] [Google Scholar]
  • 40. Jing  X, Zhang  A, Liu  F. et al.  STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics  2023;39:btad165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Shu  H, Ding  F, Zhou  J. et al.  Boosting single-cell gene regulatory network reconstruction via bulk-cell transcriptomic data. Brief Bioinform  2022;23:1–12. 10.1093/bib/bbac389 [DOI] [PubMed] [Google Scholar]
  • 42. Chan  TE, Stumpf  MPH, Babtie  AC. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Systems  2017;5:251–267.e3. 10.1016/j.cels.2017.08.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Kingma  DP, Welling  M. Auto-encoding variational bayes. arXiv, arXiv:1312.6114. 2013, preprint: not peer reviewed. https://arxiv.org/pdf/1312.6114
  • 44. Grover  A, Leskovec  J. node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, pp. 855–64, ACM; 2016. [DOI] [PMC free article] [PubMed]
  • 45.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is All You Need. In: Advances in Neural Information Processing Systems (NeurIPS 2017), vol. 30, pp. 5998–6008; 2017. [Google Scholar]
  • 46. Devlin  J. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv, arXiv:1810.04805, 2018, preprint: not peer reviewed. https://arxiv.org/pdf/1810.04805
  • 47. Ba  JL. Layer normalization. arXiv, arXiv:1607.06450. 2016, preprint: not peer reviewed. https://arxiv.org/pdf/1607.06450
  • 48. Ioffe  S. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv, arXiv:1502.031672015, preprint: not peer reviewed. https://arxiv.org/pdf/1502.03167
  • 49. Yuan  Y, Bar-Joseph  Z. Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci USA  2019;116:27151–8. 10.1073/pnas.1911536116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Gray Camp  J, Sekine  K, Gerber  T. et al.  Multilineage communication regulates human liver bud development from pluripotency. Nature  2017;546:533–8. 10.1038/nature22796 [DOI] [PubMed] [Google Scholar]
  • 51. Pratapa  A, Jalihal  AP, Law  JN. et al.  Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods  2020;17:147–54. 10.1038/s41592-019-0690-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Schaffter  T, Marbach  D, Floreano  D. Genenetweaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics  2011;27:2263–70. 10.1093/bioinformatics/btr373 [DOI] [PubMed] [Google Scholar]
  • 53. Chen  G, Liu  Z-P. Graph attention network for link prediction of gene regulations from single-cell rna-sequencing data. Bioinformatics  2022;38:4522–9. 10.1093/bioinformatics/btac559 [DOI] [PubMed] [Google Scholar]
  • 54. Matos Simoes  R, de Emmert-Streib  F. Bagging statistical network inference from large-scale gene expression data. PLoS One  2012;7:e33624. 10.1371/journal.pone.0033624 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Matos Simoes  R de, Emmert-Streib  F. Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks. PLoS One  2011;6:e29279. 10.1371/journal.pone.0029279 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Altay  G, Emmert-Streib  F. Inferring the conservative causal core of gene regulatory networks. BMC Syst Biol  2010;4:1–13. 10.1186/1752-0509-4-132 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Moerman  T, Santos  SA, González-Blas  CB. et al.  GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics  2019;35:2159–61. 10.1093/bioinformatics/bty916 [DOI] [PubMed] [Google Scholar]
  • 58. Huynh-Thu  VA, Irrthum  A, Wehenkel  L. et al.  Inferring regulatory networks from expression data using tree-based methods. PLoS One  2010;5:e12776. 10.1371/journal.pone.0012776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Teji  B, Roy  S, Guzzi  PH. et al.  Application of generative graph models in biological network regeneration: a selective review and qualitative analysis. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 5786–92. Lisbon, Portugal: IEEE, 2024. [Google Scholar]
  • 60. Alexander wolf  F, Angerer  P, Theis  FJ. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol  2018;19:1–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Zhang  Q, Liu  W, Zhang  H-M. et al.  htftarget: a comprehensive database for regulations of human transcription factors and their targets. Genom Proteom Bioinformat  2020;18:120–8. 10.1016/j.gpb.2019.09.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Van Dijk  D, Sharma  R, Nainys  J. et al.  Recovering gene interactions from single-cell data using data diffusion. Cell  2018;174:716–729.e27. 10.1016/j.cell.2018.05.061 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Traag  VA, Waltman  L, Van Eck  NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep  2019;9:1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
