Bioinformatics
. 2026 Mar 25;42(4):btag144. doi: 10.1093/bioinformatics/btag144

GRNFormer: accurate gene regulatory network inference using graph transformer

Akshata Hegde 1,2, Jianlin Cheng 3,4
Editor: Anthony Mathelier
PMCID: PMC13069479  PMID: 41883144

Abstract

Motivation

Deciphering gene regulatory networks (GRNs) from single-cell transcriptomics data remains a fundamental challenge in computational biology. It is hindered by data sparsity, high dimensionality, and the lack of scalable, generalizable inference models. To address this, we present GRNFormer, a generalizable graph transformer framework for accurate GRN inference from transcriptomics data across species, cell types, and platforms without requiring cell-type annotations or prior regulatory information.

Results

GRNFormer integrates a transformer-based gene expression encoder (Gene-Transcoder) with a variational graph autoencoder (GraViTAE) employing pairwise attention to jointly learn the representations of genes (nodes) and their co-expression relationships (edges). Leveraging TF-Walker, a transcription factor-anchored subgraph sampling strategy, it effectively captures gene regulatory interactions from either single-cell or bulk RNA-seq datasets. Benchmarking on standard datasets demonstrates that GRNFormer outperforms existing state-of-the-art traditional and deep learning methods in blind evaluations, achieving average sampled area under the receiver operating characteristic curve (Sampled_AUROC) and sampled area under the precision–recall curve (Sampled_AUPRC) values between 0.90 and 0.98, as well as average sampled F1 scores of 0.87–0.98. The model robustly recovers both known and novel regulatory networks, including pluripotency circuits in human embryonic stem cells (hESCs) and immune cell modules in peripheral blood mononuclear cells (PBMCs). The architecture enables scalable, biologically interpretable GRN inference across various datasets, cell types, and species, establishing GRNFormer as a robust and transferable tool for network biology.

Availability and implementation

GRNFormer is available on GitHub (https://github.com/BioinfoMachineLearning/GRNformer); the version used in this work is archived on Zenodo (https://doi.org/10.5281/zenodo.18868395), with evaluation resources for reproducibility.

1 Introduction

Precise gene regulation is essential for normal cellular function, while its dysregulation is linked to diseases such as cancer, neurodegeneration, and developmental disorders. Understanding gene regulation at the network level—through gene regulatory networks (GRNs)—can illuminate the interactions among genes and proteins that orchestrate cellular function. GRN inference enables the identification of key regulatory drivers and pathways, offering a foundation for mechanistic insights and therapeutic targeting (De Smet and Marchal 2010). However, inferring GRNs from high-throughput expression data remains challenging due to noise, high dimensionality, and limited sample sizes (Chen et al. 2019).

Traditional approaches such as information-theoretic methods (e.g. ARACNE, CLR) (Margolin et al. 2006, Faith et al. 2007) or Bayesian networks (Shermin and Orgun 2009) can model gene dependencies but often require large datasets and face scalability limitations. Dynamic models, including Boolean networks and differential equation-based approaches (Hickman and Hodgman 2009, Lu et al. 2011), offer temporal insights but depend on time-resolved data, which is often lacking. The BEELINE benchmarking study (Pratapa et al. 2020) underscores these issues, showing that conventional statistical and shallow learning approaches struggle to generalize across diverse regulatory contexts. Recent approaches, particularly those based on deep learning, are capable of learning complex, non-linear patterns from high-dimensional data, making them better suited for inferring GRNs from RNA-Seq data (Huynh-Thu and Sanguinetti 2018, Yang et al. 2019). In particular, transformers have changed how input information is analysed through the “attention” mechanism (Vaswani et al. 2017), integrating global and local dependencies and enabling improved GRN inference (Ma et al. 2021). A few transformer-based GRN inference methods have been developed, such as STGRNs (Xu et al. 2023) and scGREAT (Wang et al. 2024). Complementing these, graph neural networks (GNNs) provide a natural framework for modeling GRNs by learning over graph-structured data, effectively capturing both local and global regulatory patterns (Kipf and Welling 2016, Zhou et al. 2020). GeneLink is one such early link-prediction method combining GNNs and attention (Chen and Liu 2022).

However, most GRN inference methods are tailored to specific datasets, limiting their generalizability across species, cell types, or conditions. Hence, there is a critical need for robust, transferable models that perform reliably beyond narrow contexts. In this study, we introduce GRNFormer, a graph deep-learning framework developed to infer GRNs from either single-cell or bulk transcriptomic data with high accuracy and generalizability (Fig. 1). Designed to operate across diverse cell types, species, and regulatory contexts, GRNFormer addresses key challenges in GRN inference such as context-specificity, data sparsity, and model transferability.

Figure 1.

A schematic diagram of the GRNFormer pipeline for gene regulatory network inference. On the left, a table labeled “Input: Gene Expression Data (genes × cells)” shows expression values across cells, which are used to construct a gene co-expression network (GCEN) where nodes are genes and edges represent weighted correlations. A “TF-Walker (subgraph sampling)” module extracts transcription factor (TF)-centered subgraphs from this network. In parallel, gene expression data are converted into node and edge embeddings and passed through a GraViTAE encoder to produce a latent space, which is sampled and then decoded by a GraViTAE decoder. A gene-transcoder generates fixed gene expression embeddings that integrate with the sampled TF-centered subgraphs. Finally, a GRN inference module uses these representations to produce the output: a gene regulatory network with genes connected by weighted regulatory interactions.

Overview of GRNFormer pipeline.

2 Materials and methods

The GRNFormer architecture comprises three main components (Fig. 1). First, TF-Walker introduces a transcription factor (TF)-centered de novo subgraph sampling approach that constructs localized gene co-expression subgraphs from a full gene co-expression network (GCEN) input, capturing the neighborhood context around each TF. This biologically informed strategy enhances the model’s ability to learn meaningful regulatory patterns by focusing on TF-driven structures in the gene expression space.

Second, we implement end-to-end representation learning through two key modules: the Gene-Transcoder, a transformer-based embedding module that captures context-aware gene representations across diverse datasets; and GraViTAE (graph variational transformer autoencoder), which jointly encodes node and edge features to reconstruct gene regulatory feature representations.

Third, a dedicated GRN inference module integrates node- and edge-level representations to predict TF–target interactions. This inference strategy allows GRNFormer to generalize effectively across input sizes, data modalities, and regulatory frameworks, including both ChIP-seq-based and STRING (Szklarczyk et al. 2023)-derived networks.

2.1 Construction of gene co-expression network

To construct the GCEN, raw single-cell RNA-seq (scRNA-seq) expression data were first normalized using the inverse hyperbolic sine (arcsinh) transformation to stabilize variance across highly skewed expression profiles (Johnson and Krishnan 2022). Pairwise Pearson correlations were then computed between all genes across individual cells to quantify co-expression, and gene pairs with absolute correlation >0.1 were retained to form the GCEN. In this graph, nodes represent genes and edges denote co-expression relationships, forming a dense, high-dimensional network. The GCEN construction workflow is illustrated in Method S1; Fig. S1A, available as supplementary data at Bioinformatics online.
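As a minimal sketch of this construction (NumPy-based, with illustrative variable names; the actual pipeline follows Method S1), the arcsinh transform, pairwise Pearson correlation, and |r| > 0.1 thresholding could be implemented as:

```python
import numpy as np

def build_gcen(expr, threshold=0.1):
    """Sketch of GCEN construction from a (genes x cells) matrix.

    Applies the arcsinh variance-stabilizing transform, computes all
    pairwise Pearson correlations between genes, and keeps edges with
    |r| > threshold (signed weights, no self-loops).
    """
    norm = np.arcsinh(expr)                      # stabilize skewed counts
    corr = np.corrcoef(norm)                     # gene-by-gene Pearson r
    adj = np.where(np.abs(corr) > threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)                   # drop self-correlations
    return adj

# Toy example: 4 genes x 4 cells (values are made up for illustration)
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [4.0, 3.0, 2.0, 1.0],
                 [1.0, 3.0, 2.0, 4.0]])
adj = build_gcen(expr)
```

Signed weights are kept so that negative co-expression (potential repression) survives into the GCEN.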

2.2 TF-Walker—a de novo subgraph sampling method

scRNA-seq data present a fundamental challenge for machine learning: expression matrices are inherently high-dimensional and sparse, with expression values captured across thousands of genes and cells. However, these datasets often contain only a limited number of training examples. This imbalance results in under-constrained models, where the number of features far exceeds the number of samples needed for learning regulatory patterns.

To address these challenges, GRNFormer incorporates a biologically motivated subgraph sampling strategy, termed TF-Walker (Fig. S1B, available as supplementary data at Bioinformatics online), that functions as a principled form of data augmentation. Instead of operating on the full GCEN, TF-Walker extracts local, TF-centered subgraphs that capture meaningful co-expression contexts. To extract TF-centered subgraphs from the GCEN, we iteratively process each TF as a center node. For each TF, we first identify its direct neighbors (hop = 1) in the network. If the total number of neighbors (T) is <99, we incrementally expand the neighborhood by increasing the hop distance (hop + 1) and recursively collecting neighbors at each subsequent hop until T reaches or exceeds 99. During training, we fixed the subgraph size to 100 nodes (one TF node and 99 neighbors), which balances capturing meaningful regulatory context, minimizing overlap among subgraphs, and computational efficiency. If T exceeds 99, we randomly select 99 neighbors in a single draw. Once the target neighborhood size is achieved, we select all edges present between the nodes within the selected neighborhood, ensuring comprehensive connectivity representation. Subsequently, we extract all node and edge features associated with the selected subgraph, including binary TF identities and co-expression weights (positive or negative correlations). This process generates a TF-centered subgraph for each TF, where the TF serves as the central hub connected to up to 99 neighboring genes through their co-expression relationships, thereby capturing the local regulatory context surrounding each TF within the broader network topology.
This procedure is repeated for all identified TFs in the network, resulting in a collection of TF-specific subgraphs that preserve the structural and functional characteristics of each TF’s regulatory neighborhood while maintaining computational tractability through the size constraint. The TF-Walker sampling algorithm is shown in Fig. S1B and formalized in Method S2, available as supplementary data at Bioinformatics online. In contrast, during inference, TF-Walker deterministically extends to all available neighbors of a given TF, traversing the GCEN in a sequential and exhaustive manner to generate subgraphs. This ensures that regulatory predictions are based on the full local topology of the expression network, without arbitrary truncation. Additional details on the inference strategy are provided in Section 2.9.
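The hop-expansion logic described above can be sketched in plain Python (a simplified reading with hypothetical names; the formal algorithm is Method S2). A toy subgraph size of 4 stands in for the 100 used in training:

```python
import random

def tf_walker(adj, tf, size=100, seed=0):
    """Sketch of TF-Walker training-time sampling.

    adj: dict mapping each gene to its GCEN neighbors. Expands hop by
    hop around the TF until at least size - 1 neighbors are collected,
    then randomly down-selects to exactly size - 1 in a single draw.
    Returns the node set and the induced edge list.
    """
    neighbors, seen, frontier = [], {tf}, [tf]
    while frontier and len(neighbors) < size - 1:
        nxt = []
        for node in frontier:                      # expand one hop
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    neighbors.append(nb)
                    nxt.append(nb)
        frontier = nxt
    if len(neighbors) > size - 1:                  # one random down-selection
        random.Random(seed).shuffle(neighbors)
        neighbors = neighbors[: size - 1]
    nodes = {tf, *neighbors}
    # induced subgraph: keep every edge among the selected nodes
    edges = [(u, v) for u in nodes for v in adj.get(u, ())
             if v in nodes and u < v]
    return nodes, edges

adj = {"TF1": ["g1", "g2"], "g1": ["TF1", "g2", "g3"],
       "g2": ["TF1", "g1"], "g3": ["g1", "g4"], "g4": ["g3"]}
nodes, edges = tf_walker(adj, "TF1", size=4)
```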

2.3 Ground truth network sampling and feature representation

To supervise GRNFormer training, we constructed comprehensive ground-truth regulatory networks by integrating three validated sources: cell-type-specific ChIP-seq data, nonspecific ChIP-seq networks, and STRING-derived protein-protein interactions. This multi-source integration enhances biological coverage and supports cross-species generalization.

For each TF-centered subgraph generated by TF-Walker, a corresponding labeled regulatory subgraph was created by identifying all nodes (genes) in the subgraph and extracting known regulatory interactions among them from the global ground-truth network. If no regulatory edges existed between nodes, the subgraph was discarded to ensure only biologically informative samples were used for training. This approach guarantees that the model learns from functionally relevant regulatory relationships, improving inference reliability.

Each subgraph is represented as a graph of 100 genes (nodes), where the features of each node consist of the expression values of the gene across all cells along with a binary indicator denoting TF identity (i.e. if a gene is a TF). Edge features are defined by pairwise Pearson correlation values computed from the full dataset, capturing global co-expression structure. Gene expression of each subgraph undergoes cellwise z-score normalization across genes. This normalization mitigates information leakage by preventing memorization of gene identities/patterns across subgraphs while preserving local expression dynamics. While the edge weights reflect broader transcriptional associations, the node-level gene expression inputs provide localized information contextualized at the subgraph level.
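A minimal sketch of the cellwise z-score step (assuming a NumPy (genes × cells) block per subgraph; names are illustrative):

```python
import numpy as np

def normalize_subgraph(expr, eps=1e-8):
    """Per-cell z-score across the genes of one sampled subgraph.

    expr: (genes x cells) expression block. Each column (cell) is
    standardized across its genes, removing absolute scales that could
    let the model memorize gene identities while keeping the relative
    expression pattern within the subgraph.
    """
    mu = expr.mean(axis=0, keepdims=True)
    sd = expr.std(axis=0, keepdims=True)
    return (expr - mu) / (sd + eps)

x = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])  # 3 genes x 2 cells
z = normalize_subgraph(x)
```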

2.4 Gene-Transcoder: transformer encoder for gene expression representation learning

Figure 2A illustrates the architecture of the Gene-Transcoder, a transformer-based encoder designed for gene expression representation learning, which is the first stage in training GRNFormer. It is designed to address variability in scRNA-seq datasets, where the number of cells and expression ranges can differ substantially across experiments, making it challenging for conventional models to learn generalizable patterns. To tackle this issue, the Gene-Transcoder processes gene expression data to produce fixed, representative embeddings that capture essential biological information.

Figure 2.

A multi-panel diagram showing four sequential components of the GRNFormer model. (A) Gene-Transcoder: Gene expression input (heatmap) is passed through a 1D convolution encoder and a transformer encoder (with transformer layers), followed by mean pooling to produce gene expression embeddings. (B) GraViTAE Encoder: Node and edge embeddings are input into the encoder. The detailed architecture consists of multiple TransConv blocks that generate latent representations, producing node latent space (z: [100, d]) and edge latent space. (C) GraViTAE Decoder: Latent node and edge representations are sampled from a Gaussian space and passed through several TransConv blocks and an MLP. The decoder outputs predicted node representations (Z′: [100, d]) and predicted edge representations ([2, number of edges, d]). (D) GRN Inference: The decoder outputs are used to compute edge probabilities. Node representations are combined via inner product (Z′ × Z′ᵀ), integrated with edge representations, and passed through a sigmoid function to produce a final matrix of edge probabilities (100 × 100), representing the inferred gene regulatory network.

The four key sequentially connected components of GRNFormer. (A) Architecture of Gene-Transcoder for fixed embeddings of gene expression. (B) Detailed architecture of GraViTAE Encoder. (C) Detailed architecture of GraViTAE variational decoder. (D) Edge probability prediction of GRN inference module.

Gene-Transcoder begins with an input gene expression matrix (batch × 100 genes × no_of_cells + 1), where the additional channel corresponds to the TF-identity flag appended along the cell/feature dimension during feature extraction. The 100-gene input dimension used in Gene-Transcoder corresponds to the TF-Walker subgraph size during training. A 1D convolutional layer processes the gene expression matrix, using a sliding kernel of size 1 and stride 1 along the cell axis to capture local patterns of gene expression and TF-related signal across cells. These features are then passed to a single-layer transformer encoder with a 4-head attention mechanism. This mechanism helps the model jointly capture local expression dependencies among nearby genes and global relationships spanning distant genes or cell populations, producing embeddings that represent the gene expression data in an embedding feature space.

The transformer outputs are averaged across cells via mean pooling to produce fixed 64-dimensional embeddings (batch × 100 × 64), providing a compact yet information-rich representation for each gene. These embeddings are invariant to dataset-specific variation, enabling effective generalization across species, cell types, and experimental conditions. Detailed mathematical algorithm for the Gene-Transcoder is provided in Method S3; Fig. S2, available as supplementary data at Bioinformatics online. By transforming variable-length, noisy expression profiles into fixed, context-aware embeddings, this module establishes a robust foundation for downstream GRN inference.
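To illustrate why the output size is fixed, here is a stripped-down sketch of the kernel-1 convolution plus mean pooling (the transformer attention stage is omitted; the weights and names are hypothetical):

```python
import numpy as np

def gene_transcoder_sketch(expr_with_flag, w):
    """Minimal sketch of the fixed-embedding idea (attention omitted).

    expr_with_flag: (genes x features), features = #cells + 1 TF flag.
    A kernel-size-1, stride-1 convolution is the same linear projection
    applied at every position along the cell axis; mean pooling over
    that axis then yields one fixed-size embedding per gene, no matter
    how many cells the dataset contains.
    """
    # per-position projection: each scalar feature -> a w-sized channel
    proj = expr_with_flag[:, :, None] * w[None, None, :]  # (genes, features, d)
    return proj.mean(axis=1)                              # (genes, d)

rng = np.random.default_rng(0)
w = rng.normal(size=64)                                   # conv weights, d = 64
small = gene_transcoder_sketch(rng.normal(size=(5, 21)), w)    # 20 cells + flag
large = gene_transcoder_sketch(rng.normal(size=(5, 101)), w)   # 100 cells + flag
```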

2.5 Graph variational transformer autoencoder

Following the Gene-Transcoder is the graph variational transformer autoencoder (GraViTAE), which forms the core of GRNFormer. GraViTAE is a supervised variational graph transformer autoencoder composed of an encoder–decoder pair operating on TF-centered gene co-expression subgraphs. It integrates fixed-length node embeddings from the Gene-Transcoder with co-expression edge weights for joint message passing.

As shown in Fig. 2B, the GraViTAE encoder applies stacked Transformer Convolution (TransConv) blocks to capture both local (e.g. between closely related genes) and global gene–gene interactions (e.g. co-expression relationships across a subgraph). Each block combines multi-head attention (4-head) with feedforward layers to update the node and edge features jointly. The encoder produces the parameters of a Gaussian latent distribution for each TF-centered subgraph. Let $z^{(v)}$ denote the latent embedding of the TF-centered subgraph for TF $v$. Given input features $x$, the encoder outputs mean and standard deviation vectors, $\mu_\phi(x)$ and $\sigma_\phi(x)$, defining the approximate posterior

$$q_\phi\big(z^{(v)} \mid x^{(v)}\big) = \mathcal{N}\big(z^{(v)} \mid \mu_\phi(x^{(v)}),\ \sigma_\phi^2(x^{(v)})\, I\big),$$

where $\phi$ denotes the parameters of the encoder network, $\mathcal{N}(\mu, \sigma^2 I)$ is a multivariate normal distribution with mean $\mu$, and $\sigma_\phi^2(x^{(v)})\, I$ is the diagonal covariance matrix, $I$ being the identity matrix. To regularize the latent space and promote generalization, the encoder estimates distributions over latent variables by computing both the mean ($\mu$) and variance ($\sigma^2$) of node and edge embeddings. These parameters are used to sample continuous latent embeddings via the Gaussian reparameterization trick

$$Z^{(v)} = \mu_\phi\big(x^{(v)}\big) + \sigma_\phi\big(x^{(v)}\big) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

where $Z^{(v)}$ is the latent embedding sampled for the TF-centered subgraph. This variational formulation enables uncertainty modeling in noisy single-cell expression data, while the joint encoding of expression and co-expression features ensures biologically grounded, context-aware representations. The GraViTAE decoder (Fig. 2C) reconstructs node and edge embeddings from the sampled latent space of the encoder. Using these samples, the decoder applies additional TransConv blocks followed by a lightweight multilayer perceptron (MLP) to produce two outputs: updated node embeddings $Z'$ representing each gene’s regulatory identity, and edge attentions indicating the strength of potential regulatory interactions. These outputs serve as intermediate proxy representations that guide the downstream GRN inference.
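The reparameterization step can be sketched as follows (NumPy, with illustrative shapes for a 100-node subgraph and a 16-dimensional latent space):

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Gaussian reparameterization: Z = mu + sigma * eps, eps ~ N(0, I).

    Isolating the randomness in eps keeps the sample differentiable
    with respect to mu and sigma during backpropagation.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.full((100, 16), 2.0)                     # latent means, one row per node
z = reparameterize(mu, np.zeros_like(mu), rng)   # sigma = 0 -> deterministic
```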

The TransConv block (Fig. S3A, available as supplementary data at Bioinformatics online) underpins both GraViTAE encoder and decoder modules (Fig. 2B and C). At the heart of the TransConv block lies the Transformer Convolution layer. It extends transformer-based graph convolution (Shi et al. 2020) by incorporating edge attributes such as co-expression into its pairwise attention computations, along with node features, enabling context-aware message passing. The pairwise attention coefficient aij between genes i and j is defined as

$$a_{ij} = \operatorname{softmax}\!\left(\frac{q_i^{\top}\big(k_j + W_e\, e_{ij}\big)}{\sqrt{d}}\right),$$

where $q_i = W_q x_i$, $k_j = W_k x_j$, and $e_{ij}$ is the edge feature. The pairwise attention details, message passing, and update steps implemented through the Transformer Convolution layer are provided in Method S4 and Fig. S3B, available as supplementary data at Bioinformatics online. Each block includes multi-head pairwise attention with residual connections, feedforward layers, and non-linear transformations, followed by batch normalization. We employ Leaky ReLU activations to preserve gradient flow and capture negative co-expression patterns often associated with repressive regulation, which standard ReLU activations tend to suppress (Xu et al. 2020). Implemented in PyTorch Geometric, this architecture effectively models the combinatorial and context-dependent nature of transcriptional regulation.
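A direct, unoptimized sketch of this edge-aware attention (NumPy, single head, hypothetical weight matrices):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pairwise_attention(x, edge_feat, Wq, Wk, We):
    """a_ij = softmax_j( q_i . (k_j + We @ e_ij) / sqrt(d) ).

    x: (n, d_in) node features; edge_feat: (n, n, d_e) edge attributes
    (e.g. co-expression weights). Rows of the returned (n, n) matrix
    sum to 1: each node distributes attention over its peers.
    """
    q, k = x @ Wq.T, x @ Wk.T
    n, d = q.shape
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = q[i] @ (k[j] + We @ edge_feat[i, j]) / np.sqrt(d)
    return np.vstack([softmax(row) for row in scores])

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))              # 3 genes, 4-dim node features
e = rng.normal(size=(3, 3, 1))           # scalar co-expression per edge
Wq, Wk, We = (rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
              rng.normal(size=(8, 1)))
att = pairwise_attention(x, e, Wq, Wk, We)
```

Each row of the result is a probability distribution over the other genes, shifted by the co-expression edge features.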

2.6 Adjacency matrix reconstruction and GRN inference

The final step of GRNFormer reconstructs a probabilistic adjacency matrix representing the likelihood of regulatory interactions between genes. This is achieved by combining inner products of node embeddings with pooled edge features, followed by a sigmoid activation to produce gene–gene regulatory interaction probabilities (Fig. 2D). Detailed steps of the adjacency matrix reconstruction are described in Method S5, available as supplementary data at Bioinformatics online. This subgraph-level inference captures modular gene regulation, while a dedicated aggregation strategy integrates predictions across subgraphs to reconstruct a coherent global GRN (see Section 2.9 for details).
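A minimal sketch of the inner-product reconstruction (NumPy; the exact pooling of edge features is described in Method S5, so the optional edge term below is an assumption for illustration):

```python
import numpy as np

def reconstruct_adjacency(z_nodes, edge_scores=None):
    """Sketch of the probabilistic adjacency reconstruction.

    z_nodes: (n, d) decoded node embeddings. Inner products Z' @ Z'^T
    give pairwise logits; an optional (n, n) pooled edge-score matrix
    is added before the sigmoid (assumed form of the combination).
    """
    logits = z_nodes @ z_nodes.T
    if edge_scores is not None:
        logits = logits + edge_scores
    return 1.0 / (1.0 + np.exp(-logits))     # probabilities in (0, 1)

rng = np.random.default_rng(0)
probs = reconstruct_adjacency(rng.normal(size=(5, 3)))
```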

2.7 Datasets and preprocessing

2.7.1 BEELINE data

To train and evaluate GRNFormer in biologically diverse settings, we used single-cell RNA-seq datasets curated by BEELINE, a comprehensive benchmarking framework for GRN inference. These datasets span two species (human and mouse), seven cell types, and include three types of regulatory ground-truth networks, offering a broad foundation for testing model generalization across cell types, regulatory contexts, species, and evolutionary backgrounds.

Our analysis focused on all seven cell-type representative datasets: mouse hematopoietic stem cells (mHSC-E, mHSC-GM, mHSC-L), embryonic stem cells (mESC), dendritic cells (mDC), human hepatocytes (hHeps), and embryonic stem cells (hESC). Together, they reflect a wide range of biological processes from early development and lineage commitment to immune regulation and tissue-specific differentiation. All expression data were preprocessed using BEELINE’s standardized workflow, including log transformation, gene filtering, and normalization, ensuring consistency and cross-dataset comparability. Each expression dataset was paired with three regulatory reference networks: (i) cell-type-specific TF–target interactions from experimental ChIP-seq, (ii) non-cell-type-specific ChIP-seq networks, and (iii) functional protein interaction networks from STRING. The combined use of regulatory evidence enabled robust benchmarking of GRNFormer’s predictions in both precise and broad regulatory contexts.

2.7.2 DREAM5 challenge and PBMC datasets

We also applied GRNFormer, pretrained on the BEELINE single-cell data, to bulk RNA-seq datasets from the DREAM5 challenge (Marbach et al. 2012), which include Escherichia coli and Saccharomyces cerevisiae data. These well-characterized prokaryotic and eukaryotic networks, with gold-standard ChIP-derived regulatory maps, allowed us to evaluate the model’s cross-species transferability and robustness to different transcriptomic data types. To further test generalizability, we conducted zero-shot inference using the 10x Genomics PBMC 3k dataset (Genomics 10x 2018), which lacks any cell-type labels or prior regulatory annotations. This allowed us to assess GRNFormer’s ability to recover meaningful regulatory structure de novo.

2.8 Training

To assess the generalizability of GRNFormer, we adopted a cross-lineage training and evaluation strategy. The model was trained on subgraphs derived from five datasets—human embryonic stem cells (hESCs), hHeps, mouse dendritic cells (mDC), and two mouse hematopoietic stem cell subtypes (mHSC-GM, mHSC-E)—while holding out mouse embryonic stem cells (mESC) and an additional mHSC subtype (mHSC-L) for blind testing. For each dataset, we first constructed full GCENs and then applied TF-Walker subgraph sampling to generate localized training instances. Ground-truth regulatory interactions were aggregated from three sources—cell-type-specific ChIP-seq, nonspecific ChIP-seq, and STRING networks—into a unified training label set, allowing the model to learn from heterogeneous regulatory modalities.

Training on shuffled subgraphs from diverse datasets encouraged the model to capture transferable transcriptional patterns rather than overfitting to dataset-specific profiles. The Gene-Transcoder module encodes variable-length gene expression profiles of subgraph genes into fixed-length node embeddings of dimension 64. These node embeddings, along with co-expression values used as edge features, are passed to the encoder implemented as a part of variational graph transformer autoencoder (GraViTAE). The encoder comprises four Transformer Convolution (TransConv) blocks, each with four attention heads, and projects the input features into a latent space of dimension 16.

The sampled latent node and edge representations are then passed through the GraViTAE decoder, which consists of three additional TransConv blocks followed by two fully connected (MLP) layers, reconstructing interaction-specific representations. These intermediate outputs are aggregated and transformed into probabilistic edge predictions used to infer the GRN. GRNFormer was trained for 100 epochs with a batch size of 8, using a composite loss function that combines binary cross-entropy (BCE) for predicting regulatory edge presence with Kullback–Leibler (KL) divergence to regularize the variational latent space.

The BCE reconstruction loss is

$$\mathcal{L}_{\mathrm{BCE}}^{(v)} = -\sum_{(i,j)} \Big[\, y_{ij}^{(v)} \log \hat{y}_{ij}^{(v)} + \big(1 - y_{ij}^{(v)}\big) \log\big(1 - \hat{y}_{ij}^{(v)}\big) \Big],$$

which measures the discrepancy between the predicted probabilities $\hat{y}_{ij}^{(v)}$ and the true binary labels $y_{ij}^{(v)} \in \{0, 1\}$. To regularize the latent representation $z^{(v)}$, we include the KL divergence between the approximate posterior $q_\phi(z^{(v)} \mid x^{(v)})$ and a standard normal prior $p(z)$. The normalized KL term is

$$\mathcal{L}_{\mathrm{KL}}^{(v)} = -\frac{1}{2} \sum_{k} \Big[ 1 + \log\big(\sigma_{\phi,k}^2\big) - \mu_{\phi,k}^2 - \sigma_{\phi,k}^2 \Big],$$

where $\mu_{\phi,k}$ and $\sigma_{\phi,k}^2$ are the mean and variance of the $k$-th component of the latent vector $z^{(v)}$.

The total loss combines the reconstruction and regularization terms as

$$\mathcal{L}_{\mathrm{total}} = \sum_{v} \Big( \mathcal{L}_{\mathrm{BCE}}^{(v)} + \beta\, \mathcal{L}_{\mathrm{KL}}^{(v)} \Big),$$

where $\beta$ is a scaling factor (e.g. 0.01 or 1/#genes in the subgraph) controlling the strength of the KL regularization.
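The composite objective can be sketched per subgraph as follows (NumPy, using the log-variance parameterization commonly used for the KL term; names are illustrative):

```python
import numpy as np

def grn_loss(y_true, y_prob, mu, log_var, beta):
    """Composite loss for one subgraph: edge BCE + beta * KL divergence.

    y_true / y_prob: flat arrays of edge labels and predicted
    probabilities; mu, log_var: latent Gaussian parameters.
    """
    eps = 1e-12                                   # numerical safety in logs
    bce = -np.sum(y_true * np.log(y_prob + eps)
                  + (1.0 - y_true) * np.log(1.0 - y_prob + eps))
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return bce + beta * kl

# With a standard-normal posterior (mu = 0, log sigma^2 = 0) the KL term
# vanishes and only the reconstruction loss remains.
loss = grn_loss(np.array([1.0, 0.0]), np.array([0.9, 0.1]),
                np.zeros(16), np.zeros(16), beta=0.01)
```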

The training objective is framed as a binary classification task: identifying whether an edge between any two genes within a subgraph corresponds to a known regulatory interaction. Detailed loss functions and variational formulation are also explained in Method S6, available as supplementary data at Bioinformatics online, along with evidence lower bound (ELBO) formalization of the model. The model used Adam optimizer (initial learning rate 0.001) with a ReduceLROnPlateau scheduler and early stopping. L1 regularization was applied to prevent overfitting.

To address the extreme class imbalance typical of GRN inference, where true regulatory edges are sparse, we employed dynamic negative sampling during training. For each subgraph, we matched the number of negative (non-regulatory) edges to the number of positives, ensuring balanced and effective learning across highly imbalanced edge distributions. Importantly, the model was trained without any explicit knowledge of species, cell type, or regulatory context, and therefore it can be applied across diverse species, cell types, and regulatory contexts. All subgraphs were randomly shuffled prior to training, forcing the model to rely solely on expression-derived contextual signals. GRNFormer was trained on two NVIDIA A10 GPUs on the Nautilus national GPU cluster. Table 1 consolidates the essential configuration settings for Gene-Transcoder, GraViTAE, and TF-Walker, as well as the training procedure.
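The balanced negative sampling can be sketched as follows (plain Python, with hypothetical gene names):

```python
import random

def sample_balanced_edges(positives, genes, seed=0):
    """Sketch of dynamic negative sampling for one subgraph.

    Draws as many non-regulatory gene pairs as there are positive edges,
    excluding known positives (in either orientation) and self-pairs.
    """
    rng = random.Random(seed)
    pos = set(positives)
    negatives = set()
    while len(negatives) < len(pos):
        u, v = rng.sample(genes, 2)           # distinct genes, no self-pair
        if (u, v) not in pos and (v, u) not in pos:
            negatives.add((u, v))
    return sorted(negatives)

positives = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")]
genes = ["TF1", "TF2", "g1", "g2", "g3", "g4"]
negatives = sample_balanced_edges(positives, genes)
```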

Table 1.

Summary of key architectural and training hyperparameters used in GRNFormer.

Module Key settings
Gene-Transcoder 1D conv (kernel = 1, stride = 1); single-layer transformer encoder (four heads); output embedding = 64 dims; input = 100 genes × (#cells + TF flag).
GraViTAE (Encoder/Decoder) 3 TransConv blocks (encoder) + 3 TransConv blocks (decoder), with four attention heads per block; hidden size = 64; latent mean/variance dim = 16.
TF-Walker Training subgraph size = 100 genes; co-expression threshold = 0.1
Training Adam (lr = 0.001); ReduceLROnPlateau; batch size = 8; epochs = 100; L1 regularization; loss = BCE + normalized KL divergence (β = 1/#genes_in_subgraph)

2.9 GRN inference and evaluation on test datasets

During inference, GRNFormer employs an extended version of the TF-Walker strategy to generate high-coverage subgraphs for each test dataset. Unlike training, where 99 neighbors are sampled randomly, inference subgraphs are constructed sequentially to include all co-expressed neighbors of each TF, ensuring maximal gene and interaction coverage. Multiple subgraphs may contain overlapping genes; however, each subgraph is independently z-score normalized, treating recurring genes as context-specific instances to enhance robustness. The deterministic expansion of the TF-Walker algorithm used during the inference stage is illustrated in Method S7; Fig. S4, available as supplementary data at Bioinformatics online.

Predicted interactions from all subgraphs are aggregated to reconstruct the full GRN, with a focus on TF–gene and TF–TF interactions that form the regulatory core. Within each subgraph, interactions with predicted probability >0.3 were retained to capture both confident and potentially weaker but biologically relevant regulatory relationships, recognizing that gene regulation often involves subtle, low-strength interactions. TF-initiated edges were prioritized during aggregation. Each interaction was assigned a weighted score based on prediction probability, TF involvement, and frequency across subgraphs. These scores were summed, averaged, and min–max normalized to produce the final adjacency matrix, with entries representing the likelihood of regulatory relationships between gene pairs.
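One plausible reading of this aggregation is sketched below (the paper combines probability, TF involvement, and frequency; the additive TF bonus used here is an assumption for illustration):

```python
from collections import defaultdict

def aggregate_predictions(subgraph_preds, prob_cutoff=0.3, tf_bonus=1.0):
    """Sketch of cross-subgraph aggregation into final edge scores.

    subgraph_preds: (tf, target, probability) triples pooled over all
    inference subgraphs. Edges below the cutoff are dropped; retained
    edges are scored by probability plus a TF-involvement bonus
    (assumed weighting), summed over subgraphs, averaged over their
    frequency, then min-max normalized.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for tf, target, p in subgraph_preds:
        if p > prob_cutoff:
            sums[(tf, target)] += p + tf_bonus   # prioritize TF-initiated edges
            counts[(tf, target)] += 1
    scores = {e: sums[e] / counts[e] for e in sums}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                      # guard against a flat range
    return {e: (s - lo) / span for e, s in scores.items()}

preds = [("TF1", "g1", 0.9), ("TF1", "g1", 0.8),   # seen in two subgraphs
         ("TF1", "g2", 0.5), ("TF2", "g3", 0.2)]   # 0.2 falls below cutoff
final = aggregate_predictions(preds)
```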

We evaluated GRNFormer using standard classification metrics—area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), F1 score, precision, recall, accuracy, and early precision (EPR)—each offering a complementary insight into performance under class imbalance. Metric definitions are provided in Methods S8, available as supplementary data at Bioinformatics online. Context-specific evaluations were conducted on the BEELINE cross-cell-type test dataset (mESC, mHSC-L), focusing on biologically relevant gene subsets (e.g. top 500–1000 most variable genes, with or without TFs) to assess model robustness under constrained feature spaces.

Given the absence of annotated true negatives in GRN benchmarks, we adopted an enhanced negative sampling strategy based on bootstrapping. Following practices used in GRLGRN (Wang et al. 2025), CNNC, and scGREAT, we randomly sampled gene pairs not found in the ground-truth network as negatives. For each test dataset, we paired known TF–target interactions (positives) with an equal number of randomly sampled negatives. This process was repeated over 100 bootstrap iterations, reducing sampling bias and providing statistically robust performance estimates. In each iteration, classification metrics were computed from the predicted probabilities over the combined set of positives and sampled negatives. Final scores were averaged across all iterations. We primarily report AUROC and AUPRC in benchmarks, as they are the most reliable metrics for evaluating performance on sparse, imbalanced regulatory networks. When AUROC and AUPRC are calculated using the bootstrapped negative samplings, we report them as Sampled_AUROC and Sampled_AUPRC; full-matrix evaluations are reported as full test-set AUROC/AUPRC.
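The bootstrap protocol can be sketched as follows (plain Python; a rank-based AUROC stands in for a library call, and the names are illustrative):

```python
import random

def auroc(pos_scores, neg_scores):
    """AUROC as the probability a positive outranks a random negative."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def bootstrap_auroc(scores, positives, candidates, iters=100, seed=0):
    """Sketch of the bootstrapped Sampled_AUROC protocol.

    scores: predicted probability per gene pair; positives: known
    TF-target pairs; candidates: pairs absent from the ground truth.
    Each iteration draws an equal-size negative set at random, and the
    AUROC is averaged over all iterations.
    """
    rng = random.Random(seed)
    pos = [scores[e] for e in positives]
    total = 0.0
    for _ in range(iters):
        neg = [scores[e] for e in rng.sample(candidates, len(positives))]
        total += auroc(pos, neg)
    return total / iters

scores = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.8,
          ("g3", "g4"): 0.2, ("g3", "g5"): 0.1, ("g4", "g5"): 0.15}
positives = [("TF1", "g1"), ("TF1", "g2")]
candidates = [("g3", "g4"), ("g3", "g5"), ("g4", "g5")]
```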

Hyperparameter sensitivity analyses for the GCEN co-expression threshold and TF-Walker subgraph size were performed at the inference stage (Method S11, available as supplementary data at Bioinformatics online). Figure S5, available as supplementary data at Bioinformatics online, shows that GCEN thresholds of 0.1–0.3 yield stable performance, and Fig. S6 and Table S4, available as supplementary data at Bioinformatics online, indicate that TF-Walker subgraph sizes of 100–200 genes are consistently optimal. In our main experiments, we used a GCEN threshold of 0.1 and a subgraph size of 100; both parameters are configurable in the GRNFormer codebase for application-specific tuning. Guidelines for parameter selection are provided in Method S11C, available as supplementary data at Bioinformatics online.

3 Results

3.1 High-accuracy inference of GRNs across regulatory contexts

To evaluate the generalizability and contextual sensitivity of GRNFormer, we conducted a comprehensive benchmarking study using the BEELINE suite of single-cell gene expression datasets. Our goal was to assess whether GRNFormer could robustly infer GRNs across species boundaries, cell identities, and diverse regulatory contexts without relying on prior knowledge.

GRNFormer was trained to infer GRNs solely from gene and TF expression profiles, without providing information about gene names, cell type, species, or regulatory context during training or evaluation. This design encourages inductive generalization across diverse biological settings. During evaluation, global GRNs were reconstructed by aggregating regulatory interactions inferred from locally sampled, TF-centered co-expression subgraphs.

Training and internal evaluation were performed on five cell types (hESC, hHep, mDC, mHSC-E, and mHSC-GM). Although individual genes may appear in both phases, we performed localized z-score normalization across genes for each cell in every subgraph, ensuring that the model treats each instance independently and preventing memorization. To rigorously assess generalization and guard against information leakage, we conducted a cross-dataset evaluation on two held-out cell types—mESC and mHSC-L—not seen during training.
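The localized normalization described above can be expressed as a small helper; the function name and matrix layout are assumptions for illustration.

```python
import numpy as np

def zscore_per_cell(expr):
    """Localized normalization for one TF-Walker subgraph: standardize each
    cell (column) across the subgraph's genes (rows), so the model sees
    relative expression patterns rather than memorizable absolute levels.

    expr: (n_genes, n_cells) expression submatrix for a single subgraph.
    """
    mu = expr.mean(axis=0, keepdims=True)
    sd = expr.std(axis=0, keepdims=True)
    return (expr - mu) / np.where(sd == 0, 1.0, sd)   # guard constant columns
```

Because the statistics are recomputed within every subgraph, identical genes appearing in training and evaluation subgraphs carry different normalized values.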

Table 2 summarizes GRNFormer’s average performance across regulatory settings on the two withheld cell types (mESC and mHSC-L) unseen during training, reporting Sampled_AUROC (SAUROC), Sampled_AUPRC (SAUPRC), Sampled F1 score, precision, recall, and accuracy for each species–cell type–regulatory context combination. The results of the internal evaluation on the other five cell types are presented in Table S1, available as supplementary data at Bioinformatics online. During the cross-dataset evaluation, we assessed GRNFormer’s performance on subsets of each cell type’s dataset (mESC and mHSC-L) containing the top 500 and 1000 highly variable genes, both with and without the most significantly varying TFs. The test subsets were prepared according to BEELINE’s standard data preprocessing and evaluation protocols.

Table 2.

Summary of evaluation metrics of GRNFormer across the BEELINE test dataset.

| Network type | Cell type | Gene sets | #Genes | #True positives | SAUROC | SAUPRC | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-cell-type-specific ChIP-Seq | mESC | 1000 | 1000 | 852 | 0.965 | 0.938 | 0.925 | 0.998 | 0.96 | 0.959 |
| | | 500 | 500 | 264 | 0.97 | 0.953 | 0.92 | 0.999 | 0.958 | 0.956 |
| | | TF + 1000 | 1000 | 853 | 0.965 | 0.939 | 0.926 | 0.998 | 0.961 | 0.959 |
| | | TF + 500 | 500 | 264 | 0.965 | 0.943 | 0.918 | 0.999 | 0.957 | 0.955 |
| | mHSC-L | 1000 | 1000 | 589 | 0.984 | 0.974 | 0.956 | 0.999 | 0.978 | 0.977 |
| | | 500 | 500 | 109 | 0.981 | 0.966 | 0.956 | 1 | 0.977 | 0.977 |
| | | TF + 1000 | 1124 | 771 | 0.979 | 0.964 | 0.945 | 0.999 | 0.972 | 0.971 |
| | | TF + 500 | 624 | 467 | 0.942 | 0.903 | 0.886 | 0.859 | 0.872 | 0.874 |
| Cell-type-specific ChIP-Seq | mESC | 1000 | 1000 | 5499 | 0.967 | 0.942 | 0.93 | 0.998 | 0.963 | 0.962 |
| | | 500 | 500 | 1913 | 0.968 | 0.944 | 0.923 | 0.999 | 0.959 | 0.958 |
| | | TF + 1000 | 1000 | 5500 | 0.968 | 0.943 | 0.93 | 0.999 | 0.963 | 0.962 |
| | | TF + 500 | 500 | 1913 | 0.967 | 0.943 | 0.923 | 0.999 | 0.96 | 0.958 |
| | mHSC-L | 1000 | 1000 | 1325 | 0.979 | 0.959 | 0.956 | 0.999 | 0.978 | 0.977 |
| | | 500 | 500 | 1295 | 0.979 | 0.954 | 0.96 | 0.999 | 0.98 | 0.979 |
| | | TF + 1000 | 1124 | 7746 | 0.976 | 0.954 | 0.949 | 1 | 0.974 | 0.973 |
| | | TF + 500 | 624 | 5182 | 0.943 | 0.898 | 0.89 | 1 | 0.942 | 0.938 |
| STRING | mESC | 500 | 500 | 264 | 0.957 | 0.928 | 0.916 | 0.996 | 0.954 | 0.952 |
| | | 1000 | 1000 | 661 | 0.963 | 0.937 | 0.926 | 1 | 0.961 | 0.960 |
| | | TF + 1000 | 1000 | 661 | 0.964 | 0.937 | 0.925 | 0.991 | 0.957 | 0.955 |
| | | TF + 500 | 500 | 263 | 0.959 | 0.931 | 0.917 | 0.992 | 0.953 | 0.951 |
| | mHSC-L | 500 | 500 | 14 | 0.982 | 0.973 | 0.957 | 1 | 0.977 | 0.976 |
| | | 1000 | 1000 | 537 | 0.991 | 0.982 | 0.956 | 1 | 0.977 | 0.977 |
| | | TF + 1000 | 1124 | 114 | 0.981 | 0.972 | 0.941 | 1 | 0.969 | 0.968 |
| | | TF + 500 | 624 | 67 | 0.944 | 0.913 | 0.89 | 0.855 | 0.872 | 0.874 |

As shown in Table 2, GRNFormer consistently achieves high accuracy across all subsets spanning different network types (non-cell-type-specific ChIP-Seq, cell-type-specific ChIP-Seq, and STRING-based networks), cell types, and gene sets. High Sampled_AUROC and Sampled_AUPRC scores (∼0.90–0.98) together with high Sampled F1 scores (0.87–0.98) indicate robust, accurate, and generalized GRN inference across diverse conditions.

3.2 GRNFormer outperforms state-of-the-art GRN inference methods across single-cell benchmarks under blind evaluation

To systematically evaluate the performance of GRNFormer, we conducted multiple comprehensive benchmarking analyses against nine widely used GRN inference methods: five deep-learning methods (CNNC (Yuan and Bar-Joseph 2019), GNE (Kc et al. 2019), GNNLink (Mao et al. 2023), STGRNS, and scGREAT) and four traditional methods (LEAP (Jiang and Neapolitan 2015), PIDC (Chan et al. 2017), PPCOR (Kim 2015), and SINCERITIES (Papili Gao et al. 2018)). Table S2, available as supplementary data at Bioinformatics online, provides a brief summary of each method, and Method S9, available as supplementary data at Bioinformatics online, provides details on the benchmark setup. Performance was assessed on three independent gold-standard reference networks: (i) non-cell-type-specific ChIP-seq, (ii) cell-type-specific ChIP-seq, and (iii) functional interaction networks from STRING. For each method and evaluation setting, we computed Full test-set AUROC/AUPRC, Full test-set EPR, and Sampled_AUROC/AUPRC. Gene selection followed the BEELINE protocol, using four standard configurations: (i) top 500 highly variable genes (500), (ii) top 1000 highly variable genes (1000), (iii) TFs + top 500 (TF500), and (iv) TFs + top 1000 (TF1000) variable genes. While BEELINE primarily targets unsupervised GRN inference, our benchmarking includes both supervised and unsupervised methods; we therefore extended the evaluation framework with a unified clean negative pool to enable fair comparison across learning paradigms.

GRNFormer was trained once on a general dataset spanning five cell types (hESC, hHep, mDC, mHSC-E, and mHSC-GM) and evaluated on two unseen cell types (mESC and mHSC-L) without task-specific fine-tuning, simulating a stringent cross-dataset generalization scenario. In contrast, competing deep-learning methods were trained and evaluated on within-dataset splits of each test cell type, following the protocols described in their original publications.

This less stringent evaluation strategy was applied to the nine state-of-the-art (SOTA) methods because, as described in their original publications, most of these models were optimized for within-dataset performance and were not designed for cross-cell-type generalization. For consistency and fairness, we reproduced each competing method using the same training–validation splits but retained its originally reported training procedure. In all evaluations, only positive regulatory interactions were split into training, validation, and test sets. For sampled metrics, negative edges were bootstrapped in a 1:1 ratio from a clean negative evaluation pool, whereas full test-set AUROC and AUPRC were computed using the entire clean negative pool. This clean pool was constructed by removing all training/validation/test positives, all training/validation negatives, and self-loops from the full gene–gene space, ensuring that none of the negative edges used for evaluation was seen during any model’s training. The construction of the clean negative evaluation pool is described in Method S10, available as supplementary data at Bioinformatics online.
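The pool construction just described can be sketched with simple set operations (names are hypothetical; at genome scale, index arrays would replace Python sets):

```python
from itertools import product

def clean_negative_pool(genes, all_split_positives, train_val_negatives):
    """Build the clean negative evaluation pool: every directed gene-gene
    pair minus all training/validation/test positives, all training and
    validation negatives, and self-loops. Only edges never seen during any
    model's training remain available as evaluation negatives."""
    seen = set(all_split_positives) | set(train_val_negatives)
    return [(u, v) for u, v in product(genes, repeat=2)
            if u != v and (u, v) not in seen]
```

Sampled metrics then bootstrap negatives from this pool, while full-matrix metrics score against the whole pool.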

To evaluate performance under realistic extreme class imbalance, we first performed a full-matrix evaluation using the positive test set and the complete clean negative evaluation pool, which served as unseen negatives (see Method S10 for the detailed protocol and Fig. S8B for test-positive-to-negative edge counts, available as supplementary data at Bioinformatics online).

Figure 3 presents comparison heatmaps of Full test-set AUROC and Full test-set AUPRC across all methods and test datasets under this standardized full-matrix evaluation. Even though GRNFormer was tested on cell types it had never encountered during training, while several competing methods were trained and tested within the same cell type, GRNFormer consistently achieves the highest or near-highest performance across most of the datasets. In Full test-set AUROC (Fig. 3A), GRNFormer maintains strong performance in both non-cell-type-specific and STRING network settings (Full test-set AUROC >0.8 on 21/24 datasets), with particularly robust values in the mESC STRING benchmark and the non-ChIP-seq TF500/TF1000 gene sets (Full test-set AUROC >0.9). In terms of Full test-set AUPRC (Fig. 3B), GRNFormer shows clear advantages in precision–recall behavior, especially on non-cell-type-specific and cell-type-specific ChIP-seq datasets, where many competing methods degrade substantially. Traditional statistical approaches such as PPCOR and SINCERITIES continue to perform poorly in these settings, reflecting their limited ability to generalize. Among the external supervised competing methods, GNE emerges as the strongest, achieving the most competitive Full test-set AUROC and Full test-set AUPRC, whereas LEAP emerges as the strongest competing unsupervised method across datasets under the stringent full-matrix setting. GNNLink also performs well on Full test-set AUROC for several datasets, but its Full test-set AUPRC varies considerably across evaluation settings, reflecting greater sensitivity to dataset characteristics. In contrast, GRNFormer surpasses all SOTA methods, including GNE, on the majority of datasets, with especially strong consistency in Full test-set AUPRC, where precise identification of true regulatory relationships is most critical (see Results S1B, available as supplementary data at Bioinformatics online, for detailed analyses).
To further evaluate early-ranking performance, we computed Early Precision (EP) across multiple cutoffs (Results S1.C, Figs. S10 and S11, available as supplementary data at Bioinformatics online). GRNFormer demonstrates consistently strong and stable recovery of true regulatory interactions at higher cutoffs (EP@1000, EP@2000, and EPR@#ground-truth (gt)), with GNE being equally competitive at the lower cutoffs EP@100 and EP@gt; overall, GRNFormer outperforms the other competing methods across datasets with an average win rate of 84% (GRNFormer ≥ other methods). Win-rate (Fig. S11A, available as supplementary data at Bioinformatics online) and rank-frequency (Fig. S11B, available as supplementary data at Bioinformatics online) analyses further confirm GRNFormer’s robust prioritization of biologically relevant regulatory edges.
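The two early-ranking metrics can be sketched as follows (hypothetical helper names; ties and a zero cutoff are not handled):

```python
def early_precision(ranked_edges, true_edges, k):
    """EP@k: fraction of the k top-ranked predicted edges found in the
    ground-truth network."""
    return sum(1 for e in ranked_edges[:k] if e in true_edges) / k

def epr(ranked_edges, true_edges, n_possible, k=None):
    """Early Precision Ratio: EP@k divided by the precision a random ranking
    would achieve. k defaults to the number of ground-truth edges (EPR@gt)."""
    k = k if k is not None else len(true_edges)
    return early_precision(ranked_edges, true_edges, k) / (len(true_edges) / n_possible)
```

An EPR of 1.0 means no better than random edge ordering; values well above 1.0 indicate that true interactions are concentrated at the top of the ranking.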

Figure 3.

Figure 3: A two-panel figure comparing GRNFormer with nine GRN inference methods across multiple datasets using heatmaps. (A) AUROC scores: Three side-by-side heatmaps show performance on non-cell-type-specific ChIP-seq, cell-type-specific ChIP-seq, and STRING benchmarks. Rows correspond to different transcription factor settings (e.g., TF500, TF1000) and cell types, while columns list methods. Each cell contains a numeric AUROC value with color intensity indicating performance (higher values in yellow/orange, lower in purple). GRNFormer consistently shows high AUROC scores, often among the top across datasets. (B) AUPRC scores: Similar layout with three heatmaps for the same benchmarks. Values (scaled by ×1e−2) are shown in each cell, with color intensity reflecting performance. GRNFormer generally achieves competitive or higher AUPRC values compared to other methods, particularly in cell-type-specific ChIP-seq evaluations.

Comparison of GRNFormer and nine state-of-the-art methods on various test datasets. (A) Full test-set AUROC scores for 10 GRN inference methods evaluated on the BEELINE test set using three gold-standard benchmark GRNs: non-cell-type-specific ChIP-seq, cell-type-specific ChIP-seq, and the STRING protein–protein interaction network. GRNFormer consistently ranks among the top performers across all evaluation scenarios. (B) Full test-set AUPRC scores for the same evaluation settings. For readability, values are displayed after factoring out ×1e−2.

To complement the full-matrix evaluation, we additionally computed balanced sampled metrics following common supervised benchmarking practice. Specifically, we performed 100 iterations of 1:1 bootstrapped negative sampling from the same clean evaluation pool, pairing each test positive with a randomly drawn unseen negative edge. Because these metrics are derived from balanced sampling rather than the full edge space, they are explicitly reported as Sampled_AUROC and Sampled_AUPRC in Fig. S7A and B, available as supplementary data at Bioinformatics online.

Detailed protocols and results are provided in Method S10 and Results S1.A, available as supplementary data at Bioinformatics online. These sampled and full-matrix robustness evaluations (Results S1A–C; Figs. S7A and B and S8–S11, available as supplementary data at Bioinformatics online) confirm that GRNFormer is the top-performing method on more than half of the datasets, with GNE emerging as the strongest competing method under both the sampled and the stringent full-matrix settings. These results demonstrate that GRNFormer reliably infers high-confidence GRNs in a data-agnostic manner and achieves SOTA accuracy under a stringent evaluation framework in which all methods are tested against the same unseen positives and negatives. This robustness across diverse benchmarks underscores GRNFormer’s practical utility for single-cell regulatory genomics.

3.3 Ablation study: dissecting GRNFormer’s architecture

To assess the contribution of each architectural component in GRNFormer, we performed a systematic ablation study using a cross-lineage blind evaluation. The model was trained on five diverse expression datasets (hESC, hHep, mDC, mHSC-E, mHSC-GM) and blindly evaluated on two unseen cell types (mESC and mHSC-L) to rigorously test generalization across species and lineages. Ablation results shown in Results S2 and Table S3, available as supplementary data at Bioinformatics online, demonstrate that the full GRNFormer model—comprising the TF-Walker subgraph sampler, Gene-Transcoder, Transformer Convolution (TransConv) block, and GraViTAE decoder—achieved the highest performance (Sampled_AUROC = 0.97, Sampled_AUPRC = 0.95), confirming its robustness across biological contexts. The study reveals that TF-Walker effectively localizes transcriptional context, the Gene-Transcoder and TransConv Block enrich representation learning, and the decoder improves accuracy in predicting regulatory interactions. Together, these components contribute to improved GRN reconstruction performance across diverse transcriptomic contexts.

3.4 Cross-species generalization on DREAM5 bulk RNA-seq datasets

To evaluate GRNFormer’s ability to generalize beyond mammalian single-cell data, we conducted a blind assessment using bulk RNA-seq datasets from E. coli and S. cerevisiae curated by the DREAM5 challenge. These benchmarks span distinct regulatory architectures—prokaryotic and unicellular eukaryotic—and are derived from high-confidence ChIP-based GRNs. GRNFormer, trained solely on human and mouse single-cell RNA-seq, was tested on these microbial bulk datasets without using species-specific annotations or prior regulatory knowledge.

This evaluation addressed two core questions: (i) Can GRNFormer generalize across species with divergent transcriptional logic? (ii) Does it retain accuracy when applied to bulk RNA-seq without retraining?

Applied directly to DREAM5 expression matrices, GRNFormer achieved Sampled_AUROC scores of 0.979 (E. coli) and 0.977 (S. cerevisiae), with Sampled_AUPRCs of 0.955 and 0.957, respectively, despite GRNFormer never seeing bulk or microbial data during training. Precision–recall and receiver operating characteristic curves are shown in Results S3; Fig. S12, available as supplementary data at Bioinformatics online. To ensure reliability, we used 100 rounds of bootstrapped negative sampling (1:1 ratio) and averaged metrics to mitigate variance. These findings confirm GRNFormer’s broad utility: it captures regulatory principles conserved across species, performs robustly across transcriptomic modalities, and scales up to genome-wide data without retraining. Its robust performance on DREAM5 underscores its potential as a general-purpose GRN inference engine, particularly in settings where ground-truth networks are limited or unavailable.

3.5 Scalability and robustness of GRNFormer across gene set size, dataset complexity, and species

We evaluated the computational and biological scalability of GRNFormer using a diverse set of expression datasets varying in gene number, cell type, species, and regulatory context. Our analysis included all single-cell RNA-seq datasets from the BEELINE suite and bulk RNA-seq datasets from E. coli and yeast provided by the DREAM5 challenge. As shown in Fig. 4C, inference runtime scaled predictably with the number of input genes (ranging from ∼500 to ∼5900 genes), increasing smoothly on a logarithmic scale despite high dimensionality. Notably, this trend held across a wide range of network densities, reflecting the efficiency of the TF-Walker sampling strategy and the fixed-size subgraph processing pipeline. To assess computational scalability, all inference experiments were conducted on a single NVIDIA A100 GPU. For inference, the maximum peak memory usage was <2500 MB, while the final memory usage after completion remained below 1000 MB. For 4911 genes, inference required 1732.56 s, with a peak memory usage of 1074.54 MB and a final memory usage of 380.87 MB. For 5910 genes, inference took 4512.08 s, with a peak memory usage of 2406.18 MB and a final memory usage of 915.49 MB. Here, peak memory refers to the maximum GPU memory allocated at any point during inference, while final memory indicates the memory remaining after inference. These results demonstrate that our method is both memory-efficient and scalable to datasets with thousands of genes.

Figure 4.

Figure 4: A multi-panel figure illustrating additional analyses of GRNFormer. (A) A gene regulatory network graph for human embryonic stem cells (hESCs), showing predicted regulatory interactions not present in ground-truth networks. Nodes represent genes, with transcription factors highlighted in orange and non-transcription factor genes in gray. Directed edges indicate regulatory relationships. (B) An enrichment map summarizing biological functions of the network in (A). Nodes represent enriched terms such as lineage differentiation, gastrulation, mesodermal commitment, and ectoderm/endoderm differentiation, connected by edges indicating related processes. (C) A scatter plot of runtime versus number of genes (log scale for time). Each point represents an experiment, with color indicating network density. Runtime generally increases with gene set size. (D) A line and scatter plot showing AUROC and AUPRC scores versus number of genes. Both metrics remain high across increasing gene set sizes, with slight variation at mid-range sizes. (E) A heatmap of cross-species performance (E. coli, human, mouse, yeast), showing average AUPRC and AUROC scores. Performance is highest for E. coli and yeast, and slightly lower for human and mouse datasets.

(A) Novel regulatory interactions of hESCs predicted by GRNFormer outside of ground-truth networks. (B) Enrichment map of GRN shown in A. (C) Runtime versus number of genes. (D) Sampled_AUROC or Sampled_AUPRC versus number of genes. (E) The average performance on the test datasets of four different species.

The prediction performance remained consistently high across increasing gene set sizes. Figure 4D shows Sampled_AUROC and Sampled_AUPRC scores as a function of the number of genes per dataset. GRNFormer maintained robust performance—even on large-scale networks—with both Sampled_AUROC and Sampled_AUPRC exceeding 0.9 for most settings beyond 2000 genes. These results demonstrate that the model does not overfit to low-dimensional settings and effectively captures regulatory signals in complex transcriptomic profiles.

To further assess generalizability across phylogenetic and regulatory variation, we benchmarked GRNFormer on gene expression datasets from Homo sapiens, Mus musculus, S. cerevisiae, and E. coli. As shown in Fig. 4E, GRNFormer achieved high Sampled_AUROC and Sampled_AUPRC scores across all four species. This cross-species robustness highlights GRNFormer’s capacity to learn transferable regulatory representations. Together, these results affirm GRNFormer’s suitability for scalable, high-accuracy GRN inference, capable of generalizing across increasing gene dimensions, diverse species, and varied biological contexts without retraining or fine-tuning.

GRNFormer shows strong robustness to perturbations in gene expression data. Under 20 combinations of Gaussian noise and dropout, Sampled_AUROC and Sampled_AUPRC varied by <0.3% and 0.18%, respectively (Results S4; Fig. S13; Tables S5.1–S5.3, available as supplementary data at Bioinformatics online). This stability arises from correlation-preserving z-score normalization, TF-Walker’s context-based subgraph sampling, and the smoothing effect of GraViTAE’s variational latent space. These results demonstrate that GRNFormer maintains reliable predictive performance even under substantial noise and sparsity, supporting its applicability across diverse scRNA-seq conditions.
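One such noise/dropout perturbation can be sketched as below; the function name and parameter defaults are illustrative, not the published settings.

```python
import numpy as np

def perturb_expression(expr, noise_sd=0.1, dropout_rate=0.2, seed=0):
    """Apply additive Gaussian noise followed by random dropout
    (zero-masking) to an expression matrix, mimicking one of the
    noise/sparsity combinations in the robustness analysis."""
    rng = np.random.default_rng(seed)
    noisy = expr + rng.normal(0.0, noise_sd, size=expr.shape)
    keep = rng.random(expr.shape) >= dropout_rate   # False = dropped entry
    return noisy * keep
```

Sweeping `noise_sd` and `dropout_rate` over a grid reproduces the kind of 20-combination perturbation sweep used to test stability.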

We further evaluated the robustness of GRNFormer in comparison to the existing SOTA methods under the full-matrix setting (Results S1.B, available as supplementary data at Bioinformatics online). GRNFormer consistently maintained the highest median full AUROC and full AUPRC distributions across all datasets (Fig. S8, available as supplementary data at Bioinformatics online), whereas competing methods exhibited substantially broader spread and lower lower-quartile performance. Average-rank and win-rate summaries (Fig. S9, available as supplementary data at Bioinformatics online) confirm that GRNFormer achieves the greatest proportion of rank-1 outcomes across datasets for both AUROC and AUPRC. In addition, paired t-tests confirm that GRNFormer significantly outperforms GNE and GNNLink across datasets (Table S6, available as supplementary data at Bioinformatics online).

3.6 Case study: GRNFormer recovers hESC pluripotency and predicts PBMC lineage networks via blind inference

hESCs are pluripotent, self-renewing cells capable of differentiating into all somatic lineages (Gepstein 2002, Vazin and Freed 2010). Their shared regulatory programs with cancer—such as those controlling the cell cycle, apoptosis, and epigenetic states—make hESCs a valuable model for uncovering candidate oncogenic regulators (Blum and Benvenisty 2008). In the first case study, to evaluate GRNFormer’s ability to recover biologically meaningful transcriptional circuits, we applied it to the hESC dataset that includes all annotated TFs and the top 500 most variable genes. The predicted GRN was benchmarked against cell-type-specific ChIP-seq ground truth, and all interactions shown in Fig. S14A, available as supplementary data at Bioinformatics online, correspond to predictions supported by the experimental ground-truth reference. GRNFormer successfully reconstructed the core transcriptional architecture of hESCs, identifying master regulators of pluripotency, including POU5F1 (OCT4), SOX2, MYC, and NANOG, as central nodes in the network. These TFs are well established to operate in tightly interconnected autoregulatory circuits that coordinate gene expression programs essential for sustaining pluripotency and suppressing differentiation (Yeo and Ng 2013) (see Results S5, Fig. S14A and B for details, available as supplementary data at Bioinformatics online).

In addition to predicting known pluripotency networks, GRNFormer revealed a novel transcriptional module centered on GATA6, HAND1, NRP1, HNF1B, and TET2 (Fig. 4A). These TFs are not typically active in ground-state pluripotency but are known to mediate early lineage specification events, including mesendoderm formation, cardiac development, neurulation, and vascular morphogenesis (Cirio et al. 2011, Lynch et al. 2025). TET2, in particular, plays a role in DNA demethylation and chromatin remodeling, and its presence in this module suggests an epigenetically primed state (Eyres et al. 2021). Importantly, this subnetwork was absent from the considered ChIP-seq-derived hESC gold-standard networks, implying that GRNFormer can capture regulatory heterogeneity or transient cell states that elude bulk profiling techniques. To characterize the functional relevance of the predictions shown in Fig. S14A, available as supplementary data at Bioinformatics online, and Fig. 4A, we performed pathway enrichment analysis using g:Profiler (Raudvere et al. 2019) and visualized the results (Fig. S14B, available as supplementary data at Bioinformatics online, and Fig. 4B, respectively) using Cytoscape (Shannon et al. 2003). The targets of the novel regulatory module (Fig. 4B) were enriched for developmental programs, such as gastrulation and specification of the endoderm, mesoderm, and ectoderm (Muhr and Ackerman 2023). Additional pathways included cardiac development and neuronal differentiation (Yao et al. 2017, Li et al. 2024, Mensah and Gowher 2024). These findings indicate that the network in Fig. 4A reflects an early, lineage-primed transcriptional program, potentially marking the transition from naïve pluripotency to germ layer commitment. The developmental programs enriched in the novel subnetwork were absent from the core pluripotency GRN, suggesting that GRNFormer distinguishes stable from transitional regulatory states.
These predictions reveal cryptic or low-frequency transcriptional programs in hESCs that may represent early cell fate determinants.

To illustrate interpretability, we examined a predicted subgraph centered on the gene NRP1 (Results S6; Fig. S15, available as supplementary data at Bioinformatics online). This NRP1-centered subgraph connects the gene to multiple TFs, including POU5F1 (OCT4), LHX1, HAND1, GATA6, and TET2. Higher predicted edge weights (shown in warmer colors) highlight strong model-inferred associations between NRP1 and both core pluripotency factors and lineage-priming TFs. Several high-weight edges are not present in the considered ChIP-derived gold standard but are consistent with literature links to developmental processes, suggesting that GRNFormer’s high-weight predictions are enriched for biologically plausible novel interactions. This case study highlights GRNFormer’s ability to recover both known and novel, biologically plausible regulatory modules.

In the second case study, to evaluate GRNFormer’s ability to resolve cell-type-specific transcriptional programs, we applied it in a zero-shot setting to the 10x Genomics PBMC 3k dataset, a standard benchmark for immune single-cell profiling. The dataset was obtained in preprocessed form via the Scanpy toolkit (Wolf et al. 2018), and GRNFormer inferred regulatory networks without any access to cell-type labels, clustering results, or pathway priors, relying solely on gene expression.

A subnetwork extracted from the predicted GRN (Fig. S16A, available as supplementary data at Bioinformatics online) highlights high-centrality genes with well-established immunological relevance, including MS4A1, GNLY, NKG7, KLRB1, LST1, FCER1A, and TAL1. These genes mark key immune lineages: MS4A1 (CD20) marks B cells (Stamenkovic and Seed 1988), GNLY and NKG7 define cytotoxic T and NK cells (Krensky and Clayberger 2005), KLRB1 is linked to NK cell function (Fang and Zhou 2024), LST1 marks monocytes (Rollinger-Holzinger et al. 2000), FCER1A labels plasmacytoid dendritic cells (Reshetnikova et al. 2018), and TAL1 regulates early hematopoietic lineage commitment (Real et al. 2012). These predictions underscore the model’s capacity to capture underlying regulatory logic directly from transcriptomic signals.

To interpret the functional landscape of the inferred network, we also performed pathway enrichment analysis using g:Profiler and visualized the results in Cytoscape via the EnrichmentMap plugin. The enriched pathways formed coherent modules aligning with known PBMC biology (Fig. S16B and Results S7, available as supplementary data at Bioinformatics online), including vesicle trafficking, antigen presentation, and apoptotic signaling pathways. The interaction module around TAL1 inferred by GRNFormer is supported by literature evidence that TAL1 is a master regulator of hematopoiesis (Sanda et al. 2012). These findings demonstrate that GRNFormer can infer biologically coherent, cell-type-specific functional modules from single-cell data without external supervision. The model not only recapitulates known transcriptional circuits in immune lineages but also organizes them into interpretable regulatory programs, validating its utility for de novo GRN reconstruction in complex tissues.

4 Discussion

GRNFormer offers a scalable, context-aware framework for inferring GRNs from transcriptomic data. Unlike traditional approaches that rely on predefined motifs, GRNFormer learns regulatory patterns through TF-centered subgraphs sampled via the TF-Walker strategy, enabling inductive generalization. Ablation studies confirmed the critical role of this approach, with random sampling resulting in diminished performance. The Gene-Transcoder and GraViTAE modules further capture biologically meaningful features and learn gene regulatory representations without relying on sequence priors.

The model demonstrates effective generalization across the species evaluated (human, mouse, yeast, E. coli), data types (bulk and single-cell RNA-seq), and regulatory modalities. On the DREAM5 yeast and bacterial benchmarks, despite no prior exposure to prokaryotic data, GRNFormer achieved high Sampled_AUROC (∼0.97) and Sampled_AUPRC (∼0.95), strong Full test-set AUROC/AUPRC, and a high EPR win rate (∼84%). It also uncovered lineage-specific regulatory programs in human tissues and identified functional modules linked to pluripotency, antigen presentation, epithelial–mesenchymal transition (EMT), and developmental pathways such as gastrulation. Additionally, cross-context interactions, including links between pluripotency and oncogenesis, were revealed. Beyond performance, GRNFormer provides practical advantages in data efficiency, scalability, and interpretability. Unlike many supervised GRN methods, it generalized to new species without retraining in our evaluations, making it applicable to systems with limited annotation. Its modular design allows easy adaptation, while edge-centric decoding ensures computational scalability without full-graph propagation.

Nevertheless, GRNFormer has certain limitations. Although GRNFormer performs well in zero-shot and cross-context settings, its accuracy may be reduced in datasets with extreme sparsity or weak TF signal. Moreover, its reliance on local expression neighborhoods may overlook distal or chromatin-level regulatory interactions such as enhancer–promoter loops.

Looking forward, future developments could extend GRNFormer’s capabilities in several directions. Integrating multi-omics data, such as chromatin accessibility (ATAC-seq), epigenetic modifications, and TF motif occupancy, could enrich the contextual learning of regulatory interactions. Similarly, adapting the architecture to model temporal dynamics would enable causal modeling of transcriptional cascades. Further, incorporating cross-cell alignment or contrastive learning objectives may improve generalization in datasets with limited signal or across species.

Supplementary Material

btag144_Supplementary_Data

Contributor Information

Akshata Hegde, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, 65211, United States; Roy Blunt Nextgen Precision Health, University of Missouri, Columbia, Missouri, 65211, United States.

Jianlin Cheng, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, 65211, United States; Roy Blunt Nextgen Precision Health, University of Missouri, Columbia, Missouri, 65211, United States.

Author contributions

Akshata Hegde (Data curation [lead], Formal analysis [lead], Investigation [lead], Methodology [equal], Software [lead], Validation [lead], Visualization [lead], Writing—original draft [lead], Writing—review & editing [equal]) and Jianlin Cheng (Conceptualization [lead], Formal analysis [supporting], Funding acquisition [lead], Investigation [supporting], Methodology [equal], Project administration [lead], Resources [lead], Supervision [lead], Validation [supporting], Writing—review & editing [equal])

Supplementary material

Supplementary material is available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by National Science Foundation (NSF) grants CCF2343612 and IOS2525780 and Department of Energy grant DE-SC0026121.

Data availability

GRNFormer is available on GitHub (https://github.com/BioinfoMachineLearning/GRNformer); the version used in this work is archived on Zenodo (https://doi.org/10.5281/zenodo.18868395), with evaluation resources for reproducibility.

