Genome Biology. 2025 Nov 20;26:394. doi: 10.1186/s13059-025-03860-8

Structure-enhanced graph meta learning for few-shot gene regulatory network inference

Weiming Yu 1,2, Zhuobin Chen 3, Yaohua Hu 4, Jing Qin 3, Le Ou-Yang 2
PMCID: PMC12636225  PMID: 41267133

Abstract

Inferring gene regulatory networks (GRNs) is essential for understanding biological regulation. Although numerous deep learning approaches have been developed for GRN inference, most require large amounts of labeled data. We present Meta-TGLink, a structure-enhanced graph meta-learning model for few-shot GRN inference. By formulating GRN inference as a link prediction task, Meta-TGLink captures transferable regulatory patterns while reducing dependence on extensive labeled datasets. The model combines graph neural networks with Transformer architectures to integrate relational and positional information, thereby improving predictive performance under data-scarce conditions. Experiments on real datasets demonstrate its superiority over state-of-the-art baselines, particularly in cross-domain few-shot scenarios.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-025-03860-8.

Keywords: Graph meta-learning, Gene regulatory networks, Network inference, Graph neural networks

Background

Gene regulatory networks (GRNs) characterize the complex regulatory interactions between transcription factors (TFs) and target genes (TGs), and are crucial for the regulation of transcription processes and the execution of cellular functions [1]. By inferring GRNs, we can gain insights into the fundamental mechanisms of cellular behavior, identify potential targets for therapeutic intervention, and improve our understanding of genetic disorders and diseases [2, 3]. The advent of next-generation sequencing technology has revolutionized our understanding of gene regulation by generating gene expression data at an unprecedented scale and speed [4], which provide a solid foundation for using computational methods to infer GRNs. However, challenges often arise when utilizing gene expression data to infer GRNs, due to the inherent complexity and dynamic nature of gene regulation [5].

Over the past decade, advances in machine learning have led to the development of various techniques aimed at inferring GRNs from gene expression data. These methods can be broadly classified into two main types based on their use of supervised information. Unsupervised learning methods primarily leverage statistical measures, such as correlation coefficients [6], or machine learning techniques to identify gene associations [7]. For example, tree-based approaches [8] use regression trees to enhance robustness against noise in GRN inference, while gradient boosting [9] improves scalability to large datasets. Other unsupervised methods incorporate generative models, such as a variational auto-encoder combined with a structural equation model [10], or utilize a meta-decoder to generate pseudo-labels and perform bi-level optimization [11]. However, without incorporating prior regulatory relationships, unsupervised methods struggle with the inherent noise and complexity of gene expression data, leading to high false-positive rates. Supervised learning methods mitigate this issue by leveraging prior regulatory knowledge in the training process. For instance, some approaches transform gene expression data into histograms and use convolutional neural networks (CNNs) to capture complex gene interactions [12-14], while others apply multi-layer perceptrons (MLPs) [15] or probabilistic matrix factorization [16] to model gene regulatory interactions. Additionally, graph neural network (GNN)-based methods [17] are well-suited for GRN inference, as GRNs naturally exhibit graph structures, making GNNs effective for modeling regulatory interactions through topological dependencies [18]. Some methods employ Graph Attention Networks (GATs) [19] to adaptively aggregate gene information [20, 21], while others use Graph Convolutional Networks (GCNs) [22] to learn gene representations [23].
Other advanced approaches integrate structural causal models with GNNs [24], enabling causal inference among genes.

Although deep learning methods have greatly advanced GRN inference, most rely on large amounts of labeled data (i.e., known gene regulatory relationships), which are often difficult or costly to obtain, particularly in less-studied cell types or species. This limitation is evident in several scenarios. For instance, when inferring regulatory relationships for a new TF, the lack of known TGs leads to a TF cold-start problem that severely restricts inference. A similar challenge arises in constructing cell line-specific or cell type-specific GRNs, where prior regulatory knowledge is often limited. A common strategy to address these limitations is transfer learning [25, 26], which leverages knowledge from well-labeled cell lines to enhance inference in label-scarce cell lines. Cross-species knowledge transfer provides another promising direction [27], as GRNs from model organisms are often more accessible than those from human cell lines. These challenges motivate the formulation of GRN inference as a few-shot learning problem, where models must not only predict regulatory interactions from limited labeled data but also adapt quickly to new tasks.

Meta-learning, also known as “learning to learn”, has emerged as an effective learning paradigm for enhancing model adaptability. It leverages experience from multiple learning episodes across related tasks to enhance performance on new tasks [28], making it well-suited to address challenges in few-shot learning [29, 30]. Recently, meta-learning has been applied to tasks involving graph-structured data, giving rise to graph meta-learning, which has shown effectiveness in tackling various network-related problems [31-34]. Building on these advances, meta-learning provides a promising strategy for improving GRN inference in few-shot scenarios by enabling knowledge transfer across tasks.

In this work, we propose a structure-enhanced graph meta-learning model (Meta-TGLink) for few-shot GRN inference. Similar to most GNN-based methods, we formulate GRN inference as a link prediction task, where the goal is to identify previously unknown gene regulatory relationships based on a graph of known gene interactions. Firstly, to leverage the topological structure of GRNs, we design a specialized meta-task tailored for GRN inference, which alleviates the issue of limited episodes during meta-training. Secondly, to address the challenge of performing link prediction with limited known gene regulatory information and restricted message passing in few-shot scenarios, we introduce a structure-enhanced GNN module. This module alternates between Transformer [35] and GNNs to mutually enhance their feature extraction capabilities. Additionally, we incorporate a positional encoding module to capture more gene-related information, thereby improving the quality of learned gene representations.

Extensive experimental results on four real-world datasets demonstrate that Meta-TGLink outperforms nine state-of-the-art methods in benchmark GRN inference. Notably, Meta-TGLink delivers outstanding performance in three few-shot scenarios and one zero-shot scenario compared with other GNN-based baselines, highlighting its exceptional generalization capability. Finally, we utilized Meta-TGLink to infer regulatory relationships of new TFs in a real-world dataset and validated the reliability of the inferred GRN through the ChIP-Atlas [36] database and gene set enrichment analysis.

Results

Meta-TGLink advances GRN inference by integrating meta-learning with GNNs

GRN inference becomes especially challenging when prior regulatory information is scarce. To overcome this limitation, we developed a novel meta-learning framework, Meta-TGLink, inspired by Model-Agnostic Meta-Learning (MAML) [37]. As illustrated in Fig. 1, Meta-TGLink consists of two main phases: meta-training and meta-testing.

Fig. 1.

Illustration of the complete meta-learning process for few-shot GRN inference. We first construct L meta-tasks using a source dataset with a dense prior network. Subsequently, for each task, (1) we feed subgraphs from the support set into our proposed TGLink model. (2) The training loss is computed on the support set and (3) utilized to update the parameters of TGLink through back-propagation. Following this, (4) we feed subgraphs from the query set into the model with updated parameters, and (5) compute the loss of the query set, which (6) is back-propagated for further parameter updates. For the remaining tasks, we repeat the same process to update the model parameters. During the meta-testing stage, we sample a single meta-task from the target dataset and (7) feed the few-shot support set into the trained model. Subsequently, (8) we calculate the loss and (9) update the model parameters using back-propagation. Finally, (10) we perform inference and evaluate the model performance on the query set

During the meta-training phase, we construct multiple meta-tasks, each composed of a support set and a query set. Meta-TGLink leverages both the support and query sets through a bi-level optimization process (see Methods), enabling the model to learn transferable regulatory patterns across genes. In the meta-testing phase, a single meta-task is formed, where the support set contains a small number of known regulatory interactions, and the query set consists of the gene relationships to be inferred.
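The bi-level optimization above can be sketched in miniature. The toy example below is our own illustration, not the authors' code: a linear predictor stands in for TGLink, all names and hyperparameters are assumptions, and the outer update uses the first-order MAML approximation. Each task adapts on its support set (inner loop) and the shared parameters are updated with the query-set gradient at the adapted point (outer loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, X, y):
    """Gradient of mean squared error, standing in for the link-prediction loss."""
    return 2 * X.T @ (X @ theta - y) / len(y)

def meta_train(tasks, inner_lr=0.05, outer_lr=0.01, epochs=50):
    theta = np.zeros(2)  # shared (meta) parameters
    for _ in range(epochs):
        for (Xs, ys), (Xq, yq) in tasks:
            # Inner loop: adapt the shared parameters on the task's support set.
            theta_task = theta - inner_lr * loss_grad(theta, Xs, ys)
            # Outer loop: update the shared parameters with the query-set
            # gradient taken at the adapted point (first-order approximation).
            theta = theta - outer_lr * loss_grad(theta_task, Xq, yq)
    return theta

def make_split(n=32):
    # Every task shares the underlying mapping y = 2*x0 - x1,
    # playing the role of a transferable regulatory pattern.
    X = rng.normal(size=(n, 2))
    return X, X @ np.array([2.0, -1.0])

tasks = [(make_split(), make_split()) for _ in range(8)]
theta = meta_train(tasks)  # converges toward the shared pattern [2, -1]
```

After meta-training, `theta` is close to the pattern shared across tasks, so a few inner-loop steps suffice to adapt to a new task, which is the property exploited in the meta-testing phase.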

To enable accurate GRN inference for target cell lines with only a limited number of known regulatory interactions, we formulate the meta-task as a subgraph-level link prediction problem. This design is motivated by two considerations. First, the lack of known regulatory relationships for specific cell lines makes it difficult to construct enough tasks at the cell line level, where each task corresponds to inferring the GRN of a distinct cell line. Dividing a single cell line’s GRN into multiple subgraphs alleviates this limitation. Second, training on subgraphs allows the model to better adapt to few-shot scenarios and learn diverse regulatory structures, thereby improving its ability to capture heterogeneous patterns within GRNs.
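As a rough illustration of this subgraph-level task construction, the sketch below (the helpers `k_hop_nodes` and `make_meta_task` are our own names, not from the paper) carves a k-hop subgraph around a seed gene in the prior network and splits its edges into a support set and a query set:

```python
import random
from collections import deque

def k_hop_nodes(adj, seed, k=2):
    """Collect nodes within k hops of `seed` in the undirected view of the prior GRN."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

def make_meta_task(edges, seed, k=2, support_frac=0.5, rng=None):
    rng = rng or random.Random(0)
    adj = {}
    for u, v in edges:  # undirected adjacency over the known regulatory edges
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = k_hop_nodes(adj, seed, k)
    sub = [e for e in edges if e[0] in nodes and e[1] in nodes]
    rng.shuffle(sub)
    cut = max(1, int(len(sub) * support_frac))
    return sub[:cut], sub[cut:]  # (support set, query set)
```

Repeating this over many seed genes yields many meta-tasks from a single cell line's GRN, which is what alleviates the shortage of episodes during meta-training.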

To further improve performance under sparse graph conditions, we designed TGLink, a GNN-based architecture tailored for GRN inference. As depicted in Fig. 2, TGLink consists of three main modules: (a) a positional encoding module, (b) a structure-enhanced GNN module, and (c) a neighborhood perception module. TGLink integrates the global attention mechanism of Transformer to expand the receptive field of the GNN, thereby improving its ability to capture long-range gene interactions. The neighborhood perception module adaptively selects the most relevant neighboring genes, which not only reduces computational cost but also suppresses noise. Additionally, the positional encoding module incorporates topological information into gene features, preserving structural information during message passing (see Methods). Finally, a prediction head is used to infer gene regulatory interactions.

Fig. 2.

Overview of TGLink. TGLink is a structure-enhanced graph neural network (GNN) framework for inferring gene regulatory networks (GRNs) that takes the gene expression matrix and the prior GRN as input. To predict potential links between gene pairs of interest, we start by encoding the position information of gene pairs. a The positional encoding module consists of a degree encoder (DE), a PageRank encoder (PRE), and a fusion MLP. It encodes gene nodes based on their degrees, PageRank scores, and original gene features, ultimately producing encoded gene embeddings. Then, we feed the gene embeddings into (b) the structure-enhanced GNN module, which comprises the Transformer layer, the GNN layer, and the prediction head, to further capture the latent representations of genes. Meanwhile, we employ (c) the neighbor perception module to select the most relevant genes based on the computed neighborhood matrix for information aggregation. The Transformer and GNN layers are employed to learn gene embeddings, while the prediction head, implemented as multilayer perceptrons (MLPs), performs link prediction
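A minimal numpy sketch of the positional encoding idea, under the assumption that degree and PageRank scores are appended to the gene features and fused by an MLP; the power-iteration PageRank, the random (untrained) weight matrices, and all dimensions here are illustrative, not the paper's implementation:

```python
import numpy as np

def pagerank(A, d=0.85, iters=100):
    """Power-iteration PageRank over adjacency matrix A (n x n)."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    P = np.divide(A, out, where=out > 0)  # row-stochastic transition matrix
    P[out.squeeze() == 0] = 1.0 / n       # dangling nodes jump uniformly
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)
    return r

def positional_encode(X, A, rng=np.random.default_rng(0)):
    deg = A.sum(axis=1, keepdims=True)    # degree encoder input
    pr = pagerank(A).reshape(-1, 1)       # PageRank encoder input
    Z = np.hstack([X, deg, pr])           # concatenate with original features
    W1 = rng.normal(size=(Z.shape[1], 16))  # fusion MLP weights (untrained here)
    W2 = rng.normal(size=(16, 8))
    return np.maximum(Z @ W1, 0) @ W2     # ReLU MLP -> encoded gene embeddings
```

The point of the construction is that each gene's embedding carries graph-topological signals (connectivity and global importance) before any message passing happens.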

Meta-TGLink demonstrates superior performance in GRN inference across four human cell line benchmarks

To demonstrate Meta-TGLink's effectiveness in handling sparse scenarios, we employed four human cell line datasets: A375, A549, HEK293T, and PC3. Following the preprocessing pipeline, we curated a prior regulatory network for each cell line (see Methods).

We selected nine baseline methods for benchmarking GRN inference. These include three GNN-based approaches, i.e., GRACE [24], GNNLink [23] and GENELink [20]; scGPT [26], a large-scale pretrained model based on the Transformer architecture; CNNC [12], which employs CNNs; GNE [15], an MLP-based method; DeepSEM [10], which uses a beta-variational autoencoder; MetaSEM [11], an extension of DeepSEM with bi-level optimization and meta-learning; and GENIE3 [8], a regression tree-based method. To ensure fairness, all supervised learning methods were trained on the same training sets and evaluated on a common testing set. For unsupervised learning methods, which output prediction score matrices, we restricted the evaluation to genes present in the test set. Detailed implementation settings for all baseline methods are provided in Additional file 1: Supplementary Text.
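The evaluation step for unsupervised methods can be illustrated as follows; `evaluate_on_test` and the Mann-Whitney formulation of AUROC below are our own sketch (assumed names), not the authors' evaluation code:

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic: fraction of positive-negative
    pairs ranked correctly, with ties receiving half credit."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate_on_test(score_matrix, test_pairs, test_labels, gene_index):
    """Restrict a full prediction score matrix to the labelled test pairs,
    so unsupervised methods are scored on the same gene pairs as supervised ones."""
    scores = [score_matrix[gene_index[tf]][gene_index[tg]] for tf, tg in test_pairs]
    return auroc(scores, test_labels)
```

A usage example: `evaluate_on_test(M, [("tf1", "g1")], [1], {"tf1": 0, "g1": 1})` scores only the listed pair, ignoring the rest of `M`.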

As shown in Fig. 3, Meta-TGLink outperforms all nine baseline methods on most datasets. In particular, it achieves substantially higher AUROC and AUPRC scores compared to unsupervised methods such as MetaSEM, DeepSEM, and GENIE3. Specifically, Meta-TGLink shows an average improvement of 26.0%, 42.3%, 25.9%, and 34.2% in AUROC, and 19.5%, 34.6%, 20.4%, and 36.2% in AUPRC across the four datasets, respectively. These results highlight the importance of incorporating supervised signals or prior regulatory knowledge to improve the accuracy and robustness of GRN inference.

Fig. 3.

Comparison of (A) AUROC and (B) AUPRC scores between Meta-TGLink and nine baselines on four human cell line datasets for benchmark GRN inference

Furthermore, compared with the non-GNN-based methods CNNC and GNE, Meta-TGLink achieves average improvements of 17.2%, 27.9%, 19.4%, and 27.3% in AUROC, and 13.6%, 24.3%, 14.7%, and 32.3% in AUPRC, respectively. Notably, Meta-TGLink also outperforms the pretrained model scGPT, with gains of 13.7%, 25.3%, 11.9%, and 25.6% in AUROC and 9.8%, 21.2%, 10.0%, and 31.1% in AUPRC across the datasets. These results suggest that the generalizable knowledge captured by large-scale pretrained models is insufficient for cell line-specific GRN inference, even with fine-tuning. Collectively, the findings underscore the advantage of GNN-based approaches in GRN inference, which naturally capture both local gene–gene interactions and global regulatory structures.

Finally, compared with three state-of-the-art GNN-based methods (GRACE, GNNLink, and GENELink), Meta-TGLink consistently achieves higher performance, with average gains of 10.8%, 4.9%, 3.9%, and 2.6% in AUROC, and 4.3%, 3.9%, 3.6%, and 2.2% in AUPRC across the four datasets. These improvements further confirm the effectiveness of Meta-TGLink among GNN-based GRN inference methods. The observed gains are largely attributable to the meta-training strategy embedded in the framework, as analyzed in the following section.

Meta-TGLink enables inference of regulatory relationships for uncharacterized transcription factors

Inferring a complete GRN requires identifying the regulatory relationships of numerous TFs. However, many TFs remain unconnected in the prior regulatory network due to the absence of known interactions with TGs. Inferring the regulatory relationships of these isolated TFs poses a major challenge, as it creates a cold-start problem where the model lacks well-learned representations.

To evaluate the ability of Meta-TGLink to handle uncharacterized TFs, we conducted experiments under both few-shot and zero-shot settings. The A549 and HEK293T cell lines were chosen for this analysis due to their relatively large number of connected TFs (see Table 2). Datasets were curated according to the procedures outlined in the Methods section. For comparison, we included scGPT, a transfer learning-based method, and three GNN-based methods (GRACE, GNNLink, and GENELink) as baselines.

Table 2.

Details of datasets. The “TFs” and “TGs” columns indicate the number of transcription factors and target genes present in the ground truth networks, respectively

Dataset     Feature matrices      Ground truth networks
            Genes     Features    TFs     TGs      Edges
A375        12214     491         5       4166     5651
A549        12214     587         28      7846     25972
HEK293T     12214     776         42      8441     21480
PC3         12214     482         4       2060     3041
mESC        1474      421         89      1385     42795
mHSC-E      1210      1071        33      1177     21975

In the few-shot setting, Meta-TGLink was first trained on characterized TFs and then adapted to isolated TFs using a small number of known regulatory interactions. The four baseline methods were fine-tuned on the same few-shot interactions for fairness. In the zero-shot setting, all methods were evaluated directly on the testing samples without any fine-tuning, simulating a complete cold-start scenario for unseen TFs.
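The two protocols can be sketched as follows, with a deliberately trivial stand-in model (every name here is hypothetical): few-shot adaptation takes a handful of gradient-style steps on the support interactions before scoring, while zero-shot scores the query pairs directly:

```python
class ToyModel:
    """Deliberately trivial stand-in for a trained model: one score per TF,
    nudged up or down by labelled support interactions."""
    def __init__(self):
        self.bias = {}

    def fit_step(self, support):
        for tf, _tg, label in support:
            self.bias[tf] = self.bias.get(tf, 0.0) + (0.1 if label else -0.1)

    def score(self, tf, tg):
        return self.bias.get(tf, 0.0)

def evaluate_cold_start(model, support, query, adapt_steps=5, few_shot=True):
    # Few-shot: adapt on the small support set first; zero-shot: skip adaptation.
    if few_shot:
        for _ in range(adapt_steps):
            model.fit_step(support)
    return [model.score(tf, tg) for tf, tg in query]
```

In the zero-shot branch the model never sees the new TF's interactions, which is why meta-learned initializations matter: all of the signal must come from what was learned on characterized TFs.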

As shown in Fig. 4A, Meta-TGLink significantly outperformed the baseline methods on both the A549 and HEK293T datasets under the few-shot setting. Additionally, compared with the second-best method, GENELink, Meta-TGLink achieved average improvements of 1.6% and 2.2% in AUROC, and 4.1% and 2.1% in AUPRC across the two datasets. Under the zero-shot setting, as depicted in Fig. 4B, Meta-TGLink consistently outperformed the comparative methods, achieving mean AUROC scores of 0.686 and 0.626, and mean AUPRC scores of 0.644 and 0.583 on the A549 and HEK293T datasets, respectively. Compared with GENELink, Meta-TGLink showed average improvements of 1.4% and 3.2% in AUROC, and 2.2% and 2.4% in AUPRC across the two datasets. Furthermore, we observed that scGPT performed close to random guessing under both few-shot and zero-shot settings, suggesting that large-scale pretrained models like scGPT require more data for effective adaptation to downstream GRN inference tasks.

Fig. 4.

Comparison of AUROC and AUPRC scores between Meta-TGLink and four baselines on A549 and HEK293T human cell line datasets for TF cold-start GRN inference under (A) few-shot and (B) zero-shot settings

These results demonstrate the strong generalization capability of Meta-TGLink, especially for TFs with no prior regulatory information. The observed performance gains emphasize its practical utility in predicting regulatory interactions for previously uncharacterized TFs.

Meta-TGLink demonstrates cross-cell line generalization with limited prior regulatory interactions

Inferring cell line-specific or cell type-specific gene regulatory networks is crucial for elucidating the regulatory mechanisms underlying cellular functions. However, prior regulatory networks for specific cell lines are often sparse, with limited known interactions. Cross-cell line few-shot GRN inference has emerged as a promising approach to address this challenge, enabling improved inference accuracy through knowledge transfer across related cell lines.

In this study, we conducted cross-cell line few-shot GRN inference experiments on four human cell line datasets to assess the knowledge transfer capability of Meta-TGLink. In each experiment, one cell line was designated as the source cell line, while the remaining three served as target cell lines. Specifically, Meta-TGLink was first trained on the source cell line and then adapted to each target cell line using a few available regulatory interactions. For comparison, the GNN-based baseline methods followed a similar two-step procedure: pre-training on the source cell line, followed by fine-tuning with few-shot samples from each target cell line.

As shown in Table 1 and Additional file 1: Table S2, Meta-TGLink consistently outperformed all baseline methods across nearly all cross-cell line few-shot settings, achieving the highest AUROC and AUPRC scores. These results highlight its strong capacity for knowledge transfer between different cell lines. In contrast, GENELink and GRACE exhibited near-random performance, with AUROC and AUPRC scores close to 0.5, indicating limited capacity for knowledge transfer. Although scGPT generally underperformed, it surpassed GENELink and GRACE in several experiments, suggesting that pretrained models can still provide some transferable knowledge for GRN inference across cell lines. GNNLink achieved better results than other baselines in a few scenarios but remained less generalizable than Meta-TGLink.

Table 1.

AUPRC scores of Meta-TGLink and four baselines on cross-cell line few-shot GRN inference experiments

              A375→A549      A375→HEK293T   A375→PC3       A549→A375      A549→HEK293T   A549→PC3
GENELink      0.499(±0.002)  0.500(±0.001)  0.507(±0.008)  0.498(±0.005)  0.498(±0.003)  0.504(±0.004)
GNNLink       0.535(±0.014)  0.490(±0.014)  0.488(±0.011)  0.501(±0.018)  0.521(±0.016)  0.489(±0.012)
GRACE         0.500(±0.001)  0.503(±0.004)  0.504(±0.006)  0.504(±0.004)  0.503(±0.004)  0.496(±0.012)
scGPT         0.504(±0.003)  0.492(±0.005)  0.529(±0.018)  0.519(±0.014)  0.498(±0.007)  0.518(±0.018)
Meta-TGLink   0.566(±0.007)  0.544(±0.005)  0.542(±0.010)  0.531(±0.019)  0.609(±0.109)  0.564(±0.044)

              HEK293T→A375   HEK293T→A549   HEK293T→PC3    PC3→A375       PC3→A549       PC3→HEK293T
GENELink      0.501(±0.007)  0.498(±0.001)  0.508(±0.005)  0.504(±0.005)  0.499(±0.006)  0.499(±0.003)
GNNLink       0.503(±0.023)  0.513(±0.003)  0.481(±0.020)  0.495(±0.023)  0.535(±0.007)  0.513(±0.026)
GRACE         0.505(±0.003)  0.501(±0.003)  0.502(±0.006)  0.503(±0.015)  0.502(±0.002)  0.500(±0.009)
scGPT         0.516(±0.013)  0.506(±0.003)  0.510(±0.015)  0.528(±0.010)  0.505(±0.003)  0.487(±0.008)
Meta-TGLink   0.536(±0.008)  0.529(±0.006)  0.518(±0.010)  0.545(±0.028)  0.530(±0.019)  0.517(±0.019)

Overall, these results demonstrate that Meta-TGLink achieves robust cross-cell line transferability even with limited prior regulatory information, highlighting its potential for applications in data-scarce biological contexts.

Meta-TGLink demonstrates cross-species generalization with limited prior regulatory interactions

GRNs exhibit dynamic characteristics across different species, leading to shifts in data distribution [27] and posing challenges for conventional GRN inference methods. Meta-learning provides a promising solution by enabling the extraction of shared regulatory patterns across species. To assess Meta-TGLink’s ability for cross-species GRN inference, we conducted experiments using two mouse cell datasets as sources and four human cell line datasets as targets, following the same setup as the cross-cell line experiments.

As shown in Fig. 5, Meta-TGLink outperforms other GNN-based baseline methods in terms of AUROC and AUPRC scores across multiple cross-species few-shot GRN inference tasks. Baseline methods exhibit trends similar to those observed in cross-cell line experiments, with limited generalization across species. Although Meta-TGLink shows superior performance overall, GNNLink achieves better results in a small number of tasks, which may be attributed to the data distribution characteristics of those datasets.

Fig. 5.

Comparison of (A)-(B) AUROC and (C)-(D) AUPRC scores between Meta-TGLink and three GNN-based baselines on cross-species few-shot GRN inference experiments. The cell type at the top of each radar chart represents the source dataset

Further analysis indicates that Meta-TGLink achieves higher AUPRC when using mESC as the source dataset compared to using mHSC-E. This difference may result from the larger scale of known GRNs in mESC, which provides richer prior information. Moreover, gene overlap analysis between the mouse cell lines and human cell lines revealed that mESC shares more genes with human cell lines (1120 overlapping genes) than mHSC-E (915 overlapping genes), suggesting that a higher number of overlapping genes enhances the effectiveness of cross-species GRN inference.

Overall, these results demonstrate that, despite the challenges of cross-species scenarios, Meta-TGLink effectively generalizes across species by leveraging shared regulatory patterns of overlapping genes.

Meta-TGLink achieves superior GRN inference performance via an enhanced meta-training strategy

In this study, we employed an enhanced meta-training strategy based on MAML to optimize performance in GRN inference tasks (see Methods). To evaluate the effectiveness of this strategy, we conducted comparative experiments under three distinct training paradigms:

  1. Meta-TGLink: Employing the meta-training strategy we proposed.

  2. Meta-TGLink-MAML: Using the standard MAML approach.

  3. TGLink: Training with the conventional strategy.

We compared the performance on benchmark GRN inference tasks across four human cell line datasets. For fair comparison, Meta-TGLink and Meta-TGLink-MAML were configured with identical meta-learning hyperparameters, such as learning rates and update steps for both the inner and outer loops. All other hyperparameter settings remained consistent across the three training strategies.

As shown in Fig. 6, Meta-TGLink consistently achieved the highest AUROC scores across all datasets and the best AUPRC in half of the cases. On average, it yielded improvements of 1.7% in AUROC and 1.2% in AUPRC compared to Meta-TGLink-MAML, while the gains over TGLink were more substantial, at 6.6% in AUROC and 3.1% in AUPRC. In addition, we observed that TGLink exhibited notably higher standard deviations across all datasets than Meta-TGLink and Meta-TGLink-MAML, indicating that models trained with the conventional strategy are more prone to over-fitting in sparse scenarios. By contrast, the meta-learning framework effectively alleviates this issue, leading to more stable and reliable predictions.

Fig. 6.

Comparison of (A) AUROC and (B) AUPRC scores between different training strategies on four human cell line datasets for benchmark GRN inference

Overall, these results highlight the effectiveness of meta-learning in improving GRN inference under data-scarce conditions. Notably, they emphasize the importance of accounting for meta-task interrelations during meta-training and effectively leveraging support set samples for gradient updates.

Meta-TGLink reveals key transcription factor regulations in a human lung cancer cell line

To validate the capability of Meta-TGLink in inferring regulatory interactions for previously uncharacterized TFs, we applied it to predict regulatory scores between three key TFs (GATA2, STAT3, and SP1) and all genes in the A549 cell line dataset. These TFs were absent from the ground truth network of A549, making the task consistent with the zero-shot TF cold-start setting described in the “Meta-TGLink enables inference of regulatory relationships for uncharacterized transcription factors” section.

We visualized the regulatory network comprising the top-10 predicted target genes for the three TFs (see Fig. 7A). Notably, 9 out of the 20 (45%) predicted target genes are co-regulated by two or more TFs, reflecting the functional associations of GATA2, STAT3, and SP1 in promoting tumor cell proliferation, metastasis, and invasion [38-41]. Furthermore, all of the top-10 target genes of SP1 were supported by ChIP-seq evidence from the ChIP-Atlas database [36]. Notably, in non-small cell lung cancer (A549), S100A4 and ALDH3A1 have been reported as SP1 targets, contributing to the regulation of NF-κB signaling activity [42, 43].

Fig. 7.

Inference of regulatory interactions for new TFs (SP1, GATA2, and STAT3) in the A549 cell line using Meta-TGLink. A The predicted gene regulatory network comprising SP1, GATA2, STAT3, and their top-10 predicted target genes. B The odds ratios of representative Gene Ontology (GO) terms associated with GATA2 and its top-100 predicted target genes. C Enriched GO biological processes identified from the gene set consisting of GATA2 and its top-100 predicted target genes

Subsequently, we performed gene enrichment analysis on GATA2 and its top-100 predicted target genes, identifying Gene Ontology (GO) terms with p-values less than 0.05. The top-5 enriched terms are presented in Fig. 7C. The enriched GO term “positive regulation of intrinsic apoptotic signaling pathway by p53 class mediator” (GO:1902255) highlights the critical role of GATA2 in regulating tumor cell apoptosis [44]. The high odds ratio for this GO term (133.98; see Fig. 7B) indicates a strong enrichment of this pathway among the top-100 target genes regulated by GATA2, suggesting that this pathway may be preferentially targeted by GATA2 in the A549 cell line. In addition, we observed that the GO term “natural killer cell-mediated cytotoxicity” (GO:0042267) was also significantly enriched, indicating an association between GATA2 and immune activities within tumor tissues, particularly in the regulation of natural killer cell development and function [45].
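For reference, an enrichment odds ratio of this kind is computed from a 2×2 contingency table over a gene set and a background. The sketch below shows the standard calculation with hypothetical counts; it illustrates the statistic, not the authors' exact enrichment pipeline:

```python
def odds_ratio(hits_in_set, set_size, hits_in_background, background_size):
    """Odds ratio for a GO term: (genes with the term inside vs. outside
    the predicted set), from a standard 2x2 contingency table."""
    a = hits_in_set                       # term genes inside the predicted set
    b = set_size - hits_in_set            # non-term genes inside the set
    c = hits_in_background - hits_in_set  # term genes outside the set
    d = background_size - set_size - c    # non-term genes outside the set
    return (a * d) / (b * c) if b * c else float("inf")
```

A large value means the term's genes are concentrated in the predicted target set far beyond what the background rate would give, which is how a ratio such as 133.98 should be read.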

These findings demonstrate that Meta-TGLink can effectively infer GRNs for previously uncharacterized TFs by leveraging the regulatory patterns of observed TFs. Moreover, the inferred networks show strong biological relevance, underscoring the practical utility of Meta-TGLink.

Discussion

In this work, we present Meta-TGLink, a general meta-learning framework for inference of cell line- or cell type-specific GRNs from gene expression data. A major advantage of Meta-TGLink is its ability to achieve high performance with minimal prior knowledge, consistently outperforming existing supervised methods. This generalizability makes it particularly suitable for real-world GRN studies, where only a limited number of cell type-specific regulatory interactions are available.

The strength of Meta-TGLink lies in its tailored meta-task design for cell type-specific GRN inference and its efficient meta-training strategy. These components enable the model to capture transferable regulatory patterns, which can be further enhanced through hyperparameter optimization or by incorporating advanced encoder architectures. Unlike many existing methods that are restricted to specific sequencing platforms, Meta-TGLink supports a wide range of input data, from bulk RNA-seq to scRNA-seq, thereby demonstrating broad applicability.

Benchmark experiments show that Meta-TGLink consistently outperforms state-of-the-art unsupervised and supervised methods across most datasets. Its superiority is particularly evident in few-shot scenarios, where it leverages meta-learning to acquire transferable regulatory knowledge. Importantly, our results indicate that such knowledge can also be transferred across species, offering a promising approach for zero-shot GRN inference. The effectiveness of cross-species transfer depends on both the similarity of regulatory patterns and the resolution of the input data. For instance, human and mouse share substantial evolutionary conservation [46], which allows Meta-TGLink to extract shared regulatory patterns iteratively. Additionally, using high-resolution scRNA-seq data from mouse as a source further facilitates generalization to lower-resolution human bulk RNA-seq data. These observations suggest that matching data types and leveraging conserved regulatory features are important for enhancing cross-species generalization.

Application of Meta-TGLink to a human lung cancer cell line further illustrates its biological relevance. By inferring GRNs for three key TFs, we observed that many target genes are co-regulated, highlighting cooperative regulatory mechanisms in cancer cells. Gene Ontology analysis further elucidated functional roles of these TFs in cancer proliferation and differentiation, demonstrating the practical utility of Meta-TGLink for biological interpretation and potential targeted therapies.

Despite its advantages, several challenges remain. More complex scenarios, such as TF–TG dual cold-start and cross-domain zero-shot inference, require further investigation. Integrating metadata, such as textual descriptions of genes or cell lines [47], may improve representation learning and model generalization. Refining the model architecture, for instance using advanced Graph Transformer variants like Graphormer [48], and designing more tailored positional encoding modules could further enhance performance. Finally, improving cross-domain inference may benefit from non-linear dimensionality reduction approaches or pre-trained gene features, while incorporating self-supervised learning strategies may help mitigate dataset heterogeneity and improve GRN inference robustness.

Conclusion

We propose Meta-TGLink, a structure-enhanced graph meta-learning framework for GRN inference. Following the principles of meta-learning, we design a novel meta-task tailored specifically for GRN inference. In addition, we develop a structure-enhanced GNN module to mitigate the issue of limited labeled data (i.e., limited known regulatory relationships). This module enables more effective information propagation in sparse graphs, thereby improving inference performance. Moreover, we introduce a positional encoding module to integrate additional gene information and enhance the learning of gene representations.

Benchmark experiments demonstrate that Meta-TGLink consistently outperforms existing methods, confirming its effectiveness in aggregating gene information. Results from TF cold-start and cross-domain few-shot GRN inference tasks further show that integrating meta-learning significantly improves model generalization. Ablation studies (see Additional file 1: Supplementary Text) reveal that both the Transformer layer and the positional encoding module contribute to improved gene representation learning, thereby enhancing performance in few-shot scenarios. Moreover, analysis of the training strategy empirically validates the effectiveness of our meta-training approach. Finally, applying Meta-TGLink to novel TFs in a human cell line dataset highlights its practical utility in real-world biological applications.

The framework demonstrates strong generalization to unseen cell types, cell lines, and even across species, while applications to human datasets highlight its biological relevance. Despite remaining challenges in dual cold-start and cross-domain inference, Meta-TGLink provides a scalable, practical, and interpretable approach for GRN reconstruction, offering significant potential for both methodological development and biological discovery.

Methods

Problem formulation

Assuming the gene regulatory network (GRN) is modeled as a graph with genes as nodes, it can be represented as $G=(V,E)$, where $V$ denotes the set of nodes (genes) and $E$ denotes the set of edges (gene regulatory interactions). Gene expression data can be treated as initial gene features and represented by a feature matrix $X \in \mathbb{R}^{N \times M}$, where $N$ is the number of genes and $M$ is the number of subjects. Additionally, let $A \in \{0,1\}^{N \times N}$ be the adjacency matrix, which describes the known gene regulatory interactions in $G$. For convenience, we may also represent the graph as $G=(X,A)$. In this context, we consider $G$ as an undirected graph, making $A$ a symmetric matrix, where $A_{ij}=1$ if $(v_i,v_j)\in E$, and $A_{ij}=0$ otherwise. Specifically, $A_{ij}=1$ denotes a known gene regulatory interaction, while $A_{ij}=0$ indicates the absence of a regulatory interaction or an unknown regulatory relationship. The primary objective in GRN inference is to identify these potential regulatory relationships. Thus, GRN inference can be naturally formulated as a link prediction problem.
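As a minimal sketch of this formulation, the symmetric adjacency matrix can be assembled from a list of known regulatory pairs (the helper name and gene indices below are illustrative, not from the paper):

```python
import numpy as np

def build_adjacency(n_genes, edges):
    """Assemble the symmetric adjacency matrix A from known regulatory pairs.

    Because the GRN is treated as undirected, each known interaction (i, j)
    sets both A[i, j] and A[j, i] to 1; every other entry stays 0, meaning
    "no interaction or unknown".
    """
    A = np.zeros((n_genes, n_genes), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        A[j, i] = 1
    return A

# Four genes, two known interactions; link prediction then asks which of the
# remaining zero entries correspond to true regulatory relationships.
A = build_adjacency(4, [(0, 1), (1, 2)])
```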

Datasets and data processing

We collected transcriptomic data from four human cell lines in the CMap [49] project, including human amelanotic melanoma cell line (A375) [50], human lung carcinoma cell line (A549) [51], human embryonic kidney 293 cell line (HEK293T) [52] and human prostate cancer cell line (PC3) [53]. To construct the gene feature matrices, we curated experimental results from TF over-expression assays conducted on these cell lines. Each experimental result was treated as a distinct feature and multiple sets of experiments formed a feature matrix for each cell line. Before feeding the data into the model, we applied Z-score normalization to the expression values within each experiment to reduce variability and facilitate stable convergence during training. Ground truth networks were derived following the data processing protocol described in CORN [54], with detailed information provided in Additional file 1: Supplementary Text.

Additionally, we gathered single-cell RNA sequencing data from two mouse cell types derived from BEELINE [55], including mouse embryonic stem cells (mESC) [56] and mouse hematopoietic stem cells of the erythroid-lineage (mHSC-E) [57]. Following the data preprocessing pipeline of BEELINE, we first applied a log-transformation to transcripts per kilobase million (TPM) counts and used the resulting values as gene expressions. Genes expressed in fewer than 10% of cells were removed, and Bonferroni correction was applied to filter out genes with corrected P-values greater than 0.01. We then selected the top 1000 highly variable genes using a variance ranking strategy. Finally, the expression values were standardized by Z-score normalization, consistent with the preprocessing applied to bulk RNA-seq data. For ground truth construction, we directly adopted the cell type-specific regulatory networks provided by BEELINE. A summary of the resulting feature matrices and gold-standard networks is presented in Table 2.
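A simplified sketch of this scRNA-seq preprocessing is shown below; it omits the Bonferroni-corrected variance filter, the per-gene direction of the final z-score is our assumption, and the function name and toy data are illustrative:

```python
import numpy as np

def preprocess_sc(tpm, min_cell_frac=0.10, n_hvg=1000):
    """Simplified sketch of the pipeline described above: log-transform TPM,
    drop lowly expressed genes, keep the most variable genes, then z-score
    normalize each gene (per-gene standardization is a modeling choice here).
    """
    x = np.log1p(tpm)                                # log-transform TPM counts
    x = x[(x > 0).mean(axis=1) >= min_cell_frac]     # expressed in >= 10% of cells
    x = x[np.argsort(x.var(axis=1))[::-1][:n_hvg]]   # top highly variable genes
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / sd                             # z-score normalization

rng = np.random.default_rng(0)
tpm = rng.gamma(shape=2.0, scale=50.0, size=(50, 20))   # 50 genes x 20 cells
X = preprocess_sc(tpm, n_hvg=5)
```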

To ensure robust performance evaluation and facilitate model training, we employed tailored strategies for dataset partitioning and negative sampling. For the human cell line datasets, we employed a hard negative sampling strategy to introduce more discriminative information [58]. Specifically, for each observed regulatory pair $(t_i, g_j)$, we constructed hard negative samples $(t_i, g_k)$, where $t_i$ represents a specific TF and $g_k$ denotes one of its non-target genes. To maintain class balance, a 1:1 ratio between positive and negative samples was enforced. For the mouse cell type datasets, we applied the same hard negative sampling strategy. However, due to the higher connectivity of these networks, enforcing a strict 1:1 ratio between positive and negative samples was not feasible.
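The hard negative sampling step can be sketched as follows; the function and variable names are illustrative, and the paper does not prescribe this exact implementation:

```python
import numpy as np

def sample_hard_negatives(pos_pairs, genes, seed=0):
    """For every observed (TF, target) pair, draw one hard negative
    (TF, non-target): the same TF paired with a gene it is not known to
    regulate, preserving a 1:1 positive:negative ratio."""
    rng = np.random.default_rng(seed)
    targets = {}
    for tf, tg in pos_pairs:
        targets.setdefault(tf, set()).add(tg)
    negatives = []
    for tf, _ in pos_pairs:
        # candidate non-targets: genes this TF is not known to regulate
        candidates = [g for g in genes if g != tf and g not in targets[tf]]
        negatives.append((tf, rng.choice(candidates)))
    return negatives

pos = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G1")]
neg = sample_hard_negatives(pos, ["G1", "G2", "G3", "G4"])
```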

Following negative sample selection, all curated datasets were partitioned into training, validation, and testing sets with a ratio of 60%, 20%, and 20%, respectively. For the TF cold-start analysis, we adopted a 3:1:1 partitioning ratio at the TF level for training, validation, and testing, ensuring that the TF sets were mutually exclusive, i.e., $T_{train} \cap T_{val} = T_{train} \cap T_{test} = T_{val} \cap T_{test} = \emptyset$.

Evaluation metrics

For performance evaluation, we used the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall curve (AUPRC) as evaluation metrics.
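Both metrics can be computed with scikit-learn; the toy labels and scores below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0])               # 1 = known regulatory pair
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1])  # predicted linkage scores

auroc = roc_auc_score(y_true, y_score)            # ranking quality over all pairs (~0.889 here)
auprc = average_precision_score(y_true, y_score)  # emphasizes the positive class (~0.917 here)
```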

Meta-task construction

Each meta-task comprises a support set and a query set, with all TF-gene pair samples drawn randomly from the dataset. Specifically, we first sample $K$ labeled TF-gene pairs, each consisting of a gene pair and a binary label (0 or 1), to form the support set. We then sample $Q$ labeled TF-gene pairs from the remaining data to form the query set. This process is repeated to generate a total of $L$ meta-tasks. During sampling, we maintain balanced proportions of positive and negative TF-gene pairs in both the support and query sets to reduce the risk of model overfitting. This sampling strategy ensures no overlap between the support and query sets and prevents information leakage across different meta-tasks.

From the graph perspective, constructing a meta-task is equivalent to extracting a subgraph from the full graph. Subgraphs are generated by randomly sampling edges from the known GRN. To retain global information, all nodes of the original graph are preserved during subgraph construction, as illustrated in Fig. 1. Specifically, for the original graph $G=(X,A)$, we sample a set of edges $E_s \subseteq E$ to construct the subgraph $G_s$. The subgraph can also be represented as $G_s=(X,A_s)$, where the adjacency matrix $A_s$ has the same dimensions as the original adjacency matrix $A$.
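The balanced, non-overlapping support/query split can be sketched as below (the function name and set sizes are illustrative):

```python
import numpy as np

def sample_meta_task(labels, K, Q, seed=0):
    """Draw a support set of K pairs and a query set of Q pairs from the
    labeled TF-gene pairs, balanced between positives and negatives and
    with no overlap between the two sets."""
    rng = np.random.default_rng(seed)
    pos = rng.permutation(np.flatnonzero(labels == 1))
    neg = rng.permutation(np.flatnonzero(labels == 0))
    # first K//2 of each class form the support set ...
    support = np.concatenate([pos[: K // 2], neg[: K // 2]])
    # ... and the next Q//2 of each class form the disjoint query set
    query = np.concatenate([pos[K // 2: K // 2 + Q // 2],
                            neg[K // 2: K // 2 + Q // 2]])
    return support, query

labels = np.array([1, 0] * 20)          # 40 labeled pairs, half positive
support, query = sample_meta_task(labels, K=8, Q=12)
```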

Meta-training

To facilitate knowledge transfer between networks of different cell types and enable rapid adaptation to novel few-shot scenarios, we employ a novel meta-training paradigm for model training. We aim to leverage episodic training on a source dataset $\mathcal{D}_s$ with abundant labeled data, enabling the model to adapt quickly to a target dataset $\mathcal{D}_t$ with few labeled data. In this framework, each episode is treated as a task, consisting of a support set and a query set. The support set serves as the training data during meta-training, while the query set acts as the testing data. From dataset $\mathcal{D}_s$, we sample a total of $L$ meta-tasks for meta-training, where each support set comprises $K$ gene pairs and each query set comprises $Q$ gene pairs.

As shown in Algorithm 1, meta-training involves a bi-level optimization process comprising an inner loop update and an outer loop update. In the inner update, our goal is to derive optimal initialization parameters by performing one or more gradient descent updates on a small number of samples from the support set. Therefore, we feed the support set samples (i.e., gene pairs) $S_l$ of task $l$ into TGLink $f_\theta$ and compute the cross-entropy loss:


Algorithm 1 Procedure of Meta-TGLink for few-shot gene regulatory network inference

$$\mathcal{L}_s^l(f_\theta) = -\frac{1}{K}\sum_{k=1}^{K}\big[y_k\log \hat{y}_k + (1-y_k)\log(1-\hat{y}_k)\big] \quad (1)$$

where $\hat{y}_k$ and $y_k$ represent the predicted and ground-truth labels of the support set, respectively. Then, we update the model parameters $\theta'_l$ based on $\mathcal{L}_s^l$:

$$\theta'_l = \theta - \alpha \nabla_{\theta}\, \mathcal{L}_s^l(f_\theta) \quad (2)$$

Finally, we calculate the loss $\mathcal{L}_q^l$ and update the model parameters using the query set in a similar manner:

$$\theta \leftarrow \theta'_l - \beta \nabla_{\theta'_l}\, \mathcal{L}_q^l(f_{\theta'_l}) \quad (3)$$

$\alpha$ and $\beta$ denote the inner-loop learning rate and the outer-loop learning rate, respectively.

A crucial difference between our meta-training strategy and MAML is that the gradients calculated by the query set do not accumulate across meta-tasks. Instead, we update the model parameters immediately after training on a single task. The motivation is that the tasks we define are highly correlated, all involving link prediction on subgraphs of the same cell line. Therefore, updating parameters on a single task is beneficial for extracting shared knowledge between tasks while also reducing GPU memory consumption. Another key distinction lies in how the support set is handled. In MAML, the support set does not directly influence the model parameters, whereas our approach takes the impact of the support set on the model parameters into account. This is primarily attributed to our utilization of a single-task update strategy, allowing for direct parameter updates using the support set. This approach not only takes full advantage of limited labeled data but also enhances the predictive performance within the task.
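The single-task bi-level update can be illustrated with a first-order toy sketch, substituting a logistic model for TGLink (an assumption for brevity; the paper's model is a structure-enhanced GNN, and this sketch does not differentiate through the inner step):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, X, y):
    """Mean cross-entropy loss of a logistic model."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def ce_grad(w, X, y):
    """Gradient of the mean cross-entropy loss w.r.t. the weights."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def single_task_update(w, Xs, ys, Xq, yq, alpha=0.5, beta=0.25):
    """One meta-task step: an inner gradient step on the support set (Eq. 2),
    then an immediate outer update from the query-set gradient evaluated at
    the adapted parameters (Eq. 3), without accumulating across tasks."""
    w_adapted = w - alpha * ce_grad(w, Xs, ys)             # inner loop
    return w_adapted - beta * ce_grad(w_adapted, Xq, yq)   # outer loop

# Toy linearly separable "gene-pair features"; support and query sets coincide
# here only to keep the demonstration short.
X = np.array([[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.7, -0.3]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = np.zeros(2)
loss_before = ce_loss(w, X, y)
for _ in range(50):                    # repeated meta-tasks on the same data
    w = single_task_update(w, X, y, X, y)
loss_after = ce_loss(w, X, y)
```

Because the outer update is applied immediately after each task rather than accumulated, the loop above mirrors the single-task update strategy described in the text.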

Meta-testing

Our goal is to infer the complete GRN using a limited number of known regulatory relationships. Therefore, during the meta-testing stage, only a single meta-task needs to be sampled. This task still includes a support set and a query set. However, unlike the meta-training stage, the query set now includes all samples in $\mathcal{D}_t$ excluding those in the support set, rather than just $Q$ samples. Similar to the meta-training stage, we feed the support gene pairs into the model, and the pretrained parameters $\theta$ are updated through gradient descent for a specified number of iterations to obtain the adapted parameters $\theta'$. Finally, we utilize the updated $\theta'$ to make predictions on the query set.

Positional encoding module

In gene representation learning, the topological position of genes within a GRN is important, particularly for cell type-specific networks. A gene's topological properties reflect its regulatory role: (1) degree centrality indicates gene identity, with TFs typically having higher degrees as interaction sources and TGs showing lower degrees; (2) PageRank scores [59] quantify gene importance, where hub TFs controlling many pathways receive higher scores.

To explicitly incorporate biologically meaningful positional information, we design a positional encoding module that integrates degree centrality and PageRank scores into gene representations. The degree-driven encoding captures role differentiation (TF vs. TG), while PageRank-based encoding reflects the overall importance of genes.

Degree and PageRank scores are encoded separately using two distinct MLPs and subsequently combined with gene features through an additional MLP. To facilitate cross-cell-line and cross-species GRN inference, we apply singular value decomposition (SVD) to reduce the dimensionality of the gene expression matrix $X$, ensuring consistent input dimensions across datasets. The complete encoding procedure is summarized below:
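The two positional signals and the SVD reduction can be computed as below; the MLP encoders are omitted, the power-iteration PageRank is a standard stand-in for [59], and all names are illustrative:

```python
import numpy as np

def pagerank(A, damping=0.85, iters=100):
    """PageRank scores by power iteration on an adjacency matrix; dangling
    nodes are given uniform transition probabilities."""
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True).astype(float)
    # row-normalize, then transpose to obtain a column-stochastic matrix
    T = np.divide(A, out_deg, out=np.full((n, n), 1.0 / n),
                  where=out_deg > 0).T
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (T @ r)
    return r

def svd_reduce(X, k):
    """Project X onto its top-k left singular components so that datasets
    with different sample counts share one input dimension."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)                                 # degree centrality per gene
pr = pagerank(A)                                    # hub gene 2 scores highest
X_red = svd_reduce(np.random.default_rng(0).normal(size=(4, 10)), k=3)
```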

$$\tilde{X} = \mathrm{SVD}(X) \quad (4)$$
$$h_i^{d} = \mathrm{MLP}_{d}(d_i) \quad (5)$$
$$h_i^{p} = \mathrm{MLP}_{p}(p_i) \quad (6)$$
$$e_i = \mathrm{MLP}\big(\big[\, x_i \parallel h_i^{d} \parallel h_i^{p} \,\big]\big) \quad (7)$$

where $d_i$ and $p_i$ refer to the degree and the PageRank score of gene $i$, respectively; $[\,\cdot \parallel \cdot\,]$ refers to the concatenation operation; $x_i \in \mathbb{R}^{D_{in}}$ denotes the input feature vector of gene $i$, obtained from $\tilde{X}$; and $h_i^{d}$ and $h_i^{p}$ represent the degree encoding vector and the PageRank encoding vector, respectively. The input dimension $D_{in}$ and the encoding dimension $D_{enc}$ are both set to 200. $e_i$ refers to the output embedding of the positional encoding module.

Neighborhood perception module

To effectively capture gene interactions in few-shot link prediction, inspired by TransGNN [60], we employ a Transformer to assist GNNs in information aggregation. However, not all information contributes positively to representation learning. Without proper filtering, irrelevant signals may introduce noise and degrade model performance. To address this issue, we design a neighborhood perception module, which selects the top $n$ most relevant genes for inclusion in the Transformer's self-attention computation.

We quantify gene relevance using the Pearson correlation coefficient (PCC), which identifies co-expression patterns that may reflect potential regulatory relationships. However, PCC alone often introduces false positives, leading to misleading information aggregation. To mitigate this, we refine the selection process by incorporating structural constraints from the adjacency matrix. The resulting relevance matrix is computed as follows:

$$R = P + \lambda\,(A + I) \quad (8)$$

where $P$ denotes the PCC matrix, and $A$ and $I$ represent the adjacency matrix and the identity matrix, respectively. $\lambda$ is a scaling factor that we empirically set to 0.5. We then select the top $n$ genes with the highest relevance scores in $R$:

$$\mathcal{N}_i = \mathrm{top}_n(R_{i,:}) \quad (9)$$

where $\mathcal{N}_i$ refers to the set of relevant genes of gene $i$ and $\mathrm{top}_n(\cdot)$ denotes the function selecting the top $n$ genes from the $i$-th row of $R$.
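A numpy sketch of this relevance-based selection is given below; the exact rule for combining the PCC matrix with the structural term and the exclusion of a gene's self-score are our assumptions:

```python
import numpy as np

def relevant_genes(X, A, lam=0.5, n=2):
    """Rank candidate relevant genes by mixing co-expression (Pearson
    correlation) with known structure, then keep the top-n per gene."""
    P = np.corrcoef(X)                      # gene-by-gene PCC matrix
    R = P + lam * (A + np.eye(len(A)))      # assumed additive combination
    np.fill_diagonal(R, -np.inf)            # exclude the gene itself
    return np.argsort(-R, axis=1)[:, :n]    # indices of top-n relevant genes

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200))               # 5 genes x 200 samples
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = 1                       # one known interaction
nbrs = relevant_genes(X, A, n=2)
```

With the structural boost of 0.5, the known partner of gene 0 reliably outranks spurious correlations from the random expression data.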

Structure-enhanced GNN module

Our primary objective is to improve GRN inference performance in few-shot scenarios. However, prior networks in these contexts are typically sparse, which severely limits message passing in GNNs. To overcome this limitation, we integrate a Transformer into the GNN to expand its receptive field. By leveraging the global attention mechanism of the Transformer, GNNs can capture information from long-range nodes, even when direct connections are absent. Specifically, in a single-layer standard GNN, each node can only exchange information with its one-hop neighbors through existing edges. In contrast, our structure-enhanced GNN module first aggregates node features across relevant genes using a Transformer layer before applying standard GNN message passing. This mechanism enables the model to capture both long-range dependencies and interactions among non-adjacent nodes. As illustrated in Additional file 1: Fig. S1, the proposed mechanism effectively alleviates the under-aggregation of regulatory signals caused by network sparsity.

Initially, we stack the embedding of gene $i$ with the embeddings of its relevant genes $\mathcal{N}_i$ to form an aggregation matrix, denoted as $H_{agg} \in \mathbb{R}^{(n+1)\times D}$. Subsequently, $H_{agg}$ is fed into the Transformer layer to compute the attention scores between gene $i$ and its relevant genes:

$$Q = H_{agg}W_Q, \quad K = H_{agg}W_K, \quad V = H_{agg}W_V \quad (10)$$
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{D}}\right)V \quad (11)$$

where $W_Q$, $W_K$, and $W_V \in \mathbb{R}^{D \times D}$ are learnable projection matrices. We employ multi-head attention to capture interaction information from different latent spaces:

$$\mathrm{MultiHead}(H_{agg}) = \big[\,\mathrm{head}_1 \parallel \cdots \parallel \mathrm{head}_b\,\big]\,W_O \quad (12)$$

where $b$ is the number of heads and $W_O$ is a learnable output projection matrix; $[\,\cdot \parallel \cdot\,]$ refers to the concatenation operation. Finally, following a layer of feed-forward network, we obtain the gene embedding $h_i'$ that aggregates information from the relevant genes.
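A single-head version of this attention step might look as follows; multi-head attention simply repeats it per head and concatenates the results (shapes and names are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(H_agg, W_q, W_k, W_v):
    """Scaled dot-product self-attention over the stacked embeddings of a
    gene and its relevant genes (row 0 = the gene itself), as in
    Eqs. (10)-(11)."""
    Q, K, V = H_agg @ W_q, H_agg @ W_k, H_agg @ W_v
    scores = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # attention weights
    return scores @ V                                # weighted aggregation

rng = np.random.default_rng(0)
D = 8
H_agg = rng.normal(size=(4, D))        # gene i plus n = 3 relevant genes
W = [rng.normal(size=(D, D)) for _ in range(3)]
out = self_attention(H_agg, *W)
```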

Following the Transformer layer, GNNs are further employed to integrate structural and node-level information of genes. To generate the input feature matrix of the GNNs, we replace each embedding $e_i$ (the output of the positional encoding module) in the feature matrix $E$ with the corresponding output embedding $h_i'$ from the Transformer layer. Then, two layers of GCNs [22] are utilized for feature extraction. The entire process can be concisely represented as follows:

$$H^{(0)} = g\big(E, \{h_i'\}\big) \quad (13)$$
$$H^{(l+1)} = \mathrm{ELU}\!\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right) \quad (14)$$

where $H^{(l)}$ refers to the output feature matrix of the $l$-th GCN layer; $\tilde{A}$ is the adjacency matrix with added self-connections; $\tilde{D}$ represents the degree matrix, where $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; and $W^{(l)}$ is a layer-specific learnable weight matrix. $g(\cdot)$ denotes the generating function and $\mathrm{ELU}(\cdot)$ denotes the ELU [61] activation function.
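One GCN layer with symmetric normalization and an ELU activation, per Eq. (14), can be sketched as below (all names are illustrative):

```python
import numpy as np

def elu(x):
    # ELU: identity for positive inputs, exp(x) - 1 otherwise;
    # np.minimum guards the exp against overflow on large positives.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1)

def gcn_layer(H, A_s, W):
    """One GCN layer: symmetrically normalized propagation over the subgraph
    adjacency with added self-connections, followed by ELU."""
    A_tilde = A_s + np.eye(len(A_s))               # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_norm = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return elu(A_norm @ H @ W)

rng = np.random.default_rng(0)
A_s = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))                        # 3 genes, 4-dim features
W = rng.normal(size=(4, 4))                        # learnable weight matrix
H_next = gcn_layer(H, A_s, W)
```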

To predict gene regulatory interactions, we utilize two separate MLPs to project the embeddings of gene $i$ and gene $j$ into a low-dimensional feature space:

$$z_i = \mathrm{MLP}_1(h_i) \quad (15)$$
$$z_j = \mathrm{MLP}_2(h_j) \quad (16)$$

where $h_i$ and $h_j$ are obtained from the output of the final GCN layer $H^{(L)}$. Finally, we compute the similarity between these two embeddings using the dot product to predict the presence of a regulatory interaction between the gene pair:

$$\hat{y}_{ij} = z_i^{\top} z_j \quad (17)$$

where $\hat{y}_{ij}$ denotes the linkage score.
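The final scoring step, Eqs. (15)-(17), can be sketched with linear maps standing in for the two MLPs (an assumption; the names are illustrative):

```python
import numpy as np

def link_score(h_i, h_j, W_tf, W_tg):
    """Project the two gene embeddings with separate maps (stand-ins for the
    paper's two MLPs), then score the pair with a dot product."""
    z_i = h_i @ W_tf          # projection of gene i, Eq. (15)
    z_j = h_j @ W_tg          # projection of gene j, Eq. (16)
    return float(z_i @ z_j)   # linkage score, Eq. (17)

h_i = np.array([1.0, 2.0])
h_j = np.array([3.0, 4.0])
I2 = np.eye(2)
score = link_score(h_i, h_j, I2, I2)   # identity projections: plain dot product
```

Using two separately learned projections lets the model treat the two roles in a pair asymmetrically, consistent with the use of two distinct MLPs in the text.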

Supplementary Information

13059_2025_3860_MOESM1_ESM.pdf (944.6KB, pdf)

Additional file 1: Supplementary Text, Supplementary Tables S1-S3, and Supplementary Figures S1-S8. Supplementary Text includes supplementary methods and results.

Authors' contributions

W. Y., Y. H., J. Q. and L. O.-Y. conceived and designed the study. W. Y. implemented the model, wrote the software and code used for benchmarking, and generated the figures. W. Y. and Z. C. completed the evaluation and analysis. W. Y. and L. O.-Y. drafted the manuscript with critical comments from all the co-authors. Y. H., J. Q. and L. O.-Y. supervised and reviewed the manuscript. All authors read and approved the final manuscript.

Peer review information

Tim Sands and Andrew Cosgrove were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62173235, 62473266, 12571327, 12222112, 12426311 and 32170655, in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2024B1515020059, 2023A1515012395 and 2024A1515011210, in part by the Shenzhen Science and Technology Program under Grants RCYX20221008092922051, JCYJ20230808105802006, RCJC20221008092753082, RCYX20231211090222026, JCYJ20241202124209011, 202206193000001 and 20220817122906001, in part by the (Key) Project of Department of Education of Guangdong Province under Grant 2022ZDZX1022 and 2023ZDZX1017, in part by the Shenzhen Medical Research Fund under Grant B2502001, and in part by the Research Team Cultivation Program of Shenzhen University under Grant 2023QNT011.

Data availability

We constructed ground truth gene regulatory networks (GRNs) for four human cell lines (A375, A549, HEK293T, and PC3) using the methodology described in CORN [54] (see Additional file 1: Supplementary Text). Raw bulk RNA-seq data were obtained from the L1000 platform (https://clue.io) [49]. We also collected two single-cell RNA-seq datasets from BEELINE [55], which are publicly available on Zenodo at https://doi.org/10.5281/zenodo.3378975 [62]. To facilitate reproducibility of the benchmark, all data and code used in this study are publicly available under an MIT license on Zenodo at https://doi.org/10.5281/zenodo.17106585 [63]. The source code of Meta-TGLink and the data are available on GitHub under an MIT license at https://github.com/Yoyiming/Meta-TGLink [64].

Declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yaohua Hu, Email: mayhhu@szu.edu.cn.

Jing Qin, Email: qinj29@mail.sysu.edu.cn.

Le Ou-Yang, Email: leouyang@smbu.edu.cn.

References

  • 1.Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Park JH, Hothi P, de Lomana ALG, Pan M, Calder R, Turkarslan S, et al. Gene regulatory network topology governs resistance and treatment escape in glioma stem-like cells. Sci Adv. 2024;10(23):eadj7706. [DOI] [PMC free article] [PubMed]
  • 3.Madhamshettiwar PB, Maetschke SR, Davis MJ, Reverter A, Ragan MA. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med. 2012;4:1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Li X, Wang CY. From bulk, single-cell to spatial RNA sequencing. Int J Oral Sci. 2021;13(1):36. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Badia-i Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet. 2023;24(11):739–54. [DOI] [PubMed] [Google Scholar]
  • 6.Kim S. ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun Stat Appl Methods. 2015;22(6):665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Ou-Yang L, Zhang XF, Zhao XM, Wang DD, Wang FL, Lei B, et al. Joint learning of multiple differential networks with latent variables. IEEE Trans Cybern. 2019;49(9):3494–506. [DOI] [PubMed] [Google Scholar]
  • 8.Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010;5(9):e12776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Moerman T, Aibar Santos S, Bravo González-Blas C, Simm J, Moreau Y, Aerts J, et al. GRNboost2 and Arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics. 2019;35(12):2159–61. [DOI] [PubMed] [Google Scholar]
  • 10.Shu H, Zhou J, Lian Q, Li H, Zhao D, Zeng J, et al. Modeling gene regulatory networks using neural network architectures. Nat Comput Sci. 2021;1(7):491–501. [DOI] [PubMed] [Google Scholar]
  • 11.Zhang Y, Wang M, Wang Z, Liu Y, Xiong S, Zou Q. MetaSEM: gene regulatory network inference from single-cell RNA data by meta-learning. Int J Mol Sci. 2023;24(3):2595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Yuan Y, Bar-Joseph Z. Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A. 2019;116(52):27151–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Chen J, Cheong C, Lan L, Zhou X, Liu J, Lyu A, et al. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-seq data. Brief Bioinform. 2021;22(6):bbab325. [DOI] [PMC free article] [PubMed]
  • 14.Lin Z, Ou-Yang L. Inferring gene regulatory networks from single-cell gene expression data via deep multi-view contrastive learning. Brief Bioinform. 2022;24(1):bbac586. [DOI] [PubMed] [Google Scholar]
  • 15.Kc K, Li R, Cui F, Yu Q, Haake AR. GNE: a deep learning framework for gene network inference by aggregating biological information. BMC Syst Biol. 2019;13:1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Skok Gibbs C, Mahmood O, Bonneau R, Cho K. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization. Genome Biol. 2024;25(1):88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst. 2021;32(1):4–24. [DOI] [PubMed] [Google Scholar]
  • 18.Wang Z, Xu G, Yu W, Ou-Yang L. LineGRN: a line graph neural network for gene regulatory network inference. IEEE Journal of Biomedical and Health Informatics. 2025. 10.1109/JBHI.2025.3591840. [DOI] [PubMed]
  • 19.Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. Proc. Int. Conf. Learn. Representat., pp. 1-12, 2018. https://openreview.net/forum?id=rJXMpikCZ
  • 20.Chen G, Liu ZP. Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data. Bioinformatics. 2022;38(19):4522–9. [DOI] [PubMed] [Google Scholar]
  • 21.Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics. 2025;41(3):btaf074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Zhang H, Lu G, Zhan M, Zhang B. Semi-supervised classification of graph convolutional networks with Laplacian rank constraints. Neural Process Lett. 2022;54(4):2645–56. [Google Scholar]
  • 23.Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, et al. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform. 2023;24(6):bbad414. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Trans Neural Netw Learn Syst. 2025;36(5):9005–17. [DOI] [PubMed] [Google Scholar]
  • 25.Theodoris CV, Xiao L, Chopra A, Chaffin MD, Al Sayed ZR, Hill MC, et al. Transfer learning enables predictions in network biology. Nature. 2023;618(7965):616–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods. 2024;21:1470–80. [DOI] [PubMed]
  • 27.Feng G, Qin X, Zhang J, Huang W, Zhang Y, Cui W, et al. CellPolaris: Decoding Cell Fate through Generalization Transfer Learning of Gene Regulatory Networks. bioRxiv [Preprint]. 2023:559244. 10.1101/2023.09.25.559244.
  • 28.Hospedales T, Antoniou A, Micaelli P, Storkey A. Meta-learning in neural networks: a survey. IEEE Trans Pattern Anal Mach Intell. 2022;44(9):5149–69. [DOI] [PubMed] [Google Scholar]
  • 29.Lai N, Kan M, Han C, Song X, Shan S. Learning to learn adaptive classifier-predictor for few-shot learning. IEEE Trans Neural Netw Learn Syst. 2021;32(8):3458–70. [DOI] [PubMed] [Google Scholar]
  • 30.Zhou F, Qi X, Zhang K, Trajcevski G, Zhong T. MetaGeo: a general framework for social user geolocation identification with few-shot learning. IEEE Trans Neural Netw Learn Syst. 2023;34(11):8950–64. [DOI] [PubMed] [Google Scholar]
  • 31.Guo Z, Zhang C, Yu W, Herr J, Wiest O, Jiang M, et al. Few-Shot Graph Learning for Molecular Property Prediction. In: Proceedings of the Web Conference 2021. 2021. pp. 2559–67. Association for Computing Machinery, New York, USA.
  • 32.Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery. IEEE Trans Neural Netw Learn Syst. 2025;36(3):4849–63 . [DOI] [PubMed]
  • 33.Bose AJ, Jain A, Molino P, Hamilton WL. Meta-Graph: Few Shot Link Prediction via Meta Learning. arXiv preprint arXiv: 1912.09867, 2020. https://arxiv.org/abs/1912.09867
  • 34.Huang K, Zitnik M. Graph meta learning via local subgraphs. Adv Neural Inf Process Syst. 2020;33:5862–74. [Google Scholar]
  • 35.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30:6000–10. [Google Scholar]
  • 36.Zou Z, Ohta T, Oki S. ChIP-atlas 3.0: a data-mining suite to explore chromosome architecture together with large-scale regulome data. Nucleic Acids Res. 2024;52(W1):W45-53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70; 2017. pp. 1126–35. JMLR.org, Sydney, NSW, Australia.
  • 38.Tessema M, Yingling CM, Snider AM, Do K, Juri DE, Picchi MA, et al. GATA2 is epigenetically repressed in human and mouse lung tumors and is not requisite for survival of KRAS mutant lung cancer. J Thorac Oncol. 2014;9(6):784–93.
  • 39.Hu Y, Dong Z, Liu K. Unraveling the complexity of STAT3 in cancer: molecular understanding and drug discovery. J Exp Clin Cancer Res. 2024;43(1):23.
  • 40.Xu X, Wang X, Chen Q, Zheng A, Li D, Meng Z, et al. Sp1 promotes tumour progression by remodelling the mitochondrial network in cervical cancer. J Transl Med. 2023;21(1):307.
  • 41.Li L, Davie JR. The role of Sp1 and Sp3 in normal and cancer cell biology. Ann Anat. 2010;192(5):275–83.
  • 42.Muzio G, Trombetta A, Maggiora A, Martinasso G, Vasiliou V, Lassen N, et al. Arachidonic acid suppresses growth of human lung tumor A549 cells through down-regulation of ALDH3A1 expression. Free Radic Biol Med. 2006;40:1929–38.
  • 43.Stewart RL, Carpenter BL, West DS, Knifley T, Liu L, Wang C, et al. S100A4 drives non-small cell lung cancer invasion, associates with poor prognosis, and is effectively targeted by the FDA-approved anti-helminthic agent niclosamide. Oncotarget. 2016;7(23):34630–42.
  • 44.Yaguchi T, Nakano T, Gotoh A, Nishizaki T. Adenosine promotes GATA-2-regulated p53 gene transcription to induce HepG2 cell apoptosis. Cell Physiol Biochem. 2011;28(4):761–70. 10.1159/000335770.
  • 45.Wang D, Uyemura B, Hashemi E, Bjorgaard S, Riese M, Verbsky J, et al. Role of GATA2 in human NK cell development. Crit Rev Immunol. 2021;41(2):21–33.
  • 46.Cheng Y, Ma Z, Kim BH, Wu W, Cayting P, Boyle AP, et al. Principles of regulatory information conservation between mouse and human. Nature. 2014;515(7527):371–5.
  • 47.Weng G, Martin P, Kim H, Won KJ. Integrating prior knowledge using transformer for gene regulatory network inference. Adv Sci. 2025;12(3):2409990.
  • 48.Ying C, Cai T, Luo S, Zheng S, Ke G, He D, et al. Do transformers really perform badly for graph representation? Adv Neural Inf Process Syst. 2021;34:28877–88.
  • 49.Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437–52.
  • 50.Giard DJ, Aaronson SA, Todaro GJ, Arnstein P, Kersey JH, Dosik H, et al. In vitro cultivation of human tumors: establishment of cell lines derived from a series of solid tumors. J Natl Cancer Inst. 1973;51(5):1417–23.
  • 51.Lieber M, Todaro G, Smith B, Szakal A, Nelson-Rees W. A continuous tumor-cell line from a human lung carcinoma with properties of type II alveolar epithelial cells. Int J Cancer. 1976;17(1):62–70.
  • 52.DuBridge RB, Tang P, Hsia HC, Leong PM, Miller JH, Calos MP. Analysis of mutation in human cells by using an Epstein-Barr virus shuttle system. Mol Cell Biol. 1987;7(1):379–87.
  • 53.Kaighn M, Narayan KS, Ohnuki Y, Lechner JF, Jones L. Establishment and characterization of a human prostatic carcinoma cell line (PC-3). Investig Urol. 1979;17(1):16–23.
  • 54.Leung RWT, Jiang X, Zong X, Zhang Y, Hu X, Hu Y, et al. CORN—condition orientated regulatory networks: bridging conditions to gene networks. Brief Bioinform. 2022;23(6):bbac402.
  • 55.Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali T. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–54.
  • 56.Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat Commun. 2018;9(1):619.
  • 57.Nestorowa S, Hamey FK, Pijuan Sala B, Diamanti E, Shepherd M, Laurenti E, et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood J Am Soc Hematol. 2016;128(8):e20–31.
  • 58.Yang Z, Ding M, Zou X, Tang J, Xu B, Zhou C, et al. Region or global? A principle for negative sampling in graph-based recommendation. IEEE Trans Knowl Data Eng. 2022;35(6):6264–77.
  • 59.Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web. Stanford InfoLab Technical Report; 1999.
  • 60.Zhang P, Yan Y, Zhang X, Li C, Wang S, Huang F, et al. TransGNN: harnessing the collaborative power of transformers and graph neural networks for recommender systems. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery; 2024. pp. 1285–95.
  • 61.Clevert DA, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the International Conference on Learning Representations; 2016. pp. 1–14. https://arxiv.org/abs/1511.07289.
  • 62.Pratapa A, Jalihal A, Law J, Bharadwaj A, Murali T. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Zenodo. 2025. 10.5281/zenodo.3701939.
  • 63.Yu W, Chen Z, Hu Y, Qin J, Ou-Yang L. Structure-enhanced graph meta learning for few-shot gene regulatory networks inference. Zenodo. 2025. 10.5281/zenodo.17106585.
  • 64.Yu W, Chen Z, Hu Y, Qin J, Ou-Yang L. Structure-enhanced graph meta learning for few-shot gene regulatory networks inference. GitHub. 2025. https://github.com/Yoyiming/Meta-TGLink. Accessed 27 Oct 2025.

Associated Data


Supplementary Materials

13059_2025_3860_MOESM1_ESM.pdf (944.6 KB)

Additional file 1: Supplementary Text, Supplementary Tables S1-S3, and Supplementary Figures S1-S8. Supplementary Text includes supplementary methods and results.

Data Availability Statement

We constructed ground truth gene regulatory networks (GRNs) for four human cell lines (A375, A549, HEK293T, and PC3) using the methodology described in CORN [54] (see Additional file 1: Supplementary Text). Raw bulk RNA-seq data were obtained from the L1000 platform (https://clue.io) [49]. We also collected two single-cell RNA-seq datasets from BEELINE [55], which are publicly available on Zenodo at https://doi.org/10.5281/zenodo.3378975 [62]. To facilitate reproducibility of the benchmark, all data and code used in this study are publicly available under an MIT license on Zenodo at https://doi.org/10.5281/zenodo.17106585 [63]. The source code of Meta-TGLink and the data are also available on GitHub under an MIT license at https://github.com/Yoyiming/Meta-TGLink [64].

