Transcriptome Transformer: improving patient survival prediction via multitask learning of transcriptomic and clinical features

Bonil Koo; Inyoung Sung; Sangseon Lee; Sun Kim

2025 Nov 25;26(6):bbaf628. doi: 10.1093/bib/bbaf628

PMCID: PMC12645844 PMID: 41288989

Abstract

Accurate survival prediction is essential in healthcare as it guides treatment strategies and improves patient outcomes. While clinical features provide valuable prognostic information, they often fail to represent the molecular complexity of diseases. Transcriptomic data, which reflect gene expression patterns of tumors, present a complementary perspective to address this limitation. We introduce Transcriptome Transformer (TxT), a multitask learning framework that uses a transcriptome-centric approach to improve patient survival prediction. TxT employs a Transformer-based architecture with multihead attention mechanisms to effectively capture complex dependencies among genes, enabling dynamic modeling of gene–gene interactions while using shared information across multiple clinical prediction tasks. By jointly analyzing transcriptomic data and incorporating clinical features, TxT offers a more complete representation of patient biology. In experiments across both single-task and multitask datasets, TxT outperformed existing methods in survival prediction and related clinical tasks. Additionally, TxT provides biological insights through attention-derived gene interaction networks, identifying immune-related pathways in longer-surviving Luminal A patients and coagulation and epithelial–mesenchymal transition pathways in shorter-surviving counterparts. Differential attention analysis further revealed that integrating clinical features enhances the model’s ability to prioritize genes involved in biologically meaningful pathways that are known to influence tumor progression and distant recurrence. The source code of TxT is available at https://github.com/BonilKoo/TxT.

Keywords: transcriptome, transformer, survival prediction, multitask learning, gene expression

Introduction

Motivation

Accurate prediction of survival outcomes in cancer patients is crucial for guiding personalized treatment decisions, optimizing resource allocation, and informing patients about their prognosis [1]. While traditional clinical features, such as age and tumor size, are important predictors of survival and recurrence, they are not designed to probe the molecular complexity and heterogeneity of cancer that can significantly influence patient outcomes [2].

Advances in transcriptomic profiling have revealed that gene expression patterns offer critical insights into tumor biology, disease progression, and patient outcomes [3]. For instance, overexpression of osteopontin (SPP1) has been shown to upregulate key epithelial–mesenchymal transition (EMT) transcription factors, thereby promoting aggressive phenotypes in breast cancer [4]. In triple-negative breast cancer, elevated expression of immune-related genes is associated with better survival, suggesting that immune transcriptomic activity can outweigh otherwise aggressive tumor biology [5]. EMT-related expression programs have also been consistently linked to recurrence and therapeutic resistance across various tumor types [6]. These findings underscore the clinical relevance of transcriptomic data and have led to an increased focus on leveraging gene expression profiles to infer clinical features and improve survival prediction in cancer.

Among clinical factors, age and tumor size have long been recognized as significant predictors of survival and recurrence in cancer patients. However, leveraging additional molecular data, particularly transcriptomics, to augment the predictive power of these established factors continues to be a significant research focus. While previous studies have explored the integration of genomic mutations with clinical features [7], approaches that employ a transcriptome-centric strategy to infer clinical features and predict survival are not well studied.

In this study, we propose a transcriptome-centric approach to survival prediction. Our goal is to combine the rich information in transcriptomic data with clinical features to improve the accuracy of patient survival prediction in a way that remains explainable in terms of the molecular mechanisms of cancer (Fig. 1).

Figure 1. Approaches to survival prediction using clinical and transcriptomic data. (A) A model that predicts survival outcomes based solely on clinical features. (B) A model that relies exclusively on transcriptome-wide gene expression data for survival prediction. (C) A model that combines both clinical features and transcriptomic data to predict patient prognosis, leveraging complementary information from these data sources. (D) The proposed *Transcriptome Transformer* (TxT) framework, which employs multitask learning (MTL) to jointly integrate transcriptomic and clinical features, leading to enhanced survival prediction and clinical-feature inference.

Challenge

In developing a transcriptome-centric model for patient survival prediction, two main challenges arise. First, transcriptomic profiles are real-valued high-dimensional vectors representing expression levels across numerous genes while clinical features are often single values (e.g. patient age or cancer grade). Designing a prediction model that primarily processes transcriptomic data while effectively incorporating clinical features presents significant challenges, as each data type must be handled in a way that preserves its unique properties and maximizes its contribution to the prediction task.

Second, predicting survival from transcriptomic data requires representing intricate gene interactions. A single gene may carry out multiple functions depending on the regulatory setting. For example, p53 is involved in DNA repair, cell-cycle arrest, and apoptosis [8, 9]. Considering these varied functions requires a modeling strategy that can represent dynamic interactions among genes. Understanding these interactions is crucial for survival prediction because they can reveal potential therapeutic targets or biomarkers associated with disease progression. Moreover, the technical challenges of unifying clinical and transcriptomic data continue to be substantial, given their different scales and data structures. Resolving these issues in a single framework may enhance the predictive capability for both established clinical features and complex molecular indicators.

Approach

This study introduces TxT, an MTL model that adopts a transcriptome-centric approach to patient survival prediction. TxT primarily processes transcriptomic data while integrating clinical features in a complementary manner, addressing the challenges of handling disparate data types and modeling complex gene–gene interactions. The model employs a Transformer-based architecture [10], applying multihead attention to capture complex relationships among genes, which is essential for accurately modeling the intricate biological processes involved in cancer. This approach is based on the principle that each gene’s role may vary depending on its regulatory context, similar to how a word’s interpretation can shift depending on its surrounding text.

TxT uses a novel positional embedding method that treats gene expression levels as position-like information, allowing the model to create patient-specific representations grounded in each individual’s molecular profile. Traditional positional encoding methods impose fixed or artificially defined positional rules, which may not properly represent the dynamic and context-dependent interactions among genes. This data-driven alternative offers greater flexibility, allowing the model to adaptively capture complex gene interactions and better represent the distinct regulatory landscapes of different patients.

To implement this transcriptome-centric approach, TxT utilizes an MTL framework that simultaneously predicts multiple clinical features and survival outcomes from transcriptomic data. By formulating survival prediction tasks in a unified framework, the model leverages the mutually beneficial connections between tasks, where improvements in predicting clinical features can enhance survival prediction accuracy. Such an approach enables the model to learn reliable data patterns that capture both established prognostic markers and high-dimensional transcriptomic signals. Ultimately, these shared representations enhance survival prediction performance, which is the primary focus of this work.

Related work

Transformer

While Transformer-based architectures have been widely studied in various fields, their application to bulk RNA-seq transcriptome data remains relatively underexplored. Transformers are designed to operate on sequences of discrete tokens, but transcriptome data usually come as numerical values (i.e. gene expression quantities). DeepGene Transformer [11] processes gene expression levels as a 1D vector, using convolutional neural networks (CNNs) instead of positional encoding for the multihead attention layer. However, this CNN-based approach may limit the model’s ability to learn explicit relationships between genes in the multihead attention layer.

T-GEM [12] uses gene expression quantities directly as tokens and employs gene-dependent parameters, distinguishing it from traditional Transformers. While this approach allows for gene-specific learning, it increases the number of model parameters proportionally to the number of genes used. Additionally, T-GEM’s learning is limited to the context provided by transcriptome data without incorporating prior biological knowledge.

Multitask learning

MTL is an approach that aims to improve model performance by jointly learning multiple related tasks [13, 14]. This technique can lead to more generalized and robust data representations, which is particularly valuable in complex biological contexts. MTL is widely studied in various fields, such as computer vision [15, 16], natural language processing (NLP) [17, 18], and drug discovery [19, 20]. Recently, studies related to MTL have also been conducted in the field of bioinformatics [21, 22].

OmiEmbed [23], to our knowledge, is the only study that uses omics data to simultaneously predict age, cancer type, and survival. While it uses a variational autoencoder (VAE) [24] architecture and GradNorm [25] optimization, the VAE’s constraint to specific distributions may limit its flexibility in modeling complex gene interactions. Our approach aims to address this limitation through a more flexible Transformer-based architecture.

Materials and methods

Computational formulation of transcriptome data for transformer models

To effectively utilize transcriptome data within the Transformer framework, we adopt a formulation inspired by NLP. In the case of text, words in a sentence are vectorized using word embedding methods [26], and the position of each word in the sentence is encoded using positional encoding [10, 27]. Together, the word and position information allow the model to learn the context of each word. The position-aware word embeddings are then used as input to the Transformer for specific tasks such as text classification.

In the context of transcriptome data, we liken a patient’s transcriptomic profile to a sentence, where individual genes correspond to words. This analogy helps illustrate how understanding the relationship between genes and their expression levels enables the model to learn the context of genes within each patient. Consequently, both embedding and positional information of genes are integrated to form comprehensive representations. Specifically, transcriptome data, such as gene expression levels, provide position-like information that updates gene embeddings for each sample through the layers of the transformer. This novel approach enhances the model’s ability to capture intricate gene–gene interactions and predict patient conditions or outcomes effectively. Importantly, we do not assume or impose any biologically meaningful ordering of genes. Gene expression values are treated as an unordered set, and any positional information is learned in a data-driven manner through transcriptome-based embeddings.

Model architecture

Building upon the computational formulation, we present the architecture of the TxT model, which is designed to leverage both transcriptomic and clinical data for improved survival prediction. The overview of the model architecture is shown in Fig. 2.

Figure 2. Illustration of the proposed model architecture. The embedding for each gene, used as input to the Transformer, is derived from a node2vec model pretrained on the protein–protein interaction (PPI) network. Gene expression values are transformed through a projection layer, serving as a transcriptome-based positional embedding. These embeddings are processed through multihead self-attention and gene-wise feed-forward layers to capture complex gene–gene interactions and sample-specific transcriptomic profiles. Finally, the latent embeddings are passed to task-specific layers to predict outcomes for multiple tasks. Notations: $N$ denotes the number of samples, $G$ the number of genes, $d$ the embedding dimension of a gene, $L$ the number of encoder layers, and $T$ the number of tasks.

Pretraining gene embeddings on biological networks

Genes interact with each other and function collectively in complexes, while certain genes are co-expressed due to a shared regulatory mechanism. A biological network serves as a representation of the intricate interactions and relationships among various biological entities within a living system. Among various types of biological networks, the protein–protein interaction (PPI) network represents the physical or functional associations between proteins or genes and is widely used to encode biologically relevant gene–gene dependencies in transcriptomic modeling studies [28, 29].

Instead of using randomly initialized gene embeddings as input to the Transformer, we use gene embeddings pretrained on the PPI network to incorporate informative prior knowledge. Among gene embedding methods, we used node2vec [30] to encode biological context into the gene representations. This initialization provides the model with inductive biases that reflect known biological structure, potentially improving generalization and interpretability in downstream predictive tasks.
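node2vec learns node embeddings by running second-order biased random walks over the graph and then training a skip-gram model on the walks. The sketch below shows only the walk-generation step on an unweighted toy graph; the return parameter p and in-out parameter q follow the node2vec definitions, but the values used here, the walk length, and the five-gene neighborhood are illustrative assumptions, not TxT's settings or STRING data.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(
    graph: Dict[str, Set[str]],
    start: str,
    length: int,
    p: float = 1.0,
    q: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Generate one second-order biased random walk on an unweighted graph."""
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted so that a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)  # stay at distance 1 from prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move outward (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy, illustrative PPI neighborhood as symmetric adjacency sets (not STRING data).
ppi = {
    "TP53": {"MDM2", "ATM", "CDKN1A"},
    "MDM2": {"TP53", "CDKN1A"},
    "ATM": {"TP53", "CHEK2"},
    "CHEK2": {"ATM"},
    "CDKN1A": {"TP53", "MDM2"},
}
walks = [
    node2vec_walk(ppi, gene, length=10, p=1.0, q=0.5, rng=random.Random(seed))
    for seed, gene in enumerate(sorted(ppi))
]
for w in walks:
    print(" -> ".join(w))
```

The walks would then be treated as "sentences" for a skip-gram model (for example gensim's `Word2Vec`) to obtain one vector per gene; that training step is omitted here.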

Transcriptome-based positional embedding

As discussed in Section “Computational formulation of transcriptome data for transformer models”, transcriptome data play a pivotal role akin to positional encoding for gene embeddings in each sample. Unlike traditional positional encodings used for sequence data, our approach does not rely on any predefined ordering of genes and instead learns position-related signals directly from expression profiles.

Gene embeddings and transcriptome data originate from distinct, heterogeneous sources. Drawing inspiration from TUPE [31], which unties the correlations between word and positional information in the Transformer’s attention, we introduce projection layers for our approach, called transcriptome-based positional embedding (Fig. 3). These projections enter the calculation of the attention value $\alpha^{(s)}_{ij}$ in the attention layers for each query gene $i$ and key gene $j$ of sample $s$, as defined by the equation:

Figure 3. Transcriptome-based positional embedding. The transcriptomic data play a position-like role, enabling sample-specific updates to gene embeddings across Transformer layers. All gene embeddings are initialized identically across samples, but they are dynamically adjusted based on patient-specific transcriptomic profiles during model processing. Notations: $x_i$ represents the embedding of gene $i$; $W^{Q}$ and $W^{K}$ denote the learnable parameters for query and key in the multihead attention layer, respectively; $e_{s,i}$ signifies the gene expression value for gene $i$ of patient $s$; $u^{Q}$ and $u^{K}$ represent the learnable parameters for query and key in the projection layer, respectively; $\alpha^{(s)}_{ij}$ denotes the attention value for the query gene $i$ and key gene $j$ pair of sample $s$; and $\mathcal{G}$ represents the gene set used.

$$\alpha^{(s)}_{ij} = \frac{1}{\sqrt{2d}}\left(x^{l}_{i} W^{Q,l}\right)\left(x^{l}_{j} W^{K,l}\right)^{\top} + \frac{1}{\sqrt{2d}}\left(e_{s,i}\, u^{Q}\right)\left(e_{s,j}\, u^{K}\right)^{\top} \qquad (1)$$

Here, $d$ represents the dimensionality of hidden representations, $x^{l}_{i}$ is the embedding vector of gene $i$ in the $l$th layer, and $e_{s,i}$ denotes the gene expression value of gene $i$ for sample $s$. Matrices $W^{Q,l}$ and $W^{K,l}$ are projection matrices for the gene embeddings of query and key genes in the $l$th layer, respectively, while vectors $u^{Q}$ and $u^{K}$ are projection vectors for the positional embedding of query and key genes. Because the gene embeddings are initially identical across all samples, incorporating each sample’s transcriptomic profile through the second term tailors the gene embeddings to that sample as they pass through the Transformer layers, facilitating prediction of the target variable associated with that particular sample.
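To make Eq. (1) concrete, the sketch below computes single-head attention weights for one sample from an untied content term and an expression term, then applies a row-wise softmax over key genes. It assumes a single head, the $1/\sqrt{2d}$ scaling used in TUPE, and randomly drawn toy parameters; the function name and shapes are illustrative, not taken from the TxT code base.

```python
import numpy as np


def transcriptome_attention(
    x: np.ndarray,    # (G, d) gene embeddings at layer l, shared by all samples
    e: np.ndarray,    # (G,) expression values e_{s,i} of one sample s
    w_q: np.ndarray,  # (d, d) query projection W^{Q,l}
    w_k: np.ndarray,  # (d, d) key projection W^{K,l}
    u_q: np.ndarray,  # (d,) expression projection vector u^Q
    u_k: np.ndarray,  # (d,) expression projection vector u^K
) -> np.ndarray:
    """Single-head attention weights with untied gene-embedding and expression terms."""
    d = x.shape[1]
    scale = 1.0 / np.sqrt(2.0 * d)
    content = (x @ w_q) @ (x @ w_k).T      # (G, G) gene-embedding term
    pos_q = np.outer(e, u_q)               # (G, d) rows are e_{s,i} u^Q
    pos_k = np.outer(e, u_k)               # (G, d) rows are e_{s,j} u^K
    position = pos_q @ pos_k.T             # (G, G) expression term
    logits = scale * (content + position)  # alpha^{(s)}_{ij} before softmax
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)  # softmax over key genes j


rng = np.random.default_rng(0)
G, d = 6, 8
x = rng.normal(size=(G, d))
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
u_q, u_k = rng.normal(size=d), rng.normal(size=d)
attn_a = transcriptome_attention(x, rng.normal(size=G), w_q, w_k, u_q, u_k)
attn_b = transcriptome_attention(x, rng.normal(size=G), w_q, w_k, u_q, u_k)
print(np.round(attn_a[0], 3))
```

Because `x` and all projections are shared, any difference between `attn_a` and `attn_b` comes only from the two samples' expression profiles, which is the sample-specific behavior the paragraph describes.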

Multitask optimization technique

TxT employs an MTL framework to simultaneously predict multiple clinical features and survival outcomes by sharing transformer encoder layers across all tasks while maintaining task-specific output layers. Specifically, the shared encoder learns robust representations that capture both clinical and transcriptomic signals, which are then fed into separate heads tailored for each prediction task.

In MTL, a model is typically trained to minimize a composite objective function represented by the equations:

$$\min_{\theta_{sh},\,\theta_{1},\dots,\theta_{T}} \; \sum_{t=1}^{T} w_{t}\, \mathcal{L}_{t}(\theta_{sh}, \theta_{t}) \qquad (2)$$

and

$$\mathcal{L}_{t}(\theta_{sh}, \theta_{t}) = \frac{1}{N} \sum_{n=1}^{N} \ell_{t}\big(f_{t}(x_{n}; \theta_{sh}, \theta_{t}),\, y^{t}_{n}\big) \qquad (3)$$

Here, $T$ denotes the number of tasks; $w_t$ and $\mathcal{L}_t$ are the coefficient and loss for task $t$, respectively; and $\theta_{sh}$ and $\theta_t$ are the shared parameters and the task-specific parameters of the model $f_t$ for task $t$, respectively. $N$ represents the number of samples, and $x_n$ and $y^{t}_{n}$ denote the input features and the target variable for task $t$ of sample $n$, respectively. The loss function $\ell_t$ varies with the task, typically mean squared error for regression and cross-entropy for classification. For survival prediction with neural networks, we employed the loss function proposed by N-MTLR [32], which models survival as a sequence of binary classification tasks over discretized time intervals (Supplementary Methods).
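The sketch below evaluates the composite objective of Eqs. (2) and (3) for three toy tasks: mean squared error for a regression task, cross-entropy for a classification task, and an MTLR-style negative log-likelihood for survival. The survival term follows the cumulative-sum formulation of multi-task logistic regression, in which an event in interval $j$ scores the sum of the per-interval logits from $j$ onward and a censored sample sums probability over all intervals from $j$ onward; treating this as equivalent to TxT's N-MTLR loss is an assumption (details are in the Supplementary Methods). The equal task weights and the function names are placeholders.

```python
import numpy as np


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # logits: (N, C); labels: (N,) integer class indices
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def mtlr_nll(f: np.ndarray, time_bin: np.ndarray, event: np.ndarray) -> float:
    """MTLR-style negative log-likelihood.

    f: (N, K) per-interval logits; a zero logit is appended for the open last bin.
    time_bin: (N,) index in [0, K] of the bin containing the event/censoring time.
    event: (N,) 1 if the event was observed, 0 if censored.
    """
    n = f.shape[0]
    f = np.concatenate([f, np.zeros((n, 1))], axis=1)   # (N, K+1)
    scores = np.cumsum(f[:, ::-1], axis=1)[:, ::-1]      # score_j = sum_{k>=j} f_k
    log_norm = _logsumexp(scores, axis=1)
    bins = np.arange(scores.shape[1])
    # observed event: only bin j is consistent; censored: any bin >= j is consistent
    mask = np.where(event[:, None] == 1, bins == time_bin[:, None], bins >= time_bin[:, None])
    masked = np.where(mask, scores, -np.inf)
    return float(np.mean(log_norm - _logsumexp(masked, axis=1)))


def composite_loss(task_losses: dict, weights: dict) -> float:
    """Eq. (2): weighted sum of the per-task losses."""
    return sum(weights[t] * task_losses[t] for t in task_losses)


rng = np.random.default_rng(0)
N, K = 8, 5
losses = {
    "age": mse(rng.normal(size=N), rng.normal(size=N)),
    "pam50": cross_entropy(rng.normal(size=(N, 5)), rng.integers(0, 5, size=N)),
    "survival": mtlr_nll(rng.normal(size=(N, K)), rng.integers(0, K + 1, size=N),
                         rng.integers(0, 2, size=N)),
}
total = composite_loss(losses, {"age": 1.0, "pam50": 1.0, "survival": 1.0})
print({k: round(v, 3) for k, v in losses.items()}, round(total, 3))
```

In a real model each loss would be computed from the task-specific head on top of the shared encoder, and the gradients would flow back into $\theta_{sh}$ and the relevant $\theta_t$.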

To optimize MTL, various techniques have been developed. Some studies focus on finding optimal values of $w_t$ to ensure effective training across all tasks [15, 25, 33]. In contrast, PCGrad [34] takes a distinct approach: it reduces conflicts among tasks by modifying the gradient vectors themselves so that all tasks are updated in a compatible direction, without relying on predefined or learned task-specific weights. Two task gradients are considered conflicting when their cosine similarity is negative. Such conflicts become particularly problematic when the gradient magnitudes differ greatly or when the objective landscape has high curvature. PCGrad addresses these issues by projecting the gradient vector of one task onto the normal plane of the other task’s gradient whenever the two gradients conflict. The projection is expressed as:

$$g_{i} \leftarrow g_{i} - \frac{g_{i} \cdot g_{j}}{\lVert g_{j} \rVert^{2}}\, g_{j} \quad \text{if } g_{i} \cdot g_{j} < 0 \qquad (4)$$

where $g_i$ and $g_j$ represent the gradient vectors of tasks $i$ and $j$, respectively.
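The projection in Eq. (4) can be sketched on flat gradient vectors as below. Following the PCGrad procedure, each task's gradient is projected against the original gradients of the other tasks, visited in random order, and the projected gradients are summed into a single update for the shared parameters. Representing gradients as NumPy vectors is a simplification for illustration; in practice they would be the flattened gradients of the shared encoder.

```python
import numpy as np
from typing import List, Optional


def pcgrad(grads: List[np.ndarray], rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Combine per-task gradients, projecting away pairwise conflicts (PCGrad)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    projected = []
    for i, g_i in enumerate(grads):
        g = g_i.astype(float)  # astype returns a copy, so the inputs stay untouched
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # visit the other tasks in random order
        for j in others:
            g_j = grads[j]
            dot = float(g @ g_j)
            if dot < 0:  # negative cosine similarity: the gradients conflict
                g -= dot / float(g_j @ g_j) * g_j  # project onto g_j's normal plane
        projected.append(g)
    return np.sum(projected, axis=0)


g_age = np.array([1.0, 0.0])
g_surv = np.array([-1.0, 1.0])  # conflicts with g_age (dot product is -1)
print(pcgrad([g_age, g_surv]))  # [0.5 1.5]
```

Without projection the plain sum would be `[0, 1]`, which cancels the first task's signal entirely; after projection each task keeps the component of its gradient that does not oppose the other.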

Differential attention analysis

To assess how incorporating clinical features alters the distribution of gene-level attention weights, we trained two variants of our model—one with clinical outputs (multitask) and one without (single-task)—and compared their attention patterns. For each gene, we computed an aggregate attention score under both settings and performed a t-test to identify genes whose attention increased significantly with clinical features. We then applied false discovery rate (FDR) correction to account for multiple comparisons, retaining only those genes with adjusted P-values <.05. Finally, pathway enrichment analysis was conducted on the genes showing clinically influenced attention gains, offering insight into the molecular processes that become more prominent when clinical variables guide the model’s attention.
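The selection step at the end of this procedure can be sketched as follows: given per-gene t-test P-values (obtainable, for example, from `scipy.stats.ttest_ind`) and the mean attention scores under each setting, apply Benjamini-Hochberg FDR correction and keep genes whose attention is higher in the multitask model with adjusted P < .05. Whether the original analysis used a one- or two-sided test, and how attention was aggregated per gene, is not specified here; the function names and the toy inputs are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest P-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def genes_with_attention_gain(genes, mean_mtl, mean_stl, pvals, alpha=0.05):
    """Genes with higher mean attention under the multitask model and FDR < alpha."""
    adj = benjamini_hochberg(pvals)
    gain = np.asarray(mean_mtl) - np.asarray(mean_stl)
    return [g for g, a, d in zip(genes, adj, gain) if a < alpha and d > 0]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
print(genes_with_attention_gain(
    ["A", "B", "C", "D"],
    mean_mtl=[0.3, 0.2, 0.1, 0.5],
    mean_stl=[0.1, 0.1, 0.2, 0.1],
    pvals=[0.001, 0.2, 0.001, 0.9],
))  # ['A']
```

Gene C passes the FDR threshold but is excluded because its attention decreased, matching the analysis's focus on attention gains.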

Gene interaction modeling via multihead attention

After training TxT, we constructed a gene interaction network for each test sample by extracting attention weights from the multihead attention layers. Specifically, if the attention weight between a query gene and a key gene within a head exceeds 0.01, an edge is formed from the key gene to the query gene. Combining the networks from all heads yields a comprehensive gene interaction network for each sample. Subsequently, for each sample, genes whose out-degree centrality exceeded a threshold defined by $\mu_{out}$ and $\sigma_{out}$ were identified as hub genes, and genes whose in-degree centrality exceeded the corresponding threshold defined by $\mu_{in}$ and $\sigma_{in}$ were designated as attractor genes. Here, $\mu_{out}$ and $\sigma_{out}$ represent the mean and standard deviation of out-degree centrality in a network, respectively, and $\mu_{in}$ and $\sigma_{in}$ denote the mean and standard deviation of in-degree centrality in a network, respectively.
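A minimal sketch of this network construction for one sample, assuming the attention weights are available as an array of shape (heads, genes, genes) indexed as [head, query, key]. Edges run from key to query whenever any head exceeds 0.01, degree centralities are normalized by $G-1$, and self-attention is ignored (an assumption). The exact multiplier on the standard deviation in the hub and attractor thresholds is not recoverable from this text, so `k_sd` is a hypothetical parameter.

```python
import numpy as np
from typing import List, Tuple


def hub_and_attractor_genes(
    attention: np.ndarray,  # (H, G, G); attention[h, q, k] = weight of query q on key k
    edge_threshold: float = 0.01,
    k_sd: float = 1.0,  # hypothetical multiplier on the standard deviation
) -> Tuple[List[int], List[int]]:
    n_genes = attention.shape[1]
    # union over heads, transposed so that adj[key, query] marks an edge key -> query
    adj = (attention > edge_threshold).any(axis=0).T.copy()
    np.fill_diagonal(adj, False)  # ignore self-attention (assumption)
    out_deg = adj.sum(axis=1) / (n_genes - 1)  # normalized out-degree centrality
    in_deg = adj.sum(axis=0) / (n_genes - 1)   # normalized in-degree centrality
    hubs = np.flatnonzero(out_deg > out_deg.mean() + k_sd * out_deg.std())
    attractors = np.flatnonzero(in_deg > in_deg.mean() + k_sd * in_deg.std())
    return hubs.tolist(), attractors.tolist()


# Toy sample: one head, five genes, every query attends strongly to gene 0.
attn = np.full((1, 5, 5), 0.001)
attn[0, :, 0] = 0.96
print(hub_and_attractor_genes(attn))  # ([0], [])
```

Gene 0 sends edges to every other gene and becomes a hub, while each remaining gene receives exactly one edge, so no gene stands out as an attractor.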

To facilitate comparison and interpretation of hub and attractor genes between groups, we computed the proportion of samples in each group that selected each gene. By counting how often each gene was selected in both groups and performing Fisher’s exact test, we obtained P-values indicating differences in gene selection between the groups. Analogous to the fold change in differentially expressed gene analysis, we calculated the ratio of the proportions of samples selecting each gene in the two groups. Genes with an absolute proportion ratio >2 and an FDR-adjusted P-value <.05 were identified as core genes for each group.
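For a single gene, the comparison reduces to a 2x2 table of (selected, not selected) counts in the two groups. The sketch below computes the proportion ratio and a two-sided Fisher's exact P-value from hypergeometric weights, written out with `math.comb` to stay dependency-free; `scipy.stats.fisher_exact` implements the same test. The function names and example counts are hypothetical, and the resulting P-values would still need FDR correction across genes.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # integer hypergeometric weights avoid floating-point ties when comparing tables
    weights = {x: comb(r1, x) * comb(r2, c1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(r1 + r2, c1)


def gene_selection_test(sel_1: int, n_1: int, sel_2: int, n_2: int) -> Tuple[float, float]:
    """Proportion ratio (group 1 / group 2) and Fisher P-value for one gene.

    sel_k: samples in group k whose network selected the gene as hub (or attractor)
    n_k: total samples in group k
    """
    prop_1, prop_2 = sel_1 / n_1, sel_2 / n_2
    if prop_2 > 0:
        ratio = prop_1 / prop_2
    else:
        ratio = float("inf") if prop_1 > 0 else float("nan")
    p = fisher_exact_two_sided(sel_1, n_1 - sel_1, sel_2, n_2 - sel_2)
    return ratio, p


print(fisher_exact_two_sided(8, 2, 1, 5))  # 0.03496503496503497 (= 400 / 11440)
print(gene_selection_test(30, 40, 10, 40))
```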

Experiments

Datasets

In our experiments, we evaluated model performance on three different datasets, covering both single-task learning (STL) and MTL scenarios. For MTL, we used the SCAN-B dataset [35], focusing on breast cancer under two survival endpoints—overall survival (OS) and distant recurrence-free interval (DRFi). Additionally, the TARGET-AML dataset [36] was employed, where age and OS were framed as multitask objectives in a pediatric acute myeloid leukemia (AML) cohort. For STL, we used TCGA-BRCA [37] for PAM50 subtype classification in breast cancer patients. Further details about data selection and characteristics are described in Supplementary Methods.

Data preprocessing

We pretrained gene embeddings on the STRING (v12.0) network [38] using node2vec [30] and selected 1000 highly variable genes from the MSigDB hallmark gene sets [39]. Log-transformed gene expression values (log2(FPKM+1)) were normalized for each gene. A detailed description of sample selection, replicate handling, and thresholding is provided in Supplementary Methods.
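A minimal preprocessing sketch, assuming that "highly variable" means largest variance of log2(FPKM+1) across samples and that per-gene normalization is a z-score; both criteria are assumptions, since the exact procedure is described only in the Supplementary Methods. The indices of hallmark genes are taken as given, and the FPKM matrix below is synthetic.

```python
import numpy as np
from typing import Sequence, Tuple


def preprocess_expression(
    fpkm: np.ndarray, candidate_idx: Sequence[int], n_top: int = 1000
) -> Tuple[np.ndarray, np.ndarray]:
    """Log-transform, keep the most variable candidate genes, z-score each gene.

    fpkm: (samples, genes) FPKM matrix.
    candidate_idx: column indices of hallmark-set genes (assumed precomputed).
    """
    log_expr = np.log2(fpkm + 1.0)
    candidates = np.asarray(candidate_idx)
    variances = log_expr[:, candidates].var(axis=0)
    # keep the n_top candidates with the largest variance (assumed HVG criterion)
    selected = candidates[np.argsort(variances)[::-1][:n_top]]
    x = log_expr[:, selected]
    mu, sd = x.mean(axis=0), x.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes map to 0 instead of dividing by zero
    return (x - mu) / sd, selected


rng = np.random.default_rng(0)
fpkm = rng.gamma(shape=2.0, scale=5.0, size=(20, 50))  # synthetic expression matrix
z, genes = preprocess_expression(fpkm, candidate_idx=range(0, 50, 2), n_top=10)
print(z.shape, genes[:3])
```

When the data are later split into training, validation, and test sets, estimating the per-gene mean and standard deviation on the training split only avoids leaking test-set information into the normalization.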

Model training

Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets 10 times to evaluate model robustness. We employed the Adam optimizer with a 0.0001 learning rate and applied early stopping based on validation loss to prevent overfitting. The best-performing model on the validation set was then used for the final evaluation. Additional model architecture details and hyperparameter tuning procedures are available in Supplementary Methods.
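The evaluation protocol can be sketched as ten independent 70/10/20 random splits plus an early-stopping rule on validation loss. The patience value and the class and function names are hypothetical; the optimizer itself is not shown (in PyTorch it would be, for example, `torch.optim.Adam(model.parameters(), lr=1e-4)`).

```python
import numpy as np
from typing import Iterator, Tuple


def repeated_splits(
    n: int, repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[np.ndarray, np.ndarray, np.ndarray]]:
    """Yield (train, val, test) index arrays for a 70/10/20 random split, `repeats` times."""
    rng = np.random.default_rng(seed)
    n_train, n_val = n * 70 // 100, n * 10 // 100
    for _ in range(repeats):
        perm = rng.permutation(n)
        yield perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10):  # patience value is hypothetical
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


splits = list(repeated_splits(100))
train, val, test = splits[0]
print(len(splits), len(train), len(val), len(test))  # 10 70 10 20
```

The checkpoint saved at `best_epoch` corresponds to the best-performing model on the validation set, which is the one evaluated on the test split.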

Results

Predictive performance across multiple datasets

We first assessed all competing methods on the SCAN-B dataset under two survival endpoints: OS and DRFi. In both scenarios, five tasks were jointly analyzed: predicting age, tumor size, Nottingham histological grade (NHG), PAM50, and survival (OS or DRFi). Table 1 and Table S7 summarize the results for OS and DRFi, respectively.

Table 1.

Performance comparison on the SCAN-B OS dataset using various methods and evaluation metrics. Standard deviations and additional evaluation metrics are provided in Table S6. The best performance for each metric is highlighted in bold, and the second-best is underlined. Metrics with higher values indicating better performance are marked with (↑), while those with lower values being better are marked with (↓). OS, overall survival; NHG, Nottingham histological grade; C-Index, Concordance Index; IBS, Integrated Brier Score; ACC, accuracy; MAE, mean absolute error; SCC, Spearman’s correlation coefficient; SVM, support vector machine; RF, random forest; MLP, multilayer perceptron

Setting	Method	OS C-Index (↑)	OS IBS (↓)	PAM50 ACC (↑)	PAM50 F1 (↑)	NHG ACC (↑)	NHG F1 (↑)	Age MAE (↓)	Age SCC (↑)	Tumor size MAE (↓)	Tumor size SCC (↑)
STL	SVM	0.645	–	0.905	0.865	0.719	0.623	8.420	0.599	6.965	0.480
STL	RF	0.639	0.092	0.882	0.821	0.698	0.585	8.593	0.553	7.764	0.393
STL	MLP	0.669	0.108	0.899	0.864	0.710	0.648	8.009	0.645	7.646	0.450
STL	OmiEmbed [23]	0.674	0.098	0.872	0.832	0.670	0.625	7.296	0.672	7.179	0.484
STL	T-GEM [12]	0.696	0.096	0.914	0.883	0.713	0.640	15.280	0.270	11.647	0.308
STL	Autosurv [40]	0.632	0.096	0.664	0.464	0.632	0.482	65.479	0.148	20.109	0.200
STL	SurvConvMixer [41]	0.672	0.100	0.848	0.792	0.686	0.569	9.105	0.461	7.714	0.345
STL	CNN+FMAP [42]	0.677	0.090	0.817	0.766	0.617	0.564	10.562	0.029	8.136	0.034
STL	TxT	0.701	0.087	0.920	0.891	0.715	0.649	7.185	0.684	6.778	0.485
MTL	OmiEmbed [23]	0.778	0.112	0.898	0.860	0.712	0.635	7.231	0.673	6.735	0.518
MTL	TxT	0.797	0.084	0.920	0.891	0.722	0.667	7.140	0.681	6.721	0.519


On the SCAN-B OS dataset (Table 1), TxT (MTL) outperformed the competing methods on nearly all metrics, achieving the lowest mean absolute error (MAE) for predicting age and tumor size and the highest classification accuracy for NHG and PAM50. Moreover, TxT (MTL) achieved a C-Index of 0.797, compared with 0.701 for TxT (STL), corresponding to a 13.7% relative improvement. A similar trend emerged for the DRFi endpoint (Table S7), where TxT (MTL) achieved the best or second-best results on every task, further underlining the benefits of MTL.
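Because the plain-text version of Table 1 cannot show bold or underlined entries, the sketch below recovers the best and second-best methods for the two survival metrics, taking the metric direction into account, and reproduces the 13.7% relative C-Index improvement quoted above. The values are copied from Table 1; the helper names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# (OS C-Index, higher is better; OS IBS, lower is better) from Table 1; None = not reported
TABLE1: Dict[str, Tuple[float, Optional[float]]] = {
    "SVM (STL)": (0.645, None),
    "RF (STL)": (0.639, 0.092),
    "MLP (STL)": (0.669, 0.108),
    "OmiEmbed (STL)": (0.674, 0.098),
    "T-GEM (STL)": (0.696, 0.096),
    "Autosurv (STL)": (0.632, 0.096),
    "SurvConvMixer (STL)": (0.672, 0.100),
    "CNN+FMAP (STL)": (0.677, 0.090),
    "TxT (STL)": (0.701, 0.087),
    "OmiEmbed (MTL)": (0.778, 0.112),
    "TxT (MTL)": (0.797, 0.084),
}


def top_two(column: int, higher_is_better: bool) -> List[str]:
    """Best and second-best methods for one metric column, skipping missing values."""
    rows = [(m, v[column]) for m, v in TABLE1.items() if v[column] is not None]
    rows.sort(key=lambda r: r[1], reverse=higher_is_better)
    return [m for m, _ in rows[:2]]


def relative_improvement(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


print(top_two(0, higher_is_better=True))    # ['TxT (MTL)', 'OmiEmbed (MTL)']
print(top_two(1, higher_is_better=False))   # ['TxT (MTL)', 'TxT (STL)']
print(round(relative_improvement(0.797, 0.701), 1))  # 13.7
```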

Next, we evaluated the TARGET-AML dataset, which framed age and OS as multitask objectives (Table S8), and the TCGA-BRCA dataset for PAM50 subtype classification (Table S9). For TARGET-AML, TxT (MTL) exhibited better performance, attaining the best or second-best results on each metric, including a C-Index of 0.718 and lower error rates for age prediction than existing methods. Similarly, on TCGA-BRCA, TxT achieved the highest accuracy, precision, recall, and F1 scores. Overall, these results underscore the robustness of our framework, demonstrating consistent improvements in both clinical feature and survival prediction. These improvements were especially pronounced under the MTL setup, which generally outperformed the single-task variants, with particularly notable gains in survival prediction.

To further evaluate model generalizability and check for overfitting to random splits, we also performed a more controlled internal validation using a cluster-based five-fold cross-validation strategy. Specifically, patient embeddings from the final encoder layer were clustered using k-means (k = 5), and each cluster was used in turn as the test fold. As shown in Tables S11 and S12, TxT maintained better performance across all tasks compared with baseline models. These results confirm that TxT’s performance is not limited to random splits but holds under a more stringent, biologically structured validation scheme.
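The cluster-based folds can be sketched as follows: cluster the patient embeddings with k-means (k = 5) and use each cluster once as the test fold, training on the remaining clusters. A plain Lloyd's k-means is written out to keep the example dependency-free (scikit-learn's `KMeans` would serve the same purpose), and a random matrix stands in for the final-encoder embeddings; the function names are hypothetical.

```python
import numpy as np
from typing import List, Tuple


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # keep the old center if a cluster becomes empty
        new_centers = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_folds(embeddings: np.ndarray, k: int = 5, seed: int = 0) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Each cluster serves once as the test fold; the other clusters form the training set."""
    labels = kmeans(embeddings, k, seed=seed)
    idx = np.arange(len(embeddings))
    return [(idx[labels != c], idx[labels == c]) for c in range(k)]


embeddings = np.random.default_rng(0).normal(size=(100, 8))  # stand-in for encoder outputs
folds = cluster_folds(embeddings, k=5)
print([len(test) for _, test in folds])
```

Unlike random splits, test patients here come from a region of the embedding space that is absent from training, which makes the evaluation harder.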

To evaluate performance beyond internal validation, we applied the model trained on the SCAN-B OS dataset to the independent METABRIC dataset [43]. As summarized in Table S13, TxT showed comparable performance relative to baseline models across clinical prediction tasks and survival estimation. Details on baseline model implementations, including preprocessing and architectural adaptations, are provided in Supplementary Methods.

t-SNE visualization of patient embeddings

To examine TxT’s learned representations, we extracted the final-layer encoder outputs for the SCAN-B OS and DRFi datasets and projected them into two dimensions using t-SNE (Fig. 4). Each column in Fig. 4 corresponds to one of the five tasks, with samples color-coded by task-related values or labels. In the OS dataset (Fig. 4, upper panel), younger (blue) and older (red) patients form distinct regions under the age task, while tumor size exhibits a weaker gradient. NHG and PAM50 subtypes cluster more clearly, with Basal samples separating from other subtypes. Notably, survival risk scores also form a gradient, indicating that TxT captures survival-related signals in its shared embedding space. A similar pattern arises for DRFi (Fig. 4, lower panel), where younger and older patients again split along different regions, and patients sharing the same NHG or PAM50 labels group together.

Figure 4. Learned latent embedding space of test samples visualized using t-SNE. Values or labels for each task are displayed as color bars or legends next to each plot. The t-SNE plots illustrate the capacity of the MTL model to capture and visually represent the relationships between various tasks in a low-dimensional embedding space. For regression and survival tasks (age, tumor size, and survival), the values were log-transformed to enhance color contrast. A separate t-SNE visualization using model prediction error as the color-coding metric is provided in Fig. S1.

Overall, these t-SNE plots highlight that TxT’s shared encoder representations reflect multiple clinical and molecular factors. Although tumor size shows a less pronounced pattern, there is still evidence that samples with similar sizes tend to cluster. These observations confirm TxT’s ability to integrate both transcriptomic and clinical signals into a single embedding space, thereby facilitating the effective multitask prediction of clinical features and survival outcomes.

Enhanced biological interpretability by clinical features

To investigate how clinical features affect gene-level attention within a shared transcriptomic context, we compared attention patterns between a multitask model and a single-task model. Genes exhibiting significant changes in attention weights upon the inclusion of clinical features were analyzed using pathway enrichment analysis to interpret the biological relevance of these changes (Fig. S2).

In the context of OS, the results revealed strong enrichment of pathways associated directly with breast cancer prognosis, including EMT and TNF-α signaling via NF-κB. These pathways are critically linked to tumor progression and metastasis, particularly influencing cell adhesion, invasion, and inflammation within the tumor microenvironment [6, 44]. The enrichment of EMT-related genes underscores the ability of clinical features to guide the model in prioritizing interactions relevant to metastatic potential and tumor aggressiveness, thereby providing a clear biological basis for improved survival prediction.

Additionally, for DRFi, we observed enrichment in pathways such as EMT, ECM–receptor interaction, and Notch signaling pathway. These pathways play pivotal roles in distant metastasis by facilitating cancer cell survival in circulation, metastatic niche formation, and promoting invasive phenotypes essential for tumor dissemination [45–47]. Thus, integrating clinical features into the model notably enhances its capability to identify biologically meaningful pathways specifically associated with distant recurrence.

Biological insights from attention-based gene interactions

We explored how attention-derived gene interaction networks provide biological insights into patient subgroups. Luminal A (LumA) breast cancer was selected for this analysis because it is generally less aggressive than other subtypes but exhibits substantial heterogeneity in patient outcomes, making it an ideal candidate for exploring molecular differences using attention-based methods. To examine this diversity, we applied k-means clustering with k = 2 in the latent embedding space of our multitask model for LumA patients in the test set, identifying two distinct subgroups with significantly different survival rates (Fig. 5A). Next, to gain a more detailed understanding of the molecular basis of each subgroup, we constructed attention-derived gene interaction networks for individual patients and identified hub genes and attractor genes (Fig. 5B and C). We then performed pathway enrichment analysis on these network-derived gene sets.

In the subgroup with more favorable survival, pathway analysis revealed enrichment in the inflammatory response category (Fig. 5D). Box plots (Fig. 5F) show increased expression of key immune-related genes such as PTGER4, NFKBIA, IL15RA, PTGIR, CALCRL, F3, and CX3CL1. These genes play crucial roles in various aspects of the immune response, including NF-κB regulation, prostaglandin signaling, and T-cell activation [48]. The active immune microenvironment suggested by this expression profile may contribute to better tumor control and improved patient outcomes [5].

In contrast, the subgroup with worse survival exhibited enrichment of coagulation and EMT pathways (Fig. 5E). Box plots (Fig. 5G and H) show increased expression levels of genes such as S100A1, C2, and SPP1, which are associated with tumor progression and metastasis [4]. Interestingly, several genes typically associated with EMT and extracellular matrix remodeling are downregulated in Cluster 2, suggesting a complex interplay between pro-metastatic processes and ECM remodeling in this subgroup [49, 50].

While the initial clustering distinguished the two LumA subgroups based on latent embeddings, the subsequent construction and analysis of attention-derived gene interaction networks identified specific pathways that may underlie these differences in outcome. Because the multitask attention model captures patient-specific gene–gene relationships, it revealed immune-related connections in the better-surviving subgroup and pro-metastatic processes in the poorer-surviving subgroup.
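A patient-specific network and its hub and attractor genes might be derived from one patient's attention map roughly as follows. This is a sketch under stated assumptions: heads are averaged, self-attention is ignored, only the strongest directed edges (query gene to key gene) are kept, hubs are taken as the genes with the highest total degree, and attractors as the genes with the highest in-degree (most attended to). TxT's exact edge-selection rule and definitions are given in its Methods and may differ.

```python
import numpy as np
from collections import Counter

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_network(attn, genes, n_edges):
    """Keep the n_edges strongest directed edges (query i -> key j) for one patient.

    attn: (heads, genes, genes) attention weights for a single patient.
    """
    a = attn.mean(axis=0)               # average over heads (returns a new array)
    np.fill_diagonal(a, -np.inf)        # ignore self-attention
    strongest = np.argsort(a, axis=None)[::-1][:n_edges]
    rows, cols = np.unravel_index(strongest, a.shape)
    return [(genes[i], genes[j]) for i, j in zip(rows, cols)]

def hubs_and_attractors(edges, top_n=2):
    """Hypothetical definitions: hubs = highest total degree, attractors = highest in-degree."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    total = out_deg + in_deg
    return ([g for g, _ in total.most_common(top_n)],
            [g for g, _ in in_deg.most_common(top_n)])

# Toy patient: one head over five genes.
genes = ["A", "B", "C", "D", "E"]
logits = np.zeros((1, 5, 5))
logits[0, [0, 1, 3, 4], 2] = 3.0   # A, B, D, E attend strongly to C
logits[0, 2, 0] = 3.0              # C attends strongly to A
edges = attention_network(softmax(logits), genes, n_edges=5)
hubs, attractors = hubs_and_attractors(edges)
```

Keeping a fixed number of edges per patient makes networks comparable in size across patients; a global attention threshold is an alternative but yields networks whose density varies with how peaked each patient's attention is.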

To complement the pathway-level interpretation, we further investigated whether attention coefficients between specific gene pairs could reveal biologically meaningful patterns not captured by expression levels alone. As a case study, we focused on the interaction between SERPINE1 and PLAU, two genes that jointly regulate the plasminogen activation system and are associated with poor prognosis in multiple cancer types [51]. While their expression levels do not differ significantly between the two LumA subgroups (Fig. 5G), the attention coefficients in both directions are significantly elevated in the poorer-surviving subgroup (Fig. S3). This suggests that the model captures differential regulatory relevance of gene–gene interactions at the embedding level, offering insights that go beyond standard expression-based analyses. This approach offers an effective way to integrate transcriptomic data with clinical outcomes and could inform more refined prognostication and targeted interventions within the LumA subtype.
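The directional comparison for a single gene pair can be sketched as follows: extract each patient's head-averaged attention coefficient for SERPINE1 to PLAU and for PLAU to SERPINE1, then compare the two subgroups. The two-sided permutation test on the difference in means is our assumption (the statistical test behind Fig. S3 is not described in this section), and the attention tensors below are synthetic, with the poorer-surviving group deliberately given stronger pairwise attention.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pair_coefficients(attn, i, j):
    """Per-patient attention from query gene i to key gene j, averaged over heads.

    attn: (patients, heads, genes, genes).
    """
    return attn[:, :, i, j].mean(axis=1)

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means between x and y."""
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0

# Synthetic attention maps: (patients, heads, genes, genes).
genes = ["SERPINE1", "PLAU", "G3", "G4"]
s, u = genes.index("SERPINE1"), genes.index("PLAU")
rng = np.random.default_rng(1)
logits_good = rng.normal(size=(30, 4, 4, 4))
logits_poor = rng.normal(size=(30, 4, 4, 4))
logits_poor[:, :, s, u] += 1.5   # stronger SERPINE1 -> PLAU attention
logits_poor[:, :, u, s] += 1.5   # and PLAU -> SERPINE1
attn_good, attn_poor = softmax(logits_good), softmax(logits_poor)
results = {
    f"{genes[i]}->{genes[j]}": permutation_test(
        pair_coefficients(attn_poor, i, j), pair_coefficients(attn_good, i, j))
    for i, j in [(s, u), (u, s)]
}
```

Testing the two directions separately reflects that self-attention is asymmetric: the weight gene i places on gene j need not equal the weight gene j places on gene i.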

Model ablation and perturbation-based interpretation

To assess the importance of each head in the multihead self-attention layer, we performed an ablation study by pruning one head at a time and evaluating the impact on performance. As shown in Tables S14 and S15, removing any single head led to noticeable declines in one or more tasks, indicating that each head contributes unique information. In particular, pruning certain heads caused a dramatic increase in MAE for tumor size or a steep drop in C-Index. Overall, these findings confirm that multihead attention is crucial for capturing diverse task-specific patterns within a shared encoder.
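One common way to prune a head is to multiply its output by zero before the output projection. The NumPy sketch below shows this for a single multihead self-attention layer over gene tokens; the weight shapes are assumptions, and the relative change in the layer output is only a stand-in for the task metrics (C-Index, MAE, and so on) that an actual ablation would recompute on held-out data. TxT's own pruning code may be implemented differently.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multihead_self_attention(x, wq, wk, wv, wo, head_mask=None):
    """Scaled dot-product self-attention over gene tokens.

    x: (genes, d_model); wq, wk, wv: (heads, d_model, d_head); wo: (heads * d_head, d_model).
    head_mask: optional (heads,) array of 0/1; a 0 zeroes that head's output
    before the output projection, i.e. prunes the head.
    """
    n_heads, _, d_head = wq.shape
    q = np.einsum("gd,hde->hge", x, wq)
    k = np.einsum("gd,hde->hge", x, wk)
    v = np.einsum("gd,hde->hge", x, wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, genes, genes)
    out = attn @ v                                                 # (heads, genes, d_head)
    if head_mask is not None:
        out = out * np.asarray(head_mask, dtype=float)[:, None, None]
    # Concatenate heads along the feature axis, then project back to d_model.
    concat = out.transpose(1, 0, 2).reshape(x.shape[0], n_heads * d_head)
    return concat @ wo

rng = np.random.default_rng(0)
n_genes, d_model, n_heads, d_head = 6, 8, 4, 2
x = rng.normal(size=(n_genes, d_model))
wq, wk, wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
wo = rng.normal(size=(n_heads * d_head, d_model))
full = multihead_self_attention(x, wq, wk, wv, wo)

# Prune one head at a time; relative output change stands in for a task metric.
effects = []
for h in range(n_heads):
    mask = np.ones(n_heads)
    mask[h] = 0.0
    pruned = multihead_self_attention(x, wq, wk, wv, wo, head_mask=mask)
    effects.append(float(np.linalg.norm(full - pruned) / np.linalg.norm(full)))
```

Masking at inference time, as here, measures how much a trained model relies on each head; retraining with a head removed would instead measure whether the remaining heads can compensate, which is a different question.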

We conducted ablation studies to assess the impact of PPI-based pretraining for gene embeddings. We compared two variants of TxT: (i) Random Init, which uses randomly initialized gene embeddings without pretraining, and (ii) Random PPI, which uses embeddings pretrained on a degree-preserving but biologically uninformative network generated via double-edge swaps of the STRING PPI. As shown in Tables S16 and S17, both variants showed consistent performance drops across prediction tasks. These results show that pretraining on biologically structured PPI networks provides meaningful inductive bias that improves downstream performance and interpretability.
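The Random PPI control relies on degree-preserving randomization. A minimal pure-Python sketch of double-edge swapping on an undirected edge list follows; the number of swaps, the seed, and the toy graph are illustrative assumptions, since the section does not state how many swaps were applied to the STRING network. NetworkX provides the same operation as `networkx.double_edge_swap`.

```python
import random
from collections import Counter

def double_edge_swap(edges, n_swaps, seed=0, max_tries=None):
    """Degree-preserving randomization of an undirected simple graph.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b);
    swaps that would create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    max_tries = max_tries or 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of rewiring the pair
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    return Counter(node for edge in edges for node in edge)

# Toy "PPI": each of 10 proteins is linked to its neighbours at ring distance 1 and 2.
ppi = [(n, (n + step) % 10) for n in range(10) for step in (1, 2)]
shuffled = double_edge_swap(ppi, n_swaps=50, seed=7)
```

Every node keeps its degree after each swap (each endpoint loses one edge and gains one), so any performance gap between the real and randomized networks cannot be explained by degree distribution alone, which is what makes this a useful control.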

To further investigate whether the trained model captures biologically plausible gene–risk relationships, we performed a perturbation-based simulation in which gene expression levels were systematically up- or down-regulated in the test set. We found that certain genes, when perturbed, had consistent effects on the predicted survival risk (Fig. S4). Notably, up-regulation of CCL19 led to decreased risk scores, in line with previous findings that higher CCL19 expression is associated with favorable prognosis in breast cancer [52]. Similarly, down-regulation of SERPINB5 and FABP4 led to elevated predicted risk under the DRFi endpoint, consistent with reports linking reduced expression of these genes to poorer prognosis [53, 54]. These results support the model's ability to recover clinically relevant gene–outcome associations from transcriptomic inputs.
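The perturbation simulation amounts to shifting one gene's expression in every test sample and recording the average change in predicted risk. The sketch below assumes standardized expression, so a shift of +1.0 or -1.0 means up- or down-regulation by one standard deviation, and it replaces the trained TxT model with a hypothetical linear risk score whose coefficients are chosen only for illustration.

```python
import numpy as np

def perturbation_effects(predict_risk, x, genes, shifts=(-1.0, 1.0)):
    """Mean change in predicted risk when one gene is shifted in every test sample.

    predict_risk maps a (samples, genes) array to (samples,) risk scores.
    x is assumed to hold standardized expression values.
    """
    baseline = predict_risk(x)
    effects = {}
    for j, gene in enumerate(genes):
        for delta in shifts:
            perturbed = x.copy()
            perturbed[:, j] += delta
            effects[(gene, delta)] = float(np.mean(predict_risk(perturbed) - baseline))
    return effects

# Hypothetical stand-in for a trained model: a linear risk score.
genes = ["CCL19", "SERPINB5", "FABP4", "GENE_X"]
beta = np.array([-0.8, -0.5, -0.4, 0.0])
x_test = np.random.default_rng(0).normal(size=(50, len(genes)))
effects = perturbation_effects(lambda a: a @ beta, x_test, genes)
```

For a linear model the effect of a shift is simply the shift times the coefficient; with a nonlinear model such as a Transformer, the effect depends on the other genes in each sample, which is why it is averaged over the whole test set.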

Discussion and conclusion

In this study, we presented TxT, an MTL model that integrates transcriptomic and clinical features for survival prediction and clinical-feature inference. Using a Transformer-based architecture with multihead attention, TxT identifies intricate gene–gene interactions while simultaneously accounting for established clinical features. Our experiments showed that this combined modeling strategy can improve performance across multiple tasks compared with single-task approaches. Moreover, the attention-derived gene interaction networks illuminate biological pathways associated with different patient subgroups, such as immune-related processes in the better-surviving cluster of Luminal A patients versus coagulation and EMT pathways in the poorer-surviving cluster.

Despite these promising results, certain limitations remain. Computational constraints limited our analysis to a subset of genes, potentially overlooking additional interactions and pathways. Future work could address these constraints by adopting more efficient attention mechanisms—such as local or hierarchical attention [55, 56]—or by incorporating domain knowledge to guide the model toward biologically meaningful regions of the transcriptome. Additionally, validating the generalizability of TxT in other cancers or diseases would help confirm its utility in broader clinical contexts.

Recent progress in transformer-based architectures has extended their application to single-cell RNA sequencing data, allowing more detailed analysis of cellular heterogeneity within tumors. For example, UCE [57] classifies cell roles using millions of single-cell profiles, while scGPT [58], pretrained on 33 million cells, predicts cell states and types, aiding in the discovery of rare cell populations and cell lifespan prediction. Geneformer [59] integrates data from approximately 30 million cells across numerous studies to predict the impact of gene mutations on specific diseases and support gene-based diagnostics. These single-cell transformer methods complement bulk RNA-seq by capturing both population-level and cell-specific dynamics, providing a more thorough understanding of tumor biology. Future iterations of TxT could potentially incorporate these single-cell approaches to further enhance its predictive capabilities and biological insights.

Overall, this study highlights the potential of an attention-based multitask approach to integrate high-dimensional transcriptomic data with clinical features, improving prediction performance while offering biologically interpretable outcomes. The comprehensive evaluation, interpretability, and biological insights derived from our model underscore its potential to advance MTL in bioinformatics and to inform broader applications in healthcare and precision medicine.

Key Points

Transcriptome Transformer (TxT) effectively integrates transcriptomic data and clinical features through a multitask learning framework, significantly improving patient survival prediction.
TxT leverages multihead attention mechanisms to dynamically model complex gene–gene interactions, offering valuable biological insights.
Differential attention analysis demonstrates that incorporating clinical data guides the model to prioritize biologically relevant genes, highlighting pathways critical for understanding tumor progression and recurrence.

Supplementary Material

251104_txt_supplementary_bbaf628.pdf (1.4 MB, PDF)

Contributor Information

Bonil Koo, Interdisciplinary Program in Bioinformatics, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea; AIGENDRUG Co., Ltd., 1793, Nambusunhwan-ro, 08758 Seoul, Republic of Korea.

Inyoung Sung, BK21 FOUR Intelligence Computing, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea.

Sangseon Lee, Department of Artificial Intelligence, Inha University, 100 Inha-ro, 22180 Incheon, Republic of Korea.

Sun Kim, Interdisciplinary Program in Bioinformatics, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea; AIGENDRUG Co., Ltd., 1793, Nambusunhwan-ro, 08758 Seoul, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea; Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea.

Author contributions

B.K. and S.K. conceived the experiment(s); B.K. and I.S. conducted the experiment(s). All authors analyzed the results, wrote, and reviewed the manuscript.

Conflict of interest: None declared.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-NR077172), a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant no: RS-2024-00403375), and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [RS-2021-II211343, Artificial Intelligence Graduate School Program (Seoul National University)]. This study was also funded by AIGENDRUG Co., Ltd., and ICT at Seoul National University provided research facilities.

Data availability

The SCAN-B dataset is available from Mendeley Data (https://data.mendeley.com/datasets/yzxtxn4nmd/3). The RNA sequencing-based gene expression profiles of the TCGA-BRCA and TARGET-AML datasets were downloaded from UCSC Xena (https://xenabrowser.net/datapages/) [60]. Clinical data for TCGA-BRCA were obtained from TCGA-CDR (https://gdc.cancer.gov/about-data/publications/PanCan-Clinical-2018) [61].

References

1. Tsimberidou A-M, Hong DS, Wheler JJ. et al. Long-term overall survival and prognostic score predicting survival: the IMPACT study in precision medicine. J Hematol Oncol 2019;12:1–12.
2. Kimbung S, Loman N, Hedenfalk I. Clinical and molecular complexity of breast cancer metastases. Semin Cancer Biol 2015;35:85–95. 10.1016/j.semcancer.2015.08.009
3. Dai L-J, Ma D, Xu Y-Z. et al. Molecular features and clinical implications of the heterogeneity in Chinese patients with HER2-low breast cancer. Nat Commun 2023;14:5112. 10.1038/s41467-023-40715-x
4. Li NY, Weber CE, Mi Z. et al. Osteopontin up-regulates critical epithelial-mesenchymal transition transcription factors to induce an aggressive breast cancer phenotype. J Am Coll Surg 2013;217:17–26. 10.1016/j.jamcollsurg.2013.02.025
5. Oshi M, Patel A, Wu R. et al. Enhanced immune response outperform aggressive cancer biology and is associated with better survival in triple-negative breast cancer. NPJ Breast Cancer 2022;8:92. 10.1038/s41523-022-00466-2
6. Thiery JP, Acloque H, Huang RYJ. et al. Epithelial-mesenchymal transitions in development and disease. Cell 2009;139:871–90. 10.1016/j.cell.2009.11.007
7. Elena C, Galli A, Such E. et al. Integrating clinical features and genetic lesions in the risk assessment of patients with chronic myelomonocytic leukemia. Blood 2016;128:1408–17. 10.1182/blood-2016-05-714030
8. Williams AB, Schumacher B. p53 in the DNA-damage-repair process. Cold Spring Harb Perspect Med 2016;6:a026070. 10.1101/cshperspect.a026070
9. Aubrey BJ, Kelly GL, Janic A. et al. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell Death Differ 2018;25:104–13. 10.1038/cdd.2017.169
10. Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. Adv Neural Inf Process Syst 2017;30:5998–6008.
11. Khan A, Lee B. DeepGene Transformer: transformer for the gene expression-based classification of cancer subtypes. Expert Syst Appl 2023;226:120047. 10.1016/j.eswa.2023.120047
12. Zhang T-H, Hasib MM, Chiu Y-C. et al. Transformer for gene expression modeling (T-GEM): an interpretable deep learning model for gene expression-based phenotype predictions. Cancers 2022;14:4763.
13. Crawshaw M. Multi-task learning with deep neural networks: a survey. arXiv preprint, arXiv:2009.09796. 2020, preprint: not peer reviewed. 10.48550/arXiv.2009.09796 (accessed on 20 April 2025).
14. Zhang Y, Yang Q. A survey on multi-task learning. IEEE Trans Knowl Data Eng 2021;34:5586–609. 10.1109/TKDE.2021.3070203
15. Kendall A, Gal Y, Cipolla R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA: IEEE; pp. 7482–91, 2018.
16. Ye H, Xu D. Inverted pyramid multi-task transformer for dense scene understanding. European Conference on Computer Vision 2022;13687:514–30. 10.1007/978-3-031-19812-0_30
17. Liu P, Qiu X, Huang X-J. Adversarial multi-task learning for text classification. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada: Association for Computational Linguistics; pp. 1–10, 2017.
18. Clark K, Luong M-T, Khandelwal U. et al. BAM! Born-again multi-task networks for natural language understanding. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy: Association for Computational Linguistics; pp. 5931–7, 2019.
19. Feinberg EN, Joshi E, Pande VS. et al. Improvement in ADMET prediction with multitask deep featurization. J Med Chem 2020;63:8835–48. 10.1021/acs.jmedchem.9b02187
20. Jiang Y, Rensi S, Wang S. et al. DrugOrchestra: jointly predicting drug response, targets, and side effects via deep multi-task learning. bioRxiv 2020, preprint: not peer reviewed. 10.1101/2020.11.17.385757 (accessed on 20 April 2025).
21. Yuan Q, Chen S, Wang Y. et al. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning. Brief Bioinform 2022;23:bbac444.
22. Zhang F, Zhao B, Shi W. et al. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2022;23:bbab521.
23. Zhang X, Xing Y, Sun K. et al. OmiEmbed: a unified multi-task deep learning framework for multi-omics data. Cancers 2021;13:3047. 10.3390/cancers13123047
24. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv preprint, arXiv:1312.6114. 2013, preprint: not peer reviewed. 10.48550/arXiv.1312.6114 (accessed on 20 April 2025).
25. Chen Z, Badrinarayanan V, Lee C-Y. et al. GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning, Stockholm, Sweden: PMLR; pp. 794–803, 2018.
26. Bamler R, Mandt S. Dynamic word embeddings. In: International Conference on Machine Learning, Sydney, Australia: PMLR; pp. 380–9, 2017.
27. Shaw P, Uszkoreit J, Vaswani A. Self-attention with relative position representations. arXiv preprint, arXiv:1803.02155. 2018, preprint: not peer reviewed. 10.48550/arXiv.1803.02155 (accessed on 20 April 2025).
28. Zitnik M, Leskovec J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics 2017;33:i190–8. 10.1093/bioinformatics/btx252
29. Wang J, Ma A, Chang Y. et al. scGNN is a novel graph neural network framework for single-cell RNA-seq analyses. Nat Commun 2021;12:1882. 10.1038/s41467-021-22197-x
30. Grover A, Leskovec J. node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA: ACM; pp. 855–64, 2016.
31. Ke G, He D, Liu T-Y. Rethinking positional encoding in language pre-training. In: International Conference on Learning Representations, Addis Ababa, Ethiopia (virtual): ICLR / OpenReview.net; 2020.
32. Fotso S. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint, arXiv:1801.05512. 2018, preprint: not peer reviewed. 10.48550/arXiv.1801.05512 (accessed on 20 April 2025).
33. Sener O, Koltun V. Multi-task learning as multi-objective optimization. Adv Neural Inf Process Syst 2018;31:527–38.
34. Yu T, Kumar S, Gupta A. et al. Gradient surgery for multi-task learning. Adv Neural Inf Process Syst 2020;33:5824–36.
35. Staaf J, Häkkinen J, Hegardt C. et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer 2022;8:94. 10.1038/s41523-022-00465-3
36. Farrar JE, Schuback HL, Ries RE. et al. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse. Cancer Res 2016;76:2197–205. 10.1158/0008-5472.CAN-15-1015
37. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.
38. Szklarczyk D, Kirsch R, Koutrouli M. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023;51:D638–46. 10.1093/nar/gkac1000
39. Liberzon A, Birger C, Thorvaldsdóttir H. et al. The Molecular Signatures Database hallmark gene set collection. Cell Syst 2015;1:417–25. 10.1016/j.cels.2015.12.004
40. Jiang L, Xu C, Bai Y. et al. AutoSurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precis Oncol 2024;8:4. 10.1038/s41698-023-00494-6
41. Wang S, Liu Y, Zhang H. et al. SurvConvMixer: robust and interpretable cancer survival prediction based on ConvMixer using pathway-level gene expression images. BMC Bioinformatics 2024;25:133. 10.1186/s12859-024-05745-2
42. Yan M, Dong Z, Zhu Z. et al. Cancer type and survival prediction based on transcriptomic feature map. Comput Biol Med 2025;192:110267. 10.1016/j.compbiomed.2025.110267
43. Curtis C, Shah SP, Chin S-F. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012;486:346–52. 10.1038/nature10983
44. Balkwill F. Tumour necrosis factor and cancer. Nat Rev Cancer 2009;9:361–71. 10.1038/nrc2628
45. Labelle M, Begum S, Hynes RO. Direct signaling between platelets and cancer cells induces an epithelial-mesenchymal-like transition and promotes metastasis. Cancer Cell 2011;20:576–90. 10.1016/j.ccr.2011.09.009
46. Hynes RO. The extracellular matrix: not just pretty fibrils. Science 2009;326:1216–9. 10.1126/science.1176009
47. Ranganathan P, Weaver KL, Capobianco AJ. Notch signalling in solid tumours: a little bit of everything but not all the time. Nat Rev Cancer 2011;11:338–51. 10.1038/nrc3035
48. Marra P, Mathew S, Grigoriadis A. et al. IL15RA drives antagonistic mechanisms of cancer development and immune control in lymphocyte-enriched triple-negative breast cancers. Cancer Res 2014;74:4908–21. 10.1158/0008-5472.CAN-14-0637
49. Prakash J, Shaked Y. The interplay between extracellular matrix remodeling and cancer therapeutics. Cancer Discov 2024;14:1375–88. 10.1158/2159-8290.CD-24-0002
50. Elgundi Z, Papanicolaou M, Major G. et al. Cancer metastasis: the role of the extracellular matrix and the heparan sulfate proteoglycan perlecan. Front Oncol 2020;9:1482. 10.3389/fonc.2019.01482
51. Yuming J, Wang Z, Wang Q. et al. Pan-cancer analysis of SERPINE1 with a concentration on immune therapeutic and prognostic in gastric cancer. J Cell Mol Med 2024;28:e18579. 10.1111/jcmm.18579
52. Wang J, Qin D, Ye L. et al. CCL19 has potential to be a potential prognostic biomarker and a modulator of tumor immune microenvironment (TIME) of breast cancer: a comprehensive analysis based on TCGA database. Aging (Albany NY) 2022;14:4158–75. 10.18632/aging.204081
53. Zhong C-Q, Zhang X-P, Ma N. et al. FABP4 suppresses proliferation and invasion of hepatocellular carcinoma cells and predicts a poor prognosis for hepatocellular carcinoma. Cancer Med 2018;7:2629–40. 10.1002/cam4.1511
54. Maass N, Hojo T, Rösel F. et al. Down regulation of the tumor suppressor gene maspin in breast carcinoma is associated with a higher risk of distant metastasis. Clin Biochem 2001;34:303–7. 10.1016/S0009-9120(01)00220-X
55. Guo M, Ainslie J, Uthus D. et al. LongT5: efficient text-to-text transformer for long sequences. In: Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, United States: Association for Computational Linguistics; pp. 724–36, 2022.
56. Ye Z, Guo Q, Gan Q. et al. BP-Transformer: modelling long-range context via binary partitioning. arXiv preprint, arXiv:1911.04070. 2019, preprint: not peer reviewed. 10.48550/arXiv.1911.04070 (accessed on 20 April 2025).
57. Rosen Y, Roohani Y, Agarwal A. et al. Universal cell embeddings: a foundation model for cell biology. bioRxiv 2023, preprint: not peer reviewed. 10.1101/2023.11.28.568918 (accessed on 20 April 2025).
58. Cui H, Wang C, Maan H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 2024;21:1470–80.
59. Theodoris CV, Xiao L, Chopra A. et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. 10.1038/s41586-023-06139-9
60. Goldman MJ, Craft B, Hastie M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 2020;38:675–8. 10.1038/s41587-020-0546-8
61. Liu J, Lichtenberg T, Hoadley KA. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 2018;173:400–16.e11. 10.1016/j.cell.2018.02.052

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

251104_TxT_supplementary_bbaf628

251104_txt_supplementary_bbaf628.pdf^{(1.4MB, pdf)}

Data Availability Statement

[ref1] 1. Tsimberidou A-M, Hong DS, Wheler JJ. et al. Long-term overall survival and prognostic score predicting survival: the impact study in precision medicine. J Hematol Oncol 2019;12:1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] 2. Kimbung S, Loman N, Hedenfalk I. Clinical and molecular complexity of breast cancer metastases. Seminars in Cancer Biology 2015;35:85–95. 10.1016/j.semcancer.2015.08.009 [DOI] [PubMed] [Google Scholar]

[ref3] 3. Dai L-J, Ma D, Yu-Zheng X. et al. Molecular features and clinical implications of the heterogeneity in Chinese patients with HER2-low breast cancer. Nat Commun 2023;14:5112. 10.1038/s41467-023-40715-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4. Li NY, Weber CE, Mi Z. et al. Osteopontin up-regulates critical epithelial-mesenchymal transition transcription factors to induce an aggressive breast cancer phenotype. J Am Coll Surg 2013;217:17–26. 10.1016/j.jamcollsurg.2013.02.025 [DOI] [PubMed] [Google Scholar]

[ref5] 5. Oshi M, Patel A, Rongrong W. et al. Enhanced immune response outperform aggressive cancer biology and is associated with better survival in triple-negative breast cancer. NPJ Breast Cancer 2022;8:92. 10.1038/s41523-022-00466-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] 6. Thiery JP, Acloque H, Huang RYJ. et al. Epithelial-mesenchymal transitions in development and disease. cell 2009;139:871–90. 10.1016/j.cell.2009.11.007 [DOI] [PubMed] [Google Scholar]

[ref7] 7. Elena C, Galli A, Such E. et al. Integrating clinical features and genetic lesions in the risk assessment of patients with chronic myelomonocytic leukemia. Blood 2016;128:1408–17. 10.1182/blood-2016-05-714030 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] 8. Williams AB, Schumacher B. p53 in the DNA-damage-repair process. Cold Spring Harb Perspect Med 2016;6:a026070. 10.1101/cshperspect.a026070 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] 9. Aubrey BJ, Kelly GL, Janic A. et al. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell Death Differ 2018;25:104–13. 10.1038/cdd.2017.169 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. Advances in neural information processing systems 2017;30:5998–6008. [Google Scholar]

[ref11] 11. Khan A, Lee B. Deepgene transformer: transformer for the gene expression-based classification of cancer subtypes. Expert Syst Appl 2023;226:120047. 10.1016/j.eswa.2023.120047 [DOI] [Google Scholar]

[ref12] 12. Ting-He Zhang M, Hasib M, Chiu Y-C. et al. Transformer for gene expression modeling (T-GEM): an interpretable deep learning model for gene expression-based phenotype predictions. Cancers 2022;14:4763. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] 13. Crawshaw M. Multi-task learning with deep neural networks: a survey. arXiv preprint, arXiv:2009.09796. 2020, preprint: not peer reviewed. 10.48550/arXiv.2009.09796 (accessed on 20 April 2025). [DOI]

[ref14] 14. Zhang Y, Yang Q. A survey on multi-task learning. IEEE Trans Knowl Data Eng 2021;34:5586–609. 10.1109/TKDE.2021.3070203 [DOI] [Google Scholar]

[ref15] 15. Kendall A, Gal Y, Cipolla R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE; 2018;7482–91. [Google Scholar]

[ref16] 16. Ye H, Dan X. Inverted pyramid multi-task transformer for dense scene understanding. European Conference on Computer Vision 2022;13687:514–530. 10.1007/978-3-031-19812-0_30 [DOI] [Google Scholar]

[ref17] 17. Liu P, Qiu X, Huang X-J. Adversarial multi-task learning for text classification. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada: Association for Computational Linguistics; pp. 1–10, 2017.

[ref18] 18. Clark K, Luong M-T, Khandelwal U. et al. Bam! Born-again multi-task networks for natural language understanding. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy: Association for Computational Linguistics; pp. 5931–7, 2019.

[ref19] 19. Feinberg EN, Joshi E, Pande VS. et al. Improvement in ADMET prediction with multitask deep featurization. J Med Chem 2020;63:8835–48. 10.1021/acs.jmedchem.9b02187 [DOI] [PubMed] [Google Scholar]

[ref20] 20. Jiang Y, Rensi S, Wang S. et al. DrugOrchestra: jointly predicting drug response, targets, and side effects via deep multi-task learning. biorxiv. 2020;2020–11. 10.1101/2020.11.17.385757 (accessed on 20 April 2025). [DOI]

[ref21] 21. Yuan Q, Sheng Chen Y, Wang HZ. et al. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning. Brief Bioinform 2022;23:bbac444. [DOI] [PubMed] [Google Scholar]

[ref22] 22. Zhang F, Zhao B, Shi W. et al. DeepDISOBind: accurate prediction of RNA-, DNA-and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2022;23:bbab521. [DOI] [PubMed] [Google Scholar]

[ref23] 23. Zhang X, Xing Y, Sun K. et al. OmiEmbed: a unified multi-task deep learning framework for multi-omics data. Cancers 2021;13:3047. 10.3390/cancers13123047 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref24] 24. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint, arXiv:1312.6114. 2013, preprint: not peer reviewed. 10.48550/arXiv.1312.6114 (accessed on 20 April 2025). [DOI]

[ref25] 25. Chen Z, Badrinarayanan V, Lee C-Y. et al. Gradnorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning, Stockholm, Sweden; pp. 794–803. PMLR, 2018. [Google Scholar]

[ref26] 26. Bamler R, Mandt S. Dynamic word embeddings. In: International Conference on Machine Learning, Sydney, Australia; pp. 380–9. PMLR, 2017. [Google Scholar]

[ref27] 27. Shaw P, Uszkoreit J, Vaswani A. Self-attention with relative position representations. arXiv preprint, arXiv:1803.02155, 2018, preprint: not peer reviewed. 10.48550/arXiv.1803.02155 (accessed on 20 April 2025). [DOI]

[ref28] 28. Zitnik M, Leskovec J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics 2017;33:i190–8. 10.1093/bioinformatics/btx252 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref29] 29. Wang J, Ma A, Chang Y. et al. scGNN is a novel graph neural network framework for single-cell RNA-seq analyses. Nat Commun 2021;12:1882. 10.1038/s41467-021-22197-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref30] 30. Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA: ACM; pp. 855–64, 2016. [DOI] [PMC free article] [PubMed]

[ref31] 31. Ke G, He D, Liu T-Y. Rethinking positional encoding in language pre-training. In: International Conference on Learning Representations, Addis Ababa, Ethiopia (virtual): ICLR / OpenReview.net; 2020.

[ref32] 32. Fotso S. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint, arXiv:1801.05512. 2018, preprint: not peer reviewed. 10.48550/arXiv.1801.05512 (accessed on 20 April 2025).

[ref33] 33. Sener O, Koltun V. Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems 2018;31:527–538.

[ref34] 34. Yu T, Kumar S, Gupta A. et al. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems 2020;33:5824–36.

[ref35] 35. Staaf J, Häkkinen J, Hegardt C. et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer 2022;8:94. 10.1038/s41523-022-00465-3

[ref36] 36. Farrar JE, Schuback HL, Ries RE. et al. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse. Cancer Res 2016;76:2197–205. 10.1158/0008-5472.CAN-15-1015

[ref37] 37. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.

[ref38] 38. Szklarczyk D, Kirsch R, Koutrouli M. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023;51:D638–46. 10.1093/nar/gkac1000

[ref39] 39. Liberzon A, Birger C, Thorvaldsdóttir H. et al. The molecular signatures database hallmark gene set collection. Cell Syst 2015;1:417–25. 10.1016/j.cels.2015.12.004

[ref40] 40. Jiang L, Chao X, Bai Y. et al. AutoSurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precision Oncol 2024;8:4. 10.1038/s41698-023-00494-6

[ref41] 41. Wang S, Liu Y, Zhang H. et al. SurvConvMixer: robust and interpretable cancer survival prediction based on ConvMixer using pathway-level gene expression images. BMC Bioinform 2024;25:133. 10.1186/s12859-024-05745-2

[ref42] 42. Yan M, Dong Z, Zhu Z. et al. Cancer type and survival prediction based on transcriptomic feature map. Comput Biol Med 2025;192:110267. 10.1016/j.compbiomed.2025.110267

[ref43] 43. Curtis C, Shah SP, Chin S-F. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012;486:346–52. 10.1038/nature10983

[ref44] 44. Balkwill F. Tumour necrosis factor and cancer. Nat Rev Cancer 2009;9:361–71. 10.1038/nrc2628

[ref45] 45. Labelle M, Begum S, Hynes RO. Direct signaling between platelets and cancer cells induces an epithelial-mesenchymal-like transition and promotes metastasis. Cancer Cell 2011;20:576–90. 10.1016/j.ccr.2011.09.009

[ref46] 46. Hynes RO. The extracellular matrix: not just pretty fibrils. Science 2009;326:1216–9. 10.1126/science.1176009

[ref47] 47. Ranganathan P, Weaver KL, Capobianco AJ. Notch signalling in solid tumours: a little bit of everything but not all the time. Nat Rev Cancer 2011;11:338–51. 10.1038/nrc3035

[ref48] 48. Marra P, Mathew S, Grigoriadis A. et al. IL15RA drives antagonistic mechanisms of cancer development and immune control in lymphocyte-enriched triple-negative breast cancers. Cancer Res 2014;74:4908–21. 10.1158/0008-5472.CAN-14-0637

[ref49] 49. Prakash J, Shaked Y. The interplay between extracellular matrix remodeling and cancer therapeutics. Cancer Discov 2024;14:1375–88. 10.1158/2159-8290.CD-24-0002

[ref50] 50. Elgundi Z, Papanicolaou M, Major G. et al. Cancer metastasis: the role of the extracellular matrix and the heparan sulfate proteoglycan perlecan. Front Oncol 2020;9:1482. 10.3389/fonc.2019.01482

[ref51] 51. Yuming J, Wang Z, Wang Q. et al. Pan-cancer analysis of SERPINE1 with a concentration on immune therapeutic and prognostic in gastric cancer. J Cell Mol Med 2024;28:e18579. 10.1111/jcmm.18579

[ref52] 52. Wang J, Qin D, Ye L. et al. CCL19 has potential to be a potential prognostic biomarker and a modulator of tumor immune microenvironment (TIME) of breast cancer: a comprehensive analysis based on TCGA database. Aging (Albany NY) 2022;14:4158–75. 10.18632/aging.204081

[ref53] 53. Zhong C-Q, Zhang X-P, Ma N. et al. FABP4 suppresses proliferation and invasion of hepatocellular carcinoma cells and predicts a poor prognosis for hepatocellular carcinoma. Cancer Med 2018;7:2629–40. 10.1002/cam4.1511

[ref54] 54. Maass N, Hojo T, Rösel F. et al. Down regulation of the tumor suppressor gene maspin in breast carcinoma is associated with a higher risk of distant metastasis. Clin Biochem 2001;34:303–7. 10.1016/S0009-9120(01)00220-X

[ref55] 55. Guo M, Ainslie J, Uthus D. et al. LongT5: efficient text-to-text transformer for long sequences. In: Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, United States; pp. 724–36. Association for Computational Linguistics, 2022.

[ref56] 56. Ye Z, Guo Q, Gan Q. et al. BP-transformer: modelling long-range context via binary partitioning. arXiv preprint, arXiv:1911.04070. 2019, preprint: not peer reviewed. 10.48550/arXiv.1911.04070 (accessed on 20 April 2025).

[ref57] 57. Rosen Y, Roohani Y, Agarwal A. et al. Universal cell embeddings: a foundation model for cell biology. bioRxiv. 2023, preprint: not peer reviewed. 10.1101/2023.11.28.568918 (accessed on 20 April 2025).

[ref58] 58. Cui H, Wang C, Maan H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 2024;21:1470–80.

[ref59] 59. Theodoris CV, Xiao L, Chopra A. et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. 10.1038/s41586-023-06139-9

[ref60] 60. Goldman MJ, Craft B, Hastie M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 2020;38:675–8. 10.1038/s41587-020-0546-8

[ref61] 61. Liu J, Lichtenberg T, Hoadley KA. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 2018;173:400–416.e11. 10.1016/j.cell.2018.02.052
